Description
Currently, the following loop runs in a single process, which can be a bottleneck when checking a large number of cached requests:
lm-evaluation-harness/lm_eval/api/model.py, lines 264 to 284 at commit a96085f
Since this part of the code only computes hashes and checks for their existence in self.dbdict, it seems like a good candidate for parallelization via multiprocessing.
Would it be possible to explore using Python’s multiprocessing module or similar (e.g., concurrent.futures.ProcessPoolExecutor) to speed up this step?
This could significantly reduce the latency during evaluation or repeated runs, especially when dealing with a large number of requests.
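As a rough illustration of what I have in mind, here is a minimal sketch using concurrent.futures.ProcessPoolExecutor. Note that hash_args and compute_hashes_parallel below are hypothetical stand-ins, not the harness's actual helpers; the real loop would use the existing key-computation logic, and the dict lookup can stay in the main process since it is cheap once the hashes are available:

```python
import hashlib
import json
from concurrent.futures import ProcessPoolExecutor

def hash_args(attr, args):
    # Hypothetical stand-in for the harness's cache-key function:
    # a stable digest of (request type, request args).
    payload = json.dumps([attr, args], sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()

def compute_hashes_parallel(attr, requests, max_workers=None):
    # Fan the hash computation out across worker processes; chunksize
    # amortizes inter-process overhead for large request lists.
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        return list(
            pool.map(hash_args, [attr] * len(requests), requests, chunksize=256)
        )

if __name__ == "__main__":
    reqs = [("What is 2+2?", {"max_tokens": 8}) for _ in range(1000)]
    hashes = compute_hashes_parallel("generate_until", reqs)
    dbdict = {}  # stands in for self.dbdict
    hits = [h for h in hashes if h in dbdict]
    print(len(hashes), len(hits))  # → 1000 0
```

One caveat worth benchmarking: process startup and pickling overhead can outweigh the gains for small batches, so it may make sense to fall back to the serial path below some request-count threshold.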
Let me know if this is something you’d be open to – I’d be happy to explore or contribute a PR if helpful.