Add CPU affinity setting to latency benchmark #3085
Add CPU affinity setting for latency test
@hubertlu-tw Could you provide offline benchmark numbers for two models on two different builds (machines) to show the difference? Previously we only enabled this for online cases. Thanks!
@hubertlu-tw Can you please fix the CI/Lint?
I tested the changes on two different models (DeepSeek-V3 and Llama-3.1-70B) and two different servers (with different system setups). I consistently observed a 1-5% performance improvement with the CPU affinity setting while running the latency benchmark.
@HaiShaw the Lint error is resolved. Thanks.
Motivation
This PR adds a CPU affinity setting (i.e., NUMA binding) to the latency test (bench_one_batch.py) to improve performance.
Modifications
Call "set_gpu_proc_affinity" in the latency test when the environment variable SGLANG_SET_CPU_AFFINITY=1 is set.
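For readers unfamiliar with the idea, here is a minimal sketch of environment-gated CPU pinning. It is not the implementation of "set_gpu_proc_affinity"; the helper names `cores_for_gpu` and `maybe_bind_cpu`, the even split of cores across GPUs, and the use of the Linux-only `os.sched_setaffinity` are all assumptions made for illustration.

```python
import os
from typing import List, Optional


def cores_for_gpu(available: List[int], gpu_id: int, gpus_per_node: int) -> List[int]:
    """Hypothetical helper: give each GPU an equal, contiguous slice of cores."""
    per_gpu = len(available) // gpus_per_node
    start = gpu_id * per_gpu
    return available[start:start + per_gpu]


def maybe_bind_cpu(gpu_id: int, gpus_per_node: int) -> Optional[List[int]]:
    """Pin the current process only when SGLANG_SET_CPU_AFFINITY=1 (sketch)."""
    if os.environ.get("SGLANG_SET_CPU_AFFINITY", "0") != "1":
        return None  # feature is opt-in; leave the default affinity untouched

    # Start from the cores this process may already use (respects cpusets).
    if hasattr(os, "sched_getaffinity"):
        available = sorted(os.sched_getaffinity(0))
    else:
        available = list(range(os.cpu_count() or 1))

    cores = cores_for_gpu(available, gpu_id, gpus_per_node)
    # Skip binding if the slice is empty or the platform lacks the call.
    if cores and hasattr(os, "sched_setaffinity"):
        os.sched_setaffinity(0, cores)  # pid 0 means the calling process
    return cores
```

With 8 usable cores and 4 GPUs, this sketch would assign cores 2 and 3 to GPU 1. Whether a contiguous slice maps onto the NUMA node closest to a given GPU depends on the machine topology, so a real binding should be checked against the actual hardware layout.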
Checklist
CC: @HaiShaw