[Roadmap] vLLM Roadmap Q1 2025 #11862
Comments
Will vLLM consider optimizing communication operations such as all-gather/all-reduce through 4-bit or 8-bit quantization?
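For context, the idea being asked about is shrinking collective-communication payloads by quantizing them before the transfer. The sketch below is not vLLM code; it illustrates a per-tensor int8 all-gather with plain torch.distributed, and `quantized_all_gather` is a hypothetical helper name. (All-reduce is harder to quantize this way, because the reduction itself should still happen in higher precision.)

```python
import torch
import torch.distributed as dist

def quantized_all_gather(x: torch.Tensor, world_size: int) -> torch.Tensor:
    """All-gather a float tensor as int8 payloads plus per-tensor scales (sketch only)."""
    # Symmetric per-tensor quantization to int8.
    scale = (x.abs().max().clamp(min=1e-8) / 127.0).reshape(1)
    q = torch.clamp((x / scale).round(), -127, 127).to(torch.int8)

    # Gather the int8 payloads: 2x less traffic than fp16/bf16, 4x less than fp32.
    q_list = [torch.empty_like(q) for _ in range(world_size)]
    dist.all_gather(q_list, q)

    # Gather the scales so every rank can dequantize every shard.
    s_list = [torch.empty_like(scale) for _ in range(world_size)]
    dist.all_gather(s_list, scale)

    # Dequantize and concatenate along the last (sharded) dimension.
    return torch.cat([qi.to(x.dtype) * si for qi, si in zip(q_list, s_list)], dim=-1)
```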
When will V1 support FP8 KV cache?
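As a point of reference, an FP8 KV cache can already be requested through the existing engine argument in current vLLM releases; the open question here is coverage under the V1 engine. A hedged usage sketch (the model name is just an example, and whether this path is honored by V1 depends on the version):

```python
from vllm import LLM, SamplingParams

# Request an 8-bit floating-point KV cache via the existing engine argument.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",  # example model; substitute your own
    kv_cache_dtype="fp8",
)
outputs = llm.generate(["The capital of France is"], SamplingParams(max_tokens=16))
print(outputs[0].outputs[0].text)
```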
Will vLLM consider supporting sparse attention methods such as StreamingLLM and H2O?
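For readers unfamiliar with those methods: StreamingLLM keeps a few initial "attention sink" tokens plus a sliding window of recent tokens, while H2O evicts tokens with low accumulated attention scores. The helper below is a hypothetical illustration of the StreamingLLM retention policy, not vLLM code.

```python
def streaming_llm_keep_positions(seq_len: int, num_sink: int = 4, window: int = 1024) -> list[int]:
    """KV-cache positions a StreamingLLM-style policy would retain (illustration only)."""
    if seq_len <= num_sink + window:
        return list(range(seq_len))                      # nothing to evict yet
    sinks = list(range(num_sink))                        # initial "attention sink" tokens
    recent = list(range(seq_len - window, seq_len))      # sliding window of recent tokens
    return sinks + recent                                # everything in between is evicted
```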
Does vLLM have plans to optimize host operations? Currently, both scheduling and sampling are handled on the host, which reduces GPU utilization. Is it possible to pipeline the scheduling, model execution, and post-processing operations to improve efficiency?
[Figure: Three-Stage Pipeline Timeline]
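To make the pipelining suggestion concrete, here is a hypothetical sketch of overlapping host-side scheduling and post-processing with GPU execution; `schedule`, `run_model`, and `postprocess` are placeholders, not vLLM APIs.

```python
from concurrent.futures import ThreadPoolExecutor

def engine_loop(schedule, run_model, postprocess, num_steps: int):
    """Overlap host work with GPU execution instead of running the phases serially:
    while step N runs on the GPU, schedule step N+1 and post-process step N-1."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        pending_post = None                    # post-processing future from the previous step
        batch = schedule()                     # prepare the first batch on the host
        for _ in range(num_steps):
            next_batch = pool.submit(schedule)   # host: schedule the next step
            outputs = run_model(batch)           # GPU: execute the current step
            if pending_post is not None:
                pending_post.result()            # drain the previous step's post-processing
            pending_post = pool.submit(postprocess, outputs)
            batch = next_batch.result()
        if pending_post is not None:
            pending_post.result()
```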
This page is accessible via roadmap.vllm.ai
This is a living document! For each item here, we intend to link the RFC as well as the discussion channel in the vLLM Slack.
vLLM Core
These projects will deliver performance enhancements to the majority of workloads running on vLLM, and the core team has assigned priorities to signal what must get done. Help is also wanted here, especially from people who want to get more involved in the core of vLLM.
Ship a performant and modular V1 architecture (#8779, #sig-v1)
Support large and long context models
Improved performance in batch mode
Others
Model Support
transformers backend support (#11330)
Hardware Support
Optimizations
CI and Developer Productivity
Ecosystem Projects
These are independent projects that we would love to have native collaboration and integration with!
If any item you want is not on the roadmap, your suggestions and contributions are strongly welcomed! Please feel free to comment in this thread, open a feature request, or create an RFC.
Historical Roadmap: #9006, #5805, #3861, #2681, #244