Issues: vllm-project/vllm
[Bug]: vLLM doesn't run on M4 Macbook Pro (bug) · #12426 · opened Jan 25, 2025 by BenzeneRain
[Installation]: no module named "resources" (installation) · #12425 · opened Jan 25, 2025 by Omni-NexusAI
[Misc]: How to use a chat template to be applied? (misc) · #12423 · opened Jan 24, 2025 by MohamedAliRashad
[Usage]: Is it possible to use meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 with vLLM? (usage) · #12411 · opened Jan 24, 2025 by mrakgr
[Bug]: Performance regression when using PyTorch regional compilation (bug) · #12410 · opened Jan 24, 2025 by anko-intel
[Bug]: Slower inference time on fewer input tokens (bug) · #12406 · opened Jan 24, 2025 by vishalkumardas
[Bug]: InternVL2-26B-AWQ service startup failure (bug) · #12404 · opened Jan 24, 2025 by CallmeZhangChenchen
[Bug]: AsyncEngineDeadError during inference with two vLLM engines on a single GPU (bug) · #12401 · opened Jan 24, 2025 by semensorokin
[Usage]: Overwhelmed trying to find out how to serve Llama-3 70B to multiple users with 128k context (usage) · #12400 · opened Jan 24, 2025 by Arche151
[Feature]: Consider integrating SVDQuant (W4A4 quantization) from the Nunchaku project (feature request) · #12399 · opened Jan 24, 2025 by dengyingxu
[Performance]: Unexpected performance of vLLM Cascade Attention (performance) · #12395 · opened Jan 24, 2025 by lauthu
[Usage]: Use vLLM to serve a GGUF model with CPU only (usage) · #12391 · opened Jan 24, 2025 by pamdla
[Bug]: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xf8 in position 0: invalid start byte (bug) · #12390 · opened Jan 24, 2025 by jaydyi
[Performance]: Details about the performance of vLLM on reasoning models (performance) · #12387 · opened Jan 24, 2025 by shaoyuyoung
[Bug]: v0.6.6.post1 is incompatible with pynvml==12.0.0 (bug) · #12386 · opened Jan 24, 2025 by sharafeddeen
[Usage]: mistralai/Ministral-8B-Instruct-2410: scaling to 128k context length (usage) · #12385 · opened Jan 24, 2025 by DJWCB-AUV
[Bug]: Cannot run MiniCPMV on OpenVINO (bug) · #12384 · opened Jan 24, 2025 by cheng358
[Usage]: How to run vLLM with a regression task, like the classify task (usage) · #12379 · opened Jan 24, 2025 by zhanpengjie
[Usage]: vLLM serving with a local model (usage) · #12378 · opened Jan 24, 2025 by kunrenzhilu
[Bug]: [TPU] Prefix caching + w8a8 + long context results in degraded performance and corrupted output (bug) · #12371 · opened Jan 23, 2025 by kiratp
Release v0.7.0 (release) · #12365 · opened Jan 23, 2025 by simon-mo · 5 of 8 tasks complete
[Bug]: Inference with GGUF returns garbage (bug) · #12364 · opened Jan 23, 2025 by q0dr
[RFC]: Refactor config-format and load-format as plugins (RFC) · #12363 · opened Jan 23, 2025 by maxdebayser
[Usage]: When running models on multiple GPUs, workload does not get split (usage) · #12354 · opened Jan 23, 2025 by ArturDev42
[Bug]: Cannot serve Qwen2.5 in OpenVINO (bug) · #12350 · opened Jan 23, 2025 by cheng358