[Usage]: Is it possible to use meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 with vLLM?
#12411
Labels: usage
Your current environment
How would you like to use vllm
I want to run inference with meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8 so I can use it as a draft model for speculative decoding with larger models, but I can't get it to load in vLLM, and I suspect the SpinQuant INT4 quantization scheme is not supported. I get the following errors when I try to run the model directly (a minimal sketch of what I'm attempting is below). Is there a setting that would make this work, or is the quantization scheme completely unsupported at the moment?
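For reference, here is roughly what I'm trying to do. The target model name (meta-llama/Llama-3.1-8B-Instruct) is only a placeholder for "a larger model", and the `speculative_model` / `num_speculative_tokens` argument names are as I understand them from the vLLM version I'm using; they may be spelled differently (e.g. via a speculative config object) in other releases:

```python
from vllm import LLM, SamplingParams

# Step 1: direct load attempt -- this is where the errors appear,
# presumably because the SpinQuant INT4_EO8 checkpoint format isn't recognized.
draft = LLM(model="meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8")

# Step 2 (the end goal): use the quantized 1B model as a draft model
# for speculative decoding with a larger target model.
# NOTE: the target model below is just a placeholder for illustration,
# and the speculative-decoding argument names may vary by vLLM version.
llm = LLM(
    model="meta-llama/Llama-3.1-8B-Instruct",
    speculative_model="meta-llama/Llama-3.2-1B-Instruct-SpinQuant_INT4_EO8",
    num_speculative_tokens=5,
)

outputs = llm.generate(
    ["Write a haiku about speculative decoding."],
    SamplingParams(temperature=0.0, max_tokens=64),
)
print(outputs[0].outputs[0].text)
```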