vLLM v1 alpha bugs #1395

Open
lmmx opened this issue Jan 27, 2025 · 1 comment

lmmx commented Jan 27, 2025

Describe the issue as clearly as possible:

vLLM v1 just dropped, and I tried it with some existing Outlines code; it looks like `adapt_tokenizer` broke.

Steps/code to reproduce the bug:

Lightly adapted repro (swap out the model as desired):


import json

from pydantic import BaseModel
from vllm import LLM, SamplingParams


class FooModel(BaseModel):
    answer: int


def main(
    messages=["Hello world"],
    guide=FooModel,
    model_size="7b",
    cot_prefill="<think>\n\n</think>\n",
    temperature=0.6,  # illustrative sampling defaults
    top_p=0.95,
    max_new_tokens=512,
):
    from outlines.models.vllm import adapt_tokenizer
    from outlines.processors import JSONLogitsProcessor
    from transformers import AutoTokenizer

    model_name = f"casperhansen/deepseek-r1-distill-qwen-{model_size}-awq"
    llm = LLM(model_name, enable_prefix_caching=True)
    tokenizer = llm.get_tokenizer()

    # Build the prompt with CoT markers
    msg_list = [{"role": "user", "content": msg} for msg in messages]
    prompt = (
        tokenizer.apply_chat_template(
            msg_list, tokenize=False, add_generation_prompt=True
        )
        + cot_prefill
    )

    # Configure the Outlines JSON logits processor
    json_schema = json.dumps(guide.model_json_schema())
    model_name = llm.llm_engine.model_config.model
    outlines_tokenizer = adapt_tokenizer(AutoTokenizer.from_pretrained(model_name))
    guided_processor = JSONLogitsProcessor(
        schema=json_schema, tokenizer=outlines_tokenizer, whitespace_pattern=r" ?"
    )
    sampling_params = SamplingParams(
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_new_tokens,
        # the guided processor is what ends up being serialized by the v1 engine
        logits_processors=[guided_processor],
    )
    # Generate output
    output = llm.generate(prompt, sampling_params, use_tqdm=False)
    return output


main()
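For reference, one possible workaround direction (a sketch only, not Outlines' actual API): the traceback below points at the nested `convert_token_to_string` that `adapt_tokenizer` attaches to the tokenizer, and nested functions can't be pickled, so swapping it for a module-level callable after adaptation might keep the request serializable. The `ConvertTokenToString` class and its single-token decode are my own approximation, not what Outlines does internally:

```python
# Hypothetical sketch, not part of Outlines or vLLM: replace the nested
# convert_token_to_string attached by adapt_tokenizer with a picklable,
# module-level callable so the adapted tokenizer (and anything holding it)
# can be serialized.
class ConvertTokenToString:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, token: str) -> str:
        # Rough approximation of the closure's behaviour: decode one token
        # back to a string. Adjust to match Outlines' real logic (e.g.
        # SentencePiece leading-space handling) if needed.
        return self.tokenizer.convert_tokens_to_string([token])


# e.g. after calling adapt_tokenizer(...):
# outlines_tokenizer.convert_token_to_string = ConvertTokenToString(outlines_tokenizer)
```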

Expected result:

(working generation!)

Error message:

File "/home/louis/lab/r1/src/r1/silent_thought_vllm.py", line 115, in think                                   
    output = llm.generate(prompt, sampling_params, use_tqdm=False)                                                                                    
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^             
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/utils.py", line 1021, in inner                                                              
    return fn(*args, **kwargs)                                                 
           ^^^^^^^^^^^^^^^^^^^                                                 
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 454, in generate                                                  
    self._validate_and_add_requests(                                           
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1175, in _validate_and_add_requests                               
    self._add_request(                                                         
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1193, in _add_request                                             
    self.llm_engine.add_request(                                               
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 163, in add_request                                          
    self.engine_core.add_request(engine_core_req)                              
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 215, in add_request                                         
    self._send_input(EngineCoreRequestType.ADD, request)                       
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 211, in _send_input                                         
    msg = (request_type.value, self.encoder.encode(request))                   
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 7, in encode                                                      
    return pickle.dumps(obj)                                                   
           ^^^^^^^^^^^^^^^^^                                                   
AttributeError: Can't get local object 'adapt_tokenizer.<locals>.convert_token_to_string'
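A minimal sketch of the failure mode the traceback points at, independent of Outlines and vLLM: a function defined inside another function is only reachable via `<locals>`, so pickle refuses to serialize it, and any object carrying it as an attribute fails the same way:

```python
import pickle


def adapt(obj):
    # Mirrors the pattern the traceback names: a function defined inside
    # another function and attached to an object as an attribute.
    def convert_token_to_string(token):
        return token

    obj.convert_token_to_string = convert_token_to_string
    return obj


class Holder:
    pass


adapted = adapt(Holder())
pickle.dumps(adapted)  # raises AttributeError on the local convert_token_to_string
```

vLLM v1 pickles the whole engine-core request before sending it to the engine core (the `serial_utils.py` frame above), which is presumably why this only surfaces with the v1 engine.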

Outlines/Python version information:


``` 0.1.11 Python 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:31:09) [GCC 11.2.0] accelerate==1.3.0 aiohappyeyeballs==2.4.4 aiohttp==3.11.11 aiohttp-cors==0.7.0 aiosignal==1.3.2 airportsdata==20241001 annotated-types==0.7.0 anyio==4.8.0 argh==0.31.3 astor==0.8.1 attrs==24.3.0 bitsandbytes==0.45.0 blake3==1.0.2 cachetools==5.5.0 certifi==2024.12.14 charset-normalizer==3.4.1 click==8.1.8 cloudpickle==3.1.1 colorful==0.5.6 compressed-tensors==0.8.1 datasets==3.2.0 depyf==0.18.0 dill==0.3.8 diskcache==5.6.3 distlib==0.3.9 distro==1.9.0 einops==0.8.0 fastapi==0.115.6 filelock==3.16.1 frozenlist==1.5.0 fsspec==2024.9.0 gguf==0.10.0 google-api-core==2.24.0 google-auth==2.37.0 googleapis-common-protos==1.66.0 grpcio==1.69.0 h11==0.14.0 httpcore==1.0.7 httptools==0.6.4 httpx==0.28.1 huggingface-hub==0.27.1 idna==3.10 importlib-metadata==8.6.1 iniconfig==2.0.0 interegular==0.3.3 jinja2==3.1.5 jiter==0.8.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 lark==1.2.2 linkify-it-py==2.0.3 lm-format-enforcer==0.10.9 markdown-it-py==3.0.0 markupsafe==3.0.2 mdit-py-plugins==0.4.2 mdurl==0.1.2 memray==1.15.0 mistral-common==1.5.1 mpmath==1.3.0 msgpack==1.1.0 msgspec==0.19.0 multidict==6.1.0 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.4.2 numpy==1.26.4 nvidia-cublas-cu12==12.4.5.8 nvidia-cuda-cupti-cu12==12.4.127 nvidia-cuda-nvrtc-cu12==12.4.127 nvidia-cuda-runtime-cu12==12.4.127 nvidia-cudnn-cu12==9.1.0.70 nvidia-cufft-cu12==11.2.1.3 nvidia-curand-cu12==10.3.5.147 nvidia-cusolver-cu12==11.6.1.9 nvidia-cusparse-cu12==12.3.1.170 nvidia-ml-py==12.560.30 nvidia-nccl-cu12==2.21.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.4.127 openai==1.59.9 opencensus==0.11.4 opencensus-context==0.1.3 opencv-python-headless==4.11.0.86 outlines==0.1.11 outlines-core==0.1.26 packaging==24.2 pandas==2.2.3 partial-json-parser==0.2.1.1.post5 pillow==10.4.0 platformdirs==4.3.6 pluggy==1.5.0 prometheus-client==0.21.1 prometheus-fastapi-instrumentator==7.0.2 propcache==0.2.1 proto-plus==1.25.0 protobuf==5.29.3 psutil==6.1.1 py-cpuinfo==9.0.0 py-spy==0.4.0 pyarrow==19.0.0 pyasn1==0.6.1 pyasn1-modules==0.4.1 pybind11==2.13.6 pycountry==24.6.1 pydantic==2.10.5 pydantic-core==2.27.2 pygments==2.19.1 pytest==8.3.4 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 pytz==2024.2 pyyaml==6.0.2 pyzmq==26.2.0 -e file:///home/louis/lab/r1 ray==2.40.0 referencing==0.36.1 regex==2024.11.6 requests==2.32.3 rich==13.9.4 rpds-py==0.22.3 rsa==4.9 safetensors==0.5.2 sentencepiece==0.2.0 setuptools==75.8.0 six==1.17.0 smart-open==7.1.0 sniffio==1.3.1 starlette==0.41.3 sympy==1.13.1 textual==1.0.0 tiktoken==0.7.0 tokenizers==0.21.0 torch==2.5.1 torchvision==0.20.1 tqdm==4.67.1 transformers==4.48.0 triton==3.1.0 typing-extensions==4.12.2 tzdata==2025.1 uc-micro-py==1.0.3 urllib3==2.3.0 uvicorn==0.34.0 uvloop==0.21.0 virtualenv==20.29.1 vllm==0.6.6.post1 watchfiles==1.0.4 websockets==14.2 wrapt==1.17.2 xformers==0.0.28.post3 xgrammar==0.1.10 xxhash==3.5.0 yarl==1.18.3 zipp==3.21.0 ```

Context for the issue:

vLLM v1 just got announced in alpha, so I thought I should report this :-)

lmmx added the bug label Jan 27, 2025
yvan-sraka (Contributor) commented

That’s great news! TBH, I’ve struggled so far to reproduce a working outlines test environment from scratch, mainly due to issues with vllm... if upgrading our dependency resolves this as a side effect, that would be nice! #1389 (comment)
