vLLM v1 alpha bugs #1395

Open
lmmx opened this issue Jan 27, 2025 · 1 comment

lmmx commented Jan 27, 2025

Describe the issue as clearly as possible:

vLLM v1 just dropped, and I tried it with some existing Outlines code; it looks like `adapt_tokenizer` broke.

Steps/code to reproduce the bug:

Lightly adapted repro (swap out the model as desired):


import json

from pydantic import BaseModel
from vllm import LLM, SamplingParams


class FooModel(BaseModel):
    answer: int


def main(
    messages=["Hello world"],
    guide=FooModel,
    model_size="7b",
    cot_prefill="<think>\n\n</think>\n",
    temperature=0.6,  # illustrative sampling defaults
    top_p=0.95,
    max_new_tokens=512,
):
    from outlines.models.vllm import adapt_tokenizer
    from outlines.processors import JSONLogitsProcessor
    from transformers import AutoTokenizer

    model_name = f"casperhansen/deepseek-r1-distill-qwen-{model_size}-awq"
    llm = LLM(model_name, enable_prefix_caching=True)
    tokenizer = llm.get_tokenizer()

    # Build the prompt with CoT markers
    msg_list = [{"role": "user", "content": msg} for msg in messages]
    prompt = (
        tokenizer.apply_chat_template(
            msg_list, tokenize=False, add_generation_prompt=True
        )
        + cot_prefill
    )

    # Configure the Outlines JSON logits processor
    json_schema = json.dumps(guide.model_json_schema())
    model_name = llm.llm_engine.model_config.model
    outlines_tokenizer = adapt_tokenizer(AutoTokenizer.from_pretrained(model_name))
    guided_processor = JSONLogitsProcessor(
        schema=json_schema, tokenizer=outlines_tokenizer, whitespace_pattern=r" ?"
    )
    sampling_params = SamplingParams(
        temperature=temperature,
        top_p=top_p,
        max_tokens=max_new_tokens,
        # the guided processor is what ends up being serialized by the v1 engine
        logits_processors=[guided_processor],
    )
    # Generate output
    output = llm.generate(prompt, sampling_params, use_tqdm=False)
    return output


main()
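For reference, one possible workaround direction (a sketch only, not Outlines' actual API): the traceback below points at the nested `convert_token_to_string` that `adapt_tokenizer` attaches to the tokenizer, and nested functions can't be pickled, so swapping it for a module-level callable after adaptation might keep the request serializable. The `ConvertTokenToString` class and its single-token decode are my own approximation, not what Outlines does internally:

```python
# Hypothetical sketch, not part of Outlines or vLLM: replace the nested
# convert_token_to_string attached by adapt_tokenizer with a picklable,
# module-level callable so the adapted tokenizer (and anything holding it)
# can be serialized.
class ConvertTokenToString:
    def __init__(self, tokenizer):
        self.tokenizer = tokenizer

    def __call__(self, token: str) -> str:
        # Rough approximation of the closure's behaviour: decode one token
        # back to a string. Adjust to match Outlines' real logic (e.g.
        # SentencePiece leading-space handling) if needed.
        return self.tokenizer.convert_tokens_to_string([token])


# e.g. after calling adapt_tokenizer(...):
# outlines_tokenizer.convert_token_to_string = ConvertTokenToString(outlines_tokenizer)
```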

Expected result:

(working generation!)

Error message:

File "/home/louis/lab/r1/src/r1/silent_thought_vllm.py", line 115, in think                                   
    output = llm.generate(prompt, sampling_params, use_tqdm=False)                                                                                    
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^             
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/utils.py", line 1021, in inner                                                              
    return fn(*args, **kwargs)                                                 
           ^^^^^^^^^^^^^^^^^^^                                                 
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 454, in generate                                                  
    self._validate_and_add_requests(                                           
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1175, in _validate_and_add_requests                               
    self._add_request(                                                         
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/entrypoints/llm.py", line 1193, in _add_request                                             
    self.llm_engine.add_request(                                               
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/llm_engine.py", line 163, in add_request                                          
    self.engine_core.add_request(engine_core_req)                              
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 215, in add_request                                         
    self._send_input(EngineCoreRequestType.ADD, request)                       
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/engine/core_client.py", line 211, in _send_input                                         
    msg = (request_type.value, self.encoder.encode(request))                   
                               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^                    
  File "/home/louis/lab/r1/.venv/lib/python3.12/site-packages/vllm/v1/serial_utils.py", line 7, in encode                                                      
    return pickle.dumps(obj)                                                   
           ^^^^^^^^^^^^^^^^^                                                   
AttributeError: Can't get local object 'adapt_tokenizer.<locals>.convert_token_to_string'
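A minimal sketch of the failure mode the traceback points at, independent of Outlines and vLLM: a function defined inside another function is only reachable via `<locals>`, so pickle refuses to serialize it, and any object carrying it as an attribute fails the same way:

```python
import pickle


def adapt(obj):
    # Mirrors the pattern the traceback names: a function defined inside
    # another function and attached to an object as an attribute.
    def convert_token_to_string(token):
        return token

    obj.convert_token_to_string = convert_token_to_string
    return obj


class Holder:
    pass


adapted = adapt(Holder())
pickle.dumps(adapted)  # raises AttributeError on the local convert_token_to_string
```

vLLM v1 pickles the whole engine-core request before sending it to the engine core (the `serial_utils.py` frame above), which is presumably why this only surfaces with the v1 engine.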

Outlines/Python version information:


``` 0.1.11 Python 3.12.8 | packaged by Anaconda, Inc. | (main, Dec 11 2024, 16:31:09) [GCC 11.2.0] accelerate==1.3.0 aiohappyeyeballs==2.4.4 aiohttp==3.11.11 aiohttp-cors==0.7.0 aiosignal==1.3.2 airportsdata==20241001 annotated-types==0.7.0 anyio==4.8.0 argh==0.31.3 astor==0.8.1 attrs==24.3.0 bitsandbytes==0.45.0 blake3==1.0.2 cachetools==5.5.0 certifi==2024.12.14 charset-normalizer==3.4.1 click==8.1.8 cloudpickle==3.1.1 colorful==0.5.6 compressed-tensors==0.8.1 datasets==3.2.0 depyf==0.18.0 dill==0.3.8 diskcache==5.6.3 distlib==0.3.9 distro==1.9.0 einops==0.8.0 fastapi==0.115.6 filelock==3.16.1 frozenlist==1.5.0 fsspec==2024.9.0 gguf==0.10.0 google-api-core==2.24.0 google-auth==2.37.0 googleapis-common-protos==1.66.0 grpcio==1.69.0 h11==0.14.0 httpcore==1.0.7 httptools==0.6.4 httpx==0.28.1 huggingface-hub==0.27.1 idna==3.10 importlib-metadata==8.6.1 iniconfig==2.0.0 interegular==0.3.3 jinja2==3.1.5 jiter==0.8.2 jsonschema==4.23.0 jsonschema-specifications==2024.10.1 lark==1.2.2 linkify-it-py==2.0.3 lm-format-enforcer==0.10.9 markdown-it-py==3.0.0 markupsafe==3.0.2 mdit-py-plugins==0.4.2 mdurl==0.1.2 memray==1.15.0 mistral-common==1.5.1 mpmath==1.3.0 msgpack==1.1.0 msgspec==0.19.0 multidict==6.1.0 multiprocess==0.70.16 nest-asyncio==1.6.0 networkx==3.4.2 numpy==1.26.4 nvidia-cublas-cu12==12.4.5.8 nvidia-cuda-cupti-cu12==12.4.127 nvidia-cuda-nvrtc-cu12==12.4.127 nvidia-cuda-runtime-cu12==12.4.127 nvidia-cudnn-cu12==9.1.0.70 nvidia-cufft-cu12==11.2.1.3 nvidia-curand-cu12==10.3.5.147 nvidia-cusolver-cu12==11.6.1.9 nvidia-cusparse-cu12==12.3.1.170 nvidia-ml-py==12.560.30 nvidia-nccl-cu12==2.21.5 nvidia-nvjitlink-cu12==12.4.127 nvidia-nvtx-cu12==12.4.127 openai==1.59.9 opencensus==0.11.4 opencensus-context==0.1.3 opencv-python-headless==4.11.0.86 outlines==0.1.11 outlines-core==0.1.26 packaging==24.2 pandas==2.2.3 partial-json-parser==0.2.1.1.post5 pillow==10.4.0 platformdirs==4.3.6 pluggy==1.5.0 prometheus-client==0.21.1 prometheus-fastapi-instrumentator==7.0.2 propcache==0.2.1 proto-plus==1.25.0 protobuf==5.29.3 psutil==6.1.1 py-cpuinfo==9.0.0 py-spy==0.4.0 pyarrow==19.0.0 pyasn1==0.6.1 pyasn1-modules==0.4.1 pybind11==2.13.6 pycountry==24.6.1 pydantic==2.10.5 pydantic-core==2.27.2 pygments==2.19.1 pytest==8.3.4 python-dateutil==2.9.0.post0 python-dotenv==1.0.1 pytz==2024.2 pyyaml==6.0.2 pyzmq==26.2.0 -e file:///home/louis/lab/r1 ray==2.40.0 referencing==0.36.1 regex==2024.11.6 requests==2.32.3 rich==13.9.4 rpds-py==0.22.3 rsa==4.9 safetensors==0.5.2 sentencepiece==0.2.0 setuptools==75.8.0 six==1.17.0 smart-open==7.1.0 sniffio==1.3.1 starlette==0.41.3 sympy==1.13.1 textual==1.0.0 tiktoken==0.7.0 tokenizers==0.21.0 torch==2.5.1 torchvision==0.20.1 tqdm==4.67.1 transformers==4.48.0 triton==3.1.0 typing-extensions==4.12.2 tzdata==2025.1 uc-micro-py==1.0.3 urllib3==2.3.0 uvicorn==0.34.0 uvloop==0.21.0 virtualenv==20.29.1 vllm==0.6.6.post1 watchfiles==1.0.4 websockets==14.2 wrapt==1.17.2 xformers==0.0.28.post3 xgrammar==0.1.10 xxhash==3.5.0 yarl==1.18.3 zipp==3.21.0 ```

Context for the issue:

vLLM v1 just got announced in alpha, so I thought I should report this :-)

lmmx added the bug label Jan 27, 2025
yvan-sraka (Contributor) commented

That’s great news! TBH, I’ve struggled so far to reproduce a working outlines test environment from scratch, mainly due to issues with vllm... if upgrading our dependency resolves this as a side effect, that would be nice! #1389 (comment)
