vLLM guided generation with xGrammars has regressed after vLLM v0.6.5. Very slow TTFT. #156

AlbertoCastelo · 2025-01-16T16:14:13Z

See the issue I opened in vLLM.

Ubospica · 2025-01-17T07:21:33Z

Hi @AlbertoCastelo, thanks for raising the issue! Could you provide the grammar you are using, so we can better find the problem?

AlbertoCastelo · 2025-02-04T18:04:07Z

@Ubospica I completely forgot that I hadn't answered here.

I cannot share the full grammar, but I've replicated the issue with the example below. This is the intended structure of my response (I know it's not ideal, but I cannot change it at this point):

Preamble with free text (except this sequence of characters: "<|start|>")
<|start|>
generate some structured response
<|end|>
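
To make the layout concrete, here is a minimal Python sketch of how a consumer might split such a response once it has been generated. The function name `split_response` is hypothetical, and the exact delimiter strings (including the newlines) are taken from the `message` rule in the grammars in this comment; this is an illustration, not code from my application.

```python
START = "<|start|>\n"   # opening delimiter, as in the `message` rule
END = "\n<|end|>"       # closing delimiter, as in the `message` rule


def split_response(text: str) -> tuple[str, str]:
    """Split a response into (preamble, structured_content).

    Raises ValueError if the delimiters are missing or out of order.
    """
    # Everything before the first START is the free-text preamble.
    preamble, sep, rest = text.partition(START)
    if not sep or not rest.endswith(END):
        raise ValueError("response does not follow the preamble/<|start|>/<|end|> layout")
    # Strip the closing delimiter to leave only the structured part.
    return preamble, rest[: -len(END)]
```

Splitting on the first occurrence of the opening delimiter only works because the grammar keeps that sequence out of the preamble.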

Issue

Take a look at the definition of preamble. Is there a better way to avoid a sequence of characters?

Slow grammar

The preamble avoids the "<|" sequence:

root ::= message
message ::= preamble "<|start|>\n" structured-content "\n<|end|>"

preamble ::= ([^<] | "<" [^|])*
structured-content ::= ...

Faster grammar

The preamble only avoids the character "<" and treats it as the signal that the <|start|> block begins:

root ::= message
message ::= preamble "<|start|>\n" structured-content "\n<|end|>"

preamble ::= [^<]* 
structured-content ::= ...
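
To compare what the two `preamble` rules accept, they can be transcribed into Python regular expressions, since the character classes map one-to-one. This is only a sketch of the language each rule describes; it says nothing about how XGrammar compiles or executes the grammar, and the helper name `accepts` is made up for illustration.

```python
import re

# Regex transcriptions of the two `preamble` rules.
SLOW_PREAMBLE = re.compile(r"(?:[^<]|<[^|])*")  # ([^<] | "<" [^|])*
FAST_PREAMBLE = re.compile(r"[^<]*")            # [^<]*


def accepts(rule: re.Pattern, text: str) -> bool:
    """Return True if the whole text is a valid preamble under the rule."""
    return rule.fullmatch(text) is not None


samples = ["plain text", "a < b", "tag <b> here", "bad <|start|>", "edge <<|start|>"]
for s in samples:
    print(f"{s!r:20} slow={accepts(SLOW_PREAMBLE, s)} fast={accepts(FAST_PREAMBLE, s)}")
```

The slow rule accepts text such as "a < b" that the fast rule rejects, which is the flexibility I was after. One edge case worth noting: because `"<" [^|]` can consume "<<", the slow rule also accepts "edge <<|start|>" as preamble, so it does not strictly exclude the "<|" sequence.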

Questions

Is there a better way to avoid a sequence of characters?
Also, do you think having too much free text penalises performance? Do you have any benchmarks on this?
- Intuitively, I think the more structured the response, the better, because it can take advantage of skipping a few forward passes (decoding several tokens at once).
