[Bug] Infinite loop while using xgrammar with specific grammar #127
Comments
Hi @roG0d, thanks for the bug report, and sorry for the late response; I just got back from travel. After digging into your script, I think the main cause is that the grammar has too much nondeterminism (that is, for the same input string, there can be multiple interpretations according to the grammar). Too much nondeterminism means multiple (possibly exponentially many) parsing stacks, which increases overhead at runtime. (Check out the "Why is it hard to accelerate general CFGs?" section in our blog.) Specifically in your grammar,
Currently, you can try rewriting your grammar to reduce such nondeterminism. We are also looking into ways to alleviate this issue through automatic optimization. Please stay tuned!
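To make the point about nondeterminism concrete, here is a minimal sketch that is independent of xgrammar. It uses a toy ambiguous grammar, `S ::= S S | "a"`, chosen purely for illustration (it is not the grammar from this issue), and counts how many distinct parse trees exist for a string of `n` letters. The function name `count_parses` is hypothetical.

```python
from functools import lru_cache


def count_parses(n: int) -> int:
    """Count distinct parse trees of "a" * n under the toy grammar
    S ::= S S | "a"   (illustrative only, not the grammar from the issue)."""
    if n < 1:
        raise ValueError("n must be at least 1")

    @lru_cache(maxsize=None)
    def count(length: int) -> int:
        if length == 1:
            return 1  # only S ::= "a" matches a single letter
        # S ::= S S: every split point gives an independent interpretation
        return sum(count(k) * count(length - k) for k in range(1, length))

    return count(n)


if __name__ == "__main__":
    for n in (1, 2, 4, 8, 16):
        print(f"n={n:2d}  parses={count_parses(n)}")
```

The count grows as the Catalan numbers (for example, 429 parses at `n=8`), whereas the equivalent unambiguous grammar `S ::= "a" S | "a"` has exactly one parse for every `n`. A parser that has to keep all live interpretations around pays for that growth, which is why rewriting toward the unambiguous form tends to help.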
Thx @Ubospica, sorry for the late response too!
This schema also causes an infinite loop when running the benchmark. Its index in the benchmark dataset is 26:
{"title": "Complex Object", "type": "object", "properties": {"name": {"type": "string"}, "age": {"type": "integer", "minimum": 0}, "address": {"type": "object", "properties": {"street": {"type": "string"}, "city": {"type": "string"}, "state": {"type": "string"}, "postalCode": {"type": "string", "pattern": "\\d{5}"}}, "required": ["street", "city", "state", "postalCode"]}, "hobbies": {"type": "array", "items": {"type": "string"}}}, "required": ["name", "age"]}
Backend: xgrammar, Warmup Iter: 0: 0%| | 0/2 [00:00<?, ?it/s]
Backend: xgrammar, Data Point: 36: 36%|██████████▊ | 36/100 [00:05<00:11, 5.38it/s]
Backend: xgrammar, Data Point: 39: 39%|███████████▋ | 39/100 [00:05<00:08, 7.43it/s]
Backend: xgrammar, Data Point: 57: 57%|█████████████████ | 57/100 [00:08<00:05, 8.19it/s]
Backend: xgrammar, Data Point: 74: 74%|██████████████████████▏ | 74/100 [00:10<00:03, 6.59it/s]
Backend: xgrammar, Data Point: 99: 100%|█████████████████████████████| 100/100 [00:14<00:00, 6.99it/s]
Backend: xgrammar, Iter: 0: 50%|████████████████████ | 1/2 [00:14<00:14, 14.30s/it]
Backend: xgrammar, Data Point: 26: 25%|███████▌ | 25/100 [00:01<00:04, 15.04it/s]
Hi @Dimitri-WEI-Lingfeng, thanks for raising this! We will try to fix it soon.
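For anyone trying to pin down where a run like the one above stalls, one option is Python's standard `faulthandler` module, which can dump the Python traceback of every thread after a timeout without killing the process. The sketch below is only a generic wrapper: the helper name `call_with_hang_dump` and the default timeout are assumptions, and it shows Python frames only, so time spent inside native (C++) code appears as the Python call that entered it.

```python
import faulthandler


def call_with_hang_dump(fn, *args, timeout_s: float = 30.0, **kwargs):
    """Call fn(*args, **kwargs); if it is still running after timeout_s
    seconds, dump all Python thread tracebacks to stderr and keep going.

    Hypothetical helper for diagnosing hangs; it does not interrupt fn.
    """
    # Arm a watchdog that writes tracebacks to stderr once the timeout expires.
    faulthandler.dump_traceback_later(timeout_s, exit=False)
    try:
        return fn(*args, **kwargs)
    finally:
        # Disarm the watchdog if fn returned (or raised) in time.
        faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in workload; replace with the call that appears to hang.
    print(call_with_hang_dump(sum, range(10), timeout_s=5.0))
```

Wrapping the per-data-point call in the benchmark loop this way should show which Python-level call is active when data point 26 stops making progress.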
Env
Issue
Hitting an infinite loop when using the specific grammar provided in the code. (WIP: trying to debug and understand the error.)
Context
We (@antferdom) are researching different grammar-based decoding techniques for code generation. In this specific setting we're trying to generate a DSL called UVL. A recurring problem is that it is indentation-sensitive and therefore context-dependent. If this can be fixed, I believe it could unlock Python codegen as well.
Errors
When generation is left running, no error is raised; execution simply never terminates. When the process is stopped with SIGINT (CTRL + C), the error output is:
Code snippet for reproducibility