
Issues: EleutherAI/lm-evaluation-harness

reproduce llama 3 evals
#2557 opened Dec 10, 2024 by baberabb
Open 7


Issues list

Passing sample based parameters for metric (labels: feature request)
#3038 opened Jun 3, 2025 by elements72
Add Support Conditional Generation Models like Mistral3 (labels: feature request)
#3027 opened May 29, 2025 by KyleMylonakisProtopia
Issue with quantization_config argument (labels: bug)
#3026 opened May 28, 2025 by shanhx2000
Support for using a remote /tokenize API endpoint as the tokenizer (labels: feature request)
#3017 opened May 24, 2025 by furkancoskun
Couldn't find file squad-v1.1/train-v1.1.json when evaluate Qwen3-A3B with vllm pipeline (labels: asking questions)
#3015 opened May 23, 2025 by Lynnzake
hellaswag not working: "no tasks specified" and "Keyerror: 'train' (labels: asking questions)
#3010 opened May 22, 2025 by matthijsvk
zeno_visualize.py can't parse model_args (labels: bug, good first issue)
#3005 opened May 21, 2025 by login256
unitxt with local-chat-completions gets stuck forever (labels: bug)
#2986 opened May 15, 2025 by ivanbaldo
Ruler QA tasks do not work for max_seq_lengths < 4096 (labels: bug)
#2963 opened May 9, 2025 by sustcsonglin
Log truncation/max_length to logged samples (labels: feature request)
#2961 opened May 8, 2025 by freshpearYoon
Is this result reasonable, please?
#2960 opened May 7, 2025 by kuang1216
Allow tasks to register a metric
#2950 opened May 2, 2025 by cbare
Livecodebench,AIME24 datasets
#2944 opened Apr 30, 2025 by sravan500