Benchmark Results #31
codelion started this conversation in Show and tell
-
ReRead (re2) results on LiveBench.
-
Scatter plot showing how test-time compute scales across different approaches with gpt-4o-mini on AIME 2024. You can see the original illustration here.
-
Results on the FRAMES benchmark with the memory plugin.
-
Results on the AIME 2024 benchmark with optillm (eval script), reported as pass@1.
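For readers unfamiliar with the metric, here is a minimal sketch of how pass@1 is usually computed: the fraction of problems whose first sampled answer is correct. The function name `pass_at_1` and the sample outcomes are hypothetical, made up for illustration; this is not the eval script used for the results above and contains no real benchmark data.

```python
def pass_at_1(first_attempt_correct):
    """Return the fraction of problems solved on the first attempt.

    `first_attempt_correct` is an iterable of booleans, one per problem,
    saying whether the model's first answer matched the reference.
    """
    results = list(first_attempt_correct)
    if not results:
        raise ValueError("need at least one problem")
    return sum(1 for ok in results if ok) / len(results)


# Hypothetical outcomes for four problems (not real benchmark data).
example = [True, False, False, True]
print(f"pass@1 = {pass_at_1(example):.2f}")
```

With a single sample per problem, pass@1 is the same as plain accuracy, so scores reported this way can be compared directly across approaches.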
-
Entropy Decoding and CoT Decoding on GSM8k with Qwen2.5-0.5B-Instruct Model