We haven't tried GSM8k, so this is not known to us. I wonder if the "math" nature of GSM8k makes it quite different from the QA-style or inference-style datasets we tried. Can you paste the LASER hyperparameters that you have tried for your LLM below? Also, what is the number of layers in the LLM? I saw you used 51, which suggests this model is much deeper than Llama 2 (32 layers), GPT-J (28 layers), and RoBERTa (12 layers). I do want to try GSM8k with LASER + Llama and a very exhaustive search of hyperparameters. Currently, I am finishing experiments on Phi-1.5, and then I can look into this and the Mistral LLM request.
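
For reference, a LASER intervention replaces a single weight matrix with its truncated-SVD (low-rank) approximation, so the hyperparameters being asked about are the layer index, the matrix type (e.g. an MLP or attention projection), and the fraction of the rank to keep. Below is a minimal PyTorch sketch of that reduction; the module path and layer index are illustrative assumptions, not from this thread:

```python
import torch

def laser_reduce(weight: torch.Tensor, rho: float) -> torch.Tensor:
    """Return the rank-reduced SVD approximation of `weight`.

    `rho` is the fraction of the maximum rank to keep, e.g. rho=0.01
    keeps only the top 1% of singular components.
    """
    U, S, Vh = torch.linalg.svd(weight, full_matrices=False)
    k = max(1, int(rho * S.numel()))  # number of singular components kept
    return U[:, :k] @ torch.diag(S[:k]) @ Vh[:k, :]

# Hypothetical usage on a Llama-style model: reduce the MLP
# down-projection of one layer (path and index are examples only).
# proj = model.model.layers[28].mlp.down_proj
# proj.weight.data = laser_reduce(proj.weight.data, rho=0.01)
```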
Hi!
Right now the HF leaderboard has multiple models with LASER interventions, and all of them seem to show a drop in GSM8K results relative to their base models.
Is this a known behavior? The paper talks about enhancing reasoning abilities, and GSM8K should be more closely related to reasoning than other benchmarks.
I thought it would be an interesting subject to discuss.