-
Notifications
You must be signed in to change notification settings - Fork 219
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Leaderboard README improvements #217
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi, thanks for the fixes!
@@ -111,7 +113,7 @@ for lang in "${langs[@]}"; do | |||
task=multiple-$lang | |||
fi | |||
|
|||
gen_suffix=generations_$task\_$model.json | |||
gen_suffix=generations_$task\_$model\_$task.json |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need to have the same path format as here
bigcode-evaluation-harness/main.py
Line 387 in 094c7cc
save_generations_path = f"{os.path.splitext(args.save_generations_path)[0]}_{task}.json" |
--load-generations_path
which can be anything. So let's maybe keep the original path to not have task
twice?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
because during evaluation we call
--load-generations_path
which can be anything.
Right however, current README steps for Evaluation passes $gen_suffix
variable in --load-generations_path
argument
bigcode-evaluation-harness/leaderboard/README.md
Lines 114 to 121 in 642c57f
gen_suffix=generations_$task\_$model.json | |
metric_suffix=metrics_$task\_$model.json | |
echo "Evaluation of $model on $task benchmark, data in $generations_path/$gen_suffix" | |
sudo docker run -v $(pwd)/$generations_path/$gen_suffix:/app/$gen_suffix:ro -v $(pwd)/$metrics_path:/app/$metrics_path -it evaluation-harness-multiple python3 main.py \ | |
--model $org/$model \ | |
--tasks $task \ | |
--load_generations_path /app/$gen_suffix \ |
Since $gen_suffix
is missing the _task
suffix, Running evaluations results in the following error
After adding _task
suffix in $gen_suffix
, evaluations run successfully.
I was able to run evaluations for the Artigenz-Coder-DS-6.7B here after these changes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it shouldn't throw an error if you used save_generations_path=generations_$task\_$model.json
in the generations
While trying to run the steps given in leaderboard README, found following improvements
1-Setup
model
variable to be initialised before creating generations and metrics directories2-Generations
save_generations
flag is missing in while running generationsmax_length
to be 1024 for some tasks, based on your tokeniser (Fix for max_length_generation parameter #207)3-Evaluations
Generations file is saved in
save_generations_path_$task
, while running evaluations it should load from this path(_$task is missing in the path in README).bigcode-evaluation-harness/main.py
Line 387 in 094c7cc