Leaderboard README improvements #217
Conversation
Suggested change:
- gen_suffix=generations_$task\_$model.json
+ gen_suffix=generations_$task\_$model\_$task.json
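For context, the two `gen_suffix` forms expand to different filenames. A minimal Python sketch (the task and model values are illustrative, not from the PR):

```python
# Illustrative values; any task/model run through the harness would do.
task = "humaneval"
model = "Artigenz-Coder-DS-6.7B"

# Original gen_suffix from the README:
original = f"generations_{task}_{model}.json"
# Proposed gen_suffix, with the task name repeated at the end:
proposed = f"generations_{task}_{model}_{task}.json"

print(original)  # generations_humaneval_Artigenz-Coder-DS-6.7B.json
print(proposed)  # generations_humaneval_Artigenz-Coder-DS-6.7B_humaneval.json
```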
we don't need to have the same path format as here
bigcode-evaluation-harness/main.py
Line 387 in 094c7cc
because during evaluation we call --load_generations_path, which can be anything. So let's maybe keep the original path, to not have the task twice?
Right, however the current README steps for Evaluation pass the $gen_suffix variable to the --load_generations_path argument:
bigcode-evaluation-harness/leaderboard/README.md
Lines 114 to 121 in 642c57f
Since $gen_suffix is missing the _$task suffix, running evaluations results in an error.
After adding the _$task suffix to $gen_suffix, evaluations run successfully.
I was able to run evaluations for Artigenz-Coder-DS-6.7B here after these changes.
It shouldn't throw an error if you used save_generations_path=generations_$task\_$model.json in the generations step.
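The disagreement above comes down to where the extra _$task comes from. A sketch of the suffixing behaviour being discussed (the exact code in main.py may differ; `append_task_suffix` is a hypothetical helper, not a harness function):

```python
import os

def append_task_suffix(save_generations_path: str, task: str) -> str:
    """Sketch: insert the task name before the file extension,
    mirroring how the harness names the per-task generations file."""
    base, ext = os.path.splitext(save_generations_path)
    return f"{base}_{task}{ext}"

# If the path passed via save_generations_path already contains the
# task once, the file on disk ends up with the task name twice:
saved = append_task_suffix("generations_humaneval_mymodel.json", "humaneval")
print(saved)  # generations_humaneval_mymodel_humaneval.json
```

Under this assumption, both comments are consistent: the saved file carries a doubled task name, so a load path built for evaluation has to include that trailing _$task.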
While trying to run the steps given in the leaderboard README, I found the following improvements:

1 - Setup
- The model variable needs to be initialised before creating the generations and metrics directories.

2 - Generations
- The save_generations flag is missing while running generations.
- max_length should be 1024 for some tasks, depending on your tokeniser (Fix for max_length_generation parameter #207).

3 - Evaluations
- The generations file is saved at save_generations_path with _$task appended, so evaluations should load from that path (_$task is missing in the path in the README).
bigcode-evaluation-harness/main.py
Line 387 in 094c7cc
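Point 1 above can be sketched as follows: the directory names depend on the model variable, so it must be set first (the directory naming scheme here is illustrative, not copied from the README):

```python
import os
import tempfile

model = "mymodel"  # must be initialised before the directories are created

# Create per-model generations and metrics directories (under a temp
# root here, so the sketch has no side effects on the working tree).
root = tempfile.mkdtemp()
for name in (f"generations_{model}", f"metrics_{model}"):
    os.makedirs(os.path.join(root, name), exist_ok=True)

print(sorted(os.listdir(root)))  # ['generations_mymodel', 'metrics_mymodel']
```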