Hi! I ran the LiveCodeBench evaluation scripts in instruct/code_eval/lcb/ and got unexpected results.
# Result: 0.37888198757763975
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
--n 1 \
--difficulty easy \
--model Dream-org/Dream-v0-Instruct-7B \
--use_instruct_prompt \
--diffusion_steps 512 \
--max_new_tokens 1024 \
--evaluate \
--diffusion_remask_alg maskgit_plus \
--temperature 0.1 \
--use_cache
# Result: 0.3695652173913043
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
--n 1 \
--difficulty easy \
--model Dream-org/Dream-Coder-v0-Instruct-7B \
--use_instruct_prompt \
--diffusion_steps 512 \
--max_new_tokens 1024 \
--evaluate \
--diffusion_remask_alg maskgit_plus \
--temperature 0.1 \
--use_cache
The general model (Dream-v0-Instruct-7B) and the code model (Dream-Coder-v0-Instruct-7B) score nearly the same here, about 0.379 vs. 0.370. Is this expected behavior, or is there an issue with the evaluation scripts or the released checkpoint?