Performance on LiveCodeBench #13

Hi! I ran the LiveCodeBench evaluation scripts in `instruct/code_eval/lcb/` and got unexpected results. The score of each run is shown in the comment above its command.

```bash
# Result: 0.37888198757763975
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
    --n 1 \
    --difficulty easy \
    --model Dream-org/Dream-v0-Instruct-7B \
    --use_instruct_prompt \
    --diffusion_steps 512 \
    --max_new_tokens 1024 \
    --evaluate \
    --diffusion_remask_alg maskgit_plus \
    --temperature 0.1 \
    --use_cache


# Result: 0.3695652173913043
CUDA_VISIBLE_DEVICES=0 python run_lcb.py \
    --n 1 \
    --difficulty easy \
    --model Dream-org/Dream-Coder-v0-Instruct-7B \
    --use_instruct_prompt \
    --diffusion_steps 512 \
    --max_new_tokens 1024 \
    --evaluate \
    --diffusion_remask_alg maskgit_plus \
    --temperature 0.1 \
    --use_cache
```

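The general instruct model (`Dream-v0-Instruct-7B`) and the code model (`Dream-Coder-v0-Instruct-7B`) score almost the same on the easy split (about 0.379 vs. 0.370). Is this expected behavior, or is there an issue with the evaluation scripts or the released checkpoint?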
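To put the gap in terms of problems rather than decimals, here is a small sanity check. It is only a sketch and rests on two assumptions of mine: that with `--n 1` the reported score is simply solved problems divided by total problems, and that both runs were scored on the same problem set. The `max_problems` bound and the helper name `as_fraction` are made up for illustration.

```python
from fractions import Fraction
from math import lcm  # Python 3.9+

# Scores copied from the two runs above.
scores = {
    "Dream-v0-Instruct-7B": 0.37888198757763975,
    "Dream-Coder-v0-Instruct-7B": 0.3695652173913043,
}

def as_fraction(score: float, max_problems: int = 2000) -> Fraction:
    """Closest fraction to `score` with denominator <= max_problems.

    max_problems is an assumed upper bound on the size of the split.
    """
    return Fraction(score).limit_denominator(max_problems)

fracs = {name: as_fraction(s) for name, s in scores.items()}

# Smallest problem count that is consistent with both scores at once.
# The real split could be a multiple of this number.
common = lcm(*(f.denominator for f in fracs.values()))

# Solved-problem counts implied by that smallest consistent total.
solved = {
    name: f.numerator * (common // f.denominator)
    for name, f in fracs.items()
}

for name, count in solved.items():
    print(f"{name}: {count}/{common}")
```

Under those assumptions, the smallest total consistent with both scores is 322 problems, which gives 122 solved for `Dream-v0-Instruct-7B` and 119 for `Dream-Coder-v0-Instruct-7B`. So the code model appears to solve about three fewer easy problems than the general model, which is why I suspect something is off.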
