Request gold compile.sh scripts #23

@LC-John

Description

Hi there.

I really like this benchmark. I’m building a Modal-based pipeline to evaluate ProgramBench with higher parallelism, and I’d like to validate my implementation end-to-end without running a full agent each time.

To do that, I’m looking for a way to mock the agent generation phase with known-good submissions. The repo part is simple, since the commit hash is known. The compile script part is trickier. Would it be possible to release the compile.sh scripts used to build the gold/reference executables, or any equivalent reference build scripts?

I understand if these cannot be shared due to benchmark integrity concerns. In that case, is there a recommended way to create a small set of known-good mock submissions for validating the evaluation pipeline?
