
CRSBench - Cyber Reasoning System Benchmark Suite

CRSBench is the benchmark suite for OSS-CRS, the open-source orchestration framework for LLM-based autonomous bug-finding and bug-fixing systems (Cyber Reasoning Systems). It provides curated benchmarks and an evaluation harness for measuring any OSS-CRS-compatible CRS on vulnerability discovery and program repair.

Unlike traditional fuzzing benchmarks (e.g., FuzzBench) that only report coverage/crashes, CRSBench stores complete ground truth to track whether vulnerabilities are actually found and correctly patched.

Benchmark Statistics

| Metric | Value |
| --- | --- |
| Benchmarks | 124 (87 Delta + 37 Full) |
| Upstream projects | 82 |
| Vulnerabilities (CPVs) | 315 |
| C / C++ | 63 benchmarks, 123 vulnerabilities |
| JVM (Java) | 61 benchmarks, 192 vulnerabilities |
| Distinct CWEs | 91 (covers 21 of the 2025 CWE Top 25) |
| Vulnerabilities per harness | 1.65 average, 12 max |
| PoV variants per vulnerability | 3.89 average |

Full breakdown and regeneration steps: docs/reference/benchmark-statistics.md.
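
As a quick sanity check on the figures above, the per-mode and per-language counts should add up to the reported totals. The sketch below only re-adds numbers copied from the table; the variable and function names are illustrative and are not part of CRSBench.

```python
# Figures copied from the statistics table; names here are illustrative only.
BENCHMARKS_TOTAL = 124
VULNERABILITIES_TOTAL = 315

BY_MODE = {"Delta": 87, "Full": 37}
BY_LANGUAGE = {
    "C / C++": {"benchmarks": 63, "vulnerabilities": 123},
    "JVM (Java)": {"benchmarks": 61, "vulnerabilities": 192},
}


def totals_are_consistent() -> bool:
    """Return True when the breakdowns sum to the reported totals."""
    mode_sum = sum(BY_MODE.values())
    lang_benchmarks = sum(row["benchmarks"] for row in BY_LANGUAGE.values())
    lang_vulns = sum(row["vulnerabilities"] for row in BY_LANGUAGE.values())
    return (
        mode_sum == BENCHMARKS_TOTAL
        and lang_benchmarks == BENCHMARKS_TOTAL
        and lang_vulns == VULNERABILITIES_TOTAL
    )


if __name__ == "__main__":
    print("consistent:", totals_are_consistent())
```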

Quick Start

CRSBench is Linux-only and requires Docker. The smallest first run is a queue-backed single-host experiment against the sanity suite.

git clone https://github.com/sslab-gatech/CRSBench.git && cd CRSBench
git submodule update --init --recursive
uv sync
./scripts/setup-third-party.sh
uv run crsbench prepare
uv run crsbench prepare --coverage   # for the bundled starter CRS
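
Before running the setup commands, it can help to confirm the host meets the stated requirements (Linux, Docker) and has the tools the commands invoke (`git`, `uv`). The following is a minimal preflight sketch using only the Python standard library; `missing_prerequisites` is a hypothetical helper, not something CRSBench ships.

```python
import platform
import shutil
from typing import Callable, List, Optional

# Tools invoked by the quick-start commands, plus Docker.
REQUIRED_TOOLS = ("git", "uv", "docker")


def missing_prerequisites(
    system: Optional[str] = None,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Return a list of human-readable problems; empty means the host looks ready."""
    problems = []
    system = system or platform.system()
    if system != "Linux":
        problems.append(f"unsupported OS: {system} (CRSBench is Linux-only)")
    for tool in REQUIRED_TOOLS:
        # shutil.which returns None when the executable is not on PATH.
        if which(tool) is None:
            problems.append(f"missing tool on PATH: {tool}")
    return problems


if __name__ == "__main__":
    for problem in missing_prerequisites():
        print(problem)
```

This only checks that the executables are on PATH; it does not verify that the Docker daemon is running or that the current user may access it.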

Configure environment variables. CRSBench auto-loads .env from the repo root; edit it for distributed Redis, LiteLLM credentials, etc. CRSBench currently requires you to bring your own LiteLLM endpoint, either via the local helper at scripts/litellm-helper.py or an external proxy. Refer to the LiteLLM docs for configuring providers, routing, and keys. See docs/getting-started/configuration.md for the CRSBench-side wiring.

cp .env.example .env

Request access to the HuggingFace dataset (gated). Open https://huggingface.co/datasets/sslab-gatech/crsbench-dataset and accept the Data Use Agreement - access is granted after manual approval. Once approved, authenticate (either set HF_TOKEN=hf_... in .env, or run hf auth login) and download the sanity suite:

uv run hf auth login   # or set HF_TOKEN in .env
uv run crsbench download --benchmark-suite smoke/sanity
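
If the download fails with an authentication error, one thing to check is whether `HF_TOKEN` is visible, either in the process environment or in `.env`. The sketch below is a simplified stand-in for that check: `read_env_value` and `hf_token_configured` are hypothetical helpers, the parser handles only plain `KEY=value` lines, and it does not reflect how CRSBench itself loads `.env`. It also cannot see credentials stored by `hf auth login`.

```python
import os
from typing import Mapping, Optional


def read_env_value(env_text: str, key: str) -> Optional[str]:
    """Look up KEY in dotenv-style text (plain KEY=value lines only)."""
    for raw in env_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines without an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            # Drop optional surrounding quotes; treat an empty value as unset.
            return value.strip().strip("'\"") or None
    return None


def hf_token_configured(
    env_text: str, environ: Optional[Mapping[str, str]] = None
) -> bool:
    """True if HF_TOKEN is set in the environment or in the given .env text."""
    environ = os.environ if environ is None else environ
    return bool(environ.get("HF_TOKEN") or read_env_value(env_text, "HF_TOKEN"))


if __name__ == "__main__":
    text = open(".env").read() if os.path.exists(".env") else ""
    print("HF_TOKEN configured:", hf_token_configured(text))
```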

Run the bundled quick-start config experiment-configs/smoke-testing/first-run.yaml. It targets the smoke/sanity suite (2 benchmarks, 3 harnesses) with the bundled atlantis-multilang-given_fuzzer CRS, runs 3 trial jobs in parallel, and does not need external LLM credentials (runtime.litellm.skip: true):

uv run python scripts/valkey-helper.py start
uv run crsbench worker --experiment-config experiment-configs/smoke-testing/first-run.yaml   # terminal 1
uv run crsbench run    --experiment-config experiment-configs/smoke-testing/first-run.yaml   # terminal 2
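
The worker and the run command must point at the same experiment config. One way to avoid a mismatch is to generate all three command lines from a single path, as in this sketch. The commands are the ones listed above; `quick_start_commands` is a hypothetical helper that only builds and prints them, leaving the choice of terminals or process manager to you.

```python
import shlex
from typing import Dict, List

CONFIG = "experiment-configs/smoke-testing/first-run.yaml"


def quick_start_commands(config: str = CONFIG) -> Dict[str, List[str]]:
    """Build the quick-start commands so worker and run share one config path."""
    return {
        "valkey": ["uv", "run", "python", "scripts/valkey-helper.py", "start"],
        "worker": ["uv", "run", "crsbench", "worker", "--experiment-config", config],
        "run": ["uv", "run", "crsbench", "run", "--experiment-config", config],
    }


if __name__ == "__main__":
    # Print shell-ready lines: start valkey first, then worker and run
    # in separate terminals.
    for name, argv in quick_start_commands().items():
        print(f"{name}: {shlex.join(argv)}")
```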

Documentation

Start with Getting Started:

  1. Install
  2. Configuration
  3. First Experiment
  4. Experiments - bug-finding, bug-fixing, discovery, replay, merge
  5. Deployment - single-machine, multi-machine, GCE cloud

Architecture

CRSBench/
├── benchmarks/              # Benchmark projects (RFC format)
├── crsbench/                # Main Python package
│   ├── builder/             #   OSS-Fuzz variant building
│   ├── evaluation/          #   CRS execution & verification
│   ├── distributed/         #   Multi-machine execution (Redis/RQ)
│   ├── benchmark/           #   Packaging, canary, seed tools
│   ├── dataset/             #   HuggingFace upload/download
│   ├── validation/          #   Format validation & schemas
│   ├── reporting/           #   Reports & dashboard
│   └── statistics/          #   Benchmark statistics
├── oss-crs/                 # OSS-CRS runtime and registry (submodule)
├── third_party/oss-fuzz/    # Managed OSS-Fuzz checkout (sparse)
└── docs/                    # Documentation hub

License

CRSBench is licensed under MIT. Bundled upstream source code retains its original license - see LICENSE-THIRD-PARTY.md.

Related Projects

  • FuzzBench - fuzzer evaluation platform
  • OSS-Fuzz - continuous fuzzing for open source
  • AIxCC - AI Cyber Challenge
