Uses an LLM to generate C++ cache replacement policies, compiles them, tests them in the ChampSim CRC-2 simulator, and feeds the results back for the next iteration.
- When generated code fails to compile, the compiler errors are sent back to the LLM for automatic fixing (up to 3 attempts)
- Each generation pulls the best-performing policies from the database and includes them in the prompt (RAG)
- All policies, simulation results, and compilation attempts are logged in SQLite
- Python 3.8+
- Linux or WSL (the CRC-2
.alibraries are Linux binaries; on Windows, compilation and simulation run through WSL automatically) - g++
-
Create a virtual environment and install dependencies:
python -m venv venv source venv/Scripts/activate # Windows # source venv/bin/activate # Linux/Mac pip install -r requirements.txt
-
Get a GROQ API key from console.groq.com
-
Create a
.envfile:cp .env.example .env # Edit .env and add your GROQ_API_KEY -
Download the trace file (~500 MB) from Dropbox and place it at
champsim_crc2/trace/astar_313B.trace.gz
Seed the database with 36 reference CRC-2 policies:
python -m cache_policy_gen.main --seedRun the generation loop (default 10 iterations):
python -m cache_policy_gen.main --iterations 3Seed and run together:
python -m cache_policy_gen.main --seed --iterations 5- Queries the database for the best policies so far (RAG)
- Builds a prompt with the CRC-2 template and those top policies
- Sends the prompt to the LLM (Groq)
- Parses the C++ code from the response
- Compiles with g++. If compilation fails, the errors go back to the LLM for fixing (up to 3 tries)
- Runs the compiled binary through the ChampSim simulator
- Stores the results and uses them as feedback for the next iteration
Edit config.yaml to change:
- LLM model and temperature
- Compiler flags
- Simulator parameters (warmup/simulation instructions, timeout)
- Number of iterations and fix attempts
The generation prompt includes the CRC-2 interface spec along with the top policies from the database. Here is an abbreviated version:
System: You are an expert in CPU cache replacement policies. You write correct,
compilable C++ code for the ChampSim CRC-2 simulator.
User:
Create a novel, high-performance LLC replacement policy for the CRC-2 simulator.
The CRC-2 interface requires implementing 5 functions:
- InitReplacementState()
- GetVictimInSet(cpu, set, current_set, PC, paddr, type)
- UpdateReplacementState(cpu, set, way, paddr, PC, victim_addr, type, hit)
- PrintStats_Heartbeat()
- PrintStats()
Cache config: 1 core, 2048 sets, 16 ways per set.
Top Performing Policies So Far:
#1: ship++ (Hit Rate: 68.42%) [full code included]
#2: hawkeye_final (Hit Rate: 65.10%) [full code included]
#3: srrip (Hit Rate: 61.87%) [full code included]
Create a BETTER policy using advanced techniques (set dueling,
PC-based prediction, frequency tracking, adaptive insertion, etc.)
Return COMPLETE, compilable C++ code.
Sample run with 10 iterations on the astar_313B trace (see sample_run.txt for full output):
| Policy | Hit Rate | IPC |
|---|---|---|
| adaptive_frequency_and_pc-based_replacement (generated) | 47.69% | 0.1056 |
| shark (generated) | 47.69% | 0.1056 |
| srrip (reference baseline) | 45.86% | 0.1068 |
| lru (reference baseline) | 45.45% | 0.1040 |
The generated policies beat the LRU and SRRIP baselines by about 2 percentage points in hit rate. Iteration 8 is worth noting: the first compile failed with 2 errors, the LLM fixed them, and the second attempt compiled and ran successfully.
This project was inspired by the problem statement from CSC: GenAI for Systems, taught at North Carolina State University by Dr. Samira Mirbagher Ajorpaz.