English | 简体中文
Stop hand-tuning
GEMINI.md. Let Gemini tune it — with holdout tests and hard gates, so regressions never ship.
If you use the Gemini CLI, you already have a GEMINI.md somewhere. If it's been growing for a while, chances are some of it has gone vague, duplicated, or stale. gemini-evolve runs a small optimization loop that mutates your instructions, grades each variant by actually running Gemini CLI on real tasks, and only writes back when a holdout set (examples it never trained on) says the new version is genuinely better.
Your baseline ~/.gemini/GEMINI.md:
# My Gemini Instructions
- try to be concise
- generally avoid long explanations
- if possible, use bullet points
- prefer to keep answers shortAfter one gemini-evolve evolve --apply run, the sharpen_constraints + condense mutations typically produce something like:
# My Gemini Instructions
## Response Style
- Answer in <=3 sentences by default; use bullets for lists of 3+ items.
- No preamble ("Sure!", "Great question!"). Start with the answer.
- Expand only when the user asks "explain" / "why" / "walk me through".Holdout score went up, size cap not exceeded, so --apply wrote the new file and kept GEMINI.md.<timestamp>.bak as your escape hatch. If the holdout score had not improved by at least 2%, the loop would refuse to apply — no sneaky regressions.
- You've edited
GEMINI.mdby hand one too many times. - You suspect it's bloated but don't want to break what works.
- You want measured improvements, not vibes.
| Requirement | How to check | Notes |
|---|---|---|
| Python 3.11+ | python3 --version |
3.12 tested too |
gemini CLI installed + logged in |
gemini --version then gemini -p "hi" -o json |
Install: google-gemini/gemini-cli. gemini-evolve just shells out to gemini, so whatever auth your CLI uses is fine. See the 30-second CLI check right below for verification order. |
| macOS or Linux | — | trigger cron-* (launchd) is macOS only. trigger watch and trigger hook-* work on both. Windows isn't tested. |
A GEMINI.md to evolve |
ls ~/.gemini/GEMINI.md |
If you don't have one yet, the Quickstart creates a stub for you. |
No Gemini SDK or API key handling lives inside this repo — every model call shells out to your local gemini CLI (gemini -p ... -o json). Same auth, same tools, same skills, same MCP servers as your real Gemini sessions.
If you've never used the Gemini CLI before, run these in order:
gemini --version # 1. must print a version, NOT "command not found"
# 2. authenticate (pick ONE — there is no `gemini auth` subcommand):
# a) first-time / Google account: just run `gemini` with no args, it opens
# an OAuth browser flow and drops creds at ~/.gemini/oauth_creds.json
# b) non-interactive / CI / AI Studio key: export GEMINI_API_KEY=<your-key>
gemini -p "say hi in one word" -o json # 3. verify you get a JSON object back
# NOTE: this is a real call (~5s, tiny quota hit)If the last line returns JSON, you're done — gemini-evolve will use the same auth automatically.
The full copy-paste-everything session is in docs/QUICKSTART.md. Short version — 5 steps, each with a "success looks like" line.
Tip: run
source .venv/bin/activateonce after step 1 to drop the./.venv/bin/prefix everywhere. (The cwd assumption is also repeated inline in step 1.)
# 1. install — run from any dir; the `cd` on line 3 leaves you inside gemini-evolve/
# (all subsequent steps assume that cwd). 3.11+ required.
git clone https://github.com/LichAmnesia/gemini-evolve.git
cd gemini-evolve
python3 -m venv .venv
./.venv/bin/pip install -e .
# success: pip finishes with "Successfully installed gemini-evolve-0.1.0 ..."# 2. verify both CLIs are reachable
./.venv/bin/gemini-evolve --version # success: "gemini-evolve, version 0.1.0"
gemini --version # success: a version string (NOT "command not found")# 3. make sure a target exists
mkdir -p ~/.gemini
test -f ~/.gemini/GEMINI.md || printf '# Gemini Instructions\n- be concise\n- use bullets\n' > ~/.gemini/GEMINI.md
# success: `cat ~/.gemini/GEMINI.md` prints something# 4. dry-run — validates gates, writes nothing to your target file
./.venv/bin/gemini-evolve evolve ~/.gemini/GEMINI.md --dry-run
# success: lines like "PASS non_empty", "PASS size", and "Dry run — skipping optimization."
# NOTE: --dry-run still builds the eval dataset first. With the default
# --eval-source synthetic this makes ONE real gemini call (~5-10s, small quota hit).
# For a truly zero-cost dry-run, use a tiny golden JSONL instead, e.g.:
# echo '{"task_input":"hi","expected_behavior":"respond briefly"}' > /tmp/stub.jsonl
# ./.venv/bin/gemini-evolve evolve ~/.gemini/GEMINI.md --dry-run \
# --eval-source golden --eval-dataset /tmp/stub.jsonl# 5. do a tiny real run — takes a few minutes
# default engine is GEPA; add --engine ga for the lighter tournament GA
# rough cost on defaults (dataset_size=10, GEPA auto=light):
# ~20 gemini calls total, 3-5 minutes wall clock, negligible spend
# (that's 1 synthetic-dataset build + ~10 GEPA metric calls + baseline/holdout scoring).
# medium scales ~2x, heavy ~5x. Interrupt with Ctrl-C any time — nothing is written
# until step 6 (--apply).
./.venv/bin/gemini-evolve evolve ~/.gemini/GEMINI.md --gepa-budget light
# success: GEPA iterations print, final "Evolution Results" table renders,
# and a new folder appears at ./output/<name>/<timestamp>/When the results table prints Evolution improved ... by +X%, open output/<target>/<timestamp>/evolved.md and eyeball it. Like what you see? Run once more with --apply:
./.venv/bin/gemini-evolve evolve ~/.gemini/GEMINI.md --applyOn success, your original is backed up as GEMINI.md.<UTC timestamp>.bak right next to the new file. Rollback = cp the backup back.
New here? docs/QUICKSTART.md has a single paste-and-run block, docs/FAQ.md covers the first 10 things that will go wrong, and docs/EXAMPLES.md has more before/after diffs.
"Why not just edit GEMINI.md by hand?"
You can, and you should, for stuff you know is wrong. gemini-evolve targets the stuff you don't know — like which phrasing of a rule actually changes Gemini's behavior. It A/B-tests variants against an eval set, so you get evidence instead of a guess.
"Why not just ask Gemini: 'fix my GEMINI.md'?"
That gives you one unvalidated rewrite. gemini-evolve generates several variants via distinct mutation strategies (clarity, condense, sharpen_constraints, ...), scores each by actually running Gemini CLI on eval tasks, runs a holdout (never-seen) comparison against the baseline, and hard-gates the apply on size, growth, and min-improvement. Overfitting and prompt-bloat get caught before they hit your file.
"How is this different from DSPy / GEPA / promptfoo?"
- DSPy / GEPA are general prompt-optimization frameworks. We use GEPA as the default engine — gemini-evolve is a thin, Gemini-CLI-native front-end for them, not a replacement.
- promptfoo is a prompt-eval harness. It tells you which of your prompts wins.
gemini-evolveactively generates candidates and closes the loop back into yourGEMINI.md. - Unique bit: everything runs through your local
geminiCLI — same auth, same tools, same skills, same MCP servers as your real sessions. No separate API key to manage.
Glossary (first-time terms):
- GEPA = Generative Evolutionary Prompt Adaptation — a reflection-driven prompt optimizer. We use it as the default engine. Opt out with
--engine gafor the lighter built-in tournament GA. - DSPy = Stanford's framework for programming with LMs; GEPA lives inside it. You don't write DSPy code to use gemini-evolve.
- Holdout = a slice of eval examples the loop never trains on, used only for the final before/after score. Catches overfitting.
- Tournament selection = each generation, score N variants, keep the best; that winner is the parent of the next generation.
- Hard gate = an apply-blocking rule (non-empty, size cap, growth cap, min improvement %). Fail any one → no write-back.
| Target | Path pattern | Scope | Size cap |
|---|---|---|---|
| Global instructions | ~/.gemini/GEMINI.md |
All Gemini CLI sessions | 15KB |
| Project instructions | <project>/.gemini/GEMINI.md |
One project | 15KB |
| Commands | ~/.gemini/commands/*.toml |
Custom slash commands | 5KB |
| Skills | ~/.gemini/skills/**/*.md |
Skill definitions | 15KB |
Project discovery scans ~/ws, ~/projects, ~/code by default — override with GEMINI_EVOLVE_PROJECT_PATHS=/path/a:/path/b.
More concrete before/after examples — rewriting custom commands/*.toml, evaluating skills/**/*.md, and mining real sessions into eval data — each has its own worked example in docs/EXAMPLES.md.
target file
-> build eval dataset (synthetic | session-mined | golden JSONL)
-> validate baseline gates (non-empty, size)
-> for each generation:
mutate N variants in parallel via gemini CLI
evaluate each variant in plan mode
tournament-select the best; optional crossover
-> holdout compare evolved vs baseline (never-seen examples)
-> save output/<target>/<timestamp>/{baseline.md, evolved.md, metrics.json}
-> optionally --apply and keep a .bak
Apply hard gates (all must pass, or no write-back):
- Content is non-empty.
- Size stays under the per-target cap (above).
- Growth stays within
max_growth_pct(default 20%). - Holdout improvement ≥
min_improvement_pct(default 2%). - Evolved content actually differs from baseline.
Real entry points: gemini_evolve/cli.py, evolve.py, gepa_evolve.py, mutator.py, fitness.py.
| Source | Flag | Data | When to use |
|---|---|---|---|
| Synthetic | --eval-source synthetic (default) |
Gemini generates tasks from your current instructions | First run; no eval data yet |
| Session | --eval-source session |
Real user messages mined from ~/.gemini/tmp/*/chats/session-*.json |
You've used Gemini CLI a while; want your actual workflows to drive optimization |
| Golden | --eval-source golden --eval-dataset data.jsonl |
Your own hand-curated JSONL | You know exactly what "good" looks like |
Session mining skips messages that look like secrets or credentials.
# watch sessions live; evolve when a session goes quiet
./.venv/bin/gemini-evolve trigger watch --apply
# macOS launchd (runs every N hours)
./.venv/bin/gemini-evolve trigger cron-install --interval 12 --apply
./.venv/bin/gemini-evolve trigger cron-status
./.venv/bin/gemini-evolve trigger cron-remove
# git post-commit hook — only fires on commits touching GEMINI.md / .gemini/
./.venv/bin/gemini-evolve trigger hook-install .
./.venv/bin/gemini-evolve trigger hook-remove .Heads up: cron-* uses launchd so it's macOS only. watch and hook-* work on Linux too.
Full list in docs/FAQ.md. Top hits:
| Symptom | Cause | Fix |
|---|---|---|
gemini CLI not found on PATH |
gemini not installed or not in the same shell |
Install Gemini CLI; which gemini must work in the terminal where you run gemini-evolve. |
No instructions targets found |
No ~/.gemini/GEMINI.md and no project .gemini/GEMINI.md under ~/ws, ~/projects, ~/code |
Create one, or set GEMINI_EVOLVE_PROJECT_PATHS=/abs/path and re-run discover. |
| Empty synthetic/session dataset | Gemini CLI call failed, no sessions, or sessions were all filtered out | Try --eval-source golden --eval-dataset your.jsonl, or run Gemini CLI a few times and retry session. |
Improvement below threshold |
Evolved version scored < min_improvement_pct on holdout |
Not a bug — this is the loop refusing to apply a regression. Rerun with more generations or a different --eval-source. |
launchctl errors on cron-install |
Not macOS (launchd missing) | Use trigger watch (cross-platform) or cron/systemd on your own. |
Not a git repository on hook-install |
Wrong cwd | cd into the repo root first. |
gemini-evolve discover --type instructions|commands|skills
gemini-evolve evolve TARGET [--dry-run] [--apply]
[-g GENERATIONS] [-p POPULATION]
[--eval-source synthetic|session|golden] [--eval-dataset FILE]
[--dataset-size N] [--llm-judge]
[--engine ga|gepa]
[--capture-trace] [--gepa-budget light|medium|heavy]
[--reflection-model MODEL]
gemini-evolve evolve-all --type instructions|commands|skills [--apply]
gemini-evolve trigger watch [--dir PATH] [--debounce FLOAT] [--type TYPE] [--apply]
gemini-evolve trigger cron-install [--interval N] [--type TYPE] [--apply]
gemini-evolve trigger cron-status
gemini-evolve trigger cron-remove
gemini-evolve trigger hook-install [REPO]
gemini-evolve trigger hook-remove [REPO]
Engine detail:
gepa(default) — DSPy + GEPA reflective loop ingemini_evolve/gepa_evolve.py. Uses reflection on captured traces; usually finds better variants per unit of compute.ga— built-in tournament GA ingemini_evolve/evolve.py. Lighter, no DSPy overhead; pass--engine gato use it.evolve-allalso accepts--engineand defaults togepa.
Runtime: variant evaluation calls gemini -p ... -o json --approval-mode plan and parses the first JSON blob from stdout, so MCP warnings don't break scoring. Plan mode already blocks tool execution, so --sandbox is off by default to save 5–20s per call.
Environment variables read by the code:
| Variable | Default | Used by |
|---|---|---|
GEMINI_EVOLVE_HOME |
~/.gemini |
Base Gemini home for targets and session mining |
GEMINI_EVOLVE_MUTATOR_MODEL |
gemini-3-flash-preview |
Variant generation |
GEMINI_EVOLVE_JUDGE_MODEL |
gemini-3.1-pro-preview |
LLM judge scoring |
GEMINI_EVOLVE_POPULATION |
4 |
Population size per generation |
GEMINI_EVOLVE_GENERATIONS |
5 |
Number of generations |
GEMINI_EVOLVE_OUTPUT |
output |
Result directory |
GEMINI_EVOLVE_PROJECT_PATHS |
~/ws:~/projects:~/code |
Project search roots for .gemini/GEMINI.md |
If your Gemini account can't access the preview models above, override them with any model that
gemini -m <model> -p hireturns successfully — e.g.export GEMINI_EVOLVE_MUTATOR_MODEL=gemini-2.5-flashandexport GEMINI_EVOLVE_JUDGE_MODEL=gemini-2.5-pro.
./.venv/bin/pip install -e ".[dev]"
./.venv/bin/python -m gemini_evolve.cli --help
./.venv/bin/pytest -qArchitecture and module inventory: docs/CODEMAPS/INDEX.md. Operator guide: docs/RUNBOOK.md. Contributing: docs/CONTRIB.md.
MIT — see LICENSE.