A structured knowledge graph of all WorldQuant Brain alpha development sessions, backed by a live memory layer that simulates, scores, and writes back new alphas directly into the graph.
The project has two halves:
- Knowledge graph (
graph/,nodes/, originalscripts/) — the static, extracted record of past alpha work. - Memory layer (
memory_layer/) — a live system that talks to the WQ Brain API, runs simulations, checks for overfit/correlation, and persists results back into the graph so each new session builds on the last.
How the pieces fit together — external sources feed an extraction pipeline into the knowledge graph, while the memory layer runs a live research loop against the WQ Brain API and writes results back into the same graph.
flowchart TB
subgraph EXT["External sources"]
WQB["WorldQuant Brain API"]
EXP["Claude.ai chat exports"]
LIT["Literature / SSRN papers"]
end
subgraph INGEST["Ingestion & extraction"]
PARSE["parse_exports.py"]
EXTRACT["entity extraction"]
NOTES["external_research/ notes"]
end
subgraph KG["Knowledge graph"]
NODES[("nodes/ — alphas, concepts,<br/>datafields, operators, failures")]
GRAPH[("graph/ — NetworkX DiGraph,<br/>edges.csv, graph.html")]
end
subgraph MEM["Memory layer"]
BRAINAPI["brain_api.py<br/>WQB REST client"]
PREFLIGHT["preflight.py<br/>novelty / coverage"]
SIM["simulator.py<br/>run + write_back"]
CORR["correlation_engine.py<br/>self-corr gating"]
FACTOR["factor_ontology.py<br/>16-factor taxonomy"]
BUDGET["budget.py<br/>sim-budget guard"]
VEC["vector_memory.py<br/>semantic recall"]
end
subgraph IFACE["Interfaces"]
CLI["query.py CLI"]
API["api.py — FastAPI"]
MCP["mcp.py — MCP server"]
CRITIC["critic.py — multi-model"]
end
EXP --> PARSE --> EXTRACT --> NODES
NODES --> GRAPH
LIT --> NOTES --> SIM
WQB <--> BRAINAPI
BRAINAPI --> SIM
PREFLIGHT --> SIM
BUDGET --> SIM
FACTOR --> CORR
SIM --> CORR
SIM -->|write_back| NODES
GRAPH --> VEC
GRAPH --> CLI
GRAPH --> API
GRAPH --> MCP
NODES --> CRITIC
Each new alpha is designed from graph gaps + literature, gated before and after simulation, and persisted so the next iteration starts smarter.
flowchart LR
A["Design template<br/>from graph gaps + literature"] --> B["preflight<br/>novelty + coverage"]
B --> C["sweep.py / simulator<br/>run on WQB"]
C --> D["overfit_checker<br/>+ correlation gate"]
D -->|pass| E["write_back to graph"]
D -->|fail| A
E --> F["query.py saturation<br/>→ next gap"]
F --> A
See ARCHITECTURE.md for the full component breakdown and API surface.
| Item | Count |
|---|---|
| Chat sessions processed | 16 |
| Alpha nodes extracted | 91 |
| Distinct concepts | 13 |
| Distinct datafields | 24 |
| Distinct operators | 23 |
| Failure mode types | 7 |
| Setting combos | 7 |
| Total graph edges | 1295 |
- CLV (Close-Location Value) —
((close-low)-(high-close))/(high-low)— the single highest-Sharpe direction found (alpha_0803, Sharpe 2.19), but all variants failed due to extreme turnover (70–138%). The core signal is strong; the challenge is reducing turnover via decay settings while preserving Sharpe. - EBIT/CapEx ratio —
-rank(ebit/capex)— the canonical beginner alpha. Industry neutralization brought Sharpe to 1.31 (alpha_0901) but failed self-correlation (0.9886). All smoothing attempts destroyed the signal because EBIT/CapEx is quarterly data — smoothing the same quarterly value adds noise. Final unresolved path: useDecay=5in simulation settings (not formula). - Win-streak momentum —
-ts_rank(returns>0?1:0, 250)— Sharpe 1.62 (alpha_0600) but turnover 131%. A 250-day linear decay partially fixed this (alpha_0604, iterating). - EBIT margin + valuation composite —
group_rank(... fnd2_ebitdm, fnd2_ebitfr ...)— heavy iteration in the abe10f6c session; best Sharpe ~1.04 but consistently failed fitness. - Analyst revision momentum —
analyst_revision_rank_derivative— consistently negative Sharpe (-0.70). Decaying and z-scoring did not help. Signal direction may be inverted or the field needs cross-sectional normalization.
- low_fitness (76 alphas) — Most common failure. Fitness = f(Sharpe, turnover); failing Sharpe always fails fitness.
- low_sharpe (61 alphas) — The 1.25 threshold is strict; most fundamental signals underperform without proper neutralization.
- high_turnover (12 alphas) — Mainly technical/price-volume signals. Fix: add
ts_decay_linearor increaseDecayin settings. - sector_bias (11 alphas) — Using market neutralization on fundamental ratios compares tech to utilities. Fix: use industry or subindustry neutralization.
- os_failure (3 alphas) — IS/OS divergence; signals overfit on 2019-2023 training window.
These appear in 0–2 alphas despite being available in WQ Brain:
cap(market cap) — 0 alphas — pure size signal unexploredvwap— 2 alphas — intraday price discovery signal underusedcashflow_op— 2 alphas — operating cash flow signals mostly skippedbeta_last_30_days_spy— 1 alpha — market sensitivity signal barely touched
- TOP3000 delay=1: 65 alphas (main workbench)
- TOP1000 delay=1: 13 alphas (competition-oriented testing)
- TOP500 delay=1: 13 alphas (concentrated portfolio testing)
# List all alphas implementing a concept, sorted by Sharpe
python scripts/query.py concept mean_reversion
# List all alphas using a specific datafield
python scripts/query.py datafield close
# List alphas tested under a specific setting
python scripts/query.py setting TOP3000 1 industry
# Frequency table of all failure modes
python scripts/query.py failures
# Datafields used in fewer than 3 alphas (exploration gaps)
python scripts/query.py gaps
# Print the full derivation tree for an alpha
python scripts/query.py lineage alpha_0900
# Top N alphas by Sharpe with full details
python scripts/query.py best 10
# Which factor families are over/under-represented in the portfolio
python scripts/query.py saturationThe live system that turns the static graph into a working research loop. Highlights:
brain_api.py— WQ Brain REST client (login, simulate, fetch alphas, self-correlation). Supports auto-reauth from saved credentials (python scripts/wqbrain_login.py --save-credentials).simulator.py— runs simulations andwrite_backs results (expression, metrics, datafields, operators, concepts) into the graph as new alpha nodes.brain_catalogue.json/brain_catalogue.py— full WQ Brain datafield catalogue (coverage, type, dataset) used for pre-flight checks.factor_ontology.py— maps datafields to factor families so the portfolio can be checked for saturation.preflight.py— field-novelty / operator-overlap checks before spending sim budget.correlation_engine.py,budget.py,provenance.py— self-correlation gating, daily sim-budget guard, and provenance tracking.
The original pipeline scripts (parse_exports.py, build_graph.py, query.py, visualize.py) are joined by a set of live-research helpers:
# Fresh-context critic — reviews recent work for logical fallacies + economic errors.
# Field ids are pseudonymized to factor-class tokens and resource ids / git history are
# stripped before anything leaves the machine. Claude-only by default (the Anthropic API
# does not train on inputs); other models opt in only via no-train endpoints. No chat /
# memory context; skips cleanly if no API key is set.
python scripts/critic.py
# Overfit risk check for an alpha (static + IS metrics + year-by-year + coverage).
python scripts/overfit_checker.py alpha_1206 --live
# Fan out multiple sweep.py templates in parallel, capped at WQB's concurrent-sim limit.
python scripts/run_concurrent.py --tabs # one Windows Terminal tab per template
# Recover a completed-but-unwritten WQB alpha into the graph by remote id.
python scripts/writeback_alpha.py <remote_id> "<expression>" --hypothesis "..." \
--datafields fld1,fld2 --operators op1,op2 --concepts c1,c2
# Rank a model53 sweep and propose the next 5 variants on the winning branch.
python scripts/analyze_mdl53.pyexternal_research/ holds the literature notes (options PCR, news sentiment, credit models, volatility sizing) that inform template design before sweeps are fired. Raw sweep logs are kept local / git-ignored — they carry live field ids and expressions.
exports/ Raw chat exports (.json)
nodes/
alphas/ alpha_NNNN.md — one file per distinct alpha expression
concepts/ Concept entity nodes (mean_reversion, momentum, ...)
datafields/ Datafield entity nodes (close, ebit, capex, ...)
operators/ Operator entity nodes (rank, ts_rank, group_neutralize, ...)
settings/ Setting combo nodes (TOP3000_1_industry.md, ...)
failure_modes/ Failure mode nodes
sessions/ Session summary nodes + _manifest.json
raw/ Full session text files (input for extraction)
graph/
graph.gpickle NetworkX DiGraph (Python pickle)
edges.csv All edges as CSV (source, target, relation)
graph.png Rendered visualization
graph.gexf Gephi-compatible export
graph.html Interactive HTML graph view
memory_layer/ Live memory layer (WQB API, simulator, catalogue, ontology) — see above
external_research/ Literature notes that inform template design (PCR, news, credit, vol)
logs/ Captured output of past phase sweeps
scripts/
parse_exports.py Phase 2: parse exports, filter WQ Brain sessions
backfill_entities.py Post-extraction: fill missing entity node files
build_graph.py Phase 4: build NetworkX graph from markdown files
query.py Phase 5: CLI retrieval helper (+ saturation/best)
visualize.py Phase 6: render graph.png + graph.gexf
sweep.py Fire a template of sims against WQB
run_concurrent.py Fan out multiple sweeps in parallel
critic.py Fresh-context multi-model critic
overfit_checker.py Overfit risk check for an alpha
writeback_alpha.py Recover a completed alpha into the graph
analyze_mdl53.py Rank a sweep and propose next variants
wqbrain_login.py WQB session login + credential save
Note:
nodes/,exports/,private/,logs/, the rendered graph exports (graph/*.html,*.png,*.gpickle), and the version / correlation dumps are git-ignored — they hold private session data and live field ids. The repo ships the code, the architecture, and curated summaries;public/is a sanitized mirror.
When you add new Claude.ai or ChatGPT exports:
# 1. Drop new .json file(s) into exports/
# 2. Re-parse (only new sessions are added; existing ones skip via manifest)
python scripts/parse_exports.py
# 3. Backfill entities
python scripts/backfill_entities.py
# 4. Rebuild graph
python scripts/build_graph.py
# 5. Re-visualize
python scripts/visualize.pyFor new sessions only, check nodes/sessions/_manifest.json to see which session_ids already exist. Copy the new session .txt from nodes/sessions/raw/ and run the extraction guide manually or spawn a sub-agent with the new session file and a fresh alpha ID range (start after the highest existing ID).
Current highest used IDs per agent batch:
- 0001–0017 (Pearson correlation session)
- 0100–0108 (uncorrelated price-volume session)
- 0200–0212 (EBIT ranking session)
- 0300–0309 (creating alphas + combined report)
- 0400–0407 (grouping sectors)
- 0600–0609 (liabilities session)
- 0700–0706 (paper strategies + WQB overview)
- 0800–0807 (fitness improvement + ruflo)
- 0900–0908 (EBIT/CapEx ratio)
Next available batch: start from alpha_1000 for new sessions.
- "This block is not supported on your current device yet." — Several sessions had file-write blocks that weren't visible in the export. The d9635e35 "Combined alpha report" session is entirely composed of these invisible outputs; no expressions were extractable from it.
- Informal metric reporting — When users pasted IS test results as plain text (e.g., "Sharpe of 1.31 is above cutoff"), agents parsed the numeric value directly. When Sharpe was mentioned only in passing ("it was around 1.3"), the agent used that value and is marked in status appropriately.
- Multi-session expressions — If the same expression appeared in multiple sessions, the earliest session was used as the canonical source.
- Settings in formula vs. settings panel — WQ Brain expressions sometimes embed neutralization inline (
with industry in Neutralization) and sometimes it's a separate simulation setting. Both are normalized to theneutralizationfrontmatter field.