FinResearchAI

Multi-agent equity research you can watch think. A 12-node LangGraph pipeline streams three parallel analysts, a bounded bull/bear debate, a trader, and a risk debate into a BUY / SELL / HOLD verdict — live over SSE, persisted to a Postgres warehouse, replayable forever, and measured by the debate-cost ablation its reference paper omits.

Live demo: coming soon at finresearch.prateekmulye.dev — until then, the zero-key demo runs the full app on your machine in about two minutes.

Type a ticker, watch the agent graph light up node by node as analysts report, the bull and bear argue, the trader sizes a position, and a conservative↔aggressive risk debate settles into a final decision with a 0–100 score and conviction — with per-node cost, latency, and token counts streamed the whole way. Every run is written through to a Postgres + pgvector warehouse, so the research library, timeline replays, market data, and semantic search keep working even when live-run quota is spent.

How it works

flowchart LR
    router[router]
    subgraph analysts [parallel analysts]
        news[news analyst]
        fund[fundamentals analyst]
        tech[technicals analyst]
    end
    subgraph debate [research debate]
        bull[bull researcher]
        bear[bear researcher]
        facilitator[facilitator]
    end
    trader[trader]
    subgraph risk [risk debate]
        cons[conservative]
        aggr[aggressive]
        arbiter[risk arbiter]
    end
    reporter[reporter]

    router --> news
    router --> fund
    router --> tech
    news --> bull
    fund --> bull
    tech --> bull
    news --> bear
    fund --> bear
    tech --> bear
    bull --> facilitator
    bear --> facilitator
    facilitator --> trader
    trader --> cons
    trader --> aggr
    cons --> arbiter
    aggr --> arbiter
    arbiter --> reporter

That is the debate-on topology (12 nodes). build_graph("off") swaps the three debate nodes for a single research_synthesis node (10 nodes) — the single-pass baseline the eval harness compares against.

State-first LangGraph. A typed AgentState (src/state.py) is the only channel between nodes; concurrent writers (analyst reports, debate transcripts, metrics) merge through explicit reducers, never last-write-wins.
Structured outputs only. Every node calls with_structured_output(Schema, method=STRUCT_METHOD) against Pydantic models — no JSON string-scraping anywhere in the pipeline.
Degrade everywhere. A node whose LLM or tool call fails returns a schema-shaped fallback plus a zero-metrics trace line; one failure never aborts the graph or drops observability.
Write-through warehouse. Analyst tools persist whatever they fetch (prices, fundamentals, news) into Postgres 16 + pgvector during each run; a nightly in-process collector refreshes a 30-ticker watchlist across US, Indian, Japanese, Chinese, Hong Kong, and European exchanges.
Quick/deep model tiers. Provider-agnostic ChatOpenAI pointed at Ollama Cloud's OpenAI-compatible /v1 — gpt-oss:20b for routing, analysts, and reporting; gpt-oss:120b for debate, trading, risk, and judging. No OpenAI key, no GPU.
Local embeddings. fastembed BGE-small (384-dim) embeds news and report summaries at write time; semantic search is a pgvector cosine query with a keyword fallback. No embedding API key.

Screenshots

Research library	Timeline replay

Market dossier	Eval dashboard

The ablation the paper omits

The design adopts a deliberate subset of TradingAgents (arXiv 2412.20138): bull/bear debate, quick/deep model routing, structured state-passing, and an actionable signal. The paper, however, never isolates what the debate itself buys you. This repo does:

python -m src.eval.run --tickers evals/tickers.json --label demo

runs every ticker through both topologies — debate-on and debate-off — and reports action agreement, a blind judge's reasoning preference, score deltas, and the exact cost/latency/token price of the debate, persisted to the warehouse and visualized on the Eval page.

Honest framing: the quality signal is a judge-preference proxy, not realized P&L. The harness runs no backtest and reports no returns; the judge sees the two verdicts in a fixed A/B position, so a small positional bias is possible. That disclaimer is embedded in every report it writes (PROXY_DISCLAIMER in src/eval/report.py) — I'd rather ship a number I can explain than a backtest I can't defend.

Features

Live cockpit — the agent graph lights node by node from the SSE stream; bull/bear theses stream side by side; the decision reveals with a conviction gauge and a live cost ticker.
Run library + replay — every run is persisted with its full event stream; any run replays through a timeline scrubber with transport controls, exactly as it happened.
Market explorer + semantic search — global instrument search, candlestick / fundamentals / news dossiers, and pgvector semantic search over accumulated research.
Eval dashboard — the debate A/B results: judge preference, score deltas, and a cost-vs-quality scatter per ticker.
Demo guard — live runs are capped per IP and globally per day (Postgres-backed, restart-proof); the library, replays, and market data stay free; an X-Admin-Token header bypasses caps.
Fake-LLM mode — APP_FAKE_LLM=1 boots the entire stack with a deterministic offline LLM and canned market data: no keys, no network, no cost. It powers the e2e suite and the zero-key demo below.

Quickstart

a) Zero-key demo (no API keys)

Deterministic fake LLM + canned data — the full UI with nothing to sign up for.

pip install -e ".[all]"

# seed a throwaway SQLite warehouse with a few demo runs
APP_FAKE_LLM=1 DATABASE_URL=sqlite+aiosqlite:////tmp/finr.db \
    python scripts/seed_library.py

# backend
APP_FAKE_LLM=1 DATABASE_URL=sqlite+aiosqlite:////tmp/finr.db \
    uvicorn src.api.main:app --port 7860

# frontend (second terminal) → open http://localhost:5173
cd web && npm install && npm run dev

b) Real mode (live LLMs + market data)

cp .env.example .env          # add OLLAMA_API_KEY and FIRECRAWL_API_KEY
pip install -e ".[all]"

docker compose up -d db       # Postgres 16 + pgvector on localhost:5433
alembic upgrade head          # create the warehouse schema

export DATABASE_URL=postgresql+asyncpg://finresearch:finresearch@localhost:5433/finresearch
uvicorn src.api.main:app --port 7860
cd web && npm install && npm run dev

The warehouse is optional — without DATABASE_URL the graph still runs and records JSONL traces; you lose the library, market explorer, and search.

c) Production

One VPS, three containers (Caddy auto-HTTPS → FastAPI → Postgres):

docker compose -f docker-compose.prod.yml up -d --build

Full runbook — DNS, secrets, first-run checks, updates, backups, and the gated GitHub Actions deploy pipeline — in docs/deploy.md.

API

Everything under /api, liveness at the root:

Endpoint	What it does
`POST /api/analyze`	Run the graph; streams SSE (`start` / `node_start` / `node_complete` / `token` / `done` / `error`).
`GET /api/library`	Past runs, newest first, filterable by ticker/status.
`GET /api/runs/{run_id}`	Full run detail + the recorded event stream for replay.
`GET /api/market/instruments?q=`	Global instrument search across the watchlist.
`GET /api/market/{ticker}/prices`	Daily candles (also `/fundamentals`, `/news`).
`GET /api/eval/results`	Persisted debate A/B eval results.
`GET /api/search?q=`	Semantic (pgvector) search over news + run summaries, keyword fallback.
`GET /api/quota`	Remaining live-run demo quota for the caller.
`GET /healthz`	Liveness probe.

Stack

Backend: Python 3.11+, LangGraph, FastAPI + sse-starlette, SQLAlchemy 2 (async) + Alembic, Postgres 16 + pgvector, fastembed (BGE-small), APScheduler
LLMs: Ollama Cloud via langchain-openai (gpt-oss:20b quick / gpt-oss:120b deep)
Data: yfinance, Firecrawl, tradingview-ta
Frontend: React 19, Vite, TypeScript, Tailwind v4 + shadcn/ui, TanStack Query, @xyflow/react, lightweight-charts, Recharts, Motion
Ops: Docker Compose, Caddy (auto-HTTPS), GitHub Actions (5-job CI + gated deploy), Dependabot, gitleaks / pip-audit / Trivy

Repository layout

src/            the application — graph, agents, llm, tools, api, warehouse, collector, eval, memory, obs
web/            React SPA (Vite + TS); design tokens in web/DESIGN.md
tests/          offline unit + integration suites (540 tests; no network)
migrations/     Alembic schema for the warehouse
evals/          curated A/B ticker set; eval reports land here
scripts/        dev utilities (demo seeder, smoke test)
docker/         container entrypoint
docs/           deploy runbook, design history (docs/superpowers/), screenshots
.github/        CI + gated deploy workflows, Dependabot

Testing

815 tests: 540 backend (pytest, fully offline — LLMs and tools mocked) + 275 frontend (vitest). CI runs five parallel jobs: backend matrix (3.11/3.13), frontend gates, real-Postgres integration, security (gitleaks + pip-audit + npm audit), and a fake-LLM e2e smoke with a Trivy image scan.

python -m pytest -q                 # backend, offline
ruff check . && mypy src            # lint + types
cd web && npm run lint && npm run typecheck && npm run test:run && npm run build

Live tests (real Ollama Cloud + Firecrawl) are opt-in: RUN_LIVE=1 python -m pytest -m live. Postgres integration tests: docker compose up -d db, then python -m pytest -m db.

Design docs

The full design history — paper analysis, codebase assessment, work breakdown, and the per-work-package plans behind both the original build and the flagship elevation — lives in docs/superpowers/. Contributions welcome: see CONTRIBUTING.md.

License

MIT

Name		Name	Last commit message	Last commit date
Latest commit History 178 Commits
.github		.github
docker		docker
docs		docs
evals		evals
migrations		migrations
scripts		scripts
src		src
tests		tests
web		web
.dockerignore		.dockerignore
.env.example		.env.example
.gitignore		.gitignore
.trivyignore		.trivyignore
CONTRIBUTING.md		CONTRIBUTING.md
Caddyfile		Caddyfile
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
SECURITY.md		SECURITY.md
alembic.ini		alembic.ini
docker-compose.prod.yml		docker-compose.prod.yml
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FinResearchAI

How it works

Screenshots

The ablation the paper omits

Features

Quickstart

a) Zero-key demo (no API keys)

b) Real mode (live LLMs + market data)

c) Production

API

Stack

Repository layout

Testing

Design docs

License

About

Uh oh!

Releases 1

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

FinResearchAI

How it works

Screenshots

The ablation the paper omits

Features

Quickstart

a) Zero-key demo (no API keys)

b) Real mode (live LLMs + market data)

c) Production

API

Stack

Repository layout

Testing

Design docs

License

About

Topics

Resources

License

Contributing

Security policy

Uh oh!

Stars

Watchers

Forks

Releases 1

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages