Portable, model-agnostic deep research agent. Built on deepagents + LangGraph. Plans, asks clarifying questions when the request is ambiguous, spawns parallel sub-researchers, calls web search + MCP tools, runs code in an optional sandbox, applies on-disk skills, and writes a cited report — exposing a typed streaming-event protocol so any frontend can render the Claude/Gemini-style live research UI (clarification cards, search queries, website grid, MCP calls, skills, thinking, interleaved [n] citations).
Design goals
- No host-app dependency. Copy this directory (or
uv pip install -e .) into any app. The only seam isResearchConfig(env + per-runconfigurable). Zero imports from your backend. - Not model-locked. Every model goes through an OpenAI-compatible
base_url(OpenRouter by default). Models are organized as named price tiers defined in code (MODEL_TIERS: any OpenRouter slug, local vLLM, …); runtime selects a tier by name (DRA_MODEL_TIER=extra-low|low|mid|high, see Model tiers). - Replaceable parts. Search backend, MCP servers, skills, prompts, and the event emitter are all isolated modules.
This project is managed with uv (uv.lock is committed). Use uv — do not pip install into your base interpreter.
cp .env.example .env # set OPENAI_API_KEY, TAVILY_API_KEY (+ optional DRA_MCP_*)
./run.sh # sync deps (first run) + start the dev server on :2024run.sh is a one-command dev bring-up. It loads ./.env (so the script and server share config), syncs ./.venv on first run, and starts the LangGraph server:
| Command | Does |
|---|---|
./run.sh (or ./run.sh up) |
Sync deps if ./.venv is missing, then start the dev server (API + docs at http://127.0.0.1:2024/docs). Warns if OPENAI_API_KEY / TAVILY_API_KEY are unset. |
./run.sh --sync |
Force uv sync --extra dev, then start the server. |
./run.sh ask "<question>" |
Stream one research run against an already-running server. |
./run.sh smoke |
ask a canned question against a running server. |
./run.sh test |
Sync, then run the offline pytest suite (no API keys / network). |
Host/port follow DRA_HOST (default 127.0.0.1) and PORT (default 2024). ask/smoke need the server up in another shell first. The equivalent manual commands:
uv sync --extra dev # create ./.venv with all deps + the langgraph CLI
uv run langgraph dev --host 127.0.0.1 --port 2024
uv run python examples/client.py "What are the recent trends across the tracked entities, and where can I find supporting data?"Graph id: deep_research_agent — set this as your caller's assistant_id.
Tests live in tests/ (e.g. the deterministic report-hygiene guard — scrub_report + lint_citations / report_problems). Pure-Python, no API keys or network needed. pytest ships in the dev extra, so the suite runs inside ./.venv alongside the runtime deps:
./run.sh test # sync + run the suite (equivalent to the two commands below)
uv sync --extra dev # installs pytest + deepagents + the langgraph CLI into ./.venv
uv run pytest tests/ -qResolution order for every field: per-run configurable override → env var → default. configurable accepts both this package's native keys and compatibility aliases (research_model, final_report_model, apiKeys, mcp_config, mcp_prompt) so an existing caller can adopt the agent with zero backend changes.
| Env var | Default | Purpose |
|---|---|---|
OPENAI_API_KEY (or OPENROUTER_API_KEY) |
— | Key sent as Bearer to OPENAI_BASE_URL |
OPENAI_BASE_URL |
https://openrouter.ai/api/v1 |
OpenAI-compatible endpoint |
DRA_ALLOWED_BASE_URLS |
— | Comma-separated allowlist of extra base URLs a run may override to (key-exfiltration guard) |
TAVILY_API_KEY |
— | Web search; if unset, the web_search tool is omitted |
DRA_MODEL_TIER |
extra-low |
Named model package: extra-low | low | mid | high (see Model tiers below). The only model knob — individual models are chosen in code (MODEL_TIERS), never per env/run |
DRA_MCP_URL |
— | Single MCP server (bare host → /mcp appended) |
DRA_MCP_LABEL |
— | Friendly name for that server in the report's Sources |
DRA_MCP_SERVERS |
— | JSON list of {label, url} for multiple servers |
DRA_MCP_BEARER |
— | Bearer token attached to every MCP server lacking explicit auth |
DRA_MCP_MAX_CONCURRENCY |
10 |
Hard ceiling on simultaneous MCP calls across the whole run |
DRA_MCP_RATE_LIMIT_MAX_WAIT |
120 |
Per-call 429 backoff budget (seconds) before the call fails |
DRA_SKILLS_DIR |
./skills |
Directory of agent skills (see below) |
DRA_STREAMING |
true |
Token-by-token streaming; set false for models with off-spec streaming chunks |
DRA_STREAMING_DENYLIST |
deepseek-v4-flash |
Comma-separated model-name substrings that force streaming off |
DRA_RECURSION_LIMIT |
4500 |
LangGraph super-step ceiling for the orchestrator loop (caps loops, not tool calls) |
DRA_MAX_TOOL_CALLS |
200 |
Cumulative tool-call ceiling per run (BudgetMiddleware) before a hard stop |
DRA_MAX_TOTAL_TOKENS |
4000000 |
Cumulative token ceiling per run; soft wrap-up nudge at 75%, hard stop at 100% |
DRA_MAX_RESULT_CHARS |
60000 |
Per-call MCP result size over which the result offloads to a file (or truncates, no sandbox) |
DRA_MAX_RESULT_ROWS |
1000 |
Per-call MCP result row count that triggers the same offload/truncate |
DRA_OFFLOAD_RESULTS |
true |
Offload large MCP results to the sandbox filesystem instead of truncating them |
DRA_OFFLOAD_DIR |
/workspace/data |
Directory (inside the sandbox) for offloaded result files |
LLM_SANDBOX_URL |
— | Code-execution sandbox sidecar; when set, the execute tool runs real shell/Python/JS |
LLM_SANDBOX_TOKEN |
— | Auth token; must match the sandbox service's LLM_SANDBOX_TOKEN |
LLM_SANDBOX_NETWORK |
false |
Allow outbound network from inside the sandbox |
LLM_SANDBOX_SESSION_TIMEOUT |
900 |
Sandbox session timeout (seconds) |
Per-run configurable keys mirror these: model_tier, apiKeys.{OPENAI_API_KEY,TAVILY_API_KEY}, base_url (allowlisted only), temperature, search_max_results, max_concurrent_research_units, mcp_servers / mcp_config, mcp_prompt, mcp_max_concurrency, mcp_rate_limit_max_wait, skills_dir, streaming, streaming_denylist, recursion_limit, max_tool_calls, max_total_tokens, max_result_chars, max_result_rows, offload_results, offload_dir, sandbox_url, sandbox_token, sandbox_network, sandbox_session_timeout.
Models are chosen by NAME only: DRA_MODEL_TIER=mid (or per-run configurable.model_tier). Which models a name means is decided in code — MODEL_TIERS in config.py, one reviewed place — and is not settable per env var or per run; legacy per-model keys (research_model, final_report_model, compression_model, …) are ignored with a warning. The default, when nothing is configured, is extra-low — a bare checkout can't silently burn money; opt up explicitly for real work. An unknown tier name warns and falls back to the default. OpenRouter slugs, prices $/M input/output as of 2026-06:
| Tier | Research (orchestrator) | Sub-agent | Utility |
|---|---|---|---|
extra-low |
deepseek/deepseek-v4-flash (0.10/0.20) |
deepseek/deepseek-v4-flash (0.10/0.20) |
qwen/qwen3-30b-a3b-instruct-2507 (0.05/0.19) |
low |
deepseek/deepseek-v4-pro (0.44/0.87) |
deepseek/deepseek-v4-flash (0.10/0.20) |
deepseek/deepseek-v4-flash |
mid |
google/gemini-3.5-flash (1.50/9) |
google/gemini-2.5-flash (0.30/2.50) |
deepseek/deepseek-v4-flash |
high |
anthropic/claude-opus-4.8 (5/25) |
anthropic/claude-sonnet-4.6 (3/15) |
anthropic/claude-haiku-4.5 (1/5) |
extra-low is rock bottom — deepseek-v4-flash for both tool-loop roles (it's proven reliable here as this tier's orchestrator and the low tier's sub-agent, unlike the cheaper-but-flakier open-weight options that gave up mid-loop), so the sub-agent is never pricier than the orchestrator. Delegation still pays off via context isolation. ~$0.02 of orchestrator spend per medium run. Expect noticeably weaker planning and earlier give-ups than higher tiers; the force-completion / findings-gate / budget backstops keep runs honest, not great. For demos, smoke tests, and high-volume low-stakes scheduled ticks — not for decisions. high deliberately keeps sub-agent/utility at sonnet/haiku tier — Opus plans and synthesizes only; an Opus sub-agent fleet would defeat the tiering. To add your own packaging: add an entry to MODEL_TIERS (code), pick a name, and document it in this table — callers then select it with DRA_MODEL_TIER=<name>. An unknown tier name is ignored with a warning (plain defaults apply).
Stream with stream_mode=["messages","updates","custom"] and stream_subgraphs=True. The custom channel carries protocol events (each a JSON object with type); the messages channel carries assistant thinking tokens for the collapsible pane.
type |
Key fields | Renders as |
|---|---|---|
clarification |
questions[] |
Question card; input re-enabled. On submit, reply on the same thread with each answer paired to its question (1. Q: … A: …) — not bare answers |
search_query |
id, query, source |
Globe row |
search_results |
id, query, ok, count, results[].{title,url,domain,snippet} |
Favicon + title grid |
source |
title, url, domain |
Live citation list entry |
mcp_call |
id, tool, args |
MCP call row |
mcp_result |
id, tool, ok, summary; on failure error_class = permanent | transient | unknown (+ repeated when an identical failed call was answered locally) |
MCP result row |
skill |
name, path, state |
"Skill applied: <name>" indicator |
subagent_findings |
unit, summary, findings[].{finding,evidence,source}, gaps[] |
Folded findings table (one per sub-agent); emitted when a sub-agent's findings validate |
report |
markdown |
Final answer (also in state final_report) |
usage |
tool_calls, total_tokens, model_calls, limits{}, … |
Per-run ledger at run end (no UI; logging / cost tracking) |
status |
state = mcp_ready | mcp_error | budget_soft | budget_halt | revising | done |
Lifecycle / errors |
status detail: mcp_ready carries tool_count + tools[]; mcp_error carries detail, server, label; budget_soft is the 75% wrap-up nudge and budget_halt the hard ceiling stop (see budgets below); revising fires when a gate bounces a deliverable back for one revision — reason: report_quality (final report) or reason: subagent_findings (a sub-agent's findings handoff); done fires when the report is finalized.
The usage event (from metering.py) reports orchestrator-level token counts plus global tool-call / result-size totals and the configured ceilings — emitted once at run end for logging and cost tracking.
Final thread state also exposes final_report (string) and sources ([{index,url,domain}]) — structured citations independent of the inline [n] markers the writer model produces.
Async / background runs (Gemini-style "leave this chat"). LangGraph persists the thread, so a run survives client disconnect. Reconnect by joining the run stream or polling GET /threads/{id}/state for final_report.
When a request is ambiguous (unclear scope, timeframe, entity, or goal) the orchestrator calls request_clarification up front, emits a clarification event, and stops.
The reply must pair each answer with its question. The frontend collects the answers and sends them back on the same thread as one user message restating each question with its answer — e.g.:
Answers to your clarifying questions:
1. Q: Scope to US-listed BDCs or global? A: US-listed
2. Q: Timeframe? A: last 12 months
Sending bare answers ("the first") loses meaning — without the question the agent can't tell what "the first" refers to. As insurance the request_clarification tool also echoes the questions into its own result, so they stay in context even if history is trimmed. The user's reply lands on the same thread, so the agent then has the full Q&A in context and proceeds to research. A deterministic fallback (ClarificationFallbackMiddleware) emits the same event if a model narrates questions in prose without calling the tool, so the card always appears regardless of model. See examples/client.py for the round-trip.
Skills are folders under ./skills/, each with a SKILL.md (progressive-disclosure instructions the agent reads on demand). They're mounted read-only at the virtual path /skills/; the agent reads them via read_file("/skills/<name>/SKILL.md") while its own scratch files stay in an ephemeral state backend. The first time a skill is read in a turn, a skill event fires ("Skill applied: <name>"). Point elsewhere with DRA_SKILLS_DIR / configurable.skills_dir; if the directory is absent the agent runs normally with no skills.
Add a deployment-specific tool without touching the generic codebase: drop a *.py file in ./custom_tools/ and restart. Each file subclasses CustomTool — set name / description, implement run — and the loader auto-discovers it, infers the arg schema from run's typed params, and gives it to the orchestrator and every sub-agent.
# custom_tools/weather.py
from deep_research_agent.tools.custom import CustomTool
class WeatherNow(CustomTool):
name = "weather_now"
description = "Current weather for a city. Cite as 'OpenWeather'."
async def run(self, city: str) -> str: # sync def works too
# self.cfg is the run config; return a string (hardcoded here as an example)
return f"{city}: 21°C, clear skies, humidity 48%. Source: OpenWeather."run may be sync or async; self.cfg is the live ResearchConfig. Return value: the model always sees a string — return a str (JSON-encode structured data yourself), or a list/dict and the framework JSON-encodes it for you; a large list of rows is offloaded to a file the execute tool reads back. Override enabled(cls, cfg) -> bool to load conditionally (e.g. only when an env var is set). Copy custom_tools/_template.py to start; for dynamic cases a build_tools(cfg) / build_tool(cfg) factory returning LangChain tools is also accepted. Point elsewhere with DRA_CUSTOM_TOOLS_DIR. Full guide: docs/CUSTOM_TOOLS.md.
It speaks the LangGraph HTTP/SSE API, so any consumer (the included examples/client.py, the JS @langchain/langgraph-sdk, or raw SSE) works. To wire it into an existing deployment:
- Run this graph (point your dev script /
langgraph.jsonat it). - Set
assistant_idtodeep_research_agent. - Pass per-run config via
configurable(see the Configuration table above). - To get the rich live UI, have the frontend additionally consume the
customevent channel above.
Who connects, and where the config comes from. The agent is always the MCP client — it opens the connection itself (at graph build, agent.py → load_mcp_tools) and the model calls the resulting tools during research. There is no separate connector process. What varies is where the server list (url + auth) is resolved from. Precedence (first non-empty wins, config.py):
configurable.mcp_servers— per-run request (native).configurable.mcp_config— per-run request (compat alias). The normal host-app path: the backend injects url +headers(incl. auth) into every run, so the env vars below are never consulted.DRA_MCP_SERVERS— env (JSON list).DRA_MCP_URL(+DRA_MCP_LABEL) — env (single server).
So when a request arrives with MCP config, the agent connects using that (and its auth). When a bare run arrives without it — e.g. a Studio / langgraph dev trigger, or any caller that omits configurable.mcp_config — it falls back to the DRA_MCP_* env entry. The env entry is a standalone-run fallback, not the primary path. If that fallback has no auth, you get the failure below.
Auth / 401 Unauthorized. A 401 means the connection reached the server and was rejected for missing/wrong credentials — the path is correct, so do not strip /mcp (that would give 404, a different error). Attach credentials instead:
- request-supplied servers: put them in
headers(e.g.{"Authorization": "Bearer …"}or a server-specific header likex-litellm-api-key). - env-supplied servers: set
DRA_MCP_BEARER=<token>— it's attached asAuthorization: Bearer <token>to every server that doesn't already carry explicit auth.
To keep bare local runs from attempting an auth-less connect at all, leave DRA_MCP_URL unset and rely on the backend to inject mcp_config.
/mcp path rule differs by source. Under mcp_config, url is treated as a base and /mcp is appended for you — pass the url without /mcp. Under DRA_MCP_URL / mcp_servers, the url is used as given except that a bare host gets /mcp appended; a url that already has a path is left untouched — so pass the full url with /mcp.
Other guards.
- Connect to
127.0.0.1, never0.0.0.0(bind address — dialing it fails). Config normalizes0.0.0.0→ loopback defensively. - Each call is bounded by a shared semaphore (
mcp_max_concurrency) so the agent's fan-out can't exhaust the server's file descriptors; 429s back off and retry withinmcp_rate_limit_max_waitrather than failing immediately. - SSRF guard: only
http(s)schemes are allowed and link-local / cloud-metadata targets are refused. Loopback / private hosts are allowed (the internal gateway uses them). - Connection failures emit
status: mcp_error(with detail) instead of failing silently — one unreachable server does not take down the others or the run. - A FAILED tool call never kills the run: the error is returned to the model as the tool result with retry guidance, classified
permanent(validation / unknown names — fix the arguments, never retry) vstransient(one retry ok). Servers can tag explicitly by prefixing the error message with[permanent]/[transient]; an identical retry of a permanently-failed call is answered locally without hitting the server.
- Code execution. Set
LLM_SANDBOX_URL(+LLM_SANDBOX_TOKEN) to attach an llm-sandbox sidecar; deepagents'executetool then runs real shell / Python / JS in the container, so the model computes aggregates and joins instead of doing arithmetic in its head. With no sandbox configured the agent falls back to an in-memory backend and execution is disabled — it degrades gracefully and says so rather than faking output. - Large-result offload. When a single MCP result exceeds
DRA_MAX_RESULT_CHARS/DRA_MAX_RESULT_ROWS, the full payload is written to a file underDRA_OFFLOAD_DIRand only a compact stub (path, row count, columns, head) enters context; the model reads the file back withexecute. Without a sandbox these bounds become hard truncation caps instead. This is how a large cross-entity scan stays within the context window. - Budgets.
BudgetMiddlewareenforces cumulative per-run ceilings —DRA_MAX_TOOL_CALLSandDRA_MAX_TOTAL_TOKENS— emitting abudget_softwrap-up nudge at 75% and abudget_halthard stop at 100%.DRA_RECURSION_LIMITseparately caps orchestrator super-steps. Theusageevent reports the run's spend against these ceilings at the end.
src/deep_research_agent/
agent.py make_graph(config) factory ← langgraph.json entrypoint
config.py env + per-run config (the portability seam)
models.py OpenAI-compatible model builder
events.py event protocol + tool instrumentation (mcp_call/mcp_result)
prompts.py orchestrator + subagent prompts (citation + MCP-source rules)
citations.py output middleware → final_report + sources[]
completion.py force-completion middleware (no premature ReAct termination)
findings_gate.py sub-agent findings gate — JSON contract, validator, bounce (report_gate's twin)
budget.py BudgetMiddleware — hard tool-call + token ceilings (soft nudge → hard stop)
clarify_fallback.py emits clarification event when a model narrates questions in prose
skill_usage.py emits a skill event the first time each skill is read in a turn
turn.py scopes thread messages to the current turn (multi-turn safety)
report_hygiene.py deterministic scrub + citation lint applied to the final report
report_gate.py report quality gate — bounces a report back once for fixable defects
metering.py per-run usage ledger → usage event + RESEARCH USAGE log
sandbox.py wires the execute / filesystem tools to the llm-sandbox sidecar
tools/search.py Tavily web_search, emits search events
tools/mcp.py MCP loader + per-call instrumentation, concurrency + 429 backoff
tools/clarify.py request_clarification tool → clarification event
tools/report.py submit_report tool — the single explicit deliverable → report event
tools/custom.py CustomTool base class + drop-in loader for custom_tools/
custom_tools/ drop-in deployment-specific tools (CustomTool subclasses), auto-loaded
skills/ agent skills (each a folder with SKILL.md), mounted read-only at /skills/
examples/client.py reference SSE consumer