Summary
Investigate and document what actually happens when the kernel runs in N worker
processes (gunicorn/uvicorn workers, horizontally scaled containers): which
guarantees silently weaken (rate limits multiply by N, revocation doesn't
propagate, budgets/traces/handles fragment per process), and recommend the
mitigation architecture — documentation guardrails now, and the right long-term
seam (persistence #126, remote mode ISSUE 44, or sticky-state guidance).
Why this matters
Every serious deployment of a Python service runs multiple workers, and every piece
of kernel state is per-process: a "5 calls/hour" rate limit becomes 5×N, a revoked
token stays valid in N−1 workers until expiry, and handles minted in one worker
don't expand in another (HMAC tokens, being stateless, do verify everywhere —
which makes the asymmetry genuinely non-obvious). Today the docs don't address
this, so operators will discover it in production. An investigation that produces
precise documentation plus a recommended architecture is the cheapest way to
convert a latent footgun into a roadmap decision.
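The multiplication can be shown with a toy model. This is a minimal sketch, not the kernel's `rate_limit.py`: `InMemoryRateLimiter` and `dispatch_round_robin` are hypothetical names, and the sketch assumes each worker keeps its own in-memory window while a load balancer spreads requests evenly across workers.

```python
import itertools
from collections import defaultdict, deque


class InMemoryRateLimiter:
    """Hypothetical per-process sliding-window limiter (not the kernel's API)."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window_seconds = window_seconds
        self._calls = defaultdict(deque)  # principal -> timestamps, local to this worker

    def allow(self, principal, now):
        calls = self._calls[principal]
        # Drop timestamps that have fallen out of the window.
        while calls and now - calls[0] >= self.window_seconds:
            calls.popleft()
        if len(calls) >= self.limit:
            return False
        calls.append(now)
        return True


def dispatch_round_robin(workers, principal, n_requests):
    """Send n_requests through the workers in turn; return how many were allowed."""
    turn = itertools.cycle(workers)
    return sum(next(turn).allow(principal, now=0.0) for _ in range(n_requests))


LIMIT, N_WORKERS = 5, 4  # "5 calls/hour", four workers
workers = [InMemoryRateLimiter(LIMIT, window_seconds=3600) for _ in range(N_WORKERS)]
allowed = dispatch_round_robin(workers, "agent-1", n_requests=100)
print(allowed)  # 20: the configured limit of 5, applied once per worker
```

Each limiter behaves correctly in isolation; the weakening comes only from the fact that no worker sees the calls counted by the others.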
Current evidence
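Per-process state lives in rate_limit.py (in-memory windows), tokens.py (_revoked, _principal_tokens), trace.py, handles.py, and firewall/budget_manager.py.
Token verification is process-independent (HMAC over payload); the mixed model (stateless verify, stateful everything else) is exactly what needs documenting.
No docs/ content covers multi-worker deployment; #142 ("[Testing] Document the concurrency model and add asyncio stress tests") covers asyncio concurrency within a process, not cross-process semantics (distinct scope, noted).
External context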
Per-process state divergence in scaled Python services is a well-known operational
class of issue; authorization systems conventionally document their consistency
model explicitly.
Proposed implementation
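Build a small reproduction: two kernel instances sharing a secret; demonstrate
(a) cross-process token verify succeeding, (b) revocation not propagating,
(c) rate-limit multiplication, (d) handle non-portability. Record results.
Write the consistency-model documentation: a table of each stateful component ×
multi-worker behavior × mitigation.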
Evaluate the long-term seam: persistence (#126), remote/sidecar kernel (ISSUE 44), or documented single-worker guidance for
high-assurance deployments; recommend sequencing.
[…] as an explicit requirement of #126, "[Feature] Pluggable persistence for TraceStore, HandleStore, and token revocation (SQLite + JSONL backends)").
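The reproduction in the first step could start from something like the sketch below. It is an illustration under stated assumptions, not the kernel's API: `ToyKernel` and all of its methods are hypothetical stand-ins, and two objects in one process model two workers (a real reproduction would use separate OS processes). It covers checks (a), (b), and (d); the rate-limit check (c) is left out to keep the sketch short.

```python
import hashlib
import hmac
import secrets


class ToyKernel:
    """Hypothetical stand-in for one worker's kernel; not the real kernel API."""

    def __init__(self, secret):
        self._secret = secret   # shared by every worker
        self._revoked = set()   # per-process revocation list
        self._handles = {}      # per-process handle store

    def _sign(self, payload):
        return hmac.new(self._secret, payload.encode(), hashlib.sha256).hexdigest()

    def issue_token(self, payload):
        return f"{payload}.{self._sign(payload)}"

    def verify_token(self, token):
        payload, _, signature = token.rpartition(".")
        # Stateless part: any worker holding the secret can check the signature.
        if not hmac.compare_digest(signature, self._sign(payload)):
            return False
        # Stateful part: only this worker's revocation list is consulted.
        return token not in self._revoked

    def revoke_token(self, token):
        self._revoked.add(token)

    def mint_handle(self, value):
        handle = secrets.token_hex(8)
        self._handles[handle] = value
        return handle

    def expand_handle(self, handle):
        return self._handles.get(handle)  # None when minted by another worker


secret = b"shared-secret"
worker_a, worker_b = ToyKernel(secret), ToyKernel(secret)

token = worker_a.issue_token("principal=agent-1")
results = {"cross_process_verify": worker_b.verify_token(token)}  # (a)

worker_a.revoke_token(token)
results["revoked_in_a"] = not worker_a.verify_token(token)
results["still_valid_in_b"] = worker_b.verify_token(token)  # (b)

handle = worker_a.mint_handle({"rows": 3})
results["handle_expands_in_a"] = worker_a.expand_handle(handle) is not None
results["handle_expands_in_b"] = worker_b.expand_handle(handle) is not None  # (d)

for name, value in results.items():
    print(f"{name}: {value}")
```

Under these assumptions every check except `handle_expands_in_b` reports `True`: the token verifies in both workers, stays valid in worker B after worker A revokes it, and the handle expands only where it was minted.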
AI-agent execution notes
docs/security.md (where guarantees are stated); issues #126 ("[Feature] Pluggable persistence for TraceStore, HandleStore, and token revocation (SQLite + JSONL backends)") and #142 ("[Testing] Document the concurrency model and add asyncio stress tests"), and the ISSUE 44 text.
Acceptance criteria
docs/ (security or deployment section).
Test plan
The reproduction script (runnable locally, documented); docs review. Run
make ci (unchanged).
Documentation plan
New deployment/consistency section in docs/security.md or a dedicated page;
CHANGELOG Added (docs).
Migration and compatibility notes
Investigation and documentation only; not expected to require migration.
Risks and tradeoffs
Documenting limits may slow some adoption decisions — but undocumented surprise
weakening of rate limits and revocation in production is strictly worse for trust.
Suggested labels
investigation, reliability, security, documentation