Investigate multi-process and multi-worker deployment semantics of in-memory kernel state #226

@dgenio

Summary

Investigate and document what actually happens when the kernel runs in N worker
processes (gunicorn/uvicorn workers, horizontally scaled containers): which
guarantees silently weaken (rate limits multiply by N, revocation doesn't
propagate, budgets/traces/handles fragment per process), and recommend the
mitigation architecture — documentation guardrails now, and the right long-term
seam (persistence #126, remote mode ISSUE 44, or sticky-state guidance).

Why this matters

Most production deployments of a Python service run multiple workers, and every
piece of kernel state is per-process: a "5 calls/hour" rate limit becomes 5×N, a
revoked token stays valid in the other N−1 workers until it expires, and handles
minted in one worker don't expand in another. HMAC tokens, being stateless, do
verify everywhere, which makes the asymmetry genuinely non-obvious. Today the docs
don't address this, so operators will discover it in production. An investigation
that produces precise documentation plus a recommended architecture is the cheapest
way to convert a latent footgun into a roadmap decision.
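
The 5×N arithmetic can be made concrete with a small sketch. It assumes requests are spread round-robin across workers and that each worker keeps its own sliding window; `InMemoryRateLimiter` and `admitted_calls` are hypothetical stand-ins written for this illustration, not the API of rate_limit.py.

```python
import itertools
from collections import deque


class InMemoryRateLimiter:
    """Hypothetical stand-in for a per-process sliding-window limiter."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._calls = deque()  # timestamps of admitted calls

    def allow(self, now: float) -> bool:
        # Drop timestamps that have aged out of the window.
        while self._calls and now - self._calls[0] >= self.window_seconds:
            self._calls.popleft()
        if len(self._calls) >= self.limit:
            return False
        self._calls.append(now)
        return True


def admitted_calls(workers: int, limit: int, attempts: int) -> int:
    """Send `attempts` calls round-robin to `workers` independent limiters.

    Each limiter models the state held by one worker process. All calls
    share one timestamp, so they fall inside a single one-hour window.
    """
    limiters = [InMemoryRateLimiter(limit, 3600.0) for _ in range(workers)]
    picker = itertools.cycle(limiters)
    return sum(next(picker).allow(0.0) for _ in range(attempts))


if __name__ == "__main__":
    print("1 worker :", admitted_calls(workers=1, limit=5, attempts=100))
    print("4 workers:", admitted_calls(workers=4, limit=5, attempts=100))
```

With one worker the configured limit holds; with four workers the same principal is admitted four times as often, because no worker sees the calls counted by the others.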

Current evidence

  • Per-process state: rate_limit.py (in-memory windows), tokens.py (_revoked, _principal_tokens), trace.py, handles.py, firewall/budget_manager.py.
  • Token verification is process-independent (HMAC over payload). This mixed model (stateless verify, stateful everything else) is exactly what needs documenting.
  • No docs/ content covers multi-worker deployment. #142 ([Testing] Document the concurrency model and add asyncio stress tests) covers asyncio concurrency within a process, not cross-process semantics, so its scope is distinct.
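
The mixed model described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions: the token format, `sign`, `verify`, and the `Worker` class are all hypothetical and do not reflect the actual layout of tokens.py. Only the standard-library `hmac`, `hashlib`, and `json` calls are real.

```python
import hashlib
import hmac
import json
from typing import Optional


def sign(secret: bytes, payload: dict) -> str:
    """Hypothetical token format: hex(payload JSON) + "." + HMAC-SHA256."""
    body = json.dumps(payload, sort_keys=True).encode()
    mac = hmac.new(secret, body, hashlib.sha256).hexdigest()
    return body.hex() + "." + mac


def verify(secret: bytes, token: str) -> Optional[dict]:
    """Stateless check: needs only the shared secret, no per-process state."""
    body_hex, _, mac = token.partition(".")
    try:
        body = bytes.fromhex(body_hex)
    except ValueError:
        return None
    expected = hmac.new(secret, body, hashlib.sha256).hexdigest()
    if not hmac.compare_digest(expected.encode(), mac.encode()):
        return None
    return json.loads(body)


class Worker:
    """Stand-in for one worker process with its own in-memory revocation set."""

    def __init__(self, secret: bytes) -> None:
        self.secret = secret
        self.revoked = set()  # lives only in this worker's memory

    def revoke(self, token: str) -> None:
        self.revoked.add(token)

    def accepts(self, token: str) -> bool:
        return verify(self.secret, token) is not None and token not in self.revoked


if __name__ == "__main__":
    secret = b"shared-secret"
    worker_a, worker_b = Worker(secret), Worker(secret)
    token = sign(secret, {"principal": "alice"})
    worker_a.revoke(token)
    print("worker A accepts:", worker_a.accepts(token))
    print("worker B accepts:", worker_b.accepts(token))
```

Verification succeeds in both workers because it depends only on the shared secret, while the revocation recorded in worker A is invisible to worker B.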

External context

Per-process state divergence in scaled Python services is a well-known class of
operational issue; authorization systems conventionally document their consistency
model explicitly.

Proposed implementation

  1. Build a small reproduction: two kernel instances sharing a secret; demonstrate
    (a) cross-process token verify succeeding, (b) revocation not propagating,
    (c) rate-limit multiplication, (d) handle non-portability. Record results.
  2. Write the consistency-model documentation: a table of each stateful component ×
    multi-worker behavior × mitigation.
  3. Evaluate mitigation options and their fit: shared persistence backends (#126,
    [Feature] Pluggable persistence for TraceStore, HandleStore, and token revocation
    (SQLite + JSONL backends)), remote/sidecar kernel (ISSUE 44), or documented
    single-worker guidance for high-assurance deployments. Recommend sequencing.
  4. File concrete follow-ups (e.g., "revocation propagation requires shared store"
    as an explicit requirement on #126).
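
As a starting point for step 1(d), handle non-portability might be reproduced along these lines. This is a sketch only: `HandleStore`, `mint`, `expand`, and `handle_portability` are hypothetical names chosen for the example, two in-process instances stand in for two worker processes, and the real handles.py may behave differently (for instance, in how it reports an unknown handle).

```python
import secrets


class HandleStore:
    """Hypothetical stand-in for a per-process handle table."""

    def __init__(self) -> None:
        self._data = {}  # handle -> stored value, in this process only

    def mint(self, value) -> str:
        handle = secrets.token_hex(8)
        self._data[handle] = value
        return handle

    def expand(self, handle: str):
        # Raises KeyError when the handle was minted by another instance.
        return self._data[handle]


def handle_portability() -> dict:
    """Mint a handle in one instance, try to expand it in both, record results."""
    worker_a, worker_b = HandleStore(), HandleStore()
    handle = worker_a.mint({"rows": 3})
    results = {
        "expands_in_minting_worker": worker_a.expand(handle) == {"rows": 3},
    }
    try:
        worker_b.expand(handle)
        results["expands_in_other_worker"] = True
    except KeyError:
        results["expands_in_other_worker"] = False
    return results


if __name__ == "__main__":
    for check, outcome in handle_portability().items():
        print(f"{check}: {outcome}")
```

Returning the outcomes as a dictionary keeps the "record results" part of step 1 mechanical: the same shape can hold the verify, revocation, and rate-limit checks, and feed the table in step 2.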

AI-agent execution notes

Acceptance criteria

Test plan

Run the reproduction script locally (it should be documented) and review the docs.
Run make ci (unchanged).

Documentation plan

New deployment/consistency section in docs/security.md or a dedicated page;
CHANGELOG Added (docs).

Migration and compatibility notes

Investigation and documentation only; not expected to require migration.

Risks and tradeoffs

Documenting limits may slow some adoption decisions — but undocumented surprise
weakening of rate limits and revocation in production is strictly worse for trust.

Suggested labels

investigation, reliability, security, documentation
