Skip to content

openab-agent: multi-vendor OAuth ADR + cross-process auth.json locking (codex + MCP)#1190

Open
brettchien wants to merge 18 commits into
openabdev:mainfrom
brettchien:docs/adr-openab-agent-oauth
Open

openab-agent: multi-vendor OAuth ADR + cross-process auth.json locking (codex + MCP)#1190
brettchien wants to merge 18 commits into
openabdev:mainfrom
brettchien:docs/adr-openab-agent-oauth

Conversation

@brettchien

Copy link
Copy Markdown
Contributor

What problem does this solve?

openab-agent's auth.json is a shared, multi-writer credential file with an unlocked read-modify-write, and openab-agent runs one process per Discord thread. So ordinary concurrent usage = concurrent processes refreshing the same OAuth token → refresh-token-rotation reuse → worst case OAuth 2.1 §10.4 token-family revocation = fleet-wide logout. API-key users never hit this; OAuth adoption (PR #1187, Anthropic) is what activates it.

This PR lands the ADR that sets the multi-vendor OAuth pattern (so every future provider stops hand-rolling its own flow) and the concurrency fix for the release-blocker-class storage race — for both the codex tenant and the MCP CredentialStore.

Closes #

Discord Discussion URL: https://discord.com/channels/1490643539682529300/1519372543377539173

At a Glance

auth.json  (multi-tenant: codex / anthropic-oauth / mcp:<server>×N, one process per Discord thread)

 (a) file integrity — ONE global lock          (b) refresh-token rotation — ONE lock per tenant
 ┌─ with_auth_locked(path, f) ──────────┐       ┌─ lock_tenant_refresh(path, tenant) ───────────┐
 │ flock EX  auth.json.global.lock       │       │ try-lock  auth.json.refresh.<tenant>.lock     │
 │ re-read → f(&mut map) → gc_pending →   │       │  + async backoff + 10s timeout (never wedge)  │
 │ atomic tmp+rename                      │       │ held ACROSS the network refresh               │
 └───────────────────────────────────────┘       └───────────────────────────────────────────────┘
   ▲ save_tokens (codex)                            ▲ get_valid_token / force_refresh (codex)
   ▲ McpCredentialStore::save / clear (MCP)         ▲ resolve_oauth_dial → get_access_token (MCP)
   = writers MERGE, never lost-update                = exactly ONE network refresh per tenant; the
                                                        lock-loser re-load()s the winner's token and
                                                        skips its own refresh (no RT_old reuse)

Prior Art & Industry Research

OpenClaw: API keys + subscription OAuth; stores per-profile {access,refresh,expires,accountId} and treats the profile file as a token sink refreshed under a file lock — direct corroboration for the locked-RMW decision.

Hermes Agent (NousResearch): PROVIDER_REGISTRY dataclasses declare each provider's auth type + URLs + env vars behind one resolve_runtime_provider() entry point; auth.json guarded with fcntl/msvcrt file locks — corroborates both the registry shape (OAuthVendor) and the file-lock decision.

Other references:

  • Pi (earendil-works/pi) — ported flow for feat(native-agent): Anthropic OAuth (Claude Pro/Max) login for openab-agent #1187; CLAUDE_CODE_OAUTH_TOKEN (#3591); Kiro/Cursor/xAI provider extensions.
  • rclonercloneEncryptedClientSecret + runtime obscure.MustReveal(): precedent for the ADR §9 Q2 encode-at-rest of a non-confidential bundled OAuth secret (scanner-evasion, explicitly not encryption).
  • Antigravity (agy) OAuth ecosystem — GitHub survey 2026-06-24 (≥326 repos hardcode the same client_id/secret; 99/107 sampled commit it plaintext) grounding the agy go/no-go + secret-storage decisions.

Proposed Solution

ADR (docs/adr/openab-agent-oauth.md): a two-axis model — auth (OAuthVendor descriptor + a shared driver on the official oauth2 crate) kept orthogonal to inference (one provider per wire format) — plus a 14-variant vendor feasibility matrix, the CLAUDE_CODE_OAUTH_TOKEN env route, the /auth two-step UX (per-user-keyed PKCE pending state), and the storage invariant below.

Implementation (this PR):

  • with_auth_locked — global flock(2) on an auth.json.global.lock sidecar across re-read → mutate → atomic write. All writers funnel through it (codex save_tokens + MCP McpCredentialStore::save/clear), so they merge onto the latest on-disk state instead of lost-updating.
  • lock_tenant_refresh — per-tenant flock (non-blocking try + async backoff + 10s timeout) held across the network refresh, with a double-checked re-read, so concurrent processes do exactly one real refresh per tenant. Codex uses it in get_valid_token/force_refresh; MCP uses it in resolve_oauth_dial around get_access_token().
  • flock(2) on sidecar files (kernel auto-releases on fd close / process death — no stale lock); libc::flock under cfg(unix), non-unix no-op.
  • ADR §7 pending-entry GC: created_at + a 15-min sweep inside with_auth_locked.

Why this approach?

  • Two locks, two hazards. The global lock fixes file lost-updates; the per-tenant lock fixes RTR revocation. A single global lock held across the network refresh would serialize one tenant's slow refresh in front of every other tenant (head-of-line blocking MCP servers / Codex), so refresh I/O is held under the per-tenant lock only.
  • flock(2) over a sentinel lockfile — the kernel releases it on process death, so a crashed holder never wedges the file (no orphan cleanup / manual timeout).
  • MCP single-flight without an rmcp fork. rmcp's CredentialStore exposes no pre-refresh hook, but openab drives the MCP refresh explicitly in resolve_oauth_dial; rmcp's get_access_token re-load()s auth.json and skips the network refresh when the token is already fresh, so the per-tenant lock + rmcp's own fresh-load gives cross-process single-flight with no extra double-check code.
  • Known limitation: rustix is not in-tree (the ADR text was optimistic), so this uses libc::flock directly.

Alternatives Considered

  • In-process Mutex / tokio single-flight — useless across the per-thread processes.
  • Sentinel lockfile (create→delete) — deadlocks if a holder dies; flock(2) auto-releases.
  • Refresh outside the lock, re-read on commit (an earlier ADR draft) — rejected: every process has already sent the refresh with the same RT_old before it commits, so RTR still fires.
  • Forcing everything through rmcp CredentialStore — lossy; the shared layer sits below both stores (file RMW).

Validation

Rust changes:

  • cargo check passes
  • cargo test passes (192 passed, 0 failed, 11 ignored) — incl. new with_auth_locked_merges_concurrent_tenants_no_lost_update and with_auth_locked_gcs_stale_pending_but_keeps_fresh_and_tokens
  • cargo clippy -- -D warnings clean
  • cargo fmt --check clean

All PRs:

  • Manual testing — ran the full gate from openab-agent/ (the crate is built standalone, matching ci-openab-agent.yml). Reviewed end-to-end with a second agent across two ADR-review rounds and two code-review rounds; all findings (RTR refresh race, per-user PKCE keying, pending GC, portable WouldBlock, MCP cross-process refresh) addressed and re-verified.

brettchien and others added 12 commits June 25, 2026 00:03
Proposed ADR for the openab-agent LLM-provider OAuth revamp: a two-axis
OAuthVendor adapter (auth flow vs inference transport), a cross-process
flock-guarded credential-store invariant for auth.json, the
CLAUDE_CODE_OAUTH_TOKEN env route, a 14-variant vendor feasibility matrix,
and the /auth (PR openabdev#1185) auth-trigger model. Surfaced while reviewing
PR openabdev#1187 (first OAuth vendor).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… names

- Build the OAuthVendor driver on the official `oauth2` crate (already in-tree
  via the MCP side) instead of a hand-rolled PKCE/exchange/refresh flow; the
  Anthropic JSON-token-body quirk is applied via the crate's custom http-client
  hook. Reframe §8 accordingly (hand-rolled flow is the rejected alternative).
- Remove the project Rollout-plan section (internal sequencing, not ADR
  material); keep the race-window mitigation in §5.4.
- Use vendor names only; drop internal fleet-agent references from the matrix.
- Replace unexplained "ToS-gray" with "ToS-risk" + a definition.
- Fix cross-references (crate-qualified paths, §-refs, line numbers) and move
  the settled model-default decision under "Decisions & open questions".

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…w r1)

Fold in two release-blocker-class concurrency fixes from Mira's review:

- §5.4 refresh-token rotation: the prior "refresh outside lock, re-read on
  commit" claim was wrong — N processes each send a refresh with the same
  RT_old before committing, tripping OAuth 2.1 §10.4 reuse -> token-family
  revocation. Replace with a per-tenant exclusive lock: network refresh held
  under the tenant lock only (not the global lock), so exactly one refresh per
  tenant per expiry with no head-of-line blocking across tenants. Extends to
  mcp:<server> tenants.
- §7 /auth: key the pending PKCE verifier+state by the initiating Discord user
  id instead of a single global entry — prevents concurrent-user overwrite
  (PKCE mismatch) and session hijack.
- §5.1 OAuthVendor::redirect() -> Option, since device flows have no redirect.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
- §6/§9 Q2: correct the stale "git-safe" claim on the gemini GOCSPX-
  secret. The value is non-confidential by RFC 8252 / Google docs, but
  GitHub now push-protects Google secrets by default (changelog 2026-03)
  and partner-scans them for auto-revoke, so a raw literal is not safe in a
  public repo. Decide: encode-at-rest (scanner-evasion for a non-secret, NOT
  a security control) or env-inject at runtime — not raw text.
- §7: pending PKCE entries get created_at + a 15-min GC sweep in
  with_auth_locked so abandoned /auth attempts don't accumulate.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…rett decisions r3)

- §9 Q2 DECIDED: env-injection (b) is the default for the gemini/agy bundled
  client_secret; encode-at-rest (a) is the fallback for bundled zero-config
  binaries only, framed as scanner-evasion (not security). Cite rclone
  rcloneEncryptedClientSecret + obscure.MustReveal() as the canonical
  precedent (§10).
- §9 Q3 DECIDED: GO gemini/grok (first wave) + agy (experimental, opt-in,
  ToS caveat — shares gemini's Code-Assist provider, residual risk is ToS not
  secret storage); No-Go cursor/kiro. Mirror as a build-decision line under §6.
- §10: add rclone + GitHub/Google secret-scanning references.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
GitHub survey 2026-06-24 grounds the agy GO decision:
- §6/§9 Q2: agy client_secret requirement CONFIRMED — it needs a GOCSPX-
  secret, a public constant >=20 antigravity-auth repos hardcode verbatim
  (NoeFabris/opencode-antigravity-auth, router-for-me/CLIProxyAPI, ...).
  Literal deliberately NOT pasted into the doc — would trip the very §9 Q2
  push-protection, so we dogfood the env/encode decision. Redirect confirmed
  localhost:51121/oauth-callback.
- §9 Q3: ecosystem evidence — agy OAuth is widely ported (opencode/pi/hermes/
  openclaw plugins + proxies), proving the integration, while the same
  ecosystem's anti-ban / quota-locking / multi-account-rotation tooling
  empirically confirms the ToS-ban + 429 risks, reinforcing the opt-in gate.
- §10: add the antigravity OAuth ecosystem reference set.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… in scope (r5)

Grounded by a codebase survey (2026-06-24):

- §4/§6: clarify "agy as a GO vendor" means a native OAuthVendor + Code-Assist
  inference provider — it does NOT run the agy CLI. agy speaks no ACP; the
  existing `antigravity` runtime variant (Mira/ECS) only works via a dedicated
  agy-acp adapter that shells out to the agy binary per prompt and polls its
  SQLite DB. The provider path sidesteps ACP entirely and supersedes the
  CLI-wrapper for native use, so agy's lack of ACP doesn't block the GO.
- §5.4/§9 Q4: make the MCP CredentialStore revamp explicitly in-scope. auth.json
  has NO lock today (only atomic rename); provider save_tokens (auth.rs:234) and
  McpCredentialStore::save/clear (auth.rs:284-328) are two independent unlocked
  RMW callers, so with_auth_locked must wrap BOTH or the race persists. Also
  correct the stale save_tokens_for name and note with_auth_locked is new.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
The §9 Q4 tail referenced 'openab-agent-mcp.md open items openabdev#1 (reqwest
0.12/0.13 split) and openabdev#8 (doctor/runtime two-store split)' — but that ADR's
§10 Open Questions has only two items (mcp.json location; native-vs-broker
parity), neither matching, and no such numbered items / terms exist anywhere
in it. The phantom reference dated to the original draft. Replace with an
accurate statement: McpCredentialStore reuses the same TokenStore/auth.json
storage (openab-agent-mcp.md §6.1), so the lock lands once and serves both;
the reqwest version conflict is the rmcp-OAuth dependency issue surfaced on
the feat/openab-agent-mcp-resilience PR, not an mcp-ADR open item.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ive (Brett)

Brett's call given the 93%-plaintext ecosystem survey: keep the bundled
zero-config UX via encode-at-rest (obscure, rclone obscure.MustReveal style)
as the DEFAULT, with env-injection as the alternative for fleet/pod. Reverses
the prior (b)-default ordering. Framing unchanged: encode-at-rest is
scanner-evasion for a non-confidential value, not a security control. Record
the survey numbers (99/107 plaintext; auto-revoke largely unrealized) and that
the real risk mitigated is org-repo push-protection friction, not credential
loss.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Implements the ADR §5.4 invariant. auth.json had NO lock: provider
save_tokens and McpCredentialStore::save/clear each did an independent
unlocked read-modify-write, so a concurrent codex refresh and MCP save
last-writer-wins the whole map, and N processes (one openab-agent per Discord
thread) could each refresh the codex token with the same RT_old → OAuth 2.1
§10.4 token-family revocation (fleet-wide logout).

Two locks, flock(2) on sidecar files (kernel auto-releases on death), cfg(unix)
with a non-unix no-op:

- with_auth_locked: global exclusive lock across the re-read -> mutate ->
  atomic-write. ALL writers funnel through it — save_tokens (codex) and
  McpCredentialStore::save/clear (MCP) — so writers merge onto the latest
  on-disk state instead of lost-updating. Held only for the fast file RMW,
  never across network I/O.
- lock_tenant_refresh: per-tenant refresh serialisation. get_valid_token /
  force_refresh take the codex tenant lock (non-blocking try + async backoff +
  10s timeout, held across the network refresh), with a double-checked re-read
  so a process that waited adopts the token another already refreshed → exactly
  one real refresh per tenant per expiry, no RT_old reuse.

Uses libc::flock (already a cfg(unix) dep); rustix is NOT in-tree (ADR text was
optimistic). New test asserts the locked RMW merges codex + MCP tenants without
lost-update. fmt + clippy -D warnings + test (191 passed) green.

Known gap (for review): the MCP *network* refresh is owned by rmcp's
AuthorizationManager (it refreshes then calls CredentialStore::save), so MCP
writes get the file-integrity lock but MCP refreshes are not yet tenant-
serialised. Closing that needs an rmcp-level hook — proposed as follow-up.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Round-2 code review (Mira):
- §7 pending-entry GC: add created_at to PendingPasteLogin and sweep
  AuthEntry::Pending older than 15 min inside with_auth_locked on every write,
  so abandoned /auth two-step attempts don't accumulate. PendingPasteLogin is
  currently a legacy tombstone with no live writer (PKCE state lives in rmcp's
  in-memory StateStore); the field + GC land now per ADR §7 and are
  forward-compatible with the forthcoming /auth two-step flow, and meanwhile
  sweep legacy stray entries (created_at default 0 reads as ancient). New test
  covers stale-swept / fresh-kept / real-tenant-untouched.
- flock_try_exclusive: match std::io::ErrorKind::WouldBlock instead of a single
  raw EWOULDBLOCK errno, covering EAGAIN/EWOULDBLOCK across libc/BSD.
- MCP network-refresh serialisation stays a follow-up (rmcp owns the refresh) —
  Mira concurs.

fmt + clippy -D warnings + test (192 passed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
… gap)

Closes the known gap from f370110: MCP refreshes now get the same per-tenant
serialisation as codex.

rmcp's CredentialStore exposes no pre-refresh hook, but openab drives the MCP
refresh explicitly in McpRuntimeManager::resolve_oauth_dial via
client.get_access_token(). Wrap that call (per-server) with the existing
auth::lock_tenant_refresh so only one process refreshes a given server at a
time. No explicit double-check needed: rmcp's get_access_token re-load()s
auth.json each call and returns the cached token without a network refresh when
remaining >= REFRESH_BUFFER (rmcp auth.rs:1238), so a process that loses the
race adopts the token the winner wrote to the shared file — no second RT_old
presentation, no OAuth 2.1 §10.4 family revocation. rmcp already single-flights
within one process via its AuthorizationManager Mutex; this closes the
cross-process gap.

- auth.rs: lock_tenant_refresh + AuthFileLock made pub(crate) for the mcp module.
- ADR §5.4: document the resolve_oauth_dial serialisation point.

fmt + clippy -D warnings + test (192 passed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@brettchien brettchien requested a review from thepagent as a code owner June 24, 2026 19:53
…h fd

Quality-only cleanups from a 4-angle /simplify pass (no behaviour change):
- Extract lock_global(path) -> Result<Option<AuthFileLock>> and route both
  with_auth_locked and McpCredentialStore::clear through it, so the "global"
  sidecar name + the cfg(unix) acquire live in one place instead of two
  copy-pasted blocks. clear flattens (drop the inner run closure).
- lock_tenant_refresh opens the lock fd once and re-issues flock on it each
  retry, instead of re-opening (and re-create_dir_all-ing) the file every
  100ms under contention. Removes the now-unused flock_try_exclusive helper.
- Document lock_tenant_refresh's double-check contract (the lock only
  serialises; callers must re-check freshness after acquiring).

fmt + clippy -D warnings + test (192 passed) green.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@chaodu-agent

This comment has been minimized.

brettchien and others added 5 commits June 25, 2026 04:07
Doc-only clarifications from a /code-review pass (no behaviour change):
- PendingPasteLogin.created_at: warn loudly that any writer of a fresh Pending
  entry MUST stamp created_at, else gc_stale_pending sweeps it on the next
  locked write (latent footgun for the forthcoming /auth two-step writer).
- lock_tenant_refresh: correct the contract — reuse-safety comes from loading
  the refresh token INSIDE the lock (so force_refresh, which always refreshes
  on a 401, is reuse-safe too); the post-lock expiry re-check is only an
  optimisation to skip a redundant refresh.

Review also surfaced a pre-existing, out-of-scope namespacing gap (an MCP
server literally named "codex" collides with the codex tenant's auth.json key
and refresh lock — committed MCP creds use the bare server name, not
mcp:<server> as ADR §5.4 describes) — flagged for a separate change.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ci-openab-agent.yml runs cargo fmt/clippy/test/build with
working-directory: openab-agent, but the crate is not a member of the parent
openab workspace (members = crates/openab-core, openab-gateway) and was not
excluded, so cargo errors 'current package believes it's in a workspace when
it's not' and every step fails before doing any work. openab-agent is
intentionally standalone (own version + dual reqwest 0.12/0.13 for rmcp), so
the correct fix is an empty [workspace] table making it its own root.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
ci-openab-agent.yml ran cargo from working-directory: openab-agent without the
[workspace] table the crate needs (it's standalone, not a parent-workspace
member), so every step failed with 'believes it's in a workspace when it's
not' — the workflow had been red independently of any PR. Dockerfile.unified
already works around this by appending the table at build time; replicate that
in the workflow rather than committing it to Cargo.toml (which would
double-append in the Dockerfile and break the image build).

Also harden the ACP smoke test: build the release binary in its own step and
bump the response timeout 5s -> 30s so a loaded runner doesn't flake the
agent's first-response window (the binary itself responds in <1s locally,
verified incl. a clean-HOME release build).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…osis

The smoke test produces empty stdout only in CI (the binary responds with
agentInfo in <1s locally across debug/release/clean-HOME/CI-env-var runs).
Capture stderr + exit code so the next CI run reveals why stdout is empty.
@chaodu-agent

Copy link
Copy Markdown
Collaborator

LGTM ✅ — Well-designed cross-process locking for auth.json with thorough ADR documentation and correct concurrency semantics.

What This PR Does

Introduces a multi-vendor OAuth Architecture Decision Record and implements cross-process auth.json locking to prevent OAuth 2.1 §10.4 token-family revocation caused by concurrent refresh-token rotation across openab-agent's one-process-per-Discord-thread model.

How It Works

Two-lock architecture: (a) a global flock(2) sidecar lock (auth.json.global.lock) serialises all read-modify-write operations to prevent lost updates, and (b) per-tenant refresh locks (auth.json.refresh.<tenant>.lock) serialise network refreshes so only one process presents RT_old per expiry cycle. Both codex save_tokens and MCP McpCredentialStore funnel through the same global lock. The per-tenant lock uses non-blocking try + async backoff with a 10s timeout to avoid wedging.

Findings

# Severity Finding Location
1 🟢 Correct two-lock separation — global for file integrity, per-tenant for refresh serialisation avoids head-of-line blocking auth.rs:227-400
2 🟢 Double-check pattern after acquiring tenant lock prevents redundant refreshes auth.rs:530-538
3 🟢 MCP integration leverages rmcp's own re-load() as implicit double-check — elegant zero-code single-flight mcp/runtime.rs:634-651
4 🟢 Pending-entry GC with created_at timestamp prevents stale PKCE verifier accumulation auth.rs:309-320
5 🟢 Comprehensive ADR covering prior art, vendor matrix, rejected alternatives, and open questions with clear decisions docs/adr/openab-agent-oauth.md
6 🟢 Test coverage for both the merge-not-clobber invariant and the GC sweep logic auth.rs:1306-1372
Finding Details

🟢 F1: Two-lock design

The global lock (with_auth_locked) serialises only the fast in-memory mutate + atomic write — no network I/O inside. The per-tenant lock (lock_tenant_refresh) serialises the slow network refresh per-tenant only. This avoids head-of-line blocking where a stalled refresh for one tenant would block all other writers.

🟢 F2: Double-check after tenant lock acquisition

get_valid_token re-loads and checks expiry after acquiring the tenant lock, so a process that waited while another refreshed skips the redundant network call. Correct single-flight pattern.

🟢 F3: MCP cross-process single-flight without rmcp fork

rmcp's get_access_token re-load()s auth.json and returns early when the token is already fresh. Combined with the per-server tenant lock, this gives cross-process single-flight with no additional double-check code in openab — clean integration.

🟢 F4: Pending-entry GC

gc_stale_pending sweeps abandoned PKCE pending entries (>15 min) on every locked write. created_at = 0 (legacy) is treated as ancient and swept — safe backward compat. Fresh pending entries are preserved.

🟢 F5: ADR quality

The ADR documents: (1) the two-axis auth/inference model, (2) a 14-variant vendor feasibility matrix with concrete OAuth values, (3) the concurrency invariant with pseudocode, (4) prior-art survey (OpenClaw, Hermes, Pi, rclone), (5) rejected alternatives with clear rationale, (6) explicit decisions with owner attribution.

🟢 F6: Test coverage

Two focused tests verify the core invariants: with_auth_locked_merges_concurrent_tenants_no_lost_update confirms the merge-not-clobber semantics, with_auth_locked_gcs_stale_pending_but_keeps_fresh_and_tokens confirms GC correctness.

Baseline Check
  • PR opened: 2026-06-24
  • Main already has: Unlocked save_tokens + McpCredentialStore RMW, atomic tmp+rename write, PendingPasteLogin struct (without created_at)
  • Net-new value: (1) Cross-process file locking via flock(2) sidecar — with_auth_locked funnel + lock_tenant_refresh per-tenant serialisation. (2) Pending-entry GC with TTL. (3) Comprehensive multi-vendor OAuth ADR. (4) CI workflow fix for standalone crate workspace resolution.
What's Good (🟢)
  • Two-lock design correctly separates file-integrity (fast, no I/O) from refresh-serialisation (slow, network I/O)
  • flock(2) on sidecar files: kernel auto-releases on process death — no stale locks, no orphan cleanup
  • clear() acquires lock_global directly rather than using with_auth_locked — correct for delete-on-empty semantics
  • WouldBlock handling in lock_tenant_refresh gracefully degrades (proceeds unserialised) rather than failing hard
  • Refresh-token splice logic now executes inside the lock, closing the race window
  • CI fix (printf '\n[workspace]\n' >> Cargo.toml) mirrors Dockerfile.unified approach — correct standalone crate resolution
  • ACP smoke test improvements: captures stderr separately, extends timeout to 30s, adds diagnostics on failure

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants