From f4034d8be66e0f077c2587d39c890614338957d0 Mon Sep 17 00:00:00 2001
From: "Yoshiaki Ueda (bootjp)" <contact@bootjp.me>
Date: Fri, 22 May 2026 16:01:27 +0900
Subject: [PATCH 1/2] docs/design: rename 2026_05_16 admin purge queue proposal
 to implemented

Phase 6 of the rollout plan: all milestones in
docs/design/2026_05_16_*_admin_purge_queue.md have shipped on main:

  Phase 2 (#771) - AdminPurgeQueue + IsDLQ/DLQSources
  Phase 3 (#794) - AdminPeekQueue backend
  Phase 4 (#797) - HTTP handler + bridge
  Phase 5 (#798) - SPA Messages tab + Purge button + DLQ chips

Rename the file via git mv per CLAUDE.md's lifecycle rule
(*_proposed_*.md to *_implemented_*.md), bump Status, and add an
Implementation history table so a future reader can trace each
phase back to its PR.

Out-of-scope follow-ups (throttle integration, audit + Prometheus
counters, principalForReadSensitive live re-check, page-size
selector) are listed in the doc but are tracked separately and do
not gate this rename.
---
 ...026_05_16_implemented_admin_purge_queue.md} | 18 +++++++++++++++++-
 1 file changed, 17 insertions(+), 1 deletion(-)
 rename docs/design/{2026_05_16_proposed_admin_purge_queue.md => 2026_05_16_implemented_admin_purge_queue.md} (98%)

diff --git a/docs/design/2026_05_16_proposed_admin_purge_queue.md b/docs/design/2026_05_16_implemented_admin_purge_queue.md
similarity index 98%
rename from docs/design/2026_05_16_proposed_admin_purge_queue.md
rename to docs/design/2026_05_16_implemented_admin_purge_queue.md
index 48cd9759..73f1352a 100644
--- a/docs/design/2026_05_16_proposed_admin_purge_queue.md
+++ b/docs/design/2026_05_16_implemented_admin_purge_queue.md
@@ -1,9 +1,25 @@
 # Admin Queue Peek and Purge (DLQ-Aware) for the SQS Web Console
 
-**Status:** Proposed
+**Status:** Implemented
 **Author:** bootjp
 **Date:** 2026-05-16
 
+## Implementation history
+
+| Phase | PR | Landed |
+|-------|-----|--------|
+| 1 (this doc, proposal) | #757 | 2026-05-16 |
+| 2 (`AdminPurgeQueue` + `IsDLQ`/`DLQSources`) | #771 | 2026-05-17 |
+| 3 (`AdminPeekQueue` backend) | #794 | 2026-05-20 |
+| 4 (HTTP handler + bridge) | #797 | 2026-05-21 |
+| 5 (SPA Messages tab + Purge button + DLQ chips) | #798 | 2026-05-21 |
+
+Out-of-scope follow-ups (tracked separately, not gating this rename):
+- Throttle integration (`bucketActionAdminPeek` + dedicated per-queue admin-peek bucket per §3.1)
+- Audit logging + Prometheus counters per §3.6
+- `principalForReadSensitive` live `RoleStore` re-check (Goal 8, blocked on wider RoleStore plumbing)
+- Page-size selector (20 / 50 / 100) + response-size warning
+
 ---
 
 ## 1. Background and Motivation

From d723f516c8761007927fe9734f36668f34986777 Mon Sep 17 00:00:00 2001
From: "Yoshiaki Ueda (bootjp)" <contact@bootjp.me>
Date: Sat, 23 May 2026 01:41:03 +0900
Subject: [PATCH 2/2] docs/design: flag out-of-scope sections inline (r1
 review)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Gemini r1 on PR #800 flagged that the Status: Implemented banner

and the Out-of-scope follow-ups list at the top were not aligned

with the body: §2.1 Goal 8 (live RoleStore re-check), §3.1

Throttle, §3.5 page-size selector, and §3.6 audit + Prometheus

counters still read like implemented features.

Add inline 'not yet implemented' markers at the entry of each

deferred section, with a one-line mitigation note so a future

reader (a) understands what is missing, (b) sees what the

current behaviour falls back to, and (c) can cross-reference

the top-of-doc follow-ups list. Body text remains unchanged

below the marker so the future-implementer can pick up the

design intent verbatim.

Claude r1's two informational notes (skipped _partial_

lifecycle step, Status header bold-format vs README template)

are noted but not actioned: the _partial_ step was a

process miss during the Phase 2-5 reviews that does not change

the final state, and the Status format is pre-existing across

the whole docs/design/ tree (Implemented and Proposed all use

bold).
---
 docs/design/2026_05_16_implemented_admin_purge_queue.md | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/docs/design/2026_05_16_implemented_admin_purge_queue.md b/docs/design/2026_05_16_implemented_admin_purge_queue.md
index 73f1352a..b5b05299 100644
--- a/docs/design/2026_05_16_implemented_admin_purge_queue.md
+++ b/docs/design/2026_05_16_implemented_admin_purge_queue.md
@@ -58,7 +58,7 @@ Both features work for any queue. The **UI is DLQ-aware** so the operator gets t
    - `DLQSources []string` — the source-queue names that point at this queue. The SPA renders these as a chip list on the detail page so the operator can confirm they understand what queue feeds the DLQ before purging.
 6. **Same AWS-shaped error mapping** as the SigV4 path — purging more than once per 60 seconds returns the SQS `PurgeQueueInProgress` semantics that `tryPurgeQueueOnce` already enforces. The admin response surfaces it as a structured `429 Too Many Requests` JSON payload (`{"code":"PurgeQueueInProgress", "retry_after_seconds":N}`).
 7. **Audit** — `admin.sqs.purge_queue` (subject, role, queue, generation_before, generation_after). Peek is a read and does NOT generate an audit line per call (the SPA polls; per-poll audit would drown the log) but the admin handler emits the standard request-log line with `route` / `subject` / `status_code` so the call is still traceable.
-8. **Read-only role can peek but not purge.** Peek is gated by a **live `RoleStore` re-check** (not just the session-auth gate that List / Describe currently use), introduced as a new `principalForReadSensitive` helper alongside the existing `principalForWrite`. Purge stays gated by `principalForWrite` (live-role re-check), matching `AdminDeleteQueue` exactly. Codex r9 P1 flagged the security gap in the earlier draft: peek exposes full message bodies / attributes (not just metadata like List / Describe), so a session JWT that was revoked or whose role was downgraded after login could still read DLQ payloads via peek until the token's natural 1-hour TTL elapsed. The new `principalForReadSensitive` helper performs the same revocation check `principalForWrite` does, but classifies the call as a read in the audit pipeline — keeping the audit shape parallel to List / Describe while closing the confidentiality gap. List / Describe themselves remain on the session-auth-only gate because their output is metadata that is already shown on the SPA's queue list page; the divergence is intentional and is documented at the call site so a future reviewer does not "fix" the inconsistency by downgrading peek's gate. Claude r2 caught the earlier draft that implied a non-existent `principalForRead` helper; this paragraph spells out the actual gate with the security-class distinction.
+8. **Read-only role can peek but not purge.** _The live `RoleStore` re-check is NOT yet implemented in the initial rollout — see "Out-of-scope follow-ups" at the top. Phase 4 shipped a `Role.AllowsRead()` gate that accepts the JWT-embedded role plus an optional RoleStore lookup; the wider live-revalidation plumbing the design calls for is blocked on a broader RoleStore refactor that affects every adapter's read path. Mitigation in absence: a revoked / downgraded key keeps peek access until the session JWT's natural 1-hour TTL expires._ Peek is gated by a **live `RoleStore` re-check** (not just the session-auth gate that List / Describe currently use), introduced as a new `principalForReadSensitive` helper alongside the existing `principalForWrite`. Purge stays gated by `principalForWrite` (live-role re-check), matching `AdminDeleteQueue` exactly. Codex r9 P1 flagged the security gap in the earlier draft: peek exposes full message bodies / attributes (not just metadata like List / Describe), so a session JWT that was revoked or whose role was downgraded after login could still read DLQ payloads via peek until the token's natural 1-hour TTL elapsed. The new `principalForReadSensitive` helper performs the same revocation check `principalForWrite` does, but classifies the call as a read in the audit pipeline — keeping the audit shape parallel to List / Describe while closing the confidentiality gap. List / Describe themselves remain on the session-auth-only gate because their output is metadata that is already shown on the SPA's queue list page; the divergence is intentional and is documented at the call site so a future reviewer does not "fix" the inconsistency by downgrading peek's gate. Claude r2 caught the earlier draft that implied a non-existent `principalForRead` helper; this paragraph spells out the actual gate with the security-class distinction.
 
 ### 2.2 Non-Goals
 
@@ -188,7 +188,7 @@ The walk terminates when either `Partition` advances back to `StartPartition` (f
 
 Cost is `O(Limit)` round-trips against Pebble at peek time — tiny for the bounded result sets the SPA uses. The bound on `Limit` (max 100) prevents an operator script from accidentally issuing million-row peeks against the leader.
 
-**Throttle.** Peek consults a **distinct per-queue admin-peek bucket**, *not* the per-queue `ReceiveMessage` budget. An earlier draft of this design merged the two; Claude r2 flagged that an operator paginating through a 10k-message DLQ could exhaust the budget that real consumers depend on. The separate admin-peek bucket defaults to a lower steady-rate (`adminPeekRPS = 5`, `adminPeekBurst = 20`) so a pagination loop cannot starve consumers.
+**Throttle.** _Not yet implemented in the initial rollout — see "Out-of-scope follow-ups" at the top. Mitigation in absence: the Phase 3 implementation enforces a hard `Limit ≤ 100` per call and leader-only execution, which bounds the steady-state cost; per-queue throttling lands when the SPA wiring needs the rate-limit metric to have a real consumer._ Peek consults a **distinct per-queue admin-peek bucket**, *not* the per-queue `ReceiveMessage` budget. An earlier draft of this design merged the two; Claude r2 flagged that an operator paginating through a 10k-message DLQ could exhaust the budget that real consumers depend on. The separate admin-peek bucket defaults to a lower steady-rate (`adminPeekRPS = 5`, `adminPeekBurst = 20`) so a pagination loop cannot starve consumers.
 
 **Bucket key format.** The existing `bucketStore` (`adapter/sqs_throttle.go`) keys on a struct `bucketKey{queue, action, incarnation}`, not a string. The admin-peek bucket therefore uses `bucketStore.charge()` directly with `action = bucketActionAdminPeek` and the queue's current incarnation, exactly like the `SendMessage` / `ReceiveMessage` paths do. Claude r4 flagged an earlier draft that described the bucket as a free-standing string-keyed map; that would have required parallel rate-limiter infrastructure and would not have been swept by `invalidateQueue()` on queue re-creation. The `bucketStore.charge(adminPeekThrottle, queueName, bucketActionAdminPeek, meta.Incarnation, 1)` form participates in the existing incarnation reset machinery automatically.
 
@@ -446,7 +446,7 @@ The queue detail page gains two new pieces of UI on top of the existing attribut
 | Body preview      | `body` (already truncated by backend)               | first 96 chars; "…" suffix when `body_truncated`. Row click opens detail modal.                                                    |
 | Size              | `body_original_size`                                | human-readable ("1.4 kB") so operators can spot oversized messages                                                                 |
 
-Below the table: a page-size selector (20 / 50 / 100), a Refresh button, and Next / Previous controls driven by the cursor. Detail modal shows full body + every attribute + the timestamps; a "Copy as JSON" button copies the row's full record to the clipboard for manual export.
+Below the table: _the page-size selector (20 / 50 / 100) is NOT yet implemented — see "Out-of-scope follow-ups" at the top. Phase 5 shipped a hard default of 20 rows; the worst-case response (20 × 256 KiB = 5 MiB) stays well under network / JSON-parse budgets even without operator-tunable sizes. Selector + size warning land in a follow-up if operators ask for larger pages._ A Refresh button, and Next / Previous controls driven by the cursor. Detail modal shows full body + every attribute + the timestamps; a "Copy as JSON" button copies the row's full record to the clipboard for manual export.
 
 **Copy as JSON payload schema.** The clipboard payload is the exact wire shape of a single `AdminPeekedMessage` entry (top-level keys: `message_id`, `body`, `body_truncated`, `body_original_size`, `sent_timestamp`, `receive_count`, `group_id`, `deduplication_id`, `attributes`) plus a wrapper `{"schema_version": 1, "queue": "<name>", "exported_at": "<ISO8601>", "message": { … }}`. The `schema_version` is what downstream tooling pins so a future change to the export format (e.g. multi-message JSONL bundle) does not silently break exporters. Operator workflows that pipe this into a recovery tool can rely on the schema not shifting under them.
 
@@ -491,6 +491,8 @@ mirroring the existing `deleteQueue` / `describeQueue` shape. `peekQueue` is `si
 
 ### 3.6 Audit and observability
 
+_Not yet implemented in the initial rollout — see "Out-of-scope follow-ups" at the top. Mitigation in absence: the admin handler still emits the standard request-log line with `route` / `subject` / `status_code` for both purge and peek calls, so an operator can correlate "who did what when" against the application logs at audit-review time. The structured `admin.sqs.purge_queue` audit line and the two Prometheus counters land alongside the SPA wiring so the metrics have a real consumer._
+
 New structured log line at `slog.Info` level (matches `AdminDeleteQueue`):
 
 ```