Skip to content

adapter/sqs: AdminPeekQueue backend (Phase 3)#794

Merged
bootjp merged 5 commits into
mainfrom
feat/sqs-admin-peek-queue
May 21, 2026
Merged

adapter/sqs: AdminPeekQueue backend (Phase 3)#794
bootjp merged 5 commits into
mainfrom
feat/sqs-admin-peek-queue

Conversation

@bootjp
Copy link
Copy Markdown
Owner

@bootjp bootjp commented May 20, 2026

Summary

Phase 3 of docs/design/2026_05_16_proposed_admin_purge_queue.md: backend AdminPeekQueue method on *adapter.SQSServer.

  • Returns up to opts.Limit currently-visible messages (visible_at <= now); pure read, no receive-count bump, no receipt-handle minting, no visibility-timer change.
  • Partitioned FIFO support via rotated sequential scanning across partitions, reusing the receive path's receiveFanoutCounters for fairness.
  • Versioned JSON + base64url cursor pins Generation, StartPartition, Partition, and LastKey; stale-generation cursors are rejected so a purge between pages forces the SPA to refresh from the front.
  • AdminRole.canRead() — peek requires non-zero role; read-only principals can triage, purge stays on canWrite().
  • Pre-existing goconst regression in sqs_messages.go (the validator switch's inline "String" / "Number" literals) fixed by extracting sqsAttributeBaseTypeString / sqsAttributeBaseTypeNumber alongside the existing sqsAttributeBaseTypeBinary.

Caller audit (semantic-change rule): no existing function signatures changed. canRead is a new method; AdminPeekQueue is a new entrypoint. The validator switch's constant extraction is a literal-for-named rename (byte-identical wire behaviour).

Behavior change

  • New AdminPeekQueue admin method. Not yet wired to an HTTP handler (Phase 4) — this PR only adds the adapter surface.
  • No change to SigV4 paths, AdminPurgeQueue, or AdminDescribeQueue.

Risk

Low. Pure-read entry; the only writes I touch are the constant rename in sqs_messages.go (byte-identical to the prior inline literals). The cursor codec is hard-capped at 512 bytes after encoding; oversize cursors surface as ErrAdminSQSValidation from the decoder.

Self-review (5 passes)

  1. Data loss — Read-only entry. The constant extraction in sqs_messages.go is a literal-for-named rename with no behaviour change.
  2. Concurrency — Leader-only via isVerifiedSQSLeader. Single nextTxnReadTS threaded into loadQueueMetaAt, ScanAt, and per-row GetAt, so the page is a consistent MVCC snapshot. The cursor's stored Generation is checked against the live meta on every continuation page — a purge between pages produces ErrAdminSQSValidation rather than rows from a purged generation.
  3. PerformanceO(opts.Limit) ScanAt round-trips + per-row GetAt. Hard caps: Limit <= 100, body <= 256 KiB. Cursor is bounded at 512 bytes. No hot-path allocations.
  4. Consistency — Cursor encodes Generation; mismatch surfaces as ErrAdminSQSValidation. Partition rotation uses the same receiveFanoutCounters the receive path uses, so peek's partition fairness matches the data plane's. Data records are read via sqsMsgDataKeyDispatch, so partitioned FIFO data keys are routed to the same partition the vis-index entry was found under.
  5. Test coverage — 22 new tests (happy path / no-bump-on-peek / truncation / Limit clamp / cursor round-trip / stale-gen / four malformed-cursor classes / read-only allowed / role-less denied / missing queue / empty name / delayed-message hidden / FIFO attribute projection / typed MessageAttribute (String+Binary) round-trip / partitioned FIFO pagination / cursor codec round-trip+empty+oversize / preparePeekCursor fresh+stale / clampPeekLimit+clampPeekBodyBytes truth tables / projectPeekedAttributes nil-safety / canRead truth table). go test -race clean.

Test plan

  • go test -race -count=1 ./adapter/... (targeted: TestAdminPeekQueue*, TestPeekCursorCodec*, TestClampPeek*, TestPreparePeekCursor*, TestEncodePeekCursor*, TestProjectPeekedAttributes*, TestAdminRole_CanRead, TestAdminPurge*, TestAdminQueueSummary*, TestSQSServer_PartitionedFIFO*, TestSQSServer_Send*)
  • make lint clean on ./adapter/... (0 issues)
  • CI on this PR

Out of scope (follow-ups)

  • Throttle integration: dedicated per-queue admin-peek bucket (bucketActionAdminPeek + resolveActionConfig explicit case + *adminPeekThrottledError + *PeekThrottledError). Leader-only + Limit <= 100 already bound the steady-state cost; the dedicated bucket adds a per-queue cap that should land alongside the SPA wiring so the metric has a real consumer.
  • Phase 4: HTTP handler at GET /admin/api/v1/sqs/queues/{name}/messages, bridge in main_admin.go, integration tests.
  • principalForReadSensitive live RoleStore re-check (design doc Goal 8): blocked on the wider RoleStore plumbing — neither AdminPurgeQueue nor AdminDeleteQueue do a live re-check today; peek will inherit that pattern when the broader work lands.

Summary by CodeRabbit

  • New Features
    • Admin API to peek SQS queue messages without modifying them: pagination with continuation tokens, configurable body truncation while reporting original size, full message attribute visibility, supports standard and FIFO queues.
  • Bug Fix / Permissions
    • Read-only admin roles are now allowed to perform non-destructive admin reads (peek).
  • Polish
    • Improved message-attribute typing/handling for more reliable attribute projection.

Review Change Stack

Phase 3 of the Admin Queue Peek and Purge design
(docs/design/2026_05_16_proposed_admin_purge_queue.md). Implements
the read-only counterpart to AdminPurgeQueue:

  AdminPeekQueue(ctx, principal, name, opts)
    ([]AdminPeekedMessage, nextCursor string, error)

Peek walks the leader's visibility index at the next read timestamp,
returns up to opts.Limit currently-visible messages (visible_at
less than or equal to now), and bumps NO receive counts. No receipt
handle is minted; no visibility timer is started. Pure read.

New surface:
  - AdminPeekedAttribute mirrors the typed SQS MessageAttribute
    shape so binary payloads and the DataType discriminator survive
    the SPA round-trip (Codex r11 on the design doc).
  - AdminPeekedMessage carries body (truncated per BodyMaxBytes),
    BodyTruncated, BodyOriginalSize, SentTimestamp, ReceiveCount,
    GroupID, DeduplicationID, and the typed attribute map.
  - AdminPeekMessageOptions{Limit, Cursor, BodyMaxBytes} with
    documented defaults (20, "", 4096) and bounds
    ([1, 100], 512-byte cursor cap, [256, 262144] body cap).
  - peekCursor (versioned JSON, base64url-wrapped) so the SPA can
    page across partition boundaries deterministically. Cursor
    pins Generation so a purge between pages forces the SPA to
    refresh from the front (ErrAdminSQSValidation rather than
    silently surfacing rows from a purged generation).
  - AdminRole.canRead() - peek requires non-zero role; both
    read-only and full principals satisfy the gate. Purge stays
    on canWrite() so role-only principals can triage but cannot
    drop the queue.

Partitioned FIFO support: peek uses rotated sequential scanning
over partitions, picking the start partition via the same
receiveFanoutCounters the receive path uses for fairness. The
cursor encodes both StartPartition (for the wrap detection) and
the current Partition + LastKey so subsequent pages continue
deterministically.

Caller audit (semantic-change rule): no existing function
signatures changed. canRead is a new method; AdminPeekQueue is a
new entrypoint. The sqsAttributeBaseType{String,Number} constant
extraction in sqs_messages.go fixes a pre-existing goconst
regression that the larger test surface tripped; the validator
switch now reads canonical constants instead of inline literals.

Tests (adapter/sqs_admin_peek_test.go) cover:
  - happy path with body / MessageID / SentTimestamp assertions
  - peek does NOT bump ApproximateReceiveCount across two calls
  - body truncation reports BodyTruncated + BodyOriginalSize
  - Limit clamping (0 to default 20, 500 to max 100)
  - cursor round-trip across two pages, no overlap, complete
    coverage
  - stale-generation cursor after purge to ErrAdminSQSValidation
  - oversize / bad-base64 / bad-JSON / wrong-version cursors all
    map to ErrAdminSQSValidation
  - read-only principal allowed; role-less denied
  - missing queue to ErrAdminSQSNotFound
  - empty / whitespace name to ErrAdminSQSValidation
  - DelaySeconds > 0 hides the message until visible_at elapses
  - FIFO MessageGroupId / MessageDeduplicationId both projected
  - typed MessageAttribute (String + Binary) round-trips
  - partitioned FIFO (PartitionCount=4): every sent message
    surfaces exactly once across the union of paginated peek
    pages, regardless of the rotated start partition
  - cursor codec unit tests: round-trip, empty-cursor sentinel,
    length cap, generation mismatch
  - clampPeekLimit / clampPeekBodyBytes truth tables
  - projectPeekedAttributes nil-safety and DataType preservation
  - AdminRole.canRead() truth table

Out of scope (deferred to a follow-up):
  - Throttle integration (bucketActionAdminPeek + dedicated
    per-queue admin-peek bucket per design doc 3.1). Leader-only
    + Limit cap already bound the cost; the dedicated bucket adds
    a steady-rate cap that should land alongside the SPA wiring
    so the rate-limit metric has a real consumer.
  - HTTP handler + bridge - Phase 4.
  - principalForReadSensitive live RoleStore re-check - depends
    on the wider RoleStore plumbing landing first (design doc
    Goal 8, noted as a future direction).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 20, 2026

@claude review

@coderabbitai
Copy link
Copy Markdown

coderabbitai Bot commented May 20, 2026

Warning

Rate limit exceeded

@bootjp has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 57 minutes and 49 seconds before requesting another review.

You’ve run out of usage credits. Purchase more in the billing tab.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: ecc74c49-2ba8-4c1c-9464-65e58adff03e

📥 Commits

Reviewing files that changed from the base of the PR and between fc35ee5 and ddb4e6e.

📒 Files selected for processing (2)
  • adapter/sqs_admin_peek.go
  • adapter/sqs_admin_peek_test.go
📝 Walkthrough

Walkthrough

This PR implements a read-only admin API for peeking into SQS queues without modifying message state, complete with cursor-based pagination, authorization gating, and comprehensive test coverage spanning standard, FIFO, and partitioned queue types.

Changes

Admin SQS Peek API

Layer / File(s) Summary
Authorization gating for admin reads
adapter/dynamodb_admin.go
AdminRole.canRead() method permits both read_only and full roles, rejecting the zero/unspecified role to gate non-destructive admin operations.
Wire contracts and message attribute types
adapter/sqs_admin_peek.go, adapter/sqs_messages.go
Exported response types (AdminPeekedMessage, AdminPeekedAttribute, AdminPeekMessageOptions), peek constants (limits, body truncation bounds, cursor sizing), and new sqsAttributeBaseTypeString/sqsAttributeBaseTypeNumber constants extracted from hardcoded strings for AWS compatibility.
Cursor codec for versioned pagination
adapter/sqs_admin_peek.go
Versioned cursor struct with base64url+JSON encoding/decoding, strict schema validation, size limits, and error mapping to ErrAdminSQSValidation for malformed inputs.
Peek handler and partition routing
adapter/sqs_admin_peek.go
AdminPeekQueue handler enforces read permissions and leader-only execution, validates queue name, clamps request options, reads queue metadata at stable MVCC timestamp, orchestrates visibility-index walk, and returns peeked rows with continuation cursor. Includes peekStartPartition (rotated start position for first page) and preparePeekCursor (generation/partition validation for resume).
Visibility index pagination and record loading
adapter/sqs_admin_peek.go
walkPeek and walkPeekPartition scan visibility index with rotation for partitioned queues and last-key resume semantics. loadPeekMessageRecord fetches message records at caller's MVCC snapshot, benignly skipping tombstoned entries instead of failing the whole peek.
Response projection and formatting
adapter/sqs_admin_peek.go
Projects stored records into AdminPeekedMessage response objects with body truncation and original-size tracking, receive-count clamping to int32, UTC timestamp conversion, and nil-safe typed attribute mapping that omits empty attributes from JSON.
Comprehensive test suite
adapter/sqs_admin_peek_test.go
End-to-end validation: happy path, pure-read semantics, body truncation, limit clamping, cursor-based pagination, stale-generation rejection, malformed cursor handling, authorization gating (read-only allowed vs. role-less forbidden), queue validation, delayed message visibility, FIFO attribute projection (group/dedup IDs), typed attributes (String/Binary), partitioned queue pagination with fanout-counter regression, cursor codec round-trip, nil-safe projection, canRead() truth table, and hostile cursor regression with timeout bounds.

Sequence Diagram

sequenceDiagram
  participant Client
  participant AdminPeekQueue as AdminPeekQueue
  participant Authorization as canRead()
  participant QueueMeta as QueueMetadata
  participant CursorCodec as Cursor
  participant VisIndex as VisibilityIndex
  participant DataRecords as DataRecords
  participant Projection as Projection
  
  Client->>AdminPeekQueue: AdminPeekQueue(ctx, principal, queue, opts)
  AdminPeekQueue->>Authorization: canRead()
  Authorization-->>AdminPeekQueue: permitted?
  AdminPeekQueue->>QueueMeta: LoadQueueMetadata(MVCC snapshot)
  QueueMeta-->>AdminPeekQueue: metadata, generation
  AdminPeekQueue->>CursorCodec: DecodePeekCursor(opts.Cursor)
  CursorCodec-->>AdminPeekQueue: cursor or error
  AdminPeekQueue->>VisIndex: walkPeek(partitions, limit)
  VisIndex->>VisIndex: ScanAt per partition
  VisIndex->>DataRecords: LoadRecord for each key
  DataRecords-->>VisIndex: message or skip
  VisIndex->>Projection: Accumulate rows
  Projection-->>VisIndex: []AdminPeekedMessage
  VisIndex-->>AdminPeekQueue: rows, nextCursor, done
  AdminPeekQueue->>CursorCodec: EncodePeekCursor(nextCursor)
  CursorCodec-->>AdminPeekQueue: encoded string
  AdminPeekQueue-->>Client: rows, continuationToken, error
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

  • bootjp/elastickv#757: Proposes the original admin console design for AdminPeekQueue with cursor-paginated non-destructive peeking; this PR implements that proposed feature with authorization gating and comprehensive validation.

Poem

🐰 A peek into queues without a trace,
Pagination cursors keep their place,
Read-only admins see what's inside,
No messages harmed—they safely hide!
With tests so thorough, we hop with glee,
The peek API is all we see! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title 'adapter/sqs: AdminPeekQueue backend (Phase 3)' is specific, concise, and accurately describes the main change: adding the AdminPeekQueue backend implementation as Phase 3 of a larger effort.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/sqs-admin-peek-queue

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @bootjp's task in 6m 20s —— View job


Code Review

  • Gather context and read changed files
  • Review adapter/sqs_admin_peek.go (core implementation)
  • Review adapter/sqs_admin_peek_test.go (test coverage)
  • Review adapter/sqs_messages.go (constant extraction)
  • Review adapter/dynamodb_admin.go (minor additions)
  • Compile findings and post review

Summary

Solid, well-structured addition. The design is clean, the pagination cursor is versioned and generation-checked, and the test suite is thorough. One medium-severity bug (potential infinite loop via forged cursor) requires a fix; several minor points below.


Medium — Infinite loop via forged StartPartition

adapter/sqs_admin_peek.go:262-278 (preparePeekCursor) and adapter/sqs_admin_peek.go:297-321 (walkPeek)

decodePeekCursor validates schema version but does not validate that cursor.StartPartition < meta.PartitionCount. walkPeek's loop termination condition is:

nextPart := (cursor.Partition + 1) % meta.PartitionCount
if nextPart == cursor.StartPartition {
    return rows, nil, nil
}

If an authenticated admin principal submits a cursor where StartPartition >= meta.PartitionCount (e.g. StartPartition = math.MaxUint32), nextPart cycles through [0, PartitionCount) via modulo but never equals StartPartition, so the loop runs indefinitely as long as all partitions are exhausted (empty or all-delayed queue). An empty/all-delayed partitioned FIFO queue is the expected steady-state for a DLQ after a purge, making this reachable in practice.

The fix is a range check in preparePeekCursor on the continuation path:

if cursor == nil {
    ...
}
// Validate partition ranges before trusting client-supplied cursor fields.
if meta.PartitionCount > 1 && cursor.StartPartition >= meta.PartitionCount {
    return nil, errors.Wrap(ErrAdminSQSValidation,
        "admin peek: cursor StartPartition out of range")
}
if meta.PartitionCount > 1 && cursor.Partition >= meta.PartitionCount {
    return nil, errors.Wrap(ErrAdminSQSValidation,
        "admin peek: cursor Partition out of range")
}
if cursor.Generation != meta.Generation {
    ...
}

A corresponding test case for TestAdminPeekQueue_CursorMalformed or a new TestPreparePeekCursor_OutOfRangePartition would lock this down. Fix this →


Minor — truncatePeekBody splits at byte boundary (may yield invalid UTF-8)

adapter/sqs_admin_peek.go:447-453

return string(body[:bodyMaxBytes]), true, originalSize

SQS message bodies are opaque byte sequences, so a UTF-8 multibyte character (e.g. a Japanese queue name in the body) can be split at bodyMaxBytes, yielding a string with invalid UTF-8. The BodyTruncated=true flag tells the SPA to expect this, but it's worth documenting explicitly in the struct comment on AdminPeekedMessage.Body (or in truncatePeekBody's godoc) that the body may not be valid UTF-8 when truncated. The SPA must sanitize before display/JSON re-encode.


Minor — Peek side-effects the receive fanout counter

adapter/sqs_admin_peek.go:236

cursor, err = preparePeekCursor(cursor, meta, s.nextReceiveFanoutStart(name, meta.PartitionCount))

nextReceiveFanoutStart atomically increments the fanout counter even on read-only peek calls. This is documented as intentional ("rotated sequential scanning… reusing the receive path's receiveFanoutCounters for fairness"), but the consequence — a high-frequency admin polling loop (SPA tab refresh) subtly biases the receive path's partition rotation — is not documented in the code or design doc. Low risk with Limit ≤ 100 and leader-only, but worth a comment on AdminPeekQueue so the next person doesn't "fix" it back to a dedicated counter.


Minor — AdminPeekedAttribute(v) type cast is silent if structs diverge

adapter/sqs_admin_peek.go:466

out[name] = AdminPeekedAttribute(v)

This relies on AdminPeekedAttribute and sqsMessageAttributeValue having identical field names and types (Go ignores struct tags for conversion). The comment in projectPeekedAttributes documents this, but if sqsMessageAttributeValue gains a new field (e.g. a future CustomTypeHeader string) the cast will fail to compile — which is actually the right behavior (compiler enforces sync). No action needed, just noting it's by design and the compile error is the guard.


Nit — Missing comment on sqsAttributeBaseTypeString and sqsAttributeBaseTypeNumber

adapter/sqs_messages.go:1724-1725

sqsAttributeBaseTypeBinary has a doc comment explaining why the canonical name matters (it's part of the wire contract). The two new constants sqsAttributeBaseTypeString and sqsAttributeBaseTypeNumber are symmetrical and could benefit from a brief comment for consistency. Small, but aligns with the existing documentation pattern.


What looks good

  • Generation check in preparePeekCursor: Clean sentinel on stale cursors — forces SPA refresh rather than silently serving rows from a purged generation. The TestAdminPeekQueue_StaleGenerationCursor integration test pins the behaviour end-to-end.
  • Consistent MVCC snapshot: Single nextTxnReadTS threaded through loadQueueMetaAt, ScanAt, and every GetAt — the page is snapshot-isolated. The now for vis-bounds is captured once at the top of walkPeek (consistent across partitions), matching the receive path pattern.
  • Tombstone skip is defensive, not wrong: The !ok skip in walkPeekPartition (record absent despite vis-index entry) is a correct defensive guard; DeleteMessage atomically removes both entries so the skip fires only on crash/race edge cases.
  • clampPeekLimit / clampPeekBodyBytes truth-table tests: Covering all clamp branches explicitly rather than relying on integration tests to exercise them.
  • canRead() placement: Added to dynamodb_admin.go next to canWrite() where AdminRole is defined — right location.
  • Constant extraction in sqs_messages.go: Byte-identical rename; the switch case (sqsAttributeBaseTypeString, sqsAttributeBaseTypeNumber) reads correctly.

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the AdminPeekQueue functionality for SQS, allowing users to inspect messages without affecting their visibility or receive counts. The implementation includes a new canRead authorization gate, support for paginated results via base64url-encoded JSON cursors, and configurable message body truncation. Comprehensive tests have been added to verify the behavior across standard and partitioned FIFO queues, ensuring that the peek operation remains non-destructive and respects security roles. I have no feedback to provide as there were no review comments to assess.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: f5838bddb7

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_admin_peek.go
Comment on lines +274 to +278
if cursor.Generation != meta.Generation {
return nil, errors.Wrap(ErrAdminSQSValidation,
"admin peek: cursor is from a prior generation; restart from the front")
}
return cursor, nil
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Reject out-of-range cursor partitions

preparePeekCursor validates only Generation, so a client can submit a cursor whose StartPartition is outside [0, meta.PartitionCount). In walkPeek, termination depends on nextPart == cursor.StartPartition; with an out-of-range start partition this condition is never true, so an empty/fully exhausted partitioned queue causes an unbounded loop of ScanAt calls and the request never returns. This is a request-amplification DoS against the admin endpoint and should be rejected as ErrAdminSQSValidation before scanning.

Useful? React with 👍 / 👎.

bootjp added a commit that referenced this pull request May 20, 2026
Codex r1 P1 on PR #794: walkPeek's partition-rotation termination
condition is `(cursor.Partition + 1) % meta.PartitionCount ==
cursor.StartPartition`. With a client-supplied cursor whose
StartPartition lies outside [0, PartitionCount, the modulo cycle
never matches and the call loops ScanAt over guaranteed-empty
partitions forever — a request-amplification DoS against the
admin endpoint.

preparePeekCursor now bounds-checks both StartPartition and
Partition against max(meta.PartitionCount, 1) and rejects
out-of-range values with ErrAdminSQSValidation. Non-partitioned
queues (PartitionCount <= 1) require both fields to be 0.

Caller audit: peek cursors are constructed in exactly two places —
preparePeekCursor itself (now validated) and walkPeek's
partition-advance (,
which produces values in [0, PartitionCount) by construction).
No other caller. Generation validation already handles the
purge-between-pages case separately.

Tests: regression test feeds a wire-level cursor with
StartPartition=999 / Partition=999 against a PartitionCount=4
queue and asserts ErrAdminSQSValidation under a 5-second deadline
so any future loss of the bounds check terminates rather than
hanging CI. Four preparePeekCursor unit tests pin the truth table
(partitioned out-of-range / non-partitioned non-zero / in-range
accepted).
EOF
)
Codex r1 P1 on PR #794: walkPeek's partition-rotation termination
condition is "(Partition + 1) mod PartitionCount == StartPartition".
With a client-supplied cursor whose StartPartition lies outside
[0, PartitionCount), the modulo cycle never matches and the call
loops ScanAt over guaranteed-empty partitions forever - a
request-amplification DoS against the admin endpoint.

preparePeekCursor now bounds-checks both StartPartition and
Partition against max(meta.PartitionCount, 1) and rejects
out-of-range values with ErrAdminSQSValidation. Non-partitioned
queues (PartitionCount <= 1) require both fields to be 0.

Caller audit: peek cursors are constructed in exactly two places -
preparePeekCursor itself (now validated) and walkPeek's
partition-advance ((Partition+1) mod PartitionCount), which
produces values in [0, PartitionCount) by construction. No other
caller. Generation validation already handles the purge-between-
pages case separately.

Tests: regression test feeds a wire-level cursor with
StartPartition=999 / Partition=999 against a PartitionCount=4
queue and asserts ErrAdminSQSValidation under a 5-second deadline
so any future loss of the bounds check terminates rather than
hanging CI. Four preparePeekCursor unit tests pin the truth table
(partitioned out-of-range / non-partitioned non-zero / in-range
accepted).
@bootjp bootjp force-pushed the feat/sqs-admin-peek-queue branch from ed24325 to 9bfaacf Compare May 20, 2026 11:43
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 20, 2026

@claude review

Addressed Codex r1 P1 in 9bfaacf:

  • preparePeekCursor now rejects out-of-range StartPartition / Partition with ErrAdminSQSValidation. Non-partitioned queues (PartitionCount <= 1) require both fields to be 0.
  • Closes the unbounded-loop DoS vector: walkPeek's '(Partition+1) mod PartitionCount == StartPartition' termination never fired for an out-of-range StartPartition, looping ScanAt forever.
  • Regression test runs under a 5-second deadline so any future loss of the bounds check terminates rather than hanging CI; four unit tests pin the truth table.
  • Caller audit: walkPeek's partition advance ((p+1) mod PartitionCount) is the only other cursor producer; it yields values in [0, PartitionCount) by construction.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @bootjp's task in 1m 39s —— View job


Code Review

Reviewing the r1 P1 fix (out-of-range cursor partition bounds check) from commit 9bfaacf.

  • Load tools
  • Read adapter/sqs_admin_peek.go — verify bounds checks in preparePeekCursor
  • Read adapter/sqs_admin_peek_test.go — verify regression tests
  • Post review

r1 P1 Fix: Partition bounds check in preparePeekCursor

The fix is correct and complete. Here's a thorough assessment.

What changed

adapter/sqs_admin_peek.go:288-300

maxPartition := meta.PartitionCount
if maxPartition <= 1 {
    // Non-partitioned queues only have partition 0. A non-zero
    // Partition or StartPartition would also fail the rotation
    // termination check; reject explicitly so the error is the
    // documented 400, not a silent O(infty) loop.
    maxPartition = 1
}
if cursor.StartPartition >= maxPartition || cursor.Partition >= maxPartition {
    return nil, errors.Wrapf(ErrAdminSQSValidation,
        "admin peek: cursor partition index out of range (StartPartition=%d, Partition=%d, max=%d)",
        cursor.StartPartition, cursor.Partition, maxPartition)
}

The maxPartition normalization (PartitionCount <= 1 → 1) correctly maps both non-partitioned queues (PartitionCount=0) and single-partition queues (PartitionCount=1) to maxPartition=1, requiring both fields to be 0. Partitioned queues use PartitionCount directly. Both StartPartition and Partition are checked in one condition so neither can independently escape the valid range. The error message embeds the actual field values for operator diagnostics.

The generation check is placed before the bounds check (line 284). A cursor that is both stale-generation and out-of-range will return the generation error message rather than the bounds error — this is fine, both are ErrAdminSQSValidation and both correctly terminate the call.

Test coverage assessment

TestPreparePeekCursor_PartitionOutOfRange (lines 771-826): Four sub-cases that pin the full truth table:

Sub-case Cursor input Meta Expected
StartPartition >= PartitionCount SP=99, P=0 PC=4 ErrAdminSQSValidation
Partition >= PartitionCount SP=0, P=99 PC=4 ErrAdminSQSValidation
Non-partitioned: non-zero StartPartition SP=1, P=0 PC=0 ErrAdminSQSValidation
Non-partitioned: non-zero Partition SP=0, P=5 PC=1 ErrAdminSQSValidation
In-range cursor accepted SP=3, P=2 PC=4 nil (accepted)

All four rejection paths are structurally distinct — the acceptance case verifies valid cursors aren't blocked.

TestAdminPeekQueue_HostileCursorBoundedRequest (lines 834-862): End-to-end regression with a 5-second context.WithTimeout. The hostile cursor (StartPartition=999, Partition=999) is wire-encoded and passed through the full AdminPeekQueue stack. The deadline is the key safety net: if the bounds check is ever removed, the test times out (context deadline exceeded) rather than returning ErrAdminSQSValidation, making the regression immediately visible in CI rather than hanging the suite.

One minor observation: the hostile cursor uses Generation: 1, but the queue's actual generation may differ (depending on what installPartitionedMetaForTest initializes). If they differ, the generation check fires before the bounds check. The test still passes because ErrAdminSQSValidation is returned either way — it just may not exercise the partition bounds path end-to-end when the generations don't match. This is acceptable because TestPreparePeekCursor_PartitionOutOfRange pins the bounds check in isolation, and the integration test pins the "hostile cursor → 400" property regardless of which guard fires.

preparePeekCursor docstring

The rationale comment (lines 258-266) is exactly the right level of detail — it names the attack vector, explains why the termination condition fails, and cites the PR and issue. Future readers won't be tempted to "simplify" the check.

Verdict

The fix closes the DoS vector correctly. The unit-level truth table + end-to-end deadline regression is the right pairing. No additional changes needed for this fix.

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 9bfaacfa4f

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_admin_peek.go Outdated
if !exists {
return nil, "", ErrAdminSQSNotFound
}
cursor, err = preparePeekCursor(cursor, meta, s.nextReceiveFanoutStart(name, meta.PartitionCount))
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid advancing fanout rotation on peek cursor resumes

AdminPeekQueue always calls nextReceiveFanoutStart before preparePeekCursor, even when the request already carries a continuation cursor. Because this counter is shared with ReceiveMessage, every extra peek page advances the receive start partition and can reintroduce fixed-stride aliasing (e.g., 4 partitions and 3 peek pages between receives makes every receive start at the same partition). Under sustained backlog with small MaxNumberOfMessages, this can repeatedly prioritize the same partitions and starve others, so continuation requests should not mutate the shared receive-rotation counter.

Useful? React with 👍 / 👎.

Codex r2 P1 on PR #794: AdminPeekQueue called nextReceiveFanoutStart
unconditionally - on every continuation page as well as the first.
The counter is shared with ReceiveMessage's partition-rotation;
bumping it on every peek page perturbs the stride ReceiveMessage
reads for fairness. Under sustained backlog with small
MaxNumberOfMessages, a fixed peek-to-receive interleave (e.g.
PartitionCount=4 + 3 peek pages between receives) re-aligns every
receive to the same partition, starving the others.

Extract a peekStartPartition helper that returns 0 (no consultation
needed) when either (a) the call carries a continuation cursor or
(b) the queue is non-partitioned. The fanout counter advances only
on the first page of a partitioned-queue peek walk.

Caller audit: nextReceiveFanoutStart has two callers - the receive
fanout (unchanged, still bumps per receive call) and now
peekStartPartition (bumps only on first peek page). preparePeekCursor
only USES startPartition when stamping a fresh cursor, so passing 0
on continuations is equivalent to passing the unused fanout value
except it does not perturb the shared counter.

Tests: TestAdminPeekQueue_ContinuationDoesNotBumpFanoutCounter
exercises a partitioned FIFO queue, reads the counter value
directly from receiveFanoutCounters after the first page, drives
five continuation peek calls, and asserts the counter is
unchanged. Regression of this property reintroduces the receive
starvation Codex flagged.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 20, 2026

@claude review

Addressed Codex r2 P1 in 3e6841e:

  • AdminPeekQueue no longer calls nextReceiveFanoutStart on continuation pages. New peekStartPartition helper consults the counter only when (a) cursor is nil (first page) AND (b) PartitionCount > 1.
  • Closes the fanout-stride aliasing: peek pagination no longer perturbs the partition-rotation counter ReceiveMessage uses for fairness.
  • Regression test reads receiveFanoutCounters directly after the first page, drives 5 continuation calls, and asserts the counter is unchanged.
  • Caller audit: nextReceiveFanoutStart still bumps once per receive (unchanged) and once per fresh peek walk (new), never on continuations.

@claude
Copy link
Copy Markdown

claude Bot commented May 20, 2026

Claude finished @bootjp's task in 1m 7s —— View job


Code Review

Reviewing the r2 P1 fix (fanout-stride aliasing on continuation pages) from commit 3e6841e.

  • Load tools
  • Read adapter/sqs_admin_peek.go — verify peekStartPartition helper and call sites
  • Read adapter/sqs_admin_peek_test.go — verify regression test
  • Post review

r2 P1 Fix: Fanout counter isolation on continuation pages

The fix is correct and complete.

What changed

adapter/sqs_admin_peek.go:264-269 — new peekStartPartition helper:

func (s *SQSServer) peekStartPartition(queueName string, cursor *peekCursor, meta *sqsQueueMeta) uint32 {
    if cursor != nil || meta.PartitionCount <= 1 {
        return 0
    }
    return s.nextReceiveFanoutStart(queueName, meta.PartitionCount)
}

Call site at line 236 — the decoded cursor (nil on first page, non-nil on continuation) is passed directly to peekStartPartition before it is replaced by preparePeekCursor. This is the correct ordering: the guard reads the incoming cursor state, not the outgoing one.

Correctness analysis

Three cases to verify:

Scenario cursor arg PartitionCount Action Correct?
First page, partitioned nil > 1 Calls nextReceiveFanoutStart (bumps once)
Continuation page, partitioned non-nil > 1 Returns 0, counter untouched
Any page, non-partitioned nil or non-nil <= 1 Returns 0, no counter involved

The returned 0 on the continuation path is safe because preparePeekCursor ignores the startPartition argument when cursor != nil — it reads cursor.StartPartition directly from the incoming cursor, which was already set on the first page. Passing 0 is a no-op, not a reset.

Test coverage

TestAdminPeekQueue_ContinuationDoesNotBumpFanoutCounter: The test verifies the contract end-to-end:

  1. Creates a 4-partition FIFO queue, sends 6 messages across distinct groups.
  2. Executes a first-page peek (Limit=2) and asserts it creates the fanout counter entry (if it doesn't, the test fails fast — this pin on the first-page bump is a nice addition).
  3. Snapshots counter.Load() after the first page.
  4. Drives 5 continuation calls.
  5. Asserts counter.Load() is unchanged.

The choice of 5 continuation calls (more than PartitionCount=4) is intentional — any aliasing from a per-call bump would manifest as a multiple-of-4 drift, making a regression unambiguous.

One minor observation: the 5 continuation calls may exhaust the queue (6 messages / Limit=2 = 3 pages), so loops i=3,4 will receive an empty cursor and break early. The assertion still fires correctly because the counter is checked after the loop either way — the early break is benign here.

Summary

The fix is minimal, targeted, and well-reasoned. The peekStartPartition helper is the right level of abstraction: it centralises the "when to advance" decision without touching preparePeekCursor's unit-testability (that helper remains method-free). The docstring names the aliasing scenario precisely and cites the issue — future readers won't misread the cursor != nil short-circuit as dead code.

No further changes needed for this fix.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@adapter/sqs_admin_peek_test.go`:
- Around line 49-51: The current loop uses assertPeekRowMatchesIndexedBody(t, i,
row, "body-") which couples test to ordering; instead, extract the actual row
bodies from rows (e.g., map or slice of strings using the same accessor used in
assertPeekRowMatchesIndexedBody), build a set of actual bodies, and assert the
set equals the expected set {"body-0","body-1","body-2"} (or that it contains
each expected value) so order does not matter; apply the same change to the
other assertion block that currently iterates with index-based checks.
- Around line 870-872: The code does a type assertion counter, _ :=
v.(*atomic.Uint32) then immediately calls counter.Load(), which will panic if
the assertion failed or returned nil; update the test to check the assertion
success and non-nil before dereferencing: perform counter, ok :=
v.(*atomic.Uint32) and if !ok || counter == nil call the test failure helper
(e.g., t.Fatalf or require.Failf) with a clear message referencing v and the
expected *atomic.Uint32, otherwise call counter.Load(); update the usage sites
in the test that reference counter.Load() accordingly.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: fbf13a1e-f945-42fc-ba13-fb7921795dea

📥 Commits

Reviewing files that changed from the base of the PR and between c9a7870 and 3e6841e.

📒 Files selected for processing (4)
  • adapter/dynamodb_admin.go
  • adapter/sqs_admin_peek.go
  • adapter/sqs_admin_peek_test.go
  • adapter/sqs_messages.go

Comment thread adapter/sqs_admin_peek_test.go Outdated
Comment thread adapter/sqs_admin_peek_test.go Outdated
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 3e6841e0c5

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_admin_peek.go Outdated
Comment on lines +265 to +268
if cursor != nil || meta.PartitionCount <= 1 {
return 0
}
return s.nextReceiveFanoutStart(queueName, meta.PartitionCount)
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Use effective partition count for admin peek fanout

peekStartPartition keys fanout on meta.PartitionCount directly, but per-queue FIFO mode (FifoThroughputLimit=perQueue) intentionally collapses all traffic to partition 0 (see partitionFor / effectivePartitionCount). In that configuration, AdminPeekQueue still rotates/scans as if all partitions are live, so each peek can perform up to N-1 guaranteed-empty ScanAt calls (e.g., 31 extra scans at PartitionCount=32) and needlessly loads the leader. Aligning peek with effectivePartitionCount(meta) would preserve behavior while avoiding this avoidable read amplification.

Useful? React with 👍 / 👎.

Three r3 findings on PR #794:

1. Codex P2 (sqs_admin_peek.go:268). peekStartPartition and
   walkPeek's partition advance / termination keyed off
   meta.PartitionCount directly. In perQueue throughput mode
   (FifoThroughputLimit=perQueue) the data plane collapses every
   MessageGroupId to partition 0 (see effectivePartitionCount in
   sqs_keys_dispatch.go); peek still rotated over all N partitions,
   wasting up to N-1 guaranteed-empty ScanAt calls per peek (31
   extra scans at PartitionCount=32). Switch to
   effectivePartitionCount(meta) in peekStartPartition, walkPeek's
   partition advance, and preparePeekCursor's bounds check.
   Caller audit: effectivePartitionCount is the same helper
   receiveMessage uses for its fanout loop (sqs_messages.go:885),
   so peek and receive now agree on partition usage.

2. CodeRabbit minor (test order-coupling). HappyPath asserted row
   ordering matched send order, but vis-index entries with the same
   visible_at millisecond tie-break on message_id (random hex) — a
   timing-sensitive flake. Replace ordered loop with set-based
   assertion via assertPeekRowsAsSet.

3. CodeRabbit minor (counter type assertion panic). The fanout
   counter test used `counter, _ := v.(*atomic.Uint32)` then
   counter.Load(), which panics if the assertion fails. Extract
   loadFanoutCounter helper that fails the test explicitly on bad
   type / nil rather than panicking.

Tests: TestAdminPeekQueue_PerQueueFIFOCollapsesToOnePartition pins
the perQueue functional contract end-to-end.
TestPreparePeekCursor_PerQueueCollapse pins the cursor codec's
effective-bounds validation (a Partition=2 cursor against a
perQueue PartitionCount=4 queue is rejected — even though the raw
PartitionCount would otherwise allow it).
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 21, 2026

@claude review

Addressed r3 in fc35ee5:

  • Codex P2 (sqs_admin_peek.go:268): peekStartPartition / walkPeek / preparePeekCursor now key off effectivePartitionCount(meta), aligning peek with the data plane. perQueue FIFO queues now walk only partition 0 instead of N-1 guaranteed-empty partitions. New end-to-end test TestAdminPeekQueue_PerQueueFIFOCollapsesToOnePartition + cursor codec test TestPreparePeekCursor_PerQueueCollapse pin the contract.
  • CodeRabbit (test order-coupling): HappyPath now uses set-based assertion via assertPeekRowsAsSet — vis-index entries with the same visible_at millisecond tie-break on random message_id, so order-sensitive assertion was timing-flaky.
  • CodeRabbit (counter type assertion panic): extracted loadFanoutCounter helper that fails the test on bad type / nil instead of panicking on counter.Load().
  • Caller audit: effectivePartitionCount is the same helper receiveMessage uses for its fanout loop, so peek and receive agree on partition usage.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Claude finished @bootjp's task in 2m 17s —— View job


Code Review

Reviewing r3 changes from commit fc35ee5.

  • Load tools
  • Read adapter/sqs_admin_peek.go — verify effectivePartitionCount integration
  • Read adapter/sqs_admin_peek_test.go — verify new tests and helpers
  • Post review

r3 Fix: Align peek with effectivePartitionCount

The fix is correct and complete across all three call sites. Here's a thorough assessment of each change.


1. Codex P2 — effectivePartitionCount integration

Three sites updated; all correct.

peekStartPartition (sqs_admin_peek.go:264-273)

Before: if cursor != nil || meta.PartitionCount <= 1 { return 0 } then nextReceiveFanoutStart(queueName, meta.PartitionCount).
After: effective := effectivePartitionCount(meta); if effective <= 1 { return 0 }; return s.nextReceiveFanoutStart(queueName, effective).

For a perQueue FIFO queue with PartitionCount=4, effective=1 → the effective <= 1 guard fires → 0 returned with no counter bump. Correct — there is no meaningful partition rotation for perQueue queues; calling the fanout counter for it would have been pointless noise on the shared counter.

preparePeekCursor (sqs_admin_peek.go:307-329)

The refactoring removes the old maxPartition variable (which only handled PartitionCount <= 1 but not FifoThroughputLimit=perQueue) and replaces it with effective := effectivePartitionCount(meta). Two correctness properties:

  • Fresh cursor path (cursor == nil): if effective > 1 { out.StartPartition = startPartition; out.Partition = startPartition } — for perQueue with effective=1, this block is skipped; StartPartition and Partition remain 0 even when caller passes a non-zero startPartition. The TestPreparePeekCursor_PerQueueCollapse/"fresh cursor on perQueue stamps StartPartition=0" sub-case pins this explicitly by passing startPartition=2 and asserting both fields come out 0.

  • Continuation path bounds check: if cursor.StartPartition >= effective || cursor.Partition >= effective — for perQueue with effective=1, any Partition > 0 is rejected. The "perQueue rejects cursor.Partition >= 1" sub-case (Partition=2, PartitionCount=4, perQueue → effective=1 → 2 >= 1 → rejected) locks this down. Under the old maxPartition code, this would have used maxPartition = PartitionCount = 4 and incorrectly accepted the cursor.

walkPeek (sqs_admin_peek.go:365-374)

effective := effectivePartitionCount(meta) computed inside the loop (pure function, constant per call), replacing the old meta.PartitionCount <= 1 check and % meta.PartitionCount modulo. For perQueue with effective=1: the effective <= 1 guard fires on the first exhausted partition, returning the partial rows immediately — no rotation into guaranteed-empty partitions.

Minor observation: effectivePartitionCount is pure and meta is constant during the walk, so computing effective once before the loop would be slightly more idiomatic Go. Not a bug — the compiler will likely inline it anyway given it's a tiny two-branch function — but worth noting.


2. CodeRabbit — assertPeekRowsAsSet (test order-coupling)

sqs_admin_peek_test.go:59-82

The helper is clean. It validates per-row invariants (BodyTruncated, MessageID, SentTimestamp) inline while building the observed set, then checks membership from both directions (want ⊆ got, |got| == |want|). The direction of the failure messages is informative — the "missing body %q in observed set %v" message names the absent key rather than dumping the whole want set. Good.

The TestAdminPeekQueue_HappyPath usage at line 49-50 is now unambiguously order-independent.


3. CodeRabbit — loadFanoutCounter (nil-panic safety)

sqs_admin_peek_test.go:842-853

func loadFanoutCounter(t *testing.T, node Node, queueName string) *atomic.Uint32 {
    t.Helper()
    v, ok := node.sqsServer.receiveFanoutCounters.Load(queueName)
    if !ok {
        t.Fatalf("fanout counter missing for queue %q (bump-on-first-page contract broken)", queueName)
    }
    counter, ok := v.(*atomic.Uint32)
    if !ok || counter == nil {
        t.Fatalf("fanout counter for %q has unexpected type %T (want *atomic.Uint32)", queueName, v)
    }
    return counter
}

Two-level check: first confirms the key exists in the sync.Map (catches a regression where the first page fails to create the entry), then asserts the concrete type. The error message on type mismatch prints the actual type (%T) which is exactly what you want for debugging a future refactor that changes the counter type. Solid.


New tests

TestAdminPeekQueue_PerQueueFIFOCollapsesToOnePartition (sqs_admin_peek_test.go:927-948): Sends 3 messages across 3 groups to a perQueue FIFO queue with PartitionCount=4, walks peek to exhaustion, asserts every message was seen exactly once. The functional contract is validated end-to-end. The internal optimization (no wasted ScanAt calls) is not directly observable without instrumentation, but the correctness property is what matters for a test.

TestPreparePeekCursor_PerQueueCollapse (sqs_admin_peek_test.go:955-978): Two sub-cases that form a minimal truth table for perQueue cursor handling. The "rejects cursor.Partition >= 1" sub-case is the more important one — it explicitly pins the "effective bound, not raw PartitionCount" distinction that was the bug. The test comment ("raw PartitionCount=4 would allow") makes the intent clear.


Existing tests still valid

TestPreparePeekCursor_PartitionOutOfRange's four sub-cases all still exercise the right code paths under the refactoring:

Sub-case effective cursor Result
non-partitioned, SP=1 1 SP=1 >= 1 rejected ✓
non-partitioned, P=5 1 P=5 >= 1 rejected ✓
partitioned, SP=99 4 SP=99 >= 4 rejected ✓
partitioned, in-range 4 SP=3, P=2 accepted ✓

What's clean

  • The three-site change (peekStartPartition / preparePeekCursor / walkPeek) is consistent — all three now agree on effectivePartitionCount as the authoritative bound. A partial fix (e.g., only updating walkPeek's modulo but not preparePeekCursor's validation) would have left an exploitable gap.
  • The TestPreparePeekCursor_PerQueueCollapse "fresh cursor" sub-case demonstrates that the caller-supplied startPartition is safely ignored for perQueue queues — the partition is pinned to 0 by the effective > 1 guard, not by trusting the caller to always pass 0.
  • Commit message explicitly lists all three findings and their sources (Codex P2, two CodeRabbit minors), making the PR history self-documenting.

No further changes needed. This PR is in good shape — all three rounds of review findings have been addressed.

Copy link
Copy Markdown

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
adapter/sqs_admin_peek.go (1)

35-45: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Add JSON tags to AdminPeekedMessage.

This is the wire model, but it currently marshals as MessageID, BodyTruncated, etc., and Attributes will serialize as null instead of being omitted. That breaks the intended JSON shape and makes the projectPeekedAttributes omitempty comment ineffective.

Suggested fix
 type AdminPeekedMessage struct {
-	MessageID        string
-	Body             string                          // truncated per opts.BodyMaxBytes
-	BodyTruncated    bool                            // true when Body was cut
-	BodyOriginalSize int64                           // bytes in the original body, for display
-	SentTimestamp    time.Time                       // SQS SentTimestamp
-	ReceiveCount     int32                           // ApproximateReceiveCount
-	GroupID          string                          // FIFO MessageGroupId, empty for standard
-	DeduplicationID  string                          // FIFO MessageDeduplicationId, empty for standard
-	Attributes       map[string]AdminPeekedAttribute // typed SQS message attributes
+	MessageID        string                          `json:"message_id"`
+	Body             string                          `json:"body"` // truncated per opts.BodyMaxBytes
+	BodyTruncated    bool                            `json:"body_truncated"`
+	BodyOriginalSize int64                           `json:"body_original_size"`
+	SentTimestamp    time.Time                       `json:"sent_timestamp"`
+	ReceiveCount     int32                           `json:"receive_count"`
+	GroupID          string                          `json:"group_id,omitempty"`
+	DeduplicationID  string                          `json:"deduplication_id,omitempty"`
+	Attributes       map[string]AdminPeekedAttribute `json:"attributes,omitempty"`
 }
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@adapter/sqs_admin_peek.go` around lines 35 - 45, The AdminPeekedMessage
struct is missing JSON tags, causing fields like MessageID and Attributes to
serialize incorrectly; update the struct (AdminPeekedMessage) to add explicit
`json:"..."` tags for each field (e.g., MessageID, Body, BodyTruncated,
BodyOriginalSize, SentTimestamp, ReceiveCount, GroupID, DeduplicationID) and
mark optional fields with `omitempty` (notably Attributes should be
`json:"attributes,omitempty"` so it is omitted when nil); ensure naming uses
lowerCamelCase or the desired wire keys to match the intended JSON shape and
keep types and comments unchanged.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Outside diff comments:
In `@adapter/sqs_admin_peek.go`:
- Around line 35-45: The AdminPeekedMessage struct is missing JSON tags, causing
fields like MessageID and Attributes to serialize incorrectly; update the struct
(AdminPeekedMessage) to add explicit `json:"..."` tags for each field (e.g.,
MessageID, Body, BodyTruncated, BodyOriginalSize, SentTimestamp, ReceiveCount,
GroupID, DeduplicationID) and mark optional fields with `omitempty` (notably
Attributes should be `json:"attributes,omitempty"` so it is omitted when nil);
ensure naming uses lowerCamelCase or the desired wire keys to match the intended
JSON shape and keep types and comments unchanged.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 450a6342-93cc-4022-9072-b744f0a1df17

📥 Commits

Reviewing files that changed from the base of the PR and between 3e6841e and fc35ee5.

📒 Files selected for processing (2)
  • adapter/sqs_admin_peek.go
  • adapter/sqs_admin_peek_test.go

Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: fc35ee5174

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment thread adapter/sqs_admin_peek.go
Comment on lines +399 to +401
if len(lastKey) > 0 {
start = nextScanCursorAfter(lastKey)
if end != nil && bytes.Compare(start, end) > 0 {
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Validate cursor LastKey stays in the queue scan range

A continuation cursor with a valid gen/partition but a forged k can force scans to start outside this queue’s visibility-index prefix, because lastKey is trusted directly as the next start key and only checked against the upper bound. In that case ScanAt can walk unrelated key ranges and trigger up to Limit extra GetAt misses per call, so repeated requests become a read-amplification path against the leader. The cursor should be rejected as ErrAdminSQSValidation (or start clamped) when lastKey is below the partition’s visibility start prefix.

Useful? React with 👍 / 👎.

Two r4 findings on PR #794:

1. Codex P2 (sqs_admin_peek.go:401). preparePeekCursor only
   validated Generation and partition indices; cursor.LastKey was
   trusted directly as the ScanAt start key, checked only against
   the upper bound. A forged LastKey below the queue's
   visibility-index prefix would let ScanAt walk unrelated key
   ranges, triggering up to Limit extra GetAt misses per call -
   read amplification against the leader. preparePeekCursor now
   rejects with ErrAdminSQSValidation when cursor.LastKey is not
   prefixed by sqsMsgVisPrefixForQueueDispatch(meta, queueName,
   Partition, Generation). The check catches three classes of
   forgery: completely arbitrary bytes (e.g. "aaaa"), a valid
   LastKey from a different queue (prefix encodes queue name),
   and a valid LastKey from a different partition of the same
   queue. preparePeekCursor's signature gains queueName; the only
   existing caller (AdminPeekQueue) and the existing unit tests
   were updated.

2. CodeRabbit Major (sqs_admin_peek.go:35-45, outside diff).
   AdminPeekedMessage lacked JSON tags, so json.Marshal emitted
   Go-style PascalCase ("MessageID", "BodyTruncated", ...) instead
   of the snake_case wire shape the design doc §3.5 specifies
   (the SPA's client adapter expects the spec'd form). Also empty
   Attributes / GroupID / DeduplicationID would serialize as
   "null" / "" instead of being omitted. JSON tags added with
   appropriate omitempty.

Caller audit (semantic-change rule): preparePeekCursor signature
changed from (cursor, meta, startPartition) to (cursor, meta,
queueName, startPartition). One non-test caller (AdminPeekQueue at
sqs_admin_peek.go:240) and 10 test call sites updated. The new
queueName parameter is purely additive validation; it does not
alter the cursor's stamping or generation-mismatch behavior.

Tests: TestAdminPeekedMessage_JSONWireFormat /
TestAdminPeekedMessage_JSONOmitsEmptyAttributes pin the JSON wire
shape. TestPreparePeekCursor_PartitionOutOfRange gains three new
sub-cases (forged LastKey, foreign-queue LastKey, mismatched-
partition LastKey). The test function was rewritten as a
table-driven loop so it stays under the cyclop budget.
@bootjp
Copy link
Copy Markdown
Owner Author

bootjp commented May 21, 2026

@claude review

Addressed r4 in ddb4e6e:

  • Codex P2 (sqs_admin_peek.go:401, LastKey forgery → read amplification): preparePeekCursor now validates cursor.LastKey has the expected visibility-index prefix (queue+partition+gen). Three forgery classes covered: arbitrary bytes / foreign-queue LastKey / mismatched-partition LastKey. Caller audit: signature gained queueName; one non-test caller (AdminPeekQueue) and 10 test sites updated.
  • CodeRabbit Major (AdminPeekedMessage JSON tags missing): added snake_case json tags matching the design doc §3.5 wire shape; omitempty on GroupID / DeduplicationID / Attributes so empty values are dropped, not emitted as null/'""'. Two new tests pin the wire shape: TestAdminPeekedMessage_JSONWireFormat + TestAdminPeekedMessage_JSONOmitsEmptyAttributes.

@claude
Copy link
Copy Markdown

claude Bot commented May 21, 2026

Claude finished @bootjp's task in 2m 37s —— View job


Code Review

Reviewing r4 changes from commit ddb4e6e.

  • Load tools
  • Read adapter/sqs_admin_peek.go — verify LastKey prefix validation + JSON tags
  • Read adapter/sqs_admin_peek_test.go — verify new regression tests
  • Post review

r4: LastKey prefix validation + JSON tags

Both fixes are correct and complete. Detailed assessment below.


1. Codex P2 — cursor.LastKey prefix validation

adapter/sqs_admin_peek.go:341-347

if len(cursor.LastKey) > 0 {
    expectedPrefix := sqsMsgVisPrefixForQueueDispatch(meta, queueName, cursor.Partition, meta.Generation)
    if !bytes.HasPrefix(cursor.LastKey, expectedPrefix) {
        return nil, errors.Wrap(ErrAdminSQSValidation,
            "admin peek: cursor LastKey is outside the queue's visibility-index prefix")
    }
}

Check ordering is correct. The three guards execute in the right sequence:

  1. Generation mismatch (line 332) — rejects stale cursors before using any cursor field
  2. Partition bounds (line 336) — rejects out-of-range cursor.Partition before using it to construct a prefix
  3. LastKey prefix (line 341) — uses the already-validated cursor.Partition and the generation-matched meta.Generation

Using meta.Generation rather than cursor.Generation in the prefix call is safe and equivalent — the generation check above guarantees they are equal by this point.

bytes.HasPrefix edge cases are safe. If cursor.LastKey is shorter than expectedPrefix, the prefix check returns false (rejected). If someone crafts a key equal to exactly the prefix itself, bytes.HasPrefix returns true — this is benign: a scan starting at the prefix boundary starts at the very beginning of this queue+partition's keyspace and cannot escape it.

Signature change is clean. preparePeekCursor gained queueName string as a parameter. The one production call site (AdminPeekQueue:240) passes name correctly; all 10+ test call sites pass "test" / tc.queue / "queue-A" as appropriate.

Three forgery classes are well-chosen (sqs_admin_peek_test.go:879-884):

Sub-case Attack vector
[]byte("aaaa") Arbitrary bytes — not a valid vis-index prefix at all
foreignKey (queue-X prefix) Valid vis-index key but for a different queue
mismatchedKey (queue-A, partition 2, but cursor.Partition=0) Correct queue, wrong partition

These cover the full attack surface: arbitrary, wrong-queue, and wrong-partition forgeries. The foreignKey and mismatchedKey are constructed with append(sqsMsgVisPrefixForQueueDispatch(...), 'X'), which correctly creates a key that is inside the foreign prefix but fails the expected-queue prefix check.

Minor observation — no unit test for a valid LastKey on the acceptance path. The existing TestPreparePeekCursor_PartitionOutOfRange acceptance sub-case passes cursor.LastKey = nil, so it doesn't exercise bytes.HasPrefix returning true. The integration tests (TestAdminPeekQueue_CursorRoundTrip, TestAdminPeekQueue_PartitionedFIFO_Pagination, TestAdminPeekQueue_ContinuationDoesNotBumpFanoutCounter) all use real continuation cursors from walkPeekPartition, so the acceptance path is covered end-to-end. Not blocking — the integration coverage is sufficient.


2. CodeRabbit Major — AdminPeekedMessage JSON tags

adapter/sqs_admin_peek.go:39-49

type AdminPeekedMessage struct {
    MessageID        string                          `json:"message_id"`
    Body             string                          `json:"body"`
    BodyTruncated    bool                            `json:"body_truncated"`
    BodyOriginalSize int64                           `json:"body_original_size"`
    SentTimestamp    time.Time                       `json:"sent_timestamp"`
    ReceiveCount     int32                           `json:"receive_count"`
    GroupID          string                          `json:"group_id,omitempty"`
    DeduplicationID  string                          `json:"deduplication_id,omitempty"`
    Attributes       map[string]AdminPeekedAttribute `json:"attributes,omitempty"`
}

omitempty placement is correct:

  • BodyTruncated bool — no omitempty: always emits "body_truncated": false when not truncated. Correct — the SPA needs the explicit boolean rather than inferring absence as false.
  • BodyOriginalSize int64 — no omitempty: always emits, including 0. Correct — the SPA should always display the original size, even for an empty-body message (edge case but well-defined).
  • ReceiveCount int32 — no omitempty: a count of 0 (never received) is meaningful, not an absence.
  • GroupID, DeduplicationID, Attributes — with omitempty: empty strings and nil maps are correctly suppressed for standard queues. Attributes uses a map type, so omitempty drops it when nil (which projectPeekedAttributes returns for an empty input) rather than emitting "attributes": {} or "attributes": null.

TestAdminPeekedMessage_JSONWireFormat (line 735) is the right kind of wire-contract test: it checks for the presence of snake_case keys AND explicitly asserts the absence of PascalCase keys. The nested AdminPeekedAttribute snake_case check (line 774-779) pins the attribute shape in the same test, avoiding a separate fixture.

TestAdminPeekedMessage_JSONOmitsEmptyAttributes (line 786) correctly tests the three omitempty fields together. The assertion checks for absence by key, not for nil/zero, which is the right level — it matches what the SPA actually receives.


What's clean across both fixes

  • preparePeekCursor docstring (lines 279-318): the added queueName paragraph at lines 311-318 explains both why the field was added and what attack it closes, with a PR reference. This is exactly the right level of detail for a security-motivated parameter addition.
  • Comment-in-code audit trail: both the AdminPeekedMessage struct comment and the preparePeekCursor docstring cite the reviewer/round that caught the issue (CodeRabbit r4, Codex r4 P2). Saves future reviewers from having to dig through git log.
  • No behaviour change to the receive path: preparePeekCursor is only called from AdminPeekQueue; the new queueName parameter doesn't touch the data plane.

Verdict

All four rounds of review findings have been addressed. The PR is in good shape for merge. No further changes needed.

@bootjp bootjp merged commit fe96c88 into main May 21, 2026
9 of 10 checks passed
@bootjp bootjp deleted the feat/sqs-admin-peek-queue branch May 21, 2026 13:49
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant