docs: add 0.7 planning brief (0.6 recap + open-issue inventory)#384
hardbyte wants to merge 4 commits
Consolidated context pack for planning the 0.7 cycle: what Awa is, what the 0.6 queue-storage release shipped, the ADR landscape (incl. proposed 027/028), and a full inventory of all 20 open issues distinguishing the v0.7.0 milestone (8 issues) from unmilestoned candidates. Claude-Session: https://claude.ai/code/session_01JvK9hvfEziH2Xnm1pEUuUh
Adds the #169 hot-row chronology (the fix-one-row-next-surfaces lineage from queue_lanes.available_count through the queue_claim_heads plan-bug fix in #355), the Awa-vs-pgque WAL/job gap, the raised stable gate, and what it all hands the #295 storage RFC; plus a section on the #197 release-readiness process as the template for driving 0.7 to a tag. Claude-Session: https://claude.ai/code/session_01JvK9hvfEziH2Xnm1pEUuUh
Strategic decisions (evidence-gated storage evolution, canonical deprecation gate, deployment-shape completion, tiered per-key control, safe-by-default admin, public stability policy, minimal A->B dependencies, end-to-end tracing), six workstreams with scoped issues (existing + 17 new drafts), eight experiments with numeric decision rules incl. the Gate A storage decision, gate-sequenced milestones M0-M6, six release gates, risks, and a disposition for every open issue. Claude-Session: https://claude.ai/code/session_01JvK9hvfEziH2Xnm1pEUuUh
…aft, test-plan validation matrix
- Claim ADR numbers 033 (per-key control, #340), 034 (job dependencies, #14), 035 (backpressure, #341), 036 (stability policy, #369), 037 (canonical deprecation, #370) as placeholders per the ADR convention; register in index.
- Draft docs/stability.md (D6): surface-by-surface compatibility map, deprecation policy, binary/schema skew support statement.
- Add the 0.7 planned-validation matrix (V1-V29) to docs/test-plan.md, mapped to filed issues and the #383 release-gate tracker.
- Sync docs/0.7-roadmap.md draft-issue references to filed numbers #367-#383.
Claude-Session: https://claude.ai/code/session_01JvK9hvfEziH2Xnm1pEUuUh
📝 Walkthrough
This PR adds documentation-only content for the Awa 0.7 release cycle: a planning brief, a roadmap document, five new ADRs (033-037), an updated ADR index, a new public surface stability policy document, and a test-plan section listing planned 0.7 validation items.
Changes: 0.7 Planning and Governance Documentation
Estimated code review effort: 2 (Simple) | ~10 minutes
🚥 Pre-merge checks | ✅ Passed checks (5 passed)
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 798644cb9c
| Item | Pri | Scope & acceptance |
| --- | --- | --- |
| **#335** CI sharding | P0 | Matrix-sharded Rust tests (own Postgres per shard), cargo-nextest, audit of every >60s test to pin behavior with sub-second configured windows. **Accept:** worst shard <8 min; no real-time production defaults load-bearing in tests. Do first — it multiplies every later item. |
| **#360** Engine-aware test harness | P0 | `AWA_TEST_ENGINE` parameterization (spike branch exists); engine guard for canonical-only tests; fix the 13 triaged test artifacts; CI matrix leg. Under D2 this becomes a *bounded* compat suite: full matrix until 0.8 removes canonical, then retires to the forward-compat matrix. **Accept:** broad suite green under `queue_storage`; `cancel_by_unique_key`-class defects structurally impossible to ship again. |
Remove completed #360 harness work from M0
This row scopes the `AWA_TEST_ENGINE` parameterization, canonical-only guard, artifact fixes, and CI queue-storage leg as future P0 work, but the tree already contains those pieces: `.github/workflows/ci.yml` has a `rust-test-queue-storage` job running `integration_test` with `AWA_TEST_ENGINE=queue_storage` (lines 89-124), and `awa-testing/src/setup.rs` has the env-driven engine switch plus the `skip_unless_canonical` guard (lines 52-170). Leaving this stale spike text in the roadmap can send 0.7 planning back through work that has already landed. If #360 still has residual follow-up, narrow this row to that remaining scope.
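For readers unfamiliar with the pattern the comment refers to, here is a minimal Python sketch of an env-driven engine switch with a canonical-only skip guard. It is an illustration only, not the code in `awa-testing/src/setup.rs`: the function names, the `canonical` default, and the set of accepted values are assumptions; only the `AWA_TEST_ENGINE` variable and the `queue_storage` value come from the review text.

```python
import os

# Assumed set of accepted values; the review only names "queue_storage"
# explicitly and refers to the other engine as "canonical".
KNOWN_ENGINES = ("canonical", "queue_storage")


def selected_engine(env=None):
    """Return the engine named by AWA_TEST_ENGINE (assumed default: canonical)."""
    env = os.environ if env is None else env
    engine = env.get("AWA_TEST_ENGINE", "canonical")
    if engine not in KNOWN_ENGINES:
        raise ValueError(f"unknown AWA_TEST_ENGINE: {engine!r}")
    return engine


def should_skip_canonical_only(env=None):
    """Hypothetical analogue of a skip_unless_canonical guard.

    Returns True when a canonical-only test should be skipped because a
    different engine is selected.
    """
    return selected_engine(env) != "canonical"
```

A CI leg would then export `AWA_TEST_ENGINE=queue_storage` before running the suite, and canonical-only tests would consult the guard before doing any work.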
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@docs/0.7-planning-brief.md`:
- Around line 243-256: Update the `#295` entry in the planning brief so it no
longer presents the storage RFC as committed v0.7.0 scope; revise the row and
the “#### `#295`” paragraph to match the roadmap’s Gate A evidence-track framing.
Keep the wording consistent with the rest of the table and remove any
release-scope signal that implies implementation commitment, using the `#295` row
and its “flagship” description as the primary references.
In `@docs/0.7-roadmap.md`:
- Around line 180-182: The Markdown table cells in the roadmap entries are being
split by unescaped pipe characters, breaking the table layout. Update the
affected cells in the roadmap content so the option lists in the backpressure
and enqueue semantics text are escaped or otherwise rendered as literal text,
and verify the table stays at the intended column count when rendered. Use the
existing roadmap entries around `#341` and the enqueue-related cell text as the
anchor for the fix.
- Around line 311-315: The milestone text in the roadmap is being broken by
wrapped lines that start with issue numbers, which Markdown can interpret as
headings. Reflow the affected milestone entries or escape those continuation
lines in the roadmap section around the §7 gates and Exit items so the wrapped
issue-number references stay part of the same bullet text. Apply the same fix
consistently to all similar milestone lines in that block.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 0a9f3f6b-518c-467e-b30f-e1701169d044
📒 Files selected for processing (10)
- docs/0.7-planning-brief.md
- docs/0.7-roadmap.md
- docs/adr/033-per-key-execution-control.md
- docs/adr/034-job-dependencies.md
- docs/adr/035-backpressure-flow-control.md
- docs/adr/036-public-surface-stability-policy.md
- docs/adr/037-canonical-engine-deprecation.md
- docs/adr/README.md
- docs/stability.md
- docs/test-plan.md
| **295** | RFC: v0.7 storage engine — segment storage + cursor allocator | feature, correctness | **The defining 0.7 item.** RFC-stage, no code. |
| **246** *(not milestoned — see 4.2, but tightly bound to 295)* | | | |
| **360** | Test harness: run suite under both storage engines | — | Quality/safety prerequisite for evolving storage. |
| **303** | maintenance: split rescue+promote into separate tokio task (v0.7) | operational | **Conditional** — telemetry-gated, may close wontfix. |
| **282** | Add a maintenance-only runtime role | feature, operational | Implements ADR-028. |
| **143** | Split awa-api from awa-ui | — | Decouple API response types from embedded React. |
| **118** | Serverless-friendly dispatch: `tick()` endpoint | — | Zero-infra deployments; pairs with ADR-028. |
| **110** | Tracing (end-to-end distributed tracing) | feature | Context propagation enqueue→exec, across PyO3. |
| **14** | Job dependencies: run B after A completes | feature | Must stay minimal (A→B, not DAGs) per non-goal. |

#### #295 — RFC: v0.7 storage engine — segment storage + cursor allocator *(flagship)*
Milestone `v0.7.0`. **Design RFC, no implementation.** Directly targets the #169 residual
degradation (799 → 387 jobs/s at 800 offered/s over a 2h pinned phase) — the class of behaviour
0.6's ADR-012/023/025/026 mitigated but did not eliminate.
📐 Maintainability & Code Quality | 🟠 Major | ⚡ Quick win
Align #295 with the roadmap.
This row still treats the storage RFC as committed v0.7.0 scope, but the roadmap now makes it a Gate A evidence track. Readers will get conflicting release-scope signals unless one of these is updated.
Suggested adjustment
-| **295** | RFC: v0.7 storage engine — segment storage + cursor allocator | feature, correctness | **The defining 0.7 item.** RFC-stage, no code. |
+| **295** | RFC: v0.7 storage engine — segment storage + cursor allocator | feature, correctness | Evidence track; scope depends on Gate A. |
📝 Committable suggestion
| **295** | RFC: v0.7 storage engine — segment storage + cursor allocator | feature, correctness | Evidence track; scope depends on Gate A. |
| **246** *(not milestoned — see 4.2, but tightly bound to 295)* | | | |
| **360** | Test harness: run suite under both storage engines | — | Quality/safety prerequisite for evolving storage. |
| **303** | maintenance: split rescue+promote into separate tokio task (v0.7) | operational | **Conditional** — telemetry-gated, may close wontfix. |
| **282** | Add a maintenance-only runtime role | feature, operational | Implements ADR-028. |
| **143** | Split awa-api from awa-ui | — | Decouple API response types from embedded React. |
| **118** | Serverless-friendly dispatch: `tick()` endpoint | — | Zero-infra deployments; pairs with ADR-028. |
| **110** | Tracing (end-to-end distributed tracing) | feature | Context propagation enqueue→exec, across PyO3. |
| **14** | Job dependencies: run B after A completes | feature | Must stay minimal (A→B, not DAGs) per non-goal. |

#### `#295` — RFC: v0.7 storage engine — segment storage + cursor allocator *(flagship)*
Milestone `v0.7.0`. **Design RFC, no implementation.** Directly targets the `#169` residual
degradation (799 → 387 jobs/s at 800 offered/s over a 2h pinned phase) — the class of behaviour
0.6's ADR-012/023/025/026 mitigated but did not eliminate.
| **#295** Storage RFC → Gate A | P0 (decision), scope conditional | Run E1 (allocator bake-off) + E3 (WAL decomposition). Author the draft ADR from results. **Gate A decision rule (§5)** determines whether restructuring migrations enter 0.7 or the RFC graduates to 0.8 with data. Either way #295's RFC questions 1–5 get *answered*, not deferred. |
| **#371** Ring-state metadata striping | P1 | The residual pinned-horizon accumulators after #355 are `lease_ring_state` (~14k dead tuples/hr) and `claim_ring_state` (~3.5k). Apply the ADR-026 discipline (striping or delta-append) to ring bookkeeping. Small, measurable, keeps the "no dominant hot row" invariant tight. **Accept:** long-horizon idle-phase dead tuples for ring-state ≤ noise. |
| **#341** Backpressure | P1 | Per the issue's own design lean: **soft-signal default, opt-in hard rejection.** Concretely: (a) enqueue paths can return a depth signal (`EnqueueOutcome::pressure`) sourced from lane-head cursors (index-only, no scans); (b) `InsertOpts::backpressure: Off | Signal | Reject{limit}` — `Reject` returns a typed error and is documented as *changing transactional-enqueue semantics* (ADR-006 tension made explicit); (c) `PacedProducer` helpers in Rust + Python (the pattern the bench harness already proved externally); (d) metrics `awa.enqueue.backpressure.{signaled,rejected}`. E6 validates defaults. **Accept:** 2×-capacity offered load with paced producer holds bounded depth; hard-reject never enabled implicitly. |
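The three backpressure modes in the #341 row can be modeled in a few lines. The following Python sketch is a toy model of the described semantics, not Awa's API: every class and function name is hypothetical, treating `Signal` as the default is one reading of "soft-signal default", and rejecting when depth is greater than or equal to the limit is an assumed boundary rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mode(Enum):
    OFF = "off"
    SIGNAL = "signal"
    REJECT = "reject"


class QueueFull(Exception):
    """Typed error standing in for the roadmap's hard-rejection outcome."""


@dataclass(frozen=True)
class Backpressure:
    # Assumption: SIGNAL is the default, hard rejection is opt-in.
    mode: Mode = Mode.SIGNAL
    limit: Optional[int] = None  # only meaningful for Mode.REJECT


@dataclass(frozen=True)
class EnqueueOutcome:
    accepted: bool
    pressure: Optional[int]  # observed depth, or None when signalling is off


def enqueue(depth: int, policy: Backpressure = Backpressure()) -> EnqueueOutcome:
    """Decide the outcome of one enqueue given the current queue depth."""
    if policy.mode is Mode.REJECT:
        if policy.limit is None:
            raise ValueError("REJECT requires an explicit limit")
        if depth >= policy.limit:  # assumed boundary: reject at the limit
            raise QueueFull(f"depth {depth} >= limit {policy.limit}")
        return EnqueueOutcome(accepted=True, pressure=depth)
    if policy.mode is Mode.SIGNAL:
        return EnqueueOutcome(accepted=True, pressure=depth)
    return EnqueueOutcome(accepted=True, pressure=None)
```

Requiring an explicit limit for `REJECT` reflects the acceptance criterion that hard rejection is never enabled implicitly.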
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Escape the pipes in these table cells.
`Off | Signal | Reject{limit}` and `Cancel | EnqueueWithContext | Discard` contain unescaped pipes that split the Markdown table rows into extra columns, so the acceptance text will render incorrectly.
Suggested fix
- | **`#341`** Backpressure | P1 | ... `InsertOpts::backpressure: Off | Signal | Reject{limit}` ... |
+ | **`#341`** Backpressure | P1 | ... `InsertOpts::backpressure: Off \| Signal \| Reject{limit}` ... |
- | **`#14`** Job dependencies | P1 | ... `on_parent_failure: Cancel | EnqueueWithContext | Discard`. ... |
+ | **`#14`** Job dependencies | P1 | ... `on_parent_failure: Cancel \| EnqueueWithContext \| Discard`. ... |
Also applies to: 210-212
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 182-182: Table column count
Expected: 3; Actual: 5; Too many cells, extra data will be missing
(MD056, table-column-count)
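The column-count warning above can be reproduced, and the suggested fix checked, with a short script. This is a minimal Python sketch, assuming rows carry leading and trailing pipes and that code spans use balanced single backticks; the helper names are made up for illustration and are not part of markdownlint.

```python
import re

# Matches a pipe that is not already preceded by a backslash.
_UNESCAPED_PIPE = re.compile(r"(?<!\\)\|")


def cell_count(row: str) -> int:
    """Count cells in a table row that has leading and trailing pipes."""
    return len(_UNESCAPED_PIPE.findall(row.strip())) - 1


def escape_pipes_in_code_spans(row: str) -> str:
    """Escape unescaped pipes inside single-backtick code spans only."""
    parts = row.split("`")
    # Odd-indexed parts sit between a pair of backticks.
    for i in range(1, len(parts), 2):
        parts[i] = _UNESCAPED_PIPE.sub(r"\\|", parts[i])
    return "`".join(parts)
```

Applied to an abbreviated form of the #341 row, `cell_count` reports five cells before the fix and three after it, which matches the linter's "Expected: 3; Actual: 5".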
Source: Linters/SAST tools
#335, #360, #367, #143, #368, #369, #370 groundwork, housekeeping closes, #383 tracker
opened with §7 gates.
*Exit:* worst CI shard <8 min; dual-engine suite green; compat matrix in nightly; `awa-api`
published as a crate; health endpoints merged; stability policy merged.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Reflow or escape the issue-number continuations.
The wrapped lines starting with #335, #282, #118, #110, and #340 will be parsed as headings, not milestone text.
Suggested pattern
- `#335`, `#360`, `#367`, `#143`, `#368`, `#369`, `#370` groundwork, housekeeping closes, `#383` tracker
+ **`#335`**, **`#360`**, **`#367`**, **`#143`**, **`#368`**, **`#369`**, **`#370`** groundwork, housekeeping closes; **`#383`** tracker.
Apply the same treatment to the other milestone lines.
Also applies to: 323-327, 329-332, 334-336
🧰 Tools
🪛 markdownlint-cli2 (0.22.1)
[warning] 311-311: No space after hash on atx style heading
(MD018, no-missing-space-atx)
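Lines of this shape can be located mechanically before reflowing. The Python sketch below approximates the pattern MD018 warns about (hashes followed immediately by a non-space character); it is not markdownlint's implementation, and the function name is hypothetical. Note that CommonMark requires a space or tab after the hashes for an ATX heading, so whether such a line actually renders as a heading depends on the renderer; the linter flags it either way.

```python
import re
from typing import List

# One to six hashes followed immediately by a character that is neither
# whitespace nor another hash, allowing up to three leading spaces.
_HASH_NO_SPACE = re.compile(r"^ {0,3}#{1,6}[^#\s]")


def flag_hash_led_lines(text: str) -> List[int]:
    """Return 1-based numbers of lines that begin with '#<non-space>'."""
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if _HASH_NO_SPACE.match(line)
    ]
```

Running it over the milestone block would list each wrapped line that starts with an issue number, which are the lines to reflow or wrap in backticks.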
Source: Linters/SAST tools