feat(observability): direct-POST metrics + error channel, per-call fetch operations, audit CLI by vibe-dex · Pull Request #179 · decocms/deco-start

vibe-dex · 2026-05-18T13:19:11Z

Summary

Builds on the 5.0 CF-native observability pipeline. Stamps service.name and service.version (from CF_VERSION_METADATA) on every framework span and log line via the attribute floor, and sets SpanStatusCode.ERROR on framework spans when withTracing rethrows. Adds new spans across the framework — cache lookup/store decisions in both cache paths (HTML + serverFn) with profile/kind/decision labels via an extended recordCacheMetric(hit, profile, decision), the runSectionLoaders batch, the deferred section load, and every admin protocol handler (meta, decofile.read|reload, render, invoke). Logger and logRequest now stamp trace_id/span_id on every line emitted inside a withTracing scope, so logs and traces correlate in ClickStack. New public helper injectTraceContext(headers) writes a W3C traceparent from the active span for outbound fetches (consumed by createInstrumentedFetch in @decocms/apps). Ships docs/observability.md covering the architecture, span/metric reference, wrangler.jsonc shape, identity floor, ClickHouse JSONExtract query patterns, trace propagation, and sampling notes; linked from README.

Test plan

bun run typecheck clean
bun test — all 553 tests pass (11 new)
bun run build succeeds; new exports (injectTraceContext, CacheDecision, spanContext) appear in dist d.ts
bunx tsx scripts/check-tier-boundaries.ts — clean
Post-merge smoke on a preview deployment with version_metadata binding + ingest destinations: confirm ResourceAttributes['service.version'] and SpanAttributes['cache.profile'] land in ClickHouse; force a 5xx and verify the framework span shows STATUS_CODE_ERROR in ClickStack; filter logs by trace_id and confirm correlation.

🤖 Generated with Claude Code

Summary by cubic

Expands CF-native observability with wider span coverage, direct OTLP/HTTP metrics and 100%-error logs, safe URL logging, trace/log correlation, per-call operation names on outbound fetches, and a deco-audit-observability CLI. Also exposes Router preload reuse options and fixes fetch method/header semantics, exporter reliability, and redaction edge cases.

New Features
- Error logs: direct OTLP/HTTP channel posts level: "error" to /v1/logs when DECO_OTEL_LOGS_ENDPOINT is set; rate-limited, buffered per isolate, flushed via ctx.waitUntil. Safe to lower observability.logs.head_sampling_rate.
- Metrics: OTLP/HTTP meter posts to /v1/metrics when DECO_OTEL_METRICS_ENDPOINT is set; aggregates per isolate, flushes via ctx.waitUntil; composes with Workers Analytics Engine.
- Spans/metrics: stamps service.name/service.version; sets error status on framework spans; new spans for cache lookup/store (HTML + serverFn), deco.section.loaders.batch, deferred section load, and admin handlers. Cache metrics carry a decision label and set deco.cache.decision/deco.cache.profile on the active span.
- Correlation/propagation: logger/logRequest add trace_id/span_id inside withTracing. injectTraceContext(headers) writes W3C traceparent; used by createInstrumentedFetch (default on, opt out with injectTraceparent: false).
- Outbound fetches: createInstrumentedFetch adds per-call operation names — init.operation, defaultOperation, and resolveOperation(url, method); stamps fetch.operation on the span and exposes it via onComplete.
- URL safety: createInstrumentedFetch redacts query values on http.url and dev logs; allowlist via keepQueryKeys. Public redactUrl helper exported.
- Router: expose preload reuse knobs (defaultPreloadStaleTime, defaultPreloadGcTime, defaultPreloadDelay, defaultStaleTime, defaultPendingMs, defaultPendingMinMs); adds deco-pdp-fast-navigation reference.
- Audit/docs/defaults: deco-audit-observability audits wrangler.jsonc; observability.traces.head_sampling_rate defaults to 0.01. docs/observability.md documents wiring and cost model; README links it. New exports: @decocms/start/sdk/otelHttpMeter, @decocms/start/sdk/otelHttpErrorLog, injectTraceContext, CacheDecision, redactUrl.
Bug Fixes
- Fetch: header handling now matches the Fetch spec — init.headers replaces Request.headers; when omitted, existing headers are preserved. Method resolution now follows init.method ?? input.method ?? "GET" so http.method and resolveOperation see the real verb.
- Error-log exporter: on non-2xx/transport errors, restores the flushed snapshot instead of dropping records, with snapshot-first policy — evicts newest live-buffer records to honor the buffer cap; only truncates the snapshot if it alone exceeds the cap; guards attribute serialization when JSON.stringify is undefined.
- Metrics exporter: checks overflow before creating metric entries to avoid empty OTLP envelopes.
- URL redaction: fallback path also strips fragments (#...) to avoid leaking tokens.
- JSONC parsing: adds trailing-comma support via shared helpers; deco-audit-observability and deco-cf-observability handle real wrangler.jsonc files cleanly.

^{Written for commit 999fd30. Summary will update on new commits. Review in cubic}

…and trace correlation Stamps service.name and service.version (from CF_VERSION_METADATA) on every framework span and log via the attribute floor; sets SpanStatusCode.ERROR on framework spans when withTracing rethrows; adds spanContext() on the Span adapter so logger and propagation helpers can read trace IDs. New framework spans cover cache lookup/store decisions (HTML + serverFn paths) with profile and kind attributes, the runSectionLoaders batch, deferred section load, and every admin protocol handler (meta, decofile read/reload, render, invoke). recordCacheMetric now takes a CacheDecision label (HIT|STALE-HIT|STALE-ERROR|MISS|BYPASS) and is wired at every decision point. Logger emit and logRequest now stamp trace_id/span_id on every line emitted inside a withTracing scope, so logs and traces correlate in ClickStack. New injectTraceContext(headers) helper writes a W3C traceparent for outbound fetches (consumed by createInstrumentedFetch in @decocms/apps). Adds docs/observability.md covering architecture, span/metric reference, wrangler.jsonc shape, identity floor, ClickHouse JSONExtract query patterns, trace propagation, and sampling notes; README links it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

cubic-dev-ai

1 issue found across 15 files

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

…ng default to 1% Phase 1 — fix spy leak flagged by cubic review on PR #179. The `instrumentWorker — tracer bridge` describe block in `otel.test.ts` installs a `vi.spyOn(trace, "getTracer")` per test via `installBridgeWithFakeOtelSpan()` but the `afterEach` only reset boot state — it never restored the spy. That meant any subsequent test in the same vitest worker that touched the real OTel global API would get the mocked tracer back. Added `vi.restoreAllMocks()` to the afterEach so each test's spies die with it. 553/553 still pass. Phase 2 — lower documented trace `head_sampling_rate` default from 0.1 (10%) to 0.01 (1%) across docs, scaffold, codemod, and skill reference. At fleet scale (~2.5B req/mo across 100 sites, ~20 spans/req), 10% puts the account at meaningful risk of tripping the CF 5B-events/day cap, after which CF auto-applies forced 1% sampling for the rest of the day fleet-wide — turning a single viral campaign on one storefront into a fleet-wide outage of telemetry. 1% gives 2.5x headroom and keeps annual telemetry cost in the low hundreds of dollars. Also added to `docs/observability.md`: - Heavy-site override tier (0.001 / 0.1%) gated on explicit team sign-off, NOT a codemod default. No documented rate above 1%. - Cost model table (CF Destinations + ingestor + AE + DO) verified against current public CF pricing. - Note that the 100%-error-traces upgrade path (ingest-side filtering at 1.0) costs ~$44K/mo at our scale vs ~$420/mo for the chosen path of 1% head + direct-POST errors. The migration codemod's `--traces-rate` default now ships 0.01 so newly-migrated sites land on the safe number out of the box. Co-authored-by: Cursor <cursoragent@cursor.com>

…ingest CF Destinations supports OTLP for logs and traces only — there is no `observability.metrics` block. To get metrics into stats-lake the framework now POSTs OTLP/HTTP JSON straight to deco-otel-ingest's `/v1/metrics` endpoint, alongside (not replacing) the AE meter. Adds `createOtlpHttpMeterAdapter` in `src/core/sdk/otelHttpMeter.ts`: - Per-isolate, in-memory buffer for counters/gauges/histograms with CUMULATIVE aggregation temporality (matches the clickhouseexporter schema the ingestor inherits). - Counters aggregate by (metric name, attribute fingerprint); each `counterInc()` bumps the cumulative value in place. - Gauges are last-write-wins per attribute fingerprint. - Histograms accumulate count/sum/min/max + per-bucket counts against explicit bounds (default `[5,10,25,50,75,100,250,500,1000]` ms). - Re-registering a metric name as a different kind drops the second call and surfaces via `onError("kind-mismatch", …)` — never throws. - Buffer cap (default 2000 datapoints) bounds isolate memory. New attribute fingerprints past the cap are dropped via `onError("overflow", …)`; existing fingerprints still update. - `flush()` POSTs the snapshot and is throttled by `minFlushIntervalMs` (default 5000ms) per isolate. Concurrent callers share a single in-flight POST. Cooldown is bypassed when the buffer is at the cap. - Transport failures (network, non-2xx, AbortError on timeout) never throw — they surface via `onError("flush", err)` so the request hot path stays uninstrumented. Wires into `instrumentWorker`: - `bootObservability` reads `env.DECO_OTEL_METRICS_ENDPOINT` (env var configurable via `OtelOptions.otlpMetricsEndpointEnvVar`). When set and `otlpMetricsEnabled !== false`, the OTLP meter is installed beside the AE meter via `createCompositeMeter([ae, otlp])`. - After every `handler.fetch`, the wrapped fetch fires `ctx.waitUntil(otlpMeter.flush())` so the buffer drains without blocking the response. The throttle keeps actual POST volume proportional to traffic, not to request count. - Boot breadcrumb adds `otlpMetrics: true|false` so operators can confirm the wire at a glance from CF Logs. Tests: - `otelHttpMeter.test.ts` — 10 unit tests covering cumulative sum aggregation, histogram bucket assignment, gauge last-write-wins, kind-mismatch rejection, cooldown gating, overflow protection, in-flight flush coalescing, non-2xx + transport failures. - `otel.test.ts` — 4 integration tests covering the boot/flush wiring (no-op without env var, flush enqueued via ctx.waitUntil, `otlpMetricsEnabled: false` opt-out, flush still fires when the inner handler throws). End-to-end verified live against the deployed deco-otel-ingest endpoint: sum, gauge, and histogram all round-trip through the exporter → ingestor → ClickHouse `otel_metrics_{sum,gauge,histogram}` with correct values, attribute fingerprints, and timestamps. Smoke harness at `scripts/smoke-otlp-meter.ts` for repeatability. 567/567 vitest pass, typecheck clean. Co-authored-by: Cursor <cursoragent@cursor.com>

cubic-dev-ai

1 issue found across 9 files (changes from recent commits).

_{Reply with feedback, questions, or to request a fix.

Re-trigger cubic}

…nstrumentedFetch Two related changes that close gaps in outbound fetch instrumentation before downstream services see our traffic: **`redactUrl()` (new in `src/core/sdk/urlRedaction.ts`).** Replaces every query value with `"REDACTED"` by default, with an opt-in `keepQueryKeys` allowlist for structural params (`page`, `sort`, etc.). Strips userinfo and the fragment, preserves host + path verbatim, and falls back to a defensive substring-before-`?` for unparseable inputs. Exported as `@decocms/start/sdk/urlRedaction`. Why redact at the emit site instead of relying on the ingestor: spans land in CF Workers Tracing the instant they're stamped — once the unredacted URL is in the CF dashboard, we can't take it back. Same for `console.log` lines captured by `observability.logs`. The ingestor's existing header-redaction layer doesn't reach the URL, so this is the only point where outbound URLs get sanitized. **`createInstrumentedFetch` updates:** - `http.url` span attribute now carries the redacted URL. - The colored dev log lines (`[vtex] GET ...`) print the redacted URL. - New `keepQueryKeys: ReadonlyArray<string>` option propagates to `redactUrl` for callers that want to keep specific keys readable. - New `injectTraceparent: boolean` option (default `true`) injects the active span's W3C `traceparent` header onto every outbound request via `injectTraceContext(headers)`. No-op when no span is active, so CF Worker auto-instrumented spans automatically propagate when present and the wrapper is invisible otherwise. - Header merging preserves caller-supplied headers when `input` is a `Request` — pre-existing values win, the wrapper adds traceparent only. The structured `"outgoing fetch"` breadcrumb (only emitted when `OTEL_LOG_OUTGOING_FETCH=true`) keeps using `host`/`path` derived from the rawUrl since both fields are individually safe and the value lands in the logger pipe where the ingestor's redaction layer applies. Tests: - `urlRedaction.test.ts` — 10 cases: every-value redaction, allowlist, empty-value preservation, userinfo stripping, fragment drop, multi- value query collapse, unparseable input fallback. - `instrumentedFetch.test.ts` — 7 cases: redacted URL on span, allowlist works, traceparent injection on outbound request when span has a spanContext, opt-out via `injectTraceparent: false`, no-op when no span is active, caller headers preserved alongside injected traceparent. 584/584 vitest pass, typecheck clean, tier-boundaries audit clean. This unblocks PR #3 in `@decocms/apps-start`: VTEX/Shopify fetch wrappers will migrate to `createInstrumentedFetch` and inherit traceparent propagation + URL redaction for free, instead of duplicating the logic in each app. Co-authored-by: Cursor <cursoragent@cursor.com>

cubic-dev-ai

3 issues found across 6 files (changes from recent commits).

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

…ture Errors are too important to leave to head sampling. Once the cost model drops `logs.head_sampling_rate` below 1.0 — the recommended target is 0.01 once this channel ships — a sampled CF Destinations pipe loses 99% of `logger.error(...)` calls. Sites lose the one signal they care about most. This commit adds a separate, framework-controlled transport for `level: "error"` records via direct POST to deco-otel-ingest `/v1/logs` (same endpoint CF Destinations uses; same `default.otel_logs` table; no ingestor changes required). **`createOtlpHttpErrorLogAdapter` (new in `src/core/sdk/otelHttpErrorLog.ts`).** - Returns `{ adapter, flush, pendingRecordCount }`. The `adapter` is a `LoggerAdapter` that filters out debug/info/warn upstream of the buffer — only errors travel this transport. The composite logger fans non-error calls out to `defaultLoggerAdapter` which CF Destinations samples normally. - Token-bucket rate limiter prevents log storms (a pathological error loop blasting 10K errors/sec would be self-inflicted DoS on stats-lake). Default: 100 errors/min refill, burst capacity 20. Both knobs are options. - Buffer cap (default 500 records) bounds isolate memory. Overflow surfaces via `onError("overflow", …)`; existing records still queue. - `flush()` POSTs and is throttled by `minFlushIntervalMs` (default 5000ms). Concurrent callers share a single in-flight POST. Cooldown is bypassed when the buffer is at the cap. - Snapshot-then-reset semantics: records buffered after `flush()` starts the POST aren't lost — they queue for the next flush. - Transport failures surface via `onError("flush", err)`. Never throws. - Wire format: OTLP/HTTP JSON `ResourceLogs` matching what CF Destinations emits, so the existing ingestor handler accepts both transports unchanged. Scalar attribute kinds (string/bool/int/double) serialize to the right OTLP variants; structured attrs round-trip as JSON strings. **`instrumentWorker` wiring:** - `bootObservability` reads `env.DECO_OTEL_LOGS_ENDPOINT` (env var configurable via `OtelOptions.otlpErrorLogsEndpointEnvVar`). When set and `otlpErrorLogsEnabled !== false`, the error-log adapter is composed onto `defaultLoggerAdapter` via `createCompositeLogger`. - After every `handler.fetch`, the wrapped fetch fires `ctx.waitUntil(otlpErrorLog.flush())` alongside the metrics flush. Both throttle themselves; calling on every request is cheap. - Boot breadcrumb adds `otlpErrorLogs: true|false`. - The exporter's `onError` callback uses `console.warn` directly (not `logger.warn`) to avoid recursing back into the logger composite. Tests: - `otelHttpErrorLog.test.ts` — 10 unit tests: level filter, OTLP shape, resource attributes, scalar serialization for all kinds, token-bucket rate limiting with refill, buffer overflow, flush success/non-200/ rejection, cooldown gating with cap bypass, in-flight coalescing. - `otel.test.ts` — 3 integration tests: error routes through direct POST when env is set, info/warn/debug do NOT trigger POSTs, opt-out via `otlpErrorLogsEnabled: false`. End-to-end verified live against deployed deco-otel-ingest: an error record with mixed-kind attributes (string, bool, int) round-trips into `default.otel_logs` with correct SeverityText="error", SeverityNumber=17, Body, LogAttributes, ServiceName, and ScopeName="@decocms/start". Smoke harness at `scripts/smoke-otlp-errorlog.ts`. 597/597 vitest pass, typecheck clean. With this channel in place, the next safe step (in a follow-up PR) is to lower `logs.head_sampling_rate` from 1.0 to 0.01, dropping the fleet-wide CF Destinations log bill by ~99% while keeping 100% of errors via this transport. Co-authored-by: Cursor <cursoragent@cursor.com>

Phase 6 (deeper instrumentation): the highest-value addition over the existing withTracing coverage is making cache decisions filterable on traces directly, without joining to the metric tables. `recordCacheMetric` now stamps `deco.cache.decision` and `deco.cache.profile` on the closest enclosing span before recording the counter. Existing call sites in workerEntry.ts (HIT / STALE-HIT / STALE-ERROR / MISS / BYPASS) get this for free — no call-site changes. This is intentionally narrower than the originally scoped block.resolve / schema.compose / render.shell spans. Those would either duplicate existing spans (resolvePage already wraps block resolution) or fire only at boot time (composeMeta), and would explode cardinality without proportional value. Stamping the cache decision on the span that's already there is one attribute per request, zero new spans. Tests: 4 new cases covering decision+profile stamp, no-span safety, profile-omitted, and confirming the counter still fires. Co-authored-by: Cursor <cursoragent@cursor.com>

cubic-dev-ai

2 issues found across 6 files (changes from recent commits).

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

…rangler.jsonc Phase 7 (audit rules): the **detect** half of D3 ("audit is the safety net"). Pairs 1:1 with the existing `deco-cf-observability --write` fix — every rule has a `fix:` line in its finding pointing at the exact codemod flag that resolves it. Read-only auditor with 6 rules covering the fleet-scale failure modes: observability_missing error observability_disabled error traces_disabled / logs_disabled warn head_sampling_rate_elevated error (traces > 0.01) logs_head_sampling_rate_low warn (logs < 1) persist_disabled_no_destination error CI-friendly: exit 0 on clean, 1 on findings (any severity), 2 on unparseable wrangler.jsonc. `--json` for machine-readable output. Wired up as a published bin: `deco-audit-observability`. Storefronts can run it in CI; a future CF Tail Worker pre-merge gate can run it against PR changes. Also factored `stripJsoncComments` out of the codemod into `scripts/lib/jsonc.ts` so both scripts share the helper. Tests: 12 cases — canonical-passes, every rule fires, boundary value 0.01 doesn't false-fire, persist:false-with-destination is fine, multiple findings stack correctly. Co-authored-by: Cursor <cursoragent@cursor.com>

…loss profile Phase 9 (docs refresh): aligns docs/observability.md with what shipped across Phases 3, 4, 5, 6, 7. No more "will ship" or "to land" — only present tense for code that exists. Changes: - Architecture diagram now shows both transport paths (CF Destinations for traces + info/warn logs; direct POST for metrics + error logs). - New "Metrics: AE vs OTLP" subsection explaining the two-meter composite — what each pipe is for, what's lost by dropping each. - New "Data loss profile" table covering all five signal classes: traces, info/warn logs, error logs, OTLP metrics, AE metrics. Each with explicit loss conditions (sampling, rate limiter, buffer overflow, isolate eviction) so operators know where to look when a signal goes missing. - New "Wiring" subsection documenting the direct-POST env vars (DECO_OTEL_METRICS_ENDPOINT, DECO_OTEL_LOGS_ENDPOINT) — these are the user-facing config knobs that turn the new channels on. - Sampling section now uses present tense for the direct-POST error channel, with an explicit note that errors bypass head sampling. - Cost-model footnote moved from future-tense to past-tense for the shipped error channel; updated savings projection. - "Out of scope" trimmed and made specific — calls out that direct POST IS in scope (deliberately) for metrics and error logs only, not for spans or info-logs. Tests + typecheck still 613 passing. Co-authored-by: Cursor <cursoragent@cursor.com>

cubic-dev-ai

1 issue found across 9 files (changes from recent commits).

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

Phase 8 (cubic review fixes): triage of seven findings on the last five cubic reviews. All seven were real and now addressed. Tests up to 626 (from 613), all green. P1 — correctness: - instrumentedFetch.ts header semantics now follow the Fetch spec: when init.headers is provided, it REPLACES Request.headers; when it's omitted, Request.headers are preserved. Previous code unioned them, which silently leaked Request headers through even when callers intended to drop them. Two new tests pin the contract. - otelHttpErrorLog.ts no longer permanently drops records on a failed POST. The snapshot is now restored to the front of the buffer on non-2xx and on network errors, so a subsequent flush picks them up. The buffer's existing maxBufferRecords cap still bounds total memory if the endpoint stays down — newest-tail records are dropped via onError("overflow", …) only when the buffer is at the cap during restore. Three new tests cover the restore path including the cap-overflow case. P2 — defensive coverage: - urlRedaction.ts fallback path now strips fragments too (`#…`), not just queries — fragments carry OAuth implicit-grant access tokens. - otelHttpMeter.ts splits getOrCreate so the overflow check runs BEFORE materializing a new MetricEntry. Previously, a dropped datapoint left a permanent empty entry in the metrics map that serialized to an empty OTLP envelope each flush. - otelHttpErrorLog.ts attrToOtlpValue guards JSON.stringify(v) === undefined (functions, undefined, symbols) — previously surfaced as malformed { stringValue: undefined } in the OTLP payload. - lib/jsonc.ts adds stripJsoncTrailingCommas + parseJsonc helpers; the audit and codemod now use them so real wrangler.jsonc files with trailing commas (very common) audit cleanly instead of failing with a parse error. P3 — test honesty: - instrumentedFetch.test.ts "preserves rawUrl in the structured outgoing-fetch log" was misnamed: the test didn't enable OTEL_LOG_OUTGOING_FETCH and didn't assert on the logger. Replaced with two tests: one that flips the env flag, captures the logger, and asserts the breadcrumb's host/path/method/status; another that confirms the breadcrumb is absent when the flag is off. Docs: data loss profile in docs/observability.md updated to reflect the failed-flush restore path and the actual maxBufferRecords default. Co-authored-by: Cursor <cursoragent@cursor.com>

cubic-dev-ai

1 issue found across 12 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/core/sdk/otelHttpErrorLog.ts">

<violation number="1" location="src/core/sdk/otelHttpErrorLog.ts:206">
P2: When flush fails and the buffer is full, this branch drops the entire failed snapshot (older records) instead of evicting newer buffered entries. That reverses the intended “prioritize originating-error context” behavior and can lose the most relevant error logs during endpoint outages.</violation>
</file>

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

Add per-call operation resolution to the upstream createInstrumentedFetch helper so commerce integrations (@decocms/apps) can produce semantic span names like "vtex.intelligent-search.product_search" or "vtex.checkout.orderForm" without forking the framework's outbound-fetch span shape or paying for a second per-call child span. API additions: - `init.operation?: string` — per-call override. Stripped from init before reaching baseFetch. - `FetchInstrumentationOptions.defaultOperation?: string` — fallback when init.operation is unset. Useful for single-purpose clients (e.g. Resend → "emails.send" for every call). - `FetchInstrumentationOptions.resolveOperation?: (url, method) => string | undefined` — URL-derived long-tail router. Returns undefined to opt out (span falls back to "${name}.fetch"). Designed to give the apps repo a two-layer model: explicit operation strings at hot call sites, URL-derived defaults for the long tail. Resolution precedence: init.operation ?? defaultOperation ?? resolveOperation(url, method) ?? "fetch". The resolved operation is stamped as `fetch.operation` on the span (queryable independent of span name) and surfaced in the onComplete callback payload so per-app duration histograms can label by operation. Types: - `InstrumentedFetchInit = RequestInit & { operation?: string }` - `InstrumentedFetch = (input, init?: InstrumentedFetchInit) => Promise<Response>` `InstrumentedFetch` remains structurally assignable to typeof fetch (operation is optional), so existing setter signatures (setVtexFetch, setShopifyFetch) keep compiling unchanged. Apps repo will switch its module-level `let _fetch: typeof fetch` to `InstrumentedFetch` to surface the operation property type. Why this lives in @decocms/start instead of @decocms/apps: the framework already owns the outbound-fetch span shape (${name}.fetch). Letting two repos define parallel span shapes would force every call to produce two spans (outer apps-side wrapper + inner upstream ${name}.fetch), doubling per-month CF Destinations egress and ClickHouse storage. Single-span model is materially cheaper at fleet scale and architecturally cleaner. Tests: 7 new in instrumentedFetch.test.ts — explicit operation, defaultOperation, resolveOperation, undefined-returning router, precedence (init wins over default wins over router), onComplete payload, init-mutation safety (operation stripped before baseFetch). 633 tests total, all green; typecheck and biome clean. Docs: docs/observability.md gains a "Per-call operation names on outbound fetches" subsection under Trace propagation, and the Out-of- scope note for commerce spans is rewritten to reflect the framework/apps split (framework owns span shape, apps owns operation strings + provider histogram). Co-authored-by: Cursor <cursoragent@cursor.com>

cubic-dev-ai

1 issue found across 3 files (changes from recent commits).

_{Tip: Review your code locally with the cubic CLI to iterate faster.

Re-trigger cubic}

cubic flagged that the Phase 8 restore-on-failure path inverted the intended priority: when buffer + snapshot exceeded the cap, the path truncated the SNAPSHOT (older, more-likely-causal records) rather than the buffer (newer records that arrived during the failed POST). That contradicted the stated invariant ("preserve originating-error context"). Fix: restore policy is now snapshot-first, evict newest: 1. snapshot + buffer ≤ cap → unshift snapshot, done. 2. Otherwise, drop newest-tail records from the live buffer until either the buffer is empty or the total fits. 3. Only if the snapshot ALONE still exceeds the cap do we truncate it — and even then, keep the OLDEST end and drop the newest tail. The previous test "drops oldest-tail records of the snapshot" asserted the buggy behavior and has been replaced with two tests that pin the corrected semantics: - snapshot (2) + buffer (2) > cap (3) → drops 1 newer record from buffer, snapshot fully restored; onError surfaces the count and the preserved-record count. - snapshot (2) + buffer (2) > cap (2) → drops both buffer records; snapshot fully restored. Data-loss-profile in docs/observability.md updated to match. Co-authored-by: Cursor <cursoragent@cursor.com>

…omitted cubic flagged that `resolveOperation` (and the `http.method` span attr) received "GET" for any caller doing `fetch(new Request(url, { method: "POST" }))` without an explicit `init.method` — the wrapper's `init?.method || "GET"` ignored the Request's own method. POST/PUT/DELETE traffic surfaced as GET on spans and was routed by `resolveOperation(url, "GET")` even when the real method was different. Fix: method resolution now follows the Fetch spec — `init.method ?? input.method ?? "GET"`. Init wins (Fetch spec), then the Request's method survives, then we default to GET to match `new Request(url).method`. Tests: - a Request with method=POST and no init → span attr http.method=POST, resolveOperation called with "POST", span name reflects POST. - init.method=DELETE overrides Request.method=POST → http.method=DELETE. 636 tests total, all green. Co-authored-by: Cursor <cursoragent@cursor.com>

vibe-dex requested a review from a team May 18, 2026 13:19