feat(observability): direct-POST metrics + error channel, per-call fetch operations, audit CLI#179
feat(observability): direct-POST metrics + error channel, per-call fetch operations, audit CLI#179vibe-dex wants to merge 12 commits into
Conversation
…and trace correlation Stamps service.name and service.version (from CF_VERSION_METADATA) on every framework span and log via the attribute floor; sets SpanStatusCode.ERROR on framework spans when withTracing rethrows; adds spanContext() on the Span adapter so logger and propagation helpers can read trace IDs. New framework spans cover cache lookup/store decisions (HTML + serverFn paths) with profile and kind attributes, the runSectionLoaders batch, deferred section load, and every admin protocol handler (meta, decofile read/reload, render, invoke). recordCacheMetric now takes a CacheDecision label (HIT|STALE-HIT|STALE-ERROR|MISS|BYPASS) and is wired at every decision point. Logger emit and logRequest now stamp trace_id/span_id on every line emitted inside a withTracing scope, so logs and traces correlate in ClickStack. New injectTraceContext(headers) helper writes a W3C traceparent for outbound fetches (consumed by createInstrumentedFetch in @decocms/apps). Adds docs/observability.md covering architecture, span/metric reference, wrangler.jsonc shape, identity floor, ClickHouse JSONExtract query patterns, trace propagation, and sampling notes; README links it. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
1 issue found across 15 files
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
…ng default to 1% Phase 1 — fix spy leak flagged by cubic review on PR #179. The `instrumentWorker — tracer bridge` describe block in `otel.test.ts` installs a `vi.spyOn(trace, "getTracer")` per test via `installBridgeWithFakeOtelSpan()` but the `afterEach` only reset boot state — it never restored the spy. That meant any subsequent test in the same vitest worker that touched the real OTel global API would get the mocked tracer back. Added `vi.restoreAllMocks()` to the afterEach so each test's spies die with it. 553/553 still pass. Phase 2 — lower documented trace `head_sampling_rate` default from 0.1 (10%) to 0.01 (1%) across docs, scaffold, codemod, and skill reference. At fleet scale (~2.5B req/mo across 100 sites, ~20 spans/req), 10% puts the account at meaningful risk of tripping the CF 5B-events/day cap, after which CF auto-applies forced 1% sampling for the rest of the day fleet-wide — turning a single viral campaign on one storefront into a fleet-wide outage of telemetry. 1% gives 2.5x headroom and keeps annual telemetry cost in the low hundreds of dollars. Also added to `docs/observability.md`: - Heavy-site override tier (0.001 / 0.1%) gated on explicit team sign-off, NOT a codemod default. No documented rate above 1%. - Cost model table (CF Destinations + ingestor + AE + DO) verified against current public CF pricing. - Note that the 100%-error-traces upgrade path (ingest-side filtering at 1.0) costs ~$44K/mo at our scale vs ~$420/mo for the chosen path of 1% head + direct-POST errors. The migration codemod's `--traces-rate` default now ships 0.01 so newly-migrated sites land on the safe number out of the box. Co-authored-by: Cursor <cursoragent@cursor.com>
…ingest
CF Destinations supports OTLP for logs and traces only — there is no
`observability.metrics` block. To get metrics into stats-lake the
framework now POSTs OTLP/HTTP JSON straight to deco-otel-ingest's
`/v1/metrics` endpoint, alongside (not replacing) the AE meter.
Adds `createOtlpHttpMeterAdapter` in `src/core/sdk/otelHttpMeter.ts`:
- Per-isolate, in-memory buffer for counters/gauges/histograms with
CUMULATIVE aggregation temporality (matches the clickhouseexporter
schema the ingestor inherits).
- Counters aggregate by (metric name, attribute fingerprint); each
`counterInc()` bumps the cumulative value in place.
- Gauges are last-write-wins per attribute fingerprint.
- Histograms accumulate count/sum/min/max + per-bucket counts against
explicit bounds (default `[5,10,25,50,75,100,250,500,1000]` ms).
- Re-registering a metric name as a different kind drops the second
call and surfaces via `onError("kind-mismatch", …)` — never throws.
- Buffer cap (default 2000 datapoints) bounds isolate memory. New
attribute fingerprints past the cap are dropped via
`onError("overflow", …)`; existing fingerprints still update.
- `flush()` POSTs the snapshot and is throttled by `minFlushIntervalMs`
(default 5000ms) per isolate. Concurrent callers share a single
in-flight POST. Cooldown is bypassed when the buffer is at the cap.
- Transport failures (network, non-2xx, AbortError on timeout) never
throw — they surface via `onError("flush", err)` so the request hot
path stays uninstrumented.
Wires into `instrumentWorker`:
- `bootObservability` reads `env.DECO_OTEL_METRICS_ENDPOINT` (env var
configurable via `OtelOptions.otlpMetricsEndpointEnvVar`). When set
and `otlpMetricsEnabled !== false`, the OTLP meter is installed
beside the AE meter via `createCompositeMeter([ae, otlp])`.
- After every `handler.fetch`, the wrapped fetch fires
`ctx.waitUntil(otlpMeter.flush())` so the buffer drains without
blocking the response. The throttle keeps actual POST volume
proportional to traffic, not to request count.
- Boot breadcrumb adds `otlpMetrics: true|false` so operators can
confirm the wire at a glance from CF Logs.
Tests:
- `otelHttpMeter.test.ts` — 10 unit tests covering cumulative sum
aggregation, histogram bucket assignment, gauge last-write-wins,
kind-mismatch rejection, cooldown gating, overflow protection,
in-flight flush coalescing, non-2xx + transport failures.
- `otel.test.ts` — 4 integration tests covering the boot/flush
wiring (no-op without env var, flush enqueued via ctx.waitUntil,
`otlpMetricsEnabled: false` opt-out, flush still fires when the
inner handler throws).
End-to-end verified live against the deployed deco-otel-ingest
endpoint: sum, gauge, and histogram all round-trip through the
exporter → ingestor → ClickHouse `otel_metrics_{sum,gauge,histogram}`
with correct values, attribute fingerprints, and timestamps. Smoke
harness at `scripts/smoke-otlp-meter.ts` for repeatability.
567/567 vitest pass, typecheck clean.
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
1 issue found across 9 files (changes from recent commits).
Reply with feedback, questions, or to request a fix.
Re-trigger cubic
…nstrumentedFetch Two related changes that close gaps in outbound fetch instrumentation before downstream services see our traffic: **`redactUrl()` (new in `src/core/sdk/urlRedaction.ts`).** Replaces every query value with `"REDACTED"` by default, with an opt-in `keepQueryKeys` allowlist for structural params (`page`, `sort`, etc.). Strips userinfo and the fragment, preserves host + path verbatim, and falls back to a defensive substring-before-`?` for unparseable inputs. Exported as `@decocms/start/sdk/urlRedaction`. Why redact at the emit site instead of relying on the ingestor: spans land in CF Workers Tracing the instant they're stamped — once the unredacted URL is in the CF dashboard, we can't take it back. Same for `console.log` lines captured by `observability.logs`. The ingestor's existing header-redaction layer doesn't reach the URL, so this is the only point where outbound URLs get sanitized. **`createInstrumentedFetch` updates:** - `http.url` span attribute now carries the redacted URL. - The colored dev log lines (`[vtex] GET ...`) print the redacted URL. - New `keepQueryKeys: ReadonlyArray<string>` option propagates to `redactUrl` for callers that want to keep specific keys readable. - New `injectTraceparent: boolean` option (default `true`) injects the active span's W3C `traceparent` header onto every outbound request via `injectTraceContext(headers)`. No-op when no span is active, so CF Worker auto-instrumented spans automatically propagate when present and the wrapper is invisible otherwise. - Header merging preserves caller-supplied headers when `input` is a `Request` — pre-existing values win, the wrapper adds traceparent only. The structured `"outgoing fetch"` breadcrumb (only emitted when `OTEL_LOG_OUTGOING_FETCH=true`) keeps using `host`/`path` derived from the rawUrl since both fields are individually safe and the value lands in the logger pipe where the ingestor's redaction layer applies. Tests: - `urlRedaction.test.ts` — 10 cases: every-value redaction, allowlist, empty-value preservation, userinfo stripping, fragment drop, multi- value query collapse, unparseable input fallback. - `instrumentedFetch.test.ts` — 7 cases: redacted URL on span, allowlist works, traceparent injection on outbound request when span has a spanContext, opt-out via `injectTraceparent: false`, no-op when no span is active, caller headers preserved alongside injected traceparent. 584/584 vitest pass, typecheck clean, tier-boundaries audit clean. This unblocks PR #3 in `@decocms/apps-start`: VTEX/Shopify fetch wrappers will migrate to `createInstrumentedFetch` and inherit traceparent propagation + URL redaction for free, instead of duplicating the logic in each app. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
3 issues found across 6 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
…ture
Errors are too important to leave to head sampling. Once the cost model
drops `logs.head_sampling_rate` below 1.0 — the recommended target is
0.01 once this channel ships — a sampled CF Destinations pipe loses
99% of `logger.error(...)` calls. Sites lose the one signal they care
about most.
This commit adds a separate, framework-controlled transport for
`level: "error"` records via direct POST to deco-otel-ingest `/v1/logs`
(same endpoint CF Destinations uses; same `default.otel_logs` table;
no ingestor changes required).
**`createOtlpHttpErrorLogAdapter` (new in `src/core/sdk/otelHttpErrorLog.ts`).**
- Returns `{ adapter, flush, pendingRecordCount }`. The `adapter` is a
`LoggerAdapter` that filters out debug/info/warn upstream of the
buffer — only errors travel this transport. The composite logger
fans non-error calls out to `defaultLoggerAdapter` which CF
Destinations samples normally.
- Token-bucket rate limiter prevents log storms (a pathological error
loop blasting 10K errors/sec would be self-inflicted DoS on
stats-lake). Default: 100 errors/min refill, burst capacity 20.
Both knobs are options.
- Buffer cap (default 500 records) bounds isolate memory. Overflow
surfaces via `onError("overflow", …)`; existing records still queue.
- `flush()` POSTs and is throttled by `minFlushIntervalMs` (default
5000ms). Concurrent callers share a single in-flight POST. Cooldown
is bypassed when the buffer is at the cap.
- Snapshot-then-reset semantics: records buffered after `flush()`
starts the POST aren't lost — they queue for the next flush.
- Transport failures surface via `onError("flush", err)`. Never throws.
- Wire format: OTLP/HTTP JSON `ResourceLogs` matching what CF
Destinations emits, so the existing ingestor handler accepts both
transports unchanged. Scalar attribute kinds (string/bool/int/double)
serialize to the right OTLP variants; structured attrs round-trip
as JSON strings.
**`instrumentWorker` wiring:**
- `bootObservability` reads `env.DECO_OTEL_LOGS_ENDPOINT` (env var
configurable via `OtelOptions.otlpErrorLogsEndpointEnvVar`). When
set and `otlpErrorLogsEnabled !== false`, the error-log adapter is
composed onto `defaultLoggerAdapter` via `createCompositeLogger`.
- After every `handler.fetch`, the wrapped fetch fires
`ctx.waitUntil(otlpErrorLog.flush())` alongside the metrics flush.
Both throttle themselves; calling on every request is cheap.
- Boot breadcrumb adds `otlpErrorLogs: true|false`.
- The exporter's `onError` callback uses `console.warn` directly (not
`logger.warn`) to avoid recursing back into the logger composite.
Tests:
- `otelHttpErrorLog.test.ts` — 10 unit tests: level filter, OTLP shape,
resource attributes, scalar serialization for all kinds, token-bucket
rate limiting with refill, buffer overflow, flush success/non-200/
rejection, cooldown gating with cap bypass, in-flight coalescing.
- `otel.test.ts` — 3 integration tests: error routes through direct
POST when env is set, info/warn/debug do NOT trigger POSTs, opt-out
via `otlpErrorLogsEnabled: false`.
End-to-end verified live against deployed deco-otel-ingest: an error
record with mixed-kind attributes (string, bool, int) round-trips
into `default.otel_logs` with correct SeverityText="error",
SeverityNumber=17, Body, LogAttributes, ServiceName, and
ScopeName="@decocms/start". Smoke harness at
`scripts/smoke-otlp-errorlog.ts`.
597/597 vitest pass, typecheck clean.
With this channel in place, the next safe step (in a follow-up PR) is
to lower `logs.head_sampling_rate` from 1.0 to 0.01, dropping the
fleet-wide CF Destinations log bill by ~99% while keeping 100% of
errors via this transport.
Co-authored-by: Cursor <cursoragent@cursor.com>
Phase 6 (deeper instrumentation): the highest-value addition over the existing withTracing coverage is making cache decisions filterable on traces directly, without joining to the metric tables. `recordCacheMetric` now stamps `deco.cache.decision` and `deco.cache.profile` on the closest enclosing span before recording the counter. Existing call sites in workerEntry.ts (HIT / STALE-HIT / STALE-ERROR / MISS / BYPASS) get this for free — no call-site changes. This is intentionally narrower than the originally scoped block.resolve / schema.compose / render.shell spans. Those would either duplicate existing spans (resolvePage already wraps block resolution) or fire only at boot time (composeMeta), and would explode cardinality without proportional value. Stamping the cache decision on the span that's already there is one attribute per request, zero new spans. Tests: 4 new cases covering decision+profile stamp, no-span safety, profile-omitted, and confirming the counter still fires. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
2 issues found across 6 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
…rangler.jsonc
Phase 7 (audit rules): the **detect** half of D3 ("audit is the safety
net"). Pairs 1:1 with the existing `deco-cf-observability --write` fix
— every rule has a `fix:` line in its finding pointing at the exact
codemod flag that resolves it.
Read-only auditor with 6 rules covering the fleet-scale failure modes:
observability_missing error
observability_disabled error
traces_disabled / logs_disabled warn
head_sampling_rate_elevated error (traces > 0.01)
logs_head_sampling_rate_low warn (logs < 1)
persist_disabled_no_destination error
CI-friendly: exit 0 on clean, 1 on findings (any severity), 2 on
unparseable wrangler.jsonc. `--json` for machine-readable output.
Wired up as a published bin: `deco-audit-observability`. Storefronts can
run it in CI; a future CF Tail Worker pre-merge gate can run it against
PR changes.
Also factored `stripJsoncComments` out of the codemod into
`scripts/lib/jsonc.ts` so both scripts share the helper.
Tests: 12 cases — canonical-passes, every rule fires, boundary value
0.01 doesn't false-fire, persist:false-with-destination is fine,
multiple findings stack correctly.
Co-authored-by: Cursor <cursoragent@cursor.com>
…loss profile Phase 9 (docs refresh): aligns docs/observability.md with what shipped across Phases 3, 4, 5, 6, 7. No more "will ship" or "to land" — only present tense for code that exists. Changes: - Architecture diagram now shows both transport paths (CF Destinations for traces + info/warn logs; direct POST for metrics + error logs). - New "Metrics: AE vs OTLP" subsection explaining the two-meter composite — what each pipe is for, what's lost by dropping each. - New "Data loss profile" table covering all five signal classes: traces, info/warn logs, error logs, OTLP metrics, AE metrics. Each with explicit loss conditions (sampling, rate limiter, buffer overflow, isolate eviction) so operators know where to look when a signal goes missing. - New "Wiring" subsection documenting the direct-POST env vars (DECO_OTEL_METRICS_ENDPOINT, DECO_OTEL_LOGS_ENDPOINT) — these are the user-facing config knobs that turn the new channels on. - Sampling section now uses present tense for the direct-POST error channel, with an explicit note that errors bypass head sampling. - Cost-model footnote moved from future-tense to past-tense for the shipped error channel; updated savings projection. - "Out of scope" trimmed and made specific — calls out that direct POST IS in scope (deliberately) for metrics and error logs only, not for spans or info-logs. Tests + typecheck still 613 passing. Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
1 issue found across 9 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
Phase 8 (cubic review fixes): triage of seven findings on the last five
cubic reviews. All seven were real and now addressed. Tests up to 626
(from 613), all green.
P1 — correctness:
- instrumentedFetch.ts header semantics now follow the Fetch spec:
when init.headers is provided, it REPLACES Request.headers; when it's
omitted, Request.headers are preserved. Previous code unioned them,
which silently leaked Request headers through even when callers
intended to drop them. Two new tests pin the contract.
- otelHttpErrorLog.ts no longer permanently drops records on a failed
POST. The snapshot is now restored to the front of the buffer on
non-2xx and on network errors, so a subsequent flush picks them up.
The buffer's existing maxBufferRecords cap still bounds total memory
if the endpoint stays down — newest-tail records are dropped via
onError("overflow", …) only when the buffer is at the cap during
restore. Three new tests cover the restore path including the
cap-overflow case.
P2 — defensive coverage:
- urlRedaction.ts fallback path now strips fragments too (`#…`), not
just queries — fragments carry OAuth implicit-grant access tokens.
- otelHttpMeter.ts splits getOrCreate so the overflow check runs
BEFORE materializing a new MetricEntry. Previously, a dropped
datapoint left a permanent empty entry in the metrics map that
serialized to an empty OTLP envelope each flush.
- otelHttpErrorLog.ts attrToOtlpValue guards JSON.stringify(v) ===
undefined (functions, undefined, symbols) — previously surfaced as
malformed { stringValue: undefined } in the OTLP payload.
- lib/jsonc.ts adds stripJsoncTrailingCommas + parseJsonc helpers; the
audit and codemod now use them so real wrangler.jsonc files with
trailing commas (very common) audit cleanly instead of failing with
a parse error.
P3 — test honesty:
- instrumentedFetch.test.ts "preserves rawUrl in the structured
outgoing-fetch log" was misnamed: the test didn't enable
OTEL_LOG_OUTGOING_FETCH and didn't assert on the logger. Replaced
with two tests: one that flips the env flag, captures the logger,
and asserts the breadcrumb's host/path/method/status; another that
confirms the breadcrumb is absent when the flag is off.
Docs: data loss profile in docs/observability.md updated to reflect
the failed-flush restore path and the actual maxBufferRecords default.
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
1 issue found across 12 files (changes from recent commits).
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/core/sdk/otelHttpErrorLog.ts">
<violation number="1" location="src/core/sdk/otelHttpErrorLog.ts:206">
P2: When flush fails and the buffer is full, this branch drops the entire failed snapshot (older records) instead of evicting newer buffered entries. That reverses the intended “prioritize originating-error context” behavior and can lose the most relevant error logs during endpoint outages.</violation>
</file>
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
Add per-call operation resolution to the upstream
createInstrumentedFetch helper so commerce integrations
(@decocms/apps) can produce semantic span names like
"vtex.intelligent-search.product_search" or "vtex.checkout.orderForm"
without forking the framework's outbound-fetch span shape or paying for
a second per-call child span.
API additions:
- `init.operation?: string` — per-call override. Stripped from init
before reaching baseFetch.
- `FetchInstrumentationOptions.defaultOperation?: string` — fallback
when init.operation is unset. Useful for single-purpose clients
(e.g. Resend → "emails.send" for every call).
- `FetchInstrumentationOptions.resolveOperation?: (url, method) =>
string | undefined` — URL-derived long-tail router. Returns
undefined to opt out (span falls back to "${name}.fetch"). Designed
to give the apps repo a two-layer model: explicit operation strings
at hot call sites, URL-derived defaults for the long tail.
Resolution precedence: init.operation ?? defaultOperation ??
resolveOperation(url, method) ?? "fetch". The resolved operation is
stamped as `fetch.operation` on the span (queryable independent of
span name) and surfaced in the onComplete callback payload so per-app
duration histograms can label by operation.
Types:
- `InstrumentedFetchInit = RequestInit & { operation?: string }`
- `InstrumentedFetch = (input, init?: InstrumentedFetchInit) =>
Promise<Response>`
`InstrumentedFetch` remains structurally assignable to typeof fetch
(operation is optional), so existing setter signatures
(setVtexFetch, setShopifyFetch) keep compiling unchanged. Apps repo
will switch its module-level `let _fetch: typeof fetch` to
`InstrumentedFetch` to surface the operation property type.
Why this lives in @decocms/start instead of @decocms/apps: the
framework already owns the outbound-fetch span shape (${name}.fetch).
Letting two repos define parallel span shapes would force every call
to produce two spans (outer apps-side wrapper + inner upstream
${name}.fetch), doubling per-month CF Destinations egress and
ClickHouse storage. Single-span model is materially cheaper at fleet
scale and architecturally cleaner.
Tests: 7 new in instrumentedFetch.test.ts — explicit operation,
defaultOperation, resolveOperation, undefined-returning router,
precedence (init wins over default wins over router), onComplete
payload, init-mutation safety (operation stripped before baseFetch).
633 tests total, all green; typecheck and biome clean.
Docs: docs/observability.md gains a "Per-call operation names on
outbound fetches" subsection under Trace propagation, and the Out-of-
scope note for commerce spans is rewritten to reflect the
framework/apps split (framework owns span shape, apps owns operation
strings + provider histogram).
Co-authored-by: Cursor <cursoragent@cursor.com>
There was a problem hiding this comment.
1 issue found across 3 files (changes from recent commits).
Tip: Review your code locally with the cubic CLI to iterate faster.
Re-trigger cubic
cubic flagged that the Phase 8 restore-on-failure path inverted the
intended priority: when buffer + snapshot exceeded the cap, the path
truncated the SNAPSHOT (older, more-likely-causal records) rather
than the buffer (newer records that arrived during the failed POST).
That contradicted the stated invariant ("preserve originating-error
context").
Fix: restore policy is now snapshot-first, evict newest:
1. snapshot + buffer ≤ cap → unshift snapshot, done.
2. Otherwise, drop newest-tail records from the live buffer until
either the buffer is empty or the total fits.
3. Only if the snapshot ALONE still exceeds the cap do we truncate
it — and even then, keep the OLDEST end and drop the newest tail.
The previous test "drops oldest-tail records of the snapshot"
asserted the buggy behavior and has been replaced with two tests
that pin the corrected semantics:
- snapshot (2) + buffer (2) > cap (3) → drops 1 newer record from
buffer, snapshot fully restored; onError surfaces the count and
the preserved-record count.
- snapshot (2) + buffer (2) > cap (2) → drops both buffer records;
snapshot fully restored.
Data-loss-profile in docs/observability.md updated to match.
Co-authored-by: Cursor <cursoragent@cursor.com>
…omitted
cubic flagged that `resolveOperation` (and the `http.method` span attr)
received "GET" for any caller doing
`fetch(new Request(url, { method: "POST" }))` without an explicit
`init.method` — the wrapper's `init?.method || "GET"` ignored the
Request's own method. POST/PUT/DELETE traffic surfaced as GET on
spans and was routed by `resolveOperation(url, "GET")` even when the
real method was different.
Fix: method resolution now follows the Fetch spec —
`init.method ?? input.method ?? "GET"`. Init wins (Fetch spec), then
the Request's method survives, then we default to GET to match
`new Request(url).method`.
Tests:
- a Request with method=POST and no init → span attr http.method=POST,
resolveOperation called with "POST", span name reflects POST.
- init.method=DELETE overrides Request.method=POST → http.method=DELETE.
636 tests total, all green.
Co-authored-by: Cursor <cursoragent@cursor.com>
Summary
Builds on the 5.0 CF-native observability pipeline. Stamps
service.nameandservice.version(fromCF_VERSION_METADATA) on every framework span and log line via the attribute floor, and setsSpanStatusCode.ERRORon framework spans whenwithTracingrethrows. Adds new spans across the framework — cache lookup/store decisions in both cache paths (HTML + serverFn) with profile/kind/decision labels via an extendedrecordCacheMetric(hit, profile, decision), therunSectionLoadersbatch, the deferred section load, and every admin protocol handler (meta,decofile.read|reload,render,invoke). Logger andlogRequestnow stamptrace_id/span_idon every line emitted inside awithTracingscope, so logs and traces correlate in ClickStack. New public helperinjectTraceContext(headers)writes a W3Ctraceparentfrom the active span for outbound fetches (consumed bycreateInstrumentedFetchin@decocms/apps). Shipsdocs/observability.mdcovering the architecture, span/metric reference,wrangler.jsoncshape, identity floor, ClickHouse JSONExtract query patterns, trace propagation, and sampling notes; linked from README.Test plan
bun run typecheckcleanbun test— all 553 tests pass (11 new)bun run buildsucceeds; new exports (injectTraceContext,CacheDecision,spanContext) appear in dist d.tsbunx tsx scripts/check-tier-boundaries.ts— cleanversion_metadatabinding + ingest destinations: confirmResourceAttributes['service.version']andSpanAttributes['cache.profile']land in ClickHouse; force a 5xx and verify the framework span showsSTATUS_CODE_ERRORin ClickStack; filter logs bytrace_idand confirm correlation.🤖 Generated with Claude Code
Summary by cubic
Expands CF-native observability with wider span coverage, direct OTLP/HTTP metrics and 100%-error logs, safe URL logging, trace/log correlation, per-call operation names on outbound fetches, and a
deco-audit-observabilityCLI. Also exposes Router preload reuse options and fixes fetch method/header semantics, exporter reliability, and redaction edge cases.New Features
level: "error"to/v1/logswhenDECO_OTEL_LOGS_ENDPOINTis set; rate-limited, buffered per isolate, flushed viactx.waitUntil. Safe to lowerobservability.logs.head_sampling_rate./v1/metricswhenDECO_OTEL_METRICS_ENDPOINTis set; aggregates per isolate, flushes viactx.waitUntil; composes with Workers Analytics Engine.service.name/service.version; sets error status on framework spans; new spans for cache lookup/store (HTML + serverFn),deco.section.loaders.batch, deferred section load, and admin handlers. Cache metrics carry adecisionlabel and setdeco.cache.decision/deco.cache.profileon the active span.logger/logRequestaddtrace_id/span_idinsidewithTracing.injectTraceContext(headers)writes W3Ctraceparent; used bycreateInstrumentedFetch(default on, opt out withinjectTraceparent: false).createInstrumentedFetchadds per-call operation names —init.operation,defaultOperation, andresolveOperation(url, method); stampsfetch.operationon the span and exposes it viaonComplete.createInstrumentedFetchredacts query values onhttp.urland dev logs; allowlist viakeepQueryKeys. PublicredactUrlhelper exported.defaultPreloadStaleTime,defaultPreloadGcTime,defaultPreloadDelay,defaultStaleTime,defaultPendingMs,defaultPendingMinMs); addsdeco-pdp-fast-navigationreference.deco-audit-observabilityauditswrangler.jsonc;observability.traces.head_sampling_ratedefaults to0.01.docs/observability.mddocuments wiring and cost model; README links it. New exports:@decocms/start/sdk/otelHttpMeter,@decocms/start/sdk/otelHttpErrorLog,injectTraceContext,CacheDecision,redactUrl.Bug Fixes
init.headersreplacesRequest.headers; when omitted, existing headers are preserved. Method resolution now followsinit.method ?? input.method ?? "GET"sohttp.methodandresolveOperationsee the real verb.JSON.stringifyis undefined.#...) to avoid leaking tokens.deco-audit-observabilityanddeco-cf-observabilityhandle realwrangler.jsoncfiles cleanly.Written for commit 999fd30. Summary will update on new commits. Review in cubic