Implement Edge Cookie identity system with KV graph and integration tests#621
Implement Edge Cookie identity system with KV graph and integration tests#621ChristianPavilonis wants to merge 81 commits into
Conversation
de0525d to
0940c34
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed — clean separation of concerns, strong input validation, and solid concurrency model. A few cleartext logging issues and a docs inconsistency need addressing before merge.
Blocking
🔧 wrench
- Cleartext EC ID logging: Full EC IDs are logged at
warn!/error!/trace!levels in 5 locations acrossec/mod.rs(lines 128, 211, 296),sync_pixel.rs(line 91), andpull_sync.rs(line 91). Thelog_id()truncation helper was introduced in this PR for exactly this purpose but is not used consistently. See inline comments for fixes. - Stale env var in docs:
TRUSTED_SERVER__EDGE_COOKIE__SECRET_KEYstill appears inconfiguration.md(lines 40, 956) anderror-reference.md(line 154). The PR renamed the TOML section from[edge_cookie]to[ec]and updated some env var references but missed these. Users following the docs will set a variable that is silently ignored, resulting in a startup failure from the placeholder passphrase. Should beTRUSTED_SERVER__EC__PASSPHRASE.
❓ question
HttpOnlyomitted from EC cookie (ec/cookies.rs:11): Intentionally omitted for a hypothetical future JS use case. Is there a concrete plan? The identify endpoint already exposes the EC ID. If not,HttpOnlywould be cheap XSS defense-in-depth.
Non-blocking
🤔 thinking
- Uppercase EC ID rejection in batch sync (
batch_sync.rs:209):is_valid_ec_idrejects uppercase hex, but batch sync validates before normalizing (line 225). Partners submitting uppercase EC IDs getinvalid_ec_idinstead of normalization.
♻️ refactor
_pull_enabledindex lacks CAS (partner.rs:564): Read-modify-write without generation markers. Concurrent partner registrations can overwrite each other's index updates. Self-healing via fallback, but inconsistent with the CAS discipline used elsewhere.
🌱 seedling
- Integration tests don't verify identify response shape:
FullLifecycleandConcurrentPartnerSyncsonly assertuids.<partner>. Theec,consent,degraded,eids, andcluster_sizefields from the API reference are never checked. - Pixel sync failure paths untested end-to-end:
ts_reason=no_ec,no_consent, domain mismatch redirects are unit-tested but not covered in integration tests.
📝 note
trusted-server.tomlships banned placeholder:passphrase = "trusted-server"is rejected byreject_placeholder_secrets. Works in CI via env override, but new contributors will hit a confusing startup error. A TOML comment would help.
⛏ nitpick
Eid/UidmissingDeserialize:openrtb.rstypes deriveSerializeonly. If the auction ever needs to parse EIDs from provider responses,Deserializewill be needed.
CI Status
- cargo fmt: PASS
- clippy: PASS
- cargo test: PASS
- vitest: PASS
- integration tests: FAIL
- CodeQL: FAIL (likely related to cleartext logging findings above)
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
Comprehensive EC identity subsystem: HMAC-based ID generation, KV identity graph with CAS concurrency, partner sync (pixel, batch, pull), identify endpoint, and auction bidstream decoration. The architecture is well-designed with clean separation of concerns, strong concurrency discipline, and solid security choices throughout. Four new blocking findings join the existing ones — three are input-validation gaps and one is a PII leak in the pull sync URL builder that should be resolved before merge.
Blocking
🔧 wrench
- PII leak: client IP forwarded to partners —
pull_sync.rs:273:build_pull_request_urlappends the raw user IP as theipquery parameter on every outbound pull-sync request, exposing it in partner access logs. Contradicts the privacy-preserving design intent and is likely a GDPR Article 5 concern. Remove theipparameter or gate it behind an explicit per-partnerallow_ip_forwardingconfig flag. - Pull sync response body unbounded —
pull_sync.rs:325:take_body_bytes()with no size cap before JSON parsing. A malicious partner can exhaust WASM memory. Batch sync and admin endpoints both haveMAX_BODY_SIZEguards; this path needs one too (64 KiB is generous for a{"uid":"..."}response). - Whitespace-only UIDs bypass validation —
batch_sync.rs:217andsync_pixel.rs:41:is_empty()passes" "and"\t", which get stored as garbage EID values in the KV graph. Replace withtrim().is_empty()at both sites. rand::thread_rng()WASM compatibility unverified —generation.rs:52: requiresgetrandomwith thewasifeature onwasm32-wasip1. Nativecargo testpassing does not prove the WASM build is safe; integration tests are already failing so this may not have been caught. Switch toOsRngor addgetrandom = { version = "0.2", features = ["wasi"] }explicitly.
Non-blocking
🤔 thinking
process_mappingsUpsertResult branches untested —batch_sync.rs:232:NotFound,ConsentWithdrawn, andStalehave zero unit test coverage.Staleis documented as "counted as accepted" with no regression test.- Shared error messages make pull sync validation tests non-isolating —
partner.rs:152,165: both missing-URL and missing-token conditions return the identical error string, so the tests asserting on substrings pass even if the wrong branch fires. Use distinct messages per condition.
♻️ refactor
ec_consent_grantedhas no tests —consent.rs:20: the entry-point gate for all EC creation has no#[cfg(test)]section. Add smoke tests for granted and denied paths.- KV tombstone path never exercised in finalize tests —
finalize.rs:149: all finalize tests passkv: None, so the tombstone write and the cookie-EC ≠ active-EC case inwithdrawal_ec_idsare untested. - Missing HMAC prefix stability test —
generation.rs:228: no test asserts that the same IP + passphrase always produces the same 64-char hash prefix, which is the core deduplication guarantee for the KV graph. normalize_ec_idduplicated in integration tests —integration-tests/tests/common/ec.rs:374: reimplementsnormalize_ec_id_for_kvfrom core. Re-export and use the canonical implementation.
⛏ nitpick
- Use
saturating_subfor consistency —kv.rs:605: the subtraction is safe due to the guard above but inconsistent withsaturating_subused throughout the rest of the module. log_idshould encapsulate the…suffix —mod.rs:51: every call site manually appends"…"in the format string; move it inside the function.
Praises 👍
- CAS-based optimistic concurrency (
kv.rs): bounded retries, gracefulItemPreconditionFailedhandling, andMAX_CAS_RETRIESpreventing infinite loops — textbook correct for a single-writer KV model. - Constant-time API key comparison (
partner.rs):subtle::ConstantTimeEqfor timing attack prevention on key lookups. Many implementations miss this entirely. KvMetadatafast-path consent check (kv_types.rs): mirroringok/country/cluster_sizein metadata to avoid streaming the full KV body is an excellent performance optimisation with the right constraint test.evaluate_known_browserwithOnceLock-cached hash table (device.rs): pre-hashing fingerprints once and caching viaOnceLockis the right WASM-compatible lazy-init pattern.- HMAC stability explicitly documented (
generation.rs:30-33): noting that the output format is a "stable contract" that would invalidate all existing identities if changed is exactly the kind of correctness annotation that prevents future breakage.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test: PASS
- vitest: PASS
- integration tests: FAIL
- CodeQL: FAIL (cleartext logging — covered in prior review)
aram356
left a comment
There was a problem hiding this comment.
Summary
This PR lands a large, well-partitioned Edge Cookie identity subsystem, and the prior blocking feedback (PII logging via log_id(), whitespace UID checks, pull-sync response size cap, dropping raw client IP from pull-sync query strings, constant-time auth comparisons) has been materially addressed. Blocking issues remaining are: a CI-red integration test (deterministic, root cause in the test client, not the server), an unverified rand::thread_rng() on wasm32-wasip1 that CI cannot catch because no wasm build runs, two new CodeQL alerts on ec/kv.rs riding the pre-existing settings-taint false-positive flow, one untested backend-state machine in batch_sync, and two questions on the auction / partner surfaces.
Blocking
🔧 wrench
- CI red —
test_ec_lifecycle_fastlydeterministic failure: the test client omitsSec-Fetch-Dest: documentandAccept: text/html, sois_navigation_request()correctly returnsfalseand EC never generates (crates/integration-tests/tests/common/ec.rs:93). rand::thread_rng()onwasm32-wasip1unverified; CI builds no wasm target (crates/trusted-server-core/src/ec/generation.rs:52). Add a wasm build to CI and/or switch to an explicitgetrandomwith thewasifeature.- Two new CodeQL alerts on
ec/kv.rs:637and:789ride the samesettings.reject_placeholder_secrets()taint flow that already produces noise onmain. Both are false positives but block the CodeQL gate. Fix with a scoped suppression or a repo-level CodeQL config filter onrust/cleartext-loggingfor settings-derived values. UpsertResult::{NotFound, ConsentWithdrawn, Stale}untested inbatch_sync::process_mappings(crates/trusted-server-core/src/ec/batch_sync.rs:231-244). Prior reviewer flagged this; still unresolved.
❓ question
- Auction
user.id = ""on the no-consent path is emitted into the OpenRTB bid request JSON (crates/trusted-server-core/src/auction/endpoints.rs:60-77,formats.rs:186). Has this been validated against the live Prebid Server target? If not, changeUserInfo.idtoOption<String>withskip_serializing_if. update_pull_enabled_indexrace self-heal path untested (crates/trusted-server-core/src/ec/partner.rs:552-604). The documented fallback atpartner.rs:350is the only thing keeping the index correct under concurrent upserts — please add a concurrency test or file a tracked follow-up.
Non-blocking
🤔 thinking
MAX_CAS_RETRIES = 3, no backoff — likely to starve under contention on hot prefixes; consider exponential backoff + jitter (crates/trusted-server-core/src/ec/kv.rs:300-339).HashMap<String, KvPartnerId>non-deterministic serialization — breaks future hash-based dedup and byte-diffing of stored values; useBTreeMaporIndexMap(crates/trusted-server-core/src/ec/kv_types.rs:59).seen_domainsdrop-newest eviction freezes long-lived ECs on their first 50 domains, defeating thelasttimestamp; switch to LRU or document (crates/trusted-server-core/src/ec/kv.rs:627-642).
♻️ refactor
- EID gating failures log at
debug!— bump towarn!so ad-stack anomalies surface in production (crates/trusted-server-core/src/auction/endpoints.rs:80). ec_consent_granted()is a 1-line pass-through — inline or document the sealing-point intent (crates/trusted-server-core/src/ec/consent.rs:20-22).- Unvalidated domain strings in EC graph writes — add a 255-byte length cap and hostname-shape check at the write boundary (
crates/trusted-server-core/src/ec/kv.rs update_last_seen,kv_types.rs).
🌱 seedling
- No integration test for the GPC / consent-denied identify path. Once the
Sec-Fetch-Destfix lands, addec_full_lifecycle_with_gpcsendingSec-GPC: 1and asserting/_ts/api/v1/identifyreturns 403 / no EID payload (crates/integration-tests/tests/frameworks/scenarios.rs:501-565). - No wasm-target build in CI — root cause of how the
rand::thread_rng()concern reached this point unnoticed. Addingcargo build -p trusted-server-adapter-fastly --target wasm32-wasip1would catch an entire class of deps-pin regressions.
📌 out of scope
- Pull-sync EC ID still travels as a URL query parameter — acknowledged in the PR description as deferred. Please file the follow-up issue and reference it from the commit that ships this PR so it is not lost (
crates/trusted-server-core/src/ec/pull_sync.rs:260-263).
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test (native): PASS
- vitest: PASS
- integration tests: FAIL —
test_ec_lifecycle_fastly(see blocking #1) - CodeQL: FAIL — 15 high alerts; 13 are pre-existing
settings.reject_placeholder_secrets()taint-flow false positives already present onmain, 2 are new onec/kv.rsriding the same flow (see blocking #3)
7009838 to
43df212
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Large PR (98 files, +11,974 / −3,099). Architecture and consent plumbing are solid; this review focuses on concrete defects and hardening gaps verified against 109c929c. Three CI checks currently fail (integration tests, format-typescript, CodeQL); two of those are trivially fixable and inline-commented below. The CodeQL failure appears to be a false positive that needs a team decision.
Blocking
🔧 wrench
- Integration tests fail to compile:
crates/integration-tests/Cargo.tomlmissingtrusted-server-coredev-dependency whiletests/common/ec.rs:19andtests/frameworks/scenarios.rs:5import it. (inline) - format-typescript CI failure:
crates/js/lib/src/integrations/prebid/index.ts:413fails prettier — one-line fix vianpm run format. (inline) - Unbounded cookie payloads:
ec/prebid_eids.rs:43(ts-eids) andec/prebid_eids.rs:111(sharedId) decode/clone attacker-controlled cookie values with no size ceiling. (inline)
❓ question
- Consent-withdrawal split-brain at
ec/finalize.rs:56: consent-store delete failures are logged-and-continued while KV tombstones still write. Intentional (KV = source of truth) or should withdrawal fail-closed? (inline) - CodeQL cross-cutting: 15 high-severity
rust/cleartext-loggingalerts all namesettings.reject_placeholder_secrets()as the tainted source, pinned to log-call lines that do not actually reference that method (e.g.auction/orchestrator.rs:125is a timeout warning). This is taint-analyzer propagation through a boolean validator that uses.expose()internally; the logged bytes are field names, not secret material. How would the team like to resolve: (a)#[allow]on the validator with a comment explaining the false positive, (b) renamereject_placeholder_secrets→validate_secrets_not_placeholdersto break the taint-name heuristic, or (c) configure a CodeQL suppression?
Non-blocking
🤔 thinking
- Bot gate is presence-only (
ec/device.rs:76) — attackers with spoofed JA4 pass. Document threat model. (inline) user.id = Nonewhen EC not allowed (auction/endpoints.rs:60) — correct for consent, but may depress bidder dedup/yield. Consider a session-scoped ephemeral ID. (inline)- Pull-sync only on organic routes (
trusted-server-adapter-fastly/src/main.rs:373) — auction-heavy SPA publishers never refresh KV. Document or extend. (inline)
♻️ refactor
- Duplicated bearer parser in
ec/identify.rs:34andec/batch_sync.rs:101— extract toec/auth.rs. (inline) - Rate-limiter hourly→minute conversion is lossy by up to ~2× (
ec/rate_limiter.rs:59) — document and test the overshoot bound. (inline)
🌱 seedling
- Non-deterministic EID order on timestamp ties (
ec/eids.rs:64) — addpartner_idas secondary sort key. (inline) - No post-deserialize bounds on
KvEntry(ec/kv.rs:158) — legacy/corrupt oversized entries will round-trip through. (inline) platform/test_support.rsdefines stubs (NoopConfigStore,NoopSecretStore,noop_services) that no unit test exercises, so trait drift will silently break them. Either consume in ≥1 unit test or remove.- Auction ext metadata unsrialized:
auction/formats.rscomputes strategy/provider-count/timing intoresult.metadatabut never emits it intoext.trustedServeron the OpenRTB response — a useful debug surface.
⛏ nitpick
has_cookie_mismatchnaming (ec/mod.rs:398) — ambiguous; preferheader_overrides_cookie. (inline)
📌 out of scope
ec_id/client_ipas query parameters in pull sync — PR description acknowledges as follow-up.
CI Status
- cargo fmt: PASS
- clippy: PASS
- cargo test (workspace): PASS
- vitest: PASS
- format-typescript: FAIL (see inline)
- integration tests: FAIL — compile error (see inline)
- CodeQL: FAIL — 15 high (likely false positives) (see question above)
- browser integration tests: PASS
- format-docs: PASS
|
Addressed the remaining merge blockers in 4d2ef30:
Validation run locally:
|
|
Follow-up update in 6016c48 to address the remaining open review threads:
Validation re-run locally:
I also replied to and resolved the remaining open review threads tied to these changes. |
prk-Jr
left a comment
There was a problem hiding this comment.
Summary
This PR lands the Edge Cookie identity system end to end, but I found two merge blockers in the request lifecycle.
Blocking
🔧 wrench
- Identify GET still missing browser-direct CORS support:
OPTIONS /_ts/api/v1/identifyreflects CORS headers, butGET /_ts/api/v1/identifynever applies them, so the documented browser-direct flow still fails after a successful preflight.classify_origin()also accepts matching hosts regardless of scheme, while the spec only allowshttpsorigins.
Fix: classify origin in the GET path too, return 403 for denied origins, reflect CORS headers on allowed GET responses, and require origin_url.scheme() == "https" in classify_origin(). (crates/trusted-server-core/src/ec/identify.rs:31, crates/trusted-server-core/src/ec/identify.rs:171)
CI Status
- cargo fmt: PASS
- cargo test: PASS
- vitest: PASS
- browser integration tests: PASS
- integration tests: FAIL (
test_ec_lifecycle_fastly: organicGET /should setts-eccookie) - CodeQL: FAIL (12 open alerts on the merge ref)
aram356
left a comment
There was a problem hiding this comment.
Summary
Edge Cookie identity subsystem is materially complete and the prior review's blockers have been addressed: navigation-headers fix unblocks test_ec_lifecycle_fastly, UpsertResult variants are tested, user.id is Option<String> with skip_serializing_if, the partner-registry race is gone (architecture is now config-derived), MAX_CAS_RETRIES is 5 with backoff+jitter, raw client IP is no longer forwarded to pull-sync partners, and the WASM build now runs in CI.
Remaining blocking issues: a plaintext partner pull-token sitting on a Debug-deriving struct, an unintentional-looking read/write asymmetry in the bot gate, and a still-failing CodeQL gate riding the same Settings Debug taint-flow false positive that was flagged on the prior review.
Blocking
🔧 wrench
- Plaintext
ts_pull_tokenonPartnerConfig(crates/trusted-server-core/src/ec/registry.rs:44, :208)
❓ question
- Bot-gate read/write asymmetry intentional? (
crates/trusted-server-adapter-fastly/src/main.rs:208-212vs.:272-276) - CodeQL CI gate is failing on 12 new
rust/cleartext-loggingalerts, all riding thesettings.reject_placeholder_secrets()→log::debug!("Settings {settings:?}")taint flow that was flagged on the prior review.SettingsDebug is redacted viaRedacted<T>, so the alerts are benign — but they block the gate. Sites:crates/trusted-server-adapter-fastly/src/main.rs:71,crates/trusted-server-core/src/publisher.rs:178/:341/:382,crates/trusted-server-core/src/html_processor.rs:70,crates/trusted-server-core/src/consent/mod.rs:173,crates/trusted-server-core/src/auction/orchestrator.rs:125/:278/:338/:408,crates/trusted-server-core/src/integrations/nextjs/html_post_process.rs:123/:456. Plan to suppress at PR level or add a repo-wide CodeQL filter onRedacted-derived flows before merge?
Non-blocking
🤔 thinking
std::thread::sleepin CAS retry loop on wasm32-wasip1 (crates/trusted-server-core/src/ec/kv.rs:343, :426, :471, :558)seen_domainsstill usesHashMapwhileidswas converted toBTreeMap(crates/trusted-server-core/src/ec/kv_types.rs:133)MIN_PASSPHRASE_LENGTH = 8is weak entropy for the HMAC-SHA256 anchor (crates/trusted-server-core/src/settings.rs:353)seen_domainsdrop-newest eviction freezes long-lived ECs at the 50-domain cap (crates/trusted-server-core/src/ec/kv.rs:667-683) — repeat from prior review
♻️ refactor
PartnerRegistry::all()exposesHashMapiteration order to callers (crates/trusted-server-core/src/ec/registry.rs:174-176)RateLimiter::exceeded_per_minuteround-trips through hourly math (crates/trusted-server-core/src/ec/rate_limiter.rs:42-46, :49-52)
🌱 seedling
- No end-to-end integration tests for batch sync 429 / 400 / 413 paths (
crates/integration-tests/tests/frameworks/scenarios.rs:761-779) - Bot-gate observability hook (
crates/trusted-server-core/src/ec/device.rs:84-86) — operators have no metric for what fraction of traffic is excluded
📌 out of scope
- Repo-level CodeQL filter for the
Redacted-derivedrust/cleartext-loggingtaint flow so this stops gating PRs. Second consecutive PR review where the same false-positive flow blocks CI.
⛏ nitpick
derive(Debug)onPartnerRegistry/PartnerConfig(crates/trusted-server-core/src/ec/registry.rs:17, :53) — combined with the plaintext token wrench, this is the loaded gun.
CI Status
- cargo fmt: PASS
- cargo clippy: PASS
- cargo test (workspace): PASS
- vitest: PASS
- integration tests: PASS
- browser integration tests: PASS
- WASM release build: PASS
- CodeQL: FAIL — 12 new alerts, all on the same
SettingsDebug taint-flow false-positive flow (see ❓ above)
ChristianPavilonis
left a comment
There was a problem hiding this comment.
Clearing an old pending local review so I can reply to the current threads from the branch update.
Closes #693
f29c4a5 to
aa9c96e
Compare
aram356
left a comment
There was a problem hiding this comment.
Summary
Pass-2 review of the EC subsystem after the rebase onto current main. The pure-EC code (KV graph, consent gating, finalize, identify, batch sync, pull sync, device signals) is in good shape: clippy/test/fmt clean locally, no unwrap() on hot paths, withdrawal semantics carefully handled, and the unit-test surface is thorough (1178 core tests). Most pass-1 findings are now resolved upstream. Three blocking items remain at the edges of the PR.
Proposed fixes for the implementable findings below are in a stacked PR: #705. Merge it into this branch to absorb them. Two findings (the two ❓ questions) need product/privacy sign-off rather than code; one 🔧 (
/ad-proxy/) was intentionally left for you to decide whether to drop or to land in a follow-up PR with proper routing.
Blocking
🔧 wrench
/auctionbody parsed without size cap —crates/trusted-server-core/src/auction/endpoints.rs:53.handle_auctiondoesserde_json::from_slice(&req.take_body_bytes())with no limit; an authenticated client can buffer arbitrary bytes into the Wasm worker before any validation. Sister endpoint/batch-syncgot the right pattern in the rebase;/auctiondid not. Proposed fix in #705 (256 KiB cap,Content-Lengthprecheck + post-read check, returns413)./ad-proxy/URLs minted but not routed —crates/trusted-server-core/src/integrations/prebid.rs:422-502, 1138.transform_prebid_responserewrites every winning bid'sadm,nurl, andburltohttps://<request_host>/ad-proxy/...URLs. No/ad-proxy/route exists (not inadapter-fastly/main.rs, not inPrebidIntegration::routes()). Live impact: every winning bid from the five hardcoded CDN patterns serves a creative whose resources 404;nurl/burlbilling pixels also 404 — bidders may stop bidding. Regression vs.main(the rewriter did not exist there). Recommend either (a) gating it behind a/ad-proxy/route handler in a separate PR or (b) dropping the rewriter from this PR (scope creep). Not implemented in #705 by request.
❓ question
Jurisdiction::Unknownfail-closed → silent EC blackhole when geo is missing —crates/trusted-server-core/src/consent/mod.rs:506-508,crates/trusted-server-adapter-fastly/src/main.rs(geo wiring).allows_ec_creationreturnsfalseforUnknown.GeoInfo::from_requestmay returnNoneif geo isn't enabled on the deployed Fastly service. Result: zero EC issuance everywhere (auction EID decoration,/identify, organic generation) with no error and no startup warning. Is the operator-mistake blast radius intentional? At minimum, consider a startuplog::warn!if a bootstrap geo probe returnsNone, or surface on/health.- TCF storage-consent overrides US-Privacy
opt_out_sale=Yesin US states —crates/trusted-server-core/src/consent/mod.rs:486-498, testec_allowed_us_state_tcf_takes_priority_over_us_privacy. ForUsState, if a CMP wrote a TCF storage-consent string and the user also opted out via US-Privacy,allows_ec_creationreturnstrue. The precedence ("TCF beats CCPA opt-out") is codified in the test. Was this signed off by privacy/legal? It's a real-world failure mode (stale TCF from a prior EU visit + fresh US opt-out → consent bypass).
Non-blocking
♻️ refactor
IntegrationRegistry::handle_proxy>7 args, papered over with#[allow(clippy::too_many_arguments)]—crates/trusted-server-core/src/integrations/registry.rs:651-660. CLAUDE.md disallows>7arguments. Proposed fix in #705 (ProxyDispatchInputstruct, allow-attr removed).content_length_exceeds_limitduplicated —crates/trusted-server-core/src/ec/batch_sync.rs:174-178, and (with the/auctionfix in #705)auction/endpoints.rs. The natural home ishttp_util, buthttp_utilhas migrated tohttp::Request<EdgeBody>whilebatch_sync/auctionstill usefastly::Request. Defer the lift until those callers migrate, so the helper can land in the right type system.
🤔 thinking
/auctionlacks per-route rate limiting — adjacent to the body-size finding./batch-syncand pull sync have rate limiters;/auctionis reachable from any first-party client with no per-IP/per-EC limit. Worth a follow-up issue (not blocking).- Bot gate is presence-only and gates pull-sync too —
crates/trusted-server-core/src/ec/device.rs:84-86,adapter-fastly/main.rs:170-179, 457-465. A client withoutja4_class(rare — Fastly Compute over HTTP/1.1 or non-TLS test traffic) silently loses pull-sync forever. Documented as a known limitation; flagged for product visibility.
📝 note
reject_placeholder_secretsdoesn't cover partnerapi_tokens —crates/trusted-server-core/src/settings.rs. Shippedtrusted-server.tomlincludes three partners (sharedid,inttest,inttest2) whoseapi_tokens are placeholder strings that pass the 32-byte length check. If an operator deploys without overriding them,/_ts/api/v1/batch-syncwrite access and/_ts/api/v1/identifyread access are granted to those tokens by default.inttest*are at least labeled test-only in comments;sharedidis not. Worth adding to the placeholder list, or shippingenabled=falseby default.- Cluster size capped at one
list(prefix, limit=100)page —crates/trusted-server-core/src/ec/kv.rs:606-631. For high-traffic publishers the prefix space could exceed 100 in legitimate traffic; the count is capped silently with no pagination. Worth instrumenting later to confirm rarity.
⛏ nitpick
- Leftover
syntheticnaming intest_support.rs, stale/_ts/api/v1/syncdoc-comments inec/mod.rs::consent_mutandintegration-tests/.../scenarios.rs,/identifyexample missingcluster_sizeindocs/guide/api-reference.md. All four addressed in #705.
🌱 seedling
- No end-to-end CAS-conflict or consent-withdrawal-mid-flight scenario —
crates/integration-tests/tests/frameworks/scenarios.rs. The sevenEcScenarios are happy-path coverage. KV unit tests cover the code path; no integration coverage of the concurrent paths that the CAS retry/tombstone logic exists to handle. Follow-up.
👍 praise
- EC test surface is genuinely comprehensive at the unit level — 1178
trusted-server-coretests, including invalid-legacy-entries / unsupported-schema-version / refusing-to-serialize-invalid / cluster cache hit. Solid. finalize.rscleanly centralizes the cookie/header/tombstone lifecycle — every route now goes through one helper; consent-withdrawal path well-tested; partner-scopedx-ts-<id>headers stripped on withdrawal.- Withdrawal targets both header-EC and cookie-EC (
finalize.rs::withdrawal_ec_ids) — tombstones both when they differ, deduplicates when they match. Good edge-case handling. - Bearer-auth helper extracted to
ec/auth.rs— single source of truth for/identifyand/batch-sync, crisp tests rejecting malformed schemes / multi-token headers. Redacted<String>discipline on every secret field (api_token,ts_pull_token,passphrase,proxy_secret) with a regression test thatDebugoutput does not leak.- CodeQL sensitive-data fixes (
log_idtruncation, redactedSettingsDebug) are surgical and well-targeted.
CI status
cargo fmt --all -- --check— PASS (local)cargo clippy --workspace --all-targets --all-features -- -D warnings— PASS (local)cargo test --workspace— PASS, 0 failures (1178 core + 21 openrtb + 20 adapter-fastly + 2 doc; local)cd crates/js/lib && npx vitest run— PASS (318 tests)- WASM release build with integration-test env — PASS (locally, with the upstream-fixed passphrase of length ≥32)
Pass-1 findings resolved upstream since the previous review
- ✅ Branch is now rebased (4 main commits behind, fine).
- ✅
/identifyresponses set CORS headers on every status (200/204/401/403), not just preflight. - ✅
/batch-synchas aContent-Lengthprecheck before buffering. - ✅
partners/registerand/_ts/api/v1/syncdocs removed fromapi-reference.md. - ✅
MIN_PASSPHRASE_LENGTHraised 8 → 32. - ✅ CAS loops no longer
thread::sleep. - ✅ Prebid EID
idlength now checked againstMAX_UID_LENGTH. - ✅ CI build-env passphrase length now satisfies
MIN_PASSPHRASE_LENGTH.
aram356
left a comment
There was a problem hiding this comment.
Summary
Pass-3 review of the Edge Cookie (EC) identity system after the round-8 CHANGES_REQUESTED. Reviewed against current main in an isolated worktree at PR head 61c3a14; all 126 changed files were read in full across parallel subsystem passes. The EC subsystem is mature and well-tested, and most round-8 findings are genuinely resolved (see Resolved since round 8 below). Four items remain — two new, two carried over.
Blocking
Full detail is in the inline comments; summary:
🔧 wrench
Set-Cookiedata loss inrebuild_response_with_body—crates/trusted-server-core/src/proxy.rs:150. The header-copy loop changedappend_header→set_headerand themainregression test was deleted; multipleSet-Cookieheaders on rewritten proxy responses collapse to the last one. Regression vsmain.- GPP-only US opt-out not treated as a withdrawal —
crates/trusted-server-core/src/consent/mod.rs:546.has_explicit_ec_withdrawalchecksgpc/us_privacybut notgpp.us_sale_opt_out, whileallows_ec_creationdoes — a GPP-string opt-out blocks new EC but leaves the existing cookie + KV row alive. - Partner
api_tokenplaceholders shipped + embedded —trusted-server.toml:38. Three committed partner tokens pass the 32-byte check;reject_placeholder_secrets()does not cover them;build.rsbakes them into the binary — default Bearer credentials forbatch-sync/identifyin a public repo.
❓ question
- TCF overrides a fresh US opt-out —
crates/trusted-server-core/src/consent/mod.rs:501. A stale TCF storage-consent string silently beats a fresh USopt_out_sale=Yes; no freshness guard. Carried over from round 8 — needs privacy/legal sign-off.
Non-blocking
♻️ refactor
- Dead
KvEntry::minimal+ staleupsert_partner_iddoc —ec/kv.rs:524,ec/kv_types.rs:299. Theupsert_partner_iddoc describes a "missing-root recovery path" the code does not implement (it rejects missing keys, correctly).KvEntry::minimalhas no production caller and builds aconsent.ok = trueentry withcountry: "ZZ"and no consent strings — delete it so it cannot be reintroduced as a consent-gate bypass, and fix the doc. exceeded_per_minutedouble-converts —ec/rate_limiter.rs:42. It multiplies a per-minute limit by 60 and routes back throughhourly_limit_to_per_minute_limit, lossless only because the product is a multiple of 60. Implementexceeded_per_minutedirectly against the 60s counter.- Inconsistent UID whitespace handling —
ec/batch_sync.rs:198,ec/pull_sync.rs,ec/prebid_eids.rs.batch_syncaccepts and stores untrimmedpartner_uid(" abc "passestrim().is_empty()but the raw value is written to KV);prebid_eidstrims. The same partner identity can fragment into whitespace-variant duplicates. Trim once and store uniformly. - Dead
_request_infoparameter —integrations/prebid.rs:878.to_openrtbtakes_request_info: RequestInfo(never used; recomputed internally). Drop it and themake_request_infocall sites. - Inline
useinside functions —ec/pull_sync.rs:376,ec/batch_sync.rs:37.use super::kv_types::MAX_UID_LENGTH;inside a function / after a const violates the CLAUDE.md "no local imports" rule. Hoist to the top import block.
🤔 thinking
/auctionhas no per-route rate limiting —auction/endpoints.rs,adapter-fastly/src/main.rs:368. It is an unauthenticated POST that does KV reads and HTTP fan-out to every bid provider;batch-syncand pull-sync both gate withFastlyRateLimiter. The 256 KiB body cap bounds per-request memory but not request volume. Recommend a per-EC-ID (or per-IP) limiter before production traffic.- Bot gate permanently denies EC to UA-unrecognized browsers —
ec/device.rs:84.looks_like_browser()requires bothja4_classandplatform_class, andplatform_classis a substring match against a fixed UA allowlist. A privacy browser with a frozen/stripped UA, or a future OS whose UA token is not listed, fails the gate on every request and permanently loses KV-backed EC generation + pull-sync. Consider gating onja4_classalone, or treatingplatform_class == Noneas "unknown → allow". cluster_sizesaturates silently at 100 —ec/kv.rs:765.count_hash_prefix_keyscaps at one 100-keylist()page by design, but the value then caches at exactly 100 forever and consumers cannot tell 100 from 100,000. Add aninfo/warnlog when the count hitsCLUSTER_LIST_LIMIT./auctionnever refreshes pull-sync —adapter-fastly/src/main.rs:518. Pull-sync is dispatched only on organic routes; a user whose only repeated traffic is/auction(SPA / ad-refresh pages) never gets partner UIDs filled. Documented as an intentional follow-up — flagging so it is a conscious product decision.fitAuctionEidsToCookiesilently drops identity data —js/lib/src/integrations/prebid/index.ts:507. It progressively dropsuidsthen whole EID sources to fit the cookie budget, from the end of the array, with no test coverage. Add a >3072-byte test and aconsole.warnon truncation.buildRequestsunguardeddocumentaccess —js/lib/src/integrations/prebid/index.ts:319.clearPrebidEidsCookie()touchesdocument.cookieunconditionally, unlikesendAuction/mirrorSourcepointConsentwhich guard their globals. Add atypeof document === 'undefined'early-return, consistent with the rest of the file.- Sourcepoint
looksLikeGppheuristic is fragile —js/lib/src/integrations/sourcepoint/index.ts:104. It accepts any consent string containing a~as GPP and mirrors it into__gpp. A GPP-header prefix check would be safer than matching a stray separator.
🌱 seedling
- EC integration scenarios are happy-path only —
integration-tests/tests/frameworks/scenarios.rs.ec_consent_withdrawalasserts thets-eccookie is expired but never confirms the server wrote a KV tombstone — it would still pass if withdrawal only cleared the cookie. Cheap single-threaded-safe additions: seed aconsent.ok = falserow and assertbatch-syncto it is rejected; assertbatch-syncto a non-existent EC returnsrejected. (The CAS-conflict path genuinely cannot be exercised single-threaded under Viceroy.) - No test for the
build-all.mjsuser-ID code generator —js/lib/build-all.mjs.generatePrebidUserIdModulesdoes registry validation, dev-overrides, and manifest hashing at build time only; a node test asserting the generated imports matchuser_id_modules.jsonwould catch drift in_user_ids.generated.ts.
📝 note
- Bearer auth lacks an explaining comment —
ec/auth.rs. The presented token is SHA-256-hashed and looked up by 64-hex digest in aHashMap; this is sound (the lookup key is itself a hash of the secret, so there is no token-revealing timing channel) and is why no constant-time compare is needed. Three review passes independently re-flagged it — a one-line comment would close it permanently. - Stale doc comments —
ec/mod.rs:142(cookie_ec_valuedoc claims a request header "may take precedence", butparse_ec_from_requestreads only thets-eccookie and ignoresx-ts-econ input);auction/formats.rs:76(convert_tsjs_to_auction_requestdoc still says it generates an EC ID and lists "Fresh EC ID generation fails" under# Errors, but it is read-only now). - Dead Viceroy secret-store entry —
integration-tests/fixtures/configs/viceroy-template.toml.[[local_server.secret_stores.api-keys]]api_key = "test-api-key"is unused; EC tests authenticate with theinttest*partner tokens fromtrusted-server.toml. Remove it or comment why the store must exist. - Pull-sync has no per-request timeout —
ec/pull_sync.rs:261.PendingRequest::wait()has no timeout (the SDK exposes none); a hung partner ties up the compute instance until the platform cap. Best-effort by design and acknowledged — follow-up when timeout APIs land. MISSING_GEO_WARNING_LOGGEDis process-global —consent/mod.rs. On Fastly's per-request instance model the warn-once may fire on nearly every request, or suppress a recurring outage after the first hit. Acceptable noise-reduction; worth a comment noting it is per-instance.
⛏ nitpick
is_valid_ec_idvsis_valid_ec_hashcase asymmetry —ec/generation.rs:155.is_valid_ec_idrejects uppercase hex whileis_valid_ec_hashaccepts it; an uppercase-hexts-eccookie would be silently excluded from the withdrawal tombstone path. Not an active bug (TS-minted cookies are lowercase) — normalize before validating, or cross-reference the deliberate asymmetry in a comment.viceroy-template.tomlis loaded statically — it has no substitution step despite the "template" name; rename toviceroy.toml.parse_client_auction_uiddouble-fetchesext—auction/endpoints.rs:291. Useuid.get("ext").filter(|v| v.is_object()).cloned().scripts/batch-sync.sh— passes the JSON body via-d "$BODY"on the command line (visible inps//proc) and treats any2xxas success; if the endpoint ever returns200withrejected > 0, mappings are silently dropped. Pipe the body via stdin and parserejected.count_hash_prefix_keysbyte-sliceshash_prefix—ec/kv.rs:783. Safe today (ASCII hex);hash_prefix.chars().take(8).collect::<String>()is panic-proof.
Resolved since round 8
- ✅
/auctionbody size cap — Content-Length precheck + post-read check, both413. - ✅ Dead
/ad-proxy/URL rewriting removed; all EC endpoints routed. - ✅
IntegrationRegistry::handle_proxy>7 args →ProxyDispatchInputstruct. - ✅ Geo-unavailable
log::warn!once added. - ✅ Rate-limiter rounding, Prebid EID length cap, batch-sync Content-Length precheck — present and tested.
CI Status
- fmt: PASS (GitHub)
- clippy: PASS —
cargo clippy --workspace --exclude trusted-server-cli --all-targets --all-features -- -D warnings(local) - rust tests: PASS —
cargo test --workspace --exclude trusted-server-cli(local;Analyze (rust)/cargo teststill pending on GitHub) - js tests: PASS — vitest (GitHub)
| continue; | ||
| } | ||
| resp.append_header(name, value); | ||
| resp.set_header(name, value); |
There was a problem hiding this comment.
🔧 wrench — rebuild_response_with_body drops all but the last Set-Cookie (regression vs main).
This header-copy loop was changed from append_header → set_header, and the main regression test rebuild_response_with_body_preserves_multiple_set_cookie_headers was deleted in this PR. Response::get_headers() yields one entry per header value, so iterating multiple Set-Cookie values with set_header (replace semantics) means each call overwrites the previous — only the last Set-Cookie survives. This is reached on the text/html/text/css creative-rewrite proxy paths via finalize_proxied_response, so any proxied origin that sets more than one cookie silently loses all but one. Verified against main with a two-dot diff.
Fix: special-case Set-Cookie (and other inherently multi-valued headers) to use append_header, or restore append_header for all as main had it:
for (name, value) in headers {
if name == header::CONTENT_LENGTH || name == header::CONTENT_TYPE {
continue;
}
if name == header::CONTENT_ENCODING && !preserve_encoding {
continue;
}
resp.append_header(name, value);
}Restore the deleted rebuild_response_with_body_preserves_multiple_set_cookie_headers test.
| if let Some(tcf) = effective_tcf(ctx) { | ||
| return !tcf.has_storage_consent(); | ||
| } | ||
| ctx.us_privacy |
There was a problem hiding this comment.
🔧 wrench — GPP-only US opt-out is not treated as an explicit EC withdrawal.
allows_ec_creation's UsState arm checks ctx.gpp.us_sale_opt_out (lines 505-509) to block new EC, but this has_explicit_ec_withdrawal arm checks only gpc and us_privacy — never gpp.us_sale_opt_out. A returning user who opts out only via a GPP US string (no us_privacy cookie, no GPC header) gets new EC blocked, but their existing ts-ec cookie is not expired and the KV row is not tombstoned (finalize.rs gates the withdrawal path on consent_withdrawn). The identity persists in KV and on the client after a legally valid opt-out.
Fix — mirror the allows_ec_creation precedence (gpc → tcf → gpp → us_privacy):
jurisdiction::Jurisdiction::UsState(_) => {
if ctx.gpc {
return true;
}
if let Some(tcf) = effective_tcf(ctx) {
return !tcf.has_storage_consent();
}
if ctx.gpp.as_ref().and_then(|g| g.us_sale_opt_out) == Some(true) {
return true;
}
ctx.us_privacy
.as_ref()
.is_some_and(|usp| usp.opt_out_sale == PrivacyFlag::Yes)
}Add a withdrawal test driven purely by a GPP string.
| // When a CMP uses TCF in the US (e.g. Didomi), respect the | ||
| // TCF Purpose 1 decision — this is an explicit opt-in signal. | ||
| // The Sourcepoint GPP design documents this precedence decision. | ||
| if let Some(tcf) = effective_tcf(ctx) { |
There was a problem hiding this comment.
❓ question — TCF storage-consent unconditionally overrides a fresh US opt-out (carried over from the round-8 review).
effective_tcf(ctx) returns here before the GPP us_sale_opt_out check (lines 505-509) and the US-Privacy opt_out_sale check (lines 511-513) are ever reached. So a TCF storage-consent string — including a stale one from a prior EU visit, or one carried via a GPP section-2 — silently overrides a fresh US opt_out_sale=Yes. There is no timestamp comparison: apply_expiration_check only clears TCF once it exceeds max_consent_age_days (default 395), so a ~13-month-old TCF grant still wins over a brand-new US opt-out. has_explicit_ec_withdrawal mirrors the same precedence, and both behaviors are now codified by ec_allowed_us_state_tcf_takes_priority_over_us_privacy / ec_us_state_tcf_takes_priority_over_gpp_us.
The inline comment cites "the Sourcepoint GPP design documents this precedence decision," but the rationale isn't reproduced here and this is a real consent-bypass path in US states. Has privacy/legal signed off that an opt-in TCF signal beats a fresh CCPA-style opt-out? If yes, please document the reasoning inline and consider a freshness guard (the US opt-out wins when it is newer than the TCF signal). If no, US opt-out signals (us_privacy, gpp.us_sale_opt_out) should be evaluated before TCF in this arm.
| source_domain = "sharedid.org" | ||
| openrtb_atype = 1 | ||
| bidstream_enabled = true | ||
| api_token = "sharedid-internal-token-32-bytes" |
There was a problem hiding this comment.
🔧 wrench — Partner api_token placeholders ship committed and pass all validation.
This token — and inttest-api-key-1-32-bytes-minimum / inttest2-api-key-2-32-bytes-minimum below — are all ≥32 bytes, so they pass validate_api_token's minimum-length check. Settings::reject_placeholder_secrets() only inspects ec.passphrase and publisher.proxy_secret — never partner api_tokens. crates/trusted-server-core/build.rs reads trusted-server.toml and bakes the merged config into the build. In a public repo, any deployment that doesn't override the [[ec.partners]] block grants /_ts/api/v1/batch-sync write access and /_ts/api/v1/identify read access to anyone who reads this file. The 32-byte length check gives false assurance, and sharedid.org is a real partner shipped enabled by default (not labeled test-only).
Fix: extend reject_placeholder_secrets() to reject known partner-token placeholders (e.g. an EcPartner::API_TOKEN_PLACEHOLDERS const, or a shape check for the shipped -32-bytes / -minimum style values). Alternatively, ship the sharedid.org partner as opt-in and move the inttest* partners into a test-only config that is never embedded in release builds.
Summary
Changes
crates/trusted-server-core/src/ec/(new module)mod.rs(lifecycle orchestration),generation.rs(HMAC-SHA256),cookies.rs(cookie read/write),consent.rs(consent gating),device.rs(device signal derivation, bot gate),kv.rs+kv_types.rs(KV identity graph with CAS),partner.rs(partner registry + admin),sync_pixel.rs(pixel sync),batch_sync.rs(S2S batch sync),pull_sync.rs(background pull sync),identify.rs(identity lookup with CORS),eids.rs(EID encoding),finalize.rs(response finalization middleware),admin.rs(admin endpoints)crates/integration-tests/tests/common/ec.rsandtests/frameworks/scenarios.rs; updatedintegration.rstest runner and Viceroy config fixturescrates/trusted-server-adapter-fastly/src/main.rs/_ts/api/v1/and/_ts/admin/;send_to_client()pattern with background pull sync dispatchcrates/trusted-server-core/src/auction/endpoints.rsandformats.rsupdated to decorate bid requests with partner EIDs from the KV identity graph (user.id,user.eids,user.consent)crates/trusted-server-core/src/integrations/prebid.rscrates/js/lib/src/integrations/prebid/index.tscrates/trusted-server-core/src/synthetic.rscrates/trusted-server-core/src/cookies.rsec/cookies.rscrates/trusted-server-core/src/settings.rs+settings_data.rscrates/trusted-server-core/src/consent/crates/trusted-server-core/src/http_util.rscrates/trusted-server-core/src/proxy.rscrates/trusted-server-core/src/publisher.rscrates/trusted-server-core/src/constants.rsdocs/guide/ec-setup-guide.md(new)docs/guide/edge-cookies.md(new)docs/guide/api-reference.mddocs/guide/configuration.mddocs/guide/synthetic-ids.mddocs/(various)docs/internal/superpowers/specs/2026-03-24-ssc-technical-spec-design.mdfastly.tomltrusted-server.tomlCloses
Closes #532
Closes #533
Closes #534
Closes #535
Closes #536
Closes #537
Closes #538
Closes #539
Closes #540
Closes #541
Closes #542
Closes #543
Closes #544
Closes #611
Closes #612
Follow-up
Test plan
cargo test --workspacecargo clippy --workspace --all-targets --all-features -- -D warningscargo fmt --all -- --checkcd crates/js/lib && npx vitest runcd crates/js/lib && npm run formatcd docs && npm run formatChecklist
unwrap()in production code — useexpect("should ...")tracingmacros (notprintln!)