refactor: consolidate duplicated code (~150 LOC reduction)#71
Conversation
Audit-driven consolidation pass. No behavior change; durability paths (WAL append, flush, Delta commit) only gain message-preserving error wrapping. - errors: add arrow_err/exec_err/wal_err helpers, replace 34 inline .map_err closures across database, mem_buffer, dml, buffered_write_layer, plan_cache, stats_table, functions - functions: reg_from! macro for UDF registration; inline 6 thin create_*_udf wrappers; drop duplicate scalar_to_string in favor of extract_utf8_string - wal: collapse 3 deserialize fns into generic decode_payload<T>; drop the two *_public wrappers (inner fns now pub(crate)) - object_store_cache: record_miss_with_fetch/record_meta_miss_with_fetch collapse repeated stat-update blocks - optimizers: shared extract_utf8_string; config: GIB constant - tests: shared insert_at/insert_for (e2e harness), array_get_str and minio_test_config (test_helpers) replacing 10 duplicated local helpers docs/plans: consolidation audit + rejected-ideas record
Code Review — PR #71:
|
Code Review — refactor: consolidate duplicated code (~150 LOC reduction)Overall: This is a well-scoped, low-risk refactor that delivers what it promises. The changes are mechanical equivalences, the PR description is thorough, and the rejected-ideas doc is a nice touch. A few things worth noting before merge. ✅ What's working well
Test helper consolidation
|
test_variant_column_round_trips_as_json read the attributes Variant column as String, but bare Variant columns surface PG jsonb OID 3802 (required by monoscope's hasql decoder; enforced by jsonb_oid_test::bare_variant_column_returns_jsonb_oid). String FromSql rejects jsonb, so the read failed. Decode as serde_json::Value instead. Pre-existing failure on master since the 2026-06-12 jsonb OID change; the test assertion was not updated then.
Code Review: refactor/consolidate-loc-reductionOverall: This is a high-quality, well-planned refactoring. The changes are mechanical and low-risk, the plan document is a thoughtful artifact, and the rejected-approaches section will save future reviewers from re-litigating bad ideas. A few items worth discussing before merging. What's good
Issues to address1.
Suggestion: move it to a 2. Missing doc comment on
/// Cache miss for a data range — increments misses + inner_gets in data stats.
async fn record_miss_with_fetch(&self) { ... }
/// Cache miss for a metadata range — increments misses + inner_gets in metadata stats.
async fn record_meta_miss_with_fetch(&self) { ... }3. Un-flagged behavior changes in The PR description says "no behavior change" but
4. The diff shows Minor nits (non-blocking)
VerdictItems 1–4 above; 1–3 are the substantive ones. Happy to approve once those are addressed (or discussed if there's a reason |
100ms TTL made the put->get hit window too tight: a loaded CI runner could stall past expiry, turning the first expected hit into a miss (intermittent 'First get should hit' failures). Widen TTL to 2s and expiry sleeps to 2.5s; same assertions, robust timing.
Code Review — PR #71: refactor: consolidate duplicated code (~150 LOC reduction)Summary: Clean, mechanical consolidation pass. The six workstreams are independent, well-scoped, and the diff is easy to verify change-by-change. No behavior changes in production paths. A few observations below. What works well
Observations / minor issues1. // Before: 150ms sleep × 2 = ~300ms
// After: 2500ms sleep × 2 = 5000msThe fix is correct (100ms TTL was too tight under CI load), but the test now holds a serial slot for 5+ seconds. If 2. pub fn exec_err<E: Display>(ctx: &'static str) -> impl Fn(E) -> DataFusionError {All current call sites pass string literals, so this compiles fine. Future callers that need a dynamic context string (e.g., including a column name) would have to fall back to 3.
4. // Before: round-trip via String → serde_json::from_str
// After: direct tokio-postgres binary jsonb decoding → serde_json::Value
let parsed: serde_json::Value = row.get(0);This is technically a test behavior change (not just cleanup), though it's correct if 5. Shipping a planning/audit document as a source artifact is unconventional but explicitly intentional per the PR description. The "do not re-attempt" list is the valuable part. Consider whether No concerns on
Overall: Approve with the observations above noted for follow-up. The |
Code Review — refactor: consolidate duplicated code (~150 LOC reduction)Overall: This is a clean, well-scoped refactor. The deduplication is genuine, the equivalences are correct, and the PR description is admirably transparent about what was rejected and why. A few items worth discussion below. ✅ What's well done
Test deduplication — 🟡 Minor items
The method names imply the methods perform a fetch, but they only record stats for a miss-that-triggered-a-fetch. A reader encountering
This is a scalar-value utility with no semantic tie to the optimizer layer — it's now imported by TTL test is now 5 seconds slower ( The rationale (CI runners can stall past a 100ms TTL) is valid. A 2s TTL + 2.5s sleep is very conservative though — that's ~5× the original wait. If this test is in the hot path for local dev, it might be worth exploring
// Before (just validated JSON was parseable):
serde_json::from_str::<serde_json::Value>(&got).unwrap_or_else(...)
// After (asserts a specific value):
assert_eq!(parsed["http"]["status"], 200, "Sort+Limit path must round-trip variant as jsonb");This is strictly a test improvement, but it's bundled with the refactor. Worth calling out so reviewers don't miss that this test now verifies round-trip correctness on the Sort+Limit path, not just parsability. 🔍 One thing to verify before mergeThe SummaryNo correctness issues found. The two naming suggestions ( |
Summary
Audit-driven consolidation pass to reduce duplicated code and boilerplate, per the repo's "minimal code / zero boilerplate" priority. ~150 fewer lines of code, no behavior change. Durability paths (WAL append, flush, Delta commit) are untouched except message-preserving error wrapping.
A multi-agent audit surfaced 99 candidate opportunities; these were verified against the code and distilled to the low-risk, net-positive set below. Rejected ideas (e.g. a generic
get_or_computecache that would add ~110 LOC, mergingperform_delta_update/delete) were considered and rejected (net-negative LOC or correctness risk) so they aren't re-attempted.Changes
src/errors.rs)arrow_err/exec_err/wal_errhelpers replace 34 inline.map_errclosures across 7 filesreg_from!macro for UDF registration; inline 6 thincreate_*_udfwrappers; drop duplicatescalar_to_string(useextract_utf8_string)decode_payload<T>; drop 2*_publicwrappers (inner fns nowpub(crate))record_miss_with_fetch/record_meta_miss_with_fetchcollapse repeated stat blocksextract_utf8_string;GIBconstantinsert_at/insert_for(e2e harness),array_get_str+minio_test_config(test_helpers) replace 10 duplicated local helpersVerification
cargo build+cargo build --testsclean (only pre-existing vendored-walrusdead-code warnings)cargo fmt+cargo clippy --testscleandecode_payloadround-trips (test_batch_queue_under_loadis a pre-existing flake — fails identically onmaster)test_custom_functionspasses (exercises the UDF macro +extract_utf8_string+array_get_str)Note
MinIO/Docker integration + e2e suites compile cleanly but were not run locally — please run
make test-minio/cargo test --test e2ein CI to fully validate the test-helper changes.This PR intentionally excludes the unrelated
cargo fmtreflows that stable rustfmt produced on files outside this change (the repo formats with nightly options).