fix(data): preserve merkle resume after preflight by mickvandijke · Pull Request #94 · WithAutonomi/ant-client

mickvandijke · 2026-05-21T14:18:56Z

Summary

This PR changes merkle uploads so payment is based only on chunks that still need to be stored.

Code changes:

Adds a merkle upload preflight planner that probes each chunk through the existing quote path. AlreadyStored responses are treated as stored chunks, while normal quote responses become the to_upload set for merkle payment.
If every chunk is already stored, upload returns successfully without fetching merkle quotes, submitting payment, or PUTing chunks again.
If only some chunks are already stored, merkle payment/proofs are built only for the remaining chunks, and upload progress/final counts include the skipped stored chunks.
Re-runs the existing payment-mode threshold after preflight. In Auto mode, if the remaining chunks drop below the merkle threshold, the upload switches to single-node payment for only those remaining chunks. Forced Merkle still follows the existing Merkle-mode minimum.
Preserves merkle resume behavior by loading cached merkle receipts before preflight fallback decisions. If preflight fails but cached proofs cover the file, the upload resumes with cached proofs instead of paying again through the single-node path.
Updates external-signer file preparation to use the same preflight behavior and to fall back to wave-batch preparation when merkle candidate collection has insufficient peers.
Replaces address HashSet filtering with an order-preserving content/address selector and adds a length check before merkle upload so chunk contents cannot silently desync from proof addresses.

Verification

cargo fmt --all -- --check
cargo check --all-targets --all-features
cargo clippy --all-targets --all-features -- -D warnings
cargo test -p ant-core --lib

Preflight merkle uploads with the existing AlreadyStored quote signal, skip chunks already on the network, and fall back to single-node payment when the remaining chunk count is below the merkle threshold. Carry already-stored chunk counts through external-signer finalization so upload totals remain accurate.

Prefer valid cached merkle proofs when preflight quote checks fail so retries do not repay through the single-node path. Fall back external-signer merkle preparation to wave-batch on insufficient peers, and keep chunk contents aligned with merkle addresses before upload.

dirvine

Review: fix(data): preserve merkle resume after preflight

I have reviewed the full diff (+851/−216 across data.rs, file.rs, merkle.rs). All CI checks pass (Format, Clippy, Unit Tests ×2, E2E ×2, Merkle E2E ×2, Docs, Build ×2, Security Audit).

Strengths

Architecture: The preflight planner (plan_merkle_upload) slots correctly into the existing quote pipeline — reusing get_store_quotes and the AlreadyStored error path means zero new network protocol surface. The MerkleUploadPlan struct cleanly separates concerns: planning, filtering, and execution.

Correctness of the race window: Between preflight and payment, a chunks status can change, but the design handles both directions safely:

Stored → not stored: payment proceeds normally, PUT succeeds.
Not stored → stored: payment succeeds, network handles AlreadyExists gracefully on PUT.
Worst case is a small overpayment, never an underpayment.

Edge cases covered well:

Empty to_upload → returns success immediately with correct chunks_stored: chunk_count
Threshold re-check after preflight → switches to single-node if remaining chunks drop below merkle threshold
Cache invalidation → cached proofs validated against the actual remaining chunk set before reuse
Length check in merkle_upload_chunks → prevents silent address/content desync

Order preservation: chunk_contents_for_upload_addresses uses VecDeque to handle duplicate-content chunks correctly while preserving address order. Good.

Progress reporting: The stored_offset parameter threads already-stored counts through to progress events, so UI progress bars stay accurate for the total file, not just the payment batch.

Minor suggestions (non-blocking)

chunk_contents_for_upload_addresses — HashMap overhead with many duplicates: If the same chunk content appears hundreds of times (e.g. all-zero padding), building a HashMap of VecDeques is O(n) memory. Fine for typical file chunks, but worth noting if someone uploads a sparse file with many identical chunks.
plan_merkle_upload logs at info! for every chunk in debug mode: The per-chunk debug! logs are fine, but the info!("Checking {total_chunks} merkle chunks...") at the start and completion summary at the end are well-placed. No change needed.
Public API surface: PreparedUpload now carries already_stored_addresses: Vec<[u8; 32]> and total_chunks: usize. These are needed for correct finalization, but make sure any bindings/wasm wrappers propagate them through to the final FileUploadResult. The existing field docs in the struct are clear enough.

Verdict

This is a well-structured, correctly implemented optimization that meaningfully reduces user costs for re-upload scenarios. The resume-preservation logic is handled carefully (cache validity check before discard). I am happy to approve.

Approved ✅

mickvandijke added 3 commits May 21, 2026 12:46

fix(docs): avoid private merkle preflight link

5bcf437

dirvine approved these changes May 21, 2026

View reviewed changes

fix(data): trim merkle chunk selection memory

5b8de0f

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(data): preserve merkle resume after preflight#94

fix(data): preserve merkle resume after preflight#94
mickvandijke wants to merge 4 commits into
mainfrom
fix/merkle-preflight-resume

mickvandijke commented May 21, 2026 •

edited

Loading

Uh oh!

dirvine left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mickvandijke commented May 21, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Verification

Uh oh!

dirvine left a comment

Choose a reason for hiding this comment

Review: fix(data): preserve merkle resume after preflight

Strengths

Minor suggestions (non-blocking)

Verdict

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

mickvandijke commented May 21, 2026 •

edited

Loading