fix(occ): manifest list rebuild#982
Open
mzzz-zzm wants to merge 5 commits intoapache:mainfrom
Open
Conversation
When doCommit retries after an OCC conflict (ErrCommitFailed / HTTP 409), the AddSnapshotUpdate still carries the manifest list built against the original base snapshot. That list does not include any manifests committed by concurrent writers between the first attempt and the retry, so the retried commit silently discards those manifests and the data they reference is lost. Fix: snapshotProducer records its own manifest files at commit-build time (computeOwnManifests). doCommit calls a rebuildManifestList closure on each retry that rewrites the AddSnapshotUpdate with a fresh manifest list that prepends the current branch-head manifests before the producer's own files. Orphaned manifest-list files from failed attempts are deleted after a successful commit. Fixes apache#976
Covers four cases: - closure is called with the fresh branch head as freshParent - rebuild is skipped when snapshot parent already matches the fresh head - updates without a rebuild closure pass through unchanged - errors returned by the closure propagate to the caller
Verifies that a retried snapshot correctly inherits manifests from concurrent writers rather than reusing the stale manifest list. The test fails on main (got 1 manifest) and passes with this fix (got 2 manifests), serving as a direct regression guard for the manifest-list rebuild logic in rebuildSnapshotUpdates. Signed-off-by: mzzz-zzm <80936176+mzzz-zzm@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
fix(occ): rebuild manifest list on OCC retry to inherit concurrent writes
Fixes #976
Problem
When
doCommitretries after an OCC conflict, it re-uses the stale snapshot that was built before the first attempt. That snapshot's manifest list was written against the original parent and does not include any data files committed by concurrent writers in the meantime.On catalogs that perform server-side snapshot validation (e.g. AWS S3 Tables), this causes silent data loss: the retried commit succeeds (HTTP 200) but the stale manifest list effectively drops the concurrent writer's files from the table history.
Fix
Each
snapshotProducernow records which manifests it wrote itself (ownManifests— those not inherited from the original parent). ArebuildManifestListclosure is attached to theaddSnapshotUpdateand called bydoCommiton every retry. The closure:ownManifestswith the fresh inherited manifests.sequence-numberrelative to the fresh parent.After a successful commit,
doCommitdeletes the manifest list files produced by superseded retry attempts (orphan cleanup).Files changed
table/snapshot_producers.go:computeOwnManifests,rebuildFnclosuretable/updates.go:ownManifests/rebuildManifestListfields onaddSnapshotUpdate, propagated throughApplytable/table.go:rebuildSnapshotUpdateshelper, orphan cleanup after committable/rebuild_manifest_test.go: unit tests forrebuildSnapshotUpdates