feat: add pre_seed phase — S3 zip download before pre_boot #1189
Adds a new [pre_seed] config section that runs before pre_boot hooks. Downloads up to 5 zip archives from S3 and extracts them in order to $HOME (or a custom target), implementing a layer system where later archives overwrite earlier ones. Closes #1188
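The layer semantics described above can be sketched roughly as follows. This is an illustration only, not the PR's code: each archive is modeled as an in-memory map from relative path to contents, and `Layer`, `apply_layers`, and `MAX_LAYERS` are hypothetical names. The only behavior taken from the description is the limit of 5 archives and that later archives overwrite earlier ones.

```rust
use std::collections::BTreeMap;

/// One extracted archive, modeled as relative path -> file contents (assumption).
type Layer = BTreeMap<String, Vec<u8>>;

/// Upper bound on archives, taken from the PR description ("up to 5").
const MAX_LAYERS: usize = 5;

/// Apply layers in order; a path present in a later layer replaces the earlier copy.
fn apply_layers(layers: &[Layer]) -> Result<Layer, String> {
    if layers.len() > MAX_LAYERS {
        return Err(format!("too many sources: {} > {}", layers.len(), MAX_LAYERS));
    }
    let mut merged = Layer::new();
    for layer in layers {
        for (path, contents) in layer {
            // insert() overwrites any value stored under the same path.
            merged.insert(path.clone(), contents.clone());
        }
    }
    Ok(merged)
}
```

Files that appear only in an earlier layer survive; only colliding paths are replaced.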
- Move pre_seed under [hooks.pre_seed] (consistent lifecycle grouping)
- Add SHA-256 integrity verification (sha256s field)
- Add max_bytes size cap (default 100 MiB) to prevent OOM
- Extract to temp dir first, then move into target (atomic, fixes spawn_blocking timeout race condition)
- Remove bytes.to_vec(); pass Bytes slice directly
- Add region/endpoint_url override (LocalStack/VPC support)
- Use shared parse_s3_uri from config.rs (dedup)
- Move pre-seed to opt-in feature (not in default)
- Manual Default impl (timeout_seconds=300, max_bytes=100MiB)
- Update docs and config example
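A derived `Default` would set both numeric fields to 0, which is presumably why the commit uses a manual impl. A minimal sketch of what that might look like, assuming a simplified struct: only `timeout_seconds`, `max_bytes`, and their default values come from the commit message, while the other fields and the `check_download_size` helper are hypothetical.

```rust
/// Simplified, hypothetical shape of the pre_seed config.
#[derive(Debug, Clone, PartialEq)]
struct PreSeedConfig {
    sources: Vec<String>,
    target: Option<String>,
    timeout_seconds: u64,
    max_bytes: u64,
}

const MIB: u64 = 1024 * 1024;

impl Default for PreSeedConfig {
    fn default() -> Self {
        Self {
            sources: Vec::new(),
            target: None,
            // Values stated in the commit message.
            timeout_seconds: 300,
            max_bytes: 100 * MIB,
        }
    }
}

/// Hypothetical helper: reject an object whose reported size exceeds the cap
/// before buffering it in memory.
fn check_download_size(cfg: &PreSeedConfig, content_length: u64) -> Result<(), String> {
    if content_length > cfg.max_bytes {
        return Err(format!(
            "object is {content_length} bytes, cap is {} bytes",
            cfg.max_bytes
        ));
    }
    Ok(())
}
```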
- Replace tokio::time::timeout with cooperative Instant deadline passed into the blocking task
- Check deadline before each file extraction and each move operation
- If deadline expires mid-extraction, bail immediately (temp dir auto-cleans via Drop), so target never gets partial writes
- Add extracted bytes budget (500 MiB) and file count limit (10k) to prevent zip-bomb disk/CPU exhaustion
- Add tests for expired deadline, move deadline enforcement
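The cooperative deadline pattern could look roughly like the sketch below. A blocking task cannot be cancelled from outside once it is running, so the task itself checks the clock before each unit of work. The function names and the use of entry names in place of real zip entries are assumptions for illustration.

```rust
use std::time::{Duration, Instant};

/// Turn a timeout in seconds into an absolute deadline (hypothetical helper).
fn deadline_from_timeout(timeout_seconds: u64) -> Instant {
    Instant::now() + Duration::from_secs(timeout_seconds)
}

/// Process entries one by one, checking the deadline before each.
/// Returns the names handled so far, or an error as soon as the deadline has passed.
fn extract_with_deadline(entries: &[&str], deadline: Instant) -> Result<Vec<String>, String> {
    let mut done = Vec::new();
    for name in entries {
        if Instant::now() >= deadline {
            // Bail before touching the next file; the caller drops the temp dir.
            return Err(format!("deadline expired before extracting {name}"));
        }
        // Real code would write the entry into the temp dir here.
        done.push(name.to_string());
    }
    Ok(done)
}
```

Because the check runs between files rather than inside a write, a single very large entry can still overrun the deadline; the byte budget in the same commit bounds that case.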
Bytes is Arc-backed; clone is a ref-count bump, not a memcpy. This eliminates the last unnecessary memory copy.
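The same idea can be shown with the standard library alone. This sketch uses `Arc<[u8]>` as a stand-in for the `Bytes` type (an analogy, not the `bytes` crate API): cloning the handle bumps a reference count and both handles point at the same allocation.

```rust
use std::sync::Arc;

/// Build a shared buffer and a second handle to it.
/// Converting the Vec copies the data once; the clone afterwards does not.
fn share_buffer(data: Vec<u8>) -> (Arc<[u8]>, Arc<[u8]>) {
    let original: Arc<[u8]> = Arc::from(data);
    // Ref-count bump only: no bytes are copied here.
    let handle = Arc::clone(&original);
    (original, handle)
}
```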
- Add test for file count limit (DEFAULT_MAX_FILE_COUNT + 1 entries)
- Add test verifying normal zips pass budget checks
- Update docs/hooks.md: clarify move phase is per-file, not atomic (partial apply possible with on_failure=warn)
- extract_rejects_exceeding_extracted_bytes: uses extract_zip_budgeted with max_bytes=20, zip has 30 bytes → fails as expected
- extract_rejects_exceeding_file_count: uses extract_zip_budgeted with max_file_count=3, zip has 5 files → fails as expected
- Refactored extract_zip_with_limits to delegate to extract_zip_budgeted with configurable limits for testability
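The budget checks exercised by these tests might be structured along the following lines. This is a sketch under assumptions: it works on a list of uncompressed entry sizes rather than a real zip archive, and `Budget` and `check_budget` are hypothetical names, not the signature of extract_zip_budgeted.

```rust
/// Hypothetical limits mirroring the max_bytes / max_file_count parameters.
struct Budget {
    max_bytes: u64,
    max_file_count: usize,
}

/// Walk the entry sizes and fail as soon as either budget is exceeded.
/// Returns the total extracted size on success.
fn check_budget(entry_sizes: &[u64], budget: &Budget) -> Result<u64, String> {
    if entry_sizes.len() > budget.max_file_count {
        return Err(format!(
            "archive has {} entries, limit is {}",
            entry_sizes.len(),
            budget.max_file_count
        ));
    }
    let mut total: u64 = 0;
    for size in entry_sizes {
        // saturating_add avoids overflow on hostile size values.
        total = total.saturating_add(*size);
        if total > budget.max_bytes {
            return Err(format!("extracted bytes exceed budget of {}", budget.max_bytes));
        }
    }
    Ok(total)
}
```

Sizes declared in a zip header can be forged, so a production check would count bytes as they are actually written rather than trusting the declared values.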
- Request ChecksumMode::Enabled in GetObject call
- If S3 object has x-amz-checksum-sha256 (uploaded with --checksum-algorithm SHA256), verify automatically
- User-provided sha256s still works as additional layer
- Add docs recommending 'aws s3 cp --checksum-algorithm SHA256'
- No config change needed for S3-native verification
S3 supports native SHA-256 checksums (x-amz-checksum-sha256) when objects are uploaded with --checksum-algorithm SHA256. OpenAB now automatically verifies this on download, with no user config needed. Removed the sha256s field to reduce maintenance burden (users had to update hashes on every zip change).

Trust model:
- Object has S3 checksum → auto-verified
- Object has no checksum → trust IAM + bucket policy (same as config-s3)

Users just need: aws s3 cp file.zip s3://bucket/ --checksum-algorithm SHA256
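The trust model above reduces to a three-way decision, sketched here. The digest computation is left out: both arguments are assumed to be base64-encoded SHA-256 digests (the encoding S3 uses for x-amz-checksum-sha256), and `Verification` and `check_object` are hypothetical names.

```rust
/// Outcome of the integrity check (hypothetical type).
#[derive(Debug, PartialEq)]
enum Verification {
    /// S3 reported a checksum and it matched the downloaded bytes.
    Verified,
    /// No checksum on the object; integrity rests on IAM + bucket policy.
    TrustedWithoutChecksum,
}

/// Compare the checksum S3 returned (if any) with the digest computed locally.
fn check_object(s3_checksum: Option<&str>, computed: &str) -> Result<Verification, String> {
    match s3_checksum {
        Some(expected) if expected == computed => Ok(Verification::Verified),
        Some(expected) => Err(format!(
            "checksum mismatch: expected {expected}, got {computed}"
        )),
        None => Ok(Verification::TrustedWithoutChecksum),
    }
}
```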
LGTM ✅: Well-structured pre_seed implementation with proper safety boundaries and clean integration.
What This PR Does
Adds a
How It Works
New Findings
Finding Details
🟢 F1: Comprehensive zip extraction safety
Multiple layers of protection:
🟢 F2: Good test coverage
Tests cover the important edge cases: expired deadlines, size budget overflow, file count limits, layer overwrite semantics, and the empty/overflow source count validation. All tests are synchronous where possible (faster CI).
🟢 F3: Clean dependency management
Reuses existing workspace deps (
🟢 F4: Documentation quality
The docs include lifecycle ordering, field reference tables, IAM policy examples, and S3 checksum upload instructions.
The Baseline Check
Minor Notes (non-blocking)
What's Good (🟢)
Summary
Implements #1188: adds a [pre_seed] config section that downloads and extracts zip archives from S3 before the pre_boot hook runs. This seeds the agent environment (configs, tools, memory files) without requiring AWS CLI in the container image.

Changes
- crates/openab-core/src/config.rs: PreSeedConfig struct (sources, target, timeout, on_failure)
- crates/openab-core/src/pre_seed.rs: new module for S3 download + zip extraction with layer semantics
- crates/openab-core/src/lib.rs: export pre_seed (gated by pre-seed feature)
- crates/openab-core/Cargo.toml: add zip dep + pre-seed feature
- Cargo.toml: forward pre-seed feature to openab-core
- src/main.rs: call pre_seed::run() before pre_boot hook
- config.toml.example: add commented [pre_seed] example
- docs/hooks.md: lifecycle diagram + pre_seed docs
- docs/config-reference.md: [pre_seed] field table

Key Design Decisions
- pre-seed feature (default-on), compiles away when unused

Lifecycle

Testing
- parse_s3_uri, extract_zip, run_empty, run_too_many
- cargo verify-project ✅
- cargo metadata resolves ✅

Closes #1188
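For readers unfamiliar with what a helper like parse_s3_uri has to handle, here is a minimal sketch of splitting an s3:// URI into bucket and key. It is not the repository's implementation; the name `parse_s3_uri_sketch`, the return type, and the error messages are all assumptions.

```rust
/// Split "s3://bucket/key/with/slashes" into (bucket, key).
/// Hypothetical stand-in, not the function from config.rs.
fn parse_s3_uri_sketch(uri: &str) -> Result<(String, String), String> {
    let rest = uri
        .strip_prefix("s3://")
        .ok_or_else(|| format!("not an s3:// URI: {uri}"))?;
    // The bucket ends at the first '/'; everything after it is the object key.
    let (bucket, key) = rest
        .split_once('/')
        .ok_or_else(|| format!("missing object key: {uri}"))?;
    if bucket.is_empty() || key.is_empty() {
        return Err(format!("empty bucket or key: {uri}"));
    }
    Ok((bucket.to_string(), key.to_string()))
}
```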