Add LZ4HC & LZ4OPT & LZ4MID support #209
Thanks for the PR! I left some comments. There seem to be many unnecessary
Implement LZ4 High Compression algorithm (levels 3-9) using hash chain approach for better compression ratios at the cost of compression speed.

Features:
- HashTableHCU32 with configurable search depth based on level
- Match finding with forward and backward extension
- Proper distance validation to ensure offsets fit in 16-bit format
- Fuzz testing targets for HC compression
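For readers unfamiliar with the hash chain approach, here is a minimal sketch of the idea. It is not the PR's `HashTableHCU32`: the `HashChain` type, its fields, the hash constant, and `extend_backward` are all hypothetical names chosen for illustration. It shows the three pieces the commit message names: a bounded chain walk (search depth), forward match counting, and the check that the distance fits in a 16-bit offset.

```rust
/// Minimum match length in the LZ4 block format.
const MIN_MATCH: usize = 4;
/// Largest distance that fits in the 16-bit offset field.
const MAX_DISTANCE: usize = 65_535;
/// Hash table size (log2); an arbitrary choice for this sketch.
const HASH_LOG: u32 = 15;

/// Hypothetical hash-chain table (illustration only).
struct HashChain {
    head: Vec<u32>, // hash -> most recent position + 1 (0 means empty)
    prev: Vec<u32>, // position -> previous position with same hash + 1
}

impl HashChain {
    fn new(input_len: usize) -> Self {
        HashChain { head: vec![0; 1 << HASH_LOG], prev: vec![0; input_len] }
    }

    /// Hashes the 4 bytes at `pos`. Requires `pos + 4 <= input.len()`.
    fn hash(input: &[u8], pos: usize) -> usize {
        let v = u32::from_le_bytes([input[pos], input[pos + 1], input[pos + 2], input[pos + 3]]);
        (v.wrapping_mul(2_654_435_761) >> (32 - HASH_LOG)) as usize
    }

    /// Records `pos` at the front of its chain. Requires `pos + 4 <= input.len()`.
    fn insert(&mut self, input: &[u8], pos: usize) {
        let h = Self::hash(input, pos);
        self.prev[pos] = self.head[h];
        self.head[h] = pos as u32 + 1;
    }

    /// Walks at most `max_attempts` candidates and returns the longest
    /// match as `(candidate_position, length)`.
    fn find_best(&self, input: &[u8], pos: usize, max_attempts: usize) -> Option<(usize, usize)> {
        if pos + MIN_MATCH > input.len() {
            return None;
        }
        let mut cand = self.head[Self::hash(input, pos)];
        let mut best: Option<(usize, usize)> = None;
        let mut remaining = max_attempts;
        while cand != 0 && remaining > 0 {
            remaining -= 1;
            let c = (cand - 1) as usize;
            cand = self.prev[c];
            if c >= pos {
                continue; // only earlier positions can be referenced
            }
            if pos - c > MAX_DISTANCE {
                break; // older candidates are even farther away
            }
            // Forward extension: count equal bytes starting at both positions.
            let len = input[c..].iter().zip(&input[pos..]).take_while(|(a, b)| a == b).count();
            if len >= MIN_MATCH && best.map_or(true, |(_, l)| len > l) {
                best = Some((c, len));
            }
        }
        best
    }
}

/// Backward extension: how many bytes before `cand` and `pos` also match,
/// without moving `pos` below `anchor` (start of pending literals).
fn extend_backward(input: &[u8], cand: usize, pos: usize, anchor: usize) -> usize {
    let mut n = 0;
    while pos - n > anchor && cand > n && input[cand - n - 1] == input[pos - n - 1] {
        n += 1;
    }
    n
}
```

A higher compression level would pass a larger `max_attempts`, trading speed for a better chance of finding the longest match.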
Implement optimal parsing algorithm (levels 10-12) for maximum compression ratio. Uses dynamic programming to find the optimal sequence of literals and matches. Also refactors encode_sequence into a reusable function and makes handle_last_literals pub(crate) for use by HC algorithms.
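To illustrate why dynamic programming can beat greedy matching, here is a simplified sketch. It is not the PR's implementation: the cost model (1 byte per literal, 3 bytes per match) ignores token sharing, length-extension bytes, and end-of-block rules, and the `Step` type and `optimal_parse` function are hypothetical. The input is, for each position, the longest match available there (0 for none).

```rust
/// One step of a parse (hypothetical type for this sketch).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Step {
    Literal,
    Match { len: usize },
}

const MIN_MATCH: usize = 4;
/// Simplified costs in bytes; real LZ4 costs depend on run lengths.
const LITERAL_COST: usize = 1;
const MATCH_COST: usize = 3;

/// Returns the minimum total cost and the sequence of steps achieving it.
/// `max_match_len[i]` is the longest match found at position `i`.
fn optimal_parse(max_match_len: &[usize]) -> (usize, Vec<Step>) {
    let n = max_match_len.len();
    // cost[i] = cheapest encoding of positions i..n; cost[n] = 0.
    let mut cost = vec![0usize; n + 1];
    let mut choice = vec![Step::Literal; n];

    for i in (0..n).rev() {
        // Option 1: emit a literal and continue at i + 1.
        cost[i] = LITERAL_COST + cost[i + 1];
        // Option 2: emit a match of any usable length (a match may be shortened).
        let longest = max_match_len[i].min(n - i);
        for len in MIN_MATCH..=longest {
            let c = MATCH_COST + cost[i + len];
            if c < cost[i] {
                cost[i] = c;
                choice[i] = Step::Match { len };
            }
        }
    }

    // Walk the recorded choices forward to recover the parse.
    let mut steps = Vec::new();
    let mut i = 0;
    while i < n {
        steps.push(choice[i]);
        i += match choice[i] {
            Step::Literal => 1,
            Step::Match { len } => len,
        };
    }
    (cost[0], steps)
}
```

With a length-4 match at position 0 and a length-10 match at position 1 in an 11-byte input, a greedy parser takes the first match and pays for seven trailing literals, while the DP emits one literal and then the long match.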
Implement lz4mid algorithm matching C LZ4HC behavior for compression levels 0-2. This provides better compression than the fast algorithm while being faster than HC.

Compression level routing in compress_hc (matching C k_clTable):
- Levels 0-2: lz4mid (two hash tables: 4-byte and 8-byte)
- Levels 3-9: lz4hc (hash chain algorithm)
- Levels 10-12: lz4opt (optimal parsing)

Also exports backtrack_match and count_same_bytes as pub(crate) for reuse by lz4mid.
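The routing table above can be sketched as a single `match`. The `HcAlgorithm` enum and `route_level` function are hypothetical names, and treating levels above 12 as optimal parsing is an assumption of this sketch, not something the commit message states.

```rust
/// Which block compressor a level maps to (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum HcAlgorithm {
    Mid,       // lz4mid: two hash tables (4-byte and 8-byte)
    HashChain, // lz4hc: hash chain search
    Optimal,   // lz4opt: optimal parsing
}

/// Maps a compression level to an algorithm, following the routing
/// listed in the commit message for compress_hc.
fn route_level(level: u8) -> HcAlgorithm {
    match level {
        0..=2 => HcAlgorithm::Mid,
        3..=9 => HcAlgorithm::HashChain,
        // Levels 10-12; anything higher is clamped here (assumption).
        _ => HcAlgorithm::Optimal,
    }
}
```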
Add FrameEncoder::with_compression_level() constructor (requires `hc` feature) that allows selecting compression algorithm via level parameter (matching C LZ4 CLI):
- Level 1: Fast algorithm (supports linked blocks)
- Level 2: lz4mid intermediate algorithm (independent blocks)
- Levels 3-9: HC hash chain algorithm (independent blocks)
- Levels 10-12: Optimal parsing algorithm (independent blocks)

Levels 2+ automatically force BlockMode::Independent since HC/mid compression doesn't support linked blocks.
Add command-line options for the lz4 binary:
- `-l/--level`: Compression level 1-12 (requires `hc` feature)
  - Level 1: fast algorithm
  - Level 2: lz4mid
  - Levels 3-9: HC hash chain
  - Levels 10-12: optimal parsing
- `-B/--block-size`: Block size (4=64KB, 5=256KB, 6=1MB, 7=4MB)

The level option is only available when compiled with `hc` feature (enabled by default). Block size defaults to 4MB to match C lz4 CLI.
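The block-size IDs accepted by `-B` follow a simple pattern: each step multiplies the size by four. A small sketch of the mapping, with `block_size_bytes` being a hypothetical helper rather than a function from the PR:

```rust
/// Converts a block-size ID (4..=7) into a size in bytes.
/// Returns None for IDs outside the supported range.
fn block_size_bytes(id: u8) -> Option<usize> {
    match id {
        // 4 -> 64 KB, 5 -> 256 KB, 6 -> 1 MB, 7 -> 4 MB
        4..=7 => Some(1usize << (8 + 2 * u32::from(id))),
        _ => None,
    }
}
```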
Reduce memory allocation for small inputs by sizing the chain table proportionally to input length instead of always allocating 128KB. For a 725B input, chain table is now 1KB instead of 128KB (-49% total).
When safe-encode feature is disabled, use get_unchecked/get_unchecked_mut for dict and chain table accesses to eliminate bounds checking overhead. Provides ~17% speedup in HC compression hot paths.
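A minimal sketch of the difference between the two access styles, assuming a chain table whose length is a non-zero power of two so that masking always yields an in-range index. `ChainTable` and its methods are hypothetical; the PR selects between the variants with the `safe-encode` feature, which this standalone sketch leaves out.

```rust
/// Hypothetical chain table. Invariant: `entries.len()` is a non-zero
/// power of two, so `pos & (len - 1)` is always a valid index.
struct ChainTable {
    entries: Vec<u16>,
}

impl ChainTable {
    fn new(len: usize) -> Self {
        // is_power_of_two() is false for 0, so this also rejects empty tables.
        assert!(len.is_power_of_two(), "table length must be a power of two");
        ChainTable { entries: vec![0; len] }
    }

    fn set(&mut self, pos: usize, value: u16) {
        let idx = pos & (self.entries.len() - 1);
        self.entries[idx] = value;
    }

    /// Safe variant: the compiler may keep a bounds check and a panic
    /// path because it cannot see the length invariant.
    #[inline]
    fn get_checked(&self, pos: usize) -> u16 {
        let idx = pos & (self.entries.len() - 1);
        self.entries[idx]
    }

    /// Unchecked variant: no bounds check is emitted.
    #[inline]
    fn get_fast(&self, pos: usize) -> u16 {
        let idx = pos & (self.entries.len() - 1);
        // SAFETY: the constructor guarantees a non-zero power-of-two
        // length, so the masked index is strictly less than the length.
        unsafe { *self.entries.get_unchecked(idx) }
    }
}
```

Both methods return the same value for every `pos`; the only difference is whether a panic path exists in the generated code.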
Add build and test coverage for the HC (high compression) feature:
- Build tests for no_std with hc and hc+safe-encode
- Unit tests with hc, hc+frame, hc+safe-encode combinations
- Fuzz tests for HC in unsafe mode (fuzz_roundtrip_hc, fuzz_roundtrip_hc_cpp, fuzz_roundtrip_frame)
- Match struct: start/len/ref_pos from usize to u32 (24 → 12 bytes), halves stack pressure in HC's 4-Match juggling loop
- OptimalState: use i32 for all fields matching C's LZ4HC_optimal_t layout (u16 off/mlen caused 15-20% regression from widening conversions)
- find_longer_match: return (u32, u16), params narrowed to u32
- count_same_bytes: accept explicit match_limit parameter, use pointer-based loop matching C's LZ4_count for ~10% speedup
- HC common_bytes: thin wrapper delegating to shared count_same_bytes
- Pre-check uses single u16 read instead of two byte comparisons
- Safe variant uses chunks_exact + zip iterators
- Added #[inline] hints and sufficient_len cap matching C's behavior
- Add thread-local cached state for HC
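As a rough illustration of the "chunks_exact + zip" safe variant mentioned above, the sketch below counts the common prefix of two slices eight bytes at a time, capped by an explicit `match_limit`. The signature is an assumption made for this example; the PR's actual `count_same_bytes` takes different arguments.

```rust
use std::convert::TryInto;

/// Counts how many leading bytes of `a` and `b` are equal, up to
/// `match_limit`. Safe code only: 8-byte chunks are compared as u64,
/// and the first differing byte is located via XOR + trailing_zeros.
fn count_same_bytes(a: &[u8], b: &[u8], match_limit: usize) -> usize {
    let limit = match_limit.min(a.len()).min(b.len());
    let (a, b) = (&a[..limit], &b[..limit]);

    let mut matched = 0;
    let mut a_chunks = a.chunks_exact(8);
    let mut b_chunks = b.chunks_exact(8);
    for (ca, cb) in a_chunks.by_ref().zip(b_chunks.by_ref()) {
        let x = u64::from_le_bytes(ca.try_into().unwrap());
        let y = u64::from_le_bytes(cb.try_into().unwrap());
        let diff = x ^ y;
        if diff != 0 {
            // Little-endian load: the lowest set bit lies in the first
            // differing byte, so dividing by 8 gives its index.
            return matched + (diff.trailing_zeros() / 8) as usize;
        }
        matched += 8;
    }

    // Both slices have the same length, so both remainders are the
    // same tail range; compare it byte by byte.
    matched
        + a_chunks
            .remainder()
            .iter()
            .zip(b_chunks.remainder())
            .take_while(|(x, y)| x == y)
            .count()
}
```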
- Add doc comments to HashTableHCU32 and Match struct fields.
- Replace unsafe write_bytes with safe .fill(0) in HashTableHCU32::reset.
- Rename pattern32 to pattern in count/reverse_count_pattern.
Hi @PSeitz, all concerns addressed. As for
sub x8, x1, #1 ; mask = len - 1
and x8, x2, x8 ; idx = pos & mask
ldrh w0, [x0, x8, lsl #1] ; return chain_table[idx]
ret
Without (branch + full panic path):
sub x9, x1, #1 ; mask = len - 1
and x0, x2, x9 ; idx = pos & mask
cbz x1, LBB6_2 ; if len == 0, panic ← BOUNDS CHECK
ldrh w0, [x8, x0, lsl #1]
ret
LBB6_2:
; ... 6 more instructions setting up panic_bounds_check call ...
bl panic_bounds_check
Note: LLVM is smart enough to realize
ldr w0, [x0, x2, lsl #2] ; return dict[hash]
ret
Without (compare + conditional branch + panic path):
cmp x2, x1 ; hash >= len? ← BOUNDS CHECK
b.hs LBB8_2 ; if so, panic
ldr w0, [x0, x2, lsl #2]
ret
LBB8_2:
; ... 7 more instructions setting up panic_bounds_check call ...
bl panic_bounds_check
Note that functions using
A conditional branch (cbz/b.hs) on the hot path costs a slot in the branch predictor, even if perfectly predicted, and prevents certain instruction reorderings.
- Block: compress_hc_to_vec levels 1-12
- Frame: all levels x Independent/Linked; add lz4_flex_frame_compress_with_level
- C interop: decompress lz4_flex HC frames at all levels
The pointer-based rewrite of count_same_bytes regressed fast block compression ~3–4% on benchmarks; behavior is unchanged: same match_limit bounds, origin/main-style *cur/input_end loop for unsafe path. HC callers still share this helper with identical semantics.
Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>
- Use #[inline] on HC helpers instead of #[inline(always)] (HC is new vs origin/main; fast block still matches main on copy_literals_wild).
- Clarify count_pattern / reverse_count_pattern: one repeated byte is passed as a u32 with four copies (e.g. 0xAB -> 0xABABABAB) for XOR batch scans.
- Rename pre_check_ok to tail_matches_past_best in find_longer_match.
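The byte-broadcast convention described for count_pattern can be sketched as follows. Both functions are illustrative stand-ins with assumed signatures, not the PR's code: `broadcast_byte` builds the four-copy u32, and `count_pattern` scans forward four bytes at a time using XOR.

```rust
/// Repeats one byte four times in a u32, e.g. 0xAB -> 0xABABABAB.
fn broadcast_byte(byte: u8) -> u32 {
    u32::from(byte) * 0x0101_0101
}

/// Counts how many leading bytes of `data` equal the repeated byte
/// carried in `pattern` (all four bytes of `pattern` must be equal).
fn count_pattern(data: &[u8], pattern: u32) -> usize {
    let mut n = 0;
    let mut chunks = data.chunks_exact(4);
    for chunk in chunks.by_ref() {
        let word = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
        let diff = word ^ pattern;
        if diff != 0 {
            // The lowest set bit of the XOR falls in the first mismatching byte.
            return n + (diff.trailing_zeros() / 8) as usize;
        }
        n += 4;
    }
    // Fewer than four bytes left: compare them one at a time.
    let byte = pattern as u8;
    n + chunks.remainder().iter().take_while(|&&b| b == byte).count()
}
```

Because all four bytes of the pattern are identical, the result does not depend on the byte order used to load each word.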
Introduce PatternChainAction and HashTableHCU32::pattern_chain_action to replace deeply nested pattern/repeat detection with early returns and a small match at the call site.
- HcLevelParams / HcCompressionStrategy; frame encoder linked-block path
- Expand optimal DP locals (optimal_states, opt_window_index, match length/offset names)
- MIN_BYTES_FROM_CURSOR_TO_BLOCK_END alias for block tail room (MFLIMIT)
@PSeitz hi, thanks for your review. Are there any further concerns blocking this PR from merging?
Performance:
Block Compression (Safe)
Frame Compression (Safe)
close #21
close #165