Add LZ4HC & LZ4OPT & LZ4MID support by yujincheng08 · Pull Request #209 · PSeitz/lz4_flex

yujincheng08 · 2026-03-12T01:14:19Z

Performance:

Block Compression (Safe)

Size	Level	Throughput (Rust)	Throughput (C)	Throughput Diff	Ratio (Rust)	Ratio (C)	Ratio Diff
725	L1	123.6 MB/s	171.1 MB/s	-27.7%	76.14%	74.76%	+1.38%
725	L2	122.9 MB/s	171.0 MB/s	-28.1%	76.14%	74.76%	+1.38%
725	L3	193.9 MB/s	168.7 MB/s	+14.9%	74.76%	74.76%	0
725	L4	191.7 MB/s	170.4 MB/s	+12.5%	74.76%	74.76%	0
725	L5	191.6 MB/s	169.1 MB/s	+13.3%	74.76%	74.76%	0
725	L6	195.9 MB/s	172.7 MB/s	+13.4%	74.76%	74.76%	0
725	L7	195.6 MB/s	172.5 MB/s	+13.4%	74.76%	74.76%	0
725	L8	196.3 MB/s	170.8 MB/s	+14.9%	74.76%	74.76%	0
725	L9	196.0 MB/s	171.9 MB/s	+14.0%	74.76%	74.76%	0
725	L10	114.8 MB/s	110.7 MB/s	+3.7%	74.62%	74.62%	0
725	L11	114.9 MB/s	111.2 MB/s	+3.4%	74.62%	74.62%	0
725	L12	97.5 MB/s	95.8 MB/s	+1.8%	74.62%	74.62%	0
34K	L1	498.0 MB/s	421.3 MB/s	+18.2%	50.25%	48.40%	+1.85%
34K	L2	500.8 MB/s	425.9 MB/s	+17.6%	50.25%	48.40%	+1.85%
34K	L3	317.4 MB/s	353.4 MB/s	-10.2%	47.39%	47.38%	+0.01%
34K	L4	258.2 MB/s	293.4 MB/s	-12.0%	46.88%	46.80%	+0.08%
34K	L5	215.8 MB/s	241.6 MB/s	-10.7%	46.67%	46.54%	+0.13%
34K	L6	212.3 MB/s	201.2 MB/s	+5.5%	46.59%	46.42%	+0.17%
34K	L7	200.3 MB/s	180.9 MB/s	+10.7%	46.57%	46.37%	+0.20%
34K	L8	202.0 MB/s	150.6 MB/s	+34.1%	46.57%	46.35%	+0.22%
34K	L9	171.1 MB/s	145.6 MB/s	+17.5%	46.57%	46.33%	+0.24%
34K	L10	73.3 MB/s	69.3 MB/s	+5.8%	46.11%	46.11%	0
34K	L11	70.4 MB/s	67.2 MB/s	+4.9%	46.10%	46.10%	0
34K	L12	46.9 MB/s	47.5 MB/s	-1.4%	46.09%	46.09%	0
65K	L1	497.5 MB/s	348.2 MB/s	+42.9%	49.21%	47.43%	+1.78%
65K	L2	497.7 MB/s	335.6 MB/s	+48.3%	49.21%	47.43%	+1.78%
65K	L3	235.1 MB/s	246.1 MB/s	-4.5%	46.17%	46.12%	+0.05%
65K	L4	174.8 MB/s	188.4 MB/s	-7.2%	45.58%	45.49%	+0.09%
65K	L5	143.4 MB/s	145.6 MB/s	-1.5%	45.29%	45.15%	+0.14%
65K	L6	125.9 MB/s	114.2 MB/s	+10.3%	45.21%	45.03%	+0.18%
65K	L7	116.3 MB/s	100.7 MB/s	+15.5%	45.18%	44.98%	+0.19%
65K	L8	112.4 MB/s	83.6 MB/s	+34.5%	45.18%	44.95%	+0.23%
65K	L9	97.8 MB/s	78.1 MB/s	+25.3%	45.18%	44.94%	+0.24%
65K	L10	51.0 MB/s	49.5 MB/s	+3.0%	44.77%	44.77%	0
65K	L11	45.0 MB/s	44.0 MB/s	+2.3%	44.72%	44.72%	0
65K	L12	34.6 MB/s	34.5 MB/s	+0.5%	44.72%	44.72%	0
66K JSON	L1	1376 MB/s	676 MB/s	+103.5%	21.15%	19.42%	+1.74%
66K JSON	L2	1377 MB/s	677 MB/s	+103.4%	21.15%	19.42%	+1.74%
66K JSON	L3	618 MB/s	551 MB/s	+12.1%	18.96%	18.81%	+0.15%
66K JSON	L4	518 MB/s	418 MB/s	+23.8%	18.77%	18.64%	+0.13%
66K JSON	L5	376 MB/s	282 MB/s	+33.4%	18.63%	18.50%	+0.14%
66K JSON	L6	291 MB/s	197 MB/s	+47.3%	18.55%	18.40%	+0.15%
66K JSON	L7	231 MB/s	153 MB/s	+50.5%	18.40%	18.34%	+0.06%
66K JSON	L8	214 MB/s	132 MB/s	+62.4%	18.42%	18.33%	+0.09%
66K JSON	L9	181 MB/s	114 MB/s	+58.1%	18.42%	18.32%	+0.10%
66K JSON	L10	69.5 MB/s	67.2 MB/s	+3.4%	18.04%	18.04%	0
66K JSON	L11	47.9 MB/s	47.6 MB/s	+0.8%	18.02%	18.02%	0
66K JSON	L12	40.7 MB/s	42.0 MB/s	-3.3%	18.02%	18.02%	0
10MB	L1	265.8 MB/s	137.8 MB/s	+92.9%	50.84%	49.76%	+1.09%
10MB	L2	266.4 MB/s	137.8 MB/s	+93.3%	50.84%	49.76%	+1.09%
10MB	L3	102.1 MB/s	102.4 MB/s	-0.3%	47.34%	47.27%	+0.08%
10MB	L4	74.3 MB/s	74.4 MB/s	-0.2%	45.73%	45.64%	+0.09%
10MB	L5	54.9 MB/s	54.3 MB/s	+1.1%	44.85%	44.71%	+0.14%
10MB	L6	43.7 MB/s	39.8 MB/s	+10.0%	44.47%	44.26%	+0.21%
10MB	L7	38.6 MB/s	33.3 MB/s	+15.6%	44.35%	44.07%	+0.28%
10MB	L8	36.9 MB/s	29.8 MB/s	+23.8%	44.33%	44.01%	+0.32%
10MB	L9	36.8 MB/s	26.9 MB/s	+36.6%	44.33%	43.97%	+0.35%
10MB	L10	22.9 MB/s	22.3 MB/s	+2.7%	43.56%	43.56%	0
10MB	L11	18.6 MB/s	18.3 MB/s	+1.6%	43.45%	43.45%	0
10MB	L12	17.6 MB/s	17.6 MB/s	0.0%	43.45%	43.45%	0
96K jpg	L1	1350 MB/s	159 MB/s	+747%	100.24%	100.15%	+0.09%
96K jpg	L2	1347 MB/s	161 MB/s	+739%	100.24%	100.15%	+0.09%
96K jpg	L3	96.2 MB/s	105.5 MB/s	-8.8%	100.14%	100.14%	0
96K jpg	L4	94.7 MB/s	101.7 MB/s	-6.9%	100.14%	100.14%	0
96K jpg	L5	93.9 MB/s	101.8 MB/s	-7.7%	100.14%	100.14%	0
96K jpg	L6	94.8 MB/s	101.6 MB/s	-6.6%	100.14%	100.14%	0
96K jpg	L7	95.1 MB/s	101.4 MB/s	-6.2%	100.14%	100.14%	0
96K jpg	L8	94.7 MB/s	95.6 MB/s	-0.9%	100.14%	100.14%	0
96K jpg	L9	91.4 MB/s	96.0 MB/s	-4.7%	100.14%	100.14%	0
96K jpg	L10	74.4 MB/s	75.4 MB/s	-1.4%	100.14%	100.14%	0
96K jpg	L11	75.8 MB/s	76.3 MB/s	-0.6%	100.14%	100.14%	0
96K jpg	L12	75.5 MB/s	76.2 MB/s	-0.9%	100.14%	100.14%	0

Frame Compression (Safe)

Size	Level	Throughput (Rust)	Throughput (C)	Throughput Diff	Ratio (Rust)	Ratio (C)	Ratio Diff
725	L1	680.1 MB/s	205.8 MB/s	+230.5%	80.28%	78.90%	+1.38%
725	L2	117.5 MB/s	205.6 MB/s	-42.8%	78.21%	78.90%	-0.69%
725	L3	180.4 MB/s	135.0 MB/s	+33.6%	76.83%	76.83%	0
725	L4	181.2 MB/s	134.9 MB/s	+34.3%	76.83%	76.83%	0
725	L5	181.4 MB/s	134.7 MB/s	+34.7%	76.83%	76.83%	0
725	L6	180.5 MB/s	134.5 MB/s	+34.2%	76.83%	76.83%	0
725	L7	180.7 MB/s	134.5 MB/s	+34.3%	76.83%	76.83%	0
725	L8	181.3 MB/s	135.3 MB/s	+34.0%	76.83%	76.83%	0
725	L9	180.7 MB/s	135.2 MB/s	+33.6%	76.83%	76.83%	0
725	L10	107.2 MB/s	94.5 MB/s	+13.5%	76.69%	76.69%	0
725	L11	107.2 MB/s	94.1 MB/s	+14.0%	76.69%	76.69%	0
725	L12	92.1 MB/s	83.5 MB/s	+10.2%	76.69%	76.69%	0
10MB	L1	558.3 MB/s	429.3 MB/s	+30.1%	64.21%	64.18%	+0.03%
10MB	L2	276.9 MB/s	427.5 MB/s	-35.2%	51.99%	64.18%	-12.18%
10MB	L3	105.3 MB/s	106.0 MB/s	-0.7%	48.60%	48.53%	+0.07%
10MB	L4	77.8 MB/s	79.0 MB/s	-1.5%	47.13%	47.05%	+0.08%
10MB	L5	59.3 MB/s	59.2 MB/s	+0.2%	46.35%	46.22%	+0.13%
10MB	L6	48.2 MB/s	44.6 MB/s	+8.2%	46.02%	45.82%	+0.20%
10MB	L7	43.1 MB/s	37.9 MB/s	+13.8%	45.92%	45.66%	+0.26%
10MB	L8	41.5 MB/s	34.0 MB/s	+21.9%	45.90%	45.60%	+0.30%
10MB	L9	41.4 MB/s	31.0 MB/s	+33.5%	45.90%	45.57%	+0.33%
10MB	L10	25.6 MB/s	25.1 MB/s	+2.1%	45.18%	45.18%	0
10MB	L11	21.2 MB/s	20.9 MB/s	+1.6%	45.09%	45.09%	0
10MB	L12	19.7 MB/s	19.6 MB/s	+0.2%	45.08%	45.08%	0
hdfs 8.4MB	L1	2255 MB/s	1916 MB/s	+17.7%	9.37%	9.35%	+0.02%
hdfs 8.4MB	L2	1812 MB/s	1918 MB/s	-5.5%	8.93%	9.35%	-0.43%
hdfs 8.4MB	L3	648.6 MB/s	603.5 MB/s	+7.5%	7.25%	7.20%	+0.05%
hdfs 8.4MB	L4	530.1 MB/s	484.0 MB/s	+9.5%	7.20%	7.15%	+0.05%
hdfs 8.4MB	L5	442.3 MB/s	385.0 MB/s	+14.9%	7.17%	7.12%	+0.05%
hdfs 8.4MB	L6	377.5 MB/s	286.1 MB/s	+32.0%	7.15%	7.09%	+0.06%
hdfs 8.4MB	L7	339.8 MB/s	208.6 MB/s	+62.9%	7.14%	7.07%	+0.08%
hdfs 8.4MB	L8	338.9 MB/s	187.5 MB/s	+80.7%	7.14%	7.07%	+0.07%
hdfs 8.4MB	L9	336.7 MB/s	174.6 MB/s	+92.8%	7.14%	7.07%	+0.08%
hdfs 8.4MB	L10	157.7 MB/s	161.5 MB/s	-2.4%	7.21%	7.21%	0
hdfs 8.4MB	L11	97.0 MB/s	98.0 MB/s	-1.0%	7.03%	7.03%	0
hdfs 8.4MB	L12	56.5 MB/s	58.7 MB/s	-3.8%	7.01%	7.01%	0
reymont 6.6MB	L1	521.3 MB/s	504.0 MB/s	+3.4%	48.59%	48.58%	+0.01%
reymont 6.6MB	L2	365.5 MB/s	504.8 MB/s	-27.6%	40.64%	48.58%	-7.94%
reymont 6.6MB	L3	138.2 MB/s	136.9 MB/s	+1.0%	37.63%	37.55%	+0.07%
reymont 6.6MB	L4	102.1 MB/s	101.4 MB/s	+0.7%	35.83%	35.74%	+0.09%
reymont 6.6MB	L5	75.5 MB/s	74.1 MB/s	+1.8%	34.73%	34.57%	+0.15%
reymont 6.6MB	L6	57.5 MB/s	52.5 MB/s	+9.4%	34.15%	33.88%	+0.27%
reymont 6.6MB	L7	46.9 MB/s	38.6 MB/s	+21.3%	33.90%	33.45%	+0.46%
reymont 6.6MB	L8	42.9 MB/s	28.9 MB/s	+48.6%	33.84%	33.20%	+0.63%
reymont 6.6MB	L9	42.2 MB/s	22.8 MB/s	+84.8%	33.83%	33.08%	+0.75%
reymont 6.6MB	L10	21.0 MB/s	20.1 MB/s	+4.5%	32.74%	32.74%	0
reymont 6.6MB	L11	11.8 MB/s	11.6 MB/s	+1.9%	32.40%	32.40%	0
reymont 6.6MB	L12	14.6 MB/s	14.6 MB/s	+0.2%	32.39%	32.39%	0
xml 5.3MB	L1	1147 MB/s	1004 MB/s	+14.2%	23.52%	23.53%	-0.00%
xml 5.3MB	L2	738.5 MB/s	1009 MB/s	-26.8%	20.16%	23.53%	-3.37%
xml 5.3MB	L3	296.8 MB/s	298.4 MB/s	-0.5%	16.85%	16.74%	+0.11%
xml 5.3MB	L4	236.8 MB/s	237.6 MB/s	-0.3%	16.17%	16.08%	+0.09%
xml 5.3MB	L5	191.6 MB/s	189.1 MB/s	+1.3%	15.80%	15.69%	+0.11%
xml 5.3MB	L6	161.2 MB/s	149.7 MB/s	+7.7%	15.62%	15.48%	+0.14%
xml 5.3MB	L7	141.3 MB/s	123.6 MB/s	+14.3%	15.54%	15.38%	+0.17%
xml 5.3MB	L8	130.3 MB/s	104.5 MB/s	+24.7%	15.51%	15.32%	+0.19%
xml 5.3MB	L9	124.4 MB/s	92.8 MB/s	+34.0%	15.50%	15.31%	+0.19%
xml 5.3MB	L10	71.5 MB/s	69.8 MB/s	+2.5%	15.29%	15.29%	0
xml 5.3MB	L11	41.9 MB/s	41.3 MB/s	+1.6%	15.16%	15.16%	0
xml 5.3MB	L12	38.6 MB/s	39.3 MB/s	-1.8%	15.13%	15.13%	0

close #21
close #165

PSeitz · 2026-03-15T19:03:33Z

Thanks for the PR! I left some comments.

There seem to be many unnecessary unsafe blocks and the main compression loop is quite unidiomatic (too repetitive and nested).

Implement LZ4 High Compression algorithm (levels 3-9) using hash chain approach for better compression ratios at the cost of compression speed. Features:
- HashTableHCU32 with configurable search depth based on level
- Match finding with forward and backward extension
- Proper distance validation to ensure offsets fit in 16-bit format
- Fuzz testing targets for HC compression

Implement optimal parsing algorithm (levels 10-12) for maximum compression ratio. Uses dynamic programming to find the optimal sequence of literals and matches. Also refactors encode_sequence into a reusable function and makes handle_last_literals pub(crate) for use by HC algorithms.

Implement lz4mid algorithm matching C LZ4HC behavior for compression levels 0-2. This provides better compression than the fast algorithm while being faster than HC.

Compression level routing in compress_hc (matching C k_clTable):
- Levels 0-2: lz4mid (two hash tables: 4-byte and 8-byte)
- Levels 3-9: lz4hc (hash chain algorithm)
- Levels 10-12: lz4opt (optimal parsing)

Also exports backtrack_match and count_same_bytes as pub(crate) for reuse by lz4mid.
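The level routing described in this commit can be sketched as a single match. This is a minimal illustration only: the `Strategy` enum and `strategy_for_level` function are hypothetical names, not the identifiers used in the PR, and mapping levels above 12 to the optimal parser is an assumption.

```rust
/// Hypothetical strategy selector; names are illustrative, not the PR's API.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Strategy {
    /// lz4mid: two hash tables (4-byte and 8-byte).
    Mid,
    /// lz4hc: hash chain search.
    HashChain,
    /// lz4opt: optimal parsing.
    Optimal,
}

/// Maps a block compression level to a strategy, following the ranges
/// listed in the commit message (0-2, 3-9, 10-12).
pub fn strategy_for_level(level: u8) -> Strategy {
    match level {
        0..=2 => Strategy::Mid,
        3..=9 => Strategy::HashChain,
        // Assumption: anything above 12 is treated like level 12.
        _ => Strategy::Optimal,
    }
}
```

Keeping the mapping in one place means the block and frame entry points can share it instead of repeating the range checks.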

Add FrameEncoder::with_compression_level() constructor (requires `hc` feature) that allows selecting compression algorithm via level parameter (matching C LZ4 CLI):
- Level 1: Fast algorithm (supports linked blocks)
- Level 2: lz4mid intermediate algorithm (independent blocks)
- Levels 3-9: HC hash chain algorithm (independent blocks)
- Levels 10-12: Optimal parsing algorithm (independent blocks)

Levels 2+ automatically force BlockMode::Independent since HC/mid compression doesn't support linked blocks.

Add command-line options for the lz4 binary:
- `-l/--level`: Compression level 1-12 (requires `hc` feature)
  - Level 1: fast algorithm
  - Level 2: lz4mid
  - Levels 3-9: HC hash chain
  - Levels 10-12: optimal parsing
- `-B/--block-size`: Block size (4=64KB, 5=256KB, 6=1MB, 7=4MB)

The level option is only available when compiled with `hc` feature (enabled by default). Block size defaults to 4MB to match C lz4 CLI.
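The four `-B` values follow a simple pattern: each step multiplies the block size by four, starting at 64KB for ID 4. A small sketch of that mapping is shown here; `block_size_from_id` is a hypothetical helper written for illustration, not a function from the PR or from lz4_flex.

```rust
/// Hypothetical helper: converts a `-B` block size ID (4..=7) to bytes.
/// IDs outside that range are rejected with `None`.
pub fn block_size_from_id(id: u8) -> Option<usize> {
    match id {
        // 4 -> 1 << 16 (64KB), 5 -> 1 << 18 (256KB),
        // 6 -> 1 << 20 (1MB),  7 -> 1 << 22 (4MB)
        4..=7 => Some(1usize << (8 + 2 * id as u32)),
        _ => None,
    }
}
```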

Reduce memory allocation for small inputs by sizing the chain table proportionally to input length instead of always allocating 128KB. For a 725B input, chain table is now 1KB instead of 128KB (-49% total).

When safe-encode feature is disabled, use get_unchecked/get_unchecked_mut for dict and chain table accesses to eliminate bounds checking overhead. Provides ~17% speedup in HC compression hot paths.

Add build and test coverage for the HC (high compression) feature:
- Build tests for no_std with hc and hc+safe-encode
- Unit tests with hc, hc+frame, hc+safe-encode combinations
- Fuzz tests for HC in unsafe mode (fuzz_roundtrip_hc, fuzz_roundtrip_hc_cpp, fuzz_roundtrip_frame)

- Match struct: start/len/ref_pos from usize to u32 (24 → 12 bytes), halves stack pressure in HC's 4-Match juggling loop
- OptimalState: use i32 for all fields matching C's LZ4HC_optimal_t layout (u16 off/mlen caused 15-20% regression from widening conversions)
- find_longer_match: return (u32, u16), params narrowed to u32
- count_same_bytes: accept explicit match_limit parameter, use pointer-based loop matching C's LZ4_count for ~10% speedup
- HC common_bytes: thin wrapper delegating to shared count_same_bytes
- Pre-check uses single u16 read instead of two byte comparisons
- Safe variant uses chunks_exact + zip iterators
- Added #[inline] hints and sufficient_len cap matching C's behavior
- Add thread-local cached state for HC
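To illustrate the "chunks_exact + zip" idea mentioned for the safe variant, here is a minimal sketch of counting the common prefix of two slices eight bytes at a time. It is not the PR's implementation: the signature is simplified (no match_limit parameter), and the 8-byte chunk width is an assumption.

```rust
/// Sketch of a safe common-prefix counter (simplified, hypothetical signature).
/// Compares 8 bytes per step and uses XOR to locate the first mismatch.
pub fn count_same_bytes(a: &[u8], b: &[u8]) -> usize {
    const CMP_SIZE: usize = 8;
    let limit = a.len().min(b.len());
    let (a, b) = (&a[..limit], &b[..limit]);

    let mut matched = 0;
    for (ca, cb) in a.chunks_exact(CMP_SIZE).zip(b.chunks_exact(CMP_SIZE)) {
        let mut wa = [0u8; CMP_SIZE];
        let mut wb = [0u8; CMP_SIZE];
        wa.copy_from_slice(ca);
        wb.copy_from_slice(cb);
        let diff = u64::from_le_bytes(wa) ^ u64::from_le_bytes(wb);
        if diff != 0 {
            // Little-endian load: the lowest set bit belongs to the first
            // differing byte, so its bit index / 8 is the byte offset.
            return matched + (diff.trailing_zeros() / 8) as usize;
        }
        matched += CMP_SIZE;
    }

    // Fewer than 8 bytes remain: finish byte by byte.
    matched
        + a[matched..]
            .iter()
            .zip(b[matched..].iter())
            .take_while(|(x, y)| x == y)
            .count()
}
```

Because every access goes through slice iterators, the compiler can drop most bounds checks without any unsafe code, which is presumably the motivation for this shape in the safe path.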

- Add doc comments to HashTableHCU32 and Match struct fields.
- Replace unsafe write_bytes with safe .fill(0) in HashTableHCU32::reset.
- Rename pattern32 to pattern in count/reverse_count_pattern.

yujincheng08 · 2026-03-16T03:24:21Z

Hi @PSeitz, all concerns have been addressed. The assert_unchecked calls are necessary for performance because they eliminate bounds checks. Here is the assembly evidence (aarch64, rustc -O) for two functions that follow the codebase's pattern: chain_delta (power-of-2 masked index) and get_dict (hash-bounded index):

chain_delta — index via pos & (len - 1)
With assert_unchecked (3 instructions, no branch):

    sub  x8, x1, #1          ; mask = len - 1
    and  x8, x2, x8          ; idx = pos & mask
    ldrh w0, [x0, x8, lsl #1] ; return chain_table[idx]
    ret

Without (branch + full panic path):

    sub  x9, x1, #1          ; mask = len - 1
    and  x0, x2, x9          ; idx = pos & mask
    cbz  x1, LBB6_2          ; if len == 0, panic  ← BOUNDS CHECK
    ldrh w0, [x8, x0, lsl #1]
    ret
LBB6_2:
    ; ... 6 more instructions setting up panic_bounds_check call ...
    bl   panic_bounds_check

Note: LLVM is smart enough to realize idx = pos & (len-1) can only be out-of-bounds when len == 0 (the mask wraps), so the check collapses to cbz x1. But it still can't prove len != 0 from the type alone, so the branch + panic cold path remains. assert_unchecked removes both entirely.

get_dict — direct index by hash
With assert_unchecked (1 instruction, no branch):

    ldr  w0, [x0, x2, lsl #2] ; return dict[hash]
    ret

Without (compare + conditional branch + panic path):

    cmp  x2, x1              ; hash >= len?  ← BOUNDS CHECK
    b.hs LBB8_2              ; if so, panic
    ldr  w0, [x0, x2, lsl #2]
    ret
LBB8_2:
    ; ... 7 more instructions setting up panic_bounds_check call ...
    bl   panic_bounds_check

Note that functions using assert_unchecked (e.g., get_dict, chain_delta, next, add_hash, set_chain) are called millions of times per compression; they are the innermost hash-chain traversal operations. Each bounds check adds:

- A conditional branch (cbz/b.hs) on the hot path. Even if perfectly predicted, it costs a slot in the branch predictor and prevents certain instruction reorderings.
- Cold panic code that bloats the function, hurting I-cache and preventing inlining at call sites (LLVM's inlining heuristic weighs instruction count).

The assert_unchecked calls are all gated behind #[cfg(not(feature = "safe-encode"))], so users who prefer safety over performance can opt in to full bounds checking with the safe-encode feature.
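For readers who want to see the pattern in source form, the following is a minimal sketch of a masked chain-table lookup that uses `std::hint::assert_unchecked` (stable since Rust 1.81) to tell the compiler the table is non-empty. The `ChainTable` type and its methods are hypothetical and written for this illustration; the PR's actual accessors and their cfg gating are not reproduced here.

```rust
use std::hint::assert_unchecked;

/// Hypothetical chain table whose length is always a non-zero power of two.
pub struct ChainTable {
    table: Vec<u16>,
}

impl ChainTable {
    /// Panics unless `len` is a power of two (which also excludes zero).
    pub fn new(len: usize) -> Self {
        assert!(len.is_power_of_two());
        Self { table: vec![0; len] }
    }

    pub fn set(&mut self, pos: usize, delta: u16) {
        let mask = self.table.len() - 1;
        self.table[pos & mask] = delta;
    }

    pub fn delta(&self, pos: usize) -> u16 {
        // SAFETY: `new` is the only constructor and rejects len == 0, and the
        // private vector is never resized, so the table is never empty.
        unsafe { assert_unchecked(!self.table.is_empty()) };
        // `pos & (len - 1)` is at most `len - 1`, so with len != 0 known,
        // the compiler has what it needs to elide the bounds check.
        self.table[pos & (self.table.len() - 1)]
    }
}
```

The invariant is enforced once at construction, so the unchecked hint stays sound without making `delta` an unsafe function. Whether the check is actually removed depends on the optimizer and should be confirmed by inspecting the generated assembly, as done in the comment above.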

- Block: compress_hc_to_vec levels 1-12
- Frame: all levels x Independent/Linked; add lz4_flex_frame_compress_with_level
- C interop: decompress lz4_flex HC frames at all levels

The pointer-based rewrite of count_same_bytes regressed fast block compression ~3–4% on benchmarks; behavior is unchanged: same match_limit bounds, origin/main-style *cur/input_end loop for unsafe path. HC callers still share this helper with identical semantics.

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

- Use #[inline] on HC helpers instead of #[inline(always)] (HC is new vs origin/main; fast block still matches main on copy_literals_wild).
- Clarify count_pattern / reverse_count_pattern: one repeated byte is passed as a u32 with four copies (e.g. 0xAB -> 0xABABABAB) for XOR batch scans.
- Rename pre_check_ok to tail_matches_past_best in find_longer_match.
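The "four copies in a u32" convention can be made concrete with a short sketch. Both functions below are hypothetical illustrations of the idea (splat one byte, then scan four bytes per step with XOR); they are not the PR's count_pattern and assume a single repeated byte.

```rust
/// Replicates one byte into all four lanes of a u32, e.g. 0xAB -> 0xABABABAB.
pub fn splat_byte(byte: u8) -> u32 {
    u32::from(byte) * 0x0101_0101
}

/// Counts how many leading bytes of `data` equal the splatted pattern byte.
/// Illustrative only; assumes `pattern` came from `splat_byte`.
pub fn count_pattern_prefix(data: &[u8], pattern: u32) -> usize {
    let mut matched = 0;
    for chunk in data.chunks_exact(4) {
        let mut word = [0u8; 4];
        word.copy_from_slice(chunk);
        let diff = u32::from_le_bytes(word) ^ pattern;
        if diff != 0 {
            // The first differing byte is the one holding the lowest set bit.
            return matched + (diff.trailing_zeros() / 8) as usize;
        }
        matched += 4;
    }
    // Handle the final 0..=3 bytes one at a time.
    let byte = pattern as u8;
    matched + data[matched..].iter().take_while(|&&b| b == byte).count()
}
```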

Introduce PatternChainAction and HashTableHCU32::pattern_chain_action to replace deeply nested pattern/repeat detection with early returns and a small match at the call site.

- HcLevelParams / HcCompressionStrategy; frame encoder linked-block path
- Expand optimal DP locals (optimal_states, opt_window_index, match length/offset names)
- MIN_BYTES_FROM_CURSOR_TO_BLOCK_END alias for block tail room (MFLIMIT)

yujincheng08 · 2026-03-29T05:21:14Z

@PSeitz hi, thanks for your review. Are there any further concerns blocking this PR from merging?

Squashed from PR #209 (yujincheng08/lz4_flex#hc). Adds high-compression block and frame compression with multiple compression levels (L1-L12), including HC, MID, and OPT strategies. Closes #21, closes #165

PSeitz · 2026-03-30T13:51:36Z

> @PSeitz hi, thanks for your review. Are there any further concerns blocking this PR from merging?

Hi, thanks for the PR. I moved it to PR #216, since it's easier for me to make the changes directly. Hope that's ok for you.


yujincheng08 mentioned this pull request Mar 12, 2026

Add LZ4HC & LZ4OPT & LZ4MID support #191

Closed

PSeitz reviewed Mar 15, 2026

Comment thread fuzz/fuzz_targets/fuzz_roundtrip_frame.rs Outdated

Comment thread src/block/compress_hc.rs Outdated (13 threads)

yujincheng08 added 9 commits March 16, 2026 10:50

Use dynamic chain table size based on input length

44eedb2


Add unsafe hash table accessors for HC compression

b5b0379


yujincheng08 force-pushed the hc branch from e3e5138 to 2dba6a0 Compare March 16, 2026 02:51

yujincheng08 added 2 commits March 16, 2026 10:57

Split for fuzz test of hc algorithms

1277811

Improve HC compress documentation and safety

49c7b88


yujincheng08 force-pushed the hc branch from 21e5ee0 to 49c7b88 Compare March 16, 2026 03:10

yujincheng08 added 2 commits March 19, 2026 04:05

Expand HC linked block tests to cover all compression levels 1-12

3bf30c7

Fix format

7e3b9fd

PSeitz reviewed Mar 20, 2026

Comment thread src/block/compress.rs

Comment thread tests/tests.rs

test: roundtrip all HC levels in test_roundtrip

7c3339a


yujincheng08 force-pushed the hc branch from 25e8bc7 to 7c3339a Compare March 20, 2026 02:09

PSeitz reviewed Mar 20, 2026

Comment thread src/block/compress.rs Outdated

yujincheng08 force-pushed the hc branch from 730ecd5 to 19b2a8a Compare March 20, 2026 08:38

Change count_same_bytes's STEP to CMP_SIZE

8dc753d

Co-authored-by: PSeitz <PSeitz@users.noreply.github.com>

yujincheng08 force-pushed the hc branch from 19b2a8a to 8dc753d Compare March 20, 2026 08:39

PSeitz reviewed Mar 22, 2026

Comment thread src/block/compress_hc.rs Outdated (4 threads)

yujincheng08 added 2 commits March 22, 2026 21:15

extract pattern-chain logic from find_longer_match

f270258


PSeitz reviewed Mar 22, 2026

Comment thread src/block/compress_hc.rs Outdated (5 threads)

yujincheng08 force-pushed the hc branch from 1acdb0e to f3ab3ca Compare March 23, 2026 03:52

PSeitz mentioned this pull request Mar 30, 2026

Add LZ4HC & LZ4OPT & LZ4MID support #216

Open

yujincheng08 closed this Mar 31, 2026
