Add LZ4HC & LZ4OPT & LZ4MID support by PSeitz · Pull Request #216 · PSeitz/lz4_flex

PSeitz · 2026-03-30T13:47:18Z

Based on the PR from @yujincheng08 #209 with some changes on top

closes #21
closes #165

yujincheng08 · 2026-04-09T03:57:50Z

Hi, I can see the compression ratio of lz4mid becoming worse on dickens.txt:

on commit 15b8568: 5,086,276

on commit 55b87d7: 5,277,283
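
To put the two sizes in proportion, here is a minimal sketch of the relative change. The helper name `size_change_percent` is hypothetical and not part of lz4_flex; the only inputs are the two byte counts reported above. The result is roughly 3.8% more output on commit 55b87d7.

```rust
/// Hypothetical helper: relative change in compressed size, in percent.
/// A positive value means the output grew (worse compression).
fn size_change_percent(before: u64, after: u64) -> f64 {
    (after as f64 - before as f64) / before as f64 * 100.0
}
```

For example, `size_change_percent(5_086_276, 5_277_283)` compares the two dickens.txt results directly.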

PSeitz · 2026-04-09T04:11:33Z

> Hi, I can see the compression ratio of lz4mid becoming worse on dickens.txt:
>
> on commit 15b8568: 5,086,276
>
> on commit 55b87d7: 5,277,283

I think this came from a hash table size change, which I have reverted.

yujincheng08 · 2026-04-09T06:29:45Z

I found that pre-hashing hurts the compression ratio. The following patch can improve the compression ratio for lz4hc:

Details

diff --git a/src/block/compress_hc/hash_chain.rs b/src/block/compress_hc/hash_chain.rs
index a84230f..865312c 100644
--- a/src/block/compress_hc/hash_chain.rs
+++ b/src/block/compress_hc/hash_chain.rs
@@ -487,19 +487,6 @@ impl HashTableHCU32 {
         self.dictionary[hash] as usize
     }
 
-    /// Set dictionary slot at hash index.
-    #[inline]
-    fn set_dictionary_at(&mut self, hash: usize, pos: usize) {
-        self.dictionary[hash] = pos as u32;
-    }
-
-    /// Set chain value at position
-    #[inline]
-    fn set_chain(&mut self, pos: usize, delta: u16) {
-        let chain_index = pos & self.chain_mask();
-        self.chain_table[chain_index] = delta;
-    }
-
     /// Insert hashes for all positions up to the given local offset.
     /// Positions stored in the hash table are absolute (`local_pos + stream_offset`).
     #[inline]
@@ -739,39 +726,6 @@ pub(super) fn find_longer_hash_chain_match(
     }
 }
 
-/// Update the hash chain for the first match found at `cur`.
-///
-/// This mirrors the pre-hash step from the LZ4 HC reference: positions inside the
-/// first accepted match are inserted eagerly so later searches can skip ahead.
-fn prehash_first_match(
-    hash_table: &mut HashTableHCU32,
-    input: &[u8],
-    cur_absolute: usize,
-    stream_offset: usize,
-    first_match_length: usize,
-    delta: usize,
-) {
-    let mut hash_pos = cur_absolute;
-    let end_pos = cur_absolute + first_match_length - 3;
-
-    while hash_pos < end_pos - delta {
-        hash_table.set_chain(hash_pos, delta as u16);
-        hash_pos += 1;
-    }
-
-    loop {
-        hash_table.set_chain(hash_pos, delta as u16);
-        let local_hash_pos = hash_pos - stream_offset;
-        hash_table.set_dictionary_at(get_hash_at(input, local_hash_pos), hash_pos);
-        hash_pos += 1;
-        if hash_pos >= end_pos {
-            break;
-        }
-    }
-
-    hash_table.next_to_update = end_pos;
-}
-
 /// Insert `cur` into the hash/chain tables, then search the chain for the
 /// longest match starting at `cur`.
 ///
@@ -795,8 +749,6 @@ fn find_best_hash_chain_match(
         match_length: 0,
         candidate: 0,
     };
-    let mut first_match_delta = 0usize;
-    let mut first_match_length = 0usize;
 
     let cur_absolute = cur + stream_offset;
     let ext_dict_stream_offset = stream_offset - ext_dict.len();
@@ -805,7 +757,7 @@ fn find_best_hash_chain_match(
 
     let mut candidate = hash_table.get_dictionary_at(get_hash_at(input, cur));
 
-    for attempt in 0..hash_table.max_attempts {
+    for _ in 0..hash_table.max_attempts {
         if !hash_table.in_range(candidate, cur_absolute) {
             break;
         }
@@ -836,28 +788,12 @@ fn find_best_hash_chain_match(
             best_match.match_length = match_length as u32;
         }
 
-        if attempt == 0 && match_length > 0 {
-            first_match_length = match_length;
-            first_match_delta = cur_absolute - candidate;
-        }
-
         let Some(next_candidate) = hash_table.advance(candidate, cur_absolute) else {
             break;
         };
         candidate = next_candidate;
     }
 
-    if first_match_length != 0 {
-        prehash_first_match(
-            hash_table,
-            input,
-            cur_absolute,
-            stream_offset,
-            first_match_length,
-            first_match_delta,
-        );
-    }
-
     if best_match.match_length == 0 {
         None
     } else {
@@ -1165,13 +1101,14 @@ pub(super) fn compress_hash_chain_internal(
     // Do not extend matches into the last `LAST_LITERALS` bytes (they are literals).
     let match_limit = input_end - LAST_LITERALS;
 
-    let mut cur = input_pos + 1;
+    // Match C's LZ4HC main loop: start at block start and scan through `mflimit` inclusive.
+    let mut cur = input_pos;
     let mut literal_start = input_pos;
     let mut previous_match;
     let mut current_match;
     let mut next_match;
 
-    while cur < end_pos_check {
+    while cur <= end_pos_check {
         let Some(found_match) = find_best_hash_chain_match(
             hash_table,
             input,
diff --git a/src/block/compress_hc/tests.rs b/src/block/compress_hc/tests.rs
index 07f8a06..9bd9fc6 100644
--- a/src/block/compress_hc/tests.rs
+++ b/src/block/compress_hc/tests.rs
@@ -257,8 +257,8 @@ fn test_compressed_sizes_exact() {
     // (level, html_like, json_like, code_like)
     let expected: &[(u8, usize, usize, usize)] = &[
         (1, 16_350, 16_183, 6_246),
-        (4, 15_620, 16_198, 6_071),
-        (9, 15_509, 15_698, 5_985),
+        (4, 15_642, 16_363, 6_081),
+        (9, 15_111, 15_482, 5_984),
         (10, 15_153, 15_102, 5_990),
         (12, 15_100, 15_083, 5_979),
     ];
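
For readers unfamiliar with the structure the patch touches, the following is a minimal sketch of a hash chain with a `dictionary` (hash to most recent position) and a `chain_table` of 16-bit deltas, as the field names in the diff suggest. Everything else here is an assumption made for illustration: the type name `HashChainSketch`, the hash function, the table sizes, the `EMPTY` sentinel, and the `candidates` method are not taken from lz4_flex. It shows plain per-position insertion, which is the path the patch keeps; the removed `prehash_first_match` step wrote chain deltas for positions inside the first match directly instead.

```rust
const HASH_LOG: u32 = 15; // assumed table size: 2^15 slots
const CHAIN_MASK: usize = (1 << 16) - 1; // assumed chain table of 2^16 entries
const MAX_DISTANCE: usize = 65_535; // LZ4 match offsets are 16-bit
const MINMATCH: usize = 4;
const EMPTY: u32 = u32::MAX; // assumed "no entry" sentinel

struct HashChainSketch {
    dictionary: Vec<u32>,  // hash -> most recently inserted position
    chain_table: Vec<u16>, // position & mask -> distance to the previous position with the same hash
}

/// Assumed multiplicative hash over 4 bytes.
fn hash4(input: &[u8], pos: usize) -> usize {
    let v = u32::from_le_bytes([input[pos], input[pos + 1], input[pos + 2], input[pos + 3]]);
    (v.wrapping_mul(2_654_435_761) >> (32 - HASH_LOG)) as usize
}

impl HashChainSketch {
    fn new() -> Self {
        Self {
            dictionary: vec![EMPTY; 1 << HASH_LOG],
            chain_table: vec![0; CHAIN_MASK + 1],
        }
    }

    /// Link `pos` to the previous position with the same hash, then make it the head.
    fn insert(&mut self, input: &[u8], pos: usize) {
        let hash = hash4(input, pos);
        let prev = self.dictionary[hash];
        // Delta 0 marks the end of a chain; longer distances are clamped to u16::MAX.
        let delta = if prev == EMPTY { 0 } else { (pos - prev as usize).min(u16::MAX as usize) };
        self.chain_table[pos & CHAIN_MASK] = delta as u16;
        self.dictionary[hash] = pos as u32;
    }

    /// Walk the chain from newest to oldest and return verified candidates for `cur`.
    fn candidates(&self, input: &[u8], cur: usize, max_attempts: usize) -> Vec<usize> {
        let mut found = Vec::new();
        let head = self.dictionary[hash4(input, cur)];
        if head == EMPTY {
            return found;
        }
        let mut candidate = head as usize;
        for _ in 0..max_attempts {
            if candidate >= cur || cur - candidate > MAX_DISTANCE {
                break; // outside the match window
            }
            // Hash collisions are possible, so compare the actual bytes.
            if input[candidate..candidate + MINMATCH] == input[cur..cur + MINMATCH] {
                found.push(candidate);
            }
            let delta = self.chain_table[candidate & CHAIN_MASK] as usize;
            if delta == 0 || delta > candidate {
                break; // end of chain
            }
            candidate -= delta;
        }
        found
    }
}
```

With the input `b"abcdXabcdYabcdZ"` and positions 0 through 9 inserted, searching at position 10 visits the earlier occurrences of `abcd` at positions 5 and 0, newest first. Every inserted position stays reachable, which is the property the comment above argues prehashing can break.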

- cursor_pos → cur, literal_anchor_pos → literal_start
- candidate_absolute_position → candidate, absolute_byte_offset → cur_absolute
- reference_local_position → candidate_local
- external_dictionary → ext_dict, external_dictionary_stream_offset → ext_dict_stream_offset
- match_extension_end_pos → match_limit, max_main_cursor_pos → end_pos_check
- MIN_MATCH → MINMATCH (use shared constant from mod.rs)
- MIN_BYTES_FROM_CURSOR_TO_BLOCK_END → MFLIMIT (use shared constant)
- Shorten verbose names in opt parser and mid compressor
- Ignore 10MB tests for faster iteration

…loops Replace the hard-to-follow break true/break false pattern in the match0/match1/match2/match3 lazy evaluation with labeled loops ('lazy and 'resolve), making control flow explicit and adding comments explaining each branch.

yujincheng08 · 2026-05-08T10:10:07Z

Any further concerns? Also, please take a look at #216 (comment). Thanks.

Copilot

Pull request overview

This PR adds multi-level compression support (LZ4MID / LZ4HC / LZ4OPT-style behavior) to lz4_flex, integrating it into both block compression APIs and the frame encoder, along with CLI, tests/fuzzing, and benchmarking/docs updates.

Changes:

- Introduces a new `block::compress_hc*` API family with level-based strategy selection (two-hash-tables, hash-chain HC, optimal parsing) and reusable compression tables.
- Extends `frame::FrameEncoder` with `with_compression_level(...)` and routes frame block compression through fast vs HC/optimal paths.
- Updates `lz4_bin`, benches, tests, fuzz targets, and adds documentation/perf notes for the new algorithms.

Reviewed changes

Copilot reviewed 23 out of 23 changed files in this pull request and generated 6 comments.

| File | Description |
| --- | --- |
| `tests/tests.rs` | Expands roundtrip/cross-compat tests to cover levels 1–12 and adds linked-mode HC tests; marks some large tests ignored. |
| `src/frame/compress.rs` | Adds level-aware frame encoding, switching between fast and HC/optimal compressors and maintaining reusable state. |
| `src/block/mod.rs` | Wires in the new `compress_hc` module and re-exports its public API; minor doc link tweak. |
| `src/block/compress.rs` | Refactors shared helpers (`count_same_bytes`, `handle_last_literals`, `encode_sequence`) for reuse by HC implementations. |
| `src/block/compress_hc/mod.rs` | New high-compression module: strategy selection, reusable tables, public APIs, and linked-block support hooks for frames. |
| `src/block/compress_hc/two_hashtables.rs` | New “mid” (two-hash-tables) compressor used for low levels. |
| `src/block/compress_hc/hash_chain.rs` | New hash-chain HC compressor implementation (levels ~3–9). |
| `src/block/compress_hc/optimal.rs` | New optimal parsing implementation (levels ~10–12). |
| `src/block/compress_hc/tests.rs` | Adds unit tests for the new HC/optimal compressors and clamping behavior. |
| `perf.md` | Adds performance investigation notes for the two-hash-tables strategy. |
| `lz4_Block_format.md` | Adds a copy of the upstream LZ4 block format description for reference. |
| `lz4_bin/src/main.rs` | Adds CLI flags for compression level and block size; uses the new frame encoder constructor. |
| `lz4_bin/Cargo.toml` | Enables the `frame` feature for `lz4_flex` in the CLI crate. |
| `fuzz/fuzz_targets/fuzz_roundtrip_hc.rs` | Adds block HC roundtrip fuzzing with varied levels. |
| `fuzz/fuzz_targets/fuzz_roundtrip_hc_cpp.rs` | Adds block HC fuzzing with C++ (lzzzz) decompression validation. |
| `fuzz/fuzz_targets/fuzz_roundtrip_hc_frame.rs` | Adds frame fuzzing across levels, block modes, block sizes, and checksum settings. |
| `fuzz/Cargo.toml` | Registers the new fuzz targets as binaries. |
| `docs/compress_hc_algorithms.md` | Adds high-level documentation describing the three HC strategies and how to read the implementation. |
| `CLAUDE.md` | Adds repository code style guidance. |
| `Cargo.toml` | Updates bench profile settings (adds `debug = true`). |
| `benches/binggan_bench.rs` | Extends benchmarking to cover block HC levels and compare against `lz4` reference where applicable. |
| `benches/bench.rs` | Removes an old commented-out benchmark file. |
| `.rgignore` | Adjusts ripgrep ignore rules to skip large benchmark corpora while keeping bench source searchable. |
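
The level-based strategy selection summarized above could look roughly like the sketch below. The names `Strategy`, `strategy_for_level`, `MIN_LEVEL`, and `MAX_LEVEL` are hypothetical, and the exact boundaries are assumptions: the review only says the two-hash-tables compressor serves "low levels", the hash chain covers levels ~3–9, and optimal parsing covers levels ~10–12. Clamping out-of-range levels into 1 to 12 is likewise assumed from the mention of "clamping behavior".

```rust
/// Hypothetical strategy enum; the real module may name or structure this differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Strategy {
    TwoHashTables, // "mid" compressor
    HashChain,     // HC compressor
    Optimal,       // optimal parsing
}

const MIN_LEVEL: u8 = 1; // assumed lower bound
const MAX_LEVEL: u8 = 12; // levels L1-L12 per the PR description

/// Map a requested level to a strategy, clamping out-of-range values (assumed behavior).
fn strategy_for_level(level: u8) -> Strategy {
    match level.clamp(MIN_LEVEL, MAX_LEVEL) {
        1..=2 => Strategy::TwoHashTables, // assumed boundary for "low levels"
        3..=9 => Strategy::HashChain,
        _ => Strategy::Optimal,
    }
}
```

Keeping the mapping in one function means the block API and the frame encoder can share a single definition of what each level means.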

Comments suppressed due to low confidence (1)

tests/tests.rs:741

This test is now unconditionally marked #[ignore], so it won’t run in CI and may stop catching regressions in block sizing/compression behavior. If it’s too slow for default CI, consider gating it behind an env var/feature (e.g. SLOW_TESTS) rather than ignoring it entirely.

    #[test]
    #[cfg_attr(miri, ignore)]
    #[ignore]
    fn block_size() {
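
One way to follow the suggestion above is to skip the slow test at runtime unless an environment variable is set. This is only a sketch: the variable name `SLOW_TESTS` comes from the review comment, while the helper names and the rule that an empty value or `0` means "disabled" are assumptions.

```rust
use std::env;

/// Hypothetical helper: interpret the raw value of SLOW_TESTS.
/// Assumption: unset, empty, or "0" disables slow tests; anything else enables them.
fn slow_tests_enabled_from(value: Option<&str>) -> bool {
    matches!(value, Some(v) if !v.is_empty() && v != "0")
}

/// Reads SLOW_TESTS from the process environment.
fn slow_tests_enabled() -> bool {
    slow_tests_enabled_from(env::var("SLOW_TESTS").ok().as_deref())
}

#[test]
fn block_size() {
    if !slow_tests_enabled() {
        eprintln!("skipping block_size: set SLOW_TESTS=1 to run it");
        return;
    }
    // The original, slow test body would run here.
}
```

Running `SLOW_TESTS=1 cargo test` then exercises the test, while a plain `cargo test` reports it as passed after the early return. The trade-off against `#[ignore]` is that a skipped run no longer shows up as "ignored" in the test summary.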

+    // The unsafe version copies blocks of 8bytes, and therefore may copy up to 7bytes more than
+    // needed. This is safe, because the last 12 bytes (MF_LIMIT) are handled in
+    // handle_last_literals.
+    copy_literals_wild(output, literal, 0, literal.len());
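
The overshoot described in that comment can be illustrated with a safe, bounds-checked sketch. This is not the crate's `copy_literals_wild`; the function `copy_wild_sketch` and its signature are hypothetical. It copies in fixed 8-byte steps, so it may read and write up to 7 bytes beyond the requested length, and it panics if the caller did not leave that much slack in both buffers.

```rust
/// Hypothetical safe illustration of a "wild" copy in 8-byte steps.
/// Returns the number of bytes actually written, which is `len` rounded up to a multiple of 8.
fn copy_wild_sketch(dst: &mut [u8], dst_pos: usize, src: &[u8], src_pos: usize, len: usize) -> usize {
    let mut copied = 0;
    while copied < len {
        let s = src_pos + copied;
        let d = dst_pos + copied;
        // Slice indexing panics if either buffer lacks the required slack.
        dst[d..d + 8].copy_from_slice(&src[s..s + 8]);
        copied += 8;
    }
    copied
}
```

Copying 5 bytes this way writes 8, so the 3 extra bytes land in space that a later write is expected to overwrite. That is why the comment ties its safety argument to the trailing bytes being handled separately.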


+    /// Reset the table for reuse.
+    ///
+    /// Stale entries are harmless: `resolve_candidate` bounds-checks every
+    /// position before use, so old entries just fail the match attempt.
+    /// Skipping the 128 KB memset is a large win for small inputs.
+    pub(super) fn reset(&mut self) {
+        // Intentionally not zeroed — see resolve_candidate.
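
The claim that stale entries are harmless rests on validating every table entry before use. A minimal sketch of such a check follows; the function name `validate_stale_entry`, its signature, and the constants are assumptions for illustration and do not reproduce the real `resolve_candidate`.

```rust
const MINMATCH: usize = 4;
const MAX_DISTANCE: usize = 65_535; // LZ4 match offsets are 16-bit

/// Hypothetical validation of a possibly stale table entry for the current position `cur`.
/// Returns the candidate position only if it is usable as a match start.
fn validate_stale_entry(input: &[u8], cur: usize, stored: u32) -> Option<usize> {
    let candidate = stored as usize;
    // A leftover entry from a previous input may point at or past `cur`, or too far back.
    if candidate >= cur || cur - candidate > MAX_DISTANCE {
        return None;
    }
    // Not enough bytes left at `cur` for a minimum-length match.
    if cur + MINMATCH > input.len() {
        return None;
    }
    // An in-range entry must still match on the actual bytes.
    if input[candidate..candidate + MINMATCH] != input[cur..cur + MINMATCH] {
        return None;
    }
    Some(candidate)
}
```

With checks like these, an entry left over from a previous input either fails the range test or fails the byte comparison, so it costs one rejected attempt rather than producing a wrong match.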


 #[test]
 #[cfg_attr(miri, ignore)]
+#[ignore]
 fn test_text_10mb() {


Prehashing the first match can reduce compression ratio by skipping useful later candidates. Remove that step and scan the HC main loop from the block start through mflimit inclusive.

Benchmark: cargo bench "block_compress AND flex"

Level 9 impact:
- 725: 19.548 MB/s (-0.43%), reuse 16.284 MB/s (-1.26%), output 542
- 34308: 62.703 MB/s (-10.59%), reuse 61.468 MB/s (-11.68%), output 15_901 (-0.48%)
- 64723: 54.728 MB/s (-14.94%), reuse 54.194 MB/s (-19.79%), output 29_088 (-0.52%)
- 66675: 87.811 MB/s (-14.86%), reuse 86.612 MB/s (-13.28%), output 12_249 (-0.25%)
- 9991663: 29.519 MB/s (-21.45%), reuse 29.523 MB/s (-21.64%), output 4_393_516 (-0.80%)

PSeitz · 2026-05-16T14:50:25Z

> I found that pre-hashing hurts the compression ratio. The following patch can improve the compression ratio for lz4hc:

Thanks, I applied the diff in 5739b5b.

PSeitz mentioned this pull request Mar 30, 2026

Add LZ4HC & LZ4OPT & LZ4MID support #209

Closed

PSeitz force-pushed the hc_pr branch from e37b0f6 to cbe16d6 Compare April 4, 2026 12:30

PSeitz force-pushed the hc_pr branch from 55b87d7 to aa06c52 Compare April 9, 2026 04:10

topjohnwu mentioned this pull request Apr 14, 2026

Use pure rust lz4_flex (with HC support) topjohnwu/Magisk#9663

Closed

yujincheng08 and others added 23 commits April 18, 2026 11:26

Add LZ4HC, LZ4OPT & LZ4MID compression support

ca4617f

Squashed from PR #209 (yujincheng08/lz4_flex#hc). Adds high-compression block and frame compression with multiple compression levels (L1-L12), including HC, MID, and OPT strategies. Closes #21, closes #165

Extract try_ext_dict_match helper to deduplicate ext_dict matching

41d819c

The same ext_dict candidate matching logic (boundary-crossing reads, min-match check, forward count) was duplicated across 3 search methods in HashTableHCU32. Extract into a shared helper function.

Flatten nesting in HashTableHCU32 search methods

63804a9

- Extract in_range() and advance() helpers to replace repeated chain-validity checks
- Use early continue/break to reduce indentation depth
- Shorten local variable names in insert_and_find_wider_match

refactor: extract inner fns from compress_mid_internal to methods/fun…

1632b4e

…ctions Move add_hash4/add_hash8 to methods on HashTableMid and resolve_candidate to a standalone resolve_mid_candidate function, eliminating nested fn definitions and reducing parameter passing.

refactor: tighten visibility and remove debug test in compress_hc

04da7ba

Remove test_lz4mid_debug (println-based debug test). Make Match, HashTableHCU32, and HashTableMid private since they're only used within compress_hc.rs. Restrict HcLevelParams fields to pub(crate).

refactor

aeba8ad

clippy

c55e28e

add comment

c56539a

cleanup, refactor

94f1afa

renames

ba15c51

refactor

dbde976

align variable naming

e11ea04

refactor

449c4b8

replace loop, add benchmark

a1bee2e

flatten branches

6a61ee4

align naming conventions

f25e109

align naming conventions

3f975c5

Rename reference_position to candidate in Match struct

b3d2ca5

Align compress_hc.rs naming with compress.rs conventions where the earlier match position is consistently called 'candidate'.

add exact test

8c92728

cleanup bench

6188e6c

Rename hash-chain compressor and bump binggan

4fec1be

PSeitz added 12 commits April 18, 2026 11:27

refactor

c63f480

refactor structure

6c6e75f

move hc to seperate files

6ebcf28

Clarify lz4mid 8-byte hash endianness

bead0b1

Clarify lz4mid match start and end

013652b

Extract lz4mid match finder

8964448

add missing inline

295dbcb

add reuse hashtable bench

9e8c74a

precheck for collisions

bf54190

remove struct

b9a52b9

add comment

7edd342

add block format description

ecefbdc

PSeitz force-pushed the hc_pr branch from 982c46d to ecefbdc Compare April 18, 2026 09:27

PSeitz added 4 commits April 18, 2026 12:49

add USE_DICT in mid.rs

c93e821

ignore bench files in rg

715368a

rename mid to two-hashtables

7de2d2f

better docs

2cd4070

Refactor HC hash-chain match helpers

d8d32b4

Copilot AI review requested due to automatic review settings May 16, 2026 11:43

Copilot started reviewing on behalf of PSeitz May 16, 2026 11:43

Copilot AI reviewed May 16, 2026

PSeitz and others added 6 commits May 16, 2026 16:50

Update binggan benchmark dependency

dcb58ec

attribution

5b52063

Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

ignore test files

d651d72

remove duplicated code

5562534

remove unnecessary code

d841efc

introduce CandidateSource

2d8b32c

PSeitz wants to merge 50 commits into main from hc_pr

3 participants
