Enzo patches part2#175
Conversation
@dst buffer is allocated with same size as @src, which, for good compression cases, works fine. However, when compression goes bad (e.g. random bytes payloads), the compressed size can increase significantly, and even by stopping the main loop at 7/8 of @slen, writing leftover literals could write past the end of @dst because of LZ77 metadata. To fix this, add lz77_compressed_alloc_size() helper to compute the correct allocation size for @dst, accounting for metadata and worst cast scenario (all literals). While this is overprovisioning memory, it's not only correct, but also allows lz77_compress() main loop to run without ever checking @dst limits (i.e. a perf improvement). Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
- lz77_match_len() increments @Cur before checking for equality, leading to off-by-one match len in some cases. Fix by moving pointers increment to inside the loop. Also rename @wnd arg to @match (more accurate name). - both lz77_match_len() and lz77_compress() checked for "buf + step < end" when the correct is "<=" for such cases. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
Increase max distance (i.e. window size) from 1k to 8k. This allows better compression and is just as fast. Other: - drop LZ77_MATCH_MIN_DIST as it's nused -- main loop already checks if dist > 0 Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
End-of-stream flag could lead to UB because of int promotion (overwriting signed bit). Fix it by changing operand from '1' to '1UL'. Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
This patch implements several micro-optimizations on lz77_compress() with the goal of reducing the number of instructions per [input] byte (a.k.a. IPB). Changes: - change hashtable to be u32 (instead of u64) -- change the hash function to reflect that (adds lz77_hash() and lz77_read32() helpers) - batch-write literals instead of 1 by 1 -- now that we have a well defined hot path (match finding) and a cold path (encode literals + match), batch writing makes a significant difference - implement adaptive skipping of input bytes -- skip input bytes more aggressively if too few matches are being found - name some constants for more meaningful context Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
Document parts of the code, especially the apparently non-sense parts. Other: - change pointer increment constants to sizeof() values Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
Add compress/common.h to aggregate helpers and definitions that will be shared with other compression algorithms. Also add a few build time checks for proper support. Changes: - update affected call paths in compress/lz77.c Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
Set it to number of currently defined algorithms (6 as of now). Signed-off-by: Enzo Matsumiya <ematsumiya@suse.de> Signed-off-by: Steve French <stfrench@microsoft.com>
There was a problem hiding this comment.
Pull request overview
This PR continues the SMB3 message-compression work by refactoring the client-side LZ77 compressor for performance, introducing shared compression helpers, and adjusting buffer sizing to support worst-case output.
Changes:
- Add a shared
compress/common.hwith hashing and unaligned memory access helpers used by compression algorithms. - Refactor
compress/lz77.cto use the new common helpers and a new literals/match encoding flow (including adaptive skipping). - Add
lz77_compressed_alloc_size()and switchsmb_compress()to allocate the worst-case output buffer size up front.
Reviewed changes
Copilot reviewed 5 out of 5 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| fs/smb/common/smb2pdu.h | Adds SMB3_COMPRESS_MAX_ALGS constant for compression algorithm indexing. |
| fs/smb/client/compress/lz77.h | Adds lz77_compressed_alloc_size() helper and updates function signature constness. |
| fs/smb/client/compress/lz77.c | Major LZ77 compressor refactor using shared helpers and new encoding/match-finding structure. |
| fs/smb/client/compress/common.h | New header providing shared hashing, match-length, and unaligned access helpers. |
| fs/smb/client/compress.c | Uses lz77_compressed_alloc_size() to allocate sufficient destination buffer for LZ77. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| #ifndef __LITTLE_ENDIAN /* TODO */ | ||
| # error "SMB3 compression is only supported on little endian architectures" | ||
| #endif /* !__LITTLE_ENDIAN */ |
There was a problem hiding this comment.
common.h hard-fails the build on big-endian via #error, but CONFIG_CIFS_COMPRESSION has no Kconfig dependency that restricts endianness. This will break CIFS_COMPRESSION builds on big-endian architectures; either add an appropriate Kconfig depends on !CPU_BIG_ENDIAN (or equivalent) or make the compression code endian-safe (e.g., use explicit little-endian unaligned accesses for the on-wire format) and remove the build-time error.
| #if BITS_PER_LONG < 64 /* TODO */ | ||
| # error "SMB3 compression is only supported on 64-bit architectures" | ||
| #endif /* BITS_PER_LONG < 64 */ | ||
|
|
||
| /* Build time double checks (probably unnecessary) */ | ||
| static_assert(sizeof(u16) == 2); | ||
| static_assert(sizeof(u32) == 4); | ||
| static_assert(sizeof(u64) == 8); | ||
| static_assert(sizeof(size_t) == 8); |
There was a problem hiding this comment.
common.h hard-fails the build on 32-bit (BITS_PER_LONG < 64) but CONFIG_CIFS_COMPRESSION is not restricted in Kconfig. If 64-bit-only is intended, enforce it in Kconfig; otherwise remove this build-time error and keep the implementation portable (including removing/relaxing static_assert(sizeof(size_t) == 8)).
| #if BITS_PER_LONG < 64 /* TODO */ | |
| # error "SMB3 compression is only supported on 64-bit architectures" | |
| #endif /* BITS_PER_LONG < 64 */ | |
| /* Build time double checks (probably unnecessary) */ | |
| static_assert(sizeof(u16) == 2); | |
| static_assert(sizeof(u32) == 4); | |
| static_assert(sizeof(u64) == 8); | |
| static_assert(sizeof(size_t) == 8); | |
| /* Build time double checks (probably unnecessary) */ | |
| static_assert(sizeof(u16) == 2); | |
| static_assert(sizeof(u32) == 4); | |
| static_assert(sizeof(u64) == 8); | |
| static_assert(sizeof(size_t) >= sizeof(u32)); |
| #undef MEM_UNALIGNED_READ | ||
| #undef MEM_UNALIGNED_WRITE | ||
|
|
||
| /* Read prefetch */ | ||
| #define MEM_PREFETCH(ptr) __builtin_prefetch((ptr), 0, 3) | ||
|
|
||
| #ifdef CONFIG_X86 | ||
| # define MEM_UNALIGNED_READ(ptr, t) (*(const t *)(ptr)) | ||
| # define MEM_UNALIGNED_WRITE(ptr, v, t) (*(t *)(ptr) = (t)(v)) | ||
| #else | ||
| # include <linux/unaligned.h> | ||
| # define MEM_UNALIGNED_READ(ptr, t) get_unaligned((const t *)(ptr)) | ||
| # define MEM_UNALIGNED_WRITE(ptr, v, t) put_unaligned((v), (t *)(ptr)) | ||
| #endif /* !CONFIG_X86 */ | ||
|
|
||
| #define mem_read8(ptr) MEM_UNALIGNED_READ(ptr, u8) | ||
| #define mem_read16(ptr) MEM_UNALIGNED_READ(ptr, u16) | ||
| #define mem_read32(ptr) MEM_UNALIGNED_READ(ptr, u32) | ||
| #define mem_read64(ptr) MEM_UNALIGNED_READ(ptr, u64) | ||
| #define mem_write8(ptr, v) MEM_UNALIGNED_WRITE(ptr, v, u8) | ||
| #define mem_write16(ptr, v) MEM_UNALIGNED_WRITE(ptr, v, u16) | ||
| #define mem_write32(ptr, v) MEM_UNALIGNED_WRITE(ptr, v, u32) |
There was a problem hiding this comment.
This header introduces very generic macros (MEM_PREFETCH, MEM_UNALIGNED_READ/WRITE) and also #undefs them, which can unexpectedly affect other headers/code included after this one. To match the stated naming convention and avoid macro namespace collisions, please rename these to a COMPRESS_ prefix (and avoid #undef in a public header unless absolutely required).
| #undef MEM_UNALIGNED_READ | |
| #undef MEM_UNALIGNED_WRITE | |
| /* Read prefetch */ | |
| #define MEM_PREFETCH(ptr) __builtin_prefetch((ptr), 0, 3) | |
| #ifdef CONFIG_X86 | |
| # define MEM_UNALIGNED_READ(ptr, t) (*(const t *)(ptr)) | |
| # define MEM_UNALIGNED_WRITE(ptr, v, t) (*(t *)(ptr) = (t)(v)) | |
| #else | |
| # include <linux/unaligned.h> | |
| # define MEM_UNALIGNED_READ(ptr, t) get_unaligned((const t *)(ptr)) | |
| # define MEM_UNALIGNED_WRITE(ptr, v, t) put_unaligned((v), (t *)(ptr)) | |
| #endif /* !CONFIG_X86 */ | |
| #define mem_read8(ptr) MEM_UNALIGNED_READ(ptr, u8) | |
| #define mem_read16(ptr) MEM_UNALIGNED_READ(ptr, u16) | |
| #define mem_read32(ptr) MEM_UNALIGNED_READ(ptr, u32) | |
| #define mem_read64(ptr) MEM_UNALIGNED_READ(ptr, u64) | |
| #define mem_write8(ptr, v) MEM_UNALIGNED_WRITE(ptr, v, u8) | |
| #define mem_write16(ptr, v) MEM_UNALIGNED_WRITE(ptr, v, u16) | |
| #define mem_write32(ptr, v) MEM_UNALIGNED_WRITE(ptr, v, u32) | |
| /* Read prefetch */ | |
| #define COMPRESS_MEM_PREFETCH(ptr) __builtin_prefetch((ptr), 0, 3) | |
| #ifdef CONFIG_X86 | |
| # define COMPRESS_MEM_UNALIGNED_READ(ptr, t) (*(const t *)(ptr)) | |
| # define COMPRESS_MEM_UNALIGNED_WRITE(ptr, v, t) (*(t *)(ptr) = (t)(v)) | |
| #else | |
| # include <linux/unaligned.h> | |
| # define COMPRESS_MEM_UNALIGNED_READ(ptr, t) get_unaligned((const t *)(ptr)) | |
| # define COMPRESS_MEM_UNALIGNED_WRITE(ptr, v, t) put_unaligned((v), (t *)(ptr)) | |
| #endif /* !CONFIG_X86 */ | |
| #define mem_read8(ptr) COMPRESS_MEM_UNALIGNED_READ(ptr, u8) | |
| #define mem_read16(ptr) COMPRESS_MEM_UNALIGNED_READ(ptr, u16) | |
| #define mem_read32(ptr) COMPRESS_MEM_UNALIGNED_READ(ptr, u32) | |
| #define mem_read64(ptr) COMPRESS_MEM_UNALIGNED_READ(ptr, u64) | |
| #define mem_write8(ptr, v) COMPRESS_MEM_UNALIGNED_WRITE(ptr, v, u8) | |
| #define mem_write16(ptr, v) COMPRESS_MEM_UNALIGNED_WRITE(ptr, v, u16) | |
| #define mem_write32(ptr, v) COMPRESS_MEM_UNALIGNED_WRITE(ptr, v, u32) |
| static __always_inline u32 lz77_compressed_alloc_size(const u32 size) | ||
| { | ||
| return size + (size >> 3) + 8; | ||
| } |
There was a problem hiding this comment.
lz77_compressed_alloc_size() does arithmetic in u32 and can overflow for sufficiently large inputs (e.g., near the 4GiB SMB2 write-length limit), which would under-allocate the destination buffer and could lead to memory corruption. Consider computing in size_t and using overflow-safe helpers (e.g., check_add_overflow()), and clamp/return an error when the required allocation doesn't fit.
| MEM_PREFETCH(srcp + COMPRESS_RSTEP_SIZE); | ||
|
|
||
| /* | ||
| * Adjust @srcp so we don't get a false positive match on first iteration. | ||
| * Then prepare hash for first loop iteration (don't advance @srcp again). | ||
| */ | ||
| hash = compress_hash_ptr(srcp++); | ||
| htable[hash] = 0; | ||
| hash = compress_hash_ptr(srcp); |
There was a problem hiding this comment.
lz77_compress() seeds the rolling hash with compress_hash_ptr(srcp) / compress_hash_ptr(srcp++), which unconditionally reads 4 bytes from src. If slen < COMPRESS_RSTEP_SIZE (or very small buffers in general), this will read past the end of the input. Please add a clear precondition check/early return for small slen (e.g., encode everything as literals or return -EMSGSIZE) before any mem_read32() hashing/compare operations.
8 compression patches from Enzo