RFC: Lossless fixed-width integer bit-packing codec for ADC-style data #813

@sakshi2433

Hi! This is an exploratory design discussion based on a working prototype and benchmarks.
I’d really appreciate feedback on API shape and integration direction.

Motivation

Many scientific datasets store integer signals in which only a subset of the bits is meaningful
(e.g. 10–12 bit ADC data stored in uint16). Zarr/numcodecs currently rely on byte-level
compression for such data, which does not explicitly remove the unused bits.

Observations

In an ADC-style benchmark (uint16, 12 effective bits, 10M samples), default Zarr compression
reduced storage from ~19 MB to ~7 MB, after which further gains plateaued. The existing bit-level
tools in numcodecs (e.g. PackBits, which packs boolean arrays) do not support lossless bit-width
packing of integers.

Proposal

Introduce a lossless integer bit-packing codec/filter that:

  • Packs fixed-width integer values using exactly N bits
  • Operates per chunk
  • Is fully reversible
  • Can be composed with existing compression
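
To make the proposed behaviour concrete, here is a minimal NumPy sketch of packing uint16 values into exactly N bits and recovering them. It is not the prototype's code: the names `pack_bits` and `unpack_bits`, the MSB-first bit order, and passing the sample count separately are all assumptions made for illustration.

```python
import numpy as np


def pack_bits(values: np.ndarray, nbits: int) -> bytes:
    """Pack uint16 values into nbits each, MSB first (hypothetical helper)."""
    if not 1 <= nbits <= 16:
        raise ValueError("nbits must be in 1..16")
    values = np.ascontiguousarray(values, dtype=np.uint16).ravel()
    if values.size and int(values.max()) >> nbits:
        raise ValueError("a value does not fit in nbits; packing would be lossy")
    # Big-endian bytes, so each row below holds one value's 16 bits, MSB first.
    as_bytes = values.astype(">u2").view(np.uint8)
    bits = np.unpackbits(as_bytes).reshape(values.size, 16)
    # Drop the unused high bits and re-pack; the tail is zero-padded to a byte.
    return np.packbits(bits[:, 16 - nbits:].ravel()).tobytes()


def unpack_bits(buf: bytes, nbits: int, count: int) -> np.ndarray:
    """Inverse of pack_bits; count is the number of samples in the chunk."""
    raw = np.frombuffer(buf, dtype=np.uint8)
    bits = np.unpackbits(raw, count=count * nbits).reshape(count, nbits)
    # Weight of each bit position, most significant first.
    weights = (1 << np.arange(nbits - 1, -1, -1)).astype(np.uint16)
    return (bits.astype(np.uint16) * weights).sum(axis=1, dtype=np.uint16)


samples = np.array([0xABC, 0xDEF], dtype=np.uint16)
packed = pack_bits(samples, 12)
print(packed.hex())                # abcdef
print(unpack_bits(packed, 12, 2))  # [2748 3567]
```

Because the output is a plain byte string, it can be handed to any byte-level compressor afterwards, which is what makes the filter composable.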

Prototype

I implemented a pure-Python prototype to validate feasibility:

  • Correct round-trip verified
  • Storage reduction proportional to effective bit-width
  • Zarr v2 compatible (self-describing stream)
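
One way a self-describing stream could look is sketched below in pure Python. The header layout (one byte for the bit width, eight bytes for the sample count) and the names `encode_chunk` and `decode_chunk` are hypothetical, not the prototype's actual format; the big-integer accumulator is chosen for brevity and gets slow on large chunks.

```python
import struct

# Hypothetical header: bit width (uint8) + sample count (uint64), little-endian.
_HEADER = struct.Struct("<BQ")


def encode_chunk(values, nbits):
    """Return header + payload with each value stored in nbits, MSB first."""
    acc = 0
    for v in values:
        if v < 0 or v >> nbits:
            raise ValueError("value does not fit in nbits")
        acc = (acc << nbits) | v
    total_bits = len(values) * nbits
    pad = -total_bits % 8          # zero bits appended to reach a whole byte
    payload = (acc << pad).to_bytes((total_bits + pad) // 8, "big")
    return _HEADER.pack(nbits, len(values)) + payload


def decode_chunk(buf):
    """Recover (nbits, values) using only what the stream itself carries."""
    nbits, count = _HEADER.unpack_from(buf)
    acc = int.from_bytes(buf[_HEADER.size:], "big")
    acc >>= -(count * nbits) % 8   # discard the padding bits
    mask = (1 << nbits) - 1
    values = [(acc >> (nbits * (count - 1 - i))) & mask for i in range(count)]
    return nbits, values


stream = encode_chunk([1, 2, 3], 12)
print(len(stream))           # 14 (9-byte header + 5-byte payload)
print(decode_chunk(stream))  # (12, [1, 2, 3])
```

With a header like this the decoder needs no external configuration, at the cost of a few bytes per chunk; storing the bit width in the codec config instead would remove the header but tie every chunk to one width.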

Results (summary)

  • Bit-packing alone reduces storage predictably (e.g. 12 of 16 bits → ~0.75× the raw size)
  • Bit-packing + Blosc/Zstd achieves a size close to that of default compression
  • The Python implementation is CPU-heavy → an optimized backend is likely needed
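
The 0.75× figure follows directly from the bit counts. A small sketch of the arithmetic for the benchmark's shape (10M uint16 samples, 12 bits each), assuming no per-chunk header; `packed_nbytes` is a hypothetical helper name:

```python
def packed_nbytes(n_samples: int, nbits: int) -> int:
    """Bytes needed for n_samples values of nbits each, rounded up to a byte."""
    return (n_samples * nbits + 7) // 8


n = 10_000_000
raw = n * 2                     # uint16 storage: 2 bytes per sample
packed = packed_nbytes(n, 12)
print(raw, packed, packed / raw)  # 20000000 15000000 0.75
```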

Questions

  • Should this live as a codec or filter in numcodecs?
  • How should bit-width metadata be handled (header vs external)?
  • Is limiting initial scope to uint16 reasonable?
  • Any guidance on aligning this with Zarr v3’s codec pipeline?
