Question about pyzstd.compress() determinism vs compression.zstd #66

@datahack00

I hope this message finds you well. First of all, I sincerely apologize for reaching out this way. I couldn't find a better channel, and I hope you don't mind.

I also want to take a moment to thank you for your work on pyzstd. It has served us reliably, and we genuinely appreciate the effort you've put into the library.

We are currently in the process of migrating our compression layer from pyzstd to Python's built-in compression.zstd module (introduced in Python 3.14), and during that transition, we noticed an interesting behavioral difference that we haven't been able to fully explain.

With pyzstd, the following call always produced an identical byte output across multiple invocations:

pyzstd.compress(
    data=content,
    zstd_dict=training_dict.as_digested_dict
)

However, with the new built-in module, the equivalent call:

from compression import zstd  # Python 3.14+

zstd.compress(
    data=content,
    zstd_dict=training_dict.as_digested_dict
)

...produces bytes objects of different sizes across two consecutive calls with the same input.

One additional detail that may be relevant: between the two compress() calls, we reload the dictionary from disk using:

training_dict = zstd.ZstdDict(zst_file)

It is the exact same .zst file each time; we simply run this line again before the second compress() call. We're not sure whether re-instantiating ZstdDict from the same file could affect the compressed output, but we wanted to mention it in case it's a factor.
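To make the comparison concrete, here is a minimal sketch of how the behavior could be checked. It compresses the same input several times, rebuilds the compressor before each run (mimicking the dictionary reload), and compares output length and SHA-256 digest. The helper names `output_digests`, `is_deterministic`, and `zstd_factory` are hypothetical, and `zstd_factory` assumes the dictionary is available as a bytes object (`dict_bytes`). The runnable demo uses `zlib` purely as a stand-in codec so the sketch works on any Python version.

```python
import hashlib
import zlib


def output_digests(make_compress, data, runs=2):
    """Compress `data` `runs` times, rebuilding the compressor before each run.

    `make_compress` is a zero-argument factory that returns a one-argument
    compress callable. Calling it before every run mimics reloading the
    dictionary between compress() calls.
    """
    digests = []
    for _ in range(runs):
        compress = make_compress()
        out = compress(data)
        # Record both size and content hash so any difference is visible.
        digests.append((len(out), hashlib.sha256(out).hexdigest()))
    return digests


def is_deterministic(make_compress, data, runs=2):
    """Return True if every run produced byte-identical output."""
    return len(set(output_digests(make_compress, data, runs))) == 1


def zstd_factory(dict_bytes):
    """Hypothetical factory for compression.zstd (Python 3.14+).

    Assumes `dict_bytes` holds the dictionary content as bytes.
    The import is lazy so the rest of the sketch runs on older Pythons.
    """
    from compression import zstd

    def make_compress():
        # Re-instantiate ZstdDict on every run, as in the report above.
        zd = zstd.ZstdDict(dict_bytes)
        return lambda data: zstd.compress(data, zstd_dict=zd.as_digested_dict)

    return make_compress


if __name__ == "__main__":
    content = b"example payload " * 64
    # Stand-in codec (zlib), not zstd: swap in zstd_factory(dict_bytes)
    # to run the same check against compression.zstd.
    print(is_deterministic(lambda: zlib.compress, content))
```

Passing `zstd_factory(dict_bytes)` instead of the `zlib` stand-in runs the same check against the built-in module. Moving the `ZstdDict(...)` construction outside `make_compress` would show whether the reload is what changes the output.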

Would you happen to know why this might be the case?

Thank you so much for your time, and apologies again for the interruption.
