Advice on choosing bloom filter parameters #38

@luxe

I have 863,049,256 keys, each a 34-character alphanumeric string:

Z6DauUBtzw8p77MLxy7VWYCKN92JfKiCK
AJe8BJYnJHp9DDUrvGgFwmn5oBjhUSgowr
114VNKGsr9M4ogvwjz6ESNUqYdroGyht7r
etc.

Do you have any recommendations on bloom filter parameters?
In particular these:

/* k-mer size */
const unsigned k = //?;

/* number of Bloom filter hash functions */
const unsigned numHashes = //?;

/* size of Bloom filter (in bits) */
const unsigned size = //?;

Initializing the filter takes a long time, so experimenting with different configurations is slow.
I'm curious whether you have any insight into a good starting configuration.
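
For reference, one possible starting point comes from the textbook Bloom filter sizing formulas: for n keys and a target false-positive rate p, the bit count is roughly m = -n ln(p) / (ln 2)^2 and the number of hash functions is roughly (m / n) ln 2. The sketch below is only that arithmetic. The 1% target rate mentioned afterwards is an assumption (no rate is stated above), and `BloomParams` and `suggestParams` are hypothetical names, not part of this library's API. It does not address the k-mer size.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper type: holds the two values the formulas produce.
struct BloomParams {
    std::uint64_t bits;  // m: filter size in bits (64-bit on purpose, see note below)
    unsigned numHashes;  // number of hash functions
};

// Textbook sizing for a standard Bloom filter:
//   m = -n * ln(p) / (ln 2)^2     bits
//   h = (m / n) * ln 2            hash functions, rounded to the nearest integer
BloomParams suggestParams(std::uint64_t numKeys, double targetFpr) {
    assert(targetFpr > 0.0 && targetFpr < 1.0);
    const double ln2 = std::log(2.0);
    const double bitsPerKey = -std::log(targetFpr) / (ln2 * ln2);
    const auto bits = static_cast<std::uint64_t>(
        std::ceil(bitsPerKey * static_cast<double>(numKeys)));
    const auto numHashes = static_cast<unsigned>(std::lround(bitsPerKey * ln2));
    return {bits, numHashes};
}
```

With 863,049,256 keys and the assumed 1% rate, `suggestParams(863049256ULL, 0.01)` gives 7 hash functions and about 8.27 billion bits (roughly 1 GB). That bit count is larger than 2^32, so it would not fit in `size` as declared above on platforms where `unsigned` is 32 bits.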
