libbpf-tools: Add generic datastructure_helpers (vec + hashmap) and use in biotop#5507

Open
Bojun-Seo wants to merge 2 commits into iovisor:master from Bojun-Seo:datastructure

@Bojun-Seo
Contributor

Description

Add a new datastructure_helpers library (datastructure_helpers.h /
datastructure_helpers.c) to libbpf-tools/ that provides two generic,
reusable data structures for user-space libbpf tool code:

  • ds_vec — a realloc-based dynamic array storing elements inline
    (amortised O(1) push, O(1) indexed access, built-in qsort wrapper).
  • ds_hashmap — a separate-chaining hash map storing keys and values
    inline after each node header, using FNV-1a 64-bit hashing and a 2×
    bucket-array growth policy (expected O(1) insert / lookup / delete).
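
To make the ds_vec bullet concrete, here is a minimal sketch of a realloc-based vector that stores elements inline. It is not the PR's code: every name (`sketch_vec`, `sketch_vec_push`, `sketch_vec_at`, `sketch_vec_sort`, `sketch_cmp_int`) and the initial capacity of 8 are assumptions made for illustration, and the real ds_vec API may differ.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical names throughout; the PR's ds_vec may look different. */
struct sketch_vec {
	void *data;       /* contiguous storage, elements stored inline */
	size_t elem_size; /* size of one element in bytes */
	size_t len;       /* elements in use */
	size_t cap;       /* elements allocated */
};

static void sketch_vec_init(struct sketch_vec *v, size_t elem_size)
{
	v->data = NULL;
	v->elem_size = elem_size;
	v->len = 0;
	v->cap = 0;
}

/* Append a copy of *elem. Returns 0 on success or a negative errno. */
static int sketch_vec_push(struct sketch_vec *v, const void *elem)
{
	if (v->elem_size == 0)
		return -EINVAL;
	if (v->len == v->cap) {
		/* Doubling the capacity is what makes push amortised O(1). */
		size_t new_cap = v->cap ? v->cap * 2 : 8;
		void *p;

		if (new_cap > SIZE_MAX / v->elem_size)
			return -ENOMEM;
		p = realloc(v->data, new_cap * v->elem_size);
		if (!p)
			return -ENOMEM; /* old buffer is still valid */
		v->data = p;
		v->cap = new_cap;
	}
	memcpy((char *)v->data + v->len * v->elem_size, elem, v->elem_size);
	v->len++;
	return 0;
}

/* O(1) indexed access; returns NULL when i is out of range. */
static void *sketch_vec_at(const struct sketch_vec *v, size_t i)
{
	return i < v->len ? (char *)v->data + i * v->elem_size : NULL;
}

/* Thin qsort wrapper over the inline storage. */
static void sketch_vec_sort(struct sketch_vec *v,
			    int (*cmp)(const void *, const void *))
{
	if (v->len > 1)
		qsort(v->data, v->len, v->elem_size, cmp);
}

static void sketch_vec_free(struct sketch_vec *v)
{
	free(v->data);
	sketch_vec_init(v, v->elem_size);
}

/* Example comparator for int elements. */
static int sketch_cmp_int(const void *a, const void *b)
{
	int x = *(const int *)a, y = *(const int *)b;

	return (x > y) - (x < y);
}
```

Pointers returned by `sketch_vec_at` are invalidated by a later push that reallocates, which is the usual trade-off of inline storage.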

The second commit migrates biotop.c to use these helpers in place of the
ad-hoc struct vector / grow_vector / free_vector code that existed
locally in that file. As a side effect, search_disk_name() is upgraded from
an O(n) linear scan to an expected O(1) hashmap lookup.

A unit-test binary (libbpf-tools/tests/test_datastructure_helpers) with its
own Makefile is included and covers the full public API of both structures.

Why this approach

Several libbpf-tools already duplicate small dynamic-array or
map-lookup patterns (biotop's disk list being one example). A shared
helper library avoids this duplication without pulling in a heavy
external dependency.

Why a new file rather than extending map_helpers?
map_helpers is specifically for BPF map I/O. Mixing general-purpose
user-space data structures there would blur its purpose.

Why a custom FNV-1a hashmap rather than the existing libbpf hashmap?
libbpf's internal hashmap API is not part of its stable public surface
and is not designed for direct use by tools. ds_hashmap is a thin,
self-contained alternative with an API tailored to the libbpf-tools coding
style (pass-by-pointer, error returns as negative errno).
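
For reference, the 64-bit FNV-1a hash itself is only a few lines. The sketch below uses the standard published offset basis and prime; the function name `sketch_fnv1a_64` is hypothetical and is not taken from the PR.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * 64-bit FNV-1a over an arbitrary byte range.
 * The function name is hypothetical; the constants are the standard
 * FNV-1a 64-bit offset basis and prime.
 */
static uint64_t sketch_fnv1a_64(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint64_t h = 0xcbf29ce484222325ULL; /* offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];              /* XOR the byte in first ... */
		h *= 0x100000001b3ULL;  /* ... then multiply by the prime */
	}
	return h;
}
```

Because the hash runs over raw bytes, a struct used as a key should have no uninitialised padding, otherwise two logically equal keys could hash differently.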

Why separate-chaining rather than open addressing?
Separate chaining keeps insertion O(1) amortised without the tombstone
complexity of open addressing, which matters for the delete + re-insert
patterns some tools may need.
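
The delete and re-insert point can be illustrated with a small separate-chaining sketch in which deleting a key simply unlinks its node, so no tombstone is left behind. This is an illustration under assumptions, not the PR's implementation: all names are hypothetical, the bucket count is fixed at 16 for brevity, and keys are compared bytewise.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SKETCH_NBUCKETS 16 /* fixed for brevity; a real map would grow */

/* Node header; key bytes, then value bytes, follow it inline. */
struct sketch_node {
	struct sketch_node *next;
	unsigned char kv[]; /* key_size bytes of key + val_size bytes of value */
};

struct sketch_map {
	struct sketch_node *buckets[SKETCH_NBUCKETS];
	size_t key_size, val_size, count;
};

static uint64_t sketch_hash(const void *key, size_t len)
{
	const unsigned char *p = key;
	uint64_t h = 0xcbf29ce484222325ULL; /* FNV-1a 64-bit offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL; /* FNV-1a 64-bit prime */
	}
	return h;
}

/* Return the link that points at the matching node, or the chain's tail link. */
static struct sketch_node **sketch_find(struct sketch_map *m, const void *key)
{
	struct sketch_node **pp =
		&m->buckets[sketch_hash(key, m->key_size) % SKETCH_NBUCKETS];

	while (*pp && memcmp((*pp)->kv, key, m->key_size) != 0)
		pp = &(*pp)->next;
	return pp;
}

/* Insert or overwrite. Returns 0 or a negative errno. */
static int sketch_put(struct sketch_map *m, const void *key, const void *val)
{
	struct sketch_node **pp = sketch_find(m, key);
	struct sketch_node *n = *pp;

	if (!n) {
		n = malloc(sizeof(*n) + m->key_size + m->val_size);
		if (!n)
			return -ENOMEM;
		memcpy(n->kv, key, m->key_size);
		n->next = NULL;
		*pp = n; /* link at the end of the chain */
		m->count++;
	}
	memcpy(n->kv + m->key_size, val, m->val_size);
	return 0;
}

/* Returns a pointer to the inline value, or NULL if the key is absent. */
static void *sketch_get(struct sketch_map *m, const void *key)
{
	struct sketch_node *n = *sketch_find(m, key);

	return n ? n->kv + m->key_size : NULL;
}

/* Unlink and free the node; nothing is left behind in the bucket. */
static int sketch_del(struct sketch_map *m, const void *key)
{
	struct sketch_node **pp = sketch_find(m, key);
	struct sketch_node *n = *pp;

	if (!n)
		return -ENOENT;
	*pp = n->next;
	free(n);
	m->count--;
	return 0;
}
```

A map is set up with a designated initialiser such as `struct sketch_map m = { .key_size = sizeof(int), .val_size = sizeof(int) };`. The value pointer returned by `sketch_get` is only suitably aligned when `key_size` is a multiple of the value type's alignment; otherwise the caller should `memcpy` the value out.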


Checklist

  • Commit prefix matches changed area (libbpf-tools:, libbpf-tools/biotop:)
  • Commit body explains why this change is needed

Bojun-Seo added 2 commits May 4, 2026 15:12
Add datastructure_helpers.h and datastructure_helpers.c implementing
two general-purpose data structures for shared use across libbpf-tools:

- struct vec: a realloc-based dynamic array with amortized O(1) push_back

- struct hashmap: a separate-chaining hash map.  Each bucket holds a
  singly-linked list of nodes with key and value stored inline.  Uses
  FNV-1a hashing.  The bucket array doubles when the average chain
  length exceeds 2, keeping expected lookup O(1).
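
The growth rule described above can be sketched as follows. This is only an illustration under assumptions: the names (`grow_node`, `grow_table`, `grow_table_add`), the initial size of 8 buckets, the cached per-node hash, and the choice to check the threshold just before linking a new node are all hypothetical, not taken from the commit.

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical node header; key and value storage is omitted here. */
struct grow_node {
	struct grow_node *next;
	uint64_t hash; /* cached so rehashing never touches the key */
};

struct grow_table {
	struct grow_node **buckets;
	size_t nbuckets;
	size_t count;
};

/* Average chain length is count / nbuckets; grow once it exceeds 2. */
static int grow_needed(const struct grow_table *t)
{
	return t->nbuckets == 0 || t->count > 2 * t->nbuckets;
}

/* Double the bucket array and relink every node into its new bucket. */
static int grow_table_double(struct grow_table *t)
{
	size_t new_n = t->nbuckets ? t->nbuckets * 2 : 8;
	struct grow_node **nb = calloc(new_n, sizeof(*nb));

	if (!nb)
		return -ENOMEM; /* the old table is left untouched */
	for (size_t i = 0; i < t->nbuckets; i++) {
		struct grow_node *n = t->buckets[i];

		while (n) {
			struct grow_node *next = n->next;
			size_t b = n->hash % new_n;

			n->next = nb[b];
			nb[b] = n;
			n = next;
		}
	}
	free(t->buckets);
	t->buckets = nb;
	t->nbuckets = new_n;
	return 0;
}

/* Link a caller-allocated node, growing first when the threshold is hit. */
static int grow_table_add(struct grow_table *t, struct grow_node *n)
{
	size_t b;

	if (grow_needed(t)) {
		int err = grow_table_double(t);

		if (err)
			return err;
	}
	b = n->hash % t->nbuckets;
	n->next = t->buckets[b];
	t->buckets[b] = n;
	t->count++;
	return 0;
}
```

Since each doubling relinks every node once, the total rehash work over n insertions stays O(n), so insertion remains amortised O(1).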

FNV-1a is released into the public domain under CC0 1.0.

Unit tests are added under libbpf-tools/tests/ along with a standalone
Makefile so the tests can be built and run without BPF or kernel support:

  make -C libbpf-tools/tests test

Why:
Several libbpf-tools have duplicated their own ad-hoc dynamic-array or
lookup implementations. A shared helper avoids this duplication and
gives future tools a standard building block without pulling in an
external dependency.

Replace the ad-hoc struct vector / grow_vector / free_vector
implementation with ds_vec and ds_hashmap from datastructure_helpers.

- struct disk entries are stored in a ds_vec (parse_disk_stat)
- a ds_hashmap keyed by (major, minor) is populated at the same time,
  turning search_disk_name from an O(n) linear scan into an expected
  O(1) lookup
- datastructure_helpers.o is added to COMMON_OBJ in the Makefile so
  all tools can link against it

No functional change to the tool's output or behaviour.

Why:
The ad-hoc vector in biotop.c is functionally equivalent to ds_vec but
exists only in that one file, making it a maintenance liability. Removing
it in favour of the shared library reduces duplication. The O(n) scan in
search_disk_name is also an unnecessary cost on systems with many block
devices; a keyed hashmap lookup is a straightforward improvement.