Skip to content

fastfloat/int_serialization_benchmark

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

112 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Integer-to-String Benchmark

A benchmark suite for integer-to-string conversion algorithms, containing our Champagne-Lemire AVX-512 algorithm.

Overview

This project benchmarks various algorithms for converting 64-bit unsigned integers to their decimal string representation. The focus is on high-throughput batch conversion scenarios.

Requirements

  • An x86 64-bit CPU
  • AVX-512 support (for the Champagne-Lemire algorithm)
  • A recent GCC or Clang on a Linux system
  • CMake

Building

cmake -B build
cmake --build build

With a specific compiler, e.g., clang:

CC=clang CXX=clang++ cmake -B buildclang
cmake --build buildclang

Usage

Run the benchmark:

./build/benchmark

Quick validation mode:

./build/benchmark -q

With input data files:

./build/benchmark -f data/citm_catalog_integers.txt
./build/benchmark -f data/twitterjson_integers.txt

Performance counters may require privileged execution (sudo).

Algorithms Benchmarked

  • Champagne-Lemire (homogeneous): AVX-512 algorithm; variant optimized for uniform digit counts
  • Champagne-Lemire (heterogeneous): AVX-512 algorithm; variant for varying digit counts
  • Champagne-Lemire (auto): Dynamically selects between homogeneous/heterogeneous variants
  • std::to_chars: Standard library implementation
  • Abseil: Google's FastIntToBuffer routine
  • jeaiii: James Edward Anhalt III's algorithm
  • AppNexus: From the AppNexus Common Framework library
  • yy_itoa: Yao Yuan's implementation
  • Mathisen: SSE4.1-based algorithm inspired by Mathisen's arithmetic approach
  • Muła: Wojciech Muła's SSE-based algorithm
  • Hopman: The hopman_fast algorithm (extended to 64-bit)
  • naive_onepass: Classic algorithm with one-pass optimization

Key Techniques

Digit Counting

Fast algorithms to determine the number of decimal digits in a 64-bit integer:

Lookup Tables

Precomputed strings for digit pairs (00 through 99) reduce store operations by processing two digits at a time.

Division by Constants

Efficient computation of quotient and remainder using multiplication:

Fixed-Digit Representations

Datasets

See data/DATASETS.md for descriptions of the included integer datasets:

  • citm_catalog_integers.txt - Event catalog IDs (mostly 9-digit, homogeneous)
  • twitterjson_integers.txt - Twitter API integers (heterogeneous distribution)
  • cit_patents_citing_integers.txt.gz - US patent numbers (7-digit, homogeneous)
  • stackoverflow_unix_timestamps_integers.txt.gz - Unix timestamps (10-digit, homogeneous)

Benchmark Metrics

The benchmark reports the following metrics:

Metric Description
ns/n Nanoseconds per number (integer)
GHz CPU frequency during benchmark
c/n CPU cycles per number
i/n Instructions per number
B/n Branches per number
BM/n Branch misses per number
i/d Instructions per output digit
i/c Instructions per cycle (IPC)

About

integer-to-string conversion algorithms

Resources

Stars

Watchers

Forks

Packages

 
 
 

Contributors