Multithreaded C++ log processor featuring mmap-based I/O, thread pools, profiling-driven optimizations, and scalable parallel processing.
mmap file -> line-aligned Chunks -> TaskQueue -> Worker pool -> per-thread Stats -> merge -> Report

- mmap reader (`mmap_reader.*`) maps the file read-only and splits it into ~16 MB chunks whose boundaries land on newlines, so no record is split.
- parser (`log_parser.*`) scans each line with raw pointers (no regex, no `stringstream`) into `string_view`s pointing into the mapped file (zero copy).
- aggregator (`aggregator.*`) folds records into a per-thread `ThreadStats` (no locks); `merge()` combines them and computes the analytics.
- task queue / thread pool (`task_queue.*`, `thread_pool.*`) form the producer-consumer pipeline: one producer feeds chunks, N workers consume.

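
The newline-aligned chunking described in the list above can be sketched roughly as follows. This is a minimal illustration, not the project's `mmap_reader` code: the name `split_line_aligned` and its signature are assumptions, and it works on any `std::string_view` rather than on a real mapping.

```cpp
#include <cassert>
#include <cstddef>
#include <string_view>
#include <vector>

// Hypothetical helper (not the project's API): split `data` into chunks of
// roughly `target_bytes`, extending each chunk to the next '\n' so that no
// line straddles two chunks. The chunks are views into `data` (no copies).
std::vector<std::string_view> split_line_aligned(std::string_view data,
                                                 std::size_t target_bytes) {
    assert(target_bytes > 0);
    std::vector<std::string_view> chunks;
    std::size_t begin = 0;
    while (begin < data.size()) {
        std::size_t end = begin + target_bytes;  // tentative cut point
        if (end >= data.size()) {
            end = data.size();  // last chunk takes the remainder
        } else {
            // Search from the last byte of the tentative chunk, so a cut that
            // already sits right after a newline is kept as-is.
            const std::size_t nl = data.find('\n', end - 1);
            end = (nl == std::string_view::npos) ? data.size() : nl + 1;
        }
        chunks.push_back(data.substr(begin, end - begin));
        begin = end;
    }
    return chunks;
}
```

Because every chunk ends just after a newline (or at end of file), a worker can parse its chunk without looking at its neighbors. In the real engine `target_bytes` would presumably correspond to the `--chunk-mb` setting.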

    cmake -B build-release -DCMAKE_BUILD_TYPE=Release
    cmake --build build-release

Debug build (warnings-as-errors): `-DCMAKE_BUILD_TYPE=Debug`.

ThreadSanitizer build: `-DENGINE_SANITIZE_THREAD=ON`.

The code is portable POSIX and builds on macOS too. `-march=native` is enabled only if the compiler accepts it (auto-detected). `MAP_POPULATE` is Linux-only and guarded.

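
As a rough illustration of what a guarded `MAP_POPULATE` can look like, here is a small read-only mapping sketch. The names `map_read_only` and `unmap` are hypothetical and the project's `mmap_reader` may be structured quite differently; only the POSIX calls themselves (`open`, `fstat`, `mmap`, `munmap`) are standard.

```cpp
#include <cassert>
#include <cstddef>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

// Hypothetical helper (not the project's API): map `path` read-only.
// Returns nullptr on failure or for an empty file; `size` receives the length.
const char* map_read_only(const char* path, std::size_t& size) {
    assert(path != nullptr);
    size = 0;
    const int fd = open(path, O_RDONLY);
    if (fd < 0) return nullptr;

    struct stat st {};
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return nullptr;
    }

    int flags = MAP_PRIVATE;
#ifdef MAP_POPULATE
    flags |= MAP_POPULATE;  // Linux-only: pre-fault the pages at map time
#endif

    void* addr = mmap(nullptr, static_cast<std::size_t>(st.st_size),
                      PROT_READ, flags, fd, 0);
    close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return nullptr;

    size = static_cast<std::size_t>(st.st_size);
    return static_cast<const char*>(addr);
}

// Release a mapping obtained from map_read_only().
void unmap(const char* addr, std::size_t size) {
    if (addr != nullptr) munmap(const_cast<char*>(addr), size);
}
```

On platforms where `MAP_POPULATE` is not defined (macOS, for example), the `#ifdef` simply drops the flag and the pages are faulted in lazily on first access.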

    ./build-release/log_engine --input data/sample.log --threads 8 --top-endpoints 10
    # positional input also works:
    ./build-release/log_engine data/sample.log

Options: `--input <file>`, `--threads N` (default: hardware concurrency), `--top-endpoints N` (default 10), `--chunk-mb N` (default 16).


    python3 scripts/gen_logs.py --size-mb 10 --out data/sample.log
    python3 scripts/gen_logs.py --size-mb 1024 --out data/bench.log --error-rate 0.02

    # Sweeps threads 1,2,4,8,16 (each in its own process for accurate peak RSS),
    # generating a 1 GB dataset if data/bench.log is missing.
    scripts/run_benchmarks.sh

Output CSV columns: `threads,runtime_ms,throughput_mb_s,peak_rss_mb,total_requests`.

Dataset: 1 GB synthetic log, 16,975,689 records. Machine: Apple Silicon (arm64),
10 cores, Apple Clang 16, Release build (-O3 -march=native).
Warm page cache (the file is fully resident, so timing reflects compute, not disk):
| Threads | Runtime (ms) | Throughput (MB/s) | Peak RSS (MB) | Speedup |
|---|---|---|---|---|
| 1 | 1585 | 646 | 1315 | 1.00x |
| 2 | 1069 | 957 | 1286 | 1.48x |
| 4 | 601 | 1703 | 1299 | 2.64x |
| 8 | 479 | 2139 | 1298 | 3.31x |
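
The derived columns can be reproduced with a little arithmetic. The sketch below assumes that Speedup is the 1-thread runtime divided by the N-thread runtime and that throughput is the dataset size (1 GB taken as 1024 MB) divided by runtime; both function names are illustrative and not part of the project.

```cpp
#include <cassert>

// Speedup relative to the single-thread run: T1 / Tn.
double speedup(double t1_ms, double tn_ms) {
    assert(t1_ms > 0 && tn_ms > 0);
    return t1_ms / tn_ms;
}

// Throughput in MB/s for `size_mb` megabytes processed in `runtime_ms`.
double throughput_mb_s(double size_mb, double runtime_ms) {
    assert(runtime_ms > 0);
    return size_mb / (runtime_ms / 1000.0);
}
```

For example, `speedup(1585, 479)` is about 3.31 for the 8-thread row, and `throughput_mb_s(1024, 1585)` is about 646 for the 1-thread row, in line with the table.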

    cmake -B build -DCMAKE_BUILD_TYPE=Debug && cmake --build build
    ctest --test-dir build --output-on-failure

Covers chunk alignment, parsing, aggregation/merge, and a determinism check (N-thread output == 1-thread output). Run the TSan build to verify no data races.

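
To make the determinism property concrete, here is a minimal sketch of lock-free per-thread aggregation followed by a merge. It is not the project's aggregator: this `ThreadStats` is a simplified stand-in, and `count_keys` with its strided work split is purely an assumption for illustration (the real engine feeds chunks through a task queue).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <thread>
#include <vector>

// Simplified stand-in for the project's ThreadStats.
struct ThreadStats {
    std::uint64_t total = 0;
    std::map<std::string, std::uint64_t> per_key;  // ordered, so iteration is stable
};

// Combine per-thread results. Addition is commutative, so the merged counts
// do not depend on how the work was split across threads.
ThreadStats merge(const std::vector<ThreadStats>& parts) {
    ThreadStats out;
    for (const ThreadStats& p : parts) {
        out.total += p.total;
        for (const auto& [key, n] : p.per_key) out.per_key[key] += n;
    }
    return out;
}

// Hypothetical driver: each worker writes only to its own ThreadStats slot,
// so no locks are needed until the single-threaded merge at the end.
ThreadStats count_keys(const std::vector<std::string>& keys, unsigned threads) {
    assert(threads > 0);
    std::vector<ThreadStats> parts(threads);
    std::vector<std::thread> pool;
    for (unsigned t = 0; t < threads; ++t) {
        pool.emplace_back([&, t] {
            // Strided partition: worker t owns indices t, t + threads, ...
            for (std::size_t i = t; i < keys.size(); i += threads) {
                ++parts[t].total;
                ++parts[t].per_key[keys[i]];
            }
        });
    }
    for (std::thread& th : pool) th.join();
    return merge(parts);
}
```

A determinism check in this style compares `count_keys(keys, 1)` with `count_keys(keys, N)` and expects identical totals and per-key counts. Depending on the toolchain, building code that uses `std::thread` may require `-pthread`.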
Optimization decisions should be driven by profiling data, not guesses.
- Linux:

      perf record ./build-release/log_engine --input data/bench.log --threads 8
      perf report

- macOS: use `sample` or Instruments:

      ./build-release/log_engine --input data/bench.log --threads 8 &
      sample $! 5 -file /tmp/log_engine.sample.txt   # 5-second profile
      # or:
      xctrace record --template 'Time Profiler' --launch -- ./build-release/log_engine ...

Look for hotspots in parsing, hashing/map inserts, and synchronization.