Multithreaded Log Processor

A multithreaded C++ log processor that maps the input with mmap, splits it into line-aligned chunks, and parses and aggregates those chunks in parallel on a thread pool. Optimizations are driven by profiling.

Architecture

mmap file -> line-aligned Chunks -> TaskQueue -> Worker pool -> per-thread Stats -> merge -> Report

mmap reader (mmap_reader.*) maps the file read-only and splits it into ~16 MB chunks whose boundaries land on newlines, so no record is split.
parser (log_parser.*) scans each line with raw pointers (no regex / no stringstream) into string_views pointing into the mapped file (zero copy).
aggregator (aggregator.*) folds records into a per-thread ThreadStats (no locks); merge() combines them and computes the analytics.
task queue / thread pool (task_queue.*, thread_pool.*) form the producer-consumer pipeline: one producer feeds chunks, N workers consume.

Build

cmake -B build-release -DCMAKE_BUILD_TYPE=Release
cmake --build build-release

For a Debug build (warnings are treated as errors), configure with -DCMAKE_BUILD_TYPE=Debug. For a ThreadSanitizer build, add -DENGINE_SANITIZE_THREAD=ON.

The code uses only portable POSIX APIs and builds on both Linux and macOS. -march=native is enabled only if the compiler accepts it (auto-detected). MAP_POPULATE is Linux-only, so its use is guarded.
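
One common way to guard MAP_POPULATE is a preprocessor check on the macro itself, sketched below. This is an assumption about the approach, not a copy of mmap_reader.*: the MappedFile struct and the map_read_only and unmap helpers are hypothetical names, and error handling is reduced to returning an empty mapping.

```cpp
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cstddef>
#include <string_view>

// Hypothetical holder for a read-only mapping (empty when mapping failed).
struct MappedFile {
    const char* data = nullptr;
    std::size_t size = 0;
    std::string_view view() const { return {data, size}; }
};

inline MappedFile map_read_only(const char* path) {
    MappedFile m;
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return m;
    struct stat st{};
    if (::fstat(fd, &st) != 0 || st.st_size <= 0) {
        ::close(fd);
        return m;  // mmap rejects zero-length mappings
    }
    int flags = MAP_PRIVATE;
#ifdef MAP_POPULATE
    flags |= MAP_POPULATE;  // Linux only: prefault the pages up front
#endif
    void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                     PROT_READ, flags, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (p == MAP_FAILED) return m;
    m.data = static_cast<const char*>(p);
    m.size = static_cast<std::size_t>(st.st_size);
    return m;
}

inline void unmap(MappedFile& m) {
    if (m.data != nullptr) ::munmap(const_cast<char*>(m.data), m.size);
    m = MappedFile{};
}
```

On platforms where the macro is undefined, such as macOS, the flag is simply left out and pages are faulted in on first access.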

Run

./build-release/log_engine --input data/sample.log --threads 8 --top-endpoints 10
# positional input also works:  ./build-release/log_engine data/sample.log

Options: --input <file>, --threads N (default: hardware concurrency), --top-endpoints N (default: 10), --chunk-mb N (default: 16).

Generate test data

python3 scripts/gen_logs.py --size-mb 10   --out data/sample.log
python3 scripts/gen_logs.py --size-mb 1024 --out data/bench.log --error-rate 0.02

Benchmark

# Sweeps threads 1,2,4,8,16 (each in its own process for accurate peak RSS),
# generating a 1 GB dataset if data/bench.log is missing.
scripts/run_benchmarks.sh

Output CSV columns: threads,runtime_ms,throughput_mb_s,peak_rss_mb,total_requests.

Performance results

Dataset: 1 GB synthetic log, 16,975,689 records. Machine: Apple Silicon (arm64), 10 cores, Apple Clang 16, Release build (-O3 -march=native).

Warm page cache (the file is fully resident, so timing reflects compute, not disk):

Threads	Runtime (ms)	Throughput (MB/s)	Peak RSS (MB)	Speedup
1	1585	646	1315	1.00x
2	1069	957	1286	1.48x
4	601	1703	1299	2.64x
8	479	2139	1298	3.31x
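
The Speedup column is the single-thread runtime divided by the N-thread runtime. The sketch below reproduces that arithmetic and also derives parallel efficiency (speedup divided by thread count), which the table does not list. The Run struct and the function names are hypothetical and are not part of benchmark.*.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical record for one row of the results table.
struct Run {
    int threads;
    double runtime_ms;
};

// Speedup relative to the single-thread baseline: T1 / Tn.
inline double speedup(double baseline_ms, double runtime_ms) {
    return baseline_ms / runtime_ms;
}

// Parallel efficiency: speedup divided by the number of threads used.
inline double efficiency(double baseline_ms, const Run& run) {
    return speedup(baseline_ms, run.runtime_ms) / run.threads;
}

// Prints one line per run; the first entry is taken as the baseline.
inline void print_scaling(const std::vector<Run>& runs) {
    if (runs.empty()) return;
    const double baseline_ms = runs.front().runtime_ms;
    for (const Run& r : runs) {
        std::printf("%2d threads: speedup %.2fx, efficiency %.0f%%\n",
                    r.threads, speedup(baseline_ms, r.runtime_ms),
                    100.0 * efficiency(baseline_ms, r));
    }
}
```

Applied to the runtimes above, this gives efficiencies of roughly 74% at 2 threads, 66% at 4, and 41% at 8.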

Tests

cmake -B build -DCMAKE_BUILD_TYPE=Debug && cmake --build build
ctest --test-dir build --output-on-failure

The tests cover chunk alignment, parsing, aggregation/merge, and a determinism check (N-thread output == 1-thread output). Run them under the ThreadSanitizer build to check for data races.
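
The determinism property follows from the lock-free design: each worker writes only to its own stats object, and the merge is a sum, so the result does not depend on how records are distributed across threads. Below is a minimal sketch of that idea. The fields of ThreadStats and the count_parallel helper are assumptions for illustration; the real aggregator.* tracks more than endpoint counts, and this sketch copies keys into std::string rather than keeping zero-copy views.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <string_view>
#include <thread>
#include <unordered_map>
#include <vector>

// Hypothetical per-thread accumulator; only its owning thread writes to it.
struct ThreadStats {
    std::uint64_t total_requests = 0;
    std::unordered_map<std::string, std::uint64_t> endpoint_hits;

    void add(std::string_view endpoint) {
        ++total_requests;
        ++endpoint_hits[std::string(endpoint)];
    }
};

// Sums the per-thread results; addition makes the merge order-independent.
ThreadStats merge(const std::vector<ThreadStats>& parts) {
    ThreadStats out;
    for (const ThreadStats& p : parts) {
        out.total_requests += p.total_requests;
        for (const auto& [endpoint, hits] : p.endpoint_hits) {
            out.endpoint_hits[endpoint] += hits;
        }
    }
    return out;
}

// Hypothetical driver: thread t handles records t, t + n, t + 2n, ...
ThreadStats count_parallel(const std::vector<std::string_view>& endpoints,
                           unsigned n_threads) {
    if (n_threads == 0) n_threads = 1;
    std::vector<ThreadStats> parts(n_threads);  // one slot per worker, no locks
    std::vector<std::thread> workers;
    workers.reserve(n_threads);
    for (unsigned t = 0; t < n_threads; ++t) {
        workers.emplace_back([&endpoints, &parts, n_threads, t] {
            for (std::size_t i = t; i < endpoints.size(); i += n_threads) {
                parts[t].add(endpoints[i]);
            }
        });
    }
    for (std::thread& w : workers) w.join();  // join before reading the slots
    return merge(parts);
}
```

A determinism test in this style compares count_parallel(records, 1) with count_parallel(records, N) and expects identical totals and identical per-endpoint counts.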

Profiling

Optimization decisions should be driven by profiling data, not guesses.

Linux:

perf record ./build-release/log_engine --input data/bench.log --threads 8
perf report

macOS: use sample or Instruments:

./build-release/log_engine --input data/bench.log --threads 8 &
sample $! 5 -file /tmp/log_engine.sample.txt   # 5-second profile
# or: xctrace record --template 'Time Profiler' --launch -- ./build-release/log_engine ...

Look for hotspots in parsing, hashing/map inserts, and synchronization.

