Baseline performance numbers for Group's core operations — local ETS reads, GenServer writes with shard scaling, and distributed replication across separate BEAM VMs.
```sh
cd priv/group/priv/bench
mix deps.get
```

Single-node, no distribution required:

```sh
./run_local.sh
```

Uses 3 separate BEAM VMs (coordinator + 2 replicas) as OS processes:

```sh
./run_distributed.sh
```

The script compiles once, starts both replicas in the background, then launches the coordinator. Replicas are killed automatically on exit.
All local benchmarks run for both the default (nil) cluster and a named cluster
("game") to verify there's no performance difference between the two paths.
Pure ETS read — the hot path for Group. Registers 10K keys, then measures 100K
random Group.lookup/3 calls.
Reports ops/sec and p50/p99/max latency.
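The measurement loop can be sketched roughly like this (`BenchSketch` and its report shape are illustrative assumptions, not the suite's actual `helpers.ex` API):

```elixir
defmodule BenchSketch do
  # Hypothetical measurement harness: run `fun` `n` times, collect
  # per-call latency in microseconds, report ops/sec plus p50/p99/max.
  def measure(n, fun) do
    latencies =
      for _ <- 1..n do
        t0 = System.monotonic_time(:microsecond)
        fun.()
        System.monotonic_time(:microsecond) - t0
      end

    sorted = Enum.sort(latencies)

    %{
      ops_per_sec: n * 1_000_000 / max(Enum.sum(latencies), 1),
      p50: percentile(sorted, 50),
      p99: percentile(sorted, 99),
      max: List.last(sorted)
    }
  end

  # Simple percentile over a pre-sorted list: index floor(n * p / 100),
  # clamped to the last element.
  def percentile(sorted, p) do
    n = length(sorted)
    Enum.at(sorted, min(n - 1, div(n * p, 100)))
  end
end
```

In the real lookup benchmark the measured fun would be something like `fn -> Group.lookup(:bench, nil, random_key()) end`, where `random_key/0` stands in for whatever key picker the suite uses.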
ETS read that returns a list. Joins 100 processes to each of 100 groups (10K
memberships total), then measures 100K random Group.members/3 calls.
Slower than lookup because each call copies a 100-element list out of ETS.
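That copy cost is a property of ETS itself: every read copies the stored term onto the caller's heap. A standalone illustration with plain `:ets` (not Group's actual tables or record layout):

```elixir
# A group row whose value is a 100-element member list. Reading it back
# copies the entire list out of ETS, which is why a members call is
# slower than a single-tuple key lookup.
table = :ets.new(:members_demo, [:set, :public, read_concurrency: true])
members = for i <- 1..100, do: {:member, i}
true = :ets.insert(table, {:group_a, members})

# `copied` is a fresh copy on this process's heap, not a reference.
[{:group_a, copied}] = :ets.lookup(table, :group_a)
```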
Measures concurrent Group.register/4 calls — each of 10K spawned processes
registers itself in parallel. Varies shard count (1, 2, 4, schedulers_online)
to show how write throughput scales with sharding.
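The concurrent-writer shape looks roughly like this (illustrative only; the `work` callback is a stand-in for the real `Group.register(:bench, nil, key, pid)` call):

```elixir
defmodule RegisterSketch do
  # Spawn `n` workers that each perform one write and report back;
  # wall time for the whole batch yields ops/sec.
  def run(n, work) do
    parent = self()
    t0 = System.monotonic_time(:microsecond)

    for i <- 1..n do
      spawn(fn ->
        work.(i)
        send(parent, {:done, i})
      end)
    end

    for _ <- 1..n do
      receive do
        {:done, _} -> :ok
      end
    end

    n * 1_000_000 / max(System.monotonic_time(:microsecond) - t0, 1)
  end
end
```

In the real benchmark each worker registers itself, and the run is repeated at shard counts 1, 2, 4, and `System.schedulers_online()`.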
Sequential register + unregister pairs from a single process. Measures the GenServer round-trip cost of two writes back-to-back (10K cycles).
Reports per-cycle latency percentiles.
Same shape as register throughput but with Group.join/4. 10K processes each
join a group concurrently, varying shard count.
Calls Group.monitor(:bench, :all), then registers 5K keys and measures the
time until all 5K :registered events are received by the monitoring process.
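The fan-out measurement reduces to a receive loop that drains events (the `{:registered, key}` message shape here is an assumption about Group's monitor event format):

```elixir
defmodule MonitorSketch do
  # Block until `n` events with the given tag have arrived, in any order.
  def await_events(0, _tag), do: :ok

  def await_events(n, tag) do
    receive do
      {^tag, _key} -> await_events(n - 1, tag)
    after
      5_000 -> {:error, {:missing, n}}
    end
  end
end
```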
Three separate BEAM VMs on 127.0.0.1 — each with its own schedulers, memory
allocator, and GC. The coordinator drives all operations via :erpc.call with
MFA (module/function/args) to the replica nodes.
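`:erpc.call/4` takes a node name plus module/function/args and returns the call's result, raising if the call fails. The replica node name and Group arguments below are illustrative; calls addressed to the local node execute in-process, so the shape can be tried without starting distribution:

```elixir
# What the coordinator does, schematically (node name is illustrative):
#   :erpc.call(:"replica1@127.0.0.1", Group, :register, [:bench, nil, key, pid])

# :erpc.call to the local node runs the MFA directly:
result = :erpc.call(node(), Kernel, :+, [1, 2])
```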
The core distributed measurement. Registers a key on replica1, then spin-polls
Group.lookup on replica2 until it appears. Repeats 1,000 times.
Reports p50/p99/max latency covering the full path: GenServer call on replica1, Erlang distribution message, GenServer cast on replica2, ETS insert.
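The receiving side's spin-poll can be sketched as a generic helper (hypothetical; the suite's actual loop lives in `distributed.ex`):

```elixir
defmodule PollSketch do
  # Call `pred` in a tight loop until it returns truthy or the deadline
  # passes. Returns {:ok, elapsed_us} or :timeout.
  def poll_until(pred, timeout_ms \\ 5_000) do
    t0 = System.monotonic_time(:microsecond)
    loop(pred, t0, t0 + timeout_ms * 1_000)
  end

  defp loop(pred, t0, deadline) do
    cond do
      pred.() -> {:ok, System.monotonic_time(:microsecond) - t0}
      System.monotonic_time(:microsecond) > deadline -> :timeout
      true -> loop(pred, t0, deadline)
    end
  end
end
```

Scenario 1 would then time something like `poll_until(fn -> key_visible?(key) end)` on replica2, where `key_visible?/1` is a hypothetical predicate wrapping `Group.lookup/3` (the lookup's miss/hit return shape is not assumed here).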
Measures how fast a new node catches up to an existing peer's state. Registers N keys on replica1 (1K and 10K), then starts Group on replica2 and polls until all N entries are visible.
Group sends all data in a single cluster_state message on peer discovery, so
this is bounded by serialization + network, not per-key round-trips.
Both replicas register 5K keys simultaneously (10K total); the benchmark then waits for full convergence, i.e. until both nodes see all 10K entries.
Measures total throughput, including the time for all replication messages to settle.
Same as scenario 1 (replication latency) but within a named cluster
("game"). Both replicas call Group.connect/2 before measuring.
Useful for verifying that the named cluster replication path has no overhead compared to the default nil cluster.
The critical distributed cleanup path. Registers 1K and 5K processes on
replica1, kills them all, then measures how long until replica2 sees zero
entries. Exercises: local DOWN handler → replicate_unregister broadcast →
remote ETS cleanup.
This scenario catches O(N²) message amplification bugs where remote nodes redundantly monitor pids and re-broadcast cleanup messages.
Sustained churn: 10 waves of 500 register+kill cycles on replica1, measuring total wall time including convergence on replica2. Simulates steady-state deploy churn where processes are constantly starting and stopping.
Same as scenario 5 but for process groups. Spawns 1K processes on replica1,
all joining the same group key, then kills them all. Measures cleanup
convergence on replica2 via the replicate_leave path.
All members hash to the same shard (single key), making this the worst case for shard contention during bulk cleanup.
```
priv/group/priv/bench/
├── mix.exs                 # depends on :group via path: "../../"
├── run_distributed.sh      # starts 3 VMs, cleans up on exit
├── README.md
├── lib/
│   ├── group_bench.ex      # CLI entry — dispatches local/distributed
│   ├── group_bench/
│   │   ├── local.ex        # 6 local benchmarks
│   │   ├── distributed.ex  # coordinator: connects + drives 7 benchmarks
│   │   ├── replica.ex      # helpers called by coordinator via :erpc
│   │   └── helpers.ex      # timing, formatting, percentile math
```
The bench suite is a standalone Mix project that depends on :group. Protocol
consolidation is enabled for realistic production performance. Distributed
benchmarks use separate OS processes (not :peer) so each node gets its own
scheduler pool and memory allocator.