Emit WDL per-stage overall metrics for Manifold by charles-typ · Pull Request #684 · facebookresearch/DCPerf

charles-typ · 2026-06-10T22:07:42Z

Summary:
D108110315 teaches the benchpress perf hook to collect PMU + sysstat data in
per-WDL-sub-benchmark stage directories for WDL prod_set. The raw staged CSVs
are preserved by perfpub's recursive Manifold upload, but users still need a
single processed summary per sub-benchmark, analogous to the normal
overall-metrics.csv that perfpub emits for a single benchmark.

This change adds stage-aware summary generation to perfpub:

benchmark_metrics_<run_id>/
memcpy_benchmark/
overall-metrics.csv # NEW
perf-stat.csv
nv-perf-collector-summary.csv
...
hash_hash_benchmark/
overall-metrics.csv # NEW
...
wdl_stage_overall_metrics.csv # NEW aggregate index
wdl_stage_overall_metrics.json # NEW aggregate index
wdl_stage_perf_summary.csv # generic numeric summary from prior patch
wdl_stage_perf_summary.json

Implementation details:

For each immediate stage subdir that contains CSVs, perfpub temporarily
chdirs into that subdir and reuses the same reader functions as the top-level
path (read_mpstat, read_memstat, read_cpufreq_*, read_perfstat,
read_nv_perf_collector, read_arm_perf_collector, etc.). This gives each
stage the same processed metric lines as a normal single benchmark's
overall-metrics.csv.
Adds the WDL sub-benchmark score at the top when the score exists in the
parent benchmark metrics JSON.
Writes per-stage overall-metrics.csv, plus top-level CSV/JSON indices for
easy discovery in Manifold.
No XDB/dashboard changes. The goal is Manifold artifact usability.
sample_avg_from_csv() now tolerates partial CSV schemas by selecting only
requested metric columns that exist and warning about missing ones. This is
useful for stage dirs where a monitor didn't emit the full standard set of
columns.

Differential Revision: D108195608

Summary: The WDL bench `prod_set` job runs ~25 individual sub-benchmarks (memcpy_benchmark, openssl, lzbench, ProtocolBench, ...) back-to-back in a single benchpress run. Today the perf hook spans the whole run, so PMU + sysstat data ends up smeared across all sub-benchmarks in one set of CSVs. Distinguishing IPC, topdown breakdown, mpstat etc. per sub-benchmark requires teasing apart timestamps after the fact, which is brittle. This change adds an opt-in stage-aware mode to the perf hook so each sub-benchmark gets its own folder of perf data: benchmark_metrics_<uuid>/ memcpy_benchmark/ mpstat.csv mem-stat.csv perf-stat.csv topdown-... .csv (etc -- one set per enabled perf monitor) hash_hash_benchmark/ ... ... The mechanism: 1. The perf hook accepts a new option `stage_aware: true`. When set, `before_job` does NOT start any monitors. Instead it creates a FIFO under benchmark_metrics_<uuid>/perf_stage.fifo, advertises its path via the env var BENCHPRESS_PERF_STAGE_FIFO, and spawns a coordinator thread. 2. The coordinator reads commands from the FIFO. Each "START <stage_name>" allocates a fresh set of perf monitors with that stage name as a sub-folder; each "STOP" terminates the monitors and writes their CSVs. Multiple START/STOP cycles are supported. `after_job` writes a final __EXIT__ to drain the coordinator. 3. Every existing perf monitor (mpstat, memstat, netstat, perfstat, vmstat, cpufreq*, power, topdown -- including IntelPerfSpect/3, BasePerfUtil, AMDPerfUtil, ARMPerfUtil, NVPerfUtil, NeoVerseV3PerfUtil) gains a `subdir` constructor arg that the base `Monitor.gen_path` joins under benchmark_metrics_<uuid>/. Existing callers that don't pass `subdir` keep their flat layout, so default mode is byte-for-byte unchanged. 4. `packages/wdl_bench/run_prod.sh` emits "START <bench>" before each sub-benchmark and "STOP" after. The emit is gated on the BENCHPRESS_PERF_STAGE_FIFO env var, so running prod_set without stage-aware mode (or without the perf hook at all) is unchanged. 5. A new job entry `prod_set_per_stage_perf` in jobs_wdl.yml documents the wiring; it has the same shape as `prod_set`. To activate, run: ./benchpress run prod_set_per_stage_perf \\ -k perf -a '{"perf": {"stage_aware": true}}' Differential Revision: D108110315

Summary: D108110315 teaches the benchpress perf hook to collect PMU + sysstat data in per-WDL-sub-benchmark stage directories for WDL prod_set. The raw staged CSVs are preserved by perfpub's recursive Manifold upload, but users still need a single processed summary per sub-benchmark, analogous to the normal `overall-metrics.csv` that perfpub emits for a single benchmark. This change adds stage-aware summary generation to perfpub: benchmark_metrics_<run_id>/ memcpy_benchmark/ overall-metrics.csv # NEW perf-stat.csv nv-perf-collector-summary.csv ... hash_hash_benchmark/ overall-metrics.csv # NEW ... wdl_stage_overall_metrics.csv # NEW aggregate index wdl_stage_overall_metrics.json # NEW aggregate index wdl_stage_perf_summary.csv # generic numeric summary from prior patch wdl_stage_perf_summary.json Implementation details: - For each immediate stage subdir that contains CSVs, perfpub temporarily chdirs into that subdir and reuses the same reader functions as the top-level path (`read_mpstat`, `read_memstat`, `read_cpufreq_*`, `read_perfstat`, `read_nv_perf_collector`, `read_arm_perf_collector`, etc.). This gives each stage the same processed metric lines as a normal single benchmark's `overall-metrics.csv`. - Adds the WDL sub-benchmark score at the top when the score exists in the parent benchmark metrics JSON. - Writes per-stage `overall-metrics.csv`, plus top-level CSV/JSON indices for easy discovery in Manifold. - No XDB/dashboard changes. The goal is Manifold artifact usability. - `sample_avg_from_csv()` now tolerates partial CSV schemas by selecting only requested metric columns that exist and warning about missing ones. This is useful for stage dirs where a monitor didn't emit the full standard set of columns. Differential Revision: D108195608

meta-codesync · 2026-06-10T22:08:00Z

@charles-typ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108195608.

charles-typ added 2 commits June 10, 2026 15:07

meta-cla Bot added the CLA Signed This label is managed by the Facebook bot. Authors need to sign the CLA before a PR can be reviewed. label Jun 10, 2026

meta-codesync Bot added the meta-exported label Jun 10, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Emit WDL per-stage overall metrics for Manifold#684

Emit WDL per-stage overall metrics for Manifold#684
charles-typ wants to merge 2 commits into
facebookresearch:v2-betafrom
charles-typ:export-D108195608-to-v2-beta

charles-typ commented Jun 10, 2026

Uh oh!

meta-codesync Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Uh oh!

Conversation

charles-typ commented Jun 10, 2026

Uh oh!

meta-codesync Bot commented Jun 10, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant