Emit WDL per-stage overall metrics for Manifold #684

Open
charles-typ wants to merge 2 commits into facebookresearch:v2-beta from charles-typ:export-D108195608-to-v2-beta

Conversation

@charles-typ

Contributor

Summary:
D108110315 teaches the benchpress perf hook to collect PMU + sysstat data in
per-WDL-sub-benchmark stage directories for WDL prod_set. The raw staged CSVs
are preserved by perfpub's recursive Manifold upload, but users still need a
single processed summary per sub-benchmark, analogous to the normal
overall-metrics.csv that perfpub emits for a single benchmark.

This change adds stage-aware summary generation to perfpub:

  benchmark_metrics_<run_id>/
    memcpy_benchmark/
      overall-metrics.csv              # NEW
      perf-stat.csv
      nv-perf-collector-summary.csv
      ...
    hash_hash_benchmark/
      overall-metrics.csv              # NEW
      ...
    wdl_stage_overall_metrics.csv      # NEW aggregate index
    wdl_stage_overall_metrics.json     # NEW aggregate index
    wdl_stage_perf_summary.csv         # generic numeric summary from prior patch
    wdl_stage_perf_summary.json
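
As a rough illustration of how the layout above could be walked, the sketch below finds stage directories and enters each one temporarily. It assumes a stage is any immediate subdirectory holding at least one CSV; `find_stage_dirs` and `working_directory` are hypothetical names for illustration, not perfpub's actual functions.

```python
import contextlib
import os
from pathlib import Path


def find_stage_dirs(metrics_dir):
    """Return immediate subdirectories of metrics_dir that hold at least one CSV."""
    root = Path(metrics_dir)
    return sorted(
        child
        for child in root.iterdir()
        if child.is_dir() and any(child.glob("*.csv"))
    )


@contextlib.contextmanager
def working_directory(path):
    """Temporarily chdir into path and restore the previous cwd on exit."""
    previous = os.getcwd()
    os.chdir(path)
    try:
        yield
    finally:
        # Restore even if a reader raises, so later stages are unaffected.
        os.chdir(previous)
```

A caller would loop over `find_stage_dirs(...)` and run its existing CSV readers inside `with working_directory(stage):`, so that readers written against relative paths need no changes.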

Implementation details:

  • For each immediate stage subdir that contains CSVs, perfpub temporarily
    chdirs into that subdir and reuses the same reader functions as the top-level
    path (read_mpstat, read_memstat, read_cpufreq_*, read_perfstat,
    read_nv_perf_collector, read_arm_perf_collector, etc.). This gives each
    stage the same processed metric lines as a normal single benchmark's
    overall-metrics.csv.
  • Adds the WDL sub-benchmark score at the top when the score exists in the
    parent benchmark metrics JSON.
  • Writes per-stage overall-metrics.csv, plus top-level CSV/JSON indices for
    easy discovery in Manifold.
  • No XDB/dashboard changes. The goal is Manifold artifact usability.
  • sample_avg_from_csv() now tolerates partial CSV schemas by selecting only
    requested metric columns that exist and warning about missing ones. This is
    useful for stage dirs where a monitor didn't emit the full standard set of
    columns.
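
The partial-schema behaviour described in the last bullet can be sketched as follows. This is a minimal stand-in built on the standard-library `csv` module, not the real `sample_avg_from_csv()`; the name `sample_avg_tolerant`, its signature, and the decision to skip non-numeric cells are all assumptions made for illustration.

```python
import csv
import warnings


def sample_avg_tolerant(csv_path, wanted_columns):
    """Average the requested numeric columns, skipping ones the CSV lacks."""
    with open(csv_path, newline="") as f:
        reader = csv.DictReader(f)
        header = reader.fieldnames or []
        present = [c for c in wanted_columns if c in header]
        missing = [c for c in wanted_columns if c not in header]
        if missing:
            # Warn instead of failing so a partial stage still gets a summary.
            warnings.warn(f"{csv_path}: missing columns {missing}")
        sums = {c: 0.0 for c in present}
        counts = {c: 0 for c in present}
        for row in reader:
            for c in present:
                try:
                    sums[c] += float(row[c])
                    counts[c] += 1
                except (TypeError, ValueError):
                    continue  # blank or non-numeric cell (assumed policy: skip)
    # Columns with no usable samples are omitted rather than reported as 0.
    return {c: sums[c] / counts[c] for c in present if counts[c]}
```

Requesting a column that a stage's monitor never emitted then yields a warning and a shorter result instead of an exception.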

Differential Revision: D108195608

Summary:
The WDL bench `prod_set` job runs ~25 individual sub-benchmarks
(memcpy_benchmark, openssl, lzbench, ProtocolBench, ...) back-to-back in a
single benchpress run. Today the perf hook spans the whole run, so PMU +
sysstat data ends up smeared across all sub-benchmarks in one set of CSVs.
Distinguishing IPC, topdown breakdown, mpstat etc. per sub-benchmark
requires teasing apart timestamps after the fact, which is brittle.

This change adds an opt-in stage-aware mode to the perf hook so each
sub-benchmark gets its own folder of perf data:

  benchmark_metrics_<uuid>/
      memcpy_benchmark/
          mpstat.csv
          mem-stat.csv
          perf-stat.csv
          topdown-... .csv
          (etc -- one set per enabled perf monitor)
      hash_hash_benchmark/
          ...
      ...

The mechanism:

  1. The perf hook accepts a new option `stage_aware: true`. When set,
     `before_job` does NOT start any monitors. Instead it creates a FIFO
     under benchmark_metrics_<uuid>/perf_stage.fifo, advertises its path
     via the env var BENCHPRESS_PERF_STAGE_FIFO, and spawns a
     coordinator thread.

  2. The coordinator reads commands from the FIFO. Each
     "START <stage_name>" allocates a fresh set of perf monitors with
     that stage name as a sub-folder; each "STOP" terminates the
     monitors and writes their CSVs. Multiple START/STOP cycles are
     supported. `after_job` writes a final __EXIT__ to drain the
     coordinator.

  3. Every existing perf monitor (mpstat, memstat, netstat, perfstat,
     vmstat, cpufreq*, power, topdown -- including IntelPerfSpect/3,
     BasePerfUtil, AMDPerfUtil, ARMPerfUtil, NVPerfUtil,
     NeoVerseV3PerfUtil) gains a `subdir` constructor arg that the base
     `Monitor.gen_path` joins under benchmark_metrics_<uuid>/. Existing
     callers that don't pass `subdir` keep their flat layout, so default
     mode is byte-for-byte unchanged.

  4. `packages/wdl_bench/run_prod.sh` emits "START <bench>" before each
     sub-benchmark and "STOP" after. The emit is gated on the
     BENCHPRESS_PERF_STAGE_FIFO env var, so running prod_set without
     stage-aware mode (or without the perf hook at all) is unchanged.

  5. A new job entry `prod_set_per_stage_perf` in jobs_wdl.yml
     documents the wiring; it has the same shape as `prod_set`. To
     activate, run:

         ./benchpress run prod_set_per_stage_perf \
             -k perf -a '{"perf": {"stage_aware": true}}'
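
The START/STOP/__EXIT__ protocol and the `subdir` path handling in steps 1 to 3 can be sketched as below. This is an illustrative model under stated assumptions, not the hook's actual code: `gen_path` here is a free function rather than the `Monitor.gen_path` method, `run_coordinator`, `start_monitors`, and `stop_monitors` are hypothetical names, and treating a second START without a STOP as an implicit STOP is an assumed policy.

```python
import os


def gen_path(run_dir, filename, subdir=None):
    """Join an optional stage sub-folder under the run directory."""
    parts = [run_dir] + ([subdir] if subdir else []) + [filename]
    return os.path.join(*parts)


def run_coordinator(commands, start_monitors, stop_monitors):
    """Drive START/STOP cycles from an iterable of command lines.

    start_monitors(stage) returns a handle; stop_monitors(handle) finalizes it.
    Returns the stages whose monitors were started and then stopped.
    """
    active = None
    active_stage = None
    finished = []
    for raw in commands:
        line = raw.strip()
        if not line:
            continue
        if line == "__EXIT__":
            break
        verb, _, arg = line.partition(" ")
        stage = arg.strip()
        if verb == "START" and stage:
            if active is not None:
                # Assumed policy: a new START implicitly stops the old stage.
                stop_monitors(active)
                finished.append(active_stage)
            active, active_stage = start_monitors(stage), stage
        elif verb == "STOP" and active is not None:
            stop_monitors(active)
            finished.append(active_stage)
            active = active_stage = None
    if active is not None:
        # Drain: stop whatever is still running when the loop ends.
        stop_monitors(active)
        finished.append(active_stage)
    return finished
```

In a real hook the `commands` iterable would presumably be the FIFO opened for reading at the path advertised in BENCHPRESS_PERF_STAGE_FIFO. A FIFO reader sees end-of-file once every writer has closed, so a production coordinator would need to reopen the FIFO or keep a writer open between stages, which this sketch leaves out.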

Differential Revision: D108110315

meta-cla Bot added the CLA Signed label on Jun 10, 2026.
@meta-codesync

meta-codesync Bot commented Jun 10, 2026

@charles-typ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108195608.
