Emit WDL per-stage overall metrics for Manifold#684
Open
charles-typ wants to merge 2 commits into
Open
Conversation
Summary:
The WDL bench `prod_set` job runs ~25 individual sub-benchmarks
(memcpy_benchmark, openssl, lzbench, ProtocolBench, ...) back-to-back in a
single benchpress run. Today the perf hook spans the whole run, so PMU +
sysstat data ends up smeared across all sub-benchmarks in one set of CSVs.
Distinguishing IPC, topdown breakdown, mpstat etc. per sub-benchmark
requires teasing apart timestamps after the fact, which is brittle.
This change adds an opt-in stage-aware mode to the perf hook so each
sub-benchmark gets its own folder of perf data:
benchmark_metrics_<uuid>/
memcpy_benchmark/
mpstat.csv
mem-stat.csv
perf-stat.csv
topdown-... .csv
(etc -- one set per enabled perf monitor)
hash_hash_benchmark/
...
...
The mechanism:
1. The perf hook accepts a new option `stage_aware: true`. When set,
`before_job` does NOT start any monitors. Instead it creates a FIFO
under benchmark_metrics_<uuid>/perf_stage.fifo, advertises its path
via the env var BENCHPRESS_PERF_STAGE_FIFO, and spawns a
coordinator thread.
2. The coordinator reads commands from the FIFO. Each
"START <stage_name>" allocates a fresh set of perf monitors with
that stage name as a sub-folder; each "STOP" terminates the
monitors and writes their CSVs. Multiple START/STOP cycles are
supported. `after_job` writes a final __EXIT__ to drain the
coordinator.
3. Every existing perf monitor (mpstat, memstat, netstat, perfstat,
vmstat, cpufreq*, power, topdown -- including IntelPerfSpect/3,
BasePerfUtil, AMDPerfUtil, ARMPerfUtil, NVPerfUtil,
NeoVerseV3PerfUtil) gains a `subdir` constructor arg that the base
`Monitor.gen_path` joins under benchmark_metrics_<uuid>/. Existing
callers that don't pass `subdir` keep their flat layout, so default
mode is byte-for-byte unchanged.
4. `packages/wdl_bench/run_prod.sh` emits "START <bench>" before each
sub-benchmark and "STOP" after. The emit is gated on the
BENCHPRESS_PERF_STAGE_FIFO env var, so running prod_set without
stage-aware mode (or without the perf hook at all) is unchanged.
5. A new job entry `prod_set_per_stage_perf` in jobs_wdl.yml
documents the wiring; it has the same shape as `prod_set`. To
activate, run:
./benchpress run prod_set_per_stage_perf \\
-k perf -a '{"perf": {"stage_aware": true}}'
Differential Revision: D108110315
Summary:
D108110315 teaches the benchpress perf hook to collect PMU + sysstat data in
per-WDL-sub-benchmark stage directories for WDL prod_set. The raw staged CSVs
are preserved by perfpub's recursive Manifold upload, but users still need a
single processed summary per sub-benchmark, analogous to the normal
`overall-metrics.csv` that perfpub emits for a single benchmark.
This change adds stage-aware summary generation to perfpub:
benchmark_metrics_<run_id>/
memcpy_benchmark/
overall-metrics.csv # NEW
perf-stat.csv
nv-perf-collector-summary.csv
...
hash_hash_benchmark/
overall-metrics.csv # NEW
...
wdl_stage_overall_metrics.csv # NEW aggregate index
wdl_stage_overall_metrics.json # NEW aggregate index
wdl_stage_perf_summary.csv # generic numeric summary from prior patch
wdl_stage_perf_summary.json
Implementation details:
- For each immediate stage subdir that contains CSVs, perfpub temporarily
chdirs into that subdir and reuses the same reader functions as the top-level
path (`read_mpstat`, `read_memstat`, `read_cpufreq_*`, `read_perfstat`,
`read_nv_perf_collector`, `read_arm_perf_collector`, etc.). This gives each
stage the same processed metric lines as a normal single benchmark's
`overall-metrics.csv`.
- Adds the WDL sub-benchmark score at the top when the score exists in the
parent benchmark metrics JSON.
- Writes per-stage `overall-metrics.csv`, plus top-level CSV/JSON indices for
easy discovery in Manifold.
- No XDB/dashboard changes. The goal is Manifold artifact usability.
- `sample_avg_from_csv()` now tolerates partial CSV schemas by selecting only
requested metric columns that exist and warning about missing ones. This is
useful for stage dirs where a monitor didn't emit the full standard set of
columns.
Differential Revision: D108195608
|
@charles-typ has exported this pull request. If you are a Meta employee, you can view the originating Diff in D108195608. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary:
D108110315 teaches the benchpress perf hook to collect PMU + sysstat data in
per-WDL-sub-benchmark stage directories for WDL prod_set. The raw staged CSVs
are preserved by perfpub's recursive Manifold upload, but users still need a
single processed summary per sub-benchmark, analogous to the normal
overall-metrics.csvthat perfpub emits for a single benchmark.This change adds stage-aware summary generation to perfpub:
benchmark_metrics_<run_id>/
memcpy_benchmark/
overall-metrics.csv # NEW
perf-stat.csv
nv-perf-collector-summary.csv
...
hash_hash_benchmark/
overall-metrics.csv # NEW
...
wdl_stage_overall_metrics.csv # NEW aggregate index
wdl_stage_overall_metrics.json # NEW aggregate index
wdl_stage_perf_summary.csv # generic numeric summary from prior patch
wdl_stage_perf_summary.json
Implementation details:
chdirs into that subdir and reuses the same reader functions as the top-level
path (
read_mpstat,read_memstat,read_cpufreq_*,read_perfstat,read_nv_perf_collector,read_arm_perf_collector, etc.). This gives eachstage the same processed metric lines as a normal single benchmark's
overall-metrics.csv.parent benchmark metrics JSON.
overall-metrics.csv, plus top-level CSV/JSON indices foreasy discovery in Manifold.
sample_avg_from_csv()now tolerates partial CSV schemas by selecting onlyrequested metric columns that exist and warning about missing ones. This is
useful for stage dirs where a monitor didn't emit the full standard set of
columns.
Differential Revision: D108195608