In experimental data acquisition, we often need to batch together many measurements/"shots" taken at the same setpoints (inputs). Duplicating their general metadata for every shot is cumbersome, and reading such files during analysis becomes highly non-contiguous.
## Meshes
In the Hardware Aware AI (HAAI) beamline data acquisition schema draft by @ericcropp, SLAC started adding extra dimensions to the records (variables) inside a snapshot to capture multiple shots.
Example: a 2D quad scan (to measure 4D emittance) prepends two extra "batch" dimensions, as the slowest-varying indices, to each data record's feature dimensions.
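A minimal sketch of this layout, with hypothetical shapes and names that are not taken from the actual HAAI schema draft:

```python
import numpy as np

# Hypothetical example: a beam image record with feature dimensions
# (ny, nx), scanned over a 5 x 7 grid of quad setpoints.
n_k1, n_k2 = 5, 7    # the two "batch" dimensions (slowest-varying indices)
ny, nx = 480, 640    # feature dimensions of a single shot

# The batch dimensions are prepended, so one record in the snapshot
# holds all shots of the scan:
images = np.zeros((n_k1, n_k2, ny, nx), dtype=np.float32)

# A single shot at setpoint (i, j) is then the contiguous slice:
i, j = 2, 4
shot = images[i, j]  # shape (ny, nx)
```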
## Particles
A natural way to extend openPMD dataframes / tables is to add an extra column (1D record) for every batch dimension and to concatenate the data of all batches.
The new columns then identify which particle (row) belongs to which batch.
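A sketch of this table extension for a single batch dimension, with assumed record names (`x`, `px`, `batch_index`) chosen purely for illustration:

```python
import numpy as np

# Hypothetical particle batches: each entry is one shot's particle table.
batches = [
    {"x": np.random.rand(1000), "px": np.random.rand(1000)},
    {"x": np.random.rand(1200), "px": np.random.rand(1200)},
]

# Concatenate the data of all batches and add an extra 1D record
# ("batch_index") identifying which batch each particle (row) belongs to.
x = np.concatenate([b["x"] for b in batches])
px = np.concatenate([b["px"] for b in batches])
batch_index = np.concatenate(
    [np.full(len(b["x"]), i, dtype=np.int64) for i, b in enumerate(batches)]
)

# Selecting all particles of a given batch is then a simple mask:
mask = batch_index == 1
x_batch1 = x[mask]
```

For a multi-dimensional batch such as the 2D quad scan above, one such column would be added per batch dimension.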
## Possible Complication
As a real-life complication, not every "diagnostic"/"shot" has the same output frequency: in a batch of 100 shots, some records contribute only every N-th time, and they might not all start at exactly the same time.
Possible approach: as long as the batch can be time-aligned in some way and the frequency of contributions to records is stable within the batch, one can:
- use a global shot (e.g., the first common shot) as the openPMD snapshot number, e.g., 100
- add an attribute stating the maximum number of entries in the batch, e.g., "50 snapshots in this batch"
- store an attribute similar to a NumPy slice / interval on each record to describe its contribution slicing, e.g., "105:121:5": this record has global snapshots 105, 110, 115, and 120 inside the 100-150 shot batch interval (see the sketch below)
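A small sketch of how a reader could expand such a slice attribute into the global snapshot numbers a record contributes to; the helper name is hypothetical:

```python
import numpy as np

def slice_to_snapshots(spec: str) -> np.ndarray:
    """Expand a 'start:stop:step' contribution-slice attribute
    (e.g., '105:121:5') into the global snapshot numbers it covers."""
    start, stop, step = (int(s) for s in spec.split(":"))
    return np.arange(start, stop, step)

# The record with attribute "105:121:5" contributes to global
# snapshots 105, 110, 115, and 120 of the 100-150 batch interval:
print(slice_to_snapshots("105:121:5"))  # [105 110 115 120]
```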
Discussed on March 4, 2026: This feature did not resonate well in the HAAI meeting, as it considerably complicates the logic for both human and machine reading. We would limit a batching implementation to equal contributions.