NNJA Conv Obs fixes #888
Ensure NNJA pressure observations represent GSI ps-like station pressure rather than every PrepBUFR POB level, preserving CAT metadata so sounding levels and wind metadata are excluded before conversion. Commit message authored by AI
Route `prepbufr.acft_profiles` through the NNJA `bufr/` prefix and normalize aircraft profile report types to their GSI U/V equivalents so profile winds decode consistently with merged PrepBUFR processing. Commit message authored by AI
    435: 235,
    535: 235,
}
Why remap the type in the data source output? If the remapping is for HealDA only, I think it should go in the model, to keep the data source as generic as possible. @NickGeneva
This remap is not HealDA-specific -- it follows GSI’s handling of the prepbufr.acft_profiles product (not the main prepbufr files). However, this remapping is only appropriate if NNJAObsConv is trying to expose GSI/PREPBUFR-compatible report types.
GSI treats these files specially:
Standard prepbufr is the main merged conventional obs file.
When aircraft_t_bc is enabled, GSI skips aircraft-profile-type observations from the normal prepbufr path.
It then reads the aircraft profile file separately (acft_profl_file) and remaps the profile-specific 33x/43x/53x codes to standard aircraft report types (23x). You can think of the prepbufr as being the full homogenized set of obs, and the prepbufr.acft_profiles as being a more detailed version focused just on the aircraft data.
- 330/430/530 -> 230
- 331/431/531 -> 231
- 332/432/532 -> 232
- 333/433/533 -> 233
- 334/434/534 -> 234
- 335/435/535 -> 235
The tradeoff is that this collapses profile-stage information. If we want NNJAObsConv(source="prepbufr.acft_profiles") to preserve raw profile codes for users interested in the flight-level/ascending/descending distinction, then the better API would be to add another column.
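To make the "another column" option concrete, here is a minimal sketch. It assumes decoded rows are plain dicts with a `type` key; `ACFT_PROFILE_UV_TYPE_MAP`, `remap_report_type`, and the `type_raw` column are hypothetical names used for illustration, not the code in this PR.

```python
# Sketch only: rebuild the 33x/43x/53x -> 23x table listed above (x = 0..5).
ACFT_PROFILE_UV_TYPE_MAP = {
    base + x: 230 + x
    for base in (330, 430, 530)
    for x in range(6)
}


def remap_report_type(rows):
    """Return copies of `rows` with `type` normalized and the raw code kept.

    `type_raw` is a hypothetical extra column that preserves the
    profile-specific code so the profile-stage distinction is not lost.
    """
    out = []
    for row in rows:
        new_row = dict(row)
        new_row["type_raw"] = row["type"]
        # Codes outside the profile ranges pass through unchanged.
        new_row["type"] = ACFT_PROFILE_UV_TYPE_MAP.get(row["type"], row["type"])
        out.append(new_row)
    return out


remapped = remap_report_type([{"type": 431}, {"type": 535}, {"type": 233}])
```

With this shape, users who want GSI-compatible types read `type`, and users who care about the profile stage read `type_raw`.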
Greptile Summary
This PR aligns
| Filename | Overview |
|---|---|
| earth2studio/data/nnja.py | Adds OBS_CAT tracking through _extract_subset, emits level_cat per row, builds corrected prepbufr.acft_profiles S3 URI (bufr sub-dir), and applies _ACFT_PROFILE_UV_TYPE_MAP remap after finalization. Logic is sound. |
| earth2studio/data/utils_bufr.py | Expands PREPBUFR_OBS_TYPES to match NCEP Table 1.a more completely; fixes 112→SPSSMI / 118→GPSIPW numbering and adds 103/106/108/111/114-117/119-120. Addresses the previously flagged missing PROFLR (106) entry. |
| earth2studio/lexicon/nnja.py | Introduces station-pressure filtering in the pres modifier guarded by column presence check; existing quality-mark tautology (0–15) noted in prior thread. No new bugs. |
| test/data/test_nnja.py | Adds level_cat to mock data and two new targeted tests for the pressure-filtering modifier; coverage looks appropriate for the changed paths. |
Reviews (2): Last reviewed commit: "Merge branch 'main' of github.com:NVIDIA..."




Earth2Studio Pull Request
Description
This PR fixes NNJA conventional observation decoding so `NNJAObsConv` better matches the GSI/UFS Replay conventional-observation convention.

1. Filter NNJA `pres` to GSI `ps`-like observations

NNJA `prepbufr::POB` was previously treated as `pres` for every PREPBUFR level. That is too broad: `POB` is often just vertical-coordinate metadata for upper-air, wind, satellite-wind, or scatterometer reports.

This now matches GSI's `obstype="ps"` behavior more closely:

- `120`, `180`, `181`, `187`
- `level_cat == 0`
- `POB < 500 hPa`

This is source-level normalization.
Reference:
https://github.com/NOAA-EMC/GSI/blob/860d13740352004fca0136a8c3d0ac9dea30e0da/src/gsi/read_prepbufr.F90#L1898-L1904
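As a rough illustration of the filter in section 1, the sketch below applies the three criteria to plain dict rows. It is not the lexicon modifier from this PR. The row keys (`type`, `level_cat`, `pob_hpa`) and the helper name are hypothetical, and it assumes that rows with `POB < 500 hPa` are the ones rejected.

```python
# Report types treated as ps-like station pressure in this sketch.
PS_REPORT_TYPES = frozenset({120, 180, 181, 187})
# Assumption: pressures below this cutoff are rejected, not kept.
MIN_STATION_PRESSURE_HPA = 500.0


def is_station_pressure(row):
    """True if a decoded row passes all three ps-like checks."""
    return (
        row["type"] in PS_REPORT_TYPES
        and row["level_cat"] == 0
        and row["pob_hpa"] >= MIN_STATION_PRESSURE_HPA
    )


rows = [
    {"type": 181, "level_cat": 0, "pob_hpa": 1002.3},  # passes all checks
    {"type": 120, "level_cat": 1, "pob_hpa": 850.0},   # non-zero CAT level
    {"type": 220, "level_cat": 0, "pob_hpa": 990.0},   # not a ps report type
    {"type": 187, "level_cat": 0, "pob_hpa": 480.0},   # below the cutoff
]
ps_rows = [r for r in rows if is_station_pressure(r)]
```

Only the first row survives, which is the intended effect: `POB` values that are just vertical-coordinate metadata no longer show up as `pres`.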
2. Complete the PREPBUFR message-family table
`PREPBUFR_OBS_TYPES` now follows the NCEP PREPBUFR Table 1.a message families more completely.

This fixes skipped merged-PREPBUFR message groups, notably:

- `103: AIRCAR`, which carries MDCRS/ACARS aircraft observations and restores missing U/V report type `233`
- `106: PROFLR`, which restores rare profiler/PILOT wind type `229`

Reference:
https://www.emc.ncep.noaa.gov/mmb/data_processing/prepbufr.doc/table_1.htm
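The effect of a missing table entry can be sketched as follows. This assumes, for illustration only, that the decoder keeps a message when its family name appears in the table and skips it otherwise. `OBS_TYPES_SUBSET` holds only the entries named in this PR, and `split_by_known_family` is a hypothetical helper, not part of `utils_bufr.py`.

```python
# Subset of the message-family table, limited to entries named in this PR.
OBS_TYPES_SUBSET = {103: "AIRCAR", 106: "PROFLR", 112: "SPSSMI", 118: "GPSIPW"}


def split_by_known_family(message_families, table):
    """Partition message family names into (kept, skipped) by table membership."""
    known = set(table.values())
    kept = [m for m in message_families if m in known]
    skipped = [m for m in message_families if m not in known]
    return kept, skipped


# Emulate the table before this PR by dropping the two newly added entries.
table_before = {k: v for k, v in OBS_TYPES_SUBSET.items() if k not in (103, 106)}
messages = ["AIRCAR", "SPSSMI", "PROFLR"]

kept_before, skipped_before = split_by_known_family(messages, table_before)
kept_after, skipped_after = split_by_known_family(messages, OBS_TYPES_SUBSET)
```

Under that assumption, `AIRCAR` and `PROFLR` messages are silently dropped with the incomplete table and decoded with the completed one.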
3. Fix `prepbufr.acft_profiles` path and type normalization

`prepbufr.acft_profiles` uses a special NNJA S3 layout under .../bufr/..., unlike the standard merged `prepbufr` layout. The path builder now handles that source correctly.

That source also uses aircraft-profile-specific report codes in the `33x/43x/53x` ranges. GSI remaps those back to standard aircraft report types when reading U/V from the aircraft-profile file:

- `330/430/530 -> 230`
- `331/431/531 -> 231`
- `332/432/532 -> 232`
- `333/433/533 -> 233`
- `334/434/534 -> 234`
- `335/435/535 -> 235`

This is not HealDA-specific. It normalizes the special `prepbufr.acft_profiles` source to the same report-type convention as standard merged PREPBUFR and UFS/GSI diagnostics.

Reference:
https://github.com/NOAA-EMC/GSI/blob/860d13740352004fca0136a8c3d0ac9dea30e0da/src/gsi/read_prepbufr.F90#L730-L745
Note: for most downstream users, standard merged `prepbufr` should remain the default. `prepbufr.acft_profiles` is a special aircraft-profile product; use it only when that profile structure is specifically needed.

Checklist
Dependencies