Skip to content

NNJA Conv Obs fixes#888

Open
aayushg55 wants to merge 9 commits into
NVIDIA:mainfrom
aayushg55:ag-nnja-fixes
Open

NNJA Conv Obs fixes#888
aayushg55 wants to merge 9 commits into
NVIDIA:mainfrom
aayushg55:ag-nnja-fixes

Conversation

@aayushg55
Copy link
Copy Markdown
Collaborator

@aayushg55 aayushg55 commented May 29, 2026

Earth2Studio Pull Request

Description

This PR fixes NNJA conventional observation decoding so NNJAObsConv better matches the GSI/UFS Replay conventional-observation convention.

1. Filter NNJA pres to GSI ps-like observations

NNJA prepbufr::POB was previously treated as pres for every PREPBUFR level. That is too broad: POB is often just vertical-coordinate metadata for upper-air, wind, satellite-wind, or scatterometer reports.
This now matches GSI’s obstype="ps" behavior more closely:

2. Complete the PREPBUFR message-family table

PREPBUFR_OBS_TYPES now follows the NCEP PREPBUFR Table 1.a message families more completely.
This fixes skipped merged-PREPBUFR message groups, notably:

3. Fix prepbufr.acft_profiles path and type normalization

prepbufr.acft_profiles uses a special NNJA S3 layout under .../bufr/..., unlike the standard merged prepbufr layout. The path builder now handles that source correctly.
That source also uses aircraft-profile-specific report codes in the 33x/43x/53x ranges. GSI remaps those back to standard aircraft report types when reading U/V from the aircraft-profile file:

  • 330/430/530 -> 230
  • 331/431/531 -> 231
  • 332/432/532 -> 232
  • 333/433/533 -> 233
  • 334/434/534 -> 234
  • 335/435/535 -> 235
    This is not HealDA-specific. It normalizes the special prepbufr.acft_profiles source to the same report-type convention as standard merged PREPBUFR and UFS/GSI diagnostics.
    Reference:
    https://github.com/NOAA-EMC/GSI/blob/860d13740352004fca0136a8c3d0ac9dea30e0da/src/gsi/read_prepbufr.F90#L730-L745
    Note: for most downstream users, standard merged prepbufr should remain the default. prepbufr.acft_profiles is a special aircraft-profile product; use it only when that profile structure is specifically needed.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.
  • The CHANGELOG.md is up to date with these changes.
  • An issue is linked to this pull request.
  • Assess and address Greptile feedback (AI code review bot for guidance; use discretion, addressing all feedback is not required).

Dependencies

aayushg55 added 5 commits May 28, 2026 18:17
Ensure NNJA pressure observations represent GSI ps-like station pressure rather than every PrepBUFR POB level, preserving CAT metadata so sounding levels and wind metadata are excluded before conversion.

Commit message authored by AI
Route `prepbufr.acft_profiles` through the NNJA `bufr/` prefix and normalize aircraft profile report types to their GSI U/V equivalents so profile winds decode consistently with merged PrepBUFR processing.

Commit message authored by AI
@aayushg55
Copy link
Copy Markdown
Collaborator Author

aayushg55 commented May 29, 2026

image For conv_uv, now NNJA is correctly a superset of UFS replay. We see 233 (imortant) and other minor obs type (229) that were missing before. image image image

Comment thread earth2studio/data/nnja.py
435: 235,
535: 235,
}

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why remapping the type in the data source output? If the remapping is for HealDA only, i think it should go in the model in order to keep the data source as generic as possible. @NickGeneva

Copy link
Copy Markdown
Collaborator Author

@aayushg55 aayushg55 May 29, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This remap is not HealDA-specific -- it follows GSI’s handling of the prepbufr.acft_profiles product (not the main prepbufr files). However, this remapping is only appropriate if NNJAObsConv is trying to expose GSI/PREPBUFR-compatible report types.

GSI treats these files specially:

Standard prepbufr is the main merged conventional obs file.
When aircraft_t_bc is enabled, GSI skips aircraft-profile-type observations from the normal prepbufr path.
It then reads the aircraft profile file separately (acft_profl_file) and remaps the profile-specific 33x/43x/53x codes to standard aircraft report types (23x). You can think of the prepbufr as being the full homogenized set of obs, and the prepbufr.acft_profiles as being a more detailed version focused just on the aircraft data.

  • 330/430/530 -> 230
  • 331/431/531 -> 231
  • 332/432/532 -> 232
  • 333/433/533 -> 233
  • 334/434/534 -> 234
  • 335/435/535 -> 235

The tradeoff is that this collapses profile-stage information. If we want NNJAObsConv(source="prepbufr.acft_profiles") to preserve raw profile codes for users interested in the flight-level/ascending/descending distinction, then the better API would be to add another column.

@aayushg55 aayushg55 requested a review from NickGeneva May 29, 2026 20:00
@aayushg55 aayushg55 marked this pull request as ready for review May 29, 2026 20:00
@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented May 29, 2026

Greptile Summary

This PR aligns NNJAObsConv more closely with GSI/UFS PrepBUFR conventions by filtering surface-pressure observations to GSI obstype="ps" report types, expanding the PREPBUFR_OBS_TYPES table to cover the full NCEP Table 1.a message families, and fixing both the prepbufr.acft_profiles S3 path and its aircraft profile report-type codes.

  • Pressure filtering (lexicon/nnja.py): The pres modifier now guards on type ∈ {120,180,181,187}, class ∈ {ADPUPA,ADPSFC,SFCSHP}, level_cat == 0, and obs ≥ 500 hPa before the hPa→Pa conversion, matching GSI's obstype="ps" selection. The required level_cat field is propagated from OBS_CAT descriptors through _extract_subset and added to _NNJA_CONV_SCHEMA.
  • PREPBUFR_OBS_TYPES expansion (utils_bufr.py): Adds 103 AIRCAR, 106 PROFLR, 108 SATEMP, 111 SFCBOG, 114 ERS1DA, 115 GOESND, 116 QKSWND, 117 MSONET, 118 GPSIPW, 119 RASSDA, 120 WDSATR; also corrects the previous 112→GPSIPW mis-labeling to 112→SPSSMI / 118→GPSIPW per the NCEP table. This resolves the previously flagged missing PROFLR (106) entry.
  • prepbufr.acft_profiles fixes (nnja.py): Switches the S3 sub-directory from prepbufr.acft_profiles to bufr for that special source, and applies _ACFT_PROFILE_UV_TYPE_MAP to remap 33x/43x/53x report codes back to the standard 23x range after decoding.

Confidence Score: 5/5

Safe to merge; all three fix areas are well-scoped and backed by GSI reference citations and new unit tests.

The changes are narrowly targeted corrections (filtering logic, table completeness, path/type normalization) with clear upstream references. New tests cover the main new code paths. No regressions were identified in the existing decode pipeline.

No files require special attention.

Important Files Changed

Filename Overview
earth2studio/data/nnja.py Adds OBS_CAT tracking through _extract_subset, emits level_cat per row, builds corrected prepbufr.acft_profiles S3 URI (bufr sub-dir), and applies _ACFT_PROFILE_UV_TYPE_MAP remap after finalization. Logic is sound.
earth2studio/data/utils_bufr.py Expands PREPBUFR_OBS_TYPES to match NCEP Table 1.a more completely; fixes 112→SPSSMI / 118→GPSIPW numbering and adds 103/106/108/111/114-117/119-120. Addresses the previously flagged missing PROFLR (106) entry.
earth2studio/lexicon/nnja.py Introduces station-pressure filtering in the pres modifier guarded by column presence check; existing quality-mark tautology (0–15) noted in prior thread. No new bugs.
test/data/test_nnja.py Adds level_cat to mock data and two new targeted tests for the pressure-filtering modifier; coverage looks appropriate for the changed paths.

Reviews (2): Last reviewed commit: "Merge branch 'main' of github.com:NVIDIA..." | Re-trigger Greptile

Comment thread earth2studio/lexicon/nnja.py
@aayushg55
Copy link
Copy Markdown
Collaborator Author

@greptile-apps

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants