Fix #624: Refine ExtraFieldWarning handling in validation by sejalpunwatkar · Pull Request #1443 · hdmf-dev/hdmf

sejalpunwatkar · 2026-04-18T17:59:39Z

Motivation

Fixes #624

This PR refines how ExtraFieldWarning is generated during validation.

Currently, some attributes that are valid in extended or inherited specifications may still be incorrectly flagged as unexpected. Additionally, unexpected child elements are not always consistently reported depending on matching conditions.

This PR aims to:

Improve handling of ExtraFieldWarning for attributes across related specs
Ensure unexpected elements are consistently reported during validation

How to test the behavior?

pytest tests/unit/validator_tests/test_validate.py
pytest tests/unit/common/test_table.py

Focus on:

TestExtraFieldIntersection
validation behavior for unexpected attributes and elements

Checklist

Did you update CHANGELOG.md with your changes?
Does the PR clearly describe the problem and the solution?
Have you reviewed our Contributing Guide?
Does the PR use "Fix #XXX" notation to tell GitHub to close the relevant issue numbered XXX when the PR is merged?

sejalpunwatkar · 2026-04-18T18:09:18Z

@rly I opened this new PR after closing the previous one because it had a large diff (-283) due to merge conflicts, which made it hard to review.

In this version, I kept the changes minimal and focused on aligning with the intended behavior from #542 and #624 (warnings instead of errors).

Right now I’m seeing two edge cases:

An attribute like resolution (defined in a parent/extended spec) is still being flagged as an extra field
An unmatched dataset is not always producing an "Unexpected element" warning

Before I refine further, I wanted to confirm if these cases should be handled at the validator level or if this is expected behavior from the spec/matcher side. Thank You!

codecov · 2026-04-18T18:24:03Z

Codecov Report

❌ Patch coverage is 4.03226% with 119 lines in your changes missing coverage. Please review.
✅ Project coverage is 37.48%. Comparing base (0d61982) to head (47699ef).

Files with missing lines	Patch %	Lines
src/hdmf/validate/validator.py	0.83%	119 Missing ⚠️

❗ There is a different number of reports uploaded between BASE (0d61982) and HEAD (47699ef). Click for more details.

HEAD has 9 uploads less than BASE

Flag BASE (0d61982) HEAD (47699ef)

10 1

Additional details and impacted files

@@             Coverage Diff             @@
##              dev    #1443       +/-   ##
===========================================
- Coverage   93.18%   37.48%   -55.71%     
===========================================
  Files          41       41               
  Lines       10176    10291      +115     
  Branches     2103     2142       +39     
===========================================
- Hits         9483     3858     -5625     
- Misses        415     6071     +5656     
- Partials      278      362       +84

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

…f-dev#1429) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

for more information, see https://pre-commit.ci

…dmf-dev#1439) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: rly <310197+rly@users.noreply.github.com>

…r shape error

…umns and extended type attributes - Skip unmatched typed children whose type is expected by parent spec hierarchy (MissingDataType covers them) - Skip unmatched untyped children in DynamicTable subtypes (columns like id, foo, VectorData etc.) - Skip unmatched untyped children when spec defines untyped dataset children (wildcard datasets) - Check subtypes when looking up attributes to avoid false ExtraFieldWarning for extended types - Return early from AttributeValidator after ExpectedArrayError to avoid duplicate errors

rly · 2026-04-24T07:19:13Z

An attribute like resolution (defined in a parent/extended spec) is still being flagged as an extra field

This should not be flagged as an extra field. Validation should happen against the full, extended spec, not the base VectorData spec.

An unmatched dataset is not always producing an "Unexpected element" warning

Could you provide concrete examples of these? It should.

sejalpunwatkar · 2026-04-25T08:07:20Z

Thank you for the feedback! I've moved resolution into the inline dataset spec within GroupWithBase and removed the subtype lookup. I'll look at #542 for the ValidationWarning/UserWarning approach and update accordingly.

rly · 2026-04-30T06:54:49Z

Thanks for the changes @sejalpunwatkar . I'll finish reviewing later this week. In the meantime, could you address the ruff errors? Also there is a significant amount of new code that is not covered by tests. Could you review that code and ensure that it is necessary to implement the fix here? Some of the new code looks like it is hard-coded against some of the attributes that we have in existing tests. Instead of excluding specially named fields during validation, I think the tests should be updated. Feel free to wait on that until I have the bandwidth to do a more complete review.

sejalpunwatkar · 2026-05-02T10:40:34Z

Thank you for the feedback @rly ! I'll fix the ruff errors and remove the hardcoded field names; those were added as a workaround and I agree the tests should be updated instead. I'll hold off on further changes until your complete review so I can address everything at once.

…f-dev#1429) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

…#1440) * Set reasonable default chunk sizes (~4 MB) when chunks=True h5py's default auto-chunking targets ~32 KB per chunk, which is too small for cloud access where each chunk requires a separate HTTP range request. Add _compute_chunk_shape() that targets ~4 MB chunks (in the recommended 2-16 MB range for cloud-hosted files). The method is applied in __list_fill__, __setup_chunked_dset__, and __setup_empty_dset__ whenever chunks=True (boolean). Also ensures chunks=True is explicitly set when maxshape forces chunking, so the compute logic always triggers. Closes hdmf-dev#1438 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update src/hdmf/backends/hdf5/h5tools.py Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com> * Address review: public API, mesoscale-safe chunking, CHANGELOG - Rename _compute_chunk_shape → compute_default_chunk_shape so users can inspect or override the shape that would be used when chunks=True. - Add optional neurodata_type hook (unused in the default implementation) so subclasses can specialize chunking per type without signature churn. - When a single first-axis slice already exceeds the target (e.g. a (1000, 4096, 4096) uint16 mesoscale volume at 32 MB/slice), halve the largest trailing axis until the chunk fits within the target. Prevents auto-chunks from ballooning well past 16 MB on large-frame datasets. - Add tests for the new trailing-dim split path and the bytes_per_row==0 fallback branch flagged in review. - CHANGELOG entry for the ~4 MB default and the new public helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

for more information, see https://pre-commit.ci

…-dev#1445) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

sejalpunwatkar · 2026-05-09T05:36:43Z

Hi @rly! Just checking in on PR #1443 whenever you get a chance. Happy to address all the feedback together once you've had time for the complete review!

rly · 2026-05-23T21:15:45Z

            The shape of the dataset.
        dtype : numpy.dtype or type
            The data type, used to determine bytes per element.
+        target_chunk_bytes : int, optional


Why were these changes made in this file?

rly · 2026-05-23T21:17:10Z

 #     hooks:
 #     -   id: black
 -   repo: https://github.com/astral-sh/ruff-pre-commit
+    rev: v0.15.11


Please undo this change as well

rly · 2026-05-23T21:18:50Z

@@ -1,5 +1,13 @@
 # HDMF Changelog

+## Unreleased


Please undo this change as well. These were released in HDMF 6.0.0. It seems like these came from resolving conflicts from a git merge incorrectly.

rly · 2026-05-23T22:18:39Z

@sejalpunwatkar Thanks for continuing your work on this! As I review the PR, I'm realizing this is a significant change to the validator, and we should think through the implementation plan carefully and break it up into smaller pieces.

I owe you a clarification on my earlier suggestion. When I suggested making ValidationWarning a subclass of UserWarning, I didn't think through how these objects would actually be wired into the validator's existing model. The HDMF validator does not raise Error objects at the point of detection. It collects them and returns a list, leaving it up to the caller to decide what to do with them (display, log, fail the run, etc.). Downstream applications like PyNWB and NWB Inspector treat a non-empty return list as "this file is invalid", so we cannot simply mix ValidationWarning objects into that list. Doing so would silently flip previously-valid files to invalid the moment any warning fires.

We also don't want to emit warnings through Python's warnings.warn() machinery. That would break the validator's "collect and return" model and force downstream tools into warnings.catch_warnings() gymnastics just to render warnings alongside errors in the same report.

I've opened #1479 to track the refactor that needs to land before the warning logic from this PR can go in. The short version: introduce a small ValidationResult wrapper that holds both an errors list and a warnings list, and have ValidatorMap.validate() return it instead of a bare list. The wrapper implements __iter__ / __len__ / __bool__ / __getitem__ over the errors list, so every common pattern downstream uses today keeps working unchanged, and new callers opt in to warnings via result.warnings. No coordinated migration with PyNWB / NWB Inspector required. See #1479 for the full rationale and alternatives considered.

Suggested PR breakdown

Given that, I'd suggest splitting this work into two PRs:

PR 1 (addresses #1479): the plumbing only, no new warnings yet.

Add ValidationWarning as a sibling of Error (same .name / .reason / .location shape, not a UserWarning subclass).
Add the ValidationResult wrapper.
Change ValidatorMap.validate() (and the per-type validators it delegates to) to return ValidationResult instead of a bare list, with warnings always empty for now.
Verify the existing validator test suite passes unchanged. The whole point of the wrapper is that it should.

PR 2 (addresses #624): the actual feature.

Add ExtraFieldWarning(ValidationWarning).
Add the logic to detect attributes and children that no applicable spec recognizes.
Add focused tests that assert against result.warnings for the specific cases from Create validation warnings on extra fields #624.

Splitting it this way keeps each PR small enough to review carefully, decouples the API change from the harder semantic questions about when a field is really "extra", and gives us a clean rollback point if either piece turns out to need more thought.

Does this plan make sense? Happy to discuss further before you start cutting it up.

sejalpunwatkar · 2026-05-24T15:53:36Z

Hi @rly! Thank you for the detailed explanation and for opening #1479 , this really clarifies the path forward. The ValidationResult wrapper approach makes a lot of sense to me. Keeping the iter/len/bool/getitem interface over the errors list ensures all existing downstream behavior stays intact, while new callers can opt into result.warnings when needed; no coordinated migration required with PyNWB or NWB Inspector.
Yes, the two-PR split sounds like the right approach. PR 1 as pure plumbing keeps the API change isolated and reviewable on its own, and gives a clean rollback point if anything needs more thought. PR 2 can then focus purely on the detection logic for #624 without mixing concerns.
I'm happy to start on PR 1 following the structure outlined in #1479. Does that work for you?

WIP: validation warning handling (2 failing tests remaining)

84df54f

sejalpunwatkar force-pushed the feature/624-validation-warnings-v2 branch from fd470a8 to 274161a Compare April 23, 2026 14:52

sejalpunwatkar and others added 9 commits April 24, 2026 05:15

Add validation for dataset reference target types (hdmf-dev#632) (hdm…

6f1ce0b

…f-dev#1429) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

Bump codecov/codecov-action from 5 to 6 (hdmf-dev#1433)

48519de

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

[pre-commit.ci] pre-commit autoupdate (hdmf-dev#1434)

55f674d

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

4d11721

for more information, see https://pre-commit.ci

Make expandable a list; default expands only DynamicTable columns (h…

2e9565c

…dmf-dev#1439) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: rly <310197+rly@users.noreply.github.com>

WIP: validation warning handling (2 failing tests remaining)

dc89001

Fix validation: skip wrong-named children, dedupe warnings, fix scala…

501df27

…r shape error

Fix validator: clean duplicate checks in unmatched builders

883b733

sejalpunwatkar force-pushed the feature/624-validation-warnings-v2 branch from f94d6b1 to 27aa910 Compare April 24, 2026 05:17

rly reviewed Apr 24, 2026

View reviewed changes

Comment thread src/hdmf/validate/errors.py Outdated

rly reviewed Apr 24, 2026

View reviewed changes

Comment thread src/hdmf/validate/errors.py Outdated

rly reviewed Apr 24, 2026

View reviewed changes

Comment thread tests/unit/validator_tests/test_validate.py

sejalpunwatkar force-pushed the feature/624-validation-warnings-v2 branch from 31938db to 0115dc1 Compare April 25, 2026 04:42

sejalpunwatkar and others added 8 commits May 9, 2026 04:55

Add validation for dataset reference target types (hdmf-dev#632) (hdm…

d8a690f

…f-dev#1429) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

[pre-commit.ci] auto fixes from pre-commit.com hooks

8fbfcf7

for more information, see https://pre-commit.ci

fix: ruff linting - fix line length and remove unused variables

bc684d6

fix: move resolution into inline spec and remove subtype lookup

b4e7100

[pre-commit.ci] auto fixes from pre-commit.com hooks

b1b9cd0

for more information, see https://pre-commit.ci

Fix link target being included as a column on DynamicTable read (hdmf…

41ed817

…-dev#1445) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

[pre-commit.ci] pre-commit autoupdate (hdmf-dev#1435)

5126c65

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>

fix: update ValidationWarning to extend UserWarning as suggested

acfb1ab

sejalpunwatkar force-pushed the feature/624-validation-warnings-v2 branch from 6578f41 to acfb1ab Compare May 9, 2026 04:55

Merge branch 'dev' into feature/624-validation-warnings-v2

47699ef

rly reviewed May 23, 2026

View reviewed changes

rly mentioned this pull request May 23, 2026

Refactor validator return type to support warnings as well as errors #1479

Open

Conversation

sejalpunwatkar commented Apr 18, 2026

Motivation

How to test the behavior?

Checklist

Uh oh!

sejalpunwatkar commented Apr 18, 2026

Uh oh!

codecov Bot commented Apr 18, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

Uh oh!

Uh oh!

rly commented Apr 24, 2026

Uh oh!

Uh oh!

sejalpunwatkar commented Apr 25, 2026

Uh oh!

rly commented Apr 30, 2026

Uh oh!

sejalpunwatkar commented May 2, 2026

Uh oh!

sejalpunwatkar commented May 9, 2026

Uh oh!

rly May 23, 2026

Choose a reason for hiding this comment

Uh oh!

rly May 23, 2026

Choose a reason for hiding this comment

Uh oh!

rly May 23, 2026

Choose a reason for hiding this comment

Uh oh!

rly commented May 23, 2026

Suggested PR breakdown

Uh oh!

sejalpunwatkar commented May 24, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

codecov Bot commented Apr 18, 2026 •

edited

Loading