Fix #624: Refine ExtraFieldWarning handling in validation#1443
Fix #624: Refine ExtraFieldWarning handling in validation#1443sejalpunwatkar wants to merge 20 commits into
Conversation
|
@rly I opened this new PR after closing the previous one because it had a large diff (-283) due to merge conflicts, which made it hard to review. In this version, I kept the changes minimal and focused on aligning with the intended behavior from #542 and #624 (warnings instead of errors). Right now I’m seeing two edge cases: An attribute like resolution (defined in a parent/extended spec) is still being flagged as an extra field Before I refine further, I wanted to confirm if these cases should be handled at the validator level or if this is expected behavior from the spec/matcher side. Thank You! |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## dev #1443 +/- ##
===========================================
- Coverage 93.18% 37.48% -55.71%
===========================================
Files 41 41
Lines 10176 10291 +115
Branches 2103 2142 +39
===========================================
- Hits 9483 3858 -5625
- Misses 415 6071 +5656
- Partials 278 362 +84 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
fd470a8 to
274161a
Compare
…f-dev#1429) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>
Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>
for more information, see https://pre-commit.ci
…dmf-dev#1439) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com> Co-authored-by: rly <310197+rly@users.noreply.github.com>
…umns and extended type attributes - Skip unmatched typed children whose type is expected by parent spec hierarchy (MissingDataType covers them) - Skip unmatched untyped children in DynamicTable subtypes (columns like id, foo, VectorData etc.) - Skip unmatched untyped children when spec defines untyped dataset children (wildcard datasets) - Check subtypes when looking up attributes to avoid false ExtraFieldWarning for extended types - Return early from AttributeValidator after ExpectedArrayError to avoid duplicate errors
f94d6b1 to
27aa910
Compare
This should not be flagged as an extra field. Validation should happen against the full, extended spec, not the base VectorData spec.
Could you provide concrete examples of these? It should. |
31938db to
0115dc1
Compare
|
Thank you for the feedback! I've moved resolution into the inline dataset spec within GroupWithBase and removed the subtype lookup. I'll look at #542 for the ValidationWarning/UserWarning approach and update accordingly. |
|
Thanks for the changes @sejalpunwatkar . I'll finish reviewing later this week. In the meantime, could you address the |
|
Thank you for the feedback @rly ! I'll fix the ruff errors and remove the hardcoded field names; those were added as a workaround and I agree the tests should be updated instead. I'll hold off on further changes until your complete review so I can address everything at once. |
…f-dev#1429) Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>
…#1440) * Set reasonable default chunk sizes (~4 MB) when chunks=True h5py's default auto-chunking targets ~32 KB per chunk, which is too small for cloud access where each chunk requires a separate HTTP range request. Add _compute_chunk_shape() that targets ~4 MB chunks (in the recommended 2-16 MB range for cloud-hosted files). The method is applied in __list_fill__, __setup_chunked_dset__, and __setup_empty_dset__ whenever chunks=True (boolean). Also ensures chunks=True is explicitly set when maxshape forces chunking, so the compute logic always triggers. Closes hdmf-dev#1438 Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com> * Update src/hdmf/backends/hdf5/h5tools.py Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com> * Address review: public API, mesoscale-safe chunking, CHANGELOG - Rename _compute_chunk_shape → compute_default_chunk_shape so users can inspect or override the shape that would be used when chunks=True. - Add optional neurodata_type hook (unused in the default implementation) so subclasses can specialize chunking per type without signature churn. - When a single first-axis slice already exceeds the target (e.g. a (1000, 4096, 4096) uint16 mesoscale volume at 32 MB/slice), halve the largest trailing axis until the chunk fits within the target. Prevents auto-chunks from ballooning well past 16 MB on large-frame datasets. - Add tests for the new trailing-dim split path and the bytes_per_row==0 fallback branch flagged in review. - CHANGELOG entry for the ~4 MB default and the new public helper. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com> --------- Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>
for more information, see https://pre-commit.ci
for more information, see https://pre-commit.ci
…-dev#1445) Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Ryan Ly <310197+rly@users.noreply.github.com>
6578f41 to
acfb1ab
Compare
| The shape of the dataset. | ||
| dtype : numpy.dtype or type | ||
| The data type, used to determine bytes per element. | ||
| target_chunk_bytes : int, optional |
There was a problem hiding this comment.
Why were these changes made in this file?
| # hooks: | ||
| # - id: black | ||
| - repo: https://github.com/astral-sh/ruff-pre-commit | ||
| rev: v0.15.11 |
There was a problem hiding this comment.
Please undo this change as well
| @@ -1,5 +1,13 @@ | |||
| # HDMF Changelog | |||
|
|
|||
| ## Unreleased | |||
There was a problem hiding this comment.
Please undo this change as well. These were released in HDMF 6.0.0. It seems like these came from resolving conflicts from a git merge incorrectly.
|
@sejalpunwatkar Thanks for continuing your work on this! As I review the PR, I'm realizing this is a significant change to the validator, and we should think through the implementation plan carefully and break it up into smaller pieces. I owe you a clarification on my earlier suggestion. When I suggested making We also don't want to emit warnings through Python's I've opened #1479 to track the refactor that needs to land before the warning logic from this PR can go in. The short version: introduce a small Suggested PR breakdownGiven that, I'd suggest splitting this work into two PRs: PR 1 (addresses #1479): the plumbing only, no new warnings yet.
PR 2 (addresses #624): the actual feature.
Splitting it this way keeps each PR small enough to review carefully, decouples the API change from the harder semantic questions about when a field is really "extra", and gives us a clean rollback point if either piece turns out to need more thought. Does this plan make sense? Happy to discuss further before you start cutting it up. |
|
Hi @rly! Thank you for the detailed explanation and for opening #1479 , this really clarifies the path forward. The ValidationResult wrapper approach makes a lot of sense to me. Keeping the iter/len/bool/getitem interface over the errors list ensures all existing downstream behavior stays intact, while new callers can opt into result.warnings when needed; no coordinated migration required with PyNWB or NWB Inspector. |
Motivation
Fixes #624
This PR refines how ExtraFieldWarning is generated during validation.
Currently, some attributes that are valid in extended or inherited specifications may still be incorrectly flagged as unexpected. Additionally, unexpected child elements are not always consistently reported depending on matching conditions.
This PR aims to:
How to test the behavior?
pytest tests/unit/validator_tests/test_validate.py
pytest tests/unit/common/test_table.py
Focus on:
Checklist
CHANGELOG.mdwith your changes?