Accept pandas Series/ExtensionArray for Data; lift pandas<3 cap by rly · Pull Request #1469 · hdmf-dev/hdmf

rly · 2026-05-04T19:28:26Z

Summary

Accept pd.Series and pandas.api.extensions.ExtensionArray (incl. StringArray/ArrowStringArray) as data in Data and its subclasses, normalizing to numpy at the Data.__init__/Data.extend boundary so every subclass (VectorData, VectorIndex, ScratchData, ElementIdentifiers, …) picks up the fix without per-class changes.
Reject pd.NA/NaN and pandas nullable numeric/boolean dtypes (IntegerArray, BooleanArray, FloatingArray) with informative TypeErrors — the former would crash at HDF5 vlen-string write time, the latter would silently widen on .to_numpy().
Lift the pandas<3 cap from pyproject.toml.
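
The dtype-widening concern behind the second point can be seen with pandas alone. The snippet below is a minimal sketch, not code from this PR. The example values and the `-1` sentinel are assumptions chosen for illustration. The dtype produced by a default `.to_numpy()` call varies across pandas versions, so it is printed rather than stated.

```python
import numpy as np
import pandas as pd

# A nullable integer Series (backed by IntegerArray) with one missing value.
nullable = pd.Series([1, None, 3], dtype="Int64")

# Default conversion: the result cannot stay int64 because of the missing
# value; the exact dtype depends on the installed pandas version.
default = nullable.to_numpy()
print("default dtype:", default.dtype)

# Explicit conversion: fill with a sentinel (hypothetical choice of -1),
# then cast, so the resulting dtype is chosen by the caller.
explicit = nullable.fillna(-1).astype("int64").to_numpy()
print("explicit dtype:", explicit.dtype)
```

Making the cast explicit keeps the decision about the stored dtype with the caller, which is the behavior the TypeError is meant to encourage.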

Why

Pandas 3.0 makes PyArrow-backed strings the default for all DataFrame string columns. df['col'].values is now ArrowStringArray, so VectorData(name=..., data=df['col'].values) (and any other typical user pattern that hands HDMF a string column) now fails docval type validation. Centralizing the fix at the Data construction boundary means VectorData, add_unit, add_electrode, from_dataframe, etc. all keep working with no further changes.
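
To see which container a given environment hands over, the following sketch inspects a string column using pandas only. It does not call HDMF. The printed type depends on the installed pandas version and on whether PyArrow is available, so no output is assumed here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"col": ["a", "b", "c"]})

# What a user typically passes along: the column's .values attribute.
values = df["col"].values
print(pd.__version__, type(values).__name__)

# Series.to_numpy() always returns a NumPy array, whatever the backing dtype.
as_numpy = df["col"].to_numpy()
print(type(as_numpy).__name__, as_numpy.dtype)
```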

Behavior

ArrowStringArray, StringArray, pd.Series (any backing dtype), pd.Categorical → converted to np.ndarray silently.
pandas input containing pd.NA or NaN → TypeError pointing at the missing-values cause and asking the user to fill with a sentinel.
IntegerArray/BooleanArray/FloatingArray → TypeError asking the user to cast explicitly (.astype('int64').to_numpy() or .to_numpy(dtype=...)), since defaulting .to_numpy() would silently change the dtype.
Non-pandas inputs are pass-through; no behavior change for existing callers.
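
The rules above can be sketched as a small standalone function. This is an illustrative approximation written for this description, not the PR's actual `coerce_pandas_data` implementation. The name `coerce_pandas_sketch`, the order of the checks, and the error messages are all assumptions.

```python
import numpy as np
import pandas as pd
from pandas.api.extensions import ExtensionArray
from pandas.arrays import BooleanArray, FloatingArray, IntegerArray


def coerce_pandas_sketch(data):
    """Hypothetical helper: normalize pandas containers to np.ndarray."""
    if isinstance(data, pd.Series):
        data = data.array  # unwrap the Series to its backing ExtensionArray
    if not isinstance(data, ExtensionArray):
        return data  # non-pandas input passes through unchanged
    if isinstance(data, (IntegerArray, BooleanArray, FloatingArray)):
        raise TypeError(
            "Nullable numeric/boolean dtypes are not converted implicitly; "
            "cast explicitly, e.g. with .to_numpy(dtype=...)."
        )
    if pd.isna(data).any():
        raise TypeError(
            "Input contains missing values (pd.NA/NaN); "
            "fill them with a sentinel before passing the data."
        )
    return np.asarray(data)


# Usage: string-backed and categorical inputs become plain NumPy arrays.
print(coerce_pandas_sketch(pd.Series(["a", "b"], dtype="string")))
print(coerce_pandas_sketch(pd.Categorical(["x", "y", "x"])))
```

Checking the nullable dtypes before the missing-value check means an `Int64` input is rejected even when it happens to contain no missing values, matching the rule that these dtypes always require an explicit cast.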

Verification

Reproducer from #1384 ("Pandas 3.0 String Type Compatibility Breaking HDMF Data Ingestion") now succeeds.
HDF5 roundtrip on DynamicTable.from_dataframe(df=...) with pandas 3.0.2 default string columns works end-to-end.
Full unit suite on a pandas 3.0.2 environment: 1801 passed, 111 skipped, 1 xfailed, 0 failed.

Test plan

Unit tests added for coerce_pandas_data covering StringArray, ArrowStringArray, plain numeric Series, Categorical, NA-bearing inputs, and nullable int/bool.
End-to-end test through VectorData for both Series and df.values paths.
Manual HDF5 roundtrip with pandas 3.0.
CI passes on Python 3.10–3.13 with pandas 1.4 (lower bound), pandas 2.x, and pandas 3.x.

🤖 Generated with Claude Code

…pat)

Pandas 3.0 makes PyArrow-backed strings the default for DataFrame string columns, so df['col'].values is now ArrowStringArray and constructing VectorData(data=...) fails type validation.

Add pd.Series and pandas.api.extensions.ExtensionArray to the array_data macro and coerce to numpy at the Data construction boundary so every Data subclass picks up the fix without per-class changes.

Reject pd.NA/NaN with an informative TypeError (HDF5 vlen-string writes already crash on these) and reject IntegerArray/BooleanArray/FloatingArray to avoid silent dtype widening on .to_numpy(). Lift the pandas<3 cap.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

codecov · 2026-05-04T19:29:51Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 93.20%. Comparing base (0d61982) to head (2788270).

Additional details and impacted files

@@            Coverage Diff             @@
##              dev    #1469      +/-   ##
==========================================
+ Coverage   93.18%   93.20%   +0.01%     
==========================================
  Files          41       41              
  Lines       10176    10195      +19     
  Branches     2103     2108       +5     
==========================================
+ Hits         9483     9502      +19     
  Misses        415      415              
  Partials      278      278


rly marked this pull request as draft May 4, 2026 19:29

Merge branch 'dev' into fix/pandas-3-compat

2788270

rly mentioned this pull request May 5, 2026

Pandas 3.0 String Type Compatibility Breaking HDMF Data Ingestion #1384

Open


rly wants to merge 2 commits into dev from fix/pandas-3-compat
