Refactor/input module improvements by pnorton-usgs · Pull Request #56 · DOI-USGS/pyPRMS

pnorton-usgs · 2026-05-12T15:29:02Z

Summary

Refactors pyPRMS/input/DataFile.py and pyPRMS/input/InputVariable.py to improve readability, correctness, and maintainability.

Changes

DataFile.py

Add __repr__ showing filename, variable count, and date range
Cache the combined DataFrame in the data property; add invalidate_cache() method
Use context manager (with statement) in write_ascii to prevent file handle leaks
Replace all line[0:len(X)] == X patterns with startswith()
Add from_file() classmethod factory; extract _resolve_units() helper
Remove explicit object inheritance
Make metadata and parameters private with read-only properties
Accept str | os.PathLike in write_ascii for consistency with load_file
Remove commented-out dead code
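
The caching change above can be sketched roughly as follows. This is a minimal illustration, not the pyPRMS code: `CachedTable`, `build_count`, and the plain dicts standing in for pandas DataFrames are all assumptions made for the example.

```python
class CachedTable:
    """Hypothetical stand-in for DataFile; plain dicts replace DataFrames."""

    def __init__(self, variables: dict) -> None:
        self._variables = variables   # variable name -> list of values
        self._cache = None            # combined view, built lazily
        self.build_count = 0          # demonstration counter only

    @property
    def data(self) -> dict:
        # Rebuild the combined view only when no cached copy exists.
        if self._cache is None:
            self.build_count += 1
            self._cache = {name: list(vals) for name, vals in self._variables.items()}
        return self._cache

    def invalidate_cache(self) -> None:
        # Discard the cached copy; the next access to .data rebuilds it.
        self._cache = None


table = CachedTable({"tmax": [10.0, 12.0], "tmin": [1.0, 2.0]})
first = table.data
second = table.data                 # served from the cache, no rebuild
builds_before = table.build_count

del table._variables["tmin"]        # change the underlying data
stale = table.data                  # cache still holds the old view
table.invalidate_cache()
fresh = table.data                  # rebuilt, reflects the change
```

The stale read in the middle is the reason a caller has to invalidate the cache explicitly after changing the underlying variables.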

InputVariable.py

Add __repr__ showing name, station count, and row count
Fix data.setter to use split('_', 1) so station IDs with underscores are preserved
Add metadata validation in __init__ with a clear ValueError on missing keys
Remove explicit object inheritance
Add _id_column property for reliable station ID column access regardless of case
Add full type annotations to stations, file_metadata_str, full_column_names, and drop()
Document that drop() requires DataFile.invalidate_cache() afterward
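
To illustrate the `split('_', 1)` fix, here is a small sketch. It assumes column names of the form `<variable>_<station id>`; the column name and the `station_id` helper are hypothetical, and the "naive" line shows only one way an unbounded split can truncate an ID, not necessarily what the old setter did.

```python
def station_id(column: str) -> str:
    """Return everything after the first underscore."""
    # maxsplit=1 stops after the first separator, so later underscores survive.
    _, station = column.split("_", 1)
    return station


column = "tmax_ST_001"              # hypothetical column name
naive = column.split("_")[1]        # unbounded split keeps only one piece
fixed = station_id(column)          # full station ID is preserved
```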

pyproject.toml

Add pandas-stubs to dev dependencies
Remove # type: ignore from pandas imports in both source files

Testing

All 10 existing tests pass (including 4 roundtrip tests across different data file formats)
3 new tests added: test_datafile_repr, test_input_variable_repr, test_invalidate_cache

- Add __repr__ to DataFile (shows filename, variable count, date range)
- Add __repr__ to InputVariable (shows name, station count, row count)
- Cache combined DataFrame in DataFile.data property; add invalidate_cache()
- Use context manager (with statement) in write_ascii
- Fix InputVariable.data setter to use split('_', 1) for station IDs with underscores
- Extract _resolve_units() and add from_file() classmethod factory

- Remove explicit object inheritance from DataFile and InputVariable (Python 2 artifact)
- Make DataFile.metadata and DataFile.parameters private with read-only properties
- Type-annotate write_ascii filename as str | os.PathLike for consistency
- Add full type annotations to InputVariable properties (stations, file_metadata_str, full_column_names, drop)
- Document that InputVariable.drop() requires DataFile.invalidate_cache() afterward
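
A rough sketch of the read-only property and path-handling pattern described above. `Holder` and its output format are hypothetical, and `Union[str, os.PathLike]` is used here as the older spelling of `str | os.PathLike`.

```python
import os
import tempfile
from pathlib import Path
from typing import Union


class Holder:
    """Hypothetical class; not the pyPRMS DataFile."""

    def __init__(self, metadata: dict) -> None:
        self._metadata = metadata   # private storage

    @property
    def metadata(self) -> dict:
        # No setter is defined, so rebinding holder.metadata raises AttributeError.
        return self._metadata

    def write_ascii(self, filename: Union[str, os.PathLike]) -> None:
        # The with statement closes the handle even if a write fails.
        with open(filename, "w") as fh:
            for key, value in self._metadata.items():
                fh.write(f"{key}: {value}\n")


holder = Holder({"units": "degF"})

try:
    holder.metadata = {}
    rebind_blocked = False
except AttributeError:
    rebind_blocked = True

with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "out.txt"
    holder.write_ascii(target)        # a Path works
    holder.write_ascii(str(target))   # so does a plain string
    written = target.read_text()
```

A property without a setter blocks rebinding only; the returned dict is still mutable unless a copy is returned instead.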

DataFile.py:
- Remove commented-out NA_VALS_DEFAULT constant
- Remove commented-out self.__input_vars_intern assignment in _add_variable_data

InputVariable.py:
- Add _id_column property that returns the first column name (station ID) regardless of case (handles both 'id' and 'ID' from different file formats)
- Replace positional iloc[:, 0] with _id_column in stations property
- Replace hardcoded 'id' with _id_column in file_metadata_str and drop()
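
The idea behind `_id_column` can be sketched without pandas. Here an insertion-ordered dict of columns stands in for the station DataFrame; `StationTable` and the sample values are assumptions for illustration only.

```python
class StationTable:
    """Hypothetical stand-in: column name -> list of values, in column order."""

    def __init__(self, columns: dict) -> None:
        self._columns = columns

    @property
    def _id_column(self) -> str:
        # The station ID is taken to be the first column, whatever its case.
        return next(iter(self._columns))

    @property
    def stations(self) -> list:
        # Look the column up by name instead of hardcoding 'id'.
        return list(self._columns[self._id_column])


lower = StationTable({"id": ["01", "02"], "elev": [100.0, 250.0]})
upper = StationTable({"ID": ["A1"], "elev": [30.0]})
```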

pnorton-usgs added 7 commits May 12, 2026 08:18

Add tests for __repr__, invalidate_cache, and data caching

aa7857f

- test_datafile_repr: verifies DataFile repr content
- test_input_variable_repr: verifies InputVariable repr content
- test_invalidate_cache: confirms caching behavior and cache invalidation

Replace manual string slicing with startswith() in DataFile

a4f9cb2

Replace all line[0:len(X)] == X patterns with line.startswith(X) for improved readability throughout load_file and _add_file_metadata.
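
A quick comparison of the two forms, as a sketch. The `//` marker and the sample lines are hypothetical, not taken from load_file.

```python
def matches_sliced(line: str, marker: str) -> bool:
    # Old pattern: slice to the marker length, then compare.
    return line[0:len(marker)] == marker


def matches_startswith(line: str, marker: str) -> bool:
    # New pattern: same result, intent stated directly.
    return line.startswith(marker)


marker = "//"                        # hypothetical comment marker
lines = ["// header comment", "tmax 2", "/", ""]
sliced = [matches_sliced(ln, marker) for ln in lines]
direct = [matches_startswith(ln, marker) for ln in lines]
```

Both forms agree on every input, including lines shorter than the marker. `str.startswith` also accepts a tuple of prefixes, which can collapse several checks into one call.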

Add metadata validation in InputVariable.__init__

5c200e6

Raise ValueError with available keys if the variable name is not found in the metadata dict, instead of an opaque KeyError.
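
One way such a check could look, as a sketch. `lookup_metadata`, the message wording, and the sample metadata are assumptions, not the code in InputVariable.__init__.

```python
def lookup_metadata(name: str, metadata: dict) -> dict:
    """Return the metadata entry for `name`, or fail with a descriptive error."""
    if name not in metadata:
        available = ", ".join(sorted(metadata))
        raise ValueError(
            f"Variable '{name}' not found in metadata; available keys: {available}"
        )
    return metadata[name]


metadata = {"tmax": {"units": "degF"}, "tmin": {"units": "degF"}}

try:
    lookup_metadata("precip", metadata)
    message = ""
except ValueError as err:
    message = str(err)
```

Listing the available keys in the message lets a caller spot a misspelled variable name without inspecting the dict.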

Remove pandas type: ignore and add pandas-stubs to dev deps

accd15f

- Remove '# type: ignore' from pandas imports in DataFile.py and InputVariable.py
- Add pandas-stubs to [project.optional-dependencies] dev in pyproject.toml

pnorton-usgs self-assigned this May 12, 2026

pnorton-usgs merged commit dcecc5b into development May 12, 2026
7 checks passed

pnorton-usgs deleted the refactor/input-module-improvements branch May 12, 2026 15:31
