
Refactor/input module improvements #56

Merged
pnorton-usgs merged 7 commits into development from refactor/input-module-improvements on May 12, 2026

@pnorton-usgs
Member

Summary

Refactors pyPRMS/input/DataFile.py and pyPRMS/input/InputVariable.py to improve readability, correctness, and maintainability.

Changes

DataFile.py

  • Add __repr__ showing filename, variable count, and date range
  • Cache the combined DataFrame in the data property; add invalidate_cache() method
  • Use context manager (with statement) in write_ascii to prevent file handle leaks
  • Replace all line[0:len(X)] == X patterns with startswith()
  • Add from_file() classmethod factory; extract _resolve_units() helper
  • Remove explicit inheritance from object (a Python 2 artifact)
  • Make metadata and parameters private with read-only properties
  • Accept str | os.PathLike in write_ascii for consistency with load_file
  • Remove commented-out dead code
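Here is a minimal sketch of the caching pattern described above, under simplifying assumptions. The class name CachedDataFile, the _combine helper, and the combine_calls counter are hypothetical. Plain dicts of lists stand in for the pandas DataFrames that the real DataFile.data property concatenates. The repr here reports only the filename and variable count, while the PR's DataFile repr also includes the date range.

```python
from typing import Optional


class CachedDataFile:
    """Hypothetical stand-in for DataFile illustrating a cached 'data' property."""

    def __init__(self, filename: str, variables: dict[str, list[float]]) -> None:
        self.filename = filename
        self._variables = variables
        self._data_cache: Optional[dict[str, list[float]]] = None
        self.combine_calls = 0  # counts how often the expensive combine step runs

    def _combine(self) -> dict[str, list[float]]:
        # Stand-in for concatenating per-variable DataFrames into one object.
        self.combine_calls += 1
        return {name: list(values) for name, values in self._variables.items()}

    @property
    def data(self) -> dict[str, list[float]]:
        # Build the combined object once; later accesses reuse the cached copy.
        if self._data_cache is None:
            self._data_cache = self._combine()
        return self._data_cache

    def invalidate_cache(self) -> None:
        # Call after mutating any variable so the next access rebuilds the data.
        self._data_cache = None

    def __repr__(self) -> str:
        return f"CachedDataFile(filename={self.filename!r}, variables={len(self._variables)})"
```

Because the cached object is reused, a change made to a variable after the first access stays invisible until invalidate_cache() is called. This is why InputVariable.drop() is documented as requiring it.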

InputVariable.py

  • Add __repr__ showing name, station count, and row count
  • Fix data.setter to use split('_', 1) so station IDs with underscores are preserved
  • Add metadata validation in __init__ with a clear ValueError on missing keys
  • Remove explicit inheritance from object (a Python 2 artifact)
  • Add _id_column property for reliable station ID column access regardless of case
  • Add full type annotations to stations, file_metadata_str, full_column_names, and drop()
  • Document that drop() requires DataFile.invalidate_cache() afterward
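Plain strings are enough to show why maxsplit matters. The sketch assumes, hypothetically, column names of the form <prefix>_<station ID>, where the prefix never contains an underscore. The station IDs and helper names are made up for illustration and are not taken from pyPRMS.

```python
def station_id_naive(column: str) -> str:
    # Splits on every underscore, so only the piece after the first one survives.
    return column.split("_")[1]


def station_id_fixed(column: str) -> str:
    # maxsplit=1 splits only on the first underscore; the remainder stays intact.
    return column.split("_", 1)[1]


columns = ["tmax_12345", "tmax_USGS_06191500"]  # hypothetical column names
print([station_id_naive(c) for c in columns])  # ['12345', 'USGS']
print([station_id_fixed(c) for c in columns])  # ['12345', 'USGS_06191500']
```

The fixed version still depends on the prefix being underscore-free. If both parts could contain underscores, no single split could tell them apart.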

pyproject.toml

  • Add pandas-stubs to dev dependencies
  • Remove # type: ignore from pandas imports in both source files

Testing

  • All 10 existing tests pass (including 4 roundtrip tests across different data file formats)
  • 3 new tests added: test_datafile_repr, test_input_variable_repr, test_invalidate_cache

Commit messages

- Add __repr__ to DataFile (shows filename, variable count, date range)
- Add __repr__ to InputVariable (shows name, station count, row count)
- Cache combined DataFrame in DataFile.data property; add invalidate_cache()
- Use context manager (with statement) in write_ascii
- Fix InputVariable.data setter to use split('_', 1) for station IDs with underscores
- Extract _resolve_units() and add from_file() classmethod factory
- test_datafile_repr: verifies DataFile repr content
- test_input_variable_repr: verifies InputVariable repr content
- test_invalidate_cache: confirms caching behavior and cache invalidation
Replace all line[0:len(X)] == X patterns with line.startswith(X) for improved readability throughout load_file and _add_file_metadata.
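The rewrite preserves behavior: for any string prefix, slicing to the prefix length and comparing gives the same result as startswith. The following quick check runs over a few hypothetical lines, not real PRMS data-file content.

```python
# Hypothetical header lines; not actual PRMS data-file content.
lines = ["tmax 2", "#### start of data", "runoff 3", "# comment"]
prefixes = ["####", "tmax", "runoff"]

for line in lines:
    for prefix in prefixes:
        old_style = line[0:len(prefix)] == prefix  # slice-and-compare
        new_style = line.startswith(prefix)        # equivalent, clearer
        assert old_style == new_style

# startswith also accepts a tuple, which can replace a chain of `or` checks.
print("tmax 2".startswith(("tmax", "tmin")))  # True
```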
Raise a ValueError listing the available keys if the variable name is not found in the metadata dict, instead of an opaque KeyError.
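Here is a standalone sketch of this kind of check. The function name lookup_metadata, the exact message wording, and the sample metadata are assumptions. In the PR, the check lives in InputVariable.__init__.

```python
from typing import Any


def lookup_metadata(name: str, metadata: dict[str, Any]) -> Any:
    """Return metadata[name], raising a descriptive ValueError if it is missing."""
    if name not in metadata:
        available = ", ".join(sorted(metadata))
        raise ValueError(
            f"Variable '{name}' not found in metadata; available keys: {available}"
        )
    return metadata[name]


# Hypothetical metadata dict used only for this example.
meta = {"tmin": {"units": "degF"}, "tmax": {"units": "degF"}}
try:
    lookup_metadata("prcp", meta)
except ValueError as err:
    print(err)  # Variable 'prcp' not found in metadata; available keys: tmax, tmin
```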
- Remove explicit object inheritance from DataFile and InputVariable (Python 2 artifact)
- Make DataFile.metadata and DataFile.parameters private with read-only properties
- Type-annotate write_ascii filename as str | os.PathLike for consistency
- Add full type annotations to InputVariable properties (stations, file_metadata_str, full_column_names, drop)
- Document that InputVariable.drop() requires DataFile.invalidate_cache() afterward
- Remove '# type: ignore' from pandas imports in DataFile.py and InputVariable.py
- Add pandas-stubs to [project.optional-dependencies] dev in pyproject.toml
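The sketch below combines three of the changes above: a private attribute exposed through a read-only property, a write_ascii that accepts str | os.PathLike, and a with statement that guarantees the file is closed. The class name, output format, and single-underscore naming are assumptions. The PR does not say which private-naming convention DataFile uses.

```python
from __future__ import annotations

import os
import pathlib
import tempfile
from typing import Any


class ReadOnlyMetadata:
    """Hypothetical class showing a private attribute behind a read-only property."""

    def __init__(self, metadata: dict[str, Any]) -> None:
        self._metadata = metadata  # private by convention (single underscore)

    @property
    def metadata(self) -> dict[str, Any]:
        # No setter is defined, so assigning to obj.metadata raises AttributeError.
        # The returned dict itself is still mutable.
        return self._metadata

    def write_ascii(self, filename: str | os.PathLike) -> None:
        # open() accepts both str and os.PathLike; the with statement closes
        # the file handle even if a write raises partway through.
        with open(filename, "w") as fh:
            for key, value in self._metadata.items():
                fh.write(f"{key} {value}\n")


demo = ReadOnlyMetadata({"tmax": 1})
with tempfile.TemporaryDirectory() as tmp:
    out_path = pathlib.Path(tmp) / "example.data"  # a PathLike, not a str
    demo.write_ascii(out_path)
    written = out_path.read_text()
print(written, end="")  # tmax 1
```

A read-only property only prevents rebinding the attribute. Callers can still mutate the returned dict in place, so stronger protection would require returning a copy.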
DataFile.py:
- Remove commented-out NA_VALS_DEFAULT constant
- Remove commented-out self.__input_vars_intern assignment in _add_variable_data

InputVariable.py:
- Add _id_column property that returns the first column name (station ID)
  regardless of case (handles both 'id' and 'ID' from different file formats)
- Replace positional iloc[:, 0] with _id_column in stations property
- Replace hardcoded 'id' with _id_column in file_metadata_str and drop()
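Here is a sketch of the _id_column idea. An insertion-ordered dict of columns stands in for the pandas DataFrame, an assumption made to keep the example dependency-free. StationTable and the drop() signature shown are hypothetical.

```python
class StationTable:
    """Hypothetical stand-in for an InputVariable's station table."""

    def __init__(self, table: dict[str, list[str]]) -> None:
        self._table = table  # column name -> values; dicts keep insertion order

    @property
    def _id_column(self) -> str:
        # The station ID is the first column whether it is named 'id' or 'ID',
        # so return its actual name instead of hardcoding either spelling.
        return next(iter(self._table))

    @property
    def stations(self) -> list[str]:
        # Label-based access via _id_column rather than a positional lookup.
        return self._table[self._id_column]

    def drop(self, station_ids: list[str]) -> None:
        # Keep only rows whose station ID is not in station_ids, in every column.
        keep = [i for i, sid in enumerate(self.stations) if sid not in station_ids]
        self._table = {col: [vals[i] for i in keep] for col, vals in self._table.items()}
```

Usage: StationTable({"ID": ["a", "b"], "elev": ["1", "2"]}).stations and the same table keyed by "id" both return ["a", "b"].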
pnorton-usgs self-assigned this on May 12, 2026
pnorton-usgs merged commit dcecc5b into development on May 12, 2026
7 checks passed
pnorton-usgs deleted the refactor/input-module-improvements branch on May 12, 2026 at 15:31