Refactor/input module improvements #56
Merged
- Add __repr__ to DataFile (shows filename, variable count, date range)
- Add __repr__ to InputVariable (shows name, station count, row count)
- Cache combined DataFrame in DataFile.data property; add invalidate_cache()
- Use context manager (with statement) in write_ascii
- Fix InputVariable.data setter to use split('_', 1) for station IDs with underscores
- Extract _resolve_units() and add from_file() classmethod factory
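The split('_', 1) fix listed above can be illustrated with a minimal sketch. It assumes column names take the form variable name, underscore, station ID; the sample column name is hypothetical and not taken from the PR.

```python
# Hypothetical column name: variable name, underscore, station ID.
# The station ID itself contains an underscore.
column = "tmax_USGS_06191500"

# Splitting on every underscore breaks the station ID into pieces.
naive = column.split("_")

# Limiting to one split keeps everything after the first underscore intact.
variable, station_id = column.split("_", 1)

print(naive)
print(variable, station_id)
```

With maxsplit set to 1, only the first underscore acts as the separator, so the remainder is kept whole as the station ID.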
- test_datafile_repr: verifies DataFile repr content
- test_input_variable_repr: verifies InputVariable repr content
- test_invalidate_cache: confirms caching behavior and cache invalidation
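The caching behavior that test_invalidate_cache checks could look roughly like the sketch below. CachedTable is a toy stand-in written for illustration, not the actual DataFile implementation; it uses plain lists instead of a pandas DataFrame, and the build_count attribute exists only to make rebuilds visible.

```python
class CachedTable:
    """Toy stand-in for a class that caches an expensive combined result."""

    def __init__(self, parts):
        self._parts = parts
        self._cache = None
        self.build_count = 0  # counts rebuilds, for demonstration only

    @property
    def data(self):
        # Build the combined result only when no cached copy exists.
        if self._cache is None:
            self.build_count += 1
            self._cache = [row for part in self._parts for row in part]
        return self._cache

    def invalidate_cache(self):
        # Drop the cached copy so the next access rebuilds it.
        self._cache = None


table = CachedTable([[1, 2], [3]])
first = table.data
second = table.data        # served from the cache, no rebuild
table._parts.append([4])   # underlying data changes
stale = table.data         # still the cached result
table.invalidate_cache()
fresh = table.data         # rebuilt after invalidation
```

The stale read shows why callers that modify the underlying data need to invalidate the cache themselves in a design like this.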
Replace all line[0:len(X)] == X patterns with line.startswith(X) for improved readability throughout load_file and _add_file_metadata.
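As a small illustration of that replacement, the two forms below give the same result. The marker string and sample lines are hypothetical, not taken from load_file.

```python
marker = "// Station IDs:"            # hypothetical header marker
line = "// Station IDs: 06191500"     # hypothetical input line
short = "//"                          # a line shorter than the marker

# Old pattern: slice the prefix and compare.
old_style = line[0:len(marker)] == marker
old_short = short[0:len(marker)] == marker

# New pattern: state the intent directly.
new_style = line.startswith(marker)
new_short = short.startswith(marker)

print(old_style, new_style, old_short, new_short)
```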
Raise ValueError with available keys if the variable name is not found in the metadata dict, instead of an opaque KeyError.
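A minimal sketch of this kind of check is shown below. The function name lookup_metadata, the message wording, and the sample metadata dict are assumptions for illustration; the PR's actual code lives in InputVariable.__init__ and may differ.

```python
def lookup_metadata(name, metadata):
    """Return the metadata entry for name, or raise a descriptive ValueError."""
    if name not in metadata:
        # List the valid keys so the caller can see what was expected.
        available = ", ".join(sorted(metadata))
        raise ValueError(
            f"Unknown variable '{name}'. Available variables: {available}"
        )
    return metadata[name]


# Hypothetical metadata dict.
metadata = {"tmax": {"units": "degF"}, "precip": {"units": "inches"}}

try:
    lookup_metadata("tmin", metadata)
except ValueError as err:
    message = str(err)
    print(message)
```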
- Remove explicit object inheritance from DataFile and InputVariable (Python 2 artifact)
- Make DataFile.metadata and DataFile.parameters private with read-only properties
- Type-annotate write_ascii filename as str | os.PathLike for consistency
- Add full type annotations to InputVariable properties (stations, file_metadata_str, full_column_names, drop)
- Document that InputVariable.drop() requires DataFile.invalidate_cache() afterward
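The private-attribute-plus-read-only-property pattern mentioned above can be sketched as follows. The Record class is a hypothetical stand-in, not the DataFile class itself.

```python
class Record:
    """Hypothetical class exposing a private attribute through a read-only property."""

    def __init__(self, metadata):
        self._metadata = metadata  # private by convention

    @property
    def metadata(self):
        # No setter is defined, so assignment to .metadata raises AttributeError.
        return self._metadata


rec = Record({"units": "degF"})

try:
    rec.metadata = {}
    assignment_blocked = False
except AttributeError:
    assignment_blocked = True

print(rec.metadata, assignment_blocked)
```

A property without a setter only prevents rebinding the attribute; a mutable object it returns, such as a dict, can still be changed in place.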
- Remove '# type: ignore' from pandas imports in DataFile.py and InputVariable.py
- Add pandas-stubs to [project.optional-dependencies] dev in pyproject.toml
DataFile.py:
- Remove commented-out NA_VALS_DEFAULT constant
- Remove commented-out self.__input_vars_intern assignment in _add_variable_data

InputVariable.py:
- Add _id_column property that returns the first column name (station ID) regardless of case (handles both 'id' and 'ID' from different file formats)
- Replace positional iloc[:, 0] with _id_column in stations property
- Replace hardcoded 'id' with _id_column in file_metadata_str and drop()
Summary
Refactors pyPRMS/input/DataFile.py and pyPRMS/input/InputVariable.py to improve readability, correctness, and maintainability.

Changes

DataFile.py
- Add __repr__ showing filename, variable count, and date range
- Cache combined DataFrame in data property; add invalidate_cache() method
- Use context manager (with statement) in write_ascii to prevent file handle leaks
- Replace line[0:len(X)] == X patterns with startswith()
- Add from_file() classmethod factory; extract _resolve_units() helper
- Remove explicit object inheritance
- Make metadata and parameters private with read-only properties
- Type-annotate filename as str | os.PathLike in write_ascii for consistency with load_file

InputVariable.py
- Add __repr__ showing name, station count, and row count
- Fix data.setter to use split('_', 1) so station IDs with underscores are preserved
- Validate the metadata lookup in __init__ with a clear ValueError on missing keys
- Remove explicit object inheritance
- Add _id_column property for reliable station ID column access regardless of case
- Add full type annotations to stations, file_metadata_str, full_column_names, and drop()
- Document that drop() requires DataFile.invalidate_cache() afterward

pyproject.toml
- Add pandas-stubs to dev dependencies
- Remove # type: ignore from pandas imports in both source files

Testing
- test_datafile_repr, test_input_variable_repr, test_invalidate_cache