Releases: kaparoo/kaparoo-python
v0.7.0
Added
- kaparoo.filesystem.utils.ensure_file_extension: a pure (no filesystem)
extension check requiring a case-insensitive .<ext> final suffix
(raising ValueError otherwise). ext may be a single extension or an
iterable of acceptable ones (e.g. ("jpg", "jpeg")). add=True (mirroring
make on ensure_dir_exists) appends the first extension when the path
has no suffix instead of raising (np.save-style); a wrong suffix still
raises. The leading dot on ext is optional.
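
The rules above can be approximated in a few lines. This is a minimal sketch, not the library's implementation: the name check_extension, its signature, its return type, and the error messages are assumptions made for illustration, and ensure_file_extension may differ in details (for example, whether the appended extension keeps the caller's casing).

```python
from __future__ import annotations

from collections.abc import Iterable
from pathlib import PurePath


def check_extension(
    path: str | PurePath,
    ext: str | Iterable[str],
    *,
    add: bool = False,
) -> PurePath:
    """Hypothetical stand-in for the check described above (pure, no I/O)."""
    path = PurePath(path)
    candidates = [ext] if isinstance(ext, str) else list(ext)
    # The leading dot is optional and the comparison is case-insensitive.
    wanted = ["." + e.lstrip(".").lower() for e in candidates]
    if not wanted:
        raise ValueError("at least one extension is required")

    suffix = path.suffix  # final suffix only, e.g. ".gz" for "a.tar.gz"
    if suffix.lower() in wanted:
        return path
    if add and not suffix:
        # np.save-style: append the first acceptable extension.
        return path.with_name(path.name + wanted[0])
    # A wrong suffix raises even when add=True.
    raise ValueError(f"{path} must end with one of {wanted}")
```

With this sketch, check_extension("photo", ("jpg", "jpeg"), add=True) yields a path named photo.jpg, while check_extension("photo.png", "jpg", add=True) raises ValueError.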
Changed
- Renamed SegmentTimer -> SpanTimer and SegmentRecord -> SpanRecord
(module kaparoo.utils.timer). "Span" fits both lap (contiguous spans)
and measure (arbitrary spans) without implying a partition, and avoids
the "periodic timer" reading of "interval". The lap / measure methods,
the duration field, and all behavior are unchanged. Breaking: update
imports from SegmentTimer / SegmentRecord to SpanTimer / SpanRecord.
v0.6.0
Added
- kaparoo.data.sequences.TransformedSequence: a lazy view that applies a
transform callable to each item of source. get_meta passes through
source.get_meta by default (M_out = M_in); override in a subclass when
M_out differs. T_out and M_out default to T_in / M_in (PEP 696).
- kaparoo.data.sequences.ZippedSequence: element-wise zip of two
sequences -- item i is (first[i], second[i]) and metadata i is
(M1, M2) (the "paired image + label" pattern ConcatSequence cannot
express). strict=True (default) requires equal lengths and raises
ValueError on a mismatch; strict=False truncates to the shorter
length like the builtin zip. get_items / get_metas bulk-delegate to
each source. For three or more, nest the pairs.
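
The item-level semantics of both views can be sketched with plain collections.abc.Sequence subclasses. The class names TransformedView and ZippedView are hypothetical and the metadata side (get_meta and friends) is left out; this illustrates lazy transformation and strict / truncating zip only, and is not the library's code.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import Any


class TransformedView(Sequence):
    """Hypothetical lazy view: transform runs on access, not up front."""

    def __init__(self, source: Sequence, transform: Callable[[Any], Any]) -> None:
        self._source = source
        self._transform = transform

    def __len__(self) -> int:
        return len(self._source)

    def __getitem__(self, index: int) -> Any:
        return self._transform(self._source[index])


class ZippedView(Sequence):
    """Hypothetical element-wise zip of two sequences (integer indices only)."""

    def __init__(self, first: Sequence, second: Sequence, *, strict: bool = True) -> None:
        if strict and len(first) != len(second):
            raise ValueError(f"length mismatch: {len(first)} != {len(second)}")
        self._first = first
        self._second = second
        # strict=False truncates to the shorter input, like the builtin zip.
        self._length = min(len(first), len(second))

    def __len__(self) -> int:
        return self._length

    def __getitem__(self, index: int) -> tuple[Any, Any]:
        # Resolve negative indices against the truncated length.
        if index < 0:
            index += self._length
        if not 0 <= index < self._length:
            raise IndexError(index)
        return (self._first[index], self._second[index])
```

Nesting covers three inputs: ZippedView(ZippedView(a, b), c)[i] is ((a[i], b[i]), c[i]).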
Changed
- WindowedSequence[T, M_in, M_out]: M_out now defaults to M_in (PEP
696), so the common case of M_out == M_in no longer requires the third
type argument. Existing explicit three-argument usage is unaffected.
- FileFolderSequence is now a subclass of FileListSequence -- the folder
case is just a FileListSequence whose list is discovered under a root
and stored root-relative. Its API and behavior are unchanged (paths are
still kept relative and get_file re-prepends root), but
isinstance(seq, FileListSequence) is now True for folder sequences.
v0.5.0
Added
- kaparoo.utils.aggregate (still experimental): Var and Std reductions --
weighted population variance and standard deviation, accumulated online
(Welford) and merged exactly (Chan's parallel algorithm), so they nest
across loop levels like the other reductions.
- kaparoo.data.sequences.FileListSequence: a "one file per item"
DataSequence over an explicit, ordered list of files. Unlike
FileFolderSequence it takes the files directly (no root discovery),
so they may live in unrelated directories -- or, on Windows, different
drives -- which FileFolderSequence cannot represent. Subclasses
implement only load_file / get_meta; the input order is preserved
verbatim (duplicates kept) and files are loaded lazily.
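
For readers unfamiliar with the two algorithms named for Var / Std, the following sketch shows a weighted Welford update and Chan's pairwise merge for population variance. The class RunningVar and its method names are hypothetical; the actual Reduction interface in kaparoo.utils.aggregate is not reproduced here.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class RunningVar:
    """Hypothetical weighted population-variance accumulator."""

    weight: float = 0.0  # sum of weights seen so far
    mean: float = 0.0
    m2: float = 0.0  # weighted sum of squared deviations from the mean

    def update(self, value: float, weight: float = 1.0) -> None:
        # Weighted Welford step: one pass, constant memory.
        if weight <= 0:
            return
        total = self.weight + weight
        delta = value - self.mean
        self.mean += delta * weight / total
        self.m2 += weight * delta * (value - self.mean)
        self.weight = total

    def merge(self, other: RunningVar) -> None:
        # Chan's combination of two partial results; algebraically equal to
        # having fed every sample of `other` through update().
        if other.weight == 0:
            return
        total = self.weight + other.weight
        delta = other.mean - self.mean
        self.m2 += other.m2 + delta * delta * self.weight * other.weight / total
        self.mean += delta * other.weight / total
        self.weight = total

    def variance(self) -> float:
        if self.weight == 0:
            raise ValueError("no samples")
        return self.m2 / self.weight

    def std(self) -> float:
        return math.sqrt(self.variance())
```

Because merge pools the sufficient statistics (weight, mean, m2) rather than averaging variances, a per-batch accumulator can be merged into a per-epoch one without revisiting the samples.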
Fixed
- make_dirs now raises NotADirectoryError (matching make_dir) when a
path exists but is not a directory, instead of the divergent
FileExistsError that mkdir produced.
- make_dir / make_dirs validate every path before any directory is
wiped or created, so a deterministically bad entry (e.g. a file in the
list) no longer leaves earlier directories already cleaned or created.
- make_dir(clean=True) / make_dirs(clean=True) reject a symlink with
NotADirectoryError rather than failing deep inside shutil.rmtree;
cleaning never operates through a link.
- reserve_path / reserve_paths treat a symlink -- including a broken
one, which Path.exists reports as absent -- as occupying the path.
- StagedFile.commit (with overwrite=False) no longer fails outright on a
filesystem without hardlink support (FAT/exFAT, some network mounts): it
falls back to an existence check plus replace instead of losing the staged
content to a raw OSError.
- StagedFile.commit / StagedDirectory.commit now fsync the destination's
parent directory after the move, so the committed result survives a crash
on POSIX (a no-op where directories cannot be fsynced, e.g. Windows).
- StagedDirectory.commit with overwrite=True now restores the original
directory if moving the staged one into place fails, instead of leaving
the destination missing with the old contents stranded under a <name>.old
name; the backup removal is best-effort.
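
The overwrite=False commit path and the parent-directory fsync can be illustrated with stdlib calls only. This is a sketch under assumptions: commit_exclusive and fsync_parent are hypothetical helper names, and the real StagedFile.commit may inspect errno values or order its steps differently.

```python
from __future__ import annotations

import os
from pathlib import Path


def fsync_parent(path: Path) -> None:
    """Flush the directory entry for `path`; a no-op where unsupported."""
    if os.name == "nt":
        return  # directories cannot be fsynced on Windows
    fd = os.open(path.parent, os.O_RDONLY)
    try:
        os.fsync(fd)
    except OSError:
        pass  # some filesystems reject fsync on a directory
    finally:
        os.close(fd)


def commit_exclusive(staged: Path, dest: Path) -> None:
    """Move `staged` to `dest` without overwriting (hypothetical helper)."""
    try:
        # Hard-linking is atomic and fails if `dest` already exists.
        os.link(staged, dest)
    except FileExistsError:
        raise
    except OSError:
        # Assumed here to mean "no hardlink support" (e.g. FAT/exFAT).
        # Fall back to check-then-replace, which has a small race window.
        if os.path.lexists(dest):  # lexists also sees broken symlinks
            raise FileExistsError(f"{dest} already exists") from None
        os.replace(staged, dest)
    else:
        os.unlink(staged)  # drop the staging name; `dest` keeps the data
    fsync_parent(dest)
```

The fallback trades atomic exclusivity for availability: on a filesystem without hardlinks the staged content still reaches its destination, at the cost of a brief window in which another writer could create the same path.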
v0.4.0
Added
- kaparoo.filesystem.staged.StagedFile: a safe (atomic) file writer.
Content is staged in a temporary file in the destination's directory and
moved into place only on commit, so readers never see a half-written file
and a failed write leaves any existing file untouched. Usable as a context
manager (commit on clean exit, discard on exception) or explicitly like
a file object (write / seek / tell / flush, plus commit /
abort, path, committed, and the underlying file). Text by default
(StagedFile[str]) with optional encoding / newline; binary=True
gives a binary writer (StagedFile[bytes]), the type parameter tracking
the mode. overwrite=False (default) fails fast on an existing destination
and creates the file atomically; overwrite=True replaces it, keeping its
permissions; make_parents=True creates a missing parent directory. An
uncommitted writer discards its staged file on garbage collection.
- kaparoo.filesystem.staged.StagedDirectory: the directory counterpart of
StagedFile. Files are written into a temporary workdir in the
destination's parent and moved into place on commit. Same context-manager /
explicit usage and commit / abort / path / committed API (plus
workdir), and the same overwrite / make_parents options. Creating a
new directory is atomic (single rename); replacing an existing one
(overwrite=True) moves the old directory aside and then removes it, which
is not fully atomic. An uncommitted builder discards its staging directory
on garbage collection.
- kaparoo.filesystem.utils.reserve_path / reserve_paths: a guard (and
its bulk form) for a path that should not yet exist, returning it
(optionally stringified) so the caller can create something there.
exist_ok (named as in make_dir / Path.mkdir) is a
non-destructive bypass (nothing is deleted) and make_parents
creates the parent directory when missing.
Raises FileExistsError on conflict. reserve_paths is fail-fast and
takes no root (compose with wrap_paths(prepend=...)). For directory
destinations prefer make_dir(exist_ok=...); for exclusive file creation
the stdlib open(path, "x") suffices.
- clean option on make_dir / make_dirs: when an existing directory
is present, remove its contents and recreate it empty (a fresh slate).
Destructive, and only ever wipes a directory -- a non-directory at
the path still raises NotADirectoryError. clean=True makes exist_ok
moot, since the directory is removed and remade.
- kaparoo.filesystem directory checks dir_not_empty,
dir_not_empty_unsafe, dirs_not_empty, and dirs_not_empty_unsafe,
the negated counterparts of the dir_empty series. dirs_not_empty
is True only when every directory is non-empty.
- kaparoo.utils.aggregate module (experimental -- the API may change in
a later release): Aggregator for nested, pluggable metric aggregation
(the batch → epoch → run pattern). Each metric is
reduced by a Reduction -- built-ins Mean (weighted), Sum, Min,
Max, Last, and Fold (a scalar monoid from a callable) -- with
per-metric overrides. Reductions are online (constant memory); nested
levels compose via merge (exact sample-weighted pooling) or
update(child.compute(), ...) (different reduction per level). Custom
reductions subclass Reduction / UnweightedReduction.
- SegmentTimer.measure(label): a stopwatch-style context manager (and
decorator) that records a segment covering only the wrapped block, so
time spent outside any measure block is excluded from records /
summary. Complements lap, which splits the timeline into
contiguous segments. Pauses inside the block are excluded; a block
that raises records nothing.
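
The stage-then-move idea behind StagedFile can be shown with a small stdlib-only context manager. This is an illustrative sketch, not the library's class: staged_write is a hypothetical name, it covers only the text-mode overwrite=True case, and it does not preserve the destination's permissions as the real writer is described to do.

```python
from __future__ import annotations

import contextlib
import os
import tempfile
from pathlib import Path
from typing import IO, Iterator


@contextlib.contextmanager
def staged_write(
    dest: str | os.PathLike[str], *, encoding: str = "utf-8"
) -> Iterator[IO[str]]:
    """Hypothetical atomic text writer: commit on clean exit, else discard."""
    dest = Path(dest)
    # Stage next to the destination so the final move stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=dest.parent, prefix=dest.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding=encoding) as handle:
            yield handle
            handle.flush()
            os.fsync(handle.fileno())
        # Atomic replacement: readers see the old file or the new one.
        os.replace(tmp_name, dest)
    except BaseException:
        # Failed write: drop the staged file, leave `dest` untouched.
        with contextlib.suppress(FileNotFoundError):
            os.unlink(tmp_name)
        raise
```

Usage follows the context-manager form described above: writing inside `with staged_write(path) as fh:` replaces path only if the block exits without an exception.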
Changed
- Renamed LapTimer -> SegmentTimer, LapRecord -> SegmentRecord,
and the record field lap_time -> duration, reflecting that the
timer now records named segments via both lap (split) and the new
measure (block). The lap method keeps its name.
- Timer.resume / SegmentTimer.resume now return None instead of
the pause duration in nanoseconds. The value had no consumer
(suspend discarded it) and leaked a raw-nanosecond figure that broke
the timer's unit abstraction. Subclasses that need the pause
interval override the new protected _resume hook instead.
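
The difference between lap (split) and measure (block) is easiest to see in a toy timer. ToyTimer and Record below are hypothetical and deliberately simplified: there is no pause / resume, no unit handling, and no summary, so this is only a sketch of the two recording styles, not of SegmentTimer itself.

```python
from __future__ import annotations

import time
from contextlib import contextmanager
from dataclasses import dataclass
from typing import Iterator


@dataclass(frozen=True)
class Record:
    label: str
    duration: float  # seconds in this sketch


class ToyTimer:
    """Hypothetical timer contrasting lap (split) with measure (block)."""

    def __init__(self) -> None:
        self.records: list[Record] = []
        self._last = time.perf_counter()

    def lap(self, label: str) -> None:
        # Split: covers everything since the previous lap (or the start),
        # so consecutive laps tile the timeline without gaps.
        now = time.perf_counter()
        self.records.append(Record(label, now - self._last))
        self._last = now

    @contextmanager
    def measure(self, label: str) -> Iterator[None]:
        # Block: covers only the wrapped code; time outside is not counted.
        start = time.perf_counter()
        yield  # an exception propagates from here, so nothing is recorded
        self.records.append(Record(label, time.perf_counter() - start))
```

Because contextlib.contextmanager objects also work as decorators, `@timer.measure("step")` on a function records one entry per call in this sketch.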
v0.3.0
Added
- kaparoo.data.sequences subpackage: a Sequence-based foundation for
  dataset code.
  - DataSequence[T, M] ABC with abstract get_item / get_meta and
    default get_items / get_metas / get_pair / get_pairs.
    __getitem__ returns the item only.
  - Composers: SlicedSequence (stable-length view at given indices,
    duplicates allowed and order preserved); ConcatSequence
    (O(log N) lookup over multiple sources via cumulative lengths +
    bisect_right); WindowedSequence[T, M_in, M_out] (abstract
    sliding window with size / step / skip; get_item is
    implemented, get_meta is left abstract).
  - Templates: FileFolderSequence (folder-rooted, one file per item;
    subclasses implement list_files / load_file / get_meta;
    supports the "set state BEFORE super().__init__()" pattern for
    parameterized subclasses); SingleFileSequence (thin ABC for
    "one file, many records" formats).
Changed
- generate_batches: step, skip, start, stop, and drop_last
are now keyword-only. Empty ranges (start == stop) are accepted
and yield no batches. Docstring expanded.
Fixed
- register_filter decorator now preserves the decorated subclass's
type. Previously it widened to type[Filter], so static checkers
rejected subclass-specific constructor calls on decorated classes.
- generate_batches with drop_last=False: the final partial window
no longer extends past stop when stop < len(sequence).
Removed
- kaparoo.data.sequence (single module) and kaparoo.data.utils --
replaced by the kaparoo.data.sequences subpackage. The previous
DataSequence.by_index / by_indices API was a placeholder and
has been superseded by get_item / get_items / get_meta /
get_metas / get_pair / get_pairs.
v0.2.1
Added
- Filter serialization: Filter.to_dict() / Filter.from_dict() with
a "kind"-discriminated polymorphic dispatcher. Each concrete
filter round-trips through a JSON-compatible dict.
- register_filter(kind) decorator for registering custom Filter
subclasses with the polymorphic dispatcher.
- Filter.parse(value) -- normalizes either a Filter instance
(passed through) or a FilterDict into a Filter.
- FilterDict TypedDict family at
kaparoo.filesystem.search.filters.types: FilterDict (base,
kind-only), PatternFilterDict, MultiPatternFilterDict,
LogicalChildrenFilterDict, LogicalChildFilterDict. User-defined
filter dicts extend these to type-check against Filter.parse.
- Search.run / search_paths / search_files / search_dirs
accept a FilterDict for part_filter and name_filter in
addition to a Filter instance.
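
A "kind"-discriminated dispatcher of the sort described above can be sketched in plain Python. Everything below is a self-contained illustration that reuses the names Filter, register_filter, to_dict, from_dict, and parse for readability; the registry, the SuffixFilter subclass, the "suffix" kind, and the mapping of dict keys to constructor keywords are assumptions and may not match the library's layout.

```python
from __future__ import annotations

from typing import Any, Callable, ClassVar, TypeVar

F = TypeVar("F", bound="Filter")
_REGISTRY: dict[str, type[Filter]] = {}  # kind -> concrete class


def register_filter(kind: str) -> Callable[[type[F]], type[F]]:
    # Returning type[F] (not type[Filter]) keeps the subclass type visible
    # to static checkers after decoration.
    def decorator(cls: type[F]) -> type[F]:
        cls.kind = kind
        _REGISTRY[kind] = cls
        return cls

    return decorator


class Filter:
    kind: ClassVar[str]

    def to_dict(self) -> dict[str, Any]:
        return {"kind": self.kind}

    @staticmethod
    def from_dict(data: dict[str, Any]) -> Filter:
        payload = dict(data)  # do not mutate the caller's dict
        cls = _REGISTRY[payload.pop("kind")]
        # Assumption: remaining keys map onto constructor keywords.
        return cls(**payload)

    @classmethod
    def parse(cls, value: Filter | dict[str, Any]) -> Filter:
        # Instances pass through; dicts go to the dispatcher.
        return value if isinstance(value, Filter) else cls.from_dict(value)


@register_filter("suffix")
class SuffixFilter(Filter):
    """Hypothetical user-defined filter."""

    def __init__(self, suffix: str) -> None:
        self.suffix = suffix

    def to_dict(self) -> dict[str, Any]:
        return {**super().to_dict(), "suffix": self.suffix}
```

In this sketch, Filter.parse({"kind": "suffix", "suffix": ".py"}) builds a SuffixFilter, and the dict form survives a json.dumps / json.loads round trip unchanged.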
v0.2.0
Published on PyPI: https://pypi.org/project/kaparoo-python/0.2.0/
uv add kaparoo-python # or: pip install kaparoo-python
Requires Python 3.14+.
See CHANGELOG.md for the full list of changes.