Efficiencies by nasiryahm · Pull Request #3 · Mishne-Lab/pyRATS

nasiryahm · 2026-04-10T10:18:18Z

Summary of the changes in this pull request:

Performance changes: vectorized core computations, replacing the loop-based batched_pdist with NumPy triu_indices vectorization and vectorizing B matrix construction, among other changes
Parallelization: unified parallelism on joblib. Replaced raw multiprocess / shared_memory usage in lpca, kpca, best, and post-processing with joblib.Parallel, eliminating shared-memory boilerplate. Also parallelized post-processing and refactored _postprocess into a standalone _postprocess_worker with batched evaluation
Dynamic memory allocation: added _get_available_memory() to auto-detect available RAM (via psutil or PYRATS_MEMORY_LIMIT env var) and batch work accordingly, preventing OOM crashes on large datasets
sklearn-style API rename: renamed parameters to follow sklearn conventions (d→n_components, k→n_neighbors, eta_min→min_cluster_size, max_iter→n_iter, kpca_kernel→kernel, etc.) with conversions for backwards compatibility
Progress bars: added tqdm progress bars (via verbose flag) to PCA, KPCA, intermediate views, and refinement loops; removed raw print statements
CI benchmark workflow: added .github/workflows/benchmark.yml and tests/scripts/benchmark.py for speed testing across platforms
Cleanup: removed duplicate imports, unused procrustes code, and multiprocess dependency; updated pyproject.toml to add tqdm; refreshed example notebooks and README
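
As an illustration of the triu_indices vectorization mentioned under performance changes, here is a minimal sketch. The function names, the (batch, n_points, dim) input layout, and the Euclidean metric are assumptions made for this example; the actual batched_pdist in pyRATS may differ.

```python
import numpy as np


def batched_pdist_loop(X):
    """Reference version: one Python-level loop iteration per point pair."""
    batch, n_points, _ = X.shape
    out = np.empty((batch, n_points * (n_points - 1) // 2))
    col = 0
    for i in range(n_points):
        for j in range(i + 1, n_points):
            out[:, col] = np.linalg.norm(X[:, i, :] - X[:, j, :], axis=-1)
            col += 1
    return out


def batched_pdist_triu(X):
    """Vectorized version: index every upper-triangle pair at once."""
    n_points = X.shape[1]
    rows, cols = np.triu_indices(n_points, k=1)  # all pairs with row < col
    diffs = X[:, rows, :] - X[:, cols, :]        # (batch, n_pairs, dim)
    return np.linalg.norm(diffs, axis=-1)        # (batch, n_pairs)


rng = np.random.default_rng(0)
X = rng.normal(size=(4, 6, 3))  # 4 batches of 6 points in 3 dimensions
assert np.allclose(batched_pdist_loop(X), batched_pdist_triu(X))
```

Both versions return distances in the same condensed (row-major upper-triangle) order, so the vectorized form can stand in for the loop without changing downstream indexing.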


nasiryahm added 30 commits March 24, 2026 16:38

fix: Parallelization broken in lpca and different parallelism backends unified. Removed duplicated imports.

dda0543

Tests: Update to CI for new parallel library changes plus fix for randomization issue

a4e1851

speedup: Parallelization of post-processing neighbour distance measures using smart triu and neighbourhood graph checks

6aac4b1

speedup: Vectorizing pdist

8035701

fix: Improving error handling and variable descriptions

01ab3c8

style: Code cleanups for neatness and readability. Removed unused procrustes code.

0e2313b

CI: New continuous integration testing for speed. Tests speed across a range of datapoint numbers

e425db7

speedup: Dhruv's eigh implementation

cc84dd7

fix: Imports fix

40db85a

speedup: Vectorizing B construction and avoiding sparse to dense conversions

4641346

speedup: Speedup of zeta function by replacing pdist with measurement of row and col via nonzero and dist measure. Plus better csc usage

26528b3

Merging zeta change

5bdc3d7

tests: Updating benchmarking tests to include windows, mac, and linux test

cb6fcd1

Test: Adding a test benchmark script to give timings across different commits

a1a2806

Fix: Issue with sha code

ef23418

Matrixifying refs for benchmarks and removing Dhruv's

bfca429

Efficiency: Memefficiency via Joshua code and updated sha for comparisons

274f841

Fix: new mode of batch all and importing

1254199

Fix: Past git repo commit needed init

a82f827

Update: Cleaner method for src reconstruction for commit comparisons

d1f3123

Test: Alternative memory efficient, with parallelization over num threads, for postprocessing

81018d5

Style: Updating parameter names to match sklearn

3325634

Fix: Test var names

d7e10d6

style: Naming change of buml_obj to model

2d7695b

feat: New method for dynamic allocation of memory to enable large datasets without crashes

d7fcb10

Fix: Dynamic memory allocation fix

d8f85a1
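
The dynamic memory allocation described in the summary (psutil or the PYRATS_MEMORY_LIMIT environment variable, then batching accordingly) might look roughly like the sketch below. Everything here is an assumption for illustration: the function names are hypothetical, the environment variable is assumed to hold a plain byte count, and the fallback default and safety factor are arbitrary. It is not the actual _get_available_memory() implementation.

```python
import os


def get_available_memory_bytes(default=1 << 30):
    """Estimate usable RAM in bytes (hypothetical helper)."""
    limit = os.environ.get("PYRATS_MEMORY_LIMIT")
    if limit is not None:
        # Assumption: the variable holds a plain integer number of bytes.
        return int(limit)
    try:
        import psutil  # optional dependency
        return int(psutil.virtual_memory().available)
    except ImportError:
        # Neither source is available: fall back to a fixed guess (1 GiB).
        return default


def batch_size_for(n_rows, bytes_per_row, safety=0.5):
    """Pick how many rows to process at once within a fraction of RAM."""
    budget = get_available_memory_bytes() * safety
    return max(1, min(n_rows, int(budget // bytes_per_row)))


# Example: cap memory at 1000 bytes, rows cost 100 bytes each.
os.environ["PYRATS_MEMORY_LIMIT"] = "1000"
batch = batch_size_for(n_rows=100, bytes_per_row=100)
```

Checking the environment variable first lets a user override auto-detection on shared machines, where the RAM reported as available may not all be usable by one job.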

style: progress bar update and verbose cleanup

87b2e81

Update to readme and pyproject (tqdm addition)

1648a79

style: Cleaner tqdms

a074cc9

speedup test: bottlenecks re-introduced. Testing shift back.

9203f56

nasiryahm force-pushed the efficiencies branch from 9e613c4 to 9203f56 on April 11, 2026 07:18

ffOj and others added 19 commits April 12, 2026 10:13

feat: add global distortion measurement

a46dceb

Fix for the breakages of parallelization in batched calls

293c317

Memory Limiting Overhaul Check

5369ea3

Allowing multithreading within BLAS workers

2069762

ci fixes

f01c358

Testing more efficient allocation and memory checks

a2e7add

Setting cpu core checker and warning

84a46f4

Fix param.v update

7db1ddf

When the embedding dimension is 1, an unqualified .squeeze() also removes the axis corresponding to the embedding dimension.
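
A small sketch of the pitfall, using a made-up array layout (points, a singleton axis, embedding dimension) that is only an assumption for illustration:

```python
import numpy as np

# Hypothetical layout: 5 points, one singleton axis, embedding dimension 1.
Y = np.zeros((5, 1, 1))

collapsed = Y.squeeze()        # drops every length-1 axis, including the embedding axis
preserved = Y.squeeze(axis=1)  # drops only the named singleton axis

print(collapsed.shape, preserved.shape)
```

Naming the axis explicitly keeps the result two-dimensional regardless of the embedding dimension.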

Cast Utildeg to boolean after matrix multiplication

39534f8

C and Ug are boolean sparse matrices. On my end, scipy returns a matrix with an int dtype holding summed values (treating True as 1) instead of OR-ing them. Casting the result to a boolean dtype restores the OR-ed values.
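
The effect can be reproduced with a toy pair of matrices. The values below are made up, and the int32 cast is used here only to force the summing behaviour deterministically, since the dtype of a boolean sparse product may depend on the SciPy version:

```python
import numpy as np
import scipy.sparse as sp

# Toy stand-ins for the boolean sparse matrices C and Ug.
C = sp.csr_matrix(np.array([[1, 1], [0, 1]], dtype=bool))
Ug = sp.csr_matrix(np.array([[1, 0], [1, 1]], dtype=bool))

# Integer arithmetic counts how many terms contribute to each entry.
counts = C.astype(np.int32) @ Ug.astype(np.int32)

# Casting the product to bool collapses any nonzero count to True,
# which is the logical OR over the contributing terms.
Utildeg = (C @ Ug).astype(bool)

print(counts.toarray())
print(Utildeg.toarray())
```

The explicit cast after the multiplication makes the result independent of whichever dtype the product happens to come back with.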

Fix output collection

cf7eced

When there is no tear, compute_tear_graph returns a tuple of None values. Comparing that tuple with None evaluates to False, which triggers unnecessary subsequent processing.
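
The comparison issue can be shown with a stand-in function. The stub below is hypothetical and returns a two-element tuple purely for illustration; the real compute_tear_graph signature and return arity are not shown in this pull request.

```python
def compute_tear_graph_stub(has_tear):
    """Hypothetical stand-in: returns a tuple of None when there is no tear."""
    if not has_tear:
        return None, None
    return "tear_graph", "tear_edges"


result = compute_tear_graph_stub(has_tear=False)

# A tuple is never None, even if all of its elements are.
naive_no_tear = result is None

# Checking the elements detects the no-tear case correctly.
robust_no_tear = result is None or all(item is None for item in result)

print(naive_no_tear, robust_no_tear)
```

Using `is None` on each element also avoids elementwise comparison surprises if the returned objects are arrays or sparse matrices.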

Enhance comments on singular value handling in overlaps

07bb35e

Added comments to clarify the significance of singular values in overlap calculations.

Reverting to old parallel lib for exact matching to Dhruv's initial codebase

8fcf478

Memory cache update for memory testing

02ca601

Merge branch 'main' into efficiencies

5d35f4c

Merge remote-tracking branch 'origin/chiggum-patch-1' into efficiencies

898a7bb

Merge remote-tracking branch 'origin/chiggum-patch-2' into efficiencies

9c4107c

Merge remote-tracking branch 'origin/chiggum-patch-3' into efficiencies

c78c0ce

Merge remote-tracking branch 'origin/feature/global-distortion' into efficiencies

c9890a7

tqdm cleanup

101ba26

nasiryahm merged commit 9e8344a into main May 21, 2026
14 checks passed

nasiryahm deleted the efficiencies branch May 21, 2026 18:52

nasiryahm merged 49 commits into main from efficiencies


3 participants
