
Efficiencies #3

Merged: nasiryahm merged 49 commits into main from efficiencies on May 21, 2026

@nasiryahm (Collaborator)

Summary of the changes in this pull request:

  • Performance: vectorized core computations, replaced the loop-based batched_pdist with a vectorized version built on NumPy's triu_indices, vectorized construction of the B matrix, and more

  • Parallelization: unified parallelism on joblib, replacing raw multiprocess / shared_memory usage in lpca, kpca, best, and post-processing with joblib.Parallel and eliminating the shared-memory boilerplate. Post-processing is now parallelized as well, with _postprocess refactored into a standalone _postprocess_worker that evaluates in batches

  • Dynamic memory allocation: added _get_available_memory() to detect available RAM (via psutil or the PYRATS_MEMORY_LIMIT environment variable) and size work batches accordingly, preventing out-of-memory (OOM) crashes on large datasets

  • sklearn-style API rename: renamed parameters to follow sklearn conventions (d→n_components, k→n_neighbors, eta_min→min_cluster_size, max_iter→n_iter, kpca_kernel→kernel, etc.), with the old names still converted to the new ones for backwards compatibility

  • Progress bars: added tqdm progress bars (enabled with the verbose flag) to PCA, KPCA, intermediate views, and refinement loops; removed raw print statements

  • CI benchmark workflow: added .github/workflows/benchmark.yml and tests/scripts/benchmark.py for speed testing across platforms

  • Cleanup: removed duplicate imports, unused procrustes code, and multiprocess dependency; updated pyproject.toml to add tqdm; refreshed example notebooks and README
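The triu_indices idea from the performance bullet can be sketched as follows. This is a minimal, hypothetical illustration, not the PR's batched_pdist: pairwise_distances_triu and batch_size are names invented here. It builds every (i, j) pair with i < j once, then computes distances for a chunk of pairs per NumPy call, so no Python loop runs over individual pairs.

```python
import numpy as np


def pairwise_distances_triu(X, batch_size=100_000):
    """Condensed Euclidean distances, ordered like scipy.spatial.distance.pdist."""
    X = np.asarray(X, dtype=float)
    # All index pairs (i, j) with i < j, in row-major order.
    rows, cols = np.triu_indices(X.shape[0], k=1)
    out = np.empty(rows.size)
    # Process pairs in chunks so the (batch, n_features) difference array stays bounded.
    for start in range(0, rows.size, batch_size):
        sl = slice(start, start + batch_size)
        out[sl] = np.linalg.norm(X[rows[sl]] - X[cols[sl]], axis=1)
    return out
```

Note that the two index arrays still grow quadratically with the number of points; a fully memory-bounded variant would generate the pairs per block of rows instead of all at once.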
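The joblib.Parallel pattern from the parallelization bullet might look roughly like this. It is a sketch assuming a local-PCA-style workload; local_pca and _local_pca_worker are hypothetical names, not the PR's lpca code. Work is split into batches, each batch becomes one delayed call, and joblib manages the workers. For process-based backends joblib also memory-maps NumPy arguments larger than its max_nbytes threshold, which is one way it can stand in for hand-rolled shared_memory.

```python
import numpy as np
from joblib import Parallel, delayed


def _local_pca_worker(X, neighbor_batch, n_components):
    """Hypothetical worker: top principal axes of each neighbourhood in one batch."""
    axes = []
    for idx in neighbor_batch:
        patch = X[idx] - X[idx].mean(axis=0)  # centre the neighbourhood
        # Rows of vt are the principal axes, ordered by singular value.
        _, _, vt = np.linalg.svd(patch, full_matrices=False)
        axes.append(vt[:n_components])
    return axes


def local_pca(X, neighbors, n_components=2, n_jobs=-1, batch_size=64):
    """Hypothetical local PCA: one joblib task per batch of neighbourhoods."""
    X = np.asarray(X, dtype=float)
    batches = [neighbors[s:s + batch_size] for s in range(0, len(neighbors), batch_size)]
    # Arrays above joblib's max_nbytes threshold are memory-mapped for
    # process-based backends rather than copied into every task.
    results = Parallel(n_jobs=n_jobs)(
        delayed(_local_pca_worker)(X, batch, n_components) for batch in batches
    )
    # Flatten per-batch lists into one (n_points, n_components, n_features) array.
    return np.stack([axes for batch in results for axes in batch])
```

Batching keeps the number of tasks small, so per-task dispatch overhead is amortized over many neighbourhoods.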
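A hedged sketch of memory-aware batching in the spirit of _get_available_memory(). Everything here beyond psutil and the environment variable name is an assumption: get_available_memory and rows_per_batch are hypothetical names, and the sketch assumes PYRATS_MEMORY_LIMIT holds a plain byte count that takes precedence over psutil, with a fixed fallback budget when psutil is not installed. None of these details are confirmed by the PR.

```python
import os


def get_available_memory(fallback=2 * 1024**3):
    """Hypothetical stand-in for _get_available_memory(): usable RAM in bytes."""
    # Assumption: the override is a plain byte count and wins over detection.
    override = os.environ.get("PYRATS_MEMORY_LIMIT")
    if override:
        return int(override)
    try:
        import psutil
    except ImportError:
        # Assumption: a fixed conservative budget when psutil is unavailable.
        return fallback
    return psutil.virtual_memory().available


def rows_per_batch(n_cols, itemsize=8, fraction=0.5, memory=None):
    """Rows of an (n_rows, n_cols) array that fit in `fraction` of the budget."""
    if memory is None:
        memory = get_available_memory()
    return max(1, int(memory * fraction) // (n_cols * itemsize))
```

For example, with an 8,000,000-byte budget, half of it reserved, and 1,000 float64 columns, rows_per_batch(1000, memory=8_000_000) gives 500 rows per batch.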
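One common way to keep renamed parameters backwards compatible is to translate legacy keyword names before use. This is a sketch, not the PR's conversion code: translate_legacy_kwargs and RENAMED_PARAMS are hypothetical, the mapping covers only the renames spelled out in the bullet (the "etc." is not filled in), and emitting a DeprecationWarning is a design choice assumed here.

```python
import warnings

# Legacy -> sklearn-style names, limited to those listed in the PR summary.
RENAMED_PARAMS = {
    "d": "n_components",
    "k": "n_neighbors",
    "eta_min": "min_cluster_size",
    "max_iter": "n_iter",
    "kpca_kernel": "kernel",
}


def translate_legacy_kwargs(kwargs):
    """Return a copy of kwargs with legacy parameter names mapped to new ones."""
    out = {}
    for name, value in kwargs.items():
        new_name = RENAMED_PARAMS.get(name)
        if new_name is None:
            out[name] = value  # already a current name (or unrelated)
            continue
        if new_name in kwargs:
            # Refuse ambiguous calls that pass both spellings.
            raise TypeError(f"got both {name!r} and its replacement {new_name!r}")
        warnings.warn(
            f"{name!r} is deprecated; use {new_name!r} instead",
            DeprecationWarning,
            stacklevel=2,
        )
        out[new_name] = value
    return out
```

Raising on conflicting spellings avoids silently picking one value when a caller mixes old and new names.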
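Gating a tqdm bar on a verbose flag can be done with tqdm's disable argument. The loop below is a hypothetical refinement step (refine and its update rule are invented for illustration); only the tqdm(..., disable=not verbose) pattern is the point.

```python
import numpy as np
from tqdm import tqdm


def refine(X, n_iter=10, step=0.5, verbose=False):
    """Hypothetical refinement loop: pull every point toward the global mean."""
    X = np.asarray(X, dtype=float).copy()
    # The bar is drawn only when verbose=True; otherwise tqdm is a plain iterator.
    for _ in tqdm(range(n_iter), desc="Refinement", disable=not verbose):
        X += step * (X.mean(axis=0) - X)
    return X
```

With verbose=False the function prints nothing, which replaces the need for ad hoc print statements inside the loop.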

…es using smart triu and neighbourhood graph checks
… of row and col via nonzero and dist measure. Plus better csc usage
ffOj and others added 19 commits April 12, 2026 10:13
When the embedding dimension is 1, .squeeze() also removes the axis corresponding to the embedding dimension.
C and Ug are boolean sparse matrices. On my end, SciPy returns their sum as an integer-dtype matrix of summed values (treating True as 1) instead of their logical OR. Casting to a boolean dtype restores the OR-ed values.
When there is no tear, compute_tear_graph returns a tuple of Nones. Comparing that tuple with None evaluates to False, which triggered unnecessary downstream processing.
Added comments to clarify the significance of singular values in overlap calculations.
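The squeeze pitfall from the first commit message above is easy to reproduce. Purely for illustration, assume an embedding array with a leading singleton axis of shape (1, n_points, n_components); drop_leading_axis is a hypothetical helper, and the PR's actual fix may differ. Naming the axis to squeeze keeps the embedding axis even when n_components == 1.

```python
import numpy as np


def drop_leading_axis(Y):
    """Remove only axis 0 of a (1, n_points, n_components) array."""
    # Y.squeeze() would also drop the last axis when n_components == 1,
    # turning (1, n, 1) into (n,) instead of (n, 1).
    return np.squeeze(Y, axis=0)
```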
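The boolean-sparse fix from the commit message above amounts to casting the sum back to bool. A minimal sketch, with sparse_or as a hypothetical helper name: whatever dtype the sum comes back with, astype(bool) maps every nonzero entry (including a summed 2) to True, which is exactly the element-wise OR.

```python
import numpy as np
from scipy import sparse


def sparse_or(A, B):
    """Element-wise logical OR of two boolean sparse matrices."""
    # As reported in the commit, A + B may come back as integer counts
    # (True + True == 2); casting restores a boolean mask.
    return (A + B).astype(bool)
```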
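The tear-graph check from the commit message above fails because a tuple is never None. A hedged sketch of a safer test, assuming only what the commit states (an all-None tuple means no tear); no_tear is a hypothetical helper, not part of the codebase.

```python
def no_tear(result):
    """True when a tear-graph result carries no tear.

    Handles both a bare None and a tuple whose entries are all None.
    """
    # A tuple such as (None, None) is neither `is None` nor `== None`,
    # so checking the whole result against None never detects "no tear".
    if result is None:
        return True
    return all(part is None for part in result)
```

Whether every entry or only the first should be checked depends on what compute_tear_graph actually returns; this sketch assumes only the all-None case described in the commit.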
@nasiryahm merged commit 9e8344a into main on May 21, 2026
14 checks passed
@nasiryahm deleted the efficiencies branch on May 21, 2026 at 18:52
3 participants