Fix O(n^2) pd.concat in _get_cs_contents #642
Merged
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #642 +/- ##
=======================================
Coverage 76.39% 76.39%
=======================================
Files 11 11
Lines 3237 3237
Branches 759 761 +2
=======================================
Hits 2473 2473
Misses 466 466
Partials 298 298
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ppend
- Pass pre-built cs_index to _get_elements_to_be_rendered instead of
rebuilding set_index("cs") inside the function
- Vectorize astype("bool") to a single assignment over content_flags
- Replace += [elem] with .append(elem) in render cmd loop
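The two smaller changes in the commit message above (a single `astype("bool")` assignment over `content_flags`, and `.append(elem)` in place of `+= [elem]`) can be illustrated with a minimal sketch. The column names, the sample data, and the `render_cmds` list are hypothetical; the commit message only names `content_flags`, not its contents.

```python
import pandas as pd

# Hypothetical flag columns; the commit message names content_flags but not its values.
content_flags = ["has_images", "has_labels", "has_points", "has_shapes"]

df = pd.DataFrame(
    {
        "cs": ["global", "aligned"],
        "has_images": [1, 0],
        "has_labels": [0, 0],
        "has_points": [1, 1],
        "has_shapes": [0, 1],
    }
)

# One assignment casts every flag column, instead of one astype call per column.
df[content_flags] = df[content_flags].astype("bool")

# list.append adds the element in place; `+= [elem]` first builds a
# temporary one-element list on every iteration.
render_cmds = []  # hypothetical name for the list built in the render loop
for elem in ["images", "points"]:
    render_cmds.append(elem)
```

Both forms produce the same result as the originals; the difference is only in the number of intermediate objects created.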
Summary
- `_get_cs_contents` was calling `pd.concat` inside a loop with four `astype("bool")` casts per iteration: O(n²) row copies for n coordinate systems. For a 49-sample Visium dataset this means 49 growing concat calls
- `cs_contents.query(f"cs == '{cs}'")` inside loops, which is O(n) per call and raises `pandas.errors.UndefinedVariableError` for coordinate system names containing single quotes (e.g. `"patient's_sample"`)

Fixes:
- `_get_cs_contents` now collects rows as a list and builds the DataFrame once, with a single pass for `astype("bool")`: O(n) total
- `.query(f"cs == '{cs}'")` usages replaced with `cs_contents.set_index("cs").loc[cs]`: O(1) lookup, no injection risk
- The `_get_cs_contents` call in `show()` is eliminated by reusing the already-computed `cs_index`

Measured speedup (50 runs each):
Closes #602
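
For readers unfamiliar with the pattern, the approach described in the summary (collect rows in a list, build the DataFrame once, then look rows up by index label rather than with a query string) can be sketched as follows. This is an illustration under assumptions, not the code from this PR: `build_cs_contents`, the `FLAGS` mapping, and the input dictionary are hypothetical names, and only `cs_contents`, `cs_index`, `set_index("cs")`, and `.loc[cs]` come from the description.

```python
import pandas as pd

# Hypothetical mapping of flag column -> element type; the PR does not list
# the actual column names.
FLAGS = {
    "has_images": "images",
    "has_labels": "labels",
    "has_points": "points",
    "has_shapes": "shapes",
}


def build_cs_contents(elements_by_cs):
    """Collect one plain dict per coordinate system, then build the frame once."""
    rows = []
    for cs, element_types in elements_by_cs.items():
        row = {"cs": cs}
        for flag, element_type in FLAGS.items():
            row[flag] = element_type in element_types
        rows.append(row)  # cheap list append, no DataFrame copy per iteration
    cs_contents = pd.DataFrame(rows, columns=["cs", *FLAGS])
    # Single cast over all flag columns after the frame exists.
    cs_contents[list(FLAGS)] = cs_contents[list(FLAGS)].astype("bool")
    return cs_contents


cs_contents = build_cs_contents(
    {"global": {"images", "shapes"}, "patient's_sample": {"points"}}
)

# Build the index once and reuse it; a label lookup does not parse an
# expression string, so a quote in the name needs no escaping.
cs_index = cs_contents.set_index("cs")
flags_for_sample = cs_index.loc["patient's_sample"]
```

The index is built once and passed around, which is the same idea as reusing the already-computed `cs_index` instead of calling `set_index("cs")` inside each helper.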