Skip to content

Fix O(n^2) pd.concat in _get_cs_contents#642

Merged
timtreis merged 5 commits intomainfrom
fix/issue-602-quadratic-cs-contents
May 7, 2026
Merged

Fix O(n^2) pd.concat in _get_cs_contents#642
timtreis merged 5 commits intomainfrom
fix/issue-602-quadratic-cs-contents

Conversation

@timtreis
Copy link
Copy Markdown
Member

@timtreis timtreis commented May 7, 2026

Summary

  • _get_cs_contents was calling pd.concat inside a loop with four astype("bool") casts per iteration — O(n²) row copies for n coordinate systems. For a 49-sample Visium dataset this means 49 growing concat calls
  • All callers used cs_contents.query(f"cs == '{cs}'") inside loops, which is O(n) per call and raises pandas.errors.UndefinedVariableError for coordinate system names containing single quotes (e.g. "patient's_sample")

Fixes:

  • _get_cs_contents now collects rows as a list and builds the DataFrame once, with a single pass for astype("bool") — O(n) total
  • All .query(f"cs == '{cs}'") usages replaced with cs_contents.set_index("cs").loc[cs] — O(1) lookup, no injection risk
  • The redundant second call to _get_cs_contents in show() is eliminated by reusing the already-computed cs_index

Measured speedup (50 runs each):

n=  10:  16x faster
n=  50:  55x faster  
n= 100: 101x faster

Closes #602

@codecov-commenter
Copy link
Copy Markdown

codecov-commenter commented May 7, 2026

Codecov Report

❌ Patch coverage is 91.66667% with 1 line in your changes missing coverage. Please review.
✅ Project coverage is 76.39%. Comparing base (e1f8863) to head (6ffb163).
⚠️ Report is 3 commits behind head on main.

Files with missing lines Patch % Lines
src/spatialdata_plot/pl/utils.py 80.00% 0 Missing and 1 partial ⚠️
Additional details and impacted files
@@           Coverage Diff           @@
##             main     #642   +/-   ##
=======================================
  Coverage   76.39%   76.39%           
=======================================
  Files          11       11           
  Lines        3237     3237           
  Branches      759      761    +2     
=======================================
  Hits         2473     2473           
  Misses        466      466           
  Partials      298      298           
Files with missing lines Coverage Δ
src/spatialdata_plot/pl/basic.py 85.66% <100.00%> (+0.18%) ⬆️
src/spatialdata_plot/pl/utils.py 66.21% <80.00%> (-0.23%) ⬇️

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@timtreis timtreis changed the title Fix O(n²) pd.concat in _get_cs_contents; eliminate .query() injection risk (#602) Fix O(n^2) pd.concat in _get_cs_contents; eliminate .query() injection risk May 7, 2026
timtreis and others added 2 commits May 7, 2026 22:05
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ppend

- Pass pre-built cs_index to _get_elements_to_be_rendered instead of
  rebuilding set_index("cs") inside the function
- Vectorize astype("bool") to a single assignment over content_flags
- Replace += [elem] with .append(elem) in render cmd loop

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@timtreis timtreis changed the title Fix O(n^2) pd.concat in _get_cs_contents; eliminate .query() injection risk Fix O(n^2) pd.concat in _get_cs_contents May 7, 2026
timtreis and others added 2 commits May 7, 2026 22:19
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@timtreis timtreis merged commit e682fd8 into main May 7, 2026
7 of 8 checks passed
@timtreis timtreis deleted the fix/issue-602-quadratic-cs-contents branch May 7, 2026 20:28
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

O(n²) pd.concat inside loop in _get_cs_contents — slow for multi-sample datasets

2 participants