chore: bump dependencies by tristan-f-r · Pull Request #310 · Reed-CompBio/spras

tristan-f-r · 2025-07-01T22:20:30Z

Closes #210. Closes #134.

Depends on test(evaluate): dynamically serialize data.pickle #340.

ntalluri · 2025-07-02T16:50:44Z

Do you have an example of this (the image) that you could add as a comment?

ntalluri · 2025-07-02T16:54:29Z

Also, could you mention what computer you are using? (There was something weird with Windows 10 and scikit-learn that we were dealing with two years ago but never fixed; see #135.)

tristan-f-r · 2025-07-02T16:56:14Z

NixOS 25.05.20250108.bffc22e (Warbler) on x86_64, though I do think this issue is reproducing on CI, so I doubt this is platform dependent. I'll reproduce it.

tristan-f-r · 2025-07-02T20:57:14Z

@ntalluri

spras/test/ml/test_ml.py

Lines 83 to 95 in 54bd449

    def test_pca_robustness(self):
        dataframe = ml.summarize_networks([INPUT_DIR + 'test-data-s1/s1.txt', INPUT_DIR + 'test-data-s2/s2.txt', INPUT_DIR + 'test-data-s3/s3.txt'])
        expected = pd.read_table(EXPECT_DIR + 'expected-pca-coordinates.tsv')
        expected = expected.round(5)
        for _ in range(5):
            dataframe_shuffled = dataframe.sample(frac=1, axis=1)  # permute the columns
            ml.pca(dataframe_shuffled, OUT_DIR + 'pca-shuffled-columns.png', OUT_DIR + 'pca-shuffled-columns-variance.txt',
                OUT_DIR + 'pca-shuffled-columns-coordinates.tsv')
            coord = pd.read_table(OUT_DIR + 'pca-shuffled-columns-coordinates.tsv')
            coord = coord.round(5)  # round values to 5 digits to account for numeric differences across machines
            coord.sort_values(by='algorithm', ignore_index=True, inplace=True)
            assert coord.equals(expected)

Coord:
      algorithm      PC1      PC2
0  test-data-s1 -2.00665 -0.98659
1  test-data-s2 -1.52765  1.07995
2  test-data-s3  3.53430 -0.09336

      algorithm      PC1      PC2
0  test-data-s1 -2.00665 -0.98659
1  test-data-s2 -1.52765  1.07995
2  test-data-s3  3.53430 -0.09336

Expected:
      algorithm      PC1      PC2
0  test-data-s1  2.00665 -0.98659
1  test-data-s2  1.52765  1.07995
2  test-data-s3 -3.53430 -0.09336

      algorithm      PC1      PC2
0  test-data-s1 -2.00665 -0.98659
1  test-data-s2 -1.52765  1.07995
2  test-data-s3  3.53430 -0.09336

tristan-f-r · 2025-07-02T21:16:09Z

Found it. https://scikit-learn.org/stable/whats_new/v1.5.html#changed-models

ntalluri · 2025-07-03T15:23:35Z

Based on what I’m reading, bumping the version is a good idea because it improves PCA in a meaningful way. Previously (in version 1.2), the signs of the PCA components were chosen by looking at the transformed data (the data after projection); now they are chosen by inspecting each component vector directly. Each solver used to decide the signs in its own way, so changing the solver could cause components to flip signs unexpectedly. The results were still mathematically correct, but the flips caused confusion. The new approach (listed in the 1.5 changelog linked above, so it applies to version 1.7) makes the sign choice independent of the solver because it relies only on the component vector’s own values, so we get consistent component signs no matter which solver we use.

We don't allow a user to pick a solver, we have it set to auto at the moment.
https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html#sklearn.decomposition.PCA:~:text=svd_solver%7B%E2%80%98auto%E2%80%99%2C%20%E2%80%98full,optionally%20truncated%20afterwards.
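To make the sign discussion concrete, here is a minimal, dependency-free sketch of a component-based sign convention: flip each component so that its largest-magnitude entry is positive. The helper name `normalize_sign` is hypothetical, and this is a simplification for illustration only, not scikit-learn's actual implementation.

```python
def normalize_sign(component):
    """Flip `component` so that its largest-magnitude entry is positive.

    Hypothetical helper for illustration; not scikit-learn's code.
    """
    # Index of the entry with the largest absolute value (first one wins on ties).
    pivot = max(range(len(component)), key=lambda i: abs(component[i]))
    sign = -1.0 if component[pivot] < 0 else 1.0
    return [sign * x for x in component]


v = [0.5, -0.8, 0.1]
flipped = [-x for x in v]

# v and -v describe the same principal axis; the convention picks one of them.
assert normalize_sign(v) == normalize_sign(flipped) == [-0.5, 0.8, -0.1]

# Under this simplified rule, an exact tie between entries of opposite sign is
# broken by position, so reordering the entries can change the chosen sign.
tied = [0.7, -0.7]
swapped = [-0.7, 0.7]  # same two entries in the other order
assert normalize_sign(tied) == [0.7, -0.7]
assert normalize_sign(swapped) == [0.7, -0.7]  # the entry that was +0.7 is now -0.7
```

The rule depends only on the component's own values, which is why it does not care which solver produced the vector. The tie case is a property of this sketch; whether it explains the behaviour seen in the shuffled-columns test is an assumption that would need checking.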

untested; need to open my devcontainer

tristan-f-r · 2025-07-03T17:05:35Z

Let me know if this commit is the right way to approach that. To my understanding, this does mess with our sampling test because our component vectors differ per run; hopefully, the aforementioned commit addresses that.

tristan-f-r · 2025-07-03T17:50:30Z

Blocked by #265 - we should discuss removing GraphSpace support in today's meeting.

ntalluri · 2025-07-03T20:03:24Z

> Let me know if this commit is the right way to approach that. To my understanding, this does mess with our sampling test because our component vectors differ per run; hopefully, the aforementioned commit addresses that.

For that specific test, we want the output to be the same despite shuffling the rows and columns. So having 2 different outputs is not what we would want.

tristan-f-r · 2025-07-03T20:27:38Z

> For that specific test, we want the output to be the same despite shuffling the rows and columns. So having 2 different outputs is not what we would want.

Hm, do we roll back scikit-learn then? The PCA signage is going to start depending on the content, and it seems that permutations count as a mutation over the content.

ntalluri · 2025-07-07T18:09:22Z

Maybe we need to rethink the test then. Is it okay the values are the same but the signs change? Or should it always be the same values no matter the data? (Just some questions to ask). I will think through what we can do for this test case.

agitter · 2025-07-13T13:19:48Z

> Is it okay the values are the same but the signs change?

Can you help me get caught up on the core question? It seems valid to have a transformation of the PCA solution where the signs are flipped.

tristan-f-r · 2025-07-13T16:28:12Z

> Can you help me get caught up on the core question? It seems valid to have a transformation of the PCA solution where the signs are flipped.

The essence of the test that broke here is that shuffling the columns should preserve the PCA output:

dataframe_shuffled = dataframe.sample(frac=1, axis=1)  # permute the columns
ml.pca(dataframe_shuffled, OUT_DIR + 'pca-shuffled-columns.png', OUT_DIR + 'pca-shuffled-columns-variance.txt',
    OUT_DIR + 'pca-shuffled-columns-coordinates.tsv')
coord = pd.read_table(OUT_DIR + 'pca-shuffled-columns-coordinates.tsv')
coord = coord.round(5)  # round values to 5 digits to account for numeric differences across machines
coord.sort_values(by='algorithm', ignore_index=True, inplace=True)

This now needs an assertion check for two possible signings:

assert coord.equals(expected) or coord.equals(expected_other)
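As an alternative to keeping two expected files, the comparison itself could tolerate a sign flip. The sketch below is a dependency-free illustration of that idea; `equal_up_to_sign` is a hypothetical helper, not something in the SPRAS test suite, and it works on plain lists rather than DataFrames.

```python
from math import isclose


def equal_up_to_sign(actual, expected, tol=1e-5):
    """Return True if two coordinate columns match, or match after negating one.

    `actual` and `expected` are equal-length lists of floats (one PC column).
    Hypothetical helper for illustration only.
    """
    if len(actual) != len(expected):
        return False
    same = all(isclose(a, e, abs_tol=tol) for a, e in zip(actual, expected))
    negated = all(isclose(a, -e, abs_tol=tol) for a, e in zip(actual, expected))
    return same or negated


# PC1 values taken from the two outputs shown earlier in this thread.
pc1_new = [-2.00665, -1.52765, 3.53430]
pc1_old = [2.00665, 1.52765, -3.53430]

# A whole column negated is accepted.
assert equal_up_to_sign(pc1_new, pc1_old)
# A column where only some entries changed sign is rejected.
assert not equal_up_to_sign(pc1_new, [2.00665, -1.52765, -3.53430])
```

Checking each column separately lets PC1 and PC2 flip independently, which is looser than comparing against two fixed expected tables; whether that is acceptable is a judgment call for the test.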

tristan-f-r · 2025-07-18T20:25:04Z

~~Since this is just a standard dependency bump, this seems okay to merge. I do want one last review to check in that we're okay with this PCA output signage change.~~ Will be reviewed on Monday.

ntalluri · 2025-07-21T20:04:54Z

If it is the ensembling one, I added a data.pickle containing a toy dataset network that ensembling needs in general and that we use to ensure we calculate recall correctly. I'm not sure why it would need to be updated (if it is this pickled dataset), because it has nothing to do with the PCA.

agitter · 2025-07-21T21:19:32Z

Regardless, we probably should stop committing artifacts to main anyway - this causes problems; see #320 for an example of a large test refactor because of a related issue

This seems worthwhile to discuss in a new issue as a testing strategy change if it affects multiple tests.

tristan-f-r · 2025-07-21T21:23:03Z

#339

tristan-f-r · 2025-07-21T21:27:47Z

> @tristan-f-r Which PR was it? The ensembling one?

👍 It was #212 (origin commit). I'll prepare another PR to make that test follow #339.

tristan-f-r · 2025-07-21T21:31:21Z

> If it is the ensembling one, I added a data.pickle containing a toy dataset network that ensembling needs in general and that we use to ensure we calculate recall correctly. I'm not sure why it would need to be updated (if it is this pickled dataset), because it has nothing to do with the PCA.

Pickled objects contain internal representations of the object, and those representations can change between package versions. In this case, pandas itself changed.
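One way around a committed pickle going stale is to build the object from source and pickle it at test time, so the file always matches the installed library versions. The sketch below only illustrates that general idea; `build_toy_network` and `write_fresh_pickle` are hypothetical names, the data is made up, and this is not the code from #340.

```python
import pickle
import tempfile
from pathlib import Path


def build_toy_network():
    """Hypothetical stand-in for the toy dataset; the real test data differs."""
    return {"nodes": ["A", "B", "C"], "edges": [("A", "B"), ("B", "C")]}


def write_fresh_pickle(directory):
    """Serialize the toy data with whatever library versions are installed now."""
    path = Path(directory) / "data.pickle"
    with path.open("wb") as handle:
        pickle.dump(build_toy_network(), handle)
    return path


# Generate the pickle in a temporary directory instead of committing it.
with tempfile.TemporaryDirectory() as tmp:
    pickle_path = write_fresh_pickle(tmp)
    with pickle_path.open("rb") as handle:
        restored = pickle.load(handle)

assert restored == build_toy_network()
```

Because the pickle is written and read by the same environment, a dependency bump cannot leave it in an incompatible format.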

tristan-f-r · 2025-07-21T21:43:04Z

#340 should fix this.

tristan-f-r · 2025-07-23T17:58:24Z

@ntalluri there seem to be sorting issues with the new commits that make the PCA test flaky (see this commit, which addresses that) - the tests only worked occasionally on my machine before that commit.

ntalluri

Code looks good to me. However, I am going to test the environment updates locally before approving.

tristan-f-r · 2025-07-23T18:36:18Z

By the way, some dependencies may be a little outdated (e.g. this PR outlived a pandas release cycle) - I can update them, but this PR was mostly intended to avoid all of the dependency errors that were causing me problems with pixi.

tristan-f-r · 2025-07-24T19:29:25Z

Okay - sorry about this terribly long hill of a PR for how small it is. That should be the last commit which fixes the tests from the latest merge 👍

agitter

I ran the TestML tests locally and they passed

pre-commit is too outdated to run typos; we need to bump pre-commit.

chore: bump dependencies

637454a

tristan-f-r added the refactor label Jul 1, 2025

Merge branch 'umain' into dep-bump

2ae2804

tristan-f-r added 4 commits July 2, 2025 20:58

chore: relax sphinx

30c731d

chore: debmp docutils

613eaa5

chore: debump docutils in pyproject.toml

2a62446

fix: note about conda docker py

faf7f8e

tristan-f-r and others added 2 commits July 3, 2025 09:42

test: try flipping pca

2b33325

untested; need to open my devcontainer

test: add second signage pca

ae02c99

tristan-f-r added the blocked-by-other-pr label Jul 3, 2025

ntalluri reviewed Jul 3, 2025

Comment thread test/ml/test_ml.py

tristan-f-r removed the blocked-by-other-pr label Jul 8, 2025

Merge branch 'umain' into dep-bump

0a41d50

tristan-f-r marked this pull request as ready for review July 8, 2025 20:11

tristan-f-r requested a review from ntalluri July 18, 2025 20:25

agitter reviewed Jul 19, 2025

Comment thread test/ml/expected/expected-pca-coordinates-2.tsv Outdated

tristan-f-r mentioned this pull request Jul 21, 2025

Integration testing instead of artifacts #339

Open

tristan-f-r mentioned this pull request Jul 21, 2025

test(evaluate): dynamically serialize data.pickle #340

Merged

tristan-f-r added the blocked-by-other-pr label Jul 21, 2025

tristan-f-r added 2 commits July 23, 2025 10:24

Merge branch 'umain' into dep-bump

7f4a00b

fix: correct negated file

88c8568

tristan-f-r removed the blocked-by-other-pr label Jul 23, 2025

tristan-f-r added 2 commits July 23, 2025 10:26

style: fmt

cc89624

fix: sort datapoint_labels

05b9a69

ntalluri previously requested changes Jul 23, 2025

Comment thread test/ml/expected/expected-pca-coordinates-negated.tsv

Comment thread test/ml/test_ml.py Outdated

chore: drop unnecessary line

310fb0d

tristan-f-r requested a review from ntalluri July 23, 2025 18:23

ntalluri reviewed Jul 23, 2025

tristan-f-r added 2 commits July 24, 2025 17:06

test: kde negated

ed89f6b

test: finalize

9362f5f

tristan-f-r requested a review from ntalluri July 24, 2025 19:29

agitter approved these changes Jul 25, 2025

tristan-f-r merged commit e899da8 into Reed-CompBio:main Jul 25, 2025
15 checks passed

tristan-f-r deleted the dep-bump branch July 25, 2025 15:42

agitter mentioned this pull request Jul 25, 2025

fix: Roll back typos to rev 1.25.0 #355

Closed

tristan-f-r mentioned this pull request May 11, 2026

refactor: preserve list-generating data #474

Draft

3 tasks
