feat(enrichr): retrieve gene sets incl. MSigDB collections (#139) by Elarwei001 · Pull Request #241 · scverse/gget

Elarwei001 · 2026-06-24T15:19:32Z

Resolves #139

Summary

gget enrichr: Added support for retrieving the member genes of Enrichr gene-set libraries, including MSigDB collections (fixes #139).

Testing

Unit tests added/extended in tests/test_enrichr.py with fixture entries in tests/fixtures/test_enrichr.json; run with pytest.

…#178, scverse#177) (scverse#222)

* feat(pdb): support PDBx/mmCIF format and auto-fallback (scverse#178, scverse#177)

The legacy PDB format is being phased out by RCSB and is unavailable for large structures (e.g. 6Q38, 7A01), causing `gget pdb` to fail with "not found" (the bug reported in scverse#177).

- Add `resource="mmcif"` to download the structure in PDBx/mmCIF (.cif).
- `resource="pdb"` (default) now automatically falls back to PDBx/mmCIF when the legacy PDB file is unavailable, logging a warning. Saved files use the correct extension (.cif vs .pdb) based on the format fetched.
- Backward compatible: existing commands that already worked are unchanged.
- Tests: explicit mmcif download + legacy->mmcif fallback regression (6Q38).
- Docs + updates.md.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* [pre-commit.ci] auto fixes from pre-commit.com hooks

For more information, see https://pre-commit.ci

* fix: ruff lint errors flagged by pre-commit.ci

- gget_g2p.py: collapse the multi-line docstring summary into a single line + blank line (ruff D205).
- main.py: add the missing # noqa: E402 to the 6 new import lines (g2p, ref, search, seq, setup, virus). All earlier imports already carry this noqa because dt_string is computed at module top before the import block, so E402 fires on any unmarked later import. Also drops the stray "# Module functions" comment that was splitting one alphabetical import list into two.

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Laura Luebbert <laura.lbt60@gmail.com>
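The resource selection and legacy-to-mmCIF fallback described in this commit can be sketched roughly as below. This is not gget's code: `fetch_structure` and its injected `fetch` callable (which returns the file body, or `None` when RCSB has no such file) are hypothetical, chosen so the sketch runs without network access. Only the RCSB download URL pattern is taken from the public file server.

```python
import logging
from typing import Callable, Optional

logger = logging.getLogger(__name__)

# Public RCSB file-server URL pattern; ext is "pdb" or "cif".
RCSB_URL = "https://files.rcsb.org/download/{pdb_id}.{ext}"


def fetch_structure(
    pdb_id: str,
    fetch: Callable[[str], Optional[str]],
    resource: str = "pdb",
) -> tuple[str, str]:
    """Return (file_extension, contents) for a structure (hypothetical helper).

    `fetch` is an injected downloader returning the body, or None if missing.
    """
    if resource not in ("pdb", "mmcif"):
        raise ValueError(f"unsupported resource: {resource!r}")

    if resource == "pdb":
        text = fetch(RCSB_URL.format(pdb_id=pdb_id, ext="pdb"))
        if text is not None:
            return "pdb", text
        # Legacy file missing (e.g. large structures): fall back to mmCIF.
        logger.warning("%s: legacy PDB file unavailable, using PDBx/mmCIF", pdb_id)

    text = fetch(RCSB_URL.format(pdb_id=pdb_id, ext="cif"))
    if text is None:
        raise FileNotFoundError(f"{pdb_id} not found as PDB or PDBx/mmCIF")
    return "cif", text


# Fake downloader: only the .cif file exists, as described for 6Q38.
fake = {"https://files.rcsb.org/download/6Q38.cif": "data_6Q38\n"}
ext, body = fetch_structure("6Q38", fake.get)
print(f"6Q38.{ext}")  # 6Q38.cif
```

Returning the extension alongside the contents is one simple way to make the saved file name follow the format actually fetched.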

Send `User-Agent: gget/<version> (+https://github.com/scverse/gget)` on all Bgee API calls so the upstream service can attribute traffic to gget and reach the project if needed. Co-authored-by: Claude Opus 4.7 <noreply@anthropic.com>
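A minimal sketch of attaching such a header with the standard library's `urllib.request`. gget may well use a different HTTP client; `gget_user_agent`, `bgee_request`, the version string, and the example URL are placeholders for illustration only.

```python
import urllib.request

GGET_URL = "https://github.com/scverse/gget"


def gget_user_agent(version: str) -> str:
    # Format from the commit message: gget/<version> (+<project URL>)
    return f"gget/{version} (+{GGET_URL})"


def bgee_request(url: str, version: str) -> urllib.request.Request:
    """Build (but do not send) a request carrying the gget User-Agent."""
    return urllib.request.Request(url, headers={"User-Agent": gget_user_agent(version)})


req = bgee_request("https://example.org/bgee-endpoint", "0.0.0")
# urllib normalizes header names with str.capitalize(), hence "User-agent".
print(req.get_header("User-agent"))  # gget/0.0.0 (+https://github.com/scverse/gget)
```

Passing `req` to `urllib.request.urlopen` would send the header; building it in one helper keeps every Bgee call consistent.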


@overload

…cverse#225)

* feat(types): pay down 67 mypy errors (var-annotated + json overloads)

mypy baseline: 613 → 546 (−67, ~11%). No behavior changes: every edit is annotation-only and verified with python -m py_compile + `import gget` smoke test + `pytest --collect-only` (400 tests, 0 collection errors). Resolves part of scverse#216.

Two passes:

1. var-annotated quick wins (−21 errors, → 0 remaining)
   - Added explicit type annotations to 13 empty container literals across utils.py, gget_ref.py, gget_info.py, gget_blat.py, gget_muscle.py, gget_virus.py. Inferred element type from surrounding code (list[str], dict[str, Any], etc.); fell back to Any only when the type was genuinely dynamic.
2. typing.overload for the json= flag pattern (−~20 union-attr, plus ~26 other category errors that depended on the narrowed return type)
   - Added @typing.overload signatures for the 12 modules with `def f(..., json: bool = False, ...) -> DataFrame | dict`: gget_8cube, gget_archs4, gget_bgee, gget_blast, gget_blat, gget_cosmic, gget_diamond, gget_elm, gget_enrichr, gget_info, gget_opentargets, gget_search.
   - Now `f(...)` returns DataFrame and `f(..., json=True)` returns dict at the type-check level. Implementation signature unchanged.

Why only 67 and not the predicted ~150:
- Most remaining [union-attr] errors come from BeautifulSoup (`Tag | None` from `.find()`) and `str | None` checks, not the json= flag pattern. Those need per-callsite None-guards, which is the next batch.

Remaining categories (sorted, top 6):
- [index] ~157 (pandas df["col"] indexing; needs cast() or # type: ignore)
- [union-attr] 115 (BeautifulSoup / str|None; needs None-guards)
- [attr-defined] 68 (dynamic JSON response shapes)
- [call-overload] 58 (pandas/numpy stubs)
- [assignment] ~56
- [arg-type] ~53

* fix(bgee): restore Literal + overload imports lost in dev merge

The merge of dev into feat/mypy-cleanup (925f66d) collided on gget_bgee.py's typing import line. Git auto-resolved by taking dev's version (`from typing import TYPE_CHECKING, Any`, added by the bgee user-agent PR scverse#224) and silently dropped the `Literal, overload` additions from this branch, while keeping the @overload decorators at lines 183/192 that use them. Result: a module-load NameError that broke test collection for every test file that imports gget. Restore the full import: TYPE_CHECKING, Any, Literal, overload.

* ci: re-trigger pre-commit.ci

* fix(pre-commit): exclude .github/badges/*.json from formatting

The badge JSON is regenerated by ci.yml's "Generate tests badge JSON" step using json.dumps() with no `indent` parameter, which produces single-line compact output. biome's default JSON formatter wants multi-line tab-indented output. So every CI run writes the compact form, every pre-commit.ci run reformats it back, and we get a permanent biome-format failure that never resolves. Same fix as the tests/pytest_results.txt entry: just exclude the auto-generated file from formatting hooks.
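The json= overload pattern can be illustrated with a toy function. `search_like` and its return values are hypothetical stand-ins (a list of dicts in place of a pandas DataFrame, to keep the sketch dependency-free). The `Literal[False]`/`Literal[True]` overloads let a type checker narrow the return type from the `json` argument while the single runtime implementation stays unchanged; it also shows why `Literal` and `overload` must actually be imported, which is what the bgee fix restored.

```python
from typing import Any, Literal, Union, overload

Rows = list[dict[str, Any]]  # stand-in for pandas.DataFrame


@overload
def search_like(term: str, json: Literal[False] = ...) -> Rows: ...


@overload
def search_like(term: str, json: Literal[True]) -> dict[str, Any]: ...


@overload
def search_like(term: str, json: bool = ...) -> Union[Rows, dict[str, Any]]: ...


def search_like(term: str, json: bool = False) -> Union[Rows, dict[str, Any]]:
    # Runtime implementation: one body, no @overload decorator.
    rows: Rows = [{"term": term, "hits": 1}]
    if json:
        return {"results": rows}
    return rows


rows = search_like("brca1")                # a type checker infers Rows
payload = search_like("brca1", json=True)  # a type checker infers dict[str, Any]
print(type(rows).__name__, type(payload).__name__)  # list dict
```

The third, plain-`bool` overload covers callers that pass a runtime variable rather than a literal, which would otherwise match neither narrow signature.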

Add `gget.enrichr_library()` (CLI: `gget enrichr --get_library`) to fetch the gene sets (members) of any Enrichr gene-set library: the recommended way to retrieve MSigDB gene sets (e.g. MSigDB_Hallmark_2020) without an MSigDB login.

- Returns a long-format DataFrame (gene_set, gene), or a {gene_set: [genes]} dict with json=True. `gene_set=` returns a single set; `species` selects the non-human Enrichr variants.
- CLI: new --get_library/-gl and --gene_set/-gs; genes/--database made optional in library mode (still enforced for enrichment). Backward compatible.
- Detects Enrichr's HTML-404 (HTTP 200) response for unknown libraries.
- Tests + fixtures (live Enrichr) and docs.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
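A rough, dependency-free sketch of the parsing, filtering, and HTML-404 detection such a feature needs. It assumes the library arrives as GMT-like text (one gene set per line: name, an often-empty description column, then member genes, all tab-separated), which is the layout Enrichr's `geneSetLibrary?mode=text` download generally uses. The function names, the HTML heuristic, and the error types are hypothetical rather than gget's code, and plain lists of tuples stand in for the pandas DataFrame that gget returns.

```python
from typing import Optional


def looks_like_html(body: str) -> bool:
    # Heuristic for an HTML error page served with HTTP 200 instead of a 404.
    head = body.lstrip()[:100].lower()
    return head.startswith(("<!doctype html", "<html"))


def parse_library(body: str) -> dict[str, list[str]]:
    """Parse GMT-like text into {gene_set: [genes]} (json=True-style output)."""
    if not body.strip() or looks_like_html(body):
        raise ValueError("library not found or empty")
    library: dict[str, list[str]] = {}
    for line in body.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        fields = line.split("\t")
        # fields[0] = set name, fields[1] = description, rest = genes
        library[fields[0]] = [g for g in fields[2:] if g]
    return library


def to_long(
    library: dict[str, list[str]], gene_set: Optional[str] = None
) -> list[tuple[str, str]]:
    """Long format: one (gene_set, gene) row per membership."""
    if gene_set is not None:
        if gene_set not in library:
            raise KeyError(f"gene set {gene_set!r} not in library")
        library = {gene_set: library[gene_set]}
    return [(name, gene) for name, genes in library.items() for gene in genes]


# Toy library text (not real MSigDB content).
lib = parse_library("SET_A\t\tTP53\tMDM2\n\nSET_B\t\tMYC\n")
print(lib)                          # {'SET_A': ['TP53', 'MDM2'], 'SET_B': ['MYC']}
print(to_long(lib, gene_set="SET_B"))  # [('SET_B', 'MYC')]
```

Checking the body rather than the status code matters here because, per the commit message, Enrichr answers unknown libraries with an HTML page and HTTP 200.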

codecov-commenter · 2026-06-24T16:01:24Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 56.62%. Comparing base (5cf607f) to head (69cc3ea).
⚠️ Report is 1 commit behind head on dev.

Additional details and impacted files

@@            Coverage Diff             @@
##              dev     #241      +/-   ##
==========================================
+ Coverage   56.14%   56.62%   +0.48%     
==========================================
  Files          29       29              
  Lines        9244     9285      +41     
==========================================
+ Hits         5190     5258      +68     
+ Misses       4054     4027      -27
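The headline percentages follow from the table: coverage is hits divided by lines. A quick check, assuming Codecov's default display settings of two decimal places with rounding down (`precision: 2`, `round: down`); `coverage_pct` is an illustrative helper, not part of Codecov.

```python
import math


def coverage_pct(hits: int, lines: int) -> float:
    # Round down to two decimals, matching Codecov's default display (assumed).
    return math.floor(hits / lines * 100 * 100) / 100


base = coverage_pct(5190, 9244)  # dev (5cf607f)
head = coverage_pct(5258, 9285)  # PR head (69cc3ea)
print(base, head, round(head - base, 2))  # 56.14 56.62 0.48
```

The misses row is consistent too: 9285 − 5258 = 4027.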


Add a network-free TestEnrichrLibraryOffline class that mocks requests to cover enrichr_library: invalid species, verbose logging, blank-line parsing, bad/empty library errors, gene_set filter + not-found, and the json/json+save/CSV-save branches. All PR-added lines now covered. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
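The idea of a network-free test class can be sketched with `unittest.mock`. Everything here is hypothetical: `get_library_text` takes an injected `session` (anything with a requests-style `.get()` returning an object with `.text`) instead of patching gget's real module, and the URL is a placeholder, so the actual tests in tests/test_enrichr.py likely look different.

```python
import unittest
from unittest import mock


def get_library_text(name: str, session) -> str:
    """Hypothetical fetch helper: returns raw library text or raises."""
    resp = session.get(
        "https://example.org/geneSetLibrary", params={"libraryName": name}, timeout=30
    )
    body = resp.text
    # Unknown libraries may come back as an HTML page with HTTP 200.
    if not body.strip() or body.lstrip().lower().startswith(("<!doctype html", "<html")):
        raise ValueError(f"library {name!r} not found or empty")
    return body


class TestLibraryOffline(unittest.TestCase):
    def _session(self, text: str) -> mock.Mock:
        # Fake session whose .get() returns a response-like object; no network.
        session = mock.Mock()
        session.get.return_value = mock.Mock(status_code=200, text=text)
        return session

    def test_ok(self):
        session = self._session("SET_A\t\tTP53\n")
        self.assertEqual(get_library_text("LIB", session), "SET_A\t\tTP53\n")
        session.get.assert_called_once()

    def test_html_404_with_status_200(self):
        with self.assertRaises(ValueError):
            get_library_text("NOPE", self._session("<!DOCTYPE html><title>404</title>"))

    def test_empty(self):
        with self.assertRaises(ValueError):
            get_library_text("EMPTY", self._session("   \n"))
```

Such a class runs under `pytest` or `python -m unittest` without touching the network, which is what lets CI cover the error branches deterministically.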

lauraluebbert and others added 12 commits June 21, 2026 22:01

Bump dev version

382ff8a

Add space for new updates

f9eddbe

CI: update pytest results (dev)

41b3202

CI: update pytest results (dev)

81ac6e9

Merge main into dev: bring in PR scverse#223 (ruff baseline cleanup) …

18f14e1

…+ recent main bot commits

Conflicts:
- .github/badges/tests.json
- tests/pytest_results.txt

CI: update pytest results (dev)

7061185

CI: update pytest results (dev)

5cf607f

CI: update pytest results (dev)

5506537

Elarwei001 marked this pull request as draft June 25, 2026 03:44

lauraluebbert deleted the branch scverse:dev June 28, 2026 20:31

lauraluebbert closed this Jun 28, 2026

lauraluebbert reopened this Jun 28, 2026

Elarwei001 wants to merge 13 commits into scverse:dev from Elarwei001:feature/enrichr-msigdb-139

3 participants
