feat: extend NetworkCommons toward perturbation biology with LEMBAS integration by HugoHakem · Pull Request #72 · saezlab/networkcommons

HugoHakem · 2026-06-26T13:20:00Z

Summary

This PR is the outcome of the Algorithms & Benchmarks session started at the Saez Lab
retreat in Paris (June 17th 2026), which explored how network-based and biologically
informed ML methods can be applied, evaluated, and benchmarked on perturbational
datasets within NetworkCommons. The contribution focuses on the LEMBAS ligand-perturbation dataset and method as a concrete end-to-end integration point.

Core contributions

Dataset integration (data/omics/_lembas.py): lembas_ligands() and
lembas_tfs() now return DataFrames with a proper condition index, making them
directly usable in perturbation workflows without manual reshaping.
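The sketch below shows roughly the reshaping that callers used to do by hand. It assumes, as a later commit note states, that the raw table's first column holds the condition labels. The helper `_set_condition_index` and the toy table are hypothetical and are not the code in `_lembas.py`.

```python
import pandas as pd


def _set_condition_index(raw: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: promote the first column to an index named 'condition'."""
    out = raw.set_index(raw.columns[0])
    out.index.name = "condition"
    return out


# Toy table standing in for a downloaded ligand matrix (not real LEMBAS data).
raw = pd.DataFrame(
    {"label": ["cond_1", "cond_2"], "ligand_A": [1.0, 0.0], "ligand_B": [0.0, 1.0]}
)
ligands = _set_condition_index(raw)
# Conditions can now be selected by label, with no reset_index/set_index step.
row = ligands.loc["cond_2"]
```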

Faithful LEMBAS-RNN reimplementation (methods/_perturbation.py): the existing
prototype is upgraded to closely match the architecture of Nilsson et al. 2022
(Nat Commun). This implementation is LLM-assisted, so a careful human review of the
code against the original repositories is recommended before relying on it in
production. The reference codebases are:

CPU implementation: https://github.com/Lauffenburger-Lab/LEMBAS
GPU implementation: https://github.com/AvlantNilssonLab/LEMBAS_GPU

LEMBAS vignette extended (docs/src/vignettes/C_lembas.ipynb): adds a
Leave-One-Out Cross-Validation section (section 7) following the evaluation protocol
of the original paper, alongside a mean-response and ridge baseline for comparison.
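The LOOCV protocol with the two baselines can be sketched as follows. This is a minimal sketch that assumes conditions are the rows of a ligand input matrix `X` and a TF readout matrix `Y`. The names `loocv`, `mean_response` and `ridge` are hypothetical, and the closed-form ridge with a fixed `alpha` stands in for whatever estimator the notebook actually uses.

```python
import numpy as np


def mean_response(X_train, Y_train, X_test):
    """Baseline: predict the per-readout training mean for every held-out condition."""
    return np.tile(Y_train.mean(axis=0), (X_test.shape[0], 1))


def ridge(X_train, Y_train, X_test, alpha=1.0):
    """Baseline: closed-form ridge regression; centring leaves the intercept unpenalised."""
    x_mu, y_mu = X_train.mean(axis=0), Y_train.mean(axis=0)
    Xc, Yc = X_train - x_mu, Y_train - y_mu
    # Solve (Xc^T Xc + alpha * I) B = Xc^T Yc for the coefficient matrix B.
    B = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X_train.shape[1]), Xc.T @ Yc)
    return (X_test - x_mu) @ B + y_mu


def loocv(X, Y, model):
    """Hold out each condition (row) once, fit on the rest, predict the held-out row."""
    preds = np.empty_like(Y, dtype=float)
    for i in range(X.shape[0]):
        train = np.arange(X.shape[0]) != i
        preds[i] = model(X[train], Y[train], X[i:i + 1])[0]
    return preds


# Usage on toy data (not the LEMBAS screen): 12 conditions, 4 ligands, 3 TF readouts.
rng = np.random.default_rng(0)
X = rng.random((12, 4))
Y = X @ rng.random((4, 3))
for name, model in [("mean", mean_response), ("ridge", ridge)]:
    print(name, np.mean((loocv(X, Y, model) - Y) ** 2))
```

The same `loocv` loop would accept a LEMBAS-RNN wrapper with the `(X_train, Y_train, X_test)` signature, which keeps the three models directly comparable.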

Environment modernisation and re-validation

To run the updated notebook, the environment was modernised. The project was migrated
from Poetry to uv using uvx migrate-to-uv, and additional pixi features were added
to cover dependencies that still require conda-forge or bioconda packages (e.g. R,
Bioconductor, and Nextflow for the flop environment). Other additions: rdata>=0.10 as a
core dependency; pertpy and torch-cu128 as optional extras (with the PyTorch CUDA 12.8
index wired up in uv); a dedicated dev-cu128 GPU environment; and
nbsphinx_execute = 'never' to prevent notebook re-execution on ReadTheDocs.

Consequently, all vignette notebooks were re-run end-to-end to verify that nothing
broke. Several issues were caught and fixed in the process:

eval/_metrics.py: decoupler 2.x renamed the ORA result column Term → source
methods/_causal.py: CORNETO now surfaces the exception type and message when no
solution is found, replacing a silent failure
visual/_network_stats.py: filepath type widened to str | None
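The CORNETO fix follows a common pattern: catch the solver error, log its type and message, and return a sentinel instead of failing silently. In this minimal sketch, `solve_or_none` and the logger name are hypothetical. The real code in `methods/_causal.py` may log at a different level or return something else.

```python
import logging

logger = logging.getLogger("networkcommons.methods")  # hypothetical logger name


def solve_or_none(solve, *args, **kwargs):
    """Run a solver callable; on failure, log the exception type and message, return None."""
    try:
        return solve(*args, **kwargs)
    except Exception as exc:  # broad on purpose: any solver failure is reported, not swallowed
        logger.warning("No solution found (%s: %s)", type(exc).__name__, exc)
        return None
```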

Also added: get_hmdb_mapper in data/network/_moon.py to download and cache the
HMDB ID → metabolite name mapping from cosmosR (needed to interpret MOON/COSMOS results
on the LEMBAS network).

Test plan

All vignette notebooks re-executed end-to-end and outputs updated
C_lembas.ipynb runs with the updated LEMBAS-RNN and LOOCV section
uv run pytest passes; pygraphviz-dependent tests in test_utils and test_vis_networkx are skipped or fail because pygraphviz is not in the base uv test env (pre-existing on main, not introduced here)
pixi run -e dev pytest tests/test_utils.py passes fully (25/25); pygraphviz is available in the dev pixi env via the graphviz conda-forge package
ReadTheDocs build succeeds
Human review of methods/_perturbation.py against the reference LEMBAS codebases

🤖 Generated with Claude Code

Implements lembas_network(), lembas_ligands(), lembas_tfs(), lembas_annotation() and lembas_datasets() to fetch the macrophage and ligand screen datasets used in Nilsson et al. 2022 (Nat Commun). Macrophage files are pulled from Zenodo (record 10815391); ligand screen files from the Lauffenburger-Lab/LEMBAS GitHub repo. Also registers both datasets in datasets.yaml and adds the LEMBAS section to api.rst and datasets.rst.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Implements the full LEMBAS-RNN method (MML activation, steady-state convergence, uniform regularisation) alongside ridge and mean-response baselines. Adds lembas_format_network to utils, wires the new methods module, updates API and narrative docs, and adds a pytest smoke suite.

Co-Authored-By: daniele-bottazzi <daniele-bottazzi@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
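The MML activation and steady-state convergence can be sketched in NumPy as below. This assumes the piecewise form recalled from the paper: linear on [0, 0.5), 1 - 0.25/x above, and a leak of 0.01 for negative inputs. It also assumes the update x <- f(W x + b + drive) with rows of `W` as target nodes. Both should be checked against the reference bionetwork code. `mml_activation` and `steady_state` are illustrative names, not the functions in `_perturbation.py`.

```python
import numpy as np


def mml_activation(x, leak=0.01):
    """Michaelis-Menten-like activation: linear below 0.5, saturating towards 1 above, leaky below 0."""
    x = np.asarray(x, dtype=float)
    safe = np.maximum(x, 0.5)  # keeps the saturating branch finite when np.where evaluates it
    return np.where(x < 0, leak * x, np.where(x < 0.5, x, 1.0 - 0.25 / safe))


def steady_state(W, b, drive, tol=1e-6, max_iter=1000):
    """Iterate x <- f(W x + b + drive) from x = 0 until the largest update is below tol."""
    x = np.zeros_like(b, dtype=float)
    for _ in range(max_iter):
        x_new = mml_activation(W @ x + b + drive)
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    raise RuntimeError("did not reach steady state within max_iter iterations")
```

At x = 0.5 both branches equal 0.5 and both have slope 1, so the activation is continuous and smooth where the linear part hands over to saturation.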

…h optional dep

evaluate_predictions gains an axis parameter ('readout' | 'condition') so users can inspect which TFs or which experimental conditions are predicted poorly. Adds three new tests covering both axes and bad-axis validation. Tutorial notebook C_lembas.ipynb walks through the full macrophage pipeline: data loading, network formatting, train/test split, mean/ridge/LEMBAS-RNN models, and per-readout and per-condition evaluation. Registers torch as an optional dependency installable via pip install networkcommons[torch].

Co-Authored-By: daniele-bottazzi <daniele-bottazzi@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
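Here is one way an `axis` switch like this can work: compute a per-column score, and transpose first when scoring by condition. This hypothetical stand-in assumes conditions are rows and readouts are columns, and it uses Pearson r as the metric. The real `evaluate_predictions` may compute different metrics and return a DataFrame.

```python
import numpy as np


def evaluate_predictions_sketch(y_true, y_pred, axis="readout"):
    """Pearson r per readout (column) or per condition (row); hypothetical stand-in."""
    if axis not in ("readout", "condition"):
        raise ValueError(f"axis must be 'readout' or 'condition', got {axis!r}")
    t, p = np.asarray(y_true, float), np.asarray(y_pred, float)
    if axis == "condition":
        t, p = t.T, p.T  # put conditions in columns so a single code path scores both axes
    tc, pc = t - t.mean(axis=0), p - p.mean(axis=0)
    return (tc * pc).sum(axis=0) / np.sqrt((tc ** 2).sum(axis=0) * (pc ** 2).sum(axis=0))
```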

- use uvx migrate-to-uv (automatically translate from poetry to uv)
- use uvx pyproject-fmt (reformat the pyproject)

- Relax over-constrained dependency bounds; add lower bounds to unconstrained deps
- Restructure dependency-groups into test/docs/lint/dev sub-groups
- Replace black/isort/flake8/yapf/pyupgrade with ruff; switch to Google docstring convention
- Rewrite tox.ini for uv (tox-uv, dependency_groups); rewrite CI workflows to use astral-sh/setup-uv
- Drop legacy artifacts: setup.py, environment.yml, docs/src/requirements.txt
- Consolidate docs deps into pyproject.toml; update .readthedocs.yaml to use uv sync
- Fix _metadata.py to read from [project] instead of [tool.poetry]
- Remove stale poetry references from notebook and installation docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Switch build backend from uv_build to hatchling
- Fix _metadata.py: replace deprecated toml with tomllib/tomli (stdlib 3.11+, backport for 3.10)
- Upgrade corneto 1.0.0a0 → >=1.0.0b7 (drops numpy<2 cap)
- Upgrade omnipath >=1.0.8 → >=1.0.12 (fixes np.NAN removed in numpy 2)
- Remove numpy<2 upper bound (no longer needed)
- Pin pypath-omnipath to saezlab/pypath git master: fixes module-level RaMP API call crashing json.loads when the server is unreachable (issue #318)
- Add pypath-omnipath[curl] extra to bring pycurl back (now optional in pypath)
- Restructure pixi environments: add feature-level pypi-dependencies with extras so each environment activates the right optional deps; dev env now includes igraph, torch, corneto-backends and pygraphviz

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
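The tomllib/tomli switch usually takes the shape below. It is a minimal sketch: `read_project_metadata` is a hypothetical helper, and the real `_metadata.py` may instead open pyproject.toml in binary mode and call `tomllib.load`.

```python
import sys

if sys.version_info >= (3, 11):
    import tomllib  # standard library since Python 3.11
else:
    import tomli as tomllib  # backport exposing the same API on Python 3.10


def read_project_metadata(text: str) -> dict:
    """Return the PEP 621 [project] table (empty if absent) from pyproject.toml contents."""
    return tomllib.loads(text).get("project", {})
```

Reading `[project]` rather than `[tool.poetry]` matches the uv migration, where package metadata lives in the standard PEP 621 table.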

Corneto 1.0.0b7:
- Use cn.Graph (public API) instead of cn._graph.Graph in utils.py, _network.py and test_utils.py; the beta moved the class to a new module
- Work around a corneto-internal isinstance mismatch: runVanillaCarnival imports from corneto._graph while our graphs are corneto.graph._graph; pass SIF tuples instead so it builds its own graph internally
- Accept corneto._graph.BaseGraph in to_networkx() so graphs returned by runVanillaCarnival (still old-style) are correctly converted to networkx
- Add type: ignore[attr-defined] on cn.methods calls (Pylance false positive; cn.methods is present at runtime in corneto beta)
- Add TYPE_CHECKING imports in networkcommons/__init__.py so Pylance resolves networkcommons.eval (previously only set via dynamic globals())

Decoupler 2.x:
- dc.run_wmean -> dc.mt.waggr, dc.run_ulm -> dc.mt.ulm
- dc.get_ora_df -> dc.mt.query_set; update run_ora default metric from ora_Combined score to ora_stat and update test expectations
- Fix recursive loop in run_moon_core to also use dc.mt.* calls
- Add _moon_score_layer() fallback: decoupler 2.x raises ValueError on 1-sample matrices because FDR correction fails on NaN t-statistics; fall back to a simple weighted mean (MOON only uses estimates, not pvals)
- norm_wmean now equals wmean since waggr has no permutation normalization; remove test assertion that they differ

_perturbation.py refactor:
- Remove _import_torch() and the torch-as-parameter anti-pattern
- Module-level try/except import: torch = None on ImportError
- Remove torch param from _torch_dtype, _torch_device, _mml_activation, _make_lembas_model; use module-level torch directly
- Guard run_lembas_rnn entry point with explicit ImportError when torch=None

Notebook:
- docs: dc.get_resource -> dc.op.resource in evaluation vignette

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Tighten Python version range to >=3.10,<3.13
- Document all optional extras (corneto-backends, igraph, torch)
- Add GPU/CUDA section: requirements-local.txt pattern for per-machine CUDA wheel selection, installed via pixi's bundled uv or standalone uv
- Add Pixi section with dev environment setup instructions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Switch from build.jobs.install shell override to the proper python.install with method: uv, which RTD understands natively. Also bump Python to 3.12.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

- Add rdata>=0.10 as a core dependency (needed by get_hmdb_mapper)
- Add pertpy and torch-cu128 optional extras; register pytorch-cu128 index in uv so the CUDA wheel resolves automatically
- Add flop pixi feature/environment (R + Bioconductor + Nextflow stack for the FLOP pipeline)
- Refactor pixi environments to inherit a base feature; add dev-cu128 env
- Update installation.rst to document the new torch-cu128 extra and the dedicated dev-cu128 pixi environment for GPU users
- Set nbsphinx_execute = 'never' in conf.py so notebooks are never re-executed during the ReadTheDocs build
- Add flop_repo/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

…isation

Notebooks were re-executed end-to-end to validate the modernised environment. Several regressions were caught and fixed:
- eval/_metrics.py: decoupler 2.x renamed the column 'Term' → 'source' in ORA results; add rename so downstream code stays compatible
- methods/_causal.py: catch and log the exception type + message when CORNETO finds no solution, making silent failures diagnosable
- visual/_network_stats.py: widen filepath type to str | None in plot_scatter and create_heatmap (was causing type errors)
- data/omics/_lembas.py: lembas_ligands / lembas_tfs now set the first column as the DataFrame index (named 'condition') so callers don't need an extra reset_index step

Updated notebook outputs reflect the fixed behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

Downloads HMDB_mapper_vec.RData from the cosmosR GitHub repository, parses it with the rdata library, and returns a dict mapping HMDB IDs (e.g. 'HMDB0000122') to human-readable metabolite names. Result is cached as a pickle in the configured pickle_dir; pass update=True to force a fresh download.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
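A minimal sketch of the cache-or-fetch logic follows. The download-and-parse step is injected as a `fetch` callable, so the sketch makes no network call and does not guess at the rdata API. `cached_mapping` and its parameters are hypothetical names, not the signature of `get_hmdb_mapper`.

```python
import pickle
from collections.abc import Callable
from pathlib import Path


def cached_mapping(
    fetch: Callable[[], dict[str, str]], cache_file: Path, update: bool = False
) -> dict[str, str]:
    """Return a mapping from a pickle cache; call fetch() only if missing or update=True."""
    if cache_file.exists() and not update:
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    mapping = fetch()  # in get_hmdb_mapper: download the .RData file and parse it
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(mapping, fh)
    return mapping
```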

The previous implementation approximated the LEMBAS architecture from Nilsson et al. (Nat Commun 2022). This commit aligns it with the original bionetwork codebase:

Weight initialisation
- Edges: 0.1 + 0.1×rand, negated for inhibitory signs (bionet.initializeWeights)
- Bias: 1e-3 everywhere; nodes that receive only inhibitory edges get bias=1
- Input scale: fixed buffer (inputAmplitude), not a learnable parameter
- Output projection: per-output scalar init to projection_amplitude (no bias)

Training loop
- Cosine one-cycle LR schedule peaking at lr_peak (bionetwork.oneCycle)
- Mini-batch training (default batch_size=5) with per-batch weight noise (1e-8)
- Per-batch input noise: drive += noiseLevel × curLr × randn
- Adam with lr=1.0; actual LR injected each epoch; momentum reset every 200 epochs
- Weight pre-scaling to spectral radius 0.8 before training (bionet.preScaleWeights)

Regularisation
- Spectral radius loss: soft exponential penalty with differentiable power iteration (bionetwork.spectralLoss)
- Uniform state distribution: mean/var/min/max loss matching bionetwork.uniformLossBatch (replaces old sorted-distribution loss)
- Sign regularisation unchanged; ligand bias penalty added (1e-3)
- L2 + inverse barrier on edge weights to prevent collapse to zero

Defaults updated: epochs=5000, tolerance=1e-6, dtype=float64, uniform_penalty=1e-5, batch_size=5, projection_amplitude=1.2

C_lembas.ipynb: add section 7 (LOOCV) following the evaluation protocol from the original paper; re-executed with fresh outputs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
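The initialisation rules and the spectral pre-scaling above can be sketched in NumPy. Assumptions: a signed adjacency matrix with rows as target nodes, and an exact eigenvalue computation in place of the power iteration the reference code is said to use. `init_weights` and `prescale_weights` are illustrative names, not the functions in `_perturbation.py`.

```python
import numpy as np


def init_weights(signs, rng):
    """signs: (n, n) array of +1 (activating), -1 (inhibiting) or 0 (no edge); rows = targets."""
    mask = signs != 0
    W = np.where(mask, 0.1 + 0.1 * rng.random(signs.shape), 0.0)  # magnitudes in [0.1, 0.2)
    W = np.where(signs < 0, -W, W)  # negate inhibitory edges
    has_input = mask.any(axis=1)
    only_inhibitory = has_input & ~(signs > 0).any(axis=1)
    b = np.where(only_inhibitory, 1.0, 1e-3)  # nodes fed only by inhibition start "on"
    return W, b


def prescale_weights(W, target=0.8):
    """Rescale W so its spectral radius equals target (exact eigenvalues, not power iteration)."""
    rho = np.max(np.abs(np.linalg.eigvals(W)))
    return W if rho == 0 else W * (target / rho)
```

Scaling a matrix by a constant scales all of its eigenvalues by that constant, so one multiplication is enough to hit the target radius.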

- test_eval_graph: update test_run_ora to expect 'ora_Term' column (our backward-compat rename from decoupler 2.x 'source' → 'Term')
- test_utils: move pygraphviz import inside the two tests that need it using pytest.importorskip so the rest of the module runs without it
- test_utils: replace fragile try/except + exact dtype string match in test_handle_missing_values_more_than_one_non_numeric_column with pytest.raises + partial match (pandas dtype repr changed across versions)
- uv.lock: regenerated to encode the torch vs torch-cu128 extras conflict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

--locked re-runs the full resolution and fails if the result differs from the committed lock file, which happens when the lock was generated on a different platform (e.g. Linux) and CI runs on another (macOS). --frozen installs exactly the versions in the lock file without re-resolving, which is the correct behaviour for reproducible CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

deeenes · 2026-06-26T14:21:16Z


-
    return file_legend
+


In the long term this should use the utils API by omnipath-client and rely on higher-level objects

Investigated this: omnipath-client v0.2.3 has now been added as a dependency. However, oc.utils.translate('hmdb', 'traditional_iupac') currently returns HTTP 500 on utils.omnipathdb.org, because the utils service only supports cross-database ID mapping (e.g. hmdb → chebi works), not name resolution. Names are served by the separate metabo.omnipathdb.org service via entities/resolve, which has no public wrapper yet in the current version.

For now get_hmdb_mapper keeps the rdata approach with a note in the docstring pointing to the intended migration. A working workaround using OmniPath()._fetch('entities/resolve') is preserved on branch hmdb-mapper-omnipath-workaround for reference, and should be promoted once oc.utils.translate supports metabolite name mapping server-side.

Beyond get_hmdb_mapper, we identified three other places in the codebase that could benefit from switching to omnipath-client long-term:

noi/_node.py: currently calls pypath.utils.mapping.map_name() and pypath.utils.orthology.translate() directly; oc.utils.map_name() / oc.utils.orthology_translate() are the intended replacements and would reduce the hard dependency on pypath-omnipath[curl]

data/omics/_common.py: uses biomart for Ensembl → HGNC symbol mapping; oc.utils.translate() covers this across 97 ID types

data/network/_omnipath.py: uses the older omnipath client; omnipath-client is the intended successor

…hmdb_mapper

Adds omnipath-client>=0.2.3 as a dependency for future use (node ID translation, Ensembl mappings, COSMOS PKN via oc.cosmos once available). get_hmdb_mapper keeps the existing rdata approach for now: the intended migration to oc.utils.translate('hmdb', 'traditional_iupac') is blocked by a server-side 500 on utils.omnipathdb.org. A working workaround using OmniPath()._fetch('entities/resolve') is preserved on branch hmdb-mapper-omnipath-workaround for when the API matures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

HugoHakem and others added 15 commits June 17, 2026 17:13

chore: migrate to uv

d422e5c


chore: simplify formatter w/ ruff

7255b29

ci: migrate ReadTheDocs config to native uv integration

93a28e0


HugoHakem requested a review from pablormier June 26, 2026 13:21

HugoHakem assigned HugoHakem, pablormier and daniele-bottazzi and unassigned HugoHakem Jun 26, 2026

HugoHakem added the documentation (Improvements or additions to documentation) and enhancement (New feature or request) labels Jun 26, 2026

HugoHakem mentioned this pull request Jun 26, 2026

pypath.utils.mapping fails to import when RaMP API is unreachable (module-level HTTP call) saezlab/pypath#318 (Closed)

deeenes reviewed Jun 26, 2026

HugoHakem wants to merge 17 commits into main from lembas


HugoHakem Jun 27, 2026


4 participants
