feat: extend NetworkCommons toward perturbation biology with LEMBAS integration #72

Open
HugoHakem wants to merge 17 commits into main from lembas

@HugoHakem


Summary

This PR is the outcome of the Algorithms & Benchmarks session started at the Saez Lab
retreat in Paris (June 17th 2026), which explored how network-based and biologically
informed ML methods can be applied, evaluated, and benchmarked on perturbational
datasets within NetworkCommons. The contribution focuses on the LEMBAS ligand-perturbation dataset and method as a concrete end-to-end integration point.

Core contributions

Dataset integration (data/omics/_lembas.py): lembas_ligands() and
lembas_tfs() now return DataFrames with a proper condition index, making them
directly usable in perturbation workflows without manual reshaping.

Faithful LEMBAS-RNN reimplementation (methods/_perturbation.py): the existing
prototype is upgraded to closely match the architecture of Nilsson et al. 2022
(Nat Commun). This implementation is LLM-assisted — a careful human review of the
code against the original repositories is recommended before relying on it in
production. The reference codebases are:

LEMBAS vignette extended (docs/src/vignettes/C_lembas.ipynb): adds a
Leave-One-Out Cross-Validation section (section 7) following the evaluation protocol
of the original paper, alongside a mean-response and ridge baseline for comparison.
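
As a rough illustration of the leave-one-out protocol with a mean-response baseline, the sketch below holds out each condition once and predicts it from the remaining ones. The function names and the toy numbers are hypothetical and are not the vignette's code; any model with the same call signature (for example a ridge regression) could be passed in place of the baseline.

```python
from statistics import fmean


def mean_response_baseline(train_x, train_y, test_x):
    """Predict the per-readout mean of the training responses (inputs are ignored)."""
    n_readouts = len(train_y[0])
    return [fmean(row[j] for row in train_y) for j in range(n_readouts)]


def leave_one_out(x, y, fit_predict):
    """Hold out each condition once and predict it from all the others."""
    predictions = []
    for i in range(len(x)):
        train_x = x[:i] + x[i + 1:]
        train_y = y[:i] + y[i + 1:]
        predictions.append(fit_predict(train_x, train_y, x[i]))
    return predictions


# Toy data (made-up numbers): 4 conditions, 2 ligand inputs, 2 TF readouts.
x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]
y = [[0.2, 0.8], [0.4, 0.6], [0.6, 0.4], [0.8, 0.2]]

loo_predictions = leave_one_out(x, y, mean_response_baseline)
```

Each entry of `loo_predictions` is the prediction for the condition that was held out, so it can be compared row by row against `y`.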

Environment modernisation and re-validation

To run the updated notebook, the environment was modernised. The project was migrated
from Poetry to uv using uvx migrate-to-uv, and additional pixi features were added
to cover dependencies that still required conda-forge or bioconda packages (e.g. R,
Bioconductor, Nextflow for the flop environment). Other additions: rdata>=0.10 as a
core dependency; pertpy and torch-cu128 as optional extras (with the PyTorch CUDA 12.8
index wired up); a dedicated dev-cu128 GPU environment; and nbsphinx_execute = 'never'
to prevent notebook re-execution on ReadTheDocs.

As a consequence, all vignette notebooks were re-run end-to-end to verify nothing
broke. Several issues were caught and fixed in the process:

  • eval/_metrics.py: decoupler 2.x renamed the ORA result column Term → source; a rename keeps downstream code compatible
  • methods/_causal.py: CORNETO now surfaces the exception type and message when no
    solution is found, replacing a silent failure
  • visual/_network_stats.py: filepath type widened to str | None

Also added: get_hmdb_mapper in data/network/_moon.py to download and cache the
HMDB ID → metabolite name mapping from cosmosR (needed to interpret MOON/COSMOS results
on the LEMBAS network).

Test plan

  • All vignette notebooks re-executed end-to-end and outputs updated
  • C_lembas.ipynb runs with the updated LEMBAS-RNN and LOOCV section
  • uv run pytest passes apart from the pygraphviz-dependent tests in test_utils and test_vis_networkx, which are skipped or fail because pygraphviz is not in the base uv test env (pre-existing on main, not introduced here)
  • pixi run -e dev pytest tests/test_utils.py passes fully (25/25) — pygraphviz is available in the dev pixi env via the graphviz conda-forge package
  • ReadTheDocs build succeeds
  • Human review of methods/_perturbation.py against the reference LEMBAS codebases

🤖 Generated with Claude Code

HugoHakem and others added 15 commits June 17, 2026 17:13
Implements lembas_network(), lembas_ligands(), lembas_tfs(),
lembas_annotation() and lembas_datasets() to fetch the macrophage
and ligand screen datasets used in Nilsson et al. 2022 (Nat Commun).
Macrophage files are pulled from Zenodo (record 10815391); ligand
screen files from the Lauffenburger-Lab/LEMBAS GitHub repo.

Also registers both datasets in datasets.yaml and adds the LEMBAS
section to api.rst and datasets.rst.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements the full LEMBAS-RNN method (MML activation, steady-state
convergence, uniform regularisation) alongside ridge and mean-response
baselines. Adds lembas_format_network to utils, wires the new methods
module, updates API and narrative docs, and adds a pytest smoke suite.

Co-Authored-By: daniele-bottazzi <daniele-bottazzi@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…h optional dep

evaluate_predictions gains an axis parameter ('readout' | 'condition') so
users can inspect which TFs or which experimental conditions are predicted
poorly. Adds three new tests covering both axes and bad-axis validation.
Tutorial notebook C_lembas.ipynb walks through the full macrophage pipeline:
data loading, network formatting, train/test split, mean/ridge/LEMBAS-RNN
models, and per-readout and per-condition evaluation. Registers torch as an
optional dependency installable via pip install networkcommons[torch].

Co-Authored-By: daniele-bottazzi <daniele-bottazzi@users.noreply.github.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- use uvx migrate-to-uv (translate automatically from poetry to uv)
- use uvx pyproject-fmt (reformat the pyproject)
- Relax over-constrained dependency bounds; add lower bounds to unconstrained deps
- Restructure dependency-groups into test/docs/lint/dev sub-groups
- Replace black/isort/flake8/yapf/pyupgrade with ruff; switch to Google docstring convention
- Rewrite tox.ini for uv (tox-uv, dependency_groups); rewrite CI workflows to use astral-sh/setup-uv
- Drop legacy artifacts: setup.py, environment.yml, docs/src/requirements.txt
- Consolidate docs deps into pyproject.toml; update .readthedocs.yaml to use uv sync
- Fix _metadata.py to read from [project] instead of [tool.poetry]
- Remove stale poetry references from notebook and installation docs

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Switch build backend from uv_build to hatchling
- Fix _metadata.py: replace deprecated toml with tomllib/tomli (stdlib 3.11+, backport for 3.10)
- Upgrade corneto 1.0.0a0 → >=1.0.0b7 (drops numpy<2 cap)
- Upgrade omnipath >=1.0.8 → >=1.0.12 (fixes np.NAN removed in numpy 2)
- Remove numpy<2 upper bound (no longer needed)
- Pin pypath-omnipath to saezlab/pypath git master: fixes module-level RaMP
  API call crashing json.loads when the server is unreachable (issue #318)
- Add pypath-omnipath[curl] extra to bring pycurl back (now optional in pypath)
- Restructure pixi environments: add feature-level pypi-dependencies with
  extras so each environment activates the right optional deps; dev env now
  includes igraph, torch, corneto-backends and pygraphviz
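
The tomllib/tomli switch mentioned above usually follows the pattern sketched here. The helper name and the inline TOML snippet are illustrative only and are not taken from _metadata.py.

```python
try:
    import tomllib  # standard library from Python 3.11
except ModuleNotFoundError:
    import tomli as tomllib  # backport with the same API for Python 3.10

# Illustrative stand-in for the contents of a pyproject.toml file.
PYPROJECT_SNIPPET = """
[project]
name = "example-package"
version = "0.1.0"
"""


def read_project_metadata(toml_text):
    """Return the [project] table (PEP 621 metadata) as a plain dict."""
    return tomllib.loads(toml_text).get("project", {})


metadata = read_project_metadata(PYPROJECT_SNIPPET)
```

When reading from disk instead of a string, `tomllib.load` expects a file opened in binary mode.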

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Corneto 1.0.0b7:
- Use cn.Graph (public API) instead of cn._graph.Graph in utils.py,
  _network.py and test_utils.py; beta moved the class to a new module
- Workaround corneto internal isinstance mismatch: runVanillaCarnival
  imports from corneto._graph while our graphs are corneto.graph._graph;
  pass SIF tuples instead so it builds its own graph internally
- Accept corneto._graph.BaseGraph in to_networkx() so graphs returned by
  runVanillaCarnival (still old-style) are correctly converted to networkx
- Add type: ignore[attr-defined] on cn.methods calls (Pylance false
  positive; cn.methods is present at runtime in corneto beta)
- Add TYPE_CHECKING imports in networkcommons/__init__.py so Pylance
  resolves networkcommons.eval (previously only set via dynamic globals())

Decoupler 2.x:
- dc.run_wmean -> dc.mt.waggr, dc.run_ulm -> dc.mt.ulm
- dc.get_ora_df -> dc.mt.query_set; update run_ora default metric from
  ora_Combined score to ora_stat and update test expectations
- Fix recursive loop in run_moon_core to also use dc.mt.* calls
- Add _moon_score_layer() fallback: decoupler 2.x raises ValueError on
  1-sample matrices because FDR correction fails on NaN t-statistics;
  fall back to a simple weighted mean (MOON only uses estimates, not pvals)
- norm_wmean now equals wmean since waggr has no permutation normalization;
  remove test assertion that they differ

_perturbation.py refactor:
- Remove _import_torch() and the torch-as-parameter anti-pattern
- Module-level try/except import: torch = None on ImportError
- Remove torch param from _torch_dtype, _torch_device, _mml_activation,
  _make_lembas_model; use module-level torch directly
- Guard run_lembas_rnn entry point with explicit ImportError when torch=None
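
The optional-import pattern described in this refactor can be sketched as follows. `require_torch` is a hypothetical helper used here to keep the example short; in the PR the guard sits at the run_lembas_rnn entry point.

```python
try:
    import torch
except ImportError:
    # torch is an optional dependency; keep the module importable without it.
    torch = None


def require_torch():
    """Return the torch module, or raise a clear error if it is not installed."""
    if torch is None:
        raise ImportError(
            "This method needs PyTorch. Install it with: "
            "pip install networkcommons[torch]"
        )
    return torch
```

Helper functions can then use the module-level `torch` name directly, and only the public entry point needs to perform the check.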

Notebook:
- docs: dc.get_resource -> dc.op.resource in evaluation vignette

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Tighten Python version range to >=3.10,<3.13
- Document all optional extras (corneto-backends, igraph, torch)
- Add GPU/CUDA section: requirements-local.txt pattern for per-machine
  CUDA wheel selection, installed via pixi's bundled uv or standalone uv
- Add Pixi section with dev environment setup instructions

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Switch from build.jobs.install shell override to the proper
python.install with method: uv, which RTD understands natively.
Also bump Python to 3.12.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Add rdata>=0.10 as a core dependency (needed by get_hmdb_mapper)
- Add pertpy and torch-cu128 optional extras; register pytorch-cu128 index
  in uv so the CUDA wheel resolves automatically
- Add flop pixi feature/environment (R + Bioconductor + Nextflow stack for
  the FLOP pipeline)
- Refactor pixi environments to inherit a base feature; add dev-cu128 env
- Update installation.rst to document the new torch-cu128 extra and the
  dedicated dev-cu128 pixi environment for GPU users
- Set nbsphinx_execute = 'never' in conf.py so notebooks are never
  re-executed during the ReadTheDocs build
- Add flop_repo/ to .gitignore

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…isation

Notebooks were re-executed end-to-end to validate the modernised environment.
Several regressions were caught and fixed:

- eval/_metrics.py: decoupler 2.x renamed the column 'Term' → 'source' in
  ORA results; add rename so downstream code stays compatible
- methods/_causal.py: catch and log the exception type + message when
  CORNETO finds no solution, making silent failures diagnosable
- visual/_network_stats.py: widen filepath type to str | None in
  plot_scatter and create_heatmap (was causing type errors)
- data/omics/_lembas.py: lembas_ligands / lembas_tfs now set the first
  column as the DataFrame index (named 'condition') so callers don't need
  an extra reset_index step

Updated notebook outputs reflect the fixed behaviour.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Downloads HMDB_mapper_vec.RData from the cosmosR GitHub repository,
parses it with the rdata library, and returns a dict mapping HMDB IDs
(e.g. 'HMDB0000122') to human-readable metabolite names. Result is
cached as a pickle in the configured pickle_dir; pass update=True to
force a fresh download.
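
A simplified sketch of the cache-then-download behaviour, with a fake loader standing in for the real download and RData parsing. The names `load_or_build` and `fake_download`, the cache file name, and the metabolite label are placeholders, not the actual get_hmdb_mapper internals.

```python
import pickle
import tempfile
from pathlib import Path


def load_or_build(cache_file, build, update=False):
    """Return the cached object if present, otherwise build and cache it."""
    cache_file = Path(cache_file)
    if cache_file.exists() and not update:
        with cache_file.open("rb") as fh:
            return pickle.load(fh)
    result = build()
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    with cache_file.open("wb") as fh:
        pickle.dump(result, fh)
    return result


download_calls = []


def fake_download():
    """Placeholder for the real download; records how often it is called."""
    download_calls.append("download")
    return {"HMDB0000122": "example metabolite name"}


with tempfile.TemporaryDirectory() as pickle_dir:
    cache_file = Path(pickle_dir) / "hmdb_mapper.pickle"
    first = load_or_build(cache_file, fake_download)   # builds and writes the cache
    second = load_or_build(cache_file, fake_download)  # served from the pickle
    third = load_or_build(cache_file, fake_download, update=True)  # forced refresh
```

Only the first and third calls reach the loader; the second is answered from the pickle on disk.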

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The previous implementation approximated the LEMBAS architecture from
Nilsson et al. (Nat Commun 2022). This commit aligns it with the original
R/MATLAB bionetwork codebase:

Weight initialisation
- Edges: 0.1 + 0.1×rand, negated for inhibitory signs (bionet.initializeWeights)
- Bias: 1e-3 everywhere; nodes that receive only inhibitory edges get bias=1
- Input scale: fixed buffer (inputAmplitude), not a learnable parameter
- Output projection: per-output scalar init to projection_amplitude (no bias)
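
The edge and bias rules above can be sketched in plain Python as follows. This is an illustration of the stated rules only, with hypothetical function names and a toy network; it is not the code in methods/_perturbation.py.

```python
import random


def init_edge_weights(edges, rng):
    """Draw 0.1 + 0.1 * U(0, 1) per edge and negate it for inhibitory edges.

    edges is a list of (source, target, sign) tuples with sign +1 or -1.
    """
    return {(s, t): sign * (0.1 + 0.1 * rng.random()) for s, t, sign in edges}


def init_biases(nodes, edges):
    """Bias is 1e-3, except 1.0 for nodes whose incoming edges are all inhibitory."""
    incoming = {node: [] for node in nodes}
    for _, target, sign in edges:
        incoming[target].append(sign)
    return {
        node: 1.0 if signs and all(s < 0 for s in signs) else 1e-3
        for node, signs in incoming.items()
    }


# Toy signed network (made up): L activates A and C, A inhibits B, B inhibits C.
nodes = ["L", "A", "B", "C"]
edges = [("L", "A", 1), ("A", "B", -1), ("L", "C", 1), ("B", "C", -1)]

weights = init_edge_weights(edges, random.Random(0))
biases = init_biases(nodes, edges)
```

Here node B receives only an inhibitory edge and therefore starts with bias 1, while C has mixed inputs and keeps the small default.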

Training loop
- Cosine one-cycle LR schedule peaking at lr_peak (bionetwork.oneCycle)
- Mini-batch training (default batch_size=5) with per-batch weight noise (1e-8)
- Per-batch input noise: drive += noiseLevel × curLr × randn
- Adam with lr=1.0; actual LR injected each epoch; momentum reset every 200 epochs
- Weight pre-scaling to spectral radius 0.8 before training (bionet.preScaleWeights)
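
For readers unfamiliar with one-cycle schedules, here is a generic cosine one-cycle sketch: the rate rises from a small start value to lr_peak and then anneals to a small end value. The parameters lr_start, lr_end and peak_frac and their defaults are assumptions for illustration, and the exact shape used by bionetwork.oneCycle may differ.

```python
import math


def cosine_interp(start, end, progress):
    """Cosine interpolation from start (progress=0) to end (progress=1)."""
    return end + (start - end) * 0.5 * (1.0 + math.cos(math.pi * progress))


def one_cycle_lr(epoch, total_epochs, lr_peak,
                 lr_start=1e-5, lr_end=1e-6, peak_frac=0.25):
    """Learning rate for a given epoch under a cosine one-cycle schedule."""
    peak_epoch = max(1, round(total_epochs * peak_frac))
    if epoch < peak_epoch:
        # Warm-up phase: rise from lr_start to lr_peak.
        return cosine_interp(lr_start, lr_peak, epoch / peak_epoch)
    # Annealing phase: decay from lr_peak to lr_end by the last epoch.
    remaining = max(1, total_epochs - 1 - peak_epoch)
    return cosine_interp(lr_peak, lr_end, (epoch - peak_epoch) / remaining)


schedule = [one_cycle_lr(e, total_epochs=100, lr_peak=2e-3) for e in range(100)]
```

With an optimizer created at lr=1.0, a value from such a schedule can be written into the optimizer's parameter groups at the start of each epoch, which matches the "actual LR injected each epoch" item above.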

Regularisation
- Spectral radius loss: soft exponential penalty with differentiable power
  iteration (bionetwork.spectralLoss)
- Uniform state distribution: mean/var/min/max loss matching
  bionetwork.uniformLossBatch (replaces old sorted-distribution loss)
- Sign regularisation unchanged; ligand bias penalty added (1e-3)
- L2 + inverse barrier on edge weights to prevent collapse to zero
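
Both the pre-scaling step and the spectral radius penalty rely on an estimate of the spectral radius of the weight matrix. The sketch below shows a plain (non-differentiable) power iteration and a rescaling to a target radius of 0.8. It assumes a single dominant real eigenvalue, always rescales, and uses hypothetical function names; the PR's implementation works on torch tensors and may behave differently.

```python
import math


def matvec(matrix, vector):
    """Multiply a square matrix (list of rows) by a vector."""
    return [sum(a * v for a, v in zip(row, vector)) for row in matrix]


def spectral_radius(matrix, iterations=200):
    """Power-iteration estimate of the largest eigenvalue magnitude."""
    n = len(matrix)
    vector = [1.0 / math.sqrt(n)] * n  # unit-norm starting vector
    radius = 0.0
    for _ in range(iterations):
        product = matvec(matrix, vector)
        radius = math.sqrt(sum(p * p for p in product))  # ||A v|| with ||v|| = 1
        if radius == 0.0:
            return 0.0
        vector = [p / radius for p in product]
    return radius


def prescale_weights(matrix, target=0.8):
    """Rescale all weights so the estimated spectral radius equals target."""
    radius = spectral_radius(matrix)
    if radius == 0.0:
        return [row[:] for row in matrix]
    factor = target / radius
    return [[a * factor for a in row] for row in matrix]


# Toy 2x2 weight matrix with eigenvalues 1.5 and -1.2.
toy_weights = [[0.0, 1.5], [1.2, 0.3]]
scaled_weights = prescale_weights(toy_weights)
```

A spectral radius below 1 is the usual condition sought so that the recurrent update contracts toward a steady state.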

Defaults updated: epochs=5000, tolerance=1e-6, dtype=float64,
uniform_penalty=1e-5, batch_size=5, projection_amplitude=1.2

C_lembas.ipynb: add section 7 (LOOCV) following the evaluation protocol
from the original paper; re-executed with fresh outputs.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- test_eval_graph: update test_run_ora to expect 'ora_Term' column
  (our backward-compat rename from decoupler 2.x 'source' → 'Term')
- test_utils: move pygraphviz import inside the two tests that need it
  using pytest.importorskip so the rest of the module runs without it
- test_utils: replace fragile try/except + exact dtype string match in
  test_handle_missing_values_more_than_one_non_numeric_column with
  pytest.raises + partial match (pandas dtype repr changed across versions)
- uv.lock: regenerated to encode the torch vs torch-cu128 extras conflict

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
@HugoHakem requested a review from pablormier June 26, 2026 13:21
@HugoHakem added the documentation and enhancement labels Jun 26, 2026
--locked re-runs the full resolution and fails if the result differs
from the committed lock file, which happens when the lock was generated
on a different platform (e.g. Linux) and CI runs on another (macOS).
--frozen installs exactly the versions in the lock file without
re-resolving, which is the correct behaviour for reproducible CI.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


return file_legend

Member

In the long term, this should use the utils API of omnipath-client and rely on higher-level objects.

Member Author

Investigated this — omnipath-client v0.2.3 is now added as a dependency. However, oc.utils.translate('hmdb', 'traditional_iupac') currently returns HTTP 500 on utils.omnipathdb.org: the utils service only supports cross-database ID mapping (e.g. hmdb → chebi works), not name resolution. Names are served by the separate metabo.omnipathdb.org service via entities/resolve, which has no public wrapper yet in the current version.

For now get_hmdb_mapper keeps the rdata approach with a note in the docstring pointing to the intended migration. A working workaround using OmniPath()._fetch('entities/resolve') is preserved on branch hmdb-mapper-omnipath-workaround for reference, and should be promoted once oc.utils.translate supports metabolite name mapping server-side.

Beyond get_hmdb_mapper, we identified three other places in the codebase that could benefit from switching to omnipath-client long-term:

  • noi/_node.py — currently calls pypath.utils.mapping.map_name() and pypath.utils.orthology.translate() directly; oc.utils.map_name() / oc.utils.orthology_translate() are the intended replacements and would reduce the hard dependency on pypath-omnipath[curl]
  • data/omics/_common.py — uses biomart for Ensembl → HGNC symbol mapping; oc.utils.translate() covers this across 97 ID types
  • data/network/_omnipath.py — uses the older omnipath client; omnipath-client is the intended successor

…hmdb_mapper

Adds omnipath-client>=0.2.3 as a dependency for future use (node ID
translation, Ensembl mappings, COSMOS PKN via oc.cosmos once available).

get_hmdb_mapper keeps the existing rdata approach for now: the intended
migration to oc.utils.translate('hmdb', 'traditional_iupac') is blocked
by a server-side 500 on utils.omnipathdb.org. A working workaround using
OmniPath()._fetch('entities/resolve') is preserved on branch
hmdb-mapper-omnipath-workaround for when the API matures.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
4 participants