
New modules: Llamacpp-python/run and huggingface/download for allowing to run simple text workloads with local LLMs #11053

Merged
toniher merged 58 commits into nf-core:master from biocorecrg:llamacpp on May 7, 2026

Conversation

@toniher
Member

@toniher toniher commented Mar 26, 2026

This pull request, contributed jointly with @lucacozzuto, provides modules for running simple text inference workloads against local LLMs using llama-cpp-python.
The work was done during the nf-core Hackathon in March 2026.

PR checklist

Closes #XXX

  • This comment contains a description of changes (with reason).
  • If you've fixed a bug or added code that should be tested, add tests!
  • If you've added a new tool - have you followed the module conventions in the contribution docs
  • If necessary, include test data in your PR.
  • Remove all TODO statements.
  • Broadcast software version numbers to topic: versions - See version_topics
  • Follow the naming conventions.
  • Follow the parameters requirements.
  • Follow the input/output options guidelines.
  • Add a resource label
  • Use BioConda and BioContainers if possible to fulfil software requirements.
  • Ensure that the test works with either Docker / Singularity. Conda CI tests can be quite flaky:
    • For modules:
      • nf-core modules test <MODULE> --profile docker
      • nf-core modules test <MODULE> --profile singularity
      • nf-core modules test <MODULE> --profile conda
    • For subworkflows:
      • nf-core subworkflows test <SUBWORKFLOW> --profile docker
      • nf-core subworkflows test <SUBWORKFLOW> --profile singularity
      • nf-core subworkflows test <SUBWORKFLOW> --profile conda

@toniher toniher added the "new module" label Mar 26, 2026
@toniher toniher requested a review from JoseEspinosa March 26, 2026 14:08
@toniher toniher moved this to Ready for review in Hackathon March 2026 Mar 26, 2026
Contributor

@famosab famosab left a comment

Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.

We usually recommend having one module per PR. That makes the review process easier, and it's more likely that someone will review your PR. You can keep that in mind for the next PRs.

Comment thread modules/nf-core/huggingface/download/tests/main.nf.test Outdated
Comment thread modules/nf-core/huggingface/download/tests/main.nf.test Outdated
Comment thread modules/nf-core/huggingface/download/tests/nextflow.config Outdated
Comment thread modules/nf-core/huggingface/download/tests/nextflow.config Outdated
Comment thread modules/nf-core/huggingface/download/main.nf Outdated
Comment thread modules/nf-core/huggingface/download/main.nf
Comment thread modules/nf-core/huggingface/download/main.nf Outdated
Comment thread modules/nf-core/huggingface/download/meta.yml Outdated
Comment thread modules/nf-core/llamacpp-python/run/tests/data/stub_model.gguf Outdated
@toniher
Member Author

toniher commented Apr 4, 2026

> Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
>
> We usually recommend having one module per PR. That makes the review process easier, and it's more likely that someone will review your PR. You can keep that in mind for the next PRs.

Hi @famosab. Thanks for the feedback; I will go through your comments! I was told about the one-module-per-PR recommendation, but since one of the processes needs the output of the other, I thought submitting them together would help potential users once the modules were eventually accepted. It is certainly more work for everyone, though. Sorry about this; I will avoid it in future PRs.

Member

@pinin4fjords pinin4fjords left a comment

Per the (draft) GPU module guidelines:

  • Emit CUDA runtime version on the versions topic (CPU path falls back to no CUDA available)
  • maxForks = 1 on GPU tests for Singularity GPU concurrency
  • Snap md5s included so you don't need to rerun

Verified locally: CPU path md5 3650e9ce14ab350b69387f6631b57885, real GPU path md5 28ef3c6aa93df0de54e66c13cc9db845.

Comment thread modules/nf-core/llamacpppython/run/main.nf
Comment thread modules/nf-core/llamacpppython/run/templates/llama-cpp-python.py
Comment thread modules/nf-core/llamacpppython/run/tests/nextflow.gpu.config
Comment thread modules/nf-core/llamacpppython/run/tests/main.nf.test.snap Outdated
Comment thread modules/nf-core/llamacpppython/run/tests/main.nf.test.snap Outdated
Comment thread modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test.snap Outdated
Comment thread modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test.snap Outdated
@pinin4fjords
Member

Sorry for the AI language above; I just thought I'd help out by fixing this up for you. All of the suggestions above are tested, so you should be able to accept them as they are. It's my fault: I amended the proposed guidelines a little in response to my own testing and feedback on them. Now, to your question...

toniher and others added 8 commits May 6, 2026 12:01
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
@pinin4fjords
Member

Glad it's in! On testing and Wave upgrades:

Testing the GPU path locally

You need three things on the host: a GPU, nvidia-container-toolkit (Docker) or NVIDIA drivers (Singularity), and the modules-repo gpu profile composed onto your container profile - that profile (in tests/config/nf-test.config) is what adds --gpus all / --nv. nf-core modules test --profile X only accepts docker | singularity | conda so you can't use it for the GPU run; drive nf-test directly:

nf-test test --tag llamacpppython/run --profile docker,gpu \
    modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test

That dispatches the GPU container, runs real inference, and exercises the versions.yml shape end-to-end. The CI workflow .github/workflows/nf-test-gpu.yml does the same on a GPU runner.
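
As a rough sketch of how that invocation generalises to other GPU-tagged modules, the helper below only assembles the command string from a container profile and a module name. The function name gpu_test_cmd is hypothetical (it is not part of nf-core tooling), and it assumes the GPU test file sits at tests/main.gpu.nf.test as it does in this PR. It prints the command rather than running it.

```shell
# Hypothetical helper (not part of nf-core tooling): print the nf-test
# command for a module's GPU test, composing the gpu profile onto a
# container profile as described above.
gpu_test_cmd() {
    container_profile="$1"   # docker or singularity
    module="$2"              # e.g. llamacpppython/run

    case "$container_profile" in
        docker|singularity) ;;
        *)
            echo "unsupported container profile: $container_profile" >&2
            return 1
            ;;
    esac

    # Assumption: the GPU test file is tests/main.gpu.nf.test,
    # matching the layout used in this PR.
    printf 'nf-test test --tag %s --profile %s,gpu modules/nf-core/%s/tests/main.gpu.nf.test\n' \
        "$module" "$container_profile" "$module"
}

# Print the command for this module with the Docker profile.
gpu_test_cmd docker llamacpppython/run
```

Only docker and singularity are accepted here because those are the two profiles the comment says the gpu profile adds flags for (--gpus all and --nv).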

If you don't have a GPU box to hand, a g4dn.xlarge on AWS (Deep Learning Base GPU AMI, Tesla T4, under $1/hr on-demand) takes ~70 seconds to run both GPU tests once the container and model are pulled. I just verified the current PR head + the open review suggestions there - both GPU tests pass and the snap md5s match (3650e9ce… for the stub, 28ef3c6a… for the real run).

Without composing gpu in, the GPU test isn't a no-op - it fails at the python3 -c 'import llama_cpp; …' call inside the container, because llama_cpp opens libcudart at import time and the runtime has no GPU. So --profile docker on its own won't tell you anything useful about the accelerator path.
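
Since the GPU test fails outright on a host without an accelerator, a pre-flight check can decide whether to run it at all. The following is a minimal sketch under stated assumptions: has_nvidia_gpu is a hypothetical name, and relying on nvidia-smi being on PATH is only a heuristic (it says nothing about whether nvidia-container-toolkit is configured).

```shell
# Rough heuristic, not an official check: treat the host as GPU-capable
# only if nvidia-smi is on PATH and lists at least one device.
has_nvidia_gpu() {
    command -v nvidia-smi >/dev/null 2>&1 || return 1
    # `nvidia-smi -L` prints one line per device, each starting with "GPU ".
    nvidia-smi -L 2>/dev/null | grep -q '^GPU '
}

if has_nvidia_gpu; then
    echo "GPU detected: run the GPU test with --profile docker,gpu"
else
    echo "No GPU detected: skipping main.gpu.nf.test"
fi
```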

Upgrading the Wave containers

seqera.io/containers doesn't expose --freeze / --build-template, so I use the CLI directly. For this module:

# CPU
wave --conda-file environment.yml --freeze --await
wave --conda-file environment.yml --freeze --await --singularity

# GPU (pip wheel + cuda-runtime)
wave --conda-file environment.gpu.yml --freeze --await \
     --build-template conda/micromamba:v2 \
     --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
wave --conda-file environment.gpu.yml --freeze --await --singularity \
     --build-template conda/micromamba:v2 \
     --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'

--config-env LD_LIBRARY_PATH=/opt/conda/lib is needed because conda's activate.d hooks don't fire under docker run, so the wheel's binary can't otherwise find the conda-provided CUDA libs at dlopen time.

Each invocation prints a frozen tag - community.wave.seqera.io/library/…:<hash> for Docker, oras://community.wave.seqera.io/library/…:<hash> for Singularity. Those four URLs go into the container ternary.
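
To keep the four frozen tags straight when pasting them into the ternary, the builds can be wrapped so each result is labelled. This is only a sketch: freeze_containers and the key names are hypothetical, the flags are exactly those listed above, and it assumes wave prints nothing but the image reference on stdout.

```shell
# Sketch: run the four Wave builds listed above and label each frozen tag.
# Assumption: `wave` prints only the resulting image reference on stdout.
freeze_containers() {
    gpu_opts="--build-template conda/micromamba:v2 --config-env LD_LIBRARY_PATH=/opt/conda/lib"

    echo "cpu_docker=$(wave --conda-file environment.yml --freeze --await)"
    echo "cpu_singularity=$(wave --conda-file environment.yml --freeze --await --singularity)"
    # $gpu_opts is intentionally unquoted so it splits into separate arguments.
    echo "gpu_docker=$(wave --conda-file environment.gpu.yml --freeze --await $gpu_opts)"
    echo "gpu_singularity=$(wave --conda-file environment.gpu.yml --freeze --await --singularity $gpu_opts)"
}
```

Calling freeze_containers from the module directory (where environment.yml and environment.gpu.yml live) would print four key=value lines, one per URL.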

The Wave guidance is in the proposed module specs (nf-core/website#4142), under "GPU-capable modules" and "Pip-based GPU packages" - probably worth waiting for that to land rather than duplicating it elsewhere.

@pinin4fjords
Member

One more handy bit since you don't have a GPU on hand: if a future change shifts the GPU snap md5s, you can ask the nf-core bot to regenerate them on a CI runner with the right hardware by commenting on the PR:

@nf-core-bot update gpu snapshot path:modules/nf-core/llamacpppython/run

(works the same for any GPU-tagged module - I used it on #11258 for ribodetector). Saves spinning up an instance just to refresh an md5.

@pinin4fjords
Member

> One more handy bit since you don't have a GPU on hand: if a future change shifts the GPU snap md5s, you can ask the nf-core bot to regenerate them on a CI runner with the right hardware by commenting on the PR:
>
> @nf-core-bot update gpu snapshot path:modules/nf-core/llamacpppython/run
>
> (works the same for any GPU-tagged module - I used it on #11258 for ribodetector). Saves spinning up an instance just to refresh an md5.

and of course this info triggered it!

Member

@pinin4fjords pinin4fjords left a comment

Spotted while looking over the diff: the four real tests pull ~3.2 GB of GGUFs from HuggingFace per fresh CI run (Gemma-1B, SmolLM3-3B). For the inference tests in llamacpppython/run, swapping to mradermacher/tiny-gemma-test-i1-GGUF (~14 MB, same Gemma architecture, loads cleanly in llama-cpp-python 0.3.16 - I verified) keeps full coverage.

For huggingface/download, the tests only assert that download works, so they don't need a model at all - any file from any HF repo proves the path. hf-internal-testing/tiny-random-* repos exist for exactly this purpose; swapping to their config.json files (~800 B each) brings the download down to under 2 KB.

Total: ~3.2 GB → ~14 MB per run.

Comment thread modules/nf-core/huggingface/download/tests/main.nf.test Outdated
Comment thread modules/nf-core/huggingface/download/tests/main.nf.test.snap
Comment thread modules/nf-core/llamacpppython/run/tests/main.nf.test
Comment thread modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test Outdated
@pinin4fjords
Member

Just looked at the conda CI failures - they're the same root cause as the size review I just posted. The conda jobs are timing out at exactly 2h on the SmolLM3 download:

ERROR ~ Error executing process > 'HUGGINGFACE_DOWNLOAD (test_model_smollm3)'
Caused by:
  Process exceeded running time limit (2h)

(from x64 | conda | 2)

The hf download is just sitting there for 2h - likely a slower mirror or network issue under the conda env's huggingface_hub 1.6.0. The suggestions in the review above replace SmolLM3 (~860 MB) and Gemma-1B (~700 MB) with hf-internal-testing/tiny-random-{gpt2,bert}/config.json (~800 B each) and mradermacher/tiny-gemma-test-i1-GGUF (~14 MB), so the conda jobs would just complete in seconds.

Applying the suggestions should fix the conda red ticks too.
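
Independently of the smaller fixtures, a stalled download can be made to fail fast instead of sitting until the process time limit. The sketch below uses GNU coreutils timeout, which exits with status 124 when the limit is reached. The wrapper name run_with_budget and the 300-second budget in the commented example are hypothetical choices, not something taken from the module.

```shell
# Hypothetical wrapper: run a command under a wall-clock budget using
# GNU coreutils `timeout`, which exits with status 124 when the limit hits.
run_with_budget() {
    budget_seconds="$1"
    shift
    timeout "$budget_seconds" "$@"
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "gave up after ${budget_seconds}s: $*" >&2
    fi
    return "$status"
}

# Illustrative use (needs network access, so left commented out):
# run_with_budget 300 hf download hf-internal-testing/tiny-random-gpt2 config.json
```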

Member

@pinin4fjords pinin4fjords left a comment

Approving to unblock you. I think the smaller model downloads should fix the rest and get you passing CI, as long as you don't see a problem with them.

toniher and others added 5 commits May 6, 2026 16:27
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
@pinin4fjords
Member

Note: the linting failure is not your fault, it's pending #11543

@pinin4fjords
Member

@toniher can you merge in the latest master please? I don't have permission to on your fork.

@toniher toniher enabled auto-merge May 7, 2026 10:20
@toniher toniher added this pull request to the merge queue May 7, 2026
Merged via the queue into nf-core:master with commit a548143 May 7, 2026
30 of 32 checks passed
@toniher toniher deleted the llamacpp branch May 7, 2026 10:25
@toniher
Member Author

toniher commented May 7, 2026

Thanks, everyone, for all the work involved! 🥳

6 participants