New modules: Llamacpp-python/run and huggingface/download for allowing to run simple text workloads with local LLMs by toniher · Pull Request #11053 · nf-core/modules

toniher · 2026-03-26T11:06:13Z

This pull request, contributed jointly with @lucacozzuto, provides a simple workload for running text inference tasks using llama-cpp-python against local LLMs.
The work was done during the nf-core Hackathon in March 2026.


famosab

Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.

We usually recommend having one module per PR. That makes the review process easier, and it's more likely that someone will review your PR. You can keep that in mind for your next PRs.

toniher · 2026-04-04T16:39:40Z

Hi @famosab. Thanks for the feedback; I will go through your comments! I was told about the module submission guideline, but since one of the processes needs the output of the other, I thought submitting them together would help potential users once they were eventually accepted. But it is certainly more work for everyone. Sorry about this; I will avoid it in future PRs.

pinin4fjords

Per the (draft) GPU module guidelines:

Emit CUDA runtime version on the versions topic (CPU path falls back to no CUDA available)
maxForks = 1 on GPU tests for Singularity GPU concurrency
Snap md5s included so you don't need to rerun

Verified locally: CPU path md5 3650e9ce14ab350b69387f6631b57885, real GPU path md5 28ef3c6aa93df0de54e66c13cc9db845.
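
As an illustration of the first guideline above (report the CUDA runtime version, falling back to "no CUDA available" on the CPU path), here is a minimal Python sketch. It is not the module's template code: the function names are hypothetical, and it only checks whether libcudart can be loaded and queried, not whether a GPU is actually present. It relies on `cudaRuntimeGetVersion` from the CUDA Runtime API, which encodes the version as 1000 * major + 10 * minor.

```python
import ctypes
import ctypes.util

NO_CUDA = "no CUDA available"


def format_cuda_version(raw: int) -> str:
    """Decode CUDA's integer version (1000 * major + 10 * minor) to 'major.minor'."""
    return f"{raw // 1000}.{(raw % 1000) // 10}"


def cuda_runtime_version() -> str:
    """Return the CUDA runtime version, or NO_CUDA if libcudart cannot be used."""
    name = ctypes.util.find_library("cudart")
    if name is None:
        return NO_CUDA
    try:
        lib = ctypes.CDLL(name)
        raw = ctypes.c_int(0)
        # cudaRuntimeGetVersion(int*) returns 0 (cudaSuccess) on success.
        status = lib.cudaRuntimeGetVersion(ctypes.byref(raw))
    except (OSError, AttributeError):
        return NO_CUDA
    return format_cuda_version(raw.value) if status == 0 else NO_CUDA


print(cuda_runtime_version())
```

Keeping the fallback as a fixed string means the CPU snapshot stays stable regardless of the host, which is presumably why the CPU and GPU paths have separate md5s.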

pinin4fjords · 2026-05-06T09:45:59Z

Sorry for the AI language above, just thought I'd help out by fixing this up for you - all of the above is tested, so you should be able to just accept the suggestions. It's my fault: I amended the proposed guidelines a little in response to my own testing and the feedback on it. Now, to your question...

pinin4fjords · 2026-05-06T10:46:39Z

Glad it's in! On testing and Wave upgrades:

Testing the GPU path locally

You need three things on the host: a GPU, nvidia-container-toolkit (Docker) or NVIDIA drivers (Singularity), and the modules-repo gpu profile composed onto your container profile - that profile (in tests/config/nf-test.config) is what adds --gpus all / --nv. nf-core modules test --profile X only accepts docker | singularity | conda so you can't use it for the GPU run; drive nf-test directly:

nf-test test --tag llamacpppython/run --profile docker,gpu \
    modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test

That dispatches the GPU container, runs real inference, and exercises the versions.yml shape end-to-end. The CI workflow .github/workflows/nf-test-gpu.yml does the same on a GPU runner.
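
If you switch between CPU and GPU runs often, the profile composition described above can be wrapped in a small helper. This is only a sketch: `nf_test_command` is a hypothetical name, and it uses nothing beyond the `--tag` and `--profile` flags shown in the command above.

```python
import shlex


def nf_test_command(tag, test_file, container_profile="docker", gpu=True):
    """Build the nf-test argument list, composing 'gpu' onto the container profile."""
    profiles = [container_profile]
    if gpu:
        # The gpu profile is what adds --gpus all (Docker) or --nv (Singularity).
        profiles.append("gpu")
    return ["nf-test", "test", "--tag", tag,
            "--profile", ",".join(profiles), test_file]


cmd = nf_test_command(
    "llamacpppython/run",
    "modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test",
)
print(shlex.join(cmd))
```

The resulting list can be passed straight to `subprocess.run(cmd, check=True)` on a host where nf-test is installed.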

If you don't have a GPU box to hand, a g4dn.xlarge on AWS (Deep Learning Base GPU AMI, Tesla T4, under $1/hr on-demand) takes ~70 seconds to run both GPU tests once the container and model are pulled. I just verified the current PR head + the open review suggestions there - both GPU tests pass and the snap md5s match (3650e9ce… for the stub, 28ef3c6a… for the real run).

Without composing gpu in, the GPU test isn't a no-op - it fails at the python3 -c 'import llama_cpp; …' call inside the container, because llama_cpp opens libcudart at import time and the runtime has no GPU. So --profile docker on its own won't tell you anything useful about the accelerator path.

Upgrading the Wave containers

seqera.io/containers doesn't expose --freeze / --build-template, so I use the CLI directly. For this module:

# CPU
wave --conda-file environment.yml --freeze --await
wave --conda-file environment.yml --freeze --await --singularity

# GPU (pip wheel + cuda-runtime)
wave --conda-file environment.gpu.yml --freeze --await \
     --build-template conda/micromamba:v2 \
     --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
wave --conda-file environment.gpu.yml --freeze --await --singularity \
     --build-template conda/micromamba:v2 \
     --config-env 'LD_LIBRARY_PATH=/opt/conda/lib'

--config-env LD_LIBRARY_PATH=/opt/conda/lib is needed because conda's activate.d hooks don't fire under docker run, so the wheel's binary can't otherwise find the conda-provided CUDA libs at dlopen time.

Each invocation prints a frozen tag - community.wave.seqera.io/library/…:<hash> for Docker, oras://community.wave.seqera.io/library/…:<hash> for Singularity. Those four URLs go into the container ternary.
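
To make the "four URLs" concrete, the selection logic can be sketched as a nested conditional over container engine and accelerator. The sketch is in Python rather than Nextflow DSL, the `use_gpu` flag is a hypothetical stand-in for however the module detects a GPU request, and the tags are placeholders because the real frozen hashes come from the wave output.

```python
# Placeholder tags: substitute the frozen URLs printed by each wave invocation.
DOCKER_CPU = "<docker-cpu-tag>"
DOCKER_GPU = "<docker-gpu-tag>"
SINGULARITY_CPU = "oras://<singularity-cpu-tag>"
SINGULARITY_GPU = "oras://<singularity-gpu-tag>"


def select_container(engine: str, use_gpu: bool) -> str:
    """Pick one of the four frozen containers, mirroring a nested ternary."""
    if engine == "singularity":
        return SINGULARITY_GPU if use_gpu else SINGULARITY_CPU
    # Any other engine falls through to the Docker images in this sketch.
    return DOCKER_GPU if use_gpu else DOCKER_CPU


print(select_container("singularity", use_gpu=True))
```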

The Wave guidance is in the proposed module specs (nf-core/website#4142), under "GPU-capable modules" and "Pip-based GPU packages" - probably worth waiting for that to land rather than duplicating it elsewhere.

pinin4fjords · 2026-05-06T12:00:07Z

One more handy bit since you don't have a GPU on hand: if a future change shifts the GPU snap md5s, you can ask the nf-core bot to regenerate them on a CI runner with the right hardware by commenting on the PR:

@nf-core-bot update gpu snapshot path:modules/nf-core/llamacpppython/run

(works the same for any GPU-tagged module - I used it on #11258 for ribodetector). Saves spinning up an instance just to refresh an md5.

pinin4fjords · 2026-05-06T12:02:01Z

And of course, posting that command triggered the bot!

pinin4fjords

Spotted while looking over the diff: the four real tests pull ~3.2 GB of GGUFs from HuggingFace per fresh CI run (Gemma-1B, SmolLM3-3B). For the inference tests in llamacpppython/run, swapping to mradermacher/tiny-gemma-test-i1-GGUF (~14 MB, same Gemma architecture, loads cleanly in llama-cpp-python 0.3.16 - I verified) keeps full coverage.

For huggingface/download, the tests only assert that download works, so they don't need a model at all - any file from any HF repo proves the path. hf-internal-testing/tiny-random-* repos exist for exactly this purpose; swapping to their config.json files (~800 B each) brings the download down to under 2 KB.

Total: ~3.2 GB → ~14 MB per run.

pinin4fjords · 2026-05-06T12:48:47Z

Just looked at the conda CI failures - they're the same root cause as the size review I just posted. The conda jobs are timing out at exactly 2h on the SmolLM3 download:

ERROR ~ Error executing process > 'HUGGINGFACE_DOWNLOAD (test_model_smollm3)'
Caused by:
  Process exceeded running time limit (2h)

(from x64 | conda | 2)

The hf download is just sitting there for 2h - likely a slower mirror or network issue under the conda env's huggingface_hub 1.6.0. The suggestions in the review above replace SmolLM3 (~860 MB) and Gemma-1B (~700 MB) with hf-internal-testing/tiny-random-{gpt2,bert}/config.json (~800 B each) and mradermacher/tiny-gemma-test-i1-GGUF (~14 MB), so the conda jobs would just complete in seconds.

Applying the suggestions should fix the conda red ticks too.

pinin4fjords

Approving to unblock you - I think the smaller model downloads should fix the rest and get you passing CI, as long as you don't see a problem there.

pinin4fjords · 2026-05-06T16:17:04Z

Note: the linting failure is not your fault; the fix is pending in #11543.

pinin4fjords · 2026-05-07T09:16:23Z

@toniher can you merge in the latest master please? I don't have permission to on your fork.

toniher · 2026-05-07T11:02:08Z

Thanks everyone for all the involved work! 🥳

toniher added 5 commits March 25, 2026 18:55

adding modules for downloading and running gguf modules

7f1b7d7

adding docker support

9031f48

allow custom HF_HOME cache input and other fixes

04cd556

several test fixes

537a891

Upgrade problem with versions and test

20b5d26

toniher added this to Hackathon March 2026 Mar 26, 2026

toniher requested review from edmundmiller and maxulysse as code owners March 26, 2026 11:06

toniher added the new module Adding a new module label Mar 26, 2026

toniher and others added 7 commits March 26, 2026 12:27

fix precommit linting

c9997f5

fix yaml for prettier

5035e82

fix retrieval of version for huggingface

2c796de

Merge branch 'nf-core:master' into llamacpp

4f64bac

importing nextflow.config from HF_DOWNLOAD

ff7039c

Merge branch 'llamacpp' of github.com:biocorecrg/nf-core-modules into llamacpp

7a3f8fc

adding hf_cache for setup as well

fb1768c

toniher requested a review from JoseEspinosa March 26, 2026 14:08

toniher added the Ready for Review label Mar 26, 2026

toniher moved this to Ready for review in Hackathon March 2026 Mar 26, 2026

famosab reviewed Apr 2, 2026

Comment thread modules/nf-core/llamacpp-python/run/tests/data/stub_model.gguf Outdated

toniher and others added 8 commits April 4, 2026 18:47

moving HF_DOWNLOAD to HUGGINGFACE_DOWNLOAD https://nf-co.re/docs/guidelines/components/modules#naming-conventions

ac7f44c

Update modules/nf-core/huggingface/download/tests/main.nf.test

26168b7

Not so many assertions for stub test Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>

more detail and naming of Hugging Face

6dfff97

Merge remote-tracking branch 'upstream/master' into llamacpp

2d171ca

Merge branch 'llamacpp' of github.com:biocorecrg/nf-core-modules into llamacpp

ee04a27

linting modules using

3021074

generate files on the fly

4630e5a

rmed data files for tests

53c7826

pinin4fjords reviewed May 6, 2026

toniher and others added 8 commits May 6, 2026 12:01

Update modules/nf-core/llamacpppython/run/main.nf

37c7025

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/templates/llama-cpp-python.py

44586fe

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/tests/nextflow.gpu.config

1aac201

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test.snap

d6ee95d

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test.snap

dcc782d

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/tests/main.nf.test.snap

96e6450

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/tests/main.nf.test.snap

30b2030

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Merge branch 'nf-core:master' into llamacpp

5589689

pinin4fjords reviewed May 6, 2026

pinin4fjords approved these changes May 6, 2026

toniher and others added 5 commits May 6, 2026 16:27

Update modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test

7f0bd4c

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/llamacpppython/run/tests/main.nf.test

c625a64

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/huggingface/download/tests/main.nf.test.snap

152b422

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Update modules/nf-core/huggingface/download/tests/main.nf.test

5bd9d91

Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>

Merge branch 'master' into llamacpp

10e7fcc

Merge branch 'master' into llamacpp

c7f7d91

toniher enabled auto-merge May 7, 2026 10:20

toniher added this pull request to the merge queue May 7, 2026

Merged via the queue into nf-core:master with commit a548143 May 7, 2026
30 of 32 checks passed

toniher deleted the llamacpp branch May 7, 2026 10:25
