New modules: Llamacpp-python/run and huggingface/download for running simple text workloads with local LLMs #11053
famosab left a comment:
Thank you for your contribution to nf-core! We really appreciate it. I added a few comments to your PR.
We usually recommend having one module per PR. That makes the review process easier, and it's more likely that someone will review your PR. You can keep that in mind for future PRs.
Hi @famosab. Thanks for the feedback; I will go through your comments! I was told about this for module submissions, but since one process needs the output of the other, I thought submitting them together would help potential users once they were eventually accepted. But it is certainly more work for everyone. Sorry about this; I will avoid it in future PRs.
Not so many assertions for stub test
Co-authored-by: Famke Bäuerle <45968370+famosab@users.noreply.github.com>
Per the (draft) GPU module guidelines:
- Emit the CUDA runtime version on the versions topic (the CPU path falls back to "no CUDA available")
- maxForks = 1 on GPU tests for Singularity GPU concurrency
- Snap md5s included so you don't need to rerun
Verified locally: CPU path md5 3650e9ce14ab350b69387f6631b57885, real GPU path md5 28ef3c6aa93df0de54e66c13cc9db845.
Sorry for the AI language above; I just thought I'd help out by fixing this up for you. All of the above is tested, so you should be able to just accept the suggestions. It's my fault: I amended the proposed guidelines a little in response to my own testing and the feedback on it. Now, to your question...
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Glad it's in! On testing and Wave upgrades:
Testing the GPU path locally
You need three things on the host: a GPU, …
nf-test test --tag llamacpppython/run --profile docker,gpu \
modules/nf-core/llamacpppython/run/tests/main.gpu.nf.test
That dispatches the GPU container, runs real inference, and exercises the … If you don't have a GPU box to hand, a … Without composing …
Upgrading the Wave containers
# CPU
wave --conda-file environment.yml --freeze --await
wave --conda-file environment.yml --freeze --await --singularity
# GPU (pip wheel + cuda-runtime)
wave --conda-file environment.gpu.yml --freeze --await \
--build-template conda/micromamba:v2 \
--config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
wave --conda-file environment.gpu.yml --freeze --await --singularity \
--build-template conda/micromamba:v2 \
--config-env 'LD_LIBRARY_PATH=/opt/conda/lib'
Each invocation prints a frozen tag … The Wave guidance is in the proposed module specs (nf-core/website#4142), under "GPU-capable modules" and "Pip-based GPU packages". It's probably worth waiting for that to land rather than duplicating it elsewhere.
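The four Wave invocations above differ only in the environment file, the --singularity switch and the two GPU-only flags, so they can be scripted together. What follows is a minimal bash sketch, not nf-core or Wave tooling: it assumes the wave CLI is on PATH and that you run it from the module directory holding environment.yml and environment.gpu.yml. The function names rebuild_wave_images and run and the DRY_RUN variable are hypothetical. By default it only prints the commands.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of nf-core or Wave tooling.
# Assumes the wave CLI is on PATH and the current directory holds
# environment.yml (CPU) and environment.gpu.yml (GPU: pip wheel + cuda-runtime).

# Print the command instead of executing it unless DRY_RUN=0.
run() {
  if [[ "${DRY_RUN:-1}" == "0" ]]; then
    "$@"
  else
    echo "$*"
  fi
}

rebuild_wave_images() {
  # Flags only the GPU environment needs, copied from the commands above
  local gpu_flags=(--build-template conda/micromamba:v2 --config-env 'LD_LIBRARY_PATH=/opt/conda/lib')
  local target
  local fmt
  for target in docker singularity; do
    # No extra flag builds the default Docker image; --singularity requests a Singularity one
    fmt=()
    if [[ "$target" == "singularity" ]]; then
      fmt=(--singularity)
    fi
    # CPU image, then GPU image, for this container format
    run wave --conda-file environment.yml --freeze --await "${fmt[@]}"
    run wave --conda-file environment.gpu.yml --freeze --await "${fmt[@]}" "${gpu_flags[@]}"
  done
}
```

After sourcing the file, calling rebuild_wave_images shows the four commands; DRY_RUN=0 rebuild_wave_images runs them for real, and each wave call still prints its own frozen tag.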
One more handy bit, since you don't have a GPU on hand: if a future change shifts the GPU snap md5s, you can ask the nf-core bot to regenerate them on a CI runner with the right hardware by commenting on the PR: … This works the same for any GPU-tagged module; I used it on #11258 for ribodetector. It saves spinning up an instance just to refresh an md5.
And of course this info triggered it!
pinin4fjords left a comment:
Spotted while looking over the diff: the four real tests pull ~3.2 GB of GGUFs from Hugging Face on every fresh CI run (Gemma-1B, SmolLM3-3B). For the inference tests in llamacpppython/run, swapping to mradermacher/tiny-gemma-test-i1-GGUF (~14 MB, same Gemma architecture, loads cleanly in llama-cpp-python 0.3.16; I verified this) keeps full coverage.
For huggingface/download, the tests only assert that the download works, so they don't need a model at all; any file from any HF repo exercises that path. The hf-internal-testing/tiny-random-* repos exist for exactly this purpose, and swapping to their config.json files (~800 B each) brings the download down to under 2 KB.
Total: ~3.2 GB → ~14 MB per run.
Just looked at the conda CI failures: they have the same root cause as the size review I just posted. The conda jobs are timing out at exactly 2h on the SmolLM3 download (from …). Applying the suggestions should fix the conda red ticks too.
pinin4fjords left a comment:
Approving to unblock you. I think the smaller model download change should fix the rest of it and get you passing CI, as long as you don't see a problem there.
Co-authored-by: Jonathan Manning <pininforthefjords@gmail.com>
Note: the linting failure is not your fault; it's pending #11543.
@toniher, can you merge in the latest master, please? I don't have permission to do that on your fork.
Thanks everyone for all the involved work! 🥳
This pull request, contributed jointly with @lucacozzuto, provides a simple workload for running text inference tasks using llamacpp-python against local LLMs.
This work was done during the nf-core Hackathon in March 2026.
PR checklist
Closes #XXX
- … topic: versions - See version_topics
- … label
- nf-core modules test <MODULE> --profile docker
- nf-core modules test <MODULE> --profile singularity
- nf-core modules test <MODULE> --profile conda
- nf-core subworkflows test <SUBWORKFLOW> --profile docker
- nf-core subworkflows test <SUBWORKFLOW> --profile singularity
- nf-core subworkflows test <SUBWORKFLOW> --profile conda