Summary
The official context_chat_backend Docker images do not support NVIDIA Blackwell GPUs (RTX 5090, RTX 5080, etc.) because they are compiled against CUDA 12.2 without sm_120 architecture support.
Environment
- GPU: NVIDIA GeForce RTX 5090 (compute capability 12.0 / sm_120)
- Driver: 595.58.03
- CUDA: 13.2
- context_chat_backend: 5.3.0
- context_chat: 5.3.1
- Nextcloud: 33.x
Problem
The official image ghcr.io/nextcloud/context_chat_backend:5.3.0 is based on nvidia/cuda:12.2.2-runtime-ubuntu22.04 and downloads a prebuilt llama-cpp-python wheel compiled for CUDA 12.2 without Blackwell support:
CUDA : ARCHS = 500,520,530,600,610,620,700,720,750,800,860,870,890,900
# Missing: 1000 (sm_100), 1200 (sm_120)
As a result:
- GPU utilization stays at 0%
- Only 746 MiB of 32607 MiB VRAM is used
- Power draw: 7W of 575W
- All embedding computation falls back to CPU
The RTX 5090 requires CUDA 12.8+ and sm_120 for native Blackwell support.
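To check whether a given build covers a particular GPU, the ARCHS line from the llama.cpp startup log can be compared against the card's compute capability. The following is a minimal sketch, not part of context_chat_backend. The helper names (`parse_archs`, `arch_code`, `supports`) are hypothetical, and the mapping from compute capability to ARCHS code (12.0 → 1200) is an assumption inferred from the two log lines quoted in this issue.

```python
import re


def parse_archs(log_line: str) -> set[int]:
    """Extract the architecture codes from a 'CUDA : ARCHS = ...' log line."""
    match = re.search(r"ARCHS\s*=\s*([\d,]+)", log_line)
    if match is None:
        return set()
    return {int(code) for code in match.group(1).split(",") if code}


def arch_code(compute_capability: str) -> int:
    """Map a compute capability such as '12.0' to its ARCHS code (1200).

    Assumed convention: major * 100 + minor * 10, as suggested by the
    logs in this issue (e.g. 8.9 -> 890, 12.0 -> 1200).
    """
    major, minor = compute_capability.split(".")
    return int(major) * 100 + int(minor) * 10


def supports(log_line: str, compute_capability: str) -> bool:
    """Return True if the build's ARCHS list contains the given capability."""
    return arch_code(compute_capability) in parse_archs(log_line)


# ARCHS line from the official 5.3.0 image, as quoted above
official = "CUDA : ARCHS = 500,520,530,600,610,620,700,720,750,800,860,870,890,900"
# ARCHS line from the patched master build
patched = "CUDA : ARCHS = 500,610,700,750,800,860,890,1200 | BLACKWELL_NATIVE_FP4 = 1"

print(supports(official, "12.0"))  # False
print(supports(patched, "12.0"))   # True
```

This only checks for an exact architecture match in the log line; it does not account for PTX forward compatibility.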
What already works
The master branch (5.4.0-beta0) has a new multi-stage Dockerfile based on CUDA 12.8, which is a big improvement. However, the final stage still defaults to FROM runtime-cpu AS final instead of FROM runtime-cuda AS final.
With two small fixes to the master Dockerfile:
# 1. Add Blackwell architecture
ENV CMAKE_CUDA_ARCHITECTURES="89;90;100;120"
# 2. Use CUDA runtime as final stage (was: FROM runtime-cpu AS final)
FROM runtime-cuda AS final
...the RTX 5090 works correctly:
CUDA : ARCHS = 500,610,700,750,800,860,890,1200 | BLACKWELL_NATIVE_FP4 = 1
ggml_cuda_init: found 1 CUDA devices:
Device 0: NVIDIA GeForce RTX 5090, compute capability 12.0
Request
- Fix the FROM runtime-cpu AS final → FROM runtime-cuda AS final in the master Dockerfile (this seems like a bug)
- Add sm_120 to CMAKE_CUDA_ARCHITECTURES for Blackwell support
- Consider publishing a CUDA-specific image tag, e.g. context_chat_backend:5.3.x-cuda, for GPU users
- Publish a v5.3.1 git tag to match the context_chat frontend app version
Additional Notes
- The RTX 5090 has 32 GB VRAM which is ideal for large embedding workloads
- With proper Blackwell support, indexing performance should improve dramatically
- BLACKWELL_NATIVE_FP4 = 1 confirms native FP4 Tensor Core support is available
Thank you for the great work on this project!