Version 3.0.0.0 | GitHub | Jellyfin 10.11.2 | .NET 9.0
Generate AI-powered subtitles for your media library using OpenAI's Whisper model via whisper.cpp. Fully local -- no external API calls, no data leaves your server. Runs on CPU or NVIDIA GPU (CUDA).
- Fully local -- no API keys, no subscriptions, no data sent to third parties
- Multiple Whisper models -- Tiny, Base, Small, Medium, Large, Turbo with English-only variants
- Translation to English -- transcribe any language, translate to English
- CPU and GPU -- runs on CPU (AVX2) or NVIDIA GPU via CUDA with automatic binary selection
- Audio chunking -- splits audio >30 minutes into chunks to prevent OOM on low-RAM servers (4 GB)
- Word-level timestamps -- precise subtitle timing (approximately 2-3x slower processing)
- Scheduled task -- process entire libraries on a schedule
- Library scan hook -- auto-generate subtitles for new media as it is discovered
- Per-chunk progress reporting -- smooth progress updates in the scheduled task UI
- Intelligent subtitle detection -- skip existing subtitles, force-regenerate AI-tagged subtitles
- AI identifier tagging -- mark generated subtitles with a configurable tag (e.g., video.en.whisper.srt)
- Library filtering -- select which libraries to process, exclude specific folders
- FFmpeg audio extraction -- works with any video format (MP4, MKV, AVI, MOV, etc.)
- Dual binary system -- ships both CPU (4.2 MB) and CUDA (~1.1 GB) whisper-cli binaries
- Jellyfin 10.11.x (10.11.2 is the target ABI)
- Linux x86_64 server (binaries compiled for linux-x64)
- ~750 MB free disk space for the combined plugin zip (CPU + CUDA)
- Additional disk space for model files (75 MB for Tiny, up to 3 GB for Large)
- FFmpeg (bundled with Jellyfin, auto-detected)
- NVIDIA GPU (optional, for CUDA acceleration)
- Docker (optional, for building from source)
- Dashboard -> Plugins -> Repositories -> Add
- URL: https://github.com/zakattack02/Whisper-Script/raw/refs/heads/feature/jellyfin-plugin/manifest.json
- Name: Whisper Subtitles
- Save, then Catalog -> Install -> Restart Jellyfin
Download the latest release zip from GitHub Releases, then Dashboard -> Plugins -> Manual Install -> Upload the zip -> Restart.
# Extract into the Jellyfin plugins directory
sudo unzip jellyfin-plugin-whispersubtitles_3.0.0.0.zip \
-d /var/lib/jellyfin/plugins/Whisper\ Subtitles_3.0.0.0/
sudo chown -R jellyfin:jellyfin /var/lib/jellyfin/plugins/Whisper\ Subtitles_3.0.0.0/
sudo systemctl restart jellyfin
On first use, the plugin automatically deploys the whisper binary from the bundle to ~/.cache/whisper-cpp/ and downloads the selected model to ~/.cache/whisper/. When upgrading, clear the old binary cache:
rm -rf /cache/whisper-cpp/ # or ~/.cache/whisper-cpp/
Navigate to Dashboard -> Plugins -> Whisper Subtitles -> Settings.
| Setting | Description | Default |
|---|---|---|
| Whisper Model | Model size (larger = more accurate, slower, more VRAM) | Small |
| Download Model | Pre-download the selected model | Button |
| Target Language | Language code for subtitles (e.g., en, es, fr, de, ja, zh) | en |
| AI Identifier | Tag appended to subtitle filenames (e.g., video.en.whisper.srt) | whisper |
| Setting | Description | Default |
|---|---|---|
| Enable CUDA (NVIDIA GPU) | Use the CUDA GPU binary instead of the CPU binary | Enabled |
| FFprobe Path | Custom path to ffprobe (used for audio duration detection) | Auto-detect |
The config page shows runtime diagnostics: detected GPU type (cuda, vulkan, metal, or none), CUDA binary deployment status, available CPU threads, and cached model count.
| Setting | Description | Default |
|---|---|---|
| Process on Library Scan | Auto-generate subtitles for new media on library scan | Disabled |
| Skip Existing Subtitles | Skip videos that already have subtitle tracks | Enabled |
| Regenerate AI Subtitles | Force-regenerate even if AI-tagged subtitle exists | Disabled |
| Translate to English | Translate non-English audio to English subtitles | Disabled |
| Word-Level Timestamps | More precise timing (2-3x slower processing) | Disabled |
| Show in Main Menu | Toggle the plugin entry in Jellyfin sidebar navigation | Enabled |
| Libraries to Process | Select which media libraries to scan (empty = all) | All |
| Folders to Exclude | Absolute paths to exclude (one per line) | Empty |
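
The interaction between Skip Existing Subtitles and Regenerate AI Subtitles can be sketched as a small decision function. This is an illustrative Python sketch, not the plugin's C# code: `Options` and `should_generate` are hypothetical names, and it assumes "existing subtitles" means sidecar .srt files next to the video (embedded tracks are not modelled).

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass
class Options:
    skip_existing: bool = True    # "Skip Existing Subtitles" (default: Enabled)
    regenerate_ai: bool = False   # "Regenerate AI Subtitles" (default: Disabled)
    identifier: str = "whisper"   # "AI Identifier"
    language: str = "en"          # "Target Language"


def should_generate(video: Path, sidecars: list[str], opts: Options) -> bool:
    """Decide whether to transcribe `video`, given the subtitle files beside it."""
    ai_name = f"{video.stem}.{opts.language}.{opts.identifier}.srt"
    if ai_name in sidecars:
        # An AI-tagged subtitle already exists: redo it only when asked to.
        return opts.regenerate_ai
    # Assumption: any other .srt sharing the video's base name counts as "existing".
    has_other = any(
        name.startswith(video.stem + ".") and name.endswith(".srt")
        for name in sidecars
    )
    return not (opts.skip_existing and has_other)
```

With the defaults, a video that already has either a regular or an AI-tagged subtitle is skipped; turning on `regenerate_ai` re-processes only the videos that carry the AI tag.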
Dashboard -> Scheduled Tasks -> Generate Whisper Subtitles -> Play button to run immediately, or configure a trigger (e.g., daily at 2 AM).
Enable Process on Library Scan in settings. Subtitles are auto-generated for new media items when a library scan completes or new files are detected.
For each video:
- Skip check -- skip if existing subtitles are found (respecting Skip Existing and Regenerate AI settings)
- Audio extraction -- FFmpeg extracts 16 kHz mono WAV
- Duration check -- if audio >30 minutes, split into 30-minute chunks via FFmpeg segment muxer
- Transcription -- each chunk (or the full audio) processed by whisper.cpp (CPU or CUDA)
- SRT merging -- chunk SRTs merged into one with sequential segment numbering
- File tagging -- output saved as video.{lang}.{identifier}.srt next to the video
/Media/Movies/My Movie (2024).mkv
/Media/Movies/My Movie (2024).en.whisper.srt
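
The extraction, chunking, and naming steps above can be approximated with standard FFmpeg options. The Python sketch below only builds the command lines and the output path; the function names and the `chunk_%03d.wav` pattern are assumptions for illustration, and the plugin's actual arguments may differ.

```python
from pathlib import Path

CHUNK_SECONDS = 30 * 60  # 30-minute chunks, as described above


def extract_cmd(video: Path, wav: Path) -> list[str]:
    """FFmpeg call that drops video and writes 16 kHz mono 16-bit PCM WAV."""
    return ["ffmpeg", "-y", "-i", str(video), "-vn",
            "-ac", "1", "-ar", "16000", "-c:a", "pcm_s16le", str(wav)]


def segment_cmd(wav: Path, out_dir: Path) -> list[str]:
    """FFmpeg segment muxer call that cuts the WAV into 30-minute pieces."""
    # The chunk file name pattern is hypothetical.
    return ["ffmpeg", "-y", "-i", str(wav), "-f", "segment",
            "-segment_time", str(CHUNK_SECONDS), "-c", "copy",
            str(out_dir / "chunk_%03d.wav")]


def needs_chunking(duration_seconds: float) -> bool:
    """Only audio longer than 30 minutes is split."""
    return duration_seconds > CHUNK_SECONDS


def subtitle_path(video: Path, lang: str, identifier: str) -> Path:
    """Build video.{lang}.{identifier}.srt next to the video file."""
    return video.with_name(f"{video.stem}.{lang}.{identifier}.srt")
```

Each command list can be passed to `subprocess.run(cmd, check=True)`. Returning lists rather than shell strings avoids quoting problems with file names such as `My Movie (2024).mkv`.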
The scheduled task UI shows percentage progress with per-chunk updates. Check Jellyfin logs for detailed per-step logging:
[INF] Whisper task starting. Model=Small, Language="en", Translate=False, Identifier="whisper"
[INF] Generating: /Media/Movies/My Movie (2024).mkv
[INF] Using CUDA binary at /cache/whisper-cpp/whisper-whisper-cli-cuda
[INF] Subtitles written: My Movie (2024).en.whisper.srt (12345 bytes)
[INF] Task complete. Generated=1, Skipped=0, Errors=0
Models are downloaded from Hugging Face (ggerganov/whisper.cpp) and cached in ~/.cache/whisper/.
| Model | Size | Speed (vs Large) | VRAM | Quality |
|---|---|---|---|---|
| Tiny / Tiny.en | 75 MB | ~10x | ~1 GB | Lowest |
| Base / Base.en | 140 MB | ~7x | ~1 GB | Low |
| Small / Small.en | 460 MB | ~4x | ~2 GB | Recommended |
| Medium / Medium.en | 1.5 GB | ~2x | ~5 GB | High |
| Turbo | 1.6 GB | ~8x | ~6 GB | High (fast) |
| Large (v3) | 3 GB | 1x | ~10 GB | Best |
Small is the default and the recommended starting point. Turbo is nearly as accurate as Large but roughly 8x faster.
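
For reference, the download source and cache location of a model can be derived as in the Python sketch below. The mapping from setting names to file names is an assumption based on the ggml file naming used in the ggerganov/whisper.cpp repository on Hugging Face; the plugin's own mapping may differ, and `model_location` is a hypothetical helper.

```python
from pathlib import Path

BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

# Assumed mapping from the settings names to ggml file names.
MODEL_FILES = {
    "Tiny": "ggml-tiny.bin", "Tiny.en": "ggml-tiny.en.bin",
    "Base": "ggml-base.bin", "Base.en": "ggml-base.en.bin",
    "Small": "ggml-small.bin", "Small.en": "ggml-small.en.bin",
    "Medium": "ggml-medium.bin", "Medium.en": "ggml-medium.en.bin",
    "Turbo": "ggml-large-v3-turbo.bin",
    "Large": "ggml-large-v3.bin",
}


def model_location(name: str,
                   cache_dir: Path = Path.home() / ".cache" / "whisper") -> tuple[str, Path]:
    """Return (download URL, local cache path) for a model name."""
    filename = MODEL_FILES[name]
    return f"{BASE_URL}/{filename}", cache_dir / filename
```

A download step would fetch the URL only when the cache path does not exist yet, which matches the "cached in ~/.cache/whisper/" behaviour described above.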
| Model | RAM per chunk |
|---|---|
| Tiny | ~300 MB |
| Base | ~400 MB |
| Small | ~800 MB |
| Turbo | ~2 GB |
| Medium | ~2 GB |
| Large | ~3.8 GB |
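
The per-chunk figures can be used to check which models fit on a given server. The Python sketch below copies the approximate values from the table above (with ~2 GB taken as 2000 MB); the 512 MB headroom for Jellyfin and FFmpeg is an arbitrary assumption, and `models_that_fit` is a hypothetical helper.

```python
# Approximate RAM per 30-minute chunk in MB, taken from the table above.
RAM_PER_CHUNK_MB = {
    "Tiny": 300, "Base": 400, "Small": 800,
    "Turbo": 2000, "Medium": 2000, "Large": 3800,
}


def models_that_fit(free_ram_mb: int, headroom_mb: int = 512) -> list[str]:
    """List the models whose per-chunk RAM fits within the free memory budget."""
    budget = free_ram_mb - headroom_mb  # leave room for everything else
    return [name for name, need in RAM_PER_CHUNK_MB.items() if need <= budget]
```

For example, `models_that_fit(4096)` returns every model except Large, because 3800 MB exceeds the 3584 MB budget left after the assumed headroom.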
The plugin ships two separate whisper binaries:
| Binary | Purpose | Size |
|---|---|---|
| whisper-whisper-cli | CPU-only (AVX2) | 4.2 MB |
| whisper-whisper-cli-cuda | CUDA GPU | ~1.1 GB |
Three CUDA shared libraries are bundled alongside the CUDA binary: libcudart.so.12 (692 KB), libcublas.so.12 (105 MB), libcublasLt.so.12 (422 MB). The plugin sets LD_LIBRARY_PATH to locate them at runtime. The NVIDIA driver library libcuda.so.1 is NOT bundled -- it comes from the host driver via container GPU passthrough.
- NVIDIA driver installed on the host (verify with nvidia-smi)
- nvidia-container-toolkit installed for Docker:
  sudo apt-get install nvidia-container-toolkit
  sudo systemctl restart docker
- Container started with --gpus all or runtime: nvidia
- User checks "Enable CUDA" in config
- nvidia-smi detects NVIDIA GPU -> GPU type = "cuda"
- CUDA binary is found in cache -> use CUDA binary with -dev 0 flag
- Falls back to CPU binary with -ng flag if any step fails
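
The selection steps above amount to a short fallback chain. The following Python sketch is illustrative only: `choose_backend` and `runtime_env` are hypothetical names, the GPU type is assumed to have been detected beforehand (for example via nvidia-smi), and the plugin itself implements this in C#.

```python
import os
from pathlib import Path


def choose_backend(cuda_enabled: bool, gpu_type: str,
                   cache_dir: Path) -> tuple[Path, list[str]]:
    """Return (binary path, extra flags): CUDA when every check passes, else CPU."""
    cuda_bin = cache_dir / "whisper-whisper-cli-cuda"
    if cuda_enabled and gpu_type == "cuda" and cuda_bin.is_file():
        return cuda_bin, ["-dev", "0"]
    # Any failed check falls back to the CPU binary with GPU use disabled.
    return cache_dir / "whisper-whisper-cli", ["-ng"]


def runtime_env(cache_dir: Path) -> dict[str, str]:
    """Copy the environment and prepend the cache dir to LD_LIBRARY_PATH."""
    env = dict(os.environ)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = (
        f"{cache_dir}{os.pathsep}{existing}" if existing else str(cache_dir)
    )
    return env
```

Prepending the cache directory lets the bundled libcudart, libcublas, and libcublasLt resolve first, while libcuda.so.1 still comes from the host driver.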
| Model | CPU (6 threads) | GPU (RTX 3060) | Speedup |
|---|---|---|---|
| Tiny | ~0.3x realtime | ~40x realtime | ~130x |
| Base | ~0.8x realtime | ~30x realtime | ~37x |
| Small | ~0.4x realtime | ~15x realtime | ~37x |
A 30-minute chunk takes ~23 minutes on CPU with the Base model. On CUDA it takes ~1 minute.
Cause: Binary compiled with AVX-512 instructions running on a CPU without AVX-512 support.
Fix: Rebuild with -DGGML_NATIVE=OFF and -DCMAKE_C_FLAGS="-march=x86-64 -mtune=generic". Fixed in v1.1.1.0+.
Cause: Long audio files loaded entirely into memory exceed available RAM.
Fix: Audio chunking splits files >30 minutes into 30-minute segments. Each chunk stays within ~4 GB even with the Large model. Fixed in v2.0.0.0+.
Cause: Plugin could not locate the ffprobe binary.
Fix: Set the FFprobe Path in Settings, or ensure ffprobe is in the same directory as ffmpeg. Fixed in v2.1.0.0+.
Check: Config page shows "Runtime Hardware Status" (should be "cuda"), CUDA binary is deployed, container has --gpus all, docker exec jellyfin nvidia-smi works.
Cause: Plugin zip was extracted incorrectly or binary cache is stale.
Fix: Click Deploy Runtime Binary on the config page, or delete ~/.cache/whisper-cpp/ and reinstall.
The config page shows binary deployment status, GPU type, cached model count, and available CPU threads. If diagnostics show "unconfigured", trigger the task once or click Deploy Runtime Binary.
- Docker (recommended for GLIBC-compatible builds)
- .NET SDK 9.0
- git, cmake, build-essential
- gh CLI (optional, for publishing releases)
# From the repo root
bash make-release.sh
This builds whisper.cpp in Docker (CPU + CUDA), builds the C# plugin, packages the zip, and optionally publishes to GitHub.
# Docker build (recommended)
bash Jellyfin.Plugin.WhisperSubtitles/Scripts/Build-whisper.sh \
Jellyfin.Plugin.WhisperSubtitles/Jellyfin.Plugin.WhisperSubtitles/bin/whisper/linux-x64/
# Native build (fallback -- may require GLIBC 2.43+)
bash Jellyfin.Plugin.WhisperSubtitles/Scripts/Build-whisper.sh --no-docker \
Jellyfin.Plugin.WhisperSubtitles/Jellyfin.Plugin.WhisperSubtitles/bin/whisper/linux-x64/
dotnet publish --configuration Release \
Jellyfin.Plugin.WhisperSubtitles/Jellyfin.Plugin.WhisperSubtitles/Jellyfin.Plugin.WhisperSubtitles.csproj
The multi-stage Dockerfile (Scripts/Dockerfile.whisper):
- Stage 1 (cpu-builder): Ubuntu 22.04, -DGGML_CUDA=OFF -> whisper-whisper-cli
- Stage 2 (cuda-builder): nvidia/cuda:12.4.1-devel-ubuntu22.04, -DGGML_CUDA=ON, targets GPU architectures 50/60/70/75/80/86/89 -> whisper-whisper-cli-cuda + libcudart/libcublas/libcublasLt
- Stage 3 (output): Copies all artifacts
| Flag | Purpose |
|---|---|
| -DGGML_NATIVE=OFF | Prevents AVX-512 instructions in the binary |
| -DGGML_OPENMP=OFF | Disables OpenMP (not needed, uses own threading) |
| -DGGML_CUDA=ON | Enables CUDA GPU support (CUDA stage only) |
| -DCMAKE_CUDA_ARCHITECTURES | Target GPU architectures (each adds ~150 MB to binary) |
| -DCMAKE_C_FLAGS=-march=x86-64 -mtune=generic | Maximum CPU compatibility |
| -DCMAKE_EXE_LINKER_FLAGS=-static-libgcc -static-libstdc++ | Static GCC/GLIBCXX linkage |
| Version | Changes |
|---|---|
| 3.0.0.0 | Semantic versioning restructure. Dual CPU+GPU binaries with CUDA support. Dockerfile builds both binaries + bundles CUDA .so files. Plugin auto-selects CUDA binary. |
| 2.2.0.0 | Per-video sub-progress reporting. Progress bar updates per chunk instead of freezing per video. |
| 2.1.0.0 | Fixed ffprobe path detection. Added FfprobePath config field + FindFfprobe() fallback chain. |
| 2.0.0.0 | Audio chunking for >30 min videos to prevent OOM on low-RAM servers (4 GB). Chunked processing + SRT merging. |
| 1.1.2.0 | Fixed -ngl 999 -> -dev 0 for GPU path. |
| 1.1.1.0 | Fixed AVX-512 SIGILL: added GGML_NATIVE=OFF to cmake. |
| 1.1.0.0 | Fixed whisper-cli arguments: removed unsupported --output-dir and -vv flags, added -ng for CPU mode. |
| 1.0.0.0 | Initial release. FFmpeg audio extraction, config page, Jellyfin 10.11.2 compatibility. |
Video File
|
v
Audio Extraction (FFmpeg -> 16 kHz mono WAV)
|
v
Duration Check
+-- <=30 min -> Single chunk
+-- >30 min -> Split into 30-min chunks via FFmpeg segment muxer
|
v
whisper.cpp (CPU or CUDA binary)
|
v
Per-chunk SRT files
|
v
Merge SRTs (renumber segments)
|
v
Final SRT -> saved next to video
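
The merge step at the end of this pipeline can be sketched in Python as follows. The source only states that segments are renumbered; shifting each chunk's timestamps by its start offset is an assumption here (every chunk's SRT starts at zero, so some offset is needed), as is the premise that every chunk except the last is exactly 30 minutes long. `merge_srts` is a hypothetical name.

```python
import re
from datetime import timedelta

TIME = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def _shift(stamp: str, offset: timedelta) -> str:
    """Add `offset` to an SRT timestamp of the form HH:MM:SS,mmm."""
    h, m, s, ms = map(int, TIME.fullmatch(stamp).groups())
    total = timedelta(hours=h, minutes=m, seconds=s, milliseconds=ms) + offset
    total_ms = total // timedelta(milliseconds=1)
    h, rem = divmod(total_ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def merge_srts(chunks: list[str], chunk_seconds: int = 1800) -> str:
    """Concatenate per-chunk SRT texts, shifting times and renumbering cues."""
    out = []
    counter = 1
    for index, text in enumerate(chunks):
        offset = timedelta(seconds=index * chunk_seconds)
        for block in re.split(r"\n\s*\n", text.strip()):
            lines = block.splitlines()
            if len(lines) < 2 or "-->" not in lines[1]:
                continue  # skip empty or malformed cues
            start, end = (part.strip() for part in lines[1].split("-->"))
            timing = f"{_shift(start, offset)} --> {_shift(end, offset)}"
            out.append("\n".join([str(counter), timing, *lines[2:]]))
            counter += 1
    return "\n\n".join(out) + "\n"
```

A cue at 00:00:00,500 in the second chunk ends up at 00:30:00,500 in the merged file, numbered after the last cue of the first chunk.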
~/.cache/
whisper/ -- Model files (ggml-*.bin)
whisper-cpp/ -- Binaries and CUDA .so files
whisper-whisper-cli
whisper-whisper-cli-cuda
libcudart.so.12
libcublas.so.12
libcublasLt.so.12
MIT License. This plugin uses whisper.cpp by Georgi Gerganov and OpenAI Whisper.