SD-orb is a high-performance, real-time AI VJ orchestrator. It combines the power of Stable Diffusion (via NVIDIA TensorRT) with audio-reactive feedback loops to create immersive, recursive visuals that respond to live music.
- Real-time AI Generation: Stable Diffusion 1.5 + LCM (Latent Consistency Model) compiled to NVIDIA TensorRT 10.x for one-step inference per frame.
- Accurate Audio Reactivity:
AudioAnalyzerwith a Hann-windowed FFT, musical EQ bands (Bass 20–250 Hz / Mids 250–2000 Hz / Highs 2–8 kHz), per-band auto-gain, and an asymmetric envelope follower (fast attack, slow release) so transients punch through. - GPU Feedback Engine:
Visualizerdoes its polar warp on the GPU withF.grid_sample— zoom, rotation and decay all happen on resident CUDA tensors, no CPU round-trip. - Image-Space Recursion: The AI step blends in image space (
output = lerp(warped_input, ai_image, strength)), so geometric warps survive the denoiser instead of getting redrawn back to centre. - 🎚️ Hot-Swappable Model Bank: Multiple SD 1.5 engines (RealisticVision, DreamShaper, Deliberate, MeinaMix, …) can be compiled and selected at runtime via a UI dropdown. Engine swap takes ~1–2 s with atomic restore-on-failure, so a bad load can't crash the loop.
- 🌀 Deep Dream: Local Ollama LLM generates an evolving surreal narrative — each new prompt continues the previous one as a single dream. Picker for any installed model, temperature slider, keyword steering (Influences), history reset.
- 🎥 Recording: One-click capture of the canvas to MP4. Video uses NVENC (zero CUDA contention with SD); audio is captured in parallel from PulseAudio's system-output monitor (or mic, configurable) at full rate and muxed at stop. Files land in
recordings/. - 🎛️ MIDI Learn: Bind any hardware knob, fader or pad to any UI control. Mappings persist in
midi_mappings.jsonacross restarts. CC for sliders, Note for buttons/toggles. - Interactive UI:
DearPyGui, with a scalable display so the rendered canvas can be shown at 2× or more without re-rendering at higher resolution.
- GPU: NVIDIA RTX 30-series or 40-series (8 GB+ VRAM). NVENC required for the in-app recorder.
- Driver: NVIDIA Driver 535+
- CUDA: 12.x • TensorRT: 10.x
- Audio: PulseAudio or PipeWire (for the recording feature). PyAudio for the visualizer's analyzer works on ALSA.
- (Optional) Ollama: Required only for Deep Dream mode. Any installed instruction-tuned model works;
llama3.1:8bis a fast default. - (Optional) MIDI Controller: Any class-compliant USB MIDI device — picked up via mido/rtmidi.
git clone https://github.com/xaymup/SD-orb.git
cd SD-orbpython -m venv venv
source venv/bin/activatepip install -r requirements.txtPlace SD 1.5 checkpoints in models/. Recommended starter:
Realistic Vision V6.0 B1
For more variety, see the Model Bank section below.
Engine is shape-baked at the resolution in main.py (default 384x384). Takes 10–20 min the first time.
python builder.py <model_name><model_name>looks upmodels/<model_name>.safetensorsand writesengines/<model_name>/unet.engine.- Run with no argument to build the legacy default (
realisticVisionV60B1_v51HyperVAE).
ollama serve &
ollama pull llama3.1:8bpython main.pyThe viewport title shows live FPS, audio bands, the actual zoom/rot being applied, and warp/AI diff diagnostics:
25.9 fps | bass 0.74 mid 0.62 hi 0.47 | zoom 0.877 rot +0.224 | warp_diff 0.16 ai_diff 0.08
| Section | Control | Notes |
|---|---|---|
| SD Model | Dropdown | Hot-swap between any compiled engine in engines/. Status line confirms active model. |
| Prompt Playlist | Add / edit / shuffle, Auto Switch on interval | Manual or auto cycling |
| AI | Strength | 0 = pure warp passthrough · 0.3 default · 1 = AI fully redraws each frame |
| AI | Temporal Smooth | Image-space blend with previous output frame |
| AI | Bypass AI | Skip the diffusion step entirely — useful for tuning the warp in isolation |
| Feedback Engine | Zoom Base / React | Constant zoom + bass-driven zoom-in pulse |
| Feedback Engine | Rotate Base / React | Constant rotation + mids-driven rotation |
| Feedback Engine | Audio Smooth | Release time of the envelope follower |
| Recording | Start / Stop, Audio source | NVENC video + PulseAudio capture → MP4 in recordings/. Pick system (loopback) or mic. |
| MIDI Learn | Port, Control, Listen | Pick port + target control, hit Listen, touch your MIDI device |
| Deep Dream | Enable | Switches the auto-prompt cycle to Ollama-generated dreams |
| Deep Dream | Model / Temp / Reset | Pick model, push temperature up for weirder dreams, reset history |
| Deep Dream | Influences | Comma-separated keywords steer the dream softly toward themes |
For a clearly perceptible "dive into the image" effect with audio reactivity:
- AI Strength: 0.3
- Temporal Smooth: 0.4
- Zoom React: 1.5–2.0
- Rotate React: 1.0–1.5
The runtime hot-swaps between any engines in engines/. All engines share the canonical SD 1.5 VAE (stabilityai/sd-vae-ft-mse), the same text encoder, and the same LCM scheduler — only the UNet weights differ.
To add a model:
- Drop the
.safetensorsinmodels/with a clean name (e.g.models/dreamshaper8.safetensors). - Build the engine:
python builder.py dreamshaper8
- Restart
main.py— it appears in the SD Model dropdown.
Engines live at engines/<name>/unet.engine. The flat layout (engines/<name>.engine) also works for one-off engines.
| Model | Vibe | Best for |
|---|---|---|
| RealisticVision V6 | Cinematic photoreal portraits | "cinematic portrait" / realistic scenes |
| DreamShaper 8 | Versatile painterly-photoreal | Fantasy, sci-fi, abstract — all-rounder |
| Deliberate v11 | Strongly painted / illustrated | Best for recursive feedback (brushstrokes trail beautifully on bass kicks) |
| MeinaMix V8 | Anime, saturated, high-contrast | Anime / "trippy" sets, holds up under aggressive warp |
Many other SD 1.5 finetunes work the same way (AbsoluteReality, EpicRealism, CyberRealistic, Counterfeit, etc.).
On an RTX 4080 at 384×384:
- ~22–26 fps end-to-end (including full-VAE encode + decode and image-space blending)
- Recording active: ~0 fps drop (NVENC uses the dedicated video engine)
- Deep Dream prompt generation: ~1–3 s per dream, runs in a background thread; never blocks the render loop
- Engine swap: ~1–2 s (releases old engine atomically, falls back if the new one fails to load)
512² or higher requires an engine rebuild and will roughly halve the framerate.
main.py Entry point, UI, main render loop
pipeline.py SD1.5 + LCM TRT pipeline (image-space strength/temporal blend,
atomic engine swap, canonical SD 1.5 VAE)
visualizer.py Polar warp + decay, all on GPU via F.grid_sample
audio_analyzer.py FFT bands with auto-gain + asymmetric envelope
dreamer.py Background Ollama client for evolving dream prompts + keyword steering
recorder.py NVENC video + PulseAudio capture, async mux on stop
midi.py mido-based MIDI input with learn mode and JSON-persisted mappings
builder.py TensorRT engine compiler (gitignored) — accepts <model_name> arg
models/ (gitignored) .safetensors checkpoints
engines/ (gitignored) Compiled TensorRT engines (per-model subdirs)
recordings/ (gitignored) Output MP4 files
midi_mappings.json (gitignored) Persisted MIDI bindings
MIT — see LICENSE.
- StreamDiffusion for acceleration patterns and the TRT compile path.
- HuggingFace Diffusers.
- Ollama for the local LLM runtime that powers Deep Dream.
- DearPyGui.
- mido + python-rtmidi for MIDI input.
