SD-orb

SD-orb is a high-performance, real-time AI VJ orchestrator. It combines the power of Stable Diffusion (via NVIDIA TensorRT) with audio-reactive feedback loops to create immersive, recursive visuals that respond to live music.

🚀 Key Features

Real-time AI Generation: Stable Diffusion 1.5 + LCM (Latent Consistency Model) compiled to NVIDIA TensorRT 10.x for one-step inference per frame.
Accurate Audio Reactivity: AudioAnalyzer with a Hann-windowed FFT, musical EQ bands (Bass 20–250 Hz / Mids 250–2000 Hz / Highs 2–8 kHz), per-band auto-gain, and an asymmetric envelope follower (fast attack, slow release) so transients punch through.
GPU Feedback Engine: Visualizer does its polar warp on the GPU with F.grid_sample — zoom, rotation and decay all happen on resident CUDA tensors, no CPU round-trip.
Image-Space Recursion: The AI step blends in image space (output = lerp(warped_input, ai_image, strength)), so geometric warps survive the denoiser instead of getting redrawn back to centre.
🎚️ Hot-Swappable Model Bank: Multiple SD 1.5 engines (RealisticVision, DreamShaper, Deliberate, MeinaMix, …) can be compiled and selected at runtime via a UI dropdown. Engine swap takes ~1–2 s with atomic restore-on-failure, so a bad load can't crash the loop.
🌀 Deep Dream: Local Ollama LLM generates an evolving surreal narrative — each new prompt continues the previous one as a single dream. Picker for any installed model, temperature slider, keyword steering (Influences), history reset.
🎥 Recording: One-click capture of the canvas to MP4. Video uses NVENC (zero CUDA contention with SD); audio is captured in parallel from PulseAudio's system-output monitor (or mic, configurable) at full rate and muxed at stop. Files land in recordings/.
🎛️ MIDI Learn: Bind any hardware knob, fader or pad to any UI control. Mappings persist in midi_mappings.json across restarts. CC for sliders, Note for buttons/toggles.
Interactive UI: DearPyGui, with a scalable display so the rendered canvas can be shown at 2× or more without re-rendering at higher resolution.
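
As a rough illustration of the audio path described above, the sketch below maps the three EQ bands to FFT bin ranges and applies an asymmetric envelope follower. It is not the project's AudioAnalyzer: the function and class names, the coefficient values, and the FFT size are assumptions chosen for the example.

```python
import math


def band_bins(lo_hz, hi_hz, n_fft, sample_rate):
    """Return the (start, stop) FFT bin range covering [lo_hz, hi_hz)."""
    hz_per_bin = sample_rate / n_fft
    start = int(math.ceil(lo_hz / hz_per_bin))
    stop = int(math.ceil(hi_hz / hz_per_bin))
    return start, stop


class EnvelopeFollower:
    """One-pole follower with separate attack and release coefficients.

    The coefficient values are illustrative, not taken from SD-orb.
    """

    def __init__(self, attack=0.6, release=0.05):
        self.attack = attack    # fraction of the gap closed per frame while rising
        self.release = release  # fraction of the gap closed per frame while falling
        self.value = 0.0

    def update(self, level):
        # Rise quickly toward a louder input, decay slowly toward a quieter one.
        coeff = self.attack if level > self.value else self.release
        self.value += coeff * (level - self.value)
        return self.value


if __name__ == "__main__":
    # Band edges from the feature list; n_fft and sample rate are assumed.
    for name, lo, hi in [("bass", 20, 250), ("mids", 250, 2000), ("highs", 2000, 8000)]:
        print(name, band_bins(lo, hi, n_fft=1024, sample_rate=48000))
```

A large attack coefficient lets a kick drum register within a frame or two, while the small release coefficient keeps the visual response from flickering between hits.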

💻 Hardware Requirements

GPU: NVIDIA RTX 30-series or 40-series (8 GB+ VRAM). NVENC required for the in-app recorder.
Driver: NVIDIA Driver 535+
CUDA: 12.x • TensorRT: 10.x
Audio: PulseAudio or PipeWire (needed for the recording feature). The visualizer's analyzer uses PyAudio, which works on ALSA.
(Optional) Ollama: Required only for Deep Dream mode. Any installed instruction-tuned model works; llama3.1:8b is a fast default.
(Optional) MIDI Controller: Any class-compliant USB MIDI device — picked up via mido/rtmidi.

🛠️ Installation

1. Clone the Repository

git clone https://github.com/xaymup/SD-orb.git
cd SD-orb

2. Set Up Virtual Environment

python -m venv venv
source venv/bin/activate

3. Install Dependencies

pip install -r requirements.txt

4. Download Models

Place SD 1.5 checkpoints in models/. Recommended starter: Realistic Vision V6.0 B1

For more variety, see the Model Bank section below.

5. Build TensorRT Engine

The engine is built for a fixed input shape: the resolution set in main.py (default 384×384). The first build takes 10–20 min.

python builder.py <model_name>

<model_name> looks up models/<model_name>.safetensors and writes engines/<model_name>/unet.engine.
Run with no argument to build the legacy default (realisticVisionV60B1_v51HyperVAE).
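
The name-to-path convention above can be sketched as follows. This is only an illustration of the mapping, not the code in builder.py; the function name `engine_paths` and the `root` parameter are assumptions.

```python
from pathlib import Path

# Default model name quoted in the README for a no-argument build.
LEGACY_DEFAULT = "realisticVisionV60B1_v51HyperVAE"


def engine_paths(model_name=None, root="."):
    """Return (checkpoint_path, engine_path) for a model name.

    Hypothetical helper: it mirrors the documented layout only.
    """
    name = model_name or LEGACY_DEFAULT
    root = Path(root)
    checkpoint = root / "models" / f"{name}.safetensors"
    engine = root / "engines" / name / "unet.engine"
    return checkpoint, engine


if __name__ == "__main__":
    ckpt, eng = engine_paths("dreamshaper8")
    print(ckpt.as_posix(), "->", eng.as_posix())
```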

6. (Optional) Start Ollama for Deep Dream

ollama serve &
ollama pull llama3.1:8b
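
Once the server is running, a Deep Dream style request could look roughly like the sketch below. It uses Ollama's documented `/api/generate` endpoint on the default port; the prompt wording and the helper names `build_dream_request` and `next_dream` are assumptions, and dreamer.py may structure its requests differently.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_dream_request(previous_dream, influences, model="llama3.1:8b", temperature=0.9):
    """Build a non-streaming generate payload that continues the previous dream.

    The prompt text is illustrative; only the payload keys follow Ollama's API.
    """
    prompt = "Continue this dream as one short image prompt.\n"
    if previous_dream:
        prompt += f"Previous scene: {previous_dream}\n"
    if influences:
        prompt += "Gently lean toward: " + ", ".join(influences) + "\n"
    return {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"temperature": temperature},
    }


def next_dream(payload, url=OLLAMA_URL, timeout=30):
    """POST the payload to a running Ollama server and return the generated text."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"].strip()
```

Feeding each result back in as `previous_dream` is what makes successive prompts read as one continuing narrative; running the call on a background thread keeps it off the render loop.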

🎮 Usage

python main.py

The viewport title shows live FPS, audio bands, the actual zoom/rot being applied, and warp/AI diff diagnostics:

25.9 fps | bass 0.74 mid 0.62 hi 0.47 | zoom 0.877 rot +0.224 | warp_diff 0.16 ai_diff 0.08

Controls

| Section | Control | Notes |
| --- | --- | --- |
| SD Model | Dropdown | Hot-swap between any compiled engine in `engines/`. Status line confirms active model. |
| Prompt Playlist | Add / edit / shuffle, Auto Switch on interval | Manual or auto cycling |
| AI | Strength | 0 = pure warp passthrough · 0.3 default · 1 = AI fully redraws each frame |
| AI | Temporal Smooth | Image-space blend with previous output frame |
| AI | Bypass AI | Skip the diffusion step entirely; useful for tuning the warp in isolation |
| Feedback Engine | Zoom Base / React | Constant zoom + bass-driven zoom-in pulse |
| Feedback Engine | Rotate Base / React | Constant rotation + mids-driven rotation |
| Feedback Engine | Audio Smooth | Release time of the envelope follower |
| Recording | Start / Stop, Audio source | NVENC video + PulseAudio capture → MP4 in `recordings/`. Pick `system` (loopback) or `mic`. |
| MIDI Learn | Port, Control, Listen | Pick port + target control, hit Listen, touch your MIDI device |
| Deep Dream | Enable | Switches the auto-prompt cycle to Ollama-generated dreams |
| Deep Dream | Model / Temp / Reset | Pick model, push temperature up for weirder dreams, reset history |
| Deep Dream | Influences | Comma-separated keywords steer the dream softly toward themes |
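
To make the Strength and Temporal Smooth controls concrete, here is a minimal per-pixel sketch of the blend. The first step follows the formula given in the feature list, output = lerp(warped_input, ai_image, strength). The temporal step is an assumption about how the previous frame is mixed in, and `blend_frame` is a hypothetical name operating on flat lists of floats rather than CUDA tensors.

```python
def lerp(a, b, t):
    """Linear interpolation: returns a at t=0 and b at t=1."""
    return a + (b - a) * t


def blend_frame(warped_input, ai_image, previous_output=None,
                strength=0.3, temporal_smooth=0.0):
    """Blend one frame in image space (pixels as flat lists of floats)."""
    # Strength: 0 keeps the warped feedback frame, 1 takes the AI image outright.
    out = [lerp(w, a, strength) for w, a in zip(warped_input, ai_image)]
    # Temporal smoothing (assumed form): pull the result toward the previous output.
    if previous_output is not None and temporal_smooth > 0.0:
        out = [lerp(o, p, temporal_smooth) for o, p in zip(out, previous_output)]
    return out


if __name__ == "__main__":
    print(blend_frame([0.0, 1.0], [1.0, 0.0], strength=0.3))
```

Because most of each output pixel comes from the warped input at low strength, zoom and rotation accumulate from frame to frame instead of being redrawn away.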

Recommended Starting Settings

For a clearly perceptible "dive into the image" effect with audio reactivity:

AI Strength: 0.3
Temporal Smooth: 0.4
Zoom React: 1.5–2.0
Rotate React: 1.0–1.5
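
For intuition about what the zoom and rotation values do to each pixel, the sketch below computes where an output pixel samples from in normalized [-1, 1] coordinates. A sampling grid built from such coordinates is what a call like F.grid_sample consumes. The way Base and React combine in `audio_zoom`, including the `depth` scale factor, is purely an assumption for illustration, and visualizer.py may differ.

```python
import math


def source_coord(x, y, zoom, rot):
    """Map an output pixel (x, y) to the source position it samples from.

    Works in polar form: scale the radius by zoom, offset the angle by rot.
    A zoom below 1 samples a smaller central region, which reads as zooming in.
    """
    r = math.hypot(x, y)
    theta = math.atan2(y, x)
    r_src = r * zoom
    theta_src = theta + rot
    return r_src * math.cos(theta_src), r_src * math.sin(theta_src)


def audio_zoom(zoom_base, zoom_react, bass, depth=0.1):
    """Hypothetical combination: constant zoom minus a bass-driven pulse."""
    return zoom_base - zoom_react * bass * depth


if __name__ == "__main__":
    z = audio_zoom(1.0, 1.5, bass=0.74)
    print(z, source_coord(1.0, 0.0, z, rot=0.2))
```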

🎚️ Model Bank

The runtime hot-swaps between any engines in engines/. All engines share the canonical SD 1.5 VAE (stabilityai/sd-vae-ft-mse), the same text encoder, and the same LCM scheduler — only the UNet weights differ.

To add a model:

1. Drop the .safetensors in models/ with a clean name (e.g. models/dreamshaper8.safetensors).
2. Build the engine:
```
python builder.py dreamshaper8
```
3. Restart main.py. The new model appears in the SD Model dropdown.

Engines live at engines/<name>/unet.engine. The flat layout (engines/<name>.engine) also works for one-off engines.
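
A dropdown like this could be populated by scanning engines/ for both layouts. The sketch below is one way to do that under the stated directory conventions; `discover_engines` is a hypothetical name, and letting the per-model subdirectory win over a flat file of the same name is an assumption.

```python
from pathlib import Path


def discover_engines(engines_dir="engines"):
    """Return {model_name: engine_path} for both supported layouts."""
    root = Path(engines_dir)
    found = {}
    if not root.is_dir():
        return found
    # Flat layout: engines/<name>.engine
    for entry in root.glob("*.engine"):
        if entry.is_file():
            found[entry.stem] = entry
    # Per-model layout: engines/<name>/unet.engine (overrides a flat file)
    for entry in root.iterdir():
        candidate = entry / "unet.engine"
        if entry.is_dir() and candidate.is_file():
            found[entry.name] = candidate
    return dict(sorted(found.items()))


if __name__ == "__main__":
    for name, path in discover_engines().items():
        print(name, "->", path)
```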

Aesthetic guide

| Model | Vibe | Best for |
| --- | --- | --- |
| RealisticVision V6 | Cinematic photoreal portraits | "cinematic portrait" / realistic scenes |
| DreamShaper 8 | Versatile painterly-photoreal | Fantasy, sci-fi, abstract; an all-rounder |
| Deliberate v11 | Strongly painted / illustrated | Best for recursive feedback (brushstrokes trail beautifully on bass kicks) |
| MeinaMix V8 | Anime, saturated, high-contrast | Anime / "trippy" sets, holds up under aggressive warp |

Many other SD 1.5 finetunes work the same way (AbsoluteReality, EpicRealism, CyberRealistic, Counterfeit, etc.).

📊 Performance Notes

On an RTX 4080 at 384×384:

~22–26 fps end-to-end (including full-VAE encode + decode and image-space blending)
Recording active: ~0 fps drop (NVENC uses the dedicated video engine)
Deep Dream prompt generation: ~1–3 s per dream, runs in a background thread; never blocks the render loop
Engine swap: ~1–2 s (releases old engine atomically, falls back if the new one fails to load)

512×512 (about 1.8× the pixels of 384×384) or higher requires an engine rebuild and will roughly halve the framerate.
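
The restore-on-failure behaviour of an engine swap can be sketched as below. This illustrates the pattern only and is not the code in pipeline.py: `EngineSlot` and the `loader` callable are hypothetical, and the sketch assumes the old engine must be released before the new one is loaded.

```python
class EngineSlot:
    """Holds the active engine and swaps it with fallback to the previous one."""

    def __init__(self, loader, name):
        self._loader = loader  # hypothetical callable: model name -> engine object
        self.name = name
        self.engine = loader(name)

    def swap(self, new_name):
        """Try to activate new_name; on failure reload the previous engine."""
        old_name = self.name
        self.engine = None  # drop the old engine first so its memory can be freed
        try:
            self.engine = self._loader(new_name)
            self.name = new_name
            return True
        except Exception:
            # A bad load must not leave the render loop without an engine.
            self.engine = self._loader(old_name)
            self.name = old_name
            return False


if __name__ == "__main__":
    def fake_loader(name):
        if name == "broken":
            raise RuntimeError("load failed")
        return f"engine:{name}"

    slot = EngineSlot(fake_loader, "realisticVision")
    print(slot.swap("dreamshaper8"), slot.name)
    print(slot.swap("broken"), slot.name)
```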

📂 Project Structure

main.py            Entry point, UI, main render loop
pipeline.py        SD1.5 + LCM TRT pipeline (image-space strength/temporal blend,
                   atomic engine swap, canonical SD 1.5 VAE)
visualizer.py      Polar warp + decay, all on GPU via F.grid_sample
audio_analyzer.py  FFT bands with auto-gain + asymmetric envelope
dreamer.py         Background Ollama client for evolving dream prompts + keyword steering
recorder.py        NVENC video + PulseAudio capture, async mux on stop
midi.py            mido-based MIDI input with learn mode and JSON-persisted mappings
builder.py         TensorRT engine compiler (gitignored) — accepts <model_name> arg
models/            (gitignored) .safetensors checkpoints
engines/           (gitignored) Compiled TensorRT engines (per-model subdirs)
recordings/        (gitignored) Output MP4 files
midi_mappings.json (gitignored) Persisted MIDI bindings

📄 License

MIT — see LICENSE.

🙏 Acknowledgments

StreamDiffusion for acceleration patterns and the TRT compile path.
HuggingFace Diffusers.
Ollama for the local LLM runtime that powers Deep Dream.
DearPyGui.
mido + python-rtmidi for MIDI input.
