Automated load testing harness for Daydream Scope cloud inference on the Livepeer orchestrator network.
This harness continuously validates that Scope's cloud inference pipeline works correctly across all Livepeer orchestrators. It:
- Runs real inference sessions through Scope's HTTP API — connect to cloud, load pipelines, stream video, validate output frames
- Covers all major pipelines — longlive, ltx2, chained graphs (longlive+rife, depth+longlive+rife)
- Tests all modes — text-to-video (t2v), video-to-video (v2v), image-to-video (i2v) at short (1m), mid (5m), and long (15m) durations
- Rotates fairly across orchestrators — discovers all available orchestrators, distributes test load evenly, tracks coverage
- Detects regressions — rolling 7-day baselines for latency and FPS, alerts on >20% degradation
- Reports to Grafana — real-time dashboards via Prometheus push gateway
```
┌─────────────────────── Docker Compose ────────────────────────┐
│                                                               │
│  scope-1 (daydream-scope :8001) ─┐                            │
│  scope-2 (daydream-scope :8002) ─┤── Livepeer Orchestrators   │
│                                  │                            │
│  loadtest-harness ───────────────┘                            │
│       │                                                       │
│       └── pushgateway (:9091) ──── Grafana                    │
└───────────────────────────────────────────────────────────────┘
```
The harness is a lightweight Python service (~6 dependencies, no ML/GPU needed) that drives unmodified Scope instances via their HTTP API. Each Scope instance connects to a Livepeer orchestrator for remote GPU inference.
```bash
# 1. Clone
git clone https://github.com/livepeer/scope-load-testing.git
cd scope-load-testing

# 2. Configure
cp .env.example .env
# Edit .env with your Livepeer and Scope credentials

# 3. Generate test videos (requires opencv-python)
pip install opencv-python
python scripts/generate_test_videos.py

# 4. Start
docker compose up -d
```

The scheduler starts automatically, discovers orchestrators, and begins running test scenarios.
```bash
# Run a single scenario manually
python -m loadtest.cli run --scenario longlive_t2v_5m --scope-url http://localhost:8001

# Start the scheduler daemon
python -m loadtest.cli schedule

# List discovered orchestrators
python -m loadtest.cli discover

# Show today's test coverage
python -m loadtest.cli coverage

# Show performance baselines
python -m loadtest.cli baselines
```

All configuration is in `config/default.yaml`:
```yaml
budget:
  daily_percent: 20          # % of 24hrs each orchestrator is under test
  max_run_duration_mins: 30  # hard cap per run
  min_run_gap_mins: 15       # cooldown between batches

scenarios:
  - pipeline: longlive
    modes: [t2v, v2v, i2v]
    durations: [1, 5, 15]
    prompts_pool: nature
  # ...
```

Adding a new pipeline requires only a config change — add an entry to the `scenarios` list. No code modifications needed.
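As an illustration of how the matrix expands, here is a minimal sketch; the `Scenario` dataclass and `expand_scenarios` helper are hypothetical names, not the actual `scenarios.py` internals:

```python
from dataclasses import dataclass
from itertools import product

@dataclass(frozen=True)
class Scenario:
    pipeline: str
    mode: str           # t2v | v2v | i2v
    duration_mins: int
    prompts_pool: str

    @property
    def name(self) -> str:
        # e.g. "longlive_t2v_5m", the ID used by `loadtest.cli run --scenario`
        return f"{self.pipeline}_{self.mode}_{self.duration_mins}m"

def expand_scenarios(config: dict) -> list[Scenario]:
    """Expand each config entry into its pipeline x mode x duration matrix."""
    matrix = []
    for entry in config["scenarios"]:
        for mode, duration in product(entry["modes"], entry["durations"]):
            matrix.append(Scenario(entry["pipeline"], mode, duration,
                                   entry.get("prompts_pool", "default")))
    return matrix
```

The entry above expands to 9 scenarios (3 modes × 3 durations), which is why adding a pipeline is purely a config change.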
The scheduler:
- Discovers all healthy Livepeer orchestrators (refreshes every 4 hours)
- Calculates a daily test budget per orchestrator (default: 20% of 24 hrs ≈ 4.8 hrs, or ~10 runs/day at the 30-minute cap)
- Assigns scenarios using a test-debt priority queue — orchestrators with the least coverage get tested first (see the sketch after this list)
- Runs scenarios concurrently across available Scope instances (1 per orchestrator at a time)
- After 5 consecutive failures, an orchestrator is blacklisted for the day
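A minimal sketch of that test-debt rotation, assuming coverage is tracked as minutes tested per orchestrator per day (all names here are illustrative):

```python
import heapq

def plan_next_runs(orchestrators: list[str],
                   coverage: dict[str, int],
                   blacklist: set[str],
                   batch_size: int = 2) -> list[str]:
    """Pick the orchestrators with the most test debt (least coverage today).

    `coverage` maps orchestrator ID -> minutes of testing already done today;
    orchestrators with 5 consecutive failures sit in `blacklist` until midnight.
    """
    # Min-heap keyed on coverage: the least-tested orchestrator pops first.
    heap = [(coverage.get(o, 0), o) for o in orchestrators if o not in blacklist]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[1] for _ in range(min(batch_size, len(heap)))]
```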
Each test run follows this lifecycle:
- Connect — `POST /api/v1/cloud/connect` → poll until connected (timeout: 120s); see the sketch after this list
- Load — `POST /api/v1/pipeline/load` → poll until loaded (timeout: 300s)
- Stream — `POST /api/v1/session/start` → monitoring loop:
  - Capture frames, validate they're not black/corrupt
  - Switch prompts, verify output changes (pixel diff)
  - Track FPS and VRAM usage
  - Detect stalls (fps=0 for >10s)
- Cleanup — stop session, disconnect, capture logs on failure
- Report — classify errors, push metrics, update baselines
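As an example of the Connect step, here is a hedged sketch of connect-then-poll using `httpx`; the connect endpoint is the one listed above, while the status endpoint and response shape are assumptions for illustration:

```python
import asyncio
import time

import httpx

async def connect_and_wait(scope_url: str, timeout_s: float = 120.0) -> None:
    """POST /api/v1/cloud/connect, then poll until the instance reports connected."""
    async with httpx.AsyncClient(base_url=scope_url) as client:
        (await client.post("/api/v1/cloud/connect")).raise_for_status()
        deadline = time.monotonic() + timeout_s
        while time.monotonic() < deadline:
            # NOTE: /api/v1/cloud/status and the "connected" field are assumed
            # for this sketch; the real Scope API may differ.
            status = (await client.get("/api/v1/cloud/status")).json()
            if status.get("connected"):
                return
            await asyncio.sleep(2)
        raise TimeoutError(f"{scope_url} not connected after {timeout_s}s")
```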
Each run's output is validated on several axes (the first two are sketched after this list):
- Frame quality — JPEG decode, dimension check, black-frame detection (pixel variance)
- Prompt sensitivity — mean pixel difference before/after a prompt switch must exceed a threshold
- VRAM leak — compare first-quarter vs last-quarter VRAM in mid/long sessions
- Stall detection — `fps_out=0` for >10s triggers failure
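A minimal sketch of the first two checks using Pillow (which `validators.py` is built on); the thresholds and function names are illustrative, not the harness defaults:

```python
from PIL import Image, ImageChops, ImageStat

BLACK_FRAME_VAR = 5.0   # illustrative thresholds, not the harness defaults
PROMPT_DIFF_MEAN = 10.0

def is_black_frame(frame: Image.Image) -> bool:
    """A frame is 'black' when pixel variance is near zero in every band."""
    return all(v < BLACK_FRAME_VAR for v in ImageStat.Stat(frame).var)

def prompt_changed_output(before: Image.Image, after: Image.Image) -> bool:
    """Mean absolute pixel difference across bands must exceed the threshold."""
    diff = ImageChops.difference(before.convert("RGB"), after.convert("RGB"))
    mean_diff = sum(ImageStat.Stat(diff).mean) / 3
    return mean_diff > PROMPT_DIFF_MEAN
```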
Every failure is classified as: network (timeout, disconnect), orchestrator (502/503, capacity), runner (OOM, CUDA, pipeline crash), or protocol (bad response). Logs are captured on failure for post-mortem.
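For illustration, that classification might look roughly like the following sketch, assuming `httpx` exceptions; it is not the actual `results.py` taxonomy logic:

```python
import httpx

def classify_error(exc: Exception) -> str:
    """Map an exception onto the four error categories (illustrative)."""
    if isinstance(exc, (httpx.TimeoutException, httpx.NetworkError)):
        return "network"        # timeouts, disconnects
    if isinstance(exc, httpx.HTTPStatusError):
        code = exc.response.status_code
        return "orchestrator" if code in (502, 503) else "protocol"
    text = str(exc).lower()
    if "cuda" in text or "out of memory" in text:
        return "runner"         # OOM, CUDA, pipeline crash
    return "protocol"           # anything else: bad response
```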
Regression detection:
- 7-day rolling P50/P95 baselines per scenario
- Flags >20% degradation in first-frame latency, FPS, or pipeline load time
- Cold start frequency tracking per orchestrator
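The baseline math itself is small enough to sketch with the standard library (names mirror the list above but are illustrative):

```python
from statistics import quantiles

DRIFT_THRESHOLD = 0.20  # flag >20% degradation vs the rolling baseline

def p50_p95(samples: list[float]) -> tuple[float, float]:
    """Percentiles over the 7-day window (needs at least 2 samples)."""
    qs = quantiles(samples, n=100)   # 99 cut points: qs[49]=P50, qs[94]=P95
    return qs[49], qs[94]

def drifted(current: float, baseline: float, higher_is_worse: bool = True) -> bool:
    """True when `current` degrades >20% from the baseline."""
    if higher_is_worse:              # e.g. first-frame latency, load time
        return current > baseline * (1 + DRIFT_THRESHOLD)
    return current < baseline * (1 - DRIFT_THRESHOLD)   # e.g. FPS
```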
Import `dashboards/grafana/scope-loadtest.json` into Grafana. Six panels:
- Overview — total runs, pass rate, orchestrator coverage
- Per-Orchestrator — table with success rate, connect time, FPS, budget progress
- Per-Pipeline — load time, first-frame latency, steady FPS by pipeline/mode
- Latency Trends — 7-day P50/P95 with drift overlay
- Error Breakdown — failures by category over time
- Budget & Coverage — daily budget consumed per orchestrator
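These panels are fed through the pushgateway shown in the architecture diagram. A minimal sketch of a per-run push with `prometheus_client`; the metric names and the `RunMetrics` shape are illustrative:

```python
from dataclasses import dataclass

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway

@dataclass
class RunMetrics:           # illustrative shape, not the harness's RunResult
    orchestrator: str
    pipeline: str
    mode: str
    steady_fps: float
    first_frame_s: float

def push_run_metrics(run: RunMetrics, gateway: str = "localhost:9091") -> None:
    """Push one run's gauges to the pushgateway; metric names are illustrative."""
    registry = CollectorRegistry()
    labels = ["orchestrator", "pipeline", "mode"]
    fps = Gauge("scope_loadtest_steady_fps", "Steady-state output FPS",
                labels, registry=registry)
    ttff = Gauge("scope_loadtest_first_frame_seconds", "First-frame latency (s)",
                 labels, registry=registry)
    lv = (run.orchestrator, run.pipeline, run.mode)
    fps.labels(*lv).set(run.steady_fps)
    ttff.labels(*lv).set(run.first_frame_s)
    # One grouping per job so Grafana can slice by orchestrator label.
    push_to_gateway(gateway, job="scope-loadtest", registry=registry)
```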
```bash
# Install with dev dependencies
pip install -e ".[dev]"

# Run tests
pytest                                  # all tests
pytest --ignore=tests/test_executor.py  # fast tests only (~1s)

# Build Docker image
docker build -f Dockerfile.harness -t scope-loadtest-harness .
```

```
src/loadtest/
├── cli.py           # Click CLI (run, schedule, discover, coverage, baselines)
├── config.py        # YAML config loading and validation
├── scenarios.py     # Scenario matrix expansion, session body builder
├── scope_client.py  # Typed async HTTP client for Scope API
├── executor.py      # Full test lifecycle (connect→load→stream→cleanup)
├── scheduler.py     # Budget planning, fair rotation, execution loop
├── discovery.py     # Orchestrator discovery and health tracking
├── coverage.py      # Per-orchestrator daily coverage persistence
├── metrics.py       # Prometheus metric definitions and push
├── validators.py    # Frame quality and prompt sensitivity (Pillow)
├── results.py       # RunResult, error taxonomy, log capture
└── regression.py    # Rolling baselines and drift detection
```