JDSM01/MP4_backgroundAdder


MP4 Background Swap

Replace the background of a talking-head MP4 with a still image — no green screen required. Uses Robust Video Matting (RVM) via ONNX Runtime to segment the person, then composites them over the chosen background and re-encodes with ffmpeg.

The project is split into:

  • backend/ — FastAPI service that runs RVM matting (ONNX, CPU) frame-by-frame and pipes the composited frames into ffmpeg for H.264/AAC encoding.
  • frontend/ — React + Vite + TypeScript UI.

Requirements

  • Python 3.9+
  • Node.js 20+
  • ffmpeg and ffprobe available on PATH (e.g. brew install ffmpeg on macOS, apt install ffmpeg on Debian/Ubuntu).
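A quick way to confirm the last requirement before starting the backend is to look both tools up on PATH. The following is a minimal sketch, not part of the project; the helper name missing_tools is hypothetical.

```python
import shutil

# Tools the README lists as required on PATH.
REQUIRED_TOOLS = ("ffmpeg", "ffprobe")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if shutil.which(name) is None]
```

Calling missing_tools() with no arguments returns an empty list when both binaries are installed, and the names of the missing ones otherwise.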

Local development

Backend

cd backend
python -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt
PYTHONPATH=. uvicorn app.main:app --reload --port 8000

Frontend

cd frontend
npm install
npm run dev

The Vite dev server proxies /api to http://localhost:8000.

Both at once

./start.sh

(Assumes the backend's .venv already exists.)

Docker

docker compose up --build

The image installs ffmpeg, builds the frontend into static/ and serves the API on port 8000.

API

  • POST /api/probe — multipart file (mp4/mov) → { "duration_seconds", "width", "height" }.
  • POST /api/build — multipart:
    • video_file: the source MP4 (talking-head; any background).
    • background_file: a still image (PNG/JPEG/WEBP) to use as the new background.
    • matting_model (optional, default mobilenetv3): mobilenetv3 (Fast, ~14 MB) or resnet50 (Quality, ~100 MB).
    • downsample_ratio (optional, default 0.25, range 0.05–1.0): internal inference scale. RVM suggests ~0.25 for 1080p, ~0.375 for 720p, 1.0 for ≤512 px. Lower = faster, less edge detail.
    • output_name (optional): output filename.
    The response body is the generated video/mp4.
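The optional fields above can be prepared and validated on the client before upload. This sketch mirrors the documented defaults and ranges; the function names are hypothetical, and the resolution cut-offs between the three RVM suggestions (512 px and 1280 px on the longest side) are an assumption, since the README only names the three reference points.

```python
ALLOWED_MODELS = ("mobilenetv3", "resnet50")


def suggest_downsample_ratio(width: int, height: int) -> float:
    """Pick a downsample_ratio from the RVM guidance quoted in the README.

    The boundaries between the suggestions are assumptions for illustration.
    """
    longest = max(width, height)
    if longest <= 512:
        return 1.0      # small sources: run at full scale
    if longest <= 1280:
        return 0.375    # around 720p
    return 0.25         # around 1080p and above


def build_form_fields(matting_model="mobilenetv3", downsample_ratio=0.25,
                      output_name=None):
    """Return the non-file form fields for POST /api/build as strings."""
    if matting_model not in ALLOWED_MODELS:
        raise ValueError(f"matting_model must be one of {ALLOWED_MODELS}")
    if not 0.05 <= downsample_ratio <= 1.0:
        raise ValueError("downsample_ratio must be within 0.05 and 1.0")
    fields = {
        "matting_model": matting_model,
        "downsample_ratio": str(downsample_ratio),
    }
    if output_name:
        fields["output_name"] = output_name
    return fields
```

The width and height can come from a prior POST /api/probe call. The returned dictionary is meant to be sent alongside video_file and background_file in the same multipart request.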

How it works

Each frame is run through the RVM ONNX model (CPU), which produces a foreground RGB image and an alpha matte. The foreground is then alpha-composited over the background image (scaled and center-cropped to the video's resolution), and the result is streamed as raw BGR24 into a single ffmpeg process that mixes in the original audio and encodes H.264/AAC.
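The two geometric steps in that description, fitting the background and blending with the matte, can be sketched in plain Python. This is an illustration of the standard technique rather than the backend's code, which presumably operates on whole arrays; both function names are hypothetical.

```python
def cover_crop_box(bg_w, bg_h, out_w, out_h):
    """Scale the background so it covers the output, then center-crop.

    Returns (scaled_w, scaled_h, left, top): the resized background size
    and the top-left corner of the out_w x out_h crop inside it.
    """
    scale = max(out_w / bg_w, out_h / bg_h)
    # Guard against rounding leaving the image one pixel too small.
    scaled_w = max(out_w, round(bg_w * scale))
    scaled_h = max(out_h, round(bg_h * scale))
    left = (scaled_w - out_w) // 2
    top = (scaled_h - out_h) // 2
    return scaled_w, scaled_h, left, top


def composite_pixel(fg, bg, alpha):
    """Blend one pixel per channel: alpha * fg + (1 - alpha) * bg.

    `fg` and `bg` are channel tuples in 0..255, `alpha` is in 0.0..1.0.
    """
    return tuple(round(alpha * f + (1.0 - alpha) * b) for f, b in zip(fg, bg))
```

For example, a 4000x3000 still fitted to a 1920x1080 video is resized to 1920x1440 and cropped 180 px from the top and bottom. An alpha of 1.0 keeps the person's pixel, and 0.0 shows the background.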

The selected model file is downloaded on first use into ~/.cache/mp4_bg_swap/models/ and reused afterwards (override the cache location with the MP4_BG_SWAP_CACHE environment variable).
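A sketch of how such a cache location could be resolved follows. It assumes that MP4_BG_SWAP_CACHE, when set, replaces the models directory itself; the README does not say whether the variable names the cache root or the models folder, and the function name is hypothetical.

```python
import os
from pathlib import Path


def model_cache_dir(env=None):
    """Return the directory where downloaded model files would be stored.

    Assumption: MP4_BG_SWAP_CACHE, if set, is used as the directory as-is.
    """
    env = os.environ if env is None else env
    override = env.get("MP4_BG_SWAP_CACHE")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "mp4_bg_swap" / "models"
```

Passing a dictionary as env makes the lookup easy to test without touching the real environment.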

Performance: CPU inference is slow. On an Apple-silicon Mac, expect roughly 1 fps for a 1080p source with mobilenetv3 at downsample_ratio=0.25; resnet50 is several times slower. Pick Fast for iteration, Quality for the final render.
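To turn that figure into a wall-clock estimate, multiply the clip length by its frame rate and divide by the processing speed. The helper below is a hypothetical back-of-the-envelope calculation; the 1 fps default is the rough number quoted above, not a measurement.

```python
def estimated_render_minutes(duration_seconds, source_fps, processing_fps=1.0):
    """Estimate render time in minutes for a clip.

    frames = duration * source_fps; each frame takes 1 / processing_fps seconds.
    """
    frames = duration_seconds * source_fps
    return frames / processing_fps / 60.0
```

A 60-second clip at 30 fps has 1800 frames, so at roughly 1 fps the render takes about 30 minutes.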

Output

  • Same resolution and frame rate as the input video, H.264 (yuv420p), AAC audio at 192 kbps.
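An ffmpeg invocation matching these output settings could be assembled as below. This is a sketch of one plausible command line, not the project's actual arguments: the function name is hypothetical, and choices such as -shortest and the optional audio map (1:a:0?) are assumptions.

```python
def ffmpeg_encode_args(source_path, output_path, width, height, fps):
    """Build an ffmpeg argv that reads raw BGR24 frames from stdin,
    takes audio from the original file, and encodes H.264/AAC."""
    return [
        "ffmpeg", "-y",
        # Input 0: composited frames piped in as raw video.
        "-f", "rawvideo", "-pix_fmt", "bgr24",
        "-video_size", f"{width}x{height}",
        "-framerate", str(fps),
        "-i", "-",
        # Input 1: the original video, used only for its audio track.
        "-i", source_path,
        "-map", "0:v:0",
        "-map", "1:a:0?",   # '?' keeps the command valid if there is no audio
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-c:a", "aac", "-b:a", "192k",
        "-shortest",
        output_path,
    ]
```

The list can be passed to subprocess.Popen with stdin=subprocess.PIPE, writing width * height * 3 bytes per frame to the pipe.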
