Inferred eyebrow tracking for VRChat — no Quest Pro required.
Estimates VRCFT Unified Expression brow parameters from eye tracking, lower face tracking, and microphone prosody data using a rule-based pipeline with a learned GRU residual model. Works with any OpenXR-compatible headset.
- Python 3.11+
- SteamVR (required for head motion tracking; other runtimes fall back to no head tracking)
- VRCFaceTracking (VRCFT) for eye/face input
- Microphone (optional, improves prosody-driven brow movement)
pip install -r requirements.txt

# Default startup (will load the model at start)
python -m ws_server.server
# With ONNX model and terminal UI
python -m ws_server.server --model models/browsync.onnx
# Headless (log to console, no TUI)
python -m ws_server.server --model models/browsync.onnx --no-tui
# Custom port (default: 7720)
python -m ws_server.server --port 7721

Note: Head motion tracking uses SteamVR via OpenVR's background mode. BrowSync does not need a graphics context, but SteamVR must already be running. If SteamVR is not running when BrowSync starts, head tracking is disabled and the server automatically falls back to a lower inference mode.
BrowSyncModule/ contains the C# plugin that connects VRCFT to the Python server. Each frame it sends current eye/face expression data to the server and receives inferred brow values in return, writing them into VRCFT's UnifiedTracking shape buffer so VRChat picks them up via OSC.
The module deliberately only writes brow shapes — it does not claim eye tracking or lower-face data. This means it stacks cleanly with whatever eye/face tracker you already use in VRCFT.
- VRCFaceTracking v5+
- .NET 7 SDK
cd BrowSyncModule
dotnet build -c Release

The post-build step automatically copies BrowSyncModule.dll to %APPDATA%\VRCFaceTracking\CustomLibs\. VRCFT discovers modules there on startup. To install manually, copy the DLL yourself:
BrowSyncModule\bin\Release\net7.0\BrowSyncModule.dll
→ %APPDATA%\VRCFaceTracking\CustomLibs\BrowSyncModule.dll
- Start the Python server (see Quick Start)
- Launch VRCFaceTracking. The module loads automatically and connects to ws://localhost:7720
- If the server isn't running yet, the module will retry in the background and connect once it's up
The module sends a reset on connect to clear the GRU buffer and pings every 5 seconds to keep the connection alive. If the Python server disconnects, brow shapes are zeroed and the module reconnects automatically with a 3-second backoff.
The host and port are hardcoded to localhost:7720. If you need to change them, edit BrowSyncModule.cs and rebuild.
The TUI launches automatically unless --no-tui is passed. It shows live inference mode, FPS, per-AU bar meters, and source status.
| Key | Action |
|---|---|
| q | Quit |
| r | Recalibrate head tracking neutral pose |
| d | Donate current session to HF Hub (Quest Pro users) |
| Ctrl+L | Toggle dev log |
The server selects the best available mode automatically and degrades gracefully as sources become unavailable. VRCFT data is considered stale after 0.5 seconds. A source must be absent for 300ms before the mode degrades (hysteresis), and mode transitions blend over 150ms to prevent visible brow snapping.
| Mode | Eye + Face | Mic | Head | ML Model |
|---|---|---|---|---|
| ml | ✓ | ✓ | ✓ | ✓ |
| rules_only | ✓ | ✓ | ✓ | — |
| mic_head | — | ✓ | ✓ | — |
| head_only | — | — | ✓ | — |
| noise_only | — | — | — | — |
The server's noise_only mode outputs procedural animation to prevent the avatar from freezing. The VRCFT module, however, zeros all brow shapes in this mode rather than applying the noise to the avatar.
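The fallback order in the table can be sketched as a priority list plus a per-source hysteresis gate. This is a minimal illustration, not the server's actual code: `select_mode`, `SourceGate`, and `MODE_REQUIREMENTS` are hypothetical names, and the handling of combinations the table does not list (for example eye/face and head available but no microphone) is an assumption that simply falls through to the first mode whose requirements are all met.

```python
from typing import Optional

# Requirements per mode in priority order, transcribed from the table above.
# Tuple order: (eye_face, mic, head, model).
MODE_REQUIREMENTS = [
    ("ml",         (True,  True,  True,  True)),
    ("rules_only", (True,  True,  True,  False)),
    ("mic_head",   (False, True,  True,  False)),
    ("head_only",  (False, False, True,  False)),
    ("noise_only", (False, False, False, False)),
]

HYSTERESIS_S = 0.300  # a source must be absent this long before it counts as lost


def select_mode(eye_face: bool, mic: bool, head: bool, model: bool) -> str:
    """Return the first mode whose required sources are all available."""
    available = (eye_face, mic, head, model)
    for name, required in MODE_REQUIREMENTS:
        # A mode qualifies when every source it needs is present.
        if all(have or not need for have, need in zip(available, required)):
            return name
    return "noise_only"


class SourceGate:
    """Reports a source as lost only after HYSTERESIS_S seconds of absence."""

    def __init__(self) -> None:
        self.last_seen: Optional[float] = None

    def update(self, present: bool, now: float) -> bool:
        if present:
            self.last_seen = now
        if self.last_seen is None:
            return False  # never seen, so never available
        return (now - self.last_seen) < HYSTERESIS_S
```

One gate per source (eye/face, microphone, head) would feed `select_mode` each tick. The 150ms blend between modes is not shown here.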
| Group | Count | Features |
|---|---|---|
| Eye tracking | 14 | Openness, wide, squint, lid tightener, gaze X/Y, blink (per side) |
| Lower face | 13 | Jaw, lip corners, cheek raise, lip stretch, and others |
| Prosody | 6 | PitchNorm, PitchDelta, EnergyNorm, EnergyDelta, SpeechRate, IsSpeaking |
| Emotion | 3 | Valence, arousal, confidence (requires SpeechBrain, optional) |
| Temporal deltas | 5 | Computed from key signals each frame |
| Head motion | 11 | Pitch/roll/yaw, Y/Z translation, linear/angular velocity and acceleration |
Missing features default to 0.0 — partial tracker setups are fully supported.
All outputs are VRCFT Unified Expression parameters in [0, 1]:
- BrowInnerUpLeft / BrowInnerUpRight
- BrowOuterUpLeft / BrowOuterUpRight
- BrowLowererLeft / BrowLowererRight
- BrowPinchLeft / BrowPinchRight
Default endpoint: ws://localhost:7720
Send eye/face feature values from VRCFT. Only include features that are available — missing keys default to 0.0.
{
"type": "frame",
"ts": 1234567890.123,
"inputs": {
"EyeOpenessLeft": 0.82,
"EyeOpenessRight": 0.79,
"LipCornerPullLeft": 0.3
}
}

Pushed at ~90fps regardless of whether the client sends frames.
{
"type": "brow",
"ts": 1234567890.123,
"outputs": {
"BrowInnerUpLeft": 0.34,
"BrowInnerUpRight": 0.34,
"BrowOuterUpLeft": 0.28,
"BrowOuterUpRight": 0.31,
"BrowLowererLeft": 0.05,
"BrowLowererRight": 0.06,
"BrowPinchLeft": 0.02,
"BrowPinchRight": 0.02
},
"mode": "ml",
"head_tracking": true,
"mic_active": true
}

| Message | Description | Acknowledgement |
|---|---|---|
| {"type": "ping"} | Keepalive | {"type": "pong"} |
| {"type": "reset"} | Clear GRU buffer and recalibrate head | {"type": "reset_ack"} |
| {"type": "recalibrate_head"} | Restart head neutral pose calibration | {"type": "recalibrate_head_ack"} |
| {"type": "set_mode", "mode": "rules_only"} | Force a specific inference mode | {"type": "mode_ack", "mode": "..."} |
| {"type": "get_status"} | Request current server status | {"type": "status", ...} |
| {"type": "get_calibration"} | Calibration phase + per-AU output variance (last 3s) | {"type": "calibration", ...} |
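For anyone writing their own client, the message shapes above can be built and parsed with the standard library alone. The helpers below are a sketch: `make_frame` and `parse_brow` are hypothetical names, not part of BrowSync, and filling a missing brow key with 0.0 on the client side is an assumption that mirrors the server's treatment of missing inputs.

```python
import json
import time

# The eight output parameters listed earlier in this document.
BROW_KEYS = (
    "BrowInnerUpLeft", "BrowInnerUpRight",
    "BrowOuterUpLeft", "BrowOuterUpRight",
    "BrowLowererLeft", "BrowLowererRight",
    "BrowPinchLeft", "BrowPinchRight",
)


def make_frame(inputs, ts=None):
    """Serialise a 'frame' message; pass only the features you actually have."""
    return json.dumps({
        "type": "frame",
        "ts": time.time() if ts is None else ts,
        "inputs": inputs,
    })


def parse_brow(raw):
    """Return (outputs, mode) for a 'brow' message, or None for other types."""
    msg = json.loads(raw)
    if msg.get("type") != "brow":
        return None  # e.g. pong, reset_ack, status
    # Assumption: treat any missing brow key as 0.0.
    outputs = {k: float(msg.get("outputs", {}).get(k, 0.0)) for k in BROW_KEYS}
    return outputs, msg.get("mode")
```

The strings returned by `make_frame` can be sent over any WebSocket client connected to the endpoint above. Control messages such as `{"type": "ping"}` are plain `json.dumps` calls.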
Input sources (90fps polling loop):
VRCFT client → 14 eye + 13 face features (WebSocket frames from client)
MicrophoneProcessor → 6 prosody features (background thread, 16kHz audio)
HeadMotionTracker → 11 head motion features (background thread, OpenVR poll)
Emotion context → 3 features (optional SpeechBrain SER)
Delta features → 5 features (computed per-frame)
─────────────────────────────
52 inputs total
→ RuleBasedEstimator deterministic baseline → 8 AU values [0, 1]
→ GRU residual learned correction in (−1, 1), scaled by 0.4×
→ Spring-damper smoother per-AU physics (asymmetric attack/decay)
→ 8 VRCFT brow AU outputs, pushed to all connected clients
The GRU learns residuals on top of the rule base, not the full mapping. This means rules-only mode is immediately usable, and the ML model improves on top incrementally.
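The last two pipeline stages can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: clamping the sum to [0, 1] is inferred from the documented output range, the spring is modelled as critically damped, "asymmetric" is taken to mean a faster response when rising than when falling, and the `attack_hz` / `decay_hz` values are made-up placeholders.

```python
import math

RESIDUAL_SCALE = 0.4  # scale applied to the GRU residual, from the diagram above


def combine(rule_aus, residuals):
    """Add the scaled residual to the rule baseline and clamp to [0, 1]."""
    return [
        min(1.0, max(0.0, r + RESIDUAL_SCALE * d))
        for r, d in zip(rule_aus, residuals)
    ]


class SpringDamper:
    """Critically damped spring toward a target; stiffer when rising (attack)."""

    def __init__(self, attack_hz=8.0, decay_hz=4.0, value=0.0):
        self.attack_hz = attack_hz  # hypothetical tuning value
        self.decay_hz = decay_hz    # hypothetical tuning value
        self.value = value
        self.velocity = 0.0

    def step(self, target, dt):
        hz = self.attack_hz if target > self.value else self.decay_hz
        omega = 2.0 * math.pi * hz
        # Critically damped spring, integrated with semi-implicit Euler.
        accel = omega * omega * (target - self.value) - 2.0 * omega * self.velocity
        self.velocity += accel * dt
        self.value += self.velocity * dt
        return self.value
```

One `SpringDamper` per AU, stepped with `dt = 1 / 90`, would follow the combined value each tick. In rules-only mode the residual is simply all zeros.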
Head tracking calibrates on the first 2.5 seconds of runtime, treating the average pose as neutral. All subsequent values are offsets from that baseline.
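The calibration step described above amounts to a running mean followed by a subtraction. A minimal sketch follows; `NeutralPoseCalibrator` is a hypothetical name, and the pose is treated as a flat tuple of floats, which may not match how the real tracker represents rotation.

```python
class NeutralPoseCalibrator:
    """Averages poses for the first 2.5 s, then returns offsets from that mean."""

    CALIBRATION_S = 2.5

    def __init__(self):
        self.start = None    # timestamp of the first sample
        self.sums = None     # per-component running sums
        self.count = 0
        self.neutral = None  # set once calibration completes

    def update(self, pose, now):
        """Feed one pose sample; returns None while still calibrating."""
        if self.start is None:
            self.start = now
            self.sums = [0.0] * len(pose)
        if self.neutral is None:
            if now - self.start < self.CALIBRATION_S:
                for i, p in enumerate(pose):
                    self.sums[i] += p
                self.count += 1
                return None
            self.neutral = [s / self.count for s in self.sums]
        return [p - n for p, n in zip(pose, self.neutral)]
```

Recalibration (the `r` key or a `recalibrate_head` message) would correspond to replacing the calibrator with a fresh instance.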
Three recording paths are available depending on your setup:
Live VRCFT recorder — captures eye/face data from the C# module while you play normally:
python data/record_session.py --subject alice
# Options: --no_mic --no_head --val --port 7721

Runs as a drop-in replacement for the inference server on port 7720. Merges microphone prosody and head motion into full 52-feature frames. Sends dummy zero brow responses so the C# module stays connected.
Raw Quest recorder — reads face expression weights directly from Quest hardware via OpenXR, bypassing VRCFT's remapping entirely:
python data/record_quest_raw.py --subject alice
# Requires: Meta Quest Link running, Meta set as active OpenXR runtime

Quest Pro sessions include labeled brow ground truth (has_labels=True). Quest 3 sessions are unlabeled; rule pseudo-labels are applied at training time.
Offline converter — converts already-captured recordings to BrowSync .jsonl format:
python data/convert_raw.py raw_recording.jsonl --subject alice
python data/convert_raw.py path/to/dir/ --subject alice # batch

Accepts WebSocket frame log .jsonl or feature .csv with schema.py column names.
# Place .jsonl session files in:
# data/sessions/train/
# data/sessions/val/
python -m training.train
# Outputs: models/checkpoints/best_model.pt + models/browsync.onnx

Each line in a session file is a BrowFrame JSON object:
{"timestamp_ms": 123.4, "inputs": [52 floats], "targets": [8 floats], "has_labels": true, "session_id": "uuid"}

Sessions without Quest Pro ground truth omit targets and set has_labels: false. They are trained with rule-based pseudo-labels at a reduced loss weight (0.25).
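A session line can be validated and assigned its loss weight with a small loader. This is a sketch of the format as described here, not the project's dataset code: `parse_brow_frame` is a hypothetical name, and the weight of 1.0 for labeled frames is an assumption (the text only states the reduced 0.25 weight).

```python
import json

N_INPUTS = 52
N_TARGETS = 8
PSEUDO_LABEL_WEIGHT = 0.25  # reduced weight for frames without ground truth
LABELED_WEIGHT = 1.0        # assumption: full weight for labeled frames


def parse_brow_frame(line):
    """Parse one .jsonl line into a dict with inputs, targets, and loss weight."""
    frame = json.loads(line)
    inputs = frame["inputs"]
    if len(inputs) != N_INPUTS:
        raise ValueError(f"expected {N_INPUTS} inputs, got {len(inputs)}")
    has_labels = bool(frame.get("has_labels", False))
    targets = frame.get("targets") if has_labels else None
    if has_labels and (targets is None or len(targets) != N_TARGETS):
        raise ValueError(f"labeled frame needs {N_TARGETS} targets")
    return {
        "timestamp_ms": frame.get("timestamp_ms"),
        "inputs": inputs,
        "targets": targets,  # None means: fill in rule pseudo-labels later
        "weight": LABELED_WEIGHT if has_labels else PSEUDO_LABEL_WEIGHT,
    }
```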
Training config: 60 epochs, batch size 256, LR 3e-4, early stop patience 15. Loss = MSE + temporal smoothness penalty + asymmetric output weights (brow raises weighted 1.3× vs lowers). Rule estimates are pre-cached at dataset construction time so they are not recomputed per batch. torch.compile is enabled automatically on platforms where MSVC is available.
The exported models/browsync.onnx is self-contained:
- Normalisation stats (mean/std per feature) embedded in ONNX metadata
- Feature names embedded in ONNX metadata
- Input: float32[batch, 30, 52], a 30-frame window at 90fps (≈333ms context)
- Output: float32[batch, 8], Tanh residuals, scaled 0.4× at inference time
The companion models/browsync.onnx.data file holds the weights — both files must remain together.
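Feeding the model requires holding the most recent 30 frames in a sliding window. The sketch below uses only the standard library and returns nested lists shaped [1, 30, 52]; `FrameWindow` is a hypothetical name, and zero-padding the window before 30 frames have arrived is an assumption rather than documented behaviour. Normalisation with the embedded mean/std is not shown.

```python
from collections import deque

WINDOW = 30       # frames of context (about 333 ms at 90 fps)
N_FEATURES = 52   # inputs per frame


class FrameWindow:
    """Keeps the latest WINDOW feature vectors for the model input."""

    def __init__(self):
        self.frames = deque(maxlen=WINDOW)  # oldest frames drop off automatically

    def push(self, features):
        if len(features) != N_FEATURES:
            raise ValueError(f"expected {N_FEATURES} features, got {len(features)}")
        self.frames.append(list(features))

    def ready(self):
        return len(self.frames) == WINDOW

    def as_batch(self):
        """Return a [1, WINDOW, N_FEATURES] nested list, zero-padded at the front."""
        missing = WINDOW - len(self.frames)
        pad = [[0.0] * N_FEATURES for _ in range(missing)]
        return [pad + list(self.frames)]
```

Calling `reset` on the server clears the GRU buffer; here that would correspond to creating a new `FrameWindow`.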
Quest Pro users can donate sessions to improve the shared model. In the TUI, press d during a recording session to upload to HF Hub. Donated sessions provide real brow AU ground truth paired with the full input feature vector. They are:
- Stored locally in data/sessions/donated/ (gitignored) before upload
- Uploaded to HF Hub via huggingface_hub (requires the optional huggingface_hub dependency)
- Used to fine-tune future model releases
- Opt-in only; never collected without the explicit d keypress