viz/layers: stereo QuadLayer (paired mailbox + per-eye binding)#534

Draft
farbod-nv wants to merge 4 commits into main from fm/stereo_quad

@farbod-nv

QuadLayer::Config gains stereo and stereo_baseline_m. Stereo layers allocate two DeviceImages per mailbox slot (left + right) and expose a two-arg submit:

void submit(left, right, stream = 0);

Both eyes are copied + the single cuda_done_writing signal is emitted on the same CUDA stream, so stream ordering — not a second semaphore — is what guarantees the renderer never sees a half-matched pair. Strict invariants: submit(src) throws std::logic_error on a stereo layer, submit(left, right) throws on a mono layer.

record() in kXr binds the left descriptor for view 0 and the right for view 1. Window / offscreen (single-view) renders left only. Baseline disparity is applied host-side by translating each eye's placement position by ±stereo_baseline_m/2 along the placement's local +x axis before composing the MVP — no shader change.

Mip generation runs on both eyes in the pre-render-pass when stereo + generate_mipmaps are both on. Descriptor pool sizing + writes double when stereo.

Memory: 2× per-slot allocation for stereo layers (~112 MB at 1080p RGBA8 + 7 slots vs ~56 MB mono).
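
The figures above can be sanity-checked with a quick back-of-envelope calculation. This is only a sketch: it assumes tightly packed RGBA8 at 1920×1080 and ignores mip chains and allocator padding, so real allocations may be somewhat larger.

```python
# Back-of-envelope mailbox footprint; mip levels and allocator padding are ignored.
WIDTH, HEIGHT = 1920, 1080   # 1080p
BYTES_PER_PIXEL = 4          # RGBA8
SLOTS = 7                    # mailbox slots, as quoted in the description


def mailbox_bytes(stereo: bool) -> int:
    """Bytes for all slots; stereo layers hold two images per slot."""
    eyes = 2 if stereo else 1
    return WIDTH * HEIGHT * BYTES_PER_PIXEL * SLOTS * eyes


MIB = 1024 * 1024
mono_mib = mailbox_bytes(stereo=False) / MIB
stereo_mib = mailbox_bytes(stereo=True) / MIB
print(f"mono: {mono_mib:.1f} MiB, stereo: {stereo_mib:.1f} MiB")
```

This lands at roughly 55 MiB mono and 111 MiB stereo, in line with the rounded ~56 MB / ~112 MB quoted above.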

Tests (catch2, gpu-gated where needed):

  • ctor rejects non-finite stereo_baseline_m
  • stereo allocates paired DeviceImages for every slot (distinct vk_image + cuda_array per eye)
  • mono device_image_right is null
  • mono submit(left, right) throws
  • stereo submit(left) throws
  • stereo submit lands matching L/R pair in the latest slot
  • stereo rapid submits keep every L/R pair atomic (decodes the per-submit index from the left pixel and proves the right pixel is the matching one for every written slot; covers torn writes + cross-eye aliasing in one shot)
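
For readers who want the idea behind the last test in miniature, here is a host-only sketch of the index-in-pixel check. The encoding (24-bit index in RGB, eye tag in alpha) and every function name are assumptions made for illustration; the actual catch2 test may encode things differently.

```python
from typing import Tuple

Pixel = Tuple[int, int, int, int]  # (R, G, B, A), 8 bits each


def encode_pixel(index: int, eye: int) -> Pixel:
    """Pack a 24-bit submit index into RGB; alpha tags the eye (0 = left, 1 = right)."""
    if not 0 <= index < (1 << 24):
        raise ValueError("index must fit in 24 bits")
    return (index & 0xFF, (index >> 8) & 0xFF, (index >> 16) & 0xFF, eye)


def decode_index(pixel: Pixel) -> int:
    """Recover the submit index from the RGB channels."""
    r, g, b, _ = pixel
    return r | (g << 8) | (b << 16)


def pair_is_consistent(left: Pixel, right: Pixel) -> bool:
    """True when both eyes carry the same index and the expected eye tags."""
    same_index = decode_index(left) == decode_index(right)
    return same_index and left[3] == 0 and right[3] == 1


# A torn write shows up as an index mismatch; cross-eye aliasing as a wrong eye tag.
print(pair_is_consistent(encode_pixel(5, 0), encode_pixel(5, 1)))  # matched pair
print(pair_is_consistent(encode_pixel(5, 0), encode_pixel(6, 1)))  # torn pair
print(pair_is_consistent(encode_pixel(5, 0), encode_pixel(5, 0)))  # left aliased into right
```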

@coderabbitai

coderabbitai Bot commented May 14, 2026

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.


farbod-nv added 3 commits May 14, 2026 15:22
Pythonization of the stereo path that landed in C++:

QuadLayerConfig
  * stereo: bool (default False)
  * stereo_baseline_mm: float (default 0.0). Disparity in *millimeters*
    rather than meters — typical IPDs / camera baselines are 50–80 mm
    and meter values felt off when typed out.

QuadLayer.submit(left, right=None, stream=0)
  * One method, accepts a VizBuffer OR any __cuda_array_interface__-
    exposing object (CuPy / PyTorch / Numba / numpy on a CUDA pointer)
    for each of left / right. Conversion + validation moved into
    bindings_helpers.hpp::cuda_array_to_viz_buffer, callable from the
    binding lambda twice (once per eye).
  * Mono layer + right buffer → RuntimeError("...mono...").
  * Stereo layer + missing right → RuntimeError("...stereo...").
  * submit_cuda_array removed; in-tree callers
    (camera_viz/pipeline/runner.py + viz_init.py + the python_tests
    file) updated to the new shape.
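
A pure-Python sketch of the two checks described above, for illustration only. The real conversion lives in the C++ helper `cuda_array_to_viz_buffer`; the function names, error wording, and `FakeDeviceArray` stand-in here are assumptions. The `shape` / `typestr` / `data` / `version` keys are those of the CUDA Array Interface.

```python
from typing import Any, Optional


def describe_cuda_array(obj: Any) -> dict:
    """Pull pointer, shape and typestr out of a __cuda_array_interface__ object."""
    iface = getattr(obj, "__cuda_array_interface__", None)
    if iface is None:
        raise TypeError("object does not expose __cuda_array_interface__")
    ptr, _readonly = iface["data"]  # (device pointer as int, read-only flag)
    return {"ptr": ptr, "shape": tuple(iface["shape"]), "typestr": iface["typestr"]}


def check_submit_args(layer_is_stereo: bool, right: Optional[Any]) -> None:
    """Mirror the mono/stereo invariants; the message text is illustrative."""
    if not layer_is_stereo and right is not None:
        raise RuntimeError("right buffer passed to a mono layer")
    if layer_is_stereo and right is None:
        raise RuntimeError("stereo layer needs both left and right buffers")


class FakeDeviceArray:
    """Stand-in for a CuPy / PyTorch array; the pointer is a dummy value."""

    def __init__(self, ptr: int, shape: tuple):
        self.__cuda_array_interface__ = {
            "shape": shape,
            "typestr": "|u1",       # unsigned 8-bit
            "data": (ptr, False),   # (pointer, read_only)
            "version": 3,
        }


left = FakeDeviceArray(0x1000, (1080, 1920, 4))
check_submit_args(layer_is_stereo=False, right=None)   # mono + left only: accepted
print(describe_cuda_array(left)["shape"])
```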

Tests added to test_offscreen_session.py:
  * test_stereo_quad_layer_round_trip — distinct L/R sources, render
    in offscreen mode, readback must show the LEFT buffer (per the
    single-view fallback documented on the layer).
  * test_stereo_invariants — submit(left, right) on mono and
    submit(left) on stereo both raise the right RuntimeError.
  * test_stereo_invalid_baseline_rejected — NaN baseline rejected at
    add_quad_layer time.
  * Module-level skipif gate keeps these no-ops on the GPU-less CI
    runners (same as the existing tests in this file).

C++ rename: QuadLayer::Config::stereo_baseline_m → stereo_baseline_mm
and the corresponding C++ test. The host-side translation in record()
multiplies by 0.0005f (×0.5 to halve per eye, ×0.001 to convert mm→m
for the world-meters placement.pose.position).
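
The host-side translation can be sketched in a few lines of Python. This is a model of the math, not the C++ in record(): the quaternion layout (x, y, z, w), the function names, and in particular the sign convention (left eye shifted along -x, right eye along +x) are assumptions.

```python
import math


def rotate_by_quat(q, v):
    """Rotate vector v by unit quaternion q = (x, y, z, w)."""
    x, y, z, w = q
    vx, vy, vz = v
    # t = 2 * cross(q.xyz, v)
    tx = 2.0 * (y * vz - z * vy)
    ty = 2.0 * (z * vx - x * vz)
    tz = 2.0 * (x * vy - y * vx)
    # v' = v + w * t + cross(q.xyz, t)
    return (
        vx + w * tx + (y * tz - z * ty),
        vy + w * ty + (z * tx - x * tz),
        vz + w * tz + (x * ty - y * tx),
    )


def eye_position(position_m, orientation_xyzw, baseline_mm, eye):
    """Per-eye placement position in meters; eye 0 = left, 1 = right."""
    if not math.isfinite(baseline_mm):
        raise ValueError("stereo_baseline_mm must be finite")
    half_baseline_m = baseline_mm * 0.0005      # x0.5 per eye, x0.001 mm -> m
    sign = -1.0 if eye == 0 else 1.0            # assumed sign convention
    local_x = rotate_by_quat(orientation_xyzw, (1.0, 0.0, 0.0))
    return tuple(p + sign * half_baseline_m * a for p, a in zip(position_m, local_x))


identity = (0.0, 0.0, 0.0, 1.0)
print(eye_position((0.0, 0.0, -1.0), identity, 64.0, eye=0))
print(eye_position((0.0, 0.0, -1.0), identity, 64.0, eye=1))
```

With a 64 mm baseline and an unrotated placement, each eye moves 32 mm along x in opposite directions.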

Plumbing for stereo all the way through the local pipeline:

pipeline.Frame
  * New ``image_right: Optional[Any]`` field. None on mono frames;
    set by stereo sources to the right-eye buffer. The two eyes must
    come from the same capture instant; the QuadLayer mailbox is what
    guarantees pair atomicity on the GPU side.

pipeline.runner
  * VizRunner dispatches submit(image, image_right, stream) when
    ``frame.image_right`` is set, else falls back to submit(image, stream).
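
A minimal sketch of the dispatch rule, assuming a stripped-down `Frame` with only the two image fields and a hypothetical `RecordingLayer` standing in for QuadLayer; the real classes carry more state.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Frame:
    """Reduced Frame: only the fields relevant to stereo dispatch."""
    image: Any
    image_right: Optional[Any] = None  # None on mono frames


class RecordingLayer:
    """Hypothetical stand-in for QuadLayer that records submit() calls."""

    def __init__(self):
        self.calls = []

    def submit(self, left, right=None, stream=0):
        self.calls.append((left, right, stream))


def dispatch(layer, frame: Frame, stream: int = 0) -> None:
    """Stereo frames submit both eyes; mono frames submit the single image."""
    if frame.image_right is not None:
        layer.submit(frame.image, frame.image_right, stream)
    else:
        # Keyword form keeps the stream handle from binding to `right`.
        layer.submit(frame.image, stream=stream)


layer = RecordingLayer()
dispatch(layer, Frame(image="L", image_right="R"), stream=7)
dispatch(layer, Frame(image="M"), stream=7)
print(layer.calls)
```

Given the `submit(left, right=None, stream=0)` signature, passing the stream positionally in the mono branch would bind it to `right`, hence the keyword argument in this sketch.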

sources
  * ``SyntheticStereoSource`` — animated test pattern with a
    configurable horizontal pixel disparity between the eyes. Lets
    camera_viz exercise the stereo path without stereo camera hardware.
  * ``PairedFrameSource(name, left, right)`` — generic wrapper that
    merges two per-eye FrameSources into one stereo source. Skips the
    publish until both eyes are ready, so the layer never sees an
    unpaired update.
  * ``build_local_camera``: when ``cameras.<cam>.stereo: true``,
    wraps the existing per-eye ZED / OAK-D sources in
    PairedFrameSource (or directly returns SyntheticStereoSource for
    synthetic). v4l2 + stereo combo is rejected — USB UVC is mono.
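
The "skip the publish until both eyes are ready" behaviour can be sketched as follows. The `poll()` interface, `ListSource`, and the hold-the-early-eye policy are assumptions for illustration; the actual PairedFrameSource may pair by timestamp or drop stale frames instead.

```python
from typing import Any, Optional, Tuple


class ListSource:
    """Hypothetical per-eye source: poll() returns the next item, or None if not ready."""

    def __init__(self, items):
        self._items = list(items)

    def poll(self) -> Optional[Any]:
        return self._items.pop(0) if self._items else None


class PairedSourceSketch:
    """Merges two per-eye sources; publishes only complete (left, right) pairs."""

    def __init__(self, name: str, left, right):
        self.name = name
        self.left = left
        self.right = right
        self._pending_left = None
        self._pending_right = None

    def poll(self) -> Optional[Tuple[Any, Any]]:
        # Only ask an eye for a new frame if we are not already holding one.
        if self._pending_left is None:
            self._pending_left = self.left.poll()
        if self._pending_right is None:
            self._pending_right = self.right.poll()
        if self._pending_left is None or self._pending_right is None:
            return None  # one eye missing: skip this publish
        pair = (self._pending_left, self._pending_right)
        self._pending_left = self._pending_right = None
        return pair


paired = PairedSourceSketch(
    "demo",
    ListSource(["L0", "L1"]),
    ListSource([None, "R0", "R1"]),  # right eye is one poll late
)
for _ in range(4):
    print(paired.poll())
```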

camera_viz.py
  * SourceEntry promoted to a dataclass carrying (source, placement,
    stereo, stereo_baseline_mm). YAML keys:
      cameras.<cam>.stereo               : producer-side toggle
      placements.<cam>.stereo_baseline_mm: render-side disparity (mm)
    The layer-builder loop sets QuadLayerConfig.stereo and
    .stereo_baseline_mm from the entry.
  * RTP entries leave stereo=false for now — paired-stream RTP is C4.
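
To make the two YAML keys concrete, here is a sketch of how a parsed config might feed a SourceEntry. The dict stands in for the loaded YAML; the camera name `front`, the helper `entry_from_config`, and the defaults are hypothetical, and only the four field names and the two key paths come from this commit.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class SourceEntry:
    source: Any
    placement: Any
    stereo: bool = False
    stereo_baseline_mm: float = 0.0


def entry_from_config(cam: str, config: dict, source: Any) -> SourceEntry:
    """Read the producer-side toggle and the render-side disparity for one camera."""
    cam_cfg = config.get("cameras", {}).get(cam, {})
    place_cfg = config.get("placements", {}).get(cam, {})
    return SourceEntry(
        source=source,
        placement=place_cfg,
        stereo=bool(cam_cfg.get("stereo", False)),
        stereo_baseline_mm=float(place_cfg.get("stereo_baseline_mm", 0.0)),
    )


# Equivalent of a parsed YAML document (values are made up for the example).
config = {
    "cameras": {"front": {"stereo": True}},
    "placements": {"front": {"stereo_baseline_mm": 63.0}},
}
entry = entry_from_config("front", config, source=object())
print(entry.stereo, entry.stereo_baseline_mm)
```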

configs/synthetic_stereo.yaml — hardware-free demo config exercising
the whole stack end to end.

C4 of the stereo plan: stereo cameras over the RTP path send two
independent H.264 streams (rtp.port for left, rtp.port_right for
right). The receiver opens both, pairs them via PairedFrameSource,
and feeds the QuadLayer mailbox, so pair atomicity is re-established
on the GPU side rather than on the wire. Wire-level drift is
acceptable per the agreed spec.

camera_streamer
  * ``_pick_mono_source`` → ``_eye_sources`` returns [src] for mono
    cameras, [left, right] for stereo (unwrapped from the
    PairedFrameSource that build_local_camera returns).
  * ``_build_sender`` → ``_build_senders`` returns a list; one
    RtpH264Sender per eye. Supervisor starts both, polls both for
    liveness, restarts both as a pair if either drops — keeping eyes
    in lockstep across reconnects.
  * Requires ``rtp.port_right`` when the camera is stereo. Mono
    config is unchanged.
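
A rough sketch of the per-eye fan-out and the pair-restart rule. All function names are hypothetical and no real sender is started; the code relies only on the `.left` / `.right` properties and the `rtp.port` / `rtp.port_right` keys named in this commit.

```python
from typing import Any, List


def eye_sources(source: Any) -> List[Any]:
    """[src] for mono; [left, right] when the source exposes both eyes."""
    left = getattr(source, "left", None)
    right = getattr(source, "right", None)
    if left is not None and right is not None:
        return [left, right]
    return [source]


def eye_ports(rtp_cfg: dict, stereo: bool) -> List[int]:
    """One port per eye; stereo cameras must configure port_right."""
    ports = [rtp_cfg["port"]]
    if stereo:
        if "port_right" not in rtp_cfg:
            raise ValueError("stereo camera requires rtp.port_right")
        ports.append(rtp_cfg["port_right"])
    return ports


def restart_needed(senders_alive: List[bool]) -> bool:
    """Pair policy: if any sender dropped, all senders restart together."""
    return not all(senders_alive)


class FakePaired:
    """Stand-in for a PairedFrameSource with .left / .right."""
    left = "left-src"
    right = "right-src"


print(eye_sources(FakePaired()))
print(eye_ports({"port": 5000, "port_right": 5002}, stereo=True))
print(restart_needed([True, False]))
```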

camera_viz._build_rtp_entries
  * Stereo cameras open two RtpH264Source instances (one per port),
    named "<cam>.left" / "<cam>.right", and wrap them in
    PairedFrameSource. SourceEntry carries stereo + baseline through
    to the QuadLayerConfig the same way the local path does.
  * Mono RTP cameras unchanged.

sources.PairedFrameSource
  * Expose ``.left`` / ``.right`` properties so camera_streamer can
    fan out to per-eye senders without re-opening the camera.