Make ContainerEnvironment pluggable so non-Docker backends can run eval #25

Open
frmsaul wants to merge 1 commit into facebookresearch:main from frmsaul:pluggable-container-backend

@frmsaul commented May 16, 2026

Problem

Evaluator in programbench/eval/eval.py assumes Docker is the only container runtime: _new_env constructs ContainerEnvironment(...) directly, and _copy_file_from_container shells out to docker cp by reaching into env.executable / env.container_id. That blocks downstream callers running ProgramBench in environments that don't support Docker-in-Docker (cloud sandboxes, gVisor, firecracker, CI runners that share a single Docker daemon). The workaround today is to monkey-patch programbench.container.ContainerEnvironment and Evaluator._copy_file_from_container at runtime, which is fragile and can break whenever those internals change.

What this PR does

Introduces two Protocols (Environment, ContainerBackend) and threads an optional backend kwarg through Evaluator.__init__. The default is DockerBackend(), so behavior is identical to today's.
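A minimal sketch of what the two Protocols could look like. Only copy_out(container_path, *, timeout) returning (contents, command_string), new_env(image, *, cwd, timeout, cpus, env, run_args) and remove_image(image_ref) come from this PR; the argument lists of execute, copy_in, copy_in_tar, commit and cleanup, all type annotations, and the use of @runtime_checkable are assumptions made for illustration.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Environment(Protocol):
    """What Evaluator touches on a running environment (most signatures assumed)."""

    cwd: str              # read directly by _run_test_branch
    default_timeout: int  # read directly by _run_test_branch

    def execute(self, command: str, *, timeout: int | None = None) -> Any: ...
    def copy_in(self, host_path: str, container_path: str) -> None: ...
    def copy_in_tar(self, tar_path: str, container_dir: str) -> None: ...
    def copy_out(self, container_path: str, *, timeout: int | None = None) -> tuple[bytes, str]: ...
    def commit(self) -> str: ...
    def cleanup(self) -> None: ...


@runtime_checkable
class ContainerBackend(Protocol):
    """Factory for environments, plus cleanup of images they produce."""

    def new_env(
        self,
        image: str,
        *,
        cwd: str,
        timeout: int,
        cpus: float | None,
        env: dict[str, str] | None,
        run_args: list[str] | None,
    ) -> Environment: ...

    def remove_image(self, image_ref: str) -> None: ...
```

isinstance against a runtime-checkable Protocol only checks that the members exist, not their signatures, so it is a cheap structural guard; a static checker such as mypy performs the full signature comparison.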

  • ContainerEnvironment.copy_out(container_path, *, timeout) is new. It encapsulates the docker cp logic that used to live inline in _copy_file_from_container and returns (contents, command_string), so callers can log the wire command without knowing the backend's format.
  • DockerBackend is the default implementation; it wraps the existing ContainerEnvironment and the module-level remove_image.
  • Evaluator._new_env and the final-cleanup remove_image call now delegate to self.backend. _copy_file_from_container collapses from ~45 lines of subprocess plumbing to a single env.copy_out(...) call.
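For the Docker case, copy_out can carry the same docker cp round trip that used to live in _copy_file_from_container. This is a sketch under assumptions: ContainerEnvironmentSketch is a hypothetical cut-down class holding only container_id, executable and default_timeout, contents are assumed to come back as bytes, and a missing timeout is assumed to fall back to default_timeout. The PR does not show the real method's error handling.

```python
from __future__ import annotations

import shlex
import subprocess
import tempfile
from pathlib import Path


class ContainerEnvironmentSketch:
    """Hypothetical cut-down ContainerEnvironment: just enough state for copy_out."""

    def __init__(self, container_id: str, executable: str = "docker", default_timeout: int = 60) -> None:
        self.container_id = container_id
        self.executable = executable
        self.default_timeout = default_timeout

    def copy_out(self, container_path: str, *, timeout: int | None = None) -> tuple[bytes, str]:
        """Copy one file out of the container; return (contents, command string for logs)."""
        with tempfile.TemporaryDirectory() as tmp:
            dest = Path(tmp) / "copied"  # fixed name; docker cp creates it as a file
            cmd = [self.executable, "cp", f"{self.container_id}:{container_path}", str(dest)]
            # check=True raises CalledProcessError on a non-zero exit (missing path, dead container).
            subprocess.run(
                cmd,
                check=True,
                capture_output=True,
                timeout=timeout if timeout is not None else self.default_timeout,
            )
            # Read before the temporary directory is deleted on exit from the with-block.
            return dest.read_bytes(), shlex.join(cmd)
```

Returning shlex.join(cmd) rather than the argument list keeps the caller backend-agnostic: it logs a plain string, whether that string describes docker cp or some other transport.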

Backwards compatibility

  • Default backend=None → DockerBackend(executable=DOCKER_EXECUTABLE, run_args=DOCKER_RUN_ARGS): same behavior, same --init, same xdist hardening.
  • ContainerEnvironment's public surface is unchanged; copy_out is additive.
  • All 28 existing tests still pass; 31/31 with the 3 new ones.
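The backend=None default can be resolved in __init__ roughly as follows. This is a stripped-down hypothetical Evaluator that takes only the backend parameter, and DOCKER_EXECUTABLE and DOCKER_RUN_ARGS get placeholder values because the PR does not show them.

```python
from __future__ import annotations

from typing import Any, Sequence

# Placeholder values for illustration; the real constants live in programbench.
DOCKER_EXECUTABLE = "docker"
DOCKER_RUN_ARGS: tuple[str, ...] = ()


class DockerBackend:
    """Stand-in that only records its configuration."""

    def __init__(self, executable: str = DOCKER_EXECUTABLE, run_args: Sequence[str] = ()) -> None:
        self.executable = executable
        self.run_args = list(run_args)


class Evaluator:
    """Hypothetical Evaluator reduced to the backend kwarg."""

    def __init__(self, *, backend: Any | None = None) -> None:
        # Compare with None rather than relying on truthiness, so a caller's
        # backend is never replaced just because it defines __bool__ or __len__.
        if backend is None:
            backend = DockerBackend(executable=DOCKER_EXECUTABLE, run_args=DOCKER_RUN_ARGS)
        self.backend = backend
```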

Tests

tests/test_pluggable_backend.py covers:

  1. Injecting a custom FakeBackend routes _new_env and _copy_file_from_container through the fake (no Docker shellouts in the test).
  2. Omitting backend= produces a DockerBackend.
  3. DockerBackend satisfies the ContainerBackend Protocol structurally.
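The first test could take roughly this shape. FakeEnv, FakeBackend, MiniEvaluator and test_fake_backend_routes_calls are hypothetical names for this sketch (the real Evaluator needs far more setup), and only copy_out is faked on the environment. Files are served from a dict, so nothing shells out.

```python
from __future__ import annotations

from typing import Any


class FakeEnv:
    """In-memory stand-in for a container: 'files' are just dict entries."""

    def __init__(self, files: dict[str, bytes], *, cwd: str, timeout: int) -> None:
        self.files = dict(files)
        self.cwd = cwd
        self.default_timeout = timeout
        self.copy_out_calls: list[str] = []

    def copy_out(self, container_path: str, *, timeout: int | None = None) -> tuple[bytes, str]:
        self.copy_out_calls.append(container_path)
        return self.files[container_path], f"fake-copy-out {container_path}"


class FakeBackend:
    """Records every call so a test can assert that routing went through it."""

    def __init__(self, files: dict[str, bytes]) -> None:
        self.files = files
        self.new_env_calls: list[dict[str, Any]] = []
        self.removed: list[str] = []

    def new_env(self, image: str, *, cwd: str, timeout: int, cpus: float | None = None,
                env: dict[str, str] | None = None, run_args: list[str] | None = None) -> FakeEnv:
        self.new_env_calls.append({"image": image, "cwd": cwd, "run_args": run_args})
        return FakeEnv(self.files, cwd=cwd, timeout=timeout)

    def remove_image(self, image_ref: str) -> None:
        self.removed.append(image_ref)


class MiniEvaluator:
    """Hypothetical two-method stand-in for Evaluator."""

    def __init__(self, *, backend: Any) -> None:
        self.backend = backend

    def _new_env(self, image: str) -> Any:
        return self.backend.new_env(image, cwd="/work", timeout=30, run_args=["--init"])

    def _copy_file_from_container(self, env: Any, container_path: str) -> bytes:
        contents, _command = env.copy_out(container_path, timeout=env.default_timeout)
        return contents


def test_fake_backend_routes_calls() -> None:
    backend = FakeBackend({"/work/results.json": b"{}"})
    evaluator = MiniEvaluator(backend=backend)
    env = evaluator._new_env("task-image:latest")
    assert isinstance(env, FakeEnv)  # came from the fake, not from Docker
    assert evaluator._copy_file_from_container(env, "/work/results.json") == b"{}"
    assert backend.new_env_calls == [
        {"image": "task-image:latest", "cwd": "/work", "run_args": ["--init"]}
    ]
    assert env.copy_out_calls == ["/work/results.json"]
```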

Motivating use case

My infra runs the eval inside Daytona sandboxes, which can't run Docker. Today I monkey-patch programbench.container.ContainerEnvironment and Evaluator._copy_file_from_container at runtime; after this PR both patches go away.
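Any object with the right methods can be passed as backend=. As an illustration only, here is a hypothetical LocalProcessBackend that runs commands in a host temp directory. It is not the Daytona setup (whose API is not shown here), it provides no isolation at all, the execute result shape {"output", "returncode"} is assumed, and copy_in_tar and commit are left out.

```python
from __future__ import annotations

import os
import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path
from typing import Any


class LocalProcessEnv:
    """Hypothetical non-container environment rooted in a host temp dir (no isolation)."""

    def __init__(self, *, cwd: str, timeout: int, env: dict[str, str] | None = None) -> None:
        self._root = Path(tempfile.mkdtemp(prefix="pb-local-"))
        self.cwd = cwd
        self.default_timeout = timeout
        self._env = {**os.environ, **(env or {})}

    def _host(self, container_path: str) -> Path:
        # Map an absolute "container" path onto the temp root.
        return self._root / container_path.lstrip("/")

    def execute(self, command: str, *, timeout: int | None = None) -> dict[str, Any]:
        workdir = self._host(self.cwd)
        workdir.mkdir(parents=True, exist_ok=True)
        proc = subprocess.run(
            command,
            shell=True,
            cwd=workdir,
            env=self._env,
            capture_output=True,
            text=True,
            timeout=timeout if timeout is not None else self.default_timeout,
        )
        return {"output": proc.stdout + proc.stderr, "returncode": proc.returncode}

    def copy_in(self, host_path: str, container_path: str) -> None:
        dest = self._host(container_path)
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy(host_path, dest)

    def copy_out(self, container_path: str, *, timeout: int | None = None) -> tuple[bytes, str]:
        src = self._host(container_path)
        # The returned string is an equivalent shell command, used only for logging.
        return src.read_bytes(), shlex.join(["cat", str(src)])

    def cleanup(self) -> None:
        shutil.rmtree(self._root, ignore_errors=True)


class LocalProcessBackend:
    def new_env(self, image: str, *, cwd: str, timeout: int, cpus: float | None = None,
                env: dict[str, str] | None = None,
                run_args: list[str] | None = None) -> LocalProcessEnv:
        # image, cpus and run_args have no meaning without a container runtime.
        return LocalProcessEnv(cwd=cwd, timeout=timeout, env=env)

    def remove_image(self, image_ref: str) -> None:
        pass  # no images are ever created
```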

The eval pipeline assumes Docker is the only container runtime: `Evaluator._new_env`
constructs `ContainerEnvironment(...)` directly, and `_copy_file_from_container`
shells `docker cp` by reaching into `env.executable` / `env.container_id`.
That blocks anyone running ProgramBench inside an environment that doesn't
support Docker-in-Docker (cloud sandboxes, gVisor, firecracker, most CI
runners that share a single Docker daemon). They have to monkey-patch
internals to swap the backend.

This change introduces two Protocols (`Environment`, `ContainerBackend`)
and threads an optional `backend` kwarg through `Evaluator.__init__`. The
default is `DockerBackend()`, which keeps today's behavior; the existing
test suite passes unmodified.

Concrete changes:

- `Environment` Protocol codifies the surface `Evaluator` already uses:
  `execute`, `copy_in`, `copy_in_tar`, `copy_out` (new), `commit`,
  `cleanup`, plus the `cwd` and `default_timeout` attributes that
  `_run_test_branch` reads directly.

- `ContainerEnvironment.copy_out(container_path, *, timeout)` is new.
  It replaces the direct `docker cp` call that lived in
  `_copy_file_from_container`. Returns `(contents, command_string)` so
  the caller can log the wire command without knowing the backend's
  format.

- `ContainerBackend` Protocol with `new_env(image, *, cwd, timeout, cpus,
  env, run_args)` and `remove_image(image_ref)`.

- `DockerBackend` is the default implementation: wraps
  `ContainerEnvironment` and the module-level `remove_image` helper.
  Preserves the `executable` and base `run_args` plumbing.

- `Evaluator.__init__` accepts `backend: ContainerBackend | None = None`,
  defaulting to `DockerBackend(executable=DOCKER_EXECUTABLE, run_args=DOCKER_RUN_ARGS)`.

- `Evaluator._new_env` delegates to `self.backend.new_env(...)`. The
  baseline xdist hardening (PYTEST_ADDOPTS, serial_pytest, --init in
  run_args) is unchanged.

- `Evaluator._copy_file_from_container` collapses from ~45 lines of
  subprocess plumbing to a single `env.copy_out(...)` call plus the
  same step-log entry shape it always wrote. The `tempfile` and
  `subprocess` imports are dropped from eval.py (they are now only
  needed in container.py).

- `Evaluator.run`'s final `remove_image(committed_image, ...)` call
  becomes `self.backend.remove_image(committed_image)`.
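On the `Evaluator` side, the three rewired call sites can be sketched together. This is a hypothetical cut-down `Evaluator`: the `run()` flow (one environment, one commit, one copy) is invented for illustration, the step-log entry shape `{"step", "command", "path"}` is assumed rather than taken from the PR, and `_new_env` omits the `PYTEST_ADDOPTS` / `serial_pytest` hardening.

```python
from __future__ import annotations

from typing import Any


class Evaluator:
    """Hypothetical cut-down Evaluator showing only the call sites the PR rewires."""

    def __init__(self, *, backend: Any, image: str, cwd: str = "/work", timeout: int = 60) -> None:
        self.backend = backend
        self.image = image
        self.cwd = cwd
        self.timeout = timeout
        self.step_log: list[dict[str, Any]] = []  # assumed log shape

    def _new_env(self, image: str) -> Any:
        # Previously: ContainerEnvironment(...) constructed directly.
        return self.backend.new_env(
            image, cwd=self.cwd, timeout=self.timeout, cpus=None, env=None, run_args=["--init"]
        )

    def _copy_file_from_container(self, env: Any, container_path: str) -> bytes:
        # Previously: ~45 lines of tempfile + subprocess + `docker cp` plumbing.
        contents, command = env.copy_out(container_path, timeout=env.default_timeout)
        self.step_log.append({"step": "copy_out", "command": command, "path": container_path})
        return contents

    def run(self, container_path: str) -> bytes:
        env = self._new_env(self.image)
        committed_image: str | None = None
        try:
            committed_image = env.commit()
            return self._copy_file_from_container(env, container_path)
        finally:
            env.cleanup()
            if committed_image is not None:
                # Previously: module-level remove_image(committed_image, ...).
                self.backend.remove_image(committed_image)
```

Putting the removal in `finally` means the committed image is cleaned up through the same backend that created it, even when copying fails.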

Tests:

- `tests/test_pluggable_backend.py` adds 3 tests:
  * Building an Evaluator with a FakeBackend routes `_new_env` and
    `_copy_file_from_container` through the fake (no Docker shellouts).
  * No `backend=` kwarg → DockerBackend.
  * DockerBackend satisfies the ContainerBackend Protocol structurally.

- All 28 existing tests still pass; 31/31 with the new ones.

Motivating use case: saul-agent-infra runs the eval inside Daytona
sandboxes, which can't run Docker. Today we monkey-patch
`programbench.container.ContainerEnvironment`, `Evaluator._copy_file_from_container`,
and `Evaluator._restore_executable`. After this PR the first two go
away entirely. (The third, swapping `mv` for `cp` to handle serial
test branches without re-imaging, is a small separate fix worth its
own discussion.)

meta-cla Bot commented May 16, 2026

Hi @frmsaul!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla Bot added the CLA Signed label May 17, 2026

meta-cla Bot commented May 17, 2026

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!
