Make ContainerEnvironment pluggable so non-Docker backends can run eval by frmsaul · Pull Request #25 · facebookresearch/ProgramBench

frmsaul · 2026-05-16T23:35:27Z

Problem

Evaluator in programbench/eval/eval.py assumes Docker is the only container runtime. _new_env constructs ContainerEnvironment(...) directly, and _copy_file_from_container shells out to docker cp by reaching into env.executable / env.container_id. That blocks downstream callers running ProgramBench in environments that don't support Docker-in-Docker (cloud sandboxes, gVisor, firecracker, CI runners that share a single Docker daemon). The only workaround today is monkey-patching programbench.container.ContainerEnvironment and Evaluator._copy_file_from_container at runtime. That is fragile and likely to break whenever those internals change.
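To make the coupling concrete, here is a minimal sketch of the kind of command the current copy path has to assemble. `DockerHandle` and `docker_cp_argv` are hypothetical names for illustration. Only the `docker cp CONTAINER:SRC_PATH DEST_PATH` form is the real Docker CLI syntax.

```python
import shlex
from dataclasses import dataclass


@dataclass
class DockerHandle:
    # Hypothetical stand-in for the two attributes the PR says the current
    # _copy_file_from_container reads off a ContainerEnvironment.
    executable: str
    container_id: str


def docker_cp_argv(env: DockerHandle, container_path: str, host_path: str) -> list[str]:
    # `docker cp CONTAINER:SRC_PATH DEST_PATH` copies a path out of a container.
    # Both the CLI name and the container ID are Docker-specific, so a sandbox
    # without a Docker daemon has nothing meaningful to put here.
    return [env.executable, "cp", f"{env.container_id}:{container_path}", host_path]


argv = docker_cp_argv(DockerHandle("docker", "abc123"), "/work/out.txt", "/tmp/out.txt")
print(shlex.join(argv))  # docker cp abc123:/work/out.txt /tmp/out.txt
```

Any backend that is not Docker has to either fake these two attributes or patch the method away, which is the monkey-patching described above.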

What this PR does

Introduces two Protocols (Environment, ContainerBackend) and threads an optional backend kwarg through Evaluator.__init__. The default is DockerBackend(), which keeps behavior identical to today.
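A minimal sketch of what the two Protocols could look like with `typing.Protocol`. The member names follow the PR description, and the keyword-only parameters of `copy_out` and `new_env` are taken from it. The parameter lists of `execute`, `copy_in`, `copy_in_tar`, `commit`, and `cleanup`, and every type annotation, are guesses and not ProgramBench's actual signatures.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class Environment(Protocol):
    # Attributes that, per the PR, _run_test_branch reads directly.
    cwd: str
    default_timeout: int

    # Method names come from the PR; parameter lists other than copy_out's
    # are assumptions made for this sketch.
    def execute(self, command: str, *, timeout: int | None = None) -> Any: ...
    def copy_in(self, host_path: str, container_path: str) -> None: ...
    def copy_in_tar(self, tar_path: str, container_dir: str) -> None: ...
    def copy_out(self, container_path: str, *, timeout: int) -> tuple[str, str]: ...
    def commit(self) -> str: ...
    def cleanup(self) -> None: ...


@runtime_checkable
class ContainerBackend(Protocol):
    # Keyword-only parameters as listed in the PR description; types are guesses.
    def new_env(
        self,
        image: str,
        *,
        cwd: str,
        timeout: int,
        cpus: float,
        env: dict[str, str],
        run_args: list[str],
    ) -> Environment: ...

    def remove_image(self, image_ref: str) -> None: ...
```

Because these are structural types, a backend does not need to inherit from anything. Any class that provides the right members qualifies, which is what lets a Docker-free implementation live outside ProgramBench entirely.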

- ContainerEnvironment.copy_out(container_path, *, timeout) is new. It encapsulates the docker cp logic that previously lived inline in _copy_file_from_container and returns (contents, command_string), so callers can log the wire command without knowing the backend's format.
- DockerBackend is the default implementation. It wraps the existing ContainerEnvironment and the module-level remove_image.
- Evaluator._new_env and the final-cleanup remove_image call now delegate to self.backend. _copy_file_from_container collapses from ~45 lines of subprocess plumbing to a single env.copy_out(...) call.
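To show the copy_out contract from the caller's side, here is a sketch of a non-Docker environment in which a host directory plays the role of the container. `LocalDirEnvironment`, `copy_file_from_container`, and the step-log dict are hypothetical. Only the `(contents, command_string)` return shape and the `timeout` keyword come from the PR.

```python
from __future__ import annotations

import shlex
import tempfile
from pathlib import Path


class LocalDirEnvironment:
    """Hypothetical non-Docker environment: a host directory plays the container."""

    def __init__(self, root: Path, cwd: str = "/", default_timeout: int = 60) -> None:
        self.root = root
        self.cwd = cwd
        self.default_timeout = default_timeout

    def copy_out(self, container_path: str, *, timeout: int) -> tuple[str, str]:
        # Return shape from the PR: (contents, command_string). A local read has
        # nothing to time out, so `timeout` is accepted only for interface parity.
        host_path = self.root / container_path.lstrip("/")
        # Human-readable equivalent for logs; nothing is executed here.
        command = f"cat {shlex.quote(str(host_path))}"
        return host_path.read_text(), command


def copy_file_from_container(env, container_path: str, step_log: list[dict]) -> str:
    # Backend-agnostic caller: it never sees a CLI name or a container ID.
    contents, command = env.copy_out(container_path, timeout=env.default_timeout)
    step_log.append({"command": command, "path": container_path})  # hypothetical log shape
    return contents


log: list[dict] = []
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "out.txt").write_text("hello\n")
    env = LocalDirEnvironment(Path(tmp))
    text = copy_file_from_container(env, "/out.txt", log)

print(repr(text))  # 'hello\n'
```

Returning the command string alongside the contents keeps logging intact: each backend reports whatever its "wire command" looks like, and the caller records it verbatim.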

Backwards compatibility

- Default backend=None → DockerBackend(executable=DOCKER_EXECUTABLE, run_args=DOCKER_RUN_ARGS): same behavior, same --init, same xdist hardening.
- ContainerEnvironment's public surface is unchanged; copy_out is additive.
- All 28 existing tests still pass; 31/31 pass with the 3 new ones.
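The backwards-compatible default amounts to a `None` check in the constructor. The sketch below assumes a stripped-down `Evaluator` that takes only `backend`. The real one has other parameters, and the constant values here are placeholders, not ProgramBench's.

```python
from __future__ import annotations

from typing import Any

# Placeholder values for this sketch; ProgramBench defines the real constants.
DOCKER_EXECUTABLE = "docker"
DOCKER_RUN_ARGS: list[str] = []


class DockerBackend:
    """Stub with the constructor the PR describes; the real class wraps ContainerEnvironment."""

    def __init__(self, executable: str, run_args: list[str]) -> None:
        self.executable = executable
        self.run_args = list(run_args)

    def new_env(self, image: str, **kwargs: Any) -> Any:
        raise NotImplementedError("the real backend starts a Docker container here")

    def remove_image(self, image_ref: str) -> None:
        raise NotImplementedError("the real backend removes the image here")


class Evaluator:
    def __init__(self, backend: Any | None = None) -> None:
        # backend=None reproduces the pre-PR wiring from the same module constants,
        # so existing callers that never pass backend= see no difference.
        if backend is None:
            backend = DockerBackend(executable=DOCKER_EXECUTABLE, run_args=DOCKER_RUN_ARGS)
        self.backend = backend


default = Evaluator()
print(type(default.backend).__name__, default.backend.executable)  # DockerBackend docker
```

Using `None` as the sentinel, rather than a `DockerBackend()` default argument, avoids sharing one backend instance across every `Evaluator` and keeps the constants read at construction time.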

Tests

tests/test_pluggable_backend.py covers:

- Injecting a custom FakeBackend routes _new_env and _copy_file_from_container through the fake (no Docker shellouts in the test).
- Omitting backend= produces a DockerBackend.
- DockerBackend satisfies the ContainerBackend Protocol structurally.
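A sketch of what tests along these lines might look like. `FakeEnv`, `FakeBackend`, and especially `MiniEvaluator` are hypothetical stand-ins, not the contents of tests/test_pluggable_backend.py. The Protocol's `new_env` is simplified to `**kwargs` because the structural check below ignores signatures anyway.

```python
from __future__ import annotations

from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class ContainerBackend(Protocol):
    def new_env(self, image: str, **kwargs: Any) -> Any: ...
    def remove_image(self, image_ref: str) -> None: ...


class FakeEnv:
    cwd = "/work"
    default_timeout = 30

    def copy_out(self, container_path: str, *, timeout: int) -> tuple[str, str]:
        # Canned contents; no subprocess, no Docker.
        return f"contents of {container_path}", f"fake-copy {container_path}"


class FakeBackend:
    """Records calls so the test can assert on routing."""

    def __init__(self) -> None:
        self.images: list[str] = []
        self.removed: list[str] = []

    def new_env(self, image: str, **kwargs: Any) -> FakeEnv:
        self.images.append(image)
        return FakeEnv()

    def remove_image(self, image_ref: str) -> None:
        self.removed.append(image_ref)


class MiniEvaluator:
    """Hypothetical stand-in with just the two methods the PR's test exercises."""

    def __init__(self, backend: ContainerBackend) -> None:
        self.backend = backend

    def _new_env(self, image: str) -> Any:
        return self.backend.new_env(image, cwd="/work", timeout=30, cpus=1.0, env={}, run_args=[])

    def _copy_file_from_container(self, env: Any, path: str) -> str:
        contents, _command = env.copy_out(path, timeout=env.default_timeout)
        return contents


def test_custom_backend_is_routed() -> None:
    backend = FakeBackend()
    ev = MiniEvaluator(backend)
    env = ev._new_env("example/image:latest")
    assert backend.images == ["example/image:latest"]
    assert ev._copy_file_from_container(env, "/work/result.txt") == "contents of /work/result.txt"


def test_fake_backend_satisfies_protocol() -> None:
    # runtime_checkable only checks that the members exist, not their signatures.
    assert isinstance(FakeBackend(), ContainerBackend)
    assert not isinstance(object(), ContainerBackend)


if __name__ == "__main__":
    test_custom_backend_is_routed()
    test_fake_backend_satisfies_protocol()
    print("ok")
```

Note the limit of the structural test: `isinstance` against a `runtime_checkable` Protocol confirms that `new_env` and `remove_image` exist, but a static type checker is still needed to catch a wrong parameter list.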

Motivating use case

My infra runs the eval inside Daytona sandboxes, which can't run Docker. Today I monkey-patch programbench.container.ContainerEnvironment and Evaluator._copy_file_from_container at runtime; after this PR both patches go away.

The eval pipeline assumes Docker is the only container runtime: `Evaluator._new_env` constructs `ContainerEnvironment(...)` directly, and `_copy_file_from_container` shells out to `docker cp` by reaching into `env.executable` / `env.container_id`. That blocks anyone running ProgramBench inside an environment that doesn't support Docker-in-Docker (cloud sandboxes, gVisor, firecracker, most CI runners that share a single Docker daemon); they have to monkey-patch internals to swap the backend.

This change introduces two Protocols (`Environment`, `ContainerBackend`) and threads an optional `backend` kwarg through `Evaluator.__init__`. The default is `DockerBackend()`, so behavior is the same as today; the existing test suite passes unmodified.

Concrete changes:

- `Environment` Protocol codifies the surface `Evaluator` already uses: `execute`, `copy_in`, `copy_in_tar`, `copy_out` (new), `commit`, `cleanup`, plus the `cwd` and `default_timeout` attributes that `_run_test_branch` reads directly.
- `ContainerEnvironment.copy_out(container_path, *, timeout)` is new. It replaces the direct `docker cp` call that lived in `_copy_file_from_container` and returns `(contents, command_string)`, so the caller can log the wire command without knowing the backend's format.
- `ContainerBackend` Protocol with `new_env(image, *, cwd, timeout, cpus, env, run_args)` and `remove_image(image_ref)`.
- `DockerBackend` is the default implementation. It wraps `ContainerEnvironment` and the module-level `remove_image` helper, preserving the `executable` and base `run_args` plumbing.
- `Evaluator.__init__` accepts `backend: ContainerBackend | None = None`, defaulting to `DockerBackend(executable=DOCKER_EXECUTABLE, run_args=DOCKER_RUN_ARGS)`.
- `Evaluator._new_env` delegates to `self.backend.new_env(...)`. The baseline xdist hardening (PYTEST_ADDOPTS, serial_pytest, --init in run_args) is unchanged.
- `Evaluator._copy_file_from_container` collapses from ~45 lines of subprocess plumbing to a single `env.copy_out(...)` call plus the same step-log entry shape it always wrote. The `tempfile` and `subprocess` imports are removed from eval.py; they are now only needed in container.py.
- `Evaluator.run`'s final `remove_image(committed_image, ...)` call becomes `self.backend.remove_image(committed_image)`.

Tests:

- `tests/test_pluggable_backend.py` adds 3 examples:
  * Building an Evaluator with a FakeBackend routes `_new_env` and `_copy_file_from_container` through the fake (no Docker shellouts).
  * No `backend=` kwarg → DockerBackend.
  * DockerBackend satisfies the ContainerBackend Protocol structurally.
- All 28 existing tests still pass; 31/31 with the new ones.

Motivating use case: saul-agent-infra runs the eval inside Daytona sandboxes, which can't run Docker. Today we monkey-patch `programbench.container.ContainerEnvironment`, `Evaluator._copy_file_from_container`, and `Evaluator._restore_executable`. After this PR the first two go away entirely. (The third, swapping `mv` for `cp` to handle serial test branches without re-imaging, is a small separate fix worth its own discussion.)
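Why `mv` versus `cp` matters for serial branches can be shown in a few lines. This sketch assumes that `_restore_executable` puts back a saved copy of the program under test before each serial test branch. The PR only says the fix swaps `mv` for `cp`, so the file names and flow here are hypothetical. `shutil.move` and `shutil.copy2` stand in for `mv` and `cp`.

```python
import shutil
import tempfile
from pathlib import Path


def restore(saved: Path, target: Path, *, use_copy: bool) -> None:
    # Hypothetical restore step run before each serial test branch.
    if use_copy:
        shutil.copy2(saved, target)  # like `cp`: `saved` survives for the next branch
    else:
        shutil.move(saved, target)  # like `mv`: `saved` is consumed by the first call


def branches_restored(n: int, *, use_copy: bool) -> int:
    """Count how many of n serial branches get a restored executable."""
    with tempfile.TemporaryDirectory() as tmp:
        saved = Path(tmp) / "saved_bin"
        target = Path(tmp) / "bin"
        saved.write_bytes(b"\x7fELF")  # dummy bytes standing in for the executable
        restored = 0
        for _ in range(n):
            try:
                restore(saved, target, use_copy=use_copy)
            except FileNotFoundError:
                break  # nothing left to restore from
            restored += 1
            target.unlink()  # simulate a branch that deletes or replaces the executable
        return restored


print(branches_restored(3, use_copy=False))  # 1
print(branches_restored(3, use_copy=True))  # 3
```

With a move, only the first branch can be restored without re-imaging the container. A copy leaves the saved file in place for every later branch.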

meta-cla · 2026-05-16T23:35:32Z

Hi @frmsaul!

Thank you for your pull request and welcome to our community.

Action Required

In order to merge any pull request (code, docs, etc.), we require contributors to sign our Contributor License Agreement, and we don't seem to have one on file for you.

Process

In order for us to review and merge your suggested changes, please sign at https://code.facebook.com/cla. If you are contributing on behalf of someone else (e.g., your employer), the individual CLA may not be sufficient and your employer may need to sign the corporate CLA.

Once the CLA is signed, our tooling will perform checks and validations. Afterwards, the pull request will be tagged with CLA signed. The tagging process may take up to 1 hour after signing. Please give it that time before contacting us about it.

If you have received this in error or have any questions, please contact us at cla@meta.com. Thanks!

meta-cla · 2026-05-17T02:17:21Z

Thank you for signing our Contributor License Agreement. We can now accept your code for this (and any) Meta Open Source project. Thanks!

meta-cla Bot added the CLA Signed label May 17, 2026

frmsaul wants to merge 1 commit into facebookresearch:main from frmsaul:pluggable-container-backend