Sandbox

Sandbox is a TypeScript-first Node.js library for running AI-agent work inside libkrun-backed microVMs. It gives agent builders a small API for booting isolated Linux workers, mounting host-controlled filesystems, persisting rootfs mutations, and enforcing explicit network egress policy.

Use Sandbox when your agent needs to run tools, install packages, clone repos, execute untrusted code, or call external APIs without handing the guest broad host filesystem or network access.

Sandbox is designed for:

  • agent runtimes that need disposable or durable Linux workspaces,
  • coding agents that need real shells, compilers, package managers, and repo checkouts,
  • browser or data agents that need tightly-scoped outbound network access,
  • hosted agent platforms that need per-task isolation with policy and audit points in TypeScript,
  • systems that need to broker credentials from the host without putting long-lived secrets inside the guest.

Example

This example creates a durable agent lane, mounts a host-controlled workspace, allows DNS through Cloudflare, allows only GitHub API HTTP traffic, and injects a short-lived GitHub token from the host before the request leaves the sandbox.

import {
  defineSandbox,
  fs,
  network,
  rootfs,
  type SandboxBlockStore,
} from "@torkbot/sandbox";

const workspace = fs.memory({
  files: {
    "/task.txt": "Summarize the current GitHub user\n",
  },
});

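// BlobBackedBlockStore is an application-provided placeholder for your own
// SandboxBlockStore implementation; it is not part of the import above.
const writableRootfs: SandboxBlockStore = new BlobBackedBlockStore({
  bucket: "agent-rootfs-overlays",
  keyPrefix: "lanes/github-worker",
});

// GitHubInstallationTokenService is likewise application code that issues
// short-lived tokens on the host.
const githubTokens = new GitHubInstallationTokenService({
  installationId: 123456,
});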

const sandbox = defineSandbox({
  rootfs: rootfs.cow({
    base: rootfs.builtIn("alpine:3.23"),
    writable: writableRootfs,
  }),
  resources: {
    cpus: 4,
    memoryMiB: 4096,
  },
  network: network.policy(async (conn) => {
    // Let the guest resolve names, but answer DNS via an explicit resolver.
    if (conn.matchDns()?.accept({ resolvers: ["1.1.1.1"] })) return;

    // Only GitHub API HTTP(S) traffic gets HTTP middleware.
    const github = conn.matchHttp("api.github.com");
    if (!github) return;

    // Keep the credential decision in host-controlled TypeScript.
    if (!(await githubTokens.canServe(github))) return;

    github.accept(async (request) => {
      request.headers.set(
        "authorization",
        `Bearer ${await githubTokens.tokenForRequest(request)}`,
      );
    });
  }),
});

await using lane = await sandbox.boot({
  mounts: {
    "/workspace": fs.virtual(workspace),
  },
  cwd: "/workspace",
});

const result = await lane.exec("sh", [
  "-lc",
  "cat task.txt && curl -fsSL https://api.github.com/user",
]);

if (result.exitCode !== 0) {
  throw new Error(result.stderr);
}

The guest gets a normal Linux environment. The host keeps control over the workspace contents, rootfs persistence, network decisions, and credential injection.

Quick Paths

Run one isolated command

Use a built-in read-only rootfs and a memory-backed workspace:

const workspace = fs.memory({
  files: { "/hello.txt": "hello from the host\n" },
});

const sandbox = defineSandbox({
  rootfs: rootfs.builtIn("alpine:3.23"),
});

await using lane = await sandbox.boot({
  mounts: { "/workspace": fs.virtual(workspace) },
  cwd: "/workspace",
});

const result = await lane.exec("cat", ["hello.txt"]);

Give an agent a durable machine

Use rootfs.cow(...) when package installs and rootfs mutations should survive across boots. Sandbox owns the block-device protocol; your application owns the storage backend.

const source = rootfs.compose({
  base: rootfs.builtIn("alpine:3.23"),
  overlay: blockStore,
});

const sandbox = defineSandbox({
  rootfs: rootfs.cow({ source }),
});

Attach a writable COW block store to at most one running sandbox instance at a time. Create one store per lane, or enforce exclusivity in your storage layer.
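One way to enforce the exclusivity rule above inside a single host process is a small lease registry keyed by lane. This is a minimal sketch, not part of Sandbox: `LaneLeases`, `acquire`, and the lane key format are hypothetical names, and a multi-process deployment would need the equivalent lock in the storage layer itself.

```typescript
// Hypothetical in-process guard: at most one live lease per lane key.
class LaneLeases {
  private readonly held = new Set<string>();

  // Returns a release function, or throws if the lane is already attached.
  acquire(laneKey: string): () => void {
    if (this.held.has(laneKey)) {
      throw new Error(`block store for lane "${laneKey}" is already attached`);
    }
    this.held.add(laneKey);

    let released = false;
    return () => {
      if (released) return; // releasing the same lease twice is a no-op
      released = true;
      this.held.delete(laneKey);
    };
  }
}
```

Acquire the lease before booting with the lane's block store, and call the returned function once the sandbox instance has been disposed.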

Mount host-controlled data

Mounts are per boot. They are guest-visible paths backed by TypeScript filesystem implementations, not host path passthrough.

await using lane = await sandbox.boot({
  hostname: "agent-42",
  mounts: {
    "/workspace": fs.virtual(workspaceFs),
    "/mnt/shared": fs.virtual(sharedFs),
  },
  cwd: "/workspace",
});

Sandbox configures the kernel hostname during boot. Omit hostname to use the built-in default, "sandbox".

Sandbox init creates missing mount target directories immediately before attaching each virtual filesystem, matching container runtime behavior when the target parent is on init-owned tmpfs or comes from an earlier virtual mount. On durable rootfs paths that cannot be created without changing rootfs or COW semantics, boot fails with the guest init error surfaced in the host exception.

Control network egress

Networking is default-deny. Policy callbacks grant only the flows they accept:

const sandbox = defineSandbox({
  rootfs: rootfs.builtIn("alpine:3.23"),
  network: network.policy((conn) => {
    if (conn.matchDns()?.accept({ resolvers: ["1.1.1.1"] })) return;

    conn.matchHttp("api.example.com")?.accept((request) => {
      request.headers.set("authorization", `Bearer ${apiToken}`);
    });
  }),
});

Use conn.accept() for raw transport, and use the protocol match helpers when you want Sandbox to handle protocol-specific semantics. conn.matchHttp(...) does not trust the HTTP Host header; it uses trusted destination metadata and then routes accepted traffic through HTTP-family enforcement.

Broker credentials from the host

Credential injection belongs in HTTP middleware, not in the guest filesystem or environment:

const github = conn.matchHttp("api.github.com");
if (!github) return;

if (!(await policyManager.allow(github))) return;

github.accept(async (request) => {
  request.headers.set(
    "authorization",
    `Bearer ${await policyManager.tokenFor(request)}`,
  );
});

This lets the guest run ordinary tools such as curl, git, package managers, or language CLIs while the host decides which outbound requests receive credentials.
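The host-side broker behind such middleware often needs to reuse a short-lived token until it is close to expiry. The following is a minimal sketch under stated assumptions: `ShortLivedTokenCache`, `IssuedToken`, and the `issue` callback are hypothetical application code, not Sandbox APIs, and the clock is injectable only to make the behavior easy to test.

```typescript
// Hypothetical token shape returned by your own credential service.
interface IssuedToken {
  readonly value: string;
  readonly expiresAtMs: number;
}

class ShortLivedTokenCache {
  private current: IssuedToken | undefined;
  private readonly issue: () => Promise<IssuedToken>;
  private readonly refreshMarginMs: number;
  private readonly now: () => number;

  constructor(
    issue: () => Promise<IssuedToken>,
    refreshMarginMs: number,
    now: () => number = () => Date.now(),
  ) {
    this.issue = issue;
    this.refreshMarginMs = refreshMarginMs;
    this.now = now;
  }

  // Returns the cached token, issuing a new one when none exists or when
  // the cached one is within refreshMarginMs of expiring.
  // Concurrent callers may each trigger an issue; this sketch does not dedupe.
  async token(): Promise<string> {
    const remainingMs =
      this.current === undefined ? 0 : this.current.expiresAtMs - this.now();
    if (this.current === undefined || remainingMs <= this.refreshMarginMs) {
      this.current = await this.issue();
    }
    return this.current.value;
  }
}
```

Inside the middleware passed to github.accept(...), the header value would then be built from `await cache.token()`, so the token itself never reaches the guest filesystem or environment.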

API Reference

defineSandbox(options)

Creates reusable machine configuration.

const sandbox = defineSandbox({
  rootfs: rootfs.builtIn("alpine:3.23"),
  resources: {
    cpus: 4,
    memoryMiB: 4096,
  },
  network: network.policy((conn) => {
    conn.matchDns()?.accept({ resolvers: ["1.1.1.1"] });
  }),
});

defineSandbox(...) does not boot a VM. It describes rootfs, resource, and network policy defaults that can be reused across many boots.

rootfs

rootfs.builtIn("alpine:3.23");

Selects a read-only built-in rootfs artifact. Built-in rootfs artifacts are prepared at build or install time; Sandbox does not pull container images or build root filesystems during boot().

rootfs.ephemeral({
  base: rootfs.builtIn("alpine:3.23"),
  maxDirtyBytes: 64 * 1024 * 1024,
});

Mounts a built-in base rootfs through a writable in-process copy-on-write block store. Clean base-image blocks are served from the built-in artifact, dirty blocks are kept by the native runtime, and all rootfs mutations are discarded when the sandbox instance exits. Use this when a command needs to mutate the machine rootfs but those mutations must not persist across boots.

rootfs.cow({
  base: rootfs.builtIn("alpine:3.23"),
  writable: blockStore,
  maxDirtyBytes: 64 * 1024 * 1024,
});

Mounts a built-in base rootfs through a writable copy-on-write block store. For offline image work, describe the same merged view without booting a VM:

const source = rootfs.compose({
  base: rootfs.builtIn("alpine:3.23"),
  overlay: blockStore,
});

const image = await rootfs.flatten({
  format: "qcow2",
  source,
  dest: imageStore,
  clusterSize: 65536,
});

for await (const chunk of rootfs.bytes(image)) {
  await upload(chunk);
}

rootfs.ephemeral(...) and rootfs.cow(...) both present a writable block rootfs without modifying the built-in artifact. rootfs.cow(...) normalizes through the same composed source used by rootfs.flatten(...), so boot and image export share one base-plus-overlay contract. Clean base-image blocks are served from the built-in artifact. Dirty blocks are read lazily and flushed through your SandboxBlockStore. maxDirtyBytes limits how many dirty COW block bytes the native runtime buffers before forcing a write to the block store during a VM run. For rootfs.ephemeral(...), the same value is the native in-memory dirty block quota; guest writes beyond that budget fail instead of growing host memory without bound. When omitted, Sandbox uses a 64 MiB block-aligned default.
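To make "block-aligned" concrete, the arithmetic can be sketched as rounding a byte budget down to a whole number of blocks. This is an illustration only: `alignDirtyBudget` is a hypothetical helper, the 4096-byte block size is an assumption, and this README does not say whether Sandbox rounds up or down.

```typescript
// Hypothetical helper: round a dirty-byte budget down to a whole number of blocks.
function alignDirtyBudget(maxDirtyBytes: number, blockSize: number): number {
  if (!Number.isInteger(blockSize) || blockSize <= 0) {
    throw new RangeError("blockSize must be a positive integer");
  }
  if (!Number.isInteger(maxDirtyBytes) || maxDirtyBytes < blockSize) {
    throw new RangeError("budget must hold at least one block");
  }
  return Math.floor(maxDirtyBytes / blockSize) * blockSize;
}

const MiB = 1024 * 1024;

// 64 MiB is 67108864 bytes, which is exactly 16384 blocks of 4096 bytes.
const alignedDefault = alignDirtyBudget(64 * MiB, 4096);

// A budget that is not a multiple of the block size loses its remainder.
const alignedOdd = alignDirtyBudget(10_000, 4096); // 8192
```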

rootfs.flatten(...) writes a standalone QCOW2 image into dest, using dest as random-access image-byte storage. rootfs.bytes(...) streams raw image container bytes for a built-in image or a flattened image; it does not stream the filesystem contents inside the image.

interface SandboxBlockStore {
  readonly blockSize: number;
  list(context: SandboxBlockStoreContext): Promise<readonly bigint[]>;
  read(
    range: SandboxBlockRange,
    context: SandboxBlockStoreContext,
  ): Promise<readonly SandboxBlockChunk[]>;
  write(
    chunks: readonly SandboxBlockChunk[],
    context: SandboxBlockStoreContext,
  ): Promise<void>;
  flush?(context: SandboxBlockStoreContext): Promise<void>;
}

The context.base value identifies the exact built-in base image for the boot, so storage layers can namespace blocks, reject mismatched snapshots, or migrate state.

write() receives block bytes owned by the block store. The sandbox runtime will not mutate those Uint8Array values after passing them to write(), so a store may retain them for delayed persistence. A store that returns bytes from read() should treat the returned arrays as immutable after the promise resolves.
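For local testing, an in-memory store that follows the method contract above might look like the sketch below. The shapes of the context, range, and chunk types are not defined in this README, so `StoreContext`, `BlockRange`, and `BlockChunk` are assumed stand-ins with guessed field names; check them against the real SandboxBlockStoreContext, SandboxBlockRange, and SandboxBlockChunk before use. The sketch also assumes read() returns only blocks that have been written.

```typescript
// Assumed shapes: this README does not define these types, so the fields are guesses.
interface StoreContext { readonly base: string }
interface BlockRange { readonly start: bigint; readonly count: number }
interface BlockChunk { readonly index: bigint; readonly bytes: Uint8Array }

class MemoryBlockStore {
  readonly blockSize: number;
  // One map of dirty blocks per base image, so blocks from different bases never mix.
  private readonly byBase = new Map<string, Map<bigint, Uint8Array>>();

  constructor(blockSize: number) {
    this.blockSize = blockSize;
  }

  private blocksFor(context: StoreContext): Map<bigint, Uint8Array> {
    let blocks = this.byBase.get(context.base);
    if (!blocks) {
      blocks = new Map();
      this.byBase.set(context.base, blocks);
    }
    return blocks;
  }

  // Lists the indices of all stored blocks in ascending order.
  async list(context: StoreContext): Promise<readonly bigint[]> {
    const indices = [...this.blocksFor(context).keys()];
    return indices.sort((a, b) => (a < b ? -1 : a > b ? 1 : 0));
  }

  // Returns stored blocks inside the range; unwritten blocks are skipped.
  async read(range: BlockRange, context: StoreContext): Promise<readonly BlockChunk[]> {
    const blocks = this.blocksFor(context);
    const chunks: BlockChunk[] = [];
    for (let i = 0; i < range.count; i++) {
      const index = range.start + BigInt(i);
      const bytes = blocks.get(index);
      if (bytes) chunks.push({ index, bytes });
    }
    return chunks;
  }

  // Validates every chunk first so a bad chunk never causes a partial write.
  async write(chunks: readonly BlockChunk[], context: StoreContext): Promise<void> {
    for (const chunk of chunks) {
      if (chunk.bytes.byteLength !== this.blockSize) {
        throw new RangeError(`block ${chunk.index} is not ${this.blockSize} bytes`);
      }
    }
    const blocks = this.blocksFor(context);
    for (const chunk of chunks) {
      // Retained without copying: the runtime does not mutate bytes passed to write().
      blocks.set(chunk.index, chunk.bytes);
    }
  }
}
```

Keying the maps by context.base is one way to keep an overlay from being replayed on top of a different base image; a durable store could reject the mismatch instead.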

sandbox.boot(options)

Boots a sandbox instance.

await using lane = await sandbox.boot({
  mounts: {
    "/workspace": fs.virtual(workspaceFs),
  },
  cwd: "/workspace",
});

Boot options are per instance. The same sandbox definition can be booted with different mounts and working directories.

Filesystems

const workspaceFs = fs.memory({
  files: {
    "/README.md": "# Task\n",
  },
});

fs.memory(...) creates an in-memory POSIX filesystem.

const mount = fs.virtual(workspaceFs);

fs.virtual(...) adapts a compatible JavaScript filesystem for guest mounts. Sandbox mounts are host-implemented filesystems, not direct host directory mounts.

Processes

const result = await lane.exec("npm", ["test"], {
  cwd: "/workspace",
  env: { CI: "1" },
  timeoutMs: 120_000,
  signal: abortController.signal,
});

exec(...) is the buffered process API. Its promise resolves after the command exits, with exitCode, stdout, and stderr. When timeoutMs expires, Sandbox terminates the guest process group and returns exit code 124. When signal aborts, Sandbox terminates that guest process group, rejects the exec(...) promise with an AbortError, and keeps the sandbox usable for subsequent commands.
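Callers usually want to tell these outcomes apart. The sketch below wraps any exec-like call and classifies the result; `classifyExec`, `ExecResultLike`, and `ExecOutcome` are hypothetical names, and the AbortError check assumes the rejection is an Error whose name is "AbortError". Note that a command which itself exits with 124 is indistinguishable from a timeout in this scheme.

```typescript
// Structural stand-in for the buffered result described above.
interface ExecResultLike {
  readonly exitCode: number;
  readonly stdout: string;
  readonly stderr: string;
}

type ExecOutcome =
  | { kind: "ok"; stdout: string }
  | { kind: "failed"; exitCode: number; stderr: string }
  | { kind: "timeout" }
  | { kind: "aborted" };

// Runs the supplied call and maps its result or rejection to an outcome.
async function classifyExec(
  run: () => Promise<ExecResultLike>,
): Promise<ExecOutcome> {
  try {
    const result = await run();
    if (result.exitCode === 0) return { kind: "ok", stdout: result.stdout };
    if (result.exitCode === 124) return { kind: "timeout" };
    return { kind: "failed", exitCode: result.exitCode, stderr: result.stderr };
  } catch (error) {
    // Assumption: aborts surface as an Error named "AbortError".
    if (error instanceof Error && error.name === "AbortError") {
      return { kind: "aborted" };
    }
    throw error; // anything else is unexpected and should propagate
  }
}
```

A call site would pass a closure such as `() => lane.exec("npm", ["test"], { timeoutMs: 120_000 })`.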

Use spawn(...) for non-interactive long-lived processes with streaming stdin, stdout, and stderr. It returns a process handle immediately; ready tracks guest-side startup.

import { Writable } from "node:stream";

const child = lane.spawn("npm", ["test"], { cwd: "/workspace" });

child.stdout.pipeTo(Writable.toWeb(process.stdout));
child.stderr.pipeTo(Writable.toWeb(process.stderr));

const { exitCode } = await child.exit;

Use pty(...) when the guest process should see a real terminal, such as an interactive shell, REPL, pager, or terminal UI:

import { Readable, Writable } from "node:stream";

const shell = lane.pty("/bin/sh", ["-i"], {
  cwd: "/workspace",
  env: { TERM: "xterm-256color" },
  size: { rows: 40, cols: 120 },
});

Readable.toWeb(process.stdin).pipeTo(shell.input);
shell.output.pipeTo(Writable.toWeb(process.stdout));

Network Policy

const policy = network.policy(async (conn) => {
  if (conn.matchDns()?.accept({ resolvers: ["1.1.1.1"] })) return;

  const api = conn.matchHttp("api.example.com");
  if (!api) return;

  if (!(await policyManager.allow(api))) return;

  api.accept((request) => policyManager.handleHttp(request));
});

Policy callbacks are default-deny. If the callback does not create a grant, the connection is blocked.

Every policy event exposes IP-layer endpoints:

conn.src.ip;
conn.src.port;
conn.dst.ip;
conn.dst.port;

Endpoint helpers classify address ranges without relying on DNS, TLS, or HTTP:

conn.dst.isLoopback();
conn.dst.isPrivate();
conn.dst.isLinkLocal();
conn.dst.isMulticast();
conn.dst.isBroadcast();
conn.dst.isDocumentation();
conn.dst.isReserved();
conn.dst.isPublicInternet();
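These helpers make it straightforward to keep guest traffic away from host-local and internal ranges, such as link-local metadata endpoints. The sketch below is a policy body written against structural stand-in types so it can be read and tested in isolation; `EndpointLike`, `ConnLike`, and `allowPublicPort443` are hypothetical, and only members named in this README are used.

```typescript
// Structural stand-ins for the parts of the policy event used here.
// Return types are assumptions; the README does not specify them.
interface EndpointLike {
  readonly ip: string;
  readonly port: number;
  isPublicInternet(): boolean;
}

interface ConnLike {
  readonly dst: EndpointLike;
  accept(): unknown;
}

// Hypothetical policy body: allow raw transport only to public addresses on port 443.
// Returning without calling accept() leaves the connection blocked (default-deny).
function allowPublicPort443(conn: ConnLike): void {
  if (!conn.dst.isPublicInternet()) return;
  if (conn.dst.port !== 443) return;
  conn.accept();
}
```

Because the classification is based on the destination IP, it holds regardless of which hostname the guest resolved to reach that address.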

Transport and protocol helpers:

conn.accept();                  // accept raw TCP or UDP transport
conn.matchDns();                // DNS over UDP or TCP
conn.matchTcp("203.0.113.10:5432");
conn.matchUdp("203.0.113.10:8125");
conn.matchHttp("api.example.com");

conn.matchDns() normalizes DNS over UDP and TCP. The guest is configured to use Sandbox's internal resolver; policy code decides whether to accept that DNS flow and can choose explicit upstream resolvers in accept(...). Accepted DNS answers are cached as trusted, guest-scoped hostname metadata for later connection policy decisions. Sandbox may retain this attribution metadata for a bounded window after a short DNS TTL so delayed connections can still be tied back to the accepted DNS answer that produced their destination IP.
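The retention behavior described above can be pictured as a map from destination IP to hostname whose entries live for the DNS TTL plus a bounded grace window. This is a conceptual sketch, not Sandbox's implementation: `DnsAttribution`, its methods, and the grace value are hypothetical, and time is passed in explicitly to keep the example deterministic.

```typescript
// Conceptual model: IP-to-hostname attribution with TTL plus a grace window.
class DnsAttribution {
  private readonly entries = new Map<
    string,
    { hostname: string; expiresAtMs: number }
  >();
  private readonly graceMs: number;

  constructor(graceMs: number) {
    this.graceMs = graceMs;
  }

  // Records an accepted DNS answer; it stays attributable for ttl + grace.
  record(ip: string, hostname: string, ttlSeconds: number, nowMs: number): void {
    const expiresAtMs = nowMs + ttlSeconds * 1000 + this.graceMs;
    this.entries.set(ip, { hostname, expiresAtMs });
  }

  // Returns the hostname attributed to an IP, or undefined once expired.
  lookup(ip: string, nowMs: number): string | undefined {
    const entry = this.entries.get(ip);
    if (!entry) return undefined;
    if (nowMs >= entry.expiresAtMs) {
      this.entries.delete(ip); // drop stale attribution lazily
      return undefined;
    }
    return entry.hostname;
  }
}
```

With a 5 second TTL and a 30 second grace window, a connection opened 20 seconds after the answer is still attributed to the hostname, while one opened after 35 seconds is not.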

const dns = conn.matchDns();
if (dns) {
  dns.accept({ resolvers: ["1.1.1.1", "8.8.8.8"] });
}

conn.matchHttp(...) acquires an HTTP capability from trusted destination metadata, including hostnames observed from accepted DNS answers. It does not inspect or trust the HTTP Host header. IP-addressed HTTP requests can still be accepted through lower-level policy, but they do not advertise a trusted hostname. Calling accept() without middleware authorizes the matched flow without rewriting bytes; pass middleware only when Sandbox should inspect and mutate HTTP request headers.

const http = conn.matchHttp((candidate) =>
  candidate.hostname.endsWith(".example.com")
);

if (http) {
  http.accept((request) => {
    request.headers.set("x-agent-policy", "allowed");
  });
}

http.accept(...) enters Sandbox's HTTP-family enforcement path. If the matched flow is not actually HTTP or HTTPS, it fails closed.
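When writing a predicate for conn.matchHttp(...), suffix checks deserve some care: `endsWith(".example.com")` does not match the apex domain, and `endsWith("example.com")` would also match `evilexample.com`. A small helper can make the intent explicit. This is a sketch; `normalizeHost` and `isWithinDomain` are hypothetical names, and the README does not say whether candidate.hostname is already lowercased or stripped of a trailing dot, so the helper normalizes defensively.

```typescript
// Lowercases a hostname and strips a single trailing dot, if present.
function normalizeHost(hostname: string): string {
  const lower = hostname.toLowerCase();
  return lower.endsWith(".") ? lower.slice(0, -1) : lower;
}

// True when hostname is the domain itself or one of its subdomains.
function isWithinDomain(hostname: string, domain: string): boolean {
  const host = normalizeHost(hostname);
  const root = normalizeHost(domain);
  if (root.length === 0) return false;
  return host === root || host.endsWith(`.${root}`);
}
```

The predicate would then read `(candidate) => isWithinDomain(candidate.hostname, "example.com")`.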

Architecture Reference

Sandbox hides the kernel, init, transport, and host helper behind a TypeScript API:

  • The runtime boots a libkrun-backed microVM from a prebuilt rootfs artifact.
  • The built-in rootfs is a compressed QCOW2 image containing an ext4 guest filesystem.
  • A signed sandbox-host helper owns the Node/Rust/libkrun boundary.
  • Guest control traffic uses an fd-backed transport between the host and the custom Sandbox init process.
  • Host-implemented virtual filesystems are mounted into the guest.
  • Durable rootfs mutation is modeled as block-level copy-on-write storage, not as a guest-visible POSIX filesystem.
  • Network egress is default-deny and policy-controlled.
  • HTTP request middleware is caller-provided JavaScript, while Sandbox owns interception, forwarding, and certificate plumbing.

When HTTP interception is enabled, the host generates CA material and passes only the public CA certificate to Sandbox init. Init installs that CA using the selected rootfs' native trust-store mechanism only when the rootfs is writable. Read-only built-in rootfs launches keep the CA available under /run and do not mutate the trust store, so HTTP interception does not change rootfs write behavior. If a writable rootfs does not provide a supported trust-store installer, init fails closed.

The intended boundary is:

  • Sandbox owns launch, isolation, mounts, network interception, and enforcement.
  • User-space owns artifact selection, filesystem durability, network policy state, confirmation flows, audit logs, and credential brokering.

See docs/architecture.md for the design background, docs/kernel-build.md for kernel artifact builds, and docs/testing-strategy.md for the integration and e2e testing plan.

Platform Notes

The npm package is published as @torkbot/sandbox. It does not use post-install scripts. The root package contains the TypeScript API and declares platform artifacts as optional dependencies:

  • @torkbot/sandbox-darwin-arm64
  • @torkbot/sandbox-linux-x64-gnu

Each platform package contains the sandbox-host helper and built-in rootfs artifacts for that target. Runtime artifact resolution only loads the installed optional dependency for the current platform. Local development uses the same layout by materializing the current platform package under node_modules.

macOS signing setup

For now, the macOS sandbox-host artifact is not Developer ID signed or notarized. macOS users must sign the installed helper locally before launching a VM:

npx @torkbot/sandbox setup-macos

This performs an ad-hoc local codesign with the com.apple.security.hypervisor entitlement required by Hypervisor.framework. It does not contact Apple and does not require an Apple Developer account. If a macOS user tries to launch a VM before running setup, Sandbox throws a runtime error that points back to this command.

Development

Local release packaging sanity check:

npm run release:pack

After rebuilding local native artifacts, refresh the local optional package layout with:

npm run artifacts:link-current

Repository layout:

  • src/: TypeScript API consumed by Node.js callers.
  • crates/sandbox: Rust host implementation for libkrun, block storage, network, HTTP, and VFS services.
  • crates/sandbox-host: signed VM-host helper used for macOS HVF launch.
  • crates/sandbox-init: custom guest init used to configure the guest before supervising untrusted code.
  • tests/e2e: TypeScript e2e scenarios run directly with Node.js 24+ type stripping.
