From da82448f8a1b3c4845e1e3ea406302fa4ac180c9 Mon Sep 17 00:00:00 2001
From: Drew Newberry
Date: Tue, 5 May 2026 22:11:41 -0700
Subject: [PATCH 1/5] docs(architecture): consolidate architecture docs

Signed-off-by: Drew Newberry
---
 .../skills/generate-sandbox-policy/SKILL.md   |    9 +-
 AGENTS.md                                     |   11 +
 architecture/README.md                        |  325 +-
 architecture/build-containers.md              |  109 -
 architecture/build.md                         |   67 +
 architecture/ci-e2e.md                        |  198 --
 architecture/compute-runtimes.md              |   69 +
 architecture/custom-vm-runtime.md             |  343 ---
 architecture/docker-driver.md                 |  129 --
 architecture/docs-site.md                     |   53 -
 architecture/gateway-deploy-connect.md        |  140 --
 architecture/gateway-security.md              |  439 ----
 architecture/gateway-settings.md              |  562 -----
 architecture/gateway-single-node.md           |  129 --
 architecture/gateway.md                       |  759 +------
 architecture/inference-routing.md             |  359 ---
 architecture/object-metadata.md               |  426 ----
 architecture/oidc-auth.md                     |  542 -----
 architecture/oidc-local-testing.md            |  575 -----
 architecture/podman-driver.md                 |  271 ---
 architecture/podman-rootless-networking.md    |  387 ----
 architecture/policy-advisor.md                |  246 ---
 architecture/sandbox-connect.md               |  649 ------
 architecture/sandbox-custom-containers.md     |  128 --
 architecture/sandbox-providers.md             |  455 ----
 architecture/sandbox.md                       | 1941 +----------------
 architecture/security-policy.md               | 1619 +-------------
 architecture/system-architecture.md           |  212 --
 architecture/tui.md                           |  198 --
 crates/openshell-core/README.md               |   52 +
 crates/openshell-driver-docker/README.md      |   65 +
 crates/openshell-driver-kubernetes/README.md  |   49 +
 crates/openshell-driver-podman/README.md      |   74 +
 crates/openshell-driver-vm/README.md          |    4 +-
 crates/openshell-providers/README.md          |   32 +
 crates/openshell-router/README.md             |   17 +-
 36 files changed, 715 insertions(+), 10928 deletions(-)
 delete mode 100644 architecture/build-containers.md
 create mode 100644 architecture/build.md
 delete mode 100644 architecture/ci-e2e.md
 create mode 100644 architecture/compute-runtimes.md
 delete mode 100644 architecture/custom-vm-runtime.md
 delete mode 100644 architecture/docker-driver.md
 delete mode 100644 architecture/docs-site.md
 delete mode 100644 architecture/gateway-deploy-connect.md
 delete mode 100644 architecture/gateway-security.md
 delete mode 100644 architecture/gateway-settings.md
 delete mode 100644 architecture/gateway-single-node.md
 delete mode 100644 architecture/inference-routing.md
 delete mode 100644 architecture/object-metadata.md
 delete mode 100644 architecture/oidc-auth.md
 delete mode 100644 architecture/oidc-local-testing.md
 delete mode 100644 architecture/podman-driver.md
 delete mode 100644 architecture/podman-rootless-networking.md
 delete mode 100644 architecture/policy-advisor.md
 delete mode 100644 architecture/sandbox-connect.md
 delete mode 100644 architecture/sandbox-custom-containers.md
 delete mode 100644 architecture/sandbox-providers.md
 delete mode 100644 architecture/system-architecture.md
 delete mode 100644 architecture/tui.md
 create mode 100644 crates/openshell-core/README.md
 create mode 100644 crates/openshell-driver-docker/README.md
 create mode 100644 crates/openshell-driver-kubernetes/README.md
 create mode 100644 crates/openshell-driver-podman/README.md
 create mode 100644 crates/openshell-providers/README.md

diff --git a/.agents/skills/generate-sandbox-policy/SKILL.md b/.agents/skills/generate-sandbox-policy/SKILL.md
index 95767fe3f..97f6dbe31 100644
--- a/.agents/skills/generate-sandbox-policy/SKILL.md
+++ b/.agents/skills/generate-sandbox-policy/SKILL.md
@@ -155,11 +155,11 @@ You may need to go back and forth a few times.
 Keep the loop tight:
 
 Read the full policy schema reference:
 
 ```
-Read architecture/security-policy.md
+Read docs/reference/policy-schema.mdx
 ```
 
 Key sections to reference:
 
-- **Full YAML Policy Schema** — top-level structure
+- **Policy Schema Reference** — top-level structure
 - **`network_policies`** — rule structure
 - **`NetworkEndpoint`** fields — host, port, protocol, tls, enforcement, access, rules, allowed_ips
 - **`L7Rule` / `L7Allow`** — method + path matching
@@ -167,7 +167,7 @@ Key sections to reference:
 - **Private IP Access via `allowed_ips`** — CIDR allowlist for private IP space
 - **Validation Rules** — what combinations are valid/invalid
 
-Also read the example policy for real-world patterns. The default policy is baked into the community base image (`ghcr.io/nvidia/openshell-community/sandboxes/base:latest`). For reference, consult the policy schema documentation:
+Also read the architecture overview for enforcement context. The default policy is baked into the community base image (`ghcr.io/nvidia/openshell-community/sandboxes/base:latest`).
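[Editor's illustration] Putting the schema sections above together, a `network_policies` entry might look like the following sketch. Only the field names (`host`, `port`, `protocol`, `tls`, `enforcement`, `access`, `rules`, `allowed_ips`, method + path matching) come from the sections listed above; the nesting, the example hosts, and the example values are assumptions, not a copy of the real schema:

```yaml
# Illustrative shape only; consult the policy schema reference for the
# authoritative structure and validation rules.
network_policies:
  - host: api.example.com         # hypothetical external endpoint
    port: 443
    protocol: tcp
    tls: true
    enforcement: enforce          # vs. audit mode
    access: allow
    rules:                        # L7 rules: method + path matching
      - method: GET
        path: /v1/status
  - host: internal-service.local  # hypothetical private service
    port: 8080
    allowed_ips:
      - 10.0.0.0/8                # CIDR allowlist for private IP space
```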
+For reference, consult:
 ```
 Read architecture/security-policy.md
 ```
@@ -567,7 +567,8 @@ private_services:
 
 ## Additional Resources
 
-- Full policy schema: [architecture/security-policy.md](../../../architecture/security-policy.md)
+- Full policy schema: [docs/reference/policy-schema.mdx](../../../docs/reference/policy-schema.mdx)
+- Enforcement overview: [architecture/security-policy.md](../../../architecture/security-policy.md)
 - Default policy: baked into the community base image (`ghcr.io/nvidia/openshell-community/sandboxes/base:latest`)
 - Rego evaluation rules: [sandbox-policy.rego](../../../crates/openshell-sandbox/data/sandbox-policy.rego)
 - For translation examples from real API docs, see [examples.md](examples.md)
diff --git a/AGENTS.md b/AGENTS.md
index 93062fd5f..3fc34ad55 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -196,6 +196,17 @@ ocsf_emit!(event);
 - Fern PR previews run through `.github/workflows/branch-docs.yml`, and production publish runs through the `publish-fern-docs` job in `.github/workflows/release-tag.yml`.
 - Use the `update-docs` skill to scan recent commits and draft doc updates.
 
+### Architecture Docs
+
+- Architecture docs are short canonical subsystem overviews, not exhaustive implementation notes.
+- Update one of the existing top-level architecture docs before adding a new file.
+- Put useful crate-specific details in the relevant crate `README.md`.
+- Add a new top-level architecture doc only when explicitly requested or when an RFC-level design needs a stable home.
+- Keep architecture docs focused on stable boundaries, data/control flow, invariants, and operational constraints.
+- Remove stale detail instead of preserving it by default.
+- Do not include testing transcripts, historical debugging notes, long source-file inventories, or field-by-field schema references.
+- Put user-facing instructions in `docs/`, broad design proposals in `rfc/`, and temporary plans in ignored `architecture/plans/`.
+
 ## Security
 
 - Never commit secrets, API keys, or credentials. If a file looks like it contains secrets (`.env`, `credentials.json`, etc.), do not stage it.
diff --git a/architecture/README.md b/architecture/README.md
index 9eb109b04..3e566364d 100644
--- a/architecture/README.md
+++ b/architecture/README.md
@@ -1,290 +1,67 @@
-# System Overview
+# OpenShell Architecture
 
-## What This Project Does
-
-This project is a platform for securely running AI agents in isolated sandbox environments. AI agents -- tools that can read, write, and execute code on a user's behalf -- need to operate with real system access to be useful, but granting that access without guardrails poses serious security risks. An unconstrained agent could read sensitive files, exfiltrate data over the network, or execute dangerous system calls.
-
-This platform solves that problem by creating sandboxed execution environments where agents run with exactly the permissions they need and nothing more. Every sandbox is governed by a policy that defines which files the agent can access, which network hosts it can reach, and which system operations it can perform. All outbound network traffic is forced through a controlled proxy that inspects and enforces access rules in real time.
-
-The platform runs a gateway control plane and uses a configured compute driver to run agents in isolated sandbox environments. Supported compute platforms include Docker, Podman, Kubernetes, and the experimental MicroVM runtime. The system handles credential management, policy enforcement, and secure remote access while leaving runtime and cluster provisioning to the operator.
-
-## How the Subsystems Fit Together
-
-The following diagram shows how the major subsystems interact at a high level. Users interact through the CLI, which communicates with a central gateway. The gateway manages sandbox lifecycle through a compute driver, and each sandbox enforces its own policy locally. Inference API calls to `inference.local` are routed locally within the sandbox by an embedded inference router, without traversing the gateway at request time.
+OpenShell runs AI agents in sandboxed environments behind a gateway control
+plane. The gateway owns API access, persistence, credentials, and lifecycle
+orchestration. A compute runtime creates sandbox workloads. Each sandbox runs a
+supervisor that launches the agent as a restricted child process and enforces
+policy locally.
 
 ```mermaid
 flowchart TB
-    subgraph USER["User's Machine"]
-        CLI["Command-Line Interface"]
+    CLI["CLI / SDK / TUI"] -->|"gRPC or HTTP"| GW["Gateway"]
+    GW --> DB[("Gateway database")]
+    GW --> DRIVER["Compute runtime<br/>Docker, Podman, Kubernetes, VM"]
+    DRIVER --> SBX["Sandbox workload"]
+
+    subgraph SBX["Sandbox workload"]
+        SUP["Supervisor"]
+        PROXY["Policy proxy"]
+        ROUTER["Inference router"]
+        AGENT["Agent process"]
+        POLICY["OPA policy engine"]
+        SUP --> AGENT
+        AGENT --> PROXY
+        PROXY --> POLICY
+        PROXY --> ROUTER
     end
-    subgraph PLATFORM["Compute Platform"]
-        SERVER["Gateway / Control Plane"]
-        DB["Database (SQLite or Postgres)"]
-        DRIVER["Compute Driver<br/>(Docker, Podman,<br/>Kubernetes, VM)"]
-
-        subgraph SBX["Sandbox Workload"]
-            SUPERVISOR["Sandbox Supervisor"]
-            PROXY["Network Proxy"]
-            ROUTER["Inference Router"]
-            CHILD["Agent Process (restricted)"]
-            OPA["Policy Engine (OPA)"]
-        end
-    end
-
-    subgraph EXT["External Services"]
-        HOSTS["Allowed Hosts (github.com, api.anthropic.com, ...)"]
-        CREDS["Provider APIs (Claude, GitHub, GitLab, ...)"]
-        BACKEND["Inference Backends (OpenAI, Anthropic, NVIDIA, local)"]
-    end
-
-    CLI -- "gRPC / HTTPS" --> SERVER
-    CLI -- "SSH over HTTP CONNECT" --> SERVER
-    SERVER -- "CRUD + Watch" --> DB
-    SERVER -- "Create / Delete / Watch" --> DRIVER
-    DRIVER -- "Manage sandbox workload" --> SBX
-    SUPERVISOR -- "Fetch Policy + Credentials + Inference Bundle" --> SERVER
-    SUPERVISOR -- "Spawn + Restrict" --> CHILD
-    CHILD -- "All network traffic" --> PROXY
-    PROXY -- "Evaluate request" --> OPA
-    PROXY -- "Allowed traffic only" --> HOSTS
-    PROXY -- "inference.local requests" --> ROUTER
-    ROUTER -- "Proxied inference" --> BACKEND
-    SERVER -. "Store / retrieve credentials" .-> CREDS
-```
-
-## Major Subsystems
-
-### Sandbox Execution Environment
-
-The sandbox is the core of the platform. It creates a restricted environment where an AI agent can run code without being able to harm the host system or access resources it should not.
-
-Each sandbox runs inside a container as two processes: a privileged **supervisor** and a restricted **child process** (the agent). The supervisor sets up the isolation environment, then launches the child with reduced privileges. The child process runs as a separate, unprivileged user account.
-
-Isolation is enforced through multiple independent mechanisms that work together as layers of defense:
-
-- **Filesystem restrictions** control which directories the agent can read and write. The platform uses a Linux kernel feature called Landlock to enforce these rules. If the policy says the agent can only write to `/sandbox`, any attempt to write elsewhere is blocked by the kernel itself -- not by the application.
-
-- **System call filtering** prevents the agent from performing dangerous low-level operations. A filter (seccomp) blocks the agent from creating raw network sockets, which prevents it from bypassing the network proxy.
-
-- **Network namespace isolation** places the agent in a separate network environment where the only reachable destination is the proxy. The agent literally cannot send packets to the internet directly; every connection must go through the proxy, which enforces the access policy.
-
-- **Process privilege separation** ensures the supervisor retains enough privileges to manage the sandbox while the agent process runs with minimal permissions.
-
-All of these restrictions are driven by a **policy** -- a configuration that defines what a specific sandbox is allowed to do. Policies are written in YAML and evaluated by an embedded policy engine (OPA/Rego). This means security rules are declarative, auditable, and can vary per sandbox.
-
-For more detail, see [Sandbox Architecture](sandbox.md).
-
-### Network Proxy and Access Control
-
-Every sandbox forces all outbound network traffic through an HTTP CONNECT proxy. The proxy sits between the agent and the internet, acting as a gatekeeper that decides which connections are permitted.
-
-When the agent (or any tool running inside the sandbox) tries to connect to a remote host, the proxy:
-
-1. **Identifies the requesting program** by inspecting the Linux process table (`/proc`) to determine which binary opened the connection.
-2. **Verifies the program's integrity** using a trust-on-first-use model: the first time a binary makes a network request, its cryptographic hash (SHA256) is recorded. If the binary changes later (indicating possible tampering), subsequent requests are denied.
-3. **Evaluates the request against policy** using the OPA engine. The policy can allow or deny connections based on the destination hostname, port, and the identity of the requesting program.
-4. **Rejects connections to internal IP addresses** as a defense against SSRF (Server-Side Request Forgery). Even if the policy allows a hostname, the proxy resolves DNS before connecting and blocks any result that points to a private network address (e.g., cloud metadata endpoints, localhost, or RFC 1918 ranges). This prevents an attacker from redirecting an allowed hostname to internal infrastructure.
-5. **Performs protocol-aware inspection (L7)** for configured endpoints. The proxy can terminate TLS, inspect the underlying HTTP traffic, and enforce rules on individual API requests -- not just connection-level allow/deny. This operates in either audit mode (log violations but allow traffic) or enforce mode (block violations).
-6. **Intercepts inference API calls** to `inference.local`. When the agent sends an HTTPS CONNECT request to `inference.local`, the proxy bypasses OPA evaluation entirely and handles the connection through a dedicated inference interception path. It TLS-terminates the connection, parses the HTTP request, detects known inference API patterns (OpenAI, Anthropic, model discovery), and routes matching requests locally through the sandbox's embedded inference router (`openshell-router`). Non-inference requests to `inference.local` are denied with 403.
-
-The proxy generates an ephemeral certificate authority at startup and injects it into the sandbox's trust store. This allows it to transparently inspect HTTPS traffic when L7 inspection is configured for an endpoint, and to serve TLS for `inference.local` interception.
-
-For more detail, see [Sandbox Architecture](sandbox.md) (Proxy Routing section).
-
-### Gateway / Control Plane
-
-The gateway is the central orchestration service. It provides the API that the CLI talks to and manages sandbox lifecycle through the selected compute driver.
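[Editor's illustration] The private-address screening described in step 4 of the proxy flow above can be sketched with the Python standard library. This is a hedged sketch of the general SSRF-defense technique, not the actual implementation (the real proxy lives in `crates/openshell-sandbox` and evaluates OPA policy as well); `is_safe_destination` is a hypothetical helper name:

```python
import ipaddress
import socket


def is_safe_destination(host: str) -> bool:
    """Resolve a hostname and reject any result that is not globally
    routable (SSRF defense). Sketch only: the real proxy also applies
    policy, TOFU binary hashes, and L7 rules before connecting."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return False  # unresolvable hosts are denied
    for info in infos:
        # info[4][0] is the address string; strip any IPv6 scope id.
        addr = ipaddress.ip_address(info[4][0].split("%")[0])
        # is_global is False for RFC 1918 ranges, 127.0.0.0/8,
        # 169.254.0.0/16 (cloud metadata), and other reserved space,
        # so one check covers the ranges the text lists.
        if not addr.is_global:
            return False
    return True
```

Resolving before checking matters: an allowed hostname can be pointed at internal infrastructure via DNS, so the check must run on the resolved addresses, not the name.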
-
-Key responsibilities:
-
-- **Sandbox lifecycle management**: Creating, deleting, and monitoring sandboxes. When a user creates a sandbox, the gateway asks the selected compute driver to provision a workload with the correct image, policy, and environment configuration.
-- **gRPC and HTTP APIs**: The gateway exposes a gRPC API for structured operations (sandbox CRUD, provider management, SSH session creation) and HTTP endpoints for health checks. Both protocols share a single network port through protocol multiplexing.
-- **Data persistence**: Sandbox records, provider credentials, SSH sessions, and inference routes are stored in a database (SQLite by default, Postgres as an option).
-- **TLS termination**: The gateway supports TLS with automatic protocol negotiation, so gRPC and HTTP clients can connect securely on the same port.
-- **SSH tunnel gateway**: The gateway provides the entry point for SSH connections into sandboxes (see Sandbox Connect below).
-- **Real-time updates**: The gateway streams sandbox status changes to the CLI, so users see live progress when a sandbox is starting up.
-- **Inference bundle resolution**: The gateway stores gateway-level inference configuration (provider name + model ID) and resolves it into bundles containing endpoint URLs, API keys, supported protocols, provider type, and auth metadata. Sandboxes fetch these bundles at startup and refresh them periodically. The gateway does not proxy inference traffic at request time -- it only provides configuration.
-
-For more detail, see [Gateway Architecture](gateway.md).
-
-### Gateway Deployment Infrastructure
-
-The gateway can run as a standalone process, a container, or a Kubernetes workload installed by the Helm chart in `deploy/helm/openshell`. Operators supply the compute platform and configure the driver that the gateway should use for sandboxes.
-
-The deployment layer handles:
-
-- **Gateway startup**: Running the gateway process or installing the Kubernetes Helm release.
-- **Runtime configuration**: Supplying image references, service exposure, sandbox runtime configuration, callback endpoints, and TLS material.
-- **Credential distribution**: Providing TLS and SSH relay material to the gateway and sandbox workloads.
-- **Compute driver configuration**: Selecting Docker, Podman, Kubernetes, or VM-backed sandbox execution.
-
-The target onboarding experience is:
-
-```bash
-mise run gateway:docker
-openshell gateway add
+    SUP -->|"config, credentials, logs, relay"| GW
+    PROXY -->|"allowed network traffic"| EXT["External services"]
+    ROUTER -->|"managed inference"| MODEL["Inference backends"]
 ```
-
-The first command is one local development example. Kubernetes operators use `helm upgrade --install openshell ./deploy/helm/openshell --namespace openshell` instead. The second command registers the reachable endpoint with the CLI.
-
-For more detail, see [Gateway Deployment and Compute Platforms](gateway-single-node.md).
-
-### Sandbox Connect (SSH Tunneling)
-
-Users can open interactive terminal sessions into running sandboxes. SSH traffic is tunneled through the gateway rather than exposing sandbox pods directly on the network.
-
-The connection flow works as follows:
-
-1. The CLI requests a session token from the gateway.
-2. The CLI opens an HTTP CONNECT tunnel to the gateway's SSH tunnel endpoint, passing the token and sandbox identifier.
-3. The gateway validates the token, confirms the sandbox is running, resolves the pod's network address, and establishes a TCP connection to the sandbox's embedded SSH server.
-4. A cryptographic handshake (HMAC-verified) confirms the gateway's identity to the sandbox.
-5. The CLI and sandbox exchange SSH traffic bidirectionally through the tunnel.
+
+## Core Boundaries
 
-This design provides several benefits:
-
-- Sandbox pods are never directly accessible from outside the cluster.
-- All access is authenticated and auditable through the gateway.
-- Session tokens can be revoked to immediately cut off access.
-- The same mechanism supports both interactive shells and file synchronization (rsync).
-
-For more detail, see [Sandbox Connect Architecture](sandbox-connect.md).
-
-### Provider System
-
-AI agents typically need credentials to access external services -- an API key for the AI model provider, a token for GitHub or GitLab, and so on. The platform manages these credentials as first-class entities called **providers**.
-
-The provider system handles:
-
-- **Automatic discovery**: The CLI scans the user's local machine for existing credentials (environment variables, configuration files) and offers to upload them to the gateway. Supported providers include Claude, Codex, OpenCode, OpenAI, Anthropic, NVIDIA, GitHub, GitLab, and others.
-- **Secure storage**: Credentials are stored on the gateway, separate from sandbox definitions. They never appear in Kubernetes pod specifications.
-- **Runtime injection**: When a sandbox starts, the supervisor process fetches the credentials from the gateway via gRPC and injects them as environment variables into every process it spawns (both the initial agent process and any SSH sessions).
-- **CLI management**: Users can create, update, list, and delete providers through standard CLI commands.
-
-This approach means users configure credentials once, and every sandbox that needs them receives them automatically at runtime.
-
-For more detail, see [Providers](sandbox-providers.md).
-
-### Inference Routing
-
-The inference routing system transparently intercepts AI inference API calls from sandboxed agents and routes them to configured backends. Routing happens locally within the sandbox -- the proxy intercepts connections to `inference.local`, and the embedded `openshell-router` forwards requests directly to the backend without traversing the gateway at request time.
-
-**How it works end-to-end:**
-
-1. An operator configures gateway-level inference via `openshell inference set --provider --model `. This stores a reference to the named provider and model on the gateway.
-2. When a sandbox starts, the supervisor fetches an inference bundle from the gateway via the `GetInferenceBundle` RPC. The gateway resolves the stored provider reference into a complete route: endpoint URL, API key, supported protocols, provider type, and auth metadata. The sandbox refreshes this bundle eagerly in the background every 5 seconds by default (override with `OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS`).
-3. The agent sends requests to `https://inference.local` using standard OpenAI or Anthropic SDK calls.
-4. The sandbox proxy intercepts the HTTPS CONNECT to `inference.local` (bypassing OPA policy evaluation), TLS-terminates the connection using the sandbox's ephemeral CA, and parses the HTTP request.
-5. Known inference API patterns are detected (e.g., `POST /v1/chat/completions` for OpenAI, `POST /v1/messages` for Anthropic, `GET /v1/models` for model discovery). Matching requests are forwarded to the first compatible route by the `openshell-router`, which rewrites the auth header, injects provider-specific default headers (e.g., `anthropic-version` for Anthropic), and overrides the model field in the request body.
-6. Non-inference requests to `inference.local` are denied with 403.
-
-**Key design properties:**
-
-- Agents need zero code changes -- standard OpenAI/Anthropic SDK calls work transparently when pointed at `inference.local`.
-- The sandbox never sees the real API key for the backend -- credential isolation is maintained through the gateway's bundle resolution.
-- Routing is explicit via `inference.local`; OPA network policy is not involved in inference routing.
-- Provider-specific behavior (auth header style, default headers, supported protocols) is centralized in `InferenceProviderProfile` definitions in `openshell-core`. Supported inference provider types are openai, anthropic, and nvidia.
-- Gateway inference is managed via CLI (`openshell inference set/get`).
-
-**Inference routes** are stored on the gateway as protobuf objects (`InferenceRoute` in `proto/inference.proto`). Gateway inference uses a managed singleton route entry keyed by `inference.local` and configured from provider + model settings. Endpoint, credentials, and protocols are resolved from the referenced provider record at bundle fetch time, so rotating a provider's API key takes effect on the next bundle refresh without reconfiguring the route.
-
-**Components involved:**
-
-| Component | Location | Role |
-|---|---|---|
-| Proxy inference interception | `crates/openshell-sandbox/src/proxy.rs` | Intercepts `inference.local` CONNECT requests, TLS-terminates, dispatches to router |
-| Inference pattern detection | `crates/openshell-sandbox/src/l7/inference.rs` | Matches HTTP method + path against known inference API patterns |
-| Local inference router | `crates/openshell-router/src/lib.rs` | Selects a compatible route by protocol and proxies to the backend |
-| Provider profiles | `crates/openshell-core/src/inference.rs` | Centralized auth, headers, protocols, and endpoint defaults per provider type |
-| Gateway inference service | `crates/openshell-server/src/inference.rs` | Stores gateway inference config, resolves bundles with credentials from provider records |
-| Proto definitions | `proto/inference.proto` | `ClusterInferenceConfig`, `ResolvedRoute`, bundle RPCs |
-
-### Container and Build System
-
-The platform publishes the gateway image and relies on community-maintained sandbox images:
-
-| Image | Purpose |
+| Component | Boundary |
 |---|---|
-| **Gateway** | Runs the control plane. Contains the gateway binary, database migrations, and an embedded SSH client for sandbox management. |
-| **Sandbox** | Runs each sandbox workload. Maintained in the OpenShell Community repository or supplied by the user. |
-
-Builds use multi-stage Dockerfiles with caching to keep rebuild times fast. The Helm chart handles Kubernetes-level configuration such as service ports, health checks, security contexts, resource limits, storage, and TLS secret mounts. Docker, Podman, and VM-backed deployments configure equivalent runtime concerns through their driver-specific gateway configuration.
-
-For more detail, see [Container Management](build-containers.md).
+| Gateway | Authenticated control plane, state store, provider records, sandbox lifecycle, relay coordination. |
+| Compute runtime | Driver-specific creation and deletion of sandbox workloads. |
+| Sandbox supervisor | Local sandbox setup, credential injection, policy polling, SSH relay, log push. |
+| Policy proxy | Mandatory egress path for agent traffic and policy decisions. |
+| Inference router | Sandbox-local forwarding for `https://inference.local`. |
 
-### Policy Language
+## Request Flow
 
-Sandbox behavior is governed by policies written in YAML and evaluated by an embedded OPA (Open Policy Agent) engine using the Rego policy language. Policies define:
-
-- **Filesystem access**: Which directories are readable, which are writable.
-- **Network access**: Which remote hosts each program in the sandbox can connect to, with per-binary granularity.
-- **Process privileges**: What user/group the agent runs as.
-- **L7 inspection rules**: Protocol-level constraints on HTTP API calls for specific endpoints.
-
-Inference routing to `inference.local` is configured separately at the gateway level and does not require network policy entries. The OPA engine evaluates only explicit network policies; `inference.local` connections bypass OPA entirely and are handled by the proxy's dedicated inference interception path.
-
-Policies are not intended to be hand-edited by end users in normal operation. They are associated with sandboxes at creation time and fetched by the sandbox supervisor at startup via gRPC. For development and testing, policies can also be loaded from local files. A gateway-global policy can override all sandbox policies via `openshell policy set --global`.
-
-In addition to policy, the gateway delivers runtime **settings** -- typed key-value pairs (e.g., `log_level`) that can be configured per-sandbox or globally. Settings and policy are delivered together through the `GetSandboxSettings` RPC and tracked by a single `config_revision` fingerprint. See [Gateway Settings Channel](gateway-settings.md) for details.
-
-For more detail on the policy language, see [Policy Language](security-policy.md).
-
-### Command-Line Interface
-
-The CLI is the primary way users interact with the platform. It provides commands organized into four groups:
-
-- **Gateway management** (`openshell gateway`): Register, select, and inspect gateway endpoints.
-- **Sandbox management** (`openshell sandbox`): Create sandboxes (with optional file upload and provider auto-discovery), connect to sandboxes via SSH, and delete sandboxes.
-- **Top-level commands**: `openshell status` (gateway health), `openshell logs` (sandbox logs), `openshell forward` (port forwarding), `openshell policy` (sandbox policy management), `openshell settings` (effective sandbox settings and global/sandbox key updates).
-- **Provider management** (`openshell provider`): Create, update, list, and delete external service credentials.
-- **Inference management** (`openshell inference`): Configure gateway-level inference by specifying a provider and model. The gateway resolves endpoint and credential details from the named provider record.
-
-The CLI resolves which gateway to operate on through a priority chain: explicit `--gateway` flag, then the `OPENSHELL_GATEWAY` environment variable, then the active gateway set by `openshell gateway select`. Gateway names are exposed to shell completion from local metadata, and `openshell gateway select` opens an interactive chooser on a TTY while falling back to a printed list in non-interactive use. The CLI supports TLS client certificates for mutual authentication with the gateway.
-
-## How Users Get Started
-
-The onboarding flow starts from a reachable gateway endpoint.
-
-**Step 1: Install the CLI.**
-
-```bash
-pip install
-```
-
-**Step 2: Create a sandbox.**
-
-```bash
-openshell sandbox create -- claude
-```
-
-Before creating a sandbox, start or deploy the gateway on the selected compute platform and register the reachable endpoint with the CLI.
-
-**Step 3: Connect to a running sandbox.**
-
-```bash
-openshell sandbox connect
-```
+1. A user creates or manages a sandbox through the CLI, SDK, or TUI.
+2. The gateway persists state and asks the selected compute runtime to create a workload.
+3. The sandbox supervisor starts, fetches policy, settings, providers, and inference routes from the gateway.
+4. The supervisor launches the agent as a restricted user in an isolated environment.
+5. Agent network traffic goes through the sandbox proxy. The proxy allows, denies, inspects, or routes requests according to policy and inference configuration.
+6. Connect, exec, and file sync traffic use a gateway relay to the sandbox supervisor. The gateway does not require direct inbound access to sandbox workloads.
 
-This opens an interactive SSH session into the sandbox, with all provider credentials available as environment variables.
+## Architecture Docs
 
-## Architecture Documents Index
+Architecture docs are short subsystem overviews. User-facing how-to content
-| Document | Description | +| Document | Purpose | |---|---| -| [Gateway Deployment and Compute Platforms](gateway-single-node.md) | How the gateway runs across Docker, Podman, Kubernetes with Helm, and the experimental VM driver. | -| [Gateway Architecture](gateway.md) | The control plane gateway: API multiplexing, gRPC services, persistence, TLS, and sandbox orchestration. | -| [Gateway Communication](gateway-deploy-connect.md) | How the CLI resolves a gateway and communicates with it over mTLS, plaintext HTTP/2, or an edge-authenticated WebSocket tunnel. | -| [Gateway Security](gateway-security.md) | mTLS enforcement, PKI provisioning, certificate hierarchy, and the gateway trust model. | -| [Sandbox Architecture](sandbox.md) | The sandbox execution environment: policy enforcement, Landlock, seccomp, network namespaces, and the network proxy. | -| [Container Management](build-containers.md) | Container images, Dockerfiles, Helm charts, build tasks, and CI/CD. | -| [Sandbox Connect](sandbox-connect.md) | SSH tunneling into sandboxes through the gateway. | -| [Sandbox Custom Containers](sandbox-custom-containers.md) | Building and using custom container images for sandboxes. | -| [Providers](sandbox-providers.md) | External credential management, auto-discovery, and runtime injection. | -| [Docs Site Architecture](docs-site.md) | Documentation source layout, navigation structure, local validation and preview workflow, and publish pipeline. | -| [Policy Language](security-policy.md) | The YAML/Rego policy system that governs sandbox behavior. | -| [Inference Routing](inference-routing.md) | Transparent interception and sandbox-local routing of AI inference API calls to configured backends. | -| [Docker Driver](docker-driver.md) | Docker compute driver implementation, host networking, loopback gateway connectivity. | -| [System Architecture](system-architecture.md) | Top-level system architecture diagram with all deployable components and communication flows. 
| -| [Gateway Settings Channel](gateway-settings.md) | Runtime settings channel: two-tier key-value configuration, global policy override, settings registry, CLI/TUI commands. | -| [TUI](tui.md) | Terminal user interface for sandbox interaction. | +| [Gateway](gateway.md) | Gateway control plane, auth, APIs, persistence, settings, and relay coordination. | +| [Sandbox](sandbox.md) | Sandbox supervisor, child process isolation, proxy, credentials, inference, connect, and logs. | +| [Security Policy](security-policy.md) | Policy model, enforcement layers, policy updates, policy advisor, and security logging. | +| [Compute Runtimes](compute-runtimes.md) | Docker, Podman, Kubernetes, VM, sandbox images, and runtime-specific responsibilities. | +| [Build](build.md) | Build artifacts, CI/E2E, docs site validation, and release packaging. | + +For broad design proposals, use `rfc/`. For temporary working plans, use the +ignored `architecture/plans/` directory. diff --git a/architecture/build-containers.md b/architecture/build-containers.md deleted file mode 100644 index c59ec9a5a..000000000 --- a/architecture/build-containers.md +++ /dev/null @@ -1,109 +0,0 @@ -# Container Images and Deployment Packaging - -OpenShell publishes the gateway image and keeps Kubernetes Helm packaging in this repository. Sandbox images are maintained in the separate OpenShell Community repository. - -## Gateway Image - -The gateway image runs the control plane API server. Kubernetes deployments use it through the Helm chart. Standalone container deployments can use the same image with driver-specific runtime configuration. - -- **Docker target**: `gateway` in `deploy/docker/Dockerfile.images` -- **Registry**: `ghcr.io/nvidia/openshell/gateway:latest` -- **Pulled when**: Helm install or upgrade, or standalone container deployment -- **Entrypoint**: `openshell-gateway --port 8080` - -The image contains the gateway binary and database migrations. 
Runtime configuration is supplied by Helm values and Kubernetes secrets for Kubernetes, or by driver-specific configuration for standalone gateway deployments. - -## Helm Chart - -The Helm chart at `deploy/helm/openshell` owns Kubernetes deployment concerns: - -- Gateway StatefulSet and persistent volume claim. -- Service account, RBAC, and service. -- Gateway service exposure. -- TLS secret mounts and environment variables. -- Sandbox namespace, default sandbox image, and callback endpoint configuration. -- NetworkPolicy restricting sandbox SSH ingress to the gateway. - -The chart remains the supported deployment artifact for Kubernetes. - -## Image Build Pipeline - -`deploy/docker/Dockerfile.images` no longer compiles Rust. CI calls `.github/workflows/shadow-rust-native-build.yml` through `workflow_call` to build `openshell-gateway` or `openshell-sandbox` natively on the target architecture. `.github/workflows/docker-build.yml` downloads the resulting artifact, stages it at `deploy/docker/.build/prebuilt-binaries//`, builds the per-arch image with the local Buildx driver, and merges multi-arch pushes with `docker buildx imagetools create`. Callers normally publish the GitHub SHA tag, but can pass `image-tag` to publish isolated temporary tags for validation. - -Local image builds use `tasks/scripts/stage-prebuilt-binaries.sh` through `tasks/scripts/docker-build-image.sh` before invoking Docker, so clean checkouts do not need to create the staging directory manually. - -## Supervisor Delivery - -The `openshell-sandbox` supervisor is delivered by the selected compute driver: - -| Driver | Supervisor delivery | -|---|---| -| Kubernetes | Sandbox pod image or Kubernetes driver pod template configuration. | -| Docker | Local supervisor binary or supervisor image extraction configured by the gateway. | -| Podman | Read-only OCI image volume from the `supervisor-output` image. | -| VM | Embedded in the VM runtime rootfs. 
| - -Each compute driver owns supervisor delivery for its runtime. - -## Standalone Gateway Binary - -OpenShell also publishes a standalone `openshell-gateway` binary as a GitHub release asset. - -- **Source crate**: `crates/openshell-server` -- **Artifact name**: `openshell-gateway-.tar.gz` -- **Targets**: `x86_64-unknown-linux-gnu`, `aarch64-unknown-linux-gnu`, `aarch64-apple-darwin` -- **Release workflows**: `.github/workflows/release-dev.yml`, `.github/workflows/release-tag.yml` - -Both the standalone artifact and the deployed container image use the `openshell-gateway` binary. - -## Python Wheels - -OpenShell also publishes Python wheels for `linux/amd64`, `linux/arm64`, and macOS ARM64. - -- Linux wheels are built natively on matching Linux runners via `build:python:wheel:linux:amd64` and `build:python:wheel:linux:arm64` in `tasks/python.toml`. -- There is no local Linux multiarch wheel build task. Release workflows own the per-arch Linux wheel production. -- The macOS ARM64 wheel is cross-compiled with `deploy/docker/Dockerfile.python-wheels-macos` via `build:python:wheel:macos`. -- Release workflows mirror the CLI layout: a Linux matrix job for amd64/arm64, a separate macOS job, and release jobs that download the per-platform wheel artifacts directly before publishing. -- Release CPU jobs run on `linux-amd64-cpu8` and `linux-arm64-cpu8`; the macOS wheel is still cross-compiled in Docker from the amd64 Linux runner. - -## Development Release Assets - -The rolling `dev` release is installer-facing but still publishes the full -artifact set: CLI tarballs, standalone gateway and sandbox tarballs, Python -wheels, Debian packages, RPM packages, and checksums. Every artifact is built -from the version computed once in `release-dev.yml`. - -Package-manager artifacts use stable dev aliases on the GitHub release -(`openshell-dev-*.deb`, `openshell-dev-*.rpm`, and -`openshell-gateway-dev-*.rpm`) so the rolling release stays readable. 
Python -wheels keep their versioned filenames because wheel metadata requires it. - -The dev release workflow prunes workflow-owned `openshell*` assets before -uploading the fresh set. `openshell-driver-vm` artifacts are intentionally not -published on the main `dev` release; VM driver binaries live on `vm-dev`. - -## Sandbox Images - -Sandbox images are not built in this repository. They are maintained in the [openshell-community](https://github.com/nvidia/openshell-community) repository and pulled from `ghcr.io/nvidia/openshell-community/sandboxes/` at runtime. - -The default sandbox image is `ghcr.io/nvidia/openshell-community/sandboxes/base:latest`. To use a named community sandbox: - -```bash -openshell sandbox create --from -``` - -This pulls `ghcr.io/nvidia/openshell-community/sandboxes/:latest`. - -## Local Development - -Use the workflow that matches the driver you are changing: - -| Area | Typical local command | -|---|---| -| Gateway image or chart | `mise run helm:lint` and `mise run docker:build:gateway` | -| Docker driver | `mise run gateway:docker` or `mise run e2e:docker` | -| Podman driver | `mise run e2e:podman` | -| VM driver | `mise run e2e:vm` | -| Published docs | `mise run docs` | - -Kubernetes chart changes should be validated with `helm lint deploy/helm/openshell` and, when possible, by installing the chart into a disposable Kubernetes namespace. diff --git a/architecture/build.md b/architecture/build.md new file mode 100644 index 000000000..2567285b5 --- /dev/null +++ b/architecture/build.md @@ -0,0 +1,67 @@ +# Build + +This page records the stable build, CI, docs, and release architecture. It is +not a command reference. Contributor-facing workflow details live in +`CONTRIBUTING.md`, `CI.md`, and published docs. 
+ +## Artifacts + +OpenShell builds these main artifacts: + +| Artifact | Source | +|---|---| +| Gateway binary | `crates/openshell-server` | +| CLI package and Python SDK | `python/openshell` plus Rust binaries where packaged | +| Gateway container image | `deploy/docker/Dockerfile.images` | +| Helm chart | `deploy/helm/openshell` | +| VM driver/runtime assets | `crates/openshell-driver-vm` and `crates/openshell-vm` | +| Published docs site | `docs/` rendered by Fern config in `fern/` | + +Sandbox community images are built outside this repository. + +## Container Builds + +The Docker image pipeline stages prebuilt Rust binaries, then builds container +images from `deploy/docker/Dockerfile.images`. CI builds native artifacts on the +target architecture, stages them under `deploy/docker/.build/`, and then uses +Buildx to publish per-architecture images and multi-architecture tags. + +Local image work should use `mise` tasks rather than direct Docker commands so +the same staging and tagging assumptions are used locally and in CI. + +## CI and E2E + +Required checks run on GitHub Actions. E2E and GPU workflows use NVIDIA +self-hosted runners, so trusted PRs are mirrored by copy-pr-bot into +`pull-request/` branches before those workflows run. + +The high-level CI model: + +1. Standard branch checks run on normal PR activity. +2. Label-gated E2E and GPU checks run from trusted mirror branches. +3. Gate jobs verify that the expected non-gate workflow actually ran. +4. Release workflows rebuild and publish binaries, wheels, images, and docs. + +See `CI.md` for the contributor workflow and labels. + +## Docs Site + +Published docs live in `docs/`. Navigation lives in `docs/index.yml`. Fern site +configuration, components, theme assets, and publish settings live in `fern/`. + +Use `mise run docs` for strict validation and `mise run docs:serve` for local +preview. PR previews are produced by `.github/workflows/branch-docs.yml` when +Fern credentials are available. 
Production docs publish from the release tag +workflow. + +## Validation Expectations + +- Run `mise run pre-commit` before committing. +- Run `mise run test` after code changes. +- Run `mise run e2e` for sandbox, policy, driver, or deployment changes when the + affected runtime can be exercised. +- Run `mise run ci` before opening a PR when practical. +- Run `mise run docs` when `docs/` or `fern/` changes. + +Architecture-only changes should still check links and references because this +directory is used by agents during implementation and review. diff --git a/architecture/ci-e2e.md b/architecture/ci-e2e.md deleted file mode 100644 index 4fc007fca..000000000 --- a/architecture/ci-e2e.md +++ /dev/null @@ -1,198 +0,0 @@ -# E2E CI Architecture - -This document describes the architecture of the E2E CI flow: every workflow involved, the trigger each one listens on, why those triggers were chosen, and how the pieces fit together. For the contributor-facing how-to (labels, signing, fork flow), see [CI.md](../CI.md). - -## Goals and constraints - -Three independent goals shape the design: - -1. **Self-hosted runner safety.** Required PR checks, E2E, and GPU tests run on NVIDIA self-hosted runners. GitHub's [security hardening guide](https://docs.github.com/en/actions/security-guides/security-hardening-for-github-actions#hardening-for-self-hosted-runners) states bluntly: "Self-hosted runners should almost never be used for public repositories on GitHub, because any user can open pull requests against the repository and compromise the environment." Our workaround is the same one used elsewhere in NVIDIA's GHA infrastructure: copy-pr-bot mirrors trusted PRs into `pull-request/` branches inside this repository, and the self-hosted workflows trigger on `push` to those mirror branches rather than on `pull_request`. -2. 
**Label as a hard merge gate.** When a PR carries `test:e2e` (or `test:e2e-gpu`), the corresponding suite *must* have actually executed and passed for the PR head SHA. The label has to be enforcing, not advisory: it blocks merge unless the suite ran with the label set. -3. **Per-job least privilege on the GitHub token.** Each workflow declares `permissions: {}` at the top, and each job declares only what it needs. This follows the hardening pattern described at . - -These three goals do not compose cleanly: the safety goal forces `push: pull-request/` triggers (which the PR author can't influence), but `push` triggers don't fire on label changes, so the label gate has to come from a separate workflow on a different trigger. That is the heart of the architecture. - -## Pieces at a glance - -| File | Trigger | Role | -|---|---|---| -| `.github/copy-pr-bot.yaml` | (config) | Tells copy-pr-bot to mirror trusted PRs into `pull-request/` branches. Pre-existed. | -| `.github/workflows/branch-checks.yml` | `push: pull-request/[0-9]+` + `workflow_dispatch` | Runs required branch checks on `linux-amd64-cpu8` and `linux-arm64-cpu8`. | -| `.github/workflows/branch-e2e.yml` | `push: pull-request/[0-9]+` + `workflow_dispatch` | Runs non-GPU E2E on `linux-arm64-cpu8`. | -| `.github/workflows/test-gpu.yml` | `push: pull-request/[0-9]+` + `workflow_dispatch` | Runs GPU E2E on self-hosted GPU runners. | -| `.github/actions/pr-gate/action.yml` | (composite) | Resolves PR metadata for a `pull-request/` push and decides whether the run should proceed. Label enforcement is optional so ordinary branch checks can validate mirror metadata without introducing another PR label. | -| `.github/workflows/e2e-gate.yml` | `pull_request` + `workflow_run` | Posts the required `E2E Gate` check on the PR. Re-evaluates after the gated workflow completes. | -| `.github/workflows/e2e-gate-check.yml` | `workflow_call` | Reusable gate logic shared by E2E and GPU E2E. 
| -| `.github/workflows/e2e-label-help.yml` | `pull_request_target: [labeled]` | Posts a PR comment when a `test:e2e*` label is applied, telling the maintainer the next manual step (re-run an existing run, or `/ok to test ` to refresh the mirror). Does *not* dispatch the workflow itself - see "Why we don't auto-dispatch" below. | -| `.github/workflows/e2e-test.yml`, `e2e-gpu-test.yaml`, `docker-build.yml` | `workflow_call` | Reusable worker workflows. Unchanged by this design - called from the gated workflows and from release workflows. | - -## OS-49 runner migration - -OS-49 Phase 5 added non-required shadow workflows for the non-release workflows being prepared for shared-runner cutover. Phase 6 promoted the validated shared-runner path into the real non-release workflows and removed the obsolete PR-triggered shadow workflows to avoid duplicate PR checks. - -`branch-checks.yml` uses `pr-gate` without a required label. That still verifies the mirror SHA matches the source PR head SHA, but does not require a new GitHub label for ordinary required checks. `branch-e2e.yml` keeps the existing `test:e2e` gate because it publishes temporary images and runs the expensive E2E suite. `ci-image.yml` now builds amd64 and arm64 CI images natively on shared CPU runners and merges the multi-arch manifest after both per-arch images are pushed. - -The `mise-lockfile` job regenerates `mise.lock` with the CI image's pinned mise version and requires the checked-in file to match exactly. This intentionally includes generated metadata so contributors catch toolchain-version drift instead of letting different mise versions churn the lockfile. - -OS-49 Phase 7 moves the release-facing CPU jobs in `release-canary.yml`, `release-dev.yml`, and `release-tag.yml` to the same shared CPU labels. 
The release workflows also call `driver-vm-linux.yml`, `driver-vm-macos.yml`, and `deb-package.yml`, so those reusable workers use the same labels to avoid retaining a hidden ARC dependency in the release path. `release-vm-kernel.yml` uses the shared CPU labels for its Linux runtime and release jobs; the macOS runtime job stays on `macos-latest-xlarge` because it builds native macOS dylibs. - -## Trigger taxonomy - -Five GitHub Actions trigger types appear in this flow. Each one was chosen for a specific reason - they are not interchangeable. - -| Trigger | Workflow context | Token scope | Why we use it here | -|---|---|---|---| -| `push: pull-request/[0-9]+` | The pushed commit (mirror branch) | Repo-default | Only fires for branches copy-pr-bot created. Decouples test execution from PR author actions: the author cannot create a `pull-request/` branch themselves. | -| `pull_request` | The PR head SHA, but actions checkout the *base* branch's workflow files | Read-only for forks | Lets us post a status check on the PR's head SHA (so branch protection sees it). Used by the `E2E Gate` evaluation jobs. | -| `pull_request_target` | Base branch | Write-capable, even for forks | Needed for `e2e-label-help.yml` to post a comment on a forked PR. The workflow never checks out PR code, so the standard `pull_request_target` foot-gun does not apply. | -| `workflow_run` | Default branch | Repo-default | Fires when the gated workflow finishes. Lets us run a gate re-evaluation step in a trusted (default-branch) context. | -| `workflow_dispatch` | Caller's ref | Repo-default | Maintainer-only manual re-run (clicking "Re-run all jobs" in the Actions UI). We deliberately do not call this from another workflow - see "Why we don't auto-dispatch" below. 
| - -The non-obvious move here is that the same logical "did E2E pass for this PR" check has to be posted from two of these trigger contexts: a `pull_request`-triggered run (which can attach a check to the PR head SHA) and a `workflow_run`-triggered run (which knows the gated workflow finished but can only attach checks to `main`). The flow stitches them together by re-running the original `pull_request`-triggered run after the gated workflow completes. - -## Happy-path flow (trusted PR, label applied after mirror) - -```mermaid -sequenceDiagram - autonumber - participant Author as PR Author (org member) - participant GH as GitHub - participant Bot as copy-pr-bot - participant BranchE2E as Branch E2E Checks<br/>(self-hosted) - participant Gate as E2E Gate<br/>(github-hosted) - participant Help as E2E Label Help<br/>(github-hosted) - participant Maintainer - - Author->>GH: Open PR (signed commits) - GH->>Bot: PR opened - Bot->>GH: push pull-request/N (mirror) - GH->>BranchE2E: push event on pull-request/N - BranchE2E->>BranchE2E: pr_metadata: should_run = false<br/>(no label yet) - BranchE2E-->>GH: workflow concludes success<br/>(only metadata job ran) - - GH->>Gate: pull_request opened - Gate->>Gate: no label, gate passes (no-op) - - Maintainer->>GH: apply test:e2e label - GH->>Gate: pull_request labeled - Gate->>Gate: label set,<br/>upstream only ran metadata<br/>→ FAIL (red) - GH->>Help: pull_request_target labeled - Help->>GH: comment on PR with link<br/>to existing Branch E2E Checks run - Maintainer->>GH: open the linked run, click "Re-run all jobs" - GH->>BranchE2E: re-run (push event replayed) - BranchE2E->>BranchE2E: pr_metadata: should_run = true<br/>(label set, SHA matches) - BranchE2E->>BranchE2E: build + e2e jobs run - - BranchE2E-->>GH: workflow concludes success - GH->>Gate: workflow_run completed - Gate->>GH: rerun original pull_request gate run - GH->>Gate: pull_request rerun (replays event) - Gate->>Gate: label set,<br/>upstream success + non-gate jobs ran<br/>→ PASS (green) -``` - -The label-help workflow is intentionally a comment-only nudge: it never dispatches the workflow itself, so the maintainer's re-run goes through the same `push`-event run-id that originally fired on the mirror. This preserves in-progress visibility on the PR's Checks tab. - -## Forked PR flow - -The shape is identical but with two extra round trips: the maintainer has to vet each commit before copy-pr-bot will mirror it. - -```mermaid -sequenceDiagram - autonumber - participant Author as PR Author (fork) - participant GH as GitHub - participant Bot as copy-pr-bot - participant Maintainer - - Author->>GH: Open PR from fork - GH->>Bot: PR opened - Bot->>Bot: not trusted, wait - Maintainer->>GH: comment "/ok to test " - Bot->>GH: push pull-request/N - Note over Bot,GH: From here, identical to the trusted flow:<br/>label → help comment → maintainer re-runs → gate flips green - Author->>GH: push new commit - Bot->>Bot: still untrusted, wait again - Maintainer->>GH: comment "/ok to test " -``` - -## Why each design choice exists - -### Why `push` on `pull-request/` instead of `pull_request` - -`pull_request` workflows execute the workflow file from the PR's own branch. On a self-hosted runner, that means an attacker can rewrite our workflow YAML and run anything. `push: pull-request/` only fires for branches that copy-pr-bot creates, so the workflow file source is always one that the bot vetted (signed commit + trusted author, or `/ok to test`). - -### Why the gate has to verify a non-gate job actually ran - -The gated workflows always start with a `pr_metadata` job. When the label is missing, `pr_metadata` reports `should_run=false` and the build/E2E jobs are skipped. From GitHub's perspective the workflow concluded `success`. If the gate only checked top-level conclusion, an unlabeled run from earlier would satisfy the gate forever - the label could be added without ever causing E2E to actually execute. The gate's "at least one non-gate job succeeded" check (`e2e-gate-check.yml:106-110`) is what forces a re-run after labeling. - -### Why `workflow_run` is needed for the gate flip - -Once the gated workflow runs and finishes, the `pull_request`-triggered gate check posted earlier still says "fail". `workflow_run` is the only event that fires when an arbitrary other workflow completes, and it's how we know to re-evaluate the gate. But `workflow_run` runs in the *default branch context*, so a check posted from there lands on `main` instead of the PR. Workaround: instead of posting a new check, look up the most recent `pull_request`-triggered gate run for the same head SHA and call `POST /actions/runs//rerun`. The re-run replays the original `pull_request` event, so the new check posts against the PR's head SHA and branch protection picks it up.
- -### Why `pull_request_target` for the label-help workflow - -A `pull_request` workflow on a forked PR receives a read-only `GITHUB_TOKEN`. That's intentional: it prevents PR-supplied workflow code from escalating. But the help workflow doesn't *run* PR code - it never checks out the PR head, only the workflow file from `main`. It needs `pull-requests: write` to post a comment. `pull_request_target` provides a write-capable token while still loading the workflow definition from `main`. The standard `pull_request_target` warning ("don't check out PR code with this token") doesn't apply because we don't check out anything. - -### Why we don't auto-dispatch the gated workflow - -An earlier iteration of this design auto-dispatched the gated workflow via `gh workflow run --ref pull-request/` from a `pull_request_target: [labeled]` workflow. It worked, but produced a worse UX: `workflow_dispatch`-triggered runs do not appear in the PR's Checks tab. The check-runs are technically attached to the PR head SHA (visible via `gh api commits//check-runs`), but the PR UI filters them out because the run isn't associated with a PR-context event. The maintainer would see "Dispatched" comment, then no progress on the PR until the gate eventually flipped from red to green many minutes later. - -We considered alternatives: - -- **Push an empty marker commit to `pull-request/` to fire a fresh `push` event.** Changes the SHA, breaks the gate's head-SHA equivalence, and writes to a branch copy-pr-bot owns. Architecturally bad. -- **Re-trigger copy-pr-bot programmatically.** copy-pr-bot only listens for `pull_request.*` and `issue_comment.created` events ([source](https://github.com/NVIDIA/gha-runners-apps/blob/main/packages/copy-pr-bot/src/app.ts)). Even commenting `/ok to test ` is a no-op when the mirror is already at that SHA - the bot calls `git.updateRef` with the same SHA and GitHub fires no new push event. 
There is no way to make copy-pr-bot re-fire a push without an actual SHA change. -- **Have the dispatcher post mirror Check Runs against the PR head SHA via the Checks API.** Possible, but adds a polling/webhook loop to keep the mirror checks in sync with the actual run. Not worth the complexity for a flow a maintainer goes through manually anyway. - -The current design takes the pragmatic path: when a label is applied, the help workflow posts a comment with a deep link to the existing `Branch E2E Checks` run on the mirror. The maintainer clicks **Re-run all jobs**. That re-run replays the original `push` event, so its check-runs surface on the PR's Checks tab in real time. The cost is one human click per label application, in exchange for live progress visibility. - -### Why labels and not comment commands - -Labels persist as PR metadata and survive re-runs and force-pushes. Comment-based commands like `/ok to test` don't survive the same way: a comment from yesterday doesn't enable today's commit. Branch protection rules can require a check be present; they cannot require a comment. The label is the merge gate's primary signal because it is the only thing GitHub's branch protection knows how to look at. - -## Permission posture - -The gated E2E workflows declare `permissions: {}` at the top. Branch checks and CI image publishing use the minimum workflow/job grants needed for checkout, package pulls, and package pushes. 
- -| Workflow | Job | Grants | -|---|---|---| -| `branch-checks.yml` | workflow default | `contents: read`, `packages: read` | -| | `pr_metadata` | `contents: read`, `pull-requests: read` | -| `ci-image.yml` | workflow default | `contents: read`, `packages: write` | -| `branch-e2e.yml`, `test-gpu.yml` | `pr_metadata` | `contents: read`, `pull-requests: read` | -| | `build-*` | `contents: read`, `packages: write` | -| | `e2e*` | `contents: read`, `packages: read` | -| `e2e-gate.yml` | `e2e`, `gpu` (`workflow_call`) | inherits via the called workflow | -| | `rerun-on-completion` | `actions: write` | -| `e2e-gate-check.yml` | `check` | `contents: read`, `pull-requests: read`, `actions: read` | -| `e2e-label-help.yml` | `hint` | `pull-requests: write`, `actions: read`, `contents: read` | - -The reusable worker workflows (`e2e-test.yml`, `e2e-gpu-test.yaml`, `docker-build.yml`) declare their own internal permissions; the calling job grants are an upper bound for them. - -Only one workflow holds an "interesting" token: `rerun-on-completion` in `e2e-gate.yml` has `actions: write`. It calls one specific endpoint - `POST /actions/runs//rerun` for an `e2e-gate.yml` run on the same head SHA - and never executes PR code. The label-help workflow holds only `pull-requests: write` for posting the comment, also without checking out PR code. - -## Release flow - -`release-tag.yml` and `release-dev.yml` call `e2e-test.yml` directly on `main` / tag pushes. Tags and `main` are inherently trusted refs, so they bypass copy-pr-bot. E2E still blocks the release jobs (`tag-ghcr-release: needs: [..., e2e]`). - -The release CPU jobs run on `linux-amd64-cpu8` and `linux-arm64-cpu8`. GitHub-hosted docs publishing and the external wheel-publish bridge keep their existing runners. VM development release workflows are tracked separately because the managed platform capability decision is still open. - -Permissions on the release workflows are not yet scoped per-job. Tracked separately. 
- -## Edge cases - -| Case | What happens | -|---|---| -| Label applied before copy-pr-bot mirrors the PR | Help workflow detects no `pull-request/` branch and posts a comment telling the maintainer to wait or run `/ok to test `. | -| Label applied while mirror is stale (new commit pending `/ok to test`) | Help workflow detects mirror SHA != PR head SHA and posts the corresponding comment with the SHA the maintainer needs to vet. | -| Label removed | No reaction. The next PR event (push, label, etc.) re-evaluates the gate, which now sees no label and passes as a no-op. | -| Author force-pushes after label set | copy-pr-bot re-mirrors the new SHA → gated workflow fires on `push` → because the label is still on the PR, `pr_metadata` runs the build/E2E jobs without manual re-run → `workflow_run` fires the gate re-run → new green check on the new SHA. | -| Maintainer re-runs the gated workflow manually from the Actions UI | Same as above without the force-push. This is the path the help workflow points the maintainer at. | -| Gate's first evaluation fails (label set, upstream not yet started) | Email-on-failure noise. The check eventually flips to success once upstream finishes and `workflow_run` re-runs the gate. Tracked as a known rough edge; possible fix is posting `neutral` until the upstream completes. | - -## References - -- copy-pr-bot: -- Astral hardening guidance: -- GitHub Actions security pattern for self-hosted runners: -- `pull_request_target` foot-gun: -- Contributor-facing flow doc: [../CI.md](../CI.md) diff --git a/architecture/compute-runtimes.md b/architecture/compute-runtimes.md new file mode 100644 index 000000000..095b7d020 --- /dev/null +++ b/architecture/compute-runtimes.md @@ -0,0 +1,69 @@ +# Compute Runtimes + +Compute runtimes create, stop, delete, and watch sandbox workloads for the +gateway. They do not replace sandbox policy enforcement. 
Every runtime starts a +workload that runs the `openshell-sandbox` supervisor, and the supervisor +enforces the sandbox contract locally. + +## Driver Contract + +Each runtime receives a sandbox spec from the gateway and is responsible for: + +- Selecting the sandbox image. +- Injecting sandbox identity and gateway callback configuration. +- Supplying TLS or secret material for supervisor callbacks. +- Providing the supervisor binary or image in the workload. +- Reporting lifecycle and platform events back to the gateway. +- Cleaning up runtime-owned resources. + +## Runtime Summary + +| Runtime | Best fit | Sandbox boundary | Notes | +|---|---|---|---| +| Docker | Local development with Docker available. | Container plus nested sandbox namespace. | Uses host networking so loopback gateway endpoints work from the supervisor. | +| Podman | Rootless or single-machine deployments. | Container plus nested sandbox namespace. | Uses the Podman REST API, OCI image volumes, and CDI GPU devices when available. | +| Kubernetes | Cluster deployment through Helm. | Pod plus nested sandbox namespace. | Uses Kubernetes API objects, service accounts, secrets, PVC-backed workspace storage, and GPU resources. | +| VM | Experimental microVM isolation. | Per-sandbox libkrun VM. | Gateway spawns `openshell-driver-vm` as a subprocess over a Unix socket. | + +Runtime-specific implementation notes belong in the driver crate README: + +- `crates/openshell-driver-docker/README.md` +- `crates/openshell-driver-podman/README.md` +- `crates/openshell-driver-kubernetes/README.md` +- `crates/openshell-driver-vm/README.md` + +## Supervisor Delivery + +The supervisor must be available inside each sandbox workload: + +| Runtime | Delivery model | +|---|---| +| Docker | Bind-mounted or extracted supervisor binary configured by the gateway. | +| Podman | Read-only OCI image volume containing the supervisor binary. | +| Kubernetes | Sandbox pod image or pod template configuration. 
| +| VM | Embedded in the guest rootfs bundle. | + +Driver-controlled environment variables must override sandbox image or template +values for sandbox ID, sandbox name, gateway endpoint, relay socket path, TLS +paths, and command metadata. + +## Images + +The gateway image and Helm chart are built from this repository. Sandbox images +are maintained separately in the OpenShell Community repository or supplied by +users. + +Custom sandbox images must include the agent runtime and any system +dependencies, but they should not need to include the gateway. GPU-capable +images must include the user-space libraries required by the workload. The +runtime still owns GPU device injection. + +## Deployment Shape + +Kubernetes deployments use the Helm chart under `deploy/helm/openshell`. +Standalone local deployments start the gateway with a selected runtime such as +Docker, Podman, or VM. The CLI can register multiple gateways and switch between +them without changing the sandbox architecture. + +When runtime infrastructure changes, validate the relevant sandbox e2e path and +update the matching driver README if a maintainer-facing constraint changes. diff --git a/architecture/custom-vm-runtime.md b/architecture/custom-vm-runtime.md deleted file mode 100644 index 40fd29dcb..000000000 --- a/architecture/custom-vm-runtime.md +++ /dev/null @@ -1,343 +0,0 @@ -# Custom libkrunfw VM Runtime - -> Status: Experimental and work in progress (WIP). The VM compute driver is -> under active development and may change. - -## Overview - -The OpenShell gateway uses [libkrun](https://github.com/containers/libkrun) via the -`openshell-driver-vm` compute driver to boot a lightweight microVM per sandbox. -Each VM runs on Apple Hypervisor.framework (macOS) or KVM (Linux), with the guest -kernel embedded inside `libkrunfw`. - -The stock `libkrunfw` from Homebrew ships a minimal kernel without bridge, -netfilter, or conntrack support. 
That is insufficient for the sandbox supervisor's -per-sandbox network namespace primitives (veth pair + iptables, see -`crates/openshell-sandbox/src/sandbox/linux/netns.rs`). The custom libkrunfw -runtime adds bridge, iptables/nftables, and conntrack support to the guest -kernel. - -The driver is spawned by `openshell-gateway` as a subprocess, talks to it over a -Unix domain socket (`compute-driver.sock`) with the -`openshell.compute.v1.ComputeDriver` gRPC surface, and manages per-sandbox -microVMs. The runtime (libkrun + libkrunfw + gvproxy) and the sandbox -supervisor are embedded directly in the driver binary; each sandbox guest -rootfs is derived from a container image at create time. - -## Architecture - -```mermaid -graph TD - subgraph Host["Host (macOS / Linux)"] - GATEWAY["openshell-gateway
(compute::vm::spawn)"] - DRIVER["openshell-driver-vm
(compute-driver.sock)"] - EMB["Embedded runtime (zstd)
libkrun · libkrunfw · gvproxy
+ openshell-sandbox.zst"] - GVP["gvproxy (per sandbox)
virtio-net · DHCP · DNS"] - - GATEWAY <-->|gRPC over UDS| DRIVER - DRIVER --> EMB - DRIVER -->|spawns one per sandbox| GVP - end - - subgraph Guest["Per-sandbox microVM"] - SBXINIT["/srv/openshell-vm-sandbox-init.sh"] - SBX["/opt/openshell/bin/openshell-sandbox
(PID 1, supervisor)"] - SBXINIT --> SBX - end - - DRIVER -- "fork + krun_start_enter" --> SBXINIT - GVP -- "virtio-net eth0" --> Guest - SBX -.->|"outbound ConnectSupervisor
gRPC stream"| GATEWAY - CLIENT["openshell-cli"] -->|SSH over supervisor relay| GATEWAY -``` - -The driver spawns **one microVM per sandbox**. Each VM boots directly into -`openshell-sandbox` as PID 1. All gateway ingress — SSH, exec, connect — rides -the supervisor-initiated `ConnectSupervisor` gRPC stream opened from inside the -guest back out to the gateway, so gvproxy is configured with `-ssh-port -1` and -never binds a host-side TCP listener. - -## Embedded Runtime - -`openshell-driver-vm` embeds the VM runtime libraries and the sandbox -supervisor as zstd-compressed byte arrays, extracting on demand: - -```text -~/.local/share/openshell/vm-runtime// # libkrun / libkrunfw / gvproxy -├── libkrun.{dylib,so} -├── libkrunfw.{5.dylib,so.5} -└── gvproxy - -/sandboxes//rootfs/ # per-sandbox rootfs -``` - -Old runtime cache versions are cleaned up when a new version is extracted. - -### Sandbox rootfs preparation - -Each VM sandbox starts from either a registry image fetched directly over OCI or -a local Docker image reference produced by Dockerfile-based `--from` sources. -For local Dockerfile sources, the CLI builds the image on the local Docker -daemon and passes the ordinary image tag through `template.image`. The VM driver -first checks the local Docker daemon for that tag; when present, it exports the -image filesystem and **rewrites that filesystem into a supervisor-only sandbox -guest** before caching it: - -- `/srv/openshell-vm-sandbox-init.sh` is installed as the guest entrypoint -- the bundled `openshell-sandbox` binary is copied into - `/opt/openshell/bin/openshell-sandbox` -- Kubernetes state and manifests are stripped out if the image contains them -- the guest boots directly into `openshell-sandbox` -- no Kubernetes control - plane, no kube-proxy, no CNI plugins - -See `crates/openshell-driver-vm/src/rootfs.rs` for the rewrite logic and -`crates/openshell-driver-vm/scripts/openshell-vm-sandbox-init.sh` for the init -script that gets installed. 
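The rewrite steps above can be sketched in a few shell commands. This is illustrative only — the real logic lives in `crates/openshell-driver-vm/src/rootfs.rs`; the placeholder supervisor binary, the one-line init script, and the stripped Kubernetes paths are assumptions for the example:

```shell
#!/usr/bin/env sh
# Illustrative sketch of the supervisor-only rootfs rewrite described above.
# Guest paths mirror the documented layout; everything else is a stand-in.
set -eu

rootfs="$(mktemp -d)/rootfs"            # stand-in for the unpacked image filesystem
mkdir -p "$rootfs/etc/kubernetes"       # pretend the base image carried k8s state

# 1. Install the guest entrypoint.
mkdir -p "$rootfs/srv"
printf '#!/bin/sh\nexec /opt/openshell/bin/openshell-sandbox\n' \
  > "$rootfs/srv/openshell-vm-sandbox-init.sh"
chmod +x "$rootfs/srv/openshell-vm-sandbox-init.sh"

# 2. Copy the bundled supervisor binary (an empty placeholder file here).
mkdir -p "$rootfs/opt/openshell/bin"
: > "$rootfs/opt/openshell/bin/openshell-sandbox"
chmod +x "$rootfs/opt/openshell/bin/openshell-sandbox"

# 3. Strip Kubernetes state and manifests the image may carry.
rm -rf "$rootfs/etc/kubernetes" "$rootfs/var/lib/kubelet"

echo "rootfs rewritten under $rootfs"
```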
- -### `--internal-run-vm` helper - -The driver binary has two modes: the default mode is the gRPC server; when -launched with `--internal-run-vm` it becomes a per-sandbox launcher. The driver -spawns one launcher per sandbox as a subprocess, which in turn starts `gvproxy` -and calls `krun_start_enter` to boot the guest. Keeping the launcher in the -same binary means the driver ships a single artifact for both roles. - -When a sandbox sets `template.image` through `openshell sandbox create --from ...`, -the VM driver treats that image as the base guest rootfs source for that -sandbox. When `template.image` is omitted, the gateway fills it from the VM -driver's advertised `default_image`, which matches the gateway's configured -sandbox image. The driver: - -- resolves the image on the gateway host without Docker for registry and - community image refs -- for local Dockerfile sources, the CLI builds through the host Docker socket - and passes the resulting ordinary Docker tag through `template.image` -- unpacks the image filesystem, injects the VM sandbox init/supervisor files, - and validates required guest tools such as `bash`, `mount`, `ip`, and `sed` -- caches the prepared guest rootfs under - `/images//rootfs.tar` -- extracts a private runtime copy under - `/sandboxes//rootfs` - -The cache key uses an immutable image identity: repo digest for registry images -and the local Docker image ID for images resolved from the local daemon. -Different VM sandboxes can use different base images concurrently because the -shared cache is per image, not global for the driver. Cached prepared rootfs -entries remain on disk until the operator removes them from the VM driver state -directory. - -Docker is therefore no longer required for VM sandboxes created from registry or -community image refs. It is only required on the local CLI/gateway host when the -source is a local Dockerfile or build context. 
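The per-image cache keying described above can be sketched as a tiny helper. The `cache_path` function and the `cksum`-based digest are assumptions for the example — the driver keys on the real repo digest or Docker image ID — but the property it demonstrates is the documented one: identical image identities share one cached rootfs, different identities get separate entries:

```shell
#!/usr/bin/env sh
# Sketch of per-image rootfs caching: registry images key by repo digest,
# local Docker images by image ID. cksum stands in for the real identity hash.
set -eu

state_dir="$(mktemp -d)"   # stand-in for the VM driver state directory

cache_path() {
  # $1 = source kind (registry|docker), $2 = immutable image identity
  key=$(printf '%s:%s' "$1" "$2" | cksum | cut -d' ' -f1)
  echo "$state_dir/images/$key/rootfs.tar"
}

a=$(cache_path registry "sha256:abc123")
b=$(cache_path registry "sha256:abc123")
c=$(cache_path docker "f00dfeed")

[ "$a" = "$b" ]    # same digest -> same cached prepared rootfs, safe to reuse
[ "$a" != "$c" ]   # different identity -> independent cache entry
echo "cache entries live under $state_dir/images/"
```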
- -Local Dockerfile sources are treated as trusted local-development inputs for VM -gateways. Remote VM gateways still reject local Dockerfile sources until a -gateway-side artifact validation and transfer boundary is designed. - -There is no embedded guest rootfs fallback anymore. VM sandboxes therefore -require either `template.image` or a configured default sandbox image. This is -still replace-the-rootfs semantics, so VM images must remain base-compatible -with the sandbox guest init path. Distroless or `scratch` images are not -expected to work. - -The legacy `openshell-vm` crate remains in the repository for later -deprecation, but it is excluded from the normal workspace and release paths. -`openshell-driver-vm` owns active VM runtime build inputs. - -## Network Plane - -The driver launches a **dedicated `gvproxy` instance per sandbox** to provide the -guest's networking plane: - -- virtio-net backend over a Unix SOCK_STREAM (Linux) or SOCK_DGRAM (macOS vfkit) - socket, which surfaces as `eth0` inside the guest -- DHCP server + default router (192.168.127.1 / 192.168.127.2) for the guest's - udhcpc client -- DNS for host aliases: the guest init script seeds `/etc/hosts` with - `host.openshell.internal` → 192.168.127.1, while leaving gvproxy's legacy - `host.containers.internal` / `host.docker.internal` resolution intact - -The `-listen` API socket and the `-ssh-port` forwarder are both intentionally -omitted. After the supervisor-initiated relay migration the driver does not -enqueue any host-side port forwards, and the guest's SSH listener lives on a -Unix socket at `/run/openshell/ssh.sock` inside the VM that is reached over the -outbound `ConnectSupervisor` gRPC stream. Binding a host listener would race -concurrent sandboxes for port 2222 and surface a misleading "sshd is reachable" -endpoint. - -The sandbox supervisor's per-sandbox netns (veth pair + iptables) branches off -of this plane. 
libkrun's built-in TSI socket impersonation would not satisfy -those kernel-level primitives, which is why we need the custom libkrunfw. - -## Process Lifecycle Cleanup - -`openshell-driver-vm` installs a cross-platform "die when my parent dies" -primitive (`procguard`) in every link of the spawn chain so that killing -`openshell-gateway` (SIGTERM, SIGKILL, or crash) reaps the driver, per-sandbox -launcher, gvproxy, and the libkrun worker: - -- Linux: `nix::sys::prctl::set_pdeathsig(SIGKILL)` -- macOS / BSDs: `smol-rs/polling` with `ProcessOps::Exit` on a helper thread -- gvproxy (the one non-Rust child) gets `PR_SET_PDEATHSIG` via `pre_exec` on - Linux, and is SIGTERM'd from the launcher's procguard cleanup callback on - macOS - -See `crates/openshell-driver-vm/src/procguard.rs` for the implementation and -`tasks/scripts/vm/smoke-orphan-cleanup.sh` (exposed as -`mise run vm:smoke:orphan-cleanup`) for the regression test that covers both -SIGTERM and SIGKILL paths. - -## Runtime Provenance - -At driver startup the loaded runtime bundle is logged with: - -- Library paths and SHA-256 hashes -- Whether the runtime is custom-built or stock -- For custom runtimes: libkrunfw commit, kernel version, build timestamp - -This information is sourced from `provenance.json` (generated by the build -script) and makes it straightforward to correlate sandbox VM behavior with a -specific runtime artifact. - -## Build Pipeline - -```mermaid -graph LR - subgraph Source["crates/openshell-driver-vm/runtime/"] - KCONF["kernel/openshell.kconfig
Kernel config fragment"] - end - - subgraph Linux["Linux CI (build-libkrun.sh)"] - BUILD_L["Build kernel + libkrunfw.so + libkrun.so"] - end - - subgraph macOS["macOS CI (build-libkrun-macos.sh)"] - BUILD_M["Build libkrunfw.dylib + libkrun.dylib"] - end - - subgraph Output["vm-runtime-<platform>.tar.zst"] - LIB_SO["libkrunfw.so + libkrun.so + gvproxy
(Linux)"] - LIB_DY["libkrunfw.dylib + libkrun.dylib + gvproxy
(macOS)"] - end - - KCONF --> BUILD_L --> LIB_SO - KCONF --> BUILD_M --> LIB_DY -``` - -The `vm-runtime-.tar.zst` artifact is consumed by -`openshell-driver-vm`'s `build.rs`, which embeds the library set into the -binary via `include_bytes!()`. Setting `OPENSHELL_VM_RUNTIME_COMPRESSED_DIR` -at build time (wired up by `tasks/scripts/gateway-vm.sh`, registered as -`mise run gateway:vm`) points the build at the staged artifacts. - -## Kernel Config Fragment - -The `openshell.kconfig` fragment enables these kernel features on top of the -stock libkrunfw kernel: - -| Feature | Key Configs | Purpose | -|---------|-------------|---------| -| Network namespaces | `CONFIG_NET_NS`, `CONFIG_NAMESPACES` | Sandbox netns isolation | -| veth | `CONFIG_VETH` | Sandbox network namespace pairs | -| Bridge device | `CONFIG_BRIDGE`, `CONFIG_BRIDGE_NETFILTER` | Bridge support + iptables visibility into bridge traffic | -| Netfilter framework | `CONFIG_NETFILTER`, `CONFIG_NETFILTER_ADVANCED`, `CONFIG_NETFILTER_XTABLES` | iptables/nftables framework | -| xtables match modules | `CONFIG_NETFILTER_XT_MATCH_CONNTRACK`, `_COMMENT`, `_MULTIPORT`, `_MARK`, `_STATISTIC`, `_ADDRTYPE`, `_RECENT`, `_LIMIT` | Sandbox supervisor iptables rules | -| Connection tracking | `CONFIG_NF_CONNTRACK`, `CONFIG_NF_CT_NETLINK` | NAT state tracking | -| NAT | `CONFIG_NF_NAT` | Sandbox egress DNAT/SNAT | -| iptables | `CONFIG_IP_NF_IPTABLES`, `CONFIG_IP_NF_FILTER`, `CONFIG_IP_NF_NAT`, `CONFIG_IP_NF_MANGLE` | Masquerade and compat | -| nftables | `CONFIG_NF_TABLES`, `CONFIG_NFT_CT`, `CONFIG_NFT_NAT`, `CONFIG_NFT_MASQ`, `CONFIG_NFT_NUMGEN`, `CONFIG_NFT_FIB_IPV4` | nftables path | -| IP forwarding | `CONFIG_IP_ADVANCED_ROUTER`, `CONFIG_IP_MULTIPLE_TABLES` | Sandbox-to-host routing | -| Traffic control | `CONFIG_NET_SCH_HTB`, `CONFIG_NET_CLS_CGROUP` | QoS | -| Cgroups | `CONFIG_CGROUPS`, `CONFIG_CGROUP_DEVICE`, `CONFIG_MEMCG`, `CONFIG_CGROUP_PIDS` | Sandbox resource limits | -| TUN/TAP | `CONFIG_TUN` | CNI plugin 
compatibility; inherited from the shared kconfig, not exercised by the driver. | -| Dummy interface | `CONFIG_DUMMY` | Fallback networking | -| Landlock | `CONFIG_SECURITY_LANDLOCK` | Sandbox supervisor filesystem sandboxing | -| Seccomp filter | `CONFIG_SECCOMP_FILTER` | Sandbox supervisor syscall filtering | - -See `crates/openshell-driver-vm/runtime/kernel/openshell.kconfig` for the full -fragment with inline comments explaining why each option is needed. - -## Verification - -- **Capability checker** (`check-vm-capabilities.sh`): runs inside a sandbox VM - to verify kernel capabilities. Produces pass/fail results for each required - feature. -- **Orphan-cleanup smoke test**: `mise run vm:smoke:orphan-cleanup` asserts - that killing the gateway leaves zero driver, launcher, gvproxy, or libkrun - survivors. - -## Build Commands - -```shell -# One-time setup: download pre-built runtime (~30s) -mise run vm:setup - -# Start openshell-gateway with the VM compute driver -mise run gateway:vm - -# With custom kernel (optional, adds ~20 min) -FROM_SOURCE=1 mise run vm:setup - -# Remove the staged compressed runtime when you need a clean rebuild -rm -rf target/vm-runtime-compressed -``` - -See `crates/openshell-driver-vm/README.md` for the full driver workflow, -including multi-gateway development, CLI registration, and sandbox creation -examples. - -## CI/CD - -The driver release path is split between on-demand runtime builds and normal -OpenShell releases: - -### Kernel Runtime (`release-vm-kernel.yml`) - -Builds the custom libkrunfw (kernel firmware), libkrun (VMM), and gvproxy for -all supported platforms. Run it on demand when the kernel config or pinned -versions change. 
- -| Platform | Runner | Build Method | -|----------|--------|-------------| -| Linux ARM64 | `linux-arm64-cpu8` | Native `build-libkrun.sh` | -| Linux x86_64 | `linux-amd64-cpu8` | Native `build-libkrun.sh` | -| macOS ARM64 | `macos-latest-xlarge` (GitHub-hosted) | `build-libkrun-macos.sh` | - -Artifacts: `vm-runtime-{platform}.tar.zst` containing libkrun, libkrunfw, -gvproxy, and provenance metadata. Each platform builds its own libkrunfw and -libkrun natively; the kernel inside libkrunfw is always Linux regardless of -host platform. The workflow publishes GitHub artifact attestations for each -runtime tarball instead of a separate runtime checksum file. - -### Driver Binary (`release-dev.yml` / `release-tag.yml`) - -Builds the self-contained `openshell-driver-vm` binary for every platform, -with the kernel runtime + bundled sandbox supervisor embedded. Development -driver binaries are published to the rolling `dev` release; tagged driver -binaries are published to the corresponding `v*` release. - -The reusable driver workflows pull the current `vm-runtime-.tar.zst` -from the `vm-runtime` release; their build jobs set -`OPENSHELL_VM_RUNTIME_COMPRESSED_DIR=$PWD/target/vm-runtime-compressed` and -run `cargo build --release -p openshell-driver-vm`. The macOS driver is -cross-compiled via osxcross (no macOS runner needed for the binary build — -only for the kernel build). - -macOS driver binaries produced via osxcross are not codesigned. Development -builds are signed automatically by `tasks/scripts/gateway-vm.sh` -(registered as `mise run gateway:vm`) and by the generated Homebrew formula -when `install-dev.sh` installs the selected release on Apple Silicon macOS. A -packaged release needs signing in CI. - -## Rollout Strategy - -1. Custom runtime is embedded by default when building `openshell-driver-vm` - with `OPENSHELL_VM_RUNTIME_COMPRESSED_DIR` set (wired up by - `tasks/scripts/gateway-vm.sh`). -2. 
The sandbox init script validates kernel capabilities at boot and fails - fast if missing. -3. For development, override with `OPENSHELL_VM_RUNTIME_DIR` to use a local - directory instead of the extracted cache. -4. In CI, the kernel runtime is pre-built and cached in the `vm-runtime` release. - Dev and tagged release builds download that runtime, embed it into - `openshell-driver-vm`, and publish the driver next to `openshell-gateway`. diff --git a/architecture/docker-driver.md b/architecture/docker-driver.md deleted file mode 100644 index 25d48f740..000000000 --- a/architecture/docker-driver.md +++ /dev/null @@ -1,129 +0,0 @@ -# Docker Driver - -The Docker compute driver manages sandbox containers through the local Docker -daemon using the `bollard` client. It targets local developer environments -where running a full Kubernetes cluster is unnecessary but Docker is already -available. - -The gateway remains a host process. Each sandbox container bind-mounts a Linux -`openshell-sandbox` supervisor binary and uses Docker host networking so the -supervisor can connect to a gateway that is listening on host loopback without -requiring an additional bridge-reachable listener on Linux. - -## Source Map - -| Path | Purpose | -|---|---| -| `crates/openshell-driver-docker/src/lib.rs` | Docker compute driver implementation | -| `crates/openshell-driver-docker/src/tests.rs` | Unit tests for container spec, env, TLS paths, GPU, resource limits, and cache helpers | -| `crates/openshell-server/src/cli.rs` | Gateway CLI flags for Docker driver configuration | -| `crates/openshell-server/src/lib.rs` | In-process Docker compute runtime wiring | - -## Runtime Model - -```mermaid -flowchart LR - CLI["OpenShell CLI
host"] -->|gRPC/HTTP
127.0.0.1:8080| GW["Gateway
host process"] - GW -->|Docker API| DA["Docker daemon"] - DA --> C["Sandbox container
network_mode=host"] - C --> SV["openshell-sandbox
supervisor"] - SV -->|ConnectSupervisor
OPENSHELL_ENDPOINT| GW - SV --> NS["Nested sandbox netns
workload + policy proxy"] -``` - -The Docker container itself uses `network_mode = "host"`. This is intentional -for now: it makes a gateway bound to `127.0.0.1` reachable from the supervisor -as `127.0.0.1`, matching the host process' endpoint without a bridge listener, -NAT rule, or userland proxy. - -The container also gets a Docker-managed `/etc/hosts` entry for -`host.openshell.internal` that resolves to `127.0.0.1`. This gives callers a -stable OpenShell-owned hostname for host services without requiring changes to -the host machine's hosts file. - -The supervisor still creates a nested network namespace for the actual workload -and routes workload traffic through its policy proxy. Agent network requests are -enforced by the supervisor in that nested namespace. - -## Container Spec - -`build_container_create_body()` constructs the Docker container: - -| Field | Value | Reason | -|---|---|---| -| `image` | Sandbox template image | User-selected runtime image | -| `user` | `"0"` | Supervisor needs root inside the container for namespace and mount setup | -| `entrypoint` | `/opt/openshell/bin/openshell-sandbox` | Bind-mounted supervisor binary | -| `cmd` | Empty vector | Prevents image CMD args from being appended to the supervisor entrypoint | -| `network_mode` | `"host"` | Lets supervisor connect to host loopback gateway endpoints | -| `extra_hosts` | `host.openshell.internal:127.0.0.1` | Stable container-local alias for host loopback services | -| `cap_add` | `SYS_ADMIN`, `NET_ADMIN`, `SYS_PTRACE`, `SYSLOG` | Required for supervisor isolation setup and process inspection | -| `security_opt` | `apparmor=unconfined` | Docker's default AppArmor profile blocks mount operations required by network namespace setup | -| `restart_policy` | `unless-stopped` | Resume managed sandboxes after Docker or gateway restarts | -| `device_requests` | CDI all-GPU request when `spec.gpu` is true | Enables Docker CDI GPU sandboxes when daemon support is detected | - -## Gateway 
Callback - -The Docker driver injects `OPENSHELL_ENDPOINT` into each sandbox container from -`Config::grpc_endpoint` without rewriting it. This is the key difference from a -bridge-network design. - -Examples: - -```shell -OPENSHELL_GRPC_ENDPOINT=http://127.0.0.1:8080 -``` - -and: - -```shell -OPENSHELL_GRPC_ENDPOINT=https://127.0.0.1:8080 -``` - -are passed into the supervisor as-is. Because the container shares the host -network namespace, `127.0.0.1` resolves to the host loopback interface and the -gateway is reachable when it binds loopback. - -The endpoint can also use the stable alias: - -```shell -OPENSHELL_GRPC_ENDPOINT=http://host.openshell.internal:8080 -``` - -In host network mode this name resolves to `127.0.0.1` inside the container. - -For TLS endpoints, the gateway certificate must include the exact endpoint host -as a subject alternative name. For `https://127.0.0.1:8080`, the certificate -needs an IP SAN for `127.0.0.1`. For `https://localhost:8080`, it needs a DNS -SAN for `localhost`. For `https://host.openshell.internal:8080`, it needs a DNS -SAN for `host.openshell.internal`. Docker sandboxes also require client TLS -material: - -| Env / flag | Purpose | -|---|---| -| `OPENSHELL_DOCKER_TLS_CA` / `--docker-tls-ca` | CA certificate mounted at `/etc/openshell/tls/client/ca.crt` | -| `OPENSHELL_DOCKER_TLS_CERT` / `--docker-tls-cert` | Client certificate mounted at `/etc/openshell/tls/client/tls.crt` | -| `OPENSHELL_DOCKER_TLS_KEY` / `--docker-tls-key` | Client private key mounted at `/etc/openshell/tls/client/tls.key` | - -When `OPENSHELL_GRPC_ENDPOINT` uses `http://`, these TLS mounts are not -required and providing them is rejected. When it uses `https://`, all three are -required. - -## Environment - -`build_environment()` merges template environment, spec environment, and -driver-controlled keys. 
Driver-controlled keys win: - -| Variable | Value | -|---|---| -| `OPENSHELL_ENDPOINT` | Exact configured gateway endpoint | -| `OPENSHELL_SANDBOX_ID` | Sandbox id | -| `OPENSHELL_SANDBOX` | Sandbox name | -| `OPENSHELL_SSH_SOCKET_PATH` | Unix socket path used by the supervisor's embedded SSH daemon | -| `OPENSHELL_SANDBOX_COMMAND` | `sleep infinity` | -| `OPENSHELL_TLS_CA` | Mounted CA path when HTTPS is enabled | -| `OPENSHELL_TLS_CERT` | Mounted client cert path when HTTPS is enabled | -| `OPENSHELL_TLS_KEY` | Mounted client key path when HTTPS is enabled | - -The Docker driver does not inject `OPENSHELL_SSH_HANDSHAKE_SECRET`; the -supervisor-to-gateway path relies on mTLS for the Docker callback. diff --git a/architecture/docs-site.md b/architecture/docs-site.md deleted file mode 100644 index f017ad628..000000000 --- a/architecture/docs-site.md +++ /dev/null @@ -1,53 +0,0 @@ -# Docs Site Layout - -## Overview - -Published documentation content lives under `docs/`. The `fern/` directory stores Fern site configuration, React components, theme assets, and publish settings. - -## Repository Layout - -| Path | Role | -|---|---| -| `docs/` | Source of truth for published documentation pages and assets | -| `docs/index.yml` | Navigation definition for the published docs site | -| `fern/docs.yml` | Fern site configuration, including version wiring and publish settings | -| `fern/components/` | Custom Fern React components | -| `fern/assets/` | Site logos and other Fern-managed assets | -| `fern/main.css` | Site theme overrides | -| `fern/fern.config.json` | Fern CLI version and organization config | - -The navigation source is `docs/index.yml`. `fern/docs.yml` points its `versions[].path` field at `../docs/index.yml`, so Fern reads page structure from `docs/` during validation, preview, and publish. - -## Local Workflow - -`tasks/docs.toml` defines the local docs tasks: - -- `mise run docs` runs strict validation. 
- - Resolves the Fern CLI version from `fern/fern.config.json` - - Runs `fern check` -- `mise run docs:serve` starts a local Fern preview server with `fern docs dev` - -Both tasks execute from `fern/`, but they validate and render the content defined in `docs/`. - -## CI and Release Workflow - -### Pull requests - -`.github/workflows/branch-docs.yml` is the PR docs workflow. - -- It triggers on changes under `docs/**`, `fern/**`, and the workflow file itself. -- It validates the site with `fern check`. -- When `FERN_TOKEN` is available, it runs `fern generate --docs --preview --id pr-<number>` and posts or updates a preview URL on the pull request. - -### Releases - -`.github/workflows/release-tag.yml` publishes production docs in the `publish-fern-docs` job. - -- The job runs after the release job completes. -- It installs the Fern CLI, changes into `./fern`, and runs `fern generate --docs`. - -## Operational Rules - -- Add or edit published pages in `docs/`. -- Change sidebar structure in `docs/index.yml`. -- Change site chrome, theme, Fern behavior, or publish settings in `fern/`. diff --git a/architecture/gateway-deploy-connect.md b/architecture/gateway-deploy-connect.md deleted file mode 100644 index 14bb3e90f..000000000 --- a/architecture/gateway-deploy-connect.md +++ /dev/null @@ -1,140 +0,0 @@ -# Gateway Communication - -## Overview - -This document describes how the CLI resolves a gateway and communicates with it once the endpoint already exists. The gateway exposes gRPC and HTTP services on a single multiplexed port, and the CLI chooses one of three connection modes: direct mTLS, edge-authenticated WebSocket tunnel, or plaintext HTTP/2 behind a trusted proxy. - -## Connection Flow - -### Gateway resolution - -When any CLI command needs to talk to the gateway, it resolves the target through a priority chain (`crates/openshell-cli/src/main.rs` -- `resolve_gateway()`): - -1. `--gateway-endpoint <url>` flag (direct URL, reusing stored metadata when the gateway is known).
-2. `--gateway <name>` / `-g <name>` flag. -3. `OPENSHELL_GATEWAY` environment variable. -4. Active gateway from `~/.config/openshell/active_gateway`. - -Resolution loads `GatewayMetadata` from disk to get the `gateway_endpoint` URL and `auth_mode`. When `--gateway-endpoint` is used, the CLI still tries to match the URL to stored metadata so edge auth tokens and TLS bundles continue to resolve by gateway name. - -### Connection modes - -```mermaid -graph TD - CLI["CLI Command"] - RESOLVE["Resolve Gateway"] - MODE{"auth_mode?"} - - MTLS["mTLS Channel"] - EDGE["Edge Tunnel"] - PLAIN["Plaintext Channel"] - - GW["Gateway (gRPC + HTTP)"] - - CLI --> RESOLVE - RESOLVE --> MODE - - MODE -->|"null / mtls"| MTLS - MODE -->|"cloudflare_jwt"| EDGE - MODE -->|"plaintext"| PLAIN - - MTLS -->|"TLS + client cert"| GW - EDGE -->|"WSS tunnel + JWT"| GW - PLAIN -->|"HTTP/2 plaintext"| GW -``` - -### mTLS connection (default) - -**File**: `crates/openshell-cli/src/tls.rs` -- `build_channel()` - -The default mode for self-deployed gateways. The CLI loads three PEM files from `~/.config/openshell/gateways/<name>/mtls/`: - -| File | Purpose | -| --------- | -------------------------------------------------------------- | -| `ca.crt` | Gateway CA certificate -- verifies the gateway's server cert | -| `tls.crt` | Client certificate -- proves the CLI's identity to the gateway | -| `tls.key` | Client private key | - -These are used to build a `tonic::transport::ClientTlsConfig`, which configures a `tonic::transport::Channel` for gRPC communication over HTTP/2 with mTLS.
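When a channel fails its TLS handshake, it can help to confirm the bundle itself is coherent before debugging the CLI. This sketch is a debugging aid, not part of the CLI; a throwaway self-signed pair is generated here so the check is runnable — point the paths at a real `tls.crt` / `tls.key` instead:

```shell
#!/usr/bin/env sh
# Sanity-check that a client private key matches its certificate by comparing
# the public key embedded in the cert with the key's public half.
set -eu
dir=$(mktemp -d); cd "$dir"

# Throwaway self-signed pair standing in for tls.crt / tls.key.
openssl req -x509 -newkey rsa:2048 -nodes -keyout tls.key -out tls.crt \
  -subj "/CN=openshell-client" -days 1 2>/dev/null

cert_pub=$(openssl x509 -in tls.crt -noout -pubkey)
key_pub=$(openssl pkey -in tls.key -pubout 2>/dev/null)
[ "$cert_pub" = "$key_pub" ] && echo "tls.key matches tls.crt"
```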
- -```mermaid -sequenceDiagram - participant CLI as CLI - participant GW as Gateway - - CLI->>CLI: Load ca.crt, tls.crt, tls.key - CLI->>GW: TCP connect to gateway_endpoint - CLI->>GW: TLS handshake (present client cert) - GW->>GW: Verify client cert against CA - GW-->>CLI: TLS established (HTTP/2 via ALPN) - CLI->>GW: gRPC requests (OpenShell / Inference service) -``` - -### Edge-authenticated connection - -**Files**: `crates/openshell-cli/src/edge_tunnel.rs`, `crates/openshell-cli/src/auth.rs` - -For gateways behind an edge proxy (e.g., Cloudflare Access), the CLI routes traffic through a local WebSocket tunnel proxy: - -1. `start_tunnel_proxy()` binds an ephemeral local TCP port. -2. Opens a WebSocket connection (`wss://<gateway-host>/_ws_tunnel`) to the edge with the stored bearer token in headers. -3. The gateway's `ws_tunnel.rs` handler upgrades the WebSocket and bridges it to an in-memory `MultiplexService` instance. -4. The gRPC channel connects to `http://127.0.0.1:<local-port>` (plaintext HTTP/2 over the tunnel). - -Authentication uses a browser-based flow: `gateway add` opens the user's browser to the gateway's `/auth/connect` endpoint, which reads the `CF_Authorization` cookie and relays it back to a localhost callback server. The token is stored at `~/.config/openshell/gateways/<name>/edge_token`. - -### Plaintext connection - -When the gateway is deployed with `--plaintext`, TLS is disabled entirely. The CLI connects over plain HTTP/2. This mode is intended for gateways behind a trusted reverse proxy or tunnel that handles TLS termination. - -The CLI also treats an explicit `http://...` registration as plaintext mode: - -```shell -openshell gateway add http://127.0.0.1:8080 --local -``` - -This stores `auth_mode = "plaintext"`, skips mTLS certificate extraction, and bypasses the edge browser-auth flow.
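All three modes begin with the same resolution step from the top of this section. The priority order can be sketched as a small shell function — a simplified stand-in for `resolve_gateway()` (the two flag forms are collapsed into one argument; nothing here is the real CLI code):

```shell
#!/usr/bin/env sh
# Sketch of the gateway resolution priority chain:
# explicit flag -> OPENSHELL_GATEWAY env var -> active_gateway file.
set -eu

config_dir=$(mktemp -d)
printf 'staging' > "$config_dir/active_gateway"

resolve_gateway() {
  # $1 = value of the --gateway / --gateway-endpoint flag (may be empty)
  if [ -n "${1:-}" ]; then echo "$1"; return; fi
  if [ -n "${OPENSHELL_GATEWAY:-}" ]; then echo "$OPENSHELL_GATEWAY"; return; fi
  cat "$config_dir/active_gateway"
}

[ "$(resolve_gateway prod)" = "prod" ]    # explicit flag wins
OPENSHELL_GATEWAY=dev
[ "$(resolve_gateway '')" = "dev" ]       # env var beats the stored default
unset OPENSHELL_GATEWAY
[ "$(resolve_gateway '')" = "staging" ]   # falls back to the active gateway
echo "resolution order holds"
```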
- -## File System Layout - -All connection artifacts are stored under `$XDG_CONFIG_HOME/openshell/` (default `~/.config/openshell/`): - -```text -openshell/ - active_gateway # plain text: active gateway name - gateways/ - <name>/ - metadata.json # GatewayMetadata JSON - mtls/ # mTLS bundle (when TLS enabled) - ca.crt # gateway CA certificate - tls.crt # client certificate - tls.key # client private key - edge_token # Edge auth JWT (when auth_mode=cloudflare_jwt) -``` - -## Registering an Edge-Authenticated Gateway - -For gateways that are already deployed behind an edge proxy (e.g., Cloudflare Access), deployment is not needed -- only registration. - -**File**: `crates/openshell-cli/src/run.rs` -- `gateway_add()` - -```mermaid -sequenceDiagram - participant U as User - participant CLI as openshell-cli - participant Browser as Browser - participant Edge as Edge Proxy - participant GW as Gateway - - U->>CLI: openshell gateway add https://gw.example.com - CLI->>CLI: Store metadata (auth_mode: cloudflare_jwt) - CLI->>Browser: Open https://gw.example.com/auth/connect - Browser->>Edge: Edge proxy login - Edge-->>Browser: CF_Authorization cookie - Browser->>GW: GET /auth/connect (with cookie) - GW-->>Browser: Relay page (extracts token, POSTs to localhost) - Browser->>CLI: POST token to localhost callback - CLI->>CLI: Store edge_token - CLI->>CLI: save_active_gateway - CLI-->>U: Gateway added and set as active -``` diff --git a/architecture/gateway-security.md b/architecture/gateway-security.md deleted file mode 100644 index b8c00571d..000000000 --- a/architecture/gateway-security.md +++ /dev/null @@ -1,439 +0,0 @@ -# Gateway Security - -## Overview - -By default, communication with the OpenShell gateway is secured by mutual TLS (mTLS). The CLI, SDK, and sandbox workloads present certificates signed by the deployment CA before they reach any application handler.
In Helm deployments, operators provide the certificate bundle as Kubernetes secrets and place the CLI client bundle in the local gateway credential directory. Non-Kubernetes deployments provide equivalent certificate files to the gateway and sandbox runtime. - -The gateway also supports Cloudflare-fronted deployments where the edge, not the gateway, is the first authentication boundary. In that mode the gateway either keeps TLS enabled but allows no-certificate client handshakes (`allow_unauthenticated=true`) and relies on application-layer Cloudflare JWTs, or disables gateway TLS entirely and serves plaintext behind a trusted reverse proxy or tunnel. - -This document covers the certificate hierarchy, how gateway transport security modes are enforced, how sandboxes and the CLI consume their certificates, and the broader security model of the gateway. - -## Architecture Diagram - -```mermaid -graph TD - subgraph PKI["PKI (operator provided)"] - CA["openshell-ca
(self-signed root)"] - SERVER_CERT["openshell-server cert
(signed by CA)"] - CLIENT_CERT["openshell-client cert
(signed by CA, shared)"] - CA --> SERVER_CERT - CA --> CLIENT_CERT - end - - subgraph CLUSTER["Kubernetes Helm Deployment"] - S1["openshell-server-tls
Secret (server cert+key)"] - S2["openshell-server-client-ca
Secret (CA cert)"] - S3["openshell-client-tls
Secret (client cert+key+CA)"] - GW["Gateway Process
(tokio-rustls)"] - SBX["Sandbox Workload"] - end - - subgraph HOST["User's Machine"] - CLI["CLI"] - MTLS_DIR["~/.config/openshell/
gateways/<name>/mtls/"] - end - - SERVER_CERT --> S1 - CA --> S2 - CLIENT_CERT --> S3 - CLIENT_CERT --> MTLS_DIR - - S1 --> GW - S2 --> GW - S3 --> SBX - MTLS_DIR --> CLI - - CLI -- "mTLS" --> GW - SBX -- "mTLS" --> GW -``` - -## Certificate Hierarchy - -The default PKI shape is a single-tier CA hierarchy. Operators can generate this bundle with their internal PKI tooling, cert-manager, or a local development CA. - -```text -openshell-ca (Self-signed Root CA, O=openshell, CN=openshell-ca) -├── openshell-server (Leaf cert, CN=openshell-server) -│ SANs: openshell, openshell.openshell.svc, -│ openshell.openshell.svc.cluster.local, -│ localhost, host.docker.internal, 127.0.0.1 -│ + extra SANs for remote deployments -│ -└── openshell-client (Leaf cert, CN=openshell-client) - Shared by the CLI and all sandbox workloads. -``` - -Key design decisions: - -- **Single client certificate**: One client cert is shared by the CLI and every sandbox workload. This simplifies secret management. Individual sandbox identity is not expressed at the TLS layer; post-authentication identification uses the `x-sandbox-id` gRPC header. -- **Certificate lifetime**: Certificate validity is owned by the operator's PKI policy. -- **CA key not stored in OpenShell**: The chart consumes certificates and CA bundles, but it does not need the CA private key. 
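A development-grade bundle of this shape can be produced with plain `openssl`. This is a sketch, not the recommended provisioning path — file names mirror the bundle above, but lifetimes and key sizes are illustrative, and production bundles should come from the operator's PKI tooling or cert-manager:

```shell
#!/usr/bin/env sh
# Single-tier hierarchy: self-signed openshell-ca signing a server leaf
# (with the documented SANs) and one shared client leaf.
set -eu
dir=$(mktemp -d); cd "$dir"

# Root: openshell-ca
openssl req -x509 -newkey rsa:2048 -nodes -keyout ca.key -out ca.crt \
  -subj "/O=openshell/CN=openshell-ca" -days 365 2>/dev/null

# Server leaf with the in-cluster and local SANs listed above.
printf 'subjectAltName=DNS:openshell,DNS:openshell.openshell.svc,DNS:openshell.openshell.svc.cluster.local,DNS:localhost,DNS:host.docker.internal,IP:127.0.0.1\n' > server-san.cnf
openssl req -newkey rsa:2048 -nodes -keyout server.key -out server.csr \
  -subj "/CN=openshell-server" 2>/dev/null
openssl x509 -req -in server.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out server.crt -days 365 -extfile server-san.cnf 2>/dev/null

# Shared client leaf for the CLI and all sandbox workloads.
openssl req -newkey rsa:2048 -nodes -keyout tls.key -out client.csr \
  -subj "/CN=openshell-client" 2>/dev/null
openssl x509 -req -in client.csr -CA ca.crt -CAkey ca.key -CAcreateserial \
  -out tls.crt -days 365 2>/dev/null

# Both leaves must chain to the CA; prints "<file>: OK" per cert.
openssl verify -CAfile ca.crt server.crt tls.crt
```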
- -## Kubernetes Secret Distribution - -In Helm deployments, the PKI bundle is distributed as three Kubernetes secrets in the `openshell` namespace: - -| Secret Name | Type | Contents | Consumed By | -|---|---|---|---| -| `openshell-server-tls` | `kubernetes.io/tls` | `tls.crt` (server cert), `tls.key` (server key) | Gateway StatefulSet | -| `openshell-server-client-ca` | `Opaque` | `ca.crt` (CA cert) | Gateway StatefulSet (client verification) | -| `openshell-client-tls` | `Opaque` | `tls.crt` (client cert), `tls.key` (client key), `ca.crt` (CA cert) | Sandbox workloads, CLI (via local filesystem) | - -Secret names are chart values under `server.tls.*` in `deploy/helm/openshell/values.yaml`. - -### Gateway Mounts - -The Helm StatefulSet (`deploy/helm/openshell/templates/statefulset.yaml`) mounts: - -| Volume | Mount Path | Source Secret | -|---|---|---| -| `tls-cert` | `/etc/openshell-tls/server/` (read-only) | `openshell-server-tls` | -| `tls-client-ca` | `/etc/openshell-tls/client-ca/` (read-only) | `openshell-server-client-ca` | - -Environment variables point the gateway binary to these paths: - -```text -OPENSHELL_TLS_CERT=/etc/openshell-tls/server/tls.crt -OPENSHELL_TLS_KEY=/etc/openshell-tls/server/tls.key -OPENSHELL_TLS_CLIENT_CA=/etc/openshell-tls/client-ca/ca.crt -``` - -### Sandbox Workload TLS Material - -When the Kubernetes driver creates a sandbox pod, it injects: - -- A volume backed by the `openshell-client-tls` secret. -- A read-only mount at `/etc/openshell-tls/client/` on the agent container. 
-- Environment variables for the sandbox gRPC client: - -```text -OPENSHELL_TLS_CA=/etc/openshell-tls/client/ca.crt -OPENSHELL_TLS_CERT=/etc/openshell-tls/client/tls.crt -OPENSHELL_TLS_KEY=/etc/openshell-tls/client/tls.key -OPENSHELL_ENDPOINT=https://openshell.openshell.svc.cluster.local:8080 -``` - -### CLI Local Storage - -The CLI's copy of the client certificate bundle is written to: - -```text -$XDG_CONFIG_HOME/openshell/gateways/<name>/mtls/ -├── ca.crt -├── tls.crt -└── tls.key -``` - -Files are written atomically using a temp-dir -> validate -> rename strategy with backup and rollback on failure. See `crates/openshell-bootstrap/src/mtls.rs:10`. - -## PKI Provisioning Sequence - -PKI provisioning is operator-driven: - -1. Generate or obtain a server certificate, server key, client certificate, client key, and CA certificate. -2. Provide the server certificate and client CA to the gateway process. -3. Provide the client certificate bundle to sandbox workloads through the selected compute driver. -4. Store the same client bundle under `~/.config/openshell/gateways/<name>/mtls/` so the CLI can authenticate to the gateway. - -For Helm deployments, steps 2 and 3 use the `openshell-server-tls`, `openshell-server-client-ca`, and `openshell-client-tls` Kubernetes secrets before installing or upgrading the chart. - -```mermaid -sequenceDiagram - participant O as Operator - participant K8s as Kubernetes API - participant Helm as Helm - participant GW as Gateway Pod - participant CLI as CLI - - O->>K8s: Create TLS and SSH handshake secrets - O->>Helm: helm upgrade --install - Helm->>K8s: Apply StatefulSet and mounts - K8s->>GW: Start gateway with mounted certs - O->>CLI: Store client cert bundle locally - CLI->>GW: Connect with mTLS -``` - -## Gateway TLS Enforcement - -The gateway supports three transport modes: - -1. **mTLS (default)** -- TLS is enabled and client certificates are required. -2.

**Dual-auth TLS** -- TLS is enabled, but the handshake also accepts clients without certificates (`allow_unauthenticated=true`). This is used for Cloudflare Tunnel deployments where the edge authenticates the user and forwards a Cloudflare JWT to the gateway. -3. **Plaintext behind edge** -- TLS is disabled at the gateway and the service listens on HTTP behind a trusted reverse proxy or tunnel. - -### Server Configuration - -`TlsAcceptor::from_files()` (`crates/openshell-server/src/tls.rs:27`) constructs the `rustls::ServerConfig`: - -1. **Server identity**: loads the server certificate and private key from PEM files (supports PKCS#1, PKCS#8, and SEC1 key formats). -2. **Client verification**: builds a `WebPkiClientVerifier` from the CA certificate. In the default mode it requires a valid client certificate; in dual-auth mode it also accepts no-certificate clients and defers authentication to the HTTP/gRPC layer. -3. **ALPN**: advertises `h2` and `http/1.1` for protocol negotiation. - -### Connection Flow - -```text -TCP accept - → TLS handshake (mandatory client cert in mTLS mode, optional in dual-auth mode) - → hyper auto-negotiates HTTP/1.1 or HTTP/2 via ALPN - → MultiplexedService routes by content-type: - ├── application/grpc → GrpcRouter - └── other → Axum HTTP Router -``` - -All traffic shares a single port. When TLS is enabled, the TLS handshake occurs before any HTTP parsing. In plaintext mode, the gateway expects an upstream reverse proxy or tunnel to be the outer security boundary. - -### Cloudflare-Specific HTTP Endpoints - -Cloudflare-fronted gateways add two HTTP endpoints on the same multiplexed port: - -- `/auth/connect` -- browser login relay that reads the `CF_Authorization` cookie server-side and POSTs the token back to the CLI's localhost callback server. -- `/_ws_tunnel` -- WebSocket upgrade endpoint used to carry gRPC and SSH bytes through Cloudflare Access. 
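The content-type split in the connection flow above can be sketched as a small routing decision. This is an illustrative stand-in only — `Route` and `route_by_content_type` are hypothetical names, not the actual `MultiplexedService` types:

```rust
// Illustrative sketch of the gateway's content-type multiplexing decision.
// `Route` and `route_by_content_type` are hypothetical names for this doc.
#[derive(Debug, PartialEq)]
enum Route {
    Grpc, // handled by the gRPC router
    Http, // handled by the Axum HTTP router
}

fn route_by_content_type(content_type: Option<&str>) -> Route {
    match content_type {
        // gRPC content types include "application/grpc" and suffixed
        // variants such as "application/grpc+proto".
        Some(ct) if ct.starts_with("application/grpc") => Route::Grpc,
        _ => Route::Http,
    }
}
```

Because both protocols share one port, this decision runs after TLS termination and HTTP parsing; nothing about the TCP connection itself distinguishes gRPC from plain HTTP traffic.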
- -The WebSocket tunnel bridges directly into the gateway's `MultiplexedService` over an in-memory duplex stream. It does not re-enter the public listener, so it behaves the same whether the public listener is plaintext or TLS-backed. - -### What Gets Rejected - -The e2e test suite (`e2e/python/test_security_tls.py`) validates four scenarios: - -| Scenario | Result | -|---|---| -| Client presents correct mTLS cert | `HEALTHY` response | -| Client trusts CA but presents no client cert | `UNAVAILABLE` -- handshake terminated | -| Client presents cert signed by a different CA | `UNAVAILABLE` -- handshake terminated | -| Client connects with plaintext (no TLS) | `UNAVAILABLE` -- transport failure | - -## Sandbox-to-Gateway mTLS - -Sandbox workloads connect back to the gateway at startup to fetch their policy and provider credentials. The gRPC client (`crates/openshell-sandbox/src/grpc_client.rs:18`) reads three environment variables to configure mTLS: - -| Env Var | Value | -|---|---| -| `OPENSHELL_TLS_CA` | `/etc/openshell-tls/client/ca.crt` | -| `OPENSHELL_TLS_CERT` | `/etc/openshell-tls/client/tls.crt` | -| `OPENSHELL_TLS_KEY` | `/etc/openshell-tls/client/tls.key` | - -These are used to build a `tonic::transport::ClientTlsConfig` with: - -- `ca_certificate()` -- verifies the server's certificate against the deployment CA. -- `identity()` -- presents the shared client certificate for mTLS. - -The sandbox calls two RPCs over this authenticated channel: - -- `GetSandboxSettings` -- fetches the YAML policy that governs the sandbox's behavior. -- `GetSandboxProviderEnvironment` -- fetches provider credentials as environment variables. - -## SSH Tunnel Authentication - -SSH connections into sandboxes pass through the gateway's HTTP CONNECT tunnel at `/connect/ssh`. This adds a second authentication layer on top of mTLS. 
- -### Request Headers - -| Header | Purpose | -|---|---| -| `x-sandbox-id` | Identifies the target sandbox | -| `x-sandbox-token` | Session token (created via `CreateSshSession` RPC) | - -The gateway validates the token against the stored `SshSession` record and checks: - -1. The token has not been revoked. -2. The `sandbox_id` matches the request header. -3. The token has not expired (`expires_at_ms` check; 0 means no expiry for backward compatibility). - -### Session Lifecycle - -SSH session tokens have a configurable TTL (`ssh_session_ttl_secs`, default 24 hours). The `expires_at_ms` field is set at creation time and checked on every tunnel request. Setting the TTL to 0 disables expiry. - -Sessions are cleaned up automatically: - -- **On sandbox deletion**: all SSH sessions for the deleted sandbox are removed from the store. -- **Background reaper**: a periodic task (hourly) deletes expired and revoked session records to prevent unbounded database growth. - -### Connection Limits - -The gateway enforces two concurrent connection limits to bound the impact of credential misuse: - -| Limit | Value | Purpose | -|---|---|---| -| Per-token | 10 concurrent tunnels | Limits damage from a single leaked token | -| Per-sandbox | 20 concurrent tunnels | Prevents bypass via creating many tokens for one sandbox | - -These limits are tracked in-memory and decremented when tunnels close. Exceeding either limit returns HTTP 429 (Too Many Requests). - -### Supervisor-Initiated Relay Model - -The gateway never dials the sandbox. Instead, the sandbox supervisor opens an outbound `ConnectSupervisor` bidirectional gRPC stream to the gateway on startup and keeps it alive for the sandbox lifetime. SSH traffic for `/connect/ssh` (and exec traffic for `ExecSandbox`) rides this same TCP+TLS+HTTP/2 connection as separate multiplexed HTTP/2 streams. 
The gateway-side registry and `RelayStream` handler live in `crates/openshell-server/src/supervisor_session.rs`; the supervisor-side bridge lives in `crates/openshell-sandbox/src/supervisor_session.rs`. - -Per-connection flow: - -1. CLI presents `x-sandbox-id` + `x-sandbox-token` at `/connect/ssh` and passes gateway token validation. -2. Gateway calls `SupervisorSessionRegistry::open_relay(sandbox_id, ...)`, which allocates a `channel_id` (UUID) and sends a `RelayOpen` message to the supervisor over the already-established `ConnectSupervisor` stream. If no session is registered yet, it polls with exponential backoff up to a bounded timeout (30 s for `/connect/ssh`, 15 s for `ExecSandbox`). -3. The supervisor opens a new `RelayStream` RPC on the same `Channel` — a new HTTP/2 stream, no new TCP connection and no new TLS handshake. The first `RelayFrame` is a `RelayInit { channel_id }` that claims the pending slot on the gateway. -4. `claim_relay` pairs the gateway-side waiter with the supervisor-side RPC via a `tokio::io::duplex(64 KiB)` pair. Subsequent `RelayFrame::data` frames carry raw SSH bytes in both directions. The supervisor is a dumb byte bridge: it has no protocol awareness of the SSH bytes flowing through. -5. Inside the sandbox workload, the supervisor connects the relay to sshd over a Unix domain socket at `/run/openshell/ssh.sock`. - -Security properties of this model: - -- **One auth boundary.** mTLS on the `ConnectSupervisor` stream is the only identity check between gateway and sandbox. Every relay rides that same authenticated HTTP/2 connection. -- **No inbound network path into the sandbox.** The sandbox exposes no TCP port for gateway ingress; all relays are supervisor-initiated. The workload only needs egress to the gateway. 
-- **In-workload access control is filesystem permissions on the Unix socket.** sshd listens on `/run/openshell/ssh.sock` with the parent directory at `0700` and the socket itself at `0600`, both owned by the supervisor (root). The sandbox entrypoint runs as an unprivileged user and cannot open either. Any process in the supervisor's filesystem view that can open the socket can reach sshd; this is the same trust model as any local Unix socket with `0600` permissions. See `crates/openshell-sandbox/src/ssh.rs:55-83`. -- **Supersede race is closed.** A supervisor reconnect registers a new `session_id` for the same sandbox id. Cleanup on the old session's task uses `remove_if_current(sandbox_id, session_id)` so a late-finishing old task cannot evict the new registration or serve relays meant for the new instance. See `SupervisorSessionRegistry::remove_if_current` in `crates/openshell-server/src/supervisor_session.rs`. -- **Pending-relay reaper.** A background task sweeps `pending_relays` entries older than 10 s (`RELAY_PENDING_TIMEOUT`). If the supervisor acknowledges `RelayOpen` but never initiates `RelayStream` — crash, deadlock, or adversarial stall — the gateway-side slot does not pin indefinitely. -- **Client-side keepalives.** The CLI's `ssh` invocation sets `ServerAliveInterval=15` / `ServerAliveCountMax=3` (`crates/openshell-cli/src/ssh.rs:150`), so a silently-dropped relay (gateway restart, supervisor restart, or adversarial TCP drop) surfaces to the user within roughly 45 s rather than hanging. - -Observability (sandbox side, OCSF): `session_established`, `session_closed`, `session_failed`, `relay_open`, `relay_closed`, `relay_failed`, `relay_close_from_gateway` — all emitted as `NetworkActivity` events. Gateway-side OCSF emission for the same lifecycle is a tracked follow-up. - -## Port Configuration - -Traffic flows through the configured gateway exposure path to the gateway process. 
Kubernetes deployments use the Helm-managed service; standalone deployments bind the gateway port directly or place it behind an operator-managed proxy. - -| Layer | Port | Configurable Via | -|---|---|---| -| External ingress / port-forward / load balancer / reverse proxy | Operator choice | Platform-specific service or proxy configuration | -| Kubernetes Service | `8080` by default | `deploy/helm/openshell/values.yaml` (`service.port`) | -| NodePort, when enabled | `30051` by default | `deploy/helm/openshell/values.yaml` (`service.nodePort`) | -| Server bind | `8080` | `--port` flag / `OPENSHELL_SERVER_PORT` env var | - -The server binds `0.0.0.0:8080` by default. The chart maps the service port to the gateway workload's `grpc` port for Kubernetes deployments. - -## Security Model Summary - -### Trust Boundaries - -```mermaid -graph LR - subgraph EXTERNAL["External"] - CLI["CLI"] - SDK["SDK"] - end - - subgraph GW["Gateway (mTLS boundary)"] - TLS["TLS Termination
(WebPkiClientVerifier)"] - API["gRPC + HTTP API"] - end - - subgraph PLATFORM["Compute Platform"] - SBX["Sandbox Workload"] - end - - subgraph INET["Internet"] - HOSTS["Allowed Hosts"] - end - - CLI -- "mTLS
(deployment CA)" --> TLS - SDK -- "mTLS
(deployment CA)" --> TLS - TLS --> API - SBX -- "mTLS + ConnectSupervisor
(supervisor-initiated)" --> TLS - API -- "RelayStream
(HTTP/2 on same mTLS conn)" --> SBX - SBX -- "OPA policy +
process identity" --> HOSTS -``` - -### What Is Authenticated - -| Boundary | Mechanism | -|---|---| -| External → Gateway | mTLS with deployment CA by default, or trusted reverse-proxy/Cloudflare boundary in edge mode | -| Sandbox → Gateway | mTLS with shared client cert (supervisor-initiated `ConnectSupervisor` stream) | -| Gateway → Sandbox (SSH/exec) | Rides the supervisor's mTLS `ConnectSupervisor` HTTP/2 connection as a `RelayStream`; no separate gateway-to-sandbox network connection | -| Supervisor → workload sshd | Unix-socket filesystem permissions (`/run/openshell/ssh.sock`, 0700 parent / 0600 socket) | -| Sandbox → External (network) | OPA policy + process identity binding via `/proc` | - -### What Is Not Authenticated (by Design) - -- **Individual sandbox identity at the TLS layer**: all sandboxes share one client certificate (`CN=openshell-client`). Post-TLS identification uses the `x-sandbox-id` gRPC metadata header, which is trusted because it arrives over an mTLS-authenticated channel. -- **Health endpoints in reverse-proxy mode**: when the gateway is deployed behind Cloudflare or another trusted edge, `/health`, `/healthz`, and `/readyz` are protected by that upstream boundary rather than by direct mTLS at the gateway. - -### Gateway Security Context - -The gateway workload runs with a hardened security context (`deploy/helm/openshell/values.yaml:25`): - -```yaml -securityContext: - runAsNonRoot: true - runAsUser: 1000 - allowPrivilegeEscalation: false - capabilities: - drop: - - ALL -``` - -The gateway process has no elevated privileges and drops all Linux capabilities. - -## Threat Model - -This section defines the primary attacker profiles, what the current design protects, and where residual risk remains. - -### Security Goals - -- Prevent unauthenticated access to gateway APIs and SSH tunneling. -- Prevent unauthorized sandbox access across tenants/sessions. -- Protect sandbox-to-gateway policy and credential exchange in transit. 
-- Limit impact from network-level attackers and accidental misconfiguration. - -### In Scope Threat Actors - -| Threat Actor | Example Capability | -|---|---| -| Network attacker | Can observe/modify traffic between clients and gateway | -| Unauthorized external client | Can reach gateway port but has no valid client cert | -| Compromised sandbox workload | Has code execution inside one sandbox workload | -| Malicious platform peer | Can attempt direct workload-to-workload connections | -| Stolen CLI credentials | Has copied `ca.crt`/`tls.crt`/`tls.key` from a developer machine | - -### Primary Defenses - -| Threat | Existing Defense | Notes | -|---|---|---| -| MITM or passive interception of gateway traffic | Mandatory mTLS with deployment CA, or trusted reverse-proxy boundary in Cloudflare mode | Default mode is direct mTLS; reverse-proxy mode shifts the outer trust boundary upstream | -| Unauthenticated API/health access | mTLS by default, or Cloudflare/reverse-proxy auth in edge mode | `/health*` are direct-mTLS only in the default deployment mode | -| Forged SSH tunnel connection to sandbox | Session token validation at the gateway; only the supervisor's authenticated mTLS `ConnectSupervisor` stream can carry a `RelayStream` to its sandbox | Forging a relay requires stealing a valid mTLS client identity | -| Direct access to sandbox sshd from platform peers | sshd listens on a Unix socket (`0700` parent / `0600` socket) inside the workload | No network path exists to sshd from platform peers | -| Stale or reconnecting supervisor serves relays for a new instance | `session_id`-scoped `remove_if_current` on the registry | Old session cleanup cannot evict a newer registration | -| Supervisor acknowledges `RelayOpen` but never initiates `RelayStream` | Gateway-side pending-relay reaper (10 s timeout) | Prevents indefinite resource pinning by a buggy or malicious supervisor | -| Silent TCP drop of an in-flight relay | CLI `ServerAliveInterval=15` / 
`ServerAliveCountMax=3` | Client detects a dead relay within ~45 s instead of hanging | -| Unauthorized outbound internet access from sandbox | OPA policy + process identity checks | Applies to sandbox egress policy layer | - -### Residual Risks and Current Tradeoffs - -| Risk | Why It Exists | -|---|---| -| No per-sandbox TLS identity | All sandboxes and CLI share one client certificate | -| Broad blast radius on key compromise | Shared client key reuse across multiple components | -| Weak cryptoperiod | Certificates are effectively non-expiring by default | -| Limited fine-grained revocation | CA private key is not persisted; rotation is coarse-grained | -| Local credential theft risk | CLI mTLS key material is stored on developer filesystem | -| SSH token + mTLS = persistent access within trust boundary | SSH tokens expire after 24h (configurable) and are capped at 10 concurrent connections per token / 20 per sandbox, but within the mTLS trust boundary a stolen token remains usable until TTL expires | - -### Out of Scope / Not Defended By This Layer - -- A fully compromised compute platform, such as a Kubernetes control plane, container host, or VM host. -- A malicious actor with direct access to deployment secrets for the gateway or sandbox runtime. -- Host-level compromise of the developer workstation running the CLI. -- Application-layer authorization bugs after mTLS authentication succeeds. - -### Trust Assumptions - -- The deployment CA is generated and distributed without interception during provisioning. -- Secret access is restricted to intended workloads and operators. -- Gateway and sandbox container images are trusted and not tampered with. -- The sandbox workload's filesystem is trusted: only the supervisor process (root) can open `/run/openshell/ssh.sock`, which is enforced by the `0700` parent directory and `0600` socket permissions set at sshd start.
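The SSH tunnel token checks and connection limits described in the SSH Tunnel Authentication section can be modeled as a stdlib-only sketch. All names here (`SshSession` as a struct, `authorize_tunnel`, `TunnelError`) are illustrative, not the gateway's real types; only the rules — revocation, sandbox-id match, `expires_at_ms` with 0 meaning no expiry, and the per-token/per-sandbox caps — come from the text above:

```rust
// Illustrative model of the stored SSH session record.
struct SshSession {
    sandbox_id: String,
    revoked: bool,
    expires_at_ms: u64, // 0 means "no expiry" (backward compatibility)
}

#[derive(Debug, PartialEq)]
enum TunnelError {
    Unauthorized,    // revoked, wrong sandbox, or expired
    TooManyRequests, // per-token or per-sandbox limit exceeded (HTTP 429)
}

const PER_TOKEN_LIMIT: u32 = 10;
const PER_SANDBOX_LIMIT: u32 = 20;

fn authorize_tunnel(
    session: &SshSession,
    requested_sandbox_id: &str,
    now_ms: u64,
    token_conns: u32,   // current concurrent tunnels for this token
    sandbox_conns: u32, // current concurrent tunnels for this sandbox
) -> Result<(), TunnelError> {
    if session.revoked || session.sandbox_id != requested_sandbox_id {
        return Err(TunnelError::Unauthorized);
    }
    if session.expires_at_ms != 0 && now_ms >= session.expires_at_ms {
        return Err(TunnelError::Unauthorized);
    }
    if token_conns >= PER_TOKEN_LIMIT || sandbox_conns >= PER_SANDBOX_LIMIT {
        return Err(TunnelError::TooManyRequests);
    }
    Ok(())
}
```

In the real gateway the connection counters are tracked in memory and decremented as tunnels close; this sketch just takes them as arguments.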
- -## Sandbox Outbound TLS (L7 Inspection) - -Separate from the gateway mTLS infrastructure, each sandbox has an independent TLS capability for inspecting outbound HTTPS traffic. This is documented here for completeness because it involves a distinct, per-sandbox PKI. - -The sandbox proxy automatically detects and terminates TLS on outbound HTTPS connections by peeking the first bytes of each tunnel. This enables credential injection and L7 inspection without requiring explicit policy configuration. The proxy performs TLS man-in-the-middle inspection: - -1. **Ephemeral sandbox CA**: a per-sandbox CA (`CN=OpenShell Sandbox CA, O=OpenShell`) is generated at sandbox startup. This CA is completely independent of the gateway mTLS CA. -2. **Trust injection**: the sandbox CA is written to the sandbox filesystem and injected via `NODE_EXTRA_CA_CERTS` and `SSL_CERT_FILE` so processes inside the sandbox trust it. -3. **Dynamic leaf certs**: for each target hostname, the proxy generates and caches a leaf certificate signed by the sandbox CA (up to 256 entries). -4. **Upstream verification**: the proxy verifies upstream server certificates against Mozilla root CAs (`webpki-roots`) and system CA certificates from the container's trust store, not against the gateway mTLS CA. Custom sandbox images can add corporate/internal CAs via `update-ca-certificates`. - -This capability is orthogonal to gateway mTLS -- it operates only on sandbox-to-internet traffic and uses entirely separate key material. See [Policy Language](security-policy.md) for configuration details. 
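The per-hostname leaf-cert cache described above (capped at 256 entries) can be sketched as follows. This is a behavioral sketch only: the document does not specify the proxy's eviction strategy, so this version evicts an arbitrary entry when full, and every name here is hypothetical rather than the proxy's real API:

```rust
use std::collections::HashMap;

// Illustrative per-hostname leaf-certificate cache with a 256-entry cap.
// Eviction strategy is an assumption: an arbitrary entry is dropped when
// the cap is reached.
struct LeafCertCache {
    cap: usize,
    certs: HashMap<String, Vec<u8>>, // hostname -> DER-encoded leaf cert
}

impl LeafCertCache {
    fn new() -> Self {
        Self { cap: 256, certs: HashMap::new() }
    }

    // Returns the cached cert for `host`, or signs and caches a new one.
    fn get_or_sign(&mut self, host: &str, sign: impl FnOnce(&str) -> Vec<u8>) -> Vec<u8> {
        if let Some(cert) = self.certs.get(host) {
            return cert.clone();
        }
        if self.certs.len() >= self.cap {
            // Evict an arbitrary entry to stay within the cap.
            if let Some(k) = self.certs.keys().next().cloned() {
                self.certs.remove(&k);
            }
        }
        let cert = sign(host);
        self.certs.insert(host.to_string(), cert.clone());
        cert
    }
}
```

The cap bounds memory for long-running sandboxes that contact many distinct hostnames, while repeat connections to the same host reuse the cached leaf instead of re-signing.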
- -## Cross-References - -- [Gateway Architecture](gateway.md) -- protocol multiplexing, gRPC services, persistence, and SSH tunneling -- [Gateway Deployment and Compute Platforms](gateway-single-node.md) -- gateway deployment modes, compute platform inputs, and removed k3s responsibilities -- [Sandbox Architecture](sandbox.md) -- sandbox-side isolation, proxy, and policy enforcement -- [Sandbox Connect](sandbox-connect.md) -- client-side SSH connection flow through the gateway -- [Policy Language](security-policy.md) -- YAML/Rego policy system including L7 TLS inspection configuration diff --git a/architecture/gateway-settings.md b/architecture/gateway-settings.md deleted file mode 100644 index 0b6da9bdd..000000000 --- a/architecture/gateway-settings.md +++ /dev/null @@ -1,562 +0,0 @@ -# Gateway Settings Channel - -## Overview - -The settings channel provides a two-tier key-value configuration system that the gateway delivers to sandboxes alongside policy. Settings are runtime-mutable name-value pairs (e.g., `log_level`, feature flags) that flow from the gateway to sandboxes through the existing `GetSandboxSettings` poll loop. The system supports two scopes -- sandbox-level and global -- with a deterministic merge strategy and per-key mutual exclusion to prevent conflicting ownership. - -## Architecture - -```mermaid -graph TD - CLI["CLI / TUI"] - GW["Gateway
(openshell-server)"] - OBJ["Store: objects table
(gateway_settings,
sandbox_settings blobs)"] - POL["Store: sandbox_policies table
(revisions for sandbox-scoped
and __global__ policies)"] - SB["Sandbox
(poll loop)"] - - CLI -- "UpdateSettings
(policy / setting_key + value)" --> GW - CLI -- "GetSandboxSettings
GetGatewaySettings
ListSandboxPolicies
GetSandboxPolicyStatus" --> GW - GW -- "load/save settings blobs
(delivery mechanism)" --> OBJ - GW -- "put/list/update
policy revisions
(audit + versioning)" --> POL - GW -- "GetSandboxSettingsResponse
(policy + settings +
config_revision +
global_policy_version)" --> SB - SB -- "diff settings
reload OPA on policy change" --> SB -``` - -## Settings Registry - -**File:** `crates/openshell-core/src/settings.rs` - -The `REGISTERED_SETTINGS` static array defines the allowed setting keys and their value types. The registry is the source of truth for both client-side validation (CLI, TUI) and server-side enforcement. - -```rust -pub const REGISTERED_SETTINGS: &[RegisteredSetting] = &[ - RegisteredSetting { key: "providers_v2_enabled", kind: SettingValueKind::Bool }, - RegisteredSetting { key: "ocsf_json_enabled", kind: SettingValueKind::Bool }, -]; -``` - -| Type | Proto variant | Description | -|------|---------------|-------------| -| `String` | `SettingValue.string_value` | Arbitrary UTF-8 string | -| `Int` | `SettingValue.int_value` | 64-bit signed integer | -| `Bool` | `SettingValue.bool_value` | Boolean; CLI accepts `true/false/yes/no/1/0/on/off` via `parse_bool_like()` | - -The reserved key `policy` is excluded from the registry. It is handled by dedicated policy commands and stored as a hex-encoded protobuf `SandboxPolicy` in the global settings' `Bytes` variant. Attempts to set or delete the `policy` key through settings commands are rejected. 
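The registry lookup and flexible bool parsing described above can be sketched as follows. This mirrors the documented behavior of `crates/openshell-core/src/settings.rs` (registry as a static array, `setting_for_key`, `parse_bool_like` accepting `true/false/yes/no/1/0/on/off`), but exact signatures in the crate may differ:

```rust
// Sketch of the settings registry and helpers; the real definitions live in
// crates/openshell-core/src/settings.rs and may differ in detail.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SettingValueKind {
    String,
    Int,
    Bool,
}

struct RegisteredSetting {
    key: &'static str,
    kind: SettingValueKind,
}

const REGISTERED_SETTINGS: &[RegisteredSetting] = &[
    RegisteredSetting { key: "providers_v2_enabled", kind: SettingValueKind::Bool },
    RegisteredSetting { key: "ocsf_json_enabled", kind: SettingValueKind::Bool },
];

// Unknown keys (including the reserved `policy` key, which is deliberately
// absent from the registry) return None.
fn setting_for_key(key: &str) -> Option<&'static RegisteredSetting> {
    REGISTERED_SETTINGS.iter().find(|s| s.key == key)
}

// Accepts the CLI spellings listed above: true/false/yes/no/1/0/on/off.
fn parse_bool_like(raw: &str) -> Option<bool> {
    match raw.trim().to_ascii_lowercase().as_str() {
        "true" | "yes" | "1" | "on" => Some(true),
        "false" | "no" | "0" | "off" => Some(false),
        _ => None,
    }
}
```

Because `policy` is not in the registry, `setting_for_key("policy")` returns `None`, which is how settings commands reject attempts to touch the reserved key.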
- -Helper functions: - -- `setting_for_key(key)` -- look up a `RegisteredSetting` by name, returns `None` for unknown keys -- `registered_keys_csv()` -- comma-separated list of valid keys for error messages -- `parse_bool_like(raw)` -- flexible bool parsing from CLI string input - -## Proto Layer - -**File:** `proto/sandbox.proto` - -### New Message Types - -| Message | Fields | Purpose | -|---------|--------|---------| -| `SettingValue` | `oneof value { string_value, bool_value, int_value, bytes_value }` | Type-aware setting value | -| `EffectiveSetting` | `SettingValue value`, `SettingScope scope` | A resolved setting with its controlling scope | -| `SettingScope` enum | `UNSPECIFIED`, `SANDBOX`, `GLOBAL` | Which tier controls the current value | -| `PolicySource` enum | `UNSPECIFIED`, `SANDBOX`, `GLOBAL` | Origin of the policy in a settings response | - -### New RPCs - -**File:** `proto/openshell.proto` - -| RPC | Request | Response | Called by | -|-----|---------|----------|-----------| -| `GetSandboxSettings` | `GetSandboxSettingsRequest { sandbox_id }` | `GetSandboxSettingsResponse { policy, version, policy_hash, settings, config_revision, policy_source, global_policy_version }` | Sandbox poll loop, CLI `settings get` | -| `GetGatewaySettings` | `GetGatewaySettingsRequest {}` | `GetGatewaySettingsResponse { settings, settings_revision }` | CLI `settings get --global`, TUI dashboard | - -### `UpdateSettingsRequest` - -The `UpdateSettings` RPC multiplexes policy and setting mutations through a single request message: - -| Field | Type | Description | -|-------|------|-------------| -| `setting_key` | `string` | Key to mutate (mutually exclusive with `policy` payload) | -| `setting_value` | `SettingValue` | Value to set (for upsert operations) | -| `delete_setting` | `bool` | Delete the key from the specified scope | -| `global` | `bool` | Target gateway-global scope instead of sandbox scope | - -Validation rules: - -- `policy` and `setting_key` cannot both be 
present -- At least one of `policy` or `setting_key` must be present -- `delete_setting` cannot be combined with a `policy` payload -- The reserved `policy` key requires the `policy` field (not `setting_key`) for set operations -- `name` is required for sandbox-scoped updates but not for global updates - -## Server Implementation - -**File:** `crates/openshell-server/src/grpc.rs` - -### Storage Model - -The settings channel uses two storage mechanisms: the `objects` table for settings blobs (fast delivery) and the `sandbox_policies` table for versioned policy revisions (audit/history). - -#### Settings blobs (`objects` table) - -Settings are persisted using the existing generic `objects` table with two object types: - -| Object type string | Record ID | Record name | Purpose | -|--------------------|-----------|-------------|---------| -| `gateway_settings` | `"global"` | `"global"` | Singleton global settings (includes reserved `policy` key for delivery) | -| `sandbox_settings` | `"settings:{sandbox_uuid}"` | sandbox name | Per-sandbox settings | - -The sandbox settings ID is prefixed with `settings:` to avoid a primary key collision with the sandbox's own record in the `objects` table. The `sandbox_settings_id()` function computes this key. - -The payload is a JSON-encoded `StoredSettings` struct: - -```rust -struct StoredSettings { - revision: u64, // Monotonically increasing - settings: BTreeMap, // Sorted for determinism -} - -enum StoredSettingValue { - String(String), - Bool(bool), - Int(i64), - Bytes(String), // Hex-encoded binary (used for global policy) -} -``` - -#### Policy revisions (`sandbox_policies` table) - -Global policy revisions are stored in the `sandbox_policies` table using the sentinel `sandbox_id = "__global__"` (`GLOBAL_POLICY_SANDBOX_ID` constant). 
This reuses the same schema as sandbox-scoped policy revisions: - -| Column | Type | Description | -|--------|------|-------------| -| `id` | `TEXT` | UUID primary key | -| `sandbox_id` | `TEXT` | `"__global__"` for global revisions, sandbox UUID for sandbox-scoped | -| `version` | `INTEGER` | Monotonically increasing per `sandbox_id` | -| `policy_payload` | `BLOB` | Protobuf-encoded `SandboxPolicy` | -| `policy_hash` | `TEXT` | Deterministic SHA-256 hash of the policy | -| `status` | `TEXT` | `pending`, `loaded`, `failed`, or `superseded` | -| `load_error` | `TEXT` | Error message (populated on `failed` status) | -| `created_at_ms` | `INTEGER` | Epoch milliseconds when the revision was created | -| `loaded_at_ms` | `INTEGER` | Epoch milliseconds when the revision was marked loaded | - -The `sandbox_policies` table provides history and audit trail (queried by `policy list --global` and `policy get --global`). The `gateway_settings` blob's `policy` key is the authoritative source that `GetSandboxSettings` reads for fast poll resolution. Both are written on `policy set --global` -- this dual-write is intentional. - -### Two-Tier Resolution (`merge_effective_settings`) - -The `GetSandboxSettings` handler resolves the effective settings map by merging sandbox and global tiers: - -1. **Seed registered keys**: All keys from `REGISTERED_SETTINGS` are inserted with `scope: UNSPECIFIED` and `value: None`. This ensures registered keys always appear in the response even when unset. -2. **Apply sandbox values**: Sandbox-scoped settings overlay the registered defaults. Scope becomes `SANDBOX`. -3. **Apply global values**: Global settings override sandbox values. Scope becomes `GLOBAL`. -4. **Exclude reserved keys**: The `policy` key is excluded from the merged settings map (it is delivered as the top-level `policy` field in the response). - -```mermaid -flowchart LR - REG["REGISTERED_SETTINGS
(seed: scope=UNSPECIFIED)"] - SB["Sandbox settings
(scope=SANDBOX)"] - GL["Global settings
(scope=GLOBAL)"] - OUT["Effective settings map"] - - REG --> OUT - SB -->|"overlay"| OUT - GL -->|"override"| OUT -``` - -### Global Policy as a Setting - -The reserved `policy` key in global settings stores a hex-encoded protobuf `SandboxPolicy`. When present, `GetSandboxSettings` uses the global policy instead of the sandbox's own policy: - -1. `decode_policy_from_global_settings()` checks for the `policy` key in global settings -2. If present, the global policy replaces the sandbox policy in the response -3. `policy_source` is set to `GLOBAL` -4. The sandbox policy version counter is preserved for status APIs -5. The `global_policy_version` field is populated from the latest `__global__` revision in the `sandbox_policies` table - -This allows operators to push a single policy that applies to all sandboxes via `openshell policy set --global --policy FILE`. - -### Global Policy Lifecycle - -Global policies are versioned through a full revision lifecycle stored alongside sandbox policies. The sentinel `sandbox_id = "__global__"` (constant `GLOBAL_POLICY_SANDBOX_ID`) distinguishes global revisions from sandbox-scoped revisions in the same `sandbox_policies` table. - -#### State Machine - -```mermaid -stateDiagram-v2 - [*] --> NoGlobalPolicy - - NoGlobalPolicy --> v1_Loaded : policy set --global
(creates v1, marks loaded) - - v1_Loaded --> v1_Loaded : policy set --global
(same hash, dedup no-op) - v1_Loaded --> v2_Loaded : policy set --global
(different hash) - v1_Loaded --> AllSuperseded : policy delete --global - - v2_Loaded --> v2_Loaded : policy set --global
(same hash, dedup no-op) - v2_Loaded --> v3_Loaded : policy set --global
(different hash) - v2_Loaded --> AllSuperseded : policy delete --global - - v3_Loaded --> v3_Loaded : policy set --global
(same hash, dedup no-op) - v3_Loaded --> AllSuperseded : policy delete --global - - AllSuperseded --> NewVersion_Loaded : policy set --global
(any hash, no dedup) - - state "No Global Policy" as NoGlobalPolicy - state "v1: Loaded" as v1_Loaded - state "v2: Loaded, v1: Superseded" as v2_Loaded - state "v3: Loaded, v1-v2: Superseded" as v3_Loaded - state "All Revisions Superseded
(no active global policy)" as AllSuperseded - state "vN: Loaded, older: Superseded" as NewVersion_Loaded -``` - -#### Key behaviors - -- **Dedup on set**: When the latest global revision has status `loaded` and its hash matches the submitted policy, no new revision is created. The settings blob is still ensured to have the `policy` key (reconciliation against potential data loss from a pod restart while the `sandbox_policies` table retained the revision). See `crates/openshell-server/src/grpc.rs` -- `update_settings()`, lines around the `current.policy_hash == hash && current.status == "loaded"` check. - -- **No dedup against superseded**: If the latest revision has status `superseded` (e.g., after a `policy delete --global`), the same hash creates a new revision. This supports the toggle pattern: delete the global policy, then re-set the same policy. The dedup check explicitly requires `status == "loaded"`. - -- **Immediate load**: Global policy revisions are marked `loaded` immediately upon creation (no sandbox confirmation needed). The gateway calls `update_policy_status(GLOBAL_POLICY_SANDBOX_ID, next_version, "loaded", ...)` right after `put_policy_revision()`. Sandboxes pick up changes via the 10-second poll loop. - -- **Supersede on set**: When a new global revision is created, `supersede_older_policies(GLOBAL_POLICY_SANDBOX_ID, next_version)` marks all older revisions with `pending` or `loaded` status as `superseded`. - -- **Delete supersedes all**: `policy delete --global` removes the `policy` key from the `gateway_settings` blob and calls `supersede_older_policies()` with `latest.version + 1` to mark ALL `__global__` revisions as `superseded`. This restores sandbox-level policy control. - -- **Dual-write**: `policy set --global` writes to BOTH the `sandbox_policies` revision table (for audit/listing via `policy list --global`) AND the `gateway_settings` blob (for fast delivery via `GetSandboxSettings`). 
The revision table provides history; the settings blob is the authoritative source that sandboxes poll. - -- **Concurrency**: All global mutations acquire `ServerState.settings_mutex` (a `tokio::sync::Mutex<()>`) for the duration of the read-modify-write cycle. This prevents races between concurrent global policy set/delete operations and global setting mutations. - -#### Global policy effects on sandboxes - -When a global policy is active (the `policy` key exists in `gateway_settings`): - -| Operation | Effect | -|-----------|--------| -| `GetSandboxSettings` | Returns the global policy payload instead of the sandbox's own policy. `policy_source = GLOBAL`. `global_policy_version` set to the active revision's version number. | -| `policy set ` | Rejected with `FailedPrecondition: "policy is managed globally; delete global policy before sandbox policy update"` | -| `rule approve ` | Rejected with `FailedPrecondition: "cannot approve rules while a global policy is active; delete the global policy to manage per-sandbox rules"` | -| `rule approve-all` | Rejected with same `FailedPrecondition` as `rule approve` | -| Revoking an approved chunk (via `rule reject` on an `approved` chunk) | Rejected with same `FailedPrecondition` -- revoking would modify the sandbox policy which is not in use | -| Rejecting a `pending` chunk | Allowed -- rejection does not modify the sandbox policy | -| `settings set/delete` at sandbox scope | Allowed -- settings and policy are independent channels | -| Draft chunk collection | Continues normally -- sandbox proxy still generates proposals. Chunks are visible but cannot be approved. | - -The blocking logic is implemented by `require_no_global_policy()` in `crates/openshell-server/src/grpc.rs`, which checks for the `policy` key in global settings and returns `FailedPrecondition` if present. - -### `config_revision` and `global_policy_version` - -**`config_revision`** (`u64`): Content hash of the merged effective config. 
Computed by `compute_config_revision()` from three inputs: `policy_source` (as 4 LE bytes), the deterministic policy hash (if policy present), and sorted settings entries (key bytes + scope as 4 LE bytes + type tag byte + value bytes). The SHA-256 digest is truncated to 8 bytes and interpreted as `u64` (little-endian). Changes when the global policy, sandbox policy, settings, or policy source changes. Used by the sandbox poll loop for change detection. - -**`global_policy_version`** (`u32`): The version number of the active global policy revision. Populated in `GetSandboxSettingsResponse` when `policy_source == GLOBAL` by looking up the latest revision for `GLOBAL_POLICY_SANDBOX_ID`. Zero when no global policy is active or when `policy_source == SANDBOX`. Displayed in the TUI dashboard and sandbox metadata pane, and logged by the sandbox on reload. - -### Per-Key Mutual Exclusion - -Global and sandbox scopes cannot both control the same key simultaneously: - -| Operation | Global key exists | Behavior | -|-----------|-------------------|----------| -| Sandbox set | Yes | `FailedPrecondition`: "setting '{key}' is managed globally; delete the global setting before sandbox update" | -| Sandbox delete | Yes | `FailedPrecondition`: "setting '{key}' is managed globally; delete the global setting first" | -| Sandbox set | No | Allowed | -| Sandbox delete | No | Allowed | -| Global set | (any) | Always allowed (global overrides) | -| Global delete | (any) | Allowed; unlocks sandbox control for the key | - -This prevents conflicting values at different scopes. An operator must delete a global key before a sandbox-level value can be set for the same key. 
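The per-key mutual-exclusion rule above reduces to a small guard over the global settings map. A minimal sketch with illustrative names and plain string values -- the actual check is part of `update_settings()` in the gateway and operates on stored setting values:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch: a sandbox-scoped set/delete is rejected while the
// same key has a global value. Names and types are illustrative.
fn check_sandbox_mutation(
    key: &str,
    global_settings: &BTreeMap<String, String>,
) -> Result<(), String> {
    if global_settings.contains_key(key) {
        // Mirrors the FailedPrecondition message described above.
        return Err(format!(
            "setting '{key}' is managed globally; delete the global setting before sandbox update"
        ));
    }
    Ok(())
}

fn main() {
    let mut global = BTreeMap::new();
    global.insert("providers_v2_enabled".to_string(), "true".to_string());

    // Sandbox-scoped mutation of a globally-managed key is rejected.
    assert!(check_sandbox_mutation("providers_v2_enabled", &global).is_err());
    // Keys without a global value stay under sandbox control.
    assert!(check_sandbox_mutation("ocsf_json_enabled", &global).is_ok());
}
```

Global-scope mutations skip this guard entirely, which is why a global set always succeeds and a global delete unlocks the key for sandbox control.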
- -### Sandbox-Scoped Policy Update Interaction - -When a global policy is set, sandbox-scoped policy updates via `UpdateSettings` are rejected with `FailedPrecondition`: - -```text -policy is managed globally; delete global policy before sandbox policy update -``` - -Deleting the global policy (`openshell policy delete --global`) removes the `policy` key from global settings and restores sandbox-level policy control. - -## Sandbox Implementation - -### Poll Loop Changes - -**File:** `crates/openshell-sandbox/src/lib.rs` (`run_policy_poll_loop`) - -The poll loop uses `GetSandboxSettings` (not a policy-specific RPC) and tracks `config_revision` as the change-detection signal: - -1. **Fetch initial state**: Call `poll_settings(sandbox_id)` to establish baseline `current_config_revision`, `current_policy_hash`, and `current_settings`. -2. **On each tick**: Compare `result.config_revision` against `current_config_revision`. If unchanged, skip. -3. **Determine what changed**: - - Compare `result.policy_hash` against `current_policy_hash` to detect policy changes - - Call `log_setting_changes()` to diff the settings map and log individual changes -4. **Conditional OPA reload**: Only call `opa_engine.reload_from_proto()` when `policy_hash` changes. Settings-only changes update the tracked state without touching the OPA engine. -5. **Status reporting**: Report policy load status only for sandbox-scoped revisions (`policy_source == SANDBOX` and `version > 0`). Global policy overrides trigger a reload but do not write per-sandbox policy status history. 
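The change-detection steps above can be condensed into a small decision function. This is an illustrative sketch, not the actual `run_policy_poll_loop` code; names are hypothetical:

```rust
// Hypothetical condensed form of the poll-loop decision: config_revision
// gates all processing, policy_hash gates the OPA reload.
struct PollState {
    config_revision: u64,
    policy_hash: String,
}

#[derive(Debug, PartialEq)]
enum PollAction {
    Skip,         // config_revision unchanged since the last tick
    ReloadPolicy, // policy changed: reload the OPA engine
    SettingsOnly, // settings-only change: update tracked state, no OPA reload
}

fn decide(state: &PollState, new_revision: u64, new_hash: &str) -> PollAction {
    if new_revision == state.config_revision {
        return PollAction::Skip;
    }
    if new_hash != state.policy_hash {
        PollAction::ReloadPolicy
    } else {
        PollAction::SettingsOnly
    }
}

fn main() {
    let state = PollState { config_revision: 7, policy_hash: "abc".into() };
    assert_eq!(decide(&state, 7, "abc"), PollAction::Skip);
    assert_eq!(decide(&state, 8, "abc"), PollAction::SettingsOnly);
    assert_eq!(decide(&state, 8, "def"), PollAction::ReloadPolicy);
}
```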
-
-```mermaid
-sequenceDiagram
-    participant PL as Poll Loop
-    participant GW as Gateway
-    participant OPA as OPA Engine
-
-    PL->>GW: GetSandboxSettings(sandbox_id)
-    GW-->>PL: policy + settings + config_revision
-
-    loop Every interval (default 10s)
-        PL->>GW: GetSandboxSettings(sandbox_id)
-        GW-->>PL: response
-
-        alt config_revision unchanged
-            PL->>PL: Skip
-        else config_revision changed
-            PL->>PL: log_setting_changes(old, new)
-            alt policy_hash changed
-                PL->>OPA: reload_from_proto(policy)
-                PL->>GW: ReportPolicyStatus (if sandbox-scoped)
-            else settings-only change
-                PL->>PL: Update tracked state (no OPA reload)
-            end
-        end
-    end
-```
-
-### Per-Setting Diff Logging
-
-**File:** `crates/openshell-sandbox/src/lib.rs` (`log_setting_changes`)
-
-When `config_revision` changes, the sandbox logs each individual setting change:
-
-- **Changed**: `info!(key, old, new, "Setting changed")` -- logs old and new values
-- **Added**: `info!(key, value, "Setting added")` -- new key not in previous snapshot
-- **Removed**: `info!(key, "Setting removed")` -- key in previous snapshot but not in new
-
-Values are formatted by `format_setting_value()`: strings as-is, bools and ints as their string representation, bytes as ``, unset as ``.
-
-### `SettingsPollResult`
-
-**File:** `crates/openshell-sandbox/src/grpc_client.rs`
-
-```rust
-pub struct SettingsPollResult {
-    pub policy: Option<SandboxPolicy>,
-    pub version: u32,
-    pub policy_hash: String,
-    pub config_revision: u64,
-    pub policy_source: PolicySource,
-    pub settings: HashMap<String, SettingValue>,
-    pub global_policy_version: u32,
-}
-```
-
-The `poll_settings()` method maps the full `GetSandboxSettingsResponse` into this struct. The `settings` field carries the effective settings map for diff logging. The `global_policy_version` field is propagated from the response and used for logging when the sandbox reloads a global policy.
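The added/changed/removed classification used by `log_setting_changes()` amounts to a three-way map diff. A minimal sketch with plain string values -- the real code diffs setting values and emits `info!` events rather than returning key lists:

```rust
use std::collections::HashMap;

// Hypothetical sketch of the three-way settings diff: for each key in the
// new snapshot, classify it as changed or added; keys missing from the new
// snapshot are removed.
fn diff_settings(
    old: &HashMap<String, String>,
    new: &HashMap<String, String>,
) -> (Vec<String>, Vec<String>, Vec<String>) {
    let mut changed = Vec::new();
    let mut added = Vec::new();
    let mut removed = Vec::new();
    for (key, value) in new {
        match old.get(key) {
            Some(prev) if prev != value => changed.push(key.clone()),
            None => added.push(key.clone()),
            _ => {} // unchanged: nothing to log
        }
    }
    for key in old.keys() {
        if !new.contains_key(key) {
            removed.push(key.clone());
        }
    }
    (changed, added, removed)
}

fn main() {
    let old = HashMap::from([("a".to_string(), "1".to_string()), ("b".to_string(), "2".to_string())]);
    let new = HashMap::from([("a".to_string(), "9".to_string()), ("c".to_string(), "3".to_string())]);
    let (changed, added, removed) = diff_settings(&old, &new);
    assert_eq!(changed, vec!["a".to_string()]);
    assert_eq!(added, vec!["c".to_string()]);
    assert_eq!(removed, vec!["b".to_string()]);
}
```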
- -## CLI Commands - -**File:** `crates/openshell-cli/src/main.rs` (`SettingsCommands`), `crates/openshell-cli/src/run.rs` - -### `settings get [name] [--global]` - -Display effective settings for a sandbox or the gateway-global scope. - -```bash -# Sandbox-scoped effective settings -openshell settings get my-sandbox - -# Gateway-global settings -openshell settings get --global -``` - -Sandbox output includes: sandbox name, config revision, policy source (sandbox/global), policy hash, and a table of settings with key, value, and scope (sandbox/global/unset). - -Global output includes: scope label, settings revision, and a table of settings with key and value. Registered keys without a configured value display as ``. - -### `settings set [name] --key K --value V [--global] [--yes]` - -Set a single setting key at sandbox or global scope. - -```bash -# Sandbox-scoped -openshell settings set my-sandbox --key ocsf_json_enabled --value true - -# Global (requires confirmation) -openshell settings set --global --key providers_v2_enabled --value true -openshell settings set --global --key ocsf_json_enabled --value true - -# Skip confirmation -openshell settings set --global --key providers_v2_enabled --value true --yes -``` - -Value parsing is type-aware: bool keys accept `true/false/yes/no/1/0/on/off` via `parse_bool_like()`. Int keys parse as base-10 `i64`. String keys accept any value. - -### `settings delete [name] --key K [--global] [--yes]` - -Delete a setting key from the specified scope. - -```bash -# Global delete (unlocks sandbox control) -openshell settings delete --global --key providers_v2_enabled --yes -``` - -### `policy set --global --policy FILE [--yes]` - -Set a gateway-global policy that overrides all sandbox policies. Creates a versioned revision in the `sandbox_policies` table and writes the policy to the `gateway_settings` blob for delivery. 
- -```bash -openshell policy set --global --policy policy.yaml --yes -``` - -The `--wait` flag is rejected for global policy updates with: `"--wait is not supported for global policies; global policies are effective immediately"`. See `crates/openshell-cli/src/main.rs`. - -### `policy delete --global [--yes]` - -Delete the gateway-global policy, restoring sandbox-level policy control. Removes the `policy` key from the `gateway_settings` blob and supersedes all `__global__` revisions. - -```bash -openshell policy delete --global --yes -``` - -Note: `policy delete` without `--global` is not supported (sandbox policies are managed through versioned updates, not deletion). The CLI returns: `"sandbox policy delete is not supported; use --global to remove global policy lock"`. - -### `policy list --global [--limit N]` - -List global policy revision history. Uses `ListSandboxPolicies` with `global: true`, which routes to the `__global__` sentinel in the `sandbox_policies` table. - -```bash -openshell policy list --global -openshell policy list --global --limit 10 -``` - -### `policy get --global [--rev N] [--full]` - -Show a specific global policy revision (or the latest). Uses `GetSandboxPolicyStatus` with `global: true`. - -```bash -# Latest global revision -openshell policy get --global - -# Specific version -openshell policy get --global --rev 3 - -# Full policy payload as YAML -openshell policy get --global --full -``` - -### HITL Confirmation - -All `--global` mutations require human-in-the-loop confirmation via an interactive prompt. The `--yes` flag bypasses the prompt for scripted/CI usage. In non-interactive mode (no TTY), `--yes` is required -- otherwise the command fails with an error. 
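The confirmation gate described above can be sketched as a small function: `--yes` always proceeds, and without a TTY the command must fail rather than hang on a prompt. Names are illustrative; a real CLI might detect the terminal with `std::io::IsTerminal`:

```rust
// Hypothetical sketch of the HITL gate: --yes bypasses the prompt, and a
// missing TTY without --yes is a hard error.
#[derive(Debug, PartialEq)]
enum Gate {
    Proceed,        // --yes given: skip the prompt
    PromptOperator, // interactive terminal: show the confirmation prompt
}

fn confirm_gate(yes_flag: bool, stdin_is_tty: bool) -> Result<Gate, String> {
    if yes_flag {
        return Ok(Gate::Proceed);
    }
    if !stdin_is_tty {
        return Err("confirmation required; pass --yes in non-interactive mode".to_string());
    }
    Ok(Gate::PromptOperator)
}

fn main() {
    assert_eq!(confirm_gate(true, false), Ok(Gate::Proceed));
    assert_eq!(confirm_gate(false, true), Ok(Gate::PromptOperator));
    assert!(confirm_gate(false, false).is_err());
}
```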
- -The confirmation message varies: - -- **Global setting set**: warns that this will override sandbox-level values for the key -- **Global setting delete**: warns that this re-enables sandbox-level management -- **Global policy set**: warns that this overrides all sandbox policies -- **Global policy delete**: warns that this restores sandbox-level control - -## TUI Integration - -**File:** `crates/openshell-tui/src/` - -### Dashboard: Global Policy Indicator - -**File:** `crates/openshell-tui/src/ui/dashboard.rs` - -The gateway row in the dashboard shows a yellow `Global Policy Active (vN)` indicator when a global policy is active. The TUI detects this by calling `ListSandboxPolicies` with `global: true, limit: 1` on each polling tick and checking if the latest revision has `PolicyStatus::Loaded`. The version number and active flag are tracked in `App.global_policy_active` and `App.global_policy_version`. - -### Dashboard: Global Settings Tab - -The dashboard's middle pane has a tabbed interface: **Providers** | **Global Settings**. Press `Tab` to switch. - -The Global Settings tab displays registered keys with their current values, fetched via `GetGatewaySettings`. Features: - -- **Navigate**: `j`/`k` or arrow keys to select a setting -- **Edit** (`Enter`): Opens a type-aware editor: - - Bool keys: toggle between true/false - - String/Int keys: text input field -- **Delete** (`d`): Remove the selected key's value -- **Confirmation modals**: Both edit and delete operations show a confirmation dialog before applying -- **Scope indicators**: Each key shows its current value or `` - -### Sandbox Metadata Pane: Global Policy Indicator - -**File:** `crates/openshell-tui/src/ui/sandbox_detail.rs` - -When the sandbox's policy source is `GLOBAL` (detected via `policy_source` in the `GetSandboxSettings` response), the metadata pane shows `Policy: managed globally (vN)` in yellow. The version comes from `global_policy_version` in the response. 
Tracked in `App.sandbox_policy_is_global` and `App.sandbox_global_policy_version`. - -### Network Rules Pane: Global Policy Warning - -**File:** `crates/openshell-tui/src/ui/sandbox_draft.rs` - -When `sandbox_policy_is_global` is true, the Network Rules pane displays a yellow bottom title: `" Cannot approve rules while global policy is active "`. Draft chunks are still rendered but their status styles are greyed out (`t.muted`). Keyboard actions for approve (`a`), reject/revoke (`x`), and approve-all are intercepted client-side with status messages like `"Cannot approve rules while a global policy is active"` and `"Cannot modify rules while a global policy is active"`. See `crates/openshell-tui/src/app.rs` -- draft key handling. - -### Sandbox Screen: Settings Tab - -The sandbox detail view's bottom pane has a tabbed interface: **Policy** | **Settings**. Press `l` to switch tabs. - -The Settings tab shows effective settings for the selected sandbox, fetched as part of the `GetSandboxSettings` response. Features: - -- Same navigation and editing as the global settings tab -- **Scope indicators**: Each key shows `(sandbox)`, `(global)`, or `(unset)` to indicate the controlling tier -- Sandbox-scoped edits are blocked for globally-managed keys (server returns `FailedPrecondition`) - -### Data Refresh - -Settings are refreshed on each 2-second polling tick alongside the sandbox list and health status. The global settings revision is tracked to detect changes. Sandbox settings are refreshed when viewing a specific sandbox. Global policy active status is detected on each tick via `ListSandboxPolicies` with `global: true`. - -## Data Flow: Setting a Global Key - -End-to-end trace for `openshell settings set --global --key providers_v2_enabled --value true --yes`: - -1. 
**CLI** (`crates/openshell-cli/src/run.rs` -- `gateway_setting_set()`): - - `parse_cli_setting_value("providers_v2_enabled", "true")` -- looks up `SettingValueKind::Bool` in the registry, wraps as `SettingValue { bool_value: true }` - - `confirm_global_setting_takeover()` -- skipped because `--yes` - - Sends `UpdateSettingsRequest { setting_key: "providers_v2_enabled", setting_value: Some(...), global: true }` - -2. **Gateway** (`crates/openshell-server/src/grpc.rs` -- `update_settings()`): - - Acquires `settings_mutex` for the duration of the operation - - Detects `global=true`, `has_setting=true` - - `validate_registered_setting_key("providers_v2_enabled")` -- passes (key is in registry) - - `load_global_settings()` -- reads `gateway_settings` record from store - - `proto_setting_to_stored()` -- converts proto value to `StoredSettingValue::Bool(true)` - - `upsert_setting_value()` -- inserts into `BTreeMap`, returns `true` (changed) - - Increments `revision`, calls `save_global_settings()` - - Returns `UpdateSettingsResponse { settings_revision: N }` - -3. **Sandbox** (next poll tick in `run_policy_poll_loop()`): - - `poll_settings(sandbox_id)` returns new `config_revision` - - `log_setting_changes()` logs: `Setting changed key="providers_v2_enabled" old="" new="true"` - - `policy_hash` unchanged -- no OPA reload - - Updates tracked `current_config_revision` and `current_settings` - -## Data Flow: Setting a Global Policy - -End-to-end trace for `openshell policy set --global --policy policy.yaml --yes`: - -1. **CLI** (`crates/openshell-cli/src/main.rs`, `crates/openshell-cli/src/run.rs` -- `sandbox_policy_set_global()`): - - Rejects `--wait` flag with `"--wait is not supported for global policies; global policies are effective immediately"` - - Loads and parses the YAML policy file into a `SandboxPolicy` protobuf - - Sends `UpdateSettingsRequest { policy: Some(sandbox_policy), global: true }` - -2. 
**Gateway** (`crates/openshell-server/src/grpc.rs` -- `update_settings()`): - - Acquires `settings_mutex` - - Detects `global=true`, `has_policy=true` - - `ensure_sandbox_process_identity()` -- ensures process identity defaults to "sandbox" - - `validate_policy_safety()` -- rejects unsafe policies (e.g., root process) - - `deterministic_policy_hash()` -- computes SHA-256 hash of the policy - - **Dedup check**: Fetches `get_latest_policy(GLOBAL_POLICY_SANDBOX_ID)` - - If latest exists with `status == "loaded"` and same hash → no-op (ensures settings blob has `policy` key, returns existing version) - - If no latest, or latest is `superseded`, or hash differs → create new revision - - `put_policy_revision(id, "__global__", next_version, payload, hash)` -- persists revision - - `update_policy_status("__global__", next_version, "loaded")` -- marks loaded immediately - - `supersede_older_policies("__global__", next_version)` -- marks all older revisions as superseded - - Stores hex-encoded payload in `gateway_settings` blob under `policy` key via `upsert_setting_value()` - - Returns `UpdateSettingsResponse { version: N, policy_hash: "..." }` - -3. 
**Sandbox** (next poll tick, ~10 seconds): - - `poll_settings(sandbox_id)` returns response with `policy_source: GLOBAL`, `global_policy_version: N` - - `config_revision` changed → enters change processing - - `policy_hash` changed → calls `opa_engine.reload_from_proto(global_policy)` - - Logs `"Policy reloaded successfully (global)"` with `global_version=N` - - Does NOT call `ReportPolicyStatus` (global policies skip per-sandbox status reporting) - -## Cross-References - -- [Gateway Architecture](gateway.md) -- Persistence layer, gRPC service, object types -- [Sandbox Architecture](sandbox.md) -- Poll loop, `CachedOpenShellClient`, OPA reload lifecycle -- [Policy Language](security-policy.md) -- Live policy updates, global policy CLI commands -- [TUI](tui.md) -- Settings tabs in dashboard and sandbox views diff --git a/architecture/gateway-single-node.md b/architecture/gateway-single-node.md deleted file mode 100644 index 4b5a6aa6d..000000000 --- a/architecture/gateway-single-node.md +++ /dev/null @@ -1,129 +0,0 @@ -# Gateway Deployment and Compute Platforms - -This document describes the OpenShell gateway deployment model. Operators run a gateway endpoint and configure the compute driver that should create sandboxes. - -The Helm chart remains in this repository as the supported Kubernetes deployment artifact. Docker, Podman, and the experimental MicroVM runtime remain first-class compute platforms for local and specialized deployments. - -## Goals and Scope - -- Keep the gateway deployable as a standard process, container, or Kubernetes Helm release. -- Keep the Helm chart for Kubernetes deployments. -- Keep the gateway image independent from the compute runtime. -- Make compute-platform dependencies explicit. -- Preserve CLI gateway registration and selection as the way users target an already-running gateway. - -Out of scope: - -- Provisioning Kubernetes, Docker, Podman, or VM host infrastructure. 
-- Defining a new one-command mTLS import flow for every deployment type. - -## Components - -- `crates/openshell-server`: Gateway API server, persistence, inference route management, SSH relay, and compute-driver integration. -- `crates/openshell-driver-kubernetes`: Kubernetes compute driver for sandbox pods and Kubernetes resources. -- `crates/openshell-driver-docker`: Docker compute driver for local sandbox containers. -- `crates/openshell-driver-vm`: VM compute driver for libkrun-backed sandboxes. -- Podman driver path: rootless container execution compatible with the Podman runtime model. -- `deploy/helm/openshell`: Helm chart for deploying the gateway and Kubernetes driver configuration. -- `deploy/docker/Dockerfile.images` target `gateway`: Builds the published gateway image. -- `crates/openshell-cli`: CLI commands that register, select, and talk to gateways. - -## Deployment Flow - -```mermaid -sequenceDiagram - participant O as Operator - participant G as Gateway - participant D as Compute Driver - participant P as Compute Platform - participant C as openshell CLI - - O->>G: Start gateway with driver configuration - G->>D: Initialize selected compute driver - D->>P: Verify runtime or cluster access - O->>C: Register reachable gateway endpoint - C->>G: gRPC / HTTP requests - G->>D: Create / delete / watch sandboxes - D->>P: Create sandbox workload -``` - -## Supported Compute Platforms - -| Platform | Gateway shape | Sandbox workload | Primary dependencies | -|---|---|---|---| -| Docker | Standalone gateway process or container on a host with Docker access. | Local containers. | Docker daemon, image pull/build access, local networking. | -| Podman | Standalone gateway process with Podman socket access. | Rootless or user-scoped containers. | Podman socket, rootless networking, image pull/build access. | -| Kubernetes | Gateway StatefulSet installed by Helm. | Sandbox pods. | Kubernetes API, namespace, service account, RBAC, storage, secrets. 
| -| MicroVM | Gateway process with VM driver access. | VM-backed sandboxes. | VM runtime rootfs, libkrun-based driver, host virtualization support. | - -## Kubernetes Helm Deployment - -The Helm chart at `deploy/helm/openshell` owns Kubernetes deployment concerns: - -- Gateway StatefulSet and persistent volume claim. -- Service account, RBAC, and service. -- Gateway service exposure. -- TLS secret mounts and environment variables. -- Sandbox namespace, default sandbox image, and callback endpoint configuration. -- NetworkPolicy restricting sandbox SSH ingress to the gateway. - -The chart expects these operator-provided inputs: - -| Input | Purpose | -|---|---| -| Namespace | Release namespace and default sandbox namespace. | -| `openshell-ssh-handshake` Secret | HMAC key used by the SSH relay handshake. | -| `openshell-server-tls` Secret | Server certificate and key when TLS is enabled. | -| `openshell-server-client-ca` Secret | CA bundle used by the gateway to verify client certificates. | -| `openshell-client-tls` Secret | Client certificate bundle mounted into sandbox pods. | -| StorageClass / PVC support | Persistent gateway SQLite data when using the default `server.dbUrl`. | -| Service exposure | Port-forward, ingress, load balancer, or NodePort for CLI access. | - -For local Kubernetes evaluation, TLS may be disabled with `server.disableTls=true` and the service can be reached through `kubectl port-forward`. Production deployments should keep TLS enabled or place the gateway behind a trusted TLS-terminating access proxy. - -Key Helm values: - -| Value | Effect | -|---|---| -| `image.repository`, `image.tag` | Select the gateway image. | -| `service.type`, `service.port`, `service.nodePort` | Expose the gateway service. | -| `server.dbUrl` | Select SQLite or Postgres persistence. | -| `server.sandboxNamespace` | Namespace for sandbox resources. | -| `server.sandboxImage` | Default sandbox image. 
| -| `server.grpcEndpoint` | Endpoint sandbox supervisors use to call back to the gateway. | -| `server.sshGatewayHost`, `server.sshGatewayPort` | Host and port returned to CLI clients for SSH proxy connections. | -| `server.disableTls`, `server.disableGatewayAuth` | Transport/authentication mode. | -| `server.tls.*` | Names of TLS secrets mounted into the gateway and sandboxes. | - -## Runtime Shape - -```mermaid -flowchart LR - CLI[openshell CLI] -->|gRPC / HTTP| GW[Gateway] - GW --> DB[(SQLite or Postgres)] - GW --> DRIVER[Compute Driver] - DRIVER --> DOCKER[Docker] - DRIVER --> PODMAN[Podman] - DRIVER --> K8S[Kubernetes API] - DRIVER --> VM[MicroVM Driver] - DOCKER --> SBX1[Sandbox Container] - PODMAN --> SBX2[Sandbox Container] - K8S --> SBX3[Sandbox Pod] - VM --> SBX4[Sandbox VM] - SBX1 --> GW - SBX2 --> GW - SBX3 --> GW - SBX4 --> GW -``` - -The gateway process manages all OpenShell control-plane APIs. It persists records in SQLite or Postgres, watches sandbox state through the selected compute driver, and brokers SSH access through supervisor-initiated relay streams. - -## Operational Notes - -- Gateway endpoint registration should use `openshell gateway add ` regardless of compute platform. -- Kubernetes chart changes should be validated with `helm lint deploy/helm/openshell` and an install into a disposable namespace when possible. -- Docker driver changes should be validated with `mise run gateway:docker` or `mise run e2e:docker`. -- Podman driver changes should be validated with `mise run e2e:podman`. -- VM driver changes should be validated with `mise run e2e:vm`. -- Gateway image changes should be validated by building `deploy/docker/Dockerfile.images` target `gateway`. -- Published docs should describe gateway deployment and endpoint registration. 
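As a worked illustration of the Helm values table above, a minimal values file for a local evaluation install might look like the following. All values are placeholders chosen for this sketch, not chart defaults:

```yaml
# Hypothetical values.yaml combining the inputs above; keys match the
# table, values are illustrative only.
image:
  repository: example.invalid/openshell-gateway   # placeholder registry
  tag: v0.1.0
service:
  type: NodePort
  port: 50051
server:
  dbUrl: sqlite:///data/openshell.db              # default SQLite persistence
  sandboxNamespace: openshell-sandboxes
  disableTls: true                                # local evaluation only; keep TLS on in production
```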
diff --git a/architecture/gateway.md b/architecture/gateway.md index c85f77ef6..9c7f3a8d3 100644 --- a/architecture/gateway.md +++ b/architecture/gateway.md @@ -1,703 +1,106 @@ -# Gateway Architecture +# Gateway -## Overview +The gateway is the OpenShell control plane. It exposes the API used by the CLI, +SDK, and TUI; persists platform state; manages provider credentials and +inference configuration; and asks compute runtimes to create or delete sandbox +workloads. -`openshell-server` is the gateway -- the central control plane for a cluster. It exposes two gRPC services (OpenShell and Inference) and HTTP endpoints on a single multiplexed port, manages sandbox lifecycle through a pluggable compute driver, persists state in SQLite or Postgres, and brokers SSH access into sandboxes through supervisor-initiated relay streams. The gateway coordinates all interactions between clients, the compute backend, and the persistence layer. +## Responsibilities -Each sandbox supervisor opens a persistent inbound gRPC session (`ConnectSupervisor`); the gateway multiplexes per-invocation `RelayStream` RPCs onto the same HTTP/2 connection to move bytes between clients and the in-sandbox SSH Unix socket. The gateway does not need to know, resolve, or reach the sandbox's network address. +- Authenticate clients and sandbox callbacks. +- Serve gRPC APIs for sandbox lifecycle, provider management, policy updates, + settings, inference configuration, logs, and watch streams. +- Serve HTTP endpoints for health, SSH tunnel upgrades, and edge-auth flows. +- Persist domain objects in SQLite or Postgres. +- Resolve provider credentials and inference bundles for sandbox supervisors. +- Coordinate supervisor relay sessions for connect, exec, and file sync. -## Architecture Diagram +The gateway does not enforce agent network policy at request time. That happens +inside each sandbox, where the supervisor and proxy can observe local process +identity. 
-The following diagram shows the major components inside the gateway process and their relationships. +## Protocol and Auth -```mermaid -graph TD - Client["gRPC / HTTP Client"] - Supervisor["Sandbox Supervisor
(inbound gRPC)"] - TCP["TCP Listener"] - TLS["TLS Acceptor
(optional)"] - MUX["MultiplexedService
(HTTP/2 adaptive window)"] - GRPC_ROUTER["GrpcRouter"] - NAV["OpenShellServer
(OpenShell service)"] - INF["InferenceServer
(Inference service)"] - HTTP["HTTP Router
(Axum)"] - HEALTH["Health Endpoints"] - SSH_TUNNEL["SSH Tunnel
(/connect/ssh)"] - SUP_REG["SupervisorSessionRegistry"] - STORE["Store
(SQLite / Postgres)"]
-        COMPUTE["ComputeRuntime"]
-        DRIVER["ComputeDriver<br/>(kubernetes / vm)"]
-        WATCH_BUS["SandboxWatchBus"]
-        LOG_BUS["TracingLogBus"]
-        PLAT_BUS["PlatformEventBus"]
-        INDEX["SandboxIndex"]
-
-    Client --> TCP
-    Supervisor --> TCP
-    TCP --> TLS
-    TLS --> MUX
-    MUX -->|"content-type: application/grpc"| GRPC_ROUTER
-    MUX -->|"other"| HTTP
-    GRPC_ROUTER -->|"/openshell.inference.v1.Inference/*"| INF
-    GRPC_ROUTER -->|"all other paths"| NAV
-    HTTP --> HEALTH
-    HTTP --> SSH_TUNNEL
-    NAV --> STORE
-    NAV --> COMPUTE
-    NAV --> SUP_REG
-    SSH_TUNNEL --> STORE
-    SSH_TUNNEL --> SUP_REG
-    INF --> STORE
-    COMPUTE --> DRIVER
-    COMPUTE --> STORE
-    COMPUTE --> WATCH_BUS
-    COMPUTE --> INDEX
-    COMPUTE --> PLAT_BUS
-    LOG_BUS --> PLAT_BUS
-```
-
-## Source Layout
-
-| Module | File | Purpose |
-|--------|------|---------|
-| Entry point | `crates/openshell-server/src/main.rs` | Thin binary wrapper that calls `cli::run_cli` |
-| CLI | `crates/openshell-server/src/cli.rs` | `Args` parser, config assembly, tracing setup, calls `run_server` |
-| Gateway runtime | `crates/openshell-server/src/lib.rs` | `ServerState` struct, `run_server()` accept loop |
-| Protocol mux | `crates/openshell-server/src/multiplex.rs` | `MultiplexService`, `MultiplexedService`, `GrpcRouter`, `BoxBody`, HTTP/2 adaptive-window tuning |
-| gRPC: OpenShell | `crates/openshell-server/src/grpc/mod.rs` | `OpenShellService` trait impl -- dispatches to per-concern handlers |
-| gRPC: Sandbox/Exec | `crates/openshell-server/src/grpc/sandbox.rs` | Sandbox CRUD, `ExecSandbox`, SSH session handlers, relay-backed exec proxy |
-| gRPC: Inference | `crates/openshell-server/src/inference.rs` | `InferenceService` -- gateway inference config and sandbox bundle delivery |
-| Supervisor sessions | `crates/openshell-server/src/supervisor_session.rs` | `SupervisorSessionRegistry`, `handle_connect_supervisor`, `handle_relay_stream`, reaper |
-| HTTP | `crates/openshell-server/src/http.rs` | Health endpoints, merged with SSH tunnel router |
-| Browser auth | `crates/openshell-server/src/auth.rs` | Cloudflare browser login relay at `/auth/connect` |
-| SSH tunnel | `crates/openshell-server/src/ssh_tunnel.rs` | HTTP CONNECT handler at `/connect/ssh` backed by `open_relay` |
-| WS tunnel | `crates/openshell-server/src/ws_tunnel.rs` | WebSocket tunnel handler at `/_ws_tunnel` for Cloudflare-fronted clients |
-| TLS | `crates/openshell-server/src/tls.rs` | `TlsAcceptor` wrapping rustls with ALPN |
-| Persistence | `crates/openshell-server/src/persistence/mod.rs` | `Store` enum (SQLite/Postgres), generic object CRUD, protobuf codec |
-| Compute runtime | `crates/openshell-server/src/compute/mod.rs` | `ComputeRuntime`, gateway-owned sandbox lifecycle orchestration over a compute backend |
-| Compute driver: Kubernetes | `crates/openshell-driver-kubernetes/src/driver.rs` | Kubernetes CRD create/delete/watch, pod template translation |
-| Compute driver: Docker | `crates/openshell-driver-docker/src/lib.rs` | Local Docker container create/stop/delete/watch |
-| Compute driver: VM | `crates/openshell-driver-vm/src/driver.rs` | Per-sandbox microVM create/delete/watch, supervisor-only guest boot |
-| Sandbox index | `crates/openshell-server/src/sandbox_index.rs` | `SandboxIndex` -- in-memory name/pod-to-id correlation |
-| Watch bus | `crates/openshell-server/src/sandbox_watch.rs` | `SandboxWatchBus` -- in-memory broadcast for persisted sandbox updates |
-| Tracing bus | `crates/openshell-server/src/tracing_bus.rs` | `TracingLogBus` -- captures tracing events keyed by `sandbox_id` |
-
-Proto definitions consumed by the gateway:
-
-| Proto file | Package | Defines |
-|------------|---------|---------|
-| `proto/openshell.proto` | `openshell.v1` | `OpenShell` service, public sandbox resource model, provider/SSH/watch/policy messages, supervisor session messages (`ConnectSupervisor`, `RelayStream`, `RelayFrame`) |
-| `proto/compute_driver.proto` | `openshell.compute.v1` | Internal `ComputeDriver` service, driver-native sandbox observations, compute watch stream envelopes |
-| `proto/inference.proto` | `openshell.inference.v1` | `Inference` service: `SetClusterInference`, `GetClusterInference`, `GetInferenceBundle` |
-| `proto/datamodel.proto` | `openshell.datamodel.v1` | `Provider` |
-| `proto/sandbox.proto` | `openshell.sandbox.v1` | Sandbox supervisor policy, settings, and config messages |
-
-## Startup Sequence
-
-The gateway boots in `cli::run_cli` (`crates/openshell-server/src/cli.rs`) and proceeds through these steps:
-
-1. **Install rustls crypto provider** -- `rustls::crypto::ring::default_provider().install_default()`.
-2. **Parse CLI arguments** -- `Args::parse()` via `clap`. Every flag has a corresponding environment variable (see [Configuration](#configuration)).
-3. **Initialize tracing** -- Creates a `TracingLogBus` and installs a tracing subscriber that writes to stdout and publishes log events keyed by `sandbox_id` into the bus.
-4. **Build `Config`** -- Assembles an `openshell_core::Config` from the parsed arguments.
-5. **Call `run_server()`** (`crates/openshell-server/src/lib.rs`):
-   1. Connect to the persistence store (`Store::connect`), which auto-detects SQLite vs Postgres from the URL prefix and runs migrations.
-   2. Create `ComputeRuntime` with a `ComputeDriver` implementation selected by `OPENSHELL_DRIVERS`:
-      - `kubernetes` wraps `KubernetesComputeDriver` in `ComputeDriverService`, so the gateway uses the `openshell.compute.v1.ComputeDriver` RPC surface even without transport.
-      - `docker` constructs `openshell-driver-docker` in-process and manages local containers labeled with the configured sandbox namespace.
-      - `vm` spawns the standalone `openshell-driver-vm` binary as a local compute-driver process, resolves it from `--driver-dir`, conventional libexec install paths, or a sibling of the gateway binary, connects to it over a Unix domain socket, and keeps the libkrun/rootfs runtime out of the gateway binary.
-   3. Build `ServerState` (shared via `Arc` across all handlers), including a fresh `SupervisorSessionRegistry`.
-   4. Resume persisted sandboxes that were stopped during the previous gateway shutdown.
-   5. **Spawn background tasks**:
-      - `ComputeRuntime::spawn_watchers` -- consumes the compute-driver watch stream, republishes platform events, and runs a periodic `ListSandboxes` snapshot reconcile.
-      - `ssh_tunnel::spawn_session_reaper` -- sweeps expired or revoked SSH session tokens from the store hourly.
-      - `supervisor_session::spawn_relay_reaper` -- sweeps orphaned pending relay channels every 30 seconds.
-   6. Create `MultiplexService`.
-   7. Bind the primary gateway listener and any compute-driver requested listeners. Docker requests the Docker bridge gateway address with the normal gateway port, so sandbox containers can call back over the bridge without joining the host network.
-   8. Bind optional health and metrics listeners.
-   9. Optionally create `TlsAcceptor` from cert/key files.
-   10. Spawn a task per gateway listener. Each accepted connection optionally performs a TLS handshake, then calls `MultiplexService::serve()`.
-
-## Configuration
-
-All configuration is via CLI flags with environment variable fallbacks. The `--db-url` flag is required.
-
-| Flag | Env Var | Default | Description |
-|------|---------|---------|-------------|
-| `--bind-address` | `OPENSHELL_BIND_ADDRESS` | `127.0.0.1` | IP address for gateway, health, and metrics listeners. Container deployments pass `0.0.0.0` explicitly. |
-| `--port` | `OPENSHELL_SERVER_PORT` | `8080` | TCP listen port |
-| `--log-level` | `OPENSHELL_LOG_LEVEL` | `info` | Tracing log level filter |
-| `--tls-cert` | `OPENSHELL_TLS_CERT` | None | Path to PEM certificate file |
-| `--tls-key` | `OPENSHELL_TLS_KEY` | None | Path to PEM private key file |
-| `--tls-client-ca` | `OPENSHELL_TLS_CLIENT_CA` | None | Path to PEM CA cert for mTLS client verification |
-| `--disable-tls` | `OPENSHELL_DISABLE_TLS` | `false` | Listen on plaintext HTTP behind a trusted reverse proxy or tunnel |
-| `--disable-gateway-auth` | `OPENSHELL_DISABLE_GATEWAY_AUTH` | `false` | Keep TLS enabled but allow no-certificate clients and rely on application-layer auth |
-| `--client-tls-secret-name` | `OPENSHELL_CLIENT_TLS_SECRET_NAME` | None | K8s secret name to mount into sandbox pods for mTLS |
-| `--db-url` | `OPENSHELL_DB_URL` | *required* | Database URL (`sqlite:...` or `postgres://...`). The Helm chart defaults to `sqlite:/var/openshell/openshell.db` (persistent volume). In-memory SQLite (`sqlite::memory:?cache=shared`) works for ephemeral/test environments but data is lost on restart. |
-| `--sandbox-namespace` | `OPENSHELL_SANDBOX_NAMESPACE` | `default` | Kubernetes namespace for sandbox CRDs |
-| `--sandbox-image` | `OPENSHELL_SANDBOX_IMAGE` | None | Default container image for sandbox pods |
-| `--grpc-endpoint` | `OPENSHELL_GRPC_ENDPOINT` | None | gRPC endpoint reachable from sandbox workloads for supervisor callbacks |
-| `--drivers` | `OPENSHELL_DRIVERS` | `kubernetes` | Compute backend to use. Current options are `kubernetes`, `docker`, and `vm`. |
-| `--docker-network-name` | `OPENSHELL_DOCKER_NETWORK_NAME` | `openshell-docker` | Docker bridge network that local Docker sandboxes join |
-| `--driver-dir` | `OPENSHELL_DRIVER_DIR` | unset | Override directory for `openshell-driver-vm`. When unset, the gateway searches `~/.local/libexec/openshell`, `/usr/libexec/openshell`, `/usr/local/libexec/openshell`, `/usr/local/libexec`, then a sibling binary. |
-| `--vm-driver-state-dir` | `OPENSHELL_VM_DRIVER_STATE_DIR` | `target/openshell-vm-driver` | Host directory for VM sandbox rootfs, console logs, runtime state, and shared image-rootfs cache |
-| `--vm-krun-log-level` | `OPENSHELL_VM_KRUN_LOG_LEVEL` | `1` | libkrun log level for VM helper processes |
-| `--vm-driver-vcpus` | `OPENSHELL_VM_DRIVER_VCPUS` | `2` | Default vCPU count for VM sandboxes |
-| `--vm-driver-mem-mib` | `OPENSHELL_VM_DRIVER_MEM_MIB` | `2048` | Default memory allocation for VM sandboxes in MiB |
-| `--vm-tls-ca` | `OPENSHELL_VM_TLS_CA` | None | CA cert copied into VM guests for gateway mTLS |
-| `--vm-tls-cert` | `OPENSHELL_VM_TLS_CERT` | None | Client cert copied into VM guests for gateway mTLS |
-| `--vm-tls-key` | `OPENSHELL_VM_TLS_KEY` | None | Client private key copied into VM guests for gateway mTLS |
-| `--ssh-gateway-host` | `OPENSHELL_SSH_GATEWAY_HOST` | `127.0.0.1` | Public hostname returned in SSH session responses |
-| `--ssh-gateway-port` | `OPENSHELL_SSH_GATEWAY_PORT` | `8080` | Public port returned in SSH session responses |
-| `--ssh-connect-path` | `OPENSHELL_SSH_CONNECT_PATH` | `/connect/ssh` | HTTP path for SSH CONNECT/upgrade |
-
-The sandbox-side SSH listener is a Unix domain socket inside the sandbox. The path defaults to `/run/openshell/ssh.sock` and is configured on the compute driver (e.g. `openshell-driver-kubernetes --sandbox-ssh-socket-path`). The gateway never dials this socket itself; the supervisor bridges it onto a `RelayStream` when asked.
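The removed configuration table above pairs every flag with an environment variable fallback and a default. The precedence can be sketched as a small stdlib-only function; `resolve_setting` and the inlined env value are illustrative stand-ins for the real clap wiring, not the gateway's actual code:

```rust
/// Precedence sketch: an explicit CLI flag wins, then the environment
/// variable, then the built-in default. `env_value` stands in for a
/// `std::env::var` lookup so the example stays deterministic.
fn resolve_setting(cli_flag: Option<&str>, env_value: Option<&str>, default: &str) -> String {
    cli_flag
        .or(env_value)
        .filter(|v| !v.is_empty())
        .unwrap_or(default)
        .to_string()
}

fn main() {
    // --port beats OPENSHELL_SERVER_PORT, which beats the default 8080.
    assert_eq!(resolve_setting(Some("7070"), Some("9090"), "8080"), "7070");
    assert_eq!(resolve_setting(None, Some("9090"), "8080"), "9090");
    assert_eq!(resolve_setting(None, None, "8080"), "8080");
}
```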
-
-## Shared State
-
-All handlers share an `Arc` (`crates/openshell-server/src/lib.rs`):
-
-```rust
-pub struct ServerState {
-    pub config: Config,
-    pub store: Arc,
-    pub compute: ComputeRuntime,
-    pub sandbox_index: SandboxIndex,
-    pub sandbox_watch_bus: SandboxWatchBus,
-    pub tracing_log_bus: TracingLogBus,
-    pub ssh_connections_by_token: Mutex>,
-    pub ssh_connections_by_sandbox: Mutex>,
-    pub settings_mutex: tokio::sync::Mutex<()>,
-    pub supervisor_sessions: SupervisorSessionRegistry,
-}
-```
-
-- **`store`** -- persistence backend (SQLite or Postgres) for all object types.
-- **`compute`** -- gateway-owned compute orchestration. Persists sandbox lifecycle transitions, validates create requests through the compute backend, consumes the backend watch stream, and periodically reconciles the store against `ComputeDriver/ListSandboxes` snapshots.
-- **`sandbox_index`** -- in-memory bidirectional index mapping sandbox names and agent pod names to sandbox IDs. Updated from compute-driver sandbox snapshots.
-- **`sandbox_watch_bus`** -- `broadcast`-based notification bus keyed by sandbox ID. Producers call `notify(&id)` when the persisted sandbox record changes; consumers in `WatchSandbox` streams receive `()` signals and re-read the record.
-- **`tracing_log_bus`** -- captures `tracing` events that include a `sandbox_id` field and republishes them as `SandboxLogLine` messages. Maintains a per-sandbox tail buffer (default 200 entries). Also contains a nested `PlatformEventBus` for compute-driver platform events.
-- **`supervisor_sessions`** -- tracks the live `ConnectSupervisor` session per sandbox and the set of pending relay channels awaiting the supervisor's `RelayStream` dial-back. See [Supervisor Sessions](#supervisor-sessions).
-- **`settings_mutex`** -- serializes settings mutations (global and sandbox) to prevent read-modify-write races. See [Gateway Settings Channel](gateway-settings.md#global-policy-lifecycle).
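The watch-bus pattern in the removed text above (producers publish a bare signal keyed by sandbox ID, watchers re-read the persisted record themselves) can be modeled with stdlib channels. This sketch swaps the gateway's tokio `broadcast` channel for std `mpsc` so it runs standalone; all names are illustrative:

```rust
use std::collections::HashMap;
use std::sync::mpsc::{channel, Receiver, Sender};

/// Toy watch bus: a `()` signal per sandbox id, no payload. Watchers that
/// receive a signal are expected to re-read the record from the store.
struct WatchBus {
    subscribers: HashMap<String, Vec<Sender<()>>>,
}

impl WatchBus {
    fn new() -> Self {
        WatchBus { subscribers: HashMap::new() }
    }

    fn subscribe(&mut self, sandbox_id: &str) -> Receiver<()> {
        let (tx, rx) = channel();
        self.subscribers.entry(sandbox_id.to_string()).or_default().push(tx);
        rx
    }

    /// Signal that the persisted record for `sandbox_id` changed.
    fn notify(&self, sandbox_id: &str) {
        if let Some(subs) = self.subscribers.get(sandbox_id) {
            for tx in subs {
                let _ = tx.send(()); // best-effort: dropped receivers are ignored
            }
        }
    }
}

fn main() {
    let mut bus = WatchBus::new();
    let rx = bus.subscribe("sbx-1");
    bus.notify("sbx-1");
    // The watcher wakes on the signal, then re-reads the record itself.
    assert!(rx.try_recv().is_ok());
    assert!(rx.try_recv().is_err()); // no further updates queued
}
```

Sending `()` instead of the record keeps the bus payload-free: slow consumers can miss intermediate signals and still converge, because every wake-up re-reads the latest state.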
-
-## Protocol Multiplexing
-
-All traffic (gRPC and HTTP) shares a single TCP port. Multiplexing happens at the request level, not the connection level.
+The gateway listens on one service port and multiplexes gRPC and HTTP traffic.
+The default deployment mode is mTLS: clients and sandbox workloads present a
+certificate signed by the deployment CA before reaching application handlers.
-### Connection Handling
+Supported auth modes:
-`MultiplexService::serve()` (`crates/openshell-server/src/multiplex.rs`) creates per-connection service instances:
+| Mode | Use |
+|---|---|
+| mTLS | Default direct gateway access for CLI, SDK, TUI, and sandbox callbacks. |
+| Plaintext | Local development or a trusted reverse proxy boundary. |
+| Cloudflare JWT | Edge-authenticated deployments where Cloudflare Access supplies identity. |
+| OIDC | Bearer-token auth for users, with browser PKCE or client credentials login. |
-1. Each accepted TCP stream (optionally TLS-wrapped) is passed to `hyper_util::server::conn::auto::Builder`, which auto-negotiates HTTP/1.1 or HTTP/2.
-2. The HTTP/2 side is built with `adaptive_window(true)`. Hyper/h2 auto-sizes the per-stream flow-control window based on measured bandwidth-delay product, so bulk byte transfers on `RelayStream` (and `ExecSandbox` / `PushSandboxLogs`) are not throttled by the default 64 KiB window. Idle streams stay cheap; active streams grow as needed.
-3. The builder calls `serve_connection_with_upgrades()`, which supports HTTP upgrades (needed for the SSH tunnel's CONNECT method).
-4. For each request, `MultiplexedService` inspects the `content-type` header:
-   - **Starts with `application/grpc`** -- routes to `GrpcRouter`.
-   - **Anything else** -- routes to the Axum HTTP router.
+Sandbox supervisor RPCs authenticate with either mTLS material or a sandbox
+secret depending on the runtime and deployment mode. User-facing mutations are
+authorized by role policy when OIDC or edge identity is enabled.
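The request-level dispatch described above reduces to a single header check. A minimal sketch (the real check lives in `MultiplexedService`; `routes_to_grpc` is an illustrative name):

```rust
/// A request is routed to the gRPC router when its `content-type` starts
/// with `application/grpc` (covering suffixed variants such as
/// `application/grpc+proto`); anything else goes to the HTTP router.
fn routes_to_grpc(content_type: Option<&str>) -> bool {
    matches!(content_type, Some(ct) if ct.starts_with("application/grpc"))
}

fn main() {
    assert!(routes_to_grpc(Some("application/grpc")));
    assert!(routes_to_grpc(Some("application/grpc+proto")));
    assert!(!routes_to_grpc(Some("application/json")));
    assert!(!routes_to_grpc(None)); // no content-type header: plain HTTP
}
```

A prefix match rather than an exact match is what makes the suffixed gRPC content types land on the gRPC path.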
-### gRPC Sub-Routing
+## API Surface
-`GrpcRouter` (`crates/openshell-server/src/multiplex.rs`) further routes gRPC requests by URI path prefix:
+The gateway API is organized around platform objects and operational streams:
-- Paths starting with `/openshell.inference.v1.Inference/` go to `InferenceServer`.
-- All other gRPC paths go to `OpenShellServer`.
+| Area | Examples |
+|---|---|
+| Sandbox lifecycle | Create, list, delete, watch, exec, SSH session bootstrap. |
+| Providers | Store provider records, discover credentials, resolve runtime environment. |
+| Policy and settings | Get effective sandbox config, update sandbox policy, manage global settings. |
+| Inference | Set gateway-level model/provider config and resolve sandbox route bundles. |
+| Observability | Push sandbox logs, stream sandbox status and logs to clients. |
-### Body Type Normalization
+Domain objects use shared metadata: stable server-generated IDs, human-readable
+names, creation timestamps, and labels. Crate-level details live in
+`crates/openshell-core/README.md`.
-Both gRPC and HTTP handlers produce different response body types. `MultiplexedService` normalizes them through a custom `BoxBody` wrapper (an `UnsyncBoxBody>`) so that Hyper receives a uniform response type.
+## Persistence
-### TLS + mTLS
+The gateway stores protobuf payloads with indexed object metadata. SQLite is the
+default local store; Postgres is supported for deployments that need an external
+database. Persisted state includes sandboxes, providers, SSH sessions, policy
+revisions, settings, inference configuration, and deployment records.
-When TLS is enabled (`crates/openshell-server/src/tls.rs`):
+Policy and runtime settings are delivered together through the effective sandbox
+config path. A gateway-global policy can override sandbox-scoped policy. The
+sandbox supervisor polls for config revisions and hot-reloads dynamic policy
+when the policy engine accepts the update.
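The revision-polling idea above (the supervisor re-fetches config only when a fingerprint changes) can be sketched with a stdlib hash. The struct and `config_revision` helper are illustrative; the gateway's real `config_revision` is computed over the full effective config, not this toy shape:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy effective-config shape for fingerprinting.
#[derive(Hash)]
struct EffectiveConfig {
    policy: String,
    settings: Vec<(String, String)>,
}

/// Hex fingerprint of the config contents. `DefaultHasher::new()` uses fixed
/// keys, so identical input hashes identically within a process.
fn config_revision(cfg: &EffectiveConfig) -> String {
    let mut h = DefaultHasher::new();
    cfg.hash(&mut h);
    format!("{:016x}", h.finish())
}

fn main() {
    let a = EffectiveConfig { policy: "allow: none".to_string(), settings: vec![] };
    let b = EffectiveConfig { policy: "allow: none".to_string(), settings: vec![] };
    let c = EffectiveConfig { policy: "allow: git".to_string(), settings: vec![] };
    assert_eq!(config_revision(&a), config_revision(&b)); // unchanged -> same revision
    assert_ne!(config_revision(&a), config_revision(&c)); // changed policy -> new revision
}
```

Comparing opaque fingerprints lets the supervisor poll cheaply: a stable revision means nothing to reload, a changed revision triggers a full config re-fetch.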
-
-- `TlsAcceptor::from_files()` loads PEM certificates and keys via `rustls_pemfile`, builds a `rustls::ServerConfig`, and configures ALPN to advertise `h2` and `http/1.1`.
-- When a client CA path is provided (`--tls-client-ca`), the server enforces mutual TLS using `WebPkiClientVerifier` by default. In Cloudflare-fronted deployments, `--disable-gateway-auth` keeps TLS enabled but allows no-certificate clients so the edge can forward a JWT instead.
-- `--disable-tls` removes gateway-side TLS entirely and serves plaintext HTTP behind a trusted reverse proxy or tunnel.
-- Supports PKCS#1, PKCS#8, and SEC1 private key formats.
-- The TLS handshake happens before the stream reaches Hyper's auto builder, so ALPN negotiation and HTTP version detection work together transparently.
-- Certificates are operator-provided in the target deployment model. Helm deployments consume three K8s secrets: `openshell-server-tls` (server cert+key), `openshell-server-client-ca` (CA cert), and `openshell-client-tls` (client cert+key+CA, shared by CLI and sandbox workloads).
-- Sandbox supervisors reuse the shared client cert to authenticate their `ConnectSupervisor` and `RelayStream` calls over the same mTLS channel.
+## Supervisor Relay
-## Supervisor Sessions
-
-The gateway brokers all byte-level access into a sandbox through a two-plane design on a single HTTP/2 connection initiated by the supervisor:
-
-1. **Control plane** -- `ConnectSupervisor(stream SupervisorMessage) returns (stream GatewayMessage)`. Long-lived, one per sandbox. Carries `SupervisorHello`, `SessionAccepted`/`SessionRejected`, heartbeats, and `RelayOpen`/`RelayClose` control messages.
-2. **Data plane** -- `RelayStream(stream RelayFrame) returns (stream RelayFrame)`. One short-lived call per SSH or exec invocation. The first inbound frame is a `RelayInit { channel_id }`; subsequent frames carry raw bytes in `RelayFrame.data` in either direction.
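The data-plane rule above (first frame must be an init carrying a non-empty channel id; everything after is raw bytes) is easy to model. The enum and validator below are a sketch of the protobuf messages, not the generated code:

```rust
/// Minimal model of the relay data plane: one init frame, then raw data.
#[derive(Debug)]
enum RelayFrame {
    Init { channel_id: String },
    Data(Vec<u8>),
}

/// The first frame on a relay must be `Init` with a non-empty channel id;
/// anything else is rejected (the gateway answers `InvalidArgument`).
fn validate_first_frame(frame: &RelayFrame) -> Result<&str, &'static str> {
    match frame {
        RelayFrame::Init { channel_id } if !channel_id.is_empty() => Ok(channel_id.as_str()),
        RelayFrame::Init { .. } => Err("invalid argument: empty channel_id"),
        RelayFrame::Data(_) => Err("invalid argument: first frame must be Init"),
    }
}

fn main() {
    let ok = RelayFrame::Init { channel_id: "chan-123".to_string() };
    assert_eq!(validate_first_frame(&ok), Ok("chan-123"));
    assert!(validate_first_frame(&RelayFrame::Data(vec![0x42])).is_err());
    assert!(validate_first_frame(&RelayFrame::Init { channel_id: String::new() }).is_err());
}
```

Putting the channel id only in the first frame keeps every subsequent `Data` frame payload-only, so bulk bytes carry no per-frame routing overhead.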
-
-Both RPCs are defined in `proto/openshell.proto` and ride the same TCP + TLS + HTTP/2 connection from the supervisor. No new TLS handshake, no reverse HTTP CONNECT, no direct gateway-to-pod dial.
-
-### `SupervisorSessionRegistry`
-
-`crates/openshell-server/src/supervisor_session.rs` defines `SupervisorSessionRegistry`, a single instance of which lives on `ServerState.supervisor_sessions`. It holds two maps guarded by `std::sync::Mutex`:
-
-- `sessions: HashMap` -- one entry per connected supervisor. Each `LiveSession` carries a unique `session_id`, the `mpsc::Sender` for the outbound stream, and a connection timestamp.
-- `pending_relays: HashMap` -- one entry per in-flight `open_relay` call awaiting the supervisor's `RelayStream` dial-back. Each `PendingRelay` wraps a `oneshot::Sender` and a creation timestamp.
-
-Core operations:
-
-| Method | Purpose |
-|--------|---------|
-| `register(sandbox_id, session_id, tx)` | Insert a live session; returns the previous session's sender (if any) so the caller can close it. Used by `handle_connect_supervisor` when a supervisor reconnects. |
-| `remove_if_current(sandbox_id, session_id)` | Remove the session only if its `session_id` still matches. Guards against the supersede race where an old session's cleanup task fires after a newer session already registered. |
-| `open_relay(sandbox_id, session_wait_timeout)` | Wait up to `session_wait_timeout` for a live session, allocate a fresh `channel_id` (UUID v4), insert the pending slot, send `RelayOpen { channel_id }` to the supervisor, and return `(channel_id, oneshot::Receiver)`. The receiver resolves once the supervisor's `RelayStream` arrives and `claim_relay` pairs them up. |
-| `claim_relay(channel_id)` | Consume the pending slot, construct a `tokio::io::duplex(64 KiB)` pair, hand the gateway-side half to the waiter via the oneshot, and return the supervisor-side half to `handle_relay_stream`. |
-| `reap_expired_relays()` | Drop pending relays older than 10 s. Called by `spawn_relay_reaper` on a 30 s cadence. |
-
-Session wait uses exponential backoff from 100 ms to 2 s while polling the sessions map. Pending-relay expiry is fixed at `RELAY_PENDING_TIMEOUT = 10 s`.
-
-### `handle_connect_supervisor`
-
-Lifecycle of a supervisor session:
-
-1. Read the first `SupervisorMessage`; require `payload = Hello { sandbox_id, instance_id }` and a non-empty `sandbox_id`.
-2. Allocate a fresh `session_id` (UUID v4) and create an `mpsc::channel::(64)` for the outbound stream.
-3. Call `registry.register(...)`. If it returns a previous sender, log that the previous session was superseded (dropping the previous `tx` closes the old outbound stream).
-4. Send `SessionAccepted { session_id, heartbeat_interval_secs: 15 }`. If the send fails, call `remove_if_current` (so a concurrent reconnect isn't evicted) and return `Internal`.
-5. Spawn a session loop that `select!`s between inbound messages and a 15 s heartbeat timer. Inbound heartbeats are silent; `RelayOpenResult` is logged; `RelayClose` is logged; unknown payloads are logged as warnings.
-6. When the loop exits (inbound EOF, inbound error, or outbound channel closed), `remove_if_current` drops the registration -- unless a newer session has already replaced it.
-
-### `handle_relay_stream`
-
-Lifecycle of one relay call:
-
-1. Read the first inbound `RelayFrame`; require `payload = Init { channel_id }` with a non-empty `channel_id`. Reject anything else with `InvalidArgument`.
-2. Call `registry.claim_relay(channel_id)`. Returns `NotFound` if the channel is unknown or already expired, `DeadlineExceeded` if older than 10 s, or `Internal` if the waiter has dropped the oneshot receiver.
-3. Split the supervisor-side `DuplexStream` into read and write halves and spawn two tasks:
-   - **Supervisor → gateway**: pull `RelayFrame`s from the inbound stream, accept `Data(bytes)`, write to the duplex write-half. On non-data frames, warn and break. Best-effort `shutdown()` on exit so the reader sees EOF.
-   - **Gateway → supervisor**: read up to `RELAY_STREAM_CHUNK_SIZE = 16 KiB` at a time from the duplex read-half and emit `RelayFrame { Data }` messages on an outbound `mpsc::channel(16)`.
-4. Return the outbound receiver as the RPC response stream.
-
-### Connect Flow (SSH Tunnel)
+Sandbox workloads maintain an outbound supervisor session to the gateway. This
+lets the gateway open per-request byte relays without requiring inbound network
+access to the sandbox workload.
 
 ```mermaid
 sequenceDiagram
-    participant Client as SSH client
-    participant GW as Gateway<br/>(/connect/ssh)
-    participant Reg as SupervisorSessionRegistry
-    participant Sup as Sandbox Supervisor
-    participant Daemon as In-sandbox sshd<br/>(Unix socket)
-
-    Client->>GW: CONNECT /connect/ssh<br/>x-sandbox-id, x-sandbox-token
-    GW->>GW: validate session + sandbox Ready
-    GW->>Reg: open_relay(sandbox_id, 30s)
-    Reg->>Sup: GatewayMessage::RelayOpen { channel_id }
-    Note over Reg: waits for RelayStream on channel_id
-    Sup->>Daemon: connect to Unix socket
-    Sup->>GW: RelayStream(RelayFrame::Init { channel_id })
-    GW->>Reg: claim_relay(channel_id)
-    Reg-->>Sup: supervisor-side DuplexStream
-    Reg-->>GW: gateway-side DuplexStream
-    GW-->>Client: 200 OK + HTTP upgrade
-    Client<<->>GW: copy_bidirectional(upgraded, duplex)
-    GW<<->>Sup: RelayFrame::Data in both directions
-    Sup<<->>Daemon: raw SSH bytes
+    participant CLI
+    participant GW as Gateway
+    participant SUP as Sandbox supervisor
+    participant SSH as Sandbox SSH socket
+
+    SUP->>GW: ConnectSupervisor stream
+    CLI->>GW: connect / exec / sync request
+    GW->>SUP: RelayOpen(channel)
+    SUP->>GW: RelayStream(channel)
+    SUP->>SSH: Bridge bytes to Unix socket
+    CLI->>GW: Client bytes
+    GW-->>CLI: Client bytes
+    GW->>SUP: Relay bytes
+    SUP-->>GW: Relay bytes
 ```
-
-Timeouts on the tunnel path:
-
-- `open_relay` session wait: **30 s**. A first `sandbox connect` immediately after `sandbox create` must cover the supervisor's initial TLS + gRPC handshake on a cold pod.
-- `relay_rx` delivery timeout: 10 s. Covers the round-trip from the `RelayOpen` message to the supervisor's `RelayStream` dial-back.
-
-Per-token and per-sandbox concurrent-tunnel limits (3 and 20 respectively) are still enforced before the upgrade.
-
-### Exec Flow
-
-`ExecSandbox` reuses the same machinery from `grpc/sandbox.rs`:
-
-1. Validate the request (`sandbox_id`, `command`, env-key format, other field rules), fetch the sandbox, require `Ready` phase.
-2. `state.supervisor_sessions.open_relay(&sandbox.id, 15s)` -- shorter timeout than SSH connect, because exec is typically called mid-lifetime after the supervisor session is already established.
-3. Wait up to 10 s for the relay `DuplexStream`.
-4. `stream_exec_over_relay`: bind an ephemeral localhost TCP listener, bridge that single-use TCP socket to the relay duplex, and drive a `russh` client through the local port. The `russh` session opens a channel, executes the shell-escaped command, and streams `ExecSandboxStdout`/`ExecSandboxStderr` chunks to the caller. On completion, send `ExecSandboxExit { exit_code }`.
-5. On timeout (if `timeout_seconds > 0`), emit exit code 124 (matching `timeout(1)`).
-
-The supervisor-side SSH daemon is an SSH server bound to a Unix domain socket inside the sandbox's filesystem. Filesystem permissions on that socket are the only access-control boundary between the supervisor bridge and the daemon; all higher-level authorization is enforced at `CreateSshSession` / `ExecSandbox` in the gateway.
-
-### Regression Coverage
-
-`crates/openshell-server/tests/supervisor_relay_integration.rs` is the regression guard for the `RelayStream` wire protocol. It stands up an in-process tonic server that mounts the real `handle_relay_stream` behind `MultiplexedService`, connects a mock supervisor client over a real tonic `Channel`, and exercises the registry's `open_relay` → `claim_relay` pairing end to end with `tokio::io::duplex` bridging. The five test cases cover:
-
-- Round-trip bytes from gateway to supervisor and back (echo loop).
-- Clean close when the gateway drops the relay.
-- EOF propagation when the supervisor closes its outbound sender.
-- `Unavailable` when `open_relay` is called without a registered session.
-- Concurrent `RelayStream` calls multiplexed independently on the same connection.
-
-These complement the unit tests inside `supervisor_session.rs` (registry-only behavior) and the live cluster tests (full CLI → gateway → sandbox path).
-
-## gRPC Services
-
-### OpenShell Service
-
-Defined in `proto/openshell.proto`, implemented in `crates/openshell-server/src/grpc/mod.rs` as `OpenShellService`. Per-concern handlers live in `crates/openshell-server/src/grpc/` submodules.
-
-#### Sandbox Management
-
-| RPC | Description | Key behavior |
-|-----|-------------|--------------|
-| `Health` | Returns service status and version | Always returns `HEALTHY` with `CARGO_PKG_VERSION` |
-| `CreateSandbox` | Create a new sandbox | Validates spec and policy, validates provider names exist (fail-fast), persists to store, creates the compute-driver sandbox. On driver failure, rolls back the store record and index entry. |
-| `GetSandbox` | Fetch sandbox by name | Looks up by name via `store.get_message_by_name()` |
-| `ListSandboxes` | List sandboxes | Paginated (default limit 100), decodes protobuf payloads from store records |
-| `DeleteSandbox` | Delete sandbox by name | Sets phase to `Deleting`, persists, notifies watch bus, then deletes via the compute driver. Cleans up store if the sandbox was already gone. |
-| `WatchSandbox` | Stream sandbox updates | Server-streaming RPC. See [Watch Sandbox Stream](#watch-sandbox-stream) below. |
-| `ExecSandbox` | Execute command in sandbox | Server-streaming RPC; data plane runs through `SupervisorSessionRegistry::open_relay`. See [Exec Flow](#exec-flow). |
-
-#### Supervisor Session
-
-| RPC | Description |
-|-----|-------------|
-| `ConnectSupervisor` | Persistent bidi stream from the sandbox supervisor. Carries hello/accept/heartbeat/`RelayOpen`/`RelayClose`. One session per sandbox; reconnects supersede. |
-| `RelayStream` | Per-invocation bidi byte bridge. Supervisor initiates after receiving `RelayOpen`; first frame is `RelayInit { channel_id }`; subsequent frames carry raw bytes. |
-
-Neither RPC is called by end users. They are the private control/data plane between the gateway and each sandbox supervisor.
-
-#### SSH Session Management
-
-| RPC | Description |
-|-----|-------------|
-| `CreateSshSession` | Creates a session token for a `Ready` sandbox. Persists an `SshSession` record and returns gateway connection details (host, port, scheme, connect path). The resulting token is presented on the `/connect/ssh` HTTP CONNECT request. |
-| `RevokeSshSession` | Marks a session as revoked by setting `session.revoked = true` in the store. |
-
-#### Provider Management
-
-Full CRUD for `Provider` objects, which store typed credentials (e.g., API keys for Claude, GitLab tokens).
-
-| RPC | Description |
-|-----|-------------|
-| `CreateProvider` | Creates a provider. Requires `type` field; auto-generates a 6-char name if not provided. Rejects duplicates by name. |
-| `GetProvider` | Fetches a provider by name. |
-| `ListProviders` | Paginated list (default limit 100). |
-| `UpdateProvider` | Updates an existing provider by name. Preserves the stored `id` and `name`; replaces `type`, `credentials`, and `config`. |
-| `DeleteProvider` | Deletes a provider by name. Returns `deleted: true/false`. |
-
-#### Policy, Settings, and Provider Environment Delivery
-
-These RPCs are called by sandbox supervisors at startup and during runtime polling.
-
-| RPC | Description |
-|-----|-------------|
-| `GetSandboxConfig` | Returns effective sandbox config looked up by sandbox ID: policy payload, policy metadata (version, hash, source, `global_policy_version`), merged effective settings, and a `config_revision` fingerprint for change detection. Two-tier resolution: registered keys start unset, sandbox values overlay, global values override. The reserved `policy` key in global settings can override the sandbox's own policy. When a global policy is active, `policy_source` is `GLOBAL` and `global_policy_version` carries the active revision number. See [Gateway Settings Channel](gateway-settings.md). |
-| `GetGatewayConfig` | Returns gateway-global settings only (excluding the reserved `policy` key). Returns registered keys with empty values when unconfigured, and a monotonic `settings_revision`. |
-| `GetSandboxProviderEnvironment` | Resolves provider credentials into environment variables for a sandbox. Iterates the sandbox's `spec.providers` list, fetches each `Provider`, and collects credential key-value pairs. First provider wins on duplicate keys. Skips credential keys that do not match `^[A-Za-z_][A-Za-z0-9_]*$`. |
-
-#### Policy Recommendation (Network Rules)
-
-These RPCs support the sandbox-initiated policy recommendation pipeline. The sandbox generates proposals via its mechanistic mapper and submits them; the gateway validates, persists, and manages the approval workflow. See [policy-advisor.md](policy-advisor.md) for the full pipeline design.
-
-| RPC | Description |
-|-----|-------------|
-| `SubmitPolicyAnalysis` | Receives pre-formed `PolicyChunk` proposals from a sandbox. Validates each chunk, persists via upsert on `(sandbox_id, host, port, binary)` dedup key, notifies watch bus. |
-| `GetDraftPolicy` | Returns all draft chunks for a sandbox with current draft version. |
-| `ApproveDraftChunk` | Approves a pending or rejected chunk. Merges the proposed rule into the active policy (appends binary to existing rule or inserts new rule). **Blocked when a global policy is active** -- returns `FailedPrecondition`. |
-| `RejectDraftChunk` | Rejects a pending chunk or revokes an approved chunk. If revoking, removes the binary from the active policy rule. Rejection of `pending` chunks is always allowed. **Revoking approved chunks is blocked when a global policy is active** -- returns `FailedPrecondition`. |
-| `ApproveAllDraftChunks` | Bulk approves all pending chunks for a sandbox. **Blocked when a global policy is active** -- returns `FailedPrecondition`. |
-| `EditDraftChunk` | Updates the proposed rule on a pending chunk. |
-| `GetDraftHistory` | Returns all chunks (including rejected) for audit trail. |
-
-### Inference Service
-
-Defined in `proto/inference.proto`, implemented in `crates/openshell-server/src/inference.rs` as `InferenceService`.
-
-The gateway acts as the control plane for inference configuration. It stores a single managed gateway inference route (named `inference.local`) and delivers resolved route bundles to sandbox pods. The gateway does not execute inference requests -- sandboxes connect directly to inference backends using the credentials and endpoints provided in the bundle.
-
-#### Gateway Inference Configuration
-
-The gateway manages a single gateway-wide inference route that maps to a provider record. When set, the route stores only a `provider_name` and `model_id` reference. At bundle resolution time, the gateway looks up the referenced provider and derives the endpoint URL, API key, protocols, and provider type from it. This late-binding design means provider credential rotations are automatically reflected in the next bundle fetch without updating the route itself.
-
-| RPC | Description |
-|-----|-------------|
-| `SetClusterInference` | Configures the gateway inference route. Validates `provider_name` and `model_id` are non-empty, verifies the named provider exists and has a supported type for inference (openai, anthropic, nvidia), validates the provider has a usable API key, then upserts the `inference.local` route record. Increments a monotonic `version` on each update. Returns the configured `provider_name`, `model_id`, and `version`. |
-| `GetClusterInference` | Returns the current gateway inference configuration (`provider_name`, `model_id`, `version`). Returns `NotFound` if no gateway inference is configured, or `FailedPrecondition` if the stored route has empty provider/model metadata. |
-| `GetInferenceBundle` | Returns the resolved inference route bundle for sandbox consumption. See [Route Bundle Delivery](#route-bundle-delivery) below. |
-
-#### Route Bundle Delivery
-
-The `GetInferenceBundle` RPC resolves the managed gateway route into a `GetInferenceBundleResponse` containing fully materialized route data that sandboxes can use directly.
-
-The trait method delegates to `resolve_inference_bundle(store)` (`crates/openshell-server/src/inference.rs`), which takes `&Store` instead of `&self`. This extraction decouples bundle resolution from `ServerState`, enabling direct unit testing against an in-memory SQLite store without constructing a full server.
-
-The `GetInferenceBundleResponse` includes:
-
-- **`routes`** -- a list of `ResolvedRoute` messages containing base URL, model ID, API key, protocols, and provider type. Currently contains zero or one routes (the managed gateway route).
-- **`revision`** -- a hex-encoded hash computed from route contents. Sandboxes compare this value to detect when their route set has changed.
-- **`generated_at_ms`** -- epoch milliseconds when the bundle was assembled.
-
-#### Provider-Based Route Resolution
-
-Managed route resolution in `resolve_managed_cluster_route()` (`crates/openshell-server/src/inference.rs`):
-
-1. Load the managed route by name (`inference.local`).
-2. Skip (return `None`) if the route does not exist, has no spec, or is disabled.
-3. Validate that `provider_name` and `model_id` are non-empty.
-4. Fetch the referenced provider record from the store.
-5. Resolve the provider into a `ResolvedProviderRoute` via `resolve_provider_route()`:
-   - Look up the `InferenceProviderProfile` for the provider's type via `openshell_core::inference::profile_for()`. Supported types: `openai`, `anthropic`, `nvidia`.
-   - Search the provider's credentials map for an API key using the profile's preferred key name (e.g., `OPENAI_API_KEY`), falling back to the first non-empty credential in sorted key order.
-   - Resolve the base URL from the provider's config map using the profile-specific key (e.g., `OPENAI_BASE_URL`), falling back to the profile's default URL.
- - Derive protocols from the profile (e.g., `openai_chat_completions`, `openai_completions`, `openai_responses`, `model_discovery` for OpenAI-compatible providers). -6. Return a `ResolvedRoute` with the fully materialized endpoint, credentials, and protocols. - -The `ClusterInferenceConfig` stored in the database contains only `provider_name` and `model_id`. All other fields (endpoint, credentials, protocols, auth style) are resolved from the provider record at bundle generation time via `build_cluster_inference_config()`. - -## HTTP Endpoints - -The HTTP router (`crates/openshell-server/src/http.rs`) merges two sub-routers: - -### Health Endpoints - -| Path | Method | Response | -|------|--------|----------| -| `/health` | GET | `200 OK` (empty body) | -| `/healthz` | GET | `200 OK` (empty body) -- Kubernetes liveness probe | -| `/readyz` | GET | `200 OK` with JSON `{"status": "healthy", "version": ""}` -- Kubernetes readiness probe | - -### SSH Tunnel Endpoint - -| Path | Method | Response | -|------|--------|----------| -| `/connect/ssh` | CONNECT | Upgrades the connection to a bidirectional byte bridge tunneled through `SupervisorSessionRegistry::open_relay` | - -See [Connect Flow (SSH Tunnel)](#connect-flow-ssh-tunnel) for details. - -### Cloudflare Endpoints - -| Path | Method | Response | -|------|--------|----------| -| `/auth/connect` | GET | Browser login relay page that reads `CF_Authorization` and POSTs it back to the CLI localhost callback | -| `/_ws_tunnel` | GET upgrade | WebSocket tunnel that bridges bytes directly into `MultiplexedService` over an in-memory duplex stream | - -## Watch Sandbox Stream - -The `WatchSandbox` RPC (`crates/openshell-server/src/grpc/`) provides a multiplexed server-streaming response that can include sandbox status snapshots, gateway log lines, and platform events. 
-
-### Request Options
-
-The `WatchSandboxRequest` controls what the stream includes:
-
-- `follow_status` -- subscribe to `SandboxWatchBus` notifications and re-read the sandbox record on each change.
-- `follow_logs` -- subscribe to `TracingLogBus` for gateway log lines correlated by `sandbox_id`.
-- `follow_events` -- subscribe to `PlatformEventBus` for compute-driver platform events correlated to the sandbox.
-- `log_tail_lines` -- replay the last N log lines before following (default 200).
-- `stop_on_terminal` -- end the stream when the sandbox reaches the `Ready` phase. Note: `Error` phase does not stop the stream because it may be transient (e.g., `ReconcilerError`).
-
-### Stream Protocol
-
-1. Subscribe to all requested buses **before** reading the initial snapshot (prevents missed notifications).
-2. Send the current sandbox record as the first event.
-3. If `stop_on_terminal` is set and the sandbox is already `Ready`, end the stream immediately.
-4. Replay tail logs if `follow_logs` is enabled.
-5. Enter a `tokio::select!` loop listening on up to three broadcast receivers:
-   - **Status updates**: re-read the sandbox from the store, send the snapshot, check for terminal phase.
-   - **Log lines**: forward `SandboxStreamEvent::Log` messages.
-   - **Platform events**: forward `SandboxStreamEvent::Event` messages.
-
-### Event Bus Architecture
-
-```mermaid
-graph LR
-    CW["ComputeRuntime watcher"]
-    PE["Platform events<br/>(driver watch)"]
-    TL["SandboxLogLayer<br/>(tracing layer)"]
-
-    WB["SandboxWatchBus<br/>(broadcast per ID)"]
-    LB["TracingLogBus<br/>(broadcast per ID + tail buffer)"]
-    PB["PlatformEventBus<br/>(broadcast per ID)"]
-
-    WS["WatchSandbox stream"]
-
-    CW -->|"notify(id)"| WB
-    TL -->|"publish(id, log_event)"| LB
-    PE -->|"publish(id, platform_event)"| PB
-
-    WB -->|"subscribe(id)"| WS
-    LB -->|"subscribe(id)"| WS
-    PB -->|"subscribe(id)"| WS
-```
-
-All buses use `tokio::sync::broadcast` channels keyed by sandbox ID. Buffer sizes:
-
-- `SandboxWatchBus`: 128 (signals only, no payload -- just `()`)
-- `TracingLogBus`: 1024 (full `SandboxStreamEvent` payloads)
-- `PlatformEventBus`: 1024 (full `SandboxStreamEvent` payloads)
-
-Broadcast lag is translated to `Status::resource_exhausted` via `broadcast_to_status()`.
-
-**Cleanup:** Each bus exposes a `remove(sandbox_id)` method that drops the broadcast sender (closing active receivers with `RecvError::Closed`) and frees internal map entries. Cleanup is wired into the compute watch reconciler, the periodic snapshot reconcile for sandboxes missing from the driver, and the `delete_sandbox` gRPC handler to prevent unbounded memory growth from accumulated entries for deleted sandboxes.
-
-**Validation:** `WatchSandbox` validates that the sandbox exists before subscribing to any bus, preventing entries from being created for non-existent IDs. `PushSandboxLogs` validates sandbox existence once on the first batch of the stream.
-
-## Persistence Layer
-
-### Store Architecture
-
-The `Store` enum (`crates/openshell-server/src/persistence/mod.rs`) dispatches to either `SqliteStore` or `PostgresStore` based on the database URL prefix:
-
-- `sqlite:*` -- uses `sqlx::SqlitePool` (1 connection for in-memory, 5 for file-based).
-- `postgres://` or `postgresql://` -- uses `sqlx::PgPool` (max 10 connections).
-
-Both backends auto-run migrations on connect from `crates/openshell-server/migrations/{sqlite,postgres}/`.
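The URL-prefix dispatch described above can be sketched as follows (illustrative Python, not the actual Rust `Store` enum; raising on an unrecognized prefix is an assumption):

```python
def select_backend(database_url: str) -> str:
    """Pick a store backend from the database URL prefix.

    Mirrors the dispatch rule above: `sqlite:*` goes to the SQLite
    pool, `postgres://` / `postgresql://` to the Postgres pool.
    """
    if database_url.startswith("sqlite:"):
        return "sqlite"
    if database_url.startswith(("postgres://", "postgresql://")):
        return "postgres"
    # Hypothetical error path for URLs neither backend recognizes.
    raise ValueError(f"unsupported database URL: {database_url}")
```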
- -### Schema - -A single `objects` table stores all object types: - -```sql -CREATE TABLE objects ( - object_type TEXT NOT NULL, - id TEXT NOT NULL, - name TEXT NOT NULL, - payload BLOB NOT NULL, - created_at_ms INTEGER NOT NULL, - updated_at_ms INTEGER NOT NULL, - PRIMARY KEY (id), - UNIQUE (object_type, name) -); -``` - -Objects are identified by `(object_type, id)` with a unique constraint on `(object_type, name)`. The `payload` column stores protobuf-encoded bytes. - -### Object Types - -| Object type string | Proto message / format | Traits implemented | Notes | -|--------------------|------------------------|-------------------|-------| -| `"sandbox"` | `Sandbox` | `ObjectType`, `ObjectId`, `ObjectName` | | -| `"provider"` | `Provider` | `ObjectType`, `ObjectId`, `ObjectName` | | -| `"ssh_session"` | `SshSession` | `ObjectType`, `ObjectId`, `ObjectName` | | -| `"inference_route"` | `InferenceRoute` | `ObjectType`, `ObjectId`, `ObjectName` | | -| `"gateway_settings"` | JSON `StoredSettings` | Generic `put`/`get` | Singleton, id=`"global"`. Contains the reserved `policy` key for global policy delivery. | -| `"sandbox_settings"` | JSON `StoredSettings` | Generic `put`/`get` | Per-sandbox, id=`"settings:{sandbox_uuid}"` | - -The `sandbox_policies` table stores versioned policy revisions for both sandbox-scoped and global policies. Global revisions use the sentinel `sandbox_id = "__global__"`. See [Gateway Settings Channel](gateway-settings.md#storage-model) for schema details. - -### Generic Protobuf Codec - -The `Store` provides typed helpers that leverage trait bounds: - -- `put_message(&self, msg: &T)` -- encodes to protobuf bytes and upserts. -- `get_message(&self, id: &str)` -- fetches by ID, decodes protobuf. -- `get_message_by_name(&self, name: &str)` -- fetches by name, decodes protobuf. - -The `generate_name()` function produces random 6-character lowercase alphabetic strings for auto-naming objects. 
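The auto-naming scheme just described can be sketched as (illustrative Python, not the Rust implementation):

```python
import random
import string

def generate_name(length: int = 6) -> str:
    # Random lowercase alphabetic string for auto-naming objects,
    # per the description above (6 characters by default).
    return "".join(random.choice(string.ascii_lowercase) for _ in range(length))
```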
- -### Deployment Storage - -The gateway runs as a Kubernetes **StatefulSet** with a `volumeClaimTemplate` that provisions a 1Gi `ReadWriteOnce` PersistentVolumeClaim mounted at `/var/openshell`. The cluster's default StorageClass supplies the volume unless an operator customizes the chart. The SQLite database file at `/var/openshell/openshell.db` survives pod restarts and rescheduling. - -The Helm chart template is at `deploy/helm/openshell/templates/statefulset.yaml`. - -### CRUD Semantics - -- **Put**: Performs an upsert (`INSERT ... ON CONFLICT (id) DO UPDATE ...`). Both `created_at_ms` and `updated_at_ms` are set to the current timestamp in the `VALUES` clause, but the `ON CONFLICT` update only writes `payload` and `updated_at_ms` -- so `created_at_ms` is preserved after the initial insert. -- **Get / Delete**: Operate by primary key (`id`), filtered by `object_type`. -- **List**: Pages by `limit` + `offset` with deterministic ordering: `ORDER BY created_at_ms ASC, name ASC`. The secondary sort on `name` prevents unstable ordering when rows share the same millisecond timestamp. - -## Compute Driver Integration - -### Kubernetes Driver - -`KubernetesComputeDriver` (`crates/openshell-driver-kubernetes/src/driver.rs`) manages `agents.x-k8s.io/v1alpha1/Sandbox` CRDs behind the gateway's compute interface. The gateway binds to that driver through `ComputeDriverService` (`crates/openshell-driver-kubernetes/src/grpc.rs`) in-process, so the same `openshell.compute.v1.ComputeDriver` request and response types are exercised whether the driver is invoked locally or served over gRPC. - -- **Get**: `GetSandbox` looks up a sandbox CRD by name and returns a driver-native platform observation (`openshell.compute.v1.DriverSandbox`) with raw status and condition data from the object. -- **List**: `ListSandboxes` enumerates sandbox CRDs and returns driver-native platform observations for each, sorted by name for stable results. 
-- **Create**: Translates an internal `openshell.compute.v1.DriverSandbox` message into a Kubernetes `DynamicObject` with labels (`openshell.ai/sandbox-id`, `openshell.ai/managed-by: openshell`) and a spec that includes the pod template, environment variables, and gateway-required env vars (`OPENSHELL_SANDBOX_ID`, `OPENSHELL_ENDPOINT`, `OPENSHELL_SSH_SOCKET_PATH`, etc.). `OPENSHELL_SSH_SOCKET_PATH` is set from the driver's `--sandbox-ssh-socket-path` flag (default `/run/openshell/ssh.sock`) so the in-sandbox SSH daemon binds a Unix socket rather than a TCP port. When callers do not provide custom `volumeClaimTemplates`, the driver injects a default `workspace` PVC and mounts it at `/sandbox` so the default sandbox home/workdir survives pod rescheduling. -- **Delete**: Calls the Kubernetes API to delete the CRD by name. Returns `false` if already gone (404). -- **Stop**: `proto/compute_driver.proto` reserves `StopSandbox` for a non-destructive lifecycle transition. Resume is intentionally not a dedicated compute-driver RPC; the gateway auto-resumes a stopped sandbox when a client connects or executes into it. - -The gateway reaches the sandbox exclusively through the supervisor-initiated `ConnectSupervisor` session, so the driver never returns sandbox network endpoints. - -### Docker Driver - -The Docker driver (`crates/openshell-driver-docker/src/lib.rs`) is an in-process compute backend for local standalone gateways. It creates one Docker container per sandbox, labels each container with `openshell.ai/managed-by=openshell`, `openshell.ai/sandbox-id`, `openshell.ai/sandbox-name`, and `openshell.ai/sandbox-namespace`, and bind-mounts a Linux `openshell-sandbox` supervisor binary into the container. - -- **Create**: Pulls or validates the sandbox image according to `sandbox_image_pull_policy`, creates a labeled container, mounts the supervisor binary and optional TLS material, and starts the container with the supervisor as entrypoint. 
-- **Bridge networking**: Ensures a local Docker bridge network exists (`openshell-docker` by default) and starts every sandbox container on that network instead of using `network_mode=host`.
-- **Gateway callback routing**: On native Linux Docker, injects `host.openshell.internal` with the bridge gateway IP and reports that bridge gateway IP plus the normal gateway port to `run_server()` as an extra listener. If the primary listener already binds the wildcard address for that port, the extra address is covered and is not bound a second time. On Docker Desktop, the bridge gateway IP belongs to Docker Desktop's VM rather than the macOS/Windows host, so the driver maps `host.openshell.internal` to Docker's `host-gateway` alias and does not request an extra listener. `OPENSHELL_ENDPOINT` inside Docker sandboxes uses the configured scheme and points at `host.openshell.internal:<gateway port>` in both cases.
-- **Environment ownership**: Merges template and spec environment first, then overwrites driver-owned supervisor variables, including `PATH`, `OPENSHELL_ENDPOINT`, `OPENSHELL_SANDBOX_ID`, `OPENSHELL_SSH_SOCKET_PATH`, and `OPENSHELL_SANDBOX_COMMAND`. This keeps privileged supervisor setup from resolving helper binaries through a user-controlled search path.
-- **List/Get/Watch**: Reads labeled containers in the configured sandbox namespace and derives driver-native sandbox status from Docker state plus supervisor relay readiness.
-- **Stop**: Stops the matching labeled container without deleting it.
-- **Delete**: Force-removes the matching labeled container.
-- **Gateway shutdown**: On SIGINT or SIGTERM, `run_server()` leaves the accept loop and calls the Docker shutdown cleanup hook. The hook stops all running, restarting, or paused OpenShell-managed containers in the configured sandbox namespace so local sandboxes do not keep running after the gateway exits.
-- **Gateway startup resume**: Before the watch and reconcile loops spawn, `ComputeRuntime::resume_persisted_sandboxes()` walks every sandbox record in the store. For each sandbox whose phase is `Provisioning`, `Ready`, or `Unknown`, it asks the Docker driver to start the labeled container if it is in the `exited` or `created` state (`StartupResume::resume_sandbox`). Containers in `running` or `restarting` are left alone; `paused`, `dead`, and `removing` are skipped. If the matching container has disappeared, the sandbox is moved to phase `Error` with reason `BackendResourceMissing`; if the start call fails, the sandbox moves to `Error` with reason `ResumeFailed`. This is what makes sandboxes survive a graceful gateway restart end-to-end: shutdown stops them, the next startup resumes them, and the store remains the source of truth across the cycle. Drivers that do not need this hook (Kubernetes, Podman, VM) leave `startup_resume = None`, which makes the resume sweep a no-op. -- **Handshake secret**: The Docker driver does not inject `OPENSHELL_SSH_HANDSHAKE_SECRET` or `OPENSHELL_SSH_HANDSHAKE_SKEW_SECS` into containers. Supervisor relay auth relies on the gateway connection rather than a Docker-visible container env secret. - -### VM Driver - -`VmDriver` (`crates/openshell-driver-vm/src/driver.rs`) is served by the standalone `openshell-driver-vm` process. The gateway spawns that binary on demand and talks to it over the internal `openshell.compute.v1.ComputeDriver` gRPC contract via a Unix domain socket. - -- **Create**: The VM driver process exports the selected sandbox image from the local Docker daemon, rewrites it into a sandbox-specific guest rootfs, injects an explicitly configured guest mTLS bundle when the gateway callback endpoint is `https://`, then re-execs itself in a hidden helper mode that loads libkrun directly and boots the supervisor. 
-- **Networking**: The helper starts an embedded `gvproxy`, wires it into libkrun as virtio-net, and gives the guest outbound connectivity. No inbound TCP listener is needed — the supervisor reaches the gateway over its outbound `ConnectSupervisor` stream. -- **Gateway callback**: The guest init script configures `eth0` for gvproxy networking, seeds `/etc/hosts` so `host.openshell.internal` resolves to the gvproxy gateway IP (`192.168.127.1`), preserves gvproxy's legacy `host.containers.internal` / `host.docker.internal` DNS answers, prefers the configured `OPENSHELL_GRPC_ENDPOINT`, and falls back to those aliases or the raw gateway IP when local hostname resolution is unavailable on macOS. -- **Guest boot**: The sandbox guest runs a minimal init script that starts `openshell-sandbox` directly as PID 1 inside the VM. -- **Watch stream**: Emits provisioning, ready, error, deleting, deleted, and platform-event updates so the gateway store remains the durable source of truth. - -### Compute Runtime - -`ComputeRuntime` consumes the driver-native watch stream from `WatchSandboxes`, translates the snapshots into public `openshell.v1.Sandbox` resources, derives the public phase, and applies the results to the store. In parallel, it periodically calls `ListSandboxes` and reconciles the store to the full driver snapshot; public `GetSandbox` and `ListSandboxes` handlers remain store-backed but are refreshed from the driver on a timer. 
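The periodic list-and-reconcile step can be sketched as follows (illustrative Python; the flat ID-to-phase maps and the `"Missing"` marker are hypothetical simplifications, not the real data model):

```python
def reconcile(store: dict, driver_snapshot: dict) -> dict:
    """Reconcile store records to a full driver snapshot (sketch).

    Both arguments map sandbox ID -> phase string. Driver observations
    overwrite the store; records the driver no longer reports are
    flagged so the reconciler can handle them.
    """
    next_store = dict(store)
    for sandbox_id, phase in driver_snapshot.items():
        next_store[sandbox_id] = phase          # apply driver observation
    for sandbox_id in store:
        if sandbox_id not in driver_snapshot:
            next_store[sandbox_id] = "Missing"  # hypothetical marker
    return next_store
```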
-
-### Gateway Phase Derivation
-
-`ComputeRuntime::derive_phase()` (`crates/openshell-server/src/compute/mod.rs`) maps driver-native compute status to the public `SandboxPhase` exposed by `proto/openshell.proto`:
-
-| Condition | Phase |
-|-----------|-------|
-| Driver status `deleting=true` | `Deleting` |
-| Ready condition `status=True` | `Ready` |
-| Ready condition `status=False`, terminal reason | `Error` |
-| Ready condition `status=False`, transient reason | `Provisioning` |
-| No conditions or no status | `Provisioning` (if status exists) / `Unknown` (if no status) |
-
-**Transient reasons** (will retry, stay in `Provisioning`): `ReconcilerError`, `DependenciesNotReady`.
-All other `Ready=False` reasons are treated as terminal failures (`Error` phase).
-
-### Kubernetes Event Correlation
-
-The Kubernetes driver also watches namespace-scoped Kubernetes `Event` objects and correlates them to sandbox IDs before emitting them as compute-driver platform events:
-
-- Events involving `kind: Sandbox` are correlated by sandbox name.
-- Events involving `kind: Pod` are correlated by agent pod name.
-- Other event kinds are ignored.
-
-Matched events are published to the `PlatformEventBus` as `SandboxStreamEvent::Event` payloads.
-
-## Sandbox Index
-
-`SandboxIndex` (`crates/openshell-server/src/sandbox_index.rs`) maintains two in-memory maps protected by an `RwLock`:
-
-- `sandbox_name_to_id: HashMap<String, String>`
-- `agent_pod_to_id: HashMap<String, String>`
-
-Updated by the compute watcher on every driver observation and by gRPC handlers during sandbox creation. Used by the compute-driver event correlator to map platform events back to sandbox IDs.
-
-## Observability
-
-Supervisor session telemetry is currently emitted as plain `tracing` events from `supervisor_session.rs` (accepted, superseded, ended, relay opened/claimed).
OCSF structured logging on the gateway side is a tracked follow-up -- the `openshell-ocsf` crate needs a `GatewayContext` equivalent to the sandbox's `SandboxContext` before events like `network.activity` or `app.lifecycle` can be emitted here. Sandbox-side OCSF already covers SSH authentication, network decisions, and supervisor lifecycle. - -## Error Handling - -- **gRPC errors**: All gRPC handlers return `tonic::Status` with appropriate codes: - - `InvalidArgument` for missing/malformed fields, including a non-`Init` first frame on `RelayStream`. - - `NotFound` for nonexistent objects, including unknown or expired relay channels on `claim_relay`. - - `AlreadyExists` for duplicate creation. - - `FailedPrecondition` for state violations (e.g., exec on non-Ready sandbox, missing provider). - - `Unavailable` when the supervisor session for a sandbox is not connected within `open_relay`'s wait window, or when the supervisor's outbound channel has closed between lookup and send. - - `DeadlineExceeded` when a pending relay slot is claimed past `RELAY_PENDING_TIMEOUT`, or when `relay_rx` fails to deliver in time. - - `Internal` for store/decode/driver failures and for the `claim_relay` case where the waiter has dropped the oneshot receiver. - - `PermissionDenied` for policy violations. - - `ResourceExhausted` for broadcast lag (missed messages). - - `Cancelled` for closed broadcast channels. - -- **HTTP errors**: The SSH tunnel handler returns HTTP status codes directly (`401`, `404`, `405`, `412`, `429`, `500`, `502`). `502` indicates the supervisor relay could not be opened; `429` indicates a per-token or per-sandbox concurrent-tunnel limit. - -- **Connection errors**: Logged at `error` level but do not crash the gateway. TLS handshake failures and individual connection errors are caught and logged per-connection. - -- **Background task errors**: The compute watcher and relay reaper log warnings for individual processing failures but continue running. 
If the watcher stream ends, it logs a warning and the task exits (no automatic restart). +The same relay pattern backs interactive SSH, command execution, and file sync. +The gateway tracks live sessions in memory and persists session records so +tokens can expire or be revoked. -## Cross-References +## Operational Constraints -- [Sandbox Architecture](sandbox.md) -- sandbox-side policy enforcement, supervisor, and the local SSH daemon on the other end of the relay -- [Gateway Settings Channel](gateway-settings.md) -- runtime settings channel, two-tier resolution, CLI/TUI commands -- [Inference Routing](inference-routing.md) -- end-to-end inference interception flow, sandbox-side proxy logic, and route resolution -- [Container Management](build-containers.md) -- how sandbox container images are built and configured -- [Sandbox Connect](sandbox-connect.md) -- client-side SSH connection flow -- [Providers](sandbox-providers.md) -- provider credential management and injection +- Gateway TLS and client certificate distribution are deployment concerns owned + by the operator or packaging layer. +- Compute runtimes own the mechanics of starting workloads and injecting + callback configuration. +- Gateway restarts recover persisted objects from storage, but live relay + streams must be re-established by supervisors. +- User-facing behavior changes must update published docs in `docs/`; this file + should only record stable architecture. diff --git a/architecture/inference-routing.md b/architecture/inference-routing.md deleted file mode 100644 index 01b6a853a..000000000 --- a/architecture/inference-routing.md +++ /dev/null @@ -1,359 +0,0 @@ -# Inference Routing - -Inference routing gives sandboxed agents access to LLM APIs through a single, explicit endpoint: `inference.local`. There is no implicit catch-all interception for arbitrary hosts. 
Requests are routed only when the process targets `inference.local` via HTTPS and the request matches a supported inference API pattern. - -All inference execution happens locally inside the sandbox via the `openshell-router` crate. The gateway is control-plane only: it stores configuration and delivers resolved route bundles to sandboxes over gRPC. - -## Architecture Overview - -```mermaid -sequenceDiagram - participant Agent as Agent Process - participant Proxy as Sandbox Proxy - participant Router as openshell-router - participant Gateway as Gateway (gRPC) - participant Backend as Inference Backend - - Note over Gateway,Router: Control plane (startup + periodic refresh) - Gateway->>Router: GetInferenceBundle (routes, credentials) - - Note over Agent,Backend: Data plane (per-request) - Agent->>Proxy: CONNECT inference.local:443 - Proxy->>Proxy: TLS terminate (MITM) - Proxy->>Proxy: Parse HTTP, detect pattern - Proxy->>Router: proxy_with_candidates() - Router->>Router: Select route by protocol - Router->>Router: Rewrite auth + model - Router->>Backend: HTTPS request - Backend->>Router: Response headers + body stream - Router->>Proxy: StreamingProxyResponse (headers first) - Proxy->>Agent: HTTP/1.1 headers (chunked TE) - loop Each body chunk (120s idle timeout per chunk) - Router->>Proxy: chunk via next_chunk() - Proxy->>Agent: Chunked-encoded frame - end - alt Stream truncated (idle timeout, byte limit, upstream error) - Proxy->>Agent: SSE error event (proxy_stream_error) - end - Proxy->>Agent: Chunk terminator (0\r\n\r\n) -``` - -## Provider Profiles - -File: `crates/openshell-core/src/inference.rs` - -`InferenceProviderProfile` is the single source of truth for provider-specific inference knowledge: default endpoint, supported protocols, credential key lookup order, auth header style, and default headers. 
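The profile shape can be sketched as follows (illustrative Python; the field names and the registry dict are assumptions about the Rust struct, and only the OpenAI values are taken from this document):

```python
from dataclasses import dataclass, field

@dataclass
class InferenceProviderProfile:
    # Sketch of the per-provider knowledge described above.
    provider_type: str
    default_base_url: str
    protocols: list = field(default_factory=list)
    credential_key_names: list = field(default_factory=list)
    base_url_config_keys: list = field(default_factory=list)

PROFILES = {
    "openai": InferenceProviderProfile(
        "openai",
        "https://api.openai.com/v1",
        ["openai_chat_completions", "openai_completions",
         "openai_responses", "model_discovery"],
        ["OPENAI_API_KEY"],
        ["OPENAI_BASE_URL"],
    ),
}

def profile_for(provider_type: str):
    # Unknown provider types yield None, mirroring the gateway behavior.
    return PROFILES.get(provider_type)
```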
- -Three profiles are defined: - -| Provider | Default Base URL | Protocols | Auth | Default Headers | -|----------|-----------------|-----------|------|-----------------| -| `openai` | `https://api.openai.com/v1` | `openai_chat_completions`, `openai_completions`, `openai_responses`, `model_discovery` | `Authorization: Bearer` | (none) | -| `anthropic` | `https://api.anthropic.com/v1` | `anthropic_messages`, `model_discovery` | `x-api-key` | `anthropic-version: 2023-06-01` | -| `nvidia` | `https://integrate.api.nvidia.com/v1` | `openai_chat_completions`, `openai_completions`, `openai_responses`, `model_discovery` | `Authorization: Bearer` | (none) | - -Each profile also defines `credential_key_names` (e.g. `["OPENAI_API_KEY"]`) and `base_url_config_keys` (e.g. `["OPENAI_BASE_URL"]`) used by the gateway to resolve credentials and endpoint overrides from provider records. - -Unknown provider types return `None` from `profile_for()` and default to `Bearer` auth with no default headers via `auth_for_provider_type()`. - -## Control Plane (Gateway) - -File: `crates/openshell-server/src/inference.rs` - -The gateway implements the `Inference` gRPC service defined in `proto/inference.proto`. - -### Gateway inference set/get - -`SetClusterInference` takes a `provider_name` and `model_id`. It: - -1. Validates that both fields are non-empty. -2. Fetches the named provider record from the store. -3. Validates the provider by resolving its route (checking that the provider type is supported and has a usable API key). -4. By default, performs a lightweight provider-shaped probe against the resolved upstream endpoint (for example, a tiny chat/messages request with `max_tokens: 1`) to confirm the endpoint is reachable and accepts the expected auth/request shape. `--no-verify` disables this probe when the endpoint is not up yet. -5. Builds a managed route spec that stores only `provider_name` and `model_id`. 
The spec intentionally leaves `base_url`, `api_key`, and `protocols` empty -- these are resolved dynamically at bundle time from the provider record. -6. Upserts the route with name `inference.local`. Version starts at 1 and increments monotonically on each update. - -`GetClusterInference` returns `provider_name`, `model_id`, and `version` for the managed route. Returns `NOT_FOUND` if gateway inference is not configured. - -### Bundle delivery - -`GetInferenceBundle` resolves the managed route at request time: - -1. Loads the `inference.local` route from the store. -2. Looks up the referenced provider record by `provider_name`. -3. Resolves endpoint, API key, protocols, and provider type from the provider record using the `InferenceProviderProfile` registry. -4. If the provider's config map contains a base URL override key (e.g. `OPENAI_BASE_URL`), that value overrides the profile default. -5. Returns a `GetInferenceBundleResponse` with the resolved route(s), a revision hash (DefaultHasher over route fields), and `generated_at_ms` timestamp. - -Because resolution happens at request time, credential rotation and endpoint changes on the provider record take effect on the next bundle fetch without re-running `SetClusterInference`. - -An empty route list is valid and indicates gateway inference is not yet configured. 
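The revision comparison can be sketched as follows (illustrative Python using SHA-256; the real gateway uses Rust's `DefaultHasher`, so the algorithm and exact field set here are assumptions -- only the idea matters: any field change produces a new revision):

```python
import hashlib

def bundle_revision(routes) -> str:
    """Compute a hex content revision over resolved route fields (sketch)."""
    h = hashlib.sha256()
    for route in routes:
        for fld in (route["name"], route["base_url"], route["model_id"],
                    route["api_key"], route["provider_type"]):
            h.update(fld.encode())
            h.update(b"\x00")  # field separator so fields cannot bleed together
    return h.hexdigest()
```

A sandbox can compare the stored revision to the fetched one and skip the cache write when they match.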
- -### Proto definitions - -File: `proto/inference.proto` - -Key messages: - -- `SetClusterInferenceRequest` -- `provider_name` + `model_id` + `timeout_secs` + optional `no_verify` override, with verification enabled by default -- `SetClusterInferenceResponse` -- `provider_name` + `model_id` + `timeout_secs` + `version` -- `GetInferenceBundleResponse` -- `repeated ResolvedRoute routes` + `revision` + `generated_at_ms` -- `ResolvedRoute` -- `name`, `base_url`, `protocols`, `api_key`, `model_id`, `provider_type`, `timeout_secs` - -## Data Plane (Sandbox) - -Files: - -- `crates/openshell-sandbox/src/proxy.rs` -- proxy interception, inference context, request routing -- `crates/openshell-sandbox/src/l7/inference.rs` -- pattern detection, HTTP parsing, response formatting, SSE error generation (`format_sse_error()`) -- `crates/openshell-sandbox/src/lib.rs` -- inference context initialization, route refresh -- `crates/openshell-sandbox/src/grpc_client.rs` -- `fetch_inference_bundle()` - -In gateway bundle mode, the sandbox starts a background refresh loop as soon as the inference context is created. The loop polls the gateway every 5 seconds by default (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS` override) and uses the bundle revision hash to skip no-op cache writes. The revision hash covers all route fields including `timeout_secs`, so any configuration change (provider, model, or timeout) triggers a cache update on the next poll. - -### Interception flow - -The proxy handles only `CONNECT` requests to `inference.local`. Non-CONNECT requests (any method, any host) are rejected with `403`. - -When a `CONNECT inference.local:443` arrives: - -1. Proxy responds `200 Connection Established`. -2. `handle_inference_interception()` TLS-terminates the client connection using the sandbox CA (MITM). -3. Raw HTTP requests are parsed from the TLS tunnel using `try_parse_http_request()` (supports Content-Length and chunked transfer encoding). -4. 
Each parsed request is passed to `route_inference_request()`.
-5. The tunnel supports HTTP keep-alive: multiple requests can be processed sequentially.
-6. Buffer starts at 64 KiB (`INITIAL_INFERENCE_BUF`) and grows up to 10 MiB (`MAX_INFERENCE_BUF`). Requests exceeding the max get `413 Payload Too Large`.
-
-### Request classification
-
-File: `crates/openshell-sandbox/src/l7/inference.rs` -- `default_patterns()` and `detect_inference_pattern()`
-
-Supported built-in patterns:
-
-| Method | Path | Protocol | Kind |
-|--------|------|----------|------|
-| `POST` | `/v1/chat/completions` | `openai_chat_completions` | `chat_completion` |
-| `POST` | `/v1/completions` | `openai_completions` | `completion` |
-| `POST` | `/v1/responses` | `openai_responses` | `responses` |
-| `POST` | `/v1/messages` | `anthropic_messages` | `messages` |
-| `GET` | `/v1/models` | `model_discovery` | `models_list` |
-| `GET` | `/v1/models/*` | `model_discovery` | `models_get` |
-
-Query strings are stripped before matching. Path matching is exact for most patterns; `/v1/models/*` matches any sub-path (e.g. `/v1/models/gpt-4.1`). Absolute-form URIs (e.g. `https://inference.local/v1/chat/completions`) are normalized to path-only form by `normalize_inference_path()` before detection.
-
-If no pattern matches, the proxy returns `403 Forbidden` with `{"error": "connection not allowed by policy"}`.
-
-### Route cache
-
-- `InferenceContext` holds a `Router`, the pattern list, and an `Arc<RwLock<Vec<ResolvedRoute>>>` route cache.
-- In gateway bundle mode, `spawn_route_refresh()` polls `GetInferenceBundle` every 5 seconds (`OPENSHELL_ROUTE_REFRESH_INTERVAL_SECS`). On failure, stale routes are kept.
-- In file mode (`--inference-routes`), routes load once at startup from YAML. No refresh task is spawned.
-- In gateway bundle mode, an empty initial bundle still enables the inference context so the refresh task can pick up later configuration.
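The classification rules above can be sketched as (illustrative Python; the real `detect_inference_pattern()` lives in `l7/inference.rs`):

```python
# Built-in (method, path, protocol) table from the documentation above.
PATTERNS = [
    ("POST", "/v1/chat/completions", "openai_chat_completions"),
    ("POST", "/v1/completions", "openai_completions"),
    ("POST", "/v1/responses", "openai_responses"),
    ("POST", "/v1/messages", "anthropic_messages"),
    ("GET", "/v1/models", "model_discovery"),
]

def detect_inference_pattern(method: str, path: str):
    """Match a request against the built-in pattern table (sketch).

    Query strings are stripped before matching; `/v1/models/*` matches
    any sub-path. Returns the protocol string, or None (the real proxy
    answers 403 for None).
    """
    path = path.split("?", 1)[0]  # strip query string
    for pat_method, pat_path, protocol in PATTERNS:
        if method == pat_method and path == pat_path:
            return protocol
    if method == "GET" and path.startswith("/v1/models/"):
        return "model_discovery"  # models_get wildcard
    return None
```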
- -### Bundle-to-route conversion - -`bundle_to_resolved_routes()` in `lib.rs` converts proto `ResolvedRoute` messages to router `ResolvedRoute` structs. Auth header style and default headers are derived from `provider_type` using `openshell_core::inference::auth_for_provider_type()`. - -## Router Behavior - -Files: - -- `crates/openshell-router/src/lib.rs` -- `Router`, `proxy_with_candidates()`, `proxy_with_candidates_streaming()` -- `crates/openshell-router/src/backend.rs` -- `prepare_backend_request()`, `send_backend_request()`, `send_backend_request_streaming()`, `proxy_to_backend()`, `proxy_to_backend_streaming()`, URL construction -- `crates/openshell-router/src/config.rs` -- `RouteConfig`, `ResolvedRoute`, YAML loading - -### Route selection - -`proxy_with_candidates()` finds the first route whose `protocols` list contains the detected source protocol (normalized to lowercase). If no route matches, returns `RouterError::NoCompatibleRoute`. - -### Request rewriting - -`prepare_backend_request()` (shared by both buffered and streaming paths) rewrites outgoing requests: - -1. **Auth injection**: Uses the route's `AuthHeader` -- either `Authorization: Bearer <token>` or a custom header (e.g. `x-api-key: <token>` for Anthropic). -2. **Header allowlist**: Keeps only explicitly approved request headers: common inference headers (`content-type`, `accept`, `accept-encoding`, `user-agent`), route-specific passthrough headers (for example `openai-organization`, `x-model-id`, `anthropic-version`, `anthropic-beta`), and any route default header names. -3. **Header stripping**: Removes `authorization`, `x-api-key`, `host`, `content-length`, hop-by-hop headers, and any non-allowlisted request headers. -4. **Default headers**: Applies route-level default headers (e.g. `anthropic-version: 2023-06-01`) unless the client already sent them. -5. **Model rewrite**: Parses the request body as JSON and replaces the `model` field with the route's configured model.
Non-JSON bodies are forwarded unchanged. -6. **URL construction**: `build_backend_url()` appends the request path to the route endpoint. If the endpoint already ends with `/v1` and the request path starts with `/v1/`, the duplicate prefix is dropped. - -### Header sanitization - -Before forwarding inference requests, the router enforces a route-aware request allowlist and strips sensitive/framing headers. Response sanitization remains framing-only: - -- **Request**: forwards only common inference headers plus route-specific passthrough headers and route default header names. Always strips `authorization`, `x-api-key`, `host`, `content-length`, unknown headers such as `cookie`, and hop-by-hop headers (`connection`, `keep-alive`, `proxy-authenticate`, `proxy-authorization`, `proxy-connection`, `te`, `trailer`, `transfer-encoding`, `upgrade`). -- **Response**: `content-length` and hop-by-hop headers. - -### Response streaming - -The router supports two response modes: - -- **Buffered** (`proxy_with_candidates()`): Reads the entire upstream response body into memory before returning a `ProxyResponse { status, headers, body: Bytes }`. Used by mock routes and in-process system inference calls where latency is not a concern. -- **Streaming** (`proxy_with_candidates_streaming()`): Returns a `StreamingProxyResponse` as soon as response headers arrive from the backend. The body is exposed as a `StreamingBody` enum with a `next_chunk()` method that yields `Option<Bytes>` incrementally. - -`StreamingBody` has two variants: - -| Variant | Source | Behavior | |---------|--------|----------| | `Live(reqwest::Response)` | Real HTTP backend | Calls `response.chunk()` to yield each body fragment as it arrives from the network | | `Buffered(Option<Bytes>)` | Mock routes or fallback | Yields the entire body on the first call, then `None` | - -The sandbox proxy (`route_inference_request()` in `proxy.rs`) uses the streaming path for all inference requests: - -1.
Calls `proxy_with_candidates_streaming()` to get headers immediately. -2. Formats and sends the HTTP/1.1 response header with `Transfer-Encoding: chunked` via `format_http_response_header()`. -3. Wraps the TLS client stream in a `BufWriter` (16 KiB capacity) to coalesce small SSE chunks into fewer TLS records, reducing per-chunk flush overhead. -4. Loops on `body.next_chunk()` with a per-chunk idle timeout (`CHUNK_IDLE_TIMEOUT`, 120 seconds), wrapping each fragment in HTTP chunked encoding via `format_chunk()`. The 120-second timeout accommodates reasoning models (e.g. nemotron-3-super, o1, o3) that pause 60+ seconds between thinking and output phases. -5. Enforces a total streaming body cap (`MAX_STREAMING_BODY`, 32 MiB). -6. On truncation (idle timeout, byte limit, or upstream read error), injects an SSE error event before the chunk terminator so clients can detect the truncation rather than silently losing data. -7. Sends the chunk terminator (`0\r\n\r\n`) via `format_chunk_terminator()` and flushes the `BufWriter`. - -This eliminates full-body buffering for streaming responses (SSE). Time-to-first-byte is determined by the backend's first chunk latency rather than the full generation time. 
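The chunk framing in steps 4 and 7 is standard HTTP/1.1 chunked transfer coding. A sketch of what helpers like `format_chunk()` and `format_chunk_terminator()` plausibly emit (illustrative; the real implementations live in `l7/inference.rs`):

```rust
// Frame one body fragment as an HTTP/1.1 chunk: hex length, CRLF,
// payload, CRLF. Callers must not pass an empty fragment, because a
// zero-length chunk means end-of-body in chunked encoding.
fn format_chunk(fragment: &[u8]) -> Vec<u8> {
    let mut out = format!("{:x}\r\n", fragment.len()).into_bytes();
    out.extend_from_slice(fragment);
    out.extend_from_slice(b"\r\n");
    out
}

// The zero-length chunk that terminates the body (no trailers).
fn format_chunk_terminator() -> &'static [u8] {
    b"0\r\n\r\n"
}
```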
- -#### Truncation signaling - -When the proxy truncates a streaming response, it injects an SSE error event via `format_sse_error()` (in `crates/openshell-sandbox/src/l7/inference.rs`) before sending the HTTP chunked terminator: - -```text -data: {"error":{"message":"<reason>","type":"proxy_stream_error"}} -``` - -Three truncation paths exist: - -| Cause | SSE error message | OCSF severity | |-------|-------------------|---------------| | Per-chunk idle timeout (120s) | `response truncated: chunk idle timeout exceeded` | Medium | | Upstream read error | `response truncated: upstream read error` | Medium | | Streaming body exceeds 32 MiB | `response truncated: exceeded maximum streaming body size` | *(warn log only)* | - -The `reason` field in the SSE event is sanitized — it never contains internal URLs, hostnames, or credentials. Full details are captured server-side in the OCSF log. - -### Mock routes - -File: `crates/openshell-router/src/mock.rs` - -Routes with `mock://` scheme endpoints return canned responses without making HTTP requests. Mock responses are protocol-aware (OpenAI chat completion, OpenAI completion, Anthropic messages, or generic JSON). Mock routes include an `x-openshell-mock: true` response header. - -### Timeout model - -The router uses a layered timeout strategy with separate handling for buffered and streaming responses. - -**Client connect timeout**: The `reqwest::Client` is built with a 30-second `connect_timeout` (in `crates/openshell-router/src/lib.rs` → `Router::new()`). This bounds TCP connection establishment and applies to all outgoing requests regardless of response mode. - -**Buffered responses** (`proxy_to_backend()` via `send_backend_request()`): Apply the route's `timeout` as a total request timeout covering the entire lifecycle (connect + headers + body). When `timeout_secs` is `0` in the proto message, the default of 60 seconds is used (defined as `DEFAULT_ROUTE_TIMEOUT` in `config.rs`).
Timeouts and connection failures map to `RouterError::UpstreamUnavailable`. - -**Streaming responses** (`proxy_to_backend_streaming()` via `send_backend_request_streaming()`): Do **not** apply a total request timeout. The total duration of a streaming response is unbounded — liveness is enforced by the sandbox proxy's per-chunk idle timeout (`CHUNK_IDLE_TIMEOUT`, 120 seconds in `proxy.rs`) instead. This separation exists because streaming inference responses (especially from reasoning models) can legitimately take minutes to complete while still sending data. The `prepare_backend_request()` helper in `backend.rs` builds the request identically for both paths; the caller decides whether to chain `.timeout()` before sending. - -Timeout changes propagate dynamically to running sandboxes. The bundle revision hash includes `timeout_secs`, so when the timeout is updated via `openshell inference update --timeout`, the refresh loop detects the revision change and updates the route cache within one polling interval (5 seconds by default). - -## Standalone Route File - -File: `crates/openshell-router/src/config.rs` - -Standalone sandboxes can load static routes from YAML via `--inference-routes`: - -```yaml -routes: - - route: inference.local - endpoint: http://localhost:1234/v1 - model: local-model - protocols: [openai_chat_completions] - api_key: lm-studio - # Or reference an environment variable: - # api_key_env: OPENAI_API_KEY -``` - -Fields: - -- `route` -- route name (informational) -- `endpoint` -- backend base URL -- `model` -- model ID to force on outgoing requests -- `protocols` -- list of supported protocol strings -- `provider_type` -- optional; determines auth style and default headers via `InferenceProviderProfile` -- `api_key` -- inline API key (mutually exclusive with `api_key_env`) -- `api_key_env` -- environment variable name containing the API key - -Validation at load time requires either `api_key` or `api_key_env` to resolve, and at least one protocol. 
Protocols are normalized (lowercased, trimmed, deduplicated). - -## Error Model - -| Status | Condition | -|--------|-----------| -| `403` | Request on `inference.local` does not match a recognized inference API pattern | -| `503` | Pattern matched but route cache is empty (gateway inference not configured) | -| `400` | No compatible route for the detected source protocol | -| `401` | Upstream returned unauthorized | -| `502` | Upstream protocol error or internal router error | -| `503` | Upstream unavailable (timeout or connection failure) | -| `413` | Request body exceeds 10 MiB buffer limit | - -## System Inference Route - -In addition to the user-facing `inference.local` route, the gateway supports a second managed route named `sandbox-system` for platform system functions (e.g. an embedded agent harness for policy analysis). - -### Key differences from user inference - -| Aspect | User (`inference.local`) | System (`sandbox-system`) | -|--------|--------------------------|---------------------------| -| **Consumer** | Agent code inside sandbox | Supervisor binary only | -| **Access** | Proxy-intercepted CONNECT | In-process API on `InferenceContext` | -| **Network surface** | HTTPS to `inference.local:443` | None -- function call | -| **Route cache** | `InferenceContext.routes` | `InferenceContext.system_routes` | - -### In-process API - -`InferenceContext::system_inference()` provides the supervisor with direct access to inference using the system routes. It calls `Router::proxy_with_candidates()` with the system route cache -- the same backend proxy logic used for user inference, but without any CONNECT/TLS overhead. - -```rust -ctx.system_inference( - "openai_chat_completions", - "POST", - "/v1/chat/completions", - headers, - body, -).await -``` - -### Access control - -The system route is not exposed through the CONNECT proxy. The supervisor runs in the host network namespace and calls the router directly. 
User processes are in an isolated sandbox network namespace and cannot reach the in-process API. - -### Bundle delivery - -Both routes are included in `GetInferenceBundleResponse.routes` (which is `repeated ResolvedRoute`). The sandbox partitions routes by `ResolvedRoute.name` during `bundle_to_resolved_routes()`: routes named `sandbox-system` go to the system cache, everything else goes to the user cache. Both caches are refreshed on the same polling interval. - -### Storage - -The system route is stored as a separate `InferenceRoute` record in the gateway store with `name = "sandbox-system"`. The `SetClusterInferenceRequest.route_name` field selects which route to target (empty string defaults to `inference.local`). - -## CLI Surface - -Gateway inference commands: - -- `openshell inference set --provider <name> --model <model> [--timeout <secs>]` -- configures user-facing gateway inference -- `openshell inference set --system --provider <name> --model <model> [--timeout <secs>]` -- configures system inference -- `openshell inference update [--provider <name>] [--model <model>] [--timeout <secs>]` -- updates individual fields without resetting others -- `openshell inference get` -- displays both user and system inference configuration -- `openshell inference get --system` -- displays only the system inference configuration - -The `--provider` flag references a provider record name (not a provider type). The provider must already exist on the gateway and have a supported inference type (`openai`, `anthropic`, or `nvidia`). - -The `--timeout` flag sets the per-request timeout in seconds for upstream inference calls. When omitted or set to `0`, the default of 60 seconds applies. Timeout changes propagate to running sandboxes within the route refresh interval (5 seconds by default). - -Inference writes verify by default. `--no-verify` is the explicit opt-out for endpoints that are not up yet.
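The zero-means-default timeout rule is small enough to state as code; a sketch under the assumption that the constant mirrors `DEFAULT_ROUTE_TIMEOUT` in `config.rs`:

```rust
use std::time::Duration;

// Mirrors DEFAULT_ROUTE_TIMEOUT: a timeout_secs of 0 in the proto
// message means "unset", so the 60-second default applies.
const DEFAULT_ROUTE_TIMEOUT: Duration = Duration::from_secs(60);

fn effective_timeout(timeout_secs: u64) -> Duration {
    if timeout_secs == 0 {
        DEFAULT_ROUTE_TIMEOUT
    } else {
        Duration::from_secs(timeout_secs)
    }
}
```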
- -## Provider Discovery - -Files: - -- `crates/openshell-providers/src/lib.rs` -- `ProviderRegistry`, `ProviderPlugin` trait -- `crates/openshell-providers/src/providers/openai.rs` -- `OpenaiProvider` -- `crates/openshell-providers/src/providers/anthropic.rs` -- `AnthropicProvider` -- `crates/openshell-providers/src/providers/nvidia.rs` -- `NvidiaProvider` - -Provider discovery and inference routing are separate concerns: - -- `ProviderPlugin` (in `openshell-providers`) handles credential *discovery* -- scanning environment variables to find API keys. -- `InferenceProviderProfile` (in `openshell-core`) handles how to *use* discovered credentials to make inference API calls. - -The `openai`, `anthropic`, and `nvidia` provider plugins each discover credentials from their canonical environment variable (`OPENAI_API_KEY`, `ANTHROPIC_API_KEY`, `NVIDIA_API_KEY`). These credentials are stored in provider records and looked up by the gateway at bundle resolution time. diff --git a/architecture/object-metadata.md b/architecture/object-metadata.md deleted file mode 100644 index 961576e16..000000000 --- a/architecture/object-metadata.md +++ /dev/null @@ -1,426 +0,0 @@ -# Object Metadata Convention - -## Overview - -OpenShell adopts a Kubernetes-style object metadata convention for all top-level domain objects. This standardizes how resources are identified, labeled, and queried across the platform. All resources that users interact with directly (Sandbox, Provider, SshSession, InferenceRoute) follow this convention. - -## Core Principles - -### 1. Uniform Metadata Structure - -All top-level objects embed a common `ObjectMeta` message containing: - -- **Stable ID**: Server-generated UUID that never changes -- **Human-readable name**: User-friendly identifier (unique per object type) -- **Creation timestamp**: Milliseconds since Unix epoch -- **Labels**: Key-value pairs for filtering and organization - -### 2. 
Trait-Based Access - -Rather than accessing metadata fields directly (e.g., `sandbox.metadata.as_ref().unwrap().id`), code uses trait methods from `openshell_core::metadata`: - -```rust -use openshell_core::{ObjectId, ObjectName, ObjectLabels}; - -let id = sandbox.object_id(); // Returns &str -let name = sandbox.object_name(); // Returns &str -let labels = sandbox.object_labels(); // Returns Option<HashMap<String, String>> -``` - -This provides: - -- **Uniform API** across all object types -- **Graceful fallback** (returns empty string if metadata is None) -- **Reduced boilerplate** in code that works with multiple object types - -### 3. Labels for Organization and Filtering - -Labels are key-value metadata attached to objects for: - -- **Grouping** related resources (e.g., all dev environment sandboxes) -- **Filtering** in list operations (e.g., show only sandboxes with `team=backend`) -- **Automation** and selection in scripts - -## Implementation Pattern - -### Protobuf Definition - -Define `ObjectMeta` once in `proto/datamodel.proto`: - -```protobuf -message ObjectMeta { - string id = 1; - string name = 2; - int64 created_at_ms = 3; - map<string, string> labels = 4; -} -``` - -Embed it in top-level objects: - -```protobuf -message Sandbox { - ObjectMeta metadata = 1; - SandboxSpec spec = 2; - SandboxStatus status = 3; - int32 phase = 4; - int32 current_policy_version = 5; -} -``` - -**Migration**: When adding metadata to an existing object, shift field numbers to make room for `metadata = 1`. This maintains backward compatibility if done before release.
- -### Trait Implementation - -Implement the three traits for each object in `crates/openshell-core/src/metadata.rs`: - -```rust -impl ObjectId for Sandbox { - fn object_id(&self) -> &str { - self.metadata.as_ref().map(|m| m.id.as_str()).unwrap_or("") - } -} - -impl ObjectName for Sandbox { - fn object_name(&self) -> &str { - self.metadata.as_ref().map(|m| m.name.as_str()).unwrap_or("") - } -} - -impl ObjectLabels for Sandbox { - fn object_labels(&self) -> Option<HashMap<String, String>> { - self.metadata.as_ref().map(|m| m.labels.clone()) - } -} -``` - -**Pattern**: Always return empty string for missing metadata rather than panicking. This makes code resilient to malformed data. - -### Persistence Layer - -The `Store` trait in `crates/openshell-server/src/persistence/mod.rs` provides three methods for working with objects: - -```rust -// Store/retrieve by stable ID -async fn put_message<T: ObjectType>( - &self, - message: &T, -) -> Result<(), String>; - -async fn get(&self, object_type: &str, id: &str) - -> Result<Option<Vec<u8>>, String>; - -// Retrieve by human-readable name -async fn get_message_by_name<T: ObjectType>( - &self, - name: &str, -) -> Result<Option<T>, String>; -``` - -**Database schema pattern**: Each object type has: - -- `id` column (TEXT PRIMARY KEY) — stable UUID -- `name` column (TEXT UNIQUE NOT NULL) — user-facing name -- `payload` column (BLOB) — serialized protobuf -- `created_at_ms` column (INTEGER) — denormalized from metadata for indexing -- `updated_at_ms` column (INTEGER) — last modification time - -### Label Filtering - -Label selectors follow Kubernetes conventions: - -**Format**: `key1=value1,key2=value2` (comma-separated, AND logic) - -**Implementation**: - -1. Parse selector into key-value pairs -2. For each object, check that ALL selector labels match -3.
Return only objects where every label in the selector exists with the exact value - -**SQL pattern** (PostgreSQL with JSONB): - -```sql -WHERE labels @> '{"env": "dev", "team": "backend"}'::jsonb -``` - -**SQL pattern** (SQLite): - -```sql -WHERE json_extract(labels, '$.env') = 'dev' - AND json_extract(labels, '$.team') = 'backend' -``` - -The `list_with_selector` method on `Store` handles this transparently. - -### Validation Rules - -Labels must follow Kubernetes naming conventions (enforced in `crates/openshell-server/src/grpc/validation.rs`): - -**Label keys**: - -- Optional prefix + `/` + name (e.g., `example.com/app` or `app`) -- Prefix: DNS subdomain (lowercase alphanumeric, `-`, `.`, max 253 chars) -- Name: alphanumeric + `-`, `_`, `.`, max 63 chars -- Cannot start or end with `-` or `.` - -**Label values**: - -- Alphanumeric + `-`, `_`, `.` -- Max 63 characters -- Can be empty string - -**Validation functions**: - -```rust -validate_label_key(key: &str) -> Result<(), Status> -validate_label_value(value: &str) -> Result<(), Status> -validate_labels(labels: &HashMap<String, String>) -> Result<(), Status> -``` - -**Validation timing**: Validate at API ingress (gRPC handlers) before persisting. Reject invalid labels immediately rather than storing and failing later. - -## CLI Integration - -### Creating Objects with Labels - -```bash -openshell sandbox create --label env=dev --label team=backend -openshell provider create openai --label project=research -``` - -**Pattern**: Repeatable `--label key=value` flags parsed into `HashMap<String, String>`. - -### Listing with Selectors - -```bash -openshell sandbox list --selector env=dev -openshell sandbox list --selector env=dev,team=backend -``` - -**Display**: Show labels in tabular output when present, or in detail views.
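The selector semantics behind `--selector` reduce to parse-then-AND-match. A minimal sketch (function names are illustrative, not the actual `validation.rs` API):

```rust
use std::collections::HashMap;

// Parse "key1=value1,key2=value2" into a map; None on a malformed pair.
fn parse_selector(selector: &str) -> Option<HashMap<String, String>> {
    selector
        .split(',')
        .map(|pair| {
            let (k, v) = pair.split_once('=')?;
            Some((k.trim().to_string(), v.trim().to_string()))
        })
        .collect()
}

// AND semantics: every selector label must exist with the exact value.
// Extra labels on the object do not affect the match.
fn matches_selector(
    labels: &HashMap<String, String>,
    selector: &HashMap<String, String>,
) -> bool {
    selector.iter().all(|(k, v)| labels.get(k) == Some(v))
}
```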
- -## Testing Requirements - -### Unit Tests - -Test validation logic for: - -- Valid label keys (with and without prefix) -- Invalid keys (bad characters, too long, empty segments) -- Valid label values -- Invalid values (non-alphanumeric, too long) -- Selector parsing - -### Integration Tests - -Test persistence layer: - -- Store object with labels -- Retrieve by name and verify labels present -- Filter with single-label selector -- Filter with multi-label selector (AND logic) -- Empty results for non-matching selector - -### E2E Tests - -Test full workflow through CLI: - -- Create multiple objects with different labels -- List all objects -- Filter by single label -- Filter by multiple labels -- Verify labels persist across gateway restarts - -**Location**: `e2e/rust/tests/sandbox_labels.rs` (or equivalent for each object type) - -## Migration Checklist - -When adding object metadata to a new resource type: - -1. **Proto changes**: - - [ ] Add `ObjectMeta metadata = 1;` field - - [ ] Shift existing field numbers if needed - - [ ] Update any references to old id/name fields - -2. **Trait implementations**: - - [ ] Implement `ObjectId` trait - - [ ] Implement `ObjectName` trait - - [ ] Implement `ObjectLabels` trait - - [ ] Add to `crates/openshell-core/src/metadata.rs` - -3. **Persistence**: - - [ ] Add database migration (SQLite + PostgreSQL) - - [ ] Create `labels` column (JSON/JSONB type) - - [ ] Migrate existing `id`/`name` to `ObjectMeta` - - [ ] Update `ObjectType` implementation - - [ ] Update create/read operations to use new structure - -4. **Validation**: - - [ ] Add label validation in gRPC handlers - - [ ] Validate on create and update operations - - [ ] Test validation with unit tests - -5. **API updates**: - - [ ] Add `label_selector` parameter to List RPC - - [ ] Implement selector filtering in persistence layer - - [ ] Add `labels` field to Create/Update RPCs - -6. 
**CLI updates**: - - [ ] Add `--label` flag to create command - - [ ] Add `--selector` flag to list command - - [ ] Update completion for label keys (if applicable) - - [ ] Display labels in list and get output - -7. **Tests**: - - [ ] Unit tests for validation - - [ ] Integration tests for persistence - - [ ] E2E tests for CLI workflow - -8. **Documentation**: - - [ ] Update user-facing docs for new flags - - [ ] Add examples with labels to guides - -## Common Patterns - -### Creating Objects with Metadata - -```rust -use crate::persistence::current_time_ms; - -let now_ms = current_time_ms() - .map_err(|e| Status::internal(format!("get current time: {e}")))?; - -let sandbox = Sandbox { - metadata: Some(openshell_core::proto::datamodel::v1::ObjectMeta { - id: uuid::Uuid::new_v4().to_string(), - name: user_provided_name, - created_at_ms: now_ms, - labels: request.labels, - }), - spec: Some(spec), - status: None, - phase: SandboxPhase::Provisioning as i32, - current_policy_version: 0, -}; - -// Validate before persisting -validate_object_metadata(sandbox.metadata.as_ref(), "sandbox")?; -store.put_message(&sandbox).await?; -``` - -### Filtering by Labels - -```rust -let sandboxes = if request.label_selector.is_empty() { - store.list(Sandbox::object_type(), limit, offset).await? -} else { - validate_label_selector(&request.label_selector)?; - store.list_with_selector( - Sandbox::object_type(), - &request.label_selector, - limit, - offset, - ).await? 
-}; -``` - -### Accessing Metadata Fields - -```rust -use openshell_core::{ObjectId, ObjectName}; - -// Good: trait-based access -let sandbox_id = sandbox.object_id(); -let sandbox_name = sandbox.object_name(); - -// Avoid: direct field access -let sandbox_id = sandbox.metadata.as_ref().unwrap().id.as_str(); // Don't do this -``` - -## Anti-Patterns to Avoid - -### ❌ Bypassing Validation - -```rust -// Bad: storing labels without validation -store.put_message(&sandbox).await?; -``` - -```rust -// Good: validate before storing -validate_labels(&sandbox.metadata.as_ref().unwrap().labels)?; -store.put_message(&sandbox).await?; -``` - -### ❌ Direct Field Access - -```rust -// Bad: fragile to missing metadata -let id = sandbox.metadata.as_ref().unwrap().id.clone(); -``` - -```rust -// Good: trait-based with fallback -let id = sandbox.object_id().to_string(); -``` - -### ❌ Inconsistent Object Construction - -```rust -// Bad: forgetting created_at_ms or labels -let sandbox = Sandbox { - metadata: Some(ObjectMeta { - id: uuid::Uuid::new_v4().to_string(), - name: "test".to_string(), - ..Default::default() // Silently sets created_at_ms=0, labels=empty - }), - ..Default::default() -}; -``` - -```rust -// Good: explicit fields -let sandbox = Sandbox { - metadata: Some(ObjectMeta { - id: uuid::Uuid::new_v4().to_string(), - name: "test".to_string(), - created_at_ms: current_time_ms()?, - labels: request.labels, - }), - ..Default::default() -}; -``` - -### ❌ Client-Side ID Generation - -```rust -// Bad: letting clients specify IDs -let sandbox = Sandbox { - metadata: Some(ObjectMeta { - id: request.id, // Never trust client-provided IDs - .. - }), - .. -}; -``` - -```rust -// Good: server generates stable IDs -let sandbox = Sandbox { - metadata: Some(ObjectMeta { - id: uuid::Uuid::new_v4().to_string(), - .. - }), - .. 
-}; -``` - -## References - -- **Kubernetes API Conventions**: https://kubernetes.io/docs/reference/using-api/api-concepts/ -- **Label Syntax**: https://kubernetes.io/docs/concepts/overview/working-with-objects/labels/ -- **Proto definition**: `proto/datamodel.proto` -- **Trait implementations**: `crates/openshell-core/src/metadata.rs` -- **Persistence layer**: `crates/openshell-server/src/persistence/mod.rs` -- **Validation logic**: `crates/openshell-server/src/grpc/validation.rs` -- **E2E tests**: `e2e/rust/tests/sandbox_labels.rs` diff --git a/architecture/oidc-auth.md b/architecture/oidc-auth.md deleted file mode 100644 index 746a6a519..000000000 --- a/architecture/oidc-auth.md +++ /dev/null @@ -1,542 +0,0 @@ -# OIDC Authentication - -OpenShell supports OAuth2/OIDC (OpenID Connect) as an authentication mode alongside mTLS and Cloudflare Access. When enabled, the gateway server validates JWT bearer tokens on gRPC requests against an OIDC provider's JWKS endpoint. The CLI acquires tokens via browser-based login (Authorization Code + PKCE) or environment variables (Client Credentials). - -## Architecture - -```mermaid -graph LR - CLI -->|"Bearer token
<br/>(gRPC metadata)"| Gateway["Gateway<br/>Server"] - Gateway -->|response| CLI - Gateway -->|"JWKS (cached)"| Keycloak["Keycloak /
OIDC Provider"] - Browser -->|"Auth Code + PKCE"| Gateway - Keycloak -->|"Token exchange"| CLI - Keycloak -->|"Token exchange"| Browser -``` - -## Auth Modes - -OpenShell determines the authentication strategy per gateway via the `auth_mode` field in gateway metadata (`~/.config/openshell/gateways//metadata.json`): - -| `auth_mode` | Transport | Identity | Token Storage | -|---|---|---|---| -| `"mtls"` | mTLS client cert | Cert CN | N/A | -| `"plaintext"` | HTTP (no TLS) | None | N/A | -| `"cloudflare_jwt"` | Edge TLS (CF Tunnel) | CF Access JWT | `edge_token` file | -| `"oidc"` | mTLS or plaintext | OIDC JWT | `oidc_token.json` | - -## Token Acquisition - -### Interactive: Authorization Code + PKCE - -Used by `openshell gateway login` for interactive CLI sessions. The login flow accepts a `client_id` (the OIDC client application) and an optional `audience` (the API resource server). When `audience` differs from `client_id` — common with providers like Entra ID — it is appended to the authorization URL so the issued token targets the correct API. - -``` -CLI Browser Keycloak - | | | - | 1. Discover OIDC endpoints | | - | GET {issuer}/.well-known/openid-configuration | - | | | - | 2. Generate PKCE pair | | - | code_verifier = random(32 bytes) -> base64url | - | code_challenge = base64url(SHA256(code_verifier)) | - | state = random(16 bytes) -> hex | - | | | - | 3. Start localhost callback | | - | on 127.0.0.1: | | - | | | - | 4. Open browser | | - | -------xdg-open------------->| | - | | 5. Redirect to Keycloak | - | | /auth?response_type=code | - | | &client_id={client_id} | - | | &redirect_uri=localhost:... | - | | &code_challenge=... | - | | &code_challenge_method=S256 | - | | &state=... | - | | [&audience={audience}] | - | | --------------------------->| - | | | - | | 6. User logs | - | | in | - | | | - | | 7. Redirect back | - | | <-- ?code=...&state=... ---| - | | | - | 8. Receive code on callback | | - | <----GET /callback?code=..---| | - | | | - | 9. 
Validate state matches | | - | | | - | 10. Exchange code for tokens | | - | POST {token_endpoint} | | - | grant_type=authorization_code | - | code=... | | - | redirect_uri=... | | - | client_id={client_id} | | - | code_verifier=... ------------------------------------->| - | | | - | <-- { access_token, refresh_token, expires_in } -----------| - | | | - | 11. Store token bundle | | - | ~/.config/openshell/gateways//oidc_token.json | -``` - -### Non-Interactive: Client Credentials - -Used for CI/automation when `OPENSHELL_OIDC_CLIENT_SECRET` is set. The optional `audience` parameter is included when the API resource server differs from the client ID. - -``` -CI Agent Keycloak - | | - | POST {token_endpoint} | - | grant_type=client_credentials | - | client_id={client_id} | - | client_secret={OPENSHELL_OIDC_CLIENT_SECRET} | - | [audience={audience}] --------------------------------->| - | | - | <-- { access_token, expires_in } -------------------------| - | | - | Store token bundle (no refresh_token) | -``` - -## Token Storage - -OIDC tokens are stored as JSON at `~/.config/openshell/gateways//oidc_token.json` with `0600` permissions: - -```json -{ - "access_token": "eyJhbGci...", - "refresh_token": "eyJhbGci...", - "expires_at": 1718400300, - "issuer": "http://localhost:8180/realms/openshell", - "client_id": "openshell-cli" -} -``` - -The CLI checks `expires_at` before each request. If the token is within 30 seconds of expiry and a `refresh_token` is available, it silently refreshes via the token endpoint's `refresh_token` grant. If refresh fails, the user is prompted to re-authenticate with `openshell gateway login`. - -## Per-Request Flow - -On every gRPC call, the CLI interceptor injects the token as a standard HTTP header: - -``` -authorization: Bearer eyJhbGci... -``` - -The server-side auth middleware (`AuthGrpcRouter` in `multiplex.rs`) classifies each request into one of three categories and processes it accordingly: - -1. 
**Strip internal markers** — remove `x-openshell-auth-source` from incoming headers to prevent spoofing. -2. **Unauthenticated?** — health probes and reflection pass through with no auth. -3. **Sandbox-secret?** — supervisor RPCs validate the `x-sandbox-secret` header against the server's SSH handshake secret. On success, mark the request with an internal `x-openshell-auth-source: sandbox-secret` header for downstream authorization. -4. **Dual-auth?** — methods like `UpdateConfig` try sandbox-secret first; if no valid secret, fall through to Bearer token validation. -5. **Bearer token** — extract `authorization: Bearer <token>`, decode the JWT header for `kid`, look up the signing key in the JWKS cache, and validate signature (RS256), `exp`, `iss`, `aud` claims. -6. **Authorize** — on successful authentication, check RBAC roles via `AuthzPolicy` (in `authz.rs`). -7. On any failure, return `UNAUTHENTICATED` or `PERMISSION_DENIED` status. - -## JWKS Key Caching - -The server fetches the OIDC provider's JSON Web Key Set at startup via discovery: - -``` -GET {issuer}/.well-known/openid-configuration -> jwks_uri -GET {jwks_uri} -> { keys: [...] } -``` - -Keys are cached in memory with a configurable TTL (default: 1 hour). A `refresh_mutex` serializes refresh operations so concurrent requests coalesce into a single HTTP fetch. The cache refreshes: - -- When the TTL expires (on next request, re-checked under the mutex to avoid thundering herd). -- Immediately when a JWT references a `kid` not in the cache (handles key rotation). - -## Method Authentication Categories - -Every gRPC method falls into one of three categories, defined in `oidc.rs`: - -### Unauthenticated - -These methods require no authentication at all — health probes and infrastructure endpoints.
- -| Method / Prefix | Reason | -|---|---| -| `OpenShell/Health` | Kubernetes liveness/readiness probes | -| `Inference/Health` | Inference service health probes | -| `/grpc.reflection.*` | gRPC server reflection (debugging tools) | -| `/grpc.health.*` | gRPC health check protocol | - -### Sandbox-Secret Authenticated - -Sandbox-to-server RPCs authenticate via the `x-sandbox-secret` metadata header, which must match the server's SSH handshake secret. These methods do not use OIDC Bearer tokens. - -| Method | Purpose | -|---|---| -| `SandboxService/GetSandboxConfig` | Supervisor fetches sandbox configuration | -| `ReportPolicyStatus` | Supervisor reports policy enforcement status | -| `PushSandboxLogs` | Supervisor streams sandbox logs to gateway | -| `GetSandboxProviderEnvironment` | Supervisor fetches provider credentials | -| `SubmitPolicyAnalysis` | Supervisor submits policy analysis results | -| `Inference/GetInferenceBundle` | Supervisor fetches resolved inference routes and provider API keys | - -### Dual-Auth - -These methods accept either an OIDC Bearer token (CLI users) or a sandbox secret (supervisor). The middleware tries sandbox-secret first; if not present, it falls through to Bearer token validation. - -| Method | Purpose | -|---|---| -| `UpdateConfig` | Policy and settings mutations | -| `OpenShell/GetSandboxConfig` | CLI reads effective sandbox policy and settings; sandbox callers may still use the shared secret | - -**Sandbox-secret restriction on `UpdateConfig`:** When a sandbox-secret-authenticated caller invokes `UpdateConfig`, the handler in `policy.rs` enforces strict scope limits via `validate_sandbox_secret_update()`. The caller: - -- **Must** provide a sandbox `name` (sandbox-scoped only). -- **Must** include a `policy` payload (policy sync only). -- **May not** set `global = true` (no global config mutation). -- **May not** set `delete_setting` (no setting deletion). -- **May not** provide a `setting_key` (no setting mutation). 
- -This ensures the sandbox supervisor can sync its own policy on startup but cannot modify global configuration or sandbox settings. - -## Role-Based Access Control (RBAC) - -After JWT validation, the server checks the user's roles against a per-method requirement. Roles are extracted from a configurable claim path in the JWT. - -### Role Mapping - -| Operation | Required Role | -|---|---| -| Health probes, reflection | (no auth — unauthenticated) | -| Supervisor-only RPCs (`SandboxService/GetSandboxConfig`, `GetInferenceBundle`, etc.) | (sandbox secret — no RBAC) | -| UpdateConfig via sandbox secret | (sandbox secret — scope-restricted, no RBAC) | -| OpenShell/GetSandboxConfig via Bearer | user role | -| Sandbox create, list, delete, exec, SSH | user role | -| Provider list, get | user role | -| Provider create, update, delete | admin role | -| Global config/policy updates | admin role | -| Draft policy approvals/rejections | admin role | -| All other authenticated RPCs | user role | - -### Configurable Roles - -The roles claim path and role names are configurable to support different OIDC providers. Each provider stores roles differently in the JWT: - -| Provider | Roles Claim | Example Admin Role | Example User Role | -|---|---|---|---| -| Keycloak | `realm_access.roles` (default) | `openshell-admin` | `openshell-user` | -| Microsoft Entra ID | `roles` | `OpenShell.Admin` | `OpenShell.User` | -| Okta | `groups` | `openshell-admin` | `openshell-user` | -| GitHub | N/A | (empty — skip RBAC) | (empty — skip RBAC) | - -When both `--oidc-admin-role` and `--oidc-user-role` are set to empty strings, RBAC is skipped entirely — any valid JWT is authorized. This supports providers like GitHub that don't emit roles in JWTs (authentication-only mode). - -**Security note on authentication-only mode:** In this mode, the server validates token signature, issuer, and audience, but does not restrict which principals can call which methods. 
Any entity able to mint a valid token for the configured audience gains full access. For GitHub Actions, this means any workflow in any repository that can request a token with the configured audience is authorized. Consider using scope enforcement (`--oidc-scopes-claim`) or restricting the audience to limit the blast radius. - -## Scope-Based Fine-Grained Permissions - -Scopes provide opt-in, per-method access control on top of roles. When `--oidc-scopes-claim` is set, the server extracts scopes from the JWT and checks them against an exhaustive method-to-scope map. A caller must have both the required role AND the required scope. - -### Scope Definitions - -| Scope | Operations | -|---|---| -| `sandbox:read` | GetSandbox, ListSandboxes, WatchSandbox, GetSandboxLogs, GetSandboxPolicyStatus, ListSandboxPolicies | -| `sandbox:write` | CreateSandbox, DeleteSandbox, ExecSandbox, CreateSshSession, RevokeSshSession | -| `provider:read` | GetProvider, ListProviders | -| `provider:write` | CreateProvider, UpdateProvider, DeleteProvider | -| `config:read` | GetGatewayConfig, GetSandboxConfig, GetDraftPolicy, GetDraftHistory | -| `config:write` | UpdateConfig (Bearer), ApproveDraftChunk, ApproveAllDraftChunks, RejectDraftChunk, EditDraftChunk, UndoDraftChunk, ClearDraftChunks | -| `inference:read` | GetClusterInference | -| `inference:write` | SetClusterInference | -| `openshell:all` | All of the above (wildcard) | - -Methods not listed in the scope map require `openshell:all`. Scopes cannot escalate privilege — `openshell:all` on a user-role token still cannot call admin methods. - -### Authorization Flow - -``` -Request arrives (Bearer-authenticated) - │ - ├── Role check (existing) - │ └── Does identity have required role? No → PERMISSION_DENIED - │ - └── Scope check (only if --oidc-scopes-claim is configured) - ├── Does identity have openshell:all? → proceed - ├── Does identity have required scope for this method? 
→ proceed - └── No → PERMISSION_DENIED("scope 'X' required") -``` - -When `--oidc-scopes-claim` is not set (default), scope enforcement is disabled and roles alone determine access. Auth-only mode (empty role names) still enforces scopes when enabled. - -### Scope Extraction - -The server extracts scopes from the JWT claim path configured by `--oidc-scopes-claim`. Two formats are supported: - -- **Space-delimited string** (Keycloak, Entra ID): `"openid sandbox:read sandbox:write"` -- **JSON array** (Okta): `["sandbox:read", "sandbox:write"]` - -Standard OIDC scopes (`openid`, `profile`, `email`, `offline_access`) are filtered out before enforcement. - -### CLI Scope Requests - -The `--oidc-scopes` flag on `gateway add` and `gateway start` is stored in gateway metadata and included in OAuth2 token requests: - -- **Browser flow**: appended to the `scope` parameter alongside `openid` -- **Client credentials flow**: sent as-is (without `openid`, which is inappropriate for service tokens) -- **Token refresh**: scopes are not re-sent; the authorization server preserves them per RFC 6749 §6 - -### Provider Compatibility - -| Provider | Scopes Claim | Format | Fine-Grained Selection | -|---|---|---|---| -| Keycloak | `scope` | Space-delimited | Yes — client requests specific scopes | -| Okta | `scp` | JSON array | Yes — client requests specific scopes | -| Entra ID | `scp` | Space-delimited | Limited — uses `.default` for all granted permissions | -| GitHub | N/A | N/A | No — use with scopes disabled | - -### Keycloak Client Scopes - -The dev realm (`scripts/keycloak-realm.json`) includes all 9 OpenShell scopes as **optional scopes** on `openshell-cli` and `openshell:all` as a **default scope** on `openshell-ci`. Built-in Keycloak scopes (`openid`, `profile`, `email`, `roles`, `web-origins`, `acr`) are assigned as default scopes on both clients so roles and profile claims are always present regardless of optional scope requests. 
- -## Server Configuration - -### Server Binary Flags - -These flags configure JWT validation on the `openshell-server` binary: - -| Flag | Env Var | Default | Description | -|---|---|---|---| -| `--oidc-issuer` | `OPENSHELL_OIDC_ISSUER` | (none) | OIDC issuer URL (enables JWT validation) | -| `--oidc-audience` | `OPENSHELL_OIDC_AUDIENCE` | `openshell-cli` | Expected `aud` claim in validated JWTs | -| `--oidc-jwks-ttl` | `OPENSHELL_OIDC_JWKS_TTL` | `3600` | JWKS cache TTL in seconds | -| `--oidc-roles-claim` | `OPENSHELL_OIDC_ROLES_CLAIM` | `realm_access.roles` | Dot-separated path to roles array in JWT | -| `--oidc-admin-role` | `OPENSHELL_OIDC_ADMIN_ROLE` | `openshell-admin` | Role name for admin access | -| `--oidc-user-role` | `OPENSHELL_OIDC_USER_ROLE` | `openshell-user` | Role name for user access | -| `--oidc-scopes-claim` | `OPENSHELL_OIDC_SCOPES_CLAIM` | (empty) | Claim path for scopes; enables scope enforcement when set | - -When `--oidc-issuer` is not set, OIDC validation is disabled and the server falls back to mTLS-only or plaintext behavior. 
- -### Gateway Start Flags (CLI) - -The `openshell gateway start` command exposes flags that configure both the server and the local gateway metadata: - -| Flag | Default | Description | -|---|---|---| -| `--oidc-issuer` | (none) | OIDC issuer URL; passed to the server binary | -| `--oidc-audience` | `openshell-cli` | Expected `aud` claim; passed to the server binary | -| `--oidc-client-id` | `openshell-cli` | Client ID stored in gateway metadata for CLI login flows | -| `--oidc-roles-claim` | (none) | Passed to the server binary if set | -| `--oidc-admin-role` | (none) | Passed to the server binary if set | -| `--oidc-user-role` | (none) | Passed to the server binary if set | -| `--oidc-scopes-claim` | (none) | Passed to the server binary; enables scope enforcement | -| `--oidc-scopes` | (none) | Stored in gateway metadata; included in CLI token requests | - -The `--oidc-client-id` flag is **not** a server flag — it is stored in gateway metadata and used by the CLI during login. The `--oidc-audience` flag is both a server flag (for JWT validation) and stored in metadata (for token requests). - -### Helm Values - -```yaml -server: - oidc: - issuer: "https://keycloak.example.com/realms/openshell" - audience: "openshell-cli" - jwksTtl: 3600 - scopesClaim: "scope" # enable scope enforcement (Keycloak) -``` - -### Discovery Endpoint - -The server exposes `GET /auth/oidc-config` which returns the configured OIDC issuer and audience. This allows CLI auto-discovery during `gateway add`. - -## Provider Examples - -### Keycloak - -```bash -openshell gateway start \ - --oidc-issuer http://keycloak:8180/realms/openshell -# Defaults work: realm_access.roles, openshell-admin, openshell-user -``` - -### Microsoft Entra ID - -Register an app in Azure Portal with app roles `OpenShell.Admin` and `OpenShell.User`. With Entra ID the client ID (the SPA/public app registration) and audience (the API app registration, e.g. 
`api://openshell`) are typically different: - -```bash -openshell gateway start \ - --oidc-issuer https://login.microsoftonline.com/{tenant-id}/v2.0 \ - --oidc-audience api://openshell \ - --oidc-client-id {client-id} \ - --oidc-roles-claim roles \ - --oidc-admin-role OpenShell.Admin \ - --oidc-user-role OpenShell.User -``` - -CLI registration (separate client ID and audience): - -```bash -openshell gateway add https://gateway:8080 \ - --oidc-issuer https://login.microsoftonline.com/{tenant-id}/v2.0 \ - --oidc-client-id {client-id} \ - --oidc-audience api://openshell -``` - -### Okta - -Create an authorization server with a `groups` claim, then: - -```bash -openshell gateway start \ - --oidc-issuer https://dev-xxxxx.okta.com/oauth2/default \ - --oidc-roles-claim groups \ - --oidc-admin-role openshell-admin \ - --oidc-user-role openshell-user -``` - -### GitHub (Authentication Only) - -GitHub's OIDC tokens (from Actions) don't carry roles. Use empty role names to skip RBAC — any valid GitHub JWT is authorized: - -```bash -openshell gateway start \ - --oidc-issuer https://token.actions.githubusercontent.com \ - --oidc-audience https://github.com/{org} \ - --oidc-admin-role "" \ - --oidc-user-role "" -``` - -## CLI Commands - -### Register an OIDC Gateway - -```bash -openshell gateway add http://gateway:8080 \ - --oidc-issuer http://keycloak:8180/realms/openshell - -# With custom client ID: -openshell gateway add http://gateway:8080 \ - --oidc-issuer http://keycloak:8180/realms/openshell \ - --oidc-client-id my-client - -# With separate client ID and audience (e.g. 
Entra ID): -openshell gateway add http://gateway:8080 \ - --oidc-issuer https://login.microsoftonline.com/{tenant-id}/v2.0 \ - --oidc-client-id {client-id} \ - --oidc-audience api://openshell -``` - -### Start a K3s Gateway with OIDC - -```bash -openshell gateway start \ - --oidc-issuer http://keycloak:8180/realms/openshell \ - --plaintext - -# With RBAC configuration: -openshell gateway start \ - --oidc-issuer http://keycloak:8180/realms/openshell \ - --oidc-client-id openshell-cli \ - --oidc-roles-claim realm_access.roles \ - --oidc-admin-role openshell-admin \ - --oidc-user-role openshell-user -``` - -### Authenticate - -```bash -# Interactive (opens browser) -openshell gateway login -# Expected: ✓ Authenticated to gateway 'openshell' as admin@test - -# CI / automation -OPENSHELL_OIDC_CLIENT_SECRET=secret openshell gateway login -``` - -### Logout - -```bash -openshell gateway logout -# Expected: ✓ Logged out of gateway 'openshell' -``` - -## Keycloak Setup - -### Realm Configuration - -The `scripts/keycloak-realm.json` file provides a pre-configured realm for development: - -- **Realm**: `openshell` -- **Clients**: - - `openshell-cli` — Public client, Authorization Code + PKCE, redirect URIs `http://127.0.0.1:*` - - `openshell-ci` — Confidential client, Client Credentials grant, secret `ci-test-secret` -- **Roles**: `openshell-admin`, `openshell-user` -- **Test Users**: - - `admin@test` / `admin` (roles: `openshell-admin`, `openshell-user`) - - `user@test` / `user` (roles: `openshell-user`) - -### Dev Server - -```bash -# Start Keycloak on port 8180 -./scripts/keycloak-dev.sh start - -# Check status -./scripts/keycloak-dev.sh status - -# Stop -./scripts/keycloak-dev.sh stop -``` - -Admin console: `http://localhost:8180/admin` (admin/admin). - -## Coexistence with Other Auth Modes - -OIDC is additive — it does not replace mTLS or Cloudflare Access. 
When OIDC is configured, the `AuthGrpcRouter` processes requests through the three-category classification: - -``` -Request arrives - | - +-- Strip x-openshell-auth-source (anti-spoofing) - | - +-- OIDC not configured? --> Pass through (mTLS/plaintext fallback) - | - +-- Unauthenticated method? --> Pass through - | - +-- Sandbox-secret method? - | +-- Valid x-sandbox-secret --> Mark auth-source, pass through - | +-- Invalid/missing --> UNAUTHENTICATED - | - +-- Dual-auth method? - | +-- Valid x-sandbox-secret --> Mark auth-source, pass through - | +-- No sandbox secret --> Fall through to Bearer - | - +-- Has "authorization: Bearer" header? - | +-- Validate JWT --> Check RBAC --> Check scopes (if enabled) --> Authenticated (OIDC) - | +-- Invalid JWT --> UNAUTHENTICATED - | - +-- No bearer header --> UNAUTHENTICATED -``` - -The CLI determines which auth mode to use based on `auth_mode` in gateway metadata. Only one mode is active per gateway registration. - -## Key Files - -| Component | File | -|---|---| -| Server OIDC validation + method classification | `crates/openshell-server/src/oidc.rs` | -| Server auth middleware | `crates/openshell-server/src/multiplex.rs` (`AuthGrpcRouter`) | -| Server authorization (RBAC) | `crates/openshell-server/src/authz.rs` (`AuthzPolicy`) | -| Sandbox-secret scope enforcement | `crates/openshell-server/src/grpc/policy.rs` (`validate_sandbox_secret_update`) | -| Server config | `crates/openshell-core/src/config.rs` (`OidcConfig`) | -| Server CLI flags | `crates/openshell-server/src/main.rs` | -| Server discovery endpoint | `crates/openshell-server/src/auth.rs` (`/auth/oidc-config`) | -| CLI OIDC flows | `crates/openshell-cli/src/oidc_auth.rs` | -| CLI interceptor | `crates/openshell-cli/src/tls.rs` (`EdgeAuthInterceptor`) | -| CLI auth dispatch | `crates/openshell-cli/src/main.rs` (`apply_auth`) | -| CLI gateway commands | `crates/openshell-cli/src/run.rs` (`gateway_add`, `gateway_login`) | -| Token storage | 
`crates/openshell-bootstrap/src/oidc_token.rs` | -| Gateway metadata | `crates/openshell-bootstrap/src/metadata.rs` | -| Bootstrap pipeline | `crates/openshell-bootstrap/src/lib.rs`, `docker.rs` | -| K3s entrypoint | `deploy/docker/cluster-entrypoint.sh` | -| HelmChart template | `deploy/kube/manifests/openshell-helmchart.yaml` | -| Helm values | `deploy/helm/openshell/values.yaml` | -| Helm statefulset | `deploy/helm/openshell/templates/statefulset.yaml` | -| Keycloak dev script | `scripts/keycloak-dev.sh` | -| Keycloak realm config | `scripts/keycloak-realm.json` | diff --git a/architecture/oidc-local-testing.md b/architecture/oidc-local-testing.md deleted file mode 100644 index 160636a9e..000000000 --- a/architecture/oidc-local-testing.md +++ /dev/null @@ -1,575 +0,0 @@ -# OIDC Local Testing Guide - -Step-by-step instructions for testing OIDC/Keycloak authentication locally, -including both standalone server testing and full end-to-end K3s testing. - -## Prerequisites - -- Docker or Podman -- Rust toolchain (edition 2024, rust 1.88+) -- `grpcurl` (for raw gRPC testing) -- `jq` (for JSON parsing) - -## 1. Start Keycloak - -```bash -mise run keycloak -``` - -Wait for "Keycloak is ready." The script prints connection info including test users. - -Verify: - -```bash -curl -s http://localhost:8180/realms/openshell/.well-known/openid-configuration | jq .issuer -# Expected: "http://localhost:8180/realms/openshell" -``` - -## 2. Standalone Server Testing (No K3s) - -Start the server directly with OIDC enabled. No Kubernetes cluster required. - -```bash -cargo run -p openshell-server -- \ - --disable-tls \ - --db-url sqlite:/tmp/openshell-test.db \ - --ssh-handshake-secret test \ - --oidc-issuer http://localhost:8180/realms/openshell -``` - -You should see: - -``` -OIDC JWT validation enabled (issuer: http://localhost:8180/realms/openshell) -Server listening address=0.0.0.0:8080 -``` - -K8s compute driver warnings are expected and non-fatal. - -### 2a. 
Test Health (unauthenticated — should succeed) - -```bash -grpcurl -plaintext -import-path proto -proto openshell.proto \ - 127.0.0.1:8080 openshell.v1.OpenShell/Health -# Expected: SERVICE_STATUS_HEALTHY -``` - -### 2b. Test without token (should fail) - -```bash -grpcurl -plaintext -import-path proto -proto openshell.proto \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListSandboxes -# Expected: Code: Unauthenticated, Message: missing authorization header -``` - -### 2c. Get tokens from Keycloak - -```bash -ADMIN_TOKEN=$(curl -s -X POST http://localhost:8180/realms/openshell/protocol/openid-connect/token \ - -d 'grant_type=password&client_id=openshell-cli&username=admin@test&password=admin' \ - | jq -r .access_token) - -USER_TOKEN=$(curl -s -X POST http://localhost:8180/realms/openshell/protocol/openid-connect/token \ - -d 'grant_type=password&client_id=openshell-cli&username=user@test&password=user' \ - | jq -r .access_token) -``` - -### 2d. Test authenticated access - -```bash -# Admin can list sandboxes -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $ADMIN_TOKEN" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListSandboxes -# Expected: {} (empty list) - -# User can list sandboxes -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $USER_TOKEN" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListSandboxes -# Expected: {} (empty list) -``` - -### 2e. 
Test RBAC - -```bash -# User CANNOT create provider (requires openshell-admin) -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $USER_TOKEN" \ - -d '{"provider":{"name":"test","type":"claude","credentials":{"key":"val"}}}' \ - 127.0.0.1:8080 openshell.v1.OpenShell/CreateProvider -# Expected: Code: PermissionDenied, Message: role 'openshell-admin' required - -# Admin CAN create provider -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $ADMIN_TOKEN" \ - -d '{"provider":{"name":"test","type":"claude","credentials":{"key":"val"}}}' \ - 127.0.0.1:8080 openshell.v1.OpenShell/CreateProvider -# Expected: success -``` - -### 2f. Test sandbox secret auth - -```bash -# Correct secret — should succeed (returns an empty bundle when no routes are configured) -grpcurl -plaintext -import-path proto -proto inference.proto \ - -H "x-sandbox-secret: test" \ - 127.0.0.1:8080 openshell.inference.v1.Inference/GetInferenceBundle -# Expected: success with { "routes": [], ... } - -# Wrong secret — should fail at auth -grpcurl -plaintext -import-path proto -proto inference.proto \ - -H "x-sandbox-secret: wrong" \ - 127.0.0.1:8080 openshell.inference.v1.Inference/GetInferenceBundle -# Expected: Code: Unauthenticated, Message: invalid sandbox secret - -# No secret — should fail at auth -grpcurl -plaintext -import-path proto -proto inference.proto \ - 127.0.0.1:8080 openshell.inference.v1.Inference/GetInferenceBundle -# Expected: Code: Unauthenticated, Message: sandbox secret required -``` - -### 2g. Test OIDC discovery endpoint - -```bash -curl -s http://127.0.0.1:8080/auth/oidc-config | jq . -# Expected: {"audience":"openshell-cli","issuer":"http://localhost:8180/realms/openshell"} -``` - -Stop the standalone server (Ctrl+C) before proceeding to K3s testing. - -## 3. 
CLI OIDC Flow (Standalone) - -With the standalone server running from step 2: - -```bash -# Register the gateway with OIDC auth -cargo run -p openshell-cli --features bundled-z3 -- gateway add http://127.0.0.1:8080 \ - --oidc-issuer http://localhost:8180/realms/openshell - -# Browser opens to Keycloak. Login with: admin@test / admin -# Expected: ✓ Authenticated to gateway 'localhost' as admin@test - -# Verify stored token -cat ~/.config/openshell/gateways/127.0.0.1/oidc_token.json | jq . - -# Test authenticated CLI command -cargo run -p openshell-cli --features bundled-z3 -- sandbox list -``` - -### Test client credentials (CI mode) - -The CI client (`openshell-ci`) is separate from the interactive client (`openshell-cli`). -Register the gateway with the CI client ID first: - -```bash -cargo run -p openshell-cli --features bundled-z3 -- gateway add http://127.0.0.1:8080 \ - --oidc-issuer http://localhost:8180/realms/openshell \ - --oidc-client-id openshell-ci - -OPENSHELL_OIDC_CLIENT_SECRET=ci-test-secret \ -cargo run -p openshell-cli --features bundled-z3 -- gateway login -# Expected: ✓ Authenticated to gateway (no browser opened) -``` - -### Test logout - -```bash -cargo run -p openshell-cli --features bundled-z3 -- gateway logout -# Expected: ✓ Logged out of gateway - -cargo run -p openshell-cli --features bundled-z3 -- sandbox list -# Expected: error (no token) -``` - -## 4. End-to-End K3s Testing - -This deploys a full K3s cluster with OIDC enforcement and tests sandbox -creation, RBAC, login/logout, and token expiry. - -### 4a. Bootstrap the cluster with OIDC - -Keycloak runs on the host. The K3s container reaches it via the host IP. -The `OPENSHELL_OIDC_ISSUER` env var tells the deploy script to pass the -issuer to the Helm chart so the gateway starts with JWT validation enabled. 
- -```bash -HOST_IP=$(hostname -I | awk '{print $1}') -OPENSHELL_OIDC_ISSUER="http://${HOST_IP}:8180/realms/openshell" \ -OPENSHELL_OIDC_SCOPES="openshell:all" \ -mise run cluster -``` - -Add `OPENSHELL_OIDC_SCOPES_CLAIM="scope"` to also enable scope enforcement. -The `OPENSHELL_OIDC_SCOPES` value is stored in gateway metadata so `gateway login` -requests these scopes automatically. - -Wait for "Deploy complete!" and verify OIDC is active: - -```bash -CONTAINER=$(docker ps --format '{{.Names}}' | grep openshell-cluster) -docker exec $CONTAINER kubectl -n openshell logs openshell-0 | grep OIDC -# Expected: OIDC JWT validation enabled (issuer: http://...) -``` - -### 4b. Login to the gateway - -The bootstrap step above configures the gateway metadata with the OIDC -issuer automatically. Authenticate with Keycloak: - -```bash -openshell gateway login -# Login with: admin@test / admin -# Expected: ✓ Authenticated to gateway 'openshell' as admin@test -``` - -### 4c. Create and list sandboxes - -```bash -# Login as admin -openshell gateway login -# Login with: admin@test / admin -# Expected: ✓ Authenticated to gateway 'openshell' as admin@test - -# Create a sandbox -openshell sandbox create -# Expected: Created sandbox: - -# List sandboxes -openshell sandbox list -# Expected: shows the created sandbox -``` - -### 4d. Verify authentication enforcement - -```bash -# Logout -openshell gateway logout -# Expected: ✓ Logged out of gateway 'openshell' - -# Should fail without token -openshell sandbox list -# Expected: Unauthenticated error - -# Login again -openshell gateway login -# Login with: admin@test / admin - -# Should work again -openshell sandbox list -# Expected: shows sandboxes -``` - -### 4e. Verify token expiry - -Keycloak access tokens expire after 5 minutes by default. 
- -```bash -# Wait 5+ minutes, then: -openshell sandbox list -# Expected: Unauthenticated: ExpiredSignature - -# Re-login -openshell gateway login -openshell sandbox list -# Expected: success -``` - -### 4f. Verify RBAC - -```bash -# Login as admin -openshell gateway login -# Login with: admin@test / admin - -# Admin can create a provider -openshell provider create \ - --name test-provider --type claude --credential API_KEY=test123 -# Expected: success - -# Login as user (openshell-user only, no openshell-admin) -openshell gateway login -# Login with: user@test / user -# Expected: ✓ Authenticated to gateway 'openshell' as user@test - -# User can list sandboxes -openshell sandbox list -# Expected: success - -# User can list providers -openshell provider list -# Expected: shows test-provider - -# User CANNOT create a provider -openshell provider create \ - --name blocked --type claude --credential API_KEY=nope -# Expected: PermissionDenied: role 'openshell-admin' required - -# User CANNOT delete a provider -openshell provider delete test-provider -# Expected: PermissionDenied: role 'openshell-admin' required - -# User CAN create sandboxes -openshell sandbox create -# Expected: success -``` - -### 4g. Test client credentials (CI mode) - -The CI client uses `openshell-ci` (confidential) instead of `openshell-cli` (public). 
-Update the gateway metadata to use the CI client, then login: - -```bash -jq '.oidc_client_id = "openshell-ci"' \ - ~/.config/openshell/gateways/openshell/metadata.json > /tmp/meta.json \ - && mv /tmp/meta.json ~/.config/openshell/gateways/openshell/metadata.json - -OPENSHELL_OIDC_CLIENT_SECRET=ci-test-secret \ -openshell gateway login -# Expected: ✓ Authenticated to gateway 'openshell' (no browser) - -openshell sandbox list -# Expected: success - -# Restore interactive client for further testing -jq '.oidc_client_id = "openshell-cli"' \ - ~/.config/openshell/gateways/openshell/metadata.json > /tmp/meta.json \ - && mv /tmp/meta.json ~/.config/openshell/gateways/openshell/metadata.json -``` - -### 4h. Clean up sandboxes - -```bash -# Login as admin to clean up -openshell gateway login -# Login with: admin@test / admin - -openshell sandbox list -# Note sandbox names, then: -openshell sandbox delete - -openshell provider delete test-provider -``` - -## 5. Scope-Based Permissions Testing - -Scopes provide fine-grained, per-method access control on top of roles. This section tests scope enforcement using both the standalone server and K3s. - -### 5a. Standalone server with scope enforcement - -```bash -cargo run -p openshell-server -- \ - --disable-tls \ - --db-url sqlite:/tmp/openshell-scopes-test.db \ - --ssh-handshake-secret test \ - --oidc-issuer http://localhost:8180/realms/openshell \ - --oidc-scopes-claim scope -``` - -### 5b. 
Get tokens with specific scopes - -```bash -# Token with sandbox scopes only -TOKEN_SANDBOX=$(curl -s -X POST http://localhost:8180/realms/openshell/protocol/openid-connect/token \ - -d 'grant_type=password&client_id=openshell-cli&username=admin@test&password=admin' \ - -d 'scope=openid sandbox:read sandbox:write' \ - | jq -r .access_token) - -# Token with all scopes -TOKEN_ALL=$(curl -s -X POST http://localhost:8180/realms/openshell/protocol/openid-connect/token \ - -d 'grant_type=password&client_id=openshell-cli&username=admin@test&password=admin' \ - -d 'scope=openid openshell:all' \ - | jq -r .access_token) - -# Token without OpenShell scopes (roles-only) -TOKEN_NO_SCOPES=$(curl -s -X POST http://localhost:8180/realms/openshell/protocol/openid-connect/token \ - -d 'grant_type=password&client_id=openshell-cli&username=admin@test&password=admin' \ - | jq -r .access_token) -``` - -### 5c. Inspect tokens - -```bash -# Verify scopes are in the JWT -echo "$TOKEN_SANDBOX" | cut -d. -f2 | base64 -d 2>/dev/null | jq '{scope, realm_access, preferred_username}' -# Expected: scope contains "sandbox:read sandbox:write", realm_access has roles, preferred_username is set - -echo "$TOKEN_NO_SCOPES" | cut -d. -f2 | base64 -d 2>/dev/null | jq '.scope' -# Expected: "openid email profile" (no OpenShell scopes) -``` - -### 5d. 
Test scope enforcement with grpcurl - -```bash -# Sandbox-scoped token — ListSandboxes should work -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $TOKEN_SANDBOX" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListSandboxes -# Expected: success (empty list) - -# Sandbox-scoped token — ListProviders should FAIL -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $TOKEN_SANDBOX" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListProviders -# Expected: PermissionDenied: scope 'provider:read' required - -# openshell:all token — everything works -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $TOKEN_ALL" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListProviders -# Expected: success - -# No-scopes token — denied -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $TOKEN_NO_SCOPES" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListSandboxes -# Expected: PermissionDenied: scope 'sandbox:read' required -``` - -### 5e. Test CLI with scopes - -Stop the standalone server. Register a gateway with scopes: - -```bash -openshell gateway add http://127.0.0.1:8080 \ - --oidc-issuer http://localhost:8180/realms/openshell \ - --oidc-scopes "sandbox:read sandbox:write" -``` - -Or for K3s testing, pass `OPENSHELL_OIDC_SCOPES` during bootstrap: - -```bash -HOST_IP=$(hostname -I | awk '{print $1}') -OPENSHELL_OIDC_ISSUER="http://${HOST_IP}:8180/realms/openshell" \ -OPENSHELL_OIDC_SCOPES_CLAIM="scope" \ -OPENSHELL_OIDC_SCOPES="sandbox:read sandbox:write" \ -mise run cluster -``` - -Then login and test: - -```bash -openshell gateway login -# Login with: admin@test / admin - -openshell sandbox list # should work (has sandbox:read) -openshell provider list # should fail (no provider:read scope) -``` - -### 5f. 
Test openshell:all via CLI - -For K3s, restart the cluster with `openshell:all`: - -```bash -mise run cluster:stop -HOST_IP=$(hostname -I | awk '{print $1}') -OPENSHELL_OIDC_ISSUER="http://${HOST_IP}:8180/realms/openshell" \ -OPENSHELL_OIDC_SCOPES_CLAIM="scope" \ -OPENSHELL_OIDC_SCOPES="openshell:all" \ -mise run cluster - -openshell gateway login -openshell sandbox list # should work -openshell provider list # should work -``` - -### 5g. Test CI client credentials with scopes - -```bash -OPENSHELL_OIDC_CLIENT_SECRET=ci-test-secret openshell gateway login -# openshell-ci has openshell:all as a default scope - -openshell sandbox list # should work -openshell provider list # should work -``` - -### 5h. Test without scope enforcement (default behavior preserved) - -Restart the server WITHOUT `--oidc-scopes-claim`: - -```bash -cargo run -p openshell-server -- \ - --disable-tls \ - --db-url sqlite:/tmp/openshell-noscopes-test.db \ - --ssh-handshake-secret test \ - --oidc-issuer http://localhost:8180/realms/openshell -``` - -```bash -# Token without scopes should work (roles-only mode) -grpcurl -plaintext -import-path proto -proto openshell.proto \ - -H "authorization: Bearer $TOKEN_NO_SCOPES" \ - 127.0.0.1:8080 openshell.v1.OpenShell/ListSandboxes -# Expected: success — scopes are not enforced -``` - -## 6. 
Cleanup - -```bash -# Stop the cluster -mise run cluster:stop - -# Stop Keycloak -mise run keycloak:stop -``` - -## Test Users - -| Username | Password | Roles | -|---|---|---| -| `admin@test` | `admin` | `openshell-admin`, `openshell-user` | -| `user@test` | `user` | `openshell-user` | - -## OIDC Clients - -| Client ID | Type | Grant | Secret | -|---|---|---|---| -| `openshell-cli` | Public | Auth Code + PKCE | N/A | -| `openshell-ci` | Confidential | Client Credentials | `ci-test-secret` | - -## Method Authentication Categories - -| Category | Methods | Auth Mechanism | -|---|---|---| -| Unauthenticated | Health, gRPC reflection | None | -| Sandbox-secret | GetSandboxConfig, GetSandboxProviderEnvironment, ReportPolicyStatus, PushSandboxLogs, SubmitPolicyAnalysis | `x-sandbox-secret` header | -| Dual-auth | UpdateConfig | Bearer token OR `x-sandbox-secret` | -| OIDC Bearer | All other RPCs | `authorization: Bearer <token>` | - -## Role Requirements - -| Operation | Required Role | -|---|---| -| Sandbox create, list, delete, exec, SSH | `openshell-user` | -| Provider list, get | `openshell-user` | -| Provider create, update, delete | `openshell-admin` | -| Global config/policy updates | `openshell-admin` | -| Draft policy approvals | `openshell-admin` | - -## Troubleshooting - -**"missing authorization header"** — No OIDC token stored. Run `openshell gateway login`. - -**"invalid token: ExpiredSignature"** — Token expired (default 5 min). Run `openshell gateway login`. - -**"PermissionDenied: role 'openshell-admin' required"** — Logged in as a user without the admin role. Login as `admin@test`. - -**"sandbox secret required for this method"** — A sandbox-to-server RPC was called without the `x-sandbox-secret` header. - -**"OIDC discovery request failed"** — Server can't reach Keycloak. Use the host IP (not `localhost`) for K3s deployments. - -**"invalid token: unknown signing key"** — JWKS key mismatch. Restart the server to refresh the cache.
- -**No "OIDC JWT validation enabled" in K3s logs** — The `OPENSHELL_OIDC_ISSUER` env var was not set when deploying. Re-run `OPENSHELL_OIDC_ISSUER="http://<host-ip>:8180/realms/openshell" mise run cluster gateway` to rebuild and redeploy with OIDC enabled. - -**"InvalidIssuer"** — The issuer URL in the OIDC token does not match the server's configured issuer. Ensure the gateway metadata `oidc_issuer` uses the same URL the server was started with (typically the host IP, not `localhost`). - -**"connection refused" with grpcurl** — On Fedora/systems where `localhost` resolves to IPv6, use `127.0.0.1` instead of `localhost`. - -**"no such table: objects"** — Using `sqlite::memory:` which doesn't run migrations. Use a file path like `sqlite:/tmp/openshell-test.db`. - -**"scope 'X' required"** — The server has `--oidc-scopes-claim` enabled and the token is missing the required scope. Either request the scope during login (`--oidc-scopes "sandbox:read sandbox:write"`) or use `openshell:all` for full access. - -**Token has scopes but server doesn't enforce them** — The server was started without `--oidc-scopes-claim`. Add `--oidc-scopes-claim scope` (for Keycloak) to enable enforcement. - -**Scopes missing from token after Keycloak login** — The browser may have reused an old Keycloak session with the previous scope set. Sign out at `http://localhost:8180/realms/openshell/account/#/` and re-run `openshell gateway login`. diff --git a/architecture/podman-driver.md b/architecture/podman-driver.md deleted file mode 100644 index c6fcfdb0d..000000000 --- a/architecture/podman-driver.md +++ /dev/null @@ -1,271 +0,0 @@ -# Podman Compute Driver - -The Podman compute driver manages sandbox containers via the Podman REST API over a Unix socket. It targets single-machine and developer environments where rootless container isolation is preferred over a full Kubernetes cluster.
The driver runs in-process within the gateway server and delegates all sandbox isolation enforcement to the `openshell-sandbox` supervisor binary, which is sideloaded into each container via an OCI image volume mount. - -## Source File Index - -All paths are relative to `crates/openshell-driver-podman/src/`. - -| File | Purpose | -|------|---------| -| `lib.rs` | Crate root; declares modules and re-exports `PodmanComputeConfig`, `PodmanComputeDriver`, `ComputeDriverService` | -| `main.rs` | Standalone binary entrypoint; parses CLI args/env vars, constructs the driver, starts a gRPC server with graceful shutdown | -| `driver.rs` | Core `PodmanComputeDriver` -- sandbox lifecycle (create/stop/delete/list/get), endpoint resolution, GPU detection, rootless pre-flight checks | -| `client.rs` | `PodmanClient` -- async HTTP/1.1 client over Unix socket for the Podman libpod REST API (containers, volumes, networks, secrets, images, events, system info) | -| `container.rs` | Container spec construction -- labels, env vars, resource limits, capabilities, seccomp config, health checks, port mappings, image volumes, secret injection | -| `config.rs` | `PodmanComputeConfig` struct, `ImagePullPolicy` enum, default socket path resolution, `Debug` impl that redacts secrets | -| `grpc.rs` | `ComputeDriverService` -- tonic gRPC service mapping RPCs to driver methods, with error-to-Status conversion | -| `watcher.rs` | Watch stream -- initial state sync via container list, then live Podman events mapped to `WatchSandboxesEvent` protobuf messages | - -## Architecture - -The Podman driver is one of three `ComputeDriver` implementations. It communicates with the Podman daemon over a Unix socket and delegates sandbox isolation to the supervisor binary running inside each container. - -```mermaid -graph TB - CLI["openshell CLI"] -->|gRPC| GW["Gateway Server<br/>(openshell-server)"] - GW -->|in-process| PD["PodmanComputeDriver"] - PD -->|HTTP/1.1<br/>Unix socket| PA["Podman API"] - PA -->|OCI runtime<br/>crun/runc| C["Sandbox Container"] - C -->|image volume<br/>read-only| SV["Supervisor Binary<br/>/opt/openshell/bin/openshell-sandbox"] - SV -->|creates| NS["Nested Network Namespace<br/>veth pair + proxy"] - SV -->|enforces| LL["Landlock + seccomp"] - SV -->|gRPC callback| GW -``` - -### Driver Comparison - -| Aspect | Kubernetes | VM (libkrun) | Podman | -|--------|-----------|--------------|--------| -| Execution model | In-process | Standalone subprocess (gRPC over UDS) | In-process | -| Backend | K8s API (CRD + controller) | libkrun hypervisor (KVM/HVF) | Podman REST API (Unix socket) | -| Isolation boundary | Container (supervisor inside pod) | Hardware VM | Container (supervisor inside container) | -| Supervisor delivery | hostPath volume (read-only) | Embedded in rootfs tarball | OCI image volume (read-only) | -| Network model | Supervisor creates netns inside pod | gvproxy virtio-net (192.168.127.0/24) | Supervisor creates netns inside container | -| Credential injection | Plaintext env var + K8s Secret volume (0400) | Rootfs file copy (0600) + env vars | Podman `secret_env` API + env vars | -| GPU support | Yes (nvidia.com/gpu resource) | No | Yes (CDI device) | -| `stop_sandbox` | Unimplemented | Unimplemented | Implemented (graceful stop) | -| State storage | Kubernetes API (CRD) | In-memory HashMap + filesystem | Podman daemon (container state) | -| Endpoint resolution | Pod IP / cluster DNS | 127.0.0.1 + allocated port | 127.0.0.1 + ephemeral port | - -## Isolation Model - -The Podman driver provides the same four protection layers as the Kubernetes driver. The driver itself does not implement isolation primitives directly -- it configures the container so that the `openshell-sandbox` supervisor binary can enforce them at runtime.
- -### Container Security Configuration - -The container spec (`container.rs`) sets: - -| Setting | Value | Rationale | -|---------|-------|-----------| -| `user` | `0:0` | Supervisor needs root for namespace creation, proxy setup, Landlock/seccomp | -| `cap_drop` | `ALL` | Drop all capabilities, then selectively add back | -| `cap_add` | `SYS_ADMIN, NET_ADMIN, SYS_PTRACE, SYSLOG, SETUID, SETGID, DAC_READ_SEARCH` | See capability breakdown below | -| `no_new_privileges` | `true` | Prevent privilege escalation after exec | -| `seccomp_profile_path` | `unconfined` | Supervisor installs its own policy-aware BPF filter; container-level profile would block Landlock/seccomp syscalls during setup | - -### Capability Breakdown - -The Kubernetes driver uses 4 capabilities (`SYS_ADMIN, NET_ADMIN, SYS_PTRACE, SYSLOG`). The Podman driver adds 3 more, all required for rootless operation: - -| Capability | Shared with K8s? | Purpose | -|------------|------------------|---------| -| `SYS_ADMIN` | Yes | seccomp filter installation, namespace creation, Landlock | -| `NET_ADMIN` | Yes | Network namespace veth setup, IP/route configuration | -| `SYS_PTRACE` | Yes | Reading `/proc/<pid>/exe` and ancestor walk for binary identity | -| `SYSLOG` | Yes | Reading `/dev/kmsg` for bypass-detection diagnostics | -| `SETUID` | Podman only | `drop_privileges()` calls `setuid()` to sandbox user. Rootless `cap_drop: ALL` removes this from the bounding set. | -| `SETGID` | Podman only | `drop_privileges()` calls `setgid()` + `initgroups()`. Same rootless reason as SETUID. | -| `DAC_READ_SEARCH` | Podman only | Proxy reads `/proc/<pid>/fd/` across UIDs for binary identity. In rootless Podman, supervisor (UID 0 in user namespace) and sandbox processes have different UIDs. | - -In the Kubernetes driver, these three capabilities are implicitly available because the kubelet does not drop them from the bounding set. In rootless Podman, `cap_drop: ALL` removes everything, requiring explicit re-addition.
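A quick way to sanity-check which of these capabilities a process actually holds is to decode the `CapEff` bitmask from `/proc/<pid>/status`. A minimal illustrative sketch (Python; the bit numbers come from `linux/capability.h` and are not part of this codebase):

```python
# Map capability bit numbers (linux/capability.h) to the names used above.
CAPS = {
    2: "DAC_READ_SEARCH",
    6: "SETGID",
    7: "SETUID",
    12: "NET_ADMIN",
    19: "SYS_PTRACE",
    21: "SYS_ADMIN",
    34: "SYSLOG",
}

def decode_caps(hex_mask):
    """Decode a CapEff hex string into the subset of names we track."""
    mask = int(hex_mask, 16)
    return {name for bit, name in CAPS.items() if mask & (1 << bit)}

# 0xc0 sets only bits 6 and 7, i.e. SETGID + SETUID:
print(sorted(decode_caps("c0")))  # ['SETGID', 'SETUID']
```

On a live system the mask would come from `grep CapEff /proc/self/status`.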
- -All capabilities are only available to the supervisor process. Sandbox child processes lose them after `setuid()` to the sandbox user in the `pre_exec` hook. - -## Supervisor Sideloading - -The supervisor binary is delivered to sandbox containers via Podman's OCI image volume mechanism, distinct from both the Kubernetes hostPath approach and the VM's embedded rootfs. - -```mermaid -sequenceDiagram - participant D as PodmanComputeDriver - participant P as Podman API - participant C as Sandbox Container - - D->>P: pull_image("openshell/supervisor:latest", "missing") - D->>P: create_container(spec with image_volumes) - Note over P: Podman resolves image_volumes at<br/>libpod layer before OCI spec generation - P->>C: Mount supervisor image at /opt/openshell/bin (read-only) - D->>P: start_container - C->>C: entrypoint: /opt/openshell/bin/openshell-sandbox -``` - -The supervisor image is a `FROM scratch` image containing only the prebuilt `openshell-sandbox` binary. It is built by the `supervisor` target in `deploy/docker/Dockerfile.images`. The `image_volumes` field in the container spec mounts this image's filesystem at `/opt/openshell/bin` with `rw: false`, making it a read-only overlay that the sandbox cannot tamper with. - -## TLS - -When the Podman driver's TLS configuration is set (`tls_ca`, `tls_cert`, `tls_key` in `PodmanComputeConfig`), the driver: - -1. Switches the auto-detected endpoint scheme from `http://` to `https://` -2. Bind-mounts the client cert files (read-only) into the container at `/etc/openshell/tls/client/` -3. Sets `OPENSHELL_TLS_CA`, `OPENSHELL_TLS_CERT`, `OPENSHELL_TLS_KEY` env vars pointing to the container-side paths - -The supervisor reads these env vars and uses them to establish an mTLS connection back to the gateway. - -The RPM packaging auto-generates a self-signed PKI on first start via `init-pki.sh`. Client certs are placed in the CLI auto-discovery directory (`~/.config/openshell/gateways/openshell/mtls/`) so the CLI connects with mTLS without manual configuration. See `deploy/rpm/CONFIGURATION.md` for the full RPM configuration reference and `deploy/rpm/QUICKSTART.md` for the quick start guide. - -## Network Model - -Sandbox network isolation uses a two-layer approach: a Podman bridge network for container-to-host communication, and a nested network namespace (created by the supervisor) for sandbox process isolation. - -```mermaid -graph TB - subgraph Host - GW["Gateway Server<br/>127.0.0.1:8080"] - PS["Podman Socket"] - end - - subgraph Bridge["Podman Bridge Network (10.89.x.x)"] - subgraph Container["Sandbox Container"] - SV["Supervisor<br/>(root in user ns)"] - subgraph NestedNS["Nested Network Namespace"] - SP["Sandbox Process<br/>(sandbox user)"] - VE2["veth1: 10.200.0.2"] - end - VE1["veth0: 10.200.0.1<br/>(CONNECT proxy)"] - SV --- VE1 - VE1 ---|veth pair| VE2 - end - end - - GW -.->|SSH via supervisor relay<br/>gRPC session| SV - SV -->|gRPC callback via<br/>host.containers.internal| GW - SP -->|all egress via proxy| VE1 -``` - -Key points: - -- **Bridge network**: Created by `client.ensure_network()` with DNS enabled. Containers on the bridge can see each other at L3, but sandbox processes cannot because they are isolated inside the nested netns. -- **Nested netns**: The supervisor creates a private `NetworkNamespace` with a veth pair (10.200.0.1/24 <-> 10.200.0.2/24). Sandbox processes enter this netns via `setns(fd, CLONE_NEWNET)` in the `pre_exec` hook, forcing all traffic through the CONNECT proxy. - -- **Port publishing**: SSH uses `host_port: 0` (ephemeral port assignment) for health checks and debug access. The gateway SSH tunnel uses the supervisor relay (`supervisor_sessions.open_relay()`) rather than connecting directly to the published port. -- **Host gateway**: `host.containers.internal:host-gateway` in `/etc/hosts` allows containers to reach the gateway server on the host. -- **nsenter**: The supervisor uses `nsenter --net=<path>` instead of `ip netns exec` for namespace operations, avoiding the sysfs remount that fails in rootless containers. - -## Supervisor relay (SSH Unix socket) - -Podman now follows the same end-to-end contract as the Kubernetes and VM drivers for the in-container SSH relay: **gateway config → `PodmanComputeConfig` → sandbox environment → supervisor session registration on that path**. - -1. `openshell-core` `Config::sandbox_ssh_socket_path` (gateway YAML / defaults) is copied into `PodmanComputeConfig::sandbox_ssh_socket_path` when the gateway builds the in-process driver (`crates/openshell-server/src/lib.rs`, `ComputeDriverKind::Podman`). -2. `build_env()` in `container.rs` sets `OPENSHELL_SSH_SOCKET_PATH` to that value, alongside required vars `OPENSHELL_ENDPOINT` and `OPENSHELL_SANDBOX_ID` (and `OPENSHELL_SANDBOX`, etc.). These driver-controlled entries overwrite user template env to prevent spoofing. -3.
The supervisor reads `OPENSHELL_SSH_SOCKET_PATH` and uses it for the Unix socket the gateway’s SSH stack bridges to. The standalone `openshell-driver-podman` binary sets the same struct field from `OPENSHELL_SANDBOX_SSH_SOCKET_PATH` (`main.rs`). - -## Credential Injection - -The SSH handshake secret is injected via Podman's `secret_env` API rather than a plaintext environment variable. - -| Credential | Mechanism | Visible in `inspect`? | Visible in `/proc/<pid>/environ`? | -|------------|-----------|----------------------|----------------------------------| -| SSH handshake secret | Podman `secret_env` (created via secrets API, referenced by name) | No | Yes (supervisor only; scrubbed from children) | -| Sandbox identity (`OPENSHELL_SANDBOX_ID`, etc.) | Plaintext env var | Yes | Yes | -| gRPC endpoint (`OPENSHELL_ENDPOINT`) | Plaintext env var, override-protected | Yes | Yes | -| Supervisor relay socket path (`OPENSHELL_SSH_SOCKET_PATH`) | Plaintext env var, override-protected (same value as `PodmanComputeConfig::sandbox_ssh_socket_path`) | Yes | Yes | - -The `build_env()` function in `container.rs` inserts user-supplied variables first, then unconditionally overwrites all security-critical variables to prevent spoofing via sandbox templates: `OPENSHELL_SANDBOX`, `OPENSHELL_SANDBOX_ID`, `OPENSHELL_ENDPOINT`, `OPENSHELL_SSH_SOCKET_PATH`, `OPENSHELL_SSH_HANDSHAKE_SKEW_SECS`, `OPENSHELL_CONTAINER_IMAGE`, `OPENSHELL_SANDBOX_COMMAND`. - -The `PodmanComputeConfig::Debug` impl redacts the handshake secret as `[REDACTED]`.
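The override behavior described above reduces to: insert the user template env first, then let driver-controlled keys win unconditionally. An illustrative Python sketch of that merge order (not the actual `build_env()` in `container.rs`, which is Rust):

```python
# Driver-controlled keys that must never be spoofable from a sandbox template.
PROTECTED = [
    "OPENSHELL_SANDBOX", "OPENSHELL_SANDBOX_ID", "OPENSHELL_ENDPOINT",
    "OPENSHELL_SSH_SOCKET_PATH", "OPENSHELL_SSH_HANDSHAKE_SKEW_SECS",
    "OPENSHELL_CONTAINER_IMAGE", "OPENSHELL_SANDBOX_COMMAND",
]

def build_env(user_env, driver_env):
    """User vars first, then driver values overwrite the protected keys."""
    env = dict(user_env)
    for key in PROTECTED:
        if key in driver_env:
            env[key] = driver_env[key]
    return env

env = build_env(
    {"PATH": "/usr/bin", "OPENSHELL_ENDPOINT": "http://attacker:1"},  # spoof attempt
    {"OPENSHELL_ENDPOINT": "http://host.containers.internal:8080"},
)
print(env["OPENSHELL_ENDPOINT"])  # http://host.containers.internal:8080
```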
- -## Sandbox Lifecycle - -### Creation Flow - -```mermaid -sequenceDiagram - participant GW as Gateway - participant D as PodmanComputeDriver - participant P as Podman API - - GW->>D: create_sandbox(DriverSandbox) - D->>D: validate name + id - D->>D: validated_container_name() - - D->>P: pull_image(supervisor, "missing") - D->>P: pull_image(sandbox_image, policy) - - D->>P: create_secret(handshake) - Note over D: On failure below, rollback secret - - D->>P: create_volume(workspace) - Note over D: On failure below, rollback volume + secret - - D->>P: create_container(spec) - alt Conflict (409) - D->>P: remove_volume + remove_secret - D-->>GW: AlreadyExists - end - Note over D: On failure below, rollback container + volume + secret - - D->>P: start_container - D-->>GW: Ok -``` - -Each step rolls back all previously-created resources on failure. The Conflict path (409 from container creation) cleans up the volume and secret because they are keyed by the new sandbox's ID, not the conflicting container's. - -### Readiness and health - -The container `healthconfig` in `container.rs` marks the sandbox healthy when **any** of these signals succeeds: legacy file marker `/var/run/openshell-ssh-ready`, **or** `test -S` on the configured supervisor Unix socket path (`sandbox_ssh_socket_path` / `OPENSHELL_SSH_SOCKET_PATH`), **or** the prior TCP check (`ss` listening on the in-container SSH port). That allows relay-only readiness when the supervisor exposes the socket without the old marker or published-port signal. - -### Deletion Flow - -1. Validate `sandbox_name` and stable `sandbox_id` from `DeleteSandboxRequest` -2. Best-effort inspect cross-checks the container label when present, but cleanup remains keyed by the request `sandbox_id` -3. Best-effort stop (result ignored) -4. Force-remove container (`?force=true&v=true`) -5. Remove workspace volume derived from the request `sandbox_id` (warn on failure, continue) -6. 
Remove handshake secret derived from the request `sandbox_id` (warn on failure, continue) - -If the container is already gone during inspect or remove, the driver still performs idempotent volume/secret cleanup using the request `sandbox_id` and returns `Ok(false)` for the container-delete result. This prevents leaked Podman resources after out-of-band container removal or label drift. - -## Configuration - -| Environment Variable | CLI Flag | Default | Description | -|---------------------|----------|---------|-------------| -| `OPENSHELL_PODMAN_SOCKET` | `--podman-socket` | `$XDG_RUNTIME_DIR/podman/podman.sock` (Linux), `$HOME/.local/share/containers/podman/machine/podman.sock` (macOS) | Podman API Unix socket path | -| `OPENSHELL_SANDBOX_IMAGE` | `--sandbox-image` | (from gateway config) | Default OCI image for sandboxes | -| `OPENSHELL_SANDBOX_IMAGE_PULL_POLICY` | `--sandbox-image-pull-policy` | `missing` | Pull policy: `always`, `missing`, `never`, `newer` | -| `OPENSHELL_GRPC_ENDPOINT` | `--grpc-endpoint` | Auto-detected via `host.containers.internal` | Gateway gRPC endpoint for sandbox callbacks | -| `OPENSHELL_NETWORK_NAME` | `--network-name` | `openshell` | Podman bridge network name | -| `OPENSHELL_SANDBOX_SSH_PORT` | `--sandbox-ssh-port` | `2222` | SSH port inside the container | -| `OPENSHELL_SSH_HANDSHAKE_SECRET` | `--ssh-handshake-secret` | (required) | Shared secret for NSSH1 handshake | -| `OPENSHELL_SANDBOX_SSH_SOCKET_PATH` | `--sandbox-ssh-socket-path` | `/run/openshell/ssh.sock` | Standalone driver only: supervisor Unix socket path in `PodmanComputeConfig` (in-gateway Podman uses server `config.sandbox_ssh_socket_path`) | -| `OPENSHELL_STOP_TIMEOUT` | `--stop-timeout` | `10` | Container stop timeout in seconds (SIGTERM -> SIGKILL) | -| `OPENSHELL_SUPERVISOR_IMAGE` | `--supervisor-image` | `openshell/supervisor:latest` (struct default; standalone binary requires explicit value) | OCI image containing the supervisor binary | - -## Rootless-Specific 
Adaptations - -The Podman driver is designed for rootless operation. The following adaptations were made compared to the Kubernetes driver: - -1. **subuid/subgid pre-flight check**: `check_subuid_range()` in `driver.rs` warns operators if `/etc/subuid` or `/etc/subgid` entries are missing for the current user. Not a hard error because some systems use LDAP or other mechanisms. - -2. **cgroups v2 requirement**: The driver refuses to start if cgroups v1 is detected. Rootless Podman requires the unified cgroup hierarchy. - -3. **nsenter for namespace operations**: `run_ip_netns()` and `run_iptables_netns()` in `crates/openshell-sandbox/src/sandbox/linux/netns.rs` use `nsenter --net=<path>` instead of `ip netns exec` to avoid the sysfs remount that requires real `CAP_SYS_ADMIN` in the host user namespace. - -4. **DAC_READ_SEARCH capability**: Required for the proxy to read `/proc/<pid>/fd/` across UIDs within the user namespace. - -5. **SETUID/SETGID capabilities**: Required for `drop_privileges()` to call `setuid()`/`setgid()` after `cap_drop: ALL` removes them from the bounding set. - -6. **host.containers.internal**: Used instead of Docker's `host.docker.internal` for container-to-host communication. Injected via `hostadd` with Podman's `host-gateway` magic value. - -7. **Ephemeral port publishing**: SSH port uses `host_port: 0` because the bridge network IP (10.89.x.x) is not routable from the host in rootless mode. The published port is used for health checks and debug access; the gateway SSH tunnel uses the supervisor relay. - -8. **tmpfs at `/run/netns`**: A private tmpfs is mounted so the supervisor can create named network namespaces via `ip netns add`, which requires `/run/netns` to exist and be writable.
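The subuid/subgid pre-flight check (adaptation 1) amounts to parsing `/etc/subuid`-style `user:start:count` lines for an entry matching the current user. A hedged sketch of that parse, not the actual `check_subuid_range()`:

```python
def has_subid_entry(contents, user):
    """Return True if any /etc/subuid-style line grants `user` a range."""
    for line in contents.splitlines():
        parts = line.strip().split(":")
        # Expected format: user:start:count
        if len(parts) == 3 and parts[0] == user:
            return True
    return False

sample = "alice:100000:65536\nbob:165536:65536\n"
print(has_subid_entry(sample, "alice"))  # True
print(has_subid_entry(sample, "carol"))  # False
```

A real check would read both `/etc/subuid` and `/etc/subgid` and only warn on a miss, since LDAP-backed systems bypass these files.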
- -## Implementation References - -- Gateway integration: `crates/openshell-server/src/compute/mod.rs` (`new_podman` and `PodmanComputeDriver` wiring) -- Server configuration: `crates/openshell-server/src/lib.rs` (`ComputeDriverKind::Podman` — builds `PodmanComputeConfig` including `sandbox_ssh_socket_path` from gateway `Config`) -- Gateway relay path: `openshell-core` `Config::sandbox_ssh_socket_path` in `crates/openshell-core/src/config.rs` -- SSRF mitigation: `crates/openshell-core/src/net.rs` (IP classification: `is_always_blocked_ip`, `is_internal_ip`), `crates/openshell-sandbox/src/proxy.rs` (runtime enforcement on CONNECT/forward proxy), `crates/openshell-server/src/grpc/policy.rs` (load-time validation via `validate_rule_not_always_blocked`) -- Sandbox supervisor: `crates/openshell-sandbox/src/` (Landlock, seccomp, netns, proxy -- shared by all drivers) -- Container engine abstraction: `tasks/scripts/container-engine.sh` (build/deploy support for Docker and Podman) -- Supervisor image build: `deploy/docker/Dockerfile.images` (`supervisor-output` target) diff --git a/architecture/podman-rootless-networking.md b/architecture/podman-rootless-networking.md deleted file mode 100644 index 99d162669..000000000 --- a/architecture/podman-rootless-networking.md +++ /dev/null @@ -1,387 +0,0 @@ -# Rootless Podman Networking - -Deep-dive into how networking works in the Podman compute driver when running rootless with pasta as the network backend. Covers the external tooling (Podman, Netavark, pasta, aardvark-dns), the three nested namespace layers, and the complete data paths for SSH, outbound traffic, and supervisor-to-gateway communication. - -For the general Podman driver architecture (lifecycle, API surface, driver comparison), see [podman-driver.md](podman-driver.md). 
- -## Component Stack - -Podman's networking is composed of four independent projects: - -| Component | Language | Role | -|-----------|----------|------| -| **Podman** | Go | Container runtime; orchestrates network lifecycle | -| **Netavark** | Rust | Network backend; creates interfaces, bridges, firewall rules | -| **aardvark-dns** | Rust | Authoritative DNS server for container name resolution (A/AAAA records) | -| **pasta** (part of passt) | C | User-mode networking; L2-to-L4 socket translation for rootless containers | - -The key split: rootful containers default to Netavark (bridge networking with real kernel interfaces), while rootless containers default to pasta (user-mode networking, no privileges needed). - -## How Netavark Works (Rootful) - -Netavark is invoked by Podman as an external binary. It reads a JSON network configuration from STDIN and executes one of three commands: - -- `netavark setup <netns-path>` -- creates interfaces, assigns IPs, sets up firewall rules for NAT/port-forwarding -- `netavark teardown <netns-path>` -- reverses setup; removes interfaces and firewall rules -- `netavark create` -- takes a partial network config and completes it (assigns subnets, gateways) - -For rootful bridge networking: - -1. Podman creates a network namespace for the container -2. Podman invokes `netavark setup` passing the network config JSON -3. Netavark creates a bridge (e.g., `podman0`) if it doesn't exist -- default subnet is `10.88.0.0/16` -4. Netavark creates a veth pair -- one end goes into the container's netns, the other attaches to the bridge -5. Netavark assigns an IP from the subnet to the container's veth interface (host-local IPAM) -6. Netavark configures iptables/nftables rules -- masquerade for outbound, DNAT for port mappings -7.
Netavark starts aardvark-dns if DNS is enabled, listening on the bridge gateway address - -```text -Host Kernel - | - +-- Bridge interface (e.g., "podman0") <-- created by Netavark - | | - | +-- veth pair endpoint (host side, container 1) - | +-- veth pair endpoint (host side, container 2) - | - +-- Host physical interface (e.g., eth0) - | - +-- NAT (iptables/nftables rules managed by Netavark) -``` - -Netavark also supports macvlan networks (container gets a sub-interface of a physical host NIC with its own MAC, appearing directly on the physical network) and external plugins via a documented JSON API. - -## How Pasta Works (Rootless) - -### The Problem - -Unprivileged users cannot create network interfaces on the host. They cannot create veth pairs, bridges, or configure iptables rules. Netavark's bridge approach cannot work directly for rootless containers. - -### The Solution - -Pasta (part of the `passt` project -- same binary, different command name) operates entirely in userspace, translating between the container's L2 TAP interface and the host's L4 sockets. It requires no capabilities or privileges. - -```text -Container Network Namespace - | - +-- TAP device (e.g., "eth0") - | ^ - | | L2 frames (Ethernet) - | v - +-- pasta process (userspace) - | - | Translation: L2 frames <-> L4 sockets - | - v - Host Network Stack (native TCP/UDP/ICMP sockets) -``` - -### Detailed Data Path - -For an outbound TCP connection from a container: - -1. The application calls `connect()` to an external address -2. The kernel routes the packet through the default gateway to the TAP device -3. Pasta reads the raw Ethernet frame from the TAP file descriptor -4. Pasta parses L2/L3/L4 headers and identifies the TCP SYN -5. Pasta opens a native TCP socket on the host and calls `connect()` to the same destination -6. When the host socket connects, pasta reflects the SYN-ACK back through the TAP as an L2 frame -7. 
For ongoing data transfer, pasta translates between TAP frames and the host socket, coordinating TCP windows and acknowledgments between the two sides. - -Pasta does NOT maintain per-connection packet buffers -- it reflects observed sending windows and ACKs directly between peers. This is a thinner translation layer than the full TCP/IP stack that slirp4netns used. - -### Built-in Services - -Pasta includes minimalistic network services so the container's stack can auto-configure: - -| Service | Purpose | -|---------|---------| -| ARP proxy | Resolves the gateway address to the host's MAC address | -| DHCP server | Hands out a single IPv4 address (same as host's upstream interface) | -| NDP proxy | Handles IPv6 neighbor discovery, SLAAC prefix advertisement | -| DHCPv6 server | Hands out a single IPv6 address (same as host's upstream interface) | - -By default there is no NAT -- pasta copies the host's IP addresses into the container namespace. - -### Local Connection Bypass (Splice Path) - -For connections between the container and the host, pasta implements a zero-copy bypass: - -- Packets with a local destination skip L2 translation entirely -- `splice(2)` for TCP (zero-copy), `recvmmsg(2)` / `sendmmsg(2)` for UDP (batched) -- Achieves ~38 Gbps TCP throughput for local connections - -### Port Forwarding - -By default, pasta uses auto-detection: it scans `/proc/net/tcp` and `/proc/net/tcp6` periodically and automatically forwards any ports that are bound/listening. Port forwarding is fully configurable via pasta options.
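The auto-detection above can be approximated by scanning `/proc/net/tcp` for sockets in the `LISTEN` state (`0A`) and decoding the hex port from the `local_address` column. A simplified parser, assuming the standard procfs layout (illustrative, not pasta's actual C implementation):

```python
def listening_ports(proc_net_tcp):
    """Return the set of locally bound listening TCP ports."""
    ports = set()
    for line in proc_net_tcp.splitlines()[1:]:      # skip the header row
        fields = line.split()
        if len(fields) > 3 and fields[3] == "0A":   # 0A == TCP_LISTEN
            local = fields[1]                        # e.g. "00000000:08AE"
            ports.add(int(local.split(":")[1], 16))  # hex port -> int
    return ports

sample = (
    "  sl  local_address rem_address   st\n"
    "   0: 00000000:08AE 00000000:0000 0A\n"   # listening on 2222 (0x08AE)
    "   1: 0100007F:1F90 0100007F:C350 01\n"   # established, ignored
)
print(listening_ports(sample))  # {2222}
```

On a live system the input would be `open("/proc/net/tcp").read()`; pasta repeats this scan periodically rather than once.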
- -### Security Properties - -- No dynamic memory allocation (`sbrk`, `brk`, `mmap` blocked via seccomp) -- All capabilities dropped (except `CAP_NET_BIND_SERVICE` if granted) -- Restrictive seccomp profiles (43 syscalls allowed on x86_64) -- Detaches into its own user, mount, IPC, UTS, PID namespaces -- No external dependencies beyond libc -- ~5,000 lines of code target - -### Inter-Container Limitation - -Unlike bridge networking, pasta containers are isolated from each other by default. No virtual bridge connects them. Communication requires port mappings through the host, pods (shared network namespace), or opting into rootless Netavark bridge networking via `podman network create`. - -## Three Nested Namespaces in the Podman Driver - -The Podman compute driver creates three layers of network isolation: - -```text -Namespace 1: Host - | - pasta manages port forwarding (127.0.0.1:<port>) - gateway listens on 0.0.0.0:8080 - | -Namespace 2: Rootless Podman network namespace (managed by pasta) - | - Bridge "openshell" (10.89.x.0/24) - aardvark-dns for container name resolution - | - Container netns (10.89.x.2) - supervisor, proxy, SSH daemon all run here - | -Namespace 3: Inner sandbox netns (created by supervisor) - | - veth pair (10.200.0.1 <-> 10.200.0.2) - iptables forces all traffic through proxy - user workload runs here -``` - -Pasta bridges namespace 1 and 2, the veth pair bridges namespace 2 and 3, and the proxy at the boundary of 2/3 enforces network policy. - -### Layer 1: Pasta (Rootless Podman Bridge) - -At driver startup (`driver.rs:104-114`), the driver ensures a Podman bridge network exists: - -```rust -client.ensure_network(&config.network_name).await?; -``` - -This creates a bridge network named `"openshell"` (default from `DEFAULT_NETWORK_NAME` in `openshell-core/src/config.rs`) with `dns_enabled: true`. In rootless mode, this bridge exists inside a user namespace managed by pasta. The bridge IP range (e.g., `10.89.x.x`) is not routable from the host.
- -```text -Host (your machine) - | - 127.0.0.1:<port> <--- pasta binds this on the host - | - [pasta process] <--- translates L4 sockets <-> L2 TAP frames - | - [rootless network namespace] - | - Bridge "openshell" (10.89.1.0/24) - | - +-- 10.89.1.1 (bridge gateway, aardvark-dns listens here) - | - +-- veth --> Container netns - | - 10.89.1.2 (container IP) -``` - -### Layer 2: Container Networking (Pasta Port Forwarding) - -The container spec (`container.rs:447-471`) configures: - -- `nsmode: "bridge"` -- uses the Podman bridge network -- `networks: {"openshell"}` -- attaches to the named bridge -- `portmappings: [{host_port: 0, container_port: 2222, protocol: "tcp"}]` -- publishes SSH on an ephemeral host port -- `hostadd: ["host.containers.internal:host-gateway"]` -- resolves to the host IP (pasta uses `169.254.1.2` in rootless mode) - -Pasta is never explicitly configured. The driver sets `nsmode: "bridge"` and Podman selects pasta automatically as the rootless network backend. The driver logs the detected backend at startup (`driver.rs:86`): - -```rust -network_backend = %info.host.network_backend, -``` - -The `host.containers.internal` hostname (the Podman equivalent of Docker's `host.docker.internal`) is injected into `/etc/hosts` so the supervisor can reach the gateway on the host. The gRPC callback endpoint is auto-detected at `driver.rs:116-130`: - -```rust -if config.grpc_endpoint.is_empty() { - config.grpc_endpoint = - format!("http://host.containers.internal:{}", config.gateway_port); -} -``` - -The bridge gateway IP does NOT work for this purpose in rootless mode because it lives inside the user namespace, not on the host.
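The auto-detection shown in the Rust snippet reduces to: only fill in the `host.containers.internal` default when the operator left `grpc_endpoint` empty. A small Python sketch of the same rule (illustrative, mirroring but not reproducing the driver code):

```python
def resolve_grpc_endpoint(configured, gateway_port):
    """Default to host.containers.internal when no endpoint is configured."""
    if configured:
        return configured  # operator-supplied endpoint always wins
    return "http://host.containers.internal:{}".format(gateway_port)

print(resolve_grpc_endpoint("", 8080))
# http://host.containers.internal:8080
print(resolve_grpc_endpoint("https://gw.example:9443", 8080))
# https://gw.example:9443
```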
- -### Layer 3: Inner Sandbox Network Namespace - -Inside the container, the supervisor creates another network namespace (`netns.rs:53-178`, setup at lines 53-63, `ip netns add` at line 77) for the user workload: - -```text -Container (10.89.1.2 on the Podman bridge) - | - [Supervisor process - runs in container's default netns] - | - +-- Proxy listener at 10.200.0.1:3128 - | - +-- veth pair: veth-h-{short_id} <-> veth-s-{short_id} - | - +-- Inner network namespace "sandbox-{short_id}" (short_id = first 8 chars of UUID) - | - 10.200.0.2/24 - | - default route -> 10.200.0.1 (supervisor's proxy) - | - [User's code runs here] - | - iptables rules (IPv4; IPv6 installed best-effort): - ACCEPT -> 10.200.0.1:{proxy_port} TCP (proxy) - ACCEPT -> loopback (-o lo) - ACCEPT -> established/related (conntrack) - LOG -> TCP SYN bypass attempts (rate-limited 5/sec) - REJECT -> TCP (icmp-port-unreachable) - LOG -> UDP bypass attempts (rate-limited 5/sec) - REJECT -> UDP (icmp-port-unreachable) -``` - -The supervisor uses `nsenter --net=` rather than `ip netns exec` to avoid sysfs remount issues that arise under rootless Podman where real `CAP_SYS_ADMIN` is unavailable (`netns.rs:681-716`, function body at 691). - -A tmpfs is mounted at `/run/netns` in the container spec (`container.rs:458-463`) so the supervisor can create named network namespaces. In rootless Podman this directory does not exist on the host, so `mkdir` would fail with `EPERM` without a private tmpfs. - -## Complete Data Paths - -### SSH Session: Client to Sandbox Shell - -```text -Client (CLI on user's machine) - | - 1. gRPC: CreateSshSession -> gateway (returns token, connect_path) - 2. HTTP CONNECT /connect/ssh to gateway - (headers: x-sandbox-id, x-sandbox-token) - | -Gateway (host, port 8080) - | - 3. Looks up SupervisorSession for sandbox_id - 4. 
Sends RelayOpen{channel_id} over ConnectSupervisor bidi stream - | - [gRPC traverses: host -> pasta L4 translation -> container bridge] - | -Supervisor (inside container at 10.89.x.2) - | - 5. Receives RelayOpen, opens new RelayStream RPC back to gateway - 6. Sends RelayInit{channel_id} on the stream - 7. Connects to Unix socket /run/openshell/ssh.sock - 8. Bidirectional bridge: RelayStream <-> Unix socket (16 KiB chunks) - | -SSH daemon (inside container, Unix socket only, root-only permissions) - | - 9. Authenticates (all auth accepted -- access gated by relay chain) - 10. Spawns shell process - 11. Shell enters inner netns via setns(fd, CLONE_NEWNET) - | -User's shell (in sandbox netns at 10.200.0.2) -``` - -The SSH daemon listens on a Unix socket (not a TCP port) with 0600 permissions. The published port mapping (`host_port: 0 -> container_port: 2222`) exists in the container spec but is currently inert -- nothing listens on TCP 2222 inside the container. All SSH communication uses the gRPC reverse-connect relay pattern exclusively. - -### Outbound HTTP Request from Sandbox Process - -```text -User's code (inner netns, 10.200.0.2) - | - 1. curl https://api.example.com - (HTTP_PROXY=http://10.200.0.1:3128 set via environment) - | - 2. TCP connect to 10.200.0.1:3128 - (allowed by iptables -- only permitted egress destination) - | - 3. HTTP CONNECT api.example.com:443 - | -Supervisor proxy (10.200.0.1:3128 in container netns) - | - 4. OPA policy evaluation (process identity via /proc/net/tcp -> PID) - 5. SSRF check (block internal IPs unless allowed by policy) - 6. Optional L7: TLS intercept, HTTP method/path inspection - | - 7. If allowed: TCP connect to api.example.com:443 - (from container netns, 10.89.x.2) - | - 8. Through Podman bridge -> pasta L2-to-L4 -> host -> internet -``` - -### Supervisor gRPC Callback to Gateway - -The Podman driver auto-detects the callback endpoint scheme based on -whether TLS client certificates are configured. 
When the RPM's -auto-generated PKI is in place, the endpoint is -`https://host.containers.internal:8080` and the supervisor connects -with mTLS. Without TLS configuration, it falls back to -`http://host.containers.internal:8080`. - -```text -Supervisor (container netns, 10.89.x.2) - | - 1. mTLS connect to https://host.containers.internal:8080 - (resolves to 169.254.1.2:8080 via /etc/hosts) - Client cert bind-mounted from host at /etc/openshell/tls/client/ - | - 2. Routed through container default gateway (bridge) - | - 3. Pasta translates: L2 frame -> host L4 socket - | - 4. Host TCP socket connects to gateway (0.0.0.0:8080) - | -Gateway (host, 0.0.0.0:8080, mTLS enabled) - | - 5. TLS handshake: server presents server cert, client presents client cert - 6. ConnectSupervisor bidirectional stream established - 7. Heartbeats every N seconds (gateway sends interval in SessionAccepted, default 15s) - 8. Reconnects with exponential backoff (1s initial, 30s max) on failure - 9. Same gRPC channel reused for RelayStream calls (no new TLS handshake) -``` - -The gateway binds to `0.0.0.0` by default in the RPM packaging. mTLS -prevents unauthenticated access even though the gateway is reachable -from the network. Client certificates are auto-generated by -`init-pki.sh` on first start and bind-mounted into sandbox containers -by the Podman driver. See `deploy/rpm/CONFIGURATION.md` for the full -configuration reference. 
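
The scheme selection described above reduces to a small branch. A minimal sketch, with an illustrative function name and signature (the real detection lives in the Podman driver's startup path):

```rust
// Sketch of the callback-endpoint scheme auto-detection: https with
// mTLS when client certificates are configured, plaintext otherwise.
// Function name and signature are illustrative.
fn callback_endpoint(tls_client_certs_present: bool, gateway_port: u16) -> String {
    let scheme = if tls_client_certs_present { "https" } else { "http" };
    format!("{scheme}://host.containers.internal:{gateway_port}")
}

fn main() {
    // With the RPM's auto-generated PKI in place: mTLS endpoint.
    assert_eq!(
        callback_endpoint(true, 8080),
        "https://host.containers.internal:8080"
    );
    // Without TLS configuration: plaintext fallback.
    assert_eq!(
        callback_endpoint(false, 8080),
        "http://host.containers.internal:8080"
    );
}
```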
- -## Differences from the Kubernetes Driver - -| Aspect | Kubernetes | Podman (rootless pasta) | -|--------|-----------|----------------------| -| Container/Pod IP | Routable cluster-wide | Non-routable (10.89.x.x inside user namespace) | -| Network reachability | Pod IPs reachable from gateway | Bridge not routable from host; requires `host.containers.internal` | -| Sandbox -> Gateway | Direct TCP to K8s service IP | `host.containers.internal` via bridge + pasta | -| SSH transport | Reverse gRPC relay (`ConnectSupervisor` + `RelayStream`) -- same mechanism as Podman | Reverse gRPC relay (`ConnectSupervisor` + `RelayStream`) | -| Port publishing | Not needed (routable IPs) | Ephemeral host port via pasta port forwarding | -| TLS | mTLS via K8s secrets | mTLS via auto-generated PKI (RPM default) or `--disable-tls` | -| DNS | Kubernetes CoreDNS | Podman bridge DNS (aardvark-dns, `dns_enabled: true`) | -| Network policy | K8s NetworkPolicy (ingress restricted to gateway) | iptables inside inner sandbox netns | -| Supervisor delivery | Kubernetes driver managed pod image/template | OCI image volume mount (FROM scratch image) | -| Secrets | K8s Secret volume mount (TLS certs); SSH handshake secret via env var | Podman `secret_env` API (hidden from `podman inspect`) | - -Both drivers use the same reverse gRPC relay (`ConnectSupervisor` + `RelayStream`) for SSH transport. The most significant difference is network reachability: in rootless Podman, the bridge network is not routable from the host, so all communication between host and container goes through either pasta port forwarding (`portmappings`) or the `host.containers.internal` hostname (resolved to `169.254.1.2` by pasta). 
- -## Port Assignments - -| Port | Component | Purpose | -|------|-----------|---------| -| 8080 | Gateway | gRPC + HTTP multiplexed (default `DEFAULT_SERVER_PORT`) | -| 2222 | Sandbox | Port mapping in container spec (default `DEFAULT_SSH_PORT`); currently inert -- SSH daemon uses Unix socket only | -| 3128 | Sandbox proxy | HTTP CONNECT proxy (inside container, on inner netns host side) | -| 0 (ephemeral) | Host (via pasta) | Published mapping for container SSH port | - -## Key Source Files - -| File | What it controls | -|------|-----------------| -| `crates/openshell-driver-podman/src/driver.rs` | Bridge network creation, gRPC endpoint auto-detection, rootless checks | -| `crates/openshell-driver-podman/src/container.rs` | Container spec: network mode, port mappings, hostadd, tmpfs, capabilities | -| `crates/openshell-driver-podman/src/client.rs` | Podman REST API calls for network ensure/inspect, port discovery | -| `crates/openshell-driver-podman/src/config.rs` | Network name, socket path, SSH port, gateway port defaults | -| `crates/openshell-sandbox/src/sandbox/linux/netns.rs` | Inner network namespace: veth pair, IP addressing, iptables rules | -| `crates/openshell-sandbox/src/proxy.rs` | HTTP CONNECT proxy: OPA policy, SSRF protection, L7 inspection | -| `crates/openshell-sandbox/src/ssh.rs` | SSH daemon on Unix socket, shell process netns entry via `setns()` | -| `crates/openshell-sandbox/src/supervisor_session.rs` | gRPC ConnectSupervisor stream, RelayStream for SSH tunneling | -| `crates/openshell-sandbox/src/grpc_client.rs` | gRPC channel to gateway (mTLS or plaintext, keep-alive, adaptive windowing) | -| `crates/openshell-server/src/ssh_tunnel.rs` | Gateway-side SSH tunnel: HTTP CONNECT endpoint, relay bridging | -| `crates/openshell-server/src/supervisor_session.rs` | SupervisorSessionRegistry, relay claim/open lifecycle | -| `crates/openshell-server/src/compute/mod.rs` | `ComputeRuntime::new_podman()` -- Podman compute driver initialization | -| 
`crates/openshell-core/src/config.rs` | Default constants: ports, network name | diff --git a/architecture/policy-advisor.md b/architecture/policy-advisor.md deleted file mode 100644 index c70bfcbd3..000000000 --- a/architecture/policy-advisor.md +++ /dev/null @@ -1,246 +0,0 @@ -# Policy Advisor - -The Policy Advisor is a recommendation system that observes denied connections in a sandbox and proposes policy updates to allow legitimate traffic. It operates as a feedback loop: denials are detected, aggregated, analyzed sandbox-side, and submitted to the gateway for the user to review and approve. - -This document covers the plumbing layer (issue #204). The LLM-powered agent harness that enriches recommendations with context-aware rationale is covered separately (issue #205). - -## Overview - -```mermaid -flowchart LR - PROXY[Sandbox Proxy] -->|DenialEvent| AGG[DenialAggregator] - AGG -->|drain| MAPPER[Mechanistic Mapper] - MAPPER -->|SubmitPolicyAnalysis| GW[Gateway Server] - GW --> STORE[(SQLite/Postgres)] - STORE --> CLI[CLI: openshell rule] - STORE --> TUI[TUI: Network Rules] - CLI -->|approve/reject| GW - TUI -->|approve/reject| GW - GW -->|merge rule| POLICY[Active Policy] -``` - -The key architectural decision: **all analysis runs sandbox-side**. The gateway is a thin persistence + validation + approval layer. It never generates proposals or calls an LLM. Each sandbox handles its own analysis — N sandboxes = N independent pipelines, horizontal scale for free. - -## Components - -### Denial Aggregator (Sandbox Side) - -The `DenialAggregator` (`crates/openshell-sandbox/src/denial_aggregator.rs`) runs as a background tokio task inside the sandbox supervisor. It: - -1. Receives `DenialEvent` structs from the proxy via an unbounded MPSC channel -2. Deduplicates events by `(host, port, binary)` key with running counters -3. 
Periodically drains accumulated summaries, runs the mechanistic mapper, and submits proposals to the gateway via `SubmitPolicyAnalysis` gRPC - -The flush interval defaults to 10 seconds (configurable via `OPENSHELL_DENIAL_FLUSH_INTERVAL_SECS`). - -### Denial Event Sources - -Events are emitted at four denial points in the proxy: - -| Source | Stage | File | Description | -|--------|-------|------|-------------| -| CONNECT OPA deny | `connect` | `proxy.rs` | No matching network policy rule | -| CONNECT SSRF deny | `ssrf` | `proxy.rs` | Resolved IP is internal/private | -| FORWARD OPA deny | `forward` | `proxy.rs` | Forward proxy policy deny | -| FORWARD SSRF deny | `ssrf` | `proxy.rs` | Forward proxy SSRF check failed | - -L7 (per-request) denials from `l7/relay.rs` are captured via tracing in the current implementation, with structured channel support planned for issue #205. - -### Mechanistic Mapper (Sandbox Side) - -The `mechanistic_mapper` module (`crates/openshell-sandbox/src/mechanistic_mapper.rs`) generates draft policy recommendations deterministically, without requiring an LLM: - -1. Groups denial summaries by `(host, port, binary)` — one proposal per unique triple -2. For each group, generates a `NetworkPolicyRule` allowing that endpoint for that binary -3. Generates idempotent rule names via `generate_rule_name(host, port)` producing deterministic names like `allow_httpbin_org_443` — DB-level dedup handles uniqueness, no collision checking needed -4. **Filters always-blocked destinations.** Before DNS resolution, `is_always_blocked_destination(host)` checks if the host is a literal always-blocked IP (loopback, link-local, unspecified) or the hostname `localhost`. If so, the proposal is skipped with an info log and `continue`. This prevents an infinite TUI notification loop: the proxy denies these destinations regardless of policy, so they re-trigger denials every flush cycle without any possible fix. 
The helper lives in the mapper module and delegates to `openshell_core::net::is_always_blocked_ip` for literal IP addresses. -5. Resolves each host via DNS; if any resolved IP is private (RFC 1918, loopback, link-local), populates `allowed_ips` in the proposed endpoint for the SSRF override. The `resolve_allowed_ips_if_private` function filters out always-blocked IPs from the resolved address list before populating `allowed_ips` — only RFC 1918/ULA addresses survive. If *all* resolved IPs are always-blocked (e.g., a host that resolves solely to `127.0.0.1`), the function returns an empty vec. -6. Computes confidence scores based on: - - Denial count (higher count = higher confidence) - - Port recognition (well-known ports like 443, 5432 get a boost) - - SSRF origin (SSRF denials get lower confidence) -7. Generates security notes for private IPs, database ports, and ephemeral port ranges -8. If L7 request samples are present, generates specific L7 rules (method + path) with `protocol: rest` (TLS termination is automatic — no `tls` field needed). Plumbed but not yet fed data — see issue #205. - -The mapper runs in `flush_proposals_to_gateway` after the aggregator drains. It produces `PolicyChunk` protos that are sent alongside the raw `DenialSummary` protos to the gateway. - -#### Shared IP Classification Helpers - -IP classification functions (`is_always_blocked_ip`, `is_always_blocked_net`, `is_internal_ip`) live in `openshell_core::net` (`crates/openshell-core/src/net.rs`). They are shared across the sandbox proxy (runtime SSRF enforcement), the mechanistic mapper (proposal filtering), and the gateway server (defense-in-depth validation on approval). The distinction between the two tiers: - -- **Always-blocked** (`is_always_blocked_ip`): loopback (`127.0.0.0/8`), link-local (`169.254.0.0/16`, `fe80::/10`), unspecified (`0.0.0.0`, `::`), and IPv4-mapped IPv6 equivalents. These are blocked unconditionally — no policy can override them. 
-- **Internal** (`is_internal_ip`): a superset that adds RFC 1918 (`10/8`, `172.16/12`, `192.168/16`) and IPv6 ULA (`fc00::/7`). These are blocked by default but can be allowed via `allowed_ips` in policy rules. - -### Gateway: Validate and Persist - -The gateway's `SubmitPolicyAnalysis` handler (`crates/openshell-server/src/grpc.rs`) is deliberately thin: - -1. Receives proposed chunks and denial summaries from the sandbox -2. Validates each chunk (rejects missing `rule_name` or `proposed_rule`) -3. Extracts `(host, port, binary)` from the proposed rule for the dedup key -4. Persists via upsert — `ON CONFLICT (sandbox_id, host, port, binary) DO UPDATE SET hit_count = hit_count + excluded.hit_count, last_seen_ms = excluded.last_seen_ms` -5. Notifies watchers so the TUI refreshes - -The gateway does not store denial summaries (they are included in the request for future audit trail use but not persisted today). It does not run the mapper or any analysis. - -#### Always-Blocked Validation on Approval - -`merge_chunk_into_policy` (`crates/openshell-server/src/grpc/policy.rs`) validates proposed rules before merging them into the active policy. The `validate_rule_not_always_blocked` function runs as a defense-in-depth gate, catching rules that the sandbox mapper should have filtered but didn't (e.g., proposals from an older sandbox version): - -- Rejects endpoint hosts that parse as always-blocked IPs (loopback, link-local, unspecified) -- Rejects the literal hostname `localhost` (case-insensitive, with or without trailing dot) -- Rejects `allowed_ips` entries that parse as always-blocked networks via `is_always_blocked_net` - -On failure, the function returns `Status::invalid_argument` with a message explaining that the proxy will deny traffic to the destination regardless of policy. This uses the same `openshell_core::net` helpers as the sandbox-side filtering. 
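
The two tiers can be sketched as standalone functions covering only the ranges listed in the bullets above; the real helpers live in `openshell_core::net` and may differ in detail:

```rust
use std::net::IpAddr;

// Sketch of the two-tier classification described above, limited to
// the ranges named in the bullets. Not the openshell_core::net code.
fn is_always_blocked_ip(ip: IpAddr) -> bool {
    match ip {
        IpAddr::V4(v4) => v4.is_loopback() || v4.is_link_local() || v4.is_unspecified(),
        IpAddr::V6(v6) => {
            // Unwrap IPv4-mapped IPv6 (::ffff:a.b.c.d) to its v4 equivalent.
            if let Some(v4) = v6.to_ipv4_mapped() {
                return is_always_blocked_ip(IpAddr::V4(v4));
            }
            // Loopback, unspecified, and fe80::/10 link-local.
            v6.is_loopback() || v6.is_unspecified() || (v6.segments()[0] & 0xffc0) == 0xfe80
        }
    }
}

fn is_internal_ip(ip: IpAddr) -> bool {
    // Superset: always-blocked plus RFC 1918 and IPv6 ULA (fc00::/7).
    is_always_blocked_ip(ip)
        || match ip {
            IpAddr::V4(v4) => v4.is_private(),
            IpAddr::V6(v6) => (v6.segments()[0] & 0xfe00) == 0xfc00,
        }
}

fn main() {
    // Loopback: unconditionally blocked, no policy override.
    assert!(is_always_blocked_ip("127.0.0.1".parse().unwrap()));
    // RFC 1918: internal (overridable via allowed_ips), not always-blocked.
    let ten: IpAddr = "10.0.0.1".parse().unwrap();
    assert!(is_internal_ip(ten) && !is_always_blocked_ip(ten));
    // Public address: neither tier applies.
    assert!(!is_internal_ip("93.184.216.34".parse().unwrap()));
}
```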
- -### Persistence - -Draft chunks are stored in the gateway database: - -```sql -CREATE TABLE draft_policy_chunks ( - id TEXT PRIMARY KEY, - sandbox_id TEXT NOT NULL, - draft_version INTEGER NOT NULL, - status TEXT NOT NULL DEFAULT 'pending', -- pending | approved | rejected - rule_name TEXT NOT NULL, - proposed_rule BLOB NOT NULL, -- protobuf-encoded NetworkPolicyRule - rationale TEXT NOT NULL DEFAULT '', - security_notes TEXT NOT NULL DEFAULT '', - confidence REAL NOT NULL DEFAULT 0.0, - host TEXT NOT NULL DEFAULT '', -- denormalized for dedup - port INTEGER NOT NULL DEFAULT 0, - binary TEXT NOT NULL DEFAULT '', -- per-binary granularity - hit_count INTEGER NOT NULL DEFAULT 1, -- accumulated real denial count - first_seen_ms INTEGER NOT NULL, - last_seen_ms INTEGER NOT NULL, - created_at_ms INTEGER NOT NULL, - decided_at_ms INTEGER -); - --- One active chunk per (sandbox, endpoint, binary). -CREATE UNIQUE INDEX idx_draft_chunks_endpoint - ON draft_policy_chunks (sandbox_id, host, port, binary) - WHERE status IN ('pending', 'approved', 'rejected'); -``` - -Schema lives in `crates/openshell-server/migrations/{sqlite,postgres}/003_create_policy_recommendations.sql`. - -### Per-Binary Granularity - -Each `(sandbox_id, host, port, binary)` gets its own row. Two unrelated processes hitting the same endpoint (e.g. `python3` and a separately launched `curl` both denied for `ip-api.com:80`) produce two separate rules in the TUI. Approving one doesn't approve the other. When both are approved, they share the same `NetworkPolicyRule` in the active policy with two entries in the `binaries` list. Revoking one removes only that binary from the rule; if no binaries remain, the entire rule is removed. - -Note: OPA's `binary_allowed` rule includes ancestor matching — a child process (e.g. curl spawned by python via `subprocess.run`) inherits its parent's network access because the parent binary appears in the child's `/proc` ancestor chain. 
This means a child process won't generate a separate denial for endpoints its parent is already approved for. Per-binary granularity is most visible when different binaries independently access distinct endpoints. - -## Approval Workflow - -Draft chunks follow a toggle model: - -```mermaid -stateDiagram-v2 - [*] --> pending: proposed - pending --> approved: approve - pending --> rejected: reject - approved --> rejected: revoke - rejected --> approved: approve -``` - -There is no "undo" — reject is the revoke. Re-denials of a rejected endpoint bump `hit_count` and `last_seen_ms` but don't change status. - -### Approval Actions - -| Action | CLI Command | gRPC RPC | Effect | -|--------|-------------|----------|--------| -| View rules | `openshell rule get ` | `GetDraftPolicy` | List pending/approved/rejected chunks | -| Approve one | `openshell rule approve --chunk-id X` | `ApproveDraftChunk` | Merge rule into active policy, mark approved | -| Reject one | `openshell rule reject --chunk-id X` | `RejectDraftChunk` | Mark rejected (no policy change) | -| Approve all | `openshell rule approve-all ` | `ApproveAllDraftChunks` | Bulk approve all pending chunks | -| History | `openshell rule history ` | `GetDraftHistory` | Show timeline of proposals and decisions | - -### Policy Merge - -When a chunk is approved, the server: - -1. Decodes the chunk's `proposed_rule` (protobuf `NetworkPolicyRule`) -2. Fetches the current active `SandboxPolicy` -3. Looks up the rule by `rule_name` in `network_policies`: - - If the rule exists, **appends** the chunk's binary to the rule's `binaries` list - - If no rule exists, inserts the whole proposed rule -4. Persists a new policy revision with deterministic hash (optimistic retry up to 5 attempts on version conflicts) -5. Supersedes older policy versions -6. Notifies watchers (triggers sandbox policy poll) - -When a chunk is revoked (approved → rejected), the server calls `remove_chunk_from_policy`: - -1. Finds the rule by `rule_name` -2. 
Removes just this chunk's binary from the rule's `binaries` list -3. If no binaries remain, removes the entire rule -4. Persists a new policy revision - -The sandbox picks up the new policy on its next poll cycle (default 10 seconds) and hot-reloads the OPA engine. - -## User Interfaces - -### CLI - -The `openshell rule` command group provides review and approval: - -```bash -# View pending recommendations -openshell rule get my-sandbox - -# Approve a specific chunk -openshell rule approve my-sandbox --chunk-id abc123 - -# Approve all pending -openshell rule approve-all my-sandbox - -# Reject a chunk -openshell rule reject my-sandbox --chunk-id xyz789 -``` - -### TUI - -The TUI sandbox screen includes a "Network Rules" panel accessible via `[r]` from the sandbox detail view. It displays: - -- List of rules with endpoint, binary name (short), and status badge (pending/approved/rejected) -- Hit count and first/last seen timestamps -- Expanded detail popup with full binary path, rationale, security notes, and proposed rule - -Keybindings are state-aware: - -- **Pending** → `[a]` approve, `[x]` reject, `[A]` approve all -- **Approved** → `[x]` revoke -- **Rejected** → `[a]` approve - -## Configuration - -| Environment Variable | Default | Description | -|---------------------|---------|-------------| -| `OPENSHELL_DENIAL_FLUSH_INTERVAL_SECS` | `10` | How often the aggregator flushes and submits proposals | -| `OPENSHELL_POLICY_POLL_INTERVAL_SECS` | `10` | How often the sandbox polls for policy updates | - -## Known Behavior - -### Always-Blocked Destinations - -Destinations classified as always-blocked (loopback, link-local, unspecified, `localhost`) are filtered at three layers: - -1. **Sandbox mapper** — `generate_proposals` skips them before building a `PolicyChunk` -2. **Sandbox mapper** — `resolve_allowed_ips_if_private` strips always-blocked IPs from `allowed_ips`, returning empty if none survive -3. 
**Gateway approval** — `merge_chunk_into_policy` rejects them with `INVALID_ARGUMENT` - -If a sandbox process repeatedly attempts connections to these addresses, the proxy denies them every time and the denial aggregator accumulates counts. The mapper discards these summaries silently rather than forwarding un-fixable proposals. Before issue #814, these proposals would reach the TUI and reappear every flush cycle (default 10 seconds) since approving them would have no effect — the proxy blocks them regardless of policy. - -Existing `pending` rows for always-blocked destinations that were persisted before this filtering was added remain in the database. They are inert: attempting to approve them now fails at the gateway's `validate_rule_not_always_blocked` check. They can be rejected manually via `openshell rule reject` or left to age out. - -## Future Work (Issue #205) - -The LLM PolicyAdvisor agent will run sandbox-side via `inference.local`: - -- Wrap the mechanistic mapper with LLM-powered analysis -- Generate context-aware rationale explaining *why* each rule is recommended -- Group related denials into higher-level recommendations -- Detect patterns (e.g., "this looks like a pip install") and suggest broader rules -- Validate proposals against the local OPA engine before submission -- Progressive L7 visibility: Stage 1 audit-mode rules → Stage 2 data-driven L7 refinement diff --git a/architecture/sandbox-connect.md b/architecture/sandbox-connect.md deleted file mode 100644 index cf49ff916..000000000 --- a/architecture/sandbox-connect.md +++ /dev/null @@ -1,649 +0,0 @@ -# Sandbox Connect Architecture - -## Overview - -Sandbox connect provides secure remote access into running sandbox environments. It supports three modes of interaction: - -1. **Interactive shell** (`sandbox connect`) -- opens a PTY-backed SSH session for interactive use -2. **Command execution** (`sandbox create -- `) -- runs a command over SSH with stdout/stderr piped back -3. 
**File sync** (`sandbox create --upload`) -- uploads local files into the sandbox before command execution - -Gateway connectivity is **supervisor-initiated**: the gateway never dials the sandbox pod. On startup, each sandbox's supervisor opens a long-lived bidirectional gRPC stream (`ConnectSupervisor`) to the gateway and holds it for the sandbox's lifetime. **`CreateSshSession` → HTTP CONNECT and `ExecSandbox` both depend on that registration**: `open_relay` blocks until a live `ConnectSupervisor` entry exists for the `sandbox_id`; if the supervisor never registers (wrong endpoint, bad env, crash loop), the client hits the supervisor-session wait timeout instead of getting a relay. When a client asks the gateway for SSH, the gateway sends a `RelayOpen` message over that stream; the supervisor responds by initiating a `RelayStream` gRPC call that rides the same TCP+TLS+HTTP/2 connection as a new multiplexed stream. The supervisor bridges the bytes of that stream into a root-owned Unix socket where the embedded SSH daemon listens. **The in-container sshd is reached only on that local Unix socket** — the supervisor `UnixStream::connect`s to it. Do not assume the relay path terminates at a container-exposed TCP listener for sshd; any optional TCP surface is separate from the gateway relay bridge. - -There is also a gateway-side `ExecSandbox` gRPC RPC that executes commands inside sandboxes without requiring an external SSH client. It uses the same relay mechanism. - -### Podman and relay environment - -The **Podman** compute driver (`crates/openshell-driver-podman/src/container.rs`, `build_env` / `build_container_spec`) must inject the same **relay-critical** environment variables into the container as the Kubernetes driver: `OPENSHELL_ENDPOINT` (gateway gRPC), `OPENSHELL_SANDBOX_ID`, and `OPENSHELL_SSH_SOCKET_PATH` (Unix path the embedded sshd binds and the supervisor dials). 
Without `OPENSHELL_SSH_SOCKET_PATH`, the in-container `openshell-sandbox` process does not know where to create the socket; without `OPENSHELL_ENDPOINT` / `OPENSHELL_SANDBOX_ID`, the supervisor cannot complete `ConnectSupervisor`, so the gateway never has a session to target with `RelayOpen`. Driver-owned keys overwrite user spec/template env so these cannot be overridden. **Podman container readiness** (libpod `HealthConfig` in `build_container_spec`) treats the sandbox as ready when a sentinel file exists, **or** `test -S` passes on the configured `sandbox_ssh_socket_path` (**supervisor / Unix-socket path**), **or** a legacy TCP listen check on the published SSH port — so the `Ready` phase used by `CreateSshSession` and the SSH tunnel can reflect Unix-socket–based startup, not only a TCP listener. - -## Two-Plane Architecture - -The supervisor and gateway maintain two logical planes over **one TCP+TLS connection**, multiplexed by HTTP/2 streams: - -- **Control plane** -- the `ConnectSupervisor` bidirectional gRPC stream. Carries `SupervisorHello`, heartbeats, `RelayOpen`/`RelayClose` requests from the gateway, and `RelayOpenResult`/`RelayClose` replies from the supervisor. Lives for the lifetime of the sandbox supervisor process. -- **Data plane** -- one `RelayStream` bidirectional gRPC call per SSH connect or exec invocation. Each call is a new HTTP/2 stream on the same connection. Frames are opaque bytes except for the first frame from the supervisor, which is a typed `RelayInit { channel_id }` used to pair the stream with a pending relay slot on the gateway. - -Running both planes over one HTTP/2 connection means each relay avoids a fresh TLS handshake and benefits from a single authenticated transport boundary. Hyper/h2 `adaptive_window(true)` is enabled on both sides so bulk transfers (large file uploads, long exec stdout) aren't pinned to the default 64 KiB stream window. - -The supervisor-initiated direction gives the model two properties: - -1. 
The sandbox pod exposes no ingress surface. Network reachability is whatever the supervisor itself can reach outward. -2. Authentication reduces to one place: the existing gateway mTLS channel. There is no second application-layer handshake to design, rotate, or replay-protect. - -## Components - -### CLI SSH module - -**File**: `crates/openshell-cli/src/ssh.rs` - -Client-side SSH and editor-launch helpers: - -- `sandbox_connect()` -- interactive SSH shell session -- `sandbox_exec()` -- non-interactive command execution via SSH -- `sandbox_rsync()` -- file synchronization via tar-over-SSH -- `sandbox_ssh_proxy()` -- the `ProxyCommand` process that bridges stdin/stdout to the gateway -- OpenShell-managed SSH config helpers -- install a single `Include` entry in `~/.ssh/config` and maintain generated `Host openshell-` blocks in a separate OpenShell-owned config file for editor workflows - -Every generated SSH invocation and every entry in the OpenShell-managed `~/.ssh/config` include `ServerAliveInterval=15` and `ServerAliveCountMax=3`. SSH has no other way to observe that the underlying relay (not the end-to-end TCP socket) has silently dropped, so the client falls back to SSH-level keepalives to surface dead connections within ~45 seconds. - -These helpers are re-exported from `crates/openshell-cli/src/run.rs` for backward compatibility. - -### CLI `ssh-proxy` subcommand - -**File**: `crates/openshell-cli/src/main.rs` (`Commands::SshProxy`) - -A top-level CLI subcommand (`ssh-proxy`) that the SSH `ProxyCommand` invokes. It receives `--gateway`, `--sandbox-id`, `--token`, and `--gateway-name` flags, then delegates to `sandbox_ssh_proxy()`. This process has no TTY of its own -- it pipes stdin/stdout directly to the gateway tunnel. 
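
A generated `Host` block for editor workflows might render as follows. Hostname, include path, token, and gateway name are placeholders; only the `ssh-proxy` flags and the keepalive values come from the text above:

```text
# ~/.ssh/config -- single Include entry installed by the CLI
# (path to the OpenShell-owned config file is illustrative)
Include ~/.config/openshell/ssh_config

# OpenShell-owned config file -- one generated block per sandbox
Host openshell-my-sandbox
    ProxyCommand openshell ssh-proxy --gateway gw.example.com:8080 --sandbox-id my-sandbox --token TOKEN --gateway-name default
    ServerAliveInterval 15
    ServerAliveCountMax 3
```

With this in place, `ssh openshell-my-sandbox` (or an editor's remote-SSH integration) spawns the `ssh-proxy` process, which bridges stdin/stdout to the gateway tunnel as described above.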
- -### gRPC session bootstrap - -**Files**: `proto/openshell.proto`, `crates/openshell-server/src/grpc/sandbox.rs` - -Two RPCs manage SSH session tokens: - -- `CreateSshSession(sandbox_id)` -- validates the sandbox exists and is `Ready`, generates a UUID token, persists an `SshSession` record, and returns the token plus gateway connection details (host, port, scheme, connect path, optional TTL). -- `RevokeSshSession(token)` -- marks the session's `revoked` flag to `true` in the persistence layer. - -### Supervisor session registry - -**File**: `crates/openshell-server/src/supervisor_session.rs` - -`SupervisorSessionRegistry` holds: - -- `sessions: HashMap` -- the active `ConnectSupervisor` stream sender for each sandbox, plus a `session_id` that uniquely identifies each registration. -- `pending_relays: HashMap` -- one entry per `RelayOpen` waiting for the supervisor's `RelayStream` to arrive. - -Key operations: - -- `register(sandbox_id, session_id, tx)` -- inserts a new session and returns the previous sender if it superseded one. Used by `handle_connect_supervisor` to accept a new stream. -- `remove_if_current(sandbox_id, session_id)` -- removes only if the stored `session_id` matches. Guards against the supersede race where an old session's cleanup runs after a newer session has already registered. -- `open_relay(sandbox_id, timeout)` -- called by the gateway tunnel and exec handlers. Waits up to `timeout` for a supervisor session to appear (with exponential backoff 100 ms → 2 s), registers a pending relay slot keyed by a fresh `channel_id`, sends `RelayOpen` to the supervisor, and returns a `oneshot::Receiver` that resolves when the supervisor claims the slot. -- `claim_relay(channel_id)` -- called by `handle_relay_stream` when the supervisor's first `RelayFrame::Init` arrives. 
Removes the pending entry, enforces a 10-second staleness bound (`RELAY_PENDING_TIMEOUT`), creates a 64 KiB `tokio::io::duplex` pair, hands the gateway-side half to the waiter, and returns the supervisor-side half to be bridged against the inbound/outbound `RelayFrame` streams. -- `reap_expired_relays()` -- bounds leaks from pending slots the supervisor never claimed (e.g., supervisor crashed between `RelayOpen` and `RelayStream`). Scheduled every 30 s by `spawn_relay_reaper()` during server startup. - -The `ConnectSupervisor` handler (`handle_connect_supervisor`) validates `SupervisorHello`, assigns a fresh `session_id`, sends `SessionAccepted { heartbeat_interval_secs: 15 }`, spawns a loop that processes inbound messages (`Heartbeat`, `RelayOpenResult`, `RelayClose`), and emits a `GatewayHeartbeat` every 15 seconds. - -### RelayStream handler - -**File**: `crates/openshell-server/src/supervisor_session.rs` (`handle_relay_stream`) - -Accepts one inbound `RelayFrame` to extract `channel_id` from `RelayInit`, claims the pending relay, then runs two concurrent forwarding tasks: - -- **Supervisor → gateway**: drains `RelayFrame::Data` frames and writes the bytes to the supervisor-side end of the duplex pair. -- **Gateway → supervisor**: reads the duplex in `RELAY_STREAM_CHUNK_SIZE` (16 KiB) chunks and emits `RelayFrame::Data` messages back. - -The first frame that isn't `RelayInit` is rejected (`invalid_argument`). Any non-data frame after init closes the relay. - -### Gateway tunnel handler - -**File**: `crates/openshell-server/src/ssh_tunnel.rs` - -An Axum route at `/connect/ssh` on the shared gateway port. Handles HTTP CONNECT requests by: - -1. Validating the session token (present, not revoked, bound to the sandbox id in `X-Sandbox-Id`, not expired). -2. Confirming the sandbox is in `Ready` phase. -3. Enforcing per-token (max 3) and per-sandbox (max 20) concurrent connection limits. -4. 
Calling `supervisor_sessions.open_relay(sandbox_id, 30s)` -- the 30-second wait covers the supervisor's initial mTLS + `ConnectSupervisor` handshake on a freshly-scheduled pod. -5. Waiting up to 10 seconds for the supervisor to open its `RelayStream` and deliver the gateway-side `DuplexStream`. -6. Performing the HTTP CONNECT upgrade on the client connection and calling `copy_bidirectional` between the upgraded client socket and the relay stream. - -There is no gateway-to-sandbox TCP dial, handshake preface, or pod-IP resolution in this path. - -### Gateway multiplexing - -**File**: `crates/openshell-server/src/multiplex.rs` - -The gateway runs a single listener that multiplexes gRPC and HTTP on the same port. `MultiplexedService` routes based on the `content-type` header: requests with `application/grpc` go to the gRPC router; all others (including HTTP CONNECT) go to the HTTP router. The HTTP router (`crates/openshell-server/src/http.rs`) merges health endpoints with the SSH tunnel router. Hyper is configured with `http2().adaptive_window(true)` so the HTTP/2 stream windows grow under load rather than throttling `RelayStream` to the default 64 KiB window. - -### Sandbox supervisor session - -**File**: `crates/openshell-sandbox/src/supervisor_session.rs` - -`spawn(endpoint, sandbox_id, ssh_socket_path)` starts a background task that: - -1. Opens a gRPC `Channel` to the gateway (`http2_adaptive_window(true)`). The same channel multiplexes the control stream and every relay. -2. Sends `SupervisorHello { sandbox_id, instance_id }` as the first outbound message. -3. Waits for `SessionAccepted` (or fails fast on `SessionRejected`). -4. Runs a loop that reads inbound `GatewayMessage` values and emits `SupervisorHeartbeat` at the accepted interval (min 5 s, usually 15 s). -5. 
On `RelayOpen`, spawns `handle_relay_open()` which opens a new `RelayStream` RPC on the existing channel, sends `RelayInit { channel_id }` as the first frame, dials the local SSH Unix socket, and bridges bytes in both directions in 16 KiB chunks. - -Reconnect policy: the session loop wraps `run_single_session()` with exponential backoff (1 s → 30 s) on any error. A `session_established` / `session_failed` OCSF event is emitted on each attempt. - -The supervisor is a dumb byte bridge with no awareness of the SSH protocol flowing through it. - -### Sandbox SSH daemon - -**File**: `crates/openshell-sandbox/src/ssh.rs` - -An embedded SSH server built on `russh` that runs inside each sandbox pod. It: - -- Generates an ephemeral Ed25519 host key on startup (no persistent key material). -- Listens on a Unix socket (default `/run/openshell/ssh.sock`, see [Unix socket access control](#unix-socket-access-control)). -- Accepts any SSH authentication (none or public key) because authorization is handled upstream by the gateway session token and by filesystem permissions on the socket. -- Spawns shell processes on a PTY with full sandbox policy enforcement (Landlock, seccomp, network namespace, privilege dropping). -- Supports interactive shells, exec commands, PTY resize, window-change events, and loopback-only `direct-tcpip` channels for port forwarding. - -### Gateway-side exec (gRPC) - -**File**: `crates/openshell-server/src/grpc/sandbox.rs` (`handle_exec_sandbox`, `stream_exec_over_relay`, `start_single_use_ssh_proxy_over_relay`, `run_exec_with_russh`) - -The `ExecSandbox` gRPC RPC provides programmatic command execution without requiring an external SSH client. It: - -1. Validates `sandbox_id`, `command`, env keys, and field sizes; confirms the sandbox is `Ready`. -2. Calls `supervisor_sessions.open_relay(sandbox_id, 15s)` -- a shorter wait than connect because exec runs in steady state, not on cold start. -3. Waits up to 10 seconds for the relay `DuplexStream` to arrive. 
-4. Starts a single-use localhost TCP listener on `127.0.0.1:0` and spawns a task that bridges a single accept to the `DuplexStream` with `copy_bidirectional`. This adapts the `DuplexStream` to something `russh::client::connect_stream` can dial. -5. Connects `russh` to the local proxy, authenticates `none` as user `sandbox`, opens a channel, optionally requests a PTY, and executes the shell-escaped command. -6. Streams `stdout`/`stderr`/`exit` events back to the gRPC caller. - -If `timeout_seconds > 0`, the exec is wrapped in `tokio::time::timeout`. On timeout, exit code 124 is sent (matching the `timeout` command convention). - -## Connection Flows - -### Interactive Connect (CLI) - -The `sandbox connect` command opens an interactive SSH session. - -```mermaid -sequenceDiagram - participant User as User Terminal - participant CLI as CLI (sandbox connect) - participant GW as Gateway - participant Reg as SessionRegistry - participant Sup as Supervisor (sandbox) - participant Sock as SSH Unix socket - participant SSHD as russh daemon - - Note over Sup,GW: On sandbox startup (persistent): - Sup->>GW: ConnectSupervisor stream + SupervisorHello - GW-->>Sup: SessionAccepted{session_id, heartbeat=15s} - - User->>CLI: openshell sandbox connect foo - CLI->>GW: GetSandbox(name) -> sandbox.id - CLI->>GW: CreateSshSession(sandbox_id) - GW-->>CLI: token, gateway_host, gateway_port, scheme, connect_path - - Note over CLI: Builds ProxyCommand string: exec()s ssh - - User->>CLI: ssh spawns ssh-proxy subprocess - CLI->>GW: CONNECT /connect/ssh
X-Sandbox-Id, X-Sandbox-Token - GW->>GW: Validate token + sandbox Ready - GW->>Reg: open_relay(sandbox_id, 30s) - Reg-->>GW: (channel_id, relay_rx) - GW->>Sup: RelayOpen{channel_id} (over ConnectSupervisor) - - Sup->>GW: RelayStream RPC (new HTTP/2 stream) - Sup->>GW: RelayFrame::Init{channel_id} - GW->>Reg: claim_relay(channel_id) -> DuplexStream pair - Reg-->>GW: gateway-side DuplexStream (via relay_rx) - Sup->>Sock: UnixStream::connect(/run/openshell/ssh.sock) - Sock-->>SSHD: connection accepted - - GW-->>CLI: 200 OK (upgrade) - - Note over CLI,SSHD: SSH protocol over:
CLI↔GW (HTTP CONNECT) ↔ RelayStream frames ↔ Sup ↔ Unix socket ↔ SSHD - - CLI->>SSHD: SSH handshake + auth_none - SSHD-->>CLI: Auth accepted - CLI->>SSHD: channel_open + shell_request - SSHD->>SSHD: openpty() + spawn /bin/bash -i
(with sandbox policy applied) - User<<->>SSHD: Interactive PTY session -``` - -**Code trace for `sandbox connect`:** - -1. `crates/openshell-cli/src/main.rs` -- `SandboxCommands::Connect { name }` dispatches to `run::sandbox_connect()`. -2. `crates/openshell-cli/src/ssh.rs` -- `sandbox_connect()` calls `ssh_session_config()`: - - Resolves sandbox name to ID via `GetSandbox` gRPC. - - Creates an SSH session via `CreateSshSession` gRPC. - - Builds a `ProxyCommand` string: ` ssh-proxy --gateway --sandbox-id --token --gateway-name `. - - If the SSH gateway host is loopback but the registered gateway endpoint is not, `resolve_ssh_gateway()` overrides the host with the registered endpoint's host. -3. `sandbox_connect()` builds an `ssh` command with: - - `-o ProxyCommand=...` - - `-o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null -o GlobalKnownHostsFile=/dev/null` (ephemeral host keys) - - `-o ServerAliveInterval=15 -o ServerAliveCountMax=3` (surface silently-dropped relays in ~45 s) - - `-tt -o RequestTTY=force` (force PTY allocation) - - `-o SetEnv=TERM=xterm-256color` - - `sandbox` as the SSH user -4. If stdin is a terminal (interactive), the CLI calls `exec()` (Unix) to replace itself with the `ssh` process. Otherwise it spawns and waits. -5. `sandbox_ssh_proxy()` connects via TCP (plain) or TLS (mTLS) to the gateway, sends a raw HTTP CONNECT request with `X-Sandbox-Id` and `X-Sandbox-Token` headers, and on a 200 response spawns two tasks to copy bytes between stdin/stdout and the tunnel. -6. Gateway-side: `ssh_connect()` in `ssh_tunnel.rs` authorizes the request, opens a relay, waits for the supervisor's `RelayStream`, and bridges the upgraded HTTP connection to the relay with `tokio::io::copy_bidirectional`. -7. Supervisor-side: on `RelayOpen`, `handle_relay_open()` in `crates/openshell-sandbox/src/supervisor_session.rs` opens a `RelayStream` RPC, sends `RelayInit`, dials `/run/openshell/ssh.sock`, and bridges the frames to the Unix socket. 
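The raw CONNECT request in step 5 of the trace above can be sketched as a plain string build. The header names and path-form request target follow the description here; the exact formatting inside `sandbox_ssh_proxy()` is an assumption, and `build_connect_request` is a hypothetical helper:

```rust
// Sketch of the HTTP CONNECT request the ssh-proxy sends (step 5 above).
// Hypothetical helper; the real formatting in sandbox_ssh_proxy() may differ.
fn build_connect_request(host: &str, port: u16, path: &str, sandbox_id: &str, token: &str) -> String {
    format!(
        "CONNECT {path} HTTP/1.1\r\nHost: {host}:{port}\r\nX-Sandbox-Id: {sandbox_id}\r\nX-Sandbox-Token: {token}\r\n\r\n"
    )
}

fn main() {
    let req = build_connect_request("gateway.example", 8080, "/connect/ssh", "sb-123", "tok-456");
    assert!(req.starts_with("CONNECT /connect/ssh HTTP/1.1\r\n"));
    // A 200 response to this request upgrades the connection to a raw byte tunnel.
    println!("{}", req.lines().next().unwrap());
}
```

On a 200 response, both sides stop speaking HTTP and the connection carries raw SSH bytes.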
-
-### Command Execution (CLI)
-
-The `sandbox exec` path is identical to interactive connect except:
-
-- The SSH command uses `-T -o RequestTTY=no` (no PTY) when `tty=false`.
-- The command string is passed as the final SSH argument.
-- The sandbox daemon routes it through `exec_request()` instead of `shell_request()`, spawning the command via `/bin/bash -lc`.
-
-When `openshell sandbox create` launches a `--no-keep` command or shell, it keeps the CLI process alive instead of `exec()`-ing into SSH so it can delete the sandbox after SSH exits. The default create flow, along with `--forward`, keeps the sandbox running.
-
-### Port Forwarding (`forward start`)
-
-`openshell forward start` opens a local SSH tunnel so connections to `127.0.0.1:<local-port>` on the host are forwarded to `127.0.0.1:<remote-port>` inside the sandbox. Because SSH runs over the same relay as interactive connect, no additional proxying machinery is needed.
-
-#### CLI
-
-- Reuses the same `ProxyCommand` path as `sandbox connect`.
-- Invokes OpenSSH with `-N -o ExitOnForwardFailure=yes -L <local-port>:127.0.0.1:<remote-port> sandbox`.
-- By default the forward stays attached in the foreground until interrupted (Ctrl+C) and prints an early startup confirmation once SSH survives its initial forward-setup checks.
-- With `-d`/`--background`, SSH forks after auth and the CLI exits. The PID is tracked in a per-forward file under `~/.config/openshell/forwards/` along with sandbox id metadata.
-- `openshell forward stop` validates PID ownership and then kills a background forward.
-- `openshell forward list` shows all tracked forwards.
-- `openshell forward stop` and `openshell forward list` are local operations and do not require resolving an active gateway.
-- `openshell sandbox create --forward` starts a background forward before connect/exec, including when no trailing command is provided.
-- `openshell sandbox delete` auto-stops any active forwards for the deleted sandbox.
-
-#### TUI
-
-The TUI (`crates/openshell-tui/`) supports port forwarding through the create sandbox modal.
Users specify comma-separated ports in the **Ports** field. After sandbox creation:
-
-1. The TUI polls for `Ready` state (up to 30 attempts at 2-second intervals).
-2. Creates an SSH session via `CreateSshSession` gRPC.
-3. Spawns a background SSH tunnel (`ssh -N -f -L <port>:127.0.0.1:<port>`) for each port.
-4. Sends a `ForwardResult` event back to the main loop with the outcome.
-
-Active forwards are displayed in the sandbox table's NOTES column (e.g., `fwd:8080,3000`) and in the sandbox detail view's Forwards row.
-
-When deleting a sandbox, the TUI calls `stop_forwards_for_sandbox()` before sending the delete request. PID tracking uses the same `~/.config/openshell/forwards/` directory as the CLI.
-
-#### Shared forward module
-
-**File**: `crates/openshell-core/src/forward.rs`
-
-Port forwarding PID management and SSH utility functions are shared between the CLI and TUI:
-
-- `forward_dir()` -- returns `~/.config/openshell/forwards/`, creating it if needed
-- `save_forward_pid()` / `read_forward_pid()` / `remove_forward_pid()` -- PID file I/O
-- `list_forwards()` -- lists all active forwards from PID files
-- `stop_forward()` / `stop_forwards_for_sandbox()` -- kills forwarding processes by PID
-- `resolve_ssh_gateway()` -- loopback gateway resolution (see [Gateway Loopback Resolution](#gateway-loopback-resolution))
-- `shell_escape()` -- safe shell argument escaping for SSH commands
-- `build_sandbox_notes()` -- builds notes strings (e.g., `fwd:8080,3000`) from active forwards
-
-#### Supervisor `direct-tcpip` handling
-
-The sandbox SSH server (`crates/openshell-sandbox/src/ssh.rs`) implements `channel_open_direct_tcpip` from the russh `Handler` trait.
-
-- **Loopback-only**: only `127.0.0.1`, `localhost`, and `::1` destinations are accepted. Non-loopback destinations are rejected (`Ok(false)`) to prevent the sandbox from being used as a generic proxy.
-- **Bridge**: accepted channels spawn a tokio task that connects a `TcpStream` to the target address and uses `copy_bidirectional` between the SSH channel stream and the TCP stream. - -### Gateway-side Exec (gRPC) - -The `ExecSandbox` gRPC RPC bypasses the external SSH client entirely while using the same relay plumbing. - -```mermaid -sequenceDiagram - participant Client as gRPC Client - participant GW as Gateway - participant Reg as SessionRegistry - participant Sup as Supervisor - participant SSHD as SSH daemon (Unix socket) - - Client->>GW: ExecSandbox(sandbox_id, command, stdin, timeout) - GW->>GW: Validate sandbox exists + Ready - GW->>Reg: open_relay(sandbox_id, 15s) - Reg-->>GW: (channel_id, relay_rx) - GW->>Sup: RelayOpen{channel_id} - - Sup->>GW: RelayStream + RelayInit{channel_id} - GW->>Reg: claim_relay -> DuplexStream - Sup->>SSHD: connect /run/openshell/ssh.sock - - Note over GW: start_single_use_ssh_proxy_over_relay
(127.0.0.1:ephemeral -> DuplexStream) - - GW->>GW: russh client dials 127.0.0.1: - GW->>SSHD: SSH auth_none + channel_open + exec(command) - GW->>SSHD: stdin payload + EOF - - loop Stream output - SSHD-->>GW: stdout/stderr chunks - GW-->>Client: ExecSandboxEvent (Stdout/Stderr) - end - - SSHD-->>GW: ExitStatus - GW-->>Client: ExecSandboxEvent (Exit) -``` - -`start_single_use_ssh_proxy_over_relay()` exists only as an adapter so `russh::client::connect_stream` can consume the relay `DuplexStream` through an ephemeral TCP listener on `127.0.0.1:0`. It never reaches the network. - -### File Sync - -File sync uses **tar-over-SSH**: the CLI streams a tar archive through the existing SSH proxy tunnel. No external dependencies (like `rsync`) are required on the client side. The sandbox image provides GNU `tar` for extraction. - -**Files**: `crates/openshell-cli/src/ssh.rs`, `crates/openshell-cli/src/run.rs` - -#### `sandbox create --upload` - -When `--upload` is passed to `sandbox create`, the CLI pushes local files into `/sandbox` (or a specified destination) after the sandbox reaches `Ready` and before any command runs. - -1. `git_repo_root()` determines the repository root via `git rev-parse --show-toplevel`. -2. `git_sync_files()` lists files with `git ls-files -co --exclude-standard -z` (tracked + untracked, respecting gitignore, null-delimited). -3. `sandbox_sync_up_files()` creates an SSH session config, spawns `ssh sandbox "tar xf - -C /sandbox"`, and streams a tar archive of the file list to the SSH child's stdin using the `tar` crate. -4. Files land in `/sandbox` inside the container. - -#### `openshell sandbox upload` / `openshell sandbox download` - -Standalone commands support bidirectional file transfer: - -```bash -# Push local files up to sandbox -openshell sandbox upload [] - -# Pull sandbox files down to local -openshell sandbox download [] -``` - -- **Upload**: `sandbox_upload()` streams a tar archive of the local path to `ssh ... 
tar xf - -C ` on the sandbox side. Default destination: `/sandbox`. - Named directory uploads preserve the source directory basename at the destination, matching `scp -r` and `cp -r`; uploading `.` remains flat. - `.gitignore` filtering only changes which files are included, not the destination layout. -- **Download**: `sandbox_download()` runs `ssh ... tar cf - -C ` on the sandbox side and extracts the output locally via `tar::Archive`. Default destination: `.` (current directory). -- No compression for v1 -- the SSH tunnel rides the already-TLS-encrypted gateway connection; compression adds CPU cost with marginal bandwidth savings. - -## Supervisor Session Lifecycle - -Each sandbox has at most one live `ConnectSupervisor` stream at a time. The registry enforces this via `register()`, which overwrites any previous entry. - -### States - -```mermaid -stateDiagram-v2 - [*] --> Connecting: spawn() - Connecting --> Rejected: SessionRejected - Connecting --> Live: SessionAccepted - Live --> Live: Heartbeats
RelayOpen/Result
RelayClose - Live --> Disconnected: stream closed / error - Disconnected --> Connecting: backoff (1s..30s) - Rejected --> Connecting: backoff (1s..30s) - Live --> [*]: sandbox exits -``` - -### Hello and accept - -The supervisor sends `SupervisorHello { sandbox_id, instance_id }` (where `instance_id` is a fresh UUID per process start) as the first message. The gateway: - -1. Assigns `session_id = Uuid::new_v4()`. -2. Registers the session; any existing entry is evicted and its sender is dropped. -3. Replies with `SessionAccepted { session_id, heartbeat_interval_secs: 15 }`. -4. Spawns `run_session_loop` to process inbound messages and emit gateway heartbeats. - -On any registration failure (e.g., the supervisor's mpsc receiver was already dropped), `remove_if_current` is called with the assigned `session_id` so the cleanup does not evict a newer successful registration. - -### Heartbeats - -Both directions emit heartbeats at the negotiated interval (15 s). Heartbeats are strictly informational -- their purpose is to keep the HTTP/2 connection warm and let each side detect a half-open transport quickly. There is no explicit application-level timeout that kills the session if heartbeats stop; failures are detected when a send fails or when the stream reports EOF / error. - -### Supersede semantics - -If a supervisor restarts (or a network blip forces a new `ConnectSupervisor` call), the gateway sees a second `SupervisorHello` for the same `sandbox_id`. `register()` inserts the new session and returns the old `tx`. The old session's `run_session_loop` continues to poll its inbound stream until it errors out, at which point its cleanup calls `remove_if_current(sandbox_id, old_session_id)` -- which does nothing because the stored entry now has the new `session_id`. The newer session stays live. - -Tests in `supervisor_session.rs` pin this behavior: - -- `registry_supersedes_previous_session` -- confirms that `register()` returns the prior sender. 
-- `remove_if_current_ignores_stale_session_id` -- confirms a late cleanup does not evict a newer registration. -- `open_relay_uses_newest_session_after_supersede` -- confirms `RelayOpen` is delivered to the newest session only. - -### Pending-relay reaper - -`spawn_relay_reaper(state, 30s)` sweeps `pending_relays` every 30 seconds and removes entries older than `RELAY_PENDING_TIMEOUT` (10 s). This bounds the leak if a supervisor acknowledges `RelayOpen` but crashes before initiating `RelayStream`. - -## Authentication and Security Model - -### Transport authentication - -All gRPC traffic (control plane + data plane + other RPCs) rides one mTLS-authenticated TCP+TLS+HTTP/2 connection from the supervisor to the gateway. Client certificates prove the supervisor's identity; the server certificate proves the gateway's. Nothing sits between the supervisor and the SSH daemon except the Unix socket's filesystem permissions. - -The CLI continues to authenticate to the gateway with its own mTLS credentials (or Cloudflare bearer token in reverse-proxy deployments) and a per-session token returned by `CreateSshSession`. The session token is enforced at the gateway: token scope (sandbox id), revocation state, and optional expiry are all checked in `ssh_connect()` before `open_relay()` is called. - -### Unix socket access control - -The supervisor creates `/run/openshell/ssh.sock` (path is configurable via the gateway's `sandbox_ssh_socket_path` / supervisor's `--ssh-socket-path` / `OPENSHELL_SSH_SOCKET_PATH`) and: - -1. Creates the parent directory if missing and sets it to mode `0700` (root-owned). -2. Removes any stale socket from a previous run. -3. Binds a `UnixListener` on the path. -4. Sets the socket to mode `0600`. - -The supervisor runs as root; the sandbox workload runs as an unprivileged user. Only the supervisor can connect to the socket. The workload inside the sandbox has no filesystem path by which it can reach the SSH daemon directly. 
All ingress goes through the relay bridge, which only the supervisor can open (because only the supervisor holds the gateway session). - -`handle_connection()` in `crates/openshell-sandbox/src/ssh.rs` hands the Unix stream directly to `russh::server::run_stream` with no preface or handshake layer in between. - -### Kubernetes NetworkPolicy - -The sandbox pod needs no gateway-to-sandbox ingress rule; the SSH daemon has no TCP listener. Helm ships an egress policy that constrains what the pod can reach outward -- see [Gateway Security](gateway-security.md). - -### What SSH auth does NOT enforce - -The embedded SSH daemon accepts all authentication attempts. This is intentional: - -- The gateway already validated the session token and sandbox readiness. -- Unix socket permissions already restrict who can connect to the daemon to the supervisor, and the supervisor only opens the socket in response to a gateway `RelayOpen`. -- SSH key management would add complexity without additional security value in this architecture. - -### Ephemeral host keys - -The sandbox generates a fresh Ed25519 host key on every startup. The CLI disables `StrictHostKeyChecking` and sets `UserKnownHostsFile=/dev/null` and `GlobalKnownHostsFile=/dev/null` to avoid known-hosts conflicts. - -## Sandbox Target Resolution - -The gateway does not resolve a sandbox's network address or port. The only identifier that matters is `sandbox_id`, which keys into the supervisor session registry. 
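Because routing is purely a map lookup, the registry's supersede-safe semantics can be sketched in a few lines of std-only Rust. This is a simplified stand-in for `SupervisorSessionRegistry`: the real entry stores the stream sender and pending relay slots, not just a session id.

```rust
use std::collections::HashMap;

// Simplified sketch of the registry's supersede semantics.
// The value is just the session_id here; the real code stores the sender too.
struct Registry {
    sessions: HashMap<String, String>, // sandbox_id -> session_id
}

impl Registry {
    fn register(&mut self, sandbox_id: &str, session_id: &str) -> Option<String> {
        // insert() returns the previous entry, mirroring "returns the prior sender".
        self.sessions.insert(sandbox_id.into(), session_id.into())
    }

    /// Remove only if the stored session_id matches, guarding the supersede
    /// race where an old session's cleanup runs after a newer registration.
    fn remove_if_current(&mut self, sandbox_id: &str, session_id: &str) -> bool {
        match self.sessions.get(sandbox_id) {
            Some(current) if current == session_id => {
                self.sessions.remove(sandbox_id);
                true
            }
            _ => false,
        }
    }
}

fn main() {
    let mut reg = Registry { sessions: HashMap::new() };
    assert_eq!(reg.register("sb-1", "sess-a"), None);
    // A reconnect supersedes the old session and yields the prior entry.
    assert_eq!(reg.register("sb-1", "sess-b"), Some("sess-a".to_string()));
    // The old session's late cleanup is a no-op.
    assert!(!reg.remove_if_current("sb-1", "sess-a"));
    assert!(reg.remove_if_current("sb-1", "sess-b"));
}
```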
- -## API and Persistence - -### CreateSshSession - -**Proto**: `proto/openshell.proto` -- `CreateSshSessionRequest` / `CreateSshSessionResponse` - -Request: - -- `sandbox_id` (string) -- the sandbox to connect to - -Response: - -- `sandbox_id` (string) -- `token` (string) -- UUID session token -- `gateway_host` (string) -- resolved from `Config::ssh_gateway_host` (defaults to bind address if empty) -- `gateway_port` (uint32) -- resolved from `Config::ssh_gateway_port` (defaults to bind port if 0) -- `gateway_scheme` (string) -- `"https"` if TLS is configured, otherwise `"http"` -- `connect_path` (string) -- from `Config::ssh_connect_path` (default: `/connect/ssh`) -- `host_key_fingerprint` (string) -- currently unused (empty) -- `expires_at_ms` (int64) -- session expiry; 0 disables expiry - -### RevokeSshSession - -Request: - -- `token` (string) -- session token to revoke - -Response: - -- `revoked` (bool) -- true if a session was found and revoked - -### SshSession persistence - -**Proto**: `proto/openshell.proto` -- `SshSession` message - -Stored in the gateway's persistence layer (SQLite or Postgres) as object type `"ssh_session"`: - -| Field | Type | Description | -|-----------------|--------|-------------| -| `id` | string | Same as token (the token is the primary key) | -| `sandbox_id` | string | Sandbox this session is scoped to | -| `token` | string | UUID session token | -| `created_at_ms` | int64 | Creation time (ms since epoch) | -| `revoked` | bool | Whether the session has been revoked | -| `name` | string | Auto-generated human-friendly name | -| `expires_at_ms` | int64 | Expiry timestamp; 0 means no expiry | - -A background reaper (`spawn_session_reaper`) deletes revoked and expired rows every hour. 
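The authorization checks the gateway performs against this row before opening a relay can be sketched as follows. The struct is a simplified subset of the persisted `SshSession`; the real logic lives in `ssh_connect()` and additionally verifies the sandbox phase and concurrency limits.

```rust
// Simplified SshSession row; mirrors the fields checked before open_relay().
struct SshSession {
    sandbox_id: String,
    revoked: bool,
    expires_at_ms: i64, // 0 disables expiry
}

// Sketch of the gateway's token checks (scope, revocation, expiry).
// Hypothetical helper; the real checks also cover sandbox readiness.
fn session_authorized(s: &SshSession, requested_sandbox_id: &str, now_ms: i64) -> bool {
    !s.revoked
        && s.sandbox_id == requested_sandbox_id
        && (s.expires_at_ms == 0 || now_ms < s.expires_at_ms)
}

fn main() {
    let s = SshSession { sandbox_id: "sb-1".into(), revoked: false, expires_at_ms: 0 };
    assert!(session_authorized(&s, "sb-1", 1_000));
    // Wrong sandbox id: tokens are scoped and must not authorize.
    assert!(!session_authorized(&s, "sb-2", 1_000));
    let expired = SshSession { expires_at_ms: 500, ..s };
    assert!(!session_authorized(&expired, "sb-1", 1_000));
}
```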
- -### ConnectSupervisor / RelayStream - -**Proto**: `proto/openshell.proto` - -- `ConnectSupervisor(stream SupervisorMessage) returns (stream GatewayMessage)` -- `RelayStream(stream RelayFrame) returns (stream RelayFrame)` - -Key messages: - -| Message | Direction | Fields | -|---|---|---| -| `SupervisorHello` | sup → gw | `sandbox_id`, `instance_id` | -| `SessionAccepted` | gw → sup | `session_id`, `heartbeat_interval_secs` | -| `SessionRejected` | gw → sup | `reason` | -| `SupervisorHeartbeat` | sup → gw | (empty) | -| `GatewayHeartbeat` | gw → sup | (empty) | -| `RelayOpen` | gw → sup | `channel_id` (UUID) | -| `RelayOpenResult` | sup → gw | `channel_id`, `success`, `error` | -| `RelayClose` | either | `channel_id`, `reason` | -| `RelayInit` | sup → gw (first `RelayFrame`) | `channel_id` | -| `RelayFrame` | either | `oneof { RelayInit init, bytes data }` | - -### ExecSandbox - -**Proto**: `proto/openshell.proto` -- `ExecSandboxRequest` / `ExecSandboxEvent` - -Request: - -- `sandbox_id` (string) -- `command` (repeated string) -- command and arguments -- `workdir` (string) -- optional working directory -- `environment` (map) -- optional env var overrides (keys validated against `^[A-Za-z_][A-Za-z0-9_]*$`) -- `timeout_seconds` (uint32) -- 0 means no timeout -- `stdin` (bytes) -- optional stdin payload -- `tty` (bool) -- request a PTY - -Response stream (`ExecSandboxEvent`): - -- `Stdout(data)` -- stdout chunk -- `Stderr(data)` -- stderr chunk -- `Exit(exit_code)` -- final exit status (124 on timeout) - -The gateway builds the remote command by shell-escaping arguments, prepending sorted env var assignments, and optionally wrapping in `cd && ...`. The assembled command is capped at 256 KiB. 
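A sketch of that command assembly under the stated rules -- validate env keys against the pattern, single-quote-escape each argument, prepend sorted env assignments, optionally wrap in `cd`, and enforce the 256 KiB cap. This is a simplified stand-in for the gateway's `build_remote_exec_command`, with the key check done by hand instead of a regex:

```rust
/// POSIX single-quote escaping: close the quote, emit an escaped quote, reopen.
fn shell_escape(s: &str) -> String {
    format!("'{}'", s.replace('\'', r"'\''"))
}

/// Manual equivalent of the ^[A-Za-z_][A-Za-z0-9_]*$ env-key check.
fn valid_env_key(k: &str) -> bool {
    let mut chars = k.chars();
    matches!(chars.next(), Some(c) if c.is_ascii_alphabetic() || c == '_')
        && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

// Simplified stand-in for build_remote_exec_command.
fn build_remote_command(
    command: &[&str],
    env: &[(&str, &str)],
    workdir: Option<&str>,
) -> Result<String, String> {
    let mut env: Vec<_> = env.to_vec();
    env.sort(); // env assignments are prepended in sorted order
    let mut parts = Vec::new();
    for (k, v) in env {
        if !valid_env_key(k) {
            return Err(format!("invalid env key: {k}"));
        }
        parts.push(format!("{k}={}", shell_escape(v)));
    }
    parts.extend(command.iter().map(|a| shell_escape(a)));
    let mut cmd = parts.join(" ");
    if let Some(dir) = workdir {
        cmd = format!("cd {} && {}", shell_escape(dir), cmd);
    }
    if cmd.len() > 256 * 1024 {
        return Err("assembled command exceeds 256 KiB".into());
    }
    Ok(cmd)
}

fn main() {
    let cmd = build_remote_command(&["echo", "hi there"], &[("FOO", "x")], Some("/sandbox")).unwrap();
    assert_eq!(cmd, "cd '/sandbox' && FOO='x' 'echo' 'hi there'");
    assert!(build_remote_command(&["true"], &[("1BAD", "x")], None).is_err());
}
```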
- -## Gateway Loopback Resolution - -**File**: `crates/openshell-core/src/forward.rs` -- `resolve_ssh_gateway()` - -When the gateway returns a loopback address (`127.0.0.1`, `0.0.0.0`, `localhost`, or `::1`), the client overrides it with the host from the registered gateway endpoint URL. This handles the common case where the gateway defaults to `127.0.0.1` but the gateway is running on a remote machine. - -The override only applies if the registered gateway endpoint itself is not also a loopback address. If both are loopback, the original address is kept. - -This function is shared between the CLI and TUI via the `openshell-core::forward` module. - -## Timeouts - -| Stage | Duration | Where | -|---|---|---| -| Supervisor session wait (SSH connect) | 30 s | `ssh_tunnel::ssh_connect` -> `open_relay` | -| Supervisor session wait (ExecSandbox) | 15 s | `handle_exec_sandbox` -> `open_relay` | -| Wait for supervisor to claim relay | 10 s | `relay_rx` wrapped in `tokio::time::timeout` | -| Pending-relay TTL (reaper) | 10 s | `RELAY_PENDING_TIMEOUT` in registry | -| Session-wait backoff | 100 ms → 2 s | `wait_for_session` | -| Supervisor reconnect backoff | 1 s → 30 s | `run_session_loop` in sandbox supervisor | -| SSH-level keepalive | 15 s × 3 | CLI / managed ssh-config | -| Supervisor heartbeat | 15 s | `HEARTBEAT_INTERVAL_SECS` | -| SSH session reaper sweep | 1 h | `spawn_session_reaper` | -| Pending-relay reaper sweep | 30 s | `spawn_relay_reaper` | - -## Failure Modes - -| Scenario | Status / Behavior | Source | -|---|---|---| -| Missing `X-Sandbox-Id` or `X-Sandbox-Token` header | `401 Unauthorized` | `ssh_tunnel.rs` -- `header_value()` | -| Empty header value | `400 Bad Request` | `ssh_tunnel.rs` -- `header_value()` | -| Non-CONNECT method on `/connect/ssh` | `405 Method Not Allowed` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Token not found in persistence | `401 Unauthorized` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Token revoked or sandbox ID mismatch | `401 
Unauthorized` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Token expired | `401 Unauthorized` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Sandbox not found | `404 Not Found` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Sandbox not in `Ready` phase | `412 Precondition Failed` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Per-token or per-sandbox concurrency limit hit | `429 Too Many Requests` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Supervisor session not connected after 30 s | `502 Bad Gateway` | `ssh_tunnel.rs` -- `ssh_connect()` | -| Supervisor failed to claim relay within 10 s | Tunnel closed; `"relay open timed out"` logged | `ssh_tunnel.rs` -- spawned tunnel task | -| Relay channel oneshot dropped | Tunnel closed; `"relay channel dropped"` logged | `ssh_tunnel.rs` -- spawned tunnel task | -| First `RelayFrame` not `RelayInit` or empty `channel_id` | `invalid_argument` on `RelayStream` | `supervisor_session.rs` -- `handle_relay_stream` | -| `RelayStream` arrives after pending entry expired (>10 s) | `deadline_exceeded` | `supervisor_session.rs` -- `claim_relay` | -| Gateway restart during live relay | CLI SSH detects via keepalive within ~45 s; relays are torn down with the TCP connection | CLI `ServerAliveInterval=15`, `ServerAliveCountMax=3` | -| Supervisor restart | Gateway sends on stale mpsc fails; client sees same behavior as gateway restart; supervisor's reconnect loop re-registers | `run_session_loop`, `open_relay` | -| Silently-dropped relay (half-open TCP) | CLI-side SSH keepalives probe every 15 s; session exits with `Broken pipe` after 3 missed probes | SSH client keepalives | -| ExecSandbox timeout | Exit code 124 returned to caller | `stream_exec_over_relay` | -| Command exceeds 256 KiB assembled length | `invalid_argument` | `build_remote_exec_command` | - -## Graceful Shutdown - -### Gateway tunnel teardown - -After `copy_bidirectional` completes on either side, `ssh_connect()` calls `AsyncWriteExt::shutdown()` on the upgraded client connection so SSH sees 
a clean EOF and can read any remaining protocol data (e.g., exit-status) before exiting. - -### RelayStream teardown - -The `handle_relay_stream` task half-closes the supervisor-side duplex on inbound EOF so the gateway-side reader sees EOF and terminates its own forwarding task. On the supervisor side, `handle_relay_open` does the symmetric shutdown on the Unix socket after inbound EOF, then drops the outbound mpsc so the gateway observes EOF on the response stream too. - -### Supervisor session teardown - -When the sandbox exits, the supervisor process ends, the HTTP/2 connection closes, and all multiplexed streams fail with `stream error`. The gateway's `run_session_loop` observes the error, logs `supervisor session: ended`, and calls `remove_if_current` to deregister. Pending relay slots that never got claimed are swept by `reap_expired_relays` within 30 s. - -### PTY reader-exit ordering - -The sandbox SSH daemon's exit thread waits for the reader thread to finish forwarding all PTY output before sending `exit_status_request` and `close`. This prevents a race where the channel closes before all output has been delivered. 
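The recovery paths referenced throughout (supervisor reconnect 1 s → 30 s, session-wait 100 ms → 2 s) share an exponential-backoff shape. A minimal sketch, assuming plain doubling between the documented bounds -- the document only states the endpoints, so the doubling policy is an assumption:

```rust
use std::time::Duration;

/// Double-and-clamp backoff schedule matching the documented bounds
/// (e.g. 1 s -> 30 s for supervisor reconnect). Hypothetical helper.
fn backoff_schedule(start: Duration, cap: Duration, attempts: usize) -> Vec<Duration> {
    let mut delays = Vec::with_capacity(attempts);
    let mut d = start;
    for _ in 0..attempts {
        delays.push(d);
        d = (d * 2).min(cap); // double, then clamp at the cap
    }
    delays
}

fn main() {
    let secs: Vec<u64> = backoff_schedule(Duration::from_secs(1), Duration::from_secs(30), 7)
        .iter()
        .map(|d| d.as_secs())
        .collect();
    assert_eq!(secs, vec![1, 2, 4, 8, 16, 30, 30]);
}
```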
- -## Configuration Reference - -### Gateway configuration - -**File**: `crates/openshell-core/src/config.rs` -- `Config` struct - -| Field | Default | Description | -|---|---|---| -| `ssh_gateway_host` | `127.0.0.1` | Public hostname/IP advertised in `CreateSshSessionResponse` | -| `ssh_gateway_port` | `8080` | Public port for gateway connections (0 = use bind port) | -| `ssh_connect_path` | `/connect/ssh` | HTTP path for CONNECT requests | -| `sandbox_ssh_socket_path` | `/run/openshell/ssh.sock` | Path the supervisor binds its Unix socket on; passed to the sandbox as `OPENSHELL_SSH_SOCKET_PATH` | -| `ssh_session_ttl_secs` | (default in code) | Default TTL applied to new `SshSession` rows; 0 disables expiry | - -### Sandbox environment variables - -These are injected into compute-backed sandboxes by the **Kubernetes** driver (`crates/openshell-driver-kubernetes/src/driver.rs`), the **Podman** driver (`crates/openshell-driver-podman/src/container.rs`), and the **Docker** driver (`crates/openshell-driver-docker/src/lib.rs`). 
Together they are required for **persistent `ConnectSupervisor` registration and relay** (see [Podman and relay environment](#podman-and-relay-environment) for the Podman-specific fix): - -| Variable | Description | -|---|---| -| `OPENSHELL_SSH_SOCKET_PATH` | Filesystem path for the embedded SSH server's Unix socket (default `/run/openshell/ssh.sock`); must align with gateway `sandbox_ssh_socket_path` | -| `OPENSHELL_ENDPOINT` | Gateway gRPC endpoint; the supervisor uses this to open `ConnectSupervisor` | -| `OPENSHELL_SANDBOX_ID` | Identifier reported in `SupervisorHello` | - -### CLI TLS options - -| Flag / Env Var | Description | -|---|---| -| `--tls-ca` / `OPENSHELL_TLS_CA` | CA certificate for gateway verification | -| `--tls-cert` / `OPENSHELL_TLS_CERT` | Client certificate for mTLS | -| `--tls-key` / `OPENSHELL_TLS_KEY` | Client private key for mTLS | - -## Cross-References - -- [Gateway Architecture](gateway.md) -- gateway multiplexing, persistence layer, gRPC service details -- [Gateway Security](gateway-security.md) -- mTLS, session tokens, network policy -- [Sandbox Architecture](sandbox.md) -- sandbox lifecycle, policy enforcement, network isolation, proxy -- [Providers](sandbox-providers.md) -- provider credential injection into SSH shell processes diff --git a/architecture/sandbox-custom-containers.md b/architecture/sandbox-custom-containers.md deleted file mode 100644 index b8303c5de..000000000 --- a/architecture/sandbox-custom-containers.md +++ /dev/null @@ -1,128 +0,0 @@ -# Sandbox Custom Containers - -Users can run `openshell sandbox create --from ` to launch a sandbox with a custom container image while keeping the `openshell-sandbox` process supervisor in control. 
- -## The `--from` Flag - -The `--from` flag accepts four kinds of input: - -| Input | Example | Behavior | -|-------|---------|----------| -| **Community sandbox name** | `--from openclaw` | Resolves to `ghcr.io/nvidia/openshell-community/sandboxes/openclaw:latest` | -| **Dockerfile path** | `--from ./Dockerfile` | Builds the image locally, makes it available to the local gateway when needed, then creates the sandbox | -| **Directory with Dockerfile** | `--from ./my-sandbox/` | Uses the directory as the build context | -| **Full image reference** | `--from myregistry.com/img:tag` | Uses the image directly | - -### Resolution heuristic - -The CLI classifies the value in this order: - -1. **Existing file** whose name contains "Dockerfile" (case-insensitive) — treated as a Dockerfile to build. -2. **Existing directory** containing a `Dockerfile` — treated as a build context directory. -3. **Missing explicit local path** (for example `./Dockerfile`, `../ctx`, or an absolute path) — rejected locally instead of sent to the gateway as an image pull. -4. **Contains `/`, `:`, or `.`** — treated as a full container image reference. -5. **Otherwise** — treated as a community sandbox name, expanded to `{OPENSHELL_COMMUNITY_REGISTRY}/{name}:latest`. - -The community registry prefix defaults to `ghcr.io/nvidia/openshell-community/sandboxes` and can be overridden with the `OPENSHELL_COMMUNITY_REGISTRY` environment variable. - -### GPU image-name detection - -`sandbox create` also infers GPU intent from the final image name. The current rule matches when the last image name component contains `gpu` (for example `ghcr.io/nvidia/openshell-community/sandboxes/nvidia-gpu:latest` or `registry.example.com/team/my-gpu-image:latest`). When that rule matches, the sandbox request is treated the same as passing `--gpu`. - -### Dockerfile build flow - -When `--from` points to a Dockerfile or directory, the CLI: - -1. 
Builds the image locally via the Docker daemon (respecting `.dockerignore`). -2. Makes it available to the local gateway runtime when a managed local gateway is running; otherwise keeps the tag in the host Docker daemon for standalone local drivers. -3. Creates the sandbox with the resulting image tag. - -The build step aborts with a clear error if the Docker build stream stays silent for longer than `OPENSHELL_BUILD_NO_PROGRESS_TIMEOUT_SECS` seconds (default 1800). This is a guard against deadlocked container runtimes — most commonly an under-provisioned VM (e.g. macOS Colima with the default 2 vCPU / 2 GiB) where BuildKit can stop emitting events partway through a multi-step build and never recover. Raise the value if a legitimate build step is just quiet, or lower it for tighter CI budgets. - -## How It Works - -The supervisor binary (`openshell-sandbox`) must be delivered by the selected compute driver. The target architecture does not depend on a k3s node hostPath or a cluster image. - -```mermaid -flowchart TB - subgraph delivery["Supervisor delivery"] - bin["openshell-sandbox - (image, image volume, local binary, or VM rootfs)"] - end - - delivery --> agent - - subgraph pod["Pod"] - subgraph agent["Agent Container"] - agent_desc["Image: community base or custom image - Command: /opt/openshell/bin/openshell-sandbox - Supervisor path configured by compute driver - Env: OPENSHELL_SANDBOX_ID, OPENSHELL_ENDPOINT, ... - Caps: SYS_ADMIN, NET_ADMIN, SYS_PTRACE"] - end - end -``` - -For Kubernetes-backed sandboxes, the driver must ensure every pod template has: - -1. A resolvable `openshell-sandbox` entrypoint. -2. Gateway callback environment variables such as `OPENSHELL_SANDBOX_ID`, `OPENSHELL_ENDPOINT`, and `OPENSHELL_SSH_SOCKET_PATH`. -3. TLS and SSH handshake materials when the gateway requires them. -4. The capabilities needed for namespace creation, proxy setup, and Landlock/seccomp. - -These transforms apply to every generated pod template. 
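The resolution heuristic for `--from` can be sketched as a small classifier. This is an illustrative stand-in: the filesystem checks are injected as booleans, and `FromSource`/`classify_from` are hypothetical names, not the CLI's actual types.

```rust
use std::path::Path;

// Hypothetical result type for the `--from` classifier sketch.
#[derive(Debug, PartialEq)]
enum FromSource {
    Dockerfile(String),
    BuildContext(String),
    RejectedLocalPath(String),
    ImageRef(String),
    CommunityName(String),
}

fn classify_from(value: &str, is_file: bool, is_dir_with_dockerfile: bool) -> FromSource {
    let name = Path::new(value)
        .file_name()
        .map(|n| n.to_string_lossy().to_lowercase())
        .unwrap_or_default();
    // 1. Existing file whose name contains "Dockerfile" (case-insensitive).
    if is_file && name.contains("dockerfile") {
        return FromSource::Dockerfile(value.to_string());
    }
    // 2. Existing directory containing a Dockerfile.
    if is_dir_with_dockerfile {
        return FromSource::BuildContext(value.to_string());
    }
    // 3. Explicit local path that does not exist: rejected locally instead of
    //    being sent to the gateway as an image pull.
    if value.starts_with("./") || value.starts_with("../") || value.starts_with('/') {
        return FromSource::RejectedLocalPath(value.to_string());
    }
    // 4. Looks like a full image reference.
    if value.contains('/') || value.contains(':') || value.contains('.') {
        return FromSource::ImageRef(value.to_string());
    }
    // 5. Otherwise: community sandbox name, expanded against the registry prefix.
    let registry = std::env::var("OPENSHELL_COMMUNITY_REGISTRY")
        .unwrap_or_else(|_| "ghcr.io/nvidia/openshell-community/sandboxes".to_string());
    FromSource::CommunityName(format!("{registry}/{value}:latest"))
}

fn main() {
    println!("{:?}", classify_from("openclaw", false, false));
    println!("{:?}", classify_from("myregistry.com/img:tag", false, false));
    println!("{:?}", classify_from("./Dockerfile", true, false));
}
```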
- -## CLI Usage - -### Creating a sandbox from a community image - -```bash -openshell sandbox create --from openclaw -``` - -### Creating a sandbox with a custom image - -```bash -openshell sandbox create --from myimage:latest -- echo "hello from custom container" -``` - -When `--from` is set the CLI clears the default `run_as_user`/`run_as_group` policy (which expects a `sandbox` user) so that arbitrary images that lack that user can start without error. - -### Building from a Dockerfile in one step - -```bash -openshell sandbox create --from ./Dockerfile -- echo "built and running" -openshell sandbox create --from ./my-sandbox/ # directory with Dockerfile -``` - -## Supervisor Behavior in Custom Images - -The `openshell-sandbox` supervisor adapts to arbitrary environments: - -- **Log file fallback**: Attempts to open `/var/log/openshell.log` for append; if the path is not writable, the supervisor keeps console shorthand logging on stderr only. -- **Command resolution**: Executes the command from CLI args, then the `OPENSHELL_SANDBOX_COMMAND` env var (set to `sleep infinity` by the server), then `/bin/bash` as a last resort. -- **Startup seccomp prelude**: Before parsing CLI args or starting the async runtime, the supervisor sets `PR_SET_NO_NEW_PRIVS` and installs a narrow seccomp filter that blocks mount/remount, the new mount API syscalls, module loading, kexec, `bpf`, `perf_event_open`, and `userfaultfd`. This closes the privileged remount window while still leaving required child-setup syscalls such as `setns` available. -- **Network namespace**: Requires successful namespace creation for proxy isolation; startup fails in proxy mode if required capabilities (`CAP_NET_ADMIN`, `CAP_SYS_ADMIN`) or `iproute2` are unavailable. 
If the `iptables` package is present, the supervisor installs OUTPUT chain rules (LOG + REJECT) inside the namespace to provide fast-fail behavior (immediate `ECONNREFUSED` instead of a 30-second timeout) and diagnostic logging when processes attempt direct connections that bypass the HTTP CONNECT proxy. If `iptables` is absent, the supervisor logs a warning and continues — core network isolation still works via routing. - -## Design Decisions - -| Decision | Rationale | -|----------|-----------| -| Unified `--from` flag | Single entry point for community names, Dockerfiles, directories, and image refs — removes the need to know registry paths | -| Community name resolution | Bare names like `openclaw` expand to the GHCR community registry, making the common case simple | -| Auto build/import for Dockerfiles | Eliminates the two-step build/import + create workflow for local gateway development | -| `OPENSHELL_COMMUNITY_REGISTRY` env var | Allows organizations to host their own community sandbox registry | -| Driver-owned supervisor delivery | Each compute driver decides how to deliver `openshell-sandbox` for its runtime. | -| Read-only supervisor delivery | The supervisor should be mounted or packaged read-only where the driver supports it, and the startup seccomp prelude blocks remount syscalls that would otherwise reopen it for writes once privileged bootstrap has completed. | -| Command override | Ensures `openshell-sandbox` is the entrypoint regardless of the image's default CMD | -| Clear `run_as_user/group` for custom images | Prevents startup failure when the image lacks the default `sandbox` user | -| Non-fatal log file init | `/var/log/openshell.log` may be unwritable in arbitrary images; falls back to stdout | -| Local gateway image availability | Dockerfile sources build into the host Docker daemon; managed local gateway deployments import the tag so the selected runtime can resolve it. 
| -| Optional `iptables` for bypass detection | Core network isolation works via routing alone (`iproute2`); `iptables` only adds fast-fail (`ECONNREFUSED`) and diagnostic LOG entries. Making it optional avoids hard failures in minimal images that lack `iptables` while giving better UX when it is available. | - -## Limitations - -- Distroless / `FROM scratch` images are not supported (the supervisor needs glibc and `/proc`) -- Missing `iproute2` (or required capabilities) blocks startup in proxy mode because namespace isolation is mandatory -- Local Dockerfile sources are only supported for local gateways; remote gateways require registry image references. -- The selected compute driver must provide an `openshell-sandbox` binary compatible with the sandbox image and host architecture. diff --git a/architecture/sandbox-providers.md b/architecture/sandbox-providers.md deleted file mode 100644 index d66b6cd2c..000000000 --- a/architecture/sandbox-providers.md +++ /dev/null @@ -1,455 +0,0 @@ -# Providers - -## Overview - -OpenShell uses a first-class `Provider` entity to represent external tool credentials and -configuration (for example `claude`, `gitlab`, `github`, `outlook`, `generic`, `nvidia`). - -Providers exist as an abstraction layer for configuring tools that rely on third-party -access. Rather than each tool managing its own credentials and service configuration, -providers centralize that concern: a user configures a provider once, and any sandbox that -needs that external service can reference it. - -At sandbox creation time, providers are validated and associated with the sandbox. The -sandbox supervisor then fetches credentials at runtime, keeps the real secret values in -supervisor-only memory, and injects placeholder environment variables into every child -process it spawns. When outbound traffic is allowed through the sandbox proxy, the -supervisor rewrites those placeholders back to the real secret values before forwarding. 
-Access is enforced through the sandbox policy — the policy decides which outbound
-requests are allowed or denied based on the providers attached to that sandbox.
-
-Core goals:
-
-- manage providers directly via CLI,
-- discover provider data from the local machine automatically,
-- require providers during sandbox creation,
-- project provider context into sandbox runtime,
-- drive sandbox policy to allow or deny outbound access to third-party services.
-
-## Data Model
-
-Provider is defined in `proto/datamodel.proto`:
-
-- `id`: unique entity id
-- `name`: user-managed name
-- `type`: canonical provider slug (`claude`, `gitlab`, `github`, etc.)
-- `credentials`: `map<string, string>` for secret values
-- `config`: `map<string, string>` for non-secret settings
-
-The gRPC surface is defined in `proto/openshell.proto`:
-
-- `CreateProvider`
-- `GetProvider`
-- `ListProviders`
-- `ListProviderProfiles`
-- `GetProviderProfile`
-- `UpdateProvider`
-- `DeleteProvider`
-
-## Provider Type Profiles
-
-Provider type profiles are declarative metadata for provider types. Built-in profiles
-live as one YAML document per provider under the top-level `providers/` directory
-and are exposed through
-`ListProviderProfiles` and `GetProviderProfile`. The profile loader validates the
-YAML catalog and materializes the same proto-backed shape that future API imports
-will accept. Profiles describe credential names and environment variables, known
-network endpoints, expected binaries, category, and whether the provider is
-inference-capable. Categories are a proto enum so clients can group and filter
-provider types without parsing display strings. Current values are `other`,
-`inference`, `agent`, `source_control`, `messaging`, `data`, and `knowledge`.
-Agent profiles such as `claude`, `codex`, and `opencode` can still be
-inference-capable when their tool talks to an inference API.
-
-Profiles are additive to provider records.
A provider record with only `type`, -`credentials`, and `config` can be matched to built-in profile metadata by -`provider.type`. Profile-generated policy is still opt-in: the gateway composes provider -profile rules only when the gateway-global `providers_v2_enabled` setting is true. - -This keeps the compatibility boundary at the gateway. A gateway without -`providers_v2_enabled=true` keeps the existing credential-only provider behavior, while a -gateway with the flag enabled routes all attached known provider types through the -profile-backed policy path. - -### Provider Policy Composition - -Sandbox policy fetch uses just-in-time composition: - -```text -effective policy = base/static policy + provider profile rules + user rules -``` - -The composed policy is derived data. The sandbox still receives one normal -`SandboxPolicy`, but provider-generated entries are not persisted as user-authored -policy revisions. Full policy replacement and incremental policy updates continue to -mutate the user-authored policy layer. Provider-generated rules are re-added during -composition for each attached provider whose type has a built-in profile. - -Provider-generated network rules use reserved `_provider_*` names derived from the -provider record name. If a user or global policy already has the same key, composition -keeps the policy entry and adds a numeric suffix to the provider entry. Duplicate -host/port endpoints across policy and provider rules are valid; OPA evaluates all -rules, so allow decisions are the union of matching allows and deny rules continue to -win globally. - -Gateway-global policy still overrides sandbox-authored policy. When `providers_v2_enabled` -is true, provider layers compose JIT onto the effective policy source, whether that -source is sandbox-scoped or global. The composed payload is derived data and is not -persisted as a policy revision. 
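The collision handling for provider-generated rule names can be sketched as a small merge function. This is an illustrative stand-in operating on plain string maps; the real composition works on typed policy structures, and the exact suffix numbering shown here is an assumption.

```rust
use std::collections::BTreeMap;

// Sketch: when a provider-generated `_provider_*` key already exists in the
// user/global policy layer, the policy entry is kept and the provider entry
// gets a numeric suffix.
fn compose_network_rules(
    policy_rules: &BTreeMap<String, String>,
    provider_rules: &BTreeMap<String, String>,
) -> BTreeMap<String, String> {
    let mut composed = policy_rules.clone();
    for (name, rule) in provider_rules {
        if !composed.contains_key(name) {
            composed.insert(name.clone(), rule.clone());
            continue;
        }
        // Collision: keep the policy entry, suffix the provider entry.
        let mut n = 2;
        let mut candidate = format!("{name}_{n}");
        while composed.contains_key(&candidate) {
            n += 1;
            candidate = format!("{name}_{n}");
        }
        composed.insert(candidate, rule.clone());
    }
    composed
}

fn main() {
    let mut policy = BTreeMap::new();
    policy.insert("_provider_gitlab".to_string(), "allow gitlab.com:443".to_string());
    let mut provider = BTreeMap::new();
    provider.insert("_provider_gitlab".to_string(), "allow gitlab.example.com:443".to_string());
    println!("{:?}", compose_network_rules(&policy, &provider));
}
```

Duplicate endpoints across the two layers stay valid: OPA evaluates all rules, so keeping both entries only widens the union of allows.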
-
-## Components
-
-- `crates/openshell-providers`
-  - canonical provider type normalization and command detection,
-  - YAML-backed built-in provider profiles,
-  - provider registry and per-provider discovery plugins,
-  - shared discovery engine and context abstraction for testability.
-- `crates/openshell-cli`
-  - `openshell provider ...` command handlers,
-  - sandbox provider requirement resolution in `sandbox create`.
-- `crates/openshell-server` (gateway)
-  - provider CRUD gRPC handlers,
-  - `GetSandboxProviderEnvironment` handler resolves credentials at runtime,
-  - persistence using `object_type = "provider"`.
-- `crates/openshell-sandbox`
-  - sandbox supervisor fetches provider credentials via gRPC at startup,
-  - injects placeholder env vars into entrypoint and SSH child processes,
-  - resolves placeholders back to real secrets in the outbound proxy path.
-
-## Provider Plugins
-
-Each provider has its own module under `crates/openshell-providers/src/providers/`.
-
-### Trait Definition
-
-`ProviderPlugin` (`crates/openshell-providers/src/lib.rs`):
-
-```rust
-pub trait ProviderPlugin: Send + Sync {
-    fn id(&self) -> &'static str;
-    fn discover_existing(&self) -> Result<Option<DiscoveredProvider>, ProviderError>;
-    fn apply_to_sandbox(&self, _provider: &Provider) -> Result<(), ProviderError> {
-        Ok(()) // default no-op, forward-looking extension point
-    }
-}
-```
-
-`DiscoveredProvider` holds two maps (`credentials` and `config`) returned by discovery.
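A concrete plugin might look like the sketch below. The types are simplified stand-ins for the crate's real ones, `apply_to_sandbox` is omitted, and the environment is injected as a map (in the spirit of `DiscoveryContext`) so the example stays deterministic.

```rust
use std::collections::HashMap;

// Simplified stand-ins for the crate's real types.
#[derive(Debug)]
struct ProviderError(String);

#[derive(Debug, Default)]
struct DiscoveredProvider {
    credentials: HashMap<String, String>,
    config: HashMap<String, String>,
}

trait ProviderPlugin: Send + Sync {
    fn id(&self) -> &'static str;
    fn discover_existing(&self) -> Result<Option<DiscoveredProvider>, ProviderError>;
}

// Hypothetical plugin that discovers a single API key from an injected
// environment snapshot.
struct ExamplePlugin {
    env: HashMap<String, String>,
}

impl ProviderPlugin for ExamplePlugin {
    fn id(&self) -> &'static str {
        "example"
    }

    fn discover_existing(&self) -> Result<Option<DiscoveredProvider>, ProviderError> {
        match self.env.get("EXAMPLE_API_KEY") {
            Some(v) if !v.is_empty() => {
                let mut d = DiscoveredProvider::default();
                d.credentials.insert("EXAMPLE_API_KEY".to_string(), v.clone());
                Ok(Some(d))
            }
            _ => Ok(None), // nothing found locally
        }
    }
}

fn main() {
    let plugin = ExamplePlugin {
        env: HashMap::from([("EXAMPLE_API_KEY".to_string(), "sk-test".to_string())]),
    };
    println!("{} -> {:?}", plugin.id(), plugin.discover_existing().unwrap().is_some());
}
```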
- -### Current Modules - -| Module | Env Vars Discovered | Config Paths | -|---|---|---| -| `claude.rs` | `ANTHROPIC_API_KEY`, `CLAUDE_API_KEY` | `~/.claude.json`, `~/.claude/credentials.json`, `~/.config/claude/config.json` | -| `codex.rs` | `OPENAI_API_KEY` | `~/.config/codex/config.json`, `~/.codex/config.json`, `~/.config/openai/config.json` | -| `opencode.rs` | `OPENCODE_API_KEY`, `OPENROUTER_API_KEY`, `OPENAI_API_KEY` | `~/.config/opencode/config.json`, `~/.opencode/config.json` | -| `openclaw.rs` | `OPENCLAW_API_KEY`, `OPENAI_API_KEY` | `~/.config/openclaw/config.json`, `~/.openclaw/config.json` | -| `generic.rs` | *(none)* | *(none)* | -| `nvidia.rs` | `NVIDIA_API_KEY` | *(none)* | -| `gitlab.rs` | `GITLAB_TOKEN`, `GLAB_TOKEN`, `CI_JOB_TOKEN` | `~/.config/glab-cli/config.yml` | -| `github.rs` | `GITHUB_TOKEN`, `GH_TOKEN` | `~/.config/gh/hosts.yml` | -| `outlook.rs` | *(none)* | *(none)* | - -`generic` and `outlook` are stubs — `discover_existing()` always returns `None`. - -Each plugin defines a `ProviderDiscoverySpec` with its `id`, `credential_env_vars`, and -`config_paths`. The registry is assembled in `ProviderRegistry::new()` by registering -each provider module. - -### Normalization - -`normalize_provider_type()` maps common aliases to canonical slugs: `"glab"` -> `"gitlab"`, -`"gh"` -> `"github"`, and accepts `"generic"` directly as a first-class type. -`detect_provider_from_command()` extracts the file basename from the first command token -and passes it through normalization. - -## Discovery Architecture - -Discovery behavior is split into three layers: - -1. provider module defines static spec (`ProviderDiscoverySpec`), -2. shared engine (`discover_with_spec`) performs env/file scanning, -3. runtime context (`DiscoveryContext`) supplies filesystem/environment reads. - -### Discovery Engine - -`discover_with_spec(spec, context)` performs two passes: - -1. 
**Environment variable scan**: for each var in `spec.credential_env_vars`, reads from - the `DiscoveryContext`. Non-empty values are stored in `discovered.credentials`. - -2. **Config file scan**: for each path in `spec.config_paths`: - - expands `~/` via the context, - - rejects `~/` expansions that contain path-escape components (for example `..`), - - checks file existence, - - **only parses `.json` files** (`.yml`/`.yaml` are checked for existence but not read), - - recursively collects JSON fields whose keys match credential patterns - (`api_key`, `apikey`, `token`, `secret`, `password`, `auth` — case-insensitive), - - collected values go into `discovered.credentials` using dotted path keys - (for example `"oauth.api_key"`). - -Config file values always go into `credentials`, not `config`. The `config` map is only -populated via explicit CLI flags. - -### Discovery Context - -`DiscoveryContext` trait: - -```rust -pub trait DiscoveryContext { - fn env_var(&self, key: &str) -> Option; - fn expand_home(&self, path: &str) -> Option; - fn path_exists(&self, path: &Path) -> bool; - fn read_to_string(&self, path: &Path) -> Option; -} -``` - -Implementations: - -- `RealDiscoveryContext` for production runtime (reads from `std::env` and filesystem), -- `MockDiscoveryContext` test helper for deterministic tests. - -This keeps provider tests isolated from host environment and filesystem. - -## CLI Flows - -### Provider CRUD - -`openshell provider create --type --name [--from-existing] [--credential k=v]... [--config k=v]...` - -- `--credential` supports `KEY=VALUE` and `KEY` forms. - - `KEY=VALUE` sets an explicit credential value. - - `KEY` reads from the local environment variable with the same key, and fails when - the local value is missing or empty. -- `--from-existing` and `--credential` are mutually exclusive. -- `--from-existing` merges discovered laptop data into explicit `--config` args. 
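The recursive credential-field collection performed by the config file scan can be sketched like this. The hand-rolled `Json` enum is a stand-in so the sketch needs no external crates; the real engine parses actual JSON files.

```rust
// Hand-rolled JSON stand-in for illustration only.
enum Json {
    Object(Vec<(String, Json)>),
    Str(String),
    Other,
}

const CREDENTIAL_PATTERNS: [&str; 6] = ["api_key", "apikey", "token", "secret", "password", "auth"];

// Case-insensitive substring match against the credential key patterns.
fn key_matches(key: &str) -> bool {
    let k = key.to_lowercase();
    CREDENTIAL_PATTERNS.iter().any(|p| k.contains(p))
}

// Walk the value recursively, recording string fields whose keys match a
// credential pattern under dotted path keys such as "oauth.api_key".
fn collect_credentials(value: &Json, prefix: &str, out: &mut Vec<(String, String)>) {
    if let Json::Object(fields) = value {
        for (key, child) in fields {
            let path = if prefix.is_empty() {
                key.clone()
            } else {
                format!("{prefix}.{key}")
            };
            match child {
                Json::Str(s) if key_matches(key) => out.push((path, s.clone())),
                _ => collect_credentials(child, &path, out),
            }
        }
    }
}

fn main() {
    let doc = Json::Object(vec![
        ("oauth".into(), Json::Object(vec![("api_key".into(), Json::Str("sk-123".into()))])),
        ("editor".into(), Json::Str("vim".into())),
    ]);
    let mut found = Vec::new();
    collect_credentials(&doc, "", &mut found);
    println!("{found:?}"); // [("oauth.api_key", "sk-123")]
}
```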
-
-Also supported:
-
-- `openshell provider get <name>`
-- `openshell provider list`
-- `openshell provider list-profiles`
-- `openshell provider update <name> ...`
-- `openshell provider delete <name> [<name>...]`
-
-### Sandbox Create
-
-`openshell sandbox create --provider gitlab -- claude`
-
-Resolution logic (CLI side, `crates/openshell-cli/src/run.rs`):
-
-1. `detect_provider_from_command()` infers provider from command token after `--`
-   (for example `claude`),
-2. union with explicit `--provider <type>` flags (normalized),
-3. deduplicate,
-4. `ensure_required_providers()` checks each required type exists on the gateway,
-5. if interactive and missing, auto-create from existing local state
-   (uses `ProviderRegistry::discover_existing()`), trying names like `"claude"`,
-   `"claude-1"`, etc. up to 5 retries for name conflicts,
-6. non-interactive mode fails with a clear missing-provider error,
-7. set resolved provider **names** in `SandboxSpec.providers`.
-
-Gateway-side `create_sandbox()` (`crates/openshell-server/src/grpc.rs`):
-
-1. validates all provider names exist by fetching each from the store (fail fast),
-2. creates the `Sandbox` object with `spec.providers` set,
-3. **does not inject credentials into the pod spec** — credentials are fetched at runtime.
-
-If a requested provider name is not found, sandbox creation fails with a
-`FailedPrecondition` error.
-
-> **Note:** Providers can also be configured from within the sandbox itself. This allows
-> sandbox users to set up or update provider credentials and configuration at runtime,
-> without requiring them to be fully resolved before sandbox creation.
-
-## Sandbox Credential Injection
-
-### Runtime Credential Resolution
-
-`SandboxSpec` includes a `providers` field (`repeated string`) containing provider names.
-Credentials are **not** embedded in the pod spec. Instead, the sandbox supervisor fetches
-them at runtime via the `GetSandboxProviderEnvironment` gRPC call.
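The command-based inference used in the resolution logic can be sketched as below; the alias table is abridged to the aliases this document names.

```rust
use std::path::Path;

// Sketch of alias normalization: maps common aliases to canonical slugs,
// passes known slugs through, and rejects everything else. Abridged table.
fn normalize_provider_type(raw: &str) -> Option<String> {
    match raw {
        "glab" | "gitlab" => Some("gitlab".to_string()),
        "gh" | "github" => Some("github".to_string()),
        "claude" | "codex" | "opencode" | "openclaw" | "nvidia" | "outlook" | "generic" => {
            Some(raw.to_string())
        }
        _ => None,
    }
}

// Take the file basename of the first command token, then normalize it.
fn detect_provider_from_command(command: &[&str]) -> Option<String> {
    let first = *command.first()?;
    let base = Path::new(first).file_name()?.to_str()?;
    normalize_provider_type(base)
}

fn main() {
    println!("{:?}", detect_provider_from_command(&["/usr/local/bin/glab", "mr", "list"]));
    println!("{:?}", detect_provider_from_command(&["claude"]));
}
```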
- -### Gateway-side: `resolve_provider_environment()` - -`resolve_provider_environment()` (`crates/openshell-server/src/grpc.rs`) builds the -environment map returned by `GetSandboxProviderEnvironment`: - -1. for each provider name in `spec.providers`, fetch the provider from the store, -2. iterate over `provider.credentials` only (not `config`), -3. validate each key matches `^[A-Za-z_][A-Za-z0-9_]*$` (valid env var name), -4. insert into result map using `entry().or_insert()` — first provider's value wins - when duplicate keys appear across providers, -5. invalid keys are skipped with a warning log. - -Key behaviors: - -- Only `credentials` are injected, not `config`. -- Invalid env var keys (containing `.`, `-`, spaces, etc.) are skipped. -- Credentials are never persisted in the sandbox spec's environment map. -- Provider profiles do not change credential injection in the first iteration. - Injection still uses the existing placeholder environment path. - -### Sandbox Supervisor: Fetching Credentials - -The sandbox pod runs `openshell-sandbox` (`crates/openshell-sandbox/src/main.rs`). On -startup it receives `OPENSHELL_SANDBOX_ID` and `OPENSHELL_ENDPOINT` as environment -variables (injected into the pod spec by the gateway's Kubernetes sandbox creation code). - -In `run_sandbox()` (`crates/openshell-sandbox/src/lib.rs`): - -1. loads the sandbox policy via gRPC (`GetSandboxSettings`), -2. fetches provider credentials via gRPC (`GetSandboxProviderEnvironment`), -3. if the fetch fails, continues with an empty map (graceful degradation with a warning). - -The returned `provider_env` `HashMap` is immediately transformed into: - -- a child-visible env map with placeholder values such as - `openshell:resolve:env:ANTHROPIC_API_KEY`, and -- a supervisor-only in-memory registry mapping each placeholder back to its real secret. - -The placeholder env map is threaded to the entrypoint process spawner and SSH server. 
-The registry is threaded to the proxy so it can rewrite outbound headers. - -### Child Process Environment Variable Injection - -Provider placeholders are injected into child processes in two places, covering all -process spawning paths inside the sandbox: - -**1. Entrypoint process** (`crates/openshell-sandbox/src/process.rs`): - -```rust -let mut cmd = Command::new(program); -cmd.args(args) - .env("OPENSHELL_SANDBOX", "1"); - -// Set provider environment variables (supervisor-managed placeholders). -for (key, value) in provider_env { - cmd.env(key, value); -} -``` - -This uses `tokio::process::Command`. The `.env()` call adds each variable to the child's -inherited environment without clearing it. The spawn path also explicitly removes -`OPENSHELL_SSH_HANDSHAKE_SECRET` so the handshake secret does not leak into the agent -entrypoint process. - -After provider env vars, proxy env vars (`HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY`, -`NO_PROXY=127.0.0.1,localhost,::1`, lowercase variants, etc.) are also set when -`NetworkMode` is `Proxy`. The child is then launched with namespace -isolation, privilege dropping, seccomp, and Landlock restrictions via `pre_exec`. - -**2. SSH shell sessions** (`crates/openshell-sandbox/src/ssh.rs`): - -When a user connects via `openshell sandbox connect`, a PTY shell is spawned: - -```rust -let mut cmd = Command::new(shell); -cmd.env("OPENSHELL_SANDBOX", "1") - .env("HOME", "/sandbox") - .env("USER", "sandbox") - .env("TERM", term); - -// Set provider environment variables (supervisor-managed placeholders). -for (key, value) in provider_env { - cmd.env(key, value); -} -``` - -This uses `std::process::Command`. The `SshHandler` holds the `provider_env` map and -passes it to `spawn_pty_shell()` for each new shell or exec request. SSH child processes -start from `env_clear()`, so the handshake secret is not present there. 
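The placeholder split described above can be sketched as follows (function names are illustrative):

```rust
use std::collections::HashMap;

// Env var names must match ^[A-Za-z_][A-Za-z0-9_]*$ (checked without a regex crate).
fn is_valid_env_key(key: &str) -> bool {
    let mut chars = key.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() || c == '_' => {}
        _ => return false,
    }
    chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}

// Split resolved provider credentials into a child-visible placeholder map and
// a supervisor-only registry mapping each placeholder back to the real secret.
fn build_placeholders(
    provider_env: &HashMap<String, String>,
) -> (HashMap<String, String>, HashMap<String, String>) {
    let mut child_env = HashMap::new();
    let mut registry = HashMap::new();
    for (key, secret) in provider_env {
        if !is_valid_env_key(key) {
            continue; // skipped with a warning in the real supervisor
        }
        let placeholder = format!("openshell:resolve:env:{key}");
        child_env.insert(key.clone(), placeholder.clone());
        registry.insert(placeholder, secret.clone());
    }
    (child_env, registry)
}

fn main() {
    let mut env = HashMap::new();
    env.insert("ANTHROPIC_API_KEY".to_string(), "sk-real".to_string());
    let (child, registry) = build_placeholders(&env);
    println!("{child:?}");
    println!("{registry:?}");
}
```

The child map is what the entrypoint and SSH spawn paths receive; the registry stays in supervisor memory for proxy-time rewriting.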
-
-### Proxy-Time Secret Resolution
-
-When a sandboxed tool uses one of these placeholder env vars in an outbound HTTP request,
-the sandbox proxy rewrites the placeholder to the real secret value immediately before the
-request is forwarded upstream. Placeholders are resolved in four locations:
-
-- **HTTP header values** — exact match (`x-api-key: openshell:resolve:env:KEY`), prefixed
-  match (`Authorization: Bearer openshell:resolve:env:KEY`), and Base64-decoded Basic auth
-  tokens (`Authorization: Basic <base64>`)
-- **URL query parameters** — for APIs that authenticate via query string
-  (e.g., `?key=openshell:resolve:env:YOUTUBE_API_KEY`)
-- **URL path segments** — for APIs that embed tokens in the URL path
-  (e.g., `/bot<token>/sendMessage` for Telegram Bot API)
-
-This applies to forward-proxy HTTP requests, L7-inspected REST requests inside CONNECT
-tunnels, and credential-injection-only passthrough relays on TLS-terminated connections.
-
-All rewriting fails closed: if any `openshell:resolve:env:*` placeholder is detected but
-cannot be resolved, the proxy rejects the request with HTTP 500 instead of forwarding the
-raw placeholder upstream. Resolved secret values are validated for prohibited control
-characters (CR, LF, null byte) to prevent header injection (CWE-113). Path segment
-credentials are additionally validated to reject traversal sequences, path separators, and
-URI delimiters (CWE-22).
-
-The real secret value remains in supervisor memory only; it is not re-injected into the
-child process environment. See [Credential injection](sandbox.md#credential-injection) for
-the full implementation details, encoding rules, and security properties.
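The fail-closed header rewriting can be sketched like this. It is an illustrative stand-in covering only exact and prefixed header matches; Basic auth decoding, query parameters, and path segments are omitted.

```rust
use std::collections::HashMap;

const PREFIX: &str = "openshell:resolve:env:";

// Reject CR/LF/NUL to prevent header injection (CWE-113).
fn check_secret(secret: &str) -> Result<String, String> {
    if secret.chars().any(|c| c == '\r' || c == '\n' || c == '\0') {
        return Err("prohibited control character in secret".to_string());
    }
    Ok(secret.to_string())
}

// Resolve placeholders inside one header value. Exact matches and prefixed
// matches (e.g. "Bearer openshell:resolve:env:KEY") are rewritten; a detected
// placeholder with no registry entry fails closed.
fn resolve_header_value(
    value: &str,
    registry: &HashMap<String, String>,
) -> Result<String, String> {
    if !value.contains(PREFIX) {
        return Ok(value.to_string());
    }
    // Exact match.
    if let Some(secret) = registry.get(value) {
        return check_secret(secret);
    }
    // Prefixed match such as "Bearer <placeholder>".
    if let Some(pos) = value.rfind(PREFIX) {
        let placeholder = &value[pos..];
        if let Some(secret) = registry.get(placeholder) {
            let resolved = check_secret(secret)?;
            return Ok(format!("{}{}", &value[..pos], resolved));
        }
    }
    // Fail closed: never forward an unresolved placeholder upstream.
    Err("unresolved credential placeholder".to_string())
}

fn main() {
    let mut reg = HashMap::new();
    reg.insert("openshell:resolve:env:KEY".to_string(), "sk-real".to_string());
    println!("{:?}", resolve_header_value("Bearer openshell:resolve:env:KEY", &reg));
    println!("{:?}", resolve_header_value("openshell:resolve:env:MISSING", &reg));
}
```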
- -### End-to-End Flow - -```text -CLI: openshell sandbox create -- claude - | - +-- detect_provider_from_command(["claude"]) -> "claude" - +-- ensure_required_providers() -> discovers local ANTHROPIC_API_KEY - | +-- Creates provider record "claude" on gateway with credentials - +-- Sets SandboxSpec.providers = ["claude"] - +-- Sends CreateSandboxRequest to gateway - | - Gateway: create_sandbox() - +-- Validates provider "claude" exists in store (fail fast) - +-- Persists Sandbox with spec.providers = ["claude"] - +-- Creates K8s Sandbox CRD (no credentials in pod spec) - | - K8s: pod starts openshell-sandbox binary - +-- OPENSHELL_SANDBOX_ID and OPENSHELL_ENDPOINT set in pod env - | - Sandbox supervisor: run_sandbox() - +-- Fetches policy via gRPC - +-- Fetches provider env via gRPC - | +-- Gateway resolves: "claude" -> credentials -> {ANTHROPIC_API_KEY: "sk-..."} - +-- Builds placeholder registry - | +-- child env: {ANTHROPIC_API_KEY: "openshell:resolve:env:ANTHROPIC_API_KEY"} - | +-- supervisor registry: {"openshell:resolve:env:ANTHROPIC_API_KEY": "sk-..."} - +-- Spawns entrypoint with placeholder env - +-- SSH server holds placeholder env - | +-- Each SSH shell: cmd.env("ANTHROPIC_API_KEY", "openshell:resolve:env:ANTHROPIC_API_KEY") - +-- Proxy rewrites outbound auth header placeholders -> real secrets -``` - -## Persistence and Validation - -The gateway enforces: - -- `provider.type` must be non-empty, -- name uniqueness for providers, -- generated `id` on create, -- id preservation on update. - -Providers are stored with `object_type = "provider"` in the shared object store. - -## Security Notes - -- Provider credentials are stored in `credentials` map and treated as sensitive. -- CLI output intentionally avoids printing credential values. -- CLI displays only non-sensitive summaries (counts/key names where relevant). 
-- Credentials are never persisted in the sandbox spec — they exist only in the - provider store and are fetched at runtime by the sandbox supervisor. -- Child processes never receive the raw provider secret values; they only receive - placeholders, and the supervisor resolves those placeholders during outbound proxying. -- `OPENSHELL_SSH_HANDSHAKE_SECRET` is required by the supervisor/SSH server path but is - explicitly kept out of spawned sandbox child-process environments. - -## Test Strategy - -- Per-provider unit tests in each provider module. -- Shared normalization/command-detection tests in `crates/openshell-providers/src/lib.rs`. -- Mocked discovery context tests cover env and path-based behavior. -- CLI and gateway integration tests validate end-to-end RPC compatibility. -- `resolve_provider_environment` unit tests in `crates/openshell-server/src/grpc.rs`. -- sandbox unit tests validate placeholder generation and header rewriting. -- E2E sandbox tests verify placeholders are visible in child env, outbound proxy traffic - is rewritten with the real secret, and the SSH handshake secret is absent from exec env. diff --git a/architecture/sandbox.md b/architecture/sandbox.md index 18967cf90..71dd35227 100644 --- a/architecture/sandbox.md +++ b/architecture/sandbox.md @@ -1,1890 +1,101 @@ -# Sandbox Architecture +# Sandbox -The sandbox binary isolates a user-specified command inside a child process with policy-driven enforcement. It combines Linux kernel mechanisms (Landlock, seccomp, network namespaces) with an application-layer HTTP CONNECT proxy to provide filesystem, syscall, and network isolation. An embedded OPA/Rego policy engine evaluates every outbound network connection against per-binary rules, and an optional L7 inspection layer examines individual HTTP requests within allowed tunnels. +A sandbox is the runtime boundary where agent code executes. 
 It is created by a
+compute runtime and managed inside the workload by `openshell-sandbox`, the
+sandbox supervisor.
-## Source File Index
+## Runtime Model
-All paths are relative to `crates/openshell-sandbox/src/`.
+Each sandbox workload has two trust levels:
-| File | Purpose |
-|------|---------|
-| `main.rs` | CLI entry point, argument parsing via `clap`, dual-output logging setup, log push layer initialization |
-| `lib.rs` | `run_sandbox()` orchestration -- the main startup sequence |
-| `log_push.rs` | `LogPushLayer` tracing layer and `spawn_log_push_task()` background batching/streaming to gateway |
-| `policy.rs` | `SandboxPolicy`, `NetworkPolicy`, `ProxyPolicy`, `LandlockPolicy`, `ProcessPolicy` structs and proto conversions |
-| `opa.rs` | OPA/Rego policy engine using `regorus` crate -- network evaluation, sandbox config queries, L7 endpoint queries |
-| `process.rs` | `ProcessHandle` for spawning child processes, privilege dropping, signal handling |
-| `proxy.rs` | HTTP CONNECT proxy with OPA evaluation, process-identity binding, inference interception, and L7 dispatch |
-| `ssh.rs` | Embedded SSH server (`russh` crate) listening on a Unix socket, with PTY support |
-| `supervisor_session.rs` | Persistent outbound `ConnectSupervisor` gRPC session to the gateway; bridges `RelayStream` calls to the local SSH daemon's Unix socket |
-| `identity.rs` | `BinaryIdentityCache` -- SHA256 trust-on-first-use binary integrity |
-| `procfs.rs` | `/proc` filesystem reading for TCP peer identity resolution and ancestor chain walking |
-| `grpc_client.rs` | gRPC client for fetching policy, provider environment, inference route bundles, policy polling/status reporting, proposal submission, and log push (`CachedOpenShellClient`) |
-| `denial_aggregator.rs` | `DenialAggregator` background task -- receives `DenialEvent`s from the proxy and bypass monitor, deduplicates by `(host, port, binary)`, drains on flush interval |
-| `mechanistic_mapper.rs` | Deterministic policy recommendation generator -- converts denial summaries to `PolicyChunk` proposals with confidence scores, rationale, and SSRF/private-IP detection |
-| `sandbox/mod.rs` | Platform abstraction -- dispatches to Linux or no-op |
-| `sandbox/linux/mod.rs` | Linux composition: Landlock then seccomp |
-| `sandbox/linux/landlock.rs` | Filesystem isolation via Landlock LSM (ABI V1) |
-| `sandbox/linux/seccomp.rs` | Syscall filtering via BPF: socket domain blocks, dangerous syscall blocks, conditional flag blocks |
-| `bypass_monitor.rs` | Background `/dev/kmsg` reader for iptables bypass detection events |
-| `sandbox/linux/netns.rs` | Network namespace creation, veth pair setup, bypass detection iptables rules, cleanup on drop |
-| `l7/mod.rs` | L7 types (`L7Protocol`, `TlsMode`, `EnforcementMode`, `L7EndpointConfig`), config parsing, validation, access preset expansion, deprecated `tls` value handling |
-| `l7/graphql.rs` | GraphQL-over-HTTP request classifier, body buffering, operation/root-field extraction, and persisted-query metadata handling |
-| `l7/inference.rs` | Inference API pattern detection (`detect_inference_pattern()`), HTTP request/response parsing and formatting for intercepted inference connections |
-| `l7/tls.rs` | Ephemeral CA generation (`SandboxCa`), per-hostname leaf cert cache (`CertCache`), TLS termination/connection helpers, `looks_like_tls()` auto-detection |
-| `l7/relay.rs` | Protocol-aware bidirectional relay with per-request OPA evaluation, credential-injection-only passthrough relay |
-| `l7/rest.rs` | HTTP/1.1 request/response parsing, body framing (Content-Length, chunked), deny response generation |
-| `l7/path.rs` | Request-target canonicalization: percent-decoding, dot-segment resolution, `;params` stripping, encoded-slash policy (opt-in per endpoint via `allow_encoded_slash: true` for upstreams like GitLab that embed `%2F` in paths). Single source of truth for the path both OPA evaluates and the upstream receives. |
-| `l7/provider.rs` | `L7Provider` trait and `L7Request`/`BodyLength` types |
-| `secrets.rs` | `SecretResolver` credential placeholder system -- placeholder generation, multi-location rewriting (headers, query params, path segments, Basic auth), fail-closed scanning, secret validation, percent-encoding |
+| Process | Role |
+|---|---|
+| Supervisor | Starts as root inside the workload, prepares isolation, runs the proxy, fetches config, injects credentials, serves the relay socket, and launches child processes. |
+| Agent child | Runs as an unprivileged user with filesystem, process, and network restrictions applied. |
-## Startup and Orchestration
+The supervisor keeps enough privilege to manage the sandbox, but the agent child
+loses that privilege before user code runs.
-The `run_sandbox()` function in `crates/openshell-sandbox/src/lib.rs` is the main orchestration entry point. It executes the following steps in order.
+## Startup Flow
-### Orchestration flow
+1. The compute runtime starts the workload with sandbox identity, callback
+   endpoint, TLS or secret material, image metadata, and initial command.
+2. The supervisor loads policy and runtime settings from local files or the
+   gateway, depending on mode.
+3. It prepares filesystem access, process restrictions, network namespace
+   routing, trust stores, provider credential resolution, and inference routes.
+4. It starts the policy proxy and local SSH server.
+5. It opens a supervisor session back to the gateway for connect, exec, file
+   sync, config polling, and log push.
+6. It launches the agent command as the restricted sandbox user.
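The ordering in the startup flow above is load-bearing: isolation prep and the proxy must be in place before the agent child runs. A minimal sketch of that invariant, with invented names (`Step`, `startup_sequence`) standing in for the real supervisor code:

```rust
// Illustrative only -- models the startup ordering, not the real supervisor.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Step {
    LoadConfig,
    PrepareIsolation,
    StartProxyAndSsh,
    OpenSupervisorSession,
    LaunchAgent,
}

// The sequence the supervisor runs, in order.
fn startup_sequence() -> Vec<Step> {
    vec![
        Step::LoadConfig,
        Step::PrepareIsolation,
        Step::StartProxyAndSsh,
        Step::OpenSupervisorSession,
        Step::LaunchAgent,
    ]
}

fn main() {
    let seq = startup_sequence();
    // The agent child is always launched last, after the proxy is serving.
    assert_eq!(seq.last(), Some(&Step::LaunchAgent));
    assert!(
        seq.iter().position(|s| *s == Step::StartProxyAndSsh)
            < seq.iter().position(|s| *s == Step::LaunchAgent)
    );
}
```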
-```mermaid
-flowchart TD
-    A[Parse CLI args] --> B0{gRPC mode?}
-    B0 -- Yes --> B1[Spawn log push task + LogPushLayer]
-    B0 -- No --> B2[Skip log push]
-    B1 --> B[Initialize logging with push layer]
-    B2 --> B[Initialize logging]
-    B --> C[Install rustls crypto provider]
-    C --> D[run_sandbox]
-    D --> E[load_policy]
-    E --> F[Fetch provider env via gRPC]
-    F --> G[Create BinaryIdentityCache]
-    G --> H[prepare_filesystem]
-    H --> I{Proxy mode?}
-    I -- Yes --> J[Generate ephemeral CA + write TLS files]
-    J --> K[Create network namespace]
-    K --> K1[Install bypass detection rules]
-    K1 --> K2[Build InferenceContext]
-    K2 --> L[Start HTTP CONNECT proxy]
-    I -- No --> M[Skip proxy setup]
-    L --> L2[Spawn bypass monitor]
-    L2 --> N{SSH enabled?}
-    M --> N
-    N -- Yes --> O[Spawn SSH server task on Unix socket]
-    N -- No --> P0{gRPC mode + socket?}
-    O --> P0
-    P0 -- Yes --> P1[Spawn supervisor session task]
-    P0 -- No --> P[Spawn child process]
-    P1 --> P
-    P --> Q[Store entrypoint PID]
-    Q --> R{gRPC mode?}
-    R -- Yes --> T[Spawn policy poll task]
-    R -- No --> U[Skip policy poll]
-    T --> V[Wait with optional timeout]
-    U --> V
-    V --> S[Exit with child exit code]
-```
+## Isolation Layers
-### Step-by-step detail
+OpenShell uses overlapping controls rather than a single sandbox primitive:
-1. **Policy loading** (`load_policy()`):
-   - Priority 1: `--policy-rules` + `--policy-data` provided -- load OPA engine from local Rego file and YAML data file via `OpaEngine::from_files()`. Query `query_sandbox_config()` for filesystem/landlock/process settings. Network mode forced to `Proxy`.
-   - Priority 2: `--sandbox-id` + `--openshell-endpoint` provided -- fetch typed proto policy via `grpc_client::fetch_policy()`. Create OPA engine via `OpaEngine::from_proto()` using baked-in Rego rules. Convert proto to `SandboxPolicy` via `TryFrom`, which always forces `NetworkMode::Proxy` so that all egress passes through the proxy and the `inference.local` virtual host is always addressable.
-   - Neither present: return fatal error.
-   - Output: `(SandboxPolicy, Option<Arc<OpaEngine>>)`
+| Layer | Purpose |
+|---|---|
+| Filesystem policy | Landlock restricts the paths the agent can read or write. |
+| Process policy | The child process runs as a non-root user with reduced privileges. |
+| Seccomp | Blocks dangerous syscalls, including raw socket paths that bypass the proxy. |
+| Network namespace | Forces ordinary agent egress through the local CONNECT proxy. |
+| Policy proxy | Evaluates destination, binary identity, TLS/L7 rules, SSRF checks, and inference interception. |
-2. **Provider environment fetching**: If sandbox ID and endpoint are available, call `grpc_client::fetch_provider_environment()` to get a `HashMap` of credential environment variables. On failure, log a warning and continue with an empty map.
+The supervisor may enrich baseline filesystem allowances for runtime-required
+paths, such as proxy support files or GPU device paths when a GPU is present.
-3. **Binary identity cache**: If the OPA engine is active, create an `Arc<BinaryIdentityCache>` for SHA256 TOFU enforcement.
+## Network and Inference
-4. **Filesystem preparation** (`prepare_filesystem()`): For each path in `filesystem.read_write`, reject symlinks, create the directory if it does not exist, and `chown` only newly-created paths to the configured `run_as_user`/`run_as_group`. Pre-existing paths keep the image-defined ownership. Runs as the supervisor (root) before forking.
+All ordinary agent egress is routed through the sandbox proxy. The proxy
+identifies the calling binary, checks trust-on-first-use binary identity, rejects
+unsafe internal destinations, and evaluates the active policy.
-5. **TLS state for L7 inspection** (proxy mode only):
-   - Generate ephemeral CA via `SandboxCa::generate()` using `rcgen`
-   - Write CA cert PEM and combined bundle (system CAs + sandbox CA) to `/etc/openshell-tls/`
-   - Add the TLS directory to `policy.filesystem.read_only` so Landlock allows the child to read it
-   - Build upstream `ClientConfig` with Mozilla root CAs (`webpki_roots`) plus system CA certificates from the container's trust store (e.g. corporate CAs added via `update-ca-certificates`)
-   - Create an `Arc` wrapping a `CertCache` and the upstream config
+`https://inference.local` is special. It bypasses OPA network policy and is
+handled by the inference interception path:
-6. **Network namespace** (Linux, proxy mode only):
-   - `NetworkNamespace::create()` builds the veth pair and namespace
-   - Opens `/var/run/netns/sandbox-{uuid}` as an FD for later `setns()`
-   - `install_bypass_rules(proxy_port)` installs iptables OUTPUT chain rules for bypass detection (fast-fail UX + diagnostic logging). See [Bypass detection](#bypass-detection).
-   - On failure: return a fatal startup error (fail-closed). Bypass rule failure is non-fatal (logged as warning).
+1. The proxy terminates the local TLS connection with the sandbox CA.
+2. It detects known OpenAI, Anthropic, and compatible inference request shapes.
+3. It strips caller-supplied credentials and disallowed headers.
+4. It forwards through `openshell-router` using the route bundle fetched from
+   the gateway.
-7. **Proxy startup** (proxy mode only):
-   - Validate that the OPA engine and identity cache are present
-   - Determine bind address: on Linux, use the netns veth host IP (netns creation is required and startup already aborted if it failed); on non-Linux, use `policy.network.proxy.http_addr`
-   - Build `InferenceContext` via `build_inference_context()`, which resolves routes from one of two sources (see [Inference routing context](#inference-routing-context) below)
-   - `ProxyHandle::start_with_bind_addr()` binds a `TcpListener` and spawns an accept loop, passing the inference context to each connection handler
+External inference endpoints that do not use `inference.local` are treated like
+ordinary network traffic and must be allowed by policy.
-8. **SSH server** (optional): If `--ssh-socket-path` is provided, spawn an async task running `ssh::run_ssh_server()` with the policy, workdir, netns FD, proxy URL, CA paths, and provider env. The value is a filesystem path to the Unix socket the embedded sshd binds. The supervisor waits on a readiness `oneshot` channel before proceeding so that exec requests arriving immediately after pod-ready cannot race against socket bind.
+## Credentials
-9. **Supervisor session** (gRPC mode + SSH socket only): If `--sandbox-id`, `--openshell-endpoint`, and an SSH socket path are all set, spawn `supervisor_session::spawn()`. This task opens a persistent outbound bidirectional gRPC stream to the gateway and bridges inbound relay requests to the local SSH daemon. See [Supervisor Session](#supervisor-session) for the full protocol.
+Provider credentials are stored at the gateway and fetched by the supervisor at
+runtime. The supervisor injects resolved environment variables into the initial
+agent process and SSH child processes. Driver-controlled environment variables
+override template values so sandbox images cannot spoof identity, callback, or
+relay settings.
-10. **Child process spawning** (`ProcessHandle::spawn()`):
-    - Build `tokio::process::Command` with inherited stdio and `kill_on_drop(true)`
-    - Set environment variables: `OPENSHELL_SANDBOX=1`, provider credentials, proxy URLs, TLS trust store paths
-    - Pre-exec closure (async-signal-safe): `setpgid` (if non-interactive) -> `setns` (enter netns) -> `drop_privileges` -> `sandbox::apply` (Landlock + seccomp)
+Credential placeholders in proxied HTTP requests can be resolved by the proxy
+when policy allows the target endpoint. Secrets must not be logged in OCSF or
+plain tracing output.
-11. **Store entrypoint PID**: `entrypoint_pid.store(pid, Ordering::Release)` so the proxy can resolve TCP peer identity via `/proc`.
+## Connect and Logs
-12. **Spawn policy poll task** (gRPC mode only): If `sandbox_id`, `openshell_endpoint`, and an OPA engine are all present, spawn `run_policy_poll_loop()` as a background tokio task. This task polls the gateway for policy updates and hot-reloads the OPA engine when a new version is detected. See [Policy Reload Lifecycle](#policy-reload-lifecycle) for details.
+The supervisor runs an SSH server on a Unix socket inside the sandbox. The
+gateway reaches it through the outbound supervisor relay, not by dialing the
+sandbox workload directly. The relay supports:
-13. **Wait with timeout**: If `--timeout > 0`, wrap `handle.wait()` in `tokio::time::timeout()`. On timeout, kill the process and return exit code 124.
+- Interactive shell sessions.
+- Command execution.
+- Tar-based file sync.
+- Port forwarding where supported by the CLI/TUI surface.
-## Policy Model
+Sandbox logs are emitted locally and can also be pushed back to the gateway.
+Security-relevant sandbox behavior uses OCSF structured events; internal
+diagnostics use ordinary tracing.
-Policy data structures live in `crates/openshell-sandbox/src/policy.rs`.
+## Failure Behavior
-```rust
-pub struct SandboxPolicy {
-    pub version: u32,
-    pub filesystem: FilesystemPolicy,
-    pub network: NetworkPolicy,
-    pub landlock: LandlockPolicy,
-    pub process: ProcessPolicy,
-}
-
-pub struct FilesystemPolicy {
-    pub read_only: Vec<PathBuf>,   // Landlock read-only allowlist
-    pub read_write: Vec<PathBuf>,  // Landlock read-write allowlist (auto-created, chowned)
-    pub include_workdir: bool,     // Add --workdir to read_write (default: true)
-}
-
-pub struct NetworkPolicy {
-    pub mode: NetworkMode,         // Block | Proxy | Allow
-    pub proxy: Option<ProxyPolicy>,
-}
-
-pub struct ProxyPolicy {
-    pub http_addr: Option<String>, // Loopback bind address when not using netns
-}
-
-pub struct LandlockPolicy {
-    pub compatibility: LandlockCompatibility, // BestEffort | HardRequirement
-}
-
-pub struct ProcessPolicy {
-    pub run_as_user: Option<u32>,
-    pub run_as_group: Option<u32>,
-}
-```
-
-### Network mode derivation
-
-The network mode determines which enforcement mechanisms activate:
-
-| Mode | Seccomp | Network namespace | Proxy | Use case |
-|------|---------|-------------------|-------|----------|
-| `Block` | Blocks `AF_INET`, `AF_INET6` + others | No | No | No network access at all |
-| `Proxy` | Blocks `AF_NETLINK`, `AF_PACKET`, `AF_BLUETOOTH`, `AF_VSOCK` (allows `AF_INET`/`AF_INET6`) | Yes (Linux) | Yes | Controlled network via proxy + OPA |
-| `Allow` | No seccomp filter | No | No | Unrestricted network (seccomp skipped entirely) |
-
-In gRPC mode, the mode is always `Proxy`. The `SandboxPolicy::try_from()` conversion forces `NetworkMode::Proxy` unconditionally so that all egress passes through the proxy and the `inference.local` virtual host is always addressable. In file mode, the mode is also always `Proxy` (the presence of `--policy-rules` implies network policy evaluation).
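The mode table above reads as a pure function from network mode to active enforcement layers. A sketch with invented names (`Enforcement`, `enforcement_for`) standing in for the real code; the namespace column applies on Linux only:

```rust
// Illustrative mapping from NetworkMode to the enforcement layers it activates.
#[derive(Debug, PartialEq, Eq)]
struct Enforcement {
    seccomp_filter: bool,
    blocks_inet: bool, // AF_INET / AF_INET6 blocked at the socket layer
    network_namespace: bool,
    proxy: bool,
}

#[derive(Debug, Clone, Copy)]
enum NetworkMode {
    Block,
    Proxy,
    Allow,
}

fn enforcement_for(mode: NetworkMode) -> Enforcement {
    match mode {
        // No network at all: seccomp blocks inet sockets; no netns or proxy.
        NetworkMode::Block => Enforcement {
            seccomp_filter: true,
            blocks_inet: true,
            network_namespace: false,
            proxy: false,
        },
        // Controlled egress: inet is allowed so the child can reach the proxy
        // over the veth pair; the netns confines where those sockets can go.
        NetworkMode::Proxy => Enforcement {
            seccomp_filter: true,
            blocks_inet: false,
            network_namespace: true, // Linux only
            proxy: true,
        },
        // Unrestricted: the seccomp filter is skipped entirely.
        NetworkMode::Allow => Enforcement {
            seccomp_filter: false,
            blocks_inet: false,
            network_namespace: false,
            proxy: false,
        },
    }
}

fn main() {
    assert!(enforcement_for(NetworkMode::Proxy).proxy);
    assert!(!enforcement_for(NetworkMode::Allow).seccomp_filter);
}
```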
-
-### Policy loading modes
-
-```mermaid
-flowchart LR
-    subgraph "File mode (dev)"
-        A[--policy-rules .rego] --> C[OpaEngine::from_files]
-        B[--policy-data .yaml] --> C
-        C --> D[query_sandbox_config]
-        D --> E[SandboxPolicy]
-    end
-    subgraph "gRPC mode (production)"
-        F[OPENSHELL_SANDBOX_ID] --> H[grpc_client::fetch_policy]
-        G[OPENSHELL_ENDPOINT] --> H
-        H --> I[ProtoSandboxPolicy]
-        I --> J[OpaEngine::from_proto]
-        I --> K[SandboxPolicy::try_from]
-    end
-```
-
-## OPA Policy Engine
-
-The OPA engine lives in `crates/openshell-sandbox/src/opa.rs` and uses the `regorus` crate -- a pure-Rust Rego evaluator with no external OPA daemon dependency.
-
-### Baked-in rules
-
-The Rego rules are compiled into the binary via `include_str!("../data/sandbox-policy.rego")`. The package is `openshell.sandbox`. Key rules:
-
-| Rule | Type | Purpose |
-|------|------|---------|
-| `allow_network` | bool | L4 allow/deny decision for a CONNECT request |
-| `network_action` | string | Routing decision: `"allow"` or `"deny"` |
-| `deny_reason` | string | Human-readable deny reason |
-| `matched_network_policy` | string | Name of the matched policy rule |
-| `matched_endpoint_config` | object | Full endpoint config for L7 inspection lookup |
-| `allow_request` | bool | L7 per-request allow/deny decision |
-| `request_deny_reason` | string | L7 deny reason |
-| `filesystem_policy` | object | Static filesystem config passthrough |
-| `landlock_policy` | object | Static Landlock config passthrough |
-| `process_policy` | object | Static process config passthrough |
-
-### `OpaEngine` struct
-
-```rust
-pub struct OpaEngine {
-    engine: Mutex<regorus::Engine>,
-}
-```
-
-The inner `regorus::Engine` requires `&mut self` for evaluation, so access is serialized via `Mutex`. This is acceptable because policy evaluation completes in microseconds and contention is low (one evaluation per CONNECT request at the L4 layer).
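The `Mutex`-serialized engine pattern above can be sketched with a stand-in engine type (the real inner type is `regorus::Engine`; `FakeEngine` here is invented for illustration):

```rust
use std::sync::Mutex;

// `FakeEngine` stands in for `regorus::Engine`, whose eval methods take
// `&mut self`. Wrapping it in a `Mutex` lets a shared `&OpaEngine` be used
// from many connection handlers while serializing each evaluation.
struct FakeEngine {
    evaluations: u64,
}

impl FakeEngine {
    fn eval_bool(&mut self, _rule: &str) -> bool {
        self.evaluations += 1; // internal mutation is why `&mut self` is needed
        true
    }
}

struct OpaEngine {
    engine: Mutex<FakeEngine>,
}

impl OpaEngine {
    fn evaluate_network(&self, rule: &str) -> bool {
        // The lock is held only for the microsecond-scale evaluation.
        self.engine.lock().unwrap().eval_bool(rule)
    }
}

fn main() {
    let opa = OpaEngine {
        engine: Mutex::new(FakeEngine { evaluations: 0 }),
    };
    assert!(opa.evaluate_network("data.openshell.sandbox.allow_network"));
    assert_eq!(opa.engine.lock().unwrap().evaluations, 1);
}
```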
-
-### Loading methods
-
-- **`from_files(policy_path, data_path)`**: Load a user-supplied `.rego` file and YAML data file. Preprocesses data to expand access presets and validate L7 config.
-- **`from_strings(policy, data_yaml)`**: Load from string content (used in tests).
-- **`from_proto(proto_policy)`**: Uses the baked-in Rego rules. Converts the proto's typed fields to JSON under the `sandbox` key (matching `data.sandbox.*` references). Validates L7 config, then expands access presets.
-
-All loading methods run the same preprocessing pipeline: L7 validation (errors block startup, warnings are logged), then access preset expansion (e.g., `access: "read-only"` becomes explicit `rules` with GET/HEAD/OPTIONS).
-
-### Network evaluation
-
-Two evaluation methods exist: `evaluate_network()` for the legacy bool-based path, and `evaluate_network_action()` for the two-state routing path used by the proxy.
-
-#### `evaluate_network(input: &NetworkInput) -> Result<PolicyDecision>`
-
-Input JSON shape:
-
-```json
-{
-  "exec": {
-    "path": "/usr/bin/curl",
-    "ancestors": ["/usr/bin/bash", "/usr/bin/node"],
-    "cmdline_paths": ["/usr/local/bin/claude"]
-  },
-  "network": {
-    "host": "api.example.com",
-    "port": 443
-  }
-}
-```
-
-Evaluates three Rego rules:
-
-1. `data.openshell.sandbox.allow_network` -> bool
-2. `data.openshell.sandbox.deny_reason` -> string
-3. `data.openshell.sandbox.matched_network_policy` -> string (or `Undefined`)
-
-Returns `PolicyDecision { allowed, reason, matched_policy }`.
-
-#### `evaluate_network_action(input: &NetworkInput) -> Result<NetworkAction>`
-
-Uses the same input JSON shape as `evaluate_network()`. Evaluates the `data.openshell.sandbox.network_action` Rego rule, which returns one of two string values:
-
-- `"allow"` -- endpoint + binary explicitly matched in a network policy
-- `"deny"` -- network connections not allowed by policy
-
-The Rego logic:
-
-1. If `network_policy_for_request` exists (endpoint + binary match), return `"allow"`
-2. Default: `"deny"`
-
-Returns `NetworkAction`, an enum with two variants:
-
-```rust
-pub enum NetworkAction {
-    Allow { matched_policy: Option<String> },
-    Deny { reason: String },
-}
-```
-
-The proxy calls `evaluate_network_action()` (not `evaluate_network()`) as its main decision path. Connections to the `inference.local` virtual host bypass OPA evaluation entirely and are handled by the [inference interception](#inference-interception) path before the OPA check.
-
-### L7 endpoint config query
-
-After L4 allows a connection, `query_endpoint_config_with_generation(input)` evaluates `data.openshell.sandbox.matched_endpoint_config` to get the full endpoint object and the policy generation used for the query. If the endpoint has a `protocol` field, `l7::parse_l7_config()` extracts the L7 config for protocol-aware inspection.
-
-### Engine cloning for L7
-
-`clone_engine_for_tunnel(expected_generation)` clones the inner `regorus::Engine` only if the current policy generation still matches the endpoint config generation captured above. With the `arc` feature, this shares compiled policy via `Arc` and only duplicates interpreter state (microseconds). The cloned engine is wrapped in a generation-bound `TunnelPolicyEngine` and used by the L7 relay without contention on the main engine.
-
-The L7 relay checks the captured generation before parsing, evaluating, and forwarding each request. If a policy reload has advanced the shared generation, the relay closes the tunnel before forwarding more bytes. This applies live policy changes to the next L7 request on a keep-alive tunnel and avoids pairing stale endpoint config with a newer policy engine. HTTP passthrough tunnels without endpoint `protocol` are also generation-bound, so credential-injection-only keep-alive tunnels close after a reload before forwarding another request.
-
-Raw streams are connection-scoped and outside the L7 live-reload guarantee. This includes `tls: skip`, binary/non-HTTP CONNECT tunnels, SQL audit fallback passthrough, HTTP upgrades such as WebSocket after `101 Switching Protocols`, and already-forwarded long-lived response bodies such as SSE. Policy reloads affect the next connection or the next parsed HTTP request; they do not interrupt raw byte streams that have already moved outside the request parser.
-
-### Hot reload
-
-Two reload methods exist:
-
-- **`reload(policy, data_yaml)`**: Builds a new engine from raw Rego + YAML strings and atomically replaces the inner engine. Used in tests and by the file-mode path.
-- **`reload_from_proto(proto)`**: Builds a new engine through the same validated pipeline as `from_proto()` -- proto-to-JSON conversion, L7 validation, access preset expansion -- then atomically swaps the inner `regorus::Engine`. On success, all subsequent `evaluate_network_action()` and `query_endpoint_config()` calls use the new policy, and the engine generation increments so active L7 tunnels close before forwarding another request under stale state. On failure (e.g., L7 validation errors), the previous engine and generation are untouched (last-known-good behavior). This is the method used by the policy poll loop for live reloads in gRPC mode.
-
-Both methods hold the `Mutex` only for the final swap (`*engine = new_engine`), so evaluation is blocked for only the duration of a pointer-sized assignment.
-
-## Policy Reload Lifecycle
-
-**File:** `crates/openshell-sandbox/src/lib.rs` (`run_policy_poll_loop()`)
-
-In gRPC mode, the sandbox can receive policy updates at runtime without restarting. A background task polls the gateway for new policy versions and hot-reloads the OPA engine when changes are detected. Only **dynamic** policy domains (network rules) can change at runtime; **static** domains (filesystem, Landlock, process) are applied once in the pre-exec closure and cannot be modified after the child process spawns.
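The generation-binding behavior described above (tunnels close rather than forward under stale state) reduces to an atomic counter comparison. A self-contained sketch with invented names (`SharedGeneration`, `still_current`), not the real relay code:

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::Arc;

// Shared counter that a hot reload bumps on every successful engine swap.
struct SharedGeneration(AtomicU64);

// A tunnel captures the generation at setup time and re-checks it before
// forwarding each parsed request.
struct TunnelPolicyEngine {
    captured_generation: u64,
    shared: Arc<SharedGeneration>,
}

impl TunnelPolicyEngine {
    fn still_current(&self) -> bool {
        self.shared.0.load(Ordering::Acquire) == self.captured_generation
    }
}

fn main() {
    let shared = Arc::new(SharedGeneration(AtomicU64::new(1)));
    let tunnel = TunnelPolicyEngine {
        captured_generation: 1,
        shared: Arc::clone(&shared),
    };
    // Keep-alive tunnel may keep forwarding while the generation matches.
    assert!(tunnel.still_current());

    // A hot reload advances the shared generation...
    shared.0.fetch_add(1, Ordering::Release);
    // ...so the tunnel closes instead of forwarding under stale config.
    assert!(!tunnel.still_current());
}
```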
-
-### Dynamic vs static policy domains
-
-| Domain | Mutable at runtime | Applied where | Reason |
-|--------|-------------------|---------------|--------|
-| `network_policies` | Yes | OPA engine (proxy evaluates per-CONNECT) | Engine swap updates all future evaluations |
-| `filesystem` | No | Landlock LSM in pre-exec | Kernel-enforced; cannot be modified after `restrict_self()` |
-| `landlock` | No | Landlock LSM in pre-exec | Configuration for the above; same restriction |
-| `process` | No | `setuid`/`setgid` in pre-exec | Privileges dropped irrevocably before exec |
-
-The gateway's `UpdateSandboxPolicy` RPC enforces this boundary: it rejects any update where the static fields (`filesystem`, `landlock`, `process`) differ from the version 1 (creation-time) policy. `network_policies` remain live-editable, including transitions between an empty rule set and a non-empty one, because proto-backed sandboxes already start with the proxy and network namespace infrastructure in place.
-
-### Poll loop
-
-The poll loop tracks `config_revision` (a fingerprint of policy + settings + source) as the primary change-detection signal. It separately tracks `policy_hash` to determine whether an OPA reload is needed -- settings-only changes do not trigger OPA reloads.
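The two-signal change detection above can be sketched as a small classifier (`PollOutcome` and `classify` are illustrative names, not the real poll-loop code):

```rust
// `config_revision` gates all processing; `policy_hash` decides whether an
// OPA reload is actually needed.
#[derive(Debug, PartialEq, Eq)]
enum PollOutcome {
    Unchanged,     // skip this poll entirely
    SettingsOnly,  // update tracked state, no OPA reload
    PolicyChanged, // reload_from_proto + status report
}

fn classify(cur_rev: u64, new_rev: u64, cur_hash: &str, new_hash: &str) -> PollOutcome {
    if new_rev == cur_rev {
        PollOutcome::Unchanged
    } else if new_hash == cur_hash {
        PollOutcome::SettingsOnly
    } else {
        PollOutcome::PolicyChanged
    }
}

fn main() {
    assert_eq!(classify(7, 7, "abc", "abc"), PollOutcome::Unchanged);
    // e.g. log_level changed: new revision, same policy hash.
    assert_eq!(classify(7, 8, "abc", "abc"), PollOutcome::SettingsOnly);
    assert_eq!(classify(7, 8, "abc", "def"), PollOutcome::PolicyChanged);
}
```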
-
-```mermaid
-sequenceDiagram
-    participant PL as Settings Poll Loop
-    participant GW as Gateway (gRPC)
-    participant OPA as OPA Engine (Arc)
-
-    PL->>GW: GetSandboxSettings(sandbox_id)
-    GW-->>PL: policy + settings + config_revision
-    PL->>PL: Store initial config_revision, policy_hash, settings
-
-    loop Every OPENSHELL_POLICY_POLL_INTERVAL_SECS (default 10)
-        PL->>GW: GetSandboxSettings(sandbox_id)
-        GW-->>PL: policy + settings + config_revision
-        alt config_revision unchanged
-            PL->>PL: Skip
-        else config_revision changed
-            PL->>PL: log_setting_changes(old_settings, new_settings)
-            alt policy_hash changed
-                PL->>OPA: reload_from_proto(policy)
-                alt Reload succeeds
-                    OPA-->>PL: Ok
-                    PL->>PL: Update tracked state
-                    PL->>GW: ReportPolicyStatus(version, LOADED)
-                else Reload fails (validation error)
-                    OPA-->>PL: Err (old engine untouched)
-                    PL->>GW: ReportPolicyStatus(version, FAILED, error_msg)
-                end
-            else settings-only change
-                PL->>PL: Update tracked state (no OPA reload)
-            end
-        end
-    end
-```
-
-The `run_policy_poll_loop()` function in `crates/openshell-sandbox/src/lib.rs` implements this loop:
-
-1. **Connect once**: Create a `CachedOpenShellClient` that holds a persistent mTLS channel to the gateway. This avoids TLS renegotiation on every poll.
-2. **Fetch initial state**: Call `poll_settings(sandbox_id)` to establish the baseline `current_config_revision`, `current_policy_hash`, and `current_settings` map. On failure, log a warning and retry on the next interval.
-3. **Poll loop**: Sleep for the configured interval, then call `poll_settings()` again.
-4. **Config comparison**: If `result.config_revision == current_config_revision`, skip.
-5. **Per-setting diff logging**: Call `log_setting_changes()` to diff old and new settings maps. Each individual change is logged with old and new values.
-6. **Conditional OPA reload**: Only call `opa_engine.reload_from_proto(policy)` when `policy_hash` changes. Settings-only changes (e.g., `log_level` updated) update the tracked state without touching the OPA engine.
-7. **Status reporting**: On success/failure, report status only for sandbox-scoped policy revisions (`policy_source = SANDBOX`, `version > 0`). Global policy overrides still trigger OPA reload, but they do not write per-sandbox policy status history.
-8. **Global policy logging**: When `global_policy_version > 0`, the sandbox logs `"Policy reloaded successfully (global)"` with the `global_version` field. This distinguishes global reloads from sandbox-scoped reloads in the log stream.
-9. **Update tracked state**: After processing, update `current_config_revision`, `current_policy_hash`, and `current_settings` regardless of whether OPA was reloaded.
-
-### `CachedOpenShellClient`
-
-**File:** `crates/openshell-sandbox/src/grpc_client.rs`
-
-`CachedOpenShellClient` is a persistent gRPC client for the `OpenShell` service. It wraps an `OpenShellClient` connected once at construction and reused for all subsequent calls.
-
-```rust
-pub struct CachedOpenShellClient {
-    client: OpenShellClient,
-}
-
-pub struct SettingsPollResult {
-    pub policy: Option<ProtoSandboxPolicy>,
-    pub version: u32,
-    pub policy_hash: String,
-    pub config_revision: u64,
-    pub policy_source: PolicySource,
-    pub settings: HashMap<String, EffectiveSetting>,
-    pub global_policy_version: u32,
-}
-```
-
-Methods:
-
-- **`connect(endpoint)`**: Establish an mTLS channel and return a new client.
-- **`poll_settings(sandbox_id)`**: Call the `GetSandboxSettings` RPC and return a `SettingsPollResult` containing the policy payload (optional), policy metadata, effective config revision, policy source, global policy version, and the effective settings map (for diff logging).
-- **`report_policy_status(sandbox_id, version, loaded, error_msg)`**: Call the `ReportPolicyStatus` RPC with the appropriate `PolicyStatus` enum value (`Loaded` or `Failed`).
-- **`raw_client()`**: Return a clone of the underlying `OpenShellClient` for direct RPC calls (used by the log push task).
-
-### Server-side policy versioning
-
-The gateway assigns a monotonically increasing version number to each sandbox policy revision. `GetSandboxSettingsResponse` carries the full effective configuration: the policy payload, the effective settings map (with per-key scope indicators), a `config_revision` fingerprint that changes when any effective input changes (policy, settings, or source), and a `policy_source` field indicating whether the policy came from the sandbox's own history or from a global override.
-
-Proto messages involved:
-
-- `GetSandboxSettingsResponse` (`proto/sandbox.proto`): `policy`, `version`, `policy_hash`, `settings` (map of `EffectiveSetting`), `config_revision`, `policy_source`, `global_policy_version`
-- `EffectiveSetting` (`proto/sandbox.proto`): `SettingValue value`, `SettingScope scope`
-- `SettingScope` enum: `UNSPECIFIED`, `SANDBOX`, `GLOBAL`
-- `PolicySource` enum: `UNSPECIFIED`, `SANDBOX`, `GLOBAL`
-- `ReportPolicyStatusRequest` (`proto/openshell.proto`): `sandbox_id`, `version`, `status` (enum), `load_error`
-- `PolicyStatus` enum: `PENDING`, `LOADED`, `FAILED`, `SUPERSEDED`
-- `SandboxPolicyRevision` (`proto/openshell.proto`): Full revision metadata including `created_at_ms`, `loaded_at_ms`
-
-The `global_policy_version` field is zero when no global policy is active or when `policy_source` is `SANDBOX`. When `policy_source` is `GLOBAL`, it carries the version number of the active global revision. The sandbox logs this value on reload (`"Policy reloaded successfully (global)" global_version=N`) and the TUI displays it in the dashboard and sandbox metadata pane.
-
-See [Gateway Settings Channel](gateway-settings.md) for full details on the settings resolution model, storage, and CLI/TUI commands.
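The contract of the `config_revision` fingerprint is that it changes whenever any effective input (policy, settings, or source) changes. A hedged sketch of that property only; the gateway's actual fingerprint scheme is not specified here, and `DefaultHasher` is used purely for illustration:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::hash::{Hash, Hasher};

// Illustrative fingerprint over the three effective inputs. A BTreeMap is
// used so the settings hash in deterministic key order.
fn config_revision(policy_hash: &str, settings: &BTreeMap<String, String>, source: &str) -> u64 {
    let mut h = DefaultHasher::new();
    policy_hash.hash(&mut h);
    settings.hash(&mut h);
    source.hash(&mut h);
    h.finish()
}

fn main() {
    let mut settings = BTreeMap::new();
    settings.insert("log_level".to_string(), "info".to_string());
    let r1 = config_revision("abc", &settings, "SANDBOX");

    // Settings-only change: revision moves even though the policy hash didn't.
    settings.insert("log_level".to_string(), "debug".to_string());
    let r2 = config_revision("abc", &settings, "SANDBOX");
    assert_ne!(r1, r2);

    // Source change (e.g. a global override takes effect): revision moves too.
    let r3 = config_revision("abc", &settings, "GLOBAL");
    assert_ne!(r2, r3);
}
```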
-
-### Failure modes
-
-| Condition | Behavior |
-|-----------|----------|
-| Gateway unreachable during poll | Log at debug level, retry on next interval |
-| Initial version fetch fails | Log warning, retry on next interval (poll loop continues) |
-| `reload_from_proto()` fails (L7 validation error) | Log warning, keep last-known-good engine, report FAILED status |
-| Status report RPC fails | Log warning, poll loop continues unaffected |
-| Poll interval env var unparseable | Fall back to default (10 seconds) |
-
-## Linux Enforcement
-
-All enforcement code runs in the child process's pre-exec closure -- after `fork()` but before `exec()`. The application order is: `setpgid` -> `setns` (netns) -> `drop_privileges` -> `sandbox::apply` (Landlock then seccomp).
-
-### Landlock filesystem isolation
-
-**File:** `crates/openshell-sandbox/src/sandbox/linux/landlock.rs`
-
-Landlock restricts the child process's filesystem access to an explicit allowlist.
-
-1. Build path lists from `filesystem.read_only` and `filesystem.read_write`
-2. If `include_workdir` is true, add the working directory to `read_write`
-3. If both lists are empty, skip Landlock entirely (no-op)
-4. Create a Landlock ruleset targeting ABI V2:
-   - Read-only paths receive `AccessFs::from_read(abi)` rights
-   - Read-write paths receive `AccessFs::from_all(abi)` rights
-5. For each path, attempt `PathFd::new()`. If it fails:
-   - `BestEffort`: Log a warning with the error classification (not found, permission denied, symlink loop, etc.) and skip the path. Continue building the ruleset from remaining valid paths.
-   - `HardRequirement`: Return a fatal error, aborting the sandbox.
-6. If all paths failed (zero rules applied), return an error rather than calling `restrict_self()` on an empty ruleset (which would block all filesystem access)
-7. Call `ruleset.restrict_self()` -- this applies to the calling process and all descendants
-
-Kernel-level error behavior (e.g., Landlock ABI unavailable) depends on `LandlockCompatibility`:
-
-- `BestEffort`: Log a warning and continue without filesystem isolation
-- `HardRequirement`: Return a fatal error, aborting the sandbox
-
-**Baseline path filtering**: System-injected baseline paths (e.g., `/app`) are pre-filtered by `enrich_proto_baseline_paths()` / `enrich_sandbox_baseline_paths()` using `Path::exists()` before they reach Landlock. If a baseline `read_write` path is already present in `read_only`, enrichment skips the promotion so explicit policy intent is preserved. User-specified paths are not pre-filtered -- they are evaluated at Landlock apply time so misconfigurations surface as warnings or errors.
-
-**GPU baseline paths**: The supervisor currently infers GPU baseline paths from
-device nodes and NVIDIA runtime paths visible inside the sandbox container. The
-Docker compute driver can request CDI GPU injection, but this implementation
-does not pass CDI metadata into the supervisor. Future device-specific CDI
-selection may need follow-up work so the supervisor can enrich Landlock using
-the requested CDI device's actual device nodes and mounted library paths. That
-design must work for remote Docker daemons, where Docker-reported CDI spec
-directories are paths on the daemon host and may not be readable by the gateway
-process or the sandbox supervisor.
-
-### Seccomp syscall filtering
-
-**File:** `crates/openshell-sandbox/src/sandbox/linux/seccomp.rs`
-
-Seccomp provides three layers of syscall restriction: socket domain blocks, unconditional syscall blocks, and conditional syscall blocks. The filter uses a default-allow policy (`SeccompAction::Allow`) with targeted rules that return `Errno(EPERM)`.
-
-**Skipped entirely** in `Allow` mode.
-
-Setup:
-
-1. `prctl(PR_SET_NO_NEW_PRIVS, 1)` -- required before seccomp
-2. `seccompiler::apply_filter()` with default action `Allow` and per-rule action `Errno(EPERM)`
-
-#### Socket domain blocks
-
-| Domain | Always blocked | Additionally blocked in Block mode |
-|--------|:-:|:-:|
-| `AF_PACKET` | Yes | |
-| `AF_BLUETOOTH` | Yes | |
-| `AF_VSOCK` | Yes | |
-| `AF_INET` | | Yes |
-| `AF_INET6` | | Yes |
-| `AF_NETLINK` | | Yes |
-
-In `Proxy` mode, `AF_INET`/`AF_INET6` are allowed because the sandboxed process needs to connect to the proxy over the veth pair. The network namespace ensures it can only reach the proxy's IP (`10.200.0.1`).
-
-#### Unconditional syscall blocks
-
-These syscalls are blocked entirely (EPERM for any invocation):
-
-| Syscall | Reason |
-|---------|--------|
-| `memfd_create` | Fileless binary execution bypasses Landlock filesystem restrictions |
-| `ptrace` | Cross-process memory inspection and code injection |
-| `bpf` | Kernel BPF program loading |
-| `process_vm_readv` | Cross-process memory read |
-| `io_uring_setup` | Async I/O subsystem with extensive CVE history |
-| `mount` | Filesystem mount could subvert Landlock or overlay writable paths |
-
-#### Conditional syscall blocks
-
-These syscalls are only blocked when specific flag patterns are present:
-
-| Syscall | Condition | Reason |
-|---------|-----------|--------|
-| `execveat` | `AT_EMPTY_PATH` flag set (arg4) | Fileless execution from an anonymous fd |
-| `unshare` | `CLONE_NEWUSER` flag set (arg0) | User namespace creation enables privilege escalation |
-| `seccomp` | operation == `SECCOMP_SET_MODE_FILTER` (arg0) | Prevents sandboxed code from replacing the active filter |
-
-Conditional blocks use `MaskedEq` for flag checks (bit-test) and `Eq` for exact-value matches. This allows normal use of these syscalls while blocking the dangerous flag combinations.
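The `MaskedEq` condition mentioned above is a bit-test: a rule matches when `(arg & mask) == value`. A sketch of that semantics using flag constants from the Linux uapi headers (the helper `masked_eq` is invented for illustration, not seccompiler's API):

```rust
// Flag constants from the Linux uapi headers.
const AT_EMPTY_PATH: u64 = 0x1000; // execveat arg4 flag
const CLONE_NEWUSER: u64 = 0x1000_0000; // unshare arg0 flag
const CLONE_NEWNS: u64 = 0x0002_0000; // unrelated flag, for the example below

// MaskedEq(mask, value) semantics: the rule fires when the masked bits match.
fn masked_eq(arg: u64, mask: u64, value: u64) -> bool {
    (arg & mask) == value
}

fn main() {
    // execveat with AT_EMPTY_PATH set -> rule matches -> blocked with EPERM.
    assert!(masked_eq(AT_EMPTY_PATH, AT_EMPTY_PATH, AT_EMPTY_PATH));
    // execveat with only unrelated flags -> rule does not match -> allowed.
    assert!(!masked_eq(0x100, AT_EMPTY_PATH, AT_EMPTY_PATH));
    // unshare(CLONE_NEWUSER | CLONE_NEWNS) still trips the bit-test, which is
    // why MaskedEq (not an exact Eq) is used for flag arguments.
    assert!(masked_eq(CLONE_NEWUSER | CLONE_NEWNS, CLONE_NEWUSER, CLONE_NEWUSER));
}
```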
- -### Network namespace isolation - -**File:** `crates/openshell-sandbox/src/sandbox/linux/netns.rs` - -The network namespace creates an isolated network stack where the sandboxed process can only communicate through the proxy. - -#### Topology - -```text -HOST NAMESPACE SANDBOX NAMESPACE ------------------ ----------------- -veth-h-{uuid} veth-s-{uuid} -10.200.0.1/24 <------- veth pair ----> 10.200.0.2/24 - | | - v v -Proxy listener Sandboxed process - | (default route -> 10.200.0.1) - v -Internet (filtered by OPA policy) -``` - -#### Creation sequence (`NetworkNamespace::create()`) - -1. Generate UUID-based short ID (first 8 chars) -2. `ip netns add sandbox-{id}` -- create the namespace -3. `ip link add veth-h-{id} type veth peer name veth-s-{id}` -- create veth pair -4. `ip link set veth-s-{id} netns sandbox-{id}` -- move sandbox veth into namespace -5. Configure host side: assign `10.200.0.1/24`, bring up -6. Configure sandbox side (inside namespace): assign `10.200.0.2/24`, bring up loopback, add default route via `10.200.0.1` -7. Open `/var/run/netns/sandbox-{id}` FD for later `setns()` calls - -Each step has rollback on failure -- if any `ip` command fails, previously created resources are cleaned up. - -#### Cleanup on drop - -`NetworkNamespace` implements `Drop`: - -1. Close the namespace FD -2. Delete the host-side veth (`ip link delete veth-h-{id}`) -- this automatically removes the peer -3. Delete the namespace (`ip netns delete sandbox-{id}`) - -#### Bypass detection - -**Files:** `crates/openshell-sandbox/src/sandbox/linux/netns.rs` (`install_bypass_rules()`), `crates/openshell-sandbox/src/bypass_monitor.rs` - -The network namespace routes all sandbox traffic through the veth pair, but a misconfigured process that ignores proxy environment variables can still attempt direct connections to the veth gateway IP or other addresses. 
Bypass detection catches these attempts, providing two benefits: immediate connection failure (fast-fail UX) instead of a 30-second TCP timeout, and structured diagnostic logging that identifies the offending process. - -##### iptables rules - -`install_bypass_rules()` installs OUTPUT chain rules inside the sandbox network namespace using `iptables` (IPv4) and `ip6tables` (IPv6, best-effort). Rules are installed via `ip netns exec {namespace} iptables ...`. The rules are evaluated in order: - -| # | Rule | Target | Purpose | -|---|------|--------|---------| -| 1 | `-d {host_ip}/32 -p tcp --dport {proxy_port}` | `ACCEPT` | Allow traffic to the proxy | -| 2 | `-o lo` | `ACCEPT` | Allow loopback traffic | -| 3 | `-m conntrack --ctstate ESTABLISHED,RELATED` | `ACCEPT` | Allow response packets for established connections | -| 4 | `-p tcp --syn -m limit --limit 5/sec --limit-burst 10 --log-prefix "openshell:bypass:{ns}:"` | `LOG` | Log TCP SYN bypass attempts (rate-limited) | -| 5 | `-p tcp` | `REJECT --reject-with icmp-port-unreachable` | Reject TCP bypass attempts (fast-fail) | -| 6 | `-p udp -m limit --limit 5/sec --limit-burst 10 --log-prefix "openshell:bypass:{ns}:"` | `LOG` | Log UDP bypass attempts, including DNS (rate-limited) | -| 7 | `-p udp` | `REJECT --reject-with icmp-port-unreachable` | Reject UDP bypass attempts (fast-fail) | - -The LOG rules use the `--log-uid` flag to include the UID of the process that initiated the connection. The log prefix `openshell:bypass:{namespace_name}:` enables the bypass monitor to filter `/dev/kmsg` for events belonging to a specific sandbox. - -The proxy port defaults to `3128` unless the policy specifies a different `http_addr`. IPv6 rules mirror the IPv4 rules via `ip6tables`; IPv6 rule installation failure is non-fatal (logged as warning) since IPv4 is the primary path. - -**Graceful degradation:** If iptables is not available (checked via `which iptables`), a warning is logged and rule installation is skipped entirely. 
The network namespace still provides isolation via routing — processes can only reach the proxy's IP, but without bypass rules they get a timeout rather than an immediate rejection. LOG rule failure is also non-fatal — if the `xt_LOG` kernel module is not loaded, the REJECT rules are still installed for fast-fail behavior. - -##### /dev/kmsg monitor - -`bypass_monitor::spawn()` starts a background tokio task (via `spawn_blocking`) that reads kernel log messages from `/dev/kmsg`. The monitor: - -1. Opens `/dev/kmsg` in read mode and seeks to end (skips historical messages) -2. Reads lines via `BufReader`, filtering for the namespace-specific prefix `openshell:bypass:{namespace_name}:` -3. Parses iptables LOG format via `parse_kmsg_line()`, extracting `DST`, `DPT`, `SPT`, `PROTO`, and `UID` fields -4. Resolves process identity for TCP events via multi-owner socket inode lookup (best-effort — requires a valid entrypoint PID and non-zero source port). If multiple processes hold the same socket with different executable identities, the event is marked ambiguous instead of attributing it to one PID. -5. Emits a structured `tracing::warn!()` event with the tag `BYPASS_DETECT` -6. 
Sends a `DenialEvent` to the denial aggregator channel (if available) - -The `BypassEvent` struct holds the parsed fields: - -```rust -pub struct BypassEvent { - pub dst_addr: String, // Destination IP address - pub dst_port: u16, // Destination port - pub src_port: u16, // Source port (for process identity resolution) - pub proto: String, // "tcp" or "udp" - pub uid: Option<u32>, // UID from --log-uid (if present) -} -``` - -##### BYPASS_DETECT tracing event - -Each detected bypass attempt emits a `warn!()` log line with the following structured fields: - -| Field | Type | Description | -|-------|------|-------------| -| `dst_addr` | string | Destination IP address | -| `dst_port` | u16 | Destination port | -| `proto` | string | `"tcp"` or `"udp"` | -| `binary` | string | Binary path of the offending process (or `"-"` if unresolved) | -| `binary_pid` | string | PID of the offending process (or `"-"`) | -| `ancestors` | string | Ancestor chain (e.g., `"/usr/bin/bash -> /usr/bin/node"`) or `"-"` | -| `action` | string | Always `"reject"` | -| `reason` | string | `"direct connection bypassed HTTP CONNECT proxy"` | -| `hint` | string | Context-specific remediation hint (see below) | - -The `hint` field provides actionable guidance: - -| Condition | Hint | -|-----------|------| -| UDP + port 53 | `"DNS queries should route through the sandbox proxy; check resolver configuration"` | -| UDP (other) | `"UDP traffic must route through the sandbox proxy"` | -| TCP | `"ensure process honors HTTP_PROXY/HTTPS_PROXY; for Node.js set NODE_USE_ENV_PROXY=1"` | - -Process identity resolution is best-effort and TCP-only. For UDP events or when the entrypoint PID is not yet set (PID == 0), the binary, PID, and ancestors fields are reported as `"-"`. - -##### DenialEvent integration - -Each bypass event sends a `DenialEvent` to the denial aggregator with `denial_stage: "bypass"`. 
This integrates bypass detections into the same deduplication, aggregation, and policy proposal pipeline as proxy-level denials. The `DenialEvent` fields: - -| Field | Value | -|-------|-------| -| `host` | Destination IP address | -| `port` | Destination port | -| `binary` | Binary path (or `"-"`) | -| `ancestors` | Ancestor chain parsed from `" -> "` separator | -| `deny_reason` | `"direct connection bypassed HTTP CONNECT proxy"` | -| `denial_stage` | `"bypass"` | -| `l7_method` | `None` | -| `l7_path` | `None` | - -The denial aggregator deduplicates bypass events by the same `(host, port, binary)` key used for proxy denials, and flushes them to the gateway via `SubmitPolicyAnalysis` on the same interval. - -##### Lifecycle wiring - -The bypass detection subsystem is wired in `crates/openshell-sandbox/src/lib.rs`: - -1. After `NetworkNamespace::create()` succeeds, `install_bypass_rules(proxy_port)` is called. Failure is non-fatal (logged as warning). -2. The proxy's denial channel sender (`denial_tx`) is cloned as `bypass_denial_tx` before being passed to the proxy. -3. After proxy startup, `bypass_monitor::spawn()` is called with the namespace name, entrypoint PID, and `bypass_denial_tx`. Returns `None` if `/dev/kmsg` is unavailable. - -The monitor runs for the lifetime of the sandbox. It exits when `/dev/kmsg` reaches EOF (process termination) or encounters an unrecoverable read error. - -**Graceful degradation:** If `/dev/kmsg` cannot be opened (e.g., restricted container environment without access to the kernel ring buffer), the monitor logs a one-time warning and returns `None`. The iptables REJECT rules still provide fast-fail UX — the monitor only adds diagnostic visibility. - -##### Dependencies - -Bypass detection requires the `iptables` package for rule installation (in addition to `iproute2` for namespace management). If iptables is not installed, bypass detection degrades to routing-only isolation. 
The `/dev/kmsg` device is required for the monitor but not for the REJECT rules. - -#### Required capabilities - -| Capability | Purpose | -|------------|---------| -| `CAP_SYS_ADMIN` | Creating network namespaces, `setns()` | -| `CAP_NET_ADMIN` | Creating veth pairs, assigning IPs, configuring routes, installing iptables bypass detection rules | -| `CAP_SYS_PTRACE` | Proxy reading `/proc/<pid>/fd/` and `/proc/<pid>/exe` for processes running as a different user | - -The `iproute2` package must be installed (provides the `ip` command). The `iptables` package is required for bypass detection rules; if absent, the namespace still provides routing-based isolation but without fast-fail rejection or diagnostic logging for bypass attempts. - -If namespace creation fails (e.g., missing capabilities), startup fails in `Proxy` mode. This preserves fail-closed behavior: either network namespace isolation is active, or the sandbox does not run. - -## HTTP CONNECT Proxy - -**File:** `crates/openshell-sandbox/src/proxy.rs` - -The proxy is an async TCP listener that accepts HTTP CONNECT requests. Each connection spawns a handler task. The proxy evaluates every CONNECT request against OPA policy with full process-identity binding, except for connections to the `inference.local` virtual host which bypass OPA and are handled by the inference interception path. 
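As a concrete reference for the request format the proxy accepts, here is a minimal sketch of CONNECT target parsing. The helper is hypothetical; the real parser in `proxy.rs` also enforces the header-size limit, handles more edge cases, and emits structured logs.

```rust
// Hypothetical sketch: split "CONNECT host:port HTTP/1.1" into (host, port).
// Bracketed IPv6 literals are ignored here for brevity.
fn parse_connect_target(request_line: &str) -> Option<(String, u16)> {
    let mut parts = request_line.split_whitespace();
    // Only CONNECT is accepted; other methods receive a 403.
    if parts.next()? != "CONNECT" {
        return None;
    }
    let target = parts.next()?;
    // rsplit_once takes the last colon, so the port always parses cleanly.
    let (host, port) = target.rsplit_once(':')?;
    let port: u16 = port.parse().ok()?;
    // The lowercased host is what the inference.local check compares against.
    Some((host.to_ascii_lowercase(), port))
}

fn main() {
    assert_eq!(
        parse_connect_target("CONNECT example.com:443 HTTP/1.1"),
        Some(("example.com".to_string(), 443))
    );
    assert_eq!(
        parse_connect_target("CONNECT Inference.Local:443 HTTP/1.1"),
        Some(("inference.local".to_string(), 443))
    );
    assert_eq!(parse_connect_target("GET / HTTP/1.1"), None);
}
```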
- -### Connection flow - -```mermaid -sequenceDiagram - participant S as Sandboxed Process - participant P as Proxy (host netns) - participant O as OPA Engine - participant R as Router (sandbox-local) - participant DNS as DNS Resolver - participant Backend as Inference Backend - participant U as Upstream Server - - S->>P: CONNECT host:port HTTP/1.1 - P->>P: Parse CONNECT target (host, port) - - alt Target is inference.local - P-->>S: HTTP/1.1 200 Connection Established - P->>P: TLS-terminate client (SandboxCa) - P->>P: Parse HTTP request from tunnel - alt Inference API pattern matched - P->>P: Strip Authorization header - P->>R: proxy_with_candidates(protocol, method, path, headers, body, routes) - R->>Backend: POST /v1/chat/completions (with route API key) - Backend-->>R: HTTP response - R-->>P: ProxyResponse(status, headers, body) - P-->>S: HTTP response (re-encrypted via TLS) - else Non-inference request - P-->>S: HTTP/1.1 403 JSON error - end - else Regular host - P->>P: Resolve TCP peer identity via /proc - P->>P: TOFU verify binary SHA256 - P->>P: Walk ancestor chain, verify each - P->>P: Collect cmdline paths - P->>O: evaluate_network_action(input) - O-->>P: NetworkAction (Allow / Deny) - P->>P: Log CONNECT decision (unified log line) - alt Deny - P-->>S: HTTP/1.1 403 Forbidden - else Allow - P->>DNS: resolve_and_reject_internal(host, port) - DNS-->>P: Resolved addresses - alt Any IP is internal - P->>P: Log warning (SSRF blocked) - P-->>S: HTTP/1.1 403 Forbidden - else All IPs public - P->>U: TCP connect (resolved addrs) - P-->>S: HTTP/1.1 200 Connection Established - alt tls: skip - P->>P: copy_bidirectional (raw tunnel) - else Auto-detect - P->>P: Peek first bytes - alt TLS detected - P->>P: TLS terminate (MITM) - alt L7 config present - P->>P: relay_with_inspection (per-request L7 evaluation) - else No L7 config - P->>P: relay_passthrough_with_credentials (credential injection) - end - else HTTP detected - alt L7 config present - P->>P: 
relay_with_inspection - else No L7 config - P->>P: relay_passthrough_with_credentials - end - else Neither TLS nor HTTP - P->>P: copy_bidirectional (raw tunnel) - end - end - end - end -``` - -### `ProxyHandle` - -`ProxyHandle` wraps a `JoinHandle` and the bound address. The `Drop` implementation aborts the accept loop. `start_with_bind_addr()` accepts an optional `inference_ctx: Option<Arc<InferenceContext>>` that enables inference interception. See [Inference routing context](#inference-routing-context) for how the `InferenceContext` is constructed. - -Startup steps: - -1. Determine bind address: use the override (veth host IP) if provided, else fall back to `policy.http_addr` -2. Enforce loopback restriction when not using a network namespace override -3. Bind `TcpListener`, spawn accept loop -4. Each accepted connection spawns `handle_tcp_connection()` as a separate tokio task, passing the `InferenceContext` (if present) to each handler - -### Request parsing - -The proxy reads up to 8192 bytes (`MAX_HEADER_BYTES`) looking for `\r\n\r\n`. It validates the method is `CONNECT` (returning 403 for anything else with a structured log) and parses the `host:port` target. - -### `inference.local` interception (pre-OPA fast path) - -After parsing the CONNECT target, the proxy checks whether the hostname (lowercased) matches `INFERENCE_LOCAL_HOST` (`"inference.local"`). If it does, the proxy immediately sends `200 Connection Established` and hands the connection to `handle_inference_interception()`, bypassing OPA evaluation entirely. This design ensures `inference.local` is always addressable in proxy mode regardless of what network policies are configured. - -### OPA evaluation with identity binding (`evaluate_opa_tcp()`) - -For all non-`inference.local` CONNECT targets, the proxy performs OPA evaluation with process-identity binding. This is the core security evaluation path, Linux-only (requires `/proc`). 
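The first stage of identity binding -- matching the client's source port to a socket inode in `/proc/<pid>/net/tcp` -- can be sketched in plain Rust. This is a hypothetical helper assuming the standard procfs column layout (local address as hex `IP:PORT` in field 1, inode in field 9); the real lookup also covers `tcp6` and multi-owner inode resolution.

```rust
// Hypothetical helper: find the socket inode whose local port matches the
// proxy-observed peer port, given the text of /proc/<pid>/net/tcp.
fn inode_for_local_port(proc_net_tcp: &str, port: u16) -> Option<u64> {
    for line in proc_net_tcp.lines().skip(1) {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.len() < 10 {
            continue;
        }
        // Field 1 is "HEXIP:HEXPORT"; the port is hex-encoded.
        let Some((_, hex_port)) = fields[1].rsplit_once(':') else {
            continue;
        };
        if u16::from_str_radix(hex_port, 16) == Ok(port) {
            // The next stage scans descendant /proc/<pid>/fd/ entries for
            // this inode to recover the owning PID and binary path.
            return fields[9].parse().ok();
        }
    }
    None
}

fn main() {
    let sample = "sl local_address rem_address st queues timers retrnsmt uid timeout inode\n 0: 0100007F:0CEA 0A0000C8:1F90 01 00000000:00000000 00:00000000 00000000 1000 0 424242 1";
    // 0x0CEA == 3306; the matching socket's inode is 424242.
    assert_eq!(inode_for_local_port(sample, 3306), Some(424242));
    assert_eq!(inode_for_local_port(sample, 80), None);
}
```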
- -```mermaid -flowchart TD - A[Get entrypoint PID from AtomicU32] --> B{PID == 0?} - B -- Yes --> C[Deny: process not yet spawned] - B -- No --> D[Parse /proc/PID/net/tcp for peer port] - D --> E[Find socket inode] - E --> F[Scan descendant FDs for inode] - F --> G[Read /proc/PID/exe for binary path] - G --> H[TOFU verify binary SHA256] - H --> I{Hash match?} - I -- No --> J[Deny: integrity violation] - I -- Yes --> K[Walk PPid chain for ancestors] - K --> L[TOFU verify each ancestor] - L --> M[Collect cmdline absolute paths] - M --> N[Build NetworkInput] - N --> O[OPA evaluate_network_action] - O --> P[Return ConnectDecision] -``` - -On non-Linux platforms, `evaluate_opa_tcp()` always denies with the reason "identity binding unavailable on this platform". - -### `ConnectDecision` struct - -```rust -struct ConnectDecision { - action: NetworkAction, // Allow or Deny - binary: Option<String>, - binary_pid: Option<u32>, - ancestors: Vec<String>, - cmdline_paths: Vec<String>, -} -``` - -The `action` field carries the matched policy name (for `Allow`) or the deny reason (for `Deny`) inside the `NetworkAction` enum variants. - -### Unified logging - -Every CONNECT request to a non-`inference.local` target produces an `info!()` log line with all context: source/destination addresses, binary path, PID, ancestor chain, cmdline paths, action (`allow` or `deny`), engine, matched policy, and deny reason. Inference interception failures produce a separate `info!()` log with `action=deny` and the denial reason. - -### SSRF protection (internal IP rejection) - -After OPA allows a connection, the proxy resolves the host using the sandbox's `/etc/hosts` first on Linux (via `/proc/<pid>/root/etc/hosts`, which picks up Kubernetes `hostAliases`), then falls back to DNS. It rejects any host that resolves to an internal IP address (loopback, RFC 1918 private, link-local, or IPv4-mapped IPv6 equivalents). This defense-in-depth measure prevents SSRF attacks where an allowed hostname is pointed at internal infrastructure. 
The check is implemented by `resolve_and_reject_internal()`, which validates every resolved address via `is_internal_ip()`. If any resolved IP is internal, the connection receives a `403 Forbidden` response and a warning is logged. `hostAliases` only affect name resolution — private destinations still need `allowed_ips`. See [SSRF Protection](security-policy.md#ssrf-protection-internal-ip-rejection) for the full list of blocked ranges. - -IP classification helpers (`is_always_blocked_ip`, `is_always_blocked_net`, `is_internal_ip`) are shared from `openshell_core::net`. The `parse_allowed_ips` function rejects entries overlapping always-blocked ranges (loopback, link-local, unspecified) at load time with a hard error, and `implicit_allowed_ips_for_ip_host` skips synthesis for always-blocked literal IP hosts. The mechanistic mapper filters proposals for always-blocked destinations to prevent infinite TUI notification loops. - -### Inference interception - -When a CONNECT target is `inference.local`, the proxy TLS-terminates the client side and inspects the HTTP traffic to detect inference API calls. Matched requests are executed locally via the `openshell-router` crate. The function `handle_inference_interception()` implements this path and returns an `InferenceOutcome`: - -```rust -enum InferenceOutcome { - /// At least one request was successfully routed to a local inference backend. - Routed, - /// The connection was denied (TLS failure, non-inference request, etc.). - Denied { reason: String }, -} -``` - -Every exit path in `handle_inference_interception` produces an explicit outcome. The `Denied` variant carries a human-readable reason describing the failure. At the call site in `handle_tcp_connection`, `Denied` outcomes trigger a structured CONNECT deny log with the denial reason. The `route_inference_request` helper returns `Result<bool>` where `true` means the request was routed and `false` means the request was not allowed by policy and was denied inline. 
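One detail of pattern detection worth pinning down is the model-discovery wildcard: `GET /v1/models/*` must match the collection itself as well as individual model paths, after stripping any query string. A minimal sketch (hypothetical helper name):

```rust
// Sketch of the /v1/models/* match: strip the query string, then accept
// /v1/models itself or anything under /v1/models/.
fn matches_models_pattern(path: &str) -> bool {
    let path = path.split('?').next().unwrap_or(path);
    path == "/v1/models" || path.starts_with("/v1/models/")
}

fn main() {
    assert!(matches_models_pattern("/v1/models"));
    assert!(matches_models_pattern("/v1/models/gpt-4.1"));
    assert!(matches_models_pattern("/v1/models?limit=10"));
    // Prefix-only lookalikes must not match.
    assert!(!matches_models_pattern("/v1/modelsets"));
    assert!(!matches_models_pattern("/v1/chat/completions"));
}
```

The `starts_with("/v1/models/")` check (with trailing slash) is what keeps lookalike paths such as `/v1/modelsets` out of the wildcard.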
- -The interception steps: - -1. **TLS termination**: The proxy responds with `200 Connection Established`, then performs TLS termination using the existing `SandboxCa` / `CertCache` infrastructure (same as L7 inspection). The client sees a valid certificate for the target hostname. If TLS termination fails, returns `Denied { reason: "TLS handshake failed: ..." }`. - -2. **HTTP request parsing**: Reads HTTP/1.1 requests from the decrypted tunnel using `try_parse_http_request()` from `l7/inference.rs`. Supports both `Content-Length` and `Transfer-Encoding: chunked` request framing (chunked bodies are decoded before forwarding). Uses a growable buffer starting at 64 KiB (`INITIAL_INFERENCE_BUF`) up to 10 MiB (`MAX_INFERENCE_BUF`). Returns `413 Payload Too Large` if the limit is exceeded (and `Denied { reason: "payload too large" }` if no request was previously routed). - -3. **Inference pattern detection**: `detect_inference_pattern()` checks the request method and path against the configured patterns. Default patterns from `default_patterns()`: - - | Method | Path | Protocol | Kind | - |--------|------|----------|------| - | `POST` | `/v1/chat/completions` | `openai_chat_completions` | `chat_completion` | - | `POST` | `/v1/completions` | `openai_completions` | `completion` | - | `POST` | `/v1/responses` | `openai_responses` | `responses` | - | `POST` | `/v1/messages` | `anthropic_messages` | `messages` | - | `GET` | `/v1/models` | `model_discovery` | `models_list` | - | `GET` | `/v1/models/*` | `model_discovery` | `models_get` | - - Pattern matching strips query strings. Exact path comparison is used for most patterns; the `/v1/models/*` pattern matches `/v1/models` itself or any path under `/v1/models/` (e.g., `/v1/models/gpt-4.1`). - -4. **Header sanitization**: For matched inference requests, the proxy passes the parsed headers to the router. 
The router applies a route-aware allowlist before forwarding: common inference headers (`content-type`, `accept`, `accept-encoding`, `user-agent`), provider-specific passthrough headers (for example `openai-organization`, `x-model-id`, `anthropic-version`, `anthropic-beta`), and any route default header names. It always strips client-supplied credential headers (`Authorization`, `x-api-key`) and framing/hop-by-hop headers (`host`, `content-length`, `transfer-encoding`, `connection`, etc.). The router rebuilds correct framing for the forwarded body. - -5. **Local routing**: Matched requests are executed by calling `Router::proxy_with_candidates_streaming()`, passing the detected protocol, HTTP method, path, original parsed headers, body, and the cached `ResolvedRoute` list from `InferenceContext`. The router selects the first route whose `protocols` list contains the source protocol (see [Inference Routing -- Response streaming](inference-routing.md#response-streaming) for details). When forwarding to the backend, the router rewrites the request: the route's `api_key` replaces the client auth header, the `Host` header is set to the backend endpoint, only allowlisted request headers survive, and the `"model"` field in the JSON request body is replaced with the route's configured `model` value. If the request body is not valid JSON or does not contain a `"model"` key, the body is forwarded unchanged. - -6. **Response handling (streaming)**: - - On success: response headers are sent back to the client immediately as an HTTP/1.1 response with `Transfer-Encoding: chunked`, using `format_http_response_header()`. Framing/hop-by-hop headers are stripped from the upstream response. Body chunks are then forwarded incrementally as they arrive from the backend via `StreamingProxyResponse::next_chunk()`, each wrapped in HTTP chunked encoding by `format_chunk()`. The stream is terminated with a `0\r\n\r\n` chunk terminator. 
This ensures time-to-first-byte reflects the backend's first token latency rather than the full generation time. - - On router failure: the error is mapped to an HTTP status code via `router_error_to_http()` and returned as a JSON error body (see error table below) - - Empty route cache: returns `503` JSON error (`{"error": "cluster inference is not configured"}`) - - Non-inference requests: returns `403 Forbidden` with a JSON error body (`{"error": "connection not allowed by policy"}`) - -7. **Connection lifecycle**: The handler loops to process multiple HTTP requests on the same connection (HTTP keep-alive). The loop ends when the client closes the connection or an unrecoverable error occurs. Once at least one request has been successfully routed (`routed_any` flag), subsequent failures (client disconnect, I/O error, payload too large, request not allowed by policy) are treated as clean termination (`InferenceOutcome::Routed`) rather than denials. - -### Router error to HTTP mapping - -When `Router::proxy_with_candidates()` returns an error, `router_error_to_http()` in `proxy.rs` maps it to an HTTP status code: - -| `RouterError` variant | HTTP status | Response body | -|----------------------|-------------|---------------| -| `RouteNotFound(_)` | `400` | `no inference route configured` | -| `NoCompatibleRoute(_)` | `400` | `no compatible inference route available` | -| `Unauthorized(_)` | `401` | `unauthorized` | -| `UpstreamUnavailable(_)` | `503` | `inference service unavailable` | -| `UpstreamProtocol(_)` / `Internal(_)` | `502` | `inference service error` | - -Response messages are generic — internal details (upstream URLs, hostnames, TLS errors, route hints) are never exposed to the sandboxed process. Full error context is logged server-side at `warn` level. 
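The mapping in the table above can be condensed into a single match. The `RouterError` enum here is a stand-in mirroring the variant names from the table; the real definitions live in the `openshell-router` crate.

```rust
// Stand-in for openshell_router's error type, for illustration only.
enum RouterError {
    RouteNotFound(String),
    NoCompatibleRoute(String),
    Unauthorized(String),
    UpstreamUnavailable(String),
    UpstreamProtocol(String),
    Internal(String),
}

// Client-facing mapping: generic status + body; internal detail stays in logs.
fn router_error_to_http(err: &RouterError) -> (u16, &'static str) {
    match err {
        RouterError::RouteNotFound(_) => (400, "no inference route configured"),
        RouterError::NoCompatibleRoute(_) => (400, "no compatible inference route available"),
        RouterError::Unauthorized(_) => (401, "unauthorized"),
        RouterError::UpstreamUnavailable(_) => (503, "inference service unavailable"),
        RouterError::UpstreamProtocol(_) | RouterError::Internal(_) => (502, "inference service error"),
    }
}

fn main() {
    let err = RouterError::UpstreamUnavailable(
        "connect to backend timed out".into(), // detail never reaches the client
    );
    let (status, body) = router_error_to_http(&err);
    assert_eq!(status, 503);
    assert_eq!(body, "inference service unavailable");
}
```

Returning `&'static str` bodies makes it impossible for upstream error detail to leak into the response by construction.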
- -### Inference routing context - -**Files:** `crates/openshell-sandbox/src/lib.rs` (`build_inference_context`, `bundle_to_resolved_routes`, `spawn_route_refresh`), `crates/openshell-sandbox/src/proxy.rs` (`InferenceContext`) - -The sandbox executes inference requests locally using the `openshell-router` crate. `InferenceContext` holds the router, API patterns, and a cached set of resolved routes: - -```rust -pub struct InferenceContext { - pub patterns: Vec, - router: openshell_router::Router, - routes: Arc<RwLock<Vec<ResolvedRoute>>>, -} -``` - -`build_inference_context()` in `lib.rs` resolves routes from one of two sources. - -#### Design decision: standalone capability - -The sandbox is designed to operate both under a gateway-managed compute platform and as a standalone component without gateway infrastructure. This is intentional -- it enables local development workflows (e.g., a developer running a sandbox against a local LLM server without deploying the full stack), CI/CD environments where sandboxes run as isolated test harnesses, and air-gapped deployments where the gateway is not available. Everything the sandbox needs -- policy, inference routes -- can be provided without any dependency on the control plane. - -#### Route sources (priority order) - -1. **Route file (standalone mode)**: `--inference-routes` / `OPENSHELL_INFERENCE_ROUTES` points to a YAML file parsed by `RouterConfig::load_from_file()`. Routes are resolved via `config.resolve_routes()`. File loading or parsing errors are fatal (fail-fast), but an empty route list gracefully disables inference routing (returns `None`). The route file always takes precedence -- if both a route file and gateway credentials are present, the route file wins and the gateway bundle is not fetched. - -2. 
**Gateway bundle (gateway mode)**: When `openshell_endpoint` is available (and no route file is configured), routes are fetched from the gateway via `grpc_client::fetch_inference_bundle()`, which calls the `GetInferenceBundle` gRPC RPC on the `Inference` service. The RPC takes no arguments (the bundle is gateway-scoped, not per-sandbox). The gateway returns a `GetInferenceBundleResponse` containing resolved `ResolvedRoute` entries for the managed gateway route. These proto messages are converted to router `ResolvedRoute` structs by `bundle_to_resolved_routes()`, which maps provider types to auth headers and default headers via `openshell_core::inference::auth_for_provider_type()`. - -3. **No source**: If neither route file nor gateway credentials are configured, `build_inference_context()` returns `None` and inference routing is disabled. - -#### Gateway mode graceful degradation - -In gateway mode, `fetch_inference_bundle()` failures are handled based on the error type: - -- gRPC `PermissionDenied` or `NotFound` (detected via error message string matching): sandbox has no inference policy -- inference routing is silently disabled. -- Other errors: logged as a warning, inference routing is disabled. -- Empty initial route bundle: inference routing stays enabled with an empty cache and background refresh continues. - -Route sources handle empty route lists differently: file mode disables inference routing when the file resolves to zero routes, while gateway mode keeps inference routing active with an empty cache so refresh can pick up routes created later. File *loading errors* (missing file, parse failure) are fatal, while gateway *fetch errors* are non-fatal. - -#### Background route cache refresh - -In gateway mode (when no route file is configured), `spawn_route_refresh()` starts a background tokio task that refreshes the route cache every 30 seconds (`ROUTE_REFRESH_INTERVAL_SECS`). 
The task calls `fetch_inference_bundle()` on each tick and replaces the `RwLock<Vec<ResolvedRoute>>` contents. On fetch failure, the task logs a warning and keeps the stale routes. The `MissedTickBehavior::Skip` policy prevents refresh storms after temporary gateway outages. - -```mermaid -flowchart TD - A[build_inference_context] --> B{Route file configured?} - B -- Yes --> C[RouterConfig::load_from_file] - C --> D[resolve_routes] - D --> E{Routes non-empty?} - E -- Yes --> F[Create InferenceContext] - E -- No --> L[None: inference disabled] - B -- No --> H{sandbox_id + endpoint?} - H -- Yes --> I[fetch_inference_bundle via gRPC] - I --> J{Success?} - J -- Yes --> K{Routes non-empty?} - K -- Yes --> F - K -- No --> G[Create InferenceContext with empty cache] - J -- No --> M{PermissionDenied / NotFound?} - M -- Yes --> L - M -- No --> N[Warn + None] - H -- No --> L - F --> O[spawn_route_refresh if gateway mode] - G --> O -``` - -#### API key security - -`ResolvedRoute` has a custom `Debug` implementation in `crates/openshell-router/src/config.rs` that redacts the `api_key` field, printing `[REDACTED]` instead of the actual value. This prevents key leakage in log output and debug traces. - -### Post-decision: auto-TLS detection, L7 dispatch, or raw tunnel (`Allow` path) - -After a CONNECT is allowed, the SSRF check passes, and the upstream TCP connection is established, the proxy determines how to handle the tunnel traffic. TLS detection is automatic — the proxy peeks the first bytes of the client stream to decide. - -1. **Query L7 route**: `query_l7_route_snapshot()` asks the OPA engine for `matched_endpoint_config` and the current policy generation. If the endpoint has a `protocol` field, parse it into a generation-bound `L7ConfigSnapshot`. If the endpoint has no `protocol`, retain the generation for HTTP passthrough keep-alive tunnels. - -2. **Check for `tls: skip`**: If the endpoint has `tls: skip`, bypass all auto-detection and relay raw bytes via `copy_bidirectional()`. 
This is the escape hatch for client-cert mTLS or non-standard protocols. - -3. **Peek and auto-detect**: Read up to 8 bytes from the client stream via `TcpStream::peek()`. Classify the traffic using `looks_like_tls()` (checks for TLS ClientHello record: byte 0 = `0x16`, bytes 1-2 = TLS version `0x03xx`) and `looks_like_http()` (checks for HTTP method prefix). - -4. **TLS detected** (`is_tls = true`): - - Terminate TLS unconditionally via `tls_terminate_client()` + `tls_connect_upstream()`. This happens for all HTTPS endpoints, not just those with L7 config. - - If L7 config is present: clone the OPA engine for the captured generation (`clone_engine_for_tunnel(generation)`), run `relay_with_inspection()` for per-request policy evaluation. If the generation changed between config lookup and clone, close the tunnel before inspection. - - If no L7 config: run `relay_passthrough_with_credentials()` — parses HTTP minimally to inject credentials (via `SecretResolver`) and log requests, but does not evaluate L7 OPA rules. The passthrough relay is bound to the policy generation captured at connection setup and closes before forwarding another request after a reload. This enables credential injection on all HTTPS endpoints without requiring `protocol` in the policy. - - If TLS state is not configured: fall back to raw `copy_bidirectional()` with a warning. - -5. **Plaintext HTTP detected** (`is_http = true`, `is_tls = false`): - - If L7 config present: clone the OPA engine for the captured generation, run `relay_with_inspection()` directly on the plaintext streams. - - If no L7 config: run `relay_passthrough_with_credentials()` for credential injection and observability, with the same per-request generation guard. - -6. **Neither TLS nor HTTP**: Raw `copy_bidirectional()` tunnel (binary protocols, SSH-over-CONNECT, etc.). These raw streams are connection-scoped and continue until either side closes; live policy reload does not interrupt them. 
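The peek classification in step 3 can be sketched as follows. The byte checks mirror the description above; the HTTP method list is illustrative (the real helpers live in `proxy.rs`).

```rust
// Sketch of the first-bytes classification used after a CONNECT is allowed.
fn looks_like_tls(peek: &[u8]) -> bool {
    // TLS record header: ContentType 0x16 (handshake), version major byte 0x03.
    peek.len() >= 3 && peek[0] == 0x16 && peek[1] == 0x03
}

fn looks_like_http(peek: &[u8]) -> bool {
    // Illustrative method-prefix list; up to 8 peeked bytes are available.
    const METHODS: [&[u8]; 5] = [b"GET ", b"POST", b"PUT ", b"HEAD", b"DELE"];
    METHODS.iter().any(|m| peek.starts_with(m))
}

fn main() {
    // TLS ClientHello record header -> terminate TLS, then dispatch on L7 config.
    assert!(looks_like_tls(&[0x16, 0x03, 0x03, 0x00, 0x7a]));
    // Plaintext HTTP -> relay_with_inspection or passthrough-with-credentials.
    assert!(looks_like_http(b"GET / HTTP/1.1\r\n"));
    // Neither -> raw copy_bidirectional tunnel (e.g., SSH-over-CONNECT).
    let ssh = b"SSH-2.0-OpenSSH\r\n";
    assert!(!looks_like_tls(ssh) && !looks_like_http(ssh));
}
```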
 - -```mermaid -flowchart TD - A["CONNECT allowed + upstream connected"] --> B["Query L7 config"] - B --> C{"tls: skip?"} - C -- Yes --> D["Raw copy_bidirectional"] - C -- No --> E["Peek first bytes"] - E --> F{"looks_like_tls?"} - F -- Yes --> G["TLS terminate client + upstream"] - G --> H{"L7 config?"} - H -- Yes --> I["relay_with_inspection"] - H -- No --> J["relay_passthrough_with_credentials<br/>(credential injection, no L7 rules)"] - F -- No --> K{"looks_like_http?"} - K -- Yes --> L{"L7 config?"} - L -- Yes --> M["relay_with_inspection"] - L -- No --> N["relay_passthrough_with_credentials"] - K -- No --> O["Raw copy_bidirectional<br/>(binary protocol)"] -``` - -## L7 Protocol-Aware Inspection - -**Files:** `crates/openshell-sandbox/src/l7/` - -The L7 subsystem inspects application-layer traffic within CONNECT tunnels. Instead of raw `copy_bidirectional`, each request is parsed, evaluated against OPA rules, and either forwarded or blocked. The relay uses a generation-bound policy snapshot; after a successful policy reload, an existing L7 keep-alive tunnel closes before forwarding another request. Once an HTTP request has upgraded into a raw stream, or when a response body is a long-lived stream, that stream is connection-scoped and is not interrupted by L7 live reload. - -### Architecture - -```mermaid -flowchart LR - subgraph "Per-connection (after CONNECT allowed)" - A[Client TLS/TCP] --> B[L7 Provider: parse_request] - B --> C[OPA: evaluate_l7_request] - C --> D{Decision} - D -- Allow or Audit --> E[Provider: relay to upstream] - D -- Enforce deny --> F[Provider: send deny response] - E --> G[Parse response from upstream] - G --> H[Relay response to client] - H --> B - end -``` - -### Types - -| Type | Definition | Purpose | -|------|-----------|---------| -| `L7Protocol` | `Rest`, `Graphql`, `Sql` | Supported application protocols | -| `TlsMode` | `Auto` (default), `Skip` | TLS handling strategy — `Auto` peeks first bytes and terminates if TLS is detected; `Skip` bypasses detection entirely | -| `EnforcementMode` | `Audit`, `Enforce` | What to do on L7 deny (log-only vs block) | -| `L7EndpointConfig` | `{ protocol, path, tls, enforcement, allow_encoded_slash, graphql_max_body_bytes }` | Per-endpoint L7 configuration, including optional path scoping for shared host:port APIs | -| `L7Decision` | `{ allowed, reason, matched_rule }` | Result of L7 evaluation | -| `L7RequestInfo` | `{ action, target, query_params, graphql }` | HTTP method, path, decoded query multimap, and optional GraphQL classification for policy evaluation | - -### Access presets - -Policy data supports shorthand `access`
presets that expand into explicit `rules` during preprocessing: - -| Preset | Expands to | -|--------|-----------| -| `read-only` | REST: `GET **`, `HEAD **`, `OPTIONS **`; GraphQL: `query` | -| `read-write` | REST: `GET **`, `HEAD **`, `OPTIONS **`, `POST **`, `PUT **`, `PATCH **`; GraphQL: `query`, `mutation` | -| `full` | REST: `* **`; GraphQL: `operation_type: "*"` | - -Expansion happens in `expand_access_presets()` before the Rego engine loads the data. The `rules` and `access` fields are mutually exclusive (validated at startup). - -### Policy validation - -`validate_l7_policies()` runs at engine load time and returns `(errors, warnings)`: - -**Errors** (block startup): - -- `rules` and `access` both specified on same endpoint -- `protocol` specified without `rules` or `access` -- unknown `protocol` -- `protocol: sql` with `enforcement: enforce` (SQL parsing not available in v1) -- Empty `rules` array (would deny all traffic) -- invalid GraphQL operation types, persisted-query mode, body limit, or rule shape - -**Warnings** (logged): - -- `tls: terminate` or `tls: passthrough` on any endpoint (deprecated — TLS termination is now automatic; use `tls: skip` to disable) -- `tls: skip` with L7 rules on port 443 (L7 inspection cannot work on encrypted traffic) -- Unknown HTTP method in rules -- GraphQL-specific fields on non-GraphQL endpoints - -### TLS termination (auto-detect) - -**File:** `crates/openshell-sandbox/src/l7/tls.rs` - -TLS termination is automatic. The proxy peeks the first bytes of every CONNECT tunnel and terminates TLS whenever a ClientHello is detected. This enables credential injection and L7 inspection on all HTTPS endpoints without requiring explicit `tls: terminate` in the policy. The `tls` field defaults to `Auto`; use `tls: skip` to opt out entirely (e.g., for client-cert mTLS to upstream). - -**Ephemeral CA lifecycle:** - -1. 
At sandbox startup, `SandboxCa::generate()` creates a self-signed CA (CN: "OpenShell Sandbox CA") using `rcgen` -2. The CA cert PEM and a combined bundle (system CAs + sandbox CA) are written to `/etc/openshell-tls/` -3. The sandbox CA cert path is set as `NODE_EXTRA_CA_CERTS` (additive for Node.js) -4. The combined bundle is set as `SSL_CERT_FILE`, `REQUESTS_CA_BUNDLE`, `CURL_CA_BUNDLE` (replaces defaults for OpenSSL, Python requests, curl) - -**TLS auto-detection** (`looks_like_tls()`): - -- Peeks up to 8 bytes from the client stream -- Checks for TLS ClientHello pattern: byte 0 = `0x16` (ContentType::Handshake), byte 1 = `0x03` (TLS major version), byte 2 ≤ `0x04` (minor version, covering SSL 3.0 through TLS 1.3) -- Returns `false` for plaintext HTTP, SSH, or other binary protocols - -**Per-hostname leaf cert generation:** - -- `CertCache` maps hostnames to `CertifiedLeaf` structs (cert chain + private key) -- First request for a hostname generates a leaf cert signed by the sandbox CA via `rcgen` -- Cache has a hard limit of 256 entries; on overflow, the entire cache is cleared (sufficient for sandbox scale) -- Each leaf cert chain contains two certs: the leaf and the CA - -**Connection flow (when TLS is detected):** - -1. `tls_terminate_client()`: Accept TLS from the sandboxed client using a `ServerConfig` with the hostname-specific leaf cert. ALPN: `http/1.1`. -2. `tls_connect_upstream()`: Connect TLS to the real upstream using a `ClientConfig` with Mozilla root CAs (`webpki_roots`) and system CA certificates. ALPN: `http/1.1`. -3. Proxy now holds plaintext on both sides. If L7 config is present, runs `relay_with_inspection()`. Otherwise, runs `relay_passthrough_with_credentials()` for credential injection without L7 evaluation. - -System CA bundles are searched at well-known paths: `/etc/ssl/certs/ca-certificates.crt` (Debian/Ubuntu), `/etc/pki/tls/certs/ca-bundle.crt` (RHEL), `/etc/ssl/ca-bundle.pem` (openSUSE), `/etc/ssl/cert.pem` (Alpine/macOS). 
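The clear-on-overflow policy of the leaf-cert cache can be sketched as follows. The value type here is a placeholder string, whereas the real `CertCache` stores `CertifiedLeaf` chains signed by the sandbox CA via `rcgen`.

```rust
use std::collections::HashMap;

const CERT_CACHE_LIMIT: usize = 256; // hard cap; overflow clears everything

struct CertCache {
    by_host: HashMap<String, String>, // hostname -> leaf cert (placeholder)
}

impl CertCache {
    fn new() -> Self {
        Self { by_host: HashMap::new() }
    }

    fn get_or_generate(&mut self, host: &str) -> String {
        if !self.by_host.contains_key(host) && self.by_host.len() >= CERT_CACHE_LIMIT {
            // Dropping the whole map is simpler than LRU and fine at sandbox scale.
            self.by_host.clear();
        }
        self.by_host
            .entry(host.to_string())
            .or_insert_with(|| format!("leaf-cert-for-{host}"))
            .clone()
    }
}

fn main() {
    let mut cache = CertCache::new();
    for i in 0..CERT_CACHE_LIMIT {
        cache.get_or_generate(&format!("host-{i}.example"));
    }
    println!("before overflow: {}", cache.by_host.len()); // 256
    cache.get_or_generate("one-more.example");
    println!("after overflow: {}", cache.by_host.len()); // 1
}
```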
 - -### Credential injection - -**Files:** `crates/openshell-sandbox/src/secrets.rs`, `crates/openshell-sandbox/src/l7/relay.rs`, `crates/openshell-sandbox/src/l7/rest.rs`, `crates/openshell-sandbox/src/proxy.rs` - -The sandbox proxy resolves `openshell:resolve:env:*` credential placeholders in outbound HTTP requests. The `SecretResolver` holds a supervisor-only map from placeholder strings to real secret values, constructed at startup from the provider environment. Child processes only see placeholder values in their environment; the proxy rewrites them to real secrets immediately before forwarding upstream. - -#### `SecretResolver` - -```rust -pub(crate) struct SecretResolver { - by_placeholder: HashMap<String, String>, -} -``` - -`SecretResolver::from_provider_env()` splits the provider environment into two maps: a child-visible map with placeholder values (`openshell:resolve:env:ANTHROPIC_API_KEY`) and a supervisor-only resolver map (`{"openshell:resolve:env:ANTHROPIC_API_KEY": "sk-real-key"}`). The placeholder grammar is `openshell:resolve:env:[A-Za-z_][A-Za-z0-9_]*`.
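A sketch of the split performed by `SecretResolver::from_provider_env()` (illustrative free function; the real constructor lives on `SecretResolver`):

```rust
use std::collections::HashMap;

// Child processes get placeholder values; only the supervisor keeps the
// placeholder -> real-secret map used by the proxy at rewrite time.
fn split_provider_env(
    provider_env: &HashMap<String, String>,
) -> (HashMap<String, String>, HashMap<String, String>) {
    let mut child_env = HashMap::new();      // key -> placeholder
    let mut by_placeholder = HashMap::new(); // placeholder -> secret
    for (key, secret) in provider_env {
        let placeholder = format!("openshell:resolve:env:{key}");
        child_env.insert(key.clone(), placeholder.clone());
        by_placeholder.insert(placeholder, secret.clone());
    }
    (child_env, by_placeholder)
}

fn main() {
    let mut env = HashMap::new();
    env.insert("ANTHROPIC_API_KEY".to_string(), "sk-real-key".to_string());
    let (child, resolver) = split_provider_env(&env);
    println!("{}", child["ANTHROPIC_API_KEY"]); // the placeholder, not the secret
    println!("{}", resolver["openshell:resolve:env:ANTHROPIC_API_KEY"]); // sk-real-key
}
```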
 - -#### Credential placement locations - -The resolver rewrites placeholders in the following locations within HTTP requests: - -| Location | Example | Encoding | Implementation | -|----------|---------|----------|----------------| -| Header value (exact) | `x-api-key: openshell:resolve:env:KEY` | None (raw replacement) | `rewrite_header_value()` | -| Header value (prefixed) | `Authorization: Bearer openshell:resolve:env:KEY` | None (prefix preserved) | `rewrite_header_value()` | -| Basic auth token | `Authorization: Basic <base64>` | Base64 decode → resolve → re-encode | `rewrite_basic_auth_token()` | -| URL query parameter | `?key=openshell:resolve:env:KEY` | Percent-decode → resolve → percent-encode (RFC 3986 unreserved) | `rewrite_uri_query_params()` | -| URL path segment | `/bot{TOKEN}/sendMessage` | Percent-decode → resolve → validate → percent-encode (RFC 3986 pchar) | `rewrite_uri_path()` → `rewrite_path_segment()` | - -**Header values**: Direct match replaces the entire value. Prefixed match (e.g., `Bearer `) splits on whitespace, resolves the placeholder portion, and reassembles. Basic auth match detects an `Authorization: Basic <base64>` header, decodes the Base64 content, resolves any placeholders in the decoded `user:password` string, and re-encodes. - -**Query parameters**: Each `key=value` pair is checked. Values are percent-decoded before resolution and percent-encoded after (RFC 3986 Section 2.3 unreserved characters preserved: `ALPHA / DIGIT / "-" / "." / "_" / "~"`). - -**Path segments**: Handles substring matching for APIs that embed tokens within path segments (e.g., Telegram's `/bot{TOKEN}/sendMessage`). Each segment is percent-decoded, scanned for placeholder boundaries using the env var key grammar (`[A-Za-z_][A-Za-z0-9_]*`), resolved, validated for path safety, and percent-encoded per RFC 3986 Section 3.3 pchar rules (`unreserved / sub-delims / ":" / "@"`).
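The decode → resolve → re-encode round trip for a single query value can be sketched like this. Helper names are illustrative, not the crate's; the fail-closed `None` stands in for the `UnresolvedPlaceholderError` path.

```rust
use std::collections::HashMap;

fn percent_decode(s: &str) -> String {
    let bytes = s.as_bytes();
    let mut out = Vec::new();
    let mut i = 0;
    while i < bytes.len() {
        if bytes[i] == b'%' && i + 2 < bytes.len() {
            if let Ok(b) = u8::from_str_radix(&s[i + 1..i + 3], 16) {
                out.push(b);
                i += 3;
                continue;
            }
        }
        out.push(bytes[i]);
        i += 1;
    }
    String::from_utf8_lossy(&out).into_owned()
}

// RFC 3986 Section 2.3: keep unreserved characters literal, escape the rest.
fn percent_encode(s: &str) -> String {
    s.bytes()
        .map(|b| match b {
            b'A'..=b'Z' | b'a'..=b'z' | b'0'..=b'9' | b'-' | b'.' | b'_' | b'~' => {
                (b as char).to_string()
            }
            _ => format!("%{b:02X}"),
        })
        .collect()
}

// Returns None when a placeholder is present but unknown: the caller fails
// closed (HTTP 500) rather than forwarding the raw placeholder upstream.
fn rewrite_query_value(raw: &str, resolver: &HashMap<String, String>) -> Option<String> {
    let decoded = percent_decode(raw);
    if let Some(secret) = resolver.get(&decoded) {
        Some(percent_encode(secret))
    } else if decoded.contains("openshell:resolve:env:") {
        None
    } else {
        Some(raw.to_string())
    }
}

fn main() {
    let mut resolver = HashMap::new();
    resolver.insert("openshell:resolve:env:KEY".to_string(), "s3cr3t/x".to_string());
    println!("{:?}", rewrite_query_value("openshell%3Aresolve%3Aenv%3AKEY", &resolver));
    println!("{:?}", rewrite_query_value("openshell:resolve:env:UNKNOWN", &resolver));
}
```

Note the percent-decode happens before the resolver lookup, which is what defeats encoded-placeholder bypasses like `openshell%3Aresolve%3Aenv%3AUNKNOWN`.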
- -#### Path credential validation (CWE-22) - -Resolved credential values destined for URL path segments are validated by `validate_credential_for_path()` before insertion. The following values are rejected: - -| Pattern | Rejection reason | -|---------|-----------------| -| `../`, `..\\`, `..` | Path traversal sequence | -| `/`, `\` | Path separator | -| `\0`, `\r`, `\n` | Control character | -| `?`, `#` | URI delimiter | - -Rejection causes the request to fail closed (HTTP 500). - -#### Secret value validation (CWE-113) - -All resolved credential values are validated at the `resolve_placeholder()` level for prohibited control characters: CR (`\r`), LF (`\n`), and null byte (`\0`). This prevents HTTP header injection via malicious credential values. The validation applies to all placement locations automatically — header values, query parameters, and path segments all pass through `resolve_placeholder()`. - -#### Fail-closed behavior - -All placeholder rewriting fails closed. If any `openshell:resolve:env:*` placeholder is detected in the request but cannot be resolved, the proxy rejects the request with HTTP 500 instead of forwarding the raw placeholder to the upstream. The fail-closed mechanism operates at two levels: - -1. **Per-location**: Each rewrite function (`rewrite_uri_query_params`, `rewrite_path_segment`, `rewrite_header_line`) returns an `UnresolvedPlaceholderError` when a placeholder is detected but the resolver has no mapping for it. - -2. **Final scan**: After all rewriting completes, `rewrite_http_header_block()` scans the output for any remaining `openshell:resolve:env:` tokens. It also checks the percent-decoded form of the request line to catch encoded placeholder bypass attempts (e.g., `openshell%3Aresolve%3Aenv%3AUNKNOWN`). 
- -```rust -pub(crate) struct UnresolvedPlaceholderError { - pub location: &'static str, // "header", "query_param", "path" -} -``` - -#### Rewrite-before-OPA with redaction - -When L7 inspection is active, credential placeholders in the request target (path + query) are resolved BEFORE OPA L7 policy evaluation. This is implemented in `relay_with_inspection()` and `relay_passthrough_with_credentials()` in `l7/relay.rs`: - -1. `rewrite_target_for_eval()` resolves the request target, producing two strings: - - **Resolved**: real secrets inserted — used only for the upstream connection - - **Redacted**: `[CREDENTIAL]` markers in place of secrets — used for OPA input and logs - -2. OPA `evaluate_l7_request()` receives the redacted path in `request.path`, so policy rules never see real credential values. - -3. All log statements (`L7_REQUEST`, `HTTP_REQUEST`) use the redacted target. Real credential values never appear in logs. - -4. The resolved path (with real secrets) goes only to the upstream via `relay_http_request_with_resolver()`. - -```rust -pub(crate) struct RewriteTargetResult { - pub resolved: String, // for upstream forwarding only - pub redacted: String, // for OPA + logs -} -``` - -If credential resolution fails on the request target, the relay returns HTTP 500 and closes the connection. - -#### Credential-injection-only relay - -**File:** `crates/openshell-sandbox/src/l7/relay.rs` (`relay_passthrough_with_credentials()`) - -When TLS is auto-terminated but no L7 policy (`protocol` + `access`/`rules`) is configured on the endpoint, the proxy enters a passthrough mode that still provides credential injection and observability. This relay: - -1. Reads each HTTP request from the client via `RestProvider::parse_request()` -2. Resolves and redacts the request target via `rewrite_target_for_eval()` (for log safety) -3. Logs the request method, redacted path, host, and port at `info!()` level (tagged `HTTP_REQUEST`) -4. 
Forwards the request to upstream via `relay_http_request_with_resolver()`, which rewrites all credential placeholders in headers, query parameters, path segments, and Basic auth tokens -5. Relays the upstream response back to the client -6. Loops for HTTP keep-alive; exits on client close or non-reusable response - -This enables credential injection on all HTTPS endpoints automatically, without requiring the policy author to add `protocol: rest` and `access: full` just to get credentials injected. - -#### Known limitation: host-binding - -The resolver resolves all placeholders regardless of destination host. If an agent has OPA-allowed access to an attacker-controlled host, it could construct a URL containing a placeholder and exfiltrate the resolved credential value to that host. OPA host restrictions are the defense — only endpoints explicitly allowed by policy receive traffic. Per-credential host binding (restricting which credentials resolve for which destination hosts) is not implemented. - -#### Data flow - -```mermaid -sequenceDiagram - participant A as Agent Process - participant P as Proxy (SecretResolver) - participant O as OPA Engine - participant U as Upstream API - - A->>P: GET /bot{placeholder}/send?key={placeholder} HTTP/1.1<br/>Authorization: Bearer {placeholder} - P->>P: rewrite_target_for_eval(target)<br/>→ resolved: /bot{secret}/send?key={secret}<br/>→ redacted: /bot[CREDENTIAL]/send?key=[CREDENTIAL] - P->>O: evaluate_l7_request(redacted path) - O-->>P: allow - P->>P: rewrite_http_header_block(headers)<br/>→ resolve header placeholders<br/>→ resolve query param placeholders<br/>→ resolve path segment placeholders<br/>→ fail-closed scan - P->>U: GET /bot{secret}/send?key={secret} HTTP/1.1<br/>Authorization: Bearer {secret} - Note over P: Logs use redacted path only -``` - -### REST protocol provider - -**File:** `crates/openshell-sandbox/src/l7/rest.rs` - -Implements `L7Provider` for HTTP/1.1: - -- **`parse_request()`**: Reads up to 16 KiB of headers, parses the request line (method, path), decodes query parameters into a multimap, determines body framing from `Content-Length` or `Transfer-Encoding: chunked` headers. Returns `L7Request` with raw header bytes (may include overflow body bytes). - -- **`relay()`**: Forwards request headers and body to upstream (handling Content-Length, chunked, and no-body cases), then reads and relays the full response back to the client. - -- **`deny()`**: Sends an HTTP `403 Forbidden` JSON response with `Content-Type: application/json`, including the policy name, matched rule, and deny reason. Sets `Connection: close` and includes an `X-OpenShell-Policy` header. - -- **`looks_like_http()`**: Protocol detection via first-byte peek -- checks for standard HTTP method prefixes (GET, HEAD, POST, PUT, DELETE, PATCH, OPTIONS, CONNECT, TRACE). - -### GraphQL protocol classifier - -**File:** `crates/openshell-sandbox/src/l7/graphql.rs` - -GraphQL inspection reuses the HTTP parser, then buffers the request body up to `graphql_max_body_bytes` for classification. It supports `GET` and `POST` GraphQL-over-HTTP envelopes, JSON batches, named operations, root fragment expansion, Apollo persisted-query hashes, and saved-query IDs (`id`, `documentId`, `queryId`). The classifier emits `GraphqlRequestInfo` with operation type, optional operation name, root fields, and persisted-query identifiers. - -Hash-only or saved-query-only requests cannot be parsed into operation fields. They are denied unless the endpoint sets `persisted_queries: allow_registered` and provides a trusted `graphql_persisted_queries` entry for the hash or ID.
Batch requests are fail-closed: any malformed, denied, or unregistered operation denies the whole HTTP request. - -### Per-request L7 evaluation - -`relay_with_inspection()` in `crates/openshell-sandbox/src/l7/relay.rs` is the main relay loop: - -1. Parse one HTTP request from client via the provider. Parser and path-canonicalization failures close the connection and emit a denied OCSF network event with the rejection reason in `status_detail`. -2. Resolve credential placeholders in the request target via `rewrite_target_for_eval()`. OPA receives the redacted path (`[CREDENTIAL]` markers); the resolved path goes only to upstream. If resolution fails, return HTTP 500 and close the connection. -3. Build L7 input JSON with `request.method`, the **redacted** `request.path`, `request.query_params`, optional `request.graphql`, plus the CONNECT-level context (host, port, binary, ancestors, cmdline) -4. Evaluate `data.openshell.sandbox.allow_request` and `data.openshell.sandbox.request_deny_reason` -5. Log the L7 decision (tagged `L7_REQUEST`) using the redacted target — real credential values never appear in logs -6. If allowed (or audit mode): relay request to upstream via `relay_http_request_with_resolver()` (which rewrites all remaining credential placeholders in headers, query parameters, path segments, and Basic auth tokens) and relay the response back to client, then loop -7. If denied in enforce mode: send 403 (using redacted target in the response body) and close the connection - -Before parsing, before evaluation, and before forwarding each request, the relay checks whether its captured policy generation still matches the shared engine generation. If not, it emits a denied OCSF network event and closes the tunnel without forwarding the request. 
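The generation guard reduces to comparing a counter captured at connection setup against a shared atomic (a sketch; the real engine also swaps the compiled OPA policy under the same generation):

```rust
use std::sync::atomic::{AtomicU64, Ordering};

struct PolicyEngine {
    generation: AtomicU64, // bumped on every successful live reload
}

impl PolicyEngine {
    fn new() -> Self {
        Self { generation: AtomicU64::new(0) }
    }
    fn reload(&self) {
        self.generation.fetch_add(1, Ordering::SeqCst);
    }
    fn current_generation(&self) -> u64 {
        self.generation.load(Ordering::SeqCst)
    }
}

// Checked before parsing, before evaluation, and before forwarding: a stale
// capture means the tunnel closes instead of forwarding the request.
fn may_forward(engine: &PolicyEngine, captured: u64) -> bool {
    engine.current_generation() == captured
}

fn main() {
    let engine = PolicyEngine::new();
    let captured = engine.current_generation(); // taken at connection setup
    println!("fresh: {}", may_forward(&engine, captured)); // true
    engine.reload();
    println!("stale: {}", may_forward(&engine, captured)); // false
}
```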
 - -## Process Identity - -### SHA256 TOFU (Trust-On-First-Use) - -**File:** `crates/openshell-sandbox/src/identity.rs` - -`BinaryIdentityCache` wraps a `Mutex`-guarded map from binary path to cached identity entry, where each cached entry stores: - -- Hex-encoded SHA256 hash -- File fingerprint (`len`, `mtime`, `ctime`, and on Unix `dev` + `inode`) - -`verify_or_cache(path)`: - -- **First call for a path**: Compute SHA256 via `procfs::file_sha256()`, store as the "golden" hash plus fingerprint, return the hash. -- **Subsequent calls, unchanged fingerprint**: Return cached hash without re-hashing the file. -- **Subsequent calls, changed fingerprint**: Recompute SHA256 and compare with cached value. Return `Ok(hash)` on match; return `Err` on mismatch (binary tampered/replaced mid-sandbox). - -The TOFU model means: - -- No hashes are specified in policy data -- the first observed binary is trusted -- Once trusted, the binary cannot change for the sandbox's lifetime -- Both the immediate binary and all ancestor binaries are TOFU-verified - -### /proc-based identity resolution - -**File:** `crates/openshell-sandbox/src/procfs.rs` - -The proxy resolves which binary is making each network request by inspecting `/proc`.
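The first steps of this lookup parse the kernel's `/proc/net/tcp` table. A minimal sketch (hypothetical helper name; the real code also walks `tcp6` and per-process tables):

```rust
// Scan a /proc/net/tcp dump for the ESTABLISHED row whose hex-encoded local
// port matches the peer port, and return that socket's inode (last field here).
fn find_socket_inode(proc_net_tcp: &str, peer_port: u16) -> Option<u64> {
    for line in proc_net_tcp.lines().skip(1) {
        let fields: Vec<&str> = line.split_whitespace().collect();
        if fields.len() < 10 {
            continue;
        }
        // local_address is "HEXADDR:HEXPORT"; state "01" is ESTABLISHED.
        let Some(port_hex) = fields[1].rsplit(':').next() else { continue };
        let Ok(port) = u16::from_str_radix(port_hex, 16) else { continue };
        if port == peer_port && fields[3] == "01" {
            return fields[9].parse().ok();
        }
    }
    None
}

fn main() {
    let table = "\
  sl  local_address rem_address   st tx_queue:rx_queue tr:tm->when retrnsmt   uid  timeout inode
   0: 0100007F:1F90 0100007F:0016 01 00000000:00000000 00:00000000 00000000  1000        0 12345";
    println!("{:?}", find_socket_inode(table, 0x1F90)); // Some(12345)
    println!("{:?}", find_socket_inode(table, 22));     // None
}
```

The returned inode is what the descendant scan then matches against `socket:[inode]` symlinks in `/proc/{pid}/fd/*`.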
 - -**`resolve_tcp_peer_socket_owners(entrypoint_pid, peer_port) -> TcpPeerSocketOwners`** - -```mermaid -flowchart TD - A["Parse /proc/{entrypoint}/net/tcp + tcp6"] --> B[Find ESTABLISHED socket with matching local port] - B --> C[Extract socket inode] - C --> D["BFS collect descendants of entrypoint via /proc/{pid}/task/{tid}/children, deduping PIDs"] - D --> E["Scan every descendant /proc/{pid}/fd/* for socket:[inode] symlink"] - E --> F["Fallback: scan all /proc PIDs not already checked"] - F --> G["Return all socket owner PIDs with source/depth metadata"] - G --> H["Read /proc/{pid}/exe, TOFU-check each owner and ancestor"] - H --> I{All owners same policy identity?} - I -- Yes --> J["Evaluate OPA once with the shared identity"] - I -- No --> K["Deny as ambiguous shared socket ownership"] -``` - -Both IPv4 (`/proc/{pid}/net/tcp`) and IPv6 (`/proc/{pid}/net/tcp6`) tables are checked because some clients (notably gRPC C-core) use `AF_INET6` sockets with IPv4-mapped addresses. - -Multiple processes can hold the same socket inode after `fork()` or fd inheritance across `execve()`. The proxy treats that as ambiguous unless all socket owners resolve to the same policy identity: binary path, TOFU hash, ancestor chain, and cmdline-derived absolute paths. Ambiguous ownership is denied before OPA evaluation so a trusted co-owner cannot accidentally authorize traffic for a different process. - -**`collect_ancestor_binaries(pid, stop_pid) -> Vec<PathBuf>`**: Walk the PPid chain via `/proc/{pid}/status`, collecting `binary_path()` for each ancestor. Stops at PID 1, `stop_pid` (entrypoint), or after 64 levels (safety limit). Does not include `pid` itself. - -**`collect_cmdline_paths(pid, stop_pid, exclude) -> Vec<PathBuf>`**: Extract absolute paths from `/proc/{pid}/cmdline` for the process and its ancestor chain.
Captures script paths that don't appear in `/proc/{pid}/exe` -- for example, when `#!/usr/bin/env node` runs a script at `/usr/local/bin/claude`, the exe is `/usr/bin/node` but cmdline contains the script path. Paths already in `exclude` (exe-based paths) are omitted. - -**`file_sha256(path) -> String`**: Read the file and compute `SHA256` via the `sha2` crate, returned as hex. - -## Process Management - -**File:** `crates/openshell-sandbox/src/process.rs` - -### `ProcessHandle` - -Wraps `tokio::process::Child` + PID. Platform-specific `spawn()` methods delegate to `spawn_impl()`. - -**Environment setup** (both Linux and non-Linux): - -- `OPENSHELL_SANDBOX=1` (always set) -- Provider credentials (from `GetSandboxProviderEnvironment` RPC) -- Proxy URLs: `HTTP_PROXY`, `HTTPS_PROXY`, `ALL_PROXY` (uppercase for curl/wget), `NO_PROXY=127.0.0.1,localhost,::1` for localhost bypass, `http_proxy`, `https_proxy`, `grpc_proxy` (lowercase for gRPC C-core), `no_proxy=127.0.0.1,localhost,::1`, `NODE_USE_ENV_PROXY=1` (required for Node.js built-in `fetch`/`http` clients to honor proxy env vars) -- TLS trust store: `NODE_EXTRA_CA_CERTS` (standalone CA cert), `SSL_CERT_FILE`, `REQUESTS_CA_BUNDLE`, `CURL_CA_BUNDLE` (combined bundle) - -**Pre-exec closure** (runs in child after fork, before exec -- async-signal-safe): - -1. `setpgid(0, 0)` if non-interactive (create new process group) -2. `setns(fd, CLONE_NEWNET)` to enter network namespace (Linux only) -3. `drop_privileges(policy)`: `initgroups()` -> `setgid()` -> `setuid()` -4. Disable core dumps with `setrlimit(RLIMIT_CORE, 0)` on Unix -5. Set `prctl(PR_SET_DUMPABLE, 0)` on Linux -6. `sandbox::apply(policy, workdir)`: Landlock then seccomp - -### `drop_privileges()` - -Resolves user/group names from policy, then: - -1. `initgroups()` to set supplementary groups (Linux only, not macOS) -2. `setgid()` to target group -3. Verify `getegid()` matches the target GID -4. `setuid()` to target user -5. 
Verify `geteuid()` matches the target UID -6. Verify `setuid(0)` fails (confirms root cannot be re-acquired) - -The ordering is significant: `initgroups`/`setgid` must happen before `setuid` because switching user may drop the privileges needed for group manipulation. Similarly, privilege dropping must happen before Landlock because Landlock may block access to `/etc/passwd` and `/etc/group`. - -Steps 3, 5, and 6 are defense-in-depth post-condition checks (CWE-250 / CERT POS37-C). All three syscalls (`geteuid`, `getegid`, `setuid`) are async-signal-safe, so they are safe to call in the `pre_exec` context. The checks add negligible overhead while guarding against hypothetical kernel-level defects that could cause `setuid`/`setgid` to return success without actually changing the effective IDs. - -After the privilege drop, the child process also disables core dumps before Landlock and seccomp are applied. On all Unix targets it sets `RLIMIT_CORE=0`; on Linux it additionally sets `PR_SET_DUMPABLE=0`. This prevents crash artifacts from containing provider credentials, request payloads, or other sensitive in-memory data. - -### `ProcessStatus` - -Exit code is `code` if the process exited normally, or `128 + signal` if killed by a signal (standard Unix convention). Returns `-1` if neither is available. - -### Signal handling - -`kill()` sends SIGTERM, waits 100ms, then sends SIGKILL if the process is still running. - -## SSH Server - -**File:** `crates/openshell-sandbox/src/ssh.rs` - -The embedded SSH server provides remote shell access to the sandbox. It uses the `russh` crate and allocates PTYs for interactive sessions. The daemon listens on a **Unix domain socket** rather than a TCP port -- the gateway never dials the sandbox pod directly. All SSH traffic arrives through the [supervisor session](#supervisor-session)'s `RelayStream` RPC, which the supervisor bridges into the socket. - -### Startup - -`ssh_server_init()` (called from `run_ssh_server()`): - -1. 
Generate an ephemeral Ed25519 host key via `russh::keys::PrivateKey::random()`. -2. Ensure the socket's parent directory exists and is owned by root with mode `0700`. The sandbox entrypoint runs as an unprivileged user, so it cannot enter this directory. -3. Remove any stale socket file from a prior run, then `UnixListener::bind(listen_path)`. -4. Set the socket file's mode to `0600` so only the supervisor (root) can connect to it. -5. Signal readiness back to `lib.rs` via a `oneshot` channel. -6. Accept connections in a loop and spawn `handle_connection()` per connection. - -The socket path is taken from `--ssh-socket-path` / `OPENSHELL_SSH_SOCKET_PATH`. The Kubernetes compute driver sets this to `/run/openshell/ssh.sock` by default (see `crates/openshell-driver-kubernetes/src/main.rs`); the VM driver pins it to the same path inside the guest. - -### Access control - -The filesystem permissions on the parent directory (`0700`) and the socket itself (`0600`) are the sole authentication boundary. Only the supervisor, which runs as root inside the container, can open the socket. The sandboxed entrypoint process -- dropped to the unprivileged `sandbox` user and further constrained by Landlock -- cannot reach `/run/openshell/` at all. Consequently, the SSH session handler's `auth_none` and `auth_publickey` callbacks both return `Auth::Accept` unconditionally; any byte stream that reaches the daemon has already passed the trust check via the socket's permission bits. - -### Shell/exec handling - -The `SshHandler` implements `russh::server::Handler`: - -- **`pty_request()`**: Store terminal dimensions for PTY allocation -- **`shell_request()`**: Start an interactive `/bin/bash -i` -- **`exec_request()`**: Start `/bin/bash -lc {command}` -- **`window_change_request()`**: Resize PTY via `TIOCSWINSZ` ioctl -- **`data()`**: Forward client input to the PTY via an `mpsc::channel` - -### PTY child process - -`spawn_pty_shell()`: - -1. 
`openpty()` to create a master/slave PTY pair -2. Build `std::process::Command` (not tokio) with slave FDs for stdin/stdout/stderr -3. Set environment: `OPENSHELL_SANDBOX=1`, `HOME=/sandbox`, `USER=sandbox`, `TERM={negotiated}`, proxy URLs, TLS trust store paths, provider credentials -4. Install pre-exec closure (via `unsafe_pty::install_pre_exec()`): - - `setsid()` to create a new session - - `TIOCSCTTY` ioctl to set the controlling terminal - - `setns()` to enter the network namespace (Linux) - - `drop_privileges()` then `sandbox::apply()` (Landlock + seccomp) -5. Spawn three threads: - - **Writer thread**: Reads from `mpsc::Receiver`, writes to PTY master - - **Reader thread**: Reads from PTY master, sends SSH channel data, sends EOF when done, signals the exit thread - - **Exit thread**: Waits for child to exit, waits for reader to finish (ensures correct SSH protocol ordering: data -> EOF -> exit-status -> close), sends exit status and closes the channel - -## Supervisor Session - -**File:** `crates/openshell-sandbox/src/supervisor_session.rs` - -The sandbox pod has no inbound network surface. Instead, the supervisor opens a single persistent outbound gRPC stream to the gateway and the gateway uses that stream to request on-demand byte relays back into the sandbox. All SSH connect traffic and `ExecSandbox` calls ride this connection -- there is no reverse HTTP CONNECT, no TCP listener on the pod, and no per-session TLS handshake. 
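The relay protocol can be pictured as a two-variant frame stream: one `Init` to claim the channel, then `Data` frames capped at the chunk size. This enum is an illustrative model, not the prost-generated wire types:

```rust
#[derive(Debug, PartialEq)]
enum RelayFrame {
    Init { channel_id: u64 }, // first frame on a RelayStream
    Data(Vec<u8>),            // raw SSH bytes in either direction
}

const RELAY_CHUNK_SIZE: usize = 16 * 1024; // matches the default HTTP/2 frame size

// Split one read from the Unix socket into outbound Data frames.
fn data_frames(payload: &[u8]) -> Vec<RelayFrame> {
    payload
        .chunks(RELAY_CHUNK_SIZE)
        .map(|c| RelayFrame::Data(c.to_vec()))
        .collect()
}

fn main() {
    // The bridge's first frame claims the pending-relay slot on the gateway.
    let first = RelayFrame::Init { channel_id: 7 };
    assert_eq!(first, RelayFrame::Init { channel_id: 7 });
    // Each subsequent socket read becomes one or more Data frames.
    let frames = data_frames(&vec![0u8; 40 * 1024]);
    println!("data frames: {}", frames.len()); // 3 (16 KiB + 16 KiB + 8 KiB)
}
```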
- -### Connection model - -```mermaid -sequenceDiagram - participant S as Supervisor (sandbox) - participant GW as Gateway - participant SSHD as Local sshd (Unix socket) - participant Client as Operator / CLI - - S->>GW: ConnectSupervisor stream (mTLS, HTTP/2) - S->>GW: SupervisorMessage::Hello{sandbox_id, instance_id} - GW-->>S: GatewayMessage::SessionAccepted{session_id, heartbeat_interval_secs} - loop Heartbeats (max(accepted.heartbeat_interval_secs, 5)) - S-->>GW: SupervisorHeartbeat - GW-->>S: GatewayHeartbeat - end - - Client->>GW: sandbox connect / ExecSandbox - GW-->>S: GatewayMessage::RelayOpen{channel_id} - S->>GW: RelayStream RPC (new HTTP/2 stream on same Channel) - S->>GW: RelayFrame::Init{channel_id} - S->>SSHD: UnixStream::connect(ssh_socket_path) - loop Relay active - GW-->>S: RelayFrame::Data (raw SSH bytes from operator) - S->>SSHD: write_all - SSHD-->>S: read chunk (up to 16 KiB) - S-->>GW: RelayFrame::Data - end -``` - -One TCP+TLS+HTTP/2 connection carries both the long-lived control stream and every concurrent relay. The sandbox-side `Endpoint` uses `adaptive_window(true)` so HTTP/2 flow control does not throttle bulk transfers (SFTP, `sandbox rsync`) to the 64 KiB default window. - -### Session lifecycle - -`spawn(endpoint, sandbox_id, ssh_socket_path)` launches `run_session_loop()`, which runs for the lifetime of the supervisor: - -1. **Connect**: `grpc_client::connect_channel_pub(endpoint)` builds an mTLS `tonic::transport::Channel`. The same `Channel` is cloned into every subsequent `RelayStream` call so no additional TLS handshakes occur. -2. **Hello**: The supervisor sends `SupervisorMessage::Hello { sandbox_id, instance_id }` as the first envelope, where `instance_id` is a fresh UUID per session. The gateway uses the sandbox ID and instance ID to supersede a stale prior session (see [Supersede](#session-supersede)). -3. **Wait for `SessionAccepted` / `SessionRejected`**: If rejected, the loop returns an error and backs off. 
On accept, the supervisor clamps `heartbeat_interval_secs` to a minimum of 5 seconds. -4. **Main select loop**: Concurrently reads inbound `GatewayMessage`s and fires heartbeat ticks. Inbound `Heartbeat` messages are acknowledged by the supervisor's outbound heartbeat cadence; `RelayOpen` and `RelayClose` are dispatched to `handle_gateway_message()`. -5. **Reconnect**: Any error in the session (stream error, connect failure, rejected hello) is reported as an OCSF event and the loop sleeps with exponential backoff (`INITIAL_BACKOFF = 1s`, doubled up to `MAX_BACKOFF = 30s`) before redialing. - -### Relay bridge loop - -`handle_gateway_message()` is a synchronous dispatcher. When a `RelayOpen { channel_id }` arrives, it spawns a dedicated task running `handle_relay_open()`. That task: - -1. Creates an outbound `mpsc::channel::<RelayFrame>(16)` wrapped in a `ReceiverStream`. -2. Sends `RelayFrame { payload: RelayInit { channel_id } }` as the first frame -- this claims the matching pending-relay slot on the gateway. -3. Calls `OpenShellClient::relay_stream(outbound)` on the shared `Channel`. This opens a new HTTP/2 stream on the existing connection -- no new TCP or TLS handshake. -4. `UnixStream::connect(ssh_socket_path)` dials the local sshd. The split read/write halves become the local endpoints of the bridge. -5. Spawns a task that reads from the Unix socket in 16 KiB chunks (`RELAY_CHUNK_SIZE`, matching the default HTTP/2 frame size) and forwards each chunk as `RelayFrame::Data` on the outbound stream. -6. The main loop drains inbound `RelayFrame::Data` messages and writes them to the socket. Non-data inbound frames (e.g. a second `Init`) are treated as protocol errors. -7. On any side closing, the bridge calls `ssh_w.shutdown()` to propagate EOF, drops the outbound sender to close the gRPC stream, and joins the reader task. - -The supervisor has no SSH or HTTP awareness -- it is purely a byte bridge.
The protocol on top of the relay is whatever the gateway's caller (interactive `sandbox connect`, `ExecSandbox`, `rsync`-over-ssh) speaks to the sshd. - -### Session supersede - -If the gateway restarts or the sandbox restarts and reconnects with a new `instance_id` for the same `sandbox_id`, the gateway atomically replaces any prior session it has recorded. The new supervisor continues normally; the old stream (if still live on the gateway side) is torn down by the gateway's `remove_if_current` logic. Supervisors never need to coordinate between themselves -- each just keeps trying to connect, and the most recent `Hello` wins. - -If the gateway closes the stream cleanly (`inbound.message()` returns `Ok(None)`), `run_single_session` returns `Ok(())` and a `session_closed` event is emitted. Otherwise the loop reconnects. - -### OCSF telemetry - -Every session and relay transition emits an OCSF `NetworkActivity` event via `ocsf_emit!()` so operators can audit the control-plane connection from the sandbox's own logs. All events are built in `supervisor_session.rs` and covered by unit tests in the `ocsf_event_tests` module. - -| Helper | `activity_id` | `severity` | `status` | Fires when | -|--------|---------------|------------|----------|------------| -| `session_established_event` | `Open` | `Informational` | `Success` | After `SessionAccepted`, includes `session_id` and `heartbeat_secs` in the message | -| `session_closed_event` | `Close` | `Informational` | `Success` | Gateway closed the stream cleanly (`Ok(None)`) | -| `session_failed_event` | `Fail` | `Low` | `Failure` | Connect failed, hello rejected, or stream errored. 
Includes reconnect attempt counter | -| `relay_open_event` | `Open` | `Informational` | `Success` | `RelayOpen` received from the gateway | -| `relay_closed_event` | `Close` | `Informational` | `Success` | Relay bridge task exited without error | -| `relay_failed_event` | `Fail` | `Low` | `Failure` | Bridge task returned an error (e.g., socket write failure, inbound non-data frame) | -| `relay_close_from_gateway_event` | `Close` | `Informational` | -- | Gateway sent an explicit `RelayClose` on the control stream, with its `reason` | - -The `dst_endpoint` on session events is parsed from the gateway URI by `ocsf_gateway_endpoint()`. Relay events omit a destination (the bridge is sandbox-internal). - -## Zombie Reaping (PID 1 Init Duties) - -`openshell-sandbox` runs as PID 1 inside the container. In Linux, when a process exits, its parent must call `waitpid()` to collect the exit status; otherwise the process remains as a zombie. Orphaned processes (whose parent exits first) are reparented to PID 1, which becomes responsible for reaping them. - -Coding agents running inside the sandbox (OpenClaw, Claude, Codex) frequently spawn background daemons and child processes. When these grandchildren are orphaned, they become PID 1's responsibility. Without reaping, they accumulate as zombies for the lifetime of the container. - -**File:** `crates/openshell-sandbox/src/lib.rs` - -The sandbox supervisor registers a `SIGCHLD` handler at startup and spawns a background reaper task. The reaper also runs on a 5-second interval timer as a fallback in case signals are coalesced or missed. On each wake, it loops calling `waitid(Id::All, WEXITED | WNOHANG | WNOWAIT)` to inspect exited children without consuming their status. For each exited child: - -1. Check `MANAGED_CHILDREN` (a `Mutex<HashSet<Pid>>`) to determine if the PID belongs to a managed child (entrypoint or SSH session process) that has an explicit waiter. -2.
If managed, break out of the loop -- the explicit `child.wait()` call owns that status. -3. If not managed (an orphaned grandchild), call `waitpid(pid, WNOHANG)` to reap it. - -This two-phase approach (peek with `WNOWAIT`, then selectively reap) avoids `ECHILD` races with explicit `child.wait()` calls on managed children while still collecting orphan zombies. The `MANAGED_CHILDREN` set is updated via `register_managed_child()` (at spawn) and `unregister_managed_child()` (after wait completes). This feature is Linux-only (`#[cfg(target_os = "linux")]`). - -## Environment Variables Reference - -### Configuration (CLI flags / env vars) - -| Variable | CLI flag | Default | Purpose | -|----------|----------|---------|---------| -| `OPENSHELL_SANDBOX_COMMAND` | (trailing args) | `/bin/bash` | Command to execute inside sandbox | -| `OPENSHELL_SANDBOX_ID` | `--sandbox-id` | | Sandbox ID for gRPC policy fetch | -| `OPENSHELL_ENDPOINT` | `--openshell-endpoint` | | Gateway gRPC endpoint | -| `OPENSHELL_POLICY_RULES` | `--policy-rules` | | Path to Rego policy file | -| `OPENSHELL_POLICY_DATA` | `--policy-data` | | Path to YAML data file | -| `OPENSHELL_LOG_LEVEL` | `--log-level` | `warn` | Log level (trace/debug/info/warn/error) | -| `OPENSHELL_POLICY_POLL_INTERVAL_SECS` | | `30` | Poll interval for gRPC policy updates (seconds). Only active in gRPC mode. | -| `OPENSHELL_LOG_PUSH_LEVEL` | | `info` | Maximum tracing level for log push to gateway. Events above this level are not streamed. Only active in gRPC mode. | -| `OPENSHELL_SSH_SOCKET_PATH` | `--ssh-socket-path` | | Filesystem path to the Unix socket the embedded sshd binds (e.g. `/run/openshell/ssh.sock`). 
| -| `OPENSHELL_INFERENCE_ROUTES` | `--inference-routes` | | Path to YAML inference routes file for standalone routing | - -### Injected into child process - -| Variable | Purpose | -|----------|---------| -| `OPENSHELL_SANDBOX` | Always `"1"` -- signals the process is sandboxed | -| `HTTP_PROXY` / `HTTPS_PROXY` / `ALL_PROXY` | Proxy URL (uppercase, for curl/wget) | -| `http_proxy` / `https_proxy` / `grpc_proxy` | Proxy URL (lowercase, for gRPC C-core) | -| `NODE_USE_ENV_PROXY` | Set to `1` so Node.js built-in `fetch`/`http` clients honor proxy env vars | -| `NODE_EXTRA_CA_CERTS` | Path to sandbox CA cert PEM (Node.js, additive) | -| `SSL_CERT_FILE` | Combined CA bundle path (OpenSSL/Python/Go) | -| `REQUESTS_CA_BUNDLE` | Combined CA bundle path (Python requests) | -| `CURL_CA_BUNDLE` | Combined CA bundle path (curl/libcurl) | -| Provider credentials | From `GetSandboxProviderEnvironment` RPC (e.g., `ANTHROPIC_API_KEY`) | - -### Injected into SSH child process (additional) - -| Variable | Purpose | -|----------|---------| -| `HOME` | `/sandbox` | -| `USER` | `sandbox` | -| `TERM` | Negotiated terminal type (default `xterm-256color`) | - -## Error Handling and Graceful Degradation - -The sandbox uses `miette` for error reporting and `thiserror` for typed errors. The general principle is: fail hard on security-critical errors, degrade gracefully on non-critical ones. 
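The fail-hard vs. degrade-gracefully split can be illustrated with a minimal sketch. The error and fetcher shapes here are hypothetical stand-ins, not the crate's real `miette`/`thiserror` types:

```rust
/// Security-critical failures abort startup; the variants are illustrative.
#[allow(dead_code)]
#[derive(Debug)]
enum StartupError {
    PolicyFetch(String),
    SeccompInstall(String),
}

/// Non-critical setup: warn and fall back to an empty map instead of aborting.
fn load_provider_env(
    fetch: impl Fn() -> Result<Vec<(String, String)>, String>,
) -> Vec<(String, String)> {
    match fetch() {
        Ok(env) => env,
        Err(e) => {
            eprintln!("warn: provider env fetch failed, continuing with empty map: {e}");
            Vec::new()
        }
    }
}

/// Security-critical setup: propagate the error so the sandbox refuses to start.
fn load_policy(fetch: impl Fn() -> Result<String, String>) -> Result<String, StartupError> {
    fetch().map_err(StartupError::PolicyFetch)
}
```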
- -| Condition | Behavior | -|-----------|----------| -| Policy fetch failure (gRPC or file) | Fatal -- sandbox cannot start without policy | -| Provider env fetch failure | Warn + continue with empty map | -| Policy poll: gateway unreachable | Debug log + retry on next interval | -| Policy poll: `reload_from_proto()` failure | Warn + keep last-known-good engine + report FAILED status to gateway | -| Policy poll: status report failure | Warn + poll loop continues | -| Landlock failure + `BestEffort` | Warn + continue without filesystem isolation | -| Landlock failure + `HardRequirement` | Fatal | -| Seccomp failure | Fatal | -| Network namespace creation failure | Fatal in `Proxy` mode (sandbox startup aborts) | -| Bypass detection: iptables not available | Warn + skip rule installation (routing-only isolation) | -| Bypass detection: IPv4 rule installation failure | Warn + returned as error (non-fatal at call site) | -| Bypass detection: IPv6 rule installation failure | Warn + continue (IPv4 rules are the primary path) | -| Bypass detection: LOG rule installation failure | Warn + continue (REJECT rules still installed for fast-fail) | -| Bypass detection: `/dev/kmsg` not available | Warn + monitor not started (REJECT rules still provide fast-fail) | -| Bypass detection: `/dev/kmsg` read error (EPIPE/EIO) | Debug log + continue reading (kernel ring buffer overrun) | -| Ephemeral CA generation failure | Warn + TLS termination disabled (L7 inspection on TLS endpoints will not work) | -| CA file write failure | Warn + TLS termination disabled | -| OPA engine Mutex lock poisoned | Error on the individual evaluation | -| Binary integrity TOFU mismatch | Deny the specific CONNECT request | -| SSRF: hostname resolves to internal IP | Deny the specific CONNECT request (403 Forbidden + warning log) | -| SSRF: DNS resolution failure | Deny the specific CONNECT request | -| Inference route file load/parse error | Fatal -- sandbox startup aborts | -| Inference route file with 
empty routes | Inference routing disabled (graceful) | -| Inference gateway bundle with empty routes | Inference routing stays enabled with empty cache; refresh can activate routes later | -| Inference gateway bundle fetch failure | Warn + inference routing disabled (graceful) | -| Inference interception: missing InferenceContext | Denied outcome + structured CONNECT deny log | -| Inference interception: missing TLS state | Denied outcome + structured CONNECT deny log | -| Inference interception: TLS handshake failure | Denied outcome + structured CONNECT deny log | -| Inference interception: client disconnect (no prior routing) | Denied outcome + structured CONNECT deny log | -| Inference interception: I/O error (no prior routing) | Denied outcome + structured CONNECT deny log | -| Inference interception: empty route cache | 503 Service Unavailable with JSON error body | -| Inference interception: no compatible route | 400 Bad Request with JSON error body | -| Inference interception: backend timeout/unavailable | 503 Service Unavailable with JSON error body | -| Inference interception: backend protocol error | 502 Bad Gateway with JSON error body | -| Inference interception: request not allowed by policy (no prior routing) | 403 Forbidden with JSON error body + structured CONNECT deny log | -| Inference interception: request not allowed by policy (after prior routing) | 403 Forbidden with JSON error body (no deny log, connection counts as routed) | -| Log push gRPC connection fails | Task prints to stderr and exits; logs not pushed for sandbox lifetime | -| Log push mpsc channel full (1024 lines) | Event dropped silently; logging never blocks | -| Log push gRPC stream breaks | Push loop exits, flushes remaining batch | -| Proxy accept error | Log + break accept loop | -| Benign connection close (EOF, reset, pipe) | Debug level (not visible to user by default) | -| Credential injection: unresolved placeholder detected | HTTP 500, connection closed (fail-closed) | 
-| Credential injection: resolved value contains CR/LF/null | Placeholder treated as unresolvable, fail-closed | -| Credential injection: path credential contains traversal/separator | HTTP 500, connection closed (fail-closed) | -| Credential injection: percent-encoded placeholder bypass attempt | HTTP 500, connection closed (fail-closed) | -| L7 parse or path-canonicalization error | Emit denied OCSF network event with `status_detail`, close the connection | -| SSH socket bind failure | Fatal -- reported through the readiness channel and aborts startup | -| SSH server accept failure | Async task error logged, main process unaffected | -| Supervisor session: connect failure | Emit `session_failed` OCSF event, sleep with exponential backoff (1s -> 30s) and reconnect | -| Supervisor session: `SessionRejected` | Emit `session_failed` event with rejection reason; backoff and reconnect | -| Supervisor session: stream error mid-session | Emit `session_failed` event; backoff and reconnect | -| Supervisor session: gateway closes stream cleanly | Emit `session_closed` event and exit the task (no reconnect) | -| Relay bridge: `RelayStream` RPC failure | Emit `relay_failed` event; the individual relay is abandoned, the session stays up | -| Relay bridge: Unix socket connect failure | Emit `relay_failed` event; gateway observes EOF on the RelayStream | -| Relay bridge: non-data inbound frame after Init | Emit `relay_failed` event with protocol error | -| Process timeout | Kill process, return exit code 124 | - -## Logging - -Dual-output logging is configured in `main.rs`: - -- **stdout**: Filtered by `--log-level` (default `warn`), uses ANSI colors -- **`/var/log/openshell.log`**: Fixed at `info` level, no ANSI, non-blocking writer - -Key structured log events: - -- `CONNECT`: One per proxy CONNECT request (for non-`inference.local` targets) with full identity context. Inference interception failures produce a separate `info!()` log with `action=deny` and the denial reason. 
-- `BYPASS_DETECT`: One per detected direct connection attempt that bypassed the HTTP CONNECT proxy. Includes destination, protocol, process identity (best-effort), and remediation hint. Emitted at `warn` level. -- `L7_REQUEST`: One per L7-inspected request with method, path, and decision -- Supervisor session / relay OCSF events: `session_established`, `session_closed`, `session_failed`, `relay_open`, `relay_closed`, `relay_failed`, `relay_close_from_gateway` (see [Supervisor Session](#supervisor-session)). -- Sandbox lifecycle events: process start, exit, namespace creation/cleanup, bypass rule installation -- Policy reload events: new version detected, reload success/failure, status report outcomes - -## Log Streaming - -In gRPC mode, sandbox supervisor logs are streamed to the gateway in real time. This enables operators and CLI users to view both gateway-side and sandbox-side logs in a unified stream via `nav logs`. - -### Architecture overview - -```mermaid -flowchart LR - subgraph "Sandbox supervisor" - A[tracing events] --> B[LogPushLayer] - B -->|try_send| C[mpsc channel\n1024 lines] - C --> D[Background task] - D -->|batched| E[PushSandboxLogs\nclient-streaming RPC] - end - subgraph "Gateway server" - E --> F[push_sandbox_logs handler] - F -->|force source=sandbox| G[TracingLogBus.publish_external] - G --> H[broadcast channel\n+ tail buffer 2000 lines] - I[SandboxLogLayer] -->|source=gateway| H - end - subgraph "CLI / watchers" - H --> J[WatchSandbox stream] - H --> K[GetSandboxLogs one-shot] - end -``` - -Two log sources feed the same `TracingLogBus`: - -- **Gateway logs** (`source: "gateway"`): Generated by the server's `SandboxLogLayer` tracing layer when server-side code emits events containing a `sandbox_id` field. These capture reconciliation, provisioning, and management operations. -- **Sandbox logs** (`source: "sandbox"`): Pushed from the sandbox supervisor via the `PushSandboxLogs` client-streaming RPC. 
These capture proxy decisions, policy reloads, process lifecycle, and all other sandbox-internal tracing events. - -### LogPushLayer - -**File:** `crates/openshell-sandbox/src/log_push.rs` - -`LogPushLayer` is a `tracing_subscriber::Layer` that intercepts tracing events in the sandbox supervisor and forwards them to the gateway. - -```rust -pub struct LogPushLayer { - sandbox_id: String, - tx: mpsc::Sender<SandboxLogLine>, - max_level: tracing::Level, -} -``` - -Key behaviors: - -- **Level filtering**: Defaults to `INFO`. Configurable via the `OPENSHELL_LOG_PUSH_LEVEL` environment variable (accepts `trace`, `debug`, `info`, `warn`, `error`). Events above the configured level are silently discarded. -- **Best-effort delivery**: Uses `try_send()` on the mpsc channel. If the channel is full (1024 lines buffered), the event is dropped. Logging never blocks the sandbox supervisor. -- **Structured fields**: Implements a `LogVisitor` that collects all tracing key-value fields (e.g., `dst_host`, `action`, `policy`) into a `HashMap`. The `message` field is extracted separately; all other fields go into `SandboxLogLine.fields`. -- **Source tagging**: Sets `source: "sandbox"` on every log line at construction time. - -### Initialization - -**File:** `crates/openshell-sandbox/src/main.rs` - -The log push layer is set up in `main()` before calling `run_sandbox()`, only in gRPC mode (when both `--sandbox-id` and `--openshell-endpoint` are present): - -1. `spawn_log_push_task(endpoint, sandbox_id)` creates the mpsc channel and background task, returning the sender half and a `JoinHandle`. -2. `LogPushLayer::new(sandbox_id, tx)` wraps the sender in a tracing layer. -3. The layer is added to the `tracing_subscriber::registry()` alongside the stdout and file layers. - -This means the push layer captures all tracing events the sandbox supervisor generates, filtered by `OPENSHELL_LOG_PUSH_LEVEL` (default INFO).
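The best-effort hand-off described above can be sketched with a bounded channel. This uses the std sync channel as a stand-in for the tokio mpsc channel in the real code:

```rust
use std::sync::mpsc::{SyncSender, TrySendError};

/// Hand a log line to the push task without ever blocking the caller.
/// A full buffer means the line is silently dropped (we only count it here);
/// logging must never stall the sandbox supervisor.
fn push_best_effort(tx: &SyncSender<String>, line: String, dropped: &mut u64) {
    if let Err(TrySendError::Full(_)) = tx.try_send(line) {
        *dropped += 1;
    }
    // A disconnected receiver is also ignored: best-effort delivery.
}
```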
- -### Background push task - -**File:** `crates/openshell-sandbox/src/log_push.rs` (`spawn_log_push_task()`, `run_push_loop()`) - -The background task batches log lines and streams them to the gateway: - -1. **Channel setup**: Creates a bounded `mpsc::channel(1024)`. The sender goes to the `LogPushLayer`; the receiver feeds the push loop. -2. **gRPC connection**: Connects a `CachedOpenShellClient` to the gateway. On connection failure, the task prints to stderr (cannot use tracing to avoid recursion) and exits. -3. **Client-streaming RPC**: Opens a `PushSandboxLogs` client-streaming call via a secondary `mpsc::channel(32)` wrapped in `tokio_stream::wrappers::ReceiverStream`. A separate spawned task drives the gRPC call. -4. **Batch-and-flush loop**: Accumulates lines in a `Vec` (capacity 50). Flushes when: - - The batch reaches 50 lines, OR - - A 500ms interval timer fires (with `MissedTickBehavior::Skip`) -5. **Shutdown**: When the `LogPushLayer` sender is dropped (sandbox exits), the receiver returns `None`, the loop breaks, and any remaining lines are flushed in a final batch. - -### Server-side ingestion - -**File:** `crates/openshell-server/src/grpc.rs` (`push_sandbox_logs`) - -The `PushSandboxLogs` RPC handler processes each batch: - -1. Validates `sandbox_id` is non-empty (skips empty batches). -2. Iterates over `batch.logs`, capped at 100 lines per batch to prevent abuse. -3. Forces `log.source = "sandbox"` on every line -- the sandbox cannot claim to be the gateway. -4. Forces `log.sandbox_id` to match the batch envelope -- a sandbox cannot inject logs for other sandboxes. -5. Publishes each log via `TracingLogBus::publish_external()`. - -### TracingLogBus integration - -**File:** `crates/openshell-server/src/tracing_bus.rs` - -`publish_external()` wraps the `SandboxLogLine` in a `SandboxStreamEvent` and calls the internal `publish()` method, which: - -1. Sends the event to the per-sandbox `broadcast::Sender` (capacity 1024).
Subscribers (active `WatchSandbox` streams) receive the event immediately. -2. Appends the event to the per-sandbox tail buffer (`VecDeque`), capped at 2000 lines. Overflow evicts the oldest entry. - -The same `publish()` method is used by the server's own `SandboxLogLayer` for gateway-sourced logs, so both sources share identical broadcast and tail buffer infrastructure. - -### Source tagging - -The `SandboxLogLine.source` field distinguishes log origins: - -| Source | Set by | Description | -|--------|--------|-------------| -| `"gateway"` | `SandboxLogLayer` in `tracing_bus.rs` | Server-side logs (reconciliation, provisioning, management) | -| `"sandbox"` | `push_sandbox_logs` handler in `grpc.rs` | Sandbox supervisor logs (proxy, policy, process lifecycle) | -| `""` (empty) | Legacy/pre-source logs | Treated as `"gateway"` by the CLI (`print_log_line()`) and server (`source_matches()`) | - -### Structured fields - -The `SandboxLogLine.fields` map (`map<string, string>` in proto) carries tracing key-value pairs from sandbox events. Examples: - -| Field | Source | Description | -|-------|--------|-------------| -| `dst_host` | Proxy CONNECT log | Destination hostname | -| `action` | Proxy CONNECT log | `allow` or `deny` | -| `policy` | Proxy CONNECT log | Matched policy name | -| `version` | Policy reload log | New policy version number | -| `policy_hash` | Policy reload log | SHA256 hash of new policy | - -Gateway-sourced logs do not currently populate the `fields` map (it remains empty). Only sandbox-pushed logs include structured fields.
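The split between the `message` field and the structured fields map can be approximated like this (a simplified stand-in for the actual `LogVisitor`, operating on plain key-value pairs):

```rust
use std::collections::HashMap;

/// Separate the special `message` field from the remaining structured
/// fields, mirroring how the visitor populates `SandboxLogLine.fields`.
fn split_fields(pairs: &[(&str, &str)]) -> (String, HashMap<String, String>) {
    let mut message = String::new();
    let mut fields = HashMap::new();
    for (k, v) in pairs {
        if *k == "message" {
            message = (*v).to_string();
        } else {
            fields.insert((*k).to_string(), (*v).to_string());
        }
    }
    (message, fields)
}
```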
- -### CLI filtering - -**File:** `crates/openshell-cli/src/main.rs` (command definition), `crates/openshell-cli/src/run.rs` (`sandbox_logs()`) - -The `nav logs` command supports filtering by source and level: - -```bash -# Show only sandbox-side logs -nav logs my-sandbox --source sandbox - -# Show only warnings and errors from the gateway -nav logs my-sandbox --source gateway --level warn - -# Stream live logs from all sources -nav logs my-sandbox --tail - -# Stream live sandbox logs only -nav logs my-sandbox --tail --source sandbox -``` - -**CLI flags:** - -| Flag | Default | Description | -|------|---------|-------------| -| `--source` | `all` | Filter by source: `gateway`, `sandbox`, or `all`. Can be specified multiple times. | -| `--level` | (empty) | Minimum log level: `error`, `warn`, `info`, `debug`, `trace`. Empty means all levels. | - -**Server-side filtering:** - -Both `WatchSandboxRequest` and `GetSandboxLogsRequest` carry filter fields: - -| Proto field | Message | Purpose | -|-------------|---------|---------| -| `log_sources` | `WatchSandboxRequest` | `repeated string` -- filter live log events by source | -| `log_min_level` | `WatchSandboxRequest` | `string` -- minimum log level for live events | -| `sources` | `GetSandboxLogsRequest` | `repeated string` -- filter one-shot log fetch by source | -| `min_level` | `GetSandboxLogsRequest` | `string` -- minimum log level for one-shot fetch | - -Filtering is implemented server-side. For `WatchSandbox`, filters apply to both the tail replay and live events. For `GetSandboxLogs`, filters apply to the tail buffer scan. The `source_matches()` helper treats empty source as `"gateway"` for backward compatibility. The `level_matches()` helper uses a numeric ranking (ERROR=0, WARN=1, INFO=2, DEBUG=3, TRACE=4); unknown levels always pass. 
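The filtering rules above can be sketched directly (illustrative helpers in the spirit of `source_matches()` and `level_matches()`, not the server's exact code):

```rust
/// Rank per the documented ordering: ERROR=0, WARN=1, INFO=2, DEBUG=3, TRACE=4.
fn level_rank(level: &str) -> Option<u8> {
    match level.to_ascii_uppercase().as_str() {
        "ERROR" => Some(0),
        "WARN" => Some(1),
        "INFO" => Some(2),
        "DEBUG" => Some(3),
        "TRACE" => Some(4),
        _ => None,
    }
}

/// A line passes if its level is at least as severe as the minimum.
/// Unknown levels (and an empty/unknown minimum) always pass.
fn level_matches(min_level: &str, line_level: &str) -> bool {
    match (level_rank(min_level), level_rank(line_level)) {
        (Some(min), Some(line)) => line <= min,
        _ => true,
    }
}

/// Empty source is treated as "gateway" for backward compatibility;
/// an empty wanted list means "all sources".
fn source_matches(wanted: &[&str], source: &str) -> bool {
    let effective = if source.is_empty() { "gateway" } else { source };
    wanted.is_empty() || wanted.contains(&effective)
}
```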
- -### CLI output format - -`print_log_line()` in `crates/openshell-cli/src/run.rs` formats each log line: - -```text -[timestamp] [source ] [level] [target] message key=value key=value -``` - -Example output: - -```text -[1708891234.567] [sandbox] [INFO ] [openshell_sandbox::proxy] CONNECT api.example.com:443 dst_host=api.example.com action=allow -[1708891234.890] [gateway] [INFO ] [openshell_server::grpc] ReportPolicyStatus: sandbox reported policy load result -``` - -When the `fields` map is non-empty, entries are sorted by key and appended as `key=value` pairs. - -### Create-watch filter - -**File:** `crates/openshell-cli/src/run.rs` - -During `sandbox create`, the CLI opens a `WatchSandbox` stream with `stop_on_terminal: true` to wait until the sandbox reaches `Ready` phase. This stream uses `log_sources: ["gateway"]` to filter out sandbox-pushed logs. Without this filter, continuous sandbox supervisor logs (e.g., proxy CONNECT events) would keep the stream active and prevent `stop_on_terminal` from detecting that provisioning has completed and the stream should close. 
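A formatter in the shape described above might look like the following sketch. It ignores the fixed-width padding the real CLI applies and uses a `BTreeMap` so the `key=value` pairs come out sorted by key:

```rust
use std::collections::BTreeMap;

/// Render one log line as `[ts] [source] [level] [target] message k=v ...`,
/// with structured fields appended in sorted key order.
fn format_log_line(
    ts: &str,
    source: &str,
    level: &str,
    target: &str,
    message: &str,
    fields: &BTreeMap<String, String>,
) -> String {
    let mut out = format!("[{ts}] [{source}] [{level}] [{target}] {message}");
    for (k, v) in fields {
        out.push_str(&format!(" {k}={v}"));
    }
    out
}
```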
- -### Data flow summary - -```mermaid -sequenceDiagram - participant SB as Sandbox Supervisor - participant LP as LogPushLayer - participant CH as mpsc channel (1024) - participant BG as Background push task - participant GW as Gateway (push_sandbox_logs) - participant TB as TracingLogBus - participant CL as CLI (nav logs) - - SB->>LP: tracing event (info!(...)) - LP->>LP: Check level >= OPENSHELL_LOG_PUSH_LEVEL - LP->>CH: try_send(SandboxLogLine) - Note over CH: Drops if full (best-effort) - CH->>BG: recv() - BG->>BG: Accumulate in batch (max 50) - alt Batch full OR 500ms timer - BG->>GW: PushSandboxLogsRequest (client-streaming) - end - GW->>GW: Force source="sandbox", cap 100 lines - GW->>TB: publish_external(log) - TB->>TB: broadcast + append to tail buffer (2000 cap) - CL->>TB: WatchSandbox / GetSandboxLogs - TB-->>CL: SandboxStreamEvent with log payload -``` - -### Failure modes - -| Condition | Behavior | -|-----------|----------| -| Log push gRPC connection fails | Task prints to stderr and exits; no logs are pushed for the sandbox lifetime | -| mpsc channel full (1024 lines buffered) | `try_send()` drops the event silently; logging never blocks | -| gRPC stream breaks mid-session | Push loop detects send error, breaks, flushes remaining batch | -| Push batch exceeds 100 lines | Server caps at 100 lines per batch; excess lines in the batch are ignored | -| `OPENSHELL_LOG_PUSH_LEVEL` unparseable | Falls back to INFO | - -## Platform Support - -Platform-specific code is abstracted through `crates/openshell-sandbox/src/sandbox/mod.rs`. 
- -| Feature | Linux | Other platforms | -|---------|-------|-----------------| -| Landlock | Applied via `landlock` crate (ABI V1) | Warning + no-op | -| Seccomp | Applied via `seccompiler` crate | No-op | -| Network namespace | Full veth pair isolation | Not available | -| Bypass detection | iptables rules + `/dev/kmsg` monitor | Not available (no netns) | -| `/proc` identity binding | Full support | `evaluate_opa_tcp()` always denies | -| Proxy | Functional (binds to veth IP or loopback) | Functional (loopback only, no identity binding) | -| SSH server | Full support (with netns for shell processes) | Functional (no netns isolation for shell processes) | -| Privilege dropping | `initgroups` + `setgid` + `setuid` | `setgid` + `setuid` (no `initgroups` on macOS) | - -On non-Linux platforms, the sandbox can still run commands with proxy-based network filtering, but the kernel-level isolation (filesystem, syscall, namespace) and process-identity binding are unavailable. - -## Cross-References - -- [Overview](README.md) -- System-wide architecture context -- [Gateway Architecture](gateway.md) -- gRPC services that serve policy to the sandbox -- [Container Management](build-containers.md) -- How sandbox containers are built and deployed -- [Sandbox Connect](sandbox-connect.md) -- SSH tunnel from gateway to sandbox -- [Providers](sandbox-providers.md) -- Provider credential injection -- [Policy Language](security-policy.md) -- Rego policy syntax and rules -- [Inference Routing](inference-routing.md) -- Inference interception, route management, and the `openshell-router` crate +- If gateway config polling fails, the sandbox keeps its last-known-good policy. +- If a live policy update is invalid, the supervisor rejects it and keeps the + current policy. +- Existing raw byte streams are connection scoped. Dynamic policy changes apply + to new connections or the next parsed HTTP request where the proxy can safely + re-evaluate. 
+- If the supervisor relay drops, the sandbox can keep running, but connect and + exec operations fail until the supervisor registers again. diff --git a/architecture/security-policy.md b/architecture/security-policy.md index 1445705db..e5f179dc1 100644 --- a/architecture/security-policy.md +++ b/architecture/security-policy.md @@ -1,1575 +1,94 @@ -# Policy Language +# Security Policy -The sandbox system uses a YAML-based policy language to govern sandbox behavior. This document is the definitive reference for the policy schema, how each field maps to enforcement mechanisms, and the behavioral triggers that control which enforcement layer is activated. +OpenShell policy defines what a sandboxed agent can access. The policy is +enforced inside each sandbox by kernel controls, process setup, and the local +policy proxy. The gateway stores and delivers policy, but it does not make +per-request egress decisions. -Policies serve two purposes: +For the field-by-field YAML reference, use +[Policy Schema Reference](../docs/reference/policy-schema.mdx). -1. **Static configuration** -- filesystem access rules, Landlock compatibility, and process privilege dropping (applied once at sandbox startup and immutable for the sandbox's lifetime). -2. **Dynamic network decisions** -- per-connection and per-request access control evaluated at runtime by the OPA engine. These fields can be updated on a running sandbox via live policy updates. +## Policy Areas -## Policy Loading +| Area | Enforcement | +|---|---| +| Filesystem | Landlock restricts read-only and read-write paths. | +| Process | The supervisor launches the agent as an unprivileged user with reduced capabilities. | +| Network | The proxy evaluates destination, port, calling binary, and optional L7 rules. | +| Inference | `inference.local` is configured through gateway inference settings, not OPA network policy. | +| Runtime settings | Typed settings are delivered with policy and can be global or sandbox scoped. 
| -The sandbox supervisor loads policy through one of two paths, selected at startup based on available configuration. +Filesystem and process policy are startup-time controls. Network policy is +dynamic and can be hot-reloaded when the new policy validates successfully. -### File Mode (Local Development) +## Network Decisions -Provide a Rego rules file and a YAML data file via CLI flags or environment variables: +Ordinary network traffic follows this order: -```bash -openshell-sandbox \ - --policy-rules sandbox-policy.rego \ - --policy-data dev-sandbox-policy.yaml \ - -- /bin/bash -``` +1. Force traffic through the sandbox proxy with namespace and seccomp controls. +2. Identify the calling binary and compare its trusted identity. +3. Reject hard-blocked destinations, including unsafe internal IP ranges unless + explicitly allowed. +4. Match the destination and binary against network policy blocks. +5. Apply optional HTTP/L7 rules for endpoints that enable protocol inspection. +6. Allow, deny, audit, or log according to the matched policy. -| Flag | Environment Variable | Description | -| ---------------- | ------------------------ | ------------------------------------------------ | -| `--policy-rules` | `OPENSHELL_POLICY_RULES` | Path to `.rego` file containing evaluation rules | -| `--policy-data` | `OPENSHELL_POLICY_DATA` | Path to YAML file containing policy data | +Explicit deny and hardening checks win over allow rules. If no rule matches, the +request is denied. -The YAML data file is preprocessed before loading into the OPA engine: L7 policies are validated, and `access` presets are expanded into explicit `rules` arrays. See `crates/openshell-sandbox/src/opa.rs` -- `preprocess_yaml_data()`. +## TLS and L7 Inspection -### gRPC Mode (Production) +For HTTP endpoints that need request-level controls, the proxy can terminate TLS +with the sandbox's ephemeral CA and inspect method/path or protocol-specific +metadata before forwarding. 
The proxy also supports credential injection on +terminated HTTP streams when policy allows the endpoint. -When the sandbox runs under a gateway-managed compute platform, it fetches its typed protobuf policy from the gateway: +Raw streams, HTTP upgrades, and long-lived response bodies are connection +scoped. Policy reloads affect the next connection or the next parsed HTTP +request; they do not rewrite bytes already being relayed. -```bash -openshell-sandbox \ - --sandbox-id abc123 \ - --openshell-endpoint https://openshell:8080 \ - -- /bin/bash -``` +## Live Updates -| Flag | Environment Variable | Description | -| ------------------------ | ---------------------- | ---------------------------- | -| `--sandbox-id` | `OPENSHELL_SANDBOX_ID` | Sandbox ID for policy lookup | -| `--openshell-endpoint` | `OPENSHELL_ENDPOINT` | Gateway gRPC endpoint | +The gateway stores policy revisions and exposes effective sandbox configuration. +The supervisor polls for config revisions and attempts to load new dynamic +policy into the in-process OPA engine. -The gateway returns a `SandboxPolicy` protobuf message (defined in `proto/sandbox.proto`). The sandbox supervisor converts this proto into JSON, validates L7 config, expands presets, and loads it into the OPA engine using baked-in Rego rules (`sandbox-policy.rego` compiled via `include_str!`). See `crates/openshell-sandbox/src/opa.rs` -- `OpaEngine::from_proto()`. +If a new policy fails validation or loading, the supervisor reports the failure +and keeps the last-known-good policy. Static controls, such as filesystem +allowlists and process identity, require a new sandbox because they are applied +before the child process starts. -### Policy Loading Sequence +Gateway-global policy can override sandbox-scoped policy. Use it sparingly +because it changes the effective access model for every sandbox on the gateway. -```mermaid -flowchart TD - START[Sandbox Startup] --> CHECK{File mode?
--policy-rules +
--policy-data} - CHECK -->|Yes| FILE[Read .rego + .yaml from disk] - CHECK -->|No| OPENSHELL{gRPC mode?
--sandbox-id +
--openshell-endpoint} - OPENSHELL -->|Yes| FETCH[Fetch SandboxPolicy proto via gRPC] - OPENSHELL -->|No| ERR[Error: no policy source] +## Policy Advisor - FILE --> PREPROCESS[Preprocess YAML:
validate L7, expand presets] - FETCH --> PROTO2JSON[Convert proto to JSON
validate L7, expand presets] +The policy advisor pipeline turns observed denials into draft policy +recommendations: - PREPROCESS --> OPA[Load into OPA engine] - PROTO2JSON --> OPA +1. The sandbox aggregates denied network events. +2. A mechanistic mapper proposes minimal endpoint, binary, or rule additions. +3. The gateway validates and stores draft recommendations. +4. A human or admin workflow approves or rejects drafts. +5. Approved drafts merge into the target sandbox policy. - OPA --> QUERY[Query sandbox config:
filesystem, landlock, process] - QUERY --> APPLY[Apply sandbox restrictions] -``` +The advisor should propose narrow additions and preserve explicit-deny behavior. +It is a workflow aid, not an automatic permission grant. -### Priority +## Security Logging -File mode takes precedence. If both `--policy-rules`/`--policy-data` and `--sandbox-id`/`--openshell-endpoint` are provided, file mode is used. See `crates/openshell-sandbox/src/lib.rs` -- `load_policy()`. +Sandbox events that represent observable behavior use OCSF structured logs: -## Live Policy Updates +| Event | OCSF class | +|---|---| +| Network and proxy decisions | Network or HTTP activity | +| SSH authentication and relay activity | SSH activity | +| Process lifecycle | Process activity | +| Policy and settings changes | Configuration state change | +| Security findings | Detection finding | -Policy can be updated on a running sandbox without restarting it. This enables operators to tighten or relax network access rules in response to changing requirements. +Use plain tracing for internal plumbing such as retries, debug state, and +intermediate steps where the final observable event is logged separately. -Live updates are only available in **gRPC mode** (production clusters). File-mode sandboxes load policy once at startup and do not poll for changes. - -### Static vs. Dynamic Fields - -Policy fields fall into two categories based on when they are enforced: - -| Category | Fields | Enforcement Point | Updatable? | -|----------|--------|-------------------|------------| -| **Static** | `filesystem_policy`, `landlock`, `process` | Applied once in the child process `pre_exec` (after `fork()`, before `exec()`). Kernel-level Landlock rulesets and UID/GID changes cannot be reversed. | No -- immutable after sandbox creation | -| **Dynamic** | `network_policies` | Evaluated at runtime by the OPA engine on every proxy CONNECT request and L7 rule check. The OPA engine can be atomically replaced. 
| Yes -- via `openshell policy set` | - -Attempting to change a static field in an update request returns an `INVALID_ARGUMENT` error with a message indicating which field cannot be modified. See `crates/openshell-server/src/grpc.rs` -- `validate_static_fields_unchanged()`. - -### Network Mode Immutability - -Proto-backed sandboxes always run with proxy networking. The proxy, network namespace, and OPA evaluation path are created at sandbox startup and stay in place for the lifetime of the sandbox. - -That means `network_policies` can change freely at runtime, including transitions between an empty map (proxy-backed deny-all) and a non-empty map (proxy-backed allowlist). The immutable boundary is the proxy infrastructure itself, not whether the current policy has any rules. - -### Update Flow - -The update mechanism uses a poll-based model with versioned policy revisions and server-side status tracking. - -```mermaid -sequenceDiagram - participant CLI as nav policy set - participant GW as Gateway (openshell-server) - participant DB as Persistence (SQLite/Postgres) - participant SB as Sandbox (openshell-sandbox) - - CLI->>GW: UpdateSandboxPolicy(name, new_policy) - GW->>GW: Validate static fields unchanged - GW->>DB: put_policy_revision(version=N, status=pending) - GW->>DB: supersede_pending_policies(before_version=N) - GW-->>CLI: UpdateSandboxPolicyResponse(version=N, hash) - - loop Every 30s (configurable) - SB->>GW: GetSandboxSettings(sandbox_id) - GW->>DB: get_latest_policy(sandbox_id) - GW-->>SB: GetSandboxSettingsResponse(policy, version=N, hash) - end - - Note over SB: Detects version > current_version - SB->>SB: OpaEngine::reload_from_proto(new_policy) - - alt Reload succeeds - SB->>GW: ReportPolicyStatus(version=N, LOADED) - GW->>DB: update_policy_status(version=N, "loaded") - GW->>DB: Update sandbox.current_policy_version = N - else Reload fails (validation error) - Note over SB: Previous engine untouched (LKG) - SB->>GW: ReportPolicyStatus(version=N, 
FAILED, error_msg) - GW->>DB: update_policy_status(version=N, "failed", error_msg) - end - - opt CLI --wait flag - CLI->>GW: GetSandboxPolicyStatus(name, version=N) - GW-->>CLI: revision.status = LOADED / FAILED - end -``` - -### Policy Versioning - -Each sandbox maintains an independent, monotonically increasing version counter for its policy revisions: - -- **Version 1** is the policy from the sandbox's `spec.policy` at creation time. It is backfilled lazily on the first `GetSandboxSettings` call if no explicit revision exists in the policy history table. See `crates/openshell-server/src/grpc.rs` -- `get_sandbox_settings()`. -- Each `UpdateSandboxPolicy` call computes the next version as `latest_version + 1` and persists a new `PolicyRecord` with status `"pending"`. -- When a new version is persisted, all older revisions still in `"pending"` status are marked `"superseded"` via `supersede_pending_policies()`. This handles rapid successive updates where the sandbox has not yet picked up an intermediate version. -- The `Sandbox` protobuf object carries a `current_policy_version` field (see `proto/datamodel.proto`) that is updated when the sandbox reports a successful load. - -Each revision is stored as a `PolicyRecord` containing the full serialized protobuf payload, a SHA-256 hash of that payload, a status string, and timestamps. See `crates/openshell-server/src/persistence/mod.rs` -- `PolicyRecord`. - -### Deterministic Policy Hashing - -Policy hashes use a deterministic function that avoids the non-determinism of protobuf's `encode_to_vec()` on `map` fields. Protobuf `map` fields are backed by `HashMap`, whose iteration order is randomized, so encoding the same logical policy twice can produce different byte sequences. The `deterministic_policy_hash()` function avoids this by hashing each top-level field individually and sorting `network_policies` map entries by key before hashing. See `crates/openshell-server/src/grpc.rs` -- `deterministic_policy_hash()`. 
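As an illustrative sketch of this scheme — in Python rather than the actual Rust `deterministic_policy_hash()`, with an assumed 8-byte version width and a string standing in for each value's `encode_to_vec()` bytes — sorting map entries before hashing makes the digest independent of `HashMap` iteration order:

```python
import hashlib

def deterministic_policy_hash(version: int, network_policies: dict) -> str:
    # Hypothetical mirror of the described algorithm: hash the version as
    # little-endian bytes (width assumed here), then each network_policies
    # entry in sorted key order (key as UTF-8, then the encoded value).
    h = hashlib.sha256()
    h.update(version.to_bytes(8, "little"))
    for key in sorted(network_policies):
        h.update(key.encode("utf-8"))
        h.update(network_policies[key].encode("utf-8"))  # stand-in for encode_to_vec()
    return h.hexdigest()

# The same logical policy hashes identically regardless of insertion order.
a = deterministic_policy_hash(1, {"b": "y", "a": "x"})
b = deterministic_policy_hash(1, {"a": "x", "b": "y"})
assert a == b
```

This is what makes the idempotency check workable: two submissions of the same logical policy always compare equal by hash.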
- -The hash is computed as follows: - -1. Hash the `version` field as little-endian bytes. -2. Hash the `filesystem`, `landlock`, and `process` sub-messages via `encode_to_vec()` (these contain no `map` fields, so encoding is deterministic). -3. Collect `network_policies` entries, sort by map key, then hash each key (as UTF-8 bytes) followed by the value's `encode_to_vec()`. -4. Return the hex-encoded SHA-256 digest. - -This guarantees that the same logical policy always produces the same hash regardless of protobuf serialization order. - -**Idempotent updates**: `UpdateSandboxPolicy` compares the deterministic hash of the submitted policy against the latest stored revision's hash. If they match, the handler returns the existing version and hash without creating a new revision. The CLI detects this (the returned version equals the pre-call version) and prints `Policy unchanged` instead of `Policy version N submitted`. This makes repeated `policy set` calls safe and idempotent. - -### Incremental Merge Updates - -`UpdateConfigRequest.merge_operations` supports batched incremental changes to the dynamic `network_policies` section. The CLI exposes this as `openshell policy update`. - -Supported first-pass operations: - -- `--add-endpoint host:port[:access[:protocol[:enforcement]]]` -- `--remove-endpoint host:port` -- `--remove-rule ` -- `--add-allow host:port:METHOD:path_glob` -- `--add-deny host:port:METHOD:path_glob` - -`--add-allow` and `--add-deny` target existing `protocol: rest` endpoints only. `--binary` may be repeated with `--add-endpoint`, and `--rule-name` is allowed only when exactly one `--add-endpoint` is present. - -Each `openshell policy update` invocation is atomic at the revision level: the CLI sends one `merge_operations` batch, the server merges the whole batch into the latest policy, validates the result, and persists at most one new revision. Concurrency is handled with optimistic retries on the `(sandbox_id, version)` uniqueness boundary. 
If another writer wins first, the server refetches the latest policy, reapplies the full batch, revalidates it, and retries. This preserves batch atomicity without serializing all sandbox policy writes behind a sandbox-global mutex. - -The gateway emits per-sandbox OCSF `CONFIG:*` audit lines when incremental merge operations are applied and when draft chunks are approved or removed. These audit lines are streamed through the existing gateway log path, so operators can inspect the exact logical mutation that produced a policy revision without waiting for the sandbox poll loop to reload that revision. - -### Policy Revision Statuses - -| Status | Meaning | -|--------|---------| -| `pending` | Server accepted the update; sandbox has not yet polled and loaded it | -| `loaded` | Sandbox successfully applied this version via `OpaEngine::reload_from_proto()` | -| `failed` | Sandbox attempted to load but validation failed; LKG policy remains active | -| `superseded` | A newer version was persisted before the sandbox loaded this one | - -### Sandbox Poll Loop - -In gRPC mode, the sandbox spawns a background task that periodically polls the gateway for policy updates. See `crates/openshell-sandbox/src/lib.rs` -- `run_policy_poll_loop()`. - -| Parameter | Default | Override | -|-----------|---------|----------| -| Poll interval | 10 seconds | `OPENSHELL_POLICY_POLL_INTERVAL_SECS` environment variable | - -The poll loop: - -1. Connects a reusable gRPC client (`CachedOpenShellClient`) to avoid per-poll TLS handshake overhead. -2. Fetches the current policy via `GetSandboxSettings`, which returns the latest version, its policy payload, and a SHA-256 hash. -3. Compares the returned version against the locally tracked `current_version`. If the server version is not greater, the loop sleeps and retries. -4. 
On a new version, calls `OpaEngine::reload_from_proto()` which builds a complete new `regorus::Engine` through the same validated pipeline as the initial load (proto-to-JSON conversion, L7 validation, access preset expansion). -5. If the new engine builds successfully, it atomically replaces the inner `Mutex` and increments the policy generation. Active L7 keep-alive tunnels close before forwarding another request after they observe the new generation. If reload fails, the previous engine and generation are untouched. -6. Reports success or failure back to the server via `ReportPolicyStatus`. - -See `crates/openshell-sandbox/src/grpc_client.rs` -- `CachedOpenShellClient`. - -### Last-Known-Good (LKG) Behavior - -When a new policy version fails validation during reload, the sandbox keeps the previous policy active. This provides safe rollback semantics: - -- `OpaEngine::reload_from_proto()` constructs a complete new engine via `OpaEngine::from_proto()` before touching the existing one. If `from_proto()` returns an error (L7 validation failures, preset expansion errors, malformed proto data), the existing engine's `Mutex` is never locked for replacement. See `crates/openshell-sandbox/src/opa.rs` -- `reload_from_proto()`. -- The failure error message is reported back to the server via `ReportPolicyStatus` with `PolicyStatus::FAILED` and stored in the `PolicyRecord.load_error` field. -- The CLI's `--wait` flag polls `GetSandboxPolicyStatus` and surfaces the error to the operator. 
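The build-before-swap pattern behind these rollback semantics can be sketched as follows. This is a hypothetical Python mirror of the `reload_from_proto()` flow, not the real API: the engine, holder, and validation check are illustrative names, and the point is only that a failed build never touches the active engine.

```python
class PolicyLoadError(Exception):
    pass

class Engine:
    """Stand-in for the OPA engine; validation happens during construction."""
    def __init__(self, policy: dict):
        # Illustrative validation rule: `rules` and `access` are mutually
        # exclusive on an endpoint, one of the documented failure cases.
        if "rules" in policy and "access" in policy:
            raise PolicyLoadError("rules and access are mutually exclusive")
        self.policy = policy

class PolicyHolder:
    def __init__(self, initial: dict):
        self.engine = Engine(initial)
        self.generation = 0

    def reload(self, new_policy: dict) -> bool:
        try:
            candidate = Engine(new_policy)  # may raise; old engine untouched
        except PolicyLoadError:
            return False                    # report FAILED, keep last-known-good
        self.engine = candidate             # atomic swap in the real code
        self.generation += 1
        return True

holder = PolicyHolder({"access": "read-only"})
assert holder.reload({"rules": [], "access": "full"}) is False  # invalid revision
assert holder.engine.policy == {"access": "read-only"}          # LKG still active
assert holder.reload({"access": "full"}) is True                # valid revision swaps in
```

The ordering is the whole guarantee: construct and validate the replacement completely, and only then replace the active engine and bump the generation.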
- -Failure scenarios that trigger LKG behavior include: - -- L7 validation errors (e.g., `rules` and `access` both set on an endpoint) -- Preset expansion failures (e.g., unknown access preset value) -- Rego rule compilation failures (should not occur with baked-in rules, but guarded against) - -### CLI Commands - -The `openshell policy` subcommand group manages live policy updates through full replacement (`policy set`) and incremental merges (`policy update`): - -```bash -# Merge endpoint/rule changes into the current sandbox policy -openshell policy update \ - --add-endpoint api.github.com:443:read-only:rest:enforce \ - --binary /usr/bin/gh \ - --wait - -# Add a REST allow rule to an existing endpoint -openshell policy update \ - --add-allow api.github.com:443:POST:/repos/*/issues \ - --wait - -# Push a new policy to a running sandbox -openshell policy set --policy updated-policy.yaml - -# Push and wait for the sandbox to load it (with 60s timeout) -openshell policy set --policy updated-policy.yaml --wait - -# Push and wait with a custom timeout -openshell policy set --policy updated-policy.yaml --wait --timeout 120 - -# Set a gateway-global policy (overrides all sandbox policies) -openshell policy set --global --policy policy.yaml --yes - -# Delete the gateway-global policy (restores sandbox-level control) -openshell policy delete --global --yes - -# View the current active policy and its status -openshell policy get - -# Inspect a specific revision -openshell policy get --rev 3 - -# Print the full policy as YAML (round-trips with --policy input format) -openshell policy get --full - -# Combine: inspect a specific revision's full policy -openshell policy get --rev 2 --full - -# List policy revision history -openshell policy list --limit 20 -``` - -#### Global Policy - -The `--global` flag on `policy set`, `policy delete`, `policy list`, and `policy get` manages a gateway-wide policy override. 
When a global policy is set, all sandboxes receive it through `GetSandboxSettings` (with `policy_source: GLOBAL`) instead of their own per-sandbox policy. Global policies are versioned through the `sandbox_policies` table using the sentinel `sandbox_id = "__global__"` and delivered to sandboxes via the reserved `policy` key in the `gateway_settings` blob. - -| Command | Behavior | -|---------|----------| -| `policy set --global --policy FILE` | Creates a versioned revision (marked `loaded` immediately) and stores the policy in the global settings blob. Sandboxes pick it up on their next poll (~10s). Deduplicates against the latest `loaded` revision by hash. | -| `policy delete --global` | Removes the `policy` key from global settings and supersedes all `__global__` revisions. Sandboxes revert to their per-sandbox policy on the next poll. | -| `policy list --global [--limit N]` | Lists global policy revision history (version, hash, status, timestamps). | -| `policy get --global [--rev N] [--full]` | Shows a specific global revision's metadata, or the latest. `--full` includes the full policy as YAML. | - -Both `set` and `delete` require interactive confirmation (or `--yes` to bypass). The `--wait` flag is rejected for global policy updates: `"--wait is not supported for global policies; global policies are effective immediately"`. 
- -When a global policy is active, sandbox-scoped policy mutations are blocked: - -- `policy set ` returns `FailedPrecondition: "policy is managed globally"` -- `policy update ` returns `FailedPrecondition: "policy is managed globally"` -- `rule approve`, `rule approve-all` return `FailedPrecondition: "cannot approve rules while a global policy is active"` -- Revoking a previously approved draft chunk is blocked (it would modify the sandbox policy) -- Rejecting pending chunks is allowed (does not modify the sandbox policy) - -See [Gateway Settings Channel](gateway-settings.md#global-policy-lifecycle) for the full state machine, storage model, and implementation details. - -#### `policy get` flags - -| Flag | Default | Description | -|------|---------|-------------| -| `--rev N` | `0` (latest) | Retrieve a specific policy revision by version number instead of the latest. Maps to the `version` field of `GetSandboxPolicyStatusRequest` -- version `0` resolves to the latest revision server-side. | -| `--full` | off | Print the complete policy as YAML after the metadata summary. The YAML output uses the same schema as the `--policy` input file, so it round-trips: you can save it to a file and pass it back to `nav policy set --policy`. | - -When `--full` is specified, the server includes the deserialized `SandboxPolicy` protobuf in the `SandboxPolicyRevision.policy` field (see `crates/openshell-server/src/grpc.rs` -- `policy_record_to_revision()` with `include_policy: true`). The CLI converts this proto back to YAML via `policy_to_yaml()`, which uses a `BTreeMap` for `network_policies` to produce deterministic key ordering. See `crates/openshell-cli/src/run.rs` -- `policy_to_yaml()`, `policy_get()`. - -See `crates/openshell-cli/src/main.rs` -- `PolicyCommands` enum, `crates/openshell-cli/src/run.rs` -- `policy_update()`, `policy_set()`, `policy_get()`, `policy_list()`. 
- ---- - -## Full YAML Policy Schema - -The YAML data file contains top-level keys that map directly to the OPA data namespace (`data.*`). The following sections document every field. - -### Top-Level Structure - -```yaml -# Required version field -version: 1 - -# Filesystem access policy (applied at startup via Landlock) -filesystem_policy: - include_workdir: true - read_only: [] - read_write: [] - -# Landlock LSM configuration -landlock: - compatibility: best_effort - -# Process privilege configuration -process: - run_as_user: sandbox - run_as_group: sandbox - -# Network policies (evaluated per-CONNECT request via OPA) -network_policies: - policy_name: - name: policy_name - endpoints: [] - binaries: [] - -``` - ---- - -### `filesystem_policy` - -Controls which filesystem paths the sandboxed process can access. Enforced via Linux Landlock LSM at process startup. **Static field** -- immutable after sandbox creation (see [Static vs. Dynamic Fields](#static-vs-dynamic-fields)). - -| Field | Type | Default | Description | -| ----------------- | ---------- | ------- | -------------------------------------------------------------- | -| `include_workdir` | `bool` | `true` | Automatically add the working directory to the read-write list | -| `read_only` | `string[]` | `[]` | Paths accessible in read-only mode | -| `read_write` | `string[]` | `[]` | Paths accessible in read-write mode | - -**Enforcement mapping**: Each path becomes a Landlock `PathBeneath` rule. Read-only paths receive `AccessFs::from_read(ABI::V2)` permissions. Read-write paths receive `AccessFs::from_all(ABI::V2)` permissions (read, write, execute, create, delete, rename). All other paths are denied by the Landlock ruleset. - -**Filesystem preparation**: Before the child process spawns, the supervisor rejects symlinked `read_write` paths, creates any missing `read_write` directories, and sets ownership via `chown()` only on paths it created. Pre-existing image paths keep their existing ownership. 
See `crates/openshell-sandbox/src/lib.rs` -- `prepare_filesystem()`. - -**Working directory**: When `include_workdir` is `true` and a `--workdir` is specified, the working directory path is appended to `read_write` if not already present. See `crates/openshell-sandbox/src/sandbox/linux/landlock.rs` -- `apply()`. - -**TLS directory**: When network proxy mode is active, the directory `/etc/openshell-tls` is automatically appended to `read_only` so sandbox processes can read the ephemeral CA certificate files (used for auto-TLS termination). - -```yaml -filesystem_policy: - include_workdir: true - read_only: - - /usr - - /lib - - /proc - - /dev/urandom - - /app - - /etc - read_write: - - /sandbox - - /tmp - - /dev/null -``` - ---- - -### `landlock` - -Controls Landlock LSM compatibility behavior. **Static field** -- immutable after sandbox creation (see [Static vs. Dynamic Fields](#static-vs-dynamic-fields)). - -| Field | Type | Default | Description | -| --------------- | -------- | --------------- | ------------------------------------- | -| `compatibility` | `string` | `"best_effort"` | How to handle Landlock unavailability | - -**Accepted values**: - -| Value | Behavior | -| ------------------ | --------------------------------------------------------------------------------------------------------------------------- | -| `best_effort` | If Landlock is unavailable (older kernel, unprivileged container), log a warning and continue without filesystem sandboxing. Individual inaccessible paths (missing, permission denied, symlink loops) are skipped with a warning while remaining rules are still applied. If all paths fail, the sandbox continues without Landlock rather than applying an empty ruleset that would block all access. | -| `hard_requirement` | If Landlock is unavailable or any configured path cannot be opened, abort sandbox startup with an error. 
| - -**Per-path error handling**: `PathFd::new()` (which wraps `open(path, O_PATH | O_CLOEXEC)`) can fail for several reasons beyond path non-existence: `EACCES` (permission denied), `ELOOP` (symlink loop), `ENAMETOOLONG`, `ENOTDIR`. Each failure is classified with a human-readable reason in logs. In `best_effort` mode, the path is skipped and ruleset construction continues. In `hard_requirement` mode, the error is fatal. - -**Baseline path filtering**: The enrichment functions (`enrich_proto_baseline_paths`, `enrich_sandbox_baseline_paths`) pre-filter system-injected baseline paths (e.g., `/app`) by checking `Path::exists()` before adding them to the policy. This prevents missing baseline paths from reaching Landlock at all. If a baseline `read_write` path is explicitly configured in `read_only`, enrichment skips the promotion and preserves the stricter policy intent. User-specified paths are not pre-filtered — they are evaluated at Landlock apply time so that misconfigurations surface as warnings (`best_effort`) or errors (`hard_requirement`). - -**Zero-rule safety check**: If all paths in the ruleset fail to open, `apply()` returns an error rather than calling `restrict_self()` on an empty ruleset. An empty Landlock ruleset with `restrict_self()` would block all filesystem access — the inverse of the intended degradation behavior. This error is caught by the outer `BestEffort` handler, which logs a warning and continues without Landlock. - -See `crates/openshell-sandbox/src/sandbox/linux/landlock.rs` -- `compat_level()`, `try_open_path()`, `classify_path_fd_error()`, `classify_io_error()`. - -```yaml -landlock: - compatibility: best_effort -``` - ---- - -### `process` - -Controls privilege dropping for the sandboxed process. **Static field** -- immutable after sandbox creation (see [Static vs. Dynamic Fields](#static-vs-dynamic-fields)). 
- -| Field | Type | Default | Description | -| -------------- | -------- | -------------- | ---------------------------------------- | -| `run_as_user` | `string` | `""` (no drop) | Unix user name to switch to before exec | -| `run_as_group` | `string` | `""` (no drop) | Unix group name to switch to before exec | - -**Enforcement sequence** (in the child process `pre_exec`, before sandbox restrictions are applied): - -1. `initgroups()` -- set supplementary groups for the target user -2. `setgid()` -- switch to the target group -3. Verify `getegid()` matches the target GID (defense-in-depth, CWE-250 / CERT POS37-C) -4. `setuid()` -- switch to the target user -5. Verify `geteuid()` matches the target UID -6. Verify `setuid(0)` fails -- confirms root cannot be re-acquired - -This happens before Landlock and seccomp are applied because `initgroups` needs access to `/etc/group` and `/etc/passwd`, which Landlock may subsequently block. The post-condition checks (steps 3, 5, 6) are async-signal-safe and add negligible overhead while guarding against hypothetical kernel-level defects. See `crates/openshell-sandbox/src/process.rs` -- `drop_privileges()`. - -```yaml -process: - run_as_user: sandbox - run_as_group: sandbox -``` - ---- - -### `network_policies` - -A map of named network policy rules. Each rule defines which binary/endpoint pairs are allowed to make outbound network connections. This is the core of the network access control system. **Dynamic field** -- can be updated on a running sandbox via live policy updates (see [Live Policy Updates](#live-policy-updates)). - -**Behavioral trigger**: The sandbox always starts in **proxy mode** regardless of whether `network_policies` is present. The proxy is required so that all egress can be evaluated by OPA and the virtual hostname `inference.local` is always addressable for inference routing. When `network_policies` is empty, the OPA engine denies all connections. 
- -```yaml -network_policies: - claude_code: # <-- map key (arbitrary identifier) - name: claude_code # <-- human-readable name (used in audit logs) - endpoints: # <-- allowed host:port pairs - - { host: api.anthropic.com, port: 443 } - - { host: "*.anthropic.com", ports: [443, 8443] } # glob host + multi-port - binaries: # <-- allowed binary identities - - { path: /usr/local/bin/claude } -``` - -#### Network Policy Rule - -| Field | Type | Required | Description | -| ----------- | ------------------- | -------- | -------------------------------------------------------------------- | -| `name` | `string` | Yes | Human-readable policy name (appears in proxy log lines as `policy=`) | -| `endpoints` | `NetworkEndpoint[]` | Yes | List of allowed host:port pairs | -| `binaries` | `NetworkBinary[]` | Yes | List of allowed binary identities | - -#### `NetworkEndpoint` - -Each endpoint defines a network destination and, optionally, L7 inspection behavior. - -| Field | Type | Default | Description | -| ------------- | ----------- | --------------- | ------------------------------------------------------------------------------------------------------------------- | -| `host` | `string` | _(required)_ | Hostname or glob pattern to match (case-insensitive). Supports wildcards (`*.example.com`). Optional when `allowed_ips` is set (see [Hostless Endpoints](#hostless-endpoints-allowed_ips-without-host)). See [Host Wildcards](#host-wildcards). | -| `port` | `integer` | _(required)_ | TCP port to match. Mutually exclusive with `ports` — if both are set, `ports` takes precedence. See [Multi-Port Endpoints](#multi-port-endpoints). | -| `ports` | `integer[]`| `[]` | Multiple TCP ports to match. When non-empty, the endpoint covers all listed ports. Backwards compatible with `port`. See [Multi-Port Endpoints](#multi-port-endpoints). 
| -| `path` | `string` | `""` | Optional HTTP path glob for L7 endpoint selection when multiple protocols share a host:port, such as `/repos/**` and `/graphql`. Empty matches all paths. | -| `protocol` | `string` | `""` | Application protocol for L7 inspection. See [Behavioral Trigger: L7 Inspection](#behavioral-trigger-l7-inspection). | -| `tls` | `string` | `""` (auto) | TLS handling mode. Absent or empty: auto-detect and terminate TLS if detected. `"skip"`: bypass TLS detection entirely. `"terminate"` and `"passthrough"` are deprecated (treated as auto). See [Behavioral Trigger: TLS Handling](#behavioral-trigger-tls-handling). | -| `enforcement` | `string` | `"audit"` | L7 enforcement mode: `"enforce"` or `"audit"` | -| `access` | `string` | `""` | Shorthand preset for common L7 rule sets. Mutually exclusive with `rules`. | -| `rules` | `L7Rule[]` | `[]` | Explicit L7 allow rules. Mutually exclusive with `access`. | -| `allowed_ips` | `string[]` | `[]` | IP allowlist for SSRF override. Entries overlapping always-blocked ranges (loopback, link-local, unspecified) are rejected at load time. See [Private IP Access via `allowed_ips`](#private-ip-access-via-allowed_ips). | -| `allow_encoded_slash` | `bool` | `false` | Preserves `%2F` inside L7 request path segments instead of rejecting the request. Required for endpoints such as npm scoped packages. | -| `persisted_queries` | `string` | `"deny"` | GraphQL hash-only/saved-query behavior. Use `"allow_registered"` only with `graphql_persisted_queries`. | -| `graphql_persisted_queries` | `map` | `{}` | Trusted GraphQL persisted-query registry keyed by hash or service-specific ID. Values contain `operation_type`, optional `operation_name`, and optional root `fields`. | -| `graphql_max_body_bytes` | `integer` | `65536` | Maximum GraphQL request body size buffered for inspection. Larger GraphQL bodies are rejected before policy evaluation. 
| - -#### `NetworkBinary` - -| Field | Type | Required | Description | -| ------ | -------- | -------- | ------------------------------------------------------------------ | -| `path` | `string` | Yes | Filesystem path of the binary. Supports glob patterns (`*`, `**`). | - -**Binary identity matching** is evaluated in the Rego rules (`sandbox-policy.rego`) using four strategies, tried in order: - -1. **Direct path match** -- `exec.path == binary.path` -2. **Ancestor match** -- any entry in `exec.ancestors` matches `binary.path` -3. **Cmdline match** -- any entry in `exec.cmdline_paths` matches `binary.path` (for script interpreters -- e.g., `/usr/bin/node` runs `/usr/local/bin/claude`, the exe is `node` but cmdline contains `claude`) -4. **Glob match** -- if `binary.path` contains `*`, all paths (direct, ancestors, cmdline) are tested via `glob.match(pattern, ["/"], path)`. The `*` wildcard does not cross `/` boundaries. Use `**` for recursive matching. - -#### `L7Rule` - -Each rule contains a single `allow` block. Rules are allow-only; anything not explicitly allowed is denied. - -```yaml -rules: - - allow: - method: GET - path: "/repos/**" - query: - per_page: "1*" - - allow: - method: POST - path: "/repos/*/issues" - query: - labels: - any: ["bug*", "p1*"] - - allow: - operation_type: query - fields: [viewer, repository] - - allow: - operation_type: mutation - operation_name: Issue* - fields: [createIssue] -``` - -#### `L7Allow` - -| Field | Type | Description | -| --------- | -------- | ---------------------------------------------------------------------------------------------------------------------------- | -| `method` | `string` | HTTP method: `GET`, `HEAD`, `POST`, `PUT`, `DELETE`, `PATCH`, `OPTIONS`, or `*` (any). Case-insensitive matching. | -| `path` | `string` | URL path glob pattern: `**` matches everything, otherwise `glob.match` with `/` delimiter. | -| `command` | `string` | SQL command: `SELECT`, `INSERT`, `UPDATE`, `DELETE`, or `*` (any). 
Case-insensitive matching. For `protocol: sql` endpoints. | -| `query` | `map` | Optional REST query rules keyed by decoded query param name. Value is either a glob string (for example, `tag: "foo-*"`) or `{ any: ["foo-*", "bar-*"] }`. | -| `operation_type` | `string` | GraphQL operation type: `query`, `mutation`, `subscription`, or `*`. Required for `protocol: graphql` allow rules. | -| `operation_name` | `string` | Optional GraphQL operation-name glob. Omit to match any operation name. | -| `fields` | `string[]` | Optional GraphQL root-field globs. For allow rules, every selected root field must match one configured glob. For deny rules, any matching root field blocks the request. | - -Method, command, and GraphQL operation type fields use `*` as wildcard for "any". Path patterns use `**` for "match everything" and standard glob patterns with `/` as a delimiter otherwise. Query matching is case-sensitive and evaluates decoded values; when duplicate keys are present in the request, every value for that key must match the configured matcher. GraphQL field and operation-name matching also uses glob patterns. See `sandbox-policy.rego` -- `method_matches()`, `path_matches()`, `command_matches()`, `query_params_match()`, and `graphql_*`. - -GraphQL inspection supports `GET` and `POST` GraphQL-over-HTTP envelopes, JSON batches, named-operation selection, fragments at the operation root, Apollo persisted-query hashes, and service-specific saved-query IDs (`id`, `documentId`, or `queryId`). Hash-only or saved-query-only requests have no parseable document, so they are denied unless `persisted_queries: allow_registered` is set and the hash or ID appears in `graphql_persisted_queries`. If a batch contains any denied, malformed, or unregistered operation, the whole request is denied. - -#### Access Presets - -The `access` field provides shorthand for common rule sets. During preprocessing, presets are expanded into explicit `rules` arrays before Rego evaluation. 
- -| Preset | REST expansion | GraphQL expansion | Description | -| ------------ | ------------------------------------------------------------------ | ------------------------------ | ---------------------------------------- | -| `read-only` | `GET/**`, `HEAD/**`, `OPTIONS/**` | `operation_type: query` | Safe read-only access | -| `read-write` | `GET/**`, `HEAD/**`, `OPTIONS/**`, `POST/**`, `PUT/**`, `PATCH/**` | `query`, `mutation` | Read and write but not delete for REST | -| `full` | `*/**` | `operation_type: "*"` | All supported actions | - -See `crates/openshell-sandbox/src/l7/mod.rs` -- `expand_access_presets()`. - -#### Host Wildcards - -The `host` field supports glob patterns for matching multiple subdomains under a common domain. Wildcards use OPA's `glob.match` function with `.` as the delimiter, consistent with TLS certificate wildcard semantics. - -| Pattern | Matches | Does Not Match | -|---------|---------|----------------| -| `*.example.com` | `api.example.com`, `cdn.example.com` | `example.com`, `deep.sub.example.com` | -| `**.example.com` | `api.example.com`, `deep.sub.example.com` | `example.com` | -| `*.EXAMPLE.COM` | `api.example.com` (case-insensitive) | | - -**Wildcard semantics**: - -- `*` matches exactly one DNS label (does not cross `.` boundaries). `*.example.com` matches `api.example.com` but not `deep.sub.example.com`. -- `**` matches across label boundaries. `**.example.com` matches both `api.example.com` and `deep.sub.example.com`. -- Matching is case-insensitive — both the pattern and the incoming hostname are lowercased before comparison. -- The bare domain is never matched. `*.example.com` does not match `example.com` (there must be at least one label before the domain). - -**Validation rules**: - -- **Error**: Bare `*` or `**` (matches all hosts) is rejected. Use a specific pattern like `*.example.com`. -- **Error**: Patterns must start with `*.` or `**.` prefix. Malformed patterns like `*com` are rejected. 
-- **Warning**: Broad patterns like `*.com` (only two labels) trigger a warning about covering all subdomains of a TLD. - -See `crates/openshell-sandbox/src/l7/mod.rs` -- `validate_l7_policies()` for validation, `sandbox-policy.rego` -- `endpoint_allowed` for the Rego glob matching rule. - -**Rego implementation**: The Rego rules detect host wildcards via `contains(endpoint.host, "*")` and dispatch to `glob.match(lower(endpoint.host), ["."], lower(network.host))`. Exact-match hosts use a separate, faster `lower(endpoint.host) == lower(network.host)` rule. See `crates/openshell-sandbox/data/sandbox-policy.rego`. - -**Example**: Allow any subdomain of `example.com` on port 443: - -```yaml -network_policies: - example_wildcard: - name: example_wildcard - endpoints: - - host: "*.example.com" - port: 443 - binaries: - - { path: /usr/bin/curl } -``` - -Host wildcards compose with all other endpoint features — L7 inspection, auto-TLS termination, multi-port, and `allowed_ips`: - -```yaml -network_policies: - wildcard_l7: - name: wildcard_l7 - endpoints: - - host: "*.example.com" - port: 8080 - protocol: rest - enforcement: enforce - rules: - - allow: - method: GET - path: "/api/**" - binaries: - - { path: /usr/bin/curl } -``` - -#### Multi-Port Endpoints - -The `ports` field allows a single endpoint entry to cover multiple TCP ports. This avoids duplicating endpoint definitions that differ only in port number. - -**Normalization**: Both YAML loading paths (file mode and gRPC mode) normalize `port` and `ports` before the data reaches the OPA engine: - -- If `ports` is non-empty, it takes precedence. `port` is ignored. -- If `ports` is empty and `port` is set, the scalar is promoted to `ports: [port]`. -- The scalar `port` field is removed from the JSON fed to OPA. Rego rules always reference `endpoint.ports[_]`. - -This normalization happens in `crates/openshell-sandbox/src/opa.rs` -- `normalize_endpoint_ports()` (YAML path) and `proto_to_opa_data_json()` (proto path). 
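The normalization precedence can be sketched as follows. This is a simplified illustration of the rule described above, not the actual `normalize_endpoint_ports()` implementation; the hypothetical `normalize_ports` function stands in for the real proto/YAML handling:

```rust
// Simplified sketch: `ports` wins when non-empty; otherwise a scalar
// `port` is promoted to a single-element list. (Hypothetical helper,
// not the real normalize_endpoint_ports().)
fn normalize_ports(port: Option<u16>, ports: Vec<u16>) -> Vec<u16> {
    if !ports.is_empty() {
        ports // `ports` takes precedence; scalar `port` is ignored
    } else if let Some(p) = port {
        vec![p] // promote `port: N` to `ports: [N]`
    } else {
        Vec::new()
    }
}

fn main() {
    // Scalar form is promoted.
    assert_eq!(normalize_ports(Some(443), vec![]), vec![443]);
    // Non-empty `ports` wins even when `port` is also set.
    assert_eq!(normalize_ports(Some(443), vec![8080, 8443]), vec![8080, 8443]);
    // Neither set: empty list.
    assert!(normalize_ports(None, vec![]).is_empty());
}
```

After this step the Rego rules only ever see `endpoint.ports[_]`, which is why the scalar field is stripped from the JSON fed to OPA.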
- -**Backwards compatibility**: Existing policies using `port: 443` continue to work without changes. The scalar is silently promoted to `ports: [443]` at load time. - -**YAML serialization**: When serializing policy back to YAML (e.g., `nav policy get --full`), a single-element `ports` array is emitted as the compact `port: N` scalar form. Multi-element arrays are emitted as `ports: [N, M]`. See `crates/openshell-policy/src/lib.rs` -- `from_proto()`. - -**Example**: Allow both standard HTTPS and a custom TLS port: - -```yaml -network_policies: - multi_port: - name: multi_port - endpoints: - - host: api.example.com - ports: - - 443 - - 8443 - binaries: - - { path: /usr/bin/curl } -``` - -This is equivalent to two separate endpoint entries: - -```yaml - endpoints: - - { host: api.example.com, port: 443 } - - { host: api.example.com, port: 8443 } -``` - -Multi-port endpoints compose with host wildcards, L7 rules, and all other endpoint fields: - -```yaml -network_policies: - wildcard_multi_port: - name: wildcard_multi_port - endpoints: - - host: "*.example.com" - ports: [443, 8443] - protocol: rest - enforcement: enforce - access: read-only - binaries: - - { path: /usr/bin/curl } -``` - -Hostless endpoints also support multi-port: - -```yaml -network_policies: - private_multi: - name: private_multi - endpoints: - - ports: [80, 443] - allowed_ips: ["10.0.0.0/8"] - binaries: - - { path: /usr/bin/curl } -``` - ---- - -### Inference Routing - -Inference routing to `inference.local` is handled by the proxy's `InferenceContext`, not by the OPA policy engine or an `inference` block in the policy YAML. The proxy intercepts HTTPS CONNECT requests to `inference.local` and routes matching inference API requests (e.g., `POST /v1/chat/completions`, `POST /v1/messages`) through the sandbox-local `openshell-router`. See [Inference Routing](inference-routing.md) for details on route configuration and the router architecture. 
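As a purely hypothetical illustration of what route selection against intercepted `inference.local` requests involves (the real route format and `InferenceContext` API live in `crates/openshell-sandbox/src/l7/inference.rs`; the `Route` struct, `pick_backend` function, and backend names below are invented for this sketch):

```rust
// Hypothetical sketch: match an intercepted (method, path) pair against
// configured inference routes. Not the real InferenceContext API.
struct Route {
    method: &'static str,
    path: &'static str,
    backend: &'static str, // invented backend labels, for illustration only
}

fn pick_backend(routes: &[Route], method: &str, path: &str) -> Option<&'static str> {
    routes
        .iter()
        .find(|r| r.method == method && r.path == path)
        .map(|r| r.backend)
}

fn main() {
    let routes = [
        Route { method: "POST", path: "/v1/chat/completions", backend: "chat-backend" },
        Route { method: "POST", path: "/v1/messages", backend: "messages-backend" },
    ];
    assert_eq!(pick_backend(&routes, "POST", "/v1/messages"), Some("messages-backend"));
    // Unmatched requests fall outside the inference routing path.
    assert_eq!(pick_backend(&routes, "GET", "/v1/models"), None);
}
```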
- -The proxy always runs in proxy mode so that `inference.local` is addressable from within the sandbox's network namespace. Inference route sources are configured separately from policy: via `--inference-routes` (file mode) or fetched from the gateway's inference bundle (gateway mode). See `crates/openshell-sandbox/src/proxy.rs` -- `InferenceContext`, `crates/openshell-sandbox/src/l7/inference.rs`. - ---- - -## Behavioral Triggers - -Several policy fields trigger fundamentally different enforcement behavior. Understanding these triggers is critical for writing correct policies. - -### Network Mode: Always Proxy - -The sandbox always runs in **proxy mode**. Both file mode and gRPC mode set `NetworkMode::Proxy` unconditionally. This ensures all egress is evaluated by OPA and the virtual hostname `inference.local` is always addressable for inference routing. See `crates/openshell-sandbox/src/lib.rs` -- `load_policy()`, `crates/openshell-sandbox/src/policy.rs` -- `TryFrom`. - -In proxy mode: - -- Seccomp allows `AF_INET` and `AF_INET6` sockets (but blocks `AF_NETLINK`, `AF_PACKET`, `AF_BLUETOOTH`, `AF_VSOCK`). -- An HTTP CONNECT proxy starts, bound to the host side of a veth pair. -- A network namespace with a veth pair isolates the sandbox process. -- `HTTP_PROXY`/`HTTPS_PROXY`/`ALL_PROXY` environment variables are set on the child process. - -When `network_policies` is empty, the OPA engine denies all outbound connections (except `inference.local` which is handled separately by the proxy before OPA evaluation). - -The gateway validates that static fields stay unchanged across live policy updates, then persists a new policy revision for the supervisor to load. Empty and non-empty `network_policies` revisions follow the same live-update path. 
- -**Proxy sub-modes**: In proxy mode, the proxy handles two distinct request types: - -| Client sends | Proxy behavior | Typical use case | -|---|---|---| -| `CONNECT host:port` | CONNECT tunnel (bidirectional TCP relay or L7 inspection) | HTTPS to any destination, HTTP through an opaque tunnel | -| `GET http://host/path HTTP/1.1` (absolute-form) | **Forward proxy** — rewrites to origin-form & relays | Plain HTTP to private IP endpoints | - -See [Behavioral Trigger: Forward Proxy Mode](#behavioral-trigger-forward-proxy-mode) for full details on the forward proxy path. - -```mermaid -flowchart LR - SANDBOX[Sandbox Startup] --> PROXY[Proxy Mode
<br/>Always Active"] - - PROXY --> SECCOMP_ALLOW["seccomp: allow AF_INET + AF_INET6
<br/>block AF_NETLINK, AF_PACKET, etc."] - PROXY --> NETNS["Create network namespace
<br/>veth pair: 10.200.0.1 ↔ 10.200.0.2"] - PROXY --> START_PROXY["Start HTTP proxy
<br/>bound to veth host IP"] - PROXY --> ENVVARS["Set HTTP_PROXY, HTTPS_PROXY,
<br/>ALL_PROXY on child process"] - - START_PROXY --> CONNECT{CONNECT request} - CONNECT -->|inference.local| INFERENCE["InferenceContext:
<br/>route to local backend"] - CONNECT -->|Other host| OPA["OPA evaluation:
network_policies"] -``` - -### Behavioral Trigger: L7 Inspection - -**Trigger**: The `protocol` field on a `NetworkEndpoint`. - -| Condition | Enforcement Layer | Behavior | -| -------------------------- | -------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -| `protocol` absent or empty | **L4 (transport)** | The proxy performs a raw `copy_bidirectional` after the CONNECT handshake. No application-layer inspection occurs. Only the host:port and binary identity are checked. | -| `protocol: rest` | **L7 (application)** | The proxy parses each HTTP/1.1 request within the tunnel, evaluates method+path against the endpoint's `rules`, and either forwards or denies each request individually. | -| `protocol: graphql` | **L7 (application)** | The proxy parses GraphQL-over-HTTP requests, classifies operation type, operation name, root fields, and persisted-query identifiers, then evaluates GraphQL allow and deny rules. | -| `protocol: sql` | **L7 (application, audit-only)** | Reserved for SQL protocol inspection. Currently falls through to passthrough with a warning. `enforcement: enforce` is rejected at validation time for SQL endpoints. | - -This is the single most important behavioral trigger in the policy language. An endpoint with no `protocol` field passes traffic opaquely after the L4 (CONNECT) check. Adding `protocol: rest` or `protocol: graphql` activates per-request HTTP parsing and policy evaluation inside the proxy. - -**Implementation path**: After L4 CONNECT is allowed, the proxy calls `query_l7_route_snapshot()` which evaluates the Rego rule `data.openshell.sandbox._matching_endpoint_configs` and records the policy generation. If one or more endpoint `protocol` configs are returned, the proxy enters path-aware L7 route selection instead of `copy_bidirectional()`. 
See `crates/openshell-sandbox/src/proxy.rs` -- `handle_tcp_connection()`. - -For L7-inspected CONNECT tunnels, the proxy binds endpoint config and the per-tunnel policy engine clone to the policy generation observed at tunnel setup. If a live policy reload advances the generation, the relay closes the existing keep-alive tunnel before forwarding another request. HTTP passthrough tunnels without endpoint `protocol` use the same generation guard for parsed requests even though they do not evaluate L7 OPA rules. Clients should reconnect so the next request is evaluated under the current policy. - -Raw streams are connection-scoped and outside L7 live-reload guarantees. This includes endpoints with `tls: skip`, non-HTTP CONNECT payloads, SQL audit fallback passthrough, HTTP upgrades after `101 Switching Protocols`, and already-forwarded streaming response bodies such as SSE. A policy reload applies to the next connection or next parsed HTTP request; it does not terminate raw bytes already relayed outside the HTTP request parser. - -**Validation requirement**: When `protocol` is set, either `rules` or `access` must also be present. An endpoint with `protocol` but no rules/access is rejected at validation time because it would deny all traffic (no allow rules means nothing matches). See `crates/openshell-sandbox/src/l7/mod.rs` -- `validate_l7_policies()`. - -### Behavioral Trigger: Forward Proxy Mode - -**Trigger**: A non-CONNECT HTTP method with an absolute-form URI (e.g., `GET http://host:port/path HTTP/1.1`). - -When a client sets `HTTP_PROXY` and makes a plain `http://` request, standard HTTP libraries send a **forward proxy request** instead of a CONNECT tunnel. The proxy handles these requests via the forward proxy path rather than the CONNECT path. - -**Security constraint**: Forward proxy mode is restricted to **private IP endpoints** that are explicitly allowed by policy. Plain HTTP traffic never reaches the public internet. 
All three conditions must be true: - -1. OPA policy explicitly allows the destination (`action=allow`) -2. The matched endpoint has `allowed_ips` configured -3. All resolved IP addresses are RFC 1918 private (`10/8`, `172.16/12`, `192.168/16`) - -If any condition fails, the proxy returns `403 Forbidden`. - -| Condition | Forward proxy | CONNECT | -|---|---|---| -| Public IP, no `allowed_ips` | 403 | Allowed (standard SSRF check) | -| Public IP, with `allowed_ips` | 403 (private-IP gate) | Allowed if IP in allowlist | -| Private IP, no `allowed_ips` | 403 | 403 (SSRF block) | -| Private IP, with `allowed_ips` | **Allowed** | Allowed | -| `https://` scheme | 403 (must use CONNECT) | N/A | - -**Request processing**: When a forward proxy request is accepted, the proxy: - -1. Parses the absolute-form URI to extract scheme, host, port, and path (`parse_proxy_uri`) -2. Rejects `https://` — clients must use CONNECT for TLS -3. Evaluates OPA policy (same `evaluate_opa_tcp` as CONNECT) -4. Requires `allowed_ips` on the matched endpoint -5. Resolves DNS and validates all IPs are private and within `allowed_ips` -6. Connects to upstream -7. Rewrites the request: absolute-form → origin-form (`GET /path HTTP/1.1`), strips hop-by-hop headers, adds `Via: 1.1 openshell-sandbox` and `Connection: close` -8. Relays the rewritten request and response through the shared guarded HTTP relay. This reuses the same request body framing, CL/TE rejection, credential rewrite fail-closed behavior, unsolicited `101` blocking, and policy-generation checks as CONNECT L7 HTTP. - -**V1 simplifications**: Forward proxy v1 injects `Connection: close` (no keep-alive). Every forward proxy connection handles exactly one request-response exchange. The request is bound to the policy generation used for the L4 allow decision and is checked again before upstream connect and request forwarding. 
When an endpoint has L7 rules configured, the forward proxy also evaluates the single request's method and path against L7 policy before forwarding. - -**Implementation**: See `crates/openshell-sandbox/src/proxy.rs` -- `handle_forward_proxy()`, `parse_proxy_uri()`, `rewrite_forward_request()`. - -**Logging**: Forward proxy requests are logged distinctly from CONNECT: - -```text -FORWARD method=GET dst_host=10.86.8.223 dst_port=8000 path=/screenshot/ action=allow policy=computer-control -``` - -```mermaid -flowchart TD - A["Non-CONNECT request received
<br/>e.g. GET http://host/path"] --> B["parse_proxy_uri(uri)"] - B --> C{Scheme = http?} - C -- No --> D["403 Forbidden
<br/>(HTTPS must use CONNECT)"] - C -- Yes --> E["OPA policy evaluation"] - E --> F{Allowed?} - F -- No --> G["403 Forbidden"] - F -- Yes --> H{allowed_ips on endpoint?} - H -- No --> I["403 Forbidden
<br/>(forward proxy requires allowed_ips)"] - H -- Yes --> J["resolve_and_check_allowed_ips()"] - J --> K{All IPs private
<br/>AND in allowlist?} - K -- No --> L["403 Forbidden"] - K -- Yes --> M["TCP connect to upstream"] - M --> N["Rewrite request to origin-form
Add Via + Connection: close"] - N --> O["Guarded HTTP relay"] -``` - -#### Example: Forward Proxy Policy - -The same policy that enables CONNECT to a private endpoint also enables forward proxy access. No new policy fields are needed: - -```yaml -network_policies: - computer_control: - name: computer-control - endpoints: - - host: 10.86.8.223 - port: 8000 - allowed_ips: - - "10.86.8.223/32" - binaries: - - { path: /usr/local/bin/python3.13 } -``` - -With this policy, both work: - -```python -# CONNECT tunnel (httpx with HTTPS, or explicit tunnel code) -# Forward proxy (httpx with HTTP_PROXY set for http:// URLs) -import httpx -resp = httpx.get("http://10.86.8.223:8000/screenshot/", - proxy="http://10.200.0.1:3128") -``` - -### Behavioral Trigger: TLS Handling - -**Trigger**: The `tls` field on a `NetworkEndpoint`. - -TLS termination is automatic. The proxy peeks the first bytes of every CONNECT tunnel and terminates TLS whenever a ClientHello is detected. This removes the need for explicit `tls: terminate` in policy — all HTTPS connections are automatically terminated for credential injection and (when configured) L7 inspection. - -| Condition | Behavior | -| ------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `tls` absent or `""` (default) | **Auto-detect**: The proxy peeks the first bytes of the tunnel. If TLS is detected (ClientHello pattern), the proxy terminates TLS transparently (MITM), enabling credential injection and L7 inspection. If plaintext HTTP is detected, the proxy inspects directly. If neither, traffic is relayed raw. | -| `tls: "skip"` | **Explicit opt-out**: No TLS detection, no termination, no credential injection. The tunnel is a raw `copy_bidirectional` relay. 
Use for client-cert mTLS to upstream or non-standard binary protocols. | -| `tls: "terminate"` _(deprecated)_ | Treated as auto-detect. Emits a deprecation warning: "TLS termination is now automatic. Use `tls: skip` to explicitly disable." | -| `tls: "passthrough"` _(deprecated)_ | Treated as auto-detect. Emits the same deprecation warning. | - -**Prerequisites for TLS termination (auto-detect path)**: - -- The sandbox supervisor generates an ephemeral CA at startup (`SandboxCa::generate()`) and writes it to `/etc/openshell-tls/`. -- Trust store environment variables are set on the child process: `NODE_EXTRA_CA_CERTS`, `SSL_CERT_FILE`, `REQUESTS_CA_BUNDLE`, `CURL_CA_BUNDLE`. -- A combined CA bundle (system CAs + sandbox CA) is written to `/etc/openshell-tls/ca-bundle.pem` so `SSL_CERT_FILE` replaces the default trust store while still trusting real CAs. - -**Certificate caching**: Per-hostname leaf certificates are cached (up to 256 entries, then the entire cache is cleared). See `crates/openshell-sandbox/src/l7/tls.rs` -- `CertCache`. - -**Credential injection**: When TLS is auto-terminated but no L7 policy is configured (no `protocol` field), the proxy enters a passthrough relay that rewrites credential placeholders in HTTP headers (via `SecretResolver`) and logs requests for observability, but does not evaluate L7 OPA rules. The relay still closes parsed keep-alive HTTP tunnels after policy generation changes before forwarding another request. This means credential injection works on all HTTPS endpoints automatically. - -**Validation warnings**: - -- `tls: terminate` or `tls: passthrough`: deprecated, emits a warning. -- `tls: skip` with `protocol: rest` on port 443: emits a warning ("L7 inspection cannot work on encrypted traffic"). - -### Behavioral Trigger: Enforcement Mode - -**Trigger**: The `enforcement` field on a `NetworkEndpoint` with L7 inspection enabled. 
- -| Value | Behavior | -| ----------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -| `audit` (default) | L7 rule violations are logged as `l7_decision=audit` but traffic is forwarded to upstream. This is the safe migration path for introducing L7 rules without breaking existing behavior. | -| `enforce` | L7 rule violations result in a `403 Forbidden` JSON response sent to the client. The connection is closed after the deny response. Traffic never reaches upstream. | - -**Enforce-mode deny response format**: - -```json -{ - "error": "policy_denied", - "policy": "internal_api", - "rule": "DELETE /api/v1/data", - "detail": "DELETE /api/v1/data not permitted by policy" -} -``` - -The response includes an `X-OpenShell-Policy` header and `Connection: close`. See `crates/openshell-sandbox/src/l7/rest.rs` -- `send_deny_response()`. - -**SQL restriction**: `protocol: sql` + `enforcement: enforce` is rejected at validation time because full SQL parsing is not available in v1. SQL endpoints must use `enforcement: audit`. - -### Behavioral Trigger: Access Presets vs. Explicit Rules - -**Trigger**: The `access` and `rules` fields on a `NetworkEndpoint`. - -| Condition | Behavior | -| --------------------------------------- | ----------------------------------------------------------------------------------------------------- | -| Neither `access` nor `rules` | Valid only if `protocol` is also absent (L4-only endpoint). If `protocol` is set, validation rejects. | -| `access` only | Expanded to explicit `rules` during preprocessing. | -| `rules` only | Used directly. | -| Both `access` and `rules` | Rejected at validation time ("rules and access are mutually exclusive"). | -| `rules` present but empty (`rules: []`) | Rejected at validation time ("rules list cannot be empty -- would deny all traffic"). 
| - ---- - -## Seccomp Filter Details - -The seccomp filter uses a default-allow policy (`SeccompAction::Allow`) with targeted rules that return `EPERM`. It provides three layers of protection: socket domain blocks, unconditional syscall blocks, and conditional syscall blocks. See `crates/openshell-sandbox/src/sandbox/linux/seccomp.rs`. - -### Blocked socket domains - -Regardless of network mode, certain socket domains are always blocked: - -| Domain | Constant | Reason | -| -------------- | -------- | ------------------------------------------------------------------------------- | -| `AF_NETLINK` | 16 | Prevents manipulation of routing tables, firewall rules, and network interfaces | -| `AF_PACKET` | 17 | Prevents raw packet capture and injection | -| `AF_BLUETOOTH` | 31 | Prevents Bluetooth access | -| `AF_VSOCK` | 40 | Prevents VM socket communication | - -In proxy mode (which is always active), `AF_INET` (2) and `AF_INET6` (10) are allowed so the sandbox process can reach the proxy. 
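The socket-domain gate can be mirrored in plain Rust. This is a behavioral sketch of the table above; the real filter is a seccomp BPF program installed at sandbox startup, not a Rust function:

```rust
// Sketch of the socket-domain policy: default-allow, with the four
// dangerous domains always blocked and AF_INET/AF_INET6 permitted so the
// sandboxed process can reach the proxy.
const AF_INET: u64 = 2;
const AF_INET6: u64 = 10;
const AF_NETLINK: u64 = 16;
const AF_PACKET: u64 = 17;
const AF_BLUETOOTH: u64 = 31;
const AF_VSOCK: u64 = 40;

fn socket_domain_allowed(domain: u64) -> bool {
    match domain {
        AF_NETLINK | AF_PACKET | AF_BLUETOOTH | AF_VSOCK => false, // always blocked (EPERM)
        AF_INET | AF_INET6 => true, // required to reach the proxy veth IP
        _ => true,                  // default-allow policy (e.g. AF_UNIX = 1)
    }
}

fn main() {
    assert!(socket_domain_allowed(AF_INET));
    assert!(!socket_domain_allowed(AF_NETLINK));
    assert!(!socket_domain_allowed(AF_VSOCK));
    assert!(socket_domain_allowed(1)); // AF_UNIX passes under default-allow
}
```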
- -### Blocked syscalls - -These syscalls are blocked unconditionally (EPERM for any invocation): - -| Syscall | NR (x86-64) | Reason | -|---------|-------------|--------| -| `memfd_create` | 319 | Fileless binary execution bypasses Landlock filesystem restrictions | -| `ptrace` | 101 | Cross-process memory inspection and code injection | -| `bpf` | 321 | Kernel BPF program loading | -| `process_vm_readv` | 310 | Cross-process memory read | -| `io_uring_setup` | 425 | Async I/O subsystem with extensive CVE history | -| `mount` | 165 | Filesystem mount could subvert Landlock or overlay writable paths | - -### Conditionally blocked syscalls - -These syscalls are blocked only when specific flag patterns are present in their arguments: - -| Syscall | NR (x86-64) | Condition | Reason | -|---------|-------------|-----------|--------| -| `execveat` | 322 | `AT_EMPTY_PATH` (0x1000) set in flags (arg4) | Fileless execution from an anonymous fd | -| `unshare` | 272 | `CLONE_NEWUSER` (0x10000000) set in flags (arg0) | User namespace creation enables privilege escalation | -| `seccomp` | 317 | operation == `SECCOMP_SET_MODE_FILTER` (1) in arg0 | Prevents sandboxed code from replacing the active filter | - -Flag checks use `MaskedEq` (`(arg & mask) == mask`) to detect the flag bit regardless of other bits. The `seccomp` syscall check uses `Eq` for exact value comparison on the operation argument. 
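The `MaskedEq` semantics described above reduce to a one-line predicate. A minimal sketch, using the flag constants from the table (the real checks are expressed as seccomp filter rules, not Rust code):

```rust
// MaskedEq: `(arg & mask) == mask` detects a flag bit regardless of
// whatever other bits are also set in the syscall argument.
const AT_EMPTY_PATH: u64 = 0x1000; // execveat flags (arg4)
const CLONE_NEWUSER: u64 = 0x1000_0000; // unshare flags (arg0)

fn masked_eq(arg: u64, mask: u64) -> bool {
    (arg & mask) == mask
}

fn main() {
    // The blocked flag is caught alone or combined with other bits.
    assert!(masked_eq(AT_EMPTY_PATH, AT_EMPTY_PATH));
    assert!(masked_eq(AT_EMPTY_PATH | 0x100, AT_EMPTY_PATH));
    assert!(masked_eq(CLONE_NEWUSER | 0x2000_0000, CLONE_NEWUSER));
    // Arguments without the blocked bit pass through.
    assert!(!masked_eq(0x100, AT_EMPTY_PATH));
}
```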
- ---- - -## Network Namespace Isolation - -When proxy mode is active (on Linux), the sandbox creates an isolated network namespace: - -| Component | Value | Description | -| --------------- | ----------------- | ---------------------------------------------- | -| Namespace name | `sandbox-{uuid8}` | 8-character UUID prefix | -| Host veth IP | `10.200.0.1/24` | Proxy binds here | -| Sandbox veth IP | `10.200.0.2/24` | Child process operates here | -| Default route | via `10.200.0.1` | All sandbox traffic goes through the host veth | -| Proxy port | `3128` (default) | Configurable | - -The child process enters the namespace via `setns(fd, CLONE_NEWNET)` in `pre_exec`. This provides hard network isolation -- even if a process ignores proxy environment variables, it can only reach the host veth IP, where the proxy listens. See `crates/openshell-sandbox/src/sandbox/linux/netns.rs`. - ---- - -## Identity Binding - -The proxy identifies which binary initiated each CONNECT request using Linux `/proc` introspection: - -1. **Socket lookup**: `/proc/net/tcp` maps the client's source port to an inode, then scans `/proc/{pid}/fd/` under the entrypoint process tree to find which PID owns that socket. -2. **Binary resolution**: `/proc/{pid}/exe` resolves the actual binary path. -3. **Ancestor walk**: `/proc/{pid}/status` PPid field is followed upward to build the ancestor binary chain. -4. **Cmdline extraction**: `/proc/{pid}/cmdline` is parsed for absolute paths to capture script names (e.g., when `node` runs `/usr/local/bin/claude`). -5. **TOFU verification**: SHA256 hash of each binary is computed on first use and cached. Subsequent requests from the same binary path must match the cached hash. A mismatch (binary replaced mid-sandbox) triggers an immediate deny. - -See `crates/openshell-sandbox/src/procfs.rs`, `crates/openshell-sandbox/src/identity.rs`. 
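The TOFU step can be sketched as a hash-pinning cache. This is a simplified illustration of the behavior described above, with a stand-in `current_hash` string where the real implementation computes SHA256 of the binary on first use:

```rust
use std::collections::HashMap;

// Sketch of trust-on-first-use hash pinning: the first hash seen for a
// binary path is cached; any later mismatch (binary replaced mid-sandbox)
// is an immediate deny.
struct TofuCache {
    pinned: HashMap<String, String>, // binary path -> first-seen hash
}

impl TofuCache {
    fn verify(&mut self, path: &str, current_hash: &str) -> bool {
        match self.pinned.get(path) {
            Some(pinned) => pinned.as_str() == current_hash, // mismatch => deny
            None => {
                // First use: pin the hash for the rest of the sandbox lifetime.
                self.pinned.insert(path.to_string(), current_hash.to_string());
                true
            }
        }
    }
}

fn main() {
    let mut cache = TofuCache { pinned: HashMap::new() };
    assert!(cache.verify("/usr/bin/curl", "abc123")); // first use: pinned
    assert!(cache.verify("/usr/bin/curl", "abc123")); // same hash: allowed
    assert!(!cache.verify("/usr/bin/curl", "def456")); // replaced binary: denied
}
```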
- ---- - -## L7 Request Evaluation Flow - -When an endpoint has L7 inspection enabled, each HTTP request within the CONNECT tunnel follows this evaluation path: - -```mermaid -sequenceDiagram - participant Client as Sandbox Process - participant Proxy as CONNECT Proxy - participant OPA as OPA Engine (Rego) - participant Upstream as Remote Server - - Client->>Proxy: HTTP CONNECT host:port - Note over Proxy: L4 evaluation: identity + endpoint check - Proxy->>OPA: evaluate_network(host, port, binary, ancestors, cmdline) - OPA-->>Proxy: allowed=true, matched_policy="api_policy" - Proxy-->>Client: 200 Connection Established - - Note over Proxy: Query L7 config and generation for matched endpoint - Proxy->>OPA: query_endpoint_config_with_generation(host, port, binary) - OPA-->>Proxy: {protocol: rest, enforcement: enforce}, generation=N - Proxy->>OPA: clone_engine_for_tunnel(generation=N) - OPA-->>Proxy: generation-bound tunnel evaluator - - Note over Proxy: Auto-detect TLS (peek first bytes) - Note over Proxy: TLS ClientHello detected → terminate - Client->>Proxy: TLS ClientHello - Proxy-->>Client: TLS ServerHello (ephemeral cert for host) - Note over Proxy: Decrypt client traffic - - Proxy->>Upstream: TLS ClientHello (real cert verification) - Upstream-->>Proxy: TLS ServerHello - Note over Proxy: Encrypt to upstream - - loop Per HTTP request in tunnel - Client->>Proxy: GET /repos/myorg/foo HTTP/1.1 - Note over Proxy: Close if policy generation changed - Note over Proxy: Parse HTTP request line + headers - Note over Proxy: Close if policy generation changed - Proxy->>OPA: allow_request(method=GET, path=/repos/myorg/foo) - OPA-->>Proxy: allowed=true - Note over Proxy: Close if policy generation changed - Proxy->>Upstream: GET /repos/myorg/foo HTTP/1.1 - Upstream-->>Proxy: 200 OK (response body) - Proxy-->>Client: 200 OK (response body) - end - - Client->>Proxy: DELETE /repos/myorg/foo HTTP/1.1 - Proxy->>OPA: allow_request(method=DELETE, path=/repos/myorg/foo) - 
OPA-->>Proxy: allowed=false, reason="DELETE /repos/myorg/foo not permitted" - alt enforcement: enforce - Proxy-->>Client: 403 Forbidden (JSON body) - Note over Proxy: Connection closed - else enforcement: audit - Note over Proxy: Log: l7_decision=audit - Proxy->>Upstream: DELETE /repos/myorg/foo HTTP/1.1 - Upstream-->>Proxy: 200 OK - Proxy-->>Client: 200 OK - end -``` - ---- - -## Validation Rules - -The following validation rules are enforced during policy loading (both file mode and gRPC mode). Errors prevent sandbox startup; warnings are logged but do not block. - -### Errors (Block Startup) - -| Condition | Error Message | -| ---------------------------------------------- | ------------------------------------------------------------------------------------------ | -| Both `rules` and `access` on the same endpoint | `rules and access are mutually exclusive` | -| `protocol` set without `rules` or `access` | `protocol requires rules or access to define allowed traffic` | -| `protocol: sql` with `enforcement: enforce` | `SQL enforcement requires full SQL parsing (not available in v1). Use enforcement: audit.` | -| `rules: []` (empty list) | `rules list cannot be empty (would deny all traffic). Use access: full or remove rules.` | -| Host wildcard is bare `*` or `**` | `host wildcard '*' matches all hosts; use specific patterns like '*.example.com'` | -| Host wildcard does not start with `*.` or `**.`| `host wildcard must start with '*.' or '**.' (e.g., '*.example.com'), got '{host}'` | -| `allowed_ips` entry overlaps always-blocked range | `allowed_ips entry {entry} falls within always-blocked range (loopback/link-local/unspecified)` | -| Invalid HTTP method in REST rules | _(warning, not error)_ | - -### Errors (Live Update Rejection) - -These errors are returned by the gateway's `UpdateSandboxPolicy` handler and reject the update before it is persisted. See `crates/openshell-server/src/grpc.rs`. 
- -| Condition | Error Message | -|-----------|---------------| -| `filesystem_policy` differs from version 1 | `filesystem policy cannot be changed on a live sandbox (applied at startup)` | -| `landlock` differs from version 1 | `landlock policy cannot be changed on a live sandbox (applied at startup)` | -| `process` differs from version 1 | `process policy cannot be changed on a live sandbox (applied at startup)` | - -### Errors (Rule Merge Rejection) - -These errors are returned by the gateway's `merge_chunk_into_policy` when approving proposed rules. See `crates/openshell-server/src/grpc/policy.rs` -- `validate_rule_not_always_blocked()`. - -| Condition | Error Message | -|-----------|---------------| -| Proposed endpoint host is a literal always-blocked IP | `proposed rule endpoint host '{host}' is an always-blocked address (loopback/link-local/unspecified)` | -| Proposed endpoint host is `localhost` | `proposed rule endpoint host 'localhost' is always blocked` | -| Proposed `allowed_ips` entry overlaps always-blocked range | `proposed rule contains always-blocked allowed_ips entry '{entry}'` | - -### Warnings (Log Only) - -| Condition | Warning Message | -| ---------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------- | -| `tls: terminate` or `tls: passthrough` on any endpoint | `'tls: {value}' is deprecated; TLS termination is now automatic. Use 'tls: skip' to disable.` | -| `tls: skip` with L7 rules on port 443 | `'tls: skip' with L7 rules on port 443 — L7 inspection cannot work on encrypted traffic` | -| Host wildcard with ≤2 labels (e.g., `*.com`) | `host wildcard '*.com' is very broad (covers all subdomains of a TLD)` | -| Unknown HTTP method in rules (not GET/HEAD/POST/PUT/DELETE/PATCH/OPTIONS/\*) | `Unknown HTTP method '{method}'. 
Standard methods: GET, HEAD, POST, PUT, DELETE, PATCH, OPTIONS.` | - -See `crates/openshell-sandbox/src/l7/mod.rs` -- `validate_l7_policies()`. - ---- - -## SSRF Protection (Internal IP Rejection) - -As a defense-in-depth measure, the proxy resolves DNS before connecting to upstream hosts and rejects any connection where the resolved IP address falls within an internal range. This prevents Server-Side Request Forgery (SSRF) attacks where a misconfigured or overly permissive OPA policy could allow connections to infrastructure endpoints such as cloud metadata services (`169.254.169.254`), localhost, or RFC 1918 private addresses. - -The check runs after OPA policy allows the connection but before the TCP connection to the upstream is established. Even if an attacker controls a DNS record that maps an allowed hostname to an internal IP, the proxy blocks the connection. - -### Always-Blocked IP Ranges - -These IP ranges are **always blocked**, even when `allowed_ips` is configured on an endpoint: - -| Range | Description | Reason | -|-------|-------------|--------| -| `127.0.0.0/8` | IPv4 loopback | Prevents proxy bypass via localhost | -| `169.254.0.0/16` | IPv4 link-local | Prevents cloud metadata SSRF (`169.254.169.254`) | -| `0.0.0.0` | IPv4 unspecified | Prevents binding/connecting to all interfaces | -| `::1` | IPv6 loopback | Prevents proxy bypass via IPv6 localhost | -| `::` | IPv6 unspecified | Prevents binding/connecting to all interfaces | -| `fe80::/10` | IPv6 link-local | Prevents IPv6 link-local access | -| `::ffff:0:0/96` (mapped) | IPv4-mapped IPv6 addresses are unwrapped and checked as IPv4 | | - -These ranges are enforced at multiple layers: load-time validation rejects `allowed_ips` entries that overlap these ranges (see [`parse_allowed_ips`](#implementation)), the server rejects proposed rules targeting them (see [Server-Side Defense-in-Depth](#server-side-defense-in-depth)), and the proxy runtime blocks resolved IPs that fall within them. 
- -### Default-Blocked IP Ranges (Private) - -These ranges are blocked by default but can be selectively allowed via the `allowed_ips` field on an endpoint: - -| Range | Description | -|-------|-------------| -| `10.0.0.0/8` | RFC 1918 private (Class A) | -| `172.16.0.0/12` | RFC 1918 private (Class B) | -| `192.168.0.0/16` | RFC 1918 private (Class C) | -| `fc00::/7` | IPv6 Unique Local Address (ULA) private space | - -### Implementation - -IP classification helpers live in `crates/openshell-core/src/net.rs` and are shared across the sandbox proxy, the mechanistic mapper, and the gateway server: - -- **`is_always_blocked_ip(ip: IpAddr) -> bool`**: Checks if an IP is always blocked regardless of policy — loopback (`127.0.0.0/8`), link-local (`169.254.0.0/16`), and unspecified (`0.0.0.0`). For IPv6, unwraps IPv4-mapped addresses (`::ffff:x.x.x.x`) via `to_ipv4_mapped()` and applies IPv4 checks. Used in the `allowed_ips` code path and by `implicit_allowed_ips_for_ip_host` to enforce the hard block even when private IPs are permitted. - -- **`is_always_blocked_net(net: IpNet) -> bool`**: Checks if a CIDR network overlaps any always-blocked range. Returns `true` if the network contains or overlaps loopback, link-local, or unspecified addresses. A CIDR like `0.0.0.0/0` is rejected because it contains always-blocked addresses. Used at policy load time by `parse_allowed_ips` and at server-side approval time by `validate_rule_not_always_blocked`. - -- **`is_internal_ip(ip: IpAddr) -> bool`**: Classifies an IP address as internal or public. Broader than `is_always_blocked_ip` — also includes RFC 1918 private ranges (`10/8`, `172.16/12`, `192.168/16`) and IPv6 ULA (`fc00::/7`). Used in the default (no `allowed_ips`) SSRF code path and by the mechanistic mapper to detect when `allowed_ips` should be populated in proposals. 
- -Runtime resolution and enforcement functions remain in `crates/openshell-sandbox/src/proxy.rs`: - -- **`resolve_and_reject_internal(host, port, entrypoint_pid) -> Result<Vec<SocketAddr>, String>`**: Default SSRF check. Resolves the host using the sandbox's `/etc/hosts` first on Linux (via `/proc/<pid>/root/etc/hosts`, which captures Kubernetes `hostAliases`), then falls back to `tokio::net::lookup_host()`. It checks every resolved address against `is_internal_ip()`. If any address is internal, the entire connection is rejected. - -- **`resolve_and_check_allowed_ips(host, port, allowed_ips, entrypoint_pid) -> Result<Vec<SocketAddr>, String>`**: Allowlist-based SSRF check. Resolves the host using the sandbox's `/etc/hosts` first on Linux, rejects any always-blocked IPs, then verifies every resolved address matches at least one entry in the `allowed_ips` list. - -- **`parse_allowed_ips(raw) -> Result<Vec<IpNet>, String>`**: Parses CIDR/IP strings into typed `IpNet` values. **Rejects entries at load time** that overlap always-blocked ranges (loopback, link-local, unspecified) via `is_always_blocked_net`. Accepts both CIDR notation (`10.0.5.0/24`) and bare IPs (`10.0.5.20`, treated as `/32`). This prevents confusing UX where an entry is accepted in policy but silently denied at runtime. - -- **`implicit_allowed_ips_for_ip_host(host) -> Vec<IpNet>`**: When a policy endpoint has a literal IP address as its host (e.g., `10.0.5.20`), synthesizes an `allowed_ips` entry so the allowlist-validation path is used instead of blanket internal-IP rejection. **Skips always-blocked addresses** — if the host is loopback, link-local, or unspecified, returns empty and logs a warning instead of synthesizing an un-enforceable entry. - -### Server-Side Defense-in-Depth - -The gateway server provides an additional validation layer when merging proposed rules into a sandbox's active policy. 
Before `merge_chunk_into_policy` applies a proposed rule, it calls `validate_rule_not_always_blocked` (in `crates/openshell-server/src/grpc/policy.rs`) which: - -1. Checks if the proposed endpoint host is a literal always-blocked IP (via `is_always_blocked_ip`) or `localhost`. -2. Checks each `allowed_ips` entry for overlap with always-blocked ranges (via `is_always_blocked_net`). - -If either check fails, the merge returns `INVALID_ARGUMENT` and the proposed rule is not applied. This prevents always-blocked destinations from entering the active policy even if the sandbox's mechanistic mapper or an older sandbox version did not filter them. - -### Placement in Proxy Flow - -The SSRF check applies to both CONNECT and forward proxy requests. For forward proxy, an additional private-IP gate requires all resolved IPs to be RFC 1918 private. - -```mermaid -flowchart TD - A["Request received"] --> B{CONNECT?} - B -- Yes --> INF{inference.local?} - INF -- Yes --> C["InferenceContext: route locally"] - INF -- No --> D[OPA policy evaluation] - B -- No --> FP["Forward proxy path
(see Forward Proxy Mode)"] - D --> E{Allowed?} - E -- No --> F["403 Forbidden"] - E -- Yes --> G{allowed_ips on endpoint?} - G -- Yes --> VAL["parse_allowed_ips:
validate no always-blocked entries"] - VAL --> VAL_OK{Valid?} - VAL_OK -- No --> J2["Connection rejected
(always-blocked entry in allowed_ips)"] - VAL_OK -- Yes --> H["resolve_and_check_allowed_ips(host, port, nets)"] - H --> I{All IPs in allowlist
and not always-blocked?} - I -- No --> J["403 Forbidden + log warning"] - I -- Yes --> K["TcpStream::connect(resolved addrs)"] - G -- No --> L["resolve_and_reject_internal(host, port)"] - L --> M{All IPs public?} - M -- No --> J - M -- Yes --> K - K --> N["200 Connection Established"] - - FP --> FP_OPA["OPA evaluation + require allowed_ips"] - FP_OPA --> FP_VAL["parse_allowed_ips: validate"] - FP_VAL --> FP_RESOLVE["resolve_and_check_allowed_ips"] - FP_RESOLVE --> FP_PRIVATE{All IPs private?} - FP_PRIVATE -- No --> J - FP_PRIVATE -- Yes --> FP_CONNECT["TCP connect + rewrite + relay"] -``` - -### Private IP Access via `allowed_ips` - -The `allowed_ips` field on a `NetworkEndpoint` enables controlled access to private IP space. When present, the default SSRF internal-IP rejection is replaced by an allowlist check: resolved IPs must match at least one entry in `allowed_ips`, and always-blocked ranges (loopback, link-local, unspecified) are still rejected. - -**Load-time validation**: `parse_allowed_ips` rejects entries that overlap always-blocked ranges with a hard error at policy load time. This catches misconfigurations early — an entry like `127.0.0.0/8` or `0.0.0.0/0` in `allowed_ips` would be silently un-enforceable at runtime, so it is rejected before the policy is applied. The same validation runs in both file mode (sandbox startup) and gRPC mode (live policy updates via `OpaEngine::reload_from_proto`). - -**Sandbox `/etc/hosts` and `hostAliases`**: On Linux, the proxy consults the sandbox's `/etc/hosts` before falling back to DNS. This gives policy evaluation the same hostname-to-IP view that sandboxed tools see when Kubernetes `hostAliases` populate `/etc/hosts`. This is resolver input only. It does **not** synthesize `allowed_ips`, and it does not bypass the private-IP SSRF check. If `searxng.local` resolves to `192.168.1.105`, the request still needs `allowed_ips: ["192.168.1.105/32"]` to succeed. 
- -**Implicit `allowed_ips` for IP hosts**: When a policy endpoint has a literal IP address as its host (e.g., `host: 10.0.5.20`), the proxy synthesizes an `allowed_ips` entry automatically via `implicit_allowed_ips_for_ip_host`. If the host is an always-blocked address (e.g., `127.0.0.1`, `169.254.169.254`, `0.0.0.0`), the function returns empty and logs a warning — no `allowed_ips` entry is synthesized, so the standard SSRF rejection applies. - -This supports three usage modes: - -| Mode | Endpoint Configuration | Behavior | -|------|----------------------|----------| -| **Default** | `host` only, no `allowed_ips` | Standard SSRF protection: all private IPs blocked | -| **Host + allowlist** | `host` + `allowed_ips` | Domain must match `host` AND resolve to an IP in `allowed_ips` | -| **Hostless allowlist** | `allowed_ips` only (no `host`) | Any domain allowed on the specified `port`, as long as it resolves to an IP in `allowed_ips` | - -Example: - -```yaml -network_policies: - websearch: - name: websearch - endpoints: - - host: searxng.local - port: 8080 - allowed_ips: - - "192.168.1.105/32" - binaries: - - path: /usr/bin/curl -``` - -With a matching sandbox `/etc/hosts` entry such as `192.168.1.105 searxng.local`, this policy works. Without the `allowed_ips` entry, the request stays blocked because the resolved destination is private. - -#### `allowed_ips` Format - -Entries can be: - -- **CIDR notation**: `10.0.5.0/24`, `172.16.0.0/12`, `192.168.1.0/24` -- **Exact IP**: `10.0.5.20` (treated as `/32` for IPv4 or `/128` for IPv6) - -Entries that overlap always-blocked ranges — loopback (`127.0.0.0/8`), link-local (`169.254.0.0/16`), or unspecified (`0.0.0.0`) — are rejected at load time with a hard error. Broad CIDRs that contain always-blocked addresses (e.g., `0.0.0.0/0`) are also rejected. - -#### Hostless Endpoints (`allowed_ips` without `host`) - -When an endpoint has `allowed_ips` but no `host`, it matches **any hostname** on the specified port. 
This is useful for allowing access to a range of internal services without enumerating every hostname. The resolved IP must still fall within the allowlist. - -**OPA behavior**: The Rego `endpoint_allowed` rule has a clause that matches hostless endpoints by port only. The `matched_endpoint_config` rule includes these endpoints via `endpoint_has_extended_config`. Well-authored policies should ensure hostless endpoints use different ports than host-based endpoints to avoid OPA complete-rule conflicts. - -#### Example: Host + Allowlist (Mode 2) - -```yaml -network_policies: - internal_api: - name: internal_api - endpoints: - - host: api.internal.corp - port: 8080 - allowed_ips: - - "10.0.5.0/24" - binaries: - - { path: /usr/bin/curl } -``` - -The sandbox can connect to `api.internal.corp:8080` only if DNS resolves to an IP within `10.0.5.0/24`. Connections to any other host, or to `api.internal.corp` resolving outside the allowlist, are blocked. - -#### Example: Hostless Allowlist (Mode 3) - -```yaml -network_policies: - private_network: - name: private_network - endpoints: - - port: 8080 - allowed_ips: - - "10.0.5.0/24" - - "10.0.6.0/24" - binaries: - - { path: /usr/bin/curl } -``` - -Any hostname on port 8080 is allowed, provided DNS resolves to an IP within `10.0.5.0/24` or `10.0.6.0/24`. This allows access to multiple internal services without listing each hostname. - -### DNS Resolution Failure - -If DNS resolution fails (no addresses returned or lookup error), the connection is rejected with a descriptive error. This prevents connections to hosts that cannot be validated. - ---- - -## Complete Example: Mixed L4 and L7 Policy - -This example demonstrates all policy features in a single file. 
- -```yaml -version: 1 - -filesystem_policy: - include_workdir: true - read_only: - - /usr - - /lib - - /proc - - /dev/urandom - - /app - - /etc - read_write: - - /sandbox - - /tmp - - /dev/null - -landlock: - compatibility: best_effort - -process: - run_as_user: sandbox - run_as_group: sandbox - -network_policies: - # L4-only: Claude Code can reach Anthropic APIs (no L7 inspection) - claude_code: - name: claude_code - endpoints: - - { host: api.anthropic.com, port: 443 } - - { host: statsig.anthropic.com, port: 443 } - - { host: sentry.io, port: 443 } - binaries: - - { path: /usr/local/bin/claude } - - # L7 + auto-TLS: Full access with HTTPS inspection (TLS terminated automatically) - claude_code_inspected: - name: claude_code_inspected - endpoints: - - host: api.anthropic.com - port: 443 - protocol: rest - enforcement: enforce - access: full - binaries: - - { path: /usr/local/bin/claude } - - # L7 with access preset: Read-only API access (GET, HEAD, OPTIONS) - github_readonly: - name: github_readonly - endpoints: - - host: api.github.com - port: 8080 - protocol: rest - enforcement: audit - access: read-only - binaries: - - { path: /usr/bin/curl } - - # L7 with explicit rules: Fine-grained method+path control - internal_api: - name: internal_api - endpoints: - - host: api.internal.svc - port: 8080 - protocol: rest - enforcement: enforce - rules: - - allow: - method: GET - path: "/api/v1/**" - - allow: - method: POST - path: "/api/v1/data" - binaries: - - { path: /usr/bin/curl } - - # L4-only: Git operations via glab CLI - gitlab: - name: gitlab - endpoints: - - { host: gitlab.com, port: 443 } - binaries: - - { path: /usr/bin/glab } - - # Glob binary pattern: Any binary in /usr/bin/ can reach this endpoint - monitoring: - name: monitoring - endpoints: - - { host: metrics.internal, port: 9090 } - binaries: - - { path: "/usr/bin/*" } - - # Private IP access: host + allowed_ips (SSRF allowlist) - internal_database: - name: internal_database - endpoints: - - host: 
db.internal.corp - port: 5432 - allowed_ips: - - "10.0.5.0/24" - binaries: - - { path: /usr/bin/curl } - - # Hostless private IP access: any hostname on port 8080 within the allowlist - private_services: - name: private_services - endpoints: - - port: 8080 - allowed_ips: - - "10.0.5.0/24" - - "10.0.6.0/24" - binaries: - - { path: /usr/bin/curl } - - # Host wildcard: allow any subdomain of example.com on dual ports - example_apis: - name: example_apis - endpoints: - - host: "*.example.com" - ports: - - 443 - - 8443 - binaries: - - { path: /usr/bin/curl } - - # Multi-port with L7: same L7 rules applied across two ports (TLS auto-terminated) - multi_port_l7: - name: multi_port_l7 - endpoints: - - host: api.internal.svc - ports: [8080, 9090] - protocol: rest - enforcement: enforce - access: read-only - binaries: - - { path: /usr/bin/curl } - - # Forward proxy + CONNECT: private service accessible via plain HTTP or tunnel - # With allowed_ips set and the destination being a private IP, both - # `http://10.86.8.223:8000/path` (forward proxy) and - # `CONNECT 10.86.8.223:8000` (tunnel) work. 
- computer_control: - name: computer-control - endpoints: - - host: 10.86.8.223 - port: 8000 - allowed_ips: - - "10.86.8.223/32" - binaries: - - { path: /usr/local/bin/python3.13 } - -``` - --- - -## Proto-to-YAML Field Mapping - -When the gateway delivers policy via gRPC, the protobuf `SandboxPolicy` message fields map to YAML keys as follows: - -| Proto Message | Proto Field | YAML Key | -| ------------------- | ------------------------------------------------------------------- | ------------------------------------------- | -| `SandboxPolicy` | `filesystem` | `filesystem_policy` | -| `SandboxPolicy` | `landlock` | `landlock` | -| `SandboxPolicy` | `process` | `process` | -| `SandboxPolicy` | `network_policies` | `network_policies` | -| `FilesystemPolicy` | `include_workdir` | `filesystem_policy.include_workdir` | -| `FilesystemPolicy` | `read_only` | `filesystem_policy.read_only` | -| `FilesystemPolicy` | `read_write` | `filesystem_policy.read_write` | -| `LandlockPolicy` | `compatibility` | `landlock.compatibility` | -| `ProcessPolicy` | `run_as_user` | `process.run_as_user` | -| `ProcessPolicy` | `run_as_group` | `process.run_as_group` | -| `NetworkPolicyRule` | `name` | `network_policies.<name>.name` | -| `NetworkPolicyRule` | `endpoints` | `network_policies.<name>.endpoints` | -| `NetworkPolicyRule` | `binaries` | `network_policies.<name>.binaries` | -| `NetworkEndpoint` | `host`, `port`, `ports`, `protocol`, `tls`, `enforcement`, `access`, `rules`, `allowed_ips` | Same field names. `port`/`ports` normalized during loading (see [Multi-Port Endpoints](#multi-port-endpoints)). | -| `L7Rule` | `allow` | `rules[].allow` | -| `L7Allow` | `method`, `path`, `command` | `rules[].allow.method`, `.path`, `.command` | - -The conversion is performed in `crates/openshell-sandbox/src/opa.rs` -- `proto_to_opa_data_json()`. 
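As a concrete illustration of the mapping, a `SandboxPolicy` message whose `filesystem` and `process` fields are set corresponds to YAML of the following shape (values are examples only, not defaults):

```yaml
filesystem_policy:        # SandboxPolicy.filesystem
  include_workdir: true   # FilesystemPolicy.include_workdir
  read_only:              # FilesystemPolicy.read_only
    - /usr
    - /etc
  read_write:             # FilesystemPolicy.read_write
    - /sandbox
    - /tmp
process:                  # SandboxPolicy.process
  run_as_user: sandbox    # ProcessPolicy.run_as_user
  run_as_group: sandbox   # ProcessPolicy.run_as_group
```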
- ---- - -## Enforcement Application Order - -The sandbox supervisor applies enforcement mechanisms in a specific order during the child process `pre_exec` (after `fork()`, before `exec()`): - -1. **Network namespace entry** -- `setns(fd, CLONE_NEWNET)` places the child in the isolated namespace -2. **Privilege drop** -- `initgroups()` + `setgid()` + `setuid()` switch to the sandbox user -3. **Landlock** -- Filesystem access rules are applied -4. **Seccomp** -- Socket domain restrictions are applied - -This ordering is intentional: privilege dropping needs `/etc/group` and `/etc/passwd` access, which Landlock may subsequently restrict. Network namespace entry must happen before any network operations. See `crates/openshell-sandbox/src/process.rs` -- `spawn_impl()`. - ---- - -## Rego Rule Architecture - -The OPA engine evaluates two categories of rules: - -### L4 Rules (per-connection) - -Evaluated on every CONNECT request and every forward proxy request. The same OPA input is used in both cases. 
- -| Rule | Signature | Returns | -| ------------------------- | ----------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------- | -| `allow_network` | `input.network.host`, `input.network.port`, `input.exec.path`, `input.exec.ancestors`, `input.exec.cmdline_paths` | `true` if any policy matches both endpoint and binary | -| `network_action` | Same input | `"allow"` if endpoint + binary matched, `"deny"` otherwise | -| `deny_reason` | Same input | Human-readable string explaining why access was denied | -| `matched_network_policy` | Same input | Name of the matched policy (for audit logging) | -| `matched_endpoint_config` | Same input | Raw endpoint object for L7 config extraction (returned if endpoint has `protocol`, `allowed_ips`, or explicit TLS config) | - -### L7 Rules (per-request within tunnel) - -| Rule | Signature | Returns | -| --------------------- | ---------------------------------------------------------------------------------------------------------- | -------------------------------------------------------------- | -| `allow_request` | `input.network.*`, `input.exec.*`, `input.request.method`, `input.request.path`, optional `request.graphql` | `true` if the request matches the matched endpoint's L7 rules | -| `request_deny_reason` | Same input | Human-readable deny message | - -See `sandbox-policy.rego` for the full Rego implementation. - ---- - -## Sandbox Log Filtering - -The `nav logs` command retrieves log lines from the gateway's in-memory log buffer. Two server-side filters narrow the output before logs are sent to the CLI. - -### Source Filter (`--source`) - -Log lines carry a `source` field identifying their origin. The `--source` flag filters by this field. - -| Value | Description | -|-------|-------------| -| `all` | Show logs from all sources (default). 
Translates to an empty filter list server-side. | -| `gateway` | Show only server-side logs (reconciler events, gRPC handler traces). Logs with an empty `source` field are treated as `gateway` for backward compatibility. | -| `sandbox` | Show only supervisor logs (proxy decisions, OPA evaluations, identity checks). These are pushed from the sandbox to the gateway via `PushSandboxLogs`. | - -Multiple sources can be specified: `--source gateway --source sandbox` is equivalent to `--source all`. - -```bash -# Show only proxy/OPA logs from the sandbox supervisor -nav logs my-sandbox --source sandbox - -# Show only gateway-side reconciler logs -nav logs my-sandbox --source gateway -``` - -The filter applies to both one-shot mode (`GetSandboxLogs` RPC) and streaming mode (`--tail`, via `WatchSandbox` RPC). In both cases, the server evaluates `source_matches()` before sending each log line to the client. See `crates/openshell-server/src/grpc.rs` -- `source_matches()`, `get_sandbox_logs()`. - -### Level Filter (`--level`) - -The `--level` flag sets a minimum log severity. Only lines at or above the specified level are returned. - -| Level | Numeric | Passes when `--level` is | -|-------|---------|--------------------------| -| `ERROR` | 0 | error, warn, info, debug, trace | -| `WARN` | 1 | warn, info, debug, trace | -| `INFO` | 2 | info, debug, trace | -| `DEBUG` | 3 | debug, trace | -| `TRACE` | 4 | trace | - -The default (empty string) disables level filtering -- all levels pass. An unrecognized level string is assigned numeric value 5, so it always passes. - -```bash -# Show only WARN and ERROR logs -nav logs my-sandbox --level warn - -# Combine with source filter: only sandbox ERROR logs -nav logs my-sandbox --source sandbox --level error -``` - -The filter is applied server-side via `level_matches()` in both one-shot and streaming modes. See `crates/openshell-server/src/grpc.rs` -- `level_matches()`. 
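The comparison described above can be sketched as follows. This is a minimal illustration of the filtering semantics, not the actual `level_matches()` implementation; the function name and string-based signature are assumptions for the example:

```rust
/// Map a level name to its numeric severity rank (lower = more severe).
/// Unrecognized strings get 5, per the table above.
fn level_rank(level: &str) -> u8 {
    match level.to_ascii_lowercase().as_str() {
        "error" => 0,
        "warn" => 1,
        "info" => 2,
        "debug" => 3,
        "trace" => 4,
        _ => 5,
    }
}

/// A log line passes when its rank is at or below the requested minimum rank.
/// An empty minimum level disables filtering entirely; an unrecognized
/// minimum ranks as 5, so every line passes.
fn level_matches(min_level: &str, line_level: &str) -> bool {
    if min_level.is_empty() {
        return true;
    }
    level_rank(line_level) <= level_rank(min_level)
}
```

With `--level warn` this admits `ERROR` and `WARN` lines and drops `INFO` and below, matching the example commands earlier in this section.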
- -### Proto Messages - -The source and level filters are carried in both log-related RPC messages: - -| RPC | Proto Message | Source Field | Level Field | -|-----|---------------|--------------|-------------| -| `GetSandboxLogs` | `GetSandboxLogsRequest` | `repeated string sources` | `string min_level` | -| `WatchSandbox` | `WatchSandboxRequest` | `repeated string log_sources` | `string log_min_level` | - -An empty `sources`/`log_sources` list means no source filtering (all sources pass). An empty `min_level`/`log_min_level` string means no level filtering (all levels pass). See `proto/openshell.proto`. - ---- - -## Cross-References - -- [Sandbox Architecture](sandbox.md) -- Full sandbox lifecycle, enforcement mechanisms, and component interaction -- [Gateway Architecture](gateway.md) -- How the gateway stores and delivers policies via gRPC -- [Gateway Settings Channel](gateway-settings.md) -- Runtime settings channel, global policy override, CLI/TUI settings commands -- [Inference Routing](inference-routing.md) -- How `inference.local` requests are routed to model backends -- [Overview](README.md) -- System-level context for how policies fit into the platform -- [Plain HTTP Forward Proxy Plan](plans/plain-http-forward-proxy.md) -- Design document for the forward proxy feature +Never log secrets, credentials, bearer tokens, or query parameters in OCSF +messages. OCSF JSONL output may be shipped to external systems. diff --git a/architecture/system-architecture.md b/architecture/system-architecture.md deleted file mode 100644 index b271cdd67..000000000 --- a/architecture/system-architecture.md +++ /dev/null @@ -1,212 +0,0 @@ -# OpenShell System Architecture - -```mermaid -graph TB - %% ============================================================ - %% USER'S MACHINE - %% ============================================================ - subgraph UserMachine["User's Machine"] - CLI["OpenShell CLI
(openshell)"] - TUI["OpenShell TUI
(openshell term)"] - SDK["Python SDK
(openshell)"] - LocalConfig["~/.config/openshell/
gateways, mTLS certs,
active_gateway"] - end - - %% ============================================================ - %% GATEWAY AND COMPUTE PLATFORM - %% ============================================================ - subgraph Cluster["Gateway and Compute Platform"] - - ComputeDriver["Compute Driver
(Docker, Podman,
Kubernetes, VM)"] - DockerAPI["Docker API"] - PodmanAPI["Podman API"] - KubeAPI["Kubernetes API
(optional)"] - VMDriver["VM Driver
(experimental)"] - - subgraph NSNamespace["Gateway runtime"] - - subgraph GatewayPod["Gateway Process"] - Gateway["openshell-server
:8080
(gRPC + HTTP, mTLS)"] - SQLite[("SQLite DB
/var/openshell/
openshell.db")] - SupRegistry["SupervisorSessionRegistry
(live sessions + pending relays)"] - WatchBus["SandboxWatchBus
(in-memory broadcast)"] - LogBus["TracingLogBus
(in-memory broadcast)"] - end - - subgraph SandboxPod["Sandbox Workload
(container, pod, or VM)"] - - subgraph Supervisor["Sandbox Supervisor
(privileged user)"] - SSHServer["Embedded SSH
Server (russh)
Unix socket
/run/openshell/ssh.sock"] - RelayBridge["Relay Bridge
(ConnectSupervisor +
RelayStream client)"] - Proxy["HTTP CONNECT
Proxy
10.200.0.1:3128"] - OPA["OPA Policy Engine
(regorus, in-process)"] - InferenceRouter["Inference Router
(openshell-router)"] - CertCache["TLS MITM
Cert Cache"] - end - - subgraph AgentProcess["Agent Process (restricted user)"] - Agent["AI Agent
(Claude / OpenCode /
Codex / Openclaw)"] - Landlock["Landlock FS
Isolation"] - Seccomp["Seccomp BPF
Filtering"] - end - - NetNS["Network Namespace
(veth pair:
10.200.0.1 <-> 10.200.0.2)"] - end - end - - subgraph ASNamespace["Kubernetes driver only"] - CRDController["Agent Sandbox
CRD Controller"] - end - - end - - %% ============================================================ - %% EXTERNAL SYSTEMS - %% ============================================================ - subgraph ExternalAI["AI Provider APIs"] - Anthropic["Anthropic API
api.anthropic.com:443"] - OpenAI["OpenAI API
api.openai.com:443"] - NVIDIA_API["NVIDIA NIM
integrate.api.nvidia.com:443"] - end - - subgraph CodeHosting["Code Hosting"] - GitHub["GitHub
github.com:443
api.github.com:443"] - GitLab["GitLab
gitlab.com:443"] - end - - subgraph InferenceBackends["Self-Hosted Inference"] - LMStudio["LM Studio"] - VLLM["vLLM"] - end - - subgraph PackageRegistries["Package Registries"] - PyPI["PyPI
pypi.org:443"] - NPM["npm Registry
registry.npmjs.org:443"] - end - - subgraph ContainerRegistry["Container Registry"] - GHCR["GitHub Container Registry
ghcr.io"] - end - - %% ============================================================ - %% CONNECTIONS: User Machine --> Cluster - %% ============================================================ - CLI -- "gRPC over HTTPS (mTLS)
service / ingress / port-forward" --> Gateway - TUI -- "gRPC polling (mTLS)
every 2s" --> Gateway - SDK -- "gRPC over HTTPS (mTLS)" --> Gateway - CLI -- "HTTP CONNECT upgrade
/connect/ssh (mTLS)" --> Gateway - CLI -. "reads mTLS certs" .-> LocalConfig - - %% ============================================================ - %% CONNECTIONS: Gateway internals - %% ============================================================ - Gateway --> SQLite - Gateway --> SupRegistry - Gateway -- "Create / delete / watch
sandboxes" --> ComputeDriver - ComputeDriver --> DockerAPI - ComputeDriver --> PodmanAPI - ComputeDriver --> KubeAPI - ComputeDriver --> VMDriver - ComputeDriver -- "status, platform events" --> Gateway - - %% ============================================================ - %% CONNECTIONS: Supervisor session (inbound from sandbox) - %% ============================================================ - RelayBridge -- "ConnectSupervisor
(persistent bidi stream)" --> SupRegistry - RelayBridge -- "RelayStream
(per-invocation byte bridge,
same HTTP/2 connection)" --> SupRegistry - RelayBridge -- "Unix socket
SSH bytes" --> SSHServer - - %% ============================================================ - %% CONNECTIONS: CRD Controller - %% ============================================================ - CRDController -- "manages Sandbox
custom resources" --> KubeAPI - - %% ============================================================ - %% CONNECTIONS: Sandbox internals - %% ============================================================ - Agent -- "all traffic via
HTTP CONNECT" --> NetNS - NetNS -- "proxied traffic" --> Proxy - Proxy -- "policy evaluation" --> OPA - Proxy -- "inference requests" --> InferenceRouter - Proxy -- "Auto TLS termination
+ optional L7 inspection" --> CertCache - - %% ============================================================ - %% CONNECTIONS: Sandbox --> Gateway (control plane) - %% ============================================================ - Supervisor -- "gRPC (mTLS):
GetSandboxConfig
(policy + settings),
GetProviderEnvironment,
GetInferenceBundle,
PushSandboxLogs" --> Gateway - - %% ============================================================ - %% CONNECTIONS: Sandbox --> External (via proxy) - %% ============================================================ - Proxy -- "HTTPS
(auto TLS termination)" --> Anthropic - Proxy -- "HTTPS" --> OpenAI - Proxy -- "HTTPS" --> NVIDIA_API - Proxy -- "HTTPS" --> GitHub - Proxy -- "HTTPS" --> GitLab - Proxy -- "HTTPS" --> PyPI - Proxy -- "HTTPS" --> NPM - InferenceRouter -- "HTTP/HTTPS
(model ID + auth
rewritten)" --> LMStudio - InferenceRouter -- "HTTP/HTTPS" --> VLLM - InferenceRouter -- "HTTPS" --> NVIDIA_API - - %% ============================================================ - %% CONNECTIONS: Image pulls - %% ============================================================ - ComputeDriver -- "pulls or schedules workloads
that pull images" --> GHCR - - %% ============================================================ - %% CLIENT SSH / EXEC (bytes tunneled via supervisor relay) - %% ============================================================ - CLI -- "HTTP CONNECT /connect/ssh
+ tar-over-SSH file sync
(bytes bridged through
SupervisorSessionRegistry)" --> Gateway - - %% ============================================================ - %% STYLES - %% ============================================================ - classDef userComponent fill:#4A90D9,stroke:#2C5F8A,color:#fff - classDef gateway fill:#E8A838,stroke:#B07D28,color:#fff - classDef sandbox fill:#7CB342,stroke:#558B2F,color:#fff - classDef sandboxInternal fill:#81C784,stroke:#4CAF50,color:#fff - classDef agent fill:#AB47BC,stroke:#7B1FA2,color:#fff - classDef security fill:#EF5350,stroke:#C62828,color:#fff - classDef datastore fill:#5C6BC0,stroke:#3949AB,color:#fff - classDef external fill:#78909C,stroke:#546E7A,color:#fff - classDef k8s fill:#326CE5,stroke:#1A4DB5,color:#fff - classDef config fill:#90A4AE,stroke:#607D8B,color:#fff - - class CLI,TUI,SDK userComponent - class Gateway,SupRegistry,WatchBus,LogBus gateway - class SSHServer,RelayBridge,Proxy,OPA,InferenceRouter,CertCache sandbox - class Agent,Landlock,Seccomp,NetNS agent - class SQLite datastore - class Anthropic,OpenAI,NVIDIA_API,GitHub,GitLab,PyPI,NPM,LMStudio,VLLM,GHCR external - class ComputeDriver,DockerAPI,PodmanAPI,KubeAPI,VMDriver,CRDController k8s - class LocalConfig config -``` - -## Component Legend - -| Color | Category | Examples | -|-------|----------|---------| -| Blue | User-side components | OpenShell CLI, OpenShell TUI, Python SDK | -| Orange | Gateway / Control plane | openshell-server, watch bus, log bus | -| Green | Sandbox supervisor | SSH server, HTTP CONNECT proxy, OPA engine, inference router | -| Purple | Agent process & isolation | AI agent, Landlock, Seccomp, network namespace | -| Indigo | Data stores | SQLite database | -| Dark blue | Compute infrastructure | Docker API, Podman API, K8s API, VM driver | -| Gray | External systems | AI APIs, code hosting, package registries, inference backends | - -## Key Communication Flows - -1. 
**CLI/SDK to Gateway**: Control-plane traffic uses gRPC over HTTPS with mutual TLS (mTLS) unless the gateway is explicitly deployed in plaintext mode behind a trusted transport. The gateway listens on one multiplexed service port. - -2. **Supervisor Session (inbound from sandbox)**: Each sandbox supervisor opens a persistent `ConnectSupervisor` bidi gRPC stream to the gateway over mTLS. The gateway tracks these in `SupervisorSessionRegistry`. When SSH or exec access is needed, the gateway sends `RelayOpen { channel_id }` on that stream; the supervisor responds by initiating a `RelayStream` RPC on the same HTTP/2 connection whose first frame is a `RelayInit { channel_id }`. Subsequent frames carry raw bytes in both directions. The gateway never dials the sandbox pod. - -3. **SSH / Exec Access**: CLI connects via HTTP CONNECT upgrade at `/connect/ssh` (or calls `ExecSandbox` gRPC). The gateway authenticates, calls `open_relay`, and bridges the client bytes through the supervisor's `RelayStream` to the supervisor's in-sandbox SSH daemon, which binds to a Unix socket (`/run/openshell/ssh.sock`) rather than a TCP port. - -4. **File Sync**: tar archives streamed over the relay-tunneled SSH session (no rsync dependency). - -5. **Sandbox to External**: All agent outbound traffic is forced through the HTTP CONNECT proxy (10.200.0.1:3128) via a network namespace veth pair. OPA/Rego policies evaluate every connection. TLS is automatically detected and terminated for credential injection; endpoints with `protocol` configured also get L7 request-level inspection. - -6. **Inference Routing**: Inference requests are handled inside the sandbox by the openshell-router (not through the gateway). The gateway provides route configuration and credentials via gRPC; the sandbox executes HTTP requests directly to inference backends. - -7. 
**Sandbox to Gateway (control plane)**: The sandbox supervisor uses gRPC (mTLS) to fetch policies and runtime settings (via `GetSandboxConfig`), provider credentials, inference bundles, and to push logs back to the gateway. The settings channel delivers typed key-value pairs alongside policy through a unified poll loop. This reuses the same mTLS connection that carries `ConnectSupervisor`. diff --git a/architecture/tui.md b/architecture/tui.md deleted file mode 100644 index 850cebc88..000000000 --- a/architecture/tui.md +++ /dev/null @@ -1,198 +0,0 @@ -# OpenShell TUI - -The OpenShell TUI is a terminal user interface for OpenShell, inspired by [k9s](https://k9scli.io/). Instead of typing individual CLI commands to check gateway health, list sandboxes, and manage resources, the TUI gives you a real-time, keyboard-driven dashboard — everything updates automatically and you navigate with a few keystrokes. - -## Launching the TUI - -The TUI is a subcommand of the OpenShell CLI, so it inherits all your existing configuration — gateway selection, TLS settings, and verbosity flags all work the same way. - -```bash -openshell term # launch against the active gateway -nav term # dev alias (builds from source) -nav term --gateway prod # target a specific gateway -OPENSHELL_GATEWAY=prod nav term # same thing, via environment variable -``` - -Gateway resolution follows the same priority as the rest of the CLI: - -1. `--gateway` flag (if provided) -2. `OPENSHELL_GATEWAY` environment variable -3. Active gateway from `~/.config/openshell/active_gateway` - -No separate configuration files or authentication are needed. 
- -## Screen Layout - -The TUI divides the terminal into four horizontal regions: - -```text -┌─────────────────────────────────────────────────────────────────┐ -│ OpenShell ─ my-gateway ─ Dashboard ● Healthy │ ← title bar -├─────────────────────────────────────────────────────────────────┤ -│ │ -│ (view content — Dashboard or Sandboxes) │ ← main area -│ │ -├─────────────────────────────────────────────────────────────────┤ -│ [1] Dashboard [2] Sandboxes │ [?] Help [q] Quit │ ← nav bar -├─────────────────────────────────────────────────────────────────┤ -│ : │ ← command bar -└─────────────────────────────────────────────────────────────────┘ -``` - -- **Title bar** — shows the OpenShell logo, gateway name, current view, and live gateway health status. -- **Main area** — the active view (Dashboard or Sandboxes). -- **Navigation bar** — lists available views with their shortcut keys, plus Help and Quit. -- **Command bar** — appears when you press `:` to type a command (like vim). - -## Views - -### Dashboard (press `1`) - -The Dashboard is the home screen. It shows your gateway at a glance. - -The dashboard is divided into a top info pane and a middle pane with two tabs: - -- **Top pane**: Gateway name, gateway endpoint, health status, sandbox count. -- **Middle pane**: Tabbed view toggled with `Tab`: - - **Providers** — provider configurations attached to the gateway. - - **Global Settings** — gateway-global runtime settings (fetched via `GetGatewaySettings`). - -**Health status** indicators: - -- `●` **Healthy** (green) — everything is running normally. -- `◐` **Degraded** (yellow) — the gateway is up but something needs attention. -- `○` **Unhealthy** (red) — the gateway is not operating correctly. -- `…` — still connecting or status unknown. - -**Global policy indicator**: When a global policy is active, the gateway row shows `Global Policy Active (vN)` in yellow (the `status_warn` style). 
The TUI detects this by polling `ListSandboxPolicies` with `global: true, limit: 1` on each tick and checking if the latest revision has `PolicyStatus::Loaded`. See `crates/openshell-tui/src/ui/dashboard.rs`. - -#### Global Settings Tab - -The Global Settings tab shows all registered setting keys with their current values. Keys without a configured value display as ``. - -| Key | Action | -|-----|--------| -| `j` / `↓` | Move selection down | -| `k` / `↑` | Move selection up | -| `Enter` | Edit the selected setting (type-aware: bool toggle, string/int text input) | -| `d` | Delete the selected setting's value | - -Both edit and delete operations display a confirmation modal before applying. Changes are sent to the gateway via the `UpdateSandboxPolicy` RPC with `global: true`. - -### Sandboxes (press `2`) - -The Sandboxes view shows a table of all sandboxes in the gateway: - -| Column | Description | -|--------|-------------| -| NAME | Sandbox name | -| STATUS | Current phase, color-coded (see below) | -| AGE | Time since creation (e.g., `45s`, `12m`, `3h 20m`, `2d 5h`) | -| IMAGE | Container image the sandbox is running | -| PROVIDERS | Provider names attached to the sandbox | -| NOTES | General-purpose metadata (e.g., `fwd:8080,3000` for forwarded ports) | - -Status colors tell you the sandbox state at a glance: - -- **Green** — Ready (sandbox is running and accessible) -- **Yellow** — Provisioning (sandbox is starting up) -- **Red** — Error (something went wrong) -- **Dim** — Deleting or Unknown - -Use `j`/`k` or the arrow keys to move through the list. The selected row is highlighted in green. - -When there are no sandboxes, the view displays: *"No sandboxes found."* - -When viewing a specific sandbox (by pressing `Enter` on a selected row), the bottom pane shows a tabbed view toggled with `l`: - -- **Policy** — the sandbox's current active policy, auto-refreshed on version change. 
-- **Settings** — effective runtime settings for the sandbox (fetched via `GetSandboxSettings`). - -**Global policy indicator on sandbox detail**: When the sandbox's policy is managed globally (`policy_source == GLOBAL` in the `GetSandboxSettings` response), the metadata pane shows `Policy: managed globally (vN)` in yellow. Draft chunks in the **Network Rules** pane are greyed out and a yellow warning reads `"Cannot approve rules while global policy is active"`. Approve (`a`), reject/revoke (`x`), and approve-all actions are blocked client-side with status messages. See `crates/openshell-tui/src/ui/sandbox_detail.rs` and `crates/openshell-tui/src/ui/sandbox_draft.rs`. - -#### Sandbox Settings Tab - -The Settings tab shows all registered setting keys with their effective values and scope indicators: - -- **(sandbox)** — value is set at sandbox scope -- **(global)** — value is set at gateway-global scope (overrides sandbox) -- **(unset)** — no value configured at any scope - -Navigation and editing use the same keys as the Global Settings tab (`j`/`k`, `Enter` to edit, `d` to delete). Sandbox-scoped edits to globally-managed keys are rejected by the server with a `FailedPrecondition` error. - -## Keyboard Controls - -The TUI has two input modes: **Normal** (default) and **Command** (activated by pressing `:`). - -### Normal Mode - -| Key | Action | -|-----|--------| -| `1` | Switch to Dashboard view | -| `2` | Switch to Sandboxes view | -| `j` or `↓` | Move selection down | -| `k` or `↑` | Move selection up | -| `:` | Enter command mode | -| `q` | Quit | -| `Ctrl+C` | Force quit | - -### Command Mode - -Press `:` to open the command bar at the bottom of the screen. Type a command and press `Enter` to execute it. - -| Command | Action | -|---------|--------| -| `quit` or `q` | Quit | -| `dashboard` or `1` | Switch to Dashboard view | -| `sandboxes` or `2` | Switch to Sandboxes view | - -Press `Esc` to cancel and return to Normal mode. 
`Backspace` deletes characters as you type. - -## Data Refresh - -The TUI automatically polls the gateway every **2 seconds**. Gateway health, the sandbox list, and global settings all update on each tick, so the display stays current without manual refreshing. This uses the same gRPC calls as the CLI — no additional server-side setup is required. - -When viewing a sandbox, the policy pane auto-refreshes when a new policy version is detected. The sandbox list response includes `current_policy_version` for each sandbox; on every tick the TUI compares this against the currently displayed policy version and re-fetches the full policy only when they differ. This avoids extra RPCs during normal operation while ensuring policy updates appear within the polling interval. The user's scroll position is preserved across auto-refreshes. - -Global settings are refreshed via `GetGatewaySettings` and tracked by `settings_revision` to detect changes. Sandbox settings are fetched as part of the `GetSandboxSettings` response when viewing a specific sandbox. - -## Theme - -The TUI uses a dark terminal theme based on the NVIDIA brand palette: - -- **Background**: Black — the standard terminal background. -- **Text**: White for primary content, dimmed white for labels and secondary information. -- **Accent**: NVIDIA Green (`#76b900`) — used for the selected row, active tab indicator, and healthy/ready status. -- **Borders**: Everglade (`#123123`) — subtle dark green for structural separators. -- **Status**: Green for healthy/ready, yellow for pending/provisioning, red for error/unhealthy. - -The title bar uses white text on an Everglade background to visually anchor the top of the screen. - -## Port Forwarding - -The TUI supports creating sandboxes with port forwarding directly from the create modal. When creating a sandbox, you can specify ports to forward in the **Ports** field (comma-separated, e.g., `8080,3000`). 
After the sandbox reaches `Ready` state, the TUI automatically spawns background SSH tunnels (`ssh -N -f -L :127.0.0.1:`) for each specified port. - -Forwarded ports are displayed in the **NOTES** column of the sandbox table as `fwd:8080,3000` and in the **Forwards** row of the sandbox detail view. - -Port forwarding lifecycle: - -- **On create**: The TUI polls for sandbox readiness (up to 30 attempts at 2-second intervals), then spawns SSH tunnels. -- **On delete**: Any active forwards for the sandbox are automatically stopped before deletion. -- **PID tracking**: Forward PIDs are stored in `~/.config/openshell/forwards/-.pid`, shared with the CLI. - -The forwarding implementation lives in `openshell-core::forward`, shared between the CLI and TUI. - -## What is Not Yet Available - -The TUI is in active development. The following features are planned but not yet implemented: - -- **Inference views** — browsing inference routes and configuration. -- **Help overlay** — the `?` key is shown in the nav bar but does not open a help screen yet. -- **Command bar autocomplete** — the command bar accepts text but does not offer suggestions. -- **Filtering and search** — no `/` search within views yet. - -## Crate Structure - -The TUI lives in `crates/openshell-tui/`, a separate workspace crate. The CLI crate (`crates/openshell-cli/`) depends on it and launches it via the `Term` command variant in the `Commands` enum. This keeps TUI-specific dependencies (ratatui, crossterm) out of the CLI when not in use. - -The `openshell-tui` crate depends on `openshell-core` for protobuf types, the gRPC client, and shared utilities (e.g., `openshell_core::forward` for port forwarding PID management) — it communicates with the gateway over the same gRPC channel the CLI uses. 
diff --git a/crates/openshell-core/README.md b/crates/openshell-core/README.md new file mode 100644 index 000000000..51da847b8 --- /dev/null +++ b/crates/openshell-core/README.md @@ -0,0 +1,52 @@ +# openshell-core + +Shared types, constants, configuration, and helpers used across OpenShell +crates. + +## Object Metadata + +Top-level user-facing objects use a Kubernetes-style metadata convention. The +metadata shape provides: + +- Stable server-generated ID. +- Human-readable name. +- Creation timestamp. +- Optional labels for filtering and automation. + +Code that works with object metadata should use the traits in +`openshell_core::metadata` instead of reaching into protobuf fields directly: + +```rust +use openshell_core::{ObjectId, ObjectLabels, ObjectName}; + +let id = sandbox.object_id(); +let name = sandbox.object_name(); +let labels = sandbox.object_labels(); +``` + +Trait methods must tolerate missing metadata and return safe empty values rather +than panicking. + +## Label Rules + +Labels follow Kubernetes-style key and value conventions: + +- Keys may include an optional DNS-prefix followed by `/`. +- Names are limited to alphanumeric characters plus `-`, `_`, and `.`. +- Values use the same character set and may be empty. +- Selectors use comma-separated `key=value` pairs with AND semantics. + +Validate labels at API ingress before persisting objects. + +## Inference Profiles + +Provider inference profiles live in this crate so the gateway, sandbox, and +router agree on provider defaults. Profiles define: + +- Auth header style. +- Default upstream headers. +- Client-supplied passthrough headers. +- Supported inference protocol shapes. + +Do not duplicate provider-specific inference behavior in callers. Add shared +behavior here, then consume it from the gateway, sandbox, and router. 
diff --git a/crates/openshell-driver-docker/README.md b/crates/openshell-driver-docker/README.md new file mode 100644 index 000000000..7bc8048b2 --- /dev/null +++ b/crates/openshell-driver-docker/README.md @@ -0,0 +1,65 @@ +# openshell-driver-docker + +Docker-backed compute driver for local OpenShell gateways. + +The driver manages sandbox containers through the local Docker daemon with the +`bollard` client. It is intended for developer environments where Docker is +already available and running Kubernetes would be unnecessary. + +## Runtime Model + +The gateway runs as a host process. The Docker driver creates one container per +sandbox and starts the `openshell-sandbox` supervisor inside that container. The +supervisor then creates the nested sandbox namespace for the agent process. + +Docker containers currently use host networking. This lets a supervisor reach a +gateway bound to `127.0.0.1` without requiring a separate bridge listener, NAT +rule, or userland proxy. The container also receives +`host.openshell.internal -> 127.0.0.1` so local host services have a stable +OpenShell-owned name. + +## Container Contract + +The driver-controlled container settings are part of the sandbox security +contract: + +| Setting | Purpose | +|---|---| +| `user = "0"` | The supervisor needs root inside the container to prepare namespaces, mounts, Landlock, and seccomp. | +| `network_mode = "host"` | Lets the supervisor call back to loopback gateway endpoints. | +| `cap_add` | Grants supervisor-only capabilities required for namespace setup and process inspection. | +| `apparmor=unconfined` | Avoids Docker's default profile blocking required mount operations. | +| `restart_policy = unless-stopped` | Keeps managed sandboxes resumable across daemon or gateway restarts. | +| CDI GPU request | Requests all NVIDIA GPUs when the sandbox spec asks for GPU support and daemon CDI support is detected. | + +The agent child process does not retain these supervisor privileges. 
+ +## Callback and TLS + +`OPENSHELL_ENDPOINT` is injected from the gateway's configured gRPC endpoint +without rewriting. Because the container uses host networking, loopback +endpoints such as `http://127.0.0.1:8080` resolve to the host gateway. + +For HTTPS endpoints, the server certificate must include the endpoint host as a +subject alternative name. Docker sandboxes also need the client TLS bundle +mounted into the container and exposed with: + +- `OPENSHELL_TLS_CA` +- `OPENSHELL_TLS_CERT` +- `OPENSHELL_TLS_KEY` + +HTTP endpoints reject TLS material because the supervisor would not use it. + +## Environment Ownership + +The driver merges template environment and sandbox spec environment first, then +overwrites security-critical keys: + +- `OPENSHELL_ENDPOINT` +- `OPENSHELL_SANDBOX_ID` +- `OPENSHELL_SANDBOX` +- `OPENSHELL_SSH_SOCKET_PATH` +- `OPENSHELL_SANDBOX_COMMAND` +- TLS path variables when HTTPS is enabled + +Do not allow sandbox images or templates to override these values. diff --git a/crates/openshell-driver-kubernetes/README.md b/crates/openshell-driver-kubernetes/README.md new file mode 100644 index 000000000..4a8a8f76b --- /dev/null +++ b/crates/openshell-driver-kubernetes/README.md @@ -0,0 +1,49 @@ +# openshell-driver-kubernetes + +Kubernetes-backed compute driver for OpenShell cluster deployments. + +The driver uses the Kubernetes API to create, delete, fetch, and watch sandbox +custom resources in the configured namespace. It runs in-process with the +gateway server. + +## Runtime Model + +The gateway stores platform state and delegates sandbox workload creation to +this driver. Kubernetes owns scheduling and pod lifecycle. The +`openshell-sandbox` supervisor inside each workload owns agent isolation, +credential injection, policy polling, logs, and the gateway relay. + +## Sandbox Resource + +The driver works with the `agents.x-k8s.io/v1alpha1` `Sandbox` custom resource. 
+Driver events map Kubernetes object state and platform events into the shared +compute-driver protobuf surface used by the gateway. + +Kubernetes API calls use explicit timeouts so gRPC handlers do not block +indefinitely when the API server is slow or unavailable. + +## Workspace Persistence + +Sandbox pods use a PVC-backed `/sandbox` workspace. An init container seeds the +PVC from the image's original `/sandbox` contents on first start and writes a +sentinel so subsequent starts skip the copy. + +This is a stopgap persistence model. It preserves user files across pod +rescheduling but duplicates the base workspace and does not automatically apply +image updates to existing PVCs. Future snapshotting should replace it. + +## Credentials, TLS, and Relay + +The driver injects gateway callback configuration, sandbox identity, TLS client +material, and the supervisor SSH socket path into the workload. Driver-owned +values must override image-provided environment variables. + +The gateway uses the supervisor relay for connect, exec, and file sync. Sandbox +pods do not need direct external ingress for SSH. + +## GPU Support + +When a sandbox requests GPU support, the driver checks node allocatable capacity +for `nvidia.com/gpu` and requests one GPU resource in the workload spec. The +sandbox image must provide the user-space libraries needed by the agent +workload. diff --git a/crates/openshell-driver-podman/README.md b/crates/openshell-driver-podman/README.md new file mode 100644 index 000000000..1193416e1 --- /dev/null +++ b/crates/openshell-driver-podman/README.md @@ -0,0 +1,74 @@ +# openshell-driver-podman + +Podman-backed compute driver for rootless and single-machine OpenShell +deployments. + +The driver talks to the Podman libpod REST API over a Unix socket. It runs +in-process with the gateway server and creates one sandbox container per +sandbox. The `openshell-sandbox` supervisor inside the container still owns the +actual agent isolation. 
+ +## Runtime Model + +```mermaid +flowchart LR + GW["Gateway"] -->|"in-process driver"| D["PodmanComputeDriver"] + D -->|"HTTP over Unix socket"| P["Podman API"] + P --> C["Sandbox container"] + C --> S["openshell-sandbox supervisor"] + S --> A["restricted agent child"] +``` + +The container is the runtime boundary. Inside it, the supervisor creates a +nested network namespace, starts the policy proxy, applies Landlock/seccomp, and +launches the agent child as an unprivileged user. + +## Supervisor Delivery + +Podman uses an OCI image volume to mount the supervisor binary read-only at +`/opt/openshell/bin`. The supervisor image is built from the `supervisor` target +in `deploy/docker/Dockerfile.images`. + +This keeps the supervisor outside the mutable sandbox image while avoiding a +hostPath-style bind mount. + +## Rootless Adaptations + +Rootless Podman has stricter capability behavior than Kubernetes. The container +spec drops all capabilities and adds back only the supervisor capabilities it +needs: + +- `SYS_ADMIN` for namespace and Landlock setup. +- `NET_ADMIN` for nested network namespace routing. +- `SYS_PTRACE` and `DAC_READ_SEARCH` for process identity inspection. +- `SYSLOG` for bypass diagnostics. +- `SETUID` and `SETGID` for dropping to the sandbox user. + +The restricted agent child loses these privileges before user code runs. + +## Network Model + +The driver creates or reuses a Podman bridge network for container-to-host +communication. The agent child does not use that bridge directly. The supervisor +creates a nested namespace and routes agent egress through the local CONNECT +proxy. + +`host.containers.internal` is used for callbacks to the host gateway. Rootless +networking may use pasta under the hood; avoid assumptions that require +container-to-container L2 reachability. + +## Secrets and Environment + +The SSH handshake secret is injected with Podman's `secret_env` API rather than +as a plain inspectable environment value. 
Sandbox identity, callback endpoint, +relay socket path, and command metadata are driver-controlled environment +variables and must override template values. + +When TLS is configured, the driver mounts the client bundle read-only and sets +the standard `OPENSHELL_TLS_*` environment variables for the supervisor. + +## GPU Support + +GPU sandboxes use CDI device injection when `spec.gpu` is true and NVIDIA CDI +devices are available. The sandbox image must still include the user-space +libraries required by the workload. diff --git a/crates/openshell-driver-vm/README.md b/crates/openshell-driver-vm/README.md index 78550421b..bc56b5e4e 100644 --- a/crates/openshell-driver-vm/README.md +++ b/crates/openshell-driver-vm/README.md @@ -51,7 +51,9 @@ For GPU passthrough (VFIO), pass `-- --gpu` and run with root privileges: sudo -E env "PATH=$PATH" mise run gateway:vm -- --gpu ``` -See [`architecture/vm-gpu-sandbox-guide.md`](../../architecture/vm-gpu-sandbox-guide.md) for full GPU prerequisites and usage. +GPU passthrough uses VFIO and requires host support for IOMMU, root privileges +for bind/unbind operations, and a compatible sandbox image. The public GPU +overview lives in the repository `README.md`. Point the CLI at the gateway with one of: diff --git a/crates/openshell-providers/README.md b/crates/openshell-providers/README.md new file mode 100644 index 000000000..f0b5b7923 --- /dev/null +++ b/crates/openshell-providers/README.md @@ -0,0 +1,32 @@ +# openshell-providers + +Provider discovery and normalization for credentials that sandboxes need at +runtime. + +The gateway persists provider records. The sandbox supervisor fetches resolved +provider environment from the gateway and injects credentials into agent child +processes. This crate keeps provider-specific discovery and normalization logic +out of the CLI and gateway control flow. + +## Responsibilities + +- Discover local credentials from environment variables and known config files. 
+- Normalize discovered data into provider records. +- Keep provider-specific parsing rules in provider modules. +- Avoid logging credential values. + +## Non-Responsibilities + +- Persisting provider records. +- Authorizing provider CRUD operations. +- Injecting credentials into sandbox child processes. +- Routing inference requests. + +Those are owned by the gateway, sandbox supervisor, and router. + +## Security Notes + +Provider data often contains API keys, bearer tokens, or local account +configuration. Discovery code should return structured values without printing +or tracing secrets. Callers that display provider data must redact sensitive +fields by default. diff --git a/crates/openshell-router/README.md b/crates/openshell-router/README.md index fc034ae09..8c9527da3 100644 --- a/crates/openshell-router/README.md +++ b/crates/openshell-router/README.md @@ -1,6 +1,7 @@ # openshell-router -`openshell-router` is the inference routing and upstream execution engine used by `openshell-server`. +`openshell-router` is the inference routing and upstream execution engine used +by the sandbox proxy and gateway inference validation paths. ## Responsibilities @@ -16,16 +17,20 @@ - Persistence of routes/entities. - Loading sandbox or policy objects. -These are owned by `openshell-server`. +These are owned by `openshell-server` and `openshell-sandbox`. 
-## Integration contract with openshell-server +## Integration Contract Current split: - `openshell-server`: - - authenticates request origin - - resolves cluster-managed inference route candidates from providers - - loads enabled route candidates from the entity store + - authenticates user-facing inference configuration changes + - resolves managed route candidates from provider records + - validates backend endpoints +- `openshell-sandbox`: + - intercepts `https://inference.local` + - detects the source inference protocol + - passes sanitized requests and resolved route candidates to the router - `openshell-router`: - picks a route from candidates (`proxy_with_candidates`) - forwards the HTTP request upstream and returns the raw response From 08428e009357e20ef5b6bae1218b5ace25c5230e Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Wed, 6 May 2026 15:56:06 -0700 Subject: [PATCH 2/5] docs(skills): defer architecture doc guidance to agents Signed-off-by: Drew Newberry --- .agents/skills/build-from-issue/SKILL.md | 23 +++++------------- .agents/skills/create-github-pr/SKILL.md | 2 -- .agents/skills/fix-security-issue/SKILL.md | 28 ++++++---------------- 3 files changed, 13 insertions(+), 40 deletions(-) diff --git a/.agents/skills/build-from-issue/SKILL.md b/.agents/skills/build-from-issue/SKILL.md index c91c712b6..dbb5396cd 100644 --- a/.agents/skills/build-from-issue/SKILL.md +++ b/.agents/skills/build-from-issue/SKILL.md @@ -185,7 +185,7 @@ gh issue comment --body "$(cat <<'EOF' - ### Documentation Impact -- +- --- *Revision 1 — initial plan* @@ -431,18 +431,9 @@ Do not proceed to PR creation if E2E verification is not green. ### Step 11: Update Documentation -Use the `arch-doc-writer` sub-agent to update architecture documentation. 
Use the Task tool: - -``` -Task tool with subagent_type="arch-doc-writer" -``` - -In the prompt, provide: -- Which files were changed and why (from the plan + any deviations) -- The issue context (what was built/fixed) -- Which architecture docs in `architecture/` are likely affected - -Launch one `arch-doc-writer` instance per documentation file that needs updating. If no documentation changes are needed, the `arch-doc-writer` will make that determination. +Review the documentation requirements in `AGENTS.md` and update any affected +docs as part of the implementation. Keep documentation changes scoped to the +behavior or subsystem that changed. ### Step 12: Commit and Push @@ -502,10 +493,9 @@ Closes # ## Checklist - [x] Follows Conventional Commits - [x] Commits are signed off (DCO) -- [x] Architecture docs updated (if applicable) **Documentation updated:** -- ``: +- ``: EOF )" ``` @@ -537,7 +527,7 @@ PR: [#](https://github.com/OWNER/REPO/pull/) - E2E: ### Docs updated -- +- The issue will auto-close when the PR is merged. EOF @@ -691,7 +681,6 @@ User says: "Build issue #42" 7. Add unit tests for pagination logic, integration tests for both endpoints 8. `mise run pre-commit` passes on first attempt 9. E2E tests skipped (no changes under `e2e/`) -10. `arch-doc-writer` updates `architecture/gateway.md` with pagination details 10. Commit, push, create PR with `Closes #42` 11. Post summary comment on issue with PR link 12. 
Update labels: remove `state:in-progress` + `state:review-ready`, add `state:pr-opened` diff --git a/.agents/skills/create-github-pr/SKILL.md b/.agents/skills/create-github-pr/SKILL.md index 9050db6de..e4d9f81b2 100644 --- a/.agents/skills/create-github-pr/SKILL.md +++ b/.agents/skills/create-github-pr/SKILL.md @@ -158,7 +158,6 @@ PR descriptions must follow the project's [PR template](.github/PULL_REQUEST_TEM ## Checklist - [ ] Follows Conventional Commits - [ ] Commits are signed off (DCO) -- [ ] Architecture docs updated (if applicable) ``` Populate the testing checklist based on what was actually run. Check boxes for steps that were completed. @@ -193,7 +192,6 @@ Closes #456 - [x] Follows Conventional Commits - [x] Commits are signed off (DCO) -- [ ] Architecture docs updated (if applicable) EOF )" ``` diff --git a/.agents/skills/fix-security-issue/SKILL.md b/.agents/skills/fix-security-issue/SKILL.md index 22ffd4254..75703c4bf 100644 --- a/.agents/skills/fix-security-issue/SKILL.md +++ b/.agents/skills/fix-security-issue/SKILL.md @@ -157,24 +157,10 @@ If the review identified a specific exploit scenario, verify that it is no longe ## Step 7: Update Documentation -Use the `arch-doc-writer` sub-agent to update any architecture documentation affected by the fix. Use the Task tool: - -``` -Task tool with subagent_type="arch-doc-writer" -``` - -In the prompt, provide: -- Which files were changed and why -- The security context (what vulnerability was fixed) -- Which architecture docs in `architecture/` are likely affected - -The `arch-doc-writer` will determine which docs need updating and make the changes. Common cases include: -- A new validation layer or middleware was added -- An API contract changed (new required headers, changed error responses, etc.) 
-- Access control or authentication flow was modified -- Network or infrastructure security boundaries changed - -If the fix is purely internal (e.g., switching to parameterized queries with no external behavior change), documentation updates may not be needed -- let the `arch-doc-writer` make that determination. +Review the documentation requirements in `AGENTS.md` and update any affected +docs as part of the security fix. If the fix is purely internal, such as +switching to parameterized queries with no external behavior change, +documentation updates may not be needed. ## Step 8: Commit, Push, and Open PR @@ -232,7 +218,7 @@ Closes # - **Integration/E2E:** ### Documentation Updated -- ``: +- ``: ### Verification @@ -281,7 +267,7 @@ User says: "Fix security issue #42" 4. Create branch `fix/security-42-input-sanitization` 5. Implement the fix 6. Add unit tests for the sanitization function and an integration test for the endpoint -7. Run `arch-doc-writer` to update `architecture/sandbox.md` with the new input validation layer +7. Update affected documentation per `AGENTS.md`, if needed 8. Commit, push, and open PR with `Closes #42` 9. Report the PR link and changes to the user @@ -294,7 +280,7 @@ User says: "Fix any ready security issues" 3. Fetch the review comment -- determination is "Legitimate concern" 4. Implement parameterized queries 5. Add `test_rejects_sql_injection_in_search_query` unit test and e2e test for the search endpoint -6. `arch-doc-writer` updates API docs to note the query parameter validation +6. Update affected documentation per `AGENTS.md`, if needed 7. 
Commit, push, open PR with `Closes #78`, report to user ### Issue with non-actionable review From 0b36be51041faef9fce9930a4dc068e5bb4647e0 Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Wed, 6 May 2026 15:57:29 -0700 Subject: [PATCH 3/5] docs(architecture): document driver philosophy Signed-off-by: Drew Newberry --- architecture/README.md | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/architecture/README.md b/architecture/README.md index 3e566364d..b54475f1b 100644 --- a/architecture/README.md +++ b/architecture/README.md @@ -49,6 +49,35 @@ flowchart TB 5. Agent network traffic goes through the sandbox proxy. The proxy allows, denies, inspects, or routes requests according to policy and inference configuration. 6. Connect, exec, and file sync traffic use a gateway relay to the sandbox supervisor. The gateway does not require direct inbound access to sandbox workloads. +## Driver Philosophy + +OpenShell should integrate with the compute ecosystem instead of replacing it. +Drivers adapt Docker, Podman, Kubernetes, and VM runtimes to a common sandbox +contract, while those runtimes keep ownership of scheduling, image management, +storage primitives, GPU/device exposure, and platform lifecycle. + +Drivers should stay thin: + +- Translate OpenShell sandbox specs into native runtime objects. +- Inject the supervisor, sandbox identity, callback configuration, and runtime + credentials needed by the supervisor. +- Report lifecycle and platform events back through the shared driver contract. +- Preserve native runtime behavior unless it conflicts with the sandbox security + contract. + +The supervisor is where OpenShell-specific enforcement belongs. Filesystem +policy, process privilege reduction, network proxying, inference interception, +credential injection, log emission, and gateway relay behavior should be +consistent across runtimes. 
If a runtime needs special handling, keep that logic +inside the driver or crate README rather than leaking it into the core sandbox +model. + +When adding a new runtime, prefer native APIs and conventions over bespoke +orchestration. The driver should make OpenShell feel like a well-behaved member +of that ecosystem: observable through standard tools, compatible with existing +image and device workflows, and clear about the small set of assumptions +OpenShell requires. + ## Architecture Docs Architecture docs are short subsystem overviews. User-facing how-to content From 71e1f7496a75617a79411b3e681aa40945aaa8e6 Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Wed, 6 May 2026 23:30:36 -0700 Subject: [PATCH 4/5] docs(architecture): update architecture overview --- architecture/README.md | 204 ++++++++++++++++++++++++++++------------- 1 file changed, 142 insertions(+), 62 deletions(-) diff --git a/architecture/README.md b/architecture/README.md index b54475f1b..5a1d82831 100644 --- a/architecture/README.md +++ b/architecture/README.md @@ -1,32 +1,87 @@ # OpenShell Architecture -OpenShell runs AI agents in sandboxed environments behind a gateway control -plane. The gateway owns API access, persistence, credentials, and lifecycle -orchestration. A compute runtime creates sandbox workloads. Each sandbox runs a -supervisor that launches the agent as a restricted child process and enforces -policy locally. +OpenShell runs autonomous AI agents in sandboxed environments with explicit +policy, credential, identity, and network boundaries. The target architecture is +built around three stable runtime components: the **CLI**, the **Gateway**, and +the **Supervisor**. + +The CLI, SDK, and TUI provide user-facing access. The gateway is the +authenticated control plane: it owns API access, durable state, policy and +settings delivery, provider and inference configuration, and relay +coordination. 
The supervisor runs inside every sandbox workload and is the local +security boundary. It launches the agent as a restricted child process and +enforces policy where process identity, filesystem access, network egress, and +runtime credentials are visible. + +Infrastructure-specific work sits behind integration boundaries. Compute, +credentials, control-plane identity, and sandbox identity each have a driver or +adapter boundary so OpenShell can integrate with native runtimes, secret stores, +identity providers, and workload identity systems without moving those concerns +into the core gateway or sandbox model. ```mermaid flowchart TB - CLI["CLI / SDK / TUI"] -->|"gRPC or HTTP"| GW["Gateway"] - GW --> DB[("Gateway database")] - GW --> DRIVER["Compute runtime
Docker, Podman, Kubernetes, VM"] - DRIVER --> SBX["Sandbox workload"] + subgraph USER["User Interfaces"] + CLI["CLI"] + SDK["SDK"] + TUI["TUI"] + end + + subgraph CP["Control Plane"] + GW["Gateway core"] + DB[("Shared persistence")] + COMPUTE["Compute"] + CREDS["Credentials"] + CPIDENT["Control-plane identity"] + SIDENT["Sandbox identity"] + CDRV["Compute driver"] + CRDRV["Credentials driver"] + CPIDRV["Control-plane identity driver"] + SIDRV["Sandbox identity driver"] + end + + subgraph INFRA["Integrated Infrastructure"] + RUNTIME["Docker / Podman / Kubernetes / VM"] + SECRETSTORE["e.g., Keychain / Secret Service / Vault / Kubernetes Secrets"] + IDP["e.g., mTLS / OIDC / Local identity"] + WORKLOADID["e.g., SPIFFE / Gateway-issued workload identity"] + end - subgraph SBX["Sandbox workload"] + subgraph DP["Sandbox Data Plane"] SUP["Supervisor"] PROXY["Policy proxy"] ROUTER["Inference router"] - AGENT["Agent process"] POLICY["OPA policy engine"] - SUP --> AGENT - AGENT --> PROXY - PROXY --> POLICY - PROXY --> ROUTER + AGENT["Restricted agent process"] end - SUP -->|"config, credentials, logs, relay"| GW - PROXY -->|"allowed network traffic"| EXT["External services"] + CLI -->|"gRPC / HTTP"| GW + SDK -->|"gRPC / HTTP"| GW + TUI -->|"gRPC / HTTP"| GW + + GW --> DB + GW --> COMPUTE + GW --> CREDS + GW --> CPIDENT + GW --> SIDENT + + COMPUTE -->|"gRPC / UDS"| CDRV + CREDS -->|"gRPC / UDS"| CRDRV + CPIDENT -->|"gRPC / UDS"| CPIDRV + SIDENT -->|"gRPC / UDS"| SIDRV + + CDRV --> RUNTIME + CRDRV --> SECRETSTORE + CPIDRV --> IDP + SIDRV --> WORKLOADID + RUNTIME -->|"provisions workload"| SUP + + SUP -->|"outbound control, config, logs, relay"| GW + SUP -->|"spawn + restrict"| AGENT + AGENT -->|"all ordinary egress"| PROXY + PROXY -->|"evaluate"| POLICY + PROXY -->|"allowed traffic"| EXT["External services"] + PROXY -->|"inference.local"| ROUTER ROUTER -->|"managed inference"| MODEL["Inference backends"] ``` @@ -34,49 +89,69 @@ flowchart TB | Component | Boundary | |---|---| -|
Gateway | Authenticated control plane, state store, provider records, sandbox lifecycle, relay coordination. | -| Compute runtime | Driver-specific creation and deletion of sandbox workloads. | -| Sandbox supervisor | Local sandbox setup, credential injection, policy polling, SSH relay, log push. | -| Policy proxy | Mandatory egress path for agent traffic and policy decisions. | -| Inference router | Sandbox-local forwarding for `https://inference.local`. | - -## Request Flow - -1. A user creates or manages a sandbox through the CLI, SDK, or TUI. -2. The gateway persists state and asks the selected compute runtime to create a workload. -3. The sandbox supervisor starts, fetches policy, settings, providers, and inference routes from the gateway. -4. The supervisor launches the agent as a restricted user in an isolated environment. -5. Agent network traffic goes through the sandbox proxy. The proxy allows, denies, inspects, or routes requests according to policy and inference configuration. -6. Connect, exec, and file sync traffic use a gateway relay to the sandbox supervisor. The gateway does not require direct inbound access to sandbox workloads. - -## Driver Philosophy - -OpenShell should integrate with the compute ecosystem instead of replacing it. -Drivers adapt Docker, Podman, Kubernetes, and VM runtimes to a common sandbox -contract, while those runtimes keep ownership of scheduling, image management, -storage primitives, GPU/device exposure, and platform lifecycle. - -Drivers should stay thin: - -- Translate OpenShell sandbox specs into native runtime objects. -- Inject the supervisor, sandbox identity, callback configuration, and runtime - credentials needed by the supervisor. -- Report lifecycle and platform events back through the shared driver contract. -- Preserve native runtime behavior unless it conflicts with the sandbox security - contract. - -The supervisor is where OpenShell-specific enforcement belongs. 
Filesystem -policy, process privilege reduction, network proxying, inference interception, -credential injection, log emission, and gateway relay behavior should be -consistent across runtimes. If a runtime needs special handling, keep that logic -inside the driver or crate README rather than leaking it into the core sandbox -model. - -When adding a new runtime, prefer native APIs and conventions over bespoke -orchestration. The driver should make OpenShell feel like a well-behaved member -of that ecosystem: observable through standard tools, compatible with existing -image and device workflows, and clear about the small set of assumptions -OpenShell requires. +| CLI, SDK, TUI | User-facing management surfaces. They talk to the gateway and do not need to know which infrastructure drivers are active. | +| Gateway | Authenticated control plane, API server, durable state, policy and settings delivery, provider and inference config, supervisor session ownership, and relay coordination. | +| Compute subsystem | Sandbox lifecycle semantics: creation, deletion, watching, reconciliation, and state transitions. Platform provisioning details belong to the compute driver. | +| Credentials subsystem | Logical provider and credential resolution. Secret storage and platform-native credential access belong to credentials drivers. | +| Control-plane identity | Authentication and authorization for users, operators, and API clients. External identity verification belongs to identity drivers. | +| Sandbox identity | Workload identity for supervisors and sandbox-to-sandbox authorization. Identity issuance or verification belongs to sandbox identity drivers. | +| Supervisor | Sandbox-local security boundary. It prepares isolation, fetches config, injects credentials, runs relay endpoints, starts the proxy, and launches restricted agent processes. | +| Policy proxy | Mandatory egress path for agent traffic. 
It enforces destination, binary identity, SSRF, TLS/L7, credential injection, and inference interception rules. | +| Inference router | Sandbox-local forwarding for `https://inference.local` to configured model backends. | + +## Integrating with the Ecosystem + +OpenShell should integrate with infrastructure ecosystems instead of replacing +them. The core value is safe, policy-enforced agent execution. Runtimes, +schedulers, secret stores, identity providers, workload identity systems, image +pipelines, storage, and GPU or device exposure should remain owned by the +platforms that already provide them. + +The gateway owns OpenShell control-plane semantics: sandbox state, lifecycle +ordering, policy and settings resolution, credential mapping, authorization, +inference configuration, and relay coordination. Drivers translate those +semantics into platform-native operations. They should stay thin, preserve +native behavior by default, and report platform lifecycle events back through +the shared contracts. + +The supervisor owns OpenShell sandbox semantics. Filesystem policy, process +privilege reduction, network proxying, inference interception, credential +injection, security logging, and gateway relay behavior should remain +consistent across runtimes. + +This keeps OpenShell usable in local single-player setups, Kubernetes +deployments, VM-backed sandboxes, and future third-party environments. A new +integration should make OpenShell feel like a well-behaved member of that +ecosystem. + +## Gateways and Sandboxes + +The gateway and sandbox split control-plane authority from runtime enforcement. +The gateway owns durable platform state: sandboxes, policy revisions, runtime +settings, provider records, inference configuration, session records, and +authorization decisions. A sandbox owns the local execution boundary: process +identity, filesystem access, network egress, credential injection, local logs, +and the agent child process. 
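
The division of authority above can be sketched as a small supervisor-side reconciliation step. This is an illustrative Rust sketch under stated assumptions, not OpenShell's actual API: `DesiredState` and `reconcile` are hypothetical names for the behavior of applying gateway-delivered state and keeping last-known-good config when a refresh fails.

```rust
// Hypothetical sketch of the gateway -> supervisor desired-state flow.
// `DesiredState` and `reconcile` are illustrative names, not real OpenShell APIs.

#[derive(Clone, Debug, PartialEq)]
struct DesiredState {
    policy_revision: u64,
    settings_revision: u64,
}

/// Apply a freshly fetched desired state if the refresh succeeded;
/// otherwise keep the last-known-good state rather than dropping enforcement.
fn reconcile(
    last_known_good: Option<DesiredState>,
    fetched: Result<DesiredState, String>,
) -> Option<DesiredState> {
    match fetched {
        Ok(state) => Some(state),
        Err(_) => last_known_good,
    }
}

fn main() {
    let good = DesiredState { policy_revision: 3, settings_revision: 1 };
    // A successful refresh replaces the current state.
    let applied = reconcile(None, Ok(good.clone()));
    assert_eq!(applied, Some(good.clone()));
    // A failed refresh keeps last-known-good instead of clearing it.
    let kept = reconcile(applied, Err("gateway unreachable".to_string()));
    assert_eq!(kept, Some(good));
}
```

The design choice this encodes: a failed gateway refresh degrades to stale-but-enforced policy rather than to no policy at all.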
+ +The relationship is supervisor-initiated. Each sandbox supervisor connects +outbound to a known gateway endpoint, authenticates as a sandbox workload, and +keeps a live session open for control traffic and relays. This avoids requiring +every compute driver to solve gateway-to-sandbox reachability through pod IPs, +bridge networks, port mappings, NAT traversal, or bespoke tunnels. The common +runtime requirement is narrower: the supervisor must be able to reach the +gateway. + +The gateway delivers desired state; the sandbox applies it locally. Policy, +settings, credentials, and inference routes flow from the gateway to the +supervisor. The supervisor validates and applies what can change at runtime, +keeps last-known-good config when refresh fails, and leaves static isolation +controls in place until the sandbox is recreated. + +Live operations use the same authenticated gateway-supervisor relationship. +Config refresh, policy updates, credential delivery, log push, connect, exec, +file sync, and relay setup are multiplexed over supervisor sessions. If a +session drops, the sandbox may keep running, but live operations fail or become +unreachable until the supervisor reconnects and reconciles state. ## Architecture Docs @@ -92,5 +167,10 @@ that crate's `README.md`. | [Compute Runtimes](compute-runtimes.md) | Docker, Podman, Kubernetes, VM, sandbox images, and runtime-specific responsibilities. | | [Build](build.md) | Build artifacts, CI/E2E, docs site validation, and release packaging. | -For broad design proposals, use `rfc/`. For temporary working plans, use the -ignored `architecture/plans/` directory. +## `rfc/` vs `architecture/` + +For broad design proposals, use `rfc/`. Once an RFC is adopted, the relevant details should be written back into the architecture docs. + +`architecture/` serves as the canonical reference for OpenShell's design and architecture. + +`rfc/` facilitates discussion and ensures features are appropriately designed.
Adopted RFCs remain useful for understanding the context in which architecture decisions were made. From 34b8507d2c3ee0e624aed62e9d97c5a0de864cd5 Mon Sep 17 00:00:00 2001 From: Drew Newberry Date: Wed, 6 May 2026 23:36:20 -0700 Subject: [PATCH 5/5] docs(architecture): detail gateway persistence --- architecture/gateway.md | 46 +++++++++++++++++++++++++++++++++++++---- 1 file changed, 42 insertions(+), 4 deletions(-) diff --git a/architecture/gateway.md b/architecture/gateway.md index 9c7f3a8d3..f36878cf1 100644 --- a/architecture/gateway.md +++ b/architecture/gateway.md @@ -56,10 +56,48 @@ names, creation timestamps, and labels. Crate-level details live in ## Persistence -The gateway stores protobuf payloads with indexed object metadata. SQLite is the -default local store; Postgres is supported for deployments that need an external -database. Persisted state includes sandboxes, providers, SSH sessions, policy -revisions, settings, inference configuration, and deployment records. +The gateway persistence layer is a protobuf object store. Domain services store +typed protobuf messages as opaque binary payloads, while the database keeps a +small set of indexed metadata columns for lookup, listing, versioning, and +workflow state. The implementation lives in the +[gateway persistence module](../crates/openshell-server/src/persistence/mod.rs); +backend-specific SQL lives in the SQLite and Postgres migration directories +under `crates/openshell-server/migrations/`. + +The storage schema is intentionally narrow: + +| Column | Purpose | +|---|---| +| `id` | Stable gateway-generated object ID and primary key. | +| `object_type` | Logical resource kind, such as `sandbox`, `provider`, `ssh_session`, `inference_route`, `sandbox_policy`, or `draft_policy_chunk`. | +| `name` | Human-readable name, unique within an object type when present. | +| `scope` | Optional owner or namespace for scoped/versioned records, such as a sandbox ID for policy revisions.
| +| `version` | Optional monotonically increasing version for scoped records. | +| `status` | Optional workflow state for records such as policy revisions or draft policy chunks. | +| `dedup_key` and `hit_count` | Optional policy-advisor fields for coalescing repeated observations. | +| `payload` | Prost-encoded protobuf payload for the full domain object. | +| `created_at_ms` and `updated_at_ms` | Gateway timestamps used for ordering and list output. | +| `labels` | JSON object carrying Kubernetes-style object labels for filtering and organization. | + +Common resources use generic helpers that derive `object_type`, `id`, `name`, +and labels from protobuf metadata traits before encoding the full message into +`payload`. Policy revisions and draft policy chunks use the same table but also +populate `scope`, `version`, `status`, `dedup_key`, and `hit_count` so the +gateway can efficiently fetch the latest policy, track load status, and manage +advisor drafts without creating resource-specific tables. + +SQLite is the default local store; Postgres is supported for deployments that +need an external database or multi-replica coordination. Both backends expose +the same `Store` API and the same logical schema. Backend differences stay +inside the adapters: for example, SQLite stores labels as JSON text and payloads +as `BLOB`, while Postgres stores labels as `JSONB` and payloads as `BYTEA`. +Domain code should depend on the object-store contract, not SQL dialect details. +This keeps the gateway data model portable across storage backends and leaves +room for future stores that can provide the same object, label, version, and +scope semantics. + +Persisted state includes sandboxes, providers, SSH sessions, policy revisions, +settings, inference configuration, and deployment records. Policy and runtime settings are delivered together through the effective sandbox config path. A gateway-global policy can override sandbox-scoped policy. The