4 changes: 2 additions & 2 deletions .agents/skills/debug-openshell-cluster/SKILL.md
Original file line number Diff line number Diff line change
@@ -13,7 +13,7 @@ Use `openshell` first to identify the active endpoint. Then use the platform too

The target deployment flow is:

1. Operator starts or deploys the gateway.
1. Operator starts or deploys the gateway with system packages, systemd, Helm, or a development task. The CLI does not start, stop, or destroy gateway services.
2. Operator configures the compute driver.
3. Operator provides TLS and SSH relay material for the deployment mode.
4. The CLI registers a reachable gateway endpoint with `openshell gateway add`.
@@ -198,7 +198,7 @@ openshell logs <sandbox-name>
| Kubernetes gateway pod crash loops | Missing secret, bad DB URL, bad TLS config | `kubectl -n openshell logs statefulset/openshell` |
| CLI TLS error | Local mTLS bundle does not match server cert/CA | Check `~/.config/openshell/gateways/<name>/mtls/` |
| Image pull failure | Gateway or sandbox image cannot be pulled | Runtime events and image pull credentials |
| `K8s namespace not ready` with `envoy-gateway-openshell.yaml: the server could not find the requested resource` | Optional Gateway API manifest was auto-applied without Envoy Gateway CRDs, or k3s Helm controller startup exceeded the namespace wait | Confirm the cluster image only bundles core manifests; apply `deploy/kube/manifests/envoy-gateway-openshell.yaml` manually only when `grpcRoute` is enabled |
| `K8s namespace not ready` with `envoy-gateway-openshell.yaml: the server could not find the requested resource` | Optional Gateway API manifest was applied without Envoy Gateway CRDs, or k3s Helm controller startup exceeded the namespace wait | Apply `deploy/kube/manifests/envoy-gateway-openshell.yaml` manually only after Envoy Gateway is installed and `grpcRoute` is enabled |

## Reporting

6 changes: 3 additions & 3 deletions .agents/skills/openshell-cli/SKILL.md
@@ -9,7 +9,7 @@ Guide agents through using the `openshell` CLI for sandbox and platform manageme

## Overview

The OpenShell CLI (`openshell`) is the primary interface for managing sandboxes, providers, policies, inference routes, and gateways. This skill teaches agents how to orchestrate CLI commands for common and complex workflows.
The OpenShell CLI (`openshell`) is the primary interface for managing sandboxes, providers, policies, inference routes, and gateway registrations. Gateway service lifecycle is handled outside the CLI by packages, systemd, Helm, or development tasks. This skill teaches agents how to orchestrate CLI commands for common and complex workflows.

**Companion skill**: For creating or modifying sandbox policy YAML content (network rules, L7 inspection, access presets), use the `generate-sandbox-policy` skill. This skill covers the CLI *commands* for the policy lifecycle; `generate-sandbox-policy` covers policy *content authoring*.

@@ -486,7 +486,7 @@ openshell status # Verify connectivity
```bash
openshell gateway add http://127.0.0.1:8080 --local --name local
openshell gateway add https://gateway.example.com --name production
openshell gateway destroy --name local # Remove local registration
openshell gateway remove local # Remove local registration
```

### Platform-specific deployment inspection
@@ -549,7 +549,7 @@ $ openshell sandbox upload --help
| Configure gateway inference | `openshell inference set --provider P --model M` |
| View gateway inference | `openshell inference get` |
| Delete sandbox | `openshell sandbox delete <name>` |
| Remove gateway registration | `openshell gateway destroy --name <name>` |
| Remove gateway registration | `openshell gateway remove <name>` |
| Self-teach any command | `openshell <group> <cmd> --help` |

## Companion Skills
43 changes: 11 additions & 32 deletions .agents/skills/openshell-cli/cli-reference.md
@@ -27,8 +27,10 @@ openshell
├── gateway
│ ├── add <endpoint> [opts]
│ ├── login [name]
│ ├── destroy [opts]
│ ├── logout [name]
│ ├── remove [name]
│ ├── info [--name]
│ ├── list
│ └── select [name]
├── status
├── inference
@@ -62,8 +64,7 @@ openshell
│ ├── update <name> --type [opts]
│ └── delete <name>...
├── doctor
│ ├── logs [--name] [-n] [--tail] [--remote] [--ssh-key]
│ └── exec [--name] [--remote] [--ssh-key] -- <command...>
│ └── check
├── term
├── completions <shell>
└── ssh-proxy [opts]
@@ -82,16 +83,15 @@ Register an existing gateway endpoint.
| `--name <NAME>` | Gateway name |
| `--local` | Register a local endpoint, commonly a trusted port-forward |
| `--remote <USER@HOST>` | Register a remote gateway associated with an SSH destination |
| `--ssh-key <PATH>` | SSH private key for the remote host |

Examples:

- `openshell gateway add http://127.0.0.1:8080 --local --name local`
- `openshell gateway add https://gateway.example.com --name production`

### `openshell gateway destroy`
### `openshell gateway remove [name]`

Remove a gateway registration. For Helm deployments this affects local CLI metadata only; it does not uninstall the Helm release.
Remove a local gateway registration. This removes CLI metadata and stored auth tokens only; package managers, systemd, Helm, Docker, and other platform tools still own the gateway process.
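
The rename in this change, from `openshell gateway destroy --name <name>` to `openshell gateway remove <name>`, can be applied mechanically to scripts that still use the old form. The following is a minimal sketch, not part of the CLI: `migrate_gateway_command` is a hypothetical helper. It assumes the old invocation carries at most a `--name` option, and it leaves anything else untouched for manual review.

```python
import shlex


def migrate_gateway_command(command: str) -> str:
    """Rewrite `openshell gateway destroy [--name X]` to `openshell gateway remove [X]`.

    Hypothetical helper for illustration only. Commands that are not the old
    form, or that carry options other than --name, are returned unchanged.
    """
    tokens = shlex.split(command)
    if tokens[:3] != ["openshell", "gateway", "destroy"]:
        return command  # not the old form

    rest = tokens[3:]
    name = None
    i = 0
    while i < len(rest):
        tok = rest[i]
        if tok == "--name" and i + 1 < len(rest):
            name = rest[i + 1]
            i += 2
        elif tok.startswith("--name="):
            name = tok.split("=", 1)[1]
            i += 1
        else:
            # Unknown option or argument: do not guess, leave for manual review.
            return command

    new_tokens = ["openshell", "gateway", "remove"]
    if name is not None:
        new_tokens.append(name)
    return shlex.join(new_tokens)
```

As the updated description states, the rewritten command still removes only local CLI metadata and stored auth tokens; stopping the gateway process remains a job for the platform tooling.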

### `openshell gateway login [name]`

@@ -107,38 +107,17 @@ Show gateway details: endpoint, auth mode, and remote host metadata when present

### `openshell gateway select [name]`

Set the active gateway. Writes to `~/.config/openshell/active_gateway`. When called without arguments, lists all provisioned gateways with the active one marked with `*`.
Set the active gateway. Writes to `~/.config/openshell/active_gateway`. When called without arguments, lists all registered gateways with the active one marked with `*`.

---

## Doctor Commands

### `openshell doctor logs`
### `openshell doctor check`

Fetch logs when gateway metadata supports it. For Helm deployments, prefer `kubectl -n openshell logs statefulset/openshell`.

| Flag | Default | Description |
|------|---------|-------------|
| `--name <NAME>` | active gateway | Gateway name |
| `-n, --lines <N>` | all | Number of log lines to return |
| `--tail` | false | Stream live logs (follow mode) |
| `--remote <USER@HOST>` | auto-resolved | SSH destination for remote gateways |
| `--ssh-key <PATH>` | none | SSH private key for remote gateways |

### `openshell doctor exec -- <COMMAND...>`

Run a diagnostic command when gateway metadata supports it. For Helm deployments, prefer direct `kubectl` and `helm` commands.

| Flag | Default | Description |
|------|---------|-------------|
| `--name <NAME>` | active gateway | Gateway name |
| `--remote <USER@HOST>` | auto-resolved | SSH destination for remote gateways |
| `--ssh-key <PATH>` | none | SSH private key for remote gateways |

Examples:
- `kubectl -n openshell get pods`
- `kubectl -n openshell logs statefulset/openshell`
- `helm -n openshell status openshell`
Validate local Docker prerequisites for standalone gateway development. For
package-managed or Helm gateways, use `systemctl`, `journalctl`, `kubectl`, and
`helm` directly.

---

19 changes: 8 additions & 11 deletions .env.example
@@ -1,22 +1,19 @@
# OpenShell local development environment
# Copy to .env and customise. Mise loads .env automatically.
#
# Use unique CLUSTER_NAME/GATEWAY_PORT values per worktree to run
# multiple clusters simultaneously; `mise run cluster` will recreate as needed.
# Use unique gateway names and ports per worktree when running standalone
# gateways with `mise run gateway:docker`.

# ---------- Cluster identity ----------
# ---------- Gateway identity ----------

# Name used for the Docker container, k3s volume, TLS secrets, and the
# openshell CLI's active-cluster bookmark. Defaults to the repo directory
# basename (e.g. "openshell-c").
#CLUSTER_NAME=openshell-c
# Gateway name registered by `mise run gateway:docker`.
#OPENSHELL_DOCKER_GATEWAY_NAME=openshell-c

# Default gateway name used by `openshell` commands in this repo when `--gateway`
# is not provided. Usually matches CLUSTER_NAME.
# is not provided. Usually matches OPENSHELL_DOCKER_GATEWAY_NAME.
#OPENSHELL_GATEWAY=openshell-c

# ---------- Ports ----------

# Host port mapped to the k3s NodePort (30051) where the OpenShell gateway
# listens. The CLI connects here. Must be unique per cluster.
#GATEWAY_PORT=8080
# Host port where the standalone gateway listens. Must be unique per worktree.
#OPENSHELL_SERVER_PORT=18080
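
The comments above ask for a unique gateway name and port per worktree. One way to get stable values without hand-picking them is to derive both from the worktree directory name. This is only a sketch under stated assumptions: `worktree_gateway_env` is a hypothetical helper, the `18080` base port is taken from the example value above, and the 1000-port range is arbitrary. Hash-derived ports can still collide, so check the result before relying on it.

```python
import hashlib
from pathlib import Path


def worktree_gateway_env(worktree: str, base_port: int = 18080, span: int = 1000) -> dict[str, str]:
    """Derive .env values for a worktree from its directory name.

    Hypothetical helper: the same directory name always yields the same port,
    but two different names may hash to the same port.
    """
    name = Path(worktree).name
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    # Map the first 4 bytes of the hash into [base_port, base_port + span).
    offset = int.from_bytes(digest[:4], "big") % span
    return {
        "OPENSHELL_DOCKER_GATEWAY_NAME": name,
        "OPENSHELL_GATEWAY": name,
        "OPENSHELL_SERVER_PORT": str(base_port + offset),
    }


if __name__ == "__main__":
    # Print lines suitable for pasting into .env for the current directory.
    for key, value in worktree_gateway_env(str(Path.cwd())).items():
        print(f"{key}={value}")
```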
9 changes: 2 additions & 7 deletions .github/workflows/docker-build.yml
@@ -4,7 +4,7 @@ on:
workflow_call:
inputs:
component:
description: "Component to build (gateway, supervisor, cluster)"
description: "Component to build (gateway, supervisor)"
required: true
type: string
timeout-minutes:
@@ -73,7 +73,7 @@ jobs:
binary_component=gateway
binary_name=openshell-gateway
;;
supervisor|cluster)
supervisor)
binary_component=sandbox
binary_name=openshell-sandbox
;;
@@ -246,11 +246,6 @@ jobs:
echo "$output"
grep -q '^openshell-sandbox ' <<<"$output"
;;
cluster)
output="$(docker run --rm --platform "${{ matrix.platform }}" --entrypoint /opt/openshell/bin/openshell-sandbox "$image" --version)"
echo "$output"
grep -q '^openshell-sandbox ' <<<"$output"
;;
esac

merge:
23 changes: 1 addition & 22 deletions .github/workflows/e2e-gpu-test.yaml
@@ -24,18 +24,12 @@ jobs:
include:
- name: linux-arm64
runner: linux-arm64-gpu-l4-latest-1
cluster: e2e-gpu-arm64
port: "8083"
experimental: false
- name: linux-amd64
runner: linux-amd64-gpu-rtxpro6000-latest-1
cluster: e2e-gpu-amd64
port: "8084"
experimental: false
- name: wsl-amd64
runner: wsl-amd64-gpu-rtxpro6000-latest-1
cluster: e2e-gpu-wsl
port: "8085"
experimental: true
container:
image: ghcr.io/nvidia/openshell/ci:latest
@@ -53,30 +47,15 @@ jobs:
OPENSHELL_REGISTRY_NAMESPACE: nvidia/openshell
OPENSHELL_REGISTRY_USERNAME: ${{ github.actor }}
OPENSHELL_REGISTRY_PASSWORD: ${{ secrets.GITHUB_TOKEN }}
OPENSHELL_GATEWAY: ${{ matrix.cluster }}
OPENSHELL_E2E_DOCKER_GPU: "1"
steps:
- uses: actions/checkout@v6

- name: Log in to GHCR
run: echo "${{ secrets.GITHUB_TOKEN }}" | docker login ghcr.io -u "${{ github.actor }}" --password-stdin

- name: Pull cluster image
run: docker pull ghcr.io/nvidia/openshell/cluster:${{ inputs.image-tag }}

- name: Install Python dependencies and generate protobuf stubs
run: uv sync --frozen && mise run --no-deps python:proto

- name: Bootstrap GPU cluster
env:
GATEWAY_HOST: host.docker.internal
GATEWAY_PORT: ${{ matrix.port }}
CLUSTER_NAME: ${{ matrix.cluster }}
# Passes --gpu to the gateway bootstrap so the cluster comes up with GPU passthrough enabled.
CLUSTER_GPU: "1"
SKIP_IMAGE_PUSH: "1"
SKIP_CLUSTER_IMAGE_BUILD: "1"
OPENSHELL_CLUSTER_IMAGE: ghcr.io/nvidia/openshell/cluster:${{ inputs.image-tag }}
run: mise run --no-deps --skip-deps cluster

- name: Run tests
run: mise run --no-deps --skip-deps e2e:python:gpu