refactor(api): split monolithic OpenShell gRPC service #1226

@drew

Problem Statement

The generated OpenShell gRPC service trait is monolithic. Any new RPC added to proto/openshell.proto becomes a required method on every test fake that implements open_shell_server::OpenShell, even when the test only exercises unrelated gateway behavior such as TLS, HTTP/gRPC multiplexing, WebSocket tunneling, or auth routing.

PR #1170 exposed the maintenance cost: adding provider profile registry RPCs forced unrelated integration tests to add UNIMPLEMENTED stubs just to keep compiling. This couples test fixtures and future API changes to the full gateway surface area.
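
The coupling can be illustrated with a minimal sketch. It uses a simplified, synchronous stand-in for the generated trait (the real trait is async and generated by tonic). The `health` method, the `Status` enum, and the `TlsTestFake` type are hypothetical names chosen for illustration; only the three provider profile RPC names come from PR #1170.

```rust
// Simplified stand-in for a tonic status; hypothetical, not the real type.
#[derive(Debug, PartialEq)]
pub enum Status {
    Unimplemented(&'static str),
}

// Stand-in for the monolithic generated trait. Every method is required.
pub trait OpenShell {
    // Hypothetical RPC that a TLS test might actually exercise.
    fn health(&self) -> Result<String, Status>;
    // Provider profile RPCs added by PR #1170.
    fn import_provider_profiles(&self) -> Result<u32, Status>;
    fn lint_provider_profiles(&self) -> Result<u32, Status>;
    fn delete_provider_profile(&self, name: &str) -> Result<(), Status>;
}

// A fake for a TLS-only test. It cares about `health` alone, yet it must
// still stub every provider RPC to compile.
pub struct TlsTestFake;

impl OpenShell for TlsTestFake {
    fn health(&self) -> Result<String, Status> {
        Ok("ok".to_string())
    }
    fn import_provider_profiles(&self) -> Result<u32, Status> {
        Err(Status::Unimplemented("ImportProviderProfiles"))
    }
    fn lint_provider_profiles(&self) -> Result<u32, Status> {
        Err(Status::Unimplemented("LintProviderProfiles"))
    }
    fn delete_provider_profile(&self, _name: &str) -> Result<(), Status> {
        Err(Status::Unimplemented("DeleteProviderProfile"))
    }
}
```

Removing any one of the three stubs makes the `impl` block fail to compile, which is the churn described above.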

Proposed Design

Split the public gRPC API into focused services instead of one broad OpenShell service. A likely starting shape:

  • SandboxService: sandbox lifecycle, watch, exec, SSH session APIs.
  • ProviderService: provider records and provider profile registry APIs.
  • PolicyService: config updates, policy status, draft policy workflow, policy analysis.
  • SupervisorService: sandbox config, gateway config, provider environment, supervisor log push, supervisor control stream.
  • RelayService: raw relay stream used by SSH and exec tunneling.

The gateway can still bind all services on the same listener and preserve the existing multiplexing behavior. Client code should receive a compatibility layer or staged migration path so CLI and SDK changes can happen incrementally.
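
Serving several services on one listener works because a gRPC request path has the form `/<package>.<Service>/<Method>`, so a single listener can route by service name. The sketch below models that routing with the standard library only. It is not the gateway's implementation: the `Gateway` type, the `openshell.v1` package name, and the handler closures are assumptions made for illustration.

```rust
use std::collections::HashMap;

// A handler receives the method name and returns a textual response.
// Hypothetical stand-in for a real service implementation.
type Handler = Box<dyn Fn(&str) -> String>;

// Toy model of one listener that hosts several named services.
pub struct Gateway {
    services: HashMap<String, Handler>,
}

impl Gateway {
    pub fn new() -> Self {
        Gateway { services: HashMap::new() }
    }

    // Register a service under its fully qualified name.
    pub fn add_service(mut self, name: &str, handler: Handler) -> Self {
        self.services.insert(name.to_string(), handler);
        self
    }

    // Route a request path of the form "/<package>.<Service>/<Method>".
    // Returns None for malformed paths or unknown services.
    pub fn dispatch(&self, path: &str) -> Option<String> {
        let rest = path.strip_prefix('/')?;
        let (service, method) = rest.split_once('/')?;
        let handler = self.services.get(service)?;
        Some(handler(method))
    }
}

// Two focused services bound to the same gateway (package name is assumed).
pub fn build_gateway() -> Gateway {
    Gateway::new()
        .add_service(
            "openshell.v1.SandboxService",
            Box::new(|method: &str| format!("sandbox:{}", method)),
        )
        .add_service(
            "openshell.v1.ProviderService",
            Box::new(|method: &str| format!("provider:{}", method)),
        )
}
```

Under this model, splitting the service changes only the path prefix that clients send; the listener and its multiplexing stay the same.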

Tests should then implement only the service traits they exercise. For example, TLS and HTTP/gRPC multiplex tests should not need provider profile stubs, and provider command tests should not need supervisor relay stubs.
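
A minimal sketch of what this could look like once the trait is split, again using simplified synchronous stand-ins for the generated traits. The method signatures, the `health` RPC, and the `probe_gateway` helper are hypothetical; the real generated traits would be async and carry request and response types.

```rust
// Stand-ins for two of the focused service traits (signatures are assumed).
pub trait SandboxService {
    // Hypothetical RPC used by a TLS or multiplex test.
    fn health(&self) -> Result<String, String>;
}

pub trait ProviderService {
    // Returns the number of imported profiles in this toy model.
    fn import_provider_profiles(&self, profiles: &[&str]) -> Result<usize, String>;
}

// The TLS test fake implements only the trait it exercises.
pub struct TlsTestFake;

impl SandboxService for TlsTestFake {
    fn health(&self) -> Result<String, String> {
        Ok("ok".to_string())
    }
}

// The provider command test fake implements only ProviderService.
pub struct ProviderTestFake;

impl ProviderService for ProviderTestFake {
    fn import_provider_profiles(&self, profiles: &[&str]) -> Result<usize, String> {
        Ok(profiles.len())
    }
}

// A test helper can name exactly the trait it depends on.
pub fn probe_gateway<S: SandboxService>(service: &S) -> bool {
    service.health().is_ok()
}
```

Adding a method to `ProviderService` in this sketch would require changes to `ProviderTestFake` only; `TlsTestFake` keeps compiling untouched.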

Alternatives Considered

A shared test stub would reduce immediate boilerplate and is a good short-term cleanup. It does not remove the underlying API coupling because every new RPC still expands the same generated trait.

A macro for test stubs would reduce repeated code, but it hides behavior and still leaves all tests coupled to every RPC on the monolithic trait.

Keeping the single service is simplest operationally, but it makes the trait harder to evolve and increases unrelated churn in integration tests.

Agent Investigation

The diff anchor linked from PR #1170 points to crates/openshell-server/tests/multiplex_tls_integration.rs. The PR adds three provider profile RPCs to proto/openshell.proto:

  • ImportProviderProfiles
  • LintProviderProfiles
  • DeleteProviderProfile

Because tonic generates a single required OpenShell trait, these provider-specific RPCs had to be stubbed in unrelated local test services. Repository search found multiple local implementations of the generated trait across CLI and server integration tests, including:

  • crates/openshell-server/tests/multiplex_integration.rs
  • crates/openshell-server/tests/multiplex_tls_integration.rs
  • crates/openshell-server/tests/ws_tunnel_integration.rs
  • crates/openshell-server/tests/edge_tunnel_auth.rs
  • crates/openshell-server/tests/auth_endpoint_integration.rs
  • crates/openshell-cli/tests/provider_commands_integration.rs
  • crates/openshell-cli/tests/ensure_providers_integration.rs
  • crates/openshell-cli/tests/mtls_integration.rs
  • crates/openshell-cli/tests/sandbox_create_lifecycle_integration.rs
  • crates/openshell-cli/tests/sandbox_name_fallback_integration.rs

This issue tracks the long-term architectural fix. A separate short-term task could introduce shared test fixtures to reduce repeated boilerplate before the service split.

Acceptance Criteria

  • proto/openshell.proto or successor proto files define focused services rather than one monolithic OpenShell service.
  • The gateway serves the focused services on the existing gateway listener without regressing HTTP/gRPC multiplexing, TLS, WebSocket tunnel, or supervisor relay behavior.
  • CLI and server clients are migrated or provided compatibility wrappers.
  • Integration tests implement only the service traits required by each scenario.
  • Adding a provider-specific RPC no longer requires edits to unrelated sandbox, TLS, multiplex, WebSocket, or supervisor tests.
  • Architecture documentation is updated to describe the service boundaries and migration path.
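
One possible shape for the compatibility wrapper named in the criteria above is a facade that keeps the old single-client call sites and forwards each call to the matching focused client. This is only a sketch under stated assumptions: `OpenShellCompat`, the client traits, the `create_sandbox` method, and the stub clients are all hypothetical, and the stubs return a string describing the call instead of performing an RPC.

```rust
// Stand-ins for focused generated clients (names and signatures assumed).
pub trait SandboxClient {
    fn create_sandbox(&self, name: &str) -> String;
}

pub trait ProviderClient {
    fn delete_provider_profile(&self, name: &str) -> String;
}

// Facade exposing the old flat method surface on top of focused clients,
// so CLI and SDK call sites can migrate one at a time.
pub struct OpenShellCompat<S, P> {
    sandbox: S,
    provider: P,
}

impl<S: SandboxClient, P: ProviderClient> OpenShellCompat<S, P> {
    pub fn new(sandbox: S, provider: P) -> Self {
        OpenShellCompat { sandbox, provider }
    }

    // Forwards to the sandbox-focused client.
    pub fn create_sandbox(&self, name: &str) -> String {
        self.sandbox.create_sandbox(name)
    }

    // Forwards to the provider-focused client.
    pub fn delete_provider_profile(&self, name: &str) -> String {
        self.provider.delete_provider_profile(name)
    }
}

// Stub clients that record which service and method would be called.
pub struct StubSandbox;

impl SandboxClient for StubSandbox {
    fn create_sandbox(&self, name: &str) -> String {
        format!("SandboxService/CreateSandbox({})", name)
    }
}

pub struct StubProvider;

impl ProviderClient for StubProvider {
    fn delete_provider_profile(&self, name: &str) -> String {
        format!("ProviderService/DeleteProviderProfile({})", name)
    }
}
```

Because the facade is generic over the client traits, a test can construct it with stubs, while production code would pass the real generated clients.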
