186 changes: 93 additions & 93 deletions .agents/skills/sunscreen-orchestrator/SKILL.md

Large diffs are not rendered by default.

150 changes: 75 additions & 75 deletions .agents/skills/sunscreen-test-harness/SKILL.md
@@ -1,55 +1,55 @@
---
name: sunscreen-test-harness
description: Use sempre que o usuario pedir testes de verdade, validacao pesada, integracao real, test harness, QA end-to-end, stress, anti-flake, release QA, cargo-dist, Anchor/Solana/Codama real, Pinocchio SBF real, Surfpool/test-validator, frontend typecheck, plugin runtime, CI hardening ou provar que o app sunscreen esta funcionando. Tambem use para reexecutar, atualizar, corrigir, expandir ou auditar ondas de testes do sunscreen.
description: Use whenever the user asks for real tests, heavy validation, real integration, test harness work, end-to-end QA, stress, anti-flake, release QA, cargo-dist, real Anchor/Solana/Codama, real Pinocchio SBF, Surfpool/test-validator, frontend typecheck, plugin runtime, CI hardening, or proof that the sunscreen app actually works. Also use to re-run, update, fix, expand, or audit sunscreen test waves.
---

# Sunscreen Test Harness

Orquestra o time de validacao pesada do `sunscreen`. A meta e provar comportamento real sem transformar todo teste em uma dependencia fragil de rede/toolchain. Separe sempre o que e smoke offline, o que e heavy local gated, e o que validou uma toolchain Solana real.
Orchestrates the heavy-validation team for `sunscreen`. The goal is to prove real behaviour without turning every test into a fragile network/toolchain dependency. Always separate offline smoke, gated heavy-local, and what actually exercised a real Solana toolchain.

## Team

- `test-harness-orchestrator`: lider da rodada, le `summary.json`, delega tiers e consolida status.
- `qa-integrator`: lider de qualidade e fechamento da rodada.
- `test-strategist`: matriz de risco, tiers, criterios de aceite e donos.
- `offline-ci-owner`: fmt/clippy/test/build/no-default e command-group smokes.
- `real-anchor-codama-owner`: Anchor/Solana/Codama/pnpm/node reais.
- `pinocchio-sbf-owner`: Pinocchio e `cargo build-sbf` real.
- `serve-runtime-owner`: Surfpool/test-validator, watcher, portas, build trigger e teardown.
- `plugin-runtime-qa`: manifesto, stdio JSON-RPC, gRPC, sandbox, marketplace e dynamic scaffold.
- `frontend-codegen-owner`: hooks/clientes, Next/Vite, pnpm install e typecheck.
- `release-distribution-qa`: cargo-dist, binario release, instalador, changelog, docs e completions.
- `flake-perf-auditor`: repeticao, timeouts, cold-start e instabilidade.
- `test-harness-orchestrator`: round leader, reads `summary.json`, delegates tiers, and consolidates status.
- `qa-integrator`: quality lead and round closer.
- `test-strategist`: risk matrix, tiers, acceptance criteria, and owners.
- `offline-ci-owner`: fmt/clippy/test/build/no-default and command-group smokes.
- `real-anchor-codama-owner`: real Anchor/Solana/Codama/pnpm/node.
- `pinocchio-sbf-owner`: Pinocchio with real `cargo build-sbf`.
- `serve-runtime-owner`: Surfpool/test-validator, watcher, ports, build trigger, and teardown.
- `plugin-runtime-qa`: manifest, stdio JSON-RPC, gRPC, sandbox, marketplace, and dynamic scaffold.
- `frontend-codegen-owner`: hooks/clients, Next/Vite, pnpm install, and typecheck.
- `release-distribution-qa`: cargo-dist, release binary, installer, changelog, docs, and completions.
- `flake-perf-auditor`: repetition, timeouts, cold-start, and instability.

## Phase 0: Current State

1. Leia `AGENTS.md`, `CLAUDE.md`, `ROADMAP.md`, `.github/workflows/ci.yml`, `tests/**`, `scripts/integration-heavy.sh` e `git status`.
2. Confirme se o pedido e uma rodada de teste, expansao do harness, auditoria de CI ou validacao de release.
3. Se existirem logs em `_workspace/test-harness/`, trate como historico, nao como prova atual.
4. Preserve mudancas locais do usuario.
1. Read `AGENTS.md`, `CLAUDE.md`, `ROADMAP.md`, `.github/workflows/ci.yml`, `tests/**`, `scripts/integration-heavy.sh`, and `git status`.
2. Confirm whether the request is a test round, a harness expansion, a CI audit, or a release validation.
3. If logs already exist under `_workspace/test-harness/`, treat them as history, not as current proof.
4. Preserve the user's local changes.

## Execution Mode

Use modo hibrido:
Use hybrid mode:

- Se subagentes estiverem disponiveis e o usuario pediu harness/equipe, delegue auditorias independentes para os especialistas.
- Se subagentes nao estiverem disponiveis, execute localmente seguindo os donos acima.
- Nunca marque um tier como aprovado apenas porque um teste ignored/skipped retornou sucesso.
- If subagents are available and the user asked for harness/team work, delegate independent audits to the specialists.
- If subagents are not available, run locally following the ownership map above.
- Never mark a tier as passing just because an ignored/skipped test returned success.

## Orchestrator Flow

1. `test-harness-orchestrator` abre a rodada e registra o escopo em `_workspace/test-harness/orchestrator-report.md`.
2. `test-strategist` cria a matriz de risco quando o pedido for amplo.
3. `offline-ci-owner` roda `bash scripts/integration-heavy.sh`.
4. O orquestrador le o `*.summary.json` mais recente e classifica cada tier.
5. Tiers skipped ou blocked sao delegados aos especialistas certos apenas quando o usuario pediu aquela validacao.
6. `qa-integrator` fecha o relatorio final com evidencias e proximo menor passo.
1. `test-harness-orchestrator` opens the round and records the scope in `_workspace/test-harness/orchestrator-report.md`.
2. `test-strategist` builds the risk matrix when the request is broad.
3. `offline-ci-owner` runs `bash scripts/integration-heavy.sh`.
4. The orchestrator reads the most recent `*.summary.json` and classifies each tier.
5. Skipped or blocked tiers are delegated to the right specialists only when the user requested that validation.
6. `qa-integrator` closes the round with the final report, evidence, and smallest next step.

## Test Tiers

### Tier 1: Offline Deterministic Gate

Roda em qualquer maquina e no CI normal.
Runs on any machine and in normal CI.

```bash
cargo fmt --all -- --check
@@ -61,34 +61,34 @@ cargo test --locked --test compile_generated_workspace
cargo build --locked --release --all-features
```

Aceite: todos passam, sem snapshot drift, sem warning clippy, sem feature-gate quebrado.
Acceptance: everything passes, no snapshot drift, no clippy warnings, no broken feature gate.

### Tier 2: Generated Workspace Compile Gate

Valida que workspaces gerados continuam compilaveis com dependencias reais/cache local quando aplicavel.
Confirms generated workspaces still compile with real dependencies / local cache when applicable.

```bash
SUNSCREEN_COMPILE_TESTS=1 cargo test --locked --test compile_generated -- --nocapture
cargo test --locked --test compile_generated_workspace -- --nocapture
```

Aceite: suites executam de verdade. Se `compile_generated` pular por cache/dependencia ausente, registre bloqueio.
Acceptance: the suites actually run. If `compile_generated` skips because of a missing cache/dependency, log a blocker.

### Tier 3: Real Anchor And Codama Gate

Valida Anchor/Solana/Codama/pnpm/node reais.
Validates real Anchor/Solana/Codama/pnpm/node.

```bash
SUNSCREEN_REAL_TOOLCHAIN=1 bash scripts/integration-heavy.sh
cargo test --locked --test integration_anchor -- --ignored --nocapture
SUNSCREEN_FRONTEND_COMPILE_TESTS=1 cargo test --locked --test generate generated_frontend_hooks_typecheck_vanilla_next_project_when_dependencies_are_installed -- --ignored --nocapture
```

Aceite: `anchor`, `solana`, `pnpm`, `node`, `cargo`, `rustc` e `codama` foram encontrados; os testes ignored executaram cenarios reais e nao apenas imprimiram SKIP.
Acceptance: `anchor`, `solana`, `pnpm`, `node`, `cargo`, `rustc`, and `codama` were all found; the ignored tests exercised real scenarios instead of just printing SKIP.

### Tier 4: Pinocchio SBF Gate

Valida Pinocchio com Solana SBF real.
Validates Pinocchio with real Solana SBF.

```bash
ROOT="$(pwd)"
@@ -98,34 +98,34 @@ tmp="$(mktemp -d)"
(cd "$tmp/real_pin" && "$ROOT/target/release/sunscreen" --json chain build --headless)
```

Aceite: `cargo build-sbf` real executa no workspace Pinocchio e Anchor-only guards continuam sem mutacao.
Acceptance: real `cargo build-sbf` runs on the Pinocchio workspace and the Anchor-only guards stay unchanged.

### Tier 5: Serve Runtime Gate

Valida runtime, watcher e teardown com Surfpool/test-validator quando a maquina tiver a toolchain.
Validates runtime, watcher, and teardown with Surfpool/test-validator when the machine has the toolchain.

```bash
cargo test --locked --test chain_serve -- --nocapture
cargo test --locked --test runtime_serve_loop --test runtime_watch_loop --test runtime_validator -- --nocapture
```

Aceite: runtime real sobe, portas ficam prontas quando verificaveis, watcher dispara build, eventos NDJSON sao parseaveis e Ctrl-C encerra filhos.
Acceptance: the real runtime comes up, ports become ready when verifiable, the watcher triggers builds, NDJSON events are parseable, and Ctrl-C terminates the children.

### Tier 6: Plugin Runtime Gate

Valida runtime, watcher, plugin lifecycle e comandos dinamicos.
Validates runtime, watcher, plugin lifecycle, and dynamic commands.

```bash
cargo test --locked --test app_lifecycle -- --nocapture
cargo test --locked plugin::stdio plugin::grpc plugin::sandbox plugin::manifest
./target/release/sunscreen app marketplace --json
```

Aceite: plugin local executa, sandbox rejeita traversal, app/scaffold dinamico mantem exit codes, e gRPC e reportado como contrato/stub se ainda nao tiver runtime real.
Acceptance: a local plugin runs, sandbox rejects traversal, dynamic app/scaffold keep their exit codes, and gRPC is reported as a contract/stub if no real runtime fixture exists yet.

### Tier 7: Release And Install Gate

Valida o binario que usuarios baixariam.
Validates the binary users would download.

```bash
cargo build --locked --release --all-features
@@ -135,22 +135,22 @@ SUNSCREEN_DIST=1 bash scripts/integration-heavy.sh
cargo dist plan
```

Aceite: release binary funciona, dist plan corresponde aos targets esperados, changelog/notas/docs estao coerentes. Nao crie tag/release sem pedido explicito.
Acceptance: the release binary works, the dist plan matches the expected targets, and changelog/notes/docs stay consistent. Do not create a tag/release without explicit instruction.

### Tier 8: Flake And Performance Gate

Reexecuta suites criticas e mede cold-start.
Re-runs critical suites and measures cold-start.

```bash
SUNSCREEN_FLAKE_RUNS=5 bash scripts/integration-heavy.sh
RUNS=30 bash scripts/bench.sh
```

Aceite: nenhuma falha intermitente; cold-start p95 continua dentro do alvo documentado ou regressao fica reportada.
Acceptance: no intermittent failures; cold-start p95 stays inside the documented target or any regression is reported.
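
The repetition idea behind `SUNSCREEN_FLAKE_RUNS` can be sketched as a small loop. This is only an illustration and assumes nothing about how `scripts/integration-heavy.sh` implements it; `run_repeated` is a hypothetical helper name, not part of the repository.

```shell
# Hypothetical helper: run a command N times and stop at the first failure.
# Usage: run_repeated <runs> <command> [args...]
run_repeated() {
  runs="$1"
  shift
  i=1
  while [ "$i" -le "$runs" ]; do
    # Any non-zero exit counts as an intermittent failure for this round.
    "$@" || { printf 'run %s of %s failed\n' "$i" "$runs"; return 1; }
    i=$((i + 1))
  done
  return 0
}
```

For instance, `run_repeated 5 cargo test --locked --test chain_serve` would fail the gate as soon as one of the five runs fails, which matches the "no intermittent failures" acceptance criterion.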

## Standard Runner

Prefira o runner unico para rodadas locais:
Prefer the single runner for local rounds:

```bash
bash scripts/integration-heavy.sh
@@ -161,49 +161,49 @@ SUNSCREEN_FRONTEND_COMPILE_TESTS=1 bash scripts/integration-heavy.sh
SUNSCREEN_REAL_TOOLCHAIN=1 SUNSCREEN_PINOCCHIO_SBF=1 SUNSCREEN_FRONTEND_COMPILE_TESTS=1 SUNSCREEN_DIST=1 SUNSCREEN_FLAKE_RUNS=5 bash scripts/integration-heavy.sh
```

Variaveis:
Variables:

- `SUNSCREEN_COMPILE_TESTS=1`: liga compile tests gated.
- `SUNSCREEN_REAL_TOOLCHAIN=1`: exige toolchain real e roda `integration_anchor --ignored`.
- `SUNSCREEN_PINOCCHIO_SBF=1`: exige Solana/Cargo SBF e roda build Pinocchio real.
- `SUNSCREEN_FRONTEND_COMPILE_TESTS=1`: exige Node/pnpm e roda typecheck de hooks frontend gerados.
- `SUNSCREEN_DIST=1`: exige `cargo dist` e roda `cargo dist plan`.
- `SUNSCREEN_FLAKE_RUNS=N`: repete o smoke de CLI `N` vezes.
- `SUNSCREEN_HEAVY_LOG_DIR=path`: muda o diretorio de logs.
- `SUNSCREEN_COMPILE_TESTS=1`: enables the gated compile tests.
- `SUNSCREEN_REAL_TOOLCHAIN=1`: requires a real toolchain and runs `integration_anchor --ignored`.
- `SUNSCREEN_PINOCCHIO_SBF=1`: requires Solana/Cargo SBF and runs the real Pinocchio build.
- `SUNSCREEN_FRONTEND_COMPILE_TESTS=1`: requires Node/pnpm and typechecks the generated frontend hooks.
- `SUNSCREEN_DIST=1`: requires `cargo dist` and runs `cargo dist plan`.
- `SUNSCREEN_FLAKE_RUNS=N`: re-runs the CLI smoke `N` times.
- `SUNSCREEN_HEAVY_LOG_DIR=path`: changes the log directory.

## Reporting

Relate sempre:
Always report:

- Comandos executados.
- Versoes de ferramentas reais.
- Tiers aprovados, falhos, skipped e blocked.
- Evidencia de que testes ignored/gated executaram de verdade.
- Arquivos/logs em `_workspace/test-harness/`.
- `*.summary.json` da rodada, com status por tier.
- Proximo menor passo para transformar bloqueio em cobertura real.
- Commands executed.
- Real tool versions.
- Tiers that passed, failed, were skipped, or were blocked.
- Evidence that ignored/gated tests actually ran.
- Files/logs under `_workspace/test-harness/`.
- The round's `*.summary.json`, with per-tier status.
- The smallest next step that converts a blocker into real coverage.

## False Green Rules

- `#[ignore]` + `--ignored` nao e cobertura real se o corpo imprimiu `SKIP`.
- Fake `PATH` cobre contrato offline, nao comportamento real de Anchor/Solana.
- `cargo test --all` pode esconder suites gated por env var; registre isso explicitamente.
- `compile_generated_workspace` usa shims locais; ele nao substitui dependencias reais de Anchor/Pinocchio.
- `cargo dist plan` local nao equivale a release publicada.
- gRPC de plugin pode estar coberto como contrato/stub; nao chame isso de transporte real sem fixture runtime.
- `doctor --json` reportando tool ausente e diagnostico, nao falha do CLI.
- `#[ignore]` + `--ignored` is not real coverage if the body printed `SKIP`.
- A fake `PATH` covers the offline contract, not real Anchor/Solana behaviour.
- `cargo test --all` can hide suites gated by env vars; record that explicitly.
- `compile_generated_workspace` uses local shims; it does not replace real Anchor/Pinocchio dependencies.
- A local `cargo dist plan` is not equivalent to a published release.
- The plugin gRPC path may be covered as contract/stub; do not call that a real transport without a runtime fixture.
- `doctor --json` reporting a missing tool is a diagnostic, not a CLI failure.
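
The first rule above could be checked mechanically along these lines. This is a rough sketch, assuming the test output was captured to a log file and that a skipped body prints a line containing `SKIP`; the helper name `tier_really_ran` and the log layout are hypothetical.

```shell
# Hypothetical helper: succeed only when the captured log is non-empty
# and contains no SKIP marker.
# Usage: tier_really_ran <log-file>
tier_really_ran() {
  log_file="$1"
  # A missing or empty log is not evidence that anything ran.
  [ -s "$log_file" ] || return 1
  # Conservative check: any line mentioning SKIP disqualifies the tier.
  if grep -q 'SKIP' "$log_file"; then
    return 1
  fi
  return 0
}
```

The check is deliberately strict: it would rather report a tier as unproven than let a test that only printed `SKIP` count as green.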

## Test Scenarios

Normal:
Happy path:

1. Usuario pede "validar tudo com testes pesados".
2. Rode `bash scripts/integration-heavy.sh`.
3. Se o usuario quer real toolchain, rode com `SUNSCREEN_REAL_TOOLCHAIN=1`.
4. Entregue relatorio por tier.
1. The user asks to "validate everything with heavy tests".
2. Run `bash scripts/integration-heavy.sh`.
3. If the user wants a real toolchain, run it with `SUNSCREEN_REAL_TOOLCHAIN=1`.
4. Deliver the per-tier report.

Error flow:

1. `SUNSCREEN_REAL_TOOLCHAIN=1` falha porque `anchor` ou `codama` nao existe.
2. Marque `blocked_by_missing_tool`.
3. Nao chame a rodada de verde; proponha instalar/provisionar a toolchain ou mover esse tier para runner dedicado.
1. `SUNSCREEN_REAL_TOOLCHAIN=1` fails because `anchor` or `codama` is missing.
2. Mark it as `blocked_by_missing_tool`.
3. Do not call the round green; propose installing/provisioning the toolchain or moving that tier to a dedicated runner.
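
The error flow above can be sketched as a pre-flight check using `command -v`. The status `blocked_by_missing_tool` comes from the text; the helper name `classify_toolchain` and the `tools_present` status are assumptions made for illustration.

```shell
# Hypothetical helper: report whether every required tool is on PATH.
# Usage: classify_toolchain <tool> [tool...]
classify_toolchain() {
  missing=""
  for tool in "$@"; do
    # command -v is the POSIX way to test whether a command resolves.
    command -v "$tool" >/dev/null 2>&1 || missing="$missing $tool"
  done
  if [ -n "$missing" ]; then
    printf 'blocked_by_missing_tool:%s\n' "$missing"
    return 1
  fi
  printf 'tools_present\n'
}
```

A round could call `classify_toolchain anchor solana pnpm node cargo rustc codama` before setting `SUNSCREEN_REAL_TOOLCHAIN=1`, and record the blocked status instead of reporting green when the call fails.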
34 changes: 17 additions & 17 deletions .claude/agents/apt-publisher.md
@@ -1,34 +1,34 @@
---
name: apt-publisher
description: Publica releases do sunscreen como pacote .deb via apt-get usando cargo-deb + Cloudsmith (ou repositório APT hospedado em GitHub Pages) a cada tag vX.Y.Z. Caminho simples sem PPA do Launchpad.
description: Publishes sunscreen releases as .deb packages via apt-get using cargo-deb + Cloudsmith (or a GitHub Pages-hosted APT repo) on every vX.Y.Z tag. Simple path, no Launchpad PPA.
model: opus
---

# APT Publisher

## Core Role
Manter o canal APT (`apt install sunscreen`) sincronizado com cada release. Você é dono de `[package.metadata.deb]` no `Cargo.toml`, do job `publish-apt`, e do repositório APT (Cloudsmith por padrão; GitHub Pages como fallback gratuito).
Keep the APT channel (`apt install sunscreen`) in sync with every release. You own `[package.metadata.deb]` in `Cargo.toml`, the `publish-apt` job, and the APT repository itself (Cloudsmith by default; GitHub Pages as the free fallback).

## Principles
- **Caminho mais simples vence.** Evitar Launchpad PPA (requer GPG, dput, sponsorship). Em vez disso:
- **Default: Cloudsmith** (`cloudsmith-io/action`). Free tier cobre projetos open-source. Token como secret `CLOUDSMITH_API_KEY`. Repo público: `cloudsmith.io/~sunscreen/repos/sunscreen-cli`.
- **Fallback**: repo APT estático em `gh-pages` branch usando `aptly` ou `apt-ftparchive`. Mais setup, zero custo, sem dependência externa.
- Build: `cargo install cargo-deb` → `cargo deb --no-build --target x86_64-unknown-linux-gnu` consumindo o binário já compilado pelo job `build-local`. Repetir para `aarch64`.
- `[package.metadata.deb]` no `Cargo.toml`: `maintainer`, `depends = "$auto"`, `section = "devel"`, `priority = "optional"`, `assets` apontando para o binário em `target/*/release/sunscreen`.
- Idempotente: Cloudsmith rejeita upload duplicado por (name, version, arch) — capturar 409 como sucesso no rerun.
- Versão Debian: `X.Y.Z-1` (cargo-deb adiciona `-1` automaticamente). Pre-releases (`-rc.1`) viram `X.Y.Z~rc.1` (tilde para ordering correto).
- **Simplest path wins.** Avoid Launchpad PPA (requires GPG, dput, sponsorship). Instead:
- **Default: Cloudsmith** (`cloudsmith-io/action`). Free tier covers open-source projects. Token stored as secret `CLOUDSMITH_API_KEY`. Public repo: `cloudsmith.io/~sunscreen/repos/sunscreen-cli`.
- **Fallback**: static APT repo on the `gh-pages` branch using `aptly` or `apt-ftparchive`. More setup, zero cost, no external dependency.
- Build: `cargo install cargo-deb` → `cargo deb --no-build --target x86_64-unknown-linux-gnu`, consuming the binary already compiled by the `build-local` job. Repeat for `aarch64`.
- `[package.metadata.deb]` in `Cargo.toml`: `maintainer`, `depends = "$auto"`, `section = "devel"`, `priority = "optional"`, `assets` pointing at the binary in `target/*/release/sunscreen`.
- Idempotent: Cloudsmith rejects duplicate uploads keyed on (name, version, arch) — treat a 409 on rerun as success.
- Debian version: `X.Y.Z-1` (cargo-deb appends `-1` automatically). Pre-releases (`-rc.1`) become `X.Y.Z~rc.1` (tilde for correct ordering).
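
The pre-release mapping described in the last bullet can be illustrated with a small helper. This is a sketch of the mapping only, not of what cargo-deb does internally; `deb_upstream_version` is a hypothetical name, and the Debian revision suffix (`-1`) is deliberately left out.

```shell
# Hypothetical helper: map a git tag such as v1.2.3-rc.1 to the Debian
# upstream version string, replacing the first hyphen with a tilde.
# Usage: deb_upstream_version <tag>
deb_upstream_version() {
  version="${1#v}"   # strip the leading "v" from the tag
  case "$version" in
    *-*)
      # Text before the first hyphen, a tilde, then the pre-release part.
      printf '%s~%s\n' "${version%%-*}" "${version#*-}"
      ;;
    *)
      printf '%s\n' "$version"
      ;;
  esac
}
```

The tilde matters because Debian version comparison sorts it before everything else, so on a Debian-based system `dpkg --compare-versions 1.2.3~rc.1 lt 1.2.3` succeeds and a release candidate never outranks the final release.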

## I/O Protocol
- **Input**: tag `vX.Y.Z` + binários Linux compilados pelo job `build-local`.
- **Input**: tag `vX.Y.Z` plus Linux binaries built by the `build-local` job.
- **Output**:
- `[package.metadata.deb]` em `Cargo.toml`.
- Job `publish-apt` no `.github/workflows/release.yml` (matrix amd64/arm64).
- Seção APT em `docs/reference/distribution.md`: como adicionar o repo (chave GPG do Cloudsmith + `apt sources.list`), comando `apt install`.
- Reportar em `_workspace/done_apt-publisher.md`: URLs dos .deb no Cloudsmith, comandos smoke (`apt-get update && apt-get install sunscreen`).
- `[package.metadata.deb]` in `Cargo.toml`.
- `publish-apt` job in `.github/workflows/release.yml` (amd64/arm64 matrix).
- APT section in `docs/reference/distribution.md`: how to add the repo (Cloudsmith GPG key + `apt sources.list` entry), `apt install` command.
- Report in `_workspace/done_apt-publisher.md`: Cloudsmith URLs for the .deb artifacts, smoke commands (`apt-get update && apt-get install sunscreen`).

## Team Communication
- **Coordenar com `release-orchestrator`** para `needs: [build-local]` (precisa do binário, não só do tarball).
- **Coordenar com `homebrew-publisher` e `snap-publisher`** para garantir version string idêntica.
- **Coordinate with `release-orchestrator`** to set `needs: [build-local]` (the binary is required, not just the tarball).
- **Coordinate with `homebrew-publisher` and `snap-publisher`** to guarantee an identical version string across channels.

## Re-run Behavior
Se `_workspace/done_apt-publisher.md` existe, leia-o e aplique apenas o delta.
If `_workspace/done_apt-publisher.md` exists, read it and apply only the delta.