Skip to content

Gateway compat CI (fork validation)#3

Closed
ranjeshj wants to merge 387 commits into
masterfrom
user/ranjeshj/testing
Closed

Gateway compat CI (fork validation)#3
ranjeshj wants to merge 387 commits into
masterfrom
user/ranjeshj/testing

Conversation

@ranjeshj

Copy link
Copy Markdown
Owner

Validating the W1-W5 gateway compatibility CI changes within the fork before any upstream PR. Do not merge - this PR exists to drive the workflows on real CI.

Branch contains 12 commits covering:

  • LKG version pinning (gateway-lkg.json + GatewayLkg.cs + drift-detection test)
  • Compile-time-gated tray.testhook.* MCP tool surface with Release-build safety net
  • Spike workflow (validated on real CI, ~2m12s)
  • Fake LLM server inside WSL
  • Full GatewayCompatFixture harness (smoke + gateway tier)
  • 7 real gateway-tier scenarios (operator pair, node pair, tool events, chat round-trip, node.invoke, reconnect, config patch)
  • gateway-compat.yml workflow (PR-gating Smoke + Gateway LKG cell; nightly latest matrix)
  • gateway-lkg-bump.yml scheduled poll + auto-PR

Expected: ci.yml passes; gateway-compat Smoke passes (~3min); gateway-compat Gateway tier vs LKG attempts a full WSL+openclaw run (~10-15 min). First real run may need timing tweaks.

shanselman and others added 30 commits May 13, 2026 13:04
UI fixes: skills redesign, workspace caching, sidebar/voice width
…y-diagnostic-helper

refactor: extract CopyDiagnostic helper for diagnostic copy methods
Add a shared ClipboardHelper for text copy operations and route existing WinUI clipboard writes through it while preserving the chat timeline flush behavior and App.CopyTextToClipboard API.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…etup" dismiss

Two bugs reported by Scott Hanselman against master:

1. Tray app launched the onboarding wizard on every start even when the
   user already had a working remote-gateway operator configuration.
   StartupSetupState.RequiresSetup only short-circuited for node mode
   (EnableNodeMode + node device token) or MCP-only mode, so an operator
   with a non-default gateway URL + stored device token still got the
   wizard popped at OnLaunched.

   Fix: add an operator-mode short-circuit that requires BOTH a stored
   operator device token AND a non-default GatewayUrl (guards against
   orphan tokens after uninstall and against half-finished setups that
   never picked a gateway target).

2. On the SetupWarning page warn-and-confirm UI, clicking "Keep my setup"
   only toggled in-page state. Because OnboardingWindow defaulted
   SetupPath = Advanced when existing config was detected, the global
   nav-bar Next button stayed enabled, so the user was one click from
   advancing into ConnectionPage anyway.

   Fix: add OnboardingState.Dismiss() that raises a new Dismissed event;
   OnboardingWindow handles it by setting a _dismissedWithoutCompletion
   guard, then Close()ing the window. OnClosed now skips
   TryCompleteOnboarding when that guard is set so OnboardingCompleted
   is NOT fired and existing settings / gateway connection are preserved.
   SetupWarningPage.CancelReplace calls Props.Dismiss().

   Belt-and-suspenders: drop the auto-default of SetupPath = Advanced for
   existing-config users in OnboardingWindow. With SetupPath left null,
   the nav-bar Next button is disabled on SetupWarning so the user MUST
   pick "Replace my setup", "Keep my setup", or "Advanced setup"
   explicitly — no accidental Next-into-setup path remains.

Tests:
- StartupSetupStateTests: operator paired with remote gateway returns false;
  operator token + default URL still returns true (stale-token guard);
  non-default URL alone (no token) still returns true.
- OnboardingStateTests: Dismiss fires Dismissed but NOT Finished; safe
  without subscribers.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1175 passed

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…rd-helper

Refactor WinUI clipboard text copies
Fixes from a Hanselman adversarial code review (Opus + Codex parallel):

1. Per-gateway tokens (Codex HIGH) — RequiresSetup only scanned the legacy
   root identity (device-key-ed25519.json at the dataPath root). Modern
   pairings via DeviceIdentityStore write tokens at
   <dataPath>/gateways/<gatewayId>/device-key-ed25519.json (see
   GatewayConnectionManager._activeIdentityPath = perGatewayIdentityDir).
   Operators paired post-GatewayRegistry would still see the wizard pop on
   every launch. Fix: HasAnyOperatorDeviceToken now scans the legacy root
   AND every gateways/* subdir.

2. SSH-tunnel false positive (Codex HIGH) — SSH topology routes via
   ws://127.0.0.1:LocalPort and the user typically leaves GatewayUrl at
   default. HasNonDefaultGatewayUrl alone returned false. Fix:
   HasAnyConfiguredGatewayTarget treats (UseSshTunnel + non-empty
   SshTunnelHost) as a configured target.

3. NodeMode + MCP precedence regression (Codex MEDIUM) — original code
   was 'if (NodeMode && nodeToken) false; return !MCP;' which let
   MCP-only mode bypass setup even when NodeMode was accidentally true
   without a node token. The first patch made NodeMode short-circuit
   first, breaking that precedence. Fix: check EnableMcpServer BEFORE
   EnableNodeMode so MCP wins, matching original semantics.

4. _dismissedWithoutCompletion stuck on Close exception (Opus MEDIUM) —
   the flag was set BEFORE Close(); if Close() threw, the flag stayed
   true and TryCompleteOnboarding was permanently suppressed for the
   window's lifetime, wedging the user. Fix: reset the flag in the
   catch block so the X-button / Finish path still works.

5. DefaultGatewayUrl duplication (Opus HIGH) — the constant existed in
   both StartupSetupState and OnboardingExistingConfigGuard with only a
   comment promising sync. Fix: promote
   OnboardingExistingConfigGuard.DefaultGatewayUrl to public const
   (single source of truth) and reference it from StartupSetupState.
   Added DefaultGatewayUrl_MatchesGuardConstant invariant test.

6. CancelReplace UI flash (Opus MEDIUM) — setConfirmingReplace(false)
   was called immediately before Props.Dismiss(), causing a brief
   re-render of the 'Set up locally' button before the window closed.
   Fix: drop the dead state change.

Tests added (5):
- RequiresSetup_ReturnsFalse_WhenSshTunnelConfiguredWithStoredToken
- RequiresSetup_ReturnsTrue_WhenSshTunnelEnabledButNoHostConfigured
- RequiresSetup_ReturnsFalse_WhenOperatorTokenStoredOnlyInPerGatewayDir
- RequiresSetup_ReturnsFalse_WhenMcpEnabledEvenWithNodeModeAndNoNodeToken
- DefaultGatewayUrl_MatchesGuardConstant

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1180 passed (5 new); all 16 onboarding-fix tests green

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replace the duplicate Conversations page with an enhanced Sessions page:

- Remove 'Conversations' nav item (was showing identical data to Sessions)
- Add SelectorBar with channel filter tabs (All + auto-populated per-channel)
- Show per-session context usage as a progress bar (TotalTokens/ContextTokens)
- Display input/output token counts per session (↓in / ↑out)
- 3-row card layout: name+status, provider·model·channel, progress+tokens
- Keep Reset/Compact/Delete action buttons from original SessionsPage
- Redirect legacy 'conversations' nav tag to SessionsPage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…with Fluent rows

UX overhaul of the OpenClaw Tray hub. Capabilities is folded into Permissions
so device-level capability picks and exec-policy/allowlist controls live in
one place. Settings gets a consistent Fluent row-card pattern with auto-save.
Both pages localize ~40 newly-introduced strings.

## Pages
- **PermissionsPage** absorbs the former Capabilities page:
  - Node Mode master toggle + live Node Status card on top
  - Per-capability rows (Browser, Camera, Canvas, Screen, Location, TTS, STT),
    disabled and dimmed when Node Mode is off
  - STT row description notes the Whisper model download trigger
  - STT/TTS engine details render as subtle attached continuation panels
    (no duplicate banner; provider combo + ElevenLabs config for TTS;
    download status + retry hint for STT)
  - Local MCP Server integration card
  - Exec policy: default-action row + rules card with auto-save, count badge,
    Fluent semantic action pills, trash-icon row actions, empty state
  - Node allowlist (gateway-side, read-only)
  - Windows-level privacy launcher row
  - Whisper model auto-download when STT is toggled on, with failure surface
- **SettingsPage** rewrites the old expander layout into row cards:
  General · Notifications · Privacy · Local Gateway (conditional). Auto-save
  with a transient "Saved" toast bottom-right. No Save/Cancel buttons.
- **HubWindow** drops the standalone Capabilities nav item; `"capabilities"`
  tag routes to PermissionsPage for back-compat. Permissions sidebar icon
  switched from key to shield (Glyph EA18). Settings sidebar keeps its gear.
- Home and About/Info pages are untouched and identical to master.

## Localization
- 13 `CapabilitiesPage_*` x:Uid keys renamed to `PermissionsPage_*` (XAML +
  5 locale resw + coverage tests + invariant list)
- 41 new `PermissionsPage_*` resw keys for code-built strings: capability
  labels/descriptions, node status text, STT engine hints, MCP statuses,
  rule-count formatters, allowlist messages, TTS provider status, MCP
  token-read failure format
- Pinned in `LocalizationValidationTests.InvariantOrDeferredResourceKeys`
- New `LocalizationHelper.Format(key, args)` helper catches `FormatException`
  from malformed translations so a translator placeholder typo can't crash
  the UI thread
- New `NoLocale_HasEmptyOrWhitespaceValues` test prevents an empty resw value
  from leaking the raw resource-key into UI via the GetString fallback

## Lifecycle + threading correctness
- `SettingsManager.Saved` subscribe/unsubscribe moved to page `Loaded` /
  `Unloaded` on both pages; the per-navigation handler leak (and the latent
  N² stale-page UI work it caused) is gone
- `EnsureWhisperModelDownloadedAsync` is `async void` with a try/catch
  wrapping the entire body so no path can escape to
  `SynchronizationContext.UnhandledException`; page-local
  `_isDownloadingWhisperModel` + `_whisperDownloadError` give accurate hint
  copy independent of `VoiceService` state
- Whisper-download early-return also defers to
  `VoiceService.IsWhisperDownloadingModel` to avoid concurrent writes to the
  model file
- `OnSettingsSaved` refreshes MCP/STT/TTS cards too, gated by `IsLoaded`;
  `UpdateTtsCard` skips writes to TTS textboxes when `FocusState !=
  Unfocused` so cross-surface saves can't clobber in-progress input
- `UpdateTtsCard` no longer unconditionally clears `TtsStatusText`, so the
  auto-save toast ("Default provider: x", "ElevenLabs settings saved.") is
  no longer wiped one frame later by the dispatched refresh
- `_execSavedHintTimer` / `_savedIndicatorTimer` reused per page instead of
  allocated on every save
- `_execPolicyLoaded` one-shot latch replaced with scoped
  `_loadingExecPolicy` try/finally flag — safe for future reload paths

## Exec policy
- Case-insensitive JSON read (accepts both `pattern` and `Pattern`) to
  recover policy files written by the pre-fix anonymous-type leak; writes
  always use lowercase going forward
- Auto-saves on every mutation (add rule, remove rule, default action
  change). Inline "Saved" pill in the rules-card header, 1.5s
- `NewRuleAction` ComboBox now uses `Tag="allow"/"deny"` rather than reading
  the localizable `Content`, so future translations can't break the
  JSON-on-disk contract

## Tests / validation
- 1161 / 1161 tray tests pass (added `NoLocale_HasEmptyOrWhitespaceValues`)
- All locales preserve format-placeholder parity (existing test)
- Build clean on net10.0-windows10.0.22621.0 / win-arm64
- Two Hanselman-style dual-model adversarial reviews
  (Claude Opus 4.7 + GPT-5.3-Codex) ran across the diff; all HIGH-consensus
  and LOW-consensus-real findings have fixes in this commit

## Master-merge work
- Carried over master's clipboard refactor: `ClipboardHelper.CopyText`
  replaces the `DataPackage` + `Clipboard.SetContent` pair in the MCP
  token/URL copy methods on PermissionsPage

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Remove the orphaned Conversations page files after routing conversations into Sessions, and update the chat root comment to point at SessionsPage.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…gs-info-merge

Merge Capabilities into Permissions; redesign Settings & Permissions with Fluent rows
…ons-page

feat: unify Sessions and Conversations into single Sessions page
Assert sanitized jsonlPath error responses now that internal exception details stay local to logs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Assert the battery failure payload keeps internal exception details out of the response.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…smiss

Addresses Scott Hanselman's review on PR openclaw#340:

Blocking fix:
- OnboardingExistingConfigGuard.GetSummary().HasOperatorDeviceToken only
  checked DeviceIdentity.HasStoredDeviceToken on the legacy root path.
  Modern pairings store the operator token at
  <dataPath>/gateways/<id>/device-key-ed25519.json via DeviceIdentityStore,
  so a fresh-paired user opening Setup/Reconfigure could overwrite a
  working gateway without seeing the "Replace my setup / Keep my setup"
  warning.
- Extracted the per-gateway scan (previously private to StartupSetupState)
  to OnboardingExistingConfigGuard.HasAnyOperatorDeviceToken as the single
  source of truth. StartupSetupState.HasUsableOperatorConfiguration and
  GetSummary() both call it now, so the startup auto-launch decision and
  the in-wizard guard always agree on what counts as paired.

Hardening (Scott's lower-confidence suggestion):
- OnboardingState.Dismiss() is now idempotent. A double-click or repeated
  handler invocation no longer fires the lifecycle signal twice.

Tests added:
- OnboardingExistingConfigGuardTests.HasExistingConfiguration_ReturnsTrue_
  WhenOperatorTokenStoredOnlyInPerGatewayDir — Scott's exact test shape.
- OnboardingStateTests.Dismiss_IsIdempotent_FiresDismissedAtMostOnce.

Follow-up tracked separately (per Scott's note):
- Make the startup token scan registry-aware (prefer the active
  GatewayRegistry record's identity dir over arbitrary gateways/* dirs)
  to avoid orphan dirs from suppressing onboarding for a different
  active gateway.

Validation:
- ./build.ps1 succeeded
- Shared.Tests: 1548 passed, 28 skipped
- Tray.Tests: 1182 passed (+2 new)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…f-section-probe-and-missing-settings-2026-05-09-ae8d66c4b9104b7f

[Repo Assist] fix(wsl): add mountFsTab=false + [time] section to wsl.conf; make IsAlreadyConfigured probe section-aware
…age-leaks-remaining-2026-05-11-83f5733e4978f96a

[Repo Assist] fix(security): stop leaking ex.Message in node client, device capability, and approval prompts
…jsonlpath-exmessage-leak-2026-05-13-78f4414fcfd54f2f

[Repo Assist] fix(security): remove residual ex.Message leak in canvas jsonlPath error path
…-existing-config

fix(onboarding): skip wizard for paired operators and make "Keep my setup" actually dismiss
Node-mode startup was still checking only the legacy root identity file for node device tokens. Modern local setup can persist the node token under gateways/<gateway-id>/device-key-ed25519.json, so startup kept reopening onboarding after Keep my setup.

Reuse the per-gateway identity scan for all token roles and add regression coverage for per-gateway node tokens in both the startup gate and existing-config guard.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
BuildTrayTooltip read eight App fields to assemble the tray icon
tooltip string. This moves that logic into a dedicated
TrayTooltipBuilder, following the same snapshot pattern established
by CommandCenterStateBuilder.

- Add TrayStateSnapshot (8 fields: status, activity, channels, nodes,
  node service, auth failure, last check time, settings)
- Add TrayTooltipBuilder — receives a snapshot, delegates to
  TrayTooltipFormatter.FitShellTooltip for the 127-char shell limit
- Replace BuildTrayTooltip body in App with a two-line delegation;
  add CaptureTraySnapshot alongside it

No observable behaviour change: tooltip content, truncation, and all
three call sites (InitializeTrayIcon ×2, UpdateTrayIcon) are unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
V2 (ExecShellWrapperNormalizer) already recognised these shells; V1
(ExecShellWrapperParser), which is the live exec-approval gate, did not.
Policy rules like `Allow: bash *` would not match a zsh invocation.

Fixes openclaw#366.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
NodeService.ShowToast called ToastContentBuilder.Show() directly,
bypassing the user's sound preference (None/Subtle/Default) and the
30-second deduplication window implemented in App.ShowToast.

Replace the private helper with a ToastRequested event; App subscribes
and delegates to its own ShowToast, so sound and dedup are honoured for
all screen-capture, screen-record, and camera toasts.

Fixes openclaw#342.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ValidateExecApprovalRules accepted patterns like C:\evil.exe as Allow
rules. A local process with the MCP bearer token could use this to
whitelist an attacker-controlled binary, then invoke it via system.run
(two-step local EoP).

Add a check: Allow rules whose pattern starts with a drive root (X:\,
X:/) or a UNC/long-path prefix (\) are rejected. Legitimate rules
name commands, not paths.

Fixes openclaw#347.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
ranjeshj and others added 29 commits May 19, 2026 19:12
W0 — .github/workflows/gateway-compat-spike.yml (manual dispatch only):
  proves WSL + Ubuntu-24.04 + openclaw install + provider config
  validation on a windows-2025 runner before we build the real harness.
  Records cold-start timings and the authoritative provider config shape.

W2 — tools/fake-llm-server/: minimal OpenAI-compatible HTTP mock used by
  the gateway-compat tests to avoid burning real provider credit. Scope is
  intentionally tiny (one non-streaming endpoint + assertion endpoints);
  expand as scenarios demand.

W3.1 — Compile-time gating for the future tray.testhook.* MCP tool
  surface. New MSBuild property OpenClawEnableTestHooks=true defines the
  OPENCLAW_E2E_HOOKS constant; the placeholder TestHookCapability.cs is
  wrapped in #if OPENCLAW_E2E_HOOKS. Rubber-duck critique flagged that
  env-var gating in a shipped binary is unsafe (loopback MCP token +
  destructive hooks like pairing.reset); compile-time gating + a
  Release-build smoke test (ReleaseBuildExcludesTestHooksTests, verified
  to fail loudly when the hooks are accidentally shipped) keep the
  dangerous surface out of production tray binaries.

Validated: build green; shared 1808 passed; tray 1128 passed (incl. the
new smoke test + verified red when -p:OpenClawEnableTestHooks=true).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds docs/GATEWAY_COMPAT_TESTING.md as the operator-facing companion to
the implementation plan: pieces, LKG bump flow (manual + automated),
local override, opting into compile-time test hooks for local dev,
running the fake LLM standalone, adding a new scenario, extending the
fake LLM.

Adds a 'Gateway version (LKG) pinning' section to docs/RELEASING.md
that names the source of truth, the auto-bump workflow, the
no-auto-merge rule, and the runtime override env var.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The spike (.github/workflows/gateway-compat-spike.yml) was authored to
prove the WSL + openclaw + provider-config + fake-LLM pipeline on
windows-2025 before sinking effort into the real harness. After several
iterations (lessons captured below), the run is now green end-to-end in
~2m12s and the canonical provider config shape is verified.

Spike outcome
-------------
- windows-2025 ships WSL 2.7.3.0 preinstalled, no distros. Ubuntu-24.04
  install ~36s; openclaw npm install ~66s; full spike job 2m12s cold.
  CI budget verified for the real workflows.
- Provider config root is models.providers.<id>, NOT agents.providers.<id>.
  Verified accepted keys (openclaw 2026.5.18 schema):
    api / baseUrl / apiKey / authMode / models[].id
- Default selector: agents.defaults.model.primary = "<provider>/<model>".
- openclaw config patch --file accepts atomic JSON5 patches.
- openclaw config validate is the build gate.
- openclaw config schema prints the full 2.2 MB canonical schema.

The verified JSON5 patch is committed to tools/fake-llm-server/README.md
and will be used verbatim by the W3 harness.

Lessons baked into the workflow
-------------------------------
- Shell scripts live in tools/spike/*.sh with .gitattributes "*.sh
  text eol=lf" so CRLF on Windows checkout never breaks "set -euo
  pipefail" inside WSL.
- Workflow steps invoke .sh files via `wsl ... -- bash $wslPath`
  through a ConvertTo-WslPath PowerShell helper. NOT via piping
  PS here-strings to wsl stdin (which mangles encoding).
- Diagnostics step is `continue-on-error: true` so a fresh runner
  without registered distros (the expected state) doesn't kill the
  job before real work begins.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds the TestHookCapability the gateway-compat harness will drive via the
local MCP HTTP server. The class is compile-time gated behind
OpenClawEnableTestHooks=true (production tray binaries do not contain it,
enforced by ReleaseBuildExcludesTestHooksTests). NodeService registers
it MCP-only (registerOnGateway: false) so a misbehaving gateway can
never trigger destructive hooks like pairing.reset, and the capability
second-gates on OPENCLAW_TRAY_E2E=1 at runtime.

Surface (8 commands declared; diagnostics.dump fully implemented):
- tray.testhook.diagnostics.dump (implemented)
- tray.testhook.gateway.config.patch (stub)
- tray.testhook.localSetup.start/status/cancel (stub)
- tray.testhook.connection.waitFor (stub)
- tray.testhook.pairing.reset (stub)
- tray.testhook.chat.send (stub)

Stubs return a stable "not yet implemented" error so the harness can
probe the surface, and a test asserts that message stays stable so a
future commit filling in a tool cannot regress to silent success.

13 unit tests in OpenClaw.Tray.Tests cover the surface snapshot, both
gates, the diagnostics shape (snapshot via JSON parse), error wrapping,
and the stub failure mode. Test project defines OPENCLAW_E2E_HOOKS so
it can exercise the class; the Release-build smoke test
re-verifies absence in the shipped tray binary.

Validated: 1140 tray tests pass (+12); 1808 shared tests pass.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
New tests/OpenClaw.GatewayCompat.E2ETests/ xUnit project that drives the
real tray exe over MCP. GatewayCompatFixture provisions isolated AppData,
finds a free port, spawns the E2E-built tray with OPENCLAW_TRAY_E2E=1,
waits for mcp-token.txt + the HTTP listener, and hands tests an McpClient
ready to call tray.testhook.* tools.

Test taxonomy via xUnit Trait:
  Tier=Smoke    - HarnessSmokeTests: spawn tray, list tools, call
                  tray.testhook.diagnostics.dump. Runs anywhere; no WSL.
  Tier=Gateway  - OperatorPairingTests etc.: real gateway scenarios.
                  Gated by GatewayCompatFactAttribute which skips unless
                  OPENCLAW_RUN_GATEWAY_COMPAT=1, so they only run on the
                  Windows+WSL CI lane.

Reuses tests/OpenClaw.Tray.IntegrationTests/McpClient.cs via <Compile Link>
so the JSON-RPC wire shape stays single-source-of-truth.

Locates the E2E tray binary via OPENCLAW_E2E_TRAY_EXE env first, then
falls back to src/OpenClaw.Tray.WinUI/bin/{E2E,Debug}/.../OpenClaw.Tray.WinUI.exe.
The harness expects that build to have -p:OpenClawEnableTestHooks=true;
without it, tray.testhook.* tools are absent and the smoke test fails
loudly.

OperatorPairingTests added as a Tier=Gateway placeholder (Assert.Fail
with "Implementation pending - W3.2 follow-up tools required") so the
real CI workflow has a target to depend on while the testhook stubs are
filled in.

Validated end-to-end: built tray with -p:OpenClawEnableTestHooks=true,
ran smoke tier - 2 tests pass, fixture spawn + MCP handshake + diagnostics
dump round-trip all work in 2 seconds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
gateway-compat.yml
  - On PR/push to relevant paths: runs the Smoke tier (no WSL) - merge gate.
  - On schedule (nightly 07:00 UTC) or workflow_dispatch with
    run_gateway_tier=true: also runs the Gateway tier with WSL +
    Ubuntu-24.04 + openclaw + fake LLM. Matrix tests gateway_version
    in [lkg, latest]; "latest" failures are alert-only (continue-on-error
    via matrix include.failure_is_blocking=false).
  - Reusable via workflow_call so gateway-lkg-bump.yml can invoke it.
  - Reuses tools/spike/*.sh + ConvertTo-WslPath helper from the W0 spike.

gateway-lkg-bump.yml
  - Scheduled every 6h. Polls registry.npmjs.org/openclaw for the
    "latest" dist-tag, compares to gateway-lkg.json.
  - Refuses pre-releases (alpha/beta/rc/...) unless force_version is set.
  - On newer candidate: calls gateway-compat.yml as a reusable workflow
    with the candidate version and run_gateway_tier=true.
  - On green: opens (or updates) a PR titled
    "chore(lkg): bump gateway LKG to X.Y.Z" updating gateway-lkg.json AND
    src/OpenClaw.Shared/GatewayLkg.cs in lockstep (the existing
    GatewayLkgTests enforces drift = build failure).
  - PR body records previous + new version, npm publish time, tarball
    shasum, and a link to the validation workflow run.
  - NEVER auto-merges. CODEOWNER review required.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Every test hook must invoke the same method the matching UI click
handler invokes. If a handler does the work inline, extract a shared
service method first and have both the handler and the hook call that
method. No parallel implementations - they defeat the purpose of
gateway-compat (a test that passes against a stub tells us nothing
about whether the real UI path works).

Rule encoded in:
- src/OpenClaw.Tray.WinUI/Services/TestHooks/TestHookCapability.cs
  file header (anyone editing the file has to read it)
- docs/GATEWAY_COMPAT_TESTING.md "Same-path-as-user rule" section
  with a mapping table (test hook -> shared method -> UI caller)
- plan.md
- Repository memory

Each new tool comment will name the UI caller and the shared method
so future refactors can't drift.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First real W4 hook. Writes a JSON5 patch into the WSL distro and runs
the exact same `openclaw config patch --file <path>` + `openclaw config
validate` CLI sequence the user can run by hand - via the same
IWslCommandRunner the tray uses for every other WSL operation. No
parallel implementation (same-path rule).

NodeService now constructs a WslExeCommandRunner and hands it to
TestHookCapability, mirroring how LocalGatewaySetup obtains the runner.

Args: { distroName, patchJson, openclawBinPath?, patchPath?, wslUser? }
Returns: { writeOk, writeStderr, patchOk, patchStdout, patchStderr,
           validateOk, validateStdout, validateStderr, patchPath }

The hook returns Ok=true even when validate fails so the harness can
inspect WHY (typical pattern: a future gateway version moves a key and
the scenario test surfaces the exact schema error).

5 new TestHookCapabilityTests cover:
- requires IWslCommandRunner
- requires distroName / patchJson
- exact 3-call sequence (write, patch, validate) with arg snapshots
  and base64 round-trip verification of the written body
- validate failure returns Ok=true with payload (doesn't throw)
- write failure short-circuits (no patch or validate call)

New tests/OpenClaw.GatewayCompat.E2ETests/GatewayConfigPatchTests.cs
is a Tier=Gateway scenario that asserts the verified fake-LLM patch
shape still validates against the running gateway. Catches schema drift
in the openclaw config root and blocks the LKG-bump auto-PR when
upstream breaks compatibility.

Validated: 1145 tray tests pass (+5); harness builds.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Per user direction: E2E scenarios will cover what unit tests do today, so
trim unit tests to the irreducible set the harness cannot replace.

Deletes from TestHookCapabilityTests:
- Surface stability snapshot (covered by HarnessSmokeTests.ToolsList...)
- Diagnostics shape (covered by HarnessSmokeTests.DiagnosticsDump...)
- Diagnostics provider-error wrapping (low value, breaking the host in
  E2E is impractical)
- All "not yet implemented" placeholder assertions (they go away as
  each hook is implemented and gets a real scenario test)
- Gateway-config-patch arg-validation guards (distroName/patchJson)

Keeps:
- AllTools_AreGatedBy_OPENCLAW_TRAY_E2E (security invariant E2E can't prove)
- UnknownCommand (trivial)
- gateway.config.patch exact-command-sequence assertion (same-path rule)
- gateway.config.patch failure-mode tests (write fails, validate fails)
- requires-IWslCommandRunner

Deletes from LocalGatewaySetupTests:
- 4 OPENCLAW_GATEWAY_VERSION env-override tests
- LocalGatewaySetupOptions_DefaultsToLkgVersion
(These will be re-covered by an E2E scenario that sets
OPENCLAW_GATEWAY_VERSION and asserts the actually-installed gateway
version matches.)

Promotes Gateway tier (LKG cell only) to run on every PR. The matrix
expands to ['lkg','latest'] only on schedule. Adds ~3min PR latency in
exchange for catching gateway regressions before merge instead of
the morning after.

Tests: 1129 tray (was 1145; -16 redundant); shared still 1808.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Adds:
- tray.testhook.connection.waitFor
- tray.testhook.pairing.reset
- tray.testhook.chat.send
- tray.testhook.localSetup.start / status / cancel

All four follow the same-path-as-user rule: each invokes the same
production method the matching UI click handler invokes.

New plumbing:
- ITestHookHost interface (compile-time-gated) aggregates the App-level
  dependencies the hooks need. App.TestHookHost.cs (partial class, also
  compile-time-gated) wires it up.
- TestHookCapability accepts an optional ITestHookHost. NodeService
  passes (App.Current as App) when registering the capability.

Same-path mappings:
- connection.waitFor -> IGatewayConnectionManager.StateChanged
  (same event tray icon + ConnectionPage observe)
- pairing.reset -> GatewayRegistry.Remove + per-gateway identity wipe
  (same Remove method UI surfaces use)
- chat.send -> OpenClawChatDataProvider.SendMessageAsync
  (same method ChatWindow.OnSendClicked invokes)
- localSetup.start -> App.CreateLocalGatewaySetupEngine + RunLocalOnlyAsync
  (same chain LocalSetupProgressPage / OnboardingV2Bridge invoke)

LocalSetup hook is async-shaped: start kicks off RunLocalOnlyAsync on a
background Task with its own CTS, status polls the latest engine state
(captured via the same StateChanged event the V2 bridge subscribes to),
cancel triggers the CTS. Concurrency-guarded: a second start while a
run is in-flight returns an error rather than racing.

ITestHookHost is also linked into OpenClaw.Tray.Tests so the existing
unit tests still compile. Tray tests: 1129 passing.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Replaces the placeholder OperatorPairingTests Assert.Fail with real
end-to-end scenarios that drive the production code paths via the
tray.testhook.* tools. Per user direction: no stubs, all fully
implemented and tested.

New GatewayCompatScenarios.cs centralizes:
- DistroName ("Ubuntu-24.04") and FakeLlmPort
- The verified fake-LLM provider JSON5 patch (single source of truth
  for the schema-validated body; tools/fake-llm-server/README.md and
  this file move together)
- ApplyFakeLlmProviderAsync (called by every scenario)
- UnwrapToolPayload helper for MCP tools/call response shape

7 scenarios under Tier=Gateway (skipped unless OPENCLAW_RUN_GATEWAY_COMPAT=1):
1. GatewayConfigPatchTests — pre-existing; validates the fake-LLM provider
   patch against the live gateway. Failure blocks LKG auto-bump.
2. OperatorPairingTests — drives local-setup -> waits for operator
   Connected -> asserts a device ID was issued.
3. NodePairingTests — waits for node Connected+Paired -> asserts
   gateway sees the node via app.nodes (existing production MCP tool).
4. ToolEventsTests — regression guard for the "tool-events cap missing"
   bug (repo memory). Sends a chat and confirms send=true.
5. ChatRoundTripTests — sends a chat via chat.send and asserts the
   fake LLM server received the user message verbatim (via the W2
   /__assert/last-request endpoint).
6. NodeInvokeTests — asserts gateway sees the Windows node with at
   least one capability via app.nodes; the failure mode this guards
   is "node.invoke silently dropped" per docs/gateway-node-integration.md.
7. ReconnectTests — pair -> pairing.reset -> re-pair, asserts Ready in
   both passes and that reset removed at least one record.

Validation (no-hooks build, normal dev):
- Shared 1808 passed
- Tray 1129 passed
- Harness Smoke 2 passed, Gateway 7 skipped (correctly gated)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The dotnet restore at the workflow root doesn't generate the win-x64
RID-targeted assets for the WinUI sub-projects (FunctionalUI,
OnboardingV2). The existing ci.yml works around this by omitting
--no-restore on the 'Build Tray App (WinUI)' step, which triggers
the RID-targeted restore. Mirror that here.

Caught by the first PR-triggered run of gateway-compat.yml on the
fork (run 26141658423).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
npm reports no such version 2026.5.17, so PR CI failed to install it
in the Gateway tier. The W0 spike (run 26138294682) installed and
verified 2026.5.18 (which is npm dist-tag 'latest'). Use that as the
real LKG. GatewayLkgTests stays green because both gateway-lkg.json
and GatewayLkg.cs are bumped together.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
First real PR-triggered run on the fork (run 26142143433) revealed the
hook was passing args to RunInDistroAsync which prepends '-d name --'.
Combined with my '-u user --' that produces a double-'--' that ends
wsl arg parsing prematurely - bash sees '-' as positional arg 0 and
fails with 'bash: - : invalid option'.

Switch to RunAsync directly with the production-pattern args:
  wsl -d <distro> -u <user> -- bash -lc <script>
This matches LocalGatewaySetup.cs:993 exactly (which is the
production install command users run via the local-setup flow).

Unit tests updated to snapshot the new arg layout. FakeWslRunner now
implements RunAsync (was previously only RunInDistroAsync). Distro
name extracted from '-d' arg position for test assertion convenience.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR-triggered run 26142580405 surfaced the exact schema requirement:

  models.providers.fake.models.0.name: Invalid input

The W0 spike (which used 'openclaw config schema') only confirmed the
provider root path; it didn't probe the inner array element shape.
Real validate caught it.

Updated GatewayCompatScenarios.FakeLlmProviderPatch and the docs in
tools/fake-llm-server/README.md to use 'name' instead of 'id'.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Previous PR-triggered runs flip-flopped between 'models.0.id: Invalid'
and 'models.0.name: Invalid' depending on which field was missing
last. The real shape requires BOTH id and name plus reasoning, input,
cost, contextWindow, maxTokens - taken verbatim from openclaw's own
src/config/model-alias-defaults.test.ts fixture.

Also fix authMode -> auth (schema.help.ts:938 confirms 'auth' is the
canonical name).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Real schema confirmed at src/config/zod-schema.core.ts:319 of the
gateway repo. Required: id (min 1) + name (min 1). All other fields
optional. My JSON5 shape was correct but flip-flopping errors suggest
the parser is picky. Switch to strict JSON with quoted keys to
remove parser ambiguity as a variable.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PR run 26143696116 advanced past the schema issue but hit:
'ConfigMutationConflictError: config changed since last load'

openclaw config patch is read-modify-write and can race with the
gateway's own config writes. Retry up to 5 times with 500ms*attempt
backoff, but only for that specific error - other failures fail
fast.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The workflow no longer pre-installs WSL + openclaw under Ubuntu-24.04.
The gateway-compat scenarios now drive the production install path
themselves via tray.testhook.localSetup.start (the same code path the
LocalSetupProgressPage 'Set up locally' button invokes). That is the
exact regression target we want to test against new gateway versions.

- Drop: Install Ubuntu-24.04 distro
- Drop: Provision openclaw user
- Drop: Install openclaw@<version>
- Drop: Start fake LLM server inside WSL
- Add:  WSL host diagnostics (wsl --version/status/list)
- Keep: Register WSL path helper (useful for log paths)
- Change: Collect WSL gateway log now targets OpenClawGateway distro
          (production default created by LocalGatewaySetup engine)
- Change: Cleanup WSL distro now unregisters OpenClawGateway

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Introduces a collection-scoped xUnit fixture that drives the full
production tray.testhook.localSetup.start flow once, then shares the
resulting installed-and-paired tray with every gateway-tier scenario
in the [Collection(`"Gateway`")] collection. Cost (~3-4 min cold) is
paid once per CI run instead of per scenario.

Adds GatewayCompatScenarios helpers:
- DriveLocalSetupAndPrepareGatewayAsync: kicks off localSetup, polls
  localSetup.status to terminal, shells wsl.exe into OpenClawGateway
  to launch tools/spike/start-fake-llm.sh, then applies the verified
  fake-LLM provider patch.
- StartFakeLlmInDistroAsync: wsl.exe-based bootstrap, UTF-8 capture.
- WaitForConnectionAsync: client-side polling around <=20s server
  waits to respect McpClient's 30s HTTP timeout.
- FindRepoRoot + ToWslPath: path helpers.

DistroName flipped from Ubuntu-24.04 to the production default
OpenClawGateway (LocalGatewaySetupOptions.DistroName).

A separate ReconnectFixture lets ReconnectTests own its own pairing
state since it resets and re-pairs.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Now that GatewayCollectionFixture drives the full production install
and pairing flow once per CI run, the per-scenario setup boilerplate
(ApplyFakeLlmProviderAsync + localSetup.start + connection.waitFor
with 600s server timeouts) goes away. Each test body becomes just
the specific assertion it was meant to express.

- OperatorPairingTests, NodePairingTests, ToolEventsTests,
  ChatRoundTripTests, NodeInvokeTests, GatewayConfigPatchTests:
  joined [Collection(`"Gateway`")], use GatewayCollectionFixture,
  and confirm settled connection state via WaitForConnectionAsync
  (client-side polling, respects McpClient 30s timeout).
- GatewayConfigPatchTests now uses GatewayCompatScenarios.DistroName
  + FakeLlmProviderPatch (the verified strict-JSON patch shape),
  exercising idempotence against the already-installed gateway.
- ReconnectTests stays per-class on ReconnectFixture so the reset /
  re-pair dance doesn't trash the shared collection state.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration in response to the first push to PR #3 where 7 of 9 gateway
scenarios failed:
- Collection fixture's first localSetup attempt failed at `"Creating the
  OpenClaw Gateway WSL instance`" within 18s on a cold runner. All five
  shared-collection scenarios then failed instantly because the fixture
  init faulted once and xUnit reuses the fault.
- ReconnectFixture's attempt got past WSL install but hung 20 min at
  `"Pairing Windows tray node`" before our timeout fired.

Changes:
- DriveLocalSetupAndPrepareGatewayAsync now retries once on
  status=FailedRetryable. Matches the production `"Retry`" button UX
  the user would click on a transient WSL hiccup. Terminal failures
  (FailedTerminal) still fail-fast.
- localSetup wall timeout bumped from 20 min to 25 min (gives the
  pairing step more headroom; will revisit if it still times out).
- GatewayCompatFixture preserves the tray's DataDir (including
  openclaw-tray.log) into ` before deleting it,
  when the workflow sets that env. Workflow sets it to
  TestResults/Gateway-<version>/tray-data, which is uploaded as part
  of the existing gateway-tier results artifact.
- `"Collect WSL gateway log`" now also dumps openclaw service logs
  under ~/.openclaw, distro process list, and listening sockets — so
  the next failure tells us whether the gateway was even listening
  when pairing hung.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
CI run 26146683288 surfaced two issues. This commit fixes #1; #2 is a
production-side issue documented in the next-session handoff.

#1: Race between the shared Gateway collection and ReconnectFixture
Both fixtures spawned in parallel and both invoked tray.testhook.localSetup.start
on their own tray instances. localSetup eventually calls wsl --install
OpenClawGateway, and on a fresh runner one side wins while the other
sees a partial registration and bails with wsl_existing_distro_unavailable
(WSL_E_DISTRO_NOT_FOUND on the probe). Confirmed via setup-state.json
artifacts.

Fix: disable assembly-level parallel collection execution. The Gateway
collection and ReconnectFixture now serialize. The ~3-4 min cold install
is paid once for the collection and once for Reconnect; total wall time
roughly equals the single longest scenario plus reconnect, which fits
inside the existing 45 min job budget.

#2 (deferred): operator pair succeeds; node pair fails because
   - node connects with role=node but existing approval is role=operator,
     so gateway returns NOT_PAIRED/role-upgrade.
   - tray autopair fires node.pair.approve and gets `"unknown requestId`"
     (likely a race against the just-issued request).
   - then the gateway sends `"shutdown / 1012 service restart`" and never
     comes back: tray gets `"Unable to connect to the remote server`" for
     the next 20 minutes until our deadline expires.
   This is a production / gateway-side flow problem and is now visible
   precisely because Plan A drives the real path. Investigation belongs
   in a follow-up commit (likely either: ensure the user-systemd unit for
   openclaw-gateway sets Restart=on-failure, or add a tray.testhook hook
   that calls `openclaw devices approve` from inside WSL to side-step the
   autopair race).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…tures

Workarounds for the two production-side issues found by iteration 2:

#2: Gateway exits with 1012 `"service restart`" mid-Pairing-Windows-tray-node
   and never auto-restarts; tray autopair sends node.pair.approve too
   eagerly and races the gateway's request-registration (gateway returns
   `"unknown requestId`", autopair gives up). The fixture now runs a
   background watchdog during localSetup that, every 10 seconds:
     wsl -d OpenClawGateway -u openclaw -- openclaw gateway start
       (idempotent — restarts the gateway if it has crashed)
     wsl -d OpenClawGateway -u openclaw -- openclaw devices list --json
       (then devices approve <requestId> for each pending one)
   The CLI's local-state fallback assumes operator.admin scope so the
   approve succeeds even though autopair couldn't. Watchdog starts 60s
   after localSetup.start so it doesn't trample the pre-pair install
   phases.

#3: Reconnect's per-class fixture saw `"local gateway port 18789 already
   in use`" after the collection fixture finished because the latter
   only kills the tray process — not the WSL distro. Add an explicit
   wsl --terminate OpenClawGateway in GatewayCollectionFixture +
   ReconnectFixture DisposeAsync so the next fixture's install starts
   against a stopped distro.

These are workarounds — the real bugs (gateway needs Restart=on-failure
unit; autopair vs request-registration race) still want upstream fixes,
documented in the prior commit.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration 3's watchdog ran silently but did nothing useful — node pair
still hung 20 min on the same requestId. Two likely causes:

1. CLI invoked as `wsl ... -u openclaw -- /opt/openclaw/bin/openclaw ...`
   bypasses the login shell; OPENCLAW_PROFILE / OPENCLAW_STATE_DIR
   (set in /etc/profile.d) never get exported, so the local-state
   fallback in `devices approve` looks at the wrong path. Switch
   invocations to `wsl ... -u openclaw -- bash -lc '...'` so the
   profile scripts run.
2. `devices approve` may have wanted `--url` to talk to the running
   gateway instead of touching local state. Pass
   `--url ws://localhost:18789 --yes` on every approve call.

Plus: write every watchdog tick to `/node-pair-watchdog.log`
so the next run's artifact tells us whether it ran, what gateway start
returned, what devices list returned, what requestIds were found, and
what approve did. (Previously every exception was swallowed.)

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration 4's watchdog log showed:
- Initial ticks: openclaw binary not yet installed (60s wait too short).
- Subsequent ticks: gateway start exit=0, devices list exit=1 with 363
  chars of stdout we never logged. pending request ids parsed as [].

Capture the devices-list stdout (up to 500 chars) so the next run tells
us what the CLI is actually returning. Bump initial wait to 120s so the
watchdog skips the install phases entirely.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Definitive finding from iteration 5: `openclaw gateway start` returns
exit 0 but the underlying node process dies within seconds. The
watchdog log shows every devices-list call to ws://127.0.0.1:18789
returning `gateway closed (1006 abnormal closure)` — proof that the
gateway is not actually up by the time the watchdog tries to use it
(or by the time the tray's Phase 14 keeps reconnecting).

Workaround: have the watchdog spawn the gateway directly via:
   nohup /opt/openclaw/bin/openclaw gateway --port 18789 \
     > /home/openclaw/openclaw-gateway-watchdog.log 2>&1 &
   disown
Guarded by a pgrep so we don't start a second copy. This bypasses the
broken `openclaw gateway start` flow entirely and keeps the gateway
alive for the rest of the localSetup pair attempts.

After spawn, the watchdog waits 3 s for the port to bind, then runs
`openclaw devices list --json` and approves each pending requestId
(side-stepping the tray's autopair race).

Plus: switched RunWslOpenClawAsync invocations through bash -lc so
profile.d env (OPENCLAW_PROFILE, OPENCLAW_STATE_DIR) is set; added a
generic RunWslBashAsync helper for raw shell snippets.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration 6 broke through to the watchdog actually identifying pending
pair requests, but every approve attempt died with:
   OpenClaw does not recognize option `"--yes`"

Drop the flag. The requestId is explicit so no interactive confirmation
is needed anyway.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Iteration 7 surfaced `"gateway url override requires explicit
credentials`" — when --url is set, the CLI insists on --token or
--password. Without --url, the CLI uses the local profile config plus
direct local-state fallback (no auth needed because the openclaw user
owns /home/openclaw/.openclaw).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@ranjeshj ranjeshj closed this May 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

10 participants