41 commits
11093bc
ci: run build & test on a windows-latest runner
zpzjzj May 19, 2026
db2a610
feat: make the script judge cross-platform on Windows
zpzjzj May 19, 2026
3c5099a
refactor(agent): route quoting through shellquote, fix Check on Windows
zpzjzj May 19, 2026
13d8774
chore: add PowerShell tooling and example script for Windows
zpzjzj May 19, 2026
b303054
chore: pin line endings via .gitattributes
zpzjzj May 19, 2026
bcafdf0
docs: add Windows support guide
zpzjzj May 19, 2026
616f476
docs: clarify OpenSandbox runtime behavior on Windows
zpzjzj May 19, 2026
89a9498
docs: correct OpenSandbox Windows-sandbox availability
zpzjzj May 20, 2026
17ccabb
fix(windows): use POSIX quoting and forward-slash paths where targets…
zpzjzj May 20, 2026
73b296a
fix(windows): force -1 on ctx-cancel and make tests OS-aware
zpzjzj May 20, 2026
8db5434
fix(windows): use POSIX quoting in git checkout helper and run tests …
zpzjzj May 20, 2026
daa9cdd
fix(windows): address PR #33 review nits
zpzjzj May 20, 2026
c38f955
fix(windows): always quote QuoteWindows output and disable cmd AutoRu…
zpzjzj May 20, 2026
0585937
fix(windows): defang MSYS argv conversion and skip stdio-framing test
zpzjzj May 20, 2026
90bcdaa
fix(windows): cap NoneRuntime.Exec pipe wait with cmd.WaitDelay
zpzjzj May 20, 2026
2e30708
ci: make Windows job required now that all tests pass
zpzjzj May 20, 2026
8144c27
fix(windows): skip WSL bash even when PATH-discovered
zpzjzj May 20, 2026
6e11453
test(mcp): skip RejectsSymlinkEscape on Windows for the same Node-std…
zpzjzj May 20, 2026
acbf0d4
test(mcp): centralize Windows skip in startMockServer
zpzjzj May 20, 2026
f42dae8
fix(windows): reject WSL bash via SKILL_UP_BASH override too
zpzjzj May 20, 2026
0b9498b
ci: add a Windows e2e job
zpzjzj May 21, 2026
614438e
fix(e2e): emit skill-up.exe on Windows so TestMain build is executable
zpzjzj May 21, 2026
0e38503
fix(windows): normalize transcript path and honor shebang options for…
zpzjzj May 21, 2026
1298a43
fix(windows): escape bash actives in script paths; align WSL docs
zpzjzj May 21, 2026
74246c9
fix(windows): make cmd fallback reliable, shell-aware quoter, env -S …
zpzjzj May 21, 2026
1b3c51c
fix(windows): consume env value-flags and use shell-style env -S toke…
zpzjzj May 21, 2026
abd0a9d
fix(windows): bash-safe quoter, restrict shebang routing, /dev/null i…
zpzjzj May 21, 2026
19d7f8f
fix(windows): handle env long-flag values and `\_` separator in env -S
zpzjzj May 21, 2026
391a55a
refactor(windows): unify shell host, drop dead quote API, require Tar…
zpzjzj May 21, 2026
1b75e55
fix(windows): cache platform.Host() and require bash for agent CLIs
zpzjzj May 21, 2026
501905c
fix(judge): clean .sh temp dirs via bash rm on Windows
zpzjzj May 21, 2026
a2ca8f0
chore(windows): post-review cleanups (CI versions, docs, lint dead code)
zpzjzj May 21, 2026
954e79a
fix(judge): handle env -a, optional-arg signal flags, NAME=VALUE, and \c
zpzjzj May 21, 2026
0dca8b4
fix(judge): honor pwsh shebang and forward PowerShell shebang flags
zpzjzj May 21, 2026
cf17b53
chore(release): tag this branch's CHANGELOG entry as 0.3.0
zpzjzj May 25, 2026
d996465
refactor(agent): use platform.GOOSWindows constant in requireBashOnWi…
zpzjzj May 26, 2026
8199082
refactor(judge): defensive copy in .ps1 plan.command to match .sh branch
zpzjzj May 26, 2026
c83477e
docs(runtime): clarify execContextGracePeriod applies on POSIX too
zpzjzj May 26, 2026
6ac7525
fix(runtime): correct DockerRuntime.TargetGOOS doc comment
zpzjzj May 26, 2026
d5f1bd0
chore(release): bump 0.3.0 date to 2026-05-27 to follow main's 0.2.3
zpzjzj May 26, 2026
9a69230
fix(windows): reject rooted POSIX-style paths in workspace/sandbox gu…
zpzjzj May 27, 2026
16 changes: 16 additions & 0 deletions .gitattributes
@@ -0,0 +1,16 @@
# Keep line endings deterministic regardless of the contributor's OS or
# core.autocrlf setting.

# Go source must be LF: gofmt and the toolchain expect Unix line endings.
*.go text eol=lf

# Shell scripts must be LF: a CRLF shebang line (e.g. "#!/bin/sh\r") makes
# the kernel fail to locate the interpreter.
*.sh text eol=lf

# PowerShell handles LF on every platform; keep it consistent with .editorconfig.
*.ps1 text eol=lf

# Windows batch scripts use CRLF.
*.cmd text eol=crlf
*.bat text eol=crlf
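
The CRLF shebang failure described in the `*.sh` comment above can be checked for mechanically. The following Go sketch is illustrative only: `hasCRLFShebang` is a hypothetical helper that is not part of this PR, and it assumes the script content has already been read into memory.

```go
package main

import (
	"bytes"
	"fmt"
)

// hasCRLFShebang reports whether the first line of script is a shebang
// that ends in "\r\n". Hypothetical helper, for illustration only.
func hasCRLFShebang(script []byte) bool {
	if !bytes.HasPrefix(script, []byte("#!")) {
		return false
	}
	// Split off the first line; found is false when there is no newline.
	line, _, found := bytes.Cut(script, []byte("\n"))
	return found && bytes.HasSuffix(line, []byte("\r"))
}

func main() {
	fmt.Println(hasCRLFShebang([]byte("#!/bin/sh\r\necho hi\r\n")))
	fmt.Println(hasCRLFShebang([]byte("#!/bin/sh\necho hi\n")))
}
```

With `*.sh text eol=lf` in place, a checked-out script should always fall into the second case, whatever the contributor's `core.autocrlf` setting.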
59 changes: 54 additions & 5 deletions .github/workflows/ci.yml
@@ -24,9 +24,13 @@ permissions:
jobs:
build:
name: Build & Test
-runs-on: ubuntu-latest
+runs-on: ${{ matrix.os }}
strategy:
# Surface failures on every OS independently instead of cancelling the
# whole matrix when one runner fails.
fail-fast: false
matrix:
os: [ubuntu-latest, windows-latest]
go-version: ["1.25.x"]

steps:
@@ -43,14 +47,22 @@ jobs:
- name: Build
run: go build ./...

# Force bash on Windows runners too: pwsh's legacy native-argument
# passing splits `-coverprofile=coverage.out` and feeds `.out` to go
# as a package import path, producing `FAIL .out [setup failed]`.
# Git Bash is preinstalled on windows-latest and parses args verbatim.
- name: Test (with race detector and coverage)
shell: bash
run: go test -race -timeout 120s -covermode=atomic -coverpkg=./... -coverprofile=coverage.out ./...

# Self-hosted coverage badge: parse the total from `go tool cover -func`
# and rewrite .github/badges/coverage.json. The shields.io endpoint badge
# in README.md reads this JSON via raw.githubusercontent.com.
# Ubuntu-only: the Windows leg of the matrix uses a different shell
# toolchain and we only need one canonical coverage number.
- name: Compute coverage and update badge
id: coverage
if: matrix.os == 'ubuntu-latest'
run: |
pct=$(go tool cover -func=coverage.out | awk '/^total:/ {gsub("%","",$3); print $3}')
if [ -z "$pct" ]; then
@@ -79,7 +91,7 @@ jobs:
# upload would silently match nothing and, combined with
# if-no-files-found: error, fail the build job on every push to main.
- name: Upload coverage badge artifact
-if: github.event_name == 'push' && github.ref == 'refs/heads/main'
+if: matrix.os == 'ubuntu-latest' && github.event_name == 'push' && github.ref == 'refs/heads/main'
uses: actions/upload-artifact@v5
with:
name: coverage-badge
@@ -121,6 +133,43 @@ jobs:
git commit -m "chore(ci): update coverage badge to ${pct} [skip ci]"
git push origin badges

e2e-windows:
# Windows e2e is intentionally narrower than the Linux e2e: it does not
# set SKILL_UP_FULL_E2E, so the LLM-dependent tests in e2e/cli_test.go,
# e2e/agent_test.go and e2e/mcp_test.go self-skip. The mock-engine and
# script-judge contract tests still exercise the Windows-specific code
# paths added by issue #31 — shell selection, MSYS argv handling,
# QuoteWindows, NewShellCmd's WaitDelay, the script-judge interpreter
# dispatch — through the full CLI pipeline. Real-LLM coverage stays on
# the Linux e2e job below.
name: E2E (none runtime, Windows)
runs-on: windows-latest
timeout-minutes: 20
steps:
- uses: actions/checkout@v6

- uses: actions/setup-go@v6
with:
go-version: "1.25.x"
cache: true

# Match the Build & Test step's shell choice so pwsh's legacy argv
# passing does not split `-coverprofile`-style flags.
- name: Run e2e tests (none runtime, quick mode)
shell: bash
run: go test -tags e2e -timeout 1200s -count=1 -v ./e2e
env:
SKILL_UP_E2E_ARTIFACT_DIR: ${{ github.workspace }}/e2e-artifacts

- name: Upload e2e workspace artifacts
if: always() && hashFiles('e2e-artifacts/**') != ''
uses: actions/upload-artifact@v5
with:
name: e2e-windows-workspaces
path: e2e-artifacts/
if-no-files-found: ignore
retention-days: 14

e2e:
name: E2E (none runtime)
runs-on: ubuntu-latest
@@ -213,7 +262,7 @@ jobs:

- name: Upload e2e workspace artifacts
if: always() && steps.secrets.outputs.available == 'true' && hashFiles('e2e-artifacts/**') != ''
-uses: actions/upload-artifact@v7
+uses: actions/upload-artifact@v5
with:
name: e2e-workspaces
path: e2e-artifacts/
@@ -336,7 +385,7 @@ jobs:

- name: Upload OpenSandbox server log
if: always() && steps.secrets.outputs.available == 'true'
-uses: actions/upload-artifact@v7
+uses: actions/upload-artifact@v5
with:
name: opensandbox-server-log
path: ${{ runner.temp }}/opensandbox-server.log
@@ -345,7 +394,7 @@ jobs:

- name: Upload opensandbox e2e workspace artifacts
if: always() && steps.secrets.outputs.available == 'true' && hashFiles('e2e-opensandbox-artifacts/**') != ''
-uses: actions/upload-artifact@v7
+uses: actions/upload-artifact@v5
with:
name: e2e-opensandbox-workspaces
path: e2e-opensandbox-artifacts/
5 changes: 5 additions & 0 deletions AGENTS.md
@@ -34,6 +34,11 @@ make lint-tools
If you are in mainland China and `go install` is slow, set
`GOPROXY=https://goproxy.cn,direct` before running the commands above.

On Windows, `make` is unavailable by default; use the PowerShell equivalents
in `scripts/windows/` (`hooks.ps1`, `lint-tools.ps1`, `verify.ps1`). See the
[Windows support guide](docs/guide/windows.md) for supported features and
known limitations.

## Build & run

```bash
33 changes: 33 additions & 0 deletions CHANGELOG.md
@@ -5,6 +5,39 @@ All notable changes to this project will be documented in this file.
The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).

## [0.3.0] - 2026-05-27

### Added
- **First-class Windows support** for the CLI, the `none` runtime, and the
script judge. Native Windows builds run all unit tests, the script judge
routes `.ps1`/`.cmd`/`.bat` directly and `.sh` through Git Bash when
available, and CI gains a `windows-latest` build/test matrix plus a
dedicated `E2E (none runtime, Windows)` contract job.
See [Windows Support](docs/guide/windows.md) for the full guide.
- `SKILL_UP_BASH` environment variable: explicit path to a `bash`
executable for skill-up's `none` runtime to use. Honored on every
platform (read once at startup, takes precedence over `PATH`).
- PowerShell tooling under `scripts/windows/`: `hooks.ps1`,
`lint-tools.ps1`, and `verify.ps1` mirror the Makefile targets for
contributors on Windows; `examples/judge-debug-eval.ps1` provides a
runnable PowerShell script-judge example.
- `.gitattributes` pins line endings (LF for `*.go` / `*.sh` / `*.ps1`, CRLF
for `*.cmd` / `*.bat`) so Git checkout on Windows does not break scripts.

### Changed
- Agent CLIs (Claude Code, Codex, Qoder CLI) now hard-fail on Windows
hosts without a discoverable bash, with a clear error pointing at
Git Bash or `SKILL_UP_BASH`. Previously the cmd.exe fallback would
accept agent commands but leak shell metacharacters from instructions
into the host shell.
- `internal/platform` centralizes host shell, quoter, and bash discovery
behind a single `platform.Host()` (cached for the process lifetime).
Replaces the previous ad-hoc platform branching in `NoneRuntime.Exec`
and the script-judge planner.
- `Runtime.TargetGOOS() string` is now a required interface method so
future runtimes get a compile-time error rather than silently
defaulting to `"linux"`.
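
A minimal Go sketch of what the required method buys, under stated assumptions: the `Runtime` interface below is reduced to the single method named in this entry, and `linuxSandbox` is a hypothetical stand-in type, not one of the project's runtimes.

```go
package main

import "fmt"

// Runtime is a reduced, illustrative version of the interface. Only the
// TargetGOOS method is taken from the changelog entry.
type Runtime interface {
	TargetGOOS() string
}

// linuxSandbox is a hypothetical runtime whose target is always Linux.
type linuxSandbox struct{}

// TargetGOOS states the target OS explicitly instead of relying on a default.
func (linuxSandbox) TargetGOOS() string { return "linux" }

// Compile-time assertion: deleting the method above turns this line into a
// build error, so a new runtime cannot forget to declare its target.
var _ Runtime = linuxSandbox{}

func main() {
	var rt Runtime = linuxSandbox{}
	fmt.Println(rt.TargetGOOS())
}
```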

## [0.2.3] - 2026-05-27

### Added
6 changes: 6 additions & 0 deletions CONTRIBUTING.md
@@ -20,6 +20,12 @@ We welcome bug reports, feature requests, documentation improvements, and code c
# If you touched anything under e2e/ or internal/runner/, also run:
make e2e
```
On Windows, `make` is unavailable by default — use the PowerShell scripts
in `scripts/windows/` (`verify.ps1`, `lint-tools.ps1`, `hooks.ps1`) and the
standard `go build` / `go test -race ./...` commands. The `make e2e`
equivalent is `go test -tags e2e -v ./e2e` (with the same env vars the
Makefile target sets). See the
[Windows support guide](docs/guide/windows.md).
5. Commit using **Conventional Commits** (enforced by `.githooks/commit-msg`). See the *Commit Message* section below for the allowed types and examples.
6. Push your branch to your fork and open a Pull Request against `main`. Fill out the PR template, link any related issues, and describe the user-visible impact.
7. Update [`CHANGELOG.md`](CHANGELOG.md) in the same PR if your change is user-visible.
5 changes: 5 additions & 0 deletions README.md
@@ -144,6 +144,11 @@ make build
go build -o bin/skill-up ./cmd/skill-up
```

**Windows users**: skill-up runs natively on Windows. See
[Windows Support](docs/guide/windows.md) for the recommended workflow,
known limitations (notably: native agent CLI execution requires Git
Bash), and the PowerShell tooling under `scripts/windows/`.

## Quick Start

### 1. Create Eval Config
5 changes: 5 additions & 0 deletions README.zh.md
@@ -141,6 +141,11 @@ make build
go build -o bin/skill-up ./cmd/skill-up
```

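**Windows users**: skill-up supports Windows natively. See the
[Windows support guide](docs/zh/guide/windows.md) for the recommended workflow, known limitations
(in particular: running agent CLIs natively requires Git Bash), and the PowerShell tooling scripts
under `scripts/windows/`.

## Quick Start

### Step 1: Create the Eval Config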
2 changes: 2 additions & 0 deletions docs/.vitepress/config.ts
@@ -51,6 +51,7 @@ export default defineConfig({
text: 'Introduction',
items: [
{ text: 'Getting Started', link: '/guide/getting-started' },
{ text: 'Windows Support', link: '/guide/windows' },
],
},
{
@@ -106,6 +107,7 @@ export default defineConfig({
text: '入门',
items: [
{ text: '快速上手', link: '/zh/guide/getting-started' },
{ text: 'Windows 支持', link: '/zh/guide/windows' },
],
},
{
109 changes: 109 additions & 0 deletions docs/guide/windows.md
@@ -0,0 +1,109 @@
# Windows Support

skill-up runs natively on Windows. This page covers what works, the current
limitations, and the recommended workflow.

---

## Supported

- **Build and unit tests** — `go build ./...` and `go test ./...` pass on
Windows. CI exercises a `windows-latest` runner alongside Linux.
- **The `none` runtime** — commands run on the host through bash when one is discovered (for example Git Bash), and fall back to `cmd.exe` otherwise.
- **The `opensandbox` runtime** — unaffected by the host OS; it always
executes inside a Linux sandbox.
- **The script judge** — dispatches by file extension (or shebang):

| Script | Interpreter on Windows |
| ----------------- | ------------------------------------------- |
| `.ps1` | PowerShell |
| `.cmd` / `.bat` | `cmd.exe` |
| `.sh` | bash (Git Bash; see below) |
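
The table above can be sketched as a small Go function. This is a simplified illustration rather than the project's planner: `interpreterFor` and its return labels are hypothetical names, and shebang-based dispatch is left out.

```go
package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// interpreterFor maps a script path to the interpreter family listed in the
// table. Hypothetical helper; extension matching is case-insensitive here.
func interpreterFor(script string) (string, error) {
	switch strings.ToLower(filepath.Ext(script)) {
	case ".ps1":
		return "powershell", nil
	case ".cmd", ".bat":
		return "cmd.exe", nil
	case ".sh":
		return "bash", nil
	default:
		return "", fmt.Errorf("unsupported script extension %q", filepath.Ext(script))
	}
}

func main() {
	for _, s := range []string{"judge.ps1", "check.BAT", "verify.sh", "notes.txt"} {
		name, err := interpreterFor(s)
		if err != nil {
			fmt.Println(s, "->", err)
			continue
		}
		fmt.Println(s, "->", name)
	}
}
```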

## Running `.sh` script judges on Windows

A `.sh` script judge needs a `bash` interpreter. skill-up looks for one in
this order:

1. the `SKILL_UP_BASH` environment variable (an explicit path to `bash.exe`);
2. `bash` on `PATH`;
3. well-known Git Bash install locations —
`C:\Program Files\Git\bin\bash.exe` and
`C:\Program Files (x86)\Git\bin\bash.exe`.

If none is found the script judge fails with a clear error. Install
[Git for Windows](https://git-scm.com/download/win) or set `SKILL_UP_BASH`.
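
The lookup order above could be sketched in Go roughly as follows. The function name `findBash`, the injected lookups, and the error text are assumptions made for illustration; the sketch also omits the WSL shim rejection that skill-up applies at each step.

```go
package main

import (
	"errors"
	"fmt"
	"os"
	"os/exec"
)

// wellKnownGitBash lists the Git Bash locations named in this guide.
var wellKnownGitBash = []string{
	`C:\Program Files\Git\bin\bash.exe`,
	`C:\Program Files (x86)\Git\bin\bash.exe`,
}

// errNoBash is a hypothetical sentinel error for the "nothing found" case.
var errNoBash = errors.New("no bash found: install Git for Windows or set SKILL_UP_BASH")

// findBash applies the documented order: override, PATH, well-known paths.
// The three lookups are injected so the order can be exercised without
// touching the real environment.
func findBash(
	getenv func(string) string,
	lookPath func(string) (string, error),
	exists func(string) bool,
) (string, error) {
	if p := getenv("SKILL_UP_BASH"); p != "" {
		return p, nil
	}
	if p, err := lookPath("bash"); err == nil {
		return p, nil
	}
	for _, p := range wellKnownGitBash {
		if exists(p) {
			return p, nil
		}
	}
	return "", errNoBash
}

// fileExists reports whether a path can be stat'ed on the host.
func fileExists(p string) bool {
	_, err := os.Stat(p)
	return err == nil
}

func main() {
	p, err := findBash(os.Getenv, exec.LookPath, fileExists)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("bash:", p)
}
```

Injecting the lookups is a design choice of this sketch only; it keeps the precedence rules testable on a machine that has no Git Bash installed.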

The WSL shim at `C:\Windows\System32\bash.exe` is intentionally rejected at
all three steps (override, PATH, well-known) because it expects Linux-format
`/mnt/c/...` paths and silently fails on the Windows-style paths skill-up
generates. Users who want to drive script judges through WSL must arrange
path translation upstream and point `SKILL_UP_BASH` at a non-WSL bash — or
simply run skill-up inside WSL itself (see "Recommended workflow" below).
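
One plausible way to recognize the shim is a path comparison. The Go sketch below is an assumption about how such a check might look, not the check skill-up performs: `isWSLShim` is a hypothetical name, and matching on the `\Windows\System32\bash.exe` suffix is a simplification.

```go
package main

import (
	"fmt"
	"strings"
)

// isWSLShim reports whether path looks like the WSL launcher in System32.
// The comparison ignores case and accepts either slash style, because
// Windows paths do. Hypothetical helper, for illustration only.
func isWSLShim(path string) bool {
	p := strings.ToLower(strings.ReplaceAll(path, "/", `\`))
	return strings.HasSuffix(p, `\windows\system32\bash.exe`)
}

func main() {
	fmt.Println(isWSLShim(`C:\Windows\System32\bash.exe`))
	fmt.Println(isWSLShim(`C:\Program Files\Git\bin\bash.exe`))
}
```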

## OpenSandbox runtime on Windows

The `opensandbox` runtime talks to a remote OpenSandbox server over HTTP and
never spawns a host shell. Running `skill-up.exe` on native Windows against a
remote sandbox works today: all host-side path handling already crosses the
host→sandbox boundary through `filepath.ToSlash`, and the sandbox itself is a
Linux container, so the script judge and any agent run inside it behave
exactly as they do on Linux.
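
To illustrate the host-to-sandbox path conversion, here is a small Go sketch. `toSandboxPath` and the `/workspace` root are hypothetical. The sketch uses `strings.ReplaceAll` rather than `filepath.ToSlash` so that it behaves the same on any host: `filepath.ToSlash` only rewrites the host's own separator, so it converts backslashes on Windows and leaves them alone on Linux.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// toSandboxPath converts a Windows-style relative path to forward slashes
// and joins it under a POSIX sandbox root. Hypothetical helper.
func toSandboxPath(sandboxRoot, hostRel string) string {
	rel := strings.ReplaceAll(hostRel, `\`, "/")
	// path (not path/filepath) always uses forward slashes.
	return path.Join(sandboxRoot, rel)
}

func main() {
	// prints /workspace/evals/case-1/input.txt
	fmt.Println(toSandboxPath("/workspace", `evals\case-1\input.txt`))
}
```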

OpenSandbox also offers a [**Windows guest profile**](https://github.com/alibaba/OpenSandbox/blob/main/docs/windows-sandbox.md):
the server runs `dockur/windows` (Windows in KVM/QEMU inside a Linux container)
and the API accepts `platform: {"os": "windows", "arch": "amd64"}` on create.
At the time of writing the Go SDK does not yet expose the `Platform` field, so
driving a Windows-guest sandbox from skill-up is blocked on an upstream Go SDK
update — tracked separately.

For a Windows machine that needs the full agent workflow **without** a remote
sandbox, run skill-up inside **WSL2**. WSL2 is a Linux environment, so both the
`none` and `opensandbox` runtimes — including the agent Node/nvm bootstrap —
work without limitation.

## Contributor tooling

`make` is not available on Windows by default. Use the PowerShell scripts
under `scripts/windows/` instead:

```powershell
# Install git hooks (equivalent to `make hooks`)
pwsh scripts/windows/hooks.ps1

# Install pinned lint tools into .tools/bin (equivalent to `make lint-tools`)
pwsh scripts/windows/lint-tools.ps1

# fmt-check + vet + revive + golangci-lint (equivalent to `make verify`)
pwsh scripts/windows/verify.ps1
```

Build and test use the standard Go toolchain, which is cross-platform:

```powershell
go build -o bin/skill-up.exe ./cmd/skill-up
go test -race ./...
```

## Known limitations

- **Running real agents natively** — Claude Code / Codex / Qoder CLI are
launched through a bash-based Node/nvm bootstrap. That bootstrap does not
run under `cmd.exe`. To run full agent evals on Windows, either install
Node.js and the agent CLIs yourself beforehand, or use WSL2.
- **`.ps1` script judges require a Windows target** — when the runtime target
is POSIX (for example the `opensandbox` Linux sandbox), only `.sh` scripts
are supported.
- **`cmd.exe` expands `%VAR%` inside arguments** — when no bash is discovered
and the `cmd /d /s /c` fallback shell runs, literal `%NAME%` substrings
inside command arguments are still expanded by cmd. There is no reliable
command-line escape for this. Do not interpolate untrusted strings into
shell commands. Install Git Bash (which skill-up auto-discovers) to avoid
the cmd fallback entirely.
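
Because `%NAME%` cannot be escaped reliably, one defensive option is to detect it before a command reaches the `cmd.exe` fallback. The Go sketch below is a conservative heuristic, not something skill-up is known to do: `hasCmdVarRef` is a hypothetical helper, and it flags any text enclosed by two percent signs, including strings cmd would leave untouched.

```go
package main

import (
	"fmt"
	"regexp"
)

// cmdVarRef matches any non-empty run of characters between two percent
// signs, which is the shape cmd.exe treats as a variable reference.
var cmdVarRef = regexp.MustCompile(`%[^%]+%`)

// hasCmdVarRef reports whether arg contains a %NAME%-shaped token.
// Hypothetical helper; it over-reports by design.
func hasCmdVarRef(arg string) bool {
	return cmdVarRef.MatchString(arg)
}

func main() {
	for _, arg := range []string{"echo %PATH%", "progress 100%", "plain text"} {
		fmt.Printf("%q flagged: %v\n", arg, hasCmdVarRef(arg))
	}
}
```

A caller could refuse or warn on flagged arguments when no bash is available, which keeps the decision explicit instead of depending on cmd's expansion rules.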

## Recommended workflow

- **Authoring and running script-judge evals** — native Windows works well.
Prefer `.ps1` script judges, or install Git for Windows for `.sh` support.
- **Running full agent evals** — use **WSL2**, so the evaluator and the agent
CLIs share one POSIX environment and avoid path/credential friction.