ci(bench): port performance-benchmarking to Gitea native-ci (sccache), scope GitHub to manual by AlexMikhalev · Pull Request #898 · terraphim/terraphim-ai

AlexMikhalev · 2026-06-03T19:40:37Z

Summary

Add .gitea/workflows/benchmark.yml: native-ci Criterion benchmark runner + regression gate
Scope .github/workflows/performance-benchmarking.yml to workflow_dispatch only (remove push: and pull_request: triggers)

What the Gitea workflow does

Runs on the terraphim-native runner with sccache backed by SeaweedFS (S3-compatible, same env config as ci-native.yml).

on: push to main + workflow_dispatch

Steps:

sccache --start-server + zero stats
cargo bench -p terraphim_tinyclaw --bench tinyclaw_benchmarks (Criterion)
Collect mean.point_estimate from target/criterion/*/new/estimates.json → benchmark-results/current-YYYY-MM-DD.json
Regression gate: compare against ~/.cache/terraphim-bench/baseline.json on the runner; fail if any benchmark degrades >20%
- First run (no baseline): seeds baseline with today's results and exits 0
Update baseline (main branch only): copy current estimates → baseline
sccache --show-stats
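The collect and regression-gate steps can be sketched in Python roughly as follows. This is a minimal sketch under stated assumptions, not the script inside benchmark.yml. The function names collect_estimates, find_regressions and run_gate are hypothetical. It uses the **/new/estimates.json glob recommended in review rather than */new/estimates.json, and keys each benchmark by its path under target/criterion. The main-only baseline update is a separate step and is not performed here.

```python
from __future__ import annotations

import json
from pathlib import Path

# Fail the gate if any benchmark's mean is more than 20% slower than baseline.
THRESHOLD = 0.20


def collect_estimates(criterion_dir: Path) -> dict[str, float]:
    """Map each benchmark ID (its path under criterion_dir) to mean.point_estimate (ns)."""
    results: dict[str, float] = {}
    for est in sorted(criterion_dir.glob("**/new/estimates.json")):
        # <criterion_dir>/<group>/<bench>/new/estimates.json -> "<group>/<bench>"
        bench_id = est.parent.parent.relative_to(criterion_dir).as_posix()
        results[bench_id] = json.loads(est.read_text())["mean"]["point_estimate"]
    return results


def find_regressions(
    baseline: dict[str, float], current: dict[str, float], threshold: float = THRESHOLD
) -> list[tuple[str, float, float, float]]:
    """Return (id, old_ns, new_ns, ratio) for each benchmark slower than allowed."""
    regressions = []
    for bench_id, new in sorted(current.items()):
        old = baseline.get(bench_id)
        if old is None or old <= 0:
            continue  # new benchmark or unusable baseline entry: nothing to compare
        ratio = new / old
        if ratio > 1 + threshold:
            regressions.append((bench_id, old, new, ratio))
    return regressions


def run_gate(criterion_dir: Path, baseline_path: Path, out_dir: Path, today: str) -> int:
    """Write current-<today>.json, then compare against the baseline; return an exit code."""
    current = collect_estimates(criterion_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    (out_dir / f"current-{today}.json").write_text(json.dumps(current, indent=2))

    if not baseline_path.exists():
        # First run on this runner: seed the baseline and pass.
        baseline_path.parent.mkdir(parents=True, exist_ok=True)
        baseline_path.write_text(json.dumps(current, indent=2))
        print(f"No baseline at {baseline_path}; seeded from this run.")
        return 0

    baseline = json.loads(baseline_path.read_text())
    regressions = find_regressions(baseline, current)
    for bench_id, old, new, ratio in regressions:
        print(f"REGRESSION {bench_id}: {old:.1f} ns -> {new:.1f} ns ({ratio - 1:+.1%})")
    return 1 if regressions else 0
```

In a workflow step this might be called as sys.exit(run_gate(Path("target/criterion"), Path.home() / ".cache/terraphim-bench/baseline.json", Path("benchmark-results"), date.today().isoformat())), so any regression becomes a non-zero exit and fails the job. Note that in this sketch a benchmark present in the baseline but missing from the current run is not flagged, and an empty current run always passes.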

Why baseline stored on runner, not in git

benchmark-results/ is in .gitignore. Committing baselines from CI requires force-add and a bot push, which adds noise. Runner-local storage (~/.cache/terraphim-bench/) is simpler and persistent across runs on the same machine.
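Because the baseline lives in a single file that persists across runs, one reasonable precaution is to replace it atomically, so a job cancelled mid-write cannot leave a truncated baseline.json for the next comparison. The sketch below is an assumption, not the workflow's code: update_baseline_if_main is a hypothetical helper, and it takes the ref as an argument to mirror the github.ref condition the workflow uses to restrict baseline updates to main.

```python
from __future__ import annotations

import json
import os
import tempfile
from pathlib import Path

# Runner-local baseline location described in the PR.
DEFAULT_BASELINE = Path.home() / ".cache" / "terraphim-bench" / "baseline.json"


def update_baseline_if_main(
    current: dict[str, float], ref: str | None, baseline_path: Path = DEFAULT_BASELINE
) -> bool:
    """Replace the baseline with `current`, but only for pushes to main.

    Returns True if the baseline was written.
    """
    if ref != "refs/heads/main":
        return False  # other branches and refs never move the baseline

    baseline_path.parent.mkdir(parents=True, exist_ok=True)
    # Write to a temp file in the same directory, then rename it over the old baseline.
    # os.replace is atomic on POSIX within one filesystem, so a cancelled job leaves
    # either the old baseline or the new one, never a half-written baseline.json.
    fd, tmp_name = tempfile.mkstemp(dir=baseline_path.parent, prefix=".baseline-", suffix=".json")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(current, f, indent=2, sort_keys=True)
        os.replace(tmp_name, baseline_path)
    except BaseException:
        Path(tmp_name).unlink(missing_ok=True)  # don't leave stray temp files behind
        raise
    return True
```

Keeping the temp file in the same directory as the baseline is what makes the rename atomic; a temp file under /tmp could sit on a different filesystem.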

Test plan

Merge lands; Gitea runner fires benchmark job on push to main
First run: baseline seeded, job green
Subsequent runs: regression gate compares and reports
Manual trigger via workflow_dispatch works
GitHub performance-benchmarking.yml no longer auto-triggers on push/PR

🤖 Generated with Terraphim AI

…tHub to manual

Add .gitea/workflows/benchmark.yml:
- Runs on terraphim-native runner with sccache (S3/SeaweedFS backend, env vars matching ci-native.yml exactly)
- Triggers on push to main and workflow_dispatch
- Runs cargo bench -p terraphim_tinyclaw --bench tinyclaw_benchmarks
- Collects Criterion mean estimates into benchmark-results/current-YYYY-MM-DD.json
- Regression gate compares against runner-local baseline at ~/.cache/terraphim-bench/baseline.json; fails if any benchmark degrades >20%; seeds baseline from first run (today's date)
- Updates baseline automatically on main-branch pushes

Scope .github/workflows/performance-benchmarking.yml to manual only:
- Remove pull_request: and push: triggers; retain workflow_dispatch:
- Add comment explaining the split: Gitea handles CI benchmarks, GitHub workflow kept for deep on-demand analysis and SLO reporting

AlexMikhalev · 2026-06-03T19:44:53Z

compound-review verdict: GO

Reviewed by: Carthos (Domain Architect, quality-coordinator)
Scope: .gitea/workflows/benchmark.yml (new) + .github/workflows/performance-benchmarking.yml (scoped to manual)

Architecture Assessment

The boundary between Gitea native-ci (sccache-backed self-hosted runner, push-triggered) and GitHub CI (cloud, manual/on-demand) is clean and well-reasoned. This follows the ADR-0001 pattern already established.

Findings

Acceptable (matches established pattern)

No checkout step: native-ci.yml also omits checkout. The terraphim-native runner operates against a pre-seeded workspace. Consistent with existing convention.
Hardcoded sccache path (/home/alex/.local/bin/sccache): matches native-ci.yml env config exactly.
github.ref condition in Gitea workflow: Gitea Actions exposes the github context for compatibility -- this is correct.
benchmark-results/ in .gitignore: confirmed present.
Crate and benchmark file exist: crates/terraphim_tinyclaw/benches/tinyclaw_benchmarks.rs exists; [[bench]] entry correct in Cargo.toml.

Non-blocking recommendation

Python glob depth: p.glob("*/new/estimates.json") matches estimates only one directory level below target/criterion. Criterion nests grouped benchmarks one level deeper: target/criterion/{group}/{bench_name}/new/estimates.json. Change the glob to p.glob("**/new/estimates.json") so grouped benchmarks are picked up as the suite grows. The current benchmarks are placeholders (black_box(())), so the immediate impact is low: the baseline seeds empty results and the gate always passes. The fix should still land before real implementations replace the placeholders.
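The difference between the two patterns can be checked against a throwaway directory tree. In this sketch benchmark_ids is a hypothetical helper; it keys each result by its path relative to target/criterion (an assumption, not what the workflow currently does), so two groups that contain a benchmark with the same name would not collide.

```python
from __future__ import annotations

from pathlib import Path


def benchmark_ids(criterion_dir: Path, pattern: str) -> list[str]:
    """Sorted benchmark IDs (paths relative to criterion_dir) found by `pattern`."""
    return sorted(
        # .../<id>/new/estimates.json -> <id>, which may contain a group prefix
        est.parent.parent.relative_to(criterion_dir).as_posix()
        for est in criterion_dir.glob(pattern)
    )
```

With an ungrouped benchmark at ungrouped/new/estimates.json and a group at tinyclaw/session_load/new/estimates.json, the "*" pattern finds only ungrouped, while "**" finds both, because "**" matches zero or more intermediate directories.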

Informational

All four benchmarks (session_load, session_save, bus_send_receive, tool_filesystem) are placeholder implementations. The regression gate infrastructure is correct but will only become meaningful when real implementations replace black_box(()). This is intentional scaffolding.

Agent PR Checklist

cargo fmt --check -- not applicable (YAML/Python only)
No #[allow(...)] annotations
.gitignore updated (benchmark-results/ present)
Scope discipline: GitHub workflow cleanly scoped to workflow_dispatch
Runner-local baseline strategy is documented and sound

Triggering review chain:
@adf:test-guardian please review issue #898
@adf:security-sentinel please review issue #898
@adf:spec-validator please review issue #898
@adf:compliance-watchdog please review issue #898

AlexMikhalev wants to merge 1 commit into main from task/bench-gitea-native-ci-v2