Skip to content

ci(bench): port performance-benchmarking to Gitea native-ci (sccache), scope GitHub to manual#898

Open
AlexMikhalev wants to merge 1 commit into
mainfrom
task/bench-gitea-native-ci-v2
Open

ci(bench): port performance-benchmarking to Gitea native-ci (sccache), scope GitHub to manual#898
AlexMikhalev wants to merge 1 commit into
mainfrom
task/bench-gitea-native-ci-v2

Conversation

@AlexMikhalev
Copy link
Copy Markdown
Contributor

Summary

  • Add .gitea/workflows/benchmark.yml: native-ci Criterion benchmark runner + regression gate
  • Scope .github/workflows/performance-benchmarking.yml to workflow_dispatch only (remove push: and pull_request: triggers)

What the Gitea workflow does

Runs on the terraphim-native runner with sccache backed by SeaweedFS (S3-compatible, same env config as ci-native.yml).

on: push → main  +  workflow_dispatch

Steps:

  1. sccache --start-server + zero stats
  2. cargo bench -p terraphim_tinyclaw --bench tinyclaw_benchmarks (Criterion)
  3. Collect mean.point_estimate from target/criterion/*/new/estimates.jsonbenchmark-results/current-YYYY-MM-DD.json
  4. Regression gate: compare against ~/.cache/terraphim-bench/baseline.json on the runner; fail if any benchmark degrades >20%
    • First run (no baseline): seeds baseline with today's results and exits 0
  5. Update baseline (main branch only): copy current estimates → baseline
  6. sccache --show-stats

Why baseline stored on runner, not in git

benchmark-results/ is in .gitignore. Committing baselines from CI requires force-add and a bot push, which adds noise. Runner-local storage (~/.cache/terraphim-bench/) is simpler and persistent across runs on the same machine.

Test plan

  • Merge lands; Gitea runner fires benchmark job on push to main
  • First run: baseline seeded, job green
  • Subsequent runs: regression gate compares and reports
  • Manual trigger via workflow_dispatch works
  • GitHub performance-benchmarking.yml no longer auto-triggers on push/PR

🤖 Generated with Terraphim AI

…tHub to manual

Add .gitea/workflows/benchmark.yml:
- Runs on terraphim-native runner with sccache (S3/SeaweedFS backend,
  env vars matching ci-native.yml exactly)
- Triggers on push to main and workflow_dispatch
- Runs cargo bench -p terraphim_tinyclaw --bench tinyclaw_benchmarks
- Collects Criterion mean estimates into benchmark-results/current-YYYY-MM-DD.json
- Regression gate compares against runner-local baseline at
  ~/.cache/terraphim-bench/baseline.json; fails if any benchmark
  degrades >20%; seeds baseline from first run (today's date)
- Updates baseline automatically on main-branch pushes

Scope .github/workflows/performance-benchmarking.yml to manual only:
- Remove pull_request: and push: triggers; retain workflow_dispatch:
- Add comment explaining the split: Gitea handles CI benchmarks,
  GitHub workflow kept for deep on-demand analysis and SLO reporting
@AlexMikhalev
Copy link
Copy Markdown
Contributor Author

compound-review verdict: GO

Reviewed by: Carthos (Domain Architect, quality-coordinator)
Scope: .gitea/workflows/benchmark.yml (new) + .github/workflows/performance-benchmarking.yml (scoped to manual)


Architecture Assessment

The boundary between Gitea native-ci (sccache-backed self-hosted runner, push-triggered) and GitHub CI (cloud, manual/on-demand) is clean and well-reasoned. This follows the ADR-0001 pattern already established.

Findings

Acceptable (matches established pattern)

  • No checkout step: native-ci.yml also omits checkout. The terraphim-native runner operates against a pre-seeded workspace. Consistent with existing convention.
  • Hardcoded sccache path (/home/alex/.local/bin/sccache): matches native-ci.yml env config exactly.
  • github.ref condition in Gitea workflow: Gitea Actions exposes the github context for compatibility -- this is correct.
  • benchmark-results/ in .gitignore: confirmed present.
  • Crate and benchmark file exist: crates/terraphim_tinyclaw/benches/tinyclaw_benchmarks.rs exists; [[bench]] entry correct in Cargo.toml.

Non-blocking recommendation

  • Python glob depth: p.glob("*/new/estimates.json") matches only one level deep. Criterion groups benchmarks in subdirectories: target/criterion/{group}/{bench_name}/new/estimates.json. Change the glob to p.glob("**/new/estimates.json") to handle grouped benchmarks as the suite grows. Current benchmarks are placeholders (black_box(())), so the immediate impact is low -- the baseline seeds empty results and the gate always passes -- but the fix should land before real implementations replace the placeholders.

Informational

  • All four benchmarks (session_load, session_save, bus_send_receive, tool_filesystem) are placeholder implementations. The regression gate infrastructure is correct but will only become meaningful when real implementations replace black_box(()). This is intentional scaffolding.

Agent PR Checklist

  • cargo fmt --check -- not applicable (YAML/Python only)
  • No #[allow(...)] annotations
  • .gitignore updated (benchmark-results/ present)
  • Scope discipline: GitHub workflow cleanly scoped to workflow_dispatch
  • Runner-local baseline strategy is documented and sound

Triggering review chain:
@adf:test-guardian please review issue #898
@adf:security-sentinel please review issue #898
@adf:spec-validator please review issue #898
@adf:compliance-watchdog please review issue #898

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant