Integrate Benchmarking into CI with Performance Analysis by ev1lm0nk3y · Pull Request #418 · walles/moor

ev1lm0nk3y · 2026-05-20T03:27:03Z

Overview
This PR overhauls our Continuous Integration pipeline by consolidating workflows, migrating to modern GitHub Actions for linting, and introducing automated performance tracking with Pull Request feedback. The goal is to catch both logic bugs and performance regressions in a single, efficient pipeline.

Key Changes

Optimized CI Architecture (2-Job Structure)
The workflow is now split into two focused jobs to reduce redundant cycles and improve security:

Lint Job (Ubuntu-only):
- Migrated to golangci/golangci-lint-action@v6 with only-new-issues: true for cleaner PR reviews.
- Centralized man page linting (mandoc) into this job, removing it from the matrixed testing job.
Validate Job (Matrix: Linux, macOS, Windows):
- Consolidates unit tests and benchmarks into a single matrixed execution.
- Security: Elevated permissions (contents: write, pull-requests: write) are scoped specifically to this job to support performance analysis.

Automated Benchmarking & PR Feedback

New benchmark.sh Script: A platform-aware script that detects system specs (OS/Arch/CPU) and saves results to benchmarks/.
PR Performance Comments: Integrated benchmark-action/github-action-benchmark to automatically post a comment on PRs comparing the current run's performance against the baseline.
Regression Guardrails: The CI will fail if performance regresses by more than 20% (alert-threshold: 120%).
Stable Baselines: While benchmarks run on all platforms for runtime validation, analysis/commenting is restricted to ubuntu-latest to ensure hardware consistency and prevent redundant bot comments.

Tooling Enhancements

test.sh: Added a SKIP_LINT flag to allow CI to skip the internal linting step (now handled by the specialized Lint job).
README.md: Updated the "Developing" section with instructions for the new benchmarking workflow.

How to Use Locally
Run the new benchmark script directly:

1 ./benchmark.sh
Or use the new flag in the test suite:

1 ./test.sh --bench

Verification Results

test.sh --bench correctly delegates to benchmark.sh.
benchmark.sh correctly identifies hardware (e.g., benchmarks/darwin-arm64-m4_pro).
CI correctly skips linting in the matrix jobs when SKIP_LINT is set.
Workflow YAML validated for matrix logic and permission scoping.

ev1lm0nk3y · 2026-05-20T03:27:39Z

Feel free to reject this, but I thought these changes would be useful

Split CI into separate lint, test, and benchmark jobs Consolidate CI: Combine tests/benchmarks and move mandoc linting to Lint job limit fetch depth for linting

walles · 2026-05-23T06:59:07Z

Doesn't this assume that all GitHub Actions runners always have the same performance characteristics?

Would the benchmark numbers here be stable or flaky, and if flaky, by how much?

Since these are random cloud machines, how would one protect from flakiness from things like:

Load from other VMs on the same host
Disk caches being filled or not filled, on the host
Amount of RAM in the VM
Number of cores in the VM

ev1lm0nk3y · 2026-06-01T14:10:32Z

Responses in-line

In short, yes these are valid concerns and I have some suggestions I would like your opinion on.

Do nothing and let the alert threshold at 120% of baseline account for the variability at the risk of introducing performance regressions over time.
Use sample averaging to smooth out any single test run spike alleviating the "noisy neighbor" concern
Move to "relative" benchmarking where the tests are run within the source and destination branches to ensure that both are using the same hardware specs
Convert to using deterministic metrics over time-based metrics because things like allocs/op or B/op do not change with the host's current performance

My inclination is to use deterministic metrics instead of sample averaging and relative benchmarking because these are calculated by the Go runtime based on code execution and are virtually immune to VM specs and noisy neighbors. To my knowledge benchdiff and benchstat use the time-based metrics so a custom parser my be required.

What do you think?

Integrate benchmarking into CI with PR analysis

c91d14b

Split CI into separate lint, test, and benchmark jobs Consolidate CI: Combine tests/benchmarks and move mandoc linting to Lint job limit fetch depth for linting

ev1lm0nk3y force-pushed the ci-benchmark-integration branch from 84ae6aa to c91d14b Compare May 20, 2026 03:30

Merge branch 'master' into ci-benchmark-integration

145f6c3

ev1lm0nk3y added 2 commits June 1, 2026 07:18

fix: lint action version upgrade to v8

bacbc66

fix: test.sh depends on 'mandoc' so installing on mac and ubuntu

b1a030e

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Integrate Benchmarking into CI with Performance Analysis#418

Integrate Benchmarking into CI with Performance Analysis#418
ev1lm0nk3y wants to merge 4 commits into
walles:masterfrom
ev1lm0nk3y:ci-benchmark-integration

ev1lm0nk3y commented May 20, 2026

Uh oh!

ev1lm0nk3y commented May 20, 2026

Uh oh!

walles commented May 23, 2026

Uh oh!

ev1lm0nk3y commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

ev1lm0nk3y commented May 20, 2026

Uh oh!

ev1lm0nk3y commented May 20, 2026

Uh oh!

walles commented May 23, 2026

Uh oh!

ev1lm0nk3y commented Jun 1, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants