# Performance Benchmarks

Structured workflow for measuring and documenting optimization impact.

## Quick reference

```bash
# Run benchmarks and save a named snapshot
uv run pytest tests/test_profiling.py -m slow \
  --benchmark-only --benchmark-disable-gc \
  --benchmark-save=<label>

# Compare current run against a named baseline
uv run pytest tests/test_profiling.py -m slow \
  --benchmark-only --benchmark-disable-gc \
  --benchmark-compare=0001  # or use the full ID

# Compare two saved snapshots (no test run)
uv run pytest-benchmark compare \
  .benchmarks/Darwin-CPython-3.14-64bit/0001_*.json \
  .benchmarks/Darwin-CPython-3.14-64bit/0002_*.json \
  --columns=mean,stddev \
  --sort=name \
  --group-by=name

# Generate cProfile data for the top 20 functions per benchmark
uv run pytest tests/test_profiling.py -m slow \
  --benchmark-only --benchmark-disable-gc \
  --benchmark-cprofile=cumtime --benchmark-cprofile-top=20

# Export to JSON for scripted analysis
uv run pytest tests/test_profiling.py -m slow \
  --benchmark-only --benchmark-disable-gc \
  --benchmark-json=/tmp/bench.json
```

## Optimization workflow

### 1. Establish baseline

Before any optimization, save a named baseline:

```bash
uv run pytest tests/test_profiling.py -m slow \
  --benchmark-only --benchmark-disable-gc \
  --benchmark-save=baseline
```

This creates a file like `.benchmarks/.../0001_<hash>_<date>_baseline.json`.
Note the run number (e.g., `0001`); you'll reference it in later comparisons.
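
Run numbers can also be discovered programmatically, which helps when scripting comparisons. A minimal sketch, assuming pytest-benchmark's default layout of one platform directory per machine containing `NNNN_<hash>_<date>_<label>.json` files; the helper name `find_latest_run` is illustrative:

```python
from pathlib import Path


def find_latest_run(bench_root=".benchmarks"):
    """Return the highest 4-digit run number among saved snapshots, or None.

    Assumes the default pytest-benchmark naming scheme:
    <bench_root>/<platform>/NNNN_<hash>_<date>_<label>.json
    """
    runs = sorted(
        p.name[:4]
        for p in Path(bench_root).glob("*/*.json")
        if p.name[:4].isdigit()
    )
    return runs[-1] if runs else None
```

Useful for invocations like `--benchmark-compare=$(python find_run.py)` in CI scripts.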

### 2. Apply one optimization at a time

Each optimization should be:

- A single, focused change
- On its own commit (or branch)
- Measured immediately after

### 3. Measure and compare

After applying an optimization:

```bash
uv run pytest tests/test_profiling.py -m slow \
  --benchmark-only --benchmark-disable-gc \
  --benchmark-save=<optimization-label> \
  --benchmark-compare=<baseline-number>
```

This runs the benchmarks, saves the results, and prints a comparison table
showing the delta (%) against the baseline.

### 4. Log results

After each optimization, add a row to the progress log below.
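
Log rows can be generated from a `--benchmark-json` export rather than copied by hand. A sketch, assuming pytest-benchmark's JSON layout (a top-level `benchmarks` list whose entries carry `name` and a `stats` dict with `mean`/`stddev` in seconds); `rows_from_json` is a hypothetical helper name:

```python
import json


def rows_from_json(path):
    """Render `| name | mean | stddev |` markdown rows from a
    pytest-benchmark JSON export (stats are reported in seconds)."""
    with open(path) as f:
        data = json.load(f)
    rows = []
    for bench in data["benchmarks"]:
        mean_us = bench["stats"]["mean"] * 1e6      # seconds -> microseconds
        stddev_us = bench["stats"]["stddev"] * 1e6
        rows.append(f"| {bench['name']} | {mean_us:.1f} µs | {stddev_us:.1f} µs |")
    return "\n".join(rows)
```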

### 5. Validate correctness

Always run the full test suite after each optimization:

```bash
timeout 120 uv run pytest -n 4
```

## Benchmark matrix

| Category | Benchmark | What it exercises |
|----------|-----------|-------------------|
| **Setup** | `test_flat_machine` | Instance + listener + callback registration |
| **Setup** | `test_compound_machine` | Nested state setup |
| **Setup** | `test_parallel_machine` | Parallel region setup |
| **Setup** | `test_guarded_machine` | Guard/cond expression parsing |
| **Setup** | `test_history_machine` | History state setup |
| **Setup** | `test_deep_history_machine` | Deep nested history setup |
| **Events** | `test_flat_self_transition` | Self-transition + model callbacks |
| **Events** | `test_compound_enter_exit` | Enter/exit compound state |
| **Events** | `test_parallel_region_events` | Events in parallel regions |
| **Events** | `test_guarded_transitions` | Guard evaluation + selection |
| **Events** | `test_history_pause_resume` | Shallow history save/restore |
| **Events** | `test_deep_history_cycle` | Deep history save/restore |
| **Events** | `test_many_transitions_full_cycle` | 5-state ring traversal |
| **Events** | `test_many_transitions_reset` | Composite event (multi-source `\|`) |

## Progress log

Record each optimization here, using the `--benchmark-compare` output as the source.

### Baseline (run `0512`, CPython 3.14, Apple Silicon)

| Benchmark | Mean | StdDev |
|-----------|------|--------|
| test_flat_machine | 189.6 µs | 2.0 µs |
| test_compound_machine | 172.7 µs | 50.2 µs |
| test_parallel_machine | 159.3 µs | 4.8 µs |
| test_guarded_machine | 162.8 µs | 7.6 µs |
| test_history_machine | 151.8 µs | 5.3 µs |
| test_deep_history_machine | 164.0 µs | 7.0 µs |
| test_flat_self_transition | 267.0 µs | 8.5 µs |
| test_compound_enter_exit | 1018.5 µs | 18.6 µs |
| test_parallel_region_events | 1280.4 µs | 16.0 µs |
| test_guarded_transitions | 502.9 µs | 7.3 µs |
| test_history_pause_resume | 631.6 µs | 14.8 µs |
| test_deep_history_cycle | 706.0 µs | 10.5 µs |
| test_many_transitions_full_cycle | 1262.0 µs | 22.3 µs |
| test_many_transitions_reset | 1016.0 µs | 21.1 µs |

<!-- Copy this template for each optimization:

### Optimization N: <title>

| Benchmark | Before | After | Delta |
|-----------|--------|-------|-------|
| ... | ... | ... | ...% |

**Commit:** `<hash>`
**Description:** ...
**Tests pass:** yes/no
-->
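
For the template's Delta column, a one-line helper keeps the sign convention consistent (percent change relative to the Before value, so negative means faster); the name `delta_pct` is illustrative:

```python
def delta_pct(before, after):
    """Percent change relative to the baseline value; negative means faster."""
    return (after - before) / before * 100.0
```

For example, `round(delta_pct(1018.5, 850.0), 1)` gives `-16.5`.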

---

## Advanced: ad-hoc profiling

For deeper investigation of a specific benchmark, use the cProfile integration:

```bash
# cProfile sorted by cumulative time
uv run pytest tests/test_profiling.py::TestEventPerformance::test_parallel_region_events \
  -m slow --benchmark-only --benchmark-disable-gc \
  --benchmark-cprofile=cumtime --benchmark-cprofile-top=30
```

To generate `.prof` files for visualization (snakeviz, speedscope, etc.):

```bash
uv run pytest tests/test_profiling.py::TestEventPerformance::test_parallel_region_events \
  -m slow --benchmark-only --benchmark-disable-gc \
  --benchmark-cprofile=cumtime \
  --benchmark-cprofile-dump=/tmp/bench

# Opens interactive flamegraph in the browser
uv run snakeviz /tmp/bench-test_parallel_region_events.prof
```
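
The `.prof` dumps can also be inspected without a browser via the standard library's `pstats`. A self-contained sketch that profiles a stand-in workload and prints the top entries by cumulative time; in practice you would pass the file written by `--benchmark-cprofile-dump` to `pstats.Stats` instead:

```python
import cProfile
import io
import pstats


def workload():
    # Stand-in for the benchmarked code path.
    return sum(i * i for i in range(10_000))


profiler = cProfile.Profile()
profiler.enable()
workload()
profiler.disable()

# With a dump file this would be: pstats.Stats("/tmp/bench-<name>.prof", stream=buf)
buf = io.StringIO()
stats = pstats.Stats(profiler, stream=buf)
stats.sort_stats("cumtime").print_stats(10)  # top 10 by cumulative time
print(buf.getvalue())
```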