Skip to content

Test suite: opt-in parallel runs, faster serial, stricter hygiene#1064

Open
tony wants to merge 10 commits into
masterfrom
pytest-improvements
Open

Test suite: opt-in parallel runs, faster serial, stricter hygiene#1064
tony wants to merge 10 commits into
masterfrom
pytest-improvements

Conversation

@tony

@tony tony commented Jun 28, 2026

Copy link
Copy Markdown
Member

Summary

  • Add opt-in parallel test execution via pytest-xdist and a just test-parallel recipe, so the largely-independent, tmux-driven suite can fan out across cores for a much faster local run while the default just test stays serial.
  • Remove two fixed time.sleep(0.50) settle waits in the freezer tests — WorkspaceBuilder.build() already waits for pane readiness (_wait_for_pane_ready), so the post-build wait was dead time; trims ~1s from the default serial run without weakening the assertions.
  • Fix a color-env test fragility: an autouse fixture neutralizes FORCE_COLOR/NO_COLOR for every test, so the CLI help-text assertions are hermetic regardless of a contributor's exported shell color settings (and stay green under xdist).
  • Enforce --strict-markers so a mistyped or unregistered marker becomes a hard collection error instead of a silent warning.
  • Flatten the only two class-based test groupings into module-level functions, aligning with the project's functional-tests convention.

Changes by area

Parallelism (opt-in)

  • pyproject.toml: add pytest-xdist to the dev and testing dependency groups (lockfile updated).
  • justfile: add a test-parallel recipe using a capped worker count (-n 8); test remains serial.

Test robustness & speed

  • tests/workspace/test_freezer.py: drop the fixed time.sleep(0.50) settle wait (and the now-unused import time) from test_freeze_config and test_freeze_logs_debugbuild() already waits for pane readiness, so freezing immediately after build() is safe.
  • conftest.py: new autouse _pin_test_color_env clearing FORCE_COLOR/NO_COLOR (mirrors the existing _pin_test_shell_env hermeticity pattern); color-specific tests still set the vars explicitly.

Test hygiene & convention

  • pyproject.toml: append --strict-markers to [tool.pytest.ini_options] addopts.
  • tests/workspace/test_finders_local.py: flatten TestFindLocalWorkspaceEdgeCases and TestLocalWorkspaceFilesConstant into module-level def test_* functions (bodies and assertions unchanged; the two constant tests renamed so they read at module scope).

Tooling

  • .gitignore: ignore .pytest-optimizer/ (local test-profiling tool state; not part of the project).

Design decisions

  • Capped worker count, not -n auto: on a high-core machine -n auto saturates every core and regresses below serial (each worker re-imports and re-collects the whole suite). A fixed -n 8 sits in the sweet spot. -n is deliberately kept out of addopts so it doesn't degrade --pdb/-x ergonomics or the default run; parallelism is opt-in via the recipe.
  • No post-build wait needed: WorkspaceBuilder.build() already calls _wait_for_pane_ready() per pane, so the freezer tests freeze immediately after build() returns — the prior fixed sleep was redundant.
  • Neutralize color env at the root, not per-test: like _pin_test_shell_env pins $SHELL, pinning the color env once (autouse) makes every test independent of the contributor's ambient environment rather than patching each color-sensitive test.
  • No markers ini entry: the only non-builtin marker (flaky) is owned and registered by pytest-rerunfailures (a mandatory dependency); the project defines no custom markers of its own, so --strict-markers alone is complete.

Verification

No fixed sleeps remain in the freezer tests:

$ rg 'time\.sleep' tests/workspace/test_freezer.py

No class-based groupings remain in the finders tests:

$ rg '^class Test' tests/workspace/test_finders_local.py

Strict-markers is active:

$ rg 'strict-markers' pyproject.toml

Test plan

  • Default serial suite green — uv run py.test
  • Parallel suite green — just test-parallel (uv run py.test -n 8)
  • Suite green with FORCE_COLOR exported — proves the color-env hermeticity fix
  • Unregistered marker hard-errors at collection — proves --strict-markers is enforced via addopts
  • tests/workspace/test_finders_local.py collects the same set after flattening (no class prefixes)
  • Lint clean — uv run ruff check . and uv run ruff format .
  • Types clean — uv run mypy (strict)

tony added 6 commits June 28, 2026 06:47
why: The pytest-optimizer tooling writes durable profiling and
benchmark JSON under .pytest-optimizer/; it is local state, not
source, and should never be committed.

what:
- Add .pytest-optimizer/ to .gitignore
why: test_freeze_config and test_freeze_logs_debug each slept a
fixed time.sleep(0.50) to let tmux settle after build() before
freeze() — ~1.0s of dead wait per run.

what:
- Replace both sleeps with retry_until polling on
  len(session.windows) >= len(session_config["windows"])
- Import retry_until (libtmux), drop now-unused import time
- Suite 39.69s -> 38.53s (-1.16s), above the 0.68s k*MAD noise band
why: the suite is ~88% call-bound across independent, tmux-driven
test bodies (each test spawns its own daemon on a unique socket),
so it parallelizes cleanly. Measured: serial 39.69s -> ~16-19s
under capped xdist (best 13.7s) ~= 2.1-2.9x, far above the 0.68s
noise band; fully green under -n 8 in clean CI conditions.

what:
- Add pytest-xdist to the dev and testing dependency groups
- Add a `just test-parallel` recipe with a capped worker count
  (-n 8); `just test` stays serial
- Keep -n out of addopts: -n auto (=cores) thrashes at full core
  saturation (~40s) and -n breaks --pdb/-x ergonomics
why: CLI color is decided live from FORCE_COLOR/NO_COLOR (see
tmuxp._internal.colors.Colors). A contributor who exports
FORCE_COLOR in their shell gets ANSI-wrapped help text, so
test_help_examples.py fails (len(examples) == 0). The same leak
made the suite look order-dependent under xdist when FORCE_COLOR
reached the workers.

what:
- Add autouse _pin_test_color_env clearing FORCE_COLOR/NO_COLOR for
  every test; color-specific tests still set them explicitly and
  monkeypatch restores the contributor's values at teardown
- Suite now green with FORCE_COLOR exported and under xdist
  regardless of ambient color env
why: CLAUDE.md mandates functional tests only ("avoid class TestFoo
groupings"). test_finders_local.py had two class-based groupings,
the only such violation in the suite.

what:
- Flatten TestFindLocalWorkspaceEdgeCases (7 methods) and
  TestLocalWorkspaceFilesConstant (2 methods) into module-level
  def test_* functions; drop the self receiver
- Rename the constant tests to test_local_workspace_files_constant_*
  so they read at module scope without the class for context
- No behavior change: same tests collected, assertions verbatim
why: without --strict-markers an unregistered or mistyped marker
(e.g. @pytest.mark.flakey) is a silent warning, so a typo'd mark
quietly stops applying. Strict mode turns it into a hard collection
error.

what:
- Add --strict-markers to [tool.pytest.ini_options] addopts
- No markers ini entry needed: the only custom marker (flaky) is
  registered by pytest-rerunfailures; there are no project-defined
  custom markers
- Verified: suite green and an unregistered mark now errors at
  collection
@codecov

codecov Bot commented Jun 28, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.01%. Comparing base (c706e5d) to head (bd3db6e).

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1064      +/-   ##
==========================================
+ Coverage   81.98%   82.01%   +0.02%     
==========================================
  Files          28       28              
  Lines        2548     2552       +4     
  Branches      485      485              
==========================================
+ Hits         2089     2093       +4     
  Misses        328      328              
  Partials      131      131              

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@tony

tony commented Jun 28, 2026

Copy link
Copy Markdown
Member Author

Code review

Found 3 issues:

  1. The retry_until(...) guard is satisfied the instant build() returns. build() creates windows synchronously (each new_window() is a blocking tmux call), so len(session.windows) >= expected_windows is already true on the first check and the call returns without polling. The speedup is real and the tests stay green, but the comment "Wait for all configured windows to register" overstates it, and the pane/shell settling that the prior time.sleep(0.50) allowed before freeze() is no longer awaited. If only window registration matters, the indirection adds little; if settling matters, this condition doesn't capture it. Same pattern at L126-L130. (code observation)

# Wait for all configured windows to register instead of a fixed sleep.
expected_windows = len(session_config["windows"])
retry_until(lambda: len(session.windows) >= expected_windows, 2)

  1. Commit bodies for 599c7a76 and 900cc704 end with Refs: pytest-optimizer H33 / H19 — internal AI-tooling identifiers with no meaning to a git-log reader. (CLAUDE.md says remove "AI-tool metadata")

https://github.com/tmux-python/tmuxp/blob/f30986a31ee1b48eefef1390db310658d1a056cd/CLAUDE.md#L528-L530

  1. Several commit bodies embed fragile pass/collection counts — "797 passed" (f30986a3, ac9c318a), "17 collected items" (25b71b14), "10/10 + 3/3 green" (599c7a76). The timing figures that justify the worker-count decision are evidence and fine; the pass counts add nothing over "green". (CLAUDE.md Brittle References: "fragile file/test counts")

https://github.com/tmux-python/tmuxp/blob/f30986a31ee1b48eefef1390db310658d1a056cd/CLAUDE.md#L531-L534

tony added 3 commits June 28, 2026 09:50
why: WorkspaceBuilder.build() already calls _wait_for_pane_ready()
for each normal pane, so the session is fully materialized when
build() returns. The post-build retry_until on window count was a
no-op (windows register synchronously) and its comment overstated
what it did.

what:
- Remove the retry_until window-count waits from test_freeze_config
  and test_freeze_logs_debug, plus the now-unused retry_until import
- build() flows straight to freeze(); the existing freezer tests
  remain the regression coverage
why: test_at_filesystem_root requested tmp_path and monkeypatch but
used neither — it exercises traversal against the real filesystem
root and patches nothing.

what:
- Remove the unused tmp_path/monkeypatch parameters from the test
  signature
why: the comment said "should find both" above an assertion that
tolerates more than two results — when traversal continues past
home it can also pick up configs in ancestor directories.

what:
- Reword the comment to match the >= 2 assertion
@tony tony force-pushed the pytest-improvements branch from f30986a to 8a39075 Compare June 28, 2026 14:56
@tony

tony commented Jun 28, 2026

Copy link
Copy Markdown
Member Author

Code review (re-review after force-push)

The three findings from the earlier review are resolved: the no-op retry_until was removed (build() already waits for pane readiness), and the commit bodies no longer carry Refs: metadata or brittle counts. I also corrected the PR description, which still described the removed retry_until approach.

Found 1 issue:

  1. test_local_workspace_files_constant_is_list is fully subsumed by test_local_workspace_files_constant_order just above it — the equality assertion LOCAL_WORKSPACE_FILES == [".tmuxp.yaml", ".tmuxp.yml", ".tmuxp.json"] already proves the value is a list of length 3, so the isinstance/len checks add no coverage. Optional cleanup (carried over verbatim when the class was flattened).

def test_local_workspace_files_constant_order() -> None:
"""Verify LOCAL_WORKSPACE_FILES has correct order (yaml, yml, json)."""
assert LOCAL_WORKSPACE_FILES == [".tmuxp.yaml", ".tmuxp.yml", ".tmuxp.json"]
def test_local_workspace_files_constant_is_list() -> None:
"""Verify LOCAL_WORKSPACE_FILES is a list."""
assert isinstance(LOCAL_WORKSPACE_FILES, list)
assert len(LOCAL_WORKSPACE_FILES) == 3

why: test_local_workspace_files_constant_is_list only asserted the
constant is a list of length 3 — both already proven by the sibling
test_local_workspace_files_constant_order via its equality assertion
against the list literal.

what:
- Remove the subsumed is_list test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant