test(hosting): fix flaky "Docker is not running" failures in pipeline tests by radical · Pull Request #18126 · microsoft/aspire

radical · 2026-06-11T20:36:24Z

Description

DistributedApplicationPipelineTests intermittently went red on windows-latest
CI (and passed on re-run) with:

Aspire.Hosting.DistributedApplicationException: Docker is not running.
Start Docker and try again.
   at ...DistributedApplicationPipeline... (check-container-runtime step)

One run took out 33 tests at once; this signature drove the red-then-green CI on
the Hosting-4 (windows-latest) job.

Root cause: each test builds a publish pipeline with step: null, so
ExecuteAsync runs every default step — including check-container-runtime,
which resolves the real IContainerRuntimeResolver and shells out to
docker container ls. GitHub-hosted Windows runners don't run a Linux-container
Docker daemon, so the step throws; on re-run, when a daemon happens to be up, the
identical code passes. Pure daemon-state flakiness — these tests only validate
pipeline ordering and never intend to touch a container runtime.

Fix: register the shared FakeContainerRuntime as IContainerRuntimeResolver
in each test, so the preflight resolves a fake that reports "running" instead of
probing a real daemon. The builder registers the real resolver with AddSingleton
(not TryAdd), so the later test registration wins. This mirrors the existing
pattern in DockerComposePublisherTests and ResourceContainerImageManagerTests.
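
The "later registration wins" behavior can be illustrated with a minimal sketch using Microsoft.Extensions.DependencyInjection (the NuGet package is assumed to be referenced). IRuntimeProbe, RealProbe and FakeProbe are hypothetical stand-ins, not the real IContainerRuntimeResolver or FakeContainerRuntime, whose members are not shown in this PR. The sketch also shows that a TryAdd call made after an existing registration would be a no-op, so the test side needs a plain AddSingleton for the fake to take effect.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins for illustration only; the real Aspire types differ.
public interface IRuntimeProbe { bool IsRunning(); }
public sealed class RealProbe : IRuntimeProbe { public bool IsRunning() => false; } // pretends the daemon is down
public sealed class FakeProbe : IRuntimeProbe { public bool IsRunning() => true; }  // always reports "running"

public static class RegistrationDemo
{
    // Returns the name of the concrete type resolved for IRuntimeProbe.
    public static string Resolve(bool testUsesTryAdd)
    {
        var services = new ServiceCollection();

        // Builder side: plain AddSingleton, as the PR description states.
        services.AddSingleton<IRuntimeProbe, RealProbe>();

        if (testUsesTryAdd)
        {
            // TryAdd does nothing when the service type is already registered.
            services.TryAddSingleton<IRuntimeProbe, FakeProbe>();
        }
        else
        {
            // A second AddSingleton appends a descriptor; resolving a single
            // service returns the last registration.
            services.AddSingleton<IRuntimeProbe, FakeProbe>();
        }

        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IRuntimeProbe>().GetType().Name;
    }

    public static void Main()
    {
        Console.WriteLine(Resolve(testUsesTryAdd: false)); // FakeProbe
        Console.WriteLine(Resolve(testUsesTryAdd: true));  // RealProbe
    }
}
```

Resolving IEnumerable<IRuntimeProbe> would still return both registrations in the AddSingleton case; only single-service resolution picks the last one.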

Why the test-side fix: check-container-runtime is a legitimate default step;
the bug is that ordering-only unit tests run it against a real daemon. Faking the
resolver is the minimal, established pattern and touches no product code.

Call-outs:

- Test-only; no product code changed.
- Verified locally with Docker both stopped and running: with the fix the full
  class passes (77/77) in both states; reverting it makes the tests fail when the
  daemon is down and pass when it's up, reproducing the CI behavior exactly.
- The same latent pattern exists in AddJavaScriptAppTests / AddViteAppTests
  (they run where Docker is present, so they aren't flaky today); left out to keep
  this PR tight.

Checklist

Is this feature complete?
- Yes. Ready to ship.
Are you including unit tests for the changes and scenario tests if relevant?
- Yes
Did you add public API?
- No
Does the change make any security assumptions or guarantees?
- No

DistributedApplicationPipelineTests intermittently failed on windows-latest CI (then passed on re-run) with:

Aspire.Hosting.DistributedApplicationException: Docker is not running. Start Docker and try again.

In one job this took out 33 tests at once.

Root cause: each test builds a publish pipeline with step: null, so ExecuteAsync runs every default step, including check-container-runtime. That step resolves the real IContainerRuntimeResolver and shells out to `docker container ls`. On runners where the Docker daemon isn't up the step throws; on re-run, when the daemon is up, the same code passes. The failures are therefore pure daemon-state flakiness, not a real test or product bug. These tests only validate pipeline ordering and never intend to touch a container runtime.

Fix: register the shared FakeContainerRuntime as IContainerRuntimeResolver in each test. The preflight then resolves a fake that reports running, making the tests independent of a real daemon. This mirrors the existing pattern in DockerComposePublisherTests and ResourceContainerImageManagerTests.

Validated locally (no product code changed): with the fix the full class passes whether or not Docker is running; reverting it makes the tests fail when the daemon is down and pass when it is up, reproducing the CI behavior exactly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>

github-actions · 2026-06-11T20:37:11Z

🚀 Dogfood this PR with:

⚠️ WARNING: Do not do this without first carefully reviewing the code of this PR to satisfy yourself it is safe.

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 18126

Or

Run remotely in PowerShell:

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 18126"

Copilot

Pull request overview

This PR fixes intermittent "Docker is not running" failures in DistributedApplicationPipelineTests on Windows CI by registering a FakeContainerRuntime as the IContainerRuntimeResolver in every test. The tests validate pipeline ordering/behavior and never intend to touch a real container runtime, so the check-container-runtime default step was failing on GitHub-hosted Windows runners where Docker isn't always available.

Changes:

- Added #pragma warning disable ASPIRECONTAINERRUNTIME001 to suppress the experimental API warning for IContainerRuntimeResolver
- Registered FakeContainerRuntime as IContainerRuntimeResolver in all 65 test methods that create a TestDistributedApplicationBuilder, using the established fake-registration pattern

Every test in DistributedApplicationPipelineTests repeated the same builder construction and service registrations (test output helper, FakeContainerRuntime as IContainerRuntimeResolver, and the activity reporter). Extract this into a CreatePipelineTestBuilder helper so the shared setup lives in one place.

Call sites collapse to a single line; the step, log level, and a caller-supplied activity reporter are passed as helper arguments to cover the few tests that diverge from the defaults.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
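
The shape of such a helper can be sketched roughly as follows. This is not the actual CreatePipelineTestBuilder from the PR, whose signature is not shown here: every type and parameter name below (PipelineTestSetup, CreateServices, ITestRuntime, IReporter, PipelineOptions) is a hypothetical stand-in, and a plain ServiceCollection from Microsoft.Extensions.DependencyInjection takes the place of the test builder. It only illustrates the idea of shared registrations plus optional arguments for the tests that diverge from the defaults.

```csharp
#nullable enable
using System;
using Microsoft.Extensions.DependencyInjection;

// Hypothetical stand-ins for illustration only; not the Aspire test types.
public interface ITestRuntime { bool IsRunning(); }
public sealed class FakeRuntime : ITestRuntime { public bool IsRunning() => true; }
public interface IReporter { string Name { get; } }
public sealed class DefaultReporter : IReporter { public string Name => "default"; }
public sealed class RecordingReporter : IReporter { public string Name => "recording"; }
public sealed record PipelineOptions(string? Step, string LogLevel);

public static class PipelineTestSetup
{
    // One place for the shared setup; optional arguments cover the
    // few callers that need a different step, log level, or reporter.
    public static IServiceCollection CreateServices(
        string? step = null,
        string logLevel = "Information",
        IReporter? reporter = null)
    {
        var services = new ServiceCollection();
        services.AddSingleton(new PipelineOptions(step, logLevel));
        services.AddSingleton<ITestRuntime, FakeRuntime>();            // fake runtime for every test
        services.AddSingleton<IReporter>(reporter ?? new DefaultReporter());
        return services;
    }
}

public static class HelperDemo
{
    public static void Main()
    {
        // Typical call site: a single line with defaults.
        using var defaults = PipelineTestSetup.CreateServices().BuildServiceProvider();
        Console.WriteLine(defaults.GetRequiredService<IReporter>().Name); // default

        // Diverging call site: caller-supplied reporter and step.
        using var custom = PipelineTestSetup
            .CreateServices(step: "publish", reporter: new RecordingReporter())
            .BuildServiceProvider();
        Console.WriteLine(custom.GetRequiredService<IReporter>().Name);   // recording
    }
}
```

Keeping the fake registration inside the helper means a new test cannot forget it, which is the failure mode the first commit had to fix test by test.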

Copilot AI review requested due to automatic review settings June 11, 2026 20:36

Copilot started reviewing on behalf of radical June 11, 2026 20:37 View session

radical requested review from davidfowl, mitchdenny and sebastienros June 11, 2026 20:38

radical marked this pull request as ready for review June 11, 2026 20:38

Copilot AI reviewed Jun 11, 2026

davidfowl reviewed Jun 12, 2026

Comment thread tests/Aspire.Hosting.Tests/Pipelines/DistributedApplicationPipelineTests.cs Outdated

radical requested review from adamint and davidfowl June 12, 2026 18:02

radical wants to merge 2 commits into microsoft:main from radical:ankj/fix-pipeline-tests-docker-preflight

3 participants
