test(hosting): fix flaky "Docker is not running" failures in pipeline tests #18126
Open
radical wants to merge 2 commits into
DistributedApplicationPipelineTests intermittently failed on
windows-latest CI (then passed on re-run) with:

    Aspire.Hosting.DistributedApplicationException: Docker is not running. Start Docker and try again.

In one job this took out 33 tests at once.

Root cause: each test builds a publish pipeline with step: null, so
ExecuteAsync runs every default step, including check-container-runtime.
That step resolves the real IContainerRuntimeResolver and shells out to
`docker container ls`. On runners where the Docker daemon isn't up the
step throws; on re-run, when the daemon is up, the same code passes -- so
the failures are pure daemon-state flakiness, not a real test or product
bug. These tests only validate pipeline ordering and never intend to
touch a container runtime.

Fix: register the shared FakeContainerRuntime as IContainerRuntimeResolver
in each test. The preflight then resolves a fake that reports running,
making the tests independent of a real daemon. This mirrors the existing
pattern in DockerComposePublisherTests and ResourceContainerImageManagerTests.

Validated locally (no product code changed): with the fix the full class
passes whether or not Docker is running; reverting it makes the tests
fail when the daemon is down and pass when it is up -- reproducing the CI
behavior exactly.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
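
The root cause and fix described in the commit message can be modeled without any Aspire types. The sketch below is only an illustration: `IRuntimeProbe`, `FakeRuntimeProbe`, `DownRuntimeProbe`, and `PipelineSketch` are hypothetical stand-ins, not the real `IContainerRuntimeResolver`, `FakeContainerRuntime`, or pipeline code. It shows how a default preflight step can fail an ordering-only test when it depends on runtime state, and how injecting a fake removes that dependency.

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for the runtime resolver; not the real Aspire interface.
public interface IRuntimeProbe
{
    Task<bool> IsRunningAsync();
}

// Plays the role of the fake: always reports "running" and never shells out.
public sealed class FakeRuntimeProbe : IRuntimeProbe
{
    public Task<bool> IsRunningAsync() => Task.FromResult(true);
}

// Plays the role of a runner whose daemon is down.
public sealed class DownRuntimeProbe : IRuntimeProbe
{
    public Task<bool> IsRunningAsync() => Task.FromResult(false);
}

public static class PipelineSketch
{
    // Runs the preflight check, then the steps the test actually cares about,
    // and returns the order in which the steps executed.
    public static async Task<List<string>> RunAsync(IRuntimeProbe probe)
    {
        var executed = new List<string>();

        // Default preflight step: fails the whole run if the runtime is down.
        if (!await probe.IsRunningAsync())
        {
            throw new InvalidOperationException(
                "Docker is not running. Start Docker and try again.");
        }
        executed.Add("check-container-runtime");

        // The steps whose ordering the test wants to assert on.
        executed.Add("step1");
        executed.Add("step2");
        return executed;
    }
}
```

With `FakeRuntimeProbe` the run always reaches the ordering steps; with `DownRuntimeProbe` it throws before any of them execute, which is the shape of the CI failure.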
🚀 Dogfood this PR with:

curl -fsSL https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.sh | bash -s -- 18126

Or

iex "& { $(irm https://raw.githubusercontent.com/microsoft/aspire/main/eng/scripts/get-aspire-cli-pr.ps1) } 18126"
Pull request overview
This PR fixes intermittent "Docker is not running" failures in DistributedApplicationPipelineTests on Windows CI by registering a FakeContainerRuntime as the IContainerRuntimeResolver in every test. The tests validate pipeline ordering/behavior and never intend to touch a real container runtime, so the check-container-runtime default step was failing on GitHub-hosted Windows runners where Docker isn't always available.
Changes:
- Added `#pragma warning disable ASPIRECONTAINERRUNTIME001` to suppress the experimental API warning for `IContainerRuntimeResolver`
- Registered `FakeContainerRuntime` as `IContainerRuntimeResolver` in all 65 test methods that create a `TestDistributedApplicationBuilder`, using the established fake-registration pattern
davidfowl reviewed on Jun 12, 2026
Every test in DistributedApplicationPipelineTests repeated the same builder construction and service registrations (test output helper, FakeContainerRuntime as IContainerRuntimeResolver, and the activity reporter). Extract this into a CreatePipelineTestBuilder helper so the shared setup lives in one place.

Call sites collapse to a single line; the step, log level, and a caller-supplied activity reporter are passed as helper arguments to cover the few tests that diverge from the defaults.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
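
The helper extraction described in this commit could look roughly like the sketch below. Only the name `CreatePipelineTestBuilder` comes from the commit message; the signature, the parameter types, and every other type here (`SketchBuilder`, `IRuntimeResolver`, `FakeRuntime`, `IActivityReporter`, `DefaultReporter`) are hypothetical stand-ins, since the real helper and the Aspire test types are not shown on this page.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the test builder; not the real Aspire type.
public sealed class SketchBuilder
{
    public string? Step { get; set; }
    public string LogLevel { get; set; } = "Information";
    public Dictionary<Type, object> Services { get; } = new();
}

// Hypothetical stand-ins for the services each test registers.
public interface IRuntimeResolver { }
public sealed class FakeRuntime : IRuntimeResolver { }

public interface IActivityReporter { }
public sealed class DefaultReporter : IActivityReporter { }

public static class PipelineTestSetup
{
    // One place for the setup every test shares. Optional parameters cover
    // the few tests that need a specific step, log level, or reporter.
    public static SketchBuilder CreatePipelineTestBuilder(
        string? step = null,
        string logLevel = "Information",
        IActivityReporter? reporter = null)
    {
        var builder = new SketchBuilder { Step = step, LogLevel = logLevel };

        // The fake runtime is always registered, so no test probes a real daemon.
        builder.Services[typeof(IRuntimeResolver)] = new FakeRuntime();

        // Use the caller's reporter if one was supplied, else the default.
        builder.Services[typeof(IActivityReporter)] = reporter ?? new DefaultReporter();
        return builder;
    }
}
```

A typical call site is then `PipelineTestSetup.CreatePipelineTestBuilder()`, while a diverging test passes only what differs, for example `CreatePipelineTestBuilder(step: "deploy", reporter: myReporter)`.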
Description
`DistributedApplicationPipelineTests` intermittently went red on `windows-latest` CI (and passed on re-run) with:

`Aspire.Hosting.DistributedApplicationException: Docker is not running. Start Docker and try again.`

One run took out 33 tests at once; this signature drove the red-then-green CI on the `Hosting-4 (windows-latest)` job.

Root cause: each test builds a publish pipeline with `step: null`, so `ExecuteAsync` runs every default step, including `check-container-runtime`, which resolves the real `IContainerRuntimeResolver` and shells out to `docker container ls`. GitHub-hosted Windows runners don't run a Linux-container Docker daemon, so the step throws; on re-run, when a daemon happens to be up, the identical code passes. This is pure daemon-state flakiness: these tests only validate pipeline ordering and never intend to touch a container runtime.

Fix: register the shared `FakeContainerRuntime` as `IContainerRuntimeResolver` in each test, so the preflight resolves a fake that reports "running" instead of probing a real daemon. The builder registers the real resolver with `AddSingleton` (not `TryAdd`), so the later test registration wins. This mirrors the existing pattern in `DockerComposePublisherTests` and `ResourceContainerImageManagerTests`.

Why the test-side fix: `check-container-runtime` is a legitimate default step; the bug is that ordering-only unit tests run it against a real daemon. Faking the resolver is the minimal, established pattern and touches no product code.

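
The registration-order behavior the fix relies on can be checked in isolation with `Microsoft.Extensions.DependencyInjection`. The sketch below assumes that NuGet package is referenced; `IProbe`, `RealProbe`, and `FakeProbe` are hypothetical stand-ins for the resolver and its real and fake implementations. It shows the general container behavior: resolving a single service returns the last registration, while a `TryAddSingleton` call made after an existing registration is skipped.

```csharp
using System;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;

// Hypothetical stand-ins; not the real Aspire resolver types.
public interface IProbe { string Name { get; } }
public sealed class RealProbe : IProbe { public string Name => "real"; }
public sealed class FakeProbe : IProbe { public string Name => "fake"; }

public static class RegistrationOrderSketch
{
    // Two AddSingleton calls for the same service: the later one is resolved.
    public static string ResolveAfterAddSingleton()
    {
        var services = new ServiceCollection();
        services.AddSingleton<IProbe, RealProbe>(); // what the builder does
        services.AddSingleton<IProbe, FakeProbe>(); // what the test adds later
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IProbe>().Name;
    }

    // TryAddSingleton after an existing registration is a no-op,
    // so the earlier registration is still the one resolved.
    public static string ResolveAfterTryAdd()
    {
        var services = new ServiceCollection();
        services.AddSingleton<IProbe, RealProbe>();
        services.TryAddSingleton<IProbe, FakeProbe>(); // skipped: IProbe already registered
        using var provider = services.BuildServiceProvider();
        return provider.GetRequiredService<IProbe>().Name;
    }
}
```

`ResolveAfterAddSingleton()` returns `"fake"` and `ResolveAfterTryAdd()` returns `"real"`, which is why a test-side override has to be added with `AddSingleton` after the builder has registered its own resolver.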
Call-outs:
- Validated locally: with the fix, the full class passes (77/77) in both states; reverting it makes the tests fail when the daemon is down and pass when it's up, reproducing the CI behavior exactly.
- `AddJavaScriptAppTests`/`AddViteAppTests` (they run where Docker is present, so they aren't flaky today); left out to keep this PR tight.