Prototype: phased shutdown for MTP (#5345) #8580
Draft
Evangelink wants to merge 1 commit into
Introduce a two-phase shutdown model for Microsoft.Testing.Platform so
test sessions get a deterministic drain window before being aborted:
- Extend internal ITestApplicationCancellationTokenSource with
DrainingToken (graceful cancel) and AbortingToken (forceful abort),
plus an Abort() entry point. The existing CancellationToken is kept
as a back-compat alias for DrainingToken so current consumers keep
observing graceful cancellation without changes.
- Rewrite CTRLPlusCCancellationTokenSource with a Running -> Draining
-> Aborting state machine:
* 1st Ctrl+C enters Draining, starts a 30s grace timer that
escalates to Aborting on elapse.
* 2nd Ctrl+C escalates to Aborting immediately, starts a 10s
abort-timeout safety net that FailFasts via IEnvironment.
* 3rd Ctrl+C is no longer intercepted - the runtime terminates
the process (matches docker compose / kubectl / npm UX).
- Add 6 unit tests covering initial state, single/idempotent cancel,
abort, grace-period escalation, and zero-grace escalation. Passes
on net9.0 and net462.
CLI options (--shutdown-grace-period, --shutdown-abort-timeout), env-
var propagation to controlled hosts, TerminalOutputDevice UX updates,
and migration of existing token consumers are intentionally deferred
to follow-up PRs and tracked in the RFC.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
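The escalation rules in the commit message can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: the type name `PhasedShutdownSketch` and the `OnInterrupt` helper are hypothetical, the choice to cancel the draining token on a direct abort is an assumption, and the 10s abort-timeout FailFast safety net is left out.

```csharp
#nullable enable
using System;
using System.Threading;

// Hypothetical sketch of a Running -> Draining -> Aborting phase machine.
public sealed class PhasedShutdownSketch : IDisposable
{
    private const int Running = 0;
    private const int Draining = 1;
    private const int Aborting = 2;

    private readonly CancellationTokenSource _drainingCts = new();
    private readonly CancellationTokenSource _abortingCts = new();
    private readonly TimeSpan _gracePeriod;
    private Timer? _graceTimer;
    private int _phase = Running;
    private int _presses;

    public PhasedShutdownSketch(TimeSpan gracePeriod) => _gracePeriod = gracePeriod;

    public CancellationToken DrainingToken => _drainingCts.Token;
    public CancellationToken AbortingToken => _abortingCts.Token;

    // Graceful request: only the Running -> Draining transition does any work,
    // so repeated calls are idempotent.
    public void Cancel()
    {
        if (Interlocked.CompareExchange(ref _phase, Draining, Running) != Running)
        {
            return;
        }

        _drainingCts.Cancel();
        // One-shot grace timer that escalates to Aborting when it elapses.
        _graceTimer = new Timer(_ => Abort(), null, _gracePeriod, Timeout.InfiniteTimeSpan);
    }

    // Forceful request: valid from any phase, runs its body at most once.
    public void Abort()
    {
        if (Interlocked.Exchange(ref _phase, Aborting) == Aborting)
        {
            return;
        }

        try
        {
            _drainingCts.Cancel(); // assumption: aborting implies draining
            _abortingCts.Cancel();
        }
        catch (ObjectDisposedException)
        {
            // The grace timer raced with Dispose; nothing left to signal.
        }
    }

    // Returns true when the key press should be swallowed (e.Cancel = true).
    public bool OnInterrupt()
    {
        switch (Interlocked.Increment(ref _presses))
        {
            case 1: Cancel(); return true;
            case 2: Abort(); return true;
            default: return false; // 3rd press: let the runtime terminate
        }
    }

    public void Dispose()
    {
        _graceTimer?.Dispose();
        _drainingCts.Dispose();
        _abortingCts.Dispose();
    }
}
```

Under these assumptions, wiring it to the console would look like `Console.CancelKeyPress += (_, e) => e.Cancel = shutdown.OnInterrupt();`.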
Pull request overview
Prototype implementation of a two-phase shutdown model for Microsoft.Testing.Platform (MTP), introducing distinct “Draining” vs “Aborting” cancellation semantics to address Ctrl+C behavior discussed in #5345 while keeping back-compat for existing token consumers.
Changes:
- Extend ITestApplicationCancellationTokenSource with DrainingToken, AbortingToken, and an Abort() escalation API (with CancellationToken preserved as a Draining alias).
- Rewrite CTRLPlusCCancellationTokenSource to implement a Running → Draining → Aborting phase model with Ctrl+C escalation and grace/abort timers.
- Add unit tests covering the new phase/token behavior.
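To make the token semantics concrete, here is a small sketch of how a consumer might use the two tokens. All type names (`IPhasedCancellation`, `PhasedCancellation`, `Runner`) are hypothetical stand-ins rather than the PR's types, and the rule that aborting also cancels the draining token is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the extended interface; member names follow the overview.
public interface IPhasedCancellation
{
    CancellationToken CancellationToken { get; } // legacy name
    CancellationToken DrainingToken { get; }
    CancellationToken AbortingToken { get; }
    void Cancel();
    void Abort();
}

public sealed class PhasedCancellation : IPhasedCancellation
{
    private readonly CancellationTokenSource _draining = new();
    private readonly CancellationTokenSource _aborting = new();

    // Back-compat alias: existing consumers keep observing graceful cancellation.
    public CancellationToken CancellationToken => DrainingToken;
    public CancellationToken DrainingToken => _draining.Token;
    public CancellationToken AbortingToken => _aborting.Token;

    public void Cancel() => _draining.Cancel();

    public void Abort()
    {
        _draining.Cancel(); // assumption: aborting implies draining
        _aborting.Cancel();
    }
}

public static class Runner
{
    // Stops scheduling new items once draining starts; items already in
    // flight are only interrupted through the aborting token.
    public static async Task<int> RunAsync(
        IPhasedCancellation shutdown,
        IEnumerable<Func<CancellationToken, Task>> items)
    {
        int completed = 0;
        foreach (var item in items)
        {
            if (shutdown.DrainingToken.IsCancellationRequested)
            {
                break;
            }

            await item(shutdown.AbortingToken).ConfigureAwait(false);
            completed++;
        }

        return completed;
    }
}
```

The alias is written as a regular property on the class rather than a default interface member, which keeps the sketch usable on targets without default interface member support.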
Show a summary per file
| File | Description |
|---|---|
| src/Platform/Microsoft.Testing.Platform/Services/ITestApplicationCancellationTokenSource.cs | Adds two-phase shutdown tokens and Abort() API while preserving legacy token alias. |
| src/Platform/Microsoft.Testing.Platform/Services/CTRLPlusCCancellationTokenSource.cs | Implements the phased shutdown state machine, Ctrl+C escalation logic, and timers. |
| test/UnitTests/Microsoft.Testing.Platform.UnitTests/Hosts/CommonHostTests.cs | Updates local test stub to satisfy the extended cancellation token source interface. |
| test/UnitTests/Microsoft.Testing.Platform.UnitTests/Services/CTRLPlusCCancellationTokenSourceTests.cs | Adds new unit tests validating draining/aborting semantics and grace escalation. |
Copilot's findings
- Files reviewed: 4/4 changed files
- Comments generated: 5
    case 2:
        // 2nd Ctrl+C: escalate to abort.
        e.Cancel = true;
        EnterAborting();
Comment on lines +105 to +109
    public void Dispose()
    {
        _drainingCts.Dispose();
        _abortingCts.Dispose();
    }
Comment on lines +163 to +173
    + private void EnterAborting()
    + {
    +     if (Interlocked.Exchange(ref _phase, PhaseAborting) == PhaseAborting)
    +     {
    +         return;
    +     }
    +
    - public void Cancel()
    -     => _cancellationTokenSource.Cancel();
    +     try
    +     {
    +         _abortingCts.Cancel();
    +     }
Comment on lines +189 to +196
    private static void ScheduleEscalation(TimeSpan delay, Action action)
    {
        // Fire-and-forget timer. We don't dispose: the host is shutting down anyway,
        // and a short-lived CTS is cheaper than holding a Timer reference we'd need
        // to manage across the phase machine.
        var timerCts = new CancellationTokenSource(delay);
        timerCts.Token.Register(action);
    }
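For comparison, a variant that keeps a handle on the pending escalation, so the owner can cancel or dispose it deterministically, might look like the sketch below. `EscalationTimer` is a hypothetical name and this is only one possible shape, not something proposed in the PR.

```csharp
using System;
using System.Threading;

// Hypothetical one-shot escalation timer whose lifetime is owned by the caller.
public sealed class EscalationTimer : IDisposable
{
    private readonly Timer _timer;

    public EscalationTimer(TimeSpan delay, Action action)
    {
        if (action is null)
        {
            throw new ArgumentNullException(nameof(action));
        }

        // Infinite period: the callback is scheduled once, after 'delay'.
        _timer = new Timer(_ => action(), null, delay, Timeout.InfiniteTimeSpan);
    }

    // Prevents the callback from being scheduled if it has not started yet.
    public void Dispose() => _timer.Dispose();
}
```

The trade-off is the one the code comment in the snippet names: the phase machine has to hold one more field and dispose it alongside the token sources.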
Comment on lines +88 to +92
    while (!source.AbortingToken.IsCancellationRequested && !waitCts.IsCancellationRequested)
    {
        await Task.Delay(10, TestContext.CancellationToken).ConfigureAwait(false);
    }
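A test can also wait on the token itself instead of polling it. The helper below is a hypothetical sketch (`TokenAwaiter.WaitForCancellationAsync` is not part of the PR or of the test framework); it completes as soon as the token is cancelled, or reports false after the timeout.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class TokenAwaiter
{
    // Returns true if 'token' was cancelled before 'timeout' elapsed.
    public static async Task<bool> WaitForCancellationAsync(
        CancellationToken token, TimeSpan timeout)
    {
        var tcs = new TaskCompletionSource<bool>(
            TaskCreationOptions.RunContinuationsAsynchronously);

        // Register runs the callback immediately if the token is already cancelled.
        using (token.Register(() => tcs.TrySetResult(true)))
        {
            Task winner = await Task.WhenAny(tcs.Task, Task.Delay(timeout))
                .ConfigureAwait(false);
            return winner == tcs.Task;
        }
    }
}
```

In a test this would replace the loop with something like `Assert.IsTrue(await TokenAwaiter.WaitForCancellationAsync(source.AbortingToken, TimeSpan.FromSeconds(5)));`, assuming an MSTest-style assertion.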
Summary
Prototype of a two-phase graceful shutdown for Microsoft.Testing.Platform, exploring the design discussed in #5345 (see this comment).
Status: draft / RFC — opened for early design feedback. CLI options and most consumer migrations are intentionally deferred.
Motivation
Today MTP exposes a single ITestApplicationCancellationTokenSource.CancellationToken. On Ctrl+C, every consumer (tests, extensions, hosts) receives the same signal at the same time, and there is no contract distinguishing "please wind down gracefully" from "stop right now". This conflates the two shutdown modes that virtually every other host-style runtime treats as distinct phases (systemd EXTEND_TIMEOUT_USEC, macOS NSTerminateLater, AWS Lambda Extensions SHUTDOWN, Kubernetes preStop/terminationGracePeriodSeconds, docker compose 2-press SIGTERM→SIGKILL, Vitest teardownTimeout, …).
What's in this PR
A working prototype of Design A from the offline analysis:
- DrainingToken (graceful) and AbortingToken (forceful) on the internal ITestApplicationCancellationTokenSource
- CancellationToken is preserved as an alias of DrainingToken so the ~14 current consumers keep working unchanged
- Abort() API
- Running → Draining → Aborting using Interlocked.CompareExchange for atomic phase transitions
- IEnvironment.FailFast (banned Environment.FailFast replaced with the injectable wrapper)
Files
- src/Platform/Microsoft.Testing.Platform/Services/ITestApplicationCancellationTokenSource.cs: extended interface
- src/Platform/Microsoft.Testing.Platform/Services/CTRLPlusCCancellationTokenSource.cs: full rewrite
- test/UnitTests/Microsoft.Testing.Platform.UnitTests/Hosts/CommonHostTests.cs: inline mock updated
- test/UnitTests/Microsoft.Testing.Platform.UnitTests/Services/CTRLPlusCCancellationTokenSourceTests.cs: 6 new unit tests
Verification
- net8.0/net9.0/netstandard2.0
- net9.0
- net462 (confirms no DIM/runtime issues for the MSTest.TestAdapter consumer of the netstandard2.0 build)
- CommonHost/Cancellation/StopPolic* tests still pass, no regressions
Intentionally deferred (follow-ups)
To keep this PR reviewable, the following are explicitly NOT in scope and will land as separate PRs once the core direction is approved:
- --shutdown-grace-period and --shutdown-abort-timeout via PlatformCommandLineProvider + HelpInfoTests acceptance assertions
- TestHostOrchestratorHost / TestHostControllersTestHost controlled children
- TerminalOutputDevice UX line: "Cancelling test session… (press Ctrl+C again to force quit)"
- DrainingToken/AbortingToken usage
- HotReload extension's own CancelKeyPress handler
- IShutdownParticipant ack/extend protocol: separate RFC
Open design questions
- HostOptions.ShutdownTimeout + a new AbortTimeout?
- Should IEnvironment injection be threaded explicitly from TestHostBuilder.CommonServices now, or stay defaulted to SystemEnvironment?
- Process.Kill ourselves?
Full RFC (motivation, prior-art table, rollout plan, alternatives) is available in my session notes and will be posted as a comment on #5345 once the prototype direction is acknowledged.
Refs #5345