Skip to content

Enhance crash dump collection options in RemoteExecutor (release/10.0)#17060

Draft
hoyosjs wants to merge 1 commit into
release/10.0from
juhoyosa/port-crash-dumps-10-0
Draft

Enhance crash dump collection options in RemoteExecutor (release/10.0)#17060
hoyosjs wants to merge 1 commit into
release/10.0from
juhoyosa/port-crash-dumps-10-0

Conversation

@hoyosjs

@hoyosjs hoyosjs commented Jun 28, 2026

Copy link
Copy Markdown
Member

Summary

This is the release/10.0 port of #16716 ("Enhance crash dump collection options in RemoteExecutor"). It is an adaptation, not a cherry-pick: release/10.0 ships older dependencies, so the CLRMD upgrade required additional dependency handling.

The feature adds env-var-based crash dump control plus improved timeout-diagnostics dump collection to Microsoft.DotNet.RemoteExecutor:

  • New CrashDumpCollectionType enum and EnableTimeoutDumpCollection / CrashDumpCollectionType / CrashDumpPath options on RemoteInvokeOptions.
  • Env-var manipulation (DOTNET_DbgEnableMiniDump, DOTNET_DbgMiniDumpType, DOTNET_DbgMiniDumpName) in RemoteExecutor.
  • Refactored CollectTimeoutDiagnostics using Microsoft.Diagnostics.NETCore.Client DiagnosticsClient.WriteDump on .NET Core, plus CLRMD v4 thread enumeration; the legacy MiniDump.cs path remains for .NET Framework.
  • New tests.

Key difference from main

On main, CLRMD (Microsoft.Diagnostics.Runtime) was already v4. On release/10.0 it was still 1.0.5, so this PR performs the CLRMD 1.0.5 → 4.0.732101 upgrade and adds Microsoft.Diagnostics.NETCore.Client 0.2.721401.

Review-feedback fixes (same as main)

  1. RemoteExecutor.csDOTNET_DbgMiniDumpType formatted with CultureInfo.InvariantCulture.
  2. RemoteInvokeHandle.cs — timeout dump uses DumpType.WithHeap (not Full).
  3. RemoteExecutorTests.csCrashDumpCollection_DefaultLeavesEnvVarsUntouched is now deterministic: it sets known DOTNET_Dbg* values on options.StartInfo.Environment and asserts the child observes the same values (passthrough).

Intentionally excluded

The unrelated DefaultIdentityTokenCredential.cs / Managed Identity change from the source branch (which would require an Azure.Identity central bump) is not ported — it is unrelated to crash dumps.

Files changed

  • Directory.Packages.props
  • eng/Versions.props
  • src/Microsoft.DotNet.RemoteExecutor/src/RemoteExecutor.cs
  • src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs
  • src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeOptions.cs
  • src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor.csproj
  • src/Microsoft.DotNet.RemoteExecutor/tests/RemoteExecutorTests.cs
  • src/Microsoft.DotNet.RemoteExecutor/tests/Microsoft.DotNet.RemoteExecutor.Tests.csproj

Risk assessment

Central package changes (Directory.Packages.props / eng/Versions.props)

Minimal, and all are needed by RemoteExecutor's CLRMD/NETCore.Client upgrade:

Package release/10.0 This PR Reason
Microsoft.Diagnostics.Runtime 1.0.5 4.0.732101 The feature itself (CLRMD v4 API). RemoteExecutor-exclusive.
Microsoft.Diagnostics.NETCore.Client 0.2.721401 (new) DiagnosticsClient.WriteDump on .NET Core. RemoteExecutor-exclusive.
System.Collections.Immutable 9.0.0-rc 10.0.7 Required by CLRMD 4.x.
Microsoft.Bcl.AsyncInterfaces 9.0.0-rc 9.0.8 Required by NETCore.Client 0.2.721401.
SystemMemoryVersion (eng/Versions.props) 4.5.5 4.6.3 Required by System.Collections.Immutable 10.0.7.

These do not leak the .NET-10-final stack into netfx build tasks: the netfx build-task packages keep their 8.0.x pins via Update= entries in eng/BuildTask.Packages.props, and CLRMD/NETCore.Client are consumed only by RemoteExecutor.

Isolated VersionOverride (scoped to the 2 RemoteExecutor projects only)

CLRMD 4.x transitively requires Azure.Identity ≥ 1.21.0 → Azure.Core ≥ 1.53.0, which pulls the 10.0.x System.Text.Json / Encodings.Web / Bcl.AsyncInterfaces / Extensions.* and Microsoft.Identity.Client 4.83.1. Rather than bumping these central versions repo-wide (which would push the 10.0.x stack into 5 unrelated consumers — Build.Tasks.Feed, ArcadeAzureIntegration, Helix.Client, Helix.JobSender, Internal.SymbolHelper — and risk netfx build-task binding-redirect load failures), they are pinned locally via VersionOverride on direct PackageReferences in only the src and tests csproj. Minimum versions that produce a clean restore were chosen (Azure.Identity 1.21.0, Azure.Core 1.53.0, etc.), not main's latest.

  • src project (multi-targets net10 + net462): 8 overrides — Azure.Identity 1.21.0, Azure.Core 1.53.0, Microsoft.Identity.Client 4.83.1, System.Text.Json 10.0.3, System.Text.Encodings.Web 10.0.3, Microsoft.Bcl.AsyncInterfaces 10.0.3, Microsoft.Extensions.DependencyInjection.Abstractions 10.0.3, Microsoft.Extensions.FileProviders.Abstractions 10.0.3.
  • tests project (net10 only): 6 overrides — same set minus System.Text.Json and System.Text.Encodings.Web (framework-provided / pruned on net10; an explicit reference would trip NU1510).

Blast radius

  • Build tasks untouched — the netfx 8.0.x pins in eng/BuildTask.Packages.props are unchanged; no build task picks up the 10.0.x stack.
  • The only projects whose restore graph changes are the two RemoteExecutor projects.
  • DefaultIdentityTokenCredential.cs / Azure.Identity central version left as-is.

Validation

  • restore.cmd: clean (0 warnings, 0 errors).
  • Tests project build (net10, Release): succeeded.
  • Library build for net462 (Release): succeeded — confirms the #else MiniDump.Create path still compiles.
  • Test run (net10, Release): 19/19 passed, including the new crash-dump tests.

Residual risk & rollback

  • Residual risk: low. Changes are isolated to RemoteExecutor; build-task restore graphs are unaffected. The main unverifiable-at-build-time aspect is live crash-dump capture on a real CI timeout, which depends on runtime/OS dump support rather than these package versions.
  • Rollback: revert this single commit / close this PR. Because the heavy lifting is via project-local VersionOverride, reverting restores the prior central graph for all other projects with no cross-project cleanup needed.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings June 28, 2026 05:39

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR ports the “enhance crash dump collection options in RemoteExecutor” feature to release/10.0, adding configurable crash-dump behavior for RemoteExecutor subprocesses and modernizing timeout diagnostics (including updated diagnostic dependencies required on this branch).

Changes:

  • Adds CrashDumpCollectionType, CrashDumpPath, and EnableTimeoutDumpCollection options to control crash-dump environment variables and timeout dump behavior.
  • Refactors timeout diagnostics to use DiagnosticsClient.WriteDump on .NET (Core) and CLRMD v4 thread enumeration, while keeping the legacy MiniDump path for .NET Framework.
  • Updates package versions (CLRMD upgrade + new NETCore.Client) and adds tests validating crash-dump env-var behavior and dump creation.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Show a summary per file
File Description
Directory.Packages.props Updates central package versions; adds Microsoft.Diagnostics.NETCore.Client and upgrades CLRMD to v4.
eng/Versions.props Bumps SystemMemoryVersion to satisfy upgraded dependency requirements.
src/Microsoft.DotNet.RemoteExecutor/src/RemoteExecutor.cs Sets/removes DOTNET_Dbg* env vars based on new RemoteInvokeOptions crash-dump settings.
src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeHandle.cs Adds EnableTimeoutDumpCollection gate and refactors timeout dump + thread diagnostics using DiagnosticsClient/CLRMD v4.
src/Microsoft.DotNet.RemoteExecutor/src/RemoteInvokeOptions.cs Introduces crash-dump configuration API surface (CrashDumpCollectionType, CrashDumpPath, EnableTimeoutDumpCollection).
src/Microsoft.DotNet.RemoteExecutor/src/Microsoft.DotNet.RemoteExecutor.csproj Adds NETCore.Client dependency (netcore only), removes MiniDump.cs from netcore compilation, and applies scoped VersionOverrides.
src/Microsoft.DotNet.RemoteExecutor/tests/RemoteExecutorTests.cs Adds tests for crash-dump env-var behavior and dump creation on crash.
src/Microsoft.DotNet.RemoteExecutor/tests/Microsoft.DotNet.RemoteExecutor.Tests.csproj Mirrors scoped VersionOverrides to keep test restore graph consistent with src project.

Comment on lines +214 to +220
try
{
string dumpPath = Path.Combine(uploadPath, $"{Process.Id}.{Path.GetRandomFileName()}.dmp");
#if NETCOREAPP
// These define guards assume that harness running on .NET Framework implies test process runs on .NET Framework.
var client = new DiagnosticsClient(Process.Id);
client.WriteDump(DumpType.WithHeap, dumpPath, logDumpGeneration: false);
Comment on lines +246 to +247
string[] dumpFiles = Directory.GetFiles(dumpDir, "*.dmp");
Assert.NotEmpty(dumpFiles);
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants