
feat(ApplicationInsights): migrate to Application Insights v3.1.0 with custom OpenTelemetry adaptive sampler #633

Draft
Copilot wants to merge 10 commits into renovate/major-3-applicationinsights-dotnet-monorepo from copilot/sub-pr-632

Conversation

Contributor

Copilot AI commented Feb 4, 2026

Migrates Ark.Tools from Application Insights v2.23.0 to v3.1.0, which is an architectural shift to OpenTelemetry. Since v3 removes the classic ITelemetryProcessor/ITelemetryInitializer pipeline and adaptive sampling APIs, this PR implements a full custom OpenTelemetry-based sampling pipeline that preserves all existing cost-saving and error-visibility behaviours.

Breaking Changes Addressed

Application Insights v3.x removes from the public API:

  • ITelemetryInitializer / ITelemetryProcessor interfaces
  • TelemetryConfiguration.TelemetryInitializers / TelemetryProcessors collections
  • Adaptive sampling infrastructure (SamplingPercentageEstimatorSettings)
  • MetricConfigurationForMeasurement API
  • ITelemetryModule
  • AddApplicationInsightsSettings() helper

Custom OpenTelemetry Sampling Pipeline

A new sampling pipeline in Ark.Tools.ApplicationInsights replaces all v2.x constructs:

  • ArkAdaptiveSampler – OpenTelemetry Sampler with per-operation token-bucket rate limiting and adaptive rate adjustment. Checks FailedTraceRegistry so any span belonging to a known-failed trace is immediately promoted to RecordAndSample.
  • OperationBucket – thread-safe token bucket for per-operation rate limiting.
  • ArkAdaptiveSamplerOptions – configuration POCO (max items/sec, moving-average ratio, timeouts, bucket capacity).
  • FailedTraceRegistry – shared TTL-cleaned set of ActivityTraceIds that have at least one failed span, used to coordinate whole-operation promotion between the sampler and the failure processor.
  • ArkFailurePromotionProcessor – BaseProcessor<Activity> that, on failure, registers the TraceId, walks the in-process parent chain to promote ancestors still in flight, and promotes any other buffered sibling spans whose trace is already in the registry. This ensures the entire operation (root span + all children) is exported when any span fails.
  • ArkPreFilterProcessor – drops high-noise low-value spans (CORS OPTIONS requests, Azure Service Bus Receive, SQL Commit).
  • ArkTelemetryEnrichmentProcessor – adds ProcessName tag to all spans.
  • ArkSqlDependencyFilterProcessor – filters SQL spans for the NLog internal database connection.
  • ArkApplicationInsightsExtensions – AddArkApplicationInsightsCustomizations() wires the full pipeline, creating a shared FailedTraceRegistry injected into both sampler and failure processor.
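
As a rough illustration of the per-operation token-bucket idea behind OperationBucket, here is a minimal language-neutral sketch in Python. It is not the C# code in this PR: PerOperationSampler, try_take, should_sample, and the injectable clock are hypothetical names chosen for the example, and only the general technique (one refilling bucket per operation name) is taken from the description above.

```python
import threading
import time


class OperationBucket:
    """Thread-safe token bucket: refills at `rate` tokens/sec up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate = rate
        self._capacity = capacity
        self._clock = clock          # injectable so tests can control time
        self._tokens = capacity      # start full so the first spans pass
        self._last = clock()
        self._lock = threading.Lock()

    def try_take(self):
        with self._lock:
            now = self._clock()
            # Refill in proportion to elapsed time, capped at capacity.
            self._tokens = min(self._capacity,
                               self._tokens + (now - self._last) * self._rate)
            self._last = now
            if self._tokens >= 1.0:
                self._tokens -= 1.0
                return True
            return False


class PerOperationSampler:
    """Hypothetical helper: keeps one bucket per operation name."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self._rate, self._capacity, self._clock = rate, capacity, clock
        self._buckets = {}
        self._lock = threading.Lock()

    def should_sample(self, operation):
        with self._lock:
            bucket = self._buckets.get(operation)
            if bucket is None:
                bucket = OperationBucket(self._rate, self._capacity, self._clock)
                self._buckets[operation] = bucket
        return bucket.try_take()
```

Because each operation name draws from its own bucket, a burst on a hot endpoint exhausts only that endpoint's tokens, and a rarely called operation still gets sampled.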

Whole-Operation Failure Promotion

When any span in a trace ends in failure, all parents and in-flight siblings are promoted for export, ensuring the complete operation context is available for diagnosis. Spans that completed successfully before failure detection are the only ones that may be missed (a known and accepted limitation).
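
The promotion rule can be sketched as follows. This is a simplified, single-threaded Python illustration under stated assumptions, not the PR's C# implementation: Span is a hypothetical stand-in for a .NET Activity, on_end stands in for the processor's end-of-span hook, and the registry uses lazy expiry on lookup where the real FailedTraceRegistry may clean up differently.

```python
import time


class FailedTraceRegistry:
    """Trace ids with at least one failed span; entries expire after `ttl` seconds."""

    def __init__(self, ttl=60.0, clock=time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._expiry = {}  # trace_id -> expiry timestamp

    def add(self, trace_id):
        self._expiry[trace_id] = self._clock() + self._ttl

    def contains(self, trace_id):
        expires = self._expiry.get(trace_id)
        if expires is None:
            return False
        if expires <= self._clock():
            del self._expiry[trace_id]  # lazy TTL cleanup (assumption)
            return False
        return True


class Span:
    """Hypothetical stand-in for an in-process span (Activity in .NET)."""

    def __init__(self, trace_id, name, parent=None, sampled=False):
        self.trace_id = trace_id
        self.name = name
        self.parent = parent
        self.sampled = sampled
        self.failed = False


def on_end(span, registry):
    """Called when a span ends; promotes spans of failed traces for export."""
    if span.failed:
        registry.add(span.trace_id)
        # Walk the in-process parent chain: ancestors have not ended yet.
        node = span
        while node is not None:
            node.sampled = True
            node = node.parent
    elif registry.contains(span.trace_id):
        # A sibling that ends after the failure was registered.
        span.sampled = True
```

In this sketch a sibling that ended before the failure keeps whatever sampling decision it already had, which mirrors the limitation noted above.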

Package & API Updates

  • Updated Microsoft.ApplicationInsights, Microsoft.ApplicationInsights.AspNetCore, Microsoft.ApplicationInsights.WorkerService to 3.1.0
  • Updated Microsoft.ApplicationInsights.NLogTarget to 3.1.0-beta4 (no stable v3 available); ConnectionString replaces InstrumentationKey in NlogConfigurer.cs
  • Removed Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel (obsolete in v3)
  • Added OpenTelemetry and OpenTelemetry.Extensions.Hosting to central package management
  • Pinned OpenTelemetry.Api to 1.15.3 (fixes CVE-2026-40894 DoS vulnerability)

Replaced v2.x Classes

| Removed (v2.x) | Replacement |
| --- | --- |
| DoNotSampleFailures (ITelemetryInitializer) | ArkFailurePromotionProcessor + FailedTraceRegistry |
| EnableAdaptiveSamplingWithCustomSettings | ArkAdaptiveSampler |
| ArkSkipUselessSpamTelemetryProcessor (ITelemetryProcessor) | ArkPreFilterProcessor |
| GlobalInfoTelemetryInitializer | ArkTelemetryEnrichmentProcessor |
| SkipSqlDatabaseDependencyFilter | ArkSqlDependencyFilterProcessor |
| WebApi4xxAsSuccessTelemetryInitializer | WebApi4xxAsSuccessProcessor |
| WebApiUserTelemetryInitializer | WebApiUserProcessor |
| ResourceWatcherTelemetryModule (ITelemetryModule) | IHostedService |
| Factory classes | |

Other Changes

  • ArkStartupBase.cs – removed obsolete ITelemetryInitializer registrations
  • ApplicationInsightsProcessingMetricsStep.cs – removed usage of the MetricConfigurationForMeasurement API, which no longer exists in v3
  • Samples – removed AddApplicationInsightsSettings() calls
  • Ark.Tools.AspNetCore.ApplicationInsights/Startup/Ex.cs and Ark.Tools.ApplicationInsights.HostedService/Ex.cs updated to use new OTel pipeline

Documentation

  • docs/otel/README.md – features, usage, configuration reference, architecture overview
  • docs/otel/sampling.md – detailed sampling algorithm, whole-operation promotion semantics, FailedTraceRegistry design, known limitations
  • docs/otel/applicationinsights-migration/ – analysis, NuGet research, implementation plan, code examples, executive summary (moved from docs/otel/ root)

Build

✅ Full solution build succeeded



Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
Contributor Author

Copilot AI commented Feb 4, 2026

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

  • https://api.github.com/repos/microsoft/ApplicationInsights-dotnet/releases/tags/3.0.0
    • Triggering command: /home/REDACTED/work/_temp/ghcca-node/node/bin/node /home/REDACTED/work/_temp/ghcca-node/node/bin/node --enable-source-maps /home/REDACTED/work/_temp/copilot-developer-action-main/dist/index.js (http block)

If you need me to access, download, or install something from one of these locations, you can either:

Copilot AI changed the title from "[WIP] Update applicationinsights-dotnet monorepo to version 3" to "chore(deps): investigate Application Insights v3 upgrade - breaking changes identified" Feb 4, 2026
Copilot AI requested a review from AndreaCuneo February 4, 2026 23:08
@renovate renovate Bot force-pushed the renovate/major-3-applicationinsights-dotnet-monorepo branch 5 times, most recently from 8fc6af1 to 85e55b9 Compare February 7, 2026 17:46
@renovate renovate Bot force-pushed the renovate/major-3-applicationinsights-dotnet-monorepo branch 9 times, most recently from e29381e to 50db13f Compare February 19, 2026 13:11
@renovate renovate Bot force-pushed the renovate/major-3-applicationinsights-dotnet-monorepo branch 5 times, most recently from e04eba0 to 5178119 Compare February 27, 2026 10:03
@renovate renovate Bot force-pushed the renovate/major-3-applicationinsights-dotnet-monorepo branch 4 times, most recently from 85a5280 to 398b0c2 Compare April 12, 2026 05:42
@renovate renovate Bot force-pushed the renovate/major-3-applicationinsights-dotnet-monorepo branch 5 times, most recently from 7b80f2a to c3d27dd Compare April 24, 2026 06:02
@renovate renovate Bot force-pushed the renovate/major-3-applicationinsights-dotnet-monorepo branch 5 times, most recently from 17906ca to 4662bec Compare April 26, 2026 13:15
@AndreaCuneo
Contributor

@copilot merge master and update AI packages to 3.1.0

Refresh the analysis and the findings based on the 3.1 changes. Carefully analyze the OTel adaptive sampling capability and evaluate alternatives - adaptive sampling on the client is a major cost saver, and dropping it is not cost-effective.

Copilot AI and others added 2 commits April 27, 2026 06:17
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
…g changes remain

- Merged master branch into the PR branch
- Updated ApplicationInsights packages from 2.23.0 to 3.1.0
- Updated SDK version in global.json to 10.0.201
- Pinned OpenTelemetry.Api to 1.15.3 to fix CVE-2026-40894 vulnerability
- Investigation shows v3.1.0 still has same breaking changes as v3.0.0
- ITelemetryProcessor and ITelemetryInitializer interfaces remain removed
- Build still fails with same compilation errors
- Adaptive sampling requires migration to OpenTelemetry approach

Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/6031d7e2-6703-432b-8a1d-39739c0ff462

Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot merge master and update AI packages to 3.1.0

Refresh the analysis and the finding based on 3.1 changes. Analyze carefully the OTel adaptive sampling capability and evaluate alternatives - ad...

✅ Merged master and updated ApplicationInsights packages to 3.1.0 (commit dfe0fe6)

Key Findings:

v3.1.0 Status:

  • Same breaking changes as v3.0.0 - ITelemetryProcessor and ITelemetryInitializer are still removed
  • Build still fails with identical compilation errors
  • Migration to OpenTelemetry architecture is required

Adaptive Sampling Analysis:
⚠️ Critical: Classic adaptive sampling is NOT available in v3.x

v2.x: Dynamic client-side adaptive sampling via SamplingPercentageEstimatorSettings - intelligent cost reduction

v3.x Options:

  1. Rate-limited sampling (default) - TracesPerSecond property, simpler but less sophisticated
  2. Custom OpenTelemetry sampler - Recreate adaptive logic with a custom Sampler, optionally combined with a BaseProcessor<Activity>, more complex
  3. Tail sampling via OTel Collector - Most sophisticated, requires infrastructure changes

Security:
✅ Pinned OpenTelemetry.Api to 1.15.3 to fix CVE-2026-40894 (DoS vulnerability)

Recommendation:
Since adaptive sampling is a major cost saver, I recommend:

  1. Evaluate cost impact of switching to rate-limited sampling
  2. Consider custom sampler implementation to preserve adaptive behavior
  3. Defer v3 upgrade until sampling strategy and cost implications are clear

This is not a simple package update - it's an architectural migration requiring careful planning around telemetry costs and sampling strategy.

@AndreaCuneo
Contributor

@copilot analyze the sampling customizations that have been added on top of adaptive sampling in Ark.Tools.ApplicationInsights

The goal is an adaptive sampler that preserves traces with failures while rate-limiting the others in buckets keyed by the 'first span' identifier, so that rarely used code paths are not sampled away in favour of frequently used ones.

Research existing OTel samplers on NuGet before implementing our own.

Prepare an analysis document for migration to OTel and an implementation plan in docs/otel/*

Contributor Author

Copilot AI commented Apr 27, 2026

@AndreaCuneo Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub.

You can ask me to try again later by mentioning me in a new comment.

If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: 820ebde5-815d-4a62-b2a1-c83560552fd9

Sorry for the inconvenience!

- Analyzed current Ark.Tools sampling customizations
- Researched available OpenTelemetry sampler packages on NuGet
- Created migration-analysis.md with detailed breakdown of current architecture
- Created implementation-plan.md with phased implementation approach
- Created nuget-research.md with package evaluation and comparison
- Created implementation-plan-code-examples.md with complete code samples
- Key findings: No NuGet packages meet requirements, custom sampler needed
- Adaptive sampling critical for cost efficiency - must be preserved
- Estimated effort: 25-30 developer days over 14 weeks

Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/03fa4e1f-d414-4b63-a9f8-2ec6c2573969

Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
Contributor Author

Copilot AI commented Apr 27, 2026

@copilot analyze the sampling customizations that has been added on top of adaptive sampling in Ark.Tools.ApplicationInsights

The goal is an adaptive sampler which preserve traces with failures whil...

✅ Analysis complete and documented in /docs/otel/ (commit 5563f1c)

Current Sampling Architecture Analyzed:

  1. DoNotSampleFailures - Forces 100% sampling of all errors/exceptions
  2. EnableAdaptiveSamplingWithCustomSettings - Dynamic rate adjustment with moving average
  3. ArkSkipUselessSpamTelemetryProcessor - Pre-filters OPTIONS, Service Bus receives, SQL commits
  4. Goal: Adaptive sampling with failure preservation and per-operation fairness
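
To make "dynamic rate adjustment with moving average" concrete, here is a minimal Python sketch of one way such an estimator can work. It is an assumption-laden illustration, not the v2 SDK's algorithm or the sampler in this PR: AdaptiveRate and observe are hypothetical names, the default ratio is arbitrary, and the real estimator also applies timeouts before changing the rate.

```python
class AdaptiveRate:
    """Tracks a moving average of items/sec and derives a sampling probability."""

    def __init__(self, max_items_per_second, moving_average_ratio=0.25):
        self._target = max_items_per_second
        self._ratio = moving_average_ratio   # weight given to the newest observation
        self._average = None                 # smoothed pre-sampling items/sec
        self.probability = 1.0

    def observe(self, items_per_second):
        """Feed the rate measured over the last evaluation interval."""
        if self._average is None:
            self._average = items_per_second
        else:
            # Exponential moving average smooths out short bursts.
            self._average = (self._ratio * items_per_second
                             + (1.0 - self._ratio) * self._average)
        if self._average <= self._target:
            self.probability = 1.0           # under budget: keep everything
        else:
            # Keep only the fraction that brings the smoothed rate to target.
            self.probability = self._target / self._average
        return self.probability
```

With a target of 5 items/sec and a ratio of 0.5, observing 4 and then 16 items/sec gives a smoothed rate of 10 and a sampling probability of 0.5.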

NuGet Research Results:

  • OpenTelemetry (core) - Only basic samplers, no adaptive/rate-limiting
  • OpenTelemetry.Extensions.AWS - Has RateLimitingSampler but global only, not per-operation, not adaptive
  • No other production-ready packages found

Conclusion: Custom implementation required

Documents Created:

  1. migration-analysis.md - Current architecture, cost impact analysis, risk assessment
  2. nuget-research.md - Detailed package evaluation and comparison matrix
  3. implementation-plan.md - Phased approach, timeline (14 weeks), effort (25-30 days)
  4. implementation-plan-code-examples.md - Complete code samples for all components

Key Recommendation: Implement ArkAdaptiveSampler that combines failure preservation + adaptive rate limiting + per-operation token buckets. This maintains current cost efficiency while migrating to OpenTelemetry.

Effort: 25-30 developer days | Timeline: 14 weeks including staged rollout

Copilot AI and others added 3 commits April 27, 2026 06:41
- Created executive-summary.md with cost-benefit analysis
- Detailed investment requirements and payback period
- Added decision matrix comparing all options
- Included immediate next steps and questions for stakeholders
- Recommendation: Implement custom sampler via hybrid approach
- Total documentation: 87KB across 5 comprehensive documents

Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/03fa4e1f-d414-4b63-a9f8-2ec6c2573969

Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
- Created README.md as entry point for documentation
- Provides quick start guide and document overview
- Includes decision tree and key metrics
- Links to all analysis documents
- Total documentation: 99KB across 6 documents

Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/03fa4e1f-d414-4b63-a9f8-2ec6c2573969

Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
…ze docs/otel

Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/b465394b-ee30-4f4e-98dd-9181411ff6fe

Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
Comment thread docs/otel/sampling.md
…gistry

- Add FailedTraceRegistry: shared TTL-cleaned set of ActivityTraceIds with at
  least one failed span; drives coordination between sampler and processor
- ArkFailurePromotionProcessor: on failure, register TraceId, walk in-process
  parent chain to promote ancestors (they haven't ended yet), promote in-flight
  siblings via registry check on their own OnEnd
- ArkAdaptiveSampler: check FailedTraceRegistry in ShouldSample so new child
  spans started after failure detection are immediately RecordAndSample
- ArkApplicationInsightsExtensions: create shared FailedTraceRegistry and
  inject into both sampler (internal ctor) and processor (internal ctor)
- Both classes keep their existing public zero-arg / options-only constructors
  for standalone use; the internal registry-accepting overloads are used by the
  extension method for coordinated whole-operation promotion
- Update sampling.md to document the whole-operation promotion semantics,
  parent-chain walk, and the known limitation (pre-failure sibling spans)

Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/59e5e2d6-48d9-48af-b9ed-0da3a7887596

Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
Contributor

@AndreaCuneo AndreaCuneo left a comment


Sampling is a complex feature and must be tested.

  1. Create a new Ark.Tools.OTel library and move all the extensions and related code there.

  2. Add a new test assembly with tests that simulate Activities within OTel. Do not just unit test each class in isolation; test the behaviours specified in the documentation.

Copy link
Copy Markdown
Contributor Author

Copilot AI commented Apr 27, 2026

@AndreaCuneo I've opened a new pull request, #714, to work on those changes. Once the pull request is ready, I'll request review from you.
