feat(ApplicationInsights): migrate to Application Insights v3.1.0 with custom OpenTelemetry adaptive sampler #633
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
Warning: Firewall rules blocked me from connecting to one or more addresses.
@copilot merge master and update AI packages to 3.1.0. Refresh the analysis and the findings based on the 3.1 changes. Carefully analyze the OTel adaptive sampling capability and evaluate alternatives: client-side adaptive sampling is a major cost saver and dropping it is not cost-effective.
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
…g changes remain
- Merged master branch into the PR branch
- Updated ApplicationInsights packages from 2.23.0 to 3.1.0
- Updated SDK version in global.json to 10.0.201
- Pinned OpenTelemetry.Api to 1.15.3 to fix CVE-2026-40894 vulnerability
- Investigation shows v3.1.0 still has same breaking changes as v3.0.0
- ITelemetryProcessor and ITelemetryInitializer interfaces remain removed
- Build still fails with same compilation errors
- Adaptive sampling requires migration to OpenTelemetry approach
Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/6031d7e2-6703-432b-8a1d-39739c0ff462
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
✅ Merged master and updated ApplicationInsights packages to 3.1.0 (commit dfe0fe6)
Key Findings:
v3.1.0 Status:
Adaptive Sampling Analysis:
v2.x: Dynamic client-side adaptive sampling via
v3.x Options:
Security:
Recommendation:
This is not a simple package update - it's an architectural migration requiring careful planning around telemetry costs and sampling strategy.
@copilot analyze the sampling customizations that have been added on top of adaptive sampling in Ark.Tools.ApplicationInsights. The goal is an adaptive sampler that preserves traces with failures while rate-limiting the others in buckets keyed on the 'first span' identifiers, so that rarely used code paths are not sampled out in favour of frequently used ones. Research OTel samplers on NuGet before implementing our own. Prepare an analysis document for the migration to OTel and an implementation plan in docs/otel/*
@AndreaCuneo Unfortunately I hit an unexpected error while processing your comment. I've automatically reported this to GitHub. You can ask me to try again later by mentioning me in a new comment. If you want to contact GitHub about this error, please mention the following identifier so they can better serve you: Sorry for the inconvenience!
- Analyzed current Ark.Tools sampling customizations
- Researched available OpenTelemetry sampler packages on NuGet
- Created migration-analysis.md with detailed breakdown of current architecture
- Created implementation-plan.md with phased implementation approach
- Created nuget-research.md with package evaluation and comparison
- Created implementation-plan-code-examples.md with complete code samples
- Key findings: No NuGet packages meet requirements, custom sampler needed
- Adaptive sampling critical for cost efficiency - must be preserved
- Estimated effort: 25-30 developer days over 14 weeks
Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/03fa4e1f-d414-4b63-a9f8-2ec6c2573969
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
✅ Analysis complete and documented in
Current Sampling Architecture Analyzed:
NuGet Research Results:
Conclusion: Custom implementation required
Documents Created:
Key Recommendation: Implement
Effort: 25-30 developer days | Timeline: 14 weeks including staged rollout
- Created executive-summary.md with cost-benefit analysis
- Detailed investment requirements and payback period
- Added decision matrix comparing all options
- Included immediate next steps and questions for stakeholders
- Recommendation: Implement custom sampler via hybrid approach
- Total documentation: 87KB across 5 comprehensive documents
Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/03fa4e1f-d414-4b63-a9f8-2ec6c2573969
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
- Created README.md as entry point for documentation
- Provides quick start guide and document overview
- Includes decision tree and key metrics
- Links to all analysis documents
- Total documentation: 99KB across 6 documents
Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/03fa4e1f-d414-4b63-a9f8-2ec6c2573969
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
…ze docs/otel
Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/b465394b-ee30-4f4e-98dd-9181411ff6fe
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
…gistry
- Add FailedTraceRegistry: shared TTL-cleaned set of ActivityTraceIds with at least one failed span; drives coordination between sampler and processor
- ArkFailurePromotionProcessor: on failure, register TraceId, walk in-process parent chain to promote ancestors (they haven't ended yet), promote in-flight siblings via registry check on their own OnEnd
- ArkAdaptiveSampler: check FailedTraceRegistry in ShouldSample so new child spans started after failure detection are immediately RecordAndSample
- ArkApplicationInsightsExtensions: create shared FailedTraceRegistry and inject into both sampler (internal ctor) and processor (internal ctor)
- Both classes keep their existing public zero-arg / options-only constructors for standalone use; the internal registry-accepting overloads are used by the extension method for coordinated whole-operation promotion
- Update sampling.md to document the whole-operation promotion semantics, parent-chain walk, and the known limitation (pre-failure sibling spans)
Agent-Logs-Url: https://github.com/ARKlab/Ark.Tools/sessions/59e5e2d6-48d9-48af-b9ed-0da3a7887596
Co-authored-by: AndreaCuneo <5227688+AndreaCuneo@users.noreply.github.com>
AndreaCuneo left a comment
Sampling is a complex feature and must be tested.
- Create a new Ark.Tools.OTel library and move all extensions and code there.
- Add a new test assembly with tests that simulate Activities within OTel. Do not just unit test the individual classes; test the behaviours specified in the documentation.
@AndreaCuneo I've opened a new pull request, #714, to work on those changes. Once the pull request is ready, I'll request review from you.
Migrates Ark.Tools from Application Insights v2.23.0 to v3.1.0, which is an architectural shift to OpenTelemetry. Since v3 removes the classic ITelemetryProcessor/ITelemetryInitializer pipeline and adaptive sampling APIs, this PR implements a full custom OpenTelemetry-based sampling pipeline that preserves all existing cost-saving and error-visibility behaviours.

Breaking Changes Addressed
Application Insights v3.x removes from the public API:
- ITelemetryInitializer/ITelemetryProcessor interfaces
- TelemetryConfiguration.TelemetryInitializers/TelemetryProcessors collections
- Adaptive sampling APIs (SamplingPercentageEstimatorSettings)
- MetricConfigurationForMeasurement API
- ITelemetryModule
- AddApplicationInsightsSettings() helper

Custom OpenTelemetry Sampling Pipeline
A new sampling pipeline in Ark.Tools.ApplicationInsights replaces all v2.x constructs:
- ArkAdaptiveSampler – OpenTelemetry Sampler with per-operation token-bucket rate limiting and adaptive rate adjustment. Checks FailedTraceRegistry so any span belonging to a known-failed trace is immediately promoted to RecordAndSample.
- OperationBucket – thread-safe token bucket for per-operation rate limiting.
- ArkAdaptiveSamplerOptions – configuration POCO (max items/sec, moving-average ratio, timeouts, bucket capacity).
- FailedTraceRegistry – shared TTL-cleaned set of ActivityTraceIds that have at least one failed span, used to coordinate whole-operation promotion between the sampler and the failure processor.
- ArkFailurePromotionProcessor – BaseProcessor<Activity> that, on failure, registers the TraceId, walks the in-process parent chain to promote ancestors still in flight, and promotes any other buffered sibling spans whose trace is already in the registry. This ensures the entire operation (root span + all children) is exported when any span fails.
- ArkPreFilterProcessor – drops high-noise low-value spans (CORS OPTIONS requests, Azure Service Bus Receive, SQL Commit).
- ArkTelemetryEnrichmentProcessor – adds ProcessName tag to all spans.
- ArkSqlDependencyFilterProcessor – filters SQL spans for the NLog internal database connection.
- ArkApplicationInsightsExtensions – AddArkApplicationInsightsCustomizations() wires the full pipeline, creating a shared FailedTraceRegistry injected into both sampler and failure processor.

Whole-Operation Failure Promotion
When any span in a trace ends in failure, all parents and in-flight siblings are promoted for export, ensuring the complete operation context is available for diagnosis. Spans that completed successfully before failure detection are the only ones that may be missed (a known and accepted limitation).
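
The promotion flow described above can be sketched with nothing but System.Diagnostics types. This is a minimal illustration, not the code in this PR: the type names FailedTraceRegistrySketch and FailurePromotionSketch, the explicit `now` parameter, and the idea that "promotion" means setting the Recorded trace flag are all assumptions made for the example.

```csharp
using System;
using System.Collections.Concurrent;
using System.Diagnostics;

// Hypothetical sketch: a TTL-cleaned set of trace ids with at least one failed span.
public sealed class FailedTraceRegistrySketch
{
    // Maps a trace id to the instant at which its entry expires.
    private readonly ConcurrentDictionary<ActivityTraceId, DateTimeOffset> _failed = new();
    private readonly TimeSpan _ttl;

    public FailedTraceRegistrySketch(TimeSpan ttl) => _ttl = ttl;

    // Remember that this trace contains a failed span.
    public void MarkFailed(ActivityTraceId traceId, DateTimeOffset now)
        => _failed[traceId] = now + _ttl;

    // True while the trace is registered and its entry has not expired.
    public bool IsFailed(ActivityTraceId traceId, DateTimeOffset now)
        => _failed.TryGetValue(traceId, out var expiresAt) && expiresAt > now;

    // Drop expired entries so the set does not grow without bound.
    public void Cleanup(DateTimeOffset now)
    {
        foreach (var entry in _failed)
        {
            if (entry.Value <= now)
            {
                // Removes only if the value is unchanged, so a concurrent re-mark survives.
                _failed.TryRemove(entry);
            }
        }
    }
}

public static class FailurePromotionSketch
{
    // Intended to be called when a span ends (assumption: from a processor's OnEnd).
    public static void OnSpanEnd(Activity span, FailedTraceRegistrySketch registry, DateTimeOffset now)
    {
        if (span.Status == ActivityStatusCode.Error)
        {
            registry.MarkFailed(span.TraceId, now);

            // Walk the in-process parent chain: ancestors have not ended yet,
            // so flagging them now makes them eligible for export later.
            for (var current = span; current != null; current = current.Parent)
            {
                current.ActivityTraceFlags |= ActivityTraceFlags.Recorded;
            }
        }
        else if (registry.IsFailed(span.TraceId, now))
        {
            // A sibling that ends after the failure was detected is promoted too.
            span.ActivityTraceFlags |= ActivityTraceFlags.Recorded;
        }
    }
}
```

The sketch also shows where the known limitation comes from: a sibling that ended successfully before any failure was registered has already passed through OnSpanEnd without being flagged, and nothing revisits it.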
Package & API Updates
- Updated Microsoft.ApplicationInsights, Microsoft.ApplicationInsights.AspNetCore, Microsoft.ApplicationInsights.WorkerService to 3.1.0
- Microsoft.ApplicationInsights.NLogTarget → 3.1.0-beta4 (no stable v3 available); ConnectionString replaces InstrumentationKey in NlogConfigurer.cs
- Removed Microsoft.ApplicationInsights.WindowsServer.TelemetryChannel (obsolete in v3)
- Added OpenTelemetry and OpenTelemetry.Extensions.Hosting to central package management
- Pinned OpenTelemetry.Api to 1.15.3 (fixes CVE-2026-40894 DoS vulnerability)

Replaced v2.x Classes
- DoNotSampleFailures (ITelemetryInitializer) → ArkFailurePromotionProcessor + FailedTraceRegistry
- EnableAdaptiveSamplingWithCustomSettings → ArkAdaptiveSampler
- ArkSkipUselessSpamTelemetryProcessor (ITelemetryProcessor) → ArkPreFilterProcessor
- GlobalInfoTelemetryInitializer → ArkTelemetryEnrichmentProcessor
- SkipSqlDatabaseDependencyFilter → ArkSqlDependencyFilterProcessor
- WebApi4xxAsSuccessTelemetryInitializer → WebApi4xxAsSuccessProcessor
- WebApiUserTelemetryInitializer → WebApiUserProcessor
- ResourceWatcherTelemetryModule (ITelemetryModule) → IHostedService

Other Changes
- ArkStartupBase.cs – removed obsolete ITelemetryInitializer registrations
- ApplicationInsightsProcessingMetricsStep.cs – removed usage of the removed MetricConfigurationForMeasurement API
- Removed AddApplicationInsightsSettings() calls
- Ark.Tools.AspNetCore.ApplicationInsights/Startup/Ex.cs and Ark.Tools.ApplicationInsights.HostedService/Ex.cs updated to use new OTel pipeline

Documentation
- docs/otel/README.md – features, usage, configuration reference, architecture overview
- docs/otel/sampling.md – detailed sampling algorithm, whole-operation promotion semantics, FailedTraceRegistry design, known limitations
- docs/otel/applicationinsights-migration/ – analysis, NuGet research, implementation plan, code examples, executive summary (moved from docs/otel/ root)

Build
✅ Full solution build succeeded
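
For readers unfamiliar with the per-operation token-bucket rate limiting listed under Custom OpenTelemetry Sampling Pipeline, here is a rough, BCL-only sketch of the idea. It is not the PR's OperationBucket: TokenBucketSketch, PerOperationLimiterSketch, their parameters, and the use of the operation name as the bucket key are assumptions for illustration, and the adaptive rate adjustment is left out entirely.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical sketch of a thread-safe token bucket with a fixed refill rate.
public sealed class TokenBucketSketch
{
    private readonly object _gate = new();
    private readonly double _capacity;
    private readonly double _refillPerSecond;
    private double _tokens;
    private DateTimeOffset _lastRefill;

    public TokenBucketSketch(double capacity, double refillPerSecond, DateTimeOffset now)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
        _tokens = capacity;      // start full so the first spans of an operation are kept
        _lastRefill = now;
    }

    // Returns true when a token was available, meaning the span may be sampled.
    public bool TryTake(DateTimeOffset now)
    {
        lock (_gate)
        {
            var elapsedSeconds = (now - _lastRefill).TotalSeconds;
            if (elapsedSeconds > 0)
            {
                // Refill proportionally to elapsed time, never above capacity.
                _tokens = Math.Min(_capacity, _tokens + elapsedSeconds * _refillPerSecond);
                _lastRefill = now;
            }

            if (_tokens < 1)
            {
                return false;
            }

            _tokens -= 1;
            return true;
        }
    }
}

// Hypothetical sketch: one bucket per operation, so a busy operation cannot
// consume the sampling budget of a rarely used one.
public sealed class PerOperationLimiterSketch
{
    private readonly ConcurrentDictionary<string, TokenBucketSketch> _buckets = new();
    private readonly double _capacity;
    private readonly double _refillPerSecond;

    public PerOperationLimiterSketch(double capacity, double refillPerSecond)
    {
        _capacity = capacity;
        _refillPerSecond = refillPerSecond;
    }

    public bool ShouldSample(string operationName, DateTimeOffset now)
        => _buckets
            .GetOrAdd(operationName, _ => new TokenBucketSketch(_capacity, _refillPerSecond, now))
            .TryTake(now);
}
```

With a capacity of 2 and a refill rate of 1 token per second, three calls for the same operation at the same instant yield two accepted spans and one rejected span, while a first call for a different operation at that instant is still accepted because it draws from its own bucket.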