Enable Customer SDK Stats#2707
Conversation
There was a problem hiding this comment.
Pull request overview
This pull request introduces comprehensive SDK statistics tracking and enhances the notification system to support telemetry retry events. The main objective is to enable periodic reporting of SDK usage metrics (success, dropped, and retry counts) and provide visibility into the complete telemetry lifecycle including retry scenarios.
Changes:
- Adds
eventsRetrynotification callback to INotificationManager and INotificationListener interfaces for tracking telemetry retry events with HTTP status codes - Implements
SdkStatsNotificationCbk- a notification listener that accumulates telemetry counts by type and periodically reports them as metrics - Integrates SDK stats listener into AISku with automatic initialization (enabled by default) and proper cleanup during unload
- Updates Sender to trigger notifications for sent, discarded, and retried events, storing baseType in IInternalStorageItem for telemetry type mapping
- Adds comprehensive unit tests for the SDK stats notification callback covering all major scenarios
Reviewed changes
Copilot reviewed 11 out of 11 changed files in this pull request and generated 6 comments.
Show a summary per file
| File | Description |
|---|---|
| shared/AppInsightsCore/src/interfaces/ai/INotificationManager.ts | Adds eventsRetry method signature to notification manager interface |
| shared/AppInsightsCore/src/interfaces/ai/INotificationListener.ts | Adds optional eventsRetry callback to listener interface |
| shared/AppInsightsCore/src/index.ts | Exports new SDK stats notification callback factory and interfaces |
| shared/AppInsightsCore/src/core/SdkStatsNotificationCbk.ts | New file implementing the SDK stats accumulator and periodic metric flushing logic |
| shared/AppInsightsCore/src/core/NotificationManager.ts | Implements eventsRetry notification dispatch to all registered listeners |
| shared/AppInsightsCore/src/constants/InternalConstants.ts | Adds STR_EVENTS_RETRY constant for the new notification type |
| shared/AppInsightsCore/Tests/Unit/src/aiunittests.ts | Registers new SDK stats notification callback test suite |
| shared/AppInsightsCore/Tests/Unit/src/ai/SdkStatsNotificationCbk.Tests.ts | New comprehensive test suite covering all SDK stats functionality |
| channels/applicationinsights-channel-js/src/Sender.ts | Adds notification triggers for sent/discarded/retry events and extracts telemetry items for notifications |
| channels/applicationinsights-channel-js/src/Interfaces.ts | Adds bT (baseType) property to IInternalStorageItem for telemetry type mapping |
| AISKU/src/AISku.ts | Integrates SDK stats listener with initialization, configuration, and unload handling |
9ed6655 to
7eaafe0
Compare
…nload Clearing the cache alone did not fix the build: Puppeteer's browser auto-download is failing on the hosted runner (it leaves a partial cache folder and then aborts with 'the browser folder exists but the executable is missing'), which fails 'npm install grunt-cli' across all Node versions. Skip the auto-download (PUPPETEER_SKIP_DOWNLOAD) so the install steps no longer fail, and point Puppeteer at the Google Chrome already present on the runner image via PUPPETEER_EXECUTABLE_PATH. The grunt qunit task launches Puppeteer without an explicit executablePath, so it honours this variable. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
After merging main, the core bundle grew past the size guardrails (raw 133.99 KB > 132 KB, deflate 53.81 KB > 53 KB) due to the OpenTelemetry additions on main. Raise the raw/bundle limit to 135 KB and the deflate limit to 54 KB so the size checks reflect the current footprint. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The SDK stats notification changes grew the 1ds-post-js deflate bundle to 27 KB (limit was 26.5 KB), and the merged core growth pushed the AISKULight (web-basic) deflate bundle to 42.61 KB (limit was 42 KB). Raise the 1ds-post-js deflate limit to 27.5 KB and the AISKULight deflate limits to 43 KB. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t size limit The hosted runner's default ~/.cache/puppeteer contains a partial browser entry that aborts Puppeteer install. Rather than driving the system Chrome (which makes the offline-channel IndexedDB tests hang), point Puppeteer at a clean per-job PUPPETEER_CACHE_DIR so it downloads and runs its own bundled browser - matching the environment under which these tests previously passed. Also raise the AISKULight (web-basic) raw/bundle size limit from 102 to 105 KB to match the current bundle footprint after merging main. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Even with a fresh PUPPETEER_CACHE_DIR the auto-download during the install steps leaves a partial browser folder, so a later install aborts with 'the browser folder exists but the executable is missing'. Disable the auto-download (PUPPETEER_SKIP_DOWNLOAD) for the install steps and perform a single clean 'puppeteer browsers install' afterwards (the CLI ignores the skip flag). The tests run against this bundled browser. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- telemetry_type -> telemetryType - drop.code -> dropCode - retry.code -> retryCode - language value: JavaScript -> javascript Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
76596bc to
956aa48
Compare
…-customer-sdk-stats # Conflicts: # .github/workflows/ci.yml # AISKULight/Tests/Unit/src/AISKULightSize.Tests.ts # channels/1ds-post-js/test/Unit/src/FileSizeCheckTest.ts # common/config/rush/npm-shrinkwrap.json # shared/AppInsightsCore/Tests/Unit/src/ai/AppInsightsCoreSize.Tests.ts
…ntion and tests The SDK stats notification callback emitted language as lowercase 'javascript', which failed the SdkStatsNotifCbk unit tests expecting 'JavaScript' (matching StatsBeat.STATSBEAT_LANGUAGE). Align the emitted value and the one inconsistent test assertion on 'JavaScript'. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The customer SDK stats code increased the AISKU bundle to 175.56 KB, exceeding the previous 175 KB limit. This was previously masked because the AppInsightsCore language test failed first and blocked the dependent applicationinsights-web tests. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…n paths Addresses the Copilot performance review: - Sender: skip _extractTelemetryItems entirely when the NotificationManager has no listeners, avoiding N+1 throwaway item allocations per batch when SDK stats is disabled (_onError/_onSuccess/_notifyRetry). - PostChannel._requeueEvents: lazy-init requeuedEvents/droppedEvents so the unused array isn't allocated on each call. - SdkStatsNotificationCbk: precompute the version string once at creation, and reset accumulators in place instead of allocating new null-prototype objects. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…name, ambiguous status) - SdkStatsNotifCbk.unload(): set an _unloaded guard BEFORE the final flush so any synchronous notification fired by core.track() while the listener is still registered becomes a no-op, and the flush timer can no longer re-arm. Added a regression test covering post-unload callbacks. - Sender: preserve the original telemetry item name (new iN field on IInternalStorageItem) when rebuilding items for notification dispatch, instead of substituting baseType. This keeps SDK stats self-filtering (Item_Success_Count etc.) correct so emitted stats metrics aren't counted back into the stats. - PostChannel._requeueEvents: use -1 (not 0) as the retry status code when the reason is outside the ResponseFailure range, so an unmapped reason is distinguishable from a real network-level failure (HTTP status 0). Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…nel-js This temp tsconfig was a build/typecheck artifact committed by mistake (it is not referenced by any build script and is not gitignored). Removing it from the package. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
PageActionData, ContentUpdateData and PageUnloadData now report as CUSTOM_EVENT in SDK Stats telemetryType, with matching test coverage. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Single-sample timing ratio occasionally exceeded the 15x threshold on noisy CI runners. Run 10 iterations and assert on the best-case (minimum) overhead, matching the existing useSpan perf test. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…t name, @SInCE) - Use ts-utils isUnsafePropKey instead of duplicated prototype-pollution guard - Add optional eventName param to createSdkStatsNotifCbk for 1DS (Ms.Web.SdkStat) - Correct stale @SInCE 3.3.12 tags to 3.4.1 - Add unit tests for the eventName override Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| name: evtName, | ||
| baseType: MetricDataType, | ||
| baseData: { | ||
| name: evtName, |
There was a problem hiding this comment.
This can be the passed name (props.metricName) -- it's only the top-level name that will cause the issue.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The top-level event name drives backend routing, so emitting AI stats under the counter names (Item_Success_Count etc.) would land them in customers' custom events / metrics tables. Always emit under a dedicated stats event name (1DS: Ms.Web.SdkStat, AI: Ms.AI.SdkStat) and keep the counter name in baseData.name and the metricName property. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
| var DROP_CLIENT_EXCEPTION = "CLIENT_EXCEPTION"; | ||
| // Dedicated top-level event name for AI stats. Keeps counter names out of the top-level name so | ||
| // the backend doesn't route stats into the customer's custom events / metrics tables. | ||
| var AI_STATS_EVENT = "Ms.AI.SdkStat"; |
There was a problem hiding this comment.
I was thinking that it should match something like the main Application Insights events -- so something like
Microsoft.ApplicationInsights.SdkStats?? potentially with the iKey (this want the portal could do something in the future -- not just the workbooks)
There was a problem hiding this comment.
I've implemented your suggestion here. Including the iKey component for future Azure portal usage.
MSNev
left a comment
There was a problem hiding this comment.
Approved with 1 additional suggestion
Default the AI SDK stats top-level event name to Microsoft.ApplicationInsights.<iKey>.SdkStats (dashes stripped) instead of the dedicated Ms.AI.SdkStat name, matching standard AI event naming. Update tests accordingly. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
|
@microsoft-github-policy-service rerun |
This pull request introduces SDK statistics telemetry collection and notification improvements, as well as enhanced event retry notification and error reporting in the Application Insights JavaScript SDK. The main changes add support for dynamically enabling/disabling SDK stats telemetry, improve notification to listeners about retried and discarded events (with status codes), and enhance test coverage for these new behaviors.
SDK Statistics Telemetry and Dynamic Listener Management
SdkStats) in the main configuration and enabled dynamic creation and cleanup of the SDK stats notification listener (_sdkStatsListener) based on feature flags. The listener is properly unloaded and flushed with other telemetry during shutdown. (AISKU/src/AISku.ts) [1] [2] [3] [4] [5] [6] [7]Event Retry and Notification Enhancements
channels/1ds-post-js/src/PostChannel.ts) [1] [2] [3]Testing Improvements
eventsRetrynotification is fired when events are requeued after a retriable failure, ensuring the feature works as intended. (channels/1ds-post-js/test/Unit/src/PostChannelTest.ts) [1] [2] [3]Channel Plugin Notification Improvements
baseTypefor SDK stats mapping.channels/applicationinsights-channel-js/src/Interfaces.ts,channels/applicationinsights-channel-js/src/Sender.ts) [1] [2] [3] [4] [5] [6] [7] [8]