diff --git a/README.md b/README.md index 5bef964..ffea986 100644 --- a/README.md +++ b/README.md @@ -5,7 +5,7 @@ ServiceLevelIndicators is a .NET library for emitting service-level latency metrics in milliseconds using the standard [System.Diagnostics.Metrics](https://learn.microsoft.com/dotnet/api/system.diagnostics.metrics) and OpenTelemetry pipeline. -It is designed for teams that need more than generic request timing. The library helps measure meaningful operations, attach service-specific dimensions such as customer, location, operation name, and status, and build SLO or SLA-oriented dashboards and alerts from those metrics. +It is designed for teams that need more than generic request timing. The library helps measure meaningful operations, attach service-specific dimensions such as customer, location, operation name, and SLI outcome, and build SLO or SLA-oriented dashboards and alerts from those metrics. Service level indicators (SLIs) are metrics used to track how a service is performing against expected reliability and responsiveness goals. Common examples include availability, response time, throughput, and error rate. This library focuses on latency SLIs so you can consistently measure operation duration across background work, ASP.NET Core APIs, and versioned endpoints. @@ -17,22 +17,19 @@ Service level indicators (SLIs) are metrics used to track how a service is perfo **Trellis.ServiceLevelIndicators** emits operation latency metrics in milliseconds so service owners can monitor performance over time using dimensions that matter to their system. The metrics are emitted via the standard [.NET Meter Class](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.metrics.meter). -By default, a meter named `Trellis.SLI` with instrument name `operation.duration` is added to the service metrics. The metrics are emitted with the following [attributes](https://opentelemetry.io/docs/specs/otel/common/#attribute). 
+By default, a meter named `Trellis.SLI` with instrument name `operation.duration` is added to the service metrics. If you configure `ServiceLevelIndicatorOptions.Meter`, metrics are emitted from that meter instead. Metrics recorded with `StartMeasuring(...)` emit the following [attributes](https://opentelemetry.io/docs/specs/otel/common/#attribute). - CustomerResourceId - The **target resource** of the operation — the noun in the URL path being read or modified, normalized to a stable identifier (tenant, subscription, account, work item). **NOT** the caller, **NOT** a per-request GUID, **NOT** a user ID or email. Example: for `GET /teams/{teamId}` called by user `xa1` for team `team1`, the value is `"team1"`, not `"xa1"`. See the [ASP.NET Core package README](Trellis.ServiceLevelIndicators.Asp/src/README.md#what-customerresourceid-is--and-what-it-is-not) for the full mental model. -- LocationId - The location where the service running. eg. Public cloud, West US 3 region. [Azure Core](https://learn.microsoft.com/en-us/dotnet/api/azure.core.azurelocation?view=azure-dotnet) +- LocationId - The location where the service is running, such as public cloud in the West US 3 region. [Azure Core](https://learn.microsoft.com/en-us/dotnet/api/azure.core.azurelocation?view=azure-dotnet) - Operation - The name of the operation. -- activity.status.code - The activity status code is set based on the success or failure of the operation. [ActivityStatusCode](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.activitystatuscode). +- Outcome - The SLI outcome. Exact values are `Success`, `Failure`, `ClientError`, and `Ignored`. Default success-rate queries should use `Success / (Success + Failure)`. **Trellis.ServiceLevelIndicators.Asp** adds the following dimensions. 
- Operation - For ASP.NET endpoints, the operation name is the HTTP method plus the route template, resolved in this order: (1) `[ServiceLevelIndicator(Operation = "...")]` attribute or `.AddServiceLevelIndicator("op")` override, (2) MVC `AttributeRouteInfo.Template`, (3) the endpoint's `RouteEndpoint.RoutePattern.RawText` (Minimal APIs / conventional routing). Route placeholders such as `{id}` are preserved, never substituted with the concrete request value. If no bounded template is available, the middleware emits the sentinel `" "` and logs a warning — see that value in your metrics as a signal to add a route template. -- The activity status code will be - "Ok" when the http response status code is in the 2xx range, - "Error" when the http response status code is in the 5xx range, - "Unset" for any other status code. +- Outcome - By default, 2xx and 3xx responses are `Success`, common caller errors such as 400/401/403/404/409/412/422 are `ClientError`, 429 and 5xx responses are `Failure`, and request-aborted cancellations are `Ignored`. - http.response.status.code - The http status code. -- http.request.method (Optional)- The http request method (GET, POST, etc) is added. +- http.request.method - The http request method (GET, POST, etc). Difference between ServiceLevelIndicator and http.server.request.duration @@ -40,12 +37,12 @@ Difference between ServiceLevelIndicator and http.server.request.duration | ---------- | ------- | ------ | Resolution | milliseconds | seconds | Customer | CustomerResourceId | N/A -| Error check | Activity or HTTP status.code | HTTP status code +| Error check | `Outcome` and HTTP status code | HTTP status code This makes the library useful when generic HTTP server metrics are not enough, especially for multi-tenant services, APIs with customer-specific objectives, or workloads that need the same SLI model outside HTTP request handling. **Trellis.ServiceLevelIndicators.Asp.ApiVersioning** adds the following dimensions. 
-- http.api.version - The API Version when used in conjunction with [API Versioning package](https://github.com/dotnet/aspnet-api-versioning). +- http.api.version - The resolved API version when used in conjunction with the [API Versioning package](https://github.com/dotnet/aspnet-api-versioning). The value can be a version string, `Neutral`, `Unspecified`, or an empty string for invalid or ambiguous requests. ## NuGet Packages @@ -199,11 +196,11 @@ You can measure a block of code by wrapping it in a `using` clause of `MeasuredO Example: ```csharp -async Task MeasureCodeBlock(ServiceLevelIndicator serviceLevelIndicator) +void MeasureCodeBlock(ServiceLevelIndicator serviceLevelIndicator) { using var measuredOperation = serviceLevelIndicator.StartMeasuring("OperationName"); // Do Work. - measuredOperation.SetActivityStatusCode(System.Diagnostics.ActivityStatusCode.Ok); + measuredOperation.SetOutcome(SliOutcome.Success); } ``` @@ -211,7 +208,7 @@ async Task MeasureCodeBlock(ServiceLevelIndicator serviceLevelIndicator) ### Cardinality Guidance -All three required tags — `Operation`, `LocationId`, and `CustomerResourceId` — must be **low-cardinality and bounded**. The library bounds `Operation` for you via the route-template resolver and the `` sentinel; you are responsible for `LocationId` (set once from configuration) and `CustomerResourceId` (stable tenant / subscription / resource identifier). +Required tags must be stable and meaningful. The library bounds `Operation` for you via the route-template resolver and the `` sentinel; you are responsible for `LocationId` (set once from configuration) and `CustomerResourceId` (stable tenant / subscription / resource identifier). `CustomerResourceId` may be high-cardinality when the backend is designed for it, but it must not be a per-request generated value. The same discipline applies to `[Measure]` parameters and any custom attributes added via `AddAttribute(...)`. 
Avoid email addresses, request IDs, timestamps, or unconstrained free text unless your metrics backend is explicitly designed for high-cardinality telemetry. @@ -237,18 +234,7 @@ The default operation name is the HTTP method plus the route template (placehold .AddApiVersion(); ``` -- To add HTTP method as a dimension, add `AddHttpMethod` to Service Level Indicator. - - Example: - - ```csharp - builder.Services.AddServiceLevelIndicator(options => - { - /// Options - }) - .AddMvc() - .AddHttpMethod(); - ``` +- `http.request.method` is emitted by default by the ASP.NET Core middleware. `AddHttpMethod()` remains available as a no-op for older setup code. - Enrich SLI with the `Enrich` callback. The callback receives a `MeasuredOperation` as context that can be used to set `CustomerResourceId` or additional attributes. An async version `EnrichAsync` is also available. @@ -345,6 +331,8 @@ The default operation name is the HTTP method plus the route template (placehold Try out the sample weather forecast Web API. +For a local Grafana/Prometheus/OpenTelemetry Collector experience, run the provisioned dashboard in [`sample\Observability\Grafana`](sample/Observability/Grafana/README.md). It shows SLI latency percentiles, success rate, failures, client errors, unknown customer diagnostics, and `` detection. + To view the metrics locally using the [.NET Aspire Dashboard](https://aspire.dev/dashboard/standalone/): 1. Start the Aspire dashboard: @@ -352,7 +340,7 @@ To view the metrics locally using the [.NET Aspire Dashboard](https://aspire.dev docker run --rm -it -d -p 18888:18888 -p 4317:18889 -e DOTNET_DASHBOARD_UNSECURED_ALLOW_ANONYMOUS=true -e DASHBOARD__OTLP__AUTHMODE=Unsecured --name aspire-dashboard mcr.microsoft.com/dotnet/aspire-dashboard:latest ``` 2. Run the sample web API project and call the `GET WeatherForecast` using the Open API UI. -3. Open `http://localhost:18888` to view the dashboard. 
You should see the SLI metrics under the instrument `operation.duration` where `Operation = "GET WeatherForecast"`, `http.response.status.code = 200`, `LocationId = "ms-loc://az/public/westus2"`, `activity.status.code = Ok`. +3. Open `http://localhost:18888` to view the dashboard. You should see the SLI metrics under the instrument `operation.duration` where `Operation = "GET WeatherForecast"`, `Outcome = "Success"`, `http.response.status.code = 200`, and `LocationId = "ms-loc://az/public/westus3"`. ![SLI](assets/aspire.jpg) 4. If you run the sample with API Versioning, you will see something similar to the following. ![SLI](assets/versioned.jpg) diff --git a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ApiVersionEnrichment.cs b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ApiVersionEnrichment.cs index fab7115..76a00c8 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ApiVersionEnrichment.cs +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ApiVersionEnrichment.cs @@ -27,4 +27,4 @@ private static string GetApiVersion(HttpContext context) return "Unspecified"; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/README.md b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/README.md index 86ee19b..4eda486 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/README.md +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/README.md @@ -43,9 +43,9 @@ This registers `ApiVersionEnrichment`, which reads the resolved API version from | Attribute | Description | |-----------|-------------| -| `http.api.version` | The resolved API version string (e.g. `1.0`, `2024-01-15`), `Neutral`, or `Unspecified` | +| `http.api.version` | The single resolved API version string (e.g. 
`1.0`, `2024-01-15`), `Neutral` for API-version-neutral endpoints, `Unspecified` when no version is requested and no default is assumed, or an empty string for invalid or ambiguous requests | -This attribute is added alongside all the standard attributes emitted by `Trellis.ServiceLevelIndicators.Asp` (`Operation`, `CustomerResourceId`, `LocationId`, `activity.status.code`, `http.response.status.code`). +This attribute is added alongside all the standard attributes emitted by `Trellis.ServiceLevelIndicators.Asp` (`Operation`, `CustomerResourceId`, `LocationId`, `Outcome`, `http.request.method`, `http.response.status.code`). ## Further Reading diff --git a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ServiceLevelIndicatorServiceCollectionExtensions.cs b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ServiceLevelIndicatorServiceCollectionExtensions.cs index 2d782be..c75ce10 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ServiceLevelIndicatorServiceCollectionExtensions.cs +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/src/ServiceLevelIndicatorServiceCollectionExtensions.cs @@ -11,4 +11,4 @@ public static IServiceLevelIndicatorBuilder AddApiVersion(this IServiceLevelIndi builder.Services.TryAddEnumerable(ServiceDescriptor.Singleton, ApiVersionEnrichment>()); return builder; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/ServiceLevelIndicatorVersionedAspTests.cs b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/ServiceLevelIndicatorVersionedAspTests.cs index 3d8c92b..c292fb8 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/ServiceLevelIndicatorVersionedAspTests.cs +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/ServiceLevelIndicatorVersionedAspTests.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators.Asp.ApiVersioning.Tests; +namespace Trellis.ServiceLevelIndicators.Asp.ApiVersioning.Tests; using System.Diagnostics.Metrics; 
using System.Net; @@ -49,7 +49,7 @@ public async Task SLI_Metrics_is_emitted_with_API_version_as_query_parameter() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET TestSingle"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), new("http.api.version", "2023-08-29"), ]; @@ -72,7 +72,7 @@ public async Task SLI_Metrics_is_emitted_with_API_version_as_header() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET TestSingle"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), new("http.api.version", "2023-08-29"), ]; @@ -98,7 +98,7 @@ public async Task SLI_Metrics_is_emitted_with_neutral_API_version() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET TestNeutral"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), ]; using var host = await CreateHost(); @@ -121,7 +121,7 @@ public async Task SLI_Metrics_is_emitted_with_default_API_version() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET TestSingle"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), ]; using var host = await CreateHostWithDefaultApiVersion(); @@ -159,7 +159,7 @@ public async Task SLI_Metrics_is_emitted_when_api_version_is_invalid(string rout new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET "), - new("activity.status.code", "Unset"), + new("Outcome", "ClientError"), new("http.response.status.code", 400), ]; var routeWithVersion = route + "?" 
+ version; @@ -239,10 +239,21 @@ private async Task CreateHostWithDefaultApiVersion() => private void ValidateMetrics() { + _expectedTags = AddDefaultHttpMethod(_expectedTags); + _callbackCalled.Should().BeTrue(); + _actualTags.Should().NotContain(tag => tag.Key == "activity.status.code"); _actualTags.Should().BeEquivalentTo(_expectedTags); } + private static KeyValuePair[] AddDefaultHttpMethod(KeyValuePair[] expectedTags) + { + if (expectedTags.Any(tag => tag.Key == "http.request.method")) + return expectedTags; + + return [.. expectedTags, new KeyValuePair("http.request.method", "GET")]; + } + private void OnMeasurementRecorded(Instrument instrument, long measurement, ReadOnlySpan> tags, object? state) { _actualTags = tags.ToArray(); @@ -273,4 +284,4 @@ public void Dispose() Dispose(disposing: true); GC.SuppressFinalize(this); } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestDoubleController.cs b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestDoubleController.cs index 5672edf..574f327 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestDoubleController.cs +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestDoubleController.cs @@ -11,4 +11,4 @@ public class TestDoubleController : ControllerBase { [HttpGet] public IActionResult Get() => Ok("Hello World!"); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestNeutralController.cs b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestNeutralController.cs index 90ea550..b90a919 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestNeutralController.cs +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestNeutralController.cs @@ -10,4 +10,4 @@ public class TestNeutralController : ControllerBase { [HttpGet] public IActionResult Get() => Ok("Hello World!"); -} \ No newline at end of file +} diff --git 
a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestSingleController.cs b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestSingleController.cs index 9d9533a..7a5dc78 100644 --- a/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestSingleController.cs +++ b/Trellis.ServiceLevelIndicators.Asp.ApiVersioning/tests/TestSingleController.cs @@ -10,4 +10,4 @@ public class TestSingleController : ControllerBase { [HttpGet] public IActionResult Get() => Ok("Hello World!"); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdAttribute.cs b/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdAttribute.cs index 2f87c41..be639ba 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdAttribute.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdAttribute.cs @@ -7,4 +7,4 @@ [AttributeUsage(AttributeTargets.Parameter, AllowMultiple = false, Inherited = true)] public sealed class CustomerResourceIdAttribute : Attribute { -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdMetadata.cs b/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdMetadata.cs index f13ca16..fd7aadf 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdMetadata.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/CustomerResourceIdMetadata.cs @@ -9,4 +9,4 @@ public sealed class CustomerResourceIdMetadata(string routeValueName) /// Gets the route value name mapped to the customer resource identifier. 
/// public string RouteValueName { get; } = routeValueName; -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/EndpointBuilderExtensions.cs b/Trellis.ServiceLevelIndicators.Asp/src/EndpointBuilderExtensions.cs index 0d52cd1..6d3e43d 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/EndpointBuilderExtensions.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/EndpointBuilderExtensions.cs @@ -54,4 +54,4 @@ private static void AddSliMetadata(EndpointBuilder endpoint) } } } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/Enrich.cs b/Trellis.ServiceLevelIndicators.Asp/src/Enrich.cs index c782732..4e299f7 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/Enrich.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/Enrich.cs @@ -14,4 +14,4 @@ public ValueTask EnrichAsync(WebEnrichmentContext context, CancellationToken can _action(context); return ValueTask.CompletedTask; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/EnrichAsync.cs b/Trellis.ServiceLevelIndicators.Asp/src/EnrichAsync.cs index f51ef2a..fde4998 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/EnrichAsync.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/EnrichAsync.cs @@ -11,4 +11,4 @@ internal sealed class EnrichAsync : IEnrichment ValueTask IEnrichment.EnrichAsync(WebEnrichmentContext context, CancellationToken cancellationToken) => _func(context, cancellationToken); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/HttpContextExtensions.cs b/Trellis.ServiceLevelIndicators.Asp/src/HttpContextExtensions.cs index a8bb6fd..56c780c 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/HttpContextExtensions.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/HttpContextExtensions.cs @@ -42,4 +42,4 @@ public static bool TryGetMeasuredOperation(this HttpContext context, [MaybeNullW measuredOperation = null; return false; } -} \ No newline at end of file +} diff --git 
a/Trellis.ServiceLevelIndicators.Asp/src/HttpMethodEnrichment.cs b/Trellis.ServiceLevelIndicators.Asp/src/HttpMethodEnrichment.cs index c0f2a63..c65fd5b 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/HttpMethodEnrichment.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/HttpMethodEnrichment.cs @@ -10,4 +10,4 @@ public ValueTask EnrichAsync(WebEnrichmentContext context, CancellationToken can context.AddAttribute("http.request.method", context.HttpContext.Request.Method); return ValueTask.CompletedTask; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/IServiceLevelIndicatorFeature.cs b/Trellis.ServiceLevelIndicators.Asp/src/IServiceLevelIndicatorFeature.cs index 26221dc..2b58f2b 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/IServiceLevelIndicatorFeature.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/IServiceLevelIndicatorFeature.cs @@ -5,4 +5,4 @@ public interface IServiceLevelIndicatorFeature { MeasuredOperation MeasuredOperation { get; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/MeasureAttribute.cs b/Trellis.ServiceLevelIndicators.Asp/src/MeasureAttribute.cs index 40b6147..3b5d175 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/MeasureAttribute.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/MeasureAttribute.cs @@ -7,4 +7,4 @@ public sealed class MeasureAttribute(string? name = default) : Attribute { public string? Name { get; } = name; -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/MeasureMetadata.cs b/Trellis.ServiceLevelIndicators.Asp/src/MeasureMetadata.cs index 9ab3aa4..fb8f222 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/MeasureMetadata.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/MeasureMetadata.cs @@ -5,4 +5,4 @@ public sealed class MeasureMetadata(string routeValueName, string? attributeName public string RouteValueName { get; } = routeValueName; public string AttributeName { get; } = attributeName ?? 
routeValueName; -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/README.md b/Trellis.ServiceLevelIndicators.Asp/src/README.md index e19da19..6e1fcdf 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/README.md +++ b/Trellis.ServiceLevelIndicators.Asp/src/README.md @@ -43,6 +43,8 @@ app.UseServiceLevelIndicator(); ## Quick Start — Minimal APIs ```csharp +// Register with OpenTelemetry as shown in the MVC example above. + builder.Services.AddServiceLevelIndicator(options => { options.LocationId = ServiceLevelIndicator.CreateLocationId("public", "westus3"); @@ -60,18 +62,22 @@ If you configure a custom `Meter` in `ServiceLevelIndicatorOptions`, register th `ServiceLevelIndicator` is a sealed `IDisposable` registered as a singleton; the DI container disposes it (and the `Meter` it created) at host shutdown — no manual cleanup needed. A `Meter` you supply via `Options.Meter` is owned by you and is never disposed by SLI. +Place `UseServiceLevelIndicator()` after routing and before endpoint execution so the middleware can read endpoint metadata and measure request handling. + ## Emitted Metrics -A meter named `Trellis.SLI` with instrument `operation.duration` (milliseconds) is emitted with the following attributes: +By default, a meter named `Trellis.SLI` emits the `operation.duration` histogram in milliseconds. If you configure `ServiceLevelIndicatorOptions.Meter`, ASP.NET Core SLI metrics are emitted from that meter instead. + +Measured HTTP requests emit the following attributes: | Attribute | Description | |-----------|-------------| | `Operation` | The HTTP method + route template (e.g. `GET /teams/{teamId}`) — see [How `Operation` is resolved](#how-operation-is-resolved) below. | | `CustomerResourceId` | The **target resource** of the operation — see [What `CustomerResourceId` is — and what it is NOT](#what-customerresourceid-is--and-what-it-is-not) below. 
| | `LocationId` | Where the service is running | -| `activity.status.code` | `Ok` (2xx), `Error` (5xx), or `Unset` (other) | +| `Outcome` | `Success`, `Failure`, `ClientError`, or `Ignored`; 2xx/3xx responses are `Success`, common 4xx caller errors are `ClientError`, 429/5xx and unhandled exceptions are `Failure`, and request-aborted cancellations are `Ignored` | | `http.response.status.code` | The HTTP response status code | -| `http.request.method` | *(Optional)* The HTTP method — enabled via `AddHttpMethod()` | +| `http.request.method` | The HTTP method | ### What `CustomerResourceId` is — and what it is NOT @@ -98,7 +104,7 @@ app.MapGet("/teams/{teamId}", .AddServiceLevelIndicator("GetTeam"); ``` -Or set it imperatively from claims/headers via `Enrich` or `HttpContext.GetMeasuredOperation()` — but the value must still be a stable, low-cardinality resource identifier. +Or set it imperatively from claims/headers via `Enrich` or `HttpContext.GetMeasuredOperation()` — but the value must still be stable and meaningful. ### How `Operation` is resolved @@ -112,13 +118,9 @@ If none of those yield a bounded template (e.g. a synthetic problem-details endp ## Customizations -### Add HTTP method as a dimension +### HTTP method dimension -```csharp -builder.Services.AddServiceLevelIndicator(options => { /* ... */ }) - .AddMvc() - .AddHttpMethod(); -``` +`http.request.method` is emitted by default. `AddHttpMethod()` remains available as a no-op for older setup code. ### Enrich with custom data @@ -182,11 +184,11 @@ if (HttpContext.TryGetMeasuredOperation(out var op)) ## Cardinality Guidance -All three required tags — `Operation`, `LocationId`, and `CustomerResourceId` — must be **low-cardinality and bounded**: +Required tags must be stable and meaningful: - **`Operation`** is bounded for you by the route-template resolver above (one series per HTTP method × route template). Watch your metrics for the `` sentinel — it means an endpoint is missing a route template. 
- **`LocationId`** is set once per process from configuration — naturally bounded. -- **`CustomerResourceId`** is your responsibility. Use a stable tenant / subscription / resource identifier; do not use per-request GUIDs, user IDs, email addresses, request IDs, or raw user input. +- **`CustomerResourceId`** is your responsibility. Use a stable tenant / subscription / resource identifier; do not use per-request GUIDs, timestamps, request IDs, or raw user input. High-cardinality customer resources are acceptable when they are stable, meaningful, and supported by your metrics backend. The same discipline applies to `[Measure]` parameters and any custom attributes you add via `AddAttribute(...)`. diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorApplicationBuilderExtensions.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorApplicationBuilderExtensions.cs index 008bf41..6ffc8ea 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorApplicationBuilderExtensions.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorApplicationBuilderExtensions.cs @@ -18,4 +18,4 @@ public static IApplicationBuilder UseServiceLevelIndicator(this IApplicationBuil return app.UseMiddleware(); } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorAttribute.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorAttribute.cs index 533346c..752d900 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorAttribute.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorAttribute.cs @@ -13,4 +13,4 @@ public ServiceLevelIndicatorAttribute() { } public ServiceLevelIndicatorAttribute(string operation) => Operation = operation; public string? 
Operation { get; set; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorConvention.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorConvention.cs index 7fe4b5f..f1280dc 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorConvention.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorConvention.cs @@ -36,4 +36,4 @@ public void Apply(ParameterModel parameter) } } } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorFeature.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorFeature.cs index 26da258..461ec55 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorFeature.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorFeature.cs @@ -5,4 +5,4 @@ internal sealed class ServiceLevelIndicatorFeature : IServiceLevelIndicatorFeatu public ServiceLevelIndicatorFeature(MeasuredOperation measureOperation) => MeasuredOperation = measureOperation; public MeasuredOperation MeasuredOperation { get; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorHttpOptions.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorHttpOptions.cs new file mode 100644 index 0000000..fc40992 --- /dev/null +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorHttpOptions.cs @@ -0,0 +1,14 @@ +namespace Trellis.ServiceLevelIndicators; + +using Microsoft.AspNetCore.Http; + +/// +/// ASP.NET Core-specific SLI options. +/// +public sealed class ServiceLevelIndicatorHttpOptions +{ + /// + /// Optional classifier that maps the completed HTTP request to an SLI outcome. + /// + public Func? 
ClassifyOutcome { get; set; } +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorMiddleware.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorMiddleware.cs index 69aedf4..7d85de1 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorMiddleware.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorMiddleware.cs @@ -8,6 +8,7 @@ using Microsoft.AspNetCore.Mvc.Controllers; using Microsoft.AspNetCore.Routing; using Microsoft.Extensions.Logging; +using Microsoft.Extensions.Options; internal sealed partial class ServiceLevelIndicatorMiddleware { @@ -15,13 +16,20 @@ internal sealed partial class ServiceLevelIndicatorMiddleware private readonly ServiceLevelIndicator _serviceLevelIndicator; private readonly IEnumerable> _enrichments; private readonly ILogger _logger; - - public ServiceLevelIndicatorMiddleware(RequestDelegate next, ServiceLevelIndicator serviceLevelIndicator, IEnumerable> enrichments, ILogger logger) + private readonly ServiceLevelIndicatorHttpOptions _httpOptions; + + public ServiceLevelIndicatorMiddleware( + RequestDelegate next, + ServiceLevelIndicator serviceLevelIndicator, + IEnumerable> enrichments, + ILogger logger, + IOptions httpOptions) { _next = next; _serviceLevelIndicator = serviceLevelIndicator; _enrichments = enrichments; _logger = logger; + _httpOptions = httpOptions.Value; } public async Task InvokeAsync(HttpContext context) @@ -49,8 +57,12 @@ public async Task InvokeAsync(HttpContext context) { unhandledException = ex; - if (!context.Response.HasStarted && context.Response.StatusCode < StatusCodes.Status500InternalServerError) + if (!IsRequestAborted(context, ex) && + !context.Response.HasStarted && + context.Response.StatusCode < StatusCodes.Status500InternalServerError) + { context.Response.StatusCode = StatusCodes.Status500InternalServerError; + } throw; } @@ -59,7 +71,7 @@ public async Task InvokeAsync(HttpContext context) try { var webmeasurementContext = new 
WebEnrichmentContext(measuredOperation, context); - UpdateOperationWithResponseStatus(context, measuredOperation, unhandledException is not null); + UpdateOperationWithResponseStatus(context, measuredOperation, unhandledException); foreach (var enrichment in _enrichments) { @@ -96,19 +108,40 @@ private static void SetCustomerResourceIdFromAttribute(HttpContext context, Endp measuredOperation.CustomerResourceId = customerResourceId; } - private static void UpdateOperationWithResponseStatus(HttpContext context, MeasuredOperation measuredOperation, bool unhandledException = false) + private void UpdateOperationWithResponseStatus(HttpContext context, MeasuredOperation measuredOperation, Exception? unhandledException) { var statusCode = context.Response.StatusCode; - measuredOperation.AddAttribute("http.response.status.code", statusCode); - var activityCode = unhandledException ? ActivityStatusCode.Error : statusCode switch - { - >= StatusCodes.Status500InternalServerError => ActivityStatusCode.Error, - >= StatusCodes.Status200OK and < StatusCodes.Status300MultipleChoices => ActivityStatusCode.Ok, - _ => ActivityStatusCode.Unset, - }; - measuredOperation.SetActivityStatusCode(activityCode); + measuredOperation.Attributes.Add(new KeyValuePair("http.request.method", context.Request.Method)); + measuredOperation.Attributes.Add(new KeyValuePair("http.response.status.code", statusCode)); + + var outcome = context.RequestAborted.IsCancellationRequested + ? SliOutcome.Ignored + : unhandledException is not null + ? SliOutcome.Failure + : _httpOptions.ClassifyOutcome?.Invoke(context) ?? 
ClassifyStatusCode(statusCode); + + measuredOperation.SetOutcome(outcome); } + private static SliOutcome ClassifyStatusCode(int statusCode) => statusCode switch + { + >= StatusCodes.Status200OK and < StatusCodes.Status400BadRequest => SliOutcome.Success, + StatusCodes.Status400BadRequest + or StatusCodes.Status401Unauthorized + or StatusCodes.Status403Forbidden + or StatusCodes.Status404NotFound + or StatusCodes.Status409Conflict + or StatusCodes.Status412PreconditionFailed + or StatusCodes.Status422UnprocessableEntity => SliOutcome.ClientError, + StatusCodes.Status429TooManyRequests => SliOutcome.Failure, + >= StatusCodes.Status500InternalServerError => SliOutcome.Failure, + _ => SliOutcome.Ignored, + }; + + private static bool IsRequestAborted(HttpContext context, Exception? exception) => + context.RequestAborted.IsCancellationRequested && + exception is OperationCanceledException; + private bool ShouldEmitMetrics(EndpointMetadataCollection metadata) => _serviceLevelIndicator.ServiceLevelIndicatorOptions.AutomaticallyEmitted || GetSliAttribute(metadata) is not null; @@ -133,10 +166,10 @@ private string GetOperation(HttpContext context, EndpointMetadataCollection meta if (string.IsNullOrEmpty(template)) { LogMissingRouteTemplate(context.GetEndpoint()?.DisplayName ?? "(no endpoint)"); - return context.Request.Method + " "; + return context.Request.Method.ToUpperInvariant() + " "; } - return context.Request.Method + " " + template; + return context.Request.Method.ToUpperInvariant() + " " + template; } private static string? 
GetCustomerResourceIdAttributes(HttpContext context, EndpointMetadataCollection metadata) @@ -182,4 +215,4 @@ private void AddSliFeatureToHttpContext(HttpContext context, MeasuredOperation m private static void RemoveSliFeatureFromHttpContext(HttpContext context) => context.Features.Set(null); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorServiceCollectionExtensions.cs b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorServiceCollectionExtensions.cs index 239aac2..7b4e7a6 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorServiceCollectionExtensions.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/ServiceLevelIndicatorServiceCollectionExtensions.cs @@ -1,5 +1,6 @@ namespace Trellis.ServiceLevelIndicators; +using Microsoft.AspNetCore.Http; using Microsoft.AspNetCore.Mvc; using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.DependencyInjection.Extensions; @@ -19,7 +20,15 @@ public static IServiceLevelIndicatorBuilder AddMvc(this IServiceLevelIndicatorBu public static IServiceLevelIndicatorBuilder AddHttpMethod(this IServiceLevelIndicatorBuilder builder) { ArgumentNullException.ThrowIfNull(builder); - builder.Services.TryAddEnumerable(ServiceDescriptor.Singleton, HttpMethodEnrichment>()); + return builder; + } + + public static IServiceLevelIndicatorBuilder ClassifyHttpOutcome(this IServiceLevelIndicatorBuilder builder, Func classifier) + { + ArgumentNullException.ThrowIfNull(builder); + ArgumentNullException.ThrowIfNull(classifier); + + builder.Services.Configure(options => options.ClassifyOutcome = classifier); return builder; } diff --git a/Trellis.ServiceLevelIndicators.Asp/src/WebEnrichmentContext.cs b/Trellis.ServiceLevelIndicators.Asp/src/WebEnrichmentContext.cs index 1d53221..9b8ef16 100644 --- a/Trellis.ServiceLevelIndicators.Asp/src/WebEnrichmentContext.cs +++ b/Trellis.ServiceLevelIndicators.Asp/src/WebEnrichmentContext.cs @@ -21,4 +21,4 @@ 
public WebEnrichmentContext(MeasuredOperation operation, HttpContext httpContext public void AddAttribute(string name, object? value) => _operation.AddAttribute(name, value); public void SetCustomerResourceId(string id) => _operation.CustomerResourceId = id; -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/IServiceCollectionExtensions.cs b/Trellis.ServiceLevelIndicators.Asp/tests/IServiceCollectionExtensions.cs index 2508c09..1e8d8d7 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/IServiceCollectionExtensions.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/IServiceCollectionExtensions.cs @@ -11,4 +11,4 @@ public static IServiceLevelIndicatorBuilder AddTestEnrichment(this IServiceLevel builder.Services.AddSingleton>(new TestEnrichment(key, value)); return builder; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/MultipleCustomerResourceIdController.cs b/Trellis.ServiceLevelIndicators.Asp/tests/MultipleCustomerResourceIdController.cs index 021ef2c..047cee0 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/MultipleCustomerResourceIdController.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/MultipleCustomerResourceIdController.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators.Asp.Tests; +namespace Trellis.ServiceLevelIndicators.Asp.Tests; using System.Reflection; using Microsoft.AspNetCore.Mvc; diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/ProblemDetailsInteropTests.cs b/Trellis.ServiceLevelIndicators.Asp/tests/ProblemDetailsInteropTests.cs index d3b46a6..0bf5439 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/ProblemDetailsInteropTests.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/ProblemDetailsInteropTests.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators.Asp.Tests; +namespace Trellis.ServiceLevelIndicators.Asp.Tests; using System.Diagnostics.Metrics; using System.Net; diff --git 
a/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorAspTests.cs b/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorAspTests.cs index a1dd845..2573c4b 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorAspTests.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorAspTests.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators.Asp.Tests; +namespace Trellis.ServiceLevelIndicators.Asp.Tests; using System; using System.Diagnostics.Metrics; @@ -51,7 +51,7 @@ public async Task SLI_Metrics_is_emitted_for_successful_API_call() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -71,7 +71,7 @@ public async Task SLI_Metrics_is_emitted_for_successful_POST_API_call() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "POST Test"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -91,13 +91,58 @@ public async Task SLI_Metrics_is_emitted_for_failed_API_call() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/bad_request"), - new("activity.status.code", "Unset"), + new("Outcome", "ClientError"), new("http.response.status.code", 400), }; ValidateMetrics(expectedTags); } + [Theory] + [InlineData("test/redirect", HttpStatusCode.Found, "GET Test/redirect", "Success", 302)] + [InlineData("test/unprocessable", HttpStatusCode.UnprocessableEntity, "GET Test/unprocessable", "ClientError", 422)] + [InlineData("test/too_many_requests", (HttpStatusCode)429, "GET Test/too_many_requests", "Failure", 429)] + public async Task SLI_Metrics_classifies_http_status_codes(string route, HttpStatusCode expectedStatus, string 
operation, string outcome, int statusCode) + { + using var host = await TestHostBuilder.CreateHostWithSli(_meter); + + var response = await host.GetTestClient().GetAsync(route, TestContext.Current.CancellationToken); + response.StatusCode.Should().Be(expectedStatus); + + var expectedTags = new KeyValuePair[] + { + new("CustomerResourceId", "TestCustomerResourceId"), + new("LocationId", "ms-loc://az/public/West US 3"), + new("Operation", operation), + new("Outcome", outcome), + new("http.response.status.code", statusCode), + }; + + ValidateMetrics(expectedTags); + } + + [Fact] + public async Task SLI_Metrics_uses_configured_http_outcome_classifier() + { + using var host = await TestHostBuilder.CreateHostWithSli( + _meter, + context => context.Response.StatusCode == 429 ? SliOutcome.ClientError : SliOutcome.Failure); + + var response = await host.GetTestClient().GetAsync("test/too_many_requests", TestContext.Current.CancellationToken); + response.StatusCode.Should().Be((HttpStatusCode)429); + + var expectedTags = new KeyValuePair[] + { + new("CustomerResourceId", "TestCustomerResourceId"), + new("LocationId", "ms-loc://az/public/West US 3"), + new("Operation", "GET Test/too_many_requests"), + new("Outcome", "ClientError"), + new("http.response.status.code", 429), + }; + + ValidateMetrics(expectedTags); + } + [Fact] public async Task SLI_Metrics_is_emitted_with_enriched_data() { @@ -114,7 +159,7 @@ public async Task SLI_Metrics_is_emitted_with_enriched_data() new("CustomerResourceId", "xavier@somewhere.com"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.request.method", "GET"), new("http.response.status.code", 200), new("foo", "bar"), @@ -138,7 +183,7 @@ public async Task Override_Operation_name() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "TestOperation"), - 
new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -158,7 +203,7 @@ public async Task Override_CustomerResourceId() new("CustomerResourceId", "myId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/customer_resourceid/{id}"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -178,7 +223,7 @@ public async Task CustomAttribute_is_added_to_SLI_dimension() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/custom_attribute/{value}"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), new("CustomAttribute", "Mickey"), }; @@ -210,7 +255,7 @@ public async Task When_automatically_emit_SLI_is_Off_X2C_send_SLI_using_attribut new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/send_sli"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -258,7 +303,7 @@ public async Task TryGetMeasuredOperation_will_return_true_if_route_emits_SLI() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/try_get_measured_operation/{value}"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), new("CustomAttribute", "Goofy"), }; @@ -281,7 +326,7 @@ public async Task SLI_Measure_is_emitted() new("age", "25"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/name/{first}/{surname}/{age}"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -374,7 +419,7 @@ public async Task SLI_Metrics_is_emitted_for_server_error() new("CustomerResourceId", "TestCustomerResourceId"), 
new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/server_error"), - new("activity.status.code", "Error"), + new("Outcome", "Failure"), new("http.response.status.code", 500), }; @@ -396,13 +441,34 @@ await act.Should().ThrowAsync() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET Test/throw"), - new("activity.status.code", "Error"), + new("Outcome", "Failure"), new("http.response.status.code", 500), }; ValidateMetrics(expectedTags); } + [Fact] + public async Task SLI_Metrics_is_emitted_as_ignored_for_request_aborted_cancellation() + { + using var host = await TestHostBuilder.CreateHostWithSli(_meter); + + Func act = () => host.GetTestClient().GetAsync("test/request_aborted", TestContext.Current.CancellationToken); + + await act.Should().ThrowAsync(); + + var expectedTags = new KeyValuePair[] + { + new("CustomerResourceId", "TestCustomerResourceId"), + new("LocationId", "ms-loc://az/public/West US 3"), + new("Operation", "GET Test/request_aborted"), + new("Outcome", "Ignored"), + new("http.response.status.code", 200), + }; + + ValidateMetrics(expectedTags); + } + protected virtual void Dispose(bool disposing) { if (!_disposedValue) @@ -434,11 +500,25 @@ private void OnMeasurementRecorded(Instrument instrument, long measurement, Read private void ValidateMetrics(KeyValuePair[] expectedTags) { + expectedTags = AddDefaultHttpMethod(expectedTags); + _callbackCalled.Should().BeTrue(); _instrument!.Name.Should().Be("operation.duration"); _instrument.Unit.Should().Be("ms"); _measurement.Should().BeInRange(TestHostBuilder.MillisecondsDelay - 10, TestHostBuilder.MillisecondsDelay + 400); + _actualTags.Should().NotContain(tag => tag.Key == "activity.status.code"); _actualTags.Should().BeEquivalentTo(expectedTags); } -} \ No newline at end of file + private static KeyValuePair[] AddDefaultHttpMethod(KeyValuePair[] expectedTags) + { + if (expectedTags.Any(tag => tag.Key == 
"http.request.method")) + return expectedTags; + + var operation = expectedTags.First(tag => tag.Key == "Operation").Value?.ToString(); + var method = operation?.StartsWith("POST ", StringComparison.Ordinal) == true ? "POST" : "GET"; + + return [.. expectedTags, new KeyValuePair("http.request.method", method)]; + } + +} diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorMinimalApiTests.cs b/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorMinimalApiTests.cs index 8c7aca8..9809e27 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorMinimalApiTests.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/ServiceLevelIndicatorMinimalApiTests.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators.Asp.Tests; +namespace Trellis.ServiceLevelIndicators.Asp.Tests; using System; using System.Diagnostics.Metrics; @@ -55,7 +55,7 @@ public async Task SLI_Metrics_is_emitted_for_minimal_api_get() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET /hello"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -79,7 +79,7 @@ public async Task SLI_Metrics_is_emitted_with_custom_operation_name() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "CustomOp"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -103,7 +103,7 @@ public async Task SLI_Metrics_is_emitted_with_customer_resource_id_from_route() new("CustomerResourceId", "myResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET /resource/{id}"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -128,7 +128,7 @@ public async Task SLI_Metrics_is_emitted_with_measure_attribute() new("CustomerResourceId", 
"TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET /measured/items/{name}"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -191,7 +191,7 @@ public async Task SLI_Metrics_is_automatically_emitted_for_minimal_api_when_auto new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET /auto-sli"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), }; @@ -215,7 +215,7 @@ public async Task SLI_Metrics_is_emitted_with_enrichment_for_minimal_api() new("CustomerResourceId", "TestCustomerResourceId"), new("LocationId", "ms-loc://az/public/West US 3"), new("Operation", "GET /hello"), - new("activity.status.code", "Ok"), + new("Outcome", "Success"), new("http.response.status.code", 200), new("http.request.method", "GET"), }; @@ -336,13 +336,24 @@ private void OnMeasurementRecorded(Instrument instrument, long measurement, Read private void ValidateMetrics(KeyValuePair[] expectedTags) { + expectedTags = AddDefaultHttpMethod(expectedTags); + _callbackCalled.Should().BeTrue(); _instrument!.Name.Should().Be("operation.duration"); _instrument.Unit.Should().Be("ms"); _measurement.Should().BeInRange(MillisecondsDelay - 10, MillisecondsDelay + 400); + _actualTags.Should().NotContain(tag => tag.Key == "activity.status.code"); _actualTags.Should().BeEquivalentTo(expectedTags); } + private static KeyValuePair[] AddDefaultHttpMethod(KeyValuePair[] expectedTags) + { + if (expectedTags.Any(tag => tag.Key == "http.request.method")) + return expectedTags; + + return [.. 
expectedTags, new KeyValuePair("http.request.method", "GET")]; + } + protected virtual void Dispose(bool disposing) { if (!_disposedValue) diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/TestController.cs b/Trellis.ServiceLevelIndicators.Asp/tests/TestController.cs index ea5db0e..0ae2a54 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/TestController.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/TestController.cs @@ -16,6 +16,15 @@ public class TestController : ControllerBase [HttpGet("bad_request")] public IActionResult Bad() => BadRequest("Sad World!"); + [HttpGet("unprocessable")] + public IActionResult Unprocessable() => UnprocessableEntity("Invalid World!"); + + [HttpGet("too_many_requests")] + public IActionResult TooManyRequestsResult() => StatusCode(429, "Busy World!"); + + [HttpGet("redirect")] + public IActionResult RedirectResult() => StatusCode(302); + [HttpGet("server_error")] public IActionResult ServerError() => StatusCode(500, "Server Error!"); @@ -26,6 +35,13 @@ public IActionResult Throw() throw new InvalidOperationException("Boom"); } + [HttpGet("request_aborted")] + public IActionResult RequestAborted() + { + HttpContext.Abort(); + throw new OperationCanceledException(); + } + [HttpGet("operation")] [ServiceLevelIndicator(Operation = "TestOperation")] public IActionResult GetOperation() => Ok("Hello World!"); @@ -58,4 +74,4 @@ public IActionResult TryGetMeasuredOperation(string value) [HttpGet("name/{first}/{surname}/{age}")] public IActionResult GetCustomerResourceId([Measure] string first, [CustomerResourceId] string surname, [Measure] int age) => Ok(first + " " + surname + " " + age); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/TestEnrichment.cs b/Trellis.ServiceLevelIndicators.Asp/tests/TestEnrichment.cs index 707293e..1a3b9b6 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/TestEnrichment.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/TestEnrichment.cs @@ -13,4 +13,4 @@ 
public ValueTask EnrichAsync(WebEnrichmentContext context, CancellationToken can context.AddAttribute(_key, _value); return ValueTask.CompletedTask; } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators.Asp/tests/TestHostBuilder.cs b/Trellis.ServiceLevelIndicators.Asp/tests/TestHostBuilder.cs index fdbf753..22465fb 100644 --- a/Trellis.ServiceLevelIndicators.Asp/tests/TestHostBuilder.cs +++ b/Trellis.ServiceLevelIndicators.Asp/tests/TestHostBuilder.cs @@ -4,6 +4,7 @@ using System.Threading.Tasks; using Microsoft.AspNetCore.Builder; using Microsoft.AspNetCore.Hosting; +using Microsoft.AspNetCore.Http; using Microsoft.AspNetCore.TestHost; using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.Hosting; @@ -36,6 +37,32 @@ public static async Task CreateHostWithSli(Meter meter) => .UseEndpoints(endpoints => endpoints.MapControllers()))) .StartAsync(); + public static async Task CreateHostWithSli(Meter meter, Func classifier) => + await new HostBuilder() + .ConfigureWebHost(webBuilder => webBuilder + .UseTestServer() + .ConfigureServices(services => + { + services.AddControllers(); + services.AddServiceLevelIndicator(options => + { + options.Meter = meter; + options.CustomerResourceId = "TestCustomerResourceId"; + options.LocationId = ServiceLevelIndicator.CreateLocationId("public", "West US 3"); + }) + .AddMvc() + .ClassifyHttpOutcome(classifier); + }) + .Configure(app => app.UseRouting() + .UseServiceLevelIndicator() + .Use(async (context, next) => + { + await Task.Delay(MillisecondsDelay); + await next(context); + }) + .UseEndpoints(endpoints => endpoints.MapControllers()))) + .StartAsync(); + public static async Task CreateHostWithSliEnriched(Meter meter) => await new HostBuilder() .ConfigureWebHost(webBuilder => webBuilder @@ -110,4 +137,4 @@ public static async Task CreateHostWithSliEnriched(Meter meter) => }) .UseEndpoints(endpoints => endpoints.MapControllers()))) .StartAsync(); -} \ No newline at end of file +} diff 
--git a/Trellis.ServiceLevelIndicators/src/IEnrichment.cs b/Trellis.ServiceLevelIndicators/src/IEnrichment.cs index 1784a78..0ac467a 100644 --- a/Trellis.ServiceLevelIndicators/src/IEnrichment.cs +++ b/Trellis.ServiceLevelIndicators/src/IEnrichment.cs @@ -16,4 +16,4 @@ public interface IEnrichment /// A cancellation token. /// A representing the asynchronous operation. ValueTask EnrichAsync(T context, CancellationToken cancellationToken); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators/src/IEnrichmentContext.cs b/Trellis.ServiceLevelIndicators/src/IEnrichmentContext.cs index 55cfbad..8fa7993 100644 --- a/Trellis.ServiceLevelIndicators/src/IEnrichmentContext.cs +++ b/Trellis.ServiceLevelIndicators/src/IEnrichmentContext.cs @@ -22,4 +22,4 @@ public interface IEnrichmentContext /// The attribute name. /// The attribute value. void AddAttribute(string name, object? value); -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators/src/IServiceLevelIndicatorBuilder.cs b/Trellis.ServiceLevelIndicators/src/IServiceLevelIndicatorBuilder.cs index aacd1b9..6381390 100644 --- a/Trellis.ServiceLevelIndicators/src/IServiceLevelIndicatorBuilder.cs +++ b/Trellis.ServiceLevelIndicators/src/IServiceLevelIndicatorBuilder.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators; +namespace Trellis.ServiceLevelIndicators; using Microsoft.Extensions.DependencyInjection; diff --git a/Trellis.ServiceLevelIndicators/src/MeasuredOperation.cs b/Trellis.ServiceLevelIndicators/src/MeasuredOperation.cs index b701b5c..1a92e04 100644 --- a/Trellis.ServiceLevelIndicators/src/MeasuredOperation.cs +++ b/Trellis.ServiceLevelIndicators/src/MeasuredOperation.cs @@ -14,7 +14,8 @@ public class MeasuredOperation : IDisposable private readonly ServiceLevelIndicator _serviceLevelIndicator; private readonly Stopwatch _stopWatch; private readonly HashSet _attributeNames; - private ActivityStatusCode _activityStatusCode = ActivityStatusCode.Unset; + private 
SliOutcome _outcome = SliOutcome.Ignored; + private bool _outcomeExplicitlySet; private readonly object _disposeLock = new(); public MeasuredOperation(ServiceLevelIndicator serviceLevelIndicator, string operation, params KeyValuePair[] attributes) : @@ -23,6 +24,8 @@ public MeasuredOperation(ServiceLevelIndicator serviceLevelIndicator, string ope public MeasuredOperation(ServiceLevelIndicator serviceLevelIndicator, string operation, string customerResourceId, params KeyValuePair[] attributes) { + ArgumentException.ThrowIfNullOrWhiteSpace(operation); + _serviceLevelIndicator = serviceLevelIndicator; Operation = operation; CustomerResourceId = customerResourceId; @@ -30,7 +33,7 @@ public MeasuredOperation(ServiceLevelIndicator serviceLevelIndicator, string ope _attributeNames = new HashSet(attributes.Length, StringComparer.Ordinal); for (var i = 0; i < attributes.Length; i++) { - _serviceLevelIndicator.ValidateAttributeName(attributes[i].Key); + ServiceLevelIndicator.ValidateAttributeName(attributes[i].Key); ValidateUniqueAttributeName(attributes[i].Key); Attributes.Add(attributes[i]); } @@ -54,10 +57,35 @@ public MeasuredOperation(ServiceLevelIndicator serviceLevelIndicator, string ope public List> Attributes { get; } /// - /// Sets the recorded with the measurement. + /// Sets the outcome recorded with the measurement. + /// + /// The SLI outcome. + public void SetOutcome(SliOutcome outcome) + { + _outcome = outcome; + _outcomeExplicitlySet = true; + } + + internal void SetInferredOutcome(SliOutcome outcome) + { + if (!_outcomeExplicitlySet) + _outcome = outcome; + } + + internal void ForceOutcome(SliOutcome outcome) => _outcome = outcome; + + /// + /// Sets the outcome based on an . /// /// The activity status code. 
- public void SetActivityStatusCode(ActivityStatusCode activityStatusCode) => _activityStatusCode = activityStatusCode; + [Obsolete("Use SetOutcome(SliOutcome) instead.")] + public void SetActivityStatusCode(ActivityStatusCode activityStatusCode) => + SetOutcome(activityStatusCode switch + { + ActivityStatusCode.Ok => SliOutcome.Success, + ActivityStatusCode.Error => SliOutcome.Failure, + _ => SliOutcome.Ignored + }); /// /// Adds a custom attribute to the measurement. @@ -66,7 +94,7 @@ public MeasuredOperation(ServiceLevelIndicator serviceLevelIndicator, string ope /// The attribute value. public void AddAttribute(string attribute, object? value) { - _serviceLevelIndicator.ValidateAttributeName(attribute); + ServiceLevelIndicator.ValidateAttributeName(attribute); ValidateUniqueAttributeName(attribute); Attributes.Add(new KeyValuePair(attribute, value)); } @@ -91,8 +119,7 @@ protected virtual void Dispose(bool disposing) { _stopWatch.Stop(); var elapsedTime = _stopWatch.ElapsedMilliseconds; - Attributes.Add(new KeyValuePair(_serviceLevelIndicator.ServiceLevelIndicatorOptions.ActivityStatusCodeAttributeName, _activityStatusCode.ToString())); - _serviceLevelIndicator.RecordMeasurement(Operation, CustomerResourceId, elapsedTime, Attributes.ToArray()); + _serviceLevelIndicator.RecordMeasurement(Operation, CustomerResourceId, elapsedTime, _outcome, Attributes.ToArray()); } _disposed = true; diff --git a/Trellis.ServiceLevelIndicators/src/README.md b/Trellis.ServiceLevelIndicators/src/README.md index c9d46f1..e21c8cc 100644 --- a/Trellis.ServiceLevelIndicators/src/README.md +++ b/Trellis.ServiceLevelIndicators/src/README.md @@ -66,11 +66,11 @@ builder.Services.AddServiceLevelIndicator(options => Wrap any block of code in a `using StartMeasuring` scope: ```csharp -async Task DoWork(ServiceLevelIndicator sli) +void DoWork(ServiceLevelIndicator sli) { using var op = sli.StartMeasuring("ProcessOrder"); // Do work... 
- op.SetActivityStatusCode(ActivityStatusCode.Ok); + op.SetOutcome(SliOutcome.Success); } ``` @@ -81,22 +81,26 @@ var attribute = new KeyValuePair("Event", "OrderCreated"); using var op = sli.StartMeasuring("ProcessOrder", attribute); ``` -Custom attribute names must be unique and must not reuse SLI-reserved tags such as `CustomerResourceId`, `LocationId`, `Operation`, or `activity.status.code`. +Custom attribute names must be unique and must not reuse SLI-reserved tags such as `CustomerResourceId`, `LocationId`, `Operation`, `Outcome`, `activity.status.code`, `http.request.method`, or `http.response.status.code`. ## Emitted Metrics -A meter named `Trellis.SLI` with instrument `operation.duration` (milliseconds) is emitted with the following attributes: +By default, a meter named `Trellis.SLI` emits the `operation.duration` histogram in milliseconds. If you configure `ServiceLevelIndicatorOptions.Meter`, metrics are emitted from that meter instead. + +`StartMeasuring(...)` emits the following attributes when the returned `MeasuredOperation` is disposed: | Attribute | Description | |-----------|-------------| | `Operation` | The name of the measured operation | | `CustomerResourceId` | A **stable** identifier for the entity the operation is acting on (tenant, subscription, account, work item, etc.). NOT the caller, NOT a per-request ID, NOT a user/email. | | `LocationId` | Where the service is running (e.g. `ms-loc://az/public/westus3`) | -| `activity.status.code` | `Ok`, `Error`, or `Unset` based on the operation outcome | +| `Outcome` | `Success`, `Failure`, `ClientError`, or `Ignored` | + +Direct `Record(...)` calls emit `CustomerResourceId`, `LocationId`, `Operation`, `Outcome`, and any custom attributes supplied to the call. Without an explicit outcome, manual/background measurements default to `Ignored`. ## Cardinality Guidance -All three required tags — `Operation`, `LocationId`, and `CustomerResourceId` — must be **low-cardinality and bounded**. 
Good values: tenant, subscription, environment, region, product tier, work-item type. Bad values: per-request GUIDs, user IDs / emails, timestamps, free-form user input. The same rule applies to any custom attributes you add via `MeasuredOperation.AddAttribute(...)`. +Required tags must be stable and meaningful. Good values: tenant, subscription, environment, region, product tier, work-item type. Bad values: per-request GUIDs, timestamps, free-form user input, or raw emails when a stable object ID is available. The same rule applies to any custom attributes you add via `MeasuredOperation.AddAttribute(...)`. ## Disposal @@ -107,7 +111,7 @@ All three required tags — `Operation`, `LocationId`, and `CustomerResourceId` | Type / Method | Description | |---------------|-------------| | `ServiceLevelIndicator.StartMeasuring(operation, attributes)` | Start a measured operation scope | -| `MeasuredOperation.SetActivityStatusCode(code)` | Set the outcome status | +| `MeasuredOperation.SetOutcome(outcome)` | Set the SLI outcome | | `MeasuredOperation.AddAttribute(name, value)` | Add a custom metric attribute | | `MeasuredOperation.CustomerResourceId` | Get/set the customer resource ID | | `ServiceLevelIndicator.CreateLocationId(cloud, region?, zone?)` | Helper to build a location ID string | diff --git a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicator.cs b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicator.cs index dc6e09a..6710339 100644 --- a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicator.cs +++ b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicator.cs @@ -4,6 +4,7 @@ using System.Diagnostics; using System.Diagnostics.Metrics; using System.Reflection; +using System.Threading.Tasks; using Microsoft.Extensions.Options; /// @@ -23,12 +24,18 @@ public sealed class ServiceLevelIndicator : IDisposable /// public const string DefaultMeterName = "Trellis.SLI"; + /// + /// Default value used when no customer resource identifier is available. 
+ /// + public const string UnknownCustomerResourceId = "Unknown"; + /// /// Gets the options used to configure this instance. /// public ServiceLevelIndicatorOptions ServiceLevelIndicatorOptions { get; } private readonly Histogram _responseLatencyHistogram; + private readonly Counter _unknownCustomerResourceIdCounter; private readonly Meter _meter; private readonly bool _ownsMeter; private bool _disposed; @@ -39,7 +46,6 @@ public ServiceLevelIndicator(IOptions options) ArgumentException.ThrowIfNullOrWhiteSpace(ServiceLevelIndicatorOptions.LocationId, nameof(ServiceLevelIndicatorOptions.LocationId)); ArgumentException.ThrowIfNullOrWhiteSpace(ServiceLevelIndicatorOptions.DurationInstrumentName, nameof(ServiceLevelIndicatorOptions.DurationInstrumentName)); - ValidateActivityStatusCodeAttributeName(); if (ServiceLevelIndicatorOptions.Meter == null) { @@ -56,6 +62,7 @@ public ServiceLevelIndicator(IOptions options) } _responseLatencyHistogram = _meter.CreateHistogram(ServiceLevelIndicatorOptions.DurationInstrumentName, "ms", "Duration of the operation."); + _unknownCustomerResourceIdCounter = _meter.CreateCounter("sli.diagnostics.unknown_customer_resource_id", description: "Count of SLI measurements emitted with an unknown customer resource identifier."); } /// @@ -79,6 +86,16 @@ public void Dispose() public void Record(string operation, long elapsedTime, params KeyValuePair[] attributes) => Record(operation, ServiceLevelIndicatorOptions.CustomerResourceId, elapsedTime, attributes); + /// + /// Records an operation measurement using the default . + /// + /// The operation name. + /// Elapsed time in milliseconds. + /// The SLI outcome. + /// Additional measurement attributes. 
+ public void Record(string operation, long elapsedTime, SliOutcome outcome, params KeyValuePair[] attributes) => + Record(operation, ServiceLevelIndicatorOptions.CustomerResourceId, elapsedTime, outcome, attributes); + /// /// Records an operation measurement with an explicit customer resource identifier. /// @@ -87,21 +104,38 @@ public void Record(string operation, long elapsedTime, params KeyValuePairElapsed time in milliseconds. /// Additional measurement attributes. public void Record(string operation, string customerResourceId, long elapsedTime, params KeyValuePair[] attributes) + => Record(operation, customerResourceId, elapsedTime, SliOutcome.Ignored, attributes); + + /// + /// Records an operation measurement with an explicit customer resource identifier. + /// + /// The operation name. + /// The customer resource identifier. + /// Elapsed time in milliseconds. + /// The SLI outcome. + /// Additional measurement attributes. + public void Record(string operation, string customerResourceId, long elapsedTime, SliOutcome outcome, params KeyValuePair[] attributes) { ValidateAttributes(attributes); ValidateDuplicateArgumentAttributeNames(attributes); - RecordMeasurement(operation, customerResourceId, elapsedTime, attributes); + RecordMeasurement(operation, customerResourceId, elapsedTime, outcome, attributes); } - internal void RecordMeasurement(string operation, string customerResourceId, long elapsedTime, params KeyValuePair[] attributes) + internal void RecordMeasurement(string operation, string customerResourceId, long elapsedTime, SliOutcome outcome, params KeyValuePair[] attributes) { + ArgumentException.ThrowIfNullOrWhiteSpace(operation, nameof(operation)); ValidateRecordAttributeNames(attributes); + customerResourceId = NormalizeCustomerResourceId(customerResourceId); + RecordUnknownCustomerResourceId(operation, customerResourceId); + Activity.Current?.SetStatus(GetActivityStatusCode(outcome)); + var tagList = new TagList { { "CustomerResourceId", 
customerResourceId }, { "LocationId", ServiceLevelIndicatorOptions.LocationId }, - { "Operation", operation } + { "Operation", operation }, + { "Outcome", ToWireValue(outcome) } }; for (var i = 0; i < attributes.Length; i++) @@ -118,33 +152,127 @@ internal void RecordMeasurement(string operation, string customerResourceId, lon /// A that records the metric on disposal. public MeasuredOperation StartMeasuring(string operation, params KeyValuePair[] attributes) => new(this, operation, attributes); - internal void ValidateAttributeName(string attribute) + /// + /// Measures a synchronous operation and infers the SLI outcome from completion or exception. + /// + public void Measure(string operation, Action action, params KeyValuePair[] attributes) { - ArgumentException.ThrowIfNullOrWhiteSpace(attribute, nameof(attribute)); + ArgumentNullException.ThrowIfNull(action); - if (attribute is "CustomerResourceId" or "LocationId" or "Operation" || - attribute == ServiceLevelIndicatorOptions.ActivityStatusCodeAttributeName) + using var measuredOperation = StartMeasuring(operation, attributes); + try { - throw new ArgumentException( - $"'{attribute}' is a reserved Service Level Indicator attribute name and cannot be used as a custom metric attribute.", - nameof(attribute)); + action(); + measuredOperation.SetInferredOutcome(SliOutcome.Success); + } + catch (OperationCanceledException) + { + measuredOperation.ForceOutcome(SliOutcome.Ignored); + throw; + } + catch + { + measuredOperation.ForceOutcome(SliOutcome.Failure); + throw; } } - private void ValidateActivityStatusCodeAttributeName() + /// + /// Measures a synchronous operation and returns its result. 
+ /// + public T Measure(string operation, Func action, params KeyValuePair[] attributes) { - var attribute = ServiceLevelIndicatorOptions.ActivityStatusCodeAttributeName; - ArgumentException.ThrowIfNullOrWhiteSpace(attribute, nameof(ServiceLevelIndicatorOptions.ActivityStatusCodeAttributeName)); + ArgumentNullException.ThrowIfNull(action); - if (attribute is "CustomerResourceId" or "LocationId" or "Operation") + using var measuredOperation = StartMeasuring(operation, attributes); + try + { + var result = action(); + measuredOperation.SetInferredOutcome(SliOutcome.Success); + return result; + } + catch (OperationCanceledException) + { + measuredOperation.ForceOutcome(SliOutcome.Ignored); + throw; + } + catch + { + measuredOperation.ForceOutcome(SliOutcome.Failure); + throw; + } + } + + /// + /// Measures an asynchronous operation and infers the SLI outcome from completion or exception. + /// + public async Task MeasureAsync(string operation, Func action, params KeyValuePair[] attributes) + { + ArgumentNullException.ThrowIfNull(action); + + using var measuredOperation = StartMeasuring(operation, attributes); + try + { + await action().ConfigureAwait(false); + measuredOperation.SetInferredOutcome(SliOutcome.Success); + } + catch (OperationCanceledException) + { + measuredOperation.ForceOutcome(SliOutcome.Ignored); + throw; + } + catch + { + measuredOperation.ForceOutcome(SliOutcome.Failure); + throw; + } + } + + /// + /// Measures an asynchronous operation and returns its result. 
+ /// + public async Task MeasureAsync(string operation, Func> action, params KeyValuePair[] attributes) + { + ArgumentNullException.ThrowIfNull(action); + + using var measuredOperation = StartMeasuring(operation, attributes); + try + { + var result = await action().ConfigureAwait(false); + measuredOperation.SetInferredOutcome(SliOutcome.Success); + return result; + } + catch (OperationCanceledException) + { + measuredOperation.ForceOutcome(SliOutcome.Ignored); + throw; + } + catch + { + measuredOperation.ForceOutcome(SliOutcome.Failure); + throw; + } + } + + internal static void ValidateAttributeName(string attribute) + { + ArgumentException.ThrowIfNullOrWhiteSpace(attribute, nameof(attribute)); + + if (attribute is "CustomerResourceId" + or "LocationId" + or "Operation" + or "Outcome" + or "activity.status.code" + or "http.request.method" + or "http.response.status.code") { throw new ArgumentException( - $"'{attribute}' is a reserved Service Level Indicator attribute name and cannot be used as the activity status code attribute name.", - nameof(ServiceLevelIndicatorOptions.ActivityStatusCodeAttributeName)); + $"'{attribute}' is a reserved Service Level Indicator attribute name and cannot be used as a custom metric attribute.", + nameof(attribute)); } } - private void ValidateAttributes(ReadOnlySpan> attributes) + private static void ValidateAttributes(ReadOnlySpan> attributes) { for (var i = 0; i < attributes.Length; i++) ValidateAttributeName(attributes[i].Key); @@ -184,7 +312,8 @@ private static void ValidateRecordAttributeNames(ReadOnlySpan !string.IsNullOrEmpty(s))); return id; } + + internal static string ToWireValue(SliOutcome outcome) => outcome switch + { + SliOutcome.Success => "Success", + SliOutcome.Failure => "Failure", + SliOutcome.ClientError => "ClientError", + SliOutcome.Ignored => "Ignored", + _ => throw new ArgumentOutOfRangeException(nameof(outcome), outcome, "Unknown SLI outcome.") + }; + + private static ActivityStatusCode 
GetActivityStatusCode(SliOutcome outcome) => outcome switch + { + SliOutcome.Success => ActivityStatusCode.Ok, + SliOutcome.Failure => ActivityStatusCode.Error, + SliOutcome.ClientError or SliOutcome.Ignored => ActivityStatusCode.Unset, + _ => ActivityStatusCode.Unset + }; + + private static string NormalizeCustomerResourceId(string customerResourceId) => + string.IsNullOrWhiteSpace(customerResourceId) ? UnknownCustomerResourceId : customerResourceId; + + private void RecordUnknownCustomerResourceId(string operation, string customerResourceId) + { + if (!string.Equals(customerResourceId, UnknownCustomerResourceId, StringComparison.Ordinal)) + return; + + _unknownCustomerResourceIdCounter.Add( + 1, + new KeyValuePair("Operation", operation), + new KeyValuePair("LocationId", ServiceLevelIndicatorOptions.LocationId)); + } } diff --git a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorMeterProviderBuilderExtensions.cs b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorMeterProviderBuilderExtensions.cs index 150063f..423bc2c 100644 --- a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorMeterProviderBuilderExtensions.cs +++ b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorMeterProviderBuilderExtensions.cs @@ -42,4 +42,4 @@ public static MeterProviderBuilder AddServiceLevelIndicatorInstrumentation(this ArgumentNullException.ThrowIfNull(meter); return builder.AddServiceLevelIndicatorInstrumentation(meter.Name); } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorOptions.cs b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorOptions.cs index a0166d6..5371b27 100644 --- a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorOptions.cs +++ b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorOptions.cs @@ -4,7 +4,7 @@ /// /// Options for configuring the Service Level Indicator. -/// DefaultCustomerResourceId & LocationId are mandatory properties. +/// LocationId is required. 
CustomerResourceId defaults to <see cref="ServiceLevelIndicator.UnknownCustomerResourceId"/> when not configured. /// public class ServiceLevelIndicatorOptions { @@ -18,7 +18,7 @@ public class ServiceLevelIndicatorOptions /// CustomerResourceId is the unique identifier for the customer like subscriptionId, tenantId, etc. /// CustomerResourceId can be set for the entire service here or in each API method. /// - public string CustomerResourceId { get; set; } = "Unset"; + public string CustomerResourceId { get; set; } = ServiceLevelIndicator.UnknownCustomerResourceId; /// /// Location where the service is running. @@ -31,9 +31,9 @@ public class ServiceLevelIndicatorOptions public string DurationInstrumentName { get; set; } = "operation.duration"; /// - /// Activity Status Code attribute name. - /// [ActivityStatusCode](https://learn.microsoft.com/en-us/dotnet/api/system.diagnostics.activitystatuscode?view=net-7.0) + /// Obsolete. Activity status is no longer emitted as a metric dimension. /// + [Obsolete("Activity status is no longer emitted as a metric dimension. Use SliOutcome instead.")] public string ActivityStatusCodeAttributeName { get; set; } = "activity.status.code"; /// @@ -41,4 +41,4 @@ public class ServiceLevelIndicatorOptions /// If false, use the ServiceLevelIndicator Attribute to emit.
/// public bool AutomaticallyEmitted { get; set; } = true; -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorServiceCollectionExtensions.cs b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorServiceCollectionExtensions.cs index c5e3876..5ecaad8 100644 --- a/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorServiceCollectionExtensions.cs +++ b/Trellis.ServiceLevelIndicators/src/ServiceLevelIndicatorServiceCollectionExtensions.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators; +namespace Trellis.ServiceLevelIndicators; using Microsoft.Extensions.DependencyInjection; diff --git a/Trellis.ServiceLevelIndicators/src/SliOutcome.cs b/Trellis.ServiceLevelIndicators/src/SliOutcome.cs new file mode 100644 index 0000000..9666053 --- /dev/null +++ b/Trellis.ServiceLevelIndicators/src/SliOutcome.cs @@ -0,0 +1,27 @@ +namespace Trellis.ServiceLevelIndicators; + +/// +/// SLI outcome emitted with each operation measurement. +/// +public enum SliOutcome +{ + /// + /// The operation completed successfully and counts toward the success numerator and denominator. + /// + Success, + + /// + /// The operation failed and counts toward the success-rate denominator. + /// + Failure, + + /// + /// The operation failed because of caller input or another expected client condition. + /// + ClientError, + + /// + /// The operation should be excluded from the default success-rate denominator. + /// + Ignored +} diff --git a/Trellis.ServiceLevelIndicators/tests/CustomerResourceIdTests.cs b/Trellis.ServiceLevelIndicators/tests/CustomerResourceIdTests.cs index 6c28ada..b0fb72f 100644 --- a/Trellis.ServiceLevelIndicators/tests/CustomerResourceIdTests.cs +++ b/Trellis.ServiceLevelIndicators/tests/CustomerResourceIdTests.cs @@ -31,4 +31,4 @@ public void Cannot_create_CustomerResourceId_with_default_GUID() action.Should().Throw() .WithMessage("Value cannot be null. 
(Parameter 'serviceId')"); } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators/tests/LocationIdTests.cs b/Trellis.ServiceLevelIndicators/tests/LocationIdTests.cs index 4cde305..039226d 100644 --- a/Trellis.ServiceLevelIndicators/tests/LocationIdTests.cs +++ b/Trellis.ServiceLevelIndicators/tests/LocationIdTests.cs @@ -37,4 +37,4 @@ public void Will_create_LocationId_with_cloud_region_zone() // Assert actual.Should().Be("ms-loc://az/Public/eastus2/1"); } -} \ No newline at end of file +} diff --git a/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorMeterProviderBuilderExtensionsTests.cs b/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorMeterProviderBuilderExtensionsTests.cs index 5ad37de..9d895fc 100644 --- a/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorMeterProviderBuilderExtensionsTests.cs +++ b/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorMeterProviderBuilderExtensionsTests.cs @@ -1,4 +1,4 @@ -namespace Trellis.ServiceLevelIndicators.Tests; +namespace Trellis.ServiceLevelIndicators.Tests; using System; using System.Diagnostics.Metrics; diff --git a/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorTests.cs b/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorTests.cs index 295ae4c..1e7e423 100644 --- a/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorTests.cs +++ b/Trellis.ServiceLevelIndicators/tests/ServiceLevelIndicatorTests.cs @@ -71,6 +71,7 @@ public void Record() new("CustomerResourceId", customerResourceId), new("LocationId", locationId), new("Operation", operation), + new("Outcome", "Ignored"), new("Attribute1", "Value1"), new("Attribute2", "Value2") ]; @@ -103,7 +104,7 @@ public async Task Will_measure_code_block() new("CustomerResourceId", customerResourceId), new("LocationId", locationId), new("Operation", "SleepWorker"), - new("activity.status.code", nameof(System.Diagnostics.ActivityStatusCode.Ok)), + new("Outcome", "Success"), ]; 
ValidateMetrics(sleepTime, approx: 100); @@ -112,10 +113,96 @@ async Task MeasureCodeBlock(ServiceLevelIndicator serviceLevelIndicator) { using var measuredOperation = serviceLevelIndicator.StartMeasuring("SleepWorker"); await Task.Delay(sleepTime); - measuredOperation.SetActivityStatusCode(System.Diagnostics.ActivityStatusCode.Ok); + measuredOperation.SetOutcome(SliOutcome.Success); } } + [Fact] + public void Measure_sets_success_outcome_when_action_completes() + { + // Arrange + var serviceLevelIndicator = CreateServiceLevelIndicator(); + + // Act + serviceLevelIndicator.Measure("MeasuredAction", () => { }); + + // Assert + _expectedTags = + [ + new("CustomerResourceId", "TestResourceId"), + new("LocationId", "TestLocationId"), + new("Operation", "MeasuredAction"), + new("Outcome", "Success"), + ]; + + ValidateMetrics(0, approx: 100); + } + + [Fact] + public void Measure_sets_failure_outcome_and_rethrows_when_action_throws() + { + // Arrange + var serviceLevelIndicator = CreateServiceLevelIndicator(); + + // Act + Action act = () => serviceLevelIndicator.Measure("MeasuredAction", () => throw new InvalidOperationException("Boom")); + + // Assert + act.Should().Throw().WithMessage("Boom"); + _expectedTags = + [ + new("CustomerResourceId", "TestResourceId"), + new("LocationId", "TestLocationId"), + new("Operation", "MeasuredAction"), + new("Outcome", "Failure"), + ]; + + ValidateMetrics(0, approx: 100); + } + + [Fact] + public void Measure_sets_ignored_outcome_and_rethrows_for_cancellation() + { + // Arrange + var serviceLevelIndicator = CreateServiceLevelIndicator(); + + // Act + Action act = () => serviceLevelIndicator.Measure("MeasuredAction", () => throw new OperationCanceledException()); + + // Assert + act.Should().Throw(); + _expectedTags = + [ + new("CustomerResourceId", "TestResourceId"), + new("LocationId", "TestLocationId"), + new("Operation", "MeasuredAction"), + new("Outcome", "Ignored"), + ]; + + ValidateMetrics(0, approx: 100); + } + + [Fact] + public 
async Task MeasureAsync_sets_success_outcome_when_action_completes() + { + // Arrange + var serviceLevelIndicator = CreateServiceLevelIndicator(); + + // Act + await serviceLevelIndicator.MeasureAsync("MeasuredAsyncAction", () => Task.CompletedTask); + + // Assert + _expectedTags = + [ + new("CustomerResourceId", "TestResourceId"), + new("LocationId", "TestLocationId"), + new("Operation", "MeasuredAsyncAction"), + new("Outcome", "Success"), + ]; + + ValidateMetrics(0, approx: 100); + } + [Fact] public void Uses_default_meter_when_none_provided() { @@ -160,12 +247,105 @@ public void Record_with_no_attributes() [ new("CustomerResourceId", customerResourceId), new("LocationId", locationId), - new("Operation", operation) + new("Operation", operation), + new("Outcome", "Ignored") ]; ValidateMetrics(elapsedTime); } + [Theory] + [InlineData(SliOutcome.Success, "Success")] + [InlineData(SliOutcome.Failure, "Failure")] + [InlineData(SliOutcome.ClientError, "ClientError")] + [InlineData(SliOutcome.Ignored, "Ignored")] + public void Record_emits_explicit_outcome_wire_value(SliOutcome outcome, string wireValue) + { + // Arrange + var serviceLevelIndicator = CreateServiceLevelIndicator(); + + // Act + serviceLevelIndicator.Record("TestOperation", 25, outcome); + + // Assert + _expectedTags = + [ + new("CustomerResourceId", "TestResourceId"), + new("LocationId", "TestLocationId"), + new("Operation", "TestOperation"), + new("Outcome", wireValue) + ]; + + ValidateMetrics(25); + } + + [Fact] + public void Record_uses_unknown_customer_resource_id_when_default_is_not_configured() + { + // Arrange + var serviceLevelIndicator = new ServiceLevelIndicator(Options.Create(new ServiceLevelIndicatorOptions + { + LocationId = "TestLocationId", + Meter = _meter + })); + + // Act + serviceLevelIndicator.Record("TestOperation", elapsedTime: 10); + + // Assert + _expectedTags = + [ + new("CustomerResourceId", "Unknown"), + new("LocationId", "TestLocationId"), + new("Operation", "TestOperation"), 
+ new("Outcome", "Ignored") + ]; + + ValidateMetrics(10); + } + + [Fact] + public void Unknown_customer_resource_id_emits_diagnostic_counter_on_configured_meter() + { + // Arrange + var serviceLevelIndicator = new ServiceLevelIndicator(Options.Create(new ServiceLevelIndicatorOptions + { + LocationId = "TestLocationId", + Meter = _meter + })); + + Instrument? counterInstrument = null; + long counterMeasurement = 0; + KeyValuePair[] counterTags = []; + + _meterListener.SetMeasurementEventCallback((instrument, measurement, tags, state) => + { + if (instrument.Name == "sli.diagnostics.unknown_customer_resource_id") + { + counterInstrument = instrument; + counterMeasurement = measurement; + counterTags = tags.ToArray(); + } + else + { + OnMeasurementRecorded(instrument, measurement, tags, state); + } + }); + + // Act + serviceLevelIndicator.Record("TestOperation", elapsedTime: 10); + + // Assert + counterInstrument.Should().NotBeNull(); + counterInstrument!.Meter.Should().BeSameAs(_meter); + counterMeasurement.Should().Be(1); + counterTags.Should().BeEquivalentTo( + [ + new KeyValuePair("Operation", "TestOperation"), + new KeyValuePair("LocationId", "TestLocationId") + ]); + } + [Fact] public void Record_with_customerResourceId_override() { @@ -193,7 +373,8 @@ public void Record_with_customerResourceId_override() [ new("CustomerResourceId", overrideCustomerResourceId), new("LocationId", locationId), - new("Operation", operation) + new("Operation", operation), + new("Outcome", "Ignored") ]; ValidateMetrics(elapsedTime); @@ -224,7 +405,7 @@ public async Task MeasuredOperation_double_dispose_does_not_record_twice() // Act var measuredOperation = serviceLevelIndicator.StartMeasuring("DoubleDispose"); await Task.Delay(50, TestContext.Current.CancellationToken); - measuredOperation.SetActivityStatusCode(System.Diagnostics.ActivityStatusCode.Ok); + measuredOperation.SetOutcome(SliOutcome.Success); measuredOperation.Dispose(); measuredOperation.Dispose(); // Second dispose 
should be a no-op @@ -260,7 +441,8 @@ public void Customize_instrument_name() [ new("CustomerResourceId", customerResourceId), new("LocationId", locationId), - new("Operation", operation) + new("Operation", operation), + new("Outcome", "Ignored") ]; ValidateMetrics(elapsedTime, InstrumentName); @@ -380,7 +562,10 @@ public void AddServiceLevelIndicator_registers_core_services_without_asp_package [InlineData("CustomerResourceId")] [InlineData("LocationId")] [InlineData("Operation")] + [InlineData("Outcome")] [InlineData("activity.status.code")] + [InlineData("http.request.method")] + [InlineData("http.response.status.code")] public void Record_rejects_reserved_custom_attribute_names(string reservedName) { // Arrange @@ -419,7 +604,10 @@ public void Record_rejects_duplicate_custom_attribute_names_as_argument_error() [InlineData("CustomerResourceId")] [InlineData("LocationId")] [InlineData("Operation")] + [InlineData("Outcome")] [InlineData("activity.status.code")] + [InlineData("http.request.method")] + [InlineData("http.response.status.code")] public void StartMeasuring_rejects_reserved_initial_attribute_names(string reservedName) { // Arrange @@ -439,7 +627,10 @@ public void StartMeasuring_rejects_reserved_initial_attribute_names(string reser [InlineData("CustomerResourceId")] [InlineData("LocationId")] [InlineData("Operation")] + [InlineData("Outcome")] [InlineData("activity.status.code")] + [InlineData("http.request.method")] + [InlineData("http.response.status.code")] public void MeasuredOperation_AddAttribute_rejects_reserved_attribute_names(string reservedName) { // Arrange @@ -503,26 +694,17 @@ public void MeasuredOperation_AddAttribute_rejects_duplicate_attribute_names() } [Fact] - public void Constructor_rejects_activity_status_attribute_name_that_collides_with_core_tags() + public void StartMeasuring_rejects_blank_operation() { // Arrange - var options = new ServiceLevelIndicatorOptions - { - CustomerResourceId = "TestResourceId", - LocationId = 
"TestLocationId", - Meter = _meter, - ActivityStatusCodeAttributeName = "Operation" - }; + var serviceLevelIndicator = CreateServiceLevelIndicator(); // Act - Action act = () => - { - using var serviceLevelIndicator = new ServiceLevelIndicator(Options.Create(options)); - }; + Action act = () => serviceLevelIndicator.StartMeasuring(" "); // Assert act.Should().Throw() - .WithMessage("*reserved Service Level Indicator attribute name*"); + .Where(ex => ex.ParamName == "operation"); } [Fact] diff --git a/docs/api_reference/trellis-api-sli-apiversioning.md b/docs/api_reference/trellis-api-sli-apiversioning.md index d289583..0fef47b 100644 --- a/docs/api_reference/trellis-api-sli-apiversioning.md +++ b/docs/api_reference/trellis-api-sli-apiversioning.md @@ -12,7 +12,7 @@ See also: [`trellis-api-sli.md`](trellis-api-sli.md), [`trellis-api-sli-asp.md`] | Tag | Value | |---|---| -| `http.api.version` | The single requested API version (e.g. `2023-06-06`), `"Neutral"` if the endpoint is API-version-neutral, `"Unspecified"` if no version was requested, or `""` if multiple versions were requested. | +| `http.api.version` | The single resolved API version (e.g. `2023-06-06`), `"Neutral"` if the endpoint is API-version-neutral, `"Unspecified"` if no version was requested and no default was assumed, or `""` if the request is invalid or ambiguous. | --- diff --git a/docs/api_reference/trellis-api-sli-asp.md b/docs/api_reference/trellis-api-sli-asp.md index de5e133..cc00a84 100644 --- a/docs/api_reference/trellis-api-sli-asp.md +++ b/docs/api_reference/trellis-api-sli-asp.md @@ -12,14 +12,14 @@ See also: [`trellis-api-sli.md`](trellis-api-sli.md). 
## Auto-emitted dimensions (in addition to the core ones) -When the middleware is registered, every measured request emits these tags on top of the core `Operation` / `CustomerResourceId` / `LocationId` / `activity.status.code`: +When the middleware is registered, every measured request emits these tags on top of the core `CustomerResourceId` / `LocationId` / `Operation` / `Outcome` dimensions: | Tag | Source | |---|---| -| `Operation` | Route template (e.g. `GET Weatherforecast`) — derived from `ControllerActionDescriptor.AttributeRouteInfo.Template` for MVC, or from the route pattern for Minimal APIs. Overridable via `[ServiceLevelIndicator(operation)]` or `AddServiceLevelIndicator("operation")`. | -| `activity.status.code` | `Ok` for HTTP 2xx, `Error` for HTTP 5xx (or unhandled exception), `Unset` otherwise. | +| `Operation` | HTTP method plus route template (e.g. `GET WeatherForecast` or `GET /teams/{teamId}`) — derived from `ControllerActionDescriptor.AttributeRouteInfo.Template` for MVC, or from the route pattern for Minimal APIs. Overridable via `[ServiceLevelIndicator(operation)]` or `AddServiceLevelIndicator("operation")`. | +| `Outcome` | `Success` for 2xx/3xx, `ClientError` for common caller errors (400/401/403/404/409/412/422), `Failure` for 429/5xx and unhandled exceptions, and `Ignored` for request-aborted cancellations. | | `http.response.status.code` | `HttpContext.Response.StatusCode`. | -| `http.request.method` | Added when `AddHttpMethod()` is called on the SLI builder. | +| `http.request.method` | `HttpContext.Request.Method`; emitted by default. 
| --- @@ -43,19 +43,19 @@ Builder returned by `AddServiceLevelIndicator(...)` to chain MVC integration, HT --- -### IServiceCollectionExtensions +### ServiceLevelIndicatorCoreServiceCollectionExtensions **Declaration** ```csharp -public static class IServiceCollectionExtensions +public static class ServiceLevelIndicatorCoreServiceCollectionExtensions ``` **Methods** | Signature | Returns | Description | |---|---|---| -| `public static IServiceLevelIndicatorBuilder AddServiceLevelIndicator(this IServiceCollection services, Action configureOptions)` | `IServiceLevelIndicatorBuilder` | Registers the `ServiceLevelIndicator` singleton, configures `ServiceLevelIndicatorOptions`, and returns a builder for additional setup. | +| `public static IServiceLevelIndicatorBuilder AddServiceLevelIndicator(this IServiceCollection services, Action configureOptions)` | `IServiceLevelIndicatorBuilder` | Registers the `ServiceLevelIndicator` singleton, configures `ServiceLevelIndicatorOptions`, and returns a builder for additional setup. This extension is defined by the core package and used by the ASP.NET Core integration. | --- @@ -74,7 +74,8 @@ Extensions on `IServiceLevelIndicatorBuilder` for opting into MVC support and en | Signature | Returns | Description | |---|---|---| | `public static IServiceLevelIndicatorBuilder AddMvc(this IServiceLevelIndicatorBuilder builder)` | `IServiceLevelIndicatorBuilder` | Registers the MVC convention so that `[CustomerResourceId]` and `[Measure]` parameter attributes contribute endpoint metadata. **Required** for any MVC controller that uses these attributes. | -| `public static IServiceLevelIndicatorBuilder AddHttpMethod(this IServiceLevelIndicatorBuilder builder)` | `IServiceLevelIndicatorBuilder` | Adds the built-in enrichment that emits `http.request.method`. 
| +| `public static IServiceLevelIndicatorBuilder AddHttpMethod(this IServiceLevelIndicatorBuilder builder)` | `IServiceLevelIndicatorBuilder` | No-op compatibility method; `http.request.method` is emitted by default. | +| `public static IServiceLevelIndicatorBuilder ClassifyHttpOutcome(this IServiceLevelIndicatorBuilder builder, Func classifier)` | `IServiceLevelIndicatorBuilder` | Configures a global HTTP outcome classifier. The returned `SliOutcome` overrides the default status-code mapping for completed requests. | | `public static IServiceLevelIndicatorBuilder Enrich(this IServiceLevelIndicatorBuilder builder, Action action)` | `IServiceLevelIndicatorBuilder` | Registers a synchronous enrichment delegate. | | `public static IServiceLevelIndicatorBuilder EnrichAsync(this IServiceLevelIndicatorBuilder builder, Func func)` | `IServiceLevelIndicatorBuilder` | Registers an asynchronous enrichment delegate. | @@ -92,7 +93,7 @@ public static class ServiceLevelIndicatorApplicationBuilderExtensions | Signature | Returns | Description | |---|---|---| -| `public static IApplicationBuilder UseServiceLevelIndicator(this IApplicationBuilder app)` | `IApplicationBuilder` | Adds the `ServiceLevelIndicatorMiddleware` to the request pipeline. Place after routing so the endpoint is already resolved. | +| `public static IApplicationBuilder UseServiceLevelIndicator(this IApplicationBuilder app)` | `IApplicationBuilder` | Adds the `ServiceLevelIndicatorMiddleware` to the request pipeline. Place after routing and before endpoint execution so the endpoint is already resolved and request handling is measured. | --- @@ -280,7 +281,7 @@ The `ServiceLevelIndicatorMiddleware` (registered by `UseServiceLevelIndicator() 4. Starts a `MeasuredOperation` and attaches an `IServiceLevelIndicatorFeature` to `HttpContext.Features`. 5. Optionally overrides the customer id from a `CustomerResourceIdMetadata`-tagged route value. 6. Invokes the next middleware. 
On unhandled exceptions, sets status to 500 (when not started) and rethrows. -7. In `finally`, sets `activity.status.code` from the response status (`Ok` for 2xx, `Error` for 5xx, `Unset` otherwise) and runs all registered `IEnrichment` enrichments. Enrichment exceptions are caught and logged. +7. In `finally`, sets `Outcome`, `http.request.method`, and `http.response.status.code`, then runs all registered `IEnrichment` enrichments. Enrichment exceptions are caught and logged. 8. Disposes the `MeasuredOperation` (which records the metric) and removes the feature. Throws `InvalidOperationException` if a second instance of the middleware tries to attach an SLI feature to the same request. @@ -301,8 +302,7 @@ builder.Services.AddServiceLevelIndicator(o => // Automatic emission is enabled by default. Set AutomaticallyEmitted = false // to opt in endpoint-by-endpoint with AddServiceLevelIndicator(). }) -.AddMvc() -.AddHttpMethod(); +.AddMvc(); var app = builder.Build(); app.UseRouting(); diff --git a/docs/api_reference/trellis-api-sli.md b/docs/api_reference/trellis-api-sli.md index 7fbdc68..f0fd659 100644 --- a/docs/api_reference/trellis-api-sli.md +++ b/docs/api_reference/trellis-api-sli.md @@ -10,16 +10,16 @@ See also: [`trellis-api-sli-asp.md`](trellis-api-sli-asp.md), [`trellis-api-sli- ## Default emitted dimensions -Every measurement emits the following tags on instrument `operation.duration` (ms, `Histogram`): +`StartMeasuring(...)` scopes emit the following tags on instrument `operation.duration` (ms, `Histogram`) when the returned `MeasuredOperation` is disposed: | Tag | Source | |---|---| | `CustomerResourceId` | `ServiceLevelIndicatorOptions.CustomerResourceId` (or per-call override) | | `LocationId` | `ServiceLevelIndicatorOptions.LocationId` | | `Operation` | Caller-supplied operation name | -| `activity.status.code` | Set on `MeasuredOperation` (`Unset` / `Ok` / `Error`) | +| `Outcome` | `Success`, `Failure`, `ClientError`, or `Ignored` | -Additional 
attributes can be appended via `MeasuredOperation.AddAttribute(...)` or the `attributes` parameter of `Record`/`StartMeasuring`. Custom attributes must not reuse `CustomerResourceId`, `LocationId`, `Operation`, the configured activity-status tag name, or any other attribute name already present on the measurement. +Additional attributes can be appended via `MeasuredOperation.AddAttribute(...)` or the `attributes` parameter of `Record`/`StartMeasuring`. Custom attributes must not reuse `CustomerResourceId`, `LocationId`, `Operation`, `Outcome`, `activity.status.code`, `http.request.method`, `http.response.status.code`, or any other attribute name already present on the measurement. Direct `Record(...)` calls emit `CustomerResourceId`, `LocationId`, `Operation`, `Outcome`, and any supplied custom attributes. --- @@ -59,9 +59,15 @@ Singleton service that creates and records SLI metrics using an OpenTelemetry `H | Signature | Returns | Description | |---|---|---| -| `public void Record(string operation, long elapsedTime, params KeyValuePair[] attributes)` | `void` | Records a measurement using the configured default `CustomerResourceId`. | -| `public void Record(string operation, string customerResourceId, long elapsedTime, params KeyValuePair[] attributes)` | `void` | Records a measurement with an explicit `CustomerResourceId`. | +| `public void Record(string operation, long elapsedTime, params KeyValuePair[] attributes)` | `void` | Records a measurement using the configured default `CustomerResourceId` and default outcome `Ignored`. | +| `public void Record(string operation, long elapsedTime, SliOutcome outcome, params KeyValuePair[] attributes)` | `void` | Records a measurement using the configured default `CustomerResourceId` and explicit outcome. 
| +| `public void Record(string operation, string customerResourceId, long elapsedTime, params KeyValuePair[] attributes)` | `void` | Records a measurement with an explicit `CustomerResourceId` and default outcome `Ignored`. | +| `public void Record(string operation, string customerResourceId, long elapsedTime, SliOutcome outcome, params KeyValuePair[] attributes)` | `void` | Records a measurement with an explicit `CustomerResourceId` and outcome. | | `public MeasuredOperation StartMeasuring(string operation, params KeyValuePair[] attributes)` | `MeasuredOperation` | Starts a stopwatch-backed measurement; dispose the returned object to record the elapsed time as a metric. | +| `public void Measure(string operation, Action action, params KeyValuePair[] attributes)` | `void` | Measures a synchronous operation and infers `Success`, `Failure`, or `Ignored` for `OperationCanceledException`. | +| `public T Measure(string operation, Func action, params KeyValuePair[] attributes)` | `T` | Measures a synchronous operation and returns its result. | +| `public Task MeasureAsync(string operation, Func action, params KeyValuePair[] attributes)` | `Task` | Measures an asynchronous operation and infers outcome before rethrowing exceptions. | +| `public Task MeasureAsync(string operation, Func> action, params KeyValuePair[] attributes)` | `Task` | Measures an asynchronous operation and returns its result. | | `public void Dispose()` | `void` | Disposes the internally-created `Meter` if this instance created it; never disposes a user-supplied meter. Idempotent. Normally invoked by the DI container at host shutdown. | | `public static string CreateCustomerResourceId(Guid serviceId)` | `string` | Builds a `ServiceTreeId://` customer resource id. Throws `ArgumentNullException` if `serviceId` is `Guid.Empty`. | | `public static string CreateLocationId(string cloud, string? region = null, string? zone = null)` | `string` | Builds an `ms-loc://az///` location id, omitting empty segments. 
| @@ -83,10 +89,10 @@ Bound via `IOptions`. `LocationId` and `DurationIn | Name | Type | Default | Description | |---|---|---|---| | `Meter` | `Meter` | `null` (auto-created) | The meter used to create the duration histogram. Set during startup; read once when the `ServiceLevelIndicator` singleton is constructed. | -| `CustomerResourceId` | `string` | `"Unset"` | Default `CustomerResourceId` tag value (per-tenant / per-subscription identifier). Can be overridden on each call. | +| `CustomerResourceId` | `string` | `"Unknown"` | Default `CustomerResourceId` tag value. Can be overridden on each call. When emitted as `Unknown`, the diagnostic counter `sli.diagnostics.unknown_customer_resource_id` is incremented. | | `LocationId` | `string` | `""` | **Required.** Where the service is running (e.g. `ms-loc://az/public/westus3`). Must be non-empty. | | `DurationInstrumentName` | `string` | `"operation.duration"` | **Required.** Histogram instrument name. Must be non-empty. | -| `ActivityStatusCodeAttributeName` | `string` | `"activity.status.code"` | Tag name used to emit the operation's `ActivityStatusCode`. Must be non-empty and cannot be `CustomerResourceId`, `LocationId`, or `Operation`. | +| `ActivityStatusCodeAttributeName` | `string` | `"activity.status.code"` | Obsolete compatibility option. Activity status is no longer emitted as a metric dimension. | | `AutomaticallyEmitted` | `bool` | `true` | When `false`, only operations explicitly opted-in (e.g. via the `[ServiceLevelIndicator]` attribute in the ASP package) emit metrics. | --- @@ -113,7 +119,7 @@ Registers `ServiceLevelIndicator` as a singleton and configures `ServiceLevelInd public class MeasuredOperation : IDisposable ``` -Represents an in-flight measurement. The stopwatch starts in the constructor; disposing records the elapsed milliseconds plus the `activity.status.code` tag. +Represents an in-flight measurement. 
The stopwatch starts in the constructor; disposing records the elapsed milliseconds plus the `Outcome` tag. **Properties** @@ -134,7 +140,8 @@ Represents an in-flight measurement. The stopwatch starts in the constructor; di | Signature | Returns | Description | |---|---|---| -| `public void SetActivityStatusCode(ActivityStatusCode code)` | `void` | Sets the `ActivityStatusCode` recorded with the measurement. Default is `Unset`. | +| `public void SetOutcome(SliOutcome outcome)` | `void` | Sets the SLI outcome recorded with the measurement. Default is `Ignored`. | +| `public void SetActivityStatusCode(ActivityStatusCode code)` | `void` | Obsolete compatibility shim that maps `Ok` to `Success`, `Error` to `Failure`, and other values to `Ignored`. | | `public void AddAttribute(string attribute, object? value)` | `void` | Appends a custom attribute to be emitted with the measurement. Throws if the name collides with a reserved SLI tag. | | `public void Dispose()` | `void` | Stops the stopwatch and records the metric. Idempotent. | | `protected virtual void Dispose(bool disposing)` | `void` | Standard dispose pattern hook. | @@ -227,11 +234,11 @@ async Task DoWorkAsync(ServiceLevelIndicator sli, CancellationToken ct) try { await ProcessAsync(ct); - op.SetActivityStatusCode(ActivityStatusCode.Ok); + op.SetOutcome(SliOutcome.Success); } catch { - op.SetActivityStatusCode(ActivityStatusCode.Error); + op.SetOutcome(SliOutcome.Failure); throw; } } diff --git a/docs/design/sli-metric-contract.md b/docs/design/sli-metric-contract.md new file mode 100644 index 0000000..eac6923 --- /dev/null +++ b/docs/design/sli-metric-contract.md @@ -0,0 +1,386 @@ +# RFC: Trellis ServiceLevelIndicators metric contract + +## Status + +Implemented in the pre-1.0 metric contract finalization. + +## Context + +`Trellis.ServiceLevelIndicators` is part of the Trellis AI-first .NET framework. 
Its role is to provide structural observability guardrails for generated and hand-written services by emitting service-level latency metrics with stable, meaningful dimensions. + +The library is currently in the .NET 10 preview/alpha phase. The changes in this RFC finalize the intended pre-1.0 metric contract; no migration guide is required for previous alpha behavior. + +## Standards and platform alignment + +The design aligns with: + +| Source | Usage | +|---|---| +| Google SRE | SLI/SLO semantics, valid requests, excluded traffic, error budgets | +| OpenTelemetry | .NET metric emission, histograms, meter registration, optional HTTP/API dimensions | +| OpenSLO | Documentation examples for portable SLO definitions; no runtime support initially | +| Microsoft Azure Monitor / Application Insights | KQL examples and Azure operational guidance | +| Trellis backend contract | Required dimensions and SLI-focused naming | + +## Metric + +The primary metric remains: + +| Metric | Type | Unit | Required dimensions | +|---|---|---|---| +| `operation.duration` | Histogram | `ms` | `CustomerResourceId`, `LocationId`, `Operation`, `Outcome` | + +`operation.count` is not added initially. Success-rate and availability queries should use the histogram count produced by `operation.duration`. A separate count metric may be reconsidered only if backend constraints prove histogram count is insufficient. + +`operation.duration` is a custom Trellis SLI metric, not an OpenTelemetry semantic-convention metric. The unit is intentionally milliseconds (`ms`), even though OpenTelemetry HTTP duration semantic conventions use seconds. 
+ +## Required dimensions + +These dimensions are mandatory and emitted exactly: + +- `CustomerResourceId` +- `LocationId` +- `Operation` +- `Outcome` + +Dimension sources: + +| Dimension | Source / fallback | +|---|---| +| `CustomerResourceId` | Explicit/default resource identifier; fallback `Unknown` | +| `LocationId` | Required startup configuration; fail fast when missing or empty | +| `Operation` | Caller-supplied operation; ASP.NET uses explicit override, route template, then ` ` | +| `Outcome` | Explicit outcome, helper inference, middleware inference, or default `Ignored` | + +Manual/background operations throw `ArgumentException` when `Operation` is null, empty, or whitespace. ASP.NET unrouted fallback uses uppercase HTTP method formatting, for example `GET `. + +`LocationId` is startup-only. It cannot be overridden per operation or request. + +Optional HTTP/API dimensions use OpenTelemetry-style names, including: + +- `http.request.method` +- `http.response.status.code` +- `http.api.version` + +The required PascalCase dimensions intentionally deviate from OpenTelemetry attribute naming conventions because they are part of the Trellis backend contract. + +## Outcome model + +C# code should expose: + +```csharp +public enum SliOutcome +{ + Success, + Failure, + ClientError, + Ignored +} +``` + +Metrics emit the dimension: + +```text +Outcome = "Success" | "Failure" | "ClientError" | "Ignored" +``` + +Emit outcome values through explicit string mapping. Do not rely on enum `ToString()` for the wire value. +Wire values are case-sensitive. 
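As an illustration of the explicit mapping described above (a sketch, not code from this change), the wire value can come from an exhaustive `switch` rather than `ToString()`. The `SliOutcomeWire` class and `ToWireValue` method are hypothetical names used only for this example; the enum mirrors the one shown in the outcome model.

```csharp
using System;

// Mirrors the enum from the outcome model above.
public enum SliOutcome
{
    Success,
    Failure,
    ClientError,
    Ignored
}

// Hypothetical helper for illustration; the library's real mapping code is not shown in this diff.
public static class SliOutcomeWire
{
    // Maps each enum member to its case-sensitive wire value explicitly,
    // so renaming an enum member cannot silently change the emitted string.
    public static string ToWireValue(SliOutcome outcome) => outcome switch
    {
        SliOutcome.Success => "Success",
        SliOutcome.Failure => "Failure",
        SliOutcome.ClientError => "ClientError",
        SliOutcome.Ignored => "Ignored",
        // Undefined numeric values (for example a bad cast) are rejected instead of emitted.
        _ => throw new ArgumentOutOfRangeException(nameof(outcome), outcome, "Unknown SLI outcome."),
    };
}
```

Rejecting undefined values is one possible choice; it keeps unexpected strings out of the `Outcome` dimension.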
+ +SLO semantics: + +| Outcome | Meaning | SLO usage | +|---|---|---| +| `Success` | Operation completed successfully | Numerator and denominator | +| `Failure` | Service failed to satisfy the operation | Counted as a valid event; not counted as good | +| `ClientError` | Client/request/input problem, reported separately | Excluded from default success-rate denominator | +| `Ignored` | Not part of SLI measurement | Excluded | + +Default success-rate formula: + +```text +success_rate = count(Outcome == "Success") / (count(Outcome == "Success") + count(Outcome == "Failure")) +``` + +`ClientError` and `Ignored` are excluded from the default denominator. Services may report `ClientError` separately or opt into policy-specific SLO formulas. + +Outcome precedence: + +1. Explicit user-set outcome. +2. Helper inference. +3. Middleware inference. +4. Default `Ignored`. + +Unhandled exceptions escaping the measured operation are an explicit exception to precedence and force `Failure`. + +## ASP.NET classification defaults + +| HTTP status/result | Outcome | +|---|---| +| 2xx | `Success` | +| 3xx | `Success` | +| 400, 401, 403, 404, 409, 412, 422 | `ClientError` | +| 429 | `Failure` by default; configurable | +| 5xx | `Failure` | +| Client disconnect / request-aborted cancellation | `Ignored` | +| Unhandled exception | `Failure` | + +Application/business cancellations remain configurable. + +429 reduces success rate by default because throttling is treated as a service capacity/backpressure signal. Services that treat 429 as client-driven may classify it as `ClientError`. + +3xx responses are `Success` by default because redirects and cache-validation responses can be successful service behavior. Redirect loops should be monitored through status-code queries or classifier overrides. + +For exceptions, record the final `HttpContext.Response.StatusCode`. If an unhandled exception escapes and no status was set, emit `http.response.status.code = 500`. 
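The default status-code table above can be sketched as a pure function (an illustration, not the middleware's actual code). `DefaultHttpOutcomeSketch` and `Classify` are hypothetical names. The table does not say how unlisted statuses such as 1xx or 405 are treated, so the fallback to `Ignored` below is an assumption made only to keep the example total. Request-aborted cancellations and unhandled exceptions are handled outside status-code mapping and are not modeled here.

```csharp
// Mirrors the enum from the outcome model above.
public enum SliOutcome
{
    Success,
    Failure,
    ClientError,
    Ignored
}

// Hypothetical sketch of the default status-code mapping.
public static class DefaultHttpOutcomeSketch
{
    public static SliOutcome Classify(int statusCode) => statusCode switch
    {
        // 429 is a Failure by default (throttling as a capacity signal); the RFC makes this configurable.
        429 => SliOutcome.Failure,
        // Client/request problems listed in the table.
        400 or 401 or 403 or 404 or 409 or 412 or 422 => SliOutcome.ClientError,
        // 2xx and 3xx count as Success.
        >= 200 and <= 399 => SliOutcome.Success,
        // 5xx counts as Failure.
        >= 500 and <= 599 => SliOutcome.Failure,
        // ASSUMPTION: statuses the table does not list fall back to Ignored in this sketch.
        _ => SliOutcome.Ignored,
    };
}
```

A service that treats 429 as client-driven would return `ClientError` for that arm instead.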
+ +ASP.NET emits these optional dimensions by default: + +- `http.request.method` +- `http.response.status.code` + +API version remains opt-in through `.AddApiVersion()` and emits: + +- `http.api.version` + +## Manual measurement + +Manual/background operations default to `Outcome = Ignored` unless explicitly set or inferred by helper APIs. + +Core APIs should include outcome-oriented methods such as: + +```csharp +measuredOperation.SetOutcome(SliOutcome.Success); +``` + +Core `Measure(...)` and `MeasureAsync(...)` helpers should infer: + +- `Success` on normal completion. +- `Failure` on exception. +- `Ignored` on `OperationCanceledException`. + +Core helpers are not ASP.NET-specific and do not detect client disconnects. ASP.NET middleware maps `HttpContext.RequestAborted` / request-aborted cancellation to `Ignored`; application/business cancellations remain configurable. + +Result-aware helpers for Trellis `Result` belong in the optional `Trellis.ServiceLevelIndicators.Results` package, not in the core package. + +## Activity correlation + +`activity.status.code` is removed from default metric dimensions. + +The library should continue updating `Activity.Current` status for trace correlation. Activity status is not the source of SLI outcome. + +Activity status mapping: + +| Outcome | Activity status | +|---|---| +| `Success` | `Ok` | +| `Failure` | `Error` | +| `ClientError` | `Unset` | +| `Ignored` | `Unset` | + +## Unknown customer resource + +When no customer resource is known, emit: + +```text +CustomerResourceId = "Unknown" +``` + +Diagnostics: + +- Log a one-time warning per operation/location when `CustomerResourceId = Unknown`. +- Increment `sli.diagnostics.unknown_customer_resource_id` every time `CustomerResourceId = Unknown`. +- Do not throw, drop, cap, or replace required dimensions by default. + +`Unknown` is deliberate and distinct from `ActivityStatusCode.Unset`. + +The one-time warning scope is per process lifetime. 
The implementation should bound the warning cache to avoid unbounded memory growth. + +Diagnostic counter contract: + +| Instrument | Type | Dimensions | +|---|---|---| +| `sli.diagnostics.unknown_customer_resource_id` | `Counter` | `Operation`, `LocationId` | + +## Meter + +The default meter name remains: + +```text +Trellis.SLI +``` + +Custom meter registration remains supported. All SLI metrics and diagnostic counters should use the configured meter consistently. + +## Implementation design + +### Core recording + +`ServiceLevelIndicator` owns the `operation.duration` histogram and the `sli.diagnostics.unknown_customer_resource_id` counter. Both instruments use the default `Trellis.SLI` meter or the configured custom meter. + +Every duration recording emits: + +- `CustomerResourceId` +- `LocationId` +- `Operation` +- `Outcome` +- any custom dimensions + +Custom dimensions must not reuse required names, reserved names, or optional dimensions already present on the measurement. Schema collisions throw immediately. The diagnostics-only rule applies to suspicious values, not to schema collisions. + +### MeasuredOperation + +`MeasuredOperation` stores `SliOutcome`, defaulting to `Ignored`. + +Public API: + +```csharp +measuredOperation.SetOutcome(SliOutcome.Success); +``` + +Internal behavior: + +- Disposing records elapsed milliseconds. +- Disposing records the explicit or inferred `Outcome`. +- Raw `StartMeasuring(...)`/`Dispose()` cannot detect escaping exceptions and records the explicit/default outcome. +- `Measure(...)`, `MeasureAsync(...)`, and ASP.NET middleware can force `Failure` when they observe an exception. +- `Dispose()` remains idempotent. +- `SetOutcome(...)` after disposal has no effect. 
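The internal behavior listed above can be illustrated with a small stand-in type (a sketch under stated assumptions, not the library's `MeasuredOperation`). `SketchMeasuredOperation` and its `record` callback are hypothetical; the callback stands in for writing to the `operation.duration` histogram.

```csharp
using System;
using System.Diagnostics;

// Mirrors the enum from the outcome model above.
public enum SliOutcome
{
    Success,
    Failure,
    ClientError,
    Ignored
}

// Hypothetical stand-in that models only the documented dispose/outcome rules.
public sealed class SketchMeasuredOperation : IDisposable
{
    private readonly Stopwatch _stopwatch = Stopwatch.StartNew();
    private readonly Action<long, SliOutcome> _record;
    private bool _disposed;

    public SketchMeasuredOperation(Action<long, SliOutcome> record) =>
        _record = record ?? throw new ArgumentNullException(nameof(record));

    // Defaults to Ignored until an outcome is set explicitly.
    public SliOutcome Outcome { get; private set; } = SliOutcome.Ignored;

    public void SetOutcome(SliOutcome outcome)
    {
        // Setting an outcome after disposal has no effect.
        if (_disposed) return;
        Outcome = outcome;
    }

    public void Dispose()
    {
        // Idempotent: only the first call records a measurement.
        if (_disposed) return;
        _disposed = true;
        _stopwatch.Stop();
        _record(_stopwatch.ElapsedMilliseconds, Outcome);
    }
}
```

Because a raw scope like this cannot observe an escaping exception, it records whatever outcome was last set, or `Ignored` if none was.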
+ +### Helper APIs + +Core helper APIs should infer outcomes: + +| Result | Outcome | +|---|---| +| Normal completion | `Success` | +| Exception | `Failure` | +| `OperationCanceledException` | `Ignored` | + +Helpers rethrow exceptions and cancellations after setting the inferred outcome. +Rethrows should use `throw;` so exception stack traces are preserved. + +If user code explicitly sets `Failure` inside a helper and completes normally, explicit `Failure` remains. Helper success inference does not promote it to `Success`. Unhandled exceptions observed by helpers still force `Failure`. + +### ASP.NET middleware + +Middleware owns ASP.NET request measurement. Core helpers and ASP.NET middleware normally do not co-apply to the same `MeasuredOperation`; precedence exists for explicit outcomes, inferred outcomes, and future extension points. + +Middleware should: + +- resolve `Operation` from explicit metadata, route template, then uppercase ` `, +- set `CustomerResourceId` from endpoint metadata when present, +- default missing customer resource to `Unknown`, +- classify outcome using configured classifier first, then the default table, +- emit `http.request.method` and `http.response.status.code` by default, +- run enrichments before recording, +- rethrow unhandled exceptions after recording failure semantics. + +### Classifier extensibility + +The first implementation should provide a global HTTP outcome classifier option. Exact per-route classifier API shape is intentionally left for implementation design and can follow after the global classifier. + +Classifier behavior: + +- returned `SliOutcome` wins over default status mapping, +- no result / null falls back to default status mapping, +- unhandled exceptions still force `Failure`. + +### API versioning + +The API versioning package remains an enrichment package. It continues to emit `http.api.version` only when `.AddApiVersion()` is called and does not participate in outcome classification. 
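Returning to the Helper APIs subsection above, its inference table can be sketched as follows (an illustration only, not the library's `Measure(...)` implementation). `MeasureSketch.Measure` and the `record` callback are hypothetical. The explicit-outcome precedence rule is deliberately left out to keep the example short.

```csharp
using System;

// Mirrors the enum from the outcome model above.
public enum SliOutcome
{
    Success,
    Failure,
    ClientError,
    Ignored
}

// Hypothetical helper showing outcome inference around a synchronous action.
public static class MeasureSketch
{
    public static void Measure(Action action, Action<SliOutcome> record)
    {
        // Normal completion is inferred as Success.
        var outcome = SliOutcome.Success;
        try
        {
            action();
        }
        catch (OperationCanceledException)
        {
            // Cancellation is excluded from the SLI.
            outcome = SliOutcome.Ignored;
            throw; // bare rethrow preserves the original stack trace
        }
        catch
        {
            // Any other exception counts against the service.
            outcome = SliOutcome.Failure;
            throw;
        }
        finally
        {
            // Runs on every path, so the outcome is recorded before the exception propagates.
            record(outcome);
        }
    }
}
```

The `OperationCanceledException` handler must come first, since the general `catch` would otherwise classify cancellations as `Failure`.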
+ +## Dimension stability guardrails + +The rule is stable and meaningful dimensions, not low cardinality. + +`CustomerResourceId` may legitimately have very high cardinality at Microsoft/Azure scale, such as user object IDs in login services. Required dimensions must pass through exactly by default. + +Default behavior: + +- Required dimensions: diagnostics only; never mutate by default. +- Optional/custom dimensions: diagnostics only by default. +- Strict blocking or replacement: opt-in only. +- Runtime suspicious-value heuristics are off by default to avoid hot-path overhead at high scale; analyzers and documentation are the default guardrails. + +Diagnostics should look for unstable values such as: + +- request IDs, +- timestamps, +- raw paths, +- arbitrary text, +- generated-per-request GUIDs, +- emails when a stable object ID is available. + +Analyzer support is a later milestone. + +## Package architecture + +Core SLI package remains independent from `Trellis.Core`. + +Optional Trellis integration package: + +```text +Trellis.ServiceLevelIndicators.Results +``` + +Dependency direction: + +```text +Trellis.ServiceLevelIndicators.Results + -> Trellis.ServiceLevelIndicators + -> Trellis.Core +``` + +Neither core package depends on the Results integration package. + +## Documentation requirements + +Docs should include: + +- Google SRE-aligned SLO semantics. +- OpenTelemetry metric setup. +- OpenSLO examples, documentation only. +- Azure Monitor / Application Insights KQL examples. +- Success-rate queries using `operation.duration` histogram count. +- Latency percentile queries. +- Client-error rate queries. +- Unknown `CustomerResourceId` detection. +- OpenTelemetry View/cardinality-limit guidance for services that intentionally use high-cardinality `CustomerResourceId` values. + +## Testing support + +Use TDD by phase: write or adjust failing tests first, implement the smallest passing change, then refactor. 
+ +Core and ASP.NET tests should cover outcome values, diagnostics, exception/cancellation paths, Activity status mapping, required dimensions, and custom meter behavior. Custom meter tests must verify both `operation.duration` and `sli.diagnostics.unknown_customer_resource_id` use the configured meter. + +Eventually add: + +```text +Trellis.ServiceLevelIndicators.Testing +``` + +This package should provide metric assertion helpers for AI-generated and hand-written tests. + +## Histogram views/buckets + +Provide optional helper APIs and documentation for recommended API latency histogram views/buckets. Do not force bucket configuration by default. + +## Alternatives considered + +### Keep `activity.status.code` as the primary outcome dimension + +Rejected. `activity.status.code` is trace-oriented and does not model SLI-specific states such as `ClientError` and `Ignored` clearly enough for SLO math. The library should still update `Activity.Current` for trace correlation. + +### Add `operation.count` immediately + +Rejected for the first implementation. Histograms already produce a count series, and a separate counter risks drift unless every recording path updates both instruments consistently. + +### Use seconds instead of milliseconds + +Rejected. OpenTelemetry HTTP semantic-convention duration metrics use seconds, but Trellis SLI metrics intentionally emit latency in milliseconds to match the backend contract and existing library positioning. + +## Open questions + +- Exact recommended histogram bucket boundaries for the optional helper. +- Exact per-route classifier API shape. +- Exact analyzer packaging and suppression model for the later analyzer milestone. 
diff --git a/docs/usage-reference.md b/docs/usage-reference.md index 628b35a..182d2e6 100644 --- a/docs/usage-reference.md +++ b/docs/usage-reference.md @@ -21,10 +21,10 @@ These values are part of the library contract and should be treated as stable un | Unit | milliseconds (`ms`) | | Required tag | `CustomerResourceId` | | Required tag | `LocationId` | -| Standard tag | `Operation` | -| Standard tag | `activity.status.code` | +| Required tag | `Operation` | +| Required tag | `Outcome` (`Success`, `Failure`, `ClientError`, or `Ignored`) | -For ASP.NET Core, the library also emits `http.response.status.code` and can optionally emit `http.request.method` and `http.api.version`. +For ASP.NET Core, the library also emits `http.request.method` and `http.response.status.code`; `http.api.version` is emitted when API version enrichment is enabled. ## Core Package @@ -65,11 +65,11 @@ async Task ProcessOrder(ServiceLevelIndicator sli) await Task.Delay(50); - op.SetActivityStatusCode(ActivityStatusCode.Ok); + op.SetOutcome(SliOutcome.Success); } ``` -Direct recording is also available when you already know the elapsed time: +Direct recording is also available when you already know the elapsed time. `Record(...)` emits `CustomerResourceId`, `LocationId`, `Operation`, `Outcome`, and any custom attributes supplied to the call. Manual measurements default to `Ignored` unless you set an outcome. ```csharp sli.Record("ProcessOrder", elapsedTime: 42); @@ -145,7 +145,6 @@ builder.Services.AddServiceLevelIndicator(options => options.LocationId = ServiceLevelIndicator.CreateLocationId("public", "westus3"); }) .AddMvc() -.AddHttpMethod() .Enrich(context => { context.SetCustomerResourceId("tenant-a"); @@ -213,7 +212,7 @@ builder.Services.AddServiceLevelIndicator(options => .AddApiVersion(); ``` -This adds the `http.api.version` metric dimension when Asp.Versioning is present. +This adds the `http.api.version` metric dimension when Asp.Versioning is present. 
The value is the single resolved API version, `Neutral`, `Unspecified`, or an empty string when the requested version is invalid or ambiguous. ## ASP.NET Runtime Helpers @@ -234,20 +233,21 @@ Use `GetMeasuredOperation()` when the route is guaranteed to emit SLI metrics. U ## Status Semantics -For non-HTTP code, set the outcome explicitly: +For non-HTTP code, set the outcome explicitly or use `Measure(...)` / `MeasureAsync(...)` helpers to infer it: ```csharp -op.SetActivityStatusCode(ActivityStatusCode.Ok); +op.SetOutcome(SliOutcome.Success); ``` For ASP.NET Core: -| Response outcome | `activity.status.code` | +| Response outcome | `Outcome` | |---|---| -| `2xx` | `Ok` | -| `5xx` | `Error` | -| Other status codes | `Unset` | -| Unhandled exceptions | `Error` | +| `2xx`, `3xx` | `Success` | +| `400`, `401`, `403`, `404`, `409`, `412`, `422` | `ClientError` | +| `429`, `5xx` | `Failure` | +| Unhandled exceptions | `Failure` | +| Request-aborted cancellations | `Ignored` | ## Cardinality Guidance @@ -276,7 +276,7 @@ Avoid values that can explode cardinality unless your backend is designed for th 3. Forgetting `AddMvc()` when relying on MVC conventions and attribute-based overrides. 4. Forgetting `.AddServiceLevelIndicator()` on Minimal API endpoints when `AutomaticallyEmitted` is `false`. 5. Renaming `CustomerResourceId` or `LocationId` even though downstream systems depend on those exact names. -6. Reusing reserved tag names such as `CustomerResourceId`, `LocationId`, `Operation`, or `activity.status.code` as custom attributes. +6. Reusing reserved tag names such as `CustomerResourceId`, `LocationId`, `Operation`, `Outcome`, `activity.status.code`, `http.request.method`, or `http.response.status.code` as custom attributes. 
## Public API Cheat Sheet @@ -295,7 +295,7 @@ ASP.NET Core package: - `UseServiceLevelIndicator()` - `IServiceLevelIndicatorBuilder.AddMvc()` -- `IServiceLevelIndicatorBuilder.AddHttpMethod()` +- `IServiceLevelIndicatorBuilder.ClassifyHttpOutcome(...)` - `IServiceLevelIndicatorBuilder.Enrich(...)` - `IServiceLevelIndicatorBuilder.EnrichAsync(...)` - `EndpointConventionBuilder.AddServiceLevelIndicator(...)` diff --git a/sample/ConsoleApp/Program.cs b/sample/ConsoleApp/Program.cs index 1b0029a..0c01ad7 100644 --- a/sample/ConsoleApp/Program.cs +++ b/sample/ConsoleApp/Program.cs @@ -1,5 +1,4 @@ -using System.Diagnostics; -using System.Reflection; +using System.Reflection; using Azure.Core; using Microsoft.Extensions.DependencyInjection; using Microsoft.Extensions.Logging; @@ -45,17 +44,17 @@ .Build(); var serviceLevelIndicator = serviceProvider.GetRequiredService(); -using MeasuredOperation measuredOperation = serviceLevelIndicator.StartMeasuring("OperationWork"); try { - logger.LogInformation("Starting to do some work..."); - await Task.Delay(1000); // Simulate some work - logger.LogInformation("Work done."); - measuredOperation.SetActivityStatusCode(ActivityStatusCode.Ok); + await serviceLevelIndicator.MeasureAsync("OperationWork", async () => + { + logger.LogInformation("Starting to do some work..."); + await Task.Delay(1000); // Simulate some work + logger.LogInformation("Work done."); + }); } catch (Exception ex) { - measuredOperation.SetActivityStatusCode(ActivityStatusCode.Error); logger.LogError(ex, "An error occurred doing work."); } diff --git a/sample/GenerateSli/Program.cs b/sample/GenerateSli/Program.cs index 346491c..8601779 100644 --- a/sample/GenerateSli/Program.cs +++ b/sample/GenerateSli/Program.cs @@ -35,4 +35,4 @@ static async Task ClientRequests() Console.WriteLine($"Request {i}: Exception occurred: {ex.Message}"); } } -} \ No newline at end of file +} diff --git a/sample/MinApi/Program.cs b/sample/MinApi/Program.cs index 5336a3e..9a168e8 100644 
--- a/sample/MinApi/Program.cs +++ b/sample/MinApi/Program.cs @@ -1,8 +1,8 @@ using Azure.Core; using OpenTelemetry.Metrics; using OpenTelemetry.Resources; -using Scalar.AspNetCore; using SampleMinimalApiSli; +using Scalar.AspNetCore; using Trellis.ServiceLevelIndicators; var builder = WebApplication.CreateBuilder(args); @@ -34,8 +34,7 @@ { options.CustomerResourceId = "SampleCustomerResourceId"; options.LocationId = ServiceLevelIndicator.CreateLocationId("public", AzureLocation.WestUS3.Name); -}) -.AddHttpMethod(); +}); // Add services to the container. diff --git a/sample/MinApi/UserExt.cs b/sample/MinApi/UserExt.cs index 6be015d..1354835 100644 --- a/sample/MinApi/UserExt.cs +++ b/sample/MinApi/UserExt.cs @@ -21,4 +21,4 @@ public static void UseUserRoute(this WebApplication app) userApi.MapGet("/{name}", (string name) => $"Hello {name}").WithName("GetUserById"); } -} \ No newline at end of file +} diff --git a/sample/Observability/Grafana/README.md b/sample/Observability/Grafana/README.md new file mode 100644 index 0000000..d763b3f --- /dev/null +++ b/sample/Observability/Grafana/README.md @@ -0,0 +1,64 @@ +# Local SLI Grafana dashboard + +This sample runs a local OpenTelemetry Collector, Prometheus, and Grafana stack so you can see the value of the SLI library while running one of the sample applications. + +## Start the dashboard stack + +From this directory: + +```powershell +docker compose up -d +``` + +Grafana starts at http://localhost:3000 with anonymous admin access enabled for local development. The SLI dashboard is provisioned automatically under **Dashboards > Trellis > Service Level Indicators**. 
+ +![Trellis SLI Grafana dashboard](assets/sli-grafana-dashboard.png) + +## Run a sample app + +In another terminal, run the Web API sample and point its OTLP exporter at the local collector: + +```powershell +$env:OTEL_EXPORTER_OTLP_ENDPOINT = "http://localhost:4317" +dotnet run --project ..\..\WebApi\SampleWebApplicationSLI.csproj +``` + +Generate traffic: + +```powershell +Invoke-RestMethod https://localhost:63936/WeatherForecast -SkipCertificateCheck +Invoke-RestMethod https://localhost:63936/WeatherForecast/MyAction1 -SkipCertificateCheck +Invoke-RestMethod https://localhost:63936/WeatherForecast/MyAction2 -SkipCertificateCheck +Invoke-RestMethod https://localhost:63936/WeatherForecast/my-customer-resource-id -SkipCertificateCheck +``` + +You can also run the Minimal API or API-versioned samples with the same `OTEL_EXPORTER_OTLP_ENDPOINT` environment variable. + +## What the dashboard shows + +The dashboard uses the SLI metric contract: + +- `operation.duration` histogram exported to Prometheus as `operation_duration_milliseconds_*` +- `CustomerResourceId` +- `LocationId` +- `Operation` +- `Outcome` +- `http.request.method` +- `http.response.status.code` +- optional `http.api.version` + +Panels include: + +- request volume by operation and outcome +- p50/p95/p99 latency +- success rate using `Success / (Success + Failure)` +- failure and client-error rates +- HTTP status-code breakdown +- unknown customer diagnostics +- `` operation detection + +## Stop the stack + +```powershell +docker compose down +``` diff --git a/sample/Observability/Grafana/assets/sli-grafana-dashboard.png b/sample/Observability/Grafana/assets/sli-grafana-dashboard.png new file mode 100644 index 0000000..2e6e33f Binary files /dev/null and b/sample/Observability/Grafana/assets/sli-grafana-dashboard.png differ diff --git a/sample/Observability/Grafana/docker-compose.yml b/sample/Observability/Grafana/docker-compose.yml new file mode 100644 index 0000000..2754088 --- /dev/null +++ 
b/sample/Observability/Grafana/docker-compose.yml @@ -0,0 +1,44 @@ +services: + otel-collector: + image: otel/opentelemetry-collector-contrib:0.123.0 + command: ["--config=/etc/otelcol/config.yaml"] + volumes: + - ./otel-collector-config.yaml:/etc/otelcol/config.yaml:ro + ports: + - "4317:4317" + - "4318:4318" + - "9464:9464" + + prometheus: + image: prom/prometheus:v3.3.0 + command: + - "--config.file=/etc/prometheus/prometheus.yml" + - "--storage.tsdb.path=/prometheus" + - "--web.enable-lifecycle" + volumes: + - ./prometheus.yml:/etc/prometheus/prometheus.yml:ro + - prometheus-data:/prometheus + ports: + - "9090:9090" + depends_on: + - otel-collector + + grafana: + image: grafana/grafana:11.6.0 + environment: + GF_AUTH_ANONYMOUS_ENABLED: "true" + GF_AUTH_ANONYMOUS_ORG_ROLE: "Admin" + GF_AUTH_DISABLE_LOGIN_FORM: "true" + GF_USERS_DEFAULT_THEME: "light" + volumes: + - grafana-data:/var/lib/grafana + - ./grafana/provisioning:/etc/grafana/provisioning:ro + - ./grafana/dashboards:/var/lib/grafana/dashboards:ro + ports: + - "3000:3000" + depends_on: + - prometheus + +volumes: + prometheus-data: + grafana-data: diff --git a/sample/Observability/Grafana/grafana/dashboards/sli-dashboard.json b/sample/Observability/Grafana/grafana/dashboards/sli-dashboard.json new file mode 100644 index 0000000..51a5af4 --- /dev/null +++ b/sample/Observability/Grafana/grafana/dashboards/sli-dashboard.json @@ -0,0 +1,818 @@ +{ + "annotations": { + "list": [ + { + "builtIn": 1, + "datasource": { + "type": "grafana", + "uid": "-- Grafana --" + }, + "enable": true, + "hide": true, + "iconColor": "rgba(0, 211, 255, 1)", + "name": "Annotations & Alerts", + "type": "dashboard" + } + ] + }, + "editable": true, + "fiscalYearStartMonth": 0, + "graphTooltip": 0, + "id": null, + "links": [], + "panels": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "mappings": [], + "max": 100, + "min": 0, + 
"thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "red", + "value": null + }, + { + "color": "orange", + "value": 95 + }, + { + "color": "green", + "value": 99 + } + ] + }, + "unit": "percent" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 6, + "x": 0, + "y": 0 + }, + "id": 1, + "options": { + "colorMode": "value", + "graphMode": "area", + "justifyMode": "auto", + "orientation": "auto", + "percentChangeColorMode": "standard", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "showPercentChange": false, + "textMode": "auto", + "wideLayout": true + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "100 * sum(rate(operation_duration_milliseconds_count{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\",Outcome=\"Success\"}[$__rate_interval])) / sum(rate(operation_duration_milliseconds_count{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\",Outcome=~\"Success|Failure\"}[$__rate_interval]))", + "legendFormat": "success rate", + "range": true, + "refId": "A" + } + ], + "title": "Success rate", + "type": "stat" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 12, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 2, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" 
+ }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "ms" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 18, + "x": 6, + "y": 0 + }, + "id": 2, + "options": { + "legend": { + "calcs": [ + "lastNotNull", + "max" + ], + "displayMode": "table", + "placement": "right", + "showLegend": true + }, + "tooltip": { + "hideZeros": false, + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.50, sum by (le, Operation) (rate(operation_duration_milliseconds_bucket{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\"}[$__rate_interval])))", + "legendFormat": "p50 {{Operation}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.95, sum by (le, Operation) (rate(operation_duration_milliseconds_bucket{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\"}[$__rate_interval])))", + "hide": false, + "legendFormat": "p95 {{Operation}}", + "range": true, + "refId": "B" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "histogram_quantile(0.99, sum by (le, Operation) (rate(operation_duration_milliseconds_bucket{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\"}[$__rate_interval])))", + "hide": false, + "legendFormat": "p99 {{Operation}}", + "range": true, + "refId": "C" + } + ], + "title": "Latency percentiles", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + 
"axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 20, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 2, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "reqps" + }, + "overrides": [] + }, + "gridPos": { + "h": 9, + "w": 12, + "x": 0, + "y": 8 + }, + "id": 3, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "table", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "hideZeros": false, + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "sum by (Operation, Outcome) (rate(operation_duration_milliseconds_count{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\",Outcome=~\"$outcome\"}[$__rate_interval]))", + "legendFormat": "{{Operation}} / {{Outcome}}", + "range": true, + "refId": "A" + } + ], + "title": "Request volume by operation and outcome", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 20, + 
"gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 2, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "reqps" + }, + "overrides": [] + }, + "gridPos": { + "h": 9, + "w": 12, + "x": 12, + "y": 8 + }, + "id": 4, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "table", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "hideZeros": false, + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "sum by (Operation) (rate(operation_duration_milliseconds_count{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\",Outcome=\"Failure\"}[$__rate_interval]))", + "legendFormat": "Failure {{Operation}}", + "range": true, + "refId": "A" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "sum by (Operation) (rate(operation_duration_milliseconds_count{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\",Outcome=\"ClientError\"}[$__rate_interval]))", + "legendFormat": "ClientError {{Operation}}", + "range": true, + "refId": "B" + } + ], + "title": "Failure and client-error rate", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + 
"axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "bars", + "fillOpacity": 60, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + "insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 1, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "reqps" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 0, + "y": 17 + }, + "id": 5, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "table", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "hideZeros": false, + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "sum by (http_response_status_code) (rate(operation_duration_milliseconds_count{Operation=~\"$operation\",CustomerResourceId=~\"$customer\",LocationId=~\"$location\"}[$__rate_interval]))", + "legendFormat": "HTTP {{http_response_status_code}}", + "range": true, + "refId": "A" + } + ], + "title": "HTTP status-code breakdown", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "palette-classic" + }, + "custom": { + "axisBorderShow": false, + "axisCenteredZero": false, + "axisColorMode": "text", + "axisLabel": "", + "axisPlacement": "auto", + "barAlignment": 0, + "barWidthFactor": 0.6, + "drawStyle": "line", + "fillOpacity": 20, + "gradientMode": "none", + "hideFrom": { + "legend": false, + "tooltip": false, + "viz": false + }, + 
"insertNulls": false, + "lineInterpolation": "linear", + "lineWidth": 2, + "pointSize": 5, + "scaleDistribution": { + "type": "linear" + }, + "showPoints": "never", + "spanNulls": false, + "stacking": { + "group": "A", + "mode": "none" + }, + "thresholdsStyle": { + "mode": "off" + } + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + } + ] + }, + "unit": "short" + }, + "overrides": [] + }, + "gridPos": { + "h": 8, + "w": 12, + "x": 12, + "y": 17 + }, + "id": 6, + "options": { + "legend": { + "calcs": [ + "lastNotNull" + ], + "displayMode": "table", + "placement": "bottom", + "showLegend": true + }, + "tooltip": { + "hideZeros": false, + "mode": "multi", + "sort": "desc" + } + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "sum by (Operation, LocationId) (increase(sli_diagnostics_unknown_customer_resource_id_total[$__range]))", + "legendFormat": "{{Operation}} / {{LocationId}}", + "range": true, + "refId": "A" + } + ], + "title": "Unknown customer diagnostics", + "type": "timeseries" + }, + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "fieldConfig": { + "defaults": { + "color": { + "mode": "thresholds" + }, + "mappings": [], + "thresholds": { + "mode": "absolute", + "steps": [ + { + "color": "green", + "value": null + }, + { + "color": "red", + "value": 1 + } + ] + }, + "unit": "short" + }, + "overrides": [] + }, + "gridPos": { + "h": 6, + "w": 24, + "x": 0, + "y": 25 + }, + "id": 7, + "options": { + "displayMode": "gradient", + "legend": { + "calcs": [], + "displayMode": "list", + "placement": "bottom", + "showLegend": true + }, + "maxVizHeight": 300, + "minVizHeight": 16, + "minVizWidth": 8, + "namePlacement": "auto", + "orientation": "horizontal", + "reduceOptions": { + "calcs": [ + "lastNotNull" + ], + "fields": "", + "values": false + }, + "showUnfilled": true, + 
"sizing": "auto", + "valueMode": "color" + }, + "pluginVersion": "11.6.0", + "targets": [ + { + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "editorMode": "code", + "expr": "sum by (Operation) (increase(operation_duration_milliseconds_count{Operation=~\".*.*\"}[$__range]))", + "legendFormat": "{{Operation}}", + "range": true, + "refId": "A" + } + ], + "title": " operation detection", + "type": "bargauge" + } + ], + "preload": false, + "refresh": "5s", + "schemaVersion": 41, + "tags": [ + "trellis", + "sli", + "opentelemetry" + ], + "templating": { + "list": [ + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "definition": "label_values(operation_duration_milliseconds_count, Operation)", + "includeAll": true, + "label": "Operation", + "multi": true, + "name": "operation", + "options": [], + "query": { + "qryType": 1, + "query": "label_values(operation_duration_milliseconds_count, Operation)", + "refId": "PrometheusVariableQueryEditor-VariableQuery" + }, + "refresh": 1, + "regex": "", + "type": "query" + }, + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "definition": "label_values(operation_duration_milliseconds_count, CustomerResourceId)", + "includeAll": true, + "label": "CustomerResourceId", + "multi": true, + "name": "customer", + "options": [], + "query": { + "qryType": 1, + "query": "label_values(operation_duration_milliseconds_count, CustomerResourceId)", + "refId": "PrometheusVariableQueryEditor-VariableQuery" + }, + "refresh": 1, + "regex": "", + "type": "query" + }, + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "definition": 
"label_values(operation_duration_milliseconds_count, LocationId)", + "includeAll": true, + "label": "LocationId", + "multi": true, + "name": "location", + "options": [], + "query": { + "qryType": 1, + "query": "label_values(operation_duration_milliseconds_count, LocationId)", + "refId": "PrometheusVariableQueryEditor-VariableQuery" + }, + "refresh": 1, + "regex": "", + "type": "query" + }, + { + "allValue": ".*", + "current": { + "selected": true, + "text": "All", + "value": "$__all" + }, + "datasource": { + "type": "prometheus", + "uid": "prometheus" + }, + "definition": "label_values(operation_duration_milliseconds_count, Outcome)", + "includeAll": true, + "label": "Outcome", + "multi": true, + "name": "outcome", + "options": [], + "query": { + "qryType": 1, + "query": "label_values(operation_duration_milliseconds_count, Outcome)", + "refId": "PrometheusVariableQueryEditor-VariableQuery" + }, + "refresh": 1, + "regex": "", + "type": "query" + } + ] + }, + "time": { + "from": "now-15m", + "to": "now" + }, + "timepicker": {}, + "timezone": "browser", + "title": "Trellis Service Level Indicators", + "uid": "trellis-sli", + "version": 1, + "weekStart": "" +} diff --git a/sample/Observability/Grafana/grafana/provisioning/dashboards/sli.yml b/sample/Observability/Grafana/grafana/provisioning/dashboards/sli.yml new file mode 100644 index 0000000..8c46634 --- /dev/null +++ b/sample/Observability/Grafana/grafana/provisioning/dashboards/sli.yml @@ -0,0 +1,12 @@ +apiVersion: 1 + +providers: + - name: Trellis + orgId: 1 + folder: Trellis + type: file + disableDeletion: false + editable: true + updateIntervalSeconds: 10 + options: + path: /var/lib/grafana/dashboards diff --git a/sample/Observability/Grafana/grafana/provisioning/datasources/prometheus.yml b/sample/Observability/Grafana/grafana/provisioning/datasources/prometheus.yml new file mode 100644 index 0000000..00f9915 --- /dev/null +++ b/sample/Observability/Grafana/grafana/provisioning/datasources/prometheus.yml @@ 
-0,0 +1,10 @@ +apiVersion: 1 + +datasources: + - name: Prometheus + uid: prometheus + type: prometheus + access: proxy + url: http://prometheus:9090 + isDefault: true + editable: false diff --git a/sample/Observability/Grafana/otel-collector-config.yaml b/sample/Observability/Grafana/otel-collector-config.yaml new file mode 100644 index 0000000..9c361c0 --- /dev/null +++ b/sample/Observability/Grafana/otel-collector-config.yaml @@ -0,0 +1,26 @@ +receivers: + otlp: + protocols: + grpc: + endpoint: 0.0.0.0:4317 + http: + endpoint: 0.0.0.0:4318 + +processors: + batch: + +exporters: + prometheus: + endpoint: 0.0.0.0:9464 + enable_open_metrics: true + resource_to_telemetry_conversion: + enabled: true + debug: + verbosity: basic + +service: + pipelines: + metrics: + receivers: [otlp] + processors: [batch] + exporters: [prometheus, debug] diff --git a/sample/Observability/Grafana/prometheus.yml b/sample/Observability/Grafana/prometheus.yml new file mode 100644 index 0000000..454d12e --- /dev/null +++ b/sample/Observability/Grafana/prometheus.yml @@ -0,0 +1,9 @@ +global: + scrape_interval: 5s + evaluation_interval: 5s + +scrape_configs: + - job_name: otel-collector + static_configs: + - targets: + - otel-collector:9464 diff --git a/sample/WebApi/ConfigureServiceLevelIndicatorOptions.cs b/sample/WebApi/ConfigureServiceLevelIndicatorOptions.cs index f52b5c3..89f6348 100644 --- a/sample/WebApi/ConfigureServiceLevelIndicatorOptions.cs +++ b/sample/WebApi/ConfigureServiceLevelIndicatorOptions.cs @@ -10,4 +10,4 @@ internal sealed class ConfigureServiceLevelIndicatorOptions : IConfigureOptions< public ConfigureServiceLevelIndicatorOptions(SampleApiMeters meters) => this.meters = meters; public void Configure(ServiceLevelIndicatorOptions options) => options.Meter = meters.Meter; -} \ No newline at end of file +} diff --git a/sample/WebApi/Controllers/WeatherForecastController.cs b/sample/WebApi/Controllers/WeatherForecastController.cs index 87c9bf0..22604fc 100644 --- 
a/sample/WebApi/Controllers/WeatherForecastController.cs +++ b/sample/WebApi/Controllers/WeatherForecastController.cs @@ -20,6 +20,7 @@ public class WeatherForecastController : ControllerBase /// Should emit SLI metrics /// Operation: "GET WeatherForecast" /// CustomerResourceId = "SampleCustomerResourceId" + /// Outcome = "Success" /// [HttpGet] public IEnumerable Get() => GetWeather(); @@ -28,6 +29,7 @@ public class WeatherForecastController : ControllerBase /// Should emit SLI metrics /// Operation: "GET WeatherForecast/MyAction1" /// CustomerResourceId = "SampleCustomerResourceId" + /// Outcome = "Success" /// [HttpGet("MyAction1")] @@ -37,6 +39,7 @@ public class WeatherForecastController : ControllerBase /// Should emit SLI metrics /// Operation: "MyOperation" /// CustomerResourceId = "SampleCustomerResourceId" + /// Outcome = "Success" /// [HttpGet("MyAction2")] [ServiceLevelIndicator(Operation = "MyOperation")] @@ -46,10 +49,32 @@ public class WeatherForecastController : ControllerBase /// Should emit SLI metrics /// Operation: "GET WeatherForecast/{customerResourceId}" /// CustomerResourceId = "Your input" + /// Outcome = "Success" /// [HttpGet("{customerResourceId}")] public IEnumerable Get([CustomerResourceId] string customerResourceId) => GetWeather(); + /// + /// Demo endpoint that emits Outcome = "ClientError". + /// + [HttpGet("demo/client-error/{customerResourceId}")] + public IActionResult ClientError([CustomerResourceId] string customerResourceId) => + BadRequest(new { customerResourceId, error = "Invalid forecast request." }); + + /// + /// Demo endpoint that emits Outcome = "Failure" because 429 is service-impacting by default. + /// + [HttpGet("demo/throttled/{customerResourceId}")] + public IActionResult Throttled([CustomerResourceId] string customerResourceId) => + StatusCode(StatusCodes.Status429TooManyRequests, new { customerResourceId, error = "Too many forecast requests." }); + + /// + /// Demo endpoint that emits Outcome = "Failure". 
+ /// + [HttpGet("demo/server-error/{customerResourceId}")] + public IActionResult ServerError([CustomerResourceId] string customerResourceId) => + StatusCode(StatusCodes.Status500InternalServerError, new { customerResourceId, error = "Forecast service unavailable." }); + private static WeatherForecast[] GetWeather() => Enumerable.Range(1, 5).Select(index => new WeatherForecast { Date = DateTime.Now.AddDays(index), @@ -57,4 +82,4 @@ public class WeatherForecastController : ControllerBase Summary = Summaries[Random.Shared.Next(Summaries.Length)] }) .ToArray(); -} \ No newline at end of file +} diff --git a/sample/WebApi/Program.cs b/sample/WebApi/Program.cs index 0273fb7..9d0ebef 100644 --- a/sample/WebApi/Program.cs +++ b/sample/WebApi/Program.cs @@ -3,8 +3,8 @@ using Microsoft.Extensions.Options; using OpenTelemetry.Metrics; using OpenTelemetry.Resources; -using Scalar.AspNetCore; using SampleWebApplicationSLI; +using Scalar.AspNetCore; using Trellis.ServiceLevelIndicators; var builder = WebApplication.CreateBuilder(args); @@ -37,8 +37,7 @@ options.CustomerResourceId = "SampleCustomerResourceId"; options.LocationId = ServiceLevelIndicator.CreateLocationId("public", AzureLocation.WestUS3.Name); }) -.AddMvc() -.AddHttpMethod(); +.AddMvc(); var app = builder.Build(); diff --git a/sample/WebApi/SampleApiMeters.cs b/sample/WebApi/SampleApiMeters.cs index 343baf9..8a09e12 100644 --- a/sample/WebApi/SampleApiMeters.cs +++ b/sample/WebApi/SampleApiMeters.cs @@ -7,4 +7,4 @@ internal class SampleApiMeters public const string MeterName = "SampleMeter"; public Meter Meter { get; } = new Meter(MeterName); -} \ No newline at end of file +} diff --git a/sample/WebApi/WeatherForecast.cs b/sample/WebApi/WeatherForecast.cs index 0893baa..906d361 100644 --- a/sample/WebApi/WeatherForecast.cs +++ b/sample/WebApi/WeatherForecast.cs @@ -24,4 +24,4 @@ public class WeatherForecast /// Temperature feeling. /// public string? 
Summary { get; set; } -} \ No newline at end of file +} diff --git a/sample/WebApiVersioned/Controllers/2023-06-06/HelloWorldController.cs b/sample/WebApiVersioned/Controllers/2023-06-06/HelloWorldController.cs index 091388e..a2e3cf9 100644 --- a/sample/WebApiVersioned/Controllers/2023-06-06/HelloWorldController.cs +++ b/sample/WebApiVersioned/Controllers/2023-06-06/HelloWorldController.cs @@ -39,4 +39,4 @@ public ActionResult GetCustom([CustomerResourceId] string name) if (next < 20) return StatusCode(StatusCodes.Status500InternalServerError, "Sim Server error"); return Ok("Hello World " + name); } -} \ No newline at end of file +}