feat(metrics): migrate app/legacyabci, loadtest, utils/logging, wasmbinding to OpenTelemetry#3446
Conversation
PR SummaryMedium Risk Overview Updates Adds OTel error counters for Reviewed by Cursor Bugbot for commit 9685149. Bugbot is set up for automated code reviews on this repo. Configure here. |
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## main #3446 +/- ##
==========================================
+ Coverage 59.32% 59.34% +0.01%
==========================================
Files 2127 2130 +3
Lines 175898 175859 -39
==========================================
+ Hits 104353 104364 +11
+ Misses 62457 62404 -53
- Partials 9088 9091 +3
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
| // Generate a message type first | ||
| messageType := c.getRandomMessageType(config.MessageTypes) | ||
| metrics.IncrProducerEventCount(messageType) | ||
| loadtestMetrics.produceCount.Add(context.Background(), 1, otelmetric.WithAttributes(attribute.String("msg_type", messageType))) |
There was a problem hiding this comment.
For loadtest, replaced the metrics as it is only for test and no need to emit dual metrics.
| // reraise panic in main goroutine | ||
| panic(err) | ||
| case <-time.After(after): | ||
| metrics.IncrLogIfNotDoneAfter(label) |
There was a problem hiding this comment.
This function LogIfNotDoneAfter is only used in tests, so no need for keeping the older metric.
| // Metric Names: | ||
| // | ||
| // sei_log_not_done_after_counter | ||
| func IncrLogIfNotDoneAfter(label string) { |
There was a problem hiding this comment.
removed functions that were only used in tests.
| } | ||
| } | ||
|
|
||
| func MetricsPanicCallback(err any, ctx sdk.Context, key string) { |
There was a problem hiding this comment.
Unused function.
…and-some-others-Otel
| wasmQueryMetrics.sdkError.Add(ctx, 1, metric.WithAttributes( | ||
| attribute.String("scenario", scenario), | ||
| attribute.String("codespace", codespace), | ||
| attribute.String("code", fmt.Sprintf("%d", code)), |
There was a problem hiding this comment.
- Why convert to string? attributes can have numeric types.
- What is the cardinality of the code? For example if we read this code over the wire, we mustn't add it as an attribute to metrics since it opens up arbitrary memory growth.
There was a problem hiding this comment.
That is a great feedback! I will remove the code attribute to make the cardinality more manageable while keeping scenario and codespace which tells us "which query type is failing in which module".
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 9685149. Configure here.
| "github.com/sei-protocol/sei-chain/utils/metrics" | ||
| tokenfactorytypes "github.com/sei-protocol/sei-chain/x/tokenfactory/types" | ||
| "go.opentelemetry.io/otel/attribute" | ||
| otelmetric "go.opentelemetry.io/otel/metric" |
There was a problem hiding this comment.
Loadtest OTel provider never configured
Medium Severity
The loadtest binary records produce, consume, and tps via OpenTelemetry but never configures a global MeterProvider (unlike seid, which calls SetupOtelMetricsProvider). Instruments use the default no-op provider, so those metrics are dropped and no longer show up on the loadtest /metrics scrape after the legacy telemetry helpers were removed.
Additional Locations (2)
Reviewed by Cursor Bugbot for commit 9685149. Configure here.
There was a problem hiding this comment.
Not spending too much time on loadtest as the sei-load is the preferred load testing framework.


Adds OTel instrumentation to
app/legacyabci,loadtest,utils/logging, andwasmbindingfollowing the same pattern as PLT-329, PLT-330, and PLT-339 (#3439).New instruments
app/legacyabci (meter
legacyabci)begin_blocker_duration— histogram, seconds, fine-grained buckets; dual-emit TODO(PLT-343)ibc_begin_blocker_duration— histogram, seconds, fine-grained buckets; dual-emit TODO(PLT-343)tx_duration— histogram, seconds,modelabel (check/recheck/deliver); dual-emit TODO(PLT-343)loadtest (meter
loadtest)produce— counter,msg_typelabel (replacesmetrics.IncrProducerEventCount)consume— counter,msg_typelabel (replacesmetrics.IncrConsumerEventCount)tps— gauge,msg_typelabel (replacesmetrics.SetThroughputMetricByType)utils/logging (meter
utils_logging)log_not_done_after— counter,labellabel (replacesmetrics.IncrLogIfNotDoneAfter)wasmbinding (meter
wasmbinding)wasm_query_association_error— counter,scenario+typelabels (replacesmetrics.IncrementErrorMetrics)wasm_query_sdk_error— counter,scenario+codespace+codelabels (new; fires for any structured SDK error that is not an association error)Notes
loadtestandutils/logging(only used for tests) are direct replacement with no dual-emit — legacy calls removed entirely.app/legacyabciuses dual-emit withTODO(PLT-343)comments pending dashboard verification.utils/panic.MetricsPanicCallbackwas unused and removed as part of this cleanup.