Skip to content

feat(nns): add maturity modulation freshness metrics#10261

Open
jasonz-dfinity wants to merge 1 commit into
masterfrom
add-maturity-modulation-metrics-to-nns-governance
Open

feat(nns): add maturity modulation freshness metrics#10261
jasonz-dfinity wants to merge 1 commit into
masterfrom
add-maturity-modulation-metrics-to-nns-governance

Conversation

@jasonz-dfinity
Copy link
Copy Markdown
Contributor

Summary

Adds two Prometheus gauges to NNS Governance /metrics so monitoring can alert on stale maturity modulation and on growing gaps in the ICP/XDR rate history that drives it.

  • governance_maturity_modulation_updated_at_timestamp_seconds — drives IC_NNS_StaleMaturityModulation. Sourced from heap_data.maturity_modulation.updated_at_days_since_epoch * 86_400 so time() - X works in PromQL.
  • governance_icp_xdr_price_history_missing_days_in_window — drives IC_NNS_MissingIcpXdrRatesHistory. Count of days in [today-364, today] with no entry in heap_data.icp_price_history.icp_xdr_rates.

Both gauges are skipped entirely while the backing state is None, so a freshly-installed canister does not trip alerts before the initial ~30-minute backfill completes. After PR #10210 added LOCF, isolated gaps no longer block modulation updates; these metrics let us alert before gaps grow large enough to compromise the 7-day or 365-day window.

Sequencing

Must merge and roll out to mainnet NNS Governance before the k8s alerts go live, otherwise the alerts will fire on missing metrics. The k8s alert change is blocked on this one rolling out.

Testing

  • Added test_metrics_maturity_modulation_and_icp_price_history covering both Some and None states for heap_data.maturity_modulation and heap_data.icp_price_history, including a stale entry outside the 365-day window that must not count toward days_present.

Add two Prometheus gauges to the NNS Governance /metrics endpoint so monitoring
can alert on stale maturity modulation and on growing gaps in the ICP/XDR rate
history that drives it:

* governance_maturity_modulation_updated_at_timestamp_seconds
* governance_icp_xdr_price_history_missing_days_in_window

Both gauges are skipped entirely while their backing state is None so a
freshly-installed canister does not trip alerts before the initial ~30-minute
backfill completes.
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds Prometheus “freshness” metrics to the NNS Governance /metrics endpoint to support alerting on stale maturity modulation computation and gaps in the ICP/XDR price history used to compute it.

Changes:

  • Added two new gauges to encode_metrics: maturity modulation “updated at” (as a Unix timestamp in seconds) and missing ICP/XDR rate days over the last 365-day window.
  • Re-exported ONE_DAY_SECONDS from timer_tasks for use in metrics encoding.
  • Added unit test coverage for None vs Some state behavior and window filtering, and documented the addition in the unreleased changelog.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

File Description
rs/nns/governance/unreleased_changelog.md Documents the new /metrics gauges for freshness monitoring.
rs/nns/governance/src/timer_tasks/mod.rs Re-exports ONE_DAY_SECONDS from the ICP/XDR rate update task module.
rs/nns/governance/src/lib.rs Encodes the two new freshness gauges into the Governance metrics output.
rs/nns/governance/src/governance/tests/mod.rs Adds a unit test validating emission/omission and correct window counting.
Comments suppressed due to low confidence (1)

rs/nns/governance/src/lib.rs:719

  • updated_at_days_since_epoch * ONE_DAY_SECONDS can overflow u64 before being cast to f64, which would silently wrap in release builds and produce an incorrect timestamp metric. Consider avoiding integer multiplication overflow by converting operands to f64 before multiplying, or using checked_mul/saturating_mul and skipping/clamping the metric on overflow.
        w.encode_gauge(
            "governance_maturity_modulation_updated_at_timestamp_seconds",
            (updated_at_days_since_epoch * ONE_DAY_SECONDS) as f64,
            "Timestamp (seconds since the Unix epoch) of the day for which the cached maturity modulation was last computed.",
        )?;

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 143 to 145
use storage::VOTING_POWER_SNAPSHOTS;
use timer_tasks::encode_timer_task_metrics;
use timer_tasks::{ONE_DAY_SECONDS, encode_timer_task_metrics};

Comment on lines 17 to +19
pub(crate) use update_icp_xdr_rate_related_data::{
MATURITY_MODULATION_MAX_PERMYRIAD_MISSION_70, MATURITY_MODULATION_MIN_PERMYRIAD_MISSION_70,
ONE_DAY_SECONDS,
Comment on lines 872 to +876
test_utils::{MockEnvironment, StubCMC, StubIcpLedger},
};

const ONE_DAY_SECONDS: u64 = 86_400;

@jasonz-dfinity jasonz-dfinity marked this pull request as ready for review May 19, 2026 23:21
@jasonz-dfinity jasonz-dfinity requested a review from a team as a code owner May 19, 2026 23:21
Copy link
Copy Markdown
Contributor

@github-actions github-actions Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This pull request changes code owned by the Governance team. Therefore, make sure that
you have considered the following (for Governance-owned code):

  1. Update unreleased_changelog.md (if there are behavior changes, even if they are
    non-breaking).

  2. Are there BREAKING changes?

  3. Is a data migration needed?

  4. Security review?

How to Satisfy This Automatic Review

  1. Go to the bottom of the pull request page.

  2. Look for where it says this bot is requesting changes.

  3. Click the three dots to the right.

  4. Select "Dismiss review".

  5. In the text entry box, respond to each of the numbered items in the previous
    section, declare one of the following:

  • Done.

  • $REASON_WHY_NO_NEED. E.g. for unreleased_changelog.md, "No
    canister behavior changes.", or for item 2, "Existing APIs
    behave as before.".

Brief Guide to "Externally Visible" Changes

"Externally visible behavior change" is very often due to some NEW canister API.

Changes to EXISTING APIs are more likely to be "breaking".

If these changes are breaking, make sure that clients know how to migrate, how to
maintain their continuity of operations.

If your changes are behind a feature flag, then, do NOT add entrie(s) to
unreleased_changelog.md in this PR! But rather, add entrie(s) later, in the PR
that enables these changes in production.

Reference(s)

For a more comprehensive checklist, see here.

GOVERNANCE_CHECKLIST_REMINDER_DEDUP

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants