feat: add /metrics endpoint with Prometheus support and configuration options #431

Merged
frontegg-david merged 4 commits into main from fix/397-metrics-endpoint
May 24, 2026
Conversation

Contributor

@frontegg-david frontegg-david commented May 17, 2026

Summary by CodeRabbit

Release Notes

  • New Features

    • Added optional /metrics endpoint for Prometheus metrics scraping (disabled by default; enable via metrics: { enabled: true } configuration).
    • Metrics endpoint exposes process statistics, framework counters, and custom counters with configurable output format (Prometheus text or JSON).
    • Optional token-based authentication for metrics endpoint.
    • Category filtering to include/exclude specific metric types.
  • Documentation

    • Added comprehensive metrics endpoint documentation, configuration reference, and getting-started example.

Contributor

coderabbitai Bot commented May 17, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: d493fa26-1f1d-40de-89fa-67256bf94562

📥 Commits

Reviewing files that changed from the base of the PR and between b0b56f0 and 65d2ba3.

📒 Files selected for processing (1)
  • libs/skills/catalog/skills-manifest.json
✅ Files skipped from review due to trivial changes (1)
  • libs/skills/catalog/skills-manifest.json

📝 Walkthrough

Walkthrough

This PR implements an opt-in Prometheus-compatible /metrics endpoint for FrontMCP, featuring Prometheus text/JSON exposition renderers, per-scrape process metrics collection (CPU, memory, event-loop lag, file descriptors), configurable authentication, category-based filtering, server integration, comprehensive tests, and user documentation.
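
For readers unfamiliar with the auth flow summarized above, here is a minimal sketch of a token check that looks up the Authorization header case-insensitively and returns one of the three status codes the route handler is described as producing. The PR summary does not say which condition maps to which code, so the mapping (401 for a missing or malformed header, 403 for a wrong token), the Bearer scheme, and every name below are assumptions for illustration, not the actual MetricsService API.

```typescript
// Hypothetical types and names; the real handler in metrics.routes.ts may differ.
type Headers = Record<string, string | string[] | undefined>;
type AuthStatus = 200 | 401 | 403;

// Find a header regardless of how the adapter cased its name.
function getHeader(headers: Headers, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const [key, value] of Object.entries(headers)) {
    if (key.toLowerCase() === wanted) {
      return Array.isArray(value) ? value[0] : value;
    }
  }
  return undefined;
}

// Compare without exiting early on the first differing character.
function constantTimeEqual(a: string, b: string): boolean {
  if (a.length !== b.length) return false;
  let diff = 0;
  for (let i = 0; i < a.length; i++) {
    diff |= a.charCodeAt(i) ^ b.charCodeAt(i);
  }
  return diff === 0;
}

// expectedToken === undefined models the "public" auth mode (assumption).
function authorize(headers: Headers, expectedToken: string | undefined): AuthStatus {
  if (expectedToken === undefined) return 200;
  const header = getHeader(headers, 'authorization');
  if (!header) return 401; // no credentials supplied
  const match = /^Bearer\s+(.+)$/i.exec(header);
  if (!match) return 401; // wrong scheme or empty token
  return constantTimeEqual(match[1] ?? '', expectedToken) ? 200 : 403;
}
```

A scraper sending `AUTHORIZATION: Bearer <token>` and one sending `authorization: Bearer <token>` are treated the same, which is the point of the case-insensitive lookup.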

Changes

Metrics Endpoint Feature

| Layer / File(s) | Summary |
| --- | --- |
| Prometheus exposition renderers & tests<br>libs/observability/src/prometheus/render.ts, libs/observability/src/prometheus/__tests__/render.spec.ts, libs/observability/src/prometheus/index.ts, libs/observability/src/index.ts | renderPrometheusExposition() formats counters/gauges as Prometheus text (0.0.4) with deterministic ordering, label escaping, and grouped HELP/TYPE output. renderJsonExposition() returns a { counters, gauges } envelope. Tests verify formatting, label escaping, metric filtering, and stability. |
| Process stats collector & tests<br>libs/observability/src/process-stats/process-stats.collector.ts, libs/observability/src/process-stats/__tests__/process-stats.collector.spec.ts, libs/observability/src/process-stats/index.ts | ProcessStatsCollector emits per-scrape process metrics: CPU (user/system seconds), memory (rss/heap/external), uptime, optional event-loop lag quantiles, optional active handles/requests, and optional Linux file descriptor count. Supports full DI for all probes. Tests cover default and injected behaviors, end-to-end scenarios with real Node APIs, and edge cases (NaN/Infinity handling, missing APIs). |
| Configuration types & schemas<br>libs/sdk/src/common/types/options/metrics/interfaces.ts, libs/sdk/src/common/types/options/metrics/schema.ts, libs/sdk/src/common/types/options/metrics/index.ts, libs/sdk/src/common/types/options/index.ts, libs/sdk/src/common/metadata/front-mcp.metadata.ts, libs/sdk/src/common/tokens/front-mcp.tokens.ts | Introduces MetricsFormat (prometheus/json), MetricsAuth (public/token/inline), MetricsCategory enum, and MetricsOptionsInterface for /metrics configuration. Zod schemas validate and provide defaults (disabled by default). Metadata and token wiring enables decorator parsing. |
| MetricsService implementation & errors<br>libs/sdk/src/metrics/metrics.service.ts, libs/sdk/src/metrics/metrics.errors.ts, libs/sdk/src/metrics/__tests__/metrics.service.spec.ts, libs/sdk/src/metrics/index.ts | MetricsService validates metrics.path against reserved MCP prefixes, resolves token auth from env or inline config, filters counters/gauges by include[] categories, and renders snapshots. Throws MetricsPathConflictError and MetricsTokenNotConfiguredError on startup validation. Tests cover path guards, auth logic, filtering, and both Prometheus and JSON rendering. |
| Route registration & handler<br>libs/sdk/src/metrics/metrics.routes.ts, libs/sdk/src/metrics/__tests__/metrics.routes.spec.ts | registerMetricsRoutes() conditionally registers a GET handler when enabled, reads Authorization headers case-insensitively, calls authorize() (returns 200/401/403), sets Cache-Control headers, and serves either JSON or Prometheus text. Guards against JSON parse errors and missing adapter methods. Tests verify registration, auth flows (public/token/inline), format switching, and error handling. |
| Server wiring & lifecycle<br>libs/sdk/src/front-mcp/front-mcp.ts, libs/sdk/src/server/server.instance.ts | FrontMcpInstance calls wireMetricsService() during start() and serverless handler creation, instantiating MetricsService when metrics.enabled === true (disabled by default). FrontMcpServerInstance tracks metrics state and conditionally registers routes via setMetricsService() and setMetricsConfig() methods; routes only registered when both service and config are present and enabled. |
| Documentation & examples<br>docs/frontmcp/deployment/metrics.mdx, docs/docs.json, libs/skills/catalog/frontmcp-observability/examples/metrics-endpoint/enable-metrics-endpoint.md, libs/skills/catalog/frontmcp-observability/references/metrics-endpoint.md, libs/skills/catalog/frontmcp-observability/SKILL.md, libs/skills/catalog/skills-manifest.json | Comprehensive deployment docs cover endpoint purpose (off by default, same listener as /healthz), quick-start config, response formats, auth modes with token env behavior, category filtering, path conflict guard, OpenTelemetry interaction, and custom counter guidance. Quick-enable example shows @FrontMcp({ metrics: { enabled: true } }) with curl verification. Skill catalog and manifest updated with metrics-endpoint references and examples. |
| Supporting changes & utilities<br>libs/observability/src/telemetry/__tests__/telemetry.factory.spec.ts, libs/utils/src/fs/fs.ts, libs/utils/src/fs/index.ts, libs/utils/src/index.ts, libs/skills/__tests__/skills-validation.spec.ts | Telemetry factory tests validate counter/span creation and withSpan async handling. New readdirSync() utility for synchronous directory listing with lazy fs module loading. Skills validation test exempts metrics-endpoint docs from auth-shorthand checks. |
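
To make the renderer row concrete, the following is a rough sketch of how one metric family could be written in the Prometheus text format with escaped label values, sorted label keys, and a HELP/TYPE header. It is not the code in render.ts: the `Sample` shape and the function names are made up for illustration, and HELP-text escaping is left out for brevity.

```typescript
// Hypothetical sample shape; the real types in render.ts may differ.
interface Sample {
  labels?: Record<string, string>;
  value: number;
}

// The text format requires escaping backslash, double quote, and newline in label values.
function escapeLabelValue(v: string): string {
  return v.replace(/\\/g, '\\\\').replace(/"/g, '\\"').replace(/\n/g, '\\n');
}

// Sort label keys so the same sample always renders to the same string.
function labelString(labels: Record<string, string> = {}): string {
  const pairs = Object.entries(labels).sort(([a], [b]) => (a < b ? -1 : a > b ? 1 : 0));
  if (pairs.length === 0) return '';
  return '{' + pairs.map(([k, v]) => `${k}="${escapeLabelValue(v)}"`).join(',') + '}';
}

// Non-finite values have dedicated spellings in the text format.
function formatValue(v: number): string {
  if (Number.isNaN(v)) return 'NaN';
  if (v === Infinity) return '+Inf';
  if (v === -Infinity) return '-Inf';
  return String(v);
}

// Render one family: HELP line, TYPE line, then sample lines in sorted order.
function renderFamily(
  name: string,
  type: 'counter' | 'gauge',
  help: string,
  samples: Sample[],
): string {
  const lines = samples
    .map((s) => `${name}${labelString(s.labels)} ${formatValue(s.value)}`)
    .sort();
  return [`# HELP ${name} ${help}`, `# TYPE ${name} ${type}`, ...lines].join('\n') + '\n';
}
```

Sorting both the label keys and the sample lines is one simple way to get the deterministic ordering the summary mentions; the actual renderer may order things differently.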

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

📊 A /metrics path now glows with Prometheus light,
Process stats and counters dancing through the night,
Token auth keeps watch while categories align,
From process CPU to event-loop lag so fine,
Observability blooms—off by default, perfectly bright! 🐰✨

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 35.29% which is insufficient. The required threshold is 65.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main feature being added: a /metrics endpoint with Prometheus support and configuration options, which aligns with the core changes across observability, SDK, and documentation files. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 5

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@libs/observability/src/process-stats/process-stats.collector.ts`:
- Around line 86-91: In the defaultReadFdCount function replace the direct
require('node:fs') usage with the filesystem helpers from `@frontmcp/utils`:
import or call the utils' readdirSync (or equivalent synchronous directory read)
to list '/proc/self/fd' and return its length; keep the existing platform check
(process.platform !== 'linux') and catch behavior unchanged, and ensure types
match the previous usage (i.e., treat the utils API as returning a string[] so
defaultReadFdCount still returns number | undefined). Use the function name
defaultReadFdCount to locate the change.
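
A sketch of what the requested shape might look like, with the directory reader injected so the probe does not import a filesystem module itself. The function and type names here are illustrative, and passing the `readdirSync` helper from `@frontmcp/utils` as the `readdir` argument is an assumption about how the caller would wire it.

```typescript
// Hypothetical reader type: any synchronous function that lists a directory.
type ReaddirSync = (path: string) => string[];

// Count open file descriptors on Linux; return undefined anywhere else
// or when the directory cannot be read.
function readFdCount(readdir: ReaddirSync, platform: string): number | undefined {
  if (platform !== 'linux') return undefined;
  try {
    return readdir('/proc/self/fd').length;
  } catch {
    // Listing can fail in restricted environments; treat as "unknown".
    return undefined;
  }
}
```

Injecting both the reader and the platform string keeps the function testable on any OS, in line with the collector's stated support for DI on all probes.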

In `@libs/observability/src/prometheus/render.ts`:
- Around line 91-117: The code groups metric samples into byName using only
entry.name, which lets counters and gauges with the same name merge and produce
a wrong `# TYPE` block; update the grouping logic in the loops that handle
counters and gauges (the blocks that reference METRIC_NAME_REGEX, formatFloat,
sortedLabelString and manipulate byName and group) to detect cross-type
collisions: if byName.get(entry.name) exists with a different group.type, either
skip adding the conflicting sample and record/log the collision or create
separate keys that include the metric type (e.g., `${entry.name}|${type}`) so
counters and gauges are isolated; ensure help propagation (the group.help
assignment logic) still applies only within the same-type group.
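
One way the first option (skip the conflicting sample and record the collision) could look, reduced to the grouping step only. The `Entry` shape and `groupByName` are stand-ins rather than the identifiers in render.ts. The first type seen for a name wins, which is an arbitrary choice made for this sketch.

```typescript
// Hypothetical entry shape; labels and help text are omitted for brevity.
type MetricType = 'counter' | 'gauge';
interface Entry {
  name: string;
  type: MetricType;
  value: number;
}
interface Group {
  type: MetricType;
  values: number[];
}

// Group samples by metric name, refusing to merge samples whose type
// differs from the type already registered under that name.
function groupByName(entries: Entry[]): { byName: Map<string, Group>; collisions: string[] } {
  const byName = new Map<string, Group>();
  const collisions: string[] = [];
  for (const entry of entries) {
    const group = byName.get(entry.name);
    if (!group) {
      byName.set(entry.name, { type: entry.type, values: [entry.value] });
      continue;
    }
    if (group.type !== entry.type) {
      // Cross-type collision: keep the existing group, record the conflict.
      collisions.push(`${entry.name}: ${entry.type} conflicts with ${group.type}`);
      continue;
    }
    group.values.push(entry.value);
  }
  return { byName, collisions };
}
```

Skipping is used here instead of type-qualified keys because two `# TYPE` blocks for the same metric name would still be rejected by Prometheus parsers.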

In `@libs/sdk/src/metrics/__tests__/metrics.routes.spec.ts`:
- Around line 37-133: The tests mutate process.env['FRONTMCP_METRICS_TOKEN']
without restoring it; capture the original value at suite start (e.g. const
originalMetricsToken = process.env['FRONTMCP_METRICS_TOKEN'] or use a let) and
restore it in an afterEach hook so each test leaves env unchanged; update the
describe block around registerMetricsRoutes to save the value before token-auth
tests run and call process.env['FRONTMCP_METRICS_TOKEN'] = originalMetricsToken
(or delete it if undefined) in afterEach to isolate state changes.
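
The save-and-restore pattern described above can be sketched as a small helper. `snapshotEnv` is a hypothetical name, and the env object is passed in so the snippet does not depend on Node typings; in the spec it would presumably be called with `process.env` in `beforeEach` and the returned function invoked in `afterEach`.

```typescript
// Hypothetical helper: capture one env variable and return a restore function.
type Env = Record<string, string | undefined>;

function snapshotEnv(env: Env, key: string): () => void {
  const original = env[key];
  return () => {
    if (original === undefined) {
      // Assigning undefined to process.env stores the string "undefined",
      // so an originally-unset variable has to be deleted instead.
      delete env[key];
    } else {
      env[key] = original;
    }
  };
}
```

The delete branch is the detail that is easy to miss: a plain reassignment would leave the variable set for every later test.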

In `@libs/sdk/src/metrics/__tests__/metrics.service.spec.ts`:
- Line 121: Replace the incorrect use of Parameters with ConstructorParameters
when deriving the type of the MetricsService constructor's second argument in
the tests: locate the casts using Parameters<typeof MetricsService>[1] (used
around the mock constructor args for MetricsService) and change them to
ConstructorParameters<typeof MetricsService>[1]; ensure both occurrences (the
one near the first mock and the second near the other mock) are updated so the
test types correctly reflect the class constructor signature.
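
For context on why the utility type matters: `Parameters<T>` requires `T` to be a callable function type, while a class value has a construct signature, so `ConstructorParameters<T>` is the utility that reads constructor arguments. The class below is a stand-in invented for the example, not the real MetricsService.

```typescript
// Stand-in class; the real MetricsService constructor signature is not shown in this PR summary.
class ExampleService {
  constructor(
    readonly name: string,
    readonly deps: { now: () => number },
  ) {}
}

// Type of the constructor's second argument, derived from the class itself.
type ExampleDeps = ConstructorParameters<typeof ExampleService>[1];

// A mock typed this way breaks at compile time if the constructor signature changes.
const fakeDeps: ExampleDeps = { now: () => 42 };
const service = new ExampleService('demo', fakeDeps);
```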

In `@libs/skills/catalog/frontmcp-observability/references/metrics-endpoint.md`:
- Around line 99-102: The fenced code block containing the Prometheus sample
(the lines including "# TYPE my_cache_hits_total counter" and
"my_cache_hits_total{tier=\"l1\"} 1") lacks a language tag and triggers MD040;
update the opening triple-backtick to include a language tag (e.g., "text") so
the block becomes a typed fence (```text) to satisfy markdownlint and make the
snippet explicit.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: b2c61afa-f38c-492a-9428-a56621236b19

📥 Commits

Reviewing files that changed from the base of the PR and between 4328981 and 154767f.

📒 Files selected for processing (29)
  • .gitignore
  • docs/docs.json
  • docs/frontmcp/deployment/metrics.mdx
  • libs/observability/src/index.ts
  • libs/observability/src/process-stats/__tests__/process-stats.collector.spec.ts
  • libs/observability/src/process-stats/index.ts
  • libs/observability/src/process-stats/process-stats.collector.ts
  • libs/observability/src/prometheus/__tests__/render.spec.ts
  • libs/observability/src/prometheus/index.ts
  • libs/observability/src/prometheus/render.ts
  • libs/sdk/src/common/metadata/front-mcp.metadata.ts
  • libs/sdk/src/common/tokens/front-mcp.tokens.ts
  • libs/sdk/src/common/types/options/index.ts
  • libs/sdk/src/common/types/options/metrics/index.ts
  • libs/sdk/src/common/types/options/metrics/interfaces.ts
  • libs/sdk/src/common/types/options/metrics/schema.ts
  • libs/sdk/src/front-mcp/front-mcp.ts
  • libs/sdk/src/metrics/__tests__/metrics.routes.spec.ts
  • libs/sdk/src/metrics/__tests__/metrics.service.spec.ts
  • libs/sdk/src/metrics/index.ts
  • libs/sdk/src/metrics/metrics.errors.ts
  • libs/sdk/src/metrics/metrics.routes.ts
  • libs/sdk/src/metrics/metrics.service.ts
  • libs/sdk/src/server/server.instance.ts
  • libs/skills/__tests__/skills-validation.spec.ts
  • libs/skills/catalog/frontmcp-observability/SKILL.md
  • libs/skills/catalog/frontmcp-observability/examples/metrics-endpoint/enable-metrics-endpoint.md
  • libs/skills/catalog/frontmcp-observability/references/metrics-endpoint.md
  • libs/skills/catalog/skills-manifest.json

Comment thread libs/observability/src/process-stats/process-stats.collector.ts
Comment thread libs/observability/src/prometheus/render.ts
Comment thread libs/sdk/src/metrics/__tests__/metrics.routes.spec.ts
Comment thread libs/sdk/src/metrics/__tests__/metrics.service.spec.ts
Comment thread libs/skills/catalog/frontmcp-observability/references/metrics-endpoint.md Outdated
Contributor

github-actions Bot commented May 24, 2026

Performance Test Results

Status: ✅ All tests passed

Summary

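| Project | Tests | Passed | Warnings | Failed | Leaks |
| --- | --- | --- | --- | --- | --- |
| ✅ demo-e2e-agents | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-cache | 11 | 11 | 0 | 0 | 0 |
| ✅ demo-e2e-codecall | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-config | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-direct | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-elicitation | 1 | 1 | 0 | 0 | 0 |
| ✅ demo-e2e-errors | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-hooks | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-multiapp | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-notifications | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-openapi | 2 | 2 | 0 | 0 | 0 |
| ✅ demo-e2e-providers | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-public | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-redis | 15 | 15 | 0 | 0 | 0 |
| ✅ demo-e2e-remember | 4 | 4 | 0 | 0 | 0 |
| ✅ demo-e2e-remote | 5 | 5 | 0 | 0 | 0 |
| ✅ demo-e2e-serverless | 2 | 2 | 0 | 0 | 0 |
| ✅ demo-e2e-skills | 15 | 15 | 0 | 0 | 0 |
| ✅ demo-e2e-standalone | 2 | 2 | 0 | 0 | 0 |
| ✅ demo-e2e-transport-recreation | 3 | 3 | 0 | 0 | 0 |
| ✅ demo-e2e-ui | 4 | 4 | 0 | 0 | 0 |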

Total: 101 tests across 21 projects

📊 View full report in workflow run


Generated at: 2026-05-24T17:09:02.670Z
Commit: 0415aa83

Contributor

@coderabbitai coderabbitai Bot left a comment

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
libs/sdk/src/metrics/metrics.routes.ts (1)

70-71: ⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Set Content-Type only on success paths.

Content-Type is set to the metrics format before branching, so the JSON 500 response at line 96 can inherit text/plain on some adapters. Move the Content-Type assignment into each success branch (or explicitly set a JSON content type in the error branches).

💡 Suggested patch
     const result = service.getMetrics();
     res.setHeader?.('Cache-Control', 'no-store');
-    res.setHeader?.('Content-Type', result.contentType);

     if ((config.format ?? 'prometheus') === 'json') {
       try {
+        res.setHeader?.('Content-Type', result.contentType);
         res.status(200).json(JSON.parse(result.body));
       } catch {
         // `getMetrics()` builds the JSON via `JSON.stringify`, so this
         // branch should be unreachable — but if a downstream override
         // produces malformed JSON we surface a 500 rather than letting
         // the parse exception escape the route handler.
+        res.setHeader?.('Content-Type', 'application/json; charset=utf-8');
         res.status(500).json({
           error: 'internal_error',
           message: 'Failed to serialise metrics JSON',
         });
       }
       return;
     }
     if (typeof res.send === 'function') {
+      res.setHeader?.('Content-Type', result.contentType);
       res.status(200);
       res.send(result.body);
       return;
     }
     // Prometheus scrape format is `text/plain` — wrapping it in JSON
     // would silently break every scraper. Surface a 500 instead so the
     // adapter mismatch is visible.
+    res.setHeader?.('Content-Type', 'application/json; charset=utf-8');
     res.status(500).json({
       error: 'internal_error',
       message: 'Server adapter does not support Prometheus text format. Use `format: "json"` or upgrade the adapter.',
     });

Also applies to: 93-99

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@libs/sdk/src/metrics/metrics.routes.ts` around lines 70 - 71, The
Content-Type header is being set unconditionally
(res.setHeader?.('Content-Type', result.contentType)) which lets error paths
inherit the wrong type; update the metrics handler to set Content-Type only in
each success branch that sends a formatted metric (use result.contentType where
the response body is written in the success branches) and explicitly set
'application/json' (or appropriate error content type) in the error/500 branches
(e.g., the JSON error response at the path that returns 500). Locate the header
usage in metrics.routes.ts (the res.setHeader calls and the success branches
that use result.contentType) and move or add the header assignments accordingly
so errors never inherit the success content type.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 41cccfc0-0198-452b-967c-656df2ced164

📥 Commits

Reviewing files that changed from the base of the PR and between 154767f and b0b56f0.

📒 Files selected for processing (13)
  • libs/observability/src/process-stats/__tests__/process-stats.collector.spec.ts
  • libs/observability/src/process-stats/process-stats.collector.ts
  • libs/observability/src/telemetry/__tests__/telemetry.factory.spec.ts
  • libs/sdk/src/front-mcp/front-mcp.ts
  • libs/sdk/src/metrics/__tests__/metrics.routes.spec.ts
  • libs/sdk/src/metrics/__tests__/metrics.service.spec.ts
  • libs/sdk/src/metrics/index.ts
  • libs/sdk/src/metrics/metrics.routes.ts
  • libs/sdk/src/metrics/metrics.service.ts
  • libs/skills/catalog/frontmcp-observability/references/metrics-endpoint.md
  • libs/utils/src/fs/fs.ts
  • libs/utils/src/fs/index.ts
  • libs/utils/src/index.ts
✅ Files skipped from review due to trivial changes (1)
  • libs/skills/catalog/frontmcp-observability/references/metrics-endpoint.md

@frontegg-david frontegg-david merged commit 435ba4f into main May 24, 2026
32 checks passed
@frontegg-david frontegg-david deleted the fix/397-metrics-endpoint branch May 24, 2026 17:32

Development

Successfully merging this pull request may close these issues.

Feature request: built-in /metrics endpoint exposing CPU / memory / resource usage

1 participant