docs: update hyperdrive observability metrics documentation by ivoryibu · Pull Request #31729 · cloudflare/cloudflare-docs

ivoryibu · 2026-06-26T16:10:32Z

Clarifies how Hyperdrive connection pool metrics are reported, to reduce confusion among customers who see "Open Connections" plateau below their configured limit and assume the product is throttling them. This is a recurring escalation path in which the pooling behavior is healthy but the per-machine metric presentation looks broken.

Add a "How Hyperdrive distributes your connection pool" subsection to Pool size metrics explaining that the pool is sharded across 5 machines per Cloudflare location, so the configured maximum is divided across them (for example, a limit of 100 gives each machine 20). This explains why max.currentPoolSize plateaus at the per-machine slice rather than the full limit even when healthy.
Note that pools start empty and reuse connections, so Hyperdrive opens relatively few origin connections. This addresses customers who expect to see connection counts climb toward their full limit.
Add a note clarifying that waitingClients is a point-in-time count, not a duration. A brief wait (typically milliseconds) registers as a spike, which customers misread as clients being stuck.
State that pool size metrics are currently per-machine and not aggregated, and tell customers to sum per-location values to estimate total usage. Phrased as current behavior to leave room for a future aggregate ("All locations") view.
Add context to the dashboard section noting the Pool connections chart shows per-machine values.
Add a GraphQL API note that hyperdrivePoolSizesAdaptiveGroups reports per-machine values grouped by coloCode, and that summing across coloCode groups is the closest available estimate of total usage.
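The `waitingClients` note and the contention guidance in this PR (a sustained spike in waiting clients, or per-machine open connections that consistently approach the per-machine slice) can be sketched as a simple check over a series of samples. This is a minimal sketch, not Hyperdrive tooling: the `PoolSample` shape, the window size, and the 90% threshold are all hypothetical, and the samples are assumed to come from a single machine, in time order.

```typescript
// Hypothetical per-machine sample. Field names mirror the metrics discussed
// in this PR but are assumptions, not a documented schema.
interface PoolSample {
  waitingClients: number;  // point-in-time count of waiting clients, not a duration
  currentPoolSize: number; // open origin connections on this machine
}

// True only if every one of the last `windowSize` samples has waiting clients,
// so a brief blip captured in a single sample does not count as contention.
function hasSustainedWaiting(samples: PoolSample[], windowSize: number): boolean {
  if (windowSize <= 0 || samples.length < windowSize) return false;
  return samples.slice(-windowSize).every((s) => s.waitingClients > 0);
}

// True if open connections stay at or above `threshold` of the per-machine
// slice for every sample in the window (the threshold is an arbitrary choice).
function nearPerMachineLimit(
  samples: PoolSample[],
  perMachineLimit: number,
  windowSize: number,
  threshold = 0.9,
): boolean {
  if (windowSize <= 0 || samples.length < windowSize) return false;
  return samples
    .slice(-windowSize)
    .every((s) => s.currentPoolSize >= threshold * perMachineLimit);
}

// Contention per the guidance: sustained waiting OR consistently near the slice.
function looksContended(
  samples: PoolSample[],
  perMachineLimit: number,
  windowSize: number,
): boolean {
  return (
    hasSustainedWaiting(samples, windowSize) ||
    nearPerMachineLimit(samples, perMachineLimit, windowSize)
  );
}

// Usage: a single spike versus a sustained one, with a per-machine slice of 20.
const blip: PoolSample[] = [
  { waitingClients: 0, currentPoolSize: 6 },
  { waitingClients: 3, currentPoolSize: 7 },
  { waitingClients: 0, currentPoolSize: 6 },
  { waitingClients: 0, currentPoolSize: 6 },
];
const sustained: PoolSample[] = [
  { waitingClients: 0, currentPoolSize: 18 },
  { waitingClients: 2, currentPoolSize: 20 },
  { waitingClients: 4, currentPoolSize: 20 },
  { waitingClients: 3, currentPoolSize: 20 },
];
console.log(looksContended(blip, 20, 3));      // false
console.log(looksContended(sustained, 20, 3)); // true
```

The slice of 20 matches the PR's example of a limit of 100 split across five machines. Requiring every sample in the window to show waiting clients is one simple way to separate a millisecond-scale spike from a real backlog; a longer window makes the check more conservative.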

Source: recent connection-limit escalations and the near-term documentation actions in the pool metrics PRD. No data pipeline or schema changes are involved.

cloudflare-docs-bot · 2026-06-26T16:10:39Z

Review

⚠️ 1 warning found in commit 1b543f5.

Code Review

This code review is in beta and may not always be helpful — use your judgment.

Warnings (1)

File	Issue
`hyperdrive/observability/metrics.mdx` line 62	Inconsistent aggregation guidance — Line 62 tells readers to estimate total connection usage by summing per-location values. Lines 50-54 and 89 state that pool metrics are reported per machine, with five machines per Cloudflare location, and that each returned value reflects a single machine rather than a location's full pool. If the query returns one value per coloCode, summing those values would undercount total usage by a factor of five. Fix: Clarify whether readers should sum all per-machine rows from the GraphQL response, or adjust the guidance to multiply by the number of machines per location when estimating total usage.
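The warning's concern can be made concrete with a little arithmetic. The sketch below assumes, hypothetically, that a query returns exactly one per-machine value per `coloCode` and that the reported machine is representative of the other machines in its location. Under those assumptions, a plain sum understates a location-level total by the machine count, which the PR description gives as five. The `ColoRow` shape and the example values are illustrative only.

```typescript
// Hypothetical row: one per-machine value reported for a location.
interface ColoRow {
  coloCode: string;
  currentPoolSize: number; // open connections on a single machine
}

// Machines per Cloudflare location, as stated in the PR description.
const MACHINES_PER_LOCATION = 5;

// Plain sum of the returned values (one machine per location).
function plainSum(rows: ColoRow[]): number {
  return rows.reduce((total, row) => total + row.currentPoolSize, 0);
}

// Rough location-level estimate: assumes every machine in a location
// carries about the same load as the one that was reported.
function scaledEstimate(
  rows: ColoRow[],
  machinesPerLocation: number = MACHINES_PER_LOCATION,
): number {
  return plainSum(rows) * machinesPerLocation;
}

const rows: ColoRow[] = [
  { coloCode: "SJC", currentPoolSize: 12 },
  { coloCode: "LHR", currentPoolSize: 8 },
];
console.log(plainSum(rows));       // 20
console.log(scaledEstimate(rows)); // 100
```

If the response instead contains one row per machine, summing every row is the closer estimate and no scaling applies.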

Style Guide Review

✅ No style-guide issues found.

Commands

Only codeowners can run commands. Post a comment with the command to trigger it.

Command	Description
`/review`	Runs a review now. Incremental if a prior review exists, full if not.
`/full-review`	Re-reviews the entire PR diff from scratch, ignoring incremental history. Useful after a rebase, when you want a fresh review, or if the bot gets out of sync and reports issues that no longer exist.
`/ignore-review-limit`	Permanently lifts the 2-review automatic limit for this PR. Future pushes will trigger reviews as normal.

github-actions · 2026-06-26T16:11:15Z

This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:

Pattern	Owners
`/src/content/docs/hyperdrive/`	`@elithrar`, `@rita3ko`, `@irvinebroque`, `@vy-ton`, `@ivoryibu`, `@thomasgauvin`, `@sejoker`, `@oxyjun`, `@knickish`, `@cloudflare/product-owners`

ask-bonk

Review Summary

This is a clear, well-targeted docs change that directly addresses a recurring customer escalation path. The new "How Hyperdrive distributes your connection pool" subsection and the clarifications in the Dashboard and GraphQL sections do a good job of explaining why metrics can appear lower than expected.

What changed

Added an H4 subsection explaining per-machine pool sharding (five machines per Cloudflare location).
Clarified that pools start empty and reuse connections, so open connection counts remain low rather than climbing toward the full limit.
Added a :::note explaining that waitingClients is a point-in-time count, not a measure of wait duration.
Updated the Dashboard and GraphQL sections to state that metrics are per-machine and not aggregated.

Suggestions

I left a few non-blocking inline comments with suggestions to tighten wording and avoid potential confusion. Nothing that would block merging.

Overall, this looks good to merge after considering the inline suggestions.

ask-bonk · 2026-06-26T16:16:25Z

-Connection contention appears as a spike in waiting clients, or when open connections consistently approach the pool size maximum. If your open connections regularly approach this limit, consider [increasing your Hyperdrive connection limit](/hyperdrive/platform/limits/#request-a-limit-increase).
+#### How Hyperdrive distributes your connection pool
+
+Hyperdrive shards your connection pool across five machines within each Cloudflare location. Your configured maximum is divided evenly across these machines. For example, a connection limit of 100 gives each machine a maximum of 20 connections (100 / 5).


Suggestion: "divided evenly" implies exact equal distribution for every possible limit. For limits not divisible by 5 (for example, 97), this may not be literally true. Consider softening to "divided across" or adding an "approximately" qualifier.
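The suggestion is easy to see with a quick calculation: a limit of 100 divides into 20 per machine across five machines, but 97 cannot be split into equal whole-number slices. How Hyperdrive actually assigns a remainder is not described in this PR, so the `splitLimit` helper below and its policy of giving the first machines one extra connection are hypothetical.

```typescript
// Hypothetical helper: divide a connection limit across machines in one
// location. The remainder policy (first machines get one extra) is an
// illustrative assumption, not documented Hyperdrive behavior.
function splitLimit(limit: number, machines = 5): number[] {
  if (!Number.isInteger(limit) || limit < 0) {
    throw new RangeError("limit must be a non-negative integer");
  }
  if (!Number.isInteger(machines) || machines <= 0) {
    throw new RangeError("machines must be a positive integer");
  }
  const base = Math.floor(limit / machines);
  const remainder = limit % machines;
  return Array.from({ length: machines }, (_, i) => base + (i < remainder ? 1 : 0));
}

console.log(splitLimit(100)); // [ 20, 20, 20, 20, 20 ]
console.log(splitLimit(97));  // [ 20, 20, 19, 19, 19 ]
```

Under this policy the slices still sum to the configured limit, but they are not all equal, which is why "divided across" is the safer wording.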

ask-bonk · 2026-06-26T16:16:25Z

+
+Connection contention appears as a sustained spike in waiting clients, or when per-machine open connections consistently approach the per-machine slice of your pool size maximum. If your open connections regularly approach this limit, consider [increasing your Hyperdrive connection limit](/hyperdrive/platform/limits/#request-a-limit-increase).
+
+To estimate your total connection usage across a configuration, sum the per-location values returned by the [GraphQL Analytics API](#query-via-the-graphql-api).


Suggestion: This says to "sum the per-location values," but the GraphQL section clarifies that each returned value is per-machine. This could be misread as summing one value per location. Consider rewording to something like: "To estimate your total connection usage across a configuration, sum all values returned by the GraphQL Analytics API."

ask-bonk · 2026-06-26T16:16:25Z


 Hyperdrive's GraphQL datasets require an `accountTag` filter with your Cloudflare account ID. Hyperdrive exposes the `hyperdriveQueriesAdaptiveGroups` and `hyperdrivePoolSizesAdaptiveGroups` datasets.

+The `hyperdrivePoolSizesAdaptiveGroups` dataset reports per-machine values grouped by the `coloCode` dimension, rather than a single aggregated total. To approximate your total connection usage across all locations, sum the values across each `coloCode` group. This remains an estimate, because each value reflects a single machine rather than a location's full pool. Refer to [How Hyperdrive distributes your connection pool](#how-hyperdrive-distributes-your-connection-pool) for details.


Suggestion: The instruction to "sum the values across each coloCode group" is slightly ambiguous about whether multiple entries per coloCode need to be summed. Consider making it explicit, for example: "To approximate your total connection usage across all locations, sum the values for every entry returned. Each entry reflects a single machine, so you can also group by coloCode first if you need a per-location estimate."
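The suggested wording (sum every returned entry for a total, optionally grouping by `coloCode` first for per-location figures) can be sketched as follows. The row shape is an assumption: only the `hyperdrivePoolSizesAdaptiveGroups` dataset name, the `coloCode` dimension, and `max.currentPoolSize` appear in this PR, and the nesting under `dimensions` and `max` is a guess at a typical adaptive-groups response, so check it against the actual GraphQL schema before relying on it.

```typescript
// Hypothetical shape of one entry from hyperdrivePoolSizesAdaptiveGroups.
// Each entry is assumed to describe a single machine.
interface PoolSizeRow {
  dimensions: { coloCode: string };
  max: { currentPoolSize: number };
}

// Estimated total across a configuration: sum every entry returned.
function totalAcrossEntries(rows: PoolSizeRow[]): number {
  return rows.reduce((sum, row) => sum + row.max.currentPoolSize, 0);
}

// Optional per-location estimate: group entries by coloCode, then sum.
function perLocation(rows: PoolSizeRow[]): Map<string, number> {
  const byColo = new Map<string, number>();
  for (const row of rows) {
    const colo = row.dimensions.coloCode;
    byColo.set(colo, (byColo.get(colo) ?? 0) + row.max.currentPoolSize);
  }
  return byColo;
}

// Illustrative data: two machines reported for SJC, one for LHR.
const rows: PoolSizeRow[] = [
  { dimensions: { coloCode: "SJC" }, max: { currentPoolSize: 12 } },
  { dimensions: { coloCode: "SJC" }, max: { currentPoolSize: 7 } },
  { dimensions: { coloCode: "LHR" }, max: { currentPoolSize: 8 } },
];
console.log(totalAcrossEntries(rows));      // 27
console.log(perLocation(rows).get("SJC"));  // 19
```

Both results remain estimates, because each entry reflects a single machine rather than a location's full pool.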

ask-bonk · 2026-06-26T16:16:41Z

PR #31729 Review Complete

Labels applied: documentation, product:hyperdrive, content:edit

Summary

This PR updates src/content/docs/hyperdrive/observability/metrics.mdx to clarify how Hyperdrive reports connection pool metrics. It adds a new "How Hyperdrive distributes your connection pool" subsection explaining that pools are sharded across five machines per Cloudflare location, which causes per-machine metrics to plateau below the configured maximum. It also adds a note that waitingClients is a point-in-time count (not a duration), and updates the Dashboard and GraphQL sections to state that reported values are per-machine and not aggregated.

What I found

Title & description: The title follows commit conventions and the description is thorough and accurately reflects the diff.
Frontmatter & structure: Valid frontmatter, correct heading hierarchy (H2 → H3 → H4), no skipped levels.
Links: All internal links use root-relative paths with trailing slashes.
MDX syntax: No unescaped special characters; components are properly imported.
Style: Generally clean. One minor point: "five" is correctly spelled out per the style guide (numbers 0–9).

Issues flagged (non-blocking)

I posted three inline review comments with suggestions to tighten wording:

Line 50: "divided evenly" may not hold for limits not divisible by 5 — suggest softening the language.
Line 62: "sum the per-location values" is ambiguous because the API returns per-machine values — suggest clarifying that users should sum all returned values.
Line 89: The instruction to "sum the values across each coloCode group" is slightly ambiguous about multiple entries per coloCode — suggest making it explicit.

Overall, this is a good change that directly addresses a real support escalation pattern. I left the review as Comment rather than Approve or Request changes so the author can decide whether to accept the wording suggestions. No commits were pushed and no new PRs were opened.


github-actions · 2026-06-26T16:39:01Z

Preview URL: https://1b543f5b.preview.developers.cloudflare.com
Preview Branch URL: https://docs-hyperdrive-graphql-metrics.preview.developers.cloudflare.com

Files with changes (up to 15)

Original Link	Updated Link
https://developers.cloudflare.com/hyperdrive/observability/metrics/	https://docs-hyperdrive-graphql-metrics.preview.developers.cloudflare.com/hyperdrive/observability/metrics/

docs: update hyperdrive observability metrics documentation

1b543f5

ivoryibu requested review from a team, Oxyjun, elithrar, irvinebroque, rita3ko, sejoker, thomasgauvin and vy-ton as code owners June 26, 2026 16:10

github-actions Bot added product:hyperdrive and size/s labels Jun 26, 2026

github-actions Bot assigned elithrar, irvinebroque, Oxyjun, rita3ko, sejoker, thomasgauvin and vy-ton Jun 26, 2026

ask-bonk Bot added documentation and content:edit labels Jun 26, 2026

ivoryibu marked this pull request as draft June 26, 2026 16:16

ask-bonk Bot reviewed Jun 26, 2026

ivoryibu wants to merge 1 commit into production from docs/hyperdrive-graphql-metrics


8 participants

