docs: update hyperdrive observability metrics documentation #31729
ivoryibu wants to merge 1 commit into
This pull request requires reviews from CODEOWNERS as it changes files that match the following patterns:
Review Summary
This is a clear, well-targeted docs change that directly addresses a recurring customer escalation path. The new "How Hyperdrive distributes your connection pool" subsection and the clarifications in the Dashboard and GraphQL sections do a good job of explaining why metrics can appear lower than expected.
What changed
- Added an H4 subsection explaining per-machine pool sharding (five machines per Cloudflare location).
- Clarified that pools start empty and reuse connections, so open connection counts remain low rather than climbing toward the full limit.
- Added a `:::note` explaining that `waitingClients` is a point-in-time count, not a measure of wait duration.
- Updated the Dashboard and GraphQL sections to state that metrics are per-machine and not aggregated.
Suggestions
I left a few non-blocking inline comments with suggestions to tighten wording and avoid potential confusion. Nothing that would block merging.
Overall, this looks good to merge after considering the inline suggestions.
| Connection contention appears as a spike in waiting clients, or when open connections consistently approach the pool size maximum. If your open connections regularly approach this limit, consider [increasing your Hyperdrive connection limit](/hyperdrive/platform/limits/#request-a-limit-increase). | ||
| #### How Hyperdrive distributes your connection pool | ||
| Hyperdrive shards your connection pool across five machines within each Cloudflare location. Your configured maximum is divided evenly across these machines. For example, a connection limit of 100 gives each machine a maximum of 20 connections (100 / 5). |
Suggestion: "divided evenly" implies exact equal distribution for every possible limit. For limits not divisible by 5 (for example, 97), this may not be literally true. Consider softening to "divided across" or adding an "approximately" qualifier.
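The arithmetic behind this suggestion can be illustrated with a short sketch. The five-machine figure comes from the PR text. The helper name `per_machine_slices` and the floor-plus-remainder split are assumptions made purely for illustration, because the PR does not say how Hyperdrive handles limits that are not divisible by five.

```python
MACHINES_PER_LOCATION = 5  # from the PR text: five machines per Cloudflare location


def per_machine_slices(limit: int, machines: int = MACHINES_PER_LOCATION) -> list[int]:
    """Split a connection limit across machines.

    ASSUMPTION: leftover connections are handed to the first machines.
    This is one plausible scheme, not documented Hyperdrive behavior.
    """
    base, remainder = divmod(limit, machines)
    return [base + 1 if i < remainder else base for i in range(machines)]


print(per_machine_slices(100))  # [20, 20, 20, 20, 20]
print(per_machine_slices(97))   # [20, 20, 19, 19, 19]
```

With a limit of 100 every machine gets exactly 20, which matches the example in the docs. With 97 no exactly equal split exists, which is why "divided evenly" may overstate the case.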
| Connection contention appears as a sustained spike in waiting clients, or when per-machine open connections consistently approach the per-machine slice of your pool size maximum. If your open connections regularly approach this limit, consider [increasing your Hyperdrive connection limit](/hyperdrive/platform/limits/#request-a-limit-increase). | ||
| To estimate your total connection usage across a configuration, sum the per-location values returned by the [GraphQL Analytics API](#query-via-the-graphql-api). |
Suggestion: This says to "sum the per-location values," but the GraphQL section clarifies that each returned value is per-machine. This could be misread as summing one value per location. Consider rewording to something like: "To estimate your total connection usage across a configuration, sum all values returned by the GraphQL Analytics API."
| Hyperdrive's GraphQL datasets require an `accountTag` filter with your Cloudflare account ID. Hyperdrive exposes the `hyperdriveQueriesAdaptiveGroups` and `hyperdrivePoolSizesAdaptiveGroups` datasets. | ||
| The `hyperdrivePoolSizesAdaptiveGroups` dataset reports per-machine values grouped by the `coloCode` dimension, rather than a single aggregated total. To approximate your total connection usage across all locations, sum the values across each `coloCode` group. This remains an estimate, because each value reflects a single machine rather than a location's full pool. Refer to [How Hyperdrive distributes your connection pool](#how-hyperdrive-distributes-your-connection-pool) for details. |
Suggestion: The instruction to "sum the values across each coloCode group" is slightly ambiguous about whether multiple entries per coloCode need to be summed. Consider making it explicit, for example: "To approximate your total connection usage across all locations, sum the values for every entry returned. Each entry reflects a single machine, so you can also group by coloCode first if you need a per-location estimate."
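The two aggregations this suggestion distinguishes could be sketched as follows. The row shape (a `dimensions.coloCode` field and a `max.currentPoolSize` field) is an assumption about how the response is structured, the sample values are made up, and `total_usage` and `usage_by_colo` are hypothetical helper names.

```python
from collections import defaultdict

# Made-up sample rows. The nesting is an ASSUMED response shape, not confirmed
# API output. Each row stands for one machine's reported value.
rows = [
    {"dimensions": {"coloCode": "SJC"}, "max": {"currentPoolSize": 12}},
    {"dimensions": {"coloCode": "SJC"}, "max": {"currentPoolSize": 9}},
    {"dimensions": {"coloCode": "LHR"}, "max": {"currentPoolSize": 4}},
]


def total_usage(rows: list[dict]) -> int:
    """Sum every returned entry to estimate total connection usage."""
    return sum(row["max"]["currentPoolSize"] for row in rows)


def usage_by_colo(rows: list[dict]) -> dict[str, int]:
    """Group entries by coloCode first to get a per-location estimate."""
    totals: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row["dimensions"]["coloCode"]] += row["max"]["currentPoolSize"]
    return dict(totals)


print(total_usage(rows))    # 25
print(usage_by_colo(rows))  # {'SJC': 21, 'LHR': 4}
```

Summing every entry and summing per `coloCode` give the same grand total. The grouping step only matters when a per-location figure is wanted, which is the distinction the reworded sentence would make explicit.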
PR #31729 Review Complete
Labels applied:
Summary
This PR updates
What I found
Issues flagged (non-blocking)
I posted three inline review comments with suggestions to tighten wording:
Overall, this is a good change that directly addresses a real support escalation pattern. I left the review as Comment rather than Approve or Request changes so the author can decide whether to accept the wording suggestions. No commits were pushed and no new PRs were opened.
Preview URL: https://1b543f5b.preview.developers.cloudflare.com
Files with changes (up to 15)
Clarifies how Hyperdrive connection pool metrics are reported, to reduce confusion from customers who see "Open Connections" plateau below their configured limit and assume the product is throttling them. This is a recurring escalation path where the pooling behavior is healthy but the per-machine metric presentation looks broken.
- `max.currentPoolSize` plateaus at the per-machine slice rather than the full limit even when healthy.
- `waitingClients` is a point-in-time count, not a duration. A brief wait (typically milliseconds) registers as a spike, which customers misread as clients being stuck.
- `hyperdrivePoolSizesAdaptiveGroups` reports per-machine values grouped by `coloCode`, and summing across `coloCode` groups is the closest available estimate of total usage.

Source: recent connection-limit escalations and the near-term documentation actions in the pool metrics PRD. No data pipeline or schema changes are involved.