conn pool: add NotifyPeerAndDrainExisting drain behavior by ivpr · Pull Request #45322 · envoyproxy/envoy

ivpr · 2026-05-27T21:12:09Z

This PR is part of a 3-PR stack. PR2 and PR3 are out-of-tree against ivpr/envoy and exist to demonstrate a concrete consumer of the API added here — they do not need to land in envoyproxy/envoy for this PR to be useful on its own.

# Where What

PR1 (this PR) envoyproxy/envoy Core: DrainBehavior::NotifyPeerAndDrainExisting enum value + virtual hook ActiveClient::notifyPeerAndDrain() + ClusterManager::drainConnections(cluster, predicate, drain_behavior) overload.

PR2 → ivpr/envoy#1 ivpr/envoy (stacked on PR1) Upstream consumer: reverse-tunnel acceptor extension registers a POST /reverse_tunnel/drain_cluster admin endpoint that fans out the new drain behavior to every envoy.clusters.reverse_connection cluster. Opt-in via envoy.reloadable_features.reverse_tunnel_drain_with_goaway.

PR3 → ivpr/envoy#2 ivpr/envoy (stacked on PR2) Downstream consumer: drain-aware HCM observes peer GOAWAY on reverse-tunnel sockets and asks the initiator IOHandle to drop the tunnel from tracking and dial a replacement. Same runtime guard.

End-to-end flow this PR enables

sequenceDiagram
    autonumber
    participant Op as Operator
    participant US as Upstream Envoy
    participant Tun as Reverse tunnel (HTTP/2)
    participant DS as Downstream Envoy

    Note over US,DS: Steady state. DS holds a long-lived HTTP/2 tunnel. <br/>US owns the HTTP/2 client codec via the reverse_connection cluster.

    Op->>US: POST /reverse_tunnel/drain_cluster
    rect rgba(150,200,255,0.15)
        Note over US: PR2 consumer. Admin handler iterates envoy.clusters.reverse_connection clusters <br/>and fans out the new drain behavior.
        US->>US: drainConnections(cluster, nullptr, NotifyPeerAndDrainExisting)
    end

    rect rgba(255,210,150,0.25)
        Note over US: PR1 (this PR). Pool snapshots ready, busy and early-data clients, <br/>then calls MultiplexedActiveClientBase.notifyPeerAndDrain on each, <br/>which calls codec_client_.goAway and transitions to Draining if any streams are active. <br/>is_draining_for_deletion_ stays false so the pool stays usable for new streams via fresh clients.
        US-->>Tun: HTTP/2 GOAWAY
    end

    Tun-->>DS: HTTP/2 GOAWAY

    rect rgba(180,255,180,0.20)
        Note over DS: PR3 consumer. Drain-aware HCM observes peer GOAWAY, <br/>asks the reverse-tunnel IOHandle to drop this tunnel from tracking <br/>and kick maintenance to dial a replacement.
        DS->>DS: markTunnelDrainingAndDialReplacement(key)
        DS-->>Tun: dial new tunnel to sibling US replica
    end

    Note over US,DS: In-flight HTTP/2 streams on the old tunnel complete normally. <br/>The drained client auto-closes once its last stream finishes (onStreamClosed line 311-313). <br/>The tunnel TCP-closes after that.

PR1 is the only piece that lands in envoyproxy/envoy core. PR2 and PR3 are in reverse tunnel code and are linked above so you can see exactly what a consumer of this API looks like — both how it's invoked (PR2) and how the multiplexed-peer GOAWAY observation can be wired on the receiving side (PR3). Since PR2 and PR3 depend on PR1, I will raise them as new PRs in envoy proxy once this PR is merged.

What PR2 and PR3 say about the shape of the new core API

DrainBehavior::NotifyPeerAndDrainExisting is strictly opt-in. Existing callers of drainConnections(DrainAndDelete) and friends see no behavior change. PR2 / PR3 also keep the behavior runtime-guarded so end-user reverse-tunnel deployments do not see it until they explicitly enable it.
ClusterManager::drainConnections(cluster, predicate, drain_behavior) is the three-arg overload that lets an extension target a single cluster with a specific behavior. PR2 uses it to target only envoy.clusters.reverse_connection clusters; everything else is left alone.
MultiplexedActiveClientBase::notifyPeerAndDrain() handles the "preserve in-flight streams" semantics: if codec_client_->numActiveRequests() > 0, the client transitions to Draining instead of closing; otherwise it closes immediately. This is the property PR2's scenario depends on — sibling-replica failover should not kill the requests already on the wire.

How drained connections eventually close (no new flag required)

The new branch in drainConnectionsImpl does not set is_draining_for_deletion_. Clients close via existing pre-existing paths instead:

Idle-at-drain-time clients close immediately. MultiplexedActiveClientBase::notifyPeerAndDrain calls codec_client_->close() directly when numActiveRequests() == 0. The fall-through into closeIdleConnectionsForDrainingPool() flushes connecting_clients_.
Busy-then-drained clients close on last-stream-finish. They sit in Draining state while in-flight streams complete. When the last stream finishes, the existing block at conn_pool_base.cc lines 311-313 inside onStreamClosed closes the client. This is the same path the existing MaxConnectionDuration and MaxStreamsPerConnection drains rely on today.
Peer-initiated close on receipt of the GOAWAY shows up via the normal ConnectionEvent::RemoteClose handling.
Server drain timeout (--drain-time-s) is the last-resort safety net.

The pool itself stays alive (intentionally — that is why we are renaming away from …AndDelete). New requests for the same host land in newStreamImpl and pass through the !is_draining_for_deletion_ ASSERT (no crash); they find no ready clients (all drained), so tryCreateNewConnections() spins up a fresh non-Draining client.

Commit message body

Add a new ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting value that mirrors DrainExistingConnections semantically but first notifies the peer that the pool is draining. For multiplexed pools (HTTP/2, HTTP/3) the notification is a protocol-level GOAWAY; for HTTP/1 and raw TCP there is no peer-level signal and the behavior degrades to plain DrainExistingConnections (the peer learns about the drain when the TCP socket closes). Unlike DrainAndDelete, the pool stays usable: new streams are served by fresh clients while drained clients finish their in-flight work and close on idle.

What this change introduces

A new enum value ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting.
A virtual hook ConnectionPool::ActiveClient::notifyPeerAndDrain() with a default no-op implementation (so HTTP/1 + TCP pools support the new behavior without any code change — they just don't emit a peer signal).
An override on MultiplexedActiveClientBase that calls codec_client_->goAway() and transitions the client to Draining if any streams are active, otherwise closes the client immediately.
A branch in ConnPoolImplBase::drainConnectionsImpl() that snapshots ready / busy / early-data clients, invokes the new hook on each, and falls through into the existing DrainExistingConnections body so closeIdleConnectionsForDrainingPool() handles the connecting-clients cleanup. The branch does NOT set is_draining_for_deletion_, so newStreamImpl continues to accept new requests after the drain (served by fresh clients).
A new three-arg ClusterManager::drainConnections(cluster, predicate, drain_behavior) overload (parallel to the existing two-arg overload) so callers can target a single cluster with a specific behavior.

Why this isn't `GoAwayAndDrainAndDelete`

Earlier revisions of this PR named the value GoAwayAndDrainAndDelete and modeled it after DrainAndDelete (set is_draining_for_deletion_ = true, mark the pool for deletion). That was incorrect for the motivating use case (reverse-connection clusters), because the cluster's hosts remain LB-selectable after the drain — a new request after drain would trip ASSERT(!is_draining_for_deletion_) in newStreamImpl and crash the process. The new semantics keep the pool usable; closure happens naturally on stream completion via the existing onStreamClosed close-on-idle logic.

GoAway was also HTTP/2-specific vocabulary in the protocol-agnostic envoy/common/conn_pool.h enum. NotifyPeer generalizes cleanly — HTTP/1 and TCP inherit the no-op virtual without any per-protocol branching.

Risk Level

Low. The new enum value is opt-in; existing DrainBehavior values and their code paths are untouched. The new virtual on ActiveClient has a no-op default, so non-multiplexed clients see no behavior change. No structural changes to ConnectivityGrid (the renamed value does not set the grid's draining_=true flag because the pool stays usable).

Testing

Six unit tests in test/common/http/http2/conn_pool_test.cc exercising NotifyPeerAndDrainExisting on the HTTP/2 pool:

sends GOAWAY then drains a ready client with an active stream
closes an idle client immediately after GOAWAY
does not send GOAWAY on a connecting client
idempotent on repeat calls
handles multiple clients (mixed busy + idle)
NEW regression test: a new stream issued after drain creates a fresh client and completes successfully (would crash on the old GoAwayAndDrainAndDelete semantics with ASSERT(!is_draining_for_deletion_))

Existing conn_pool_base_test and conn_pool_grid_test pass unchanged.

Manual validation (in PR2/PR3 stack)

End-to-end validated with a 2-Envoy reverse-tunnel setup: US drains its reverse_connection cluster via POST /reverse_tunnel/drain_cluster, the pool fires GOAWAY on every reverse tunnel, DS observes GOAWAY and dials replacement tunnels, in-flight streams complete on the old tunnels, US stays alive and serves new requests via fresh clients — no crash.

Docs

n/a (header doc comments updated for the new enum value, virtual hook, and ClusterManager overload).

Release Notes

Added a new connection-pool drain behavior NotifyPeerAndDrainExisting that sends a protocol-level GOAWAY to each multiplexed client before draining the pool. Unlike DrainAndDelete, the pool remains usable for new streams via fresh clients while existing clients drain in the background.

Commit Message: Add a new ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete that mirrors DrainAndDelete but first sends a protocol-level GOAWAY on each multiplexed client (HTTP/2, HTTP/3). HTTP/1 and TCP pools have no GOAWAY concept and fall back to plain DrainAndDelete. Additional Description: The connection-pool API already supports DrainAndDelete and DrainExistingConnections, but neither communicates the impending drain to the peer at the protocol layer; a multiplexed peer learns about the drain only when the TCP socket closes. For pools that sit on the sending side of a long-lived multiplexed link (for example, reverse_connection clusters whose HTTP/2 client codec is the cluster's outbound link to a peer Envoy), allowing the peer to observe GOAWAY before tear-down lets the peer migrate new streams to a different connection while in-flight streams complete. This change introduces: * A new enum value `ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete`. * A virtual hook `ConnectionPool::ActiveClient::initiateGoAwayAndDrain()` with a default no-op implementation (for HTTP/1 / TCP). * An override on `MultiplexedActiveClientBase` that calls `codec_client_->goAway()` and transitions the client to Draining if any streams are active, otherwise closes the client. * A branch in `ConnPoolImplBase::drainConnectionsImpl()` that iterates the early-data, ready and busy client lists (snapshotted because the iteration mutates state via `transitionActiveClientState`), invokes the new hook on each, and falls through to the existing DrainAndDelete idle-close path. Connecting clients are skipped because no PREFACE has been exchanged. * `ConnectivityGrid::drainConnections()` sets `draining_ = true` for the new value so no fresh HTTP/3 / HTTP/2 alternative pools are created mid-drain. * A new three-arg `ClusterManager::drainConnections(cluster, predicate, drain_behavior)` overload (parallel to the existing two-arg overload) so callers can target a single cluster with a specific behavior. Risk Level: Low. The new enum value is opt-in; the existing DrainBehavior values and their code paths are untouched. The new virtual on ActiveClient has a no-op default, so non-multiplexed clients see no behavior change. The grid update is a straightforward extension of the existing `draining_` setter. Testing: Five new unit tests added to test/common/http/http2/conn_pool_test.cc exercising GoAwayAndDrainAndDelete on the HTTP/2 pool: * sends GOAWAY then drains a ready client with an active stream * closes an idle client immediately after GOAWAY * does not send GOAWAY on a connecting client * idempotent on repeat calls * handles multiple clients (mixed busy + idle) Existing conn_pool_base_test and conn_pool_grid_test pass unchanged. Docs Changes: n/a (header doc comments updated for the new enum value, virtual hook, and ClusterManager overload). Release Notes: Added a new connection-pool drain behavior `GoAwayAndDrainAndDelete` that sends a protocol-level GOAWAY to each multiplexed client before draining the pool. Signed-off-by: Prasad I V <prasad.iv@databricks.com>

ivpr · 2026-05-28T19:09:28Z

/retest

basundhara-c · 2026-05-28T21:44:44Z

@botengyao could you have a look at the overall approach for this?

…n semantics Renames the new drain behavior introduced in the parent commit: - DrainBehavior::GoAwayAndDrainAndDelete -> NotifyPeerAndDrainExisting. - ConnectionPool::ActiveClient::initiateGoAwayAndDrain() -> notifyPeerAndDrain(). Changes the semantics so the pool is NOT marked for deletion: existing clients are notified (GOAWAY for HTTP/2/HTTP/3, no-op for HTTP/1/TCP) and transitioned to Draining; new streams continue to be served by fresh clients. The notify loop then falls through into the DrainExistingConnections cleanup body so Connecting clients are still closed via closeIdleConnectionsForDrainingPool. Fixes a crash where newStreamImpl would trip ASSERT(!is_draining_for_deletion_) when drainConnections(NotifyPeerAndDrainExisting) was called on a cluster whose LB can still select hosts after the drain (the original "AndDelete" contract is only safe to invoke on removed hosts or during cluster destruction). Drained-but-busy clients close naturally when their last in-flight stream completes via the existing onStreamClosed path (state==Draining + numActiveStreams==0 -> close()); no new flag is required for cleanup. Naming: "GoAway" leaked HTTP/2-specific vocabulary into the protocol-agnostic conn_pool.h enum; "NotifyPeer" generalizes cleanly and matches the per-protocol fallback (HTTP/1 / TCP are no-ops at the conn-pool layer). Tests: - Renamed the 5 existing GoAwayAndDrainAndDelete* tests to NotifyPeerAndDrainExisting*. - Added NotifyPeerAndDrainExistingAllowsNewStreamAfterDrain as an explicit regression test for the crash: triggers drain on a busy pool, issues a new stream, asserts a fresh client is created and the stream completes. Risk Level: Low. New enum value remains opt-in; existing DrainBehavior values are untouched. Signed-off-by: Prasad I V <prasad.iv@databricks.com>

…ate, behavior) overload The new three-arg ClusterManager::drainConnections overload introduced earlier in this PR lacked a direct unit test, dropping source/common/upstream coverage from 96.6% to 96.5% (the directory's per-line threshold) on CI. Adds DrainConnectionsWithDrainBehavior to cluster_manager_lifecycle_test.cc. The test mirrors the existing DrainConnectionsPredicate test for the two-arg overload and covers the three code paths in the new overload body: - predicate selects a subset of hosts -> only matching pools receive drainConnections - nullptr predicate -> every pool receives drainConnections - unknown cluster name -> short-circuits without touching any pool Uses DrainBehavior::NotifyPeerAndDrainExisting as the forwarded behavior, exercising the end-to-end "caller specifies a non-default DrainBehavior on a specific cluster" path that the overload was introduced for. Signed-off-by: Prasad I V <prasad.iv@databricks.com>

RyanTheOptimist · 2026-05-31T23:44:19Z

+  // peer signal and behave as DrainExistingConnections.
+  // Motivating use case: pools sitting on the sending side of a long-lived multiplexed link
+  // (e.g. reverse-connection clusters).
+  NotifyPeerAndDrainExisting,


I'm trying to understand how NotifyPeerAndDrainExisting differs from DrainExistingNonMigratableConnections. Can you explain the difference?

From what I checked, DrainExistingNonMigratableConnections is for HTTP3 QUIC to support skipping the draining when source/downstream network changes which allows connections to be kept alive.

NotifyPeerAndDrainExisting is specifically meant to actually drain the existing connection but with a notification(eg: HTTP2 GOAWAY) to support use-cases like reverse connections where connection can only be initiated from the listener/downstream side and needs a notification to be sent for grace draining.

…nBehavior test CI's check_format wants EXPECT_CALL(*cp1, ...) to fit on the call line with the second argument aligned to the open paren, rather than breaking after EXPECT_CALL(. The three EXPECT_CALL invocations in the new test (one in the predicate branch, two in the nullptr-predicate branch) were wrapped the wrong way; reformat them per Envoy's clang-format style. Signed-off-by: Prasad I V <prasad.iv@databricks.com>

ivpr · 2026-06-15T17:44:29Z

Closing this PR in favour of #45632 which solves it a bit more holistically

ivpr force-pushed the goaway-and-drain-and-delete-core branch from 6636144 to 2e69798 Compare May 28, 2026 00:43

mattklein123 assigned RyanTheOptimist May 28, 2026

ivpr changed the title ~~conn pool: add GoAwayAndDrainAndDelete drain behavior~~ conn pool: add NotifyPeerAndDrainExisting drain behavior May 29, 2026

RyanTheOptimist reviewed May 31, 2026

View reviewed changes

ivpr closed this Jun 15, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

conn pool: add NotifyPeerAndDrainExisting drain behavior#45322

conn pool: add NotifyPeerAndDrainExisting drain behavior#45322
ivpr wants to merge 4 commits into
envoyproxy:mainfrom
ivpr:goaway-and-drain-and-delete-core

ivpr commented May 27, 2026 •

edited

Loading

Uh oh!

ivpr commented May 28, 2026

Uh oh!

basundhara-c commented May 28, 2026

Uh oh!

RyanTheOptimist May 31, 2026

Uh oh!

ivpr Jun 1, 2026

Uh oh!

ivpr commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

#	Where	What
PR1 (this PR)	`envoyproxy/envoy`	Core: `DrainBehavior::NotifyPeerAndDrainExisting` enum value + virtual hook `ActiveClient::notifyPeerAndDrain()` + `ClusterManager::drainConnections(cluster, predicate, drain_behavior)` overload.
PR2 → ivpr/envoy#1	`ivpr/envoy` (stacked on PR1)	Upstream consumer: reverse-tunnel acceptor extension registers a `POST /reverse_tunnel/drain_cluster` admin endpoint that fans out the new drain behavior to every `envoy.clusters.reverse_connection` cluster. Opt-in via `envoy.reloadable_features.reverse_tunnel_drain_with_goaway`.
PR3 → ivpr/envoy#2	`ivpr/envoy` (stacked on PR2)	Downstream consumer: drain-aware HCM observes peer GOAWAY on reverse-tunnel sockets and asks the initiator IOHandle to drop the tunnel from tracking and dial a replacement. Same runtime guard.

Uh oh!

Conversation

ivpr commented May 27, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

End-to-end flow this PR enables

What PR2 and PR3 say about the shape of the new core API

How drained connections eventually close (no new flag required)

Commit message body

What this change introduces

Why this isn't GoAwayAndDrainAndDelete

Risk Level

Testing

Manual validation (in PR2/PR3 stack)

Docs

Release Notes

Uh oh!

ivpr commented May 28, 2026

Uh oh!

basundhara-c commented May 28, 2026

Uh oh!

RyanTheOptimist May 31, 2026

Choose a reason for hiding this comment

Uh oh!

ivpr Jun 1, 2026

Choose a reason for hiding this comment

Uh oh!

ivpr commented Jun 15, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

ivpr commented May 27, 2026 •

edited

Loading

Why this isn't `GoAwayAndDrainAndDelete`