conn pool: add NotifyPeerAndDrainExisting drain behavior#45322
Conversation
Commit Message: Add a new ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete that mirrors DrainAndDelete but first sends a protocol-level GOAWAY on each multiplexed client (HTTP/2, HTTP/3). HTTP/1 and TCP pools have no GOAWAY concept and fall back to plain DrainAndDelete. Additional Description: The connection-pool API already supports DrainAndDelete and DrainExistingConnections, but neither communicates the impending drain to the peer at the protocol layer; a multiplexed peer learns about the drain only when the TCP socket closes. For pools that sit on the sending side of a long-lived multiplexed link (for example, reverse_connection clusters whose HTTP/2 client codec is the cluster's outbound link to a peer Envoy), allowing the peer to observe GOAWAY before tear-down lets the peer migrate new streams to a different connection while in-flight streams complete. This change introduces: * A new enum value `ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete`. * A virtual hook `ConnectionPool::ActiveClient::initiateGoAwayAndDrain()` with a default no-op implementation (for HTTP/1 / TCP). * An override on `MultiplexedActiveClientBase` that calls `codec_client_->goAway()` and transitions the client to Draining if any streams are active, otherwise closes the client. * A branch in `ConnPoolImplBase::drainConnectionsImpl()` that iterates the early-data, ready and busy client lists (snapshotted because the iteration mutates state via `transitionActiveClientState`), invokes the new hook on each, and falls through to the existing DrainAndDelete idle-close path. Connecting clients are skipped because no PREFACE has been exchanged. * `ConnectivityGrid::drainConnections()` sets `draining_ = true` for the new value so no fresh HTTP/3 / HTTP/2 alternative pools are created mid-drain. * A new three-arg `ClusterManager::drainConnections(cluster, predicate, drain_behavior)` overload (parallel to the existing two-arg overload) so callers can target a single cluster with a specific behavior. Risk Level: Low. The new enum value is opt-in; the existing DrainBehavior values and their code paths are untouched. The new virtual on ActiveClient has a no-op default, so non-multiplexed clients see no behavior change. The grid update is a straightforward extension of the existing `draining_` setter. Testing: Five new unit tests added to test/common/http/http2/conn_pool_test.cc exercising GoAwayAndDrainAndDelete on the HTTP/2 pool: * sends GOAWAY then drains a ready client with an active stream * closes an idle client immediately after GOAWAY * does not send GOAWAY on a connecting client * idempotent on repeat calls * handles multiple clients (mixed busy + idle) Existing conn_pool_base_test and conn_pool_grid_test pass unchanged. Docs Changes: n/a (header doc comments updated for the new enum value, virtual hook, and ClusterManager overload). Release Notes: Added a new connection-pool drain behavior `GoAwayAndDrainAndDelete` that sends a protocol-level GOAWAY to each multiplexed client before draining the pool. Signed-off-by: Prasad I V <prasad.iv@databricks.com>
6636144 to
2e69798
Compare
|
/retest |
|
@botengyao could you have a look at the overall approach for this? |
…n semantics
Renames the new drain behavior introduced in the parent commit:
- DrainBehavior::GoAwayAndDrainAndDelete -> NotifyPeerAndDrainExisting.
- ConnectionPool::ActiveClient::initiateGoAwayAndDrain() -> notifyPeerAndDrain().
Changes the semantics so the pool is NOT marked for deletion: existing clients
are notified (GOAWAY for HTTP/2/HTTP/3, no-op for HTTP/1/TCP) and transitioned
to Draining; new streams continue to be served by fresh clients. The notify
loop then falls through into the DrainExistingConnections cleanup body so
Connecting clients are still closed via closeIdleConnectionsForDrainingPool.
Fixes a crash where newStreamImpl would trip ASSERT(!is_draining_for_deletion_)
when drainConnections(NotifyPeerAndDrainExisting) was called on a cluster whose
LB can still select hosts after the drain (the original "AndDelete" contract is
only safe to invoke on removed hosts or during cluster destruction).
Drained-but-busy clients close naturally when their last in-flight stream
completes via the existing onStreamClosed path (state==Draining +
numActiveStreams==0 -> close()); no new flag is required for cleanup.
Naming: "GoAway" leaked HTTP/2-specific vocabulary into the protocol-agnostic
conn_pool.h enum; "NotifyPeer" generalizes cleanly and matches the per-protocol
fallback (HTTP/1 / TCP are no-ops at the conn-pool layer).
Tests:
- Renamed the 5 existing GoAwayAndDrainAndDelete* tests to NotifyPeerAndDrainExisting*.
- Added NotifyPeerAndDrainExistingAllowsNewStreamAfterDrain as an explicit
regression test for the crash: triggers drain on a busy pool, issues a new
stream, asserts a fresh client is created and the stream completes.
Risk Level: Low. New enum value remains opt-in; existing DrainBehavior values
are untouched.
Signed-off-by: Prasad I V <prasad.iv@databricks.com>
…ate, behavior) overload The new three-arg ClusterManager::drainConnections overload introduced earlier in this PR lacked a direct unit test, dropping source/common/upstream coverage from 96.6% to 96.5% (the directory's per-line threshold) on CI. Adds DrainConnectionsWithDrainBehavior to cluster_manager_lifecycle_test.cc. The test mirrors the existing DrainConnectionsPredicate test for the two-arg overload and covers the three code paths in the new overload body: - predicate selects a subset of hosts -> only matching pools receive drainConnections - nullptr predicate -> every pool receives drainConnections - unknown cluster name -> short-circuits without touching any pool Uses DrainBehavior::NotifyPeerAndDrainExisting as the forwarded behavior, exercising the end-to-end "caller specifies a non-default DrainBehavior on a specific cluster" path that the overload was introduced for. Signed-off-by: Prasad I V <prasad.iv@databricks.com>
| // peer signal and behave as DrainExistingConnections. | ||
| // Motivating use case: pools sitting on the sending side of a long-lived multiplexed link | ||
| // (e.g. reverse-connection clusters). | ||
| NotifyPeerAndDrainExisting, |
There was a problem hiding this comment.
I'm trying to understand how NotifyPeerAndDrainExisting differs from DrainExistingNonMigratableConnections. Can you explain the difference?
There was a problem hiding this comment.
From what I checked, DrainExistingNonMigratableConnections is for HTTP3 QUIC to support skipping the draining when source/downstream network changes which allows connections to be kept alive.
NotifyPeerAndDrainExisting is specifically meant to actually drain the existing connection but with a notification(eg: HTTP2 GOAWAY) to support use-cases like reverse connections where connection can only be initiated from the listener/downstream side and needs a notification to be sent for grace draining.
…nBehavior test CI's check_format wants EXPECT_CALL(*cp1, ...) to fit on the call line with the second argument aligned to the open paren, rather than breaking after EXPECT_CALL(. The three EXPECT_CALL invocations in the new test (one in the predicate branch, two in the nullptr-predicate branch) were wrapped the wrong way; reformat them per Envoy's clang-format style. Signed-off-by: Prasad I V <prasad.iv@databricks.com>
|
Closing this PR in favour of #45632 which solves it a bit more holistically |
End-to-end flow this PR enables
sequenceDiagram autonumber participant Op as Operator participant US as Upstream Envoy participant Tun as Reverse tunnel (HTTP/2) participant DS as Downstream Envoy Note over US,DS: Steady state. DS holds a long-lived HTTP/2 tunnel. <br/>US owns the HTTP/2 client codec via the reverse_connection cluster. Op->>US: POST /reverse_tunnel/drain_cluster rect rgba(150,200,255,0.15) Note over US: PR2 consumer. Admin handler iterates envoy.clusters.reverse_connection clusters <br/>and fans out the new drain behavior. US->>US: drainConnections(cluster, nullptr, NotifyPeerAndDrainExisting) end rect rgba(255,210,150,0.25) Note over US: PR1 (this PR). Pool snapshots ready, busy and early-data clients, <br/>then calls MultiplexedActiveClientBase.notifyPeerAndDrain on each, <br/>which calls codec_client_.goAway and transitions to Draining if any streams are active. <br/>is_draining_for_deletion_ stays false so the pool stays usable for new streams via fresh clients. US-->>Tun: HTTP/2 GOAWAY end Tun-->>DS: HTTP/2 GOAWAY rect rgba(180,255,180,0.20) Note over DS: PR3 consumer. Drain-aware HCM observes peer GOAWAY, <br/>asks the reverse-tunnel IOHandle to drop this tunnel from tracking <br/>and kick maintenance to dial a replacement. DS->>DS: markTunnelDrainingAndDialReplacement(key) DS-->>Tun: dial new tunnel to sibling US replica end Note over US,DS: In-flight HTTP/2 streams on the old tunnel complete normally. <br/>The drained client auto-closes once its last stream finishes (onStreamClosed line 311-313). <br/>The tunnel TCP-closes after that.PR1 is the only piece that lands in envoyproxy/envoy core. PR2 and PR3 are in reverse tunnel code and are linked above so you can see exactly what a consumer of this API looks like — both how it's invoked (PR2) and how the multiplexed-peer GOAWAY observation can be wired on the receiving side (PR3). Since PR2 and PR3 depend on PR1, I will raise them as new PRs in envoy proxy once this PR is merged.
What PR2 and PR3 say about the shape of the new core API
DrainBehavior::NotifyPeerAndDrainExistingis strictly opt-in. Existing callers ofdrainConnections(DrainAndDelete)and friends see no behavior change. PR2 / PR3 also keep the behavior runtime-guarded so end-user reverse-tunnel deployments do not see it until they explicitly enable it.ClusterManager::drainConnections(cluster, predicate, drain_behavior)is the three-arg overload that lets an extension target a single cluster with a specific behavior. PR2 uses it to target onlyenvoy.clusters.reverse_connectionclusters; everything else is left alone.MultiplexedActiveClientBase::notifyPeerAndDrain()handles the "preserve in-flight streams" semantics: ifcodec_client_->numActiveRequests() > 0, the client transitions toDraininginstead of closing; otherwise it closes immediately. This is the property PR2's scenario depends on — sibling-replica failover should not kill the requests already on the wire.How drained connections eventually close (no new flag required)
The new branch in
drainConnectionsImpldoes not setis_draining_for_deletion_. Clients close via existing pre-existing paths instead:MultiplexedActiveClientBase::notifyPeerAndDraincallscodec_client_->close()directly whennumActiveRequests() == 0. The fall-through intocloseIdleConnectionsForDrainingPool()flushesconnecting_clients_.Drainingstate while in-flight streams complete. When the last stream finishes, the existing block at conn_pool_base.cc lines 311-313 insideonStreamClosedcloses the client. This is the same path the existingMaxConnectionDurationandMaxStreamsPerConnectiondrains rely on today.ConnectionEvent::RemoteClosehandling.--drain-time-s) is the last-resort safety net.The pool itself stays alive (intentionally — that is why we are renaming away from
…AndDelete). New requests for the same host land innewStreamImpland pass through the!is_draining_for_deletion_ASSERT (no crash); they find no ready clients (all drained), sotryCreateNewConnections()spins up a fresh non-Draining client.Commit message body
Add a new
ConnectionPool::DrainBehavior::NotifyPeerAndDrainExistingvalue that mirrorsDrainExistingConnectionssemantically but first notifies the peer that the pool is draining. For multiplexed pools (HTTP/2, HTTP/3) the notification is a protocol-level GOAWAY; for HTTP/1 and raw TCP there is no peer-level signal and the behavior degrades to plainDrainExistingConnections(the peer learns about the drain when the TCP socket closes). UnlikeDrainAndDelete, the pool stays usable: new streams are served by fresh clients while drained clients finish their in-flight work and close on idle.What this change introduces
ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting.ConnectionPool::ActiveClient::notifyPeerAndDrain()with a default no-op implementation (so HTTP/1 + TCP pools support the new behavior without any code change — they just don't emit a peer signal).MultiplexedActiveClientBasethat callscodec_client_->goAway()and transitions the client toDrainingif any streams are active, otherwise closes the client immediately.ConnPoolImplBase::drainConnectionsImpl()that snapshots ready / busy / early-data clients, invokes the new hook on each, and falls through into the existingDrainExistingConnectionsbody socloseIdleConnectionsForDrainingPool()handles the connecting-clients cleanup. The branch does NOT setis_draining_for_deletion_, sonewStreamImplcontinues to accept new requests after the drain (served by fresh clients).ClusterManager::drainConnections(cluster, predicate, drain_behavior)overload (parallel to the existing two-arg overload) so callers can target a single cluster with a specific behavior.Why this isn't
GoAwayAndDrainAndDeleteEarlier revisions of this PR named the value
GoAwayAndDrainAndDeleteand modeled it afterDrainAndDelete(setis_draining_for_deletion_ = true, mark the pool for deletion). That was incorrect for the motivating use case (reverse-connection clusters), because the cluster's hosts remain LB-selectable after the drain — a new request after drain would tripASSERT(!is_draining_for_deletion_)innewStreamImpland crash the process. The new semantics keep the pool usable; closure happens naturally on stream completion via the existingonStreamClosedclose-on-idle logic.GoAwaywas also HTTP/2-specific vocabulary in the protocol-agnosticenvoy/common/conn_pool.henum.NotifyPeergeneralizes cleanly — HTTP/1 and TCP inherit the no-op virtual without any per-protocol branching.Risk Level
Low. The new enum value is opt-in; existing
DrainBehaviorvalues and their code paths are untouched. The new virtual onActiveClienthas a no-op default, so non-multiplexed clients see no behavior change. No structural changes toConnectivityGrid(the renamed value does not set the grid'sdraining_=trueflag because the pool stays usable).Testing
Six unit tests in
test/common/http/http2/conn_pool_test.ccexercisingNotifyPeerAndDrainExistingon the HTTP/2 pool:GoAwayAndDrainAndDeletesemantics withASSERT(!is_draining_for_deletion_))Existing
conn_pool_base_testandconn_pool_grid_testpass unchanged.Manual validation (in PR2/PR3 stack)
End-to-end validated with a 2-Envoy reverse-tunnel setup: US drains its
reverse_connectioncluster viaPOST /reverse_tunnel/drain_cluster, the pool fires GOAWAY on every reverse tunnel, DS observes GOAWAY and dials replacement tunnels, in-flight streams complete on the old tunnels, US stays alive and serves new requests via fresh clients — no crash.Docs
n/a (header doc comments updated for the new enum value, virtual hook, and ClusterManager overload).
Release Notes
Added a new connection-pool drain behavior
NotifyPeerAndDrainExistingthat sends a protocol-level GOAWAY to each multiplexed client before draining the pool. UnlikeDrainAndDelete, the pool remains usable for new streams via fresh clients while existing clients drain in the background.