Skip to content

conn pool: add NotifyPeerAndDrainExisting drain behavior#45322

Closed
ivpr wants to merge 4 commits into
envoyproxy:mainfrom
ivpr:goaway-and-drain-and-delete-core
Closed

conn pool: add NotifyPeerAndDrainExisting drain behavior#45322
ivpr wants to merge 4 commits into
envoyproxy:mainfrom
ivpr:goaway-and-drain-and-delete-core

Conversation

@ivpr

@ivpr ivpr commented May 27, 2026

Copy link
Copy Markdown
Contributor

This PR is part of a 3-PR stack. PR2 and PR3 are out-of-tree against ivpr/envoy and exist to demonstrate a concrete consumer of the API added here — they do not need to land in envoyproxy/envoy for this PR to be useful on its own.

# Where What
PR1 (this PR) envoyproxy/envoy Core: DrainBehavior::NotifyPeerAndDrainExisting enum value + virtual hook ActiveClient::notifyPeerAndDrain() + ClusterManager::drainConnections(cluster, predicate, drain_behavior) overload.
PR2 → ivpr/envoy#1 ivpr/envoy (stacked on PR1) Upstream consumer: reverse-tunnel acceptor extension registers a POST /reverse_tunnel/drain_cluster admin endpoint that fans out the new drain behavior to every envoy.clusters.reverse_connection cluster. Opt-in via envoy.reloadable_features.reverse_tunnel_drain_with_goaway.
PR3 → ivpr/envoy#2 ivpr/envoy (stacked on PR2) Downstream consumer: drain-aware HCM observes peer GOAWAY on reverse-tunnel sockets and asks the initiator IOHandle to drop the tunnel from tracking and dial a replacement. Same runtime guard.

End-to-end flow this PR enables

sequenceDiagram
    autonumber
    participant Op as Operator
    participant US as Upstream Envoy
    participant Tun as Reverse tunnel (HTTP/2)
    participant DS as Downstream Envoy

    Note over US,DS: Steady state. DS holds a long-lived HTTP/2 tunnel. <br/>US owns the HTTP/2 client codec via the reverse_connection cluster.

    Op->>US: POST /reverse_tunnel/drain_cluster
    rect rgba(150,200,255,0.15)
        Note over US: PR2 consumer. Admin handler iterates envoy.clusters.reverse_connection clusters <br/>and fans out the new drain behavior.
        US->>US: drainConnections(cluster, nullptr, NotifyPeerAndDrainExisting)
    end

    rect rgba(255,210,150,0.25)
        Note over US: PR1 (this PR). Pool snapshots ready, busy and early-data clients, <br/>then calls MultiplexedActiveClientBase.notifyPeerAndDrain on each, <br/>which calls codec_client_.goAway and transitions to Draining if any streams are active. <br/>is_draining_for_deletion_ stays false so the pool stays usable for new streams via fresh clients.
        US-->>Tun: HTTP/2 GOAWAY
    end

    Tun-->>DS: HTTP/2 GOAWAY

    rect rgba(180,255,180,0.20)
        Note over DS: PR3 consumer. Drain-aware HCM observes peer GOAWAY, <br/>asks the reverse-tunnel IOHandle to drop this tunnel from tracking <br/>and kick maintenance to dial a replacement.
        DS->>DS: markTunnelDrainingAndDialReplacement(key)
        DS-->>Tun: dial new tunnel to sibling US replica
    end

    Note over US,DS: In-flight HTTP/2 streams on the old tunnel complete normally. <br/>The drained client auto-closes once its last stream finishes (onStreamClosed line 311-313). <br/>The tunnel TCP-closes after that.
Loading

PR1 is the only piece that lands in envoyproxy/envoy core. PR2 and PR3 are in reverse tunnel code and are linked above so you can see exactly what a consumer of this API looks like — both how it's invoked (PR2) and how the multiplexed-peer GOAWAY observation can be wired on the receiving side (PR3). Since PR2 and PR3 depend on PR1, I will raise them as new PRs in envoy proxy once this PR is merged.

What PR2 and PR3 say about the shape of the new core API

  • DrainBehavior::NotifyPeerAndDrainExisting is strictly opt-in. Existing callers of drainConnections(DrainAndDelete) and friends see no behavior change. PR2 / PR3 also keep the behavior runtime-guarded so end-user reverse-tunnel deployments do not see it until they explicitly enable it.
  • ClusterManager::drainConnections(cluster, predicate, drain_behavior) is the three-arg overload that lets an extension target a single cluster with a specific behavior. PR2 uses it to target only envoy.clusters.reverse_connection clusters; everything else is left alone.
  • MultiplexedActiveClientBase::notifyPeerAndDrain() handles the "preserve in-flight streams" semantics: if codec_client_->numActiveRequests() > 0, the client transitions to Draining instead of closing; otherwise it closes immediately. This is the property PR2's scenario depends on — sibling-replica failover should not kill the requests already on the wire.

How drained connections eventually close (no new flag required)

The new branch in drainConnectionsImpl does not set is_draining_for_deletion_. Clients close via existing pre-existing paths instead:

  1. Idle-at-drain-time clients close immediately. MultiplexedActiveClientBase::notifyPeerAndDrain calls codec_client_->close() directly when numActiveRequests() == 0. The fall-through into closeIdleConnectionsForDrainingPool() flushes connecting_clients_.
  2. Busy-then-drained clients close on last-stream-finish. They sit in Draining state while in-flight streams complete. When the last stream finishes, the existing block at conn_pool_base.cc lines 311-313 inside onStreamClosed closes the client. This is the same path the existing MaxConnectionDuration and MaxStreamsPerConnection drains rely on today.
  3. Peer-initiated close on receipt of the GOAWAY shows up via the normal ConnectionEvent::RemoteClose handling.
  4. Server drain timeout (--drain-time-s) is the last-resort safety net.

The pool itself stays alive (intentionally — that is why we are renaming away from …AndDelete). New requests for the same host land in newStreamImpl and pass through the !is_draining_for_deletion_ ASSERT (no crash); they find no ready clients (all drained), so tryCreateNewConnections() spins up a fresh non-Draining client.


Commit message body

Add a new ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting value that mirrors DrainExistingConnections semantically but first notifies the peer that the pool is draining. For multiplexed pools (HTTP/2, HTTP/3) the notification is a protocol-level GOAWAY; for HTTP/1 and raw TCP there is no peer-level signal and the behavior degrades to plain DrainExistingConnections (the peer learns about the drain when the TCP socket closes). Unlike DrainAndDelete, the pool stays usable: new streams are served by fresh clients while drained clients finish their in-flight work and close on idle.

What this change introduces

  • A new enum value ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting.
  • A virtual hook ConnectionPool::ActiveClient::notifyPeerAndDrain() with a default no-op implementation (so HTTP/1 + TCP pools support the new behavior without any code change — they just don't emit a peer signal).
  • An override on MultiplexedActiveClientBase that calls codec_client_->goAway() and transitions the client to Draining if any streams are active, otherwise closes the client immediately.
  • A branch in ConnPoolImplBase::drainConnectionsImpl() that snapshots ready / busy / early-data clients, invokes the new hook on each, and falls through into the existing DrainExistingConnections body so closeIdleConnectionsForDrainingPool() handles the connecting-clients cleanup. The branch does NOT set is_draining_for_deletion_, so newStreamImpl continues to accept new requests after the drain (served by fresh clients).
  • A new three-arg ClusterManager::drainConnections(cluster, predicate, drain_behavior) overload (parallel to the existing two-arg overload) so callers can target a single cluster with a specific behavior.

Why this isn't GoAwayAndDrainAndDelete

Earlier revisions of this PR named the value GoAwayAndDrainAndDelete and modeled it after DrainAndDelete (set is_draining_for_deletion_ = true, mark the pool for deletion). That was incorrect for the motivating use case (reverse-connection clusters), because the cluster's hosts remain LB-selectable after the drain — a new request after drain would trip ASSERT(!is_draining_for_deletion_) in newStreamImpl and crash the process. The new semantics keep the pool usable; closure happens naturally on stream completion via the existing onStreamClosed close-on-idle logic.

GoAway was also HTTP/2-specific vocabulary in the protocol-agnostic envoy/common/conn_pool.h enum. NotifyPeer generalizes cleanly — HTTP/1 and TCP inherit the no-op virtual without any per-protocol branching.

Risk Level

Low. The new enum value is opt-in; existing DrainBehavior values and their code paths are untouched. The new virtual on ActiveClient has a no-op default, so non-multiplexed clients see no behavior change. No structural changes to ConnectivityGrid (the renamed value does not set the grid's draining_=true flag because the pool stays usable).

Testing

Six unit tests in test/common/http/http2/conn_pool_test.cc exercising NotifyPeerAndDrainExisting on the HTTP/2 pool:

  • sends GOAWAY then drains a ready client with an active stream
  • closes an idle client immediately after GOAWAY
  • does not send GOAWAY on a connecting client
  • idempotent on repeat calls
  • handles multiple clients (mixed busy + idle)
  • NEW regression test: a new stream issued after drain creates a fresh client and completes successfully (would crash on the old GoAwayAndDrainAndDelete semantics with ASSERT(!is_draining_for_deletion_))

Existing conn_pool_base_test and conn_pool_grid_test pass unchanged.

Manual validation (in PR2/PR3 stack)

End-to-end validated with a 2-Envoy reverse-tunnel setup: US drains its reverse_connection cluster via POST /reverse_tunnel/drain_cluster, the pool fires GOAWAY on every reverse tunnel, DS observes GOAWAY and dials replacement tunnels, in-flight streams complete on the old tunnels, US stays alive and serves new requests via fresh clients — no crash.

Docs

n/a (header doc comments updated for the new enum value, virtual hook, and ClusterManager overload).

Release Notes

Added a new connection-pool drain behavior NotifyPeerAndDrainExisting that sends a protocol-level GOAWAY to each multiplexed client before draining the pool. Unlike DrainAndDelete, the pool remains usable for new streams via fresh clients while existing clients drain in the background.

Commit Message: Add a new ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete
that mirrors DrainAndDelete but first sends a protocol-level GOAWAY on each
multiplexed client (HTTP/2, HTTP/3). HTTP/1 and TCP pools have no GOAWAY
concept and fall back to plain DrainAndDelete.

Additional Description: The connection-pool API already supports DrainAndDelete
and DrainExistingConnections, but neither communicates the impending drain to
the peer at the protocol layer; a multiplexed peer learns about the drain only
when the TCP socket closes. For pools that sit on the sending side of a
long-lived multiplexed link (for example, reverse_connection clusters whose
HTTP/2 client codec is the cluster's outbound link to a peer Envoy), allowing
the peer to observe GOAWAY before tear-down lets the peer migrate new streams
to a different connection while in-flight streams complete.

This change introduces:

* A new enum value `ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete`.
* A virtual hook `ConnectionPool::ActiveClient::initiateGoAwayAndDrain()` with
  a default no-op implementation (for HTTP/1 / TCP).
* An override on `MultiplexedActiveClientBase` that calls
  `codec_client_->goAway()` and transitions the client to Draining if any
  streams are active, otherwise closes the client.
* A branch in `ConnPoolImplBase::drainConnectionsImpl()` that iterates the
  early-data, ready and busy client lists (snapshotted because the iteration
  mutates state via `transitionActiveClientState`), invokes the new hook on
  each, and falls through to the existing DrainAndDelete idle-close path.
  Connecting clients are skipped because no PREFACE has been exchanged.
* `ConnectivityGrid::drainConnections()` sets `draining_ = true` for the new
  value so no fresh HTTP/3 / HTTP/2 alternative pools are created mid-drain.
* A new three-arg `ClusterManager::drainConnections(cluster, predicate,
  drain_behavior)` overload (parallel to the existing two-arg overload) so
  callers can target a single cluster with a specific behavior.

Risk Level: Low. The new enum value is opt-in; the existing DrainBehavior
values and their code paths are untouched. The new virtual on ActiveClient has
a no-op default, so non-multiplexed clients see no behavior change. The grid
update is a straightforward extension of the existing `draining_` setter.

Testing: Five new unit tests added to
test/common/http/http2/conn_pool_test.cc exercising
GoAwayAndDrainAndDelete on the HTTP/2 pool:
* sends GOAWAY then drains a ready client with an active stream
* closes an idle client immediately after GOAWAY
* does not send GOAWAY on a connecting client
* idempotent on repeat calls
* handles multiple clients (mixed busy + idle)

Existing conn_pool_base_test and conn_pool_grid_test pass unchanged.

Docs Changes: n/a (header doc comments updated for the new enum value, virtual
hook, and ClusterManager overload).

Release Notes: Added a new connection-pool drain behavior
`GoAwayAndDrainAndDelete` that sends a protocol-level GOAWAY to each
multiplexed client before draining the pool.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr ivpr force-pushed the goaway-and-drain-and-delete-core branch from 6636144 to 2e69798 Compare May 28, 2026 00:43
@ivpr

ivpr commented May 28, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@basundhara-c

Copy link
Copy Markdown
Contributor

@botengyao could you have a look at the overall approach for this?

…n semantics

Renames the new drain behavior introduced in the parent commit:
  - DrainBehavior::GoAwayAndDrainAndDelete -> NotifyPeerAndDrainExisting.
  - ConnectionPool::ActiveClient::initiateGoAwayAndDrain() -> notifyPeerAndDrain().

Changes the semantics so the pool is NOT marked for deletion: existing clients
are notified (GOAWAY for HTTP/2/HTTP/3, no-op for HTTP/1/TCP) and transitioned
to Draining; new streams continue to be served by fresh clients. The notify
loop then falls through into the DrainExistingConnections cleanup body so
Connecting clients are still closed via closeIdleConnectionsForDrainingPool.

Fixes a crash where newStreamImpl would trip ASSERT(!is_draining_for_deletion_)
when drainConnections(NotifyPeerAndDrainExisting) was called on a cluster whose
LB can still select hosts after the drain (the original "AndDelete" contract is
only safe to invoke on removed hosts or during cluster destruction).

Drained-but-busy clients close naturally when their last in-flight stream
completes via the existing onStreamClosed path (state==Draining +
numActiveStreams==0 -> close()); no new flag is required for cleanup.

Naming: "GoAway" leaked HTTP/2-specific vocabulary into the protocol-agnostic
conn_pool.h enum; "NotifyPeer" generalizes cleanly and matches the per-protocol
fallback (HTTP/1 / TCP are no-ops at the conn-pool layer).

Tests:
  - Renamed the 5 existing GoAwayAndDrainAndDelete* tests to NotifyPeerAndDrainExisting*.
  - Added NotifyPeerAndDrainExistingAllowsNewStreamAfterDrain as an explicit
    regression test for the crash: triggers drain on a busy pool, issues a new
    stream, asserts a fresh client is created and the stream completes.

Risk Level: Low. New enum value remains opt-in; existing DrainBehavior values
are untouched.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr ivpr changed the title conn pool: add GoAwayAndDrainAndDelete drain behavior conn pool: add NotifyPeerAndDrainExisting drain behavior May 29, 2026
…ate, behavior) overload

The new three-arg ClusterManager::drainConnections overload introduced earlier in this PR
lacked a direct unit test, dropping source/common/upstream coverage from 96.6% to 96.5%
(the directory's per-line threshold) on CI.

Adds DrainConnectionsWithDrainBehavior to cluster_manager_lifecycle_test.cc. The test
mirrors the existing DrainConnectionsPredicate test for the two-arg overload and covers
the three code paths in the new overload body:

  - predicate selects a subset of hosts -> only matching pools receive drainConnections
  - nullptr predicate -> every pool receives drainConnections
  - unknown cluster name -> short-circuits without touching any pool

Uses DrainBehavior::NotifyPeerAndDrainExisting as the forwarded behavior, exercising the
end-to-end "caller specifies a non-default DrainBehavior on a specific cluster" path that
the overload was introduced for.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
Comment thread envoy/common/conn_pool.h
// peer signal and behave as DrainExistingConnections.
// Motivating use case: pools sitting on the sending side of a long-lived multiplexed link
// (e.g. reverse-connection clusters).
NotifyPeerAndDrainExisting,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to understand how NotifyPeerAndDrainExisting differs from DrainExistingNonMigratableConnections. Can you explain the difference?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I checked, DrainExistingNonMigratableConnections is for HTTP3 QUIC to support skipping the draining when source/downstream network changes which allows connections to be kept alive.

NotifyPeerAndDrainExisting is specifically meant to actually drain the existing connection but with a notification(eg: HTTP2 GOAWAY) to support use-cases like reverse connections where connection can only be initiated from the listener/downstream side and needs a notification to be sent for grace draining.

…nBehavior test

CI's check_format wants EXPECT_CALL(*cp1, ...) to fit on the call line with
the second argument aligned to the open paren, rather than breaking after
EXPECT_CALL(. The three EXPECT_CALL invocations in the new test (one in the
predicate branch, two in the nullptr-predicate branch) were wrapped the
wrong way; reformat them per Envoy's clang-format style.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr

ivpr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

Closing this PR in favour of #45632 which solves it a bit more holistically

@ivpr ivpr closed this Jun 15, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants