Skip to content

reverse_tunnel: add /reverse_tunnel/drain_cluster admin endpoint for NotifyPeerAndDrainExisting#1

Open
ivpr wants to merge 3 commits into
goaway-and-drain-and-delete-corefrom
reverse-tunnel-acceptor-drain-goaway
Open

reverse_tunnel: add /reverse_tunnel/drain_cluster admin endpoint for NotifyPeerAndDrainExisting#1
ivpr wants to merge 3 commits into
goaway-and-drain-and-delete-corefrom
reverse-tunnel-acceptor-drain-goaway

Conversation

@ivpr

@ivpr ivpr commented May 27, 2026

Copy link
Copy Markdown
Owner

Commit Message: Add a POST /reverse_tunnel/drain_cluster admin endpoint on the reverse-tunnel upstream socket interface bootstrap extension. The handler fans out ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting to every envoy.clusters.reverse_connection cluster, so the cluster's outbound HTTP/2 client codecs send GOAWAY to their peers. Gated behind an opt-in runtime guard envoy.reloadable_features.reverse_tunnel_drain_with_goaway.

Additional Description: A reverse_connection cluster's HTTP/2 client codecs are the cluster's outbound link to a peer Envoy (the cluster sits on the sending side of a long-lived multiplexed tunnel). Without an explicit drain trigger those peers learn about the drain only when the TCP socket eventually closes, which defeats the point of a graceful drain.

/drain_listeners?graceful is intentionally NOT used as the trigger for two reasons:

  1. Coupling listener drain to cluster drain is novel in Envoy. No other extension wires server-wide drain into a cluster fan-out today; cluster-level drains happen via host removal, cluster removal, or explicit ClusterManager::drainConnections() calls.
  2. It creates a race window where in-flight egress requests on the draining server can fail with 503 because the listener and the upstream pool are draining in parallel.

An explicit admin verb lets the operator sequence drains:

POST /drain_listeners?graceful&inboundonly   # quiesce ingress
# (wait for egress to migrate to a sibling US)
POST /reverse_tunnel/drain_cluster           # GOAWAY all reverse tunnels
POST /drain_listeners?graceful               # full server drain + SIGTERM

What this change introduces

  • handlerDrainCluster(...) admin handler on ReverseTunnelAcceptorExtension. Iterates clusterManager().clusters().active_clusters_, finds clusters of type envoy.clusters.reverse_connection, and calls clusterManager().drainConnections(name, /*predicate=*/nullptr, ConnectionPool::DrainBehavior::NotifyPeerAndDrainExisting) on each. The ClusterManager fans the per-cluster drain out to every worker thread that holds a thread-local entry for the cluster.
  • fanOutDrainNotifyPeer() private helper containing the iterate-and-drain body.
  • One-shot guard drain_goaway_dispatched_: a repeat call to the admin endpoint returns "already drained" and is a no-op.
  • onServerInitialized() registers the handler via context_.admin()->addHandler("/reverse_tunnel/drain_cluster", ...) with mutates_server_state=true. Registration is conditional on admin.has_value() so the extension still builds in admin-disabled binaries.
  • When the runtime guard is disabled (the default), the handler returns 503 with a body explaining the guard is off. No drain is dispatched.

Risk Level

Low. The new code path is contained to clusters of type envoy.clusters.reverse_connection, only fires on an explicit admin call (no implicit triggers), and is default-off behind a runtime feature flag. The flag must be flipped on every process that should participate; existing deployments see no change.

Testing

Unit tests (3 new) in reverse_tunnel_acceptor_extension_test.cc:

  • AdminDrainClusterRespectsRuntimeFlag — flag off → 503 with explanatory body; no fan-out.
  • AdminDrainClusterFiresOnReverseConnectionClustersOnly — flag on; asserts drainConnections(NotifyPeerAndDrainExisting) is called on every envoy.clusters.reverse_connection cluster and is NOT called on a same-mock-test cluster of a different type.
  • AdminDrainClusterIsIdempotent — second call returns "already drained" and does NOT re-fire drainConnections.

Manual end-to-end validation with two real Envoy processes (upstream + downstream reverse-tunnel pair), with the runtime guard enabled. POST /reverse_tunnel/drain_cluster on the upstream causes cluster.reverse_connection_cluster.http2.goaway_sent to increment for every tunnel and the downstream's drain-aware HCM filter (PR3) to observe the peer GOAWAY. With the guard disabled, the handler returns 503 and no drain is dispatched.

The crash-fix regression on the underlying PR (PR1) is also exercised here: post-drain requests to the egress listener are served by fresh clients without crashing US.

Docs Changes

n/a.

Release Notes

Added an opt-in runtime guard envoy.reloadable_features.reverse_tunnel_drain_with_goaway. When enabled, the reverse-tunnel upstream socket interface bootstrap extension exposes a POST /reverse_tunnel/drain_cluster admin endpoint that fans out NotifyPeerAndDrainExisting to every envoy.clusters.reverse_connection cluster. Peer initiators receive GOAWAY and can dial replacement tunnels to sibling upstream replicas. Default off.

@ivpr ivpr force-pushed the goaway-and-drain-and-delete-core branch from 6636144 to 2e69798 Compare May 28, 2026 00:42
@ivpr ivpr force-pushed the reverse-tunnel-acceptor-drain-goaway branch 2 times, most recently from 7702d30 to 4b1b5d9 Compare May 29, 2026 22:36
@ivpr ivpr changed the title reverse_tunnel: drain reverse_connection cluster pools with GoAwayAndDrainAndDelete reverse_tunnel: add /reverse_tunnel/drain_cluster admin endpoint for NotifyPeerAndDrainExisting May 29, 2026
@ivpr ivpr force-pushed the reverse-tunnel-acceptor-drain-goaway branch from 4b1b5d9 to 66b488b Compare May 30, 2026 00:12
Prasad I V added 3 commits June 1, 2026 04:51
…DrainAndDelete

Commit Message: Make the reverse-tunnel upstream socket interface bootstrap
extension fan a server-wide drain out to every envoy.clusters.reverse_connection
cluster via DrainBehavior::GoAwayAndDrainAndDelete, so the cluster's outbound
HTTP/2 client codecs send GOAWAY to their peers before tear-down. Gated behind
a new opt-in runtime feature flag.

Additional Description: A reverse_connection cluster's HTTP/2 client codecs are
the cluster's outbound link to a peer Envoy (the cluster sits on the sending
side of a long-lived multiplexed tunnel). Without a drain-time hook, those
peers learn about the drain only when the TCP socket eventually closes, which
defeats the point of a graceful drain.

This change adds `onDrainCheckTimer()` to `ReverseTunnelAcceptorExtension`:

* `onServerInitialized()` arms a 100ms repeating timer.
* The timer polls `context_.drainManager().draining(DrainDirection::All)`,
  which is what `/drain_listeners?graceful` (and other server-wide drain
  triggers) flip.
* The first time it observes draining AND the runtime guard
  `envoy.reloadable_features.reverse_tunnel_drain_with_goaway` is enabled,
  the extension iterates `clusterManager().clusters().active_clusters_`,
  finds clusters of type `envoy.clusters.reverse_connection`, and calls
  `clusterManager().drainConnections(name, /*predicate=*/nullptr,
  ConnectionPool::DrainBehavior::GoAwayAndDrainAndDelete)` on each. The
  ClusterManager fans the per-cluster drain out to every worker thread that
  holds a thread-local entry for the cluster. The timer is then one-shot,
  not re-armed.

When the runtime guard is disabled (the default), the timer keeps re-arming
until the server fully exits drain; no GOAWAY is emitted, preserving existing
behavior bit-for-bit.

The drain hook only runs on the upstream socket interface (the side that
accepts reverse tunnels). The downstream side's reaction to a received GOAWAY
is added by a follow-up change, also gated by the same runtime guard.

Risk Level: Low. The new code path is bounded to clusters of type
`envoy.clusters.reverse_connection`, only fires on a server-wide drain, and is
default-off behind a runtime feature flag. The flag must be flipped on every
process that should participate; existing deployments see no change.

Testing: Manual end-to-end validation with two real Envoy processes (upstream
+ downstream reverse-tunnel pair) and a 3-Envoy DNS failover variant, with
the runtime guard enabled. With the guard enabled,
`/drain_listeners?graceful` on the upstream causes
`cluster.reverse_connection_cluster.http2.goaway_sent` to increment and the
downstream's drain-aware HCM filter to observe the peer GOAWAY. With the
guard disabled, no GOAWAY is emitted and the existing behavior is preserved.
Integration test for the full end-to-end flow is in the follow-up PR that
introduces the downstream reaction.

Docs Changes: n/a.

Release Notes: Added an opt-in runtime guard
`envoy.reloadable_features.reverse_tunnel_drain_with_goaway`. When enabled,
the reverse-tunnel upstream socket interface bootstrap extension sends
GOAWAY on every reverse_connection cluster pool when the server enters
listener drain, so peer Envoys can fail over to a sibling upstream before
in-flight streams complete. Default off.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
…AndDrainExisting

Follow-up to the conn-pool rename: switch the upstream acceptor extension's
drain trigger from DrainBehavior::GoAwayAndDrainAndDelete to
DrainBehavior::NotifyPeerAndDrainExisting, drop the pod-prestop wording in
the header comment to keep the documentation generic, and tighten the comment
to focus on the why (peers receive GOAWAY and dial replacement tunnels)
rather than restating the mechanism.

No behavior change beyond the new enum name and its updated semantics
(the pool stays usable for new streams via fresh clients while existing
clients drain in the background).

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
…_tunnel/drain_cluster admin endpoint

Coupling cluster-level drain to /drain_listeners was novel in Envoy. Listener
drain (HCM GOAWAY downstream, listener-manager stop) and cluster drain (upstream
pool fan-out) are independent concerns elsewhere in the codebase. Tying them
together also creates a race window where in-flight egress requests on the
draining server can fail with 503 because the listener and the upstream pool
are draining in parallel.

Switching to an explicit admin verb lets the operator sequence:
  POST /drain_listeners?graceful&inboundonly   # quiesce ingress
  (wait for egress traffic to migrate to a sibling US)
  POST /reverse_tunnel/drain_cluster           # GOAWAY all reverse tunnels
  POST /drain_listeners?graceful               # full server drain + SIGTERM

This removes:
  - the drain_check_timer_ field, periodic poll, and one-shot drain_goaway_dispatched_ guard
  - the onDrainCheckTimer() method
  - the dependency on envoy/server/drain_manager.h and envoy/network/drain_decision.h

and adds:
  - admin handler registration for POST /reverse_tunnel/drain_cluster in
    onServerInitialized(), gated on the existing
    envoy.reloadable_features.reverse_tunnel_drain_with_goaway runtime flag
  - handlerDrainCluster() + private fanOutDrainNotifyPeer() helper
  - three unit tests covering: runtime-flag-off short circuit, fan-out limited to
    envoy.clusters.reverse_connection cluster types, and second-call idempotency

The runtime guard stays in source/common/runtime/runtime_features.cc so a single
opt-in flag still controls the end-to-end behavior.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr ivpr force-pushed the reverse-tunnel-acceptor-drain-goaway branch from 66b488b to c3713b9 Compare June 1, 2026 04:51
// egress-listener drain and reverse-tunnel-cluster drain separately.
auto admin = context_.admin();
if (admin.has_value()) {
const bool rc = admin->addHandler(

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to flip this at runtime? Wondering if we could use a flag here instead of a runtime guard?

}

void ReverseTunnelAcceptorExtension::fanOutDrainNotifyPeer() {
drain_goaway_dispatched_ = true;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the drain call fail? Should be setting this to true at the end of the function?

ivpr pushed a commit that referenced this pull request Jun 15, 2026
…envoyproxy#45324)

### Description
This PR resolves a use-after-free (UAF) / pure virtual method call crash
that occurs during `ConnectivityGrid` teardown when connection pools
(HTTP/3, HTTP/2, or HTTP/1) are destroyed while active connection
attempts are still ongoing in the background.

---

### 💥 The Bug & Crash Callstack

When the `ConnectivityGrid` is deleted, its destructor destroys its
underlying connection pools in reverse order. If a pool has active
connection attempts, deleting the pool synchronously cancels those
attempts. In C++, once a derived class destructor finishes, the object's
vtable pointer transitions to point to the base class vtable.

Because `destructAllConnections()` is invoked in
`~HttpConnPoolImplBase()` (the base class destructor), the connection
cancellation callback wrappers execute `onConnectionAttemptFailed()`
after the derived pool object has already been destructed.

When `onConnectionAttemptFailed()` attempts to write a log trace calling
`describePool(attempt->pool())` (which executes
`attempt->pool().protocolDescription()`), virtual dispatch invokes a
**pure virtual function call** on the base class `HttpConnPoolImplBase`,
crashing Envoy instantly:

```log
Program received signal SIGSEGV, Segmentation fault.
0x000056453663656e in Envoy::Http::ConnectivityGrid::WrapperCallbacks::onConnectionAttemptFailed()
Backtrace:
  #0  0x000056453663656e in Envoy::Http::ConnectivityGrid::WrapperCallbacks::onConnectionAttemptFailed()
  #1  0x000056453663606d in Envoy::Http::ConnectivityGrid::WrapperCallbacks::ConnectionAttemptCallbacks::onPoolFailure()
  #2  0x000056453664e7f1 in Envoy::Http::HttpConnPoolImplBase::onPoolFailure()
  #3  0x00005645366cb447 in Envoy::ConnectionPool::ConnPoolImplBase::purgePendingStreams()
  #4  0x00005645366c9d6e in Envoy::ConnectionPool::ConnPoolImplBase::onConnectionEvent()
  #5  0x000056453666394d in Envoy::Tcp::ActiveTcpClient::onEvent()
  envoyproxy#6  0x000056453709b0e9 in Envoy::Network::MockConnectionBase::raiseEvent()
  ...
  envoyproxy#24 0x00005645366be287 in Envoy::ConnectionPool::ConnPoolImplBase::destructAllConnections()
  envoyproxy#25 0x00005645366766b8 in Envoy::Http::HttpConnPoolImplBase::~HttpConnPoolImplBase()
  envoyproxy#26 0x000056453664e70f in Envoy::Http::HttpConnPoolImplMixed::~HttpConnPoolImplMixed()
  envoyproxy#27 0x000056453664e739 in Envoy::Http::HttpConnPoolImplMixed::~HttpConnPoolImplMixed()
  envoyproxy#28 0x000056453647099c in std::__1::default_delete<>::operator()()
  envoyproxy#29 0x000056453647091c in std::__1::unique_ptr<>::reset()
  envoyproxy#30 0x000056453663a3ad in Envoy::Http::ConnectivityGrid::~ConnectivityGrid()
  envoyproxy#31 0x000056453663a4f9 in Envoy::Http::ConnectivityGrid::~ConnectivityGrid()
  envoyproxy#32 0x00005645365de65c in std::__1::default_delete<>::operator()()
  envoyproxy#33 0x000056453656b55c in std::__1::unique_ptr<>::reset()
  envoyproxy#34 0x00005645363174fa in ConnectivityGridTest_DestroyGridWithActiveH2Attempts_Test::TestBody()
```

---

### 🛠️ The Fix

Introduced a reloadable feature flag guard
`envoy.reloadable_features.conn_pool_grid_early_return_on_teardown`
(enabled by default) to shield both H3 and H2/H1 pools during
destruction:
* **Early Return**: Inside
`WrapperCallbacks::onConnectionAttemptFailed()`, we check if
`delete_started_` is `true`. If the flag is enabled and the wrapper is
already being deleted/torn down, we return immediately, completely
skipping the unsafe virtual method lookups on the half-destructed pool
object.
* **ALPN & Protocol Safety**: This safely protects the system regardless
of which pool is being torn down (H3 or H2/H1 mixed pools).

---

### 🧪 Testing & Verification

* Added the `DestroyGridWithActiveH2Attempts` unit test to verify that
tearing down the `ConnectivityGrid` while an active H2 connection
attempt is running does crash without the fix and not with the fix.

Risk Level: low, only exposed with log level trace
Testing: new unit test
Docs Changes: N/A 
Release Notes: Y
Platform Specific Features: N/A
Runtime guard:
envoy.reloadable_features.conn_pool_grid_early_return_on_teardown

Signed-off-by: Dan Zhang <danzh@google.com>
Co-authored-by: Dan Zhang <danzh@google.com>
if (!Runtime::runtimeFeatureEnabled(
"envoy.reloadable_features.reverse_tunnel_drain_with_goaway")) {
response.add("runtime envoy.reloadable_features.reverse_tunnel_drain_with_goaway is off\n");
return Http::Code::ServiceUnavailable;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ServiceUnavailable is misleading and might cause the client to retry, can we return a NotImplemented here?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants