upstreams/http/reverse_tunnel: drain-aware upstream client codec#4
Open
ivpr wants to merge 3 commits into
Open
upstreams/http/reverse_tunnel: drain-aware upstream client codec#4ivpr wants to merge 3 commits into
ivpr wants to merge 3 commits into
Conversation
This was referenced Jun 11, 2026
7febccb to
ced2843
Compare
737bc27 to
605bdd2
Compare
ced2843 to
f510d90
Compare
605bdd2 to
6083fea
Compare
f510d90 to
22b605f
Compare
6083fea to
f80753a
Compare
22b605f to
dc1fddb
Compare
f80753a to
bd68589
Compare
ivpr
pushed a commit
that referenced
this pull request
Jun 15, 2026
…envoyproxy#45324) ### Description This PR resolves a use-after-free (UAF) / pure virtual method call crash that occurs during `ConnectivityGrid` teardown when connection pools (HTTP/3, HTTP/2, or HTTP/1) are destroyed while active connection attempts are still ongoing in the background. --- ### 💥 The Bug & Crash Callstack When the `ConnectivityGrid` is deleted, its destructor destroys its underlying connection pools in reverse order. If a pool has active connection attempts, deleting the pool synchronously cancels those attempts. In C++, once a derived class destructor finishes, the object's vtable pointer transitions to point to the base class vtable. Because `destructAllConnections()` is invoked in `~HttpConnPoolImplBase()` (the base class destructor), the connection cancellation callback wrappers execute `onConnectionAttemptFailed()` after the derived pool object has already been destructed. When `onConnectionAttemptFailed()` attempts to write a log trace calling `describePool(attempt->pool())` (which executes `attempt->pool().protocolDescription()`), virtual dispatch invokes a **pure virtual function call** on the base class `HttpConnPoolImplBase`, crashing Envoy instantly: ```log Program received signal SIGSEGV, Segmentation fault. 0x000056453663656e in Envoy::Http::ConnectivityGrid::WrapperCallbacks::onConnectionAttemptFailed() Backtrace: #0 0x000056453663656e in Envoy::Http::ConnectivityGrid::WrapperCallbacks::onConnectionAttemptFailed() #1 0x000056453663606d in Envoy::Http::ConnectivityGrid::WrapperCallbacks::ConnectionAttemptCallbacks::onPoolFailure() #2 0x000056453664e7f1 in Envoy::Http::HttpConnPoolImplBase::onPoolFailure() #3 0x00005645366cb447 in Envoy::ConnectionPool::ConnPoolImplBase::purgePendingStreams() #4 0x00005645366c9d6e in Envoy::ConnectionPool::ConnPoolImplBase::onConnectionEvent() #5 0x000056453666394d in Envoy::Tcp::ActiveTcpClient::onEvent() envoyproxy#6 0x000056453709b0e9 in Envoy::Network::MockConnectionBase::raiseEvent() ... envoyproxy#24 0x00005645366be287 in Envoy::ConnectionPool::ConnPoolImplBase::destructAllConnections() envoyproxy#25 0x00005645366766b8 in Envoy::Http::HttpConnPoolImplBase::~HttpConnPoolImplBase() envoyproxy#26 0x000056453664e70f in Envoy::Http::HttpConnPoolImplMixed::~HttpConnPoolImplMixed() envoyproxy#27 0x000056453664e739 in Envoy::Http::HttpConnPoolImplMixed::~HttpConnPoolImplMixed() envoyproxy#28 0x000056453647099c in std::__1::default_delete<>::operator()() envoyproxy#29 0x000056453647091c in std::__1::unique_ptr<>::reset() envoyproxy#30 0x000056453663a3ad in Envoy::Http::ConnectivityGrid::~ConnectivityGrid() envoyproxy#31 0x000056453663a4f9 in Envoy::Http::ConnectivityGrid::~ConnectivityGrid() envoyproxy#32 0x00005645365de65c in std::__1::default_delete<>::operator()() envoyproxy#33 0x000056453656b55c in std::__1::unique_ptr<>::reset() envoyproxy#34 0x00005645363174fa in ConnectivityGridTest_DestroyGridWithActiveH2Attempts_Test::TestBody() ``` --- ### 🛠️ The Fix Introduced a reloadable feature flag guard `envoy.reloadable_features.conn_pool_grid_early_return_on_teardown` (enabled by default) to shield both H3 and H2/H1 pools during destruction: * **Early Return**: Inside `WrapperCallbacks::onConnectionAttemptFailed()`, we check if `delete_started_` is `true`. If the flag is enabled and the wrapper is already being deleted/torn down, we return immediately, completely skipping the unsafe virtual method lookups on the half-destructed pool object. * **ALPN & Protocol Safety**: This safely protects the system regardless of which pool is being torn down (H3 or H2/H1 mixed pools). --- ### 🧪 Testing & Verification * Added the `DestroyGridWithActiveH2Attempts` unit test to verify that tearing down the `ConnectivityGrid` while an active H2 connection attempt is running does crash without the fix and not with the fix. Risk Level: low, only exposed with log level trace Testing: new unit test Docs Changes: N/A Release Notes: Y Platform Specific Features: N/A Runtime guard: envoy.reloadable_features.conn_pool_grid_early_return_on_teardown Signed-off-by: Dan Zhang <danzh@google.com> Co-authored-by: Dan Zhang <danzh@google.com>
dc1fddb to
eb48580
Compare
a3b5f90 to
9415483
Compare
added 3 commits
June 26, 2026 04:20
Add the envoy.upstreams.http.reverse_tunnel extension: a per-cluster upstream (client) codec for reverse-connection clusters that makes graceful drain observable and controllable, built on the Http::ClientCodecFactory seam. - ReverseTunnelUpstreamCodecOptions (typed_extension_protocol_options) doubles as the ClientCodecFactory; it is scoped to HTTP/2 envoy.clusters.reverse_connection clusters and is a no-op otherwise. - Observes a peer GOAWAY (stat) and forwards to the pool so normal drain runs. - Two-phase client GOAWAY (graceful max-stream-id, then final) and a graceful pool drain at the end of the drain window so in-flight requests finish. - UpstreamCodecDrainRegistry (TLS-backed singleton) + admin endpoint /reverse_tunnel/drain_clusters to fan a drain out to worker-thread codecs. Gated by the per-cluster proto field enable_drain_with_goaway; off by default. Signed-off-by: Prasad I V <prasad.iv@databricks.com>
Expose the codec factory via ProtocolOptionsConfig::upstreamHttpClientCodecFactory(), switch createClientCodec to the one-arg (nullptr-to-defer) contract, and return nullptr in the decline cases instead of calling create_default. Signed-off-by: Prasad I V <prasad.iv@databricks.com>
Update the config_test Context helper for the new ClientCodecFactory::Context options field. Signed-off-by: Prasad I V <prasad.iv@databricks.com>
9415483 to
4835218
Compare
yanavlasov
pushed a commit
to envoyproxy/envoy
that referenced
this pull request
Jun 27, 2026
Adds `Http::ClientCodecFactory`, an optional per-cluster extension point for building the upstream (client) HTTP codec. It is discovered from `typed_extension_protocol_options` and surfaced via a new non-pure `ClusterInfo::upstreamHttpClientCodecFactory()` accessor that defaults to empty. `CodecClientProd` invokes the factory when present, handing it a thunk that builds the stock codec, so an implementation can either decorate the stock codec or return a fresh one. With no factory configured the stock codec is built exactly as before and behavior is unchanged. Motivating use case: reverse tunnels invert the HTTP roles -- the connection is dialed from the downstream to the upstream, but HTTP requests then flow from the upstream to the downstream over that tunnel, so the upstream is the HTTP/2 client and the downstream is the HTTP/2 server. Graceful drain of such a tunnel needs cooperation in both directions and both land on the upstream (client) codec: when the upstream is draining it must signal the downstream to dial a replacement connection while in-flight requests finish; when the downstream is draining it must tell the upstream to stop sending new requests and let the downstream re-establish, again finishing in-flight requests. Neither is expressible with the stock upstream codec, and Envoy currently has no per-cluster hook to substitute the upstream (client) codec (`CodecClientProd` hard-codes it by protocol). This change adds that hook generically; the reverse-tunnel codec that uses it is a follow-up change. Reverse-tunnel implementation that uses this seam (in the ivpr/envoy fork): the upstream drain-aware client codec (ivpr#4) and the downstream drain-aware HCM that dials a replacement tunnel (ivpr#5). ## Reverse-tunnel graceful drain (sequence diagram) ```mermaid sequenceDiagram autonumber participant D as Downstream participant U as Upstream Note over D,U: Reverse tunnel setup - unlike a normal hop, the downstream dials OUT to the upstream D->>U: open TCP connection and initiate reverse tunnel U-->>D: tunnel established Note over D,U: On the established tunnel the upstream is the HTTP/2 client and the downstream is the HTTP/2 server, so requests flow from upstream to downstream U->>D: request over tunnel D-->>U: 200 Note over D,U: Upstream graceful drain U->>D: GOAWAY - drain signal toward downstream D->>D: dial a replacement tunnel to a healthy upstream U->>D: in-flight request continues on the old tunnel D-->>U: 200 Note over D,U: old tunnel closes once idle, new requests use the replacement Note over D,U: Downstream graceful drain D->>U: GOAWAY - tell upstream to stop sending new requests U->>U: drain pool, route new requests to the replacement tunnel U->>D: in-flight request continues on the old tunnel D-->>U: 200 Note over D,U: downstream re-establishes, old tunnel closes once idle ``` Risk Level: Low -- new, opt-in per-cluster extension point. When no factory is configured the stock codec is used and there is no behavior change. Testing: `//test/common/http:codec_client_test` covers the unchanged no-factory path. A dedicated unit test for the factory-injection path is a planned follow-up. Docs Changes: None (the new interface is documented inline in `envoy/http/client_codec_factory.h`). Release Notes: None (extension point only; no user-facing behavior change on its own). Platform Specific Features: None. Runtime guard: None (this PR adds no runtime-guarded behavior). API Considerations: No xDS/proto API changes. Adds one new C++ extension interface (`envoy/http/client_codec_factory.h`) and a non-pure `ClusterInfo::upstreamHttpClientCodecFactory()` accessor that defaults to empty, so existing `ClusterInfo` implementations are unaffected. --------- Signed-off-by: Prasad I V <prasad.iv@databricks.com> Co-authored-by: Prasad I V <prasad.iv@databricks.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
envoy.upstreams.http.reverse_tunnel: a per-cluster upstream (client) codec forreverse-connection clusters, built on the
Http::ClientCodecFactoryseam from http: add per-cluster upstream client codec factory seam #3.ReverseTunnelUpstreamCodecOptionsdoubles as the factory; scoped to HTTP/2envoy.clusters.reverse_connectionclusters and a no-op otherwise.final) and drives a graceful pool drain so in-flight requests finish.
UpstreamCodecDrainRegistry(TLS-backed singleton) + admin endpoint/reverse_tunnel/drain_clustersto fan a drain out to worker-thread codecs.enable_drain_with_goaway; off by default.How it fits (reverse tunnels)
On a reverse tunnel the upstream is the HTTP/2 client; this codec lets a draining upstream
emit GOAWAY from the client side and drain its pool gracefully, so in-flight requests finish while
clients move to a healthy upstream.
Stack
Stacked on #3 (the generic codec-factory seam). Paired with #5 (the downstream drain-aware HCM).
Test plan
bazel test //test/extensions/upstreams/http/reverse_tunnel:config_test