Skip to content

http: add per-cluster upstream client codec factory seam#45632

Merged
yanavlasov merged 11 commits into
envoyproxy:mainfrom
ivpr:per-cluster-upstream-codec-factory
Jun 27, 2026
Merged

http: add per-cluster upstream client codec factory seam#45632
yanavlasov merged 11 commits into
envoyproxy:mainfrom
ivpr:per-cluster-upstream-codec-factory

Conversation

@ivpr

@ivpr ivpr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor

Commit Message: http: add per-cluster upstream client codec factory seam

Additional Description:
Adds Http::ClientCodecFactory, an optional per-cluster extension point for building the upstream
(client) HTTP codec. It is discovered from typed_extension_protocol_options and surfaced via a new
non-pure ClusterInfo::upstreamHttpClientCodecFactory() accessor that defaults to empty.
CodecClientProd invokes the factory when present, handing it a thunk that builds the stock codec,
so an implementation can either decorate the stock codec or return a fresh one. With no factory
configured the stock codec is built exactly as before and behavior is unchanged.

Motivating use case: reverse tunnels invert the HTTP roles -- the connection is dialed from the
downstream to the upstream, but HTTP requests then flow from the upstream to the downstream over
that tunnel, so the upstream is the HTTP/2 client and the downstream is the HTTP/2 server. Graceful
drain of such a tunnel needs cooperation in both directions and both land on the upstream (client)
codec: when the upstream is draining it must signal the downstream to dial a replacement connection
while in-flight requests finish; when the downstream is draining it must tell the upstream to stop
sending new requests and let the downstream re-establish, again finishing in-flight requests.
Neither is expressible with the stock upstream codec, and Envoy currently has no per-cluster hook to
substitute the upstream (client) codec (CodecClientProd hard-codes it by protocol). This change
adds that hook generically; the reverse-tunnel codec that uses it is a follow-up change.

Reverse-tunnel implementation that uses this seam (in the ivpr/envoy fork): the upstream drain-aware client codec (ivpr#4) and the downstream drain-aware HCM that dials a replacement tunnel (ivpr#5).

Reverse-tunnel graceful drain (sequence diagram)

sequenceDiagram
    autonumber
    participant D as Downstream
    participant U as Upstream

    Note over D,U: Reverse tunnel setup - unlike a normal hop, the downstream dials OUT to the upstream
    D->>U: open TCP connection and initiate reverse tunnel
    U-->>D: tunnel established
    Note over D,U: On the established tunnel the upstream is the HTTP/2 client and the downstream is the HTTP/2 server, so requests flow from upstream to downstream
    U->>D: request over tunnel
    D-->>U: 200

    Note over D,U: Upstream graceful drain
    U->>D: GOAWAY - drain signal toward downstream
    D->>D: dial a replacement tunnel to a healthy upstream
    U->>D: in-flight request continues on the old tunnel
    D-->>U: 200
    Note over D,U: old tunnel closes once idle, new requests use the replacement

    Note over D,U: Downstream graceful drain
    D->>U: GOAWAY - tell upstream to stop sending new requests
    U->>U: drain pool, route new requests to the replacement tunnel
    U->>D: in-flight request continues on the old tunnel
    D-->>U: 200
    Note over D,U: downstream re-establishes, old tunnel closes once idle
Loading

Risk Level: Low -- new, opt-in per-cluster extension point. When no factory is configured the stock
codec is used and there is no behavior change.

Testing: //test/common/http:codec_client_test covers the unchanged no-factory path. A dedicated
unit test for the factory-injection path is a planned follow-up.

Docs Changes: None (the new interface is documented inline in envoy/http/client_codec_factory.h).

Release Notes: None (extension point only; no user-facing behavior change on its own).

Platform Specific Features: None.

Runtime guard: None (this PR adds no runtime-guarded behavior).

API Considerations: No xDS/proto API changes. Adds one new C++ extension interface
(envoy/http/client_codec_factory.h) and a non-pure ClusterInfo::upstreamHttpClientCodecFactory()
accessor that defaults to empty, so existing ClusterInfo implementations are unaffected.

@ivpr ivpr force-pushed the per-cluster-upstream-codec-factory branch 3 times, most recently from 22b605f to dc1fddb Compare June 15, 2026 16:23
Introduce Http::ClientCodecFactory, an optional per-cluster extension point
for building the upstream (client) HTTP codec. ClusterInfo exposes it via
upstreamHttpClientCodecFactory(); ClusterInfoImpl recovers it from
typed_extension_protocol_options by sidecast (an options object that also
implements ClientCodecFactory is the factory). CodecClientProd routes codec
construction through the factory when present, passing a thunk that builds the
stock codec so the factory can either decorate it or return a fresh codec.

This is a generic, opt-in seam: with no factory configured, behavior is
unchanged and the stock codec is used.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr

ivpr commented Jun 15, 2026

Copy link
Copy Markdown
Contributor Author

@agrawroh / @basundhara-c Can you please review and assign to folks who can review this

@basundhara-c

Copy link
Copy Markdown
Contributor

/gemini-review

Comment thread source/common/http/codec_client.cc Outdated
Comment thread source/common/http/codec_client.cc Outdated
@ivpr

ivpr commented Jun 16, 2026

Copy link
Copy Markdown
Contributor Author

/retest

Add RELEASE_ASSERT checks to guard against calling create_default multiple times and returning a null codec from custom upstream client codec factories.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@agrawroh

Copy link
Copy Markdown
Member

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a per-cluster upstream HTTP client codec factory (ClientCodecFactory) that allows decorating or replacing the stock codec via typed extension protocol options. The reviewer provided valuable feedback on several areas: first, the stock codec is currently constructed eagerly, which can cause immediate, unwanted side effects on the connection if the factory decides to replace it; constructing it lazily would prevent this. Second, finding the factory by iterating over absl::flat_hash_map introduces non-deterministic behavior if multiple factories are configured, so the code should detect and throw an exception in such cases. Finally, envoy/common/pure.h should be explicitly included in client_codec_factory.h for the PURE macro.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread source/common/http/codec_client.cc Outdated
Comment thread source/common/upstream/upstream_impl.cc Outdated
Comment thread envoy/http/client_codec_factory.h
Avoid eager stock codec construction when an upstream codec factory is configured, make factory selection deterministic by sorting extension option names, and add explicit pure.h include for PURE usage.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr

ivpr commented Jun 17, 2026

Copy link
Copy Markdown
Contributor Author

/retest

Prasad I V added 2 commits June 18, 2026 12:01
Replace the create_default callback with a nullptr-return contract so
CodecClientProd builds the stock codec via a single switch only when no
custom codec is used (avoids eager stock construction side effects).

Discover the per-cluster factory via an explicit
ProtocolOptionsConfig::upstreamHttpClientCodecFactory() hook instead of a
dynamic_cast probe, and enforce at-most-one factory per cluster.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
Avoid the non-dictionary word flagged by the pedantic spell check.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr

ivpr commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a per-cluster upstream HTTP client codec factory interface (ClientCodecFactory) allowing custom or decorated client codecs to be built via cluster protocol options. The review feedback recommends passing the transport socket options (TransportSocketOptions) into the factory's Context struct. This addition is necessary for custom codecs to replicate or decorate stock HTTP/1 behavior, which depends on whether the connection is proxied. The feedback includes actionable suggestions to update the interface, the codec client instantiation, and the associated unit tests.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread envoy/http/client_codec_factory.h Outdated
Comment thread envoy/http/client_codec_factory.h
Comment thread source/common/http/codec_client.cc
Comment thread test/common/http/codec_client_test.cc
Add the connection's transport socket options to ClientCodecFactory::Context
so a custom codec factory can replicate or decorate the stock HTTP/1 codec,
which inspects http11ProxyInfo() to detect a proxied connection.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr

ivpr commented Jun 18, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a per-cluster factory interface (ClientCodecFactory) for upstream HTTP codecs, allowing custom or decorated client codecs to be built via typed_extension_protocol_options. CodecClientProd is updated to consult this factory and fall back to stock codecs if none is provided. The review feedback suggests explicitly documenting the lifetime contract of the returned factory in ProtocolOptionsConfig to prevent potential lifetime bugs.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread envoy/upstream/upstream.h
Note that upstreamHttpClientCodecFactory() must return a factory owned by the
options object, since ClusterInfoImpl pins it via a shared_ptr aliasing the
options object.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
// constructing it has immediate side effects on the connection (e.g. the HTTP/2 codec writes
// SETTINGS), so building one we would discard could corrupt the connection.
if (auto factory = host->cluster().upstreamHttpClientCodecFactory(); factory.has_value()) {
codec_ = factory->createClientCodec(Http::ClientCodecFactory::Context{

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are skipping quic_session.Initialize() for the custom factory path. We should evaluate whether this is okay.

This should be fine for reverse tunnels, but it's an invariant we should note; if a custom codec factory is used, QUIC will not work?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. The contract is that a factory returning a codec owns full construction, including QUIC init for HTTP/3 (it can dynamic_cast context.connection to EnvoyQuicClientSession& and call Initialize(), like core does). Documented this on the createClientCodec docstring and also added an HTTP/3 test covering the factory path.

Comment thread envoy/upstream/upstream.h
* Defaults to none. An options extension attached via typed_extension_protocol_options
* can override this to make CodecClientProd build a custom or decorated client codec.
*/
virtual OptRef<const Http::ClientCodecFactory> upstreamHttpClientCodecFactory() const {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why cannot this be PURE?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is on generic ProtocolOptionsConfig, the base for all typed_extension_protocol_options objects (TCP options, HTTP options, etc.), almost none of which provide a codec. PURE would force all of them to add a boilerplate return {};. Codec providers override it to return *this; everyone else uses the default.

(createClientCodec() on ClientCodecFactory itself is PURE, since a real codec factory must implement it.)

TEST(CodecClientProdTest, NoUpstreamClientCodecFactoryUsesStockCodec) {
NiceMock<Event::MockDispatcher> dispatcher;
NiceMock<Random::MockRandomGenerator> random;
auto cluster = std::make_shared<NiceMock<Upstream::MockClusterInfo>>();

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to add more tests covering protocols other than HTTP/1, like:

  • HTTP/2 factory-used + factory-returns-nullptr
  • HTTP/3 + whatever we decide for QUIC: do we reject the config if it's used with a custom factory?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Added tests for HTTP/2 and HTTP/3 as well

}

// A factory that returns nullptr defers to the stock codec, which CodecClientProd then builds.
TEST(CodecClientProdTest, UpstreamClientCodecFactoryNullptrUsesStockCodec) {

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We also need a test for the case where multiple factories are registered, and we throw a "multiple upstream HTTP client codec factories configured" error

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

.WillOnce(Invoke([&](const ClientCodecFactory::Context& context) -> ClientConnectionPtr {
EXPECT_EQ(CodecType::HTTP1, context.type);
EXPECT_EQ(cluster.get(), &context.cluster);
EXPECT_EQ(nullptr, context.options);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add one test that passes a real options and asserts the factory receives that same value

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// SETTINGS), so building one we would discard could corrupt the connection.
if (auto factory = host->cluster().upstreamHttpClientCodecFactory(); factory.has_value()) {
codec_ = factory->createClientCodec(Http::ClientCodecFactory::Context{
type, *connection_, *this, host->cluster(), random_generator, options});

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we are also not passing the response header limits that are passed for stock HTTP/1 and HTTP/2:

      host->cluster().maxResponseHeadersKb().value_or(...),   // ← response-header size cap
      host->cluster().maxResponseHeadersCount(), ...);         // ← response-header count cap

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are passing host->cluster(), factory can read them from context.cluster (maxResponseHeadersKb().value_or(...), maxResponseHeadersCount()), which the reverse_tunnel codec already does(code ref). Unlike the transport-socket options (not reachable from the cluster, hence added to Context), the header limits are available via Context::cluster.

Prasad I V added 3 commits June 18, 2026 20:39
Add HTTP/2 factory-used and factory-returns-nullptr cases, assert the
transport socket options are forwarded to the factory, and verify that
configuring multiple upstream client codec factories on one cluster is
rejected.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
Note that a factory returning a non-null codec owns the full construction the
stock path would do, including QUIC session initialization for HTTP/3, and
that returning nullptr defers to the stock codec for unsupported types.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
Verify the factory is consulted for HTTP/3 and that returning a codec skips the
stock QUIC construction path (running against a mock connection without
crashing proves the dynamic_cast/Initialize path is not taken).

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr ivpr requested a review from basundhara-c June 19, 2026 03:41
@agrawroh

Copy link
Copy Markdown
Member

/assign @yanavlasov

@ivpr

ivpr commented Jun 25, 2026

Copy link
Copy Markdown
Contributor Author

@yanavlasov Can you please review this

@yanavlasov yanavlasov left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, just some style changes, please.

/wait

Comment thread envoy/http/client_codec_factory.h Outdated
namespace Envoy {

namespace Network {
class Connection;

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Per the style guide, we try to avoid forward declarations and include the headers with the types definitions. If it does not cause a problem, please include headers with the types instead of forward declarations.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. I've now change this to use include files

Per the style guide, include the headers defining the types used in
ClientCodecFactory::Context rather than forward-declaring them. No dependency
cycle: upstream_interface only forward-declares Http::ClientCodecFactory.

Signed-off-by: Prasad I V <prasad.iv@databricks.com>
@ivpr

ivpr commented Jun 26, 2026

Copy link
Copy Markdown
Contributor Author

/retest

@yanavlasov yanavlasov merged commit 2eec278 into envoyproxy:main Jun 27, 2026
28 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants