Skip to content

events: mTLS authentication#10013

Draft
ripatel-fd wants to merge 5 commits into
mainfrom
ripatel/events-mtls
Draft

events: mTLS authentication#10013
ripatel-fd wants to merge 5 commits into
mainfrom
ripatel/events-mtls

Conversation

@ripatel-fd

@ripatel-fd ripatel-fd commented Jun 1, 2026

Copy link
Copy Markdown
Contributor

Replace custom challenge-response auth with TLS 1.3 client authentication

  • events: fix lifecycle bugs
  • events: remove custom signing scheme
  • events: make OpenSSL mandatory
  • events: mutual TLS authentication
  • events: add option to skip server cert verify

Close #9968

Copilot AI review requested due to automatic review settings June 1, 2026 04:45
@ripatel-fd ripatel-fd changed the title ripatel/events mtls events: mTLS authentication Jun 1, 2026
Comment thread src/app/firedancer/main.c
@greptile-jt

This comment was marked as spam.

greptile-jt[bot]

This comment was marked as spam.

@greptile-jt

greptile-jt Bot commented Jun 1, 2026

Copy link
Copy Markdown
Additional Comments (1)

src/disco/events/fd_event_client.h
Unused metric field transport_success_cnt

transport_success_cnt is declared in fd_event_client_metrics_t but is no longer incremented anywhere in fd_event_client.c after the removal of fd_event_client_handle_confirm_auth_resp. It is also not exported via any FD_MCNT_SET in fd_event_tile.c. Consider either removing the field or wiring it up in fd_event_client_grpc_conn_established.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/disco/events/fd_event_client.h
Line: 18

Comment:
**Unused metric field `transport_success_cnt`**

`transport_success_cnt` is declared in `fd_event_client_metrics_t` but is no longer incremented anywhere in `fd_event_client.c` after the removal of `fd_event_client_handle_confirm_auth_resp`. It is also not exported via any `FD_MCNT_SET` in `fd_event_tile.c`. Consider either removing the field or wiring it up in `fd_event_client_grpc_conn_established`.

How can I resolve this? If you propose a fix, please make it concise.

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.097018 s 0.096847 s -0.176%
backtest mainnet-406545575-perf snapshot load 1.935 s 1.958 s 1.189%
backtest mainnet-406545575-perf total elapsed 97.018391 s 96.847023 s -0.177%
firedancer mem usage with mainnet.toml 1138.39 GiB 1138.39 GiB 0.000%

This comment was marked as spam.

@ripatel-fd ripatel-fd force-pushed the ripatel/events-mtls branch from fea111f to 6f9fce5 Compare June 1, 2026 06:27
@ripatel-fd

Copy link
Copy Markdown
Contributor Author

@greptile-jt review

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.090128 s 0.08986 s -0.297%
backtest mainnet-406545575-perf snapshot load 1.787 s 1.787 s 0.000%
backtest mainnet-406545575-perf total elapsed 90.12792 s 89.859539 s -0.298%
firedancer mem usage with mainnet.toml 1138.39 GiB 1138.39 GiB 0.000%

@greptile-jt

This comment was marked as spam.

greptile-jt[bot]

This comment was marked as spam.

Copilot AI review requested due to automatic review settings June 1, 2026 06:46
@ripatel-fd ripatel-fd force-pushed the ripatel/events-mtls branch from 6f9fce5 to b4870db Compare June 1, 2026 06:46
@ripatel-fd

Copy link
Copy Markdown
Contributor Author

@greptile-jt review

@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.09181 s 0.091581 s -0.249%
backtest mainnet-406545575-perf snapshot load 1.813 s 1.839 s 1.434%
backtest mainnet-406545575-perf total elapsed 91.809582 s 91.58112 s -0.249%
firedancer mem usage with mainnet.toml 1138.39 GiB 1138.39 GiB 0.000%

@greptile-jt

greptile-jt Bot commented Jun 1, 2026

Copy link
Copy Markdown

Greptile Summary

This PR replaces the bespoke challenge-response authentication for the events tile with standard TLS 1.3 mutual authentication using RFC 7250 Raw Public Keys, delegating signing of the TLS CertificateVerify to the existing keyguard via a custom OpenSSL provider.

  • Auth model rewrite: A new fd_event_auth OpenSSL provider exposes the validator's Ed25519 identity as a TLS client credential; signing requests go through fd_keyguard_client_sign with the new FD_KEYGUARD_PAYLOAD_TLS_CV role, and the event client's state machine collapses from 5 states (DISCONNECTED/CONNECTING/AUTHENTICATING/CONFIRMING_AUTH/CONNECTED) to 3 (DISCONNECTED/CONNECTING/CONNECTED).
  • Protocol & keyguard cleanup: AuthenticateRequest/Response and ConfirmAuthChallenge* RPCs are dropped from event.proto; the keyguard removes FD_KEYGUARD_PAYLOAD_EVENT and FD_KEYGUARD_SIGN_TYPE_FD_EVENTS_AUTH_CONCAT_ED25519, with fd_keyguard_authorize_tls_cv extended to accept the 162-byte client payload and the event_sign link MTU bumped from 32 → 162.
  • OpenSSL becomes mandatory for the events tile: topology.c gates the entire tile on FD_HAS_OPENSSL, and a new tls_cert_verify option lets operators disable server certificate verification (useful for self-signed test deployments). The default endpoint in default.toml is switched from http:// to https://.
  • Resolved previous review threads: connected_timestamp is now initialized in fd_event_client.c, fd_circq is moved outside the FD_HAS_OPENSSL block in Local.mk so test_circq builds without OpenSSL, the stale HandshakeTimeouts/AuthFail metrics are gone, and the topology warning when OpenSSL is unavailable is gated on telemetry_enabled.
  • Test server: contrib/event-test-server is rewritten on tokio-rustls with a custom AnyRpkVerifier that validates Ed25519 SubjectPublicKeyInfo client certificates.
  • Minor: tls_cert_verify is int in fd_config.h but a uchar : 1 bitfield in fd_topo.h — works but worth aligning.

Confidence Score: 4/5

  • This PR is largely safe to merge; previous review issues are resolved and the remaining concern is a minor type inconsistency.
  • The refactor is well-structured: it replaces a custom auth scheme with standard TLS 1.3 mTLS, properly delegates signing to the keyguard, regenerates metrics, and addresses the four prior review threads (connected_timestamp init, test_circq build, stale metrics, gated warning). Only a minor stylistic type mismatch was found.
  • src/disco/events/fd_event_auth.c (new OpenSSL provider) and src/disco/events/fd_event_client.c (state machine rewrite) warrant the closest attention from reviewers.

Important Files Changed

Filename Overview
src/disco/events/fd_event_auth.c New OpenSSL provider implementing keymgmt and signature dispatch for ED25519 keys delegated to the keyguard. Cleanly wraps SSL_set1_credential with raw public key support.
src/disco/events/fd_event_client.c Major refactor replacing custom challenge-response auth with TLS 1.3 mTLS. State machine reduced from 5 to 3 states. The previously flagged connected_timestamp initialization is now in place at line 427.
src/disco/events/fd_event_tile.c TLS is now mandatory; new tls_cert_verify option toggles between loading system CA store and SSL_VERIFY_NONE. SSL context creation and teardown are wired through the existing privileged/unprivileged init paths.
src/disco/keyguard/fd_keyguard_authorize.c Expands TLS CV size check to cover the new 162-byte client-side payload, and switches event role authorization to use TLS CV (instead of the removed custom event payload).
src/disco/keyguard/fd_sign_tile.c Removes the event-specific sign type and updates the event sign-link MTU from 32 to 162 bytes to accommodate TLS CertificateVerify payloads.
src/app/firedancer/topology.c Event tile setup wrapped in # if FD_HAS_OPENSSL. The warning about disabled telemetry without OpenSSL is now gated on telemetry_enabled (previous review thread resolved). Event sign link MTU updated to 162UL.
src/app/shared/fd_config.h Adds int tls_cert_verify; field to event config. Note that this field is stored as a uchar : 1 bitfield in fd_topo_t, causing a minor type mismatch between the two representations.
src/disco/topo/fd_topo.h Adds uchar tls_cert_verify : 1; bitfield. Type differs from the int representation in fd_config.h; this works since values are 0/1, but the inconsistency may be worth aligning.
contrib/event-test-server/src/main.rs Rewritten Rust test server with tokio-rustls and a custom AnyRpkVerifier that accepts any well-formed Ed25519 SubjectPublicKeyInfo. Hardcoded server cert/key are clearly marked as test-only.

Last reviewed commit: b4870db

@greptile-jt greptile-jt Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

26 files reviewed, 1 comment

Edit Code Review Agent Settings | Greptile

@greptile-jt

greptile-jt Bot commented Jun 1, 2026

Copy link
Copy Markdown
Additional Comments (1)

src/app/shared/fd_config.h
Type mismatch with fd_topo_t

The new tls_cert_verify field is declared as int here but as a uchar : 1 bitfield in src/disco/topo/fd_topo.h. This works in practice because the value is only ever 0 or 1, but the inconsistency makes it easy to introduce subtle bugs (e.g. assigning a non-zero non-one value to the int and silently truncating). Consider aligning the two representations (either both int, both uchar, or both bitfields) for consistency with the other boolean-ish config fields.

Prompt To Fix With AI
This is a comment left during a code review.
Path: src/app/shared/fd_config.h
Line: 1

Comment:
**Type mismatch with `fd_topo_t`**

The new `tls_cert_verify` field is declared as `int` here but as a `uchar : 1` bitfield in `src/disco/topo/fd_topo.h`. This works in practice because the value is only ever 0 or 1, but the inconsistency makes it easy to introduce subtle bugs (e.g. assigning a non-zero non-one value to the `int` and silently truncating). Consider aligning the two representations (either both `int`, both `uchar`, or both bitfields) for consistency with the other boolean-ish config fields.

How can I resolve this? If you propose a fix, please make it concise.

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated no new comments.

Comments suppressed due to low confidence (1)

src/disco/events/fd_event_client.c:144

  • fd_url_parse_endpoint() returns whether the endpoint is SSL via _is_ssl, but the result is currently ignored. Since the client always configures an SSL connection later, accepting a non-https URL here would lead to attempting a TLS handshake against a plain HTTP endpoint (and hides misconfiguration). Consider rejecting non-SSL URLs explicitly right after parsing.
  fd_url_t url[1];
  _Bool _is_ssl = 0;
  if( FD_UNLIKELY( fd_url_parse_endpoint( url,
                                          _url,
                                          strlen( _url ),

@ripatel-fd ripatel-fd force-pushed the ripatel/events-mtls branch from b4870db to e961c91 Compare June 1, 2026 07:34
Copilot AI review requested due to automatic review settings June 1, 2026 07:37
@ripatel-fd ripatel-fd force-pushed the ripatel/events-mtls branch from e961c91 to 511338d Compare June 1, 2026 07:37
riptl added 2 commits June 1, 2026 07:38
Remove custom challenge-response authentication scheme from event
telemetry API.  To be replaced with mutual TLS in a future commit.

Rename 'authenticate' method to 'hello' to avoid confusion.
An upcoming change will enforce mutual TLS for event telemetry
connections, thus just assume OpenSSL is available in build
configuration.
riptl added 2 commits June 1, 2026 07:38
Allow clients to authenticate via RFC 7250 Raw Public Keys
@ripatel-fd ripatel-fd force-pushed the ripatel/events-mtls branch from 511338d to b15790a Compare June 1, 2026 07:38

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 24 out of 26 changed files in this pull request and generated 10 comments.

Comment on lines +431 to +435
client->event_stream = NULL;
client->metrics.transport_success_cnt++;
client->state = FD_EVENT_CLIENT_STATE_CONNECTED;
client->connected.connected_timestamp = fd_log_wallclock();

Comment on lines 468 to 470
FD_LOG_INFO(( "Requesting auth challenge from event server " FD_IP4_ADDR_FMT ":%u (%.*s)",
FD_IP4_ADDR_FMT_ARGS( client->server_ip4_addr ), client->server_tcp_port,
(int)client->server_fqdn_len, client->server_fqdn ));
Comment on lines +474 to 478
fd_env_client_handle_hello_resp( fd_event_client_t * client,
void const * protobuf,
ulong protobuf_sz ) {
(void)protobuf; (void)protobuf_sz;
client->state = FD_EVENT_CLIENT_STATE_CONNECTED;
Comment on lines 550 to +552
switch( request_ctx ) {
case FD_EVENT_CLIENT_REQ_CTX_AUTHENTICATE:
fd_event_client_handle_auth_challenge_resp( client, protobuf, protobuf_sz );
break;
case FD_EVENT_CLIENT_REQ_CTX_CONFIRM_AUTH:
fd_event_client_handle_confirm_auth_resp( client, protobuf, protobuf_sz );
case FD_EVENT_CLIENT_REQ_CTX_HELLO:
fd_env_client_handle_hello_resp( client, protobuf, protobuf_sz );
Comment on lines 448 to +452

fd_grpc_h2_stream_t * stream = fd_grpc_client_request_start1(
client->grpc_client,
"/events.v1.EventService/Authenticate", strlen("/events.v1.EventService/Authenticate"),
FD_EVENT_CLIENT_REQ_CTX_AUTHENTICATE,
"/events.v1.EventService/Hello", strlen("/events.v1.EventService/Hello"),
FD_EVENT_CLIENT_REQ_CTX_HELLO,
Comment on lines +431 to +435
client->event_stream = NULL;
client->metrics.transport_success_cnt++;
client->state = FD_EVENT_CLIENT_STATE_CONNECTED;
client->connected.connected_timestamp = fd_log_wallclock();

Comment on lines 468 to 470
FD_LOG_INFO(( "Requesting auth challenge from event server " FD_IP4_ADDR_FMT ":%u (%.*s)",
FD_IP4_ADDR_FMT_ARGS( client->server_ip4_addr ), client->server_tcp_port,
(int)client->server_fqdn_len, client->server_fqdn ));
Comment on lines +474 to 478
fd_env_client_handle_hello_resp( fd_event_client_t * client,
void const * protobuf,
ulong protobuf_sz ) {
(void)protobuf; (void)protobuf_sz;
client->state = FD_EVENT_CLIENT_STATE_CONNECTED;
Comment on lines 550 to +552
switch( request_ctx ) {
case FD_EVENT_CLIENT_REQ_CTX_AUTHENTICATE:
fd_event_client_handle_auth_challenge_resp( client, protobuf, protobuf_sz );
break;
case FD_EVENT_CLIENT_REQ_CTX_CONFIRM_AUTH:
fd_event_client_handle_confirm_auth_resp( client, protobuf, protobuf_sz );
case FD_EVENT_CLIENT_REQ_CTX_HELLO:
fd_env_client_handle_hello_resp( client, protobuf, protobuf_sz );
Comment on lines +1468 to +1470
# By default, the TLS certificate of the event server is
# verified against the CA certs in /etc/ssl/certs. To disable
# certificate verification, set this option to 'false'.
@github-actions

github-actions Bot commented Jun 1, 2026

Copy link
Copy Markdown

Performance Measurements ⏳

Suite Baseline New Change
backtest mainnet-406545575-perf per slot 0.08998 s 0.090015 s 0.039%
backtest mainnet-406545575-perf snapshot load 1.79 s 1.784 s -0.335%
backtest mainnet-406545575-perf total elapsed 89.979637 s 90.015093 s 0.039%
firedancer mem usage with mainnet.toml 1138.39 GiB 1138.39 GiB 0.000%

@ripatel-fd ripatel-fd marked this pull request as draft June 1, 2026 16:07
fd_topob_tile_out( topo, "sign", 0UL, "sign_event", 0UL );
}
# else /* no OpenSSL */
if( telemetry_enabled ) FD_LOG_WARNING(( "ignoring [telemetry] = true: this build of Firedancer is missing OpenSSL" ));

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be an error, user has explicit intent for telemetry, tell them to turn it off.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Use mTLS for events

4 participants