adapter: add LaunchDarkly reconnect integration test#37391
Draft
jasonhernandez wants to merge 1 commit into
Draft
adapter: add LaunchDarkly reconnect integration test#37391jasonhernandez wants to merge 1 commit into
jasonhernandez wants to merge 1 commit into
Conversation
Add an integration test that reproduces incident-984: the LaunchDarkly data source must reconnect after its streaming connection drops with a non-Eof error, so flag updates keep syncing. This lands ahead of the upstream SDK bump so the behavior is characterized against the current (fork) SDK first, and the bump must keep the test green. To make this testable against a controlled server, add a hidden `--launchdarkly-base-uri` flag (env `LAUNCHDARKLY_BASE_URI`) that overrides the SDK's streaming, polling, and events endpoints with a single base URL via the SDK's relay-proxy support. This is also generally useful for pointing at a LaunchDarkly relay proxy. It is threaded through `SystemParameterSyncClientConfig::LaunchDarkly` into `ld_config`. The test (test/launchdarkly-reconnect) runs a mock LaunchDarkly streaming server that serves an initial flag value, resets the first streaming connection mid-stream with a TCP RST (a non-Eof transport error, as in the incident), and serves an updated value to every reconnecting client. environmentd is pointed at the mock, so reaching the updated value proves the data source reconnected; a regressed SDK stays stuck on the initial value. Unlike test/launchdarkly, this needs no real LaunchDarkly credentials, and runs in the nightly pipeline. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Motivation
incident-984 was a runtime failure: the LaunchDarkly data source stopped reconnecting after its streaming connection dropped with a non-
Eoferror, silently wedging flag sync. The existingtest/launchdarklynightly covers value sync, persistence, targeting, and the kill switch, but nothing exercises reconnect after a mid-stream drop. A prior attempt to add such a test was abandoned because the failure couldn't be reproduced against real LaunchDarkly ("the test would need to cut the connection"). This reproduces it deterministically against a mock.This is the reconnect test from #37026, un-stacked from the SDK bump (#37025) and rebased onto current
main, so it lands ahead of the upstream-SDK migration. The goal is to simplify review and increase confidence: the test characterizes reconnect behavior against today's (fork) SDK first, and the SDK bump must keep it green.What changed
Production: a hidden
--launchdarkly-base-uriflag (envLAUNCHDARKLY_BASE_URI) that overrides the SDK's streaming/polling/events endpoints with a single base URL, via the SDK'sServiceEndpointsBuilder::relay_proxy. Generally useful for relay-proxy setups; here it lets tests point the SDK at a mock. Threaded throughSystemParameterSyncClientConfig::LaunchDarkly→ld_config. The current fork SDK (2.6.x) already exposes this API, so no SDK change is needed.Test (
test/launchdarkly-reconnect):mock_ld.py— a minimal mock of the LD streaming API. The first streaming client gets an initial flag value (2 GiB), then the connection is reset mid-stream with a TCP RST (SO_LINGER0) so the SDK sees a non-Eoftransport error (the incident class), not theEofit always recovered from. Every reconnecting client gets an updated value (3 GiB).mzcompose.py— boots environmentd pointed at the mock (mappingmax_result_sizeto the flag) and assertsSHOW max_result_sizereaches3GB. That can only happen if the data source reconnected after the reset; a regressed SDK stays stuck at2GBand the assertion times out.test/launchdarkly).Validation status
cargo checkclean (mz-adapter,mz-environmentd,mz-sqllogictest);bin/fmtclean.main. It should pass (we reverted to the fork precisely because it handles incident-984's non-Eofreconnect), but the mock uses a TCP RST specifically and that path is unverified on the fork.LaunchDarkly reconnectjob is confirmed green on this branch (fork SDK). Trigger via theci-nightlylabel.🤖 Generated with Claude Code