fix(analytics): Make local-flags polling loop shutdown promptly#149
fix(analytics): Make local-flags polling loop shutdown promptly#149tylerjroach wants to merge 2 commits into
Conversation
stop_polling_for_definitions! previously took up to polling_interval_in_seconds to return (3 min teardown observed during cross-SDK audit at the default 60s interval) because the polling thread blocked on an uninterruptible sleep before checking @stop_polling. Replace the sleep with Mutex#synchronize + ConditionVariable#wait, and have stop_polling_for_definitions! broadcast the condition so the thread wakes immediately. Shutdown is now ~0.1ms in a smoke test. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #149 +/- ##
==========================================
+ Coverage 96.64% 96.68% +0.03%
==========================================
Files 14 14
Lines 656 663 +7
==========================================
+ Hits 634 641 +7
Misses 22 22
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
The previous commit's polling loop called ConditionVariable#wait inside
@polling_mutex but checked @stop_polling outside it:
@polling_mutex.synchronize do
@polling_condition.wait(@polling_mutex, interval)
end
break if @stop_polling
If stop_polling_for_definitions! set @stop_polling = true and called
broadcast while the polling thread was outside the mutex (typically mid
fetch_flag_definitions), the broadcast had no waiter and was lost. The
thread would then re-enter the synchronized region and call wait(interval)
again, riding out the full polling interval before observing @stop_polling
— reintroducing the original slow-shutdown bug for the duration of one
extra tick.
Move the predicate check inside the synchronized region so the polling
thread sees @stop_polling consistently with the broadcast and skips the
wait when shutdown has already been requested.
Add two regression specs:
- shutdown latency while the thread is parked on the CV (basic case)
- shutdown latency when shutdown races an in-flight fetch (race case)
Without the predicate-inside-mutex fix, the race spec fails with elapsed
~= polling_interval_in_seconds. With it, elapsed is sub-millisecond.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Pushed What was wrong: The original commit moved the Fix: Move the predicate check inside stopped = @polling_mutex.synchronize do
next true if @stop_polling
@polling_condition.wait(@polling_mutex, @config[:polling_interval_in_seconds])
@stop_polling
end
break if stoppedThis is the standard "always check the predicate under the mutex before/after Regression coverage: Added two specs:
I verified the race spec actually catches the bug by temporarily reverting just the predicate-inside-mutex change: the spec failed with All 46 |
There was a problem hiding this comment.
Pull request overview
This PR improves Mixpanel::Flags::LocalFlagsProvider’s polling shutdown behavior by making the polling loop’s interval wait interruptible, so stop_polling_for_definitions! can return promptly instead of waiting out the full polling interval.
Changes:
- Replace the polling thread’s
sleepinterval with aMutex+ConditionVariable#waitso shutdown can wake the thread immediately. - Update
stop_polling_for_definitions!to set the stop flag under the mutex andbroadcastthe condition before joining the thread. - Add specs that validate prompt shutdown both while waiting on the interval and when stop races an in-flight fetch.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
lib/mixpanel-ruby/flags/local_flags_provider.rb |
Makes polling interval waits interruptible via ConditionVariable, and broadcasts on stop to wake the polling thread promptly. |
spec/mixpanel-ruby/flags/local_flags_spec.rb |
Adds regression coverage for prompt shutdown in both “waiting” and “mid-fetch” scenarios. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| ) | ||
|
|
||
| polling_provider.start_polling_for_definitions! | ||
| sleep 0.1 # let the polling thread enter the condition wait |
| ) | ||
|
|
||
| polling_provider.start_polling_for_definitions! | ||
| second_fetch_started.pop # 2nd fetch is now blocked in the polling thread |
| # so the broadcast goes to no waiter. Run it in a thread so we can release | ||
| # the fetch afterwards. | ||
| stopper = Thread.new { polling_provider.stop_polling_for_definitions! } | ||
| sleep 0.05 # give stopper time to set @stop_polling + broadcast |
Summary
stop_polling_for_definitions!previously took up topolling_interval_in_secondsto return (3-minute teardown observed during a cross-SDK audit at the default 60s interval — three polling cycles before the test process gave up) because the polling thread blocked on an uninterruptiblesleepbefore checking@stop_polling. Setting the flag couldn't preempt the sleep, soThread#joinhad to wait.Replace the
sleepwithMutex#synchronize+ConditionVariable#wait, and havestop_polling_for_definitions!broadcast the condition so the polling thread wakes immediately and breaks out of the loop on the next iteration.Context
Found during a cross-SDK audit of feature-flag / OpenFeature implementations. The Python and Go server SDKs already handle this correctly (Python via task cancellation, Go via context).
Test plan
local_flags_specexamples passtracker_spec.rb:23callsCGI.parsewhich was extracted out of stdlib in Ruby 4)stop_polling_for_definitions!returned in ~0.1ms after my change vs. up to 60_000ms before, with the polling thread actively waiting on the condition🤖 Generated with Claude Code