Skip to content

fix(analytics): Make local-flags polling loop shutdown promptly#149

Open
tylerjroach wants to merge 2 commits into
masterfrom
fix/polling-thread-prompt-shutdown
Open

fix(analytics): Make local-flags polling loop shutdown promptly#149
tylerjroach wants to merge 2 commits into
masterfrom
fix/polling-thread-prompt-shutdown

Conversation

@tylerjroach

Copy link
Copy Markdown
Contributor

Summary

stop_polling_for_definitions! previously took up to polling_interval_in_seconds to return (3-minute teardown observed during a cross-SDK audit at the default 60s interval — three polling cycles before the test process gave up) because the polling thread blocked on an uninterruptible sleep before checking @stop_polling. Setting the flag couldn't preempt the sleep, so Thread#join had to wait.

Replace the sleep with Mutex#synchronize + ConditionVariable#wait, and have stop_polling_for_definitions! broadcast the condition so the polling thread wakes immediately and breaks out of the loop on the next iteration.

Context

Found during a cross-SDK audit of feature-flag / OpenFeature implementations. The Python and Go server SDKs already handle this correctly (Python via task cancellation, Go via context).

Test plan

  • All 44 existing local_flags_spec examples pass
  • Full Ruby spec suite passes (one pre-existing unrelated failure: tracker_spec.rb:23 calls CGI.parse which was extracted out of stdlib in Ruby 4)
  • Shutdown timing smoke test: stop_polling_for_definitions! returned in ~0.1ms after my change vs. up to 60_000ms before, with the polling thread actively waiting on the condition

🤖 Generated with Claude Code

stop_polling_for_definitions! previously took up to polling_interval_in_seconds
to return (3 min teardown observed during cross-SDK audit at the default 60s
interval) because the polling thread blocked on an uninterruptible sleep before
checking @stop_polling.

Replace the sleep with Mutex#synchronize + ConditionVariable#wait, and have
stop_polling_for_definitions! broadcast the condition so the thread wakes
immediately. Shutdown is now ~0.1ms in a smoke test.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tylerjroach tylerjroach requested review from a team and rahul-mixpanel June 15, 2026 20:49
@codecov

codecov Bot commented Jun 15, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 96.68%. Comparing base (e5edf71) to head (deac4ef).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #149      +/-   ##
==========================================
+ Coverage   96.64%   96.68%   +0.03%     
==========================================
  Files          14       14              
  Lines         656      663       +7     
==========================================
+ Hits          634      641       +7     
  Misses         22       22              
Flag Coverage Δ
openfeature 100.00% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Harness.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

The previous commit's polling loop called ConditionVariable#wait inside
@polling_mutex but checked @stop_polling outside it:

  @polling_mutex.synchronize do
    @polling_condition.wait(@polling_mutex, interval)
  end
  break if @stop_polling

If stop_polling_for_definitions! set @stop_polling = true and called
broadcast while the polling thread was outside the mutex (typically mid
fetch_flag_definitions), the broadcast had no waiter and was lost. The
thread would then re-enter the synchronized region and call wait(interval)
again, riding out the full polling interval before observing @stop_polling
— reintroducing the original slow-shutdown bug for the duration of one
extra tick.

Move the predicate check inside the synchronized region so the polling
thread sees @stop_polling consistently with the broadcast and skips the
wait when shutdown has already been requested.

Add two regression specs:
  - shutdown latency while the thread is parked on the CV (basic case)
  - shutdown latency when shutdown races an in-flight fetch (race case)

Without the predicate-inside-mutex fix, the race spec fails with elapsed
~= polling_interval_in_seconds. With it, elapsed is sub-millisecond.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@tylerjroach

Copy link
Copy Markdown
Contributor Author

Pushed deac4ef to close a residual race I missed in the first commit.

What was wrong: The original commit moved the sleep to a ConditionVariable#wait but kept the break if @stop_polling check outside the mutex. If stop_polling_for_definitions! set @stop_polling = true and called broadcast while the polling thread was outside the synchronized region (e.g. mid-fetch_flag_definitions), the broadcast had no waiter and was lost. The thread would then re-enter the synchronized region and call wait(interval) again, riding out the full polling interval before observing the flag — reintroducing the original slow-shutdown bug for one extra tick.

Fix: Move the predicate check inside synchronize, before and after wait:

stopped = @polling_mutex.synchronize do
  next true if @stop_polling
  @polling_condition.wait(@polling_mutex, @config[:polling_interval_in_seconds])
  @stop_polling
end
break if stopped

This is the standard "always check the predicate under the mutex before/after wait" idiom for ConditionVariable.

Regression coverage: Added two specs:

  • shuts down promptly while the polling thread is waiting on the interval — basic case (thread parked on CV)
  • shuts down promptly when stop races an in-flight fetch — race case (thread mid-fetch)

I verified the race spec actually catches the bug by temporarily reverting just the predicate-inside-mutex change: the spec failed with expected < 0.5, got 1.005 (full polling interval rode out). With the fix in place, it passes with elapsed sub-millisecond.

All 46 local_flags_spec examples pass.

@tylerjroach tylerjroach changed the title Make local-flags polling loop shutdown promptly fix(analytics): Make local-flags polling loop shutdown promptly Jun 16, 2026
@tylerjroach tylerjroach requested a review from Copilot June 16, 2026 13:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR improves Mixpanel::Flags::LocalFlagsProvider’s polling shutdown behavior by making the polling loop’s interval wait interruptible, so stop_polling_for_definitions! can return promptly instead of waiting out the full polling interval.

Changes:

  • Replace the polling thread’s sleep interval with a Mutex + ConditionVariable#wait so shutdown can wake the thread immediately.
  • Update stop_polling_for_definitions! to set the stop flag under the mutex and broadcast the condition before joining the thread.
  • Add specs that validate prompt shutdown both while waiting on the interval and when stop races an in-flight fetch.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.

File Description
lib/mixpanel-ruby/flags/local_flags_provider.rb Makes polling interval waits interruptible via ConditionVariable, and broadcasts on stop to wake the polling thread promptly.
spec/mixpanel-ruby/flags/local_flags_spec.rb Adds regression coverage for prompt shutdown in both “waiting” and “mid-fetch” scenarios.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

)

polling_provider.start_polling_for_definitions!
sleep 0.1 # let the polling thread enter the condition wait
)

polling_provider.start_polling_for_definitions!
second_fetch_started.pop # 2nd fetch is now blocked in the polling thread
# so the broadcast goes to no waiter. Run it in a thread so we can release
# the fetch afterwards.
stopper = Thread.new { polling_provider.stop_polling_for_definitions! }
sleep 0.05 # give stopper time to set @stop_polling + broadcast
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants