fix: synchronize CoalesceService slot lifecycle to prevent dropped reruns #486
JuliusHenke wants to merge 3 commits into
Pull request overview
This PR adjusts the concurrency control in CoalesceService to better synchronize per-key slot lifecycle and prevent rerun futures from being dropped or orphaned during task completion races.
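An orphaned future matters mostly to callers that wait on it. As a minimal illustration using only standard `java.util.concurrent` calls (the helper name `awaitAll` is hypothetical and does not come from the PR), `CompletableFuture.allOf` completes only once every input future completes, so a single future that nobody ever completes keeps the combined future pending:

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeoutException

// Hypothetical helper: waits for all futures, reporting false on timeout.
fun awaitAll(futures: List<CompletableFuture<*>>, timeoutMs: Long): Boolean =
    try {
        CompletableFuture.allOf(*futures.toTypedArray())
            .get(timeoutMs, TimeUnit.MILLISECONDS)
        true
    } catch (e: TimeoutException) {
        // At least one input future was still incomplete when the timeout hit.
        false
    }
```

Passing one completed future and one `CompletableFuture<Int>()` that is never completed makes `awaitAll` time out, which is the behaviour an orphaned rerun future would cause for a waiting caller.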
Changes:
- Synchronizes `submitAsync` state transitions on `slot.monitor` before starting the worker loop.
- Moves rerun promotion and slot shutdown/removal into a monitor-held critical section in the worker loop.
- Refactors the "first run" path to create/return its `CompletableFuture` only after the slot state is established.
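The changes listed above can be sketched roughly as follows. This is not the code from `CoalesceService.kt`: the class name `CoalesceSketch`, the `closed` flag, and the retry loop are assumptions made for illustration, and the sketch assumes every submission for a key carries an equivalent task. Only the field names `monitor`, `running`, `rerunRequested`, and `nextFuture` are taken from the PR description.

```kotlin
import java.util.concurrent.CompletableFuture
import java.util.concurrent.ConcurrentHashMap
import java.util.concurrent.Executor

// Hypothetical sketch of a per-key coalescing slot; not the PR's actual code.
class CoalesceSketch(private val executor: Executor) {
    private class Slot {
        val monitor = Any()
        var running = false
        var closed = false // assumption: marks a slot already removed from the map
        var rerunRequested = false
        var nextFuture: CompletableFuture<Unit>? = null
    }

    private val slots = ConcurrentHashMap<String, Slot>()

    fun submitAsync(key: String, task: () -> Unit): CompletableFuture<Unit> {
        while (true) {
            val slot = slots.computeIfAbsent(key) { Slot() }
            var startWorker = false
            // All state transitions happen while holding the slot monitor.
            val future: CompletableFuture<Unit>? = synchronized(slot.monitor) {
                when {
                    slot.closed -> null // stale slot: retry with a fresh one
                    slot.running -> {
                        // Coalesce: every submitter during a run shares one rerun future.
                        slot.rerunRequested = true
                        slot.nextFuture ?: CompletableFuture<Unit>().also { slot.nextFuture = it }
                    }
                    else -> {
                        slot.running = true
                        startWorker = true
                        CompletableFuture<Unit>()
                    }
                }
            }
            if (future == null) continue
            // The worker starts only after the slot state is established.
            if (startWorker) executor.execute { runLoop(key, slot, task, future) }
            return future
        }
    }

    private fun runLoop(key: String, slot: Slot, task: () -> Unit, first: CompletableFuture<Unit>) {
        var current = first
        while (true) {
            val error = runCatching { task() }.exceptionOrNull()
            // Rerun promotion and slot shutdown share one critical section, so a
            // submission cannot slip in between "no rerun seen" and "slot removed".
            val next: CompletableFuture<Unit>? = synchronized(slot.monitor) {
                val promoted = slot.nextFuture
                if (slot.rerunRequested && promoted != null) {
                    slot.rerunRequested = false
                    slot.nextFuture = null
                    promoted
                } else {
                    slot.running = false
                    slot.closed = true
                    slots.remove(key, slot)
                    null
                }
            }
            if (error == null) current.complete(Unit) else current.completeExceptionally(error)
            current = next ?: return
        }
    }
}
```

In this sketch, two submissions that arrive while a run is in flight receive the same future, and that future completes after exactly one additional run. Futures are completed outside the monitor so that caller callbacks never execute while the lock is held.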
Agent-Logs-Url: https://github.com/TorMap/tormap/sessions/ceafc039-466b-486c-a47f-746e72bd07db Co-authored-by: JuliusHenke <23460202+JuliusHenke@users.noreply.github.com>
Agent-Logs-Url: https://github.com/TorMap/tormap/sessions/3ff79a36-5735-4758-a207-d2ef45467cbb Co-authored-by: JuliusHenke <23460202+JuliusHenke@users.noreply.github.com>
Just as a heads up, I was blocked by some firewall rules while working on your feedback.
Warning: Firewall rules blocked me from connecting to one or more addresses. I tried to connect to the following addresses, but was blocked by firewall rules:
If you need me to access, download, or install something from one of these locations, you can either:
@copilot perform a final review to determine if there are any issues left
Final review done at 02ebc1c. Current state:
So I still see that intermittent test-stability concern as the remaining issue.
Motivation
- Fix a race in `CoalesceService` where submissions near task completion could create an orphan `nextFuture` (never completed) or allow duplicate concurrent executions for the same key.
- […] `CompletableFuture.allOf` callers, duplicate DB/cache work, and extra lock contention.
Description
- Synchronize `submitAsync` state transitions (`running`, `rerunRequested`, `nextFuture`) on `slot.monitor` so the initial-run path sets `running` and prepares the first future while holding the lock.
- Move the slot shutdown (`running=false` + `slots.remove`) into the same `synchronized(slot.monitor)` block and promote/clear `nextFuture` there to prevent orphaned futures and stale-slot removal races.
- […] `backend/src/main/kotlin/org/tormap/service/CoalesceService.kt` and preserves the intended coalescing semantics.
Testing
- `./gradlew :backend:test --tests "org.tormap.service.CoalesceServiceTest"` at the repo root failed because `./gradlew` was not present in that path. (failed)
- `cd backend && ./gradlew test --tests "org.tormap.service.CoalesceServiceTest"` failed due to a Gradle/toolchain environment error (`25.0.2`), so unit tests could not be executed in this environment. (failed)