Conversation
…50987532 🤖 Auto-Build: Update Ring Loadables
…52892094 🤖 Auto-Build: Update Ring Loadables
…53339573 🤖 Auto-Build: Update Ring Loadables
Update NEW branch
There was a problem hiding this comment.
Sorry @jkool702, your pull request is larger than the review limit of 150000 diff characters
|
/gemini summary |
There was a problem hiding this comment.
Code Review
This pull request refactors the lock-free batch claiming mechanism in forkrun_ring.c by splitting ring_claim_main into a side-effect-free helper do_lockfree_claim and a wrapper handling Bash variable bindings. It also introduces speculative claim limits, improves escrow draining, updates tokenization to pass offsets explicitly, and adds several C plugin unit tests. A critical issue was identified in the review where retry and poison metadata variables (RING_NUM_KILLS and RING_POISONED) are not reset for fresh batches, leading to state leakage from previous poisoned batches.
| if (batch.num_kills > 0) { | ||
| snprintf(buf, sizeof(buf), "%u", batch.num_kills); | ||
| bind_variable("RING_NUM_KILLS", buf, 0); | ||
|
|
||
| if (local_state->numa_enabled) { | ||
| worker_last_major = local_state->major_ring[my_read_idx & RING_MASK]; | ||
| worker_last_minor = | ||
| local_state->minor_ring[my_read_idx & RING_MASK] & ~FLAG_MAJOR_EOF; | ||
| u64toa(batch.idx, buf); | ||
| bind_variable("RING_BATCH_IDX", buf, 0); | ||
|
|
||
| if (local_state->minor_ring[(my_read_idx + claim_count - 1) & RING_MASK] & | ||
| FLAG_MAJOR_EOF) { | ||
| worker_last_minor |= FLAG_MAJOR_EOF; | ||
| int limit = 3; // Default to 3 retries | ||
| const char *s_lim = get_string_value("FORKRUN_RETRY_LIMIT"); | ||
| if (s_lim) limit = atoi(s_lim); | ||
|
|
||
| // limit < 0 → infinite retries (never poison) | ||
| // limit >= 0 → poison when kill count reaches the limit | ||
| if (limit >= 0 && batch.num_kills >= (uint32_t)limit) { | ||
| bind_variable("RING_POISONED", "1", 0); | ||
| } else { | ||
| bind_variable("RING_POISONED", "0", 0); | ||
| } | ||
| } |
There was a problem hiding this comment.
For fresh batches (batch.num_kills == 0), the retry and poison metadata variables (RING_NUM_KILLS and RING_POISONED) are not updated and will retain their values from the previous batch. If a previous batch was poisoned (RING_POISONED="1"), a subsequent fresh batch that fails just once will be incorrectly treated as poisoned by the Bash layer, leading to premature aborts. Resetting these variables to "0" for fresh batches prevents this critical state leakage.
if (batch.num_kills > 0) {
snprintf(buf, sizeof(buf), "%u", batch.num_kills);
bind_variable("RING_NUM_KILLS", buf, 0);
u64toa(batch.idx, buf);
bind_variable("RING_BATCH_IDX", buf, 0);
int limit = 3; // Default to 3 retries
const char *s_lim = get_string_value("FORKRUN_RETRY_LIMIT");
if (s_lim) limit = atoi(s_lim);
// limit < 0 → infinite retries (never poison)
// limit >= 0 → poison when kill count reaches the limit
if (limit >= 0 && batch.num_kills >= (uint32_t)limit) {
bind_variable("RING_POISONED", "1", 0);
} else {
bind_variable("RING_POISONED", "0", 0);
}
} else {
bind_variable("RING_NUM_KILLS", "0", 0);
bind_variable("RING_POISONED", "0", 0);
}There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 652e4b9062
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| // --- Export retry/poison metadata only for recycled (escrow) batches --- | ||
| // For fresh batches (num_kills == 0) these variables retain their previous | ||
| // values; the Bash layer only inspects them after a non-zero retry count. | ||
| if (batch.num_kills > 0) { |
There was a problem hiding this comment.
Reset poison metadata for non-retry escrow claims
RING_NUM_KILLS/RING_POISONED are now only updated when batch.num_kills > 0, so recycled batches with num_kills == 0 keep whatever values were set by a previous claim. In a sequence where one recycled batch is poisoned (RING_POISONED=1) and the next recycled batch is healthy (num_kills=0), callers that inspect RING_POISONED on each claim can incorrectly skip valid work because the stale poison flag is never cleared here.
Useful? React with 👍 / 👎.
…25371208 🤖 Auto-Build: Update Ring Loadables
|
|
Warning Gemini encountered an error creating the summary. You can try again by commenting |


No description provided.