Skip to content

[Simulation.Core] WorkerThread: fix use-after-free crash with ultra-short Task#6116

Open
fredroy wants to merge 2 commits into
sofa-framework:masterfrom
fredroy:fix_scheduler_race
Open

[Simulation.Core] WorkerThread: fix use-after-free crash with ultra-short Task#6116
fredroy wants to merge 2 commits into
sofa-framework:masterfrom
fredroy:fix_scheduler_race

Conversation

@fredroy
Copy link
Copy Markdown
Contributor

@fredroy fredroy commented May 20, 2026

Got this bug while working with a parallelization of a component, with Claude.

After investigation (not so easy even with Claude 😅), it appears that in WorkerThread,

bool WorkerThread::pushTask(Task *task)
{
// if we're single threaded return false
if (m_taskScheduler->getThreadCount() < 2)
{
return false;
}
{
simulation::ScopedLock lock(m_taskMutex);
const int taskId = task->getStatus()->setBusy(true);
task->m_id = taskId;
m_tasks.push_back(task);
}
if (m_taskScheduler->testMainTaskStatus(nullptr))
{
m_taskScheduler->setMainTaskStatus(task->getStatus());
m_taskScheduler->wakeUpWorkers();
}
return true;
}

if a task is so fast that it finished before the line 216, task has been deleted and is garbage. So it crashes (or some random stuff)

To demonstrate the bug, I asked Claude to write a "unit test" which crashes, at least on my computer (macbook pro m3).
I am wondering if the CI (and its slow multi-threading config, if any) will trigger the bug. 🤔
I will push the fix shortly after in this PR, because we cannot merge a crash 😅

EDIT: It does crash on the macOS CI, most certainly because the CI is bare metal and much faster.

[with-all-tests]


By submitting this pull request, I acknowledge that
I have read, understand, and agree SOFA Developer Certificate of Origin (DCO).


Reviewers will merge this pull-request only if

  • it builds with SUCCESS for all platforms on the CI.
  • it does not generate new warnings.
  • it does not generate new unit test failures.
  • it does not generate new scene test failures.
  • it does not break API compatibility.
  • it is more than 1 week old (or has fast-merge label).

@fredroy fredroy added pr: fix Fix a bug pr: status to review To notify reviewers to review this pull-request pr: AI-aided Label notifying the reviewers that part or all of the PR has been generated with the help of an AI labels May 20, 2026
@bakpaul bakpaul added pr: status wip Development in the pull-request is still in progress and removed pr: status to review To notify reviewers to review this pull-request labels May 21, 2026
@fredroy fredroy added pr: status to review To notify reviewers to review this pull-request and removed pr: status wip Development in the pull-request is still in progress labels May 21, 2026
fredroy added 2 commits May 22, 2026 07:15
 For an ultra-short task, a worker that's still in its doWork poll loop pops the task, runs it (m_task()
  runs the lambda + counter increment, microseconds), and the framework calls task->operator delete  all
  before the main thread reaches setMainTaskStatus(task->getStatus())
@fredroy fredroy force-pushed the fix_scheduler_race branch from 4a40c3f to d27afdf Compare May 21, 2026 22:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr: AI-aided Label notifying the reviewers that part or all of the PR has been generated with the help of an AI pr: fix Fix a bug pr: status to review To notify reviewers to review this pull-request

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants