Skip to content

fix: add thread safety to preload_windows_gl_dlls#421

Closed
hobostay wants to merge 1 commit into
sail-sg:mainfrom
hobostay:fix/glfw-thread-safety
Closed

fix: add thread safety to preload_windows_gl_dlls#421
hobostay wants to merge 1 commit into
sail-sg:mainfrom
hobostay:fix/glfw-thread-safety

Conversation

@hobostay
Copy link
Copy Markdown
Contributor

Summary

  • Wrap shared mutable state mutations in preload_windows_gl_dlls() with _CONTEXT_LOCK to prevent data races during concurrent envpool.make() calls

Problem

preload_windows_gl_dlls() modifies three module-level mutable collections (_WINDOWS_DLL_HANDLES, _REGISTERED_DLL_DIRS, _PRELOADED_DLL_PATHS) without holding the existing _CONTEXT_LOCK. Concurrent envpool.make() calls that trigger preload_windows_gl_dlls() can corrupt these shared data structures.

Test plan

  • Run existing test suite to confirm no regressions

🤖 Generated with Claude Code

preload_windows_gl_dlls() modifies three module-level mutable
collections (_WINDOWS_DLL_HANDLES, _REGISTERED_DLL_DIRS,
_PRELOADED_DLL_PATHS) without holding _CONTEXT_LOCK. If multiple
threads call envpool.make() concurrently, these shared data structures
can be corrupted.

Move the shared-state mutation section under _CONTEXT_LOCK to prevent
concurrent modification.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Copy link
Copy Markdown

@chatgpt-codex-connector chatgpt-codex-connector Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: 85a24a2101

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

if dll_path.is_file() and dll_path_str not in _PRELOADED_DLL_PATHS:
_WINDOWS_DLL_HANDLES.append(win_dll(str(dll_path)))
_PRELOADED_DLL_PATHS.add(dll_path_str)
with _CONTEXT_LOCK:
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Avoid reacquiring the non-reentrant context lock

On Windows when ENVPOOL_DLL_DIR is set to a valid DLL directory, try_ensure_mujoco_glfw_context() already holds _CONTEXT_LOCK while constructing _GlfwContext, and _GlfwContext.__init__() calls preload_windows_gl_dlls(). This new with _CONTEXT_LOCK: therefore tries to acquire the same threading.Lock a second time in the same thread, causing the first GLFW context initialization to hang instead of rendering or falling back. Use an RLock or move the preload outside the outer locked region.

Useful? React with 👍 / 👎.

Copy link
Copy Markdown
Collaborator

@Trinkle23897 Trinkle23897 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what kind of issue do you want to address? if it's a real issue, could you provide your minimal repo?

@hobostay
Copy link
Copy Markdown
Contributor Author

You're right to question this — I think this PR should be closed entirely.

The original code is not buggy. Here's why:

The call chain is:

try_ensure_mujoco_glfw_context()   # acquires _CONTEXT_LOCK
  → _GlfwContext(width, height)
    → _GlfwContext.__init__()
      → preload_windows_gl_dlls()   # already under _CONTEXT_LOCK

preload_windows_gl_dlls() is always called while _CONTEXT_LOCK is already held, so the shared mutable state (_WINDOWS_DLL_HANDLES, _REGISTERED_DLL_DIRS, _PRELOADED_DLL_PATHS) is already protected. There is no data race.

This PR introduces a real P1 deadlock. _CONTEXT_LOCK is a threading.Lock (non-reentrant). Adding with _CONTEXT_LOCK: inside preload_windows_gl_dlls() causes the same thread to acquire it twice → deadlock on the first envpool.make() call on Windows.

I'd recommend closing this PR as-is. If thread safety for standalone preload_windows_gl_dlls() calls (outside the _CONTEXT_LOCK path) is ever needed, the fix would be to change threading.Lock()threading.RLock(), not to add a nested with _CONTEXT_LOCK.

🤖 Generated with Claude Code

@hobostay
Copy link
Copy Markdown
Contributor Author

Closing: the original code has no data race (preload_windows_gl_dlls is always called under _CONTEXT_LOCK via try_ensure_mujoco_glfw_context → _GlfwContext.init), and this PR introduces a deadlock with the non-reentrant threading.Lock.

@hobostay hobostay closed this May 29, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants