Skip to content

pyisolate 0.10.3rc1: per-channel tensor transport + modern-Python event loop#8

Closed
pollockjj wants to merge 4 commits into
developfrom
release/0.10.3-rc-squash
Closed

pyisolate 0.10.3rc1: per-channel tensor transport + modern-Python event loop#8
pollockjj wants to merge 4 commits into
developfrom
release/0.10.3-rc-squash

Conversation

@pollockjj

Copy link
Copy Markdown
Owner

Summary

Two fixes plus the RC version bump, squashed into a single commit on top of develop.

Per-channel tensor transport in RPC pre-passes

Serialize tensors using the per-channel transport mode in the RPC pre-passes so a shared-memory TensorRef is never emitted onto a JSON channel. Previously, when a shared-memory serializer was co-resident in the host process, a TensorRef (which has no "data" field) could be written to a JSON channel, causing KeyError('data') in torch-free sealed/conda workers and killing the RPC receive thread.

Modern-Python event loop handling

Avoid asyncio.get_event_loop() RuntimeError/DeprecationWarning on Python >=3.12. AsyncRPC.__init__ now reuses the running loop if present, otherwise the installed loop, and creates a new loop only as a last resort — without emitting the "no current event loop" deprecation.

Tests / version

  • Regression tests for both fixes.
  • Version bumped to 0.10.3rc1.

Test plan

  • pytest unit suite (event channel, RPC contract, tensor-transport isolation).
  • Verified torch_share and sealed_worker transport modes under their most aggressive tests.

…nt loop

Serialize tensors using the per-channel transport mode in the RPC pre-passes so a shared-memory TensorRef is never emitted onto a JSON channel. This fixes KeyError('data') in torch-free sealed/conda workers when an IsolationToolkit shared-memory serializer is co-resident in the host process.

Avoid asyncio.get_event_loop() RuntimeError/DeprecationWarning on Python >=3.12 by reusing the running or installed event loop and creating one only as a last resort.

Add regression tests for both fixes and bump version to 0.10.3rc1.
Copilot AI review requested due to automatic review settings June 7, 2026 14:04

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR prepares pyisolate for the 0.10.3rc1 release by fixing two runtime regressions: (1) torch tensor pre-serialization incorrectly emitting shared-memory TensorRef objects onto JSON RPC channels, and (2) AsyncRPC construction failing or warning on Python ≥ 3.12 when no event loop is currently set.

Changes:

  • Ensure RPC/model “pre-pass” serialization defers torch Tensor objects so the per-channel transport (JSON vs shared-memory) selects the correct wire format.
  • Update AsyncRPC.__init__ to select an event loop safely on modern Python (reuse running loop, then installed loop, else create+install a loop) without emitting the “no current event loop” deprecation.
  • Add regression tests for both fixes and bump the project version to 0.10.3rc1.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated no comments.

Show a summary per file
File Description
pyisolate/_internal/rpc_serialization.py Defers tensor handling to the active transport by bypassing the registry’s mode-bound Tensor serializer.
pyisolate/_internal/model_serialization.py Same tensor deferral for host-side isolation serialization to prevent JSON-channel TensorRef emission.
pyisolate/_internal/rpc_protocol.py Modernizes event loop acquisition in AsyncRPC.__init__ for Python ≥ 3.12 behavior.
tests/test_tensor_transport_mode_isolation.py Regression tests ensuring pre-passes do not emit TensorRef dicts when tensors must be transport-serialized per channel.
tests/test_tensor_shared_memory.py Regression test ensuring shared-memory CPU tensor transport remains zero-copy (shared storage observable across views).
tests/test_rpc_contract.py Tests AsyncRPC construction with no current loop, reuse of an installed loop, and absence of DeprecationWarning.
tests/test_event_channel.py Updates event bridge tests to use asyncio.run() rather than legacy get_event_loop().run_until_complete().
pyproject.toml Version bump to 0.10.3rc1.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 1 -- KICKOFF

Runner: scripts/run_codex_review.py (structured-output mode)
PR: #8
Base branch under review: develop
Reviewer: codex exec --output-schema codex_review_schema.json (JSON schema enforced server-side)
Kickoff timestamp (UTC): 2026-06-07T14:11:04Z

Awaiting codex output. The structured result will be posted as a separate comment on this PR when codex exits.

@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 1 -- RESULT

PR: #8
Base branch: develop
Reviewer: codex exec --output-schema (structured output)
Result timestamp (UTC): 2026-06-07T14:13:26Z
Duration (s): 139.9
Verdict: ❌ patch is incorrect (one or more blocking findings)
Confidence: 0.86

Overall explanation

The patch fixes construction without a current event loop, but the fallback loop it creates can become a non-running dispatch target for inbound RPC work. That leaves an important newly enabled Python >=3.12 sync-host scenario hanging despite otherwise green tests.

Findings (1)

P1 — urgent

  • [P1] [P1] Avoid using a fallback loop that is never run/home/johnj/dev_master/pyisolate/pyisolate/_internal/rpc_protocol.py:185-186 (confidence 0.87)
    When AsyncRPC is constructed from the synchronous host path with no current loop, this branch installs a fresh event loop but no code runs that loop before rpc.run(). _recv_thread later sees self.default_loop as valid and schedules inbound call/callback dispatch onto it with asyncio.run_coroutine_threadsafe, so child-to-host RPC requests or event callbacks in the Python >=3.12/no-installed-loop scenario hang forever instead of being dispatched.

Raw JSON (schema-enforced)
{
  "findings": [
    {
      "body": "When `AsyncRPC` is constructed from the synchronous host path with no current loop, this branch installs a fresh event loop but no code runs that loop before `rpc.run()`. `_recv_thread` later sees `self.default_loop` as valid and schedules inbound call/callback dispatch onto it with `asyncio.run_coroutine_threadsafe`, so child-to-host RPC requests or event callbacks in the Python >=3.12/no-installed-loop scenario hang forever instead of being dispatched.",
      "code_location": {
        "absolute_file_path": "/home/johnj/dev_master/pyisolate/pyisolate/_internal/rpc_protocol.py",
        "line_range": {
          "end": 186,
          "start": 185
        }
      },
      "confidence_score": 0.87,
      "priority": 1,
      "title": "[P1] Avoid using a fallback loop that is never run"
    }
  ],
  "overall_confidence_score": 0.86,
  "overall_correctness": "patch is incorrect",
  "overall_explanation": "The patch fixes construction without a current event loop, but the fallback loop it creates can become a non-running dispatch target for inbound RPC work. That leaves an important newly enabled Python >=3.12 sync-host scenario hanging despite otherwise green tests."
}

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

Comment thread pyproject.toml
[project]
name = "pyisolate"
version = "0.10.2"
version = "0.10.3rc1"
AsyncRPC may be constructed before the event loop that will service it exists (the synchronous host launch path), so __init__ can only install a placeholder loop on Python >=3.12. _recv_thread dispatches inbound calls via run_coroutine_threadsafe(default_loop), which executes only on a running loop -- a never-run placeholder would hang every inbound child-to-host call or event. run() now adopts the running loop before starting the dispatch threads, making the actually-running loop the authoritative dispatch target (matching the manual rebind tests already perform and the update_event_loop contract). Adds a regression test.
@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 2 -- KICKOFF

Runner: scripts/run_codex_review.py (structured-output mode)
PR: #8
Base branch under review: develop
Reviewer: codex exec --output-schema codex_review_schema.json (JSON schema enforced server-side)
Kickoff timestamp (UTC): 2026-06-07T14:25:11Z

Awaiting codex output. The structured result will be posted as a separate comment on this PR when codex exits.

@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 2 -- RESULT

PR: #8
Base branch: develop
Reviewer: codex exec --output-schema (structured output)
Result timestamp (UTC): 2026-06-07T14:28:31Z
Duration (s): 198.2
Verdict: ❌ patch is incorrect (one or more blocking findings)
Confidence: 0.84

Overall explanation

The patch introduces a CUDA IPC regression in the eager host-side serialization path by making a stale import-time flag decide whether CUDA tensors are copied to CPU before the registry serializer can handle them. This affects configured CUDA tensor sharing, so the PR should not merge as-is.

Findings (1)

P1 — urgent

  • [P1] [P1] Preserve CUDA tensors before the CPU fallback/home/johnj/dev_master/pyisolate/pyisolate/_internal/model_serialization.py:49-53 (confidence 0.87)
    When share_cuda_ipc is enabled via Extension config, the host sets PYISOLATE_ENABLE_CUDA_IPC during _initialize_process, after model_serialization has usually already been imported, so the module-level _cuda_ipc_enabled remains false. Because this newly-reordered branch now runs before the registry Tensor serializer, CUDA tensors passed through create_caller() are converted with data.cpu() instead of reaching serialize_tensor()'s CUDA IPC path, so CUDA-enabled extensions receive CPU tensors and lose the configured CUDA transport.

Raw JSON (schema-enforced)
{
  "findings": [
    {
      "body": "When `share_cuda_ipc` is enabled via `Extension` config, the host sets `PYISOLATE_ENABLE_CUDA_IPC` during `_initialize_process`, after `model_serialization` has usually already been imported, so the module-level `_cuda_ipc_enabled` remains false. Because this newly-reordered branch now runs before the registry `Tensor` serializer, CUDA tensors passed through `create_caller()` are converted with `data.cpu()` instead of reaching `serialize_tensor()`'s CUDA IPC path, so CUDA-enabled extensions receive CPU tensors and lose the configured CUDA transport.",
      "code_location": {
        "absolute_file_path": "/home/johnj/dev_master/pyisolate/pyisolate/_internal/model_serialization.py",
        "line_range": {
          "end": 53,
          "start": 49
        }
      },
      "confidence_score": 0.87,
      "priority": 1,
      "title": "[P1] Preserve CUDA tensors before the CPU fallback"
    }
  ],
  "overall_confidence_score": 0.84,
  "overall_correctness": "patch is incorrect",
  "overall_explanation": "The patch introduces a CUDA IPC regression in the eager host-side serialization path by making a stale import-time flag decide whether CUDA tensors are copied to CPU before the registry serializer can handle them. This affects configured CUDA tensor sharing, so the PR should not merge as-is."
}

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated no new comments.

model_serialization captured _cuda_ipc_enabled at import time, but the host sets PYISOLATE_ENABLE_CUDA_IPC during _initialize_process -- after this module is imported. With the tensor branch reordered ahead of the registry serializer, that stale flag made configured CUDA tensors fall back to obj.cpu() instead of deferring the on-device tensor to the per-channel CUDA IPC transport, silently dropping CUDA transport. Read the env at call time, matching rpc_serialization._prepare_for_rpc_impl, and drop the now-unused module flag. Adds a runtime-evaluation regression test.
@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 3 -- KICKOFF

Runner: scripts/run_codex_review.py (structured-output mode)
PR: #8
Base branch under review: develop
Reviewer: codex exec --output-schema codex_review_schema.json (JSON schema enforced server-side)
Kickoff timestamp (UTC): 2026-06-07T14:33:15Z

Awaiting codex output. The structured result will be posted as a separate comment on this PR when codex exits.

@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 3 -- RESULT

PR: #8
Base branch: develop
Reviewer: codex exec --output-schema (structured output)
Result timestamp (UTC): 2026-06-07T14:36:29Z
Duration (s): 192.1
Verdict: ✅ patch is correct (zero P0/P1/P2 findings)
Confidence: 0.86

Overall explanation

No actionable regressions were found in the PR diff. The focused regression tests pass with pytest addopts cleared, and the PR's GitHub checks are passing.

Findings (0)

No findings.


Raw JSON (schema-enforced)
{
  "findings": [],
  "overall_confidence_score": 0.86,
  "overall_correctness": "patch is correct",
  "overall_explanation": "No actionable regressions were found in the PR diff. The focused regression tests pass with pytest addopts cleared, and the PR's GitHub checks are passing."
}

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

Comment on lines +166 to 188
# Acquire the loop without raising when constructed outside a running loop.
# Python >=3.10 deprecated and >=3.12 removed implicit main-thread event loop
# creation, so an eager asyncio.get_event_loop() raised here in sync construction
# paths. Preserve the historical get_event_loop() semantics: prefer the running
# loop, then the thread's installed loop (set via asyncio.set_event_loop, e.g. a
# synchronous caller that constructs the RPC before running its own loop), and only
# create+install a new loop as a last resort. update_event_loop() may replace it,
# and run() rebinds to the running loop when started inside one.
try:
self.default_loop = asyncio.get_running_loop()
except RuntimeError:
# No running loop. Reuse the thread's installed loop if present, else install a
# fresh one. asyncio.get_event_loop() returns an installed loop, and only emits
# the "no current event loop" DeprecationWarning (Python >=3.12) when none is
# installed -- we handle creation explicitly, so that one warning is silenced.
with warnings.catch_warnings():
warnings.simplefilter("ignore", DeprecationWarning)
try:
self.default_loop = asyncio.get_event_loop()
except RuntimeError:
self.default_loop = asyncio.new_event_loop()
asyncio.set_event_loop(self.default_loop)
self._loop_lock = threading.Lock() # Protects default_loop updates
Comment on lines +383 to +389
try:
running_loop = asyncio.get_running_loop()
except RuntimeError:
running_loop = None
if running_loop is not None:
with self._loop_lock:
self.default_loop = running_loop
When AsyncRPC.__init__ creates and installs a new event loop as a last resort (no running or installed loop), a later run()/update_event_loop() rebind to a real running loop left that created loop open and unused -- a leaked event loop that emits ResourceWarning: unclosed event loop in sync-construction paths that subsequently use asyncio.run(). Track ownership of the self-created loop and close it on supersession, in both run() and update_event_loop(). Strengthens the run()-rebind regression test to assert the placeholder loop is closed.
@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 4 -- KICKOFF

Runner: scripts/run_codex_review.py (structured-output mode)
PR: #8
Base branch under review: develop
Reviewer: codex exec --output-schema codex_review_schema.json (JSON schema enforced server-side)
Kickoff timestamp (UTC): 2026-06-07T14:42:17Z

Awaiting codex output. The structured result will be posted as a separate comment on this PR when codex exits.

@qa-agent-seveneves

Copy link
Copy Markdown

Codex Review -- Round 4 -- RESULT

PR: #8
Base branch: develop
Reviewer: codex exec --output-schema (structured output)
Result timestamp (UTC): 2026-06-07T14:44:30Z
Duration (s): 132.0
Verdict: ✅ patch is correct (zero P0/P1/P2 findings)
Confidence: 0.83

Overall explanation

No P0/P1/P2 issues were found in the PR diff. The focused regression tests for the changed RPC event-loop and tensor transport paths passed with pytest addopts disabled; a full local suite could not collect because optional test dependencies such as torch are not installed.

Findings (0)

No findings.


Raw JSON (schema-enforced)
{
  "findings": [],
  "overall_confidence_score": 0.83,
  "overall_correctness": "patch is correct",
  "overall_explanation": "No P0/P1/P2 issues were found in the PR diff. The focused regression tests for the changed RPC event-loop and tensor transport paths passed with pytest addopts disabled; a full local suite could not collect because optional test dependencies such as torch are not installed."
}

Copilot AI left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.

Comment on lines +85 to +87
# CUDA IPC configured at runtime -> defer the on-device tensor (do NOT downgrade).
monkeypatch.setenv("PYISOLATE_ENABLE_CUDA_IPC", "1")
assert _serialize() is tensor
Comment on lines +52 to 54
if os.environ.get("PYISOLATE_ENABLE_CUDA_IPC") == "1":
return data
return data.cpu()
Comment on lines 277 to 281
@@ -277,6 +281,10 @@ def _prepare_for_rpc_impl(
return obj.cpu()
@pollockjj

Copy link
Copy Markdown
Owner Author

Superseded by #9 — a single clean squash off the current develop tip carrying only the two verified fixes (per-channel tensor transport + modern-Python event loop) plus the version bump. The run()-loop-rebind and fallback-loop-close commits on this branch were verified to be no-ops in every real construction path and were dropped.

@pollockjj pollockjj closed this Jun 7, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants