Add split GPU text encoder cache #9310
Open
Jacid23 wants to merge 1 commit into
Collaborator
This is a great idea. Heads up that there is a generic multi-GPU PR coming down the pike (#5997), and this will need some adaptation to work with that scheme. I'll be working on an integration.
This was referenced Jun 28, 2026
lstein added a commit to lstein/InvokeAI that referenced this pull request Jun 29, 2026
Adds `offload_text_encoders_to_idle_gpus` (default on): when more than one generation device is configured and a GPU is idle, a session's text/prompt encoder runs on the idle GPU instead of the one running its denoise pipeline. This avoids evicting the denoise model from VRAM to make room for the encoder, and lets a cached encoder be reused across generations. Under full load (no idle GPU), behavior is unchanged.

Mechanism:
- New GENERATION_DEVICE_POOL arbiter (backend/util/device_pool.py) with a per-device exclusive-use lock. A native session blocking-acquires its own device's lock for the whole run; an encoder node try-borrows an idle device's lock for the duration of the node. This makes a borrowed encoder and a native session mutually exclusive on a GPU, which prevents the shared-encoder corruption that produced garbled images. It is also deadlock-free: borrows are non-blocking, and a session only ever blocks on its own device.
- DefaultSessionRunner re-pins the worker thread to the borrowed device for the whole encoder node; conditioning is stored on the CPU, and the denoiser picks it up on its own GPU afterward.
- Nodes opt in via @invocation(idle_gpu_offloadable=True), mirroring the existing `bottleneck` ClassVar marker. Applied to the text/prompt encoder nodes (compel + sdxl/refiner, flux, sd3, qwen-image, anima, cogview4, flux2 klein, z-image, flux_redux).

Inspired by invoke-ai#9310; supersedes it.

Tests: device-pool lock semantics, two concurrency regression tests asserting that a session and a borrow never use a GPU at the same time, the runner offload context-manager behavior, and a marker-wiring check.

Docs: invokeai-yaml.mdx (config setting) and creating-nodes.mdx (how to support the feature in a node).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
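The locking scheme described in the commit message can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in lstein#137: `DevicePool`, `encoder_device`, `PromptEncoderNode`, and `is_busy` are hypothetical names, devices are plain strings such as `"cuda:0"`, and the real work of re-pinning the worker thread and staging conditioning on the CPU is omitted. It shows the two properties the commit relies on: a session blocks only on its own device's lock, and an encoder borrow never blocks, so a borrow and a native session can never hold the same GPU at once.

```python
import threading
from contextlib import contextmanager
from typing import ClassVar, Iterator, Optional


class DevicePool:
    """Hypothetical stand-in for GENERATION_DEVICE_POOL: one exclusive lock per device."""

    def __init__(self, devices: list[str]) -> None:
        self._locks = {device: threading.Lock() for device in devices}

    def acquire_native(self, device: str) -> None:
        # A session blocks only on its own device's lock, held for its whole run.
        self._locks[device].acquire()

    def try_borrow(self, exclude: str) -> Optional[str]:
        # Never blocks: returns the first idle device other than `exclude`, or None.
        for device, lock in self._locks.items():
            if device != exclude and lock.acquire(blocking=False):
                return device
        return None

    def release(self, device: str) -> None:
        self._locks[device].release()

    def is_busy(self, device: str) -> bool:
        return self._locks[device].locked()


class PromptEncoderNode:
    # Hypothetical node; the class-level flag plays the role of
    # @invocation(idle_gpu_offloadable=True).
    idle_gpu_offloadable: ClassVar[bool] = True


@contextmanager
def encoder_device(pool: DevicePool, native: str, node: object) -> Iterator[str]:
    """Yield the device a node should run on, borrowing an idle GPU only if opted in."""
    borrowed = None
    if getattr(node, "idle_gpu_offloadable", False):
        borrowed = pool.try_borrow(exclude=native)
    try:
        # Under full load (no idle GPU) the node stays on the session's own device.
        yield borrowed or native
    finally:
        # Release the borrow even if the node raises.
        if borrowed is not None:
            pool.release(borrowed)


pool = DevicePool(["cuda:0", "cuda:1"])
pool.acquire_native("cuda:0")  # session A runs its denoise pipeline on cuda:0
with encoder_device(pool, "cuda:0", PromptEncoderNode()) as device:
    print(device, pool.is_busy("cuda:1"))  # cuda:1 True
print(pool.is_busy("cuda:1"))  # False: the borrow was released
```

In this sketch, `threading.Lock.acquire(blocking=False)` is what keeps the scheme deadlock-free: the only blocking wait anywhere is a session waiting on its own device, and a borrow that holds that device never waits on anything before releasing it.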
Collaborator
Because of pending #9263, this PR will be in conflict and can't be merged. However, the idea has been folded into a pending PR in my personal repository that will be posted here after #9263 goes in. It is in lstein#137 if you'd like to take a look. I will give full credit to @Jacid23 for the concept and initial implementation.
Summary
Why
Loading a text encoder can force the denoise model to be unloaded and reloaded when both share a single device's model cache. On dual-GPU systems, keeping the text encoders resident on the other CUDA device avoids that churn and makes repeated generation materially smoother.
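A toy simulation makes the churn concrete. It assumes, purely for illustration, that each GPU's cache holds exactly one model and evicts the least recently used entry; `VramCache` and `count_loads` are hypothetical and do not model InvokeAI's actual model cache, its sizing, or its eviction policy.

```python
from collections import OrderedDict


class VramCache:
    """Toy LRU cache holding at most `capacity` models (illustrative only)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.resident: OrderedDict[str, None] = OrderedDict()
        self.loads = 0  # number of times a model had to be (re)loaded

    def use(self, model: str) -> None:
        if model in self.resident:
            self.resident.move_to_end(model)  # hit: mark as most recently used
            return
        if len(self.resident) >= self.capacity:
            self.resident.popitem(last=False)  # evict the least recently used model
        self.resident[model] = None
        self.loads += 1


def count_loads(generations: int, split: bool) -> int:
    gpu0 = VramCache(capacity=1)
    gpu1 = VramCache(capacity=1)
    # With split=True the text encoder lives on the second GPU's cache.
    encoder_cache = gpu1 if split else gpu0
    for _ in range(generations):
        encoder_cache.use("text_encoder")
        gpu0.use("denoise_model")
    return gpu0.loads + gpu1.loads


print(count_loads(5, split=False))  # 10: both models reload every generation
print(count_loads(5, split=True))   # 2: each model loads once and stays resident
```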
Behavior
Verification
- pnpm lint:prettier
- pnpm lint:tsc
- pnpm lint:knip
- openapi.json
Notes
This branch was prepared from upstream/main and squashed to one focused commit. It does not include local fork/runtime update scripts, batch-specific files, or unrelated compatibility work.