
Claude/add lora loading node w xg9u #219

Open: czzs wants to merge 15 commits into runpod-workers:main from czzs:claude/add-lora-loading-node-wXG9u



@czzs commented Apr 2, 2026

Motivation

Issues closed

Changeset

claude and others added 15 commits January 13, 2026 21:33
Includes comfyui-kjnodes, comfyui-videohelpersuite,
comfyui-frame-interpolation, and comfyui_essentials nodes.
Uses base image with no pre-loaded models for network volume usage.
…oo9-h2AMk

Add custom Dockerfile for video/animation workflows
Builds and pushes Dockerfile.custom to Docker Hub on:
- Push to main when Dockerfile.custom changes
- Manual workflow dispatch

Tags images with both 'latest' and commit SHA.
Updated base image version for ComfyUI worker.
…oo9-h2AMk

Claude/explain codebase mkd3ltpfzd90poo9 h2 a mk
…oo9-h2AMk

Fix: Free up disk space before Docker build
…oo9-h2AMk

Add ComfyUI-GIMM-VFI node for video frame interpolation
The ws.recv() call previously blocked indefinitely with no timeout,
causing RunPod jobs to fail with "WebSocket message timeout after 60.0
seconds" on heavy workloads like video frame interpolation.

- Add WEBSOCKET_MESSAGE_TIMEOUT env var (default 600s / 10 minutes)
- Set ws.settimeout() after initial connect and after reconnect
- On timeout, probe ComfyUI HTTP health before resuming the wait;
  fail fast if the server has become unreachable

https://claude.ai/code/session_01TXa8rubsUoz56PV36fLRpS
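The timeout handling described in this commit can be sketched roughly as follows. This is a minimal illustration, not the code from the diff: `recv_with_health_check`, `is_server_up`, and `max_waits` are hypothetical names. The `ws` object is assumed to expose `settimeout()` and `recv()`, as a websocket-client `WebSocket` does; with that library the timeout exception to pass is `websocket.WebSocketTimeoutException`. For simplicity, the sketch sets the timeout once before waiting instead of after connect and reconnect.

```python
import os


def get_message_timeout(default: float = 600.0) -> float:
    """Read WEBSOCKET_MESSAGE_TIMEOUT (seconds), falling back to the default."""
    raw = os.environ.get("WEBSOCKET_MESSAGE_TIMEOUT")
    if raw is None or raw.strip() == "":
        return default
    try:
        value = float(raw)
    except ValueError:
        return default
    # Treat zero or negative values as invalid and use the default instead.
    return value if value > 0 else default


def recv_with_health_check(ws, is_server_up, timeout_exc=TimeoutError, max_waits=None):
    """Receive one message, waiting again after each timeout while ComfyUI stays reachable.

    ws           -- object with settimeout()/recv(), e.g. a websocket-client WebSocket
    is_server_up -- hypothetical callable: True if ComfyUI still answers over HTTP
    timeout_exc  -- exception recv() raises on timeout (for websocket-client:
                    websocket.WebSocketTimeoutException)
    max_waits    -- optional cap on timeout periods; None means wait indefinitely
    """
    ws.settimeout(get_message_timeout())
    waits = 0
    while True:
        try:
            return ws.recv()
        except timeout_exc:
            waits += 1
            # Fail fast if the ComfyUI HTTP server has become unreachable.
            if not is_server_up():
                raise RuntimeError("ComfyUI HTTP server unreachable after websocket timeout")
            if max_waits is not None and waits >= max_waits:
                raise RuntimeError(f"no websocket message after {waits} timeout periods")
```

In this sketch the per-message timeout only triggers a health probe. The job fails if the server is gone or an optional cap is reached, which matches the commit's intent of not failing heavy but healthy workloads.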
On timeout, before deciding whether to keep waiting or fail, poll
ComfyUI's /queue endpoint to check if the job is still actively
running or pending. This matches the fix applied to the regular GPU
server's generation_worker.py.

Timeout handler now follows this decision tree:
1. ComfyUI HTTP unreachable → fail immediately
2. Job still in /queue (running/pending) → continue waiting
3. Queue check failed (network blip) → continue (benefit of doubt)
4. Job NOT in queue and NOT completed → fail with clear error

https://claude.ai/code/session_01TXa8rubsUoz56PV36fLRpS
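The decision tree above can be written as a small pure function, as in the sketch below. Several details are assumptions rather than facts taken from the diff. The `/queue` response is assumed to be ComfyUI's `{"queue_running": [...], "queue_pending": [...]}` shape, with the prompt ID at index 1 of each item. A `queue` value of `None` stands for a failed queue request. The tree does not spell out the "completed" case, so the sketch assumes the worker stops waiting and collects outputs (`Action.FINISH`, a hypothetical name).

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    CONTINUE = "continue"  # keep waiting on the websocket
    FAIL = "fail"          # abort the job with a clear error
    FINISH = "finish"      # assumed: job completed, go fetch its outputs


def job_in_queue(queue: dict, prompt_id: str) -> bool:
    """Return True if prompt_id is in the running or pending lists of a /queue response."""
    for key in ("queue_running", "queue_pending"):
        for item in queue.get(key, []):
            # Assumed item layout: [number, prompt_id, prompt, extra_data, outputs, ...]
            if len(item) > 1 and item[1] == prompt_id:
                return True
    return False


def decide_on_timeout(prompt_id: str, http_reachable: bool,
                      queue: Optional[dict], completed: bool) -> Action:
    # 1. ComfyUI HTTP unreachable: fail immediately.
    if not http_reachable:
        return Action.FAIL
    # 3. Queue check failed (network blip): give the job the benefit of the doubt.
    if queue is None:
        return Action.CONTINUE
    # 2. Job still running or pending: keep waiting.
    if job_in_queue(queue, prompt_id):
        return Action.CONTINUE
    # Not queued but finished: stop waiting (assumption, not stated in the tree).
    if completed:
        return Action.FINISH
    # 4. Not in the queue and not completed: fail with a clear error.
    return Action.FAIL
```

Keeping the decision separate from the HTTP calls makes each branch easy to unit-test without a running ComfyUI server.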
Install the LoRA model loading node from sourceful-official that supports
loading LoRA models by URL directly.

https://claude.ai/code/session_01V44ZJUxEKQn8HMUPvgDSXg
The node is available on the Comfy Registry, so use comfy-node-install
instead of git clone.

https://claude.ai/code/session_01V44ZJUxEKQn8HMUPvgDSXg
@TimPietruskyRunPod (Contributor) left a comment


Thanks for the contribution, but this PR is doing too many unrelated things at once and needs to be split before it can be merged.

The PR title is "add lora loading node" but the diff actually contains:

  1. A new Dockerfile.custom with hardcoded custom nodes (comfyui-kjnodes, comfyui-videohelpersuite, comfyui-frame-interpolation, comfyui_essentials, comfyui-gimm-vfi, loadloramodelonlywithurl)
  2. A new .github/workflows/build-custom.yml that pushes to ${{ secrets.DOCKERHUB_USERNAME }}/runpod-comfy-worker (not the official runpod/worker-comfyui namespace)
  3. A new WEBSOCKET_MESSAGE_TIMEOUT env var + handler change for socket recv timeout (this looks like it might be useful, but it's totally unrelated to the title)
  4. An empty PR description (no motivation, no changeset)
  5. No tests, and no validation that the lora node actually loads correctly

I can't merge any of this as-is:

  • Dockerfile.custom belongs in the user's own fork. It's not a generic improvement to the official worker; it's a specific bundle of nodes for one use case.
  • The build-custom.yml pushing to a different Docker Hub namespace is definitely not appropriate for this repo.
  • The WEBSOCKET_MESSAGE_TIMEOUT change might actually be valuable. If you want to land that, please open a separate PR with: just that change, a changeset, a PR body explaining the motivation (which heavy workload made you hit the silent timeout?), and ideally a reproduction.

Requesting changes — please close this and open focused PRs for the parts you actually want upstreamed.
