Official Release: AMD Strix Point Native Optimization and Director St… by Cyb3rLab5 · Pull Request #796 · lllyasviel/FramePack

Cyb3rLab5 · 2025-12-18T08:06:52Z

…udio

Co-authored-by: Cyb3rLab5 <224908985+Cyb3rLab5@users.noreply.github.com>

Cached the weight and bias tensors in `vae_decode_fake`, keyed by device and dtype. This avoids re-allocating and converting them on every call, which reduces overhead in hot loops.

Co-authored-by: Cyb3rLab5 <224908985+Cyb3rLab5@users.noreply.github.com>

Refactored `HunyuanVideoRotaryPosEmbed` to compute Rotary Positional Embeddings (RoPE) separately for each 1D component (T, H, W). PyTorch's zero-copy `.expand()` and views then map these vectors into 3D. This eliminates the large, redundant 3D grids previously created via `torch.meshgrid`, as well as the sequential loop over batches. A performance benchmark shows a ~4.3x forward-pass speedup for batched coordinate generation.

Co-authored-by: Cyb3rLab5 <224908985+Cyb3rLab5@users.noreply.github.com>
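The idea can be sketched as follows, with a `torch.meshgrid` reference for comparison. This is an illustration under stated assumptions, not the PR's code: the function names, the per-axis dimensions `(4, 6, 6)`, and `theta=256.0` are chosen for the example only.

```python
import torch


def axis_cos_sin(n, dim, theta=256.0):
    """cos/sin tables for one axis, shape (dim, n). `dim` must be even."""
    pos = torch.arange(n, dtype=torch.float32)
    inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
    angles = torch.outer(inv_freq, pos).repeat_interleave(2, dim=0)
    return angles.cos(), angles.sin()


def rope_expand(t, h, w, dims=(4, 6, 6)):
    """Build (sum(dims), T, H, W) tables from three 1D tables via view + expand."""
    shapes = ((t, 1, 1), (1, h, 1), (1, 1, w))
    cos_parts, sin_parts = [], []
    for n, dim, shape in zip((t, h, w), dims, shapes):
        cos, sin = axis_cos_sin(n, dim)
        # view + expand broadcast the 1D table over the other two axes without copying.
        cos_parts.append(cos.view(dim, *shape).expand(dim, t, h, w))
        sin_parts.append(sin.view(dim, *shape).expand(dim, t, h, w))
    # torch.cat materialises the result; the trigonometry was done in 1D only.
    return torch.cat(cos_parts, dim=0), torch.cat(sin_parts, dim=0)


def rope_meshgrid(t, h, w, dims=(4, 6, 6), theta=256.0):
    """Reference version: materialise a full 3D grid per axis before computing angles."""
    grids = torch.meshgrid(
        torch.arange(t, dtype=torch.float32),
        torch.arange(h, dtype=torch.float32),
        torch.arange(w, dtype=torch.float32),
        indexing="ij",
    )
    parts = []
    for grid, dim in zip(grids, dims):
        inv_freq = 1.0 / (theta ** (torch.arange(0, dim, 2, dtype=torch.float32) / dim))
        angles = torch.outer(inv_freq, grid.reshape(-1)).unflatten(-1, (t, h, w))
        parts.append(angles.repeat_interleave(2, dim=0))
    angles = torch.cat(parts, dim=0)
    return angles.cos(), angles.sin()
```

Both functions return the same tables, but the expand-based version evaluates `cos` and `sin` on `T + H + W` positions per frequency instead of `T * H * W`.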

Batched the `torch.cuda.memory_stats` queries so that memory is checked only once every 25 modules, significantly reducing CPU-GPU synchronization stalls.

Co-authored-by: Cyb3rLab5 <224908985+Cyb3rLab5@users.noreply.github.com>
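A rough sketch of the batching pattern, written without a GPU dependency so the memory query can be swapped in. The function name `move_with_batched_checks` and its parameters are hypothetical; in real use `query_free_gb` would wrap a `torch.cuda` memory query and `move` would call `module.to(device)`:

```python
from typing import Callable, Iterable, List, TypeVar

M = TypeVar("M")


def move_with_batched_checks(
    modules: Iterable[M],
    move: Callable[[M], None],
    query_free_gb: Callable[[], float],
    preserved_gb: float,
    check_every: int = 25,
) -> List[M]:
    """Move modules one by one, querying free memory only every `check_every` modules."""
    moved: List[M] = []
    for index, module in enumerate(modules):
        # The (potentially expensive) memory query runs on modules 0, 25, 50, ...
        if index % check_every == 0 and query_free_gb() <= preserved_gb:
            break  # Stop before eating into the preserved memory budget.
        move(module)
        moved.append(module)
    return moved
```

The trade-off is precision: between checks, up to `check_every - 1` further modules may be moved after free memory has already dropped below the threshold, so the preserved budget should leave some headroom.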

Bobby Jackson and others added 13 commits December 18, 2025 01:07

Official Release: AMD Strix Point Native Optimization and Director St…

2d47510

…udio

Final Xmas Delivery: Total FramePack Rebrand & Hardware Profiles

7f15782

Restore and verify AMD Strix Point Studio build after PC reset

27a0bfe

perf: optimize get_cuda_free_memory_gb for non-CUDA devices

c63ea01

Co-authored-by: Cyb3rLab5 <224908985+Cyb3rLab5@users.noreply.github.com>

Merge pull request #14 from Cyb3rLab5/bolt-optimize-rope-3d-204299705…

f8d3f44

…6079084948 ⚡ Bolt: [performance improvement] 1D RoPE Expand

Merge branch 'main' into bolt-vae-decode-cache-1923750923858351843

2215e52

Merge pull request #7 from Cyb3rLab5/bolt-vae-decode-cache-1923750923…

d959556

…858351843 ⚡ Bolt: [performance improvement] Cache tensors in vae_decode_fake

⚡ Bolt: Optimize VRAM polling overhead

2ca2e4e

Merge pull request #16 from Cyb3rLab5/bolt-optimize-vram-polling-over…

79e920c

…head-55594135257129418 ⚡ Bolt: Optimize VRAM polling overhead

Merge branch 'main' into bolt-opt-memory-check-230278557527942520

6fe8e52

Merge pull request #5 from Cyb3rLab5/bolt-opt-memory-check-2302785575…

5c5a98c

…27942520 ⚡ Bolt: [performance improvement]

Cyb3rLab5 wants to merge 13 commits into lllyasviel:main from Cyb3rLab5:main
