|2025.01| 🔥🔥[**Token Pruning**] Token Pruning for Caching Better: 9× Acceleration on Stable Diffusion for Free(@SJTU) |[[pdf]](https://arxiv.org/pdf/2501.00375)|[[DaTo]](https://github.com/EvelynZhang-epiclab/DaTo)|⭐️⭐️ |
|2025.04| 🔥🔥[**AB-Cache**] AB-Cache: Training-Free Acceleration of Diffusion Models via Adams-Bashforth Cached Feature Reuse(@USTC) |[[pdf]](https://arxiv.org/pdf/2504.10540)| ⚠️|⭐️⭐️ |
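The AB-Cache entry above reuses cached features across denoising steps via Adams-Bashforth-style multistep extrapolation. A minimal sketch of the general idea, using simple two-step linear extrapolation on a toy feature trajectory (the data and names are illustrative, not the paper's exact scheme or coefficients):

```python
import numpy as np

def extrapolate_feature(f_prev: np.ndarray, f_prev2: np.ndarray) -> np.ndarray:
    # Two-point linear extrapolation of a cached feature trajectory:
    # f_pred(t+1) = f(t) + (f(t) - f(t-1)) = 2*f(t) - f(t-1).
    # Adams-Bashforth-style reuse generalizes this with higher-order
    # multistep coefficients; this sketch shows only the first-order case.
    return 2.0 * f_prev - f_prev2

# Toy feature trajectory f(t) = t^2 * v, standing in for a block's output
# across steps (hypothetical data, chosen so the trajectory is curved).
v = np.ones(4)
f = {t: (t ** 2) * v for t in range(4)}

pred = extrapolate_feature(f[2], f[1])   # predict f(3) from the cache: 2*4 - 1 = 7
err_extrap = np.abs(pred - f[3]).max()   # |7 - 9| = 2
err_reuse = np.abs(f[2] - f[3]).max()    # naive cache reuse: |4 - 9| = 5
print(err_extrap, err_reuse)             # extrapolation beats plain reuse
```

Even this first-order version halves the error of naively reusing the last cached feature on a curved trajectory, which is the intuition behind caching-with-extrapolation schemes.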
## 📙Awesome Diffusion Distributed Inference with Multi-GPUs
<div id="Distributed"></div>
|2024.05 | 🔥🔥[**TensorRT-LLM SDXL**] SDXL Distributed Inference with TensorRT-LLM and synchronous comm(@Zars19) |[[pdf]](https://arxiv.org/abs/2402.19481)|[[SDXL-TensorRT-LLM]](https://github.com/NVIDIA/TensorRT-LLM/pull/1514)| ⭐️⭐️ |
|2024.06| 🔥🔥[**Clip Parallelism**] Video-Infinity: Distributed Long Video Generation(@nus.edu)|[[pdf]](https://arxiv.org/pdf/2406.16260)|[[Video-Infinity]](https://github.com/Yuanshi9815/Video-Infinity)|⭐️⭐️ |
|2024.05| 🔥🔥[**FIFO-Diffusion**] FIFO-Diffusion: Generating Infinite Videos from Text without Training(@Seoul National University)|[[pdf]](https://arxiv.org/pdf/2405.11473)|[[FIFO-Diffusion]](https://github.com/jjihwan/FIFO-Diffusion_public)|⭐️⭐️ |
|2025.01| 🔥🔥[**ParaAttention**] Context parallel attention that accelerates DiT model inference with dynamic caching(@chengzeyi)|[[docs]](https://github.com/chengzeyi/ParaAttention)|[[ParaAttention]](https://github.com/chengzeyi/ParaAttention)|⭐️⭐️ |
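The context-parallel approach behind entries like ParaAttention shards the query sequence across devices; because each softmax row depends only on its own query against the full K/V, each rank can compute its slice exactly and the results concatenate back to the full output. A single-process NumPy sketch, where `np.array_split`/`np.concatenate` stand in for the real scatter/all-gather collectives (all names here are illustrative, not any library's API):

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention(q, k, v):
    # Standard scaled dot-product attention for a single head.
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores) @ v

rng = np.random.default_rng(0)
seq_len, dim, world_size = 8, 16, 2
q, k, v = (rng.standard_normal((seq_len, dim)) for _ in range(3))

# Reference: full attention on one "device".
ref = attention(q, k, v)

# Context parallelism: each rank owns a slice of the query sequence and
# attends against the full K/V (gathered via all-gather in a real setup).
chunks = [attention(q_chunk, k, v) for q_chunk in np.array_split(q, world_size)]
out = np.concatenate(chunks)

print(np.allclose(out, ref))  # exact: softmax rows are independent
```

The split is mathematically exact, so such schemes trade communication (replicating or gathering K/V) for parallelism with no approximation error; dynamic caching, as in ParaAttention, is an orthogonal optimization layered on top.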
## 📙Other Awesome Diffusion Inference Papers with Codes
|2024.11| 🔥🔥[**SVDQuant**] SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models|[[pdf]](https://arxiv.org/pdf/2411.05007)|[[nunchaku]](https://github.com/mit-han-lab/nunchaku)|⭐️⭐️ |
|2024.10|🔥🔥[**SageAttention**] SageAttention: Accurate 8-Bit Attention for Plug-and-Play Inference Acceleration(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2410.02367)|[[SageAttention]](https://github.com/thu-ml/SageAttention)| ⭐️⭐️ |
|2024.11|🔥🔥[**SageAttention-2**] SageAttention2: Efficient Attention with Thorough Outlier Smoothing and Per-thread INT4 Quantization(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2411.10958)|[[SageAttention]](https://github.com/thu-ml/SageAttention)| ⭐️⭐️ |
|2025.03|🔥🔥[**SpargeAttention**] SpargeAttn: Accurate Sparse Attention Accelerating Any Model Inference(@thu-ml)|[[pdf]](https://arxiv.org/pdf/2502.18137)|[[SpargeAttn]](https://github.com/thu-ml/SpargeAttn)| ⭐️⭐️ |
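Several entries above (SVDQuant, SageAttention) accelerate inference by computing attention or weights in low-bit integer arithmetic. A simplified NumPy sketch of the INT8-attention idea: quantize Q and K symmetrically, accumulate QKᵀ in INT32, and dequantize before the softmax. This is an assumption-laden toy using per-tensor scales; the actual SageAttention kernels use finer-grained (per-block) scales plus K smoothing, and none of the names below are the library's API:

```python
import numpy as np

def quantize_int8(x):
    # Symmetric per-tensor INT8 quantization: map max |x| to 127.
    scale = np.abs(x).max() / 127.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def softmax(x):
    x = x - x.max(axis=-1, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=-1, keepdims=True)

rng = np.random.default_rng(0)
seq_len, dim = 8, 16
q, k, v = (rng.standard_normal((seq_len, dim)).astype(np.float32) for _ in range(3))

q_i8, q_scale = quantize_int8(q)
k_i8, k_scale = quantize_int8(k)

# QK^T accumulated in INT32 (the cheap part on real hardware),
# then dequantized with the product of the two scales before softmax.
scores = (q_i8.astype(np.int32) @ k_i8.T.astype(np.int32)) * (q_scale * k_scale)
out_int8 = softmax(scores / np.sqrt(dim)) @ v

# Full-precision reference for comparison.
out_fp32 = softmax((q @ k.T) / np.sqrt(dim)) @ v
print(np.abs(out_int8 - out_fp32).max())  # small quantization error
```

The point of the sketch is that only the expensive QKᵀ matmul runs in INT8/INT32, while the softmax and the PV matmul stay in floating point, keeping the output close to the full-precision result.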