Tinygrad MPS #65
^ Here's my current error, if anyone wants to tap in |
|
Threw in a Tensor recast in Metal. Here's where I'm at: |
|
Ayyy it's running (kinda), gotta confirm output but it's past 0%! Looks like attention needed to be segmented, but MPS doesn't have a built-in segmented attention, so I built my own. |
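For anyone following along, here's a rough sketch of what "segmented" attention means in general. This is not the code from this PR: it's a dependency-free toy on plain Python lists, and the names `attention_block`, `chunked_attention` and `chunk_size` are made up for illustration. The idea is that queries get processed in chunks, so only a `chunk_size x num_keys` score block exists at any one time, while the result stays the same as unchunked attention.

```python
import math


def attention_block(q_block, k, v):
    """Softmax attention for one block of query rows against all keys/values."""
    scale = 1.0 / math.sqrt(len(k[0]))
    # Score block of shape len(q_block) x len(k): the only large intermediate.
    scores = [
        [scale * sum(a * b for a, b in zip(q, key)) for key in k]
        for q in q_block
    ]
    out = []
    for row in scores:
        m = max(row)  # subtract the row max for numerical stability
        weights = [math.exp(s - m) for s in row]
        total = sum(weights)
        out.append([
            sum(w * val[j] for w, val in zip(weights, v)) / total
            for j in range(len(v[0]))
        ])
    return out


def chunked_attention(q, k, v, chunk_size):
    """Run attention over the queries chunk_size rows at a time (hypothetical helper)."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out = []
    for start in range(0, len(q), chunk_size):
        out.extend(attention_block(q[start:start + chunk_size], k, v))
    return out
```

Each query row's softmax only depends on that row's scores, so splitting along the query axis doesn't change the output. A smaller `chunk_size` lowers peak memory at the cost of more passes, which is the trade-off a chunk-size setting would control.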
|
@lllyasviel, would love your take on this. Got the MPS kernels thrown in, and split the attention up to batch it. I don't know if we should expose a slider through Gradio for chunk-sizing on M-series machines, but it's currently diffusing (slowly). One operation only runs on CPU on Mac, so you can run it, but it just takes a while (1h30m for a 25-frame video). |
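On the CPU-only operation: PyTorch has a `PYTORCH_ENABLE_MPS_FALLBACK` environment variable that lets ops without an MPS kernel run on the CPU instead of raising an error. The thread doesn't say whether this PR relies on it, so treat the snippet below as a sketch of one common approach, not as what this branch does. The helper name `enable_mps_cpu_fallback` is hypothetical.

```python
import os


def enable_mps_cpu_fallback(env=None):
    """Ask PyTorch to fall back to the CPU for ops that lack an MPS kernel.

    Assumption: this runs before the first `import torch`, which is the
    usual recommendation for this variable.
    """
    env = os.environ if env is None else env
    env["PYTORCH_ENABLE_MPS_FALLBACK"] = "1"
    return env


# Set the variable for the current process before torch is imported.
enable_mps_cpu_fallback()
```

Fallback ops copy data between the GPU and CPU, so a single unsupported op in a hot loop can account for a lot of wall-clock time.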
|
Okay - finally started updating portions of this for more gains.
|
|
Just a note: there's another fork for Mac that seems to be faster: https://www.reddit.com/r/StableDiffusion/comments/1k2neim/framepack_on_macos/ https://github.com/brandon929/FramePack
|
|
@e1732a364fed, appreciate it! Looking into it now. |
|
I have made pytorch supporting
250418_230711_181_7405_37.mp4
250420_173459_831_8394_28.mp4
I just saw the new PR, hope it would be helpful for that. |
|
@donghao1393 I gotcha - I chunked these buffers down fairly aggressively to generate the output. Is the output not good? I tried it with the guy jumping, and it worked well! Let me know what's going wrong. Maybe it's because my machine's really constrained: I've got an M3 Pro with 18GB of RAM. |
|
@mdaiter It's in the first output. The video doesn't seem to play right. 250418_230711_181_7405_37.reoutput.mp4 |
Sorry, the auto AI stuff bonked a lot of my code. Rolling some stuff back, this is just a train of thought. Basically, just disregard this.
I'm gonna start digging to get tinygrad grafted into the transformer bit. Metal blocks (and compacting that thing down in general) should help a lot with speed + memory usage on machines.