[Web] Improve large tensor loading in wasm runtime #19771
Code Review
This pull request restructures the TVM-FFI includes in wasm_runtime.cc to prevent static initialization crashes and updates ArrayDecodeStorage to tolerate uncompressed float32 weights under the 'f32-to-bf16' format. In runtime.ts, it introduces chunked record loading and copying (in chunks of up to 128MB) to handle large tensors efficiently, and adds support for kTVMFFIShape types. The review feedback suggests optimizing these chunking loops by using the cached makeShapeTuple method on the Instance class instead of calling the FFI function this.ctx.makeShapeTuple on every iteration, which avoids redundant FFI round-trips.
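To illustrate the chunked loading described in the summary, here is a minimal sketch that splits a row-major tensor along its leading dimension into views of at most 128MB. planChunks and ChunkPlan are hypothetical names, not the runtime.ts API, and reading "128MB" as 128 × 2^20 bytes is an assumption.

```typescript
// Hypothetical sketch: names and layout are illustrative, not the runtime.ts API.
interface ChunkPlan {
  byteOffset: number; // start of this chunk within the tensor's byte payload
  byteLength: number; // number of bytes covered by this chunk
  chunkShape: number[]; // shape of the tensor view for this chunk
}

// Assumption: "128MB" is read as 128 * 2^20 bytes.
const DEFAULT_MAX_CHUNK_BYTES = 128 * 1024 * 1024;

// Split a row-major tensor along its leading dimension so that each chunk is a
// contiguous byte range of at most maxChunkBytes. If a single leading-dimension
// slice is already larger than the limit, that slice gets a chunk of its own.
function planChunks(
  shape: number[],
  bytesPerElement: number,
  maxChunkBytes: number = DEFAULT_MAX_CHUNK_BYTES,
): ChunkPlan[] {
  if (shape.length === 0) {
    // Scalar tensor: one element, one chunk.
    return [{ byteOffset: 0, byteLength: bytesPerElement, chunkShape: [] }];
  }
  const innerShape = shape.slice(1);
  // Bytes in one slice along the leading dimension.
  const sliceBytes =
    innerShape.reduce((acc, dim) => acc * dim, 1) * bytesPerElement;
  if (sliceBytes === 0 || shape[0] === 0) {
    // Empty tensor: no bytes to load.
    return [];
  }
  const slicesPerChunk = Math.max(1, Math.floor(maxChunkBytes / sliceBytes));
  const plans: ChunkPlan[] = [];
  for (let start = 0; start < shape[0]; start += slicesPerChunk) {
    const count = Math.min(slicesPerChunk, shape[0] - start);
    plans.push({
      byteOffset: start * sliceBytes,
      byteLength: count * sliceBytes,
      chunkShape: [count, ...innerShape],
    });
  }
  return plans;
}
```

Splitting along the leading dimension keeps each chunk a contiguous byte range, so a view can be described by a byte offset plus chunkShape. Every chunk except possibly the last shares the same chunkShape, which is why reusing a cached shape tuple across iterations pays off.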
    this.ctx.makeShapeTuple(
      ...chunkShape.map((value) => new Scalar(value, "int")),
    ),
We can leverage the cached makeShapeTuple method on the Instance class instead of directly calling the FFI function this.ctx.makeShapeTuple on every chunk. This avoids redundant FFI round-trips to create the same shape tuple multiple times across chunks and records, improving performance.
Suggested change:

    this.makeShapeTuple(chunkShape),

    const chunkShapeTuple = this.ctx.makeShapeTuple(
      ...chunkShape.map((value) => new Scalar(value, "int")),
    );
We can leverage the cached makeShapeTuple method on the Instance class instead of directly calling the FFI function this.ctx.makeShapeTuple on every chunk. This avoids redundant FFI round-trips to create the same shape tuple multiple times across chunks and records, improving performance.
Suggested change:

    const chunkShapeTuple = this.makeShapeTuple(chunkShape);

Branch updated from 012380a to 9c4334d.
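Both review suggestions rely on Instance.makeShapeTuple being cached, as the review describes it. The memoization idea can be sketched as follows; ShapeTupleCache and its factory callback are hypothetical names, and the actual tvmjs Instance may key and release its cache differently. A real cache also has to keep the cached FFI objects alive and dispose of them with the instance; this sketch ignores object lifetime.

```typescript
// Hypothetical sketch of memoized shape-tuple creation; the real
// Instance.makeShapeTuple in tvmjs may key and release its cache differently.
class ShapeTupleCache<T> {
  private readonly cache = new Map<string, T>();
  private readonly create: (shape: number[]) => T;

  constructor(create: (shape: number[]) => T) {
    // `create` stands in for the FFI call (e.g. this.ctx.makeShapeTuple).
    this.create = create;
  }

  get(shape: number[]): T {
    // Identical shapes produce identical keys, e.g. [512, 4096] -> "512,4096".
    const key = shape.join(",");
    const cached = this.cache.get(key);
    if (cached !== undefined) {
      return cached;
    }
    const tuple = this.create(shape); // only reached on a cache miss
    this.cache.set(key, tuple);
    return tuple;
  }

  get size(): number {
    return this.cache.size;
  }
}

// Usage: three chunk shapes, two of them identical, so only two "FFI" calls.
let ffiCalls = 0;
const shapeCache = new ShapeTupleCache((shape) => {
  ffiCalls += 1;
  return { dims: [...shape] };
});
const chunkShapes = [[512, 4096], [512, 4096], [488, 4096]];
const tuples = chunkShapes.map((shape) => shapeCache.get(shape));
```

Because all full-size chunks of a tensor share one shape, most lookups in a chunking loop hit the cache, and only the first chunk and a shorter final chunk pay for an FFI round-trip.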
This splits out the Web/WebGPU runtime-only portion of #19766 into a smaller PR, following reviewer feedback that the compiler-side changes should be handled separately.
This PR keeps the scope to web/runtime code:

- Make ArrayDecodeStorage tolerate f32-to-bf16 records whose payload is already native float32-sized, while preserving the existing packed-bf16 expansion path.
- Convert kTVMFFIShape callback results to JS number arrays, so chunked tensor views can pass explicit shape tuples.

Local validation:

- npm run lint from web/
- npx tsc --noEmit --pretty false from web/
- git diff --check

I could not run the full local npm run prepwasm && npm run build path on this machine because Emscripten (emcc/emsdk) is not installed. In the earlier, broader PR, the Apache wasm CI job passed before the compiler-side CI jobs failed; this PR is intended to let the wasm job validate the runtime-only split independently.
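The ArrayDecodeStorage change in the PR description tells the two f32-to-bf16 payload layouts apart by size. A minimal sketch of that check follows, assuming little-endian byte order (as in WebAssembly) and a known element count; decodeF32ToBf16Payload is a hypothetical name, not a tvmjs function.

```typescript
// Hypothetical sketch; decodeF32ToBf16Payload is not a tvmjs function.
// Assumes little-endian byte order, which WebAssembly and common JS hosts use.
function decodeF32ToBf16Payload(
  payload: Uint8Array,
  numElements: number,
): Float32Array {
  // Copy into a fresh buffer starting at offset 0 so typed-array views are aligned.
  const bytes = payload.slice();
  if (bytes.byteLength === numElements * 4) {
    // Payload is already native float32: use it as-is.
    return new Float32Array(bytes.buffer);
  }
  if (bytes.byteLength !== numElements * 2) {
    throw new Error(
      `unexpected payload size ${bytes.byteLength} for ${numElements} elements`,
    );
  }
  // Packed bf16: each 16-bit value becomes the upper half of a float32.
  const bf16 = new Uint16Array(bytes.buffer);
  const bits = new Uint32Array(numElements);
  for (let i = 0; i < numElements; i++) {
    bits[i] = bf16[i] << 16; // negative shift results wrap to the correct uint32 bits
  }
  return new Float32Array(bits.buffer);
}
```

Checking the size against both layouts leaves the packed-bf16 expansion unchanged, accepts payloads that are already float32-sized, and rejects any other size instead of guessing.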