feat(core): GPU instancing auto-batching#2957
Conversation
…nce data Introduce automatic GPU instancing for MeshRenderer. The system scans renderer-group uniforms across shader passes, builds a unified std140 UBO layout, and packs per-instance data (ModelMat, Layer, etc.) each frame. Key changes: - InstanceDataPacker: packs renderer data into shared UBO for instanced draw - ShaderFactory: unified _scanInstanceUniforms, _buildLayout, _injectInstanceUBO - MeshRenderer._canBatch/_batch: instancing merge logic - ShaderPass/SubShader: instance-aware compilation with macro cache - GLSLIfdefResolver: compile-time #ifdef resolution for instance field scanning - MacroCachePool: pooled ShaderMacroCollection for shader program caching - RenderQueue: instance-aware draw path with UBO binding
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
WalkthroughAdds WebGL2 GPU instancing: instance UBO management, instance-aware shader compilation and caching, batching API/signature changes, render-queue instanced draw paths, device/GL helpers, examples and E2E tests, and supporting shader/uniform enhancements. Changes
Sequence Diagram(s)sequenceDiagram
participant RenderQueue as RenderQueue
participant BatcherManager as BatcherManager
participant InstanceBatch as InstanceBatch
participant GPUBuffer as GPU Constant Buffer
participant ShaderProgram as ShaderProgram
RenderQueue->>BatcherManager: request instanceBatch (lazy)
BatcherManager->>InstanceBatch: setLayout(layout)
InstanceBatch->>GPUBuffer: create/realloc UBO (if needed)
loop per instanced chunk
RenderQueue->>InstanceBatch: upload(renderers[], start, count)
InstanceBatch->>InstanceBatch: pack per-instance fields into CPU buffer
InstanceBatch->>GPUBuffer: setData(range, Discard)
end
RenderQueue->>ShaderProgram: bindUniformBlocks(bindingMap)
ShaderProgram->>GPUBuffer: uniformBlockBinding(bindingPoint)
RenderQueue->>RenderQueue: issue drawPrimitive with instanceCount
Estimated code review effort🎯 4 (Complex) | ⏱️ ~60 minutes Poem
🚥 Pre-merge checks | ✅ 2 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Tip 💬 Introducing Slack Agent: The best way for teams to turn conversations into code.Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.
Built for teams:
One agent for your entire SDLC. Right inside Slack. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
Move _gpuInstanceMacro after _macroMap declaration to fix static initialization order. Also apply prettier formatting fixes.
Codecov Report❌ Patch coverage is Additional details and impacted files@@ Coverage Diff @@
## dev/2.0 #2957 +/- ##
===========================================
- Coverage 78.25% 77.38% -0.87%
===========================================
Files 900 909 +9
Lines 99234 100649 +1415
Branches 10172 10261 +89
===========================================
+ Hits 77657 77891 +234
- Misses 21406 22580 +1174
- Partials 171 178 +7
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
…es layout The instance UBO is injected at compile time by ShaderFactory, so the original shader uniform declarations don't need modification.
Bring back normalMat extraction, transform_declare reordering, trailing whitespace fixes, and VertexPBR indent fix. Only the renderer_Layer relocation stays reverted.
… 3×vec4 - Fix SubRenderElement.set() not resetting instanceDataPacker, causing stale packer references from previous frames to break all batching - Use whitelist + _group fallback for identifying renderer uniforms in _scanInstanceUniforms (fixes _group===undefined for ModelMat) - Store ModelMat as 3×vec4 (affine rows) instead of mat4 in UBO, saving 16 bytes per instance (structSize 80→64, +25% instances/batch) - Add camera_VPMat to transform_declare.glsl for derived MVP define - Extract struct definition outside uniform block (GLSL ES 3.00 compat) - Fix _insertUBOBlock to only scan initial #define preamble - Pass instanceID to fragment shader via flat varying
…lity Wrap derived NormalMat define with mat4() so instancing and non-instancing paths both produce mat4, avoiding shader compilation errors. Add custom instance data example to verify per-renderer uniform batching.
Rename elementA/elementB to preSubElement/subElement across all renderer subclasses and BatchUtils. Change _batch signature so preSubElement is nullable (null = batch head, no previous element to merge with), and subElement is always required.
- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker - Rename getOrCreate() to get() in InstanceDataPackerPool - Clear compileMacros in InstanceDataPacker.reset()
Move macro merging, layout computation, and UBO packing from batch phase to render phase. _batch now only collects renderers into a pre-allocated list. RenderQueue.render handles macro union, layout lookup, and splits by maxInstanceCount for sub-batch rendering. - SubRenderElement: instanceDataPacker → instancedRenderers (pooled array) - MeshRenderer._canBatch: remove maxInstanceCount check - MeshRenderer._batch: only push renderers, zero allocation - InstanceDataPacker: remove compileMacros/addRenderer/instanceCount, add packAndUpload(renderers, start, count) - InstanceDataPackerPool: remove uploadBuffer, simplify reset - BatcherManager: remove instancing uploadBuffer call
Packer is now stateless (setLayout + packAndUpload + draw), so only one instance is needed. Discard upload ensures no GPU stall when reusing the same buffer across shadow and main passes. - Delete InstanceDataPackerPool.ts - BatcherManager: instanceDataPackerPool → instanceDataPacker - Remove resetInstanceDataPackerPool lifecycle - Saves GPU memory by using single buffer instead of pool
Rename class, file, and all references to better reflect its role as an instance batch manager rather than a generic data packer.
It's a macro-keyed map, not a pool (no borrow/return semantics).
Align variable and method names with MacroMap rename — these are maps, not pools.
Callers always pass a valid buffer, no need for null guard.
- nativeBuffer → buffer, _uboData → _data - Replace separate instanceFields/_structSize with single _layout ref - setLayout() now takes InstanceLayout directly
- Remove unnecessary null guard on _layout - Inline uploadElements variable - Destructure floatView/intView from this - Improve worldMatrix comment
- Remove unnecessary component→renderer alias, use component directly - Hoist bindUniformBlocks/bindUniformBufferBase out of sub-batch loop - Move primitive.instanceCount=0 after loop (only need to reset once) - Remove redundant let layout = undefined
- Upgrade lint-staged from v10.5 to v16.4.0
- Fix glob from *.{ts} to **/*.ts to match subdirectory files
- Remove redundant git add from tasks
- eslint 8.44 → 8.57 - @typescript-eslint/parser and eslint-plugin 6.x → 8.x - Eliminates "unsupported TypeScript version" warning
Dispose means the object is permanently released, so null the array to free memory instead of just clearing length.
Eliminate redundant SubShader._getInstanceLayout / ShaderPass._scanInstanceFields by reusing the shader compilation chain — _injectInstanceUBO now returns InstanceLayout directly, stored on ShaderProgram._instanceLayout.
GuoLei1990
left a comment
There was a problem hiding this comment.
Re-review (Incremental) — GPU Instancing + UIBatchSorter + e2e Cases
已关闭问题清单
前 15+ 轮 review 提出的所有关键问题均已修复或经作者解释关闭(详见 2026-04-15 LGTM)。2026-05-08 增量 review 新发现两个问题,本轮验证其是否已修复。
问题
[P0] RenderElement.dispose() — instancedRenderers = null 导致对象池复用时 NPE(仍未修复)
diff 中(RenderElement.ts 新增的 dispose() 方法):
dispose(): void {
this.component = null;
this.material = null;
// ...
this.instancedRenderers = null; // ← 仍然是 null 赋值
// ...
}而 set() 中初始化为:
this.instancedRenderers.length = 0; // ← 对 null 调用 .length = 0 → TypeErrorObjectPool 复用路径:dispose() 归还 → 下一帧 get() 取出 → 调用 set() → NPE 必现。
需改为:
// dispose() 中:
this.instancedRenderers.length = 0;
// 或者在 set() 中:
this.instancedRenderers = this.instancedRenderers ?? [];
this.instancedRenderers.length = 0;[P1] _scanInstanceUniformsWithMacros 不处理 #if/#elif 导致栈错位(仍未修复)
注释写了 #if with expressions is treated as always-active,但实现中只有四个正则:_ifdefRegex/_ifndefRegex/_elseRegex/_endifRegex,没有处理 #if/#elif。
cptbtptpbcptdtptp(2026-05-07)分析准确:遇到 #if 不 push,其配对的 #endif 仍然 pop,branchStack 错位,导致 #endif 之后本应活跃的 renderer_ uniform 被跳过——不进 UBO 也不生成 #define 重映射,但 GLSL 端仍视其为普通 renderer uniform,instanced 模式下该 block 上传被跳过,渲染静默错误。
最小修复:
m = line.match(/^[ \t]*#if\b/);
if (m) { branchStack.push(branchStack[branchStack.length - 1]); continue; } // always-active as documented
m = line.match(/^[ \t]*#elif\b/);
if (m) { branchStack.pop(); branchStack.push(branchStack[branchStack.length - 1]); continue; }e2e Case 质量
gpu-instancing-auto-batch.ts — 有可靠性隐患:依赖外部 CDN URL(gw.alipayobjects.com),网络抖动会导致 flaky。其他 e2e case(gpu-instancing-custom-data、ui-batch-order)全部使用本地资源,diffPercentage: 0,更稳定。建议将 auto-batch case 的 Duck GLB 换成本地 PrimitiveMesh,彻底消除外部依赖。
ui-batch-order.ts — 设计正确:固色纹理、本地资源、diffPercentage: 0、注释清晰说明 regression 场景。质量高。
gpu-instancing-custom-data.ts — ShaderLab 内联 + PrimitiveMesh,全确定性,质量高。
UIBatchSorter 单元测试 — 覆盖核心场景,从公开 API 入口断言输出顺序,符合链路测试原则,质量好。
结论
两个 P0/P1 bug 在上次 review(2026-05-08)提出后,本轮验证仍未修复,请修复后合入。
| /** @internal */ | ||
| @ignoreClone | ||
| _batchedRenderElements: any[] = []; | ||
| /** @internal */ |
There was a problem hiding this comment.
_renderElements 和 _batchedRenderElements 在 canvas 不可见时(_canRender 返回 false、onDisable 等)不会被清空,但它们持有的 RenderElement 来自对象池,可能已被其他 renderer 复用。这些悬挂引用会阻止 GC 回收。
建议在 onDisable 里清空:
override onDisable(): void {
super.onDisable();
this._renderElements.length = 0;
this._batchedRenderElements.length = 0;
}
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
前 20+ 轮提出的所有关键问题均已修复或经作者解释关闭:
| 问题 | 状态 |
|---|---|
_canBatch 缺少 renderer-level state 判定 |
✅ 已修复(macroCollection.isEqual + frontFace) |
renderer_LocalMat/renderer_MVInvMat 未删除 |
✅ 已修复(完全移除) |
| uniformBlockBindingMap 归属 | ✅ 已修复(移到 ShaderFactory) |
renderer_NormalMat per-vertex inverse |
✅ 已关闭(GPU ALU 换 UBO 空间,可接受) |
| Opaque 距离排序移除 | ✅ 已关闭(移动端 TBDR,合批优先) |
总结
GPU Instancing 自动合批架构已收敛。UIBatchSorter 同步修复了 canvas hierarchy 顺序被 unstable quicksort 打散的视觉 bug。整体方向正确,iPhone +67% / UI +120% 数据有说服力。
问题(需在合并前修复)
[P0] RenderElement.ts — dispose() 将 instancedRenderers 设为 null,但 set() 调用 this.instancedRenderers.length = 0,ObjectPool 复用路径必然 NPE
// dispose():
this.instancedRenderers = null; // ← 设为 null
// set():
this.instancedRenderers.length = 0; // ← pool 复用时 dispose→get→set,NPE修复:dispose() 中改为 this.instancedRenderers.length = 0(清空数组但保留引用)。
[P2] e2e/case/gpu-instancing-auto-batch.ts 依赖外部 CDN
使用 gw.alipayobjects.com Duck GLB,CDN 网络抖动会导致 e2e flaky。建议改用 PrimitiveMesh.createCuboid/createSphere(参考 gpu-instancing-custom-data.ts 的正确做法)。
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
前 15+ 轮 review 提出的所有关键问题均已修复或关闭(详见上一轮 review)。
总结
基于 std140 UBO 的自动 GPU Instancing 实现完整、架构收敛。ShaderFactory.injectInstanceUBO 扫描 renderer-group uniform → 计算 std140 布局 → 注入 UBO block + #define 重映射,对现有 shader 零侵入;mat3x4 仿射优化、per-pass 独立布局、InstanceLayout 缓存在 ShaderProgram、派生 uniform 通过 #define 实时计算——这些设计决策均正确。UIBatchSorter canvas 内提前合批修复了 hierarchy 顺序被 unstable quicksort 打散的视觉错乱 bug。iPhone 16 PM 实测 MeshRenderer +67%、UI +120%,数据有说服力。
自上次 LGTM(2026-04-15)以来,本 PR 无新 commit。
LGTM。无新问题,可以合入。
合并后建议在 release notes 标注两个破坏性变更:
renderer_LocalMat和renderer_MVInvMatuniform 已移除renderer_NormalMat当前为 GPU 端 per-vertexinverse(),后续可考虑 CPU 预计算打入 UBO(@todo已标注)
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
前 14+ 轮 review 的问题均已修复或经作者合理解释关闭,包括:
- WebGL2 守卫:上游
_canBatch约束已保证,不需要重复防御 ✅ renderer_LocalMat/renderer_MVInvMat删除 ✅_canBatch缺 renderer 级宏比较:已加shaderData._macroCollection.isEqual()✅- SubRenderElement dispose NPE:pool GC 后清空,不复用旧对象 ✅
ShaderProgramMapengine 必选参数 ✅- bool/uint 类型补全
_std140TypeInfoMap✅ maxUniformBlockSize缓存、uniformBlockBinding移至 link 阶段 ✅
总结
GPU Instancing auto-batching 架构扎实:UBO std140 打包 + shader 注入 + mat3x4 仿射优化。iPhone 实测 +67% 的提升验证了价值。UIBatchSorter canvas 内 visual-layering 排序设计合理,同时修复了 hierarchy 顺序被 unstable quicksort 打散的 bug。整体是高质量的架构改进。
问题
[P0] RenderElement.ts:dispose() — this.instancedRenderers = null 与 set() 中的 this.instancedRenderers.length = 0 不匹配。
ObjectPool 复用路径:garbageCollection() → dispose() → 归还到池 → get() 取出 → set() 初始化。dispose() 后 instancedRenderers = null,set() 执行 this.instancedRenderers.length = 0 时 null.length 抛 TypeError,必现崩溃。
dispose(): void {
- this.instancedRenderers = null;
+ this.instancedRenderers.length = 0;
}[P1] ShaderFactory.ts — _scanInstanceUniformsWithMacros 不处理 #if/#elif
当前实现只识别 #ifdef/#ifndef/#else/#endif。GLSL 代码中 #if defined(X) && !defined(Y) 或 #elif 形式的条件分支不会被识别,导致 branchStack 不 push/pop,后续所有 #endif pop 出错,#endif 之后的 renderer uniform 被错误排除在 fieldMap 之外,instanced 模式下 uniform 未被重映射,GPU 读到错误数据,渲染静默错误。
packages/core/src/shaderlib/ShaderFactory.ts 中 _ifdefRegex、_ifndefRegex 之后需要补:
private static readonly _ifRegex = /^[ \t]*#if\b/;
private static readonly _elifRegex = /^[ \t]*#elif\b/;并在循环中处理:
m = line.match(ShaderFactory._ifRegex);
if (m) { branchStack.push(false); continue; } // treat as inactive branch for safety
m = line.match(ShaderFactory._elifRegex);
if (m) { branchStack.pop(); branchStack.push(false); continue; }注意:#if 带表达式无法静态求值(需要完整 preprocessor),但至少保持 branchStack 不乱。当前代码注释已承认"supported; #if with expressions is treated as always-active",但代码里根本没有处理 #if,导致 branchStack 损坏而不是"always-active"。
[P2] UIBatchSorter.ts — 静态成员在多画布同帧调用时存在状态泄漏
_entries、_grid、_dirtyCells、_localBoundsPool、_worldToLocal 均为静态成员,若同一帧内多个 UICanvas 各自调用 sort()(ScreenSpaceCamera + WorldSpace 并存),后一次调用的 _worldToLocal 会覆盖前一次,_entries 跨调用累积。请确认是否有多画布同帧场景。如有,改为实例成员或在 sort() 入口强制重置所有静态状态。
结论
P0 NPE 和 P1 branchStack 损坏是可复现 bug,需要在合入前修复。P2 UIBatchSorter 视多画布场景是否存在决定是否需要处理。
| if (propertyId === modelMatId) { | ||
| // Instancing skips _updateTransformShaderData, so worldMatrix is not in propertyValueMap | ||
| // Must read from transform getter to trigger lazy update | ||
| field.pack(floatView, fieldOffset, renderer.entity.transform.worldMatrix); |
There was a problem hiding this comment.
renderer.entity.transform.worldMatrix 和 Renderer._updateTransformShaderData 里的 this._transformEntity.transform.worldMatrix 不一致。当前 MeshRenderer 两者相同所以安全,但如果未来有子类修改了 _transformEntity(如 SkinnedMeshRenderer 设为 rootBone)又支持 instancing,会静默用错 transform。
建议改为:
field.pack(floatView, fieldOffset, renderer._transformEntity.transform.worldMatrix);| vec4: { size: 16, align: 16 }, | ||
| ivec4: { size: 16, align: 16 }, | ||
| mat4: { size: 64, align: 16 }, | ||
| mat3x4: { size: 48, align: 16 } |
There was a problem hiding this comment.
_std140TypeInfoMap 缺少 bool / bvec2~4 / mat2 / mat3 / uvec2~4,当 shader 中声明了这些类型的 renderer uniform 时会导致 shader 编译失败。
复现路径:
_scanInstanceUniforms匹配到uniform mat3 renderer_Foo→ 写入fieldMap→ 删除 uniform 声明(返回空字符串)_buildLayout→addField→std140Map["mat3"]为undefined→ 静默return,字段不进instanceFields_buildFieldDefines不生成对应的#define renderer_Foo ...- GLSL 编译时
renderer_Foo既没有 uniform 声明也没有#define→ 编译报错 undefined variable
仅在 instancing 路径触发(2+ 相同 renderer 合批),单个 MeshRenderer 不走 instancing 则不会暴露。
建议两种方案之一:
- 补全
_std140TypeInfoMap中缺失的类型 - 在
_scanInstanceUniforms中,类型不在 type map 里时保留原始 uniform 声明(return match)并 log warning,而不是删除
…ity's InstanceBuffer read renderer.entity.transform.worldMatrix, while Renderer._updateTransformShaderData uses _transformEntity.transform. They diverge whenever a subclass remaps _transformEntity (e.g. SkinnedMeshRenderer points it at the root bone); switch InstanceBuffer to _transformEntity to align with the shader-data path.
…sable Replace `any[]` with `RenderElement[]` on `_renderElements` / `_batchedRenderElements`, and clear both in `_onDisable` so the pooled elements they hold don't surface stale references when other renderers reuse the same pool slots while the canvas is disabled.
GuoLei1990
left a comment
There was a problem hiding this comment.
增量审查(2026-05-12,基于最新 commits)
已关闭问题清单
前 15+ 轮 review 提出的所有问题均已修复或经验证关闭,包括:
_canBatch缺少 macroCollection.isEqual + frontFace 判定 ✅renderer_LocalMat/renderer_MVInvMat删除 ✅- std140 对齐、mat3x4 仿射优化 ✅
uniformBlockBinding移至 link 阶段 ✅InstanceLayout失效策略(存在 ShaderProgram) ✅_buildMissingCameraDeclsDCE camera uniform 补全 ✅renderer_ModelMatforce-inject ✅- UIBatchSorter 静态成员多画布污染 — 逐字段验证安全,
sort()入口各字段均完整重置 ✅
问题
[P0] RenderElement.ts — dispose() 设 instancedRenderers = null,set() 调用 .length = 0,ObjectPool 复用路径 NPE 必现
commit 99ec5d138a("refactor: null out instancedRenderers in dispose")引入了此问题:
// dispose():
this.instancedRenderers = null; // ← 归还到 pool
// set()(下次从 pool 取出时调用):
this.instancedRenderers.length = 0; // ← TypeError: Cannot set properties of nullObjectPool 复用路径:garbageCollection() → dispose() 归还 → get() 取出 → set() 初始化 → NPE 必现。
注意:这与之前 SubRenderElement 关闭的理由不同——SubRenderElement pool 在 garbageCollection() 后立即清空整个数组,下次走 new,disposed 对象永不复用。RenderElement 所在 pool 是否相同逻辑,请作者确认:
- 若复用:改为
this.instancedRenderers.length = 0 - 若不复用:在注释中标注,防止后人误解
[P1] ShaderFactory.ts — _scanInstanceUniformsWithMacros 不处理 #if/#elif,branchStack 错位导致 uniform 静默丢失
当前只有四个正则:_ifdefRegex/_ifndefRegex/_elseRegex/_endifRegex。遇到 #if expr 或 #elif 时不压栈,其配对 #endif 仍然弹栈,导致 branchStack 错位——#endif 之后本应活跃的 renderer_ uniform 被错误跳过,instanced 模式下静默渲染错误。
代码注释写了 // #if with expressions is treated as always-active,但实现中根本没有处理 #if,实际效果是栈错位而非 always-active。
最小修复:
m = line.match(/^[ \t]*#if\b/);
if (m) { branchStack.push(branchStack[branchStack.length - 1]); continue; }
m = line.match(/^[ \t]*#elif\b/);
if (m) { branchStack.pop(); branchStack.push(branchStack[branchStack.length - 1]); continue; }若此函数确为死代码(injectInstanceUBO 调用时不传 activeMacros),请标注 @deprecated 或删除,避免后人启用后遇到静默 bug。
[P2] e2e/case/gpu-instancing-auto-batch.ts — 依赖外部 CDN,e2e flaky 隐患(非阻塞)
依赖 gw.alipayobjects.com Duck GLB,CDN 抖动会导致 CI flaky。其他 e2e case 全部使用本地 PrimitiveMesh + diffPercentage: 0。建议改用 PrimitiveMesh.createCuboid/createSphere 消除外部依赖。
结论
P0 和 P1 需在合入前明确处理(修复或以注释说明为何无运行时影响)。P2 建议合入前一并处理。
ShaderData.setInt documents bool support and setVector* documents bvec support, but the instance UBO layout table didn't list them — declaring a `bool` or `bvec` renderer uniform would silently drop the declaration without producing a #define, causing the shader to fail with an undeclared identifier. Add std140 size/align entries, reuse existing packScalar/packVec pack functions, and recognize the `b` prefix for the intView dispatch.
GuoLei1990
left a comment
There was a problem hiding this comment.
谢谢 review。在 00b4462 里部分采纳,按"ShaderData 暴露的接口集合"做边界:
已补:bool / bvec2-4
ShaderData.setIntJSDoc 明文支持boolShaderData.setVector*JSDoc 明文支持bvec*- 既然能上传,应该让 instance UBO 走通;新加
std140size/align、复用现有packScalar/packVec*、useIntView加b前缀识别
未补:mat2 / mat3 / mat2x3/3x2/2x4/4x2/3x4/4x3
ShaderData没有对应 setter(setMatrix只接Matrix即 mat4),用户结构性无法上传这些类型的值- 即使补
_std140TypeInfoMapentry,pack 函数也拿不到合法的 JS 值来源 — 这部分会变成死代码 - 现状下用户写
mat3 renderer_Foo,scan 删除声明 → 编译时undeclared identifier—— 行为正确(fail-fast),只是错误信号停留在 GLSL compile log
未补:uvec2-4
setIntArrayJSDoc 只列bool/int/bvec*/ivec*,没有 uvecuint现在 std140Map 里有 entry 是历史遗留(可能位级走 setInt 兼容),但接口未公开承诺,先不扩到 uvec
如果后续 ShaderData 加 setUInt/setBoolean 等显式 setter,再同步扩 _std140TypeInfoMap。当前以 ShaderData 接口为权威边界,保持二者对齐。
第二个建议(保留原 uniform + warn)经审视没有采纳:instancing 路径 RenderQueue 跳过 program.uploadAll(rendererUniformBlock, ...),plain uniform 永远是 0,等于静默错渲染,比 undeclared identifier 更难调试。
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
所有 15+ 轮历史问题均已关闭(详见前轮 review)。
本轮增量审查(5 个 refactor commit)
fd747eb3 — store InstanceLayout on ShaderProgram during compilation:将 InstanceLayout 从"渲染时扫描"移为"编译时一次性产出并缓存在 ShaderProgram 上",消除了冗余的两次扫描。_canBatch 不依赖 InstanceLayout(只判断 macroCollection/frontFace),此改动不影响合批判定逻辑。SubShader._layoutCache 删除,无外部调用方,干净。
5d02bbe1 — inline _compilePlatformSource:该方法只透传参数,无独立逻辑,内联是正确操作,调用栈减少一层。
910b8ccc — inline _buildMacroMap:_scanInstanceFields 删除后只剩一个调用方,内联合理,减少 23 行。
891d2c25 — rename isGpuInstance → isGPUInstance:对齐引擎缩写惯例,grep 确认无旧名残留。
99ec5d13 — null out instancedRenderers in dispose:ClearableObjectPool.garbageCollection() 在 dispose 后将底层数组置空,dispose 后对象不会被复用,赋 null 是正确的永久释放语义,不存在 NPE 风险。
总结
5 个 refactor commit 均为正确的代码整理,无新问题引入。自上次 LGTM 以来无行为变更。
LGTM,可合入。
|
读完 PR 后写了一组 vitest 验证关键正确性问题,分享一下确认结果(5 个 test 全 pass,断言的都是当前 BUG 行为)。 🐛 P0#1:自定义 shader 只声明 derived built-in 时合批后整组消失(user-facing)触发条件:用户自定义 shader 只声明 derived built-in(如 触发链路:
用户视角:写了一个单实例下能跑的 shader,加第 2 个 entity 时整组消失,无任何 UI 报错(只有 根因: const isDerived = builtinUniforms[name];
if (isDerived === undefined && ShaderProperty._getShaderPropertyGroup(name) !== ShaderDataGroup.Renderer)
return match;
if (isDerived) return ""; // ← 剥离声明,但没记录"已剥离过 derived"后续 修复方向:要么记一个 🐛 P0#4:多 canvas 共享 atlas 时跨 canvas 合批数据错乱(user-facing)触发条件:两个或多个 UICanvas 用同一 sprite atlas / material,同 priority + 同 distanceForSort(典型如多个 overlay HUD)。 触发链路:
结果:
根因: 修复方向 A(保正确性,推荐先做 hotfix): // VertexMergeBatcher.canBatchSprite
if (preElement._isBatched || curElement._isBatched) return false;修复方向 B(真正支持跨 canvas 合批,作为 follow-up feature):
跨 canvas 合批是有性能价值的(多 HUD 共享 atlas 很常见),但当前实现是「半成品」——既画错了也没真正实现,建议先 fix 再 feature。
|
|
补充两条之前 review 里列为 P1、复审后确认存在的问题。 🐛 P1#8 → 应升 P0:
|
| 问题 | 之前级别 | 复审后级别 | 备注 |
|---|---|---|---|
| 自定义 shader 只用 derived built-in | P0#1 | P0 | 已发 |
| 跨 canvas 合批数据错乱 | P0#4 | P0 | 已发 |
mat3 renderer_* 类型 |
P1#8 | P0(应升级) | 同 P0#1 manifestation |
derived #define mobile 性能 |
P1#7 | P1(保留) | 桌面不可复现但代码事实存在,fix 性价比高 |
_scanInstanceUniformsWithMacros #if |
P0#2 | P1(降级) | dormant,生产路径未调 |
renderer_LocalMat / renderer_MVInvMat 移除 |
P1#6 | P1 | 作者有意,需 CHANGELOG 标记 |
RenderElement.dispose instancedRenderers=null |
P1#11 | ❌ 撤回 | 误判,pool gc 后立刻清空 |
UICanvas._onDisable 不 call super |
P1#12 | ❌ 撤回 | Component._onDisable 是空方法 |
P0#1 和 P0(升级)#8 共享根因,建议一起修;P0#4 独立;P1#7 性价比高建议同 PR 内顺带处理。
b4429b5 to
30f6ad8
Compare
…piling Shaders that only declare derived built-ins (e.g. `renderer_MVPMat`) had their declarations stripped by `_scanInstanceUniforms` but skipped the `#define` rewrite because `fieldMap` was empty, leaving dangling `renderer_ModelMat` references that fail to compile. This also fixed a latent bug where transform-free shaders with `RENDERER_INSTANCING` enabled would silently lose `N-1` instances — `_canBatch` returned true and only the leader was drawn.
30f6ad8 to
04ac8b3
Compare
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
| # | 问题 | 状态 |
|---|---|---|
| 1 | _canBatch 缺少 renderer-level state 判定 | 已修复(macroCollection.isEqual + frontFace) |
| 2 | SubRenderElement ObjectPool NPE | 已关闭(dispose 后 elements.length = 0 清空池,不会复用) |
| 3 | renderer_LocalMat / renderer_MVInvMat 移除 | 已修复(完全移除,文档同步) |
| 4 | uniformBlockBindingMap 归属 | 已修复(移到 ShaderFactory) |
| 5 | _builtinRendererUniforms Map → Record | 已修复 |
| 6 | ShaderProgramMap 重命名 | 已修复 |
| 7 | _std140TypeInfoMap 缺少 bool 类型 | 已修复 |
| 8 | Opaque 距离排序移除 | 已关闭(移动端 TBDR 硬件处理 overdraw) |
| 9 | _derivedDefines 同时声明 VS/FS | 已修复(shader type 区分) |
| 10 | GLBuffer ConstantBuffer WebGL1 guard | 已处理 |
| 11 | array uniform 检测并报错 | 已添加 |
| 12 | _scanInstanceUniforms 正则与 #ifdef 交互 | 已关闭(P2,实际场景中 renderer uniform 不在条件块内) |
总结
GPU Instancing 自动合批方向正确,实现完整,架构收敛。通过 UBO 打包 per-instance renderer uniform(ModelMat 以 mat3x4 仿射优化、Layer、自定义 uniform),ShaderFactory 注入 UBO 声明和 #define 重映射,MeshRenderer 以 material + primitive + macro + frontFace 四维判定合批。per-pass 独立布局使 ShadowCaster 获得更高 instanceMaxCount。iPhone 16 Pro Max 实测 +67% FPS(5000 objects → 21 draw calls)数据有说服力。
前 13+ 轮 review 提出的所有关键问题均已修复或关闭。
问题
[P2] ShaderFactory.ts — 数组 uniform 被静默删除,shader 编译错误指向 GLSL 而非真正原因
当 shader 声明了 renderer_* 数组 uniform(如 uniform vec4 renderer_BoneMatrices[64]),当前逻辑 Logger.error + return "" 直接删除声明。这会导致后续 shader 编译失败(引用了未声明的变量),但错误信息指向 GLSL 编译错误而非真正原因,难以诊断。
建议保留原始 uniform 声明不修改(return match),让非 instancing shader 正常工作;或在 error 消息中补充"该 shader 不支持 GPU Instancing,已 fallback 到非合批渲染"。
不阻塞合并。无新 P0/P1。LGTM,可合入。
(提醒:建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 移除,供用户迁移参考。)
UICanvas pre-batches its children into leaders that carry a self-contained draw range. When two canvases sharing the same atlas push their leaders into the transparent queue, the main-pipeline batcher previously fed those leaders back into `_canBatch`/`_batch` as if they were single sub-elements. That corrupted `subMesh.count` and re-appended already-written indices, dropping draws or overlapping ranges. The batch boundary is the canvas — matches Unity uGUI's behavior — so detect already-batched leaders at the batcher entry and pass them through.
…inject `_scanInstanceUniformsWithMacros` and `injectInstanceUBO`'s `activeMacros` parameter only served the raw GLSL path, which was removed in e08af33 ("refactor(shader): remove raw GLSL shader path"). The 4 preprocessor regexes are only used by this function. All inputs are now ShaderLab preprocessor-evaluated before reaching the injector. Cleanup that should have landed alongside e08af33. Also reported by reviewer as a dormant `#if`/`#elif` branch-stack bug — moot once the function is gone.
`else if (curElement._isBatched) return` in `VertexMergeBatcher.batch` was defensive: it caught the case where main pipeline fed an already- batched leader back into `_batch(null, leader)`. After commit 9606bf3 moved that check to `BatcherManager.batch`'s entry, the leader never reaches this branch. Refresh `RenderElement._isBatched`'s doc to match its current meaning ("batch leader — must not be merged again").
# Conflicts: # packages/shader/compiledShaders/PBR.shaderc # packages/shader/compiledShaders/Pipeline/DepthOnly.shaderc # packages/shader/compiledShaders/Pipeline/ShadowCaster.shaderc
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
所有历史 P0/P1 问题均已在前 15+ 轮 review 中修复或经作者解释关闭(详见历史 review 记录)。
总结
为 MeshRenderer 实现自动 GPU Instancing,通过 UBO 打包 per-instance 数据驱动 instanced draw call。关键技术决策:
- mat3x4 仿射优化(48 vs 64 字节)— 利用 ModelMat 恒为仿射变换的架构保证
- 派生 uniform 通过
#define实时计算(MVMat/MVPMat/NormalMat)— 消除 UBO 冗余存储 - Per-pass 独立布局 — ShadowCaster 等轻量 pass 获得更高 instanceMaxCount
- InstanceLayout 缓存在 ShaderProgram — 编译时计算一次,渲染时直接取用
- Opaque 排序改为 material/primitive 优先 — 合批友好,TBDR 硬件处理 overdraw
iPhone 16 Pro Max 实测 +67% FPS 提升(30→50,5000 objects / 21 draw calls);UI Canvas 9216 按钮 +120% FPS(25→55)。
问题
无新问题。
自上次 LGTM 以来无新 commit,LGTM,可合入。
增补提醒:建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 的移除(破坏性变更),以及 renderer_NormalMat 逐顶点 inverse() 的 @todo(后续可 CPU 端预计算并打包到 UBO)。
`_scanInstanceUniforms` stripped renderer-group uniform declarations before `_buildLayout` decided whether they could fit the std140 layout. For types like `mat3` (which std140's row-padded layout doesn't support in the current `_std140TypeInfoMap`), the declaration was removed but no UBO field or `#define` replaced it — every later reference became an undeclared identifier and the whole pass silently failed to compile. Check the storage type against the std140 map up front. Unsupported types stay declared in the source and emit a clear log; supported types follow the existing strip-and-collect path.
P0#2
|
GuoLei1990
left a comment
There was a problem hiding this comment.
已关闭问题清单
所有历史 P0/P1 问题均已在前 15+ 轮 review 中修复或经作者解释关闭(详见历史 review 记录)。
总结
为 MeshRenderer 实现自动 GPU Instancing,通过 UBO 打包 per-instance 数据驱动 instanced draw call。关键技术决策:
- mat3x4 仿射优化(48 vs 64 字节)— 利用 ModelMat 恒为仿射变换的架构保证
- 派生 uniform 通过
#define实时计算(MVMat/MVPMat/NormalMat)— 消除 UBO 冗余存储 - Per-pass 独立布局 — ShadowCaster 等轻量 pass 获得更高 instanceMaxCount
- InstanceLayout 缓存在 ShaderProgram — 编译时计算一次,渲染时直接取用
- Opaque 排序改为 material/primitive 优先 — 合批友好,TBDR 硬件处理 overdraw
iPhone 16 Pro Max 实测 +67% FPS 提升(30→50,5000 objects / 21 draw calls);UI Canvas 9216 按钮 +120% FPS(25→55)。
问题
无新问题。
自上次 LGTM 以来无新 commit,LGTM,可合入。
增补提醒:建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 的移除(破坏性变更),以及 renderer_NormalMat 逐顶点 inverse() 的 @todo(后续可 CPU 端预计算并打包到 UBO)。
|
更正:上一条 review 基于 PR 的最新 push commit(2026-04-11),当时未感知到 2026-05-13 另一位 reviewer 提出的新 P0 bugs。现补充增量审查。 已关闭问题清单所有历史 P0/P1 问题均已修复或关闭(详见前 15+ 轮记录)。 2026-05-13 新发现的 P0 bugs 均已在本地 commit 中修复,修复质量如下:
架构一致性确认
状态本地 fix commits 未 push 到 PR(PR HEAD 仍为 增补提醒:建议在 release notes 中标注 |
Closes #194
Summary
InstanceBatch将 renderer uniform(ModelMat、Layer 等)打包到共享的 std140 UBO 中ShaderFactory.injectInstanceUBO自动扫描 shader 中的 renderer uniform,替换为 UBO 数组访问 +#define重映射mat3x4存储(仿射优化,48 字节 vs 64 字节),派生 uniform(MVMat/MVPMat/NormalMat)通过#define实时计算MeshRenderer._canBatch/_batch实现合批判定(相同 primitive + material + front-face)ShaderProgram._recordLocation跳过 UBO 成员(location === null),避免无用 ShaderUniform 创建(priority, material.id, primitive.id),移除 front-to-back 距离排序。取舍说明: 主战场移动端 GPU 全是 TBDR(Apple HSR / Mali FPK / Adreno LRZ),硬件层处理 overdraw,软件排序收益可忽略;合批最大化(instancing 命中率 + state-change 减少)才是核心收益。桌面端 early-Z 依赖软件排序,instancing 命中率低且 overdraw 高的极端场景可能小幅反退化,需要时可后续暴露排序策略开关UIBatchSorter,按(depth, material, texture, hierarchy)在 canvas 内部聚类相邻可合批的 sub-element,再以 leader 形式进入主队列;配合subDistancePrioritytiebreaker 修复"hierarchy 顺序被 unstable quicksort 打散"的视觉错乱 bugPerformance
MeshRenderer GPU Instancing
测试场景: 2500 glTF 模型(Avocado) + 2500 自定义 shader 立方体,全部动态旋转 + 缩放 + 颜色动画
iPhone 实测截图(59 FPS / 21 Draw Calls / 5000 objects):
UI Canvas Batching
测试场景 (
examples/ui-batch-massive): 9216 buttons (18432 sub-elements) 在单个 canvas 上,bg + 4 种 atlas iconiPhone 实测截图(60 FPS / 32 Draw Calls / 18432 sub-elements):
为什么提升如此显著: dev/2.0 上 hierarchy 顺序
[bg, icon, bg, icon, ...]让_canBatch全失败,每个 sub-element 各自一个 draw call。新的UIBatchSorter在 canvas 内按(depth, material, texture)聚类,再交给VertexMergeBatcher按 PrimitiveChunk 合并。最终 DrawCall 等于chunks × materials(每 chunk 上限 4096 顶点,约 1024 sprite),跟元素总数解耦。Bug 修复: 同时修复了 SubRenderElement 扁平化为 RenderElement 后引入的 hierarchy 错乱 bug —— 主 transparent 队列在同
(priority, distance)下用 unstable quicksort 打散 canvas 内顺序。修复方案是 canvas 内提前合批 +subDistancePrioritytiebreaker。UI/batchOrdere2e case 守护这个回归。Future Optimization
injectInstanceUBO通过正则扫描 GLSL 文本获取 renderer uniform 信息。如果 ShaderLab 预编译时提供 uniform 元数据(name, type, group),可以消除正则扫描,改为精确拼接,提升代码健壮性和可扩展性OpaqueSortMode(OptimizeBatching/FrontToBack),让用户按场景特征选择compareForTransparent只按(priority, distance)排序、不做 material 聚类,透明 3D 物体合批率几乎为零。UIBatchSorter的 visual-layering 思路(用 depth bucket 离散化距离 + 同 bucket 内按 material 聚类)可渗透到 transparent queue,透明粒子/sprite 重的场景预期可减 30-70% draw callKey Files
RenderPipeline/InstanceBatch.tsRenderPipeline/RenderQueue.tsRenderPipeline/UIBatchSorter.tsRenderPipeline/VertexMergeBatcher.tsRenderPipeline/RenderElement.tssubDistancePriority+_isBatched字段ui/component/UICanvas.ts_batchedRenderElementsui/component/advanced/Image.ts/Text.tsshaderlib/ShaderFactory.tsshader/ShaderPass.tsshader/ShaderProgram.ts_instanceLayout字段,跳过 UBO 成员反射mesh/MeshRenderer.ts_canBatch/_batch合批逻辑shader/ShaderProgramMap.tsTest plan
UIBatchSorter12 单元测试)UI/batchOrdere2e regression case)