feat(core): GPU instancing auto-batching by GuoLei1990 · Pull Request #2957 · galacean/engine

GuoLei1990 · 2026-04-07T11:01:10Z

Closes #194

Summary

为 MeshRenderer 引入自动 GPU Instancing，通过 UBO 打包 per-instance 数据驱动 instanced draw call
InstanceBatch 将 renderer uniform（ModelMat、Layer 等）打包到共享的 std140 UBO 中
ShaderFactory.injectInstanceUBO 自动扫描 shader 中的 renderer uniform，替换为 UBO 数组访问 + #define 重映射
ModelMat 以 mat3x4 存储（仿射优化，48 字节 vs 64 字节），派生 uniform（MVMat/MVPMat/NormalMat）通过 #define 实时计算
InstanceLayout 在 shader 编译时计算并缓存在 ShaderProgram 上，渲染时直接取用，无重复计算
Per-pass 独立布局：不同 pass 按各自需要的 uniform 计算 struct 大小，ShadowCaster 等轻量 pass 可获得更高 instanceMaxCount
MeshRenderer._canBatch/_batch 实现合批判定（相同 primitive + material + front-face）
ShaderProgram._recordLocation 跳过 UBO 成员（location === null），避免无用 ShaderUniform 创建
Opaque 排序策略改为按 (priority, material.id, primitive.id)，移除 front-to-back 距离排序。取舍说明: 主战场移动端 GPU 全是 TBDR（Apple HSR / Mali FPK / Adreno LRZ），硬件层处理 overdraw，软件排序收益可忽略；合批最大化（instancing 命中率 + state-change 减少）才是核心收益。桌面端 early-Z 依赖软件排序，instancing 命中率低且 overdraw 高的极端场景可能小幅反退化，需要时可后续暴露排序策略开关
Canvas 内提前合批 + visual-layering 排序: 新增 UIBatchSorter，按 (depth, material, texture, hierarchy) 在 canvas 内部聚类相邻可合批的 sub-element，再以 leader 形式进入主队列；配合 subDistancePriority tiebreaker 修复"hierarchy 顺序被 unstable quicksort 打散"的视觉错乱 bug

Performance

MeshRenderer GPU Instancing

测试场景: 2500 glTF 模型（Avocado） + 2500 自定义 shader 立方体，全部动态旋转 + 缩放 + 颜色动画

设备	dev/2.0 (无 instancing)	feat/gpu-instancing	提升
iPhone 16 Pro Max	30 FPS	50 FPS	+67%

iPhone 实测截图（59 FPS / 21 Draw Calls / 5000 objects）：

UI Canvas Batching

测试场景 (examples/ui-batch-massive): 9216 buttons (18432 sub-elements) 在单个 canvas 上，bg + 4 种 atlas icon

设备	dev/2.0	feat/gpu-instancing	提升
iPhone	25 FPS	55 FPS	+120%

iPhone 实测截图（60 FPS / 32 Draw Calls / 18432 sub-elements）：

为什么提升如此显著: dev/2.0 上 hierarchy 顺序 [bg, icon, bg, icon, ...] 让 _canBatch 全失败，每个 sub-element 各自一个 draw call。新的 UIBatchSorter 在 canvas 内按 (depth, material, texture) 聚类，再交给 VertexMergeBatcher 按 PrimitiveChunk 合并。最终 DrawCall 等于 chunks × materials（每 chunk 上限 4096 顶点，约 1024 sprite），跟元素总数解耦。

Bug 修复: 同时修复了 SubRenderElement 扁平化为 RenderElement 后引入的 hierarchy 错乱 bug —— 主 transparent 队列在同 (priority, distance) 下用 unstable quicksort 打散 canvas 内顺序。修复方案是 canvas 内提前合批 + subDistancePriority tiebreaker。UI/batchOrder e2e case 守护这个回归。

Future Optimization

ShaderLab 预编译元数据: 当前 injectInstanceUBO 通过正则扫描 GLSL 文本获取 renderer uniform 信息。如果 ShaderLab 预编译时提供 uniform 元数据（name, type, group），可以消除正则扫描，改为精确拼接，提升代码健壮性和可扩展性
Opaque 排序策略可配置: 当前 opaque 排序固定为合批优先（按 material/primitive），舍弃了 front-to-back 距离排序。桌面端 early-Z 依赖软件排序的场景可能受影响。后续可在 Camera 上暴露 OpaqueSortMode（OptimizeBatching / FrontToBack），让用户按场景特征选择
UIBatchSorter 算法泛化到 transparent 3D: 当前 compareForTransparent 只按 (priority, distance) 排序、不做 material 聚类，透明 3D 物体合批率几乎为零。UIBatchSorter 的 visual-layering 思路（用 depth bucket 离散化距离 + 同 bucket 内按 material 聚类）可渗透到 transparent queue，透明粒子/sprite 重的场景预期可减 30-70% draw call

Key Files

文件	职责
`RenderPipeline/InstanceBatch.ts`	UBO buffer 管理、per-instance 数据 upload
`RenderPipeline/RenderQueue.ts`	Instance-aware 渲染循环
`RenderPipeline/UIBatchSorter.ts`	Canvas 内 visual-layering 排序（spatial hash + depth bucket）
`RenderPipeline/VertexMergeBatcher.ts`	顶点合并合批，幂等保护已合 leader 不被主管线 _batch(null, leader) 重置
`RenderPipeline/RenderElement.ts`	新增 `subDistancePriority` + `_isBatched` 字段
`ui/component/UICanvas.ts`	canvas 内提前合批，输出 `_batchedRenderElements`
`ui/component/advanced/Image.ts` / `Text.ts`	在产元素时无条件 set subShader（消除 canvas-internal batch 的 null 窗口）
`shaderlib/ShaderFactory.ts`	Uniform 扫描、std140 布局计算、UBO 代码生成注入
`shader/ShaderPass.ts`	Shader 编译入口，InstanceLayout 存储到 ShaderProgram
`shader/ShaderProgram.ts`	新增 `_instanceLayout` 字段，跳过 UBO 成员反射
`mesh/MeshRenderer.ts`	`_canBatch`/`_batch` 合批逻辑
`shader/ShaderProgramMap.ts`	多级位掩码 ShaderProgram 缓存

Test plan

多个共享 mesh + material 的 MeshRenderer 正确 instanced 渲染
Shadow pass 与 instanced 物体配合正常
非 instancing renderer（SkinnedMeshRenderer、2D sprite）不受影响
无 renderer uniform 时（无 UBO layout）正常回退
性能验证：大量相同物体的 draw call 数量显著减少
iPhone 16 Pro Max 实测 30 FPS → 50 FPS（+67%）
UI canvas 大量 sub-element 合批正确（UIBatchSorter 12 单元测试）
UI hierarchy 顺序保持（UI/batchOrder e2e regression case）
iPhone UI 9216 按钮 25 FPS → 55 FPS（+120%）

…nce data Introduce automatic GPU instancing for MeshRenderer. The system scans renderer-group uniforms across shader passes, builds a unified std140 UBO layout, and packs per-instance data (ModelMat, Layer, etc.) each frame. Key changes: - InstanceDataPacker: packs renderer data into shared UBO for instanced draw - ShaderFactory: unified _scanInstanceUniforms, _buildLayout, _injectInstanceUBO - MeshRenderer._canBatch/_batch: instancing merge logic - ShaderPass/SubShader: instance-aware compilation with macro cache - GLSLIfdefResolver: compile-time #ifdef resolution for instance field scanning - MacroCachePool: pooled ShaderMacroCollection for shader program caching - RenderQueue: instance-aware draw path with UBO binding

coderabbitai · 2026-04-07T11:01:18Z

Note

Reviews paused

It looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the reviews.auto_review.auto_pause_after_reviewed_commits setting.

Use the following commands to manage reviews:

@coderabbitai resume to resume automatic reviews.
@coderabbitai review to trigger a single review.

Use the checkboxes below for quick actions:

▶️ Resume reviews
🔍 Trigger review

Walkthrough

Adds WebGL2 GPU instancing: instance UBO management, instance-aware shader compilation and caching, batching API/signature changes, render-queue instanced draw paths, device/GL helpers, examples and E2E tests, and supporting shader/uniform enhancements.

Changes

Cohort / File(s)	Summary
GPU Instancing Core `packages/core/src/RenderPipeline/InstanceBatch.ts`, `packages/core/src/shader/ShaderBlockProperty.ts`, `packages/core/src/shader/enums/ConstantBufferBindingPoint.ts`	New InstanceBatch class managing CPU packing and dynamic UBO allocation; ShaderBlockProperty for uniform block ids; ConstantBuffer binding enum for renderer-instance UBO.
Batching Parameter Refactoring `packages/core/src/RenderPipeline/BatchUtils.ts`, `packages/core/src/2d/sprite/SpriteRenderer.ts`, `packages/core/src/2d/sprite/SpriteMask.ts`, `packages/core/src/2d/text/TextRenderer.ts`, `packages/core/src/Renderer.ts`, `packages/ui/src/component/UIRenderer.ts`	Renamed batching parameters from `(elementA, elementB)` → `(preSubElement, subElement)` and adjusted nullability/optional semantics; updated BatchUtils to consume new parameter roles.
SubRenderElement & RenderPipeline `packages/core/src/RenderPipeline/SubRenderElement.ts`, `packages/core/src/RenderPipeline/BasicRenderPipeline.ts`, `packages/core/src/RenderPipeline/RenderQueue.ts`, `packages/core/src/RenderPipeline/BatcherManager.ts`	Replaced `shaderPasses` with `subShader`; added `instancedRenderers` on SubRenderElement; BasicRenderPipeline and RenderQueue updated to use SubShader and to perform instanced draw paths; BatcherManager exposes InstanceBatch and calls updated renderer batching signature.
Renderer & Mesh Changes `packages/core/src/Renderer.ts`, `packages/core/src/mesh/MeshRenderer.ts`, `packages/core/src/mesh/SkinnedMeshRenderer.ts`	Renderer shader-property fields visibility adjusted; batching hooks updated to new signatures; MeshRenderer supports collecting instanced renderers into SubRenderElement; SkinnedMeshRenderer disables batching.
Shader Program & Compilation `packages/core/src/shader/ShaderPass.ts`, `packages/core/src/shader/ShaderProgram.ts`, `packages/core/src/shader/ShaderProgramMap.ts`, `packages/core/src/shader/ShaderBlockProperty.ts`, `packages/core/src/shaderlib/ShaderFactory.ts`, `packages/core/src/graphic/TransformFeedbackShader.ts`	Switched from ShaderProgramPool → ShaderProgramMap cache; added _compileShaderProgram path that computes instance layouts and injects instance UBOs; ShaderProgram now records uniform block ids and supports bindUniformBlocks; ShaderFactory extended to generate instance-aware GLSL and return instance layout; related compile/cache call sites updated.
Shader Uniforms & Uploads `packages/core/src/shader/ShaderUniform.ts`	Added upload methods for Mat2/Mat3 and rectangular matrix types (WebGL2 variants).
Engine & Caching Fields `packages/core/src/Engine.ts`, `packages/core/src/shader/ShaderProgramMap.ts`	Replaced internal `_shaderProgramPools` with `_shaderProgramMaps` and adapted lazy accessors; ShaderProgramMap renamed/refactored with destroy() lifecycle.
RHI / WebGL2 Support `packages/rhi-webgl/src/GLBuffer.ts`, `packages/rhi-webgl/src/WebGLGraphicDevice.ts`, `packages/core/src/graphic/enums/BufferBindFlag.ts`	Added `ConstantBuffer` bind flag and mapped it to UNIFORM_BUFFER; WebGL device gains bindUniformBufferBase, bindUniformBlock, and getMaxUniformBlockSize helpers; GLBuffer target selection updated.
2D / UI Render Element Updates `packages/core/src/2d/sprite/SpriteMask.ts`, `packages/ui/src/component/advanced/Image.ts`, `packages/ui/src/component/advanced/Text.ts`	Overlay/2D sub-render elements now assign `subShader` instead of `shaderPasses`; small adjustments to subRenderElement setup to match SubShader change.
Examples, E2E & Tests `examples/src/gpu-instancing-auto-batch.ts`, `examples/src/gpu-instancing-custom-data.ts`, `e2e/case/gpu-instancing-auto-batch.ts`, `e2e/case/gpu-instancing-custom-data.ts`, `e2e/config.ts`, `examples/package.json`, `tests/src/shader-lab/*`	Added examples and E2E tests for instancing; registered new E2E config entries; examples add custom shaders and animated instanced entities; tests updated to call `_compileShaderProgram` and benchmark helpers adjusted.
Tooling / Packages `package.json`, `examples/package.json`	Dev tooling upgrades (ESLint, TypeScript ESLint parser/plugin, lint-staged) and added `@galacean/engine-toolkit-stats` dependency in examples.
Other Render Pipeline Adjustments `packages/core/src/RenderPipeline/BasicRenderPipeline.ts`, `packages/core/src/RenderPipeline/BatchUtils.ts`, `packages/core/src/RenderPipeline/InstanceBatch.ts`	pushRenderElement now consumes SubShader; BatchUtils.batchFor2D signature adjusted; InstanceBatch integrated into BatcherManager and RenderQueue flows for chunked uploads and instanced draw submission.

Sequence Diagram(s)

sequenceDiagram
    participant RenderQueue as RenderQueue
    participant BatcherManager as BatcherManager
    participant InstanceBatch as InstanceBatch
    participant GPUBuffer as GPU Constant Buffer
    participant ShaderProgram as ShaderProgram

    RenderQueue->>BatcherManager: request instanceBatch (lazy)
    BatcherManager->>InstanceBatch: setLayout(layout)
    InstanceBatch->>GPUBuffer: create/realloc UBO (if needed)
    loop per instanced chunk
        RenderQueue->>InstanceBatch: upload(renderers[], start, count)
        InstanceBatch->>InstanceBatch: pack per-instance fields into CPU buffer
        InstanceBatch->>GPUBuffer: setData(range, Discard)
    end
    RenderQueue->>ShaderProgram: bindUniformBlocks(bindingMap)
    ShaderProgram->>GPUBuffer: uniformBlockBinding(bindingPoint)
    RenderQueue->>RenderQueue: issue drawPrimitive with instanceCount

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Poem

🐰 I packed the fields in quiet rows,
UBOs hum where instance data flows,
Shaders peep and vary hue,
Batches leap — five thousand, who knew?
A tiny rabbit hops — render goes!

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name	Status	Explanation	Resolution
Docstring Coverage	⚠️ Warning	Docstring coverage is 25.00% which is insufficient. The required threshold is 80.00%.	Write docstrings for the functions missing them to satisfy the coverage threshold.

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title "feat(core): GPU instancing auto-batching" directly and clearly summarizes the primary feature added in this changeset: GPU instancing with automatic batching for improved rendering performance.
Description Check	✅ Passed	Check skipped - CodeRabbit’s high-level summary is enabled.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

✨ Finishing Touches

🧪 Generate unit tests (beta)

Create PR with unit tests
Commit unit tests in branch feat/gpu-instancing

Tip

💬 Introducing Slack Agent: The best way for teams to turn conversations into code.

Slack Agent is built on CodeRabbit's deep understanding of your code, so your team can collaborate across the entire SDLC without losing context.

Generate code and open pull requests
Plan features and break down work
Investigate incidents and troubleshoot customer tickets together
Automate recurring tasks and respond to alerts with triggers
Summarize progress and report instantly

Built for teams:

Shared memory across your entire org—no repeating context
Per-thread sandboxes to safely plan and execute work
Governance built-in—scoped access, auditability, and budget controls

One agent for your entire SDLC. Right inside Slack.

👉 Get started

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

Move _gpuInstanceMacro after _macroMap declaration to fix static initialization order. Also apply prettier formatting fixes.

codecov · 2026-04-07T11:23:06Z

Codecov Report

❌ Patch coverage is 39.71743% with 1152 lines in your changes missing coverage. Please review.
✅ Project coverage is 77.38%. Comparing base (1bc2b10) to head (04ac8b3).
⚠️ Report is 5 commits behind head on dev/2.0.

Files with missing lines	Patch %	Lines
packages/core/src/shader/ShaderFactory.ts	40.16%	213 Missing ⚠️
examples/src/ui-batch-massive.ts	0.00%	176 Missing and 1 partial ⚠️
examples/src/gpu-instancing-auto-batch.ts	0.00%	119 Missing and 1 partial ⚠️
examples/src/gpu-instancing-custom-data.ts	0.00%	84 Missing and 1 partial ⚠️
e2e/case/ui-batch-order.ts	0.00%	61 Missing and 1 partial ⚠️
...ages/core/src/RenderPipeline/VertexMergeBatcher.ts	22.85%	54 Missing ⚠️
packages/core/src/RenderPipeline/InstanceBuffer.ts	46.51%	46 Missing ⚠️
e2e/case/gpu-instancing-custom-data.ts	0.00%	44 Missing and 1 partial ⚠️
e2e/case/gpu-instancing-auto-batch.ts	0.00%	42 Missing and 1 partial ⚠️
packages/core/src/RenderPipeline/RenderQueue.ts	52.50%	38 Missing ⚠️
... and 26 more

Additional details and impacted files

@@             Coverage Diff             @@
##           dev/2.0    #2957      +/-   ##
===========================================
- Coverage    78.25%   77.38%   -0.87%     
===========================================
  Files          900      909       +9     
  Lines        99234   100649    +1415     
  Branches     10172    10261      +89     
===========================================
+ Hits         77657    77891     +234     
- Misses       21406    22580    +1174     
- Partials       171      178       +7

Flag	Coverage Δ
unittests	`77.38% <39.71%> (-0.87%)`	⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:

❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

…es layout The instance UBO is injected at compile time by ShaderFactory, so the original shader uniform declarations don't need modification.

Bring back normalMat extraction, transform_declare reordering, trailing whitespace fixes, and VertexPBR indent fix. Only the renderer_Layer relocation stays reverted.

… 3×vec4 - Fix SubRenderElement.set() not resetting instanceDataPacker, causing stale packer references from previous frames to break all batching - Use whitelist + _group fallback for identifying renderer uniforms in _scanInstanceUniforms (fixes _group===undefined for ModelMat) - Store ModelMat as 3×vec4 (affine rows) instead of mat4 in UBO, saving 16 bytes per instance (structSize 80→64, +25% instances/batch) - Add camera_VPMat to transform_declare.glsl for derived MVP define - Extract struct definition outside uniform block (GLSL ES 3.00 compat) - Fix _insertUBOBlock to only scan initial #define preamble - Pass instanceID to fragment shader via flat varying

…lity Wrap derived NormalMat define with mat4() so instancing and non-instancing paths both produce mat4, avoiding shader compilation errors. Add custom instance data example to verify per-renderer uniform batching.

Rename elementA/elementB to preSubElement/subElement across all renderer subclasses and BatchUtils. Change _batch signature so preSubElement is nullable (null = batch head, no previous element to merge with), and subElement is always required.

- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker - Rename getOrCreate() to get() in InstanceDataPackerPool - Clear compileMacros in InstanceDataPacker.reset()

Move macro merging, layout computation, and UBO packing from batch phase to render phase. _batch now only collects renderers into a pre-allocated list. RenderQueue.render handles macro union, layout lookup, and splits by maxInstanceCount for sub-batch rendering. - SubRenderElement: instanceDataPacker → instancedRenderers (pooled array) - MeshRenderer._canBatch: remove maxInstanceCount check - MeshRenderer._batch: only push renderers, zero allocation - InstanceDataPacker: remove compileMacros/addRenderer/instanceCount, add packAndUpload(renderers, start, count) - InstanceDataPackerPool: remove uploadBuffer, simplify reset - BatcherManager: remove instancing uploadBuffer call

Packer is now stateless (setLayout + packAndUpload + draw), so only one instance is needed. Discard upload ensures no GPU stall when reusing the same buffer across shadow and main passes. - Delete InstanceDataPackerPool.ts - BatcherManager: instanceDataPackerPool → instanceDataPacker - Remove resetInstanceDataPackerPool lifecycle - Saves GPU memory by using single buffer instead of pool

Rename class, file, and all references to better reflect its role as an instance batch manager rather than a generic data packer.

It's a macro-keyed map, not a pool (no borrow/return semantics).

Align variable and method names with MacroMap rename — these are maps, not pools.

Callers always pass a valid buffer, no need for null guard.

- nativeBuffer → buffer, _uboData → _data - Replace separate instanceFields/_structSize with single _layout ref - setLayout() now takes InstanceLayout directly

- Remove unnecessary null guard on _layout - Inline uploadElements variable - Destructure floatView/intView from this - Improve worldMatrix comment

- Remove unnecessary component→renderer alias, use component directly - Hoist bindUniformBlocks/bindUniformBufferBase out of sub-batch loop - Move primitive.instanceCount=0 after loop (only need to reset once) - Remove redundant let layout = undefined

- Upgrade lint-staged from v10.5 to v16.4.0 - Fix glob from *.{ts} to **/*.ts to match subdirectory files - Remove redundant git add from tasks

- eslint 8.44 → 8.57 - @typescript-eslint/parser and eslint-plugin 6.x → 8.x - Eliminates "unsupported TypeScript version" warning

Dispose means the object is permanently released, so null the array to free memory instead of just clearing length.

Eliminate redundant SubShader._getInstanceLayout / ShaderPass._scanInstanceFields by reusing the shader compilation chain — _injectInstanceUBO now returns InstanceLayout directly, stored on ShaderProgram._instanceLayout.

GuoLei1990

Re-review (Incremental) — GPU Instancing + UIBatchSorter + e2e Cases

已关闭问题清单

前 15+ 轮 review 提出的所有关键问题均已修复或经作者解释关闭（详见 2026-04-15 LGTM）。2026-05-08 增量 review 新发现两个问题，本轮验证其是否已修复。

问题

[P0] `RenderElement.dispose()` — `instancedRenderers = null` 导致对象池复用时 NPE（仍未修复）

diff 中（RenderElement.ts 新增的 dispose() 方法）：

dispose(): void {
  this.component = null;
  this.material = null;
  // ...
  this.instancedRenderers = null;  // ← 仍然是 null 赋值
  // ...
}

而 set() 中初始化为：

this.instancedRenderers.length = 0;  // ← 对 null 调用 .length = 0 → TypeError

ObjectPool 复用路径：dispose() 归还 → 下一帧 get() 取出 → 调用 set() → NPE 必现。

需改为：

// dispose() 中：
this.instancedRenderers.length = 0;
// 或者在 set() 中：
this.instancedRenderers = this.instancedRenderers ?? [];
this.instancedRenderers.length = 0;

[P1] `_scanInstanceUniformsWithMacros` 不处理 `#if`/`#elif` 导致栈错位（仍未修复）

注释写了 #if with expressions is treated as always-active，但实现中只有四个正则：_ifdefRegex/_ifndefRegex/_elseRegex/_endifRegex，没有处理 #if/#elif。

cptbtptpbcptdtptp（2026-05-07）分析准确：遇到 #if 不 push，其配对的 #endif 仍然 pop，branchStack 错位，导致 #endif 之后本应活跃的 renderer_ uniform 被跳过——不进 UBO 也不生成 #define 重映射，但 GLSL 端仍视其为普通 renderer uniform，instanced 模式下该 block 上传被跳过，渲染静默错误。

最小修复：

m = line.match(/^[ \t]*#if\b/);
if (m) { branchStack.push(branchStack[branchStack.length - 1]); continue; }  // always-active as documented
m = line.match(/^[ \t]*#elif\b/);
if (m) { branchStack.pop(); branchStack.push(branchStack[branchStack.length - 1]); continue; }

e2e Case 质量

gpu-instancing-auto-batch.ts — 有可靠性隐患：依赖外部 CDN URL（gw.alipayobjects.com），网络抖动会导致 flaky。其他 e2e case（gpu-instancing-custom-data、ui-batch-order）全部使用本地资源，diffPercentage: 0，更稳定。建议将 auto-batch case 的 Duck GLB 换成本地 PrimitiveMesh，彻底消除外部依赖。

ui-batch-order.ts — 设计正确：固色纹理、本地资源、diffPercentage: 0、注释清晰说明 regression 场景。质量高。

gpu-instancing-custom-data.ts — ShaderLab 内联 + PrimitiveMesh，全确定性，质量高。

UIBatchSorter 单元测试 — 覆盖核心场景，从公开 API 入口断言输出顺序，符合链路测试原则，质量好。

结论

两个 P0/P1 bug 在上次 review（2026-05-08）提出后，本轮验证仍未修复，请修复后合入。

cptbtptpbcptdtptp · 2026-05-12T11:38:34Z

+  /** @internal */
+  @ignoreClone
+  _batchedRenderElements: any[] = [];
  /** @internal */


_renderElements 和 _batchedRenderElements 在 canvas 不可见时（_canRender 返回 false、onDisable 等）不会被清空，但它们持有的 RenderElement 来自对象池，可能已被其他 renderer 复用。这些悬挂引用会阻止 GC 回收。

建议在 onDisable 里清空：

override onDisable(): void { super.onDisable(); this._renderElements.length = 0; this._batchedRenderElements.length = 0; }

GuoLei1990

已关闭问题清单

前 20+ 轮提出的所有关键问题均已修复或经作者解释关闭：

问题	状态
`_canBatch` 缺少 renderer-level state 判定	✅ 已修复（macroCollection.isEqual + frontFace）
`renderer_LocalMat`/`renderer_MVInvMat` 未删除	✅ 已修复（完全移除）
uniformBlockBindingMap 归属	✅ 已修复（移到 ShaderFactory）
`renderer_NormalMat` per-vertex inverse	✅ 已关闭（GPU ALU 换 UBO 空间，可接受）
Opaque 距离排序移除	✅ 已关闭（移动端 TBDR，合批优先）

总结

GPU Instancing 自动合批架构已收敛。UIBatchSorter 同步修复了 canvas hierarchy 顺序被 unstable quicksort 打散的视觉 bug。整体方向正确，iPhone +67% / UI +120% 数据有说服力。

问题（需在合并前修复）

[P0] RenderElement.ts — dispose() 将 instancedRenderers 设为 null，但 set() 调用 this.instancedRenderers.length = 0，ObjectPool 复用路径必然 NPE

// dispose():
this.instancedRenderers = null;  // ← 设为 null

// set():
this.instancedRenderers.length = 0;  // ← pool 复用时 dispose→get→set，NPE

修复：dispose() 中改为 this.instancedRenderers.length = 0（清空数组但保留引用）。

[P2] e2e/case/gpu-instancing-auto-batch.ts 依赖外部 CDN

使用 gw.alipayobjects.com Duck GLB，CDN 网络抖动会导致 e2e flaky。建议改用 PrimitiveMesh.createCuboid/createSphere（参考 gpu-instancing-custom-data.ts 的正确做法）。

GuoLei1990

已关闭问题清单

前 15+ 轮 review 提出的所有关键问题均已修复或关闭（详见上一轮 review）。

总结

基于 std140 UBO 的自动 GPU Instancing 实现完整、架构收敛。ShaderFactory.injectInstanceUBO 扫描 renderer-group uniform → 计算 std140 布局 → 注入 UBO block + #define 重映射，对现有 shader 零侵入；mat3x4 仿射优化、per-pass 独立布局、InstanceLayout 缓存在 ShaderProgram、派生 uniform 通过 #define 实时计算——这些设计决策均正确。UIBatchSorter canvas 内提前合批修复了 hierarchy 顺序被 unstable quicksort 打散的视觉错乱 bug。iPhone 16 PM 实测 MeshRenderer +67%、UI +120%，数据有说服力。

自上次 LGTM（2026-04-15）以来，本 PR 无新 commit。

LGTM。无新问题，可以合入。

合并后建议在 release notes 标注两个破坏性变更：

renderer_LocalMat 和 renderer_MVInvMat uniform 已移除
renderer_NormalMat 当前为 GPU 端 per-vertex inverse()，后续可考虑 CPU 预计算打入 UBO（@todo 已标注）

GuoLei1990

已关闭问题清单

前 14+ 轮 review 的问题均已修复或经作者合理解释关闭，包括：

WebGL2 守卫：上游 _canBatch 约束已保证，不需要重复防御 ✅
renderer_LocalMat/renderer_MVInvMat 删除 ✅
_canBatch 缺 renderer 级宏比较：已加 shaderData._macroCollection.isEqual() ✅
SubRenderElement dispose NPE：pool GC 后清空，不复用旧对象 ✅
ShaderProgramMap engine 必选参数 ✅
bool/uint 类型补全 _std140TypeInfoMap ✅
maxUniformBlockSize 缓存、uniformBlockBinding 移至 link 阶段 ✅

总结

GPU Instancing auto-batching 架构扎实：UBO std140 打包 + shader 注入 + mat3x4 仿射优化。iPhone 实测 +67% 的提升验证了价值。UIBatchSorter canvas 内 visual-layering 排序设计合理，同时修复了 hierarchy 顺序被 unstable quicksort 打散的 bug。整体是高质量的架构改进。

问题

[P0] RenderElement.ts:dispose() — this.instancedRenderers = null 与 set() 中的 this.instancedRenderers.length = 0 不匹配。

ObjectPool 复用路径：garbageCollection() → dispose() → 归还到池 → get() 取出 → set() 初始化。dispose() 后 instancedRenderers = null，set() 执行 this.instancedRenderers.length = 0 时 null.length 抛 TypeError，必现崩溃。

 dispose(): void {
-    this.instancedRenderers = null;
+    this.instancedRenderers.length = 0;
 }

[P1] ShaderFactory.ts — _scanInstanceUniformsWithMacros 不处理 #if/#elif

当前实现只识别 #ifdef/#ifndef/#else/#endif。GLSL 代码中 #if defined(X) && !defined(Y) 或 #elif 形式的条件分支不会被识别，导致 branchStack 不 push/pop，后续所有 #endif pop 出错，#endif 之后的 renderer uniform 被错误排除在 fieldMap 之外，instanced 模式下 uniform 未被重映射，GPU 读到错误数据，渲染静默错误。

packages/core/src/shaderlib/ShaderFactory.ts 中 _ifdefRegex、_ifndefRegex 之后需要补：

private static readonly _ifRegex = /^[ \t]*#if\b/;
private static readonly _elifRegex = /^[ \t]*#elif\b/;

并在循环中处理：

m = line.match(ShaderFactory._ifRegex);
if (m) { branchStack.push(false); continue; } // treat as inactive branch for safety

m = line.match(ShaderFactory._elifRegex);
if (m) { branchStack.pop(); branchStack.push(false); continue; }

注意：#if 带表达式无法静态求值（需要完整 preprocessor），但至少保持 branchStack 不乱。当前代码注释已承认"supported; #if with expressions is treated as always-active"，但代码里根本没有处理 #if，导致 branchStack 损坏而不是"always-active"。

[P2] UIBatchSorter.ts — 静态成员在多画布同帧调用时存在状态泄漏

_entries、_grid、_dirtyCells、_localBoundsPool、_worldToLocal 均为静态成员，若同一帧内多个 UICanvas 各自调用 sort()（ScreenSpaceCamera + WorldSpace 并存），后一次调用的 _worldToLocal 会覆盖前一次，_entries 跨调用累积。请确认是否有多画布同帧场景。如有，改为实例成员或在 sort() 入口强制重置所有静态状态。

结论

P0 NPE 和 P1 branchStack 损坏是可复现 bug，需要在合入前修复。P2 UIBatchSorter 视多画布场景是否存在决定是否需要处理。

cptbtptpbcptdtptp · 2026-05-12T12:53:41Z

+        if (propertyId === modelMatId) {
+          // Instancing skips _updateTransformShaderData, so worldMatrix is not in propertyValueMap
+          // Must read from transform getter to trigger lazy update
+          field.pack(floatView, fieldOffset, renderer.entity.transform.worldMatrix);


renderer.entity.transform.worldMatrix 和 Renderer._updateTransformShaderData 里的 this._transformEntity.transform.worldMatrix 不一致。当前 MeshRenderer 两者相同所以安全，但如果未来有子类修改了 _transformEntity（如 SkinnedMeshRenderer 设为 rootBone）又支持 instancing，会静默用错 transform。

建议改为：

field.pack(floatView, fieldOffset, renderer._transformEntity.transform.worldMatrix);

cptbtptpbcptdtptp · 2026-05-12T13:00:31Z

+    vec4: { size: 16, align: 16 },
+    ivec4: { size: 16, align: 16 },
+    mat4: { size: 64, align: 16 },
+    mat3x4: { size: 48, align: 16 }


_std140TypeInfoMap 缺少 bool / bvec2~4 / mat2 / mat3 / uvec2~4，当 shader 中声明了这些类型的 renderer uniform 时会导致 shader 编译失败。

复现路径：

_scanInstanceUniforms 匹配到 uniform mat3 renderer_Foo → 写入 fieldMap → 删除 uniform 声明（返回空字符串）

_buildLayout → addField → std140Map["mat3"] 为 undefined → 静默 return，字段不进 instanceFields

_buildFieldDefines 不生成对应的 #define renderer_Foo ...

GLSL 编译时 renderer_Foo 既没有 uniform 声明也没有 #define → 编译报错 undefined variable

仅在 instancing 路径触发（2+ 相同 renderer 合批），单个 MeshRenderer 不走 instancing 则不会暴露。

建议两种方案之一：

补全 _std140TypeInfoMap 中缺失的类型

在 _scanInstanceUniforms 中，类型不在 type map 里时保留原始 uniform 声明（return match）并 log warning，而不是删除

…ity's InstanceBuffer read renderer.entity.transform.worldMatrix, while Renderer._updateTransformShaderData uses _transformEntity.transform. They diverge whenever a subclass remaps _transformEntity (e.g. SkinnedMeshRenderer points it at the root bone); switch InstanceBuffer to _transformEntity to align with the shader-data path.

…sable Replace `any[]` with `RenderElement[]` on `_renderElements` / `_batchedRenderElements`, and clear both in `_onDisable` so the pooled elements they hold don't surface stale references when other renderers reuse the same pool slots while the canvas is disabled.

GuoLei1990

增量审查（2026-05-12，基于最新 commits）

已关闭问题清单

前 15+ 轮 review 提出的所有问题均已修复或经验证关闭，包括：

_canBatch 缺少 macroCollection.isEqual + frontFace 判定 ✅
renderer_LocalMat/renderer_MVInvMat 删除 ✅
std140 对齐、mat3x4 仿射优化 ✅
uniformBlockBinding 移至 link 阶段 ✅
InstanceLayout 失效策略（存在 ShaderProgram） ✅
_buildMissingCameraDecls DCE camera uniform 补全 ✅
renderer_ModelMat force-inject ✅
UIBatchSorter 静态成员多画布污染 — 逐字段验证安全，sort() 入口各字段均完整重置 ✅

问题

[P0] RenderElement.ts — dispose() 设 instancedRenderers = null，set() 调用 .length = 0，ObjectPool 复用路径 NPE 必现

commit 99ec5d138a（"refactor: null out instancedRenderers in dispose"）引入了此问题：

// dispose():
this.instancedRenderers = null;  // ← 归还到 pool

// set()（下次从 pool 取出时调用）:
this.instancedRenderers.length = 0;  // ← TypeError: Cannot set properties of null

ObjectPool 复用路径：garbageCollection() → dispose() 归还 → get() 取出 → set() 初始化 → NPE 必现。

注意：这与之前 SubRenderElement 关闭的理由不同——SubRenderElement pool 在 garbageCollection() 后立即清空整个数组，下次走 new，disposed 对象永不复用。RenderElement 所在 pool 是否相同逻辑，请作者确认：

若复用：改为 this.instancedRenderers.length = 0
若不复用：在注释中标注，防止后人误解

[P1] ShaderFactory.ts — _scanInstanceUniformsWithMacros 不处理 #if/#elif，branchStack 错位导致 uniform 静默丢失

当前只有四个正则：_ifdefRegex/_ifndefRegex/_elseRegex/_endifRegex。遇到 #if expr 或 #elif 时不压栈，其配对 #endif 仍然弹栈，导致 branchStack 错位——#endif 之后本应活跃的 renderer_ uniform 被错误跳过，instanced 模式下静默渲染错误。

代码注释写了 // #if with expressions is treated as always-active，但实现中根本没有处理 #if，实际效果是栈错位而非 always-active。

最小修复：

m = line.match(/^[ \t]*#if\b/);
if (m) { branchStack.push(branchStack[branchStack.length - 1]); continue; }

m = line.match(/^[ \t]*#elif\b/);
if (m) { branchStack.pop(); branchStack.push(branchStack[branchStack.length - 1]); continue; }

若此函数确为死代码（injectInstanceUBO 调用时不传 activeMacros），请标注 @deprecated 或删除，避免后人启用后遇到静默 bug。

[P2] e2e/case/gpu-instancing-auto-batch.ts — 依赖外部 CDN，e2e flaky 隐患（非阻塞）

依赖 gw.alipayobjects.com Duck GLB，CDN 抖动会导致 CI flaky。其他 e2e case 全部使用本地 PrimitiveMesh + diffPercentage: 0。建议改用 PrimitiveMesh.createCuboid/createSphere 消除外部依赖。

结论

P0 和 P1 需在合入前明确处理（修复或以注释说明为何无运行时影响）。P2 建议合入前一并处理。

ShaderData.setInt documents bool support and setVector* documents bvec support, but the instance UBO layout table didn't list them — declaring a `bool` or `bvec` renderer uniform would silently drop the declaration without producing a #define, causing the shader to fail with an undeclared identifier. Add std140 size/align entries, reuse existing packScalar/packVec pack functions, and recognize the `b` prefix for the intView dispatch.

GuoLei1990

谢谢 review。在 00b4462 里部分采纳，按"ShaderData 暴露的接口集合"做边界：

已补：bool / bvec2-4

ShaderData.setInt JSDoc 明文支持 bool
ShaderData.setVector* JSDoc 明文支持 bvec*
既然能上传，应该让 instance UBO 走通；新加 std140 size/align、复用现有 packScalar / packVec*、useIntView 加 b 前缀识别

未补：mat2 / mat3 / mat2x3/3x2/2x4/4x2/3x4/4x3

ShaderData 没有对应 setter（setMatrix 只接 Matrix 即 mat4），用户结构性无法上传这些类型的值
即使补 _std140TypeInfoMap entry，pack 函数也拿不到合法的 JS 值来源 — 这部分会变成死代码
现状下用户写 mat3 renderer_Foo，scan 删除声明 → 编译时 undeclared identifier —— 行为正确（fail-fast），只是错误信号停留在 GLSL compile log

未补：uvec2-4

setIntArray JSDoc 只列 bool/int/bvec*/ivec*,没有 uvec
uint 现在 std140Map 里有 entry 是历史遗留(可能位级走 setInt 兼容),但接口未公开承诺,先不扩到 uvec

如果后续 ShaderData 加 setUInt/setBoolean 等显式 setter,再同步扩 _std140TypeInfoMap。当前以 ShaderData 接口为权威边界,保持二者对齐。

第二个建议(保留原 uniform + warn)经审视没有采纳:instancing 路径 RenderQueue 跳过 program.uploadAll(rendererUniformBlock, ...),plain uniform 永远是 0,等于静默错渲染,比 undeclared identifier 更难调试。

GuoLei1990

已关闭问题清单

所有 15+ 轮历史问题均已关闭（详见前轮 review）。

本轮增量审查（5 个 refactor commit）

fd747eb3 — store InstanceLayout on ShaderProgram during compilation：将 InstanceLayout 从"渲染时扫描"移为"编译时一次性产出并缓存在 ShaderProgram 上"，消除了冗余的两次扫描。_canBatch 不依赖 InstanceLayout（只判断 macroCollection/frontFace），此改动不影响合批判定逻辑。SubShader._layoutCache 删除，无外部调用方，干净。

5d02bbe1 — inline _compilePlatformSource：该方法只透传参数，无独立逻辑，内联是正确操作，调用栈减少一层。

910b8ccc — inline _buildMacroMap：_scanInstanceFields 删除后只剩一个调用方，内联合理，减少 23 行。

891d2c25 — rename isGpuInstance → isGPUInstance：对齐引擎缩写惯例，grep 确认无旧名残留。

99ec5d13 — null out instancedRenderers in dispose：ClearableObjectPool.garbageCollection() 在 dispose 后将底层数组置空，dispose 后对象不会被复用，赋 null 是正确的永久释放语义，不存在 NPE 风险。

总结

5 个 refactor commit 均为正确的代码整理，无新问题引入。自上次 LGTM 以来无行为变更。

LGTM，可合入。

cptbtptpbcptdtptp · 2026-05-13T08:08:04Z

读完 PR 后写了一组 vitest 验证关键正确性问题，分享一下确认结果（5 个 test 全 pass，断言的都是当前 BUG 行为）。

🐛 P0#1：自定义 shader 只声明 derived built-in 时合批后整组消失（user-facing）

触发条件：用户自定义 shader 只声明 derived built-in（如 uniform mat4 renderer_MVPMat;），不声明 renderer_ModelMat，也没有任何自定义 renderer-group uniform。

触发链路：

单实例渲染时 Renderer._updateTransformShaderData 把 renderer_MVPMat 上传 → 画面正常 ✅
加到第 2 个 MeshRenderer（同 mesh + 同 material + 同 shader）→ 满足 MeshRenderer._canBatch
RenderQueue 启用 RENDERER_GPU_INSTANCE 宏重编译 → injectInstanceUBO：
- _scanInstanceUniforms 见 renderer_MVPMat isDerived=true → return "" 剥离声明
- fieldMap 无任何 renderer uniform → hasField=false 早返回
- 返回的 source 中 renderer_MVPMat 声明被剥离但 gl_Position = renderer_MVPMat * ... 引用保留
shader compile 失败 → program.isValid=false
RenderQueue.render 命中 if (!program.isValid) continue; → 该 pass 的整组 batched renderer 静默从画面消失

用户视角：写了一个单实例下能跑的 shader，加第 2 个 entity 时整组消失，无任何 UI 报错（只有 console.warn）。极隐蔽。

根因：packages/core/src/shader/ShaderFactory.ts:251-258

const isDerived = builtinUniforms[name];
if (isDerived === undefined && ShaderProperty._getShaderPropertyGroup(name) !== ShaderDataGroup.Renderer)
  return match;
if (isDerived) return "";   // ← 剥离声明，但没记录"已剥离过 derived"

后续 hasField 检查只看 fieldMap，不知道有 derived 被剥离过。

修复方向：要么记一个 derivedStripped flag，命中时也走 force-inject ModelMat 的完整路径；要么不在 hasField=false 时返回 stripped source。

🐛 P0#4：多 canvas 共享 atlas 时跨 canvas 合批数据错乱（user-facing）

触发条件：两个或多个 UICanvas 用同一 sprite atlas / material，同 priority + 同 distanceForSort（典型如多个 overlay HUD）。

触发链路：

每个 canvas 内部经 UIBatchSorter + BatcherManager.batch 合批 → leader _isBatched=true，subMesh.count 已扩展到合批总量
canvas leaders push 到主 transparent queue
sort 后：subDistancePriority = i_in_canvas 让两个 canvas 的 leaders 交叉（A[0], B[0], A[1], B[1]…）
主管线 BatcherManager.batch 对相邻 A[0], B[0]：_canBatch 同 chunk / mat / tex → true
VertexMergeBatcher.batch(A[0], B[0]) 假设 cur 是「未合批的单 sub-element」：
- length = cur.subChunk.indices.length = 6（B 自己 quad 的 local indices，不是已合批的 12）
- pre.subMesh.count += 6（应该是 +12 才能覆盖完整 B）
- append cur's local indices 到 chunk 末尾（重复已写入的数据）

结果：

canvasA range 吞掉 canvasB 的第一个 quad（重叠绘制）
canvasB 的第二/三... sub-element 完全消失
chunk 尾部追加一段永远不被引用的 dup indices

根因：packages/core/src/RenderPipeline/VertexMergeBatcher.ts:42-56，_isBatched 保护只 cover preElement === null 的 case，没 cover leader-to-leader 扩展。

修复方向 A（保正确性，推荐先做 hotfix）：

// VertexMergeBatcher.canBatchSprite
if (preElement._isBatched || curElement._isBatched) return false;

修复方向 B（真正支持跨 canvas 合批，作为 follow-up feature）：

_batch 区分 cur 是已合批 leader（用 cur.subMesh.count）还是未合批 sub-element（用 cur.subChunk.indices.length）
cur 已合批时不要 append indices（数据已在 chunk）
canBatchSprite 补连续性检查：pre.subMesh.start + pre.subMesh.count === cur.subMesh.start
排序保证「同 canvas 的 leaders 聚集」（比如加 canvasInstanceId tiebreaker）

跨 canvas 合批是有性能价值的（多 HUD 共享 atlas 很常见），但当前实现是「半成品」——既画错了也没真正实现，建议先 fix 再 feature。

⚠️ P0#2（dormant）：`_scanInstanceUniformsWithMacros` 对 `#if` / `#elif` 处理破坏 branchStack

外层 #ifdef A 内出现 #if defined(B) → #if 不入栈但 #endif pop → 把外层 #ifdef A 的栈帧错误地 pop 掉，A 关闭时 inner-#endif 之后的 uniform 仍被错误收集。

代码缺陷真实存在（test 已验证），但当前未在生产路径触发：唯一调用方 ShaderPass._compileShaderSource:160 不传 activeMacros，永远走 _scanInstanceUniforms 分支。未来若启用 raw-GLSL path 就会爆。可作为 follow-up 处理，或加 throw/warn 兜底。

⚠️ P1 breaking change：`renderer_LocalMat` / `renderer_MVInvMat` 被移除

packages/core/src/Renderer.ts + packages/shader/src/ShaderLibrary/Common/Transform.glsl 中这两个内置 uniform 被删除，但 PR description 没提到。任何用户 shader 引用它们在合批和非合批 path 上都会编译失败（_updateTransformShaderData 不再 set 这两个）。

建议 CHANGELOG / breaking change 标记 + 给一个迁移建议（用户可以改成 inverse(camera_ViewMat * renderer_ModelMat) 和 inverse(renderer_ModelMat)）。

🧪 验证测试

vitest test 文件如下，放到 tests/src/core/shader/InstancingP0.test.ts，跑法：

HEADLESS=true npx vitest run tests/src/core/shader/InstancingP0.test.ts

5 个 test 全 pass，断言的都是当前 BUG 行为，修完后将这些断言取反即变成 regression guard。

测试文件全文（点开展开）

import { ShaderFactory, ShaderProperty, VertexMergeBatcher, RenderElement } from "@galacean/engine-core";
import { Logger, WebGLEngine } from "@galacean/engine";
import { beforeAll, describe, expect, it } from "vitest";

Logger.enable();

let engine: any;
beforeAll(async () => {
  const canvas = document.createElement("canvas");
  engine = await WebGLEngine.create({ canvas });
  // In real engine flow ShaderProperty._group is set when shaderData.setXxx(prop, val) is called
  // on a Renderer's shaderData. In unit tests we patch directly.
  const ShaderDataGroup_Renderer = 2;
  for (const name of ["renderer_CustomColor", "renderer_InsideA_BeforeIf", "renderer_InsideA_AfterIf"]) {
    (ShaderProperty.getByName(name) as any)._group = ShaderDataGroup_Renderer;
  }
});

describe("P0#1 — injectInstanceUBO with only derived built-ins", () => {
  it("strips renderer_MVPMat without injecting UBO → leaves dangling reference", () => {
    const vs = `uniform mat4 renderer_MVPMat;
uniform vec4 material_BaseColor;
attribute vec3 POSITION;
void main() { gl_Position = renderer_MVPMat * vec4(POSITION, 1.0); }`;
    const fs = `uniform vec4 material_BaseColor;
void main() { gl_FragColor = material_BaseColor; }`;

    const res = ShaderFactory.injectInstanceUBO(engine, vs, fs);
    expect(res.instanceLayout).toBeNull();
    expect(/uniform\s+mat4\s+renderer_MVPMat/.test(res.vertexSource)).toBe(false); // declaration stripped
    expect(/renderer_MVPMat/.test(res.vertexSource)).toBe(true);                   // reference remains → compile error
  });

  it("PR's own example shape (has renderer_CustomColor) is safe via force-inject ModelMat", () => {
    const vs = `uniform mat4 renderer_MVPMat;
uniform vec4 renderer_CustomColor;
attribute vec3 POSITION;
void main() { gl_Position = renderer_MVPMat * vec4(POSITION, 1.0); }`;
    const fs = `uniform vec4 renderer_CustomColor;
void main() { gl_FragColor = renderer_CustomColor; }`;

    const res = ShaderFactory.injectInstanceUBO(engine, vs, fs);
    expect(res.instanceLayout).not.toBeNull();
    const fields = (res.instanceLayout as any).instanceFields.map((f: any) => f.property.name);
    expect(fields).toContain("renderer_ModelMat");
    expect(fields).toContain("renderer_CustomColor");
    expect(/#define renderer_MVPMat/.test(res.vertexSource)).toBe(true);
  });
});

describe("P0#3 — user-facing manifestation of P0#1 (program.isValid=false → silent continue)", () => {
  it("same shader: single-instance path compiles OK, batched path compiles FAIL", () => {
    const userVs = `attribute vec3 POSITION;
uniform mat4 renderer_MVPMat;
void main() { gl_Position = renderer_MVPMat * vec4(POSITION, 1.0); }`;
    const userFs = `void main() { gl_FragColor = vec4(1.0, 0.0, 0.0, 1.0); }`;

    const gl: WebGL2RenderingContext = engine._hardwareRenderer.gl;

    // (a) Single-instance path: source not run through injectInstanceUBO
    const vs1 = gl.createShader(gl.VERTEX_SHADER)!;
    gl.shaderSource(vs1, "#version 300 es\nprecision highp float;\n" + userVs.replace(/\battribute\b/g, "in"));
    gl.compileShader(vs1);
    expect(gl.getShaderParameter(vs1, gl.COMPILE_STATUS)).toBe(true); // works in non-batched scenes

    // (b) Batched path: injectInstanceUBO runs, source corrupted, compile fails
    const broken = ShaderFactory.injectInstanceUBO(engine, userVs, userFs);
    const vs2 = gl.createShader(gl.VERTEX_SHADER)!;
    gl.shaderSource(vs2, "#version 300 es\nprecision highp float;\n" + broken.vertexSource.replace(/\battribute\b/g, "in"));
    gl.compileShader(vs2);
    expect(gl.getShaderParameter(vs2, gl.COMPILE_STATUS)).toBe(false); // same source, breaks after batching

    // Downstream RenderQueue.render: if (!program.isValid) continue; → entire batched group not drawn.
  });
});

describe("P0#2 — macro-aware scanner mishandles `#if` / `#endif` (dormant: not on production path)", () => {
  it("`#if defined(B)` inside `#ifdef A` corrupts branchStack — uniform after inner #endif wrongly collected", () => {
    const src = `
#ifdef A
  uniform vec4 renderer_InsideA_BeforeIf;
  #if defined(B)
    nothing;
  #endif
  uniform vec4 renderer_InsideA_AfterIf;
#endif
`;
    const fieldMap: Record<number, string> = Object.create(null);
    // @ts-ignore — private method, accessible at runtime
    ShaderFactory._scanInstanceUniformsWithMacros(src, fieldMap, new Set<string>() /* A is OFF */);

    const beforeId = ShaderProperty.getByName("renderer_InsideA_BeforeIf")._uniqueId;
    const afterId = ShaderProperty.getByName("renderer_InsideA_AfterIf")._uniqueId;

    expect(beforeId in fieldMap).toBe(false); // sanity: A=off → not collected
    expect(afterId in fieldMap).toBe(true);   // BUG: stack pop corrupted by inner #endif → collected
  });
});

describe("P0#4 — VertexMergeBatcher.batch over-extends preElement when both are _isBatched leaders", () => {
  it("simulates two canvases' already-batched leaders meeting in the main pipeline", () => {
    const sharedChunk: any = {
      indices: new Uint16Array(256),
      updateIndexLength: 0,
      updateVertexStart: Number.MAX_SAFE_INTEGER,
      updateVertexEnd: Number.MIN_SAFE_INTEGER
    };

    function buildCanvasLeader(leaderQuadStart: number, secondQuadStart: number) {
      const leaderSub = { chunk: sharedChunk, vertexArea: { start: leaderQuadStart, size: 36 },
                          subMesh: { start: 0, count: 0, topology: 0 }, indices: [0, 1, 2, 2, 3, 0] };
      const secondSub = { chunk: sharedChunk, vertexArea: { start: secondQuadStart, size: 36 },
                          subMesh: { start: 0, count: 0, topology: 0 }, indices: [0, 1, 2, 2, 3, 0] };
      const leader = new RenderElement(); leader.subChunk = leaderSub as any;
      const second = new RenderElement(); second.subChunk = secondSub as any;
      VertexMergeBatcher.batch(null, leader);      // canvas-internal: init
      VertexMergeBatcher.batch(leader, second);    // canvas-internal: extend
      leader._isBatched = true;
      return leader;
    }

    const canvasA = buildCanvasLeader(0, 36);
    const canvasB = buildCanvasLeader(72, 108);

    const aStart = (canvasA.subChunk as any).subMesh.start;
    const aCountBefore = (canvasA.subChunk as any).subMesh.count;
    const bStart = (canvasB.subChunk as any).subMesh.start;
    const bCount = (canvasB.subChunk as any).subMesh.count;

    expect(aCountBefore).toBe(12);              // 2 quads × 6 indices
    expect(bCount).toBe(12);
    expect(bStart - aStart).toBe(aCountBefore); // contiguous in chunk

    // Main pipeline: BatcherManager.batch sees canvasA, canvasB adjacent → calls VertexMergeBatcher.batch
    VertexMergeBatcher.batch(canvasA, canvasB);

    const aCountAfter = (canvasA.subChunk as any).subMesh.count;

    // BUG: canvasA.count got +6 (curElement.subChunk.indices.length, NOT canvasB's batched 12)
    expect(aCountAfter).toBe(aCountBefore + 6);
    // BUG: canvasA.range now spans [0..18], covering canvasB's FIRST quad but NOT the second.
    // canvasB's second quad at [18..24] silently disappears.
    expect(aStart + aCountAfter).toBe(bStart + 6);
  });
});

总结

P0#1 + P0#4 都是 user-visible 的正确性 bug，建议 blocker。
P0#2 dormant，可作为 follow-up，但仍建议加 throw/warn 兜底。
P0#3 严格说是 P0#1 的下游表现（continue 静默丢整组）—— 之前 review 把它列为独立 P0 不够准确。
P1 breaking change 务必加进 CHANGELOG。

GPU instancing 这个方向 + UBO 派生 define 的方案我认可，性能数据也很 solid 👍。主要担心的是几个 corner case 在 single-canvas / 标准 shader 的测试里不会暴露，希望能在合并前 cover 一下。

cptbtptpbcptdtptp · 2026-05-13T08:37:26Z

补充两条之前 review 里列为 P1、复审后确认存在的问题。

🐛 P1#8 → 应升 P0：`_buildLayout` 静默跳过 mat3 / array 类型，触发与 P0#1 同 manifestation

触发条件：用户 shader 声明一个 renderer-group uniform，类型不在 _std140TypeInfoMap 中（mat3 是最常见的——切线空间法线变换、cubemap 旋转、3D 噪声场都会用）。

触发链路：

_scanInstanceUniforms 见 uniform mat3 renderer_X; → group=Renderer → fieldMap[id] = "mat3" + 剥离声明
_buildLayout::addField 见 _std140TypeInfoMap["mat3"] === undefined → 静默 return（不加入 instanceFields）
_buildUBODeclaration / _buildFieldDefines 只看 instanceFields → UBO struct 没有 renderer_X，也没生成 #define renderer_X
shader source 中 renderer_X 引用悬挂 → compile fail → program.isValid=false → 整组消失

集成测试输出（完整 test 里加的 InstancingP1_8_integration.test.ts）：

ERROR: 0:23: 'renderer_CustomMat3' : undeclared identifier
 
Shader source:
0:1    #version 300 es
0:3   struct RendererInstanceStruct {
0:4           mat3x4 renderer_ModelMat;      // ← mat3 的 renderer_CustomMat3 被 silently 跳过
0:5   };
...
0:22  gl_Position = renderer_MVPMat * vec4 ( POSITION , 1.0 ) ;
0:23  v_color = renderer_CustomMat3 * vec3 ( 1.0 , 0.0 , 0.0 ) ;   // ← 引用悬挂

最小复现 shader：

Shader "X" { SubShader "D" { Pass "F" {
  mat4 renderer_MVPMat;
  mat3 renderer_CustomMat3;   // ← std140 不支持的类型
  ...
  Varyings vert(Attributes attr) {
    gl_Position = renderer_MVPMat * vec4(attr.POSITION, 1.0);
    v.v_color = renderer_CustomMat3 * vec3(1.0, 0.0, 0.0);
    return v;
  }
}}}

根因：和 P0#1 同一类——_scanInstanceUniforms 太"急"地剥离声明，把"剥离决定"和"是否真的能加进 UBO"两件事耦合在一起。

ShaderFactory.ts:251-262：

fieldMap[ShaderProperty.getByName(name)._uniqueId] =
  type === "mat4" && name === "renderer_ModelMat" ? "mat3x4" : type;
return "";  // ← strip 声明，但只有 mat4/vec4 等才能真的进 UBO

ShaderFactory.ts:357-366（addField 静默跳过未知类型）：

const addField = (id: number): void => {
  const type = fieldMap[id];
  const info = std140Map[type];
  if (!info) return;          // ← 静默 skip
  ...
};

修复方向（和 P0#1 系统性修法一致）：分两步。

// step 1: scan 时只记录意图,先检查类型是否在 _std140TypeInfoMap 中
const supportedTypes = ShaderFactory._std140TypeInfoMap;
// _scanInstanceUniforms:
if (!(type in supportedTypes) && !(name === "renderer_ModelMat")) {
  Logger.error(`GPU Instancing does not support uniform "${name}" of type "${type}"`);
  return match;  // 保留声明,不进 fieldMap
}

或者更彻底——_scanInstanceUniforms 收集 + 校验后才决定整批是否能 inject UBO，不能就保留原声明并 disable instancing macro。

⚠️ P1#7：派生 built-in 用 `#define` 实现，多次引用时性能依赖 driver CSE（mobile 风险）

问题事实（不依赖驱动，纯字符串可证）：

injectInstanceUBO 输出的 source 中包含：

#define renderer_NormalMat mat4(transpose(inverse(mat3(renderer_ModelMat))))
#define renderer_MVMat (camera_ViewMat * renderer_ModelMat)
#define renderer_MVPMat (camera_VPMat * renderer_ModelMat)

GLSL 预处理器对 #define 是 token replacement——每个引用点在编译时被替换成完整表达式。如果 user shader / built-in shader 在 vertex 或 fragment 中多次引用 renderer_NormalMat，每个引用都展开成 mat4(transpose(inverse(mat3(renderer_ModelMat))))，是否合并成一次计算完全依赖 GLSL 编译器 CSE。

为什么是确认存在的问题，不是 hypothetical：作者本人在这个 PR 里给 VertexPBR.glsl 加了一段手动 cache：

// VertexPBR.glsl (PR diff)
- inputs.normalWS = normalize( mat3(renderer_NormalMat) * normal );
+ mat3 normalMat = mat3(renderer_NormalMat);
+ inputs.normalWS = normalize( normalMat * normal );
  #ifdef NEED_VERTEX_TANGENT
-     vec3 tangentWS = normalize( mat3(renderer_NormalMat) * tangent.xyz );
+     vec3 tangentWS = normalize( normalMat * tangent.xyz );

如果驱动 CSE 总是成立，这层手动 cache 不需要加。author 加它说明自己也不信赖 mobile driver 一定 CSE。但这层 cache 只覆盖 PBR vertex——Pipeline/ShadowCaster.shader(同 PR 中引用 mat3(renderer_NormalMat) 但没 cache)、未来加的内置 shader、以及所有 user shader 都不受益。

桌面验证不到：chrome ANGLE 把 GLSL 转译到 HLSL/Metal，转译器和下游编译器都做激进 CSE，桌面 demo 看不出 FPS 差异。这是 mobile-only 的感知问题——ARM Mali、旧 Adreno 驱动上 fragment shader 每像素重算 transpose(inverse(mat3(...))) 成本不可忽视。

修复方向（driver-independent）：injectInstanceUBO 在 void main() 之后注入局部缓存，再用 #define 指向局部变量，把 CSE 责任从 driver 收回到我们自己：

// 在 v_instanceID = gl_InstanceID; 之后注入：
mat4 _instance_NormalMat = mat4(transpose(inverse(mat3(renderer_ModelMat))));
mat4 _instance_MVMat = camera_ViewMat * renderer_ModelMat;
mat4 _instance_MVPMat = camera_VPMat * renderer_ModelMat;
// 然后:
#define renderer_NormalMat _instance_NormalMat
#define renderer_MVMat _instance_MVMat
#define renderer_MVPMat _instance_MVPMat

这样 vertex shader 每次 invocation 只算一次，fragment 取插值即可（视场景）。

只 inject 实际被引用到的 cache（前置扫描一遍）可以避免 over-injection。

修正后的优先级

问题	之前级别	复审后级别	备注
自定义 shader 只用 derived built-in	P0#1	P0	已发
跨 canvas 合批数据错乱	P0#4	P0	已发
`mat3 renderer_*` 类型	P1#8	P0（应升级）	同 P0#1 manifestation
derived `#define` mobile 性能	P1#7	P1（保留）	桌面不可复现但代码事实存在,fix 性价比高
`_scanInstanceUniformsWithMacros` `#if`	P0#2	P1（降级）	dormant,生产路径未调
`renderer_LocalMat` / `renderer_MVInvMat` 移除	P1#6	P1	作者有意,需 CHANGELOG 标记
`RenderElement.dispose` instancedRenderers=null	P1#11	❌ 撤回	误判,pool gc 后立刻清空
`UICanvas._onDisable` 不 call super	P1#12	❌ 撤回	Component._onDisable 是空方法

P0#1 和 P0(升级)#8 共享根因，建议一起修；P0#4 独立；P1#7 性价比高建议同 PR 内顺带处理。

…piling Shaders that only declare derived built-ins (e.g. `renderer_MVPMat`) had their declarations stripped by `_scanInstanceUniforms` but skipped the `#define` rewrite because `fieldMap` was empty, leaving dangling `renderer_ModelMat` references that fail to compile. This also fixed a latent bug where transform-free shaders with `RENDERER_INSTANCING` enabled would silently lose `N-1` instances — `_canBatch` returned true and only the leader was drawn.

GuoLei1990

已关闭问题清单

#	问题	状态
1	_canBatch 缺少 renderer-level state 判定	已修复（macroCollection.isEqual + frontFace）
2	SubRenderElement ObjectPool NPE	已关闭（dispose 后 elements.length = 0 清空池，不会复用）
3	renderer_LocalMat / renderer_MVInvMat 移除	已修复（完全移除，文档同步）
4	uniformBlockBindingMap 归属	已修复（移到 ShaderFactory）
5	_builtinRendererUniforms Map → Record	已修复
6	ShaderProgramMap 重命名	已修复
7	_std140TypeInfoMap 缺少 bool 类型	已修复
8	Opaque 距离排序移除	已关闭（移动端 TBDR 硬件处理 overdraw）
9	_derivedDefines 同时声明 VS/FS	已修复（shader type 区分）
10	GLBuffer ConstantBuffer WebGL1 guard	已处理
11	array uniform 检测并报错	已添加
12	_scanInstanceUniforms 正则与 #ifdef 交互	已关闭（P2，实际场景中 renderer uniform 不在条件块内）

总结

GPU Instancing 自动合批方向正确，实现完整，架构收敛。通过 UBO 打包 per-instance renderer uniform（ModelMat 以 mat3x4 仿射优化、Layer、自定义 uniform），ShaderFactory 注入 UBO 声明和 #define 重映射，MeshRenderer 以 material + primitive + macro + frontFace 四维判定合批。per-pass 独立布局使 ShadowCaster 获得更高 instanceMaxCount。iPhone 16 Pro Max 实测 +67% FPS（5000 objects → 21 draw calls）数据有说服力。

前 13+ 轮 review 提出的所有关键问题均已修复或关闭。

问题

[P2] ShaderFactory.ts — 数组 uniform 被静默删除，shader 编译错误指向 GLSL 而非真正原因

当 shader 声明了 renderer_* 数组 uniform（如 uniform vec4 renderer_BoneMatrices[64]），当前逻辑 Logger.error + return "" 直接删除声明。这会导致后续 shader 编译失败（引用了未声明的变量），但错误信息指向 GLSL 编译错误而非真正原因，难以诊断。

建议保留原始 uniform 声明不修改（return match），让非 instancing shader 正常工作；或在 error 消息中补充"该 shader 不支持 GPU Instancing，已 fallback 到非合批渲染"。

不阻塞合并。无新 P0/P1。LGTM，可合入。

（提醒：建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 移除，供用户迁移参考。）

UICanvas pre-batches its children into leaders that carry a self-contained draw range. When two canvases sharing the same atlas push their leaders into the transparent queue, the main-pipeline batcher previously fed those leaders back into `_canBatch`/`_batch` as if they were single sub-elements. That corrupted `subMesh.count` and re-appended already-written indices, dropping draws or overlapping ranges. The batch boundary is the canvas — matches Unity uGUI's behavior — so detect already-batched leaders at the batcher entry and pass them through.

…inject `_scanInstanceUniformsWithMacros` and `injectInstanceUBO`'s `activeMacros` parameter only served the raw GLSL path, which was removed in e08af33 ("refactor(shader): remove raw GLSL shader path"). The 4 preprocessor regexes are only used by this function. All inputs are now ShaderLab preprocessor-evaluated before reaching the injector. Cleanup that should have landed alongside e08af33. Also reported by reviewer as a dormant `#if`/`#elif` branch-stack bug — moot once the function is gone.

`else if (curElement._isBatched) return` in `VertexMergeBatcher.batch` was defensive: it caught the case where main pipeline fed an already- batched leader back into `_batch(null, leader)`. After commit 9606bf3 moved that check to `BatcherManager.batch`'s entry, the leader never reaches this branch. Refresh `RenderElement._isBatched`'s doc to match its current meaning ("batch leader — must not be merged again").

# Conflicts: # packages/shader/compiledShaders/PBR.shaderc # packages/shader/compiledShaders/Pipeline/DepthOnly.shaderc # packages/shader/compiledShaders/Pipeline/ShadowCaster.shaderc

GuoLei1990

已关闭问题清单

所有历史 P0/P1 问题均已在前 15+ 轮 review 中修复或经作者解释关闭（详见历史 review 记录）。

总结

为 MeshRenderer 实现自动 GPU Instancing，通过 UBO 打包 per-instance 数据驱动 instanced draw call。关键技术决策：

mat3x4 仿射优化（48 vs 64 字节）— 利用 ModelMat 恒为仿射变换的架构保证
派生 uniform 通过 #define 实时计算（MVMat/MVPMat/NormalMat）— 消除 UBO 冗余存储
Per-pass 独立布局 — ShadowCaster 等轻量 pass 获得更高 instanceMaxCount
InstanceLayout 缓存在 ShaderProgram — 编译时计算一次，渲染时直接取用
Opaque 排序改为 material/primitive 优先 — 合批友好，TBDR 硬件处理 overdraw

iPhone 16 Pro Max 实测 +67% FPS 提升（30→50，5000 objects / 21 draw calls）；UI Canvas 9216 按钮 +120% FPS（25→55）。

问题

无新问题。

自上次 LGTM 以来无新 commit，LGTM，可合入。

增补提醒：建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 的移除（破坏性变更），以及 renderer_NormalMat 逐顶点 inverse() 的 @todo（后续可 CPU 端预计算并打包到 UBO）。

`_scanInstanceUniforms` stripped renderer-group uniform declarations before `_buildLayout` decided whether they could fit the std140 layout. For types like `mat3` (which std140's row-padded layout doesn't support in the current `_std140TypeInfoMap`), the declaration was removed but no UBO field or `#define` replaced it — every later reference became an undeclared identifier and the whole pass silently failed to compile. Check the storage type against the std140 map up front. Unsupported types stay declared in the source and emit a clear log; supported types follow the existing strip-and-collect path.

GuoLei1990 · 2026-05-13T13:27:26Z

P0#2 `_scanInstanceUniformsWithMacros` `#if`/`#elif` branch-stack — done (refactor, dead-code removal)

_scanInstanceUniformsWithMacros 和 injectInstanceUBO 的 activeMacros 参数只服务于 raw GLSL path，那条路径在 e08af33d9 ("refactor(shader): remove raw GLSL shader path") 已经被删了，是这一次 cleanup 漏掉的死代码。整个函数（含 4 个 #ifdef/#ifndef/#else/#endif 正则）已经移除，所以 #if/#elif 的栈帧 bug 不再可触发。

commit: a25abab76 refactor(shader): drop dead macro-aware uniform scan in instance UBO inject

P0#4 跨 canvas 合批数据错乱 — done (fix)

走完链路后确认 reviewer 描述的 bug 链 100% 真实存在（UICanvas pre-batch 出 leader → 进 transparent queue → 主管线 BatcherManager.batch 再合一次 → VertexMergeBatcher.batch 按"未合批单 quad"语义处理 leader，导致 subMesh.count 用错值 + indices 重复 append）。

修复方向选了第一性方案而不是 reviewer 提的 hotfix A 或 B：

对照 Unity uGUI 源码确认：Unity 的 batch 边界严格止步于单个 Canvas（m_Batches 是 Canvas 私有成员，CanvasManager::RenderOverlays 逐 canvas 各自渲染，不做跨 canvas batch 合并）。这是有意识的架构选择——canvas 是脏标记隔离单元，跨 canvas 合批会让局部更新污染全局。
我们当前是"无意的跨 canvas 合批"——canvas 内已经合批的 leader 漏进了通用合批流程
第一性修复 = 让 BatcherManager.batch 在入口识别已合批 leader（_isBatched=true）直接透传，绕过 _canBatch/_batch。一处入口拦截覆盖所有合批路径，VertexMergeBatcher 不需要感知 leader 概念

commit: 9606bf3de fix(ui): skip re-batching of canvas-internal leaders in main pipeline

后续清理：上一版本里 VertexMergeBatcher.batch 内 else if (curElement._isBatched) return 是堵这条路径的下游防御，现在源头堵了下游变成 dead code，跟着删了。
commit: 5b82d694e chore(batcher): drop dead leader guard in VertexMergeBatcher

P0#8 `_buildLayout` 静默跳过不支持类型 — done (fix)

链路确认存在（_scanInstanceUniforms 不查类型支持就剥离声明 → _buildLayout::addField 见到 mat3 等查不到 _std140TypeInfoMap 静默 return → 源码 dangling reference → compile fail → 整组消失）。

第一性修复 = 把"能否进 UBO"的能力判断提前到 scan 阶段，与"是否属于 renderer group"放在一起做。不支持类型保留声明并 Logger.error 报错，不再静默走到 build 阶段出问题。

注意：ShaderData 本身没有 setMatrix3 公开 API（只有 setMatrix(Matrix) 给 mat4，或 setFloatArray 后门）。所以"mat3 在 GPU instancing 不支持"是引擎当前能力的真实边界，错误文案对齐这个事实。

commit: 10b926566 fix(instancing): reject unsupported uniform types at scan time

P1 `renderer_LocalMat` / `renderer_MVInvMat` 移除 — 不处理

这两个 uniform 是 commit 761bec406 的有意 cleanup：引擎自带 shader 和文档示例都没引用，Unity/Unreal/Godot 也不提供等价物。整个 monorepo（src/examples/tests/e2e/docs）grep 零引用。2.0 仍在 alpha 阶段（当前 alpha.32），相关变更归在 alpha changelog 即可。不视作面向用户的 breaking change。

P1#7 派生 built-in 用 `#define` 的 mobile CSE 风险 — 不处理（vertex CSE 大概率生效 + 内部 shader 已 audit）

理论风险存在但实际影响 ≈ 0：

派生 built-ins (MVMat / MVPMat / NormalMat) 全部只在 vertex shader 里使用，而 vertex shader 上 driver CSE 普遍生效，弱驱动的 CSE 痛点主要落在 fragment shader 复杂度
引擎内置 shader 多次引用同一 derived built-in 的位置：
- PBR/VertexPBR.glsl: ✅ 已 cache (mat3 normalMat = mat3(renderer_NormalMat);)
- BlinnPhong/ForwardPassBlinnPhong.glsl: 两次 mat3(renderer_NormalMat)，未 cache，但 dev/2.0 上也未 cache（pre-existing）
- ShadowCaster.shader: 只引用 1 次，无 CSE 需求
- 其他所有 shader 都只引用 1 次
用户 shader 多次引用 derived built-in + 跑在 mobile 弱驱动 + 那是性能瓶颈点 = 窄交集场景
reviewer 提的 commit 6b42cc4 mat3 normalMat = mat3(renderer_NormalMat) extraction 是早于 injectInstanceUBO 引入的 refactor，目的是复用 mat3(...) cast 结果，跟 driver CSE 无关，不能作为"作者不信赖 driver CSE"的依据

修复方向（inject 局部 cache）需要前置 use-analysis 避免过度注入、处理 #ifdef 内引用、注入位置、命名冲突等，是显著复杂度增长，跟 P0 fix 不同性质。BlinnPhong 的 cache 缺失作为 follow-up audit 处理。

谢谢 review 这一轮。剩下的小尾巴（BlinnPhong cache 补齐 + 内部 shader audit）我会在 follow-up PR 里收。

GuoLei1990

已关闭问题清单

所有历史 P0/P1 问题均已在前 15+ 轮 review 中修复或经作者解释关闭（详见历史 review 记录）。

总结

为 MeshRenderer 实现自动 GPU Instancing，通过 UBO 打包 per-instance 数据驱动 instanced draw call。关键技术决策：

mat3x4 仿射优化（48 vs 64 字节）— 利用 ModelMat 恒为仿射变换的架构保证
派生 uniform 通过 #define 实时计算（MVMat/MVPMat/NormalMat）— 消除 UBO 冗余存储
Per-pass 独立布局 — ShadowCaster 等轻量 pass 获得更高 instanceMaxCount
InstanceLayout 缓存在 ShaderProgram — 编译时计算一次，渲染时直接取用
Opaque 排序改为 material/primitive 优先 — 合批友好，TBDR 硬件处理 overdraw

iPhone 16 Pro Max 实测 +67% FPS 提升（30→50，5000 objects / 21 draw calls）；UI Canvas 9216 按钮 +120% FPS（25→55）。

问题

无新问题。

自上次 LGTM 以来无新 commit，LGTM，可合入。

增补提醒：建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 的移除（破坏性变更），以及 renderer_NormalMat 逐顶点 inverse() 的 @todo（后续可 CPU 端预计算并打包到 UBO）。

GuoLei1990 · 2026-05-13T13:31:06Z

更正：上一条 review 基于 PR 的最新 push commit（2026-04-11），当时未感知到 2026-05-13 另一位 reviewer 提出的新 P0 bugs。现补充增量审查。

已关闭问题清单

所有历史 P0/P1 问题均已修复或关闭（详见前 15+ 轮记录）。

2026-05-13 新发现的 P0 bugs 均已在本地 commit 中修复，修复质量如下：

问题	修复 commit	修复质量
P0#1 derived-only shader：`renderer_MVPMat` 声明被剥离但没有 `#define` 补位 → compile fail	`04ac8b3b7` fix(instancing): always run inject path	✅ 删除 `hasField` 早返回；`_buildLayout` 已 force-inject `renderer_ModelMat`，`fieldMap` 为空时也能生成有效 UBO + `#define` 重映射
P0#8 / P1#8 `mat3` 等不支持类型：scan 剥离声明但 `addField` 静默 skip → dangling reference	`10b926566` fix(instancing): reject unsupported uniform types at scan time	✅ 在 scan 阶段查 `_std140TypeInfoMap`，不支持类型 `return match` 保留声明 + `Logger.error`，不再进入 `fieldMap`
P0#4 跨 canvas 合批数据错乱：已合批 leader 进入主管线 `_canBatch/_batch`，`subMesh.count` 用错值 + indices 重复 append	`9606bf3de` fix(ui): skip re-batching of canvas-internal leaders	✅ 在 `BatcherManager.batch` 入口识别 `_isBatched` leader，直接 flush 透传，不走 `_canBatch/_batch`；提取 `_flush` helper 消除重复
P0#2 `_scanInstanceUniformsWithMacros` `#if`/`#elif` 栈帧 bug（dormant）	`a25abab76` refactor(shader): drop dead macro-aware uniform scan	✅ 整个函数及其 4 个正则一并删除；dead code，生产路径从未调用

架构一致性确认

injectInstanceUBO 的返回类型仍为 InstanceBufferLayout | null，但 04ac8b3b7 后实际始终返回非 null（_buildLayout 总是至少 force-inject ModelMat）。类型签名偏保守，不影响运行时正确性，调用方 RenderQueue 的 if (isInstanced && layout) guard 正确。

状态

本地 fix commits 未 push 到 PR（PR HEAD 仍为 5d02bbe188 2026-04-11）。推送后 LGTM，可合入。

增补提醒：建议在 release notes 中标注 renderer_LocalMat 和 renderer_MVInvMat 的移除（alpha breaking change），以及 renderer_NormalMat GPU 端 per-vertex inverse() 的 @todo（BlinnPhong 的双引用 cache 缺失可在 follow-up PR 中补齐）。

fix: static init order for _gpuInstanceMacro and prettier formatting

3279bb6

Move _gpuInstanceMacro after _macroMap declaration to fix static initialization order. Also apply prettier formatting fixes.

GuoLei1990 added 5 commits April 7, 2026 22:50

style: remove extra blank lines

2318007

revert: restore original shader files, instancing UBO injection handl…

94ef055

…es layout The instance UBO is injected at compile time by ShaderFactory, so the original shader uniform declarations don't need modification.

fix: restore non-Layer shader improvements that were over-reverted

128b16e

Bring back normalMat extraction, transform_declare reordering, trailing whitespace fixes, and VertexPBR indent fix. Only the renderer_Layer relocation stays reverted.

refactor: revert _getMacroCachePool rename back to _getShaderProgramPool

4d8f2b7

github-actions Bot added the documentation Improvements or additions to documentation label Apr 7, 2026

GuoLei1990 added 20 commits April 8, 2026 11:46

feat(gpu-instancing): load ambient light and clone ducks for instancing

9d015d6

refactor: rename batch to packer for consistency with InstanceDataPacker

73c19fa

refactor: move gpuInstanceMacro to InstanceDataPacker and cleanup

9002cb4

- Move RENDERER_GPU_INSTANCE macro from ShaderMacro to InstanceDataPacker - Rename getOrCreate() to get() in InstanceDataPackerPool - Clear compileMacros in InstanceDataPacker.reset()

refactor: move batcherManager declaration to first usage

b8935e3

refactor: rename InstanceDataPacker to InstanceBatch

002d7ab

Rename class, file, and all references to better reflect its role as an instance batch manager rather than a generic data packer.

refactor: rename MacroCachePool to MacroMap

f984867

It's a macro-keyed map, not a pool (no borrow/return semantics).

refactor: rename shaderProgramPool(s) variables to shaderProgramMap(s)

1fc3f4f

Align variable and method names with MacroMap rename — these are maps, not pools.

refactor: simplify bindUniformBufferBase - remove unnecessary null check

53a9a9f

Callers always pass a valid buffer, no need for null guard.

refactor: simplify InstanceBatch - store layout directly, rename fields

c63b83b

- nativeBuffer → buffer, _uboData → _data - Replace separate instanceFields/_structSize with single _layout ref - setLayout() now takes InstanceLayout directly

refactor: clean up InstanceBatch upload method

b9f4168

- Remove unnecessary null guard on _layout - Inline uploadElements variable - Destructure floatView/intView from this - Improve worldMatrix comment

style: fix prettier formatting in RenderQueue

d75acb7

chore: upgrade lint-staged to v16 and fix glob pattern

82c0d60

- Upgrade lint-staged from v10.5 to v16.4.0 - Fix glob from *.{ts} to **/*.ts to match subdirectory files - Remove redundant git add from tasks

chore: upgrade eslint and @typescript-eslint to support TS 5.6

baeaa3d

- eslint 8.44 → 8.57 - @typescript-eslint/parser and eslint-plugin 6.x → 8.x - Eliminates "unsupported TypeScript version" warning

refactor: null out instancedRenderers in dispose

99ec5d1

Dispose means the object is permanently released, so null the array to free memory instead of just clearing length.

This comment was marked as outdated.

Sign in to view

GuoLei1990 commented May 12, 2026

View reviewed changes

cptbtptpbcptdtptp reviewed May 12, 2026

View reviewed changes

GuoLei1990 commented May 12, 2026

View reviewed changes

cptbtptpbcptdtptp reviewed May 12, 2026

View reviewed changes

GuoLei1990 added 2 commits May 12, 2026 21:44

GuoLei1990 commented May 12, 2026

View reviewed changes

GuoLei1990 commented May 13, 2026

View reviewed changes

GuoLei1990 force-pushed the feat/gpu-instancing branch 2 times, most recently from b4429b5 to 30f6ad8 Compare May 13, 2026 09:40

GuoLei1990 force-pushed the feat/gpu-instancing branch from 30f6ad8 to 04ac8b3 Compare May 13, 2026 09:41

GuoLei1990 commented May 13, 2026

View reviewed changes

GuoLei1990 added 4 commits May 13, 2026 20:30

Merge remote-tracking branch 'origin/dev/2.0' into feat/gpu-instancing

3b3d091

# Conflicts: # packages/shader/compiledShaders/PBR.shaderc # packages/shader/compiledShaders/Pipeline/DepthOnly.shaderc # packages/shader/compiledShaders/Pipeline/ShadowCaster.shaderc

GuoLei1990 commented May 13, 2026

View reviewed changes

Uh oh!

Conversation

GuoLei1990 commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Performance

MeshRenderer GPU Instancing

UI Canvas Batching

Future Optimization

Key Files

Test plan

Uh oh!

coderabbitai Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Reviews paused

Walkthrough

Changes

Sequence Diagram(s)

Estimated code review effort

Poem

❌ Failed checks (1 warning)

Uh oh!

codecov Bot commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Codecov Report

Uh oh!

This comment was marked as outdated.

Uh oh!

GuoLei1990 left a comment

Choose a reason for hiding this comment

Re-review (Incremental) — GPU Instancing + UIBatchSorter + e2e Cases

已关闭问题清单

问题

[P0] RenderElement.dispose() — instancedRenderers = null 导致对象池复用时 NPE（仍未修复）

[P1] _scanInstanceUniformsWithMacros 不处理 #if/#elif 导致栈错位（仍未修复）

e2e Case 质量

结论

Uh oh!

cptbtptpbcptdtptp May 12, 2026

Choose a reason for hiding this comment

Uh oh!

GuoLei1990 May 12, 2026

Choose a reason for hiding this comment

Uh oh!

GuoLei1990 left a comment

Choose a reason for hiding this comment

已关闭问题清单

总结

问题（需在合并前修复）

Uh oh!

GuoLei1990 left a comment

Choose a reason for hiding this comment

已关闭问题清单

总结

Uh oh!

GuoLei1990 left a comment

Choose a reason for hiding this comment

已关闭问题清单

总结

问题

结论

Uh oh!

cptbtptpbcptdtptp May 12, 2026

Choose a reason for hiding this comment

Uh oh!

GuoLei1990 May 12, 2026

Choose a reason for hiding this comment

Uh oh!

cptbtptpbcptdtptp May 12, 2026

Choose a reason for hiding this comment

Uh oh!

GuoLei1990 left a comment

Choose a reason for hiding this comment

增量审查（2026-05-12，基于最新 commits）

已关闭问题清单

问题

结论

Uh oh!

GuoLei1990 left a comment

Choose a reason for hiding this comment

GuoLei1990 commented Apr 7, 2026 •

edited

Loading

coderabbitai Bot commented Apr 7, 2026 •

edited

Loading

codecov Bot commented Apr 7, 2026 •

edited

Loading

[P0] `RenderElement.dispose()` — `instancedRenderers = null` 导致对象池复用时 NPE（仍未修复）

[P1] `_scanInstanceUniformsWithMacros` 不处理 `#if`/`#elif` 导致栈错位（仍未修复）

⚠️ P0#2（dormant）：`_scanInstanceUniformsWithMacros` 对 `#if` / `#elif` 处理破坏 branchStack

⚠️ P1 breaking change：`renderer_LocalMat` / `renderer_MVInvMat` 被移除

🐛 P1#8 → 应升 P0：`_buildLayout` 静默跳过 mat3 / array 类型，触发与 P0#1 同 manifestation

⚠️ P1#7：派生 built-in 用 `#define` 实现，多次引用时性能依赖 driver CSE（mobile 风险）

P0#2 `_scanInstanceUniformsWithMacros` `#if`/`#elif` branch-stack — done (refactor, dead-code removal)

P0#8 `_buildLayout` 静默跳过不支持类型 — done (fix)

P1 `renderer_LocalMat` / `renderer_MVInvMat` 移除 — 不处理

P1#7 派生 built-in 用 `#define` 的 mobile CSE 风险 — 不处理（vertex CSE 大概率生效 + 内部 shader 已 audit）