Skip to content

[Relax][Backend] Fix TVM crashes with default relax pipeline when opt_level=1: InternalError: Check failed: (slot->value_computed) is false#18491

Draft
cchung100m wants to merge 12 commits into
apache:mainfrom
cchung100m:issue-17876
Draft

Conversation

@cchung100m

@cchung100m cchung100m commented Nov 23, 2025

Copy link
Copy Markdown
Contributor

Hi Commiters,

This PR is trying to fix issues #17876. Any suggestions would be appreciated if you are available.

Root Cause

VMShapeLower crashed when processing ShapeExpr containing composite PrimExpr that weren't computed yet.

Solution

Modified VisitExpr_(const ShapeExprNode* op) in vm_shape_lower.cc to:

1. Mark uncomputed variables as ready for computation
2. Trigger EmitOutstandingPrimExprCompute() to resolve the dependency chain
3. Ensure all expressions are computed before calling MakeSymbolicShapeArg

Added test case: test_composite_shape_expression_fix() to prevent future occurrences.

symbolizing the composite PrimExpr in the pipeline: canonicalization

@cchung100m cchung100m changed the title [#17876]Fix TVM crashes with default relax pipeline when opt_level=1: InternalError: Check failed: (slot->value_computed) is false [Relax][Backend] Fix TVM crashes with default relax pipeline when opt_level=1: InternalError: Check failed: (slot->value_computed) is false Nov 23, 2025
@cchung100m cchung100m marked this pull request as ready for review November 25, 2025 15:58
@tlopex tlopex self-assigned this Nov 28, 2025
Comment thread src/relax/backend/vm/vm_shape_lower.cc Outdated
if (it != slot_map_.end() && !it->second->value_computed) {
// If it's a variable, mark it as ready for computation
if (expr.as<tir::VarNode>()) {
it->second->value_computed = true;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Marking arbitrary tir::VarNode instances as value_computed = true in VisitExpr_ is incorrect. In VMShapeLower, only Relax ShapeVars (from function parameters/match_cast) or IntImm constants can be safely marked as computed, because only these have runtime values accessible to the VM.
I think correct way is to fix this earlier in the pipeline (shape inference/canonicalization) by symbolizing composite PrimExpr (introduce Relax ShapeVars) or simplifying them before VMShapeLower.

@cchung100m cchung100m Dec 8, 2025

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @tlopex

Thanks for the suggestion.
Should we extend ComputePrimValue() to handle ShapeExpr nodes with compound PrimExpr?
https://github.com/apache/tvm/blob/main/src/relax/transform/compute_prim_value.cc#L65

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, ComputePrimValue is intended only for evaluating statically evaluable PrimExpr into IntImm (constant folding), so I think extending ComputePrimValue would not address the root issue.
The real problem is that VMShapeLower cannot consume composite PrimExpr directly. The correct solution here should be to canonicalize ShapeExpr earlier by introducing a Relax ShapeVar binding for any non-trivial PrimExpr.

Just like:

# 1. Compute the symbolic value first (Canonicalization)
s1 = R.prim_value(n + 1)

# 2. Pass the computed var to the shape (VMShapeLower is happy now)
lv = R.call_tir(cls.func, (x,), R.shape([s1]), dtype="float32")

@cchung100m cchung100m marked this pull request as draft December 2, 2025 12:37
@cchung100m cchung100m force-pushed the issue-17876 branch 2 times, most recently from 533de8a to b98b5a7 Compare December 2, 2025 13:07
@cchung100m cchung100m force-pushed the issue-17876 branch 21 times, most recently from a59ed45 to b3f7800 Compare December 19, 2025 15:00
@cchung100m cchung100m force-pushed the issue-17876 branch 2 times, most recently from 4100b4c to bbd6752 Compare February 8, 2026 15:09
@cchung100m

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new Relax pass, CanonicalizeShapeExpr, which canonicalizes compound shape expressions by replacing them with fresh symbolic variables, and integrates it into several backend pipelines. The review feedback highlights a potential crash when canonicalizing function parameters due to emitting bindings outside of an active block, a Python compatibility issue in the tests when using the union operator in isinstance, and a minor typo in the Python docstring.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/relax/transform/canonicalize_shape_expr.cc
Comment thread tests/python/relax/test_transform_canonicalize_shape_expr.py Outdated
Comment thread python/tvm/relax/transform/transform.py Outdated
@cchung100m cchung100m force-pushed the issue-17876 branch 8 times, most recently from 9d4e9c3 to 95a5610 Compare June 14, 2026 11:23
@cchung100m

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a new pass, CanonicalizeShapeExpr, which canonicalizes compound shape expressions (e.g., n + 1) by replacing them with fresh symbolic variables and emitting MatchCast bindings. This pass is integrated into the finalize pipelines for CPU, CUDA, GPU, and ROCm backends, and includes Python bindings and unit tests. The review feedback points out a critical issue where the function's return struct info (op->ret_struct_info) is not canonicalized, which can cause downstream passes to crash. Additionally, the reviewer notes that calling builder_->EmitNormalized outside of an active binding block (such as when visiting the return struct info) will crash the compiler, and suggests tracking whether the visitor is inside a binding block to safely collect and prepend out-of-block bindings.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread src/relax/transform/canonicalize_shape_expr.cc Outdated
Comment thread src/relax/transform/canonicalize_shape_expr.cc
@cchung100m

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist gemini-code-assist Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the CanonicalizeShapeExpr pass in TVM Relax, which replaces compound PrimExpr dimensions in ShapeExpr with fresh symbolic variables to prevent issues during downstream shape lowering. The pass is integrated into the CPU, CUDA, GPU, and ROCm compilation pipelines. The review feedback highlights a potential bug where nested local functions might have their parameters incorrectly canonicalized within the outer scope, suggesting a fix to reset the state when entering a FunctionNode. Additionally, the feedback points out an unused MatchCast binding variable and recommends adding a test case to verify the behavior of nested functions.

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment on lines +47 to +53
BindingBlock VisitBindingBlock(const BindingBlock& block) final {
bool prev = inside_binding_block_;
inside_binding_block_ = true;
BindingBlock ret = ExprMutator::VisitBindingBlock(block);
inside_binding_block_ = prev;
return ret;
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

If a local function (lambda) is defined inside a binding block, inside_binding_block_ will remain true when visiting the local function's parameters and return type. This can lead to incorrect canonicalization of the nested function's parameters in the outer function's scope, emitting MatchCast bindings into the wrong block. We should override VisitExpr_(const FunctionNode* op) to temporarily reset inside_binding_block_ to false when entering a local function.

  BindingBlock VisitBindingBlock(const BindingBlock& block) final {
    bool prev = inside_binding_block_;
    inside_binding_block_ = true;
    BindingBlock ret = ExprMutator::VisitBindingBlock(block);
    inside_binding_block_ = prev;
    return ret;
  }

  Expr VisitExpr_(const FunctionNode* op) final {
    bool prev = inside_binding_block_;
    inside_binding_block_ = false;
    Expr ret = ExprMutator::VisitExpr_(op);
    inside_binding_block_ = prev;
    return ret;
  }

Comment on lines +91 to +92
builder_->EmitNormalized(MatchCast(match_var, PrimValue(expr), target_sinfo));
MatchCast binding(match_var, PrimValue(expr), target_sinfo);

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The local variable binding of type MatchCast is constructed but never used. Since builder_->EmitNormalized already creates and emits the MatchCast binding, this line is completely redundant and can be safely removed.

Suggested change
builder_->EmitNormalized(MatchCast(match_var, PrimValue(expr), target_sinfo));
MatchCast binding(match_var, PrimValue(expr), target_sinfo);
builder_->EmitNormalized(MatchCast(match_var, PrimValue(expr), target_sinfo));

Comment on lines +74 to +76

if __name__ == "__main__":
tvm.testing.main()

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Add a test case to verify that nested functions do not have their parameters incorrectly canonicalized in the outer function's scope.

@I.ir_module
class NestedFunc:
    @R.function
    def main(x: R.Tensor(("n",), "float32")):
        with R.dataflow():
            @R.function
            def local_func(y: R.Tensor(("n + 1",), "float32")):
                out = R.zeros(R.shape([n + 1]), dtype="float32")
                return out
            res = local_func(x)
            R.output(res)
        return res


def test_canonicalize_shape_expr_nested_function():
    mod = relax.transform.CanonicalizeShapeExpr()(NestedFunc)
    local_func = mod["main"].body.blocks[0].bindings[0].value
    param_shape = local_func.params[0].struct_info.shape
    assert any(not isinstance(dim, (tirx.IntImm, tirx.Var)) for dim in param_shape.values)


if __name__ == "__main__":
    tvm.testing.main()

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants