
Engine build error for Alpamayo-R1-10B action head #92

@hygxy

Building the engine for the Alpamayo-R1-10B action head fails with the following error:

./build/examples/multimodal/action_build   --onnxDir $WORKSPACE_DIR/$MODEL_NAME/onnx/action   --engineDir $WORKSPACE_DIR/$MODEL_NAME/engines   --maxBatchSize 6
[15:02:57.185] [INFO] [trtUtils.h:62:loadEdgellmPluginLib] EDGELLM_PLUGIN_PATH: /home/agx/TensorRT-Edge-LLM/build/libNvInfer_edgellm_plugin.so
[15:02:57.188] [WARNING] [version.cpp:79:checkVersion] Model does not have edgellm_version. Current runtime version: 0.7.1
[15:02:57.475] [INFO] [TensorRT] [MemUsageChange] Init CUDA: CPU +21, GPU +0, now: CPU 29, GPU 101073 (MiB)
[15:02:58.736] [INFO] [TensorRT] [MemUsageChange] Init builder kernel library: CPU +1228, GPU +1234, now: CPU 1459, GPU 102512 (MiB)
[15:02:58.744] [INFO] [TensorRT] ----------------------------------------------------------------
[15:02:58.744] [INFO] [TensorRT] Input filename:   /home/agx/tensorrt-edgellm-workspace/Alpamayo-R1-10B/onnx/action/model.onnx
[15:02:58.744] [INFO] [TensorRT] ONNX IR version:  0.0.10
[15:02:58.744] [INFO] [TensorRT] Opset version:    24
[15:02:58.744] [INFO] [TensorRT] Producer name:    pytorch
[15:02:58.744] [INFO] [TensorRT] Producer version: 2.10.0+cu128
[15:02:58.744] [INFO] [TensorRT] Domain:           
[15:02:58.744] [INFO] [TensorRT] Model version:    0
[15:02:58.744] [INFO] [TensorRT] Doc string:       
[15:02:58.744] [INFO] [TensorRT] ----------------------------------------------------------------
[15:02:58.751] [INFO] [TensorRT] Searching for plugin wth node domain namespace: 
[15:02:58.751] [INFO] [TensorRT] Searching for plugin wth node domain namespace: 
...
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:135: While parsing node number 126 [RotaryEmbedding -> "rope_onnx"]:
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:138: --- Begin node ---
input: "mul_198"
input: "_to_copy_5"
input: "_to_copy_6"
input: "attention_pos_id"
output: "rope_onnx"
name: "n0"
op_type: "RotaryEmbedding"
domain: "trt"
metadata_props {
  key: "namespace"
  value: ": llm_loader.models.alpamayo.modeling_alpamayo_action.AlpamayoAction/expert: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionExpert/expert.layers.0: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionDecoderLayer/expert.layers.0.self_attn: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionAttention/rope_onnx: trt.rope_onnx.default"
}
metadata_props {
  key: "pkg.torch.onnx.class_hierarchy"
  value: "[\'llm_loader.models.alpamayo.modeling_alpamayo_action.AlpamayoAction\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionExpert\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionDecoderLayer\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionAttention\', \'trt.rope_onnx.default\']"
}
metadata_props {
  key: "pkg.torch.onnx.fx_node"
  value: "%rope_onnx : [num_users=1] = call_function[target=torch.ops.trt.rope_onnx.default](args = (%mul_198, %_to_copy_5, %_to_copy_6, %attention_pos_id), kwargs = {})"
}
metadata_props {
  key: "pkg.torch.onnx.name_scopes"
  value: "[\'\', \'expert\', \'expert.layers.0\', \'expert.layers.0.self_attn\', \'rope_onnx\']"
}
metadata_props {
  key: "pkg.torch.onnx.stack_trace"
  value: "File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 401, in forward\n    hidden, present_ks, present_vs = self.expert(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 318, in forward\n    hidden_states, pk, pv = layer(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 271, in forward\n    hidden_states, present_k, present_v = self.self_attn(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 208, in forward\n    q = rope_onnx(q.to(compute_type), rope_cos, rope_sin, position_ids)"
}
...
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:139: --- End node ---
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:141: ERROR: onnxOpCheckers.cpp:837 In function checkFallbackPluginImporter:
[6] creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:135: While parsing node number 129 [TensorScatter -> "present_k_cache_0"]:
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:138: --- Begin node ---
input: "k_cache_0"
input: "rope_onnx_1"
input: "kvcache_start_index"
output: "present_k_cache_0"
name: "n0_3"
op_type: "TensorScatter"
domain: "trt"
metadata_props {
  key: "namespace"
  value: ": llm_loader.models.alpamayo.modeling_alpamayo_action.AlpamayoAction/expert: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionExpert/expert.layers.0: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionDecoderLayer/expert.layers.0.self_attn: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionAttention/kv_cache_update_onnx: trt.kv_cache_update_onnx.default"
}
metadata_props {
  key: "pkg.torch.onnx.class_hierarchy"
  value: "[\'llm_loader.models.alpamayo.modeling_alpamayo_action.AlpamayoAction\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionExpert\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionDecoderLayer\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionAttention\', \'trt.kv_cache_update_onnx.default\']"
}
metadata_props {
  key: "pkg.torch.onnx.fx_node"
  value: "%kv_cache_update_onnx : [num_users=2] = call_function[target=torch.ops.trt.kv_cache_update_onnx.default](args = (%cache_tensors_0, %rope_onnx_1, %kvcache_start_index), kwargs = {})"
}
metadata_props {
  key: "pkg.torch.onnx.name_scopes"
  value: "[\'\', \'expert\', \'expert.layers.0\', \'expert.layers.0.self_attn\', \'kv_cache_update_onnx\']"
}
metadata_props {
  key: "pkg.torch.onnx.stack_trace"
  value: "File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 401, in forward\n    hidden, present_ks, present_vs = self.expert(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 318, in forward\n    hidden_states, pk, pv = layer(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 271, in forward\n    hidden_states, present_k, present_v = self.self_attn(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 219, in forward\n    present_k = kv_cache_update_onnx(k_cache, k, kvcache_start_index)"
}
...
[15:02:58.755] [ERROR] [TensorRT] ModelImporter.cpp:139: --- End node ---
[15:02:58.755] [ERROR] [TensorRT] ModelImporter.cpp:141: ERROR: onnxOpCheckers.cpp:837 In function checkFallbackPluginImporter:
[6] creator && "Plugin not found, are the plugin name, version, and namespace correct?"
[15:02:58.755] [ERROR] [TensorRT] ModelImporter.cpp:135: While parsing node number 131 [Attention -> "attention_onnx"]:
[15:02:58.755] [ERROR] [TensorRT] ModelImporter.cpp:138: --- Begin node ---
input: "mul_223"
input: "present_k_cache_0"
input: "present_v_cache_0"
input: "masked_fill"
output: "attention_onnx"
name: "n0_5"
op_type: "Attention"
attribute {
  name: "is_causal"
  i: 0
  type: INT
}
attribute {
  name: "TRT_decomposable"
  i: 1
  type: INT
}
attribute {
  name: "scale"
  f: 1
  type: FLOAT
}
domain: "trt"
metadata_props {
  key: "namespace"
  value: ": llm_loader.models.alpamayo.modeling_alpamayo_action.AlpamayoAction/expert: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionExpert/expert.layers.0: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionDecoderLayer/expert.layers.0.self_attn: llm_loader.models.alpamayo.modeling_alpamayo_action.ActionAttention/attention_onnx: trt.attention_onnx.default"
}
metadata_props {
  key: "pkg.torch.onnx.class_hierarchy"
  value: "[\'llm_loader.models.alpamayo.modeling_alpamayo_action.AlpamayoAction\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionExpert\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionDecoderLayer\', \'llm_loader.models.alpamayo.modeling_alpamayo_action.ActionAttention\', \'trt.attention_onnx.default\']"
}
metadata_props {
  key: "pkg.torch.onnx.fx_node"
  value: "%attention_onnx : [num_users=1] = call_function[target=torch.ops.trt.attention_onnx.default](args = (%mul_223, %kv_cache_update_onnx, %kv_cache_update_onnx_1, %masked_fill, False, 1.0), kwargs = {})"
}
metadata_props {
  key: "pkg.torch.onnx.name_scopes"
  value: "[\'\', \'expert\', \'expert.layers.0\', \'expert.layers.0.self_attn\', \'attention_onnx\']"
}
metadata_props {
  key: "pkg.torch.onnx.stack_trace"
  value: "File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 401, in forward\n    hidden, present_ks, present_vs = self.expert(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 318, in forward\n    hidden_states, pk, pv = layer(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 271, in forward\n    hidden_states, present_k, present_v = self.self_attn(\n  File \"/home/user/TensorRT-Edge-LLM/experimental/llm_loader/models/alpamayo/modeling_alpamayo_action.py\", line 223, in forward\n    attn_output = attention_onnx("
}
...

It seems that the op_types "RotaryEmbedding", "TensorScatter", and "Attention" (all in domain "trt") are not registered correctly as plugins?
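For triage, the failing nodes can be pulled out of the parser output with a short script. This is only a sketch: it assumes the error lines keep the `While parsing node number N [OpType -> "output"]` shape seen in the log above, and the helper names `failing_nodes` and `failing_op_types` are made up for illustration; they are not part of TensorRT or TensorRT-Edge-LLM.

```python
import re

# Matches parser error lines such as:
#   ... While parsing node number 126 [RotaryEmbedding -> "rope_onnx"]:
NODE_ERROR = re.compile(
    r'While parsing node number (\d+) \[(\w+) -> "([^"]+)"\]'
)


def failing_nodes(log_text):
    """Return (node_number, op_type, output_name) for each parser error line."""
    return [
        (int(num), op_type, output)
        for num, op_type, output in NODE_ERROR.findall(log_text)
    ]


def failing_op_types(log_text):
    """Return the distinct op types that failed, in first-seen order."""
    seen = []
    for _, op_type, _ in failing_nodes(log_text):
        if op_type not in seen:
            seen.append(op_type)
    return seen


# Three error lines copied from the build log in this issue.
sample = '''
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:135: While parsing node number 126 [RotaryEmbedding -> "rope_onnx"]:
[15:02:58.754] [ERROR] [TensorRT] ModelImporter.cpp:135: While parsing node number 129 [TensorScatter -> "present_k_cache_0"]:
[15:02:58.755] [ERROR] [TensorRT] ModelImporter.cpp:135: While parsing node number 131 [Attention -> "attention_onnx"]:
'''

for number, op_type, output in failing_nodes(sample):
    print(f"node {number}: {op_type} -> {output}")
```

Running it over the full build log (instead of `sample`) would show whether the same three op types fail in every decoder layer or whether other custom ops are affected as well.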

TRT version is:

trtexec -h
...
&&&& PASSED TensorRT.trtexec [TensorRT v101303] [b9] # trtexec
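
The `v101303` tag in the trtexec banner is a packed integer. As a minimal sketch, assuming the TensorRT 10 encoding of major * 10000 + minor * 100 + patch, it can be unpacked as follows (`decode_trt_version` is a hypothetical helper name, not a TensorRT API):

```python
def decode_trt_version(encoded):
    """Split a packed TensorRT 10-style version integer into (major, minor, patch).

    Assumes encoded = major * 10000 + minor * 100 + patch.
    """
    major, rest = divmod(encoded, 10000)
    minor, patch = divmod(rest, 100)
    return major, minor, patch


major, minor, patch = decode_trt_version(101303)
print(f"TensorRT {major}.{minor}.{patch}")
```

Under that assumption the banner above corresponds to TensorRT 10.13.3.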
