Add ViT Attention Plugin Support for Qwen, Mllama, and SigLIP Visual Models #4241
micwill755 wants to merge 4 commits into
Conversation
narendasan left a comment
Are we able to use: https://huggingface.co/docs/transformers/v5.5.0/en/serialization#exporting-to-production to avoid as much patching on the model side?
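A rough sketch of what that export route might look like, with caveats: the checkpoint name and the `vision_model` entry point are illustrative assumptions, and this is exactly the step that may still require patching for some of these models.

```python
# Illustrative only: try torch.export on the HF vision tower directly,
# rather than patching attention modules by hand.
import torch
from transformers import AutoModel

# Assumed checkpoint; any SigLIP-style vision model would do for the sketch.
vision = AutoModel.from_pretrained("google/siglip-base-patch16-224").vision_model.eval()
pixel_values = torch.randn(1, 3, 224, 224)
exported = torch.export.export(vision, (pixel_values,))
print(exported.graph_module.graph)
```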
…into TensorRT-Edge-LLM under vitAttentionKernels
narendasan left a comment
How does the new plugin operator get inserted into the graph?
```python
position_ids = torch.arange(input_embeds.shape[1]).unsqueeze(0).to(device)

use_fp32_acc = False
use_explicit_typing = False
```
Enabled precision is deprecated in TRT 10.16 and will be removed in the next version, so we don't need this code path.
Will do. I'll clean this up by removing it.
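For context, a hedged sketch of the strongly typed compile path that makes the removed flags unnecessary; the module here is a trivial stand-in, and the kwargs follow the Torch-TensorRT Dynamo settings as I understand them:

```python
import torch
import torch_tensorrt

# Placeholder stand-in for the ViT; precision lives in the module weights.
model = torch.nn.Sequential(torch.nn.Linear(64, 64)).half().cuda().eval()
inputs = [torch.randn(8, 64, dtype=torch.half, device="cuda")]

trt_model = torch_tensorrt.compile(
    model,
    ir="dynamo",
    inputs=inputs,
    use_explicit_typing=True,  # strongly typed engine; no enabled_precisions needed
)
```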
It follows the same pattern as the existing `AttentionPlugin` integration. At a high level, we insert a Torch custom op into the Dynamo graph by wrapping/replacing the model attention module. That custom op is only a graph marker on the PyTorch side. During Torch-TensorRT conversion, the registered converter lowers that marker to the real TensorRT plugin layer by looking up the plugin creator and calling `add_plugin_v2`.
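For reference, a minimal sketch of that marker-op + converter pattern. Heavy caveats: the op name `trt::vit_attention`, the `ViTAttentionPlugin` creator name/version, the `num_heads` field, and the exact converter import paths are illustrative assumptions, not this PR's actual identifiers.

```python
import numpy as np
import tensorrt as trt
import torch
from torch.fx.node import Target
from torch_tensorrt.dynamo.conversion import (
    ConversionContext,
    dynamo_tensorrt_converter,
)


# 1) Graph-side marker: a custom op that falls back to SDPA in eager mode,
# so the Dynamo graph carries a single recognizable node.
@torch.library.custom_op("trt::vit_attention", mutates_args=())
def vit_attention(q: torch.Tensor, k: torch.Tensor, v: torch.Tensor) -> torch.Tensor:
    return torch.nn.functional.scaled_dot_product_attention(q, k, v)


@vit_attention.register_fake
def _(q, k, v):
    # Attention output has the same shape as the query.
    return torch.empty_like(q)


# 2) Converter-side lowering: replace the marker with the TensorRT plugin layer.
@dynamo_tensorrt_converter(torch.ops.trt.vit_attention.default)
def convert_vit_attention(
    ctx: ConversionContext,
    target: Target,
    args: tuple,
    kwargs: dict,
    name: str,
):
    registry = trt.get_plugin_registry()
    # Plugin creator name/version are assumptions for this sketch.
    creator = registry.get_plugin_creator("ViTAttentionPlugin", "1", "")
    num_heads = trt.PluginField(
        "num_heads", np.array([16], dtype=np.int32), trt.PluginFieldType.INT32
    )
    plugin = creator.create_plugin(name, trt.PluginFieldCollection([num_heads]))
    layer = ctx.net.add_plugin_v2(list(args), plugin)
    return layer.get_output(0)
```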
Summary
This PR adds ViT attention plugin integration and validation support to the TensorRT Dynamo examples/tooling path. It wires `ViTAttentionPlugin` conversion through the Torch-TensorRT/Dynamo flow, supports Qwen-style packed/windowed attention metadata via `cu_seqlens` and `max_seq_len`, and adds end-to-end visual model validation for Qwen2.5-VL, Llama 3.2 Vision/Mllama, and GR00T/Eagle/SigLIP-style models.
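A hedged sketch of the packed-attention metadata mentioned above, assuming the flash-attention varlen convention that Qwen-style models use: `cu_seqlens` is a 0-prefixed int32 cumulative sum of per-segment token counts, and `max_seq_len` bounds the longest segment. The segment lengths here are made up.

```python
import torch

def packed_attention_metadata(seq_lens: torch.Tensor):
    # cu_seqlens[i] is the start offset of segment i in the packed batch;
    # cu_seqlens[-1] is the total token count.
    cu_seqlens = torch.zeros(seq_lens.numel() + 1, dtype=torch.int32)
    cu_seqlens[1:] = torch.cumsum(seq_lens, dim=0)
    return cu_seqlens, int(seq_lens.max())

# e.g. three images whose patch sequences were packed into one batch
seq_lens = torch.tensor([1024, 256, 576], dtype=torch.int32)
cu_seqlens, max_seq_len = packed_attention_metadata(seq_lens)
# cu_seqlens == tensor([0, 1024, 1280, 1856], dtype=torch.int32); max_seq_len == 1024
```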
Changes
Testing