Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4 (LLM part) failed to convert to TensorRT Engine #89

@BitCircuit

Describe the bug

When converting the LLM part of Nemotron Omni from ONNX to a TensorRT engine on the target platform, the build fails with the following error:

...
[07:48:03.497] [INFO] [TensorRT] Successfully created plugin: Nvfp4MoePlugin
[07:48:03.516] [INFO] [llmBuilder.cpp:118:build] Created directory Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/omni_engine_nvfp4.trtedgellm070/llm for saving LLM engine.
[07:48:03.546] [ERROR] [TensorRT] IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (INT8 and FP8 mixed precision is allowed only when building network with kSTRONGLY_TYPED mode on Blackwell+ platforms.)
[07:48:03.546] [ERROR] [builderUtils.cpp:313:buildAndSerializeEngine] Failed to build serialized engine
[07:48:04.430] [ERROR] [llm_build.cpp:242:main] Failed to build LLM engine.
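
In a longer build log, the first ERROR entry is usually the one worth reading, since the later ones only report that the build gave up. A minimal sketch for pulling out the error entries, assuming every line follows the `[time] [LEVEL] [source] message` layout seen in the excerpt above (the helper name `error_entries` is hypothetical and not part of TensorRT Edge-LLM):

```python
import re

# Assumed line layout, taken from the excerpt above:
# [07:48:03.546] [ERROR] [TensorRT] IBuilder::buildSerializedNetwork: ...
LOG_LINE = re.compile(
    r"^\[(?P<time>[\d:.]+)\] \[(?P<level>[A-Z]+)\] \[(?P<source>[^\]]+)\] (?P<message>.*)$"
)


def error_entries(log_text):
    """Return (time, source, message) tuples for every ERROR line, in log order."""
    entries = []
    for line in log_text.splitlines():
        match = LOG_LINE.match(line.strip())
        if match and match.group("level") == "ERROR":
            entries.append(
                (match.group("time"), match.group("source"), match.group("message"))
            )
    return entries
```

Applied to the excerpt above, the first entry returned is the TensorRT `IBuilder::buildSerializedNetwork` API Usage Error; the two entries after it come from `builderUtils.cpp` and `llm_build.cpp` and only report the failed build.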

Steps/Code to reproduce bug

  1. Download the model from https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4
  2. In config.json, change model_type from NemotronH_Nano_Omni_Reasoning_V3 to NemotronH_Nano_VL_V2
  3. Convert the model to ONNX using the experimental llm_loader
  4. Copy the model and the cross-compiled TensorRT Edge-LLM binaries to the NVIDIA Drive Thor U (DriveOS 7.0.3)
  5. Set /proc/sys/vm/nr_hugepages to 16384
  6. Run ./build/examples/llm/llm_build --onnxDir Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/omni_onnx_nvfp4.trtedgellm070/llm/ --engineDir Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/omni_engine_nvfp4.trtedgellm070/llm
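
The config.json edit in step 2 can be scripted so that it is repeatable. The following is a minimal sketch, assuming model_type is a top-level key in the downloaded config.json; the helper name `set_model_type` is hypothetical and is not part of llm_loader or TensorRT Edge-LLM:

```python
import json
from pathlib import Path


def set_model_type(config_path, expected, replacement):
    """Rewrite the top-level "model_type" field of a config.json file.

    Returns True if the file was changed and False if it already held
    `replacement`. Raises ValueError if the current value is neither of
    the two, so an unexpected checkpoint is not modified silently.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    current = config.get("model_type")  # assumption: top-level key
    if current == replacement:
        return False
    if current != expected:
        raise ValueError(f"unexpected model_type {current!r} in {path}")
    config["model_type"] = replacement
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return True
```

For this report the call would be `set_model_type("Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/config.json", "NemotronH_Nano_Omni_Reasoning_V3", "NemotronH_Nano_VL_V2")`, where the path assumes the model was downloaded into a directory named after the repository.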

Build configuration:

cmake .. \
    -DCMAKE_BUILD_TYPE=Release \
    -DTRT_PACKAGE_DIR=/usr \
    -DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
    -DEMBEDDED_TARGET=auto-thor \
    -DCUDA_CTK_VERSION=12.8

Expected behavior

The LLM engine build should complete, and I should then be able to run this omni model with llm_inference.

System information (Edge Device)

  • Platform (e.g., NVIDIA Jetson Thor): Drive Thor U
  • Software release (e.g., JetPack 7.1): DriveOS 7.0.3
  • CPU architecture: aarch64
  • GPU compute capability (e.g., SM110 for Jetson Thor): SM101
  • Total device memory: 59045MB
  • Build type (e.g., Release, Debug): Release
  • Library versions:
    • TensorRT Edge-LLM version or commit hash: 0.7.0
    • CUDA: 12.8
    • TensorRT: 10.10
    • C++ compiler (e.g., GCC 11.4): GCC 13.3 (x86 docker dev env)

Labels: bug
