Describe the bug
During conversion of Nemotron Omni from ONNX to a TensorRT engine on the target platform, the build fails with the following log:
...
[07:48:03.497] [INFO] [TensorRT] Successfully created plugin: Nvfp4MoePlugin
[07:48:03.516] [INFO] [llmBuilder.cpp:118:build] Created directory Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/omni_engine_nvfp4.trtedgellm070/llm for saving LLM engine.
[07:48:03.546] [ERROR] [TensorRT] IBuilder::buildSerializedNetwork: Error Code 9: API Usage Error (INT8 and FP8 mixed precision is allowed only when building network with kSTRONGLY_TYPED mode on Blackwell+ platforms.)
[07:48:03.546] [ERROR] [builderUtils.cpp:313:buildAndSerializeEngine] Failed to build serialized engine
[07:48:04.430] [ERROR] [llm_build.cpp:242:main] Failed to build LLM engine.
Steps/Code to reproduce bug
- Download model from https://huggingface.co/nvidia/Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4
- Change model_type in config.json from NemotronH_Nano_Omni_Reasoning_V3 to NemotronH_Nano_VL_V2
- Convert the model to ONNX using the experimental llm_loader
- Copy the model and the cross-compiled TensorRT Edge-LLM binaries to the NVIDIA Drive Thor U target (DriveOS 7.0.3)
- Set /proc/sys/vm/nr_hugepages to 16384
- Run ./build/examples/llm/llm_build --onnxDir Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/omni_onnx_nvfp4.trtedgellm070/llm/ --engineDir Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/omni_engine_nvfp4.trtedgellm070/llm
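For anyone reproducing this, the config.json edit and the huge-page setting from the steps above could be scripted roughly as follows. This is only a sketch: the helper names patch_model_type and set_hugepages are hypothetical, GNU sed is assumed, and it assumes the model_type key sits on its own line in the config.json at the top level of the downloaded model directory.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for the reproduction steps; not part of TensorRT Edge-LLM.

# Rewrite model_type in the given config.json (assumes GNU sed and that the
# key/value pair sits on a single line).
patch_model_type() {
    local cfg="$1"
    sed -i -E \
        's/("model_type"[[:space:]]*:[[:space:]]*)"NemotronH_Nano_Omni_Reasoning_V3"/\1"NemotronH_Nano_VL_V2"/' \
        "$cfg"
}

# Reserve huge pages on the target. Writing to /proc needs root; the second
# argument exists only so the target file can be overridden for a dry run.
set_hugepages() {
    local count="$1"
    local target="${2:-/proc/sys/vm/nr_hugepages}"
    echo "$count" > "$target"
}
```

Usage would look like `patch_model_type Nemotron-3-Nano-Omni-30B-A3B-Reasoning-NVFP4/config.json` on the host before ONNX export, and `set_hugepages 16384` as root on the target before running llm_build.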
Build configuration:
cmake .. \
-DCMAKE_BUILD_TYPE=Release \
-DTRT_PACKAGE_DIR=/usr \
-DCMAKE_TOOLCHAIN_FILE=cmake/aarch64_linux_toolchain.cmake \
-DEMBEDDED_TARGET=auto-thor \
-DCUDA_CTK_VERSION=12.8
Expected behavior
The LLM engine build should complete successfully, so that the Omni model can then be run with llm_inference.
System information (Edge Device)
- Platform (e.g., NVIDIA Jetson Thor): Drive Thor U
- Software release (e.g., JetPack 7.1): DriveOS 7.0.3
- CPU architecture: aarch64
- GPU compute capability (e.g., SM110 for Jetson Thor): SM101
- Total device memory: 59045MB
- Build type (e.g., Release, Debug): Release
- Library versions:
- TensorRT Edge-LLM version or commit hash: 0.7.0
- CUDA: 12.8
- TensorRT: 10.10
- C++ compiler (e.g., GCC 11.4): GCC 13.3 (x86 docker dev env)