Describe the bug
tensorrt-edgellm-export-llm --model_dir /workspace/models/Qwen3-30B-A3B-GPTQ-Int4 --output_dir $MODEL_NAME/onnx --device cpu
Skipping import of cpp extensions due to incompatible torch version. Please upgrade to torch >= 2.11.0 (found 2.10.0+cu128).
/workspace/venv/tesorrt-edge-llm/lib/python3.10/site-packages/modelopt/torch/__init__.py:36: UserWarning: transformers version 5.8.0 is not tested with nvidia-modelopt and may cause issues. Please install recommended version with pip install nvidia-modelopt[hf] if working with HF models.
_warnings.warn(
[transformers] Qwen2VLImageProcessorFast is deprecated. The Fast suffix for image processors has been removed; use Qwen2VLImageProcessor instead.
ModelOpt save/restore enabled for transformers library.
ModelOpt save/restore enabled for peft library.
ModelOpt save/restore enabled for transformers library.
ModelOpt save/restore enabled for peft library.
Exporting standard model to ONNX format
Loading standard model from /workspace/models/Qwen3-30B-A3B-GPTQ-Int4
Loading GPTQ MoE model from /workspace/models/Qwen3-30B-A3B-GPTQ-Int4. You might see warnings saying 'Some weights of the model checkpoint at Qwen/Qwen3-30B-A3B-GPTQ-Int4 were not used when initializing Qwen3MoeForCausalLM', which is expected. The weights will be fixed automatically afterwards.
[transformers] torch_dtype is deprecated! Use dtype instead!
WARN Python GIL is enabled: Multi-gpu quant acceleration for MoE models is sub-optimal and multi-core accelerated cpu packing is also disabled. We recommend Python >= 3.13.3t with Pytorch > 2.8 for mult-gpu quantization and multi-cpu packing with env PYTHON_GIL=0.
WARN Feature utils/Perplexity requires Python < 3.14 and Python GIL enabled and Python >= 3.13.3T (T for Threading-Free edition of Python) plus Torch 2.8. Feature is currently skipped/disabled.
INFO ENV: Auto setting PYTORCH_ALLOC_CONF='expandable_segments:True,max_split_size_mb:256,garbage_collection_threshold:0.7' for memory saving.
INFO ENV: Auto setting CUDA_DEVICE_ORDER=PCI_BUS_ID for correctness.
INFO
[ASCII-art banner omitted]
[W508 02:32:55.984626263 Context.cpp:424] Warning: torch.backends.cuda.preferred_linalg_library is an experimental feature. If you see any error or unexpected behavior when this flag is set please file an issue on GitHub. (function operator())
/workspace/venv/tesorrt-edge-llm/lib/python3.10/site-packages/kernels/utils.py:401: FutureWarning: Future versions of kernels (>=0.15) will require specifying a kernel version or revision. See: https://huggingface.co/docs/kernels/migration
revision = select_revision_or_version(repo_id, revision=revision, version=version)
'[Errno 101] Network is unreachable' thrown while requesting HEAD https://huggingface.co/kernels/kernels-community/quantization_gptq/resolve/main/kernel-status.toml
WARNING:huggingface_hub.utils._http:'[Errno 101] Network is unreachable' thrown while requesting HEAD https://huggingface.co/kernels/kernels-community/quantization_gptq/resolve/main/kernel-status.toml
Retrying in 1s [Retry 1/5].
WARNING:huggingface_hub.utils._http:Retrying in 1s [Retry 1/5].
Failed to load CPU gemm_4bit kernel: Cannot send a request, as the client has been closed.. Use fallback path. Please make sure you already pip install kernels and the kernels >= 0.11.1
INFO Kernel: Auto-selection: adding candidate TorchFusedQuantLinear
INFO Kernel: selected -> TorchFusedQuantLinear.
[transformers] loss_type=None was set in the config but it is unrecognized. Using the default loss: ForCausalLMLoss.
Loading weights: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 963/963 [00:00<00:00, 1833.36it/s]
[transformers] Qwen3MoeForCausalLM LOAD REPORT from: /workspace/models/Qwen3-30B-A3B-GPTQ-Int4
Key | Status | Details
--------------------------------------------------------------+------------+--------
model.layers.{0...47}.mlp.experts.{0...127}.down_proj.g_idx | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.up_proj.scales | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.up_proj.qweight | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.gate_proj.qweight | UNEXPECTED |
model.layers.{0...47}.mlp.gate.g_idx | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.gate_proj.qzeros | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.down_proj.qzeros | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.gate_proj.scales | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.down_proj.scales | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.down_proj.qweight | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.up_proj.qzeros | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.gate_proj.g_idx | UNEXPECTED |
model.layers.{0...47}.mlp.experts.{0...127}.up_proj.g_idx | UNEXPECTED |
model.layers.{0...47}.mlp.gate.qweight | UNEXPECTED |
model.layers.{0...47}.mlp.gate.scales | UNEXPECTED |
model.layers.{0...47}.mlp.gate.qzeros | UNEXPECTED |
model.layers.{0...47}.mlp.gate.weight | MISSING |
model.layers.{0...47}.mlp.experts.down_proj | MISSING |
model.layers.{0...47}.mlp.experts.gate_up_proj | MISSING |
Notes:
- UNEXPECTED: can be ignored when loading from different task/architecture; not ok if you expect identical arch.
- MISSING: those params were newly initialized because missing from the checkpoint. Consider training on your downstream task.
INFO QuantizeConfig: offload_to_disk_path auto set to ./gptqmodel_offload/hqdpgrum-rkaakpxx/
INFO Format: Converting checkpoint_format from gptq to internal gptq_v2.
INFO Format: Converting GPTQ v1 to v2
INFO Optimize: TorchFusedQuantLinear compilation triggered.
INFO gc.collect() reclaimed 10 objects in 0.226s
Warning: No gate weights found in checkpoint
GPTQ load dtype normalization: cast 0 params and 2 buffers to torch.float16; skipped 192 GPTQ quantized modules.
Warning: Loaded processor from /workspace/models/Qwen3-30B-A3B-GPTQ-Int4. The processor will skip image processing for images smaller than 128x28x28 or bigger than 2048x32x32 due to excessive memory usage during image quantization.
=== Exporting model ===
Detected MoE model, replacing MoE blocks with Int4MoePlugin
Registered ONNX symbolic functions for custom Int4MoePlugin
Error during LLM model export: 'Qwen3MoeSparseMoeBlock' object has no attribute 'num_experts'
Traceback:
Traceback (most recent call last):
  File "/workspace/TensorRT-Edge-LLM/tensorrt_edgellm/scripts/export_llm.py", line 113, in main
    export_llm_model(model_dir=args.model_dir,
  File "/workspace/TensorRT-Edge-LLM/tensorrt_edgellm/onnx_export/llm_export.py", line 1023, in export_llm_model
    model = replace_moe_blocks_with_plugin(model)
  File "/workspace/TensorRT-Edge-LLM/tensorrt_edgellm/llm_models/layers/int4_moe_plugin.py", line 544, in replace_moe_blocks_with_plugin
    new_module = Int4MoePluginModule(module, group_size)
  File "/workspace/TensorRT-Edge-LLM/tensorrt_edgellm/llm_models/layers/int4_moe_plugin.py", line 429, in __init__
    self.num_experts = moe_block.num_experts
  File "/workspace/venv/tesorrt-edge-llm/lib/python3.10/site-packages/torch/nn/modules/module.py", line 1965, in __getattr__
    raise AttributeError(
AttributeError: 'Qwen3MoeSparseMoeBlock' object has no attribute 'num_experts'
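A possible local workaround for the AttributeError above, offered as a sketch and not as the project's fix: instead of reading `moe_block.num_experts` directly, look the expert count up in several places. The helper name `resolve_num_experts` is hypothetical, and the fallback locations (`experts.num_experts`, a list-like `experts` container, `config.num_experts` / `config.num_local_experts`) are assumptions that have not been checked against the installed transformers version. The demo uses plain stand-in objects, not real model classes.

```python
from types import SimpleNamespace


def resolve_num_experts(moe_block, config=None):
    """Best-effort lookup of the expert count for a sparse-MoE block.

    Hypothetical helper: every fallback location below is an assumption.
    """
    # 1. Attribute directly on the block (the layout the plugin code expects).
    value = getattr(moe_block, "num_experts", None)
    if isinstance(value, int):
        return value

    experts = getattr(moe_block, "experts", None)

    # 2. A fused-experts container that carries its own count (assumed name).
    value = getattr(experts, "num_experts", None)
    if isinstance(value, int):
        return value

    # 3. A list-like container holding one module per expert.
    if experts is not None and hasattr(experts, "__len__"):
        return len(experts)

    # 4. The model config, if the caller passes one in (assumed field names).
    for name in ("num_experts", "num_local_experts"):
        value = getattr(config, name, None)
        if isinstance(value, int):
            return value

    raise AttributeError("could not determine the number of experts")


# Stand-in objects that mimic the layouts handled above.
old_block = SimpleNamespace(num_experts=128)
fused_block = SimpleNamespace(experts=SimpleNamespace(num_experts=128))
list_block = SimpleNamespace(experts=[object(), object(), object(), object()])
bare_block = SimpleNamespace()
cfg = SimpleNamespace(num_experts=128)

print(resolve_num_experts(old_block))
print(resolve_num_experts(fused_block))
print(resolve_num_experts(list_block))
print(resolve_num_experts(bare_block, cfg))
```

If a lookup like this were used in `Int4MoePluginModule`, other attributes the plugin reads from the block might need the same treatment; that has not been verified here.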
Expected behavior
System information (x86 Host with GPU)
TensorRT Edge-LLM: 0.7.0
Package Version
accelerate 1.13.0
aiohappyeyeballs 2.6.1
aiohttp 3.13.5
aiosignal 1.4.0
annotated-doc 0.0.4
annotated-types 0.7.0
anyio 4.13.0
async-timeout 5.0.1
attrs 26.1.0
audioread 3.1.0
backoff 2.2.1
certifi 2026.4.22
cffi 2.0.0
charset-normalizer 3.4.7
click 8.3.3
coloredlogs 15.0.1
cppimport 26.4.17
cuda-bindings 12.9.4
cuda-pathfinder 1.5.4
cupy-cuda12x 14.0.1
datasets 4.4.2
decorator 5.2.1
Device-SMI 0.5.6
dill 0.4.0
einops 0.8.2
exceptiongroup 1.3.1
filelock 3.29.0
flatbuffers 25.12.19
frozenlist 1.8.0
fsspec 2025.10.0
GPTQModel 5.7.0
h11 0.16.0
hf_transfer 0.1.9
hf-xet 1.5.0
httpcore 1.0.9
httpx 0.28.1
huggingface_hub 1.14.0
humanfriendly 10.0
idna 3.13
Jinja2 3.1.6
joblib 1.5.3
kernels 0.14.0
kernels-data 0.14.0
lazy-loader 0.5
librosa 0.11.0
llvmlite 0.47.0
LogBar 0.4.3
Mako 1.3.12
markdown-it-py 4.1.0
MarkupSafe 3.0.3
maturin 1.13.1
mdurl 0.1.2
ml_dtypes 0.5.4
mpmath 1.3.0
msgpack 1.1.2
multidict 6.7.1
multiprocess 0.70.18
networkx 3.4.2
ninja 1.13.0
numba 0.65.1
numpy 2.2.6
nvidia-cublas-cu12 12.8.4.1
nvidia-cuda-cupti-cu12 12.8.90
nvidia-cuda-nvrtc-cu12 12.8.93
nvidia-cuda-runtime-cu12 12.8.90
nvidia-cudnn-cu12 9.10.2.21
nvidia-cufft-cu12 11.3.3.83
nvidia-cufile-cu12 1.13.1.3
nvidia-curand-cu12 10.3.9.90
nvidia-cusolver-cu12 11.7.3.90
nvidia-cusparse-cu12 12.5.8.93
nvidia-cusparselt-cu12 0.7.1
nvidia-ml-py 13.595.45
nvidia-modelopt 0.39.0
nvidia-nccl-cu12 2.27.5
nvidia-nvjitlink-cu12 12.8.93
nvidia-nvshmem-cu12 3.4.5
nvidia-nvtx-cu12 12.8.90
onnx 1.19.0
onnx_graphsurgeon 0.6.1
onnx-ir 0.2.1
onnxconverter-common 1.16.0
onnxruntime-gpu 1.22.0
onnxscript 0.7.0
onnxsim 0.6.3
optimum 2.1.0
packaging 26.2
pandas 2.3.3
peft 0.18.1
pillow 12.1.1
pip 22.0.2
platformdirs 4.9.6
polygraphy 0.49.26
pooch 1.9.0
propcache 0.4.1
protobuf 7.34.1
psutil 7.2.2
PuLP 3.3.1
pyarrow 24.0.0
pybind11 3.0.4
pycparser 3.0
pydantic 2.13.4
pydantic_core 2.46.4
Pygments 2.20.0
PyPcre 0.3.2
python-dateutil 2.9.0.post0
pytz 2026.2
PyYAML 6.0.3
regex 2026.4.4
requests 2.33.1
rich 15.0.0
safetensors 0.7.0
scikit-learn 1.7.2
scipy 1.15.3
sentencepiece 0.2.1
setuptools 59.6.0
shellingham 1.5.4
six 1.17.0
soundfile 0.13.1
soxr 1.1.0
sympy 1.14.0
tensorrt-edgellm 0.7.0
threadpoolctl 3.6.0
tiktoken 0.12.0
TokeNicer 0.0.13
tokenizers 0.22.2
tomli 2.4.1
tomlkit 0.14.0
torch 2.10.0
torchao 0.17.0
torchprofile 0.1.0
torchvision 0.25.0
tqdm 4.67.3
transformers 5.3.0
triton 3.6.0
typer 0.25.1
typing_extensions 4.15.0
typing-inspection 0.4.2
tzdata 2026.2
urllib3 2.6.3
xxhash 3.7.0
yarl 1.23.0