SIMART is a unified MLLM framework that performs part-level decomposition and kinematic prediction jointly to transform monolithic meshes into sim-ready articulated assets.
- Unified MLLM Framework: Offers a single-stage path to joint static asset understanding and sim-ready asset generation.
- Sparse 3D VQ-VAE: Reduces token counts by 70% compared to dense voxel tokens, enabling high-fidelity multi-part assemblies.
- Sim-Ready Assets: Generates structured URDF metadata and decomposed segments, enabling deployment into physics-based simulators and interactive robotic environments.
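For readers new to URDF, the sketch below shows the kind of link and joint metadata a URDF file carries and how to list its joints with the Python standard library. The embedded URDF is a hand-written, hypothetical example of a box with a hinged lid, not actual SIMART output, and `list_joints` is an illustrative helper rather than part of this repository.

```python
import xml.etree.ElementTree as ET

# Hypothetical URDF for a box with a hinged lid (illustration only,
# not produced by SIMART).
EXAMPLE_URDF = """
<robot name="box_00">
  <link name="base"/>
  <link name="lid"/>
  <joint name="lid_hinge" type="revolute">
    <parent link="base"/>
    <child link="lid"/>
    <origin xyz="0 0.2 0.4" rpy="0 0 0"/>
    <axis xyz="1 0 0"/>
    <limit lower="0" upper="1.57" effort="10" velocity="1"/>
  </joint>
</robot>
"""


def list_joints(urdf_text):
    """Return one dict per <joint> element: name, type, parent, child, axis."""
    root = ET.fromstring(urdf_text.strip())
    joints = []
    for joint in root.findall("joint"):
        axis = joint.find("axis")
        joints.append({
            "name": joint.get("name"),
            "type": joint.get("type"),
            "parent": joint.find("parent").get("link"),
            "child": joint.find("child").get("link"),
            # <axis> is optional in URDF, so guard against its absence.
            "axis": tuple(float(v) for v in axis.get("xyz").split())
            if axis is not None else None,
        })
    return joints


for j in list_joints(EXAMPLE_URDF):
    print(j["name"], j["type"], j["parent"], "->", j["child"], j["axis"])
```

The same helper can be pointed at the text of any URDF file to check quickly which parts move and about which axis.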
Our implementation is tested on Python 3.10.
conda create -n simart python=3.10
conda activate simart
pip install -r requirements.txt

Download the pre-trained checkpoints for the MLLM and the VQ-VAE from Hugging Face.
Place the downloaded weights in the ./checkpoints directory, or specify custom paths using the inference arguments below.
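As a quick sanity check before running inference, a small script along these lines can confirm that both checkpoint directories are in place. It assumes the default locations listed among the inference arguments (./checkpoints/simart_mllm and ./checkpoints/simart_vqvae); `missing_checkpoints` is a hypothetical helper, not part of this repository, and it only checks that the directories exist, not that their contents are complete.

```python
from pathlib import Path

# Sub-directory names taken from the default --model_path and
# --vqvae_ckpt_dir values documented in this README.
DEFAULT_CHECKPOINT_DIRS = ("simart_mllm", "simart_vqvae")


def missing_checkpoints(root="./checkpoints"):
    """Return the expected checkpoint directories that do not exist under root."""
    root = Path(root)
    return [
        str(root / name)
        for name in DEFAULT_CHECKPOINT_DIRS
        if not (root / name).is_dir()
    ]


missing = missing_checkpoints()
if missing:
    print("Missing checkpoint directories:", ", ".join(missing))
else:
    print("All default checkpoint directories found.")
```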
Our model is trained on 3D assets that follow a right-handed coordinate system:
- Up Direction: +Z
- Forward Direction: -Y (or +Y, but consistency is key for part orientation)
Pre-aligned Models: If your models are generated by Seed3D or Hunyuan3D, they are typically pre-aligned to the +Z up convention. You can run the normalization script directly without additional rotation arguments:
python scripts/process_raw_objects.py --input <object_path> --output ./assets --render

Manual Alignment: For models from other sources that may use +Y up, you must use the rotation flags to align them.
- Important: Beyond just the "Up" direction, ensure the "Front" of the object faces the intended direction to help the MLLM correctly identify parts like "front legs" or "handles".
- Reference: Refer to the processed models in the assets/ directory for the standard orientation.
Arguments:
- --input: Path to the input raw object (.glb).
- --output: Output directory for the normalized model.
- --rot_x, --rot_y, --rot_z: Rotation angles in degrees to align the mesh.
- --render: Highly recommended. Renders a preview image so you can verify that the object stands upright and faces forward.
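To build intuition for which rotation a +Y-up model needs, the geometry can be sketched in a few lines of plain Python. This is only an illustration of a right-handed rotation about the X axis; it is not the script's implementation, and the sign convention the script uses for --rot_x is not confirmed here, so always check the --render preview. `rotate_about_x` is a hypothetical helper.

```python
import math


def rotate_about_x(point, degrees):
    """Rotate a 3D point about the X axis (right-handed, counter-clockwise)."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    x, y, z = point
    return (x, c * y - s * z, s * y + c * z)


def rounded(point, ndigits=6):
    """Round coordinates and normalize -0.0 to 0.0 for readable printing."""
    return tuple(round(v, ndigits) + 0.0 for v in point)


# In a +Y-up model, "up" is (0, 1, 0). A +90 degree rotation about X
# maps it onto +Z, the up direction expected here.
print(rounded(rotate_about_x((0, 1, 0), 90)))

# The same rotation maps a front that faces +Z onto -Y.
print(rounded(rotate_about_x((0, 0, 1), 90)))
```

Under these assumptions, a +Y-up model whose front faces +Z lands in the +Z-up, -Y-forward orientation after a single 90 degree turn about X. Models with a different front direction need an additional rotation about the up axis.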
To predict the articulated structure and generate the URDF of a processed 3D model, run the main inference pipeline:
python inference/infer.py --object_path ./assets/box_00.glb --debug

Arguments:
- --object_path: Path to the object file or a folder containing multiple GLBs (Required).
- --output_path: Directory to save outputs (Default: ./output/raw).
- --name: Base name for outputs (JSON, URDF, PLY, folders). If not provided, it is derived from the object_path.
- --model_path: Path to the trained MLLM checkpoint directory (Default: ./checkpoints/simart_mllm).
- --vqvae_ckpt_dir: Path to the VQ-VAE checkpoint directory (Default: ./checkpoints/simart_vqvae).
- --blender_path: Custom path to the Blender executable. If not provided, it auto-downloads to /tmp.
- --debug: Enable debug mode to output intermediate visualizations (colored PLY files, joint axes, etc.).
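When calling the pipeline from another Python program, it can help to assemble the command line programmatically. The sketch below uses only the flags documented above; `build_infer_command` is a hypothetical helper, not part of this repository, and it assumes the command is run from the repository root with `python` on the PATH.

```python
import shlex


def build_infer_command(object_path, output_path=None, name=None,
                        model_path=None, vqvae_ckpt_dir=None, debug=False):
    """Assemble the argument list for inference/infer.py.

    Optional flags are added only when a value is given, so the
    script's own defaults apply otherwise.
    """
    cmd = ["python", "inference/infer.py", "--object_path", str(object_path)]
    optional = {
        "--output_path": output_path,
        "--name": name,
        "--model_path": model_path,
        "--vqvae_ckpt_dir": vqvae_ckpt_dir,
    }
    for flag, value in optional.items():
        if value is not None:
            cmd += [flag, str(value)]
    if debug:
        cmd.append("--debug")
    return cmd


cmd = build_infer_command("./assets/box_00.glb", debug=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`; passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.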
SIMART/
├── assets/ # Sample 3D GLB assets
├── blender_script/ # Scripts for headless Blender rendering
├── inference/ # Main MLLM inference pipeline
├── scripts/ # Data preprocessing scripts
├── utils/ # Modular utility functions (mesh, URDF, parsing, etc.)
└── vqvae/ # Sparse VQ-VAE model definitions
This project is licensed under the Apache License 2.0.
If you find our work helpful, please cite it as:
@article{zhang2026simart,
title={SIMART: Decomposing Monolithic Meshes into Sim-ready Articulated Assets via MLLM},
author={Zhang, Chuanrui and Qin, Minghan and Wang, Yuang and Xie, Baifeng and Li, Hang and Wang, Ziwei},
journal={arXiv preprint arXiv:2603.23386},
year={2026}
}


