HEX is a whole-body vision-language-action framework for full-sized humanoid robots.
Updated May 26, 2026 (Jupyter Notebook)
Towards Versatile Vision-Language-Action Models via Learning Unified Vision-Motion Representations
Visualize episode embeddings and select maximally diverse training subsets for robotics ML. Train on 10K diverse episodes instead of 50K random ones.
Imitation Learning for Surgical Robot Task Automation: Behavioral Cloning, DAgger, Diffusion Policy, and VLA models on JIGSAWS surgical demonstrations
Cross-embodiment visual representation learning using Vision Transformers conditioned on robot kinematic structure via cross-attention.
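One of the projects listed here describes selecting maximally diverse training subsets from episode embeddings. The listing does not say how that selection is done, so the following is only a minimal sketch of one common approach, greedy farthest-point (k-center) selection, and is an assumption rather than that project's method. The function name `farthest_point_subset`, the `start` parameter, and the toy embeddings are all hypothetical.

```python
import math


def farthest_point_subset(embeddings, k, start=0):
    """Greedily pick k indices whose embeddings are spread far apart.

    Hypothetical helper for illustration; not taken from any listed repository.
    embeddings: sequence of equal-length numeric vectors, one per episode.
    """
    n = len(embeddings)
    if not 0 < k <= n:
        raise ValueError("k must be between 1 and the number of embeddings")

    selected = [start]
    # Distance from every episode to its nearest already-selected episode.
    min_dist = [math.dist(e, embeddings[start]) for e in embeddings]

    while len(selected) < k:
        # Take the episode that is farthest from everything chosen so far.
        nxt = max(range(n), key=min_dist.__getitem__)
        selected.append(nxt)
        # Refresh nearest-selected distances against the new pick.
        for i, e in enumerate(embeddings):
            d = math.dist(e, embeddings[nxt])
            if d < min_dist[i]:
                min_dist[i] = d
    return selected


# Toy 2-D "episode embeddings": two near-duplicates and two distinct points.
toy = [(0.0, 0.0), (0.1, 0.0), (10.0, 0.0), (5.0, 0.0)]
print(farthest_point_subset(toy, k=3))  # [0, 2, 3]
```

In this toy run the near-duplicate at index 1 is skipped, which is the intended effect: redundant episodes are chosen last. The sketch costs O(n·k) distance evaluations, so a vectorized or approximate variant would likely be needed at the scale of tens of thousands of episodes.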