Large-scale 4D parallelism pre-training for 🤗 transformers in Mixture of Experts *(still a work in progress)*
Updated Dec 14, 2023 - Python
Trains a 7B-parameter GPT model using NVIDIA Megatron-LM with full 3D parallelism across a 64-GPU InfiniBand cluster. Communication is profiled at multiple levels: PyTorch Profiler traces, Nsight Systems captures, a dedicated NCCL C++ benchmark, and a Rust GPU memory monitor.
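As a rough illustration of what "3D parallelism across 64 GPUs" means, the sketch below factors a global rank into data-, pipeline-, and tensor-parallel coordinates. The sizes (tensor 8, pipeline 2, data 4) and the rank ordering (tensor index varying fastest, then data, then pipeline) are assumptions chosen for the example; the description above does not state the actual split, and the helper names are hypothetical rather than part of Megatron-LM.

```python
from typing import List, NamedTuple


class Coord(NamedTuple):
    """Position of one GPU in the assumed 3D grid."""
    dp: int  # data-parallel replica index
    pp: int  # pipeline stage index
    tp: int  # tensor-parallel shard index


def rank_to_coord(rank: int, tp_size: int, pp_size: int, dp_size: int) -> Coord:
    """Map a global rank to (dp, pp, tp) under the assumed ordering:
    tensor index varies fastest, then data, then pipeline."""
    world_size = tp_size * pp_size * dp_size
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    tp = rank % tp_size
    dp = (rank // tp_size) % dp_size
    pp = rank // (tp_size * dp_size)
    return Coord(dp=dp, pp=pp, tp=tp)


def tensor_parallel_group(rank: int, tp_size: int) -> List[int]:
    """Ranks that share one tensor-parallel group with `rank`.
    With the tensor index varying fastest, these are contiguous ranks."""
    start = (rank // tp_size) * tp_size
    return list(range(start, start + tp_size))


if __name__ == "__main__":
    # Hypothetical split of 64 GPUs: 8 (tensor) x 2 (pipeline) x 4 (data).
    TP, PP, DP = 8, 2, 4
    for r in (0, 13, 63):
        print(r, rank_to_coord(r, TP, PP, DP), tensor_parallel_group(r, TP))
```

Keeping each tensor-parallel group on contiguous ranks is a common choice because those ranks usually sit on the same node, where the most bandwidth-hungry collectives can use the fastest links.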