FoMoE: Breaking the Full-Replica Barrier with a Federation of MoEs
Researchers propose FoMoE, a federated Mixture-of-Experts (MoE) approach for large-scale LLM pre-training on limited compute budgets that does not require high-speed interconnects. FoMoE breaks the full-replica barrier by distributing MoE models across nodes connected by slower links, which the authors say improves efficiency and scalability. This can help builders train large models on constrained infrastructure. The authors report state-of-the-art results at reduced computational cost.
- FoMoE enables LLM pre-training on limited compute budgets.
- Distributes MoE models across nodes with slower interconnects.
- Improves efficiency and scalability for large-scale LLM training.
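
To give a rough sense of why slow interconnects are a problem when every node holds a full copy of the model, the sketch below estimates the time to synchronize gradients once with a ring all-reduce. It is a generic back-of-envelope illustration, not FoMoE's method: the function name, the model size, the node count, and the link speeds are all assumed values chosen for the example, and the estimate counts only the bandwidth term (no latency, no overlap with compute).

```python
def ring_allreduce_seconds(param_count, bytes_per_param, num_nodes, link_gbit_per_s):
    """Idealized time for one ring all-reduce of a full gradient (bandwidth term only)."""
    payload_bytes = param_count * bytes_per_param
    # In a ring all-reduce, each node sends (and receives) 2 * (N - 1) / N of the payload.
    traffic_bytes = 2 * (num_nodes - 1) / num_nodes * payload_bytes
    link_bytes_per_s = link_gbit_per_s * 1e9 / 8  # gigabits per second -> bytes per second
    return traffic_bytes / link_bytes_per_s


# Assumed example: 7e9 parameters, 2-byte gradients, 8 nodes.
slow = ring_allreduce_seconds(7e9, 2, 8, link_gbit_per_s=10)   # commodity Ethernet-class link
fast = ring_allreduce_seconds(7e9, 2, 8, link_gbit_per_s=400)  # datacenter-class link
print(f"10 Gbit/s:  {slow:.2f} s per synchronization")
print(f"400 Gbit/s: {fast:.2f} s per synchronization")
```

Under these assumptions the slower link makes each synchronization 40 times longer, which is the kind of cost that motivates designs that avoid exchanging a full replica's worth of gradients every step.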