MoE (Mixture-of-Experts)

#Pocket#NLP#LanguageModel#ACL
Issue Date: 2025-01-06 DeepSeekMoE: Towards Ultimate Expert Specialization in Mixture-of-Experts Language Models, Damai+, ACL24, 2024.08 Comment: In the era of large language models, Mixture-of-Experts (MoE) is a promising architecture for managing computational costs when scaling up model parameters ...
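
As background for why MoE "manages computational costs": only a small subset of experts is active per token, so per-token FLOPs stay roughly constant while total parameters grow with the expert count. Below is a minimal PyTorch sketch of a generic sparse top-k MoE layer; all class, function, and parameter names are illustrative and not taken from the DeepSeekMoE paper.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopKMoELayer(nn.Module):
    """Generic sparse MoE layer: each token is routed to only k of n experts,
    so compute per token is near-constant even as n (and hence the total
    parameter count) grows."""

    def __init__(self, d_model: int, d_ff: int, n_experts: int = 8, k: int = 2):
        super().__init__()
        self.k = k
        self.router = nn.Linear(d_model, n_experts, bias=False)
        self.experts = nn.ModuleList(
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (tokens, d_model)
        gate_logits = self.router(x)                      # (tokens, n_experts)
        weights, idx = gate_logits.topk(self.k, dim=-1)   # keep only the top-k experts
        weights = F.softmax(weights, dim=-1)              # renormalize over the k picks
        out = torch.zeros_like(x)
        for slot in range(self.k):
            for e, expert in enumerate(self.experts):
                mask = idx[:, slot] == e                  # tokens whose slot-th pick is expert e
                if mask.any():
                    out[mask] += weights[mask, slot, None] * expert(x[mask])
        return out

# Usage: 8 experts' worth of parameters, but each token only pays for 2 expert MLPs.
layer = TopKMoELayer(d_model=512, d_ff=2048)
y = layer(torch.randn(16, 512))
```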
#Pretraining#MachineLearning#Pocket#NLP#LanguageModel#Finetuning (SFT)
Issue Date: 2024-11-25 Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints, Aran Komatsuzaki+, ICLR23 Comment: I have only skimmed this, but it proposes a method for making SFT/pretraining of Mixture-of-Experts models more efficient and better-performing by reusing the weights of an existing dense checkpoint. Every MLP in an MoE layer is initialized by copying the MLP weights from the existing checkpoint ...
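
A minimal PyTorch sketch of the upcycling initialization described in the comment above, assuming the dense checkpoint's FFN is available as an `nn.Module`; the helper name `upcycle_dense_mlp` and the zero-initialized router are my illustrative choices, not the authors' code.

```python
import copy
import torch.nn as nn

def upcycle_dense_mlp(dense_mlp: nn.Module, d_model: int, n_experts: int):
    """Sparse-upcycling-style initialization: every expert in the new MoE layer
    starts as an exact copy of the dense checkpoint's MLP, so training resumes
    from an already-useful solution instead of random weights."""
    experts = nn.ModuleList(copy.deepcopy(dense_mlp) for _ in range(n_experts))
    router = nn.Linear(d_model, n_experts, bias=False)
    # Illustrative choice (my assumption, not necessarily the paper's): a
    # zero-initialized router makes routing uniform at step 0, and since all
    # experts are identical copies, the layer initially reproduces the dense
    # model's output exactly.
    nn.init.zeros_(router.weight)
    return experts, router

# Usage: upcycle the dense FFN of one Transformer block into an 8-expert layer.
dense_ffn = nn.Sequential(nn.Linear(512, 2048), nn.GELU(), nn.Linear(2048, 512))
experts, router = upcycle_dense_mlp(dense_ffn, d_model=512, n_experts=8)
```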