ES-MoE: overcoming the scalability challenges in mixture-of-experts models

Abstract
Mixture-of-Experts (MoE) models have recently emerged as a powerful technique for enhancing the scalability and performance of neural networks, primarily by leveraging learnable gating networks to allocate input tokens to different expert models. However, training MoE models on GPUs presents unique challenges, including insufficient GPU memory capacity for a large number of experts and computational inefficiency due to token load imbalance. To address these issues, we introduce Expert Server MoE (ES-MoE), a novel method that offloads all expert parameters and their optimizer states to CPUs. This approach not only mitigates the memory constraints of GPU-based training but also improves training throughput by creating a unified pool of experts that allows for more efficient scheduling. Furthermore, ES-MoE employs pipelined expert optimization to minimize iteration latency, effectively circumventing the issue of extended CPU optimization time. We validate our approach using GPT-based MoE architectures, demonstrating that ES-MoE scales up to 16× better and improves throughput by up to 4.55× over existing frameworks.
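The abstract describes the mechanism only at a high level. Below is a minimal PyTorch-style sketch, not the thesis implementation, of the two ideas it names: keeping expert parameters and optimizer states in host (CPU) memory while streaming individual experts to the GPU on demand, and overlapping the CPU-side optimizer step with subsequent GPU work. All names here (CPUExpertPool, fetch, update_async) are hypothetical illustrations.

import copy
import threading
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class CPUExpertPool:
    """All experts (feed-forward blocks) and their optimizer states live in CPU memory."""
    def __init__(self, num_experts: int, d_model: int, d_ff: int):
        self.experts = [
            nn.Sequential(nn.Linear(d_model, d_ff), nn.GELU(), nn.Linear(d_ff, d_model))
            for _ in range(num_experts)
        ]  # parameters stay on the CPU
        self.optimizers = [torch.optim.Adam(e.parameters(), lr=1e-4) for e in self.experts]

    def fetch(self, idx: int) -> nn.Module:
        # Stream one expert's parameters to the GPU for the forward/backward pass.
        return copy.deepcopy(self.experts[idx]).to(device)

    def update_async(self, idx: int, gpu_expert: nn.Module) -> threading.Thread:
        # Copy gradients back to the CPU replica and run the optimizer step in a
        # background thread so the GPU can move on to the next expert; a simple
        # stand-in for the pipelined expert optimization described in the abstract.
        for cpu_p, gpu_p in zip(self.experts[idx].parameters(), gpu_expert.parameters()):
            cpu_p.grad = gpu_p.grad.detach().to("cpu")
        thread = threading.Thread(target=self.optimizers[idx].step)
        thread.start()
        return thread

# Toy usage: route tokens to experts with a dummy gating decision, then overlap each
# expert's CPU optimizer step with the next expert's GPU compute.
pool = CPUExpertPool(num_experts=8, d_model=64, d_ff=256)
tokens = torch.randn(32, 64, device=device)
assignments = torch.randint(0, 8, (32,), device=device)  # stand-in for the gating network
pending = []
for idx in assignments.unique().tolist():
    expert = pool.fetch(idx)
    out = expert(tokens[assignments == idx])
    out.pow(2).mean().backward()                  # dummy per-expert loss
    pending.append(pool.update_async(idx, expert))
for thread in pending:
    thread.join()                                 # wait for the CPU optimizer steps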
Advisors
한동수 (researcher)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST), Kim Jaechul Graduate School of AI, 2023.8, [iii, 26 p.]

Keywords

Mixture-of-experts system; Machine learning system; Memory improvements; Accelerated training; Pipelining

URI
http://hdl.handle.net/10203/320530
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045718&flag=dissertation
Appears in Collection
AI-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
