DC Field | Value | Language |
---|---|---|
dc.contributor.advisor | 한동수 | - |
dc.contributor.author | Kim, Yechan | - |
dc.contributor.author | 김예찬 | - |
dc.date.accessioned | 2024-07-25T19:30:44Z | - |
dc.date.available | 2024-07-25T19:30:44Z | - |
dc.date.issued | 2023 | - |
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1045718&flag=dissertation | en_US |
dc.identifier.uri | http://hdl.handle.net/10203/320530 | - |
dc.description | Thesis (Master's) - KAIST: Kim Jaechul Graduate School of AI, 2023.8, [iii, 26 p.] | - |
dc.description.abstract | Mixture-of-Experts (MoE) models have recently emerged as a powerful technique for enhancing the scalability and performance of neural networks, primarily by using learnable gating networks to route input tokens to different expert models. However, training MoE models on GPUs presents unique challenges, including insufficient GPU memory capacity for a large number of experts and computational inefficiency due to token load imbalance. To address these issues, we introduce Expert Server MoE (ES-MoE), a novel method that offloads all expert parameters and their optimizer states to CPUs. This approach not only mitigates the memory constraints of GPU-based training but also improves training throughput by creating a unified pool of experts that allows more efficient scheduling. Furthermore, ES-MoE employs pipelined expert optimization to minimize iteration latency, effectively hiding the otherwise long CPU-side optimization time. We validate our approach on GPT-based MoE architectures, demonstrating that ES-MoE scales up to 16× better and improves throughput by up to 4.55× over existing frameworks. (An illustrative sketch of the offloading and pipelining ideas follows this record.) | - |
dc.language | eng | - |
dc.publisher | 한국과학기술원 | - |
dc.subject | Mixture-of-experts system; Machine learning system; Memory improvements; Training acceleration; Pipelining | - |
dc.title | ES-MoE: overcoming the scalability challenges in mixture-of-experts models | - |
dc.title.alternative | 전문가 혼합 모델의 확장성 문제 극복 연구 (Research on overcoming the scalability problems of mixture-of-experts models) | - |
dc.type | Thesis (Master) | - |
dc.identifier.CNRN | 325007 | - |
dc.description.department | KAIST: Kim Jaechul Graduate School of AI | - |
dc.contributor.alternativeauthor | Han, Dongsu | - |
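The abstract describes two mechanisms: keeping expert parameters and their optimizer states resident on the CPU, and pipelining the CPU-side optimizer step so it overlaps with GPU compute. Below is a minimal, hypothetical PyTorch sketch of these two ideas under those assumptions; it is not the thesis's actual implementation, and names such as `Expert` and `es_moe_step` are illustrative.

```python
# Minimal sketch (not the ES-MoE implementation): (1) experts and their Adam
# state live on the CPU; (2) the CPU optimizer step runs in a background
# thread so the GPU can proceed to the next expert (pipelining).
import threading
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

class Expert(nn.Module):
    """A tiny feed-forward expert, stand-in for an MoE expert FFN."""
    def __init__(self, d=64):
        super().__init__()
        self.ff = nn.Sequential(nn.Linear(d, 4 * d), nn.GELU(), nn.Linear(4 * d, d))
    def forward(self, x):
        return self.ff(x)

# Expert parameters and optimizer states are CPU-resident; only the expert
# needed for the current batch of routed tokens is materialized on the GPU.
experts = [Expert() for _ in range(8)]
opts = [torch.optim.Adam(e.parameters(), lr=1e-4) for e in experts]

def es_moe_step(expert_id, tokens):
    cpu_expert = experts[expert_id]
    gpu_expert = Expert().to(device)
    gpu_expert.load_state_dict(cpu_expert.state_dict())  # H2D parameter copy

    out = gpu_expert(tokens.to(device))
    loss = out.float().pow(2).mean()  # stand-in loss for illustration
    loss.backward()

    # Copy gradients back to the CPU replica, then run the optimizer step in
    # a background thread; the caller can start the next expert meanwhile.
    for cpu_p, gpu_p in zip(cpu_expert.parameters(), gpu_expert.parameters()):
        cpu_p.grad = gpu_p.grad.detach().to("cpu")
    t = threading.Thread(target=lambda: (opts[expert_id].step(),
                                         opts[expert_id].zero_grad()))
    t.start()
    return out.detach(), t

tokens = torch.randn(16, 64)       # tokens routed to expert 0 by some gate
out, pending = es_moe_step(0, tokens)
pending.join()                     # sync before expert 0 is used again
```

In a real system the per-expert transfers and optimizer steps would be scheduled across many experts and CUDA streams so that copies, GPU compute, and CPU updates overlap; the thread join here marks the one ordering constraint an expert's pipeline must respect.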