(An) efficient partitioning and scheduling algorithm for GPUs in designing machine learning inference server = A method for partitioning and scheduling GPUs for artificial-intelligence inference servers

Today's cloud vendors offer Machine Learning as a Service (MLaaS). Unlike training, inference does not require high computational power, so inference workloads typically leave much of a GPU's compute capacity idle. Recently introduced GPUs allow providers to partition a single GPU into instances sized to match each user's request, lowering Total Cost of Ownership (TCO) through higher compute utilization. This dissertation proposes a method for improving compute utilization by exploiting heterogeneity in a multi-GPU server. The proposed partitioning algorithm (PARIS) configures inference servers heterogeneously based on the model and the characteristics of the serving environment, and the accompanying scheduling algorithm (ELSA) guarantees the Service Level Agreement (SLA). Together, the proposed partitioning and scheduling algorithms achieve up to a 17.4x improvement in latency and a 1.8x improvement in throughput.
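The thesis text is not attached to this record, so the actual PARIS/ELSA algorithms are not reproduced here. As a rough illustration of the scheduling idea described in the abstract, the following minimal Python sketch dispatches inference requests over a heterogeneous pool of GPU slices under an SLA; the Partition class, the linear cost model, and the smallest-feasible-slice policy are all assumptions made for illustration, not the thesis's implementation.

```python
from dataclasses import dataclass

@dataclass
class Partition:
    """One GPU slice in the server's pool.

    A hypothetical stand-in for a reconfigurable GPU instance; the
    fields and the cost model below are illustrative assumptions,
    not the thesis's actual PARIS/ELSA data structures.
    """
    slices: int           # relative compute size of this slice
    free_at: float = 0.0  # time at which the slice becomes idle

    def est_latency(self, batch: int) -> float:
        # Toy cost model: latency grows with batch size and shrinks
        # with slice size (an assumption for illustration only).
        return batch / self.slices

def schedule(partitions: list, batch: int, now: float, sla: float) -> Partition:
    """SLA-aware dispatch over a heterogeneous pool of slices.

    Prefer the smallest slice that can still finish within the SLA,
    keeping larger slices free for heavier requests; if no slice can
    meet the deadline, fall back to the earliest-finishing one.
    """
    def finish(p: Partition) -> float:
        return max(p.free_at, now) + p.est_latency(batch)

    feasible = [p for p in partitions if finish(p) - now <= sla]
    chosen = (min(feasible, key=lambda p: p.slices) if feasible
              else min(partitions, key=finish))
    chosen.free_at = finish(chosen)
    return chosen

# Usage: three heterogeneous slices carved out of one GPU.
pool = [Partition(slices=4), Partition(slices=2), Partition(slices=1)]
p = schedule(pool, batch=8, now=0.0, sla=5.0)
print(p.slices, p.free_at)  # -> 2 4.0 (smallest slice meeting the SLA)
```

The policy mirrors the abstract's motivation: small requests are routed to small slices so overall compute utilization stays high, while the deadline check preserves the SLA.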
Advisors
Rhu, Minsoo (유민수)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2022.2, [iv, 28 p.]

URI
http://hdl.handle.net/10203/309959
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997183&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
