(A) Preemptible neural processing unit architecture and its applicability for QoS-aware scheduling

To meet the high demand for Deep Neural Network (DNN) computation, major cloud vendors offer Machine Learning (ML) acceleration as a service. Because of the high compute requirements of DNN applications, this computation is generally handled by GPUs or by chips specially designed for DNNs, called Neural Processing Units (NPUs). Service providers exploit multi-tenancy by co-locating multiple DNN models inside a single accelerator to achieve high throughput and reduce the Total Cost of Ownership (TCO). However, current NPUs lack the ability to preempt an ongoing task and therefore cannot provide fast response times to high-priority requests, leading to Service Level Agreement (SLA) violations. To improve the Quality of Service (QoS) of ML services, this dissertation explores three possible preemption mechanisms for NPUs and proposes a scheduling algorithm (PREMA) that works on top of the preemptible NPU. Overall, the proposed scheduler achieves an average of 7.8$\times$, 1.4$\times$, and 4.8$\times$ improvement in latency, throughput, and SLA satisfaction, respectively.
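
As a concrete illustration of the kind of policy the abstract describes, below is a minimal Python sketch of a preemptive, token-based scheduler: each waiting task earns tokens in proportion to its priority and waiting time, tasks whose tokens exceed a threshold become preemption candidates, and ties are broken toward the shortest estimated remaining work. The Task fields, the token formula, and the helper names are simplifying assumptions for exposition, not the thesis's actual implementation.

from dataclasses import dataclass

@dataclass
class Task:
    """One DNN inference request co-located on the NPU (illustrative)."""
    name: str
    priority: int          # higher value = more latency-critical
    arrival: float         # time the request entered the ready queue
    est_runtime: float     # predicted isolated execution time
    progress: float = 0.0  # fraction of the model already executed

    def tokens(self, now: float) -> float:
        # Tokens grow with both priority and waiting time, so even
        # low-priority tasks eventually qualify and are not starved.
        return self.priority * (now - self.arrival)

    def est_remaining(self) -> float:
        return self.est_runtime * (1.0 - self.progress)

def pick_next(ready: list[Task], now: float, threshold: float) -> Task:
    """Among tasks whose tokens exceed the threshold, run the one with
    the shortest estimated remaining time; if none qualify, fall back
    to the highest-token task."""
    candidates = [t for t in ready if t.tokens(now) >= threshold]
    if candidates:
        return min(candidates, key=lambda t: t.est_remaining())
    return max(ready, key=lambda t: t.tokens(now))

# Example: a short, high-priority request preempts a long batch job.
queue = [
    Task("batch-translation", priority=1, arrival=0.0, est_runtime=50.0),
    Task("voice-assistant",   priority=8, arrival=9.0, est_runtime=2.0),
]
print(pick_next(queue, now=10.0, threshold=5.0).name)  # voice-assistant

Combining a token threshold (fairness) with shortest-remaining-time selection (throughput) reflects the latency/throughput/SLA trade-off the abstract quantifies; a real NPU scheduler would additionally need one of the preemption mechanisms the thesis explores to actually suspend the running task.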
Advisors
Rhu, Minsoo (유민수)
Description
Korea Advanced Institute of Science and Technology: School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Master's thesis - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2020.8, [iv, 28 p.]

Keywords

DNN (Deep Neural Network); NPU (Neural Processing Unit); preemption; QoS (Quality of Service); TCO (Total Cost of Ownership)

URI
http://hdl.handle.net/10203/285071
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=925235&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
