Optimizing the aggregate throughput of concurrent deep learning jobs on a shared cluster

DC Field | Value | Language
dc.contributor.advisor | Park, Kyoung Soo | -
dc.contributor.advisor | 박경수 | -
dc.contributor.author | Son, Kyuho | -
dc.date.accessioned | 2019-09-04T02:41:38Z | -
dc.date.available | 2019-09-04T02:41:38Z | -
dc.date.issued | 2018 | -
dc.identifier.uri | http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828576&flag=dissertation | en_US
dc.identifier.uri | http://hdl.handle.net/10203/266783 | -
dc.description | Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering (Interdisciplinary Program in Semiconductors), 2018.8, [3, 27 p.] | -
dc.description.abstract | The explosive popularity of deep learning (DL) has led to the evolution of deep learning frameworks. Unfortunately, despite the need to run multiple deep learning jobs on a shared GPU cluster, current cloud schedulers are often insufficient to schedule them efficiently. Managing resources for deep learning models without enough information or expertise leads to poor scalability and adversely affects overall cluster performance. In this paper, we present Max-Speedup, a scheduling policy for multi-tenant deep learning jobs on a shared GPU cluster. We address two main challenges: 1) precise estimation of training throughput to analyze the resource-performance trade-off of a deep learning model, and 2) an efficient scheduling policy for multi-tenant deep learning jobs on a shared GPU cluster. We tackle these problems by estimating the finish time of parameter synchronization and maximizing the aggregate speedup through the performance-resource trade-offs of DL jobs. Our evaluation shows that Max-Speedup improves the average job completion time by 3x over SRTF while reducing makespan by up to 26.9x. | -
dc.language | eng | -
dc.publisher | Korea Advanced Institute of Science and Technology (KAIST) | -
dc.subject | Job scheduler; deep learning; performance estimation; GPU cluster; resource management | -
dc.subject | 작업 스케줄러; 딥러닝; 성능 예측; GPU 클러스터; 자원 관리 | -
dc.title | Optimizing the aggregate throughput of concurrent deep learning jobs on a shared cluster | -
dc.title.alternative | 공유 클러스터에서 동시 딥 러닝 작업의 총 처리성능 최적화 | -
dc.type | Thesis (Master) | -
dc.identifier.CNRN | 325007 | -
dc.description.department | Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering (Interdisciplinary Program in Semiconductors) | -
dc.contributor.alternativeauthor | 손규호 | -
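
The abstract above describes Max-Speedup as maximizing the aggregate speedup of DL jobs by exploiting their performance-resource trade-offs. As a reading aid only, the sketch below shows one way such an allocator could work: a greedy loop that hands each GPU to the job with the largest marginal speedup gain, given per-job speedup estimates. This is an assumption-laden illustration, not the thesis implementation; the Job class, speedup_curve field, and allocate_gpus function are hypothetical names, and the thesis derives its estimates from the parameter-synchronization finish-time model rather than taking them as input.

```python
"""Illustrative sketch (not the thesis implementation) of a greedy
"maximize aggregate speedup" GPU allocator. It assumes each job exposes
an estimated speedup curve: speedup with g GPUs relative to 1 GPU."""

from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Job:
    name: str
    # Estimated speedup for 1..max GPUs, e.g. [1.0, 1.8, 2.3, 2.5].
    # In the thesis this would come from the throughput /
    # synchronization-time estimator; here it is simply given as input.
    speedup_curve: List[float]


def allocate_gpus(jobs: List[Job], total_gpus: int) -> Dict[str, int]:
    """Greedily hand out GPUs one at a time to the job whose aggregate
    speedup increases the most, until GPUs run out or no job benefits."""
    alloc = {job.name: 0 for job in jobs}
    for _ in range(total_gpus):
        best_job, best_gain = None, 0.0
        for job in jobs:
            g = alloc[job.name]
            if g >= len(job.speedup_curve):
                continue  # no estimate beyond this point; stop growing
            current = job.speedup_curve[g - 1] if g > 0 else 0.0
            gain = job.speedup_curve[g] - current
            if gain > best_gain:
                best_job, best_gain = job, gain
        if best_job is None:
            break  # no remaining allocation improves aggregate speedup
        alloc[best_job.name] += 1
    return alloc


if __name__ == "__main__":
    jobs = [
        Job("resnet", [1.0, 1.9, 2.6, 3.0]),   # scales well
        Job("lstm",   [1.0, 1.3, 1.4, 1.45]),  # communication-bound
    ]
    print(allocate_gpus(jobs, total_gpus=4))
    # -> {'resnet': 3, 'lstm': 1}
```

In this toy run the communication-bound job stops receiving GPUs once its marginal speedup falls below that of the well-scaling job, which is the kind of performance-resource trade-off the abstract refers to.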
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
