(An) adaptive spatial-temporal GPU scheduling scheme for multi-domain DNNs services = 다중 도메인 DNN 서비스를 위한 적응형 시공간 GPU 스케줄링 기법

DC Field / Value
dc.contributor.advisor: Youn, Chan-Hyun
dc.contributor.advisor: 윤찬현
dc.contributor.author: Dinh, Khac Tuyen
dc.date.accessioned: 2023-06-26T19:34:31Z
dc.date.available: 2023-06-26T19:34:31Z
dc.date.issued: 2023
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032940&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/309994
dc.description: Master's thesis - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2023.2, [iii, 37 p.]
dc.description.abstract: High-throughput Deep Neural Network (DNN) serving servers are essential for online service applications as deep learning techniques are adopted in an ever wider range of applications. A DNN serving server must satisfy a key requirement: it has to serve multiple heterogeneous DNN models while guaranteeing the service-level objective (SLO) of each model and, at the same time, improving system utilization and system-wide throughput. A GPU scheduler is therefore needed to orchestrate GPU resources across the DNN models. To meet these requirements for multi-domain DNN services, this thesis proposes an adaptive combined spatial-temporal GPU scheduling scheme. We first show the limitations of existing work, covering both conventional temporal scheduling and spatial scheduling on GPUs. Our experiments show that existing spatial scheduling approaches typically incur high resource-reconfiguration time and do not fully utilize the GPU computation resources. To tackle this problem, we propose a combined spatial-temporal GPU scheduling strategy that first deploys an adaptive GPU spatial-partitioning strategy, splitting the GPU computation into multiple parts (which we call GPU partitions) to schedule concurrently running models, and then defines a strategy to share each GPU partition temporally to further improve utilization of the GPU system. We investigate the factors that affect model performance under concurrent execution, formulate the latency function of spatial-temporally co-running models, and propose a heuristic approach to the resulting scheduling problem. Our evaluation shows that the proposed scheme significantly reduces resource-reconfiguration overhead and improves system throughput compared to other baseline works.
dc.language: eng
dc.publisher: 한국과학기술원 (Korea Advanced Institute of Science and Technology)
dc.subject: Deep learning inference; GPU scheduling; Spatial sharing; Temporal sharing; GPU resource partitioning
dc.subject: 딥 러닝 추론; GPU 스케줄링; 공간 공유; 시간 공유; GPU 리소스 파티셔닝
dc.title: (An) adaptive spatial-temporal GPU scheduling scheme for multi-domain DNNs services
dc.title.alternative: 다중 도메인 DNN 서비스를 위한 적응형 시공간 GPU 스케줄링 기법
dc.type: Thesis (Master)
dc.identifier.CNRN: 325007
dc.description.department: 한국과학기술원 : 전기및전자공학부 (KAIST, School of Electrical Engineering)
dc.contributor.alternativeauthor: 딩 칵 투옌
Appears in Collection
EE-Theses_Master(석사논문)
Files in This Item
There are no files associated with this item.
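The abstract above describes a scheme that first partitions GPU compute spatially among concurrently running models and then time-shares each partition while checking per-model SLOs. As a rough illustration of that idea only, and not the thesis's actual algorithm, the following minimal Python sketch greedily assigns models to hypothetical GPU partitions; the model names, partition fractions, and the simple latency formula are all illustrative assumptions.

```python
# Illustrative sketch: the latency model, partition fractions, and the greedy
# grouping heuristic are assumptions for exposition, not the thesis's method.
from dataclasses import dataclass
from typing import List

@dataclass
class Model:
    name: str
    slo_ms: float           # service-level objective (latency bound)
    base_latency_ms: float  # measured latency on a full, uncontended GPU

def est_latency(model: Model, partition_frac: float, n_sharing: int) -> float:
    """Assumed latency model: latency scales inversely with the compute share
    and linearly with the number of models time-sharing the partition."""
    return model.base_latency_ms / partition_frac * n_sharing

def schedule(models: List[Model], partition_fracs: List[float]) -> List[List[Model]]:
    """Greedy heuristic: take models in order of tightest SLO and place each on
    the smallest partition whose estimated latency still meets every SLO."""
    groups: List[List[Model]] = [[] for _ in partition_fracs]
    for m in sorted(models, key=lambda m: m.slo_ms):
        placed = False
        for i, frac in sorted(enumerate(partition_fracs), key=lambda p: p[1]):
            n = len(groups[i]) + 1  # temporal sharers on this partition if m is added
            if all(est_latency(x, frac, n) <= x.slo_ms for x in groups[i] + [m]):
                groups[i].append(m)
                placed = True
                break
        if not placed:
            raise RuntimeError(f"no feasible partition for {m.name}")
    return groups

if __name__ == "__main__":
    models = [Model("resnet50", 40, 8), Model("bert-base", 100, 20), Model("yolo", 60, 12)]
    fracs = [0.25, 0.25, 0.5]  # hypothetical spatial split of GPU compute
    for frac, grp in zip(fracs, schedule(models, fracs)):
        print(f"{int(frac * 100)}% partition -> {[m.name for m in grp]}")
```

Under these assumptions the sketch places each model on the smallest partition that keeps all co-located models within their SLOs, which mirrors the spatial-then-temporal sharing order the abstract describes; the thesis itself formulates the latency function and heuristic in full.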
