An adaptive spatial-temporal GPU scheduling scheme for multi-domain DNN services

High-throughput Deep Neural Network (DNN) serving servers are essential for online service applications as deep learning techniques are adopted in an ever wider range of applications. Serving DNN models on such servers imposes a key requirement: the server must serve multiple heterogeneous DNN models while guaranteeing the service-level objective (SLO) of each model and, at the same time, improving system utilization and system-wide throughput. A GPU scheduler is therefore required to orchestrate GPU resources across DNN models. To address these requirements for multi-domain DNN serving, this thesis proposes an adaptive combined spatial-temporal GPU scheduling scheme. We first show the limitations of existing work, covering both conventional temporal scheduling and spatial scheduling on GPUs. Our experiments show that existing spatial scheduling approaches often incur high resource-reconfiguration time and do not fully utilize the GPU's computation resources. To tackle this problem, we propose a combined spatial-temporal GPU scheduling strategy: it first deploys an adaptive spatial partitioning strategy that divides the GPU's computation into multiple parts, which we call GPU partitions, to schedule concurrently running models, and then defines a strategy to share each GPU partition temporally to further enhance utilization of the GPU system. We investigate the factors that affect model performance under concurrent execution, formulate the latency function of spatially and temporally co-running models, and propose a heuristic approach to the resulting scheduling problem. Our evaluation shows that the proposed scheme significantly reduces resource-reconfiguration overhead and enhances system throughput compared with baseline works.
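The thesis itself is not available for download here, but the scheduling idea in the abstract can be illustrated with a toy sketch: models are greedily placed onto spatial GPU partitions, and a partition may be time-shared by several models as long as every co-located model still meets its SLO. The latency model below (latency inversely proportional to the partition's compute fraction and linear in the number of time-sharing models) and all names (`Model`, `schedule`, `latency_ms`) are illustrative assumptions, not the thesis's actual formulation.

```python
from dataclasses import dataclass

@dataclass
class Model:
    name: str
    base_latency_ms: float   # assumed latency when given 100% of the GPU
    slo_ms: float            # service-level objective

def latency_ms(model, partition_frac, shared_count):
    """Toy latency model: latency grows inversely with the partition's
    compute fraction and linearly with the number of models that
    time-share the partition (round-robin temporal sharing)."""
    return model.base_latency_ms / partition_frac * shared_count

def schedule(models, partition_fracs):
    """Greedy heuristic sketch: handle models with the tightest SLO
    first, and place each one on the smallest partition (possibly
    time-shared) where it and all co-located models keep their SLOs."""
    placement = {}  # partition index -> list of placed models
    for m in sorted(models, key=lambda m: m.slo_ms):
        placed = False
        # try partitions from smallest compute fraction to largest
        for i, frac in sorted(enumerate(partition_fracs), key=lambda t: t[1]):
            group = placement.get(i, []) + [m]
            shared = len(group)
            if all(latency_ms(g, frac, shared) <= g.slo_ms for g in group):
                placement.setdefault(i, []).append(m)
                placed = True
                break
        if not placed:
            return None  # infeasible under this partitioning
    return placement
```

Under this toy model, two light models can share one half-GPU partition: each model's latency doubles from time sharing but can still stay within its SLO, which is the utilization benefit the abstract attributes to combining spatial and temporal sharing.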
Advisors
Youn, Chan-Hyun
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2023
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology: School of Electrical Engineering, 2023.2, [iii, 37 p.]

Keywords

Deep Learning Inference; GPU Scheduling; Spatial Sharing; Temporal Sharing; GPU Resource Partitioning

URI
http://hdl.handle.net/10203/309994
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=1032940&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
