(A) scalable heterogeneous vector-array architecture with resource scheduling for multi-user/multi-DNN workloads

As artificial intelligence and machine learning technologies disrupt a wide range of industries, cloud datacenters face ever-increasing demand for inference workloads. However, conventional CPU-based servers cannot handle the excessive computational requirements of deep neural network (DNN) models, while GPU-based servers suffer from huge power consumption and high cost. In this paper, we present a scalable heterogeneous vector-array architecture that can cope with dynamically changing multi-user/multi-DNN workloads in cloud datacenters. The proposed heterogeneous architecture features a load balancer that performs high-level workload distribution and multiple vector-array clusters. Each cluster consists of a programmable scheduler, throughput-oriented array processors, function-oriented vector processors, and shared memories; the scheduler is responsible for allocating layer/sub-layer tasks within the cluster. The array processor is designed for matrix/convolution operations, while the vector processor handles the various algorithmic functions required in DNN models. To support multi-user/multi-DNN execution on the proposed architecture, we devise a lightweight DNN model description format that enables general model representation together with a user description. We also propose a novel resource scheduling algorithm that enables concurrent execution of runtime tasks while maximizing hardware utilization, based on computation and memory access time estimation. Finally, we build an architecture and algorithm simulation framework based on actual synthesis and place-and-route implementation results. As a result, the proposed heterogeneous vector-array architecture achieves 10.9x higher throughput and 30.17x higher energy efficiency than a comparable GPU on various ML workloads. This research was conducted in collaboration with Sungyeob Yoo, a master's student in the laboratory.
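To make the scheduling idea concrete, the following is a minimal Python sketch of how a per-cluster scheduler might use computation and memory-access time estimates to place layer/sub-layer tasks on vector-array clusters. All names, parameters, and numbers here (LayerTask, Cluster, estimate_cycles, the roofline-style max of compute and memory time, and the greedy earliest-finish-time policy) are illustrative assumptions for exposition, not the thesis's actual format or algorithm.

```python
from dataclasses import dataclass

@dataclass
class LayerTask:
    """One layer/sub-layer task from a user's DNN model description (hypothetical fields)."""
    name: str
    macs: int            # multiply-accumulate count of the layer
    bytes_moved: int     # weight + activation traffic in bytes
    kind: str            # "matrix"/"conv" -> array processor, otherwise vector processor

@dataclass
class Cluster:
    """A vector-array cluster, simplified to peak rates and a finish-time marker."""
    cluster_id: int
    array_rate: float        # peak array-processor throughput, MACs per cycle
    vector_rate: float       # peak vector-processor throughput, ops per cycle
    mem_bw: float            # shared-memory bandwidth, bytes per cycle
    busy_until: float = 0.0  # cycle at which this cluster becomes free

def estimate_cycles(task: LayerTask, c: Cluster) -> float:
    # Roofline-style estimate (an assumption of this sketch): the task is bound by
    # whichever of compute time or memory-access time is larger on this cluster.
    peak = c.array_rate if task.kind in ("matrix", "conv") else c.vector_rate
    compute = task.macs / peak
    memory = task.bytes_moved / c.mem_bw
    return max(compute, memory)

def schedule(tasks: list[LayerTask], clusters: list[Cluster]) -> list[tuple[str, int, float]]:
    """Greedy earliest-finish-time assignment of runtime tasks to clusters."""
    plan = []
    for task in tasks:
        # Pick the cluster that would finish this task soonest, then book it.
        best = min(clusters, key=lambda c: c.busy_until + estimate_cycles(task, c))
        finish = best.busy_until + estimate_cycles(task, best)
        best.busy_until = finish
        plan.append((task.name, best.cluster_id, finish))
    return plan

if __name__ == "__main__":
    clusters = [Cluster(i, array_rate=4096, vector_rate=256, mem_bw=128) for i in range(4)]
    tasks = [
        LayerTask("conv1", macs=118_013_952, bytes_moved=3_211_264, kind="conv"),
        LayerTask("softmax", macs=1_000_000, bytes_moved=4_000_000, kind="vector"),
    ]
    for name, cid, finish in schedule(tasks, clusters):
        print(f"{name:>8} -> cluster {cid}, finishes at cycle {finish:,.0f}")
```

In this sketch, overlapping tasks across clusters models the concurrent execution mentioned in the abstract, while the max(compute, memory) estimate stands in for the thesis's computation and memory access time estimation; the actual scheduler described in the work may differ substantially.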
Advisors
Kim, Joo-Young (김주영)
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2022
Identifier
325007
Language
eng
Description
Master's thesis - Korea Advanced Institute of Science and Technology (KAIST) : School of Electrical Engineering, 2022.2, [iii, 20 p.]

URI
http://hdl.handle.net/10203/309854
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=997185&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's Theses)
Files in This Item
There are no files associated with this item.
