A low-power high-performance DNN processor for mobile platforms

Recently, deep learning, realized with deep neural networks (DNNs), has become ubiquitous across applications due to its overwhelming performance and broad applicability. It has changed the paradigm of machine learning and brought significant progress in vision, speech, language processing, and many other fields. In 2012, a DNN named AlexNet reduced the image classification error on the ImageNet dataset by roughly 10 percentage points. The field has developed rapidly since then, and DNNs now achieve human-level accuracy in image classification. Owing to this rapid progress, more and more companies are providing services that utilize deep learning. Because of the large computational requirements of deep learning algorithms, these services are mostly delivered as cloud computing via servers in data centers: tasks requested by users are sent to the server over the Internet, and the results are sent back to the user after the computation completes on the server. In contrast, edge computing processes the tasks within the edge device itself. Edge computing offers advantages over cloud computing in latency, data-transfer cost, security, stability, and reliability, enabling better services; for video-based real-time services in particular, these advantages become even more prominent. The goal of this research is to realize intelligence-on-things, or smart machines, by enabling deep learning through edge computing on mobile platforms such as smartphones, drones, IoT devices, wearable devices, and robots. However, current mobile computing units cannot meet the throughput and power requirements for processing deep learning in real time. Therefore, a deep learning ASIC with higher energy efficiency is essential, and one is presented in this dissertation, together with the motivation for, validity of, and characteristics of deep learning ASICs and the key schemes for higher energy efficiency.
Key schemes include both existing and newly proposed ones, and fall into reduced numerical precision, processing- and data-flow optimization, exploitation of reusability, sparsity exploitation, and customized ALUs. On these foundations, an energy-efficient deep learning ASIC named DNPU is presented with a real chip implementation. It has the following key features: 1) a reconfigurable heterogeneous architecture to support various DNNs; 2) an off-chip-access-efficient workload division method to handle large data with limited on-chip memory; 3) online self-tuning layer-by-layer dynamic fixed-point to reduce the bit-width of activations; 4) quantization-table-based matrix multiplication to reduce off-chip memory accesses and remove duplicated multiplications; and 5) a computational SRAM-based stereo matching processor for 4-channel RGB-depth support.
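The quantization-table idea behind feature 4 can be sketched in a few lines of NumPy: when weights are quantized to a small shared codebook, each activation's product with every codebook entry can be precomputed once, after which every weight "multiplication" becomes a table lookup. The function name, array shapes, and 4-entry codebook below are illustrative assumptions for exposition, not the DNPU's actual datapath.

```python
import numpy as np

def lut_matvec(weight_idx, codebook, x):
    """Sketch of quantization-table-based matrix-vector multiplication.

    weight_idx: (out, in) integer indices into the codebook
                (e.g., a 16-entry codebook encodes each weight in 4 bits)
    codebook:   (K,) shared quantized weight values
    x:          (in,) input activations
    """
    # Precompute table[k, j] = codebook[k] * x[j]:
    # K * in multiplications instead of out * in for the dense product.
    table = np.outer(codebook, x)
    # Gather and accumulate: y[i] = sum_j table[weight_idx[i, j], j].
    # Every per-weight multiply is now a lookup into the table.
    return table[weight_idx, np.arange(x.size)].sum(axis=1)
```

With K codebook entries, the multiply count drops from out×in to K×in, and duplicated products (the same quantized weight meeting the same activation) are computed only once, which is the stated goal of the scheme.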
Advisors
Yoo, Hoi-Jun
Description
KAIST: School of Electrical Engineering
Publisher
KAIST (Korea Advanced Institute of Science and Technology)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral dissertation - KAIST: School of Electrical Engineering, 2018.8, [vii, 72 p.]

Keywords

Artificial intelligence; deep neural network; multilayer perceptron; convolutional neural network; recurrent neural network; ASIC; heterogeneous architecture; mixed division method; dynamic fixed-point; lookup table; computational SRAM

URI
http://hdl.handle.net/10203/265237
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828215&flag=dissertation
Appears in Collection
EE-Theses_Ph.D. (Doctoral Theses)
Files in This Item
There are no files associated with this item.
