Energy efficient processors and In-DRAM processing framework for deep convolutional neural network

dc.contributor.advisor: Kim, Lee-Sup
dc.contributor.advisor: 김이섭
dc.contributor.author: Sim, Jaehyeong
dc.date.accessioned: 2019-08-25T02:44:11Z
dc.date.available: 2019-08-25T02:44:11Z
dc.date.issued: 2019
dc.identifier.uri: http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=842507&flag=dissertation (en_US)
dc.identifier.uri: http://hdl.handle.net/10203/265145
dc.description: Thesis (Ph.D.) - Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering, 2019.2, [vii, 84 p.]
dc.description.abstract: Recent deep convolutional neural networks (CNNs) outperform conventional hand-crafted algorithms in a wide variety of intelligent vision tasks, but they require billions of operations and hundreds of millions of weights. To process large-scale CNNs energy-efficiently, three generations of CNN hardware are designed in this dissertation. The first two generations are CNN processors based on the conventional von Neumann architecture, while the third generation is in-DRAM processing hardware that departs from the von Neumann architecture. The first-generation, primitive CNN processor integrates dual-range multiply-accumulate (MAC) blocks that exploit the statistics of input feature values to reduce the energy consumption of MAC operations (see the first sketch below); a tile-based computing method is also proposed for this processor. As a result, it achieves 1.42 TOPS/W energy efficiency on the LeNet-5 CNN model. The second-generation, advanced CNN processor operates at near-threshold voltage (NTV) to further reduce energy consumption. It also features a newly proposed enhanced output stationary (EOS) dataflow (see the second sketch below) and a two-stage, big-and-small on-chip memory architecture, achieving up to 1.15 TOPS/W energy efficiency on the VGG-16 model. Finally, the third-generation, in-DRAM binary CNN hardware processes the dominant convolution operations by serially cascading in-DRAM bulk bitwise operations (see the third sketch below). To this end, we first identify that bitcount operations built from only bulk bitwise AND/OR/NOT incur significant delay overhead as kernel sizes grow. We then optimize performance by allocating inputs and kernels to DRAM banks efficiently for both convolutional and fully-connected layers through design space exploration, and mitigate the bitcount overhead by splitting kernels into multiple parts. Partial-sum accumulation and the remaining layers, such as max-pooling and normalization, are processed in the peripheral area of the DRAM with negligible overhead. As a result, our in-DRAM binary CNN processing framework achieves 19x-36x performance and 9x-14x EDP improvements for convolutional layers, and 9x-17x performance and 1.4x-4.5x EDP improvements for fully-connected layers, over a previous processing-in-memory (PIM) technique across four large-scale CNN models. It also shows 3.796 TOPS/W energy efficiency on the AlexNet CNN model.
dc.language: eng
dc.publisher: Korea Advanced Institute of Science and Technology (KAIST)
dc.subject: Deep learning; deep convolutional neural network; energy-efficient processor; in-DRAM processing; processing in-memory
dc.title: Energy efficient processors and In-DRAM processing framework for deep convolutional neural network
dc.title.alternative: 에너지 효율적인 심층 컨볼루셔널 신경망 프로세서 및 DRAM 내부 연산 프레임워크
dc.type: Thesis (Ph.D.)
dc.identifier.CNRN: 325007
dc.description.department: Korea Advanced Institute of Science and Technology (KAIST), School of Electrical Engineering
dc.contributor.alternativeauthor: 심재형
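
To make the abstract's first generation concrete, here is a minimal Python sketch of the dual-range MAC idea, under assumptions this record does not state: an 8-bit "low" range inside a wider datapath, so the mostly-small post-ReLU activations ride a cheap narrow multiplier while rare large values fall back to the full-width path. The bit-widths, threshold, and synthetic input distribution are illustrative, not figures from the dissertation.

    # A minimal sketch of the dual-range MAC idea; widths and thresholds
    # are assumptions, not the dissertation's parameters.
    import numpy as np

    LOW_BITS = 8
    LOW_MAX = (1 << LOW_BITS) - 1   # largest activation the narrow path accepts

    def dual_range_mac(activations, weights):
        """Accumulate x*w, tracking how often the narrow path would suffice."""
        acc = 0
        narrow = 0
        for x, w in zip(activations, weights):
            if 0 <= x <= LOW_MAX:
                narrow += 1          # small activation -> low-energy narrow MAC
            acc += int(x) * int(w)   # both paths yield the same product
        return acc, narrow / len(activations)

    # Post-ReLU activations are non-negative and heavily skewed toward zero.
    acts = np.maximum(np.random.normal(0.0, 40.0, 1024), 0).astype(int)
    wts = np.random.randint(-128, 128, size=1024)
    total, frac = dual_range_mac(acts, wts)
    print(f"dot product = {total}, narrow-path fraction = {frac:.2f}")

With a distribution like this, almost every MAC stays on the narrow path; that skew is the input-statistics property the dual-range blocks exploit.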
Appears in Collection: EE-Theses_Ph.D. (Doctoral Theses)
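
For the second generation, the abstract names an enhanced output stationary (EOS) dataflow but does not define it in this record, so the sketch below shows only the generic output-stationary pattern EOS builds on: each output's partial sum lives in a local accumulator until it is fully reduced, so every output is written back exactly once. This is a functional illustration, not the thesis's dataflow.

    # A functional sketch of a generic output-stationary convolution loop nest.
    import numpy as np

    def conv2d_output_stationary(ifmap, kernel):
        C, H, W = ifmap.shape            # channels, height, width
        _, KH, KW = kernel.shape         # one output channel, C x KH x KW
        OH, OW = H - KH + 1, W - KW + 1  # "valid" output size
        ofmap = np.zeros((OH, OW))
        for oy in range(OH):
            for ox in range(OW):
                acc = 0.0                # partial sum held locally (stationary)
                for c in range(C):
                    for ky in range(KH):
                        for kx in range(KW):
                            acc += ifmap[c, oy + ky, ox + kx] * kernel[c, ky, kx]
                ofmap[oy, ox] = acc      # single write-back per output element
        return ofmap

    ifmap = np.random.rand(3, 8, 8)      # 3-channel 8x8 input feature map
    kernel = np.random.rand(3, 3, 3)     # one 3x3 kernel over 3 channels
    print(conv2d_output_stationary(ifmap, kernel).shape)   # (6, 6)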
Files in This Item: There are no files associated with this item.
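
Last, a hedged sketch of the binary convolution arithmetic behind the third generation. A binary dot product reduces to XNOR plus a bit count (XNOR is composable from the bulk AND/OR/NOT primitives the framework cascades in DRAM), and the chunked popcount below loosely mirrors the kernel splitting that bounds the cost of each bitcount. The chunk size, packing helper, and toy 3x3 window are illustrative assumptions, not the dissertation's parameters.

    # A hedged sketch of a binary dot product via bitwise ops plus popcount.
    # The 2*popcount(XNOR) - n identity is the standard binary-CNN
    # formulation; CHUNK_BITS and the helpers are illustrative assumptions.
    CHUNK_BITS = 64   # assumed split granularity, echoing kernel splitting

    def pack_bits(bits):
        """Pack a list of 0/1 bits (encoding -1/+1 values) into one integer."""
        word = 0
        for b in bits:
            word = (word << 1) | (b & 1)
        return word

    def binary_dot(x_bits, w_bits, n):
        """Dot product of two {-1,+1} vectors given their 0/1 bit packings."""
        mask = (1 << n) - 1
        xnor = ~(x_bits ^ w_bits) & mask           # 1 wherever the signs agree
        matches = 0
        while xnor:                                # chunked popcount: count
            matches += bin(xnor & ((1 << CHUNK_BITS) - 1)).count("1")
            xnor >>= CHUNK_BITS                    # CHUNK_BITS at a time, like
        return 2 * matches - n                     # bit-counting split kernels

    x = [1, 0, 1, 1, 0, 1, 0, 0, 1]   # a 3x3 window of binary activations
    w = [1, 1, 1, 0, 0, 1, 0, 1, 1]   # a 3x3 binary kernel
    print(binary_dot(pack_bits(x), pack_bits(w), len(x)))   # prints 3

In the framework itself these steps would run as bulk row-wide operations inside DRAM banks, with partial sums accumulated in the peripheral logic; here they run on ordinary Python integers purely to show the arithmetic.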
