Improving convolutional neural network processing in Winograd domain using intra tile parallelism

Developing efficient hardware solutions for processing Convolutional Neural Networks (CNNs) is an active area of research in the computer architecture community. While some model-level modifications have been proposed over the years, using a transformed convolution scheme is the only approach that guarantees a performance improvement without loss of accuracy. Among the transformed convolution schemes, the Winograd Minimal Filtering algorithm offers up to a 2.25X performance improvement by significantly reducing the overall compute intensity of the CNN. The Winograd convolution algorithm also comes with an inherent form of parallelism, called Intra Tile Parallelism, which presents a unique opportunity to further speed up CNN processing. Our work proposes an efficient dataflow architecture that exploits this Intra Tile Parallelism to improve CNN processing performance on the ResNet model. In our experiments on the ResNet model, the proposed approach outperforms the state-of-the-art results provided by NVIDIA's cuDNN library. We observed a speedup of up to 2.14X in CNN layer processing time and device memory bandwidth savings of up to 2.3X on a Volta V100 Graphics Processing Unit (GPU) in an NVIDIA DGX-1 system, relative to the cuDNN library-based counterparts.
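The 2.25X figure can be illustrated with a small worked example. The sketch below assumes the F(2x2, 3x3) variant of Winograd minimal filtering (the abstract does not name the tile size; this variant is chosen because its multiplication count matches 2.25X) and uses the standard transform matrices for that variant. The function names `winograd_tile` and `direct_tile` are hypothetical, and the code is not the dataflow proposed in the thesis. It only shows that one 2x2 output tile needs 16 element-wise multiplications instead of 36, and that those 16 products are mutually independent, which is the kind of parallelism inside a tile that the abstract refers to.

```python
from fractions import Fraction

# Standard F(2x2, 3x3) transform matrices (assumed variant, see lead-in).
H = Fraction(1, 2)
BT = [[1, 0, -1, 0], [0, 1, 1, 0], [0, -1, 1, 0], [0, 1, 0, -1]]  # input transform B^T
G = [[1, 0, 0], [H, H, H], [H, -H, H], [0, 0, 1]]                  # filter transform G
AT = [[1, 1, 1, 0], [0, 1, -1, -1]]                                # output transform A^T

def matmul(a, b):
    """Plain matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    return [list(row) for row in zip(*m)]

def winograd_tile(d, g):
    """2x2 output tile from a 4x4 input tile d and a 3x3 filter g."""
    u = matmul(matmul(G, g), transpose(G))    # transformed filter, 4x4
    v = matmul(matmul(BT, d), transpose(BT))  # transformed input, 4x4
    # 16 element-wise products; none depends on another, so all of them
    # could be computed in parallel within the tile.
    m = [[u[i][j] * v[i][j] for j in range(4)] for i in range(4)]
    return matmul(matmul(AT, m), transpose(AT))

def direct_tile(d, g):
    """Reference: direct sliding-window correlation, 9 multiplies per output."""
    return [[sum(d[i + r][j + c] * g[r][c] for r in range(3) for c in range(3))
             for j in range(2)] for i in range(2)]

DIRECT_MULS = 2 * 2 * 3 * 3  # 36 multiplications for a 2x2 output tile
WINOGRAD_MULS = 4 * 4        # 16 multiplications in the element-wise stage

d = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
g = [[1, 0, -1], [2, 0, -2], [1, 0, -1]]
assert winograd_tile(d, g) == direct_tile(d, g)
print(DIRECT_MULS / WINOGRAD_MULS)  # ratio of multiplication counts
```

Exact rational arithmetic (`Fraction`) is used so the two results can be compared for equality; a GPU implementation would use floating point and would also have to account for the cost of the transforms, which this count ignores.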
Advisors
Kim, Dongjun
Description
Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2020
Identifier
325007
Language
eng
Description

Thesis (Master's) - Korea Advanced Institute of Science and Technology (KAIST): School of Electrical Engineering, 2020.8, [iii, 23 p.]

Keywords

Convolution; GPU; Performance; Optimization

URI
http://hdl.handle.net/10203/285081
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=925245&flag=dissertation
Appears in Collection
EE-Theses_Master (Master's theses)
Files in This Item
There are no files associated with this item.
