Multi-dimensional Parallel Training of Winograd Layer on Memory-Centric Architecture

Cited 7 times in Web of Science; cited 0 times in Scopus
Accelerating neural network training is critical for exploring the design space of neural networks. Data parallelism is commonly used to accelerate training of Convolutional Neural Networks (CNNs), where the input batch is distributed across multiple workers; however, the communication time for weight gradients can limit scalability even for moderate batch sizes. In this work, we propose multi-dimensional parallel training (MPT) of convolution layers by exploiting both data parallelism and the intra-tile parallelism available in Winograd-transformed convolution. Workers are organized across two dimensions: one dimension exploits intra-tile parallelism while the other exploits data parallelism. MPT reduces the amount of communication necessary for weight gradients, since weight gradients are only communicated within the data-parallelism dimension. However, the Winograd transform fundamentally requires more data accesses, and the proposed MPT architecture also introduces a new type of communication, which we refer to as tile transfer: the gather/scatter of Winograd-domain feature maps (tiles). We propose a scalable near-data processing (NDP) architecture that minimizes the cost of data accesses through 3D stacked memory while leveraging a memory-centric network organization to provide high connectivity between the workers and accelerate tile transfer. To minimize the tile-gathering communication overhead, we exploit prediction of the activation of spatial-domain neurons in order to remove the communication of tiles that would be transformed into non-activated neurons. We also propose dynamic clustering of the memory-centric network architecture, which reconfigures the interconnect topology between the workers for each convolution layer to balance the communication required for weight gradients and tile transfer. Our evaluations show that the proposed MPT with the NDP architecture accelerates training by up to 2.7x and 21x compared to data-parallel training on the NDP architecture and on a multi-GPU system, respectively.
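To make the intra-tile parallelism concrete, below is a minimal NumPy sketch of a single Winograd-transformed convolution tile. It assumes the F(2x2, 3x3) Winograd variant with the standard Lavin & Gray transform matrices; the tile size, matrices, and variable names are illustrative choices, not the paper's NDP implementation. The point it demonstrates is that the Winograd-domain element-wise products are mutually independent, which is what allows MPT to partition them across workers along the intra-tile dimension while confining weight-gradient communication to the data-parallel dimension.

```python
# Minimal sketch (assumptions: NumPy, F(2x2, 3x3), Lavin & Gray matrices);
# illustrative only, not the paper's MPT/NDP implementation.
import numpy as np

# Winograd F(2x2, 3x3) transform matrices.
BT = np.array([[1,  0, -1,  0],
               [0,  1,  1,  0],
               [0, -1,  1,  0],
               [0,  1,  0, -1]], dtype=np.float64)   # input transform (B^T)
G  = np.array([[1.0,  0.0, 0.0],
               [0.5,  0.5, 0.5],
               [0.5, -0.5, 0.5],
               [0.0,  0.0, 1.0]], dtype=np.float64)  # filter transform (G)
AT = np.array([[1, 1,  1,  0],
               [0, 1, -1, -1]], dtype=np.float64)    # output transform (A^T)

rng = np.random.default_rng(0)
d = rng.standard_normal((4, 4))   # one spatial-domain input tile
g = rng.standard_normal((3, 3))   # one 3x3 filter

U = G @ g @ G.T      # filter in the Winograd domain (4x4)
V = BT @ d @ BT.T    # input tile in the Winograd domain (4x4)

# Intra-tile parallelism: the 16 Winograd-domain positions are independent
# element-wise products, so workers along the intra-tile dimension can each
# own a disjoint subset of positions; the weight-gradient all-reduce is then
# needed only along the data-parallel dimension.
M = U * V

Y = AT @ M @ AT.T    # inverse transform: 2x2 spatial-domain output

# Sanity check against direct 3x3 correlation over the same 4x4 tile.
ref = np.array([[np.sum(d[i:i+3, j:j+3] * g) for j in range(2)]
                for i in range(2)])
assert np.allclose(Y, ref)
```

Gathering the per-worker subsets of M (and scattering V) corresponds to the tile transfer the abstract describes, which is why the memory-centric network and activation prediction target exactly this communication.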
Publisher
IEEE/ACM
Issue Date
2018-10-23
Language
English
Citation
The 51st Annual IEEE/ACM International Symposium on Microarchitecture, pp. 682 - 695
DOI
10.1109/MICRO.2018.00061
URI
http://hdl.handle.net/10203/247303
Appears in Collection
EE-Conference Papers (Conference Papers)