Near-data processing on memory-centric network architecture for data-intensive workloads

Recent advances in 3D integration technology enable stacking dies with through-silicon vias (TSVs), and the demand for high-bandwidth memory motivates 3D-stacked memory. The Hybrid Memory Cube (HMC) is one example: a 3D-stacked memory with a logic die at the bottom that has spare area to hold processing elements for near-data processing (NDP). 3D integration allows the logic die and the DRAM dies to be fabricated with different processes, which has made NDP more practical for accelerating a range of workloads. In addition, 3D-stacked memory modules, each acting as a router, can be interconnected with high-speed links to scale the system and form a memory-centric network.

We first explore near-data processing for a fundamental operation, linked-list traversal (LLT), since linked lists are a central data structure in big-memory workloads. We propose a new NDP architecture that preserves the existing sequential programming model and requires no modification to the processor microarchitecture; instead, we exploit the packetized interface between the core and the memory modules to off-load LLT for NDP. We target a system with multiple memory modules interconnected by a memory network, and our initial evaluation shows that simply off-loading LLT computation to near-memory can actually reduce performance because of the additional off-chip memory-network channel traversals. We therefore propose NDP-aware data localization, which exploits locality within a single memory module and within a memory vault, to minimize latency and improve energy efficiency. To improve overall throughput and maximize parallelism, we also propose batching multiple LLT operations together, amortizing the cost of NDP by utilizing the highly parallel NDP processing units and the high bandwidth of 3D-stacked DRAM.

Separately, accelerating neural network training is critical for exploring the design space of neural networks. Data parallelism is commonly used to accelerate training of convolutional neural networks (CNNs), with the input batch distributed across multiple workers; however, the growing communication of weight gradients across the workers limits scalability. We propose multi-dimensional parallel (MDP) training of convolution layers that exploits both data parallelism and the intra-tile parallelism available in Winograd-transformed convolution. Workers are organized across two dimensions, one exploiting intra-tile parallelism while the other exploits data parallelism. MDP reduces the communication needed for weight gradients, since gradients are exchanged only along the data-parallelism dimension. However, the Winograd transform fundamentally requires more data accesses, and the proposed MDP architecture also introduces a new type of communication that we refer to as tile transfer: the gather/scatter of Winograd-domain feature maps (tiles). We propose a scalable NDP architecture that minimizes the cost of data accesses through 3D-stacked memory while leveraging a memory-centric network organization to provide high connectivity among the workers with intra-tile parallelism, accelerating tile transfer. To balance the communication required for weight gradients against that for tile transfer, we further propose a reconfigurable memory-centric network architecture that reconfigures channel connectivity between the workers for each convolution layer.
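The off-loading semantics can be illustrated with a minimal sketch (Python, with entirely hypothetical names and interfaces; the thesis targets HMC-style hardware, not software objects). It models a traversal issued as a single NDP command to a memory module, plus the batched variant:

```python
# Hypothetical sketch of off-loading linked-list traversal (LLT) to an
# NDP unit on a memory module's logic die. All names are illustrative.

class MemoryModule:
    """One 3D-stacked memory module; `memory` maps an address to a
    (key, value, next_addr) node, with next_addr = None at the tail."""

    def __init__(self, memory):
        self.memory = memory

    def ndp_llt(self, head_addr, target_key):
        # The entire pointer chase runs near memory: only the command
        # and the final result cross the off-chip memory network.
        addr = head_addr
        while addr is not None:
            key, value, next_addr = self.memory[addr]
            if key == target_key:
                return value
            addr = next_addr
        return None

    def ndp_llt_batch(self, requests):
        # Batching many traversals into one packetized command amortizes
        # off-loading overhead across the NDP unit's parallel lanes.
        return [self.ndp_llt(head, key) for head, key in requests]

# Example: a three-node list at addresses 0 -> 1 -> 2.
mod = MemoryModule({0: ('a', 10, 1), 1: ('b', 20, 2), 2: ('c', 30, None)})
assert mod.ndp_llt(0, 'c') == 30
assert mod.ndp_llt_batch([(0, 'a'), (0, 'b')]) == [10, 20]
```

The contrast with host-side traversal is that every `next_addr` dereference there is a round trip over the memory network; conversely, when a list spans multiple modules, off-loading adds network hops rather than removing them, which is what motivates the NDP-aware data localization described above.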
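Similarly, the two-dimensional MDP worker organization can be sketched abstractly (again hypothetical Python; T, D, and the group shapes are illustrative assumptions, not the thesis implementation):

```python
# Hypothetical sketch of the MDP worker grid: T x D workers, with
# T-way intra-tile parallelism and D-way data parallelism.

T, D = 4, 4  # e.g., 16 workers total

def gradient_reduce_groups():
    # Weight gradients are reduced only along the data-parallel
    # dimension: one D-member group per intra-tile row, instead of
    # a single (T * D)-member group as in pure data parallelism.
    return [[(t, d) for d in range(D)] for t in range(T)]

def tile_transfer_groups():
    # Tile transfer (gather/scatter of Winograd-domain feature maps)
    # runs along the intra-tile dimension: one T-member group per
    # data-parallel column, served by the memory-centric network.
    return [[(t, d) for t in range(T)] for d in range(D)]

print(f"gradient all-reduce group size: {D} (vs {T * D} for pure DP)")
```

Because the gradient groups shrink by a factor of T while tile-transfer traffic grows with T, the best balance differs per convolution layer, which is the case the reconfigurable memory-centric network addresses.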
Advisors
John Dongjun Kim (김동준)
Department
School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - School of Computing, 2018.8, [v, 63 p.]

Keywords

Near-data processing; 3D-stacked memory; linked-list; big-memory workload; deep learning; convolutional neural network; neural network training

URI
http://hdl.handle.net/10203/265316
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828228&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral dissertations)
Files in This Item
There are no files associated with this item.
