Near-data processing on memory-centric network architecture for data-intensive workloads

Recent advances in 3D integration technology enable stacking dies with through-silicon vias (TSVs), and the demand for high-bandwidth memory motivates 3D-stacked memory. The Hybrid Memory Cube (HMC) is one example: a 3D-stacked memory with a logic die at the bottom that has spare area to hold processing elements for near-data processing (NDP). 3D integration allows the logic die and the DRAM dies to be fabricated with different processes, which has made NDP more practical for accelerating a range of workloads. In addition, 3D-stacked memory modules, each acting as a router, can be interconnected with high-speed links to scale the system and form a memory-centric network.

We first explore near-data processing for a fundamental operation, linked-list traversal (LLT), since linked lists are a central data structure in big-memory workloads. We propose a new NDP architecture that preserves the existing sequential programming model and requires no modification to the processor microarchitecture; instead, we exploit the packetized interface between the core and the memory modules to off-load LLT for NDP. We target a system with multiple memory modules interconnected by a memory network, and our initial evaluation shows that simply off-loading LLT computation to near-memory can actually reduce performance because of the additional off-chip memory-network channel traversals. We therefore propose NDP-aware data localization, which exploits locality within a single memory module and within a memory vault, to minimize latency and improve energy efficiency. To improve overall throughput and maximize parallelism, we also propose batching multiple LLT operations together, amortizing the cost of NDP by utilizing the highly parallel NDP processing units and the high bandwidth of 3D-stacked DRAM.

Separately, accelerating neural network training is critical for exploring the design space of neural networks. Data parallelism is commonly used to accelerate training of convolutional neural networks (CNNs), with the input batch distributed across multiple workers; however, the growing communication of weight gradients across the workers limits scalability. We propose multi-dimensional parallel (MDP) training of convolution layers that exploits both data parallelism and the intra-tile parallelism available in Winograd-transformed convolution. Workers are organized across two dimensions, one exploiting intra-tile parallelism while the other exploits data parallelism. MDP reduces the communication needed for weight gradients, since gradients are exchanged only along the data-parallelism dimension. However, the Winograd transform fundamentally requires more data accesses, and the proposed MDP architecture also introduces a new type of communication that we refer to as tile transfer: the gather/scatter of Winograd-domain feature maps (tiles). We propose a scalable NDP architecture that minimizes the cost of data accesses through 3D-stacked memory while leveraging a memory-centric network organization to provide high connectivity among the workers with intra-tile parallelism, accelerating tile transfer. To balance the communication required for weight gradients against that for tile transfer, we further propose a reconfigurable memory-centric network architecture that reconfigures channel connectivity between the workers for each convolution layer.
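The off-loading semantics can be illustrated with a minimal sketch (Python, with entirely hypothetical names and interfaces; the thesis targets HMC-style hardware, not software objects). It models a traversal issued as a single NDP command to a memory module, plus the batched variant:

```python
# Hypothetical sketch of off-loading linked-list traversal (LLT) to an
# NDP unit on a memory module's logic die. All names are illustrative.

class MemoryModule:
    """One 3D-stacked memory module; `memory` maps an address to a
    (key, value, next_addr) node, with next_addr = None at the tail."""

    def __init__(self, memory):
        self.memory = memory

    def ndp_llt(self, head_addr, target_key):
        # The entire pointer chase runs near memory: only the command
        # and the final result cross the off-chip memory network.
        addr = head_addr
        while addr is not None:
            key, value, next_addr = self.memory[addr]
            if key == target_key:
                return value
            addr = next_addr
        return None

    def ndp_llt_batch(self, requests):
        # Batching many traversals into one packetized command amortizes
        # off-loading overhead across the NDP unit's parallel lanes.
        return [self.ndp_llt(head, key) for head, key in requests]

# Example: a three-node list at addresses 0 -> 1 -> 2.
mod = MemoryModule({0: ('a', 10, 1), 1: ('b', 20, 2), 2: ('c', 30, None)})
assert mod.ndp_llt(0, 'c') == 30
assert mod.ndp_llt_batch([(0, 'a'), (0, 'b')]) == [10, 20]
```

The contrast with host-side traversal is that every `next_addr` dereference there is a round trip over the memory network; conversely, when a list spans multiple modules, off-loading adds network hops rather than removing them, which is what motivates the NDP-aware data localization described above.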
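Similarly, the two-dimensional MDP worker organization can be sketched abstractly (again hypothetical Python; T, D, and the group shapes are illustrative assumptions, not the thesis implementation):

```python
# Hypothetical sketch of the MDP worker grid: T x D workers, with
# T-way intra-tile parallelism and D-way data parallelism.

T, D = 4, 4  # e.g., 16 workers total

def gradient_reduce_groups():
    # Weight gradients are reduced only along the data-parallel
    # dimension: one D-member group per intra-tile row, instead of
    # a single (T * D)-member group as in pure data parallelism.
    return [[(t, d) for d in range(D)] for t in range(T)]

def tile_transfer_groups():
    # Tile transfer (gather/scatter of Winograd-domain feature maps)
    # runs along the intra-tile dimension: one T-member group per
    # data-parallel column, served by the memory-centric network.
    return [[(t, d) for t in range(T)] for d in range(D)]

print(f"gradient all-reduce group size: {D} (vs {T * D} for pure DP)")
```

Because the gradient groups shrink by a factor of T while tile-transfer traffic grows with T, the best balance differs per convolution layer, which is the case the reconfigurable memory-centric network addresses.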
Advisors
John Dongjun Kim (김동준)
Department
School of Computing
Publisher
Korea Advanced Institute of Science and Technology (KAIST)
Issue Date
2018
Identifier
325007
Language
eng
Description

Doctoral dissertation (Ph.D.) - School of Computing, 2018.8, [v, 63 p.]

Keywords

Near-data processing; 3D-stacked memory; linked-list; big-memory workload; deep learning; convolutional neural network; neural network training

URI
http://hdl.handle.net/10203/265316
Link
http://library.kaist.ac.kr/search/detail/view.do?bibCtrlNo=828228&flag=dissertation
Appears in Collection
CS-Theses_Ph.D. (Doctoral dissertations)
Files in This Item
There are no files associated with this item.
