T-PIM: An Energy-Efficient Processing-in-Memory Accelerator for End-to-End On-Device Training

Cited 12 times in Web of Science · Cited 0 times in Scopus
DC Field                       Value                                               Language
dc.contributor.author          Heo, Jaehoon                                        ko
dc.contributor.author          Kim, Junsoo                                         ko
dc.contributor.author          Lim, Sukbin                                         ko
dc.contributor.author          Han, Wontak                                         ko
dc.contributor.author          Kim, Joo-Young                                      ko
dc.date.accessioned            2023-09-01T05:00:32Z                                -
dc.date.available              2023-09-01T05:00:32Z                                -
dc.date.created                2022-12-05                                          -
dc.date.issued                 2023-03                                             -
dc.identifier.citation         IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.58, no.3, pp. 600-613    -
dc.identifier.issn             0018-9200                                           -
dc.identifier.uri              http://hdl.handle.net/10203/312104                  -
dc.description.abstract        Recently, on-device training has become crucial for the success of edge intelligence. However, frequent data movement between computing units and memory during training has been a major problem for battery-powered edge devices. Processing-in-memory (PIM) is a novel computing paradigm that merges computing logic into memory and can address the data movement problem with excellent power efficiency. However, previous PIM accelerators cannot support the entire training process on chip because of its computational complexity. This article presents T-PIM, the first PIM accelerator that enables end-to-end on-device training as well as high-speed inference. Its full-custom PIM macro contains 8T-SRAM cells that perform energy-efficient in-cell AND operations, and its bit-serial computation logic enables fully variable bit precision for input data. The macro supports various data mapping methods and computational paths for both fully connected and convolutional layers in order to handle the complex training process. An efficient tiling scheme is also proposed to enable T-PIM to compute deep neural networks of any size on the implemented hardware. In addition, configurable arithmetic units in the forward propagation path allow T-PIM to handle power-of-two bit precisions for weight data, enabling a significant performance boost during inference. Moreover, T-PIM efficiently handles sparsity in both operands by skipping the computation of zeros in the input data and by gating off computing units when the weight data are zero. Finally, we fabricate the T-PIM chip in 28-nm CMOS technology, occupying a die area of 5.04 mm$^2$ including five T-PIM cores. It dissipates 5.25-51.23 mW at 50-280-MHz operating frequency with a 0.75-1.05-V supply voltage. We successfully demonstrate that T-PIM can run end-to-end training of the VGG16 model on the CIFAR10 and CIFAR100 datasets, achieving 0.13-161.08- and 0.25-7.59-TOPS/W power efficiency during inference and training, respectively. The results show that T-PIM is 2.02$\times$ more energy-efficient than the state-of-the-art PIM chip, which supports only backward propagation rather than the whole training process. Furthermore, we conduct an architectural experiment using a cycle-level simulator based on actual measurement results, which suggests that the T-PIM architecture is scalable and that its scaled-up version provides up to 203.26$\times$ higher power efficiency than a comparable GPU.    -
dc.language                    English                                             -
dc.publisher                   IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC      -
dc.title                       T-PIM: An Energy-Efficient Processing-in-Memory Accelerator for End-to-End On-Device Training    -
dc.type                        Article                                             -
dc.identifier.wosid            000886873600001                                     -
dc.identifier.scopusid         2-s2.0-85142855423                                  -
dc.type.rims                   ART                                                 -
dc.citation.volume             58                                                  -
dc.citation.issue              3                                                   -
dc.citation.beginningpage      600                                                 -
dc.citation.endingpage         613                                                 -
dc.citation.publicationname    IEEE JOURNAL OF SOLID-STATE CIRCUITS                -
dc.identifier.doi              10.1109/JSSC.2022.3220195                           -
dc.contributor.localauthor     Kim, Joo-Young                                      -
dc.contributor.nonIdAuthor     Kim, Junsoo                                         -
dc.contributor.nonIdAuthor     Lim, Sukbin                                         -
dc.contributor.nonIdAuthor     Han, Wontak                                         -
dc.description.isOpenAccess    N                                                   -
dc.type.journalArticle         Article                                             -
dc.subject.keywordAuthor       Bit-serial arithmetic                               -
dc.subject.keywordAuthor       deep neural network (DNN)                           -
dc.subject.keywordAuthor       edge device accelerator                             -
dc.subject.keywordAuthor       energy-efficient SRAM                               -
dc.subject.keywordAuthor       on-device training                                  -
dc.subject.keywordAuthor       processing-in-memory (PIM)                          -
dc.subject.keywordAuthor       sparsity handling                                   -
dc.subject.keywordAuthor       training                                            -
dc.subject.keywordPlus         DEEP NEURAL-NETWORKS                                -
dc.subject.keywordPlus         SRAM                                                -
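
The abstract above names two hardware techniques that can be stated concisely in software terms: bit-serial multiply-accumulate with zero-skipping on inputs and zero-gating on weights, and a tiling scheme that maps layers of any size onto a fixed-size macro. The sketches below are minimal Python illustrations of those concepts only; all function names, bit widths, and tile sizes are hypothetical and are not taken from the T-PIM design.

```python
# A minimal sketch (not T-PIM's hardware) of bit-serial multiply-accumulate with
# the two sparsity optimizations the abstract describes: skipping zero input
# bits and gating off computation for zero weights. Widths are hypothetical.

def bit_serial_mac(inputs, weights, input_bits=8):
    """Dot product with inputs processed one bit-plane per cycle, LSB first."""
    acc = 0
    for b in range(input_bits):
        partial = 0
        for x, w in zip(inputs, weights):
            if w == 0:
                continue          # zero weight: the computing unit is gated off
            if (x >> b) & 1:      # zero input bit: the computation is skipped
                partial += w      # conceptually, an in-cell AND plus adder tree
        acc += partial << b       # shift-accumulate by bit position
    return acc

# The result matches a conventional dot product.
xs, ws = [3, 0, 5, 7], [2, 9, 0, 1]
assert bit_serial_mac(xs, ws) == sum(x * w for x, w in zip(xs, ws))
```

Variable input precision falls out of this formulation for free: lowering `input_bits` simply shortens the bit-serial loop. The tiling scheme can be pictured the same way: a fixed-size macro computes one tile of a larger matrix-vector product at a time, and partial sums are accumulated across tiles. The tile sizes below are arbitrary placeholders.

```python
import numpy as np

def tiled_matvec(W, x, tile_rows=64, tile_cols=64):
    """Compute W @ x by mapping fixed-size tiles of W onto a macro one at a
    time and accumulating partial sums, so any layer size fits the hardware."""
    out = np.zeros(W.shape[0], dtype=W.dtype)
    for r in range(0, W.shape[0], tile_rows):
        for c in range(0, W.shape[1], tile_cols):
            # One macro invocation: a (tile_rows x tile_cols) slice of W.
            out[r:r + tile_rows] += (
                W[r:r + tile_rows, c:c + tile_cols] @ x[c:c + tile_cols]
            )
    return out

W = np.arange(100 * 130).reshape(100, 130) % 7 - 3  # deterministic test data
x = np.arange(130) % 5
assert np.array_equal(tiled_matvec(W, x), W @ x)
```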
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.