Rare Computing: Removing Redundant Multiplications from Sparse and Repetitive Data in Deep Neural Networks

Recent research shows that 4-bit data precision is sufficient for Deep Neural Network (DNN) inference without accuracy degradation. At such a low bit-width, many data values are repeated. In this paper, we propose a hardware architecture, named Rare Computing Architecture (RCA), that skips the redundant computations caused by repeated data in the networks. By exploiting redundancy, RCA is largely insensitive to data sparsity and maintains large gains in performance and energy efficiency, whereas the gains of existing DNN accelerators are sensitive to variations in sparsity. In RCA, a Redundancy Censoring Unit (RCU) detects repeated data within a censoring window and processes them at once, achieving high effective throughput. Additionally, we present a dataflow that exploits the abundant data reusability in DNNs, which allows these high-throughput computations to proceed continuously without increasing the read bandwidth. The proposed architecture is evaluated in two configurations: one exploiting weight repetition and one exploiting activation repetition. In the evaluation, RCA is compared to a value-agnostic computation and to UCNN, the state-of-the-art accelerator that exploits weight repetition. RCA is also compared to Bit-pragmatic, which exploits bit-level sparsity. Both evaluations demonstrate that RCA delivers consistently high improvements in performance and energy efficiency.
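The abstract does not describe the RCU's internals. As a hypothetical software sketch (not the authors' hardware design), the arithmetic idea behind skipping multiplications on repeated weights can be shown by factoring a dot product: within a window, activations that share a weight value are summed first, and each distinct nonzero weight value is multiplied only once. The function names `dot_naive` and `dot_repetition_aware` and the `window` parameter are illustrative assumptions.

```python
from collections import defaultdict


def dot_naive(weights, activations):
    """Value-agnostic dot product: one multiplication per (weight, activation) pair.

    Returns (result, number_of_multiplications).
    """
    result = sum(w * a for w, a in zip(weights, activations))
    return result, len(weights)


def dot_repetition_aware(weights, activations, window=16):
    """Hypothetical repetition-aware dot product (illustration only, not RCA's RCU).

    Inside each window, activations that share the same weight value are
    accumulated together, then multiplied once by that weight value.
    Zero weights are skipped entirely.
    Returns (result, number_of_multiplications).
    """
    total = 0
    mults = 0
    for start in range(0, len(weights), window):
        groups = defaultdict(int)  # weight value -> sum of its activations
        w_win = weights[start:start + window]
        a_win = activations[start:start + window]
        for w, a in zip(w_win, a_win):
            if w != 0:
                groups[w] += a  # addition replaces a multiplication
        for w, acc in groups.items():
            total += w * acc  # one multiplication per distinct value
            mults += 1
    return total, mults


if __name__ == "__main__":
    import random

    random.seed(0)
    # Assumed 4-bit signed weights (-8..7) and 4-bit unsigned activations (0..15).
    weights = [random.randint(-8, 7) for _ in range(64)]
    activations = [random.randint(0, 15) for _ in range(64)]

    ref, n_ref = dot_naive(weights, activations)
    out, n_out = dot_repetition_aware(weights, activations, window=64)
    assert out == ref
    print("naive multiplications:", n_ref, "repetition-aware:", n_out)
```

With 4-bit signed weights there are at most 15 distinct nonzero values, so a window needs at most 15 multiplications no matter how long it is; the remaining work becomes additions. How closely this matches RCA's actual cost model and censoring-window size is an assumption here.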
Publisher: IEEE COMPUTER SOC
Issue Date: 2022-04
Language: English
Article Type: Article
Citation: IEEE TRANSACTIONS ON COMPUTERS, v.71, no.4, pp.795 - 808
ISSN: 0018-9340
DOI: 10.1109/TC.2021.3063269
URI: http://hdl.handle.net/10203/292582
Appears in Collection: EE-Journal Papers