UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision

Cited 209 times in Web of Science · Cited 142 times in Scopus
  • Hits: 1099
  • Downloads: 0
DC Field | Value | Language
dc.contributor.author | Lee, Jinmook | ko
dc.contributor.author | Kim, Changhyeon | ko
dc.contributor.author | Kang, Sanghoon | ko
dc.contributor.author | Shin, Dongjoo | ko
dc.contributor.author | Kim, Sangyeob | ko
dc.contributor.author | Yoo, Hoi-Jun | ko
dc.date.accessioned | 2019-02-21T01:27:13Z | -
dc.date.available | 2019-02-21T01:27:13Z | -
dc.date.created | 2018-12-19 | -
dc.date.issued | 2019-01 | -
dc.identifier.citation | IEEE JOURNAL OF SOLID-STATE CIRCUITS, v.54, no.1, pp.1 - 13 | -
dc.identifier.issn | 0018-9200 | -
dc.identifier.uri | http://hdl.handle.net/10203/250504 | -
dc.description.abstract | An energy-efficient deep neural network (DNN) accelerator, the unified neural processing unit (UNPU), is proposed for mobile deep learning applications. The UNPU supports both convolutional layers (CLs) and recurrent or fully connected layers (FCLs), covering the versatile workload combinations needed to accelerate various mobile deep learning applications. In addition, the UNPU is the first DNN accelerator ASIC to support fully variable weight bit precision from 1 to 16 bit, which lets it operate at the accuracy-energy-optimal point. Moreover, the lookup table (LUT)-based bit-serial processing element (LBPE) in the UNPU reduces energy consumption by 23.1%, 27.2%, 41%, and 53.6% for 16-, 8-, 4-, and 1-bit weight precision, respectively, compared with a conventional fixed-point multiply-and-accumulate (MAC) array. Besides the energy-efficiency improvement, the unified DNN core architecture of the UNPU improves peak CL performance by 1.15× over the previous work, allowing the UNPU to run at a lower voltage and frequency for a given DNN and thereby increase energy efficiency. The UNPU is implemented in 65-nm CMOS technology and occupies a 4 × 4 mm² die area. It operates from a 0.63- to 1.1-V supply voltage with a maximum frequency of 200 MHz, delivering a peak performance of 345.6 GOPS at 16-bit weight precision and 7372 GOPS at 1-bit weight precision. Over this wide operating range, the UNPU achieves an energy efficiency of 3.08 TOPS/W at 16-bit weight precision and 50.6 TOPS/W at 1-bit weight precision. The functionality of the UNPU is successfully demonstrated on a verification system running an ImageNet deep CNN (VGG-16). (A sketch of the LUT-based bit-serial idea appears after this record.) | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | UNPU: An Energy-Efficient Deep Neural Network Accelerator With Fully Variable Weight Bit Precision | -
dc.type | Article | -
dc.identifier.wosid | 000457637300016 | -
dc.identifier.scopusid | 2-s2.0-85060461834 | -
dc.type.rims | ART | -
dc.citation.volume | 54 | -
dc.citation.issue | 1 | -
dc.citation.beginningpage | 1 | -
dc.citation.endingpage | 13 | -
dc.citation.publicationname | IEEE JOURNAL OF SOLID-STATE CIRCUITS | -
dc.identifier.doi | 10.1109/JSSC.2018.2865489 | -
dc.contributor.localauthor | Yoo, Hoi-Jun | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article; Proceedings Paper | -
dc.subject.keywordAuthor | Bit serial | -
dc.subject.keywordAuthor | deep learning | -
dc.subject.keywordAuthor | deep learning ASIC | -
dc.subject.keywordAuthor | deep learning hardware | -
dc.subject.keywordAuthor | deep neural network (DNN) | -
dc.subject.keywordAuthor | mobile deep learning | -
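
The following is a minimal Python sketch of the general idea behind a LUT-based bit-serial processing element, as described in the abstract: weights are consumed one bit-plane per step, and a lookup table of precomputed activation partial sums replaces the per-bit multiplies, so step count (and hence energy) scales with the chosen weight precision. This is an illustrative simplification under assumed conventions (two's-complement weights, a 4-activation group); the function names are hypothetical, and it does not reproduce the UNPU's actual LBPE circuit.

```python
# Minimal sketch: bit-serial dot product with fully variable weight
# precision, using a LUT of precomputed activation partial sums.
# Assumptions (not from the paper): two's-complement weights, small
# activation groups, and the hypothetical names below.

def build_lut(acts):
    """Precompute the sum of every subset of a small activation group.
    For g activations the LUT holds 2**g entries; entry `mask` is the
    sum of the activations whose index bit is set in `mask`."""
    g = len(acts)
    return [sum(a for i, a in enumerate(acts) if (mask >> i) & 1)
            for mask in range(2 ** g)]

def bit_serial_dot(acts, weights, wbits):
    """Dot product with `wbits`-bit two's-complement weights, processed
    one weight bit-plane per step. Step count scales with `wbits`, so
    1-bit weights need 1/16 the steps of 16-bit weights on the same
    datapath."""
    lut = build_lut(acts)
    acc = 0
    for b in range(wbits):
        # Gather bit `b` of every weight into one LUT index (a bit-plane).
        mask = 0
        for i, w in enumerate(weights):
            mask |= ((w >> b) & 1) << i
        partial = lut[mask]        # one LUT read replaces g multiplies
        if b == wbits - 1:
            acc -= partial << b    # MSB is negative in two's complement
        else:
            acc += partial << b
    return acc

if __name__ == "__main__":
    acts = [3, -1, 4, 2]
    weights = [5, -3, 2, -7]       # representable in 4-bit two's complement
    assert bit_serial_dot(acts, weights, wbits=4) == sum(
        a * w for a, w in zip(acts, weights))  # both give 12
```

The per-bit-plane shift-and-accumulate is what makes the precision "fully variable": lowering the weight bit width directly removes steps, which is the mechanism behind throughput and TOPS/W improving as weight precision drops, consistent with the 16-bit versus 1-bit figures quoted in the abstract.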
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
There are no files associated with this item.