Agamotto: A Performance Optimization Framework for CNN Accelerator With Row Stationary Dataflow

DC Field | Value | Language
--- | --- | ---
dc.contributor.author | Kim, Donghyuk | ko
dc.contributor.author | Jeong, Sanghyun | ko
dc.contributor.author | Kim, Joo-Young | ko
dc.date.accessioned | 2023-06-07T07:00:42Z | -
dc.date.available | 2023-06-07T07:00:42Z | -
dc.date.created | 2023-04-17 | -
dc.date.issued | 2023-06 | -
dc.identifier.citation | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS, v.70, no.6, pp.2487 - 2496 | -
dc.identifier.issn | 1549-8328 | -
dc.identifier.uri | http://hdl.handle.net/10203/307089 | -
dc.description.abstract | We propose a software/hardware co-design framework called Agamotto for the complete design automation and performance optimization of the row stationary-based CNN accelerator. We design a scalable accelerator template whose critical design parameters can be configured. Based on the hardware template, Agamotto estimates the performance of the numerous possible hardware implementations for the target FPGA device and CNN model using the latency modeling tool. It chooses the best hardware design and generates the instructions and optimal runtime variables for each target CNN layer. As a result, Agamotto can generate the best hardware design within 61.67 seconds, achieving up to 2.8x higher hardware utilization than the original accelerator. In addition, experimental results show that the performance estimation is accurate, showing only 4.8% difference against the FPGA runtime for the end-to-end CNN model execution. The accelerator implemented on the Xilinx VCU118 evaluation board achieves 402 giga operations per second (GOPS) at 200 MHz, resulting in 13 frames per second (FPS) for the end-to-end execution of VGG-16. It is flexible enough to run more complex CNN models such as ResNet-50 and DarkNet-53, achieving 29.3 FPS and 16.9 FPS, respectively. | -
dc.language | English | -
dc.publisher | IEEE-INST ELECTRICAL ELECTRONICS ENGINEERS INC | -
dc.title | Agamotto: A Performance Optimization Framework for CNN Accelerator With Row Stationary Dataflow | -
dc.type | Article | -
dc.identifier.wosid | 000958818900001 | -
dc.identifier.scopusid | 2-s2.0-85151491443 | -
dc.type.rims | ART | -
dc.citation.volume | 70 | -
dc.citation.issue | 6 | -
dc.citation.beginningpage | 2487 | -
dc.citation.endingpage | 2496 | -
dc.citation.publicationname | IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS I-REGULAR PAPERS | -
dc.identifier.doi | 10.1109/TCSI.2023.3258411 | -
dc.contributor.localauthor | Kim, Joo-Young | -
dc.contributor.nonIdAuthor | Jeong, Sanghyun | -
dc.description.isOpenAccess | N | -
dc.type.journalArticle | Article | -
dc.subject.keywordAuthor | Hardware | -
dc.subject.keywordAuthor | Convolutional neural networks | -
dc.subject.keywordAuthor | Convolution | -
dc.subject.keywordAuthor | Computational modeling | -
dc.subject.keywordAuthor | Field programmable gate arrays | -
dc.subject.keywordAuthor | Data models | -
dc.subject.keywordAuthor | Arrays | -
dc.subject.keywordAuthor | CNN | -
dc.subject.keywordAuthor | row stationary dataflow | -
dc.subject.keywordAuthor | mapping strategy | -
dc.subject.keywordAuthor | performance optimization | -
dc.subject.keywordAuthor | software | -
dc.subject.keywordAuthor | hardware co-design | -
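
The abstract describes a selection loop: estimate the latency of many candidate hardware configurations for a target device and CNN model, then keep the best one. The Python sketch below illustrates only that general idea and is not Agamotto's actual tool. Everything in it is an assumption made for illustration: the `Layer` and `Config` types, the PE-count budget standing in for FPGA resources, and the toy latency model in which filter rows map onto PE rows, output rows map onto PE columns, and each PE performs one multiply-accumulate per cycle.

```python
from dataclasses import dataclass
from itertools import product
from math import ceil


@dataclass(frozen=True)
class Layer:
    """Shape of one convolution layer (stride 1 assumed)."""
    out_h: int
    out_w: int
    in_ch: int
    out_ch: int
    kernel: int


@dataclass(frozen=True)
class Config:
    """One candidate PE-array shape (hypothetical design parameters)."""
    pe_rows: int
    pe_cols: int


def estimate_cycles(layer: Layer, cfg: Config) -> int:
    """Toy latency model (an assumption, not the paper's model).

    Each PE computes a 1D convolution of one filter row with one input row,
    taking out_w * kernel cycles. Filter rows are folded over PE rows and
    output rows over PE columns; channels are processed sequentially.
    """
    row_passes = ceil(layer.kernel / cfg.pe_rows)
    col_passes = ceil(layer.out_h / cfg.pe_cols)
    cycles_per_pass = layer.out_w * layer.kernel
    return row_passes * col_passes * layer.in_ch * layer.out_ch * cycles_per_pass


def total_cycles(layers, cfg: Config) -> int:
    """Estimated cycles for running all layers back to back."""
    return sum(estimate_cycles(layer, cfg) for layer in layers)


def utilization(layers, cfg: Config) -> float:
    """Useful MACs divided by the MAC slots offered by the PE array."""
    macs = sum(l.out_h * l.out_w * l.kernel ** 2 * l.in_ch * l.out_ch for l in layers)
    return macs / (total_cycles(layers, cfg) * cfg.pe_rows * cfg.pe_cols)


def best_config(layers, candidates, max_pes: int) -> Config:
    """Pick the feasible configuration with the fewest estimated cycles.

    max_pes is a stand-in for the device's resource budget; ties are broken
    in favor of the smaller array.
    """
    feasible = [c for c in candidates if c.pe_rows * c.pe_cols <= max_pes]
    return min(feasible, key=lambda c: (total_cycles(layers, c), c.pe_rows * c.pe_cols))


# Example: one small 3x3 layer and six candidate array shapes.
layers = [Layer(out_h=8, out_w=8, in_ch=2, out_ch=4, kernel=3)]
candidates = [Config(r, c) for r, c in product([1, 3, 4], [4, 8])]
best = best_config(layers, candidates, max_pes=24)
print(best, total_cycles(layers, best), utilization(layers, best))
```

In this toy setting the 3x8 array wins because it matches both the filter height and the output height, so no PE sits idle. A real framework would replace `estimate_cycles` with a model that also accounts for memory bandwidth, buffering, and per-layer runtime variables.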
Appears in Collection
EE-Journal Papers(저널논문)