TRiM: Tensor Reduction in Memory

Cited 12 times in Web of Science; cited 6 times in Scopus
Personalized recommendation systems are gaining significant traction due to their industrial importance. An important building block of recommendation systems is the embedding layer, which exhibits highly memory-intensive characteristics. The fundamental primitives of embedding layers are embedding vector gathers followed by vector reductions, which exhibit low arithmetic intensity and become bottlenecked by memory throughput. To address this issue, recent proposals in this research space employ a near-data processing (NDP) solution at the DRAM rank level, achieving a significant performance speedup. We observe that prior NDP solutions based on rank-level parallelism leave significant performance on the table, as they do not fully reap the abundant data transfer throughput inherent in DRAM datapaths. Based on the observation that the DRAM datapath has a hierarchical tree structure, we propose a novel, fine-grained NDP architecture for recommendation systems, which augments the DRAM datapath with an "in-DRAM" reduction unit at the DDR4/5 rank/bank-group/bank level, achieving significant performance improvements over state-of-the-art approaches. We also propose hot embedding-vector replication to alleviate the load imbalance across the reduction units.
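The gather-then-reduce primitive that the abstract identifies as the memory-bound core of embedding layers can be illustrated with a minimal sketch. The table contents, sizes, and indices below are illustrative placeholders, not drawn from the paper:

```python
def gather_reduce(table, indices):
    """Gather embedding rows by index and sum them into one output vector.

    This is the low-arithmetic-intensity primitive the paper targets:
    each gathered row costs a full DRAM read, but contributes only a
    handful of cheap additions, so memory throughput is the bottleneck.
    """
    dim = len(table[0])
    result = [0.0] * dim
    for i in indices:
        row = table[i]              # one row fetched from memory
        for d in range(dim):
            result[d] += row[d]     # element-wise reduction (cheap adds)
    return result

# Illustrative embedding table: 3 vectors of dimension 2.
table = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(gather_reduce(table, [0, 2]))  # → [6.0, 8.0]
```

An NDP design like the one proposed performs this reduction inside the DRAM datapath itself, so only the single reduced vector, rather than every gathered row, crosses the memory channel.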
Publisher
IEEE COMPUTER SOC
Issue Date
2021-01
Language
English
Article Type
Article
Citation

IEEE COMPUTER ARCHITECTURE LETTERS, v.20, no.1, pp.5 - 8

ISSN
1556-6056
DOI
10.1109/LCA.2020.3042805
URI
http://hdl.handle.net/10203/280737
Appears in Collection
EE-Journal Papers (Journal Papers)
Files in This Item
000607793700001.pdf (1.08 MB)