A Software-defined Tensor Streaming Multiprocessor for Large-scale Machine Learning

Cited 11 times in Web of Science; cited 0 times in Scopus
DC Field | Value | Language
dc.contributor.author | Abts, Dennis | ko
dc.contributor.author | Kimmell, Garrin | ko
dc.contributor.author | Ling, Andrew | ko
dc.contributor.author | Kim, John Dongjun | ko
dc.contributor.author | Boyd, Matt | ko
dc.contributor.author | Bitar, Andrew | ko
dc.contributor.author | Parmar, Sahil | ko
dc.contributor.author | Ahmed, Ibrahim | ko
dc.contributor.author | DiCecco, Roberto | ko
dc.contributor.author | Han, David | ko
dc.contributor.author | Thompson, John | ko
dc.contributor.author | Bye, Michael | ko
dc.contributor.author | Hwang, Jennifer | ko
dc.contributor.author | Fowers, Jeremy | ko
dc.contributor.author | Lillian, Peter | ko
dc.contributor.author | Murthy, Ashwin | ko
dc.contributor.author | Mehtabuddin, Elyas | ko
dc.contributor.author | Tekur, Chetan | ko
dc.contributor.author | Sohmers, Thomas | ko
dc.contributor.author | Kang, Kris | ko
dc.contributor.author | Maresh, Stephen | ko
dc.contributor.author | Ross, Jonathan | ko
dc.date.accessioned | 2022-11-28T08:05:01Z | -
dc.date.available | 2022-11-28T08:05:01Z | -
dc.date.created | 2022-11-26 | -
dc.date.issued | 2022-06-18 | -
dc.identifier.citation | 49th IEEE/ACM International Symposium on Computer Architecture, ISCA 2022, pp. 567-580 | -
dc.identifier.issn | 1063-6897 | -
dc.identifier.uri | http://hdl.handle.net/10203/301182 | -
dc.description.abstract | We describe our novel commercial software-defined approach for large-scale interconnection networks of tensor streaming processing (TSP) elements. The system architecture includes packaging, routing, and flow control of the interconnection network of TSPs. We describe the communication and synchronization primitives of a bandwidth-rich substrate for global communication. This scalable communication fabric provides the backbone for large-scale systems based on a software-defined Dragonfly topology, ultimately yielding a parallel machine learning system with elasticity to support a variety of workloads, both training and inference. We extend the TSP's producer-consumer stream programming model to include global memory which is implemented as logically shared, but physically distributed SRAM on-chip memory. Each TSP contributes 220 MiBytes to the global memory capacity, with the maximum capacity limited only by the network's scale (the maximum number of endpoints in the system). The TSP acts as both a processing element (endpoint) and network switch for moving tensors across the communication links. We describe a novel software-controlled networking approach that avoids the latency variation introduced by dynamic contention for network links. We describe the topology, routing and flow control to characterize the performance of the network that serves as the fabric for a large-scale parallel machine learning system with up to 10,440 TSPs and more than 2 TeraBytes of global memory accessible in less than 3 microseconds of end-to-end system latency. | -
dc.language | English | -
dc.publisher | ACM | -
dc.title | A Software-defined Tensor Streaming Multiprocessor for Large-scale Machine Learning | -
dc.type | Conference | -
dc.identifier.wosid | 000852702500040 | -
dc.identifier.scopusid | 2-s2.0-85132810555 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 567 | -
dc.citation.endingpage | 580 | -
dc.citation.publicationname | 49th IEEE/ACM International Symposium on Computer Architecture, ISCA 2022 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | New York | -
dc.identifier.doi | 10.1145/3470496.3527405 | -
dc.contributor.localauthor | Kim, John Dongjun | -
dc.contributor.nonIdAuthor | Abts, Dennis | -
dc.contributor.nonIdAuthor | Kimmell, Garrin | -
dc.contributor.nonIdAuthor | Ling, Andrew | -
dc.contributor.nonIdAuthor | Boyd, Matt | -
dc.contributor.nonIdAuthor | Bitar, Andrew | -
dc.contributor.nonIdAuthor | Parmar, Sahil | -
dc.contributor.nonIdAuthor | Ahmed, Ibrahim | -
dc.contributor.nonIdAuthor | DiCecco, Roberto | -
dc.contributor.nonIdAuthor | Han, David | -
dc.contributor.nonIdAuthor | Thompson, John | -
dc.contributor.nonIdAuthor | Bye, Michael | -
dc.contributor.nonIdAuthor | Hwang, Jennifer | -
dc.contributor.nonIdAuthor | Fowers, Jeremy | -
dc.contributor.nonIdAuthor | Lillian, Peter | -
dc.contributor.nonIdAuthor | Murthy, Ashwin | -
dc.contributor.nonIdAuthor | Mehtabuddin, Elyas | -
dc.contributor.nonIdAuthor | Tekur, Chetan | -
dc.contributor.nonIdAuthor | Sohmers, Thomas | -
dc.contributor.nonIdAuthor | Kang, Kris | -
dc.contributor.nonIdAuthor | Maresh, Stephen | -
dc.contributor.nonIdAuthor | Ross, Jonathan | -
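
The abstract's capacity claim can be checked with simple arithmetic. The sketch below multiplies the stated per-TSP contribution (220 MiBytes) by the stated maximum system size (10,440 TSPs). It assumes that one MiByte is 2^20 bytes and that every TSP contributes its full 220 MiBytes; the function and constant names are illustrative and do not come from the paper.

```python
# Rough check of the global memory capacity quoted in the abstract.
# Assumption: 1 MiByte = 2**20 bytes, and each TSP contributes all 220 MiBytes.

MIB = 2**20               # bytes per MiByte (assumed binary prefix)
SRAM_PER_TSP_MIB = 220    # per-TSP contribution stated in the abstract
MAX_TSPS = 10_440         # maximum system size stated in the abstract


def global_memory_bytes(num_tsps, sram_per_tsp_mib=SRAM_PER_TSP_MIB):
    """Total logically shared memory, in bytes, for a system of num_tsps TSPs."""
    return num_tsps * sram_per_tsp_mib * MIB


total = global_memory_bytes(MAX_TSPS)
print(f"{total / 10**12:.2f} TB")   # decimal terabytes
print(f"{total / 2**40:.2f} TiB")   # binary tebibytes
```

Under these assumptions the total comes to about 2.41 TB (about 2.19 TiB), which is consistent with the abstract's "more than 2 TeraBytes".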
Appears in Collection
EE-Conference Papers (Academic Conference Papers)