MGPUSim: Enabling multi-GPU performance modeling and optimization

Cited 40 times in Web of Science; cited 34 times in Scopus
DC Field | Value | Language
dc.contributor.author | Sun, Yifan | ko
dc.contributor.author | Baruah, Trinayan | ko
dc.contributor.author | Mojumder, Saiful A. | ko
dc.contributor.author | Dong, Shi | ko
dc.contributor.author | Gong, Xiang | ko
dc.contributor.author | Treadway, Shane | ko
dc.contributor.author | Bao, Yuhui | ko
dc.contributor.author | Hance, Spencer | ko
dc.contributor.author | McCardwell, Carter | ko
dc.contributor.author | Zhao, Vincent | ko
dc.contributor.author | Barclay, Harrison | ko
dc.contributor.author | Ziabari, Amir Kavyan | ko
dc.contributor.author | Chen, Zhongliang | ko
dc.contributor.author | Ubal, Rafael | ko
dc.contributor.author | Abellán, José L. | ko
dc.contributor.author | Kim, John | ko
dc.contributor.author | Joshi, Ajay | ko
dc.contributor.author | Kaeli, David | ko
dc.date.accessioned | 2019-08-19T01:20:03Z | -
dc.date.available | 2019-08-19T01:20:03Z | -
dc.date.created | 2019-08-16 | -
dc.date.issued | 2019-06-24 | -
dc.identifier.citation | 46th International Symposium on Computer Architecture, ISCA 2019, pp. 197-209 | -
dc.identifier.issn | 1063-6897 | -
dc.identifier.uri | http://hdl.handle.net/10203/264256 | -
dc.description.abstract | The rapidly growing popularity and scale of data-parallel workloads demand a corresponding increase in raw computational power of Graphics Processing Units (GPUs). As single-GPU platforms struggle to satisfy these performance demands, multi-GPU platforms have started to dominate the high-performance computing world. The advent of such systems raises a number of design challenges, including the GPU microarchitecture, multi-GPU interconnect fabric, runtime libraries, and associated programming models. The research community currently lacks a publicly available and comprehensive multi-GPU simulation framework to evaluate next-generation multi-GPU system designs. In this work, we present MGPUSim, a cycle-accurate, extensively validated, multi-GPU simulator, based on AMD's Graphics Core Next 3 (GCN3) instruction set architecture. MGPUSim comes with built-in support for multi-threaded execution to enable fast, parallelized, and accurate simulation. In terms of performance accuracy, MGPUSim differs by only 5.5% on average from the actual GPU hardware. We also achieve a 3.5× and a 2.5× average speedup running functional emulation and detailed timing simulation, respectively, on a 4-core CPU, while delivering the same accuracy as serial simulation. We illustrate the flexibility and capability of the simulator through two concrete design studies. In the first, we propose the Locality API, an API extension that allows the GPU programmer both to avoid the complexity of multi-GPU programming and to precisely control data placement in the multi-GPU memory. In the second design study, we propose Progressive Page Splitting Migration (PASI), a customized multi-GPU memory management system enabling the hardware to progressively improve data placement. For a discrete 4-GPU system, we observe that the Locality API can speed up the system by 1.6× (geometric mean), and PASI can improve the system performance by 2.6× (geometric mean) across all benchmarks, compared to a unified 4-GPU platform. | -
dc.language | English | -
dc.publisher | Institute of Electrical and Electronics Engineers Inc. | -
dc.title | MGPUSim: Enabling multi-GPU performance modeling and optimization | -
dc.type | Conference | -
dc.identifier.wosid | 000521059600016 | -
dc.identifier.scopusid | 2-s2.0-85069514312 | -
dc.type.rims | CONF | -
dc.citation.beginningpage | 197 | -
dc.citation.endingpage | 209 | -
dc.citation.publicationname | 46th International Symposium on Computer Architecture, ISCA 2019 | -
dc.identifier.conferencecountry | US | -
dc.identifier.conferencelocation | Phoenix, Arizona | -
dc.identifier.doi | 10.1145/3307650.3322230 | -
dc.contributor.localauthor | Kim, John | -
dc.contributor.nonIdAuthor | Sun, Yifan | -
dc.contributor.nonIdAuthor | Baruah, Trinayan | -
dc.contributor.nonIdAuthor | Mojumder, Saiful A. | -
dc.contributor.nonIdAuthor | Dong, Shi | -
dc.contributor.nonIdAuthor | Gong, Xiang | -
dc.contributor.nonIdAuthor | Treadway, Shane | -
dc.contributor.nonIdAuthor | Bao, Yuhui | -
dc.contributor.nonIdAuthor | Hance, Spencer | -
dc.contributor.nonIdAuthor | McCardwell, Carter | -
dc.contributor.nonIdAuthor | Zhao, Vincent | -
dc.contributor.nonIdAuthor | Barclay, Harrison | -
dc.contributor.nonIdAuthor | Ziabari, Amir Kavyan | -
dc.contributor.nonIdAuthor | Chen, Zhongliang | -
dc.contributor.nonIdAuthor | Ubal, Rafael | -
dc.contributor.nonIdAuthor | Abellán, José L. | -
dc.contributor.nonIdAuthor | Joshi, Ajay | -
dc.contributor.nonIdAuthor | Kaeli, David | -
Appears in Collection
EE-Conference Papers (Academic Conference Papers)
Files in This Item
There are no files associated with this item.