Valkyrie: Leveraging inter-TLB locality to enhance GPU performance

| DC Field | Value | Language |
|---|---|---|
| dc.contributor.author | Baruah, Trinayan | ko |
| dc.contributor.author | Sun, Yifan | ko |
| dc.contributor.author | Mojumder, Saiful A | ko |
| dc.contributor.author | Abellán, José L | ko |
| dc.contributor.author | Ukidave, Yash | ko |
| dc.contributor.author | Joshi, Ajay | ko |
| dc.contributor.author | Rubin, Norman | ko |
| dc.contributor.author | Kim, John | ko |
| dc.contributor.author | Kaeli, David | ko |
| dc.date.accessioned | 2021-12-01T06:50:15Z | - |
| dc.date.available | 2021-12-01T06:50:15Z | - |
| dc.date.created | 2021-11-26 | - |
| dc.date.issued | 2020-10-06 | - |
| dc.identifier.citation | 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020, pp. 456-466 | - |
| dc.identifier.uri | http://hdl.handle.net/10203/289868 | - |
| dc.description.abstract | Programming on a GPU has been made considerably easier with the introduction of Virtual Memory features, which support common pointer-based semantics between the CPU and the GPU. However, supporting virtual memory on a GPU comes with some additional costs and overhead, with the largest being from the support for address translation. The fact that a massive number of threads run concurrently on a GPU means that the translation lookaside buffers (TLBs) are oversubscribed most of the time. Our investigation into a diverse set of GPU workloads shows that TLB misses can be extremely high (up to 99%), which inevitably leads to significant performance degradation due to long-latency page-table walks. Our profiling of TLB-sensitive workloads reveals a high degree of page sharing across the different cores of a GPU. In many applications, a page can be accessed in temporal proximity by multiple cores, following similar memory access patterns. To support the inherent sharing present in GPU workloads, we propose Valkyrie, an integrated cooperative TLB prefetching mechanism and an inter-L1-TLB probing scheme that can efficiently reduce TLB bottlenecks in GPUs. Our evaluation using a diverse set of GPU workloads reveals that Valkyrie is able to achieve an average speedup of 1.95×, while adding modest hardware overhead. | - |
| dc.language | English | - |
| dc.publisher | Institute of Electrical and Electronics Engineers Inc. | - |
| dc.title | Valkyrie: Leveraging inter-TLB locality to enhance GPU performance | - |
| dc.type | Conference | - |
| dc.identifier.scopusid | 2-s2.0-85094207692 | - |
| dc.type.rims | CONF | - |
| dc.citation.beginningpage | 456 | - |
| dc.citation.endingpage | 466 | - |
| dc.citation.publicationname | 2020 ACM International Conference on Parallel Architectures and Compilation Techniques, PACT 2020 | - |
| dc.identifier.conferencecountry | US | - |
| dc.identifier.conferencelocation | Virtual | - |
| dc.identifier.doi | 10.1145/3410463.3414639 | - |
| dc.contributor.localauthor | Kim, John | - |
| dc.contributor.nonIdAuthor | Baruah, Trinayan | - |
| dc.contributor.nonIdAuthor | Sun, Yifan | - |
| dc.contributor.nonIdAuthor | Mojumder, Saiful A | - |
| dc.contributor.nonIdAuthor | Abellán, José L | - |
| dc.contributor.nonIdAuthor | Ukidave, Yash | - |
| dc.contributor.nonIdAuthor | Joshi, Ajay | - |
| dc.contributor.nonIdAuthor | Rubin, Norman | - |
| dc.contributor.nonIdAuthor | Kaeli, David | - |
Appears in Collection
EE-Conference Papers (Academic Conference Papers)
Files in This Item
There are no files associated with this item.
