Efficient Synonym Filtering and Scalable Delayed Translation for Hybrid Virtual Caching

Conventional translation look-aside buffers (TLBs) are required to complete address translation with short latencies, as the address translation is on the critical path of all memory accesses even for L1 cache hits. Such strict TLB latency restrictions limit the TLB capacity, as the latency increase with large TLBs may lower the overall performance even with potential TLB miss reductions. Furthermore, TLBs consume a significant amount of energy as they are accessed for every instruction fetch and data access. To avoid the latency restriction and reduce the energy consumption, virtual caching techniques have been proposed to defer translation to after L1 cache misses. However, an efficient solution for the synonym problem has been a critical issue hindering the wide adoption of virtual caching.

Based on the virtual caching concept, this study proposes a hybrid virtual memory architecture extending virtual caching to the entire cache hierarchy, aiming to improve both performance and energy consumption. The hybrid virtual caching uses virtual addresses augmented with address space identifiers (ASID) in the cache hierarchy for common non-synonym addresses. For such non-synonyms, the address translation occurs only after last-level cache (LLC) misses. For uncommon synonym addresses, the addresses are translated to physical addresses with conventional TLBs before L1 cache accesses. To support such hybrid translation, we propose an efficient synonym detection mechanism based on Bloom filters which can identify synonym candidates with few false positives. For large memory applications, delayed translation alone cannot solve the address translation problem, as fixed-granularity delayed TLBs may not scale with the increasing memory requirements. To mitigate the translation scalability problem, this study proposes a delayed many segment translation designed for the hybrid virtual caching. The experimental results show that our approach effectively lowers accesses to the TLBs, leading to significant power savings. In addition, the approach provides performance improvement with scalable delayed translation with variable length segments.
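The hybrid lookup described in the abstract can be illustrated with a small software sketch. This is only a toy model under stated assumptions, not the paper's hardware design: the class `SynonymFilter`, the function `translation_path`, the filter size, the hash construction, and the 4 KiB page size are all hypothetical choices made for illustration. It shows the one property the abstract relies on: a Bloom filter never misses a registered synonym page, so any address it rejects can safely skip the TLB and be translated only after an LLC miss.

```python
import hashlib


class SynonymFilter:
    """Toy Bloom filter over (ASID, virtual page number) pairs.

    Hypothetical illustration; sizes and hashing are assumptions.
    """

    def __init__(self, num_bits=1024, num_hashes=2):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def _positions(self, asid, vpn):
        # Derive num_hashes bit positions from salted SHA-256 digests.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{asid}:{vpn}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def mark_synonym(self, asid, vpn):
        # Record a page known to be shared under multiple virtual addresses.
        for pos in self._positions(asid, vpn):
            self.bits[pos] = 1

    def maybe_synonym(self, asid, vpn):
        # True may be a false positive; False is never a false negative.
        return all(self.bits[pos] for pos in self._positions(asid, vpn))


def translation_path(filt, asid, vaddr, page_shift=12):
    """Pick where translation happens for one access (assumed 4 KiB pages)."""
    vpn = vaddr >> page_shift
    if filt.maybe_synonym(asid, vpn):
        return "tlb_before_l1"           # synonym candidate: physical path
    return "delayed_after_llc_miss"      # common case: virtual caching


filt = SynonymFilter()
filt.mark_synonym(asid=1, vpn=0x1234)
print(translation_path(filt, 1, 0x1234ABC))
```

A false positive in this model only costs an unnecessary TLB access for a non-synonym page; it does not affect correctness, which is why a filter with few false positives is sufficient.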
Publisher
ACM SIGARCH and IEEE TCCA
Issue Date
2016-06
Language
English
Citation

43rd International Symposium on Computer Architecture, ISCA 2016, pp.217 - 229

DOI
10.1109/ISCA.2016.28
URI
http://hdl.handle.net/10203/218873
Appears in Collection
CS-Conference Papers (Conference Papers)