Sie befinden Sich nicht im Netzwerk der Universität Paderborn. Der Zugriff auf elektronische Ressourcen ist gegebenenfalls nur via VPN oder Shibboleth (DFN-AAI) möglich. mehr Informationen...
2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012, p.235-246
2012

Details

Autor(en) / Beteiligte
Titel
Fundamental Latency Trade-off in Architecting DRAM Caches: Outperforming Impractical SRAM-Tags with a Simple and Practical Design
Ist Teil von
  • 2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012, p.235-246
Ort / Verlag
Washington, DC, USA: IEEE Computer Society
Erscheinungsjahr
2012
Link zum Volltext
Quelle
IEEE Xplore
Beschreibungen/Notizen
  • This paper analyzes the design trade-offs in architecting large-scale DRAM caches. Prior research, including the recent work from Loh and Hill, have organized DRAM caches similar to conventional caches. In this paper, we contend that some of the basic design decisions typically made for conventional caches (such as serialization of tag and data access, large associativity, and update of replacement state) are detrimental to the performance of DRAM caches, as they exacerbate the already high hit latency. We show that higher performance can be obtained by optimizing the DRAM cache architecture first for latency, and then for hit rate. We propose a latency-optimized cache architecture, called Alloy Cache, that eliminates the delay due to tag serialization by streaming tag and data together in a single burst. We also propose a simple and highly effective Memory Access Predictor that incurs a storage overhead of 96 bytes per core and a latency of 1 cycle. It helps service cache misses faster without the need to wait for a cache miss detection in the common case. Our evaluations show that our latency-optimized cache design significantly outperforms both the recent proposal from Loh and Hill, as well as an impractical SRAM Tag-Store design that incurs an unacceptable overhead of several tens of megabytes. On average, the proposal from Loh and Hill provides 8.7% performance improvement, the "idealized" SRAM Tag design provides 24%, and our simple latency-optimized design provides 35%.

Weiterführende Literatur

Empfehlungen zum selben Thema automatisch vorgeschlagen von bX