
Details

Author(s) / Contributors
Title
Filter Caching for Free: The Untapped Potential of the Store-Buffer
Is part of
  • 2019 ACM/IEEE 46th Annual International Symposium on Computer Architecture (ISCA), 2019, p.436-448
Place / Publisher
ACM
Year of publication
2019
Link to full text
Source
IEEE Electronic Library (IEL)
Descriptions/Notes
  • Modern processors contain store-buffers to allow stores to retire under a miss, thus hiding store-miss latency. The store-buffer needs to be large (for performance) and searched on every load (for correctness), thereby making it a costly structure in both area and energy. Yet on every load, the store-buffer is probed in parallel with the L1 and TLB, with no concern for the store-buffer's intrinsic hit rate or whether a store-buffer hit can be predicted to save energy by disabling the L1 and TLB probes. In this work we cache data that have been written back to memory in a unified store-queue/buffer/cache, and predict hits to avoid L1/TLB probes and save energy. By dynamically adjusting the allocation of entries between the store-queue/buffer/cache, we can achieve nearly optimal reuse, without causing stalls. We are able to do this efficiently and cheaply by recognizing key properties of stores: free caching (since they must be written into the store-buffer for correctness we need no additional data movement), cheap coherence (since we only need to track state changes of the local, dirty data in the store-buffer), and free and accurate hit prediction (since the memory dependence predictor already does this for scheduling). As a result, we are able to increase the store-buffer hit rate and reduce store-buffer/TLB/L1 dynamic energy by 11.8% (up to 26.4%) on SPEC2006 without hurting performance (average IPC improvements of 1.5%, up to 4.7%). The cost for these improvements is a 0.2% increase in L1 cache capacity (1 bit per line) and one additional tail pointer in the store-buffer.
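The mechanism described in the abstract — retaining stores in the buffer after they have been written back, so that later loads can hit there and skip the L1/TLB probe — can be illustrated with a toy model. This is a minimal sketch for intuition only, not the paper's design: the class name, the fixed capacity, and the FIFO eviction policy are all illustrative assumptions.

```python
from collections import OrderedDict

class UnifiedStoreBufferSketch:
    """Toy model of a unified store-buffer/filter-cache (illustrative only).

    Written-back stores are retained as cache entries instead of being
    discarded, so a later load to the same address can hit in the buffer
    and (in hardware) gate off the parallel L1/TLB probes.
    """

    def __init__(self, capacity=8):
        self.capacity = capacity
        self.entries = OrderedDict()   # addr -> value, oldest first
        self.hits = 0
        self.probes = 0

    def store(self, addr, value):
        # Refresh an existing entry, or evict the oldest retained
        # (written-back) entry when full -- a hypothetical policy.
        if addr in self.entries:
            del self.entries[addr]
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)
        self.entries[addr] = value

    def load(self, addr):
        # Probe the buffer; a hit means the L1/TLB probe could be
        # skipped to save energy. A miss falls through to the L1.
        self.probes += 1
        if addr in self.entries:
            self.hits += 1
            return self.entries[addr]
        return None
```

For example, with `capacity=2`, a third store evicts the oldest entry, and a subsequent load to a retained address counts as a buffer hit while an evicted address misses and would fall through to the L1. The real design additionally predicts hits using the memory dependence predictor, which this sketch omits.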
Language
English
Identifiers
ISBN: 9781450366694, 1450366694
eISSN: 2575-713X
DOI: 10.1145/3307650.3322269
Title ID: cdi_ieee_primary_8980329
