Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012, p.107-118

2012

Details

Author(s) / Contributors

Title

Kernel Weaver: Automatically Fusing Database Primitives for Efficient GPU Computation

Is part of

2012 45th Annual IEEE/ACM International Symposium on Microarchitecture, 2012, p.107-118

Place / Publisher

Washington, DC, USA: IEEE Computer Society

Year of publication

2012

Link to full text

IEEE_Xplore

Source

IEEE Electronic Library (IEL)

Descriptions/Notes

Data warehousing applications represent an emerging application arena that requires the processing of relational queries and computations over massive amounts of data. Modern general-purpose GPUs are high-bandwidth architectures that potentially offer substantial improvements in throughput for these applications. However, significant challenges arise from the overheads of data movement through the memory hierarchy and between the GPU and host CPU. This paper proposes data movement optimizations to address these challenges. Inspired in part by loop fusion optimizations in the scientific computing community, we propose kernel fusion as a basis for data movement optimizations. Kernel fusion fuses the code bodies of two GPU kernels to i) reduce the data footprint, cutting down data movement throughout the GPU and CPU memory hierarchy, and ii) enlarge the compiler optimization scope. We classify producer-consumer dependences between compute kernels into three types: i) fine-grained thread-to-thread dependences, ii) medium-grained thread block dependences, and iii) coarse-grained kernel dependences. Based on this classification, we propose a compiler framework, Kernel Weaver, that can automatically fuse relational algebra operators, thereby eliminating redundant data movement. Experiments on NVIDIA Fermi platforms demonstrate that kernel fusion achieves a 2.89x speedup in GPU computation and a 2.35x speedup in PCIe transfer time on average across the micro-benchmarks tested. We present key insights, lessons learned, measurements from our compiler implementation, and opportunities for further improvements.
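
As a rough illustration of the fusion idea described in the abstract (not the Kernel Weaver implementation itself), the following Python sketch contrasts two back-to-back SELECT operators run as separate passes with a single fused pass. The function names, the row schema, and the sample relation are hypothetical and chosen only for this example. On a GPU, each pass would stand in for a kernel launch and the intermediate list for a buffer in memory; the element-wise case shown here loosely corresponds to the fine-grained dependence type named above.

```python
from typing import Callable, List, Tuple

Row = Tuple[int, int]  # hypothetical relation schema: (key, value)
Pred = Callable[[Row], bool]


def select(rows: List[Row], pred: Pred) -> List[Row]:
    """One relational SELECT: a full pass that materializes its output."""
    return [r for r in rows if pred(r)]


def unfused(rows: List[Row], p1: Pred, p2: Pred) -> List[Row]:
    """Two separate passes; the intermediate result is written out and re-read."""
    intermediate = select(rows, p1)  # stands in for kernel 1 plus its output buffer
    return select(intermediate, p2)  # stands in for kernel 2


def fused(rows: List[Row], p1: Pred, p2: Pred) -> List[Row]:
    """One pass: both predicates are applied per row, with no intermediate buffer."""
    return [r for r in rows if p1(r) and p2(r)]


# Hypothetical sample data and predicates.
rows = [(k, k * 10) for k in range(8)]
key_is_even = lambda r: r[0] % 2 == 0
value_above_20 = lambda r: r[1] > 20

# Both variants compute the same relation; only the data movement differs.
assert unfused(rows, key_is_even, value_above_20) == fused(rows, key_is_even, value_above_20)
print(fused(rows, key_is_even, value_above_20))
```

The fused variant reads each input row once and never stores the intermediate relation, which is the kind of redundant data movement the abstract says fusion eliminates.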

Language: English
Identifiers: ISBN: 9780769549248, 0769549241
ISSN: 1072-4451
DOI: 10.1109/MICRO.2012.19
Title ID: cdi_ieee_primary_6493612

Format: –
Keywords: Compiler Optimization, Computer systems organization -- Dependable and fault-tolerant systems and networks, Computing methodologies -- Computer graphics -- Graphics systems and interfaces -- Graphics processors, General and reference -- Cross-computing tools and techniques -- Performance, GPU, Information systems -- Information systems applications, Networks -- Network performance evaluation
