Optimizing occupancy and ILP on the GPU using a combinatorial approach

Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020, p.133-144

2020

Details

Author(s) / Contributors

Title

Optimizing occupancy and ILP on the GPU using a combinatorial approach

Is part of

Proceedings of the 18th ACM/IEEE International Symposium on Code Generation and Optimization, 2020, p.133-144

Place / Publisher

New York, NY, USA: ACM

Year of publication

2020

Source

ACM Digital Library

Descriptions/Notes

This paper presents the first general solution to the problem of optimizing both occupancy and Instruction-Level Parallelism (ILP) when compiling for a Graphics Processing Unit (GPU). Exploiting ILP (minimizing schedule length) requires using more registers, but using more registers decreases occupancy (the number of thread groups that can be run in parallel). The problem of balancing these two conflicting objectives to achieve the best overall performance is a challenging open problem in code optimization. In this paper, we present a two-pass Branch-and-Bound (B&B) algorithm for solving this problem by treating occupancy as a primary objective and ILP as a secondary objective. In the first pass, the algorithm searches for a maximum-occupancy schedule, while in the second pass it iteratively searches for the shortest schedule that gives the maximum occupancy found in the first pass. The proposed scheduling algorithm was implemented in the LLVM compiler and applied to an AMD GPU. The algorithm’s performance was evaluated using benchmarks from the PlaidML machine learning framework relative to LLVM’s scheduling algorithm, AMD’s production scheduling algorithm and an existing B&B scheduling algorithm that uses a different approach. The results show that the proposed B&B scheduling algorithm speeds up almost every benchmark by up to 35% relative to LLVM’s scheduler, up to 31% relative to AMD’s scheduler and up to 18% relative to the existing B&B scheduler. The geometric-mean improvements are 16.3% relative to LLVM’s scheduler, 5.5% relative to AMD’s production scheduler and 6.2% relative to the existing B&B scheduler. If more compile time can be tolerated, a geometric-mean improvement of 6.3% relative to AMD’s scheduler can be achieved.
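The two-pass idea in the abstract (occupancy first, schedule length second) can be illustrated with a small sketch. This is not the paper's Branch-and-Bound algorithm: it swaps in exhaustive enumeration over a tiny dependence graph, which is only feasible for a handful of instructions. The graph, the latencies, the in-order single-issue machine model, the `REGISTER_BUDGET` value and the occupancy formula are all assumptions made up for illustration.

```python
from itertools import permutations

# Hypothetical toy dependence graph: name -> (predecessors, latency in cycles).
# Four "loads" (latency 3) feed two adds, which feed a final add.
INSTRS = {
    "a": ((), 3), "b": ((), 3), "c": ((), 3), "d": ((), 3),
    "e": (("a", "b"), 1),
    "f": (("c", "d"), 1),
    "g": (("e", "f"), 1),
}
REGISTER_BUDGET = 8  # assumed register pool shared by resident thread groups


def is_valid(order):
    """True if every instruction comes after all of its predecessors."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[p] < pos[n] for n in order for p in INSTRS[n][0])


def peak_registers(order):
    """Peak count of live values; a value lives from its definition to its last use."""
    pos = {n: i for i, n in enumerate(order)}
    last_use = dict(pos)  # a value with no user dies where it is defined
    for n in order:
        for p in INSTRS[n][0]:
            last_use[p] = max(last_use[p], pos[n])
    return max(
        sum(1 for n in order if pos[n] <= i <= last_use[n])
        for i in range(len(order))
    )


def occupancy(order):
    """Simplified model: fewer registers per thread group means more groups fit."""
    return REGISTER_BUDGET // peak_registers(order)


def schedule_length(order):
    """Completion cycle on an in-order, single-issue machine that stalls on latency."""
    issue, cycle = {}, 0
    for n in order:
        ready = max((issue[p] + INSTRS[p][1] for p in INSTRS[n][0]), default=0)
        issue[n] = max(cycle, ready)
        cycle = issue[n] + 1
    return max(issue[n] + INSTRS[n][1] for n in order)


def two_pass_schedule():
    candidates = [o for o in permutations(INSTRS) if is_valid(o)]
    # Pass 1: find the maximum achievable occupancy (primary objective).
    best_occupancy = max(occupancy(o) for o in candidates)
    # Pass 2: shortest schedule among those that keep that occupancy.
    feasible = [o for o in candidates if occupancy(o) == best_occupancy]
    best = min(feasible, key=schedule_length)
    return best, best_occupancy, schedule_length(best)


if __name__ == "__main__":
    print(two_pass_schedule())
```

On this toy graph the sketch settles on occupancy 2 with a schedule length of 10 cycles, whereas issuing all four loads first (`a, b, c, d, e, f, g`) finishes in 8 cycles but drops the modeled occupancy to 1. That is the conflict between ILP and occupancy that the abstract describes.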

Language: English
Identifiers: ISBN: 1450370470, 9781450370479
DOI: 10.1145/3368826.3377918
Title ID: cdi_acm_books_10_1145_3368826_3377918

Format: –
Keywords: Software and its engineering -- Software notations and tools -- Compilers
