
Details

Author(s) / Contributors
Title
Compiler Optimization of Accelerator Data Transfers
Is part of
  • International journal of parallel programming, 2019-02, Vol.47 (1), p.39-58
Place / Publisher
New York: Springer US
Year of publication
2019
Source
Springer Journals
Descriptions / Notes
  • Accelerators such as GPUs, FPGAs, and many-core processors can provide significant performance improvements, but their effectiveness depends on the programmer's skill in managing their complex architectures. One area of difficulty is determining which data to transfer on and off the accelerator, and when. Poorly placed data transfers can incur overheads that completely dwarf the benefits of using accelerators. To know what data to transfer, and when, the programmer must understand the data flow of the transferred memory locations throughout the program, and how the accelerator region fits into the program as a whole. We argue that compilers should take on the responsibility of data transfer scheduling, thereby reducing the demands on the programmer and improving program performance and efficiency through a reduction in the number of bytes transferred. We show that by performing whole-program scheduling of accelerator data transfers, we can automatically eliminate up to 99% of the bytes transferred to and from the accelerator, compared to transferring all involved data immediately before and after kernel execution. The analysis and optimization are language- and accelerator-agnostic, but for our examples and testing they have been implemented in an OpenMP-to-LLVM-IR-to-CUDA workflow.

Further reading

Recommendations on the same topic, automatically suggested by bX