Automated Compiler Optimization of Multiple Vector Loads/Stores
Is part of
International journal of parallel programming, 2018-04, Vol.46 (2), p.471-503
Place / Publisher
New York: Springer US
Year of publication
2018
Link to full text
Source
SpringerLink
Descriptions/Notes
With widening vectors and the proliferation of advanced vector instructions in today’s processors, vectorization plays an ever-increasing role in delivering application performance. Achieving the performance potential of this vector hardware has required significant support from the software level such as new explicit vector programming models and advanced vectorizing compilers. Today, with the combination of these software tools plus new SIMD ISA extensions like gather/scatter instructions it is not uncommon to find that even codes with complex and irregular data access patterns can be vectorized. In this paper we focus on these vectorized codes with irregular accesses, and show that while the best-in-class Intel Compiler Vectorizer does indeed provide speedup through efficient vectorization, there are some opportunities where clever program transformations can increase performance further. After identifying these opportunities, this paper describes two automatic compiler optimizations to target these data access patterns. The first optimization focuses on improving the performance for a group of adjacent gathers/scatters. The second optimization improves performance for a group of stencil vector accesses using more efficient SIMD instructions. Both optimizations are now implemented in the 17.0 version of the Intel Compiler. We evaluate the optimizations using an extensive set of micro-kernels, representative benchmarks and application kernels. On these benchmarks, we demonstrate performance gains of 3–750% on the Intel® Xeon processor (Haswell, HSW), up to 25% on the Intel® Xeon Phi™ coprocessor (Knights Corner, KNC), and up to 430% on the Intel® Xeon Phi™ processor with AVX-512 instructions support (Knights Landing, KNL).