SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 2022, p.1-15
2022

Details

Author(s) / Contributors
Title
Deinsum: Practically I/O Optimal Multi-Linear Algebra
Is part of
  • SC22: International Conference for High Performance Computing, Networking, Storage and Analysis, 2022, p.1-15
Place / Publisher
IEEE
Year of publication
2022
Source
IEEE Xplore Digital Library
Descriptions/Notes
  • Multilinear algebra kernel performance on modern massively-parallel systems is determined mainly by data movement. However, deriving data movement-optimal distributed schedules for programs with many high-dimensional inputs is a notoriously hard problem. State-of-the-art libraries rely on heuristics and often fall back to suboptimal tensor folding and BLAS calls. We present Deinsum, an automated framework for distributed multilinear algebra computations expressed in Einstein notation, based on rigorous mathematical tools to address this problem. Our framework automatically derives data movement-optimal tiling and generates corresponding distributed schedules, further optimizing the performance of local computations by increasing their arithmetic intensity. To show the benefits of our approach, we test it on two important tensor kernel classes: Matricized Tensor Times Khatri-Rao Products and Tensor Times Matrix chains. We show performance results and scaling on the Piz Daint supercomputer, with up to 19x speedup over state-of-the-art solutions on 512 nodes.
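    For readers unfamiliar with the two kernel classes named in the abstract, the following is a minimal single-node sketch in NumPy of what they look like in Einstein notation. It is not Deinsum's API and does not reproduce its distributed schedules. The tensor order (3), the sizes, and all variable names are assumptions chosen for illustration. The sketch also shows the "tensor folding plus BLAS call" formulation of MTTKRP that the abstract says existing libraries often fall back to.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not taken from the paper).
I, J, K, R = 4, 5, 6, 3
X = rng.standard_normal((I, J, K))   # order-3 input tensor
B = rng.standard_normal((J, R))      # factor matrix for mode j
C = rng.standard_normal((K, R))      # factor matrix for mode k

# MTTKRP along mode i, written directly in Einstein notation:
# M[i, r] = sum_{j, k} X[i, j, k] * B[j, r] * C[k, r]
M_einsum = np.einsum("ijk,jr,kr->ir", X, B, C)

# The same result via folding: unfold X into an I x (J*K) matrix,
# materialize the Khatri-Rao product as a (J*K) x R matrix,
# then perform a single matrix multiplication.
X_unfolded = X.reshape(I, J * K)                              # column index = j*K + k
khatri_rao = np.einsum("jr,kr->jkr", B, C).reshape(J * K, R)  # row index = j*K + k
M_folded = X_unfolded @ khatri_rao

# Tensor Times Matrix (TTM) chain: contract every mode with its own matrix.
P, Q, S = 2, 3, 2
U1 = rng.standard_normal((I, P))
U2 = rng.standard_normal((J, Q))
U3 = rng.standard_normal((K, S))
G_einsum = np.einsum("ijk,ip,jq,ks->pqs", X, U1, U2, U3)

# Equivalent chain of pairwise contractions. Each tensordot removes the
# leading axis and appends the new one, so the axes rotate into (P, Q, S).
T = np.tensordot(X, U1, axes=(0, 0))    # shape (J, K, P)
T = np.tensordot(T, U2, axes=(0, 0))    # shape (K, P, Q)
G_chain = np.tensordot(T, U3, axes=(0, 0))  # shape (P, Q, S)

print(np.allclose(M_einsum, M_folded), np.allclose(G_einsum, G_chain))
```

    The folded variant materializes the intermediate Khatri-Rao matrix of size (J*K) x R before the multiplication, which gives a rough sense of why the formulation of such kernels affects data movement.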
Language
English
Identifiers
eISSN: 2167-4337
DOI: 10.1109/SC41404.2022.00030
Title ID: cdi_ieee_primary_10046073
