IEEE transactions on parallel and distributed systems, 2017-09, Vol.28 (9), p.2539-2552
2017

Details

Author(s) / Contributors
Title
Improving Execution Concurrency of Large-Scale Matrix Multiplication on Distributed Data-Parallel Platforms
Is part of
  • IEEE transactions on parallel and distributed systems, 2017-09, Vol.28 (9), p.2539-2552
Place / Publisher
New York: IEEE
Year of publication
2017
Source
IEEE Electronic Library (IEL)
Descriptions/Notes
  • Matrix multiplication is a dominant but very time-consuming operation in many big data analytic applications. Thus, its performance optimization is an important and fundamental research issue. The performance of large-scale matrix multiplication on distributed data-parallel platforms is determined by both computation and IO costs. For existing matrix multiplication execution strategies, when the execution concurrency scales up above a threshold, execution performance deteriorates quickly because the increase in IO cost outweighs the decrease in computation cost. This paper presents a novel parallel execution strategy, CRMM (Concurrent Replication-based Matrix Multiplication), along with a parallel algorithm, Marlin, for large-scale matrix multiplication on data-parallel platforms. The CRMM strategy exploits higher execution concurrency for sub-block matrix multiplication with the same IO cost. To further improve the performance of Marlin, we also propose a number of novel system-level optimizations, including increasing the concurrency of local data exchange by calling the native library in batch, reducing the overhead of block matrix transformation, and reducing disk-heavy shuffle operations by exploiting the semantics of matrix computation. We have implemented Marlin as a library along with a set of related matrix operations on Spark and also contributed Marlin to the open-source community. For large-sized matrix multiplication, Marlin outperforms existing systems including Spark MLlib, SystemML and SciDB, with about 1.29×, 3.53× and 2.21× speedup on average, respectively. An evaluation on a real-world DNN workload also indicates that Marlin outperforms the above systems, with speedups of about 12.8×, 5.1× and 27.2×, respectively.
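
The abstract refers to sub-block matrix multiplication and to execution concurrency without showing what those terms mean in practice. The sketch below may help: it is a generic block-partitioned multiplication in plain Python, not the CRMM strategy and not Marlin's code. It assumes square matrices whose size is a multiple of the block size, and all function names (`split_blocks`, `local_multiply`, `block_matmul`) are hypothetical. Each of the k³ sub-block products is an independent task; a thread pool stands in here for the workers of a distributed platform.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def split_blocks(m, bs):
    """Split a square list-of-lists matrix into a dict {(block_row, block_col): block}."""
    n = len(m)
    return {
        (i // bs, j // bs): [row[j:j + bs] for row in m[i:i + bs]]
        for i in range(0, n, bs)
        for j in range(0, n, bs)
    }


def local_multiply(a, b):
    """Plain dense product of two small blocks (the 'local' computation)."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def block_matmul(a, b, bs, workers=4):
    """Multiply two n x n matrices block by block (assumes bs divides n)."""
    n = len(a)
    k = n // bs
    a_blocks = split_blocks(a, bs)
    b_blocks = split_blocks(b, bs)

    # One task per (i, j, p): the partial product A[i][p] * B[p][j].
    # There are k**3 such tasks, and none depends on another.
    tasks = list(product(range(k), repeat=3))

    def run(task):
        i, j, p = task
        return local_multiply(a_blocks[(i, p)], b_blocks[(p, j)])

    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(run, tasks))

    # Sum the partial products into the output block C[i][j].
    c = [[0] * n for _ in range(n)]
    for (i, j, _), block in zip(tasks, partials):
        for r, row in enumerate(block):
            for col, value in enumerate(row):
                c[i * bs + r][j * bs + col] += value
    return c
```

In a distributed setting, the blocks needed by each task have to be moved to the worker that runs it, which is where the IO cost mentioned in the abstract comes from; this single-process sketch does not model that cost.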
