Tarazu: optimizing MapReduce on heterogeneous clusters

Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, 2012, p.61-74

2012

Details

Author(s) / Contributors

Title

Tarazu: optimizing MapReduce on heterogeneous clusters

Is part of

Proceedings of the seventeenth international conference on Architectural Support for Programming Languages and Operating Systems, 2012, p.61-74

Place / Publisher

New York, NY, USA: ACM

Year of publication

2012

Link to full text

Source

ACM Digital Library

Descriptions / Notes

Data center-scale clusters are evolving towards heterogeneous hardware for power, cost, differentiated price-performance, and other reasons. MapReduce is a well-known programming model for processing large amounts of data on data center-scale clusters. Most MapReduce implementations have been designed and optimized for homogeneous clusters. Unfortunately, these implementations perform poorly on heterogeneous clusters (e.g., on a 90-node cluster that contains 10 Xeon-based servers and 80 Atom-based servers, Hadoop performs worse than on 10-node Xeon-only or 80-node Atom-only homogeneous sub-clusters for many of our benchmarks). This poor performance remains despite previously proposed optimizations related to management of straggler tasks. In this paper, we address MapReduce's poor performance on heterogeneous clusters. Our first contribution is the observation that the poor performance is due to two key factors: (1) the non-intuitive effect that MapReduce's built-in load balancing results in excessive and bursty network communication during the Map phase, and (2) the intuitive effect that the heterogeneity amplifies load imbalance in the Reduce computation. Our second contribution is Tarazu, a suite of optimizations to improve MapReduce performance on heterogeneous clusters. Tarazu consists of (1) Communication-Aware Load Balancing of Map computation (CALB) across the nodes, (2) Communication-Aware Scheduling of Map computation (CAS) to avoid bursty network traffic, and (3) Predictive Load Balancing of Reduce computation (PLB) across the nodes. Using the above 90-node cluster, we show that Tarazu significantly improves performance over a baseline of Hadoop with straightforward tuning for hardware heterogeneity.
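
The second factor named in the abstract, that heterogeneity amplifies load imbalance in the Reduce computation, can be illustrated with a toy model. The sketch below is not Tarazu's PLB algorithm and is not taken from the paper. It only assumes that each node finishes its share of Reduce work in `share / speed` time units, and that the phase ends when the slowest node finishes. The 10/80 node split mirrors the cluster in the abstract. The 4x speed ratio, the total work of 1200 units, and all function names are hypothetical values chosen for illustration.

```python
def reduce_finish_time(shares, speeds):
    """Phase finish time: the node that takes longest determines it."""
    return max(share / speed for share, speed in zip(shares, speeds))


def equal_shares(total_work, speeds):
    """Give every node the same amount of work, ignoring its speed."""
    n = len(speeds)
    return [total_work / n] * n


def proportional_shares(total_work, speeds):
    """Give every node work in proportion to its (assumed known) speed."""
    total_speed = sum(speeds)
    return [total_work * s / total_speed for s in speeds]


# Assumption: 10 fast nodes at relative speed 4.0, 80 slow nodes at 1.0.
speeds = [4.0] * 10 + [1.0] * 80
total_work = 1200.0  # hypothetical units of Reduce work

t_equal = reduce_finish_time(equal_shares(total_work, speeds), speeds)
t_prop = reduce_finish_time(proportional_shares(total_work, speeds), speeds)

print(f"equal split:        {t_equal:.2f} time units")
print(f"proportional split: {t_prop:.2f} time units")
```

Under these assumed numbers, the equal split leaves the fast nodes idle while the slow nodes finish, whereas the speed-proportional split lets all nodes finish together. A real system would have to predict the speeds rather than know them in advance, which this toy model does not address.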

Language: English
Identifiers: ISBN: 9781450307598, 1450307590
DOI: 10.1145/2150976.2150984
Title ID: cdi_acm_books_10_1145_2150976_2150984

Format: –
Subject headings: Computing methodologies -- Distributed computing methodologies -- Distributed programming languages, Computing methodologies -- Parallel computing methodologies -- Parallel programming languages, Software and its engineering -- Software notations and tools -- General programming languages -- Language types -- Distributed programming languages, Software and its engineering -- Software notations and tools -- General programming languages -- Language types -- Parallel programming languages
