
Details

Author(s) / Contributors
Title
Near-optimal sparse allreduce for distributed deep learning
Is part of
  • Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, 2022, p.135-149
Place / Publisher
New York, NY, USA: ACM
Year of publication
2022
Link to full text
Source
ACM Digital Library Complete
Descriptions/Notes
  • Communication overhead is one of the major obstacles to training large deep learning models at scale. Gradient sparsification is a promising technique to reduce the communication volume. However, it is very challenging to obtain real performance improvement because of (1) the difficulty of achieving a scalable and efficient sparse allreduce algorithm and (2) the sparsification overhead. This paper proposes Ok-Topk, a scheme for distributed training with sparse gradients. Ok-Topk integrates a novel sparse allreduce algorithm (less than 6k communication volume, which is asymptotically optimal) with the decentralized parallel Stochastic Gradient Descent (SGD) optimizer, and its convergence is proved. To reduce the sparsification overhead, Ok-Topk efficiently selects the top-k gradient values according to an estimated threshold (see the sketch below). Evaluations are conducted on the Piz Daint supercomputer with neural network models from different deep learning domains. Empirical results show that Ok-Topk achieves similar model accuracy to dense allreduce. Compared with the optimized dense and the state-of-the-art sparse allreduces, Ok-Topk is more scalable and significantly improves training throughput (e.g., 3.29x-12.95x improvement for BERT on 256 GPUs).
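
The threshold-based top-k selection described in the abstract can be illustrated with a short sketch. The following Python/NumPy code is a simplified, hypothetical illustration only, not the authors' Ok-Topk implementation; the function names, the adaptation rate, and the fallback to an exact top-k are assumptions made for the example.

# Minimal sketch, assuming NumPy only: select gradient values whose magnitude
# exceeds an estimated threshold, then adapt the threshold so that roughly k
# values pass on the next iteration (illustrative, not the Ok-Topk code).
import numpy as np

def select_by_threshold(grad, k, threshold):
    """Keep entries whose magnitude exceeds the estimated threshold."""
    idx = np.nonzero(np.abs(grad) > threshold)[0]
    num_candidates = idx.size
    if num_candidates > 2 * k:
        # Threshold estimate was too loose; fall back to exact top-k over the candidates.
        order = np.argsort(np.abs(grad[idx]))[-k:]
        idx = idx[order]
    return idx, grad[idx], num_candidates

def adapt_threshold(threshold, num_candidates, k, rate=0.05):
    """Nudge the threshold so that roughly k values are selected next time."""
    return threshold * (1.0 + rate) if num_candidates > k else threshold * (1.0 - rate)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    grad = rng.standard_normal(1_000_000)   # stand-in for a flattened gradient
    k, threshold = 1_000, 3.0                # initial threshold is a rough guess
    for step in range(5):
        idx, vals, num_candidates = select_by_threshold(grad, k, threshold)
        threshold = adapt_threshold(threshold, num_candidates, k)
        print(f"step {step}: kept {idx.size} values, next threshold {threshold:.3f}")

In a distributed setting, only the selected (index, value) pairs would be exchanged by the sparse allreduce, which is how the communication volume is kept proportional to k rather than to the full gradient size.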
Language
English
Identifiers
ISBN: 9781450392044, 1450392040
DOI: 10.1145/3503221.3508399
Title ID: cdi_acm_books_10_1145_3503221_3508399_brief
