2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 2023, p.728-729
2023

Details

Author(s) / Contributors
Title
Communication Analysis for Multidimensional Parallel Training of Large-scale DNN Models
Is part of
  • 2023 IEEE International Conference on High Performance Computing & Communications, Data Science & Systems, Smart City & Dependability in Sensor, Cloud & Big Data Systems & Application (HPCC/DSS/SmartCity/DependSys), 2023, p.728-729
Place / Publisher
IEEE
Year of publication
2023
Source
IEEE Xplore
Descriptions/Notes
  • Multidimensional parallel training is widely used to train large-scale deep learning models such as GPT-3. The efficiency of parameter communication among training devices/processes is often the performance bottleneck in large-model training. Analyzing parameter communication modes and traffic provides a useful reference for research on interconnection network design and computing task scheduling aimed at improving training performance. In this paper, we analyze the parameter communication modes in typical 3D parallel training (data parallelism, pipeline parallelism, and tensor parallelism) and model the traffic in each communication mode. Finally, taking GPT-3 as an example, we present the communication in its 3D parallel training.
Language
English
Identifiers
DOI: 10.1109/HPCC-DSS-SmartCity-DependSys60770.2023.00104
Title ID: cdi_ieee_primary_10466947
