The international journal of high performance computing applications, 2016-08, Vol.30 (3), p.305-319
2016

Details

Author(s) / Contributors
Title
Evaluating and extending user-level fault tolerance in MPI applications
Is part of
  • The international journal of high performance computing applications, 2016-08, Vol.30 (3), p.305-319
Place / Publisher
London, England: SAGE Publications
Year of publication
2016
Source
Alma/SFX Local Collection
Descriptions / Notes
  • The user-level failure mitigation (ULFM) interface has been proposed to provide fault-tolerant semantics in the Message Passing Interface (MPI). Previous work presented performance evaluations of ULFM; yet questions related to its programmability and applicability, especially to non-trivial, bulk synchronous applications, remain unanswered. In this article, we present our experience using ULFM in a case study with a large, highly scalable, bulk synchronous molecular dynamics application to shed light on the advantages and difficulties of this interface for programming fault-tolerant MPI applications. We found that, although ULFM is suitable for master–worker applications, it provides few benefits for the more common bulk synchronous MPI applications. To address these limitations, we introduce a new, simpler fault-tolerant interface for complex, bulk synchronous MPI programs with better applicability and support than ULFM for application-level recovery mechanisms, such as global rollback.
Language
English
Identifiers
ISSN: 1094-3420
eISSN: 1741-2846
DOI: 10.1177/1094342015623623
Title ID: cdi_osti_scitechconnect_1342070
