Supporting User-directed Fault Tolerance over Standard MPI
Is part of
2012 IEEE 18th International Conference on Parallel and Distributed Systems, 2012, pp. 696-697
Place / Publisher
IEEE
Year of publication
2012
Source
IEEE Electronic Library (IEL)
Descriptions/Notes
User-directed means that fault tolerance is carried out dynamically and that the fault tolerance mode is chosen by users based on application requirements. In this paper, we introduce a general scheme based on standard MPI that provides user-directed support for application-level algorithmic fault tolerance. User-directed fault tolerance acts as the connection between applications and algorithmic fault tolerance. As a case study, our scheme has been incorporated into HPL combined with a non-blocking ABFT technique. We have tested the functional availability of our scheme for fault tolerance under real conditions. Our evaluation also shows that when no failure occurs, our support adds only 2.5 percent overhead. When failures occur, algorithmic fault tolerance maintains its scalability under our scheme.