Failure analysis of a fault-tolerant 2-node server system
Is part of
RAMS '06. Annual Reliability and Maintainability Symposium, 2006, pp. 526-531
Place / Publisher
IEEE
Year of publication
2006
Source
IEEE Xplore
Descriptions/Notes
In this paper, we present an integrated model of hardware and software failures of a fault-tolerant 2-node server system used in a real-life archive system. Each node runs a distinct component of the server application software and an identical copy of a fault monitoring service. The fault monitoring service on each node monitors the status of its local application software as well as the availability of the hardware and software on the other node. Upon a node failure, the fault monitoring service on the good node transfers the application software from the failed node to the good node. Upon the failure of an application software component or fault monitoring service, an automatic restoration is performed by the available fault monitoring service. Failed nodes are restored on a first-come, first-served basis by a single repair facility. The failure and restoration processes of the hardware and software depend heavily on the status of other components and on the sequence of failure events. Therefore, we employ a decomposition method that uses both combinatorial analysis and Markov-based state space analysis to solve the problem. The proposed method allows us to extend the analysis easily to cases with multiple nodes, software components, and different repair policies.
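To illustrate the kind of Markov-based state space analysis the abstract mentions, the following is a minimal sketch of a textbook birth-death model for two nodes sharing a single repair facility. It is not the paper's integrated hardware/software model: the state definition (number of failed nodes), the function names, the assumption that the system stays available while at least one node is up, and the exponential failure and repair rates are all assumptions made here for illustration.

```python
def steady_state(failure_rate, repair_rate):
    """Steady-state probabilities of having 0, 1, or 2 failed nodes.

    Hypothetical simplification: each node fails independently at
    `failure_rate`, and a single repair facility restores one node
    at a time at `repair_rate`.
    """
    lam, mu = failure_rate, repair_rate
    # Unnormalized weights from the balance equations:
    #   2*lam * p0 = mu * p1   (two healthy nodes can each fail)
    #     lam * p1 = mu * p2   (one healthy node left, one repair crew)
    w0 = 1.0
    w1 = w0 * 2 * lam / mu
    w2 = w1 * lam / mu
    total = w0 + w1 + w2
    return [w0 / total, w1 / total, w2 / total]


def availability(failure_rate, repair_rate):
    """Probability that at least one node is up (assumed failover works)."""
    p0, p1, _ = steady_state(failure_rate, repair_rate)
    return p0 + p1
```

With illustrative rates such as `availability(1e-4, 1e-1)`, the sketch yields an availability close to 1, dominated by the small probability that both nodes are down at once. Capturing software failures, the monitoring service, and event-order dependence would require a larger state space than this three-state chain.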