Identifying recovery patterns from resource usage data of cluster systems

Nentawe Gurumdimma; Gideon Dadik Bibu; Desmond Bala Bisandu; Mammuan Titus Alams


Published: Feb 18, 2019

Keywords: Change point detection; resource usage data; recovery sequence; detection; large-scale HPC systems

Issue: Vol. 13 No. 4 (2018)

Section: Articles

Copyright belongs to the journal. The journal is Open Access.

Abstract

Failures of cluster systems have proven harmful and can be costly. System administrators have employed a divide-and-conquer approach to diagnosing the root cause of such failures in order to take corrective or preventive measures. In most cases, event logs are the source of information about the failures. Events that characterize failures are noted and categorized as causes of failure. However, not all such ‘causative’ events lead to eventual failure, as some fault sequences experience recovery. Such sequences or patterns pose a challenge to system administrators and failure prediction tools because they add to false positives: their presence is always predicted as “failure causing”, while in reality they do not lead to failure. To distinguish such recovery patterns of events from failure patterns, we propose a novel approach that utilizes resource usage data of cluster systems to identify recovery and failure sequences. We further propose an online detection approach to the same problem. We evaluate our approach on data from the Ranger Supercomputer System, and the results are positive.

Journal: Science World Journal