Impact of feature selection on classification via clustering techniques in software defect prediction

F.E. Usman-Hamza; A.F. Atte; A.O. Balogun; H.A. Mojeed; A.O. Bajeh; V.E. Adeyemo

doi:10.4314/jcsia.v26i1.8


Published:

Feb 9, 2020


Keywords:

Classification, Clustering, Feature Selection, Software Defect Prediction

Issue

Vol. 26 No. 1 (2019)

Section

Articles

Copyright belongs to Nigeria Computer Society (NCS)

Author Biographies

F.E. Usman-Hamza

Department of Computer Science, University of Ilorin, Ilorin, Nigeria

A.F. Atte

Department of Computer Science, University of Ilorin, Ilorin, Nigeria

A.O. Balogun

Department of Computer Science, University of Ilorin, Ilorin, Nigeria

H.A. Mojeed

Department of Computer Science, University of Ilorin, Ilorin, Nigeria

A.O. Bajeh

Department of Computer Science, University of Ilorin, Ilorin, Nigeria

V.E. Adeyemo

School of Computing and IT, Taylor’s University, Selangor, Malaysia

Abstract

Software defect prediction supports software testing by detecting as many defects as possible before release, which plays an important role in ensuring software quality and reliability. It can be modeled as a binary classification problem in which software modules are classified as defective or non-defective. This study investigated the impact of feature selection methods on classification via clustering techniques for software defect prediction. Three clustering techniques (Farthest First Clusterer, K-Means, and Make-Density Clusterer) and three feature selection methods (Chi-Square, Clustering Variation, and Information Gain) were applied to software defect datasets from the NASA repository. The best-performing model was Farthest First with Information Gain feature selection, which achieved an accuracy of 78.69%, a precision of 0.804, and a recall of 0.788. The experimental results showed that clustering techniques used as classifiers gave good predictive performance and that feature selection further improved it. This indicates that classification via clustering can be competitive with standard classification methods, with the advantage that it does not require training a model on a labeled dataset and can therefore be applied to unlabeled datasets.
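
As an illustration only, the general idea of classification via clustering with information gain feature selection can be sketched in a few lines of Python. This is not the authors' implementation, and the abstract does not describe their tooling or settings. The toy data, the function names, the median-split discretisation used to compute information gain, and the choice to label each cluster by its majority class are all assumptions made for this sketch. Note that the majority-class mapping uses labels, so it shows how clusters can be evaluated as a classifier rather than a fully label-free workflow.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(column, labels):
    """Gain of a numeric feature after a binary split at its median
    (the median split is an assumed discretisation, not from the paper)."""
    cut = sorted(column)[len(column) // 2]
    low = [y for x, y in zip(column, labels) if x < cut]
    high = [y for x, y in zip(column, labels) if x >= cut]
    remainder = sum(len(part) / len(labels) * entropy(part)
                    for part in (low, high) if part)
    return entropy(labels) - remainder

def select_features(rows, labels, k):
    """Indices of the k features with the highest information gain."""
    gains = [information_gain([r[j] for r in rows], labels)
             for j in range(len(rows[0]))]
    return sorted(range(len(gains)), key=lambda j: gains[j], reverse=True)[:k]

def farthest_first(points, k):
    """Choose k centres: start from the first point, then repeatedly add
    the point whose distance to its nearest chosen centre is largest."""
    centres = [points[0]]
    while len(centres) < k:
        centres.append(max(points,
                           key=lambda p: min(math.dist(p, c) for c in centres)))
    return centres

def nearest(point, centres):
    """Index of the centre closest to the point (Euclidean distance)."""
    return min(range(len(centres)), key=lambda i: math.dist(point, centres[i]))

def classify_via_clustering(rows, labels, n_features=2, n_clusters=2):
    """Select features, cluster, then label each cluster by its majority class."""
    keep = select_features(rows, labels, n_features)
    reduced = [[r[j] for j in keep] for r in rows]
    centres = farthest_first(reduced, n_clusters)
    assignment = [nearest(p, centres) for p in reduced]
    cluster_label = {}
    for c in set(assignment):
        members = [y for a, y in zip(assignment, labels) if a == c]
        cluster_label[c] = Counter(members).most_common(1)[0][0]
    return [cluster_label[a] for a in assignment], keep

# Hypothetical toy modules: [lines of code, complexity, unrelated noise metric]
rows = [[10, 1, 5], [12, 2, 1], [11, 1, 4],
        [90, 9, 2], [95, 8, 5], [88, 9, 1]]
labels = [0, 0, 0, 1, 1, 1]  # 0 = non-defective, 1 = defective

predictions, selected = classify_via_clustering(rows, labels)
accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
print("selected feature indices:", selected)
print("accuracy on toy data:", accuracy)
```

On this toy data the noise metric is ranked last and dropped, and the two clusters separate the small modules from the large ones. Real defect datasets are far less cleanly separable, so the accuracy here says nothing about the results reported in the study.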


Vol. 26, No 1, June, 2019

Journal of Computer Science and Its Application
