
Heterogeneous Distributed Ensemble Feature Selection: An Enhancement Approach to Machine Learning for Phishing Detection


B.M. Olukoya
G.O. Ogunleye
P.O. Olabisi
A.T. Olusesi
A.A. Osobukola

Abstract

Phishing is a significant cybersecurity issue, facilitated by rapid technological advancement. Detecting these attacks is
challenging because the techniques continually evolve. While numerous strategies have been deployed, no single solution is foolproof.
Machine learning is currently favoured for combating phishing. This method comprises several steps, of which feature
selection is a critical one: the quality of the features selected for building the machine learning model plays a significant role.
Traditional feature selection methods have limitations, such as the need to determine a cutoff point and high computational cost. To overcome
these, a novel ensemble feature selection strategy, Heterogeneous Distributed Ensemble Feature Selection (HDEFS), was used. It discards
correlated features and uses a Borda count algorithm to enhance selection performance. Three filter-based predictors were used in the
first phase, and HDEFS was applied in the second phase, producing unique baseline webpage features. The results showed that models
using HDEFS features improved phishing detection. The bagged SVM model achieved the highest accuracy, 97.4%, outperforming the other
models. The study suggests that selecting optimal webpage features through the proposed ensemble feature selection
approach substantially improves phishing detection performance. Likewise, it produced efficient new
features that differ from outdated features such as IpAddress, AtSymbol, QueryLength, MissingTitle and NumQueryComponents
used in prior studies.
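
The two ideas named in the abstract, discarding correlated features and merging several filter rankings with a Borda count, can be illustrated with a minimal Python sketch. This is not the authors' HDEFS implementation: the abstract does not specify the three filter-based predictors, the correlation threshold, or how ties are broken, so the function names, the 0.9 threshold, the alphabetical tie-break and the toy feature names below are all assumptions made for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    # A constant column has no defined correlation; treat it as 0.
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def drop_correlated(columns, threshold=0.9):
    """Keep a feature only if it is not highly correlated with one already kept.

    columns: dict mapping feature name -> list of values.
    threshold: assumed cutoff on |r|; the abstract gives no value.
    """
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


def borda_count(rankings):
    """Aggregate rankings (best feature first) into one consensus ranking.

    Each ranking awards n-1 points to its top feature, n-2 to the next,
    down to 0 for the last. Ties are broken alphabetically (an assumption).
    """
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for position, feature in enumerate(ranking):
            scores[feature] = scores.get(feature, 0) + (n - 1 - position)
    return sorted(scores, key=lambda f: (-scores[f], f))


# Toy data with hypothetical feature names.
toy_columns = {
    "feat_x": [1, 2, 3, 4],
    "feat_y": [2, 4, 6, 8],   # perfectly correlated with feat_x
    "feat_z": [4, 1, 3, 2],
}
uncorrelated = drop_correlated(toy_columns)

# Three hypothetical filter rankings over the same features.
toy_rankings = [
    ["a", "b", "c"],
    ["b", "a", "c"],
    ["a", "c", "b"],
]
consensus = borda_count(toy_rankings)
```

In this toy run, `feat_y` is discarded because it duplicates `feat_x`, and the Borda scores are 5, 3 and 1 for `a`, `b` and `c`, so the consensus order is `a`, `b`, `c`. Because the Borda count yields a single aggregate ranking, one cutoff can be applied to it instead of one per filter.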


Journal Identifiers


eISSN: 2756-4843