Comparative analysis of text categorization algorithms

A.P. Adewole; D.M. Omitiran

Published:

Aug 3, 2018

Keywords:

classifier, Decision trees, k-Nearest Neighbour (kNN), Machine learning, Naïve Bayes, Support Vector Machines (SVM), text categorization, text classification

Issue

Vol. 24 No. 2 (2017)

Section

Articles

Copyright belongs to Nigeria Computer Society (NCS)

Abstract

Text categorization (also known as text classification) is the task of automatically assigning documents to one or more categories from a pre-specified set. This task has several applications, including spam filtering, identification of document genre, automated indexing of scientific articles according to a predefined thesaurus of technical terms, and automated extraction of metadata. Text categorization is important because unstructured text is the largest readily available source of data, and organizing it manually is infeasible given the large number of documents involved and the time constraints. The accuracy of modern text categorization systems rivals that of trained human professionals. This study experimentally compared four machine learning classifiers used in text categorization: Naïve Bayes, Decision trees, k-Nearest Neighbour (kNN) and Support Vector Machines (SVM). The classifiers were developed using the Python programming language. When run on the Reuters dataset, SVM significantly outperformed Naïve Bayes, kNN and Decision trees, and Decision trees performed worst of the four algorithms. Observations made while running these experiments suggest a trade-off between simplicity and effectiveness. In conclusion, the results of this comparative analysis indicate that SVM is the most effective of the classifiers considered in this study.
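
The abstract does not reproduce the study's code, so the following is only a minimal sketch of what such a comparison can look like, restricted to two of the four classifiers (Naïve Bayes and kNN) and written with the Python standard library alone. The toy corpus, the function names, the add-one smoothing, the cosine similarity measure and the choice of k = 3 are all assumptions made for illustration; none of them come from the paper, and accuracy on this toy data says nothing about the results reported on the Reuters dataset.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus (NOT the Reuters data used in the study).
TRAIN = [
    ("wheat harvest grain export", "grain"),
    ("grain prices rise on wheat shortage", "grain"),
    ("corn and wheat crop report", "grain"),
    ("crude oil prices fall", "crude"),
    ("oil barrel output opec", "crude"),
    ("opec cuts crude production", "crude"),
]
TEST = [
    ("wheat export report", "grain"),
    ("opec oil output", "crude"),
]


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def train_naive_bayes(docs):
    """Collect document counts per class and term counts per class."""
    class_counts = Counter()
    term_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        class_counts[label] += 1
        for tok in tokenize(text):
            term_counts[label][tok] += 1
            vocab.add(tok)
    return class_counts, term_counts, vocab


def predict_naive_bayes(model, text):
    """Multinomial Naive Bayes with add-one smoothing, scored in log space."""
    class_counts, term_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)  # log prior
        total_terms = sum(term_counts[label].values())
        for tok in tokenize(text):
            if tok not in vocab:
                continue  # ignore terms never seen in training
            score += math.log(
                (term_counts[label][tok] + 1) / (total_terms + len(vocab))
            )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def predict_knn(docs, text, k=3):
    """Majority vote among the k training documents most similar to text."""
    query = Counter(tokenize(text))
    ranked = sorted(
        docs,
        key=lambda d: cosine(query, Counter(tokenize(d[0]))),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


def accuracy(predict, test_docs):
    """Fraction of test documents whose predicted label matches the truth."""
    correct = sum(predict(text) == label for text, label in test_docs)
    return correct / len(test_docs)


NB_MODEL = train_naive_bayes(TRAIN)
nb_acc = accuracy(lambda t: predict_naive_bayes(NB_MODEL, t), TEST)
knn_acc = accuracy(lambda t: predict_knn(TRAIN, t), TEST)
print(f"Naive Bayes accuracy: {nb_acc:.2f}")
print(f"kNN accuracy:         {knn_acc:.2f}")
```

A realistic comparison would evaluate all four classifiers on a held-out split of a real corpus, with the same feature representation for each, and would report per-category measures rather than a single accuracy figure.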

Journal of Computer Science and Its Application