Exploring multinomial naïve Bayes for Yorùbá text document classification

I.I. Ayogu

doi:10.4314/njt.v39i2.23

Published: Jul 16, 2020

Keywords: Supervised learning, text classification, Yorùbá language, text mining, BoW Representation

Issue: Vol. 39 No. 2 (2020)

Section: Articles

Copyright belongs to the Faculty of Engineering, University of Nigeria, Nsukka, Nigeria

The contents of the articles are solely the opinion of the author(s), not of NIJOTECH.
NIJOTECH allows open-access distribution of published articles in any medium, provided that articles are distributed in whole, not in part.

Abstract

Motivated by the recent growth of Nigerian-language text online, this paper considers the problem of classifying text documents written in the Yorùbá language into one of a few predefined classes. Text document classification (categorization) research is well established for English and many other languages, but not for Nigerian languages. This paper evaluates a multinomial naïve Bayes (NB) model trained on a research dataset of 500 text samples, 100 from each of five domains (business, sports, entertainment, technology and politics), using unigram, bigram and trigram features separately, each obtained with the bag-of-words (BoW) representation. Results show that models trained on unigram and bigram features perform comparably, and both perform significantly better than a model trained on trigram features. Overall, the results suggest that the NB algorithm could be applied in practice to classifying text documents written in Yorùbá.
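To make the method concrete, here is a minimal sketch of a multinomial naïve Bayes classifier over bag-of-words n-gram counts, written from scratch in standard-library Python. It is not the author's implementation. The function `ngrams`, the class `MultinomialNB`, whitespace tokenization, NFC Unicode normalization and add-alpha (Laplace) smoothing are all illustrative assumptions. Each class c is scored as log P(c) + Σ count(t) · log P(t | c). Here P(t | c) = (count(t, c) + α) / (total n-grams in c + α·|V|) and |V| is the training vocabulary size. The toy Yorùbá documents are hypothetical and are not drawn from the paper's dataset.

```python
import math
import unicodedata
from collections import Counter, defaultdict


def ngrams(text, n):
    """Whitespace-tokenize a document and return its word n-grams (BoW features)."""
    # Assumption: NFC-normalize so tone-marked Yorùbá letters typed as precomposed
    # characters or as base letter + combining marks compare as equal strings.
    tokens = unicodedata.normalize("NFC", text).lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class MultinomialNB:
    """Hypothetical multinomial naive Bayes over n-gram counts with add-alpha smoothing."""

    def __init__(self, n=1, alpha=1.0):
        self.n = n          # 1 = unigram, 2 = bigram, 3 = trigram features
        self.alpha = alpha  # smoothing constant; 1.0 is Laplace smoothing

    def fit(self, docs, labels):
        self.doc_counts = Counter(labels)        # documents per class, for the prior P(c)
        self.term_counts = defaultdict(Counter)  # n-gram counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            feats = ngrams(doc, self.n)
            self.term_counts[label].update(feats)
            self.vocab.update(feats)
        self.class_totals = {c: sum(cnt.values()) for c, cnt in self.term_counts.items()}
        self.n_docs = len(labels)
        return self

    def predict(self, doc):
        # n-grams never seen in training carry no class evidence, so they are skipped.
        feats = Counter(f for f in ngrams(doc, self.n) if f in self.vocab)
        v = len(self.vocab)
        best_class, best_score = None, -math.inf
        for c, n_c in self.doc_counts.items():
            score = math.log(n_c / self.n_docs)  # log prior, log P(c)
            denom = self.class_totals[c] + self.alpha * v
            for f, k in feats.items():
                # add k * log P(f | c); Counter returns 0 for unseen (f, c) pairs
                score += k * math.log((self.term_counts[c][f] + self.alpha) / denom)
            if score > best_score:
                best_class, best_score = c, score
        return best_class


# Hypothetical toy corpus covering three of the paper's five classes.
docs = [
    "owó ọjà ìṣòwò owó",
    "ọjà owó ilé ìfowópamọ́",
    "bọ́ọ̀lù ẹgbẹ́ eré bọ́ọ̀lù",
    "eré bọ́ọ̀lù ìdíje ẹgbẹ́",
    "ìdìbò ìjọba òṣèlú",
    "òṣèlú ìdìbò ẹgbẹ́ ìjọba",
]
labels = ["business", "business", "sports", "sports", "politics", "politics"]

unigram_model = MultinomialNB(n=1).fit(docs, labels)
print(unigram_model.predict("owó ọjà"))  # → business

bigram_model = MultinomialNB(n=2).fit(docs, labels)
print(bigram_model.predict("bọ́ọ̀lù ẹgbẹ́"))  # → sports
```

Passing `n=3` gives trigram features. On a small corpus, longer n-grams recur less often, so each feature's per-class counts become sparser and the smoothing term dominates more of the estimates.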

Journal: Nigerian Journal of Technology