An enhanced stemmer algorithm for Geez text: a long match approach

Afewerki Hafteslassie Bahta; Melquiades Jr. Rigor Hayag; Tsegay Gebremeskel

Published: Jan 18, 2019

Keywords: Information Retrieval (IR), Stemming, Morphology, Natural Language Processing, Suffixes, Prefixes, Algorithm

Issue: Vol. 15 No. 2 (2018)

Section: Articles

The Association of Information Professionals of Nigeria

Abstract

This study presents the development and enhancement of a stemmer algorithm for Geez text. The approach follows a longest-match principle. The stemmer takes a corpus as input, expands shortened words, removes punctuation marks, special characters and numbers (normalization), removes stop words, identifies cases in which an apparent affix is not a real affix (exceptions), handles irregular words, and finally strips the affixes. Both the input corpus and the resulting corpus are in the Geez language, so no transliteration is needed. The stemmer is implemented with a user interface, which makes it easy to understand for non-expert users and learners. The prototype was tested with three different datasets of 2000 words and was evaluated by manual error counting. The experiments showed an average accuracy of 87.22%, with over-stemming and under-stemming error rates of 8.31% and 4.35%, respectively. The overall accuracy is encouraging and shows that stemming can be performed with low error rates in morphologically rich languages such as Geez. Finally, the researchers found that infixed words affect the stems of Geez words, and that the stemmer can be used in developing a morphological analyzer, parser, spell checker, thesaurus, word frequency counter and so on.
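The pipeline described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names (`strip_longest`, `stem_corpus`), the `min_stem` guard, the reading of "short words" as abbreviations, and all word lists are assumptions made for illustration. The lexicon below uses Latin-script placeholders rather than real Geez affixes; because Python strings are Unicode, Ethiopic-script lists could be supplied in the same way without transliteration.

```python
def strip_longest(word, affixes, *, prefix, min_stem=2):
    """Remove the longest matching affix, keeping at least min_stem characters.

    min_stem is a hypothetical safeguard against stripping a word to nothing.
    """
    # Try longer affixes first so that "ings" wins over "s" (longest match).
    for affix in sorted(affixes, key=len, reverse=True):
        if not affix:
            continue
        matches = word.startswith(affix) if prefix else word.endswith(affix)
        if matches and len(word) - len(affix) >= min_stem:
            return word[len(affix):] if prefix else word[:-len(affix)]
    return word


def stem_corpus(text, *, abbreviations, stop_words, exceptions,
                irregular, prefixes, suffixes):
    """Run the steps named in the abstract, in order, and return the stems."""
    # 1. Expand shortened words (simplification: whole whitespace-separated tokens).
    expanded = " ".join(abbreviations.get(tok, tok) for tok in text.split())
    # 2. Normalization: drop punctuation, special characters and numbers.
    tokens = "".join(ch if ch.isalpha() else " " for ch in expanded).split()
    stems = []
    for tok in tokens:
        if tok in stop_words:            # 3. remove stop words
            continue
        if tok in exceptions:            # 4. apparent affix is not a real affix
            stems.append(tok)
        elif tok in irregular:           # 5. irregular words use a lookup table
            stems.append(irregular[tok])
        else:                            # 6. longest-match affix stripping
            tok = strip_longest(tok, prefixes, prefix=True)
            stems.append(strip_longest(tok, suffixes, prefix=False))
    return stems


# Placeholder lexicon (hypothetical, Latin script purely for illustration).
result = stem_corpus(
    "the dr. went rebuildings, unties bering 42!",
    abbreviations={"dr.": "doctor"},
    stop_words={"the"},
    exceptions={"bering"},
    irregular={"went": "go"},
    prefixes={"un", "re"},
    suffixes={"s", "ing", "ings"},
)
print(result)  # ['doctor', 'go', 'build', 'tie', 'bering']
```

Sorting the affixes by length before matching is what makes this a longest-match strategy: "rebuildings" loses "ings" rather than just "s". The exception and irregular-word lookups run before any stripping, so listed words are never passed to the affix remover.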

Information Technologist (The)