Signal based Ethiopian languages identification using Gaussian mixture model

Mikias Wondimu; Menore Tekeba

Published:

Nov 15, 2019

Keywords:

Accuracy, GMM, LID, Language Identification System, MFCC, Utterance

Issue

Vol. 37 (2019)

Section

Articles

Abstract

A language identification (LID) system determines which language is being spoken from relatively brief audio samples. Very few attempts have been made to build LID systems for Ethiopian languages. The importance of LID is increasing with the development of telecommunication infrastructure: with an LID system, service calls from customers can be forwarded to a person who speaks the caller's language. This work therefore builds an LID system for four Ethiopian languages (Amharic, Oromiffa, Guragegna and Tigregna) using Gaussian mixture models (GMM). A dataset consisting of recordings from seven different speakers of each language was prepared. After preprocessing, features were extracted as Mel frequency cepstral coefficients (MFCC) and classification was performed using GMM. The performance of the LID system was tested in scenarios taking two, three and four languages at a time. The system was also tested for utterance dependence and speaker dependence. For the four languages, the average accuracy was about 93% in the utterance-dependent test, about 70% in the utterance-independent test, and nearly 91% in the speaker-independent test, which was carried out in the utterance-dependent scenario only.
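
The classification step described in the abstract, one GMM per language with an utterance assigned to the language whose model scores it highest, can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the number of mixture components, the MFCC settings, or the training procedure, so the diagonal-covariance EM fit, the 13-dimensional frames, and all function names (`make_frames`, `fit_diag_gmm`, `identify`) are hypothetical. Synthetic Gaussian frames stand in for real MFCC features.

```python
import numpy as np

LANGUAGES = ["Amharic", "Oromiffa", "Guragegna", "Tigregna"]
N_DIMS = 13  # assumed number of MFCC coefficients per frame


def make_frames(lang_index, n_frames, seed):
    """Synthetic stand-in for MFCC frames: one Gaussian cloud per language."""
    rng = np.random.default_rng(seed)
    return rng.normal(loc=2.0 * lang_index, scale=1.0, size=(n_frames, N_DIMS))


def log_weighted_densities(X, gmm):
    """log(w_k * N(x | mu_k, diag(var_k))) for every frame and component."""
    weights, means, variances = gmm
    diff = X[:, None, :] - means[None, :, :]
    maha = np.sum(diff ** 2 / variances[None, :, :], axis=2)
    log_det = np.sum(np.log(variances), axis=1)
    const = X.shape[1] * np.log(2.0 * np.pi)
    return np.log(weights) - 0.5 * (const + log_det + maha)


def log_sum_exp(a):
    """Numerically stable log(sum(exp(a))) over the component axis."""
    m = a.max(axis=1, keepdims=True)
    return (m + np.log(np.exp(a - m).sum(axis=1, keepdims=True)))[:, 0]


def fit_diag_gmm(X, n_components=2, n_iter=30, seed=0):
    """Fit a diagonal-covariance GMM to frames X with plain EM."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    means = X[rng.choice(n, size=n_components, replace=False)]
    variances = np.tile(X.var(axis=0) + 1e-6, (n_components, 1))
    weights = np.full(n_components, 1.0 / n_components)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each frame.
        log_p = log_weighted_densities(X, (weights, means, variances))
        resp = np.exp(log_p - log_sum_exp(log_p)[:, None])
        # M-step: re-estimate weights, means and per-dimension variances.
        nk = resp.sum(axis=0) + 1e-10
        weights = nk / nk.sum()
        means = (resp.T @ X) / nk[:, None]
        second_moment = (resp.T @ (X ** 2)) / nk[:, None]
        variances = np.maximum(second_moment - means ** 2, 1e-6)
    return weights, means, variances


def average_log_likelihood(X, gmm):
    """Mean per-frame log-likelihood of an utterance under one language model."""
    return float(np.mean(log_sum_exp(log_weighted_densities(X, gmm))))


def identify(X, models):
    """Return the language whose GMM gives the utterance the highest score."""
    return max(models, key=lambda lang: average_log_likelihood(X, models[lang]))


# Train one model per language on synthetic frames, then score a new utterance.
models = {
    lang: fit_diag_gmm(make_frames(i, 500, seed=i))
    for i, lang in enumerate(LANGUAGES)
}
test_utterance = make_frames(2, 200, seed=42)  # drawn from the third language
print(identify(test_utterance, models))
```

With real recordings, `make_frames` would be replaced by an MFCC front end applied to the preprocessed audio, and the two-, three- and four-language scenarios would correspond to restricting `models` to the chosen subset of languages.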

Zede Journal