Identifying Amharic-Tigrigna Shared Features: Towards Optimizing Implementation of Under Resourced Languages


Lemlem Hagos
Million Meshesha
Solomon Atnafu
Solomon Teferra

Abstract

This article reports exploratory research analyzing the statistical overlap between Amharic and Tigrigna at three levels of abstraction: the word level, the consonant-vowel (CV) syllable level, and the phoneme level. Amharic and Tigrigna are among the most widely spoken Ethiosemitic languages in Ethiopia, yet they remain too under-resourced to be fully integrated into text-to-speech (TTS) applications that assist oral societies in their day-to-day activities. TTS research requires linguistic resources, which involve intensive text analysis, and acoustic resources, which involve digital signal analysis. To date, TTS research for Ethiosemitic languages has been conducted on a monolingual basis, which fragments research effort on a resource-intensive task. Investigating the level of overlap between Amharic and Tigrigna gives insight into how shared acoustic and linguistic resources can be reused across these languages, reducing duplication of effort in designing higher-level applications such as TTS. According to our statistical analysis, Amharic and Tigrigna share 86.36% at the phonemic level, 85.93% at the CV syllable level, and an encouraging level of overlap at the word level. The extent of this overlap at different levels of abstraction points to an opportunity to reduce duplication of effort in the design and development of bilingual and multilingual TTS for Ethiosemitic polyglots.
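
As an illustration of what an overlap percentage between two inventories can mean, the following minimal sketch computes the share of units common to two sets relative to their union. This is an assumption made for illustration only: the abstract does not state which overlap measure the authors used, and the function name `overlap_percentage` and the toy inventories are hypothetical placeholders, not Amharic or Tigrigna data.

```python
def overlap_percentage(inventory_a, inventory_b):
    """Return the shared units as a percentage of the combined inventory.

    Assumption: overlap is measured as |A & B| / |A | B| (Jaccard index).
    The source abstract does not specify its measure; this is one common choice.
    """
    a, b = set(inventory_a), set(inventory_b)
    union = a | b
    if not union:
        # Two empty inventories have nothing to compare.
        return 0.0
    return 100.0 * len(a & b) / len(union)


# Hypothetical toy inventories, NOT real phoneme sets of either language.
toy_a = {"p", "t", "k", "s"}
toy_b = {"t", "k", "s", "h", "m"}

print(overlap_percentage(toy_a, toy_b))  # 3 shared units out of 6 in total
```

The same function could be applied at any of the three levels named above by passing sets of phonemes, CV syllables, or word types; a different normalization (for example, dividing by the size of the smaller inventory) would give different figures.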


Journal Identifiers


eISSN: 2520-7997
print ISSN: 0379-2897