A corpus-based survey of four electronic Swahili–English bilingual dictionaries


G De Pauw
G de Schryver
P Wagacha

Abstract

In this article we survey four different electronic bilingual dictionaries for the language pair Swahili–English. Aided by a data-driven morphological analyzer and part-of-speech tagger, we quantify the coverage of the dictionaries on large monolingual corpora of Swahili. In a second series of experiments, we investigate how applicable the dictionaries are as a tool in the development of a machine translation system, by evaluating bilingual coverage on the parallel SAWA corpus. At the same time we attempt to consolidate the dictionaries into a unified lexicographic database and compare the coverage to that of its composite parts.
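
As an illustration of what dictionary coverage on a lemmatized corpus could look like, the following is a minimal sketch only. It assumes the corpus has already been reduced to lemmas and that each dictionary is represented as a plain set of headwords. The function name `coverage`, the toy lemma list and the two toy dictionaries are hypothetical and are not taken from the article, which does not specify its exact coverage measure here.

```python
from collections import Counter


def coverage(lemmas, headwords):
    """Return (token_coverage, type_coverage) of a headword set.

    Assumption: `lemmas` is an iterable of corpus lemmas and
    `headwords` is a set of dictionary headwords (both hypothetical
    representations, not the article's own data format).
    """
    counts = Counter(lemmas)
    if not counts:
        return 0.0, 0.0
    covered = [lemma for lemma in counts if lemma in headwords]
    # Token coverage weights each lemma by its corpus frequency.
    token_cov = sum(counts[lemma] for lemma in covered) / sum(counts.values())
    # Type coverage counts each distinct lemma once.
    type_cov = len(covered) / len(counts)
    return token_cov, type_cov


# Toy data, invented for illustration only.
corpus_lemmas = ["kitabu", "soma", "mtu", "kitabu",
                 "nyumba", "soma", "kitabu", "enda"]
dict_a = {"kitabu", "mtu"}
dict_b = {"soma", "nyumba"}

# A unified database is modelled here as the union of the headword sets.
unified = dict_a | dict_b

for name, headwords in [("A", dict_a), ("B", dict_b), ("unified", unified)]:
    token_cov, type_cov = coverage(corpus_lemmas, headwords)
    print(f"{name}: token coverage {token_cov:.3f}, type coverage {type_cov:.3f}")
```

Modelling the consolidated database as a set union is a simplification: it shows why a unified resource can never cover less than any of its parts, but it ignores sense-level merging and translation equivalents, which matter for bilingual coverage.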

Keywords: lexicography, evaluation, morphology, lemmatization, parallel corpora, machine learning, machine translation, Swahili (Kiswahili), English

Journal Identifiers


eISSN: 2224-0039
print ISSN: 1684-4904