The extraction of terminology list using ParaConc for creating a quadrilingual dictionary


Respect Mlambo
Nomsa Skosana
Muzi Matfunjwa

Abstract

The lack of terminology and language resources for the under-resourced South African languages poses a serious problem for effective communication in specialised fields such as law, education, health, agriculture, science, and technology. The ability to use all South African languages in all contemporary fields requires the existence of relevant terminology and resources. The article aims to semi-automatically identify and extract terminology for creating an English, Xitsonga, Siswati, and isiNdebele quadrilingual dictionary. Given parallel texts in the four languages, we use ParaConc to identify and extract terminology in one language and the corresponding translations in the other languages. In this study, English is used as the source language, while Xitsonga, Siswati, and isiNdebele are the target languages. This process allowed us to identify specific lexical items in the source language (manually) and their translation equivalents in the target languages (automatically). The result was a collection of extracted terminology lists that can be used to compile a specialised quadrilingual dictionary for English and three of the under-resourced languages. We show the usefulness of ParaConc for semi-automatically extracting quadrilingual terminology lists, which, by enabling the creation of quadrilingual dictionaries, will contribute to the development and promotion of multilingualism in South Africa.
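
The retrieval step described in the abstract can be illustrated with a minimal Python sketch. It is not ParaConc's implementation and not the authors' procedure. It assumes the parallel texts are already sentence-aligned, and the names `AlignedSegment` and `parallel_concordance` are hypothetical, introduced here only for illustration. Given a manually chosen English term, the function returns every aligned segment whose English side contains that term, so the corresponding Xitsonga, Siswati, and isiNdebele segments can be inspected for translation equivalents.

```python
import re
from dataclasses import dataclass


@dataclass
class AlignedSegment:
    """One sentence-aligned unit across the four languages (hypothetical structure)."""
    english: str
    xitsonga: str
    siswati: str
    isindebele: str


def parallel_concordance(term, corpus):
    """Return the aligned segments whose English side contains `term`.

    Matching is case-insensitive and restricted to whole words. This is an
    assumption of the sketch, not a description of how ParaConc matches.
    """
    pattern = re.compile(r"\b" + re.escape(term) + r"\b", re.IGNORECASE)
    return [seg for seg in corpus if pattern.search(seg.english)]


if __name__ == "__main__":
    # Target-language fields are placeholders; no real translations are given here.
    corpus = [
        AlignedSegment("The court granted bail.", "<Xitsonga 1>", "<Siswati 1>", "<isiNdebele 1>"),
        AlignedSegment("The clinic opens daily.", "<Xitsonga 2>", "<Siswati 2>", "<isiNdebele 2>"),
    ]
    for seg in parallel_concordance("bail", corpus):
        print(seg.english, "|", seg.xitsonga, "|", seg.siswati, "|", seg.isindebele)
```

The sketch stops at retrieving the aligned segments. Selecting the translation equivalent within each target segment is a separate step that it does not attempt.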


Journal Identifiers


eISSN: 1727-9461
print ISSN: 1607-3614