Challenges to issues of balance and representativeness in African lexicography


T Otlogetswe

Abstract

Modern dictionaries depend on corpora of different sizes and types for frequency listings, concordances and collocations, illustrative sentences and grammatical information. With the help of computer software, retrieving such information has become increasingly easy. However, the quality of the information retrieved for lexicographic purposes depends on what is put into the corpus at the construction stage. If corpora are not representative of the different language usages of a speech community, they may prove to be unreliable sources of lexicographic information. There are, however, issues in African languages which make many African corpora questionable. These include a lack of texts of different genres, the unavailability of balanced and representative written texts, a complete absence of spoken texts, and literacy problems in African societies. This article therefore explores the challenges to the construction of reliable corpora in African languages. It argues that African languages face peculiar challenges and that corpus research on them may require a different treatment from European and American corpus research. It concludes that balance and representativeness appear theoretically impossible to achieve, given the results of sociolinguistic research on the many existing language varieties, which are difficult to represent accurately in a corpus.

Keywords: African languages, balance, Bank of English, borrowing, British National Corpus, COBUILD, code-switching, computers, corpora, dialect, dictionaries, frequency, language variety, representativeness, Setswana, sociolinguistics, speech, text

Journal Identifiers


eISSN: 2224-0039
print ISSN: 1684-4904