Northern Sotho grammatical descriptions: the design of a tokeniser for the verbal segment


Petronella M Kotzé

Abstract

Because Northern Sotho is written disjunctively, a morphological analyser has difficulty analysing Northern Sotho verbs correctly. To overcome this obstacle, a tokeniser that can isolate verbs from raw text needs to be created. For example, the verbal element a ka be a se a re šadišetša ‘he had not left us’ consists of eight separately written parts, which would be difficult to extract from running text. The tokeniser will prevent over-analysis and unnecessary morphological ambiguity. A morpheme such as se, if not first tokenised, could be analysed ambiguously as a subject concord, an object concord, a demonstrative pronoun, a negative marker or an auxiliary verb stem. Tokenisation removes this ambiguity, because the position of the morpheme within the token allows it to be analysed more accurately. This article focuses on the description of the verbal segment in current Northern Sotho grammars. It investigates the different types of verbal elements as well as all the verbal prefixes that may form part of the verbal segment. Terminological issues surrounding so-called ‘deficient verbs’ are addressed, and a framework is proposed for the design of a tokeniser that provides for all the verbal prefixes.
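The grouping step described in the abstract can be illustrated with a minimal sketch. It is not the framework proposed in the article: the prefix inventory and the stem lexicon below are hypothetical and contain only the forms that occur in the abstract's own example, and the function name `tokenise_verbal_segments` is an assumption made for illustration. The sketch joins a run of separately written prefix-like words and the verb stem that follows them into one token, so that a form such as se can later be analysed by its position within that token.

```python
# Toy inventory drawn only from the abstract's example (an assumption);
# a real tokeniser would need the full set of verbal prefixes.
PREFIX_FORMS = {"a", "ka", "be", "se", "re"}
VERB_STEMS = {"šadišetša"}  # hypothetical stem lexicon


def tokenise_verbal_segments(text):
    """Group runs of prefix-like words that end in a known verb stem."""
    tokens = []
    pending = []  # prefix-like words seen since the last emitted token
    for word in text.split():
        if word in VERB_STEMS:
            # A stem closes the segment: emit prefixes + stem as one token.
            tokens.append(" ".join(pending + [word]))
            pending = []
        elif word in PREFIX_FORMS:
            pending.append(word)
        else:
            # Not part of a verbal segment: flush held words one by one.
            tokens.extend(pending)
            pending = []
            tokens.append(word)
    tokens.extend(pending)  # trailing prefix-like words with no stem
    return tokens


segment = tokenise_verbal_segments("a ka be a se a re šadišetša")[0]
position_of_se = segment.split().index("se")  # slot of "se" in the token
```

Under these assumptions, the eight separately written parts of the example come back as a single token, while a prefix-like word that is not followed by a stem is left as an ordinary word.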

Southern African Linguistics and Applied Language Studies 2008, 26(2): 197–208

Journal Identifiers


eISSN: 1727-9461
print ISSN: 1607-3614