South African Journal of African Languages

Tokenization rules for the disjunctively written verbal segment of Northern Sotho

PM Kotzé


This article describes the tokenization rules required to analyse the disjunctively written verbal segment of Northern Sotho correctly. The purpose of such a tokenizer is to isolate verbal segments from running text prior to being analysed. The disjunctive elements of the verbal segment that are discussed in this article and for which generic tokenization rules are proposed, are the following: subject and object concords, the potential marker, negative markers, tense markers and aspect prefixes. The position of each element in a sequence of pre-verbal elements is determined and the collocation restrictions that apply to certain elements are described and incorporated into the tokenization rules. The rules described in this article have already been implemented in a prototype tokenizer that is currently being tested.
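
To illustrate the general idea of isolating a disjunctively written verbal segment from running text, the following is a minimal sketch in Python. It is not the article's prototype tokenizer: the particle inventory `PREVERBAL` and the function `group_verbal_segments` are hypothetical names introduced here, the inventory is a small illustrative sample, and the sketch simply groups a run of known pre-verbal particles with the word that follows it. It does not model the positional ordering or the collocation restrictions that the article's rules incorporate.

```python
# ASSUMPTION: a small, incomplete sample of pre-verbal particles
# (concords, potential, negative and tense markers), chosen only
# for illustration. It is not the article's rule set.
PREVERBAL = {"ke", "o", "re", "le", "ba", "ka", "ga", "sa", "se", "a", "tlo"}


def group_verbal_segments(text):
    """Greedily merge a run of pre-verbal particles with the next word.

    Hypothetical helper: any run of words found in PREVERBAL is joined
    with the word that immediately follows it (taken to be the verb stem)
    into a single token. All other words are emitted unchanged.
    """
    words = text.split()
    tokens = []
    i = 0
    while i < len(words):
        # Find the end of the run of pre-verbal particles starting at i.
        j = i
        while j < len(words) and words[j].lower() in PREVERBAL:
            j += 1
        if i < j < len(words):
            # Particles followed by a word: emit them as one segment.
            tokens.append(" ".join(words[i:j + 1]))
            i = j + 1
        else:
            # No particle here, or a dangling run at the end of the text.
            tokens.append(words[i])
            i += 1
    return tokens


print(group_verbal_segments("monna o a sepela"))
# → ['monna', 'o a sepela']
```

Because many of these particles are homographs of other word classes, a greedy grouping of this kind will over-generate on real text; that ambiguity is what position-aware rules with collocation restrictions are meant to resolve.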

AJOL African Journals Online