Semi-automating the reading programme for a historical dictionary project

Tim van Niekerk; Johannes Schäfer; Ulrich Heid

doi:10.5788/28-1-1468

Published:

Jan 29, 2019

Keywords:

corpora, dictionary workflows, historical lexicography, language varieties, lexical databases, reading programmes, South African English

Issue

Vol. 28 (2018)

Section

Articles

Copyright is owned by: Bureau of the WAT

Abstract

This paper describes the resources and software procedures used or developed in a major enabling step towards the revision of the scholarly reference work A Dictionary of South African English on Historical Principles (DSAE, Silva et al. 1996): the semi-automatic generation of a digitally sourced lexical database on which new and updated dictionary entries will be based, together with the parallel addition of a new corpus of South African English (SAE) to the project. Drawing on online data sources and an extensive list of known SAE word forms, we have developed a software toolchain to gather, encode, annotate and collate textual sources, producing: (i) a 3.1-billion part-of-speech-annotated corpus of South African English; (ii) a lexical database of illustrative quotations for over 20,000 known SAE word forms, available for selection at the entry-revision stage; and (iii) a list of potential new variant spellings and headword inclusion candidates. For recent electronic sources, these steps replace the mechanical aspects of quotation gathering, which is normally undertaken manually through a reading programme requiring years of teamwork to achieve sufficient coverage (cf. Hicks 2010).

Lexikos