PEDANT: parallel texts in Göteborg
This article reports on the status of the PEDANT project on parallel corpora at the Language Bank at Göteborg University. It presents the solutions adopted for access to the corpus data, which is provided over the internet through standard applications and SGML-aware programming tools. The SGML format for encoding translation pairs is outlined, together with methods that allow working with everything from plain text to texts densely encoded with linguistic information.
Keywords: SGML, parallel corpora, morphosyntactic encoding, lemmatization, multiword units, compound words, internet access