A fully inflected Arabic verb resource constructed from a lexicon of lemmas by using finite-state transducers

Amid Neme Alexis


Published:

May 17, 2018

DOI:

Keywords:

Arabic, Natural Language Processing, Semitic morphology, POS tagging, root and pattern

Issue

Vol. 20 No. 2 (2013)

Section

Articles

Copyright for articles published in this journal is retained by the journal.


Abstract

We describe a fully inflected lexicon of 2.5 million verbal forms generated by finite-state transducers (FSTs) from 15 400 verbal entries (lemmas). The lexicon of Arabic verbs is built on Semitic patterns and is used in a resource-based method for the morphological annotation of written Arabic text. We created an enhanced FST implementation for Semitic languages that is also adapted to generating inflected forms, and the language resources can be easily updated. We propose an inflectional taxonomy that makes the lexicon more readable and maintainable for Arabic speakers and linguists. Traditional grammar defines verbal inflectional classes through pattern-classes and root-classes, the latter determined by the nature of each of the three root consonants. Pattern-classes are clearly defined, but root-classes are complex. Our taxonomy reuses the traditional pattern-classes and redefines the root-classes more simply. It provides a straightforward encoding scheme for inflectional variations and for the orthographic adjustments caused by assimilation and agglutination. We tested and evaluated our resource against 10 000 diacritized verb occurrences in the Nemlar corpus and compared it with the Buckwalter resources. The lexical coverage is 99.9 %. On a laptop, generating the 2.5-million-form lexicon and compressing it into 4 megabytes for fast retrieval takes two minutes. Analyzing a verb takes 0.5 milliseconds.
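To make the root-and-pattern idea behind this resource concrete, the following is a minimal Python sketch, assuming a simple Latin transliteration rather than Arabic script. It interdigitates a triliteral root into a pattern, assigns a simplified traditional root class, and appends a few perfect-tense suffixes. It is a hypothetical toy, not the author's FST implementation or taxonomy: the function names `interdigitate`, `root_class`, `inflect_perfect`, and `build_lexicon`, the slot notation `1a2a3`, and the suffix subset are all illustrative assumptions, and a plain dictionary stands in for the compressed transducer used for retrieval.

```python
# Hypothetical sketch: root-and-pattern generation of Arabic Form I perfect
# verbs in a simple Latin transliteration. Not the paper's FST system.

# Digits 1, 2, 3 mark the slots filled by the three root consonants.
FORM_I_PERFECT = "1a2a3"  # fa'ala pattern, e.g. k-t-b -> katab

# Subset of perfect-tense person/number/gender suffixes (transliterated).
PERFECT_SUFFIXES = {
    "3ms": "a",
    "3fs": "at",
    "2ms": "ta",
    "1s": "tu",
    "3mp": "uu",
}


def interdigitate(root: str, pattern: str) -> str:
    """Insert the root consonants into the numbered slots of a pattern."""
    if len(root) != 3:
        raise ValueError("this sketch only handles triliteral roots")
    out = []
    for ch in pattern:
        # A digit selects the corresponding root consonant; anything else is
        # a vowel or affix letter copied from the pattern.
        out.append(root[int(ch) - 1] if ch in "123" else ch)
    return "".join(out)


def root_class(root: str) -> str:
    """Simplified traditional root class; the first matching class wins."""
    weak = "wy"
    c1, c2, c3 = root
    if c2 == c3:
        return "doubled"
    if c1 in weak:
        return "assimilated"
    if c2 in weak:
        return "hollow"
    if c3 in weak:
        return "defective"
    if "'" in root:  # "'" transliterates hamza
        return "hamzated"
    return "sound"


def inflect_perfect(root: str) -> dict[str, str]:
    """Generate Form I perfect forms for a sound root only."""
    if root_class(root) != "sound":
        # Weak, doubled and hamzated roots need orthographic adjustments
        # (e.g. assimilation) that this sketch does not model.
        raise NotImplementedError(f"root class {root_class(root)!r} not handled")
    stem = interdigitate(root, FORM_I_PERFECT)
    return {tag: stem + suffix for tag, suffix in PERFECT_SUFFIXES.items()}


def build_lexicon(roots: list[str]) -> dict[str, list[tuple[str, str]]]:
    """Map each generated surface form to its (root, tag) analyses."""
    lexicon: dict[str, list[tuple[str, str]]] = {}
    for root in roots:
        for tag, form in inflect_perfect(root).items():
            lexicon.setdefault(form, []).append((root, tag))
    return lexicon


if __name__ == "__main__":
    lex = build_lexicon(["ktb", "drs"])
    print(lex["katabtu"])   # [('ktb', '1s')]
    print(root_class("qwl"))  # hollow
```

Analysis here is a single dictionary lookup on the pre-generated forms, which mirrors the resource-based approach of generating everything once and retrieving it afterwards; at the scale of 2.5 million forms, a minimized automaton would be far more compact than a Python dictionary. Rejecting non-sound roots keeps the sketch honest: those are exactly the cases where inflectional variations and orthographic adjustments must be encoded.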


Revue d'Information Scientifique et Technique