A Comparative Study of Deep and Shallow Parsing Approaches to Automated Grammaticality Evaluation
The concept of automated grammar evaluation of natural language texts is one that has attracted significant interests in the natural language processing community. It is the examination of natural language text for grammatical accuracy using computer software. The current work is a comparative study of different deep and shallow parsing techniques that have been applied to lexical analysis and grammaticality evaluation of natural language texts. The comparative analysis was based on data gathered from numerous related works. Shallow parsing using induced grammars was first examined along with its two main sub-categories, the probabilistic statistical parsers and the connectionist approach using neural networks. Deep parsing using handcrafted grammar was subsequently examined along with several of it‟s subcategories including Transformational Grammars, Feature Based Grammars, Lexical Functional Grammar (LFG), Definite Clause Grammar (DCG), Property Grammar (PG), Categorial Grammar (CG), Generalized Phrase Structure Grammar (GPSG), and Head-driven Phrase Structure Grammar (HPSG). Based on facts gathered from literature on the different aforementioned formalisms, a comparative analysis of the deep and shallow parsing techniques was performed. The comparative analysis showed among other things that while the shallow parsing approach was usually domain dependent, influenced by sentence length and lexical frequency and employed machine learning to induce grammar rules, the deep parsing approach were not domain dependent, not influenced by sentence length nor lexical frequency, and they made use of well spelt out set of precise linguistic rules. The deep parsing techniques proved to be a more labour intensive approach while the induced grammar rules were usually faster and reliability increased with size, accuracy and coverage of training data. The shallow parsing approach has gained immense popularity owing to availability of large corpora for different languages, and has therefore become the most accepted and adopted approach in recent times.
Keywords: Grammaticality, Natural language processing, Deep parsing, Shallow parsing, Handcrafted grammar, Precision grammar, Induced grammar, Automated scoring, Computational linguistics, Comparative study.