Ancient Ethiopic manuscripts character recognition using Deep Belief Networks
A very large proportion of Ethiopian literature is found in ancient Ge’ez manuscripts, old scriptures written on pages made from animal hides and skins (Branas), on which ancient Ethiopic knowledge and civilization are recorded. This knowledge can be extracted and made usable by applying optical character recognition (OCR) systems to document images. Little effort has so far been devoted to OCR of ancient Ethiopic manuscripts. Handwritten OCR is considered one of the most challenging problems in image processing. The unique morphology of the Ge’ez handwriting system (known as “Kum Tsihfet”), the degraded quality of the documents, and the non-uniform background of the Branas pose additional challenges. Because of this, the problem cannot be addressed directly with OCR systems designed for modern printed and handwritten documents. Machine learning techniques such as deep belief networks (DBNs) are becoming a powerful set of techniques for modeling the complicated morphological features of handwritten text. In this research we developed an OCR system using DBNs. The system was trained and tested on our own segmented dataset of ancient Ge’ez characters, containing only 24 base characters. In testing, the system achieved a recognition accuracy of 93.75%, which is a promising result.