Malappuram/MESCE/Malayalam OCR

Revision as of 20:55, 2 February 2007 by 202.56.231.116

Welcome to Akshara Malayalam OCR, character recognition software for printed Malayalam text. It is released under the GNU General Public License and uses the IPL98 and wxWidgets libraries. Akshara is a cross-platform open source project started by students of MES College of Engineering, Kuttipuram.

The OCR software was developed to save time, the most valuable resource we have. It is designed to automate the conversion of printed matter, such as books, magazines and other documents, into machine-readable text. Akshara is designed to be platform independent, in keeping with the general shift towards open source platforms, and it has been kept open source to help with its future development. It uses open source libraries: wxWidgets for its GUI and file-related processes, and the Image Processing Library (IPL98) for all its image manipulation. Because both libraries are platform independent, Akshara is platform independent as well.

Akshara Malayalam OCR is designed as a framework that can be ported to other languages with minor changes to the code, which should speed up the development of OCR for those languages considerably.

Any OCR implementation consists of a number of preprocessing steps followed by the actual recognition. The number and types of preprocessing algorithms applied to the scanned image depend on many factors: the age of the document, the paper quality, the resolution of the scanned image, the amount of skew in the image, the format and layout of the images and text, the kind of script used, and whether the characters are printed or handwritten. Typical preprocessing stages include noise cleaning, binarization, skeletonization, skew detection and correction, and segmentation of the page into lines and words ahead of feature extraction. The recognition stage usually involves calculating a number of statistical parameters for each character image and recognizing the character from them.
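Two of the preprocessing stages named above, binarization and line and word segmentation, can be illustrated with a short sketch. This is not Akshara's code and does not use IPL98. It is a minimal Python illustration that assumes a fixed global threshold and projection-profile segmentation, two common textbook choices. The function names (`binarize`, `ink_runs`, `segment_lines`, `segment_words`) and the `threshold` and `min_gap` parameters are hypothetical.

```python
def binarize(gray, threshold=128):
    """Map a grayscale image (rows of 0-255 values) to 1 = ink, 0 = background.

    Assumes dark text on a light page and a fixed global threshold.
    """
    return [[1 if px < threshold else 0 for px in row] for row in gray]


def ink_runs(profile, min_gap=1):
    """Return (start, end) index ranges, end exclusive, where profile > 0.

    Runs separated by fewer than min_gap empty positions are merged.
    """
    runs = []
    start = None
    for i, ink in enumerate(profile):
        if ink > 0 and start is None:
            start = i                      # a run of ink begins
        elif ink == 0 and start is not None:
            runs.append((start, i))        # the run ended at i
            start = None
    if start is not None:
        runs.append((start, len(profile)))

    merged = []
    for run in runs:
        if merged and run[0] - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], run[1])   # gap too small: join
        else:
            merged.append(run)
    return merged


def segment_lines(binary):
    """Find text lines from the horizontal projection (ink count per row)."""
    return ink_runs([sum(row) for row in binary])


def segment_words(binary, top, bottom, min_gap=2):
    """Find words in rows top..bottom from the vertical projection.

    Gaps narrower than min_gap columns are treated as spacing between
    characters rather than between words.
    """
    width = len(binary[0])
    profile = [sum(binary[y][x] for y in range(top, bottom))
               for x in range(width)]
    return ink_runs(profile, min_gap)


# A tiny synthetic 5x8 "page": 0 is black ink, 255 is white paper.
page = [
    [255, 255, 255, 255, 255, 255, 255, 255],
    [0,   0,   255, 0,   255, 255, 255, 0],
    [0,   0,   255, 0,   255, 255, 255, 0],
    [255, 255, 255, 255, 255, 255, 255, 255],
    [255, 0,   0,   0,   255, 255, 255, 255],
]
binary = binarize(page)
lines = segment_lines(binary)              # [(1, 3), (4, 5)]
words = segment_words(binary, *lines[0])   # [(0, 4), (7, 8)]
```

In the first line of the sample page, the one-column gap is treated as spacing inside a word, while the three-column gap separates two words. A real page would need noise cleaning and skew correction first, since projection profiles only separate lines cleanly when the text rows are horizontal.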