SMC/SoC/2008

Write a lemmatiser for Malayalam. See whether we can build a Malayalam plugin for GATE; that would help NLP researchers a lot. To get started, search for GATE, download and install it, and look in its plugins directory, where a Hindi tokenizer and lemmatiser are available.
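To make the goal concrete, here is a minimal sketch of one common lemmatisation approach, dictionary-backed suffix stripping. The lexicon, the two plural rules and the `lemmatise` function are illustrative assumptions, not an existing SMC or GATE component; a real Malayalam lemmatiser would need a much larger lexicon and would have to handle sandhi changes at morpheme boundaries.

```python
# Minimal sketch of a dictionary-backed suffix-stripping lemmatiser.
# ASSUMPTION: the lexicon and suffix rules are small illustrative samples,
# not a complete description of Malayalam morphology.

LEXICON = {"പുസ്തകം", "കുട്ടി", "മരം"}

# (suffix, replacement) pairs; longer suffixes are tried first.
SUFFIX_RULES = [
    ("ങ്ങൾ", "ം"),  # plural of nouns ending in anusvara: മരങ്ങൾ -> മരം
    ("കൾ", ""),     # plural -കൾ: കുട്ടികൾ -> കുട്ടി
]


def lemmatise(word: str) -> str:
    """Return the lemma of `word`, or `word` itself if no rule applies."""
    if word in LEXICON:
        return word
    rules = sorted(SUFFIX_RULES, key=lambda rule: len(rule[0]), reverse=True)
    for suffix, replacement in rules:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept the candidate only if the lexicon confirms it is a lemma.
            if candidate in LEXICON:
                return candidate
    return word


if __name__ == "__main__":
    for w in ["പുസ്തകങ്ങൾ", "കുട്ടികൾ", "മരം"]:
        print(w, "->", lemmatise(w))
```

Checking each candidate against the lexicon keeps over-eager rules from producing non-words; unknown words are returned unchanged, which is a safe default for a tokenizer/lemmatiser pipeline such as the GATE Hindi plugin mentioned above.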
=== Functional Optical Character Recognition System ===
Add Malayalam support to the Tesseract OCR engine.
 
* Study the Tesseract OCR system
* Recognition of all Malayalam characters
* Layout recognition using OCRopus (optional?)
 
http://code.google.com/p/tesseract-ocr/
http://code.google.com/p/ocropus/
 
=== Write a Gnome Speech Driver for Dhvani and Integrate it with Orca ===
#Orca, the screen reader for visually impaired users, uses gnome-speech to drive speech engines. Festival, eSpeak, FreeTTS and others already have gnome-speech drivers; we need to write one for Dhvani.