SMC/SoC/2008: Difference between revisions
Write a lemmatiser for Malayalam. Investigate whether it can be built as a plugin for GATE; that would help NLP researchers a lot. To get started, search for GATE on Google, then download and install it. A Hindi tokenizer and lemmatiser are available in its plugins directory and can serve as a reference.
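As a starting point for discussion, a rule-based lemmatiser can be sketched as a table of suffix rewrites applied longest-first. The sketch below is plain Python, not a GATE plugin, and the names <code>RULES</code>, <code>MIN_STEM_LEN</code> and <code>lemmatise</code> are hypothetical. The two plural rules are illustrative assumptions only and are nowhere near a complete or validated rule set for Malayalam.

```python
# Minimal suffix-rewriting lemmatiser sketch.
# RULES holds (suffix, replacement) pairs. These two plural rules are
# illustrative assumptions, not a validated Malayalam rule set.
RULES = [
    ("ങ്ങൾ", "ം"),  # plural of nouns ending in -am: പുസ്തകങ്ങൾ -> പുസ്തകം
    ("കൾ", ""),     # plain plural: കുട്ടികൾ -> കുട്ടി
]

# Hypothetical guard: do not strip a suffix if the remaining stem
# would be shorter than this many code points.
MIN_STEM_LEN = 2


def lemmatise(word, rules=RULES):
    """Return the word with the longest matching suffix rewritten.

    Words that match no rule are returned unchanged.
    """
    # Try longer suffixes first so the most specific rule wins.
    for suffix, replacement in sorted(rules, key=lambda r: len(r[0]), reverse=True):
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)]
            if len(stem) >= MIN_STEM_LEN:
                return stem + replacement
    return word


if __name__ == "__main__":
    for w in ["കുട്ടികൾ", "പുസ്തകങ്ങൾ"]:
        print(w, "->", lemmatise(w))
```

A real lemmatiser would need a much larger rule table covering case markers, sandhi and verb inflection, and probably a lexicon to check candidate stems against. The table-driven design keeps those linguistic rules separate from the matching code.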
=== Functional Optical Character Recognition system ===
Add Malayalam support for Tesseract OCR.
* Study the Tesseract OCR system
* Recognition of all characters
* Layout recognition using OCRopus (optional?)
* http://code.google.com/p/tesseract-ocr/
* http://code.google.com/p/ocropus/
=== Write a Gnome Speech Driver for Dhvani and Integrate it with Orca ===
#Orca, the screen reader for visually impaired users, uses gnome-speech to talk to speech engines. Festival, eSpeak, FreeTTS and others currently have gnome-speech drivers. We need to write a driver for Dhvani.