SMC/SoC/2008

SMC's participation in GSoC 2008 is not confirmed. Use this page to collect project ideas.

Tokenizer/Lemmatiser for Malayalam for GATE
Write a lemmatiser for Malayalam, and see whether it can be packaged as a GATE plugin for Malayalam, which would help NLP researchers a lot. To get started, search for GATE, then download and install it; its plugins directory includes a Hindi tokenizer and lemmatiser.
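As a rough starting point for the tokenizer half of this idea, here is a minimal sketch in Python. It is not a GATE plugin and does not use the GATE API; it only illustrates splitting text into Malayalam words, other words, and punctuation. It assumes that characters in the Malayalam Unicode block (U+0D00 to U+0D7F), plus the zero-width joiner and non-joiner, make up Malayalam words. The function name `tokenize` is made up for this example.

```python
import re

# Assumption: a Malayalam word is a run of characters from the Malayalam
# Unicode block (U+0D00..U+0D7F), possibly containing ZWNJ/ZWJ
# (U+200C/U+200D), which are used inside some Malayalam words.
# The Malayalam alternative comes first because vowel signs are
# combining marks and are not matched by \w.
TOKEN = re.compile(
    r"[\u0D00-\u0D7F\u200C\u200D]+"  # Malayalam word
    r"|\w+"                          # other words and numbers
    r"|[^\w\s]"                      # a single punctuation character
)

def tokenize(text):
    """Return the list of tokens in text, skipping whitespace."""
    return TOKEN.findall(text)

if __name__ == "__main__":
    for token in tokenize("മലയാളം GATE 2008."):
        print(token)
```

A lemmatiser would then work on the Malayalam tokens this produces; that part needs language-specific rules and is not sketched here.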

Functional Optical Character Recognition System
Add Malayalam support to the Tesseract OCR engine.


 * Study the Tesseract OCR system
 * Recognition of all characters
 * Layout recognition using OCRopus (optional?)

http://code.google.com/p/tesseract-ocr/ http://code.google.com/p/ocropus/

Write a Gnome Speech Driver for Dhvani and Integrate it with Orca

 * 1) Orca, the screen reader for visually impaired users, uses gnome-speech to talk to speech engines. Festival, eSpeak, FreeTTS and others currently have gnome-speech drivers. We need to write a driver for Dhvani.
 * 2) Develop plugins for KTTS/Gedit/Firefox

Rewrite the Dhvani sound system with SDL

 * 1) Rewrite the ALSA sound system of Dhvani with SDL to make it a cross-platform application
 * 2) Packaging for different platforms
 * 3) Bug fixes for language modules and code cleanup
 * 4) Adding pitch/volume/pause support for the generated speech

Localization of Free Content Management Systems to Malayalam: Drupal/Joomla
Complete (100%) localization of the Drupal and Joomla content management systems into Malayalam.

How to Apply
See http://code.google.com/soc/2008/faqs.html