SMC/SoC/2008: Difference between revisions

Revision as of 15:55, 26 February 2008

Participation of SMC in GSoC 2008 is not yet confirmed. Use this page to collect project ideas.

Ideas for Google Summer of Code 2008

Tokenizer/Lemmatiser for Malayalam for GATE

Write a lemmatiser for Malayalam, and see whether it can be packaged as a GATE plugin for Malayalam, which would help NLP researchers a lot. To get started, search for GATE, then download and install it; its plugins directory includes a Hindi tokenizer and lemmatiser.
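To make the tokenizer half of this idea concrete, here is a minimal sketch in Python. It is not GATE plugin code (GATE plugins are written against GATE's own Java API) and it does no lemmatisation; it only illustrates one plausible first step, splitting text into Malayalam words, Latin/digit runs and punctuation. The names `TOKEN_RE` and `tokenize` are made up for this illustration, and treating the Malayalam Unicode block (U+0D00 to U+0D7F) plus the zero-width joiners as "word characters" is an assumption, not a rule taken from this page.

```python
import re

# Assumption: a Malayalam word is a run of code points from the Malayalam
# Unicode block (U+0D00-U+0D7F), possibly containing ZWNJ/ZWJ (U+200C/U+200D).
# Latin letters and ASCII digits form their own runs; any other
# non-whitespace character becomes a single-character token.
TOKEN_RE = re.compile(r"[\u0D00-\u0D7F\u200C\u200D]+|[A-Za-z0-9]+|\S")


def tokenize(text):
    """Split text into a list of tokens, dropping whitespace."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    # "\u0D2E\u0D32\u0D2F\u0D3E\u0D33\u0D02" spells the word "Malayalam"
    # in Malayalam script.
    sample = "\u0D2E\u0D32\u0D2F\u0D3E\u0D33\u0D02 GATE, 2008."
    for token in tokenize(sample):
        print(token)
```

A real lemmatiser would need language-specific suffix and sandhi handling on top of this, which is the actual substance of the project.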

Functional Optical Character Recognition System

Add Malayalam support to Tesseract OCR. The stages and objectives are yet to be defined clearly.

Write a GNOME Speech Driver for Dhvani and Integrate It with Orca

Orca, the screen reader for visually impaired users, uses GNOME Speech to talk to speech engines. Festival, eSpeak, FreeTTS and others already have GNOME Speech drivers. We need to write one for Dhvani.

Swathantra Malayalam Corpus Phase 1

The Swathantra Malayalam Corpus project aims to build a free and open source annotated corpus, together with related APIs and programs for building different types of corpora.

Details:

An annotated image and speech corpus is needed to support FOSS-driven research and development in speech and image processing.
It should be able to serve as standard training and test data for R&D activities.
The first phase needs to produce a specification document and a clearly written manual for building the corpus, along with the tools needed to build and use it.
Anybody who would like to contribute to the project must be able to do so, and the specification should be thorough, covering classification of data, annotation of data, storage structure and all related details.
By the end of the summer, we must have a complete specification document and programs to build and access the corpus. (Building the corpus itself must be a collaborative effort and is not part of this phase.)
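As a starting point for discussing the specification, the sketch below shows what one annotated corpus entry and its storage format could look like. Everything in it is hypothetical: the `CorpusEntry` class, its field names, the two media types and the one-JSON-object-per-line storage are illustrative assumptions, not decisions recorded on this page.

```python
import json
from dataclasses import dataclass, asdict, field

# Assumption: the corpus holds only the two media types named in the idea.
ALLOWED_MEDIA_TYPES = ("image", "speech")


@dataclass
class CorpusEntry:
    """One annotated item in the corpus (hypothetical schema)."""
    entry_id: str        # unique identifier within the corpus
    media_type: str      # "image" or "speech"
    media_path: str      # path of the media file, relative to the corpus root
    transcription: str   # the text shown in the image or spoken in the audio
    category: str        # classification label, e.g. "printed" or "read-speech"
    annotator: str       # who contributed the annotation
    tags: list = field(default_factory=list)  # free-form extra labels

    def __post_init__(self):
        if self.media_type not in ALLOWED_MEDIA_TYPES:
            raise ValueError("unknown media_type: %r" % self.media_type)


def to_json_line(entry):
    """Serialise an entry as one line of JSON, keeping Malayalam text readable."""
    return json.dumps(asdict(entry), ensure_ascii=False, sort_keys=True)


def from_json_line(line):
    """Rebuild an entry from a line written by to_json_line."""
    return CorpusEntry(**json.loads(line))


if __name__ == "__main__":
    entry = CorpusEntry(
        entry_id="img-0001",
        media_type="image",
        media_path="images/img-0001.png",
        transcription="\u0D2E\u0D32\u0D2F\u0D3E\u0D33\u0D02",
        category="printed",
        annotator="example-contributor",
    )
    print(to_json_line(entry))
```

A line-oriented text format is chosen here only because it is easy for many contributors to append to and review; the actual specification might prefer XML or a database.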

@@ Line 11: / Line 11: @@
 Details:
-. Needs an annotated image and speech corpus to support the Speech and image related FOSS driven research and development.
+* Needs an annotated image and speech corpus to support the Speech and image related FOSS driven research and development.
-. It should be able to act as a standard train and test data for the R&D activities.
+* It should be able to act as a standard train and test data for the R&D activities.
-. In the first phase need to build a specification document, clearly written manual for building the corpus and should build the tools needed to build the corpus and use the corpus.
+* In the first phase need to build a specification document, clearly written manual for building the corpus and should build the tools needed to build the corpus and use the corpus.
-. Anybody who like to contribute to the project must be able to do so and the specifications should be of the best covering all the aspects on classification of data, annotation of data, structure of storage and all related details.
+* Anybody who like to contribute to the project must be able to do so and the specifications should be of the best covering all the aspects on classification of data, annotation of data, structure of storage and all related details.
-. As a part of the project, when we finish the summer, we must be able to build a complete specification document and programs to build the corpus and access the corpus(building the whole process must be a collaborative effort, it is not coming under this phase).
+* As a part of the project, when we finish the summer, we must be able to build a complete specification document and programs to build the corpus and access the corpus(building the whole process must be a collaborative effort, it is not coming under this phase).
+Please add more details that can be added to a corpora project.
 ==How to Apply ==


Contents

Ideas for Google Summer of Code 2008

Tokenizer/Lemmatiser for Malayalam for GATE

Functional Optical Character Recognition System

Write a GNOME Speech Driver for Dhvani and Integrate It with Orca

Swathantra Malayalam Corpus Phase 1

How to Apply

Selection procedure

Guidelines for Students

Guidelines for Mentors
