SMC/SoC/2008: Difference between revisions

Latest revision as of 21:37, 18 March 2008

SMC in Google Summer of Code 2007

   SMC was not selected for GSoC 2008. These projects still need to be done!

Ideas for Google Summer of Code 2008

Tokenizer/Lemmatiser for Malayalam for GATE

Write a lemmatiser for Malayalam, and investigate whether it can be packaged as a GATE plugin for Malayalam, which would help NLP researchers a lot. To get started, search for GATE, then download and install it; its plugins directory includes a Hindi tokenizer and lemmatiser.
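As a starting point for the tokenizer half of this idea, here is a minimal sketch in Python. It is not a GATE plugin (GATE plugins are written in Java) and it does no lemmatisation, which would need Malayalam morphological rules that are not covered here. The function name and the token pattern are assumptions made for illustration; the only fixed fact used is that the Malayalam Unicode block is U+0D00 to U+0D7F.

```python
import re

# Malayalam Unicode block: U+0D00..U+0D7F.
# ZWNJ (U+200C) and ZWJ (U+200D) can occur inside Malayalam words, so keep them.
MALAYALAM_WORD = r"[\u0D00-\u0D7F\u200C\u200D]+"

# Assumed token classes, tried in order:
# Malayalam word, Latin word, number, single punctuation mark.
TOKEN_RE = re.compile(MALAYALAM_WORD + r"|[A-Za-z]+|\d+|[^\s\w]")


def tokenize(text):
    """Split text into Malayalam words, Latin words, numbers and punctuation."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    for token in tokenize("മലയാളം ഒരു ഭാഷയാണ്."):
        print(token)
```

A real tokenizer would also have to decide how to treat abbreviations, dates and mixed-script words; those choices are left open in this sketch.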

Functional Optical Character Recognition system

Add Malayalam support to Tesseract OCR.

Study the Tesseract OCR system
Recognition of all characters
Layout recognition using OCRopus (optional?)

http://code.google.com/p/tesseract-ocr/

http://code.google.com/p/ocropus/

Write a Gnome Speech Driver for Dhvani and Integrate it with Orca

Orca, the screen reader for visually impaired users, uses GNOME Speech to talk to speech engines. Festival, eSpeak, FreeTTS and others currently have GNOME Speech drivers. We need to write a driver for Dhvani.
Develop plugins for KTTS/Gedit/Firefox

Write a Dhvani Interface for Speech Dispatcher

The goal of the Speech Dispatcher project is to provide a high-level, device-independent layer for speech synthesis through a simple, stable and well-documented interface. Since Speech Dispatcher (SD) is widely discussed as a unified TTS layer for both GNOME and KDE, we can try to write a Dhvani interface for it.

Rewrite the Dhvani sound system with SDL and Additional APIs

Rewrite the ALSA sound system of Dhvani with SDL to make it a cross-platform application
Packaging for different platforms
Bug fixes for language modules and code clean-up
Adding pitch/volume/pause support for the generated speech
API to stop speech in the middle of synthesis
Provide Dhvani as a library
API to check whether the synthesizer is producing speech (isSpeaking)
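The stop and isSpeaking items above could look roughly like the following sketch. It is written in Python only to show the control flow; a real Dhvani library would more likely expose this in C, and every name here (SpeechSession, speak, stop, is_speaking, chunks_played) is hypothetical, not Dhvani's actual API. Synthesis and playback are replaced by a short sleep per word.

```python
import threading
import time


class SpeechSession:
    """Hypothetical wrapper showing stop/is_speaking; not Dhvani's real API."""

    def __init__(self, chunk_seconds=0.01):
        self._chunk_seconds = chunk_seconds
        self._stop_requested = threading.Event()
        self._worker = None
        self.chunks_played = 0

    def speak(self, text):
        """Start speaking `text` in the background and return immediately."""
        self.stop()  # allow only one utterance at a time
        self._stop_requested.clear()
        self.chunks_played = 0
        self._worker = threading.Thread(
            target=self._run, args=(text.split(),), daemon=True
        )
        self._worker.start()

    def _run(self, chunks):
        for _chunk in chunks:
            # Check the stop flag between chunks so speech can be cut short.
            if self._stop_requested.is_set():
                return
            time.sleep(self._chunk_seconds)  # stand-in for synthesis + playback
            self.chunks_played += 1

    def is_speaking(self):
        """Return True while the background utterance is still running."""
        return self._worker is not None and self._worker.is_alive()

    def stop(self):
        """Request a stop and wait for the current chunk to finish."""
        self._stop_requested.set()
        if self._worker is not None:
            self._worker.join()
            self._worker = None


if __name__ == "__main__":
    session = SpeechSession(chunk_seconds=0.05)
    session.speak("this is a long sentence that will be interrupted")
    print(session.is_speaking())
    session.stop()
    print(session.is_speaking())
```

Checking the stop flag once per chunk keeps the design simple; how quickly speech actually stops then depends on how small a chunk the synthesizer can produce.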

Localization of Free Content Management Systems to Malayalam: Drupal & Joomla

100% localization of the Drupal and Joomla content management systems into Malayalam

Speech recognition system for Malayalam

The aim is to develop a speech recognition system for Malayalam using the concepts of the memory prediction framework. The memory prediction framework, put forward by Jeff Hawkins in his book 'On Intelligence' (2004), is a theory of brain function based on the hierarchical organization of the human neocortex. It explains how the hierarchical structure enables the brain to match sensory inputs to stored memory patterns in order to predict future input sequences. According to this model, the neocortex has a layered structure in which different layers store constructs of varying complexity, with sensory inputs arriving at the lowest layer. In vision, for example, the lowest layer receives retinal signals, and layers further up the hierarchy associate themselves with meaningful constructs such as lines and two-dimensional figures and, further up, specific objects such as faces. In speech, the layers store different speech constructs, from phonemes and syllables to phrases and sentences. Human speech perception and recognition can be understood in terms of this hierarchical organization.

If we mimic the way in which the human brain recognizes speech, the resulting system is expected to be more robust than existing systems. The proposed system is trained with a carefully compiled database, and different speech constructs are stored in different layers. When a speech segment to be recognized is given, a series of predictions starts and signals are passed up and down the layers until the most probable speech construct is arrived at. For example, if the most probable candidate for the first word is 'how', predictions start as to what the succeeding words can be. This continues until the last word is arrived at, and the phrase with the maximum probability is chosen from among these predictions.
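The word-by-word prediction described above can be illustrated with a toy example. The sketch below is a stand-in, not the memory prediction framework or Hierarchical Temporal Memory: it uses a plain beam search over a bigram word model. The candidate words, their scores and the bigram probabilities are all made up for illustration.

```python
# Made-up per-position candidates with acoustic-match scores.
EXAMPLE_CANDIDATES = [
    {"how": 0.6, "now": 0.4},
    {"are": 0.5, "or": 0.5},
    {"you": 0.7, "ewe": 0.3},
]

# Made-up bigram probabilities P(next word | previous word); "<s>" marks the start.
EXAMPLE_BIGRAM = {
    ("<s>", "how"): 0.5, ("<s>", "now"): 0.5,
    ("how", "are"): 0.6, ("how", "or"): 0.1,
    ("now", "are"): 0.2, ("now", "or"): 0.2,
    ("are", "you"): 0.8, ("are", "ewe"): 0.01,
    ("or", "you"): 0.3, ("or", "ewe"): 0.05,
}


def most_probable_phrase(candidates, bigram, beam_width=2, unseen=1e-6):
    """Extend partial phrases one word at a time, keeping the best few."""
    beam = [(1.0, ["<s>"])]
    for position in candidates:
        extended = []
        for score, words in beam:
            for word, acoustic in position.items():
                # Prediction step: how likely is this word after the previous one?
                transition = bigram.get((words[-1], word), unseen)
                extended.append((score * acoustic * transition, words + [word]))
        extended.sort(key=lambda item: item[0], reverse=True)
        beam = extended[:beam_width]
    best_score, best_words = beam[0]
    return best_words[1:], best_score  # drop the "<s>" start marker


if __name__ == "__main__":
    phrase, score = most_probable_phrase(EXAMPLE_CANDIDATES, EXAMPLE_BIGRAM)
    print(" ".join(phrase), score)
```

A hierarchical system would run this kind of prediction at several levels at once (phonemes, syllables, words, phrases) and pass signals between them; the sketch covers only the word level.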

References:

"On Intelligence", Jeff Hawkins, Sandra Blakeslee; Henry Holt, 2004
"Hierarchical Temporal Memory - Concepts, Theory, and Terminology" by Jeff Hawkins and Dileep George, Numenta Inc.
http://www.phillylac.org/prediction

Creating a new family of Equal Height Fonts (EHF) for the Malayalam language

The goal is to design and create a new family of Equal Height Fonts for the traditional Malayalam script. Following Roman typography, serif and sans-serif font variations are available in Malayalam. Equal-width fonts such as Courier, which exist in Roman typography, are not possible for Malayalam characters, nor are they necessary. The proposed Equal Height Fonts are a new concept in font making, intended to overcome the typographical challenge of vertically stacked conjuncts.

How to Apply

Selection procedure

http://code.google.com/soc/2008/faqs.html

Guidelines for Students

How to write applications for KDE Google Summer Of Code? - most of the tips are applicable to all projects.

Guidelines for Mentors

Summer of Code Mentoring HOWTO

@@ Line 1: / Line 1: @@
-Participation of SMC in GSOC 2008 is not confirmed. Use this page for collecting the Project Ideas
+[[SMC/SoC/2007|SMC in Google Summer of Code 2007]]
+    SMC is not selected for GSOC 2008. Anyway these projects need to be done!
 ==Ideas for Google Summer of Code 2008==
 ===Tokenizer/Lemmatiser for malayalam for GATE===
-Write a Lemmatiser for Malayalam. See whether we can do a  plugin for GATE for malayalam, that would help NLP reasearchers a lot and that would be a great idea. IGoogle search GATE,download and install GATE , and in the plugins directory a hindi tokenizer and lemmatiser is available.
+Write a Lemmatiser for Malayalam. See whether we can do a  plugin for GATE for malayalam, that would help NLP reasearchers a lot and that would be a great idea. Google search GATE,download and install GATE , and in the plugins directory a hindi tokenizer and lemmatiser is available.
 === Functional Optical character Recognition system===
-Add malayalam Support for tesseract OCR . Stages and objectives to be defined clearly
+Add malayalam Support for tesseract OCR.
+* Study tesseract OCR system
+* Recogntion of all characters
+* Layout recogization using ocropus (optional ?)
+http://code.google.com/p/tesseract-ocr/
+http://code.google.com/p/ocropus/
 === Write a Gnome Speech Driver for Dhvani and Integrate it with Orca ===
-Orca for visually impaired users uses gnome speech for speech engines. Currently Festival, Espeak, freetts etc have drivers for gnome speech. We need to write a driver for dhvani.
+#Orca for visually impaired users uses gnome speech for speech engines. Currently Festival, Espeak, freetts etc have drivers for gnome speech. We need to write a driver for dhvani.
-=== Swathantra Malayalam Corpus Phase 1===
+#Develop plugins for KTTS/Gedit/Firefox
-The whole swathantra malayalam corpus is aimed at building a Free and Open source annotated corpus,related APIs, programs to build different types of corpus etc.
+=== Write a Dhvani Interface for Speech Dispatcher  ===
+The goal of [http://www.freebsoft.org/speechd Speech Dispatcher] project is to provide a high-level device independent layer for speech synthesis through a simple, stable and well documented interface. Since SD is more discussed to act as a unified TTS layer  for both gnome and KDE, We can try to write a  Interface for that
-Details:
+===Rewrite the Dhvani sound system with SDL and Additional APIs===
-* Needs an annotated image and speech corpus to support the Speech and image related FOSS driven research and development.
+#Rewrite the ALSA sound system of [[Dhvani|Dhvani]] with [http://www.libsdl.org/ SDL] to make it a cross platform application
-* It should be able to act as a standard train and test data for the R&D activities.
+#Packaging for different platforms
-* In the first phase need to build a specification document, clearly written manual for building the corpus and should build the tools needed to build the corpus and use the corpus.
+#Bug fixes for langauge modules and Code clean up
-* Anybody who like to contribute to the project must be able to do so and the specifications should be of the best covering all the aspects on classification of data, annotation of data, structure of storage and all related details.
+#Adding pitch/volume/pause support for the generated speech
-* As a part of the project, when we finish the summer, we must be able to build a complete specification document(document, explanations,related presentations, demo files etc.) and programs to build the corpus and access the corpus(building the whole process must be a collaborative effort, it is not coming under this phase).
+#API to stop the speech in between a synthesis
-* More importantly, the structure should be an extensible one for all indic languages.
+#Provide Dhvani as a library
+#API to check whether the synthesizer is producing speech(isSpeaking)
+===Localization of Free Content Management Systems to Malayalam-Drupal &Joomla ===
+100% localization of Drupal and Joomla CMS systems to Malayalam
+===Speech recognition system for Malayalam===
+The aim is to develop a speech recognition system for Malayalam using the concepts of memory prediction framework. Memory prediction framework put forward by Jeff Hawkins in his book 'On Intelligence'(2004) is a theory of brain function, based on  the hierarchical organization of human neocortex.It explains how the hierarchical structure enables brain to match sensory inputs to the stored memory patterns for predicting the future input sequences. According to this model, neocortex has a layered structure with different layers storing constructs of varying complexity, with sensory inputs coming to the lowest layer. For example in case of vision, the lower layer receives retinal signals and layers up the hierarchy associates themselves with meaningful constructs like lines, two dimensional figures, and furthur up specific objects like faces etc. In speech the layers store different speech constructs from phonemes and syllables to phrases and sentences. The human speech perception and recognition can be understood using  this hierarchical organization.
+If we mimic the way in which human brain recognizes speech, the resulting system will be more robust than the existing systems. The proposed system is  trained with a carefully compiled database and different speech constructs are stored in different layers.When a speech segment to be recognized is given, a series of predictions start and signals will be passed upwards and downwards the layers, until the most probable speech construct is arrived at. For example if the most probable candidate for first word is 'how', predictions start as to what succeeding words can be. This continues until the last word is arrived at and the phrase giving maximum probability will chosen among these predictions.
+References:
+#"On Intelligence", Jeff Hawkins, Sandra Blakeslee; Henry Holt, 2004
+#"Hierarchical Temporal Memory - Concepts, Theory, and Terminology" by Jeff Hawkins and Dileep George, Numenta Inc.
+#http://www.phillylac.org/prediction
+===Creating a new family of Equal Height Fonts (EHF)for Malayalam language===
+To design and create a new family of Equal Height Fonts for the traditional Malayalam
+script. Following Roman typology, serif and sans serif type of font variations are available in
+Malayalam. Equal Width Fonts, such as Courier, available in Roman typography are
+impossible for Malayalam characters and this is unnecessary. The proposed Equal Height
+Fonts is a new concept in the history of font making to surmount the typographical
+challenge of vertically stacked conjuncts.
-Please add more details that can be added to a corpora project.
 ==How to Apply ==
-see http://code.google.com/soc/2008/faqs.html
+#see http://code.google.com/soc/2008/faqs.html
+#[http://wiki.debian.org/SummerOfCode2008/StudentApplicationTemplate Student Application Template]
 ==Selection procedure ==
+http://code.google.com/soc/2008/faqs.html
 ==Guidelines for Students ==
+[http://pradeepto.livejournal.com/12565.html How to write applications for KDE Google Summer Of Code?] - most of the tips applicable to all projects.
 ==Guidelines for Mentors ==
+[http://www.gnome.org/~federico/docs/summer-of-code-mentoring-howto/index.html Summer of Code Mentoring HOWTO]
