Wednesday, October 7, 2009

Voice Recognition Options

There appear to be a few interesting voice recognition options available for robot control. Initial testing suggested that all of the recognition systems available for Linux were just about useless, but I then discovered that my microphone was picking up significant interference, so those first results may not be a fair measure of the software.

  • Simon
    Simon looks fairly easy to get working by itself, but it looks more difficult to integrate into a robotics framework.

  • Julius
    Julius is designed for Japanese, though basic English support is available.

  • PocketSphinx
    This voice recognition system is based on CMU Sphinx2, but it has been optimized for speed and for small devices.

  • Voxforge
    This is an open source project to develop acoustic models for speech recognition based on user-contributed audio files. Several acoustic models are available for download.
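
Since the microphone interference mentioned above can make any recognizer look bad, a quick sanity check on the recording itself may be worth doing before judging the software. The following is only a minimal sketch, assuming a 16-bit PCM WAV file; the function names are made up for illustration, and it uses nothing beyond the Python standard library. It reports the overall RMS level of a recording in dB relative to full scale (dBFS).

```python
import array
import math
import sys
import wave


def rms_dbfs(samples):
    """Return the RMS level of 16-bit samples in dB relative to full scale."""
    if len(samples) == 0:
        return float("-inf")
    mean_square = sum(s * s for s in samples) / len(samples)
    if mean_square == 0:
        return float("-inf")  # digital silence
    return 20 * math.log10(math.sqrt(mean_square) / 32768.0)


def wav_rms_dbfs(path):
    """Read a 16-bit PCM WAV file and return its overall RMS level in dBFS.

    Assumption: the file is uncompressed 16-bit PCM. For multi-channel
    files the interleaved samples of all channels are measured together.
    """
    with wave.open(path, "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        frames = wav.readframes(wav.getnframes())
    samples = array.array("h")
    samples.frombytes(frames)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV sample data is little-endian
    return rms_dbfs(samples)
```

One way to use it would be to record a few seconds of "silence" and a few seconds of speech with the same microphone, then compare the two levels; the smaller the gap between them, the more the interference is likely to hurt recognition. No particular threshold is implied here.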


Luis Uebel said...

Automatic speech recognition (ASR) is not only an option for robotics, but also for any type of control system. ASR provides speech-to-text functions and it is a natural interface for humans.
In the 1950s, speech recognition didn't work, and keyboards were adopted to overcome this problem.
The problem with ASR is that you need very expensive resources (a large speech database, text databases for language modeling), and no one will give you these for free. Look in the LDC catalog to see how much a good speech database costs. Julius, PocketSphinx, and Sphinx 4 work very well. I build systems with them, but you need a good speech database for good recognition results. Acoustic models found on the Internet are trained on very little speech data.
I build speech recognition applications for robotics in Brazilian Portuguese and American English. My speech database for Brazilian Portuguese has 800 speakers and over 700 hours of speech data.
My company builds robots with speech recognition, and they work great. Look at ASRLabs dot com for further comments on speech recognition for robotic applications.


I Heart Robotics said...

That is why I am excited about the Voxforge project. I think if enough people submit their voices to it, we will get much better open source speech tools.

Luis Uebel said...

The Voxforge project is quite a good project, but there are a few problems:
1. You need to find enough people to record their voices;
2. Someone needs to validate all this data;
3. All speakers need to read phonetically balanced sentences;
4. The quality of the recordings can be a problem (mic, background noise, etc.).

Finding the right people to record a speech database, validating all this data, and selecting phonetically balanced sentences are the main reasons for the high price of a good speech database.

ASR Labs