Speereo Speech Recognition Technology

A short general technical presentation of Continuous Speech Recognition Technology by Speereo Software.

Speereo Speech Recognition Technology: Presentation Transcript

  • Speereo Speech Engine
  • Speereo History
    • Founded in 1998.
    • 2002: mobile applications with Speereo Speech Engine developed.
    • 2002-2011: annual software awards.
    • 2011: Russian language added.
    • 2011: Skolkovo Fund resident. SK/Microsoft grant.
    • 2012: Concepts development.
    • 2013: Kamaz/Intel project, Voice Remote prototype.
  • Speereo Concepts: Automotive, Aerospace, Voice Remote, Voice Admiral
  • Speereo Software: Voice Translator, Voice Launcher, Voice Reader, Voice Browser
  • Speereo Speech Recognition
    • Over 10 years of research in speech technology.
    • Speaker independent.
    • High recognition accuracy (up to 99%).
    • Robust to various noises (up to 98% accuracy in car noise).
    • Multilanguage support (English, Russian).
    • Compact size.
    • Embedded solutions (ARM, SHx, MIPS).
  • SSE: General Structure (diagram labels: Noise, Speech, Recognition result)
  • Initial Processing
    • Feature system with 41 coefficients
    • Analysis of the surroundings
    • Special algorithms detect the microphone type and eliminate channel distortions
    • Special algorithms assure stable system operation within a vehicle
  • Decoder
    • Continuous hidden Markov models (higher accuracy)
    • Discrete hidden Markov models (higher speed)
    • 63 models with 2446 components for the English language
    • Model parameters are estimated statistically
    • A highly optimized decoder algorithm allows real-time operation
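The decoder slide names discrete hidden Markov models but does not show the decoding step. As a minimal sketch of the standard technique (Viterbi decoding over a discrete HMM), and not Speereo's optimized decoder, the following C function finds the most probable state sequence for a short observation sequence. The `Hmm` struct, the fixed sizes, and the function name `viterbi` are assumptions made for illustration; no model parameters come from the presentation. Plain probabilities are multiplied directly, which is only safe for short sequences; a production decoder would typically work with log probabilities to avoid underflow.

```c
#include <assert.h>
#include <stddef.h>

#define N_STATES  2   /* illustrative size, not a Speereo value */
#define N_SYMBOLS 3   /* illustrative size, not a Speereo value */
#define MAX_T     16  /* longest observation sequence this sketch accepts */

/* Toy discrete HMM (hypothetical layout, for illustration only). */
typedef struct {
    double start[N_STATES];            /* P(first state = s)          */
    double trans[N_STATES][N_STATES];  /* P(next state = j | state i) */
    double emit[N_STATES][N_SYMBOLS];  /* P(symbol k | state s)       */
} Hmm;

/*
 * Viterbi decoding: writes the most probable state sequence into path[0..len-1]
 * and returns the joint probability of that sequence and the observations.
 */
double viterbi(const Hmm *m, const int *obs, size_t len, int *path)
{
    double score[MAX_T][N_STATES];  /* best path probability ending in state s at time t */
    int back[MAX_T][N_STATES];      /* predecessor state on that best path */

    assert(len > 0 && len <= MAX_T);

    /* Initialization: start probability times emission of the first symbol. */
    for (int s = 0; s < N_STATES; s++) {
        score[0][s] = m->start[s] * m->emit[s][obs[0]];
        back[0][s] = 0;
    }

    /* Recursion: extend the best path into each state, one frame at a time. */
    for (size_t t = 1; t < len; t++) {
        for (int s = 0; s < N_STATES; s++) {
            int best_prev = 0;
            double best = score[t - 1][0] * m->trans[0][s];
            for (int p = 1; p < N_STATES; p++) {
                double cand = score[t - 1][p] * m->trans[p][s];
                if (cand > best) {
                    best = cand;
                    best_prev = p;
                }
            }
            score[t][s] = best * m->emit[s][obs[t]];
            back[t][s] = best_prev;
        }
    }

    /* Termination: pick the best final state. */
    int last = 0;
    for (int s = 1; s < N_STATES; s++) {
        if (score[len - 1][s] > score[len - 1][last]) {
            last = s;
        }
    }
    double best_prob = score[len - 1][last];

    /* Backtrace: follow the stored predecessors from the end to the start. */
    for (size_t t = len; t-- > 0; ) {
        path[t] = last;
        last = back[t][last];
    }
    return best_prob;
}
```

In a phrase recognizer of the kind described here, one such score would be computed per candidate phrase model and the highest-scoring phrase reported; that outer loop is omitted from the sketch.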
  • From lab settings to reality
    Real-world use imposes its own conditions on speech recognition systems:
    • Office noises
    • Mobile devices require higher noise robustness (they can be used outside)
    • Car noises
    • Speech recognition with other voices present
    • Far-field recognition
  • Cloud Services
    Cloud solutions offer value and impose limitations:
    • Ability to use large processing resources
    • Easy integration for device manufacturers
    • Dependence on a server connection (traffic costs, not always available)
    • Latency of voice message transfer is noticeable to the user
  • Speereo Recognition on a Client
    • Highly effective algorithms (enhanced performance, longer autonomous operation time for devices)
    • Cross-platform solution supporting a large number of processors (ARM, SHx, Atom, etc.)
    • Relatively small memory consumption (5-10 MB)
  • Speech Engine Requirements
    • High recognition level (over 98%)
    • Speaker independence: recognizes any speaker (no specific voice training)
    • Large vocabulary that can be enlarged and enriched on the fly
    • Noise robustness for car noises, street noises, etc.
    • Robustness to accents and pronunciation
    • Effective processing system
  • Speereo: Accuracy
    Test 1: Long phrase recognition
    Test conditions: 600 phrases. Language: English. Recognition accuracy: 99.9%.
    The test used long phrases taken from the TIMIT testing database. The Speereo system was loaded with all 600 phrases in text form (txt) taken from the TEST corpus set. The system was then fed the acoustic recordings from the TEST set and returned recognition results: using the decoder algorithm, it found the most similar phrase in the phrase list uploaded earlier.
    Result: the TEST corpus consists of 1680 acoustic recordings in total; 1679 were recognized correctly and 1 mistake was made (99.94% success).
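The 99.94% figure in Test 1 follows directly from the reported counts (1679 correct out of 1680 recordings). A minimal sketch of that arithmetic in C, where the function name `accuracy_percent` is an assumption introduced for illustration:

```c
#include <assert.h>

/* Recognition accuracy in percent: correct / total * 100. */
double accuracy_percent(unsigned correct, unsigned total)
{
    assert(total > 0 && correct <= total);
    return 100.0 * (double)correct / (double)total;
}
```

Calling `accuracy_percent(1679, 1680)` gives a value just above 99.94, which the slide rounds to 99.9% in the test conditions and 99.94% in the result line.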
  • Speereo: Accuracy
    Test 2: Short word recognition
    Test conditions: numerical vocabulary (including muttered and unclearly pronounced words), 11 unique words.
    • English: accuracy 99.2%
    • Russian: accuracy 98.5%
  • Speereo: Noise Robustness
    Test 3: dependence of accuracy on surrounding noise
    SNR (dB)       0     5     10    15    20    >50
    Accuracy (%)   98.2  98.4  98.3  98.6  98.7  99.2
    The base test sample consisted of about 2000 pronunciations (numerical vocabulary database, English language) mixed with noise. An air conditioner sound was used as the noise.
  • Speereo: Recognition in a Car
    Test 4: long phrases in a noisy environment
    Test conditions: identical to Test 1. 600 phrases. Noise: moving car with windows rolled down. Language: English. Accuracy: 97.6%.
    Specially developed algorithms ensure high recognition accuracy in a moving vehicle.
  • Speereo: HW Requirements
    • Small size. Minimal memory requirement: 1-2 MB
    • Speereo Speech Recognition operates on processors (chips) from 100 MIPS
    • Wide range of supported chips: SHx, TMPR39XX, NEC VR4122, MIPS, ARM, x86, etc.
  • Speereo: HW vs. Performance
    The Speereo system can be tuned to use different amounts of memory. Specific numbers depend on parameters such as vocabulary size and the models applied. Examples:
    Vocabulary size   RAM      ROM
    50-100 words      512 kB   256 kB
    1 000 words       1 MB     2 MB
    5 000 words       1.5 MB   2 MB
    50 000 words      2 MB     5 MB
  • Speereo: Integration Tools
    • Simple, easy-to-understand development tools, usable by non-specialists in speech recognition.
    • Application scalability.
    • Ability to use the technology on various platforms or on devices without a platform.
  • Speereo Speech Engine: WinCE (screenshot labels: list of speech commands; speech commands pronounced by user)
  • Speereo Speech Engine: Use
    Operation of the Speech Engine (SE) can be divided into 2 major stages:
    1. The application defines the operating mode of SE and, if necessary, sends the list of speech commands to SE.
    2. The user pronounces a phrase (command); SE determines the most probable phrase from the list of received speech commands and sends its ID to the application.
    The developer does not need to track the moment a phrase is pronounced. All that is needed is to process the Speereo Speech Engine message that contains the ID of the command pronounced by the user.
  • Speereo: Recognition Models
    Two models are available today:
    1. Recognition of phrases made of words known to SE and included in the vocabulary.
    2. Recognition of phrases containing words unknown to SE (mostly personal names, etc.). In this case the unknown words are transcribed automatically.
  • Speereo Engine: Initialization
    To use the speech interface in a program, the developer must register that program with Speereo Speech Engine by calling the AddRegisterApplication function. The function prototype is as follows:
    UINT AddRegisterApplication (HWND hWnd)
    where hWnd is the handle of the developer's application window that receives the messages from SE.
  • SE: Commands List Creation
    The command list is built by calling the AddPhrase function once for each speech command.
    void AddPhrase (LPCTSTR pszText, DWORD dwId)
    where pszText is the speech command in orthographic form, and dwId is the integer identifier of the speech command that SE returns when the command is pronounced.
  • Commands Definition Sample
    AddPhrase (_T("Open Window"), ID_OPEN_WINDOW);
    AddPhrase (_T("Close Window"), ID_CLOSE_WINDOW);
    In this example, two speech commands ("Open Window" and "Close Window") are passed to SE with the identifiers ID_OPEN_WINDOW and ID_CLOSE_WINDOW respectively.
  • Response Receipt from SE
    The WM_SRT_ACCEPTHYPO message passes the identifier of the recognized speech command as the wParam parameter. SE sends the message to the application window whose hWnd was passed to AddRegisterApplication. Example:
    case WM_SRT_ACCEPTHYPO:
        MakeHypo (wParam);
        return TRUE;
    Here MakeHypo is the developer's own function that implements the behavior of each speech command.
  • SE: Simplicity
    Implementing the Speereo voice interface takes three steps:
    1. Initialize Speereo Speech Engine.
    2. Define a list of voice commands.
    3. Define the application's reaction to each recognized voice command.
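The three steps can be sketched end to end without the real engine. The following portable C mock is an illustration only: the Win32 type names are replaced by simple stand-in typedefs, the bodies of `AddRegisterApplication` and `AddPhrase` are stubs written for this example (only their prototypes come from the slides), the numeric value of `WM_SRT_ACCEPTHYPO` is a placeholder, the meaning of the `UINT` return value is not stated in the presentation, and `WindowProc`, `SimulateUtterance`, and the `g_` variables are hypothetical helpers. The "recognition" is an exact string match, standing in for the actual speech decoder. Real Windows code would also wrap the string literals in `_T()` as on the slides.

```c
#include <assert.h>
#include <string.h>

/* Portable stand-ins for the Win32 types named on the slides. */
typedef void *HWND;
typedef unsigned int UINT;
typedef unsigned long DWORD;
typedef const char *LPCTSTR;

/* Placeholder value; the real one would come from the Speereo headers. */
#define WM_SRT_ACCEPTHYPO 0x8001u

enum { ID_OPEN_WINDOW = 1, ID_CLOSE_WINDOW = 2 };

/* ---- Mock engine side: stubs for illustration, NOT the Speereo code ---- */
#define MAX_PHRASES 16
static struct { LPCTSTR text; DWORD id; } g_phrases[MAX_PHRASES];
static int g_phrase_count;
static int g_registered;

/* Step 1: register the application window (return value meaning is assumed). */
UINT AddRegisterApplication(HWND hWnd)
{
    (void)hWnd;          /* the mock has no real window to remember */
    g_registered = 1;
    return 0;
}

/* Step 2: add one speech command with its integer identifier. */
void AddPhrase(LPCTSTR pszText, DWORD dwId)
{
    assert(g_phrase_count < MAX_PHRASES);
    g_phrases[g_phrase_count].text = pszText;
    g_phrases[g_phrase_count].id = dwId;
    g_phrase_count++;
}

/* ---- Application side ---- */
static DWORD g_last_command;   /* records what MakeHypo received */

/* Developer-defined reaction to a recognized command. */
static void MakeHypo(DWORD id)
{
    g_last_command = id;
}

/* Step 3: simplified message handler; returns 1 if the message was handled. */
int WindowProc(UINT msg, DWORD wParam)
{
    switch (msg) {
    case WM_SRT_ACCEPTHYPO:
        MakeHypo(wParam);
        return 1;
    default:
        return 0;
    }
}

/* Hypothetical helper: pretends the engine recognized `spoken` and notifies the app. */
int SimulateUtterance(LPCTSTR spoken)
{
    if (!g_registered) {
        return 0;
    }
    for (int i = 0; i < g_phrase_count; i++) {
        if (strcmp(g_phrases[i].text, spoken) == 0) {
            return WindowProc(WM_SRT_ACCEPTHYPO, g_phrases[i].id);
        }
    }
    return 0;   /* not in the command list: no message is sent */
}
```

A typical sequence would be `AddRegisterApplication(hWnd)`, then one `AddPhrase` call per command, after which `SimulateUtterance("Close Window")` ends up in `MakeHypo` with `ID_CLOSE_WINDOW`. With the real engine, the last call is replaced by the user speaking and SE posting the message itself.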
  • SE: Additional Features
    1. Microphone and speaker controls.
    2. Ability to interact with several applications simultaneously.
    3. Ability to record sound and voice signals via the microphone, with real-time compression.
    4. Ability to play sound and voice signals to the user through the speaker.
    5. Speech signal detector selection (continuous monitoring of the speech signal, or recognition launched on a key press).
  • TTS: Speech Synthesizer Types (diagram labels: Whole-word TTS: Text, Speech; Phonemic TTS: Text, Phones DB, Prosody, Phones, Speech)
  • TTS Resource Requirements
    • Whole-word TTS: predefined vocabulary (up to 2-3 thousand words) fixed at the system development stage. CPU from 40 MIPS, RAM from 0.5 MB. Requires a narrator to pronounce all the words in the vocabulary.
    • Phonemic TTS: large dictionaries may be used (over 100 thousand words). CPU from 80 MIPS, RAM from 2 MB. Does not require dictionary-specific setup.
  • TTS Resource Requirements (continued)
    • Whole-word TTS: can be used for any language. A narrator is needed to create the word database. Development time is 1-2 weeks, depending on the dictionaries.
    • Phonemic TTS: English, Russian, French and Spanish are currently supported. German and Italian are under development. Developing a new language takes 3 months.
  • Conclusion
    Speereo Speech Engine for embedded devices:
    • Speaker-independent voice recognition: from 100 MIPS, from 1 MB of memory;
    • Speech synthesizer: from 80 MIPS, from 2 MB of memory;
    • Speech compression: from 40 MIPS, from 200 KB of memory.
  • Team
    • Oleg Maleev, PhD, CTO: R&D
    • Daniil Ishchenko, CMO: Business Development
    • Konstantin Lamin, CEO: Ideology, General Leadership
  • Contact
    • Konstantin V. Lamin, lamin@speereo.com
    • Oleg G. Maleev, maleev@speereo.com
    • Daniil O. Ishchenko, D_ischenko@speereo.com