Kyiv 2016
The first testing festival in Ukraine
Introduction to Speech Recognition Software Testing
Roman Gorin
About me
• Senior Technical Leader – Testing @ Delphi LLC http://udelphi.com
• 12+ years in Speech Recognition Testing
• 6+ years as QA Team Lead
• Main Product: Nuance Dragon Medical http://www.nuance.com/for-healthcare/dragon-medical
• https://telegram.me/DJ_ZX
• Facebook: rgorin.zx
What it is
Where used
• Nuance Dragon Family
• Dragon Pro
• Dragon Medical
• Dragon for Mac
• Dragon Anywhere
• Etc
Windows Speech Recognition
Google Voice Search
Where used
Personal assistants
• Siri
• Cortana
• Google Now
• Facebook M, etc
Car systems
Where used
Smart Home assistants
• Amazon Echo
• Google Home
• Zenbo
• Homer, etc.
• Automated call center software
and more
Where used: ViV AI (unreleased)
Basic Principles
• Capture audio
• Separate speech from other types of sounds (esp. noise)
• Compare speech audio against known text<->audio match patterns
• Analyze the result using a language-specific model
• Perform actions (type text, execute a command) based on the collected data
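The "separate speech from other types of sounds" step can be illustrated with a very rough energy-based detector. This is only a minimal sketch, not how any product named in these slides works: the function names, the frame size, and the threshold are all assumptions chosen for the example, and a synthetic tone stands in for real speech.

```python
import math

def frame_energies(samples, frame_size):
    """Mean squared amplitude of each full, non-overlapping frame."""
    return [
        sum(s * s for s in samples[i:i + frame_size]) / frame_size
        for i in range(0, len(samples) - frame_size + 1, frame_size)
    ]

def detect_speech(samples, frame_size=160, threshold=0.01):
    """One boolean per frame: True where the frame energy exceeds the threshold.

    frame_size and threshold are illustrative assumptions, not tuned values.
    """
    return [e > threshold for e in frame_energies(samples, frame_size)]

# Synthetic input: quiet low-frequency hum, then a louder tone standing in for speech.
rate = 16000
quiet = [0.01 * math.sin(2 * math.pi * 50 * n / rate) for n in range(320)]
loud = [0.5 * math.sin(2 * math.pi * 440 * n / rate) for n in range(320)]

flags = detect_speech(quiet + loud)
print(flags)  # prints [False, False, True, True]
```

From a testing point of view, the threshold is the interesting part: background noise, microphone gain, and quiet speakers all push frames across it in the wrong direction.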
Generic structure of how SR works
Main speech recognition models (based on Wiki)
• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
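Of the models listed, dynamic time warping is the easiest to show in a few lines. The sketch below is the textbook DTW distance on one-dimensional sequences; real recognizers compare multi-dimensional feature vectors, and the sample sequences here are invented for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]

# Hypothetical "template" pattern, the same pattern spoken twice as slowly,
# and an unrelated pattern of the same length.
template = [1.0, 2.0, 3.0, 2.0, 1.0]
slow = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0, 2.0, 2.0, 1.0, 1.0]
other = [3.0, 1.0, 3.0, 1.0, 3.0]

print(dtw_distance(template, slow))   # prints 0.0
print(dtw_distance(template, other))  # larger than 0
```

The slowed-down pattern aligns perfectly with the template, which is why DTW tolerates differences in speaking speed.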
Testing areas
• Engine and Language Modelling (usually on recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and Text Editing
• Language specific
etc
Testing areas: Hardware
• Mobile HW
• Internal mic (notebooks/tablets)
• Noise cancelling mic
• Sound card and drivers compatibility
• System Requirements compliance
• HW Dependency
• Driver dependency (WASAPI, DirectSound, ASIO, Kernel Streaming on Windows; ALSA, PulseAudio on Linux; Core Audio on Mac)
Testing areas: Hardware
• Mics and recorders (samples from nuance.com store)
• Special bundled HW for professionals
• Nuance PowerMic
• Philips SpeechMike
Testing areas: Deployment
• Platform
• Client OS (Desktop/Mobile)
• Server OS for Client app
• Server OS for Cloud/Remote app
• Azure Cloud
• Amazon Cloud
• Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri)
• Support for virtualization platforms: VDI and App Virtualization (standalone recognition on remote access)
• Citrix XenApp and XenDesktop/Thin and Thick clients
• VMware Workstation and Horizon
• Oracle VirtualBox
• Microsoft Remote Desktop/Terminal Services
Testing areas: Adaptation
• Predefined language patterns
• Statistical models
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability P(w_1, …, w_m) to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
• “Part of speech” detection
• Sound specific patterns
• Person-specific
• How a person pronounces words and sounds
• How a person constructs sentences
• Pronunciation speed
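The statistical language model described on this slide can be made concrete with a tiny bigram model. This is a sketch under stated assumptions: the three training sentences are invented, add-one smoothing is just one simple choice, and the function names are hypothetical. It approximates P(w_1, …, w_m) as a product of smoothed P(w_i | w_{i-1}) terms.

```python
from collections import Counter

def train_bigram(sentences):
    """Count context words and word pairs, with <s> and </s> as sentence markers."""
    contexts, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        contexts.update(tokens[:-1])
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams

def sentence_probability(sentence, contexts, bigrams, vocab_size):
    """Approximate P(w_1, ..., w_m) as a product of add-one smoothed bigram terms."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        p *= (bigrams[(prev, word)] + 1) / (contexts[prev] + vocab_size)
    return p

# Invented toy corpus standing in for adaptation data.
corpus = ["please recognize speech", "recognize speech now", "a nice beach"]
contexts, bigrams = train_bigram(corpus)
vocab_size = len({word for _, word in bigrams})  # distinct words seen after a context

p_likely = sentence_probability("recognize speech", contexts, bigrams, vocab_size)
p_unlikely = sentence_probability("wreck a nice beach", contexts, bigrams, vocab_size)
print(p_likely > p_unlikely)  # prints True
```

This is the mechanism that lets a recognizer prefer one of two acoustically similar phrases, and it is also why adaptation to a person's vocabulary and sentence structure changes recognition results.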
Testing areas: Recognition and Command Control
• Initial recognition tests
• Turn app into “listening mode”
• Basic commands (“what I can do”)
• Extended commands (app-type specific)
• Non strict commands (pseudo-AI)
• Search commands
• 3rd party Apps specific commands/3rd party SW compatibility
• Dictating into app default text controls (if supported)
• Dictating into 3rd party supported and unsupported apps
• Transcribing prerecorded audio
Киев 2016
Testing areas: Recognition and Text Editing
(sample from PCWorld/Nuance)
Testing areas: Languages and Accents
• Different accents (UK English, US English, Australian English, etc)
• Issues with speaking
• Language-specific sounds
• Homophones (French)
• Umlauts (German)
• etc
• Language-specific syntax (using commas, periods, exclamation marks, etc)
• Words with similar or close pronunciation (fr. voux, voi, vu, etc)
• Logographic scripts (Chinese, Japanese, etc)
Testing areas: Other stuff
• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition
Enterprise Recognition (based on Nuance.com info)
• Support for major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com
Links
• https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
• http://www.explainthatstuff.com/voicerecognition.html
• http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
• http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
• http://www.nuance.com/for-individuals/by-product/dragon-accessories
• https://en.wikipedia.org/wiki/List_of_speech_recognition_software
• https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
• https://en.wikipedia.org/wiki/Speech_recognition
• https://en.wikipedia.org/wiki/Language_model
• http://www.pcmag.com/article2/0,2817,2464719,00.asp
• http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
• http://www.oxygen.lcs.mit.edu/Speech.html
• http://copia.com.au/medical-speech-recognition/

QA Fest 2016. Roman Gorin. Introduction to speech recognition systems through a tester's eyes


Editor's Notes

  • #3 Briefly about myself. A bit more about the company, how long I have worked there, the team, which project I am on, and why for so long
  • #4 Briefly about the recognition industry (speech, text, sound in general), and about the specifics of speech recognition and its implementations
  • #18 Welcome or Well come
  • #19 Live sample with Cortana and Google Now, with a description of what happens at each stage