QA Fest 2016. Roman Gorin. Introduction to Speech Recognition Systems Through a Tester's Eyes

  1. Kyiv 2016. The first testing festival in Ukraine. Introduction to Speech Recognition Software Testing. Roman Gorin
  2. About me • Senior Technical Leader – Testing @ Delphi LLC http://udelphi.com • 12+ years in Speech Recognition Testing • 6+ years as QA Team Lead • Main Product: Nuance Dragon Medical http://www.nuance.com/for-healthcare/dragon-medical • https://telegram.me/DJ_ZX • Facebook: rgorin.zx
  3. What it is
  4. Where used • Nuance Dragon Family • Dragon Pro • Dragon Medical • Dragon for Mac • Dragon Anywhere • Etc. • Windows Speech Recognition • Google Voice Search
  5. Where used • Personal assistants • Siri • Cortana • Google Now • Facebook M, etc. • Car systems
  6. Where used • Smart home assistants • Amazon Echo • Google Home • Zenbo • Homer, etc. • Automated call center software, and more
  7. Where used: Viv AI (unreleased)
  8. Basic Principles • Capture audio • Separate speech from other types of sound (especially noise) • Compare the speech audio with known text <-> audio match patterns • Apply the language-specific model • Perform actions (type text, execute a command) based on the collected data
  9. Generic structure of how SR works. Main speech recognition models (based on Wikipedia): • Hidden Markov models • Dynamic time warping (DTW)-based speech recognition • Neural networks • Deep feedforward and recurrent neural networks
  10. Testing areas • Engine and language modelling (usually on the recognition server side) • UI • Hardware • Deployment • Adaptation • Recognition and text editing • Language-specific areas, etc.
  11. Testing areas: Hardware • Mobile HW • Internal mic (notebooks/tablets) • Noise-cancelling mic • Sound card and driver compatibility • System requirements compliance • HW dependency • Driver dependency (Windows: WASAPI, DirectSound, ASIO, Kernel Streaming; Linux: ALSA, PulseAudio; Mac: Core Audio)
  12. Testing areas: Hardware • Mics and recorders (samples from the nuance.com store) • Special bundled HW for professionals • Nuance PowerMic • Philips SpeechMike
  13. Testing areas: Deployment • Platform • Client OS (desktop/mobile) • Server OS for client app • Server OS for cloud/remote app • Azure Cloud • Amazon Cloud • Proprietary cloud hosts for server-side recognition (e.g. the recognition servers for Siri) • Support for virtualization platforms: VDI and app virtualization (standalone recognition over remote access) • Citrix XenApp and XenDesktop/thin and thick clients • VMware Workstation and Horizon • Oracle VirtualBox • Microsoft Remote Desktop/Terminal Services
  14. Testing areas: Adaptation • Predefined language patterns • Statistical models: a statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications. • “Part of speech” detection • Sound-specific patterns • Person-specific patterns • How a person pronounces words and sounds • How a person constructs sentences • Speaking speed
  15. Testing areas: Recognition and Commands control • Initial recognition tests • Switching the app to “listening mode” • Basic commands (“what can I do”) • Extended commands (app-type specific) • Non-strict commands (pseudo-AI) • Search commands • 3rd-party app-specific commands/3rd-party SW compatibility • Dictating into the app's default text controls (if supported) • Dictating into supported and unsupported 3rd-party apps • Transcribing prerecorded audio
  16. Testing areas: Recognition and Text Editing (sample from PCWorld/Nuance)
  17. Testing areas: Languages and Accents • Different accents (UK English, US English, Australian English, etc.) • Issues with speaking • Language-specific sounds • Homophones (French) • Umlauts (German) • etc. • Language-specific syntax (use of commas, periods, exclamation marks, etc.) • Words with similar or close pronunciation (Fr. voux, voi, vu, etc.) • Logographic characters (Chinese, Japanese, etc.)
  18. Testing areas: Other stuff • Audio codecs • Traffic consumption (for cloud or remote access apps) • Memory and CPU consumption • Response time and cancelling recognition
  19. Enterprise Recognition (based on Nuance.com info)
  20. Enterprise Recognition (based on Nuance.com info) • Supports major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com
  21. Киев 2016
  22. Links • https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx • http://www.explainthatstuff.com/voicerecognition.html • http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/ • http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm • http://www.nuance.com/for-individuals/by-product/dragon-accessories • https://en.wikipedia.org/wiki/List_of_speech_recognition_software • https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking • https://en.wikipedia.org/wiki/Speech_recognition • https://en.wikipedia.org/wiki/Language_model • http://www.pcmag.com/article2/0,2817,2464719,00.asp • http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html • http://www.oxygen.lcs.mit.edu/Speech.html • http://copia.com.au/medical-speech-recognition/
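
Slide 14 describes a statistical language model as a probability distribution over word sequences that lets a recognizer rank competing phrases. As a rough illustration only, here is a minimal bigram sketch in Python. The toy corpus, the function names (`train_bigram`, `cond_prob`, `log_prob`) and the choice of add-one smoothing are assumptions made for this example; none of them come from the talk or from any particular recognition engine.

```python
import math
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts; <s> and </s> mark sentence boundaries."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = sentence.lower().split()
        tokens = ["<s>"] + words + ["</s>"]
        vocab.update(words)
        contexts.update(tokens[:-1])             # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))  # adjacent word pairs
    return contexts, bigrams, vocab

def cond_prob(word, prev, model):
    """P(word | prev) with add-one (Laplace) smoothing over the known vocabulary."""
    contexts, bigrams, vocab = model
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))

def log_prob(sentence, model):
    """Approximate log P(w_1, ..., w_m) as a sum of bigram log-probabilities."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    return sum(math.log(cond_prob(w, p, model)) for p, w in zip(tokens, tokens[1:]))

# Hypothetical toy corpus; a real model is trained on far more text.
corpus = ["recognize speech", "recognize speech now", "wreck a nice beach"]
model = train_bigram(corpus)

for phrase in ("recognize speech now", "speech recognize now"):
    print(phrase, "->", round(log_prob(phrase, model), 3))
```

From a testing point of view, the takeaway is that word order seen during training scores higher than an unfamiliar order of the same words, which is one reason adaptation to a person's typical sentences can change recognition results. Words outside the training vocabulary are not handled properly in this sketch.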

Editor's Notes

  1. Briefly about myself. A bit more about the company, how long I have worked there, the team, which project I am on, and why I have stayed so long
  2. Briefly about the recognition industry (speech, text, sound in general), and about the specifics of speech recognition and its implementations
  3. Welcome or Well come
  4. Live demo with Cortana and Google Now, with a description of what happens at each stage