QA Fest 2016. Roman Gorin. Introduction to speech recognition systems through the eyes of a tester

In this talk I will cover the basic principles of how speech recognition software works, where and which technologies are used, and the key points in testing products of this type (both standalone mode and cloud recognition, including voice assistants). I will also describe how such products are used at the enterprise level and which testing aspects need to be taken into account.

Published in: Education

  1. Kyiv 2016. The first testing festival in Ukraine. Introduction to Speech Recognition Software Testing. Roman Gorin
  2. About me
     • Senior Technical Leader – Testing @ Delphi LLC, http://udelphi.com
     • 12+ years in speech recognition testing
     • 6+ years as QA Team Lead
     • Main product: Nuance Dragon Medical, http://www.nuance.com/for-healthcare/dragon-medical
     • https://telegram.me/DJ_ZX
     • Facebook: rgorin.zx
  3. What it is
  4. Where used
     • Nuance Dragon family: Dragon Pro, Dragon Medical, Dragon for Mac, Dragon Anywhere, etc.
     • Windows Speech Recognition
     • Google Voice Search
  5. Where used
     • Personal assistants: Siri, Cortana, Google Now, Facebook M, etc.
     • Car systems
  6. Where used
     • Smart home assistants: Amazon Echo, Google Home, Zenbo, Homer, etc.
     • Automated call center software
     • And more
  7. Where used: ViV AI (unreleased)
  8. Basic Principles
     • Capture audio
     • Separate speech from other types of sound (especially noise)
     • Compare the speech audio with known text <-> audio matching patterns
     • Analyze the result against the language-specific model
     • Perform actions (type text, execute a command) based on the collected data
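The "separate speech from other sounds" step can be illustrated with a very rough sketch. It assumes a fixed energy threshold per frame, which is a deliberate simplification and not how any particular product does it; the function names, the frame size, and the threshold value are all hypothetical.

```python
import math

def frame_energies(samples, frame_size):
    """Return the mean squared amplitude of each full frame."""
    energies = []
    for start in range(0, len(samples) - frame_size + 1, frame_size):
        frame = samples[start:start + frame_size]
        energies.append(sum(s * s for s in frame) / frame_size)
    return energies

def detect_speech_frames(samples, frame_size=160, threshold=0.01):
    """Flag frames whose energy exceeds a fixed (assumed) threshold."""
    return [e > threshold for e in frame_energies(samples, frame_size)]

# Synthetic input: a very quiet tone (stands in for background noise)
# followed by a much louder tone (stands in for speech), at 16 kHz.
quiet = [0.01 * math.sin(2 * math.pi * 50 * n / 16000) for n in range(320)]
loud = [0.5 * math.sin(2 * math.pi * 200 * n / 16000) for n in range(320)]
flags = detect_speech_frames(quiet + loud)
print(flags)
```

For a tester, the useful takeaway is that this step has tunable sensitivity, so quiet speakers and noisy rooms are natural edge cases.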
  9. Generic structure of how SR works. Main speech recognition models (based on Wikipedia)
     • Hidden Markov models
     • Dynamic time warping (DTW)-based speech recognition
     • Neural networks
     • Deep feedforward and recurrent neural networks
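Of the models listed, dynamic time warping is the easiest to show in a few lines. The sketch below is a textbook DTW distance over one-dimensional sequences, assuming absolute difference as the local cost; real recognizers compare multi-dimensional feature vectors, and the sample sequences here are invented for illustration.

```python
def dtw_distance(a, b):
    """Classic dynamic time warping distance between two numeric sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            cost[i][j] = local + min(
                cost[i - 1][j],      # a advances, b element is repeated
                cost[i][j - 1],      # b advances, a element is repeated
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]

template = [0, 1, 2, 3, 2, 1, 0]
slow = [0, 0, 1, 1, 2, 2, 3, 3, 2, 2, 1, 1, 0, 0]  # same shape at half speed
other = [3, 3, 3, 0, 0, 0, 3]                      # different shape

print(dtw_distance(template, slow))
print(dtw_distance(template, other))
```

The half-speed sequence aligns with the template at zero cost, which is why DTW tolerates differences in speaking rate, while the differently shaped sequence does not.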
  10. Testing areas
     • Engine and language modelling (usually on the recognition server side)
     • UI
     • Hardware
     • Deployment
     • Adaptation
     • Recognition and text editing
     • Language-specific, etc.
  11. Testing areas: Hardware
     • Mobile HW
     • Internal mic (notebooks/tablets)
     • Noise-cancelling mic
     • Sound card and driver compatibility
     • System requirements compliance
     • HW dependency
     • Driver dependency (Windows: WASAPI, DirectSound, ASIO, Kernel Streaming; Linux: ALSA, PulseAudio; Mac: Core Audio)
  12. Testing areas: Hardware
     • Mics and recorders (samples from the nuance.com store)
     • Special bundled HW for professionals: Nuance PowerMic, Philips SpeechMike
  13. Testing areas: Deployment
     • Platform
       • Client OS (desktop/mobile)
       • Server OS for the client app
       • Server OS for the cloud/remote app
       • Azure Cloud
       • Amazon Cloud
       • Proprietary cloud hosts for server recognition (e.g. recognition servers for Siri, etc.)
     • Support for virtualization platforms: VDI and app virtualization (standalone recognition over remote access)
       • Citrix XenApp and XenDesktop / thin and thick clients
       • VMware Workstation and Horizon
       • Oracle VirtualBox
       • Microsoft Remote Desktop / Terminal Services
  14. Testing areas: Adaptation
     • Predefined language patterns
     • Statistical models. A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length m, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications. Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
     • "Part of speech" detection
     • Sound-specific patterns
     • Person-specific
       • How the person pronounces words and sounds
       • How the person constructs sentences
       • Pronunciation speed
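The statistical language model definition above can be made concrete with a tiny bigram model. This is a minimal sketch, assuming add-one smoothing and a three-sentence toy corpus that was invented for the example; production recognizers use far larger models, and the function names are hypothetical.

```python
from collections import Counter

def train_bigram(sentences):
    """Count contexts and bigrams over sentences padded with <s> and </s>."""
    contexts, bigrams, vocab = Counter(), Counter(), {"</s>"}
    for sentence in sentences:
        words = ["<s>"] + sentence.lower().split() + ["</s>"]
        vocab.update(words[1:-1])
        contexts.update(words[:-1])            # every word that has a successor
        bigrams.update(zip(words, words[1:]))  # adjacent word pairs
    return contexts, bigrams, len(vocab)

def sentence_probability(sentence, model):
    """Estimate P(w_1, ..., w_m) as a product of smoothed bigram probabilities."""
    contexts, bigrams, vocab_size = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    probability = 1.0
    for prev, cur in zip(words, words[1:]):
        # Add-one smoothing keeps unseen word pairs from zeroing the product.
        probability *= (bigrams[(prev, cur)] + 1) / (contexts[prev] + vocab_size)
    return probability

corpus = [
    "the patient has a fever",
    "the patient has a cough",
    "the doctor sees the patient",
]
model = train_bigram(corpus)
likely = sentence_probability("the patient has a fever", model)
unlikely = sentence_probability("fever a has patient the", model)
print(likely > unlikely)
```

The same words in a scrambled order score lower, which is how a recognizer can prefer a plausible phrase over an acoustically similar but unlikely one. It also suggests why adaptation to a person's typical sentences belongs in the test plan.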
  15. Testing areas: Recognition and Commands control
     • Initial recognition tests
     • Turning the app into "listening mode"
     • Basic commands ("what can I do")
     • Extended commands (app-type specific)
     • Non-strict commands (pseudo-AI)
     • Search commands
     • 3rd-party app-specific commands / 3rd-party SW compatibility
     • Dictating into the app's default text controls (if supported)
     • Dictating into supported and unsupported 3rd-party apps
     • Transcribing prerecorded audio
  16. Testing areas: Recognition and Text Editing (sample from PCWorld/Nuance)
  17. Testing areas: Languages and Accents
     • Different accents (UK English, US English, Australian English, etc.)
     • Issues with speaking
     • Language-specific sounds
       • Homophones (French)
       • Umlauts (German)
       • etc.
     • Language-specific syntax (use of commas, periods, exclamation marks, etc.)
     • Words with similar or close pronunciation (fr. voux, voi, vu, etc.)
     • Hieroglyphs (Chinese, Japanese, etc.)
  18. Testing areas: Other stuff
     • Audio codecs
     • Traffic consumption (for cloud or remote access apps)
     • Memory and CPU consumption
     • Response time and cancelling recognition
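Response time and memory consumption can be sampled with a small harness like the one below. It is only a sketch: `fake_recognize` is a hypothetical stand-in for whatever call the product under test exposes, and `tracemalloc` only sees allocations made by Python code, so a native engine would need OS-level tools instead.

```python
import time
import tracemalloc

def fake_recognize(audio):
    """Hypothetical stand-in for a real recognition call."""
    time.sleep(0.01)  # simulate processing time
    return "hello world"

def measure(recognize, audio, runs=5):
    """Collect latency statistics and peak Python memory over several runs."""
    latencies = []
    tracemalloc.start()
    for _ in range(runs):
        started = time.perf_counter()
        recognize(audio)
        latencies.append(time.perf_counter() - started)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return {
        "min_s": min(latencies),
        "max_s": max(latencies),
        "mean_s": sum(latencies) / runs,
        "peak_bytes": peak_bytes,
    }

stats = measure(fake_recognize, b"", runs=3)
print(stats)
```

Running the same harness against audio of different lengths and codecs gives a baseline that later builds can be compared with.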
  19. Enterprise Recognition (based on Nuance.com info)
  20. Enterprise Recognition (based on Nuance.com info)
     • Support for major EHR platforms, including Epic®, Cerner®, eClinicalWorks, athenahealth®, MEDITECH®, and more. © Nuance.com
  21. Kyiv 2016
  22. Links
     • https://msdn.microsoft.com/en-us/library/hh378337(v=office.14).aspx
     • http://www.explainthatstuff.com/voicerecognition.html
     • http://scienceline.org/2014/08/ever-wondered-how-does-speech-to-text-software-work/
     • http://www.nuance.com/for-healthcare/capture-anywhere/360-mobile-solutions/powermicmobile/index.htm
     • http://www.nuance.com/for-individuals/by-product/dragon-accessories
     • https://en.wikipedia.org/wiki/List_of_speech_recognition_software
     • https://en.wikipedia.org/wiki/Dragon_NaturallySpeaking
     • https://en.wikipedia.org/wiki/Speech_recognition
     • https://en.wikipedia.org/wiki/Language_model
     • http://www.pcmag.com/article2/0,2817,2464719,00.asp
     • http://www.pcworld.com/article/2055599/control-your-pc-with-these-5-speech-recognition-programs.html
     • http://www.oxygen.lcs.mit.edu/Speech.html
     • http://copia.com.au/medical-speech-recognition/
