QA Fest 2016. Roman Gorin. An Introduction to Speech Recognition Systems Through a Tester's Eyes
In this talk I will cover the basic principles of how speech recognition software works, where and which technologies are used, and the key points in testing products of this type (both standalone mode and cloud recognition, including voice assistants). I will also cover how such products are used at the enterprise level and which testing aspects need to be taken into account.
1.
Kyiv 2016
The first testing festival in Ukraine
Introduction to Speech Recognition Software Testing
Roman Gorin
2.
About me
• Senior Technical Leader – Testing @ Delphi LLC http://udelphi.com
• 12+ years in Speech Recognition Testing
• 6+ years as QA Team Lead
• Main Product: Nuance Dragon Medical http://www.nuance.com/for-healthcare/dragon-medical
• https://telegram.me/DJ_ZX
• Facebook: rgorin.zx
4.
Where used
• Nuance Dragon Family
• Dragon Pro
• Dragon Medical
• Dragon for Mac
• Dragon Anywhere
• Etc.
• Windows Speech Recognition
• Google Voice Search
5.
Where used
Personal assistants
• Siri
• Cortana
• Google Now
• Facebook M, etc
Car systems
6.
Where used
Smart Home assistants
• Amazon Echo
• Google Home
• Zenbo
• Homer, etc.
• Automated call center software, and more
8.
Basic Principles
• Capture audio
• Separate speech from other types of sounds (esp. noise)
• Compare the speech audio with known text<->audio patterns
• Apply the language-specific model
• Perform actions (type text, execute command) based on
collected data
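The steps above can be sketched as a toy pipeline. This is only an illustration under loud assumptions: synthetic tones stand in for speech, "separating speech" is a plain energy threshold, pattern matching is a nearest zero-crossing count, the language-model step is left out, and every name here (`PATTERNS`, `ACTIONS`, `capture`, `run`) is hypothetical rather than taken from any real engine.

```python
import math

RATE = 16000   # assumed sample rate, Hz
FRAME = 160    # 10 ms frames at the assumed rate

def tone(freq, seconds, amp=0.5):
    """Synthetic 'utterance': a sine tone standing in for speech."""
    n = int(RATE * seconds)
    return [amp * math.sin(2 * math.pi * freq * i / RATE) for i in range(n)]

def capture():
    """Step 1, capture audio: silence, a 1000 Hz tone, silence."""
    return [0.0] * 1600 + tone(1000, 0.2) + [0.0] * 1600

def split_frames(samples):
    return [samples[i:i + FRAME] for i in range(0, len(samples) - FRAME + 1, FRAME)]

def is_speech(frame, threshold=0.01):
    """Step 2, separate speech from other sound: simple energy threshold."""
    return sum(s * s for s in frame) / len(frame) > threshold

def zero_crossings(frame):
    return sum(1 for a, b in zip(frame, frame[1:]) if (a < 0) != (b < 0))

# Step 3, known text<->audio patterns: expected zero crossings per frame.
PATTERNS = {"open file": 8.0, "close file": 20.0}   # hypothetical values

def recognize(samples):
    voiced = [f for f in split_frames(samples) if is_speech(f)]
    if not voiced:
        return None
    feature = sum(zero_crossings(f) for f in voiced) / len(voiced)
    return min(PATTERNS, key=lambda text: abs(PATTERNS[text] - feature))

# Step 5, perform an action based on the recognized text.
ACTIONS = {"open file": lambda: "opened", "close file": lambda: "closed"}

def run(samples):
    text = recognize(samples)
    return ACTIONS[text]() if text else "nothing heard"
```

Calling `run(capture())` walks all the stages in order; a buffer of pure silence never gets past the energy check, which is the kind of boundary a tester would probe first.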
9.
Generic structure of how SR works
Main speech recognition models (based on Wikipedia)
• Hidden Markov models
• Dynamic time warping (DTW)-based speech recognition
• Neural networks
• Deep feedforward and recurrent neural networks
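Of the models listed, dynamic time warping is the easiest to show in a few lines. The sketch below is the textbook DTW distance on one-dimensional sequences; real engines compare multi-dimensional acoustic feature vectors, and the sample sequences are made up for illustration.

```python
def dtw_distance(a, b):
    """Dynamic time warping distance between two 1-D sequences (O(len(a) * len(b)))."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = abs(a[i - 1] - b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b stays
                                 cost[i][j - 1],      # b advances, a stays
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

template = [1, 2, 3, 2, 1]                 # hypothetical stored pattern
slow = [1, 1, 2, 2, 3, 3, 2, 2, 1, 1]      # same shape, spoken at half speed
other = [3, 3, 3, 3, 3]                    # a different "word"
```

Here `dtw_distance(template, slow)` is zero because warping absorbs the difference in speaking speed, while `dtw_distance(template, other)` is not, which is why DTW suits matching the same word spoken at different tempos.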
10.
Testing areas
• Engine and language modelling (usually on the recognition server side)
• UI
• Hardware
• Deployment
• Adaptation
• Recognition and Text Editing
• Language-specific aspects, etc.
11.
Testing areas: Hardware
• Mobile HW
• Internal mic (notebooks/tablets)
• Noise cancelling mic
• Sound card and drivers compatibility
• System Requirements compliance
• HW Dependency
• Driver dependency (Windows: WASAPI, DirectSound, ASIO, Kernel Streaming; Linux: ALSA, PulseAudio; Mac: Core Audio)
12.
Testing areas: Hardware
• Mics and recorders (samples from nuance.com store)
• Special bundled HW for Professional: Nuance PowerMic, Philips SpeechMike
13.
Testing areas: Deployment
• Platform
• Client OS (Desktop/Mobile)
• Server OS for Client app
• Server OS for Cloud/Remote app
• Azure Cloud
• Amazon Cloud
• Proprietary cloud hosts for server recognition (e.g. the recognition servers behind Siri)
• Support for virtualization platforms: VDI and app virtualization (standalone recognition over remote access)
• Citrix XenApp and XenDesktop/Thin and Thick clients
• VMware Workstation and Horizon
• Oracle VirtualBox
• Microsoft Remote Desktop/Terminal Services
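With this many platform dimensions, it can help to generate the deployment matrix instead of listing it by hand. The sketch below is one possible approach, not the speaker's method: the client OS list and the `is_valid` exclusion rule are assumptions made for illustration, and only the host and virtualization names come from the slide.

```python
from itertools import product

CLIENT_OS = ["Windows desktop", "macOS", "Android", "iOS"]   # assumed list
RECOGNITION_HOST = ["standalone", "Azure", "Amazon", "proprietary cloud"]
ACCESS = ["local", "Citrix XenApp", "Citrix XenDesktop",
          "VMware Horizon", "Microsoft Remote Desktop"]

def is_valid(client_os, host, access):
    # Assumption for this sketch: remote-access sessions are only
    # exercised from desktop clients.
    if access != "local" and client_os in ("Android", "iOS"):
        return False
    return True

def deployment_matrix():
    """All (client OS, recognition host, access mode) combinations worth testing."""
    return [combo for combo in product(CLIENT_OS, RECOGNITION_HOST, ACCESS)
            if is_valid(*combo)]
```

Each tuple from `deployment_matrix()` is one environment to provision; tightening `is_valid` is where a team would encode what it actually supports.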
14.
Testing areas: Adaptation
• Predefined language patterns
• Statistical models
A statistical language model is a probability distribution over sequences of words. Given such a sequence, say of length $m$, it assigns a probability $P(w_1, \ldots, w_m)$ to the whole sequence. Having a way to estimate the relative likelihood of different phrases is useful in many natural language processing applications.
Language modeling is used in speech recognition, machine translation, part-of-speech tagging, parsing, handwriting recognition, information retrieval and other applications.
• “Part of speech” detection
• Sound specific patterns
• Person-specific
• How the person pronounces words and sounds
• How the person constructs sentences
• Pronunciation speed
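To make the statistical model concrete, here is a minimal bigram sketch that approximates $P(w_1, \ldots, w_m)$ as a product of add-one-smoothed bigram probabilities. The three-sentence corpus and all function names are invented for illustration; production language models are far larger and typically not plain bigrams.

```python
from collections import Counter

def train_bigram(sentences):
    """Count bigrams and their left contexts, with sentence boundary markers."""
    context, bigrams, vocab = Counter(), Counter(), set()
    for s in sentences:
        words = ["<s>"] + s.lower().split() + ["</s>"]
        vocab.update(words[1:])              # everything that can be predicted
        context.update(words[:-1])           # everything that can precede a word
        bigrams.update(zip(words, words[1:]))
    return context, bigrams, vocab

def sequence_probability(sentence, model):
    """Approximate P(w_1..w_m) with add-one (Laplace) smoothed bigrams."""
    context, bigrams, vocab = model
    words = ["<s>"] + sentence.lower().split() + ["</s>"]
    p = 1.0
    for prev, cur in zip(words, words[1:]):
        p *= (bigrams[(prev, cur)] + 1) / (context[prev] + len(vocab))
    return p

CORPUS = [                                   # hypothetical training text
    "the patient has a fever",
    "the patient has a cough",
    "the doctor has a patient",
]
MODEL = train_bigram(CORPUS)
```

With this model, "the patient has a fever" scores higher than the same words shuffled, which is how an engine prefers a plausible phrase over an acoustically similar but unlikely one. Adaptation testing checks that these scores shift as the user's own texts are added.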
15.
Testing areas: Recognition and Command Control
• Initial recognition tests
• Switch the app into “listening mode”
• Basic commands (“what can I do”)
• Extended commands (app-type specific)
• Non-strict commands (pseudo-AI)
• Search commands
• Third-party app-specific commands and third-party software compatibility
• Dictating into app default text controls (if supported)
• Dictating into 3rd party supported and unsupported apps
• Transcribing prerecorded audio
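The difference between strict and non-strict commands can be sketched with the standard library. This is an assumption-laden toy, not how any particular product resolves commands: the `COMMANDS` table is hypothetical, and fuzzy matching with `difflib.get_close_matches` merely stands in for whatever "pseudo-AI" a real assistant uses.

```python
import difflib

COMMANDS = {                      # hypothetical phrase -> action id
    "open file": "OPEN",
    "close file": "CLOSE",
    "search the web": "SEARCH",
}

def match_command(recognized, strict=True, cutoff=0.6):
    """Map recognized text to an action id, or None if nothing matches."""
    text = " ".join(recognized.lower().split())   # normalize case and spacing
    if text in COMMANDS:
        return COMMANDS[text]
    if strict:
        return None
    # Non-strict mode: accept the closest known phrase above the cutoff.
    close = difflib.get_close_matches(text, list(COMMANDS), n=1, cutoff=cutoff)
    return COMMANDS[close[0]] if close else None
```

A test suite would cover both directions: paraphrases such as "open the file" that non-strict mode should accept, and unrelated phrases that must not trigger any command.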
16.
Testing areas: Recognition and Text Editing
(sample from PCWorld/Nuance)
17.
Testing areas: Languages and Accents
• Different accents (UK English, US English, Australian English, etc)
• Issues with speaking
• Language-specific sounds
• Homophones (French)
• Umlauts (German)
• etc
• Language-specific syntax (use of commas, periods, exclamation marks, etc.)
• Words with similar or close pronunciation (fr. voux, voi, vu, etc)
• Logographic characters (Chinese, Japanese, etc.)
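To compare results across accents and languages, a tester needs a number to compare. One widely used metric is word error rate (WER): the word-level edit distance between a reference transcript and the recognized text, divided by the reference length. The slides do not name a metric, so treat this as one reasonable choice rather than the speaker's method; the function name is illustrative.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must not be empty")
    # Word-level Levenshtein distance, computed one row at a time.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        cur = [i]
        for j, h in enumerate(hyp, 1):
            cur.append(min(prev[j] + 1,               # deletion
                           cur[j - 1] + 1,            # insertion
                           prev[j - 1] + (r != h)))   # substitution or match
        prev = cur
    return prev[-1] / len(ref)
```

Running the same scripted text through speakers with different accents and comparing their WER values gives a repeatable way to spot accent-specific regressions. Note that WER can exceed 1.0 when the engine inserts many extra words.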
18.
Testing areas: Other stuff
• Audio codecs
• Traffic consumption (for cloud or remote access apps)
• Memory and CPU consumption
• Response time and cancelling recognition
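Response time is the easiest of these to measure in a repeatable way. The sketch below times repeated calls to a recognizer; `fake_recognizer` is a hypothetical stand-in that just sleeps, so swap in the real engine or cloud call under test.

```python
import statistics
import time

def fake_recognizer(audio):
    """Hypothetical stand-in for a real engine or cloud call."""
    time.sleep(0.01)
    return "hello world"

def measure_latency(recognize, audio, runs=5):
    """Time several recognition calls and summarize them in seconds."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        recognize(audio)
        timings.append(time.perf_counter() - start)
    return {"median": statistics.median(timings), "max": max(timings)}
```

Reporting the median alongside the maximum keeps a single slow network round trip from hiding, or dominating, the typical response time.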
19.
Enterprise Recognition (based on Nuance.com info)