Speech Recognition, Text to Speech, and Voice Interfaces

By Taryne Cahalin, Stephanie Sirico, Christiana Vasquez

What is Speech Recognition?
Instead of navigating an automated voice recording by pressing buttons, a person can speak specific words into a device and give commands with the help of a speech recognition program.

The Uses
Individuals With Disabilities – Assists people with visual impairments, limited hand mobility, dyslexia, etc.
Medical Transcription – Reduces delays in producing medical transcriptions.
Dictation – Converts spoken words to text in emails and other documents (also helpful for English Language Learners).
Access Menu Commands – Opens files and runs menu commands by voice.

Using Dragon Mobile

How does it work?
Speech recognition functions as a pipeline that converts PCM (pulse code modulation) digital audio from a sound card into recognized speech.
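To make "PCM digital audio" concrete, here is a minimal Python sketch, assuming 16-bit signed samples at 16,000 samples per second (the rate quoted on the next slide). The function name `sine_pcm` and the pure 440 Hz test tone are illustrative assumptions; a real program would receive these integers from the sound card rather than synthesize them.

```python
import math

SAMPLE_RATE = 16_000   # PCM values (samples) per second
MAX_16BIT = 32_767     # largest value a signed 16-bit sample can hold

def sine_pcm(freq_hz: float, seconds: float, amplitude: float = 0.5) -> list[int]:
    """Sample a pure tone and quantize each sample to a signed 16-bit integer."""
    n_samples = round(SAMPLE_RATE * seconds)
    return [
        round(amplitude * MAX_16BIT * math.sin(2 * math.pi * freq_hz * n / SAMPLE_RATE))
        for n in range(n_samples)
    ]

samples = sine_pcm(440.0, 0.01)  # 1/100th of a second of a 440 Hz tone
print(len(samples))              # 160
```

Each integer is one point on the "wavy line": PCM is simply a sequence of evenly spaced, quantized amplitude measurements.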

Transforming PCM Digital Audio

• The sound card produces 16,000 PCM values per second (a “wavy line”) for as long as the user speaks
• The information is converted into a form the program can recognize more easily
• A fast Fourier transform (FFT) identifies the frequency components of a specific sound
• The program can then approximate how our ears distinguish the sound
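The FFT step can be illustrated with a textbook recursive radix-2 Cooley-Tukey FFT in plain Python. This is a sketch, not the algorithm any particular product uses; `fft` and `dominant_frequency` are hypothetical names, and the 256-sample window (a power of two, so each frequency bin spans 16,000 / 256 = 62.5 Hz) is an assumption chosen so that a 1,000 Hz tone falls exactly on one bin.

```python
import cmath
import math

def fft(x: list[complex]) -> list[complex]:
    """Recursive radix-2 Cooley-Tukey FFT. len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("input length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])   # transform of the even-indexed samples
    odd = fft(x[1::2])    # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def dominant_frequency(samples: list[float], sample_rate: int) -> float:
    """Frequency in Hz of the strongest FFT bin, skipping bin 0 (the DC offset)."""
    spectrum = fft(samples)
    half = len(samples) // 2  # for real input, the upper half mirrors the lower half
    peak_bin = max(range(1, half), key=lambda k: abs(spectrum[k]))
    return peak_bin * sample_rate / len(samples)

SAMPLE_RATE = 16_000
N = 256
tone = [math.sin(2 * math.pi * 1000 * n / SAMPLE_RATE) for n in range(N)]
print(dominant_frequency(tone, SAMPLE_RATE))  # 1000.0
```

The magnitudes `abs(spectrum[k])` are the "how strong is each frequency" information the program works with; the peak search just picks out the strongest component.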

Transform PCM Digital Audio Using the Fast Fourier Transform
The fast Fourier transform (FFT) analyzes the audio every 1/100th of a second and converts each slice into its frequency components.
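At 16,000 PCM values per second, a 1/100-second slice holds 16,000 × 0.01 = 160 samples. A minimal sketch of cutting the stream into such slices follows; the function name `split_into_frames` is hypothetical, and dropping the incomplete final slice is one simple choice among several.

```python
SAMPLE_RATE = 16_000                              # PCM values per second
FRAME_SECONDS = 0.01                              # 1/100th of a second
FRAME_SIZE = round(SAMPLE_RATE * FRAME_SECONDS)   # 160 samples per slice

def split_into_frames(samples: list[int], frame_size: int = FRAME_SIZE) -> list[list[int]]:
    """Cut a PCM stream into consecutive, non-overlapping slices.

    Samples left over at the end (fewer than frame_size) are dropped.
    """
    n_full = len(samples) // frame_size
    return [samples[i * frame_size:(i + 1) * frame_size] for i in range(n_full)]

one_second = [0] * SAMPLE_RATE
frames = split_into_frames(one_second)
print(FRAME_SIZE, len(frames))  # 160 100
```

Many modern recognizers instead use overlapping windows (commonly about 25 ms long, advanced every 10 ms) so that a sound falling on a slice boundary is not cut in half.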

Each 1/100th-second slice produces an amplitude graph (how strong each frequency is).
These graphs are compared against a database of reference graphs called a “codebook”.
Each sound is matched to the most similar entry in the codebook.
The sound is then given a number that describes it, called the “feature number”.
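The codebook lookup described above is a form of vector quantization: each slice's graph is replaced by the index of its nearest codebook entry, and that index serves as the feature number. A minimal sketch, assuming graphs are reduced to four frequency bands and compared by Euclidean distance; the `CODEBOOK` values and the `feature_number` function are made up for illustration, whereas a real codebook is built from training speech and is far larger.

```python
import math

# Hypothetical toy codebook: each entry is a reference "amplitude graph"
# reduced to 4 frequency bands (low to high).
CODEBOOK = [
    [0.9, 0.1, 0.0, 0.0],   # entry 0: energy mostly in the lowest band
    [0.2, 0.8, 0.3, 0.0],   # entry 1
    [0.0, 0.2, 0.9, 0.4],   # entry 2
    [0.0, 0.0, 0.3, 0.9],   # entry 3: energy mostly in the highest band
]

def feature_number(graph: list[float], codebook: list[list[float]] = CODEBOOK) -> int:
    """Index of the codebook entry closest (Euclidean distance) to graph."""
    return min(range(len(codebook)), key=lambda i: math.dist(graph, codebook[i]))

frames = [[0.8, 0.2, 0.1, 0.0], [0.1, 0.1, 0.8, 0.5], [0.0, 0.1, 0.2, 0.8]]
print([feature_number(g) for g in frames])  # [0, 2, 3]
```

After this step the recognizer no longer handles raw audio at all, only a short sequence of feature numbers, one per slice.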

Two Categories

Small Vocabulary/many-users:
• Leaves room for variation between speakers (e.g., accents)
• Only a limited, preset set of commands can be used

Large Vocabulary/limited-users:
• Best for business settings
• The system is trained to work with a small number of users
• Accuracy improves as the system learns its users' speech

Discrete vs. Continuous Speech
Discrete
• Easier for the program to understand
• Requires a noticeable pause after each word
Continuous
• Allows speaking at conversational speed
• Used in most modern systems
Programs can now recognize accents and pronunciations better. In earlier programs, accents, pronunciation, speaking speed, and background noise were all variables that made speech difficult for programs to understand.
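The "noticeable pause after each word" is what makes discrete speech easier: the program can find word boundaries by looking for quiet stretches. A rough sketch follows, using the energy (RMS) of each 1/100-second frame. The 0.05 threshold, the 10-frame (0.1 s) minimum pause, and the function names are illustrative assumptions. A fixed threshold like this also copes poorly with background noise, one of the problems the slide mentions.

```python
import math

SAMPLE_RATE = 16_000
FRAME_SIZE = 160  # 1/100th of a second at 16,000 samples per second

def frame_energy(frame: list[float]) -> float:
    """Root-mean-square (RMS) amplitude of one frame."""
    return math.sqrt(sum(s * s for s in frame) / len(frame))

def find_words(samples: list[float], threshold: float = 0.05,
               min_pause_frames: int = 10) -> list[tuple[int, int]]:
    """Return (start_frame, end_frame) spans of sound separated by pauses.

    end_frame is exclusive. A pause is at least min_pause_frames consecutive
    frames whose RMS energy is below threshold.
    """
    n_frames = len(samples) // FRAME_SIZE
    loud = [frame_energy(samples[i * FRAME_SIZE:(i + 1) * FRAME_SIZE]) >= threshold
            for i in range(n_frames)]
    words, start, quiet_run = [], None, 0
    for i, is_loud in enumerate(loud):
        if is_loud:
            if start is None:
                start = i          # a new word begins
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_pause_frames:
                words.append((start, i - quiet_run + 1))  # end just after the last loud frame
                start, quiet_run = None, 0
    if start is not None:          # audio ended mid-word or during a short pause
        words.append((start, n_frames - quiet_run))
    return words

def tone(seconds: float, freq: float = 440.0, amp: float = 0.5) -> list[float]:
    """A pure tone standing in for a spoken word."""
    return [amp * math.sin(2 * math.pi * freq * n / SAMPLE_RATE)
            for n in range(round(SAMPLE_RATE * seconds))]

def silence(seconds: float) -> list[float]:
    return [0.0] * round(SAMPLE_RATE * seconds)

# Two "words" (tones) separated by a 0.2-second pause.
signal = tone(0.3) + silence(0.2) + tone(0.3)
print(find_words(signal))  # [(0, 30), (50, 80)]
```

With continuous speech there is often no such gap between words, which is why continuous recognition needs more than a simple pause detector.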

Using Talk – Text to Voice

This app lets you type text and then has the device read it aloud. In this case, instead of saying Taryne as “Ta-rin”, the device pronounced it as “Ta-reen”. This shows that text-to-speech programs still need work, especially in choosing which syllable to emphasize. Taryne was likely missing from the program's pronunciation dictionary, so it was unable to pronounce her name correctly.
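The explanation above can be sketched as a dictionary lookup with a fallback. Everything here is hypothetical and is not how the Talk app works internally: the tiny `LEXICON`, the vowel-based `guess_syllables` rule (a crude stand-in for real letter-to-sound rules), and the `user_lexicon` override, which mimics the custom-pronunciation lists some text-to-speech programs let users add.

```python
from typing import Optional

# Hypothetical pronunciation dictionary: word -> syllables. Real TTS
# dictionaries store phonemes and stress marks, not spelling fragments.
LEXICON = {
    "speech": ["speech"],
    "student": ["stu", "dent"],
}

VOWELS = set("aeiouy")

def guess_syllables(word: str) -> list[str]:
    """Crude fallback: end a syllable wherever a vowel is followed by a consonant."""
    syllables: list[str] = []
    current = ""
    for i, ch in enumerate(word):
        current += ch
        nxt = word[i + 1] if i + 1 < len(word) else ""
        if ch in VOWELS and nxt and nxt not in VOWELS:
            syllables.append(current)
            current = ""
    if current:
        if syllables and not any(c in VOWELS for c in current):
            syllables[-1] += current   # trailing consonants join the last syllable
        else:
            syllables.append(current)
    return syllables

def pronounce(word: str,
              user_lexicon: Optional[dict[str, list[str]]] = None) -> tuple[list[str], str]:
    """Return (syllables, source): check the user lexicon, then LEXICON, then guess."""
    key = word.lower()
    if user_lexicon and key in user_lexicon:
        return user_lexicon[key], "user lexicon"
    if key in LEXICON:
        return LEXICON[key], "dictionary"
    return guess_syllables(key), "guessed"

print(pronounce("student"))  # (['stu', 'dent'], 'dictionary')
print(pronounce("Taryne"))   # (['ta', 'ry', 'ne'], 'guessed')
print(pronounce("Taryne", user_lexicon={"taryne": ["ta", "rin"]}))  # (['ta', 'rin'], 'user lexicon')
```

The guessed three-syllable split shows how a name missing from the dictionary can come out wrong; adding it to a user lexicon fixes that one word without changing the guessing rules.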

The Future of Assistive Technology in Schools
Students who need help with writing because their oral skills are stronger.
Students who were absent for a class, have poor memory, or need help hearing the lesson.
Students who need assistance during Guided Reading.
Students who are English Language Learners.
Students with visual or hearing impairments, and students with learning disabilities that affect reading, spelling, or writing.

Adelphi University - Mobile Learning, Fall 2013
