Speech Recognition and Speech Synthesis on iOS
http://sysrun.haifa.il.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
@peterfriese
peter.friese@zuehlke.com
xing.to/peter
http://peterfriese.de
Peter Friese
Ever since we started using computers, we have dreamt of communicating with them in spoken language.
SPEECH SYNTHESIS
SPEECH RECOGNITION
WHAT IS SPEECH SYNTHESIS?
Speech synthesis is the artificial production of human speech.
Speech synthesis: History
1769: Speaking machine by Wolfgang von Kempelen (who also built the famous Mechanical Turk). A functional model of the human vocal tract.
http://www.youtube.com/watch?v=zYRVqrfY3tQ
1939: Voder (Voice Operating Demonstrator), developed by Homer Dudley at Bell Labs, building on his vocoder (voice encoder). It had to be played (using a keyboard) by a trained operator and was exhibited at the 1939 World's Fair.
http://www.youtube.com/watch?v=CyaK22DMfF0
1970: Vocoder, custom built for Kraftwerk.
http://www.youtube.com/watch?v=w-Jq7BHtQMA
Most modern speech synthesis systems use electronic, computer-based approaches.
Text to speech (TTS)
Text → front end → back end → speech
In modern TTS systems, speech synthesis is a multi-step process divided into two main parts:
1) Front end (analysis)
2) Back end (synthesis)
Text to speech (TTS)
Front end: text → text analysis → words → linguistic analysis (phrasing, intonation, duration) → phonemes
Back end: phonemes → waveform generation → speech
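The split into front end and back end can be illustrated with a toy pipeline in C (which is also valid Objective-C). This is a sketch under heavy simplifying assumptions, not how any of the SDKs below work: the hypothetical `front_end` emits one unit per letter instead of real phonemes, with a fixed duration and a linearly falling pitch as stand-ins for duration and intonation, and the hypothetical `back_end` renders each unit as a plain sawtooth tone. It only shows the data flow: text in, units carrying prosody in the middle, audio samples out.

```c
#include <ctype.h>
#include <stddef.h>

#define SAMPLE_RATE 16000 /* assumed output sample rate in Hz */

/* One unit handed from the front end to the back end. */
typedef struct {
    char symbol;       /* stand-in for a phoneme symbol */
    double duration_s; /* duration, chosen by the front end */
    double pitch_hz;   /* intonation target, chosen by the front end */
} Unit;

/* Toy front end: one unit per letter (a real front end would normalize
   the text and look words up in a pronunciation lexicon), a fixed
   duration, and a pitch that falls linearly from 140 Hz to 100 Hz. */
size_t front_end(const char *text, Unit *units, size_t max_units) {
    size_t n = 0;
    for (const char *p = text; *p != '\0' && n < max_units; ++p) {
        if (isalpha((unsigned char)*p)) {
            units[n].symbol = (char)tolower((unsigned char)*p);
            units[n].duration_s = 0.08;
            ++n;
        }
    }
    for (size_t i = 0; i < n; ++i)
        units[i].pitch_hz = n > 1 ? 140.0 - 40.0 * (double)i / (double)(n - 1) : 140.0;
    return n;
}

/* Toy back end: render each unit as a sawtooth wave at its pitch.
   Returns the number of samples written to out. */
size_t back_end(const Unit *units, size_t n, float *out, size_t max_samples) {
    size_t written = 0;
    double phase = 0.0; /* kept across units so the waveform stays continuous */
    for (size_t i = 0; i < n; ++i) {
        size_t len = (size_t)(units[i].duration_s * SAMPLE_RATE + 0.5);
        for (size_t k = 0; k < len && written < max_samples; ++k) {
            out[written++] = (float)(0.5 * (2.0 * phase - 1.0));
            phase += units[i].pitch_hz / SAMPLE_RATE;
            if (phase >= 1.0) phase -= 1.0;
        }
    }
    return written;
}
```

For example, `front_end("Hi, you!", units, 16)` yields five units, and `back_end` turns them into 5 × 0.08 s × 16000 Hz = 6400 samples.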
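TTS: Analysis
Text normalization challenges
"My latest project is to learn how to better project my voice."
The same spelling, "project", is a noun (stressed on the first syllable) the first time and a verb (stressed on the second syllable) the second time, so the front end must work out which is meant before it can pronounce it.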
TTS: Analysis
Text normalization challenges
"1430" can be read as:
Half past two (a time, 14:30)
One - four - three - zero
Fourteen hundred and thirty
One thousand four hundred thirty
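A minimal C sketch of how a normalizer could produce each of the four readings of "1430". The function names are hypothetical and the coverage is deliberately narrow (cardinals up to 9999, clock times on the quarter hour only); choosing *which* reading fits the context is the hard part and is not shown.

```c
#include <stdio.h>
#include <string.h>

static const char *ONES[] = {"zero", "one", "two", "three", "four", "five",
    "six", "seven", "eight", "nine", "ten", "eleven", "twelve", "thirteen",
    "fourteen", "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"};
static const char *TENS[] = {"", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"};

/* Append a word to out, separated from previous words by one space. */
static void append(char *out, size_t cap, const char *word) {
    size_t len = strlen(out);
    snprintf(out + len, cap - len, "%s%s", len ? " " : "", word);
}

/* Spell 0..99, e.g. 30 -> "thirty", 42 -> "forty-two". */
static void spell_below_100(int n, char *out, size_t cap) {
    char buf[32];
    if (n < 20) { append(out, cap, ONES[n]); return; }
    if (n % 10) snprintf(buf, sizeof buf, "%s-%s", TENS[n / 10], ONES[n % 10]);
    else snprintf(buf, sizeof buf, "%s", TENS[n / 10]);
    append(out, cap, buf);
}

/* Reading: digit by digit ("one four three zero"). */
void read_digits(const char *digits, char *out, size_t cap) {
    out[0] = '\0';
    for (const char *p = digits; *p; ++p)
        if (*p >= '0' && *p <= '9') append(out, cap, ONES[*p - '0']);
}

/* Reading: cardinal number 0..9999 ("one thousand four hundred thirty"). */
void read_cardinal(int n, char *out, size_t cap) {
    out[0] = '\0';
    if (n == 0) { append(out, cap, "zero"); return; }
    if (n >= 1000) { append(out, cap, ONES[n / 1000]); append(out, cap, "thousand"); n %= 1000; }
    if (n >= 100) { append(out, cap, ONES[n / 100]); append(out, cap, "hundred"); n %= 100; }
    if (n > 0) spell_below_100(n, out, cap);
}

/* Reading: in hundreds, 1100..9999 ("fourteen hundred and thirty"). */
void read_hundreds(int n, char *out, size_t cap) {
    out[0] = '\0';
    spell_below_100(n / 100, out, cap);
    append(out, cap, "hundred");
    if (n % 100) { append(out, cap, "and"); spell_below_100(n % 100, out, cap); }
}

/* Reading: 24-hour clock time hhmm ("half past two"), quarter hours only.
   Returns 0 if the number cannot be read this way. */
int read_time(int hhmm, char *out, size_t cap) {
    int h = hhmm / 100, m = hhmm % 100;
    out[0] = '\0';
    if (h > 23 || m > 59 || m % 15 != 0) return 0;
    int h12 = h % 12 == 0 ? 12 : h % 12; /* 12-hour clock */
    int next = h12 % 12 + 1;             /* following hour, for "quarter to" */
    switch (m) {
    case 0:  append(out, cap, ONES[h12]); append(out, cap, "o'clock"); break;
    case 15: append(out, cap, "quarter past"); append(out, cap, ONES[h12]); break;
    case 30: append(out, cap, "half past"); append(out, cap, ONES[h12]); break;
    default: append(out, cap, "quarter to"); append(out, cap, ONES[next]); break;
    }
    return 1;
}
```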
TTS: Analysis
Text to phoneme challenges
"read" is pronounced like "red" (past tense) or like "reed" (present tense), depending on context.
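One common way to handle such homographs is a pronunciation lexicon keyed by word *and* part of speech. A minimal C sketch, assuming the tag comes from a part-of-speech tagger that is not shown; the `Tag` values and `lookup_pronunciation` are hypothetical, and the ARPAbet transcriptions are illustrative.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical coarse tags a part-of-speech tagger might produce. */
typedef enum { TAG_ANY, TAG_VERB_PRESENT, TAG_VERB_PAST, TAG_NOUN } Tag;

typedef struct {
    const char *word;
    Tag tag;            /* TAG_ANY: pronunciation does not depend on the tag */
    const char *phones; /* ARPAbet, the digit marks stress */
} Entry;

static const Entry LEXICON[] = {
    {"read",    TAG_VERB_PRESENT, "R IY1 D"},  /* like "reed" */
    {"read",    TAG_VERB_PAST,    "R EH1 D"},  /* like "red"  */
    {"red",     TAG_ANY,          "R EH1 D"},
    {"reed",    TAG_ANY,          "R IY1 D"},
    {"project", TAG_NOUN,         "P R AA1 JH EH0 K T"},
    {"project", TAG_VERB_PRESENT, "P R AH0 JH EH1 K T"},
};

/* Return the pronunciation of word given its tag, or NULL if unknown.
   An entry with the exact tag wins; a TAG_ANY entry is the fallback. */
const char *lookup_pronunciation(const char *word, Tag tag) {
    const char *fallback = NULL;
    for (size_t i = 0; i < sizeof LEXICON / sizeof LEXICON[0]; ++i) {
        if (strcmp(LEXICON[i].word, word) != 0) continue;
        if (LEXICON[i].tag == tag) return LEXICON[i].phones;
        if (LEXICON[i].tag == TAG_ANY) fallback = LEXICON[i].phones;
    }
    return fallback;
}
```

In "I have read it" a tagger would mark "read" as past tense and the lookup returns `R EH1 D`; in "I will read it" it returns `R IY1 D`.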
SYNTHESIS APPROACHES
TTS: Synthesis
1) Concatenative synthesis
2) Formant synthesis
TTS: Concatenative synthesis
Base strategy: concatenate segments of recorded speech.
Unit selection synthesis: uses phones, diphones, half-phones, syllables, morphemes, words, phrases and sentences. Gives the best results, often indistinguishable from human speech, but requires a huge amount of prerecorded data.
Diphone synthesis: uses a minimal database containing all diphones of a natural language (Spanish: about 800 diphones, German: about 2,500). Disadvantage: sonic glitches at the joins. Still used commercially, but on the decline.
Domain-specific synthesis: concatenates prerecorded words and sentences. Used in transport schedule announcements, weather reports, ... Simple to implement, with a high level of naturalness.
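The glitches mentioned above occur where two recorded units meet. One simple way to soften a join is a linear crossfade over a few samples; the C sketch below shows only that step, under the assumption that both units share a sample rate. It is not the method of any particular synthesizer: unit selection systems also choose units by target and join costs, and many apply pitch-synchronous techniques such as PSOLA at the joins.

```c
#include <stddef.h>

/* Concatenate two recorded units, overlapping the last `overlap` samples
   of a with the first `overlap` samples of b using a linear crossfade.
   out must have room for a_len + b_len - overlap samples.
   Returns the number of samples written. */
size_t concat_crossfade(const float *a, size_t a_len,
                        const float *b, size_t b_len,
                        size_t overlap, float *out) {
    if (overlap > a_len) overlap = a_len;
    if (overlap > b_len) overlap = b_len;
    size_t n = 0;
    /* Part of a before the join. */
    for (size_t i = 0; i < a_len - overlap; ++i) out[n++] = a[i];
    /* Join region: fade a out while fading b in. */
    for (size_t i = 0; i < overlap; ++i) {
        float w = (float)(i + 1) / (float)(overlap + 1); /* weight of b, 0 < w < 1 */
        out[n++] = (1.0f - w) * a[a_len - overlap + i] + w * b[i];
    }
    /* Rest of b. */
    for (size_t i = overlap; i < b_len; ++i) out[n++] = b[i];
    return n;
}
```

With `overlap = 0` this degenerates to plain concatenation, which is where audible clicks typically come from.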
TTS: Formant synthesis
Formant: a spectral peak of the sound spectrum of the voice.
Reproducing the first two (of four) formants is sufficient to distinguish vowels.
Easy to implement, but the result sounds rather artificial (“computer voice”).
Vowel   English F1   English F2   German F1   German F2
i       240 Hz       2400 Hz      320 Hz      3200 Hz
e       390 Hz       2300 Hz      500 Hz      2300 Hz
o       360 Hz       640 Hz       500 Hz      1000 Hz
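The claim that F1 and F2 alone are enough to tell these vowels apart can be illustrated with a nearest-neighbour lookup over the values in the formant table. This is a C sketch of the perception side, not a synthesizer (a formant synthesizer runs the other way, driving resonators tuned to such F1/F2 targets). Plain Euclidean distance in Hz is a simplifying assumption; perceptual models usually compare formants on a warped scale such as Bark or mel.

```c
#include <stddef.h>

typedef struct {
    char vowel;
    double f1_hz, f2_hz;
} FormantEntry;

/* Values from the formant table. */
static const FormantEntry ENGLISH[] = {
    {'i', 240.0, 2400.0}, {'e', 390.0, 2300.0}, {'o', 360.0, 640.0},
};
static const FormantEntry GERMAN[] = {
    {'i', 320.0, 3200.0}, {'e', 500.0, 2300.0}, {'o', 500.0, 1000.0},
};

/* Return the vowel whose (F1, F2) pair is closest to the measured one,
   or '?' if the table is empty. */
char classify_vowel(double f1_hz, double f2_hz,
                    const FormantEntry *table, size_t n) {
    char best = '?';
    double best_d = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double d1 = f1_hz - table[i].f1_hz;
        double d2 = f2_hz - table[i].f2_hz;
        double d = d1 * d1 + d2 * d2; /* squared distance in Hz^2 */
        if (best == '?' || d < best_d) { best = table[i].vowel; best_d = d; }
    }
    return best;
}
```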
TTS: Synthesis
               Concatenative                  Formant
Advantages     • High level of naturalness    • No large database required
                                              • Very intelligible, even at high speeds
Disadvantages  • Requires large database      • Low level of naturalness (“robotic” sound)
SPEECH SYNTHESIS DEMOS
TTS SDKs
• Siri
• iOS Voice Services
• Flite
• OpenEars (based on Flite)
• iSpeech
• Nuance
• AT&T
• Google TTS
• Bing TTS
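Using iOS Voice Services
Private API: not safe for the App Store. Use at your own risk!
#import <Foundation/Foundation.h>
// VSSpeechSynthesizer is private: its interface has to be declared by hand
// (not shown), and the class is looked up at runtime via NSClassFromString.
VSSpeechSynthesizer *speech =
    [[NSClassFromString(@"VSSpeechSynthesizer") alloc] init];
[speech setRate:1.0f];  // speaking rate
[speech startSpeakingString:@"Hello world, how are you"];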
OpenEars SDK
URL: http://www.politepix.com/openears/
Shared Source
Based on CMU Pocketsphinx, CMU Flite, and CMU-CLMTK
Works offline, both for recognition and synthesis
Currently only supports English
Synthetic sound (diphone voice synthesis)
Pricing: free, with additional paid voices
iSpeech SDK
URL: http://www.ispeech.org
Commercial, free access for testing
Needs a server connection
Supports several languages: English (US, UK, m/f), Spanish (m/f), Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...
Synthetic sound (diphone voice synthesis)
Pricing:
pay per use ($0.02 per transaction)
pay per install ($0.25 per install, minimum 10,000 installs)
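Which iSpeech plan is cheaper depends on how often each install calls the service: $0.25 / $0.02 = 12.5 transactions per install is the break-even point. A small C sketch of that arithmetic, using the prices quoted on the slide (they may have changed since). How the 10,000-install minimum is billed is an assumption here: installs below it are charged as if the minimum had been reached.

```c
/* Prices as quoted on the slide, in US cents to keep the arithmetic exact. */
#define PRICE_PER_TX_CENTS      2LL     /* pay per use: $0.02 per transaction */
#define PRICE_PER_INSTALL_CENTS 25LL    /* pay per install: $0.25 per install */
#define MIN_INSTALLS            10000LL /* minimum number of installs */

/* Pay per use: every transaction is billed. */
long long cost_pay_per_use_cents(long long installs, long long tx_per_install) {
    return installs * tx_per_install * PRICE_PER_TX_CENTS;
}

/* Pay per install. Assumption: fewer installs than the minimum are
   billed as if MIN_INSTALLS had been reached. */
long long cost_pay_per_install_cents(long long installs) {
    long long billed = installs < MIN_INSTALLS ? MIN_INSTALLS : installs;
    return billed * PRICE_PER_INSTALL_CENTS;
}
```

For 20,000 installs, 12 transactions each cost $4,800 on pay per use versus $5,000 on pay per install; at 13 transactions each, pay per use rises to $5,200 and pay per install wins.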
AT&T Speech SDK
URL: http://developer.att.com
Commercial, free trial access for 90 days
Pricing: USD 99/year, which grants 1,000,000 API calls per month
TTS API:
Web service: send text, get WAV back
Voices:
US English (male/female)
US Spanish (male/female)
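Because the AT&T TTS web service returns WAV audio, a client may want to check the format before playing or storing it. A minimal C sketch that reads the canonical 44-byte PCM WAV header (RIFF, then "fmt ", then "data"). The assumption that the response uses exactly this layout is not confirmed by the slide; WAV files may contain extra chunks (e.g. "LIST"), and a robust parser would walk the chunk list instead.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    uint16_t channels;
    uint32_t sample_rate;
    uint16_t bits_per_sample;
    uint32_t data_bytes;
} WavInfo;

/* WAV fields are little-endian. */
static uint16_t le16(const uint8_t *p) { return (uint16_t)(p[0] | (p[1] << 8)); }
static uint32_t le32(const uint8_t *p) {
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Parse a canonical 44-byte PCM WAV header. Returns 0 on success, -1 if
   the buffer does not start with exactly that layout. */
int parse_wav_header(const uint8_t *buf, size_t len, WavInfo *info) {
    if (len < 44) return -1;
    if (memcmp(buf, "RIFF", 4) != 0 || memcmp(buf + 8, "WAVE", 4) != 0) return -1;
    if (memcmp(buf + 12, "fmt ", 4) != 0 || le16(buf + 20) != 1) return -1; /* 1 = PCM */
    if (memcmp(buf + 36, "data", 4) != 0) return -1;
    info->channels = le16(buf + 22);
    info->sample_rate = le32(buf + 24);
    info->bits_per_sample = le16(buf + 34);
    info->data_bytes = le32(buf + 40);
    return 0;
}
```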
Nuance
URL: http://dragonmobile.nuancemobiledeveloper.com/
Commercial, free access for testing
Needs a server connection
Supports several languages: English (US, UK, m/f), Spanish (m/f), Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...
Fairly natural sound
Pricing:
Several service levels (Silver, Gold, Emerald)
Silver:
Up to 20 transactions per device per day, max. 500,000 devices
Gold:
Pay per device ($0.24 per install)
Pay per transaction ($0.009 per transaction)
Pre-payment of at least $3,000
WHAT IS SPEECH RECOGNITION?
Speech recognition is the translation of spoken words into text.
Speech recognition: History
1952: “Audrey”, developed at Bell Labs. Could recognize digits spoken by a single voice.
1962: “Shoebox” by IBM, demonstrated at the World's Fair. Could recognize 16 words spoken in English.
http://sysrun.haifa.il.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
1970s: DARPA Speech Understanding Research program. “Harpy”, developed at Carnegie Mellon University, could understand 1011 words.
http://www.youtube.com/watch?v=N3i6NoUZsSw
1980s: With statistical models (hidden Markov models), ASR vocabularies grew from a few hundred words to several thousand words and, potentially, to an unlimited number of words. Discrete dictation (pausing between words) was still required.
1990s: Dragon NaturallySpeaking (originally priced at $9000) supports continuous speech recognition.
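Speech recognition
(Analog) speech → preprocessing → recognition (decoder) → candidate texts → text
The decoder uses an acoustic model, a dictionary and a language model to produce several candidate transcriptions, from which the final text is chosen.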
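In the usual formulation, the decoder ranks candidates by combining two scores: log P(audio | words) from the acoustic model and log P(words) from the language model, the latter scaled by a language model weight λ. A C sketch of that final ranking step; the candidate sentences and their scores are made-up illustrations, not output of any real recognizer.

```c
#include <stddef.h>

typedef struct {
    const char *text;
    double acoustic_logp; /* log P(audio | words), from the acoustic model */
    double lm_logp;       /* log P(words), from the language model */
} Candidate;

/* Return the index of the candidate with the highest combined score
   acoustic_logp + lm_weight * lm_logp, or -1 if n == 0. */
long best_candidate(const Candidate *c, size_t n, double lm_weight) {
    long best = -1;
    double best_score = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double score = c[i].acoustic_logp + lm_weight * c[i].lm_logp;
        if (best < 0 || score > best_score) { best = (long)i; best_score = score; }
    }
    return best;
}
```

With hypothetical scores in which "how will the whether be tomorrow" fits the audio slightly better than "how will the weather be tomorrow", the acoustic model alone (`lm_weight = 0`) picks the wrong homophone; adding the language model (`lm_weight = 1`) picks the sensible sentence.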
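Speech recognition
States → (acoustic model) → phonemes → (dictionary) → words → (language model) → sentences
Example: states such as /’h/ and /’h/ → /a/ are mapped to phonemes, phonemes to words such as "how", "will", "the", "weather", "be", "tomorrow", "today", "show", "me", and the language model combines words into sentences such as "how will the weather be tomorrow".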
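Mapping audio frames to HMM states is classically done with the Viterbi algorithm, which finds the single most likely state sequence. A C sketch for a small discrete HMM, working with log probabilities so that products become sums and nothing underflows. The emission scores are assumed to be precomputed per frame (in a real recognizer they come from the acoustic model), and the array bounds are arbitrary limits for the sketch.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STATES 8
#define MAX_FRAMES 64

/* Viterbi decoding in the log domain.
   init[s]     : log P(first state = s)
   trans[p][s] : log P(state s | previous state p)
   emit[t][s]  : log P(observation at frame t | state s)
   path[t]     : receives the most likely state at frame t
   Returns the log probability of the best path. */
double viterbi(size_t n_states, size_t n_frames,
               const double init[MAX_STATES],
               double trans[MAX_STATES][MAX_STATES],
               double emit[MAX_FRAMES][MAX_STATES],
               size_t path[MAX_FRAMES]) {
    assert(n_states >= 1 && n_states <= MAX_STATES);
    assert(n_frames >= 1 && n_frames <= MAX_FRAMES);
    double score[MAX_FRAMES][MAX_STATES];
    size_t back[MAX_FRAMES][MAX_STATES];

    for (size_t s = 0; s < n_states; ++s) score[0][s] = init[s] + emit[0][s];

    for (size_t t = 1; t < n_frames; ++t) {
        for (size_t s = 0; s < n_states; ++s) {
            /* Best predecessor of state s at frame t. */
            size_t best_p = 0;
            double best = score[t - 1][0] + trans[0][s];
            for (size_t p = 1; p < n_states; ++p) {
                double cand = score[t - 1][p] + trans[p][s];
                if (cand > best) { best = cand; best_p = p; }
            }
            score[t][s] = best + emit[t][s];
            back[t][s] = best_p;
        }
    }

    /* Pick the best final state and follow the back-pointers. */
    size_t last = 0;
    for (size_t s = 1; s < n_states; ++s)
        if (score[n_frames - 1][s] > score[n_frames - 1][last]) last = s;
    path[n_frames - 1] = last;
    for (size_t t = n_frames - 1; t > 0; --t) path[t - 1] = back[t][path[t]];
    return score[n_frames - 1][last];
}
```

For a two-state, left-to-right model (state 0 = /’h/, state 1 = /a/) over three frames, the decoder returns the path 0, 1, 1, i.e. /’h/ → /a/ → /a/.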
SPEECH RECOGNITION DEMOS
Speech Recognition SDKs
• Siri
• iOS Voice Services
• Flite
• OpenEars (based on Flite)
• iSpeech
• Nuance
• AT&T
• Google TTS
• Bing TTS
OpenEars SDK
URL: http://www.politepix.com/openears/
Shared Source
Based on CMU Pocketsphinx, CMU Flite, and CMU-CLMTK
Works offline, both for recognition and synthesis
Vocabulary: needs to be provided by developer
Currently only supports English
Pricing: free, with additional paid voices
iSpeech SDK
URL: http://www.ispeech.org
Commercial, free access for testing
Needs a server connection
Supports several languages: English (US, UK, m/f), Spanish (m/f), Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...
Pricing:
pay per use ($0.02 per transaction)
pay per install ($0.25 per install, minimum 10,000 installs)
AT&T Speech SDK
URL: http://developer.att.com
Commercial, free trial access for 90 days
Pricing: USD 99/year, which grants 1,000,000 API calls per month
Supports several recognition contexts:
Gaming, Social Media, Web Search, Business Search, Voicemail to Text, SMS, Question and Answer, TV, Generic
Support for command mode:
provide the set of commands that are allowed in your app. Supports 19 languages (including English, German, Mandarin, Japanese, French, Italian)
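Command mode restricts recognition to a fixed set of commands on the service side. If an app instead receives free-form text, a similar effect can be approximated on the client by snapping the transcript to the closest allowed command. The C sketch below does that with Levenshtein edit distance; it is a hypothetical post-processing step, not part of the AT&T SDK, and `match_command` and its `max_edits` threshold are illustrative names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 64 /* longer strings are compared on their first MAX_LEN chars */

/* Levenshtein edit distance (insertions, deletions, substitutions). */
static size_t edit_distance(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    if (la > MAX_LEN) la = MAX_LEN;
    if (lb > MAX_LEN) lb = MAX_LEN;
    size_t prev[MAX_LEN + 1], cur[MAX_LEN + 1];
    for (size_t j = 0; j <= lb; ++j) prev[j] = j;
    for (size_t i = 1; i <= la; ++i) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; ++j) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1, ins = cur[j - 1] + 1;
            size_t m = sub < del ? sub : del;
            cur[j] = m < ins ? m : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Return the index of the allowed command closest to the transcript,
   or -1 if even the best match needs more than max_edits edits. */
int match_command(const char *transcript, const char *const commands[],
                  size_t n, size_t max_edits) {
    int best = -1;
    size_t best_d = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t d = edit_distance(transcript, commands[i]);
        if (best < 0 || d < best_d) { best = (int)i; best_d = d; }
    }
    return (best >= 0 && best_d <= max_edits) ? best : -1;
}
```

With the commands "play", "pause", "next track" and "previous track", a transcript of "next trak" maps to "next track", while "call mom" is rejected.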
Nuance
URL: http://dragonmobile.nuancemobiledeveloper.com/
Commercial, free access for testing
Needs a server connection
Supports several languages: English (US, UK), Spanish, Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...
Pricing:
Several service levels (Silver, Gold, Emerald)
Silver:
Up to 20 transactions per device per day, max. 500,000 devices
Gold:
Pay per device ($0.24 per install)
Pay per transaction ($0.009 per transaction)
Pre-payment of at least $3,000
OUTLOOK
Multi-modal UIs
Pixeltone
http://www.gierad.com/projects/pixeltone-a-multimodal-interface-for-image-editing/
Multi-modal input
http://elizaapp.com
Zühlke. Empowering Ideas.
@peterfriese
peter.friese@zuehlke.com
http://www.zuehlke.com
Want to learn more? Get in touch - I’m available for consulting:
http://slidesha.re/15xNxpf

More Related Content

Viewers also liked

Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognitionshanle03
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationssChandan Deb
 
Natural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesNatural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesXiang Li
 
Michael Hausenblas- Scalable time series and stream processing for IoT applic...
Michael Hausenblas- Scalable time series and stream processing for IoT applic...Michael Hausenblas- Scalable time series and stream processing for IoT applic...
Michael Hausenblas- Scalable time series and stream processing for IoT applic...WithTheBest
 
Multi Object Tracking | Presentation 1 | ID 103001
Multi Object Tracking | Presentation 1 | ID 103001Multi Object Tracking | Presentation 1 | ID 103001
Multi Object Tracking | Presentation 1 | ID 103001Md. Minhazul Haque
 
Personal Assistant Application Using Android
Personal Assistant Application Using AndroidPersonal Assistant Application Using Android
Personal Assistant Application Using AndroidAhmar Ansari
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)ananth
 
Building an Image Recognition Service - How to leverage IBM Watson for visual...
Building an Image Recognition Service - How to leverage IBM Watson for visual...Building an Image Recognition Service - How to leverage IBM Watson for visual...
Building an Image Recognition Service - How to leverage IBM Watson for visual...10x Nation
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition子毅 楊
 
Machine Intelligence Applications for IoT Slam Dec 1st 2016
Machine Intelligence Applications for IoT Slam Dec 1st 2016Machine Intelligence Applications for IoT Slam Dec 1st 2016
Machine Intelligence Applications for IoT Slam Dec 1st 2016Sudha Jamthe
 
Watson Internet of Things Hexamite
Watson Internet of Things HexamiteWatson Internet of Things Hexamite
Watson Internet of Things HexamiteJason Lu
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by IqbalIqbal
 

Viewers also liked (18)

Speech Recognition
Speech RecognitionSpeech Recognition
Speech Recognition
 
Open nlp presentationss
Open nlp presentationssOpen nlp presentationss
Open nlp presentationss
 
Google voice
Google voice Google voice
Google voice
 
Natural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital HumanitiesNatural Language Processing Tools for the Digital Humanities
Natural Language Processing Tools for the Digital Humanities
 
Michael Hausenblas- Scalable time series and stream processing for IoT applic...
Michael Hausenblas- Scalable time series and stream processing for IoT applic...Michael Hausenblas- Scalable time series and stream processing for IoT applic...
Michael Hausenblas- Scalable time series and stream processing for IoT applic...
 
Issues, Challenges and Perspectives of Digitization: the NLP Experience
Issues, Challenges and Perspectives of Digitization: the NLP ExperienceIssues, Challenges and Perspectives of Digitization: the NLP Experience
Issues, Challenges and Perspectives of Digitization: the NLP Experience
 
Multi Object Tracking | Presentation 1 | ID 103001
Multi Object Tracking | Presentation 1 | ID 103001Multi Object Tracking | Presentation 1 | ID 103001
Multi Object Tracking | Presentation 1 | ID 103001
 
Personal Assistant Application Using Android
Personal Assistant Application Using AndroidPersonal Assistant Application Using Android
Personal Assistant Application Using Android
 
An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)An overview of Hidden Markov Models (HMM)
An overview of Hidden Markov Models (HMM)
 
Building an Image Recognition Service - How to leverage IBM Watson for visual...
Building an Image Recognition Service - How to leverage IBM Watson for visual...Building an Image Recognition Service - How to leverage IBM Watson for visual...
Building an Image Recognition Service - How to leverage IBM Watson for visual...
 
Google voice
Google voice Google voice
Google voice
 
The Main Concepts of Speech Recognition
The Main Concepts of Speech RecognitionThe Main Concepts of Speech Recognition
The Main Concepts of Speech Recognition
 
Internet of Things (IoT) and Google
Internet of Things (IoT) and GoogleInternet of Things (IoT) and Google
Internet of Things (IoT) and Google
 
Machine Intelligence Applications for IoT Slam Dec 1st 2016
Machine Intelligence Applications for IoT Slam Dec 1st 2016Machine Intelligence Applications for IoT Slam Dec 1st 2016
Machine Intelligence Applications for IoT Slam Dec 1st 2016
 
Watson Internet of Things Hexamite
Watson Internet of Things HexamiteWatson Internet of Things Hexamite
Watson Internet of Things Hexamite
 
Seminar
SeminarSeminar
Seminar
 
Speech Recognition by Iqbal
Speech Recognition by IqbalSpeech Recognition by Iqbal
Speech Recognition by Iqbal
 
Why Learn NLP or go on an NLP Training : Webinair
 Why Learn NLP or go on an NLP Training : Webinair Why Learn NLP or go on an NLP Training : Webinair
Why Learn NLP or go on an NLP Training : Webinair
 

More from Peter Friese

Building Reusable SwiftUI Components
Building Reusable SwiftUI ComponentsBuilding Reusable SwiftUI Components
Building Reusable SwiftUI ComponentsPeter Friese
 
Firebase & SwiftUI Workshop
Firebase & SwiftUI WorkshopFirebase & SwiftUI Workshop
Firebase & SwiftUI WorkshopPeter Friese
 
Building Reusable SwiftUI Components
Building Reusable SwiftUI ComponentsBuilding Reusable SwiftUI Components
Building Reusable SwiftUI ComponentsPeter Friese
 
Firebase for Apple Developers - SwiftHeroes
Firebase for Apple Developers - SwiftHeroesFirebase for Apple Developers - SwiftHeroes
Firebase for Apple Developers - SwiftHeroesPeter Friese
 
 +  = ❤️ (Firebase for Apple Developers) at Swift Leeds
 +  = ❤️ (Firebase for Apple Developers) at Swift Leeds +  = ❤️ (Firebase for Apple Developers) at Swift Leeds
 +  = ❤️ (Firebase for Apple Developers) at Swift LeedsPeter Friese
 
async/await in Swift
async/await in Swiftasync/await in Swift
async/await in SwiftPeter Friese
 
Firebase for Apple Developers
Firebase for Apple DevelopersFirebase for Apple Developers
Firebase for Apple DevelopersPeter Friese
 
Building Apps with SwiftUI and Firebase
Building Apps with SwiftUI and FirebaseBuilding Apps with SwiftUI and Firebase
Building Apps with SwiftUI and FirebasePeter Friese
 
Rapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and FirebaseRapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and FirebasePeter Friese
 
Rapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and FirebaseRapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and FirebasePeter Friese
 
6 Things You Didn't Know About Firebase Auth
6 Things You Didn't Know About Firebase Auth6 Things You Didn't Know About Firebase Auth
6 Things You Didn't Know About Firebase AuthPeter Friese
 
Five Things You Didn't Know About Firebase Auth
Five Things You Didn't Know About Firebase AuthFive Things You Didn't Know About Firebase Auth
Five Things You Didn't Know About Firebase AuthPeter Friese
 
Building High-Quality Apps for Google Assistant
Building High-Quality Apps for Google AssistantBuilding High-Quality Apps for Google Assistant
Building High-Quality Apps for Google AssistantPeter Friese
 
Building Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on Google Building Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on Google Peter Friese
 
Building Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on GoogleBuilding Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on GooglePeter Friese
 
What's new in Android Wear 2.0
What's new in Android Wear 2.0What's new in Android Wear 2.0
What's new in Android Wear 2.0Peter Friese
 
Google Fit, Android Wear & Xamarin
Google Fit, Android Wear & XamarinGoogle Fit, Android Wear & Xamarin
Google Fit, Android Wear & XamarinPeter Friese
 
Introduction to Android Wear
Introduction to Android WearIntroduction to Android Wear
Introduction to Android WearPeter Friese
 
Google Play Services Rock
Google Play Services RockGoogle Play Services Rock
Google Play Services RockPeter Friese
 
Introduction to Android Wear
Introduction to Android WearIntroduction to Android Wear
Introduction to Android WearPeter Friese
 

More from Peter Friese (20)

Building Reusable SwiftUI Components
Building Reusable SwiftUI ComponentsBuilding Reusable SwiftUI Components
Building Reusable SwiftUI Components
 
Firebase & SwiftUI Workshop
Firebase & SwiftUI WorkshopFirebase & SwiftUI Workshop
Firebase & SwiftUI Workshop
 
Building Reusable SwiftUI Components
Building Reusable SwiftUI ComponentsBuilding Reusable SwiftUI Components
Building Reusable SwiftUI Components
 
Firebase for Apple Developers - SwiftHeroes
Firebase for Apple Developers - SwiftHeroesFirebase for Apple Developers - SwiftHeroes
Firebase for Apple Developers - SwiftHeroes
 
 +  = ❤️ (Firebase for Apple Developers) at Swift Leeds
 +  = ❤️ (Firebase for Apple Developers) at Swift Leeds +  = ❤️ (Firebase for Apple Developers) at Swift Leeds
 +  = ❤️ (Firebase for Apple Developers) at Swift Leeds
 
async/await in Swift
async/await in Swiftasync/await in Swift
async/await in Swift
 
Firebase for Apple Developers
Firebase for Apple DevelopersFirebase for Apple Developers
Firebase for Apple Developers
 
Building Apps with SwiftUI and Firebase
Building Apps with SwiftUI and FirebaseBuilding Apps with SwiftUI and Firebase
Building Apps with SwiftUI and Firebase
 
Rapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and FirebaseRapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and Firebase
 
Rapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and FirebaseRapid Application Development with SwiftUI and Firebase
Rapid Application Development with SwiftUI and Firebase
 
6 Things You Didn't Know About Firebase Auth
6 Things You Didn't Know About Firebase Auth6 Things You Didn't Know About Firebase Auth
6 Things You Didn't Know About Firebase Auth
 
Five Things You Didn't Know About Firebase Auth
Five Things You Didn't Know About Firebase AuthFive Things You Didn't Know About Firebase Auth
Five Things You Didn't Know About Firebase Auth
 
Building High-Quality Apps for Google Assistant
Building High-Quality Apps for Google AssistantBuilding High-Quality Apps for Google Assistant
Building High-Quality Apps for Google Assistant
 
Building Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on Google Building Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on Google
 
Building Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on GoogleBuilding Conversational Experiences with Actions on Google
Building Conversational Experiences with Actions on Google
 
What's new in Android Wear 2.0
What's new in Android Wear 2.0What's new in Android Wear 2.0
What's new in Android Wear 2.0
 
Google Fit, Android Wear & Xamarin
Google Fit, Android Wear & XamarinGoogle Fit, Android Wear & Xamarin
Google Fit, Android Wear & Xamarin
 
Introduction to Android Wear
Introduction to Android WearIntroduction to Android Wear
Introduction to Android Wear
 
Google Play Services Rock
Google Play Services RockGoogle Play Services Rock
Google Play Services Rock
 
Introduction to Android Wear
Introduction to Android WearIntroduction to Android Wear
Introduction to Android Wear
 

Recently uploaded

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Igalia
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxKatpro Technologies
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024The Digital Insurer
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processorsdebabhi2
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024The Digital Insurer
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024The Digital Insurer
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024The Digital Insurer
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slidevu2urc
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptxHampshireHUG
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsRoshan Dwivedi
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationRadu Cotescu
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerThousandEyes
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationSafe Software
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountPuma Security, LLC
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...apidays
 

Recently uploaded (20)

Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
Raspberry Pi 5: Challenges and Solutions in Bringing up an OpenGL/Vulkan Driv...
 
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptxFactors to Consider When Choosing Accounts Payable Services Providers.pptx
Factors to Consider When Choosing Accounts Payable Services Providers.pptx
 
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law DevelopmentsTrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
TrustArc Webinar - Stay Ahead of US State Data Privacy Law Developments
 
Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024Partners Life - Insurer Innovation Award 2024
Partners Life - Insurer Innovation Award 2024
 
Exploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone ProcessorsExploring the Future Potential of AI-Enabled Smartphone Processors
Exploring the Future Potential of AI-Enabled Smartphone Processors
 
Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024Tata AIG General Insurance Company - Insurer Innovation Award 2024
Tata AIG General Insurance Company - Insurer Innovation Award 2024
 
Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024Finology Group – Insurtech Innovation Award 2024
Finology Group – Insurtech Innovation Award 2024
 
Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024Axa Assurance Maroc - Insurer Innovation Award 2024
Axa Assurance Maroc - Insurer Innovation Award 2024
 
Histor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slideHistor y of HAM Radio presentation slide
Histor y of HAM Radio presentation slide
 
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
04-2024-HHUG-Sales-and-Marketing-Alignment.pptx
 
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live StreamsTop 5 Benefits OF Using Muvi Live Paywall For Live Streams
Top 5 Benefits OF Using Muvi Live Paywall For Live Streams
 
Scaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organizationScaling API-first – The story of a global engineering organization
Scaling API-first – The story of a global engineering organization
 
How to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected WorkerHow to Troubleshoot Apps for the Modern Connected Worker
How to Troubleshoot Apps for the Modern Connected Worker
 
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time AutomationFrom Event to Action: Accelerate Your Decision Making with Real-Time Automation
From Event to Action: Accelerate Your Decision Making with Real-Time Automation
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
Breaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path MountBreaking the Kubernetes Kill Chain: Host Path Mount
Breaking the Kubernetes Kill Chain: Host Path Mount
 
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
Apidays Singapore 2024 - Building Digital Trust in a Digital Economy by Veron...
 

Speech Recognition and Speech Synthesis on iOS

  • 1. Sp!ch Recognition and Sp!ch Syn"esis on iOS http://sysrun.haifa.il.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html
  • 3. Ever since we use computers, we have dreamt of using spoken language to communicate with them
  • 6. is the artificial production of human speech Sp!ch Syn"esis
  • 7. Sp!ch syn"esis: Hist#y 1769: Speaking machine, by Wolfgang von Kempelen (he also developed the famous Mechanical Turk) Functional representation of the human vocal tract. http://www.youtube.com/watch?v=zYRVqrfY3tQ 1970: Vocoder, custom built for Kraftwerk. http://www.youtube.com/watch?v=w-Jq7BHtQMA 1939: Vocoder (Vocal Encoder), developed by Horner Dudley for Bell Labs. Needed to be played (using a keyboard) by a trained operator. Exhibited at the 1939 World Fair. http://www.youtube.com/watch?v=CyaK22DMfF0
  • 8. Speech Synthesis: Most modern speech synthesis systems use electronic (computerized) approaches.
  • 9. Text to speech (TTS): Text → front end → back end → speech. In modern TTS systems, speech synthesis is a multi-step process divided into two main parts: 1) front end (analysis) and 2) back end (synthesis).
  • 10. Text to speech (TTS): The front end performs text analysis (text → words) and linguistic analysis (phrasing, intonation, duration; words → phonemes). The back end performs waveform generation (phonemes → speech).
  • 11. TTS: Analysis. Text normalization challenges: "My latest project is to learn how to better project my voice" (the noun "project" and the verb "project" are pronounced differently).
  • 12. TTS: Analysis. Text normalization challenges: "1430" can be read as "half past two", "one - four - three - zero", "fourteen hundred and thirty", or "one thousand four hundred thirty".
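A minimal sketch, in plain C (which also compiles as Objective-C), of how a normalizer could expand "1430" into the four readings above. The function names (`say_digits`, `say_cardinal`, `say_hundreds`, `say_time`) are hypothetical, and the sketch only produces each reading; choosing the right one from context is the hard part a real front end has to solve.

```c
#include <stddef.h>
#include <string.h>

static const char *ONES[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine",
    "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen", "sixteen",
    "seventeen", "eighteen", "nineteen"};
static const char *TENS[] = {
    "", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy", "eighty", "ninety"};

/* Appends word to buf, inserting a space if buf is not empty. */
static void append(char *buf, size_t size, const char *word) {
    if (buf[0] != '\0') strncat(buf, " ", size - strlen(buf) - 1);
    strncat(buf, word, size - strlen(buf) - 1);
}

/* Spells out 0..99, e.g. 30 -> "thirty", 14 -> "fourteen". */
static void say_below_100(int n, char *buf, size_t size) {
    if (n < 20) { append(buf, size, ONES[n]); return; }
    append(buf, size, TENS[n / 10]);
    if (n % 10 != 0) append(buf, size, ONES[n % 10]);
}

/* Converts a 24-hour hour to 12-hour form (0 and 12 -> 12). */
static int to12(int h) { h %= 12; return h == 0 ? 12 : h; }

/* Digit by digit: "1430" -> "one four three zero". */
void say_digits(const char *s, char *buf, size_t size) {
    buf[0] = '\0';
    for (; *s != '\0'; s++)
        if (*s >= '0' && *s <= '9') append(buf, size, ONES[*s - '0']);
}

/* Cardinal number, 0..9999: 1430 -> "one thousand four hundred thirty". */
void say_cardinal(int n, char *buf, size_t size) {
    buf[0] = '\0';
    if (n >= 1000) {
        append(buf, size, ONES[n / 1000]);
        append(buf, size, "thousand");
        n %= 1000;
        if (n == 0) return;
    }
    if (n >= 100) {
        append(buf, size, ONES[n / 100]);
        append(buf, size, "hundred");
        n %= 100;
        if (n == 0) return;
    }
    say_below_100(n, buf, size);
}

/* "Hundreds" reading of a four-digit number: 1430 -> "fourteen hundred and thirty". */
void say_hundreds(int n, char *buf, size_t size) {
    buf[0] = '\0';
    say_below_100(n / 100, buf, size);
    append(buf, size, "hundred");
    if (n % 100 != 0) {
        append(buf, size, "and");
        say_below_100(n % 100, buf, size);
    }
}

/* Clock time given as HHMM: 1430 -> "half past two". */
void say_time(int hhmm, char *buf, size_t size) {
    int h = hhmm / 100, m = hhmm % 100;
    buf[0] = '\0';
    switch (m) {
    case 0:  say_below_100(to12(h), buf, size); append(buf, size, "o'clock"); break;
    case 15: append(buf, size, "quarter past"); say_below_100(to12(h), buf, size); break;
    case 30: append(buf, size, "half past");    say_below_100(to12(h), buf, size); break;
    case 45: append(buf, size, "quarter to");   say_below_100(to12(h + 1), buf, size); break;
    default: /* digital style, e.g. 1407 -> "two oh seven" */
        say_below_100(to12(h), buf, size);
        if (m < 10) append(buf, size, "oh");
        say_below_100(m, buf, size);
    }
}
```

All four functions write into a caller-supplied buffer, so the same input can be rendered each way and handed to a later stage that picks a reading.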
  • 13. TTS: Analysis. Text-to-phoneme challenges: "read" can be pronounced like "red" (past tense) or like "reed" (present tense).
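Both homographs on these slides ("project" and "read") come down to choosing one of several lexicon entries from context. Below is a toy sketch in C, assuming ARPAbet transcriptions in the style of the CMU Pronouncing Dictionary and a hypothetical rule that looks only at the two preceding words. Real systems use a part-of-speech tagger instead, and the `Homograph` table and `pronounce` function are illustrative names.

```c
#include <stddef.h>
#include <string.h>

/* One homograph with two readings. Pronunciations are ARPAbet strings
   (digits mark stress), in the style of the CMU Pronouncing Dictionary. */
typedef struct {
    const char *word;
    const char *default_pron;   /* reading used when no trigger word matches */
    const char *alt_pron;       /* reading selected by a trigger word */
    const char *triggers[6];    /* NULL-terminated list of trigger words */
} Homograph;

/* Toy rules, not a real disambiguator:
   "project" is a noun (PRO-ject) unless "to" precedes it (verb, pro-JECT);
   "read" is present tense (like "reed") unless a perfect/passive auxiliary
   precedes it (like "red"). */
static const Homograph HOMOGRAPHS[] = {
    {"project", "P R AA1 JH EH0 K T", "P R AH0 JH EH1 K T", {"to", NULL}},
    {"read", "R IY1 D", "R EH1 D", {"have", "has", "had", "was", "were", NULL}},
};

/* Returns 1 if w is one of h's trigger words. */
static int is_trigger(const Homograph *h, const char *w) {
    if (w == NULL) return 0;
    for (const char *const *t = h->triggers; *t != NULL; t++)
        if (strcmp(*t, w) == 0) return 1;
    return 0;
}

/* Picks a pronunciation for word from the two preceding words (either may be
   NULL). Returns NULL if word is not a known homograph, i.e. the regular
   lexicon applies. */
const char *pronounce(const char *prev2, const char *prev1, const char *word) {
    for (size_t i = 0; i < sizeof HOMOGRAPHS / sizeof HOMOGRAPHS[0]; i++) {
        const Homograph *h = &HOMOGRAPHS[i];
        if (strcmp(h->word, word) != 0) continue;
        return (is_trigger(h, prev1) || is_trigger(h, prev2)) ? h->alt_pron
                                                              : h->default_pron;
    }
    return NULL;
}
```

On the example sentence, `pronounce("my", "latest", "project")` yields the noun reading and `pronounce("to", "better", "project")` the verb reading. A sentence such as "I read it yesterday" still gets the wrong (present-tense) reading, which shows why a two-word window is not enough in general.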
  • 15. TTS: Synthesis. 1) Concatenative synthesis 2) Formant synthesis
  • 16. TTS: Concatenative synthesis. Base strategy: concatenate segments of recorded speech. Unit selection synthesis: uses phones, diphones, half-phones, syllables, morphemes, words, phrases, and sentences. It gives the best results, often indistinguishable from human speech, but requires a huge amount of prerecorded data. Diphone synthesis: uses a minimal database containing all diphones of a natural language (English: 800 diphones, German: 2500 diphones). Disadvantage: audible glitches where units are joined. Still used commercially, but on the decline. Domain-specific synthesis: concatenates prerecorded words and sentences; used in transport schedule announcements, weather reports, and similar applications. Simple to implement, with a high level of naturalness.
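A sketch in C of two diphone-synthesis steps: turning a phoneme sequence into diphone unit names, and joining recorded units with a short linear crossfade to soften the glitches at the joins. Each diphone spans the transition from the middle of one phone to the middle of the next, so n phones padded with silence give n + 1 diphones. The silence label `pau`, the ARPAbet-style phone names, and both function names are assumptions. Fetching the recordings from a unit database is left out.

```c
#include <stddef.h>
#include <stdio.h>

#define SIL "pau"      /* silence label (an assumed, Festival-style name) */
#define UNIT_NAME 16   /* max length of a diphone name, including '\0' */

/* Writes the diphone names for a phoneme sequence into out[], padding both ends
   with silence: {"hh","ah","l","ow"} -> pau-hh, hh-ah, ah-l, l-ow, ow-pau.
   Returns the number of names written (n + 1 unless max_out is smaller). */
size_t to_diphones(const char *const *phones, size_t n,
                   char out[][UNIT_NAME], size_t max_out) {
    size_t count = 0;
    for (size_t i = 0; i <= n && count < max_out; i++) {
        const char *left = (i == 0) ? SIL : phones[i - 1];
        const char *right = (i == n) ? SIL : phones[i];
        snprintf(out[count++], UNIT_NAME, "%s-%s", left, right);
    }
    return count;
}

/* Appends a recorded unit to dst (currently len_dst samples), blending the last
   `overlap` samples of dst with the first `overlap` samples of the unit.
   dst must have room for len_dst - overlap + len_unit samples.
   Returns the new length of dst. */
size_t concat_crossfade(float *dst, size_t len_dst,
                        const float *unit, size_t len_unit, size_t overlap) {
    if (overlap > len_dst) overlap = len_dst;
    if (overlap > len_unit) overlap = len_unit;
    size_t start = len_dst - overlap;
    for (size_t i = 0; i < overlap; i++) {
        float w = (float)(i + 1) / (float)(overlap + 1);  /* weight of the new unit */
        dst[start + i] = (1.0f - w) * dst[start + i] + w * unit[i];
    }
    for (size_t i = overlap; i < len_unit; i++)
        dst[start + i] = unit[i];
    return start + len_unit;
}
```

A linear crossfade is the simplest way to hide a join. It cannot fix pitch or spectral mismatches between units, which is part of why diphone voices still sound synthetic.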
  • 17. TTS: Formant synthesis. Formant: a spectral peak of the sound spectrum of the voice. Reproducing the first two (of 4) formants is sufficient to distinguish vowels. Formant synthesis can be implemented quite easily, but it sounds rather artificial ("computer voice"). Formant frequencies (F1 / F2), English: i 240 Hz / 2400 Hz, e 390 Hz / 2300 Hz, o 360 Hz / 640 Hz; German: i 320 Hz / 3200 Hz, e 500 Hz / 2300 Hz, o 500 Hz / 1000 Hz.
  • 18. TTS: Synthesis. Concatenative: advantages: high level of naturalness; disadvantages: requires a large database. Formant: advantages: no large database required, very intelligible even at high speeds; disadvantages: low level of naturalness ("robotic" sound).
  • 20. TTS SDKs • Siri • iOS Voice Services • Flite • OpenEars (based on Flite) • iSpeech • Nuance • AT&T • Google TTS • Bing TTS
  • 22. Using iOS Voice Services (private API: not safe for the App Store, use at your own risk!): VSSpeechSynthesizer *speech = [[NSClassFromString(@"VSSpeechSynthesizer") alloc] init]; [speech setRate:(float)1.0]; [speech startSpeakingString:@"Hello world, how are you"];
  • 23. OpenEars SDK (http://www.politepix.com/openears/): shared source; based on CMU Pocketsphinx, CMU Flite, and CMU-CLMTK; works offline, for both recognition and synthesis; currently supports English only; synthetic sound (diphone voice synthesis); pricing: free, with additional paid voices.
  • 24. iSpeech SDK (http://www.ispeech.org): commercial, free access for testing; needs a server connection; supports several languages: English (US, UK, m/f), Spanish (m/f), Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...; synthetic sound (diphone voice synthesis); pricing: pay per use ($0.02 per transaction) or pay per install ($0.25 per install, minimum 10,000 installs).
  • 25. AT&T Speech SDK (http://developer.att.com): commercial, free trial access for 90 days; pricing: USD 99 per year, which grants 1,000,000 API calls per month; TTS API: web service (send text, get WAV back); voices: US English (male/female), US Spanish (male/female).
  • 26. Nuance (http://dragonmobile.nuancemobiledeveloper.com/): commercial, free access for testing; needs a server connection; supports several languages: English (US, UK, m/f), Spanish (m/f), Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...; rather natural sound; pricing: several service levels (Silver, Gold, Emerald). Silver: up to 20 transactions per device per day, max. 500,000 devices. Gold: pay per device ($0.24 per install) or pay per transaction ($0.009 per transaction), with a prepayment of at least $3000.
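The per-install and per-transaction prices on these slides imply a break-even usage level. A small C sketch of that arithmetic, using only the slides' figures:

* iSpeech: $0.25 per install equals $0.02 per transaction at 12.5 transactions per install.
* Nuance (Gold): $0.24 per install equals $0.009 per transaction at about 26.7 transactions per install.
* AT&T: USD 99 per year works out to $0.00000825 per call if the full 1,000,000-call monthly quota is used.

Treating iSpeech's "minimum 10,000 installs" as a billing floor is an assumption. The function names are hypothetical.

```c
/* Average transactions per install at which a per-install fee costs the same
   as per-transaction pricing; above it, paying per install is cheaper. */
double break_even_tx(double fee_per_install, double fee_per_tx) {
    return fee_per_install / fee_per_tx;
}

/* Effective price of one API call under a flat annual fee, assuming the whole
   monthly quota is used every month. */
double price_per_call(double annual_fee, double calls_per_month) {
    return annual_fee / (12.0 * calls_per_month);
}

/* Per-install cost when at least min_installs are billed (an assumed reading
   of "minimum 10,000 installs", not a documented billing rule). */
double per_install_cost(long installs, double fee_per_install, long min_installs) {
    long billed = installs < min_installs ? min_installs : installs;
    return (double)billed * fee_per_install;
}

/* Per-transaction cost for a number of installs and an average usage level. */
double per_tx_cost(long installs, double tx_per_install, double fee_per_tx) {
    return (double)installs * tx_per_install * fee_per_tx;
}
```

For example, 2,000 installs averaging 10 transactions each would cost $400 with iSpeech's pay-per-use pricing. Under the assumed minimum, per-install pricing would cost $2,500.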
  • 28. Speech Recognition: the translation of spoken words into text.
  • 29. Speech recognition: History. 1952: "Audrey", developed at Bell Labs, could recognize digits spoken by a single voice. 1962: "Shoebox" by IBM, demonstrated at the World's Fair, could recognize 16 words spoken in English (http://sysrun.haifa.il.ibm.com/ibm/history/exhibits/specialprod1/specialprod1_7.html). 1970s: DARPA Speech Understanding Research program; "Harpy", developed at Carnegie Mellon University, could understand 1011 words (video: http://www.youtube.com/watch?v=N3i6NoUZsSw). 1980s: With statistical models (Hidden Markov Models), ASR vocabularies grew from a few hundred words to several thousand words and, potentially, an unlimited number of words; dictation, however, still had to be discrete, with pauses between words. 1990s: Dragon (DragonDictate, originally priced at $9000) and later Dragon NaturallySpeaking, which supports continuous speech recognition.
  • 31. Speech recognition: The acoustic model maps states to phonemes (e.g. /h/, /h/ → /a/, /a/), the dictionary maps phonemes to words, and the language model maps words to sentences (e.g. "how will the weather be today / tomorrow", "show me").
  • 33. Speech Recognition SDKs • Siri • iOS Voice Services • Flite • OpenEars (based on Flite) • iSpeech • Nuance • AT&T • Google TTS • Bing TTS
  • 35. OpenEars SDK (http://www.politepix.com/openears/): shared source; based on CMU Pocketsphinx, CMU Flite, and CMU-CLMTK; works offline, for both recognition and synthesis; vocabulary must be provided by the developer; currently supports English only; pricing: free, with additional paid voices.
  • 36. iSpeech SDK (http://www.ispeech.org): commercial, free access for testing; needs a server connection; supports several languages: English (US, UK, m/f), Spanish (m/f), Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...; pricing: pay per use ($0.02 per transaction) or pay per install ($0.25 per install, minimum 10,000 installs).
  • 37. AT&T Speech SDK (http://developer.att.com): commercial, free trial access for 90 days; pricing: USD 99 per year, which grants 1,000,000 API calls per month; supports several recognition contexts: Gaming, Social Media, Web Search, Business Search, Voicemail to Text, SMS, Question and Answer, TV, Generic; supports a command mode, in which you provide the set of commands allowed in your app; supports 19 languages (including English, German, Mandarin, Japanese, French, Italian).
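A command mode restricts recognition to a fixed set of phrases. How the AT&T SDK implements this is not covered by the slide. As a rough client-side analogue (a swapped-in technique, not the SDK's API), a recognized phrase can be snapped to the nearest allowed command by Levenshtein edit distance and rejected if it is too far from all of them. The command strings, the `max_edits` threshold, and the function names below are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAX_LEN 64

/* Levenshtein edit distance (insertions, deletions, substitutions) between a
   and b. Only the first MAX_LEN - 1 characters of each string are compared. */
static size_t edit_distance(const char *a, const char *b) {
    size_t la = strlen(a), lb = strlen(b);
    if (la >= MAX_LEN) la = MAX_LEN - 1;
    if (lb >= MAX_LEN) lb = MAX_LEN - 1;
    size_t prev[MAX_LEN], cur[MAX_LEN];
    for (size_t j = 0; j <= lb; j++) prev[j] = j;
    for (size_t i = 1; i <= la; i++) {
        cur[0] = i;
        for (size_t j = 1; j <= lb; j++) {
            size_t sub = prev[j - 1] + (a[i - 1] != b[j - 1]);
            size_t del = prev[j] + 1;
            size_t ins = cur[j - 1] + 1;
            size_t m = sub < del ? sub : del;
            cur[j] = m < ins ? m : ins;
        }
        memcpy(prev, cur, (lb + 1) * sizeof prev[0]);
    }
    return prev[lb];
}

/* Returns the index of the allowed command closest to the recognized text,
   or -1 if even the best match needs more than max_edits edits. */
int match_command(const char *heard, const char *const *commands, size_t n,
                  size_t max_edits) {
    int best = -1;
    size_t best_d = (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        size_t d = edit_distance(heard, commands[i]);
        if (d < best_d) { best_d = d; best = (int)i; }
    }
    return (best_d <= max_edits) ? best : -1;
}
```

With the commands "show weather", "show calendar", and "stop", a misrecognized "show wether" maps to "show weather", while "play music" is rejected. The rejection threshold trades robustness against false matches.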
  • 38. Nuance (http://dragonmobile.nuancemobiledeveloper.com/): commercial, free access for testing; needs a server connection; supports several languages: English (US, UK), Spanish, Chinese, Japanese, Danish, Finnish, Italian, German, Russian, ...; pricing: several service levels (Silver, Gold, Emerald). Silver: up to 20 transactions per device per day, max. 500,000 devices. Gold: pay per device ($0.24 per install) or pay per transaction ($0.009 per transaction), with a prepayment of at least $3000.
  • 45. Zühlke. Empowering Ideas. @peterfriese peter.friese@zuehlke.com http://www.zuehlke.com Want to learn more? Get in touch - I’m available for consulting: http://slidesha.re/15xNxpf