Adam Coates at AI Frontiers: AI for 100 Million People with Deep Learning

Large-scale deep learning has made it possible for small teams of researchers and engineers to tackle hard AI problems that previously entailed massive engineering efforts. Adam shares the story of Baidu’s Deep Speech engine: how a recurrent neural network has evolved into a state-of-the-art production speech recognition system in multiple languages, often exceeding the abilities of native speakers. He covers the vision, the implementation, and some lessons learned to illustrate what it takes to build new AI technology that 100 million people will care about.

1. AI for 100 million people with deep learning
Adam Coates (@adampaulcoates)
Silicon Valley AI Lab
2. Silicon Valley AI Lab
• Mission: Develop hard AI technologies that let us have significant impact on hundreds of millions of users.
• Choose problems that significantly affect large numbers of people.
3. AI for 100 million people
• First goal: speech recognition everywhere.
• If you’re connecting to the internet for the first time in 2017, you’re likely using a mobile device. Speech will transform mobile device interfaces.
4. AI for 100 million people
• First goal: speech recognition everywhere.
• Mobile devices, captioning, home devices, cars / hands-free interfaces.
5. AI for 100 million people
• First goal: speech recognition everywhere.
• Chart: diversity vs. accuracy, comparing specialized solutions (e.g., IVR), traditional LVCSR, and human-level speech recognition.
6. Speech recognition: traditional ASR
• Traditional speech systems are built on standard ML + engineering practice: compute features, use an acoustic model to predict phonemes, combine them into a state sequence with a sequence model, then merge with pronunciation and language data (the language model) to produce the transcription “What time is it in Beijing?”
• Some applications can be solved this way, but it’s hard to scale our own cleverness.
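To make the contrast with the end-to-end approach later in the talk concrete, here is a toy sketch of that hand-engineered chain. Every stage is a placeholder stub: the function names, feature types, and scores are invented for illustration, not taken from a real ASR system.

```python
# Toy illustration of the traditional ASR chain: each stage is a separately
# engineered component; all names and values here are placeholders.
def compute_features(audio):
    # e.g., MFCC-style features, one vector per 10 ms frame
    return [frame for frame in audio]

def acoustic_model(features):
    # predict phoneme scores for each frame
    return [{"W": 0.7, "AH": 0.3} for _ in features]

def sequence_model(phoneme_scores):
    # combine per-frame scores into a phoneme/state sequence (e.g., via an HMM)
    return [max(frame, key=frame.get) for frame in phoneme_scores]

def language_model(phoneme_sequence):
    # merge with pronunciation and language data to pick the best word sequence
    return "what time is it in beijing"

def transcribe(audio):
    return language_model(sequence_model(acoustic_model(compute_features(audio))))

print(transcribe([0.0] * 100))  # every box has to be built and tuned by hand
```

The point of the slide is that each of those boxes is hand-built and hand-tuned, which is exactly what becomes hard to scale.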
7. Deep learning
• Major advantage of deep learning: scalability.
• Chart: performance vs. data & computation (effort), comparing deep learning algorithms with many previous methods.
8. Speech recognition with deep learning
• Replace most of the speech system with a large neural network: spectrograms go in, and a deep network combined with a simple language model (no linguistic info) produces the transcription “What time is it in Beijing?”
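For concreteness, here is one common way to compute the spectrogram input the slide refers to, using SciPy. The sample rate and frame sizes are assumptions for illustration, not the exact Deep Speech front end.

```python
# A minimal spectrogram front end (assumed parameters; illustration only).
import numpy as np
from scipy.signal import spectrogram

fs = 16000                                  # assume 16 kHz audio
audio = np.random.randn(3 * fs)             # 3 seconds of placeholder audio
freqs, times, spec = spectrogram(audio, fs=fs, nperseg=320, noverlap=160)  # ~20 ms windows, 10 ms hop
log_spec = np.log(spec + 1e-10)             # log power, shape (freq_bins, time_frames)
print(log_spec.shape)                       # this 2-D array is what the neural network consumes
```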
9. “Deep Speech”
• Pour effort into data + computation.
– Try to catch up to human accuracy by scaling.
• Chart: performance vs. data & computation, comparing Deep Speech with many previous methods.
10. Deep Speech
• Train giant neural networks to predict characters directly from audio (spectrogram input).
– Train “end to end” [e.g., Graves et al., 2006].
• The hard part is training at scale and searching for the best model.
• Need data + computing power.
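A minimal sketch of what “end to end” character-level training with a CTC loss (Graves et al., 2006) looks like, using PyTorch as an illustration. The toy model, sizes, and random data below are placeholders, not the actual Deep Speech network.

```python
# Minimal end-to-end CTC training step (illustrative placeholder model/data).
import torch
import torch.nn as nn

num_chars = 29                       # e.g., CTC blank + a-z + space + apostrophe
model = nn.Sequential(               # stand-in for the real recurrent network
    nn.Linear(161, 512), nn.ReLU(),
    nn.Linear(512, num_chars),
)
ctc = nn.CTCLoss(blank=0)

spectrogram = torch.randn(200, 8, 161)          # (time, batch, freq bins)
targets = torch.randint(1, num_chars, (8, 20))  # character indices per utterance
input_lengths = torch.full((8,), 200, dtype=torch.long)
target_lengths = torch.full((8,), 20, dtype=torch.long)

log_probs = model(spectrogram).log_softmax(dim=-1)   # (time, batch, chars)
loss = ctc(log_probs, targets, input_lengths, target_lengths)
loss.backward()                                      # gradients for the whole network at once
```

Because CTC sums over all alignments between audio frames and characters, no frame-level phoneme labels are needed, which is what lets accuracy improve by simply adding data and compute.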
11. Raw training data
• Hours of raw speech data: WSJ 80, Switchboard 300, Fisher 2,000, Deep Speech 11,940.
• We need a lot of data for end-to-end DL systems: use read speech.
12. Data augmentation
• Many forms of distortion that the model should be robust to:
– Reverb, noise, far-field effects, echo, compression artifacts, changes in tempo.
• Learn to be robust by training on data with distortions!
– Easier to engineer the data pipeline than to engineer the recognition pipeline.
• Diagram: raw audio ($$$$) → augmentation → novel audio.
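As a concrete example of the “engineer the data pipeline” idea, here is a minimal sketch of one such distortion: mixing background noise into clean speech at a chosen signal-to-noise ratio. The function and parameters are illustrative, not Baidu’s actual augmentation pipeline.

```python
# Mix background noise into clean speech at roughly a target SNR (sketch).
import numpy as np

def add_noise(speech: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Return speech with noise mixed in at approximately `snr_db` dB SNR."""
    noise = np.resize(noise, speech.shape)            # loop/trim noise to match length
    speech_power = np.mean(speech ** 2) + 1e-12
    noise_power = np.mean(noise ** 2) + 1e-12
    scale = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + scale * noise

clean = np.random.randn(16000)                        # placeholder clean utterance
cafe_noise = np.random.randn(8000)                    # placeholder noise recording
noisy = add_noise(clean, cafe_noise, snr_db=10.0)     # same speech, noisier "room"
```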
13. Example: far field
• Diagram: speech * impulse response = augmented far-field speech; compare with real far-field speech. [Billy Jun, Rewon Child, Sanjeev Satheesh]
• Reduces errors on far-field data by 15%-20%.
• Relies on model search + large-scale training.
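The far-field augmentation on this slide is a convolution with a room impulse response. A minimal sketch, assuming you have (or synthesize) an impulse response; the toy signals and level normalization are my additions.

```python
# Simulate far-field speech by convolving with a room impulse response (sketch).
import numpy as np
from scipy.signal import fftconvolve

def simulate_far_field(speech: np.ndarray, impulse_response: np.ndarray) -> np.ndarray:
    """Approximate far-field audio as speech convolved with a room impulse response."""
    far_field = fftconvolve(speech, impulse_response, mode="full")[: len(speech)]
    peak = np.max(np.abs(far_field)) + 1e-12
    return far_field / peak * np.max(np.abs(speech))   # keep levels roughly comparable

speech = np.random.randn(16000)                                      # placeholder close-talk audio
room_ir = np.exp(-np.linspace(0, 8, 4000)) * np.random.randn(4000)   # toy decaying impulse response
far = simulate_far_field(speech, room_ir)
```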
14. Augmented dataset
• Hours of speech data: WSJ 80, Switchboard 300, Fisher 2,000, Deep Speech ~96,000 (including synthesized data).
• Augmentation greatly expands available data.
– Trained models have heard decades of unique audio.
15. Compute
• Titan X GPU: ~6 TeraFLOPs.
• 1 experiment requires >10,000,000,000,000,000,000 FLOPs (10s of ExaFLOPs).
• “Speed of light” = 3-6 weeks on 1 GPU.
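A quick back-of-the-envelope check of the “3-6 weeks on 1 GPU” figure, assuming roughly 10 ExaFLOPs per experiment and 50-100% sustained utilization of a ~6 TFLOP/s card (the utilization range is my assumption):

```python
# Rough timing check for one experiment on a single GPU.
total_flops = 10e18                  # ~10 ExaFLOPs per experiment (from the slide)
peak_flops_per_sec = 6e12            # Titan X, ~6 TFLOP/s (from the slide)
for utilization in (1.0, 0.5):       # assumed sustained fraction of peak
    weeks = total_flops / (peak_flops_per_sec * utilization) / (3600 * 24 * 7)
    print(f"{utilization:.0%} of peak -> {weeks:.1f} weeks")
# ~2.8 weeks at full peak, ~5.5 weeks at half peak: consistent with 3-6 weeks.
```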
16. Compute
• Infiniband network: scale out to large numbers of GPUs (e.g., 8-64).
• Cut experiment times to ~3-5 days.
– Achieve ~50% of peak FLOPs on 8+ GPUs.
– Comparable to supercomputing workloads.
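A minimal sketch of the synchronous data-parallel pattern that this kind of scaling implies, using PyTorch DistributedDataParallel purely as an illustration; the talk’s system used its own synchronous SGD over InfiniBand, and the model and data below are placeholders.

```python
# Synchronous data-parallel training step across GPUs (illustrative sketch).
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step(ddp_model, optimizer, loss_fn, batch, targets):
    # Each process computes gradients on its shard of the batch; DDP
    # all-reduces (averages) gradients across GPUs before the update.
    optimizer.zero_grad()
    loss = loss_fn(ddp_model(batch), targets)
    loss.backward()                  # gradients synchronized here
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    dist.init_process_group("nccl")  # launched with one process per GPU, e.g. via torchrun
    local_rank = dist.get_rank() % torch.cuda.device_count()
    torch.cuda.set_device(local_rank)
    model = torch.nn.Linear(161, 29).cuda()          # placeholder model
    ddp_model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.SGD(ddp_model.parameters(), lr=1e-3)
    batch = torch.randn(32, 161).cuda()
    targets = torch.randint(0, 29, (32,)).cuda()
    train_step(ddp_model, optimizer, torch.nn.CrossEntropyLoss(), batch, targets)
```

With 8 GPUs at ~50% of peak, the same ExaFLOP-scale experiment drops from weeks to a few days, matching the slide’s 3-5 day figure.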
17. Deep Speech for Mandarin
• Deep Speech is driven by data.
– Mandarin is very different from English: “tonal”, thousands of characters.
• Diagram: English training data → Deep Speech training → “My favorite singer is…”
18. Deep Speech for Mandarin
• Deep Speech is driven by data.
– Mandarin is very different from English: “tonal”, thousands of characters.
• Diagram: Mandarin training data → Deep Speech training → “我最喜欢的歌手是…” (“My favorite singer is…”)
19. Deep Speech for Mandarin
• With a few changes, a single algorithm can learn an entirely new language.
– Competitive with a committee of native speakers for short audio clips.
• Learns hybrid speech (e.g., famous people, iPhones): 我最喜欢的歌手是Angelababy (“My favorite singer is Angelababy”).
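One way to picture the “few changes” point: the network and training loop stay the same, and essentially only the output alphabet (and the training data) change. A toy sketch with a placeholder model, not the real Deep Speech architecture:

```python
# Same end-to-end model family for two languages; only the output size changes.
import torch.nn as nn

class TinyEndToEndModel(nn.Module):
    """Placeholder stand-in for an end-to-end speech model."""
    def __init__(self, alphabet, num_features=161, hidden=512):
        super().__init__()
        self.rnn = nn.GRU(num_features, hidden, batch_first=True)  # language-agnostic core
        self.output = nn.Linear(hidden, len(alphabet) + 1)          # +1 for the CTC blank
    def forward(self, spectrogram):                                 # (batch, time, features)
        hidden_states, _ = self.rnn(spectrogram)
        return self.output(hidden_states).log_softmax(dim=-1)

english_alphabet = list("' abcdefghijklmnopqrstuvwxyz")   # ~29 output symbols
mandarin_alphabet = ["字%d" % i for i in range(6000)]      # stand-in for thousands of characters

english_model = TinyEndToEndModel(english_alphabet)
mandarin_model = TinyEndToEndModel(mandarin_alphabet)      # only the output layer size differs
```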
20. Making devices easier and more efficient
• Does speech make a difference? YES!
21. Comparing speech with keyboard input
• Compare user performance/experience for speech vs. traditional keyboard. [Ruan et al., arxiv.org/abs/1608.07323]
22. Speech is 3x faster than typing. [Ruan et al., arxiv.org/abs/1608.07323]
23. …and more accurate. [Ruan et al., arxiv.org/abs/1608.07323]
24. TalkType – voice-centric keyboard for Android
• Opportunity to rethink product experiences around speech and AI. [Nicky Chan, Bijit Halder, Kenny Liou, Thuan Nguyen, Nina Wei] talktype.baidu.com
25. AI for 100 million people
• Deep learning is closing the gap with humans on speech, through scalability.
– Still more to do, but it keeps getting better.
• Speech is already enabling a proliferation of new AI products.
– Let’s make them work for everyone.
26. Thank you!
Adam Coates
adamcoates@baidu.com
@adampaulcoates
If you want to help bring AI to 100s of millions of people, come talk to us! http://research.baidu.com
