Multimedia Information Processing
(1)
Koichi Shinoda
Tokyo Institute of Technology
Outline
• Theory and implementation of statistical
speech recognition
– Hidden Markov models
– Clustering, Bayes estimation, etc
– Speaker adaptation
• Video information retrieval
Syllabus
1. Introduction: Sound and speech
2. Speech analysis
3. Very simple speech recognition
4. Hidden Markov model (1)
5. Hidden Markov model (2)
6. Continuous speech recognition
7. Language model
8. Speaker adaptation
9. Video Information Retrieval (1)
10. Video Information Retrieval (2)
My CV
1987 Graduated from The University of Tokyo (Physics)
1989 MS from The University of Tokyo (Astronomical physics)
1989 Joined NEC Corporation. Research on speech recognition.
1997 Visiting Scholar at Bell Labs, NJ, USA (-1998)
2001 Dr. Eng. from Tokyo Institute of Technology
2001 Associate Professor of The University of Tokyo
2003 Associate Professor of Dept. Computer Science,
Tokyo Institute of Technology
Visiting Associate Professor of
The Institute of Statistical Mathematics
2013 Professor of Dept. Computer Science, Tokyo Tech
Research Area
Statistical Pattern Recognition (Speech, Video)
• Acoustic Modeling for speech recognition
– High speed calculation in pattern matching
– Autonomous model-size control
– Graphical Modeling
– Active learning
• Speaker Adaptation for speech recognition
– Rapid improvement with a small amount of user’s utterances.
• Robust speech recognition
– Noises, Microphones, Channels,...
• Video Information Retrieval
– Highlight scene extraction from the broadcast of sports
– High level feature extraction
– Event detection (Surveillance)
• Multimodal interface
– Simultaneous input interface of speech and gestures.
• Social Signal Processing
– Data mining from human-human communication
Speech recognition
• Familiar in SF novels (2001: A Space Odyssey,
Blade Runner, Star Wars,…)
• Now used in car navigation, voice search, call
center business, etc
Problems:
spontaneous speech, noisy environment, multimodality, conversation, etc
A brief history of speech recognition
1952: The first speech recognition system (10 digits, Bell Labs)
1952: Dynamic Programming (DP) was used in Operations Research
1968: The theory of Hidden Markov Models (Baum)
1976: Research on speech recognition using HMMs (IBM)
1978: Commercial speech recognition system using DP matching (10 digits, N
1983: The development of HMM-based continuous speech recognition (AT&T
1980s∼: Large projects (DARPA)
1990s∼: Software for continuous speech recognition using HMMs
Speech recognition algorithm
Simple pattern matching → DP matching → HMM
Signal Processing – Extraction of good features
⇓ Computational theory, Hardware
Information-Theoretic approach – Data mining from large database
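The DP matching step in the progression above can be illustrated with a minimal sketch. It assumes one-dimensional feature sequences and an absolute-difference local cost; the names `dp_matching_distance`, `recognize`, and the toy templates are hypothetical and not taken from the lecture. Real systems would compare frames of multi-dimensional acoustic features.

```python
def dp_matching_distance(reference, observed):
    """Accumulated DP-matching (dynamic time warping) cost between two sequences.

    Assumption for illustration: each element is a single number and the
    local cost is the absolute difference between aligned elements.
    """
    n, m = len(reference), len(observed)
    inf = float("inf")
    # cost[i][j] = best accumulated cost aligning reference[:i] with observed[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(reference[i - 1] - observed[j - 1])
            # A frame may be stretched, compressed, or matched one-to-one.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


def recognize(templates, observed):
    """Return the label whose template has the smallest DP-matching cost."""
    return min(templates, key=lambda label: dp_matching_distance(templates[label], observed))


# Hypothetical word templates; the observed sequence is a time-stretched "up".
templates = {"up": [1, 2, 3], "down": [3, 2, 1]}
best = recognize(templates, [1, 1, 2, 3, 3])
```

Unlike simple pattern matching, the alignment absorbs differences in speaking rate, which is why the stretched input still matches its template at zero cost.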
Gartner Hype Cycle for 2011
[Figure: hype cycle chart. Labeled technologies: Video Analysis for Consumer Service, Gesture Recognition, Image Recognition, Biometric Authentication Method, Speech Recognition. Annotation: "Babble Crash!"]
History of DARPA speech recognition benchmark tests
[Figure: word error rate (axis ticks at 100%, 10%, 1%) by year, 1988 to 2003. Speech types: read, spontaneous, conversational, broadcast. Tasks: Resource Management, ATIS, WSJ, NAB, Switchboard. Other labels: 1k, 5k, 20k, varied microphone, noisy, foreign.]
Courtesy NIST 1999 DARPA HUB-4 Report, Pallett et al.
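The word error rate on the vertical axis of this chart is commonly computed as the number of word substitutions, deletions, and insertions divided by the number of words in the reference transcript. The following is a minimal sketch of that standard definition, assuming whitespace-separated words; the function name and the example sentences are hypothetical, and benchmark scoring tools apply additional text normalization that is not shown here.

```python
def word_error_rate(reference, hypothesis):
    """Word error rate: (substitutions + deletions + insertions) / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # dist[i][j] = minimum number of word edits turning ref[:i] into hyp[:j]
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            substitution = dist[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1])
            deletion = dist[i - 1][j] + 1
            insertion = dist[i][j - 1] + 1
            dist[i][j] = min(substitution, deletion, insertion)
    return dist[len(ref)][len(hyp)] / len(ref)


# One substitution ("sat" -> "sit") and one deletion ("the") out of six words.
wer = word_error_rate("the cat sat on the mat", "the cat sit on mat")
```

Because insertions count as errors, the value can exceed 100% when the hypothesis is much longer than the reference.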
A speech recognition system
My Research in NEC
• Automatic interpretation system between Japanese and English
(1989–1991)
• Large vocabulary speech recognition hardware (1993-1994)
• Speech recognition software on MS Windows (1994-1995)
• Dictation software (1998-2001)
• Robot with speech recognition function.
• Speech recognition middleware for car navigation system
• Telephone speech recognition
• Japanese-English recognition
• Speech input interface for many applications, such as presentation,
home appliance, train transfer guide.
• Robust speech recognition with microphone array.
Speech to speech translation system
(Japanese ⇔ English) 1989-1991
• NEC’s CI
(Computer & Communication)
• Speech recognition + machine
translation + speech synthesis
• Hardware implementation
• Demo at Telecom91 (Genève)
• I made English speech
recognition tools.
Large Vocabulary Speech Recognition
Device (1993-1994)
• Name: DS-1000
• Recognizes 1000 isolated words
• 2-3 million yen
• Market: hand-busy, eyes-busy
– Classify meat by its quality
– Wrapping fish, vegetables
• Since the CPU was not fast, we designed a special LSI
• I went to the business department for 3 months
• Circuit diagram, Time chart, Simulator, etc.
Dictation Software (1998-2001)
• Smart Voice series
• Large vocabulary
continuous speech
recognition
• Database, Algorithms,
Evaluation,…
• Team leader for
acoustic model
development
Other projects
What you learn in this lecture
• Even beginners can run speech recognition
– Many tools and software: HTK, Sphinx, Juicer, T-cubed decoder
– But they do not know how it works
– They do not know how to solve problems
Speech recognition INSIDE
Textbook
• S. Furui, "Digital Speech Processing, Synthesis, and Recognition", Second Edition, Marcel Dekker, 2001.
• C. M. Bishop, "Pattern Recognition and Machine Learning", Springer, 2006.