Towards A Spoken Version of Google
─ making the global knowledge
accessible by voice
Lin-shan Lee
National Taiwan University
• Google, Amazon, Facebook, and YouTube provide most of the information
we use in daily life
• Physical knowledge archives (e.g. libraries, museums) that were useful in
the past are developing virtualized versions
• Online courses, distance learning, electronic books, etc. are becoming the
most efficient learning aids
The Internet is the Largest Archive of Global
Human Knowledge
Internet
Real-time Information
– weather, traffic
– flight schedule
– stock price
– sports scores
Special Services
– Google
– Facebook
– YouTube
– Amazon
Knowledge Archives
– digital libraries
– virtual museums
Intelligent Working Environment
– intelligent agents
– teleconferencing
– distance learning
– electronic commerce
Private Services
– personal assistants
– business databases
– home appliances
– network entertainment
Google and the Internet
• Have become an important part of our daily life
• Finding any desired information (in text) over the Internet
relevant to the user instructions (in text): Information
Retrieval
• Google is able to match and find the relevance between
segments of text
[Figure: user instructions (in text) go to Google, which returns documents/information (in text) held on servers across the Internet]
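The matching of text segments described on this slide can be illustrated with a minimal sketch. It assumes TF-IDF weighting with cosine similarity, a common textbook choice and not a description of how Google actually ranks results; the function names and the three sample documents are hypothetical.

```python
import math
from collections import Counter

def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for real tokenization."""
    return text.lower().split()

def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per tokenized document, plus the IDF table."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF so a term that appears in every document keeps a small weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(doc).items()} for doc in docs]
    return vectors, idf

def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def rank(query, documents):
    """Return document indices sorted from most to least relevant to the query."""
    docs = [tokenize(d) for d in documents]
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0 and are ignored.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [cosine(q, v) for v in vectors]
    return sorted(range(len(documents)), key=lambda i: scores[i], reverse=True)

# Hypothetical documents, for illustration only.
documents = [
    "weather and traffic report for today",
    "trade policy talks between two countries",
    "sports scores and flight schedule",
]
ranking = rank("trade policy", documents)
```

Here `ranking[0]` is the index of the document that shares the most weighted terms with the query.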
Google and the Internet
• Google can only read text information
– for multimedia information (e.g. YouTube, online courses, including
video and audio) to be retrieved by Google, the content needs to include
text descriptions
Google and the Internet
• The Whole Video (a lecture or a news story) can be retrieved,
but Not the Exact Sentence
– the text description covers a whole video or lecture, not individual sentences
Google and the Internet
• The Audio Information (the voice in a video or a lecture) very often
tells what is going on in the Multimedia
• The Audio can therefore be used as the Key for Information Retrieval
Speech Technologies Advancing Very Fast Today
• Speech Recognition / Synthesis Technologies
• Machines can Listen to Voice or Read Text
• All Roles of Text can be Realized by Voice
[Figure: speech recognition (machine) converts voice to text and speech synthesis (machine) converts text to voice, linking spoken language and written language; example utterances: "How are you today?", "Good morning!"]
[Figure: machines listen to the audio part of multimedia content on the Internet; Google handles voice information as well as text information, with voice input/output for the user]
• Information Retrieval as Google does can be performed based on Voice
• User-Content Interaction can be Accomplished by Spoken and Multi-
modal Dialogues (including other Modes of Interaction, e.g. those with
fingers)
• Text Information found can be transformed to Voice
[Figure: the Spoken Version of Google combines text-based Google over text content with voice-based information retrieval, speech synthesis, and spoken and multi-modal dialogue]
Google may have a Spoken Version
Information Retrieval (text/voice-based)
• Both the User Instructions and the Network Content can be in
the form of Voice
[Figure: a query such as "US/China Trade Policy?" is given as voice instructions or text instructions; Google (text/voice) matches it against documents d1, d2, d3 stored as text information or as voice information (e.g. "President Donald Trump…")]
Voice Interface is Convenient for All Different Kinds of
User Terminals
[Figure: user terminals reaching text content and multimedia content over the Internet]
• Smart phones, Hand-held Devices, Wearable Devices (Watches,
Glasses, etc.), Notebooks, Vehicular Electronics, Home
Appliances …
• Network Access at Any Time, from Anywhere
• Small in Size, Light in Weight, Ubiquitous, Invisible…
• Voice is the Only Interface Convenient for ALL User Terminals at
Any Time, from Anywhere, and To the Point in one Utterance
Google and the Internet
• Finding any desired information (in text) over the Internet
relevant to the user instructions (in text): Information
Retrieval
• Google is able to match and find the relevance between
segments of text
Google can have a Spoken Version
• Finding any desired information (in voice) over the Internet
relevant to the user instructions (in voice): Information
Retrieval
• Spoken version of Google is able to match and find the
relevance between segments of voice
[Figure: user instructions (in voice) go to the spoken version of Google, which returns documents/information (in voice) held on servers across the Internet]
Google can have a Spoken Version
• Can Locate the Exact Time the Desired Information Appears
in the Multimedia
• No Need for Text Descriptions
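Locating the exact time can be sketched as follows, assuming the spoken content has already been split into time-stamped utterances by a speech recognizer. The `Utterance` type, the word-overlap score, and the sample clip are hypothetical and only illustrate the idea; a real system would also need to cope with recognition errors.

```python
from dataclasses import dataclass

@dataclass
class Utterance:
    start: float  # seconds from the beginning of the recording
    end: float
    words: list   # recognizer output for this utterance (may contain errors)

def overlap_score(query_words, utterance):
    """Fraction of distinct query words that appear in the utterance."""
    if not query_words:
        return 0.0
    hits = set(query_words) & set(utterance.words)
    return len(hits) / len(set(query_words))

def locate(query, utterances):
    """Return (start, end) of the best-matching utterance, or None if nothing matches."""
    q = query.lower().split()
    best = max(utterances, key=lambda u: overlap_score(q, u), default=None)
    if best is None or overlap_score(q, best) == 0.0:
        return None
    return best.start, best.end

# Hypothetical recognizer output for a short news clip.
clip = [
    Utterance(0.0, 4.2, "good evening here is the news".split()),
    Utterance(4.2, 9.8, "talks on trade policy resumed today".split()),
    Utterance(9.8, 14.0, "now the weather forecast".split()),
]
span = locate("trade policy", clip)
```

The returned span gives the playback position of the matching spoken sentence, so no text description of the clip is involved.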
Retrieval of Public TV News of Taiwan (公視新聞搜尋, "Public TV news search")
• 2004
• Locate the Exact Relevant Utterances (Spoken Sentences)
without Text Descriptions
User-Content Interaction for Spoken Content
Retrieval
• Problems
– User-content interaction is always important, even for text content
– Unlike text content, spoken content is not easily summarized on screen,
so retrieved results are difficult to scan and select
[Figure: the user sends a query to the Spoken Version of Google, which searches the spoken archives and returns the retrieved results]
User-Content Interaction for Spoken Content
Retrieval
• Possible Approaches
– Automatic summary/title generation and keyword extraction from
spoken content
– Topic structure for spoken content
– Multi-modal dialogue with improved interaction
[Figure: the user sends a query to the Spoken Version of Google through a user interface and multi-modal dialogue; results retrieved from the spoken archives are presented with keywords, titles, summaries, and a topic structure]
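Summarization of A Segment of Voice
• Selecting the most representative utterances while avoiding redundancy
[Figure: a segment of voice between times t1 and t2 made of utterances X1 to X6, containing correctly and wrongly recognized words; the summary consists of selected utterances, e.g. X1 and X3]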
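Picking representative utterances while avoiding redundancy can be sketched with a greedy selection in the style of maximal marginal relevance. This is an assumed stand-in technique, not necessarily the method behind the slide; the Jaccard word overlap, the `trade_off` parameter, and the sample utterances are hypothetical.

```python
def jaccard(a, b):
    """Word-set overlap between two token lists, in [0, 1]."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def select_summary(utterances, k, trade_off=0.7):
    """Greedily pick k utterance indices, rewarding relevance and penalizing redundancy."""
    segment_words = [w for u in utterances for w in u]
    selected = []
    candidates = list(range(len(utterances)))
    while candidates and len(selected) < k:
        def score(i):
            # Relevance: how well the utterance covers the whole segment.
            relevance = jaccard(utterances[i], segment_words)
            # Redundancy: closest match among utterances already selected.
            redundancy = max(
                (jaccard(utterances[i], utterances[j]) for j in selected),
                default=0.0,
            )
            return trade_off * relevance - (1 - trade_off) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return sorted(selected)  # keep the original time order

# Hypothetical recognized utterances (already tokenized).
utterances = [
    "speech recognition converts voice to text".split(),
    "speech recognition converts voice to text quickly".split(),
    "retrieval finds relevant documents".split(),
]
summary = select_summary(utterances, k=2)
```

With these inputs the second pick skips the near-duplicate first utterance in favor of the one about retrieval, which is the redundancy-avoidance behavior the slide describes.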
Topic Structure (1/2)
• Example 1: retrieved results Grouped by Topics and
organized in a Two-dimensional Tree Structure
– each group of retrieved segments labeled by a set of keywords (topic)
– each group expanded into a map in the next layer
Broadcast News Browser (2006) (電視新聞瀏覽器, "TV news browser")
Demonstration
[Screenshot labels: Summary, News Video]
Online Courses
• Learning from a complete online course usually takes a very long time
(e.g. 45 hrs)
– difficult for very busy people to learn pieces of knowledge
• It is possible to retrieve a segment of a lecture with a browser, but
– knowledge transfer in a course is usually sequential
– it is not easy to understand a lecture without listening to the previous
lectures
– it is not easy to find background or related knowledge
• Possible Approaches
– consecutive audio/video for each slide taken as a segment of
multimedia (a page of spoken slide)
– each spoken slide labeled by the keywords (topic) extracted from the
audio
– relationships between keywords of the course represented by a graph
Topic Structure (2/2)
• Example 2: Keyword Graph
– each spoken slide labeled by a set of keywords (topics)
– relationships between keywords represented by a graph
[Figure: spoken slides linked to a keyword graph with nodes such as Acoustic Modeling, Viterbi search, HMM, Language Modeling, and Perplexity]
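One simple way to picture the keyword graph is as a co-occurrence graph: two keywords are linked when they label the same spoken slide. The sketch below assumes that construction, which the slides do not specify; the slide identifiers and keyword labels are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def build_keyword_graph(slide_keywords):
    """Map each keyword pair to the number of spoken slides the two keywords share."""
    weights = defaultdict(int)
    for keywords in slide_keywords.values():
        # Sort so each pair has one canonical (a, b) ordering.
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)

def related(graph, keyword):
    """Keywords linked to `keyword`, strongest link first, ties in alphabetical order."""
    links = []
    for (a, b), w in graph.items():
        if keyword == a:
            links.append((w, b))
        elif keyword == b:
            links.append((w, a))
    return [k for w, k in sorted(links, key=lambda x: (-x[0], x[1]))]

# Hypothetical keyword labels for three spoken slides.
slides = {
    "slide-1": ["HMM", "Acoustic Modeling"],
    "slide-2": ["HMM", "Viterbi search", "Acoustic Modeling"],
    "slide-3": ["Language Modeling", "Perplexity"],
}
graph = build_keyword_graph(slides)
neighbors = related(graph, "HMM")
```

A learner who lands on a slide about HMM could then be pointed to slides labeled with the neighboring keywords for background or related knowledge.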
NTU Virtual Instructor (2009)
(台大虛擬教師)
Demonstration
Too Many Online Courses
• A user enters a keyword or a key phrase into Coursera
752 matches
Machines Listening to the Online Courses
[Figure: for three courses on some similar topic, machines identify lectures with very similar content and the sequential order for learning (prerequisite conditions)]
Learning Map Produced by
Machine (2014)
(機器製作學習地圖)
Demonstration
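Once prerequisite conditions between lectures are known, a sequential order for learning can be derived with a topological sort. The sketch below uses Python's standard `graphlib` module; the lecture names and prerequisite relations are hypothetical, and how the machine infers prerequisites from the audio is not covered here.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical prerequisite relations: lecture -> lectures to hear first.
prerequisites = {
    "Viterbi search": {"HMM"},
    "Acoustic Modeling": {"HMM"},
    "Perplexity": {"Language Modeling"},
    "HMM": set(),
    "Language Modeling": set(),
}

def learning_order(prereqs):
    """Return lectures in an order that respects every prerequisite."""
    return list(TopologicalSorter(prereqs).static_order())

order = learning_order(prerequisites)
```

Any order returned this way places each lecture after the lectures it depends on; a cycle in the prerequisites raises `graphlib.CycleError`.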
Machine Comprehension of Spoken Content
• To Which Degree can Machines Understand the Spoken Content?
• Machines to take TOEFL Listening Comprehension Test (an
English test for students whose native language is not English)
• An Example Problem:
Recorded voice: (5 min long)
A Question: “What is a possible origin of Venus’ clouds?”
Answer options:
(A) gases released as a result of volcanic activity
(B) chemical reactions caused by high surface temperatures
(C) bursts of radio energy from the planet's surface
(D) strong winds that blow dust into the atmosphere
Machine Comprehension of Spoken Content
• Machine Performance 2017
[Chart: accuracy (%) of approaches (1) to (5), including naive approaches that use some hand-crafted rules without listening to the recorded voice, e.g. choosing the shortest answer]
Best Machine Performance: 50.0%
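The naive "choosing the shortest answer" rule can be written in a few lines, which makes clear that it never looks at the recorded voice. In this sketch length is measured in words (an assumption; characters would also be possible), and the two-question test set is hypothetical, not taken from the TOEFL data.

```python
def choose_shortest(options):
    """Return the label of the option with the fewest words (first one on ties)."""
    return min(options, key=lambda label: len(options[label].split()))

def accuracy(predictions, answers):
    """Fraction of questions answered correctly."""
    correct = sum(p == a for p, a in zip(predictions, answers))
    return correct / len(answers) if answers else 0.0

# Hypothetical two-question test set, for illustration only.
questions = [
    {"A": "a long and detailed option", "B": "short one",
     "C": "another fairly long option", "D": "yet another long option here"},
    {"A": "tiny", "B": "a much longer option",
     "C": "some other option", "D": "one more option text"},
]
gold = ["B", "C"]
predictions = [choose_shortest(q) for q in questions]
score = accuracy(predictions, gold)
```

Such a rule serves as a baseline: a system that actually listens to the recording is only interesting if it beats it.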
What is Technically
Achievable in the Future?
What can Machines do for Humans?
• Multimedia Content is increasing exponentially over the Internet
– the best archive of global human knowledge is here
– desired information is deeply buried under huge quantities
of unrelated information
– 300 hrs of videos uploaded per min (2015.01)
– roughly 2000 online courses on Coursera (2016.04)
• Nobody can go through so much multimedia
information, but Machines can
• Machines can listen to and understand the entire multimedia
knowledge archive over the Internet
– extracting desired information for each individual user
An Example: Personalized Courses
• Machines generate desired personalized courses for each
individual user
user: "I wish to learn some knowledge about the masterpieces of Wolfgang Amadeus Mozart. I am an engineer. I know nothing about music. I can spend 3 hrs to learn."
machine: "Thank you for your request. This is the 3-hr personalized course for you."
Information
from Internet
Spoken Version of Google
• Speech Recognition and Synthesis Technologies Make
Machines Capable of Listening to and Speaking Human Voice
– at best, machines may do as well as humans
• Machines can Handle Huge Quantities of Information
– much more efficiently than humans
• Google Reads All Text over the Internet
– similarly, machines can listen to all voices over the Internet
• The Internet is the Largest Archive of Global Human
Knowledge
– voice can be the key to that archive

Towards A Spoken Version of Google

  • 1. Towards A Spoken Version of Google ─ making the global knowledge accessible by voice Lin-shan Lee National Taiwan University
  • 2. Internet is the Only Largest Archive for Global Human Knowledge • Google, Amazon, Facebook, YouTube offer most of our daily life information • Physical knowledge archives (e.g. libraries, museums) useful in the past are developing their virtualized versions • Online courses, distance learning, electronic books, etc. become the most efficient learning aids • Internet: Real-time Information – weather, traffic – flight schedule – stock price – sports scores • Special Services – Google – Facebook – YouTube – Amazon • Knowledge Archives – digital libraries – virtual museums • Intelligent Working Environment – intelligent agents – teleconferencing – distance learning – electronic commerce • Private Services – personal assistants – business databases – home appliances – network entertainment
  • 3. Google and the Internet • Have been An Important Part of our Daily Life • Finding any desired information (in text) over the Internet relevant to the user instructions (in text): Information Retrieval • Google is able to match and find the relevance between segments of text [Diagram: User instructions (in text), Google, Internet servers, Documents/Information (in text)]
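The matching of text segments described on slide 3 can be illustrated with a minimal sketch. The slides do not say how Google scores relevance; TF-IDF weighting with cosine similarity is used here only as a familiar stand-in, and the function names (`tokenize`, `tfidf_vectors`, `cosine`, `rank`) are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: whitespace tokenization is enough for this illustration.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict) per document, plus the IDF table."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    # Smoothed IDF, so every known term gets a positive weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    vectors = [{t: tf * idf[t] for t, tf in Counter(toks).items()}
               for toks in tokenized]
    return vectors, idf


def cosine(a, b):
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def rank(query, docs):
    """Return document indices ordered from most to least relevant."""
    vectors, idf = tfidf_vectors(docs)
    # Query terms unseen in the collection get weight 0.
    q = {t: tf * idf.get(t, 0.0) for t, tf in Counter(tokenize(query)).items()}
    scores = [(cosine(q, v), i) for i, v in enumerate(vectors)]
    return [i for _, i in sorted(scores, key=lambda p: (-p[0], p[1]))]


docs = [
    "weather forecast for taipei today",
    "stock price of tech companies",
    "flight schedule from taipei",
]
print(rank("stock price", docs))
```

A voice-based version would apply the same kind of scoring to recognized speech rather than typed text, with recognition errors as an extra source of noise.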
  • 4. Google and the Internet • Have been An Important Part of our Daily Life • Google can only read text information – for multimedia information (e.g. YouTube, online courses, including video and audio) to be retrieved by Google, the content needs to include text descriptions [Diagram: User instructions (in text), Google, Internet servers, Documents/Information (in text)]
  • 5. Google and the Internet • Have been An Important Part of our Daily Life • The Whole Video (a lecture or a news story) can be retrieved, but Not the Exact Sentence – the text description is for a whole video or lecture [Diagram: User instructions (in text), Google, Internet servers, Documents/Information (in text)]
  • 6. Google and the Internet • Have been An Important Part of our Daily Life • The Audio Information (voice in a video or a lecture) in Multimedia very often tells what is going on in the Multimedia • Can be used as the Key for Information Retrieval [Diagram: User instructions (in text), Google, Internet servers, Documents/Information (in text)]
  • 7. Speech Technologies Advancing Very Fast Today • Speech Recognition / Synthesis Technologies • Machines can Listen to Voice or Read Text • All Roles of Text can be Realized by Voice [Diagram: Speech Recognition (machine) converts voice (spoken language) to text (written language); Speech Synthesis (machine) converts text to voice; example utterances: "How are you today?", "Good morning!"]
  • 8. Google may have a Spoken Version • Information Retrieval as Google does can be performed based on Voice • User-Content Interaction can be Accomplished by Spoken and Multi-modal Dialogues (including other Modes of Interaction, e.g. those with fingers) • Text Information found can be transformed to Voice [Diagram: Internet with Text Content and Multimedia Content; machines listen to the audio part; Google (text-based) uses text information; Spoken Version of Google comprises Information Retrieval (voice-based) on voice information, Speech Synthesis, and Spoken and Multi-modal Dialogue with voice input/output]
  • 9. Information Retrieval (text/voice-based) • Both the User Instructions and Network Content can be in the form of Voice [Diagram: Text Instructions or Voice Instructions (e.g. "US/China Trade Policy?") handled by Google/voice; Text Information (documents d1, d2, d3) and Voice Information (documents d1, d2, d3, e.g. "President Donald Trump…")]
  • 10. Voice Interface is Convenient for All Different Kinds of User Terminals • Smart phones, Hand-held Devices, Wearable Devices (Watches, Glasses, etc.), Notebooks, Vehicular Electronics, Home Appliances … • Network Access at Any Time, from Anywhere • Small in Size, Light in Weight, Ubiquitous, Invisible… • Voice is the Only Interface Convenient for ALL User Terminals at Any Time, from Anywhere, and To the Point in one Utterance [Diagram: Internet, Text Content, Multimedia Content]
  • 11. Google and the Internet • Have been An Important Part of our Daily Life • Finding any desired information (in text) over the Internet relevant to the user instructions (in text): Information Retrieval • Google is able to match and find the relevance between segments of text [Diagram: User instructions (in text), Google, Internet servers, Documents/Information (in text)]
  • 12. Google can have a Spoken Version • Have been An Important Part of our Daily Life • Finding any desired information (in voice) over the Internet relevant to the user instructions (in voice): Information Retrieval • Spoken version of Google is able to match and find the relevance between segments of voice [Diagram: User instructions (in voice), Google, Internet servers, Documents/Information (in voice)]
  • 13. Google can have a Spoken Version • Have been An Important Part of our Daily Life • Can Locate the Exact Time the Desired Information Appears in the Multimedia • No Need for Text Descriptions [Diagram: User instructions (in voice), Google, Internet servers, Documents/Information (in voice)]
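Locating the exact time at which desired information appears (slide 13) can be sketched as a search over time-stamped utterances. This assumes a speech recognizer has already produced `(start_seconds, end_seconds, text)` tuples; that data layout and the function name `locate` are illustrative assumptions, not something the slides specify.

```python
def locate(query, utterances):
    """Return (start, end) time spans of utterances sharing words with the query.

    Spans are ordered by number of shared query words (most first),
    then by start time.
    """
    terms = set(query.lower().split())
    hits = []
    for start, end, text in utterances:
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            hits.append((overlap, start, end))
    hits.sort(key=lambda h: (-h[0], h[1]))
    return [(start, end) for _, start, end in hits]


# Hypothetical recognizer output for a short news clip.
clip = [
    (0.0, 4.2, "good evening and welcome"),
    (4.2, 9.8, "the trade policy talks resumed today"),
    (9.8, 15.0, "in sports the home team won"),
]
print(locate("trade policy", clip))
```

Because the match is made on the recognized audio itself, no manually written text description of the video is needed, and playback can jump straight to the returned span.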
  • 14. Retrieval of Public TV News of Taiwan (公視新聞 搜尋) • 2004 • Locate the Exact Relevant Utterances (Spoken Sentences) without Text Descriptions
  • 15. Retrieval of Public TV News of Taiwan (公視新聞 搜尋) • 2004 • Locate the Exact Relevant Utterances (Spoken Sentences) without Text Descriptions
  • 16. User-Content Interaction for Spoken Content Retrieval • Problems – User-content interaction always important even for text content – Unlike text content, spoken content not easily summarized on screen, thus retrieved results difficult to scan and select [Diagram: User, Query, Spoken Version of Google, Spoken Archives, Retrieved Results]
  • 17. User-Content Interaction for Spoken Content Retrieval • Possible Approaches – Automatic summary/title generation and keyword extraction from spoken content – Topic structure for spoken content – Multi-modal dialogue with improved interaction [Diagram: User, Query, Multi-modal Dialogue, User Interface, Spoken Version of Google, Spoken Archives, Retrieved Results, Keyword/Titles/Summaries, Topic Structure]
  • 18. Summarization of A Segment of Voice • Selecting most representative utterances but avoiding redundancy [Diagram: utterances X1 to X6 containing correctly recognized and wrongly recognized words, with labels t1 and t2; X1 and X3 form the summary]
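Selecting representative utterances while avoiding redundancy (slide 18) can be sketched with a greedy selection in the style of maximal marginal relevance. The slide does not name the method actually used; the Jaccard word overlap, the `trade_off` weight, and the function names here are assumptions for illustration, and the handling of wrongly recognized words shown in the diagram is not modeled.

```python
def jaccard(a, b):
    """Word-set overlap between two token lists, in [0, 1]."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def summarize(utterances, k=2, trade_off=0.7):
    """Greedily pick k utterance indices: representative but not redundant."""
    tokens = [u.lower().split() for u in utterances]
    all_words = [w for t in tokens for w in t]
    # Representativeness: overlap of each utterance with the whole segment.
    relevance = [jaccard(t, all_words) for t in tokens]
    selected = []
    while len(selected) < min(k, len(utterances)):
        best, best_score = None, float("-inf")
        for i in range(len(utterances)):
            if i in selected:
                continue
            # Redundancy: highest overlap with anything already selected.
            redundancy = max(
                (jaccard(tokens[i], tokens[j]) for j in selected), default=0.0
            )
            score = trade_off * relevance[i] - (1 - trade_off) * redundancy
            if score > best_score:
                best, best_score = i, score
        selected.append(best)
    return sorted(selected)  # keep the original utterance order


segment = [
    "speech recognition turns voice into text",
    "speech recognition turns voice into text again",
    "retrieval finds relevant documents",
]
print(summarize(segment, k=2))
```

In this toy segment the first two utterances are near duplicates, so the redundancy penalty pushes the second pick to the third utterance instead of the first.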
  • 19. User-Content Interaction for Spoken Content Retrieval • Possible Approaches – Automatic summary/title generation and keyword extraction from spoken content – Topic structure for spoken content – Multi-modal dialogue with improved interaction [Diagram: User, Query, Multi-modal Dialogue, User Interface, Spoken Version of Google, Spoken Archives, Retrieved Results, Keyword/Titles/Summaries, Topic Structure]
  • 20. Topic Structure (1/2) • Example 1: retrieved results Grouped by Topics and organized in a Two-dimensional Tree Structure – each group of retrieved segments labeled by a set of keywords (topic) – each group expanded into a map in the next layer
  • 21. Broadcast News Browser (2006) (電視新聞瀏覽器) Demonstration Summary NewsVideo
  • 22. Online Courses • Usually very long time required (e.g. 45 hrs) to learn from a complete online course – difficult for very busy people to learn pieces of knowledge • Possible to retrieve a segment of lecture with a browser – knowledge transfer in a course is usually sequential – not easy to understand a lecture without listening to the previous lectures – not easy to find out background or related knowledge • Possible Approaches – consecutive audio/video for each slide taken as a segment of multimedia (a page of spoken slide) – each spoken slide labeled by the keywords (topic) extracted from the audio – relationships between keywords of the course represented by a graph
  • 23. Topic Structure (2/2) • Example 2: Keyword Graph – each spoken slide labeled by a set of keywords (topics) – relationships between keywords represented by a graph [Diagram: spoken slides linked to a keyword graph with nodes Acoustic Modeling, Viterbi search, HMM, Language Modeling, Perplexity]
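The keyword graph of slide 23 can be sketched as follows. The slides do not state how relationships between keywords are derived; this sketch assumes, purely for illustration, that two keywords are related when they label the same spoken slide. The function name `keyword_graph` and the `min_count` threshold are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def keyword_graph(slide_keywords, min_count=1):
    """Build an undirected graph: keyword -> set of related keywords.

    slide_keywords: one list of keywords per spoken slide.
    Two keywords are linked if they co-occur on at least min_count slides.
    """
    counts = defaultdict(int)
    for keywords in slide_keywords:
        for a, b in combinations(sorted(set(keywords)), 2):
            counts[(a, b)] += 1
    graph = defaultdict(set)
    for (a, b), count in counts.items():
        if count >= min_count:
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


# Keyword labels taken from the slide's diagram; the grouping per slide is invented.
spoken_slides = [
    ["HMM", "Viterbi search"],
    ["HMM", "Acoustic Modeling"],
    ["Language Modeling", "Perplexity"],
]
for keyword, neighbors in sorted(keyword_graph(spoken_slides).items()):
    print(keyword, "->", sorted(neighbors))
```

A learner who lands on a spoken slide about Viterbi search could then follow the graph to the slides labeled HMM for background.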
  • 24. NTU Virtual Instructor (2009) (台大虛擬教師) Demonstration
  • 25. Too Many Online Courses • A user enters a keyword or a key phrase into Coursera [Screenshot: 752 matches]
  • 26. Machines Listening to the Online Courses [Diagram: three courses on some similar topic; lectures with very similar content]
  • 27. Machines Listening to the Online Courses [Diagram: three courses on some similar topic; sequential order for learning (prerequisite conditions)]
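Once prerequisite conditions between lectures are known (slide 27), a valid sequential order for learning can be obtained with a topological sort. The slides do not describe how the learning map is actually computed; this sketch assumes the prerequisites are already given as a mapping and uses Python's standard `graphlib` module (Python 3.9+). The lecture names are invented.

```python
from graphlib import TopologicalSorter


def learning_order(prerequisites):
    """Order lectures so that every prerequisite comes before what needs it.

    prerequisites: dict mapping a lecture to the set of lectures that must
    be studied first. Raises graphlib.CycleError if the conditions are circular.
    """
    return list(TopologicalSorter(prerequisites).static_order())


# Hypothetical prerequisite conditions across lectures from several courses.
prerequisites = {
    "HMM": {"Probability"},
    "Viterbi search": {"HMM"},
    "Acoustic Modeling": {"HMM"},
}
print(learning_order(prerequisites))
```

Several orders may satisfy the same conditions, so the exact sequence printed is one valid choice rather than the only one.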
  • 28. Learning Map Produced by Machine (2014) (機器製作學習地圖) Demonstration
  • 29. Machine Comprehension of Spoken Content • To Which Degree can Machines Understand the Spoken Content? • Machines to take TOEFL Listening Comprehension Test (an English test for students whose native language is not English) • An Example Problem: Recorded voice (5 min long) • Question: "What is a possible origin of Venus' clouds?" • Answer options: (A) gases released as a result of volcanic activity (B) chemical reactions caused by high surface temperatures (C) bursts of radio energy from the planet's surface (D) strong winds that blow dust into the atmosphere
  • 30. Machine Comprehension of Spoken Content • Machine Performance (2017) [Chart: accuracy (%) of systems (1) to (5); annotations: "by some hand-crafted rules without listening to the recorded voice", "choosing the shortest answer"; Best Machine Performance: 50.0%]
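The "choosing the shortest answer" rule mentioned on slide 30 is a baseline that never listens to the recording. A minimal sketch, assuming length is measured in words (the slide does not say whether words or characters are counted) and using the example options from slide 29:

```python
def shortest_answer(options):
    """Pick the option label whose text has the fewest words.

    options: dict mapping a label such as "A" to the answer text.
    Ties go to the option that appears first in the dict.
    """
    return min(options, key=lambda label: len(options[label].split()))


options = {
    "A": "gases released as a result of volcanic activity",
    "B": "chemical reactions caused by high surface temperatures",
    "C": "bursts of radio energy from the planet's surface",
    "D": "strong winds that blow dust into the atmosphere",
}
print(shortest_answer(options))
```

Whether the selected option is the correct answer is not stated in the slides. The point of such a rule is only to show how far a system can get without understanding the spoken content.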
  • 32. What can Machines do for Human? • Multimedia Content exponentially increasing over the Internet – best archive of global human knowledge is here – desired information deeply buried under huge quantities of unrelated information – 300 hrs of videos uploaded per min (2015.01) – roughly 2000 online courses on Coursera (2016.04) • Nobody can go through so much multimedia information, but Machines can • Machines can listen to and understand the entire multimedia knowledge archive over the Internet – extracting desired information for each individual user
  • 33. An Example: Personalized Courses • Machines generate desired personalized courses for each individual user • User: "I wish to learn some knowledge about the masterpieces of Wolfgang Amadeus Mozart. I am an engineer. I know nothing about music. I can spend 3 hrs to learn." • Machine (using information from the Internet): "Thank you for your request. This is the 3-hr personalized course for you."
  • 34. Spoken Version of Google • Speech Recognition and Synthesis Technologies Make Machines Capable of Listening to and Speaking Human Voice – the best the machines can do may be as good as the human • Machines can Handle Huge Quantities of Information – much more efficient than human • Google Reads All Text over the Internet – similarly machines can listen to all voices over the Internet • Internet is the Only Largest Archive for Global Human Knowledge – voice can be the key to that archive