Focus on spoken content in multimedia retrieval

Invited talk at the University of Texas at El Paso


Presentation Transcript

  • 1. Focus on spoken content in multimedia retrieval. Maria Eskevich, Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland. 16 April 2013.
  • 2. Outline: (1) Spoken Content Retrieval: historical perspective; (2) the MediaEval benchmark: 3 years of Spoken Content Retrieval experiments, covering the Rich Speech Retrieval and the Search and Hyperlinking tasks; (3) dataset creation issues for multimedia retrieval: the crowdsourcing aspect; (4) interesting observations on results: segmentation methods, evaluation metrics, numbers.
  • 3. Outline (repeated as a section divider), with the final item narrowed to "interesting observations on results: the segmentation aspect".
  • 4–8. From a standard IR system to Spoken Content Retrieval (a diagram built up across slides). Standard IR system: an information request becomes a query; the IR system applies an IR model over indexed documents and returns results through retrieval. Speech processing: an ASR system converts an audio data collection into transcripts of the audio data. Spoken Content Retrieval (SCR) combines the two: queries run against indexed transcripts, and retrieval returns pointers into the audio files.
  • 9–15. The Spoken Content Retrieval (SCR) pipeline (a diagram built up across slides): Spoken Content passes through an ASR System to yield an ASR Transcript; Indexing yields an Indexed Transcript; Retrieval yields a Ranked Result List (1, 2, ...). The talk's three themes are overlaid on this pipeline: Data (the spoken content), Experiments (indexing and retrieval), and Evaluation Metrics (the ranked list). A minimal sketch of the pipeline follows.
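
For concreteness, here is a minimal Python sketch of that pipeline under toy assumptions: the "transcripts" are hand-written stand-ins for ASR output, and ranking is plain term-frequency scoring, not any system used in the talk.

```python
from collections import Counter, defaultdict

# Hypothetical ASR output: one transcript per audio file (toy data).
transcripts = {
    "ep01.mp3": "we discuss the budget and then the project deadline",
    "ep02.mp3": "cooking pasta is easy once the water is boiling",
}

# Indexing: inverted index mapping term -> {audio file: term frequency}.
index = defaultdict(Counter)
for audio_file, text in transcripts.items():
    for term in text.lower().split():
        index[term][audio_file] += 1

def retrieve(query):
    """Retrieval: rank audio files by summed query-term frequency."""
    scores = Counter()
    for term in query.lower().split():
        for audio_file, tf in index.get(term, {}).items():
            scores[audio_file] += tf
    return scores.most_common()  # the ranked result list: 1, 2, ...

print(retrieve("project budget"))  # [('ep01.mp3', 2)]
```
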
  • 16. Outline: Spoken Content (the pipeline diagram recapped to introduce the data side).
  • 17–23. Spoken Content Retrieval: historical perspective (a taxonomy built up across slides). Spoken content splits into prepared speech and informal conversational speech, covering four families: Broadcast News, Lectures, Meetings, and Informal Content (Internet TV, podcasts, interviews).
  • 24–25. Broadcast News. Data: high-quality recordings, often from a soundproof studio; the speaker is a professional presenter; well-defined structure; the query is on a certain topic, and the user is ready to listen to the whole section. Experiments: TREC SDR (1997–2000), with known-item search and ad-hoc retrieval, and search with and without fixed story boundaries; evaluation focused on rank position. HIGHLIGHT: a "success story" (Garofolo et al., 2000): performance on ASR transcripts was approximately equal to performance on manual transcripts, since ASR was good (large amounts of training data) and the data was well structured. CHALLENGE: broadcast-news speech is close to written text and differs from the informal content of spontaneous speech.
  • 26–27. Lectures. Data: prepared presentations that still contain conversational-style features (hesitations, mispronunciations); specialized vocabulary and out-of-vocabulary words, since lecture-specific words may have low probability in the ASR language model; additional information is available (presentation slides, textbooks). Experiments: lecture browsing (e.g. TalkMiner, MIT lectures, eLectures); the SpokenDoc(2) tasks at NTCIR-9 and NTCIR-10 (e.g. IR experiments, and evaluation metrics that assess topic segmentation methods). HIGHLIGHT/CHALLENGE: focus on segmentation methods and jump-in points.
  • 28–29. Meetings. Data: a mixture of semi-formal and prepared spoken content; additional data such as slides and minutes. Possible real-life scenarios: jump-in points where discussion of a topic started or a decision point was reached; the opinion of a certain person, or of a person with a certain role; search for all relevant (parts of) meetings where a topic was discussed. Experiments: topic segmentation, browsing, summarization. HIGHLIGHT/CHALLENGE: there is no unified search scenario, so we created a test retrieval collection on the basis of the AMI corpus and set up a task scenario ourselves.
  • 30–31. Informal Content (interviews, Internet TV). Data: varying quality, from semi- and non-professional creators; additional data in the form of professionally or user-generated metadata. Experiments: CLEF CL-SR on the MALACH collection (known and unknown boundaries, ad-hoc task); MediaEval '11, '12, '13 on retrieval of semi-professional multimedia content (known-item task, unknown boundaries). Metrics focus on ranking and penalize distance from the jump-in point. HIGHLIGHT/CHALLENGE: the metric does not always take into account how much time the user must spend listening to reach the relevant content; informal multimedia content is diverse; the search scenario is no longer limited to factual information.
  • 32. Review of the challenges and our work for informal SCR. A retrieval experiment framework has to be set up, i.e. retrieval collections must be created; our work: we collected new multimodal retrieval collections via crowdsourcing. ASR errors decrease IR results; our work: we examined the relationship between ASR performance and result ranking in more depth. Suitable segmentation is vital; our work: we carried out experiments with varying segmentation methods. Metrics are needed that reflect all aspects of the user experience; our work: we created a new set of metrics.
  • 33. Outline (section divider): next, the MediaEval benchmark.
  • 34. MediaEval: a multimedia evaluation benchmarking initiative. It evaluates new algorithms for multimedia access and retrieval; it emphasizes the "multi" in multimedia (speech, audio, visual content, tags, users, context); and it innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
  • 35–37. MediaEval 2011 Rich Speech Retrieval (RSR) task (figure slides). Task goal: the information to be found is a combination of the required audio and visual content and the speaker's intention.
  • 38–41. RSR task, conventional retrieval (a two-row diagram built up across slides): Transcript 1 = Transcript 2, and Meaning 1 = Meaning 2; conventional retrieval matches content at the transcript level, treating matching words as matching meaning.
  • 42–44. RSR task, extended speech retrieval: the same diagram gains a third row for the speaker's intention (Speech act 1 / Speech act 2); extended speech retrieval distinguishes results by speech act even where transcripts and meanings line up.
  • 45–47. MediaEval 2012–2013: Search and Hyperlinking (S&H) task: background, task overview, and the task's use of crowdsourcing (figure slides; no further recoverable text).
  • 48. Outline (section divider): next, dataset creation issues for multimedia retrieval and the crowdsourcing aspect.
  • 49–54. What is crowdsourcing? (built up across slides). Crowdsourcing is a form of human computation; human computation means having people do things we might otherwise assign to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process. Factors to take into account: a sufficient number of workers; the level of payment; clear instructions; possible cheating.
  • 55–64. Results assessment (built up across slides). The number of accepted HITs equals the number of collected queries. There is no overlap of workers between the dev and test sets (a sketch of this rule follows). Creative work invites creative cheating: workers copy and paste the provided examples, so examples should be pictures, not text; workers choose the "no speech act found in the video" option, so manual assessment by the requester is needed. Workers rarely find noteworthy content later than the third minute after the playback start point in the video.
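
A tiny sketch of the dev/test worker-separation rule above; the worker IDs, field names, and HIT records are all hypothetical.

```python
# Hypothetical HIT records; only the separation rule itself comes from
# the slide, the data and field names are made up for illustration.
dev_hits = [{"worker": "W1", "query": "how to boil pasta"},
            {"worker": "W2", "query": "budget discussion"}]
test_hits = [{"worker": "W2", "query": "project deadline"},
             {"worker": "W3", "query": "guitar tuning tips"}]

dev_workers = {hit["worker"] for hit in dev_hits}
# Keep only test HITs from workers who never contributed to the dev set.
clean_test = [hit for hit in test_hits if hit["worker"] not in dev_workers]
print([hit["worker"] for hit in clean_test])  # ['W3']
```
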
  • 65–70. Crowdsourcing issues for multimedia retrieval collection creation. It is possible to crowdsource extensive and complex tasks to support speech and language resources. Use concepts and vocabulary familiar to the workers. Pay attention to the technical issues of watching the video, and preprocess the video into smaller segments. Creative work demands a higher reward level, or simply a more flexible system. Expect a high level of wastage due to task complexity.
  • 71. Outline (section divider): next, interesting observations on results.
  • 72. Dataset segment representation (figure slide).
  • 73–81. Approach 1: fixed-length segmentation (figures built up across slides). Segment length is fixed either by number of words (including or excluding stop words) or by time slots; a variant uses fixed-length segmentation with a sliding window; post-processing of the resulting segments is illustrated in the figures. A sketch of the sliding-window variant follows.
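
A minimal sketch of fixed-length segmentation with a sliding window, assuming the transcript is already a flat word list; the window length and step below are illustrative choices, not the talk's settings.

```python
def sliding_segments(words, length=50, step=25):
    """Cut a flat word list into overlapping fixed-length segments."""
    segments = []
    for start in range(0, len(words), step):
        segments.append(words[start:start + length])
        if start + length >= len(words):  # last window reaches the end
            break
    return segments

words = [f"w{i}" for i in range(120)]  # stand-in for transcript words
for seg in sliding_segments(words):
    print(seg[0], "...", seg[-1])  # w0..w49, w25..w74, w50..w99, w75..w119
```
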
  • 82–84. Approach 2: flexible-length segmentation. Speech or video units of varying length; for speech: sentences, speech segments, silence points, changes of speaker; for video: shots. Topical segmentation via lexical cohesion, e.g. C99 or TextTiling (a toy illustration of the idea follows).
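
To illustrate the lexical-cohesion idea behind TextTiling and C99, here is a toy Python segmenter that places topic boundaries at dips in word overlap between adjacent sentence windows; it is a simplified illustration, not either published algorithm, and the sentences are made up.

```python
def cohesion(left, right):
    """Jaccard word overlap between two sentence windows."""
    a = set(" ".join(left).lower().split())
    b = set(" ".join(right).lower().split())
    return len(a & b) / max(1, len(a | b))

def topic_boundaries(sentences, window=2):
    # Score every gap between sentences by the cohesion of its two sides;
    # a dip below the mean score is taken as a topic boundary.
    gaps = [cohesion(sentences[max(0, i - window):i], sentences[i:i + window])
            for i in range(1, len(sentences))]
    mean = sum(gaps) / len(gaps)
    return [i + 1 for i, gap in enumerate(gaps) if gap < mean]

sents = ["the budget rose",
         "the budget was cut",
         "the demo showed search",
         "the demo search results were fast"]
print(topic_boundaries(sents))  # [2]: the topic shifts before sentence 2
```
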
  • 85. Outline (section divider): next, evaluation metrics.
  • 86–90. Evaluation: search sub-task (figure slides; no recoverable text).
  • 91. Evaluation: search sub-task. Mean Reciprocal Rank (MRR), averaging RR = 1/rank over queries; and mean Generalized Average Precision (mGAP), averaging GAP = (1/rank) · penalty, where the penalty factor discounts the score as the returned start point drifts from the ideal jump-in point.
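
A small sketch of both measures, with a hypothetical linear penalty for mGAP (the penalty function actually used in the evaluations may differ); averaging these values over all queries gives MRR and mGAP.

```python
def reciprocal_rank(rank):
    """RR = 1 / rank of the first relevant result."""
    return 1.0 / rank

def gap(rank, start_offset_sec, tolerance=30.0):
    """GAP = (1/rank) * penalty; this linear penalty is illustrative."""
    # Full credit at the exact jump-in point, none beyond the tolerance.
    penalty = max(0.0, 1.0 - abs(start_offset_sec) / tolerance)
    return reciprocal_rank(rank) * penalty

# Relevant item at rank 2, starting 15 s away from the true jump-in point:
print(reciprocal_rank(2))  # 0.5
print(gap(2, 15))          # 0.25
```
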
  • 92–95. Evaluation: search sub-task. Mean Average Segment Precision (MASP) combines ranking with the length of relevant and irrelevant content. Segment precision SP[r] at rank r is defined in the figures; average segment precision is ASP = (1/n) · Σ_{r=1..N} SP[r] · rel(s_r), where rel(s_r) = 1 if relevant content is present in the segment at rank r and 0 otherwise.
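
A sketch of ASP as reconstructed from the slide, assuming SP[r] is the fraction of relevant playback time accumulated over the top-r segments and n is the number of relevant segments; this is one reading of the formula, not the official MASP definition. MASP is then the mean of ASP over queries.

```python
def average_segment_precision(ranked_segments):
    """ranked_segments: (relevant_seconds, total_seconds) per rank r."""
    rel_time = total_time = sp_sum = 0.0
    n = sum(1 for rel, _ in ranked_segments if rel > 0)  # assumed meaning of n
    for rel, total in ranked_segments:
        rel_time += rel
        total_time += total
        if rel > 0:                          # rel(s_r) = 1
            sp_sum += rel_time / total_time  # SP[r] over the top-r segments
    return sp_sum / n if n else 0.0

# Rank 1: fully relevant 60 s; rank 2: irrelevant 60 s; rank 3: 30 s of 60 s.
print(average_segment_precision([(60, 60), (0, 60), (30, 60)]))  # 0.75
```
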
  • 96. Evaluation: search sub-task. Focus on the precision and recall of the relevant content within the retrieved segment.
  • 97. Outline (section divider): next, the numbers.
  • 98. Experiments (RSR): spontaneous speech search, and the relationship between retrieval effectiveness and segmentation methods. Segment: 100% recall of the relevant content; high precision (30, 56%) of the relevant content; topic consistency.
  • 99–105. Experiments (RSR), continued (results figures; no recoverable text).
  • 106. Experiments (S&H): fixed-length segmentation with a sliding window, over two transcripts (LIMSI and LIUM) (results figures).
  • 107–113. Segmentation requirements for effective SCR (built up across slides). Segmentation plays a significant role in retrieving relevant content. High recall and high precision of the relevant content within a segment lead to good segment ranking. Related metadata can help improve the ranking of a segment that has high recall but also contains non-relevant content. Influence of ASR quality: the effect of errors is not straightforward; it can be smoothed by the use of context and by query-dependent treatment of the transcript. ASR vocabulary variability: longer segments achieve higher MRR scores with the transcript of lower language variability (LIMSI), whereas shorter segments perform better with the transcript of higher language variability (LIUM). Multimodal queries: adding visual information decreases performance.
  • 114. Thank you for your attention! Questions?