
Focus on spoken content in multimedia retrieval

Invited talk at the University of Texas at El Paso


  1. Focus on spoken content in multimedia retrieval. Maria Eskevich, Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland. April 16, 2013.
  2-3. Outline: Spoken Content Retrieval: historical perspective; MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments (Rich Speech Retrieval and Search and Hyperlinking tasks); dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect; interesting observations on results: segmentation methods, evaluation metrics, numbers.
  4-8. Standard IR System vs. Spoken Content Retrieval (SCR) System [diagrams]. In a standard IR system, a user's information request becomes a query, the IR model matches it against indexed documents, and a ranked result list is returned. To search speech, an ASR system first transcribes the audio data collection; the SCR system then indexes the transcripts, and retrieval returns the corresponding audio files.
  9-15. The Spoken Content Retrieval (SCR) pipeline [diagram]: spoken content (the data) is transcribed by an ASR system; the ASR transcript is indexed; retrieval against the indexed transcript (the experiments) produces a ranked result list, which is assessed with evaluation metrics.
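To make the pipeline concrete, here is a minimal sketch of the indexing and retrieval stages in Python, assuming the ASR transcripts are already available as plain text. The transcripts, the tokenizer, and the term-frequency scoring are illustrative stand-ins, not the components used in the work presented.

```python
from collections import Counter, defaultdict

# Hypothetical ASR output: one transcript string per audio file.
transcripts = {
    "talk1.wav": "we discuss spoken content retrieval and indexing of transcripts",
    "talk2.wav": "broadcast news recordings are made in a soundproof studio",
}

def tokenize(text):
    return text.lower().split()

# Indexing: inverted index mapping each term to {audio file: term frequency}.
index = defaultdict(Counter)
for doc_id, text in transcripts.items():
    for term in tokenize(text):
        index[term][doc_id] += 1

def retrieve(query):
    """Toy IR model: rank audio files by summed term frequency of query terms."""
    scores = Counter()
    for term in tokenize(query):
        for doc_id, tf in index[term].items():
            scores[doc_id] += tf
    return scores.most_common()  # the ranked result list

print(retrieve("spoken content retrieval"))  # -> [('talk1.wav', 3)]
```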
  16. Outline: Spoken Content — the same pipeline, with the focus turning first to the spoken content itself.
  17-23. Spoken Content Retrieval: historical perspective [taxonomy diagram]. Spoken content ranges from prepared speech to informal conversational speech: Broadcast News, Lectures, Meetings, and Informal Content (Internet TV, podcasts, interviews).
  24-25. Broadcast News. Data: high-quality recordings (often a soundproof studio), professional presenters, well-defined structure; queries are on a certain topic, and the user is ready to listen to the whole section. Experiments: TREC SDR (1997-2000), known-item search and ad-hoc retrieval, search with and without fixed story boundaries; evaluation focused on rank position. HIGHLIGHT: a "success story" (Garofolo et al., 2000): performance on ASR transcripts ≈ performance on manual transcripts; ASR worked well thanks to large amounts of training data and the structure of the data. CHALLENGE: broadcast news speech is close to written text and differs from the informal content of spontaneous speech.
  26-27. Lectures. Data: prepared presentations that contain conversational-style features (hesitations, mispronunciations); specialized vocabulary and out-of-vocabulary words; lecture-specific words may have low probability in the ASR language model; additional information is available (presentation slides, textbooks). Experiments: lecture browsing (e.g. TalkMiner, MIT lectures, eLectures); SpokenDoc(2) tasks at NTCIR-9 and NTCIR-10 (e.g. IR experiments, evaluation metrics that assess topic segmentation methods). HIGHLIGHT/CHALLENGE: focus on segmentation methods and jump-in points.
  28-29. Meetings. Data: a mixture of semi-formal and prepared spoken content; additional data (slides, minutes). Possible real-life scenarios: jump-in points where discussion of a topic starts or a decision point is reached; the opinion of a certain person, or of a person with a certain role; search for all relevant (parts of) meetings where a topic was discussed. Experiments: topic segmentation, browsing, summarization. HIGHLIGHT/CHALLENGE: no unified search scenario; we created a test retrieval collection on the basis of the AMI corpus and set up a task scenario ourselves.
  30-31. Informal Content (interviews, Internet TV). Data: varying quality from semi- and non-professional content creators; additional data: professionally or user-generated metadata. Experiments: CLEF CL-SR on the MALACH collection (known/unknown boundaries, ad-hoc task); MediaEval '11-'13 (retrieval of semi-professional multimedia content; known-item task, unknown boundaries); metrics focus on ranking and penalize distance from the jump-in point. HIGHLIGHT/CHALLENGE: the metrics do not always take into account how much time the user must spend listening before reaching the relevant content; the informal multimedia content is diverse; the search scenario is no longer limited to factual information.
  32. Review of the challenges and our work for informal SCR. A framework for retrieval experiments has to be set up, i.e. retrieval collections must be created (our work: we collected new multimodal retrieval collections via crowdsourcing). ASR errors decrease IR results (our work: we examined in more depth the relationship between ASR performance and result ranking). Suitable segmentation is vital (our work: we carried out experiments with varying methods). Metrics are needed that reflect all aspects of the user experience (our work: we created a new set of metrics).
  33. Outline: Spoken Content Retrieval: historical perspective; MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments (Rich Speech Retrieval and Search and Hyperlinking tasks); dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect; interesting observations on results: segmentation methods, evaluation metrics, numbers.
  34. MediaEval: a multimedia evaluation benchmarking initiative. It evaluates new algorithms for multimedia access and retrieval, emphasizes the "multi" in multimedia (speech, audio, visual content, tags, users, context), and innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
  35-37. MediaEval 2011 Rich Speech Retrieval (RSR) Task. Task goal: the information to be found is a combination of the required audio and visual content and the speaker's intention.
  38-44. RSR: why speaker intention matters [diagram]. Conventional retrieval operates on the transcript and its meaning: passages are matched when their transcripts express the requested meaning. Extended speech retrieval adds the speech act: passages whose transcripts and meanings match may still realize different speech acts, and the RSR task asks for the one carrying the required speaker intention.
  45-47. MediaEval 2012-2013: Search and Hyperlinking (S&H) Task — background, task overview, and the task's use of crowdsourcing [slide figures].
  48. Outline: Spoken Content Retrieval: historical perspective; MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments (Rich Speech Retrieval and Search and Hyperlinking tasks); dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect; interesting observations on results: segmentation methods, evaluation metrics, numbers.
  49-54. What is crowdsourcing? Crowdsourcing is a form of human computation: having people do things that we might otherwise assign to a computing device, e.g. a language translation task. A crowdsourcing system facilitates the crowdsourcing process. Factors to take into account: a sufficient number of workers, the level of payment, clear instructions, and possible cheating.
  55-64. Results assessment. The number of accepted HITs equals the number of collected queries, and there was no overlap of workers between the dev and test sets. Creative work invites creative cheating: workers copy and paste the provided examples (→ examples should be pictures, not text), or choose the option that no speech act was found in the video (→ manual assessment by the requester is needed). Workers rarely find noteworthy content later than the third minute after the playback start point in the video.
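The cheating patterns above lend themselves to automatic pre-filtering before the requester's manual assessment; the sketch below illustrates the idea. The field names, the example set, and the dev/test worker check are hypothetical details, not the filters actually used for the task.

```python
# Hypothetical pre-filter over submitted HITs; manual review is still required.
EXAMPLE_QUERIES = {"how do i bake bread", "interview with the band"}  # texts shown in the instructions

def suspicious_reasons(hit, dev_worker_ids):
    """Return the reasons a HIT should be held for manual review."""
    reasons = []
    if hit["query"].strip().lower() in EXAMPLE_QUERIES:
        reasons.append("query copy-pasted from a provided example")
    if hit["no_speech_act_found"]:
        reasons.append("claims no speech act occurs in the video")
    if hit["worker_id"] in dev_worker_ids:
        reasons.append("worker already contributed to the dev set")
    return reasons

hit = {"worker_id": "W42", "query": "How do I bake bread", "no_speech_act_found": False}
print(suspicious_reasons(hit, dev_worker_ids={"W7", "W13"}))
# -> ['query copy-pasted from a provided example']
```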
  65-70. Crowdsourcing issues for multimedia retrieval collection creation. It is possible to crowdsource extensive and complex tasks to support speech and language resources. Lessons: use concepts and vocabulary familiar to the workers; pay attention to the technical issues of watching the video; preprocess videos into smaller segments; creative work demands a higher reward level, or at least a more flexible system; expect a high level of wastage due to task complexity.
  71. Outline: Spoken Content Retrieval: historical perspective; MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments (Rich Speech Retrieval and Search and Hyperlinking tasks); dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect; interesting observations on results: segmentation methods, evaluation metrics, numbers.
  72. Dataset segment representation [slide figure].
  73-81. Approach 1: fixed-length segmentation [diagrams]. Segments of fixed length, measured either in number of words (including or excluding stop words) or in time slots; fixed-length segmentation with a sliding window; post-processing of the resulting segments. A sketch follows.
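A minimal sketch of the word-count variant with a sliding window; the segment size and step below are illustrative, not the settings used in the experiments.

```python
def fixed_length_segments(words, size=100, step=50):
    """Fixed-length segmentation over a transcript's word stream.
    step == size gives disjoint segments; step < size gives a sliding
    window with overlap. Any tail shorter than `size` is left to a
    post-processing step (e.g. merging it into the last window)."""
    if len(words) <= size:
        return [(0, words)]
    return [(start, words[start:start + size])
            for start in range(0, len(words) - size + 1, step)]

words = ("word " * 320).split()  # stand-in for an ASR transcript
segments = fixed_length_segments(words, size=100, step=50)
print(len(segments), [start for start, _ in segments])  # 5 [0, 50, 100, 150, 200]
```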
  82-84. Approach 2: flexible-length segmentation. Speech or video units of varying length — for speech: sentences, speech segments, silence points, changes of speakers; for video: shots. Topical segmentation based on lexical cohesion, e.g. C99 or TextTiling; a toy cohesion-based sketch follows.
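As a toy illustration of lexical cohesion, the sketch below compares adjacent word windows and places a boundary where their similarity drops. It is a simplified TextTiling-style heuristic, not a reimplementation of C99 or TextTiling, and the window size and threshold are arbitrary.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency Counters."""
    num = sum(a[t] * b[t] for t in set(a) & set(b))
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def cohesion_boundaries(words, window=50, threshold=0.1):
    """Place a topic boundary wherever the lexical cohesion between the
    `window` words before a point and the `window` words after it falls
    below `threshold`."""
    boundaries = []
    for i in range(window, len(words) - window, window):
        if cosine(Counter(words[i - window:i]), Counter(words[i:i + window])) < threshold:
            boundaries.append(i)  # word offset where a new segment starts
    return boundaries

words = ["cooking"] * 100 + ["football"] * 100  # abrupt topic shift at offset 100
print(cohesion_boundaries(words))  # -> [100]
```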
  85. Outline: Spoken Content Retrieval: historical perspective; MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments (Rich Speech Retrieval and Search and Hyperlinking tasks); dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect; interesting observations on results: segmentation methods, evaluation metrics, numbers.
  86-90. Evaluation: Search sub-task [slide figures].
  91. Evaluation: Search sub-task. Mean Reciprocal Rank (MRR), with RR = 1 / rank for the rank of the relevant item. Mean Generalized Average Precision (mGAP), with GAP = (1 / rank) · penalty, where the penalty factor shrinks as the returned jump-in point gets farther from the true start of the relevant content.
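A sketch of both metrics under this reading of the slide. The linear penalty with a 60-second cut-off is an assumption for illustration; the official mGAP penalty function may differ.

```python
def mrr(ranks):
    """Mean Reciprocal Rank: ranks[i] is the rank of the relevant item for
    query i (None if it was not retrieved)."""
    return sum(1.0 / r for r in ranks if r) / len(ranks)

def mgap(results, max_offset=60.0):
    """mGAP under the reading GAP = penalty / rank. Each result is
    (rank, seconds between the returned jump-in point and the true start).
    The linear decay and the 60 s cut-off are illustrative assumptions."""
    def penalty(offset):
        return max(0.0, 1.0 - abs(offset) / max_offset)
    return sum(penalty(off) / rank for rank, off in results) / len(results)

print(mrr([1, 3, None]))            # (1 + 1/3 + 0) / 3 ≈ 0.444
print(mgap([(1, 0.0), (2, 30.0)]))  # (1/1 + 0.5/2) / 2 = 0.625
```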
  92-95. Evaluation: Search sub-task. Mean Average Segment Precision (MASP) combines ranking with the length of relevant and irrelevant content. Segment precision SP[r] at rank r accounts for the lengths of relevant and irrelevant content in the segments retrieved up to rank r. Average segment precision: ASP = (1/n) · Σ_{r=1}^{N} SP[r] · rel(s_r), where rel(s_r) = 1 if the segment at rank r contains relevant content and 0 otherwise, n is the number of relevant segments, and N is the length of the result list. MASP is the mean of ASP over all queries.
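A sketch of ASP and MASP under one plausible reading of the slide, where SP[r] is the length-based precision of all content retrieved up to rank r; that definition of SP is an assumption, as the slide's own definition was on a figure.

```python
def average_segment_precision(ranked, n_relevant):
    """ranked[i] = (relevant_seconds, total_seconds) for the segment at rank i+1.
    SP[r] is taken as the fraction of retrieved content up to rank r that is
    relevant, by length (an assumed reading); a segment counts as relevant
    (rel(s_r) = 1) if it contains any relevant content.
    ASP = (1/n) * sum over ranks of SP[r] * rel(s_r)."""
    rel_len = tot_len = asp = 0.0
    for relevant_seconds, total_seconds in ranked:
        rel_len += relevant_seconds
        tot_len += total_seconds
        if relevant_seconds > 0:      # rel(s_r) = 1
            asp += rel_len / tot_len  # SP[r]
    return asp / n_relevant if n_relevant else 0.0

def masp(per_query):
    """Mean over queries of (ranked list, number of relevant segments)."""
    return sum(average_segment_precision(r, n) for r, n in per_query) / len(per_query)

# One query: a fully relevant 60 s segment at rank 1, an irrelevant 60 s
# segment at rank 2, and a half-relevant 60 s segment at rank 3.
print(average_segment_precision([(60, 60), (0, 60), (30, 60)], n_relevant=2))  # 0.75
```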
  96. Evaluation: Search sub-task. Focus on the precision and recall of the relevant content within the retrieved segment.
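For a single retrieved segment, that precision and recall can be computed from time intervals; the overlap-based definitions below are meant as illustration, not as the exact measures used in the experiments.

```python
def overlap(a, b):
    """Length of overlap between two (start, end) time intervals, in seconds."""
    return max(0.0, min(a[1], b[1]) - max(a[0], b[0]))

def segment_precision_recall(segment, relevant):
    """Precision and recall of the relevant content within one retrieved
    segment, both given as (start, end) times in seconds:
    precision = overlap / segment length, recall = overlap / relevant length."""
    ov = overlap(segment, relevant)
    return ov / (segment[1] - segment[0]), ov / (relevant[1] - relevant[0])

# A 120 s segment that covers the last 60 s of a 120 s relevant stretch.
print(segment_precision_recall((100, 220), (40, 160)))  # -> (0.5, 0.5)
```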
  97. Outline: Spoken Content Retrieval: historical perspective; MediaEval Benchmark: 3 years of Spoken Content Retrieval experiments (Rich Speech Retrieval and Search and Hyperlinking tasks); dataset collection creation issues for multimedia retrieval: the crowdsourcing aspect; interesting observations on results: segmentation methods, evaluation metrics, numbers.
  98. Experiments (RSR): spontaneous speech search — the relationship between retrieval effectiveness and segmentation methods. A good segment offers 100% recall of the relevant content, high precision (30, 56%) of the relevant content, and topic consistency.
  99-105. Experiments (RSR): spontaneous speech search — the relationship between retrieval effectiveness and segmentation methods [result figures].
  106. Experiments (S&H): fixed-length segmentation with a sliding window, compared across two transcripts (LIMSI and LIUM) [result figures].
  107-113. Segmentation requirements for effective SCR. Segmentation plays a significant role in retrieving relevant content. High recall and high precision of the relevant content within a segment lead to good segment ranking. Related metadata can help improve the ranking of a segment that has high recall but also contains non-relevant content. Influence of ASR quality: the effect of errors is not straightforward and can be smoothed by using context and query-dependent treatment of the transcript. ASR vocabulary variability: longer segments achieve higher MRR scores with the transcript of lower language variability (LIMSI), whereas shorter segments perform better with the transcript of higher language variability (LIUM). Multimodal queries: adding visual information decreases performance.
  114. Thank you for your attention! Questions?
