Creating a Data Collection for Evaluating Rich Speech Retrieval (LREC 2012)

We describe the development of a test collection for the investigation of speech retrieval beyond the identification of relevant content. The collection focuses on satisfying user information needs for queries associated with specific types of speech acts. It is based on an archive of Internet video from the video sharing platform blip.tv, and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach is used to identify segments in the video data which contain speech acts, to create a description of the video containing each act, and to generate search queries designed to refind the speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform, and highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.

1. Creating a Data Collection for Evaluating Rich Speech Retrieval
   Maria Eskevich¹, Gareth J.F. Jones¹, Martha Larson², Roeland Ordelman³
   ¹ Centre for Digital Video Processing, Centre for Next Generation Localisation, School of Computing, Dublin City University, Dublin, Ireland
   ² Delft University of Technology, Delft, The Netherlands
   ³ University of Twente, The Netherlands
2. Outline
   - MediaEval benchmark
   - MediaEval 2011 Rich Speech Retrieval Task
   - What is crowdsourcing?
   - Crowdsourcing in the development of speech and language resources
   - Development of an effective crowdsourcing task
   - Comments on results
   - Conclusions
   - Future work: Brave New Task at MediaEval 2012
3. MediaEval: Multimedia Evaluation benchmarking initiative
   - Evaluates new algorithms for multimedia access and retrieval.
   - Emphasizes the "multi" in multimedia: speech, audio, visual content, tags, users, context.
   - Innovates new tasks and techniques focusing on the human and social aspects of multimedia content.
4. MediaEval 2011 Rich Speech Retrieval (RSR) Task
   Task goal: the information to be found is a combination of the required audio and visual content and the speaker's intention.
   - Conventional retrieval assumes that identical transcripts carry identical meaning: Transcript 1 = Transcript 2 implies Meaning 1 = Meaning 2.
   - Extended speech retrieval recognizes that identical transcripts can differ in meaning: Transcript 1 = Transcript 2 while Meaning 1 ≠ Meaning 2, because Speech act 1 ≠ Speech act 2 (e.g., the same sentence may function as a sincere apology in one video and as an ironic opinion in another).
5. MediaEval 2011 RSR Task: the ME10WWW dataset
   - Videos from the Internet video sharing platform blip.tv (1974 episodes, 350 hours).
   - Automatic Speech Recognition (ASR) transcripts provided by LIMSI and Vocapia Research.
   - No queries or relevant items → collect user-generated queries and user-generated relevant items for the retrieval experiment, via crowdsourcing technology.
6. What is crowdsourcing?
   Crowdsourcing is a form of human computation. Human computation is a method of having people do things that we might consider assigning to a computing device, e.g. a language translation task. A crowdsourcing system facilitates a crowdsourcing process.
   Factors to take into account:
   - Sufficient number of workers
   - Level of payment
   - Clear instructions
   - Possible cheating
7. Crowdsourcing in the development of speech and language resources
   Crowdsourcing suits simple, straightforward natural language processing tasks: work by non-expert crowdsource workers is of a similar standard to that performed by experts for translation and translation assessment, transcription of a native language, word sense disambiguation, and temporal annotation [Snow et al., 2008].
   Research question at the collection creation stage: can untrained crowdsource workers undertake extended tasks which require them to be creative?
8. Crowdsourcing with Amazon Mechanical Turk
   A task is referred to as a "Human Intelligence Task" or HIT. The crowdsourcing procedure (a code sketch follows below):
   - HIT initiation: the requester uploads a HIT.
   - Work: workers carry out the HIT.
   - Review: the requester reviews the completed work and confirms payment to the worker at the previously set rate. The requester also has the option of paying more (a "bonus").
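The three steps above map onto the MTurk requester API. A minimal sketch of the lifecycle using the modern boto3 client (the original work predates boto3; the title, reward, bonus amount, and question file below are illustrative assumptions, not the authors' actual configuration):

```python
# Sketch of the HIT lifecycle with boto3's MTurk client.
# All concrete values (title, reward, bonus, question XML) are illustrative.
import boto3

mturk = boto3.client("mturk", region_name="us-east-1")

# HIT initiation: the requester uploads a HIT with a fixed base reward.
hit = mturk.create_hit(
    Title="Find and describe an interesting video segment",
    Description="Watch a video, mark a speech act, and write queries to refind it.",
    Reward="0.11",                       # base payment in USD, as a string
    MaxAssignments=1,
    AssignmentDurationInSeconds=3600,    # time a worker has to finish
    LifetimeInSeconds=7 * 24 * 3600,     # how long the HIT stays visible
    Question=open("hit_question.xml").read(),  # ExternalQuestion/HTMLQuestion XML
)

# Review: approve completed assignments, then optionally pay a bonus
# on top of the fixed reward (e.g. per correctly labeled speech act).
assignments = mturk.list_assignments_for_hit(HITId=hit["HIT"]["HITId"])
for a in assignments["Assignments"]:
    mturk.approve_assignment(AssignmentId=a["AssignmentId"])
    mturk.send_bonus(
        WorkerId=a["WorkerId"],
        AssignmentId=a["AssignmentId"],
        BonusAmount="0.10",              # illustrative bonus value
        Reason="Bonus for a correctly labeled speech act.",
    )
```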
9. Information expected from the worker to create a test collection for the RSR Task
   - Speech act type: "expressives" (apology, opinion); "assertives" (definition); "directives" (warning); "commissives" (promise).
   - Time of the labeled speech act: beginning and end.
   - An accurate transcript of the labeled speech act.
   - Queries to refind this speech act: a full-sentence query and a short web-style query (a sketch of one submission record follows below).
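Collected together, the fields above amount to one submission record per accepted HIT. A minimal sketch with hypothetical field names (this is not the authors' schema):

```python
# Hypothetical record for one accepted worker submission.
from dataclasses import dataclass

# The four speech act categories and their labels used in the task.
SPEECH_ACT_TYPES = {
    "expressives": ("apology", "opinion"),
    "assertives":  ("definition",),
    "directives":  ("warning",),
    "commissives": ("promise",),
}

@dataclass
class WorkerSubmission:
    video_id: str         # episode the segment was found in
    speech_act: str       # e.g. "apology", "definition", "warning", ...
    start_sec: float      # beginning of the labeled speech act
    end_sec: float        # end of the labeled speech act
    transcript: str       # accurate manual transcript of the segment
    sentence_query: str   # full-sentence query to refind the act
    web_query: str        # short web-style query
```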
10. Data management for Amazon MTurk
    ME10WWW videos vary in length → for longer videos, starting points at a distance of approximately 7 minutes apart are calculated (a sketch follows below):

    Data set   Episodes   Starting points
    Dev        247        562
    Test       1727       3278
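A minimal sketch of the starting-point calculation, assuming only the 7-minute spacing stated above (the authors' exact segmentation rules are not given beyond that spacing):

```python
# Cut each episode into playback starting points ~7 minutes apart.
INTERVAL_SEC = 7 * 60

def starting_points(duration_sec: float, interval: int = INTERVAL_SEC) -> list[int]:
    """Return playback offsets (in seconds) spaced `interval` seconds apart."""
    return list(range(0, int(duration_sec), interval))

# Example: a 20-minute episode yields three starting points.
print(starting_points(20 * 60))   # [0, 420, 840]
```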
11. Crowdsourcing experiment
    Worker expectations: the reward must be worth the work, judged as a per-hour rate.
    Pilot HIT: pilot wording; payment of $0.11 plus a bonus per speech act type.
    Worker feedback: the reward is not worth the work, and the task is too complicated.
    Updated HIT: reworded instructions; worked examples; payment of $0.19 plus a bonus (0-21 $); workers themselves suggest the bonus size (we mention that we are a non-profit organization).
    Outcome: the reward is worth the work, the task is comprehensible, and workers are not greedy!
12. HIT example
    Pilot: "Please watch the video and find a short portion of the video (a segment) that contains an interesting quote. The quote must fall into one of these six categories."
    Revised: "Imagine that you are watching videos on YouTube. When you come across something interesting you might want to share it on Facebook, Twitter or your favorite social network. Now please watch this video and search for an interesting video segment that you would like to share with others because it is (an apology, a definition, an opinion, a promise, a warning)."
13. Results: number of collected queries per speech act
    Prices: Dev set: $40 per 30 queries; Test set: $80 per 50 queries.
14. Results assessment
    - Number of accepted HITs = number of collected queries.
    - No overlap of workers between the dev and test sets.
    - Creative work invites creative cheating:
      - Copying and pasting the provided examples → examples should be pictures, not texts.
      - Choosing the option that no speech act was found in the video → manual assessment by the requester is needed (a sketch of one assessment aid follows below).
    - Workers rarely find noteworthy content later than the third minute from the starting playback point in the video.
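One aid for the manual assessment mentioned above is to automatically flag submissions whose queries merely echo the examples shown in the HIT. A minimal sketch using a plain string-similarity check; the example text and threshold are hypothetical, and the paper itself reports manual assessment by the requester:

```python
# Flag answers that look copied from the examples provided in the HIT.
import difflib

PROVIDED_EXAMPLES = [
    "i am sorry for the delay in this week's episode",  # hypothetical example text
]

def looks_copied(answer: str, examples=PROVIDED_EXAMPLES, threshold: float = 0.9) -> bool:
    """Return True if `answer` is near-identical to any provided example."""
    norm = answer.lower().strip()
    return any(
        difflib.SequenceMatcher(None, norm, ex).ratio() >= threshold
        for ex in examples
    )

print(looks_copied("I am sorry for the delay in this week's episode"))  # True
```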
15. Conclusions
    - It is possible to crowdsource extensive and complex tasks to support speech and language resources.
    - Use concepts and vocabulary familiar to the workers.
    - Pay attention to the technical issues of watching the video.
    - Preprocess videos into smaller segments.
    - Creative work demands a higher reward level, or simply a more flexible payment system.
    - There is a high level of wastage due to task complexity.
16. MediaEval 2012 Brave New Task: Search and Hyperlinking
    Use scenario: a user is searching for a known segment in a video collection. Because the information in that segment might not be sufficient for the information need, the user also wants links to other related video segments, which may help satisfy the information need related to this video.
    Sub-tasks:
    - Search: finding suitable video segments based on a short natural language query.
    - Linking: defining links to other relevant video segments in the collection.
17. MediaEval 2012
    Thank you for your attention! Welcome to MediaEval 2012! http://multimediaeval.org
