Crowdsourcing for Multimedia Retrieval

Lecture by Marco Tagliasacchi (Politecnico di Milano) for the Summer School on Social Media Modeling and Search and the European Chapter of the ACM SIGMM event, supported by the CUbRIK and Social Sensor projects. 10-14 September, Fira, Santorini, Greece.

1. Crowdsourcing for Multimedia Retrieval. Marco Tagliasacchi, Politecnico di Milano, Italy
2. Outline
   - Crowdsourcing applications in multimedia retrieval
   - Aggregating annotations
   - Aggregating and learning
   - Crowdsourcing at work
3. Crowdsourcing applications in multimedia retrieval
4. Crowdsourcing
   - Crowdsourcing is an example of human computing
   - Use an online community of human workers to complete useful tasks
   - The task is outsourced to an undefined public
   - Main idea: design tasks that are
     - Easy for humans
     - Hard for machines
5. Crowdsourcing
   - Crowdsourcing platforms
     - Paid contributors
       - Amazon Mechanical Turk (www.mturk.com)
       - CrowdFlower (crowdflower.com)
       - oDesk (www.odesk.com)
       - ...
     - Volunteers
       - Foldit (www.fold.it)
       - Duolingo (www.duolingo.com)
       - ...
6. Applications in multimedia retrieval
   - Create annotated data sets for training
     - Reduces both cost and time needed to gather annotations,
     - ...but annotations might be noisy!
   - Validate the output of multimedia retrieval systems
   - Query expansion / reformulation
7. Creating annotated training sets [Sorokin and Forsyth, 2008]
   - Collect annotations for computer vision data sets
     - People segmentation
   (Figure: example annotations collected under Protocols 1 and 2.)
8. Creating annotated training sets [Sorokin and Forsyth, 2008]
   - Collect annotations for computer vision data sets
     - People segmentation and pose annotation
   (Figure: example annotation results obtained under Protocols 2, 3, and 4.)
9. Creating annotated training sets [Sorokin and Forsyth, 2008]
   - Observations:
     - Annotators make errors
     - Quality of annotators is heterogeneous
     - The quality of the annotations depends on the difficulty of the task
   (Figures: per-image annotation quality for Experiment 3, tracing the boundary of the person, scored by area(XOR)/area(AND) with mean 0.21, std 0.14, median 0.16; and for Experiment 4, clicking on 14 body landmarks, scored by mean pixel error with mean 8.71, std 6.29, median 7.35.)
10. Creating annotated training sets [Soleymani and Larson, 2010]
    - MediaEval 2010 Affect Task
    - Use of Amazon Mechanical Turk to annotate the Affect Task Corpus
      - 126 videos (2-5 minutes in length)
    - Annotate
      - Mood (e.g., pleased, helpless, energetic, etc.)
      - Emotion (e.g., sadness, joy, anger, etc.)
      - Boredom (nine-point rating scale)
      - Liking (nine-point rating scale)
11. Creating annotated training sets [Nowak and Ruger, 2010]
    - Crowdsourcing image concepts: 53 concepts, e.g.,
      - Abstract categories: partylife, beach holidays, snow, etc.
      - Time of the day: day, night, no visual cue
      - ...
    - Subset of 99 images from the ImageCLEF2009 dataset
    (Figure: the annotation tool used to acquire the expert annotations; the MTurk HIT template mirrors this tool, with each HIT covering the annotation of one image with all applicable concepts.)
12. Creating annotated training sets [Nowak and Ruger, 2010]
    - Study of expert and non-expert labeling
    - Inter-annotation agreement among experts: very high
    - Influence of the expert ground truth on concept-based retrieval ranking: very limited
    - Inter-annotation agreement among non-experts: high, although not as good as among experts
    - Influence of averaged annotations (experts vs. non-experts) on concept-based retrieval ranking: averaging filters out noisy non-expert annotations
13. Creating annotated training sets [Vondrick et al., 2010]
    - Crowdsourcing object tracking in video
    - Annotators draw bounding boxes
    (Figure: the video labeling user interface; all previously labeled entities are shown.)
14. Creating annotated training sets [Vondrick et al., 2010]
    - Annotators label the enclosing bounding box of an entity every T frames
    - Bounding boxes at intermediate time instants are interpolated (see the sketch below)
    - Interesting trade-off between
      - Cost of Turk workers
      - Cost of interpolation on the Amazon EC2 cloud
    (Figure: example annotations of field drills and basketball players.)
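To make the interpolation step concrete, here is a minimal sketch, not the authors' code, that linearly interpolates worker-drawn key boxes, assuming boxes are (x, y, w, h) tuples indexed by frame number; the actual system can run more sophisticated interpolation on Amazon EC2.

```python
def interpolate_boxes(key_boxes):
    """Linearly interpolate (x, y, w, h) boxes between annotated key frames.

    key_boxes: dict mapping frame index -> box drawn by a worker every T frames.
    Returns a dict covering every frame from the first to the last key frame.
    """
    frames = sorted(key_boxes)
    boxes = dict(key_boxes)
    for f0, f1 in zip(frames, frames[1:]):
        b0, b1 = key_boxes[f0], key_boxes[f1]
        for f in range(f0 + 1, f1):
            t = (f - f0) / (f1 - f0)  # fraction of the way from f0 to f1
            boxes[f] = tuple((1 - t) * a + t * b for a, b in zip(b0, b1))
    return boxes

# A box annotated at frames 0 and 10 (T = 10); frame 5 lands halfway between them.
print(interpolate_boxes({0: (0, 0, 20, 20), 10: (10, 10, 20, 20)})[5])  # (5.0, 5.0, 20.0, 20.0)
```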
15. Creating annotated training sets [Urbano et al., 2010]
    - Goal: evaluation of music information retrieval systems
    - Use crowdsourcing as an alternative to experts to create ground truths of partially ordered lists
    - Good agreement (92% complete + partial) with experts
    (Figure: the HIT design, which asks workers to listen to two melody incipits and judge which variation is more similar to the original, or whether they are equally similar or dissimilar.)
16. Validate the output of MIR systems [Snoek et al., 2010] [Freiburg et al., 2011]
    - Search engine for archival rock 'n' roll concert video
    - Use of crowdsourcing to improve, extend and share automatically detected concepts in video fragments
    (Figures: eleven common concert concepts detected automatically and validated through user feedback, e.g. audience, close-up, hands, keyboard, guitar player, singer, drummer, stage; a timeline-based video player whose colored dots mark automated visual detections and pop up a feedback overlay; quality of crowdsourced fragment labels vs. user-feedback agreement.)
17. Validate the output of MIR systems [Steiner et al., 2011]
    - Propose a browser extension to navigate detected events in videos
      - Visual events (shot changes)
      - Occurrence events (analysis of metadata by means of NLP to detect named entities)
      - Interest-based events (click counters on detected visual events)
    (Figure: screenshot of the YouTube browser extension showing the three event types; shot detection runs client-side using the HTML5 <video> and <canvas> JavaScript APIs.)
18. Validate the output of MIR systems [Goeau et al., 2011]
    - Visual plant species identification
      - Based on local visual features
      - Crowdsourced validation
    (Figure: GUI of the web application; at the time of writing, 858 user-contributed leaf images covering a set of 55 species.)
19. Validate the output of MIR systems [Yan et al., 2010]
    - CrowdSearch combines
      - Automated image search (local processing on mobile phones + backend processing)
      - Real-time human validation of search results (Amazon Mechanical Turk)
    - Studies the trade-off in terms of
      - Delay
      - Accuracy
      - Cost
    - More on this later...
    (Figure: an image search query, its candidate results, and the corresponding human validation tasks; CrowdSearch uses adaptive delay and result prediction models of human responses to decide when to invoke human validation.)
20. Query expansion / reformulation [Harris, 2012]
    - Search YouTube user-generated content
    - Natural language queries are restated and given as input to the YouTube search interface by
      - Students
      - Crowd in MTurk
    - Retrieval performance of the three strategies (student search, crowdsourced search, native YouTube search) is compared in terms of MAP; the restated-query strategies performed better than the native YouTube search interface, a statistically significant result (two-tailed, p < 0.05)
21. Aggregating annotations
22. Annotation model
    - A set of objects to annotate: i = 1, ..., I
    - A set of annotators: j = 1, ..., J
    - Types of annotations
      - Binary
      - Categorical (multi-class)
      - Numerical
      - Other
23. Annotation model
    - Each object i has an unknown true label y_i; annotator j provides the annotation y_i^j ∈ L for the objects assigned to it
      - Binary: |L| = 2
      - Multi-class: |L| > 2
    (Figure: bipartite graph linking objects, with their true labels, to the annotators who label them.)
24. Aggregating annotations
    - Majority voting (baseline)
      - For each object, assign the label that received the largest number of votes (a minimal sketch follows this slide)
    - Aggregating annotations
      - [Dawid and Skene, 1979]
      - [Snow et al., 2008]
      - [Whitehill et al., 2009]
      - ...
    - Aggregating and learning
      - [Sheng et al., 2008]
      - [Donmez et al., 2009]
      - [Raykar et al., 2010]
      - ...
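As a reference point for the methods listed above, a minimal sketch of the majority-voting baseline; the dictionary-of-label-lists layout and the tie-breaking rule are illustrative assumptions, not taken from any of the cited papers.

```python
from collections import Counter

def majority_vote(annotations):
    """Majority-voting baseline: per object, keep the most frequent label.

    annotations: dict mapping object id -> list of labels from the annotators.
    Ties are broken arbitrarily (first label reaching the maximum count).
    """
    return {obj: Counter(labels).most_common(1)[0][0]
            for obj, labels in annotations.items()}

# Three annotators label two images with binary labels.
votes = {"img1": [1, 1, 0], "img2": [0, 0, 1]}
print(majority_vote(votes))  # {'img1': 1, 'img2': 0}
```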
25. Aggregating annotations: majority voting
    - Assume that
      - The annotator quality is independent of the object: P(y_i^j = y_i) = p_j
      - All annotators have the same quality: p_j = p
    - The integrated quality of majority voting using 2N + 1 annotators is
      q = P(y^{MV} = y) = \sum_{l=0}^{N} \binom{2N+1}{l} p^{2N+1-l} (1-p)^l
26. Aggregating annotations: majority voting
    (Figure: the relationship between integrated labeling quality, individual annotator quality, and the number of labelers, for p ranging from 0.4 to 1.0 and 1 to 13 labelers; when p > 0.5 the integrated quality improves with more labelers, when p < 0.5 it degrades. The sketch below reproduces these values.)
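The integrated-quality formula of the previous slide, and the trend shown in the plot, can be checked numerically. This is a small sketch assuming 2N + 1 independent annotators of equal quality p, as on the slide:

```python
from math import comb

def integrated_quality(p, n_annotators):
    """q = P(y_MV = y) for an odd number 2N+1 of independent annotators,
    each labeling correctly with probability p: the majority is correct
    whenever at most N annotators are wrong."""
    assert n_annotators % 2 == 1, "majority voting assumes an odd number of annotators"
    big_n = (n_annotators - 1) // 2
    return sum(comb(n_annotators, l) * (1 - p) ** l * p ** (n_annotators - l)
               for l in range(big_n + 1))

# Quality improves with more labelers when p > 0.5 and degrades when p < 0.5.
print([round(integrated_quality(0.7, k), 3) for k in (1, 3, 5, 7)])  # [0.7, 0.784, 0.837, 0.874]
print([round(integrated_quality(0.4, k), 3) for k in (1, 3, 5, 7)])  # [0.4, 0.352, 0.317, 0.29]
```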
27. Aggregating annotations [Snow et al., 2008]
    - Binary labels: y_i^j ∈ {0, 1}
    - The true label is estimated by evaluating the posterior log-odds, i.e.,
      \log \frac{P(y_i = 1 \mid y_i^1, \ldots, y_i^J)}{P(y_i = 0 \mid y_i^1, \ldots, y_i^J)}
    - Applying Bayes' theorem:
      \log \frac{P(y_i = 1 \mid y_i^1, \ldots, y_i^J)}{P(y_i = 0 \mid y_i^1, \ldots, y_i^J)} = \sum_j \log \frac{P(y_i^j \mid y_i = 1)}{P(y_i^j \mid y_i = 0)} + \log \frac{P(y_i = 1)}{P(y_i = 0)}
      (posterior log-odds = sum of per-annotator log-likelihood ratios + prior log-odds)
28. Aggregating annotations [Snow et al., 2008]
    - How to estimate P(y_i^j | y_i = 1) and P(y_i^j | y_i = 0)?
    - Gold standard:
      - Some objects have known labels
      - Ask annotators to label these objects
      - Compute the empirical p.m.f. on the objects with known labels:
        P(y^j = 1 | y = 1) = (number of correct annotations) / (number of annotations of objects with true label 1)
    - Compute the performance of annotator j, assumed independent of the object (a sketch follows this slide):
      P(y_1^j | y_1 = 1) = P(y_2^j | y_2 = 1) = ... = P(y_I^j | y_I = 1) = P(y^j | y = 1)
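A minimal sketch of the gold-standard estimate described above, for a single annotator with binary labels; the function name and data layout are illustrative assumptions, not from the slides.

```python
def estimate_likelihoods(gold_labels, annotator_labels):
    """Empirical estimate of P(y^j = a | y = t) from gold-labeled objects.

    gold_labels: dict object id -> true label (0 or 1)
    annotator_labels: dict object id -> label given by annotator j
    Returns {(a, t): P(y^j = a | y = t)}.
    """
    counts = {(a, t): 0 for a in (0, 1) for t in (0, 1)}
    totals = {0: 0, 1: 0}
    for obj, t in gold_labels.items():
        if obj in annotator_labels:            # annotator j labeled this gold object
            counts[(annotator_labels[obj], t)] += 1
            totals[t] += 1
    return {(a, t): (counts[(a, t)] / totals[t] if totals[t] else 0.0)
            for (a, t) in counts}

# Annotator j is correct on 2 of 3 gold positives and on the single gold negative.
gold = {"o1": 1, "o2": 1, "o3": 1, "o4": 0}
labels_j = {"o1": 1, "o2": 1, "o3": 0, "o4": 0}
lik = estimate_likelihoods(gold, labels_j)
print(round(lik[(1, 1)], 2), round(lik[(0, 0)], 2))  # 0.67 1.0
```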
29. Aggregating annotations [Snow et al., 2008]
    - Each annotator's vote is weighted by the log-likelihood ratio for their given response (Naive Bayes); more reliable annotators are weighted more (see the sketch below):
      \log \frac{P(y_i = 1 \mid y_i^1, \ldots, y_i^J)}{P(y_i = 0 \mid y_i^1, \ldots, y_i^J)} = \sum_j \log \frac{P(y_i^j \mid y_i = 1)}{P(y_i^j \mid y_i = 0)} + \log \frac{P(y_i = 1)}{P(y_i = 0)}
    - Issue: obtaining a gold standard is costly!
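Putting the last three slides together, a hedged sketch of the Naive Bayes aggregation: each annotator's vote contributes the log-likelihood ratio estimated from the gold standard, plus a class-prior term. The smoothing constant eps and the data layout are illustrative choices, not specified on the slides.

```python
from math import log

def log_odds_aggregate(object_labels, likelihoods, prior_pos=0.5, eps=1e-3):
    """Posterior log-odds aggregation (Naive Bayes) for one object's binary labels.

    object_labels: dict annotator id -> label (0 or 1) given to this object
    likelihoods: dict annotator id -> {(a, t): P(y^j = a | y = t)} from gold data
    Returns (aggregated label, posterior log-odds).
    """
    score = log(prior_pos / (1.0 - prior_pos))      # prior log-odds
    for j, a in object_labels.items():
        p1 = max(likelihoods[j][(a, 1)], eps)       # P(y^j = a | y = 1), smoothed
        p0 = max(likelihoods[j][(a, 0)], eps)       # P(y^j = a | y = 0), smoothed
        score += log(p1 / p0)                       # reliable annotators weigh more
    return (1 if score > 0 else 0), score

# Two annotators: a reliable one voting 1 and an unreliable one voting 0.
liks = {
    "reliable":   {(1, 1): 0.9, (0, 1): 0.1, (1, 0): 0.1, (0, 0): 0.9},
    "unreliable": {(1, 1): 0.6, (0, 1): 0.4, (1, 0): 0.4, (0, 0): 0.6},
}
print(log_odds_aggregate({"reliable": 1, "unreliable": 0}, liks))  # label 1 wins: (1, 1.79...)
```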
