
Getting Semantics from the Crowd

Talk given at the Dagstuhl seminar on Semantic Data Management, April 2012

Transcript

  • 1. Getting Semantics from the Crowd. Gianluca Demartini, eXascale Infolab, University of Fribourg, Switzerland.
  • 2. Semantic Web 2.0: not the Web 3.0. Getting semantics from (non-expert) people: from few publishers and many consumers (SW 1.0) to many publishers and many consumers (SW 2.0). (27-Apr-12, Gianluca Demartini, eXascale Infolab)
  • 3. read/write SW: Wikidata (http://meta.wikimedia.org/wiki/Wikidata). Semantics is about the meaning. Get people in the loop! Social computing for SemWeb applications.
  • 4. Crowdsourcing: exploit human intelligence to solve tasks that are simple for humans but complex for machines, with a large number of humans (the Crowd), split into small problems: micro-tasks (Amazon MTurk). Examples: Wikipedia, Flickr. Incentives: financial, fun, visibility.
  • 5. Crowdsourcing success stories: training sets for ML, image tagging, document annotation/translation, IR evaluation [Blanco et al. SIGIR 2011], CrowdDB [Franklin et al. SIGMOD 2011].
  • 6. Crowd-powered SW apps: entity linking [ZenCrowd at WWW12], creating/validating sameAs links, schema matching, ... add your own favorite application! (figure: HTML+RDFa pages linked to the LOD Cloud)
  • 7. ZenCrowd: combine both algorithmic and manual linking; automate manual linking via crowdsourcing; dynamically assess human workers with a probabilistic reasoning framework. (figure: machines running matching algorithms, complemented by the crowd)
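The combination described on this slide can be illustrated with a minimal sketch (not the actual ZenCrowd code): an algorithmic matcher supplies a prior confidence for a candidate link, and each crowd vote updates it in a naive-Bayes style, weighted by that worker's estimated reliability. The function name and the reliability numbers are illustrative assumptions.

```python
# Hypothetical sketch: fuse an algorithmic matcher's confidence with
# reliability-weighted crowd votes on one candidate link.

def link_posterior(prior, votes):
    """prior: matcher's confidence that the link is correct.
    votes: list of (vote, reliability) pairs, where vote is True/False
    and reliability is P(worker answers correctly)."""
    p_yes, p_no = prior, 1.0 - prior
    for vote, reliability in votes:
        if vote:  # worker confirmed the link
            p_yes *= reliability
            p_no *= 1.0 - reliability
        else:     # worker rejected the link
            p_yes *= 1.0 - reliability
            p_no *= reliability
    return p_yes / (p_yes + p_no)
```

With a 0.6 matcher prior, two reliable confirmations and one weak rejection push the posterior above 0.9, which is the intended effect: reliable workers dominate, but the algorithmic signal still contributes.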
  • 8. ZenCrowd architecture. (figure: entity extractors turn HTML and HTML+RDFa pages into micro matching tasks for a crowdsourcing platform; algorithmic matchers query an LOD index built over the LOD Open Data Cloud; a probabilistic network and decision engine combines worker decisions with algorithmic matches to produce the output)
  • 9. The micro-task.
  • 10. Entity factor graphs. Graph components: workers, links, clicks; prior probabilities; link factors; constraints. Probabilistic inference: select all links with posterior probability > τ. (figure: a factor graph with 2 workers, 6 clicks, and 3 candidate links, showing worker priors, link priors, observed click variables, link factors, SameAs constraints, and dataset unicity constraints)
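The inference step on this slide can be sketched as brute-force enumeration over a miniature factor graph: link priors and worker-click factors weight each joint assignment, a unicity constraint rules out assignments with more than one true link, and links whose marginal posterior exceeds τ are selected. This is a toy illustration under assumed numbers, not the paper's actual inference engine (which uses a full probabilistic network with per-worker priors).

```python
import itertools

def posteriors(link_priors, clicks, worker_prior):
    """link_priors: per-link prior P(link correct).
    clicks: list of (worker, link, clicked_yes) observations.
    worker_prior: P(worker answers correctly), shared here for simplicity.
    Returns the posterior P(link correct) for each link."""
    n = len(link_priors)
    z = 0.0
    marginals = [0.0] * n
    for assign in itertools.product([0, 1], repeat=n):
        if sum(assign) > 1:  # unicity constraint: at most one true link
            continue
        w = 1.0
        for i, a in enumerate(assign):  # link prior factors
            w *= link_priors[i] if a else 1.0 - link_priors[i]
        for _, link, yes in clicks:     # worker click factors
            agrees = yes == bool(assign[link])
            w *= worker_prior if agrees else 1.0 - worker_prior
        z += w
        for i, a in enumerate(assign):
            if a:
                marginals[i] += w
    return [m / z for m in marginals]

def select_links(posterior_list, tau=0.5):
    """Keep every candidate link whose posterior exceeds tau."""
    return {i for i, p in enumerate(posterior_list) if p > tau}
```

For example, with uniform 0.5 link priors, two workers clicking "yes" on link 0 and one clicking "no" on link 1 drive link 0's posterior well above a τ of 0.5 while the other candidates stay low.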
  • 11. ZenCrowd lessons learnt: crowdsourcing + probabilistic reasoning works! But: different worker communities perform differently; no differences across contexts; completion time may vary (based on reward); many low-quality workers + spam.
  • 12. ZenCrowd worker selection. (figure: worker precision vs. number of tasks for US and IN workers, with the top US worker highlighted)
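The selection in the plot can be sketched as follows, under assumptions of my own (the function names and the cutoff are illustrative): track each worker's precision on tasks with known answers, then keep only workers above a precision cutoff.

```python
from collections import defaultdict

def worker_precision(answers):
    """answers: list of (worker_id, was_correct) outcomes on
    known-answer (gold) tasks. Returns per-worker precision."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for worker, ok in answers:
        total[worker] += 1
        if ok:
            correct[worker] += 1
    return {w: correct[w] / total[w] for w in total}

def select_workers(answers, cutoff=0.7):
    """Keep only workers whose gold-task precision meets the cutoff."""
    return {w for w, p in worker_precision(answers).items() if p >= cutoff}
```

In ZenCrowd this kind of per-worker assessment is done dynamically inside the probabilistic framework rather than with a hard cutoff, but the effect shown in the plot is the same: low-precision workers are filtered out of the decision.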
  • 13. Challenges for Crowd-SW: how to design the micro-task; where to find the crowd (MTurk, Facebook with 900M users); evaluation (which ground truth?!); quality control / spam (need for spam benchmarks in crowdsourcing [Mechanical Cheat at CrowdSearch 2012]).
  • 14. (closing slide)
