CrowdQ: Crowdsourced Query Understanding


Published on

Conference talk at CIDR 2013, January 2013, Asilomar, CA, USA.

  • Interesting. I too have a hypothesis that meaning of an expression (query is one of them) is what bulk of the receivers of the expression think it means. Please take a look at

    Let's see if we can collaborate. Best wishes,
    Are you sure you want to  Yes  No
    Your message goes here

CrowdQ: Crowdsourced Query Understanding

  1. 1. CrowdQ:   Crowdsourced  Query  Understanding     Gianluca  Demar8ni,  Beth  Trushkowsky,   Tim  Kraska,  Michael  J.  Franklin  
  2. 2. Scenario   Find  the  birthdate  of  the  mayor  of  the  capital   city  of  France     Gianluca  Demar8ni   2  
  3. 3. Gianluca  Demar8ni   3  
  4. 4. Gianluca  Demar8ni   4  
  5. 5. Gianluca  Demar8ni   5  
  6. 6. Gianluca  Demar8ni   6  
  7. 7. Mo8va8on   •  Web  Search  Engines  can  answer  simple  factual   queries  directly  on  the  result  page   •  Users  with  complex  informa8on  needs  are   oQen  unsa8sfied   •  Purely  automa8c  techniques  are  not  enough   •  We  want  to  solve  it  with  Crowdsourcing!   Gianluca  Demar8ni   7  
  8. 8. Background   •  Crowdsourcing  so  far  used  for  data  processing   – DB/SemWeb:  Data  integra8on  and  cleaning   – IR:  Relevance  judgments     We  use  the  crowd  to  understand  the  query   Gianluca  Demar8ni   8  
  9. 9. CrowdQ   •  CrowdQ  is  the  first  system  that  uses   crowdsourcing  to   – Understand  the  intended  meaning   – Build  a  structured  query  template   – Answer  the  query  over  Linked  Open  Data   Gianluca  Demar8ni   9  
  10. 10. Gianluca  Demar8ni   10  
  11. 11. User Keyword Query On#line'Complex'Query Processing Complex query classifier Crowdsourcing Platform Vetrical selection, Unstructured Search, ... POS + NER tagging Query Template Index Crowd Manager N Y Queries Templ + Answer Types Structured LOD Search Result Joiner Template Generation SERP t1 t2 t3 Off#line'Complex'Query Decomposition Structured Query Query Log query N Answer Composition LOD Open Data Cloud Match with existing query templates CrowdQ  Architecture   Gianluca  Demar8ni   11   Off-­‐line:  query  template  genera8on  with  the  help  of  the  crowd   On-­‐line:  query  template  matching  using  NLP  and  search  over  open  data  
  12. 12. Hybrid  Human-­‐Machine  Pipeline   Gianluca  Demar8ni   12   Q=  birthdate  of  actors  of  forrest  gump   Query  annota8on   Noun   Noun   Named  en8ty   Verifica8on   En8ty  Rela8ons   Is  forrest  gump  this  en8ty  in  the  query?   Which  is  the  rela8on  between:  actors  and  forrest  gump   starring   Schema  element   Starring                          <dbpedia-­‐owl:starring>     Verifica8on   Is  the  rela8on  between:   Indiana  Jones  –  Harrison  Ford   Back  to  the  Future  –  Michael  J.  Fox   of  the  same  type  as   Forrest  Gump  -­‐  actors      
  13. 13. Structured  query  genera8on   SELECT  ?y  ?x   WHERE  {  ?y  <dbpedia-­‐owl:birthdate>  ?x  .        ?z  <dbpedia-­‐owl:starring>  ?y  .        ?z  <rdfs:label>  ‘Forrest  Gump’  }   Gianluca  Demar8ni   13   Results  from  BTC09:   Q=  birthdate  of  actors  of  forrest  gump   MOVIE   MOVIE  
  14. 14. Current  Status   •  Realize  the  vision   •  Running  demo:   – Daniel  Haas,  Daniel  Bruckner,  Jonathan  Harper   •  Next  Steps   – Evalua8on  of  Crowd  effec8veness  at  each  step   – Comparison  hybrid  vs  machine  pipeline   Gianluca  Demar8ni   14  
  15. 15. Conclusions   •  CrowdQ:  an  hybrid  approach  to  complex  query   understanding   •  Combines  techniques  from  DB,  NLP,  IR,  Data   Mining,  and  Human  Intelligence     •  Ini8al  experiments  show  the  poten8al  of   CrowdQ   Gianluca  Demar8ni   15