Entities, Graphs, and Crowdsourcing for better Web Search
Gianluca Demartini
eXascale Infolab
University of Fribourg, Switzerland
  
Gianluca Demartini
•  M.Sc. at University of Udine, Italy
•  Ph.D. at University of Hannover, Germany
–  Entity Retrieval
•  Worked for UC Berkeley (on Crowdsourcing), Yahoo! Research (Spain), L3S Research Center (Germany)
•  Post-doc at the eXascale Infolab, Uni Fribourg, Switzerland.
•  Lecturer for Social Computing in Fribourg
•  Tutorial on Entity Search at ECIR 2012, on Crowdsourcing at ESWC 2013 and ISWC 2013
•  Research Interests
–  Information Retrieval, Semantic Web, Crowdsourcing
demartini@exascale.info
  
Web of Data
•  Freebase
–  Acquired by Google in July 2010.
–  Knowledge Graph launched in May 2012.
•  Schema.org
–  Driven by major search engine companies
–  Machine-readable annotations of Web pages
•  Linked Open Data
–  31 billion triples, Sept. 2011
  
Linked Open Data
Z. Kaoudi and I. Manolescu, ICDE seminar 2013
  
I will talk about
•  Entity Linking/Disambiguation
–  On the Web using crowdsourcing
–  For scientific literature using graphs
•  Ad-hoc Object Retrieval (Entity Ranking)
–  Using IR and graphs
•  Crowdsourced Query Understanding
  
Disclaimer
•  No efficiency evaluation
–  Approaches not distributed
–  But designed to scale out
•  No user studies
–  Goal: Obtain high quality data
–  Only TREC-like evaluation on effectiveness
  
Entity Linking/Disambiguation
  
[Figure labels: http://dbpedia.org/resource/Facebook, http://dbpedia.org/resource/Instagram, fbase:Instagram, owl:sameAs, Google, Android]

HTML:
<p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>

RDFa enrichment:
<p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>
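The RDFa enrichment shown on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not ZenCrowd's code: `annotate_rdfa` is a hypothetical helper that wraps one paragraph in a span about a given URI and marks the first mention of the entity label, mirroring the markup above.

```python
import html


def annotate_rdfa(paragraph: str, label: str, uri: str) -> str:
    """Wrap a paragraph in an RDFa span about `uri` and mark the first
    mention of `label` with rdfs:label (hypothetical helper)."""
    text = html.escape(paragraph, quote=False)
    mention = html.escape(label, quote=False)
    if mention not in text:
        # Nothing to link: return the paragraph without annotations.
        return f"<p>{text}</p>"
    marked = text.replace(
        mention, f'<cite property="rdfs:label">{mention}</cite>', 1
    )
    return f'<p><span about="{html.escape(uri, quote=True)}">{marked}</span></p>'


if __name__ == "__main__":
    print(annotate_rdfa(
        "Facebook is not waiting for its initial public offering.",
        "Facebook",
        "http://dbpedia.org/resource/Facebook",
    ))
```

The URI passed in would come from the linking step (algorithmic or crowdsourced); the helper only handles the markup.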
  
Crowdsourcing
•  Exploit human intelligence to solve
–  Tasks simple for humans, complex for machines
–  With a large number of humans (the Crowd)
–  Small problems: micro-tasks (Amazon MTurk)
•  Examples
–  Wikipedia, Image tagging
•  Incentives
–  Financial, fun, visibility
  
ZenCrowd
•  Combine both algorithmic and manual linking
•  Automate manual linking via crowdsourcing
•  Dynamically assess human workers with a probabilistic reasoning framework
[Figure labels: Crowd, Algorithms, Machines]
  
ZenCrowd Architecture
[Figure: ZenCrowd architecture. Input: HTML Pages. Output: HTML+RDFa Pages. Components: Entity Extractors, LOD Index (Get Entity) over the LOD Open Data Cloud, Algorithmic Matchers, Probabilistic Network, Decision Engine, Micro-Task Manager, Crowdsourcing Platform (Micro Matching Tasks, Workers Decisions)]
  
Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).
  
Algorithmic Matching
•  Inverted index over LOD entities
–  DBPedia, Freebase, Geonames, NYT
•  TF-IDF (IR ranking function)
•  Top ranked URIs linked to entities in docs
•  Threshold on the ranking function or top N
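The matching steps listed on this slide can be sketched as a tiny in-memory inverted index with TF-IDF scoring. This is a minimal illustration, not the ZenCrowd implementation: the `EntityIndex` class, the smoothed IDF formula, and the toy entity descriptions are all assumptions made for the example.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    return text.lower().split()


class EntityIndex:
    """Toy inverted index over entity descriptions (URI -> text)."""

    def __init__(self, entities):
        self.term_counts = {uri: Counter(tokenize(t)) for uri, t in entities.items()}
        self.postings = defaultdict(set)  # term -> URIs containing it
        for uri, counts in self.term_counts.items():
            for term in counts:
                self.postings[term].add(uri)
        self.n_docs = len(entities)

    def idf(self, term):
        # Smoothed IDF (an assumption; many variants exist).
        df = len(self.postings.get(term, ()))
        return math.log((1 + self.n_docs) / (1 + df)) + 1.0

    def search(self, mention, top_n=3, threshold=0.0):
        """Return up to top_n (URI, score) pairs with score >= threshold."""
        scores = defaultdict(float)
        for term in tokenize(mention):
            for uri in self.postings.get(term, ()):
                scores[uri] += self.term_counts[uri][term] * self.idf(term)
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        return [(uri, s) for uri, s in ranked if s >= threshold][:top_n]


# Toy data: the descriptions are invented for illustration.
index = EntityIndex({
    "http://dbpedia.org/resource/Instagram": "Instagram photo sharing application",
    "http://dbpedia.org/resource/Facebook": "Facebook social network",
    "http://dbpedia.org/resource/Google": "Google search engine company",
})
candidates = index.search("Instagram", top_n=2)
```

Both cut-offs from the slide appear as parameters: `threshold` filters on the ranking score and `top_n` keeps only the best candidates.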
  
Entity Factor Graphs
•  Graph components
–  Workers, links, clicks
–  Prior probabilities
–  Link Factors
–  Constraints
•  Probabilistic Inference
–  Select all links with posterior prob > τ
[Figure: factor graph with 2 workers (w1, w2), 6 clicks (c11, c12, c13, c21, c22, c23), and 3 candidate links (l1, l2, l3). Worker priors: pw1( ), pw2( ). Link priors: pl1( ), pl2( ), pl3( ). Link factors: lf1( ), lf2( ), lf3( ). Observed variables: clicks. SameAs constraints: sa1-2( ). Dataset unicity constraints: u2-3( )]
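The selection rule "posterior prob > τ" can be illustrated with a deliberately simplified model. The sketch below is not the factor graph from the paper: it assumes workers vote independently, that each worker has a single reliability value, and it ignores the SameAs and unicity constraints. All function and variable names are hypothetical.

```python
def link_posterior(prior, votes, reliability):
    """Posterior that a candidate link is correct, given worker clicks.

    prior: prior probability of the link.
    votes: {worker: True/False} clicks on this link.
    reliability: {worker: probability that the worker's click is right}.
    """
    p_true, p_false = prior, 1.0 - prior
    for worker, vote in votes.items():
        r = reliability[worker]
        p_true *= r if vote else 1.0 - r
        p_false *= (1.0 - r) if vote else r
    total = p_true + p_false
    return p_true / total if total > 0 else prior


def select_links(priors, clicks, reliability, tau=0.5):
    """Keep every link whose posterior exceeds the threshold tau."""
    selected = {}
    for link, prior in priors.items():
        p = link_posterior(prior, clicks.get(link, {}), reliability)
        if p > tau:
            selected[link] = p
    return selected


# Same shape as the slide's example: 2 workers, 6 clicks, 3 candidate links.
priors = {"l1": 0.5, "l2": 0.5, "l3": 0.5}
reliability = {"w1": 0.9, "w2": 0.6}
clicks = {
    "l1": {"w1": True, "w2": True},
    "l2": {"w1": False, "w2": True},
    "l3": {"w1": False, "w2": False},
}
accepted = select_links(priors, clicks, reliability, tau=0.5)
```

With these toy numbers the more reliable worker dominates: only the link both workers clicked passes the threshold.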
  
Entity Factor Graphs
•  Training phase
–  Initialize worker priors
–  with k matches on known answers
•  Updating worker priors
–  Use link decision as new observations
–  Compute new worker probabilities
•  Identify (and discard) unreliable workers
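The update loop above can be sketched with simple agreement counts. This is an assumption-laden illustration rather than the paper's inference procedure: `WorkerStats`, the add-one smoothing, and the 0.5 cut-off are hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class WorkerStats:
    # Pseudo-counts give a neutral 0.5 prior before any observation
    # (add-one smoothing, an assumption of this sketch).
    agree: float = 1.0
    total: float = 2.0

    @property
    def reliability(self):
        return self.agree / self.total

    def observe(self, vote, decision):
        """Record whether the worker's vote matched the final link decision."""
        self.total += 1
        if vote == decision:
            self.agree += 1


def update_workers(stats, votes, decisions):
    """votes: {link: {worker: bool}}; decisions: {link: bool}."""
    for link, decision in decisions.items():
        for worker, vote in votes.get(link, {}).items():
            stats.setdefault(worker, WorkerStats()).observe(vote, decision)
    return stats


def reliable_workers(stats, min_reliability=0.5):
    """Workers below the cut-off would be discarded."""
    return {w for w, s in stats.items() if s.reliability >= min_reliability}


votes = {
    "l1": {"w1": True, "w2": False},
    "l2": {"w1": False, "w2": False},
    "l3": {"w1": True, "w2": False},
}
decisions = {"l1": True, "l2": False, "l3": True}
stats = update_workers({}, votes, decisions)
```

The same `update_workers` call serves both phases: in training, `decisions` holds the known answers; afterwards it holds the system's own link decisions.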
  
Experimental Evaluation
•  Datasets
–  25 news articles from
   •  CNN.com (Global news)
   •  NYTimes.com (Global news)
   •  Washington-post.com (US local news)
   •  Timesofindia.indiatimes.com (India news)
   •  Swissinfo.com (Switzerland local news)
–  40M entities (Freebase, DBPedia, Geonames, NYT)
  
Worker Selection
[Figure: Worker Precision (0 to 1) vs. Number of Tasks (0 to 500) for US Workers and IN Workers, with the Top US Worker marked]
[Figure: Precision (0.6 to 0.8) vs. Top K workers (K = 1 to 9)]

Lessons Learnt
•  Crowdsourcing + Prob reasoning works!
•  But
–  Different worker communities perform differently
–  Many low quality workers
–  Completion time may vary (based on reward)
•  Need to find the right workers for your task (see WWW13 paper)
  
ZenCrowd Summary
•  ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking
•  Standard crowdsourcing improves 6% over automatic
•  4% - 35% improvement over standard crowdsourcing
•  14% average improvement over automatic approaches
•  On-going work:
–  Also used for instance matching across datasets
–  3-way blocking with the crowd
http://exascale.info/zencrowd/
  
Entity Disambiguation in Scientific Literature
•  Using a background concept graph
Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux, Alexey Boyarsky, and Oleg Ruchayskiy. Ontology-Based Word Sense Disambiguation in the Scientific Domain. In: 35th European Conference on Information Retrieval (ECIR 2013).
http://scienceWISE.info/
  
Entity Ranking
  
Ad-hoc Object Retrieval
•  Once entities have been identified…
•  We want to rank them as answers to a query
•  AOR
–  Given the description of an entity
–  give me back its identifier
–  Input: query q, data graph G
–  Output: ranked list of URIs from G
  
A Hybrid Approach to AOR
Alberto Tonon, Gianluca Demartini, and Philippe Cudré-Mauroux. Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. In: 35th Annual ACM SIGIR Conference (SIGIR 2012).
  
[Figure: hybrid AOR architecture. index(): Inverted Index, Structured Inverted Index, RDF Store. query(): User, Keyword Query, Query Annotation and Expansion (WordNet, 3rd party search engines, Pseudo-Relevance Feedback), Entity Search with Ranking Functions, intermediate top-k results, Graph Traversals (queries on object properties), Neighborhoods (queries on datatype properties), Graph-Enriched Results, Final Ranking Function]
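The two-stage idea in this architecture (keyword search first, then graph enrichment of the top-k results) can be sketched as follows. This is a toy illustration under stated assumptions, not the system from the paper: term overlap stands in for the inverted-index ranking functions, a plain adjacency dict stands in for the RDF store, and the `decay` weight and the `ex:` identifiers are invented.

```python
from collections import defaultdict


def keyword_scores(query, descriptions):
    """Stand-in for inverted-index ranking: number of shared terms."""
    terms = set(query.lower().split())
    scores = {}
    for uri, text in descriptions.items():
        overlap = len(terms & set(text.lower().split()))
        if overlap:
            scores[uri] = float(overlap)
    return scores


def hybrid_rank(query, descriptions, graph, top_k=2, decay=0.5):
    """Rank by keyword score, then let the top-k results pass a share
    of their score to their graph neighbors."""
    base = keyword_scores(query, descriptions)
    seeds = sorted(base, key=lambda u: (-base[u], u))[:top_k]
    final = defaultdict(float, base)
    for seed in seeds:
        for neighbor in graph.get(seed, ()):
            final[neighbor] += decay * base[seed]
    return sorted(final.items(), key=lambda kv: (-kv[1], kv[0]))


descriptions = {
    "ex:ForrestGump": "forrest gump film",
    "ex:GumpNovel": "forrest gump novel",
    "ex:TomHanks": "tom hanks actor",
}
graph = {"ex:ForrestGump": ["ex:TomHanks"]}  # e.g. an object property
ranked = hybrid_rank("forrest gump film", descriptions, graph)
```

Here `ex:TomHanks` shares no term with the query, yet it enters the ranking through its graph link to a top-ranked entity, which is the effect the graph-enrichment stage is after.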
  
AOR Evaluation
•  1.3 billion RDF triples from LOD cloud
•  92 and 50 queries
•  Crowdsourced relevance judgments
•  semsearch.yahoo.com
  
Evaluation Results
  
Summary
•  AOR = “Given the description of an entity, give me back its identifier”
•  Combining classic IR techniques + structured database storing graph data
•  Significantly better results (up to +25% MAP over BM25 baseline).
•  Overhead caused by the graph traversal part is limited
http://exascale.info/AOR/
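For readers unfamiliar with the MAP metric quoted above, here is a minimal sketch of how Mean Average Precision is commonly computed from ranked lists and relevance judgments. The function names and the toy runs are assumptions for illustration; the numbers are not from the evaluation.

```python
def average_precision(ranked, relevant):
    """Mean of precision@rank over the ranks where a relevant URI appears."""
    hits = 0
    total = 0.0
    for rank, uri in enumerate(ranked, start=1):
        if uri in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant) if relevant else 0.0


def mean_average_precision(runs):
    """runs: list of (ranked list of URIs, set of relevant URIs), one per query."""
    if not runs:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in runs) / len(runs)


# Toy judgments, invented for the example.
runs = [
    (["a", "b", "c"], {"a", "c"}),
    (["b", "a"], {"a"}),
]
map_score = mean_average_precision(runs)
```

A relative gain such as "+25% MAP" then compares this score for the hybrid system against the same score for the baseline on the same queries.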
  
CrowdQ: Crowdsourced Query Understanding
  
birthdate of mayor of capital city of france
capital city of france
mayor of paris
birthdate of Bertrand Delanoë
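The step-by-step resolution shown on these slides can be sketched as an inside-out evaluation of nested "X of Y" phrases. This is a naive illustration, not CrowdQ: it assumes the query splits cleanly on " of ", and the lookup table `kb` is a hypothetical stand-in for a knowledge base, with a placeholder in place of the final answer.

```python
def decompose(query):
    """Split 'a of b of c' into relations ['a', 'b'] and innermost entity 'c'."""
    parts = [p.strip() for p in query.split(" of ")]
    return parts[:-1], parts[-1]


def answer(query, kb):
    """Resolve the innermost sub-query first and feed each result outward."""
    relations, entity = decompose(query)
    steps = []
    for relation in reversed(relations):
        steps.append(f"{relation} of {entity}")
        entity = kb[(relation, entity.lower())]
    return entity, steps


# Hypothetical lookup table; the last value is a placeholder, not real data.
kb = {
    ("capital city", "france"): "paris",
    ("mayor", "paris"): "Bertrand Delanoë",
    ("birthdate", "bertrand delanoë"): "BIRTHDATE_PLACEHOLDER",
}
result, steps = answer("birthdate of mayor of capital city of france", kb)
```

The recorded `steps` reproduce the three intermediate queries from the slides, which is the decomposition a search engine would need in order to answer the original query directly.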
  
Motivation
•  Web Search Engines can answer simple factual queries directly on the result page
•  Users with complex information needs are often unsatisfied
•  Purely automatic techniques are not enough
•  We want to solve it with Crowdsourcing!
  
CrowdQ
•  CrowdQ is the first system that uses crowdsourcing to
–  Understand the intended meaning
–  Build a structured query template
–  Answer the query over Linked Open Data
  
Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
  
CrowdQ Architecture
[Figure: CrowdQ architecture. On-line Complex Query Processing: User, Keyword Query, Complex query classifier (Y/N), Vertical selection, Unstructured Search, ..., POS + NER tagging, Match with existing query templates, Query Template Index, Structured Query, Structured LOD Search over the LOD Open Data Cloud, Result Joiner, Answer Composition, SERP. Off-line Complex Query Decomposition: Query Log, Crowd Manager, Crowdsourcing Platform, Template Generation (t1, t2, t3), Queries Templ + Answer Types]
Off-line: query template generation with the help of the crowd
On-line: query template matching using NLP and search over open data

Hybrid Human-Machine Pipeline
Q= birthdate of actors of forrest gump
Query annotation: Noun, Noun, Named entity
Verification: Is forrest gump this entity in the query?
Entity Relations: Which is the relation between: actors and forrest gump? starring
Schema element: Starring <dbpedia-owl:starring>
Verification: Is the relation between:
   Indiana Jones – Harrison Ford
   Back to the Future – Michael J. Fox
   of the same type as
   Forrest Gump - actors
Structured query generation:
PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE { ?y dbpedia-owl:birthdate ?x .
        ?z dbpedia-owl:starring ?y .
        ?z rdfs:label "Forrest Gump" }
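The final step of this pipeline, filling a query template with the verified relation and entity, can be sketched as plain string templating. This is an illustrative assumption, not CrowdQ's template format: `SPARQL_TEMPLATE` and `build_query` are hypothetical names, and the template simply follows the shape of the query on the slide.

```python
# Template shaped like the slide's query; the slots are assumptions.
SPARQL_TEMPLATE = """PREFIX dbpedia-owl: <http://dbpedia.org/ontology/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?y ?x
WHERE {{ ?y dbpedia-owl:{attribute} ?x .
        ?z dbpedia-owl:{relation} ?y .
        ?z rdfs:label "{label}" }}"""


def build_query(attribute, relation, label):
    """Fill the template; backslashes and quotes in the label are escaped."""
    safe_label = label.replace("\\", "\\\\").replace('"', '\\"')
    return SPARQL_TEMPLATE.format(
        attribute=attribute, relation=relation, label=safe_label
    )


query = build_query("birthdate", "starring", "Forrest Gump")
```

Once a template like this exists for "attribute of relation of entity" queries, a new query of the same shape only needs its slots filled, which matches the on-line template matching described for the architecture.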
  
Results from BTC09:
Q= birthdate of actors of forrest gump
[Figure labels: MOVIE, MOVIE]
  
Conclusions
•  Structured Data make Web Search better
•  Exploit the best out of structured and unstructured data (Hybrid AOR)
•  Crowd can help in understanding semantics
•  Hybrid human-machine systems (ZenCrowd)
•  Exploit Human Intelligence at Scale (CrowdQ)
gianlucademartini.net demartini@exascale.info
  

More Related Content

Viewers also liked

Dagstuhl2014
Dagstuhl2014Dagstuhl2014
Dagstuhl2014
eXascale Infolab
 
An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...
An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...
An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...
eXascale Infolab
 
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...
eXascale Infolab
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
eXascale Infolab
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific Literature
eXascale Infolab
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
eXascale Infolab
 
Braintalk cuso nm
Braintalk cuso nmBraintalk cuso nm
Braintalk cuso nm
eXascale Infolab
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
eXascale Infolab
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
eXascale Infolab
 

Viewers also liked (9)

Dagstuhl2014
Dagstuhl2014Dagstuhl2014
Dagstuhl2014
 
An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...
An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...
An Integrated Socio/Technical Crowdsourcing Platform for Accelerating Returns...
 
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collabor...
 
Entity-Centric Data Management
Entity-Centric Data ManagementEntity-Centric Data Management
Entity-Centric Data Management
 
Ontology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific LiteratureOntology-Based Word Sense Disambiguation for Scientific Literature
Ontology-Based Word Sense Disambiguation for Scientific Literature
 
SANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference ResolutionSANAPHOR: Ontology-based Coreference Resolution
SANAPHOR: Ontology-based Coreference Resolution
 
Braintalk cuso nm
Braintalk cuso nmBraintalk cuso nm
Braintalk cuso nm
 
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
Fixing the Domain and Range of Properties in Linked Data by Context Disambigu...
 
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data OceansDependency-Driven Analytics: A Compass for Uncharted Data Oceans
Dependency-Driven Analytics: A Compass for Uncharted Data Oceans
 

Similar to Entities, Graphs, and Crowdsourcing for better Web Search

Human Computation for Big Data
Human Computation for Big DataHuman Computation for Big Data
Human Computation for Big Data
eXascale Infolab
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
Srinath Perera
 
Scientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesScientific Software Challenges and Community Responses
Scientific Software Challenges and Community Responses
Daniel S. Katz
 
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...
eXascale Infolab
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
CrowdFlower
 
Offensive OSINT
Offensive OSINTOffensive OSINT
Offensive OSINT
Christian Martorella
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
Tao Feng
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
OpenSource Connections
 
Implimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyImplimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled Technology
Indiana Online Users Group
 
Personalised Access to Linked Data
Personalised Access to Linked DataPersonalised Access to Linked Data
Personalised Access to Linked Data
Milan Dojchinovski
 
OpenNeuro: a free online platform for sharing and analysis of neuroimaging data
OpenNeuro: a free online platform for sharing and analysis of neuroimaging dataOpenNeuro: a free online platform for sharing and analysis of neuroimaging data
OpenNeuro: a free online platform for sharing and analysis of neuroimaging data
Krzysztof Gorgolewski
 
Wimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity ReportWimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity Report
Fabien Gandon
 
Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)
stelligence
 
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
Tom LaGatta
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data Science
Splunk
 
Designing a synergistic relationship between undergraduate Data Science educa...
Designing a synergistic relationship between undergraduate Data Science educa...Designing a synergistic relationship between undergraduate Data Science educa...
Designing a synergistic relationship between undergraduate Data Science educa...
Ciera Martinez
 
Data science presentation
Data science presentationData science presentation
Data science presentation
MSDEVMTL
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education Organisation
Mathieu d'Aquin
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrust
Menchita Falcutila Dumlao
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your Role
Jay Gendron
 

Similar to Entities, Graphs, and Crowdsourcing for better Web Search (20)

Human Computation for Big Data
Human Computation for Big DataHuman Computation for Big Data
Human Computation for Big Data
 
Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack Big Data Analysis : Deciphering the haystack
Big Data Analysis : Deciphering the haystack
 
Scientific Software Challenges and Community Responses
Scientific Software Challenges and Community ResponsesScientific Software Challenges and Community Responses
Scientific Software Challenges and Community Responses
 
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...
ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for...
 
How Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment AnalysisHow Oracle Uses CrowdFlower For Sentiment Analysis
How Oracle Uses CrowdFlower For Sentiment Analysis
 
Offensive OSINT
Offensive OSINTOffensive OSINT
Offensive OSINT
 
Data council sf amundsen presentation
Data council sf    amundsen presentationData council sf    amundsen presentation
Data council sf amundsen presentation
 
How To Structure Your Search Team for Success
How To Structure Your Search Team for SuccessHow To Structure Your Search Team for Success
How To Structure Your Search Team for Success
 
Implimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled TechnologyImplimenting and Mitigating Change with all of this Newfangled Technology
Implimenting and Mitigating Change with all of this Newfangled Technology
 
Personalised Access to Linked Data
Personalised Access to Linked DataPersonalised Access to Linked Data
Personalised Access to Linked Data
 
OpenNeuro: a free online platform for sharing and analysis of neuroimaging data
OpenNeuro: a free online platform for sharing and analysis of neuroimaging dataOpenNeuro: a free online platform for sharing and analysis of neuroimaging data
OpenNeuro: a free online platform for sharing and analysis of neuroimaging data
 
Wimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity ReportWimmics Research Team 2015 Activity Report
Wimmics Research Team 2015 Activity Report
 
Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)Splunk for DataScience (.conf2014)
Splunk for DataScience (.conf2014)
 
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014LaGatta and de Garrigues - Splunk for Data Science - .conf2014
LaGatta and de Garrigues - Splunk for Data Science - .conf2014
 
Splunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data ScienceSplunk conf2014 - Splunk for Data Science
Splunk conf2014 - Splunk for Data Science
 
Designing a synergistic relationship between undergraduate Data Science educa...
Designing a synergistic relationship between undergraduate Data Science educa...Designing a synergistic relationship between undergraduate Data Science educa...
Designing a synergistic relationship between undergraduate Data Science educa...
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Putting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education OrganisationPutting Linked Data to Use in a Large Higher-Education Organisation
Putting Linked Data to Use in a Large Higher-Education Organisation
 
Lec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrustLec 1 integrating data science and data analytics in various research thrust
Lec 1 integrating data science and data analytics in various research thrust
 
Big Data in NATO and Your Role
Big Data in NATO and Your RoleBig Data in NATO and Your Role
Big Data in NATO and Your Role
 

More from eXascale Infolab

Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link PredictionBeyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
Beyond Triplets: Hyper-Relational Knowledge Graph Embedding for Link Prediction
eXascale Infolab
 
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
It Takes Two: Instrumenting the Interaction between In-Memory Databases and S...
eXascale Infolab
 
Representation Learning on Complex Graphs
Representation Learning on Complex GraphsRepresentation Learning on Complex Graphs
Representation Learning on Complex Graphs
eXascale Infolab
 
A force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory mapA force directed approach for offline gps trajectory map
A force directed approach for offline gps trajectory map
eXascale Infolab
 
Cikm 2018
Cikm 2018Cikm 2018
Cikm 2018
eXascale Infolab
 
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
HistoSketch: Fast Similarity-Preserving Sketching of Streaming Histograms wit...
eXascale Infolab
 
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
SwissLink: High-Precision, Context-Free Entity Linking Exploiting Unambiguous...
eXascale Infolab
 
Crowd scheduling www2016
Crowd scheduling www2016Crowd scheduling www2016
Crowd scheduling www2016
eXascale Infolab
 
Efficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked DataEfficient, Scalable, and Provenance-Aware Management of Linked Data
Efficient, Scalable, and Provenance-Aware Management of Linked Data
eXascale Infolab
 
SSSW 2015 Sense Making
SSSW 2015 Sense MakingSSSW 2015 Sense Making
SSSW 2015 Sense Making
eXascale Infolab
 
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked DataLDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
LDOW2015 - Uduvudu: a Graph-Aware and Adaptive UI Engine for Linked Data
eXascale Infolab
 
Executing Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web DataExecuting Provenance-Enabled Queries over Web Data
Executing Provenance-Enabled Queries over Web Data
eXascale Infolab
 
The Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task CrowdsourcingThe Dynamics of Micro-Task Crowdsourcing
The Dynamics of Micro-Task Crowdsourcing
eXascale Infolab
 
CIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition rankingCIKM14: Fixing grammatical errors by preposition ranking
CIKM14: Fixing grammatical errors by preposition ranking
eXascale Infolab
 
OLTP-Bench
OLTP-BenchOLTP-Bench
OLTP-Bench
eXascale Infolab
 
An Introduction to Big Data
An Introduction to Big DataAn Introduction to Big Data
An Introduction to Big Data
eXascale Infolab
 
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
Internet Infrastructures for Big Data (Verisign's Distinguished Speaker Series)
eXascale Infolab
 
Hasler2014
Hasler2014Hasler2014
Hasler2014
eXascale Infolab
 
Crowdsourcing is for the tail
Crowdsourcing is for the tailCrowdsourcing is for the tail
Crowdsourcing is for the tail
eXascale Infolab
 
The Evolution of Big Data Frameworks
The Evolution of Big Data FrameworksThe Evolution of Big Data Frameworks
The Evolution of Big Data Frameworks
eXascale Infolab
 

Entities, Graphs, and Crowdsourcing for better Web Search

  • 1. Entities, Graphs, and Crowdsourcing for better Web Search
      Gianluca Demartini
      eXascale Infolab, University of Fribourg, Switzerland
  • 2. Gianluca Demartini
      • M.Sc. at University of Udine, Italy
      • Ph.D. at University of Hannover, Germany
        – Entity Retrieval
      • Worked for UC Berkeley (on Crowdsourcing), Yahoo! Research (Spain), L3S Research Center (Germany)
      • Post-doc at the eXascale Infolab, Uni Fribourg, Switzerland
      • Lecturer for Social Computing in Fribourg
      • Tutorial on Entity Search at ECIR 2012, on Crowdsourcing at ESWC 2013 and ISWC 2013
      • Research Interests
        – Information Retrieval, Semantic Web, Crowdsourcing
      demartini@exascale.info
  • 5. Web of Data
      • Freebase
        – Acquired by Google in July 2010.
        – Knowledge Graph launched in May 2012.
      • Schema.org
        – Driven by major search engine companies
        – Machine-readable annotations of Web pages
      • Linked Open Data
        – 31 billion triples, Sept. 2011
  • 6. Linked Open Data
      Source: Z. Kaoudi and I. Manolescu, ICDE seminar 2013
  • 7. I will talk about
      • Entity Linking/Disambiguation
        – On the Web using crowdsourcing
        – For scientific literature using graphs
      • Ad-hoc Object Retrieval (Entity Ranking)
        – Using IR and graphs
      • Crowdsourced Query Understanding
  • 8. Disclaimer
      • No efficiency evaluation
        – Approaches not distributed
        – But designed to scale out
      • No user studies
        – Goal: Obtain high quality data
        – Only TREC-like evaluation on effectiveness
  • 10. RDFa enrichment
      Linked entities: http://dbpedia.org/resource/Facebook, http://dbpedia.org/resource/Instagram, fbase:Instagram (owl:sameAs); also shown: Google, Android
      HTML:
        <p>Facebook is not waiting for its initial public offering to make its first big purchase.</p><p>In its largest acquisition to date, the social network has purchased Instagram, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</p>
      After RDFa enrichment:
        <p><span about="http://dbpedia.org/resource/Facebook"><cite property="rdfs:label">Facebook</cite> is not waiting for its initial public offering to make its first big purchase.</span></p><p><span about="http://dbpedia.org/resource/Instagram">In its largest acquisition to date, the social network has purchased <cite property="rdfs:label">Instagram</cite>, the popular photo-sharing application, for about $1 billion in cash and stock, the company said Monday.</span></p>
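The enrichment step on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name `enrich_with_rdfa` and the `links` mapping (mention to URI) are hypothetical, the input is plain text rather than parsed HTML, and only the first occurrence of each mention is wrapped. The slide wraps the whole sentence in the `span`; the sketch wraps just the mention to stay short.

```python
import html
import re


def enrich_with_rdfa(text: str, links: dict[str, str]) -> str:
    """Wrap the first occurrence of each linked mention in RDFa markup.

    `links` maps a surface form (e.g. "Facebook") to an entity URI.
    Hypothetical helper, not the ZenCrowd implementation.
    """
    if not links:
        return text
    # One pass over the text, longest mentions first, so that markup
    # inserted for one entity is never re-scanned for another.
    alternation = "|".join(
        re.escape(m) for m in sorted(links, key=len, reverse=True)
    )
    pattern = re.compile(r"\b(?:" + alternation + r")\b")
    seen: set[str] = set()

    def wrap(match: re.Match) -> str:
        mention = match.group(0)
        if mention in seen:
            return mention  # leave later occurrences untouched
        seen.add(mention)
        uri = html.escape(links[mention], quote=True)
        label = html.escape(mention)
        return (
            f'<span about="{uri}">'
            f'<cite property="rdfs:label">{label}</cite></span>'
        )

    return pattern.sub(wrap, text)


if __name__ == "__main__":
    demo = enrich_with_rdfa(
        "Facebook has purchased Instagram.",
        {
            "Facebook": "http://dbpedia.org/resource/Facebook",
            "Instagram": "http://dbpedia.org/resource/Instagram",
        },
    )
    print(demo)
```

A production version would operate on a parsed DOM rather than raw text, so that existing tags and attributes are never matched as mentions.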
  • 11. Crowdsourcing
      • Exploit human intelligence to solve
        – Tasks simple for humans, complex for machines
        – With a large number of humans (the Crowd)
        – Small problems: micro-tasks (Amazon MTurk)
      • Examples
        – Wikipedia, Image tagging
      • Incentives
        – Financial, fun, visibility
  • 12. ZenCrowd
      • Combine both algorithmic and manual linking
      • Automate manual linking via crowdsourcing
      • Dynamically assess human workers with a probabilistic reasoning framework
      [Diagram labels: Crowd, Algorithms, Machines]
  • 13. ZenCrowd Architecture
      [Architecture diagram. Input: HTML Pages. Output: HTML+RDFa Pages. Components shown: Entity Extractors, LOD Index (Get Entity), LOD Open Data Cloud, Algorithmic Matchers, Probabilistic Network, Decision Engine, Micro-Task Manager, Micro Matching Tasks, Crowdsourcing Platform, Workers Decisions.]
      Gianluca Demartini, Djellel Eddine Difallah, and Philippe Cudré-Mauroux. ZenCrowd: Leveraging Probabilistic Reasoning and Crowdsourcing Techniques for Large-Scale Entity Linking. In: 21st International Conference on World Wide Web (WWW 2012).
  • 14. Algorithmic Matching
      • Inverted index over LOD entities
        – DBPedia, Freebase, Geonames, NYT
      • TF-IDF (IR ranking function)
      • Top ranked URIs linked to entities in docs
      • Threshold on the ranking function or top N
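As a rough illustration of the matching step above, the following Python sketch builds a tiny inverted index over entity labels and ranks candidate URIs with a TF-IDF score, then applies a score threshold and a top-N cutoff. The function names, the whitespace tokenizer, the `log(N/df) + 1` IDF variant, and the example URIs are all assumptions made for the sketch; the slides do not specify these details.

```python
import math
from collections import Counter, defaultdict


def build_index(entities: dict[str, str]):
    """Build an inverted index from entity labels.

    `entities` maps a URI to its label text. Returns (postings, idf),
    where postings[term][uri] is the term frequency in that label.
    """
    postings: dict[str, dict[str, int]] = defaultdict(dict)
    for uri, label in entities.items():
        for term, tf in Counter(label.lower().split()).items():
            postings[term][uri] = tf
    n = len(entities)
    # Smoothed IDF so that terms occurring in every label still count.
    idf = {t: math.log(n / len(p)) + 1.0 for t, p in postings.items()}
    return postings, idf


def match(mention: str, postings, idf, top_n: int = 3, threshold: float = 0.0):
    """Return up to `top_n` (uri, score) pairs scoring above `threshold`."""
    scores: dict[str, float] = defaultdict(float)
    for term in mention.lower().split():
        for uri, tf in postings.get(term, {}).items():
            scores[uri] += tf * idf[term]
    # Sort by descending score; break ties by URI for determinism.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(uri, s) for uri, s in ranked if s > threshold][:top_n]


if __name__ == "__main__":
    postings, idf = build_index({
        "dbpedia:Facebook": "Facebook",
        "dbpedia:Instagram": "Instagram",
        "dbpedia:Facebook_Platform": "Facebook Platform",
    })
    print(match("Facebook Platform", postings, idf))
```

The threshold and `top_n` parameters correspond to the last bullet of the slide: candidates can be cut either by score or by rank before being handed to the crowd.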
  • 15. Entity Factor Graphs
      • Graph components
        – Workers, links, clicks
        – Prior probabilities
        – Link Factors
        – Constraints
      • Probabilistic Inference
        – Select all links with posterior probability > τ
      [Factor graph figure: 2 workers (w1, w2), 6 clicks (c11, c12, c13, c21, c22, c23), 3 candidate links (l1, l2, l3). Legend: worker priors pw1, pw2; link priors pl1, pl2, pl3; link factors lf1, lf2, lf3; observed variables; SameAs constraints (sa1-2); dataset unicity constraints (u2-3).]
  • 16. Entity Factor Graphs
      • Training phase
        – Initialize worker priors with k matches on known answers
      • Updating worker priors
        – Use link decisions as new observations
        – Compute new worker probabilities
      • Identify (and discard) unreliable workers
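The training and update loop described above can be approximated with a much simpler model than a full factor graph. The sketch below is a deliberate simplification, not the ZenCrowd inference: each worker's reliability is estimated from pseudo-counts of correct and wrong answers, and the posterior of a link is computed with a naive-Bayes combination of independent votes. The class name `Worker`, the uniform starting counts, and `link_posterior` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Worker:
    # Pseudo-counts: starting at 1/1 gives an uninformed prior of 0.5.
    correct: float = 1.0
    wrong: float = 1.0

    @property
    def reliability(self) -> float:
        """Estimated probability that this worker answers correctly."""
        return self.correct / (self.correct + self.wrong)

    def observe(self, was_correct: bool) -> None:
        """Update the estimate with one new observation."""
        if was_correct:
            self.correct += 1
        else:
            self.wrong += 1


def link_posterior(prior: float, votes: list[tuple[Worker, bool]]) -> float:
    """Posterior that a candidate link is valid, given worker votes.

    Assumes votes are independent given the true label (naive Bayes).
    """
    p_valid, p_invalid = prior, 1.0 - prior
    for worker, says_valid in votes:
        r = worker.reliability
        p_valid *= r if says_valid else 1.0 - r
        p_invalid *= (1.0 - r) if says_valid else r
    return p_valid / (p_valid + p_invalid)


if __name__ == "__main__":
    good, bad = Worker(), Worker()
    for _ in range(8):      # training phase on known answers
        good.observe(True)
    for _ in range(4):
        bad.observe(False)
    print(good.reliability, bad.reliability)
    print(link_posterior(0.5, [(good, True), (bad, False)]))
```

In this simplified setting, links whose posterior exceeds τ would be accepted, the accepted decisions would be fed back through `observe`, and workers whose reliability drops below a chosen cutoff would be discarded.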
  • 17. Experimental Evaluation
      • Datasets
        – 25 news articles from
          • CNN.com (Global news)
          • NYTimes.com (Global news)
          • Washington-post.com (US local news)
          • Timesofindia.indiatimes.com (India news)
          • Swissinfo.com (Switzerland local news)
        – 40M entities (Freebase, DBPedia, Geonames, NYT)
  • 18. Worker Selection
      [Two plots. Plot 1: Worker Precision (0 to 1) against Number of Tasks (0 to 500) for US Workers and IN Workers, with the top US worker marked. Plot 2: Precision (0.6 to 0.8) against Top K workers (K = 1 to 9).]
  • 19. Lessons Learnt
      • Crowdsourcing + probabilistic reasoning works!
      • But
        – Different worker communities perform differently
        – Many low quality workers
        – Completion time may vary (based on reward)
      • Need to find the right workers for your task (see WWW13 paper)
  • 20. ZenCrowd Summary
      • ZenCrowd: Probabilistic reasoning over automatic and crowdsourcing methods for entity linking
      • Standard crowdsourcing improves 6% over automatic
      • 4% - 35% improvement over standard crowdsourcing
      • 14% average improvement over automatic approaches
      • On-going work:
        – Also used for instance matching across datasets
        – 3-way blocking with the crowd
      http://exascale.info/zencrowd/
  • 21. Entity Disambiguation in Scientific Literature
      • Using a background concept graph
      Roman Prokofyev, Gianluca Demartini, Philippe Cudré-Mauroux, Alexey Boyarsky, and Oleg Ruchayskiy. Ontology-Based Word Sense Disambiguation in the Scientific Domain. In: 35th European Conference on Information Retrieval (ECIR 2013).
      http://scienceWISE.info/
  • 23. Ad-hoc Object Retrieval
      • Once entities have been identified…
      • We want to rank them as answer to a query
      • AOR
        – Given the description of an entity
        – give me back its identifier
        – Input: query q, data graph G
        – Output: ranked list of URIs from G
  • 24. A Hybrid Approach to AOR
      Alberto Tonon, Gianluca Demartini, and Philippe Cudré-Mauroux. Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval. In: 35th Annual ACM SIGIR Conference (SIGIR 2012).
      [Architecture diagram. Components shown: User, Keyword Query, index(), query(), Query Annotation and Expansion, Entity Search, Inverted Index, Structured Inverted Index, RDF Store, Ranking Functions, intermediate top-k results, Graph Traversals (queries on object properties), Neighborhoods (queries on datatype properties), Graph-Enriched Results, WordNet, 3rd party search engines, Pseudo-Relevance Feedback, Final Ranking Function.]
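The general shape of the hybrid pipeline (text retrieval first, then graph enrichment of the intermediate top-k results, then a final ranking) might look like the Python sketch below. It is not the ranking function from the SIGIR 2012 paper: the additive score propagation, the `decay` weight, the one-hop expansion, and the function name `graph_enriched_ranking` are all illustrative assumptions.

```python
def graph_enriched_ranking(
    text_scores: dict[str, float],
    graph: dict[str, list[str]],
    decay: float = 0.5,
    k: int = 10,
) -> list[tuple[str, float]]:
    """Re-rank text-retrieval results using one hop of graph structure.

    `text_scores` holds the intermediate top-k results (URI -> score)
    from the inverted index; `graph` maps a URI to neighbouring URIs
    reachable through object properties in the RDF store.
    """
    combined = dict(text_scores)
    # Iterate over the original results only, so propagated scores
    # are not propagated again (one hop, no cascading).
    for uri, score in text_scores.items():
        for neighbour in graph.get(uri, ()):
            combined[neighbour] = combined.get(neighbour, 0.0) + decay * score
    # Descending score, ties broken by URI for a deterministic order.
    ranked = sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:k]


if __name__ == "__main__":
    results = graph_enriched_ranking(
        {"a": 1.0, "b": 0.4},
        {"a": ["c", "b"]},
    )
    print(results)
```

Here entity `c` enters the result list only through the graph, which is the point of the enrichment step: relevant entities that the keyword index misses can still be reached from neighbours that it found.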
  • 25. AOR Evaluation
      • 1.3 billion RDF triples from LOD cloud
      • 92 and 50 queries
      • Crowdsourced relevance judgments
      • semsearch.yahoo.com
  • 26. Evaluation Results
  • 27. Summary
      • AOR = “Given the description of an entity, give me back its identifier”
      • Combining classic IR techniques + structured database storing graph data
      • Significantly better results (up to +25% MAP over BM25 baseline).
      • Overhead caused by the graph traversal part is limited
      http://exascale.info/AOR/
  • 28. CrowdQ: Crowdsourced Query Understanding
  • 29. birthdate of mayor of capital city of france
  • 30. capital city of france
  • 31. mayor of paris
  • 32. birthdate of Bertrand Delanoë
  • 33. Motivation
      • Web Search Engines can answer simple factual queries directly on the result page
      • Users with complex information needs are often unsatisfied
      • Purely automatic techniques are not enough
      • We want to solve it with Crowdsourcing!
  • 34. CrowdQ
      • CrowdQ is the first system that uses crowdsourcing to
        – Understand the intended meaning
        – Build a structured query template
        – Answer the query over Linked Open Data
      Gianluca Demartini, Beth Trushkowsky, Tim Kraska, and Michael Franklin. CrowdQ: Crowdsourced Query Understanding. In: 6th Biennial Conference on Innovative Data Systems Research (CIDR 2013).
  • 36. CrowdQ Architecture
      • Off-line: query template generation with the help of the crowd
      • On-line: query template matching using NLP and search over open data
      [Architecture diagram with two parts, Off-line Complex Query Decomposition and On-line Complex Query Processing. Components shown: User, Keyword Query, Complex query classifier (Y/N), Vertical selection, Unstructured Search, ..., Query Log, POS + NER tagging, Crowd Manager, Crowdsourcing Platform, Template Generation, Query Template Index (templates t1, t2, t3; Templ + Answer Types), Match with existing query templates, Structured Query, Structured LOD Search, LOD Open Data Cloud, Answer Composition, Result Joiner, SERP.]
  • 37. Hybrid Human-Machine Pipeline
      Q = birthdate of actors of forrest gump
      • Query annotation: Noun, Noun, Named entity
      • Verification: Is forrest gump this entity in the query?
      • Entity Relations: Which is the relation between: actors and forrest gump
        – starring
      • Schema element: Starring → <dbpedia-owl:starring>
      • Verification: Is the relation between
        – Indiana Jones – Harrison Ford
        – Back to the Future – Michael J. Fox
        of the same type as Forrest Gump - actors
  • 38. Structured query generation
      Q = birthdate of actors of forrest gump
      SELECT ?y ?x
      WHERE { ?y <dbpedia-owl:birthdate> ?x .
              ?z <dbpedia-owl:starring> ?y .
              ?z <rdfs:label> 'Forrest Gump' }
      Results from BTC09:
      [Results figure; annotation labels: MOVIE, MOVIE]
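Filling such a query template from the crowd-verified pieces (attribute, relation, entity label) could be sketched as follows in Python. The function `build_query` and the `PREFIXES` table are hypothetical. The sketch writes the properties as prefixed names with explicit `PREFIX` declarations, which is one plausible reading of the slide's shorthand. The property names are copied from the slide and are not checked against the actual DBpedia vocabulary.

```python
# Namespace IRIs for the two prefixes used on the slide.
PREFIXES = {
    "dbpedia-owl": "http://dbpedia.org/ontology/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def escape_literal(value: str) -> str:
    """Escape backslashes and double quotes for a SPARQL string literal."""
    return value.replace("\\", "\\\\").replace('"', '\\"')


def build_query(attribute: str, relation: str, entity_label: str) -> str:
    """Instantiate the template: attribute of (things related to entity).

    `attribute` and `relation` are prefixed names such as
    "dbpedia-owl:starring"; `entity_label` is the named entity's label.
    """
    prefix_lines = [f"PREFIX {name}: <{iri}>" for name, iri in PREFIXES.items()]
    body = [
        "SELECT ?y ?x",
        "WHERE {",
        f"  ?y {attribute} ?x .",
        f"  ?z {relation} ?y .",
        f'  ?z rdfs:label "{escape_literal(entity_label)}" .',
        "}",
    ]
    return "\n".join(prefix_lines + body)


if __name__ == "__main__":
    print(build_query("dbpedia-owl:birthdate", "dbpedia-owl:starring", "Forrest Gump"))
```

Against a real endpoint the label literal would likely also need a language tag or a `FILTER`, since labels in Linked Open Data datasets are often language-tagged; that detail is left out of the sketch.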
  • 39. Conclusions
      • Structured Data make Web Search better
      • Exploit the best out of structured and unstructured data (Hybrid AOR)
      • Crowd can help in understanding semantics
      • Hybrid human-machine systems (ZenCrowd)
      • Exploit Human Intelligence at Scale (CrowdQ)
      gianlucademartini.net
      demartini@exascale.info