Identifying Information Needs by Modelling Collective Query Patterns


Published in: Technology, Education
  1. Identifying Information Needs by Modelling Collective Query Patterns
     K. Elbedweihy, S. Mazumdar, A.E. Cano, S.N. Wrigley, F. Ciravegna
     OAK Research Group, Department of Computer Science, University of Sheffield
  2. Information Needs
     Information needs: “the set of concepts and properties users refer to while using SPARQL queries.”
  3. Information Needs (Cont’d)
     Query:
       PREFIX dbo: <>
       SELECT ?manufacturer WHERE {
         <> dbo:manufacturer ?manufacturer .
       }
     • User’s information needs:
       - type concept: “http://”
       - property: “dbo:manufacturer”
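The idea on this slide, reading the properties a user refers to straight out of a SPARQL query, can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the namespace and resource IRIs in `QUERY` are hypothetical placeholders (the slide's own IRIs are not recoverable), the helper names are invented here, and the regular expression only handles simple `subject predicate object .` patterns, not the full SPARQL grammar.

```python
import re

# Hypothetical query: the IRIs below are placeholders, not the ones on the slide.
QUERY = """
PREFIX dbo: <http://example.org/ontology/>
SELECT ?manufacturer WHERE {
  <http://example.org/resource/SomeCar> dbo:manufacturer ?manufacturer .
}
"""

# One RDF term: <IRI>, ?variable, prefixed:name, or "plain literal".
TERM = r'(<[^>\s]*>|\?\w+|[\w-]*:[\w-]+|"[^"]*")'
TRIPLE = re.compile(TERM + r"\s+" + TERM + r"\s+" + TERM + r"\s*\.")


def triple_patterns(query):
    """Return (subject, predicate, object) tuples found between the outer braces."""
    body = query[query.index("{") + 1 : query.rindex("}")]
    return TRIPLE.findall(body)


def properties(query):
    """Return the sorted set of predicates used in the query."""
    return sorted({p for _, p, _ in triple_patterns(query)})


print(properties(QUERY))
```

Finding the type concept of the subject would additionally need a lookup of the resource's type in the knowledge base, which this sketch leaves out.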
  4. Motivation
     Saracevic [1997]: “The success or failure of any interactive system and technology is contingent on the extent to which user issues, the human factors, are addressed right from the beginning to the very end…”
     Peter Mika [2009]: “Considering the information needs of end users is critical to the success of Semantic Search.”
  5. Motivation
     • Understand how to use logs of queries to identify information needs
     • Consume such analysis to gain better understanding of and insight into the data usage
  6. Outline
     • Introduction
     • Related Work
     • Approach
       - Formalising Query Logs
       - Analysing Query Logs
       - Consuming Query Log Analyses
     • Dataset & Findings
  7. Introduction
     [Figure: Linked Open Data cloud diagram, as of September 2011: 295 datasets, 31 billion RDF triples. Legend: Media, Geographic, Publications, User-generated content, Government, Cross-domain, Life sciences.]
  8. Introduction
     Semantic Query Logs
  9. Related Work
     Analysis for the Web of Documents
     • Studying the search behavior of Web users [Silverstein et al. (1999), Jansen and Spink (2005), Jansen et al. (2005) and Spink et al. (2002)].
     • Improving the search experience of Web users:
       - Query Recommendations [Baeza-Yates et al. (2004) and Wen et al. (2001)]
       - Query Expansion [Cui et al. (2002a)]
  10. Related Work (Cont’d)
     Analysis for the Web of Data
     • Möller et al. [10] identified patterns of Linked Data usage with respect to different types of agents.
     • Arias et al. [1] analyzed the structure of the SPARQL queries to identify the most frequent language elements.
     • Kirchberg et al. [8] introduced a new notion of ‘relevance of a LD resource’ as the ‘relationship between traffic and the resource and whether it changes over time windows’.
  11. Related Work (Cont’d)
     How our work is different: our focus is on identifying information needs by modelling query patterns of Linked Data users.
     • An approach to formalize semantic query log analysis
     • A set of methods for extracting patterns in the query logs
     • Visualization of information needs
  12. APPROACH
  13. Formalizing Query Logs
     • Proposed ontology ‘Qlog’, used to represent the main concepts and relations extracted from a query log entry.
     • A log entry follows the Combined Log Format (CLF).
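As a rough illustration of what reading a CLF entry involves, the sketch below splits one line into its fields and decodes the SPARQL query from the request URL. The sample line, the `/sparql` path, the `query` parameter name and the helper `parse_entry` are assumptions made for this example; the slides do not show how the authors parsed the logs or how the parsed fields map onto Qlog.

```python
import re
from urllib.parse import parse_qs, urlsplit

# Combined Log Format: host ident user [time] "request" status size "referer" "agent"
CLF = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical log line, written for this example only.
SAMPLE = (
    '192.0.2.1 - - [01/Jan/2011:00:00:00 +0000] '
    '"GET /sparql?query=SELECT+%3Fs+WHERE+%7B+%3Fs+%3Fp+%3Fo+%7D HTTP/1.1" '
    '200 1024 "-" "ExampleAgent/1.0"'
)


def parse_entry(line):
    """Return the CLF fields as a dict plus the decoded query, or None if malformed."""
    match = CLF.match(line)
    if match is None:
        return None
    entry = match.groupdict()
    parts = entry["request"].split(" ")
    entry["sparql"] = None
    if len(parts) == 3:  # method, target, protocol
        params = parse_qs(urlsplit(parts[1]).query)
        entry["sparql"] = params.get("query", [None])[0]
    return entry


print(parse_entry(SAMPLE)["sparql"])
```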
  14. Qlog Ontology
     [Figure panels: Log Entry Concepts; Query Logs Analysis Concepts]
  15. Analyzing Query Logs
  16. Consuming Query Logs Analysis
     • How to consume the query logs analysis?
       - Automatic query suggestions
       - Recommender systems
       - Search tools (disambiguation and ranking results)
       - Visualizations (to gain understanding of dataset usage)
         1. Concept Graph
         2. Predicate sequence tree
  17. Consuming Query Logs Analysis: Visualizations
     [Figure panels: Concept Graph; Predicate sequence tree]
  18. Steps for Consuming Query Logs Analysis
     Inputs: Query Logs, Knowledge Base
     • Steps A1–A4: Identify Instance Types, Gather Class Size, Build Vis Tables, Render Vis
     • Steps B1–B4: Identify KB, Build Predicate Sequence, Build Transition Matrix, Vis Tables, Render Vis
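The slide names a transition matrix built from predicate sequences but does not define it. One plausible reading, sketched below under that assumption, is to count how often one predicate is directly followed by another across the queries and then normalise each row. The function names and the sample sequences are hypothetical.

```python
from collections import Counter, defaultdict


def transition_counts(sequences):
    """Count how often each predicate is directly followed by another."""
    counts = defaultdict(Counter)
    for seq in sequences:
        for current, following in zip(seq, seq[1:]):
            counts[current][following] += 1
    return counts


def transition_probabilities(counts):
    """Normalise each row of counts so that it sums to 1."""
    return {
        src: {dst: n / sum(row.values()) for dst, n in row.items()}
        for src, row in counts.items()
    }


# Hypothetical predicate sequences, one per query, for illustration only.
SEQUENCES = [
    ["rdf:type", "dbo:manufacturer"],
    ["rdf:type", "rdfs:label"],
    ["rdf:type", "dbo:manufacturer", "rdfs:label"],
]

counts = transition_counts(SEQUENCES)
probabilities = transition_probabilities(counts)
print(probabilities["rdf:type"])
```

A nested dictionary is used here instead of a dense matrix because, with about two thousand distinct predicates, most predicate pairs would never occur together.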
  19. CASE STUDY
  20. Dataset
     • The data used in this study is made available by the USEWOD2011 data challenge.
     • The logs contained around 5 million queries issued to DBpedia over a time period of almost 4 months.

     Number of analyzed queries          4951803
     Number of unique triple patterns    2641098
     Number of unique subjects           1168945
     Number of unique predicates            2003
     Number of unique objects             196221
     Number of unique vocabularies           323
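Counts like the ones in this table could be derived from extracted triple patterns roughly as follows. This is a sketch under stated assumptions, not the authors' procedure: it counts every term as-is (variables included), it treats a vocabulary as the predicate IRI up to its last `#` or `/`, and the sample patterns and function names are hypothetical.

```python
def vocabulary(predicate):
    """Namespace of a full predicate IRI: everything up to the last '#' or '/'."""
    cut = max(predicate.rfind("#"), predicate.rfind("/"))
    return predicate[: cut + 1]


def summarise(patterns):
    """Count distinct triple patterns, subjects, predicates, objects and vocabularies."""
    patterns = list(patterns)
    return {
        "unique_triple_patterns": len(set(patterns)),
        "unique_subjects": len({s for s, _, _ in patterns}),
        "unique_predicates": len({p for _, p, _ in patterns}),
        "unique_objects": len({o for _, _, o in patterns}),
        "unique_vocabularies": len({vocabulary(p) for _, p, _ in patterns}),
    }


# Hypothetical triple patterns for illustration only.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
PATTERNS = [
    ("?car", "http://example.org/ontology/manufacturer", "?m"),
    ("?car", RDF_TYPE, "http://example.org/ontology/Automobile"),
    ("?car", "http://example.org/ontology/manufacturer", "?m"),
    ("?x", "http://example.org/ontology/engine", "?e"),
]

print(summarise(PATTERNS))
```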
  21. Analyzing DBpedia usage patterns
  22. Analyzing DBpedia usage patterns (Cont’d)
  23. Questions?