Using SKOS Vocabularies for Improving Web Search

WoLE 2013 talk


  1. Using SKOS Vocabularies for Improving Web Search
     Web of Linked Entities Workshop (WoLE 2013), WWW 2013, Rio de Janeiro, May 13th 2013
     Bernhard Haslhofer | University of Vienna
     Flávio Martins | Universidade Nova de Lisboa
     João Magalhães | Universidade Nova de Lisboa
  2. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Lucene-SKOS
     • Evaluation
     • Conclusions
  3. What is SKOS?
     • A language for describing Web vocabularies (taxonomies, classification schemes, thesauri)
     • Builds on Linked Data principles
       – Concepts have URIs
       – Concepts are interlinked
       – Vocabularies are expressed in RDF
  4. Who is using SKOS?
     http://www.w3.org/2001/sw/wiki/SKOS/Datasets
  5. SKOS Example - UKAT
     • ukat:859 (skos:Concept)
       – skos:prefLabel "Weapons"
       – skos:altLabel "Armaments"
       – skos:altLabel "Arms"
       – skos:broader ukat:5060
     • ukat:5060 (skos:Concept)
       – skos:prefLabel "Military Equipment"
       – skos:altLabel "Ordnance"
       – skos:narrower ukat:859
     Prefixes:
       ukat: http://www.ukat.org.uk/thesaurus/concept/
       skos: http://www.w3.org/2004/02/skos/core#
  6. Overview
     • A brief intro to SKOS
     • SKOS-based term expansion
     • Lucene-SKOS
     • Evaluation
     • Conclusions
  7. The big picture
     (Diagram) Retrieval model: Query and Documents each pass through Analysis, yielding a Query Representation and a Document Representation; Scoring compares the two and produces Results. SKOS-based term expansion is marked on the diagram.
  8. Label-based (Query) Expansion
     • Query: roman arms
     • Document:
       Title: Spearhead
       Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft...
       Subject: Weapons
  9. Label-based (Query) Expansion
     Query "roman arms", with "arms" expanded to:
       – skos:prefLabel: weapons
       – skos:altLabel: armaments
       – skos:broader: ordnance
       – skos:broader: military equipment
     Document:
       Title: Spearhead
       Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft...
       Subject: Weapons
     The expanded term "weapons" matches the document's Subject field.
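The label-based expansion on this slide can be sketched in a few lines of Python. This is an illustrative sketch, not the lucene-skos implementation: the in-memory `CONCEPTS` dictionary, the function names, and the assignment of alternative labels to the two UKAT concepts are assumptions made for the example.

```python
# Hypothetical in-memory vocabulary (labels lowercased). Which concept each
# altLabel belongs to is an assumption inferred from the slides.
CONCEPTS = {
    "ukat:859": {"pref": "weapons", "alt": ["armaments", "arms"],
                 "broader": ["ukat:5060"]},
    "ukat:5060": {"pref": "military equipment", "alt": ["ordnance"],
                  "broader": []},
}


def labels(concept):
    """All labels of a concept, preferred label first."""
    return [concept["pref"], *concept["alt"]]


def expand_term(term, concepts=CONCEPTS):
    """Return (expansion, expansion_type) pairs for one query term."""
    expansions = []
    for concept in concepts.values():
        if term not in labels(concept):
            continue
        if concept["pref"] != term:
            expansions.append((concept["pref"], "pref"))
        expansions += [(a, "alt") for a in concept["alt"] if a != term]
        # Labels of broader concepts are added as "broader" expansions.
        for uri in concept["broader"]:
            expansions += [(l, "broader") for l in labels(concepts[uri])]
    return expansions


def expand_query(query):
    """Expand every whitespace-separated term of a query."""
    return {term: expand_term(term) for term in query.lower().split()}


expanded = expand_query("roman arms")
```

Here `expanded["arms"]` contains `("weapons", "pref")`, which is the term that lets the query reach a document whose Subject field is "Weapons". Keeping the expansion type next to each term makes it possible to weight the types differently at scoring time.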
  10. URI-based Expansion
      • Query: roman arms
      • Document:
        Title: Spearhead
        Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft...
        Subject: http://www.ukat.org.uk/thesaurus/concept/859
  11. URI-based Expansion
  12. URI-based Expansion
      Query: roman arms
      Document:
        Title: Spearhead
        Description: Roman iron spearhead. The spearhead was attached to one end of a wooden shaft...
        Subject: http://www.ukat.org.uk/thesaurus/concept/859, expanded to: arms, weapons, armaments, ...
      The query term "arms" matches the labels expanded from the Subject URI.
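URI-based expansion works on the document side: a concept URI in a metadata field is replaced by the labels of that concept when the document is indexed. The following Python sketch shows the idea under simplifying assumptions: `LABELS_BY_URI`, `index_terms`, and `matches` are hypothetical names, only the three labels visible on the slide are included, and multi-word labels would need phrase handling that is omitted here.

```python
# Hypothetical lookup table from concept URI to its (lowercased) labels.
LABELS_BY_URI = {
    "http://www.ukat.org.uk/thesaurus/concept/859": [
        "weapons", "arms", "armaments",
    ],
}


def index_terms(document, labels_by_uri=LABELS_BY_URI):
    """Collect index terms, replacing a known subject URI by its labels."""
    terms = set()
    for field_name, value in document.items():
        if field_name == "subject" and value in labels_by_uri:
            terms.update(labels_by_uri[value])
        else:
            terms.update(value.lower().replace(".", " ").split())
    return terms


def matches(query, document):
    """Query terms that occur among the document's index terms."""
    return set(query.lower().split()) & index_terms(document)


doc = {
    "title": "Spearhead",
    "description": "Roman iron spearhead. The spearhead was attached "
                   "to one end of a wooden shaft.",
    "subject": "http://www.ukat.org.uk/thesaurus/concept/859",
}
matched = matches("roman arms", doc)
```

With this document both query terms match: "roman" through the description and "arms" through the labels expanded from the subject URI.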
  13. Scoring
      • Apply regular text retrieval functions
      • boost_ctype: leverages explicit declaration of SKOS expansion types in term representations
      • coord_q,d: ensures that a document with more matching terms will score higher
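To make the roles of the two factors concrete, here is a simplified Python sketch of a TF-IDF style score with a per-type boost and a coordination factor. The slide does not give the formula, so the combination below (coordination factor times the sum of boosted TF-IDF weights), the `BOOST` values, and the exact TF and IDF variants are all assumptions chosen for illustration.

```python
import math

# Illustrative boost per expansion type; these values are assumptions.
BOOST = {"original": 1.0, "pref": 0.5, "alt": 0.5, "broader": 0.5}


def score(query_terms, doc_terms, doc_freq, num_docs):
    """Score one document.

    query_terms: list of (term, expansion_type) pairs
    doc_terms:   list of tokens in the document
    doc_freq:    dict mapping term -> number of documents containing it
    """
    total, matched = 0.0, 0
    for term, kind in query_terms:
        tf = doc_terms.count(term)
        if tf == 0:
            continue
        matched += 1
        idf = 1.0 + math.log(num_docs / (doc_freq.get(term, 0) + 1))
        total += BOOST[kind] * math.sqrt(tf) * idf
    # coord(q, d): fraction of query terms that the document matches.
    coord = matched / len(query_terms) if query_terms else 0.0
    return coord * total


query = [("roman", "original"), ("arms", "original"), ("weapons", "pref")]
doc_freq = {"roman": 2, "weapons": 1}
score_a = score(query, ["roman", "iron", "spearhead", "weapons"], doc_freq, 10)
score_b = score(query, ["roman", "coin"], doc_freq, 10)
```

The first document matches two of the three query terms and the second only one, so `score_a` is larger than `score_b` both through the summed weights and through the coordination factor.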
  14. Overview
      • A brief intro to SKOS
      • SKOS-based term expansion
      • Implementation: Lucene-SKOS
      • Evaluation
      • Conclusions
  15. https://github.com/behas/lucene-skos
  16. Overview
      • A brief intro to SKOS
      • SKOS-based term expansion
      • Implementation: Lucene-SKOS
      • Evaluation
      • Conclusions
  17. Dataset 1
      • OHSUMED:
        – 350K PubMed metadata records from 270 journals
        – Title, author, abstract, …
        – 3-level relevance judgments for information needs
      • Medical Subject Headings (MeSH) in SKOS
        – Maintained by US National Library of Medicine
        – Used to index millions of articles in PubMed
  18. Dataset 2
      • 8,905 MODS metadata records harvested from the Library of Congress (LoC)
      • 10 queries from the 2009/2010 query collection
      • Binary relevance judgments for queries
      • Library of Congress Subject Headings (LCSH) in SKOS
  19. Method
      • Focus on label-based expansion at query time
      • Normalized queries and SKOS vocabularies
      • Expanded each query term by collecting SKOS concepts that have that term in any of their labels (prefLabel, altLabel, hiddenLabel)
      • Applied pre-defined boost-weights that maximized query performance
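The normalization and concept-collection steps can be sketched as follows. The slide does not specify the normalization, so lowercasing, punctuation stripping, and the small stop-word list are assumptions, as is the reading that a concept qualifies when the term appears as a token in one of its labels. The `ex:` identifiers and the placement of labels on concepts are hypothetical.

```python
import re

STOP_WORDS = {"a", "an", "and", "of", "or", "the"}  # illustrative list
LABEL_PROPERTIES = ("prefLabel", "altLabel", "hiddenLabel")


def normalize(text):
    """Lowercase, split on non-alphanumerics, and drop stop words."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOP_WORDS]


def concepts_for_term(term, vocabulary):
    """URIs of concepts having `term` as a token in any of their labels."""
    hits = []
    for uri, concept in vocabulary.items():
        for prop in LABEL_PROPERTIES:
            if any(term in normalize(label) for label in concept.get(prop, [])):
                hits.append(uri)
                break  # one matching label type is enough for this concept
    return hits


# Hypothetical vocabulary fragment; identifiers and label types are made up.
VOCABULARY = {
    "ex:c1": {"prefLabel": ["Fibromyalgia"],
              "altLabel": ["Fibrositis", "Diffuse Myofascial Pain Syndrome"]},
    "ex:c2": {"prefLabel": ["Therapy"],
              "altLabel": ["Treatment", "Disease Management"]},
}

terms = normalize("fibromyalgia fibrositis, diagnosis and treatment")
```

Applying the same `normalize` function to both the query and the vocabulary labels is what lets "Fibrositis" in the vocabulary match "fibrositis," in the query.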
  20. Two Baselines
      • No term expansion (NoExp)
        – Queries are executed over documents without SKOS-based term expansion
      • Pseudo Relevance Feedback (PRF):
        – Perform initial search with original query
        – Collect terms from k retrieved documents
        – Resubmit query + collected terms
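The three PRF steps can be illustrated with a toy Python sketch. The scoring function (raw term counts), the choice of feedback terms by frequency, and the parameter names `k` and `n_terms` are assumptions for the example; the slides do not say how terms were selected in the actual baseline.

```python
from collections import Counter


def search(query_terms, docs):
    """Rank document ids by total occurrences of query terms (toy scoring)."""
    scored = []
    for doc_id, tokens in docs.items():
        s = sum(tokens.count(t) for t in set(query_terms))
        if s > 0:
            scored.append((s, doc_id))
    return [doc_id for s, doc_id in sorted(scored, key=lambda p: (-p[0], p[1]))]


def prf(query_terms, docs, k=2, n_terms=2):
    """Pseudo relevance feedback: search, collect terms, search again."""
    initial = search(query_terms, docs)            # 1. initial search
    counts = Counter()
    for doc_id in initial[:k]:                     # 2. terms from top-k docs
        counts.update(t for t in docs[doc_id] if t not in query_terms)
    ranked = sorted(counts.items(), key=lambda p: (-p[1], p[0]))
    feedback = [t for t, _ in ranked[:n_terms]]
    # 3. resubmit the original query plus the collected terms
    return search(list(query_terms) + feedback, docs), feedback


DOCS = {
    "d1": ["roman", "spearhead", "iron"],
    "d2": ["roman", "spearhead", "bronze"],
    "d3": ["spearhead", "medieval"],
    "d4": ["coin", "silver"],
}
ranking, feedback = prf(["roman"], DOCS, k=2, n_terms=1)
```

In this toy collection the feedback term "spearhead" pulls in `d3`, which the original query did not retrieve. The added terms come from the corpus rather than from a curated vocabulary, which is the main contrast with SKOS-based expansion.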
  21. OHSUMED query expansion example
      Query: fibromyalgia fibrositis, diagnosis and treatment
      Expanded Query:
        (fibromyalgia rheumatism muscular^0.5 diffuse myofascial pain syndrome^0.5 fibromyalgia fibromyositis syndrome^0.5 myofascial pain syndrome diffuse^0.5 fibromyositis fibromyalgia syndrome^0.5 fibrositis^0.5)
        (fibrositis rheumatism muscular^0.5 diffuse myofascial pain syndrome^0.5 fibromyalgia fibromyositis syndrome^0.5 myofascial pain syndrome diffuse^0.5 fibromyositis fibromyalgia syndrome^0.5 fibromyalgia^0.5)
        (diagnosis examinations diagnoses^0.5)
        (treatment disease management^0.5 therapy^0.5)
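A query of this shape can be assembled from expanded terms with a small helper. The Python sketch below is not the lucene-skos code: the function names are hypothetical, and quoting multi-word labels as phrases is a choice made here so that the result follows the classic Lucene query syntax, in which `^` attaches a boost to a term or phrase.

```python
def boosted_clause(term, expansions, boost=0.5):
    """One parenthesized clause: the original term plus boosted expansions."""
    parts = [term]
    for label in expansions:
        # Multi-word labels are quoted so the boost applies to the phrase.
        text = f'"{label}"' if " " in label else label
        parts.append(f"{text}^{boost}")
    return "(" + " ".join(parts) + ")"


def build_query(expanded):
    """Join one clause per original query term, in input order."""
    return " ".join(boosted_clause(t, e) for t, e in expanded.items())


query_string = build_query({
    "treatment": ["disease management", "therapy"],
    "diagnosis": [],
})
```

The original term keeps the default weight while each expansion carries the boost of 0.5 shown on the slide, so expanded labels contribute to a match without outweighing what the user typed.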
  22. Results OHSUMED/MESH

      LTC TF-IDF   P@1    P@3    P@10   nDCG@1  nDCG@3  nDCG@10
      NoExp        0.333  0.302  0.260  0.407   0.376   0.356
      PRF          0.322  0.282  0.302  0.379   0.360   0.393
      SKOS         0.419  0.366  0.276  0.484   0.429   0.379

      BM25         P@1    P@3    P@10   nDCG@1  nDCG@3  nDCG@10
      NoExp        0.381  0.344  0.265  0.450   0.414   0.374
      PRF          0.377  0.317  0.275  0.443   0.397   0.369
      SKOS         0.500  0.366  0.282  0.548   0.435   0.397

      Table 1: Precision and nDCG results on the OHSUMED dataset.

      Query sets provided with each dataset: 63 for OHSUMED/MESH, 10 for LoC/LCSH. We focused on two measures, precision at rank n (P@n) for both dataset bundles and nDCG at rank n (nDCG@n) for the OHSUMED/MESH bundle, which provides ordinal relevance judgments. Two retrieval models were used for ranking documents: LTC TF-IDF and BM25.
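For readers unfamiliar with the two measures, here is a minimal Python sketch of P@n and nDCG@n. Several nDCG variants exist and the slides do not say which was used; this sketch assumes the graded relevance value itself as gain and a log2(rank + 1) discount.

```python
import math


def precision_at(ranking, relevance, n):
    """Fraction of the top-n results with a relevance grade above zero."""
    return sum(1 for d in ranking[:n] if relevance.get(d, 0) > 0) / n


def dcg_at(gains, n):
    """Discounted cumulative gain over the first n gains."""
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:n]))


def ndcg_at(ranking, relevance, n):
    """DCG of the ranking divided by the DCG of an ideal ordering."""
    gains = [relevance.get(d, 0) for d in ranking]
    ideal = sorted(relevance.values(), reverse=True)
    best = dcg_at(ideal, n)
    return dcg_at(gains, n) / best if best > 0 else 0.0


# Toy judgments on a 3-level scale (0 = not relevant, 2 = highly relevant).
relevance = {"a": 2, "b": 1, "c": 0}
ranking = ["a", "c", "b"]
p3 = precision_at(ranking, relevance, 3)
ndcg3 = ndcg_at(ranking, relevance, 3)
```

P@n only asks whether a result is relevant, so it also works with binary judgments such as those of the LoC/LCSH bundle. nDCG@n additionally rewards placing the more relevant documents higher, which is why it needs the ordinal judgments available for OHSUMED.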
  23. Initial Results LoC/LCSH

      PRF might lead to better recall, i.e., return more relevant documents, but hurt precision at top ranks. On the other hand, with SKOS-expansion the terms used in expansion are always from a SKOS vocabulary and not from the corpus. Therefore, we are already only expanding using relevant terms, since the terms' existence in the vocabulary is a good hint that the term is important.

      LTC TF-IDF   P@1    P@3    P@10
      NoExp        0.500  0.500  0.370
      PRF          0.300  0.433  0.430
      SKOS         0.600  0.533  0.370

      Table 2: Precision with LTC TF-IDF results on the Library of Congress dataset.

      Table 2 shows the results we obtained from early experiments with the Library of Congress Subject Headings. The results are similar to the results with OHSUMED. The PRF ...
  24. Overview
      • A brief intro to SKOS
      • SKOS-based term expansion
      • Implementation: Lucene-SKOS
      • Experiments
      • Conclusions
  25. Conclusions
      • Our experiments indicated gains in retrieval effectiveness compared to no-expansion or pseudo relevance feedback
      • Our solution can easily be adopted by loading lucene-skos with Apache Lucene and Solr
  26. Future Work
      • More thorough evaluation using LoC/LCSH
      • Use corpora, queries, and vocabularies from other domains
      • Apply this technique for general concept-based and URI-identified Web data sources
  27. Thanks!
      @bhaslhofer, @flaviomartins
      http://slideshare.net/bhaslhofer
      https://github.com/behas/lucene-skos