
Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organisation Systems (KOS)

Albert Merono-Penuela (DANS, VU Amsterdam), "Understanding Change in Versioned Web-Knowledge Organisation Systems (KOS)". Presentation at the KnoweScape workshop "Evolution and variation of classification systems", March 4-5, 2015, Amsterdam.


  1. 1. Understanding Change in Versioned KOS on the Web. Albert Meroño-Peñuela, Christophe Guéret, Stefan Schlobach. @albertmeronyo. Evolution and variation of classification systems, KnoweScape workshop, 04-03-2015
  2. 2. CEDAR: Harmonizing Historical Census Data in the Semantic Web
  3. 3. CEDAR: Harmonizing Historical Census Data in the Semantic Web
  4. 4. CEDAR: Source Historical Data. Dutch Historical Censuses (1795-1971) [Public Historical Statistical Data]
  5. 5. From scans to spreadsheets
  6. 6. Uniform queries on the Web (through ~3K heterogeneous tables). Census years: 1795, 1830, 1840, 1849, 1859, 1869, 1879, 1889, 1899, 1909, 1919, 1920, 1930, 1947, 1956, 1960, 1971
  7. 7. RDF Data Cube: “There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts.”
  8. 8. RDF Data Cube vocabulary (QB)
      • SDMX compatible
      • Defines cubes as a set of observations that consist of dimensions, measures and attributes
        – Dimensions: time period, region, sex (qb:DimensionProperty)
        – Measure: population life expectancy (qb:MeasureProperty)
        – Attribute: unit of measure = years, metadata status = measured (qb:AttributeProperty)
      • Observation: “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years”
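
As a rough illustration of the cube model on this slide, the quoted observation can be written as a plain Python record that keeps dimensions, measures and attributes apart. The `Observation` class, the `select` helper and the key names are hypothetical and not part of the QB vocabulary; this is a sketch of the idea, not an RDF serialisation.

```python
from dataclasses import dataclass, field


@dataclass
class Observation:
    """One cell of a cube: dimensions locate it, measures hold the value."""
    dimensions: dict
    measures: dict
    attributes: dict = field(default_factory=dict)


def select(cube, **dims):
    """Return the observations whose dimensions match every given value."""
    return [
        obs for obs in cube
        if all(obs.dimensions.get(key) == value for key, value in dims.items())
    ]


# The single observation quoted on the slide; key names are illustrative.
cube = [
    Observation(
        dimensions={"timePeriod": "2004-2006", "region": "Newport", "sex": "male"},
        measures={"lifeExpectancy": 76.7},
        attributes={"unitOfMeasure": "years", "status": "measured"},
    )
]

males_in_newport = select(cube, region="Newport", sex="male")
```

Selecting by dimension values is what makes uniform queries possible once every table is expressed as observations of this shape.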
  9. 9. Dynamic Classifications • Gemeentegeschiedenis.nl
  10. 10. Dynamic Classifications: http://lod.cedar-project.nl/maps/ (kudos to Richard Zijdeman)
  11. 11. Dynamic Classifications • HISCO: http://historyofwork.iisg.nl/
  12. 12. LSD Dimensions: http://lsd-dimensions.org/ and https://github.com/albertmeronyo/LSD-Dimensions • Daily JSON-LD dumps
  13. 13. http://lsd-dimensions.org/
  14. 14. Concept Drift: Census classification of occupations as of 1859 • Root node is void • Depth 1: occupation groups • Leaves: actual occupations
  15. 15. Concept Drift: Census classification of occupations as of 1889 • Root node is void • Depth 1: occupation groups • Leaves: actual occupations
  16. 16. Concept Drift: Census classification of occupations as of 1899 • Root node is void • Depth 1: occupation groups • Leaves: actual occupations
  17. 17. Concept Drift: Upper ontologies (HISCO, AC) • Year-dependent ontologies (1859, 1869, 1879)
  18. 18. Concept Drift: Upper ontologies (HISCO, AC) • Year-dependent ontologies
  19. 19. Concept Drift: Upper ontologies (HISCO, AC) • Year-dependent ontologies (? ?)
  20. 20. Predicting Change
      • KOS version chains: successive unique version identifiers assigned to unique states of a KOS
      • Problematic for
        – Data publishers (KOS maintainability)
        – Data users/linkers (link validity)
      A. Meroño-Peñuela, C. Guéret, S. Schlobach. “Predicting Change in Versioned Knowledge Organisation Systems on the Web”. IJCAI 2015 (under review)
  21. 21. Predicting Change
      • Proposal: generic approach to predict when and where a Web KOS of any domain will change
        – Using supervised learning on past versions of KOS
      • State of the art [1]: prediction of class extension in
        – 1 OBO/OWL version chain (Gene Ontology)
        – using few classifiers
      • Contribution [2]: prediction of concept drift in
        – 150 Web KOS version chains
        – using all (21) state-of-the-art classifiers (WEKA API)
      [1] C. Pesquita, F.M. Couto. “Predicting the extension of biomedical ontologies”. PLoS Computational Biology 8 (9), e1002630
      [2] A. Meroño-Peñuela, C. Guéret, S. Schlobach. “Predicting Change in Versioned Knowledge Organisation Systems on the Web”. IJCAI 2015 (under review)
  22. 22. Concept Drift
      • Proxy for change of meaning over time [1]
        – Intension drift: occurs when there is a difference in the properties or attributes of two variants of the same concept
        – Extension drift: occurs when there is a difference in the individuals that belong to two variants of the same concept
        – Label drift: occurs when there is a difference in the labels of two variants of the same concept
      [1] S. Wang, S. Schlobach, K. Klein. “What Is Concept Drift and How to Measure It?”. EKAW 2010.
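
The three kinds of drift on this slide can be sketched as set comparisons between two variants of one concept. The `Variant` structure, the toy concept data and the choice of Jaccard distance as the drift score are all assumptions made for illustration; the cited paper may define its measures differently.

```python
from dataclasses import dataclass


@dataclass
class Variant:
    """A concept as it appears in one version of a KOS (hypothetical model)."""
    labels: frozenset      # names of the concept
    properties: frozenset  # intension: its properties or attributes
    members: frozenset     # extension: the individuals that belong to it


def jaccard_distance(a, b):
    """0.0 for identical sets, 1.0 for disjoint sets."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def drift(old, new):
    """Score label, intension and extension drift between two variants."""
    return {
        "label": jaccard_distance(old.labels, new.labels),
        "intension": jaccard_distance(old.properties, new.properties),
        "extension": jaccard_distance(old.members, new.members),
    }


# Toy variants: same label and properties, but the extension has doubled.
v_old = Variant(frozenset({"smith"}), frozenset({"worksWith:metal"}),
                frozenset({"p1", "p2"}))
v_new = Variant(frozenset({"smith"}), frozenset({"worksWith:metal"}),
                frozenset({"p1", "p2", "p3", "p4"}))

scores = drift(v_old, v_new)
```

Here only the extension score is non-zero (2 shared members out of 4 in the union gives 0.5), which is the pattern the slide calls extension drift.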
  23. 23. Input Datasets: KOS version chains from
      • HISCO/CEDAR (1 version chain)
      • DBpedia (2 version chains)
      • Linked Open Vocabularies [1] (134 version chains)
      • *Ontology chains from 637 SPARQL endpoints [2] (6 version chains)
      [1] http://lov.okfn.org/
      [2] https://github.com/albertmeronyo/ConceptDrift-data/tree/master/src
  24. 24. Features
      • From which data characteristics (related to change) should we learn?
      • SotA in Ontology Change [Stojanovic 2004]
        – Structure-driven (rdfs:subClassOf, skos:broader): maxDepth, children, parents, siblings
        – Data-driven (rdf:type): members, childMembers, parentMembers, siblingMembers
        – Usage-driven: incExtLinks (on the Web!)
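
The structure-driven features can be sketched for a single concept from a map of skos:broader (or rdfs:subClassOf) links. The function name, the input shape and the toy hierarchy are assumptions, as is the reading of maxDepth as the depth of the subtree below the concept; the sketch also assumes the hierarchy has no cycles.

```python
from collections import defaultdict


def structural_features(broader, concept):
    """Count structural features of `concept`.

    `broader` maps each concept to the set of its direct parents.
    """
    # Invert the parent links to get each concept's direct children.
    children = defaultdict(set)
    for child, parents in broader.items():
        for parent in parents:
            children[parent].add(child)

    parents = broader.get(concept, set())

    # Siblings share at least one parent with the concept.
    siblings = set()
    for parent in parents:
        siblings |= children[parent]
    siblings.discard(concept)

    def max_depth(node):
        kids = children.get(node, ())
        return 0 if not kids else 1 + max(max_depth(kid) for kid in kids)

    return {
        "children": len(children.get(concept, ())),
        "parents": len(parents),
        "siblings": len(siblings),
        "maxDepth": max_depth(concept),
    }


# Toy hierarchy: root -> occupation groups -> occupations.
toy_broader = {
    "root": set(),
    "metal": {"root"},
    "textile": {"root"},
    "smith": {"metal"},
    "weaver": {"textile"},
}

metal_features = structural_features(toy_broader, "metal")
root_features = structural_features(toy_broader, "root")
```

Computing one such feature vector per concept and per version gives the rows a supervised learner needs; the data-driven features would follow the same pattern with rdf:type membership counts.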
  25. 25. Pipeline: https://github.com/albertmeronyo/ConceptDrift
  26. 26. Evaluation • Use a subset of past versions for learning (Vt) • Check whether change happened by observing Vr, Ve
  27. 27. Results: classifier performance • CEDAR/HISCO classification performance over time • DBpedia ontology classification performance over time
  28. 28. Results: understanding performance. Relationship between characteristics of input version chains and selected classifiers / performance? • totalSize • nSnapshots • avgGap • avgTreeDepth • ratioInstances • ratioStructural • ratioInserts • ratioDeletes • ratioComm. f(x_i)? • roc • classifier
  29. 29. Classifier Selection
      Table 1. Dependent variable:

                         functions  rules      trees      functions  rules      trees      functions  rules      trees
                         (1)        (2)        (3)        (4)        (5)        (6)        (7)        (8)        (9)
      log(nSnapshots)    0.291      0.257      1.975      0.180      0.239      1.745      0.193      0.212      1.838
                         (0.656)    (0.765)    (1.503)    (0.680)    (0.790)    (1.512)    (0.667)    (0.777)    (1.497)
      log(avgGap)        0.238      0.145      1.385*     0.266      0.173      1.269*     0.248      0.161      1.351*
                         (0.242)    (0.271)    (0.734)    (0.240)    (0.269)    (0.703)    (0.240)    (0.270)    (0.729)
      log(totalSize)     0.669***   0.539*     0.052      0.636**    0.531*     0.010      0.641***   0.524*     0.025
                         (0.249)    (0.278)    (0.563)    (0.251)    (0.282)    (0.555)    (0.249)    (0.279)    (0.557)
      avgTreeDepth       0.399      0.334      0.534      0.393      0.336      0.564      0.385      0.323      0.553
                         (0.302)    (0.330)    (0.719)    (0.304)    (0.334)    (0.728)    (0.303)    (0.332)    (0.728)
      ratioInstances     1.378      2.463      3.090      1.071      2.246      3.394      1.269      2.330      3.221
                         (3.485)    (4.021)    (6.654)    (3.455)    (3.981)    (6.629)    (3.476)    (4.005)    (6.649)
      ratioStructural    9.054      1.357      9.539      9.039      1.674      10.799     9.594      1.116      10.030
                         (6.040)    (6.135)    (13.505)   (6.142)    (6.353)    (13.945)   (6.136)    (6.267)    (13.827)
      ratioInserts       3.006      2.376      3.540
                         (1.906)    (2.210)    (4.401)
      ratioDeletes                                        1.918      0.929      2.341
                                                          (1.907)    (2.154)    (4.058)
      ratioComm                                                                            1.440      0.945      1.615
                                                                                           (1.028)    (1.170)    (2.219)
      Constant           5.610**    5.580**    12.702**   5.288**    5.259**    12.402**   4.059*     4.494*     14.266**
                         (2.248)    (2.511)    (5.954)    (2.210)    (2.494)    (5.759)    (2.265)    (2.585)    (6.511)
      Akaike Inf. Crit.  313.543    313.543    313.543    316.179    316.179    316.179    314.605    314.605    314.605

      Note: * p<0.1; ** p<0.05; *** p<0.01. Standard errors in parentheses.
  30. 30. Simulation of avgGap vs. Classifier Family Selection
  31. 31. Conclusions
      • Semantic technology for Social History
        – It saved work!
      • Historical datasets as an observatory of dynamic KOS
        – Logging usage of KOS in Linked Statistical Data
      • Modeling change in Web KOS
        – Version chains are scarce (beware of bias)
        – Chain recipe: nSnapshots, avgTreeDepth, ratioStructural, ratioInserts, ratioComm
        – Classifier dependence: avgGap, totalSize
  32. 32. Thank you Questions, suggestions, comments most welcome @albertmeronyo https://github.com/albertmeronyo/ConceptDrift http://www.cedar-project.nl http://krr.cs.vu.nl/ http://easy.dans.knaw.nl/ http://lsd-dimensions.org/
  33. 33. Me in 6 tweets (http://www.albertmeronyo.org)
      • Background: Computer Science, Web hacker, AI & Law
      • PhD candidate at the VU University Amsterdam, DANS, and eHumanities group (KNAW)
      • Topic: Semantic Web for the Humanities
      • CEDAR project (2012-2015): harmonized historical Dutch censuses in the Semantic Web
      • Problem: statistical data publishing, concept drift and dynamics of meaning
      • Last paper: What is Linked Historical Data? (EKAW 2014)
