Albert Merono-Penuela: Understanding Change in Versioned Web-Knowledge Organisation Systems (KOS)

Albert Merono-Penuela (DANS, VU Amsterdam), "Understanding Change in Versioned Web-Knowledge Organisation Systems (KOS)". Presentation at the KnoweScape workshop "Evolution and variation of classification systems", March 4-5, 2015, Amsterdam.

  1. Understanding Change in Versioned KOS on the Web
     Albert Meroño-Peñuela, Christophe Guéret, Stefan Schlobach (@albertmeronyo)
     Evolution and variation of classification systems, KnoweScape workshop, 04-03-2015
  2. CEDAR: Harmonizing Historical Census Data in the Semantic Web
  3. CEDAR: Harmonizing Historical Census Data in the Semantic Web
  4. CEDAR: Source Historical Data
     Dutch Historical Censuses (1795-1971) [Public Historical Statistical Data]
  5. From scans to spreadsheets
  6. Uniform queries on the Web
     1795, 1830, 1840, 1849, 1859, 1869, 1879, 1889, 1899, 1909, 1919, 1920, 1930, 1947, 1956, 1960, 1971 (through ~3K heterogeneous tables)
  7. RDF Data Cube
     “There are many situations where it would be useful to be able to publish multi-dimensional data, such as statistics, on the web in such a way that they can be linked to related data sets and concepts.”
  8. RDF Data Cube vocabulary (QB)
     •  SDMX compatible
     •  Defines cubes as a set of observations that consist of dimensions, measures and attributes
        •  Dimensions: time period, region, sex (qb:DimensionProperty)
        •  Measure: population life expectancy (qb:MeasureProperty)
        •  Attribute: unit of measure = years, metadata status = measured (qb:AttributeProperty)
     Observation: “the measured life expectancy of males in Newport in the period 2004-2006 is 76.7 years”
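The Newport observation quoted on this slide can be written out as a single qb:Observation. The following is a minimal sketch that emits Turtle text by hand; the `ex:` namespace and every property and resource name under it (`refPeriod`, `refArea`, `lifeExpectancy`, and so on) are hypothetical placeholders, not the terms used by CEDAR. Only `qb:Observation` and `qb:dataSet` come from the QB vocabulary itself.

```python
# Sketch only: the ex: namespace and all ex: names are hypothetical.
PREFIXES = (
    "@prefix qb: <http://purl.org/linked-data/cube#> .\n"
    "@prefix ex: <http://example.org/ns#> .\n"
)


def observation_turtle(obs_id, dataset, dimensions, measures, attributes):
    """Serialise one observation as Turtle.

    Each mapping goes from a (hypothetical) ex: property name to a value
    that is already a valid Turtle term, e.g. 'ex:Newport' or '76.7'.
    """
    lines = [f"ex:{obs_id} a qb:Observation ;", f"    qb:dataSet ex:{dataset} ;"]
    for group in (dimensions, measures, attributes):
        for prop, value in group.items():
            lines.append(f"    ex:{prop} {value} ;")
    # Turn the trailing ';' of the last statement into the closing '.'
    lines[-1] = lines[-1][:-1] + "."
    return PREFIXES + "\n" + "\n".join(lines)


newport = observation_turtle(
    "obs1",
    "lifeExpectancyDataset",
    dimensions={"refPeriod": '"2004-2006"', "refArea": "ex:Newport", "sex": "ex:male"},
    measures={"lifeExpectancy": "76.7"},
    attributes={"unitMeasure": "ex:years", "status": "ex:measured"},
)
print(newport)
```

In a real cube each `ex:` property would additionally be declared as a qb:DimensionProperty, qb:MeasureProperty or qb:AttributeProperty in the data structure definition; that part is left out here.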
  9. Dynamic Classifications
     •  Gemeentegeschiedenis.nl
  10. Dynamic Classifications
     http://lod.cedar-project.nl/maps/ (kudos to Richard Zijdeman)
  11. Dynamic Classifications
     •  HISCO
     http://historyofwork.iisg.nl/
  12. LSD Dimensions
     http://lsd-dimensions.org/
     https://github.com/albertmeronyo/LSD-Dimensions
     Daily JSON-LD dumps
  13. http://lsd-dimensions.org/
  14. Concept Drift
     Census classification of occupations as of 1859
     •  Root node is void
     •  Depth 1: occupation groups
     •  Leaves: actual occupations
  15. Concept Drift
     Census classification of occupations as of 1889
     •  Root node is void
     •  Depth 1: occupation groups
     •  Leaves: actual occupations
  16. Concept Drift
     Census classification of occupations as of 1899
     •  Root node is void
     •  Depth 1: occupation groups
     •  Leaves: actual occupations
  17. Concept Drift
     Upper ontologies (HISCO, AC)
     Year-dependent ontologies: 1859, 1869, 1879
  18. Concept Drift
     Upper ontologies (HISCO, AC)
     Year-dependent ontologies
  19. Concept Drift
     Upper ontologies (HISCO, AC)
     Year-dependent ontologies: ? ?
  20. Predicting Change
     •  KOS version chains: subsequent unique version identifiers to unique states of KOS
     •  Problematic for
        –  Data publishers (KOS maintainability)
        –  Data users/linkers (link validity)
     A. Meroño-Peñuela, C. Guéret, S. Schlobach. Predicting Change in Versioned Knowledge Organisation Systems on the Web. IJCAI 2015 (under review)
  21. Predicting Change
     •  Proposal: generic approach to predict when and where a Web KOS of any domain will change
        –  Using supervised learning on past versions of KOS
     •  SotA [1]: prediction of class extension in
        –  1 OBO/OWL version chain (Gene Ontology)
        –  using few classifiers
     •  Contribution [2]: prediction of concept drift in
        –  150 Web KOS version chains
        –  using all (21) SotA classifiers (WEKA API)
     [1] C. Pesquita, F.M. Couto. “Predicting the extension of biomedical ontologies”. PLoS Computational Biology 8 (9), e1002630
     [2] A. Meroño-Peñuela, C. Guéret, S. Schlobach. “Predicting Change in Versioned Knowledge Organisation Systems on the Web”. IJCAI 2015 (under review)
  22. Concept Drift
     •  Proxy for change of meaning over time [1]
        –  Intension drift: occurs when there is a difference in the properties or attributes of two variants of the same concept
        –  Extension drift: occurs when there is a difference in the individuals that belong to two variants of the same concept
        –  Label drift: occurs when there is a difference in the labels of two variants of the same concept
     [1] S. Wang, S. Schlobach, K. Klein. “What Is Concept Drift and How to Measure It?”. EKAW 2010.
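The three kinds of drift on this slide can be illustrated with plain set comparisons. This is a simplified sketch, not the measure from the cited paper: it treats any difference between two variants as drift, whereas a real measure would likely use a similarity threshold. The `ConceptVariant` type and the occupation data are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptVariant:
    """One variant of a concept in one KOS version (hypothetical structure)."""
    labels: frozenset      # names attached to the concept
    properties: frozenset  # intension: properties or attributes
    members: frozenset     # extension: individuals belonging to it


def drift_kinds(old, new):
    """Return which kinds of drift separate two variants of the same concept."""
    kinds = set()
    if old.properties != new.properties:
        kinds.add("intension")
    if old.members != new.members:
        kinds.add("extension")
    if old.labels != new.labels:
        kinds.add("label")
    return kinds


# Hypothetical occupation group in two census classifications.
v1859 = ConceptVariant(
    labels=frozenset({"Bakers"}),
    properties=frozenset({"sector:food"}),
    members=frozenset({"baker", "pastry cook"}),
)
v1889 = ConceptVariant(
    labels=frozenset({"Bakers and confectioners"}),
    properties=frozenset({"sector:food"}),
    members=frozenset({"baker", "pastry cook", "confectioner"}),
)
print(sorted(drift_kinds(v1859, v1889)))
```

Here the properties are unchanged, so only label and extension drift are reported.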
  23. Input Datasets
     KOS version chains from
     •  HISCO/CEDAR (1 version chain)
     •  DBpedia (2 version chains)
     •  Linked Open Vocabularies [1] (134 version chains)
     •  *Ontology chains from 637 SPARQL endpoints [2] (6 version chains)
     [1] http://lov.okfn.org/
     [2] https://github.com/albertmeronyo/ConceptDrift-data/tree/master/src
  24. Features
     •  From which data characteristics (related to change) should we learn?
     •  SotA in Ontology Change [Stojanovic 2004]
        –  Structure-driven (rdfs:subClassOf, skos:broader)
           •  maxDepth, children, parents, siblings
        –  Data-driven (rdf:type)
           •  members, childMembers, parentMembers, siblingMembers
        –  Usage-driven
           •  incExtLinks (on the Web!)
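As an illustration of the structure-driven features, the sketch below counts children, parents and siblings of a concept from a list of (narrower, broader) pairs, such as those given by skos:broader. It assumes an acyclic hierarchy, and it assumes that maxDepth means the depth of the subtree below the concept; the slide does not define it, so that reading is a guess. The occupation hierarchy is made up.

```python
from collections import defaultdict


def structural_features(broader_edges, concept):
    """Compute structure-driven features for one concept.

    broader_edges: iterable of (narrower, broader) pairs.
    Assumes the hierarchy has no cycles.
    """
    children = defaultdict(set)
    parents = defaultdict(set)
    for narrower, broader in broader_edges:
        children[broader].add(narrower)
        parents[narrower].add(broader)

    # Siblings share at least one parent with the concept.
    siblings = set()
    for parent in parents.get(concept, ()):
        siblings |= children[parent]
    siblings.discard(concept)

    def depth_below(node):
        kids = children.get(node, ())
        return 0 if not kids else 1 + max(depth_below(k) for k in kids)

    return {
        "children": len(children.get(concept, ())),
        "parents": len(parents.get(concept, ())),
        "siblings": len(siblings),
        "maxDepth": depth_below(concept),  # assumed meaning, see lead-in
    }


# Hypothetical occupation hierarchy: root -> groups -> occupations.
edges = [
    ("food", "root"), ("textile", "root"),
    ("baker", "food"), ("miller", "food"),
    ("weaver", "textile"),
]
food_features = structural_features(edges, "food")
print(food_features)
```

The data-driven features (members, childMembers, ...) could be counted the same way from rdf:type statements instead of hierarchy edges.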
  25. Pipeline
     https://github.com/albertmeronyo/ConceptDrift
  26. Evaluation
     •  Use a subset of past versions for learning (Vt)
     •  Check whether change happened by observing Vr, Ve
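One way to read this evaluation setup is sketched below. The slide does not spell out how the chain is split, so the split is an assumption: Vt is every version except the last two, Vr is the second-to-last version (used to label the training data), and Ve is the last version (used to check predictions). The version contents and the `changed` test are hypothetical.

```python
def split_chain(chain):
    """Split a version chain (oldest first) into (Vt, Vr, Ve).

    Assumed split: Vt = all but the last two versions, Vr = second to last,
    Ve = last. The slide does not define the split precisely.
    """
    if len(chain) < 3:
        raise ValueError("need at least three versions")
    return chain[:-2], chain[-2], chain[-1]


def changed(concept, before, after):
    """Treat a concept as changed if its set of children differs."""
    return before.get(concept) != after.get(concept)


# Hypothetical chain: each version maps a concept to its children.
v1 = {"food": frozenset({"baker"})}
v2 = {"food": frozenset({"baker", "miller"})}
v3 = {"food": frozenset({"baker", "miller"})}
v4 = {"food": frozenset({"baker", "miller", "brewer"})}

vt, vr, ve = split_chain([v1, v2, v3, v4])
training_label = changed("food", vt[-1], vr)  # label a classifier would learn from
actual_change = changed("food", vr, ve)       # ground truth to score a prediction
print(training_label, actual_change)
```

A classifier trained on features from Vt with labels derived from Vr would then be scored by comparing its predictions with `actual_change` for every concept.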
  27. Results – classifier performance
     •  CEDAR/HISCO classification performance over time
     •  DBpedia ontology classification performance over time
  28. Results – understanding performance
     Relationship between characteristics of input version chains and selected classifiers / performance?
     •  totalSize
     •  nSnapshots
     •  avgGap
     •  avgTreeDepth
     •  ratioInstances
     •  ratioStructural
     •  ratioInserts
     •  ratioDeletes
     •  ratioComm
     f(xi)?
     •  roc
     •  classifier
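Two of these chain characteristics are easy to illustrate. The sketch below computes nSnapshots and avgGap for a chain whose snapshots are identified by year, using the census years from slide 6 as sample input. Measuring the gap in years and taking the plain mean of consecutive differences are assumptions, since the slides do not state the unit or the exact definition.

```python
import math


def chain_characteristics(snapshot_years):
    """Compute nSnapshots and avgGap (mean distance between consecutive snapshots).

    Assumes snapshots are identified by year and gaps are measured in years.
    """
    years = sorted(snapshot_years)
    if len(years) < 2:
        raise ValueError("need at least two snapshots")
    gaps = [later - earlier for earlier, later in zip(years, years[1:])]
    avg_gap = sum(gaps) / len(gaps)
    return {
        "nSnapshots": len(years),
        "avgGap": avg_gap,
        # The regression table uses log-transformed versions of both.
        "log(nSnapshots)": math.log(len(years)),
        "log(avgGap)": math.log(avg_gap),
    }


census_years = [1795, 1830, 1840, 1849, 1859, 1869, 1879, 1889, 1899,
                1909, 1919, 1920, 1930, 1947, 1956, 1960, 1971]
stats = chain_characteristics(census_years)
print(stats["nSnapshots"], stats["avgGap"])
```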
  29. Classifier Selection
      Table 1:
                        Dependent variable:
                        functions rules     trees     functions rules     trees     functions rules     trees
                        (1)       (2)       (3)       (4)       (5)       (6)       (7)       (8)       (9)
      log(nSnapshots)   0.291     0.257     1.975     0.180     0.239     1.745     0.193     0.212     1.838
                        (0.656)   (0.765)   (1.503)   (0.680)   (0.790)   (1.512)   (0.667)   (0.777)   (1.497)
      log(avgGap)       0.238     0.145     1.385*    0.266     0.173     1.269*    0.248     0.161     1.351*
                        (0.242)   (0.271)   (0.734)   (0.240)   (0.269)   (0.703)   (0.240)   (0.270)   (0.729)
      log(totalSize)    0.669***  0.539*    0.052     0.636**   0.531*    0.010     0.641***  0.524*    0.025
                        (0.249)   (0.278)   (0.563)   (0.251)   (0.282)   (0.555)   (0.249)   (0.279)   (0.557)
      avgTreeDepth      0.399     0.334     0.534     0.393     0.336     0.564     0.385     0.323     0.553
                        (0.302)   (0.330)   (0.719)   (0.304)   (0.334)   (0.728)   (0.303)   (0.332)   (0.728)
      ratioInstances    1.378     2.463     3.090     1.071     2.246     3.394     1.269     2.330     3.221
                        (3.485)   (4.021)   (6.654)   (3.455)   (3.981)   (6.629)   (3.476)   (4.005)   (6.649)
      ratioStructural   9.054     1.357     9.539     9.039     1.674     10.799    9.594     1.116     10.030
                        (6.040)   (6.135)   (13.505)  (6.142)   (6.353)   (13.945)  (6.136)   (6.267)   (13.827)
      ratioInserts      3.006     2.376     3.540
                        (1.906)   (2.210)   (4.401)
      ratioDeletes                                    1.918     0.929     2.341
                                                      (1.907)   (2.154)   (4.058)
      ratioComm                                                                     1.440     0.945     1.615
                                                                                    (1.028)   (1.170)   (2.219)
      Constant          5.610**   5.580**   12.702**  5.288**   5.259**   12.402**  4.059*    4.494*    14.266**
                        (2.248)   (2.511)   (5.954)   (2.210)   (2.494)   (5.759)   (2.265)   (2.585)   (6.511)
      Akaike Inf. Crit. 313.543   313.543   313.543   316.179   316.179   316.179   314.605   314.605   314.605
      Note: * p<0.1; ** p<0.05; *** p<0.01
  30. Simulation of avgGap vs. Classifier Family Selection
  31. Conclusions
     •  Semantic technology for Social History
        –  It saved work!
     •  Historical datasets as an observatory of dynamic KOS
        –  Logging usage of KOS in Linked Statistical Data
     •  Modeling change in Web KOS
        –  Version chains are scarce (beware of bias)
        –  Chain recipe: nSnapshots, avgTreeDepth, ratioStructural, ratioInserts, ratioComm
        –  Classifier dependence: avgGap, totalSize
  32. Thank you
     Questions, suggestions, comments most welcome
     @albertmeronyo
     https://github.com/albertmeronyo/ConceptDrift
     http://www.cedar-project.nl
     http://krr.cs.vu.nl/
     http://easy.dans.knaw.nl/
     http://lsd-dimensions.org/
  33. Me in 6 tweets
     http://www.albertmeronyo.org
     •  Background: Computer Science, Web hacker, AI & Law
     •  PhD candidate at the VU University Amsterdam, DANS, and eHumanities group (KNAW)
     •  Topic: Semantic Web for the Humanities
     •  CEDAR project (2012-2015): harmonized historical Dutch censuses in the Semantic Web
     •  Problem: statistical data publishing, concept drift and dynamics of meaning
     •  Last paper: What is Linked Historical Data? (EKAW 2014)
