Enriching Linked Open Data with distributional semantics to study concept drift

Presentation at the "Proximity in Information Retrieval" symposium on the occasion of the PhD thesis defense of Jeroen Vuurens
April 26, 2017, Delft University of Technology

  1. 1. Enriching Linked Open Data with distributional semantics to study concept drift Astrid van Aggelen, Laura Hollink, Jacco van Ossenbruggen Information Access Group
  2. 2. What is concept drift? The phenomenon where the characteristics of a concept change over time, signifying a shift in meaning. References: Betti, A, van den Berg, H. Modelling the history of ideas. British Journal for the History of Philosophy, 22(4):812-835, 2014. Wang, S, Schlobach, S, Klein, M. Concept drift and how to identify it. Journal of Web Semantics 9.3:247-265, 2011. Kenter, T, Wevers, M, Huijnen, P, de Rijke, M. Ad Hoc Monitoring of Vocabulary Shifts over Time. In Proceedings of CIKM, October 2015.
  3. 3. What is concept drift? The phenomenon where the characteristics of a concept change over time, signifying a shift in meaning. • Intension: definitions, properties, necessary and sufficient conditions (e.g. science, gender nonconformity). References: Betti, A, van den Berg, H. Modelling the history of ideas. British Journal for the History of Philosophy, 22(4):812-835, 2014. Wang, S, Schlobach, S, Klein, M. Concept drift and how to identify it. Journal of Web Semantics 9.3:247-265, 2011. Kenter, T, Wevers, M, Huijnen, P, de Rijke, M. Ad Hoc Monitoring of Vocabulary Shifts over Time. In Proceedings of CIKM, October 2015.
  4. 4. What is concept drift? The phenomenon where the characteristics of a concept change over time, signifying a shift in meaning. • Intension: definitions, properties, necessary and sufficient conditions (e.g. science, gender nonconformity) • Extension: the instances of a class (e.g. new Nobel prize winners, EU member states). References: Betti, A, van den Berg, H. Modelling the history of ideas. British Journal for the History of Philosophy, 22(4):812-835, 2014. Wang, S, Schlobach, S, Klein, M. Concept drift and how to identify it. Journal of Web Semantics 9.3:247-265, 2011. Kenter, T, Wevers, M, Huijnen, P, de Rijke, M. Ad Hoc Monitoring of Vocabulary Shifts over Time. In Proceedings of CIKM, October 2015.
  5. 5. What is concept drift? The phenomenon where the characteristics of a concept change over time, signifying a shift in meaning. • Intension: definitions, properties, necessary and sufficient conditions (e.g. science, gender nonconformity) • Extension: the instances of a class (e.g. new Nobel prize winners, EU member states) • Labels: words used to refer to a concept (e.g. “migrant”, “refugee”). References: Betti, A, van den Berg, H. Modelling the history of ideas. British Journal for the History of Philosophy, 22(4):812-835, 2014. Wang, S, Schlobach, S, Klein, M. Concept drift and how to identify it. Journal of Web Semantics 9.3:247-265, 2011. Kenter, T, Wevers, M, Huijnen, P, de Rijke, M. Ad Hoc Monitoring of Vocabulary Shifts over Time. In Proceedings of CIKM, October 2015.
  6. 6. Linked Open Data: classes, instances, their properties and labels are explicitly encoded in formal languages. [Diagram: classes, their instances (i) and labels]
  7. 7. Concept drift problems in LOD applications • Semantic annotation under concept drift • Ontology matching under concept drift • Interpreting user input under concept drift [Figures: the ICD9 2008 / ICD9 2009 annotation example and the ontology matching example with new ontology versions]
  8. 8. Semantic annotation under concept drift. De Lignieres, B., et al. "Prevention of menstrual migraine by percutaneous oestradiol." British medical journal (Clinical research ed.) 293.6561 (1986): 1540. [Diagram labels, ICD9 2008: Premenstrual tension syndromes; Tension syndromes; synonyms: "menstrual migraine"]
  9. 9. Semantic annotation under concept drift. Example adapted from: Cédric Pruski, keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016. De Lignieres, B., et al. "Prevention of menstrual migraine by percutaneous oestradiol." British medical journal (Clinical research ed.) 293.6561 (1986): 1540. [Diagram labels, ICD9 2008: Premenstrual tension syndromes; Tension syndromes; synonyms: "menstrual migraine"]
  10. 10. Semantic annotation under concept drift. Example adapted from: Cédric Pruski, keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016. De Lignieres, B., et al. "Prevention of menstrual migraine by percutaneous oestradiol." British medical journal (Clinical research ed.) 293.6561 (1986): 1540. [Diagram labels, ICD9 2008: Premenstrual tension syndromes; Tension syndromes; synonyms: "menstrual migraine"] [Diagram labels, ICD9 2009: Premenstrual tension syndromes; Tension syndromes; Menstrual migraine; Migraine; an "x" mark]
  11. 11. Interpreting user input under concept drift http://www.delpher.nl provides access to the digitised collections from the National Library of the Netherlands.
  12. 12. Interpreting user input under concept drift http://www.delpher.nl provides access to the digitised collections from the National Library of the Netherlands. S: (n) Holocaust, final solution (the mass murder of Jews under the German Nazi regime from 1941 until 1945) Semantic annotation / named entity detection x
  13. 13. Ontology matching under concept drift. Example adapted from: Julio Cesar dos Reis, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître, Understanding semantic mapping evolution by observing changes in biomedical ontologies, Journal of Biomedical Informatics, Volume 47, February 2014, Pages 71-82. [Diagram: Ontology A matched to Ontology B]
  14. 14. Ontology matching under concept drift. Example adapted from: Julio Cesar dos Reis, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître, Understanding semantic mapping evolution by observing changes in biomedical ontologies, Journal of Biomedical Informatics, Volume 47, February 2014, Pages 71-82. [Diagram: Ontology A matched to Ontology B; a new version, Ontology A', whose match to Ontology B is marked "?"]
  15. 15. Ontology matching under concept drift. Example adapted from: Julio Cesar dos Reis, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître, Understanding semantic mapping evolution by observing changes in biomedical ontologies, Journal of Biomedical Informatics, Volume 47, February 2014, Pages 71-82. [Diagram: Ontology A matched to Ontology B; new versions Ontology A' and Ontology B', with the matches involving the new versions marked "?" and "??"]
  16. 16. Studying concept drift in Linked Open Data • Prediction: which concept will be deleted / merged / split / edited? • Versioning: keeping links & annotations up to date when entities change • “RDF diff”: which syntactic change is also a semantic change?
  17. 17. Studying concept drift in Linked Open Data • Prediction: which concept will be deleted / merged / split / edited? • Versioning: keeping links & annotations up to date when entities change • “RDF diff”: which syntactic change is also a semantic change? Recent work: tracking changes on LOD scale.
  18. 18. Studying concept drift in Linked Open Data • Prediction: which concept will be deleted / merged / split / edited? • Versioning: keeping links & annotations up to date when entities change • “RDF diff”: which syntactic change is also a semantic change? Recent work: tracking changes on LOD scale. Table from: Käfer, Tobias, et al. "Observing linked data dynamics." Extended Semantic Web Conference. Springer Berlin Heidelberg, 2013.
  19. 19. Studying concept drift in Linked Open Data • Prediction: which concept will be deleted / merged / split / edited? • Versioning: keeping links & annotations up to date when entities change • “RDF diff”: which syntactic change is also a semantic change? Recent work: tracking changes on LOD scale. Table from: Käfer, Tobias, et al. "Observing linked data dynamics." Extended Semantic Web Conference. Springer Berlin Heidelberg, 2013. Apart from these practical issues, it is also just interesting to see how knowledge evolves!
  20. 20. Changes in explicit knowledge are explicit too. We can now measure where and when intensional, extensional and label changes took place.
  21. 21. Changes in explicit knowledge are explicit too. We can now measure where and when intensional, extensional and label changes took place. But only to the extent that the facts are explicitly modelled. • The association between science and religion is not explicit. • The prevalent meaning of polysemous words is not explicit.
  22. 22. Changes in explicit knowledge are explicit too. We can now measure where and when intensional, extensional and label changes took place. But only to the extent that the facts are explicitly modelled. • The association between science and religion is not explicit. • The prevalent meaning of polysemous words is not explicit.
  23. 23. Changes in explicit knowledge are explicit too. We can now measure where and when intensional, extensional and label changes took place. But only to the extent that the facts are explicitly modelled. • The association between science and religion is not explicit. • The prevalent meaning of polysemous words is not explicit.
  24. 24. Distributional semantics works well for detecting changes in word meaning. Evaluated e.g. in Frermann & Lapata, A Bayesian Model of Diachronic Meaning Change. Examples by Aurelie Herbelot: http://aurelieherbelot.net/research/distributional-semantics-intro/ Matrices from: https://cs224d.stanford.edu/lecture_notes/notes1.pdf
  25. 25. Image from: Lea Frermann. "Modelling fine-grained Change in Word Meaning over centuries from Large Collections of Unstructured Text." Keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
  26. 26. Image from: Lea Frermann. "Modelling fine-grained Change in Word Meaning over centuries from Large Collections of Unstructured Text." Keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
  27. 27. Information on the level of individual words. Open questions: Have synonyms changed too? And hyponyms? Have all the words for political systems changed? Which group of words has changed most?
  28. 28. Enriching Linked Open Data with distributional semantics +
  29. 29. Enriching Linked Open Data with distributional semantics GTAA + * A method to link the two data sources * A data model to represent the combination * An RDF dataset that can be queried: https://github.com/aan680/SemanticChange_data
  30. 30. Enriching Linked Open Data with distributional semantics GTAA + * A method to link the two data sources * A data model to represent the combination * An RDF dataset that can be queried: https://github.com/aan680/SemanticChange_data ✤ Code ✤ Embeddings derived from Google Books ✤ Change scores for the top 10,000 words between each decade over 200 years.
  31. 31. WordNet Data Model: example of data from WordNet RDF. [Diagram. Nodes: LexicalEntry; Form "democracy"@en; Synset (democracy), gloss "a political system in which the supreme power lies in a body of citizens who can elect people to represent them"; Synset (mobocracy), gloss "a political system in which a mob is the source of control; government by the masses"; Synset (political system); Synset (parliamentary democracy); Synset (political party). Other labels: domain noun.group; part of speech noun; hypernym (three edges); meronym.]
  32. 32. Data model for change scores {lexical entry, decade 1, decade 2, change score}
  33. 33. Data model for change scores: 8,878 matches (out of 10,000), mapped onto 12,469 lexical entries
  34. 34. Example query: WordNet synsets are classified into 46 ‘domains’. Which domain has changed most in the past two centuries?
  35. 35. Follow-up query Top 10 changing words within the “process” domain
  36. 36. Follow-up query Which subconcept of “Psychological state” has changed most?
  37. 37. Example query: What is the relation between polysemy (number of senses of a word in WordNet) and change score?
  38. 38. Example query • Which linguistic category has changed most?
  39. 39. Late breaking results • Can we use relations in LOD to study how a concept has changed? Instead of only how much?
  40. 40. Late breaking results • Can we use relations in LOD to study how a concept has changed? Instead of only how much? Gay
  41. 41. Late breaking results • Can we use relations in LOD to study how a concept has changed? Instead of only how much? Gay
  42. 42. Call
  43. 43. Conclusion: A first step to enrich LOD with information about lexical change, obtained from large volumes of unstructured text. GTAA Next steps: enrich LOD with info about how concepts are used: • popularity? • importance? • controversy? Published as: A. van Aggelen, L. Hollink and J. van Ossenbruggen. Combining distributional semantics and structured data to study lexical change. In Proceedings of the first Drift-a-LOD workshop, co-located with EKAW, Bologna, Italy, 20 Nov. 2016.
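
The change-score record from slide 32 and the domain queries from slides 34 and 35 can be sketched in plain Python. This is a minimal illustration and not the authors' RDF model or queries: the class name `ChangeScore`, the helper functions, the word-to-domain mapping and all numeric values are assumptions made up for the example.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class ChangeScore:
    """One record: {lexical entry, decade 1, decade 2, change score}."""
    lexical_entry: str
    decade_1: int
    decade_2: int
    change_score: float


def mean_change_by_domain(scores, domain_of):
    """Average change score per domain, highest first.

    domain_of is a hypothetical mapping from lexical entry to domain;
    entries without a known domain are skipped.
    """
    grouped = defaultdict(list)
    for s in scores:
        domain = domain_of.get(s.lexical_entry)
        if domain is not None:
            grouped[domain].append(s.change_score)
    ranking = [(domain, mean(values)) for domain, values in grouped.items()]
    return sorted(ranking, key=lambda pair: pair[1], reverse=True)


def top_changing_words(scores, domain_of, domain, n=10):
    """The n lexical entries with the highest change score in one domain."""
    in_domain = [s for s in scores if domain_of.get(s.lexical_entry) == domain]
    in_domain.sort(key=lambda s: s.change_score, reverse=True)
    return [s.lexical_entry for s in in_domain[:n]]


# Made-up records and domain assignments, for illustration only.
scores = [
    ChangeScore("democracy", 1800, 1810, 0.25),
    ChangeScore("mobocracy", 1800, 1810, 0.75),
    ChangeScore("growth", 1800, 1810, 0.875),
]
domain_of = {
    "democracy": "noun.group",
    "mobocracy": "noun.group",
    "growth": "noun.process",
}

print(mean_change_by_domain(scores, domain_of))
print(top_changing_words(scores, domain_of, "noun.group"))
```

In the published dataset the same questions would be asked over RDF, with the domain coming from the WordNet link of each lexical entry; the grouping and ranking logic stays the same.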
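
Slide 30 mentions change scores computed between each decade from embeddings. The slides do not say which measure is used, so the sketch below assumes cosine distance between a word's vectors in consecutive decades, a common choice in this line of work. It also assumes the per-decade vectors already live in one shared (aligned) space, since vectors from independently trained models are not directly comparable. The function names and toy vectors are hypothetical.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two non-zero vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def change_scores(vectors_by_decade):
    """Return (decade_1, decade_2, score) for each pair of consecutive decades.

    vectors_by_decade maps a decade (e.g. 1800) to the word's vector in that
    decade. Assumption: all vectors are in the same, aligned vector space.
    """
    decades = sorted(vectors_by_decade)
    return [
        (d1, d2, cosine_distance(vectors_by_decade[d1], vectors_by_decade[d2]))
        for d1, d2 in zip(decades, decades[1:])
    ]


# Toy 2-dimensional vectors for one word, made up for illustration.
toy = {1800: [1.0, 0.0], 1810: [1.0, 0.0], 1820: [0.0, 1.0]}

for d1, d2, score in change_scores(toy):
    print(d1, d2, round(score, 3))
```

Each resulting triple, together with the word, has the shape of the record on slide 32: lexical entry, decade 1, decade 2, change score.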
