Jacco van Ossenbruggen - Detecting changes in the meaning of words and concepts

Talk at the VOGIN-IP-lezing, 28 March 2018, Amsterdam

Published in: Data & Analytics

  1. Enriching Linked Open Data with distributional semantics to study concept drift. Astrid van Aggelen, Laura Hollink, Jacco van Ossenbruggen. Information Access Group
  2. What is concept drift? The phenomenon where the characteristics of a concept change over time, signifying a shift in meaning. • Intension: definitions, properties, necessary and sufficient conditions (e.g. science, gender nonconformity) • Extension: the instances of a class (e.g. new Nobel prize winners, EU member states) • Labels: words used to refer to a concept (e.g. “migrant”, “refugee”). References: Betti, A, van den Berg, H. Modelling the history of ideas. British Journal for the History of Philosophy, 22(4):812-835, 2014. Wang, S, Schlobach, S, Klein, M. Concept drift and how to identify it. Journal of Web Semantics 9.3:247-265, 2011. Kenter, T, Wevers, M, Huijnen, P, de Rijke, M. Ad Hoc Monitoring of Vocabulary Shifts over Time. In Proceedings of CIKM, October 2015.
  3. Linked Open Data: classes, instances, their properties and labels are explicitly encoded in formal languages.
  4. Concept drift problems in LOD applications • Semantic annotation under concept drift • Ontology matching under concept drift • Interpreting user input under concept drift
  5. Semantic annotation under concept drift. Example adapted from: Cédric Pruski, keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
  6. Interpreting user input under concept drift • http://www.delpher.nl provides access to the digitised collections of the National Library of the Netherlands. • S: (n) Holocaust, final solution (the mass murder of Jews under the German Nazi regime from 1941 until 1945) • Semantic annotation / named entity detection
  7. Ontology matching under concept drift. Example adapted from: Julio Cesar dos Reis, Cédric Pruski, Marcos Da Silveira, Chantal Reynaud-Delaître, Understanding semantic mapping evolution by observing changes in biomedical ontologies, Journal of Biomedical Informatics, Volume 47, February 2014, Pages 71-82.
  8. Studying concept drift in Linked Open Data • Which concept will be deleted / merged / split / edited? • Prediction • Versioning • “RDF diff” • Keeping links & annotations up to date when entities change • Which syntactic change is also a semantic change? • Recent work: tracking changes on LOD scale. Table from: Käfer, Tobias, et al. "Observing linked data dynamics." Extended Semantic Web Conference. Springer Berlin Heidelberg, 2013. Apart from these practical issues, it is also just interesting to see how knowledge evolves!
  9. Changes in explicit knowledge are explicit too, but only to the extent that the facts are explicitly modelled. • The association between science and religion is not explicit. • The prevalent meaning of polysemous words is not explicit. We can now measure where and when intensional, extensional and label changes took place.
  10. Distributional semantics works well for detecting changes in word meaning. Evaluated e.g. in Frermann & Lapata, A Bayesian Model of Diachronic Meaning Change. Examples by Aurelie Herbelot, http://aurelieherbelot.net/research/distributional-semantics-intro/; matrices from https://cs224d.stanford.edu/lecture_notes/notes1.pdf
  11. Image from: Lea Frermann. “Modelling fine-grained Change in Word Meaning over centuries from Large Collections of Unstructured Text.” Keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
  12. Image from: Lea Frermann. “Modelling fine-grained Change in Word Meaning over centuries from Large Collections of Unstructured Text.” Keynote presentation at Drift-a-LOD’17, First workshop on Detection, Representation and Management of Concept Drift in Linked Open Data, at EKAW, Bologna, Italy, 20 November 2016.
  13. Information on the level of individual words. Open questions: • Have synonyms changed too? And hyponyms? • Have all the words for political systems changed? • Which group of words has changed most?
  14. Enriching Linked Open Data with distributional semantics
  15. Enriching Linked Open Data with distributional semantics * A method to link the two data sources * A data model to represent the combination * An RDF dataset that can be queried: https://github.com/aan680/SemanticChange_data ✤ Code ✤ Embeddings derived from Google Books ✤ Change scores for the top 10,000 words, between each decade over 200 years.
  16. WordNet data model: example of data from WordNet RDF
  17. Data model for change scores: {lexical entry, decade 1, decade 2, change score}
  18. Data model for change scores: 8,878 matches (out of 10,000 words) mapped onto 12,469 lexical entries
  19. Example query: WordNet synsets are classified into 46 ‘domains’. Which domain has changed most in the past two centuries?
  20. Follow-up query: top 10 changing words within the “process” domain
  21. Follow-up query: which subconcept of “Psychological state” has changed most?
  22. Example query: what is the relation between polysemy (the number of senses of a word in WordNet) and change score?
  23. Example query: which linguistic category has changed most?
  24. Late-breaking results • Can we use relations in LOD to study how a concept has changed, instead of only how much? Example: “gay”
  25. Call
  26. Conclusion: a first step to enrich LOD with information about lexical change, obtained from large volumes of unstructured text. Next steps: enrich LOD with information about how concepts are used: • popularity? • importance? Published as: A. van Aggelen, L. Hollink and J. van Ossenbruggen. Combining distributional semantics and structured data to study lexical change. In Proceedings of the first Drift-a-LOD workshop, co-located with EKAW, Bologna, Italy, 20 Nov. 2016.
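
The change-score data model on slide 17 and the per-domain query on slide 19 can be illustrated with a small Python sketch. Everything below is an assumption made for illustration and not the authors' implementation: the slides do not say how a change score is computed, so cosine distance between a word's embeddings in consecutive decades is used as a stand-in; the vectors are assumed to already live in a comparable space across decades; and the names `ChangeScore`, `change_scores`, `mean_change_by_domain`, as well as the toy words, vectors and domain labels, are hypothetical. The published dataset is RDF and would normally be queried with SPARQL rather than in memory.

```python
from collections import defaultdict
from dataclasses import dataclass
from math import sqrt


@dataclass(frozen=True)
class ChangeScore:
    """One record of the form {lexical entry, decade 1, decade 2, change score}."""
    lexical_entry: str
    decade_from: int
    decade_to: int
    score: float


def cosine_distance(u, v):
    """Return 1 - cosine similarity; assumes both vectors are non-zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def change_scores(entry, vectors_by_decade):
    """Build one ChangeScore per pair of consecutive decades for a single entry.

    ASSUMPTION: cosine distance stands in for the (unspecified) change score.
    """
    decades = sorted(vectors_by_decade)
    return [
        ChangeScore(entry, d1, d2,
                    cosine_distance(vectors_by_decade[d1], vectors_by_decade[d2]))
        for d1, d2 in zip(decades, decades[1:])
    ]


def mean_change_by_domain(scores, domain_of):
    """Average the change scores of all entries that share a domain label."""
    grouped = defaultdict(list)
    for s in scores:
        grouped[domain_of[s.lexical_entry]].append(s.score)
    return {domain: sum(vals) / len(vals) for domain, vals in grouped.items()}


if __name__ == "__main__":
    # Toy, made-up embeddings for two hypothetical lexical entries.
    toy_vectors = {
        "word_a": {1900: [1.0, 0.0], 1910: [1.0, 0.0], 1920: [0.0, 1.0]},
        "word_b": {1900: [1.0, 1.0], 1910: [1.0, 1.0], 1920: [1.0, 1.0]},
    }
    # Hypothetical mapping from lexical entry to a domain label.
    toy_domains = {"word_a": "noun.process", "word_b": "noun.artifact"}

    all_scores = []
    for entry, vectors in toy_vectors.items():
        all_scores.extend(change_scores(entry, vectors))

    # Rank domains from most to least changed on the toy data.
    ranking = sorted(mean_change_by_domain(all_scores, toy_domains).items(),
                     key=lambda item: item[1], reverse=True)
    for domain, mean_score in ranking:
        print(domain, round(mean_score, 3))
```

Storing each score as a separate record keyed by lexical entry and decade pair is what makes aggregate questions such as "which domain has changed most?" a simple group-and-average over the records.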
