Computationally Tracing Concepts Through Time and Space

Slides for HNR2020 Keynote presentation

Abstract:
Digitised sources are a treasure trove for scholars, but accessing the information contained in them is far from trivial. At this scale, traditional methods are insufficient for analysing the big data coming from these sources, so computational methods look to be the solution. Indeed, computational methods can be used to identify and model concepts in large digital datasets; however, the nature of these datasets, as well as that of humanities research questions, requires caution. In particular, the ramifications of time and location for understanding concepts should not be underestimated.

In this talk, Marieke will present ongoing work on computationally tracing concepts through time and across geography using language and semantic web technology. The work illustrates that seemingly simple concepts (e.g. sugar) prove to be much more complex than expected. We discuss the importance of semantics in helping not only to deal with this complexity but also to reify it, so that it can be interrogated both computationally and via expert analysis.
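
As a minimal sketch of what reifying context-dependent knowledge could look like, the following Python snippet attaches a place and a period to each statement about a concept, so that the same concept can carry different descriptions in different contexts. The class `ScopedStatement`, the helper `statements_for`, and all example values are hypothetical illustrations, not data or a model taken from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ScopedStatement:
    """A claim about a concept that is only asserted for one place and period."""
    concept: str
    predicate: str
    value: str
    place: str
    start_year: int
    end_year: int  # inclusive


def statements_for(statements: List[ScopedStatement], concept: str,
                   place: str, year: int) -> List[ScopedStatement]:
    """Return the statements about `concept` that hold in `place` during `year`."""
    return [
        s for s in statements
        if s.concept == concept
        and s.place == place
        and s.start_year <= year <= s.end_year
    ]


# Invented placeholder values, for illustration only.
STATEMENTS = [
    ScopedStatement("apple pie", "typical sweetener", "honey", "Place A", 1600, 1799),
    ScopedStatement("apple pie", "typical sweetener", "cane sugar", "Place A", 1800, 1899),
    ScopedStatement("apple pie", "typical sweetener", "beet sugar", "Place B", 1850, 1950),
]

if __name__ == "__main__":
    for s in statements_for(STATEMENTS, "apple pie", "Place A", 1850):
        print(s.predicate, "=", s.value)
```

Keeping the context on the statement rather than on the concept means a query for an unknown place or year simply returns nothing, instead of silently reusing a description from another setting.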

Slides 5, 8, 11, 12, 15, 16, 17, 18, 19, 20 are based on the presentation Tabea Tietz gave for the paper "Challenges of Knowledge Graph Evolution from an NLP Perspective" in the WHiSe Workshop @ ESWC 2020 (2 June 2020).

http://hnr2020.historicalnetworkresearch.org/

  1. Computationally Tracing Concepts Through Time and Space Marieke van Erp merpeltje DIGITAL HUMANITIES LAB
  2. Overview of this talk • Big (text) Data & Humanities • Tracing concepts • Entity spaces • New horizons • Wrapping up
  3. Big Data & Humanities • Digitised archives are enabling new types of research • Dutch National Library: 100+ million newspaper, book & magazine pages • Chronicling America: 100,000 newspaper pages • Amsterdam City Archives: 160,000 notary deeds • Bibliothèque Nationale de Luxembourg: 800,000 pages • & many more sources
  4. Zooming in & Zooming out • Qualitative methods often filter down to individual records or pages • Quantitative methods started scratching the surface • KNAW HuC focuses on bridging the gap between quantitative & qualitative analyses through advancing natural language processing and semantic web methods Image source: https://upload.wikimedia.org/wikipedia/commons/b/b5/MediaWiki_flame_graph_screenshot_2014-12-15_22.png
  5. Digital Humanities • Involves the understanding of these cultural heritage data. • Methods involving Natural Language Processing supported by Knowledge Graphs have entered the humanities research community (Meroño-Peñuela et al.) Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  6. Who has the biggest sweet tooth? • Sugar consumption patterns are difficult to trace • Historical apple pie recipes can serve as a proxy • Apple pastries are common in many cultures Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of apple pie recipes. (Forthcoming)
  7. Analysing historical recipes • Differences in availability of digitised sources • Digitisation artefacts hamper automatic analysis • Normalisation of quantities is needed • Combine quantitative & qualitative methods Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of apple pie recipes. (Forthcoming) Image source: https://en.wikipedia.org/wiki/Apple_pie#/media/File:For_to_Make_Tartys_in_Applis_(1381).gif
  8. Comparing Ingredients in Dutch and American Apple Pie Recipes Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  9. Comparing sugar quantities in Dutch, American, French and German apple pie recipes Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of apple pie recipes. (Forthcoming)
  10. What is an apple pie? • The real world is constantly changing • Knowledge that was considered true at one point in time in a specific cultural and spatial setting may not be true in another context • Concepts evolve Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  11. Cultural Context ● What is considered as true in one cultural setting may not be in another. ● Apfelstrudel == apple pie? Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  12. How can we store this type of information at scale?
  13. Concept modelling • Computer Science: Knowledge Representation/Semantic Web • Long history: at least since Aristotle • Machine readable knowledge was Sir Tim Berners-Lee’s intent when he developed the World Wide Web • To date, we have several large scale knowledge graphs such as DBpedia and Wikidata Image source: https://upload.wikimedia.org/wikipedia/commons/c/c6/Complexity_vs._orderliness.png
  14. Knowledge Graphs • Represent what we consider true about parts of the world • Are created and maintained to continuously compose knowledge (Bonatti et al.). Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  15. But: • Knowledge Graphs are often static and only reflect one snippet of reality • This static representation of the real world is a problem when attempting to understand historical descriptions of concepts (Bonatti et al., Tasnim et al.) Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  16. Concepts • Are manifested in our cultures’ norms and values • Are documented through photographs, newspapers, books, music, film, advertisements. Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  17. Spatio-temporal context ● Distinguish the spatio-temporal metadata of the concept itself and the metadata of its source ● Trace the evolution of the concept over time and geographic regions Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  18. Units ● Modern units ○ imperial vs. metric system (lbs, kg) ● Historical units ○ ell, zentner ● Natural language description of measurements ○ “a load of butter”, “a plate of apples” Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  19. Concept modelling ● How broad or narrow should the ontology be modeled to fit the concept but also capture its changes over time? ● What are the properties that define a concept across the spatio-temporal and cultural context? Tabea Tietz et al. Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020
  20. Entity spaces
  21. Language & Meaning • Human language is incredibly flexible and efficient • We can use the term ‘sugar’ to refer to • the sugar industry (a sour day for sugar) • particular instances of sugar (shall I put some sugar in?) • nutritional information (sugar and fiber intake) • commodities (grain and sugar are produced) • How can computers make sense of this? Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020)
  22. Proxy for Entity Spaces Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020)
  23. Tolerant Entity Linking • Not every meaning of an entity or concept is represented in a knowledge base • We argue that a link to an entity space is better than no link • ‘good enough interpretation’ (Poesio et al.) • Proof of concept shows increase in recall for 8 out of 13 datasets Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020)
  24. Next steps • Extending entity spaces beyond Wikipedia • Structuring concepts within entity spaces • Add temporal dimension • Intangible concepts • Scale up Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020)
  25. New Horizons • Complex concepts have multiple dimensions • Dimensions may go beyond a single discipline • Recognising, modelling & using concepts and knowledge graphs require teamwork
  26. Unexpected Crews • Within the KNAW Humanities Cluster, we harbour (computational) linguists, historians, literature scientists, ethnologists, developers, network specialists, digital humanists… • Different disciplines find each other at the intersection of topics/data/methods • Use your network!
  27. Wrapping Up • Text analysis and knowledge representation are becoming more important to humanities research • Big challenges for complex information extraction and modelling • Interdisciplinary collaboration is needed
  28. http://dhlab.nl Acknowledgments: Adina Nerghes, Eleonora Marzi, Fabio Mariani, Harald Sack, ISWS Summer School, Lientje Maas, Mehwish Alam, Melvin Wevers, Mortaza Alinam, Paul Groth, Tabea Tietz, Ulbe Bosma & Wouter van den Berg
  29. References • Tabea Tietz, Mehwish Alam, Harald Sack and Marieke van Erp (2020) Challenges of Knowledge Graph Evolution from an NLP Perspective. WHiSe Workshop @ ESWC 2020 • Marieke van Erp & Paul Groth (2020) Towards Entity Spaces. In: Proceedings of The 12th Language Resources and Evaluation Conference (LREC’2020) • Marieke van Erp & Ulbe Bosma: Divergent patterns of sugar consumption in the wake of the Industrial Revolution: an analysis on the basis of apple pie recipes. (Forthcoming) • Piero Andrea Bonatti, Stefan Decker, Axel Polleres and Valentina Presutti (2019) Knowledge Graphs: New Directions for Knowledge Representation on the Semantic Web (Dagstuhl Seminar 18371). Dagstuhl Reports 8(9), 29–111. https://doi.org/10.4230/DagRep.8.9.29 • Albert Meroño-Peñuela, Ashkan Ashkpour, Marieke van Erp, Kees Mandemakers, Leen Breure, Andrea Scharnhorst, Stefan Schlobach, Frank van Harmelen (2015) Semantic technologies for historical research: A survey. In: Semantic Web Journal • Mayesha Tasnim, Diego Collarana, Damien Graux, Fabrizio Orlandi and Maria-Esther Vidal (2019) Summarizing Entity Temporal Evolution in Knowledge Graphs. In: Companion Proceedings of The 2019 World Wide Web Conference
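
The unit problem raised on slide 18 can be sketched in Python: quantities given in a known unit are converted to grams, while historical units and vague natural-language measures are left unresolved for expert review rather than guessed. The function name `to_grams` and the conversion table are assumptions made for this illustration. No factor is given for units such as the zentner, since their value depended on region and period.

```python
import re
from typing import Optional

# Conversion factors for a few modern mass units (avoirdupois pound and ounce).
GRAMS_PER_UNIT = {
    "g": 1.0,
    "kg": 1000.0,
    "lb": 453.59237,
    "lbs": 453.59237,
    "oz": 28.349523125,
}

# A number followed by a unit word, e.g. "2 kg" or "1.5lbs".
QUANTITY = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*([a-zA-Z]+)\b")


def to_grams(text: str) -> Optional[float]:
    """Convert a quantity phrase to grams, or return None if it cannot be resolved.

    None is returned for phrases without a leading number ("a load of butter")
    and for units missing from the table ("3 zentner"), so that such cases
    can be passed on to a human expert.
    """
    match = QUANTITY.match(text)
    if match is None:
        return None
    amount = float(match.group(1))
    factor = GRAMS_PER_UNIT.get(match.group(2).lower())
    if factor is None:
        return None
    return amount * factor


if __name__ == "__main__":
    for phrase in ["2 kg sugar", "1 lb flour", "3 zentner", "a load of butter"]:
        print(phrase, "->", to_grams(phrase))
```

Returning `None` instead of a default keeps the quantitative step honest about what it could not normalise, which matches the talk's point that quantitative and qualitative methods need to be combined.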
