Slides of the Knowledge and Media lecture about Linked Data and Linked Open Data, presented on 19 November 2012. The slides are based on presentations by Victor de Boer and Christophe Guéret.
Agora: putting museum objects into their art-historic context (Marieke van Erp)
The digital era has presented big challenges, but also great opportunities for the museum world. One of these opportunities is the way museums can open up their collections to the public. Many museums are now actively exploring possibilities to present their collections online for visitors who cannot come to the museum, or to show objects for which they do not have space in the exhibition halls. Often they will put together themed Web sites for online exhibitions in which objects are presented in a certain context. However, these themed Web sites usually cover only a small part of their collection. For the majority of the objects, the context is not made explicit. In the Agora project, we aim to make this context explicit in an automatic way in order to help users understand and interpret museum objects. We do this by linking museum objects to historical events and explicitly presenting these links in an event-driven browsing environment.
In the first part of my talk, I will explain the theoretical framework we have developed in the Agora project to represent historical contexts as well as the general challenges to the project. In the second part of my talk, I will focus on the particular challenges in information extraction for building the event thesaurus and linking museum objects.
These slides are from a presentation given at the Eurecom seminar on July 20 2012
Lecture 5: Mining, Analysis and Visualisation (Marieke van Erp)
This is the fourth lecture in the Social Web course at the VU University Amsterdam
Visit the website for more information: Social Web 2012
Presentation of "Reusing Linguistic Resources: Tasks and Goals for a Linked Data Approach", March 9, DGfS 34, Frankfurt Germany.
Find the paper at: http://www.springerlink.com/content/k535323272457913
Slides shown at Agora Bronbeek session in which we interviewed eye witnesses and interested lay persons on how they share their memories and access information about historical events they were involved in.
Linked data: Four rules and five stars for the Amsterdam Museum (Victor de Boer)
Slides used for a guest lecture about Linked Data for the course "Knowledge and Media" at the VU Amsterdam (Nov 2011).
The talk takes the practical example of converting Amsterdam Museum data to Five-star Linked Open Data.
Eswc2012 presentation: Supporting Linked Data Production for Cultural Heritag... (Victor de Boer)
Within the cultural heritage field, proprietary metadata and vocabularies are being transformed into public Linked Data. These efforts have mostly been at the level of large-scale aggregators such as Europeana where the original data is abstracted to a common format and schema. Although this approach ensures a level of consistency and interoperability, the richness of the original data is lost in the process. In this paper, we present a transparent and interactive methodology for ingesting, converting and linking cultural heritage metadata into Linked Data. The methodology is designed to maintain the richness and detail of the original metadata.
We introduce the XMLRDF conversion tool and describe how it is integrated in the ClioPatria semantic web toolkit. The methodology and the tools have been validated by converting the Amsterdam Museum metadata to a Linked Data version. In this way, the Amsterdam Museum became the first ‘small’ cultural heritage institution with a node in the Linked Data cloud.
Radically Open Cultural Heritage Data on the Web (Julie Allinson)
What happens when tens of thousands of archival photos are shared with open licenses, then mashed up with geolocation data and current photos? Or when app developers can freely utilize information and images from millions of books? On this panel, we'll explore the fundamental elements of Linked Open Data and discover how rapidly growing access to metadata within the world's libraries, archives and museums is opening exciting new possibilities for understanding our past, and may help in predicting our future. Our panelists will look into the technological underpinnings of Linked Open Data, demonstrate use cases and applications, and consider the possibilities of such data for scholarly research, preservation, commercial interests, and the future of cultural heritage data.
First steps towards publishing library data on the semantic web (horvadam)
First steps towards publishing library data on the semantic web. Implementing:
CoolUri
RDFDC
SKOS
RDF database and SPARQL interface
Content negotiation
Lecture at the advanced course on Data Science of the SIKS research school, May 20, 2016, Vught, The Netherlands.
Contents
-Why do we create Linked Open Data? Example questions from the Humanities and Social Sciences
-Introduction into Linked Open Data
-Lessons learned about the creation of Linked Open Data (link discovery, knowledge representation, evaluation).
-Accessing Linked Open Data
Two graph data models: RDF and Property Graphs (andyseaborne)
Talk given at ApacheConEU Big Data 2015.
This talk describes the two common graph data approaches, RDF and Property Graphs. It concludes with observations about the different emphasis of each and where each is focused.
Towards Culturally Aware AI Systems - TSDH Symposium (Marieke van Erp)
Towards Culturally Aware AI Systems
Presented 23 June 2021
Slide credits: Cultural AI team members Andrei Nesterov, Laura Hollink, Ryan Brate, Valentin Vogelmann + input and inspiration from all Cultural AI Colleagues
Biases in data can be both explicit and implicit. Explicitly, ‘The Dutch Seventeenth Century’ and ‘The Dutch Golden Age’ are pseudo-synonymous and refer to a particular era of Dutch history. Implicitly, the ‘Golden Age’ moniker is contested due to the fact that the geopolitical and economic expansion came with great costs, such as the slave trade. A simple two-word phrase can carry strong contestations, and entire research fields, such as post-colonial studies, are devoted to them. However, these sometimes subtle (and sometimes not so subtle) differences in voice are as yet not often represented well in AI systems.
In this talk, I will discuss how the Cultural AI Lab is working towards creating AI systems that are implicitly or explicitly aware of the subtle and subjective complexity of human culture. I will highlight the different research strands and activities that look at AI from different angles as well as how we engage with our user communities to create synergies between the technology and the daily practice of cultural heritage professionals.
The Human in Digital Humanities
Online Symposium, Tilburg School of Humanities & Digital Sciences
Tilburg University
https://www.digitalhumanitiestilburg.com/
Marieke van Erp & Victor de Boer (2021, June). A Polyvocal and Contextualised Semantic Web. In European Semantic Web Conference (pp. 506-512). Springer, Cham.
Presented on 8 June, 2021
Computationally Tracing Concepts Through Time and Space (Marieke van Erp)
Slides for HNR2020 Keynote presentation
Abstract:
Digitised sources are a treasure trove for scholars, but accessing the information contained in them is far from trivial. Due to scale, traditional methods are insufficient to analyse the big data coming from these sources. Hence, computational methods look to be the solution. Indeed, computational methods can be utilised to identify and model concepts in large digital datasets, however the nature of these datasets as well as that of humanities research questions requires caution. In particular, the ramifications of time and location on understanding concepts cannot be underestimated.
In this talk, Marieke will present ongoing work on computationally tracing concepts through time and across geography using language and semantic web technology. The work illustrates that seemingly simple concepts (e.g. sugar) prove to be much more complex than expected. We discuss the importance of semantics in helping not only to deal with this complexity but reify it so that it can be interrogated both computationally and via expert analysis.
Slides 5, 8, 11, 12, 15, 16, 17, 18, 19, 20 are based on the presentation Tabea Tietz gave for the paper "Challenges of Knowledge Graph Evolution from an NLP Perspective" in the WHiSe Workshop @ ESWC 2020 (2 June 2020).
http://hnr2020.historicalnetworkresearch.org/
The Hitchhiker's Guide to the Future of Digital Humanities (Marieke van Erp)
Slides of my DHOxSS closing lecture
Oxford, 26 July 2019
Abstract
In the constellation of research fields, new configurations are continuously reshaping our ideas of what a field should be. This is particularly the case in the young field of digital humanities which, as David M. Berry noted, started with a focus on improving access to digital repositories and then moved to expanding the limits of archives to include born-digital materials as research objects. Both moves greatly impacted our research practice. However, I argue that we have only started scratching the surface of what digital methods can mean for humanities research.
In particular, as our methods and collaborations with other fields have matured, we can now start imagining new types of research questions that go beyond the sum of their ‘digital’ and ‘humanities’ parts -- to fundamentally change the nature of the humanities questions that we can ask. For such a reshaping to occur, we need to deepen the connection to our academic neighbours and keep looking beyond our own research community in order to ask these new questions. In my talk, I will present how multi-disciplinary collaborations between historians, linguists, and computer scientists can bring about new insights that may form the first steps to this future.
Why language technology can’t handle Game of Thrones (yet) (Marieke van Erp)
Natural language processing (NLP) tools are commonly used in many day-to-day applications such as Siri and Google, but the effectiveness of these technologies is not thoroughly understood. I will present joint work with colleagues from the Vrije Universiteit Amsterdam in which we perform a thorough evaluation of four different name recognition tools on 40 popular novels (including A Game of Thrones). I will highlight why literary texts are so difficult for NLP tools as well as solutions for improving their performance.
Finding common ground between text, maps, and tables for quantitative and qua... (Marieke van Erp)
Invited talk given at 8th AIUCD Conference 2019 – ‘Pedagogy, teaching, and research in the age of Digital Humanities’
http://aiucd2019.uniud.it/
24 January 2019, Udine, Italy
Slicing and Dicing a Newspaper Corpus for Historical Ecology Research (Marieke van Erp)
Presented at EKAW 2018
Historical newspapers are a novel source of information for historical ecologists to study the interactions between humans and animals through time and space. Newspaper archives are particularly interesting to analyse because of their breadth and depth. However, the size and the occasional noisiness of such archives also bring difficulties, as manual analysis is impossible. In this paper, we present experiments and results on automatic query expansion and categorisation for the perception of animal species between 1800 and 1940. For query expansion and to support the manual annotation process, we used lexicons. For the categorisation, we trained a Support Vector Machine model. Our results indicate that we can distinguish newspaper articles that are about animal species from those that are not with an F1 of 0.92, and subcategorise the different types of newspaper articles on animals with an F1 of up to 0.84.
Lessons Learnt from the Named Entity rEcognition and Linking (NEEL) Challenge... (Marieke van Erp)
Giuseppe Rizzo, Bianca Pereira, Andrea Varga, Marieke van Erp, Amparo Elizabeth Cano Basave
Presented on Wednesday 10 October at the 17th International Semantic Web Conference (ISWC 2018)
Paper: http://www.semantic-web-journal.net/content/lessons-learnt-named-entity-recognition-and-linking-neel-challenge-series
Conference: http://iswc2018.semanticweb.org/
Entity Typing Using Distributional Semantics and DBpedia (Marieke van Erp)
Presentation given at NLP&DBpedia workshop on 18 October 2016. The presentation accompanies the work described in: https://nlpdbpedia2016.files.wordpress.com/2016/09/nlpdbpedia2016_paper_9.pdf
The domain as unifier, how focusing on social history can bring technical fie... (Marieke van Erp)
Invited talk given at the final CEDAR symposium about the interaction between (social) history, language technology, and semantic web.
https://socialhistory.org/en/events/final-cedar-mini-symposium
Evaluating entity linking an analysis of current benchmark datasets and a ro... (Marieke van Erp)
Marieke van Erp, Pablo Mendes, Heiko Paulheim, Filip Ilievski, Julien Plu, Giuseppe Rizzo and Joerg Waitelonis
Presented at LREC 2016:
http://www.lrec-conf.org/proceedings/lrec2016/pdf/926_Paper.pdf
Finding Stories in 1,784,532 Events: Scaling up computational models of narr... (Marieke van Erp)
Slides of the NewsReader Computational Models of Narrative Presentation "Finding Stories in 1,784,532 Events: Scaling Up Computational Models of Narrative - Marieke van Erp, Antske Fokkens, and Piek Vossen"
Workshop page: http://narrative.csail.mit.edu/cmn14/
Project page: http://www.newsreader-project.eu
KM Lecture 7 LOD
1. LECTURE 7: LINKED (OPEN) DATA
Marieke van Erp
(with slides from Victor de Boer and Christophe Guéret)
2. TODAY’S LECTURE
• Why Linked (Open) Data?
• What is Linked (Open) Data?
• The story of Linked Open Data
• Contributing to Linked Data
• Standards and best practices
• Consuming Linked Data
• Drawbacks and problems
6. WHAT IS LINKED DATA?
• Linked Data is a method to publish structured data for interlinking with other data sources
• Standard Web technology (HTTP and URIs)
• Making information more easily readable and shareable for machines
• Linked Open Data is a W3C community project to extend the Web with open data sets
http://www.ted.com/talks/tim_berners_lee_on_the_next_web.html
10. “Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”
12. CONTRIBUTING TO LINKED DATA
Yes, it may be scary to open up your data, but it may lead to:
• Transparency
• Participation
• Improvement
• Innovation
• New knowledge & insights from combined data sources
14. LINKED OPEN DATA FIVE STAR SYSTEM
★ Available on the web (whatever format), but with an open license
★★ Available as machine-readable structured data (e.g. Excel instead of an image scan of a table)
★★★ As (2), plus a non-proprietary format (e.g. CSV instead of Excel)
★★★★ All the above, plus: use open standards from W3C (RDF and SPARQL) to identify things, so that people can point at your stuff
★★★★★ All the above, plus: link your data to other people’s data to provide context
www.w3.org/designissues/
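The star levels are cumulative: a dataset earns a star only if it also meets every lower level. A minimal Python sketch of that logic follows; the `Dataset` class and its field names are hypothetical and only mirror the five criteria on the slide.

```python
from dataclasses import dataclass


@dataclass
class Dataset:
    """Hypothetical description of a published dataset (field names are assumptions)."""
    on_web_open_license: bool      # star 1
    machine_readable: bool         # star 2
    non_proprietary_format: bool   # star 3
    uses_w3c_standards: bool       # star 4 (RDF, SPARQL)
    links_to_other_data: bool      # star 5


def star_rating(ds: Dataset) -> int:
    """Count stars cumulatively: stop at the first level that is not met."""
    levels = [
        ds.on_web_open_license,
        ds.machine_readable,
        ds.non_proprietary_format,
        ds.uses_w3c_standards,
        ds.links_to_other_data,
    ]
    stars = 0
    for satisfied in levels:
        if not satisfied:
            break
        stars += 1
    return stars


# An openly licensed CSV file that has not been converted to RDF yet
csv_dump = Dataset(True, True, True, False, False)
print(star_rating(csv_dump))  # 3
```

Under this reading, RDF published without an open license would score zero, because the first level is not met.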
15. FOUR RULES OF LINKED DATA
1. Use URIs as names for things (Resources)
2. Use HTTP URIs so that people can look up those names (Dereferencing)
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/
17. HOW TO MAKE COOL URIs
• Use HTTP://
• Use a namespace you control
• Unique, stable and persistent
• Don’t use:
  • Author name, subject, status, access, file name extension, software mechanism
C://MyDisk/awesome/MvanErp/latest/cgi_bin/rembrandt.html
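Some of these guidelines can be checked mechanically. The sketch below is an illustration under stated assumptions, not an official validator: the function name and the two lists of "fragile" path parts and extensions are hypothetical, and rules such as "no author name" or "a namespace you control" cannot be detected this way.

```python
from typing import List
from urllib.parse import urlparse

# Hypothetical examples of status words and software mechanisms in a path
FRAGILE_PARTS = ("cgi_bin", "cgi-bin", "latest", "draft")
# Hypothetical examples of file name extensions that tie a URI to a technology
FRAGILE_EXTENSIONS = (".html", ".php", ".asp", ".cgi")


def cool_uri_problems(uri: str) -> List[str]:
    """Return a list of guideline violations found in the URI (empty if none)."""
    problems = []
    parsed = urlparse(uri)
    if parsed.scheme not in ("http", "https"):
        problems.append("not an HTTP(S) URI")
    path = parsed.path.lower()
    if path.endswith(FRAGILE_EXTENSIONS):
        problems.append("file name extension in path")
    segments = [s for s in path.split("/") if s]
    for part in FRAGILE_PARTS:
        if part in segments:
            problems.append("status or software mechanism in path: " + part)
    return problems


# The counterexample from the slide, and a PURL used later in the lecture
print(cool_uri_problems("C://MyDisk/awesome/MvanErp/latest/cgi_bin/rembrandt.html"))
print(cool_uri_problems("http://purl.org/collections/nl/am/proxy-63432"))
```

The first call reports several problems; the second returns an empty list.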
18. FOUR RULES OF LINKED DATA
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names (Dereferencing)
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/
21. ARCHITECTURE
[Architecture diagram: SPARQL app and browser as clients; Purl.org redirect; ClioPatria on Prolog, with SPARQL and Web interface, HTTP server, logic, and RDF(S) storage]
22. HOW TO ACCESS THE DATA
• PURL 303 redirect to VU semantic layer
  http://purl.org/collections/nl/am/proxy-63432
  → http://semanticweb.cs.vu.nl/europeana/browse/list_resource?r=http://purl.org/collections/nl/am/proxy-63432
• At our server: content negotiation
  • HTTP request text/html:
    • Local condensed view
    • Local full view
  • HTTP request application/rdf+xml
    • rdf/xml “describe”
• SPARQL endpoint
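Content negotiation means the client states the representation it wants in the HTTP `Accept` header, and the server answers the same URI with HTML or RDF/XML accordingly. A minimal sketch with Python's standard library follows. The helper name `build_request` is hypothetical, and the actual fetch is left commented out because it needs network access and the servers named on the slide may no longer respond.

```python
from urllib.request import Request

# URI taken from the slide
PURL = "http://purl.org/collections/nl/am/proxy-63432"


def build_request(uri: str, want_rdf: bool) -> Request:
    """Build a GET request that asks for RDF/XML or for HTML via the Accept header."""
    accept = "application/rdf+xml" if want_rdf else "text/html"
    return Request(uri, headers={"Accept": accept})


rdf_request = build_request(PURL, want_rdf=True)
print(rdf_request.get_header("Accept"))  # application/rdf+xml

# To actually fetch the data (urlopen follows the 303 redirect automatically):
# from urllib.request import urlopen
# with urlopen(rdf_request) as response:
#     print(response.headers.get("Content-Type"))
```

A browser sends `text/html` in its `Accept` header, so the same URI gives a person a readable page and a program the RDF description.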
27. FOUR RULES OF LINKED DATA
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL)
4. Include links to other URIs, so that they can discover more things
http://www.w3.org/DesignIssues/
28. LINK TO OTHER SOURCES
[Graph diagram, three linked resources]
• am:proxy-19319 (am:Record)
  • am:priref “19319”
  • am:date “1651”
  • am:maker am:p-1234
• am:p-1234 (am:Person)
  • am:priref “1234”
  • am:birthdate “1606”
  • rda:name “Rembrandt”
  • owl:sameAs (?) Viaf:RebrandtvanRijn
• Viaf:RebrandtvanRijn (Viaf:Person)
  • Viaf:nationality “Dutch”
  • rdfs:label “Rembrandt Harmensz. Van Rijn”
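The graph on this slide can be written down as subject-predicate-object triples, which shows what the link to VIAF buys: following it pulls in facts the museum record does not contain. The sketch below uses plain Python tuples rather than an RDF library. The triples are one reading of the diagram (the `rdf:type` statements in particular are an assumption), the identifiers are copied as they appear on the slide, and the `describe` helper is hypothetical. The slide itself marks `owl:sameAs` with a question mark, so treating it as an identity link is also an assumption.

```python
# Triples as (subject, predicate, object), read off the slide's diagram
TRIPLES = [
    ("am:proxy-19319", "rdf:type", "am:Record"),
    ("am:proxy-19319", "am:priref", "19319"),
    ("am:proxy-19319", "am:date", "1651"),
    ("am:proxy-19319", "am:maker", "am:p-1234"),
    ("am:p-1234", "rdf:type", "am:Person"),
    ("am:p-1234", "am:priref", "1234"),
    ("am:p-1234", "am:birthdate", "1606"),
    ("am:p-1234", "rda:name", "Rembrandt"),
    ("am:p-1234", "owl:sameAs", "Viaf:RebrandtvanRijn"),  # marked (?) on the slide
    ("Viaf:RebrandtvanRijn", "rdf:type", "Viaf:Person"),
    ("Viaf:RebrandtvanRijn", "Viaf:nationality", "Dutch"),
    ("Viaf:RebrandtvanRijn", "rdfs:label", "Rembrandt Harmensz. Van Rijn"),
]


def describe(subject, triples):
    """Collect (predicate, object) pairs for a subject, following owl:sameAs one hop."""
    aliases = {subject}
    for s, p, o in triples:
        if s == subject and p == "owl:sameAs":
            aliases.add(o)
    return sorted(
        (p, o) for s, p, o in triples if s in aliases and p != "owl:sameAs"
    )


# The museum's person record, enriched with what VIAF says about the same person
for predicate, obj in describe("am:p-1234", TRIPLES):
    print(predicate, obj)
```

Without the `owl:sameAs` link, the nationality and the full name label would not be reachable from the museum's own data.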
29. CONSUMING LINKED DATA
• Generic Applications
• Can process any data from any domain
• Domain specific applications
• Covers needs of specific user community
33. DRAWBACKS AND PROBLEMS
• Extra burden on the data provider
• Nerd-only (aka “SPARQL is hard”)
• How do we build user-friendly systems?
• Ranking, user-friendly information presentation
• Scalability (how do you query a huge graph?)
• Licenses
• Is Open always a good idea?
• Context?
• Data quality
34. FURTHER READING
• Tom Heath and Christian Bizer (2011). Linked Data: Evolving the Web into a Global Data Space (1st edition). Synthesis Lectures on the Semantic Web: Theory and Technology, 1:1, 1-136. Morgan & Claypool (available online for free)