The document discusses strategies for maintaining links and content over time on the web (link rot and content drift). It proposes using persistent identifiers (PIDs) assigned to web resources and versions to combat these issues. PIDs allow the "web of the present" to remain the same as the "web of the past" by redirecting links over time. The document provides examples of communities like scholarly communication that care about maintaining the integrity of web archives and references over long periods. It also discusses the use of relation types like "cite-as" to convey a preferred URI for references that may change.
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conference 2018
1. Herbert Van de Sompel @hvdsomp
EuropeanaTech 2018, Rotterdam, The Netherlands, 15/05/18
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
Perseverance on Persistence
a future-note about the past
2. Herbert Van de Sompel @hvdsomp
OAI-ORE
3. Herbert Van de Sompel @hvdsomp
2006
• OAI-ORE observation: Scholarly assets are rapidly becoming compound, consisting of multiple resources with various:
• Relationships
• Interdependencies
• How to convey this compound-ness in an interoperable manner so that applications can access and consume such assets?
http://www.openarchives.org/ore/1.0/toc
4. Herbert Van de Sompel @hvdsomp
Address interoperability challenges from the perspective of the web
• The resource at the center of the universe
• The notion of a repository (or even of a web server) does not exist in the architecture of the web
• Neither does the notion of a Digital Object
• The tools of the interoperability trade are the primitives of the web
ORE Insight 1 - Web-Centric Interoperability Paradigm
5. Herbert Van de Sompel @hvdsomp
Tools of the Web-Centric Interoperability Trade
• Resource
• URI
• HTTP as the API: HEAD/GET, POST, PUT, DELETE
• Representation
• Media Type
• Link
• Content Negotiation
• Typed Link
• Controlled Vocabularies for Typed Links
W3C Architecture of the World Wide Web
RDF, RDFS, OWL
6. Herbert Van de Sompel @hvdsomp
7. Herbert Van de Sompel @hvdsomp
OAI-ORE in EDM
Europeana v1.0 2009
8. Herbert Van de Sompel @hvdsomp
The web-centric ORE approach allowed using off-the-shelf web tools to archive evolving compound objects
• Evolving versions of Resource Maps and Aggregated Resources were captured in a web archive
• But how to use the URI of the Aggregation or Resource Map to see the state of an Aggregation at a specific moment in the past?
ORE Insight 2 – How to Access Temporal State of an Aggregation
H. Van de Sompel (2007) Compound Information Object Prototype Demonstration
https://www.dropbox.com/s/dd7xd427y90q4jx/CT_Watch_hvds_20070703.mov?dl=0
9. Herbert Van de Sompel @hvdsomp
H. Van de Sompel, M. L. Nelson, R. Sanderson (2013) RFC 7089 - HTTP Framework for Time-Based Access to Resource States – Memento. https://tools.ietf.org/html/rfc7089
Memento
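A minimal sketch of the request side of datetime negotiation as described in RFC 7089: the client asks a TimeGate for the state of a resource around a given moment by sending an Accept-Datetime header. The helper name is hypothetical; only the header name and its GMT date format come from the RFC.

```python
from datetime import datetime, timezone
from email.utils import format_datetime


def timegate_request_headers(moment: datetime) -> dict:
    """Build request headers for datetime negotiation with a TimeGate."""
    # Accept-Datetime carries an RFC 1123 style date expressed in GMT.
    moment_gmt = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(moment_gmt, usegmt=True)}


# Ask for the state of a resource as it was around noon on 20 March 2007.
headers = timegate_request_headers(
    datetime(2007, 3, 20, 12, 0, tzinfo=timezone.utc)
)
print(headers["Accept-Datetime"])
```

These headers would be sent in a GET or HEAD request to the TimeGate URI, which then points the client to a Memento close to the requested datetime.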
10. Herbert Van de Sompel @hvdsomp
Tools of the Web-Centric Interoperability Trade – HTTP Stack
• Resource
• URI
• HTTP as the API
• Representation
• Media Types
• Link
• Content Negotiation, e.g. for preferred Media Type
• Typed Link
• Controlled Vocabularies for Typed Links
W3C Architecture of the World Wide Web
HTTP Links, IANA link relation registry, community link relation types
HATEOAS – Hypermedia As The Engine Of Application State
http://en.wikipedia.org/wiki/HATEOAS
11. Herbert Van de Sompel @hvdsomp
Original Resource and Mementos
12. Herbert Van de Sompel @hvdsomp
Bridge from Present to Past
13. Herbert Van de Sompel @hvdsomp
Bridge from Present to Past
14. Herbert Van de Sompel @hvdsomp
Bridge from Past to Present
15. Herbert Van de Sompel @hvdsomp
timegate Link: Link to Your Own History
Can link to preferred web archive, but also:
• Maintain your own resource version history
• timegate link to your own history
• Distributed management of resource history
• Uniform access to resource history across systems
• Follow links across systems subject to time
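As an illustration of how a client might discover a timegate link, the sketch below extracts link targets by relation type from an HTTP Link header. The parser is deliberately simplified and the example.org URIs are hypothetical; a production client would use a full Link header parser.

```python
import re

# One link-value: "<target>" followed by its parameters up to the next "<".
LINK_RE = re.compile(r'<([^>]+)>([^<]*)')
# The rel parameter, quoted or unquoted.
REL_RE = re.compile(r'\brel\s*=\s*"?([^";,]+)"?')


def links_by_relation(link_header: str) -> dict:
    """Map each relation type in a Link header to its target URIs."""
    found = {}
    for target, params in LINK_RE.findall(link_header):
        match = REL_RE.search(params)
        if match is None:
            continue
        # A rel value may hold several space-separated relation types.
        for relation in match.group(1).split():
            found.setdefault(relation, []).append(target)
    return found


example_header = (
    '<http://example.org/page>; rel="original", '
    '<http://example.org/history/page>; rel="timegate"'
)
print(links_by_relation(example_header)["timegate"])
```

A resource that maintains its own version history would advertise its own TimeGate this way, so clients need not guess which archive to consult.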
16. Herbert Van de Sompel @hvdsomp
No timegate Link – Client Intelligence
Client uses TimeGate of its preferred web archive, but:
• Internet Archive is massive, yet other archives hold substantial unique materials
• Introduce aggregated TimeGate: Memento Aggregator
17. Herbert Van de Sompel @hvdsomp
Routing TimeGate Requests Using Machine Learning
Bornand, N., Balakireva, L., Van de Sompel, H. (2016) Routing Memento Requests Using Binary Classifiers. JCDL16. https://arxiv.org/abs/1606.09136
• Memento Aggregator covers 20+ web archives
• Distributed systems problem: As the number of archives (and incoming requests) grows, sending requests to each archive for every incoming request is not feasible
• Response times
• Load on distributed archives
• After various optimization attempts, devised an approach using binary classifiers per web archive:
• Trained on the basis of cached URIs, using URI features only
• Operational since 2016: 80% reduction in number of queries, 1/3 reduction in response times, recall of 85%
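The routing idea can be sketched as one binary classifier per archive that looks only at features of the requested URI. Everything specific below is an assumption for illustration: the feature set, the archive names, the linear scoring, and the hand-picked weights. The slide does not say which features or models the operational system uses.

```python
from urllib.parse import urlsplit


def uri_features(uri: str) -> dict:
    """Derive features from the URI string alone (no content is fetched)."""
    parts = urlsplit(uri)
    host = parts.hostname or ""
    segments = [s for s in parts.path.split("/") if s]
    return {
        "tld_" + host.rsplit(".", 1)[-1]: 1.0,
        "path_depth": float(len(segments)),
        "has_query": 1.0 if parts.query else 0.0,
    }


# Hypothetical weights; a real deployment would learn these per archive
# from cached lookup results.
ARCHIVE_MODELS = {
    "archive-uk": {"bias": -1.0, "tld_uk": 3.0, "path_depth": -0.1},
    "archive-pt": {"bias": -1.0, "tld_pt": 3.0, "path_depth": -0.1},
    "archive-general": {"bias": 0.5, "has_query": -0.2},
}


def route(uri: str) -> list:
    """Return the archives whose classifier predicts a holding for the URI."""
    features = uri_features(uri)
    selected = []
    for archive, weights in ARCHIVE_MODELS.items():
        score = weights.get("bias", 0.0)
        for name, value in features.items():
            score += weights.get(name, 0.0) * value
        if score > 0:  # positive score means "query this archive"
            selected.append(archive)
    return selected


print(route("http://www.bbc.co.uk/news/world"))
```

Only the selected archives receive the request, which is where the reduction in queries comes from; the price is that an archive wrongly predicted as negative is never asked, which lowers recall.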
18. Herbert Van de Sompel @hvdsomp
[Screenshots: page shown for Today, the Select Date control, and captures from the Internet Archive for Mar 20 2007 and Apr 03 2007]
Various Memento Tools (client/server)
https://github.com/machawk1/awesome-memento
19. Herbert Van de Sompel @hvdsomp
Pockets of Persistence
20. Herbert Van de Sompel @hvdsomp
Creating Pockets of Persistence
• With Memento’s time travel capability in place, what would it take to support faithfully navigating the web of the Past?
• There are two major forces that hinder achieving this goal:
• Link rot: A link stops working altogether
• Content drift: The linked content changes over time and may eventually no longer be representative of the content that was originally linked
• Without these forces at work, the web of the Present would be the same as the web of the Past
• But that clearly is not the case
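To make the two forces concrete, here is a rough sketch that labels a link by comparing what it returns now with what it returned when the link was created. The error-status test, the text similarity measure, and the 0.9 threshold are all assumptions chosen for illustration, not the method used in the studies cited on later slides.

```python
from difflib import SequenceMatcher


def classify_link(status_code: int, text_then: str, text_now: str,
                  drift_threshold: float = 0.9) -> str:
    """Label a link as 'link rot', 'content drift', or 'stable'."""
    # Assumption: any 4xx/5xx response counts as a link that stopped working.
    if status_code >= 400:
        return "link rot"
    # Assumption: low textual similarity means the content has drifted.
    similarity = SequenceMatcher(None, text_then, text_now).ratio()
    if similarity < drift_threshold:
        return "content drift"
    return "stable"


print(classify_link(404, "original text", ""))
print(classify_link(200, "same text", "same text"))
```

Here text_then would come from an archived capture close to the linking date, which is exactly what Memento's time travel makes retrievable.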
21. Herbert Van de Sompel @hvdsomp
Hyperlinks in Theory
22. Herbert Van de Sompel @hvdsomp
Hyperlinks in Reality
23. Herbert Van de Sompel @hvdsomp
Hyperlinks in Reality
24. Herbert Van de Sompel @hvdsomp
Link Rot
25. Herbert Van de Sompel @hvdsomp
Link Rot - PMC
Martin Klein, Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253
26. Herbert Van de Sompel @hvdsomp
Hyperlinks in Reality
27. Herbert Van de Sompel @hvdsomp
Content Drift
28. Herbert Van de Sompel @hvdsomp
Content Drift
29. Herbert Van de Sompel @hvdsomp
Content Drift
http://icecube.wisc.edu/ on May 8 2009 (left) and August 27 2009 (right)
30. Herbert Van de Sompel @hvdsomp
No Content Drift
http://www.ifa.hawaii.edu/~cowie/k_table.html on June 9 1997 (left) and March 2016 (right)
31. Herbert Van de Sompel @hvdsomp
Content Drift - PMC
Shawn Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, et al. (2016) Scholarly context adrift. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0167475
32. Herbert Van de Sompel @hvdsomp
Creating Pockets of Persistence
• What would it take to really support faithfully navigating the web of the Past?
• This challenge exists for the entire web. Some communities with well-managed collections care about addressing it:
• Scholarly communication
• Cultural heritage
• Legal publications
• Journalism
• Wikipedia
• Why?
• Link Rot: Quality of Service
• Content Drift: integrity of the record, reliable evidence, revisiting the state of knowledge, transparency of editorial process, …
33. Herbert Van de Sompel @hvdsomp
US Supreme Court Opinion – Link Rot Activism
http://ssnat.com
34. Herbert Van de Sompel @hvdsomp
Two Types of Links from a Managed Collection
35. Herbert Van de Sompel @hvdsomp
Take 1 – PID Approach
PID for B
36. Herbert Van de Sompel @hvdsomp
Managed Collection => Managed Collection
37. Herbert Van de Sompel @hvdsomp
PID Approach
Combat:
• Link Rot: Link to PID; Redirect to current location
• Content Drift: Mint a PID per version; Link to version PID
With PID links:
• Web of Present = Web of Past
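The two PID tactics can be sketched as a small resolver table: one PID that always redirects to the current location of B, plus one PID per version. The PID syntax, the example URLs, and the use of a 302 status are hypothetical choices for illustration only.

```python
# Hypothetical resolver table: PID -> current location.
PID_TABLE = {
    "pid:B": "https://collection.example/current/B",
    "pid:B.v1": "https://collection.example/versions/B/1",
    "pid:B.v2": "https://collection.example/versions/B/2",
}


def resolve(pid: str):
    """Return an (HTTP status, Location) pair for a PID lookup."""
    location = PID_TABLE.get(pid)
    if location is None:
        return 404, None
    # The link to the PID stays the same; only the redirect target changes.
    return 302, location


def move_resource(pid: str, new_location: str) -> None:
    """Record a new location when the custodian moves the resource."""
    PID_TABLE[pid] = new_location


print(resolve("pid:B"))
move_resource("pid:B", "https://new-host.example/B")
print(resolve("pid:B"))
```

Links that target pid:B survive the move (no link rot), while links that target pid:B.v1 keep pointing at the same version (no content drift).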
38. Herbert Van de Sompel @hvdsomp
39. Herbert Van de Sompel @hvdsomp
URI References - PMC
Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/abs/1602.09102
40. Herbert Van de Sompel @hvdsomp
cite-as Relation Type
Herbert Van de Sompel et al. (2018) cite-as: A Link Relation to Convey a Preferred URI for Referencing. https://datatracker.ietf.org/doc/draft-vandesompel-citeas/
http://signposting.org
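A minimal sketch of how a landing page could advertise its preferred URI for referencing with the cite-as relation type in an HTTP Link header. The function name and the response-header dictionary are hypothetical; the DOI is one cited earlier in this deck and is used purely as an example value.

```python
def cite_as_link_header(preferred_uri: str) -> str:
    """Format a Link header value that conveys the preferred URI for citing."""
    return f'<{preferred_uri}>; rel="cite-as"'


# Headers a landing page might return alongside its HTML representation.
response_headers = {
    "Content-Type": "text/html",
    "Link": cite_as_link_header("https://doi.org/10.1371/journal.pone.0115253"),
}
print(response_headers["Link"])
```

A reference manager or browser extension that sees this header can record the persistent URI instead of the landing page URL shown in the address bar.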
41. Herbert Van de Sompel @hvdsomp
PID Approach – Division of Labor
42. Herbert Van de Sompel @hvdsomp
Managed Collection => Web at Large
43. Herbert Van de Sompel @hvdsomp
44. Herbert Van de Sompel @hvdsomp
PID Approach
45. Herbert Van de Sompel @hvdsomp
Take 2 – Robust Links Approach
46. Herbert Van de Sompel @hvdsomp
Managed Collection => Web at Large
47. Herbert Van de Sompel @hvdsomp
Snapshot Approach
Combat:
• Link Rot & Content Drift: Custodian of A creates snapshot of B, in web archive or locally
Regarding links:
• Intuition suggests linking to the snapshot of B …
48. Herbert Van de Sompel @hvdsomp
Linking to Snapshot of B = Potentially Creating a Rotten Link
• Existing practice for linking to snapshots:
<a href="URL of snapshot of B">
• Problems with existing practice:
o Impossible to visit the original URI, if desired
o Requires the permanent existence/uptime of the archive that holds the snapshot
- One link rot problem replaced by another
http://robustlinks.mementoweb.org/about/
49. Herbert Van de Sompel @hvdsomp
Permanent Existence/Uptime of Archives?
Remnant of discontinued web archive http://mummify.it captured on February 14 2014
https://web.archive.org/web/20140214233752/https://www.mummify.it/
50. Herbert Van de Sompel @hvdsomp
Permanent Existence/Uptime of Archives?
http://www.themoscowtimes.com/news/article/russia-bans-wayback-machine-internet-archive-over-islamic-state-video/510074.html
51. Herbert Van de Sompel @hvdsomp
Permanent Existence/Uptime of Archives?
http://web.archive.org/web/20121101043952/http://vogin.nl on March 6 2017 at 15:59 CET
52. Herbert Van de Sompel @hvdsomp
Decorate the Link
• Proposed practice for linking to captures:
<a href="URL of snapshot of B"
data-originalurl="B"
data-versiondate="datetime of snapshot of B">
or
<a href="B"
data-versionurl="URL of snapshot of B"
data-versiondate="datetime of snapshot of B">
http://robustlinks.mementoweb.org/spec/
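The decoration pattern above could be generated along these lines. This is a sketch: the function name is hypothetical, the date format is an assumption, and the URLs are the vogin.nl example that appears on an earlier slide.

```python
from html import escape


def robust_link(original_url: str, snapshot_url: str, version_date: str,
                text: str, link_to_snapshot: bool = False) -> str:
    """Render an <a> element carrying both the original and snapshot URLs."""
    if link_to_snapshot:
        # href targets the snapshot; the original URL rides along.
        href, extra_name, extra_value = snapshot_url, "data-originalurl", original_url
    else:
        # href targets the original; the snapshot URL rides along.
        href, extra_name, extra_value = original_url, "data-versionurl", snapshot_url
    return (
        f'<a href="{escape(href, quote=True)}" '
        f'{extra_name}="{escape(extra_value, quote=True)}" '
        f'data-versiondate="{escape(version_date, quote=True)}">'
        f'{escape(text)}</a>'
    )


print(robust_link(
    "http://vogin.nl",
    "http://web.archive.org/web/20121101043952/http://vogin.nl",
    "2012-11-01",  # assumed date format for illustration
    "VOGIN",
))
```

Either form keeps all three pieces of information in the page, so the link remains usable even if the original or the archive holding the snapshot disappears.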
53. Herbert Van de Sompel @hvdsomp
Robust Links: Link Decoration in Action
Van de Sompel, H. & Nelson, M.L. (2015) Reminiscing about 15 years of interoperability efforts. In: D-Lib Magazine. https://doi.org/10.1045/november2015-vandesompel
JavaScript makes the link decorations actionable
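The slide refers to JavaScript running in the page; purely to illustrate the logic, the sketch below uses Python to find decorated links in HTML and work out which targets a reader could be offered. The class and function names are hypothetical, and this is not the script shown on the slide.

```python
from html.parser import HTMLParser


class RobustLinkParser(HTMLParser):
    """Collect the attributes of <a> elements that carry link decorations."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "a" and "data-versiondate" in attributes:
            self.links.append(attributes)


def link_targets(attributes: dict) -> dict:
    """Work out the original URL, snapshot URL and date for one link."""
    href = attributes["href"]
    if "data-originalurl" in attributes:
        # The href points at the snapshot.
        original, snapshot = attributes["data-originalurl"], href
    else:
        # The href points at the original resource.
        original, snapshot = href, attributes.get("data-versionurl")
    return {
        "original": original,
        "snapshot": snapshot,
        "version_date": attributes["data-versiondate"],
    }


parser = RobustLinkParser()
parser.feed(
    '<a href="http://vogin.nl" '
    'data-versionurl="http://web.archive.org/web/20121101043952/http://vogin.nl" '
    'data-versiondate="2012-11-01">VOGIN</a>'
    '<a href="http://example.org/">plain link</a>'
)
print(link_targets(parser.links[0]))
```

With these three values a script can offer the current page, the specific snapshot, or, if both fail, a lookup of the original URL around the version date in other archives.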
54. Herbert Van de Sompel @hvdsomp
Robust Links: Refuse to Die
55. Herbert Van de Sompel @hvdsomp
56. Herbert Van de Sompel @hvdsomp
Snapshot Approach – Division of Labor
57. Herbert Van de Sompel @hvdsomp
Managed Collection => Managed Collection
58. Herbert Van de Sompel @hvdsomp
Cool URI Approach
Combat:
• Link Rot: Link to B; Redirect to current location
• Content Drift: Generic URI; Version URIs
With Cool URI links:
• Tension between linking to generic URI and version URI
59. Herbert Van de Sompel @hvdsomp
Robust Links: Refuse to Die
60. Herbert Van de Sompel @hvdsomp
61. Herbert Van de Sompel @hvdsomp
Cool URI Approach – Division of Labor
62. Herbert Van de Sompel @hvdsomp
Robust Links Approach
63. Herbert Van de Sompel @hvdsomp
Summary
[Table: labor under the PID approach and the Robust Links (RL) approach; cell contents not preserved]
64. Herbert Van de Sompel @hvdsomp
Robust Links for Linked Data?
Sanderson, R., Ciccarese, P., and Young, B. (2017) Web Annotation Vocabulary. W3C Recommendation 23 February 2017. https://www.w3.org/TR/annotation-vocab/
65. Herbert Van de Sompel @hvdsomp
Handling Resource Versions, Captures
[Diagram: resource B, with B at t1 and B at t2]
66. Herbert Van de Sompel @hvdsomp
Systems with Resource Versions
67. Herbert Van de Sompel @hvdsomp
DBpedia Snapshot Archive Using HDT, TPF, Memento
Vander Sande, M., Verborgh, R., Hochstenbach, P., and Van de Sompel, H. (2017) Towards sustainable publishing and querying of distributed Linked Data archives.
Temporal: subject URI access; ?s ?p ?o queries; SPARQL queries
68. Herbert Van de Sompel @hvdsomp
Memento Tracer
http://tracer.mementoweb.org
69. Herbert Van de Sompel @hvdsomp
Resource Capture: Tension Between Scale and Quality
• Web crawling: optimized for scale
• Problems with capturing resources accessible via interactive affordances
• webrecorder.io: optimized for quality
• Personal archiving
• User records web navigation session
• Not used for archiving at scale
• LOCKSS: optimized for scholarly journals
• Pages in Publisher/Journal portals share layout, affordances
• Heuristics per publisher/journal to improve capture quality
70. Herbert Van de Sompel @hvdsomp
Memento Tracer: New Sweet Spot Between Scale and Quality
• ~ web crawling: server-side process to capture resources
• ~ LOCKSS: leverages insight that web publications in any given portal are based on the same template:
• share layout
• share interactive affordances
• ~ webrecorder.io: human guidance to achieve quality
• But, with Memento Tracer:
• user does not record a specific web publication
• user records heuristics that apply to a class of web publications
71. Herbert Van de Sompel @hvdsomp
Memento Tracer
72. Herbert Van de Sompel @hvdsomp
A Trace for slideshare Presentations
{ "portal_url_match": "(slideshare.net)/([^/]+)/([^/]+)",
  "actions": [
    { "action_order": "1",
      "value": "div.j-next-btn.arrow-right",
      "type": "CSSSelector",
      "action": "repeated_click",
      "repeat_until": {
        "condition": "changes",
        "type": "resource_url"
      }
    },
    { "action_order": "2",
      "value": "div.notranslate.transcript.add-padding-right.j-transcript a",
      "type": "CSSSelector",
      "action": "click"
    }
  ], …
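A rough sketch of how a capture process might decide whether this Trace applies to a URL and in which order to run its actions. The dictionary repeats only the fields visible on the slide (the Trace itself is truncated there), the matching semantics via re.search are an assumption, and the SlideShare URL used below is hypothetical.

```python
import re

# Only the fields shown on the slide; the full Trace has more content.
trace = {
    "portal_url_match": "(slideshare.net)/([^/]+)/([^/]+)",
    "actions": [
        {"action_order": "2",
         "value": "div.notranslate.transcript.add-padding-right.j-transcript a",
         "type": "CSSSelector",
         "action": "click"},
        {"action_order": "1",
         "value": "div.j-next-btn.arrow-right",
         "type": "CSSSelector",
         "action": "repeated_click",
         "repeat_until": {"condition": "changes", "type": "resource_url"}},
    ],
}


def applies_to(trace: dict, url: str) -> bool:
    """Check whether the Trace's URL pattern matches a web publication."""
    return re.search(trace["portal_url_match"], url) is not None


def ordered_actions(trace: dict) -> list:
    """Return the actions sorted by their numeric action_order."""
    return sorted(trace["actions"], key=lambda a: int(a["action_order"]))


url = "https://www.slideshare.net/hvdsomp/some-presentation"  # hypothetical
if applies_to(trace, url):
    for action in ordered_actions(trace):
        print(action["action_order"], action["action"], action["value"])
```

Because the pattern matches a class of URLs rather than one page, a single recorded Trace can drive the capture of every presentation in the portal that shares the same layout.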
73. Herbert Van de Sompel @hvdsomp
Memento Tracer: Experimental
• Promising results, thus far
• Currently investigating challenges, including:
• User interface to support recording Traces for complex sequences of interactions.
• Limitations of the browser event listener approach for recording Traces.
• Language used to express Traces.
• Organization of the shared repository for Traces.
• Selection of a Trace for capturing a web publication in cases where different page layouts and interactive affordances are available for web publications that share a URI pattern.
74. Herbert Van de Sompel @hvdsomp
Demo: Recording a Trace for a Web Publication
https://github.com/www.gorillatoolkit/pkg/mux
75. Herbert Van de Sompel @hvdsomp
Demo: Capturing another Web Publication Using the Trace
https://github.com/mementoweb/node-solid-server
76. Herbert Van de Sompel @hvdsomp
Demo: Capturing another Web Publication Using the Trace
https://github.com/mementoweb/node-solid-server
77. Herbert Van de Sompel @hvdsomp
Demo: Playing Back the Captured Web Publication
Capture of https://github.com/mementoweb/node-solid-server
78. Herbert Van de Sompel @hvdsomp
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
Perseverance on Persistence
a future-note about the past