Presentation given at EuropeanaTech 2018 in Rotterdam, The Netherlands. Provides a summary of insights gained from working for about a decade on challenges related to temporal aspects of the web and to persistence.
International Image Interoperability Framework (IIIF): Sharing high-resolution images - LIBIS
On Monday, April 23rd, 2018, Roxanne Wyns (LIBIS - KU Leuven Libraries) gave a lecture at the University of Antwerp for Digital Humanities students and researchers. IIIF, or the International Image Interoperability Framework, is a community-developed framework for sharing high-resolution images in an efficient and standardized way across institutional boundaries. Using an IIIF manifest URL, a researcher can pull image-based resources and related contextual information, such as the structure of a complex object or document, metadata, and rights information, into any IIIF-compliant viewer such as the Mirador viewer. Simply put, a researcher can access high-resolution images from the British Library and from the KU Leuven Libraries in a single viewer for research. The lecture introduces IIIF and its concepts, highlights projects and viewers, and gives an in-depth view of its current and future application options for DH research.
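As a sketch of what "pulling image-based resources via a manifest URL" involves in practice, the following walks the structure of a IIIF Presentation API 2.x manifest (sequences, canvases, images) and collects the image URLs a viewer would load. The manifest shown is a hypothetical minimal example, not a real British Library or KU Leuven document.

```python
def canvas_image_urls(manifest):
    """Collect image resource URLs from a IIIF Presentation 2.x
    manifest dict: sequences -> canvases -> images -> resource."""
    urls = []
    for sequence in manifest.get("sequences", []):
        for canvas in sequence.get("canvases", []):
            for image in canvas.get("images", []):
                resource = image.get("resource", {})
                if "@id" in resource:
                    urls.append(resource["@id"])
    return urls

# Hypothetical minimal manifest for illustration.
manifest = {
    "@id": "https://example.org/iiif/book1/manifest",
    "label": "Example manuscript",
    "sequences": [{
        "canvases": [{
            "label": "f. 1r",
            "images": [{"resource": {
                "@id": "https://example.org/iiif/book1/f1r/full/full/0/default.jpg"
            }}]
        }]
    }]
}

urls = canvas_image_urls(manifest)
```

A IIIF-compliant viewer performs essentially this traversal, then requests each image (or tiles of it) from the provider's IIIF Image API endpoint.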
Presentation during the 2016 American Library Association (ALA) Annual Conference in Orlando (Florida), given at the ALCTS Program "Linked Data - Globally Connecting Libraries, Archives, and Museums", Sponsor: ALCTS International Relations Committee, Co-Sponsor: Linked Library Data Interest Group
A Framework for Aggregating Private and Public Web Archives (JCDL 2018)
Mat Kelly, Michael L. Nelson, and Michele C. Weigle
Old Dominion University
Web Science & Digital Libraries Research Group {mkelly, mln, mweigle}@cs.odu.edu @machawk1 • @WebSciDL
#jcdl2018
Early Chinese Periodicals Online (ECPO): From Digitization Towards Open Data - Matthias Arnold
This paper presents the project “Early Chinese Periodicals Online (ECPO)”. It introduces the database and discusses two major directions of current development: 1) the installation of a cross-database agents service to identify names, assign names to persons, and relate persons to authorities (GND, VIAF, Wikidata); 2) the conceptualization of a TEI module to expand the database with full-text functionality, thereby touching on issues such as semi-automatic page segmentation, the involvement of non-Chinese-speaking communities in crowdsourcing, and the selection of relevant TEI markup to encode Republican-era publications.
Using Knowledge Graphs in Data Science: From Symbolic to Latent Representations - Heiko Paulheim
Knowledge Graphs are often used as a symbolic representation mechanism for representing knowledge in data-intensive applications, both for integrating corporate knowledge and for providing general, cross-domain knowledge in public knowledge graphs such as Wikidata. As such, they have been identified as a useful way of injecting background knowledge into data analysis processes. To fully harness the potential of knowledge graphs, latent representations of entities in the graphs, so-called knowledge graph embeddings, show superior performance, but sacrifice one central advantage of knowledge graphs: the explicit symbolic knowledge representation. In this talk, I will shed some light on the usage of knowledge graphs and embeddings in data analysis, and give an outlook on research directions which aim at combining the best of both worlds.
Using knowledge graphs in data mining typically requires a propositional, i.e., vector-shaped representation of entities. RDF2vec is an example for generating such vectors from knowledge graphs, relying on random walks for extracting pseudo-sentences from a graph, and utilizing word2vec for creating embedding vectors from those pseudo-sentences. In this talk, I will give insights into the idea of RDF2vec, possible application areas, and recently developed variants incorporating different walk strategies and training variations.
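The walk-extraction step described above can be sketched as follows; the graph and entity names below are a toy example, and the subsequent word2vec training over the pseudo-sentences is omitted.

```python
import random

def random_walks(graph, entity, n_walks=4, depth=2, seed=0):
    """Extract RDF2vec-style pseudo-sentences from a knowledge graph
    given as an adjacency list {subject: [(predicate, object), ...]}.
    Each walk alternates entity and predicate tokens; the resulting
    token lists would then be fed to word2vec to learn embeddings."""
    rng = random.Random(seed)
    walks = []
    for _ in range(n_walks):
        walk, node = [entity], entity
        for _ in range(depth):
            edges = graph.get(node)
            if not edges:
                break  # dead end: stop this walk early
            predicate, obj = rng.choice(edges)
            walk += [predicate, obj]
            node = obj
        walks.append(walk)
    return walks

# Toy graph with DBpedia-flavored names, for illustration only.
kg = {
    "dbr:Mannheim": [("dbo:country", "dbr:Germany"),
                     ("dbo:federalState", "dbr:Baden-Wuerttemberg")],
    "dbr:Germany": [("dbo:capital", "dbr:Berlin")],
}
sentences = random_walks(kg, "dbr:Mannheim")
```

Each pseudo-sentence starts at the focus entity; different walk strategies (the variants mentioned above) change how the next edge is chosen.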
This presentation shows approaches for knowledge graph construction from Wikipedia and other Wikis that go beyond the "one entity per page" paradigm. We see CaLiGraph, which extracts entities from categories and listings, as well as DBkWik, which extracts and integrates information from thousands of Wikis.
Machine Learning with and for Semantic Web Knowledge Graphs - Heiko Paulheim
Large-scale cross-domain knowledge graphs, such as DBpedia or Wikidata, are some of the most popular and widely used datasets of the Semantic Web. In this paper, we introduce some of the most popular knowledge graphs on the Semantic Web. We discuss how machine learning is used to improve those knowledge graphs, and how they can be exploited as background knowledge in popular machine learning tasks, such as recommender systems.
How are Knowledge Graphs created?
What is inside public Knowledge Graphs?
Addressing typical problems in Knowledge Graphs (errors, incompleteness)
New Knowledge Graphs: WebIsALOD, DBkWik
Registration / Certification Interoperability Architecture (Overlay Peer Review) - Herbert Van de Sompel
Presentation for the COAR meeting on Overlay Peer Review held at INRIA, Paris, France. It provides overall context regarding a scholarly communication system in which the core functions of scholarly communication (registration, certification, awareness, archiving) are implemented in a decoupled manner, whereby each function can simultaneously be fulfilled by different parties, potentially in different ways. It shows how notifications can be used to achieve loosely coupled, point-to-point interoperability in such an environment, zooming in on interoperability between registration and certification, i.e., between repositories and overlay peer-review services.
Knowledge Matters! The Role of Knowledge Graphs in Modern AI Systems - Heiko Paulheim
AI is not just about machine learning; it also requires knowledge about the world. In this talk, I give an introduction to knowledge graphs, how they are built at scale, and how they are used in modern AI systems.
The original Semantic Web vision foresees describing entities in a way whose meaning can be interpreted by both machines and humans. Following that idea, large-scale knowledge graphs capturing a significant portion of knowledge have been developed. In the recent past, vector space embeddings of Semantic Web knowledge graphs - i.e., projections of a knowledge graph into a lower-dimensional, numerical feature space (a.k.a. latent feature space) - have been shown to yield superior performance in many tasks, including relation prediction, recommender systems, and the enrichment of predictive data mining tasks. At the same time, those projections describe an entity as a numerical vector, without any semantics attached to the dimensions. Thus, embeddings are as far from the original Semantic Web vision as can be. As a consequence, the results achieved with embeddings - as impressive as they are in terms of quantitative performance - are most often not interpretable, and it is hard to obtain a justification for a prediction, e.g., an explanation of why an item has been suggested by a recommender system. In this paper, we make a claim for semantic embeddings and discuss possible ideas towards their construction.
LOCAH Project and Considerations of Linked Data Approaches - Adrian Stevenson
Presentation given at JISC 'Managing Research Data International Workshop', Birmingham, UK. 29th March 2011
http://www.jisc.ac.uk/whatwedo/programmes/mrd/rdmevents/mrdinternationalworkshop.aspx
Mining the Web of Linked Data with RapidMiner - Heiko Paulheim
Lots of data from different domains is published as Linked Open Data. While there are quite a few browsers for that data, as well as intelligent tools for particular purposes, a versatile tool for deriving additional knowledge by mining the Web of Linked Data is still missing. In this challenge entry, we introduce the RapidMiner Linked Open Data extension. The extension hooks into the powerful data mining platform RapidMiner and offers operators for accessing Linked Open Data in RapidMiner, allowing it to be used in sophisticated data analysis workflows without the need to know SPARQL or RDF. As an example, we show how statistical data on scientific publications, published as an RDF data cube, can be linked to further datasets and analyzed using additional background knowledge from various LOD datasets.
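The core move the extension automates - attaching background knowledge from LOD datasets to the entities in a statistical table so that the result is a flat, mineable table - can be sketched as below. The data and feature names are toy placeholders, not the extension's actual API.

```python
def propositionalize(observations, background):
    """Join statistical observations (entity URI -> observed value)
    with background features looked up per URI in a knowledge source
    (URI -> feature dict), yielding flat rows usable by any standard
    data mining tool."""
    rows = []
    for uri, value in observations.items():
        row = {"uri": uri, "value": value}
        # Entities missing from the background source keep only
        # their observed value.
        row.update(background.get(uri, {}))
        rows.append(row)
    return rows

# Toy data cube observations and toy background knowledge.
rows = propositionalize(
    {"dbr:Germany": 120.5},
    {"dbr:Germany": {"population": 83000000, "gdp_rank": 4}})
```

In the real extension, the background dict would be filled by dereferencing the entity URIs on the Web of Linked Data; here it is supplied inline to keep the sketch self-contained.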
Keynote talk presented at Web Archiving and Digital Libraries (WADL) 2018
June 6, 2018 - Fort Worth, TX
Michele C. Weigle (@weiglemc)
Web Science and Digital Libraries (WS-DL) Research Group (@WebSciDL)
Old Dominion University
Norfolk, VA
This presentation looks back at several efforts, conducted in the past fifteen years, aimed at establishing interoperability for web-based scholarly communication. It tries to characterize the perspectives/approaches taken by these efforts and, based upon that, proposes a HATEOAS-based approach to interlink scholarly nodes on the web. This was first presented at the Research Data Alliance meeting in Paris, France, on September 22, 2015.
Researcher Pod: Scholarly Communication Using the Decentralized Web - Herbert Van de Sompel
The presentation provides an overview of the motivation and direction of the Mellon-funded Researcher Pod project that investigates technical aspects of scholarly communication in a decentralized web setting.
This slide deck provides an overview of proposals to use HTTP Links as a means to address some long standing problems related to scholarly resources on the web.
These slides go with the paper "Reminiscing About 15 Years of Interoperability Efforts" which is available at http://dx.doi.org/10.1045/november2015-vandesompel
Slides were used for a presentation at the Fall 2015 Membership Meeting of the Coalition for Networked Information.
Automated interpretability of linked data ontologies: an evaluation within the... - Nuno Freire
Publication and usage of linked data has been highly pursued by cultural heritage institutions and service providers in this domain. Much research and cooperation is taking place in adapting and improving cultural heritage data models for linked data, in defining ontologies and vocabularies, and in setting up services based on linked data. This article presents an evaluation of ontologies and vocabularies published as linked data that originate from the cultural heritage domain or are frequently used and linked to in this domain. Our study aims to evaluate their usability by crawlers operating on the web of data, according to specifications and practices of linked data, the Semantic Web, and ontology reasoning. Our evaluation is guided by the use case of general data-consumption applications based on RDF, RDF Schema, OWL, SKOS, and linked data guidelines. We evaluated twelve ontologies and vocabularies and identified that four were not fully compliant and that alignments between ontologies are not included in the ontologies' definitions. This study contributes to research on novel services consuming linked data. It also allows a better assessment of the automation that can be achieved in handling the variety and large volume of linked data, when assessing the viability of new linked-data-based services in cultural heritage.
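One kind of compliance check that a crawler can automate, in the spirit of the evaluation above, is flagging terms that a vocabulary uses but never declares. A minimal sketch, with prefixed names and toy triples invented for illustration:

```python
RDF_TYPE = "rdf:type"
# Types whose instances count as declarations of classes/properties.
DECLARING = {"rdfs:Class", "owl:Class", "rdf:Property",
             "owl:ObjectProperty", "owl:DatatypeProperty"}

def undeclared_terms(triples):
    """triples: iterable of (subject, predicate, object) tuples using
    prefixed names. Returns properties used in the data, and classes
    used as rdf:type objects, that the graph never declares."""
    declared = {s for s, p, o in triples
                if p == RDF_TYPE and o in DECLARING}
    used = set()
    for s, p, o in triples:
        if p != RDF_TYPE:
            used.add(p)       # property in use
        else:
            used.add(o)       # class in use
    return used - declared - DECLARING

# Toy graph: ex:Book is declared, ex:title is used but undeclared.
triples = [
    ("ex:Book", "rdf:type", "owl:Class"),
    ("ex:Book1", "rdf:type", "ex:Book"),
    ("ex:Book1", "ex:title", "A title"),
]
missing = undeclared_terms(triples)
```

A real crawler would of course operate on dereferenced RDF rather than inline tuples, and would apply further checks (e.g., for alignments between ontologies).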
A 4 hour hands on linked data workshop held at ELAG 2013 - http://elag2013.org/ws2-very-gentle-linked-data/. Resources at http://data.archiveshub.ac.uk/workshops/elag2013/
Linked Statistical Data: does it actually pay off? - Oscar Corcho
Invited keynote at the ISWC2015 Workshop on Semantics and Statistics (SemStats 2015). http://semstats.github.io/2015/
The release of the W3C RDF Data Cube recommendation was a significant milestone towards improving the maturity of the area of Linked Statistical Data. Many Data Cube-based datasets have been released since then, and tools for the generation and exploitation of such datasets have also appeared. While the benefits of using RDF Data Cube and generating Linked Data in this area seem clear, there are still many challenges associated with the generation and exploitation of such data. In this talk we will reflect on them, based on our experience generating and exploiting this type of data, and hopefully provoke some discussion about what the next steps should be.
How can design help us communicate data easily to users? Where does this stem from? What methods of design are easy for users to engage with? What should we be trying to achieve with these designs?
The cultural sector is a big adopter of open data and semantic web technologies; institutions have embraced the ideas and are weaving them into everything they do. So, who is doing what? What data sets are available? And how have these been presented to the public?
Using case studies from the cultural sector, we will explore the practical challenges associated with complex UI designs. Looking at work-in-progress through to finished products we will discuss best practice, finding innovation, and the challenges of working with data sets.
Evaluation of Schema.org for Aggregation of Cultural Heritage Metadata - Nuno Freire
In the World Wide Web, a very large number of resources are made available through digital libraries. The existence of many individual digital libraries, maintained by different organizations, brings challenges to the discoverability, sharing, and reuse of the resources. A widely-used approach is metadata aggregation, where centralized efforts like Europeana facilitate the discoverability and use of the resources by collecting their associated metadata. The cultural heritage domain embraced the aggregation approach while, at the same time, the technological landscape kept evolving. Nowadays, cultural heritage institutions are increasingly applying technologies designed for wider interoperability on the Web. In this context, we have identified the Schema.org vocabulary as a potential technology for innovating metadata aggregation. We conducted two case studies that analysed Schema.org metadata from collections from cultural heritage institutions. We used the requirements of the Europeana Network as evaluation criteria. These include the recommendations of the Europeana Data Model, which is a collaborative effort from all the domains represented in Europeana: libraries, museums, archives, and galleries. We concluded that Schema.org poses no obstacle that cannot be overcome to allow data providers to deliver metadata in full compliance with Europeana requirements and with the desired semantic quality. However, Schema.org's cross-domain applicability raises the need to accompany its adoption with recommendations and/or specifications regarding how data providers should create their Schema.org metadata, so that they can meet the specific requirements of Europeana or other cultural aggregation networks.
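On the harvesting side, an aggregator's first step is pulling Schema.org metadata out of provider pages, commonly embedded as JSON-LD. A minimal sketch using only the standard library; the sample page is hypothetical.

```python
import json
from html.parser import HTMLParser

class JsonLdExtractor(HTMLParser):
    """Collect the JSON-LD blocks embedded in an HTML page via
    <script type="application/ld+json"> elements."""
    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self.items = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and ("type", "application/ld+json") in attrs:
            self._in_jsonld = True

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld and data.strip():
            self.items.append(json.loads(data))

# Hypothetical provider page for illustration.
page = ('<html><head><script type="application/ld+json">'
        '{"@type": "Book", "name": "An example record"}'
        '</script></head></html>')
extractor = JsonLdExtractor()
extractor.feed(page)
```

Evaluating the harvested items against Europeana Data Model requirements would then operate on `extractor.items`.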
Presentation about reference rot given at the Complexity Science Hub in Vienna, November 2021.
Links to web resources frequently break (link rot), and linked content can change at unpredictable rates (content drift). These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information.
This presentation will report on research that assessed the extent of these problems for links to web resources in scholarly literature, by using three vast corpora of publications and a range of public web archives. It will also describe the Robust Link approach that offers a proactive, uniform, and machine-actionable way to combat link rot and content drift. Finally, it will introduce the Robustify web service and API that was devised to generate links that remain functional over time, paying special attention to challenges related to deploying infrastructure that is required to be long lasting.
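The Robust Link approach decorates an ordinary HTML anchor with machine-actionable attributes recording the originally referenced URI and the snapshot datetime. A sketch of producing one (the URLs are placeholders; the specification also allows the inverse arrangement, with the original URI in `href` and the memento in a `data-versionurl` attribute):

```python
def robust_link(href, original_url, version_date, text):
    """Build an HTML anchor carrying Robust Links data- attributes:
    data-originalurl preserves the originally referenced URI and
    data-versiondate the datetime the snapshot was taken, so the
    link stays actionable even if the live page rots or drifts."""
    return (f'<a href="{href}" '
            f'data-originalurl="{original_url}" '
            f'data-versiondate="{version_date}">{text}</a>')

link = robust_link(
    "https://web.archive.org/web/20211101000000/https://example.org/page",
    "https://example.org/page",
    "2021-11-01",
    "an example page")
```

A service like the Robustify API mentioned above automates exactly this decoration, including creating the archival snapshot.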
Presentation for a workshop about persistent identifiers organized by the Royal Library of The Netherlands and DANS. Highlights the non-trivial commitments required of all parties involved in persistent identifier systems to actually keep links based on persistent identifiers ... err ... persistent.
Various FAIR criteria pertaining to machine interaction with scholarly artifacts can commonly be addressed by means of repository-wide affordances that are uniformly provided for all hosted artifacts rather than through artifact-specific interventions. If various repository platforms provide such affordances in an interoperable manner, devising tools - for both human and machine use - that leverage them becomes easier.
My involvement, over the years, in a range of interoperability efforts has brought the insight that two factors strongly influence adoption: addressing a burning issue and delivering a KISS solution to tackle it. Undoubtedly, FAIR and FAIR DOs are burning issues. FAIR Signposting <https://signposting.org/FAIR/> is an ad-hoc repository interoperability effort that squarely fits in this problem space and that purposely specifies a KISS solution, hoping to inspire wide adoption.
Slides used for a keynote presentation at the VIVO 2019 Conference in Podgorica, Montenegro.
Abstract: The invitation to present a keynote at the VIVO Conference and the goal of the VIVO platform, as stated on the DuraSpace site, to create an integrated record of the scholarly work of an organisation reminded me of various efforts that I have been involved in over the past years that had similar goals. EgoSystem (2014) attempted to gather information about postdocs that had left the organisation, leaving little or no contact details behind. Autoload (2017), an operational service, discovers papers by organisational researchers in order to upload them in the institutional repository. myresearch.institute (2018), an experiment that is still in progress, discovers artefacts that researchers deposit in web productivity portals and subsequently archives them. More recently, I have been involved in thinking about the future of NARCIS, a portal that provides an overview of research productivity in The Netherlands. The approach taken in all these efforts share a characteristic motivated by a desire to devise scalable and sustainable solutions: let machines rather than humans do the work. In this talk, I will provide an overview of these efforts, their motivations, the challenges involved, and the nature of success (if any).
Presentation for PIDapalooza 2019, Dublin, Ireland.
The Scholarly Orphans project, funded by the Andrew W. Mellon Foundation, explores technical approaches aimed at capturing and archiving scholarly artifacts that researchers deposit in web productivity portals as a means to collaborate and communicate with their peers. These artifacts are not collected by other frameworks aimed at archiving the scholarly record (e.g., LOCKSS, Portico, Institutional Repositories) and are only incidentally captured by web archives. The project explores an institution-driven approach inspired by web archiving. To demonstrate the ongoing thinking, the project has devised an experimental automated pipeline that continuously discovers, captures, and archives artifacts. These are created by actual researchers who, for the purpose of the experiment, were virtually enlisted in a fictive research institution. A portal at myresearch.institute provides an overview of the artifacts that were discovered and provides access to archived versions stored in both an institutional and a cross-institutional archive. The set-up leverages a range of technologies that share a flavor of persistence: Memento, Memento Tracer, Robust Links, Signposting.
As a memento of my last week of working at LANL, I put together a slide deck that provides an overview of major efforts conducted during the time I was there.
"Scholarly Communication: Deconstruct and Decentralize" was presented at the Fall 2017 Meeting of the Coalition for Networked Information. It explores working towards a Scholarly Commons by applying decentralized web ideas to scholarly communication.
Looks at hyperlinks from the perspective of a managed collection of resources for which link persistence/integrity is considered a quality of service concern. Distinguishes between links into other managed collections and to the web at large. Considers link rot and content drift.
Presentation for PIDapalooza 2016. PIDs need to be used to achieve their intended persistence. Our research (reported at WWW2016, see http://arxiv.org/1602.09102) found that a disturbing percentage of references to papers that have DOIs actually use the landing page HTTP URI instead of the DOI HTTP URI. The problem is likely related to tools used for collecting references such as bookmarks and reference managers. These select the landing page URI instead of the DOI URI because the former is what's available in the address bar. It can safely be assumed that the same problem exists for other types of PIDs. The net result is that the true potential of PIDs is not realized. In order to ameliorate this problem we propose a Signposting pattern for PIDs (http://signposting.org/identifier/). It consists of adding a Link header to HTTP HEAD/GET responses for all resources identified by a DOI, including the landing page and content resources such as "the PDF" and "the dataset". The Link header contains a link, which points with the "identifier" relation type to the DOI HTTP URI. When such a link is available, tools can automatically discover and use the DOI URI instead of the other URIs (landing page, PDF, dataset) associated with the DOI-identified object.
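On the consumer side, the pattern amounts to parsing the Link header of a response and picking out the target of the "identifier" relation. A sketch; the header string is a made-up example, and the parser assumes the common '<uri>; rel="..."' serialization rather than handling every corner of the header grammar.

```python
import re

def identifier_target(link_header):
    """Return the target URI of the link with rel="identifier" in an
    HTTP Link header value, or None if no such link is present."""
    for match in re.finditer(r'<([^>]+)>\s*;\s*rel="([^"]*)"', link_header):
        # rel is a space-separated list of relation types.
        if "identifier" in match.group(2).split():
            return match.group(1)
    return None

# Made-up Link header as a landing page or PDF might serve it.
header = ('<https://example.org/articles/123.pdf>; rel="item", '
          '<https://doi.org/10.9999/example>; rel="identifier"')
doi_uri = identifier_target(header)
```

A reference manager that runs this check on the page in the address bar can record the DOI HTTP URI instead of the landing page URI.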
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT - Herbert Van de Sompel
DBpedia is the Linked Data version of Wikipedia. Starting in 2007, several DBpedia dumps have been made available for download. In 2010, the Research Library at the Los Alamos National Laboratory used these dumps to deploy a Memento-compliant DBpedia Archive, in order to demonstrate the applicability and appeal of accessing temporal versions of Linked Data sets using the Memento “Time Travel for the Web” protocol. The archive supported datetime negotiation to access various temporal versions of RDF descriptions of DBpedia subject URIs.
In a recent collaboration with the iMinds Group of Ghent University, the DBpedia Archive received a major overhaul. The initial MongoDB storage approach, which was unable to handle increasingly large DBpedia dumps, was replaced by HDT, the Binary RDF Representation for Publication and Exchange. And, in addition to the existing subject URI access point, Triple Pattern Fragments access, as proposed by the Linked Data Fragments project, was added. This allows datetime negotiation for URIs that identify RDF triples that match subject/predicate/object patterns. To add this powerful capability, native Memento support was added to the Linked Data Fragments Server of Ghent University.
In this talk, we will include a brief refresher of Memento, and will cover Linked Data Fragments, Triple Pattern Fragments, and HDT in more detail. We will share lessons learned from this effort and demo the new DBpedia Archive, which, at this point, holds over 5 billion RDF triples.
Extended version of slides presented at the "404/File Not Found" symposium held at Georgetown University on October 24 2014, see http://www.law.georgetown.edu/library/404/ . The presentation provides a brief overview of the link/reference rot problem and then discusses three complimentary strategies to combat it: Pro-actively capturing web resources that are linked from a seed collection; Referencing the captures by means of annotated links; Accessing the captures using Memento infrastructure.
This presentation introduces ResourceSync, a specification aimed to enable web-based synchronization of resources. The specification is the result of a collaboration between NISO and the Open Archives Initiative funded by the Sloan Foundation and JISC. The proposed resource synchronization approach is based on several existing specifications (e.g. Sitemaps, PubSubHubbub, well-known URI) and is aligned with common architectural principles (e.g. REST, follow your nose).
A 15 minute video version of these slides is available at https://www.youtube.com/watch?v=ASQ4jMYytsA
This presentation provides an overview of the Memento "Time Travel for the Web" framework that is aligned with the stable version of the Memento protocol, specified in RFC 7089.
As the scholarly communication system evolves to become natively web-based and starts supporting the communication of a wide variety of objects, the manner in which its essential functions – registration, certification, awareness, archiving - are fulfilled co-evolves. This presentation focuses on the nature of the archival function based on a perspective of the future scholarly communication infrastructure. This presentation, prepared for a meeting in June 2014, is based on and updates a previous one that was prepared for a January 2014 meeting. The latter is available at http://www.slideshare.net/atreloar/scholarly-archiveofthefuture
The slides were used to accompany an overview of the outcomes of the ResourceSync project at the 2014 Spring Membership Meeting of the Coalition for Networked Information (CNI).
The launch of ResourceSync, a joint project of the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) funded by the Alfred P. Sloan Foundation, was motivated by the ubiquitous need to synchronize resources for applications in the realm of cultural heritage and research communication. After an initial problem definition and scoping phase, the project has designed, specified, and tested a framework for web-based synchronization that is based on SiteMaps, a protocol widely used by web servers to advertise the resources they make available to search engines for indexing. This choice allows repositories to address both search engine optimization and resource synchronization needs using the same technology.
The ResourceSync framework specifies various modular capabilities that a repository can support in order to allow third party systems to remain synchronized with its evolving resources. For example, a Resource List provides an inventory of resources whereas a Change List details resources that were created, deleted or updated during a given temporal interval. Support for capabilities can be combined in order to meet local or community requirements. The framework specifies capabilities that require a third party to recurrently poll for up-to-date information about a repositories’ resources but also publish/subscribe capabilities that keep third parties informed about changes through notifications, thereby significantly reducing synchronization latency.
Persistent Identifiers and the Web: The Need for an Unambiguous MappingHerbert Van de Sompel
Presentation given at the International Digital Curation Conference in San Francisco, February 26 2014. Highlights the lack of machine-actionability of persistent identifiers assigned to scholarly communication assets. Proposes an approach to address the issue that meets requirements that take into account the changing nature of web based research communication. A draft paper provides more details: http://public.lanl.gov/herbertv/papers/Papers/2014/IDCC2014_vandesompel.pdf
Slides used for a presentation at the CNI 2013 Fall meeting. Discusses the problem domain of the Hiberlink project, a collaboration between the Los Alamos National Laboratory and the University of Edinburgh, funded by the Andrew W. Mellon Foundation. Hiberlink investigates reference rot in web-based scholarly communication.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and the relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
Presentation given at the EMTACL12 conference in Trondheim, Norway, on October 1 2012. Discusses the evolution towards a highly dynamic scholarly record (assets don't have the sense of fixity they used to have; assets are highly interdependent) and how the archiving infrastructure used for scholarly communication can not adequately deal with this dynamism.
1. Herbert Van de Sompel @hvdsomp
EuropeanaTech 2018, Rotterdam, The Netherlands, 15/05/18
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
Perseverance on Persistence
a future-note about the past
2.
OAI-ORE
3.
2006
• OAI-ORE observation: Scholarly assets are rapidly becoming compound, consisting of multiple resources with various:
  • Relationships
  • Interdependencies
• How to convey this compound-ness in an interoperable manner so that applications can access and consume such assets?
http://www.openarchives.org/ore/1.0/toc
4.
ORE Insight 1 – Web-Centric Interoperability Paradigm
Address interoperability challenges from the perspective of the web:
• The resource at the center of the universe
• The notion of a repository (or even of a web server) does not exist in the architecture of the web
• Neither does the notion of a Digital Object
• The tools of the interoperability trade are the primitives of the web
5.
Tools of the Web-Centric Interoperability Trade
• Resource
• URI
• HTTP as the API: HEAD/GET, POST, PUT, DELETE
• Representation
• Media Type
• Link
• Content Negotiation
• Typed Link
• Controlled Vocabularies for Typed Links
W3C Architecture of the World Wide Web
RDF, RDFS, OWL
6.
7.
OAI-ORE in EDM
Europeana v1.0 2009
8.
ORE Insight 2 – How to Access Temporal State of an Aggregation
The web-centric ORE approach allowed using off-the-shelf web tools to archive evolving compound objects:
• Evolving versions of Resource Maps and Aggregated Resources were captured in a web archive
• But how to use the URI of the Aggregation or Resource Map to see the status of an Aggregation at a specific moment in the past?
H. Van de Sompel (2007) Compound Information Object Prototype Demonstration. https://www.dropbox.com/s/dd7xd427y90q4jx/CT_Watch_hvds_20070703.mov?dl=0
9.
H. Van de Sompel, M. L. Nelson, R. Sanderson (2013) RFC 7089 - HTTP Framework for Time-Based Access to Resource States – Memento. https://tools.ietf.org/html/rfc7089
Memento
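The protocol's core mechanic is datetime negotiation: a client sends an Accept-Datetime header and follows "original"/"timegate"/"memento" typed links in the Link response header. A minimal stdlib-only sketch of both sides of that exchange; the URIs and the sample Link header below are illustrative, not a live response:

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

def accept_datetime_header(dt):
    """Build the Accept-Datetime request header (RFC 7089) for a past moment."""
    return {"Accept-Datetime": format_datetime(dt, usegmt=True)}

def parse_link_header(value):
    """Parse an HTTP Link header into (target URI, rel) pairs."""
    links = []
    for target, params in re.findall(r'<([^>]*)>([^,]*)', value):
        m = re.search(r'rel="([^"]*)"', params)
        if m:
            links.append((target, m.group(1)))
    return links

# Headers a Memento client would send to an Original Resource or TimeGate:
hdrs = accept_datetime_header(datetime(2009, 5, 8, tzinfo=timezone.utc))

# Illustrative Link header, shaped like a TimeGate response:
link = ('<http://example.org/page>; rel="original", '
        '<http://archive.example/web/20090508/http://example.org/page>; '
        'rel="memento"; datetime="Fri, 08 May 2009 00:00:00 GMT"')
for uri, rel in parse_link_header(link):
    print(rel, uri)
```

The client picks the "memento" link closest to the requested datetime and dereferences it; no archive-specific URL conventions are needed.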
10.
Tools of the Web-Centric Interoperability Trade – HTTP Stack
• Resource
• URI
• HTTP as the API
• Representation
• Media Types
• Link
• Content Negotiation, e.g. for preferred Media Type
• Typed Link
• Controlled Vocabularies for Typed Links
W3C Architecture of the World Wide Web
HTTP Links, IANA link relation registry, community link relation types
HATEOAS – Hypermedia As The Engine Of Application State
http://en.wikipedia.org/wiki/HATEOAS
11.
Original Resource and Mementos
12.
Bridge from Present to Past
13.
Bridge from Present to Past
14.
Bridge from Past to Present
15.
timegate Link: Link to Your Own History
Can link to preferred web archive, but also:
• Maintain your own resource version history
• timegate link to your own history
• Distributed management of resource history
• Uniform access to resource history across systems
• Follow links across systems subject to time
16.
No timegate Link – Client Intelligence
Client uses the TimeGate of its preferred web archive, but:
• The Internet Archive is massive, yet substantial unique materials exist in other archives
• Introduce an aggregated TimeGate: the Memento Aggregator
17.
Routing TimeGate Requests Using Machine Learning
Bornand, N., Balakireva, L., Van de Sompel, H. (2016) Routing Memento Requests Using Binary Classifiers. JCDL16. https://arxiv.org/abs/1606.09136
• Memento Aggregator covers 20+ web archives
• Distributed systems problem: As the number of archives (and incoming requests) grows, sending requests to each archive for every incoming request is not feasible
  • Response times
  • Load on distributed archives
• After various optimization attempts, devised an approach using binary classifiers per web archive:
  • Trained on the basis of cached URIs, using URI features only
  • Operational since 2016: 80% reduction in # queries; 1/3 reduction in response times; recall 85%
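The routing idea can be caricatured in a few lines. This is an illustrative stdlib-only sketch, not the Aggregator's code: the hand-written rules stand in for per-archive binary classifiers that are actually trained on cached URI lookups, and the archive names and rules are examples of mine:

```python
from urllib.parse import urlparse

def uri_features(uri):
    """Extract simple lexical features from a URI; URI features are the
    only input the routing classifiers use (no network lookups)."""
    p = urlparse(uri)
    return {
        "tld": p.netloc.rsplit(".", 1)[-1],
        "path_depth": len([s for s in p.path.split("/") if s]),
        "has_query": bool(p.query),
    }

# Stand-ins for trained per-archive binary classifiers answering
# "is this archive likely to hold a memento for this URI?"
CLASSIFIERS = {
    "uk_web_archive": lambda f: f["tld"] == "uk",
    "portuguese_web_archive": lambda f: f["tld"] == "pt",
    "internet_archive": lambda f: True,  # broad coverage: always queried
}

def route(uri):
    """Query only the archives whose classifier predicts a hit."""
    f = uri_features(uri)
    return [name for name, clf in CLASSIFIERS.items() if clf(f)]

print(route("http://www.bl.uk/collections"))
```

Skipping archives whose classifier predicts a miss is what yields the reduction in query volume, at the cost of some recall.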
18.
Screenshot: time travel via the Internet Archive, selecting dates Mar 20 2007 and Apr 03 2007
Various Memento Tools (client/server)
https://github.com/machawk1/awesome-memento
19.
Pockets of Persistence
20.
Creating Pockets of Persistence
• With Memento’s time travel capability in place, what would it take to support faithfully navigating the web of the Past?
• There are two major forces that hinder achieving this goal:
  • Link rot: A link stops working altogether
  • Content drift: The linked content changes over time and may eventually no longer be representative of the content that was originally linked
• Without these forces at work, the web of the Present would be the same as the web of the Past
• But that clearly is not the case
21.
Hyperlinks in Theory
22.
Hyperlinks in Reality
23.
Hyperlinks in Reality
24.
Link Rot
25.
Link Rot - PMC
Martin Klein, Herbert Van de Sompel, Robert Sanderson, Harihar Shankar, et al. (2014) Scholarly context not found. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0115253
26.
Hyperlinks in Reality
27.
Content Drift
28.
Content Drift
29.
Content Drift
http://icecube.wisc.edu/ on May 8 2009 (left) and August 27 2009 (right)
30.
No Content Drift
http://www.ifa.hawaii.edu/~cowie/k_table.html on June 9 1997 (left) and March 2016 (right)
31.
Content Drift - PMC
Shawn Jones, Herbert Van de Sompel, Harihar Shankar, Martin Klein, et al. (2016) Scholarly Context Adrift. In: PLOS ONE. https://doi.org/10.1371/journal.pone.0167475
32.
Creating Pockets of Persistence
• What would it take to really support faithfully navigating the web of the Past?
• This challenge exists for the entire web. Some communities with well-managed collections care about addressing it:
  • Scholarly communication
  • Cultural heritage
  • Legal publications
  • Journalism
  • Wikipedia
• Why?
  • Link Rot: Quality of Service
  • Content Drift: integrity of the record, reliable evidence, revisiting the state of knowledge, transparency of editorial process, …
33.
US Supreme Court Opinion – Link Rot Activism
http://ssnat.com
34.
Two Types of Links from a Managed Collection
35.
Take 1 – PID Approach
Diagram: a PID minted for resource B
36.
Managed Collection => Managed Collection
37.
PID Approach
Combat:
• Link Rot: Link to the PID; redirect to the current location
• Content Drift: Mint a PID per version; link to the version PID
With PID links:
• Web of Present = Web of Past
38.
39.
URI References - PMC
Herbert Van de Sompel, Martin Klein, and Shawn Jones (2016) Persistent URIs Must Be Used to Be Persistent. In: WWW2016. http://arxiv.org/1602.09102
40.
cite-as Relation Type
Herbert Van de Sompel et al. (2018) cite-as: A Link Relation to Convey a Preferred URI for Referencing. https://datatracker.ietf.org/doc/draft-vandesompel-citeas/
http://signposting.org
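A consuming tool discovers the preferred URI by inspecting the Link header of any of the object's resources (landing page, PDF, dataset) for the cite-as relation. A minimal illustrative sketch; the sample header values are made up:

```python
import re

def find_cite_as(link_header):
    """Return the target of the first link with rel="cite-as", or None."""
    for target, params in re.findall(r'<([^>]*)>([^,]*)', link_header):
        m = re.search(r'rel="([^"]*)"', params)
        if m and "cite-as" in m.group(1).split():
            return target
    return None

# Illustrative Link header as a landing page or PDF might return it:
link = ('<https://doi.org/10.1045/november2015-vandesompel>; rel="cite-as", '
        '<https://example.org/article/landing>; rel="canonical"')
print(find_cite_as(link))  # the DOI HTTP URI, not the landing page URI
```

With such a link in place, reference managers and bookmarking tools can record the PID URI instead of whatever happens to be in the address bar.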
41.
PID Approach – Division of Labor
42.
Managed Collection => Web at Large
43.
44.
PID Approach
45.
Take 2 – Robust Links Approach
46.
Managed Collection => Web at Large
47.
Snapshot Approach
Combat:
• Link Rot & Content Drift: Custodian of A creates a snapshot of B, in a web archive or locally
Regarding links:
• Intuition suggests linking to the snapshot of B …
48.
Linking to Snapshot of B = Potentially Creating a Rotten Link
• Existing practice for linking to snapshots:
  <a href="URL of snapshot of B">
• Problems with existing practice:
  o Impossible to visit the original URI, if desired
  o Requires the permanent existence/uptime of the archive that holds the snapshot
    - One link rot problem replaced by another
http://robustlinks.mementoweb.org/about/
49.
Permanent Existence/Uptime of Archives?
Remnant of discontinued web archive http://mummify.it captured on February 14 2014
https://web.archive.org/web/20140214233752/https://www.mummify.it/
50.
Permanent Existence/Uptime of Archives?
http://www.themoscowtimes.com/news/article/russia-bans-wayback-machine-internet-archive-over-islamic-state-video/510074.html
51.
Permanent Existence/Uptime of Archives?
http://web.archive.org/web/20121101043952/http://vogin.nl on March 6 2017 at 15:59 CET
52.
Decorate the Link
• Proposed practice for linking to captures:
  <a href="URL of snapshot of B"
     data-originalurl="B"
     data-versiondate="datetime of snapshot of B">
  <a href="B"
     data-versionurl="URL of snapshot of B"
     data-versiondate="datetime of snapshot of B">
http://robustlinks.mementoweb.org/spec/
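A publishing tool can emit such decorated links mechanically. A minimal sketch: the `robust_link` helper is mine, but the `data-versionurl`/`data-versiondate` attribute names come from the Robust Links specification above, and the example URLs are illustrative:

```python
from html import escape

def robust_link(href, version_url, version_date, text):
    """Emit an <a> element decorated per the Robust Links attributes:
    the href stays on the original URI, the snapshot rides along as data."""
    return (f'<a href="{escape(href, quote=True)}" '
            f'data-versionurl="{escape(version_url, quote=True)}" '
            f'data-versiondate="{version_date}">{escape(text)}</a>')

print(robust_link(
    "http://example.org/page",
    "https://web.archive.org/web/20090508000000/http://example.org/page",
    "2009-05-08",
    "an example page"))
```

Because the decoration is plain data attributes, the link still works as an ordinary hyperlink even when no Robust Links tooling is present.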
53.
Robust Links: Link Decoration in Action
Van de Sompel H. & Nelson, M.L. (2015) Reminiscing about 15 years of interoperability efforts. In: D-Lib Magazine. https://doi.org/10.1045/november2015-vandesompel
JavaScript makes the link decorations actionable
54.
Robust Links: Refuse to Die
55.
56.
Snapshot Approach – Division of Labor
57.
Managed Collection => Managed Collection
58.
Cool URI Approach
Combat:
• Link Rot: Link to B; redirect to the current location
• Content Drift: Generic URI; version URIs
With Cool URI links:
• Tension between linking to the generic URI and a version URI
59.
Robust Links: Refuse to Die
60.
61.
Cool URI Approach – Division of Labor
62.
Robust Links Approach
63.
Summary
Table comparing the PID and Robust Links (RL) approaches on labor and other criteria
64.
Robust Links for Linked Data?
Sanderson, R., Ciccarese, P., and Young, B. (2017) Web Annotation Vocabulary. W3C Recommendation 23 February 2017. https://www.w3.org/TR/annotation-vocab/
65.
Handling Resource Versions, Captures
Diagram: resource B and its versions/captures at times t1 and t2
66.
Systems with Resource Versions
67.
DBpedia Snapshot Archive Using HDT, TPF, Memento
Vander Sande, M., Verborgh, R., Hochstenbach, P., and Van de Sompel, H. (2017) Towards sustainable publishing and querying of distributed Linked Data archives.
Temporal: subject URI access ; ?s ?p ?o queries ; SPARQL queries
68.
Memento Tracer
http://tracer.mementoweb.org
69.
Resource Capture: Tension Between Scale and Quality
• Web crawling: optimized for scale
  • Problems with capturing resources accessible via interactive affordances
• webrecorder.io: optimized for quality
  • Personal archiving
  • User records web navigation session
  • Not used for archiving at scale
• LOCKSS: optimized for scholarly journals
  • Pages in publisher/journal portals share layout, affordances
  • Heuristics per publisher/journal to improve capture quality
70.
Memento Tracer: New Sweet Spot Between Scale and Quality
• ~ web crawling: server-side process to capture resources
• ~ LOCKSS: leverages the insight that web publications in any given portal are based on the same template:
  • share layout
  • share interactive affordances
• ~ webrecorder.io: human guidance to achieve quality
• But, with Memento Tracer:
  • the user does not record a specific web publication
  • the user records heuristics that apply to a class of web publications
71.
Memento Tracer
72.
A Trace for slideshare Presentations
{
  "portal_url_match": "(slideshare.net)/([^/]+)/([^/]+)",
  "actions": [
    {
      "action_order": "1",
      "value": "div.j-next-btn.arrow-right",
      "type": "CSSSelector",
      "action": "repeated_click",
      "repeat_until": {
        "condition": "changes",
        "type": "resource_url"
      }
    },
    {
      "action_order": "2",
      "value": "div.notranslate.transcript.add-padding-right.j-transcript a",
      "type": "CSSSelector",
      "action": "click"
    }
  ], …
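The portal_url_match expression is what scopes a Trace to a whole class of web publications rather than a single page. A quick illustrative check of how that pattern matches; the deck URL below is hypothetical:

```python
import re

# Pattern copied from the Trace above: matches slideshare user/deck URLs.
portal_url_match = r"(slideshare.net)/([^/]+)/([^/]+)"

m = re.search(portal_url_match, "https://www.slideshare.net/hvdsomp/some-deck")
print(m.groups())  # → ('slideshare.net', 'hvdsomp', 'some-deck')
```

Any URL matching the pattern is captured with the same recorded actions, which is how one recorded Trace generalizes across a portal.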
73.
Memento Tracer: Experimental
• Promising results thus far
• Currently investigating challenges, including:
  • User interface to support recording Traces for complex sequences of interactions
  • Limitations of the browser event listener approach for recording Traces
  • Language used to express Traces
  • Organization of the shared repository for Traces
  • Selection of a Trace for capturing a web publication in cases where different page layouts and interactive affordances are available for web publications that share a URI pattern
74.
Demo: Recording a Trace for a Web Publication
https://github.com/www.gorillatoolkit/pkg/mux
75.
Demo: Capturing another Web Publication Using the Trace
https://github.com/mementoweb/node-solid-server
76.
Demo: Capturing another Web Publication Using the Trace
https://github.com/mementoweb/node-solid-server
77.
Demo: Playing Back the Captured Web Publication
Capture of https://github.com/mementoweb/node-solid-server
78.
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
Perseverance on Persistence
a future-note about the past