Slides used for a presentation at the CNI 2013 Fall meeting. Discusses the problem domain of the Hiberlink project, a collaboration between the Los Alamos National Laboratory and the University of Edinburgh, funded by the Andrew W. Mellon Foundation. Hiberlink investigates reference rot in web-based scholarly communication.
Extended version of slides presented at the "404/File Not Found" symposium held at Georgetown University on October 24, 2014, see http://www.law.georgetown.edu/library/404/ . The presentation provides a brief overview of the link/reference rot problem and then discusses three complementary strategies to combat it: proactively capturing web resources that are linked from a seed collection; referencing the captures by means of annotated links; and accessing the captures using Memento infrastructure.
Presentation for PIDapalooza 2016. PIDs need to be used to achieve their intended persistence. Our research (reported at WWW2016, see http://arxiv.org/1602.09102) found that a disturbing percentage of references to papers that have DOIs actually use the landing page HTTP URI instead of the DOI HTTP URI. The problem is likely related to tools used for collecting references such as bookmarks and reference managers. These select the landing page URI instead of the DOI URI because the former is what's available in the address bar. It can safely be assumed that the same problem exists for other types of PIDs. The net result is that the true potential of PIDs is not realized. In order to ameliorate this problem we propose a Signposting pattern for PIDs (http://signposting.org/identifier/). It consists of adding a Link header to HTTP HEAD/GET responses for all resources identified by a DOI, including the landing page and content resources such as "the PDF" and "the dataset". The Link header contains a link, which points with the "identifier" relation type to the DOI HTTP URI. When such a link is available, tools can automatically discover and use the DOI URI instead of the other URIs (landing page, PDF, dataset) associated with the DOI-identified object.
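The discovery step described above can be sketched as follows. This is a minimal illustration, not the project's tooling: the function name, the example DOI, and the example URIs are hypothetical, and the parser is deliberately simplified (it assumes no commas or semicolons inside quoted parameter values).

```python
import re


def find_identifier_links(link_header: str) -> list[str]:
    """Return the target URIs of links whose rel includes "identifier".

    Simplified parsing: assumes quoted parameter values contain
    neither commas nor semicolons.
    """
    targets = []
    # Each link-value looks like: <URI>; param="value"; param="value"
    for match in re.finditer(r'<([^>]*)>((?:\s*;\s*[^,;]+)*)', link_header):
        uri, params = match.group(1), match.group(2)
        rel = re.search(r'\brel\s*=\s*"?([^";]+)"?', params, re.IGNORECASE)
        # A rel value may hold several space-separated relation types.
        if rel and "identifier" in rel.group(1).lower().split():
            targets.append(uri)
    return targets


# Hypothetical Link header, as a PDF or landing page might return it.
EXAMPLE_HEADER = (
    '<https://doi.org/10.1234/example>; rel="identifier", '
    '<https://example.org/paper.pdf>; rel="item"; type="application/pdf"'
)

print(find_identifier_links(EXAMPLE_HEADER))
```

A reference manager could run a check of this kind on the response headers of whatever URI is in the address bar and store the returned DOI URI instead.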
DBpedia Archive using Memento, Triple Pattern Fragments, and HDT - Herbert Van de Sompel
DBpedia is the Linked Data version of Wikipedia. Starting in 2007, several DBpedia dumps have been made available for download. In 2010, the Research Library at the Los Alamos National Laboratory used these dumps to deploy a Memento-compliant DBpedia Archive, in order to demonstrate the applicability and appeal of accessing temporal versions of Linked Data sets using the Memento “Time Travel for the Web” protocol. The archive supported datetime negotiation to access various temporal versions of RDF descriptions of DBpedia subject URIs.
In a recent collaboration with the iMinds Group of Ghent University, the DBpedia Archive received a major overhaul. The initial MongoDB storage approach, which was unable to handle increasingly large DBpedia dumps, was replaced by HDT, the Binary RDF Representation for Publication and Exchange. And, in addition to the existing subject URI access point, Triple Pattern Fragments access, as proposed by the Linked Data Fragments project, was added. This allows datetime negotiation for URIs that identify RDF triples that match subject/predicate/object patterns. To add this powerful capability, native Memento support was added to the Linked Data Fragments Server of Ghent University.
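To make the combination of datetime negotiation and triple pattern access concrete, here is a toy in-memory sketch. It is not how the archive or the Linked Data Fragments Server is implemented: the data, the `ex:` terms, the function names, and the selection rule (most recent version at or before the requested datetime, else the earliest) are all assumptions made for illustration.

```python
from datetime import datetime, timezone

Triple = tuple[str, str, str]


def select_version(versions: dict[datetime, list[Triple]],
                   requested: datetime) -> datetime:
    """Pick the most recent version dated at or before `requested`.

    Falls back to the earliest version when the request predates them all.
    """
    dates = sorted(versions)
    eligible = [d for d in dates if d <= requested]
    return eligible[-1] if eligible else dates[0]


def match_pattern(triples, s=None, p=None, o=None) -> list[Triple]:
    """Return triples matching a subject/predicate/object pattern.

    A component left as None acts as a wildcard.
    """
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def fragment_at(versions, requested, s=None, p=None, o=None):
    """Resolve the pattern against the version selected for `requested`."""
    chosen = select_version(versions, requested)
    return chosen, match_pattern(versions[chosen], s, p, o)


# Hypothetical two-version dataset.
V2010 = datetime(2010, 1, 1, tzinfo=timezone.utc)
V2014 = datetime(2014, 1, 1, tzinfo=timezone.utc)
VERSIONS = {
    V2010: [("ex:Ghent", "ex:population", "240000")],
    V2014: [("ex:Ghent", "ex:population", "250000"),
            ("ex:Ghent", "ex:country", "ex:Belgium")],
}

when = datetime(2012, 6, 1, tzinfo=timezone.utc)
print(fragment_at(VERSIONS, when, s="ex:Ghent", p="ex:population"))
```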
In this talk, we will include a brief refresher of Memento, and will cover Linked Data Fragments, Triple Pattern Fragments, and HDT in more detail. We will share lessons learned from this effort and demo the new DBpedia Archive, which, at this point, holds over 5 billion RDF triples.
As the scholarly communication system evolves to become natively web-based and starts supporting the communication of a wide variety of objects, the manner in which its essential functions (registration, certification, awareness, archiving) are fulfilled co-evolves. This presentation focuses on the nature of the archival function based on a perspective of the future scholarly communication infrastructure. This presentation, prepared for a meeting in June 2014, is based on and updates a previous one that was prepared for a January 2014 meeting. The latter is available at http://www.slideshare.net/atreloar/scholarly-archiveofthefuture
Various FAIR criteria pertaining to machine interaction with scholarly artifacts can commonly be addressed by means of repository-wide affordances that are uniformly provided for all hosted artifacts rather than through artifact-specific interventions. If various repository platforms provide such affordances in an interoperable manner, devising tools - for both human and machine use - that leverage them becomes easier.
My involvement, over the years, in a range of interoperability efforts has brought the insight that two factors strongly influence adoption: addressing a burning issue and delivering a KISS solution to tackle it. Undoubtedly, FAIR and FAIR DOs are burning issues. FAIR Signposting <https://signposting.org/FAIR/> is an ad-hoc repository interoperability effort that squarely fits in this problem space and that purposely specifies a KISS solution, hoping to inspire wide adoption.
To the Rescue of the Orphans of Scholarly Communication - Martin Klein
presentation at CNI Spring 2017 meeting
Herbert Van de Sompel
http://orcid.org/0000-0002-0715-6126
Michael L. Nelson
http://orcid.org/0000-0003-3749-8116
Martin Klein
http://orcid.org/0000-0003-0130-2097
This presentation looks back at several efforts, conducted in the past fifteen years, aimed at establishing interoperability for web-based scholarly communication. It tries to characterize the perspectives and approaches taken by these efforts and, based upon that, proposes a HATEOAS-based approach to interlink scholarly nodes on the web. This was first presented at the Research Data Alliance meeting in Paris, France, September 22, 2015.
These slides go with the paper "Reminiscing About 15 Years of Interoperability Efforts" which is available at http://dx.doi.org/10.1045/november2015-vandesompel
Slides were used for a presentation at the Fall 2015 Membership Meeting of the Coalition for Networked Information.
Presentation for a workshop about persistent identifiers organized by the Royal Library of The Netherlands and DANS. Highlights the non-trivial commitments required of all parties involved in persistent identifier systems to actually keep links based on persistent identifiers ... err ... persistent.
This slide deck provides an overview of proposals to use HTTP Links as a means to address some long standing problems related to scholarly resources on the web.
This presentation provides an overview of the Memento "Time Travel for the Web" framework that is aligned with the stable version of the Memento protocol, specified in RFC 7089.
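As a small illustration of the datetime negotiation that RFC 7089 specifies, the sketch below builds the `Accept-Datetime` request header a client sends to a TimeGate and reads the `Memento-Datetime` header a memento carries in its response. The two header names come from the RFC; the helper names are hypothetical, and no network request is made.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def accept_datetime_header(moment: datetime) -> dict[str, str]:
    """Build the request header asking for a resource as of `moment`.

    `moment` is expected to be timezone-aware; it is converted to UTC and
    rendered in the HTTP date format, e.g. 'Thu, 31 May 2007 20:35:00 GMT'.
    """
    utc_moment = moment.astimezone(timezone.utc)
    return {"Accept-Datetime": format_datetime(utc_moment, usegmt=True)}


def memento_datetime(response_headers: dict[str, str]) -> Optional[datetime]:
    """Return the archival datetime of a memento, or None if absent."""
    value = response_headers.get("Memento-Datetime")
    return parsedate_to_datetime(value) if value else None


wanted = datetime(2007, 5, 31, 20, 35, tzinfo=timezone.utc)
print(accept_datetime_header(wanted))
print(memento_datetime({"Memento-Datetime": "Thu, 31 May 2007 20:35:00 GMT"}))
```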
https://doi.org/10.6084/m9.figshare.11854626.v1
Presented at Dutch National Librarian/Information Professional Association annual conference 2011 - NVB2011
November 17, 2011
Presentation about reference rot given at the Complexity Science Hub in Vienna, November 2021.
Links to web resources frequently break (link rot), and linked content can change at unpredictable rates (content drift). These dynamics of the Web are detrimental when references to web resources provide evidence or supporting information.
This presentation will report on research that assessed the extent of these problems for links to web resources in scholarly literature, by using three vast corpora of publications and a range of public web archives. It will also describe the Robust Link approach that offers a proactive, uniform, and machine-actionable way to combat link rot and content drift. Finally, it will introduce the Robustify web service and API that was devised to generate links that remain functional over time, paying special attention to challenges related to deploying infrastructure that is required to be long lasting.
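A robust link, in the sense used above, pairs the original URL with a snapshot URL and a date. The sketch below generates such a link as an HTML anchor. The attribute names `data-originalurl`, `data-versionurl`, and `data-versiondate` follow our reading of the Robust Links specification and should be checked against it; the function name and the example URLs are hypothetical, and this is not the Robustify service's code.

```python
from datetime import date
from html import escape


def robust_link(original_url: str, snapshot_url: str,
                snapshot_date: date, text: str) -> str:
    """Render an anchor that carries both the original and archived URLs.

    Attribute names are assumed from the Robust Links specification.
    """
    return (
        '<a href="{href}" data-originalurl="{orig}" '
        'data-versionurl="{ver}" data-versiondate="{when}">{text}</a>'
    ).format(
        href=escape(original_url),
        orig=escape(original_url),
        ver=escape(snapshot_url),
        when=snapshot_date.isoformat(),
        text=escape(text),
    )


print(robust_link(
    "https://example.org/project",
    "https://archive.example.net/20150121/https://example.org/project",
    date(2015, 1, 21),
    "project website",
))
```

If the original URL later breaks or drifts, a reader or tool can fall back to the snapshot URL, or use the date to look for another capture close to the time of referencing.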
Quantifying Orphaned Annotations in Hypothes.is - maturban
Web annotation has been receiving increased attention recently with the organization of the Open Annotation Collaboration and new tools for open annotation, such as Hypothes.is. In this paper, we investigate the prevalence of orphaned annotations, where a live Web page no longer contains the text that had previously been annotated in the Hypothes.is annotation system (containing 20,953 highlighted text annotations).
TPDL2013 tutorial linked data for digital libraries 2013-10-22 - jodischneider
Tutorial on Linked Data for Digital Libraries, given by me, Uldis Bojars, and Nuno Lopes in Valletta, Malta at TPDL2013 on 2013-10-22.
http://tpdl2013.upatras.gr/tut-lddl.php
This half-day tutorial is aimed at academics and practitioners interested in creating and using Library Linked Data. Linked Data has been embraced as the way to bring complex information onto the Web, enabling discoverability while maintaining the richness of the original data. This tutorial will offer participants an overview of how digital libraries are already using Linked Data, followed by a more detailed exploration of how to publish, discover and consume Linked Data. The practical part of the tutorial will include hands-on exercises in working with Linked Data and will be based on two main case studies: (1) linked authority data and VIAF; (2) place name information as Linked Data.
For practitioners, this tutorial provides a greater understanding of what Linked Data is, and how to prepare digital library materials for conversion to Linked Data. For researchers, this tutorial updates the state of the art in digital libraries, while remaining accessible to those learning Linked Data principles for the first time. For library and iSchool instructors, the tutorial provides a valuable introduction to an area of growing interest for information organization curricula. For digital library project managers, this tutorial provides a deeper understanding of the principles of Linked Data, which is needed for bespoke projects that involve data mapping and the reuse of existing metadata models.
This slide deck provides an overview of proposals to use HTTP Links as a means to address some long standing problems related to scholarly resources on the web.
Profiling Web Archives
IIPC General Assembly
Paris, France, May 21, 2014
Michael Nelson, Ahmed AlSum, Michele Weigle, Herbert Van de Sompel, David Rosenthal
Researcher Pod: Scholarly Communication Using the Decentralized Web - Herbert Van de Sompel
The presentation provides an overview of the motivation and direction of the Mellon-funded Researcher Pod project that investigates technical aspects of scholarly communication in a decentralized web setting.
Presentation delivered at the Linked Ancient World Data Institute, Drew University, 30 May 2013.
Copyright 2013 New York University.
This work is licensed under a Creative Commons Attribution 4.0 International License.
http://creativecommons.org/licenses/by/4.0/deed.en_US
Funding for the preparation and presentation of this presentation and the workshop at which it was presented was provided by the National Endowment for the Humanities. Any views, findings, conclusions, or recommendations expressed in this presentation do not necessarily reflect those of the National Endowment for the Humanities.
These slides accompany the LDOW2010 paper "An HTTP-Based Versioning Mechanism for Linked Data". The paper is available at http://arxiv.org/abs/1003.3661. It describes how the combination of the Memento (Time Travel for the Web) framework and a resource versioning approach that is aligned both with the Cool URI notion and with Tim Berners-Lee's concept of Time-Generic and Time-Specific resources yields the ability to collect current and prior versions of a resource merely by using "follow your nose" HTTP navigation. The proposed combination further extends the value of a URI, and allows the emergence of a novel realm of temporal Web applications.
Keynote presentation delivered at ELAG 2013 in Gent, Belgium, on May 29 2013. Discusses Research Objects and the relationship to work my team has been involved in during the past couple of years: OAI-ORE, Open Annotation, Memento.
This presentation introduces the Memento solution to allow time travel on the Web. Slides used at the first presentation about Memento at the Library of Congress, November 16 2009. Please consult the February 2010 slides (http://www.slideshare.net/hvdsomp/memento-updated-technical-details-february-2010) for up-to-date technical details. More info at http://www.mementoweb.org
Memento: Big Leaps Towards Seamless Navigation of the Web of the Past - Herbert Van de Sompel
These slides provide an explanation of the Memento Framework (time travel for the Web) from the perspective of resource versioning. They also detail progress that has been made with deploying the framework since it was first introduced in November 2009, including standardization, development of tools, and advocacy. In addition, they touch upon new challenges (discovery, branding) and announce plans to make transactional Web archiving software available in the course of 2011.
This presentation introduces ResourceSync, a specification aimed to enable web-based synchronization of resources. The specification is the result of a collaboration between NISO and the Open Archives Initiative funded by the Sloan Foundation and JISC. The proposed resource synchronization approach is based on several existing specifications (e.g. Sitemaps, PubSubHubbub, well-known URI) and is aligned with common architectural principles (e.g. REST, follow your nose).
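Because the approach builds on Sitemaps, a synchronizing client can in principle work from a plain sitemap document. The sketch below reads such a document and lists the resources modified since the last synchronization. It uses only the core Sitemap elements (`urlset`, `url`, `loc`, `lastmod`) and leaves out the ResourceSync-specific extensions; the sample document and the function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Hypothetical resource list published by a source.
SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>http://example.com/res1</loc>
    <lastmod>2013-01-01T09:00:00Z</lastmod>
  </url>
  <url>
    <loc>http://example.com/res2</loc>
    <lastmod>2013-01-03T18:00:00Z</lastmod>
  </url>
</urlset>
"""


def parse_timestamp(value: str) -> datetime:
    """Parse an ISO 8601 timestamp, accepting a trailing 'Z' for UTC."""
    return datetime.fromisoformat(value.strip().replace("Z", "+00:00"))


def changed_since(resource_list_xml: str, last_sync: datetime) -> list[str]:
    """Return the URIs whose lastmod is later than `last_sync`."""
    root = ET.fromstring(resource_list_xml)
    changed = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if loc and lastmod and parse_timestamp(lastmod) > last_sync:
            changed.append(loc.strip())
    return changed


print(changed_since(SAMPLE, datetime(2013, 1, 2, tzinfo=timezone.utc)))
```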
A 15 minute video version of these slides is available at https://www.youtube.com/watch?v=ASQ4jMYytsA
The presentation explores the trend towards a scholarly communication system that is friendly to machines. It presents three exhibits illustrating the trend and one exhibit illustrating inertia in the system. It makes the point that machine-actionability can be achieved much more easily if content and metadata are available in Open Access and under a permissive Creative Commons license. It also observes that even with content and metadata openly available, new costs related to advanced tools to explore the scholarly record will emerge. Finally, it points at significant challenges regarding the persistence of the scholarly record in light of increasingly interconnected and actionable content and advanced tools to interact with it.
The slides were used for a plenary presentation at the LIBER 2011 Conference in Barcelona, Spain, on June 30 2011.
Reference Rot in Scholarly Communication: A Reliable Quantification and a P... - Martin Klein
As research and research communication nowadays happen on the web, scholarly articles increasingly link to resources that are not necessarily considered part of the scholarly record but are rather so-called web-at-large resources such as project websites, online debates, presentations, blogs, videos, etc. Our research (reported in PLOS ONE [1]) found overwhelming evidence for this trend and showed the severity of link rot for such references. Our more recent study [2] provides unprecedented insight into the vast extent of content drift for these references. We speak of content drift when the content of a referenced resource evolves after the publication of the referencing article, in many cases, beyond recognition. Reference rot, the combination of link rot and content drift, makes it impossible to revisit the context that surrounded these research papers as it was at the time of writing and must therefore be considered a significant detriment to scholarly communication. In order to introduce a level of persistence for the scholarly context we devised the Robust Links approach that consists of archiving referenced web-at-large resources and referencing them using Link Decoration [3]. The proposed approach is aimed at providing optimal guarantees that referenced web-at-large resources can be revisited as they were when a paper referenced them.
In this presentation we will report on both studies, provide a reliable quantification of the reference rot problem and discuss our solution to address it. Robust Links are demonstrated in a recently published paper [4].
[1] http://dx.doi.org/10.1371/journal.pone.0115253
[2] http://dx.doi.org/10.1371/journal.pone.0167475
[3] http://robustlinks.mementoweb.org/spec/
[4] http://dx.doi.org/10.1045/november2015-vandesompel
The current state of play in the preservation of all things webby, and concrete actions to take. Delivered by Peter Burnhill at the ALSP event "Standing on the Digits of Giants: Research data, preservation and innovation" on 8 March 2015 in London.
Delivered by Peter Burnhill, Director of EDINA, at the PRELIDA Consolidation and Dissemination workshop on 17/18 October 2014 (http://prelida.eu/consolidation-workshop).
Summary: The web changes over time, and significant reference rot inevitably occurs. Web archiving delivers only a 50% chance of success. So in addition to the original URI, the link should be augmented with temporal context to increase robustness.
Delivered by Richard Wincewicz at Open Repositories OR2015, Indianapolis, IN, USA, June 2014.
An introduction to "Reference or Link Rot", the evidence for the extent of the problem, and remedies proposed by the Hiberlink project.
HIBERLINK: Reference Rot and Linked Data: Threat and Remedy - PRELIDA Project
Peter Burnhill (EDINA, University of Edinburgh), presented at the 3rd PRELIDA Consolidation and Dissemination Workshop, Riva, Italy, October 17, 2014. More information about the workshop at: prelida.eu
A talk given at 'Taking the Long View: International Perspectives on E-Journal Archiving', a conference hosted by EDINA and ISSN IC at the University of Edinburgh, September 7th 2015.
Presented in Glasgow at UKSG, 31 March - 1 April, by Peter Burnhill and Richard Wincewicz.
This presentation looks at reference rot, link rot, and the work of Hiberlink to ensure web citations persist through time.
Professional Forum:
Eleanor Fink, American Art Collaborative, USA, Shane Richey, Crystal Bridges Museum of American Art, USA, Jeremy Tubbs, Indianapolis Museum of Art, USA, Rebecca Menendez, Autry Museum of the American West, USA, Cathryn Goodwin, Princeton University, USA
Last year the Andrew W. Mellon Foundation awarded a planning grant to the American Art Collaborative (AAC), a consortium of thirteen U.S. museums who have come together to learn about and implement LOD within their respective museums. Under the grant AAC developed a road map for the Initiative that will test LOD reconciliation issues, develop production and reconciliation tools, and result in the publication of American art holdings as LOD for researchers, educators, general public, aggregators such as DPLA, ResearchSpace, and digital application developers. The road map also includes publication of best practices and guidelines to share with the broader museum community.
In September 2015, AAC member Crystal Bridges Museum of American Art received, on behalf of AAC, an IMLS National Leadership Grant, and plans for additional grants are underway. These grants are allowing AAC to convert data to LOD using the CIDOC CRM, link to the Getty Vocabularies as well as contribute missing names to enhance the vocabularies, and implement an API and reader compliant with the International Image Interoperability Framework (IIIF) that will allow researchers to compare and contrast AAC LOD. Several open source tools, including a link curation tool and an IIIF/CRM translator, will be developed and made available for other museums. AAC is developing its LOD under a federated model whereby each AAC member assumes responsibility for updating and maintaining its own data.
The session will bring together representatives from large as well as small AAC partners to discuss the benefits of LOD, some of the lessons learned and challenging documentation issues AAC is facing.
Web Today, Good Tomorrow? Transactional archiving of web content
Peter Burnhill
Report from Hiberlink Project into threat of and remedy for Reference Rot. Archiving what is cited on the web. Need for action by scholarly publishers.
As delivered at Innovators Session, Professional/Scholarly Publishing (PSP) Division, Association of American Publishers (AAP), Washington DC, 1-4 February 2017.
"Scholarly Communication: Deconstruct and Decentralize" was presented at the Fall 2017 Meeting of the Coalition for Networked Information. It explores working towards a Scholarly Commons by applying decentralized web ideas to scholarly communication.
Presentation given by Peter Burnhill, director of EDINA, at #ReCon_15 : Beyond the paper: publishing data, software and more. Edinburgh, 19 June 2015
Peter Burnhill
http://reconevent.com/
IFLA LIDASIG Open Session 2017: Introduction to Linked Data
Lars G. Svensson
At the IFLA Linked Data Special Interest Group open session in Wroclaw we briefly introduced the mission of the SIG and then went on to a brief introduction to what linked data is and why that topic is important to libraries.
The presentation was held jointly by Astrid Verheusen (general introduction to the SIG) and Lars G. Svensson (introduction to Linked Data)
Linked Data Basics Slot in WWW2012 Tutorial: Practical Cross-Dataset Queries on the Web of Data
http://latc-project.eu/events/www2012-tutorial-cross-dataset-queries
Thinking of Linking: A random series of ideas, concepts, Platonic ideals, a y...
Martin Kalfatovic
Thinking of Linking: A random series of ideas, concepts, Platonic ideals, a yeoman's miscellany, and nonesuch guide to Linked Data, especially as it relates to libraries, archives, and museums. Martin R. Kalfatovic. American Library Association Annual Meeting. Anaheim, CA. 23 June 2012.
Stronger together: community initiatives in journal management
Jisc
There has been a recent growth of initiatives to address common problems regarding current and long-term access to e-journal content. Jisc is at the forefront of many of these with the close participation and active input of educational institutions.
This session aims to summarise the current state of key themes with pointers to future directions of areas such as sustainability, the move towards e-only environments, and shared consortia approaches. It will provide an overview and panel discussion on developing the supporting infrastructure to meet the needs of users. The discussion will focus on how institutions, community bodies and service providers can best work together to ensure sustainable, long-term initiatives by seeking to introduce uniformity, standardisation and collaboration to an even greater extent.
The session will introduce two new Jisc-supported projects in this area, the Keepers Registry Extra and SafeNet initiatives, and discuss how these fit alongside existing Jisc services such as Knowledge Base+, UK LOCKSS Alliance, Journal Archives and JUSP (Journal Usage Statistics Portal). The panel will address how this catalogue of services contributes towards a coherent strategy in the management of e-journal content.
Overview of the problems of Reference Rot and what actions to take to ensure the persistence of the digital scholarly record. Presented by Peter Burnhill with Adam Rusbridge & Muriel Mewissen, EDINA, University of Edinburgh, UK; Herbert Van De Sompel, Los Alamos National Laboratory Research Library, USA; Gaelle Bequet, ISSN International Centre, France; at Towards Open Science, LIBER, London, June 2015.
Registration / Certification Interoperability Architecture (overlay peer-review)
Herbert Van de Sompel
Presentation for the COAR meeting on Overlay Peer-Review held at INRIA, Paris, France. It provides overall context regarding a scholarly communication system in which the core functions of scholarly communication (registration, certification, awareness, archiving) are implemented in a decoupled manner and whereby each function can simultaneously be fulfilled by different parties, potentially in different ways. It shows how notifications can be used to achieve loosely coupled, point-to-point interoperability in such an environment, zooming in on interoperability between registration and certification aka interoperability between repositories and overlay peer-review services.
Slides used for a keynote presentation at the VIVO 2019 Conference in Podgorica, Montenegro.
Abstract: The invitation to present a keynote at the VIVO Conference and the goal of the VIVO platform, as stated on the DuraSpace site, to create an integrated record of the scholarly work of an organisation reminded me of various efforts that I have been involved in over the past years that had similar goals. EgoSystem (2014) attempted to gather information about postdocs that had left the organisation, leaving little or no contact details behind. Autoload (2017), an operational service, discovers papers by organisational researchers in order to upload them in the institutional repository. myresearch.institute (2018), an experiment that is still in progress, discovers artefacts that researchers deposit in web productivity portals and subsequently archives them. More recently, I have been involved in thinking about the future of NARCIS, a portal that provides an overview of research productivity in The Netherlands. The approaches taken in all these efforts share a characteristic motivated by a desire to devise scalable and sustainable solutions: let machines rather than humans do the work. In this talk, I will provide an overview of these efforts, their motivations, the challenges involved, and the nature of success (if any).
Presentation for PIDapalooza 2019, Dublin, Ireland.
The Scholarly Orphans project, funded by the Andrew W. Mellon Foundation, explores technical approaches aimed at capturing and archiving scholarly artifacts that researchers deposit in web productivity portals as a means to collaborate and communicate with their peers. These artifacts are not collected by other frameworks aimed at archiving the scholarly record (e.g., LOCKSS, Portico, Institutional Repositories) and are only incidentally captured by web archives. The project explores an institution-driven approach inspired by web archiving. To demonstrate the ongoing thinking, the project has devised an experimental automated pipeline that continuously discovers, captures, and archives artifacts. These are created by actual researchers who, for the purpose of the experiment, were virtually enlisted in a fictive research institution. A portal at myresearch.institute provides an overview of the artifacts that were discovered and provides access to archived versions stored in both an institutional and a cross-institutional archive. The set-up leverages a range of technologies that share a flavor of persistence: Memento, Memento Tracer, Robust Links, Signposting.
As a memento of my last week of working at LANL, I put together a slide deck that provides an overview of major efforts conducted during the time I was there.
Presentation given at EuropeanaTech 2018 in Rotterdam, The Netherlands. Provides a summary of insights gained from working for about a decade on challenges related to temporal aspects of the web, persistence.
Looks at hyperlinks from the perspective of a managed collection of resources for which link persistence/integrity is considered a quality of service concern. Distinguishes between links into other managed collections and to the web at large. Considers link rot and content drift.
The slides were used to accompany an overview of the outcomes of the ResourceSync project at the 2014 Spring Membership Meeting of the Coalition for Networked Information (CNI).
The launch of ResourceSync, a joint project of the National Information Standards Organization (NISO) and the Open Archives Initiative (OAI) funded by the Alfred P. Sloan Foundation, was motivated by the ubiquitous need to synchronize resources for applications in the realm of cultural heritage and research communication. After an initial problem definition and scoping phase, the project has designed, specified, and tested a framework for web-based synchronization that is based on SiteMaps, a protocol widely used by web servers to advertise the resources they make available to search engines for indexing. This choice allows repositories to address both search engine optimization and resource synchronization needs using the same technology.
The ResourceSync framework specifies various modular capabilities that a repository can support in order to allow third party systems to remain synchronized with its evolving resources. For example, a Resource List provides an inventory of resources whereas a Change List details resources that were created, deleted or updated during a given temporal interval. Support for capabilities can be combined in order to meet local or community requirements. The framework specifies capabilities that require a third party to recurrently poll for up-to-date information about a repository's resources but also publish/subscribe capabilities that keep third parties informed about changes through notifications, thereby significantly reducing synchronization latency.
Persistent Identifiers and the Web: The Need for an Unambiguous Mapping
Herbert Van de Sompel
Presentation given at the International Digital Curation Conference in San Francisco, February 26 2014. Highlights the lack of machine-actionability of persistent identifiers assigned to scholarly communication assets. Proposes an approach to address the issue that meets requirements that take into account the changing nature of web based research communication. A draft paper provides more details: http://public.lanl.gov/herbertv/papers/Papers/2014/IDCC2014_vandesompel.pdf
Presentation given at the EMTACL12 conference in Trondheim, Norway, on October 1 2012. Discusses the evolution towards a highly dynamic scholarly record (assets don't have the sense of fixity they used to have; assets are highly interdependent) and how the archiving infrastructure used for scholarly communication cannot adequately deal with this dynamism.
This presentation provides a problem perspective from the recently launched NISO/OAI ResourceSync effort that aims at devising a framework for synchronizing web resources. The slides were used during a WebEx conference on March 6 2012.
Hiberlink: Investigating Reference Rot, December 2013
1. Investigating Reference Rot in Web-Based Scholarly Communication
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
Martin Klein
Los Alamos National Laboratory
@mart1nkle1n
http://hiberlink.org #hiberlink
http://mementoweb.org #memento
Hiberlink is funded by the Andrew W. Mellon Foundation
2. Hiberlink Project Partners
• Los Alamos National Laboratory:
• Research Library: Martin Klein, Robert Sanderson, Herbert Van
de Sompel
• University of Edinburgh:
• Edina: Peter Burnhill, Neil Mayo, Muriel Mewissen, Christine
Rees, Tim Stickland, Richard Wincewicz
• Language Technology Group: Beatrice Alex, Claire Grover,
Richard Tobin, Ke “Adam” Zhou
• Funding: Andrew W. Mellon Foundation
Herbert Van de Sompel, Martin Klein – Hiberlink
CNI Fall 2013, Washington, DC, December 9 2013
3. Acknowledgments
• Primary datasets: arXiv, Chesapeake Project, Elsevier, PubMed
Central, PLoS, … (many more to come)
• Secondary datasets: Ex Libris, MS Academic, SerialsSolutions
• Technology support: CrossRef Labs, CrossRef Prospect, Elsevier
• Liaisons: archive.is, CrossRef, Internet Archive, Old Dominion
University Web Science & Digital Library Research Group, perma.cc
4. Reference Rot
5. Problem Domain
• Web-based scholarly communication links to, references, Web
resources:
• Formal citing of scholarly resources
• Referencing “Web at Large” resources needed or created in
research activities e.g. project websites, software, ontologies,
workflows, online debate, slides, blogs, videos, etc.
6. Problem Domain
• Links to web resources are subject to Reference Rot:
• Link Rot: Link stops working, e.g. HTTP 404
• Content Decay: Linked content changes over time
7. References in Web-Based Scholarly Communication
To Scholarly Resources
To Web at Large Resources
Link Rot
Content Decay
an increasingly blurry boundary
8. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
To Web at Large Resources
DOI, HTTP version of DOI
Content Decay
9. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
To Web at Large Resources
Fixity of content
10. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
To Web at Large Resources
Fixity of content
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
11. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
To Web at Large Resources
Fixity of content
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
There are issues here too, see David Rosenthal blog post http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html
12. References to Scholarly Resources
• We hope/assume that peer-reviewed scholarly literature has fixity
and is adequately archived
• This, BTW, might not be a correct assumption:
• Dynamic, content rich, landing pages
• No public audit regarding archival status of electronic journal
literature archived in special-purpose infrastructure
• Poor archiving in public web archives, related to protected
content
• Initial information in Keepers Registry indicates spotty archiving
of electronic journal literature
• … Still, this is NOT what Hiberlink investigates
See David Rosenthal blog post http://blog.dshr.org/2013/11/patio-perspectives-at-anadp-ii.html
13. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
To Web at Large Resources
Fixity of content
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
Hiberlink focus
14. References to “Web at Large” Resources
• Hiberlink focuses on the wide variety of web resources needed or
created in research activities
• These resources:
• Are not necessarily under the custodianship of a party that cares
about long term integrity, access
• Do not necessarily have the same sense of fixity that e.g.
journal articles have
• Reference Rot makes it impossible to adequately recreate the
temporal context for scholarly discourse
15. Herbert Van de Sompel, et al. (2004) http://dx.doi.org/10.1045/september2004-vandesompel
17. Hiberlink: Investigating Reference Rot
• Hiberlink explores references to Web at Large resources:
• Quantifies Reference Rot
• Explores potential solutions to Reference Rot
• Focuses on links in electronic journal articles
• But has the big picture in mind: dynamic, interdependent,
web-based scholarly assets
• See Herbert Van de Sompel, From the Version of
Record to a Version of the Record, CNI Spring 2013
plenary talk - http://www.youtube.com/watch?v=fhrGSQbNVA
18. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
To Web at Large Resources
Fixity of content
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
Is it worth our time to study this?
19. Articles Increasingly Link to Web Resources
URIs extracted from PubMed papers – links to Web at Large resources
20. The New York Times Cares
http://www.nytimes.com/2013/09/24/us/politics/in-supreme-court-opinions-clicks-that-lead-nowhere.html
21. Reference Rot in Law Journals
Zittrain, J., Albert, K., Lessig, L. (2013) Perma: Scoping and
Addressing the Problem of Link and Reference Rot in Legal
Citations
• Link rot in Law Journals: ~27%
• Reference rot in law journals: ~70%
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2329161
22. Not Just in Scholarly Communication
Zittrain, J., Albert, K., Lessig, L. (2013) Perma: Scoping and
Addressing the Problem of Link and Reference Rot in Legal
Citations
Liebler, R., Liebert, J. (2012) Something rotten in the State of Legal
Citation
• Link rot: 29% of links in Supreme Court decisions (study of 1996-2010)
• Reference rot, including link rot: 49.9% of links in Supreme Court
decisions
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2329161
http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2188070
23. Not Just in Scholarly Communication
http://en.wikipedia.org/wiki/Wikipedia_talk:Link_rot
25. Quantifying Reference Rot
• Reference Rot has been studied before:
• For the web at large
• For scholarly communication
• For government documents
• What is different with Hiberlink?
• Investigates Reference Rot not just link rot, i.e. includes the
aspect of changing content not just rotting links
• Investigates coverage of referenced resources in web archives
• Operates at a massive scale regarding number of journal
articles, referenced URIs, web archive lookups
26.
Study: Author (Date) | Year of Publication of Citations | # URIs | # URIs looked up in web archives
Lawrence (2001) | 1993-1999 | 67,577 |
Casserly (2003) | 1999-2000 | 500 | 500
Casserly (2007) | 1999-2000 | 500 | 500
Rumsey (2002) | 1997-2001 | 3,406 |
Davis (2002) | 1999-2001 | 688 |
Wren (2004) | 1994-2002 | 1,630 |
Sellitto (2005) | 1995-2003 | 1,043 |
Goh (2005) | 1997-2003 | 2,516 |
Dimitrova (2007) | 2000-2003 | 1,126 |
McCown (2005) | 1995-2004 | 4,387 |
Wagner (2009) | 2002-2004 | 2,011 | 2,011
Parker (2007) | 2002-2005 | 1,229 |
Duda (2008) | 1997-2005 | 2,100 |
Falagas (2007) | 2003-2006 | 1,417 |
Russell (2008) | 1999-2006 | 510 |
Wren (2008) | 1994-2007 | 6,154 |
Moghaddam (2010) | 1995-2008 | 1,761 | 1,761
Sanderson (2011) | 1993-2010 | 162,052 | 162,052
Sanderson, R., Phillips, M., and Van de Sompel, H. (2011) http://arxiv.org/abs/1105.3459
27. Quantifying Reference Rot - Methodology
32. • Filter DOIs, HTTP version of DOIs
• Filter URIs that should have been
referenced by means of a DOI
• Supported by secondary
datasets
• Filter obvious noise, e.g. localhost,
example.org, foo.bar, licenses, etc.
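The filtering step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the host lists are assumptions (only localhost, example.org and foo.bar come from the slide; the DOI resolver hosts and the license host are guesses), the function name is hypothetical, and the filter for "URIs that should have been referenced by means of a DOI" is left out because it depends on the secondary datasets.

```python
from urllib.parse import urlsplit

# Hosts treated as DOI resolvers (assumption for this sketch).
DOI_HOSTS = {"doi.org", "dx.doi.org"}
# Hosts treated as obvious noise, taken from the examples on the slide.
NOISE_HOSTS = {"localhost", "example.org", "foo.bar"}
# Hosts treated as license pages (assumption; the slide only says "licenses").
LICENSE_HOSTS = {"creativecommons.org"}


def is_web_at_large_candidate(uri: str) -> bool:
    """Return True if the URI survives the DOI and noise filters."""
    text = uri.strip()
    if text.lower().startswith("doi:"):
        return False  # bare DOI rather than an HTTP URI
    parts = urlsplit(text)
    if parts.scheme not in ("http", "https"):
        return False
    host = (parts.hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    if not host:
        return False
    return host not in (DOI_HOSTS | NOISE_HOSTS | LICENSE_HOSTS)
```

URIs that pass this check would then go on to the existence and archive lookups described on the following slides.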
33.
34. • HTTP HEAD on referenced URI-R
• Follow redirects up to a maximum
of 50
• Record HTTP transaction chain
• If HTTP transaction chain ends with
2XX status code: Exists
• If HTTP transaction chain does not
end with 2XX: !Exist
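The existence check above can be sketched as follows. This is an illustration under stated assumptions rather than the project's implementation: the HEAD request is passed in as a function (a hypothetical `head` callable returning a status code and an optional Location value) so the logic can be shown without network access, and the names `head_chain` and `exists` are made up for this sketch.

```python
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin

# A HEAD function takes a URI and returns (status code, Location header or None).
HeadFn = Callable[[str], Tuple[int, Optional[str]]]

MAX_REDIRECTS = 50  # limit stated on the slide


def head_chain(uri: str, head: HeadFn,
               max_redirects: int = MAX_REDIRECTS) -> List[Tuple[str, int]]:
    """Follow redirects from uri and return the recorded (URI, status) chain."""
    chain: List[Tuple[str, int]] = []
    current = uri
    # One initial request plus at most max_redirects followed redirects.
    for _ in range(max_redirects + 1):
        status, location = head(current)
        chain.append((current, status))
        if 300 <= status < 400 and location:
            current = urljoin(current, location)  # resolve relative Location values
        else:
            break
    return chain


def exists(chain: List[Tuple[str, int]]) -> bool:
    """A URI counts as Exists only if the chain ends with a 2XX status code."""
    return bool(chain) and 200 <= chain[-1][1] < 300
```

In practice `head` could wrap an HTTP client call with automatic redirect handling switched off, so that every hop lands in the recorded chain. A chain that is still redirecting when the limit is reached ends on a 3XX status and is therefore classified as !Exist.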
35. • Lookup in web archives via a
Memento Aggregator that covers
among others Internet Archive,
Archive-It, archive.is, British
Library web archive, UK National
Archives web archive, Icelandic
web archive
36. • Obtain TimeMap per URI
• If TimeMap does not exist:
!Archived
• If TimeMap exists, select
Memento URI-M closest to
article publication date
• HTTP HEAD on URI-M
• Follow archived redirects
up to a maximum of 50
• Record HTTP transaction
chain
• If HTTP transaction chain
ends 2XX: Archived
• If HTTP transaction chain
does not end with 2XX:
!Archived
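The Memento selection step above can be sketched as follows. It assumes the TimeMap has already been fetched and parsed into (datetime, URI-M) pairs; the function names are hypothetical, and treating "within N days" as a two-sided window around the publication date is an assumption, since the slides do not say whether the window is one-sided.

```python
from datetime import datetime, timedelta
from typing import List, Optional, Tuple

Memento = Tuple[datetime, str]  # (archival datetime, URI-M)


def closest_memento(mementos: List[Memento],
                    published: datetime) -> Optional[Memento]:
    """Return the Memento whose datetime is nearest to the publication date."""
    if not mementos:
        return None  # no TimeMap entries: treated as !Archived
    return min(mementos, key=lambda m: abs(m[0] - published))


def archived_within(memento: Optional[Memento],
                    published: datetime, days: int) -> bool:
    """True if the Memento was captured within `days` of the publication date."""
    if memento is None:
        return False
    return abs(memento[0] - published) <= timedelta(days=days)
```

The selected URI-M would still need the HEAD check described on the slide before the reference is counted as Archived; `archived_within` only covers the 30, 14, 7 and 1 day windows used in the results.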
38. Quantifying Reference Rot – Early Results
[Figure: charts with legend !Exist, Archived, Archived within 30 days, Archived within 14 days, Archived within 7 days, Archived within 1 day; axes: "Amount of citations", years 1997 to 2011, and "Weeks"; percentages shown: 31.2%, 16.8%, 11.3%, 40.7%]
39. Study: PubMed Central Corpus 01/1997 – 12/2012
• Articles processed: 494,785
• Articles that contain Web at Large URIs: 176,527
• References to Web at Large URIs: 557,432
• Unique referenced Web at Large URIs: 327,782
40. Percentage Exists & Archived Referenced URIs
[Pie chart: categories Exists & Archived, !Exists & Archived, Exists & !Archived, !Exists & !Archived; values shown (pairing with categories not preserved): 31.2%, 16.8%, 11.3%, 40.7%]
URIs extracted from PubMed papers – links to Web at Large resources
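The four categories in this breakdown combine the two per-URI outcomes (Exists, Archived). A minimal sketch of how such percentages could be tallied, using hypothetical function names and made-up inputs rather than the study's data:

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def label(exists: bool, archived: bool) -> str:
    """Build the category label used on the slides, e.g. '!Exists & Archived'."""
    left = "Exists" if exists else "!Exists"
    right = "Archived" if archived else "!Archived"
    return f"{left} & {right}"


def percentages(outcomes: Iterable[Tuple[bool, bool]]) -> Dict[str, float]:
    """Map each category to its share of all (exists, archived) outcomes."""
    counts = Counter(label(e, a) for e, a in outcomes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: round(100.0 * n / total, 1) for name, n in counts.items()}
```

Running the same tally with the Archived flag restricted to a 30, 14, 7 or 1 day window would give the per-window breakdowns on the following slides.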
41. Percentage Exists & Archived in 30 Day Window
[Pie chart: categories Exists & Archived, !Exists & Archived, Exists & !Archived, !Exists & !Archived; values shown (pairing with categories not preserved): 23%, 16.7%, 5.1%, 55.2%]
URIs extracted from PubMed papers – links to Web at Large resources
42. Percentage Exists & Archived in 15 Day Window
[Pie chart: categories Exists & Archived, !Exists & Archived, Exists & !Archived, !Exists & !Archived; values shown (pairing with categories not preserved): 24.6%, 12.4%, 3.5%, 59.5%]
URIs extracted from PubMed papers – links to Web at Large resources
43. Percentage Exists & Archived in 07 Day Window
[Pie chart: categories Exists & Archived, !Exists & Archived, Exists & !Archived, !Exists & !Archived; values shown (pairing with categories not preserved): 25.8%, 8.8%, 2.3%, 63.1%]
URIs extracted from PubMed papers – links to Web at Large resources
44. Percentage Exists & Archived in 01 Day Window
[Pie chart: categories Exists & Archived, !Exists & Archived, Exists & !Archived, !Exists & !Archived; values shown (pairing with categories not preserved): 27.9%, 0.9%, 0.2%, 71%]
URIs extracted from PubMed papers – links to Web at Large resources
45. Percentage of !Exists per Year
[Chart: y-axis "Percent" (0 to 100); x-axis years 1997 to 2011]
URIs extracted from PubMed papers – links to Web at Large resources
46. Percentage of !Exists, Archived per Year
[Chart: y-axis 0 to 100; x-axis years 1997 to 2011; legend: !Exist, Archived, Archived within 30 days, Archived within 14 days, Archived within 7 days, Archived within 1 day]
URIs extracted from PubMed papers – links to Web at Large resources
47. Percentage of !Exists and of Those Archived per Year
[Chart: axes "Percentage !Exists URIs" and "Percentage Archived URIs for !Exists URIs" (percent, 0 to 100); x-axis years 1997 to 2011; legend: !Exist, Archived, Archived within 30 days, Archived within 14 days, Archived within 7 days, Archived within 1 day]
URIs extracted from PubMed papers – links to Web at Large resources
48. Absolute Number of Archived per Year
[Figure: y-axis ticks 1, 100, 1000, 10000, 30000; x-axis years 1997 to 2011. Series: Archived, Archived within 30 days, Archived within 14 days, Archived within 7 days, Archived within 1 day]
URIs extracted from PubMed papers – links to Web at Large resources
49. Solving Reference Rot
50. References in Web-Based Scholarly Communication
To Scholarly Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
Fixity of content
To Web at Large Resources
-
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
51. Addressing Content Decay
• Aim for a more pro-active approach to collect snapshots of web
resources (likely to be) referenced in scholarly communication
• A system that hosts resources that are likely to be referenced in
scholarly communication can create snapshots of itself by:
o Using CMS, wikis, datawikis with solid versioning
mechanisms
o Subscribing to on-demand self web archiving service
o Using transactional web archives, cf. SiteStory
• Referenced resources can be web archived on-demand:
o By authors during note taking, authoring
o By platforms involved in the publication process, e.g.
archiving linked resources at the time of manuscript
submission
52. References in Web-Based Scholarly Communication
To Scholarly Resources
To Web at Large Resources
Link Rot
DOI, HTTP version of DOI
Content Decay
Fixity of content
-
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
Web archiving
Content Versioning Systems
Self archiving
53. Click link to blog post
http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
55. Search and find Mementos in Internet Archive for
http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
56. Search and find a Memento in archive.is for
http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
57. Click perma.cc link to Memento of blog post
http://perma.cc/0Hg62eLdZ3T
58. Receive Memento from perma.cc
http://perma.cc/0Hg62eLdZ3T
59. Search and do not find Mementos in Internet Archive for
http://perma.cc/0Hg62eLdZ3T
60. Search and do not find Mementos in archive.is for
http://perma.cc/0Hg62eLdZ3T
61. What Happened?
• Good news: The number of archived copies of the blog post was
increased by pro-actively creating a Memento in perma.cc
• Bad news: The possibility of finding Mementos for the blog post
in other web archives was undermined by replacing the Original
URI-R with the Memento URI-M
• The Memento URI-M is a key in only one archive
• The Original URI-R is a key in all web archives
• Using the Memento URI-M in a link requires the permanent
existence/uptime of the archive that issued it
• One link rot problem was replaced by another …
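The difference between the two kinds of keys can be illustrated with a minimal Python sketch. It assumes, based only on the archive.is and Internet Archive URIs shown elsewhere in this deck, that those archives embed a 14-digit timestamp and the Original URI-R in the Memento URI-M. The function name and the regular expression are illustrative and not part of any archive's API.

```python
import re
from typing import Optional

# Assumption: Wayback-style URI-Ms look like
#   http://<archive>/[web/]<14-digit timestamp>/<Original URI-R>
_EMBEDDED = re.compile(r"^https?://[^/]+/(?:web/)?(\d{14})/(https?://.+)$")


def original_from_memento(uri_m: str) -> Optional[str]:
    """Return the URI-R embedded in a URI-M, or None if the URI-M is opaque."""
    match = _EMBEDDED.match(uri_m)
    return match.group(2) if match else None


EXAMPLES = [
    "http://archive.is/20131124221749/http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/",
    "http://web.archive.org/web/20100918025331/http://librarylab.law.harvard.edu",
    "http://perma.cc/0Hg62eLdZ3T",
]

for uri_m in EXAMPLES:
    # An opaque URI-M yields None: nothing is left to look up in other archives.
    print(uri_m, "->", original_from_memento(uri_m))
```

For the perma.cc URI-M the sketch returns None, which is the situation described above: without the Original URI-R, a client has no key to try in any other archive.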
62. Web Archives Less Permanent than Permanent?
http://webcitation.org
63. Web Archives Less Permanent than Permanent?
http://ws-dl.blogspot.com/2013/11/2013-11-21-conservative-party-speeches.html
64. Web Archives Less Permanent than Permanent?
http://richmondsfblog.com/2013/11/06/part-of-internet-archive-building-badly-burned-in-earlymorning-fire/
65. What To Do?
• Need an approach for referencing archived resources that
supports lookups in many web archives, not just one
• Since the Original URI-R is a key in all web archives, the linking
approach must include it
• Hence, two URIs are required:
• The Original URI-R
• The Memento URI-M, e.g. the perma.cc URI
• But a link in HTML only carries one URI!
• It is understandable that the Memento URI-M is used for the
link: the approach works with existing web infrastructure
• Yet, an approach to address link rot that itself is subject to
link rot is … err… problematic
66. The Missing Link Proposal
• Extend the link to the Original URI-R with temporal context:
• Memento URI-M in a specific archive
• Dates:
• date of page that contains the link
• date of the link, cf. “accessed at” in citations of web
resources
• Provide the Original URI-R and the temporal context in a
machine-actionable manner so it can be used by user and
machine agents to retrieve Mementos from various web archives
http://mementoweb.org/missing-link/
67. The Missing Link Proposal
http://mementoweb.org/missing-link/
68. How to Make Missing Link Happen?
• The existing approach works out of the box but is problematic
• Missing Link requires infrastructure changes but generally
contributes to increased web persistence:
• HTML
• META for page date: no problem, already in use
• Attributes for <a> to convey URI-M and link date:
• data- extensibility mechanism in HTML5 can be
used but is not intended for cross-site applications
• In 1995, HTML had the URN attribute for <a> as a
means to address web persistence concerns
• Browser, tool support
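A minimal Python sketch of link decoration with the data- mechanism. It assumes the data-versionurl and data-versiondate attribute names that appear in the demo slides of this deck; the helper decorate_link is hypothetical and only shows how the Original URI-R stays in href while the temporal context travels alongside it.

```python
from html import escape
from typing import Optional


def decorate_link(uri_r: str, text: str,
                  version_url: Optional[str] = None,
                  version_date: Optional[str] = None) -> str:
    """Build an <a> element that keeps the Original URI-R in href.

    version_url  -> data-versionurl  (Memento URI-M in a specific archive)
    version_date -> data-versiondate (date of the link, e.g. "2010-09-22")
    """
    attrs = [f'href="{escape(uri_r, quote=True)}"']
    if version_url:
        attrs.append(f'data-versionurl="{escape(version_url, quote=True)}"')
    if version_date:
        attrs.append(f'data-versiondate="{escape(version_date, quote=True)}"')
    return f"<a {' '.join(attrs)}>{escape(text)}</a>"


print(decorate_link(
    "http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/",
    "blog post",
    version_url="http://perma.cc/0Hg62eLdZ3T",
))
```

A link without temporal context degrades to a plain `<a href="URI-R">`, so existing browsers keep working unchanged.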
69. References in Web-Based Scholarly Communication
To Scholarly Resources
To Web at Large Resources
Link Rot
DOI, HTTP version of DOI
Missing Link proposal
Content Decay
Fixity of content
-
Archiving: CLOCKSS, LOCKSS, Portico, Keepers Registry, …
Web archiving
Content Versioning Systems
Self archiving
70. Demo: Application Using Temporal Context for Links
71. Application Using Temporal Context for Links
• Memento for Chrome is an application that uses Original URI-R
and dates to access Mementos in various web archives
• Memento around the date selected in user interface
calendar
• Most recently archived Memento
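As a rough sketch of how a client can combine an Original URI-R with a date: the Memento protocol (RFC 7089) negotiates in the datetime dimension by sending an Accept-Datetime request header to a TimeGate. The TimeGate base URI below is an assumption chosen for illustration. The sketch only builds the request and does not contact any archive, nor does it claim to reproduce how Memento for Chrome is implemented.

```python
from datetime import datetime, timezone
from email.utils import format_datetime
from typing import Dict, Tuple

# Assumption: base URI of a TimeGate that accepts the URI-R appended to it.
TIMEGATE = "http://timetravel.mementoweb.org/timegate/"


def timegate_request(uri_r: str, when: datetime) -> Tuple[str, Dict[str, str]]:
    """Return (request URI, headers) for datetime negotiation on uri_r."""
    when_utc = when.astimezone(timezone.utc)
    # Accept-Datetime uses the HTTP date format, always expressed in GMT.
    headers = {"Accept-Datetime": format_datetime(when_utc, usegmt=True)}
    return TIMEGATE + uri_r, headers


url, headers = timegate_request(
    "http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/",
    datetime(2013, 9, 23, tzinfo=timezone.utc),
)
print(url)
print(headers)
```

Asking for the most recently archived Memento amounts to the same request with the current time as the date.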
72. Memento Time Travel for Chrome
http://bit.ly/memento-for-chrome
73. Memento Time Travel for Chrome
http://www.youtube.com/watch?v=0_70lQPOOIg
http://www.youtube.com/watch?v=WtZHKeFwjzk
74. Application Using Temporal Context for Links
• An experimental version of Memento for Chrome also uses
Missing Link information (Original URI-R, URI-M, and dates) to
access Mementos in various web archives:
• Memento around the date selected in user interface calendar
• Most recently archived Memento
• Memento around the date of the page that contains the link
• Memento around the date of the link
• Memento URI-M in a specific archive
• A Memento client is just one example of an application that can
use temporal context provided for links. Other applications,
including search engines, can use it too
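The following Python sketch suggests how any application, not only a Memento client, might read Missing Link information from a page. It assumes the markup conventions used in the demo slides of this deck (META itemprop="datePublished" for the page date, data-versionurl and data-versiondate on links); the class and function names are hypothetical.

```python
from html.parser import HTMLParser


class MissingLinkParser(HTMLParser):
    """Collect the page date and the temporal context of each link."""

    def __init__(self):
        super().__init__()
        self.page_date = None
        self.links = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        a = dict(attrs)
        if tag == "meta" and a.get("itemprop") == "datePublished":
            self.page_date = a.get("content")
        elif tag == "a" and "href" in a:
            self.links.append({
                "uri_r": a["href"],
                "uri_m": a.get("data-versionurl"),
                "link_date": a.get("data-versiondate"),
            })


def access_options(link, page_date):
    """List the ways a client could resolve this link, per the slide above."""
    options = ["Memento around a user-selected date", "most recently archived Memento"]
    if page_date:
        options.append(f"Memento around the page date {page_date}")
    if link["link_date"]:
        options.append(f"Memento around the link date {link['link_date']}")
    if link["uri_m"]:
        options.append(f"Memento {link['uri_m']} in a specific archive")
    return options


PAGE = (
    '<META itemprop="datePublished" content="2013-09-23">\n'
    '<a href="http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/" '
    'data-versionurl="http://perma.cc/0Hg62eLdZ3T">blog post</a>'
)

parser = MissingLinkParser()
parser.feed(PAGE)
for link in parser.links:
    print(link["uri_r"])
    for option in access_options(link, parser.page_date):
        print("  -", option)
```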
75. NYT has <META itemprop="datePublished" content="2013-09-23">
Link in NYT was:
<a href="http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/">
Changed to:
<a href="http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/"
data-versionurl="http://perma.cc/0Hg62eLdZ3T">
76. Right Click Link Get near current time (done on Nov 25 2013)
http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
enabler: <a href="URI-R">
77. Receive Memento from archive.is, Nov 24 2013
http://archive.is/20131124221749/http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
78. Right Click Link Get at page date
http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
enabler: <a href="URI-R"> & <META itemprop="datePublished" content="2013-09-23">
79. Receive Memento from Internet Archive, Sep 24 2013
http://web.archive.org/web/20130924053315/http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
80. Right Click Link Get from perma.cc
http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/
enabler: <a href="URI-R" data-versionurl="URI-M">
81. Receive Memento from perma.cc, Oct 2 2013
http://perma.cc/0Hg62eLdZ3T
82. Link in NYT was:
<a href="http://perma.cc/0Hg62eLdZ3T">
Changed to:
<a href="http://blogs.law.harvard.edu/futureoftheinternet/2013/09/22/perma/"
data-versionurl="http://perma.cc/0Hg62eLdZ3T">
83. All previous options available
85. Click Link (done on November 25 2013)
http://en.wikipedia.org/wiki/Link_rot
enabler: <a href="URI-R">
87. Scroll down in page
Shows Perma.cc link, added October 22 2013, a month after the blog post
88. Right Click Link Get at page date
http://en.wikipedia.org/wiki/Link_rot
enabler: <a href="URI-R"> & <META itemprop="datePublished" content="2013-09-22">
90. Scroll down in page
Does not show Perma.cc link, added October 22 2013, a month after the blog post
91. Link in blog was:
<a href="http://librarylab.law.harvard.edu">
Changed (for fun) to:
<a href="http://librarylab.law.harvard.edu" data-versiondate="2010-09-22">
92. Click Link (done on November 25 2013)
http://librarylab.law.harvard.edu
enabler: <a href="URI-R">
94. Right Click Link Get at page date
http://librarylab.law.harvard.edu
enabler: <a href="URI-R"> & <META itemprop="datePublished" content="2013-09-22">
95. Receive Memento from archive.is, Jun 21 2013
http://archive.is/20130621162538/http://librarylab.law.harvard.edu
96. Right Click Link Get at link date
http://librarylab.law.harvard.edu
enabler: <a href="URI-R" data-versiondate="2010-09-22">
97. Receive Memento from Internet Archive, Sep 18 2010
http://web.archive.org/web/20100918025331/http://librarylab.law.harvard.edu
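Why the same link leads to different Mementos for the page date and for the link date can be sketched in a few lines of Python. The sketch is illustrative only: it compares just the two URI-Ms shown in this demo, and it assumes the 14-digit timestamp embedded in them is the Memento datetime. An actual client would let a TimeGate make this selection across everything the archives hold.

```python
import re
from datetime import datetime
from typing import List, Optional

CANDIDATES = [
    "http://archive.is/20130621162538/http://librarylab.law.harvard.edu",
    "http://web.archive.org/web/20100918025331/http://librarylab.law.harvard.edu",
]


def memento_datetime(uri_m: str) -> Optional[datetime]:
    """Parse the 14-digit timestamp assumed to be embedded in the URI-M."""
    match = re.search(r"/(\d{14})/", uri_m)
    return datetime.strptime(match.group(1), "%Y%m%d%H%M%S") if match else None


def closest_memento(uri_ms: List[str], target: datetime) -> Optional[str]:
    """Return the URI-M whose datetime is nearest to the target date."""
    dated = [(abs(memento_datetime(u) - target), u)
             for u in uri_ms if memento_datetime(u) is not None]
    return min(dated)[1] if dated else None


# Page date (datePublished) versus link date (data-versiondate).
print(closest_memento(CANDIDATES, datetime(2013, 9, 22)))
print(closest_memento(CANDIDATES, datetime(2010, 9, 22)))
```

With the page date as target the archive.is capture of June 2013 is nearest, and with the link date the Internet Archive capture of September 2010 is nearest, matching the two outcomes shown in the demo.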
98. Bottom Line: A Link Leads to Many Times and Archives
http://mementoweb.org/missing-link/
99. Investigating Reference Rot in Web-Based Scholarly Communication
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
Martin Klein
Los Alamos National Laboratory
@mart1nkle1n
http://hiberlink.org #hiberlink
http://mementoweb.org #memento
Hiberlink is funded by the Andrew W. Mellon Foundation
Editor's Notes
The basic consideration in the talk is that life used to be simple when scholarly assets were PDFs: single frozen assets
The problem affects scholarly communication, legal journals, Supreme Court opinions, Wikipedia, … Since the problem is so broad, we need a solution that works for the web at large, not just for scholarly communication
Quote from Wagner et al.: Because sites such as Internet Archive and WebCite will remove archived web pages at the owners’ request, authors should not depend on these utilities as the sole archives for web-based information.