SlideShare a Scribd company logo
1 of 61
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Martin Klein
Los Alamos National Laboratory
@mart1nkle1n
https://orcid.org/0000-0003-0130-2097
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
https://orcid.org/0000-0002-0715-6126
A Web-Centric Pipeline for Archiving Scholarly Artifacts
The Scholarly Orphans project
is funded by the Andrew W. Mellon Foundation
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Scholarly Orphans – Project Motivation
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
• Consideration
• Researchers are increasingly using a variety of web platforms for
collaboration and communication
• Why?
• Many of these platforms have desirable characteristics
• Versioning
• Time stamping
• Social embedding
• Their institutions do not provide platforms that have global reach
• Collaboration, cf. Github ~ productivity
• Communication, cf. SlideShare ~ visibility
Research and Research Communication on the Web
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Emma Schymanski
https://orcid.org/0000-0001-6868-8145
https://github.com/schymane
https://www.slideshare.net/EmmaSchymanski
https://figshare.com/authors/Emma_Schymanski/5087039
https://publons.com/author/1538491/emma-schymanski#profile
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Shawn Jones
https://orcid.org/0000-0002-4372-870X
http://www.shawnmjones.org/
https://github.com/shawnmjones
https://www.slideshare.net/shawnmjones
https://en.wikipedia.org/wiki/User:Shawnmjones
https://www.blogger.com/profile/17827543974149663194
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
• Consideration
• Researchers deposit artifacts in these web platforms
• Web Platforms:
• Dedicated to scholarship:
• Commercial: e.g., FigShare, Publons
• Not for profit: e.g., OSF, Zenodo
• General purpose:
• Commercial: e.g., GitHub, SlideShare
• Not for profit: e.g., Wikipedia, Wikidata
Research and Research Communication on the Web
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
• Consideration
• Researchers deposit artifacts in these web platforms
• Status quo - The researchers’ institutions commonly:
• Do not know about the existence of these artifact
• Do not have a copy of these artifacts
Research and Research Communication on the Web
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
• Consideration
• Researchers deposit artifacts in these web platforms
• Status quo – Uncertainty regarding long-term accessibility of these
artifacts:
• General purpose platforms don’t provide long-term access
guarantees; platforms dedicated to scholarship commonly do
• Uncertainty regarding the sustainability of unhindered long-
term access to artifacts in these platforms:
• Commercial: when is the change in business model
coming?
• Not for profit: will the next round of grant applications,
member contributions be successful?
Research and Research Communication on the Web
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
• Consideration
• Researchers deposit artifacts in these web platforms
• Status quo - These artifacts are not systematically archived:
• No frameworks like LOCKSS/Portico exist for these artifacts
• Researchers only selectively deposit artifacts in portals that
provide archival guarantees; to obtain a cite-able DOI
• Can’t expect researchers to (also) upload all artifacts in IRs
• Web archives only incidentally archive these artifacts
• Anecdotal & Hiberlink evidence
Research and Research Communication on the Web
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Emma’s SlideShare Artifact: 0 Mementos
https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge
http://timetravel.mementoweb.org/
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Shawn’s GitHub Artifact: 1 Memento
https://github.com/shawnmjones/mediawiki
http://web.archive.org/
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Hiberlink Evidence
Web resources referenced in Elsevier corpus (1996-2012)
without representative Memento in public web archives
Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE
https://doi.org/10.1371/journal.pone.0115253
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
The Need for an Archiving Infrastructure
Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
https://hvdsomp.info/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Recording versus Archiving
Recording Archiving
Short-term Longer-term
No guarantees provided Attempt to provide guarantees
Write many/read many Write once/Read many
Scholarly process Scholarly record
Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web
https://hvdsomp.info/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Scholarly Orphans – Project Overview
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
The Scholarly Orphans Project
• Funded by the Andrew W. Mellon Foundation
• Los Alamos National Laboratory & New Mexico Consortium
• Old Dominion University
• 04/2016 - 03/2019
• How to capture Scholarly Orphans (i.e., the scholarly artifacts
deposited in web portals) for long-term archiving?
• Experimental project, aimed at exploring technical possibilities
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
The Scholarly Orphans Project
• Explores an institution-driven paradigm
• Academic institutions typically have a long shelf life
• A basic premise underlying e.g., LOCKSS, perma.cc
• An academic institution should be interested in capturing the
artifacts (intellectual property) its scholars deposit on the web
• Collecting and archiving such artifacts aligns with the
mission of academic libraries
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
An Institutional Perspective
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
The Scholarly Orphans Project
• Explores a paradigm inspired by web archiving
• Scale of the problem
• Can’t expect researchers to upload all artifacts in an institutional
repository
• Bilateral agreements for archival purposes with most web
portals unlikely
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
A Web Archiving Perspective
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Inspiration
• LOCKSS
• Web crawling approach
• Focused on journal literature
• Archive-It
• On-demand, subscription-based web archiving
• Not focused on scholarly orphans
• Institutional repository, auto-discovery of journal articles
• Capture an institution’s output
• Focused on journal literature
• The Locker Project & Amy Guy’s Personal Web Observatory work
• Capture an individual’s web presence
• Not focused on scholarly orphans
http://rhiaro.co.uk/
https://rhiaro.github.io/thesis/
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Scholarly Orphans – Prototype Pipeline Overview
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Prototype Pipeline
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Prototype Pipeline
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Demo - myresearch.institute
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
myresearch.institute - Researchers
• Uniquely identified by ORCIDs
• Web identities in multiple portals
• Create various types of artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
myresearch.institute - Portals
• Tracking started August 27 2018
• Tracking artifacts created starting
August 1 2018
• >2,200 artifacts tracked to date
for all 16 researchers
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
myresearch.institute - Artifacts
• schema.org typology:
• Answer
• Article
• BlogPosting
• Comment
• Dataset
• PresentationDigitalDocument
• Question
• Review
• SoftwareSourceCode
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Tracking Artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Tracking Artifacts - Description
• In order to track artifacts that were recently deposited by an
institutional researcher in a portal, one reasonably needs:
• The web identity of the researcher in the portal
• Algorithmic discovery
• Discovery via a registry
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Algorithmic Discovery of Web Identities
James Powell, Harihar Shankar, Marko Rodriguez, and Herbert Van de Sompel (2014)
EgoSystem: Where are our alumni? In: code4lib http://journal.code4lib.org/articles/9519
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Martin Klein and Herbert Van de Sompel (2017)
Discovering Scholarly Orphans Using ORCID In: JCDL2017 https://arxiv.org/abs/1703.09343
Discovery of Web Identities via a Registry (ORCID)
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
https://orcid.org/0000-0002-4372-870X
Shawn’s ORCID Record
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
https://orcid.org/0000-0001-6868-8145
Emma’s ORCID Record
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Tracking Artifacts - Description
• In order to track artifacts that were recently deposited by an
institutional researcher in a portal, one reasonably needs:
• The web identity of the researcher in the portal
• Algorithmic discovery
• Discovery via a registry
• A portal API that supports:
• Access by web identity
• Access to contributions “since …” for the web identity
• Result of tracking:
• URI(s) of new artifact(s) discovered in the portal
Tracking Artifacts - Architecture
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Tracking Artifacts - Implementation
• Tracker event notifications:
• Linked Data Notifications (JSON-LD) using AS2, PROV-O,
schema.org
• Identifiers: Unique tracker event identifier per notification
• Dates: artifact publication date & artifact tracked date
• URIs: 1+ artifact URI
• Event database:
• Notifications stored/indexed in ElasticSearch
• Researcher database:
• SQLite
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Tracking Artifacts - Demo
Demo: https://myresearch.institute/
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Tracking Artifacts - Challenges
• Discovery of web identities of researchers
• Algorithmic, registry-based currently not adequate
• Fallback: manual discovery and entry
• With help of researcher
• Portal API access by web identity
• Broadly supported by general purpose portals
• Typically not supported by scholarly portals
• Some lack an API altogether
• Should add ORCID access to APIs
• OAI-PMH and ResourceSync need sets per web identity
• Professional versus personal contributions
• Tracking frequency/scale
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts - Description
• The capture process takes as input the URI of a new artifact
discovered in a portal
• Its task is to create a representative institutional capture of the
artifact
• Result of capture:
• WARC file for new artifact in an institutional archive
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts - Description
• Challenges:
• Delineate the web boundary of the artifact
• More than the input artifact URI
• The boundary is in the eye of the beholder
• Create a high-fidelity capture using an approach that scales for
a steady stream of new artifacts
• Unsolved problem
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Memento Tracer - Framework
http://tracer.mementoweb.org
Capturing Artifacts - Architecture
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts - Implementation
• Capture event notifications:
• Identifiers: Unique capture event identifier per notification ;
Preceding tracker event identifier conveyed as provenance
• Dates: Datetime of WARC file creation
• URIs: 1+ WARC file URI
• Tracer, client-side:
• Tracer Chrome extension leveraging Selenium IDE
• Tracer, server-side:
• Stormcrawler ; Selenium (Chrome) with Tracer plug-in ;
WarcProxy ; file-system storage for WARC files
http://stormcrawler.net/
https://www.seleniumhq.org/projects/webdriver/
https://github.com/odie5533/WarcProxy
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts - Demo
Demo: https://myresearch.institute/
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Capturing Artifacts - Challenges
• Memento Tracer:
• Language used to express Traces (interoperability)
• Organization of the shared repository for Traces
• Limitations of the browser event listener approach for recording
Traces
• Selection of a Trace for capturing a web publication by other
means than URI pattern
• Legal constraints
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Archiving Artifacts
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Archiving Artifacts - Description
• The archiving process takes as input the URI of a WARC file
generated by the capture process
• Its task is to ingest the WARC file in a cross-institutional web archive
• This can be achieved using off-the-shelf web archiving software,
e.g., pywb, Open Wayback
• Result of archiving:
• Mementos pertaining to newly discovered artifact in a cross-
institutional, Memento-compliant web archive
Archiving Artifacts - Architecture
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Archiving Artifacts - Implementation
• Archiver event notifications:
• Identifiers: Unique archiver event identifier per notification ;
preceding tracker/capturer event identifiers conveyed as
provenance
• Dates: WARC file ingest date ; Memento-Datetime values
URIs: 1+ Memento URI, each corresponding to an artifact URI
• Web Archive:
• pywb
• Social card:
• MementoEmbed
https://github.com/webrecorder/pywb
https://github.com/oduwsdl/MementoEmbed
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Archiving Artifacts - Demo
Demo: https://myresearch.institute/
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Archiving Artifacts - Challenges
• Attempted to use ipwb, a pywb version that uses IPFS
• Cross-institutional distributed file system with redundancy
• Ran out of time to get it operationally stable
Sawood Alam, Mat Kelly, and Michael L. Nelson (2016) InterPlanetary Wayback: The Permanent Web Archive
https://doi.org/10.1145/2910896.2925467
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Scholarly Orphans – Summary
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Summary (1/2)
• The Scholarly Orphans project explores an institution-driven
approach to capture scholarly artifacts deposited in web portals
• Artifacts out of scope of existing archival approaches such as
LOCKSS, Portico, web archives
• Institutions have a long shelf life, should be interested in
collecting these artifacts, and have feasible scale for
identity/artifact discovery
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Summary (2/2)
• Components of the experimental pipeline:
• Tracker: Automatically discover artifacts because researchers
will not upload them to the institution
• Capturer: High fidelity artifact captures through crowd-sourcing
navigation patterns with Memento Tracer
• Archiver: Cross-institutional, Memento-compliant scholarly web
archive
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Acknowledgments
• Los Alamos National Laboratory:
• Lyudmila Balakireva
• Martin Klein
• James Powell
• Harihar Shankar
• Herbert Van de Sompel
• Old Dominion University:
• Sawood Alam
• Grant Atkins
• Shawn Jones
• Mat Kelly
• Michael L. Nelson
• myresearch.institute – all volunteering researchers
@mart1nkle1n @hvdsomp
TPDL2018, Porto, Portugal, 12 Sep 2018
Martin Klein
Los Alamos National Laboratory
@mart1nkle1n
https://orcid.org/0000-0003-0130-2097
Herbert Van de Sompel
Los Alamos National Laboratory
@hvdsomp
https://orcid.org/0000-0002-0715-6126
A Web-Centric Pipeline for Archiving Scholarly Artifacts
The Scholarly Orphans project
is funded by the Andrew W. Mellon Foundation

More Related Content

Similar to A Web-Centric Pipeline for Archiving Scholarly Artifacts

Inspire Hackathon - Integration of Research Projects Sustainability with Cit...
Inspire Hackathon -  Integration of Research Projects Sustainability with Cit...Inspire Hackathon -  Integration of Research Projects Sustainability with Cit...
Inspire Hackathon - Integration of Research Projects Sustainability with Cit...plan4all
 
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...Europeana
 
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...Europeana
 
A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...labsbl
 
Working with the British Library’s Digital Collections & Data - Insights from...
Working with the British Library’s Digital Collections & Data - Insights from...Working with the British Library’s Digital Collections & Data - Insights from...
Working with the British Library’s Digital Collections & Data - Insights from...labsbl
 
Introduction to the Orléans/OGC INSPIRE Hackathon 2018
Introduction to the Orléans/OGC INSPIRE Hackathon 2018Introduction to the Orléans/OGC INSPIRE Hackathon 2018
Introduction to the Orléans/OGC INSPIRE Hackathon 2018plan4all
 
Introduction to the Oxford Collections Visualization Project
Introduction to the Oxford Collections Visualization ProjectIntroduction to the Oxford Collections Visualization Project
Introduction to the Oxford Collections Visualization ProjectChristine Madsen
 
Deconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationDeconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationJisc
 
Toward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, polandToward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, polandSistemaBibliotecarioSapienza
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...Martin Klein
 
OpenGLAM CH Hackathons
OpenGLAM CH HackathonsOpenGLAM CH Hackathons
OpenGLAM CH HackathonsBeat Estermann
 
Open Science policy: EC, ERC, Belspo, FWO
Open Science policy: EC, ERC, Belspo, FWOOpen Science policy: EC, ERC, Belspo, FWO
Open Science policy: EC, ERC, Belspo, FWOOpenAccessBelgium
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE
 

Similar to A Web-Centric Pipeline for Archiving Scholarly Artifacts (20)

Inspire Hackathon - Integration of Research Projects Sustainability with Cit...
Inspire Hackathon -  Integration of Research Projects Sustainability with Cit...Inspire Hackathon -  Integration of Research Projects Sustainability with Cit...
Inspire Hackathon - Integration of Research Projects Sustainability with Cit...
 
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
Perseverance on persistence by Herbert Van de Sompel - EuropeanaTech Conferen...
 
Perseverance on Persistence
Perseverance on PersistencePerseverance on Persistence
Perseverance on Persistence
 
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
Perseverance on Persistence by Herbert van de Sompel - EuropeanaTech Conferen...
 
A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...A hands-on data exploration & challenge to become a derived data-set author o...
A hands-on data exploration & challenge to become a derived data-set author o...
 
Working with the British Library’s Digital Collections & Data - Insights from...
Working with the British Library’s Digital Collections & Data - Insights from...Working with the British Library’s Digital Collections & Data - Insights from...
Working with the British Library’s Digital Collections & Data - Insights from...
 
Introduction to the Orléans/OGC INSPIRE Hackathon 2018
Introduction to the Orléans/OGC INSPIRE Hackathon 2018Introduction to the Orléans/OGC INSPIRE Hackathon 2018
Introduction to the Orléans/OGC INSPIRE Hackathon 2018
 
From Open Access to Open Data: collaborative work in the university libraries...
From Open Access to Open Data: collaborative work in the university libraries...From Open Access to Open Data: collaborative work in the university libraries...
From Open Access to Open Data: collaborative work in the university libraries...
 
Introduction to the Oxford Collections Visualization Project
Introduction to the Oxford Collections Visualization ProjectIntroduction to the Oxford Collections Visualization Project
Introduction to the Oxford Collections Visualization Project
 
PechaKucha (FormaliSE'2018)
PechaKucha (FormaliSE'2018)PechaKucha (FormaliSE'2018)
PechaKucha (FormaliSE'2018)
 
Deconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communicationDeconstructed and decentralized scholarly communication
Deconstructed and decentralized scholarly communication
 
180515 kitodo wendt_europeana_tech
180515 kitodo wendt_europeana_tech180515 kitodo wendt_europeana_tech
180515 kitodo wendt_europeana_tech
 
Toward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, polandToward complex e service for management of reserach outcomes, poland
Toward complex e service for management of reserach outcomes, poland
 
To the Rescue of Scholarly Orphans
To the Rescue of Scholarly OrphansTo the Rescue of Scholarly Orphans
To the Rescue of Scholarly Orphans
 
PIDs for cultural heritage Flanders
PIDs for cultural heritage FlandersPIDs for cultural heritage Flanders
PIDs for cultural heritage Flanders
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...
 
OpenGLAM CH Hackathons
OpenGLAM CH HackathonsOpenGLAM CH Hackathons
OpenGLAM CH Hackathons
 
Open Science policy: EC, ERC, Belspo, FWO
Open Science policy: EC, ERC, Belspo, FWOOpen Science policy: EC, ERC, Belspo, FWO
Open Science policy: EC, ERC, Belspo, FWO
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
OpenAIRE: Implementing Open Science in EOSC - crosscutting with RDA (Presenta...
 

More from Martin Klein

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebMartin Klein
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansMartin Klein
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly WebMartin Klein
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...Martin Klein
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...Martin Klein
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncMartin Klein
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsMartin Klein
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly ArtifactsMartin Klein
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento RequestsMartin Klein
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesMartin Klein
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsMartin Klein
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationMartin Klein
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw MementosMartin Klein
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationMartin Klein
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 

More from Martin Klein (20)

On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly WebOn the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
On the Persistence of Persistent Identifiers of the Scholarly Web
 On the Persistence of Persistent Identifiers of the Scholarly Web On the Persistence of Persistent Identifiers of the Scholarly Web
On the Persistence of Persistent Identifiers of the Scholarly Web
 
An Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly OrphansAn Institutional Perspective to Rescue Scholarly Orphans
An Institutional Perspective to Rescue Scholarly Orphans
 
Who is Asking - Humans and Machines Experience a Different Scholarly Web
Who is Asking - Humans and Machines  Experience a Different Scholarly WebWho is Asking - Humans and Machines  Experience a Different Scholarly Web
Who is Asking - Humans and Machines Experience a Different Scholarly Web
 
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...The Memento Tracer Framework: Balancing Quality and Scalability  for Web Arch...
The Memento Tracer Framework: Balancing Quality and Scalability for Web Arch...
 
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...Memento Tracer An Innovative Approach Towards Balancing  Scale and Fidelity f...
Memento Tracer An Innovative Approach Towards Balancing Scale and Fidelity f...
 
Comparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSyncComparing the Performance of OAI-PMH with ResourceSync
Comparing the Performance of OAI-PMH with ResourceSync
 
Evaluating Memento Service Optimizations
Evaluating Memento Service OptimizationsEvaluating Memento Service Optimizations
Evaluating Memento Service Optimizations
 
A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly Artifacts
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento Requests
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web Archives
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event Collections
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly Communication
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw Mementos
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communication
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 

Recently uploaded

Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITMgdsc13
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一Fs
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一z xss
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMartaLoveguard
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作ys8omjxb
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsThierry TROUIN ☁
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Roomishabajaj13
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhimiss dipika
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Deliverybabeytanya
 
Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...
Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...
Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...akbard9823
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一Fs
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)Christopher H Felton
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfThe Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfMilind Agarwal
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一Fs
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Roomdivyansh0kumar0
 

Recently uploaded (20)

Git and Github workshop GDSC MLRITM
Git and Github  workshop GDSC MLRITMGit and Github  workshop GDSC MLRITM
Git and Github workshop GDSC MLRITM
 
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
定制(Management毕业证书)新加坡管理大学毕业证成绩单原版一比一
 
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
办理(UofR毕业证书)罗切斯特大学毕业证成绩单原版一比一
 
Magic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptxMagic exist by Marta Loveguard - presentation.pptx
Magic exist by Marta Loveguard - presentation.pptx
 
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
Potsdam FH学位证,波茨坦应用技术大学毕业证书1:1制作
 
AlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with FlowsAlbaniaDreamin24 - How to easily use an API with Flows
AlbaniaDreamin24 - How to easily use an API with Flows
 
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Serviceyoung call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
young call girls in Uttam Nagar🔝 9953056974 🔝 Delhi escort Service
 
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With RoomVIP Kolkata Call Girl Salt Lake 👉 8250192130  Available With Room
VIP Kolkata Call Girl Salt Lake 👉 8250192130 Available With Room
 
Contact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New DelhiContact Rya Baby for Call Girls New Delhi
Contact Rya Baby for Call Girls New Delhi
 
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on DeliveryCall Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
Call Girls In Mumbai Central Mumbai ❤️ 9920874524 👈 Cash on Delivery
 
Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...
Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...
Sushant Golf City / best call girls in Lucknow | Service-oriented sexy call g...
 
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
定制(Lincoln毕业证书)新西兰林肯大学毕业证成绩单原版一比一
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
A Good Girl's Guide to Murder (A Good Girl's Guide to Murder, #1)
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdfThe Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
The Intriguing World of CDR Analysis by Police: What You Need to Know.pdf
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
定制(UAL学位证)英国伦敦艺术大学毕业证成绩单原版一比一
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130  Available With RoomVIP Kolkata Call Girl Kestopur 👉 8250192130  Available With Room
VIP Kolkata Call Girl Kestopur 👉 8250192130 Available With Room
 

A Web-Centric Pipeline for Archiving Scholarly Artifacts

  • 1. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Martin Klein Los Alamos National Laboratory @mart1nkle1n https://orcid.org/0000-0003-0130-2097 Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp https://orcid.org/0000-0002-0715-6126 A Web-Centric Pipeline for Archiving Scholarly Artifacts The Scholarly Orphans project is funded by the Andrew W. Mellon Foundation
  • 2. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Scholarly Orphans – Project Motivation
  • 3. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 • Consideration • Researchers are increasingly using a variety of web platforms for collaboration and communication • Why? • Many of these platforms have desirable characteristics • Versioning • Time stamping • Social embedding • Their institutions do not provide platforms that have global reach • Collaboration, cf. Github ~ productivity • Communication, cf. SlideShare ~ visibility Research and Research Communication on the Web
  • 4. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Emma Schymanski https://orcid.org/0000-0001-6868-8145 https://github.com/schymane https://www.slideshare.net/EmmaSchymanski https://figshare.com/authors/Emma_Schymanski/5087039 https://publons.com/author/1538491/emma-schymanski#profile
  • 5. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Shawn Jones https://orcid.org/0000-0002-4372-870X http://www.shawnmjones.org/ https://github.com/shawnmjones https://www.slideshare.net/shawnmjones https://en.wikipedia.org/wiki/User:Shawnmjones https://www.blogger.com/profile/17827543974149663194
  • 6. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 • Consideration • Researchers deposit artifacts in these web platforms • Web Platforms: • Dedicated to scholarship: • Commercial: e.g., FigShare, Publons • Not for profit: e.g., OSF, Zenodo • General purpose: • Commercial: e.g., GitHub, SlideShare • Not for profit: e.g., Wikipedia, Wikidata Research and Research Communication on the Web
  • 7. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 • Consideration • Researchers deposit artifacts in these web platforms • Status quo - The researchers’ institutions commonly: • Do not know about the existence of these artifact • Do not have a copy of these artifacts Research and Research Communication on the Web
  • 8. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 • Consideration • Researchers deposit artifacts in these web platforms • Status quo – Uncertainty regarding long-term accessibility of these artifacts: • General purpose platforms don’t provide long-term access guarantees; platforms dedicated to scholarship commonly do • Uncertainty regarding the sustainability of unhindered long- term access to artifacts in these platforms: • Commercial: when is the change in business model coming? • Not for profit: will the next round of grant applications, member contributions be successful? Research and Research Communication on the Web
  • 9. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 • Consideration • Researchers deposit artifacts in these web platforms • Status quo - These artifacts are not systematically archived: • No frameworks like LOCKSS/Portico exist for these artifacts • Researchers only selectively deposit artifacts in portals that provide archival guarantees; to obtain a cite-able DOI • Can’t expect researchers to (also) upload all artifacts in IRs • Web archives only incidentally archive these artifacts • Anecdotal & Hiberlink evidence Research and Research Communication on the Web
  • 10. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Emma’s SlideShare Artifact: 0 Mementos https://www.slideshare.net/EmmaSchymanski/dmcm2018-community-resources-connecting-chemistry-and-toxicity-knowledge http://timetravel.mementoweb.org/
  • 11. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Shawn’s GitHub Artifact: 1 Memento https://github.com/shawnmjones/mediawiki http://web.archive.org/
  • 12. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Hiberlink Evidence Web resources referenced in Elsevier corpus (1996-2012) without representative Memento in public web archives Martin Klein, Herbert Van de Sompel, et al. (2014) Scholarly context not found. In: PLOS ONE https://doi.org/10.1371/journal.pone.0115253
  • 13. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 The Need for an Archiving Infrastructure Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web https://hvdsomp.info/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
  • 14. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Recording versus Archiving Recording Archiving Short-term Longer-term No guarantees provided Attempt to provide guarantees Write many/read many Write once/Read many Scholarly process Scholarly record Herbert Van de Sompel & Andrew Treloar (2014) A Perspective on Archiving the Scholarly Web https://hvdsomp.info/papers/Papers/2014/iPres2014_Sompel_Treloar.pdf
  • 15. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Scholarly Orphans – Project Overview
  • 16. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 The Scholarly Orphans Project • Funded by the Andrew W. Mellon Foundation • Los Alamos National Laboratory & New Mexico Consortium • Old Dominion University • 04/2016 - 03/2019 • How to capture Scholarly Orphans (i.e., the scholarly artifacts deposited in web portals) for long-term archiving? • Experimental project, aimed at exploring technical possibilities
  • 17. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 The Scholarly Orphans Project • Explores an institution-driven paradigm • Academic institutions typically have a long shelf life • A basic premise underlying e.g., LOCKSS, perma.cc • An academic institution should be interested in capturing the artifacts (intellectual property) its scholars deposit on the web • Collecting and archiving such artifacts aligns with the mission of academic libraries
  • 18. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 An Institutional Perspective
  • 19. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 The Scholarly Orphans Project • Explores a paradigm inspired by web archiving • Scale of the problem • Can’t expect researchers to upload all artifacts in an institutional repository • Bilateral agreements for archival purposes with most web portals unlikely
  • 20. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 A Web Archiving Perspective
  • 21. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Inspiration • LOCKSS • Web crawling approach • Focused on journal literature • Archive-It • On-demand, subscription-based web archiving • Not focused on scholarly orphans • Institutional repository, auto-discovery of journal articles • Capture an institution’s output • Focused on journal literature • The Locker Project & Amy Guy’s Personal Web Observatory work • Capture an individual’s web presence • Not focused on scholarly orphans http://rhiaro.co.uk/ https://rhiaro.github.io/thesis/
  • 22. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Scholarly Orphans – Prototype Pipeline Overview
  • 23. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Prototype Pipeline
  • 24. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Prototype Pipeline
  • 25. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Demo - myresearch.institute
  • 26. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 myresearch.institute - Researchers • Uniquely identified by ORCIDs • Web identities in multiple portals • Create various types of artifacts
  • 27. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 myresearch.institute - Portals • Tracking started August 27 2018 • Tracking artifacts created starting August 1 2018 • >2,200 artifacts tracked to date for all 16 researchers
  • 28. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 myresearch.institute - Artifacts • schema.org typology: • Answer • Article • BlogPosting • Comment • Dataset • PresentationDigitalDocument • Question • Review • SoftwareSourceCode
  • 29. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Tracking Artifacts
  • 30. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Tracking Artifacts - Description • In order to track artifacts that were recently deposited by an institutional researcher in a portal, one reasonably needs: • The web identity of the researcher in the portal • Algorithmic discovery • Discovery via a registry
  • 31. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Algorithmic Discovery of Web Identities James Powell, Harihar Shankar, Marko Rodriguez, and Herbert Van de Sompel (2014) EgoSystem: Where are our alumni? In: code4lib http://journal.code4lib.org/articles/9519
  • 32. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Martin Klein and Herbert Van de Sompel (2017) Discovering Scholarly Orphans Using ORCID In: JCDL2017 https://arxiv.org/abs/1703.09343 Discovery of Web Identities via a Registry (ORCID)
  • 33. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 https://orcid.org/0000-0002-4372-870X Shawn’s ORCID Record
  • 34. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 https://orcid.org/0000-0001-6868-8145 Emma’s ORCID Record
  • 35. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Tracking Artifacts - Description • In order to track artifacts that were recently deposited by an institutional researcher in a portal, one reasonably needs: • The web identity of the researcher in the portal • Algorithmic discovery • Discovery via a registry • A portal API that supports: • Access by web identity • Access to contributions “since …” for the web identity • Result of tracking: • URI(s) of new artifact(s) discovered in the portal
  • 36. Tracking Artifacts - Architecture
  • 37. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Tracking Artifacts - Implementation • Tracker event notifications: • Linked Data Notifications (JSON-LD) using AS2, PROV-O, schema.org • Identifiers: Unique tracker event identifier per notification • Dates: artifact publication date & artifact tracked date • URIs: 1+ artifact URI • Event database: • Notifications stored/indexed in ElasticSearch • Researcher database: • SQLite
  • 38. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Tracking Artifacts - Demo Demo: https://myresearch.institute/
  • 39. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Tracking Artifacts - Challenges • Discovery of web identities of researchers • Algorithmic, registry-based currently not adequate • Fallback: manual discovery and entry • With help of researcher • Portal API access by web identity • Broadly supported by general purpose portals • Typically not supported by scholarly portals • Some lack an API altogether • Should add ORCID access to APIs • OAI-PMH and ResourceSync need sets per web identity • Professional versus personal contributions • Tracking frequency/scale
  • 40. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts
  • 41. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts - Description • The capture process takes as input the URI of a new artifact discovered in a portal • Its task is to create a representative institutional capture of the artifact • Result of capture: • WARC file for new artifact in an institutional archive
  • 42. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts - Description • Challenges: • Delineate the web boundary of the artifact • More than the input artifact URI • The boundary is in the eye of the beholder • Create a high-fidelity capture using an approach that scales for a steady stream of new artifacts • Unsolved problem
  • 43. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts
  • 44. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts
  • 45. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts
  • 46. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Memento Tracer - Framework http://tracer.mementoweb.org
  • 47. Capturing Artifacts - Architecture
  • 48. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts - Implementation • Capture event notifications: • Identifiers: Unique capture event identifier per notification ; Preceding tracker event identifier conveyed as provenance • Dates: Datetime of WARC file creation • URIs: 1+ WARC file URI • Tracer, client-side: • Tracer Chrome extension leveraging Selenium IDE • Tracer, server-side: • Stormcrawler ; Selenium (Chrome) with Tracer plug-in ; WarcProxy ; file-system storage for WARC files http://stormcrawler.net/ https://www.seleniumhq.org/projects/webdriver/ https://github.com/odie5533/WarcProxy
  • 49. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts - Demo Demo: https://myresearch.institute/
  • 50. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Capturing Artifacts - Challenges • Memento Tracer: • Language used to express Traces (interoperability) • Organization of the shared repository for Traces • Limitations of the browser event listener approach for recording Traces • Selection of a Trace for capturing a web publication by other means than URI pattern • Legal constraints
  • 51. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Archiving Artifacts
  • 52. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Archiving Artifacts - Description • The archiving process takes as input the URI of a WARC file generated by the capture process • Its task is to ingest the WARC file in a cross-institutional web archive • This can be achieved using off-the-shelf web archiving software, e.g., pywb, Open Wayback • Result of archiving: • Mementos pertaining to newly discovered artifact in a cross- institutional, Memento-compliant web archive
  • 53. Archiving Artifacts - Architecture
  • 54. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Archiving Artifacts - Implementation • Archiver event notifications: • Identifiers: Unique archiver event identifier per notification ; preceding tracker/capturer event identifiers conveyed as provenance • Dates: WARC file ingest date ; Memento-Datetime values URIs: 1+ Memento URI, each corresponding to an artifact URI • Web Archive: • pywb • Social card: • MementoEmbed https://github.com/webrecorder/pywb https://github.com/oduwsdl/MementoEmbed
  • 55. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Archiving Artifacts - Demo Demo: https://myresearch.institute/
  • 56. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Archiving Artifacts - Challenges • Attempted to use ipwb, a pywb version that uses IPFS • Cross-institutional distributed file system with redundancy • Ran out of time to get it operationally stable Sawood Alam, Mat Kelly, and Michael L. Nelson (2016) InterPlanetary Wayback: The Permanent Web Archive https://doi.org/10.1145/2910896.2925467
  • 57. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Scholarly Orphans – Summary
  • 58. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Summary (1/2) • The Scholarly Orphans project explores an institution-driven approach to capture scholarly artifacts deposited in web portals • Artifacts out of scope of existing archival approaches such as LOCKSS, Portico, web archives • Institutions have a long shelf life, should be interested in collecting these artifacts, and have feasible scale for identity/artifact discovery
  • 59. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Summary (2/2) • Components of the experimental pipeline: • Tracker: Automatically discover artifacts because researchers will not upload them to the institution • Capturer: High fidelity artifact captures through crowd-sourcing navigation patterns with Memento Tracer • Archiver: Cross-institutional, Memento-compliant scholarly web archive
  • 60. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Acknowledgments • Los Alamos National Laboratory: • Lyudmila Balakireva • Martin Klein • James Powell • Harihar Shankar • Herbert Van de Sompel • Old Dominion University: • Sawood Alam • Grant Atkins • Shawn Jones • Mat Kelly • Michael L. Nelson • myresearch.institute – all volunteering researchers
  • 61. @mart1nkle1n @hvdsomp TPDL2018, Porto, Portugal, 12 Sep 2018 Martin Klein Los Alamos National Laboratory @mart1nkle1n https://orcid.org/0000-0003-0130-2097 Herbert Van de Sompel Los Alamos National Laboratory @hvdsomp https://orcid.org/0000-0002-0715-6126 A Web-Centric Pipeline for Archiving Scholarly Artifacts The Scholarly Orphans project is funded by the Andrew W. Mellon Foundation