On the Persistence of Persistent Identifiers of the Scholarly Web
@mart1nkle1n
TPDL, August 2020
Martin Klein & Lyudmila Balakireva
Los Alamos National Laboratory
{mklein, ludab}@lanl.gov
On the Persistence of Persistent
Identifiers of the Scholarly Web
https://arxiv.org/abs/2004.03011
DOIs are very common
How does this work via HTTP?
https://doi.org/10.1007/978-3-540-87599-4_38
Arrived at landing page
https://doi.org/10.1007/978-3-540-87599-4_38
https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
HTTP redirects
https://doi.org/10.1007/978-3-540-87599-4_38
 (HTTP 302 redirect)
http://link.springer.com/10.1007/978-3-540-87599-4_38
 (HTTP 301 redirect)
https://link.springer.com/10.1007/978-3-540-87599-4_38
 (HTTP 302 redirect)
https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
 (HTTP 200)
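The redirect chain above can be traced hop by hop. The following is a minimal sketch, not the authors' tooling: `follow_redirects`, `canned_fetch`, and `SLIDE_CHAIN` are hypothetical names, and the responses are canned copies of the chain on the slide rather than live requests. It also assumes absolute `Location` values, as in this example.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One response without following redirects: (status code, Location header or None).
Response = Tuple[int, Optional[str]]

REDIRECT_CODES = (301, 302, 303, 307, 308)


def follow_redirects(url: str,
                     fetch: Callable[[str], Response],
                     max_hops: int = 20) -> List[Tuple[str, int]]:
    """Return the (url, status) pairs visited, ending at the first non-redirect."""
    chain: List[Tuple[str, int]] = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in REDIRECT_CODES or location is None:
            break
        url = location  # assumption: Location is an absolute URL
    return chain


# Canned responses copied from the slide, standing in for live servers.
SLIDE_CHAIN: Dict[str, Response] = {
    "https://doi.org/10.1007/978-3-540-87599-4_38":
        (302, "http://link.springer.com/10.1007/978-3-540-87599-4_38"),
    "http://link.springer.com/10.1007/978-3-540-87599-4_38":
        (301, "https://link.springer.com/10.1007/978-3-540-87599-4_38"),
    "https://link.springer.com/10.1007/978-3-540-87599-4_38":
        (302, "https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38"),
    "https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38":
        (200, None),
}


def canned_fetch(url: str) -> Response:
    """Hypothetical stand-in for a real HTTP request."""
    return SLIDE_CHAIN[url]


chain = follow_redirects("https://doi.org/10.1007/978-3-540-87599-4_38", canned_fetch)
for hop_url, status in chain:
    print(status, hop_url)
```

Passing the fetch function in as a parameter keeps the chain logic independent of the HTTP client, so the same walk could be repeated with different clients and the resulting chains compared.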
Questions…
• How persistent is this DOI resolution?
• Given different clients and network environments:
• Can we consistently arrive at the same location at the end of the redirect chain?
• Is the path there (redirect chain) the same?
• Are there differences between Open Access and non-OA?
• Subscription vs non-Subscription level content?
• Do scholarly content providers differ from the popular web?
Idea…
• Comparative study investigating scholarly publishers’ responses
  • To common HTTP requests
  • Against DOIs
• Using different web clients and request methods, resembling
  • Machines “browsing”, crawling
  • Humans browsing
• From network environments with different subscriptions/licenses
  • Amazon Web Services (AWS) EC2 instance
  • LANL internal
• Compare against web servers providing popular web content
HTTP clients, request methods, dataset, networks
• HTTP HEAD
  • cURL
• HTTP GET
  • cURL
• HTTP GET+
  • cURL + various common parameters, e.g., user agent, cookies
• HTTP GET
  • Chrome
• 10,000 DOIs: 100 randomly picked DOIs from each of the 100 most frequent publisher domains
• HTTP requests sent from AWS VM and LANL network
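The three cURL-based request methods could look roughly like the commands built below. This is a sketch under assumptions: the slide only says "user agent, cookies" for GET+, so the user-agent string and cookie-jar file name are hypothetical placeholders, and the exact options used in the study are not shown here. The Chrome method is omitted because it needs a browser rather than cURL.

```python
import shlex
from typing import Dict, List

# Hypothetical values; the slide does not specify them.
ASSUMED_USER_AGENT = "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/1.0"
ASSUMED_COOKIE_JAR = "cookies.txt"


def curl_commands(doi: str) -> Dict[str, List[str]]:
    """Build one cURL argument list per request method for a DOI."""
    url = "https://doi.org/" + doi
    # -s: silent, -L: follow redirects, -o /dev/null: discard the output,
    # -w: print the final status code and final URL after the transfer.
    common = ["curl", "-s", "-L", "-o", "/dev/null",
              "-w", "%{http_code} %{url_effective}\\n"]
    return {
        # -I sends HEAD instead of GET.
        "HEAD": common + ["-I", url],
        "GET": common + [url],
        # -A sets the User-Agent; -b/-c read and write a cookie jar.
        "GET+": common + ["-A", ASSUMED_USER_AGENT,
                          "-b", ASSUMED_COOKIE_JAR,
                          "-c", ASSUMED_COOKIE_JAR, url],
    }


commands = curl_commands("10.1007/978-3-540-87599-4_38")
for method, argv in commands.items():
    print(method, "->", shlex.join(argv))
```

Each argument list can be handed to `subprocess.run` on a machine with cURL installed; running the same lists from two networks gives directly comparable output lines.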
Response codes of last link in redirect chain by DOI
[Figure: response codes (2xx, 3xx, 4xx, 5xx, Err) of the last link in the redirect chain for HEAD, GET, GET+, and Chrome across the 10,000 DOIs; annotated value: 48.3%]
• < 50% successful requests across all methods
• > 40% 300-level responses with GET
• 25% return 200-level with HEAD/Chrome
• 13% 400-level responses with HEAD
  • 25% of them with 200-level response with any other method
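A tally of this kind can be sketched as follows. The data is made-up toy input, the function names are hypothetical, and "2xx with every method" is only one possible reading of "successful across all methods"; none of this reproduces the study's numbers.

```python
from collections import Counter
from typing import Dict, Optional

METHODS = ("HEAD", "GET", "GET+", "Chrome")

# {doi: {method: final status code, or None when no response was received}}
Results = Dict[str, Dict[str, Optional[int]]]


def status_class(code: Optional[int]) -> str:
    """Map a final status code to a class such as '2xx'; anything else is 'Err'."""
    if code is None or not 100 <= code <= 599:
        return "Err"
    return f"{code // 100}xx"


def tally_by_method(results: Results) -> Dict[str, Counter]:
    """Count status classes of the last response per request method."""
    tallies = {method: Counter() for method in METHODS}
    for per_method in results.values():
        for method in METHODS:
            tallies[method][status_class(per_method.get(method))] += 1
    return tallies


def share_2xx_everywhere(results: Results) -> float:
    """Fraction of DOIs whose last response is 2xx with every method."""
    if not results:
        return 0.0
    ok = sum(
        all(status_class(r.get(m)) == "2xx" for m in METHODS)
        for r in results.values()
    )
    return ok / len(results)


# Made-up toy data with placeholder DOIs, for illustration only.
TOY_RESULTS: Results = {
    "10.0000/example-a": {"HEAD": 200, "GET": 200, "GET+": 200, "Chrome": 200},
    "10.0000/example-b": {"HEAD": 405, "GET": 200, "GET+": 200, "Chrome": 200},
    "10.0000/example-c": {"HEAD": 200, "GET": 302, "GET+": 302, "Chrome": 200},
    "10.0000/example-d": {"HEAD": None, "GET": 503, "GET+": 200, "Chrome": 200},
}

tallies = tally_by_method(TOY_RESULTS)
for method in METHODS:
    print(method, dict(tallies[method]))
print("2xx with every method:", share_2xx_everywhere(TOY_RESULTS))
```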
https://arxiv.org/abs/2004.03011
For more background, details, results
Thank you & stay safe!
Martin Klein & Lyudmila Balakireva
Los Alamos National Laboratory
{mklein, ludab}@lanl.gov

FULL ENJOY Call Girls In Mayur Vihar Delhi Contact Us 8377087607
 
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Model Towh Delhi 💯Call Us 🔝8264348440🔝
 
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
Packaging the Monolith - PHP Tek 2024 (Breaking it down one bite at a time)
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providersMoving Beyond Twitter/X and Facebook - Social Media for local news providers
Moving Beyond Twitter/X and Facebook - Social Media for local news providers
 
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service ThaneRussian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
Russian Call Girls Thane Swara 8617697112 Independent Escort Service Thane
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICECall Girls In South Ex 📱  9999965857  🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
Call Girls In South Ex 📱 9999965857 🤩 Delhi 🫦 HOT AND SEXY VVIP 🍎 SERVICE
 
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 22 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine ServiceHot Service (+9316020077 ) Goa  Call Girls Real Photos and Genuine Service
Hot Service (+9316020077 ) Goa Call Girls Real Photos and Genuine Service
 
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
Low Rate Young Call Girls in Sector 63 Mamura Noida ✔️☆9289244007✔️☆ Female E...
 
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Samaira 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Samaira 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 26 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call GirlVIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
VIP 7001035870 Find & Meet Hyderabad Call Girls LB Nagar high-profile Call Girl
 

On the Persistence of Persistent Identifiers of the Scholarly Web

  • 1. On the Persistence of Persistent Identifiers of the Scholarly Web. @mart1nkle1n. TPDL, August 2020. Martin Klein & Lyudmila Balakireva, Los Alamos National Laboratory, {mklein, ludab}@lanl.gov. https://arxiv.org/abs/2004.03011
  • 2. DOIs are very common
  • 3. DOIs are very common
  • 4. DOIs are very common
  • 5. How does this work via HTTP? https://doi.org/10.1007/978-3-540-87599-4_38
  • 6. Arrived at landing page: https://doi.org/10.1007/978-3-540-87599-4_38 → https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
  • 7. HTTP redirects: https://doi.org/10.1007/978-3-540-87599-4_38 (HTTP 302 redirect) → http://link.springer.com/10.1007/978-3-540-87599-4_38 (HTTP 301 redirect) → https://link.springer.com/10.1007/978-3-540-87599-4_38 (HTTP 302 redirect) → https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38 (HTTP 200)
  • 8. Questions… • How persistent is this DOI resolution? • Given different clients and network environments: • Can we consistently arrive at the same location at the end of the redirect chain? • Is the path there (redirect chain) the same? • Are there differences between Open Access and non-OA? • Subscription vs non-Subscription level content? • Do scholarly content providers differ from the popular web?
  • 9. Idea… • Comparative study investigating scholarly publishers’ responses • To common HTTP requests • Against DOIs • Using different web clients and request methods, resembling • Machines "browsing", crawling • Humans browsing • From network environments with different subscriptions/licenses • Amazon Web Services EC2 instance • LANL internal • Compare against web servers providing popular web content
  • 10. HTTP clients, request methods, dataset, networks • HTTP HEAD: cURL • HTTP GET: cURL • HTTP GET+: cURL + various common parameters, e.g., user agent, cookies • HTTP GET: Chrome • 10,000 DOIs, randomly picked, 100 DOIs from the 100 most frequent publisher domains • HTTP requests sent from AWS VM and LANL network
  • 11. HTTP clients, request methods, dataset, networks • HTTP HEAD: cURL • HTTP GET: cURL • HTTP GET+: cURL + various common parameters, e.g., user agent, cookies • HTTP GET: Chrome • 10,000 DOIs, randomly picked, 100 DOIs from the 100 most frequent publisher domains • HTTP requests sent from AWS VM and LANL network
  • 12. Response codes of last link in redirect chain by DOI [chart: methods HEAD, GET, GET+, Chrome; response codes 2xx, 3xx, 4xx, 5xx, Err; 10,000 DOIs]
  • 13. Response codes of last link in redirect chain by DOI [chart as on slide 12, annotated 48.3%] • < 50% successful requests across all methods
  • 14. Response codes of last link in redirect chain by DOI [chart as on slide 12, annotated 48.3%] • < 50% successful requests across all methods • > 40% 300-level responses w/ GET
  • 15. Response codes of last link in redirect chain by DOI [chart as on slide 12, annotated 48.3%] • < 50% successful requests across all methods • > 40% 300-level responses w/ GET • 25% return 200-level w/ HEAD/Chrome
  • 16. Response codes of last link in redirect chain by DOI [chart as on slide 12, annotated 48.3%] • < 50% successful requests across all methods • > 40% 300-level responses w/ GET • 25% return 200-level w/ HEAD/Chrome • 13% 400-level responses w/ HEAD
  • 17. Response codes of last link in redirect chain by DOI [chart as on slide 12, annotated 48.3%] • < 50% successful requests across all methods • > 40% 300-level responses w/ GET • 25% return 200-level w/ HEAD/Chrome • 13% 400-level responses w/ HEAD • 25% of them w/ 200-level response w/ any other method
  • 18. https://arxiv.org/abs/2004.03011 For more background, details, results
  • 19. Thank you & stay safe! Martin Klein & Lyudmila Balakireva, Los Alamos National Laboratory, {mklein, ludab}@lanl.gov
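
The redirect chain on slide 7 can be traced hop by hop. The following is a minimal Python sketch under stated assumptions, not the tooling used in the study: `head_once`, `follow_chain`, `replay_slide_7`, and `SLIDE_7_RESPONSES` are hypothetical names, the hop limit of 20 is an arbitrary choice, and the canned responses only replay the URLs and status codes listed on slide 7 so that the example runs without network access.

```python
import http.client
from typing import Callable, List, Optional, Tuple
from urllib.parse import urljoin, urlsplit

Hop = Tuple[str, int]                      # (requested URL, status code)
Fetch = Callable[[str], Tuple[int, Optional[str]]]


def head_once(url: str, timeout: float = 10.0) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request and return (status, Location header).

    http.client never follows redirects on its own, so each hop stays visible.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def follow_chain(url: str, fetch: Fetch = head_once, max_hops: int = 20) -> List[Hop]:
    """Record every hop until a non-redirect response or the hop limit."""
    chain: List[Hop] = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if not (300 <= status < 400 and location):
            break                          # last link of the chain
        url = urljoin(url, location)       # Location may be relative
    return chain


# Canned responses copied from slide 7 (assumption: replayed, not fetched live).
SLIDE_7_RESPONSES = {
    "https://doi.org/10.1007/978-3-540-87599-4_38":
        (302, "http://link.springer.com/10.1007/978-3-540-87599-4_38"),
    "http://link.springer.com/10.1007/978-3-540-87599-4_38":
        (301, "https://link.springer.com/10.1007/978-3-540-87599-4_38"),
    "https://link.springer.com/10.1007/978-3-540-87599-4_38":
        (302, "https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38"),
    "https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38":
        (200, None),
}


def replay_slide_7(url: str) -> Tuple[int, Optional[str]]:
    """Offline stand-in for head_once that replays the slide 7 responses."""
    return SLIDE_7_RESPONSES[url]


for hop_url, hop_status in follow_chain(
    "https://doi.org/10.1007/978-3-540-87599-4_38", fetch=replay_slide_7
):
    print(hop_status, hop_url)
```

Passing `fetch=head_once` (the default) would send real HEAD requests instead; as the results on slides 12 to 17 suggest, the chain observed that way may differ by request method and network.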

Editor's Notes

  1. Hello and welcome to this session! My name is Martin Klein and I work in the Research Library (RL) at LANL. I’d like to give a brief overview of the work done with my colleague Luda Balakireva on the persistence of persistent identifiers of the scholarly web. More specifically, we are testing Digital Object Identifiers (DOIs) and how consistently or inconsistently scholarly publishers respond when DOIs are requested. It is worth noting that several different persistent identifiers are used on the scholarly web, but for the purpose of this study we only investigate DOIs.
  2. Why do we do that? Well, the answer is pretty simple: because DOIs are very common. For example, traditional journal or conference proceeding papers are often assigned DOIs as shown in this example from the IEEE.
  3. The same holds true for datasets that are often assigned DOIs as shown here in Zenodo.
  4. Or more generally speaking, scholarly projects that can include multiple resources and types of resources, as shown here in the example of the Open Science Framework, are assigned DOIs. So this all is to say that DOIs are very frequently used to identify scholarly resources on the web.
  5. So how does this work, how are DOIs resolved on the web? If we take this DOI, which is actionable via HTTP…
  6. …and use a web browser to dereference it, the browser will eventually display the resource identified by the DOI, in this case the landing page of a scholarly article. Note that the URI of the landing page, shown at the bottom of this slide, is different from the DOI, as the page is hosted by Springer.
  7. The reason for this is that in the background, somewhat opaque to the user, the browser follows a number of HTTP redirects from the DOI to the landing page URI. The redirect chain for our example DOI is shown here: we first see an HTTP 302 redirect to Springer, followed by a 301 redirect to the HTTPS protocol, and another 302 to the landing page URI. The landing page, as the last link of the redirect chain, returns an HTTP 200 response code, indicating that the request succeeded.
  8. So the main question we are investigating with our work is: how persistent is this DOI resolution? Given that DOIs can be requested by different HTTP clients and from different network environments, several subsequent questions arise. For example: Can we consistently arrive at the same last link of a redirect chain? Does the chain itself change? Is there a difference between the resolution of DOIs that identify OA resources vs those that identify non-OA resources? Does it matter if the request against a DOI comes from within an institutional network with certain subscription levels to commercial publishers? If we observe such differences, is this typical only for the scholarly web or are these behaviors reflected in the popular web as well? In short, our intention is to test the consistency of DOI responses. After all, without consistency, how can we trust the persistence of such identifiers and their underlying infrastructure?
  9. We designed a study to investigate scholarly publishers and their responses to requests against DOIs. We use common HTTP clients and methods that resemble both machine and human browsing behavior. We send our request from 2 different network environments with different subscription levels to commercial publishers. We send the same requests against web servers providing popular web content to compare our results.
  10. We use the 4 different HTTP methods and clients summarized here for our experiment. We send HTTP HEAD requests with the popular command line tool cURL. We send simple HTTP GET requests, also with cURL. We send more complex HTTP GET requests with cURL, where we, for example, specify a user agent and accept cookies. Lastly, we use the popular web browser Chrome to send HTTP GET requests. We send these 4 requests against a corpus of 10k randomly sampled DOIs and repeat the experiment from 2 different network environments: a VM in the Amazon Cloud and from within the LANL network.
  11. We make the case that the first 3 methods resemble a machine browsing or crawling the web, mostly because cURL is a tool that humans typically only use for testing, whereas it is frequently utilized in scripts that access web resources at scale. In contrast, the Chrome method, somewhat naturally, most closely resembles a human browsing.
  12. Due to time constraints I will only show one set of results. What we see here in this graph is the response code of the last link of all redirect chains, distinguished by request method. Our 4 methods to dereference DOIs are shown on the x-axis and the 10k DOIs are displayed on the y-axis. Response codes are binned at the hundreds level, where green indicates a 200-level response (success), gray a 300-level response (redirect), red a 400-level response (client error), and blue a 500-level response (server error). This graph shows results of requests sent from a VM in the Amazon Cloud, so a network presumably w/o subscriptions to commercial publishers. A number of observations can immediately be made:
  13. 1) Less than 50% of DOIs consistently return a 200-level response, meaning success, across all 4 request methods. In other words, more than 5k of our DOIs did not respond consistently across all 4 methods, a rather astonishing ratio! Looking at the individual methods, we can note that Chrome, the method most closely resembling a human browsing the web, performs best.
  14. 2) Next, we recognize that the simple GET method seems not well suited for resolving DOIs, with more than 40% of DOI chains ending in a 300-level response. This is noteworthy as, by definition, a 300-level response should not be the *final* response code of a redirect chain on the web. There is no obvious reason why…
  15. …especially given that a large fraction of those DOIs, 25% in total, result in a successful response when the HEAD or Chrome method is used.
  16. 4) Our next observation is that a significant portion (13%) of DOI requests with the simple HEAD method result in a 400-level response. One could think there are a lot of 403s, meaning access forbidden, or 405s, meaning the HEAD method is not allowed against the resource. But that is not the case: this portion is indeed dominated by 404s, meaning resource not found.
  17. Oddly, 25% of these DOIs result in a 200-level response when any other request method is used. So, do they exist or not? While such scenarios of changing response codes are not well aligned with HTTP standards and best practice on the web, our observations strongly indicate that scholarly publishers do respond differently to requests against the same DOI, depending on what method is used. In addition, we can clearly see patterns where responses differ between methods that resemble machine behavior and those that resemble human behavior. This is represented by the success of the Chrome method and the lack of success, in particular, of the simple GET and HEAD methods. In aggregate, from our point of view, these observed inconsistencies raise more questions and do not increase trust in the persistence of persistent identifiers.
  18. For more results and for details on the methodology and dataset used, we refer to the paper. The corresponding pre-print is available at the URI displayed at the bottom of this slide.
  19. This concludes my short presentation. Thanks a lot for watching! I am happy to hear your feedback and discuss our work. Thank you!
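
Notes 12 and 13 describe binning the final response code of each redirect chain at the hundreds level and then asking how many DOIs return a 200-level response across all four request methods. A minimal Python sketch of that bookkeeping follows. It is an illustration only: the function names, the treatment of a missing or out-of-range code as "Err", and the toy input with made-up DOIs are all assumptions and are not data or code from the study.

```python
from typing import Dict, Optional

METHODS = ("HEAD", "GET", "GET+", "Chrome")

# Final status code per method for one DOI; None stands for "no response".
FinalCodes = Dict[str, Optional[int]]


def bin_status(status: Optional[int]) -> str:
    """Bin a final status code at the hundreds level.

    Assumption: anything outside 200-599 (including None) counts as "Err".
    """
    if status is None or not 200 <= status <= 599:
        return "Err"
    return f"{status // 100}xx"


def consistent_success(final_codes: FinalCodes) -> bool:
    """True if the chain ended in a 2xx response for every method."""
    return all(bin_status(final_codes.get(m)) == "2xx" for m in METHODS)


def share_consistent(results: Dict[str, FinalCodes]) -> float:
    """Fraction of DOIs that succeed consistently across all methods."""
    if not results:
        return 0.0
    hits = sum(consistent_success(codes) for codes in results.values())
    return hits / len(results)


# Hypothetical toy input: made-up DOIs and codes, NOT results from the study.
toy_results = {
    "10.9999/example-a": {"HEAD": 200, "GET": 200, "GET+": 200, "Chrome": 200},
    "10.9999/example-b": {"HEAD": 404, "GET": 200, "GET+": 200, "Chrome": 200},
    "10.9999/example-c": {"HEAD": 200, "GET": 302, "GET+": 200, "Chrome": 200},
    "10.9999/example-d": {"HEAD": 200, "GET": 200, "GET+": 200, "Chrome": 200},
}

for doi, codes in toy_results.items():
    print(doi, {m: bin_status(codes.get(m)) for m in METHODS})
print("consistently successful:", share_consistent(toy_results))
```

Keeping the per-method bins rather than a single success flag makes it possible to spot the patterns discussed in notes 14 to 17, such as a DOI that ends in a 404 with HEAD but in a 200 with another method.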