Who is Asking? Humans and Machines Experience a Different Scholarly Web
@mart1nkle1n
iPres, Amsterdam, The Netherlands, September 17 2019
Martin Klein
Los Alamos National Laboratory
martinklein0815@gmail.com
@mart1nkle1n
with
Lyudmila Balakireva (LANL)
Harihar Shankar (98point6)
Who is Asking?
Humans and Machines
Experience a Different Scholarly Web
[Figure: response code classes (2xx, 3xx, 4xx, 5xx) per request method (HEAD, GET, GET+, Chrome, IA Crawl); count axis 0 to 5000]
Imagine this is your phone…
…and you are calling 112…
…this person responds...
…and you are getting the help you need!
What if this is your phone …
…and you are calling 112…
…this other person responds...
…and some “help” is coming!
But what if this is your phone …
… and you are calling 112 …
…no one responds...
…and you don’t get any help!
No more scary 112 calls!
• Phones are web clients
• 112 calls are HTTP requests against DOIs
• Regardless of the web client you use, would you not expect
the same response from a web server responding to the
request against a DOI?
Idea…
• Comparative study investigating scholarly publishers’ responses
• To common HTTP requests
• Against DOIs
• Using multiple different web clients, resembling
• Machines browsing
• Humans browsing
Why is this relevant?
• Archival use case
• Libraries, archives, preservation orgs capturing/archiving
scholarly resources on the web
• Dynamic nature of the web
• Requires continuous updating of crawling frameworks
• If we can discover and learn patterns
• Crawling and archiving frameworks could be “smarter”
How does this work?
10.1007/978-3-540-87599-4_38
How does this not work?
10.1007/978-3-540-87599-4_38
How does this work?
https://doi.org/10.1007/978-3-540-87599-4_38
→ http://link.springer.com/10.1007/978-3-540-87599-4_38
→ https://link.springer.com/10.1007/978-3-540-87599-4_38
→ https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
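A redirect chain like the one above can be recorded hop by hop. The following is a minimal Python sketch using only the standard library; it is not the tooling used in the study (which relied on cURL and Chrome), and the names `ChainRecorder` and `record_chain` are hypothetical.

```python
import urllib.request


class ChainRecorder(urllib.request.HTTPRedirectHandler):
    """Redirect handler that remembers every hop it is asked to follow."""

    def __init__(self):
        self.hops = []  # list of (status_code, target_url) tuples

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        # Called by urllib once per 3xx response, before the next request.
        self.hops.append((code, newurl))
        return super().redirect_request(req, fp, code, msg, headers, newurl)


def record_chain(url, timeout=30):
    """Follow redirects from `url`; return (hops, final_status, final_url).

    A final 4xx/5xx raises urllib.error.HTTPError, which callers must catch.
    """
    recorder = ChainRecorder()
    opener = urllib.request.build_opener(recorder)
    with opener.open(url, timeout=timeout) as resp:
        return recorder.hops, resp.status, resp.geturl()
```

Calling `record_chain("https://doi.org/10.1007/978-3-540-87599-4_38")` would return the list of hops plus the status and URL of the last link; the result depends on the publisher's current configuration.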
DOI dataset
• Gathering a representative sample is not trivial!
• Internet Archive conducts crawls of the scholarly domain
• June 2018: 93 million DOIs
• Obtained WARC files and extracted DOI redirect chains
• Investigate publisher distribution
• Take the final link of each redirect chain and extract its host, e.g.:
https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
→ Domain: springer.com
• Randomly pick 100 DOIs from the 100 most frequent domains
• 10,000 DOIs
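The sampling steps above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: `registered_domain` and `sample_dois` are hypothetical names, the input is assumed to be a mapping from DOI to the final URL of its redirect chain, and the domain is derived naively from the last two host labels (a public suffix list would be needed for hosts such as `example.co.uk`).

```python
import random
from collections import defaultdict
from urllib.parse import urlsplit


def registered_domain(url):
    """Naive domain extraction: keep the last two labels of the host."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def sample_dois(final_urls, top_n=100, per_domain=100, seed=0):
    """Pick up to `per_domain` random DOIs from each of the `top_n` domains.

    `final_urls` maps DOI -> last URL of its redirect chain.
    """
    by_domain = defaultdict(list)
    for doi, url in final_urls.items():
        by_domain[registered_domain(url)].append(doi)

    # Most frequent domains first.
    top = sorted(by_domain, key=lambda d: len(by_domain[d]), reverse=True)[:top_n]

    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return {
        d: rng.sample(by_domain[d], min(per_domain, len(by_domain[d])))
        for d in top
    }
```

With the defaults, 100 domains times 100 DOIs yields the 10,000 DOIs mentioned above, provided every top domain has at least 100 DOIs.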
Domain distribution
[Figure: frequency of DOIs per host; x-axis: Hosts (0 to 10,000), y-axis: Frequency (log scale, 1e+00 to 1e+06)]
Web clients and HTTP requests 1/4
• HEAD request
• Server responds with response headers
• *but no* response body
• Client: cURL
Web clients and HTTP requests 2/4
• GET request
• Server responds with response headers
• *and* response body
• Client: cURL
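The plain HEAD and GET requests described above can be approximated in Python. The study used cURL; this sketch substitutes the standard library's `urllib`, whose redirect handling and default headers are not identical to cURL's, so results need not match. `build_request` and `send` are hypothetical helper names.

```python
import urllib.request

DOI_RESOLVER = "https://doi.org/"


def build_request(doi, method):
    """Build (but do not send) a HEAD or GET request for a DOI."""
    if method not in ("HEAD", "GET"):
        raise ValueError(f"unsupported method: {method}")
    return urllib.request.Request(DOI_RESOLVER + doi, method=method)


def send(request, timeout=30):
    """Send the request; return (status, headers, body_bytes).

    A final 4xx/5xx raises urllib.error.HTTPError, which callers must catch.
    """
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status, dict(resp.headers.items()), resp.read()
```

No request headers are set explicitly here, which mirrors the "plain" character of these two methods; `urllib` still adds its own default `User-Agent` when sending.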
Web clients and HTTP requests 3/4
• GET+
• GET request with request headers
• User Agent (desktop Chrome browser)
• Specified connection timeout
• Specified maximum number of redirects
• Cookies accepted and stored
• Insecure connections allowed
• Client: cURL
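The GET+ settings listed above map onto standard-library building blocks in Python. This is a sketch rather than the study's cURL invocation: the User-Agent string is an example value (the slides do not give the exact one), and `build_get_plus_opener` and its `max_redirects` default are assumptions.

```python
import http.cookiejar
import ssl
import urllib.request

# Assumption: an example desktop Chrome User-Agent string.
CHROME_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/76.0.3809.132 Safari/537.36"
)


def build_get_plus_opener(max_redirects=20):
    """Opener that stores cookies, caps redirects and skips TLS verification."""
    insecure = ssl.create_default_context()
    insecure.check_hostname = False       # must be disabled before verify_mode
    insecure.verify_mode = ssl.CERT_NONE  # "insecure connections allowed"

    redirects = urllib.request.HTTPRedirectHandler()
    redirects.max_redirections = max_redirects  # maximum number of redirects

    jar = http.cookiejar.CookieJar()  # cookies accepted and stored in memory
    opener = urllib.request.build_opener(
        urllib.request.HTTPSHandler(context=insecure),
        urllib.request.HTTPCookieProcessor(jar),
        redirects,
    )
    opener.addheaders = [("User-Agent", CHROME_UA)]
    return opener, jar
```

The timeout is passed per request, for example `opener.open(url, timeout=30)`; after the call, `jar` holds any cookies set along the redirect chain.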
Web clients and HTTP requests 4/4
• Chrome:
• GET request via Selenium Webdriver controlled browser
• Client: Chrome
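A browser-driven request of this kind might look like the sketch below. It assumes the third-party `selenium` package and a matching Chrome driver are installed; `fetch_with_chrome` and the headless option are illustrative choices, not details taken from the study.

```python
def fetch_with_chrome(url, headless=True):
    """Load `url` in a Selenium-controlled Chrome; return (final_url, html).

    WebDriver does not expose HTTP status codes, so only the landing URL
    and the rendered page source are returned here.
    """
    from selenium import webdriver  # lazy import: optional dependency

    options = webdriver.ChromeOptions()
    if headless:
        options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)  # the browser follows HTTP and JavaScript redirects
        return driver.current_url, driver.page_source
    finally:
        driver.quit()  # always release the browser process
```

Because a real browser executes JavaScript and sends its full set of request headers, this client comes closest to a human browsing.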
Regarding response headers, RFC 7231 states:
“The server SHOULD send the same header
fields in response to a HEAD request as it would
have sent if the request had been a GET...”.
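One way to check this expectation is to compare the header field names returned for HEAD and GET against the same URL. The helper below is a hypothetical sketch; it compares names case-insensitively and ignores values. Note that the same RFC sentence continues by allowing payload header fields to be omitted from HEAD responses, so a difference is not automatically a violation.

```python
def header_differences(head_headers, get_headers):
    """Compare header field names (case-insensitively) of two responses.

    Both arguments are mappings of header name -> value.
    """
    head_names = {name.lower() for name in head_headers}
    get_names = {name.lower() for name in get_headers}
    return {
        "only_in_head": sorted(head_names - get_names),
        "only_in_get": sorted(get_names - head_names),
    }
```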
HTTP response codes
• 2xx
• Success
• 3xx
• Redirection
• 4xx
• Client error
• 5xx
• Server error
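Grouping individual status codes into these classes is a one-line integer division. The sketch below uses hypothetical function names and also tallies a list of final response codes by class.

```python
from collections import Counter


def status_class(code):
    """Map an HTTP status code to its class label, e.g. 404 -> '4xx'."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return f"{code // 100}xx"


def class_counts(final_codes):
    """Count how many final responses fall into each class."""
    return Counter(status_class(c) for c in final_codes)
```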
Response codes of last link in redirect chain
[Figure: frequency of each final response code (200, 301, 302, 303, 400, 401, 403, 404, 405, 406, 500, 502, 503, 509, 520) per request method: HEAD, GET, GET+, Chrome, IA Crawl]
Response codes of last link in redirect chain by DOI
[Figure: final response code class (2xx, 3xx, 4xx, 5xx) for each DOI, per request method: HEAD, GET, GET+, Chrome, IA Crawl]
Frequency of number of redirects
[Figure: frequency of redirect counts (observed values: 1 to 8, 14, 21) per request method: HEAD, GET, GET+, Chrome, IA Crawl]
Frequency of number of redirects for final 200s
[Figure: frequency of redirect counts (observed values: 2 to 8, 14) per request method: HEAD, GET, GET+, Chrome, IA Crawl]
Take-aways & next steps
• Scholarly publishers respond differently to requests against DOIs
• Depending on HTTP client and request method
• Implications for crawlers:
• Test different combinations of clients and request methods
• Pretend to be as human as possible
• Repeat from within LANL network with subscriptions to publishers’
content
• Repeat at a later point in time, check for changes in redirection
chains
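For a crawler that tests several client and method combinations, a simple first check is to flag DOIs whose final response class differs between them. The sketch below assumes results are collected as a mapping from DOI to a per-method status code; the function name and data shape are hypothetical.

```python
def disagreeing_dois(results):
    """Return the DOIs whose response class differs between methods.

    `results` maps DOI -> {method_name: final_status_code}.
    """
    flagged = {}
    for doi, by_method in results.items():
        classes = {code // 100 for code in by_method.values()}
        if len(classes) > 1:  # e.g. 2xx for Chrome but 4xx for HEAD
            flagged[doi] = by_method
    return flagged
```

DOIs returned by this function are candidates for a closer look, for instance retrying with a more browser-like client.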
Who is Asking? Humans and Machines Experience a Different Scholarly Web
@mart1nkle1n
iPres, Amsterdam, The Netherlands, September 17 2019
Martin Klein
Los Alamos National Laboratory
martinklein0815@gmail.com
@mart1nkle1n
with
Lyudmila Balakireva (LANL)
Harihar Shankar (98point6)
Who is Asking?
Humans and Machines
Experience a Different Scholarly Web
HEAD GET GET+ Chrome IA Crawl
2xx 3xx 4xx 5xx
HEAD GET GET+ Chrome IA Crawl
010002000300040005000
2xx 3xx 4xx 5xx

More Related Content

More from Martin Klein

A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly ArtifactsMartin Klein
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...Martin Klein
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento RequestsMartin Klein
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesMartin Klein
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsMartin Klein
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsMartin Klein
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live WebMartin Klein
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web ResourcesMartin Klein
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for RepositoriesMartin Klein
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDMartin Klein
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationMartin Klein
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw MementosMartin Klein
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationMartin Klein
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...Martin Klein
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationMartin Klein
 
web_archive_interoperability_memento
web_archive_interoperability_mementoweb_archive_interoperability_memento
web_archive_interoperability_mementoMartin Klein
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Martin Klein
 
Comparing Published Scientific Journal Articles to Their Pre-print Versions
Comparing Published Scientific Journal Articles  to Their Pre-print VersionsComparing Published Scientific Journal Articles  to Their Pre-print Versions
Comparing Published Scientific Journal Articles to Their Pre-print VersionsMartin Klein
 
Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Martin Klein
 
How much does $1.7 billion buy?
How much does $1.7 billion buy?How much does $1.7 billion buy?
How much does $1.7 billion buy?Martin Klein
 

More from Martin Klein (20)

A Vision of the Library’s Role in Archiving Scholarly Artifacts
A Vision of the Library’s Role  in Archiving Scholarly ArtifactsA Vision of the Library’s Role  in Archiving Scholarly Artifacts
A Vision of the Library’s Role in Archiving Scholarly Artifacts
 
First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...First Steps in Research Data Management Under Constraints of a National Secur...
First Steps in Research Data Management Under Constraints of a National Secur...
 
Smart Routing of Memento Requests
Smart Routing of Memento RequestsSmart Routing of Memento Requests
Smart Routing of Memento Requests
 
Building Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web ArchivesBuilding Event Collections from Crawling Web Archives
Building Event Collections from Crawling Web Archives
 
A Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly ArtifactsA Web-Centric Pipeline for Archiving Scholarly Artifacts
A Web-Centric Pipeline for Archiving Scholarly Artifacts
 
Focused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event CollectionsFocused Crawl of Web Archives to Build Event Collections
Focused Crawl of Web Archives to Build Event Collections
 
Creating Topical Collections: Web Archives vs. Live Web
Creating Topical Collections:Web Archives vs. Live WebCreating Topical Collections:Web Archives vs. Live Web
Creating Topical Collections: Web Archives vs. Live Web
 
Robust Linking to Web Resources
Robust Linking to Web ResourcesRobust Linking to Web Resources
Robust Linking to Web Resources
 
Signposting for Repositories
Signposting for RepositoriesSignposting for Repositories
Signposting for Repositories
 
Discovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCIDDiscovering Scholarly Orphans Using ORCID
Discovering Scholarly Orphans Using ORCID
 
Using the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly CommunicationUsing the Memento Framework to Assess Content Drift in Scholarly Communication
Using the Memento Framework to Assess Content Drift in Scholarly Communication
 
Uniform Access to Raw Mementos
Uniform Access to Raw MementosUniform Access to Raw Mementos
Uniform Access to Raw Mementos
 
Robust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communicationRobust Links - a proposed solution to reference rot in scholarly communication
Robust Links - a proposed solution to reference rot in scholarly communication
 
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
ResourceSync - Overview and Real-World Use Cases for Discovery, Harvesting, a...
 
To the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly CommunicationTo the Rescue of the Orphans of Scholarly Communication
To the Rescue of the Orphans of Scholarly Communication
 
web_archive_interoperability_memento
web_archive_interoperability_mementoweb_archive_interoperability_memento
web_archive_interoperability_memento
 
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
Reference Rot in Scholarly Communication: A Reliable Quantification and a P...
 
Comparing Published Scientific Journal Articles to Their Pre-print Versions
Comparing Published Scientific Journal Articles  to Their Pre-print VersionsComparing Published Scientific Journal Articles  to Their Pre-print Versions
Comparing Published Scientific Journal Articles to Their Pre-print Versions
 
Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016Preserving Born-Digital News Panel JCDL 2016
Preserving Born-Digital News Panel JCDL 2016
 
How much does $1.7 billion buy?
How much does $1.7 billion buy?How much does $1.7 billion buy?
How much does $1.7 billion buy?
 

Recently uploaded

Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.soniya singh
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...Diya Sharma
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...aditipandeya
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Roomgirls4nights
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Sheetaleventcompany
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$kojalkojal131
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkataanamikaraghav4
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...APNIC
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebJames Anderson
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts servicesonalikaur4
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Callshivangimorya083
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Roomdivyansh0kumar0
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsstephieert
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of indiaimessage0108
 

Recently uploaded (20)

Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Ashram Chowk Delhi 💯Call Us 🔝8264348440🔝
 
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
Call Now ☎ 8264348440 !! Call Girls in Shahpur Jat Escort Service Delhi N.C.R.
 
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
₹5.5k {Cash Payment}New Friends Colony Call Girls In [Delhi NIHARIKA] 🔝|97111...
 
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No AdvanceRohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
Rohini Sector 6 Call Girls Delhi 9999965857 @Sabina Saikh No Advance
 
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
VIP 7001035870 Find & Meet Hyderabad Call Girls Dilsukhnagar high-profile Cal...
 
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
Call Girls In Saket Delhi 💯Call Us 🔝8264348440🔝
 
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With RoomVIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
VIP Kolkata Call Girls Salt Lake 8250192130 Available With Room
 
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
Call Girls Service Chandigarh Lucky ❤️ 7710465962 Independent Call Girls In C...
 
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls KolkataVIP Call Girls Kolkata Ananya 🤌  8250192130 🚀 Vip Call Girls Kolkata
VIP Call Girls Kolkata Ananya 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
Call Girls Dubai Prolapsed O525547819 Call Girls In Dubai Princes$
 
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls KolkataRussian Call Girls in Kolkata Ishita 🤌  8250192130 🚀 Vip Call Girls Kolkata
Russian Call Girls in Kolkata Ishita 🤌 8250192130 🚀 Vip Call Girls Kolkata
 
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
'Future Evolution of the Internet' delivered by Geoff Huston at Everything Op...
 
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark WebGDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
GDG Cloud Southlake 32: Kyle Hettinger: Demystifying the Dark Web
 
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts serviceChennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
Chennai Call Girls Porur Phone 🍆 8250192130 👅 celebrity escorts service
 
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip CallDelhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
Delhi Call Girls Rohini 9711199171 ☎✔👌✔ Whatsapp Hard And Sexy Vip Call
 
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130  Available With RoomVIP Kolkata Call Girl Alambazar 👉 8250192130  Available With Room
VIP Kolkata Call Girl Alambazar 👉 8250192130 Available With Room
 
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝Model Call Girl in  Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Jamuna Vihar Delhi reach out to us at 🔝9953056974🔝
 
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
Dwarka Sector 26 Call Girls | Delhi | 9999965857 🫦 Vanshika Verma More Our Se...
 
Radiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girlsRadiant Call girls in Dubai O56338O268 Dubai Call girls
Radiant Call girls in Dubai O56338O268 Dubai Call girls
 
Gram Darshan PPT cyber rural in villages of india
Gram Darshan PPT cyber rural  in villages of indiaGram Darshan PPT cyber rural  in villages of india
Gram Darshan PPT cyber rural in villages of india
 

Humans and Machines Experience Different Scholarly Web Responses

  • 1. Who is Asking? Humans and Machines Experience a Different Scholarly Web @mart1nkle1n iPres, Amsterdam, The Netherlands, September 17 2019 Martin Klein Los Alamos National Laboratory martinklein0815@gmail.com @mart1nkle1n with Lyudmila Balakireva (LANL) Harihar Shankar (98point6) Who is Asking? Humans and Machines Experience a Different Scholarly Web HEAD GET GET+ Chrome IA Crawl 2xx 3xx 4xx 5xx HEAD GET GET+ Chrome IA Crawl 010002000300040005000 2xx 3xx 4xx 5xx
  • 2. Imagine this is your phone…
  • 3. …and you are calling 112…
  • 4. …this person responds…
  • 5. …and you are getting the help you need!
  • 6. What if this is your phone…
  • 7. …and you are calling 112…
  • 8. …this other person responds…
  • 9. …and some “help” is coming!
  • 10. But what if this is your phone…
  • 11. …and you are calling 112…
  • 12. …no one responds…
  • 13. …and you don’t get any help!
  • 14. No more scary 112 calls!
      • Phones are web clients
      • 112 calls are HTTP requests against DOIs
      • Regardless of the web client you use, would you not expect the same response from a web server responding to the request against a DOI?
  • 15. Idea…
      • Comparative study investigating scholarly publishers’ responses
          • To common HTTP requests
          • Against DOIs
          • Using multiple different web clients, resembling
              • Machines browsing
              • Humans browsing
  • 16. Why is this relevant?
      • Archival use case
          • Libraries, archives, preservation orgs capturing/archiving scholarly resources on the web
      • Dynamic nature of the web
          • Requires continuous updating of crawling frameworks
      • If we can discover and learn patterns
          • Crawling and archiving frameworks could be “smarter”
  • 17. How does this work? 10.1007/978-3-540-87599-4_38
  • 18. How does this not work? 10.1007/978-3-540-87599-4_38
  • 19. How does this work? https://doi.org/10.1007/978-3-540-87599-4_38
  • 20. How does this work? https://doi.org/10.1007/978-3-540-87599-4_38
  • 21. How does this work? https://doi.org/10.1007/978-3-540-87599-4_38
  • 22. How does this work?
      https://doi.org/10.1007/978-3-540-87599-4_38
      → http://link.springer.com/10.1007/978-3-540-87599-4_38
  • 23. How does this work?
      https://doi.org/10.1007/978-3-540-87599-4_38
      → http://link.springer.com/10.1007/978-3-540-87599-4_38
      → https://link.springer.com/10.1007/978-3-540-87599-4_38
  • 24. How does this work?
      https://doi.org/10.1007/978-3-540-87599-4_38
      → http://link.springer.com/10.1007/978-3-540-87599-4_38
      → https://link.springer.com/10.1007/978-3-540-87599-4_38
      → https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38
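The redirect chain on slides 22 to 24 can be sketched in a few lines of Python. This is only an illustration, not the tooling used in the study: `doi_to_url`, `follow_redirects`, and `SLIDE_REDIRECTS` are hypothetical names, and the lookup table stands in for real HTTP `Location` headers so the example runs offline.

```python
from typing import Callable, List, Optional

DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn a bare DOI into an actionable HTTP URI via the DOI resolver."""
    return DOI_RESOLVER + doi


def follow_redirects(
    start_url: str,
    next_location: Callable[[str], Optional[str]],
    max_redirects: int = 20,  # assumed limit, not taken from the slides
) -> List[str]:
    """Collect every URL visited, starting at start_url.

    next_location returns the redirect target for a URL, or None when
    the URL does not redirect any further.
    """
    chain = [start_url]
    for _ in range(max_redirects):
        target = next_location(chain[-1])
        if target is None:
            break
        chain.append(target)
    return chain


# Offline stand-in for the redirects shown on slides 22 to 24.
SLIDE_REDIRECTS = {
    "https://doi.org/10.1007/978-3-540-87599-4_38":
        "http://link.springer.com/10.1007/978-3-540-87599-4_38",
    "http://link.springer.com/10.1007/978-3-540-87599-4_38":
        "https://link.springer.com/10.1007/978-3-540-87599-4_38",
    "https://link.springer.com/10.1007/978-3-540-87599-4_38":
        "https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38",
}

chain = follow_redirects(
    doi_to_url("10.1007/978-3-540-87599-4_38"), SLIDE_REDIRECTS.get
)
for hop, url in enumerate(chain):
    print(hop, url)
```

Against the live web, `next_location` would issue a request and read the `Location` header of a 3xx response instead of consulting a table.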
  • 25. DOI dataset
      • Gathering a representative sample is not trivial!
      • Internet Archive conducts crawls of the scholarly domain
          • June 2018: 93 million DOIs
          • Obtained WARC files and extracted the DOI redirect chains
      • Investigate publisher distribution
          • Take the final link of each redirect chain and extract the host, e.g.: https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38 → Domain: springer.com
      • Randomly pick 100 DOIs from each of the 100 most frequent domains
          • 10,000 DOIs
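The sampling step on this slide could look roughly like the sketch below. It rests on assumptions: the function names are hypothetical, the input is a plain mapping from DOI to the final URL of its redirect chain, and the domain is taken naively as the last two host labels (which gives springer.com for link.springer.com but would need the Public Suffix List to handle hosts such as example.co.uk).

```python
import random
from collections import defaultdict
from typing import Dict, List
from urllib.parse import urlsplit


def registered_domain(url: str) -> str:
    """Naive domain extraction: keep the last two labels of the host."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def sample_dois(
    final_urls: Dict[str, str],
    top_domains: int = 100,
    per_domain: int = 100,
    seed: int = 0,
) -> Dict[str, List[str]]:
    """Pick up to per_domain random DOIs from each of the most frequent domains."""
    by_domain = defaultdict(list)
    for doi, url in final_urls.items():
        by_domain[registered_domain(url)].append(doi)
    # Rank domains by number of DOIs, most frequent first (ties by name).
    ranked = sorted(by_domain, key=lambda d: (-len(by_domain[d]), d))[:top_domains]
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return {
        d: rng.sample(by_domain[d], min(per_domain, len(by_domain[d])))
        for d in ranked
    }


print(registered_domain(
    "https://link.springer.com/chapter/10.1007%2F978-3-540-87599-4_38"
))
```

With the defaults of 100 domains and 100 DOIs per domain, a sufficiently large input yields the 10,000 DOIs mentioned on the slide.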
  • 26. Domain distribution
      [Figure: distribution of DOIs over hosts; axes: Hosts, Frequency (logarithmic scale)]
  • 27. Web clients and HTTP requests 1/4
      • HEAD request
          • Server responds with response headers
          • *but no* response body
      • Client: cURL
  • 28. Web clients and HTTP requests 1/4
      • HEAD request
          • Server responds with response headers
          • *but no* response body
      • Client: cURL
  • 29. Web clients and HTTP requests 2/4
      • GET request
          • Server responds with response headers
          • *and* response body
      • Client: cURL
  • 30. Web clients and HTTP requests 2/4
      • GET request
          • Server responds with response headers
          • *and* response body
      • Client: cURL
  • 31. Web clients and HTTP requests 3/4
      • GET+
          • GET request with request headers
          • User Agent (desktop Chrome browser)
          • Specified connection timeout
          • Specified maximum number of redirects
          • Cookies accepted and stored
          • Insecure connections allowed
      • Client: cURL
  • 32. Web clients and HTTP requests 3/4
      • GET+
          • GET request with request headers
          • User Agent (desktop Chrome browser)
          • Specified connection timeout
          • Specified maximum number of redirects
          • Cookies accepted and stored
          • Insecure connections allowed
      • Client: cURL
  • 33. Web clients and HTTP requests 4/4
      • Chrome:
          • GET request via Selenium WebDriver-controlled browser
      • Client: Chrome
  • 34. Web clients and HTTP requests 4/4
      • Chrome:
          • GET request via Selenium WebDriver-controlled browser
      • Client: Chrome
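The three cURL-based request types could be expressed as command lines along the following lines. The slides name the options but not their values, so everything concrete here is an assumption: the user-agent string is a placeholder, the timeout, redirect limit, and cookie file name are made up, and the `--write-out` report format is just one convenient way to capture the final status code and redirect count. The helper `curl_command` is hypothetical; the cURL flags themselves are standard.

```python
from typing import List

# Assumed settings; the slides do not state the actual values.
USER_AGENT = "placeholder desktop Chrome user-agent string"
CONNECT_TIMEOUT_S = 30
MAX_REDIRECTS = 20
COOKIE_FILE = "cookies.txt"
# Final status code, number of redirects followed, and final URL.
REPORT = r"%{http_code} %{num_redirects} %{url_effective}\n"


def curl_command(url: str, request_type: str) -> List[str]:
    """Build a cURL argument list for the HEAD, GET, or GET+ request type."""
    cmd = [
        "curl", "--silent", "--location",   # follow redirects quietly
        "--output", "/dev/null",            # discard headers/body output
        "--write-out", REPORT,
    ]
    if request_type == "HEAD":
        cmd.append("--head")                # headers only, no response body
    elif request_type == "GET+":
        cmd += [
            "--user-agent", USER_AGENT,
            "--connect-timeout", str(CONNECT_TIMEOUT_S),
            "--max-redirs", str(MAX_REDIRECTS),
            "--cookie", COOKIE_FILE,        # send cookies stored so far
            "--cookie-jar", COOKIE_FILE,    # store cookies set by the server
            "--insecure",                   # allow insecure TLS connections
        ]
    elif request_type != "GET":
        raise ValueError(f"unknown request type: {request_type}")
    return cmd + [url]


for kind in ("HEAD", "GET", "GET+"):
    print(" ".join(curl_command("https://doi.org/10.1007/978-3-540-87599-4_38", kind)))
```

Each list can be passed to `subprocess.run(..., capture_output=True, text=True)` to issue the request; the Chrome case on slides 33 and 34 needs a real browser driven by Selenium and is not covered here.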
  • 35. Regarding response headers, RFC 7231 states: “The server SHOULD send the same header fields in response to a HEAD request as it would have sent if the request had been a GET...”
  • 36. HTTP response codes
      • 2xx: Success
      • 3xx: Redirection
      • 4xx: Client error
      • 5xx: Server error
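Grouping final status codes into these four classes is a one-line computation. The sketch below is illustrative only; `status_class` and `tally` are hypothetical helper names, and the sample codes are made up rather than taken from the study's results.

```python
from collections import Counter
from typing import Iterable

CLASS_MEANING = {
    "2xx": "Success",
    "3xx": "Redirection",
    "4xx": "Client error",
    "5xx": "Server error",
}


def status_class(code: int) -> str:
    """Map an HTTP status code such as 404 to its class label such as '4xx'."""
    if not 100 <= code <= 599:
        raise ValueError(f"not an HTTP status code: {code}")
    return f"{code // 100}xx"


def tally(codes: Iterable[int]) -> Counter:
    """Count how many final responses fall into each class."""
    return Counter(status_class(c) for c in codes)


# Made-up final status codes for one client, for illustration only.
counts = tally([200, 200, 302, 403, 503, 520])
for label, meaning in CLASS_MEANING.items():
    print(label, meaning, counts[label])
```

Running `tally` once per client (HEAD, GET, GET+, Chrome) gives per-class counts that can be compared side by side.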
  • 37. Response codes of last link in redirect chain
      [Figure: frequency of each final status code (200, 301, 302, 303, 400, 401, 403, 404, 405, 406, 500, 502, 503, 509, 520) for HEAD, GET, GET+, Chrome, and IA Crawl]
  • 38. Response codes of last link in redirect chain by DOI
      [Figure: response-code class (2xx, 3xx, 4xx, 5xx) per DOI for HEAD, GET, GET+, Chrome, and IA Crawl]
  • 39. Frequency of number of redirects
      [Figure: frequency of redirect counts (1 to 8, 14, 21) for HEAD, GET, GET+, Chrome, and IA Crawl]
  • 40. Frequency of number of redirects for final 200s
      [Figure: frequency of redirect counts (2 to 8, 14) for chains ending in a 200, for HEAD, GET, GET+, Chrome, and IA Crawl]
  • 41. Take-aways & next steps
      • Scholarly publishers respond differently to requests against DOIs
          • Depending on HTTP client and request method
      • Implications for crawlers:
          • Test different combinations of clients and request methods
          • Pretend to be as human as possible
      • Repeat from within LANL network with subscriptions to publishers’ content
      • Repeat at a later point in time, check for changes in redirection chains
  • 42. Martin Klein, Los Alamos National Laboratory, martinklein0815@gmail.com, @mart1nkle1n
      with Lyudmila Balakireva (LANL) and Harihar Shankar (98point6)
      [Figure: two charts of response-code classes (2xx, 3xx, 4xx, 5xx) for HEAD, GET, GET+, Chrome, and IA Crawl]