UCD Library
University College Dublin,
Belfield, Dublin 4, Ireland
Robot hunter
Or, precisely what I thought I wouldn’t be doing when I became a librarian
Joseph Greene
Research Repository Librarian
joseph.greene@ucd.ie
http://researchrepository.ucd.ie
Counting downloads
• Open Access repositories make science and scholarship accessible, and we need to demonstrate our value
• Simple question: how often are these papers used? How many times have they been downloaded?
Enter the Robot
• At least 18% of web requests come from robots
• Less than half of this robot traffic can be accounted for by the five main search engines
• At Research Repository UCD, two-thirds of downloads are flagged as coming from web robots
What are you talking about?
Internet robot, Web robot, automated agent, crawler, spider, bot: any programme that visits websites and systematically retrieves information from them
Good and bad
• Good: search engines, link verifiers, computer science experiments
• Bad: gathering content for spam, phishing and copycat sites, artificially improving a website’s ranking (spamdexing), looking for security holes, DDoS attacks…
‘And the noisy, nasty nuisance grew, ’til the villagers cried, “What can we do?”’
Detection methods:
• Blocking robots in real time: Turing tests
• Detecting robots later and removing their hits from the statistics
Appropriate but problematic methods for repositories
• Excluding known robots by user-agent name
– Easily faked or omitted
• Excluding by IP address
– Addresses are reassigned dynamically (DHCP), and the list is growing exponentially
• Usage pattern analysis: query rate and resources requested
– Expensive to automate
• Machine learning: training decision trees, neural nets and/or statistical systems
– Did you say expensive???
• Combined approaches
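The first two methods above are simple list lookups, which is why repository platforms can ship them out of the box, and also why they are easy to evade. A minimal sketch, assuming hypothetical agent-name patterns and a placeholder IP range (these are not the COUNTER list or any platform's actual configuration):

```python
import ipaddress
import re
from typing import Optional

# Hypothetical example lists. A real repository would load a maintained list
# (for instance the COUNTER robots list) instead of hard-coding a few entries.
KNOWN_AGENT_PATTERNS = [r"bot", r"crawler", r"spider", r"slurp"]
KNOWN_ROBOT_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]  # placeholder (documentation range)

AGENT_RE = re.compile("|".join(KNOWN_AGENT_PATTERNS), re.IGNORECASE)


def is_known_agent(user_agent: Optional[str]) -> bool:
    """True if the declared user-agent matches a known robot pattern.

    A missing user-agent is not flagged: a robot that omits or fakes the
    header slips straight through, which is this method's main weakness."""
    return bool(user_agent) and AGENT_RE.search(user_agent) is not None


def is_known_ip(ip: str) -> bool:
    """True if the client address lies inside a listed robot network.

    Dynamically assigned (DHCP) addresses mean such lists go stale quickly."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in KNOWN_ROBOT_NETWORKS)


def is_robot(ip: str, user_agent: Optional[str]) -> bool:
    """Combine the two list-based checks: either match marks the request as a robot."""
    return is_known_agent(user_agent) or is_known_ip(ip)


if __name__ == "__main__":
    print(is_robot("192.0.2.17", "Mozilla/5.0"))                     # listed IP
    print(is_robot("198.51.100.4", "Googlebot/2.1"))                  # agent name
    print(is_robot("198.51.100.4", "Mozilla/5.0 (Windows NT 10.0)"))  # neither
```

Combining the two checks with a simple OR raises recall, but each list still only catches robots that have already been seen and named.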
Effectiveness, and out-of-the-box repository strategies
Results of the benchmarking study by Geens, Huysmans and Vanthienen (2006):
Robots detected by          Recall (%)  Precision (%)
No images requested              98.34          75.48
No referring site                96.27          52.25
List of IP addresses             69.29          99.40
HEAD method to access site       32.37         100.00
Agent name declared              26.56         100.00
Access only at night             24.48          50.43
Robots.txt file accessed         17.01         100.00
Time, σ (3s)                      2.49         100.00
Time, average (1s)                2.49          75.00
DSpace uses IP addresses of known agents, much weaker than in the benchmarking study
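Each row of the table is a heuristic applied to a visitor session and scored against sessions whose robot or human status is already known: recall is the share of true robots the heuristic flags, and precision is the share of flagged sessions that really are robots. The sketch below assumes a hypothetical `Session` record carrying a few of the table's features plus a hand-assigned label, and uses made-up data; it illustrates the calculation, not the benchmarking study's own code or data.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, Tuple


@dataclass
class Session:
    """Hypothetical per-session features derived from a web server log."""
    requested_images: bool
    had_referrer: bool
    used_head: bool
    fetched_robots_txt: bool
    is_robot: bool  # ground-truth label, e.g. from manual inspection


Heuristic = Callable[[Session], bool]

# A few of the table's heuristics, each returning True when it flags a robot.
HEURISTICS: Dict[str, Heuristic] = {
    "No images requested": lambda s: not s.requested_images,
    "No referring site": lambda s: not s.had_referrer,
    "HEAD method to access site": lambda s: s.used_head,
    "Robots.txt file accessed": lambda s: s.fetched_robots_txt,
}


def recall_precision(sessions: Iterable[Session], flag: Heuristic) -> Tuple[float, float]:
    """Return (recall %, precision %) of one heuristic against labelled sessions."""
    tp = fp = fn = 0
    for s in sessions:
        flagged = flag(s)
        if flagged and s.is_robot:
            tp += 1  # robot correctly flagged
        elif flagged:
            fp += 1  # human wrongly flagged
        elif s.is_robot:
            fn += 1  # robot missed
    recall = 100 * tp / (tp + fn) if tp + fn else 0.0
    precision = 100 * tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


if __name__ == "__main__":
    # Four made-up sessions, for illustration only.
    sessions = [
        Session(False, False, True, True, True),    # robot: no images, no referrer, HEAD, robots.txt
        Session(False, False, False, False, True),  # robot: no images, no referrer
        Session(True, False, False, False, False),  # human who typed the URL directly
        Session(True, True, False, False, False),   # human arriving from a link
    ]
    for name, heuristic in HEURISTICS.items():
        r, p = recall_precision(sessions, heuristic)
        print(f"{name:28}{r:8.2f}{p:8.2f}")
```

With these toy sessions, "No referring site" also flags the human who typed the URL, so its recall stays at 100% while its precision drops to about 67%, the same trade-off the table shows for that heuristic.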
Eprints filters based on the number of hits from an IP address per day, similar to the time-based strategies in the benchmarking study
Centralised strategy: IRUS-UK
• Collects and filters statistics from 84 DSpace and Eprints repositories
• COUNTER-compliant usage statistics
• Robot exclusion:
– The COUNTER list of agent names
– All downloads from an IP address that makes more than 200 downloads from a repository in a day
– Most downloads from an IP address that makes more than 100 downloads from a repository in a day
• Work commissioned to investigate the feasibility of, and approach to, adaptive filtering based on usage behaviour
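The two fully specified IRUS-UK exclusion rules above, the COUNTER agent-name list and the 200-downloads-per-day limit per IP address and repository, can be sketched as a filter over download records. The `Download` record layout and the short agent pattern are hypothetical stand-ins. The rule for IP addresses with more than 100 daily downloads is not modelled, because the slide says only that "most" of those downloads are removed without giving the criterion.

```python
import re
from collections import Counter
from dataclasses import dataclass
from datetime import datetime
from typing import List


@dataclass(frozen=True)
class Download:
    """Hypothetical record for one item download."""
    repository: str
    ip: str
    user_agent: str
    timestamp: datetime


# Placeholder patterns; IRUS-UK applies the full COUNTER list of agent names.
COUNTER_AGENT_RE = re.compile(r"bot|crawler|spider", re.IGNORECASE)
DAILY_LIMIT = 200  # more than this many downloads per IP, repository and day: drop them all


def filter_downloads(downloads: List[Download]) -> List[Download]:
    """Keep downloads that pass both the agent-name rule and the daily-volume rule."""
    # Rule 1: drop downloads whose declared agent matches the robots list.
    candidates = [d for d in downloads if not COUNTER_AGENT_RE.search(d.user_agent)]
    # Rule 2: count what is left per (repository, IP address, calendar day).
    # Counting after rule 1 rather than before is an assumption of this sketch.
    daily = Counter((d.repository, d.ip, d.timestamp.date()) for d in candidates)
    # Drop every download from a group that exceeds the limit.
    return [d for d in candidates
            if daily[(d.repository, d.ip, d.timestamp.date())] <= DAILY_LIMIT]


if __name__ == "__main__":
    day = datetime(2015, 6, 1, 12, 0)
    heavy = [Download("repoA", "203.0.113.9", "Mozilla/5.0", day)] * 201
    human = [Download("repoA", "198.51.100.7", "Mozilla/5.0", day)]
    robot = [Download("repoA", "198.51.100.8", "Googlebot/2.1", day)]
    print(len(filter_downloads(heavy + human + robot)))  # only the human download survives
```

Because the count is kept per repository, an address spreading 150 downloads across each of two repositories in one day stays under the limit in both; a central aggregator that pooled counts across repositories would behave differently.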
Sources by slide
1 Bontes, J. G. Bill Gosper's Glider Gun in action, a variation of Conway's Game of Life. <https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#/media/File:Gospers_glider_gun.gif>
3, 6, 7 Doran, D.; Gokhale, S. S. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery (2011) 22:183-210. DOI:10.1007/s10618-010-0180-z
4 <http://pixabay.com/static/uploads/photo/2015/05/31/12/09/wooden-791421_640.jpg>
5 Bad Robot Productions logo. 2001-2008. <https://en.wikipedia.org/wiki/Bad_Robot_Productions#/media/File:Bad_Robot_Productions_logo.jpg>
6 Burroway, J.; Lord, J. V. The Giant Jam Sandwich. 1972, Houghton Mifflin Harcourt.
8, 9, 10 Geens, N.; Huysmans, J.; Vanthienen, J. Evaluation of Web Robot Discovery Techniques: A Benchmarking Study. Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. Lecture Notes in Computer Science 4065, pp. 121-130, 2006. DOI:10.1007/11790853_10
8 Diggory, M. SOLR Statistics. DSpace Wiki. <https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics>
9 Joint, N. [EP-tech] Re: Please change the way IRstats works. Eprints_tech mailing list, 2011-10-13. <http://www.eprints.org/tech.php/15695.html>
11 IRUS-UK. <http://www.irus.mimas.ac.uk/participants/>
Thank you!

More Related Content

Viewers also liked

bradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-librarybradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-libraryUCD Library
 
What Is LibGuides?
What Is LibGuides?What Is LibGuides?
What Is LibGuides?UCD Library
 
The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...UCD Library
 
EU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over EuropeEU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over EuropeMarc Garriga
 
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...UCD Library
 
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...UCD Library
 
Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...UCD Library
 
UKASFP Conference 2009
UKASFP Conference 2009UKASFP Conference 2009
UKASFP Conference 2009carl plant
 
Presentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, BudapestPresentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, BudapestMarc Garriga
 
Seeing through learners' eyes
Seeing through learners' eyesSeeing through learners' eyes
Seeing through learners' eyesUCD Library
 
Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...UCD Library
 
Week 2 Uf 5163
Week 2 Uf 5163Week 2 Uf 5163
Week 2 Uf 5163Mohd Yusak
 
Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing Mark Szabo
 
Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?UCD Library
 
Roger matisse
Roger matisseRoger matisse
Roger matisseIrisat
 
Data driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datosData driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datosMarc Garriga
 

Viewers also liked (20)

bradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-librarybradail-irish-plagiarism-tutorial-ucd-library
bradail-irish-plagiarism-tutorial-ucd-library
 
What Is LibGuides?
What Is LibGuides?What Is LibGuides?
What Is LibGuides?
 
The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...The production process and promotion of video in UCD Library, integrating Web...
The production process and promotion of video in UCD Library, integrating Web...
 
EU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over EuropeEU Tools for all Open Data harmonisation all over Europe
EU Tools for all Open Data harmonisation all over Europe
 
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
Creating spaces for learning : designing the UCD Health Sciences Library. Aut...
 
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
Open data licensing : Trojan horse or sunken treasure? Authors: Caleb Derven,...
 
Paula
PaulaPaula
Paula
 
Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...Reassembling a Forgotten Library: The Library of the Royal College of Science...
Reassembling a Forgotten Library: The Library of the Royal College of Science...
 
Presentation2
Presentation2Presentation2
Presentation2
 
UKASFP Conference 2009
UKASFP Conference 2009UKASFP Conference 2009
UKASFP Conference 2009
 
Presentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, BudapestPresentation #1ODataLicenseEU. LAPSI Seminar, Budapest
Presentation #1ODataLicenseEU. LAPSI Seminar, Budapest
 
Seeing through learners' eyes
Seeing through learners' eyesSeeing through learners' eyes
Seeing through learners' eyes
 
Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...Teaching support : different perspectives, shared challenges. Authors: Ursula...
Teaching support : different perspectives, shared challenges. Authors: Ursula...
 
Presentation6
Presentation6Presentation6
Presentation6
 
Week 2 Uf 5163
Week 2 Uf 5163Week 2 Uf 5163
Week 2 Uf 5163
 
My 2 cents on Productivity
My 2 cents on ProductivityMy 2 cents on Productivity
My 2 cents on Productivity
 
Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing Environmental Conflicts - Resolution Through Reframing
Environmental Conflicts - Resolution Through Reframing
 
Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?Library Resource Discovery Service - Is Instructional Help Necessary?
Library Resource Discovery Service - Is Instructional Help Necessary?
 
Roger matisse
Roger matisseRoger matisse
Roger matisse
 
Data driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datosData driven cities: Gestionar las ciudades a partir de los datos
Data driven cities: Gestionar las ciudades a partir de los datos
 

Similar to Robot Hunter: or precisely what I thought I wouldn't be doing when I became a librarian

Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesUCD Library
 
hacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxhacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxsconalbg
 
Discovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case StudyDiscovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case StudyHong (Jenny) Jing
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...Ian Foster
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeEdward Baker
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2Spyglass Security
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science ServicesIan Foster
 
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016Danny Akacki
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchainjasonhaddix
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryIan Foster
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Alex Pinto
 
Dafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptxDafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptxAlfredObia1
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013Kirill Osipov
 
DEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And AttributionDEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And AttributionMichael Boman
 
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007Jason Hong
 
Chapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptxChapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptxMahdiHasanSowrav
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationIan Foster
 

Similar to Robot Hunter: or precisely what I thought I wouldn't be doing when I became a librarian (20)

Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access Resources
 
hacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptxhacking techniques and intrusion techniques useful in OSINT.pptx
hacking techniques and intrusion techniques useful in OSINT.pptx
 
Discovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case StudyDiscovery Systems Used in Academic Libraries Projects & Case Study
Discovery Systems Used in Academic Libraries Projects & Case Study
 
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...Computing Just What You Need: Online Data Analysis and Reduction  at Extreme ...
Computing Just What You Need: Online Data Analysis and Reduction at Extreme ...
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2Hunting: Defense Against The Dark Arts v2
Hunting: Defense Against The Dark Arts v2
 
Accelerating Discovery via Science Services
Accelerating Discovery via Science ServicesAccelerating Discovery via Science Services
Accelerating Discovery via Science Services
 
Solr for Data Science
Solr for Data ScienceSolr for Data Science
Solr for Data Science
 
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
Hunting: Defense Against The Dark Arts - BSides Philadelphia - 2016
 
The Web Application Hackers Toolchain
The Web Application Hackers ToolchainThe Web Application Hackers Toolchain
The Web Application Hackers Toolchain
 
Science as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate DiscoveryScience as a Service: How On-Demand Computing can Accelerate Discovery
Science as a Service: How On-Demand Computing can Accelerate Discovery
 
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013Applying Machine Learning to Network Security Monitoring - BayThreat 2013
Applying Machine Learning to Network Security Monitoring - BayThreat 2013
 
Dafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptxDafgjgghhghfhjgghjhgy06-Footprinting.pptx
Dafgjgghhghfhjgghjhgy06-Footprinting.pptx
 
Science cloud foster june 2013
Science cloud foster june 2013Science cloud foster june 2013
Science cloud foster june 2013
 
DEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And AttributionDEEPSEC 2013: Malware Datamining And Attribution
DEEPSEC 2013: Malware Datamining And Attribution
 
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
Phinding Phish: An Evaluation of Anti-Phishing Toolbars, at NDSS 2007
 
Bots & spiders
Bots & spidersBots & spiders
Bots & spiders
 
Chapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptxChapter 2 for cyber security examination.pptx
Chapter 2 for cyber security examination.pptx
 
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and AutomationThe Discovery Cloud: Accelerating Science via Outsourcing and Automation
The Discovery Cloud: Accelerating Science via Outsourcing and Automation
 

More from UCD Library

The role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityThe role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityUCD Library
 
Collection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryCollection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryUCD Library
 
The authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesThe authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesUCD Library
 
Show and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationShow and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationUCD Library
 
Print to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryPrint to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryUCD Library
 
Appearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersAppearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersUCD Library
 
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...UCD Library
 
UCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library
 
Going Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaGoing Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaUCD Library
 
Going Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaGoing Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaUCD Library
 
Clifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewClifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewUCD Library
 
UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Library
 
Optimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryOptimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryUCD Library
 
Creating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionCreating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionUCD Library
 
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...UCD Library
 
Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...UCD Library
 
UCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library
 
Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...UCD Library
 
Pin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsPin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsUCD Library
 
Real Life Digital Curation and Preservation
Real Life Digital Curation and PreservationReal Life Digital Curation and Preservation
Real Life Digital Curation and PreservationUCD Library
 

More from UCD Library (20)

The role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityThe role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrity
 
Collection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryCollection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD Library
 
The authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesThe authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA Humanities
 
Show and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationShow and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and education
 
Print to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryPrint to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital Library
 
Appearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersAppearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishers
 
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
 
UCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for Researchers
 
Going Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaGoing Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in China
 
Going Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaGoing Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in China
 
Clifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewClifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an Overview
 
UCD Digital Library: Creating Digitised Content from Archival Collections - P...

Optimising Workflows for Digital Archives: UCD Digital Library

Creating the Collected Letters of Nano Nagle Digital Collection

#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...

Enhancing User Engagement and Experiences through the Development of UCD Libr...

UCD Library and GreenGlass: Defining Needs, Redefining Collections

Are They Being Served? Reference Services Student Experience Project, UCD Lib...

Pin It! Linking shelf-marks to shelf locations

Real Life Digital Curation and Preservation
 


Robot Hunter: or precisely what I thought I wouldn't be doing when I became a librarian

  • 1. Leabharlann UCD, An Coláiste Ollscoile, Baile Átha Cliath, Belfield, Baile Átha Cliath 4, Éire
    UCD Library, University College Dublin, Belfield, Dublin 4, Ireland
    Robot hunter
    Or, precisely what I thought I wouldn't be doing when I became a librarian
    Joseph Greene, Research Repository Librarian
    joseph.greene@ucd.ie
    http://researchrepository.ucd.ie
  • 2. Counting downloads
    • Open Access repositories make science and scholarship accessible, and we need to demonstrate our value
    • Simple question: how often are these papers used? How many times have they been downloaded?
  • 3. Enter the Robot
    • At least 18% of web requests come from robots
    • Fewer than half of these can be accounted for by the five main search engines
    • At Research Repository UCD, two-thirds of our repository's downloads are flagged as coming from web robots
  • 4. What are you talking about?
    Internet robot, Web robot, automated agent, crawler, spider, bot: any programme that visits websites and systematically retrieves information from them
  • 5. Good and bad
    • Good: search engines, link verifiers, computer science experiments
    • Bad: gathering content for spam, phishing and copycat sites, artificially improving a website's ranking (spamdexing), looking for security holes, DDoS attacks…
  • 6. ‘And the noisy, nasty nuisance grew, ’til the villagers cried, “What can we do?”’
    Detection methods:
    • Blocking robots in real time: Turing tests
    • Detecting robots later and removing them from statistics
  • 7. Appropriate but problematic methods for repositories
    • Excluding known robots by user-agent name
      – Easily faked or omitted
    • Excluding by IP address
      – Addresses are reassigned dynamically (DHCP), and the list is growing exponentially
    • Usage pattern analysis: query rate and resources requested
      – Expensive to automate
    • Machine learning: training decision trees, neural nets and/or statistical systems
      – Did you say expensive???
    • Combined approaches
  • 8. Effectiveness, and out-of-the-box repository strategies
    Strength of each detection strategy in the benchmarking study, strongest first:

    Robots detected by            Recall (%)   Precision (%)
    No images requested                98.34           75.48
    No referring site                  96.27           52.25
    List of IP addresses               69.29           99.40
    HEAD method to access site         32.37          100.00
    Agent name declared                26.56          100.00
    Access only at night               24.48           50.43
    Robots.txt file accessed           17.01          100.00
    Time, σ (3s)                        2.49          100.00
    Time, average (1s)                  2.49           75.00

    DSpace excludes the IP addresses of known agents, which is much weaker than the IP-address strategy in the benchmarking study.
  • 9. Effectiveness, and out-of-the-box repository strategies
    (Same table as slide 8.)
    Eprints filters on the number of hits from an IP address per day, which is similar to the time-based strategies in the benchmarking study.
  • 10. Effectiveness, and out-of-the-box repository strategies
    (Same table as slide 8.)
  • 11. Centralised strategy: IRUS-UK
    • Collects and filters statistics from 84 DSpace and Eprints repositories
    • COUNTER-compliant usage statistics
    • Robot exclusion:
      – The COUNTER list of agent names
      – All downloads from IP addresses with more than 200 downloads in a day from a repository
      – Most downloads from IP addresses with more than 100 downloads in a day from a repository
    • Work commissioned to investigate the feasibility of, and approach to, adaptive filtering based on usage behaviour
  • 12. Sources by slide
    Slide 1: Bill Gosper's Glider Gun in action, a variation of Conway's Game of Life. Johan G. Bontes. <https://en.wikipedia.org/wiki/Conway%27s_Game_of_Life#/media/File:Gospers_glider_gun.gif>
    Slides 3, 6, 7: Doran, D., Gokhale, S. S. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery (2011) 22:183-210. DOI:10.1007/s10618-010-0180-z
    Slide 4: <http://pixabay.com/static/uploads/photo/2015/05/31/12/09/wooden-791421_640.jpg>
    Slide 5: Bad Robot Productions logo. 2001-2008. <https://en.wikipedia.org/wiki/Bad_Robot_Productions#/media/File:Bad_Robot_Productions_logo.jpg>
    Slide 6: Burroway, J., Lord, J. V. The Giant Jam Sandwich. 1972, Houghton Mifflin Harcourt.
    Slides 8, 9, 10: Geens, N., Huysmans, J., Vanthienen, J. Evaluation of Web Robot Discovery Techniques: A Benchmarking Study. Advances in Data Mining. Applications in Medicine, Web Mining, Marketing, Image and Signal Mining. Lecture Notes in Computer Science 4065, pp. 121-130, 2006. DOI:10.1007/11790853_10
    Slide 8: Diggory, M. SOLR Statistics. DSpace Wiki. <https://wiki.duraspace.org/display/DSDOC5x/SOLR+Statistics>
    Slide 9: Joint, N. [EP-tech] Re: Please change the way IRstats works. Eprints_tech mailing list, 2011-10-13. <http://www.eprints.org/tech.php/15695.html>
    Slide 11: IRUS-UK. <http://www.irus.mimas.ac.uk/participants/>
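The first two methods on slide 7, excluding known robots by user-agent name and by IP address, can be sketched in Python as below. This is a minimal illustration, not DSpace or Eprints code: the agent patterns and IP ranges are hypothetical placeholders (the addresses come from ranges reserved for documentation), and a real deployment would load a maintained list instead.

```python
import ipaddress
import re
from dataclasses import dataclass

# Hypothetical agent-name patterns; real robot lists are far longer and maintained.
ROBOT_AGENT_RE = re.compile(r"bot|crawler|spider|slurp", re.IGNORECASE)

# Hypothetical known-robot networks (documentation-only address ranges).
ROBOT_NETWORKS = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


@dataclass
class Download:
    ip: str
    user_agent: str  # empty string if the client sent no User-Agent header


def is_known_robot(d: Download) -> bool:
    """True if the agent name or the IP address is on a known-robot list."""
    if d.user_agent and ROBOT_AGENT_RE.search(d.user_agent):
        return True
    addr = ipaddress.ip_address(d.ip)
    # An IPv6 address is never "in" an IPv4 network, so mixed logs are safe.
    return any(addr in net for net in ROBOT_NETWORKS)


downloads = [
    Download("203.0.113.5", "Mozilla/5.0 (compatible; Googlebot/2.1)"),  # declared robot
    Download("198.51.100.7", "Mozilla/5.0"),                              # listed IP address
    Download("203.0.113.9", "Mozilla/5.0 (Windows NT 10.0)"),             # counted
    Download("203.0.113.10", ""),                                         # no agent: still counted
]
counted = [d for d in downloads if not is_known_robot(d)]
print(len(counted), "of", len(downloads), "downloads counted")
```

A robot that sends a browser-like agent string from an unlisted address passes straight through, and an empty agent string is never flagged: the "easily faked or omitted" weakness from slide 7. A loose pattern such as `bot` can also match innocent strings, so precision depends on how carefully the list is curated.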
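The recall and precision figures in the slide 8 table come from labelled data: recall is the share of real robots a strategy catches, TP / (TP + FN), and precision is the share of flagged visitors that really are robots, TP / (TP + FP). A minimal Python sketch of that evaluation follows, assuming sessions have already been reconstructed from the logs and hand-labelled. The `Session` structure, the four heuristic functions and the sample sessions are all hypothetical, loosely modelled on four strategy names in the table rather than on the benchmarking study's exact definitions.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Callable, Optional

IMAGE_SUFFIXES = (".png", ".jpg", ".jpeg", ".gif", ".svg")


@dataclass
class Session:
    """All requests made by one visitor, plus a hand-assigned robot label."""
    requests: list[tuple[str, str]]  # (HTTP method, path)
    referrer: Optional[str]
    is_robot: bool  # ground truth, used only for evaluation


# Hypothetical heuristics: each returns True if the session looks like a robot.
def no_images_requested(s: Session) -> bool:
    return not any(path.lower().endswith(IMAGE_SUFFIXES) for _, path in s.requests)


def no_referring_site(s: Session) -> bool:
    return not s.referrer


def used_head_method(s: Session) -> bool:
    return any(method == "HEAD" for method, _ in s.requests)


def fetched_robots_txt(s: Session) -> bool:
    return any(path == "/robots.txt" for _, path in s.requests)


def recall_precision(sessions: list[Session],
                     heuristic: Callable[[Session], bool]) -> tuple[float, float]:
    """Return (recall %, precision %) of a heuristic against the labels."""
    tp = sum(1 for s in sessions if heuristic(s) and s.is_robot)
    fp = sum(1 for s in sessions if heuristic(s) and not s.is_robot)
    fn = sum(1 for s in sessions if not heuristic(s) and s.is_robot)
    recall = 100 * tp / (tp + fn) if tp + fn else 0.0
    precision = 100 * tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


# Hypothetical labelled sessions: two robots, two humans.
sessions = [
    Session([("GET", "/robots.txt"), ("GET", "/paper1.pdf")], None, True),
    Session([("HEAD", "/paper2.pdf")], None, True),
    Session([("GET", "/item/1"), ("GET", "/logo.png"), ("GET", "/paper1.pdf")],
            "https://www.google.com/", False),
    Session([("GET", "/paper3.pdf")], None, False),
]

for h in (no_images_requested, no_referring_site, used_head_method, fetched_robots_txt):
    r, p = recall_precision(sessions, h)
    print(f"{h.__name__}: recall {r:.2f}%, precision {p:.2f}%")
```

On these four sessions, "no images requested" catches both robots (recall 100%) but also flags the human who downloaded a PDF directly (precision about 67%), while "robots.txt accessed" never misfires but catches only one robot: the same trade-off the table shows at scale.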
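Slide 9 describes Eprints as filtering on the number of hits from an IP address per day. The Python sketch below shows that general idea only; it is not the Eprints/IRStats implementation, and `max_hits_per_day` is a hypothetical parameter because the slide does not give the value Eprints uses.

```python
from collections import Counter
from datetime import datetime


def heavy_ip_days(hits, max_hits_per_day):
    """Return the (ip, date) pairs with more than max_hits_per_day hits.

    hits: iterable of (ip, datetime) pairs taken from the access log.
    """
    per_ip_day = Counter((ip, ts.date()) for ip, ts in hits)
    return {key for key, n in per_ip_day.items() if n > max_hits_per_day}


def filter_hits(hits, max_hits_per_day):
    """Drop every hit from an IP address on any day it exceeded the threshold."""
    hits = list(hits)  # iterated twice, so materialise generators first
    heavy = heavy_ip_days(hits, max_hits_per_day)
    return [(ip, ts) for ip, ts in hits if (ip, ts.date()) not in heavy]


# Hypothetical log: one address fetching 30 items in half an hour, one normal reader.
hits = [("192.0.2.1", datetime(2015, 6, 1, 9, minute)) for minute in range(30)]
hits.append(("203.0.113.4", datetime(2015, 6, 1, 10, 0)))

kept = filter_hits(hits, max_hits_per_day=20)  # hypothetical threshold
print(kept)
```

Because the count is per address per calendar day, a robot that spreads its requests thinly, or across several addresses, stays under any fixed threshold; that is one limitation of purely count-based filters.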
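The IRUS-UK exclusion rules on slide 11 can be approximated as below: a download is dropped if its agent name matches the robot list, or if its IP address made more than 200 downloads from the same repository that day. This is a hedged Python sketch, not IRUS-UK code. The agent patterns are hypothetical stand-ins for the COUNTER list, and the "most downloads above 100 a day" rule is deliberately left out, because the slide does not say how "most" is decided.

```python
from __future__ import annotations

import re
from collections import Counter
from dataclasses import dataclass
from datetime import date

DAILY_LIMIT = 200  # slide 11: more than 200 downloads in a day from one repository

# Hypothetical stand-in for the COUNTER list of robot agent names.
AGENT_RE = re.compile(r"bot|spider|crawl", re.IGNORECASE)


@dataclass(frozen=True)
class Download:
    repository: str
    ip: str
    day: date
    user_agent: str


def irus_style_filter(downloads: list[Download]) -> list[Download]:
    """Keep downloads that pass the agent-name rule and the 200-a-day rule."""
    # Count downloads per (repository, IP address, day).
    counts = Counter((d.repository, d.ip, d.day) for d in downloads)
    kept = []
    for d in downloads:
        if AGENT_RE.search(d.user_agent):
            continue  # agent name is on the (hypothetical) robot list
        if counts[(d.repository, d.ip, d.day)] > DAILY_LIMIT:
            continue  # all downloads from this IP, repository and day are excluded
        kept.append(d)
    return kept


day = date(2015, 6, 1)
heavy = [Download("repo-a", "192.0.2.1", day, "Mozilla/5.0")] * 201
other_repo = [Download("repo-b", "192.0.2.1", day, "Mozilla/5.0")]
declared_bot = [Download("repo-a", "203.0.113.8", day, "ExampleBot/1.0")]
reader = [Download("repo-a", "203.0.113.9", day, "Mozilla/5.0")]

kept = irus_style_filter(heavy + other_repo + declared_bot + reader)
print(len(kept))  # the repo-b download and the ordinary reader survive
```

Counting per repository matters here: the same address that is excluded from repo-a is still counted at repo-b, where it stayed under the limit, which matches the slide's "from a repository" wording.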