SlideShare a Scribd company logo
1 of 21
Leabharlann UCD
An Coláiste Ollscoile, Baile
Átha Cliath,
Belfield, Baile Átha Cliath 4,
Eire
UCD Library
University College Dublin,
Belfield, Dublin 4, Ireland
Joseph Greene
Research Repository Librarian
University College Dublin
joseph.greene@ucd.ie
http://researchrepository.ucd.ie
#iCanHazRobot?
Improved robot detection for IR usage statistics
Open Repositories 2016
Dublin, 14 June
Overview and take-home points
• Usage stats are important
– (go to the Usage Stats panel on Thursday,
16/Jun/2016: 11:00am - 12:30pm)
• Robot filtration is a problem, especially in
repositories
• Robot detection has an exponential effect on
usage stats’ accuracy in repositories
• 2-3 ways to improve DSpace and EPrints’ usage
stats by 20% or more will be demonstrated
Experimental study
• Simple random sample of 2 years of UCD
repository’s download data
– n=341, N=3.3 million; 96.20% certainty
• Manually checked to determine if robot or human
• Applied DSpace, EPrints robot detection
algorithms to the dataset
– This is an EXPERIMENT, simulating algorithms on a
DSpace repository’s usage data and Apache logs
– The data is real, live data, and the algorithms were
very easy to simulate
First finding
85% of unfiltered
repository downloads
come from robots
• This is confirmed in a 2013 IRUS-UK white paper
on 20 IRs; 85% was also found to be robots
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1
Accuracyofdownloadstats(inverseprecition)
Recall (robots)
Catching more robots improves stats
(But how much depends on the number of robots)
Getbetterstats
Catch more robots
Typical website, 15% robot traffic
OA journal, 40% robot
Internet Archive, 91% robot
OA repositories, 85% robot
Robot detection techniques used
DSpace EPrints
Minho DSpace
Statistics Add-on
Rate of requests ✓3
User agent string ✓ ✓ ✓
robots.txt access ✓
Volume of requests ✓2
✓3
List of known robot IP addresses ✓ ✓
Reverse DNS name lookup ✓1
Trap file ✓
User agents per IP address
Width of traversal in the URL space ✓3
1
Only implemented nominally or experimentally
2
Via the repeat download or ‘double-click’ filter
3
Data available as a configurable report for manual decision making
Measurements used in robot detection
• All measurements are a number between 0 and 1
• Recall: proportion of robots detected
– I can haz robot?
• Precision: true positives in robot detection
– Proportion of discounted downloads that are
actually made by robots (sometimes humans are
counted as robots)
• Accuracy of download stats measured as inverse
precision:
– Proportion of stats that are actually made by
humans
How they perform, out-of-the-box
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
DSpace EPrints Minho Minho with
monthly manual
checking
No robot detection
Robot detection in OA IR systems
Recall Precision Negative precision (accuracy of download stats)
Room for improvement?
1. Ability to manually check for outliers
• At UCD, once a month, we check:
– Daily downloads for the last 2-4 months
– Top 10 most downloaded items
– Top 20 downloading IP addresses for the last 2-4
months
0
0.2
0.4
0.6
0.8
1
DSpace Eprints Minho
Robots caught (Recall)
Out-…
0
0.2
0.4
0.6
0.8
1
DSpace Eprints Minho Wihtout robot
detection
Accuracy of reported download stats
(Inverse precision)
Out-of-the-box
With manual checking (outlier exclusion)
2. Recalibrate the EPrints repeat-
download (double-click) filter
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
Recall (robots) Precision (accuracy
of excluded
downloads)
Inverse recall
(legitimate
downloads
accounted for in
stats)
Inverse precision
(accuracy of
reported download
stats)
Overall accuracy
Effect of double-click filter on EPrints’ robot detection and stats
Without double-click filter With double-click filter (out-of-the-box) With recalibrated double-click filter*
𝑻𝒑 + 𝑻𝒏
𝒏
3. Port Minho’s robot detection code (a
log parser) onto DSpace or EPrints
• 1 Java class
• Input is Apache Combined Log Format
• Output is a database update (robot = true field)
– Similar to EPrints' $is_robot variable in Robots.pm,
– Could be modified to update the DSpace 'isBot'
field in the SOLR usage events document
• Requires 2 database tables to store learned
agents and IPs
0
0.2
0.4
0.6
0.8
1
DSpace Eprints Minho
Robots caught (Recall)
0
0.2
0.4
0.6
0.8
1
DSpace Eprints Minho Wihtout robot
detection
Accuracy of reported download stats
(Inverse precision)
Out-of-the-box With Minho log parser
4. Combine two or more techniques
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
DSpace Eprints Minho
Robots caught
(Recall)
Out-of-the-box
With manual
checking (outlier
exclusion)
With recalibrated
double click filter*
With Minho log
parser
With Minho and
outliers
Minho, outliers, and
recalibrated double-
click*
4. Combine two or more techniques
0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.8
0.9
1
DSpace Eprints Minho Wihtout robot
detection
Accuracy of reported download stats
(Inverse precision)
Out-of-the-box
With manual checking
(outlier exclusion)
With recalibrated
double click filter*
With Minho log parser
With Minho and
outliers
Minho, outliers, and
recalibrated double-
click*
Thank you!

More Related Content

Viewers also liked

Using a consultancy to assist in developing the UCD vision for the future onl...
Using a consultancy to assist in developing the UCD vision for the future onl...Using a consultancy to assist in developing the UCD vision for the future onl...
Using a consultancy to assist in developing the UCD vision for the future onl...UCD Library
 
Les possibilitats d’Internet aplicades a l’agricultura ecològica
Les possibilitats d’Internet aplicades a l’agricultura ecològicaLes possibilitats d’Internet aplicades a l’agricultura ecològica
Les possibilitats d’Internet aplicades a l’agricultura ecològicaMarc Garriga
 
CII S'Marketing Convention 2009
CII S'Marketing Convention 2009CII S'Marketing Convention 2009
CII S'Marketing Convention 2009managemarketing
 
Is peer review peerless? Author: Tony Eklof
Is peer review peerless? Author: Tony EklofIs peer review peerless? Author: Tony Eklof
Is peer review peerless? Author: Tony EklofUCD Library
 
El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?
El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?
El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?Marc Garriga
 
Dades Obertes. El valor del coneixement lliure.
Dades Obertes. El valor del coneixement lliure.Dades Obertes. El valor del coneixement lliure.
Dades Obertes. El valor del coneixement lliure.Marc Garriga
 
Weiying1新生儿
Weiying1新生儿Weiying1新生儿
Weiying1新生儿Deep Deep
 
Andy warhol . Raul and Gerard
 Andy warhol . Raul and Gerard Andy warhol . Raul and Gerard
Andy warhol . Raul and GerardIrisat
 
Курсовая работа
Курсовая работаКурсовая работа
Курсовая работаivan_z
 
Loex 2008 (P2)
Loex 2008 (P2)Loex 2008 (P2)
Loex 2008 (P2)oreinaue
 
On the shelf in time : developing a strategy to improve reading list support....
On the shelf in time : developing a strategy to improve reading list support....On the shelf in time : developing a strategy to improve reading list support....
On the shelf in time : developing a strategy to improve reading list support....UCD Library
 
Resource description and new media : challenges and opportunities. Authors: E...
Resource description and new media : challenges and opportunities. Authors: E...Resource description and new media : challenges and opportunities. Authors: E...
Resource description and new media : challenges and opportunities. Authors: E...UCD Library
 
The library as place. Author: Peter Hickey
The library as place. Author: Peter HickeyThe library as place. Author: Peter Hickey
The library as place. Author: Peter HickeyUCD Library
 
Pharmacy Businesslaw2
Pharmacy Businesslaw2Pharmacy Businesslaw2
Pharmacy Businesslaw2shyjesta
 
Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...
Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...
Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...UCD Library
 

Viewers also liked (20)

Using a consultancy to assist in developing the UCD vision for the future onl...
Using a consultancy to assist in developing the UCD vision for the future onl...Using a consultancy to assist in developing the UCD vision for the future onl...
Using a consultancy to assist in developing the UCD vision for the future onl...
 
Les possibilitats d’Internet aplicades a l’agricultura ecològica
Les possibilitats d’Internet aplicades a l’agricultura ecològicaLes possibilitats d’Internet aplicades a l’agricultura ecològica
Les possibilitats d’Internet aplicades a l’agricultura ecològica
 
CII S'Marketing Convention 2009
CII S'Marketing Convention 2009CII S'Marketing Convention 2009
CII S'Marketing Convention 2009
 
Is peer review peerless? Author: Tony Eklof
Is peer review peerless? Author: Tony EklofIs peer review peerless? Author: Tony Eklof
Is peer review peerless? Author: Tony Eklof
 
El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?
El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?
El Gobierno Abierto es la respuesta, ¿pero cuál era la pregunta?
 
Dades Obertes. El valor del coneixement lliure.
Dades Obertes. El valor del coneixement lliure.Dades Obertes. El valor del coneixement lliure.
Dades Obertes. El valor del coneixement lliure.
 
OpenGovernment
OpenGovernmentOpenGovernment
OpenGovernment
 
Weiying1新生儿
Weiying1新生儿Weiying1新生儿
Weiying1新生儿
 
Presentation6
Presentation6Presentation6
Presentation6
 
Noms
NomsNoms
Noms
 
Andy warhol . Raul and Gerard
 Andy warhol . Raul and Gerard Andy warhol . Raul and Gerard
Andy warhol . Raul and Gerard
 
Курсовая работа
Курсовая работаКурсовая работа
Курсовая работа
 
Dynasties
DynastiesDynasties
Dynasties
 
Loex 2008 (P2)
Loex 2008 (P2)Loex 2008 (P2)
Loex 2008 (P2)
 
On the shelf in time : developing a strategy to improve reading list support....
On the shelf in time : developing a strategy to improve reading list support....On the shelf in time : developing a strategy to improve reading list support....
On the shelf in time : developing a strategy to improve reading list support....
 
Resource description and new media : challenges and opportunities. Authors: E...
Resource description and new media : challenges and opportunities. Authors: E...Resource description and new media : challenges and opportunities. Authors: E...
Resource description and new media : challenges and opportunities. Authors: E...
 
The library as place. Author: Peter Hickey
The library as place. Author: Peter HickeyThe library as place. Author: Peter Hickey
The library as place. Author: Peter Hickey
 
Pharmacy Businesslaw2
Pharmacy Businesslaw2Pharmacy Businesslaw2
Pharmacy Businesslaw2
 
Graphis Feature
Graphis FeatureGraphis Feature
Graphis Feature
 
Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...
Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...
Finishing the Jigsaw: consolidating and profiling the plagiarism awareness se...
 

Similar to #iCanHazRobot?: improved robot detection for IR usage statistics

Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesUCD Library
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringTao Xie
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesVincenzo Gulisano
 
Building and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsBuilding and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsEmiliano De Cristofaro
 
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...SegInfo
 
3 Pitfalls Everyone Should Avoid with Cloud Native Observability
3 Pitfalls Everyone Should Avoid with Cloud Native Observability3 Pitfalls Everyone Should Avoid with Cloud Native Observability
3 Pitfalls Everyone Should Avoid with Cloud Native ObservabilityEric D. Schabell
 
BotMagnifier: Locating Spambots on the Internet
BotMagnifier: Locating Spambots on the InternetBotMagnifier: Locating Spambots on the Internet
BotMagnifier: Locating Spambots on the InternetGianluca Stringhini
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeEdward Baker
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeVince Smith
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibilityc.titus.brown
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Pete Burnap
 
Technical Workshop - Win32/Georbot Analysis
Technical Workshop - Win32/Georbot AnalysisTechnical Workshop - Win32/Georbot Analysis
Technical Workshop - Win32/Georbot AnalysisPositive Hack Days
 
Large scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log miningLarge scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log miningitstuff
 
IEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesIEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesNish Parikh
 
A Fast, Offline Reverse Geocoder in Python
A Fast, Offline Reverse Geocoder in PythonA Fast, Offline Reverse Geocoder in Python
A Fast, Offline Reverse Geocoder in PythonAjay Thampi
 

Similar to #iCanHazRobot?: improved robot detection for IR usage statistics (20)

Developing COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access ResourcesDeveloping COUNTER Standards to Measure the Use of Open Access Resources
Developing COUNTER Standards to Measure the Use of Open Access Resources
 
Software Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software EngineeringSoftware Analytics: Data Analytics for Software Engineering
Software Analytics: Data Analytics for Software Engineering
 
Bots & spiders
Bots & spidersBots & spiders
Bots & spiders
 
The data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architecturesThe data streaming processing paradigm and its use in modern fog architectures
The data streaming processing paradigm and its use in modern fog architectures
 
Building and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility AnalyticsBuilding and Measuring Privacy-Preserving Mobility Analytics
Building and Measuring Privacy-Preserving Mobility Analytics
 
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an..."Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
"Automated Malware Analysis" de Gabriel Negreira Barbosa, Malware Research an...
 
3 Pitfalls Everyone Should Avoid with Cloud Native Observability
3 Pitfalls Everyone Should Avoid with Cloud Native Observability3 Pitfalls Everyone Should Avoid with Cloud Native Observability
3 Pitfalls Everyone Should Avoid with Cloud Native Observability
 
2015 moloch recipes
2015 moloch recipes2015 moloch recipes
2015 moloch recipes
 
BotMagnifier: Locating Spambots on the Internet
BotMagnifier: Locating Spambots on the InternetBotMagnifier: Locating Spambots on the Internet
BotMagnifier: Locating Spambots on the Internet
 
PhD Symposium 2014
PhD Symposium 2014PhD Symposium 2014
PhD Symposium 2014
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
NHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-LifeNHM Data Portal: first steps toward the Graph-of-Life
NHM Data Portal: first steps toward the Graph-of-Life
 
2014 nicta-reproducibility
2014 nicta-reproducibility2014 nicta-reproducibility
2014 nicta-reproducibility
 
2015 genome-center
2015 genome-center2015 genome-center
2015 genome-center
 
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
Real-time Classification of Malicious URLs on Twitter using Machine Activity ...
 
Technical Workshop - Win32/Georbot Analysis
Technical Workshop - Win32/Georbot AnalysisTechnical Workshop - Win32/Georbot Analysis
Technical Workshop - Win32/Georbot Analysis
 
2015 illinois-talk
2015 illinois-talk2015 illinois-talk
2015 illinois-talk
 
Large scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log miningLarge scale Click-streaming and tranaction log mining
Large scale Click-streaming and tranaction log mining
 
IEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slidesIEEE.BigData.Tutorial.2.slides
IEEE.BigData.Tutorial.2.slides
 
A Fast, Offline Reverse Geocoder in Python
A Fast, Offline Reverse Geocoder in PythonA Fast, Offline Reverse Geocoder in Python
A Fast, Offline Reverse Geocoder in Python
 

More from UCD Library

The role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityThe role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityUCD Library
 
Collection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryCollection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryUCD Library
 
The authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesThe authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesUCD Library
 
Show and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationShow and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationUCD Library
 
Print to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryPrint to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryUCD Library
 
Appearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersAppearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersUCD Library
 
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...UCD Library
 
UCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library
 
Going Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaGoing Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaUCD Library
 
Going Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaGoing Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaUCD Library
 
Clifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewClifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewUCD Library
 
UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Library
 
Optimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryOptimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryUCD Library
 
Creating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionCreating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionUCD Library
 
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...UCD Library
 
Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...UCD Library
 
UCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library
 
Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...UCD Library
 
Pin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsPin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsUCD Library
 
Real Life Digital Curation and Preservation
Real Life Digital Curation and PreservationReal Life Digital Curation and Preservation
Real Life Digital Curation and PreservationUCD Library
 

More from UCD Library (20)

The role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrityThe role of academic libraries in supporting a culture of research integrity
The role of academic libraries in supporting a culture of research integrity
 
Collection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD LibraryCollection Management and GreenGlass at UCD Library
Collection Management and GreenGlass at UCD Library
 
The authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA HumanitiesThe authentic research experience: UCD Special Collections in the BA Humanities
The authentic research experience: UCD Special Collections in the BA Humanities
 
Show and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and educationShow and teach: the role of exhibitions in outreach and education
Show and teach: the role of exhibitions in outreach and education
 
Print to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital LibraryPrint to pixels: digitised periodical collections in UCD Digital Library
Print to pixels: digitised periodical collections in UCD Digital Library
 
Appearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishersAppearances can be deceiving: how to avoid 'predatory' publishers
Appearances can be deceiving: how to avoid 'predatory' publishers
 
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
Re-using OERs in UCD’s Research Accelerator for the Social Sciences Online Mo...
 
UCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for ResearchersUCD Library's Training Programme and Resources for Researchers
UCD Library's Training Programme and Resources for Researchers
 
Going Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in ChinaGoing Global: UCD Library's Experience of Teaching Information Literacy in China
Going Global: UCD Library's Experience of Teaching Information Literacy in China
 
Going Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in ChinaGoing Global: UCD Library's Experiences in China
Going Global: UCD Library's Experiences in China
 
Clifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an OverviewClifden Arts Festival Archive@UCD: an Overview
Clifden Arts Festival Archive@UCD: an Overview
 
UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...UCD Digital Library: Creating Digitised Content from Archival Collections - P...
UCD Digital Library: Creating Digitised Content from Archival Collections - P...
 
Optimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital LibraryOptimising Workflows for Digital Archives: UCD Digital Library
Optimising Workflows for Digital Archives: UCD Digital Library
 
Creating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital CollectionCreating the Collected Letters of Nano Nagle Digital Collection
Creating the Collected Letters of Nano Nagle Digital Collection
 
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
#Nuntastic: Transcribing Nano Nagle's Letters using Collaborative Transcripti...
 
Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...Enhancing User Engagement and Experiences through the Development of UCD Libr...
Enhancing User Engagement and Experiences through the Development of UCD Libr...
 
UCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining CollectionsUCD Library and GreenGlass: Defining Needs, Redefining Collections
UCD Library and GreenGlass: Defining Needs, Redefining Collections
 
Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...Are They Being Served? Reference Services Student Experience Project, UCD Lib...
Are They Being Served? Reference Services Student Experience Project, UCD Lib...
 
Pin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locationsPin It! Linking shelf-marks to shelf locations
Pin It! Linking shelf-marks to shelf locations
 
Real Life Digital Curation and Preservation
Real Life Digital Curation and PreservationReal Life Digital Curation and Preservation
Real Life Digital Curation and Preservation
 

Recently uploaded

USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...Postal Advocate Inc.
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Celine George
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptxiammrhaywood
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management systemChristalin Nelson
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfErwinPantujan2
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management SystemChristalin Nelson
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxHumphrey A Beña
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptxmary850239
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Mark Reed
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataBabyAnnMotar
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxElton John Embodo
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17Celine George
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4JOYLYNSAMANIEGO
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxJanEmmanBrigoli
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parentsnavabharathschool99
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxlancelewisportillo
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...JojoEDelaCruz
 

Recently uploaded (20)

USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
USPS® Forced Meter Migration - How to Know if Your Postage Meter Will Soon be...
 
Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17Field Attribute Index Feature in Odoo 17
Field Attribute Index Feature in Odoo 17
 
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptxAUDIENCE THEORY -CULTIVATION THEORY -  GERBNER.pptx
AUDIENCE THEORY -CULTIVATION THEORY - GERBNER.pptx
 
Concurrency Control in Database Management system
Concurrency Control in Database Management systemConcurrency Control in Database Management system
Concurrency Control in Database Management system
 
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdfVirtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
Virtual-Orientation-on-the-Administration-of-NATG12-NATG6-and-ELLNA.pdf
 
Transaction Management in Database Management System
Transaction Management in Database Management SystemTransaction Management in Database Management System
Transaction Management in Database Management System
 
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptxINTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
INTRODUCTION TO CATHOLIC CHRISTOLOGY.pptx
 
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptxFINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
FINALS_OF_LEFT_ON_C'N_EL_DORADO_2024.pptx
 
4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx4.16.24 21st Century Movements for Black Lives.pptx
4.16.24 21st Century Movements for Black Lives.pptx
 
Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)Influencing policy (training slides from Fast Track Impact)
Influencing policy (training slides from Fast Track Impact)
 
Measures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped dataMeasures of Position DECILES for ungrouped data
Measures of Position DECILES for ungrouped data
 
EMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docxEMBODO Lesson Plan Grade 9 Law of Sines.docx
EMBODO Lesson Plan Grade 9 Law of Sines.docx
 
How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17How to Add Barcode on PDF Report in Odoo 17
How to Add Barcode on PDF Report in Odoo 17
 
Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4Daily Lesson Plan in Mathematics Quarter 4
Daily Lesson Plan in Mathematics Quarter 4
 
Millenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptxMillenials and Fillennials (Ethical Challenge and Responses).pptx
Millenials and Fillennials (Ethical Challenge and Responses).pptx
 
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptxINCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
INCLUSIVE EDUCATION PRACTICES FOR TEACHERS AND TRAINERS.pptx
 
Choosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for ParentsChoosing the Right CBSE School A Comprehensive Guide for Parents
Choosing the Right CBSE School A Comprehensive Guide for Parents
 
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptxQ4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
Q4-PPT-Music9_Lesson-1-Romantic-Opera.pptx
 
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptxLEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
LEFT_ON_C'N_ PRELIMS_EL_DORADO_2024.pptx
 
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
ENG 5 Q4 WEEk 1 DAY 1 Restate sentences heard in one’s own words. Use appropr...
 

#iCanHazRobot?: improved robot detection for IR usage statistics

  • 1. Leabharlann UCD An Coláiste Ollscoile, Baile Átha Cliath, Belfield, Baile Átha Cliath 4, Eire UCD Library University College Dublin, Belfield, Dublin 4, Ireland Joseph Greene Research Repository Librarian University College Dublin joseph.greene@ucd.ie http://researchrepository.ucd.ie #iCanHazRobot? Improved robot detection for IR usage statistics Open Repositories 2016 Dublin, 14 June
  • 2. Overview and take-home points • Usage stats are important – (go to the Usage Stats panel on Thursday, 16/Jun/2016: 11:00am - 12:30pm) • Robot filtration is a problem, especially in repositories • Robot detection has an exponential effect on usage stats’ accuracy in repositories • 2-3 ways to improve DSpace and EPrints’ usage stats by 20% or more will be demonstrated
  • 3. Experimental study • Simple random sample of 2 years of UCD repository’s download data – n=341, N=3.3 million; 96.20% certainty • Manually checked to determine if robot or human • Applied DSpace, EPrints robot detection algorithms to the dataset – This is an EXPERIMENT, simulating algorithms on a DSpace repository’s usage data and Apache logs – The data is real, live data, and the algorithms were very easy to simulate
  • 4. First finding 85% of unfiltered repository downloads come from robots • This is confirmed in a 2013 IRUS-UK white paper on 20 IRs; 85% was also found to be robots
  • 5. 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Accuracyofdownloadstats(inverseprecition) Recall (robots) Catching more robots improves stats (But how much depends on the number of robots) Getbetterstats Catch more robots Typical website, 15% robot traffic OA journal, 40% robot Internet Archive, 91% robot OA repositories, 85% robot
  • 6. Robot detection techniques used DSpace EPrints Minho DSpace Statistics Add-on Rate of requests ✓3 User agent string ✓ ✓ ✓ robots.txt access ✓ Volume of requests ✓2 ✓3 List of known robot IP addresses ✓ ✓ Reverse DNS name lookup ✓1 Trap file ✓ User agents per IP address Width of traversal in the URL space ✓3 1 Only implemented nominally or experimentally 2 Via the repeat download or ‘double-click’ filter 3 Data available as a configurable report for manual decision making
  • 7. Measurements used in robot detection • All measurements are a number between 0 and 1 • Recall: proportion of robots detected – I can haz robot? • Precision: true positives in robot detection – Proportion of discounted downloads that are actually made by robots (sometimes humans are counted as robots) • Accuracy of download stats measured as inverse precision: – Proportion of stats that are actually made by humans
  • 8. How they perform, out-of-the-box 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 DSpace EPrints Minho Minho with monthly manual checking No robot detection Robot detection in OA IR systems Recall Precision Negative precision (accuracy of download stats)
  • 10. 1. Ability to manually check for outliers • At UCD, once a month, we check: – Daily downloads for the last 2-4 months – Top 10 most downloaded items – Top 20 downloading IP addresses for the last 2-4 months
  • 11.
  • 12.
  • 13.
  • 14.
  • 15. 0 0.2 0.4 0.6 0.8 1 DSpace Eprints Minho Robots caught (Recall) Out-… 0 0.2 0.4 0.6 0.8 1 DSpace Eprints Minho Wihtout robot detection Accuracy of reported download stats (Inverse precision) Out-of-the-box With manual checking (outlier exclusion)
  • 16. 2. Recalibrate the EPrints repeat- download (double-click) filter 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 Recall (robots) Precision (accuracy of excluded downloads) Inverse recall (legitimate downloads accounted for in stats) Inverse precision (accuracy of reported download stats) Overall accuracy Effect of double-click filter on EPrints’ robot detection and stats Without double-click filter With double-click filter (out-of-the-box) With recalibrated double-click filter* 𝑻𝒑 + 𝑻𝒏 𝒏
  • 17. 3. Port Minho’s robot detection code (a log parser) onto DSpace or EPrints • 1 Java class • Input is Apache Combined Log Format • Output is a database update (robot = true field) – Similar to EPrints' $is_robot variable in Robots.pm, – Could be modified to update the DSpace 'isBot' field in the SOLR usage events document • Requires 2 database tables to store learned agents and IPs
  • 18. 0 0.2 0.4 0.6 0.8 1 DSpace Eprints Minho Robots caught (Recall) 0 0.2 0.4 0.6 0.8 1 DSpace Eprints Minho Wihtout robot detection Accuracy of reported download stats (Inverse precision) Out-of-the-box With Minho log parser
  • 19. 4. Combine two or more techniques 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 DSpace Eprints Minho Robots caught (Recall) Out-of-the-box With manual checking (outlier exclusion) With recalibrated double click filter* With Minho log parser With Minho and outliers Minho, outliers, and recalibrated double- click*
  • 20. 4. Combine two or more techniques 0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 DSpace Eprints Minho Wihtout robot detection Accuracy of reported download stats (Inverse precision) Out-of-the-box With manual checking (outlier exclusion) With recalibrated double click filter* With Minho log parser With Minho and outliers Minho, outliers, and recalibrated double- click*

Editor's Notes

  1. Good news: DSpace and EPrints do robot filtration out-of-the-box, bad news: the stats are still quite inaccurate More good news: Improving robot recall has an exponential effect on usage stats accuracy Usage stats: primarily download counts, used heavily in marketing the repository and they provide a measure of ROI both to those who have uploaded them (investment of time/effort) and to those who fund the repository. More downloads = more UCD visibility – one measure of our ROI.
  2. Experiment: simple random sample of 2 years of download data (n=341, N=3.3 million for 96.20% certainty), manually checked to determine if robot or human. DSpace 1.8.2 with U. Minho DSpace Statistics Add-on v. 4. Apache Tomcat behind Apache HTTP server; logs in Apache Combined Log Format. Minho registers every download in the PostgreSQL database. Results to be published in July 2016 issue of Library Hi Tech (Greene 2016) This dataset is used to experimentally test different detection techniques used alone and in combination Weaknesses: The data is taken from a DSpace/Minho system (it's own SEO, it's own way of being crawled, etc.) 'In vitro': Except for the original system (DSpace/Minho + monthly manual outlier checking), the robot detection techniques are simulated. Hence, EXPERIMENTAL Strengths: 'In vivo': the data is real data from a production OA IR system Simulating the various detection techniques was very easy to do, so is probably a very accurate picture of how each system would have treated this dataset
  3. See: INFORMATION POWER LTD. 2013. IRUS download data: identifying unusual usage [Online]. Available: http://www.irus.mimas.ac.uk/news/IRUS_download_data_Final_report.pdf [Accessed 2015-12-11. Confirms 85% figure DORAN, D. & GOKHALE, S. S. 2011. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery, 22, 183-210. Hypothesizes why so high in OA (p.191)
  4. Typical website (15% robot traffic) (precision = 0.8727, mean of four studies; robots:total sessions = 0.1516, mean of four studies) OA journal (40% robot) HUNTINGTON, P., NICHOLAS, D. & JAMALI, H. R. 2008. Web robot detection in the scholarly information environment. Journal of Information Science, 34, 726-741. OA repositories (85% robot) Greene 2016 and Information Power 2013 (see above) Internet Archive (91% robot) ALNOAMANY, Y., WEIGLE, M. C. & NELSON, M. L. 2013. Access patterns for robots and humans in web archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 339-348. Reverse is also true: fail to catch robots (e.g. deterioration over time as robots improve their capabilities), accuracy of stats diminishes Formula: Greene 2016 𝐏𝐢𝐧𝐯 = 𝐓𝐑(𝐑−𝐏𝐑−𝟏)+𝟐𝐓𝐏𝐑−𝐏(𝐓+𝐑−𝟏) 𝐑(𝐓𝐑−𝐏−𝐓)+𝐏 R = recall (robot detection) P = precision (robot detection) Pinv = inverse precision (human stats) T = ratio of robots to total
  5. Greene 2016
  6. Minho with monthly manual checking is the original data as measured in vivo. Minho alone has detected manual outliers removed. DSpace and EPrints have been generated by applying their native algorithms to the data.
  7. Outliers: c.f. LAMOTHE, A. R. 2014. The importance of identifying and accommodating e-resource usage data for the presence of outliers. Information Technology and Libraries, 33, 31-44.
  8. *Recalibrated double-click filter: a single IP address downloading a single item more than 10 times in 24 hours is excluded. By default the filter is 1 IP, downloads 1 item more than 1 time in 24 hours. This can be configured in terms of the timeout length but currently can't be configured to increase the number of downloads allowed within the period See also: JOINT, N., FIELD, A. & GREGSON, M. 2011. Please change the way IRstats works [Online]. Available: http://www.eprints.org/tech.php/15695.html [Accessed November 28 2015]. The drop in inverse recall (loss of legitimate downloads) supports the concern raised in this email discussion. However, if the recalibration were to be implemented, the improvement to robot precision means that the increase in legitimate downloads is offset by the decrease in illegitimate ones, so inverse precision is not affected a great deal. Overall accuracy improves notably however.
  9. *Recalibrated double-click filter: a single IP address downloading a single item more than 10 times in 24 hours is excluded. By default the filter is 1 IP, downloads 1 item more than 1 time in 24 hours. This can be configured in terms of the timeout length but currently can't be configured to increase the number of downloads allowed within the period
  10. *Recalibrated double-click filter: a single IP address downloading a single item more than 10 times in 24 hours is excluded. By default the filter is 1 IP, downloads 1 item more than 1 time in 24 hours. This can be configured in terms of the timeout length but currently can't be configured to increase the number of downloads allowed within the period