Presentation given by Joseph Greene, Research Repository Librarian at University College Dublin Library, at Open Repositories held at Trinity College Dublin, June 13-16th, 2016.
#iCanHazRobot?: improved robot detection for IR usage statistics
1. Leabharlann UCD
An Coláiste Ollscoile, Baile
Átha Cliath,
Belfield, Baile Átha Cliath 4,
Eire
UCD Library
University College Dublin,
Belfield, Dublin 4, Ireland
Joseph Greene
Research Repository Librarian
University College Dublin
joseph.greene@ucd.ie
http://researchrepository.ucd.ie
#iCanHazRobot?
Improved robot detection for IR usage statistics
Open Repositories 2016
Dublin, 14 June
2. Overview and take-home points
• Usage stats are important
– (go to the Usage Stats panel on Thursday,
16/Jun/2016: 11:00am - 12:30pm)
• Robot filtration is a problem, especially in
repositories
• Robot detection has an exponential effect on
usage stats’ accuracy in repositories
• 2-3 ways to improve DSpace and EPrints’ usage
stats by 20% or more will be demonstrated
3. Experimental study
• Simple random sample of 2 years of UCD
repository’s download data
– n=341, N=3.3 million; 96.20% certainty
• Manually checked to determine if robot or human
• Applied DSpace, EPrints robot detection
algorithms to the dataset
– This is an EXPERIMENT, simulating algorithms on a
DSpace repository’s usage data and Apache logs
– The data is real, live data, and the algorithms were
very easy to simulate
4. First finding
85% of unfiltered
repository downloads
come from robots
• This is confirmed in a 2013 IRUS-UK white paper
on 20 IRs; 85% was also found to be robots
5. Catching more robots improves stats
(But how much depends on the number of robots)
[Chart: accuracy of download stats (inverse precision), 0 to 1, plotted against recall (robots), 0 to 1. One curve per site type: typical website, 15% robot traffic; OA journal, 40% robot; OA repositories, 85% robot; Internet Archive, 91% robot. Annotations: "Catch more robots", "Get better stats".]
6. Robot detection techniques used
Systems compared: DSpace, EPrints, Minho DSpace Statistics Add-on
Rate of requests ✓3
User agent string ✓ ✓ ✓
robots.txt access ✓
Volume of requests ✓2 ✓3
List of known robot IP addresses ✓ ✓
Reverse DNS name lookup ✓1
Trap file ✓
User agents per IP address
Width of traversal in the URL space ✓3
1 Only implemented nominally or experimentally
2 Via the repeat download or 'double-click' filter
3 Data available as a configurable report for manual decision making
7. Measurements used in robot detection
• All measurements are a number between 0 and 1
• Recall: proportion of robots detected
– I can haz robot?
• Precision: true positives in robot detection
– Proportion of discounted downloads that are
actually made by robots (sometimes humans are
counted as robots)
• Accuracy of download stats measured as inverse
precision:
– Proportion of stats that are actually made by
humans
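As a minimal sketch of the three measurements defined on this slide, the functions below compute them from the four possible outcomes of robot detection. The class name, function names and example counts are made up for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    tp: int  # robot downloads correctly flagged as robots
    fp: int  # human downloads wrongly flagged as robots
    fn: int  # robot downloads that were missed (counted in the stats)
    tn: int  # human downloads correctly kept in the stats


def recall(c: Counts) -> float:
    """Proportion of robot downloads that were detected."""
    return c.tp / (c.tp + c.fn)


def precision(c: Counts) -> float:
    """Proportion of discounted downloads that really were robots."""
    return c.tp / (c.tp + c.fp)


def inverse_precision(c: Counts) -> float:
    """Proportion of reported download stats that really were humans."""
    return c.tn / (c.tn + c.fn)


# Hypothetical counts, for illustration only.
example = Counts(tp=80, fp=2, fn=5, tn=13)
print(recall(example), precision(example), inverse_precision(example))
```

All three values fall between 0 and 1, as stated on the slide.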
8. How they perform, out-of-the-box
[Bar chart: Robot detection in OA IR systems. Recall, precision and negative precision (accuracy of download stats), each on a 0 to 1 scale, for: no robot detection; DSpace; EPrints; Minho; Minho with monthly manual checking.]
10. 1. Ability to manually check for outliers
• At UCD, once a month, we check:
– Daily downloads for the last 2-4 months
– Top 10 most downloaded items
– Top 20 downloading IP addresses for the last 2-4
months
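A check like this can be partly scripted. The sketch below assumes download events are already available as (IP address, item identifier) pairs; that input shape, the function names and the sample data are all hypothetical, and deciding whether a top entry is really a robot stays a manual step.

```python
from collections import Counter


def top_downloaders(events, n=20):
    """Return the n IP addresses with the most downloads as (ip, count) pairs."""
    return Counter(ip for ip, _item in events).most_common(n)


def top_items(events, n=10):
    """Return the n most downloaded items as (item_id, count) pairs."""
    return Counter(item for _ip, item in events).most_common(n)


# Hypothetical events: one IP address dominates and is worth a closer look.
events = (
    [("203.0.113.5", "item-1")] * 50
    + [("198.51.100.2", "item-1")] * 3
    + [("192.0.2.8", "item-2")] * 2
)
print(top_downloaders(events, n=2))
print(top_items(events, n=1))
```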
16. 2. Recalibrate the EPrints repeat-download (double-click) filter
[Bar chart: Effect of double-click filter on EPrints’ robot detection and stats. Series: without double-click filter; with double-click filter (out-of-the-box); with recalibrated double-click filter*. Measures, each on a 0 to 1 scale: recall (robots); precision (accuracy of excluded downloads); inverse recall (legitimate downloads accounted for in stats); inverse precision (accuracy of reported download stats); overall accuracy, $(T_p + T_n)/n$.]
17. 3. Port Minho’s robot detection code (a
log parser) onto DSpace or EPrints
• 1 Java class
• Input is Apache Combined Log Format
• Output is a database update (robot = true field)
– Similar to EPrints' $is_robot variable in Robots.pm,
– Could be modified to update the DSpace 'isBot'
field in the SOLR usage events document
• Requires 2 database tables to store learned
agents and IPs
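To make the idea of a log parser that learns robot agents and IPs concrete, here is a rough sketch. It is not the Minho code: the regular expression, the keyword list, the function names and the sample log lines are all assumptions, and only two of the techniques from slide 6 (robots.txt access and user agent string) are imitated. Two in-memory sets stand in for the two database tables.

```python
import re

# Apache Combined Log Format: host ident user [time] "request" status bytes "referer" "agent"
LOG_PATTERN = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)
AGENT_HINTS = ("bot", "crawler", "spider")  # hypothetical keyword list


def parse_line(line):
    """Return the named fields of one log line, or None if it does not match."""
    match = LOG_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def flag_robots(lines):
    """Parse log lines and add a boolean 'robot' field to each parsed entry."""
    entries = [e for e in map(parse_line, lines) if e is not None]
    robot_ips, robot_agents = set(), set()  # stand-ins for the two tables
    # First pass: learn robot IPs and agents.
    for e in entries:
        if e["path"].split("?")[0] == "/robots.txt":
            robot_ips.add(e["ip"])
        if any(hint in e["agent"].lower() for hint in AGENT_HINTS):
            robot_agents.add(e["agent"])
    # Second pass: flag every request from a learned IP or agent.
    for e in entries:
        e["robot"] = e["ip"] in robot_ips or e["agent"] in robot_agents
    return entries


SAMPLE_LINES = [
    '203.0.113.5 - - [14/Jun/2016:10:00:00 +0100] "GET /robots.txt HTTP/1.1" 200 120 "-" "ExampleAgent/1.0"',
    '203.0.113.5 - - [14/Jun/2016:10:00:05 +0100] "GET /bitstream/1/paper.pdf HTTP/1.1" 200 5120 "-" "ExampleAgent/1.0"',
    '192.0.2.7 - - [14/Jun/2016:10:01:00 +0100] "GET /bitstream/2/thesis.pdf HTTP/1.1" 200 9000 "-" "Mozilla/5.0 (compatible; Examplebot/2.1)"',
    '198.51.100.9 - - [14/Jun/2016:10:02:00 +0100] "GET /bitstream/1/paper.pdf HTTP/1.1" 200 5120 "http://example.org/" "Mozilla/5.0 (Windows NT 10.0)"',
]
for entry in flag_robots(SAMPLE_LINES):
    print(entry["ip"], entry["path"], entry["robot"])
```

In a real port, the second pass would write the flag to the statistics store instead of returning it.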
Good news: DSpace and EPrints do robot filtration out-of-the-box. Bad news: the stats are still quite inaccurate.
More good news: Improving robot recall has an exponential effect on usage stats accuracy
Usage stats: primarily download counts. They are used heavily in marketing the repository, and they provide a measure of ROI both to those who have uploaded items (an investment of time and effort) and to those who fund the repository. More downloads = more UCD visibility, which is one measure of our ROI.
Experiment: a simple random sample of 2 years of download data (n=341, N=3.3 million, for 96.20% certainty), manually checked to determine whether each download came from a robot or a human. The system was DSpace 1.8.2 with the U. Minho DSpace Statistics Add-on v. 4, running on Apache Tomcat behind Apache HTTP Server, with logs in Apache Combined Log Format. Minho registers every download in the PostgreSQL database. Results to be published in the July 2016 issue of Library Hi Tech (Greene 2016).
This dataset is used to experimentally test different detection techniques used alone and in combination
Weaknesses:
The data is taken from a DSpace/Minho system (its own SEO, its own way of being crawled, etc.)
'In vitro': Except for the original system (DSpace/Minho + monthly manual outlier checking), the robot detection techniques are simulated. Hence, EXPERIMENTAL
Strengths:
'In vivo': the data is real data from a production OA IR system
Simulating the various detection techniques was very easy to do, so the results are probably a very accurate picture of how each system would have treated this dataset
See:
INFORMATION POWER LTD. 2013. IRUS download data: identifying unusual usage [Online]. Available: http://www.irus.mimas.ac.uk/news/IRUS_download_data_Final_report.pdf [Accessed 2015-12-11].
Confirms 85% figure
DORAN, D. & GOKHALE, S. S. 2011. Web robot detection techniques: overview and limitations. Data Mining and Knowledge Discovery, 22, 183-210.
Hypothesizes why so high in OA (p.191)
Typical website (15% robot traffic)
(precision = 0.8727, mean of four studies; robots:total sessions = 0.1516, mean of four studies)
OA journal (40% robot)
HUNTINGTON, P., NICHOLAS, D. & JAMALI, H. R. 2008. Web robot detection in the scholarly information environment. Journal of Information Science, 34, 726-741.
OA repositories (85% robot)
Greene 2016 and Information Power 2013 (see above)
Internet Archive (91% robot)
ALNOAMANY, Y., WEIGLE, M. C. & NELSON, M. L. 2013. Access patterns for robots and humans in web archives. Proceedings of the ACM/IEEE Joint Conference on Digital Libraries, 339-348.
The reverse is also true: if fewer robots are caught (e.g. detection deteriorates over time as robots improve their capabilities), the accuracy of the stats diminishes
Formula: Greene 2016
$$P_{inv} = \frac{TR(R - PR - 1) + 2TPR - P(T + R - 1)}{R(TR - P - T) + P}$$
R = recall (robot detection)
P = precision (robot detection)
Pinv = inverse precision (human stats)
T = ratio of robots to total
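As a rough illustration, the formula can be evaluated for the robot proportions shown on slide 5. In the sketch below, the numerator is TR(R − PR − 1) + 2TPR − P(T + R − 1) and the denominator is R(TR − P − T) + P. The function names and the recall and precision values are assumptions chosen for illustration, and the cross-check function is derived here from the definitions on slide 7 and is not taken from Greene 2016.

```python
import math


def inverse_precision(T, R, P):
    """Inverse precision from robot ratio T, recall R and precision P."""
    numerator = T * R * (R - P * R - 1) + 2 * T * P * R - P * (T + R - 1)
    denominator = R * (T * R - P - T) + P
    if math.isclose(denominator, 0.0, abs_tol=1e-12):
        # In this form the expression is 0/0 when, for example, R = 1.
        raise ValueError("formula is undefined for these inputs")
    return numerator / denominator


def inverse_precision_from_counts(T, R, P):
    """Cross-check: build the four outcome proportions, then apply the definition."""
    tp = T * R                # robots caught
    fp = tp * (1 - P) / P     # humans wrongly excluded
    fn = T - tp               # robots missed
    tn = (1 - T) - fp         # humans correctly counted
    return tn / (tn + fn)


# Illustrative detection quality (assumed): recall 0.9, precision 1.0.
for label, T in [("typical website", 0.15), ("OA journal", 0.40),
                 ("OA repositories", 0.85), ("Internet Archive", 0.91)]:
    print(label, round(inverse_precision(T, 0.9, 1.0), 4))
```

With these assumed values, the same detector yields stats that are roughly 98% human on a typical website but only about 64% human on an OA repository.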
Greene 2016
Minho with monthly manual checking is the original data as measured in vivo. Minho alone has detected manual outliers removed. DSpace and EPrints have been generated by applying their native algorithms to the data.
Outliers: cf. LAMOTHE, A. R. 2014. The importance of identifying and accommodating e-resource usage data for the presence of outliers. Information Technology and Libraries, 33, 31-44.
*Recalibrated double-click filter: a single IP address downloading a single item more than 10 times in 24 hours is excluded. By default, the filter excludes a single IP address downloading a single item more than once in 24 hours. The timeout length can be configured, but the number of downloads allowed within the period currently cannot be increased.
See also: JOINT, N., FIELD, A. & GREGSON, M. 2011. Please change the way IRstats works [Online]. Available: http://www.eprints.org/tech.php/15695.html [Accessed November 28 2015].
The drop in inverse recall (loss of legitimate downloads) supports the concern raised in this email discussion. However, if the recalibration were to be implemented, the improvement to robot precision means that the increase in legitimate downloads is offset by the decrease in illegitimate ones, so inverse precision is not affected a great deal. Overall accuracy improves notably however.
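The recalibrated rule can be sketched as follows. This is not EPrints code: the function name, the (IP, item, timestamp) event shape and the sample data are hypothetical, and the sketch assumes that only downloads beyond the allowed number within the window are dropped, with the window measured back from each new download.

```python
from collections import defaultdict, deque
from datetime import datetime, timedelta


def filter_repeat_downloads(events, max_allowed=10, window=timedelta(hours=24)):
    """Keep at most max_allowed downloads per (ip, item) within any window.

    events: iterable of (ip, item_id, timestamp) tuples.
    Returns the kept events in timestamp order.
    """
    recent = defaultdict(deque)  # (ip, item) -> timestamps of kept downloads
    kept = []
    for ip, item, ts in sorted(events, key=lambda e: e[2]):
        seen = recent[(ip, item)]
        # Forget kept downloads that have fallen out of the window.
        while seen and ts - seen[0] >= window:
            seen.popleft()
        if len(seen) < max_allowed:
            seen.append(ts)
            kept.append((ip, item, ts))
    return kept


# Hypothetical data: one IP downloads the same item 12 times in 12 minutes.
base = datetime(2016, 6, 14, 9, 0)
burst = [("203.0.113.5", "item-1", base + timedelta(minutes=i)) for i in range(12)]
other = [("192.0.2.8", "item-1", base)]
print(len(filter_repeat_downloads(burst + other, max_allowed=10)))
print(len(filter_repeat_downloads(burst + other, max_allowed=1)))
```

Setting `max_allowed=1` approximates the default behaviour described in the note above, and `max_allowed=10` approximates the recalibration.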