UKSG 2024 - Demystifying AI
Evaluating future uses and limits in library collections
S.Haime@bbk.ac.uk
@Siobhan_M_HQ
Overview
• Demystifying AI: machine learning, algorithms and data
• Use cases in collections: in-house opportunities and external possibilities
• Looking to the future: what next, expectations and recommendations
About me
• 2014-17: BA Liberal Arts & Sciences (Computer Science, Statistics, Neuroscience, Linguistics)
• 2018-19: MA Applied Linguistics (Conversation analysis, Pragmatics, Corpus Linguistics)
• 2019-: PhD Applied Linguistics (Conversation analysis, Ambiguity in stories, Human-computer interaction)
• 2021-23: Collections Assistant, Acquisitions & Reading lists, University of Leeds
• 2023-: Publishing Technologies Librarian, Open Library of Humanities, Janeway Systems
Demystifying AI
Opening the black box
What is AI?
Whatever you want it to become
“[algorithm based technologies that aim to]
simulate human intelligence and problem-solving
capabilities”
– IBM
Machine learning
“We say that a machine learns with respect to a particular task (T), performance metric P,
and type of [training] experience (E), if the system reliably improves its performance (P) at
task (T), following experience (E). Depending on how we specify T, P, and E, the learning
task might also be called by names such as data mining, autonomous discovery, database
updating, programming by example, etc.”
-Tom M. Mitchell, “The Discipline of Machine Learning.”
Task (T): Playing chess matches
Performance measure (P): % of games won against players
Training experience (E): Practice games against itself
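Mitchell's T, P and E can be made concrete with a deliberately tiny stand-in for the chess example. The sketch below is not from the talk: the task (flagging "high usage" titles by loan count), the hidden rule of 12 loans, and every number in it are invented for illustration. It only shows performance P on task T rising as experience E grows.

```python
def train_cutoff(examples):
    """Learn a loan-count cut-off from (loans, is_high_usage) pairs.

    The estimate is the midpoint between the largest 'low' example and
    the smallest 'high' example seen so far. Needs at least one of each.
    """
    lows = [loans for loans, is_high in examples if not is_high]
    highs = [loans for loans, is_high in examples if is_high]
    return (max(lows) + min(highs)) / 2


def accuracy(cutoff, test_set):
    """Performance measure P: share of test items classified correctly."""
    correct = sum((loans >= cutoff) == is_high for loans, is_high in test_set)
    return correct / len(test_set)


HIDDEN_RULE = 12  # invented: 12 or more loans a year counts as high usage

# Training experience E: labelled examples, invented for illustration.
experience = [(2, False), (25, True), (8, False), (18, True), (11, False), (13, True)]

# Held-out test set: every loan count from 0 to 29 with its true label.
test_set = [(loans, loans >= HIDDEN_RULE) for loans in range(30)]

# Measure P on task T after 2, 4 and 6 examples of experience E.
scores = [accuracy(train_cutoff(experience[:n]), test_set) for n in (2, 4, 6)]
for n, score in zip((2, 4, 6), scores):
    print(f"after {n} examples: {score:.3f}")
```

With this hand-picked data the score climbs from 0.933 to 0.967 to 1.000. Real data is noisier, so improvement is rarely this tidy.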
(Un)intelligence?
What is intelligence?
VS
Searle’s Chinese room argument (1980)
Language generation versus language understanding (or interaction)
Deep learning
Linear processing
Parallel / networked processing
Rise of the Transformers
‘Old AI’ | Transformers
Specialised use cases | Broader use cases
Re-training and re-writing | Fine-tuning
1000s – 10,000s tokens (a library dataset or large set of usage statistics) | Millions – billions tokens (significant portions of the internet)
Relatively easy and transparent | Complex – ‘black box’
Cheap | Expensive
Find the right tool for the job!
DATA
DATA VALUE
• Scarcity and commercial value
• Data can make or break development
• Copyright
• Contracts and agreements
• Can our data be used to train products that will be sold back to us (and that generate more data)?
Define commercially sensitive/valuable data
DATA SECURITY & PRIVACY
• Is our data secure?
• Do users have a right to privacy?
! Microsoft Copilot and the US Congress
Don’t feed AI personal (GDPR-protected) or commercially sensitive data
DATA QUALITY
• Completeness, accuracy and consistency
• The correct data for the task
• For what purpose was the data collected?
• What is its ecosystem and what are the underlying assumptions?
Quantitative examples | Qualitative examples
Citation metrics ↛ Research quality | Country of pub./nationality ↛ Diversity
Budget ↛ Service quality | Academic book reviews
Reading list statistics ↛ Engagement | Surveys
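Completeness and consistency are the two quality checks that are easiest to automate before any data goes near a model. The following is a minimal sketch, assuming records arrive as Python dictionaries; the field names, the agreed vocabulary and the sample records are all invented for illustration. It cannot tell whether the data is the correct data for the task, which remains a human judgement.

```python
def completeness(records, fields):
    """Share of records with a non-empty value for each field."""
    total = len(records)
    return {
        field: sum(1 for r in records if str(r.get(field) or "").strip()) / total
        for field in fields
    }


def inconsistent_values(records, field, allowed):
    """Values of `field` that fall outside an agreed vocabulary, sorted."""
    return sorted(
        {r[field] for r in records if r.get(field) and r[field] not in allowed}
    )


# Invented sample records with typical problems: blanks, None, mixed spellings.
records = [
    {"title": "Example Title One", "isbn": "9780000000001", "format": "print"},
    {"title": "Example Title Two", "isbn": "", "format": "Print"},
    {"title": "", "isbn": "9780000000002", "format": "ebook"},
    {"title": "Example Title Four", "isbn": None, "format": "e-book"},
]

report = completeness(records, ["title", "isbn", "format"])
odd_formats = inconsistent_values(records, "format", {"print", "ebook"})

print(report)       # share of filled-in values per field
print(odd_formats)  # spellings that need normalising before analysis
```

A report like this is a starting point for a data audit, not a verdict on accuracy.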
DATA BIAS
• Algorithmic bias and data provenance
“Managing bias rather than working to eliminate bias is a distinction born of the sense that elimination is not possible because elimination would be a kind of bias itself—essentially a well-meaning, if ultimately futile, ouroboros.” (Padilla, 2019)
ALL DATASETS ARE BIASED
Use cases
• Proof of concept for use cases and in-house development
• Generative AI and ‘regular’ machine learning
• Failing fast and failing often
• Other future applications
Generating MARC – ESTHER
Run No. 1 – Web interface
1. Limits in file size for upload
2. Could not always access links
3. Only older / better known titles and/or ISBNs
4. No control
Run No. 2 – API (LangChain)
1. Could use various filetypes and sizes
2. Could access weblinks
3. Print based on weblinks was variable
4. Control!
ALWAYS USE THE API FOR ANALYSES
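One practical form of the "control" gained by working through an API is that generated records can be checked in code before anyone relies on them. The sketch below is an assumption-laden illustration rather than the ESTHER implementation: it assumes the model returns one field per line beginning with a three-digit MARC tag, and it treats 020 (ISBN), 100 (main author) and 245 (title statement) as the required fields. The sample record is invented.

```python
import re

# Assumed minimum for this sketch: ISBN, main author, title statement.
REQUIRED_TAGS = {"020", "100", "245"}


def field_tags(marc_text):
    """Collect the three-digit tag that starts each line of the output."""
    tags = []
    for line in marc_text.splitlines():
        match = re.match(r"\s*=?(\d{3})\b", line)
        if match:
            tags.append(match.group(1))
    return tags


def missing_tags(marc_text, required=REQUIRED_TAGS):
    """Required tags absent from a generated record, sorted."""
    return sorted(required - set(field_tags(marc_text)))


# Invented example of model output that forgot the ISBN field.
generated = (
    "100 1# $aAuthor, Example\n"
    "245 10 $aAn example title\n"
    "264 #1 $bExample Press, $c2024\n"
)

print(field_tags(generated))
print(missing_tags(generated))
```

A check like this only confirms that fields are present. Whether their content is correct still needs a cataloguer or a comparison against a trusted source.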
Title comparisons - MARY
• Run No. 1 – CSV upload to web (failed)
• Run No. 1.5 – Cleaning CSV in web (failed)
• Run No. 2 – API (pandas / LangChain)
  • Worked reasonably well, but expensive?
• Run No. 3 – Machine learning
  • Neural network (NumPy, PyTorch, scikit-learn, pandas)
  • Training on outliers and low-confidence answers
  • More work, testing and refining – but it works* and is cheaper!
• Data processing and cleaning
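To give a feel for what title comparison involves, here is a minimal sketch using only the Python standard library. It is a swapped-in technique, not the neural network described above: titles are cleaned, scored with `difflib.SequenceMatcher`, and sorted into three buckets so that low-confidence pairs go to a person. The thresholds 0.9 and 0.6 and the function names are assumptions chosen for illustration.

```python
import re
from difflib import SequenceMatcher


def normalise(title):
    """Lower-case, drop punctuation and collapse whitespace."""
    title = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(title.split())


def similarity(a, b):
    """Similarity between two titles, from 0.0 to 1.0."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def triage(pairs, match_at=0.9, review_at=0.6):
    """Sort title pairs into match / review / no_match buckets."""
    buckets = {"match": [], "review": [], "no_match": []}
    for a, b in pairs:
        score = similarity(a, b)
        if score >= match_at:
            buckets["match"].append((a, b))
        elif score >= review_at:
            buckets["review"].append((a, b))
        else:
            buckets["no_match"].append((a, b))
    return buckets


pairs = [
    ("The Discipline of Machine Learning.", "the discipline of machine learning"),
    ("Statistics", "Zoology"),
]
result = triage(pairs)
print(result)
```

The "review" bucket is where the data cleaning effort pays off: the cleaner the titles, the fewer pairs end up needing a human decision.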
Fail fast, fail often
JUDITH - Recommender
• Combined usage statistics, reading list statistics and reading list information using predictive modelling
• Worked ‘theoretically’ but lacked data and thus meaningful testing
• Best integrated into LMS
MIRIAM - Print Book Usage
• Determining high and low usage?
• Combined print usage, ILL and reading list statistics
• Lacked contextual data / understanding – unable to do meaningful development or testing
MAGDALENE - Finding New Editions
• Building on MARY: comparing titles and then identifying new editions
• More complex than expected
• Required an additional API
• Best integrated in LMS
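The new-editions idea can be sketched in a few lines, which also shows why it turned out more complex than expected. This is a hypothetical illustration, not the project code: it assumes records are dictionaries with `title`, `author` and `year`, matches on a cleaned title plus author, and treats any later year as a candidate new edition. The sample records are invented.

```python
import re


def match_key(record):
    """Match key: cleaned title plus lower-cased author (an assumption)."""
    title = " ".join(re.sub(r"[^\w\s]", " ", record["title"].lower()).split())
    return title, record["author"].lower()


def newer_editions(held, available):
    """Map each held work to the years of later editions on offer."""
    latest_held = {}
    for rec in held:
        k = match_key(rec)
        latest_held[k] = max(latest_held.get(k, 0), rec["year"])
    found = {}
    for rec in available:
        k = match_key(rec)
        if k in latest_held and rec["year"] > latest_held[k]:
            found.setdefault(k, []).append(rec["year"])
    return {k: sorted(years) for k, years in found.items()}


# Invented sample data: what the library holds and what a supplier lists.
held = [
    {"title": "Example Handbook of Cataloguing", "author": "Author A", "year": 2015},
    {"title": "Sample Reader", "author": "Author B", "year": 2010},
]
available = [
    {"title": "Example handbook of cataloguing.", "author": "Author A", "year": 2021},
    {"title": "Example Handbook of Cataloguing", "author": "Author A", "year": 2015},
    {"title": "Sample Reader", "author": "Author C", "year": 2019},
]

candidates = newer_editions(held, available)
print(candidates)
```

A later year with the same title and author may be a reprint rather than a new edition, and changed subtitles or author forms will slip through, so results like these are candidates for checking, not purchase decisions.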
Other use cases
• Linked (open) data
• Contextual information
• Enhanced discovery
• Data as collections
• OERs
• Collections as data
• Networked knowledge graphs
• Semantic search
• Enhanced analytics*
• Machine translation
• Collection mapping
• Simplifying workflows
• Improved digitisation
• Contextual metadata
• Supporting acquisitions
Looking to the future
• Model sustainability
• Financially
• Ecologically
• Copyright
• Open-source
• Hugging Face
• OpenLLMs
• Specialised LLMs? (GLiNER on GitHub)
• Transformer technology
• Engine, rather than the tech
Short term actions library
DATA
▪ Commercial value of data
▪ Data governance
▪ Data assessment criteria
▪ Data audits
MODELS
▪ Benchmarks and quality standards
▪ Appropriate tools
▪ Experiment!
▪ Assess the context
PEOPLE
▪ Find the right people
▪ Cross-department collaboration
▪ Working with the data and workflows
▪ Problem formulation
RESIST THE URGE TO BE
IMPRESSED

More Related Content

Similar to UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library collections

Building Data Scientists
Building Data ScientistsBuilding Data Scientists
Building Data ScientistsMitch Sanders
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataAndre Freitas
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataAndy Stretton
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodDuncan Hull
 
Hector Guerrero- Road to Business Analytics
Hector Guerrero- Road to Business AnalyticsHector Guerrero- Road to Business Analytics
Hector Guerrero- Road to Business AnalyticsErika Marr
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Data science presentation
Data science presentationData science presentation
Data science presentationMSDEVMTL
 
Data Science versus Artificial Intelligence: a useful distinction
Data Science versus Artificial Intelligence: a useful distinctionData Science versus Artificial Intelligence: a useful distinction
Data Science versus Artificial Intelligence: a useful distinctionChristoforos Anagnostopoulos
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7Paul Lo
 
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...Hima Patel
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactDr. Sunil Kr. Pandey
 
Mastering in Data Science 3RITPL-1 (1).pdf
Mastering in Data Science 3RITPL-1 (1).pdfMastering in Data Science 3RITPL-1 (1).pdf
Mastering in Data Science 3RITPL-1 (1).pdftarunprajapati0t
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupSri Ambati
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data ScientistsRichard Garris
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software EngineeringMiroslaw Staron
 
How to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamHow to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamTraveloka
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceMark West
 
Practical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in CybersecurityPractical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in Cybersecurityscoopnewsgroup
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 

Similar to UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library collections (20)

Building Data Scientists
Building Data ScientistsBuilding Data Scientists
Building Data Scientists
 
Introduction to question answering for linked data & big data
Introduction to question answering for linked data & big dataIntroduction to question answering for linked data & big data
Introduction to question answering for linked data & big data
 
Ordering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect dataOrdering the chaos: Creating websites with imperfect data
Ordering the chaos: Creating websites with imperfect data
 
eScience: A Transformed Scientific Method
eScience: A Transformed Scientific MethodeScience: A Transformed Scientific Method
eScience: A Transformed Scientific Method
 
Hector Guerrero- Road to Business Analytics
Hector Guerrero- Road to Business AnalyticsHector Guerrero- Road to Business Analytics
Hector Guerrero- Road to Business Analytics
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Data science presentation
Data science presentationData science presentation
Data science presentation
 
Data Science versus Artificial Intelligence: a useful distinction
Data Science versus Artificial Intelligence: a useful distinctionData Science versus Artificial Intelligence: a useful distinction
Data Science versus Artificial Intelligence: a useful distinction
 
Big Data Meetup #7
Big Data Meetup #7Big Data Meetup #7
Big Data Meetup #7
 
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...
Advances in Exploratory Data Analysis, Visualisation and Quality for Data Cen...
 
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & ImpactData Science - An emerging Stream of Science with its Spreading Reach & Impact
Data Science - An emerging Stream of Science with its Spreading Reach & Impact
 
Mastering in Data Science 3RITPL-1 (1).pdf
Mastering in Data Science 3RITPL-1 (1).pdfMastering in Data Science 3RITPL-1 (1).pdf
Mastering in Data Science 3RITPL-1 (1).pdf
 
H2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User GroupH2O with Erin LeDell at Portland R User Group
H2O with Erin LeDell at Portland R User Group
 
Azure Databricks for Data Scientists
Azure Databricks for Data ScientistsAzure Databricks for Data Scientists
Azure Databricks for Data Scientists
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
How to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data TeamHow to Feed a Data Hungry Organization – by Traveloka Data Team
How to Feed a Data Hungry Organization – by Traveloka Data Team
 
OpenML data@Sheffield
OpenML data@SheffieldOpenML data@Sheffield
OpenML data@Sheffield
 
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data ScienceGeeCon Prague 2018 - A Practical-ish Introduction to Data Science
GeeCon Prague 2018 - A Practical-ish Introduction to Data Science
 
Practical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in CybersecurityPractical Applications of Machine Learning in Cybersecurity
Practical Applications of Machine Learning in Cybersecurity
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 

More from UKSG: connecting the knowledge community

UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...UKSG: connecting the knowledge community
 
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...UKSG: connecting the knowledge community
 
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA ContentUKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA ContentUKSG: connecting the knowledge community
 
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...UKSG: connecting the knowledge community
 
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open ResearchUKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open ResearchUKSG: connecting the knowledge community
 
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...UKSG: connecting the knowledge community
 
UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...
UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...
UKSG 2024 Lightning 1 - Practical steps towards an open research culture: Bui...UKSG: connecting the knowledge community
 
UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...
UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...
UKSG 2024 - Reckoning or Retreat? A Longitudinal Look at DEIA in Scholarly Co...UKSG: connecting the knowledge community
 
UKSG 2024 - You don't know what you've got till it's gone: Future directions ...
UKSG 2024 - You don't know what you've got till it's gone: Future directions ...UKSG 2024 - You don't know what you've got till it's gone: Future directions ...
UKSG 2024 - You don't know what you've got till it's gone: Future directions ...UKSG: connecting the knowledge community
 
UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...
UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...
UKSG 2024 - Vision, mission, passion: how UK University Presses collaborate t...UKSG: connecting the knowledge community
 
UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...
UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...
UKSG - 2024 - Fostering an Open Research culture: ARU's Graduate Trainee Seco...UKSG: connecting the knowledge community
 
UKSG 2024 - Creating credibility through community: Encouraging high quality ...
UKSG 2024 - Creating credibility through community: Encouraging high quality ...UKSG 2024 - Creating credibility through community: Encouraging high quality ...
UKSG 2024 - Creating credibility through community: Encouraging high quality ...UKSG: connecting the knowledge community
 
UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...
UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...
UKSG 2024 - Author Identity Metadata: Why a Small Publisher Can Address a Maj...UKSG: connecting the knowledge community
 
UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...
UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...
UKSG 2024 - Captivate, Connect, and Convert: Unlocking the art of Collections...UKSG: connecting the knowledge community
 

More from UKSG: connecting the knowledge community (20)

UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
UKSG 2024 Plenary Session 3 - There is No List: (How) Can We Combat “Predator...
 
UKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdf
UKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdfUKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdf
UKSG 2024 From algorithms to empowerment by Christina Dinh Nguyen.pdf
 
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
UKSG 2024 Plenary 4 - Combining Open Access research and large language model...
 
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
UKSG 2024 Plenary 3 - There is No List: (How) Can We Combat “Predatory” Publi...
 
UKSG 2024 Plenary 2 - Let's Talk About Green
UKSG 2024 Plenary 2 - Let's Talk About GreenUKSG 2024 Plenary 2 - Let's Talk About Green
UKSG 2024 Plenary 2 - Let's Talk About Green
 
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
UKSG 2024 Plenary 2 - Are we there yet? A review of transitional agreements i...
 
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
UKSG 2024 Plenary 2 - What did we Read, What did we Publish: Distilling the d...
 
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA ContentUKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
UKSG 2024 Lightning 2 - How GetFTR Supports Discovery and Access of OA Content
 
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
UKSG 2024 Lightning 2 - Advocating for data sharing: messaging frameworks for...
 
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open ResearchUKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
UKSG 2024 Lightning 2 - All Watched Over By Machines That Love Open Research
 
UKSG 2024 Lightning 1 - Responding to the UN SDG Publishers Compact – Bristol...

UKSG 2024 - Demystifying AI - Evaluating future uses and limits in library collections

  • 1. Evaluating future uses and limits in library collections S.Haime@bbk.ac.uk @Siobhan_M_HQ
  • 2. Overview: Demystifying AI (machine learning, algorithms and data); Use cases in collections (in-house opportunities and external possibilities); Looking to the future (what next, expectations and recommendations)
  • 5. About me: 2014-17 BA Liberal Arts & Sciences (Computer Science, Statistics, Neuroscience, Linguistics); 2018-19 MA Applied Linguistics (Conversation analysis, Pragmatics, Corpus Linguistics); 2019- PhD Applied Linguistics (Conversation analysis, Ambiguity in stories, Human–computer interaction); 2021-23 Collections Assistant, Acquisitions & Reading lists, University of Leeds; 2023- Publishing Technologies Librarian, Open Library of Humanities, Janeway Systems
  • 6. Demystifying AI Opening the black box What is AI?
  • 7. Demystifying AI Opening the black box What is AI? Whatever you want it to become
  • 8. Demystifying AI Opening the black box What is AI? “[algorithm based technologies that aim to] simulate human intelligence and problem-solving capabilities” – IBM
  • 9. Machine learning “We say that a machine learns with respect to a particular task (T), performance metric P, and type of [training] experience (E), if the system reliably improves its performance (P) at task (T), following experience (E). Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.” - Tom M. Mitchell, “The Discipline of Machine Learning.” Example: Task (T): playing chess matches; Performance measure (P): % of games won against players; Training experience (E): practice games against itself
  • 13. (Un)intelligence? What is intelligence VS Searle’s Chinese room argument (1980) Language generation versus language understanding (or interaction)
  • 16. Rise of the Transformers ‘Old AI’: specialised use cases; re-training and re-writing; 1000s – 10,000s tokens (a library dataset or large set of usage statistics); relatively easy and transparent; cheap. Transformers: broader use cases; fine-tuning; millions – billions of tokens (significant portions of the internet); complex – ‘black box’; expensive. Find the right tool for the job!
  • 22. DATA
  • 23. DATA VALUE • Scarcity and commercial value • Data can make or break development • Copyright • Contracts and agreement • Can our data be used to train products that will be sold back to us (that generate more data)? Define commercially sensitive/valuable data
  • 24. DATA SECURITY & PRIVACY • Is our data secure? • Do users have a right to privacy? ! Microsoft Copilot and the US Congress. Don’t feed AI GDPR-protected or commercially sensitive data
  • 25. DATA QUALITY • Completeness, accuracy and consistency • The correct data for the task • For what purpose was the data collected? • What is its ecosystem and what are the underlying assumptions? Quantitative examples: Citation metrics ↛ Research quality; Country of pub./nationality ↛ Diversity; Budget ↛ Service quality; Reading list statistics ↛ Engagement. Qualitative examples: Academic book reviews; Surveys
  • 26. DATA BIAS • Algorithmic bias and data provenance “Managing bias rather than working to eliminate bias is a distinction born of the sense that elimination is not possible because elimination would be a kind of bias itself—essentially a well-meaning, if ultimately futile, ouroboros.”(Padilla, 2019) ALL DATASETS ARE BIASED
  • 27. Use cases • Proof of concept for use cases and in-house development • Generative AI and ‘regular’ machine learning • Failing fast and failing often • Other future applications
  • 28. Generating MARC – ESTHER Run No. 1 – Web interface: 1. Limits in file size for upload 2. Could not always access links 3. Only older / better known titles and/or ISBNs 4. No control. Run No. 2 – API (langchain): 1. Could use various filetypes and sizes 2. Could access weblinks 3. Print based on weblinks was variable 4. Control! ALWAYS USE THE API FOR ANALYSES
  • 30. Title comparisons - MARY • Run No. 1 – CSV upload to web (failed) • Run No. 1.5 – Cleaning CSV in web (failed) • Run No. 2 – API (pandas / langchain) • Worked reasonably well, but expensive? • Run No. 3 – Machine learning • Neural network (numpy, PyTorch, sklearn, pandas) • Training on outliers and low confidence answers • More work, testing and refining – but it works* and is cheaper! • Data processing and cleaning
  • 33. Fail fast, fail often JUDITH - Recommender • Combined usage statistics, reading list statistics and reading list information using predictive modelling • Worked ‘theoretically’ but lacked data and thus meaningful testing • Best integrated into LMS MIRIAM - Print Book Usage • Determining high and low usage? • Combined print usage, ILL and reading list statistics • Lacked contextual data / understanding – unable to do meaningful development or testing MAGDALENA - Finding New Editions • Building on MARY - comparing titles and then identifying new editions • More complex than expected • Required an additional API • Best integrated into LMS
  • 37. Other use cases: Linked (open) data; Contextual information; Enhanced discovery; Data as collections; OERs; Collections as data; Networked knowledge graphs; Semantic search; Enhanced analytics*; Machine translation; Collection mapping; Simplifying workflows; Improved digitisation; Contextual metadata; Supporting acquisitions
  • 38. Looking to the future • Model sustainability • Financially • Ecologically • Copyright • Open-source • Hugging Face • OpenLLMs • Specialised LLMs? (GLiNER on GitHub) • Transformer technology • Engine, rather than the tech
  • 39. Short term actions library DATA MODELS PEOPLE ▪ Commercial value of data ▪ Data governance ▪ Data assessment criteria ▪ Data audits ▪ Benchmarks and quality standards ▪ Appropriate tools ▪ Experiment! ▪ Assess the context ▪ Find the right people ▪ Cross-department collaboration ▪ Working with the data and workflows ▪ Problem formulation
  • 40. RESIST THE URGE TO BE IMPRESSED
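
As a rough illustration of the title-comparison task described for MARY, the sketch below scores two titles for similarity and sorts the pair into "match", "review" or "different". It is not the method used in the talk (which mentions an API run with pandas / langchain and later a neural network); it swaps in Python's standard-library difflib as a cheap stand-in. The function names and the 0.9 / 0.6 thresholds are hypothetical and would need tuning against real catalogue data.

```python
import re
from difflib import SequenceMatcher


def normalise(title: str) -> str:
    """Lowercase a title, replace punctuation with spaces, collapse whitespace."""
    cleaned = re.sub(r"[^\w\s]", " ", title.lower())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Return a similarity ratio between 0.0 and 1.0 for two titles."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def triage(a: str, b: str, match_at: float = 0.9, review_at: float = 0.6) -> str:
    """Sort a title pair into 'match', 'review' or 'different'.

    The thresholds are illustrative assumptions, not values from the talk.
    """
    score = similarity(a, b)
    if score >= match_at:
        return "match"
    if score >= review_at:
        return "review"
    return "different"


if __name__ == "__main__":
    pairs = [
        ("The Discipline of Machine Learning", "the discipline of machine learning."),
        ("Introduction to Statistics", "Introduction to Statistics, 2nd edition"),
        ("Corpus Linguistics", "Operations Management"),
    ]
    for left, right in pairs:
        print(f"{triage(left, right):9}  {left!r} vs {right!r}")
```

Pairs that land in the middle band are the ones worth sending to a person or to a more expensive model, which loosely mirrors the talk's point about concentrating effort on outliers and low confidence answers.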