This session will demystify (generative) AI by exploring its workings as an advanced statistical modelling tool (suitable for any level of technical knowledge). Not only will this session explain the technological underpinnings of AI, it will also address concerns and (long-term) requirements around the ethical and practical use of AI. This includes data preparation and cleaning, data ownership, and the value of data generated, but not owned, by libraries. It will also discuss potential (hypothetical) use cases of AI in collections environments and making collections data AI-ready, providing examples of AI capabilities and applications beyond chatbots.
5. About me
• 2014-17: BA Liberal Arts & Sciences (Computer Science, Statistics, Neuroscience, Linguistics)
• 2018-19: MA Applied Linguistics (Conversation analysis, Pragmatics, Corpus linguistics)
• 2019-: PhD Applied Linguistics (Conversation analysis, Ambiguity in stories, Human–computer interaction)
• 2021-23: Collections Assistant, Acquisitions & Reading Lists, University of Leeds
• 2023-: Publishing Technologies Librarian, Open Library of Humanities / Janeway Systems
8. Demystifying AI
Opening the black box
What is AI?
“[algorithm-based technologies that aim to] simulate human intelligence and problem-solving capabilities”
– IBM
9. Machine learning
“We say that a machine learns with respect to a particular task (T), performance metric P, and type of [training] experience (E), if the system reliably improves its performance (P) at task (T), following experience (E). Depending on how we specify T, P, and E, the learning task might also be called by names such as data mining, autonomous discovery, database updating, programming by example, etc.”
– Tom M. Mitchell, “The Discipline of Machine Learning”

Task                  | Performance measure            | Training experience
Playing chess matches | % of games won against players | Practice games against itself
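As a toy illustration of Mitchell's framing, the sketch below watches a performance measure (P, held-out accuracy) improve at a task (T, classification) as training experience (E) grows. The synthetic dataset and model are illustrative assumptions, not from the talk.

```python
# Toy T/P/E demonstration with scikit-learn (a synthetic classification
# task stands in for chess; all choices here are illustrative).
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# T: classify synthetic points into two classes.
X, y = make_classification(n_samples=2000, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# E: progressively more training examples; P: accuracy on unseen data.
for n in (50, 200, 1500):
    model = LogisticRegression(max_iter=1000).fit(X_train[:n], y_train[:n])
    p = accuracy_score(y_test, model.predict(X_test))
    print(f"experience={n:4d} examples -> performance={p:.2f}")
```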
16. Rise of the Transformers

‘Old AI’                                                                      | Transformers
Specialised use cases                                                         | Broader use cases
Re-training and re-writing                                                    | Fine-tuning
1000s – 10,000s tokens (a library dataset or a large set of usage statistics) | Millions – billions of tokens (significant portions of the internet)
Relatively easy and transparent                                               | Complex – ‘black box’
Cheap                                                                         | Expensive
Find the right tool for the job!
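The "Fine-tuning" row of the table can be made concrete. Below is a minimal sketch, assuming the Hugging Face transformers library, a small pretrained checkpoint (distilbert-base-uncased) and a hypothetical two-label task; rather than re-training from scratch, a few gradient steps adapt the pretrained weights.

```python
# Minimal fine-tuning sketch (the model choice, texts and labels are
# illustrative assumptions, not the presenter's setup).
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model_name = "distilbert-base-uncased"  # assumed small checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

texts = ["Shipment received and invoiced.", "Title missing from reading list."]
labels = torch.tensor([0, 1])  # hypothetical task: routine vs. needs attention

batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=5e-5)

model.train()
for _ in range(3):  # a few steps stand in for a real training loop
    out = model(**batch, labels=labels)  # loss computed against the toy labels
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```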
23. DATA VALUE
• Scarcity and commercial value
• Data can make or break development
• Copyright
• Contracts and agreements
• Can our data be used to train products that will be sold back to us (and that generate yet more data)?
Define commercially sensitive/valuable data
24. DATA SECURITY & PRIVACY
• Is our data secure?
• Do users have a right to privacy?
! Microsoft Copilot and the US Congress
Don’t feed AI GDPR-protected or commercially sensitive data
25. DATA QUALITY
• Completeness, accuracy and consistency
• The correct data for the task
• For what purpose was the data collected?
• What is its ecosystem, and what are the underlying assumptions?

Quantitative examples                | Qualitative examples
Citation metrics ↛ Research quality  | Country of pub./nationality ↛ Diversity
Budget ↛ Service quality             | Academic book reviews
Reading list statistics ↛ Engagement | Surveys
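Checks like these can start very simply. Below is a hedged first-pass sketch in pandas for the completeness and consistency points; the file name and the 'isbn' column are assumptions for illustration.

```python
# First-pass data quality checks with pandas (file/column names are assumed).
import pandas as pd

df = pd.read_csv("collections_export.csv")  # hypothetical catalogue export

# Completeness: share of non-missing values per column.
completeness = 1 - df.isna().mean()
print(completeness.sort_values().head(10))

# Consistency: repeated identifiers are a common red flag.
print("duplicate ISBNs:", df["isbn"].duplicated().sum())
```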
26. DATA BIAS
• Algorithmic bias and data provenance

“Managing bias rather than working to eliminate bias is a distinction born of the sense that elimination is not possible because elimination would be a kind of bias itself—essentially a well-meaning, if ultimately futile, ouroboros.” (Padilla, 2019)

ALL DATASETS ARE BIASED
27. Use cases
• Proof of concept for use cases and in-house development
• Generative AI and ‘regular’ machine learning
• Failing fast and failing often
• Other future applications
29. Generating MARC – ESTHER
Run No. 1 – Web interface
1. Limits on file size for upload
2. Could not always access links
3. Only older / better-known titles and/or ISBNs
4. No control
Run No. 2 – API (langchain)
1. Could use various file types and sizes
2. Could access weblinks
3. Print based on weblinks was variable
4. Control!
ALWAYS USE THE API FOR ANALYSES
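For a rough idea of what the API route looks like, here is a minimal sketch (not the ESTHER code itself), assuming a recent langchain-openai package, an OPENAI_API_KEY in the environment, and an assumed model name; placeholder metadata stands in for a real record, and any generated MARC still needs human review.

```python
# Hedged sketch: asking an LLM via LangChain to draft a MARC 21 record.
# Model name, prompt wording and metadata are illustrative assumptions.
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

metadata = {
    "title": "Demystifying AI for Libraries",
    "author": "Example, Author",
    "isbn": "9780000000000",  # placeholder
    "year": "2024",
}

prompt = (
    "Draft a MARC 21 bibliographic record (mnemonic format, one field per "
    f"line) for this book: {metadata}. Include only 020, 100, 245 and 264."
)

record = llm.invoke(prompt).content  # .invoke returns a message; .content is text
print(record)
```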
32. Title comparisons – MARY
• Run No. 1 – CSV upload to web (failed)
• Run No. 1.5 – Cleaning CSV in web (failed)
• Run No. 2 – API (pandas / langchain)
  • Worked reasonably well, but expensive?
• Run No. 3 – Machine learning
  • Neural network (numpy, PyTorch, sklearn, pandas)
  • Training on outliers and low-confidence answers
  • More work, testing and refining – but it works* and is cheaper!
  • Data processing and cleaning
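A much simpler stand-in gives the flavour of the non-generative route in Run No. 3: character n-gram TF-IDF plus cosine similarity with scikit-learn. The titles and the 0.6 threshold below are assumptions, and the slide's actual solution was a neural network, not this.

```python
# Title comparison sketch: char n-gram TF-IDF + cosine similarity.
# A simplified stand-in for the neural network described on the slide.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

catalogue = ["The pragmatics of conversation", "Corpus linguistics: an introduction"]
incoming = "Corpus Linguistics - An Introduction (2nd ed.)"

vec = TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4))
matrix = vec.fit_transform(catalogue + [incoming])

# Compare the incoming title against every catalogue title.
scores = cosine_similarity(matrix[-1], matrix[:-1])[0]
for title, score in zip(catalogue, scores):
    verdict = "likely match" if score > 0.6 else "low confidence"  # assumed cut-off
    print(f"{score:.2f}  {title}  ->  {verdict}")
```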
33. Fail fast, fail often
JUDITH – Recommender
• Combined usage statistics, reading list statistics and reading list information using predictive modelling (see the sketch after this slide)
• Worked ‘theoretically’ but lacked data, and thus meaningful testing
• Best integrated into LMS
MIRIAM – Print book usage
• Determining high and low usage?
• Combined print usage, ILL and reading list statistics
• Lacked contextual data / understanding – unable to do meaningful development or testing
MAGDALENE – Finding new editions
• Building on MARY: comparing titles and then identifying new editions
• More complex than expected
• Required an additional API
• Best integrated in LMS
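For the JUDITH idea, a hedged sketch of the "combine statistics, then model" step is below. Every file and column name is an assumption, and the proxy target is invented for illustration; as the slide notes, the real blocker was the lack of data for meaningful testing.

```python
# JUDITH-style sketch: merge usage and reading-list statistics, fit a simple
# predictive model, and rank titles. All names below are assumptions.
import pandas as pd
from sklearn.linear_model import LogisticRegression

usage = pd.read_csv("usage_stats.csv")         # assumed: isbn, loans, downloads
lists = pd.read_csv("reading_list_stats.csv")  # assumed: isbn, lists, clicks

df = usage.merge(lists, on="isbn", how="inner")
X = df[["loans", "downloads", "lists"]]
y = (df["clicks"] > df["clicks"].median()).astype(int)  # invented proxy target

model = LogisticRegression().fit(X, y)
df["recommend_score"] = model.predict_proba(X)[:, 1]
print(df.sort_values("recommend_score", ascending=False).head())
```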
38. Looking to the future
• Model sustainability
  • Financially
  • Ecologically
• Copyright
• Open source
  • HuggingFace
  • OpenLLMs
  • Specialised LLMs? (GLiNER on GitHub)
• Transformer technology
  • The engine, rather than the tech
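Specialised models like GLiNER illustrate the "right tool for the job" point: a compact model for one capability (named entity recognition) rather than a general-purpose chatbot. The sketch below follows GLiNER's published usage examples; the checkpoint name, labels and sample text are assumptions.

```python
# Hedged sketch of a specialised model: entity extraction with GLiNER.
# Checkpoint, labels and text are illustrative assumptions.
from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_base")

text = "Ordered three copies of Corpus Linguistics by Tony McEnery, published 2019."
labels = ["person", "book title", "date"]  # GLiNER takes arbitrary label strings

for entity in model.predict_entities(text, labels, threshold=0.5):
    print(entity["text"], "->", entity["label"])
```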
39. Short-term actions – library

DATA
▪ Commercial value of data
▪ Data governance
▪ Data assessment criteria
▪ Data audits

MODELS
▪ Benchmarks and quality standards
▪ Appropriate tools
▪ Experiment!
▪ Assess the context

PEOPLE
▪ Find the right people
▪ Cross-department collaboration
▪ Working with the data and workflows
▪ Problem formulation