Building Bridges 
(and rapid depreciation)
David Foster Wallace, on Ambition: 
“You know, the whole thing about perfectionism. The 
perfectionism is very dangerous, because of course if your 
fidelity to perfectionism is too high, you never do 
anything. 
Because doing anything results in— It’s actually kind of 
tragic because it means you sacrifice how gorgeous and 
perfect it is in your head for what it really is.” 
- As told to Leonard Lopate on WNYC on March 4, 1996. 
(emphasis my own) 
http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
The unifying theme to (pretty much) 
all the requests: 
Give me 
EVERYTHING! 
(that might be important to my work)
Fetch!
Why? 
“Can’t they just find the things they want 
through the catalogue?”
1. If they knew which bits 
of data were necessary, 
they would already know 
the answers.
“I am 
interested in 
travel 
accounts in 
Europe during 
the 19th 
Century”
2. If a conventional 
search interface worked, 
they wouldn’t be asking.
How does conventional search work 
anyway? Under what assumptions? 
Starts with the Text: 
“I quickly explained that many big jobs involve 
a few hazards.”
Then it is Tokenised (with some assumptions 
on how this is possible): 
“I”, “quickly”, “explained”, “that”, ”many”, “big”, 
“jobs”, “involve”, “a”, “few”, “hazards”
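A minimal sketch of the tokenising step. It assumes a "word" is a run of letters, optionally with apostrophes inside; the `tokenise` name and the regular expression are illustrative assumptions, and real services choose differently for hyphens, numbers and non-English scripts.

```python
import re

def tokenise(text):
    """Split text into word tokens, dropping punctuation (assumed rule)."""
    # Assumption: a token is a run of letters, optionally joined by apostrophes.
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)*", text)

sentence = "I quickly explained that many big jobs involve a few hazards."
tokens = tokenise(sentence)
print(tokens)
```

Every choice in that pattern is one of the assumptions the slide mentions: change it and the same text yields different tokens.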
Then, the most common words are removed 
as these are assumed to be unimportant. 
(Stopwords) 
“quickly”, “explained”, ”many”, “big”, “jobs”, 
“involve”, “few”, “hazards”
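The stopword step can be sketched as below. The `STOPWORDS` set is a tiny hypothetical list chosen for this example; real lists are far longer and differ between services.

```python
# Assumption: a tiny illustrative stopword list, not any service's real one.
STOPWORDS = {"i", "that", "a", "the", "and", "of"}

def remove_stopwords(tokens):
    """Drop tokens whose lowercase form is in the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

tokens = ["I", "quickly", "explained", "that", "many", "big",
          "jobs", "involve", "a", "few", "hazards"]
kept = remove_stopwords(tokens)
print(kept)
```

The cost of the assumption is visible here: a query made up only of words on the list would be left with no terms at all.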
Many fulltext search services will also perform 
language-specific Stemming, that is, to reduce 
each word to a root: 
“quick”, “explain”, ”many”, “big”, “job”, 
“involve”, “few”, “hazard” 
(Look up ‘porter’ and ‘snowball’ stemmers for more.)
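To make the idea concrete, here is a toy suffix-stripper. It is a stand-in written for this example, not the Porter or Snowball algorithm; the `toy_stem` name, the three suffixes and the minimum stem length are all assumptions.

```python
def toy_stem(word):
    """Strip one common English suffix; a stand-in, NOT the Porter algorithm."""
    for suffix in ("ly", "ed", "s"):
        # Only strip when at least three letters of stem would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["quickly", "explained", "many", "big", "jobs",
         "involve", "few", "hazards"]
stems = [toy_stem(w) for w in words]
print(stems)
```

Rules this crude misfire easily: `toy_stem("family")` gives "fami", which is why real stemmers carry many more language-specific rules.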
Finally, an inverted index is created* and 
arranged with the assumption that you want to 
find the most Relevant results to future 
queries. 
Search terms are passed through the same 
workflow. 
(*Contemporary search engines are more complex of course, but the basics 
are still there.)
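The indexing and querying steps above can be sketched as follows. The documents are hypothetical and already tokenised, stopped and stemmed; scoring by the number of matching terms is an assumed, deliberately simple notion of "relevance".

```python
from collections import defaultdict

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, terms in docs.items():
        for term in terms:
            index[term].add(doc_id)
    return index

def search(index, query_terms):
    """Rank documents by how many query terms they contain (assumed scoring)."""
    scores = defaultdict(int)
    for term in query_terms:
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id for repeatability.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical documents, already passed through the earlier steps.
docs = {
    "doc1": ["quick", "explain", "many", "big", "job", "involve", "few", "hazard"],
    "doc2": ["big", "bridge", "job"],
    "doc3": ["travel", "account", "europe"],
}
index = build_index(docs)
results = search(index, ["big", "hazard"])
print(results)
```

The index answers "which documents contain this term?" quickly, and the sort bakes in the assumption that the reader wants the best few matches first.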
Why on earth did I teach you about 
search? 
All services are made with compromises and 
assumptions, and it is good to examine these 
from time to time. 
The key assumption is that people will search 
for the most Relevant record that matches the 
text they entered.
The most Relevant record 
that matches the text they 
entered.
Why not: 
All the works that likely 
cover a specific topic I 
define or fit an arbitrary 
algorithm I can supply.
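A sketch of that alternative: return every record a researcher-supplied function accepts, with no ranking. The records, field names and the `nineteenth_century_travel` rule are all invented for illustration.

```python
def select_all(records, predicate):
    """Return every record the supplied function accepts, unranked."""
    return [r for r in records if predicate(r)]

# Hypothetical catalogue records; titles and years are invented.
records = [
    {"title": "A Tour through Italy", "year": 1813},
    {"title": "Travels in the Alps", "year": 1857},
    {"title": "A Treatise on Bridges", "year": 1843},
    {"title": "Journey to the Rhine", "year": 1795},
]

def nineteenth_century_travel(record):
    """One researcher's own definition of the topic (an assumption)."""
    words = record["title"].lower().split()
    is_travel = any(w in {"tour", "travels", "journey"} for w in words)
    return is_travel and 1801 <= record["year"] <= 1900

matches = select_all(records, nineteenth_century_travel)
print([r["title"] for r in matches])
```

The design point is that the definition of the topic lives in the researcher's function, not in the search service.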
“That’s great and all but 
it’s all subjective; you 
can’t teach a computer 
that…”
http://www.robertelliottsmith.com/?p=530
“16 Sad Women”
“I am 
interested in 
travel 
accounts in 
Europe during 
the 19th 
Century”
2013 Competition winners 
http://labs.bl.uk/Ideas+for+Labs 
Pieter Francois


Dan Norton - “Mixing the Library. Information 
Interaction and the DJ” 
Can a researcher record a session drawing 
from digital objects, in the same way a DJ does 
with music tracks?
The other unifying themes to the 
requests: 
“I need tools to help me interpret the vast 
amount of content you hold. You don’t provide 
any but make it impossible for others to do 
so.” 
“I want to work on broad sweeps of content, 
rather than book-by-book. It would take too 
much time to get each one.” 
“API? what’s that? I don’t care. Just give me the 
files.”
So, a challenge was born… 
If a researcher is given direct file access to a 
large amount of data, can it be useful? 
What internal conventions would need to be 
removed? What external conventions added? 
One way to try it out was to pretend to be a 
researcher and to ‘eat our own dogfood’.
How has the depiction of 
faces changed in books 
over the 19th Century? 
aka how well do modern photographic 
face detection routines work on 19th C 
illustrations?
Success? Not really. 
Many more female faces were found than 
male. 
This did not mean that there are more 
images of women in the books than men!
19C depictions of faces 
• Often drawn more symmetrically - male faces 
were more likely to be exaggerated. 
• Depiction is typically 'clean' and posed 
• Fashion: beards, spectacles and hats - different 
to the modern photographic training data
There was something else though... 
People on their way past would occasionally 
pause and look over my shoulder. 
Every day it dug up illustrations that 
surprised me and the team around me. 
So… I wondered if anyone else might be 
surprised and intrigued by them too? 
http://mechanicalcurator.tumblr.com/archive
How does machine learning work? 
First, turn the raw data into numbers, 
something the computer can deal with: 
eg when analysing text, assign a number to 
each word and form a ‘dictionary’
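A minimal sketch of the ‘dictionary’ idea, assuming ids are handed out in order of first appearance; the `build_dictionary` name is illustrative.

```python
def build_dictionary(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    dictionary = {}
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)
    return dictionary

tokens = ["big", "job", "big", "hazard", "job"]
dictionary = build_dictionary(tokens)
encoded = [dictionary[t] for t in tokens]
print(dictionary)
print(encoded)
```

The text is now a list of integers, which is the form most learning algorithms expect.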
Process the numeric data in an effort to 
better expose the “important” information 
- removing noise and tone variation from an 
image 
- turning a grid of pixels into independent 
trackable ‘points of interest’ 
- hue, saturation, levels 
- produce metrics
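As one small example of producing a metric from pixels, the sketch below averages saturation over an image using Python's standard `colorsys` module. The four-pixel ‘image’ and the `mean_saturation` name are assumptions made for illustration.

```python
import colorsys

def mean_saturation(pixels):
    """Average HSV saturation of (r, g, b) pixels with 0-255 channels."""
    total = 0.0
    for r, g, b in pixels:
        _h, s, _v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += s
    return total / len(pixels)

# Hypothetical 2x2 image flattened to a list: two greys, pure red, white.
pixels = [(128, 128, 128), (64, 64, 64), (255, 0, 0), (255, 255, 255)]
metric = mean_saturation(pixels)
print(metric)
```

A single number like this discards almost everything about the image, which is the point: it keeps only what is assumed to matter.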
Annotate - manually or automatically - what is 
useful and what is not in a portion of the data: 
- Characteristics: 
- Spam or not? 
- Face at x,y,w,h 
- Positive, neutral and negative sentiment 
- Scalar qualities
Pass most of the ‘known’ data through one of 
many machine learning algorithms, such as a 
Support Vector Machine (SVM) as 
implemented in libsvm. 
Which one depends entirely on what the 
computer will be able to do once trained.
Test your trained machine with half of the rest 
of the data to see how it does. 
eg if characterising email, does it correctly spot 
Spam?
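The train-then-test steps above can be sketched with a deliberately simple word-counting classifier. This is a stand-in, not an SVM and not libsvm; the messages, the labels and the scoring rule are all hypothetical.

```python
from collections import Counter

def train(examples):
    """Count how often each word appears in spam and in non-spam messages."""
    counts = {True: Counter(), False: Counter()}
    for text, is_spam in examples:
        counts[is_spam].update(text.lower().split())
    return counts

def predict(counts, text):
    """Label as spam when its words were seen more often in spam (assumed rule)."""
    words = text.lower().split()
    spam_score = sum(counts[True][w] for w in words)
    ham_score = sum(counts[False][w] for w in words)
    return spam_score > ham_score

def accuracy(counts, examples):
    """Fraction of held-back examples the trained counts label correctly."""
    correct = sum(predict(counts, text) == label for text, label in examples)
    return correct / len(examples)

# Hypothetical annotated messages: (text, is_spam).
labelled = [
    ("win a free prize now", True),
    ("free money click now", True),
    ("claim your free prize", True),
    ("meeting notes for monday", False),
    ("lunch on monday at noon", False),
    ("notes from the library meeting", False),
    ("click to win free money", True),
    ("library meeting moved to noon", False),
]
# Train on most of the annotated data, hold the rest back for testing.
training, held_back = labelled[:6], labelled[6:]
model = train(training)
score = accuracy(model, held_back)
print(score)
```

The held-back messages are never shown to `train`, so the score says something about unseen data rather than about memorisation. With a test set this small it says very little, though.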
Now, use the trained profile on real data! 
Sometimes, these profiles are shared, for 
example, Haar cascades trained on 
photographic datasets (face, body, etc) are 
freely available
Why the second lesson? 
Analysis starts with a bulk set of data, and a 
set of assumptions and ideas. 
The usefulness of a stemming/tokenising 
search service is unquestioned and Libraries 
support metadata-level search. 
No-one can support all assumptions and 
ideas!
Surprising? It was an experiment, 
after all...
Accessible? 
• In theory, the books were accessible. 
• In practice, it was a real challenge to find 
anything viewable. 
The chasm between digital and print: 
http://samplegenerator.cloudapp.net
As this is all in the public domain 
anyway... 
What’s the harm in making it a bit more 
accessible? 
The Mechanical Curator twitter account has 
only got a handful of people following it 
after all. Maybe there isn’t much appetite for 
it?
Impact? 
Hard to measure: 
- 20 million hits on average every month, 
over 200 million in 10 months*. 
- Over 100,000 tags added. 
- Hundreds of contributors. 
- Iterative crowdsourcing is ongoing. 
- Peter Balman’s aforementioned project 
* Are image view stats really a good measure?
Research and Technology 
• Mario Klingemann Pattern Recognition Software 
• Collaborative PhD ‘A History of the Printed Image 1750-1850: Applying 
Data Science Techniques to Printed Book Illustration’ 
• TSB Digital Innovation Contest New tech for tracking Public Domain in 
the Wild
Crowdsourcing & Apps 
• Metadata Games 
• Wikipedia Synoptic Index 
• BL Georeferencer - 3221 maps referenced in a few weeks!
Tagathon!
[Tangent warning] 
Scott Nicholson’s RECIPE
Creative Uses 
• David Normal installation at Burning Man Festival 
• “Moments” by Joe Bell 
• Colouring-in Pages for Children
Tutorials 
• Using Photoshop to Up-res images 
• Converting images to vector graphics
Collaborations with Colleagues 
• Inspired by Flickr, a Sound Archive series 
• Maps will be fed into the next phase of the Georeferencer
Education 
• Images included in Wikipedia Articles 
• University of Minnesota English Literature Course Exercise on Tagging 
• Art Therapy Courses
The ‘British Library Big Data 
Experiment’ 
http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html 
“What can a group of UCL Big Data CS 
students do when given access to cloud 
computing, all of the book data and a focus 
group of digital humanists?”
Next phase will work with an undergraduate 
team with experience at image analysis. 
We are hosting an event on the 18th of 
December 2014, on “Pattern Recognition”.
In summary, “Clarity” 
It is clear that we can: 
fail and fail quickly 
build experiments that 
won’t last 
open content 
build bridges
My contact details for later technical 
questions: 
ben.osteen@bl.uk 
@benosteen 
Links: 
http://labs.bl.uk 
http://mechanicalcurator.tumblr.com 
https://flickr.com/photos/britishlibrary 
https://github.com/bl-labs 
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
Image credits: 
Title image: from https://www.flickr.com/photos/britishlibrary/11223645575 
Title: "The Book of The Grand Junction Railway, being a history and description of the line from Birmingham to Liverpool and 
Manchester ... By T. Roscoe, assisted by the resident engineers of the line" 
Author: Roscoe, Thomas. 
Shelfmark: "British Library HMNTS 796.f.3." 
https://www.flickr.com/photos/britishlibrary/11209677645 - Foot Bridge, Dartmoor 
https://www.flickr.com/photos/britishlibrary/11208502325 - The Suspension Bridge 
https://www.flickr.com/photos/britishlibrary/11234482436 - Wensleydale & Swaledale 
Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as 
illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year 
1675 by “A Person of Q" by The British Library (More from this book here: https://www.flickr.com/search/?tags=sysnum000878624) 
Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson, Pamaratta [sic], 
Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the English colony, 
from its foundation https://www.flickr.com/photos/britishlibrary/11001417405 
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html

More Related Content

What's hot

Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
James Hendler
 
Building Social Software for the Anti-Social: Part I
Building Social Software for the Anti-Social: Part IBuilding Social Software for the Anti-Social: Part I
Building Social Software for the Anti-Social: Part I
codinghorror
 
Twitter for Researchers
Twitter for ResearchersTwitter for Researchers
Twitter for Researchers
Ned Potter
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overview
Chris Taggart
 
Book Takeout and User-Docused Delivery
Book Takeout and User-Docused DeliveryBook Takeout and User-Docused Delivery
Book Takeout and User-Docused Delivery
Ned Potter
 
I've Always Wanted To Data Model - Data Week 2013
I've Always Wanted To Data Model - Data Week 2013I've Always Wanted To Data Model - Data Week 2013
I've Always Wanted To Data Model - Data Week 2013
Ian Varley
 
Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...
Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...
Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...
codinghorror
 
Unstructure: Smashing the Boundaries of Data (SxSWi 2014)
Unstructure: Smashing the Boundaries of Data (SxSWi 2014)Unstructure: Smashing the Boundaries of Data (SxSWi 2014)
Unstructure: Smashing the Boundaries of Data (SxSWi 2014)
Ian Varley
 
Five Ways to Get Better Data From Our Users
Five Ways to Get Better Data From Our UsersFive Ways to Get Better Data From Our Users
Five Ways to Get Better Data From Our Users
Sajid Reshamwala
 
Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!
Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!
Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!
Cherie Dargan
 
The time for Libraries is NOW
The time for Libraries is NOWThe time for Libraries is NOW
The time for Libraries is NOW
Ned Potter
 
How to stop sucking and be awesome instead
How to stop sucking and be awesome insteadHow to stop sucking and be awesome instead
How to stop sucking and be awesome instead
codinghorror
 
Place graphs are the new social graphs
Place graphs are the new social graphsPlace graphs are the new social graphs
Place graphs are the new social graphs
Matt Biddulph
 
Summary Project Slideshow
Summary Project SlideshowSummary Project Slideshow
Summary Project Slideshow
Andy Black
 
e-Learning A to Z - Part 2 (N-Z)
e-Learning A to Z  - Part 2 (N-Z)e-Learning A to Z  - Part 2 (N-Z)
e-Learning A to Z - Part 2 (N-Z)
Barry Dahl
 
Fighting Spam at Flickr
Fighting Spam at FlickrFighting Spam at Flickr
Fighting Spam at Flickr
Mikhail Panchenko
 
MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...
MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...
MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...
museums and the web
 
Cyborgs
CyborgsCyborgs
Cyborgs
Aaron Muir
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
GIS Colorado
 
Web 2.0 and virtual worlds
Web 2.0 and virtual worldsWeb 2.0 and virtual worlds
Web 2.0 and virtual worlds
Bryan Alexander
 

What's hot (20)

Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
Knowledge Representation in the Age of Deep Learning, Watson, and the Semanti...
 
Building Social Software for the Anti-Social: Part I
Building Social Software for the Anti-Social: Part IBuilding Social Software for the Anti-Social: Part I
Building Social Software for the Anti-Social: Part I
 
Twitter for Researchers
Twitter for ResearchersTwitter for Researchers
Twitter for Researchers
 
Isle of Man open data overview
Isle of Man open data overviewIsle of Man open data overview
Isle of Man open data overview
 
Book Takeout and User-Docused Delivery
Book Takeout and User-Docused DeliveryBook Takeout and User-Docused Delivery
Book Takeout and User-Docused Delivery
 
I've Always Wanted To Data Model - Data Week 2013
I've Always Wanted To Data Model - Data Week 2013I've Always Wanted To Data Model - Data Week 2013
I've Always Wanted To Data Model - Data Week 2013
 
Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...
Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...
Oredev 2011: Building Social Software for the Anti-Social Part II, Electric B...
 
Unstructure: Smashing the Boundaries of Data (SxSWi 2014)
Unstructure: Smashing the Boundaries of Data (SxSWi 2014)Unstructure: Smashing the Boundaries of Data (SxSWi 2014)
Unstructure: Smashing the Boundaries of Data (SxSWi 2014)
 
Five Ways to Get Better Data From Our Users
Five Ways to Get Better Data From Our UsersFive Ways to Get Better Data From Our Users
Five Ways to Get Better Data From Our Users
 
Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!
Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!
Digital Distraction and Digital Overload: Maybe Nicholas Carr was Right!
 
The time for Libraries is NOW
The time for Libraries is NOWThe time for Libraries is NOW
The time for Libraries is NOW
 
How to stop sucking and be awesome instead
How to stop sucking and be awesome insteadHow to stop sucking and be awesome instead
How to stop sucking and be awesome instead
 
Place graphs are the new social graphs
Place graphs are the new social graphsPlace graphs are the new social graphs
Place graphs are the new social graphs
 
Summary Project Slideshow
Summary Project SlideshowSummary Project Slideshow
Summary Project Slideshow
 
e-Learning A to Z - Part 2 (N-Z)
e-Learning A to Z  - Part 2 (N-Z)e-Learning A to Z  - Part 2 (N-Z)
e-Learning A to Z - Part 2 (N-Z)
 
Fighting Spam at Flickr
Fighting Spam at FlickrFighting Spam at Flickr
Fighting Spam at Flickr
 
MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...
MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...
MW2011: Cope, A., Authority Records, Future Computers and Other Unfinished Hi...
 
Cyborgs
CyborgsCyborgs
Cyborgs
 
Introduction to the Semantic Web
Introduction to the Semantic WebIntroduction to the Semantic Web
Introduction to the Semantic Web
 
Web 2.0 and virtual worlds
Web 2.0 and virtual worldsWeb 2.0 and virtual worlds
Web 2.0 and virtual worlds
 

Viewers also liked

Ajax Introduction
Ajax IntroductionAjax Introduction
Ajax Introduction
Oliver Cai
 
How To Build Website
How To Build WebsiteHow To Build Website
How To Build Website
Oliver Cai
 
OpenIDEO i20 Presentation 1.13.11
OpenIDEO i20 Presentation 1.13.11 OpenIDEO i20 Presentation 1.13.11
OpenIDEO i20 Presentation 1.13.11
Nathan Waterhouse
 
大規模画像配信とPerl
大規模画像配信とPerl大規模画像配信とPerl
大規模画像配信とPerlMasahiro Nagano
 
Introduction to ajax
Introduction to ajaxIntroduction to ajax
Introduction to ajax
Raja V
 

Viewers also liked (7)

Ajax Introduction
Ajax IntroductionAjax Introduction
Ajax Introduction
 
Ajax Introduction
Ajax IntroductionAjax Introduction
Ajax Introduction
 
How To Build Website
How To Build WebsiteHow To Build Website
How To Build Website
 
OpenIDEO i20 Presentation 1.13.11
OpenIDEO i20 Presentation 1.13.11 OpenIDEO i20 Presentation 1.13.11
OpenIDEO i20 Presentation 1.13.11
 
大規模画像配信とPerl
大規模画像配信とPerl大規模画像配信とPerl
大規模画像配信とPerl
 
Introduction to ajax
Introduction to ajaxIntroduction to ajax
Introduction to ajax
 
Introduction to ajax
Introduction to ajaxIntroduction to ajax
Introduction to ajax
 

Similar to BL Labs 2014 Symposium: The Mechanical Curator

CityLIS talk, Feb 1st 2016
CityLIS talk, Feb 1st 2016CityLIS talk, Feb 1st 2016
CityLIS talk, Feb 1st 2016
benosteen
 
Data Visualisation Literacy - Learning to See
Data Visualisation Literacy - Learning to SeeData Visualisation Literacy - Learning to See
Data Visualisation Literacy - Learning to See
Andy Kirk
 
The surprising adventures of the mechanical curator
The surprising adventures of the mechanical curatorThe surprising adventures of the mechanical curator
The surprising adventures of the mechanical curator
benosteen
 
Software art and design: computational thinking through programming practice ...
Software art and design: computational thinking through programming practice ...Software art and design: computational thinking through programming practice ...
Software art and design: computational thinking through programming practice ...
Aarhus University
 
Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...
Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...
Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...
Monica Bulger
 
NDF,Te Papa, New Zealand 2015 - Keynote
NDF,Te Papa, New Zealand 2015 - KeynoteNDF,Te Papa, New Zealand 2015 - Keynote
NDF,Te Papa, New Zealand 2015 - Keynote
benosteen
 
Voices from the Field
Voices from the FieldVoices from the Field
Voices from the Field
Smiljana Antonijevic
 
Flow based-1994
Flow based-1994Flow based-1994
Flow based-1994
getdownload
 
2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End
gnat
 
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and PublishingAI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing
Erin Owens
 
The Object Orientation of Teams
The Object Orientation of TeamsThe Object Orientation of Teams
The Object Orientation of Teams
Lisa Welchman
 
being observable
being observablebeing observable
being observablejudell
 
Wassup with Web 2.0
Wassup with Web 2.0Wassup with Web 2.0
Wassup with Web 2.0
Wayne Hodgins
 
Data Visualisation - An Introduction
Data Visualisation - An IntroductionData Visualisation - An Introduction
Data Visualisation - An Introduction
b1e1n1
 
Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...
Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...
Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...
Joy Palmer
 
Malaysian Higher Ed-UN Learning
Malaysian Higher Ed-UN LearningMalaysian Higher Ed-UN Learning
Malaysian Higher Ed-UN Learning
Wayne Hodgins
 
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
Micah Altman
 
What is a Creative Date Scientist (and why the $@%! do we need one?)
What is a Creative Date Scientist (and why the $@%! do we need one?)What is a Creative Date Scientist (and why the $@%! do we need one?)
What is a Creative Date Scientist (and why the $@%! do we need one?)
Dave LaFontaine
 
104 Communicating our Collections Online
104 Communicating our Collections Online104 Communicating our Collections Online
104 Communicating our Collections Online
benosteen
 
User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...
User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...
User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...
springshare
 

Similar to BL Labs 2014 Symposium: The Mechanical Curator (20)

CityLIS talk, Feb 1st 2016
CityLIS talk, Feb 1st 2016CityLIS talk, Feb 1st 2016
CityLIS talk, Feb 1st 2016
 
Data Visualisation Literacy - Learning to See
Data Visualisation Literacy - Learning to SeeData Visualisation Literacy - Learning to See
Data Visualisation Literacy - Learning to See
 
The surprising adventures of the mechanical curator
The surprising adventures of the mechanical curatorThe surprising adventures of the mechanical curator
The surprising adventures of the mechanical curator
 
Software art and design: computational thinking through programming practice ...
Software art and design: computational thinking through programming practice ...Software art and design: computational thinking through programming practice ...
Software art and design: computational thinking through programming practice ...
 
Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...
Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...
Voices from the Field: Practices, Challenges & Directions in Digital Humaniti...
 
NDF,Te Papa, New Zealand 2015 - Keynote
NDF,Te Papa, New Zealand 2015 - KeynoteNDF,Te Papa, New Zealand 2015 - Keynote
NDF,Te Papa, New Zealand 2015 - Keynote
 
Voices from the Field
Voices from the FieldVoices from the Field
Voices from the Field
 
Flow based-1994
Flow based-1994Flow based-1994
Flow based-1994
 
2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End2013 LIANZA Keynote: River's End
2013 LIANZA Keynote: River's End
 
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and PublishingAI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing
AI and the Researcher: ChatGPT and DALL-E in Scholarly Writing and Publishing
 
The Object Orientation of Teams
The Object Orientation of TeamsThe Object Orientation of Teams
The Object Orientation of Teams
 
being observable
being observablebeing observable
being observable
 
Wassup with Web 2.0
Wassup with Web 2.0Wassup with Web 2.0
Wassup with Web 2.0
 
Data Visualisation - An Introduction
Data Visualisation - An IntroductionData Visualisation - An Introduction
Data Visualisation - An Introduction
 
Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...
Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...
Beyond Usage Stats (Or, demonstrating value & marketing services when you hav...
 
Malaysian Higher Ed-UN Learning
Malaysian Higher Ed-UN LearningMalaysian Higher Ed-UN Learning
Malaysian Higher Ed-UN Learning
 
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
MIT Program on Information Science Talk -- Julia Flanders on Jobs, Roles, Ski...
 
What is a Creative Date Scientist (and why the $@%! do we need one?)
What is a Creative Date Scientist (and why the $@%! do we need one?)What is a Creative Date Scientist (and why the $@%! do we need one?)
What is a Creative Date Scientist (and why the $@%! do we need one?)
 
104 Communicating our Collections Online
104 Communicating our Collections Online104 Communicating our Collections Online
104 Communicating our Collections Online
 
User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...
User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...
User Experience Webinar 1 - Eye-popping Content: Creating a User-friendly Fra...
 

More from benosteen

Arches Getty Brownbag Talk
Arches Getty Brownbag TalkArches Getty Brownbag Talk
Arches Getty Brownbag Talk
benosteen
 
Bl labs ucl-services
Bl labs ucl-servicesBl labs ucl-services
Bl labs ucl-services
benosteen
 
Bl labs what is british library labs
Bl labs   what is british library labsBl labs   what is british library labs
Bl labs what is british library labs
benosteen
 
British Library Labs - Overview Talk 2017
British Library Labs - Overview Talk 2017British Library Labs - Overview Talk 2017
British Library Labs - Overview Talk 2017
benosteen
 
Uses of Library Collections
Uses of Library CollectionsUses of Library Collections
Uses of Library Collections
benosteen
 
British library labs - What? Why?
British library labs - What? Why?British library labs - What? Why?
British library labs - What? Why?
benosteen
 
UKSG 2015 Mechanical curator and British Library labs
UKSG 2015  Mechanical curator and British Library labsUKSG 2015  Mechanical curator and British Library labs
UKSG 2015 Mechanical curator and British Library labs
benosteen
 
Lightning Talk - LDCX 2015 Stanford
Lightning Talk - LDCX 2015 StanfordLightning Talk - LDCX 2015 Stanford
Lightning Talk - LDCX 2015 Stanford
benosteen
 
Sharing and Serendipity
Sharing and SerendipitySharing and Serendipity
Sharing and Serendipity
benosteen
 
Mechanical Curator (@ CREATE PUBLIC DOMAIN WORKSHOP FOR CREATIVE BUSINESSES)
Mechanical Curator (@ CREATE PUBLIC DOMAIN WORKSHOP FOR CREATIVE BUSINESSES)Mechanical Curator (@ CREATE PUBLIC DOMAIN WORKSHOP FOR CREATIVE BUSINESSES)
Mechanical Curator (@ CREATE PUBLIC DOMAIN WORKSHOP FOR CREATIVE BUSINESSES)
benosteen
 
Mechanical curator - Technical notes
Mechanical curator - Technical notesMechanical curator - Technical notes
Mechanical curator - Technical notesbenosteen
 
Apache pig as a researcher’s stepping stone
Apache pig as a researcher’s stepping stoneApache pig as a researcher’s stepping stone
Apache pig as a researcher’s stepping stonebenosteen
 
New methods of access and discoverability bring new affordances for digital r...
New methods of access and discoverability bring new affordances for digital r...New methods of access and discoverability bring new affordances for digital r...
New methods of access and discoverability bring new affordances for digital r...benosteen
 
Visualising Knowledge: Why? What? How?
Visualising Knowledge: Why? What? How?Visualising Knowledge: Why? What? How?
Visualising Knowledge: Why? What? How?benosteen
 
Mashspa
MashspaMashspa
Mashspa
benosteen
 
Postscript, books and binding
Postscript, books and bindingPostscript, books and binding
Postscript, books and binding
benosteen
 
Open Bibliography, Citations and Scholarship
Open Bibliography, Citations and ScholarshipOpen Bibliography, Citations and Scholarship
Open Bibliography, Citations and Scholarship
benosteen
 
Text-mining and Automation
Text-mining and AutomationText-mining and Automation
Text-mining and Automation
benosteen
 
Bodleian Library's DAMS system
Bodleian Library's DAMS systemBodleian Library's DAMS system
Bodleian Library's DAMS system
benosteen
 
Choices, modelling and Frankenstein Ontologies
Choices, modelling and Frankenstein OntologiesChoices, modelling and Frankenstein Ontologies
Choices, modelling and Frankenstein Ontologiesbenosteen
 

BL Labs 2014 Symposium: The Mechanical Curator

  • 1. Building Bridges (and rapid depreciation)
  • 2.
  • 3.
  • 4. David Foster Wallace, on Ambition: “You know, the whole thing about perfectionism. The perfectionism is very dangerous, because of course if your fidelity to perfectionism is too high, you never do anything. Because doing anything results in— It’s actually kind of tragic because it means you sacrifice how gorgeous and perfect it is in your head for what it really is.” - As told to Leonard Lopate on WNYC on March 4, 1996. (emphasis my own) http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
  • 5. The unifying theme to (pretty much) all the requests:
  • 6. The unifying theme to (pretty much) all the requests: Give me EVERYTHING!
  • 7. The unifying theme to (pretty much) all the requests: Give me EVERYTHING! (that might be important to my work)
  • 9. Why? “Can’t they just find the things they want through the catalogue?”
  • 10. 1. If they knew which bits of data were necessary, they would already know the answers.
  • 11.
  • 12. “I am interested in travel accounts in Europe during the 19th Century”
  • 13. 2. If a conventional search interface worked, they wouldn’t be asking.
  • 14. How does conventional search work anyway? Under what assumptions? Starts with the Text: “I quickly explained that many big jobs involve a few hazards.”
  • 15. How does conventional search work anyway? Under what assumptions? Then it is Tokenised (with some assumptions on how this is possible): “I”, “quickly”, “explained”, “that”, “many”, “big”, “jobs”, “involve”, “a”, “few”, “hazards”
  • 16. How does conventional search work anyway? Under what assumptions? Then, the most common words are removed as these are assumed to be unimportant. (Stopwords) “quickly”, “explained”, “many”, “big”, “jobs”, “involve”, “few”, “hazards”
  • 17. How does conventional search work anyway? Under what assumptions? Many fulltext search services will also perform language-specific Stemming, that is, reduce each word to a root: “quick”, “explain”, “many”, “big”, “job”, “involve”, “few”, “hazard” (Look up ‘porter’ and ‘snowball’ stemmers for more.)
  • 18. How does conventional search work anyway? Under what assumptions? Finally, an inverted index is created* and arranged with the assumption that you want to find the most Relevant results to future queries. Search terms are passed through the same workflow. (*Contemporary search engines are more complex of course, but the basics are still there.)
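
The search workflow in slides 14 to 18 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the stopword list is a three-word toy, `toy_stem` is a made-up suffix stripper standing in for a real Porter or Snowball stemmer, and ranking is a plain count of matching terms rather than anything a production engine would use.

```python
import re
from collections import defaultdict

# Assumption: a tiny stopword list, just enough for the example sentence.
STOPWORDS = {"i", "that", "a"}


def tokenise(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def toy_stem(word):
    """Hypothetical stand-in for a real stemmer: strip one common suffix."""
    for suffix in ("ly", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def analyse(text):
    """Tokenise, drop stopwords, then stem: the workflow from the slides."""
    return [toy_stem(t) for t in tokenise(text) if t not in STOPWORDS]


def build_index(docs):
    """Build an inverted index mapping each stemmed term to document ids."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Pass the query through the same workflow; rank by matching terms."""
    scores = defaultdict(int)
    for term in set(analyse(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


if __name__ == "__main__":
    sentence = "I quickly explained that many big jobs involve a few hazards."
    print(analyse(sentence))
    # ['quick', 'explain', 'many', 'big', 'job', 'involve', 'few', 'hazard']
    index = build_index({1: sentence, 2: "A quick job."})
    print(search(index, "quick jobs"))  # [1, 2]
```

Note how every assumption is baked in before a single query arrives: which characters make a token, which words are unimportant, and what counts as the same word.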
  • 19. Why on earth did I teach you about search? All services are made with compromises and assumptions, and it is good to examine these from time to time. The key assumption is that people will search for the most Relevant record that matches the text they entered.
  • 20. The most Relevant record that matches the text they entered.
  • 21. Why not: All the works that likely cover a specific topic I define or fit an arbitrary algorithm I can supply.
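
As a contrast to relevance ranking, the request on slide 21 might look something like the sketch below: the researcher supplies the rule, and the service returns every record that fits it, unranked. The records, field names and the `select_all` helper are all invented for illustration and do not describe any real catalogue data or British Library interface.

```python
# Hypothetical catalogue records, invented purely for this illustration.
RECORDS = [
    {"title": "A Tour through Italy", "year": 1823, "subjects": ["travel", "italy"]},
    {"title": "Rambles in Norway", "year": 1871, "subjects": ["travel", "norway"]},
    {"title": "A Treatise on Bridges", "year": 1843, "subjects": ["engineering"]},
    {"title": "Letters from Spain", "year": 1792, "subjects": ["travel", "spain"]},
]


def select_all(records, predicate):
    """Return every record that satisfies a researcher-supplied rule."""
    return [record for record in records if predicate(record)]


def nineteenth_century_travel(record):
    """One researcher's own definition of the topic; others will differ."""
    return 1801 <= record["year"] <= 1900 and "travel" in record["subjects"]


if __name__ == "__main__":
    for record in select_all(RECORDS, nineteenth_century_travel):
        print(record["year"], record["title"])
```

The point of the design is that the rule lives with the researcher, not the service, so a different definition of "travel account" needs no change on the library's side.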
  • 22. “That’s great and all but it’s all subjective; you can’t teach a computer that…”
  • 25. “I am interested in travel accounts in Europe during the 19th Century”
  • 26. 2013 Competition winners http://labs.bl.uk/Ideas+for+Labs Pieter Francois
  • 27.
  • 28.
  • 29.
  • 30. 2013 Competition winners http://labs.bl.uk/Ideas+for+Labs Dan Norton - “Mixing the Library. Information Interaction and the DJ” Can a researcher record a session drawing from digital objects, in the same way a DJ does with music tracks?
  • 31. The other unifying themes to the requests: “I need tools to help me interpret the vast amount of content you hold. You don’t provide any but make it impossible for others to do so.” “I want to work on broad sweeps of content, rather than book-by-book. It would take too much time to get each one.” “API? what’s that? I don’t care. Just give me the files.”
  • 32. So, a challenge was born… If a researcher is given direct file access to a large amount of data, can it be useful? What internal conventions would need to be removed? What external conventions added? One way to try it out was to pretend to be a researcher and to ‘eat our own dogfood’.
  • 33. How has the depiction of faces changed in books over the 19th Century? aka how well do modern photographic face detection routines work on 19th C illustrations?
  • 34.
  • 35. Success? Not really. Many more female faces were found than male. This did not mean that there are more images of women in the books than men!
  • 36. 19C depictions of faces • Often drawn more symmetrically - male faces were more likely to be exaggerated. • Depiction is typically 'clean' and posed • Fashion: beards, spectacles and hats - different to the modern photographic training data
  • 37. There was something else though... People on their way past would occasionally pause and look over my shoulder. Every day it dug up illustrations that surprised me and the team around me. So… I wondered if anyone else might be surprised and intrigued by them too? http://mechanicalcurator.tumblr.com/archive
  • 38.
  • 39.
  • 40.
  • 41. How does machine learning work? First, turn the raw data into numbers, something the computer can deal with: eg when analysing text, assign a number to each word and form a ‘dictionary’
  • 42. How does machine learning work? Process the numeric data in an effort to better expose the “important” information - removing noise and tone variation from an image - turning a grid of pixels into independent trackable ‘points of interest’ - hue, saturation, levels - produce metrics
  • 43. How does machine learning work? Annotate - manually or automatically - what is useful and what is not in a portion of the data: - Characteristics: - Spam or not? - Face at x,y,w,h - Positive, neutral and negative sentiment - Scalar qualities
  • 44. How does machine learning work? Pass most of the ‘known’ data through one of many machine learning algorithms, such as a Support Vector Machine (SVM) as implemented in libsvm. Which one depends entirely on what the computer will be able to do once trained.
  • 45. How does machine learning work? Test your trained machine with half of the rest of the data to see how it does. eg if characterising email, does it correctly spot Spam?
  • 46. How does machine learning work? Now, use the trained profile on real data! Sometimes, these profiles are shared, for example, Haar cascades trained on photographic datasets (face, body, etc) are freely available
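
The steps in slides 41 to 46 can be walked through with a small, dependency-free sketch. Several things here are assumptions rather than anything from the talk: the six example messages are made up, and a simple perceptron is swapped in for the SVM named on slide 44 so that the example runs with the standard library alone.

```python
def build_dictionary(texts):
    """Assign a number to each distinct word, in order of first appearance."""
    vocab = {}
    for text in texts:
        for word in text.lower().split():
            vocab.setdefault(word, len(vocab))
    return vocab


def vectorise(text, vocab):
    """Turn text into word counts; words outside the dictionary are ignored."""
    vec = [0] * len(vocab)
    for word in text.lower().split():
        if word in vocab:
            vec[vocab[word]] += 1
    return vec


def train_perceptron(samples, epochs=10):
    """Train a perceptron (a stand-in for an SVM) on (vector, is_spam) pairs."""
    weights = [0.0] * len(samples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for vec, is_spam in samples:
            target = 1 if is_spam else -1
            score = sum(w * x for w, x in zip(weights, vec)) + bias
            if target * score <= 0:  # misclassified: nudge towards the target
                weights = [w + target * x for w, x in zip(weights, vec)]
                bias += target
    return weights, bias


def predict(model, vec):
    """Return True if the trained model scores the vector as spam."""
    weights, bias = model
    return sum(w * x for w, x in zip(weights, vec)) + bias > 0


# Made-up annotated data: True means spam. Most is used for training...
TRAIN = [
    ("win cash now", True),
    ("cheap cash prize", True),
    ("meeting agenda today", False),
    ("project meeting notes", False),
]
# ...and some is held back to test the trained machine.
TEST = [("win prize now", True), ("agenda notes today", False)]

vocab = build_dictionary(text for text, _ in TRAIN)
model = train_perceptron([(vectorise(t, vocab), label) for t, label in TRAIN])
correct = sum(predict(model, vectorise(t, vocab)) == label for t, label in TEST)
accuracy = correct / len(TEST)

if __name__ == "__main__":
    print(f"held-out accuracy: {accuracy:.2f}")
```

With data this small and this tidy the held-out test says very little; the face-detection experiment above shows what happens when the training data (modern photographs) and the real data (19th C illustrations) do not match.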
  • 47. Why the second lesson? Analysis starts with a bulk set of data, and a set of assumptions and ideas. The usefulness of a stemming/tokenising search service is unquestioned and Libraries support metadata-level search. No-one can support all assumptions and ideas!
  • 48.
  • 49. Surprising? It was an experiment, after all...
  • 50. Accessible? • In theory, the books were accessible. • In practice, it was a real challenge to find anything viewable. The chasm between digital and print: http://samplegenerator.cloudapp.net
  • 51. As this is all in the public domain anyway... What’s the harm in making it a bit more accessible? The Mechanical Curator twitter account has only got a handful of people following it after all. Maybe there isn’t much appetite for it?
  • 52.
  • 53. Impact? Hard to measure: - 20 million hits on average every month, over 200 million in 10 months*. - Over 100,000 tags added. - Hundreds of contributors. - Iterative crowdsourcing is ongoing. - Peter Balman’s aforementioned project * Are image view stats really a good measure?
  • 54.
  • 55.
  • 56.
  • 57. Research and Technology • Mario Klingemann Pattern Recognition Software • Collaborative PhD ‘A History of the Printed Image 1750-1850: Applying Data Science Techniques to Printed Book Illustration’ • TSB Digital Innovation Contest New tech for tracking Public Domain in the Wild
  • 58. Crowdsourcing & Apps • Metadata Games • Wikipedia Synoptic Index • BL Georeferencer - 3221 maps referenced in a few weeks!
  • 60. [Tangent warning] Scott Nicholson’s RECIPE
  • 61. Creative Uses • David Normal installation at Burning Man Festival • “Moments” by Joe Bell • Colouring-in Pages for Children
  • 62. Tutorials • Using Photoshop to Up-res images • Converting images to vector graphics
  • 63. Collaborations with Colleagues • Inspired by Flickr, a Sound Archive series • Maps will be fed into the next phase of the Georeferencer
  • 64. Education • Images included in Wikipedia Articles • University of Minnesota English Literature Course Exercise on Tagging • Art Therapy Courses
  • 65.
  • 66. The ‘British Library Big Data Experiment’ http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html “What can a group of UCL Big Data CS students do when given access to cloud computing, all of the book data and a focus group of digital humanists?”
  • 67. The ‘British Library Big Data Experiment’ Next phase will work with an undergraduate team with experience at image analysis. We are hosting an event on the 18th of December 2014, on “Pattern Recognition”.
  • 68.
  • 69. In summary, “Clarity” It is clear that we can: fail and fail quickly build experiments that won’t last open content build bridges
  • 70. My contact details for later technical questions: ben.osteen@bl.uk @benosteen Links: http://labs.bl.uk http://mechanicalcurator.tumblr.com https://flickr.com/photos/britishlibrary https://github.com/bl-labs http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
  • 71. Image credits: Title image: from https://www.flickr.com/photos/britishlibrary/11223645575 Title: "The Book of The Grand Junction Railway, being a history and description of the line from Birmingham to Liverpool and Manchester ... By T. Roscoe, assisted by the resident engineers of the line" Author: Roscoe, Thomas. Shelfmark: "British Library HMNTS 796.f.3." https://www.flickr.com/photos/britishlibrary/11209677645 - Foot Bridge, Dartmoor https://www.flickr.com/photos/britishlibrary/11208502325 - The Suspension Bridge https://www.flickr.com/photos/britishlibrary/11234482436 - Wensleydale & Swaledale Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year 1675 by “A Person of Q" by The British Library (More from this book here: https://www.flickr.com/search/?tags=sysnum000878624) Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson, Pamaratta [sic], Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the English colony, from its foundation https://www.flickr.com/photos/britishlibrary/11001417405 http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html