Building Bridges 
(and rapid depreciation)
David Foster Wallace, on Ambition: 
“You know, the whole thing about perfectionism. The 
perfectionism is very dangerous, because of course if your 
fidelity to perfectionism is too high, you never do 
anything. 
Because doing anything results in— It’s actually kind of 
tragic because it means you sacrifice how gorgeous and 
perfect it is in your head for what it really is.” 
- As told to Leonard Lopate on WNYC on March 4, 1996. 
(emphasis my own) 
http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
The unifying theme to (pretty much) 
all the requests: 
Give me 
EVERYTHING! 
(that might be important to my work)
Fetch!
Why? 
“Can’t they just find the things they want 
through the catalogue?”
1. If they knew which bits 
of data were necessary, 
they would already know 
the answers.
“I am 
interested in 
travel 
accounts in 
Europe during 
the 19th 
Century”
2. If a conventional 
search interface worked, 
they wouldn’t be asking.
How does conventional search work 
anyway? Under what assumptions? 
Starts with the Text: 
“I quickly explained that many big jobs involve 
a few hazards.”
Then it is Tokenised (with some assumptions 
on how this is possible): 
“I”, “quickly”, “explained”, “that”, “many”, “big”, 
“jobs”, “involve”, “a”, “few”, “hazards”
Then, the most common words are removed 
as these are assumed to be unimportant. 
(Stopwords) 
“quickly”, “explained”, “many”, “big”, “jobs”, 
“involve”, “few”, “hazards”
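The tokenising and stopword steps can be sketched in a few lines of Python. This is only an illustration: the regular expression and the three-word stopword list are assumptions chosen to reproduce the example sentence, and real search services use much larger, language-specific lists.

```python
import re

# Hypothetical, deliberately tiny stopword list chosen to match the example.
STOPWORDS = {"i", "that", "a"}

def tokenise(text):
    """Split text into tokens, assuming a word is a run of letters or apostrophes."""
    return re.findall(r"[A-Za-z']+", text)

def remove_stopwords(tokens):
    """Drop every token whose lowercase form is in the stopword list."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

sentence = "I quickly explained that many big jobs involve a few hazards."
tokens = tokenise(sentence)
kept = remove_stopwords(tokens)
print(tokens)
print(kept)
```

Note the assumption baked into `tokenise`: anything that is not a letter or apostrophe is treated as a word boundary, which already fails for hyphenated words and for many non-English scripts.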
Many full-text search services will also perform 
language-specific Stemming, that is, reducing 
each word to a root: 
“quick”, “explain”, “many”, “big”, “job”, 
“involve”, “few”, “hazard” 
(Look up the ‘Porter’ and ‘Snowball’ stemmers for more.)
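To make the idea concrete, here is a toy suffix-stripping stemmer in Python. It is not the Porter or Snowball algorithm; the three suffix rules and the minimum stem length are assumptions picked so that the example words reduce as shown on the slide. Real stemmers apply many more rules and may produce stems that are not dictionary words.

```python
def naive_stem(word):
    """Strip one common English suffix; a toy rule set, not Porter or Snowball."""
    word = word.lower()
    for suffix in ("ly", "ed", "s"):
        # Only strip when at least three characters of stem would remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

words = ["quickly", "explained", "many", "big", "jobs", "involve", "few", "hazards"]
stems = [naive_stem(w) for w in words]
print(stems)
```

Even this toy version shows the language-specific assumption: the rules only make sense for English, and they will happily mangle words such as "bus" or "red" if the length guard is relaxed.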
Finally, an inverted index is created* and 
arranged on the assumption that you want to 
find the most Relevant results for future 
queries. 
Search terms are passed through the same 
workflow. 
(*Contemporary search engines are more complex, of course, but the basics 
are still there.)
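A minimal sketch of an inverted index in Python, assuming a tiny made-up document collection. The names `analyse`, `build_index` and `search` are illustrative, stemming is left out for brevity, and the ranking (count of distinct matching terms) is a stand-in for the weighting schemes real engines use.

```python
import re
from collections import defaultdict

# Hypothetical stopword list for illustration only.
STOPWORDS = {"i", "that", "a", "the", "of", "in", "on"}

def analyse(text):
    """Tokenise, lowercase and drop stopwords; queries go through this too."""
    return [t for t in re.findall(r"[a-z']+", text.lower()) if t not in STOPWORDS]

def build_index(docs):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index

def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    for term in set(analyse(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    # Highest score first; ties broken by document id.
    return sorted(scores, key=lambda d: (-scores[d], d))

# Made-up documents, not real catalogue records.
docs = {
    1: "I quickly explained that many big jobs involve a few hazards.",
    2: "Travel accounts of big cities in Europe.",
    3: "A few notes on travel hazards.",
}
index = build_index(docs)
results = search(index, "travel hazards")
print(results)
```

The design choice to notice is the sort at the end of `search`: the whole structure is arranged to answer "which records are most relevant to these words?", not "give me everything that fits my rule".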
Why on earth did I teach you about 
search? 
All services are made with compromises and 
assumptions, and it is good to examine these 
from time to time. 
The key assumption is that people will search 
for the most Relevant record that matches the 
text they entered.
The most Relevant record 
that matches the text they 
entered.
Why not: 
All the works that likely 
cover a specific topic I 
define or fit an arbitrary 
algorithm I can supply.
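As a rough sketch of what that could look like, the Python below returns every record accepted by a rule the researcher supplies, with no relevance ranking and no cut-off. The records, the `TRAVEL_WORDS` vocabulary and the function names are all invented for illustration; they are not real catalogue data or an existing service.

```python
def select_all(records, predicate):
    """Return every record the supplied predicate accepts, in original order."""
    return [r for r in records if predicate(r)]

# Made-up records for illustration only; not real catalogue entries.
records = [
    {"id": "a", "year": 1843, "text": "a journey through france and italy"},
    {"id": "b", "year": 1795, "text": "travels in switzerland"},
    {"id": "c", "year": 1871, "text": "a treatise on steam engines"},
    {"id": "d", "year": 1889, "text": "rambles and travels in spain"},
]

# Hypothetical vocabulary a researcher might associate with travel accounts.
TRAVEL_WORDS = {"journey", "travels", "tour", "rambles"}

def nineteenth_century_travel(record):
    """A researcher-defined rule: published 1801-1900 and uses travel vocabulary."""
    in_period = 1801 <= record["year"] <= 1900
    words = set(record["text"].split())
    return in_period and bool(words & TRAVEL_WORDS)

matches = select_all(records, nineteenth_century_travel)
print([r["id"] for r in matches])
```

The rule is crude on purpose. The point is that the researcher, not the search service, decides what counts as a match, and gets the full set back to refine.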
“That’s great and all but 
it’s all subjective; you 
can’t teach a computer 
that…”
http://www.robertelliottsmith.com/?p=530
“16 Sad Women”
“I am 
interested in 
travel 
accounts in 
Europe during 
the 19th 
Century”
2013 Competition winners 
http://labs.bl.uk/Ideas+for+Labs 
Pieter Francois


2013 Competition winners 
http://labs.bl.uk/Ideas+for+Labs 
Dan Norton - “Mixing the Library. Information 
Interaction and the DJ” 
Can a researcher record a session drawing 
from digital objects, in the same way a DJ does 
with music tracks?
The other unifying themes to the 
requests: 
“I need tools to help me interpret the vast 
amount of content you hold. You don’t provide 
any but make it impossible for others to do 
so.” 
“I want to work on broad sweeps of content, 
rather than book-by-book. It would take too 
much time to get each one.” 
“API? What’s that? I don’t care. Just give me the 
files.”
So, a challenge was born… 
If a researcher is given direct file access to a 
large amount of data, can it be useful? 
What internal conventions would need to be 
removed? What external conventions added? 
One way to try it out was to pretend to be a 
researcher and to ‘eat our own dogfood’.
How has the depiction of 
faces changed in books 
over the 19th Century? 
aka how well do modern photographic 
face detection routines work on 19th C 
illustrations?
Success? Not really. 
Many more female faces were found than 
male. 
This did not mean that there are more 
images of women in the books than men!
19C depictions of faces 
• Often drawn more symmetrically - male faces 
were more likely to be exaggerated. 
• Depiction is typically 'clean' and posed 
• Fashion: beards, spectacles and hats - different 
to the modern photographic training data
There was something else though... 
People on their way past would occasionally 
pause and look over my shoulder. 
Every day it dug up illustrations that 
surprised me and the team around me. 
So… I wondered if anyone else might be 
surprised and intrigued by them too? 
http://mechanicalcurator.tumblr.com/archive
How does machine learning work? 
First, turn the raw data into numbers, 
something the computer can deal with: 
eg when analysing text, assign a number to 
each word and form a ‘dictionary’
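A minimal Python sketch of that first step, assuming whitespace-separated words and ids handed out in order of first appearance (both are choices made here for illustration):

```python
def build_dictionary(tokens):
    """Assign each distinct token an integer id in order of first appearance."""
    dictionary = {}
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)
    return dictionary

def encode(tokens, dictionary):
    """Replace each token with its id from the dictionary."""
    return [dictionary[t] for t in tokens]

tokens = "the cat sat on the mat".split()
dictionary = build_dictionary(tokens)
ids = encode(tokens, dictionary)
print(dictionary)
print(ids)
```

The repeated word "the" maps to the same number both times, which is what lets later steps count and compare words numerically.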
Process the numeric data in an effort to 
better expose the “important” information 
- removing noise and tone variation from an 
image 
- turning a grid of pixels into independent 
trackable ‘points of interest’ 
- hue, saturation, levels 
- produce metrics
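As one small example of producing metrics, the Python below turns a grid of RGB pixels into mean saturation and mean brightness using the standard library's `colorsys` module. The 2x2 "image" and the function name `image_metrics` are assumptions for illustration; real pipelines would work on full images with an imaging library.

```python
import colorsys

def image_metrics(pixels):
    """Return mean saturation and mean brightness for a grid of 8-bit RGB pixels."""
    flat = [p for row in pixels for p in row]
    # colorsys expects channel values in the range 0.0 to 1.0.
    hsv = [colorsys.rgb_to_hsv(r / 255, g / 255, b / 255) for r, g, b in flat]
    n = len(hsv)
    return {
        "mean_saturation": sum(s for _, s, _ in hsv) / n,
        "mean_value": sum(v for _, _, v in hsv) / n,
    }

# A made-up 2x2 image: red, white, black and mid-grey pixels.
pixels = [
    [(255, 0, 0), (255, 255, 255)],
    [(0, 0, 0), (128, 128, 128)],
]
metrics = image_metrics(pixels)
print(metrics)
```

Only the red pixel has any saturation, so a mostly monochrome page, such as a 19th Century engraving, would score close to zero on this metric.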
Annotate - manually or automatically - what is 
useful and what is not in a portion of the data: 
- Characteristics: 
- Spam or not? 
- Face at x,y,w,h 
- Positive, neutral and negative sentiment 
- Scalar qualities
Pass most of the ‘known’ data through one of 
many machine learning algorithms, such as a 
Support Vector Machine (SVM) as 
implemented in LIBSVM. 
Which one depends entirely on what the 
computer will be able to do once trained.
Test your trained machine with half of the rest 
of the data to see how it does. 
eg if characterising email, does it correctly spot 
Spam?
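The train-then-test loop can be sketched in plain Python. To keep the example dependency-free it uses a perceptron, a much simpler linear classifier, as a stand-in for an SVM. The data is invented: each email is reduced to two made-up counts (occurrences of "free" and of "meeting"), with +1 meaning spam and -1 meaning not spam.

```python
def predict(weights, bias, x):
    """Classify x as +1 (spam) or -1 (not spam) with a linear rule."""
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score > 0 else -1

def train_perceptron(examples, epochs=10):
    """Perceptron learning rule: nudge the weights after every mistake."""
    weights, bias = [0] * len(examples[0][0]), 0
    for _ in range(epochs):
        for x, y in examples:
            if predict(weights, bias, x) != y:
                weights = [w + y * xi for w, xi in zip(weights, x)]
                bias += y
    return weights, bias

def accuracy(weights, bias, examples):
    """Fraction of examples the trained rule labels correctly."""
    correct = sum(1 for x, y in examples if predict(weights, bias, x) == y)
    return correct / len(examples)

# Invented labelled data: (count of "free", count of "meeting"), label.
labelled = [
    ((3, 0), 1), ((0, 3), -1), ((4, 1), 1), ((1, 4), -1),
    ((5, 0), 1), ((0, 5), -1), ((3, 1), 1), ((1, 3), -1),
    ((4, 0), 1), ((0, 4), -1), ((5, 1), 1), ((1, 5), -1),
]

train = labelled[:8]   # most of the known data
rest = labelled[8:]
test = rest[:2]        # half of the rest, used to check the trained rule
held_back = rest[2:]   # the other half, kept aside for a final check

weights, bias = train_perceptron(train)
test_accuracy = accuracy(weights, bias, test)
print(weights, bias, test_accuracy)
```

The test examples are never shown to the training step, which is the whole point: a score measured on data the machine has already seen says little about how it will do on real data.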
Now, use the trained profile on real data! 
Sometimes, these profiles are shared, for 
example, Haar cascades trained on 
photographic datasets (face, body, etc) are 
freely available
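For example, the OpenCV library ships a frontal-face Haar cascade that can be applied in a few lines of Python. This is a sketch under assumptions: it needs the `opencv-python` package, the choice of cascade file and the `scaleFactor`, `minNeighbors` and `min_size` values are illustrative, and the talk does not say which library or cascade was actually used.

```python
def detect_faces(image_path, min_size=30):
    """Return (x, y, w, h) boxes found by OpenCV's bundled frontal-face cascade."""
    import cv2  # requires the opencv-python package; imported lazily

    image = cv2.imread(image_path)
    if image is None:
        # cv2.imread returns None rather than raising when a file cannot be read.
        raise FileNotFoundError(image_path)
    grey = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # A shared, pre-trained profile: trained on photographs, not engravings.
    cascade = cv2.CascadeClassifier(
        cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
    )
    boxes = cascade.detectMultiScale(
        grey, scaleFactor=1.1, minNeighbors=5, minSize=(min_size, min_size)
    )
    return [tuple(int(v) for v in box) for box in boxes]
```

A call such as `detect_faces("page_scan.jpg")`, where the file name is hypothetical, returns a list of boxes in the same x, y, w, h form used for the annotations above.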
Why the second lesson? 
Analysis starts with a bulk set of data, and a 
set of assumptions and ideas. 
The usefulness of a stemming/tokenising 
search service is unquestioned and Libraries 
support metadata-level search. 
No-one can support all assumptions and 
ideas!
Surprising? It was an experiment, 
after all...
Accessible? 
• In theory, the books were accessible. 
• In practice, it was a real challenge to find 
anything viewable. 
The chasm between digital and print: 
http://samplegenerator.cloudapp.net
As this is all in the public domain 
anyway... 
What’s the harm in making it a bit more 
accessible? 
The Mechanical Curator twitter account has 
only got a handful of people following it 
after all. Maybe there isn’t much appetite for 
it?
Impact? 
Hard to measure: 
- 20 million hits on average every month, 
over 200 million in 10 months*. 
- Over 100,000 tags added. 
- Hundreds of contributors. 
- Iterative crowdsourcing is ongoing. 
- Peter Balman’s aforementioned project 
* Are image view stats really a good measure?
Research and Technology 
• Mario Klingemann Pattern Recognition Software 
• Collaborative PhD ‘A History of the Printed Image 1750-1850: Applying 
Data Science Techniques to Printed Book Illustration’ 
• TSB Digital Innovation Contest New tech for tracking Public Domain in 
the Wild
Crowdsourcing & Apps 
• Metadata Games 
• Wikipedia Synoptic Index 
• BL Georeferencer - 3221 maps referenced in a few weeks!
Tagathon!
[Tangent warning] 
Scott Nicholson’s RECIPE
Creative Uses 
• David Normal installation at Burning Man Festival 
• “Moments” by Joe Bell 
• Colouring-in Pages for Children
Tutorials 
• Using Photoshop to Up-res images 
• Converting images to vector graphics
Collaborations with Colleagues 
• Inspired by Flickr, a Sound Archive series 
• Maps will be fed into the next phase of the Georeferencer
Education 
• Images included in Wikipedia Articles 
• University of Minnesota English Literature Course Exercise on Tagging 
• Art Therapy Courses
The ‘British Library Big Data 
Experiment’ 
http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html 
“What can a group of UCL Big Data CS 
students do when given access to cloud 
computing, all of the book data and a focus 
group of digital humanists?”
The ‘British Library Big Data 
Experiment’ 
Next phase will work with an undergraduate 
team with experience in image analysis. 
We are hosting an event on the 18th of 
December 2014, on “Pattern Recognition”.
In summary, “Clarity” 
It is clear that we can: 
fail and fail quickly 
build experiments that 
won’t last 
open content 
build bridges
My contact details for later technical 
questions: 
ben.osteen@bl.uk 
@benosteen 
Links: 
http://labs.bl.uk 
http://mechanicalcurator.tumblr.com 
https://flickr.com/photos/britishlibrary 
https://github.com/bl-labs 
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
Image credits: 
Title image: from https://www.flickr.com/photos/britishlibrary/11223645575 
Title: "The Book of The Grand Junction Railway, being a history and description of the line from Birmingham to Liverpool and 
Manchester ... By T. Roscoe, assisted by the resident engineers of the line" 
Author: Roscoe, Thomas. 
Shelfmark: "British Library HMNTS 796.f.3." 
https://www.flickr.com/photos/britishlibrary/11209677645 - Foot Bridge, Dartmoor 
https://www.flickr.com/photos/britishlibrary/11208502325 - The Suspension Bridge 
https://www.flickr.com/photos/britishlibrary/11234482436 - Wensleydale & Swaledale 
Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as 
illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year 
1675 by “A Person of Q" by The British Library (More from this book here: https://www.flickr.com/search/?tags=sysnum000878624) 
Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson, Pamaratta [sic], 
Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the English colony, 
from its foundation https://www.flickr.com/photos/britishlibrary/11001417405 
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html

BL Labs 2014 Symposium: The Mechanical Curator

  • 1. Building Bridges (and rapid depreciation)
  • 2.
  • 3.
  • 4. David Foster Wallace, on Ambition: “You know, the whole thing about perfectionism. The perfectionism is very dangerous, because of course if your fidelity to perfectionism is too high, you never do anything. Because doing anything results in— It’s actually kind of tragic because it means you sacrifice how gorgeous and perfect it is in your head for what it really is.” - As told to Leonard Lopate on WNYC on March 4, 1996. (emphasis my own) http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
  • 5. The unifying theme to (pretty much) all the requests:
  • 6. The unifying theme to (pretty much) all the requests: Give me EVERYTHING!
  • 7. The unifying theme to (pretty much) all the requests: Give me EVERYTHING! (that might be important to my work)
  • 9. Why? “Can’t they just find the things they want through the catalogue?”
  • 10. 1. If they knew which bits of data were necessary, they would already know the answers.
  • 11.
  • 12. “I am interested in travel accounts in Europe during the 19th Century”
  • 13. 2. If a conventional search interface worked, they wouldn’t be asking.
  • 14. How does conventional search work anyway? Under what assumptions? Starts with the Text: “I quickly explained that many big jobs involve a few hazards.”
  • 15. How does conventional search work anyway? Under what assumptions? Then it is Tokenised (with some assumptions on how this is possible): “I”, “quickly”, “explained”, “that”, ”many”, “big”, “jobs”, “involve”, “a”, “few”, “hazards”
  • 16. How does conventional search work anyway? Under what assumptions? Then, the most common words are removed as these are assumed to be unimportant. (Stopwords) “quickly”, “explained”, ”many”, “big”, “jobs”, “involve”, “few”, “hazards”
  • 17. How does conventional search work anyway? Under what assumptions? Many fulltext search services will also perform language-specific Stemming, that is, to reduce each word to a root: “quick”, “explain”, ”many”, “big”, “job”, “involve”, “few”, “hazard” (Lookup ‘porter’ and ‘snowball’ stemmers for more.)
  • 18. How does conventional search work anyway? Under what assumptions? Finally, an inverse-index is created* and arranged with the assumption that you want to find the most Relevant results to future queries. Search terms are passed through the same workflow. (*Contemporary search engines are more complex of course, but the basics are still there.)
  • 19. Why on earth did I teach you about search? All services are made with compromises and assumptions, and it is good to examine these from time to time. The key assumption is that people will search for the most Relevant record that matches the text they entered.
  • 20. The most Relevant record that matches the text they entered.
  • 21. Why not: All the works that likely cover a specific topic I define or fit an arbitrary algorithm I can supply.
  • 22. “That’s great and all but it’s all subjective; you can’t teach a computer that…”
  • 25. “I am interested in travel accounts in Europe during the 19th Century”
  • 26. 2013 Competition winners http://labs.bl.uk/Ideas+for+Labs Pieter Francois
  • 30. 2013 Competition winners http://labs.bl.uk/Ideas+for+Labs Dan Norton - “Mixing the Library. Information Interaction and the DJ” Can a researcher record a session drawing from digital objects, in the same way a DJ does with music tracks?
  • 31. The other unifying themes to the requests: “I need tools to help me interpret the vast amount of content you hold. You don’t provide any but make it impossible for others to do so.” “I want to work on broad sweeps of content, rather than book-by-book. It would take too much time to get each one.” “API? what’s that? I don’t care. Just give me the files.”
  • 32. So, a challenge was born… If a researcher is given direct file access to a large amount of data, can it be useful? What internal conventions would need to be removed? What external conventions added? One way to try it out was to pretend to be a researcher and to ‘eat our own dogfood’.
  • 33. How has the depiction of faces changed in books over the 19th Century? aka how well do modern photographic face detection routines work on 19th C illustrations?
  • 35. Success? Not really. Many more female faces were found than male. This did not mean that there are more images of women in the books than of men!
  • 36. 19C depictions of faces • Often drawn more symmetrically - male faces were more likely to be exaggerated. • Depiction is typically 'clean' and posed • Fashion: beards, spectacles and hats - different to the modern photographic training data
  • 37. There was something else though... People on their way past would occasionally pause and look over my shoulder. Every day it dug up illustrations that surprised me and the team around me. So… I wondered if anyone else might be surprised and intrigued by them too? http://mechanicalcurator.tumblr.com/archive
  • 41. How does machine learning work? First, turn the raw data into numbers, something the computer can deal with: eg when analysing text, assign a number to each word and form a ‘dictionary’
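As a rough sketch of that first step for text, the snippet below assigns each distinct word the next free integer and then rewrites the text as a list of those numbers. The function names `build_dictionary` and `encode` are made up for this illustration, and dropping unknown words in `encode` is just one possible choice.

```python
def build_dictionary(tokens):
    """Give each distinct word the next unused integer, in first-seen order."""
    dictionary = {}
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)
    return dictionary

def encode(tokens, dictionary):
    """Replace each known word with its number; unknown words are skipped."""
    return [dictionary[token] for token in tokens if token in dictionary]

tokens = ["many", "big", "jobs", "involve", "many", "hazards"]
dictionary = build_dictionary(tokens)
encoded = encode(tokens, dictionary)
print(dictionary)
print(encoded)
```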
  • 42. How does machine learning work? Process the numeric data in an effort to better expose the “important” information - removing noise and tone variation from an image - turning a grid of pixels into independent trackable ‘points of interest’ - hue, saturation, levels - produce metrics
  • 43. How does machine learning work? Annotate - manually or automatically - what is useful and what is not in a portion of the data: - Characteristics: - Spam or not? - Face at x,y,w,h - Positive, neutral and negative sentiment - Scalar qualities
  • 44. How does machine learning work? Pass most of the ‘known’ data through one of many machine learning algorithms, such as a Support Vector Machine (SVM) as implemented in libsvm. Which one depends entirely on what you want the computer to be able to do once trained.
  • 45. How does machine learning work? Test your trained machine with half of the rest of the data to see how it does. eg if characterising email, does it correctly spot Spam?
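The annotate / train / test loop from the last three slides can be sketched as below. To keep it self-contained, a deliberately crude word-voting classifier stands in for an SVM; it is not libsvm and not how an SVM works internally. The ten labelled messages are invented, and the 80% training share is an assumption standing in for "most of the known data". Real data should be shuffled before splitting; the sketch skips that to stay deterministic.

```python
from collections import defaultdict

def split(data, train_fraction=0.8):
    """Train on most of the data, test on half of the rest, hold back the remainder."""
    n_train = round(len(data) * train_fraction)
    train, rest = data[:n_train], data[n_train:]
    half = len(rest) // 2
    return train, rest[:half], rest[half:]

def train_word_votes(examples):
    """Stand-in 'learning' step: each word gains +1 per spam use, -1 per non-spam use."""
    votes = defaultdict(int)
    for text, is_spam in examples:
        for word in text.split():
            votes[word] += 1 if is_spam else -1
    return votes

def predict(votes, text):
    """Call a message spam if its words lean spam overall."""
    return sum(votes.get(word, 0) for word in text.split()) > 0

def accuracy(votes, examples):
    """Fraction of examples whose prediction matches the manual annotation."""
    correct = sum(predict(votes, text) == is_spam for text, is_spam in examples)
    return correct / len(examples)

# Invented, manually annotated examples: (text, is_spam).
labelled = [
    ("win money now", True),
    ("meeting agenda attached", False),
    ("cheap money offer", True),
    ("lunch meeting tomorrow", False),
    ("win a cheap prize", True),
    ("agenda for tomorrow", False),
    ("money prize offer", True),
    ("notes from the meeting", False),
    ("win cheap money", True),
    ("meeting notes attached", False),
]

train, test, held_out = split(labelled)
model = train_word_votes(train)
print(accuracy(model, test))
```

The held-back portion is there so that, after tuning against the test set, there is still some annotated data the trained profile has never been checked against.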
  • 46. How does machine learning work? Now, use the trained profile on real data! Sometimes, these profiles are shared, for example, Haar cascades trained on photographic datasets (face, body, etc) are freely available
  • 47. Why the second lesson? Analysis starts with a bulk set of data, and a set of assumptions and ideas. The usefulness of a stemming/tokenising search service is unquestioned and Libraries support metadata-level search. No-one can support all assumptions and ideas!
  • 49. Surprising? It was an experiment, after all...
  • 50. Accessible? • In theory, the books were accessible. • In practice, it was a real challenge to find anything viewable. The chasm between digital and print: http://samplegenerator.cloudapp.net
  • 51. As this is all in the public domain anyway... What’s the harm in making it a bit more accessible? The Mechanical Curator twitter account has only got a handful of people following it after all. Maybe there isn’t much appetite for it?
  • 53. Impact? Hard to measure: - 20 million hits on average every month, over 200 million in 10 months*. - Over 100,000 tags added. - Hundreds of contributors. - Iterative crowdsourcing is ongoing. - Peter Balman’s aforementioned project * Are image view stats really a good measure?
  • 57. Research and Technology • Mario Klingemann Pattern Recognition Software • Collaborative PhD ‘A History of the Printed Image 1750-1850: Applying Data Science Techniques to Printed Book Illustration’ • TSB Digital Innovation Contest New tech for tracking Public Domain in the Wild
  • 58. Crowdsourcing & Apps • Metadata Games • Wikipedia Synoptic Index • BL Georeferencer - 3221 maps referenced in a few weeks!
  • 60. [Tangent warning] Scott Nicholson’s RECIPE
  • 61. Creative Uses • David Normal installation at Burning Man Festival • “Moments” by Joe Bell • Colouring-in Pages for Children
  • 62. Tutorials • Using Photoshop to Up-res images • Converting images to vector graphics
  • 63. Collaborations with Colleagues • Inspired by Flickr, a Sound Archive series • Maps will be fed into the next phase of the Georeferencer
  • 64. Education • Images included in Wikipedia Articles • University of Minnesota English Literature Course Exercise on Tagging • Art Therapy Courses
  • 66. The ‘British Library Big Data Experiment’ http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html “What can a group of UCL Big Data CS students do when given access to cloud computing, all of the book data and a focus group of digital humanists?”
  • 67. The ‘British Library Big Data Experiment’ Next phase will work with an undergraduate team with experience in image analysis. We are hosting an event on the 18th of December 2014, on “Pattern Recognition”.
  • 69. In summary, “Clarity” It is clear that we can: • fail, and fail quickly • build experiments that won’t last • open content • build bridges
  • 70. My contact details for later technical questions: ben.osteen@bl.uk @benosteen Links: http://labs.bl.uk http://mechanicalcurator.tumblr.com https://flickr.com/photos/britishlibrary https://github.com/bl-labs http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
  • 71. Image credits: Title image: from https://www.flickr.com/photos/britishlibrary/11223645575 Title: "The Book of The Grand Junction Railway, being a history and description of the line from Birmingham to Liverpool and Manchester ... By T. Roscoe, assisted by the resident engineers of the line" Author: Roscoe, Thomas. Shelfmark: "British Library HMNTS 796.f.3." https://www.flickr.com/photos/britishlibrary/11209677645 - Foot Bridge, Dartmoor https://www.flickr.com/photos/britishlibrary/11208502325 - The Suspension Bridge https://www.flickr.com/photos/britishlibrary/11234482436 - Wensleydale & Swaledale Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year 1675 by “A Person of Q" by The British Library (More from this book here: https://www.flickr.com/search/?tags=sysnum000878624) Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson, Pamaratta [sic], Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the English colony, from its foundation https://www.flickr.com/photos/britishlibrary/11001417405 http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html