Surprising? It was an experiment,
after all...
Ah, I’ve not said what ‘it’ is, have I?
Let’s start at the beginning...
[Insert clichéd time-travel effects here]
British Library Labs
funded by the Andrew W. Mellon Foundation
(2013 - …)
⇒ Mahendra Mahey and Ben O’Steen
Experimenting for the sake of the
researcher:
British Library Labs - http://labs.bl.uk
(situated in the ‘Digital Scholarship’ dept.)
“Create, explore and foster new and innovative
ways to work with the British Library’s existing
digital content.”*
(*My paraphrasing)
Digital Research Team
• The Digital Research Team is a
cross-disciplinary mix of curators,
librarians and programmers
supporting the innovative use of
our digital collections.
• We explore how digital technologies
are re/shaping research and how this
informs how the library does its
business.
• We encourage and support
colleagues & scholars of all
disciplines to work innovatively with
and across the library’s diverse
digital content
Engaging with researchers
Informally (typically through direct questions):
“Can I work with all of the scanned books that
might be about 19th century European travel?”
“I like ‘distant reading’. I don’t know what it is
exactly, but it sounds useful. Does the library
offer that?”
Engaging with researchers
Formally (through our yearly ‘competitions’)
Researchers submit a proposal, entries are
shortlisted and two ‘winners’ are picked.
They win our time and effort
(and travel expenses and, so I’ve been told,
some actual money at the end of it.)
2013 Competition winners
http://labs.bl.uk/Ideas+for+Labs
Pieter Francois
Dan Norton - “Mixing the Library. Information
Interaction and the DJ”
Can a researcher record a session drawing
from digital objects, in the same way a DJ does
with music tracks?
The unifying theme to (pretty much)
all the requests:
Give us
EVERYTHING!
Fetch!
The *other* unifying themes to the
requests:
“I need tools to help me interpret the vast
amount of content you hold. You don’t provide
any but make it impossible for others to do so.”
“I want to work on broad sweeps of content,
rather than book-by-book.”
“API? what’s that? I don’t care. Just give me
the files.”
So, a challenge was born…
If a researcher were given direct file access to a
large amount of data, would it be useful?
One way to try it out was to ‘eat our own
dogfood’
How has the depiction of
faces changed in books
over the 19th Century?
aka how well do modern photographic
face-detection routines work on 19th C
illustrations?
Success? Not really.
• Likelihood of detection:
• Female faces > Male
19C depictions of faces
• Likelihood of detection:
• Female faces > Male
• Any common differences?
• Often drawn more symmetrically - male faces
were more likely to be exaggerated.
• Depiction is typically 'clean' and posed
• Fashion: beards, spectacles and hats -
different to the modern training data
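A minimal sketch of how a comparison like "Female faces > Male" could be tallied, assuming each illustration has been hand-labelled with a group and with whether the detector found a face. The records below are made up purely for illustration and are not the experiment's results; `detection_rates` and `SAMPLE_RESULTS` are hypothetical names.

```python
from collections import defaultdict

# Hypothetical hand-labelled records: (group, did the detector find a face?)
# These values are invented for illustration only.
SAMPLE_RESULTS = [
    ("female", True),
    ("female", True),
    ("female", False),
    ("male", True),
    ("male", False),
    ("male", False),
]


def detection_rates(results):
    """Return {group: fraction of illustrations in which a face was detected}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for group, detected in results:
        totals[group] += 1
        if detected:
            hits[group] += 1
    return {group: hits[group] / totals[group] for group in totals}


rates = detection_rates(SAMPLE_RESULTS)
for group, rate in sorted(rates.items()):
    print(f"{group}: {rate:.0%} detected")
```

Comparing rates rather than raw counts matters here, since the two groups are unlikely to appear equally often in the books.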
There was something else though...
People on their way past would occasionally
pause and look over my shoulder.
Every day it dug up illustrations that surprised
me and the team around me.
So… I wonder if anyone else might be
surprised and intrigued by them too?
http://mechanicalcurator.tumblr.com/archive
Accessible?
• In theory, the books were accessible.
• In practice, it was a real challenge to find
anything viewable.
The chasm between digital and print:
http://samplegenerator.cloudapp.net
As this is all in the public domain
anyway...
What’s the harm in making it a bit more
accessible?
The Mechanical Curator twitter account has
only got a handful of people following it after
all. Maybe there isn’t much appetite for it?
Creative Uses
• David Normal installation at Burning Man Festival
• “Moments” by Joe Bell
• Colouring-in Pages for Children
Research and Technology
• Mario Klingemann Pattern Recognition Software
• Collaborative PhD ‘A History of the Printed Image 1750-1850: Applying
Data Science Techniques to Printed Book Illustration’
• TSB Digital Innovation Contest New tech for tracking Public Domain in
the Wild
Crowdsourcing & Apps
• Metadata Games
• Wikipedia Synoptic Index
Tutorials
• Using Photoshop to Up-res images
• Converting images to vector graphics
Collaborations with Colleagues
• Inspired by Flickr, a Sound Archive series
• Maps will be fed into the next phase of the Georeferencer
Education
• Images included in Wikipedia Articles
• University of Minnesota English Literature Course Exercise on Tagging
• Art Therapy Courses
Impact:
Hard to measure!
Are image view stats really a good measure?
(163 million views as of 10/06/2014)
How about getting every image viewed once?
(done by 5/3/2014) At least 5 times? (only a
few hundred left as of 12/06/2014!)
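A rough sketch of the "every image viewed at least N times" measure, assuming per-image view counts are available as a simple mapping. The image ids and counts are invented for illustration, and `coverage` is a hypothetical helper, not part of any Flickr API.

```python
def coverage(view_counts, threshold):
    """Return (fraction of images with >= threshold views, ids still below it)."""
    if not view_counts:
        raise ValueError("no images to measure")
    below = sorted(
        image_id for image_id, views in view_counts.items() if views < threshold
    )
    viewed_enough = len(view_counts) - len(below)
    return viewed_enough / len(view_counts), below


# Hypothetical per-image view counts keyed by made-up ids.
sample_views = {"img-a": 12, "img-b": 5, "img-c": 4, "img-d": 0}

fraction, remaining = coverage(sample_views, threshold=5)
print(f"{fraction:.0%} of images viewed at least 5 times; left: {remaining}")
```

Unlike a total view count, this kind of measure is not dominated by a handful of very popular images.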
What’s next?
Accessible is great, but can we make our data
more useful?
Not just images, but any of our researchable
digital content?
Microsoft Azure for Research award
This was awarded to British Library Labs at
the end of last year. Equivalent to $40,000 for
a year's use!
This gave us capacity and storage for our
“unplanned” experiments.
Microsoft (and UCL) are joining in on a new
experiment...
The ‘British Library Big Data
Experiment’
http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html
“What can a group of UCL Big Data CS
students do when given access to cloud
computing, all of the book data and a focus
group of digital humanists?”
“Playing with assurance”
Using concepts from
party games like
“Wits and Wagers” to
validate and assure
incoming tags.
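One way the betting idea might work, as a sketch only: players stake chips on the candidate tag they think is right, and a tag is accepted once it holds a large enough share of everything staked. The rules, the `min_share` threshold and the sample bets are all assumptions made for illustration, not the project's actual design.

```python
from collections import Counter


def validate_tags(bets, min_share=0.5):
    """Accept tags whose share of the total stake is at least min_share.

    bets is an iterable of (player, tag, stake) tuples; stakes that are
    zero or negative are ignored.
    """
    stakes = Counter()
    for _player, tag, stake in bets:
        if stake > 0:
            stakes[tag] += stake
    total = sum(stakes.values())
    if total == 0:
        return []
    return sorted(tag for tag, staked in stakes.items() if staked / total >= min_share)


# Hypothetical bets on tags for a single illustration.
sample_bets = [
    ("p1", "ship", 3),
    ("p2", "ship", 2),
    ("p3", "boat", 1),
    ("p4", "whale", 1),
]

accepted = validate_tags(sample_bets, min_share=0.5)
print(accepted)
```

Weighting by stake rather than by head count lets confident players count for more, which is the part borrowed from the party game.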
[Tangent warning]
Scott Nicholson’s RECIPE
In summary
• Mechanical Curator is still tweeting, and
Flickr is still standing.
• We are experimenting with broader and
more direct access to our digital holdings.
• There is a demand for this content!
• Take the content to where people go.
– NB ‘where’ is entirely dependent on the current
culture of search and research. Right now, it’s
Google.
Now for one, final experiment…
Thank you!
Questions?
Thank you for your time
(Autographs by reception, £10 each)
[drops microphone]
[audience goes wild]
[this slide intentionally left blank]
Maybe just ending it with
the useful details?
Contact:
ben.osteen@bl.uk
@benosteen
Links:
http://labs.bl.uk
http://mechanicalcurator.tumblr.com
http://flickr.com/photos/britishlibrary
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
Image credits:
Title image: from https://www.flickr.com/photos/britishlibrary/11160768745
"Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as
illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year
1675 by “A Person of Q" by The British Library (More from this book here:
https://www.flickr.com/search/?tags=sysnum000878624)
Slide 10 - Title: The costume of Yorkshire, Illustrator: "Walker, George; Havell, R & D (George Walker; R and D Havell)"
Provenance: London, 1814 Caption: Rape threshing https://www.flickr.com/photos/britishlibrary/12459323374
Slide 11 - Image taken from page 3 of 'Worthy Workers. A monthly magazine for all, etc. [Edited and partly written by
Sarah Sutton.] no. 1. March 1886' https://www.flickr.com/photos/britishlibrary/11287891873
Slide 16 - Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson,
Pamaratta [sic], Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the
English colony, from its foundation https://www.flickr.com/photos/britishlibrary/11001417405
Slide 20 - Image from http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html

The surprising adventures of the mechanical curator
