CityLIS talk, Feb 1st 2016

Slides for a talk given at CityLIS for British Library Labs, 2016.

  1. Farces and Failures
     Ben O’Steen, British Library Labs (@benosteen)
  2. The names and labels we choose shape the questions that people will ask and the assumptions that they make. For example: “Labs”.
  3. https://www.flickr.com/photos/internetarchivebookimages/14763474682
  4. In a nutshell: British Library Labs works with researchers on their specific problems, trying to assess how widely each problem is felt. With their help, we talk to communities of researchers and try to pinpoint what they actually need, as opposed to what they think they need to ask us for.
  5. What were researchers’ initial preconceptions of working with the British Library?
  6. “Give me all of collection X!” It is common for researchers to want all of a named collection. It is also common for us to name a collection after whoever paid for it, or the project that 'collated' it.
  7. Farces... A common plot mechanism: a conversation where the participants leave in agreement, but with two very different ideas of what was actually discussed.
  8. Some common farce-inducing words: Collection
  9. Some common farce-inducing words: Collection, Access
  10. Some common farce-inducing words: Collection, Access, Content
  11. Some common farce-inducing words: Collection, Access, Content, Metadata
  12. Some common farce-inducing words: Collection, Access, Content, Metadata, Crowdsourced
  13. Microsoft Books digitisation project
     ● Started in 2007, but stopped in 2009 due to the cancellation of the MS Book Search project.
     ● Digitised approximately 49k works (~65k volumes).
     ● Online from 2012 via a “standard” page-turning interface, but with very low usage statistics.
  14. “I am interested in travel accounts in Europe during the 19th Century”
  15. 2013 Competition winners (http://labs.bl.uk/Ideas+for+Labs): Pieter Francois
  16. Bias in digitisation
     The tool was built to give a statistically valid sample. Because so little had been digitised, it showed how skewed the digital corpus is compared to the overall holdings. Allen B. Riddell, in “Where are the novels?”*, estimates that using HathiTrust’s corpus: “... about 58%—somewhere between 47% and 68%—of the 2,903 novels [all publications in English between 1800 and 1836] have publicly accessible scans.”
     * (2012) https://ariddell.org/where-are-the-novels.html
     (A sketch of this kind of estimate follows below.)
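     To make that concrete, here is a minimal sketch (my illustration, not Riddell’s or the Labs tool’s actual method) of estimating the digitised share of a catalogue from a uniform random sample, reporting a point estimate with a normal-approximation 95% confidence interval – the same shape of figure as “about 58%, somewhere between 47% and 68%”. The catalogue IDs and the set of scanned items are invented.

     import math
     import random

     def estimate_digitised_share(catalogue, has_scan, sample_size=200, z=1.96):
         """Estimate the proportion of catalogue records with a public scan.

         Draws a uniform random sample and returns (estimate, low, high):
         a point estimate plus a normal-approximation 95% confidence interval.
         """
         sample = random.sample(catalogue, sample_size)
         hits = sum(1 for record in sample if has_scan(record))
         p = hits / sample_size
         margin = z * math.sqrt(p * (1 - p) / sample_size)
         return p, max(0.0, p - margin), min(1.0, p + margin)

     # Hypothetical catalogue of record IDs, with a made-up set of scanned items.
     catalogue = [f"BL-{i:06d}" for i in range(2903)]
     scanned = set(random.sample(catalogue, 1700))  # pretend ~58% have scans
     print(estimate_digitised_share(catalogue, scanned.__contains__))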
  17. John Cooper, https://www.flickr.com/photos/atomicshed/2436324958 (CC BY-NC-ND 2.0)
  18. [ ] The square brackets of the soul...
  19. What about some of our metadata?
  20. The Chartist Walk...
  21. Katrina Navickas, our researcher, in period costume! http://turbulentlondon.com/2015/10/01/following-the-chartists-around-london/
  22. “Access” The newspapers were accessible. We had access to the newspapers, but... we didn't have access to them: keyword search fails miserably, and bulk access is an issue.
  23. A simple data structure would've helped! Every project to date would have been far easier if:
     • Every thing had a URL.
     • The URL linked to a page that tells you all about that thing.
     • That page linked to other, related things.
     • The page was machine-readable – never assume a human will always read it.
     • All the data – images, XML, etc. – was accessible.
     (A sketch of such a record follows below.)
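     As an aside, a minimal sketch of what such a machine-readable record could look like. The field names and example.bl.uk URLs are invented for illustration; this is not an actual British Library API.

     import json

     # One digitised item, illustrating the wish list above: a stable URL,
     # links to related things, and bulk access to the underlying data.
     # All field names and URLs here are hypothetical.
     record = {
         "id": "https://example.bl.uk/items/12345",                # every thing has a URL
         "title": "A Narrative of Travels in Europe",
         "part_of": "https://example.bl.uk/collections/ms-books",  # links to related things
         "pages": ["https://example.bl.uk/items/12345/p1"],
         "data": {                                                 # access to all the data
             "images": "https://example.bl.uk/items/12345/images.zip",
             "ocr_xml": "https://example.bl.uk/items/12345/alto.xml",
         },
     }

     # The same URL can serve this JSON to machines and an HTML view to people.
     print(json.dumps(record, indent=2))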
  24. Uptake? Hard to measure, but:
     • 13–20 million hits on average every month; over 330,000,000 hits to date.
     • Almost every image has been seen at least 20 times.
     • Over 500,000 tags added by volunteers and machine algorithms.
     • Iterative crowdsourcing is key.
  25. Iterative crowdsourcing? (The term is stolen, with permission, from Mia Ridge.)
     1. Crowdsource broad facts, and subcollections of related items will emerge.
     2. No 'one-size-fits-all': subcollections allow for more focussed curation. Go to step 1.
     (The loop is sketched in code below.)
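     A toy sketch of that loop, assuming a hypothetical ask_crowd callback that stands in for whatever capture mechanism is used (a web form, a spreadsheet, ...) and returns a broad tag for an item.

     from collections import defaultdict

     def iterative_crowdsourcing(items, ask_crowd, rounds=2):
         """Tag broadly, let subcollections emerge, then re-run more
         focussed tagging on each subcollection (steps 1 and 2 above)."""
         collections = {"everything": list(items)}
         for _ in range(rounds):
             next_collections = defaultdict(list)
             for name, subset in collections.items():
                 for item in subset:
                     tag = ask_crowd(item, context=name)  # broad fact from a volunteer
                     next_collections[tag].append(item)   # related items cluster together
             collections = next_collections               # go to step 1, now more focussed
         return dict(collections)

     # Toy run: "volunteers" classify items by the first word of their label.
     items = ["map: London", "music cover", "map: York", "portrait"]
     print(iterative_crowdsourcing(items, lambda item, context: item.split(":")[0].split()[0]))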
  26. Purposefully contextless
     ● Presenting the illustrations through Flickr removed their context. Did this help or hinder?
     ● We wished to stimulate research on the illustrations themselves (linotypes, etchings, etc.); computer science research so far had been primarily 'Vision'.
  27. It wasn't perfect, it was an experiment.
     “You know, the whole thing about perfectionism. The perfectionism is very dangerous, because of course if your fidelity to perfectionism is too high, you never do anything. Because doing anything results in— It’s actually kind of tragic because it means you sacrifice how gorgeous and perfect it is in your head for what it really is.”
     – David Foster Wallace, as told to Leonard Lopate on WNYC on March 4, 1996 (emphasis my own).
     http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
  28. Fear of imperfection encourages us to value the systems that provide access above the outcomes those systems could enable. Adherence to a specification and 'hit' counts are easy to measure. Once you've built one interface, people are loath to build any others that would run in parallel.
  29. Metaphors don't translate well between media. Why do we assume that physical facsimiles are anything but a comforting solution?
  30. Tagathon found nearly 30,000 maps!
  31. Georeferencing – http://bl.uk/maps
  32. Not just research use! http://www.playingbythebook.net/2014/03/18/barbapapas-new-house-a-book-so-good-im-featuring-it-for-a-second-time/
  33. Burning Man festival: David Normal created lightboxes at Burning Man using the British Library's Flickr images.
  34. “Crossroads of Curiosity” launched on 20th June 2015.
  35. “Crowdsourcing” We found lots of really bad assumptions attached to this term:
     ● A crowd of people, each doing a small bit
     ● You must have special software for it
     ● If you build it, they will come – free labour!
     ● It's totally untrustworthy
     ● It's easy
     ● It fixes all problems
     ● It's cheap
  36. “Crowdsourcing”
     ● A crowd of people, each doing a small bit
     [Chart: '% done' per member of the crowd.] Zooniverse usage concurs with this distribution.
  37. “Crowdsourcing”
     ● You must have special software for it
     Capturing input, showing progress and engaging with volunteers is what is important. Spreadsheets can be a wonderful thing! (See the sketch below.)
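     For instance, a minimal sketch of spreadsheet-driven crowdsourcing: volunteers fill rows in a shared sheet, which is exported as CSV and summarised. The column names and data here are invented.

     import csv
     import io
     from collections import Counter

     # Hypothetical CSV export of a shared spreadsheet; columns are invented.
     sheet = io.StringIO(
         "item_id,volunteer,tag\n"
         "BL-0001,anna,map\n"
         "BL-0002,anna,portrait\n"
         "BL-0003,ben,\n"          # not yet done
         "BL-0001,chris,map\n"
     )

     tagged, per_volunteer = set(), Counter()
     for row in csv.DictReader(sheet):
         if row["tag"].strip():                # only count completed rows
             tagged.add(row["item_id"])
             per_volunteer[row["volunteer"]] += 1

     # Show progress and recognise the volunteers doing the work.
     print(f"{len(tagged)} items tagged so far")
     for name, n in per_volunteer.most_common():
         print(f"  {name}: {n} tags")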
  38. “Crowdsourcing”
     ● If you build it, they will come – free labour!
  39. “Crowdsourcing”
     ● It's totally untrustworthy
     ● It's easy
     ● It fixes all problems
     ● It's cheap
  40. Investigation into the unusual
     ● Can we avoid the keyboard and mouse?
     ● Can we make use of casual interaction, as opposed to the usual “group of experts”?
     ● Can useful games be made with this constraint?
     ● Can they be fun, as well as rewarding?
     ● Which age ranges understand what an arcade machine even is?
  41. Game Jam!
  42. In summary:
     ● Be careful with the words you use, especially those you think everyone understands.
     ● Things do not need to be catalogued or perfect to be useful to people.
     ● Wanting access to everything is the default.
     ● A singular presentation of a collection is a risky strategy – only mimicking the physical may not be the best idea.
     ● Experts are where you find them; look after them once you do!
     ● Make space to experiment, to fail and to learn from your mistakes.
  43. My contact details: ben.osteen@bl.uk, @benosteen
     Links:
     http://labs.bl.uk
     http://mechanicalcurator.tumblr.com
     https://flickr.com/photos/britishlibrary
     https://github.com/bl-labs
     http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
