The document discusses the British Library's "Mechanical Curator" experiment of providing direct access to a large collection of digitized books and images. It describes how initial attempts at automated analysis were unsuccessful due to the limitations of models trained on modern photographs. However, the raw images proved interesting to many people. Over time, the project evolved to include crowdsourcing annotations, educational uses, and collaborations with computer science students and other researchers. The author concludes that through such experiments, they can fail quickly and build bridges to open up cultural heritage collections.
BL Labs 2014 Symposium: The Mechanical Curator
4. David Foster Wallace, on Ambition:
“You know, the whole thing about perfectionism. The
perfectionism is very dangerous, because of course if your
fidelity to perfectionism is too high, you never do
anything.
Because doing anything results in— It’s actually kind of
tragic because it means you sacrifice how gorgeous and
perfect it is in your head for what it really is.”
- As told to Leonard Lopate on WNYC on March 4, 1996.
(emphasis my own)
http://blankonblank.org/interviews/david-foster-wallace-on-ambition/
9. Why?
“Can’t they just find the things they want
through the catalogue?”
10. 1. If they knew which bits
of data were necessary,
they would already know
the answers.
11.
12. “I am
interested in
travel
accounts in
Europe during
the 19th
Century”
13. 2. If a conventional
search interface worked,
they wouldn’t be asking.
14. How does conventional search work
anyway? Under what assumptions?
Starts with the Text:
“I quickly explained that many big jobs involve
a few hazards.”
15. How does conventional search work
anyway? Under what assumptions?
Then it is Tokenised (with some assumptions
on how this is possible):
“I”, “quickly”, “explained”, “that”, ”many”, “big”,
“jobs”, “involve”, “a”, “few”, “hazards”
16. How does conventional search work
anyway? Under what assumptions?
Then, the most common words are removed
as these are assumed to be unimportant.
(Stopwords)
“quickly”, “explained”, ”many”, “big”, “jobs”,
“involve”, “few”, “hazards”
17. How does conventional search work
anyway? Under what assumptions?
Many full-text search services will also perform
language-specific Stemming, that is, reduce
each word to a root:
“quick”, “explain”, ”many”, “big”, “job”,
“involve”, “few”, “hazard”
(Look up ‘porter’ and ‘snowball’ stemmers for more.)
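The three steps above (tokenise, drop stopwords, stem) can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the stopword list is a tiny hand-picked set, the tokeniser treats any run of letters as a word and lowercases it, and the `stem` function is naive suffix stripping rather than a real Porter or Snowball stemmer. All function names are hypothetical.

```python
import re

# Assumption: a tiny illustrative stopword list, not a real one.
STOPWORDS = {"i", "that", "a"}

# Assumption: naive suffix stripping standing in for a real stemmer.
SUFFIXES = ("ly", "ed", "s")


def tokenise(text):
    """Split text into lowercase word tokens (assumes words are runs of letters)."""
    return re.findall(r"[a-z]+", text.lower())


def remove_stopwords(tokens):
    """Drop every token that appears in the stopword list."""
    return [t for t in tokens if t not in STOPWORDS]


def stem(token):
    """Strip the first matching suffix, leaving at least three characters."""
    for suffix in SUFFIXES:
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyse(text):
    """Run the whole workflow: tokenise, remove stopwords, stem."""
    return [stem(t) for t in remove_stopwords(tokenise(text))]


terms = analyse("I quickly explained that many big jobs involve a few hazards.")
print(terms)
```

Each step bakes in an assumption (what a word is, which words are unimportant, which endings carry no meaning), which is the point the slides are making.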
18. How does conventional search work
anyway? Under what assumptions?
Finally, an inverted index is created* and
arranged with the assumption that you want to
find the most Relevant results to future
queries.
Search terms are passed through the same
workflow.
(*Contemporary search engines are more complex of course, but the basics
are still there.)
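A minimal sketch of that last step, assuming a much simplified analysis workflow (lowercase word tokens only) and a crude notion of relevance (the number of distinct query terms a document contains). The document texts and function names are hypothetical, and real search engines rank far more carefully.

```python
from collections import defaultdict
import re


def analyse(text):
    """Stand-in for the tokenise/stopword/stem workflow: lowercase word tokens only."""
    return re.findall(r"[a-z]+", text.lower())


def build_index(documents):
    """Map each term to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for term in analyse(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Rank documents by how many distinct query terms they contain."""
    scores = defaultdict(int)
    # The query goes through the same workflow as the indexed text.
    for term in set(analyse(query)):
        for doc_id in index.get(term, ()):
            scores[doc_id] += 1
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical records, for illustration only.
documents = {
    1: "Travel accounts of Europe",
    2: "Big jobs involve hazards",
    3: "Hazards of travel in Europe",
}
index = build_index(documents)
results = search(index, "travel hazards")
print(results)
```

The index answers "which records contain this term?" quickly, but it can only ever return records that share terms with the query.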
19. Why on earth did I teach you about
search?
All services are made with compromises and
assumptions, and it is good to examine these
from time to time.
The key assumption is that people will search
for the most Relevant record that matches the
text they entered.
30. 2013 Competition winners
http://labs.bl.uk/Ideas+for+Labs
Dan Norton - “Mixing the Library. Information
Interaction and the DJ”
Can a researcher record a session drawing
from digital objects, in the same way a DJ does
with music tracks?
31. The other unifying themes to the
requests:
“I need tools to help me interpret the vast
amount of content you hold. You don’t provide
any but make it impossible for others to do
so.”
“I want to work on broad sweeps of content,
rather than book-by-book. It would take too
much time to get each one.”
“API? what’s that? I don’t care. Just give me the
files.”
32. So, a challenge was born…
If a researcher is given direct file access to a
large amount of data, can it be useful?
What internal conventions would need to be
removed? What external conventions added?
One way to try it out was to pretend to be a
researcher and to ‘eat our own dogfood’.
33. How has the depiction of
faces changed in books
over the 19th Century?
aka how well do modern photographic
face detection routines work on 19th C
illustrations?
34.
35. Success? Not really.
Many more female faces were found than
male.
This did not mean that there are more
images of women in the books than men!
36. 19C depictions of faces
• Often drawn more symmetrically - male faces
were more likely to be exaggerated.
• Depiction is typically 'clean' and posed
• Fashion: beards, spectacles and hats - different
to the modern photographic training data
37. There was something else though...
People on their way past would occasionally
pause and look over my shoulder.
Every day it dug up illustrations that
surprised me and the team around me.
So… I wondered if anyone else might be
surprised and intrigued by them too?
http://mechanicalcurator.tumblr.com/archive
38.
39.
40.
41. How does machine learning work?
First, turn the raw data into numbers,
something the computer can deal with:
eg when analysing text, assign a number to
each word and form a ‘dictionary’
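As a rough sketch of that first step for text, assuming the words have already been split into tokens: each distinct word gets the next unused integer. The function names are hypothetical.

```python
def build_dictionary(tokens):
    """Assign each distinct word the next unused integer, in order of first appearance."""
    dictionary = {}
    for token in tokens:
        if token not in dictionary:
            dictionary[token] = len(dictionary)
    return dictionary


def encode(tokens, dictionary):
    """Replace each word with its number; words missing from the dictionary are skipped."""
    return [dictionary[t] for t in tokens if t in dictionary]


tokens = ["big", "jobs", "involve", "big", "hazards"]
dictionary = build_dictionary(tokens)
encoded = encode(tokens, dictionary)
print(dictionary)
print(encoded)
```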
42. How does machine learning work?
Process the numeric data in an effort to
better expose the “important” information
- removing noise and tone variation from an
image
- turning a grid of pixels into independent
trackable ‘points of interest’
- hue, saturation, levels
- produce metrics
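One small illustration of "produce metrics", assuming an image is just a list of (r, g, b) pixels with channels in 0..255: convert each pixel to hue, saturation and value with the standard library's `colorsys` module and average the saturation. The helper name and pixel values are hypothetical.

```python
import colorsys


def mean_saturation(pixels):
    """Average HSV saturation of a list of (r, g, b) pixels with channels in 0..255."""
    total = 0.0
    for r, g, b in pixels:
        # colorsys expects each channel as a float between 0 and 1.
        _hue, saturation, _value = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        total += saturation
    return total / len(pixels)


# Hypothetical two-pixel "image": one pure red pixel, one mid-grey pixel.
pixels = [(255, 0, 0), (128, 128, 128)]
metric = mean_saturation(pixels)
print(metric)
```

A single number like this throws away almost everything about the image, which is why the choice of metric matters so much.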
43. How does machine learning work?
Annotate - manually or automatically - what is
useful and what is not in a portion of the data:
- Characteristics:
- Spam or not?
- Face at x,y,w,h
- Positive, neutral and negative sentiment
- Scalar qualities
44. How does machine learning work?
Pass most of the ‘known’ data through one of
many machine learning algorithms, such as a
Support Vector Machine (SVM) as
implemented in libsvm.
Which one depends entirely on what you want the
computer to be able to do once trained.
45. How does machine learning work?
Test your trained machine with half of the rest
of the data to see how it does.
eg if characterising email, does it correctly spot
Spam?
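The train-then-test routine can be sketched as below. To keep it dependency-free, the "machine" here is a simple word-count classifier, not an SVM and not libsvm; the labelled messages, the 80% training fraction and all function names are assumptions made for illustration.

```python
import random
from collections import Counter

# Hypothetical hand-labelled data: (message, is_spam).
labelled = [("win a prize now", True), ("meeting notes attached", False)] * 10


def split(data, train_fraction=0.8, seed=0):
    """Shuffle, keep most of the data for training, then halve the remainder."""
    shuffled = data[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    rest = shuffled[cut:]
    half = len(rest) // 2
    return shuffled[:cut], rest[:half], rest[half:]


def train(examples):
    """Count how often each word appears in spam and in non-spam messages."""
    spam, ham = Counter(), Counter()
    for text, is_spam in examples:
        (spam if is_spam else ham).update(text.split())
    return spam, ham


def predict(model, text):
    """Call a message spam if its words were seen more often in spam than not."""
    spam, ham = model
    score = sum(spam[w] - ham[w] for w in text.split())
    return score > 0


def accuracy(model, examples):
    """Fraction of examples whose predicted label matches the annotation."""
    correct = sum(predict(model, text) == label for text, label in examples)
    return correct / len(examples)


train_set, test_set, held_back = split(labelled)
model = train(train_set)
test_accuracy = accuracy(model, test_set)
print(len(train_set), len(test_set), len(held_back), test_accuracy)
```

The second half of the remainder (`held_back`) is never shown to the model here, so it stays available for a final check once the approach is settled.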
46. How does machine learning work?
Now, use the trained profile on real data!
Sometimes these profiles are shared: for
example, Haar cascades trained on
photographic datasets (face, body, etc.) are
freely available.
47. Why the second lesson?
Analysis starts with a bulk set of data, and a
set of assumptions and ideas.
The usefulness of a stemming/tokenising
search service is unquestioned and Libraries
support metadata-level search.
No-one can support all assumptions and
ideas!
50. Accessible?
• In theory, the books were accessible.
• In practice, it was a real challenge to find
anything viewable.
The chasm between digital and print:
http://samplegenerator.cloudapp.net
51. As this is all in the public domain
anyway...
What’s the harm in making it a bit more
accessible?
The Mechanical Curator twitter account has
only got a handful of people following it
after all. Maybe there isn’t much appetite for
it?
52.
53. Impact?
Hard to measure:
- 20 million hits on average every month,
over 200 million in 10 months*.
- Over 100,000 tags added.
- Hundreds of contributors.
- Iterative crowdsourcing is ongoing.
- Peter Balman’s aforementioned project
* Are image view stats really a good measure?
54.
55.
56.
57. Research and Technology
• Mario Klingemann Pattern Recognition Software
• Collaborative PhD ‘A History of the Printed Image 1750-1850: Applying
Data Science Techniques to Printed Book Illustration’
• TSB Digital Innovation Contest New tech for tracking Public Domain in
the Wild
58. Crowdsourcing & Apps
• Metadata Games
• Wikipedia Synoptic Index
• BL Georeferencer - 3221 maps referenced in a few weeks!
61. Creative Uses
• David Normal installation at Burning Man Festival
• “Moments” by Joe Bell
• Colouring-in Pages for Children
62. Tutorials
• Using Photoshop to Up-res images
• Converting images to vector graphics
63. Collaborations with Colleagues
• Inspired by Flickr, a Sound Archive series
• Maps will be fed into the next phase of the Georeferencer
64. Education
• Images included in Wikipedia Articles
• University of Minnesota English Literature Course Exercise on Tagging
• Art Therapy Courses
65.
66. The ‘British Library Big Data
Experiment’
http://britishlibrary.typepad.co.uk/digital-scholarship/2014/06/the-british-library-big-data-experiment.html
“What can a group of UCL Big Data CS
students do when given access to cloud
computing, all of the book data and a focus
group of digital humanists?”
67. The ‘British Library Big Data
Experiment’
The next phase will work with an undergraduate
team with experience in image analysis.
We are hosting an event on the 18th of
December 2014, on “Pattern Recognition”.
68.
69. In summary, “Clarity”
It is clear that we can:
fail and fail quickly
build experiments that
won’t last
open content
build bridges
70. My contact details for later technical
questions:
ben.osteen@bl.uk
@benosteen
Links:
http://labs.bl.uk
http://mechanicalcurator.tumblr.com
https://flickr.com/photos/britishlibrary
https://github.com/bl-labs
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/12/a-million-first-steps.html
71. Image credits:
Title image: from https://www.flickr.com/photos/britishlibrary/11223645575
Title: "The Book of The Grand Junction Railway, being a history and description of the line from Birmingham to Liverpool and
Manchester ... By T. Roscoe, assisted by the resident engineers of the line"
Author: Roscoe, Thomas.
Shelfmark: "British Library HMNTS 796.f.3."
https://www.flickr.com/photos/britishlibrary/11209677645 - Foot Bridge, Dartmoor
https://www.flickr.com/photos/britishlibrary/11208502325 - The Suspension Bridge
https://www.flickr.com/photos/britishlibrary/11234482436 - Wensleydale & Swaledale
Image taken from page 97 of 'The Mineral Baths of Bath. The Bathes of Bathe's Ayde in the reign of Charles 2nd as
illustrated by a drawing of the King's and Queen's Bath, signed 1675. Whereunto is annexed a Visit to Bath in the year
1675 by “A Person of Q" by The British Library (More from this book here: https://www.flickr.com/search/?tags=sysnum000878624)
Image taken from page 467 of '[The History of New South Wales, including Botany Bay, Port Jackson, Pamaratta [sic],
Sydney, and all its dependancies ... with the customs and manners of the natives, and an account of the English colony,
from its foundation https://www.flickr.com/photos/britishlibrary/11001417405
http://britishlibrary.typepad.co.uk/digital-scholarship/2013/10/peeking-behind-the-curtain-of-the-mechanical-curator.html