Digital Berkshire, April 2012: Chris Clark, British Library PT#2


Published in: Technology, Education
  • More than 200 people poured into the Museum of Lancashire in Preston at the weekend to have their loved ones’ precious items digitised for the virtual archive. Queuing began an hour before doors opened on Saturday (10.03.12), and the crowds continued to stream in until they closed nine hours later. Many people had travelled from as far afield as Leeds, Manchester, Birkenhead, Liverpool and Warrington just to be there. More than 2,300 images were taken of a wide variety of items, including letters, diaries, medals, birth and death certificates, nurses’ autograph books, cartoons, pictures and trench art (everyday objects made from anything the soldiers found, such as shell casings and spent ammunition). The Preston roadshow is the latest in a series that is being rolled out across 10 countries in Europe this year to create a unique pan-European account of WW1 that is available to everyone. Europeana 1914-1918 brings together a partnership of libraries, museums, academic and cultural institutions, which in the UK includes the British Library, Oxford University, JISC and Lancashire County Council.
  • Commercial partners. Pro: it gets the job done. Con: it can lock up information in silos while investment is recouped. The first works to be digitised by Google will range from feminist pamphlets about Queen Marie-Antoinette (1791), to the invention of the first combustion engine-driven submarine (1858), and an account of a stuffed hippopotamus owned by the Prince of Orange (1775).
  • Luke McKernan (Lead Curator, Moving Image) and AHRC
  • Showing relationships between Target, Subject, Event and Special Collection. The Targets are colour-coded by high-level subject area, and the size of each node represents the number of target instances. The UK web domain: 9 million .uk domain names registered in December 2010, plus ~1 million using other domain names; growing at 11% to 14% per year; 40% estimated to be in scope for Legal Deposit; estimated ~110TB for each UK domain crawl. The traditional “document-centric” approach does not scale up, and the canonical mission of heritage institutions is being challenged. There are many technical challenges, above all the constant need to respond to the evolving web. Harvests are at best snapshots or samples; we cannot get everything because of resource and legal constraints. The crawler works well with HTML but struggles to capture advanced web content, e.g. rich media, dynamic and interactive content. Rendering software does not always “replay” the archived content, and cannot replay streaming media. Risks of “republishing”: libel, copyright. Legal Deposit offers some protection, but access is restricted to the premises of Legal Deposit institutions.
  • This work was only continued in England when the threat of attack by France loomed, with the start of the Napoleonic Wars in 1795. It started on a scale of six inches to the mile for the south coast, which was reduced at a later point. Thenceforth all mapping was to be done by the Board of Ordnance. Done by 1815. The original 1-inch maps of England were based upon this. I’ve taken a detail of Exeter.
  • These geo-tools allow historic maps to be overlaid and combined with modern mapping, enhancing the ability to compare and analyse the representation, and enabling searching by placename.
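
The UK web domain figures quoted in the web archiving note above (9 million .uk names in December 2010, growing at 11% to 14% per year) can be turned into a rough projection. The sketch below is only an illustration: it assumes the growth rate stays constant and compounds annually, which the notes do not claim, and the function name `project_domains` is made up for this example.

```python
# Figures quoted in the notes: 9 million .uk names in December 2010,
# growing at 11% to 14% per year.
BASE_UK_DOMAINS = 9_000_000
LOW_GROWTH, HIGH_GROWTH = 0.11, 0.14


def project_domains(base: float, annual_growth: float, years: int) -> float:
    """Compound `base` forward by `years` at a constant annual growth rate.

    Assumption (not from the slides): the rate is constant and compounds yearly.
    """
    return base * (1 + annual_growth) ** years


if __name__ == "__main__":
    # Print the low and high estimates for 2010 through 2015.
    for years in range(0, 6):
        low = project_domains(BASE_UK_DOMAINS, LOW_GROWTH, years)
        high = project_domains(BASE_UK_DOMAINS, HIGH_GROWTH, years)
        print(f"{2010 + years}: {low / 1e6:.1f}m to {high / 1e6:.1f}m .uk names")
```

Even under this simple model the gap between the low and high estimates widens every year, which is one reason a per-crawl storage figure such as ~110TB is best read as a snapshot rather than a fixed budget.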

    1. Scale & materiality: not individual, standard documents but vast collections of them; authenticity demands multiplicity of versions. Cost: preservation not by individuals but by large organizations. Intellectual property: if content is worth saving, someone is making money from it.
    2. http://youtube-global.blogspot.com/2011/05/
    3. “The challenge for libraries is to find ways to preserve platform dependent digital works and to prevent the loss of complex digital media…. Since we cannot possibly save everything, we need to carefully consider which digital materials are the most important to preserve and try to anticipate the needs of future scholars and researchers” (Marlene Manoff, 2006). If preservation priority is X and user need is Y, what are the values of X and Y?
    4. If sustainability means that information is kept useful and available, then the LOCKSS approach has real merit! It implies that SERVICES must be preserved as well.
    5. Abundance of stored content: attention is scarce & must be earned. (Diagram labels: Services, Content.)
    6. Platforms to focus on in 2012+. Maintain active presence on: (platform logos not captured in the transcript). Continue to assess: (platform logos not captured in the transcript).
    7. Digital Research & Curator Team (diagram). Recoverable labels: partnership, users, systems & services; digital scholars, commercial, academic consortia, social networks, digital media, funding bodies, machines; search & retrieval, curatorial services, preservation, exhibitions, marketing, eIS/STM services.
    8. Extend: Training & development: seminars, conferences, events, ‘Digital Conversations’ + ‘Tooling up’. Collaborate: Digital Scholarship: horizon scanning, Tech Watch, communities of practice, consortia. Consolidate: Digital Curation as collaborative process: acquisitions, workflows, tools, project management, funding, exhibitions & marketing.
    9. Europeana – SB Berlin: Centenary of the outbreak of the First World War. Will create a European corpus of digitised materials concerning the First World War in all its aspects. Will contribute to Europeana a substantial collection of more than 400,000 outstanding sources. User generated content: Roadshows in 10 countries to create a unique pan-European archive. Preston event produced more than 2,300 images from letters, diaries, medals, pictures, trench art, and more.
    10. British Newspaper Archive: British Library and brightsolid online publishing. Up to 40 million newspaper pages from the British Library’s collection over 10 years. Collection includes runs of most newspapers published in the UK since 1800. Over 4m pages added since launch. Google Books: A 6-year project starting June 2011. 250,000 books, 1700-1870. From the French Revolution to the end of slavery. Material in major European languages. Focus on books that are not yet freely available in digital form online. Access via Google Books and BL. Storage at Google and BL. Contract and terms available on the web!
    11. Broadcast News: TV & radio news receivable in the UK, since May 2010, e.g. Al-Jazeera English, CNN, France 24, Russia Today. Search subtitles (where available). AHRC-funded project looking at speech-to-text technologies for opening up audio and video archives. Project will index 3,00 hours of TV news and 3,000 hours of radio content. IMPACT Historic Text: Improve the digital accessibility of printed text produced before 1900. OCR does not produce satisfactory results for old books, magazines and newspapers. Historic materials have archaic fonts, complex layouts, warped or degraded pages. Manual post-correction is slow and expensive.
    12. Early music on-line: digitised 300 volumes (21k images) of rare early printed music from the British Library’s collections. Open educational licence encourages use and re-purposing of content and embedding in teaching and research. Detailed inventories of the books’ contents created for the first time, with access points for composer and title. Data included in British Library catalogue, COPAC and RISM music database, with links to digitised content. Digital images provided to Aruspix, which is developing an OCR and transcription tool for early
    13. Personal digital archives: Data analysis beyond documents. Use computer forensics. Capture, management, description, and preservation of personal digital collections to facilitate access and analysis. Archives range from poets (W Cope) and playwrights (H Pinter) to computer scientists (D Michie) and biologists. Web archives: Create a research collection of UK websites. Develop high-impact data analytical access services. Demonstrate the potential of domain level web archives, or the “haystacks”. UK web domain > 9m .uk domain names. Estimate 110TB/crawl.
    14. Goal: Builds on previous crowd-sourcing projects, e.g. UK SoundMap. Addressed key challenges: awareness, engagement, productivity at scale. Approach: Accessible and convenient application. Immediate results and feedback. Competitive tools. Recognition and visible contribution.
    15. What is georeferencing? Ordnance Surveyors Drawing 40 (detail). Pen and ink on paper. 1801. British Library, Maps OSD 40(3).
    16. Results: 725 maps assigned spatial metadata over 5 days. Publicity minimal; social media key. ~90 participants; top five completed half the work. Data quality good: <3% had errors.
    17. T-Pen Transcription UI
    18. Evolution by projects and commercial ties tends to reduce interoperability and inconveniences the researcher. International collaborations, such as the International Image Interoperability Framework, seek a shared canvas.
    19. ARROW project: a tool to assist ‘diligent search’ and provide faster answers to: Rights status? Rightsholders? Can I digitise? Timeline: 2008 to 2013. ARROW: 29 partners. Libraries, BIP, Reprographic Rights Organisation (UK). 12 countries (Austria, Denmark, France, Finland, Germany, Italy, the Netherlands, Norway, Slovenia, Spain, Sweden, UK). Pilots: Germany, France, Spain, UK. Books only. ARROW Plus: 36 partners. 14 countries (Austria, Belgium, Bulgaria, Germany, Greece, Hungary, Ireland, Italy, Latvia, Lithuania, the Netherlands, Poland, Portugal, Spain). Books and images in books.
    20. ARROW benefits: Automated (where it can be; still some manual processes), and therefore saves time and cost. ARROW search = 5% of manual search time. National partners working together across different sectors. Domain partners working together across countries.
    21. Persistent enquiry: can I use this? Open Knowledge Foundation; Creative Commons Licenses; Persistent URLs.
    22. “Six decades into the computer revolution, four decades since the invention of the microprocessor, and two decades into the rise of the modern internet, all of the technology required to transform industries through software finally works and can be delivered at global scale.” Marc Andreessen, ‘Why software is eating the world’, Wall Street Journal, August 20 2011.
    23. Our vision: In 2020, the British Library will be a leading hub in the global information network, advancing knowledge through our collections, expertise and partnerships, for the benefit of the economy and society and the enrichment of cultural life. If Andreessen is right, we may not be talking in 2020 about digital libraries and digital curators but an agency for the curation and creation of software.
    24. @chrisleeclark