  • But I do think this picture holds a number of the clues. I’m being perhaps a bit mysterious here aren’t I?
  • Transcript

    • 1. Large Scale Digitisation Initiatives – the British Library Experience Aly Conteh Digitisation Programme Manager British Library September 2009
    • 2.  
    • 3. What do we have
      • 150 million items
      • Or
      • 650 linear kilometres
      • +
      • 11 kilometres every year
    • 4. 3.5 billion pages
    • 5. 825 million pages
    • 6. 5.5 billion pages
    • 7.  
    • 8. The focus of LSDI
    • 9. Historic Newspapers 1620 - 1900
      • Over 4 million pages digitised
      • Three Challenges
            • How to QA
            • How to Sustain
            • Need for better text extraction
    • 10. How do you Quality Assure 4 million pages?
      • Outsource – but need to QA the QA Process
      • ISO 2859-1: “Sampling procedures for inspection by attributes, Part 1: Sampling schemes indexed by acceptance quality limit (AQL) for lot-by-lot inspection”
      • Automation - JHOVE
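
The idea behind attribute sampling can be illustrated with a small calculation. The sketch below is not taken from the slides or from the ISO tables: it assumes a hypothetical plan (inspect `n` pages, accept the batch if at most `c` are defective) and uses a plain binomial model, which is a reasonable approximation only when the batch is much larger than the sample.

```python
from math import comb


def acceptance_probability(n: int, c: int, p: float) -> float:
    """Probability that a batch is accepted under a single sampling plan.

    n: number of pages inspected (hypothetical, not an ISO table value)
    c: acceptance number, the most defective pages tolerated in the sample
    p: true fraction of defective pages in the batch
    """
    # Binomial probability of finding at most c defects among n sampled pages.
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(c + 1))


if __name__ == "__main__":
    # Illustrative plan only: sample 200 pages, accept with up to 5 defects.
    for defect_rate in (0.005, 0.01, 0.02, 0.05):
        prob = acceptance_probability(200, 5, defect_rate)
        print(f"defect rate {defect_rate:.1%}: P(accept) = {prob:.3f}")
```

Plotting this probability against the defect rate gives the plan's operating characteristic curve, which shows how well a given sample size separates good batches from bad ones.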
    • 11.  
    • 12.  
    • 13. Better Text Extraction January 1874
    • 14.  
    • 15.  
    • 16. They had the internet in 1816! The Morning Chronicle (London, England), Saturday, May 18, 1816; Issue 14678
    • 17. and DVD in 1803! The Morning Chronicle  (London, England), Friday, June 10, 1803; Issue 10625
    • 18. IMPACT
      • Significantly improving mass digitisation of historical printed text by:
        • Innovating OCR software and language technology
        • Sharing expertise and building capacity across Europe
        • Ensuring that tools and services will be sustained after the end of the project
      IMPACT is supported by the European Community under the FP7 ICT Work Programme. The project is coordinated by the National Library of the Netherlands.
    • 19. The IMPACT Consortium
      • Libraries
        • National Library of the Netherlands (KB)
        • The British Library (BL)
        • Bibliothèque nationale de France (BNF)
        • German National Library (DNB)
        • Bavarian State Library (BSB)
        • Göttingen State and University Library (UGOE)
        • Austrian National Library (ONB)
        • University of Innsbruck Library (UIBK)
      • Universities & Research centres
        • Dutch Institute for Lexicology (INL)
        • National Centre for Scientific Research – Demokritos (NCSR)
        • University of Salford (USAL)
        • University of Munich (CIS group)
        • University of Innsbruck (InfMath group)
        • University of Bath (UKOLN)
      • Industry partners
        • IBM (Haifa Research Lab)
        • ABBYY (Moscow)
    • 20. Facts and figures
      • Project supported by the European Community under the FP7 ICT Work Programme.
      • Coordinated by the National Library of the Netherlands (KB)
      • EU funding: € 11 500 000
      • Start date: 1 January 2008
      • Duration: 48 months
      • From 2011: sustainable Centre of Competence with alternative resources
      • Web site: www.impact-project.eu
    • 21. Microsoft – British Library 100,000 or 75,000 19th Century Books
    • 22. Digitising 19th Century Books
    • 23. Two more challenges
      • Greater Capacity
      • How do we store everything?
    • 24. Greater Capacity
    • 25.  
    • 26. The reality
      • Productivity = 2x – 3x
      • At peak: 2 shifts, 6 workstations = 1.5m pcm
      • Integration with book ordering and catalogue systems
      • Project completed 5 months early
    • 27. Only 2 of 20 million pages were damaged during scanning
    • 28. Storage
      • CR2 = 23 MB (438 TB)
      • TIFF = 53 MB (1 PB)
      • JP2K = 17 MB (324 TB)
      • JP2K (lossy) = 4 MB (80 TB)
      • What format is right for your project?
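
The totals in brackets can be reproduced from the per-page sizes. The sketch below assumes they refer to the 20 million pages mentioned on the previous slide, which the slide itself does not state. Under that assumption the CR2, TIFF and lossless JP2K totals match binary units (1 TB = 1024 × 1024 MB), while the lossy JP2K total matches decimal units.

```python
# Per-page sizes in MB, as quoted on the slide.
PAGE_SIZE_MB = {"CR2": 23, "TIFF": 53, "JP2K": 17, "JP2K (lossy)": 4}

# Assumption: 20 million pages, the figure given on the previous slide.
PAGES = 20_000_000


def total_tb(mb_per_page: float, pages: int = PAGES, base: int = 1024) -> float:
    """Total size in TB; base=1024 gives binary units, base=1000 decimal."""
    return mb_per_page * pages / base**2


if __name__ == "__main__":
    for fmt, mb in PAGE_SIZE_MB.items():
        binary = total_tb(mb)
        decimal = total_tb(mb, base=1000)
        print(f"{fmt:13s} {binary:7.1f} TB (binary) {decimal:7.1f} TB (decimal)")
```

Running the same calculation with a project's own page count and file sizes is a quick way to compare formats before committing to one.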
    • 29. Thank you
      • www.bl.uk
      • [email_address]
