NDNP the Kentucky way


A description of how the University of Kentucky Libraries started digitizing newspapers.

Published in: Education
  • Kentuckiana Digital Library, Beyond the Shelf: over 100,000 pages; over 1,000 titles from microfilm created in a cooperative microfilming project. Good for us to cut our teeth on: practice creating objects from film; practice working in a group that spanned organizational structure; develop process; balance between production and quality control.
  • Kentucky holds an important place in US history because of a strategic geographic location that served as a gateway for the westward expansion of settlers from the 18th century forward. Daniel Boone explored and settled in Kentucky. Kentucky is a border state, sitting at the top of the American South. It was a strategic center during the US Civil War and was also a nexus for the civil rights struggle of African Americans in the 1960s. Lyndon Johnson’s War on Poverty identified the Appalachian regions of Eastern Kentucky as some of the poorest in the nation. Thus materials about Kentucky history hold great significance for US history.
  • Ability to supply print masters ourselves
  • Plus fend off vendors – you are doing this yourself?!
  • Very fast, very sexy. You know your mechanic well.
  • 72 MB for a TIFF of each IA newspaper page; 576 MB for each eight-page issue; 29,952 MB for one year of an eight-page weekly paper … and that’s just the TIFFs. (We produce four for each page, so that’s actually 119,808 MB.)
  • Internet 2 will help. Luckily we have a GA.
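The storage figures in the note above follow from simple multiplication. A minimal Python sketch of that arithmetic, assuming 52 issues per year for a weekly and treating the four files per page as equal in size (as the note's own multiplication does); the constant and function names are illustrative, not part of the project's software:

```python
# Figures taken from the slide note; names are hypothetical.
MB_PER_PAGE_TIFF = 72   # one TIFF of an IA-format newspaper page
PAGES_PER_ISSUE = 8     # an eight-page issue
ISSUES_PER_YEAR = 52    # assumption: a weekly publishes 52 issues a year
FILES_PER_PAGE = 4      # "We produce four for each page"


def storage_mb(pages_per_issue: int, issues_per_year: int, files_per_page: int = 1) -> int:
    """Return the storage in MB needed for one year of one title."""
    return MB_PER_PAGE_TIFF * pages_per_issue * issues_per_year * files_per_page


per_issue = MB_PER_PAGE_TIFF * PAGES_PER_ISSUE
per_year = storage_mb(PAGES_PER_ISSUE, ISSUES_PER_YEAR)
per_year_all_files = storage_mb(PAGES_PER_ISSUE, ISSUES_PER_YEAR, FILES_PER_PAGE)

print(f"{per_issue:,} MB per issue")                          # 576 MB
print(f"{per_year:,} MB per year")                            # 29,952 MB
print(f"{per_year_all_files:,} MB per year, four files/page")  # 119,808 MB
```

Changing `PAGES_PER_ISSUE` or `ISSUES_PER_YEAR` gives a rough estimate for a daily or a longer paper under the same assumptions.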
  • NDNP the Kentucky way

    1. 1. The Kentucky Way: Digitizing Newspapers as a part of the NDNP. Mary Molinaro, molinaro@uky.edu
    2. 2. Kentucky and the NDNP
        • Why did we apply?
        • Why didn’t we outsource?
        • How are we actually doing the work?
        • What did we learn?
        • What’s next?
    3. 3. Taking the logical path
    4. 4. Beyond the Shelf: Serving historic Kentuckiana through virtual access http://kdl.kyvl.org/
    5. 5. “When are you going to digitize newspapers?”
    6. 6. NDNP checklist
        • Successful film to digital experience
        • Know microfilm well
        • Have the master negatives
        • Fits into overall plan for growth of program
        • Opportunity to find our niche
    7. 7. The Proposal: In 45 days
    8. 8. We didn’t propose outsourcing
        • Successful film to digital experience
        • Know microfilm well
        • Have the master negatives
        • Fits into overall plan for growth of program
        • Opportunity to find our niche
        • It never occurred to us!
    9. 9. The case for our content
    10. 10. We highlighted our experience and expertise with newspapers, microfilm, and digitization
    11. 11. Grant awarded, now get to work!
        • Order server
        • Order new scanner
        • Order software pieces and parts
        • Hire project manager
        • Get organized
        • Call meeting of advisory board
    12. 12. Seize opportunity
    13. 13. So how DO we do this?
    14. 14. Title selection
        • Geographically distributed
        • Significant titles
        • Titles that are available
        • What we have in our vault
        • Advisory board recommendations
    15. 15. Microfilm evaluation collects information – and reveals physical problems
        • dirty film
        • circulated master negatives
        • redox
        • rings from hydration
    16. 16. Microfilm evaluation collects information – and reveals intellectual problems
        • Page sequence as filmed: [1], [2], [1], [2], [3], [4], [5], [6] <05.27.1903> | splice | [3], [4], [8], [blank], [1], [2], [7], [8] <05.24.1905>
    17. 17. Microfilm evaluation collects information – and sees metadata challenges
        • Title: The Owingsville Outlook
        • Frequency: Weekly
        • Location: Owingsville, KY
        • File Number: S/83-5
        • Date: 1906: January 25, December 20
        • Notes: some pages are mutilated; *Issues this month are missing (June)
        • Present: 1906-01-25, 1906-02-01, 1906-02-15, 1906-02-22, 1906-03-01, 1906-03-08, 1906-03-15, 1906-04-05, 1906-04-12, 1906-04-19, 1906-04-26, 1906-05-03, 1906-05-10, 1906-05-17, 1906-07-26, 1906-08-02, 1906-08-16, 1906-09-27, 1906-10-11, 1906-11-08, 1906-11-22, 1906-12-20
        • Missing: 1906-02-08, 1906-03-22, 1906-03-29, 1906-05-24, 1906-07-12, 1906-07-19, 1906-08-09, 1906-08-23, 1906-09-06, 1906-09-13, 1906-09-20, 1906-10-04, 1906-10-18, 1906-10-25, 1906-11-01, 1906-11-15, 1906-11-29, 1906-12-06, 1906-12-13
        • Incomplete: 1906-07-05
        • Codes: check mark=present, M=missing, I=incomplete, Mu=mutilated, NP=not published
    18. 18. We have decades of experience with microfilm production – but little experience with negative duplication. But Shell Dunn taught herself how to make print master negatives, troubleshot problematic film, and helped solve a mystery of mottled film.
    19. 19. How is an $84,000 scanner like a sports car?
    20. 20. Large-format microfilm (IA) + NDNP image specifications: scanning and storage challenges
        • 72 MB for a TIFF of each page
        • 576 MB for each eight-page issue
        • 29,952 MB for one year of an eight-page weekly
        • … and that’s just the TIFFs
    21. 21. What makes a good image? … and, remember, newspapers aren’t printed on white paper.
    22. 22. And sometimes papers are filmed on gray camera beds…
    23. 23. Digital Production Application Framework Manages the Digitization Process
        • Diagram labels: Ingest, Automation, Manual Process, Output
    24. 24. Digitization Steps Before Post Processing
        • 1. Ingest (automated)
        • 2. Split/Deskew/Crop (manual)
        • 3. Structural Metadata (manual)
        • 4. Zoning for OCR (manual)
    25. 25. 1. Ingest (Automated)
        • Import images and CSV file into application framework.
        • Create derivative images for use in the application framework.
        • Create new work container in database manager.
    26. 26. 2. Split/Deskew/Crop (Manual)
        • Split any images from IIB-oriented film so that each page image is a distinct file.
        • Deskew by text line for better OCR/OWR.
        • Crop to include page edges.
    27. 27. 3. Structural Metadata (Manual)
        • Key data for page numbers, reel sequence, newspaper section, and any targets included on the film.
    28. 28. 4. Zoning for OCR/OWR (Manual)
        • Plot division lines over page images to create templates that guide the OCR/OWR engines during their recognition process.
        • Ensure preservation of correct reading order in the generated searchable text.
    29. 29. Quality Control
        • Example: Scan through thumbnails of every page image to check for proper skew, split, and crop.
    30. 30. Output: Post Processing
        • Automated process
    31. 31. Validation of Data (Automated)
        • LC Digital Viewer and Validation software parses output to ensure data is present and properly formatted.
        • Writes digital signatures into XML files that have validated successfully.
    32. 32. Lurking under the rocks?
    33. 33. Microfilm – you gotta love it!
        • What time was it shot?
        • Filmed in a tobacco state?
        • What page was that?
        • Page 1 or pages 1, 3, and 5?
    34. 34. Technical Infrastructure
        • Systems support requirements challenging
            • Not an ILS
        • Network issues
        • Storage issues
            • At least 4 copies in the system at one time
    35. 35. Blue skies ahead?
    36. 36. Predicted Benefits
        • Gaining expertise
        • Giving us a niche for this expertise
        • Fun, stimulating work
        • Excellent team working as one
        • Something on which to build other work/projects
        • Building infrastructure
    37. 37. Unpredicted benefits
        • Relationship with iArchives
        • Support from the Dean where it counts
        • We have become experts
        • We found lots of things lurking under the rocks and conquered them
    38. 38. Staff
        • Principal investigator 12%
        • Project Manager 100%
        • Microfilm Manager 10%
        • KDL Director 10%
        • Image Management Specialist 25%
        • Metadata specialist 50%
        • Students - 30 hours per week
    39. 39. Opportunities ahead?
        • Facilitate other institutions’ projects
        • Subcontract work from others
        • Grow future project managers
        • Library school students benefit from experience
        • Literally writing the cookbook
    40. 40. Look at an image?
    41. 41. The Kentucky Way: Digitizing Newspapers as a part of the NDNP. Mary Molinaro, molinaro@uky.edu
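
The issue inventory on slide 17 (present, missing, and incomplete dates for a weekly title) is the kind of record that can be cross-checked mechanically. A minimal Python sketch, assuming the paper appeared every seven days without exception; the function names are hypothetical and this is not the project's actual evaluation tooling:

```python
from datetime import date, timedelta


def expected_weekly_issues(first: date, last: date) -> list[date]:
    """List every publication date from first to last, stepping seven days."""
    issues = []
    current = first
    while current <= last:
        issues.append(current)
        current += timedelta(days=7)
    return issues


def find_gaps(present: set[date], first: date, last: date) -> list[date]:
    """Return expected weekly dates that are not in the set of issues found on film."""
    return [d for d in expected_weekly_issues(first, last) if d not in present]


# The first five "Present" dates from slide 17, used here for illustration only.
present = {
    date(1906, 1, 25),
    date(1906, 2, 1),
    date(1906, 2, 15),
    date(1906, 2, 22),
    date(1906, 3, 1),
}
gaps = find_gaps(present, date(1906, 1, 25), date(1906, 3, 1))
print([d.isoformat() for d in gaps])  # ['1906-02-08']
```

A date flagged this way is only a candidate: the evaluator still has to decide whether the issue is missing (M) or was never published (NP), which is why the slide's codes distinguish the two.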