Digitizing Spectator - Libraries Digital Program



  1. Columbia Spectator Archive: Progress Report on Phase 1. Stephen Paul Davis, Columbia University Libraries Digital Program, June 27, 2012
  2. The Plan
     • Partnership between Columbia Libraries / Information Services and the Spectator
     • High-quality scanning of original Spectator issues from Columbia University Archives and the Spectator Editorial Offices
     • State-of-the-art text processing (OCR) of scanned images to allow searching at the article level
     • Feature-rich online presentation
     • Permanent, long-term digital preservation
  3. The Players
     • The Spectator staff and board
     • University Archives
     • Libraries’ Preservation & Digital Conversion Division
     • Libraries’ Digital Program Division
     • Libraries’ Information Technology Division
     • Digital Data Divide
     • Brechin Imaging Services
     • Digital Library Consulting (Veridian provider)
     • Cornell University Libraries [behind the scenes]
  4. The Context
     Columbia Libraries Digital Program’s mission:
     • To carry out digitization and access projects chiefly from Columbia’s rare and special collections (2002-)
     • To build and support Columbia’s long-term digital preservation infrastructure (2010-)
     • To develop and support preservation of and access to born-digital archival collections (2011-)
  5. Columbia Libraries Digitization Program
     • Digitization Projects (Digital Scriptorium, APIS (papyrus project), John Jay Papers, Herbert Lehman Papers, etc.)
     • Digital Exhibitions (See especially: Core Curriculum: CC, Core Curriculum: LitHum, 1968: Columbia in Crisis, Varsity Show)
     • ‘Born-Digital’ & Web Archives (Columbia University, Human Rights Organizations, etc.)
  6. Columbia’s Technology Platforms
     Columbia University Libraries / Information Services has a:
     • robust repository infrastructure that follows
     • national and international standards and
     • ‘best practices’ to support
     • digital publishing and
     • long-term digital preservation
  7. Columbia’s Repository & Preservation Infrastructure: Schematic Overview
  8. Newspaper Access …
     • Providing flexible access to newspaper content is complicated and expensive
     • It is not cost-effective for single institutions to build custom, newspaper-oriented software
     • Only two major vendors provide software optimized for newspapers
     • DL Consulting’s Veridian is by far the better and more frequent choice for research libraries
  9. Spectator Stats
     Spectator run from 1877-2009:
     • Number of volumes = 155
     • Estimated no. of pages = 79,145
     • Average pages per volume = 500 (wide variation!)
     • Est. vols. requiring disbinding = 100
     • Est. vols. unable to be digitized = 10
     NB: Most volumes contain severely brittle paper; only 24 volumes have flexible paper
  10. Why Scan From Originals?
  11. Scanning from originals retains visual content (6 May 1968)
  12. Tiny sampler of Spec Archive images
  13. Images: 11 October 1956; 19 February 1957
  14. Image: 29 September 1959
  15. Images: 3 December 1973; 27 October 1961
  16. Images: 2 October 1972; 7 March 1974
  17. Challenges of Scanning from Originals
  18. Disbinding fragile pages
  19. Repairing and Conserving
  20. Preservation Boxing (for shipping & long-term storage)
  21. Phase 1 Completion
      • Prep, rehouse, digitize & encode Spec volumes for 1955-1992: completed June 15th
      • Load into Veridian test system: June 29th
      • Design Spectator Archive website: July 15th
      • Move test system to production environment: July 30th
      • Do user testing and quality review: August 15th
      • Launch new public site: September 4th
  22. Demo of Test System
      • 1964: http://tinyurl.com/78hhypj
      • 1968: http://tinyurl.com/7jk6ynz
      • 1973: http://tinyurl.com/7gu55p6
      • 1983: http://tinyurl.com/7dq8zly
      • Searching “coeducation”: http://tinyurl.com/7cwd95g
      • Partial content list: http://tinyurl.com/7q8w4nq
      [Note that these are all temporary links that work as of 6/28/2012 but which will stop working altogether at some point in the next few weeks.]
  23. Phase 2 Goals
      Finish the Project! (Prep, rehouse, repair, digitize & encode Spec volumes for 1877-1954 and 1992-2009)
  24. Phase 2 Costs (for ca. 55,000 pages)
      • Preparation, rehousing, repair = will be covered by CU Libraries
      • Scanning of 55,000 pages = $55,000 + $5,000 contingency
      • OCR, segmentation, selective text correction = $55,000 + $5,000 contingency
      • Load into host system, license, maintenance = already covered by CU Libraries
      • Long-term preservation of master image (TIFF) files = may require additional fundraising
  25. Final, key points
      • The Spectator Archive project is extremely important for preservation of and access to Columbia University’s history
      • This is an archival preservation project as well as an information access project
      • Columbia Libraries is making a major, long-term investment to ensure the success of this project
      • The Libraries and the Spec have made a great start, but additional funding is needed to complete the job
  26. Questions
      Stephen Paul Davis, Director, Libraries Digital Program, Columbia University, daviss@columbia.edu
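
The "Phase 2 Costs" slide lists dollar figures for two line items but does not state a total. As a rough sketch only, the arithmetic can be worked through as below. The figures come from the slide; the dictionary layout, the function name `phase2_summary`, and the assumption that the two priced line items are the only costs to be raised are illustrative and not part of the presentation. Items marked as covered by CU Libraries or as possibly needing further fundraising are left out because the slide gives no amounts for them.

```python
# Figures from the "Phase 2 Costs" slide (ca. 55,000 pages).
# The data layout and function name are hypothetical, for illustration only.
PAGES = 55_000

line_items = {
    "scanning": {"base": 55_000, "contingency": 5_000},
    "ocr_segmentation_text_correction": {"base": 55_000, "contingency": 5_000},
}


def phase2_summary(pages, items):
    """Total the priced line items and derive per-page costs."""
    base = sum(item["base"] for item in items.values())
    contingency = sum(item["contingency"] for item in items.values())
    total = base + contingency
    return {
        "base": base,
        "contingency": contingency,
        "total": total,
        "base_per_page": base / pages,
        "total_per_page": total / pages,
    }


summary = phase2_summary(PAGES, line_items)
print(f"Base cost:          ${summary['base']:,}")
print(f"Contingency:        ${summary['contingency']:,}")
print(f"Total to be raised: ${summary['total']:,}")
print(f"Per page (base):    ${summary['base_per_page']:.2f}")
print(f"Per page (total):   ${summary['total_per_page']:.2f}")
```

Under these assumptions the priced items come to $110,000 plus $10,000 contingency, or about $2 per page before contingency. Any cost for long-term preservation of the master image files would be additional.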