Evolution of motion picture digitization at the National Library of Medicine


Talk related to my IS&T Archiving2016 conference paper

  1. 1. The Evolution of Historical Moving Pictures Digitization at the National Library of Medicine • John Rees, Archivist and Digital Resources Manager, History of Medicine Division • Archiving 2016, April 21, 2016
  2. 2. • Evolving Digitization Premises • Access • Staffing • Production Workflow (why is it so hard?) • New Digital Preservation Direction
  3. 3. Digitization Premise • Historic Audiovisual Program has ~9,000 titles | General Collection ~29,000 • Started pilot in 2010 with 11 mostly World War II training films • Primary goal is discovery and access • HMD goal was to widen access to content and broaden user audiences • Preservation was a secondary concern – Digital content with high enough quality to satisfy almost all use cases except cinematic – Aligns with NARA’s reformatting approaches for reproduction masters: ing.html • Section 508 compliance is a major production factor • Choices are changing as NLM turns to digitization as a preservation function
  4. 4. Current Analog Preservation Workflows • NLM has a long history of vendor-focused reformatting of original media • BetacamSP copy master; VHS/DVD access derivative • Originals transferred to Iron Mountain cold storage • BetacamSP stored in local cool vault
  5. 5. Access • Primary delivery mechanism is our custom-built Video Player With Search application within Fedora/Blacklight repository infrastructure – Bibliographic and transcript full-text browse/search via Blacklight • Curator also distributes to YouTube channel, Medical Movies on the Web, online subject guides
  6. 6. Staffing • No full-time digitization team • Contributions from a mix of permanent staff, summer students, interns, vendors • No real deep knowledge of film/video digital preservation or production • No robust hardware/software
  7. 7. Production Workflow Processes • Ultimate deliverable is throttled mp4 used with Video Player Application + several downloads • Rip MPEG2 from DVD/BetacamSP • Transcript and caption files for 508 accessibility • No single free/COTS software in 2010 could produce everything • Ingested artifacts from these various processes; lots of ‘what if’ use cases • Moved to Sorenson Squeeze for derivative production
  8. 8. Transcription/Captioning • Automated transcription tools don’t really work • 5:1 ratio is best-case scenario for in-house production; 8:1 more realistic – Transcription vendor well worth the cost • MovieCaptioner captioning software outputs – DFXP caption file (W3C standard, runs our video player with search) – SRT preservation master
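
The ratios on the transcription slide can be turned into a rough labor estimate. The sketch below assumes the usual reading of such ratios (hours of staff time per hour of footage), which the slide does not spell out. The function name and the 10-hour batch are illustrative only.

```python
def transcription_hours(content_hours: float, ratio: float) -> float:
    """Estimate staff hours to transcribe footage.

    Assumption: `ratio` means staff hours per hour of footage,
    so 5.0 stands for the slide's 5:1 and 8.0 for its 8:1.
    """
    if content_hours < 0 or ratio <= 0:
        raise ValueError("content_hours must be >= 0 and ratio must be > 0")
    return content_hours * ratio


# Hypothetical batch: 10 hours of footage.
batch_hours = 10
best_case = transcription_hours(batch_hours, 5.0)   # slide's best case
realistic = transcription_hours(batch_hours, 8.0)   # slide's realistic case
print(f"{batch_hours} h of footage: {best_case:.0f} to {realistic:.0f} staff hours")
```

Under that reading, the gap between the best and realistic cases grows linearly with the amount of footage, which may be part of why the slide finds a transcription vendor worth the cost.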
  9. 9. Old Pre-Ingest SIP
  10. 10. New Pre-Ingest SIP
  11. 11. Pre-Ingest Quality Control
  12. 12. New Preservation Direction • Preservation Section’s desire to modernize practices, move away from BetacamSP • Maintain vendor-driven production paradigm • Digitize now before digitization hardware/vendor market declines, esp. for film • Do more with General Collection content – Large collection of U-matics past their life expectancy – Provide more online access where possible • In-house working group surveyed digital format landscape, debated formats • Contracted with AV Preserve to reinforce our format decisions but also to – develop statements of work for outsourcing all digitization and captioning work – conduct market research for additional, geographically dispersed vendors – increase current duplication throughput – advance quality control and inspection tools
  13. 13. Preservation Format Decisions
  14. 14. Preservation Format Decisions • Tiered Approach to Preservation Masters – align with NARA’s approach again – Video: 10-bit FFV1/Matroska – Film: 10-bit uncompressed DPX | lossless/compressed MJPEG2000/MXF (kinda scared of DPX file sizes) – AVC-Intra 100/MXF mezzanine copy for master/derivative generation – Characterize with MediaInfo/reVTMD for technical and provenance/process metadata • Considered AVI but learned it lacked capacity for captions/audio description and other tech and admin metadata
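
As an illustration of the video tier (10-bit FFV1 in Matroska, then characterization with MediaInfo), here is a minimal sketch that builds the command lines without running them. The slides do not say which tools or settings the vendors use, so the choice of ffmpeg, every encoder setting, and the file names below are assumptions.

```python
import shlex
from pathlib import Path


def ffv1_mkv_command(source: Path, target: Path) -> list[str]:
    """Build an ffmpeg command for a 10-bit FFV1/Matroska master.

    All settings are illustrative assumptions, not NLM's specification.
    """
    return [
        "ffmpeg",
        "-i", str(source),
        "-c:v", "ffv1",             # FFV1 lossless video codec
        "-level", "3",              # FFV1 version 3
        "-pix_fmt", "yuv422p10le",  # 10-bit 4:2:2
        "-g", "1",                  # intra-only: every frame is a keyframe
        "-slices", "16",            # split frames into slices
        "-slicecrc", "1",           # per-slice CRC for later fixity checks
        "-c:a", "copy",             # pass the audio stream through unchanged
        str(target),
    ]


def mediainfo_command(target: Path) -> list[str]:
    """Build a MediaInfo command that reports technical metadata as XML."""
    return ["mediainfo", "--Output=XML", str(target)]


# Hypothetical file names, used only to show the resulting command lines.
master = Path("title_0001.mkv")
print(shlex.join(ffv1_mkv_command(Path("title_0001_capture.mov"), master)))
print(shlex.join(mediainfo_command(master)))
```

Building the commands as lists keeps file names with spaces safe if they are later passed to `subprocess.run`.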
  15. 15. Project Scope • Collection Survey – 3,800 U-matics (38,000 hours) – 2,300 films (1,150 hours) • Start pilot with 100 U-matic titles and a few films from HAV with ultimate goal of maintaining usual 700-800 titles per year reformatting rate • Storage needs – 200+ TB for FFV1 – 1,200+ TB for DPX – 600 TB for MJPEG2000
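
The project-scope figures support a quick back-of-the-envelope check. The sketch below rests on two assumptions that are not stated on the slide: that each surveyed tape or film counts as one title, and that the DPX and MJPEG2000 storage estimates both cover the full 1,150 film hours. The function and constant names are illustrative.

```python
# Figures quoted on the Project Scope slide.
UMATIC_ITEMS = 3_800
FILM_ITEMS = 2_300
FILM_HOURS = 1_150
TITLES_PER_YEAR = (700, 800)  # usual reformatting rate, low and high
FILM_STORAGE_TB = {"DPX": 1_200, "MJPEG2000": 600}  # slide gives DPX as "1,200+"


def years_to_finish(items: int, titles_per_year: int) -> float:
    """Years needed at a steady rate (assumes one item equals one title)."""
    return items / titles_per_year


def tb_per_hour(total_tb: float, hours: float) -> float:
    """Average storage per hour of content implied by the totals."""
    return total_tb / hours


total_items = UMATIC_ITEMS + FILM_ITEMS
slow = years_to_finish(total_items, TITLES_PER_YEAR[0])
fast = years_to_finish(total_items, TITLES_PER_YEAR[1])
print(f"{total_items} items: about {fast:.1f} to {slow:.1f} years")

for fmt, total_tb in FILM_STORAGE_TB.items():
    rate = tb_per_hour(total_tb, FILM_HOURS)
    print(f"{fmt}: roughly {rate:.2f} TB per film hour")
```

At the stated rate, the surveyed material alone represents the better part of a decade of work, which fits the slide's plan to start with a small pilot.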
  16. 16. Credits Co-authors: John Doyle and Doron Shalvi AV Preservation Working Group: Walter Cybulski, Sarah Eilers, Sandra Kim, Felix Kong, Ben Petersen, Karen Sinkule, Rebecca Warlow