Digitisation Workshop Pres 2008(V1)


Slides and notes from a presentation that I gave as part of a masterclass for library managers in April 2008. Some slides contain links and the slides are best read in conjunction with the notes that appear at the bottom of the slideshare screen.

Published in: Technology, Education

  1. 1. Digitisation Revolutionising Library Management Day 2 Sydney, April 2007 Mal Booth – Head, Research Centre
  2. 2. Where am I from? The Memorial’s Research Centre functions as a library and an archive. We develop, manage and provide public access to Australia’s official, personal, & published records of war.
  3. 3. Global trends in digitisation
      • Faster, better, cheaper equipment & storage
      • Better DAMS & CMS software
      • Institutional repositories
      • More audio & film
      • Collaboration
      • Shared collections (eg. Picture Australia)
      • Mass digitisation programs: Google, Microsoft, Yahoo, Open Content Alliance (OCA), Internet Archive
  4. 4. I’m not sure what these are, but they are important!
      • Dynamism
      • Preservation (as a benefit & obligation)
      • Playing
      • Management & planning
      • Compromise
      • Access
  5. 5. Recent Digitisation Examples
      • WW1, WW2, Korea & Vietnam unit war diaries
      • 260k+ images of our collections
      • Official histories (published works)
      • Digitisation on demand
  6. 6. Digitisation on demand Currently running at 90,000 pp p.a.
  7. 7. About one fifth of these images
  8. 8. What we will cover today
      1. GETTING STARTED
          a. Why and what to digitise?
          b. How (preservation/access) & Principles
          c. Copyright and IP considerations (briefly)
          d. Resources needed; in-house or outsource?
          e. Process outline: from planning to long term maintenance (life-cycle)
      2. METHODS, CONTENT & STORAGE
          a. Production: file formats & standards, scanners & cameras, software
          b. Output: indexing, access, search optimisation, delivery options
          c. Storage, ongoing maintenance & management requirements
          d. Just doing it, lessons learned & key issues
  9. 9. Why and what to digitise?
      WHY
      • Increase & broaden access (remote & 24/7)
      • Fragile, valuable &/or unique materials (loss or damage would be catastrophic)
      • Support research & education
      • Anticipating future use or re-use
      • Improved search & retrieval
      • Promoting knowledge, understanding & recognition of collections
      • Relationships to other collections
      • Preservation of at-risk collections by risk reduction & conservation
      WHAT: popular collections; fragile/unique; at-risk; significant priorities; relationships (corporate or collaborative); & what you have the right to digitise!
  10. 10. How: some Principles* - Collections
      (organised groups of objects)
      • Agreed collection development policy
      • Sound description
      • Lifecycle curation
      • Broad access to all
      • Respect for IP
      • Evaluation for use & usefulness
      • Interoperability
      • Integration of staff & user workflows
      • Sustainability & continued usability
      * NISO Framework of Guidance for the Building of Good Digital Collections
  11. 11. How: some Principles - Objects
      (digital assets)
      • Production ensures collection priorities & maintains interoperability and re-use
      • Preservability: persistence & accessibility over time; across evolving media, software & formats
      • Meaningful outside its context: portable, reusable, interoperable
      • Persistent identifiers: URLs or URIs
      • Authentication: veracity, accuracy & authenticity
      • Inclusion of associated metadata: descriptive, administrative & structural
  12. 12. How: some Principles - Metadata
      (selection and implementation of information about objects: descriptive; administrative; technical; structural; & preservation)
      • Appropriate to materials, users and use
      • Support for interoperability: mappings & crosswalks between schemes
      • Use of authority control and content standards
      • Includes a clear statement on conditions of use for the objects (eg. fair use)
      • Support for long term management, eg. PREMIS
      • Metadata records are treated as digital objects
  13. 13. How: some Principles - Initiatives
      (the creation & management of collections)
      • A substantial design and planning component
      • Appropriate staffing and expertise
      • Best practice project management
      • An evaluation plan
      • A project report that documents the process & outcomes
      • Consideration of the entire lifecycle (ongoing management)
  14. 14. Copyright & Intellectual Property (1)
      Concerns:
      • What sort of items are protected by copyright?
      • What is the duration of copyright protection?
      • What sorts of activities infringe copyright?
      • When is a copyright licence required?
      • Understanding the “exceptions” to copyright infringement
      See: Copyright and Cultural Institutions: Short Guidelines for Digitisation by Emily Hudson and Andrew Kenyon & ACC’s Special case exception: education, libraries, collections (deals with the new section 200AB)
  15. 15. IFLA/IPA Statement on Orphaned Works
  16. 19. Resources required (1)
      • Hardware – scanners, cameras, computers, monitors, digital storage, memory & processing power
      • Software – scanning, OCR, office apps, image editing & management, DAM?, video/audio capture, metadata capture?, file conversion, calibration
      • Furnishings – for staff, computers, scanners, storage
      • Facility space – scanning, preparation & storage, QA
      • Specialist staff – curatorial, cataloguers, IT/DBA, web, scanning, project management, conservators
      • Training needs
      • Conservation needs – archival supplies & consultancies
      • Budget funds – salaries, hardware/software purchases & lease, licenses, running/ongoing costs, contingency
      • Corporate support – context within corporate or other priorities and strategies
  17. 20. WW1 Diaries scanning facilities Approximately 200,000 high res. images per year
  18. 21. Outsource or Inhouse?
      Outsource:
      • Contractor responsible for capital equipment, training and technology obsolescence costs
      • No need to find scanning space
      • Less need for digitisation knowledge
      • Economies of scale (& capability for large volumes & throughput)
      • The bureau may be able to achieve a better quality result & have a broader range of services
      • A better fix on costs and timescales (but these can vary widely)
      Inhouse:
      • Better institutional knowledge, understanding & capacity
      • Less risk than working with external parties
      • Better ability to meet specific needs and deadlines?
      • Cheaper costs for oversized or non-standard materials?
      • QA may be more efficient
      • Saving on transport and insurance and less risk with onsite scanning
      • Assured staff and expertise
  19. 22. Dealing with an external bureau
      • Clear contracts are important
      • Choosing a bureau – check with reference sites
      • Range and scope of material - non-standard materials
      • Collaboration with others to achieve further economies of scale may be possible
      • QA can be a project killer
      • Metadata – what will the bureau record?
      • Consider partial outsourcing or bringing a specialist partner onsite
  20. 23. Some funding options
      • Program funding – dependent on corporate priorities
      • User pays – but will they?
      • Grants - eg. http://www.nla.gov.au/chg/
      • Donors or sponsors - from or associated with a web presence
      • Collection Depreciation – depends on valuation and an accounting standard
      • As a training activity – can be a viable learning experience for a small team & project
      • New policy proposals
  21. 24. “Investing in an Intangible Asset”
      • The benefits of long term preservation of digital assets are difficult to value (reliably and objectively), but the costs of not preserving them are high. More information on costs and benefits is needed.
      • Digital preservation is still new, so there is scope for market creation & development, research and experimentation.
      • Information managers know why such programs are important, but find it hard to communicate this to those who control our finances. Business cases based on empirical evidence need something like the balanced scorecard approach to bridge the gap between us and decision makers.
      • Digital preservation is still an organisational innovation and must be managed effectively as it is dependent on independently driven technological developments.
      • From DCC’s Investment in an Intangible Asset
  22. 25. The AWM Document Digitisation Process
  23. 26. Cornell’s digital imaging process map
      • Radiating out from the goals and deliverables of the project are the institutional resources
      • The outer wheel represents the processes or stages of digital imaging initiatives – clockwise from Selection
  24. 27. Draft DCC Curation Lifecycle
  25. 28. PRODUCTION: file formats and standards
      Commonly used formats:
      • TIFF
      • JPEG
      • GIF
      • PDF
      Future formats:
      • JPEG 2000
      • PNG
  26. 29. PRODUCTION: file formats – how and where they are used
  27. 30. PRODUCTION: scanners & cameras
      • Flatbed scanners
      • Map/plan scanners
      • Overhead scanners
      • Digital cameras
      • Book scanners
      • Book-edge scanners
      • Microfilm and slide scanners
  28. 31. PRODUCTION: software
      Image editing software
      • Consider: cost; hardware requirements; usability; functionality
      • Options: Adobe Photoshop CS3 (expensive/best) & Photoshop Elements (cheap); GIMP (free); + proprietary software for RAW files
      • Derivative and PDF production: Acrobat Writer (expensive); ImageMagick (conversion software); Ghostscript (PDF interpreter); & pdftk (PDF toolkit)
      Other useful open source software:
      • JHOVE object validation
      • FedoraCommons object repository management system
      • ebXML e-business suite
      • Xena digital document preservation software (from NAA)
      • DSpace institutional repository system
      • DROID automated batch identification of file formats (from TNA UK)
  29. 32. OUTPUT
      Indexing
      • Most descriptive metadata will come from your MARC records
      • If a separate database is needed: Access, SQL & Oracle
      Access options (also part of just doing it)
      • Collection OPACs, databases, Zoomify, EAD, DVDs, CDs
      • Other: Blogs, Facebook ArtShare, Flickr, Facebook page
      Search engine optimisation
      • How can I create a Google-friendly site?
  30. 33. STORAGE & MAINTENANCE
      Storage
      • Consider: Speed (read/write, data transfer); Capacity; Reliability (stability, redundancy); Standardization; Cost; & Fitness to task
      Management, maintenance & preservation
      • Digital preservation practices
      • Preservation metadata
      • Trusted digital repositories?
  31. 34. What we want
      • Accuracy / authenticity
      • Searchability
      • Easy navigation & download
      • Cost effectiveness
      • Good quality product
      • Text capture and search (OCR) where possible
      • Integration
      • Scalability
      • Web interactivity
      • Simple solutions
      What we are finding / Lessons
      • Cost estimates escalate
      • Technology has limits, but is improving
      • You learn with new technology by doing
      • There is more to copyright than owning it
      • Anticipate needs & increasing expectations
      • $ hard to find for access (sponsorship?)
      • Better management & storage of assets
      • A need to educate managers & suppliers!
      • Keeping trained staff is a challenge
      • Costs/benefits of new technologies (risk?)
      • Importance of QA in projects!
      • Need for a strategic plan(s)
      • Be prepared to compromise
  32. 35. Enterprise Content Management: management, search & web facilities for digital assets and services
      • Extensive digital asset management features
      • Excellent electronic document & record management
      • Intuitive web content management features
      • Facilitate simple and complex workflow processes
      • Extensive and unified searching constructs
      • Scalable
      • Compliant with all government recordkeeping requirements & emerging digital preservation standards
      • Integrate easily with existing Memorial systems
      • Simple to administer in terms of security, auditing & storage management
  33. 36. ECM - Conceptual Overview [diagram; labels as extracted: Other Corporate Systems; Digital Asset Management; Electronic Document & Records Management; Record Management; E:mail; Memorial Intranet; Web Content Management; AJRP Website; Lotus Notes; OAI Interface; FIRST OPAC; MICA OPAC (CAS); CMS; Digital Object Mgmt System DOMS; Biographical Databases & War Diaries; RecordSearch NAA; Collection Mgmt MICA; Library System FIRST; Fund Raising System Raisers Edge; Financial & HR System SAP; POS System, Advance Retail; CAS; Internal Orders; OnLine Shop; Search; Photocopy Quotes; ReQuest; eSales; PICTION]
  34. 37. implementing user-friendly technologies
      • make sure they are findable and usable
      • pick a few “winners” & lead by example
      • collaborate & network
      • get involved in your core business
      • don't leave it to IT staff
      • learn to compromise (the 80:20 rule)
      • experiment
      • start now! it is sometimes easier to seek forgiveness than gain permission
  35. 38. JISC 2007 – five key issues for digitisation
      • Re-focus on the user (simple, easily found & used output)
      • Aggregate and present content that can resonate with multiple communities
      • Learn from Google & YouTube but keep our values
      • New business models are needed, collaborating with and without the private sector
      • More collaboration between publishers, curators, funders, users, vendors and standards bodies