Digitisation workshop pres 2009(v1)


Slides from a half-day workshop that I gave a couple of times in 2009. Better late than never, I suppose. You need to read my blog post here: for an explanation about some slides and for references.

Published in: Education

  1. 1. Managing Digitisation Programs<br />Workshop<br />Sydney, <br />16 July 2009<br />Mal Booth – DERSU<br />
  2. 2. My background? The Australian War Memorial’s Research Centre functions as a library and an archive. It develops, manages and provides public access to Australia’s official, personal, & published records of war.<br />
  3. 3. Global trends in digitisation<br /><ul><li>Faster, better, cheaper equipment & storage
  4. 4. Better DAMS & CMS software
  5. 5. Institutional & shared repositories
  6. 6. More audio & film
  7. 7. Collaboration
  8. 8. Shared collections (eg. Picture Australia)
  9. 9. Mass digitisation programs: Google, Microsoft, Yahoo, Open Content Alliance (OCA), Internet Archive
  10. 10. Pressure for online access & pressures on real storage space</li></li></ul><li>I’m not sure what these are, but they are important!<br /><ul><li>Dynamism
  11. 11. Preservation (as a benefit & obligation where necessary)
  12. 12. Playing
  13. 13. Management & planning
  14. 14. Compromise
  15. 15. Access</li></li></ul><li>Recent Examples - AWM<br />WW1, WW2, Korea & Vietnam unit war diaries<br />260k+ images of our collections<br />Official histories (published works)<br />Digitisation on demand <br />
  16. 16. Digitisation for Access<br />c. 90,000 pp per year<br />
  17. 17. Recent Examples – UTS Library<br />Supporting Teaching & Learning<br /><ul><li>Digital Resource Register
  18. 18. Alternative Format Service
  19. 19. Exam Papers</li></ul>Access only<br />Supporting Research<br /><ul><li> eScholarship (UTS ePress, iResearch, eData)
  20. 20. Australian Digital Theses Collection</li></ul>Access & Preservation (data curation)<br />
  21. 21. About one fifth of these images<br />
  22. 22. What we will cover today<br />1. GETTING STARTED<br />a. Why and what to digitise?<br />b. How (preservation/access) & Principles<br />c. Copyright and IP considerations (briefly)<br />d. Resources needed; in-house or outsource?<br />e. Process outline: from planning to long term maintenance (life-cycle)<br />2. METHODS, CONTENT & STORAGE<br />a. Production: file formats & standards, scanners & cameras, software<br />b. Output: indexing, access, search optimisation, delivery options<br />c. Storage, ongoing maintenance & management requirements<br />d. Just doing it, lessons learned & key issues<br />
  23. 23. Why and what to digitise?<br />WHY<br />Increase & broaden access (remote & 24/7)<br />Fragile, valuable &/or unique materials (loss or damage would be catastrophic)<br />Support research & education<br />Anticipating future use or re-use<br />Improved search, retrieval & storage<br />Promoting knowledge, understanding & recognition of collections<br />Relationships to other collections<br />Preservation of at-risk collections by risk reduction & conservation<br />WHAT: popular collections; fragile/unique; at-risk; significant priorities; relationships (corporate or collaborative); & what you have the right to digitise!<br />
  24. 24. How: some Principles* - Collections<br />(organised groups of objects)<br />Agreed collection development policy<br />Sound description<br />Lifecycle curation<br />Broad access to all<br />Respect for IP<br />Evaluation for use & usefulness<br />Interoperability<br />Integration of staff & user workflows<br />Sustainability & continued usability<br />* NISO Framework of Guidance for the Building of Good Digital Collections <br />
  25. 25. How: some Principles - Objects<br />(digital assets)<br />Production ensures collection priorities & maintains interoperability and re-use<br />Preservability: persistence & accessibility over time; across evolving media, software & formats<br />Meaningful outside its context: portable, reusable, interoperable<br />Persistent identifiers: URLs or URIs<br />Authentication: veracity, accuracy & authenticity<br />Inclusion of associated metadata: descriptive, administrative & structural<br />
  26. 26. How: some Principles - Metadata<br />(selection and implementation of information about objects: descriptive; administrative; technical; structural; & preservation)<br />Appropriate to materials, users and use<br />Support for interoperability: mappings & crosswalks between schemes<br />Use of authority control and content standards<br />Includes a clear statement on conditions of use for the objects (eg. fair use)<br />Support for long term management, eg. PREMIS<br />Metadata records are treated as digital objects<br />RUBRIC overview:<br /><br />
  27. 27. How: some Principles - Initiatives<br />(the creation & management of collections)<br />A substantial design and planning component<br />Appropriate staffing and expertise<br />Best practice project management<br />An evaluation plan<br />A project report that documents the process & outcomes<br />Consideration of the entire lifecycle (ongoing management)<br />
  28. 28. Copyright & Intellectual Property (1)<br />Concerns:<br />What sort of items are protected by copyright? <br />What is the duration of copyright protection? <br />What sorts of activities infringe copyright? <br />When is a copyright licence required?<br />Understanding the “exceptions” to copyright infringement<br />See: Copyright and Cultural Institutions: Short Guidelines for Digitisation by Emily Hudson and Andrew Kenyon<br />& ACC’s Special case exception: education, libraries, collections (deals with the new section 200AB)<br />
  29. 29. IFLA/IPA Statement on Orphaned Works<br />
  30. 30.
  31. 31.
  32. 32.
  33. 33. Resources required (1)<br />Hardware – scanners, cameras, computers, monitors, digital storage, memory & processing power<br />Software – scanning, OCR, office apps, image editing & management, DAM?, video/audio capture, metadata capture?, file conversion, calibration<br />Furnishings – for staff, computers, scanners, storage<br />Facility space – scanning, preparation & storage, QA<br />Specialist staff – curatorial, cataloguers, IT/DBA, web, scanning, project management, conservators<br />Training needs<br />Conservation needs – archival supplies & consultancies<br />Budget funds – salaries, hardware/software purchases & lease, licenses, running/ongoing costs, contingency<br />Corporate support – context within corporate or other priorities and strategies<br />
  34. 34. WW1 Diaries scanning facilities<br />Approximately 200,000 high res. images per year<br />
  35. 35. Outsource or In-house?<br />Contractor responsible for capital equipment, training and technology obsolescence costs<br />No need to find scanning space<br />Less need for digitisation knowledge<br />Economies of scale (& capability for large volumes & throughput)<br />The bureau may be able to achieve a better quality result & have a broader range of services<br />A better fix on costs and timescales (but these can vary widely)<br /><ul><li>Better institutional knowledge, understanding & capacity
  36. 36. Less risk than working with external parties
  37. 37. Better ability to meet specific needs and deadlines?
  38. 38. Lower costs for oversized or non-standard materials?
  39. 39. QA may be more efficient
  40. 40. Saving on transport and insurance and less risk with onsite scanning
  41. 41. Assured staff and expertise </li></li></ul><li>Dealing with an external bureau<br /><ul><li>Clear contracts are important
  42. 42. Choosing a bureau – check with reference sites
  43. 43. Range and scope of material - non-standard materials
  44. 44. Collaboration with others to achieve further economies of scale may be possible
  45. 45. QA can be a project killer
  46. 46. Metadata – what will the bureau record?
  47. 47. Consider partial outsourcing or bringing a specialist partner onsite</li></li></ul><li>Some funding options<br /><ul><li>Program funding – dependent on corporate priorities
  48. 48. User pays – but will they?
  49. 49. Grants - eg.
  50. 50. Donors or sponsors - from or associated with a web presence
  51. 51. Collection Depreciation – depends on valuation and an accounting standard
  52. 52. As a training activity – can be a viable learning experience for a small team & project
  53. 53. New policy proposals</li></li></ul><li>“Investing in an Intangible Asset”<br />The benefits of long term preservation of digital assets are difficult to value (reliably and objectively), but the costs of inaction are high. More information on costs and benefits is needed.<br />Digital preservation is still new, so there is scope for market creation & development, research and experimentation.<br />Information managers know why such programs are important, but find it hard to communicate this to those who control our finances. Business cases based on empirical evidence need something like the balanced scorecard approach to bridge the gap between us and decision makers.<br />Digital preservation is still an organisational innovation and must be managed effectively as it is dependent on independently driven technological developments.<br />From DCC’s Investment in an Intangible Asset<br />
  54. 54. The AWM Document Digitisation Process<br />
  55. 55. Cornell’s digital imaging process map<br /><ul><li>Radiating out from the goals and deliverables of the project are the institutional resources
  56. 56. The outer wheel represents the processes or stages of digital imaging initiatives – clockwise from Selection</li></li></ul><li>
  57. 57. PRODUCTION: file formats and standards<br />Commonly used formats:<br /><ul><li>TIFF
  58. 58. JPEG
  59. 59. GIF
  60. 60. PDF (accessible text!)</li></ul>Contemporary & future formats:<br /><ul><li>JPEG 2000
  61. 61. PNG
  62. 62. DNG</li></li></ul><li>PRODUCTION: file formats – how and where they are used<br />
  63. 63. PRODUCTION: scanners & cameras<br /><ul><li>Flatbed scanners
  64. 64. Map/plan scanners
  65. 65. Overhead scanners
  66. 66. Digital cameras
  67. 67. Book scanners
  68. 68. Book-edge scanners
  69. 69. Microfilm and slide scanners</li></li></ul><li>PRODUCTION: software<br />Image editing software<br /><ul><li> Consider: cost; hardware requirements; usability; functionality
  70. 70. Options: Adobe Photoshop CS3 (expensive/best) & Photoshop Elements (cheap); GIMP (free); + proprietary software for RAW files
  71. 71. Derivative, OCR and pdf production: Adobe Acrobat 9 Pro; OmniPage; ImageMagick (conversion software); Ghostscript (pdf interpreter); & pdftk (pdf toolkit)</li></ul>Other useful open source software:<br /><ul><li>JHOVE object validation
  72. 72. FedoraCommons object repository management system
  73. 73. ebXML e-business suite
  74. 74. Xena digital document preservation software (from NAA)
  75. 75. DSpace institutional repository system
  76. 76. DROID automated batch identification of file formats (from TNA UK)
  77. 77. OpenEdit ; Razuna ; ResourceSpace - Open source & free DAM software</li></li></ul><li>OUTPUT<br />Indexing<br /><ul><li> Most descriptive metadata will come from your MARC records
  78. 78. If a separate database is needed: Access, SQL & Oracle</li></ul>Access options (also part of just doing it)<br /><ul><li>Collection OPACs, databases, Zoomify, EAD, DVDs, CDs
  79. 79. Other: Blogs, Facebook ArtShare, Flickr, Flickr Commons, Facebook page</li></ul>Search engine optimisation<br /><ul><li>How can I create a Google-friendly site? </li></li></ul><li>STORAGE & MAINTENANCE<br />Storage<br />Consider: Speed (read/write, data transfer); Capacity; Reliability (stability, redundancy); Standardization; Cost; & Fitness to task<br />Management, maintenance & preservation<br /><ul><li> Digital preservation practices
  80. 80. Preservation metadata
  81. 81. Trusted digital repositories?</li></li></ul><li>Lessons<br />What we want<br />What we are finding<br /><ul><li>Accuracy / authenticity
  82. 82. Accessibility
  83. 83. Searchability
  84. 84. Easy navigation & download
  85. 85. Cost effectiveness
  86. 86. Good quality product
  87. 87. Text capture and search (OCR) where possible
  88. 88. Integration
  89. 89. Scalability
  90. 90. Web interactivity
  91. 91. Simple solutions
  92. 92. Cost estimates escalate
  93. 93. Technology has limits, but is improving
  94. 94. You learn with new technology by doing
  95. 95. There is more to copyright than owning it
  96. 96. Anticipate needs & increasing expectations
  97. 97. $ hard to find for access (sponsorship?)
  98. 98. Better management & storage of assets
  99. 99. A need to educate managers & suppliers!
  100. 100. Keeping trained staff is a challenge
  101. 101. Costs/benefits of new technologies (risk?)
  102. 102. Importance of QA in projects!
  103. 103. Need for a strategic plan(s)
  104. 104. Be prepared to compromise</li></li></ul><li>Enterprise Content Management: management, search & web facilities for digital assets and services<br />Extensive digital asset management features<br />Excellent electronic document & record management<br />Intuitive web content management features<br />Facilitate simple and complex workflow processes<br />Extensive and unified searching constructs<br />Scalable<br />Compliant with all government recordkeeping requirements & emerging digital preservation standards<br />Integrate easily with existing systems<br />Simple to administer in terms of security, auditing & storage management<br />
  105. 105. implementing user-friendly technologies<br /><ul><li>make sure they are findable and useable
  106. 106. pick a few “winners” & lead by example
  107. 107. collaborate & network
  108. 108. get involved in your core business
  109. 109. don't leave it just to IT staff (get involved)
  110. 110. learn to compromise (the 80:20 rule)
  111. 111. experiment
  112. 112. start now! it is sometimes easier to seek forgiveness than gain permission</li></li></ul><li>JISC 2007 – five key issues for digitisation<br />Re-focus on the user (simple, easily found & used output) <br />Aggregate and present content that can resonate with multiple communities <br />Learn from Google & YouTube but keep your values<br />New business models are needed, collaborating with and without the private sector<br />More collaboration between publishers, curators, funders, users, vendors and standards bodies<br />
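
The STORAGE & MAINTENANCE slide lists capacity as something to consider, and the WW1 diaries slide quotes approximately 200,000 high-resolution images per year. As a rough illustration only, the sketch below estimates how much storage a year of uncompressed masters might need. The page size (A4), resolution (300 ppi), colour depth (24-bit) and the number of copies kept are all assumptions made for the example, not figures from the workshop, and the function names are hypothetical.

```python
def uncompressed_image_bytes(width_in, height_in, ppi, bits_per_pixel=24):
    """Approximate size in bytes of one uncompressed raster image."""
    width_px = round(width_in * ppi)
    height_px = round(height_in * ppi)
    return width_px * height_px * bits_per_pixel // 8


def yearly_storage_tb(images_per_year, bytes_per_image, copies=1):
    """Storage in terabytes (10**12 bytes) for one year of scanning."""
    return images_per_year * bytes_per_image * copies / 10**12


# Assumed example page: A4 (8.27 x 11.69 inches), 300 ppi, 24-bit colour.
page_bytes = uncompressed_image_bytes(8.27, 11.69, 300)

# 200,000 images per year is the figure quoted on the slide;
# keeping two copies (a master plus one backup) is an assumption.
estimate_tb = yearly_storage_tb(200_000, page_bytes, copies=2)
print(f"{page_bytes / 10**6:.1f} MB per page, {estimate_tb:.1f} TB per year")
```

Real figures will differ with compression (for example lossless TIFF compression or JPEG 2000), higher capture resolutions, and any access derivatives stored alongside the masters, so an estimate like this is best treated as a starting point for budgeting.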