
Digital projects best practices [xxxiii reunión nacional de archivos 201111]


Published in: Technology, Education

  1. digital projects best practices
     Frederick Zarndt
  2. how’s and what’s of a digital archive / library
     • what is a (good) digital library?
     • how should a digital library be designed?
     • how should a digital library be created?
     • how is a digital library measured?
     • how should a digital project be executed?
     • how should a digital library or a digital project be managed?
  3. why a digital project?
     • to enhance accessibility of the content in libraries and archives
     • to increase collaboration and cooperation between libraries and archives around the world
     • to promote research
     • to provide opportunities for entrepreneurs
  4. digital projects overview
     • collections: organized groups of digital objects
  5. digital collections: Library and Archives Canada
  6. digital projects overview
     • collections: organized groups of digital objects
     • objects: digital materials
  7. digital object: issue from the California Digital Newspaper Collection
  8. digital projects overview
     • collections: organized groups of digital objects
     • objects: digital materials
     • metadata: information about objects and collections
  9. digital object metadata: metadata from the Singapore National Library
  10. project phases
      • assess
      • design
      • implement
      • measure
      • preserve
      • manage
  11. assess
      • select the collection or content
      • define the goals
      • identify the users
      • identify ownership and legal risks
      • identify applicable standards
      • evaluate capabilities
  12. design: standards
      • METS XML for descriptive, structural, technical, and administrative metadata
      • descriptive metadata
          • Metadata Object Description Schema (MODS): selected metadata from MARC
          • Dublin Core: fundamental group of text elements for describing and cataloging
      • technical metadata
          • ALTO for OCR text
          • PREMIS for digital preservation
          • MIX for images
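As a rough illustration of what a descriptive metadata record from the slide above can look like, here is a minimal Python sketch that builds a flat Dublin Core record. The `record` wrapper element, the `dublin_core_record` helper, and the sample values are assumptions made for this example; a real project would follow its own METS profile.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core element set.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dublin_core_record(fields):
    """Build a flat Dublin Core record from (element, value) pairs."""
    ET.register_namespace("dc", DC_NS)
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Invented sample values, for illustration only.
example = dublin_core_record([
    ("title", "Example Newspaper, 1 January 1900"),
    ("type", "Text"),
    ("language", "en"),
])
xml_text = ET.tostring(example, encoding="unicode")
print(xml_text)
```

In practice a record like this would usually be embedded in, or referenced from, the descriptive metadata section of a METS document rather than stored on its own.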
  13. design: standards
      • image standards
          • TIFF
          • JPEG2000
          • JPEG
          • ANSI/NISO Z39.87
      • file standards
          • PDF, PDF/A, PDF/A-1b, PDF/A-1a
          • TEI
      • record standards
          • ISAD(G)
          • ERA
  14. design: access
      • user community
      • user interface (UI)
      • search
      • authentication and user management
      • digital object presentation
      • portability
      • administration
  15. implement: pilot
      create requirements and acceptance criteria
      repeat {
          digitize (small) pilot batch
          test data against acceptance criteria
          adjust requirements and acceptance criteria
      } until (no more adjustments are necessary)
      digitize more data
      NB: pilot batches are VERY VERY important!!
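The pilot loop on the slide above can be sketched in Python roughly as follows. The function names, the dictionary-based criteria, and the toy `dpi` rule are all assumptions made for this example, and `max_rounds` is a safety limit that is not on the slide.

```python
def run_pilot(digitize_batch, check_batch, criteria, max_rounds=10):
    """Repeat small pilot batches until a round needs no adjustments."""
    for round_number in range(1, max_rounds + 1):
        batch = digitize_batch(criteria)
        # check_batch returns a dict of criteria changes; empty means "accepted".
        adjustments = check_batch(batch, criteria)
        if not adjustments:
            return criteria, round_number
        criteria = {**criteria, **adjustments}
    raise RuntimeError("pilot did not converge; revisit the requirements")


# Toy stand-ins for the real digitization and quality-control steps.
def fake_digitize(criteria):
    return {"dpi": criteria["dpi"]}


def fake_check(batch, criteria):
    # Hypothetical rule: reviewers ask for 400 dpi if the batch is below it.
    return {"dpi": 400} if batch["dpi"] < 400 else {}


final_criteria, rounds = run_pilot(fake_digitize, fake_check, {"dpi": 300})
print(final_criteria, rounds)
```

Only after the loop returns would full production ("digitize more data") start, using the criteria the pilot settled on.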
  16. implement: in-house
      reasons for in-house production
      • collection cannot be moved
      • collection is badly organized
      • digitization must be done slowly over a long period
      • digitization is very simple
  17. implement: outsource
      reasons for outsourced production
      • originals can’t be scanned in-house because…
          • equipment is too expensive
          • output data is beyond staff experience
          • labor is too expensive
      • large volume of work in a short time
      • insufficient space, infrastructure, or staff
  18. implement: software
      • commercial off-the-shelf (COTS)
      • open source
      • customized COTS
      • customized open source
      • custom in-house
  19. implement: crowd sourcing
      • National Library of Australia Newspapers Digitisation Program
      • Library and Archives Canada
      • Wikipedia
  20. measure: acceptance criteria
      • automatic quality checks
          • is the digital object complete?
          • is the digital object verifiable?
          • is the digital object uncorrupted?
      • manual quality checks
          • does the metadata meet accuracy specifications?
          • does the text meet accuracy specifications?
          • is the image quality satisfactory?
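One common way to automate the "complete" and "uncorrupted" checks is to compare delivered files against a checksum manifest. The sketch below assumes a manifest that maps file names to SHA-256 digests; the manifest layout, the `check_object` helper, and the sample file names are hypothetical, not taken from the slides.

```python
import hashlib
import os
import tempfile


def sha256_of(path):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_object(directory, manifest):
    """Compare files in `directory` with a {filename: sha256} manifest.

    Returns a list of problems; an empty list means every listed file
    is present and matches its expected checksum.
    """
    problems = []
    for name, expected in manifest.items():
        path = os.path.join(directory, name)
        if not os.path.isfile(path):
            problems.append(f"missing: {name}")
        elif sha256_of(path) != expected:
            problems.append(f"checksum mismatch: {name}")
    return problems


# Demonstration with a throwaway directory and invented file names.
with tempfile.TemporaryDirectory() as tmp:
    page = os.path.join(tmp, "page-0001.tif")
    with open(page, "wb") as handle:
        handle.write(b"fake image bytes")
    manifest = {"page-0001.tif": sha256_of(page), "page-0002.tif": "0" * 64}
    clean_report = check_object(tmp, manifest)
    with open(page, "ab") as handle:
        handle.write(b"!")  # simulate corruption after delivery
    damaged_report = check_object(tmp, manifest)

print(clean_report)
print(damaged_report)
```

The manual checks on the slide (metadata accuracy, text accuracy, image quality) still need human review or sampling; a manifest check only covers the automatic part.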
  21. measure: image quality
      “…images which are ultimately to be viewed by human beings, the only “correct” method of quantifying visual image quality is through subjective evaluation. in practice, however, subjective evaluation is usually too inconvenient, time-consuming and expensive…”
      “…best way to assess the quality of an image is to look at it because human eyes are the ultimate viewers of most images…”
      Zhou Wang and Hamid R. Sheikh. Image Quality Assessment: From Error Visibility to Structural Similarity. IEEE Transactions on Image Processing. April 2004
      Zhou Wang, Alan Bovik, and Ligang Lu. Why is image quality assessment so difficult? IEEE Transactions on Image Processing. April 2004
  22. measure: use
      • who is using the collection?
      • what is the collection being used for?
      • how many page views per day / week / month?
      • how long do visitors to the collection stay?
      • how many repeat visitors to the collection?
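Two of the questions above (page views per day and repeat visitors) can be answered from simple visit records. The following Python sketch assumes each record is a `(visitor_id, date, pages_viewed)` tuple; that record shape and the sample data are invented for illustration and do not correspond to any particular analytics tool.

```python
from collections import Counter
from datetime import date


def usage_summary(visits):
    """Summarize visits given as (visitor_id, day, pages_viewed) tuples.

    Returns (page views per day, number of visitors seen on more than one day).
    """
    views_per_day = Counter()
    visit_days = {}
    for visitor, day, pages in visits:
        views_per_day[day] += pages
        visit_days.setdefault(visitor, set()).add(day)
    repeat_visitors = sum(1 for days in visit_days.values() if len(days) > 1)
    return dict(views_per_day), repeat_visitors


# Invented sample data.
sample = [
    ("a", date(2011, 11, 1), 3),
    ("b", date(2011, 11, 1), 1),
    ("a", date(2011, 11, 2), 5),
]
views, repeats = usage_summary(sample)
print(views, repeats)
```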
  23. preserve
      • bit rot
      • format obsolescence
      • media obsolescence / decay
      • migration to new media or hardware
      • standards obsolescence
  24. preserve: bit rot
      gradual decay of …
      • storage media because of media quality
      • storage media because of improper storage
      • data due to random events (bit-flip,
      • software due to interface changes
      • software due to non-obvious or inadvertent configuration changes
  25. preserve: media decay
      a report by NIST and the Library of Congress says that
      • virtually all CD-Rs tested indicated an estimated life expectancy beyond 15 years
      • only 47 percent of recordable DVDs indicated an estimated life expectancy beyond 15 years; some had a life expectancy as short as 1.9 years
      • in practice actual lifetimes may be considerably shorter
  26. preserve: media obsolescence
      • 5 ¼” floppy disks
      • 8 track tapes
      • 3 ½” floppy disks
      • ZIP drives
      • CD-R, CD-RW, Blu-Ray
      • microfilm
  27. preserve: migration
      • file format changes
      • file name differences: case sensitive / insensitive
      • extended file attributes
      • file permissions
      • soft links / hard links
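The file name issue is easy to check before a migration: names that differ only by case will collide on a case-insensitive file system. A minimal Python sketch, assuming a flat list of file names and using `str.casefold` as an approximation of how the target system compares names (real file systems apply their own folding rules):

```python
from collections import defaultdict


def case_collisions(filenames):
    """Group file names that become identical when case is ignored."""
    groups = defaultdict(list)
    for name in filenames:
        groups[name.casefold()].append(name)
    return [sorted(names) for names in groups.values() if len(names) > 1]


# Invented file names for illustration.
collisions = case_collisions(["Page1.tif", "page1.TIF", "page2.tif"])
print(collisions)
```

Any group reported here would need renaming before the files are copied to the new system, otherwise one file may silently overwrite another.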
  28. preserve: standards obsolescence
      remember …
      • WordPerfect?
      • MARC records?
      • Adobe Flash?
  29. preservation: Open Archival Information System (OAIS) reference model
  30. the problem
  31. the problem
      the 2009 CHAOS Report (The Standish Group) reports that of all software projects surveyed, 44% are “challenged”, 24% failed, and only 32% succeeded
  32. the problem
      Roger Sessions estimates that the worldwide cost of IT failure is USD $500 billion per month
      Roger Sessions: CTO of ObjectWatch. He has written seven books including Simple Architectures for Complex Enterprises and many articles. He is a founding member of the Board of Directors of the International Association of Software Architects.
  33. the problem
      in a recent survey of 1230 IT professionals conducted by Embarcadero Technologies, 2 of the 3 biggest project challenges cited by the IT pros are “poor planning” and “poor or no requirements”
  34. the problem
      in a March 2007 web poll conducted by the Computing Technology Industry Association "nearly 28 percent of the more than 1,000 respondents singled out poor communications as the number one cause of project failure"
  35. the problem
      in a white paper written for Project Perfect, Taimour Al Neimat lists
      • poor planning
      • unclear goals and objectives
      • objectives changing during the project
      • unrealistic time or resource estimates
      • lack of executive support and user involvement
      • failure to communicate and act as a team
      • inappropriate skills
      as primary causes for the failure of complex IT projects
  36. the problem
      a recent tender from an (anonymous) government agency
      • project to convert ~ 170,000 text images to xml
      • value of project ~ USD $180,000
      • 19 pages of definitions, governing law, proposal evaluation criteria, contractual conditions, instructions about tender response format, etc
      • technical requirements description? < 1 page
      • data acceptance criteria? “a high level of accuracy”
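An acceptance criterion such as “a high level of accuracy” cannot be tested until it is turned into a number. One possible way to do that for converted text is character accuracy based on edit distance, sketched below in Python. The 99.5% threshold and the sample strings are assumptions made for this example, not values from the tender.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        curr = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def character_accuracy(reference, produced):
    """Share of reference characters reproduced correctly, between 0.0 and 1.0."""
    if not reference:
        return 1.0 if not produced else 0.0
    return max(0.0, 1.0 - edit_distance(reference, produced) / len(reference))


# Invented sample: the OCR output confuses "l" with "1".
accuracy = character_accuracy("digital library", "digital 1ibrary")
meets_spec = accuracy >= 0.995  # hypothetical acceptance threshold
print(round(accuracy, 4), meets_spec)
```

A tender that states the threshold, the sampling method, and the reference text gives both the library and the service bureau something they can actually verify.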
  37. the problem
      a recent program established by a prominent national library
      • digitize more than 20 million text pages
      • high level image and xml requirements
      • value of work awarded? > USD $5,000,000
      • after award of work, technical requirements expand to 43+ pages from ~3 pages
      • acceptance criteria? added as an afterthought and not well defined
  38. the problem
      typical tender evaluation criteria in priority order
      1. understanding of requirements
      2. reputation of service bureau
      3. price
  39.
  40. the problem
      • requirements
  41. requirements: Library of Congress JPEG2000 profile
  42. the problem
      • requirements
      • acceptance
  43. acceptance: National Library of Australia NDP
  44. the problem
      • requirements
      • acceptance
      • communication
  45. communication
      “projects are about communication, communication, and communication”
      Elenbass, B. (2000). “Staging a Project: Are You Setting Your Project Up for Success?”. Proceedings of the Project Management Institute Annual Seminars & Symposiums.
  46. references
      • METS, MODS, ALTO, PRISM, etc :
      • OAIS :
      • NISO standards and guidelines :
      • good practice guides :
      • And many, many more
  47. questions?
      Frederick Zarndt
      frederick@frederickzarndt.com
      This work is licensed under the Creative Commons Attribution-ShareAlike (CC BY-SA) License. To view a copy of this license visit