A Digitization Primer for Botanical and Horticultural Librarians

  1. A Digitization Primer for Botanical and Horticultural Librarians • Chris Freeland – Missouri Botanical Garden (MBG) Web and Digitization Project Coordinator • Doug Holland – MBG Administrative Librarian • Heather Rolen – New York Botanical Garden (NYBG) Digitization Specialist • CBHL 2002: A Digitization Primer
  2. Why Digitize? • Makes resources broadly available while preserving the original • 24/7 worldwide availability • Capitalize on investment in resources and technology (collections, storage, curation) • Assimilate disparate resources • Learn something new (it’s fun!) • Pressure from above (everyone is doing it!)
  3. Survey Summary: 13 Humble Responses! – Little to no experience with projects – Some with scanning/Photoshop • Types of materials (number of responses) – Slides and glass plates: 6 – Photos (electrophoresis gels?): 7 – Printed material (loose and bound [rare books!], newspaper clippings, maps, architectural drawings, seed and nursery catalogs): 10 – Herbarium specimens: 2 • In-house image database (Annie Malley)
  4. What we will be covering • Audience and Users • Goals • Ownership • Preservation • Access • Metadata • Scanning • Sustainability
  5. A Framework of Guidance for Building Good Digital Collections http://www.imls.gov/pubs/forumframework.htm • Interoperability • Reusability (Repurposing) • Persistence • Verification • Documentation • Respecting copyright and intellectual property law • Think a little bigger and think about the future
  6. Audience and Users • Who are your users? – Today – Future • Lifelong learners • Scholars/researchers • Students • Business community
  7. Why is it important to define users? • Guides the selection process • Determines the complexity and type of metadata • Determines image resolution • Determines website design (database or exhibit format) • Determines equipment needs
  8. How can you retain users and keep them coming back? • Keep adding new content • Create value-added content after the initial rollout – Lesson plans, etc. • Create an e-mail newsletter
  9. User Comments • Your site should include a way to solicit, retain, and respond to user comments and suggestions – Comments can tell you if you’re reaching your intended audience – They can provide wonderful quotes to include in grant proposals or to show your administration: • “Thanks so much for sharing this. This is the internet at its best.” • “This is fantastic. I am most enjoying these rare books, especially the illustrations. I hope to use this with teachers in the future.”
  10. Planning and Goals • Have clear project goals and objectives • Be aware that funding agencies may influence the scope of your project • Designate a project manager • Identify key departments or staff • Stay realistic (perhaps conservative) in your production promises • Document all changes and evolution in your project
  11. Ownership • Copyright needs to be considered • Holding doesn’t mean owning • Is the item in the public domain? http://www.unc.edu/~unclng/public-d.htm http://cidc.library.cornell.edu/copyright/ • Modify your deed of gift to include digital distribution • Control intellectual property after digitization
  12. Selection • Audience needs • Good collections • Condition • One or many collections, or mainstreaming • Item formats and sizes • Metadata available or collection condition (activities other than scanning require 75% of project time) • Rights • Sensitive issues (skeletons??) • Who else is digitizing the same or similar items?
  13. Preservation and Digitization • Digitization is NOT preservation • Do not discard originals • Why not? – Media longevity – Software and hardware obsolescence • Digitization does help preserve the original by reducing exposure and handling
  14. Preserving the Original • Handle items once (scan high!) • Consider rehousing either before or after scanning • Provide appropriate long-term storage • Remember: 2/3 of project time has nothing to do with scanning
  15. Discovery and Access (or Scanned and Deliver) • Online catalog or database – Subject heading or keyword search • Finding aids for archival collections • Exhibit-style educational page • Don’t forget meta tags and visibility to Web search engines (if that is one of your goals!)
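A catalog or database search by subject heading or keyword boils down to looking up words in structured records. The sketch below builds a tiny inverted index and runs an AND search. It assumes each record is a Python dict with hypothetical `title` and `subjects` fields. This is not the schema of any MBG or NYBG system, and real catalogs add stemming, ranking, and controlled-vocabulary handling.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(records):
    """Map each word in a record's title or subject headings to the ids of matching records."""
    index = defaultdict(set)
    for rec_id, rec in records.items():
        words = tokenize(rec["title"])
        for heading in rec["subjects"]:
            words.extend(tokenize(heading))
        for word in words:
            index[word].add(rec_id)
    return index


def search(index, query):
    """Return the ids of records containing every query word (a simple AND search)."""
    terms = tokenize(query)
    if not terms:
        return set()
    hits = set(index.get(terms[0], set()))
    for term in terms[1:]:
        hits &= index.get(term, set())
    return hits


# Hypothetical sample records; ids follow the compacted call-number style of the MBG examples.
records = {
    "QK495F270L351797": {"title": "A Description of the Genus Cinchona",
                         "subjects": ["Cinchona.", "Rubiaceae.", "Euphorbiaceae."]},
    "QK98S657": {"title": "Icones pictae plantarum rariorum",
                 "subjects": ["Botany -- Pictorial works."]},
}
index = build_index(records)
print(sorted(search(index, "cinchona")))
print(sorted(search(index, "botany pictorial")))
```

An AND search suits the focused users of the database approach, who already know what they want. An exhibit site would instead lead visitors through curated pages.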
  16. Web Access and Display • Exhibit Approach • Database Approach • Both
  17. Exhibit Approach • Pull together text, images, maps, documents, etc. to tell a story • Value-added information enhances the scanned images • Appeals to a wide audience
  18. Example of Exhibit Approach • Private Passions, Public Legacy: Paul Mellon's Personal Library at the University of Virginia
  19. Database Approach • Gives access to images through a search mechanism – Users generally have to know something about the collection to find what they’re looking for • Appeals to a more focused audience – Scholars, professionals
  20. Example of Database Approach • Making of America • Google Image Search
  21. Both Approaches • Provide value-added information to reach a wider audience • Also give full access to the data for people who know what they want to view
  22. Example – MBG Rare Book Site
  23. Design vs. Development • We usually spend too much time discussing background colors and layout – Too subjective • Should focus on: – Search engine placement – Successful searches for key phrases – Usage statistics
  24. “If you build it, they may not come” • Indexing by search engines is not a given • Great images + great metadata does not equal a popular site • You must consider how search engines work
  25. Indexing tips
  26. Indexing tips – <meta> tag
    • <meta name="description" content="The Missouri Botanical Garden Library presents its Rare Book Digitization Project.">
    • <meta name="keywords" content="botanical illustration,rare books, herbals, engravings, illustrations, botany, botanical illustrations, medicinal plants, Desktop Wallpaper, images of medicinal plants, plant images, Jaume, Kohler">
    • <META NAME="DC.Title" CONTENT="Plate 1 - Cinchona officinalis; <i>Cinchona officinalis</i> L.; quinine">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#title">
    • <META NAME="DC.Creator" CONTENT="Lambert, Aylmer Bourke">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#creator">
    • <META NAME="DC.Subject" CONTENT="(SCHEME=LCSH) Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.|Graphic media : --Copper engraving -- Uncolored -- 1797 -- England.|">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#subject">
    • <META NAME="DC.Subject" CONTENT="(SCHEME=LCCS) QK495 .F270 L35 1797">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#subject">
    • <META NAME="DC.Description" CONTENT="Plate 1 - Cinchona officinalis; <i>Cinchona officinalis</i> L.; quinine">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#description">
    • <META NAME="DC.Publisher" CONTENT="Missouri Botanical Garden">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#publisher">
    • <META NAME="DC.Contributor.CorporateName" CONTENT="Missouri Botanical Garden">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#contributor">
    • <META NAME="DC.Date" CONTENT="(SCHEME=ISO8601)1998-10-01">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#date">
    • <META NAME="DC.Type" CONTENT="Image.Illustration">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#type">
    • <META NAME="DC.Format" CONTENT="(SCHEME=IMT) text/html">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#format">
    • <LINK REL=SCHEMA.imt HREF="http://sunsite.auc.dk/RFC/rfc/rfc2046.html">
    • <META NAME="DC.Identifier" CONTENT="http://ridgwaydb.mobot.org/mobot/rarebooks?referencenumber=QK495F270L351797">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#identifier">
    • <META NAME="DC.Language" CONTENT="(SCHEME=ISO639-1) en">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#language">
    • <META NAME="DC.Relation" CONTENT="QK495F270L351797">
    • <LINK REL=SCHEMA.dc HREF="http://purl.org/metadata/dublin_core_elements#relation">
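Typing tag pairs like these by hand for every page is tedious and error-prone. The sketch below generates them from a record. It assumes metadata is held as a list of (element, value) pairs, which allows an element such as Subject to repeat. The function name `dc_meta_tags` is hypothetical. The sample values are copied from the tags above. One deliberate difference from the original page: values are HTML-escaped, so the embedded `<i>` markup becomes `&lt;i&gt;` and cannot break the attribute. Lowercase and uppercase tag names are equivalent in HTML.

```python
from html import escape

DC_SCHEMA = "http://purl.org/metadata/dublin_core_elements#"


def dc_meta_tags(pairs):
    """Render (element, value) pairs as Dublin Core <meta> tags, each followed by a schema <link>.

    A list of pairs (rather than a dict) is used because every DC element may repeat.
    """
    lines = []
    for element, value in pairs:
        # A refinement such as "Contributor.CorporateName" points at its base element.
        base = element.split(".")[0].lower()
        # escape() turns <, >, & and quotes into entities so the value cannot break the attribute.
        lines.append(f'<meta name="DC.{element}" content="{escape(value, quote=True)}">')
        lines.append(f'<link rel="schema.dc" href="{DC_SCHEMA}{base}">')
    return "\n".join(lines)


record = [
    ("Title", "Plate 1 - Cinchona officinalis; <i>Cinchona officinalis</i> L.; quinine"),
    ("Creator", "Lambert, Aylmer Bourke"),
    ("Subject", "(SCHEME=LCSH) Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae."),
    ("Subject", "(SCHEME=LCCS) QK495 .F270 L35 1797"),
    ("Publisher", "Missouri Botanical Garden"),
    ("Contributor.CorporateName", "Missouri Botanical Garden"),
    ("Date", "(SCHEME=ISO8601)1998-10-01"),
]
print(dc_meta_tags(record))
```

The same record list could feed the page's `<title>` and body text, so the key phrase appears in all three places.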
  27. Indexing tips – <title> tag • Use descriptive <title> tags: – <title>MBG Rare Books: Plate 1 - Cinchona officinalis</title>
  28. Indexing tips – <body> text • Use text in your page: – A Description of the Genus Cinchona by Lambert, Aylmer Bourke – Description of Page: Plate 1 - Cinchona officinalis (Cinchona officinalis L., quinine)
  29. More indexing tips • Having the key phrase in all three places (<meta>, <title>, and body text) increases your search engine rank • Indexing robots follow links on pages – They will follow the hierarchy of your site • Robots don’t: – Click on buttons – Use drop-down menus – Natively navigate or index Flash/multimedia content
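The "all three places" advice can be checked automatically. The sketch below uses Python's standard `html.parser` to report whether a phrase occurs in a page's `<title>`, its description/keywords `<meta>` tags, and its body text. It also lists the plain `<a href>` links that a robot could follow. It only checks for presence and makes no attempt to model how any particular search engine ranks pages. The class and function names and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Collect <title> text, description/keywords meta content, body text, and link targets."""

    def __init__(self):
        super().__init__()
        self.title, self.meta, self.body = "", "", ""
        self.links = []
        self._in_title = self._in_body = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "body":
            self._in_body = True
        elif tag == "meta" and (attrs.get("name") or "").lower() in ("description", "keywords"):
            self.meta += " " + (attrs.get("content") or "")
        elif tag == "a" and attrs.get("href"):
            # Plain text links are what indexing robots can follow.
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "body":
            self._in_body = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._in_body:
            self.body += data


def phrase_coverage(html, phrase):
    """Return ({"title", "meta", "body"} -> bool, list of link targets) for a case-insensitive phrase."""
    audit = PageAudit()
    audit.feed(html)
    audit.close()
    p = phrase.lower()
    found = {"title": p in audit.title.lower(),
             "meta": p in audit.meta.lower(),
             "body": p in audit.body.lower()}
    return found, audit.links


page = """<html><head><title>MBG Rare Books: Plate 1 - Cinchona officinalis</title>
<meta name="description" content="The Missouri Botanical Garden Library presents its Rare Book Digitization Project.">
<meta name="keywords" content="botanical illustration, rare books, medicinal plants">
</head><body><p>A Description of the Genus Cinchona by Lambert, Aylmer Bourke</p>
<a href="plate2.html">Next plate</a></body></html>"""
print(phrase_coverage(page, "Cinchona"))
```

In this sample, "Cinchona" is missing from the meta tags. That is exactly the kind of gap the audit is meant to surface.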
  30. Case Study: Köhler’s Medizinal Pflanzen • Published 1883–1914 • Digitized in 1997 • Images were heavily edited and cropped • Text was added to images
  31. Case Study: Köhler’s Medizinal Pflanzen • Created static HTML pages with links through the site • Created a list of current botanical names with links to each illustration • NOT technically sophisticated • Used an exhibit approach
  32. Case Study: Köhler’s Medizinal Pflanzen • We receive more user feedback and image requests for this site than for any other • Reasons: – Popular content with interesting images – Has been online for several years – Simple web display that can be indexed by all search engines
  33. Lessons learned • DON’T: – Spend too much time bickering over color schemes, fonts, and layout – Confuse users and indexing robots with irregular navigation – Ignore the importance of search engine results for your content
  34. Lessons learned • DO: – Spend time creating rich <meta> and <title> tags and body text – Learn how search engines index content – Consider display, but focus on development
  35. Metadata and Electronic Resources • Vast amount of information, increasing faster than is manageable • Standards developing and evolving, using best practices • Web-enabled search engines: many, varying in retrieval success • Everyone’s a publisher, everyone’s a librarian • HTML meta tags are limited in structure and content, which inhibits reliable searching • Lack of subject-rich terms
  36. Metadata and Standards • Metadata definition: data about data; data that aids in the identification, description, and location of networked resources • Standard Generalized Markup Language (SGML), 1986 – Structure for producing documents – Document Type Definition (DTD) created for each type of material or individual publication – SGML supports encoding the text AND a description of the document in the header
  37. Dublin Core Basics • http://purl.oclc.org/dc/ • How it began • Why it is important – Simple to create – Easy to understand – International – Flexible • Descriptive, structural, and administrative metadata • All elements are optional, and all are repeatable
  38. Dublin Core Elements • Title • Subject (terms/classification) • Creator • Publisher • Rights Management • Contributor • Source • Description • Type • Identifier • Language • Date • Relation • Format • Coverage
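Because every element is optional and repeatable, a quick sanity check on a record only needs to confirm that each field name maps to one of the 15 elements. The sketch below assumes local refinements are marked with `_` or `.` (as in MBG's `Subject_LCSH` or `DC.Contributor.CorporateName`) and strips them before comparing. That naming rule and the helper names are assumptions for illustration. The `Author` field in the sample is a hypothetical mistake, since DC calls this role Creator.

```python
import re

# The 15 elements of the Dublin Core Metadata Element Set.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def base_element(name):
    """Strip a local qualifier such as "_LCSH" or ".CorporateName" and lowercase the rest."""
    return re.split(r"[._]", name, maxsplit=1)[0].lower()


def unknown_elements(record):
    """List field names that do not map to any DC element.

    No element is required and each maps to a list of values, so an empty
    record or a repeated element is acceptable.
    """
    return sorted(name for name in record if base_element(name) not in DC_ELEMENTS)


record = {
    "Title": ["Icones pictae plantarum rariorum descriptionibus et observationibus illustratae"],
    "Creator": ["Smith, James Edward"],
    "Subject_LCSH": ["Botany -- Pictorial works."],
    "Subject_LCCS": ["QK98 .S657"],
    "Author": ["Smith, James Edward"],  # hypothetical mistake: DC calls this Creator
}
print(unknown_elements(record))
```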
  39. How MBG uses DC for a book • Title: Icones pictae plantarum rariorum descriptionibus et observationibus illustratae / Auctore J.E. Smith, M.D. Fasc. 1-3. • Creator: Smith, James Edward • Subject_LCSH: Botany -- Pictorial works. • Subject_LCCS: QK98 .S657 • Description: 2 p.l., 18 numb. l. : 18 col. pl. ; 50 cm. • Publisher: London, 1790-93, Missouri Botanical Garden • Contributor: Photography and Web design by Debbie Windus. • Date: 1998-09-01 • Identifier: http://ridgwaydb.mobot.org/mobot/rarebooks/title.asp?relation=QK98S657 • Relation: QK98S657 • Rights: http://ridgwaydb.mobot.org/mobot/rarebooks/copyright.asp
  40. How MBG uses DC for a page/image • Title: QK495F270L351797_0060.jpg • Creator: Lambert, Aylmer Bourke, 1761-1842 • Subject: Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.|Graphic media : --Copper engraving -- Uncolored -- 1797 -- England.| • Description: Plate 9 - Cinchona angustifolia • Publisher: Missouri Botanical Garden • Contributor: Missouri Botanical Garden • Date: 1998-10-01 • Type: Image • Format: jpeg • Identifier: 0060 • Source: QK495.F270 L35 1797
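The page-level record suggests a file-naming pattern. It takes the call number with spaces and periods removed ("QK495 .F270 L35 1797" becomes "QK495F270L351797"), adds an underscore, then a four-digit sequence number ("0060"). The sketch below encodes that rule. It infers the pattern from the two MBG examples shown here and is not a documented MBG convention. The function names are hypothetical.

```python
import re


def compact_call_number(call_number):
    """Drop spaces and periods, e.g. "QK495 .F270 L35 1797" -> "QK495F270L351797"."""
    return re.sub(r"[\s.]", "", call_number)


def image_filename(call_number, sequence, ext="jpg"):
    """Build <compact call number>_<4-digit sequence>.<ext> (pattern inferred from the examples)."""
    if not 0 <= sequence <= 9999:
        raise ValueError("sequence must fit in four digits")
    return f"{compact_call_number(call_number)}_{sequence:04d}.{ext}"


print(image_filename("QK495 .F270 L35 1797", 60))
```

Zero-padding keeps files sorting in page order. The sequence is scan order rather than plate number: file 0060 holds Plate 9. The Description field is therefore what carries the plate.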
  41. Subject Access • Controlled vocabularies – Vocabularies and thesauri – Taxonomies – Access
  42. XML • Metadata – Descriptive: facilitates discovery • OAI • MARC • EAD • Dublin Core – Administrative: identifies, manages, and preserves digital object(s) over time • info on where pieces reside • info on how to view the digital object • info on the scanning process
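One place where Dublin Core and OAI meet is the `oai_dc` record format that OAI-PMH repositories use to expose simple Dublin Core as XML. The sketch below serializes a few fields from MBG's book record that way using the standard `xml.etree.ElementTree`. The choice of `oai_dc` and the helper name are assumptions for illustration. A production record would normally also carry an `xsi:schemaLocation`, which is omitted here.

```python
import xml.etree.ElementTree as ET

OAI_DC = "http://www.openarchives.org/OAI/2.0/oai_dc/"
DC = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("oai_dc", OAI_DC)
ET.register_namespace("dc", DC)


def to_oai_dc(pairs):
    """Serialize (element, value) pairs as an oai_dc:dc XML record; repeated elements are allowed."""
    root = ET.Element(f"{{{OAI_DC}}}dc")
    for element, value in pairs:
        child = ET.SubElement(root, f"{{{DC}}}{element}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


book = [
    ("title", "Icones pictae plantarum rariorum descriptionibus et observationibus illustratae"),
    ("creator", "Smith, James Edward"),
    ("subject", "Botany -- Pictorial works."),
    ("subject", "QK98 .S657"),
    ("publisher", "Missouri Botanical Garden"),
    ("date", "1998-09-01"),
    ("rights", "http://ridgwaydb.mobot.org/mobot/rarebooks/copyright.asp"),
]
print(to_oai_dc(book))
```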
  43. XML • Metadata cont. – Structural: storage/presentation of digital object(s) • METS (Metadata Encoding and Transmission Standard) http://www.loc.gov/standards/mets • TEI (Text Encoding Initiative) http://www.tei-c.org • TEI for Libraries (5 levels of encoding) http://www.indiana.edu/~letrs/tei/ • METAe: automatic metadata creation http://meta-e.uibk.ac.at
  44. XML • SGML/HTML/XML – Standard Generalized Markup Language (1986) – Hypertext Markup Language (1989) – eXtensible Markup Language (1996) • XML is – a document markup language for defining structured information – a language used by computers to define hidden information about the structure of a document
  45. XML • XML cont.: best of both worlds – Storage • can store any kind of structured info; not limited to Web delivery – Presentation • flexible development/design
  46. XML • XML is a lot simpler than SGML and is sometimes described as an 80/20 solution: you get 80% of the power of SGML for 20% of the effort • You can use XML without thinking ahead and make up your elements en route, as long as they nest within each other. This is called writing "well-formed" rather than "valid" XML. Purists discourage this, but people will do it anyhow. • XML is specifically designed to work easily with the Web. – http://facultyweb.at.nwu.edu/english/mmueller/ariadne/teixintro/index.htm
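The well-formed/valid distinction can be seen directly with Python's standard `xml.etree.ElementTree`. It parses any well-formed document, including one with made-up element names, and rejects improperly nested tags. It does not check a document against a DTD or schema. Validation needs a validating parser, for example the third-party lxml package. The helper name `is_well_formed` and the sample elements are hypothetical.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """True if the text parses as XML; this says nothing about validity against a DTD or schema."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


made_up = "<plate><name>Cinchona officinalis</name><number>1</number></plate>"
overlapping = "<plate><name>Cinchona officinalis</plate></name>"
print(is_well_formed(made_up), is_well_formed(overlapping))
```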
  47. XML • XML and the NYBG digitization project [Diagram: XML text and image files → GSDL software server suite → public access/public use]
  48. XML • XML/NYBG project – Lack of adopted standards – Nature of the data – Delivery mechanisms • Research!
  49. XML • XML sites – http://www.oasis-open.org/cover/sgml-xml.html – http://www.w3.org/XML/ – http://www.ucc.ie/xml/#exec – http://www.xml.com/ • SGML sites – http://www.oasis-open.org/cover/general.html – http://www.w3.org/MarkUp/SGML/ • Listservs – http://sunsite.berkeley.edu/XML4Lib/ – http://www.oasis-open.org/cover/lists.html
  50. Scanning • Principles for scanning • Access (not preservation) • Storage • Outsourcing options
  51. Howard Besser’s Principles • Scan at the highest resolution appropriate to the informational content of the originals • Scan at an appropriate level of quality to avoid rescanning and re-handling the originals in the future: scan once • Create and store a master image file that can be used to produce derivative image files and serve a variety of current and future user needs • Use system components that are non-proprietary
  52. Besser’s Principles cont. • Use image file formats and compression techniques that conform to industry standards • Create backup copies of all files on a stable medium • Create meaningful metadata for image files or collections • Store media in an appropriate environment • Monitor and recopy data as necessary • Outline a migration strategy for transferring data across generations of technology • Anticipate and plan for future technological developments
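"Monitor and recopy data as necessary" requires some way to tell whether a stored master has changed or gone missing. A common technique, not named in these slides, is a checksum manifest. Record a hash of every file once, then recompute and compare on a schedule or after each copy to new media. The sketch below uses SHA-256 from Python's standard `hashlib`. The folder layout and function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large master files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file's path (relative to folder) to its SHA-256 checksum."""
    root = Path(folder)
    return {str(p.relative_to(root)): sha256_of(p)
            for p in sorted(root.rglob("*")) if p.is_file()}


def changed_files(folder, manifest):
    """List manifest entries that are now missing or whose checksum no longer matches.

    Files added since the manifest was built are not reported.
    """
    current = build_manifest(folder)
    return sorted(name for name, digest in manifest.items() if current.get(name) != digest)
```

Storing the manifest alongside the backups, and in the project documentation, lets whoever inherits the collection verify it later.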
  53. Scan Basics • Digital formats: master/archival, access, thumbnail • Always keep a facsimile master • Minimum recommended standards: NARA/LC/CPD • Hardware requirements: – Scanner that exceeds your standards – Workstation: at least a Pentium III, 650 MHz, with 20+ GB of storage – Server for display and archiving
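The storage figures follow from simple arithmetic. Pixel dimensions are the physical size in inches multiplied by the scanning resolution. The uncompressed size is width × height × bytes per pixel (3 for 24-bit color). The sketch below assumes a hypothetical 8 × 10 inch plate scanned at 600 ppi, with access and thumbnail derivatives scaled to an illustrative long edge of 800 and 150 pixels. These derivative sizes are assumptions, not NARA or LC figures. Compressed formats such as JPEG will be far smaller than the uncompressed numbers shown.

```python
from dataclasses import dataclass


@dataclass
class Derivative:
    name: str
    width_px: int
    height_px: int
    bytes_per_pixel: int = 3  # 24-bit RGB

    @property
    def uncompressed_bytes(self):
        return self.width_px * self.height_px * self.bytes_per_pixel


def master(width_in, height_in, ppi, bytes_per_pixel=3):
    """Pixel dimensions of a master scan: physical size (inches) times resolution."""
    return Derivative("master", round(width_in * ppi), round(height_in * ppi), bytes_per_pixel)


def scaled(src, name, long_edge_px):
    """Shrink proportionally so the longer side equals long_edge_px."""
    factor = long_edge_px / max(src.width_px, src.height_px)
    return Derivative(name, round(src.width_px * factor), round(src.height_px * factor),
                      src.bytes_per_pixel)


m = master(8, 10, 600)  # hypothetical 8 x 10 inch plate at 600 ppi
for d in (m, scaled(m, "access", 800), scaled(m, "thumbnail", 150)):
    print(d.name, d.width_px, "x", d.height_px, f"{d.uncompressed_bytes / 1e6:.1f} MB")
```

At 4800 × 6000 pixels, each such master is 86.4 MB uncompressed, so a 20 GB disk holds only about 231 of them. That is one reason to size storage and servers before scanning begins.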
  54. MBG Imaging Lab Specs • See handout
  55. Scanning • Your requirements may be different from the accepted norm – Maybe 600 dpi is too low for your project • You should be aware of generally accepted guidelines – You have to know the rules before you break them
  56. Scanning Guidelines • Review handout
  57. Scanning • Software: scanners come with some basic software, e.g., Adobe Photoshop Lite • Keep current on software • Physical facilities for scanning • When to outsource; special materials
  58. Outsourcing • What? – Contract work to service providers – Off-site, on-site, imaging only, image/content display/management provider, ASP (application service provider) • Why? – Factors to consider • Project size • Project expectations • Staff size
  59. Outsourcing • Why? cont. • Staff expertise • Available resources (funding for staff training and equipment, physical space) • Deadlines
  60. Outsourcing • NYBG/Mellon Digitization Project – 3 titles from the rare book (RB) collection – Conservation efforts necessary – 21-month grant, no lab, no allocated space to build a lab, no staff, no expertise, no extra funding for equipment or staff training; project expectations (grant stipulates archival-quality imaging, hard deadline) – Image capture outsourced to an East Coast vendor; quality checks performed in-house
  61. Outsourcing • Weighing the pros and cons – Fragile/rare materials under supervised control vs. equipment costs and updates/staff/expertise/time/physical space • Worth consideration – “…For digitization projects, institutions and service providers are working with developing technologies and a new vocabulary, creating new quality and production benchmarks, and trying to determine best practices. All the while, digital technology continues to evolve. Both parties must collaborate to determine capture requirements, costs, and deliverables; manage the process; and agree on criteria.” – Meg Bellinger, President, Preservation Resources, Moving Theory into Practice, 2000.
  62. Outsourcing • Vendors – Octavo http://www.octavo.com/ – Systems Integration Group http://www.sigi.com/ – Preservation Resources http://www.oclc.org/oclc/presres/ – Saztec http://www.saztec.com – Innodata http://www.innodata.com/ – Northern Micrographics http://www.normicro.com/northern_micrographics.htm
  63. Sustainability • Digitization shouldn’t be a fling (when others are paying the bills); it is a marriage and more • Time = money • Permanence • Data migration and emulation • Review and schedule upgrades • Documentation
  64. Cost • Not cheap, but consider the value of the objects, the investment already made in your collections, and your organizational mission • Prices range from $7 to $35 per image • Most projects are funded with soft money; attempt to incorporate scanning into normal operating budgets • Scanning is 1/3 of the total cost • The largest cost is in research and the time invested in creating metadata or organizing collections
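The two cost figures above combine into a rough budget. Multiply the image count by the per-image price range, then, since scanning is about 1/3 of the total, multiply by 3 for a whole-project figure. The sketch below simply encodes that arithmetic. The function name and the 500-image example are hypothetical, and the defaults are the slide's 2002 figures rather than current prices.

```python
def estimate_budget(num_images, price_low=7, price_high=35, total_to_scanning=3):
    """Return a (low, high) whole-project estimate in dollars.

    Per-image scanning prices default to the $7-$35 range on the slide; because
    scanning is about 1/3 of the total cost, the total is about 3x the scanning cost.
    """
    scanning_low = num_images * price_low
    scanning_high = num_images * price_high
    return scanning_low * total_to_scanning, scanning_high * total_to_scanning


low, high = estimate_budget(500)
print(f"${low:,} to ${high:,}")
```

For comparison, the speaker notes cite $3,000 in staff time and media to scan a 500-page rare book. That works out to $3,000 / 500 = $6 per page, before equipment and indirect costs.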
  65. Staffing • Staff with tolerance for ambiguity • Staff with creativity • Training in metadata, scanning • Photographic skills (artistic eye), microcomputer skills, web design skills • Staff with a risk-taking attitude
  66. Concluding Thoughts • Create digital products worth preserving • Collaborate! • Adhere to standards • Refresh/migrate your data • Don’t forget preservation metadata: digital products are not copies, but new artifacts

Editor's Notes

  • Introductions. Disclaimers – Not on the “Guru Tour” of digitization workshops. Share what we have learned.
  • Ask people why they want to digitize. Call on or read those that replied. Tell how Raven was asking: if the art museum’s library was digitized, why isn’t ours? Cataloging vs. digitizing. We are really talking about “reformatting” as opposed to items born digital.
  • Share what we have learned. Everything is scalable. You are not LC. We are focusing on projects, but a project can be 50 or 5,000 or 50,000 images. Even if you are scanning only for in-house use, not for web delivery, much of this is relevant. As complicated as this seems, it is continually getting more standardized and easier. Many more resources and standards are available. A Framework of Guidance for Building Good Digital Collections
  • Distinguish between enduring-value and immediate-value scanning. Importance of thinking about issues bigger than “scanning,” especially when using funding from large agencies. Lots of people around the world are digitizing. Standards, interoperability. More bang for the buck. Reuse and repurposing of digitized items. Think big and think about the future. Includes sources for detailed information on good selection; on creating good digital objects; on creating good metadata; and on running a good project.
  • See matrix, Appendix 1. Predicting users is difficult.
  • Allow for major staffing, hardware, or software delays. Project manager who is accountable and empowered. Who do you need to talk to about server space, databases, programming? Documentation for others and your own institution. Reports for granting agencies. Make an estimate, then double the time and triple the expense. Cost for scanning a 500-page rare book: $3,000 in staff time and media, not including equipment or indirect costs.
  • MrSID controls access to high-resolution images.
  • Selection can be based on themes, geography, historic period, subject headings, core lists!, material types (images). Don’t give exact localities of rare and endangered plants. Don’t scan personal or sensitive information about founders or their descendants.
  • Will address preservation of digital objects later.
  • “Guided tour” of images. Browse by Subject could be thought of as an exhibit approach.
  • No search function. Purpose is to tell the story about Paul Mellon, not document his collection in its entirety.
  • Give MBG Archives project as an example.
  • No search feature. No thought given to future books, format, etc.
  • What’s different about electronic resources
  • Some history and standards
  • We want to provide guidance as well as guidelines
  • Think through long-term commitment: next month, 5 years, 50 years, 100 years? You cannot put a digital collection on the shelf for 50 years. Storing archival images on CD or servers. Documentation for future archaeologists, or for the project manager who takes over when you move to a new job. Good metadata will contain some documentation.
