Introductions. Disclaimers: we are not on a “Guru Tour” of digitization workshops; we are here to share what we have learned.
Ask people why they want to digitize. Call on or read those that replied. Tell how Raven was asking: if the art museum’s library was digitized, why isn’t ours? Cataloging vs. digitizing. We are really talking about “reformatting,” as opposed to items born digital.
Everything is scalable. You are not LC. We are focusing on projects, but a project can be 50 or 5,000 or 50,000 images. Even if you are scanning only for in-house use, not for web delivery, much of this is relevant. As complicated as this seems, it is continually getting more standardized and easier. Many more resources and standards are available: A Framework of Guidance for Building Good Digital Collections.
Distinguish between scanning for enduring value and scanning for immediate value. Importance of thinking of issues bigger than “scanning,” especially when using funding from large agencies. Lots of people around the world are digitizing. Standards, interoperability. More bang for the buck. Reuse and repurposing of digitized items. Think big and think about the future. Includes sources for detailed information on good selection, on creating good digital objects, on creating good metadata, and on running a good project.
See matrix, Appendix 1. Predicting users is difficult.
Allow for major staffing, hardware, or software delays. Project manager who is accountable and empowered. Who do you need to talk to about server space, databases, programming? Documentation for others and your own institution. Reports for granting agencies. Make an estimate, then double the time and triple the expense. Cost for scanning a 500-page rare book: $3,000 in staff time and media, not counting equipment or indirect costs.
MrSID controls access to high-resolution images.
Selection can be based on themes, geography, historic period, subject headings, core lists (!), or material types (images). Don’t give exact localities of rare and endangered plants. Don’t scan personal or sensitive information about founders or their descendants.
Will address preservation of digital objects later.
“Guided tour” of images Browse by Subject could be thought of as an exhibit approach
No search function. Purpose is to tell the story about Paul Mellon, not document his collection in its entirety.
Give MBG Archives project as example.
No search feature. No thought given to future books, format, etc.
What’s different about electronic resources
Some history and standards
We want to provide guidance as well as guidelines
Think through the long-term commitment: next month, 5 years, 50 years, 100 years? You cannot put a digital collection on the shelf for 50 years. Storing archival images on CD or servers. Documentation for future archaeologists, or for the project manager who takes over when you move to a new job. Good metadata will contain some documentation.
Transcript of "A Digitization Primer for Botanical and Horticultural Librarians"
A Digitization Primer for Botanical and Horticultural Librarians
• Chris Freeland – MBG Web and Digitization Project Coordinator
• Doug Holland – MBG Administrative Librarian
• Heather Rolen – NYBG Digitization Specialist
CBHL 2002: A Digitization Primer
Why Digitize?
• Makes resources broadly available while preserving original.
• 24/7 worldwide availability.
• Capitalize on investment in resources and technology (collections, storage, curation)
• Assimilate disparate resources
• Learn something new (It’s Fun!!)
• Pressure from above (Everyone is doing it!)

Survey Summary
• 13 Humble Responses!
  – Little to no experience with projects
  – Some with Scanning/Photoshop
• Types of materials
  – Slides and glass plates: 6
  – Photos (electrophoresis gels?): 7
  – Printed material [loose, bound (rare books!), newspaper clippings, maps, architectural drawings, seed and nursery catalogs]: 10
  – Herbarium specimens: 2
• In-house image database (Annie Malley)

What we will be covering
• Audience and Users
• Goals
• Ownership
• Preservation
• Access
• Metadata
• Scanning
• Sustainability
A Framework of Guidance for Building Good Digital Collections
http://www.imls.gov/pubs/forumframework.htm
• Interoperability
• Reusability (Repurposing)
• Persistence
• Verification
• Documentation
• Respecting copyright and intellectual property law
• Think a little bigger and think about the future.

Audience and Users
• Who are your users
  – Today
  – Future
• Lifelong Learners
• Scholar/researcher
• Students
• Business Community

Why is it important to define users?
• Guide selection process
• Determines complexity and type of metadata
• Determines image resolution
• Determines web-site design (Database or exhibit format)
• Determines equipment needs

How can you retain users and keep them coming back?
• Keep adding new content
• Creating value-added content after the initial rollout
  – Lesson plans, etc.
• Create an e-mail newsletter

User Comments
• Should include a way to solicit, retain, and respond to user comments and suggestions.
  – Can tell you if you’re reaching your intended audience
  – Can provide you with wonderful comments to include in grant proposals or to show your administration:
    • “Thanks so much for sharing this. This is the internet at its best.”
    • “This is fantastic. I am most enjoying these rare books, especially the illustrations. I hope to use this with teachers in the future.”
Planning and Goals
• Have clear project goals and objectives
• Be aware that funding agencies may influence the scope of your project
• Designate a project manager.
• Identify key departments or staff
• Stay realistic (perhaps conservative) in your production promises.
• Document all changes and evolution in your project.

Ownership
• Copyright needs to be considered
• Holding doesn’t mean owning
• Is item in public domain?
  http://www.unc.edu/~unclng/public-d.htm
  http://cidc.library.cornell.edu/copyright/
• Modify your deed of gift to include digital distribution
• Controlling intellectual property after digitization

Selection
• Audience needs
• Good Collections
• Condition
• One or many collections or mainstreaming
• Item formats and sizes
• Metadata available or Collection condition (Activities other than scanning require 75% of project time)
• Rights
• Sensitive Issues (Skeletons??)
• Who else is doing the same or similar items?
Preservation and Digitization
• Digitization is NOT preservation
• Do not discard originals.
• Why not?
  – Media longevity
  – Software and hardware obsolescence
• Digitization does preserve original through reduced exposure and handling.

Preserving the Original
• Handle Items Once (Scan high!)
• Consider rehousing either before or after scanning.
• Appropriate long term storage
• Remember 2/3 of project time has nothing to do with scanning.
Discovery and Access (or Scanned and Deliver)
• Online Catalog or Database
  – Subject Heading or keyword search
• Finding Aids for archival collections
• Exhibit style educational page
• Don’t forget metatags and visibility to Web search engines. (If that is one of your goals!)

Web Access and Display
• Exhibit Approach
• Database Approach
• Both

Exhibit Approach
• Pull together text, images, maps, documents, etc. to tell a story
• Value added information enhances the scanned images
• Appealing to a wide audience
Example of Exhibit Approach
• Private Passions, Public Legacy: Paul Mellon’s Personal Library at the University of Virginia

Database Approach
• Give access to images through a search mechanism
  – Generally have to know something about the collection to find what you’re looking for
• Appealing to a more focused audience
  – Scholars, professionals

Example of Database Approach
• Making of America
• Google Image Search

Both Approaches
• Provide value added information to reach a wider audience
• Also give full access to the data for people who know what they want to view.

Example – MBG Rare Book Site
Design vs. Development
• Usually spend too much time discussing background colors and layout
  – Too subjective
• Should focus on
  – Search engine placement
  – Successful searches for key phrases
  – Usage statistics

“If you build it, they may not come”
• Indexing by search engines is not a given
• Great images + great metadata does not equal a popular site
• You must consider how search engines work

Indexing tips
Indexing tips - <title> tag
• Use descriptive <title> tags:
  – <title>MBG Rare Books: Plate 1 - Cinchona officinalis</title>

Indexing tips - <body> text
• Use text in your page:
  – A Description of the Genus Cinchona by Lambert, Aylmer Bourke
  – Description of Page: Plate 1 - Cinchona officinalis (Cinchona officinalis L., quinine)

More indexing tips
• Having the key phrase in all 3 (<meta>, <title>, and body text) increases your search engine rank
• Indexing robots follow links on pages
  – They will follow the hierarchy of your site
• Robots don’t:
  – Click on buttons
  – Use dropdown menus
  – Natively navigate or index Flash/multimedia content
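The indexing tips above can be sketched as a small page generator. This is only an illustration, not how the MBG site was built: the function name `build_page`, the wording of the `<meta name="description">` content, and the image file name are assumptions made for the example. The point is that one key phrase ends up in the `<title>`, a `<meta>` tag, and the visible body text of a plain static page that a robot can read by following ordinary links.

```python
from html import escape


def build_page(site, book_title, author, plate_label, image_file):
    """Return a static HTML page with the key phrase in <title>, <meta>, and body text."""
    key_phrase = f"{site}: {plate_label}"
    # Hypothetical description wording; the slides do not show a <meta> tag.
    description = f"{plate_label} from {book_title} by {author}"
    return "\n".join([
        "<html>",
        "<head>",
        f"<title>{escape(key_phrase)}</title>",
        f'<meta name="description" content="{escape(description)}">',
        "</head>",
        "<body>",
        f"<h1>{escape(book_title)} by {escape(author)}</h1>",
        f"<p>Description of Page: {escape(plate_label)}</p>",
        f'<img src="{escape(image_file)}" alt="{escape(plate_label)}">',
        "</body>",
        "</html>",
    ])


page = build_page(
    site="MBG Rare Books",
    book_title="A Description of the Genus Cinchona",
    author="Lambert, Aylmer Bourke",
    plate_label="Plate 1 - Cinchona officinalis",
    image_file="plate_0001.jpg",  # hypothetical file name
)
print(page)
```

Because the output is plain HTML with no buttons, dropdown menus, or Flash, it stays within what the slide says robots can handle.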
Case Study: Köhler’s Medizinal Pflanzen
• Published 1883 – 1914
• Digitized in 1997
• Images were heavily edited and cropped
• Text was added to images

Case Study: Köhler’s Medizinal Pflanzen
• Created static HTML pages with links through site
• Created a list of current botanical names with links to illustration
• NOT technically sophisticated
• Used an Exhibit Approach

Case Study: Köhler’s Medizinal Pflanzen
• Receive more user feedback and image requests for this site than any other
• Reasons:
  – Popular content with interesting images
  – Has been online for several years
  – Simple web display that can be indexed by all search engines

Lessons learned
• DON’T:
  – spend too much time bickering over color schemes, fonts, and layout
  – confuse users and indexing robots with irregular navigation
  – ignore importance of search engine results for your content

Lessons learned
• DO:
  – spend time creating rich <meta> and <title> tags and body text
  – learn how search engines index content
  – consider display, but focus on development
Metadata and Electronic Resources
• Vast amount of information, increasing at a faster rate than is manageable
• Standards developing and evolving, using best practices
• Web-enabled search engines: many, varied in retrieval success
• Everyone’s a publisher, everyone’s a librarian
• HTML metatags: structure and content are limited, which inhibits reliable searching
• Lack of subject rich terms

Metadata and Standards
• Metadata definition: data about data; data that aids in identification, description and location of networked resources
• Standard Generalized Mark-up Language (SGML), 1986
  – Structure for producing documents
  – Document Type Definition (DTD) created for each type of material or individual publication
  – SGML supports encoding the text AND a description of the document in the header

Dublin Core Basics
• http://purl.oclc.org/dc/
• How it began
• Why it is important
  – Simple to create
  – Easy to understand
  – International
  – Flexible
• Descriptive, Structural and Administrative metadata
• All elements repeatable, all optional
Dublin Core Elements
• Title
• Creator
• Publisher
• Contributor
• Description
• Identifier
• Date
• Format
• Subject terms/classification
• Rights Management
• Source
• Type
• Language
• Relation
• Coverage
How MBG uses DC for a book
• Title: Icones pictae plantarum rariorum descriptionibus et observationibus illustratae / Auctore J.E. Smith, M.D. Fasc. 1-3.
• Creator: Smith, James Edward
• Subject_LCSH: Botany -- Pictorial works.
• Subject_LCCS: QK98 .S657
• Description: 2 p.l., 18 numb. 1. : 18 col. pl. ; 50 cm.
• Publisher: London, 1790-93, Missouri Botanical Garden
• Contributor: Photography and Web design by Debbie Windus.
• Date: 1998-09-01
• Identifier: http://ridgwaydb.mobot.org/mobot/rarebooks/title.asp?relation=QK98S657
• Relation: QK98S657
• Rights: http://ridgwaydb.mobot.org/mobot/rarebooks/copyright.asp

How MBG uses DC for a page/image
• Title: QK495F270L351797_0060.jpg
• Creator: Lambert, Aylmer Bourke, 1761-1842
• Subject: Cinchona.|Hyaenanche.|Rubiaceae.|Euphorbiaceae.| Graphic media : --Copper engraving -- Uncolored -- 1797 -- England.|
• Description: Plate 9 - Cinchona angustifolia
• Publisher: Missouri Botanical Garden
• Contributor: Missouri Botanical Garden
• Date: 1998-10-01
• Type: Image
• Format: jpeg
• Identifier: 0060
• Source: QK495.F270 L35 1797
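To make the page/image record above concrete, here is a minimal sketch that writes some of its fields as Dublin Core elements in XML. It is not MBG's actual storage format, which the slides do not describe: the `<record>` wrapper element and the `dc_record` helper are hypothetical. The namespace URI is the standard one for the Dublin Core element set. The record is passed as a list of pairs rather than a dictionary because all elements are repeatable, so Subject can appear more than once.

```python
import xml.etree.ElementTree as ET

# Standard namespace for the 15 Dublin Core elements.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def dc_record(fields):
    """Build a record from (element_name, value) pairs; element names may repeat."""
    record = ET.Element("record")  # hypothetical wrapper element
    for name, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{name}")
        child.text = value
    return record


# Values copied from the page/image slide; only two of the subjects are shown.
page_fields = [
    ("title", "QK495F270L351797_0060.jpg"),
    ("creator", "Lambert, Aylmer Bourke, 1761-1842"),
    ("subject", "Cinchona."),
    ("subject", "Rubiaceae."),
    ("description", "Plate 9 - Cinchona angustifolia"),
    ("publisher", "Missouri Botanical Garden"),
    ("date", "1998-10-01"),
    ("type", "Image"),
    ("format", "jpeg"),
    ("identifier", "0060"),
    ("source", "QK495.F270 L35 1797"),
]

record = dc_record(page_fields)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

Elements that are left out (Language, Coverage, and so on) are simply absent, which matches "all optional" on the Dublin Core Basics slide.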
Subject Access
• Controlled vocabularies
  – Vocabularies and thesauri
  – Taxonomies
  – Access

XML
• METADATA
  – descriptive: facilitate discovery
    • OAI
    • MARC
    • EAD
    • Dublin Core
  – administrative: identify/manage/preserve digital object(s) over time
    • info on where pieces reside
    • info on how to view digital object
    • info on scanning process

XML
• METADATA cont.
  – structural: storage/presentation of digital object(s)
    • METS (Metadata Encoding and Transmission Standard): http://www.loc.gov/standards/mets
    • TEI (Text Encoding Initiative): http://www.tei-c.org
    • TEI for Libraries (5 levels of encoding): http://www.indiana.edu/~letrs/tei/
    • METAe, automatic metadata creation: http://meta-e.uibk.ac.at

XML
• SGML/HTML/XML
  – Standard Generalized Markup Language (1986)
  – Hypertext Markup Language (1989)
  – eXtensible Markup Language (1996)
• XML
  – a document markup language for defining structured information
  – a language used by computers to define hidden information about the structure of a document

XML
• XML cont.: best of both worlds
  – storage
    • can store any kind of structured info; not limited to Web delivery
  – presentation
    • flexible development/design
XML
• XML is a lot simpler than SGML and is sometimes described as an 80/20 solution: you get 80% of the power of SGML for 20% of the effort
• You can use XML without thinking ahead and make up your elements en route as long as they nest within each other. This is called writing "well-formed" rather than "valid" XML. Purists discourage this but people will do it anyhow.
• XML is specifically designed to work easily with the Web.
  – http://facultyweb.at.nwu.edu/english/mmueller/ariadne/teixintro/index.htm
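The "well-formed" rule above (made-up elements are fine as long as they nest) can be checked with a short sketch using Python's standard-library parser. The element names `plate`, `name`, and `note` are invented for the example, and `is_well_formed` is a hypothetical helper. Note that this parser only checks well-formedness; checking "valid" XML against a DTD would need a validating parser, which this sketch does not attempt.

```python
import xml.etree.ElementTree as ET


def is_well_formed(text):
    """Return True if the text parses as XML, i.e. every element nests properly."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# Made-up elements that nest correctly: well-formed, though no DTD defines them.
nested = "<plate><name>Cinchona officinalis</name><note>quinine</note></plate>"

# The <name> element is closed after its parent: overlapping, so not well-formed.
overlapping = "<plate><name>Cinchona officinalis</plate></name>"

print(is_well_formed(nested))
print(is_well_formed(overlapping))
```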
XML
• XML and NYBG digitization project
[Diagram; labels only: XML text, Image files, Public access, GSDL software, server suite, Public use]

XML
• XML/NYBG project
  – lack of adopted standards
  – nature of the data
  – delivery mechanisms
• Research!

XML
• XML sites
  – http://www.oasis-open.org/cover/sgml-xml.html
  – http://www.w3.org/XML/
  – http://www.ucc.ie/xml/#exec
  – http://www.xml.com/
• SGML sites
  – http://www.oasis-open.org/cover/general.html
  – http://www.w3.org/MarkUp/SGML/
• Listservs
  – http://sunsite.berkeley.edu/XML4Lib/
  – http://www.oasis-open.org/cover/lists.html
Scanning
• Principles for Scanning
• Access (not preservation)
• Storage
• Outsource options

Howard Besser’s Principles
• Scan at the highest resolution appropriate to the informational content of the originals
• Scan at an appropriate level of quality to avoid rescanning and re-handling of the originals in the future: scan once
• Create and store a master image file that can be used to produce derivative image files and serve a variety of current and future user needs
• Use system components that are non-proprietary

Besser’s Principles Cont.
• Use image file formats and compression techniques that conform to industry standards
• Create backup copies of all files on a stable medium
• Create meaningful metadata for image files or collections
• Store media in an appropriate environment
• Monitor and recopy data as necessary
• Outline a migration strategy for transferring data across generations of technology
• Anticipate and plan for future technological developments
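One common way to act on "create backup copies" and "monitor and recopy data as necessary" is to record a checksum for every master file and compare again later. The slides do not name a technique, so treat this as an assumed approach rather than Besser's own; the function names and the choice of SHA-256 are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Checksum one file, reading in chunks so large master images fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file name directly inside the folder to its checksum."""
    folder = Path(folder)
    return {p.name: sha256_of(p) for p in sorted(folder.iterdir()) if p.is_file()}


def changed_files(folder, manifest):
    """Names from the manifest that are now missing or no longer match their checksum."""
    current = build_manifest(folder)
    return sorted(
        name for name, checksum in manifest.items()
        if current.get(name) != checksum
    )
```

A possible routine: call `build_manifest` when the masters are first written, keep the result with the project documentation, and run `changed_files` on a schedule. Any name it returns is a candidate for recopying from the backup.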
Scan Basics
• Digital formats: Master/Archival, access, thumbnail
• Always keep a facsimile master
• Minimum recommended standards: NARA/LC/CPD
• Hardware requirements:
  – Scanner that exceeds your standards
  – Workstation: at least Pentium III, 650 MHz, storage (20+ gigabytes)
  – Server for display and archiving

MBG Imaging Lab Specs
• See handout

Scanning
• Your requirements may be different than the accepted norm
  – Maybe 600 dpi is too low for your project
• Should be aware of generally accepted guidelines
  – Have to know the rules before you break them
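Resolution choices feed directly into storage needs, so a quick calculation helps when planning. The sketch below assumes an uncompressed master image at 24-bit color and an 8 x 10 inch original; both figures are illustrative and not taken from the handouts, and `uncompressed_bytes` is a hypothetical helper.

```python
def uncompressed_bytes(width_in, height_in, dpi, bits_per_pixel=24):
    """Size in bytes of an uncompressed scan of a width x height inch original."""
    pixels_wide = round(width_in * dpi)
    pixels_high = round(height_in * dpi)
    return pixels_wide * pixels_high * bits_per_pixel // 8


# Doubling the resolution quadruples the pixel count, and so the file size.
for dpi in (300, 600):
    size = uncompressed_bytes(8, 10, dpi)
    print(dpi, "dpi:", size / 1_000_000, "MB")
```

Under these assumptions a 300 dpi master is 21.6 MB and a 600 dpi master is 86.4 MB, so a 20-gigabyte drive holds roughly 230 uncompressed 600 dpi masters before any access copies or thumbnails are made.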
Scanning Guidelines
• Review handout

Scanning
• Software: scanners come with some basic software (Adobe Photoshop Lite)
• Keep current on software
• Physical facilities for scanning
• When to outsource/special materials

Outsourcing
• What?
  – Contract work to service providers
  – Off-site, on-site, imaging only, image/content display/management provider, ASP (application service provider)
• Why?
  – Factors to consider
    • project size
    • project expectations
    • staff size

Outsourcing
• Why? Cont.
    • staff expertise
    • available resources (funding for staff training and equipment, physical space)
    • deadlines
Outsourcing
• NYBG/Mellon Digitization Project
  – 3 titles from RB collection
  – conservation efforts necessary
  – 21 month grant, no lab, no allocated space to build lab, no staff, no expertise, no extra funding for equipment or staff training, project expectations (grant stipulates archival quality imaging, hard deadline)
  – image capture outsourced to east coast vendor, quality checks performed in-house

Outsourcing
• Weighing the pros and cons
  – fragile/rare materials under supervised control vs. equipment costs and updates/staff/expertise/time/physical space
• Worth consideration
  – “…For digitization projects, institutions and service providers are working with developing technologies and a new vocabulary, creating new quality and production benchmarks, and trying to determine best practices. All the while, digital technology continues to evolve. Both parties must collaborate to determine capture requirements, costs, and deliverables; manage the process; and agree on criteria.” – Meg Bellinger, President, Preservation Resources, Moving Theory into Practice, 2000.

Outsourcing
• Vendors
  – Octavo http://www.octavo.com/
  – Systems Integration Group http://www.sigi.com/
  – Preservation Resources http://www.oclc.org/oclc/presres/
  – Saztec http://www.saztec.com
  – Innodata http://www.innodata.com/
  – Northern Micrographics http://www.normicro.com/northern_micrographics.htm
Sustainability
• Digitization shouldn’t be a fling (when others are paying the bills). It is a marriage and more.
• Time = Money
• Permanence
• Data Migration and Emulation
• Review and schedule upgrades
• Documentation

Cost
• Not cheap, but consider the value of objects, the investment already made on your collections and your organizational mission.
• Prices range from $7 - $35 per image
• Most projects are funded on soft money. Attempt to incorporate scanning into normal operating budgets.
• Scanning is 1/3 of total cost.
• Largest cost is in research and time invested in creation of metadata or organization of collections.
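The figures on the Cost slide can be turned into a rough budget range. This is a back-of-the-envelope sketch, not a costing method from the presenters. It assumes the $7 to $35 per-image price covers the whole project, which the slide does not state, and it uses a 500-image job purely as an example. The function names are hypothetical.

```python
def cost_range(image_count, low_per_image=7.0, high_per_image=35.0):
    """Low and high project cost for a number of images at the slide's price range."""
    return image_count * low_per_image, image_count * high_per_image


def scanning_share(total_cost, scanning_fraction=1 / 3):
    """Portion of a total attributed to scanning, using the slide's 1/3 figure."""
    return total_cost * scanning_fraction


low, high = cost_range(500)  # 500 images is an illustrative project size
print("Project range:", low, "to", high)
print("Scanning portion:", round(scanning_share(low), 2), "to", round(scanning_share(high), 2))
```

The remaining two thirds is where the slide places the largest cost: research, metadata creation, and organizing the collection.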
Staffing
• Staff with tolerance for ambiguity
• Staff with creativity
• Training in metadata, scanning
• Photographic skills (artistic eye), microcomputer skills, web design skills
• Staff with risk taking attitude

Concluding Thoughts
• Create digital products worth preserving
• Collaborate!
• Adhere to standards
• Refresh/migrate your data
• Don’t forget preservation metadata: digital products are not copies, but new artifacts