This document provides an overview of a conference on building digital collections. It discusses selecting materials for digitization, setting priorities, copyright considerations, digitization methods, metadata, and file organization. Attendees learned about planning digital projects, choosing a scanner, assigning descriptive information, and creating standardized naming systems for digital files and folders. The presentation provided guidance on effectively building organized and sustainable online collections.
Building Digital Collections: Planning and Creating
1. LOCAL HISTORY & HISTORIC PRESERVATION CONFERENCE
OCTOBER 11, 2013
BUILDING DIGITAL COLLECTIONS
PART 1: PLANNING AND CREATING
Supported by WHRAB
2. TODAY’S AGENDA
• Introductions
• Selecting materials
• Selection criteria
• Setting priorities
• Copyright considerations
• Cost considerations
• Digitizing collections
• Choosing a scanner
• Formats and standards
• When NOT to scan yourself
• Metadata
• What is metadata?
• Assigning titles and subject
headings
• Organizing and naming files
• Wrap-up and final thoughts
Waterford Public Library/University of
Wisconsin Digital Collections
3. INTRODUCTIONS
• We are…
• Sarah Grimm, Electronic
Records Archivist, Wisconsin
Historical Society
• Emily Pfotenhauer, Recollection
Wisconsin Program Manager,
WiLS
• You are…
• What organization do you
represent?
• What digital projects are you
currently working on or
thinking about?
Eager Free Public Library/University of
Wisconsin Digital Collections
4. WHAT DO YOU MEAN, DIGITIZE?
• Selecting materials
• Reformatting materials
(scanning or photographing)
• Adding metadata
(descriptive information)
• Making available online
• Storing and maintaining
digital files and data (digital
preservation)
Wisconsin Historical Society
5. DEFINING A DIGITAL COLLECTION
• A good digital collection…
• Is publicly accessible
• Is searchable - Includes keywords and other descriptive
information (metadata) so users can find what they’re looking for
• Uses software that is sustainable (will be around for a long time)
and interoperable (can be migrated or shared)
• Remains true to the original materials
• Respects intellectual property rights
• A digital collection is not…
• An inventory
• An online exhibit/gallery/slideshow
6. BEFORE YOU EVEN START…..
• Don’t scan a mess! Take the
time to assess and organize
your originals first.
• A digital project can be an
ideal time to evaluate
collection conditions and
rehouse materials as needed.
• Resources for collections care
and organization:
• Wisconsin Historical Society
Field Services staff
• Wisconsin Archives Mentoring
Service
• National Park Service
Conserve-O-Grams
Richland County History Room
8. TYPES OF MATERIALS
• Photographs
• Postcards
• Letters
• Diaries
• Scrapbooks
• Yearbooks
• Newspaper clippings
• City directories
• Local histories
• Magazines
• Pamphlets
• Maps
• Artifacts/3-D objects
• Oral histories
• Sound recordings
• Video recordings
• Other?
Appleton Public Library
9. DEVELOPING SELECTION CRITERIA
When developing a selection policy, consider…
• Your organization’s mission statement and collecting policies
• Appeal and interest (is this of value to researchers? To other
audiences?)
• Uniqueness of materials (is this the only source or does it also
exist elsewhere? Avoid duplication)
• Focusing on a specific subject, theme or creator
• Manageability – tackle a project of appropriate size and scope
10. SETTING PRIORITIES
Ask yourself which materials are…
• most significant to your
organization?
• most extensive?
• most requested/used?
• easiest?
• oldest?
• newest?
• at risk?
Neville Public Museum of Brown County
11. SELECTION – YES OR NO
• This item is rare or unique to our collection.
• This item is frequently requested by our patrons/visitors.
• This item or very similar items are not found anywhere else on the Internet.
• There is enough accurate information available about the item to add
useful context for our audience (for example, we know or can find out
names of people, locations, dates).
• We have the appropriate equipment to create an accurate, high-quality
digital copy of this item (for example, item is not too large to fit on
scanner), or funding to outsource if needed.
• This item is in stable condition and will not be damaged by scanning or
other handling.
• This item is in the public domain or we have secured permission from the
rights holder to make it available online.
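The checklist above works as an all-or-nothing test: the speaker notes say that a "no" to any statement means the item may not be a good candidate. Here is a minimal Python sketch of that idea. It is not part of the original presentation, and the function names and short labels are made up for illustration.

```python
# Hypothetical short labels for the seven yes/no statements on this slide.
CHECKLIST = [
    "rare_or_unique",
    "frequently_requested",
    "not_already_online",
    "enough_context_known",
    "equipment_or_funding_available",
    "stable_condition",
    "public_domain_or_permission",
]


def failed_checks(answers):
    """Return the checklist items answered "no" (or not answered at all)."""
    return [item for item in CHECKLIST if not answers.get(item, False)]


def is_good_candidate(answers):
    """True only if every statement on the checklist was answered "yes"."""
    return not failed_checks(answers)


if __name__ == "__main__":
    answers = {item: True for item in CHECKLIST}
    answers["stable_condition"] = False  # e.g. a brittle scrapbook
    print(is_good_candidate(answers), failed_checks(answers))
```

The list of failed checks is the more useful result in practice, since it shows what would have to change (funding, permission, conservation work) before the item could be scanned.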
12. CONSIDERING COPYRIGHT
• Disclaimer: We are not
lawyers.
• Owning a physical item does
not necessarily mean you
hold the copyright to that
item.
• Public domain = no longer
under copyright. In the US
in 2013 that means the item
was:
• Published before 1923 –OR–
• Unpublished; creator died
before 1943 –OR–
• Unpublished; unknown
creator; made before 1893
UW-Milwaukee Libraries
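The three rules of thumb above can be written as a small decision function. This is only a sketch for sorting items into "probably public domain" and "needs a closer look". It is not legal advice. The cutoff years are copied from the slide and apply to 2013 only, and the function and parameter names are hypothetical.

```python
def presumed_public_domain_2013(published, year=None, creator_known=True,
                                creator_death_year=None):
    """Apply the slide's three rules of thumb for US public domain in 2013.

    published: True if the item was published, False if unpublished.
    year: year of publication (published) or of creation (unpublished).
    creator_known: False when the creator cannot be identified.
    creator_death_year: year the creator died, if known.
    Returns True only when one of the three rules clearly applies.
    """
    if published:
        # Rule 1: published before 1923.
        return year is not None and year < 1923
    if creator_known:
        # Rule 2: unpublished, creator died before 1943.
        return creator_death_year is not None and creator_death_year < 1943
    # Rule 3: unpublished, unknown creator, made before 1893.
    return year is not None and year < 1893
```

Anything the function does not clearly place in the public domain (including items with missing dates) returns False and should be treated as possibly under copyright.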
13. CONSIDERING COPYRIGHT
• Works under copyright,
copyright holder is known:
• Contact copyright holder IN
WRITING to request
permission to make available
online.
• Works presumed to be
under copyright; copyright
holder is unknown or
cannot be located:
• Due diligence has been made
to identify and locate
copyright holder.
• Be prepared to remove item
from digital collection if
challenged.
Three Lakes Historical Society
14. SAMPLE COPYRIGHT STATEMENTS
• For an item presumed to be in the public domain: This item is in
the public domain. There are no known restrictions on the use of this
digital resource. Contact [your institution] to purchase a high-
resolution version of this image.
• For an item under copyright; copyright holder has granted
permission to put online: This image has been made available with
permission of the copyright holder and has been provided here for
educational purposes only. Commercial use is prohibited without
permission. Contact [your institution] for information regarding
permissions and reproductions.
• For an item in which copyright status is undetermined: This
material may be protected by copyright law. The user is responsible
for all issues of copyright. Contact [your institution] for information
regarding permissions and reproductions.
16. POTENTIAL PROJECT COSTS
• Scanner
• Outsourcing imaging to a
commercial vendor
• Digital camera and related
equipment
• Internet access
• Storage for digital files
• Software for online access
• Archival storage supplies
• Be sure to budget for TIME
and SPACE
Merrill Historical Society
17. FUNDING
• Grants
• Historical societies: WI Council
for Local History mini-grants
• Public libraries: LSTA
Digitization of Local Resources
grants (Dep’t of Public
Instruction)
• Local corporations or
foundations
• In-kind contributions
• Tech support
• Equipment use
• Biggest expense is TIME
• Paid staff time
• “Free” volunteer time
• Students/interns
Ripon College
19. DIGITAL IMAGING
• Goals of imaging:
• Create a digital
representation that’s
faithful to the original item
• Create the highest quality
image you can with
available resources
• Anticipate multiple uses
(online, print publication,
exhibit, etc.)
• Scan once—don’t expect to
return to re-digitize
UW-Madison Archives
20. CHOOSING A SCANNER
• Some features to look for:
• Transparency unit
--for scanning slides and negatives
• Size of scanning bed
• Image editing software
--many new scanners come with Photoshop Elements
• Compatible with your computer’s operating system
• Is your computer fast enough to process large image files?
21. SCANNING PHOTOGRAPHS
• Scan all photographs in 24-bit
color, even if image is black
and white
• Scanning resolution (ppi)
depends on size of original
item
• Longest side of item longer
than 7” = 300ppi
• Shorter than 7” = 600ppi
• Save two copies of each scan:
• High resolution TIFF (20-
40MB) for archiving and
printing
• Lower resolution JPEG (1-5MB)
for online collection, email,
social media
UW-La Crosse
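The resolution rule above, and the file sizes it leads to, can be checked with a little arithmetic. The Python sketch below is an illustration rather than part of the original slides. It assumes an uncompressed TIFF at 3 bytes per pixel (24-bit color), counts 1 MB as 1,000,000 bytes, and ignores file headers. The slide does not say what to do at exactly 7 inches, so the sketch uses 600 ppi there.

```python
def scan_ppi(width_in, height_in):
    """Pick a scanning resolution from the size of the original, in inches."""
    longest = max(width_in, height_in)
    # Assumption: exactly 7 inches is treated as "not longer than 7".
    return 300 if longest > 7 else 600


def uncompressed_tiff_mb(width_in, height_in, ppi=None):
    """Estimate the size of an uncompressed 24-bit color scan in megabytes."""
    if ppi is None:
        ppi = scan_ppi(width_in, height_in)
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return pixels * 3 / 1_000_000  # 24-bit color = 3 bytes per pixel
```

Under these assumptions an 8 x 10 inch print at 300 ppi comes to about 21.6 MB, and a 4 x 6 inch print at 600 ppi to about 25.9 MB. Both fall inside the 20-40 MB range given on the slide.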
22. TIP: USE YOUR HISTOGRAM
• A histogram is a graph that shows
the distribution of dark and light
pixels in a digital image.
• Using the Histogram function
improves the accuracy/fidelity of
your scan
• Do a preview scan
• In advanced/professional/custom
mode, select the Histogram
function
• Move the left and right sliders to
each end point of the histogram
• Do not move the sliders INTO the
histogram
• Scan the image
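To make the histogram tip concrete, here is a minimal Python sketch of what the sliders are doing. It is not how any particular scanner program works. It assumes a single channel of brightness values from 0 (black) to 255 (white), and the function names are made up for illustration.

```python
def histogram(pixels, levels=256):
    """Count how many pixels fall at each brightness level (0 = black)."""
    counts = [0] * levels
    for value in pixels:
        counts[value] += 1
    return counts


def slider_end_points(counts):
    """Return (left, right): the darkest and lightest levels actually present.

    Putting the sliders at these positions places them at the two ends of
    the histogram without moving INTO it, so no pixel data is clipped.
    """
    occupied = [level for level, count in enumerate(counts) if count > 0]
    if not occupied:
        raise ValueError("image has no pixels")
    return occupied[0], occupied[-1]
```

For example, a preview whose darkest pixel is 12 and lightest is 231 gives end points of (12, 231). Moving the left slider past 12 or the right slider below 231 would throw away detail in the shadows or highlights.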
23. TIP: PLACE IMAGES CAREFULLY
Leave a border on all four sides
OR crop all four sides evenly.
24. SCANNING DOCUMENTS
• Handwritten texts
• Scan in 24-bit color to
retain character of
original
• 300-400ppi is generally
sufficient
• If feasible, create a
transcription
• Use care when unfolding
papers or handling tightly
bound volumes
Wisconsin Historical Society
25. SCANNING DOCUMENTS
• Printed texts
• Scan in 8-bit grayscale or
24-bit color
• 300ppi is generally
sufficient
• Use OCR (Optical Character
Recognition) software to
make the text computer-
searchable
• May be provided with your
scanner software
• ABBYY FineReader
• Adobe Acrobat
• OCR is never 100% accurate,
but that’s ok
L. E. Phillips Memorial Library, Eau Claire
26. WORKING WITH PRINTED TEXT? OCR!
• OCR = Optical Character Recognition
• Software that makes printed text computer-readable and fully
searchable
• Very valuable when scanning books, yearbooks, city
directories, newspaper clippings, etc.
• A couple of options…
• ABBYY FineReader ($100-$170)
• Adobe Acrobat ($45 through techsoup.org)
27. WHEN NOT TO SCAN IT YOURSELF
• Look to a vendor for scanning…
• Oversized materials
--maps, blueprints, etc.
• Fragile books or scrapbooks
--bindings can be damaged by laying flat to scan
• Anything with flaking, cracked or otherwise fragile surface
• Microfilm
--newspapers
• Potential vendors
• Northern Micrographics, La Crosse
• A/E Graphics, Milwaukee
• Wisconsin Historical Society (for microfilm)
29. METADATA: WHAT IS IT?
• Information about stuff
• Technical metadata = information
about the digital file (size, type,
etc.)
• Descriptive metadata =
information about the content of
the item (what are we looking
at?)
• Helps users find what they’re
looking for
• Organized, standardized,
consistent, searchable
Grant County Historical Society
33. SAMPLE METADATA
Field Name: Sample Data
Title: DiVall barber shop, Middleton, 1925
Subjects: Barbers; Barbershops
Type: Still image
Format: image/tiff
Rights statement: This material may be protected by copyright law. The user is responsible for all issues of copyright.
File name: 2006_01_12.tif
Submitter: Middleton Area Historical Society
Date digitized: 2013-04-05
Middleton Area Historical Society
34. SAMPLE METADATA
Field Name: Sample Data
Creator: Bartle, F. C.
Date Created: 1925-09-12 OR 1920-1930
Materials: Photographs
Description: Ralph DiVall (left) and Edwin T. Baltes (right) shave two men seated in barber chairs. According to a family history on file at the Society, DiVall operated this barber shop from the 1920s until his retirement on July 1, 1966.
Location: Middleton, Dane County, Wisconsin
Collection: DiVall Family Collection
Identifier: 2006.01.12
Middleton Area Historical Society
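One way to keep metadata "organized, standardized, consistent" is to hold each record in a fixed structure and check it before upload. The Python sketch below uses a subset of the sample fields from the two slides above. The choice of required fields and the `problems` helper are assumptions for illustration, not a rule from the presentation.

```python
import re

# Field names and values copied from the two sample slides.
record = {
    "Title": "DiVall barber shop, Middleton, 1925",
    "Creator": "Bartle, F. C.",
    "Date Created": "1925-09-12",
    "Subjects": ["Barbers", "Barbershops"],
    "Type": "Still image",
    "Format": "image/tiff",
    "Location": "Middleton, Dane County, Wisconsin",
    "Collection": "DiVall Family Collection",
    "Identifier": "2006.01.12",
    "File name": "2006_01_12.tif",
    "Submitter": "Middleton Area Historical Society",
    "Date digitized": "2013-04-05",
}

# Hypothetical choice of required fields, for illustration only.
REQUIRED = ["Title", "Subjects", "Type", "Format", "Identifier", "File name"]

# Accept an exact date (YYYY-MM-DD) or a year range (YYYY-YYYY).
DATE_PATTERN = re.compile(r"^(\d{4}-\d{2}-\d{2}|\d{4}-\d{4})$")


def problems(rec):
    """List missing required fields and badly formatted dates."""
    found = [f"missing: {name}" for name in REQUIRED if not rec.get(name)]
    for name in ("Date Created", "Date digitized"):
        value = rec.get(name)
        if value and not DATE_PATTERN.match(value):
            found.append(f"bad date in {name}: {value}")
    return found
```

Writing dates the same way every time (1925-09-12 or 1920-1930, as in the sample) is what makes them sortable and searchable later. A value like "Sept. 12, 1925" would be flagged by this check.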
36. EXISTING TITLES
If the photograph contains a title or caption, transcribe it exactly.
Birds-eye-view, No. 4,
1908, Barneveld, Wis.
37. WHAT MAKES A GOOD TITLE?
If the photo does not already have a title, you’ll need to create
one.
A useful title is…
• Descriptive and specific
• Brief
• Follows specific formatting rules
• Capitalize first word and proper names (people, places, institutions)
• Don’t start with “A” or “The”
• Period not needed at the end
38. BASIC FORMULA FOR CREATING TITLES
SUBJECT, LOCATION, DATE
• Subject: person, object, building, etc.
• Location: city OR township OR county
• Date: year or date range
Only include an element IF KNOWN
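The basic formula is simple enough to express as a short function. The following Python sketch is an illustration, not a tool from the presentation. It joins whichever elements are known, in SUBJECT, LOCATION, DATE order, and applies two of the formatting rules from the previous slide (capitalize the first word, no period at the end). The function name is hypothetical.

```python
def build_title(subject=None, location=None, date=None):
    """Join the known elements in SUBJECT, LOCATION, DATE order."""
    parts = [p.strip() for p in (subject, location, date) if p and p.strip()]
    title = ", ".join(parts).rstrip(".")  # no period needed at the end
    # Capitalize only the first character so proper names stay untouched.
    return title[:1].upper() + title[1:]
```

Called with "DiVall barber shop", "Middleton" and "1925", it reproduces the sample title "DiVall barber shop, Middleton, 1925". If the location or date is unknown, that element is simply left out.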
39. PEOPLE & PORTRAITS
• Identify the person’s name (first
name, last name)
• Identify the location to the most
specific level possible (City OR
Township OR County)
• do not include state
• Identify the date (Specific year?
Date range?)
41. PEOPLE & PORTRAITS
• Identify…Who? Where?
When?
• Women
• Children
• Babies
• Carriages/strollers
• Stores/shops
• Boardwalk
• Marathon County
• 1890-1899
42. Women and children with babies in carriages,
Manitowoc County, 1890-1899
(SUBJECT, LOCATION, DATE)
43. BUILDINGS AND CITYSCAPES
• Identify the name of the street or view
• Identify the location (City OR Township OR County)
• Identify the date (Year? Date range?)
44. 100 block of South Main Street,
Fort Atkinson, 1940-1949
(SUBJECT, LOCATION, DATE)
45. EXPANDED FORMULA FOR CREATING TITLES
SUBJECT, ACTIVITY, LOCATION, DATE
• Subject: person, object, building, etc.
• Activity: action or event
• Location: city OR township OR county
• Date: year or date range
Only include an element IF KNOWN
46. ACTIVITIES AND EVENTS
Identify…Who? What are they
doing? Where? When?
• Tailor and customer
• Measuring
• Two Rivers
• Date unknown – 20th century
48. ACTIVITIES AND EVENTS
Identify…Who? What are
they doing? Where and
when?
• Circus elephant
• Trainer
• Woman on swing
• Evansville
• 1940-1949
49. Trainer with circus elephant holding woman on swing,
Evansville, 1940-1949
(SUBJECT, ACTIVITY, LOCATION, DATE)
50. EXERCISE - ASSIGNING TITLES
Work in small groups to assign a title to a historic
photograph.
Remember the basic title formulas:
• SUBJECT, LOCATION, DATE
• SUBJECT, ACTIVITY, LOCATION, DATE
51. ASSIGNING SUBJECT HEADINGS
• Subject headings are terms or
phrases assigned to an item to
facilitate searching and
browsing a collection.
• Consistent use of subject
headings helps link related
content in your collection and
across disparate collections.
52. CONTROLLED VOCABULARIES
• A controlled vocabulary is a
standardized, pre-determined
list of subject headings.
• Some examples of controlled
vocabularies:
• Library of Congress Thesaurus
for Graphic Materials
• Library of Congress Subject
Headings
• Getty Art and Architecture
Thesaurus
• Nomenclature 3.0
New Berlin Historical Society
53. TIPS FOR ASSIGNING SUBJECT HEADINGS
• Consider the following elements to help select terms:
• WHO? People - age, gender, occupation, ethnicity
• WHERE? Building or other setting
• WHAT? Activities or events
• Always copy terms exactly from the controlled vocabulary.
• Think of your own “tags,” then search the controlled
vocabulary list for correct terms.
• How did others do it? Look at similar photos for
examples/ideas.
• Aim for 1-5 terms.
• There is no one right answer!
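The "think of your own tags, then look them up" tip can be sketched in a few lines of Python. The vocabulary below is a tiny illustrative stand-in, not the real Thesaurus for Graphic Materials, and the function name is made up. Matching here is a case-insensitive exact match only. Finding the closest official term for a tag that does not match still takes human judgment.

```python
# Tiny illustrative stand-in for a controlled vocabulary; a real project
# would look terms up in the full published list instead.
VOCABULARY = {"Barbers", "Barbershops", "Children", "Women", "Elephants"}


def match_terms(tags, vocabulary=VOCABULARY, max_terms=5):
    """Map free-form tags to exact vocabulary terms.

    Returns (accepted, rejected). Accepted terms are spelled exactly as in
    the vocabulary, without repeats, and at most max_terms are kept.
    """
    by_lowercase = {term.lower(): term for term in vocabulary}
    accepted, rejected = [], []
    for tag in tags:
        term = by_lowercase.get(tag.strip().lower())
        if term is None:
            rejected.append(tag)
        elif term not in accepted:
            accepted.append(term)
    return accepted[:max_terms], rejected
```

With the tags "barbers", " Barbershops" and "haircut", the first two come back spelled exactly as the vocabulary has them, and "haircut" is returned as rejected so a person can search the list for a suitable term.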
58. EXERCISE – ASSIGNING SUBJECTS
Work in small groups to assign subject headings to
a historic photograph (choose a maximum of 5
terms).
Select terms from the short list extracted from the
Library of Congress Thesaurus for Graphic
Materials. The full version of this controlled
vocabulary is available online:
http://www.loc.gov/rr/print/tgm1/
59. FILE NAMING AND ORGANIZATION
Sixty Years of Quality Canning by the Lakeside Packing Company, ca. 1947.
Manitowoc Public Library/ University of Wisconsin Digital Collections
60. WHY IS THIS IMPORTANT?
• To create organizational standards
• To help you find it again
• To prevent accidental overwriting
• To eliminate (minimize) duplication of files
Train Wreck Image ID: WHi-2011
61. FILE NAMING
• Keep folder / document titles
short and descriptive
• Use only lower case letters,
numbers, and dashes or
underscores
• Don’t use spaces or
punctuation
• Don’t use special characters in
your file/folder titles
(^”<>|? / : @’* &.)
(Just because you CAN doesn’t
mean you SHOULD…..)
Typing at Dickinson Secretarial School
Image ID: WHi-19562
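The naming rules on this slide are easy to check automatically. Below is a minimal Python sketch, not part of the original presentation. It assumes that a single period before the file extension is allowed (as in the presenters' own examples such as mac001.tif), and the function names are hypothetical.

```python
import re

# Lower-case letters, digits, dashes and underscores, then one extension.
VALID_NAME = re.compile(r"^[a-z0-9_-]+\.[a-z0-9]+$")


def is_valid_name(file_name):
    """Check a file name against the slide's rules."""
    return bool(VALID_NAME.match(file_name))


def clean_name(file_name):
    """Suggest a rule-following version of an existing file name."""
    stem, dot, extension = file_name.rpartition(".")
    if not dot:  # no extension present
        stem, extension = file_name, ""
    stem = stem.strip().lower()
    stem = re.sub(r"[\s.]+", "_", stem)      # spaces and periods -> underscore
    stem = re.sub(r"[^a-z0-9_-]", "", stem)  # drop every other character
    extension = re.sub(r"[^a-z0-9]", "", extension.lower())
    return f"{stem}.{extension}" if extension else stem
```

A name like "My Stuff (final).TIF" becomes "my_stuff_final.tif". The sketch only fixes characters. Keeping names short and descriptive is still up to whoever names the file.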
62. FILE NAMING
• Date your documents consistently
• yyyymmdd_brieftitle.xxx
• Use leading zeroes for consecutive numbering. For example, a
multi-page letter could have file names mac001.tif,
mac002.tif, mac003.tif, etc.
• Tie your file names to existing catalog numbers if possible
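Both patterns on this slide (a yyyymmdd date prefix and numbers with leading zeroes) can be generated rather than typed by hand. The Python sketch below is one way to do it. The function names and the three-digit default width are assumptions based on the mac001.tif example.

```python
from datetime import date


def dated_name(day, brief_title, extension):
    """Build a yyyymmdd_brieftitle.xxx name from a datetime.date."""
    return f"{day:%Y%m%d}_{brief_title}.{extension}"


def numbered_names(prefix, count, extension="tif", width=3):
    """Build consecutive names with leading zeroes, e.g. mac001.tif."""
    return [f"{prefix}{n:0{width}d}.{extension}" for n in range(1, count + 1)]
```

For example, `dated_name(date(2013, 10, 11), "agenda", "pdf")` gives "20131011_agenda.pdf". The leading zeroes matter because file lists sort names character by character. Without them, mac10.tif would sort ahead of mac2.tif.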
63. EXAMPLES
• Photograph with accession # 2011.32.1 = 201132001.tif –OR–
2011_32_001.tif
• Series of images by photographer John Smith = smith001.tif,
smith002.tif, smith003.tif
• Not so good: Glassplate16039 Auto repair in basement 025.tif
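Tying file names to catalog numbers can be done with a small conversion function. This Python sketch follows the first example on this slide, where accession number 2011.32.1 becomes 2011_32_001.tif or 201132001.tif. The function name and the assumption that accession numbers are digits separated by periods are for illustration only.

```python
def accession_to_file_name(accession, extension="tif", separator="_", width=3):
    """Turn an accession number such as 2011.32.1 into a file name.

    The last part is zero-padded to `width` digits, as in the slide's
    example. Pass separator="" for the compact form.
    """
    parts = accession.split(".")
    if not all(part.isdigit() for part in parts):
        raise ValueError(f"unexpected accession number: {accession!r}")
    parts[-1] = parts[-1].zfill(width)
    return separator.join(parts) + "." + extension
```

The padding width is a choice each organization makes once and then documents. The sample metadata earlier in the deck pairs identifier 2006.01.12 with file name 2006_01_12.tif, which corresponds to `width=2` here.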
64. RESOURCES
• State Library of North Carolina –
• Web
http://www.archive.org/details/WhyFileNamingIsImportant
http://www.archive.org/details/HowToChangeAFileName
http://www.archive.org/details/WhatNotToDoWhenNamingFiles
http://www.archive.org/details/WhatToDoWhenNamingFiles
• YouTube
http://digitalpreservation.ncdcr.gov/tutorials.html
65. FILE ORGANIZATION AND MANAGEMENT
• Centralize your files
• Minimize your layers
• Leave breadcrumbs
(AKA “READ ME”)
• Determine what you
don’t know
IH General Office Mail Room
Image ID: WHi-12016
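"Minimize your layers" and "leave breadcrumbs" can both be checked with a quick walk through the folder tree. The Python sketch below is only an illustration. It reports how deep each folder sits and whether it contains a file whose name starts with "readme". The function names and that README naming test are assumptions.

```python
import os


def folder_report(root):
    """Yield (relative_path, depth, has_readme) for every folder under root."""
    root = os.path.abspath(root)
    for folder, _subfolders, files in os.walk(root):
        relative = os.path.relpath(folder, root)
        depth = 0 if relative == "." else relative.count(os.sep) + 1
        has_readme = any(name.lower().startswith("readme") for name in files)
        yield relative, depth, has_readme


def deepest_level(root):
    """Return the number of folder layers below root."""
    return max(depth for _path, depth, _readme in folder_report(root))
```

The sketch only reads the folder structure and changes nothing on disk. A large value from `deepest_level` is a hint that the structure could be flattened, and folders reported without a README are candidates for a short note on contents and naming conventions.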
66. WHAT NOT TO KEEP?
• Backups/copies/drafts
• Supplementary files that
provide no additional
long-term value
• Corrupted files
• Same item – different
file formats
• Items that don’t fit your
organization’s purpose
Boy on Curb near Trash Pile
Image ID: WHi-57208
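Two items on this list (copies, and the same item in different file formats) can be found automatically. The Python sketch below is not part of the original presentation, and its function names are hypothetical. It compares file contents with a SHA-256 checksum to find exact copies, and compares file names without their extensions to find format variants. It reads each file fully into memory, and it only reports. It never deletes anything.

```python
import hashlib
import os
from collections import defaultdict


def exact_duplicates(paths):
    """Group files whose contents are byte-for-byte identical."""
    by_digest = defaultdict(list)
    for path in paths:
        with open(path, "rb") as handle:
            digest = hashlib.sha256(handle.read()).hexdigest()
        by_digest[digest].append(path)
    return [sorted(group) for group in by_digest.values() if len(group) > 1]


def same_item_different_formats(paths):
    """Group files that share a name but differ in extension."""
    by_stem = defaultdict(set)
    for path in paths:
        stem, _extension = os.path.splitext(os.path.basename(path))
        by_stem[stem.lower()].add(path)
    return {stem: sorted(group) for stem, group in by_stem.items()
            if len(group) > 1}
```

Results need a human review before anything is removed. For instance, the TIFF and JPEG copies of a scan recommended earlier in the deck share a name on purpose and should both be kept.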
68. WRAPPING UP – FINAL THOUGHTS
Commencement, 1978
UW-Madison Archives
69. TIPS FROM OTHER DIGITIZERS
• If I could do it all over
again, I would:
• Tackle a smaller group of
materials at first
• Make sure two people
started the project at the
same time so we could help
each other
• Start with a clearer plan
• Take the time to sort and
research the physical
collection before digitizing
• Have firm deadlines to help
me stay on track
Langlade County Historical Society
70. NEXT STEPS/TO DO LIST
• Review collections and set priorities for digitization.
• Consider developing a written selection policy.
• Determine the copyright status of any materials you plan
to share online and secure permissions from copyright
holders if materials are not in public domain.
• Acquire scanning equipment or make other plans for
conversion.
• Familiarize yourself with good, useful metadata by
looking at other online collections.
• Develop a file naming convention document.
71. THANK YOU!
• Sarah Grimm, Wisconsin
Historical Society
sarah.grimm@wisconsinhistory.org
608-261-1008
• Emily Pfotenhauer, WiLS
emily@wils.org
608-616-9756
• Slides and handouts
available at
http://recollectionwisconsin.org/localhistory2013
South Wood County Historical Museum
Editor's Notes
Once you have your selection criteria, it may not be possible to review/select everything at once, so how might you sequence the process? Again, the answer will be different for each organization. Think about what’s:
• most significant to your organization?
• most extensive? (and therefore a more coherent body of material to manage)
• most requested/used?
• easiest to tackle (e.g. most familiar, most ready for ingest – a quick win for your digital preservation process; very helpful when you are having to prove the value of your efforts to a reluctant administration)
• oldest (possible historical importance)
• newest (possible immediate interest)
• mandated (via local policies, legislation, etc.)
• at risk? If it were no longer available, what digital files would be the hardest to replace? Some formats become obsolete a lot faster than other formats. PDFs are viable for a really long time – video files, however, get old very quickly.
If you answered “no” to any of these questions, the item may not be a good candidate for digitization.
Copyright demo
As you are going through the selection process, you will need to establish how you are going to name and organize your files. You are likely to find things in many places, named in many different ways depending on who worked on the item. Psychologically, digital items are much easier for people to save: 100 items on your hard drive don’t take up as much visual space as 100 items in your office, and a file that is 1 KB looks pretty much like one that is 1 MB or 1 GB. There also tend to be more copies of digital items: everyone keeps a draft, or it gets attached to an email and sent to 10 people, or it gets filed in two places. Everybody keeps their own items; project documentation is rarely one person managing the group’s information anymore, so the copies are multiplied by the number of people working on the project. As a result, EVERYTHING IS SAVED “just in case,” and it’s often saved more than once.
Standards – Need a baseline so that everyone knows how to name items as well as how NOT to name them, or where and how items will be stored.
Short and Descriptive – My record is a file name with 167 characters. While really descriptive, it was too hard to work with. Couldn’t read the entire title in a file list and couldn’t copy it since it was buried in several layers of folders. We tend to name things in ways that make sense to us at the time, but this is not handy for long term preservation. You need to name things in a way that will make sense 20 years from now. Has anyone inherited files from previous employees or projects – do they make any sense? “My stuff” “Important” “To Read”
Searching is really difficult if you have to search through multiple layers. Many types of documents will be easier to find if you can come up with a consistent date naming convention.
This slide contains links to both the web version and the YouTube version of 4 videos created by the State Library of North Carolina about file naming procedures. They total about 10 minutes and provide some great tips.
Co-locate – It’s OK to move things around if it makes sense to do so.
Layers – If you have several layers to hunt through, it can be really hard to find anything. Shallow is better. Searching is really difficult if you have to search through multiple layers.
Breadcrumbs – OK to leave “sticky notes” (AKA “READ ME” files) in folders. They can give a brief description of contents, retention schedule, and any naming conventions.
Don’t know – unknown file formats, files on old media (floppies), password-protected files.
File backups – e.g. speeches had multiple drafts, plus a final version and copies in several different font sizes.
Supplementary files – e.g. a folder of images that were used in a PowerPoint.
Files you can’t open – corrupted.
Formats – you may receive both Word and PDF versions and may not want to keep both.
As you are creating your inventory, you are likely to discover a lot of really simple places where you can clean up the files you are reviewing.
Co-locate – It’s OK to move things around if it makes sense to do so.
Bury – If you have several layers to hunt through, it can be really hard to find anything. Shallow is better.
Once you’ve decided how you want to handle file naming issues and have made file management decisions, document it. It doesn’t have to be long. You can distribute it in your organization: post it on an intranet or place it in a procedures manual.
WHY?
• You will not be the only keeper of the information. (You weren’t here to ask.)
• It will help others who may be helping you with the inventory.
• You can hand it out to organizations/departments you receive information from: in order to better manage our files, we will accept these file types and formats, and they will be named this way. Do not give us password-protected documents.
You don’t have to organize and fix everything, but you do need to give other people the tools to help you.