Presented at 2015 PaLA Annual Conference on November 6, 2015 by
Linda Ballinger, Penn State
Doreva Belfiore, Temple University
Bill Fee, State Library of Pennsylvania
Leanne Finnigan, Temple University
Kristen Yarmey, University of Scranton
3. Who are we?
The Pennsylvania Digital Collections Project (PDCP)
Metadata team!
Linda Ballinger, Penn State
Doreva Belfiore, Temple University
Bill Fee, State Library of Pennsylvania
Leanne Finnigan, Temple University
Kristen Yarmey, University of Scranton
4. On the agenda
PDCP/DPLA Overview
Meet the aggregator
Why metadata matters
Field by Field metadata madness!
Derived Fields
Required Fields
Highly Recommended Fields
Recommended Fields
Optional Fields
5. Before we start
Q&A throughout
Fun breaks at panelists’ discretion!
Slides and guidelines will be available.
Most of all:
Don’t panic.
We’re all in this together.
7. Toward a PA DPLA Hub
August 2014: meeting at the Free Library of
Philadelphia
Initiated by Joe Lucia and Stacey Aldrich, former PA State
Librarian
Including representatives from a number of institutions
across the state
9. Why get involved?
DPLA as major discoverability conduit:
Worldwide exposure for PA content
DPLA as a means of working efficiently:
Collaboration at the cross-institutional level
Taking advantage of economy of scale
DPLA portal / API vs. customized silos
11. DPLA Hub and Spoke Model
Content Hubs:
Single institutions, 200K+ objects, e.g.
NARA, HathiTrust, NYPL
Service Hubs:
Content aggregation for many
institutions
State/regional level; ideally 1:1 ratio
Digital Commonwealth, Mountain West
Digital Library, Empire State Digital
Network
13. Digitization and Repository Support Activities
Digitization:
For organizations that have not started digitizing
materials, or have not done much
Potential for remote, local and mobile digitization
options (a.k.a. “scannebagos”)
Provided by the State Library of Pennsylvania
Content Hosting:
For organizations that already have digital files but no
current digital repository capabilities
Provided by POWER Library (HSLC)
Free for Pennsylvania institutions
14. SUCCESS!
PDCP announced as DPLA Pennsylvania
Service Hub, August 28, 2015!
Estimated Timeline:
September, 2015 - Orientation
October-November, 2015 - Metadata
normalization and harvesting tests
December, 2015 - Final ingest of data
into PDCP Aggregator
Early 2016 - Planned live ingest of
records into the DPLA!
15. PA-DPLA Aggregator
Proof-of-concept prototype
Penn State / Temple University /
State Library partnership
Dec. 2014 - Mar. 2015
Hydra (Fedora) - Open Source Platform
Harvesting & exposing metadata via OAI-PMH
https://github.com/tulibraries/dplah
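To make the harvesting step more concrete, here is a minimal Python sketch of how an OAI-PMH ListRecords request URL is put together. It is not the aggregator's own code (see the repository linked above for that). The function name, the endpoint, and the set name are hypothetical; only the OAI-PMH arguments (verb, metadataPrefix, set, resumptionToken) come from the protocol itself.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None,
                     resumption_token=None):
    """Build an OAI-PMH ListRecords request URL (illustrative helper)."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument: it is sent
        # with the verb only, without metadataPrefix or set.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if set_spec:
            # A set usually corresponds to one digital collection.
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint and set name, for illustration only.
print(list_records_url("http://www.server.org/oai", set_spec="photos"))
```

A harvester would request the first URL, read the records, and keep requesting with each resumptionToken the repository returns until none is left.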
20. Prototype Harvested Content
“Lowest hanging fruit”:
OAI-PMH harvestable data
29 institutions, 147K+ harvested records
Primarily targeting collections from PDCP Steering and
Planning Committee institutions
Keep numbers manageable for testing purposes
Scalable to full production mode for the future
29. CC0 Metadata
Contributing institutions are required to share
their metadata and thumbnails under a CC0
license (full access - no rights reserved).
The digital objects themselves retain any
original specified rights.
30. Collecting Scope
The following types of collections are NOT
currently accepted by the DPLA:
Scholarly materials: ETDs, journal articles
Finding aids: EADs, collection guides
Aggregate Description: Objects described at the folder,
series, or collection level instead of the item level
Items that don’t resolve to a publicly-accessible URL
Individual page-level objects instead of compound ones
32. Derived Fields
Derived fields are those
metadata fields that are
created by the PDCP
aggregator automatically
from the OAI-PMH feed.
Happy Face, Temple University, http://digital.library.temple.edu/cdm/ref/collection/p15037coll3/id/6541
Derived = “Don’t worry, be
happy!”
33. Thumbnail
Thumbnails are the small preview versions of
your digital object that are shown both in your
repository and in the DPLA.
They are important because they give viewers
a confirmation that they have found (or not
found) what they are looking for.
34. Thumbnail
Thumbnails can be derived by our aggregator
from these common repository systems:
CONTENTdm
Bepress
Omeka
VUDL
… and more to be added
36. Thumbnail
For other systems, we need a consistent path
where the thumbnail is housed, e.g.:
http://www.server.org/repo/thumbs/$identifier/
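As a rough illustration of what a "consistent path" buys us, the Python sketch below fills the example pattern above with a record identifier. The function name and the sample identifier are hypothetical, and this is only one possible way to do the substitution, not a description of the aggregator's code.

```python
from string import Template
from urllib.parse import quote

# The example pattern from the slide; $identifier is the placeholder.
THUMBNAIL_PATTERN = Template("http://www.server.org/repo/thumbs/$identifier/")


def thumbnail_url(identifier):
    """Fill the thumbnail pattern with one record's identifier."""
    # Percent-encode the identifier so spaces or slashes cannot break the path.
    return THUMBNAIL_PATTERN.substitute(identifier=quote(identifier, safe=""))


# Hypothetical identifier, for illustration only.
print(thumbnail_url("6541"))
```

The point is that one pattern per repository is enough: if every thumbnail lives at a predictable address, no per-record configuration is needed.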
37. Collection
The collection name is set up by the team
before harvesting. It generally matches the
digital collection name found online.
38. Contributing Institution
The contributing institution
name refers to YOUR
ORGANIZATION and is set
up by the team before
harvesting.
Are YOU in This?, Temple University, http://digital.library.temple.edu/cdm/ref/collection/p16002coll9/id/2952
40. Intermediate Provider
If your data is hosted by an aggregator or
common repository then we list that entity as an
Intermediate Provider, e.g.:
Keystone Library Network (KLN)
Lackawanna Valley Digital Archives (LVDA)
POWER Library (HSLC)
41. Resource Location
The Resource Location is a trackback to the
original collection URL for a digital object.
Example:
http://content.lackawannadigitalarchives.org/cdm/ref/collection/SPL/id/36
43. Resource Location
Early Library Staff, Scranton Public Library, http://content.lackawannadigitalarchives.org/cdm/ref/collection/SPL/id/36
44. Resource Location
Required by DPLA to present your original data record
Can be derived from the OAI-PMH data feed for typical systems:
CONTENTdm
Bepress
Omeka
VUDL
Can be custom mapped if needed for other systems, e.g.:
http://www.server.org/repo/$identifier/
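For a sense of how a Resource Location might be derived from the feed, here is a hedged Python sketch for a CONTENTdm-style record. It assumes the OAI identifier has the shape oai:<host>:<collection alias>/<item id>; that shape, the function name, and the sample identifier are assumptions for illustration. The target URL pattern (/cdm/ref/collection/.../id/...) is the one seen in the CONTENTdm examples in these slides.

```python
def contentdm_resource_location(oai_identifier):
    """Turn an OAI identifier into a CONTENTdm reference URL.

    Assumes identifiers shaped like 'oai:<host>:<alias>/<item id>'.
    """
    parts = oai_identifier.split(":", 2)
    if len(parts) != 3 or parts[0] != "oai":
        raise ValueError("unexpected identifier: " + oai_identifier)
    host, local_id = parts[1], parts[2]
    # The local part is assumed to be '<collection alias>/<item id>'.
    alias, _, item_id = local_id.partition("/")
    if not alias or not item_id:
        raise ValueError("unexpected identifier: " + oai_identifier)
    return f"http://{host}/cdm/ref/collection/{alias}/id/{item_id}"


# Hypothetical identifier built to match the Scranton Public Library example.
print(contentdm_resource_location(
    "oai:content.lackawannadigitalarchives.org:SPL/36"))
```

Systems that do not follow a known identifier shape are the ones that need a custom mapping like the pattern above.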
46. Title
Other than the thumbnail, the title is often the first piece of
information a user sees on a results list
Should be the name by which an object is known, not a file
name
47. Language
Required if appropriate
Three-letter ISO 639-2 language codes are preferred
Aggregator normalizes these codes to full language
names for display
Examples:
eng ---> English
ita ---> Italian
spa ---> Spanish
lat ---> Latin
san ---> Sanskrit
vie ---> Vietnamese
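The normalization step can be pictured as a simple lookup. The Python sketch below covers only the six codes shown on this slide; the function name is hypothetical, and a real table would include the full ISO 639-2 list.

```python
# ISO 639-2 codes from the slide, mapped to display names.
LANGUAGE_NAMES = {
    "eng": "English",
    "ita": "Italian",
    "spa": "Spanish",
    "lat": "Latin",
    "san": "Sanskrit",
    "vie": "Vietnamese",
}


def normalize_language(code):
    """Return the display name for a three-letter code, or None if unknown."""
    # Tolerate stray whitespace and capitalization in the source metadata.
    return LANGUAGE_NAMES.get(code.strip().lower())


print(normalize_language(" ENG "))
print(normalize_language("xyz"))
```

Returning None for an unknown value, rather than guessing, leaves room for a person to review anything the table does not recognize.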
49. Rights
Contains information about rights associated with the
resource
“In the public domain and may be used without copyright restriction.”
“Content is under copyright of the University of Scranton.”
http://creativecommons.org/licenses/by-sa/3.0/
REMINDER: DPLA will only accept objects that are
available and viewable to the general public
pdcp_noharvest
50. Rights
‘Getting it Right on Rights’
Working group (DPLA, Europeana, etc.)
Released white paper May 2015 and
opened it up for comments
Standardized rights statements
Coming soon!
52. Type
The nature or genre of the resource
DCMI Type Vocabulary recommended
Assign ‘Text’ type to images of texts
Think of the user
53. Type
Types used by DPLA:
text, image, sound, moving image,
physical object
The aggregator can map your local
types to these at the collection/seed
level
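Here is a small Python sketch of what mapping at the collection level could look like. The five DPLA types come from the slide and the DCMI terms from the DCMI Type Vocabulary; which DCMI term lands on which DPLA type, the function names, and the per-collection overrides ("oral history", "Postcard") are assumptions made up for illustration.

```python
DPLA_TYPES = {"text", "image", "sound", "moving image", "physical object"}

# DCMI Type Vocabulary terms (lowercased, spaces removed) to DPLA types.
DCMI_TO_DPLA = {
    "text": "text",
    "image": "image",
    "stillimage": "image",
    "movingimage": "moving image",
    "sound": "sound",
    "physicalobject": "physical object",
}


def _norm(value):
    """Lowercase and drop whitespace so 'Still Image' matches 'StillImage'."""
    return "".join(value.lower().split())


def map_type(local_type, collection_map=None):
    """Map a local type to a DPLA type; collection overrides win."""
    key = _norm(local_type)
    overrides = {_norm(k): v for k, v in (collection_map or {}).items()}
    mapped = overrides.get(key) or DCMI_TO_DPLA.get(key)
    if mapped is not None and mapped not in DPLA_TYPES:
        raise ValueError("not a DPLA type: " + mapped)
    return mapped  # None means the value needs a human decision


# Hypothetical local values, for illustration only.
print(map_type("StillImage"))
print(map_type("Oral History", {"oral history": "sound"}))
print(map_type("Postcard"))
```

Keeping the override table per collection means a repository does not have to change its own records to take part.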
68. Place
Multiple choice:
Philadelphia
Philadelphia; Pennsylvania
Philadelphia (Pa.)
Philadelphia, Pennsylvania, United States
Seventh and Sansom Streets (Philadelphia, Pa.)
Franklin Institute (Philadelphia, Pa.)
Facade of the original
Franklin Institute building
prior to moving to the
Parkway in 1934.
74. Subject
Many variations on a theme:
Newspapers
Student newspaper
New Holland (Pa.) Newspapers
Scranton (Pa.) -- Newspapers
West Chester University Student Newspapers
College student newspapers and periodicals -- Pennsylvania -- Scranton
University of Scranton -- Students -- Newspapers
Lock Haven University of Pennsylvania Student Newspaper Archive
77. Subject
Watch out for:
Quotation marks
"D.O.R.A at Westminster"
Separate terms with semicolons
, Holt, Colbin
32 Carat Club,anniversary ,charitable organizations,social services
Odd symbols or characters
2nd &
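Some of these problems can be caught with a light cleanup pass. The Python sketch below is one possible approach, not the aggregator's actual rules: it splits a subject string on semicolons, trims whitespace and surrounding quotation marks, and drops empty terms. The function name is hypothetical.

```python
def clean_subjects(raw):
    """Split a subject string on semicolons and tidy each term."""
    terms = []
    for piece in raw.split(";"):
        # Remove surrounding whitespace, then straight or curly quotes.
        term = piece.strip().strip('"\u201c\u201d').strip()
        # Collapse runs of internal whitespace to single spaces.
        term = " ".join(term.split())
        if term:
            terms.append(term)
    return terms


print(clean_subjects('"D.O.R.A at Westminster"; Newspapers ;; '))
```

Commas are deliberately left alone: a value like "Holt, Colbin" is one name, so comma-separated lists cannot be split safely by a script and are better fixed at the source.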
84. Format
From New York Heritage
http://cdm16694.contentdm.oclc.org/cdm/ref/collection/p15109coll6/id/2083
From African Americans Seen Through the Eyes of the Newsreel Cameraman
http://collections.contentdm.oclc.org/cdm/singleitem/collection/p9002coll1/id/277/rec/1
91. Checklist to Contribute Data
Permission letter agreeing to share metadata and
thumbnails to DPLA under a CC0 license
Data available on a publicly accessible website
Ability to share metadata via OAI-PMH or CSV
file
Staff available to work with PDCP about
metadata issues
92. What’s next?
See PDCP Metadata Guidelines (still in draft)
Let us know if you have feedback!
We plan to finalize v.1 in December
Living document
Would your institution like to contribute to the DPLA?
Email: dplainpa@gmail.com
Institutions will be forwarded to different organizations based upon
needs and readiness for data harvest (harvesting and metadata
support, repository support, digitization support)
93. Stay in touch:
Come to PA Backwards session tomorrow morning (9am)!
Email the PDCP team: dplainpa@gmail.com
Twitter: Follow us at @pdcp_pa
PADIGITAL Listserv - general information about statewide digital initiatives
PADIGITAL@listserv.albright.org
Send a message to listserv@albright.org with the text “subscribe padigital” in
the body
94. Resources and Support
What would help you the most?
Online office hours?
Webinar workshops?
…?
METADATA
Original: The Doctor Is In, Peanuts Worldwide, LLC.