Data Curation and Data Curation
Services in Academic Libraries -
an Overview
Patricia Hswe | Pennsylvania State University Libraries
Digital Content Strategist
Head, ScholarSphere User Services
CLIR Postdoctoral Fellowship Program Summer Seminar
Bryn Mawr College | 30 July 2014
1
My Context
How I Got Here
2
“Cubicle wall” - by sidewalk flying CC BY 2.0
First CLIR postdoc cohort
U. Illinois at Urbana-
Champaign: Slavic &
East European Library,
2004-2006
Slavic Digital
Humanities Fellow
Digital projects,
reference services,
workshops Summer 2004
3
Library and information
science school - GSLIS
Digital Library track
Document Modeling; Use and
Users; Interfaces to Information
Systems; Digital Libraries;
Information Modeling;
Metadata; Information Transfer
& Collaboration in Science;
Ontology Development
Graduate and research
assistantships - Digital
preservation projects, Engineering
and Mathematics libraries
4
National Digital Information &
Infrastructure Preservation
Program
Publishing and Curation Services
Penn State University Libraries
{ Since 2008 }
5
Template for the Talk
What is data curation?
Curation is not a solo undertaking.
Developing a repository service - one
piece of the data curation puzzle.
Suggested essentials for getting started.
6
What is data
curation?
What is data?
What isn’t data curation?
Purposes.
7
“20080528-002corrupted”- by nvshn CC BY SA 2.0
Research data may be -
Observational (e.g., sensor data, data from surveys)
Experimental (e.g., gene sequencing data)
Simulation (e.g., climate modeling data)
Derived or compiled (e.g., text mining, 3D models)
Reference or canonical (e.g., static, peer-reviewed
data sets, likely published and curated)
(from U. of Edinburgh, Information Services)
8
Examples of data formats -
Text
Numerical
Multimedia
Models
9
Data specific to a
discipline (e.g.,
Chrystallographic
Information File, or CIF)
Data specific to an
instrument (e.g., images
from a digital
microscope)
from U. Edinburgh, Information Services
10
“Walt Whitman’s Leaves of Grass” by dbking CC BY 2.0
11
What isn’t data curation?
Only backing up
data
Not equal to data
management,
though overlaps
Digital preservation
Vs. digital curation
12
“Data” - by UWW ResNET CC BY-NC-SA 2.0
13
Transformation
(migration, derivatives)
Collection of data
(selection and appraisal)
Documentation / description
(metadata)
Ingest / preservation / storage
Dissemination / sharing
access / use / reuse
Standards
Policies,
rights,
conditions
I
N
T
E
R
O
P
E
R
A
B
I
L
I
T
Y
“Gulliver’s collections” by Yohan Creemers CC BY-SA 2.0
INFRASTRUCTURE
Examples of data curation
activities -
Organizing data
Reformatting, compressing
(e.g., .tar, .zip), normalizing
Planning for long-term security
Protecting, redacting,
anonymizing data
Applying licenses to data
Providing data citation
guidance
14
“Clean up, or you’re out! :Brooklyn Street Sign” - by
emilydickinsonridesabmx CC BY 2.0
Purposes: proper curation of
research data should enable -
Verifiability
Understanding of data use
Identification of responsibility
“Findability”
New research
Improved guidelines, policies
15
“Ecosystem” - by moizissimo CC BY-NC-ND 2.0
Keep in mind: use cases
Researchers’ goals
Current research
workflows and
desired workflows
Role / context of
“data keeper”
Intended uses
16
Primary users of the
data
Types of access
Frequency of
access / use
Potential
outcomes / impacts
Curation is
not a solo
undertaking
Thinking beyond
models
17
“Soloist” - by Chris JL CC BY-NC-ND 2.0
18
Plan research/
design methodology
Collect/Generate
data
Document/Describe
Process/Analyze
data
Preserve (Prepare for
ingest)
Store data
Share/Access data
Create derivatives /
reuse
A mashup
of a few
models
19
Plan research/
design methodology
Collect/Generate
data
Document/
Describe
MANAGE DATA
Process/Analyze
data
Preserve (Prepare for
ingest)
MANAGE DATA
Store data
MANAGE DATA
Share/Access data
MANAGE DATA
Create derivatives /
reuse
MANAGE DATA
20
Plan research/
design methodology
Collect/Generate
data
Document/
Describe
MANAGE DATA
Process/Analyze
data
Preserve (Prepare for
ingest)
MANAGE DATA
Store data
MANAGE DATA
Share/Access data
MANAGE DATA
Create derivatives /
reuse
MANAGE DATA
C
O
L
L
A
B
O
R
A
T
I
O
N
C
O
L
L
A
B
O
R
A
T
I
O
N
Researchers
Researchers
Researchers
Researchers
Data curators /
librarians, IT
Data curators /
librarians, IT
Data curators /
librarians, IT
Data curators /
librarians, IT
21
Plan research/
design methodology
Collect/Generate
data
Document/
Describe
MANAGE DATA
Process/Analyze
data
Preserve (Prepare for
ingest)
MANAGE DATA
Store data
MANAGE DATA
Share/Access data
MANAGE DATA
Create derivatives /
reuse
MANAGE DATA
C
O
L
L
A
B
O
R
A
T
I
O
N
C
O
L
L
A
B
O
R
A
T
I
O
N
Researchers
Researchers
Researchers
Researchers
Data curators /
librarians, IT
Publishers
Data curators /
librarians, IT
Data curators /
librarians, IT
Data curators /
librarians, IT
22
Plan research/
design methodology
Collect/Generate
data
Document/
Describe
MANAGE DATA
Process/Analyze
data
Preserve (Prepare for
ingest)
MANAGE DATA
Store data
MANAGE DATA
Share/Access data
MANAGE DATA
Create derivatives /
reuse
MANAGE DATA
C
O
L
L
A
B
O
R
A
T
I
O
N
C
O
L
L
A
B
O
R
A
T
I
O
N
Researchers
Researchers
Researchers
Researchers
Data curators /
librarians, IT
Publishers
Data curators /
librarians, IT
Data curators /
librarians, IT
Data curators /
librarians, IT
Users of
the
data!
Curation involves building
and sustaining -
Community
Partnerships
Integration
toward
solutions
23
“Relationships” - by beatnic CC BY 2.0
Examples of libraries leading
in data curation services
Purdue University Libraries
The Library - UC San Diego
NYPL Labs
University of Minnesota Libraries
24
Developing
a Repository
Service
One piece of the
data curation puzzle
25
ScholarSphere didn’t
happen overnight.
26
Text
Platform review, pilot project, prototype
2010 through 2011
27
Building a Community of Practice: The Curation Architecture Prototype Services
(CAPS) Project at Penn State University
!BACKGROUND:!Review!of!Legacy!Pla:orms!
 !Myriad,!unconnected!workflows!
 !Two!pla6orms!effec9vely!moribund!
 !Another!pla6orm!without!upgrades!since!2007!
 !Impossible!to!search!content!across!pla6orms!
 !No!support!for!eErecords,!liFle!for!research!data!
sets!
 !Lack!of!unifying!architecture!
Co#authors:+Michelle+Belden,+Kevin+Clair,+Daniel+Coughlin,++
Michael+Giarlo,+Patricia+Hswe,+and+Linda+Klimczyk++
CAPS%architecture%–%our%point%
of%departure%
Dashboard%interface%giving%
view%of%objects%in%system%%
Wiki%where%stakeholders%
documented%testing%of%CAPS%
How!Prototyping!a!CuraCon!Service!Led!to!Building!a!Community!of!PracCce!–!The!Story!of!CAPS!
 !Aggressive!Cme!line!–!January!to!March!2011!
 !Key!project!roles:!project!manager,!digital!library!architect,!programmer!analyst,!metadata!
librarian,!access!archivist,!and!digital!collec9ons!curator!
 !Use!cases!kickEstarted!prototyping!
 !Daily!meeCngs!allowed!team!to!check!in!briefly!for!updates!&!discussion!
 !Weekly!mee9ngs!with!stakeholders!(archivists,!subject!experts,!digi9za9on!staff)!!
 !Stakeholders!were!consulted!early!on!for!metadata!needs,!with!concerted!aFen9on!to!
crea9ng!a!framework!allowing!for!annota9on!of!objects!throughout!their!lifecycle!!
 !Followed!agile!methods!to!drive!development,!collabora9on,!feedback,!itera9ons!
 !Listened!to!&!learned!from!stakeholders,!being!as!responsive!as!possible!
 !User!requirements!were!prioriCzed,!but!all!of!them!were!documented;!no!need!was!
dismissed!–!both!to!inform!future!development!but!also!to!assure!stakeholders!their!
perspec9ves!counted!and!were!crucial!to!future!development!and!itera9ons!
 !Survey!of!stakeholders!at!project’s!end!revealed!they!felt!listened!to!by!the!project!team,!
and!the!resul9ng!prototype!reflected!their!requirements!and!deliverables!
 !Next!steps:!chart!a!produc9on!path!and!priori9ze!pilot!projects!for!further!tes9ng!
Prototype%of%public%interface%
for%curated%collections%
Ingest%tool%interface%to%upload%
and%describe%digital%object%
Digital!Library!Technologies!
CAPS Poster - https://scholarsphere.psu.edu/files/r781wj67c
In 2012: We did 9 months of
development before beta release . . .
28
. . . 9 months during which we
prioritized user engagement
Enlisted liaison librarians
Held bi-weekly “stakeholder”
meetings to discuss use cases
Met with researchers to gauge their
needs, learn their use cases
Demoed the evolving tool, got
feedback
Frequent communications about
progress
Usability testing w/ librarians, faculty
& students; developers present
29
“011 Researcher by Equipment” -
by IPASadelaide CC BY 2.0
30
31
32
Uses of ScholarSphere
include -
Supplemental files for electronic theses and dissertations
(ETDs)
Compliance with NSF, NIH, and other funding agency
requirements
“Legacy” data sets - i.e., “I need to get these off my hard
drive” data files
Data sets linked to journal articles (publisher requirement)
Student research (“scholarship as pedagogy”)
Service / platform to help diversify what Libraries collect
33
34
35
We aimed for breadth,
rather than depth,
initially.
Because we knew we’d be evolving and
improving the service continually. And we are.
36
37
!
!
!
- 10 releases since
2012
- Next release will be
ScholarSphere 2.0 in
fall 2014: new UI/UX
- Expectation
management is a thing!
How engagement is paying off
Inquiries from researchers looking for help with managing
their data >> new partners.
Subject specialist inquiries about the service >> more
faculty and student users.
Inquiries from departments to use service for student
scholarship >> more content.
Feature requests from users >> improved service.
Librarians + researchers understand better the importance
of data & data curation >> opportunities for infrastructure,
collections, publishing, partnerships, outreach, new roles.
38
Current work, next steps
Current: ScholarSphere Users Group and
incremental feedback on UI/UX work
Usage statistics
More integration with cloud services
Marketing and promotion / outreach (ongoing)
Next: mediated deposit service, “work” data
model, different metadata templates, citation
support (DOIs), notification framework, Zotero
39
Suggested essentials- 1
Find out who your collaborators will be.
Who do you want to collaborate with?
Learn the technology infrastructure at your
library and institution.
Be your own use case.
Learn the basics of project management.
Aim to work in partnership with researchers.
40
Suggested essentials - 2
Discover new mentors.
Definitely seek out training that skills you up -
problem-based.
Also: MANTRA, online data mgmt course.
Looking for examples of digital curation in
practice? There’s an active Google group for this.
Cultivate a sense of experimentation & of play by
trying out tools and applications new to you.
41
Text
Thank you!
Keep in touch: phswe@psu.edu
42

Final data presentation_clir_july2014

  • 1.
    Data Curation andData Curation Services in Academic Libraries - an Overview Patricia Hswe | Pennsylvania State University Libraries Digital Content Strategist Head, ScholarSphere User Services CLIR Postdoctoral Fellowship Program Summer Seminar Bryn Mawr College | 30 July 2014 1
  • 2.
    My Context How IGot Here 2 “Cubicle wall” - by sidewalk flying CC BY 2.0
  • 3.
    First CLIR postdoccohort U. Illinois at Urbana- Champaign: Slavic & East European Library, 2004-2006 Slavic Digital Humanities Fellow Digital projects, reference services, workshops Summer 2004 3
  • 4.
    Library and information scienceschool - GSLIS Digital Library track Document Modeling; Use and Users; Interfaces to Information Systems; Digital Libraries; Information Modeling; Metadata; Information Transfer & Collaboration in Science; Ontology Development Graduate and research assistantships - Digital preservation projects, Engineering and Mathematics libraries 4
  • 5.
    National Digital Information& Infrastructure Preservation Program Publishing and Curation Services Penn State University Libraries { Since 2008 } 5
  • 6.
    Template for theTalk What is data curation? Curation is not a solo undertaking. Developing a repository service - one piece of the data curation puzzle. Suggested essentials for getting started. 6
  • 7.
    What is data curation? Whatis data? What isn’t data curation? Purposes. 7 “20080528-002corrupted”- by nvshn CC BY SA 2.0
  • 8.
    Research data maybe - Observational (e.g., sensor data, data from surveys) Experimental (e.g., gene sequencing data) Simulation (e.g., climate modeling data) Derived or compiled (e.g., text mining, 3D models) Reference or canonical (e.g., static, peer-reviewed data sets, likely published and curated) (from U. of Edinburgh, Information Services) 8
  • 9.
    Examples of dataformats - Text Numerical Multimedia Models 9 Data specific to a discipline (e.g., Chrystallographic Information File, or CIF) Data specific to an instrument (e.g., images from a digital microscope) from U. Edinburgh, Information Services
  • 10.
    10 “Walt Whitman’s Leavesof Grass” by dbking CC BY 2.0
  • 11.
  • 12.
    What isn’t datacuration? Only backing up data Not equal to data management, though overlaps Digital preservation Vs. digital curation 12 “Data” - by UWW ResNET CC BY-NC-SA 2.0
  • 13.
    13 Transformation (migration, derivatives) Collection ofdata (selection and appraisal) Documentation / description (metadata) Ingest / preservation / storage Dissemination / sharing access / use / reuse Standards Policies, rights, conditions I N T E R O P E R A B I L I T Y “Gulliver’s collections” by Yohan Creemers CC BY-SA 2.0 INFRASTRUCTURE
  • 14.
    Examples of datacuration activities - Organizing data Reformatting, compressing (e.g., .tar, .zip), normalizing Planning for long-term security Protecting, redacting, anonymizing data Applying licenses to data Providing data citation guidance 14 “Clean up, or you’re out! :Brooklyn Street Sign” - by emilydickinsonridesabmx CC BY 2.0
  • 15.
    Purposes: proper curationof research data should enable - Verifiability Understanding of data use Identification of responsibility “Findability” New research Improved guidelines, policies 15 “Ecosystem” - by moizissimo CC BY-NC-ND 2.0
  • 16.
    Keep in mind:use cases Researchers’ goals Current research workflows and desired workflows Role / context of “data keeper” Intended uses 16 Primary users of the data Types of access Frequency of access / use Potential outcomes / impacts
  • 17.
    Curation is not asolo undertaking Thinking beyond models 17 “Soloist” - by Chris JL CC BY-NC-ND 2.0
  • 18.
    18 Plan research/ design methodology Collect/Generate data Document/Describe Process/Analyze data Preserve(Prepare for ingest) Store data Share/Access data Create derivatives / reuse A mashup of a few models
  • 19.
    19 Plan research/ design methodology Collect/Generate data Document/ Describe MANAGEDATA Process/Analyze data Preserve (Prepare for ingest) MANAGE DATA Store data MANAGE DATA Share/Access data MANAGE DATA Create derivatives / reuse MANAGE DATA
  • 20.
    20 Plan research/ design methodology Collect/Generate data Document/ Describe MANAGEDATA Process/Analyze data Preserve (Prepare for ingest) MANAGE DATA Store data MANAGE DATA Share/Access data MANAGE DATA Create derivatives / reuse MANAGE DATA C O L L A B O R A T I O N C O L L A B O R A T I O N Researchers Researchers Researchers Researchers Data curators / librarians, IT Data curators / librarians, IT Data curators / librarians, IT Data curators / librarians, IT
  • 21.
    21 Plan research/ design methodology Collect/Generate data Document/ Describe MANAGEDATA Process/Analyze data Preserve (Prepare for ingest) MANAGE DATA Store data MANAGE DATA Share/Access data MANAGE DATA Create derivatives / reuse MANAGE DATA C O L L A B O R A T I O N C O L L A B O R A T I O N Researchers Researchers Researchers Researchers Data curators / librarians, IT Publishers Data curators / librarians, IT Data curators / librarians, IT Data curators / librarians, IT
  • 22.
    22 Plan research/ design methodology Collect/Generate data Document/ Describe MANAGEDATA Process/Analyze data Preserve (Prepare for ingest) MANAGE DATA Store data MANAGE DATA Share/Access data MANAGE DATA Create derivatives / reuse MANAGE DATA C O L L A B O R A T I O N C O L L A B O R A T I O N Researchers Researchers Researchers Researchers Data curators / librarians, IT Publishers Data curators / librarians, IT Data curators / librarians, IT Data curators / librarians, IT Users of the data!
  • 23.
    Curation involves building andsustaining - Community Partnerships Integration toward solutions 23 “Relationships” - by beatnic CC BY 2.0
  • 24.
    Examples of librariesleading in data curation services Purdue University Libraries The Library - UC San Diego NYPL Labs University of Minnesota Libraries 24
  • 25.
    Developing a Repository Service One pieceof the data curation puzzle 25
  • 26.
  • 27.
    Text Platform review, pilotproject, prototype 2010 through 2011 27 Building a Community of Practice: The Curation Architecture Prototype Services (CAPS) Project at Penn State University !BACKGROUND:!Review!of!Legacy!Pla:orms!  !Myriad,!unconnected!workflows!  !Two!pla6orms!effec9vely!moribund!  !Another!pla6orm!without!upgrades!since!2007!  !Impossible!to!search!content!across!pla6orms!  !No!support!for!eErecords,!liFle!for!research!data! sets!  !Lack!of!unifying!architecture! Co#authors:+Michelle+Belden,+Kevin+Clair,+Daniel+Coughlin,++ Michael+Giarlo,+Patricia+Hswe,+and+Linda+Klimczyk++ CAPS%architecture%–%our%point% of%departure% Dashboard%interface%giving% view%of%objects%in%system%% Wiki%where%stakeholders% documented%testing%of%CAPS% How!Prototyping!a!CuraCon!Service!Led!to!Building!a!Community!of!PracCce!–!The!Story!of!CAPS!  !Aggressive!Cme!line!–!January!to!March!2011!  !Key!project!roles:!project!manager,!digital!library!architect,!programmer!analyst,!metadata! librarian,!access!archivist,!and!digital!collec9ons!curator!  !Use!cases!kickEstarted!prototyping!  !Daily!meeCngs!allowed!team!to!check!in!briefly!for!updates!&!discussion!  !Weekly!mee9ngs!with!stakeholders!(archivists,!subject!experts,!digi9za9on!staff)!!  !Stakeholders!were!consulted!early!on!for!metadata!needs,!with!concerted!aFen9on!to! crea9ng!a!framework!allowing!for!annota9on!of!objects!throughout!their!lifecycle!!  !Followed!agile!methods!to!drive!development,!collabora9on,!feedback,!itera9ons!  !Listened!to!&!learned!from!stakeholders,!being!as!responsive!as!possible!  !User!requirements!were!prioriCzed,!but!all!of!them!were!documented;!no!need!was! dismissed!–!both!to!inform!future!development!but!also!to!assure!stakeholders!their! perspec9ves!counted!and!were!crucial!to!future!development!and!itera9ons!  !Survey!of!stakeholders!at!project’s!end!revealed!they!felt!listened!to!by!the!project!team,! and!the!resul9ng!prototype!reflected!their!requirements!and!deliverables!  !Next!steps:!chart!a!produc9on!path!and!priori9ze!pilot!projects!for!further!tes9ng! Prototype%of%public%interface% for%curated%collections% Ingest%tool%interface%to%upload% and%describe%digital%object% Digital!Library!Technologies! CAPS Poster - https://scholarsphere.psu.edu/files/r781wj67c
  • 28.
    In 2012: Wedid 9 months of development before beta release . . . 28
  • 29.
    . . .9 months during which we prioritized user engagement Enlisted liaison librarians Held bi-weekly “stakeholder” meetings to discuss use cases Met with researchers to gauge their needs, learn their use cases Demoed the evolving tool, got feedback Frequent communications about progress Usability testing w/ librarians, faculty & students; developers present 29 “011 Researcher by Equipment” - by IPASadelaide CC BY 2.0
  • 30.
  • 31.
  • 32.
  • 33.
    Uses of ScholarSphere include- Supplemental files for electronic theses and dissertations (ETDs) Compliance with NSF, NIH, and other funding agency requirements “Legacy” data sets - i.e., “I need to get these off my hard drive” data files Data sets linked to journal articles (publisher requirement) Student research (“scholarship as pedagogy”) Service / platform to help diversify what Libraries collect 33
  • 34.
  • 35.
  • 36.
    We aimed forbreadth, rather than depth, initially. Because we knew we’d be evolving and improving the service continually. And we are. 36
  • 37.
    37 ! ! ! - 10 releasessince 2012 - Next release will be ScholarSphere 2.0 in fall 2014: new UI/UX - Expectation management is a thing!
  • 38.
    How engagement ispaying off Inquiries from researchers looking for help with managing their data >> new partners. Subject specialist inquiries about the service >> more faculty and student users. Inquiries from departments to use service for student scholarship >> more content. Feature requests from users >> improved service. Librarians + researchers understand better the importance of data & data curation >> opportunities for infrastructure, collections, publishing, partnerships, outreach, new roles. 38
  • 39.
    Current work, nextsteps Current: ScholarSphere Users Group and incremental feedback on UI/UX work Usage statistics More integration with cloud services Marketing and promotion / outreach (ongoing) Next: mediated deposit service, “work” data model, different metadata templates, citation support (DOIs), notification framework, Zotero 39
  • 40.
    Suggested essentials- 1 Findout who your collaborators will be. Who do you want to collaborate with? Learn the technology infrastructure at your library and institution. Be your own use case. Learn the basics of project management. Aim to work in partnership with researchers. 40
  • 41.
    Suggested essentials -2 Discover new mentors. Definitely seek out training that skills you up - problem-based. Also: MANTRA, online data mgmt course. Looking for examples of digital curation in practice? There’s an active Google group for this. Cultivate a sense of experimentation & of play by trying out tools and applications new to you. 41
  • 42.
    Text Thank you! Keep intouch: phswe@psu.edu 42