Data Stewardship for Researchers at UC Riverside
1. Data Stewardship for Researchers
Tips, Tools, & Guidance
Carly Strasser, PhD
California Digital Library
@carlystrasser
carly.strasser@ucop.edu
UC Riverside, April 2013
From Calisphere, Courtesy of UC Riverside, California Museum of Photography
From Calisphere, Courtesy of Thousand Oaks Library
3. Photos: C. Strasser; Courtesy of WHOI
4. Photos: C. Strasser
North Atlantic right whale mother and calf, by Gill Braulik under Permit No. 655-1652
5. Is data management being taught?
Do attitudes about sharing differ among disciplines?
What role can libraries play in data education?
How can we promote storing data in repositories?
What barriers to sharing can we eliminate?
Why don’t people share data?
7. Why is data management a hot topic?
From Calisphere via Santa Clara University, ark:/13030/kt696nc7j2
9. Digital data
From Flickr by Flickmor
From Flickr by US Army Environmental Command
From Flickr by DW0825
C. Strasser
Courtesy of WHOI
www.woodrow.org
From Flickr by deltaMike
15. UGLY TRUTH about researchers…
Data management?
Metadata?
Data repositories?
Share data publicly?
Why share data?
From Flickr by s i b e r
16. Hurdles to Data Stewardship
• Cost
• Confusion about standards
• Disparate datasets
• Lack of training
• Fear of lost rights or benefits
• No incentives
From Flickr by iowa_spirit_walker
17. Who cares?
From Flickr by Redden-McAllister
From Flickr by AJC1
20. Best Practices for Data Management
1. Planning
2. Data collection & organization
3. Quality control & assurance
4. Metadata
5. Workflows
6. Data stewardship & reuse
21. Create unique identifiers
Section 2: Data collection & organization
• Decide on naming scheme early
• Create a key
• Different for each sample
From Flickr by sjbresnahan
From Flickr by zebbie
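One way to apply the identifier bullets above is to fix the naming scheme in a small function and build the key at the same time. This is a minimal sketch in Python, not something taken from the slides: the scheme (site code, collection date, zero-padded sequence number) and the names `make_sample_id` and `build_key` are assumptions chosen for illustration.

```python
from collections import Counter
from datetime import date


def make_sample_id(site_code: str, collected: date, sequence: int) -> str:
    """Build an ID such as 'NAN-20100614-003' (hypothetical scheme)."""
    return f"{site_code.upper()}-{collected:%Y%m%d}-{sequence:03d}"


def build_key(samples):
    """Return {sample_id: description}, refusing duplicate IDs."""
    ids = [make_sample_id(s["site"], s["date"], s["seq"]) for s in samples]
    duplicates = [i for i, n in Counter(ids).items() if n > 1]
    if duplicates:
        raise ValueError(f"duplicate sample IDs: {duplicates}")
    return {i: s["description"] for i, s in zip(ids, samples)}


# Made-up samples, for illustration only.
samples = [
    {"site": "nan", "date": date(2010, 6, 14), "seq": 1, "description": "surface tow"},
    {"site": "nan", "date": date(2010, 6, 14), "seq": 2, "description": "deep tow"},
]
key = build_key(samples)
for sample_id, description in key.items():
    print(sample_id, description)
```

Generating IDs from one function keeps the scheme consistent, and the duplicate check enforces "different for each sample" before any data are recorded against an ID.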
22. Standardize
Section 2: Data collection & organization
• Consistent within columns – only numbers, dates, or text
• Consistent names, codes, formats
Modified from K. Vanderbilt
From Pink Floyd, The Wall
themurkyfringe.com
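The "consistent within columns" rule can be checked mechanically. The sketch below, in Python with only the standard library, labels each cell as a number, a date, or text and reports columns that mix kinds. The sample data, the column names, and the choice to accept only ISO-style dates (`YYYY-MM-DD`) are assumptions made for the example.

```python
import csv
import io
from datetime import datetime


def classify(value: str) -> str:
    """Label a cell as 'empty', 'number', 'date' (YYYY-MM-DD only), or 'text'."""
    value = value.strip()
    if not value:
        return "empty"
    try:
        float(value)
        return "number"
    except ValueError:
        pass
    try:
        datetime.strptime(value, "%Y-%m-%d")
        return "date"
    except ValueError:
        return "text"


def mixed_columns(rows):
    """Return {column: sorted kinds} for columns holding more than one kind."""
    kinds = {}
    for row in rows:
        for column, value in row.items():
            kind = classify(value)
            if kind != "empty":
                kinds.setdefault(column, set()).add(kind)
    return {c: sorted(k) for c, k in kinds.items() if len(k) > 1}


# Made-up data with two deliberately inconsistent columns.
raw = """date,site,count
2010-06-14,nanaimo,12
2010-06-15,nanaimo,about 20
June 16,nanaimo,9
"""
rows = list(csv.DictReader(io.StringIO(raw)))
print(mixed_columns(rows))
```

Here the `date` and `count` columns are flagged because each holds text alongside its expected type, while `site` passes.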
23. Standardize
Section 2: Data collection & organization
• Reduce possibility of manual error by constraining entry choices
Google Docs Forms
Excel lists
Data validation
Modified from K. Vanderbilt
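Constraining entry choices, as spreadsheet drop-down lists and forms do, can also be done in a script by checking each entry against a fixed vocabulary. A minimal Python sketch follows; the columns, the allowed values, and the function name `invalid_entries` are hypothetical.

```python
ALLOWED = {
    # Hypothetical controlled vocabularies for two columns.
    "site": {"nanaimo", "victoria"},
    "life_stage": {"egg", "nauplius", "copepodite", "adult"},
}


def invalid_entries(record: dict) -> dict:
    """Return {column: value} for entries outside the allowed choices."""
    problems = {}
    for column, choices in ALLOWED.items():
        value = record.get(column, "")
        # Compare case-insensitively, ignoring surrounding whitespace.
        if value.strip().lower() not in choices:
            problems[column] = value
    return problems


print(invalid_entries({"site": "Nanaimo", "life_stage": "adult"}))  # nothing flagged
print(invalid_entries({"site": "Nanimo", "life_stage": "adlut"}))   # both flagged
```

Rejecting misspellings at entry time is cheaper than reconciling "Nanimo" and "nanaimo" after the data set has grown.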
24. Section 2: Data collection & organization
Create parameter table
Create a site table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
25. Use descriptive file names
Section 2: Data collection & organization
• Unique
• Reflect contents
Bad: Mydata.xls, 2001_data.csv, best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls (study organism, site name, year, what was measured)
*Not for everyone*
From R Cook, ESA Best Practices Workshop 2010
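The "Better" file name above follows a pattern (organism, site, year, what was measured), so names can be built and checked by a script. The Python sketch below assumes exactly that four-part, underscore-separated pattern; the regular expression and the helper names `build_name` and `parse_name` are illustrative, not a convention defined in the slides.

```python
import re

# Hypothetical pattern: organism_site_year_measurement.extension
PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_(?P<year>\d{4})_"
    r"(?P<measured>[A-Za-z]+)\.(?P<ext>[A-Za-z0-9]+)$"
)


def build_name(organism, site, year, measured, ext="csv"):
    """Assemble a file name that follows the pattern."""
    return f"{organism}_{site}_{year}_{measured}.{ext}"


def parse_name(filename):
    """Return the name's parts as a dict, or None if it does not follow the pattern."""
    match = PATTERN.match(filename)
    return match.groupdict() if match else None


for name in ["Eaffinis_nanaimo_2010_counts.xls", "Mydata.xls", "best version.txt"]:
    print(name, "->", parse_name(name))
```

Names that parse cleanly carry their own context, while the "Bad" examples return `None` and can be caught before files are shared.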
26. Organize files logically
Section 2: Data collection & organization
Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland
From S. Hampton
27. Preserve information
Section 2: Data collection & organization
• Keep raw data raw
• Use scripts to process data & save them with data
Raw data as .csv
R script for processing & analysis
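"Keep raw data raw" usually means the processing script reads the raw file and writes its results somewhere else. The slide names R for this; the sketch below shows the same idea in Python as an illustration only. The file layout (`raw/` and `processed/` folders), the `temp_c` column, and the rule for dropping rows are assumptions.

```python
import csv
import tempfile
from pathlib import Path


def process(raw_path: Path, out_path: Path) -> int:
    """Copy rows with a usable temperature to out_path; return rows kept.

    The raw file is opened read-only and never rewritten.
    """
    kept = 0
    with raw_path.open(newline="") as src, out_path.open("w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            try:
                float(row["temp_c"])
            except ValueError:
                continue  # drop unparseable readings from the derived copy only
            writer.writerow(row)
            kept += 1
    return kept


# Demonstration in a temporary folder with made-up readings.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw" / "temps.csv"
    raw.parent.mkdir()
    raw.write_text("station,temp_c\nA,11.2\nB,n/a\nC,10.8\n")
    original = raw.read_text()

    out = Path(tmp) / "processed" / "temps_clean.csv"
    out.parent.mkdir()
    kept = process(raw, out)

    raw_unchanged = raw.read_text() == original
    print(kept, "rows kept; raw file unchanged:", raw_unchanged)
```

Because every change lives in the script, the cleaned file can be regenerated from the raw data at any time, and the script itself documents what was done.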
29. Before data collection
Section 3: Quality control and quality assurance
• Define & enforce standards
• Assign responsibility for data quality
From Flickr by StacieBee
30. During data collection/entry
Section 3: Quality control and quality assurance
• Minimize manual entry
• Use double entry
• Use a database
• Document changes
From Flickr by schock
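Double entry means two independent transcriptions of the same records, compared cell by cell so that disagreements can be checked against the original sheet. A minimal Python sketch, assuming both entries are lists of dictionaries in the same row order (the function name and the sample values are hypothetical):

```python
def compare_entries(first: list, second: list) -> list:
    """Return (row_number, column, first_value, second_value) for each mismatch."""
    if len(first) != len(second):
        raise ValueError("the two entries have different numbers of rows")
    mismatches = []
    for number, (a, b) in enumerate(zip(first, second), start=1):
        for column in a:
            if a[column] != b.get(column):
                mismatches.append((number, column, a[column], b.get(column)))
    return mismatches


# Made-up entries: the second typist transposed two digits in row 2.
entry_one = [{"id": "S1", "count": "12"}, {"id": "S2", "count": "71"}]
entry_two = [{"id": "S1", "count": "12"}, {"id": "S2", "count": "17"}]
print(compare_entries(entry_one, entry_two))
```

The comparison does not say which entry is right; it only narrows the manual check to the cells that disagree.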
31. After data entry
Section 3: Quality control and quality assurance
• Check for missing, impossible, anomalous values
• Perform statistical summaries
• Look for outliers
  • Normal probability plots
  • Regression
  • Scatter plots
  • Maps
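The post-entry checks above (missing values, impossible values, summary statistics, outliers) can be combined into one small report. The Python sketch below is one possible version: the plausible range, the z-score threshold, and the function name `qc_report` are assumptions, and a z-score screen stands in for the plots listed on the slide.

```python
import statistics


def qc_report(values, low, high, z_limit=3.0):
    """Summarize a column given as a list of floats, with None for missing."""
    present = [v for v in values if v is not None]
    mean = statistics.mean(present)
    sd = statistics.stdev(present)
    return {
        "n_missing": len(values) - len(present),
        "impossible": [v for v in present if not low <= v <= high],
        "mean": mean,
        "sd": sd,
        # Simple z-score screen; flagged values need review, not deletion.
        "outliers": [v for v in present if sd and abs(v - mean) / sd > z_limit],
    }


# Made-up water temperatures in degrees C; -99.0 mimics a sensor error code.
temps = [11.2, 10.8, None, 11.5, 10.9, 11.1, -99.0, 11.3]
report = qc_report(temps, low=-2.0, high=40.0, z_limit=2.0)
for name, value in report.items():
    print(name, value)
```

With very small samples a single extreme value inflates the standard deviation, so the z-score threshold is lowered to 2.0 in this example; plots remain the more reliable way to spot anomalies.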
34. Metadata basics
Section 4: Metadata
• Digital context
  • Name of the data set
  • The name(s) of the data file(s) in the data set
  • Date the data set was last modified
  • Example data file records for each data type file
  • Pertinent companion files
  • List of related or ancillary data sets
  • Software (including version number) used to prepare/read the data set
  • Data processing that was performed
• Personnel & stakeholders
  • Who collected
  • Who to contact with questions
  • Funders
• Scientific context
  • Scientific reason why the data were collected
  • What data were collected
  • What instruments (including model & serial number) were used
  • Environmental conditions during collection
  • Where collected & spatial resolution
  • When collected & temporal resolution
  • Standards or calibrations used
• Information about parameters
  • How each was measured or produced
  • Units of measure
  • Format used in the data set
  • Precision & accuracy if known
• Information about data
  • Definitions of codes used
  • Quality assurance & control measures
  • Known problems that limit data use (e.g. uncertainty, sampling problems)
• How to cite the data set
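Several of the items above can be captured in a small machine-readable record stored next to the data. The Python sketch below writes such a record as JSON and lists required fields that are still empty. The field names are hypothetical and do not follow any formal metadata standard, and all values are placeholders.

```python
import json

# Hypothetical minimum set of fields; not taken from a metadata standard.
REQUIRED = ["dataset_name", "files", "last_modified", "collected_by",
            "contact", "units", "how_to_cite"]


def missing_fields(metadata: dict) -> list:
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED if not metadata.get(field)]


metadata = {
    "dataset_name": "Example copepod counts",  # placeholder values throughout
    "files": ["Eaffinis_nanaimo_2010_counts.csv"],
    "last_modified": "2013-04-01",
    "collected_by": "A. Researcher",
    "contact": "",
    "units": {"count": "individuals per liter"},
    "how_to_cite": "",
}
print(json.dumps(metadata, indent=2))
print("Still to fill in:", missing_fields(metadata))
```

A record like this is a starting point only; a repository will normally ask for the same information in its own standard.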
35. Metadata basics
Section 4: Metadata
What is metadata?
• Provides structure to describe data
  Common terms | definitions | language | structure
Select the appropriate metadata standard
• Lots of different standards
  EML, FGDC, ISO19115, DarwinCore,…
• Tools for creating metadata files
  Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
37. Simple workflows: flow charts
Section 5: Workflows
Workflow: how you get from the raw data to the final products of your research
Flow chart elements: Temperature data; Salinity data; Data import into R; Data in R format; Quality control & data cleaning; “Clean” T & S data; Analysis: mean, SD; Summary statistics; Graph production
38. Simple workflows: commented scripts
Section 5: Workflows
Workflow: how you get from the raw data to the final products of your research
• R, SAS, MATLAB
• Well-documented code is…
  Easier to review
  Easier to share
  Easier to repeat analysis
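A commented script can record a workflow like the temperature and salinity example: import, quality control, analysis, and reporting, each as a labeled step. The slides name R, SAS, and MATLAB; the sketch below uses Python purely as an illustration, with made-up readings and assumed column names (`temp_c`, `salinity_psu`).

```python
import csv
import io
import statistics

# Step 1: import. Inline text stands in for the raw temperature/salinity file.
RAW = """station,temp_c,salinity_psu
A,11.2,30.1
B,10.8,29.8
C,,30.4
D,11.5,30.0
"""
rows = list(csv.DictReader(io.StringIO(RAW)))

# Step 2: quality control and cleaning. Keep only rows where both values parse.
clean = []
for row in rows:
    try:
        clean.append((float(row["temp_c"]), float(row["salinity_psu"])))
    except ValueError:
        print("dropped row with unusable value:", row["station"])

# Step 3: analysis. Mean and sample standard deviation of each variable.
temps = [t for t, _ in clean]
salts = [s for _, s in clean]
summary = {
    "temp_mean": statistics.mean(temps),
    "temp_sd": statistics.stdev(temps),
    "salinity_mean": statistics.mean(salts),
    "salinity_sd": statistics.stdev(salts),
}

# Step 4: report. A plotting step would follow here in a fuller script.
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

Each comment names the step and the decision taken, so a reviewer can follow the path from raw data to summary statistics without asking the author.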
40. Workflows enable
Section 5: Workflows
Reproducibility: can someone independently validate findings?
Transparency: others can understand how you arrived at your results
Executability: others can re-run or re-use your analysis
From Flickr by merlinprincesse
42. Section 6: Data stewardship & reuse
Use stable formats: csv, txt, tiff
Create back-up copies: original, near, far
Periodically test ability to restore information
Modified from R. Cook
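Testing the ability to restore can be as simple as copying a file back from the backup and comparing checksums with the original. The Python sketch below uses only the standard library; the folder names and the function `restore_matches` are hypothetical, and a temporary directory stands in for real near and far backup locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Hex SHA-256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def restore_matches(original: Path, backup: Path, restore_dir: Path) -> bool:
    """Copy the backup into restore_dir and compare it with the original."""
    restored = Path(shutil.copy2(backup, restore_dir))
    return sha256(restored) == sha256(original)


# Demonstration with a made-up file in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    tmp = Path(tmp)
    original = tmp / "counts.csv"
    original.write_text("id,count\nS1,12\n")
    backup_dir = tmp / "backup"
    restore_dir = tmp / "restore"
    backup_dir.mkdir()
    restore_dir.mkdir()
    backup = Path(shutil.copy2(original, backup_dir))
    ok = restore_matches(original, backup, restore_dir)
    print("restore verified:", ok)
```

Storing the checksum alongside each backup also makes it possible to verify a restore after the original is gone.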
43. Store your data in a repository
Section 6: Data stewardship & reuse
Institutional archive
Discipline/specialty archive
From Flickr by torkildr
44. Practice Data Citation
Section 6: Data stewardship & reuse
Allows readers to find data products
Get credit for data and publications
Promotes reproducibility
Better measure of research impact
Modified from R. Cook
Example: Sidlauskas, B. 2007. Data from: Testing for unequal rates of morphological diversification in the absence of a detailed phylogeny: a case study from characiform fishes. Dryad Digital Repository. doi:10.5061/dryad.20
Learn more at www.datacite.org
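The example citation above has a regular shape: author, year, "Data from:" plus the article title, repository, and DOI. The Python sketch below assembles a string in that shape. The function name and argument order are assumptions, and other repositories or journals may require a different format.

```python
def format_data_citation(author, year, title, repository, doi):
    """Assemble 'Author Year. Data from: Title. Repository. doi:...'."""
    title = title.rstrip(".")
    return f"{author} {year}. Data from: {title}. {repository}. doi:{doi}"


citation = format_data_citation(
    author="Sidlauskas, B.",
    year=2007,
    title=("Testing for unequal rates of morphological diversification in the "
           "absence of a detailed phylogeny: a case study from characiform fishes"),
    repository="Dryad Digital Repository",
    doi="10.5061/dryad.20",
)
print(citation)
```

Keeping the DOI as its own field makes it easy to reuse the same record for links and reference managers.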
46. What is a data management plan?
A document that describes what you will do with your data during your research and after you complete the project
47. Why bother?
• Saves time
• Increases efficiency
• Easier to use data
• Others can understand & use data
• Credit for data products
• Funders require it
From Flickr by Gavinzac
48. NSF DMP Requirements
From Grant Proposal Guidelines:
DMP supplement may include:
1. the types of data, samples, physical collections, software, curriculum materials, and other materials to be produced in the course of the project
2. the standards to be used for data and metadata format and content (where existing standards are absent or deemed inadequate, this should be documented along with any proposed solutions or remedies)
3. policies for access and sharing including provisions for appropriate protection of privacy, confidentiality, security, intellectual property, or other rights or requirements
4. policies and provisions for re-use, re-distribution, and the production of derivatives
5. plans for archiving data, samples, and other research products, and for preservation of access to them
49. 1. Types of data & other information
• Types of data
• Existing data
• How/when/where created?
• How processed?
• Quality control
• Security
• Who is responsible
biology.kenyon.edu
C. Strasser
From Flickr by Lazurite
50. 2. Data & metadata standards
• Metadata needed
• How captured
• Standards
Wired.com
51. 3. Policies for access & sharing
4. Policies for re-use & re-distribution
• Obligation to share
• How/when/where available
• Getting access
• Copyright / IP
• Permission restrictions
• Embargo periods
• Ethics/privacy
• How cited
52. 5. Plans for archiving & preservation
• What & where
• Metadata
• Who’s responsible
From Flickr by theManWhoSurfedTooMuch
54. NSF’s Vision*
DMPs and their evaluation will grow & change over time
Peer review will determine next steps
Community-driven guidelines
Evaluation will vary with directorate, division, & program officer
*Unofficially
70. My website: carlystrasser.net
Email me: carlystrasser@gmail.com
Tweet me: @carlystrasser
My slides: slideshare.net/carlystrasser
CDL Blog: datapub.cdlib.org