11. www.petshaming.net
NO: Reproducibility, Transparency, Reuse
• Didn’t share the data
• Didn’t document the data (metadata)
• Didn’t document provenance/workflow
19. Design file naming scheme
Planning
Use descriptive file names*
• Unique
• Reflect contents
Bad: Mydata.xls, 2001_data.csv, best version.txt
Better: Eaffinis_nanaimo_2010_counts.xls
  Eaffinis = study organism
  nanaimo = site name
  2010 = year
  counts = what was measured
*Not for everyone
From R Cook, ESA Best Practices Workshop 2010
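The naming scheme above can be checked mechanically. The following is a minimal sketch, assuming a four-part pattern (organism, site, year, measurement) modeled on the "Better" example; the function names `build_name` and `parse_name` and the regular expression are illustrative and not part of the original slides.

```python
import re

# Assumed scheme, modeled on the slide's example:
# organism_site_year_measurement.extension
NAME_PATTERN = re.compile(
    r"^(?P<organism>[A-Za-z]+)_(?P<site>[A-Za-z]+)_"
    r"(?P<year>\d{4})_(?P<measurement>[A-Za-z]+)\.(?P<ext>\w+)$"
)

def build_name(organism, site, year, measurement, ext):
    """Assemble a descriptive file name from its parts."""
    return f"{organism}_{site}_{year}_{measurement}.{ext}"

def parse_name(filename):
    """Return the parts of a name as a dict, or None if it breaks the scheme."""
    match = NAME_PATTERN.match(filename)
    return match.groupdict() if match else None

print(build_name("Eaffinis", "nanaimo", 2010, "counts", "xls"))
print(parse_name("Eaffinis_nanaimo_2010_counts.xls"))
print(parse_name("Mydata.xls"))  # does not follow the scheme
```

A check like `parse_name` could be run over a data folder to flag files that drifted from the agreed scheme.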
21. Design file organization
Planning
Biodiversity
  Lake
    Experiments
      Biodiv_H20_heatExp_2005to2008.csv
      Biodiv_H20_predatorExp_2001to2003.csv
      …
    Field work
      Biodiv_H20_PlanktonCount_2001toActive.csv
      Biodiv_H20_ChlAprofiles_2003.csv
      …
  Grassland
Consider…
• Dependencies?
• File formats?
• Time of collection?
• Order of analysis?
Workflows!
From S. Hampton
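A folder layout like the one on this slide can be created by a short script so that every project starts from the same structure. This is a minimal sketch: the folder names come from the slide, but the nesting is an assumption read off the slide layout, and `create_layout` is a hypothetical helper. It runs inside a temporary directory so it leaves nothing behind.

```python
from pathlib import Path
import tempfile

# Folder names from the slide; the nesting is an assumption.
LAYOUT = [
    "Biodiversity/Lake/Experiments",
    "Biodiversity/Lake/Field work",
    "Biodiversity/Grassland",
]

def create_layout(root, layout=LAYOUT):
    """Create each folder under root and return the created paths."""
    created = []
    for relative in layout:
        folder = Path(root) / relative
        folder.mkdir(parents=True, exist_ok=True)
        created.append(folder)
    return created

# Demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    for folder in create_layout(tmp):
        print(folder.relative_to(tmp))
```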
22. Design your spreadsheet
Planning
Constrain entries
Atomize
Break down spreadsheets
From Flickr by Ulleskelf
23. Consider a database
Planning
A relational database is
• A set of tables
• Relationships among the tables
• A language to specify & query the tables
A RDB provides
• Scalability: millions+ records
• Features for sub-setting, querying, sorting
• Reduced redundancy & entry errors
From Mark Schildhauer
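The three ingredients listed above (tables, relationships, a query language) can be seen in a few lines using SQLite, which ships with Python. This is a minimal sketch: the table names, column names, and all values are invented for illustration.

```python
import sqlite3

# In-memory database, so nothing is written to disk.
conn = sqlite3.connect(":memory:")

# A set of tables, with a relationship between them (site_id).
conn.executescript("""
    CREATE TABLE site (
        site_id INTEGER PRIMARY KEY,
        name    TEXT NOT NULL UNIQUE
    );
    CREATE TABLE count_record (
        record_id INTEGER PRIMARY KEY,
        site_id   INTEGER NOT NULL REFERENCES site(site_id),
        year      INTEGER NOT NULL,
        n         INTEGER NOT NULL CHECK (n >= 0)
    );
""")

# Site details are entered once, which reduces redundancy and entry errors.
conn.executemany("INSERT INTO site (site_id, name) VALUES (?, ?)",
                 [(1, "site_A"), (2, "site_B")])
conn.executemany("INSERT INTO count_record (site_id, year, n) VALUES (?, ?, ?)",
                 [(1, 2009, 12), (1, 2010, 15), (2, 2010, 7)])

# A language to query the tables: sub-set by year, sort by site name.
rows = conn.execute("""
    SELECT site.name, count_record.year, count_record.n
    FROM count_record
    JOIN site ON site.site_id = count_record.site_id
    WHERE count_record.year = 2010
    ORDER BY site.name
""").fetchall()
print(rows)
```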
24. Consider a database
Planning
You should invest time in learning databases if
• your data sets are large or complex
Consider investing time in learning databases if
• your data are small and humble
• you ever intend to share your data
• you are < 30 years old
From Mark Schildhauer
25. Pick a data repository
Planning
Store your data in a repository
Institutional archive
Ask a librarian
Discipline/specialty archive
Repos of repos: databib.org, re3data.org
From Flickr by torkildr
26. Decide on preservation/backup
Planning
What software?
What hardware?
What personnel?
How often?
Set up reminders!
Test system
From Flickr by withassociates
From Flickr by sepa synod
From Flickr by taberandrew
27. Write a data management plan!
Planning
…document that describes what you will do with your data throughout the research project
From Flickr by Barbies Land
28. DMP components
Planning
• What will be collected
• Methods
• Standards
• Metadata
• Sharing/access
• Long-term storage
But they all have different requirements and express them in different ways
From Flickr by Barbies Land
29. dmptool.org
Planning
Step-by-step wizard for generating DMP
create | edit | re-use | share
Free & open to community
31. Keep raw data raw
During collection
Realistically:
• Archive .csv version of raw data
• Make a “raw” tab in working data file
• Do all work on other tabs
32. Keep raw data raw
During collection
Ideally:
• Use scripts to process data
• Save them with data
Raw data as .csv → R script for processing & analysis
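The idea of processing by script while leaving the raw file untouched can be sketched as follows. The slide refers to an R script; this sketch uses Python instead, and the file names, column names, and the cleaning rule (drop rows with a missing count) are assumptions for illustration only.

```python
import csv
import tempfile
from pathlib import Path

def process(raw_path, out_path):
    """Read the raw file, drop rows with a missing count, write a NEW file.

    The raw file is only ever opened for reading.
    """
    with open(raw_path, newline="") as src:
        rows = [row for row in csv.DictReader(src) if row["count"].strip() != ""]
    with open(out_path, "w", newline="") as dst:
        writer = csv.DictWriter(dst, fieldnames=["site", "count"])
        writer.writeheader()
        writer.writerows(rows)
    return len(rows)

# Demonstration in a throwaway directory with made-up data.
with tempfile.TemporaryDirectory() as tmp:
    raw = Path(tmp) / "raw_counts.csv"
    raw.write_text("site,count\nA,12\nB,\nC,7\n")
    raw_before = raw.read_text()
    kept = process(raw, Path(tmp) / "processed_counts.csv")
    print("rows kept:", kept)
    print("raw file unchanged:", raw.read_text() == raw_before)
```

Saving a script like this next to the data means the processed file can always be regenerated from the raw one.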
33. Document your workflow
During collection
Workflow: how you get from the raw data to the final products of your research
Simple workflow: flow chart
Temperature data + Salinity data
→ Data import into Excel
→ Data in spreadsheet
→ Quality control & data cleaning
→ “Clean” T & S data
→ Analysis: mean, SD
→ Summary statistics; Graph production
34. Document your workflow
During collection
Workflow: how you get from the raw data to the final products of your research
Simple workflow: commented script
• R, SAS, MATLAB…
• Well-documented code is
  Easier to review
  Easier to share
  Easier to use for repeat analysis
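A commented script can double as the workflow description. The sketch below walks from raw readings through quality control to summary statistics, with each step labeled. It is written in Python rather than the languages named on the slide, and the readings and the plausible-range bounds are invented for illustration.

```python
import statistics

# Step 1: raw readings as imported (values invented for illustration).
# None marks a missing reading.
temperature_raw = [11.2, 11.5, None, 11.9, 99.0]

# Step 2: quality control. Drop missing values and anything outside a
# plausible range. The bounds are an assumption, not from the slides.
PLAUSIBLE_MIN, PLAUSIBLE_MAX = -2.0, 40.0

def clean(values, low=PLAUSIBLE_MIN, high=PLAUSIBLE_MAX):
    """Keep only readings that are present and within [low, high]."""
    return [v for v in values if v is not None and low <= v <= high]

temperature_clean = clean(temperature_raw)

# Step 3: analysis. Mean and sample standard deviation of the clean data.
mean_t = statistics.mean(temperature_clean)
sd_t = statistics.stdev(temperature_clean)

# Step 4: summary statistics, ready for a table or a graph.
print(f"n={len(temperature_clean)} mean={mean_t:.2f} sd={sd_t:.2f}")
```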
35. Document your workflow
During collection
Fancy schmancy workflows
Resulting output
https://kepler-project.org
36. Document your workflow
During collection
Workflows enable
• Reproducibility
• Transparency
• Reuse
From Flickr by merlinprincesse
37. Constrain data entries
During collection
• Excel lists
• Data validation
• Google docs forms
Modified from K. Vanderbilt
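The same constraint idea applies to data that has already been typed in: check every entry against a list of allowed values and a type rule. This is a minimal sketch; the allowed site names, the column names, and `validate_row` are hypothetical.

```python
# Hypothetical controlled vocabulary, playing the role of a drop-down list.
ALLOWED_SITES = {"site_A", "site_B"}

def validate_row(row):
    """Return a list of problems found in one entry (empty list if none)."""
    problems = []
    site = row.get("site")
    if site not in ALLOWED_SITES:
        problems.append(f"unknown site: {site!r}")
    raw_count = row.get("count")
    try:
        count = int(raw_count)
    except (TypeError, ValueError):
        problems.append(f"count is not an integer: {raw_count!r}")
    else:
        if count < 0:
            problems.append("count must be zero or greater")
    return problems

# Made-up entries: one valid, one with two problems.
entries = [
    {"site": "site_A", "count": "12"},
    {"site": "Site A", "count": "twelve"},
]
for entry in entries:
    print(entry, validate_row(entry))
```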
39. Break down spreadsheets
During collection
Fake a relational database
Create parameter table
Create a site table
From doi:10.3334/ORNLDAAC/777
From R Cook, ESA Best Practices Workshop 2010
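Breaking a wide spreadsheet into a site table plus an observation table can be sketched in plain Python. The column names and values below are invented, and `split_tables` is a hypothetical helper. A parameter table could be split out the same way.

```python
# A flat sheet where site details repeat on every row (made-up values).
flat_rows = [
    {"site": "site_A", "lat": 10.0, "lon": 20.0, "date": "2010-06-01", "temp_c": 11.2},
    {"site": "site_A", "lat": 10.0, "lon": 20.0, "date": "2010-06-02", "temp_c": 11.5},
    {"site": "site_B", "lat": 30.0, "lon": 40.0, "date": "2010-06-01", "temp_c": 9.8},
]

def split_tables(rows):
    """Split flat rows into a site table and an observation table linked by site_id."""
    site_ids = {}            # site name -> assigned id
    site_table = []          # one row per site
    observation_table = []   # one row per measurement
    for row in rows:
        name = row["site"]
        if name not in site_ids:
            site_ids[name] = len(site_ids) + 1
            site_table.append({"site_id": site_ids[name], "name": name,
                               "lat": row["lat"], "lon": row["lon"]})
        observation_table.append({"site_id": site_ids[name],
                                  "date": row["date"], "temp_c": row["temp_c"]})
    return site_table, observation_table

site_table, observation_table = split_tables(flat_rows)
print(site_table)
print(observation_table)
```

Each site's coordinates now live in exactly one row, so a correction needs to be made in only one place.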
41. Create metadata
During collection
Metadata: data reporting
WHO created the data?
WHAT is the content of the data set?
WHEN was it created?
WHERE was it collected?
HOW was it developed?
WHY was it developed?
From Flickr by //ichael Patric|{
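The six questions above can be turned into a small completeness check for a metadata record. This is a minimal sketch: the key names mirror the slide, the values are placeholders, and `missing_fields` is a hypothetical helper. It does not follow any formal metadata standard.

```python
import json

# The six questions from the slide, used here as required keys.
REQUIRED = ("who", "what", "when", "where", "how", "why")

def missing_fields(record):
    """List the required keys that are absent or blank in a metadata record."""
    return [key for key in REQUIRED if not str(record.get(key) or "").strip()]

# Placeholder values only; a real record would describe a real data set.
record = {
    "who": "<creator name>",
    "what": "<content of the data set>",
    "when": "<creation date>",
    "where": "<collection location>",
    "how": "<methods used>",
    "why": "",  # left blank on purpose to show the check
}

print(missing_fields(record))
print(json.dumps(record, indent=2))
```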
42. Create metadata
During collection
Digital context
• Name of the data set
• The name(s) of the data file(s) in the data set
• Date the data set was last modified
• Example data file records for each data type file
• Pertinent companion files
• List of related or ancillary data sets
• Software (including version number) used to prepare/read the data set
• Data processing that was performed
Personnel & stakeholders
• Who collected
• Who to contact with questions
• Funders
Scientific context
• Scientific reason why the data were collected
• What data were collected
• What instruments (including model & serial number) were used
• Environmental conditions during collection
• Temporal & spatial resolution
• Standards or calibrations used
Information about parameters
• How each was measured or produced
• Units of measure
• Format used in the data set
• Precision & accuracy if known
Information about data
• Definitions of codes used
• Quality assurance & control measures
• Known problems that limit data use (e.g. uncertainty, sampling problems)
43. Create metadata
During collection
What is a metadata standard?
Metadata standards…
• Provide structure to describe data
  Common terms | definitions | language | structure
• Come in many flavors
  EML, FGDC, ISO19115, DarwinCore,…
• Can be met using software tools
  Morpho (EML), Metavist (FGDC), NOAA MERMaid (CSDGM)
44. Back up daily
During collection
Original
Near
Far
From Flickr by see phar
From Flickr by lippo
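A backup is only useful if the copies match the original. The sketch below copies a file to a "near" and a "far" folder and verifies each copy by checksum. Both folders are stand-ins inside a temporary directory; in practice "far" would be a different building or a remote service, which this sketch does not cover. The helper names are hypothetical.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def back_up(original, destinations):
    """Copy original into each destination folder and verify each copy."""
    original = Path(original)
    expected = sha256(original)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(original, folder / original.name))
        if sha256(copy) != expected:
            raise IOError(f"checksum mismatch for {copy}")
        copies.append(copy)
    return copies

# Demonstration with made-up data in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp) / "original" / "counts.csv"
    original.parent.mkdir()
    original.write_text("site,count\nA,12\n")
    copies = back_up(original, [Path(tmp) / "near", Path(tmp) / "far"])
    print([c.parent.name for c in copies])
```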
45. Remember that data management plan?
During collection
Revisit
Review
Revise
From Flickr by Barbies Land
46. During collection
Revisit
Review
Revise
Schedule a time each week or month
From Flickr by purplemattfish
54. Clean data
OpenRefine (formerly Google Refine)
• Open source desktop application
• Used for data cleanup and transformation to other formats
• Works with spreadsheets but behaves like a database
• User can filter the rows to display using facets that define filtering criteria
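The idea behind a facet (count the distinct values in a column, then filter rows by one of them) can be imitated in a few lines. This is a plain Python stand-in for the concept, not OpenRefine's own interface, and the rows are invented for illustration.

```python
from collections import Counter

# Made-up rows with the kind of inconsistency a facet makes visible.
rows = [
    {"species": "Eaffinis", "site": "nanaimo"},
    {"species": "Eaffinis", "site": "nanaimo"},
    {"species": "Eaffinis", "site": "Nanaimo"},
]

def text_facet(rows, column):
    """Count how often each distinct value occurs in a column."""
    return Counter(row[column] for row in rows)

def filter_rows(rows, column, value):
    """Keep only the rows whose column equals the chosen facet value."""
    return [row for row in rows if row[column] == value]

facet = text_facet(rows, "site")
print(facet)
print(filter_rows(rows, "site", "Nanaimo"))
```

Here the facet shows two spellings of the same site, which is the cue to clean the column.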