Data management workshop 101113

DATA MANAGEMENT 101
October 11, 2013

1 | Data definitions
2 | Dealing with Data
3 | The Next Steps

Data does not speak for itself…

But, even more fundamentally…

tomaytoe

PANTONE
1795 C

tomahto

Solanum
lycopersicum
tdTomato
554ex 581em

$64

Data means different things
to different people

The data timeline

1. Brilliant Idea!
2. Design Experiment
3. Do Experiment
4. Collect data
5. Compile and Analyze
6. Publish
7. Fame, Fortune

The data timeline:
What people think

1. Brilliant Idea!
2. Design Experiment
3. Do Experiment
4. Collect data
5. Compile and Analyze
6. Publish
7. Fame, Fortune

Idea!

Analyzing data

The data timeline:
What Happens

experiment
Compile data

design
Other
People’s
data
Try #2
Failure!!
beer
#896
#896

6. Publish

The data cycle:
What Really Happens

Why should I care?
Hello there!

So you won’t go crazy

Efficiency
Accelerates scientific discovery
Reproducibility of science

Credit where credit is due

Personal organization

Do you get frustrated with…
Hello there!

a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers
e. Ensuring data are secure
f. Making data accessible to other researchers
g. Controlling access to data
h. Tracking updates to data (i.e., versioning)
i. Creating metadata
j. Protecting intellectual property rights
k. Ensuring appropriate professional credit/citation is given

How do I not go crazy?
naming | metadata | standards | tools

Naming conventions

s/n, variable

Retain order

Project_instrument_location_YYYYMMDDhhmmss_extra.ext
Index/grant

conditions

Leading zero!
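
The naming pattern above can be sketched in a few lines of Python. This is only an illustration: the function name `build_filename` and the sample values (`heart`, `scope1`, `labB`, `s003`) are made up for the example and are not part of the workshop. The point is that a fixed field order plus a zero-padded timestamp keeps files sorting in chronological order.

```python
from datetime import datetime


def build_filename(project, instrument, location, when, extra, ext):
    """Join the fields in a fixed order: Project_instrument_location_timestamp_extra.ext"""
    # strftime zero-pads every component, so 9:05:00 becomes 090500.
    stamp = when.strftime("%Y%m%d%H%M%S")
    parts = [project, instrument, location, stamp, extra]
    # Underscores separate fields, so replace them (and spaces) inside a field.
    cleaned = [p.replace("_", "-").replace(" ", "-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")


# Hypothetical values, chosen only to show the shape of the result.
name = build_filename(
    "heart", "scope1", "labB", datetime(2013, 10, 11, 9, 5, 0), "s003", "tif"
)
print(name)  # heart_scope1_labB_20131011090500_s003.tif
```

Putting the date as YYYYMMDDhhmmss means a plain alphabetical sort is also a chronological sort.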

Experiment: stem cells on fibrin to damaged heart
Lamar Soutter Library UMMS

‐2 days:  Incubate stem cells with markers

‐1 day:  Stem cells in solution with biological suture

0 day: #1 Surgery: infarct/delivery of stem cells to damaged heart tissue
Variable days:  #2 Surgery:  examination, high speed imaging/LVPs,
isolate heart and place it in freezer
Post days +:  Section heart, tissues on slides, staining, images of tissues,
tracking particles on heart
Collective data from experiment

TIME | TYPE | USE

Data | File Format
Images | Machine dependent
Ventricular pressure measurements | Proprietary
Home made software | MATLAB or C
Histology sections | Slides and images
Contextual | Project, Experiment, Animal

Many different file types

Data | File Format | Name
Images | Machine dependent | Scope_Date_Var
Ventricular pressure measurements | Proprietary | M_Date_Var.raw
Home made software | MATLAB or C | Script_Date_Var
Histology sections | Slides and images | Anat_Date_Stain

Separate Nomenclature

Data | File Format | Name
Images (1) | Machine dependent | E_1_Date_var
Ventricular pressure measurements (2) | Proprietary | E_2_Date_Var
Home made software (3) | MATLAB or C | E_3_Script_Var
Histology sections (4) | Slides and images | E_4_Date_Stain

Unified Nomenclature

Type | Recommended | Avoid for data sharing
Tabular data | CSV, TSV, SPSS portable | Excel
Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | Quicktime, H264
Images | TIFF, JPEG2000, PNG | GIF, JPG
Structured data | XML, RDF | RDBMS

Recommended File Formats
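
As a small illustration of the "CSV instead of Excel" recommendation, here is a sketch that writes tabular records to CSV text using only Python's standard library. The column names and the two rows are invented placeholders, not data from the experiment described earlier.

```python
import csv
import io

# Placeholder records: the column names and values are made up for this sketch.
rows = [
    {"animal_id": "A001", "day": 0, "pressure_mmHg": 92.5},
    {"animal_id": "A002", "day": 0, "pressure_mmHg": 88.1},
]


def to_csv_text(records):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(records[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


csv_text = to_csv_text(rows)
print(csv_text)
```

The result is plain text that any spreadsheet, statistics package, or text editor can open, which is the reason the table lists it as a sharing format.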

RESOURCES

• Bulk Rename Utility (Windows)
• Renamer (Mac)
• PSRenamer
• Mendeley
Bulk File Renaming Tools
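
The tools above handle bulk renaming through a graphical interface. For readers who prefer a script, the following is a rough sketch of the same idea: it only builds a rename plan (a dry run) that zero-pads numbers so files retain their order when sorted. The helper names `pad_numbers` and `rename_plan` and the sample file names are assumptions made for this example.

```python
import re


def pad_numbers(filename, width=3):
    """Zero-pad every run of digits in the name (not the extension)."""
    stem, dot, ext = filename.rpartition(".")
    if not dot:  # no extension present
        stem, ext = filename, ""
    padded = re.sub(r"\d+", lambda m: m.group().zfill(width), stem)
    return padded + dot + ext


def rename_plan(filenames, width=3):
    """Map old name -> new name for files whose name would change."""
    plan = {}
    for old in filenames:
        new = pad_numbers(old, width)
        if new != old:
            plan[old] = new
    return plan


# Hypothetical file names for illustration.
files = ["slide_2.tif", "slide_10.tif", "slide_001.tif"]
plan = rename_plan(files)
for old, new in plan.items():
    print(old, "->", new)
```

Reviewing the plan before touching any file is the safer habit; the actual rename could then be done with `os.rename(old, new)` for each pair.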

Naming conventions

Grant_Project_experiment_instrument_location_weather_catsname_icecreamflavor_collaborator_owner_zodiacsign_mousemodel_address_painscalerating_favoritecolor_ssn_shoesize_sex_eyecolor_tattoos_scars_votingrecord_YYYYMMDDhhmmss_extra.ext

Presentations

Data presentation
CTSAconnect presentation
Monarch presentation

SPARC

CTSAconnect

Monarch

RESOURCES

www.coggle.it

http://ftp.ihmc.us/

www.mindjet.com

Mindmapping Software

DataManagement@UPR_seminars_101113_JW
DataManagement@UPR_data_101113_JW
DataManagement_dataship_100313_NV_JW_MH_RC
Data101_dataship_091113_FINAL_JW
Data101_dataship_091013_v04_JW
DataManagement_dataship_091013_v03_JW
DataManagement_SPARC_082013_FINAL_NV
DataManagement_SPARC_052013_v8

RESOURCES

Dropbox | Google docs
GIT | SMART SVN

Version Control

Which of the following do you do?

a. Save copies of data on a disk, USB drive, or computer
hard drive
b. Save copies of data on a local server
c. Save copies of data on a central campus server
d. Save copies of data on a web based or cloud server
e. Store data in a repository or archives
f. Automatically backup files
g. Manually generate backup
h. Restrict access to files

3 | copies (you, lab, other)
2 | 2 different forms
1 | remote location
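
The 3-2-1 rule above can be written down as a simple check. This sketch assumes that "2 different forms" means two different storage media; the `BackupCopy` class and the sample copies are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    holder: str    # who keeps it, e.g. "you", "lab", "other"
    medium: str    # assumed meaning of "form", e.g. "external drive"
    offsite: bool  # True if stored at a remote location


def meets_3_2_1(copies):
    """At least 3 copies, on at least 2 media, with at least 1 remote."""
    enough_copies = len(copies) >= 3
    two_media = len({c.medium for c in copies}) >= 2
    one_remote = any(c.offsite for c in copies)
    return enough_copies and two_media and one_remote


# Hypothetical backup plan.
my_copies = [
    BackupCopy("you", "laptop disk", False),
    BackupCopy("lab", "local server", False),
    BackupCopy("other", "cloud storage", True),
]
print(meets_3_2_1(my_copies))  # True
```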

Metadata / Controlled Vocab / Ontologies

How do you speak for your data
when you are not around?

How do you speak for your data
when you are not around?
Metadata | Controlled Vocabularies | Ontologies

What it is
Controlled vocabularies
metadata
What it takes to do it
Relevant variables

Controlled vocab

definitions
grouping

ontologies

classification
connection

What is metadata, really?
Hello there!

a. a philosophy
b. describes data
c. dating site
d. data

Title
Author
Call number
Publisher
ISBN

Metadata

Your metadata should
make your data
understandable to
others without your
involvement

- Anne Gilliland

File name

File type

Title
Date created

Who created the
data
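
One low-effort way to record the fields listed above is a small "sidecar" file stored next to the data. The sketch below builds such a record as JSON; the function name `describe`, the key names, and the sample values are assumptions for illustration, not a formal metadata standard.

```python
import json
from pathlib import PurePath


def describe(file_name, title, date_created, creator):
    """Collect the basic descriptive fields for one data file."""
    return {
        "file_name": file_name,
        # File type is taken from the extension, without the leading dot.
        "file_type": PurePath(file_name).suffix.lstrip("."),
        "title": title,
        "date_created": date_created,
        "creator": creator,
    }


# Hypothetical file and creator.
record = describe(
    "E_1_20131011_ctrl.tif",
    "Control heart section, imaging",
    "2013-10-11",
    "J. Doe",
)
sidecar = json.dumps(record, indent=2)
print(sidecar)
```

Saving the text as, say, `E_1_20131011_ctrl.json` beside the image lets the file speak for itself when its creator is not around.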

RESOURCES

http://www.dlib.indiana.edu/~jenlrile/metadatamap/

Metadata standards

RESOURCES

http://rs.tdwg.org/dwc/

Metadata standards

What is controlled vocab, really?
Hello there!

Craigslist search: Chaise

Craigslist search: Fainting couch

PubMed indexes articles with
MeSH Terms
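
A controlled vocabulary boils down to mapping the many words people use onto one agreed term, which is what indexing with MeSH does for PubMed. A toy version using the furniture example follows; the preferred term "chaise longue" and the synonym table are chosen here purely for illustration.

```python
# Toy vocabulary: the preferred term and synonyms are assumptions for this sketch.
PREFERRED = {
    "chaise": "chaise longue",
    "chaise lounge": "chaise longue",
    "fainting couch": "chaise longue",
}


def normalize(term):
    """Return the preferred term, or the cleaned-up input if it is unknown."""
    key = " ".join(term.lower().split())  # lowercase and collapse whitespace
    return PREFERRED.get(key, key)


print(normalize("Chaise"))           # chaise longue
print(normalize("Fainting  Couch"))  # chaise longue
print(normalize("Sofa"))             # sofa
```

With listings tagged this way, a search for either word would find both ads.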

What is an ontology, really?
Hello there!

Human Disease: PFEIFFER SYNDROME
Most similar mouse model: CD1.Cg-Fgfr2tm4Lni/H

Human phenotype | Cross-species phenotype | Mouse phenotype
Coronal craniosynostosis HP:0004440 | premature suture closure | premature suture closure MP:0000081
Hypoplasia of the maxilla HP:0000327 | maxilla hypoplasia | short maxilla MP:0000097
Dental crowding HP:0000678 | malocclusion | malocclusion MP:0000120
Brachyturricephaly HP:0000244 | shortened head | shortened head MP:0000435
Hypertelorism HP:0000316 | ocular hypertelorism | ocular hypertelorism MP:0001300
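
The value of an ontology is that a machine can follow these links. A minimal sketch: the dictionary below pairs the human (HP) and mouse (MP) identifiers from the Pfeiffer syndrome slide, with the pairings as read from the diagram, so treat them as an assumption. A real comparison would use the ontologies themselves rather than a hand-typed table.

```python
# HP -> MP pairs as read from the slide; the pairing is an assumption of this sketch.
HP_TO_MP = {
    "HP:0004440": "MP:0000081",  # coronal craniosynostosis -> premature suture closure
    "HP:0000327": "MP:0000097",  # hypoplasia of the maxilla -> short maxilla
    "HP:0000678": "MP:0000120",  # dental crowding -> malocclusion
    "HP:0000244": "MP:0000435",  # brachyturricephaly -> shortened head
    "HP:0000316": "MP:0001300",  # hypertelorism -> ocular hypertelorism
}


def shared_phenotypes(human_terms, mouse_terms):
    """Human terms whose mapped mouse term is present in the mouse model."""
    mouse = set(mouse_terms)
    return sorted(hp for hp in human_terms if HP_TO_MP.get(hp) in mouse)


patient = ["HP:0004440", "HP:0000316", "HP:0000678"]
mouse_model = ["MP:0000081", "MP:0001300"]
print(shared_phenotypes(patient, mouse_model))
```

Here two of the three patient phenotypes have a counterpart in the mouse model, which is the kind of overlap used to rank the "most similar" model.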

How?
naming | metadata | standards | tools

Meet the Urban Lab

Meet the Urban Lab

The Urban Lab Antibodies

A+ organization!

[Bar chart: percent identifiable (0% to 90%) for four categories: commercial Ab identifiable, catalog number reported, source organism reported, target uniquely identifiable]

Of 14 antibodies published in 45 articles,
only 38% were identifiable

RESOURCES

AntibodyRegistry.org

Are you aware of data standards in
your field?
@OHSU, 72% said no or didn’t know!

Data standards are the rules by which data are
described and recorded. In order to share, exchange,
and understand data, we must standardize the format
as well as the meaning.
www.usgs.gov/datamanagement/plan/datastandards.php

Many microarray transcriptomics standards
JAMIA:sea‐of‐standards

Reporting
guidelines

Terminology
structure
(Interoperability)

Exchange
Formats

RESOURCES

Minimum Information for Biological and Biomedical Investigations

RESOURCES

www.force11.org/node/4463

biosharing.org/bsg-000532

Minimum Information for
Biological and Biomedical Investigations
http://www.biosharing.org/standards/mibbi

Reporting Standards

RESOURCES

runmycode.org

www.wf4ever‐project.org

galaxyproject.org/

Workflow analysis platforms

RESOURCES

http://opus.bath.ac.uk/32296

www.labarchives.com

www.labguru.com

RESOURCES

https://dmp.cdlib.org/

Uniquely identifying data
www.flickr.com/photos/pmeimon

Digital Object Identifier (DOI)
Example: 10.1371/journal.pbio.1001339
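
A DOI becomes a working link when it is appended to the resolver address https://doi.org/. The sketch below does that for the example DOI; the pattern used to sanity-check the input is a loose approximation chosen for this example, not an official definition of DOI syntax, and DOIs containing unusual characters may also need URL encoding.

```python
import re

# Loose check: "10.", a 4-9 digit registrant code, "/", then a non-empty suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn a bare DOI into a resolvable https://doi.org/ link."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError("does not look like a DOI: " + repr(doi))
    return "https://doi.org/" + doi


url = doi_to_url("10.1371/journal.pbio.1001339")
print(url)  # https://doi.org/10.1371/journal.pbio.1001339
```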

Unique resource identifier (URI)
A URI will resolve to a single location on the web
URIs for people

Repositories use Unique IDs

RESOURCES

Data Sharing Repositories

Thinking Beyond the PDF
Raw Science

Small publications

Self-publishing

Datasets

Nanopublications

Blogging

Code

Argument or
passage

Social Media

Experimental
design

Single figure
publications

Comments &
Reviews
Annotations

RESOURCES

figshare.com

datadryad.org

thedata.org

www.dataone.org

data.rutgers.edu/

n2t.net/ezid

nature.com/scientificdata/

F1000.com/

Data publishing and sharing

John L Campbell, Research Ecologist, Oregon State
University, Corvallis OR

John L Campbell, Research Ecologist, Center for
Research on Ecosystem Change, Durham, NC

RESOURCES

Impact.Story
impactstory.org

www.plumanalytics.com

http://myidp.sciencecareers.org/

orcid.org

Yes, you are an individual!

1 | Data definitions
2 | Dealing with Data
3 | The Next Steps

http://libguides.ohsu.edu/data

Melissa Haendel
Nicole Vasilevsky
Robin Champieux

what happens
between publications?

“We are Drowning in
Information but
Starved for
Knowledge”
John Naisbitt

“We are Drowning in
data but
Starved for
Knowledge”
Hello there!

Jackie’s bad paraphrase of John Naisbitt

 | Dr. Sawyer | Dr. Finn | Connected?
University | Collegeville College | University of Badass | No
Journals | Journal of Information | Journal of Data | No
Grant Title | Protein G is Important | Protein H is Important | No
Machines Used | Alpha, Gamma, Theta, Sigma | Gamma, Beta, Kappa, Theta | Yes!
Reagents Used | Cyan, Orange, Green, Mauve, Beige | Mauve, Chartreuse, Cyan, Green, Taupe | Yes!
Genes Referenced | bz3d14.2, bz3c13.1, bz3d,98.1 | bz3c13.1 | Yes!
Proteins Referenced | Eng1a, Ntl, Ncdq | Ndrw, Eng1a, Brs | Yes!
People Affiliated with Resources | Harry, Neville, Ron | Harry, Ron, Hermione | Yes!
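
Once resources are recorded as structured data, finding these hidden connections is a set intersection. The sketch below uses the values from the Dr. Sawyer / Dr. Finn comparison; the variable and function names are made up for the example.

```python
# Values copied from the comparison; names of variables are illustrative only.
sawyer = {
    "University": {"Collegeville College"},
    "Journals": {"Journal of Information"},
    "Grant Title": {"Protein G is Important"},
    "Machines Used": {"Alpha", "Gamma", "Theta", "Sigma"},
    "Reagents Used": {"Cyan", "Orange", "Green", "Mauve", "Beige"},
    "Genes Referenced": {"bz3d14.2", "bz3c13.1", "bz3d,98.1"},
    "Proteins Referenced": {"Eng1a", "Ntl", "Ncdq"},
    "People Affiliated with Resources": {"Harry", "Neville", "Ron"},
}
finn = {
    "University": {"University of Badass"},
    "Journals": {"Journal of Data"},
    "Grant Title": {"Protein H is Important"},
    "Machines Used": {"Gamma", "Beta", "Kappa", "Theta"},
    "Reagents Used": {"Mauve", "Chartreuse", "Cyan", "Green", "Taupe"},
    "Genes Referenced": {"bz3c13.1"},
    "Proteins Referenced": {"Ndrw", "Eng1a", "Brs"},
    "People Affiliated with Resources": {"Harry", "Ron", "Hermione"},
}


def shared(a, b):
    """For each field, list the values both researchers have in common."""
    return {
        field: sorted(a[field] & b[field])
        for field in a
        if a[field] & b[field]
    }


links = shared(sawyer, finn)
for field, values in links.items():
    print(field + ":", ", ".join(values))
```

University, journal, and grant title show no overlap, while machines, reagents, genes, proteins, and people all do, matching the "Connected?" answers above.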
