Data Management Open House

OHSU Library
October 9th, 2013
#OHSUdata @force11rescomm

Melissa Haendel, Ontology Development Group and DMICE
Nicole Vasilevsky, Ontology Development Group
Jackie Wirz, Research Specialist, Research Roadmap SOM
Robin Champieux, Scholarly Communication

http://www.force11.org/
@force11rescomm

Who is FORCE11?
Publishers
Library and Information scientists
Policy makers
Tool builders
Funders
Scholars
Social Science
Humanities
Science
Free to join!

Beyond-the-PDF
San Diego, Jan 2011 | Amsterdam, March 2013
www.force11.org/beyondthepdf2 | #btpdf2

How does OHSU fit in?
We won $1K to find out.
Today | Discuss data-research cycle, reproducibility, and communication of findings
Later | Data playground with researchers:
• Your data needs
• Identify the material and services you need
• Get paid $50

Once upon a time….
Research, Present, Publish.
Repeat.

You might say it wears a uniform

Our relationship
is so one-sided.

www.sciencemag.org/site/special/scicomm/infographic.jpg

Data can be pretty complex…

Data does not speak for itself…

But, even more fundamentally…

How do you speak for your data
when you are not around?

Do you know what metadata is?
a. Philosophy
b. describes data
c. dating site
d. data

Title
Author
Call number
Publisher
ISBN

- Anne Gilliland
Your metadata should make your data understandable to others… without your involvement
Metadata
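One way to picture metadata that speaks for a dataset when its author is not around is a small descriptive record saved next to the data file. The sketch below is only an illustration: the field names (title, creator, description, variables) and the example dataset are assumptions chosen for this example, not a schema named in the slides.

```python
import json


def build_metadata(title, creator, description, variables):
    """Return a dictionary describing a dataset well enough to read without its author."""
    required = {"title": title, "creator": creator, "description": description}
    missing = [name for name, value in required.items() if not value]
    if missing:
        raise ValueError("missing required metadata: " + ", ".join(missing))
    return {
        "title": title,
        "creator": creator,
        "description": description,
        # One entry per column: what it means and the unit it is measured in.
        "variables": variables,
    }


# Hypothetical dataset, used only to show the shape of the record.
record = build_metadata(
    title="Example antibody staining intensities",
    creator="A. Researcher",
    description="Mean fluorescence per well, one row per well.",
    variables={
        "well": "plate position, e.g. A1",
        "intensity": "arbitrary fluorescence units",
    },
)
print(json.dumps(record, indent=2))
```

Saving a record like this alongside the data file means a reader does not have to guess what each column holds.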

http://www.phd2published.com/wp-content/uploads/2011/09/publications_image.jpg

Biomed Res Int. 2013;2013:350419.

A Solution: Antibody Registry
The Antibody Registry
www.antibodyregistry.org

Data standards can help with reproducibility
An average of ~50% of resources were not identifiable
Vasilevsky et al., 2013 PeerJ 1:e148
www.force11.org/node/4463 biosharing.org/bsg-000532

Data Analysis Pipeline Reproducibility Platforms
RESOURCES
www.wf4ever-project.org runmycode.org
galaxyproject.org/

Are you aware of data standards in your field?
At OHSU, 72% said no or didn’t know!

Data Standards
Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning.
www.usgs.gov/datamanagement/plan/datastandards.php

Types of data standards
Reporting guidelines
Terminology artifacts (includes ontologies)
Exchange formats
Can be used together

Data standards examples
Reporting guidelines (e.g., MIAME)
Terminology artifacts (includes ontologies)
Exchange formats

Many microarray transcriptomics standards
JAMIA:sea-of-standards

www.cdisc.org
RESOURCES
Minimum Information for Biological and Biomedical Investigations
biosharing.org/

But it isn’t just about reproducibility…
It’s about

Data reuse?
www.erp-recycling.org

Ontologies as a tool for unification
[Diagram labels: disease-phenotype databases; disease phenotype ontology; expression data; gene function data; cell and tissue ontology; GO annotations; ontologies]

Ontologies classify data in multiple ways
For example, there are many useful ways to classify organism parts:
its parts and their arrangement
its relation to other structures (what is it: part of, connected to, adjacent to, overlapping?)
its shape
its function
its developmental origins
its species or clade
its evolutionary history
Cajal 1915, “Accept the view that nothing in nature is useless, even from the human point of view.”
http://www.boloncol.com/images/stories/boletin19/cajal16.jpg

Ontologies aid candidate gene identification for undiagnosed diseases
Cross-species phenotype
Human disease: Pfeiffer syndrome
Brachyturricephaly (HP:0000244); Hypoplasia of the maxilla (HP:0000327); Dental crowding (HP:0000678); Hypertelorism (HP:0000316); Coronal craniosynostosis (HP:0004440)
Most similar mouse model: CD1.Cg-Fgfr2tm4Lni/H
shortened head (MP:0000435); malocclusion (MP:0000120); ocular hypertelorism (MP:0001300); short maxilla (MP:0000097); premature suture closure (MP:0000081)
Phenotypes in common: premature suture closure; maxilla hypoplasia; malocclusion; shortened head; ocular hypertelorism

How can I make my data reusable?
There are tools to help!

Tools for research management
RESOURCES
www.labguru.com
www.labarchives.com

Data management plan tool
RESOURCES
https://dmp.cdlib.org/

What to do with data?
Storage
Back up in multiple locations:
• Local hard drive
• Removable storage
• Shared network
• Cloud server
Versioning
• File name versioning
• Dropbox
• Version control software (CVS, SVN, Git)
Publication
Data sharing repositories:
• Local repository
• Domain specific
• Generic public repository
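File name versioning is the simplest of the versioning options listed above. A minimal sketch follows, assuming a naming pattern of name, ISO date, and a zero-padded version number. The pattern and the helper name `versioned_name` are illustrative choices, not a convention prescribed in the slides.

```python
from datetime import date
from pathlib import Path


def versioned_name(filename: str, day: date, version: int) -> str:
    """Build a name like 'results_2013-10-09_v02.csv' from 'results.csv'."""
    if version < 1:
        raise ValueError("version numbers start at 1")
    path = Path(filename)
    # ISO dates (YYYY-MM-DD) sort chronologically in a plain file listing.
    return f"{path.stem}_{day.isoformat()}_v{version:02d}{path.suffix}"


print(versioned_name("results.csv", date(2013, 10, 9), 2))
# prints: results_2013-10-09_v02.csv
```

A fixed pattern keeps older copies in order in a directory listing. Version control software such as Git replaces this bookkeeping once files change often.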

Uniquely identifying data
• Digital Object Identifier (DOI)
• Uniform Resource Identifier (URI)
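The two identifier types are related: a DOI can be written as a URI by prefixing it with the doi.org resolver address. The sketch below shows that conversion under stated assumptions. The helper name `doi_to_uri` and the DOI `10.1234/example.5678` are made up for illustration, and the validity check is deliberately loose rather than a full DOI syntax check.

```python
def doi_to_uri(doi: str) -> str:
    """Turn a DOI, with or without a common prefix, into a resolvable https URI."""
    cleaned = doi.strip()
    # Drop the prefixes people commonly paste along with a DOI.
    for prefix in ("https://doi.org/", "http://dx.doi.org/", "doi:"):
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    # Loose sanity check: DOIs start with "10." and contain a slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"does not look like a DOI: {doi!r}")
    return "https://doi.org/" + cleaned


# Hypothetical DOI, used only for illustration.
print(doi_to_uri("doi:10.1234/example.5678"))
# prints: https://doi.org/10.1234/example.5678
```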
www.flickr.com/photos/pmeimon

Data journals and repositories
RESOURCES
figshare.com datadryad.org thedata.org
n2t.net/ezid www.dataone.org data.rutgers.edu/
nature.com/scientificdata/

Thinking Beyond the PDF
Raw Science: Datasets; Code; Experimental design
Small publications: Argument or passage; Single figure publications; Nanopublications
Self-publishing: Blogging; Microblogging; Comments & Reviews; Annotations

Impact.Story
impactstory.org
www.plumanalytics.com
orcid.org
RESOURCES
Services to identify yourself and your impact

rubriq.com
scalar.usc.edu
RESOURCES
Alternative publishing mechanisms
thedata.org

http://theconversation.com/scientists-must-share-early-and-share-often-to-boost-citations-18699

Citing products of your research

What is your scientific footprint?

Data
• Legitimate, citable products of research
• Same importance as traditional citations
• Data management is central

Data citation principles.
http://thedata.org/files/thedata_new2/files/datacitationprinciples-datacite.pdf

Data Management 101
libguides.ohsu.edu/data
