DATA MANAGEMENT 101
October 11, 2013
Hello there!
1 | Data definitions
2 | Dealing with Data
3 | The Next Steps
1 | Data definitions

Data can be complex
Data can be amazing
Data are about discovery
But…
Data does not speak for itself…
YOU speak for YOUR data
and you need to manage it
But, even more fundamentally…
tomaytoe

PANTONE 
1795 C

tomahto

Solanum
lycopersicum
tdTomato
554ex 581em

$64
Data means different things
to different people
Data is not static
1. Brilliant Idea!
2. Design Experiment
3. Do Experiment

The data timeline

4. Collect data
5. Compile and Analyze
6. Pub...
1. Brilliant Idea!
2. Design Experiment
3. Do Experiment
4. Collect data

The data timeline:
What people think

5. Compile...
Idea!

Analyzing data

The data timeline:
What Happens

experiment
Compile data

design
Other
People’s
data
Try #2
Failure...
The data cycle:
What Really Happens
2 | Dealing with Data
Why should I care?
So you won’t go crazy

Efficiency
Accelerates scientific discovery
Reproducibility of science

Credit where credit is due
...
Do you get frustrated with…
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Finding data produced by other researchers
e. Ensurin...
How do I not go crazy?
naming | metadata | standards | tools
naming
Naming: File Names
File naming
File naming
Naming conventions

s/n, variable

Retain order

Project_instrument_location_YYYYMMDDhhmmss_extra.ext
Index/grant

conditi...
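A minimal sketch of the naming pattern above, assuming Python. The function names (`build_filename`, `numbered`) and the example field values are hypothetical, chosen only to illustrate the fixed field order, the sortable timestamp, and the leading zeros.

```python
from datetime import datetime


def numbered(index, width=3):
    """Zero-pad a serial number so that file names sort in numeric order."""
    return str(index).zfill(width)


def build_filename(project, instrument, location, timestamp, extra, ext):
    """Build Project_instrument_location_YYYYMMDDhhmmss_extra.ext.

    Underscores separate the fields, so spaces and underscores inside a
    field value are replaced with hyphens (an assumption of this sketch).
    """
    stamp = timestamp.strftime("%Y%m%d%H%M%S")  # zero-padded, sorts chronologically
    fields = [project, instrument, location, stamp, extra]
    cleaned = [f.strip().replace(" ", "-").replace("_", "-") for f in fields]
    return "_".join(cleaned) + "." + ext.lstrip(".")


if __name__ == "__main__":
    when = datetime(2013, 10, 11, 9, 5, 0)
    print(build_filename("heart", "scope", "lab2", when, "s" + numbered(7), "tif"))
```

Keeping the field order fixed and the date in year-first form means an ordinary alphabetical sort of a folder also sorts the files by project, instrument, and time.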
Experiment: stem cells on fibrin to damaged heart
Lamar Soutter Library UMMS
‐2 days:  Incubate stem cells with markers

‐1 day:  Stem cells in solution with biological suture

0 day: #1 Surgery: inf...
‐2 days:  Incubate stem cells with markers

‐1 day:  Stem cells in solution with biological suture

0 day: #1 Surgery: inf...
Data

File Format

Images

Machine dependent

Ventricular pressure 
measurements

Proprietary 

Home made software

MATLAB...
Data

File Format

Name

Images

Machine 
dependent

Scope_Date_Var

Ventricular 
pressure 
measurements

Proprietary 

M_...
Data

File Format

Name

Images (1)

Machine 
dependent

E_1_Date_var

Ventricular 
pressure 
measurements 
(2)

Proprieta...
Type

Recommended

Avoid for data sharing

Tabular data

CSV, TSV, SPSS portable

Excel

Text

Plain text, HTML, RTF
PDF/A...
RESOURCES

• Bulk Rename Utility (Windows)
• Renamer (Mac)
• PSRenamer
• Mendeley
Bulk File Renaming Tools
Naming conventions

Grant_Project_experiment_instrument_location_weather_catsname_i
cecreamflavor_collaborator_owner_zodia...
Naming: Directory Structure
Presentations

Data presentation
CTSAconnect presentation
Monarch presentation

SPARC

CTSAconnect

Monarch
RESOURCES

www.coggle.it

http://ftp.ihmc.us/

www.mindjet.com

Mindmapping Software
Oldie but goodie…
Naming: Version Control
DataManagement@UPR_seminars_101113_JW
DataManagement@UPR_data_101113_JW
DataManagement_dataship_100313_NV_JW_MH_RC
Data101...
RESOURCES

Dropbox | Google Docs
Git | SmartSVN

Version Control
Naming: Backups
Which of the following do you do?
a. Save copies of data on a disk, USB drive, or computer
hard drive
b. Save copies of data on a local server
c. Save copie...
3 | copies (you, lab, other)
2 | 2 different forms
1 | remote location
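The 3-2-1 rule above can be checked mechanically. The following is a rough sketch, assuming Python; the `BackupCopy` record and the `meets_3_2_1` function are hypothetical names, and the example plan is invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class BackupCopy:
    holder: str    # who keeps the copy, e.g. "you", "lab", "other"
    medium: str    # storage form, e.g. "external-disk", "server", "cloud"
    offsite: bool  # True if the copy lives at a remote location


def meets_3_2_1(copies):
    """Return True if there are 3+ copies, in 2+ forms, with 1+ remote."""
    enough_copies = len(copies) >= 3
    enough_forms = len({c.medium for c in copies}) >= 2
    has_remote = any(c.offsite for c in copies)
    return enough_copies and enough_forms and has_remote


if __name__ == "__main__":
    plan = [
        BackupCopy("you", "external-disk", False),
        BackupCopy("lab", "server", False),
        BackupCopy("other", "cloud", True),
    ]
    print(meets_3_2_1(plan))
```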
ETHICS
Computing in the cloud
How do I not go crazy?
naming | metadata | standards | tools
Metadata / Controlled Vocab / Ontologies
You speak for your data
How do you speak for your data
when you are not around?
Metadata
Controlled
Vocabularies
How do you speak for your data
ontologies
when you are not around?
What it is
Controlled vocabularies
metadata
What it takes to do it
Relevant variables

Controlled vocab

definitions
group...
What is metadata, really?
a. a philosophy
b. describes data
c. dating site
d. data
Title
Author
Call number
Publisher
ISBN
Metadata
Metad...
File name

File type

Title
Date created

Who created the 
data
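One way to keep these five fields with the data is a small "sidecar" file saved next to each data file. This is only a sketch, assuming Python and JSON; the key names, the `.meta.json` suffix, and the functions `describe` and `write_sidecar` are illustrative choices, not a metadata standard.

```python
import json
from datetime import date
from pathlib import Path


def describe(file_path, title, creator, created):
    """Collect the basic fields: file name, file type, title, date, creator."""
    path = Path(file_path)
    return {
        "file_name": path.name,
        "file_type": path.suffix.lstrip(".").lower(),
        "title": title,
        "date_created": created.isoformat(),  # YYYY-MM-DD
        "creator": creator,
    }


def write_sidecar(record, directory):
    """Write the record as <file_name>.meta.json inside the given directory."""
    out = Path(directory) / (record["file_name"] + ".meta.json")
    out.write_text(json.dumps(record, indent=2), encoding="utf-8")
    return out


if __name__ == "__main__":
    rec = describe("E_1_20131011_ctrl.tif", "Control section", "J. Doe", date(2013, 10, 11))
    print(json.dumps(rec, indent=2))
```

A plain-text record like this stays readable without the original software, which is the point of metadata that speaks for the data when its creator is not around.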
RESOURCES

http://www.dlib.indiana.edu/~jenlrile/metadatamap/

Metadata standards
RESOURCES

http://rs.tdwg.org/dwc/

Metadata standards
What is controlled vocab, really?
Craigslist search: Chaise

Craigslist search: Fainting couch
acetominophen
PubMed indexes articles with 
MeSH Terms
What is an ontology, really?
Human Disease:

PFEIFFER
SYNDROME

Coronal
craniosynostosis
HP:0004440

Hypoplasia of
the maxilla
HP:0000327

Cross-specie...
How?
naming | metadata | standards | tools
standards
Meet the Urban Lab

Meet the Urban Lab
The Urban Lab Antibodies

A+ organization!
Percent identifiable (bar chart, y-axis 0% to 90%)
Commercial Ab Catalog number Source organism Target uniquely
...
RESOURCES

AntibodyRegistry.org
Are you aware of data standards in
your field?
@OHSU, 72% said no or didn’t know!
Data standards are the rules by which data are 
described and recorded. In order to share, exchange, 
and understand data,...
Many microarray transcriptomics standards
JAMIA:sea‐of‐standards
Reporting
guidelines

Terminology
structure
(Interoperability)

Exchange
Formats
How do these standards work?
Metadata
Controlled
Vocabularies
How do you speak for your data
ontologies
when you are not around?
RESOURCES

Minimum Information for Biological and Biomedical Investigations 
RESOURCES

www.cdisc.org
RESOURCES

www.force11.org/node/4463

biosharing.org/bsg-000532

Minimum Information for 
Biological and Biomedical Invest...
How?
naming | metadata | standards | tools
tools
RESOURCES

runmycode.org

www.wf4ever‐project.org

galaxyproject.org/

Workflow analysis platforms
RESOURCES

http://opus.bath.ac.uk/32296

www.labarchives.com

www.labguru.com
RESOURCES

https://dmp.cdlib.org/
Uniquely identifying data
www.flickr.com/photos/pmeimon
Digital Object Identifier (DOI)
Example: 10.1371/journal.pbio.1001339

Uniform Resource Identifier (URI)
A URI will resolve...
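A DOI becomes a resolvable web address when appended to the public resolver at https://doi.org/. The sketch below assumes Python; the function name `doi_to_url` is hypothetical, and the pattern is a loose sanity check written for this example rather than the official DOI syntax.

```python
import re
from urllib.parse import quote

# Loose check (an assumption of this sketch): "10.", a numeric prefix, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(doi):
    """Turn a bare DOI such as 10.1371/journal.pbio.1001339 into a resolver URL."""
    doi = doi.strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError("does not look like a DOI: " + repr(doi))
    # Percent-encode unusual characters but keep the slash between prefix and suffix.
    return "https://doi.org/" + quote(doi, safe="/")


if __name__ == "__main__":
    print(doi_to_url("10.1371/journal.pbio.1001339"))
```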
RESOURCES

Repository Map
RESOURCES

Data Sharing Repositories
3 | The Next Steps
Thinking Beyond the PDF
Raw Science

Small publications

Self-publishing

Datasets

Nanopublications

Blogging

Code

Argu...
Research Products
RESOURCES

figshare.com

datadryad.org

thedata.org

www.dataone.org

data.rutgers.edu/

n2t.net/ezid

nature.com/scient...
You are unique, too
John L Campbell, Research Ecologist, Oregon State 
University, Corvallis OR

John L Campbell, Research Ecologist, Center f...
RESOURCES

Impact.Story
impactstory.org

www.plumanalytics.com

http://myidp.sciencecareers.org/

orcid.org

Yes, you are ...
1 | Data definitions
2 | Dealing with Data
3 | The Next Steps
wirzj@ohsu.edu
http://libguides.ohsu.edu/data
Melissa Haendel
Nicole Vasilevsky
Robin Champieux
Thank you!
Questions?
what happens
between publications?
“We are Drowning in
Information but
Starved for
Knowledge”
John Naisbitt
“We are Drowning in
data but
Starved for
Knowledge”

Jackie’s bad paraphrase of John Naisbitt
Dr. Sawyer

Dr. Finn

Connected?

University

Collegeville College

University of Badass

No

Journals

Journal of Informa...
Dr. Sawyer

Dr. Finn

Connected?

Machines Used

Alpha, Gamma, Theta, 
Sigma

Gamma, Beta, Kappa, Theta

Yes!

Reagents Us...
Data management workshop 101113
Presentation on Data management @ UCC

Published in: Education, Technology

Transcript of "Data management workshop 101113"

  1. 1. DATA MANAGEMENT 101 October 11, 2013
  2. 2. Hello there!
  3. 3. 1 | Data definitions 2 | Dealing with Data 3 | The Next Steps
  4. 4. 1 | Data definitions
  5. 5. Data can be complex
  6. 6. Data can be amazing
  7. 7. Data are about discovery
  8. 8. But…
  9. 9. Data does not speak for itself…
  10. 10. YOU speak for YOUR data
  11. 11. and you need to manage it
  12. 12. But, even more fundamentally…
  13. 13. tomaytoe PANTONE  1795 C tomahto Solanum lycopersicum tdTomato 554ex 581em $64
  14. 14. Data means different things to different people
  15. 15. Data is not static
  16. 16. 1. Brilliant Idea! 2. Design Experiment 3. Do Experiment The data timeline 4. Collect data 5. Compile and Analyze 6. Publish 7. Fame, Fortune
  17. 17. 1. Brilliant Idea! 2. Design Experiment 3. Do Experiment 4. Collect data The data timeline: What people think 5. Compile and Analyze 6. Publish 7. Fame, Fortune
  18. 18. Idea! Analyzing data The data timeline: What Happens experiment Compile data design Other People’s data Try #2 Failure!! beer #896 #896 6. Publish
  19. 19. The data cycle: What Really Happens
  20. 20. 2 | Dealing with Data
  21. 21. Why should I care?
  22. 22. So you won’t go crazy Efficiency Accelerates scientific discovery Reproducibility of science Credit where credit is due Personal organization
  23. 23. Do you get frustrated with…
  24. 24. a. Storing data b. Backing up data c. Analyzing/manipulating data d. Finding data produced by other researchers e. Ensuring data are secure f. Making data accessible to other researchers g. Controlling access to data h. Tracking updates to data (i.e., versioning) i. Creating metadata j. Protecting intellectual property rights k. Ensuring appropriate professional credit/citation is given
  25. 25. How do I not go crazy? naming | metadata | standards | tools
  26. 26. naming
  27. 27. Naming: File Names
  28. 28. File naming
  29. 29. File naming
  30. 30. Naming conventions s/n, variable Retain order Project_instrument_location_YYYYMMDDhhmmss_extra.ext Index/grant conditions Leading zero!
  31. 31. Experiment: stem cells on fibrin to damaged heart Lamar Soutter Library UMMS
  32. 32. ‐2 days:  Incubate stem cells with markers ‐1 day:  Stem cells in solution with biological suture 0 day: #1 Surgery: infarct/delivery of stem cells to damaged heart tissue  Variable days:  #2 Surgery:  examination, high speed imaging/LVPs,  isolate heart and place it in freezer  Post days +:  Section heart, tissues on slides, staining, images of tissues,  tracking particles on heart   Collective data from experiment
  33. 33. ‐2 days:  Incubate stem cells with markers ‐1 day:  Stem cells in solution with biological suture 0 day: #1 Surgery: infarct/delivery of stem cells to damaged heart tissue  TIME | TYPE | USE Variable days:  #2 Surgery:  examination, high speed imaging/LVPs,  isolate heart and place it in freezer  Post days +:  Section heart, tissues on slides, staining, images of tissues,  tracking particles on heart   Collective data from experiment
  34. 34. Many different file types (Data: File Format). Images: Machine dependent. Ventricular pressure measurements: Proprietary. Home made software: MATLAB or C. Histology sections: Slides and images. Contextual: Project, Experiment, Animal.
  35. 35. Separate Nomenclature (Data: File Format; Name). Images: Machine dependent; Scope_Date_Var. Ventricular pressure measurements: Proprietary; M_Date_Var.raw. Home made software: MATLAB or C; Script_Date_Var. Histology sections: Slides and images; Anat_Date_Stain.
  36. 36. Unified Nomenclature (Data: File Format; Name). Images (1): Machine dependent; E_1_Date_var. Ventricular pressure measurements (2): Proprietary; E_2_Date_Var. Home made software (3): MATLAB or C; E_3_Script_Var. Histology sections (4): Slides and images; E_4_Date_Stain.
  37. 37. Recommended File Formats (Type: Recommended; Avoid for data sharing). Tabular data: CSV, TSV, SPSS portable; avoid Excel. Text: Plain text, HTML, RTF, PDF/A only if layout matters; avoid Word. Media: Container: MP4, Ogg, Codec: Theora, Dirac, FLAC; avoid Quicktime, H264. Images: TIFF, JPEG2000, PNG; avoid GIF, JPG. Structured data: XML, RDF; avoid RDBMS.
  38. 38. RESOURCES • Bulk Rename Utility (Windows) • Renamer (Mac) • PSRenamer • Mendeley Bulk File Renaming Tools
  39. 39. Naming conventions Grant_Project_experiment_instrument_location_weather_catsname_i cecreamflavor_collaborator_owner_zodiacsign_mousemodel_address _painscalerating_favoritecolor_ssn_shoesize_sex_eyecolor_tattoos_ scars_votingrecord_YYYYMMDDhhmmss_extra.ext
  40. 40. Naming: Directory Structure
  41. 41. Presentations Data presentation CTSAconnect presentation Monarch presentation SPARC CTSAconnect Monarch
  42. 42. RESOURCES www.coggle.it http://ftp.ihmc.us/ www.mindjet.com Mindmapping Software
  43. 43. Oldie but goodie…
  44. 44. Naming: Version Control
  45. 45. DataManagement@UPR_seminars_101113_JW DataManagement@UPR_data_101113_JW DataManagement_dataship_100313_NV_JW_MH_RC Data101_dataship_091113_FINAL_JW Data101_dataship_091013_v04_JW DataManagement_dataship_091013_v03_JW DataManagement_dataship_090913_v02_JW DataManagement_dataship_090913_v01_JW DataManagement_SPARC_082013_FINAL_NV DataManagement_SPARC_052013_v8
  46. 46. RESOURCES Dropbox | Google Docs | Git | SmartSVN Version Control
  47. 47. Naming: Backups
  48. 48. Which of the following do you do?
  49. 49. a. Save copies of data on a disk, USB drive, or computer hard drive b. Save copies of data on a local server c. Save copies of data on a central campus server d. Save copies of data on a web based or cloud server e. Store data in a repository or archives f. Automatically backup files g. Manually generate backup h. Restrict access to files
  50. 50. 3 | copies (you, lab, other) 2 | 2 different forms 1 | remote location
  51. 51. ETHICS
  52. 52. Computing in the cloud
  53. 53. How do I not go crazy? naming | metadata | standards | tools
  54. 54. Metadata / Controlled Vocab / Ontologies
  55. 55. You speak for your data
  56. 56. How do you speak for your data when you are not around?
  57. 57. Metadata Controlled Vocabularies How do you speak for your data ontologies when you are not around?
  58. 58. What it is Controlled vocabularies metadata What it takes to do it Relevant variables Controlled vocab definitions grouping ontologies classification connection
  59. 59. What is metadata, really?
  60. 60. a. a philosophy b. describes data c. dating site d. data
  61. 61. Title Author Call number Publisher ISBN
  62. 62. Metadata (the word repeated across the slide). Your metadata should make your data understandable to others without your involvement - Anne Gilliland
  63. 63. File name File type Title Date created Who created the  data
  64. 64. RESOURCES http://www.dlib.indiana.edu/~jenlrile/metadatamap/ Metadata standards
  65. 65. RESOURCES http://rs.tdwg.org/dwc/ Metadata standards
  66. 66. What is controlled vocab, really?
  67. 67. Craigslist search: Chaise Craigslist search: Fainting couch
  68. 68. acetominophen
  69. 69. PubMed indexes articles with  MeSH Terms
  70. 70. What is an ontology, really?
  71. 71. Human Disease: PFEIFFER SYNDROME Coronal craniosynostosis HP:0004440 Hypoplasia of the maxilla HP:0000327 Cross-species Phenotype premature suture closure Most similar mouse model: CD1.Cg-Fgfr2tm4Lni/H premature suture closure MP:0000081 maxilla hypoplasia short maxilla MP:0000097 malocclusion Dental crowding HP:0000678 Brachyturricephaly HP:0000244 Hypertelorism HP:0000316 malocclusion MP:0000120 shortened head ocular hypertelorism shortened head MP:0000435 ocular hypertelorism MP:0001300
  72. 72. How? naming | metadata | standards | tools
  73. 73. standards
  74. 74. Meet the Urban Lab Meet the Urban Lab
  75. 75. The Urban Lab Antibodies A+ organization!
  76. 76. Percent identifiable (bar chart, y-axis 0% to 90%). Categories: Commercial Ab uniquely identifiable; Catalog number reported; Source organism reported; Target identifiable. Of 14 antibodies published in 45 articles, only 38% were identifiable
  77. 77. RESOURCES AntibodyRegistry.org
  78. 78. Are you aware of data standards in your field? @OHSU, 72% said no or didn’t know!
  79. 79. Data standards are the rules by which data are  described and recorded. In order to share, exchange,  and understand data, we must standardize the format  as well as the meaning. www.usgs.gov/datamanagement/plan/datastandards.php
  80. 80. Many microarray transcriptomics standards JAMIA:sea‐of‐standards
  81. 81. Reporting guidelines Terminology structure (Interoperability) Exchange Formats
  82. 82. How do these standards work?
  83. 83. Metadata Controlled Vocabularies How do you speak for your data ontologies when you are not around?
  84. 84. RESOURCES Minimum Information for Biological and Biomedical Investigations 
  85. 85. RESOURCES www.cdisc.org
  86. 86. RESOURCES www.force11.org/node/4463 biosharing.org/bsg-000532 Minimum Information for  Biological and Biomedical Investigations  http://www.biosharing.org/standards/mibbi Reporting Standards
  87. 87. How? naming | metadata | standards | tools
  88. 88. tools
  89. 89. RESOURCES runmycode.org www.wf4ever‐project.org galaxyproject.org/ Workflow analysis platforms
  90. 90. RESOURCES http://opus.bath.ac.uk/32296 www.labarchives.com www.labguru.com
  91. 91. RESOURCES https://dmp.cdlib.org/
  92. 92. Uniquely identifying data www.flickr.com/photos/pmeimon
  93. 93. Digital Object Identifier (DOI) Example: 10.1371/journal.pbio.1001339 Uniform Resource Identifier (URI) A URI will resolve to a single location on the web URIs for people Repositories use Unique IDs
  94. 94. RESOURCES Repository Map
  95. 95. RESOURCES Data Sharing Repositories
  96. 96. 3 | The Next Steps
  97. 97. Thinking Beyond the PDF Raw Science Small publications Self-publishing Datasets Nanopublications Blogging Code Argument or passage Social Media Experimental design Single figure publications Comments & Reviews Annotations
  98. 98. Research Products
  99. 99. RESOURCES figshare.com datadryad.org thedata.org www.dataone.org data.rutgers.edu/ n2t.net/ezid nature.com/scientificdata/ F1000.com/ Data publishing and sharing
  100. 100. You are unique, too
  101. 101. John L Campbell, Research Ecologist, Oregon State  University, Corvallis OR John L Campbell, Research Ecologist, Center for  Research on Ecosystem Change, Durham, NC
  102. 102. RESOURCES Impact.Story impactstory.org www.plumanalytics.com http://myidp.sciencecareers.org/ orcid.org Yes, you are an individual!
  103. 103. 1 | Data definitions 2 | Dealing with Data 3 | The Next Steps
  104. 104. wirzj@ohsu.edu
  105. 105. http://libguides.ohsu.edu/data
  106. 106. Melissa Haendel Nicole Vasilevsky Robin Champieux
  107. 107. Thank you!
  108. 108. Questions?
  109. 109. what happens between publications?
  110. 110. “We are Drowning in Information but Starved for Knowledge” John Naisbitt
  111. 111. “We are Drowning in data but Starved for Knowledge” Jackie’s bad paraphrase of John Naisbitt
  112. 112. (Dr. Sawyer / Dr. Finn / Connected?) University: Collegeville College / University of Badass / No. Journals: Journal of Information / Journal of Data / No. Grant Title: Protein G is Important / Protein H is Important / No.
  113. 113. (Dr. Sawyer / Dr. Finn / Connected?) Machines Used: Alpha, Gamma, Theta, Sigma / Gamma, Beta, Kappa, Theta / Yes! Reagents Used: Cyan, Orange, Green, Mauve, Beige / Mauve, Chartreuse, Cyan, Green, Taupe / Yes! Genes Referenced: bz3d14.2, bz3c13.1, bz3d,98.1 / bz3c13.1 / Yes! Proteins Referenced: Eng1a, Ntl, Ncdq / Ndrw, Eng1a, Brs / Yes! People Affiliated with Resources: Harry, Neville, Ron / Harry, Ron, Hermione / Yes!