Data Management

data management, information management, data, big data, personal organization, organization, file management, scientific research, research, project management, data security, file naming conventions, data management plan,

Published in: Education

Data Management

  1. 1. DATA MANAGEMENT July 7, 2014
  2. 2. Hello there!
  3. 3. 1 | Data definitions 2 | Dealing with Data 3 | The Real World
  4. 4. 1 | Data definitions
  5. 5. Data can be complex
  6. 6. Data can be amazing
  7. 7. Data are about discovery
  8. 8. But…
  9. 9. Data does not speak for itself…
  10. 10. YOU speak for YOUR data
  11. 11. and you need to manage it
  12. 12. But, even more fundamentally…
  13. 13. tomaytoe tomahto Solanum lycopersicum PANTONE 1795 C tdTomato 554ex 581em $64
  14. 14. Data means different things to different people
  15. 15. Traditional Data Types: Observational • Experimental • Simulation data • Derived or compiled data
  16. 16. Welcome to the 21st Century
  17. 17. Data is not static
  18. 18. The data timeline: 1. Brilliant Idea! 2. Design Experiment 3. Do Experiment 4. Collect data 5. Compile and Analyze 6. Publish 7. Fame, Fortune
  19. 19. The data timeline: What people think. 1. Brilliant Idea! 2. Design Experiment 3. Do Experiment 4. Collect data 5. Compile and Analyze 6. Publish 7. Fame, Fortune
  20. 20. The data timeline: What Happens. Idea! design experiment Compile & Clean data Publish Try #2 Failure!! #896 coffee #896!!!! Analyzing data Other People’s data
  21. 21. The data cycle: What Really Happens
  22. 22. 2 | Dealing with data
  23. 23. Hello there! Why should I care?
  24. 24. Personal organization • Credit where credit is due • Reproducibility of science • Accelerates scientific discovery • Efficiency • So you won’t go crazy
  25. 25. Hello there! Do you get frustrated with…
  26. 26. a. Storing data b. Backing up data c. Analyzing/manipulating data d. Finding data produced by other researchers e. Ensuring data are secure f. Making data accessible to other researchers g. Controlling access to data h. Tracking updates to data (versioning) i. Creating metadata (what’s metadata?) j. Protecting intellectual property rights k. Ensuring appropriate professional credit/citation is given
  27. 27. naming|metadata |standards | tools How do I not go crazy?
  28. 28. naming
  29. 29. Naming: File Names
  30. 30. File naming
  31. 31. File naming This is fake…
  32. 32. File naming This is real!
  33. 33. Naming conventions Project_instrument_location_YYYYMMDDhhmmss_extra.ext Index/grant conditions Leading zero! s/n, variable Retain order
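One way to apply the pattern on this slide consistently is to generate names from a small helper rather than typing them by hand. The sketch below is only an illustration: the function names, the choice to replace spaces and underscores inside a field with hyphens, and the three-digit serial width are assumptions, not part of the presentation.

```python
from datetime import datetime


def clean_field(value):
    """Replace spaces and underscores inside a field so '_' stays the only separator."""
    return value.strip().replace(" ", "-").replace("_", "-")


def serial(number, width=3):
    """Zero-pad a running number so names sort in order (7 -> '007')."""
    return str(number).zfill(width)


def build_filename(project, instrument, location, when, extra, ext):
    """Assemble Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    stamp = when.strftime("%Y%m%d%H%M%S")
    fields = [
        clean_field(project),
        clean_field(instrument),
        clean_field(location),
        stamp,
        clean_field(extra),
    ]
    return "_".join(fields) + "." + ext.lstrip(".")


if __name__ == "__main__":
    # Hypothetical values, chosen only to show the shape of the result.
    print(build_filename("heart", "scope", "UMMS",
                         datetime(2014, 7, 7, 9, 5, 3), serial(7), ".tif"))
```

Because the timestamp is written from the largest unit to the smallest, sorting the names alphabetically also sorts them chronologically.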
  34. 34. Lamar Soutter Library UMMS Experiment: stem cells on fibrin to damaged heart
  35. 35. Collective data from experiment Post days +: Section heart, tissues on slides, staining, images of tissues, tracking particles on heart Variable days: #2 Surgery: examination, high speed imaging/LVPs, isolate heart and place it in freezer 0 day: #1 Surgery: infarct/delivery of stem cells to damaged heart tissue -1 day: Stem cells in solution with biological suture -2 days: Incubate stem cells with markers
  36. 36. Collective data from experiment Post days +: Section heart, tissues on slides, staining, images of tissues, tracking particles on heart Variable days: #2 Surgery: examination, high speed imaging/LVPs, isolate heart and place it in freezer 0 day: #1 Surgery: infarct/delivery of stem cells to damaged heart tissue -1 day: Stem cells in solution with biological suture -2 days: Incubate stem cells with markers TIME | TYPE | USE
  37. 37. Many different file types (Data | File Format): Images | Machine dependent; Ventricular pressure measurements | Proprietary; Home made software | MATLAB or C; Histology sections | Slides and images; Contextual | Project, Experiment, Animal
  38. 38. Separate Nomenclature (Data | File Format | Name): Images | Machine dependent | Scope_Date_Var; Ventricular pressure measurements | Proprietary | M_Date_Var.raw; Home made software | MATLAB or C | Script_Date_Var; Histology sections | Slides and images | Anat_Date_Stain
  39. 39. Unified Nomenclature (Data | File Format | Name): Images (1) | Machine dependent | E_1_Date_var; Ventricular pressure measurements (2) | Proprietary | E_2_Date_Var; Home made software (3) | MATLAB or C | E_3_Script_Var; Histology sections (4) | Slides and images | E_4_Date_Stain
  40. 40. Recommended File Formats (Type | Recommended | Meh): Tabular data | CSV, TSV, SPSS portable | Excel; Text | Plain text, HTML, RTF, PDF/A only if layout matters | Word; Media | Sound: FLAC, Ogg; Video: MP4 | Sound: MP3, WAV, AIFF; Video: .mj2; Images | TIFF, JPEG2000, PNG | GIF, JPG, PDF; Structured data | XML, RDF | RDBMS
  41. 41. Bulk File Renaming Tools RESOURCES • Bulk Rename Utility (Windows) • Renamer (Mac) • PSRenamer • Mendeley
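For readers who prefer a script to the tools listed above, a bulk rename can be planned as a dry run before any file is touched. This is a minimal sketch under stated assumptions: the function names are hypothetical, it only pads a number that sits directly before the extension, and it does not rename anything itself.

```python
import re

# A run of digits immediately before the final extension, e.g. the "1" in "img_1.tif".
_TRAILING_NUMBER = re.compile(r"(\d+)(\.[^.]+)$")


def pad_number(name, width=3):
    """Zero-pad the trailing number of a file name; names without one are unchanged."""
    return _TRAILING_NUMBER.sub(
        lambda m: m.group(1).zfill(width) + m.group(2), name
    )


def plan_renames(names, width=3):
    """Return {old_name: new_name} for the names that would actually change."""
    plan = {}
    for old in names:
        new = pad_number(old, width)
        if new != old:
            plan[old] = new
    return plan


if __name__ == "__main__":
    # Hypothetical file names for illustration.
    for old, new in plan_renames(["img_1.tif", "img_10.tif", "img_002.tif"]).items():
        print(old, "->", new)
```

Reviewing the plan first, and only then applying it (for example with `os.rename`), avoids the most common bulk-rename accident: overwriting a file whose new name already exists.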
  42. 42. Naming conventions Grant_Project_experiment_instrument_location_weather_catsname_icecreamflavor_collaborator_owner_zodiacsign_mousemodel_address_painscalerating_favoritecolor_ssn_shoesize_sex_eyecolor_tattoos_scars_votingrecord_YYYYMMDDhhmmss_extra.ext
  43. 43. Naming: Directory Structure
  44. 44. Data presentation CTSAconnect presentation Monarch presentation Presentations SPARC CTSAconnect Monarch
  45. 45. Mindmapping Software RESOURCES http://ftp.ihmc.us/ www.coggle.it www.mindjet.com
  46. 46. Mindmapping Old School RESOURCES
  47. 47. Oldie but goodie…
  48. 48. Naming: Version Control
  49. 49. DataManagement@UPR_seminars_101113_JW DataManagement@UPR_data_101113_JW DataManagement_dataship_100313_NV_JW_MH_RC Data101_dataship_091113_FINAL_JW Data101_dataship_091013_v04_JW DataManagement_dataship_091013_v03_JW DataManagement_dataship_090913_v02_JW DataManagement_dataship_090913_v01_JW DataManagement_SPARC_082013_FINAL_NV DataManagement_SPARC_052013_v8
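The file names on this slide carry their version as a `_v03`-style field. A small helper can bump that field without retyping the rest of the name. This is a sketch only: `next_version` is a hypothetical name, and it assumes the version appears as `_v` followed by digits and then an underscore, a dot, or the end of the name.

```python
import re

# "_v" followed by digits, then "_", "." or the end of the name.
_VERSION = re.compile(r"_v(\d+)(?=_|\.|$)")


def next_version(name):
    """Return the name with its version number increased by one, keeping the padding."""
    match = _VERSION.search(name)
    if match is None:
        raise ValueError("no _vNN version field found in: " + name)
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return name[:match.start(1)] + bumped + name[match.end(1):]


if __name__ == "__main__":
    print(next_version("DataManagement_dataship_091013_v03_JW"))
```

Names marked `FINAL` have no numeric field and raise an error here, which is one argument for numbering every version instead of declaring one final.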
  50. 50. Version Control RESOURCES Electronic Lab Notebooks
  51. 51. Naming: Backups
  52. 52. Which of the following do you do?
  53. 53. a. Save copies of data on a disk, USB drive, or computer hard drive b. Save copies of data on a local server c. Save copies of data on a central campus server d. Save copies of data on a web based or cloud server e. Store data in a repository or archives f. Automatically backup files g. Manually generate backup h. Restrict access to files
  54. 54. 3 | copies (you, lab, other) 2 | different forms 1 | remote location
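Keeping three copies only helps if the copies are still identical to the original. One common way to check is to compare checksums. The sketch below assumes the copies are reachable as ordinary file paths; the function names are hypothetical and SHA-256 is just one reasonable choice of hash.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large data files do not have to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_copies(original, copies):
    """Return {copy_path: True/False}; False means missing or different from the original."""
    expected = sha256_of(original)
    return {
        str(copy): Path(copy).is_file() and sha256_of(copy) == expected
        for copy in copies
    }
```

A call such as `check_copies("data.raw", ["/lab/data.raw", "/remote/data.raw"])` (paths are placeholders) reports which backups can still be trusted.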
  55. 55. ETHICS
  56. 56. Computing in the cloud
  57. 57. Ownership
  58. 58. naming|metadata |standards | tools How do I not go crazy?
  59. 59. Metadata / Controlled Vocab / Ontologies
  60. 60. You speak for your data
  61. 61. How do you speak for your data when you are not around?
  62. 62. How do you speak for your data when you are not around? Metadata Controlled Vocabularies ontologies
  63. 63. Controlled vocabularies metadata What it is What it takes to do it Relevant variables definitions grouping classification connection Controlled vocab ontologies
  64. 64. Hello there! What is metadata, really?
  65. 65. a. a philosophy b. describes data c. dating site d. data
  66. 66. a. a philosophy b. describes data c. dating site d. data
  67. 67. Title Author Call number Publisher ISBN
  68. 68. "Your metadata should make your data understandable to others without your involvement" - Anne Gilliland
  69. 69. "Your metadata should make your data understandable to your mother" - Jackie
  70. 70. File name File type Who created the data Title Date created
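The five fields on this slide are enough for a minimal metadata record stored next to each data file. The sketch below writes such a record as JSON and reports any field left empty. The key names, the JSON layout, and the ISO date format are assumptions made for illustration, not a metadata standard.

```python
import json
from datetime import date

# The five fields from the slide, as hypothetical JSON keys.
REQUIRED_FIELDS = ("file_name", "file_type", "creator", "title", "date_created")


def make_metadata(file_name, file_type, creator, title, date_created):
    """Build a minimal metadata record; the date is stored as YYYY-MM-DD."""
    return {
        "file_name": file_name,
        "file_type": file_type,
        "creator": creator,
        "title": title,
        "date_created": date_created.isoformat(),
    }


def missing_fields(record):
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


def to_json(record):
    """Serialize the record in a stable, human-readable form."""
    return json.dumps(record, indent=2, sort_keys=True)


if __name__ == "__main__":
    # Hypothetical example values.
    record = make_metadata("E_1_20140707_var.tif", "TIFF image",
                           "A. Researcher", "Heart section, day 0", date(2014, 7, 7))
    print(to_json(record))
```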
  71. 71. Metadata standards RESOURCES http://www.dlib.indiana.edu/~jenlrile/metadatamap/
  72. 72. Metadata standards RESOURCES http://rs.tdwg.org/dwc/
  73. 73. Hello there! What is controlled vocab, really?
  74. 74. Craigslist search: Chaise Craigslist search: Fainting couch
  75. 75. = acetaminophen
  76. 76. PubMed indexes articles with MeSH Terms
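In practice a controlled vocabulary can be as simple as a lookup from every variant people type to one preferred term. The sketch below is illustrative only: the term list is a tiny hand-made example, not MeSH, and treating "chaise" and "fainting couch" as the same item simply follows the Craigslist slide above.

```python
# Hypothetical mini-vocabulary: variant (lower case) -> preferred term.
PREFERRED_TERMS = {
    "acetaminophen": "acetaminophen",
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
    "chaise": "chaise longue",
    "fainting couch": "chaise longue",
}


def normalize(term):
    """Return the preferred term, or None if the term is not in the vocabulary."""
    key = " ".join(term.lower().split())  # ignore case and extra spaces
    return PREFERRED_TERMS.get(key)


if __name__ == "__main__":
    for raw in ["Tylenol", "paracetamol", "Fainting  Couch", "sofa"]:
        print(repr(raw), "->", normalize(raw))
```

Returning `None` for unknown terms, rather than guessing, makes it obvious which entries still need a decision from a person.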
  77. 77. Hello there! What is an ontology, really?
  78. 78. Hello there! Ummmm….So what?!?
  79. 79. naming|metadata |standards | tools How?
  80. 80. standards
  81. 81. Why? The Methods Section…
  82. 82. Meet the Urban Lab Meet the Urban Lab
  83. 83. A+ organization! The Urban Lab Antibodies
  84. 84. Of 14 antibodies published in 45 articles, only 38% were identifiable (bar chart; axis: Percent identifiable; categories: Commercial Ab identifiable, Catalog number reported, Source organism reported, Target uniquely identifiable)
  85. 85. AntibodyRegistry.org RESOURCES
  86. 86. Are you aware of data standards in your field? @OHSU, 72% said no or didn’t know!
  87. 87. How do these standards work?
  88. 88. How do you speak for your data when you are not around? Metadata Controlled Vocabularies ontologies
  89. 89. RESOURCES Minimum Information for Biological and Biomedical Investigations
  90. 90. Reporting Standards RESOURCES www.force11.org/node/4463 biosharing.org/bsg-000532 http://www.biosharing.org/standards/mibbi www.cdisc.org
  91. 91. naming|metadata |standards | tools How?
  92. 92. tools
  93. 93. Workflow analysis platforms RESOURCES www.wf4ever-project.org runmycode.org galaxyproject.org/ www.pegasus.isi.edu/
  94. 94. RESOURCES www.labguru.com www.labarchives.com http://opus.bath.ac.uk/32296
  95. 95. RESOURCES www.labtove.org www.openwetware.org
  96. 96. What types of data will be created? Who will own / access / be responsible? Where will data be stored during and after? What info is necessary for my mom to get it?
  97. 97. RESOURCES https://dmp.cdlib.org/
  98. 98. Uniquely identifying data
  99. 99. Digital Object Identifier (DOI). Example: 10.1371/journal.pbio.1001339. Uniform Resource Identifier (URI): a URI will resolve to a single location on the web. URIs for people. Repositories use unique IDs.
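A DOI becomes a clickable link when it is appended to the `https://doi.org/` resolver. The sketch below does that for the example on this slide. The function name is hypothetical, and the regular expression is only a rough shape check (an assumption), not a full validation of DOI syntax.

```python
import re

# Rough shape check: "10.", a registrant code of 4-9 digits, "/", then a suffix.
_DOI_SHAPE = re.compile(r"^10\.\d{4,9}/\S+$")

_KNOWN_PREFIXES = ("https://doi.org/", "http://dx.doi.org/", "doi:")


def doi_to_url(doi):
    """Turn a bare or prefixed DOI into a resolver URL."""
    cleaned = doi.strip()
    for prefix in _KNOWN_PREFIXES:
        if cleaned.lower().startswith(prefix):
            cleaned = cleaned[len(prefix):]
            break
    if not _DOI_SHAPE.match(cleaned):
        raise ValueError("does not look like a DOI: " + doi)
    return "https://doi.org/" + cleaned


if __name__ == "__main__":
    print(doi_to_url("10.1371/journal.pbio.1001339"))
```

DOIs with unusual characters in the suffix may also need URL encoding, which this sketch leaves out.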
  100. 100. RESOURCES Repository Map
  101. 101. RESOURCES Data Sharing Repositories
  102. 102. Data publishing and sharing RESOURCES figshare.com datadryad.org thedata.org n2t.net/ezid www.dataone.org data.rutgers.edu/ nature.com/scientificdata/ F1000.com/
  103. 103. naming|metadata |standards | tools How do I not go crazy?
  104. 104. *special topics in data management* How do I not go crazy?
  105. 105. Raw Science Small publications Alt Publishing Datasets Code Experimental design Argument or passage Blogging Social Media Comments & Reviews Annotations Single figure publications Nanopublications Beyond the Traditional Journal Article
  106. 106. “Research Products”
  107. 107. You are unique, too!
  108. 108. John L Campbell, Research Ecologist, Oregon State University, Corvallis OR John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
  109. 109. Impact.Story impactstory.org www.plumanalytics.com orcid.org RESOURCES Yes, you are an individual! http://myidp.sciencecareers.org/
  110. 110. 3 | The Real World
  111. 111. naming|metadata |standards | tools How do I not go crazy?
  112. 112. So, what does this mean for day to day research? How do I not go crazy?
  113. 113. Gummy Bear: the Groundbreaking Paper
  114. 114. Your Data: Gummy Bear Raw Data. Haribo Gummi Bears Sugar Free 5lb Bag (Bounces | Amplitude | Color): 15 | 4 | blue; 43 | 3 | red; 58 | 9 | green; 75 | 82 | purple
  115. 115. Your task: Create a Figure for Nature Figure with the following: Image of Gummy skeleton only with belly button annotated Chart of springiness per color of bear Figure legend Methods section
  116. 116. Final Figure Example. Figure 1. A) Gummy skeleton with belly button annotated with red arrow. B) Springiness by sample color (bar chart; x-axis: Sample Color (blue, red, green, purple); y-axis: Springiness (bounces/length)). Methods Section: Haribo Gummi Bears (Sugar Free) were purchased from Amazon.com (UPC: 422384500110). Gummy bears were placed in the SpringOMatic 3000 (ICanPickleThat, Portland OR) according to the manufacturer's instructions. The Gummy Anatomy (Jason Freeny) image was cropped in PPT (Microsoft) and annotated to highlight the belly button.
  117. 117. Gummy Bear Challenge Final Figure: Group 1
  118. 118. Gummy Bear Final Data (bar chart; x-axis: blue, red, green, purple; y-axis: Springiness (Bounces/Amplitude)). Raw data (bounces | amplitude | color): 15 | 4 | blue; 43 | 3 | red; 58 | 9 | green; 75 | 82 | purple. Methods: A schematic of a Gummi Bear was cropped to indicate where the belly button is located (Fig. 1). At this point, raw experimental data showing the bounce, amplitude, and color were analyzed and the springiness calculated for each color of bear. This was accomplished by dividing the bounce by the amplitude and plotting this against bear color. Fig. 1 Belly button of Haribo Sugar Free Gummi Bear
  119. 119. Gummy Bear Challenge Final Figure: Group 2
  120. 120. Gummy Bear Final Data. Figure 1. A) Gummy skeleton with belly button annotated with red arrow. B) Springiness by sample color (bar chart; x-axis: Sample Color (blue, red, green, purple); y-axis: Springiness (bounces/length)). Methods Section: Haribo Gummi Bears (Sugar Free) were purchased from Amazon.com (UPC: 422384500110). Gummy bears were placed in the SpringOMatic 3000 (ICanPickleThat, Portland OR) according to the manufacturer's instructions.
  121. 121. Gummy Bear Challenge Final Figure: Group 3
  122. 122. Figure 1: Haribo Gummi Bear (Ursus gummius hariboensis) springiness as a measure of bounces/amplitude, by color (n = 1) (bar chart; x-axis: purple, blue, green, red; y-axis: Springiness (Bounces/Amplitude)). Figure 2: Schematic depiction of Haribo Gummi Bear umbilical skeletal anatomy. Methods & Materials: Gummi Bears were obtained through Amazon in 3 kg bags. Lot and temperature during transport data were not made available. Bears were housed in a plastic bowl in accordance with IACUC policy and national standards for gummi bear care. They were housed at room temperature on a natural light cycle. Food and water were provided ad libitum (consumption was not monitored). Each bear was sampled only once to reduce costs.
  123. 123. Methods & Materials • Gummi Bears were obtained through Amazon in 3 kg bags. Lot and temperature during transport data were not made available. Bears were housed in a plastic bowl in accordance with IACUC policy and national standards for gummi bear care. They were housed at room temperature on a natural light cycle. • Food and water were provided ad libitum (consumption was not monitored) • Each bear was sampled only once to reduce costs
  124. 124. Gummy Bear Challenge Final Figure: Group 4
  125. 125. Fig. 1. (a) Schematic of the anatomy of a gummy bear, with the belly button labeled (adapted from 1). (b) Springiness of bear by color using spring-o-matic (bar chart; x-axis: Gummy Bear Color (blue, red, green, purple); y-axis: Springiness (bounces/amplitude)).
  126. 126. Methods: Insert the sample of interest, specifically a colored gummy bear (Haribo, Japan). Position the probe above the sample. Press "Tickle" and the SpringOMatic (ICanPickleThat, Portland) will poke the belly button a standard depth of 1 cm. Record the number of bounces and the amplitude of the largest bounce in cm. From these values, the springiness can be calculated (bounce/amplitude).
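The springiness calculation the groups describe (bounces divided by amplitude) can be written down once and rerun whenever the raw data change. The sketch below uses the raw data from slide 114. The function names and the rounding to two decimals are choices made here for illustration.

```python
# Raw data from slide 114: (color, bounces, amplitude).
RAW_DATA = [
    ("blue", 15, 4),
    ("red", 43, 3),
    ("green", 58, 9),
    ("purple", 75, 82),
]


def springiness(bounces, amplitude):
    """Springiness as defined in the methods sections: bounces / amplitude."""
    if amplitude == 0:
        raise ValueError("amplitude must be non-zero")
    return bounces / amplitude


def summarize(rows):
    """Return {color: springiness rounded to two decimals}."""
    return {color: round(springiness(b, a), 2) for color, b, a in rows}


if __name__ == "__main__":
    for color, value in summarize(RAW_DATA).items():
        print(color, value)
```

Keeping the raw values and the calculation together in a script is itself a form of metadata: anyone can see exactly how each bar in the figure was derived.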
  127. 127. Take the Challenge!
  128. 128. 1 | Data definitions 2 | Dealing with Data 3 | The Real World
  129. 129. wirzj@ohsu.edu
  130. 130. http://libguides.ohsu.edu/data
  131. 131. Thank you!
  132. 132. Questions?
  133. 133. “We are Drowning in Information but Starved for Knowledge” John Naisbitt
  134. 134. Hello there! “We are Drowning in data but Starved for Knowledge” Jackie’s bad paraphrase of John Naisbitt