SPARC 2013 Data Management Presentation


Published in: Technology, Education
  1. 1. Data management. Nicole Vasilevsky, NCNM, OHSU; Jackie Wirz, OHSU; Melissa Haendel, OHSU
  2. 2. Outline • Introduction • Why do we need good data management? • Good data management • Databases and tools • Sharing your data
  3. 3. Who are we? • Nicole Vasilevsky, PhD – Assistant Professor, Helfgott Research Institute, NCNM – Project Manager, Ontology Development Group, OHSU • Jackie Wirz, PhD – Assistant Professor, Bioinformation Specialist, OHSU Library • Melissa Haendel, PhD – Assistant Professor, Department Head, Ontology Development Group, OHSU
  4. 4. What does data mean to you?
  5. 5. Do you have any training in data management?
  6. 6. Do you know what metadata is? a. Philosophy b. Describes data c. Dating site d. Data
  7. 7. What is data? • Clinical data • Experimental data • School-related data • Personal data • Social data
  8. 8. So much data
  9. 9. Why? • Personal organization • Credit where credit is due • Reproducibility of science and medicine • Accelerates scientific and clinical discovery • Efficiency
  10. 10. Do you get frustrated with any of the following in your personal or professional life? a. Storing data b. Backing up data c. Analyzing/manipulating data d. Finding data produced by other researchers/clinicians e. Ensuring data are secure f. Making data accessible to other researchers g. Controlling access to data h. Tracking updates to data (i.e., versioning) i. Creating metadata (i.e., describing the data to be more useful at a later time or by others) j. Protecting intellectual property rights k. Ensuring appropriate professional credit/citation is given to datasets generated
  11. 11. Desktop?
  12. 12. Which of the following do you do? a. Save copies of data on a disk, USB drive, tape, or computer hard drive b. Save copies of data on a local server c. Save copies of data on a central campus server d. Save copies of data on a web-based or cloud server e. Store data in a repository or archive f. Automatically back up files g. Manually generate backups h. Restrict access to files
  13. 13. Credit where credit is due. The scholarly communication cycle: • Data collection & Analysis • Authoring • Storage, Archiving, & Preservation • Publication & Dissemination
  14. 14. Reproducibility of science • Lack of information makes it difficult to reproduce experiments • Retraction rates are on the rise • Difficulty identifying resources in the published literature. Citation: Cokol et al. EMBO reports (2008) 9, 2. [Figure: bar chart with a 0% to 100% axis; categories: Antibodies, Cell lines, Constructs, Knockdown reagents, Organisms]
  15. 15. Sharing can be advantageous
  16. 16. Why share your data? • Data sharing mandates – NIH public access policy – NIH/NSF data sharing plan for new applications • Further science and medicine • Build collaborations • Enable new discoveries with your data • Can be required at time of publication
  17. 17. Efficiency
  18. 18. How? • File naming and data storage • Metadata • Controlled vocabularies and Ontologies • Databases and Tools • Data accessibility
  19. 19. File naming
  20. 20. Informative file names: Will I remember what this file is a month from now?
  21. 21. Naming conventions: Project_instrument_location_YYYYMMDDhhmmss_extra.ext. Callouts on the pattern: Index/grant conditions • Leading zero! • s/n, variable • Retain order
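
The naming pattern on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' tooling: the function name `build_filename` and the sample field values are hypothetical, and the rule of replacing underscores inside a field is an added assumption so that the underscore stays unambiguous as the field separator.

```python
from datetime import datetime

def build_filename(project, instrument, location, when, extra, ext):
    """Join the fields in a fixed order, using a zero-padded timestamp."""
    stamp = when.strftime("%Y%m%d%H%M%S")  # YYYYMMDDhhmmss, leading zeros kept
    parts = [project, instrument, location, stamp, extra]
    # Underscores separate fields, so replace them (and spaces) inside a field.
    cleaned = [p.replace("_", "-").replace(" ", "-") for p in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")

# Hypothetical example values.
name = build_filename("SPARC", "scope1", "OHSU",
                      datetime(2013, 5, 3, 9, 7, 0), "run2", "csv")
print(name)  # SPARC_scope1_OHSU_20130503090700_run2.csv
```

Because the timestamp is zero-padded and ordered from year down to second, files that share the same leading fields sort chronologically when sorted by name.
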
  22. 22. Directory Structure: Sticking with a directory structure can be hard. Files: SPARC presentation • CTSAconnect presentation • Monarch presentation. Folders: Presentations, containing SPARC, CTSAconnect, Monarch
  23. 23. Versioning: DataManagement_SPARC_050313_final_NV • Save a copy of every version of a data file • Follow a file naming convention • Version control software – Dropbox – Google Docs – Git – SmartSVN
  24. 24.
  25. 25. Google docs
  26. 26. Remember to back up your data! • Recommended to keep three copies! – 1 on your local workstation – 1 local/remote, such as an external hard drive – 1 remote, such as on a cloud server* (*Depending on the type of data, as cloud servers are not always secure)
  27. 27. Organizing your IRB application. Created by Heather Schiffke. See:
  28. 28. File renaming applications • Bulk Rename Utility (Windows) • Renamer (Mac) • PSRenamer
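
As a rough sketch of the three-copy recommendation, the Python below copies a file into two further locations and checks each copy against the original with a SHA-256 checksum. The folder names `external_drive` and `cloud_sync` are placeholders inside a temporary directory, and the helper names are hypothetical. Whether a cloud-synced folder is appropriate depends on the type of data, as the slide cautions.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256(path):
    """Return the SHA-256 hex digest of a file."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def back_up(source, destinations):
    """Copy source into each destination folder and verify every copy."""
    source = Path(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256(copy) != sha256(source):
            raise IOError(f"Backup to {copy} does not match the original")
        copies.append(copy)
    return copies

# Demo in a temporary folder; the folder names are placeholders.
root = Path(tempfile.mkdtemp())
original = root / "workstation" / "data.csv"
original.parent.mkdir()
original.write_text("sample,value\nA,1\n")
copies = back_up(original, [root / "external_drive", root / "cloud_sync"])
print(len(copies) + 1, "copies in total")
```

Verifying the checksum after copying catches silent corruption, which a plain copy command would not report.
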
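
For readers who prefer a script to one of the applications listed, bulk renaming can be sketched in Python as below. This is an illustration under stated assumptions, not a replacement for those tools: the functions `plan_renames` and `apply_renames`, the `SPARC` prefix, and the sample file names are all hypothetical. The plan is built first, so it can be inspected before any file is touched.

```python
import tempfile
from pathlib import Path

def plan_renames(folder, prefix):
    """Return (old, new) path pairs without touching any file (a dry run)."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if path.is_file():
            stem = path.stem.replace(" ", "-")
            new_name = f"{prefix}_{stem}{path.suffix.lower()}"
            plan.append((path, path.with_name(new_name)))
    return plan

def apply_renames(plan):
    """Rename each file, refusing to overwrite an existing one."""
    for old, new in plan:
        if old == new:
            continue  # already follows the convention
        if new.exists():
            raise FileExistsError(new)
        old.rename(new)

# Demo on empty files in a temporary folder.
folder = Path(tempfile.mkdtemp())
for sample in ["final talk.PPTX", "raw data.CSV"]:
    (folder / sample).touch()
plan = plan_renames(folder, "SPARC")
apply_renames(plan)
new_names = sorted(p.name for p in folder.iterdir())
print(new_names)
```
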
  29. 29. Metadata
  30. 30. What is Metadata? • Title • Author • Call number • Publisher • ISBN
  31. 31. • File name • File type • Who created the data • Title • Date created
  32. 32. Using structured phenotype data to identify the genetic basis of disease. Human disease: HADZISELIMOVIC SYNDROME. Most similar mouse model: b2b1035Clo (aka Blue Meanie). Human phenotypes: Ventricular hypertrophy (HP:0001714); High-arched palate (HP:0000156); Failure to thrive (HP:0001508); Pulmonary artery atresia (HP:0004935); Renal hypoplasia (HP:0000089). Mouse phenotypes: tricuspid valve atresia (MP:0006123); prenatal growth retardation (MP:0010865); persistent truncus arteriosis (MP:0002633); cleft palate (MP:0000111); duplex kidney (MP:0004017). Phenotypes in common (UBEROpheno): abnormal kidney morphology; abnormal palate morphology; growth deficiency; malformation of the heart and great vessels; abnormal heart and great artery attachment
  33. 33. Metadata standards: Controlled vocabularies and ontologies
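
One way to record the five fields on this slide is a small "sidecar" file stored next to the data. The sketch below is only an illustration: the JSON format, the key names, the `.metadata.json` suffix, and the sample values are assumptions, not a standard the presentation prescribes.

```python
import json
import tempfile
from datetime import date
from pathlib import Path

def describe(path, title, creator, created):
    """Build a metadata record with the five fields from the slide."""
    path = Path(path)
    return {
        "file_name": path.name,
        "file_type": path.suffix.lstrip(".").lower(),
        "creator": creator,
        "title": title,
        "date_created": created.isoformat(),
    }

def write_sidecar(path, record):
    """Save the record next to the data file as <name>.metadata.json."""
    sidecar = Path(str(path) + ".metadata.json")
    sidecar.write_text(json.dumps(record, indent=2))
    return sidecar

# Placeholder file and values, for illustration only.
data_file = Path(tempfile.mkdtemp()) / "example_survey.csv"
data_file.write_text("id,answer\n1,b\n")
record = describe(data_file, "Example survey", "Example Author", date(2013, 1, 1))
sidecar = write_sidecar(data_file, record)
print(sidecar.read_text())
```

Keeping the description in a machine-readable file means it travels with the data and can be searched or validated later.
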
  34. 34. Controlled vocabularies
  35. 35. MeSH
  36. 36. MeSH: acetaminophen
  37. 37. What is an Ontology? 1. Hierarchical terms are defined textually and logically 2. Relationships between the terms are defined 3. Expressed in a language that can be reasoned across by computers 4. Data can be reused and can be easily linked together
  38. 38. Commonly Used Ontologies • Gene Ontology • Linnaean Taxonomy • SNOMED
  39. 39. Why are CVs and Ontologies useful? • Can be used to structure your metadata • Are often used to structure information in databases
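
The four points on this slide can be illustrated with a toy hierarchy in Python. This is a deliberately simplified sketch: real ontologies are expressed in dedicated languages and processed by reasoners, whereas here a plain dictionary holds hypothetical "is_a" links and a short function follows them transitively.

```python
# Toy "is_a" hierarchy; the terms are illustrative, not from a real ontology.
IS_A = {
    "poodle": ["dog"],
    "dog": ["mammal"],
    "cat": ["mammal"],
    "mammal": ["animal"],
}

def ancestors(term):
    """Follow is_a links transitively to collect every broader term."""
    found = set()
    stack = [term]
    while stack:
        for parent in IS_A.get(stack.pop(), []):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found

def is_a(term, broader):
    """True if term is the same as, or falls under, the broader term."""
    return broader == term or broader in ancestors(term)

def common_ancestors(a, b):
    """Terms that both a and b fall under, which lets records be linked."""
    return (ancestors(a) | {a}) & (ancestors(b) | {b})

print(is_a("poodle", "animal"))                  # True
print(sorted(common_ancestors("poodle", "cat")))  # ['animal', 'mammal']
```

Nothing in the data says directly that a poodle is an animal; the program infers it from the defined relationships, which is the sense in which such data can be "reasoned across by computers".
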
  40. 40. Structured data helps with searching. Craigslist search: Chaise. Craigslist search: Fainting couch. Craigslist matches on strings only
  41. 41. Structured data helps with searching. PubMed indexes articles with MeSH Terms
  42. 42. In Summary: Structured Metadata = good. How can I create structured metadata?
  43. 43. Database(s) and Tools… (to make your life easier)
  44. 44. Data Management tools and repositories • Purpose: Software where you can organize, store and/or share data • Often contain metadata to assist with data entry and create structured data
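
The difference between matching on strings and searching with a controlled vocabulary can be sketched as follows. The listings and the synonym group are invented for illustration, and treating "fainting couch" as a synonym of "chaise" is an assumption made to mirror the slide's example.

```python
LISTINGS = ["Red velvet chaise", "Antique fainting couch", "Oak dining table"]

# Hypothetical controlled vocabulary: a preferred term and its synonyms.
VOCABULARY = {"chaise longue": ["chaise", "fainting couch", "chaise lounge"]}

def string_search(query, listings):
    """Return listings that literally contain the query string."""
    q = query.lower()
    return [item for item in listings if q in item.lower()]

def vocabulary_search(query, listings):
    """Expand the query to its whole synonym group before matching."""
    q = query.lower()
    terms = {q}
    for preferred, synonyms in VOCABULARY.items():
        group = {preferred, *synonyms}
        if q in group:
            terms |= group
    return [item for item in listings
            if any(term in item.lower() for term in terms)]

print(string_search("chaise", LISTINGS))      # ['Red velvet chaise']
print(vocabulary_search("chaise", LISTINGS))  # ['Red velvet chaise', 'Antique fainting couch']
```

The string search misses the fainting couch because the words differ, while the vocabulary search finds both listings because they map to the same concept.
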
  45. 45. Tools for data management
  46. 46. Data Sharing Repositories
  47. 47. Repositories use Unique IDs • Digital Object Identifier (DOI) • Example: DOIs for publications – doi: 10.1371/journal.pbio.1001339 • Uniform Resource Identifier (URI) • A URI will resolve to a single location on the web • URIs for people
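
A DOI becomes a resolvable web address when appended to the `https://doi.org/` resolver. The sketch below turns the slide's example into such a URL. The function name `doi_to_url` is hypothetical, and the regular expression is only a loose sanity check on the DOI's shape, not a full validation. No network request is made.

```python
import re

# Loose shape check: "10.", a registrant code, a slash, then a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")

def doi_to_url(text):
    """Strip a leading 'doi:' label and return the doi.org resolver URL."""
    doi = re.sub(r"^\s*doi:\s*", "", text, flags=re.IGNORECASE).strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"Not a DOI: {text!r}")
    return "https://doi.org/" + doi

print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```
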
  48. 48. People Disambiguation
  49. 49. Example: • John L Campbell, Research Ecologist, Oregon State University, Corvallis OR • John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
  50. 50. Tools for personal data management • Google Drive • Dropbox • Evernote • TaskPaper • Diigo (bookmarking websites) • Mendeley, EndNote, Zotero (citation managers) • SoundGecko
  51. 51. URLs to resources. Go to:
  52. 52. Data Sharing and Management Snafu in 3 short acts