SPARC 2013 Data Management Presentation
© All Rights Reserved

  • NICOLE, MELISSA, JACKIE: When I was a graduate student, data looked like this. Jackie, Melissa, and Nicole each show an example. What does data mean to you?
  • NICOLE: Ask them to brainstorm some examples of each of these.
    – Clinical data: data that is captured in the clinic, e.g., vitals, chief complaints, diagnoses
    – Experimental data: output from assays, such as numbers in a spreadsheet, images, recordings in a lab notebook, FACS plots from a flow cytometer
    – School-related data: syllabus, coursework/assignments, tracking student information, etc.
    – Personal data: personal files on your computer, your Word files, your Google Docs, your music stored on your computer, your Facebook profile
    – Social data: Facebook, LinkedIn, Instagram
  • NICOLE: Small scale to big scale.
    – Personal
    – Efficiency: big data and how airline companies have figured out how to make airline departures more efficient
    – Airplane departures and arrivals
    – Healthcare can be more efficient
    – Traffic patterns, etc.
    – Can leverage data that we have to be more efficient and effective
  • NICOLE: Find password. Find file on your computer.
  • MELISSA: Impact story: scholarly communications come in many forms, not just publications.
  • MELISSA: Improved airline ETAs.
    – Pilots used to provide the ETA at the airport
    – A company started collecting data about arrival times, and can now better calculate the time of arrival, up to 10 mins closer to the actual time
    – Uses a combination of data including weather patterns, flight schedules, previous flight history, and arrivals under certain conditions, etc.
  • JACKIE: Show examples of versions.
    – Can go back when you make mistakes when changes are made
    – Share work with other people
    – Both work on things at the same time and merge back together
    – Akin to a game of telephone: version control can let you see exactly when a change was made
  • NICOLE: NOTE: Need to post this on our lib guide.
  • NICOLE: Software that can rename your files, if you already have them named.
  • NICOLE: Have them look at this data and try to come up with more metadata.
  • Additional metadata on the patient:
    – Data on the file
    – Data on the column
    – Data on the row
    – Data in each cell
    – Patient 1 has an ID? Where is the ID and where is it stored?
  • NICOLE: Helfgott would like to pull data from Epic to do secondary analysis on patients.
    – Can track outcomes, such as whether patients have decreased pain over time after visiting the NCNM clinic when treated with certain interventions
    – Can come up with hypotheses and do analysis on patient data
  • NICOLE: Epic is commonly used in the clinic and contains structured fields for collecting data about patients. The issue is with data entry.
    – Garbage in/garbage out
    – Students (and faculty) are not consistently trained on how to enter data into Epic
    – Data entry and collection is not done consistently within the clinic
    – For example, some practitioners enter BP into the BP field; others add it in progress notes
    – Using structured metadata allows more consistent data collection and reporting
    – Enables researchers to do secondary analysis on the data
    – Can pull all the BP data from the BP field if it’s there
    – If it’s in the notes or comments, it’s difficult to grab and analyze this data
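The contrast between a structured BP field and free-text notes can be sketched in a few lines of Python. This is only an illustration: the record layout, field names, and sample notes below are invented for the example and do not reflect Epic's actual schema or any real patient data.

```python
import re

# Hypothetical visit records; the field names are assumptions, not Epic's schema.
visits = [
    {"patient_id": "P1", "bp_systolic": 120, "bp_diastolic": 80, "notes": ""},
    {"patient_id": "P2", "bp_systolic": None, "bp_diastolic": None,
     "notes": "Pt reports less pain. BP 135/85 taken seated."},
    {"patient_id": "P3", "bp_systolic": None, "bp_diastolic": None,
     "notes": "blood pressure was about one-thirty over ninety"},
]

# Only catches notes written exactly as "BP <number>/<number>".
BP_PATTERN = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})\b")

def extract_bp(visit):
    """Return (systolic, diastolic, source), or None if no reading is found."""
    if visit["bp_systolic"] is not None and visit["bp_diastolic"] is not None:
        return visit["bp_systolic"], visit["bp_diastolic"], "structured field"
    match = BP_PATTERN.search(visit["notes"])
    if match:
        return int(match.group(1)), int(match.group(2)), "free-text notes"
    return None

for visit in visits:
    print(visit["patient_id"], extract_bp(visit))
```

The structured record is read directly, the second record is recovered only because the note happens to follow a guessable pattern, and the third reading is lost entirely, which is the "garbage in/garbage out" problem in miniature.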
  • NICOLE: BioSharing and ISA-Tab tools.
  • MELISSA: Goal is to solve the author/contributor name ambiguity problem in scholarly communications.
    – Creating a central registry of unique identifiers for individual researchers
    – Identifiers, and the relationships among them, can be linked to the researcher
  • MELISSA: PC tool: OneNote.

Presentation Transcript

  • 1. Data management. Nicole Vasilevsky, NCNM, OHSU; Jackie Wirz, OHSU; Melissa Haendel, OHSU
  • 2. Outline
    • Introduction
    • Why do we need good data management?
    • Good data management
    • Databases and tools
    • Sharing your data
  • 3. Who are we?
    • Nicole Vasilevsky, PhD
      – Assistant Professor, Helfgott Research Institute, NCNM
      – Project Manager, Ontology Development Group, OHSU
    • Jackie Wirz, PhD
      – Assistant Professor, Bioinformation Specialist, OHSU library
    • Melissa Haendel, PhD
      – Assistant Professor, Department Head, Ontology Development Group, OHSU
  • 4. What does data mean to you?
  • 5. Do you have any training in data management?
  • 6. Do you know what metadata is?
    a. Philosophy
    b. describes data
    c. dating site
    d. data
  • 7. What is data?
    • Clinical data
    • Experimental data
    • School related data
    • Personal data
    • Social data
  • 8. So much data
  • 9. Why?
    • Personal organization
    • Credit where credit is due
    • Reproducibility of science and medicine
    • Accelerates scientific and clinical discovery
    • Efficiency
  • 10. Do you get frustrated with any of the following in your personal or professional life?
    a. Storing data
    b. Backing up data
    c. Analyzing/manipulating data
    d. Finding data produced by other researchers/clinicians
    e. Ensuring data are secure
    f. Making data accessible to other researchers
    g. Controlling access to data
    h. Tracking updates to data (i.e., versioning)
    i. Creating metadata (i.e., describing the data to be more useful at a later time or by others)
    j. Protecting intellectual property rights
    k. Ensuring appropriate professional credit/citation is given to datasets/generated
  • 11. Desktop?
  • 12. Which of the following do you do?
    a. Save copies of data on a disk, USB drive, tape, or computer hard drive
    b. Save copies of data on a local server
    c. Save copies of data on a central campus server
    d. Save copies of data on a web-based or cloud server
    e. Store data in a repository or archives
    f. Automatically backup files
    g. Manually generate backup
    h. Restrict access to files
  • 13. Credit where credit is due. The scholarly communication cycle:
    • Data collection & Analysis
    • Authoring
    • Storage, Archiving, & Preservation
    • Publication & Dissemination
  • 14. Reproducibility of science
    • Lack of information makes it difficult to reproduce experiments
    • Retraction rates are on the rise
    • Difficulty identifying resources in the published literature
    Cokol et al. EMBO reports (2008) 9, 2
    [Chart: axis from 0% to 100%; categories: Antibodies, Cell lines, Constructs, Knockdown reagents, Organisms]
  • 15. Sharing can be advantageous
  • 16. Why share your data?
    • Data sharing mandates
      – NIH public access policy
      – NIH/NSF data sharing plan for new applications
    • Further science and medicine
    • Build collaborations
    • Enable new discoveries with your data
    • Can be required at time of publication
  • 17. Efficiency
  • 18. How?
    • File naming and data storage
    • Metadata
    • Controlled vocabularies and Ontologies
    • Databases and Tools
    • Data accessibility
  • 19. File naming
  • 20. Informative file names: Will I remember what this file is in a month from now?
  • 21. Naming conventions: Project_instrument_location_YYYYMMDDhhmmss_extra.ext
    Slide annotations: Index/grant; conditions; Leading zero!; s/n, variable; Retain order
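A naming convention like the one on this slide is easier to follow when a small helper builds the names. The sketch below assumes Python and uses made-up values for the project, instrument, location, and extra fields; only the overall pattern comes from the slide.

```python
from datetime import datetime

def build_filename(project, instrument, location, when, extra, ext):
    """Build Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    # strftime zero-pads every field, so names sort chronologically
    # ("leading zero" and "retain order" on the slide).
    timestamp = when.strftime("%Y%m%d%H%M%S")
    parts = [project, instrument, location, timestamp, extra]
    # Spaces become hyphens so that "_" stays reserved as the field separator.
    cleaned = [str(part).strip().replace(" ", "-") for part in parts]
    return "_".join(cleaned) + "." + ext.lstrip(".")

# Hypothetical example values.
name = build_filename("SPARC", "flow", "OHSU",
                      datetime(2013, 5, 3, 9, 7, 5), "run01", "csv")
print(name)  # SPARC_flow_OHSU_20130503090705_run01.csv
```

Because the timestamp runs from the largest unit (year) to the smallest (second), an ordinary alphabetical sort of the file names is also a chronological sort.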
  • 22. Directory Structure: Sticking with a directory structure can be hard
    • Files: SPARC presentation, CTSAconnect presentation, Monarch presentation
    • Presentations: SPARC, CTSAconnect, Monarch
  • 23. Versioning: DataManagement_SPARC_050313_final_NV
    • Save a copy of every version of a data file
    • Follow a file naming convention
    • Version control software
      – Dropbox
      – Google docs
      – Git
      – SmartSVN
  • 24.
  • 25. Google docs
  • 26. Remember to back up your data!
    • Recommended to back up three copies!
      – 1 on your local workstation
      – 1 local/removable, such as external hard drive
      – 1 remote, such as on a cloud server*
    *Depending on the type of data, as cloud servers are not always secure
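The three-copy recommendation can be automated with a short script. The following is a minimal sketch, assuming the external drive and the remote storage are both reachable as ordinary folders; the folder names are placeholders, and the demo runs entirely inside a temporary directory.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path

def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()

def back_up(source, destinations):
    """Copy source into each destination folder and verify each copy."""
    source = Path(source)
    expected = sha256_of(source)
    copies = []
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        target = folder / source.name
        shutil.copy2(source, target)  # copy2 also keeps timestamps
        if sha256_of(target) != expected:
            raise OSError(f"Checksum mismatch for {target}")
        copies.append(target)
    return copies

# Demo with placeholder folders standing in for the three locations.
with tempfile.TemporaryDirectory() as root:
    original = Path(root) / "workstation" / "results.csv"
    original.parent.mkdir()
    original.write_text("sample,value\nA,1\n")
    copies = back_up(original, [Path(root) / "external_drive",
                                Path(root) / "remote_mount"])
    verified = all(sha256_of(copy) == sha256_of(original) for copy in copies)

print(len(copies) + 1, "copies in total; all verified:", verified)
```

Comparing checksums after each copy is a design choice rather than a requirement from the slide: it catches a truncated or corrupted backup at the moment it is made, instead of when the backup is needed.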
  • 27. Organizing your IRB application. Created by Heather Schiffke. See:
  • 28. File renaming applications
    • Bulk Rename Utility (Windows)
    • Renamer (Mac)
    • PSRenamer
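For readers who would rather script a bulk rename than install one of these applications, the idea can be sketched in Python. The rule used here (replace spaces with underscores) and the sample file names are assumptions chosen for illustration; the listed applications offer far more options.

```python
import tempfile
from pathlib import Path

def plan_renames(folder):
    """Pair each file with a new name that has spaces replaced by underscores."""
    plan = []
    for path in sorted(Path(folder).iterdir()):
        if not path.is_file():
            continue
        new_name = path.name.replace(" ", "_")
        if new_name != path.name:
            plan.append((path, path.with_name(new_name)))
    return plan

def apply_renames(plan):
    """Carry out the planned renames, refusing to overwrite existing files."""
    for old, new in plan:
        if new.exists():
            raise FileExistsError(f"{new} already exists; refusing to overwrite")
        old.rename(new)

# Demo in a temporary folder with made-up file names.
with tempfile.TemporaryDirectory() as folder:
    for name in ["SPARC presentation.pptx", "Monarch presentation.pptx", "notes.txt"]:
        (Path(folder) / name).touch()
    plan = plan_renames(folder)
    apply_renames(plan)
    renamed = sorted(p.name for p in Path(folder).iterdir())

print(renamed)
```

Separating the plan from its execution lets you print and review the list of old and new names before any file is touched.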
  • 29. Metadata
  • 30. What is Metadata? Title; Author; Call number; Publisher; ISBN
  • 31. File name; File type; Who created the data; Title; Date created
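The file-level metadata listed on this slide can be captured in a small record. This is a sketch under stated assumptions: the creator and title must be supplied by a person, the sample file is invented, and the last-modified time stands in for "date created" because a true creation time is not available on every operating system.

```python
import tempfile
from datetime import datetime, timezone
from pathlib import Path

def describe_file(path, creator, title):
    """Collect basic descriptive metadata about one file."""
    path = Path(path)
    info = path.stat()
    modified = datetime.fromtimestamp(info.st_mtime, tz=timezone.utc)
    return {
        "file_name": path.name,
        "file_type": path.suffix.lstrip(".") or "unknown",
        "creator": creator,   # supplied by hand; not stored in the file itself
        "title": title,       # supplied by hand
        "last_modified": modified.isoformat(),  # stand-in for "date created"
    }

# Demo with a made-up file and placeholder creator and title.
with tempfile.TemporaryDirectory() as folder:
    sample = Path(folder) / "vitals.csv"
    sample.write_text("patient_id,bp\n")
    record = describe_file(sample, creator="Example Researcher",
                           title="Clinic vitals export")

print(record)
```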
  • 32. Using structured phenotype data to identify genetic basis of disease
    • Human disease: HADZISELIMOVIC SYNDROME
      – Ventricular hypertrophy (HP:0001714)
      – High-arched palate (HP:0000156)
      – Failure to thrive (HP:0001508)
      – Pulmonary artery atresia (HP:0004935)
      – Renal hypoplasia (HP:0000089)
    • Most similar mouse model: b2b1035Clo (aka Blue Meanie)
      – tricuspid valve atresia (MP:0006123)
      – prenatal growth retardation (MP:0010865)
      – persistent truncus arteriosis (MP:0002633)
      – cleft palate (MP:0000111)
      – duplex kidney (MP:0004017)
    • Phenotypes in common (UBEROpheno)
      – abnormal kidney morphology
      – abnormal palate morphology
      – growth deficiency
      – Malformation of the heart and great vessels
      – abnormal heart and great artery attachment
  • 33. Metadata standards: Controlled vocabularies and ontologies
  • 34. Controlled vocabularies
  • 35. MeSH
  • 36. MeSH: acetaminophen
  • 37. What is an Ontology?
    1. Hierarchical terms are defined textually and logically
    2. Relationships between the terms are defined
    3. Expressed in a language that can be reasoned across by computers
    4. Data can be reused and can be easily linked together
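A toy example may help show what "reasoned across by computers" means. In the Python sketch below, the term labels are borrowed from the phenotype slide, but the is_a hierarchy itself (including the "abnormal morphology" root) is invented for illustration and is not taken from any real ontology.

```python
# Hypothetical is_a relationships: child term -> list of parent terms.
IS_A = {
    "cleft palate": ["abnormal palate morphology"],
    "high-arched palate": ["abnormal palate morphology"],
    "duplex kidney": ["abnormal kidney morphology"],
    "renal hypoplasia": ["abnormal kidney morphology"],
    "abnormal palate morphology": ["abnormal morphology"],
    "abnormal kidney morphology": ["abnormal morphology"],
}

def ancestors(term):
    """Return every term reachable by following is_a links upward."""
    found = set()
    to_visit = list(IS_A.get(term, []))
    while to_visit:
        parent = to_visit.pop()
        if parent not in found:
            found.add(parent)
            to_visit.extend(IS_A.get(parent, []))
    return found

def common_ancestors(term_a, term_b):
    """Terms that both inputs are a kind of, according to the hierarchy."""
    return ancestors(term_a) & ancestors(term_b)

print(common_ancestors("cleft palate", "high-arched palate"))
print(common_ancestors("cleft palate", "duplex kidney"))
```

Two differently named findings can be recognized as related because they share a parent term. This is the general idea behind comparing phenotypes across species, although real tools work with much larger ontologies and formal logic.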
  • 38. Commonly Used Ontologies
    • Gene Ontology
    • Linnaean Taxonomy
    • SNOMED
  • 39. Why are CVs and Ontologies useful?
    • Can be used to structure your metadata
    • Are often used to structure information in databases
  • 40. Structured data helps with searching
    • Craigslist search: Chaise
    • Craigslist search: Fainting couch
    • Craigslist matches on strings only
  • 41. Structured data helps with searching: PubMed indexes articles with MeSH Terms
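The difference between matching on strings and searching with a controlled vocabulary can be sketched as follows. The listings, the category labels, and the synonym table are all invented for this example; treating "chaise" and "fainting couch" as the same concept is an assumption made to mirror the Craigslist slide.

```python
# Made-up listings, each tagged with a term from a hypothetical vocabulary.
listings = [
    {"title": "Antique chaise, good condition", "category": "chaise longue"},
    {"title": "Victorian fainting couch", "category": "chaise longue"},
    {"title": "Office chair", "category": "chair"},
]

# Hypothetical controlled vocabulary: search phrase -> preferred term.
SYNONYMS = {
    "chaise": "chaise longue",
    "fainting couch": "chaise longue",
    "chaise longue": "chaise longue",
}

def string_search(query):
    """Match on the literal text of the title only."""
    return [item["title"] for item in listings
            if query.lower() in item["title"].lower()]

def vocabulary_search(query):
    """Map the query to a preferred term, then match on the category tag."""
    concept = SYNONYMS.get(query.lower())
    if concept is None:
        return []
    return [item["title"] for item in listings if item["category"] == concept]

print(string_search("chaise"))          # finds one listing
print(vocabulary_search("chaise"))      # finds both related listings
```

Indexing articles with MeSH terms works on the same principle: the search runs against the assigned term rather than against whatever wording each author happened to use.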
  • 42. In Summary: Structured Metadata = good. How can I create structured metadata?
  • 43. Database(s) and Tools… (to make your life easier)
  • 44. Data Management tools and repositories
    • Purpose: Software where you can organize, store and/or share data
    • Often contain metadata to assist with data entry and create structured data
  • 45. Tools for data management
  • 46. Data Sharing Repositories
  • 47. Repositories use Unique IDs
    • Digital Object Identifier (DOI)
    • Example: DOIs for publications
      – doi: 10.1371/journal.pbio.1001339
    • Uniform Resource Identifier (URI)
    • A URI will resolve to a single location on the web
    • URIs for people
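A DOI such as the one on this slide can be turned into a resolvable web address by prefixing it with the public resolver at https://doi.org/. The helper below is a small sketch; the function name and its loose validation rule are assumptions made for the example.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"

def doi_to_url(doi):
    """Turn a DOI string, with or without a 'doi:' prefix, into a resolver URL."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[4:].strip()
    # Loose sanity check only: DOIs start with "10." and contain a slash.
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"Not a DOI: {doi!r}")
    return DOI_RESOLVER + quote(cleaned, safe="/")

print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```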
  • 48. People Disambiguation
  • 49. Example:
    • John L Campbell, Research Ecologist, Oregon State University, Corvallis, OR
    • John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
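The name ambiguity in this example is easy to demonstrate in code. In the sketch below, the two records come from the slide, while the identifiers "person-0001" and "person-0002" are placeholders invented for the example and are not real registry IDs.

```python
# Two researchers from the slide; the "id" values are hypothetical.
researchers = [
    {"id": "person-0001", "name": "John L Campbell",
     "affiliation": "Oregon State University, Corvallis, OR"},
    {"id": "person-0002", "name": "John L Campbell",
     "affiliation": "Center for Research on Ecosystem Change, Durham, NC"},
]

# Grouping by name: both people collapse under a single key.
by_name = {}
for person in researchers:
    by_name.setdefault(person["name"], []).append(person)

# Indexing by unique identifier: each person stays distinct.
by_id = {person["id"]: person for person in researchers}

ambiguous = [name for name, matches in by_name.items() if len(matches) > 1]
print("Ambiguous names:", ambiguous)
print("Distinct people by ID:", len(by_id))
```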
  • 50. Tools for personal data management
    • Google drive
    • Dropbox
    • Evernote
    • Task Paper
    • Diigo: bookmarking websites
    • Mendeley, EndNote, Zotero: citation manager
    • Sound Gecko
  • 51. URLs to resources. Go to:
  • 52. Data Sharing and Management Snafu in 3 short acts