SPARC 2013 Data Management Presentation

Published in: Technology, Education
  • NICOLE
  • NICOLE
  • NICOLE
  • NICOLE
  • NICOLE, MELISSA, JACKIE: When I was a graduate student, data looked like this. Jackie, Melissa, and Nicole each show an example. What does data mean to you?
  • NICOLE
  • NICOLE
  • NICOLE: Ask them to brainstorm some examples of each of these:
    – Clinical data: data captured in the clinic, e.g., vitals, chief complaints, diagnoses
    – Experimental data: output from assays, such as numbers in a spreadsheet, images, recordings in a lab notebook, FACS plots from a flow cytometer
    – School-related data: syllabi, coursework/assignments, tracking student information, etc.
    – Personal data: personal files on your computer, your Word files, your Google Docs, your music stored on your computer, your Facebook profile
    – Social data: Facebook, LinkedIn, Instagram
  • NICOLE
  • NICOLE: From small scale to big scale.
    – Personal
    – Efficiency: big data, and how airline companies have figured out how to make departures more efficient (airplane departures and arrivals); healthcare can be more efficient; traffic patterns; etc.
    – We can leverage the data we have to be more efficient and effective.
  • NICOLE
  • NICOLE: Finding a password. Finding a file on your computer.
  • NICOLE
  • MELISSA: Impact story: scholarly communications come in many forms, not just publications.
  • MELISSA
  • MELISSA
  • MELISSA
  • MELISSA: Improved airline ETAs. Pilots used to provide the ETA at the airport. A company started collecting data about arrival times and can now calculate the time of arrival more accurately, up to 10 minutes closer to the actual time. It uses a combination of data, including weather patterns, flight schedules, previous flight history, and arrivals under certain conditions.
  • JACKIE
  • JACKIE
  • JACKIE
  • JACKIE
  • JACKIE
  • JACKIE: Show examples of versions. Version control lets you:
    – Go back when you make mistakes after changes are made
    – Share work with other people
    – Both work on things at the same time and merge back together
    It is akin to the game of telephone: version control lets you see exactly when a change was made.
  • JACKIE
  • JACKIE
  • JACKIE
  • NICOLE: NOTE: Need to post this on our LibGuide.
  • NICOLE: Software that can rename files you have already named.
  • NICOLE
  • NICOLE
  • NICOLE
  • NICOLE: Have them look at this data and try to come up with more metadata.
  • Additional metadata on the patient; data on the file; data on the column; data on the row; data in each cell. Patient 1 has an ID? Where is the ID, and where is it stored?
  • NICOLE: Helfgott would like to pull data from Epic to do secondary analysis on patients. We can track outcomes, such as: do patients have decreased pain over time after visiting the NCNM clinic when treated with certain interventions? We can come up with hypotheses and analyze patient data.
  • NICOLE: Epic is commonly used in the clinic and contains structured fields for collecting data about patients.
    – The issue is with data entry: garbage in, garbage out.
    – Students (and faculty) are not consistently trained on how to enter data into Epic.
    – Data entry and collection are not done consistently within the clinic. For example, some practitioners enter BP into the BP field; others add it in progress notes.
    – Using structured metadata allows more consistent data collection and reporting, and enables researchers to do secondary analysis on the data.
    – We can pull all the BP data from the BP field if it’s there; if it’s in the notes or comments, it’s difficult to extract and analyze.
  • MELISSA
  • MELISSA
  • MELISSA
  • JACKIE
  • JACKIE
  • MELISSA
  • MELISSA
  • MELISSA
  • NICOLE
  • NICOLE
  • NICOLE: BioSharing and ISA-Tab tools
  • NICOLE
  • NICOLE
  • NICOLE: FigShare, Dryad, Data.gov
  • NICOLE
  • NICOLE
  • NICOLE
  • MELISSA: The goal is to solve the author/contributor name ambiguity problem in scholarly communications by creating a central registry of unique identifiers for individual researchers. Identifiers, and the relationships among them, can be linked to the researcher.
  • MELISSA
  • MELISSA: PC tool: OneNote
  • MELISSA

    1. Data management
       Nicole Vasilevsky, NCNM, OHSU
       Jackie Wirz, OHSU
       Melissa Haendel, OHSU
    2. Outline
       • Introduction
       • Why do we need good data management?
       • Good data management
       • Databases and tools
       • Sharing your data
    3. Who are we?
       • Nicole Vasilevsky, PhD
         – Assistant Professor, Helfgott Research Institute, NCNM
         – Project Manager, Ontology Development Group, OHSU
       • Jackie Wirz, PhD
         – Assistant Professor, Bioinformation Specialist, OHSU Library
       • Melissa Haendel, PhD
         – Assistant Professor, Department Head, Ontology Development Group, OHSU
    4. What does data mean to you?
    5. Do you have any training in data management?
    6. Do you know what metadata is?
       a. Philosophy
       b. Describes data
       c. Dating site
       d. Data
    7. What is data?
       • Clinical data
       • Experimental data
       • School-related data
       • Personal data
       • Social data
    8. So much data
    9. Why?
       • Personal organization
       • Credit where credit is due
       • Reproducibility of science and medicine
       • Accelerates scientific and clinical discovery
       • Efficiency
    10. Do you get frustrated with any of the following in your personal or professional life?
       a. Storing data
       b. Backing up data
       c. Analyzing/manipulating data
       d. Finding data produced by other researchers/clinicians
       e. Ensuring data are secure
       f. Making data accessible to other researchers
       g. Controlling access to data
       h. Tracking updates to data (i.e., versioning)
       i. Creating metadata (i.e., describing the data so it is more useful at a later time or to others)
       j. Protecting intellectual property rights
       k. Ensuring appropriate professional credit/citation is given to datasets generated
    11. Messy desktop?
       (Image: http://davidmichaelangelosilva.wordpress.com/2012/01/29/organize-your-messy-desktop-with-fences/)
    12. Which of the following do you do?
       a. Save copies of data on a disk, USB drive, tape, or computer hard drive
       b. Save copies of data on a local server
       c. Save copies of data on a central campus server
       d. Save copies of data on a web-based or cloud server
       e. Store data in a repository or archives
       f. Automatically back up files
       g. Manually generate backups
       h. Restrict access to files
    13. Credit where credit is due
       [Figure: The scholarly communication cycle: Data collection & Analysis; Authoring; Storage, Archiving, & Preservation; Publication & Dissemination]
    14. Reproducibility of science
       • Lack of information makes it difficult to reproduce experiments
       • Retraction rates are on the rise
       • Difficulty identifying resources in the published literature
       Cokol et al. EMBO Reports (2008) 9, 2
       [Chart: resource types: Antibodies, Cell lines, Constructs, Knockdown reagents, Organisms; axis 0% to 100%]
    15. Sharing can be advantageous
       (Image: http://www.flickr.com/photos/eltonl/107582334/sizes/l/in/photostream/)
    16. Why share your data?
       • Data sharing mandates
         – NIH public access policy
         – NIH/NSF data sharing plan for new applications
       • Further science and medicine
       • Build collaborations
       • Enable new discoveries with your data
       • Can be required at time of publication
    17. Efficiency
       http://hbr.org/2012/10/big-data-the-management-revolution
       (Image: https://upload.wikimedia.org/wikipedia/commons/b/ba/HMS_Surprise_at_sunset_with_airplane.jpg)
    18. How?
       • File naming and data storage
       • Metadata
       • Controlled vocabularies and ontologies
       • Databases and tools
       • Data accessibility
    19. File naming
    20. Informative file names
       Will I remember what this file is a month from now?
    21. Naming conventions
       Project_instrument_location_YYYYMMDDhhmmss_extra.ext
       Annotations: index/grant; conditions; leading zero!; s/n, variable; retain order
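The naming convention above is easiest to follow when a script assembles the name instead of typing it by hand. A minimal Python sketch, assuming every field is a plain string; the helper `build_filename` and the example values (`flowcytometer`, `lab2`, `sample01`) are hypothetical, not part of the slide.

```python
from datetime import datetime


def build_filename(project: str, instrument: str, location: str,
                   timestamp: datetime, extra: str, ext: str) -> str:
    """Build Project_instrument_location_YYYYMMDDhhmmss_extra.ext (hypothetical helper)."""
    # Year first, every field zero-padded: names then sort in chronological order
    stamp = timestamp.strftime("%Y%m%d%H%M%S")
    fields = [project, instrument, location, stamp, extra]
    # Underscores separate fields, so spaces inside a field become hyphens
    cleaned = [field.strip().replace(" ", "-") for field in fields]
    return "_".join(cleaned) + "." + ext.lstrip(".")


if __name__ == "__main__":
    print(build_filename("SPARC", "flowcytometer", "lab2",
                         datetime(2013, 5, 3, 9, 5, 0), "sample01", "csv"))
```

Because the timestamp is year-first with leading zeros, sorting the names alphabetically also sorts them by date, which appears to be what the slide's "leading zero" and "retain order" notes point at.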
    22. Directory structure
       Sticking with a directory structure can be hard.
       Files: SPARC presentation, CTSAconnect presentation, Monarch presentation
       Presentations/
         SPARC/
         CTSAconnect/
         Monarch/
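Moving loose files into the folder layout from the slide can be scripted so the structure is applied the same way every time. A minimal sketch, assuming each file name starts with its project name; the helper `file_into_project_folders` is hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def file_into_project_folders(root: Path, category: str, projects: list) -> None:
    """Move files in `root` whose names start with a project name into root/category/project/."""
    for project in projects:
        target = root / category / project
        target.mkdir(parents=True, exist_ok=True)
        # list() snapshots the matches before any file is moved;
        # glob() here only looks at the top level of root
        for path in list(root.glob(f"{project}*")):
            if path.is_file():
                shutil.move(str(path), str(target / path.name))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ["SPARC presentation.pptx", "CTSAconnect presentation.pptx",
                     "Monarch presentation.pptx"]:
            (root / name).touch()
        file_into_project_folders(root, "Presentations", ["SPARC", "CTSAconnect", "Monarch"])
        for path in sorted(root.rglob("*")):
            print(path.relative_to(root))
```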
    23. Versioning
       Example: DataManagement_SPARC_050313_final_NV
       • Save a copy of every version of a data file
       • Follow a file naming convention
       • Version control software
         – Dropbox
         – Google Docs
         – Git
         – SmartSVN
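The first two bullets (save every version, follow a naming convention) can be combined in a short script. A minimal sketch, assuming a year-first timestamp like the convention on slide 21 instead of the example's 050313 style; `save_version` is a hypothetical helper, and unlike Git or SVN it only keeps copies, with no diffing or merging.

```python
import shutil
import tempfile
from datetime import datetime
from pathlib import Path
from typing import Optional


def save_version(path: Path, archive: Path, initials: str,
                 when: Optional[datetime] = None) -> Path:
    """Copy `path` into `archive`, adding a timestamp and the editor's initials to the name."""
    when = when or datetime.now()
    archive.mkdir(parents=True, exist_ok=True)
    # e.g. DataManagement_SPARC.pptx -> DataManagement_SPARC_20130503090000_NV.pptx
    versioned = archive / f"{path.stem}_{when:%Y%m%d%H%M%S}_{initials}{path.suffix}"
    shutil.copy2(path, versioned)  # copy2 also keeps the original modification time
    return versioned


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        doc = Path(tmp) / "DataManagement_SPARC.txt"
        doc.write_text("draft 1")
        print(save_version(doc, Path(tmp) / "versions", "NV").name)
```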
    24. Dropbox (www.dropbox.com)
    25. Google Docs
    26. Remember to back up your data!
       • It is recommended to keep three copies:
         – 1 on your local workstation
         – 1 local/removable, such as an external hard drive
         – 1 remote, such as on a cloud server*
       *Depending on the type of data, since cloud servers are not always secure
       http://libraries.mit.edu/guides/subjects/data-management/Managing%20Research%20Data%20101.pdf
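The three-copy rule only helps if the copies are intact. A minimal sketch that copies a file to two extra locations and checks each copy against a SHA-256 checksum; it assumes the external drive and a cloud-synced folder are mounted as ordinary local folders (the folder names are hypothetical), and `back_up` is a hypothetical helper.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, reading it in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source: Path, destinations: list) -> list:
    """Copy `source` into each destination folder and verify every copy by checksum."""
    expected = sha256(source)
    copies = []
    for folder in destinations:
        folder.mkdir(parents=True, exist_ok=True)
        copy = Path(shutil.copy2(source, folder / source.name))
        if sha256(copy) != expected:
            raise IOError(f"Checksum mismatch for {copy}")
        copies.append(copy)
    return copies


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data = Path(tmp) / "workstation" / "results.csv"
        data.parent.mkdir()
        data.write_text("id,value\n1,42\n")
        # Hypothetical mount points standing in for an external drive and a cloud-synced folder
        for copy in back_up(data, [Path(tmp) / "external_drive", Path(tmp) / "cloud_sync"]):
            print(copy)
```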
    27. Organizing your IRB application
       Created by Heather Schiffke
       See: http://libguides.ohsu.edu/data
    28. File renaming applications
       • Bulk Rename Utility (Windows)
       • Renamer (Mac)
       • PSRenamer
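For a single repetitive fix, a short script can do what these applications do. A minimal sketch of one common bulk rename, adding leading zeros so that sample2 sorts before sample10; it is not how any of the listed tools work internally, and `zero_pad_numbers` is a hypothetical helper.

```python
import re
import tempfile
from pathlib import Path


def zero_pad_numbers(folder: Path, width: int = 3) -> list:
    """Rename files so the first run of digits in each name is zero-padded to `width`."""
    renamed = []
    # sorted() builds the full list first, so renaming inside the loop is safe
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue
        match = re.search(r"\d+", path.stem)
        if match is None:
            continue
        new_stem = path.stem[:match.start()] + match.group().zfill(width) + path.stem[match.end():]
        new_path = path.with_name(new_stem + path.suffix)
        if new_path == path:
            continue
        if new_path.exists():
            raise FileExistsError(f"Refusing to overwrite {new_path}")
        path.rename(new_path)
        renamed.append((path.name, new_path.name))
    return renamed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        folder = Path(tmp)
        for name in ["sample1.csv", "sample2.csv", "sample10.csv"]:
            (folder / name).touch()
        for old, new in zero_pad_numbers(folder):
            print(old, "->", new)
```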
    29. Metadata
    30. What is metadata?
       • Title
       • Author
       • Call number
       • Publisher
       • ISBN
    31. File name, file type, who created the data, title, date created
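Some of these fields can be read from the file system, and some have to be supplied by a person. A minimal sketch, assuming the creator and title are entered by hand; `describe_file` and the dictionary keys are hypothetical. Because creation time is platform-specific in Python, the sketch records the last-modified time as a stand-in for "date created".

```python
import mimetypes
import tempfile
from datetime import datetime
from pathlib import Path


def describe_file(path: Path, creator: str, title: str) -> dict:
    """Gather the metadata fields from the slide for one data file."""
    info = path.stat()
    file_type, _encoding = mimetypes.guess_type(path.name)
    return {
        "file_name": path.name,
        # Guessed from the extension only; None when the extension is unknown
        "file_type": file_type,
        # The file system does not reliably record who created the data, so it is passed in
        "creator": creator,
        "title": title,
        # Last-modified time: a portable stand-in, since creation time is platform-specific
        "date_modified": datetime.fromtimestamp(info.st_mtime).isoformat(timespec="seconds"),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "clinic_vitals.txt"
        path.write_text("patient,bp\n")
        print(describe_file(path, creator="NV", title="Clinic vitals"))
```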
    32. Using structured phenotype data to identify the genetic basis of disease
       [Figure: Human disease: Hadziselimovic syndrome. Most similar mouse model: b2b1035Clo (aka Blue Meanie).
       Human phenotypes: Ventricular hypertrophy (HP:0001714), High-arched palate (HP:0000156), Failure to thrive (HP:0001508), Pulmonary artery atresia (HP:0004935), Renal hypoplasia (HP:0000089).
       Mouse phenotypes: tricuspid valve atresia (MP:0006123), prenatal growth retardation (MP:0010865), persistent truncus arteriosis (MP:0002633), cleft palate (MP:0000111), duplex kidney (MP:0004017).
       Phenotypes in common (UBEROpheno): abnormal kidney morphology, abnormal palate morphology, growth deficiency, malformation of the heart and great vessels, abnormal heart and great artery attachment.]
    33. Metadata standards: controlled vocabularies and ontologies
    34. Controlled vocabularies
    35. MeSH
    36. MeSH: acetaminophen
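A controlled vocabulary maps the many ways people write a term onto one preferred term. A minimal sketch of that lookup; the synonym lists are hand-picked illustrations, not an extract of MeSH, and `to_preferred_term` is a hypothetical helper.

```python
from typing import Optional

# Illustrative controlled vocabulary: preferred term -> entry terms (synonyms).
# These entries are hand-picked examples, not an extract of MeSH.
VOCABULARY = {
    "Acetaminophen": ["acetaminophen", "paracetamol", "tylenol"],
    "Ibuprofen": ["ibuprofen", "advil", "motrin"],
}

# Invert once into a lookup table keyed by lowercase entry term
ENTRY_TERMS = {
    entry: preferred
    for preferred, entries in VOCABULARY.items()
    for entry in entries
}


def to_preferred_term(text: str) -> Optional[str]:
    """Map free text to its preferred term, or None if it is not in the vocabulary."""
    return ENTRY_TERMS.get(text.strip().lower())


if __name__ == "__main__":
    for entry in ["Tylenol", " paracetamol ", "acetominophen"]:
        print(repr(entry), "->", to_preferred_term(entry))
```

A misspelling that is not listed as an entry term comes back as None, which flags it for review instead of silently creating a new, unmatched term.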
    37. What is an ontology?
       1. Hierarchical terms are defined textually and logically
       2. Relationships between the terms are defined
       3. Expressed in a language that can be reasoned across by computers
       4. Data can be reused and can be easily linked together
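Points 2 and 3 are what let a computer infer facts nobody typed in. A minimal sketch: with explicit is_a links, following them upward shows that two phenotypes share a parent. The term labels echo the phenotype slide, but this small hierarchy is hypothetical and not taken from HP or MP; real ontologies are expressed in languages such as OWL rather than Python dictionaries.

```python
from collections import deque

# Hypothetical mini-ontology: term -> direct parents via is_a.
# Labels echo the phenotype slide; the hierarchy itself is illustrative only.
IS_A = {
    "cleft palate": ["abnormal palate morphology"],
    "high-arched palate": ["abnormal palate morphology"],
    "abnormal palate morphology": ["abnormal craniofacial morphology"],
    "abnormal craniofacial morphology": ["phenotypic abnormality"],
    "duplex kidney": ["abnormal kidney morphology"],
    "renal hypoplasia": ["abnormal kidney morphology"],
    "abnormal kidney morphology": ["phenotypic abnormality"],
}


def ancestors(term: str) -> set:
    """Return every term reachable by following is_a links upward (transitive closure)."""
    found = set()
    queue = deque(IS_A.get(term, []))
    while queue:
        parent = queue.popleft()
        if parent not in found:
            found.add(parent)
            queue.extend(IS_A.get(parent, []))
    return found


def shared_ancestors(a: str, b: str) -> set:
    """Terms that both a and b are, directly or indirectly, a kind of."""
    return ancestors(a) & ancestors(b)


if __name__ == "__main__":
    print(sorted(shared_ancestors("cleft palate", "high-arched palate")))
```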
    38. Commonly used ontologies
       • Gene Ontology
       • Linnaean Taxonomy
       • SNOMED
    39. Why are CVs and ontologies useful?
       • Can be used to structure your metadata
       • Are often used to structure information in databases
    40. Structured data helps with searching
       Craigslist search: Chaise
       Craigslist matches on strings only
       Craigslist search: Fainting couch
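The Craigslist example can be reproduced in a few lines: a plain string search finds only listings that contain the typed word, while a search over structured annotations finds both. A minimal sketch with hypothetical listings; the `concept` field and the query-to-concept table are illustrative assumptions, not how Craigslist stores data.

```python
# Hypothetical listings; "concept" stands in for a structured annotation
# chosen from a controlled vocabulary when the listing is created.
LISTINGS = [
    {"title": "Antique chaise, great condition", "concept": "chaise longue"},
    {"title": "Victorian fainting couch", "concept": "chaise longue"},
    {"title": "Leather sofa", "concept": "sofa"},
]

# Query words mapped to concepts (illustrative, not an existing vocabulary)
QUERY_TO_CONCEPT = {
    "chaise": "chaise longue",
    "chaise longue": "chaise longue",
    "fainting couch": "chaise longue",
    "sofa": "sofa",
}


def string_search(query: str) -> list:
    """Return titles that literally contain the query text."""
    q = query.lower()
    return [item["title"] for item in LISTINGS if q in item["title"].lower()]


def concept_search(query: str) -> list:
    """Return titles annotated with the concept the query maps to (none if unmapped)."""
    concept = QUERY_TO_CONCEPT.get(query.lower())
    return [item["title"] for item in LISTINGS if item["concept"] == concept]


if __name__ == "__main__":
    for query in ["chaise", "fainting couch"]:
        print(query, "| strings:", string_search(query), "| concepts:", concept_search(query))
```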
    41. Structured data helps with searching
       PubMed indexes articles with MeSH terms
    42. In summary: structured metadata = good
       How can I create structured metadata?
       (Image: http://www.flickr.com/photos/san_drino/1454922072/)
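The speaker notes on Epic make the same point with blood pressure: values entered in the BP field can be pulled directly, while values typed into progress notes must be dug out of free text. A minimal sketch with hypothetical visit records; the field names are assumptions and do not reflect Epic's actual data model.

```python
import re

# Hypothetical visit records (not Epic's data model): some entries use the
# structured BP fields, others only mention BP in free-text notes.
VISITS = [
    {"patient": "P1", "systolic": 128, "diastolic": 82, "notes": "Follow-up for back pain."},
    {"patient": "P2", "systolic": None, "diastolic": None, "notes": "BP 135/88, pain improved."},
    {"patient": "P3", "systolic": 118, "diastolic": 76, "notes": ""},
    {"patient": "P4", "systolic": None, "diastolic": None,
     "notes": "Blood pressure one-forty over ninety."},
]

# Only catches the "BP 120/80" style; every other phrasing is missed
BP_IN_NOTES = re.compile(r"\bBP\s*(\d{2,3})\s*/\s*(\d{2,3})", re.IGNORECASE)


def systolic_from_fields(visits: list) -> list:
    """Structured data: read the systolic field wherever it was filled in."""
    return [v["systolic"] for v in visits if v["systolic"] is not None]


def systolic_from_notes(visits: list) -> list:
    """Unstructured data: try to parse systolic values out of free-text notes."""
    values = []
    for v in visits:
        match = BP_IN_NOTES.search(v["notes"])
        if match:
            values.append(int(match.group(1)))
    return values


if __name__ == "__main__":
    print("from fields:", systolic_from_fields(VISITS))
    print("from notes: ", systolic_from_notes(VISITS))
```

Even with a pattern for the common "BP 135/88" style, the reading for P4 is lost entirely, which is the garbage in, garbage out problem the notes describe.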
    43. Databases and tools… (to make your life easier)
       (Image: http://farm4.static.flickr.com/3560/3332644561_c9d5041d02.jpg)
    44. Data management tools and repositories
       • Purpose: software where you can organize, store, and/or share data
       • Often contain metadata to assist with data entry and create structured data
    45. Tools for data management
    46. Data sharing repositories
       http://www.nlm.nih.gov/NIHbmic/nih_data_sharing_repositories.html
    47. Repositories use unique IDs
       • Digital Object Identifier (DOI)
       • Example: DOIs for publications
         – doi: 10.1371/journal.pbio.1001339
       • Uniform Resource Identifier (URI)
       • A URI will resolve to a single location on the web
       • URIs for people
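A DOI becomes a web link by prefixing it with the doi.org resolver, which redirects to the object's current location. A minimal sketch that turns the slide's example into a URL; `doi_to_url` is a hypothetical helper, and it does not percent-encode suffixes containing characters such as '#' or '?', which some DOIs require.

```python
DOI_RESOLVER = "https://doi.org/"


def doi_to_url(doi: str) -> str:
    """Turn 'doi: 10.xxxx/...' or a bare DOI into a URL on the doi.org resolver."""
    cleaned = doi.strip()
    if cleaned.lower().startswith("doi:"):
        cleaned = cleaned[len("doi:"):].strip()
    # DOI names start with "10." and separate prefix from suffix with "/"
    if not cleaned.startswith("10.") or "/" not in cleaned:
        raise ValueError(f"Not a DOI: {doi!r}")
    # Suffix characters such as '#' or '?' would need percent-encoding first
    return DOI_RESOLVER + cleaned


if __name__ == "__main__":
    print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
```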
    48. People disambiguation
    49. Example:
       • John L Campbell, Research Ecologist, Oregon State University, Corvallis, OR
       • John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
    50. Tools for personal data management
       • Google Drive
       • Dropbox
       • Evernote
       • TaskPaper
       • Diigo: bookmarking websites
       • Mendeley, EndNote, Zotero: citation managers
       • Sound Gecko
       http://blogs.scientificamerican.com/information-culture/2012/12/10/managing-personal-knowledge-data-and-information/
    51. URLs to resources
       Go to: http://libguides.ohsu.edu/data
    52. Data sharing and management snafu in 3 short acts
