Data management workshop 101113
 

Presentation on Data management @ UCC


Usage Rights

© All Rights Reserved


    Presentation Transcript

    • DATA MANAGEMENT 101 October 11, 2013
    • Hello there!
    • 1 | Data definitions 2 | Dealing with Data 3 | The Next Steps
    • 1 | Data definitions
    • Data can be complex
    • Data can be amazing
    • Data are about discovery
    • But…
    • Data does not speak for itself…
    • YOU speak for YOUR data
    • and you need to manage it
    • But, even more fundamentally…
    • tomaytoe | PANTONE 1795 C | tomahto | Solanum lycopersicum | tdTomato (554 ex, 581 em) | $64
    • Data means different things to different people
    • Data is not static
    • The data timeline: 1. Brilliant Idea! 2. Design Experiment 3. Do Experiment 4. Collect data 5. Compile and Analyze 6. Publish 7. Fame, Fortune
    • The data timeline: What people think: 1. Brilliant Idea! 2. Design Experiment 3. Do Experiment 4. Collect data 5. Compile and Analyze 6. Publish 7. Fame, Fortune
    • The data timeline: What Happens (diagram: Idea!, analyzing data, experiment, compile data, design, Other People’s data, Try #2, Failure!!, beer, #896, 6. Publish)
    • The data cycle: What Really Happens
    • 2 | Dealing with data
    • Why should I care?
    • So you won’t go crazy | Efficiency | Accelerates scientific discovery | Reproducibility of science | Credit where credit is due | Personal organization
    • Do you get frustrated with…
    • a. Storing data b. Backing up data c. Analyzing/manipulating data d. Finding data produced by other researchers e. Ensuring data are secure f. Making data accessible to other researchers g. Controlling access to data h. Tracking updates to data (i.e., versioning) i. Creating metadata j. Protecting intellectual property rights k. Ensuring appropriate professional credit/citation is given
    • How do I not go crazy? naming | metadata | standards | tools
    • naming
    • Naming: File Names
    • File naming
    • Naming conventions: s/n, variable | Retain order | Project_instrument_location_YYYYMMDDhhmmss_extra.ext | Index/grant conditions | Leading zero!
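The naming pattern on this slide can be generated by code so it never has to be typed by hand. The Python sketch below assumes the field order shown (Project_instrument_location_YYYYMMDDhhmmss_extra.ext). The helper names `build_filename` and `run_label` are hypothetical, and so are the example values.

```python
from datetime import datetime


def build_filename(project: str, instrument: str, location: str,
                   when: datetime, extra: str, ext: str) -> str:
    """Assemble Project_instrument_location_YYYYMMDDhhmmss_extra.ext (hypothetical helper)."""
    # strftime zero-pads every field, so names sort chronologically as plain text
    stamp = when.strftime("%Y%m%d%H%M%S")
    # Underscores separate fields, so swap any spaces inside a field for hyphens
    fields = [f.strip().replace(" ", "-") for f in (project, instrument, location, stamp, extra)]
    return "_".join(fields) + "." + ext.lstrip(".")


def run_label(n: int, width: int = 2) -> str:
    """Leading zeros keep run02 ahead of run10 in an alphabetical listing."""
    return f"run{n:0{width}d}"


name = build_filename("Heart", "scope1", "lab 3", datetime(2013, 10, 11, 9, 5, 7), run_label(1), "tif")
print(name)  # Heart_scope1_lab-3_20131011090507_run01.tif
print(sorted(["run10", "run2"]), sorted([run_label(10), run_label(2)]))
# ['run10', 'run2'] ['run02', 'run10']
```

The second line of output shows why the slide insists on a leading zero. Without padding, "run10" sorts before "run2".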
    • Experiment: stem cells on fibrin to damaged heart (Lamar Soutter Library, UMMS)
    • Collective data from experiment:
        Day -2: Incubate stem cells with markers
        Day -1: Stem cells in solution with biological suture
        Day 0: Surgery #1: infarct/delivery of stem cells to damaged heart tissue
        Variable days: Surgery #2: examination, high-speed imaging/LVPs, isolate heart and place it in freezer
        Post days +: Section heart, tissues on slides, staining, images of tissues, tracking particles on heart
    • Collective data from experiment (TIME | TYPE | USE):
        Day -2: Incubate stem cells with markers
        Day -1: Stem cells in solution with biological suture
        Day 0: Surgery #1: infarct/delivery of stem cells to damaged heart tissue
        Variable days: Surgery #2: examination, high-speed imaging/LVPs, isolate heart and place it in freezer
        Post days +: Section heart, tissues on slides, staining, images of tissues, tracking particles on heart
    • Many different file types
        Data | File format
        Images | Machine dependent
        Ventricular pressure measurements | Proprietary
        Homemade software | MATLAB or C
        Histology sections | Slides and images
        Contextual | Project, Experiment, Animal
    • Separate Nomenclature
        Data | File format | Name
        Images | Machine dependent | Scope_Date_Var
        Ventricular pressure measurements | Proprietary | M_Date_Var.raw
        Homemade software | MATLAB or C | Script_Date_Var
        Histology sections | Slides and images | Anat_Date_Stain
    • Unified Nomenclature
        Data | File format | Name
        (1) Images | Machine dependent | E_1_Date_Var
        (2) Ventricular pressure measurements | Proprietary | E_2_Date_Var
        (3) Homemade software | MATLAB or C | E_3_Script_Var
        (4) Histology sections | Slides and images | E_4_Date_Stain
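The unified nomenclature puts the experiment first and then a numeric data-type index, so that every file from one experiment groups together. The sketch below simplifies the slide: it uses a date for every type, whereas the slide names the script entry by script instead. The index keys and the "HandE" stain label are hypothetical.

```python
from datetime import date

# Hypothetical index mirroring the (1)-(4) numbering on the slide
DATA_TYPE_INDEX = {
    "images": 1,
    "ventricular_pressure": 2,
    "analysis_script": 3,
    "histology": 4,
}


def unified_name(experiment: str, data_type: str, when: date, qualifier: str) -> str:
    """Build <experiment>_<type index>_<YYYYMMDD>_<qualifier>, e.g. E_4_20131011_HandE."""
    index = DATA_TYPE_INDEX[data_type]  # KeyError signals a type missing from the index
    return f"{experiment}_{index}_{when:%Y%m%d}_{qualifier}"


print(unified_name("E", "histology", date(2013, 10, 11), "HandE"))  # E_4_20131011_HandE
```

Keeping the index in one table means a new data type is registered once, instead of each person inventing a separate prefix.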
    • Recommended File Formats
        Type | Recommended | Avoid for data sharing
        Tabular data | CSV, TSV, SPSS portable | Excel
        Text | Plain text, HTML, RTF; PDF/A only if layout matters | Word
        Media | Container: MP4, Ogg; Codec: Theora, Dirac, FLAC | QuickTime, H264
        Images | TIFF, JPEG2000, PNG | GIF, JPG
        Structured data | XML, RDF | RDBMS
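One way to apply this table is to scan a list of files for extensions that appear in the "Avoid" column. The extension-to-format mapping below is an assumption: it treats .xls/.xlsx as Excel, .doc/.docx as Word, and .mov as QuickTime. H264 is a codec inside a container, so it cannot be detected from a file extension and is left out.

```python
from pathlib import PurePath

# Assumed extensions for the "Avoid for data sharing" column
AVOID = {
    ".xls": "Excel", ".xlsx": "Excel",
    ".doc": "Word", ".docx": "Word",
    ".mov": "QuickTime",
    ".gif": "GIF", ".jpg": "JPG", ".jpeg": "JPG",
}


def formats_to_review(names):
    """Return (file name, format) pairs whose extension appears on the avoid list."""
    flagged = []
    for path in map(PurePath, names):
        fmt = AVOID.get(path.suffix.lower())  # case-insensitive extension match
        if fmt is not None:
            flagged.append((path.name, fmt))
    return flagged


print(formats_to_review(["E_1_20131011_Var.tif", "results.XLSX", "notes.txt"]))
# [('results.XLSX', 'Excel')]
```

To audit a real project folder, the names could come from `pathlib.Path(folder).rglob("*")`.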
    • RESOURCES | Bulk File Renaming Tools: Bulk Rename Utility (Windows), Renamer (Mac), PSRenamer, Mendeley
    • Naming conventions: Grant_Project_experiment_instrument_location_weather_catsname_icecreamflavor_collaborator_owner_zodiacsign_mousemodel_address_painscalerating_favoritecolor_ssn_shoesize_sex_eyecolor_tattoos_scars_votingrecord_YYYYMMDDhhmmss_extra.ext
    • Naming: Directory Structure
    • Presentations | Data presentation | CTSAconnect presentation | Monarch presentation | SPARC | CTSAconnect | Monarch
    • RESOURCES | Mindmapping Software: www.coggle.it, http://ftp.ihmc.us/, www.mindjet.com
    • Oldie but goodie…
    • Naming: Version Control
    • DataManagement@UPR_seminars_101113_JW
        DataManagement@UPR_data_101113_JW
        DataManagement_dataship_100313_NV_JW_MH_RC
        Data101_dataship_091113_FINAL_JW
        Data101_dataship_091013_v04_JW
        DataManagement_dataship_091013_v03_JW
        DataManagement_dataship_090913_v02_JW
        DataManagement_dataship_090913_v01_JW
        DataManagement_SPARC_082013_FINAL_NV
        DataManagement_SPARC_052013_v8
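The listing above mixes _vNN counters with FINAL tags. When you do keep a counter, a script can bump it without typos and preserve its zero padding. The helper name `next_version` and the `_vNN` tag it expects are assumptions based on the file names shown.

```python
import re

# Matches a "_v" tag followed by digits, ending at "_" or at the end of the name
VERSION_TAG = re.compile(r"_v(\d+)(?=_|$)")


def next_version(stem: str) -> str:
    """Increment the _vNN counter in a file name stem, preserving its zero padding."""
    match = VERSION_TAG.search(stem)
    if match is None:
        raise ValueError(f"no _vNN tag in {stem!r}")
    digits = match.group(1)
    bumped = str(int(digits) + 1).zfill(len(digits))
    return stem[:match.start(1)] + bumped + stem[match.end(1):]


print(next_version("DataManagement_dataship_091013_v03_JW"))
# DataManagement_dataship_091013_v04_JW
```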
    • RESOURCES | Version Control: Dropbox, Google Docs, Git, SmartSVN
    • Naming: Backups
    • Which of the following do you do?
    • a. Save copies of data on a disk, USB drive, or computer hard drive b. Save copies of data on a local server c. Save copies of data on a central campus server d. Save copies of data on a web based or cloud server e. Store data in a repository or archives f. Automatically backup files g. Manually generate backup h. Restrict access to files
    • 3 | copies (you, lab, other); 2 | different forms; 1 | remote location
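The 3-2-1 rule can be written as a checklist over a backup plan. The `Copy` record and its fields are hypothetical. The sketch models "different forms" as different storage media and "remote location" as an off-site flag.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    holder: str    # who keeps it, e.g. "you", "lab", "other"
    medium: str    # storage form, e.g. "laptop disk", "lab server", "cloud storage"
    offsite: bool  # True if stored away from the primary location


def satisfies_3_2_1(copies: list[Copy]) -> bool:
    """3 copies, on at least 2 different media, with at least 1 off-site."""
    return (
        len(copies) >= 3
        and len({c.medium for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )


plan = [
    Copy("you", "laptop disk", offsite=False),
    Copy("lab", "lab server", offsite=False),
    Copy("other", "cloud storage", offsite=True),
]
print(satisfies_3_2_1(plan))      # True
print(satisfies_3_2_1(plan[:2]))  # False: only two copies
```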
    • ETHICS
    • Computing in the cloud
    • How do I not go crazy? naming | metadata | standards | tools
    • Metadata / Controlled Vocab / Ontologies
    • You speak for your data
    • How do you speak for your data when you are not around?
    • How do you speak for your data when you are not around? Metadata | Controlled Vocabularies | Ontologies
    • What it is | What it takes to do it: Metadata | relevant variables; Controlled vocabularies | definitions; Ontologies | grouping, classification, connection
    • What is metadata, really?
    • a. a philosophy b. describes data c. dating site d. data
    • Title | Author | Call number | Publisher | ISBN
    • [Word cloud: Metadata] “Your metadata should make your data understandable to others without your involvement.” - Anne Gilliland
    • File name | File type | Title | Date created | Who created the data
    • RESOURCES | Metadata standards: http://www.dlib.indiana.edu/~jenlrile/metadatamap/
    • RESOURCES | Metadata standards: http://rs.tdwg.org/dwc/
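These five fields can be captured in a small machine-readable record kept alongside the data. The JSON layout, the field names, and the example values are assumptions and not any metadata standard. A real project would follow the standard used in its field.

```python
import json
from datetime import date
from pathlib import PurePath


def describe(data_file: str, title: str, created: date, creator: str) -> str:
    """Return a JSON record holding the minimal fields listed on the slide."""
    path = PurePath(data_file)
    record = {
        "file_name": path.name,
        "file_type": path.suffix.lstrip(".").lower() or "unknown",
        "title": title,
        "date_created": created.isoformat(),  # ISO 8601, e.g. 2013-10-11
        "created_by": creator,
    }
    return json.dumps(record, indent=2)


# Hypothetical example; the JSON could be saved next to the data file as a "sidecar"
print(describe("E_4_20131011_HandE.tif", "Heart histology, section 4", date(2013, 10, 11), "A. Researcher"))
```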
    • What is controlled vocab, really?
    • Craigslist search: Chaise | Craigslist search: Fainting couch
    • acetominophen
    • PubMed indexes articles with MeSH terms
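A controlled vocabulary maps the many words people use for one thing onto a single preferred term. That mapping is what lets a search for one form also find the others. The synonym table below is illustrative only. It is not taken from MeSH or any other published vocabulary, and a real project would load an official list.

```python
# Illustrative synonym table (hypothetical, not copied from MeSH)
PREFERRED = {
    "chaise": "chaise longue",
    "fainting couch": "chaise longue",
    "acetominophen": "acetaminophen",  # misspelling
    "paracetamol": "acetaminophen",
    "tylenol": "acetaminophen",
}


def normalize(term: str) -> str:
    """Return the preferred term for a search string, or the cleaned string if unknown."""
    key = " ".join(term.lower().split())  # case-fold and collapse whitespace
    return PREFERRED.get(key, key)


print(normalize("Fainting  Couch"))  # chaise longue
print(normalize("Tylenol"))          # acetaminophen
```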
    • What is an ontology, really?
    • Human disease: PFEIFFER SYNDROME | Cross-species phenotype | Most similar mouse model: CD1.Cg-Fgfr2tm4Lni/H
        Coronal craniosynostosis (HP:0004440) | premature suture closure | premature suture closure (MP:0000081)
        Hypoplasia of the maxilla (HP:0000327) | maxilla hypoplasia | short maxilla (MP:0000097)
        Dental crowding (HP:0000678) | malocclusion | malocclusion (MP:0000120)
        Brachyturricephaly (HP:0000244) | shortened head | shortened head (MP:0000435)
        Hypertelorism (HP:0000316) | ocular hypertelorism | ocular hypertelorism (MP:0001300)
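When both species' phenotypes are annotated with ontology terms, comparing them becomes a computation. The sketch below uses only the term pairs on the slide. It matches them by exact shared cross-species label and scores the result with a Jaccard index. Tools used in practice exploit the ontology hierarchy to give graded similarity, which this sketch deliberately omits. The variable names are hypothetical.

```python
# Pairs transcribed from the slide: ontology ID -> cross-species phenotype it maps to
HUMAN_PFEIFFER = {
    "HP:0004440": "premature suture closure",  # Coronal craniosynostosis
    "HP:0000327": "maxilla hypoplasia",        # Hypoplasia of the maxilla
    "HP:0000678": "malocclusion",              # Dental crowding
    "HP:0000244": "shortened head",            # Brachyturricephaly
    "HP:0000316": "ocular hypertelorism",      # Hypertelorism
}
MOUSE_FGFR2 = {
    "MP:0000081": "premature suture closure",
    "MP:0000097": "maxilla hypoplasia",        # short maxilla
    "MP:0000120": "malocclusion",
    "MP:0000435": "shortened head",
    "MP:0001300": "ocular hypertelorism",
}


def phenotype_overlap(a: dict, b: dict) -> tuple:
    """Return the shared cross-species phenotypes and their Jaccard index."""
    pa, pb = set(a.values()), set(b.values())
    shared = pa & pb
    return sorted(shared), len(shared) / len(pa | pb)


shared, score = phenotype_overlap(HUMAN_PFEIFFER, MOUSE_FGFR2)
print(len(shared), score)  # 5 1.0
```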
    • How? naming | metadata | standards | tools
    • standards
    • Meet the Urban Lab
    • The Urban Lab Antibodies A+ organization!
    • [Chart, percent identifiable: Commercial Ab uniquely identifiable | Catalog number reported | Source organism reported | Target identifiable] Of 14 antibodies published in 45 articles, only 38% were identifiable
    • RESOURCES AntibodyRegistry.org
    • Are you aware of data standards in your field? @OHSU, 72% said no or didn’t know!
    • Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning. (www.usgs.gov/datamanagement/plan/datastandards.php)
    • Many microarray transcriptomics standards (JAMIA: sea-of-standards)
    • Reporting guidelines | Terminology structure (interoperability) | Exchange formats
    • How do these standards work?
    • How do you speak for your data when you are not around? Metadata | Controlled Vocabularies | Ontologies
    • RESOURCES | Minimum Information for Biological and Biomedical Investigations (MIBBI)
    • RESOURCES www.cdisc.org
    • RESOURCES | Reporting Standards: www.force11.org/node/4463, biosharing.org/bsg-000532, Minimum Information for Biological and Biomedical Investigations (http://www.biosharing.org/standards/mibbi)
    • How? naming | metadata | standards | tools
    • tools
    • RESOURCES | Workflow analysis platforms: runmycode.org, www.wf4ever-project.org, galaxyproject.org/
    • RESOURCES | http://opus.bath.ac.uk/32296, www.labarchives.com, www.labguru.com
    • RESOURCES https://dmp.cdlib.org/
    • Uniquely identifying data (image: www.flickr.com/photos/pmeimon)
    • Digital Object Identifier (DOI), e.g. 10.1371/journal.pbio.1001339 | Uniform Resource Identifier (URI): a resolvable URI points to a single location on the web | URIs for people | Repositories use unique IDs
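Adding the https://doi.org/ resolver in front of a DOI turns it into a clickable link. This sketch only checks the basic "10.<registrant>/<suffix>" shape. It does not confirm that the DOI is actually registered. The function name is hypothetical.

```python
from urllib.parse import quote


def doi_to_url(doi: str) -> str:
    """Turn a DOI, with or without a 'doi:' prefix, into an https://doi.org/ link."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    prefix, slash, suffix = doi.partition("/")
    # DOIs start with the "10." directory indicator and contain a prefix/suffix split
    if not (prefix.startswith("10.") and slash and suffix):
        raise ValueError(f"not a DOI: {doi!r}")
    # Percent-encode characters that are unsafe in a URL path, keeping "/" as-is
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("10.1371/journal.pbio.1001339"))
# https://doi.org/10.1371/journal.pbio.1001339
```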
    • RESOURCES Repository Map
    • RESOURCES Data Sharing Repositories
    • 3 | Next steps
    • Thinking Beyond the PDF
        Raw Science: Datasets, Code, Experimental design
        Small publications: Nanopublications, Argument or passage, Single figure publications
        Self-publishing: Blogging, Social Media, Comments & Reviews, Annotations
    • Research Products
    • RESOURCES | Data publishing and sharing: figshare.com, datadryad.org, thedata.org, www.dataone.org, data.rutgers.edu/, n2t.net/ezid, nature.com/scientificdata/, F1000.com/
    • You are unique, too
    • John L Campbell, Research Ecologist, Oregon State University, Corvallis, OR | John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
    • Yes, you are an individual! RESOURCES | ImpactStory (impactstory.org), www.plumanalytics.com, http://myidp.sciencecareers.org/, orcid.org
    • 1 | Data definitions 2 | Dealing with Data 3 | The Next Steps
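An ORCID iD is one way to stay distinguishable from the other John L Campbell. ORCID iDs are 16 characters long, and the last character is a check digit computed with ISO 7064 MOD 11-2 ("X" stands for 10). The sketch below validates only that check digit. It cannot tell whether an iD is actually registered, and the function name is hypothetical.

```python
def orcid_checksum_ok(orcid: str) -> bool:
    """Check the final character of an ORCID iD using ISO 7064 MOD 11-2."""
    chars = orcid.replace("-", "").upper()
    if len(chars) != 16 or not chars[:15].isdigit():
        return False
    total = 0
    for ch in chars[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    expected = "X" if result == 10 else str(result)
    return chars[15] == expected


print(orcid_checksum_ok("0000-0002-1825-0097"))  # True
print(orcid_checksum_ok("0000-0002-1825-0098"))  # False
```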
    • wirzj@ohsu.edu
    • http://libguides.ohsu.edu/data
    • Melissa Haendel Nicole Vasilevsky Robin Champieux
    • Thank you!
    • Questions?
    • What happens between publications?
    • “We are Drowning in Information but Starved for Knowledge” John Naisbitt
    • “We are Drowning in data but Starved for Knowledge” Jackie’s bad paraphrase of John Naisbitt
    • Attribute | Dr. Sawyer | Dr. Finn | Connected?
        University | Collegeville College | University of Badass | No
        Journals | Journal of Information | Journal of Data | No
        Grant title | Protein G is Important | Protein H is Important | No
    • Attribute | Dr. Sawyer | Dr. Finn | Connected?
        Machines used | Alpha, Gamma, Theta, Sigma | Gamma, Beta, Kappa, Theta | Yes!
        Reagents used | Cyan, Orange, Green, Mauve, Beige | Mauve, Chartreuse, Cyan, Green, Taupe | Yes!
        Genes referenced | bz3d14.2, bz3c13.1, bz3d98.1 | bz3c13.1 | Yes!
        Proteins referenced | Eng1a, Ntl, Ncdq | Ndrw, Eng1a, Brs | Yes!
        People affiliated with resources | Harry, Neville, Ron | Harry, Ron, Hermione | Yes!
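The comparison on the last two slides amounts to intersecting sets. The two researchers share nothing at the institution or journal level, but they overlap once machines, reagents, and people are recorded. The dictionaries transcribe a subset of the rows shown, and the attribute keys are our own hypothetical labels.

```python
# Transcribed from the comparison slides (a subset of the rows)
SAWYER = {
    "university": {"Collegeville College"},
    "journals": {"Journal of Information"},
    "machines": {"Alpha", "Gamma", "Theta", "Sigma"},
    "reagents": {"Cyan", "Orange", "Green", "Mauve", "Beige"},
    "people": {"Harry", "Neville", "Ron"},
}
FINN = {
    "university": {"University of Badass"},
    "journals": {"Journal of Data"},
    "machines": {"Gamma", "Beta", "Kappa", "Theta"},
    "reagents": {"Mauve", "Chartreuse", "Cyan", "Green", "Taupe"},
    "people": {"Harry", "Ron", "Hermione"},
}


def connections(a: dict, b: dict) -> dict:
    """Map each attribute both profiles record to the values they share (empty overlaps dropped)."""
    shared = {}
    for key in sorted(a.keys() & b.keys()):
        common = a[key] & b[key]
        if common:
            shared[key] = sorted(common)
    return shared


for attribute, values in connections(SAWYER, FINN).items():
    print(attribute, values)
# machines ['Gamma', 'Theta']
# people ['Harry', 'Ron']
# reagents ['Cyan', 'Green', 'Mauve']
```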