Data101 pmcb retreat_09-20-13_final
  • JW
  • JW
  • JW
  • JW
  • JW
  • JW
  • Ask them to think about what type of data they deal with/generate. Give a couple minutes.
  • Ask if they have additional data types that they brainstormed
  • JW
  • These are all things that the library can help you do
  • JW
  • JW
  • http://patenteux.com/Messy_desktop/messy_wallpaper-1280x1024.jpg
  • If you work on the command line, you can see all the file paths
  • JW
  • Show examples of versions. Can go back when you make mistakes when changes are made. Share work with other people. Both work on things at the same time and merge back together. Akin to game of telephone: version control can let you see exactly when a change was made.
  • Show examples of versions. Can go back when you make mistakes when changes are made. Share work with other people. Both work on things at the same time and merge back together. Akin to game of telephone: version control can let you see exactly when a change was made. NEW SLIDES: Examples of versions of data: Data101_NV_v1, Data101_NV_v2. Simple software solutions: some software keeps versions for you; show where to go get it. Version control software: SVN, Git; show example of Google Code. Can write commit messages for each version you commit.
  • NICOLE
  • Central servers will have multiple redundancy, backups of backups. High-quality secure USBs with passwords and encryption, or burn to disk.
  • JW
  • !
  • Move this
  • Information science is a parent
  • Ontologies classify terms and the relationships between them.
  • JW
  • Software that can rename your files, if you already have them named
  • Goal is to solve the author/contributor name ambiguity problem in scholarly communications. Creating a central registry of unique identifiers for individual researchers. Identifiers, and the relationships among them, can be linked to the researcher.
  • JW
  • JW
  • JW
  • JW
  • JW
  • JW
  • JW
  • Maybe discuss the PlumX project?
  • JW
  • Say that we won an award to sponsor this program

Transcript

  • 1. DATA MANAGEMENT 101 Nicole Vasilevsky, Jackie Wirz and Melissa Haendel PMCB New Student Orientation 20 September 2013
  • 2. 1 | Data definitions 2 | Dealing with data 3 | How the OHSU Library can help
  • 3. Nicole Vasilevsky, PhD Project Manager, Ontology Development Group Jackie Wirz, PhD Assistant Professor, Bioinformation Specialist Melissa Haendel, PhD Assistant Professor, Lead, Ontology Development Group
  • 4. 1 | Data definitions
  • 5. Data does not speak for itself…
  • 6. YOU speak for YOUR data
  • 7. But First, you need to manage it
  • 8. But, even more fundamentally…
  • 9. data means many things…
  • 10. what does data mean to you?
  • 11. What are data? Experimental data Social data School related data Personal data
  • 12. Do you know what metadata is? a. Philosophy b. describes data c. dating site d. data
  • 13. 2 | dealing with data
  • 14. Do you get frustrated with any of the following? a. Storing data b. Backing up data c. Analyzing/manipulating data d. Finding data produced by other researchers/clinicians e. Ensuring data are secure f. Making data accessible to other researchers g. Controlling access to data h. Tracking updates to data (i.e., versioning) i. Creating metadata (i.e., describing the data to be more useful at a later time or by others) j. Protecting intellectual property rights k. Ensuring appropriate professional credit/citation is given to data sets generated
  • 15. Why? Personal organization Efficiency Credit where credit is due Accelerate scientific and clinical discovery Reproducibility of science and medicine
  • 16. naming | metadata | tools | standards How?
  • 17. naming
  • 18. File naming
  • 19. Naming conventions Project_instrument_location_YYYYMMDDhhmmss_extra.ext Index/grant conditions Leading zero! s/n, variable Retain order
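The naming pattern on this slide can be sketched in Python. This is a minimal illustration, not a prescribed tool: the function names `build_filename` and `padded`, and the example values (instrument, location, serial number), are assumptions made up for the example. It shows why the slide calls for a fixed-width timestamp and leading zeros: both keep files in the right order when sorted by name.

```python
from datetime import datetime


def padded(number, width=3):
    """Zero-pad a serial number so that names sort in numeric order."""
    return str(number).zfill(width)


def build_filename(project, instrument, location, timestamp, extra, ext):
    """Compose Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    stamp = timestamp.strftime("%Y%m%d%H%M%S")
    return f"{project}_{instrument}_{location}_{stamp}_{extra}.{ext}"


# Hypothetical values, chosen only to illustrate the pattern.
name = build_filename(
    "PMCB", "FACSCalibur", "OHSU",
    datetime(2013, 9, 20, 14, 5, 9),
    "s" + padded(7),
    "csv",
)
print(name)

# Without leading zeros, "s10" sorts before "s2"; with them, order is retained.
print(sorted(["s10", "s2"]))
print(sorted(["s" + padded(10), "s" + padded(2)]))
```

Putting the date as YYYYMMDD (year first) means an alphabetical sort is also a chronological sort.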
  • 20. Naming: Directory Structure
  • 21. PMCB presentation Library presentation DMICE presentation Presentations PMCB Library DMICE
  • 22. http://ftp.ihmc.us/
  • 23. ReadMe
  • 24. Version Control
  • 25. Versioning • Save a copy of every version of a file • Follow a file naming convention Data101_PMCB_Retreat_09-20-13_v1 Data101_PMCB_Retreat_09-20-13_v2 Data101_PMCB_Retreat_09-20-13_Final
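As a rough sketch of the file naming convention above, the next version name can be worked out from the names already in use. The helper `next_version` and the `_vN` suffix pattern it looks for are assumptions for illustration; the file names come from the slide.

```python
import re

# Matches names that end in _v<number>, e.g. "Report_v2".
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version(existing_names, stem):
    """Return the next unused '<stem>_vN' name, never overwriting an old one."""
    numbers = []
    for name in existing_names:
        match = VERSION_RE.match(name)
        if match and match.group("stem") == stem:
            numbers.append(int(match.group("num")))
    return f"{stem}_v{max(numbers, default=0) + 1}"


existing = [
    "Data101_PMCB_Retreat_09-20-13_v1",
    "Data101_PMCB_Retreat_09-20-13_v2",
    "Data101_PMCB_Retreat_09-20-13_Final",
]
print(next_version(existing, "Data101_PMCB_Retreat_09-20-13"))
```

Names such as `_Final` carry no number, so the sketch ignores them; a numbered scheme avoids the familiar "Final_v2" problem.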
  • 26. Versioning
  • 27. Versioning
  • 28. Versioning Version Control software: • Git • SVN
  • 29. Backups
  • 30. Which of the following do you do? a. Save copies of data on a disk, USB drive, or computer hard drive b. Save copies of data on a local server c. Save copies of data on a central campus server d. Save copies of data on a web-based or cloud server e. Store data in a repository or archives f. Automatically backup files g. Manually generate backup h. Restrict access to files
  • 31. Where can you back up your data? • 1 on your local workstation • 1 local/removable, such as external hard drive • 1 on central server • 1 remote, such as on a cloud server* *Depending on the type of data, as cloud servers are not always secure
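The idea of keeping copies in several places can be sketched as a small script that copies a file to each backup location and checks every copy against the original with a checksum. This is only an illustration under stated assumptions: the function names `sha256_of` and `back_up` are hypothetical, and the "external drive" and "central server" here are just temporary folders standing in for real locations.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each destination folder and verify each copy."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy = folder / source.name
        shutil.copy2(source, copy)  # copy2 also keeps timestamps
        results[str(copy)] = sha256_of(copy) == expected
    return results


# Demo in a throwaway directory; the folder names are placeholders.
with tempfile.TemporaryDirectory() as root:
    source = Path(root) / "workstation" / "results.csv"
    source.parent.mkdir()
    source.write_text("sample,value\nA,1\n")
    report = back_up(
        source,
        [Path(root) / "external_drive", Path(root) / "central_server"],
    )

print(all(report.values()))
```

Verifying the checksum matters because a backup that was silently corrupted during copying is not a backup.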
  • 32. Metadata
  • 33. What is Metadata? Title Author Call number Publisher ISBN
  • 34. Your metadata should make your data understandable to others without your involvement - Anne Gilliland
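One way to act on this advice is to keep a small machine-readable metadata record next to each data file. The sketch below is an assumption-laden illustration: the field names in `REQUIRED` are not a standard named in this talk, and the record values are placeholders based on the presentation itself.

```python
import json

# Hypothetical minimum set of fields; a real project would follow the
# metadata standard used in its own field.
REQUIRED = ("title", "creator", "date_created", "description", "file_format")


def missing_fields(record):
    """List required fields that are absent or blank in a metadata record."""
    return [field for field in REQUIRED if not str(record.get(field, "")).strip()]


record = {
    "title": "Data101_PMCB_Retreat_09-20-13_v1",
    "creator": "Nicole Vasilevsky",
    "date_created": "2013-09-20",
    "description": "Slides for Data Management 101, PMCB New Student Orientation",
    "file_format": "",  # left blank on purpose to show the check
}

print(missing_fields(record))
print(json.dumps(record, indent=2))
```

A check like this can run before data are shared, so that gaps are caught while the person who knows the answer is still around.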
  • 35. Are you aware of data standards in your field?
  • 36. data standards Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning. http://www.usgs.gov/datamanagement/plan/datastandards.php
  • 37. Controlled vocabularies
  • 38. Structured data helps with searching Craigslist search: Chaise Craigslist matches on strings only Craigslist search: Fainting couch
  • 39. Structured data helps with searching PubMed indexes articles with MeSH Terms
  • 40. Structured data helps with searching
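The difference between matching on strings and searching structured data can be sketched as follows. The listings, the `SYNONYMS` table, and the preferred term "chaise longue" are invented for illustration; a real controlled vocabulary such as MeSH is far larger and is curated by experts.

```python
LISTINGS = [
    "Antique fainting couch, good condition",
    "Red velvet chaise",
    "Office chair",
]

# Toy controlled vocabulary: each label maps to one preferred term.
SYNONYMS = {
    "chaise": "chaise longue",
    "fainting couch": "chaise longue",
    "chaise longue": "chaise longue",
}


def string_search(query, listings):
    """Match on the literal query string only."""
    needle = query.lower()
    return [text for text in listings if needle in text.lower()]


def concept_search(query, listings):
    """Expand the query to every label of the same concept, then match."""
    concept = SYNONYMS.get(query.lower())
    if concept is None:
        return string_search(query, listings)
    labels = [label for label, term in SYNONYMS.items() if term == concept]
    return [
        text for text in listings
        if any(label in text.lower() for label in labels)
    ]


print(string_search("chaise", LISTINGS))
print(concept_search("chaise", LISTINGS))
```

The string search misses the fainting couch; the vocabulary-backed search finds both listings because they share one concept.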
  • 41. Why are CVs and Ontologies useful? • Can be used to structure your metadata • Are often used to structure information in databases Cell Ontology Linnean Taxonomy Order Genus Species Phylum Class Family Kingdom
  • 42. tools
  • 43. File renaming applications • Bulk Rename Utility (Windows) • Renamer (Mac) • PSRenamer
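For readers who prefer a script to one of these applications, a bulk rename can be sketched in a few lines. This is a hypothetical example, not how any of the listed tools work: `normalize` and `plan_renames` are made-up names, and the rules (spaces to underscores, numbers zero-padded, lower-case extension) are just one possible convention. The sketch only plans the renames and does not touch any files.

```python
import re


def normalize(name, width=3):
    """Apply the example convention to a single file name."""
    stem, dot, ext = name.rpartition(".")
    if not dot:  # no extension at all
        stem, ext = name, ""
    stem = re.sub(r"\s+", "_", stem.strip())
    # Zero-pad every run of digits (this also turns "v1" into "v001").
    stem = re.sub(r"\d+", lambda m: m.group(0).zfill(width), stem)
    return stem + dot + ext.lower()


def plan_renames(names):
    """Return (old, new) pairs for names that would change: a dry run."""
    return [(old, normalize(old)) for old in names if normalize(old) != old]


for old, new in plan_renames(["gel image 2.TIF", "gel image 10.TIF", "notes.txt"]):
    print(old, "->", new)
```

Reviewing the planned pairs before renaming anything is a cheap safeguard, since a bad bulk rename is hard to undo.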
  • 44. Data Management tools and repositories • Purpose: Software where you can organize, store and/or share data • Often contain metadata to assist with data entry and create structured data
  • 45. Tools for data management
  • 46. Repositories use Unique IDs • Digital Object Identifier (DOI) • Example: DOIs for publications – doi: 10.1371/journal.pbio.1001339 • Uniform Resource Identifier (URI) • A URI will resolve to a single location on the web • URIs for people
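A DOI becomes a web address when it is appended to the doi.org resolver, which is what makes it usable as a persistent link. The sketch below shows that step using the DOI from the slide; the function name `doi_to_url` and the simple pattern used to recognize a DOI are assumptions for illustration and do not cover every valid DOI.

```python
import re
from urllib.parse import quote

# Simplified shape of a DOI: "10.", a registrant number, "/", a suffix.
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")


def doi_to_url(text):
    """Turn 'doi: 10.xxxx/...' or a bare DOI into a resolver URL."""
    doi = text.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:].strip()
    if not DOI_PATTERN.match(doi):
        raise ValueError(f"not a recognizable DOI: {text!r}")
    return "https://doi.org/" + quote(doi, safe="/")


print(doi_to_url("doi: 10.1371/journal.pbio.1001339"))
```

Unlike a journal's own page address, the resolver link keeps working if the publisher reorganizes its website.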
  • 47. • Example: • John L Campbell, Research Ecologist, Oregon State University, Corvallis OR • John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
  • 48. standards
  • 49. nomenclature
  • 50. antibodies Western Blot Immunohistochemistry ELISA Co-immunoprecipitation ChIP Radioimmunoassay
  • 51. FACS analysis of T cells from LNs and tumors T cells were liberated from LNs by disruption between two frosted glass slides. Cells from LNs and tumors were stained with various combination of the following Abs: FITC-CD4, allophycocyanin-CD25, PE Cy7-CD8, APC-CD62L, PE-CD25, PE Cy7-CD25, and biotinylated-KJ-126 and in some experiments made permeable with fixation/permeablization buffers and stained with PE-FoxP3 (eBioscience). Harvested samples, isotype controls, and single stain controls were run on the FACSCalibur (BD Biosciences). Ruby and Weinberg (2009) J Immunol. 182(3):1481-9.
  • 52. Which antibody did they use in the paper?
  • 53. A Solution: Antibody Registry antibodyregistry.org
  • 54. Meet the Urban Lab
  • 55. A+ organization! The Urban lab antibodies
  • 56. Of 14 antibodies published in 45 articles, only 38% were identifiable [Bar chart, y-axis: percent identifiable (0% to 90%); bars: Commercial Ab identifiable, Catalog number reported, Source organism reported, Target uniquely identifiable]
  • 57. http://www.force11.org/node/4463 http://biosharing.org/bsg-000532
  • 58. http://www.biosharing.org/standards/mibbi Minimum Information for Biological and Biomedical Investigations
  • 59. data publication and sharing
  • 60. Why share data? • Data sharing mandates • Further science and medicine • Build collaborations • Enable new discoveries with your data • Can be required at time of publication
  • 61. Distribution of 2004–2005 citation counts of 85 trials by data availability.
  • 62. How?
  • 63. Beyond the PDF: What can be published (and cited)? Raw Science Nanopublications Self-publishing
  • 64. Beyond the PDF: What can be published (and cited)? Raw Science Nanopublications Self-publishing Datasets Code Experimental design Argument or passage Blogging Microblogging Comments on existing work Annotations on existing work Single figure publications
  • 65. How? Data Journals and Repositories • FigShare • Dryad • DataVerse (social science) • Institutional repositories
  • 66. www.impactstory.org
  • 67. 3 | How the OHSU Library can help
  • 68. 1 | Large Lecture: Data Management 101 2 | 10–15 Small Groups: data playground • 1 researcher paired with 2 or 3 library staff • Tailored analysis of data reporting and instruction Save the date: 10/09/13 4-6pm 1k challenge award recipients
  • 69. Thank you!
  • 70. URLs to resources Go to: http://libguides.ohsu.edu/data