DATA MANAGEMENT 101
Nicole Vasilevsky, Jackie Wirz and Melissa Haendel
PMCB New Student Orientation
20 September 2013
1 | Data definitions
2 | Dealing with data
3 | How the OHSU Library can help
Nicole Vasilevsky, PhD
Project Manager, Ontology Development Group
Jackie Wirz, PhD
Assistant Professor, Bioinformation...
1 | Data definitions
Data does not speak for itself…
YOU speak for YOUR data
But First, you need to manage it
But, even more fundamentally…
data means many things…
what does data mean to you?
What are data?
Experimental data
Social data
School related data
Personal data
Do you know what metadata is?
a. Philosophy
b. describes data
c. dating site
d. data
2 | dealing with data
Do you get frustrated with any of the following?
a. Storing data
b. Backing up data
c. Analyzing/manipulating data
d. Find...
Why?
Personal organization
Efficiency
Credit where credit is due
Accelerate scientific and clinical discovery
Reproducibil...
naming | metadata | tools | standards
How?
naming
File naming
Naming conventions
Project_instrument_location_YYYYMMDDhhmmss_extra.ext
Index/grant
conditions
Leading zero!
s/n, variabl...
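The naming pattern above can be sketched in a few lines of Python. This is only an illustration, not a tool from the presentation: the function names `build_filename` and `padded_serial`, the three-digit padding width, and the sample values ("confocal", "room12") are all assumptions made for the example.

```python
from datetime import datetime


def build_filename(project, instrument, location, when, extra, ext):
    """Assemble Project_instrument_location_YYYYMMDDhhmmss_extra.ext."""
    # strftime yields a fixed-width timestamp, so names sort chronologically.
    stamp = when.strftime("%Y%m%d%H%M%S")
    return "_".join([project, instrument, location, stamp, extra]) + "." + ext


def padded_serial(number, width=3):
    """Zero-pad a serial number (width of 3 is an assumed choice)."""
    return str(number).zfill(width)


# Hypothetical sample values, chosen only to show the pattern.
example = build_filename(
    "PMCB", "confocal", "room12",
    datetime(2013, 9, 20, 14, 5, 9),
    "s" + padded_serial(7), "tif",
)
print(example)

# Without leading zeros, alphabetical sorting puts s10 before s2.
unpadded = sorted(f"s{n}" for n in (1, 2, 10))
padded = sorted("s" + padded_serial(n) for n in (1, 2, 10))
print(unpadded)
print(padded)
```

The two sorted lists show why the leading zero matters: only the padded names retain numeric order when a file browser sorts them as text.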
Naming: Directory Structure
PMCB presentation
Library presentation
DMICE presentation
Presentations
PMCB Library DMICE
http://ftp.ihmc.us/
ReadMe
Version Control
Versioning
• Save a copy of every version of a file
• Follow a file naming convention
Data101_PMCB_Retreat_09-20-13_v1
Dat...
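As a rough sketch of the "_v1, _v2" convention shown here, a small helper can work out the next version name so that no existing copy is overwritten. The helper name `next_version_name` and the regular expression are assumptions for illustration; dedicated version control software handles this far more robustly.

```python
import re

# Matches names ending in _v<number>, e.g. Data101_PMCB_Retreat_09-20-13_v1
VERSION_RE = re.compile(r"^(?P<stem>.+)_v(?P<num>\d+)$")


def next_version_name(existing_names, stem):
    """Return stem_v<N+1>, where N is the highest version already present."""
    highest = 0
    for name in existing_names:
        match = VERSION_RE.match(name)
        if match and match.group("stem") == stem:
            highest = max(highest, int(match.group("num")))
    return f"{stem}_v{highest + 1}"


saved = [
    "Data101_PMCB_Retreat_09-20-13_v1",
    "Data101_PMCB_Retreat_09-20-13_v2",
]
new_name = next_version_name(saved, "Data101_PMCB_Retreat_09-20-13")
print(new_name)
```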
Versioning
Versioning
Versioning
Version Control software:
• GIT
• SVN
Backups
Which of the following do you do?
a. Save copies of data on a disk, USB drive, or computer hard drive
b. Save copies of da...
• 1 on your local workstation
• 1 local/removable, such as external hard drive
• 1 on central server
• 1 remote, such as o...
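A minimal sketch of keeping several copies and checking that each one is intact, assuming the backup locations are ordinary folders. The function names and folder names are hypothetical; the example runs inside a temporary directory so it touches no real data.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path):
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def back_up(source, destinations):
    """Copy source into each folder and report whether each copy matches."""
    source = Path(source)
    expected = sha256_of(source)
    results = {}
    for folder in destinations:
        folder = Path(folder)
        folder.mkdir(parents=True, exist_ok=True)
        copy_path = folder / source.name
        shutil.copy2(source, copy_path)  # copy2 also preserves timestamps
        results[str(copy_path)] = sha256_of(copy_path) == expected
    return results


# Demonstration with stand-in folders (all names are assumptions).
with tempfile.TemporaryDirectory() as root:
    root = Path(root)
    original = root / "workstation" / "results.csv"
    original.parent.mkdir()
    original.write_text("sample,value\nA,1\n")
    report = back_up(
        original,
        [root / "external_drive", root / "central_server", root / "cloud"],
    )
    print(report)
```

Comparing checksums after copying is one simple way to confirm that a backup is a faithful copy rather than a truncated or corrupted one.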
Metadata
What is Metadata?
Title
Author
Call number
Publisher
ISBN
Your metadata should make your data understandable to others without your involvement
- Anne Gilliland
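One way to act on this advice is to store a small metadata record next to each data file and check that nothing essential is blank. The sketch below is an assumption-laden illustration: the field list in `REQUIRED_FIELDS` is a made-up minimum, not a published standard, and the record simply reuses this presentation's own title, authors, and date.

```python
import json

# Hypothetical minimum set of fields; a real project would follow
# the metadata standard used in its own field.
REQUIRED_FIELDS = ("title", "author", "date", "description")


def missing_fields(record):
    """List required fields that are absent or empty."""
    return [field for field in REQUIRED_FIELDS if not record.get(field)]


record = {
    "title": "Data Management 101",
    "author": "Nicole Vasilevsky, Jackie Wirz and Melissa Haendel",
    "date": "2013-09-20",
    "description": "",  # left blank on purpose to show the check
}
print(missing_fields(record))

# Serialize the record so it can be saved as a sidecar file.
sidecar = json.dumps(record, indent=2, sort_keys=True)
print(sidecar)
```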
Are you aware of data standards in your field?
data standards
Data standards are the rules by which data are described and recorded. In order to share, exchange, and und...
Controlled vocabularies
Structured data helps with searching
Craigslist search: Chaise
Craigslist matches on strings only
Craigslist search: Faint...
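The difference between matching on strings and searching with a controlled vocabulary can be sketched as follows. The listings and the tiny `SYNONYMS` table are invented for illustration, including the choice of "chaise longue" as the preferred term; they do not come from Craigslist or from any real vocabulary.

```python
# Made-up listings for the demonstration.
listings = [
    "Antique chaise, good condition",
    "Victorian fainting couch",
    "Chaise longue, red velvet",
]

# Hypothetical controlled vocabulary: preferred term -> labels that map to it.
SYNONYMS = {"chaise longue": ["chaise", "fainting couch"]}


def string_search(query, items):
    """Plain substring matching: finds only the literal query text."""
    return [item for item in items if query.lower() in item.lower()]


def tags_for(item):
    """Preferred terms whose labels appear in the listing text."""
    text = item.lower()
    return {
        term for term, labels in SYNONYMS.items()
        if any(label in text for label in labels)
    }


def vocabulary_search(query, items):
    """Map the query to a preferred term, then match on tags."""
    q = query.lower()
    wanted = {
        term for term, labels in SYNONYMS.items()
        if q == term or q in labels
    }
    return [item for item in items if tags_for(item) & wanted]


print(string_search("fainting couch", listings))
print(vocabulary_search("fainting couch", listings))
```

The string search returns only the listing that contains the exact phrase, while the vocabulary search returns all three because each is tagged with the same preferred term.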
Structured data helps with searching
PubMed indexes articles with MeSH Terms
Structured data helps with searching
Why are CVs and Ontologies useful?
• Can be used to structure your metadata
• Are often used to structure information in
d...
tools
File renaming applications
• Bulk Rename Utility (Windows)
• Renamer (Mac)
• PSRenamer
Data Management tools and repositories
• Purpose: Software where you can organize, store and/or share data
• Often contain...
Tools for data management
Repositories use Unique IDs
• Digital Object Identifier (DOI)
• Example: DOIs for publications
– doi: 10.1371/journal.pbi...
• Example:
• John L Campbell, Research Ecologist, Oregon State University, Corvallis OR
• John L Campbell, Research Ecolog...
standards
nomenclature
antibodies
Western Blot
Immunohistochemistry
ELISA
Co-immunoprecipitation
ChIP
Radioimmunoassay
FACS analysis of T cells from LNs and tumors
T cells were liberated from LNs by disruption between two frosted glass slide...
Which antibody did they use in the paper?
A Solution: Antibody Registry
antibodyregistry.org
Meet the Urban Lab
Meet the Urban Lab
A+ organization!
The Urban lab antibodies
[Bar chart, axis 0% to 90%. Categories: Commercial Ab identifiable; Catalog number reported; Source organism reported; Target ...]
http://www.force11.org/node/4463
http://biosharing.org/bsg-000532
http://www.biosharing.org/standards/mibbi
Minimum Information for Biological and Biomedical Investigations
data publication and sharing
Why share data?
• Data sharing mandates
• Further science and medicine
• Build collaborations
• Enable new discoveries...
Distribution of 2004–2005 citation counts of 85 trials by data availability.
How?
Beyond the PDF:
What can be published (and cited)?
Raw Science Nanopublications Self-publishing
Beyond the PDF:
What can be published (and cited)?
Raw Science Nanopublications Self-publishing
Datasets
Code
Experimental...
How?
Data Journals and Repositories
• FigShare
• Dryad
• DataVerse (social science)
• Institutional repositories
www.impactstory.org
3 | How the OHSU Library can help
1 | Large Lecture: Data Management 101
2 | 10–15 Small Groups: data playground
• 1 researcher paired with 2 or 3 library ...
Thank you!
URLs to resources
Go to:
http://libguides.ohsu.edu/data
Data101 pmcb retreat_09-20-13_final
Published in: Education, Technology
  • JW
  • JW
  • JW
  • JW
  • JW
  • JW
  • Ask them to think about what type of data they deal with/generate. Give a couple minutes.
  • Ask if they have additional data types that they brainstormed
  • JW
  • These are all things that the library can help you do
  • JW
  • JW
  • http://patenteux.com/Messy_desktop/messy_wallpaper-1280x1024.jpg
  • If you work on the command line, you can see all the file paths
  • JW
  • Show examples of versions. Can go back when you make mistakes when changes are made. Share work with other people. Both work on things at the same time and merge back together. Akin to game of telephone: version control can let you see exactly when a change was made.
  • Show examples of versions. Can go back when you make mistakes when changes are made. Share work with other people. Both work on things at the same time and merge back together. Akin to game of telephone: version control can let you see exactly when a change was made. NEW SLIDES: Examples of versions of data (Data101_NV_v1, Data101_NV_v2). Simple software solutions: some software keeps versions for you; show where to go get it. Version control software: SVN, GIT. Show example of Google Code. Can write commit messages for each version you commit.
  • NICOLE
  • Central servers will have multiple redundancy, backups of backups. High-quality secure USBs with passwords and encryption, or burn to disk.
  • JW
  • !
  • Move this
  • Information science is a parent
  • Ontologies classify terms and the relationships between them.
  • JW
  • Software that can rename your files, if you already have them named
  • Goal is to solve the author/contributor name ambiguity problem in scholarly communications. Creating a central registry of unique identifiers for individual researchers. Identifiers, and the relationships among them, can be linked to the researcher.
  • JW
  • JW
  • JW
  • JW
  • JW
  • JW
  • JW
  • Maybe discuss the PlumX project?
  • JW
  • Say that we won an award to sponsor this program

    1. 1. DATA MANAGEMENT 101 Nicole Vasilevsky, Jackie Wirz and Melissa Haendel PMCB New Student Orientation 20 September 2013
    2. 2. 1 | Data definitions 2 | Dealing with data 3 | How the OHSU Library can help
    3. 3. Nicole Vasilevsky, PhD Project Manager, Ontology Development Group Jackie Wirz, PhD Assistant Professor, Bioinformation Specialist Melissa Haendel, PhD Assistant Professor, Lead, Ontology Development Group
    4. 4. 1 | Data definitions
    5. 5. Data does not speak for itself…
    6. 6. YOU speak for YOUR data
    7. 7. But First, you need to manage it
    8. 8. But, even more fundamentally…
    9. 9. data means many things…
    10. 10. what does data mean to you?
    11. 11. What are data? Experimental data Social data School related data Personal data
    12. 12. Do you know what metadata is? a. Philosophy b. describes data c. dating site d. data
    13. 13. 2 | dealing with data
    14. 14. Do you get frustrated with any of the following? a. Storing data b. Backing up data c. Analyzing/manipulating data d. Finding data produced by other researchers/clinicians e. Ensuring data are secure f. Making data accessible to other researchers g. Controlling access to data h. Tracking updates to data (i.e., versioning) i. Creating metadata (i.e., describing the data to be more useful at a later time or by others) j. Protecting intellectual property rights k. Ensuring appropriate professional credit/citation is given to data sets/generated
    15. 15. Why? Personal organization Efficiency Credit where credit is due Accelerate scientific and clinical discovery Reproducibility of science and medicine
    16. 16. naming | metadata | tools | standards How?
    17. 17. naming
    18. 18. File naming
    19. 19. Naming conventions Project_instrument_location_YYYYMMDDhhmmss_extra.ext Index/grant conditions Leading zero! s/n, variable Retain order
    20. 20. Naming: Directory Structure
    21. 21. PMCB presentation Library presentation DMICE presentation Presentations PMCB Library DMICE
    22. 22. http://ftp.ihmc.us/
    23. 23. ReadMe
    24. 24. Version Control
    25. 25. Versioning • Save a copy of every version of a file • Follow a file naming convention Data101_PMCB_Retreat_09-20-13_v1 Data101_PMCB_Retreat_09-20-13_v2 Data101_PMCB_Retreat_09-20-13_Final
    26. 26. Versioning
    27. 27. Versioning
    28. 28. Versioning Version Control software: • GIT • SVN
    29. 29. Backups
    30. 30. Which of the following do you do? a. Save copies of data on a disk, USB drive, or computer hard drive b. Save copies of data on a local server c. Save copies of data on a central campus server d. Save copies of data on a web-based or cloud server e. Store data in a repository or archives f. Automatically backup files g. Manually generate backup h. Restrict access to files
    31. 31. Where can you back up your data? • 1 on your local workstation • 1 local/removable, such as external hard drive • 1 on central server • 1 remote, such as on a cloud server* *Depending on the type of data, as cloud servers are not always secure
    32. 32. Metadata
    33. 33. What is Metadata? Title Author Call number Publisher ISBN
    34. 34. Metadata: Your metadata should make your data understandable to others without your involvement - Anne Gilliland
    35. 35. Are you aware of data standards in your field?
    36. 36. data standards Data standards are the rules by which data are described and recorded. In order to share, exchange, and understand data, we must standardize the format as well as the meaning. http://www.usgs.gov/datamanagement/plan/datastandards.php
    37. 37. Controlled vocabularies
    38. 38. Structured data helps with searching Craigslist search: Chaise Craigslist matches on strings only Craigslist search: Fainting couch
    39. 39. Structured data helps with searching PubMed indexes articles with MeSH Terms
    40. 40. Structured data helps with searching
    41. 41. Why are CVs and Ontologies useful? • Can be used to structure your metadata • Are often used to structure information in databases Cell Ontology Linnean Taxonomy Order Genus Species Phylum Class Family Kingdom
    42. 42. tools
    43. 43. File renaming applications • Bulk Rename Utility (Windows) • Renamer (Mac) • PSRenamer
    44. 44. Data Management tools and repositories • Purpose: Software where you can organize, store and/or share data • Often contain metadata to assist with data entry and create structured data
    45. 45. Tools for data management
    46. 46. Repositories use Unique IDs • Digital Object Identifier (DOI) • Example: DOIs for publications – doi: 10.1371/journal.pbio.1001339 • Unique resource identifier (URI) • A URI will resolve to a single location on the web • URIs for people
    47. 47. • Example: • John L Campbell, Research Ecologist, Oregon State University, Corvallis OR • John L Campbell, Research Ecologist, Center for Research on Ecosystem Change, Durham, NC
    48. 48. standards
    49. 49. nomenclature
    50. 50. antibodies Western Blot Immunohistochemistry ELISA Co-immunoprecipitation ChIP Radioimmunoassay
    51. 51. FACS analysis of T cells from LNs and tumors T cells were liberated from LNs by disruption between two frosted glass slides. Cells from LNs and tumors were stained with various combination of the following Abs: FITC- CD4, allophycocyanin-CD25, PE Cy7-CD8, APC-CD62L, PE- CD25, PE Cy7-CD25, and biotinylated-KJ-126 and in some experiments made permeable with fixation/permeablization buffers and stained with PE-FoxP3 (eBioscience). Harvested samples, isotype controls, and single stain controls were run on the FACSCalibur (BD Biosciences). Ruby and Weinberg (2009) J Immunol. 182(3):1481-9.
    52. 52. Which antibody did they use in the paper?
    53. 53. A Solution: Antibody Registry antibodyregistry.org
    54. 54. Meet the Urban Lab Meet the Urban Lab
    55. 55. A+ organization! The Urban lab antibodies
    56. 56. Of 14 antibodies published in 45 articles, only 38% were identifiable [Bar chart, axis 0% to 90%, percent identifiable. Categories: Commercial Ab identifiable; Catalog number reported; Source organism reported; Target uniquely identifiable]
    57. 57. http://www.force11.org/node/4463 http://biosharing.org/bsg-000532
    58. 58. http://www.biosharing.org/standards/mibbi Minimum Information for Biological and Biomedical Investigations
    59. 59. data publication and sharing
    60. 60. Why share data? • Data sharing mandates • Further science and medicine • Build collaborations • Enable new discoveries with your data • Can be required at time of publication
    61. 61. Distribution of 2004–2005 citation counts of 85 trials by data availability.
    62. 62. How?
    63. 63. Beyond the PDF: What can be published (and cited)? Raw Science Nanopublications Self-publishing
    64. 64. Beyond the PDF: What can be published (and cited)? Raw Science Nanopublications Self-publishing Datasets Code Experimental design Argument or passage Blogging Microblogging Comments on existing work Annotations on existing work Single figure publications
    65. 65. How? Data Journals and Repositories • FigShare • Dryad • DataVerse (social science) • Institutional repositories
    66. 66. www.impactstory.org
    67. 67. 3 | How the OHSU Library can help
    68. 68. 1 | Large Lecture: Data Management 101 2 | 10–15 Small Groups: data playground • 1 researcher paired with 2 or 3 library staff • Tailored analysis of data reporting and instruction Save the date: 10/09/13 4-6pm 1k challenge award recipients
    69. 69. Thank you!
    70. 70. URLs to resources Go to: http://libguides.ohsu.edu/data