Reproducible, Open Data Science in the Life Sciences

Digital Research 2013 presentation on "Reproducible, Open
Data Science in the Life Sciences"

Published in: Technology, Education

Transcript

  • 1. Reproducible, Open Data Science in the Life Sciences. Digital Research 2013, 9th-10th September 2013. Eamonn Maguire, Lead Software Engineer, University of Oxford e-Research Centre.
  • 2. What is “data science”? These days, all hard sciences are “data sciences”. One definition: “data science” is the storage, management and analysis of data sets. Or: “data science” is formalizing a hypothesis given a set of observations and assumptions, grabbing data about that hypothesis, testing it and analyzing it to either confirm or falsify the hypothesis. We shift the focus of science from performing physical experiments, where data is the by-product used to test a hypothesis, to working directly with the data. The two definitions have different levels of validity in terms of the etymology of the word “science”, but in this presentation they go very much hand in hand.
  • 3. Why reproducible and open? Open: experiments are expensive and often funded publicly, and data from experiments may extend way beyond the realms originally intended. Reproducible: findings need to be robust and testable by the wider scientific community. Both are provided with, and enabled by, data, metadata, analysis methods and algorithms. One without the other is really no use at all.
  • 4. The workflow of a “data scientist”... [Diagram: the data scientist surrounded by the workflow stages Planning, Data Collection (use existing data or perform a new experiment), Data Management, Analysis, Visualization and Publication.]
  • 5. Planning. What is your hypothesis? What do you need to prove or disprove it? You need an experimental design: preferably balanced groups, with enough samples to make it statistically valid. If you need to generate the data, there’s an app for that: the Design Wizard. Plan your experiment by answering questions about what you want to measure and how you want to measure it, then let the tool create the design. It creates the ISA-Tab stub, leaving you to fill in which files match which biological samples.
  • 6. Planning. You also need a data management strategy. Which ontologies, minimal information checklists and exchange formats can be used for my domain? What are my funder’s requirements for data deposition? Which databases support my data?
  • 7. Data Collection. Perform a new experiment: collect the data and metadata from an experiment. Or use existing data and metadata to test a hypothesis. [Figure: a table matching SAMPLE 1 to SAMPLE 11 against data files (FILE 1, FILE 2, ...).]
  • 8. Data Collection: new experiment data. Create templates to fit the type of experiments to be described. Curate your experiment using a desktop-based, platform-independent tool. Describe and curate your experiment with geographically distributed collaborators. Enables the creation of meaningful experiment data in a simple, extendable format. Check out http://isa-tools.org to download. [Slide graphic labelled: Excel.]
  • 9. Data Collection: use existing data.
  • 10. Data Management.
  • 11. Data Management. Shifting towards a new system.
  • 12. Analysis. The interesting bit: doing something with our data and metadata. Analysis of ISA-Tab data in the R language: brings together the context and data to enable more meaningful analysis, and suggests packages to use for analysis based on the data types in the ISA-Tab file. Publication coming soon. Analysis of ISA-Tab data in the Galaxy Environment: creates Galaxy Library objects from ISA-Tab files. Analysis of ISA-Tab data in the GenomeSpace Environment: load and edit files stored on distributed servers. Created by Brad Chapman at the Harvard School of Public Health.
  • 13. Visualization. Check out your experiment and visualize the experimental design. References: “Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs”, Maguire et al, 2013, IEEE TVCG; “Taxonomy-based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments”, Maguire et al, 2012, IEEE TVCG.
  • 14. Publication. Getting your work out there... Publish, along with your research articles and specialised community repositories. Share, link and reason over experiments with linked data.
  • 15. Publication: current work. Encapsulates all the information about the experiment, providing links to the data files, publications and analysis protocols. [Diagram labels: Metadata, Data files, Analysis results, Publications, Presentations, Logs, Workflows (analysis workflows in the Galaxy Environment).] See http://www.slideshare.net/GigaScience/scott-edmunds-ismb-talk-on-big-data-publishing for a use case showing how we achieve this.
  • 16. Box it all up. The role of a data scientist (or, in the life sciences, a bioinformatician) is multifold. We’ve presented a suite of tools to help data scientists manage their data and to support the creation of open, meaningful life science data. [Diagram: the data scientist workflow (Planning, Data Collection, Data Management, Analysis, Visualization, Publication), with its stages labelled a to f.] “Data science”: the storage, management and analysis of data sets. “Data science”: formalizing a hypothesis given a set of observations and assumptions, grabbing data about that hypothesis, testing it and analyzing it to either confirm or falsify the hypothesis. With the systems we have in place for data discovery, paired with data already created with the ISA suite of tools, we make data integration possible.
  • 17. The workflow of a “data scientist”... Making science with data possible... [Diagram: two copies of the data scientist workflow side by side.]
  • 18. The workflow of a “data scientist”... And this... Making science with data possible... [Diagram: three copies of the data scientist workflow.]
  • 19. The workflow of a “data scientist”... Making science with data possible... [Diagram: four copies of the data scientist workflow.]
  • 20. The workflow of a “data scientist”... Making science with data possible... [Diagram: many copies of the data scientist workflow, some showing only a subset of the stages.]
  • 21. Recent Publications. “Visual Compression of Workflow Visualizations with Automated Detection of Macro Motifs”, Maguire et al, 2013, IEEE TVCG; “Taxonomy-based Glyph Design – with a Case Study on Visualizing Workflows of Biological Experiments”, Maguire et al, 2012, IEEE TVCG; “ISA software suite”: overview of ISA-Tab and first set of tools, Rocca-Serra et al, 2010, Bioinformatics; “Towards interoperable bioscience data”: presenting the ISA Commons, authored by more than 50 collaborators at over 30 scientific organizations around the globe, Sansone et al, 2012, Nature Genetics; “OntoMaton: a Bioportal powered ontology widget for Google Spreadsheets”, Maguire et al, 2013, Bioinformatics; “The Harvard Stem Cell Discovery Engine: an integrated repository and analysis system for [...]”, Ho Sui et al, 2012, Nucleic Acids Research. Standardizing data: “ISA-Tab-Nano”: ISA-Tab extension for nanotechnology applications, authored by over 20 organizations incl. government agencies, academia and industry, Baker et al, 2013, Nature Nanotechnology; “MetaboLights”: an open-access repository for metabolomics at EBI powered by ISA, Haug et al, 2013, Nucleic Acids Research; “The ToxBank Data Warehouse”: a research cluster of 7 EU FP7 Health systems toxicology and toxicogenomics projects develops the ISAtoRDF module, Kohonen et al, 2013, Molecular Informatics.
  • 22. Thanks to. ISA team: Susanna-Assunta Sansone, Philippe Rocca-Serra, Eamonn Maguire, Alejandra Gonzalez-Beltran. Contributors: Marco Brandizi, Natalija Sklyar, Brad Chapman, Bob MacCallum, Kenneth Haug, Pablo Conesa, Audrey Kauffman. Funders and our many collaborators! [Logos include: Stem Cell Commons, Nanotechnology Informatics Working Group.]
  • 23. Questions? http://isa-tools.org http://isacommons.org http://biosharing.org
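
The planning step on slide 5 (balanced groups, then a stub in which the data files are still to be matched to samples) can be sketched roughly as follows. This is only an illustration, not how the Design Wizard is implemented: the function names `balanced_design`, `to_tsv` and `samples_missing_files` are hypothetical, the factors and levels are made up, and although the column headers are styled after ISA-Tab conventions, the output is not a complete or validated ISA-Tab file.

```python
import csv
import io
from itertools import product


def balanced_design(factors, replicates):
    """Return one row per sample for a full-factorial design.

    Every combination of factor levels gets the same number of
    replicates, so the groups are balanced by construction.
    """
    names = list(factors)
    rows = []
    sample_no = 1
    for levels in product(*(factors[name] for name in names)):
        for _ in range(replicates):
            row = {"Sample Name": f"SAMPLE {sample_no}"}
            for name, level in zip(names, levels):
                # Header styled after ISA-Tab factor columns (assumption).
                row[f"Factor Value[{name}]"] = level
            # Left blank on purpose: filled in once the data exist.
            row["Raw Data File"] = ""
            rows.append(row)
            sample_no += 1
    return rows


def to_tsv(rows):
    """Serialise a non-empty list of design rows as tab-delimited text."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=list(rows[0]), delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def samples_missing_files(rows):
    """List the samples that have not yet been matched to a data file."""
    return [row["Sample Name"] for row in rows if not row["Raw Data File"]]


# Hypothetical study: 2 treatments x 2 time points, 3 replicates per group.
design = balanced_design(
    {"treatment": ["control", "drug"], "time": ["0h", "24h"]}, replicates=3
)
stub = to_tsv(design)
```

Once a file name is written into a row's `Raw Data File` field, `samples_missing_files(design)` no longer reports that sample, which mirrors the sample-to-file matching shown on slide 7.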
