Approaches for the Integration of Visual and Computational Analysis of Biomedical Data

The integration of computational and statistical approaches with visualization tools is becoming crucial as biomedical data sets are rapidly growing in size. Finding efficient solutions that address the interplay between data management, algorithmic and visual analysis tools is challenging. I will discuss some of these challenges and demonstrate how we are addressing them in our Refinery Platform project (http://www.refinery-platform.org).

Published in: Science

  1. 1. Approaches for the Integration of Visual and Computational Analysis of Biomedical Data HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS NILS GEHLENBORG @nils_gehlenborg http://gehlenborglab.org
  2. 2. FRITZ LEKSCHAS HARVARD MEDICAL SCHOOL
  3. 3. BIG PILES OF DATA …
  4. 4. Data Repositories. General: ArrayExpress, GEO, MetaboLights, PRIDE, dbGaP, … Specialized: ENCODE, Roadmap Epigenomics, …
  5. 5. … OFFER OPPORTUNITIES …
  6. 6. SINGLE OR FEW DATA SETS Test hypotheses without generating new data. Use published data as supporting evidence for findings based on your own data sets. MANY DATA SETS Conduct meta-analyses, e.g. characterize expression patterns in human tissues or link diseases.
  7. 7. M. Lukk, et al., Nature Biotechnology, 28(4):322–324 (2010)
  8. 8. S. Suthram et al., PLoS Computational Biology, 6(2) (2010)
  9. 9. SINGLE OR FEW DATA SETS Test hypotheses without generating new data. Use published data as supporting evidence for findings based on your own data sets. MANY DATA SETS Conduct meta-analyses, e.g. characterize expression patterns in human tissues or link diseases. COMMON BEHAVIOR OF RESEARCH PARASITES!
  10. 10. N Gehlenborg et al., manuscript in preparation | DATA REPOSITORY, VISUALIZATION TOOLS, ANALYSIS PIPELINES
  11. 11. N Gehlenborg et al., manuscript in preparation | DATA REPOSITORY, VISUALIZATION TOOLS, ANALYSIS PIPELINES
  12. 12. ANALYSIS PIPELINES. N Gehlenborg et al., manuscript in preparation | DATA REPOSITORY, VISUALIZATION TOOLS, ANALYSIS PIPELINES
  13. 13. ANALYSIS PIPELINES. N Gehlenborg et al., manuscript in preparation | DATA REPOSITORY, VISUALIZATION TOOLS, ANALYSIS PIPELINES | GALAXY: Toolshed, Workflow Editor, Tools, REST API
  14. 14. ANALYSIS PIPELINES. N Gehlenborg et al., manuscript in preparation | DATA REPOSITORY, VISUALIZATION TOOLS, ANALYSIS PIPELINES | GALAXY: Toolshed, Workflow Editor, Tools, REST API, Workflow Inputs, Workflow Outputs
  15. 15. N Gehlenborg et al., manuscript in preparation | DATA REPOSITORY, VISUALIZATION TOOLS, ANALYSIS PIPELINES | http://www.refinery-platform.org
  16. 16. … BUT NOT SO FAST!
  17. 17. [Diagram; labels: Text-Based Search, Data Sets, Metadata, Data Files, Free Text Annotation Mapping, Keywords]
  18. 18. [Diagram; labels: Text-Based Search, Data Sets, Metadata, Data Files, Ontologies, Root, Terminal, subclassof, Free Text Annotation Mapping, Keywords]
  19. 19. [Diagram; same labels as slide 18]
  20. 20. [Diagram; same labels as slide 18]
  21. 21. [Diagram; same labels as slide 18]
  22. 22. [Diagram; labels: Semantic Visual Exploration, SATORI, Text-Based Search, Data Sets, Metadata, Data Files, Ontologies, Root, Terminal, subclassof, Free Text Annotation Mapping, Keywords]
  23. 23. SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories http://satori.refinery-platform.org
  24. 24. D R C Data Analyst Group Leader Data Curator
  25. 25. D R C Data Analyst Group Leader Data Curator
  26. 26. D R C Data Analyst Group Leader Data Curator
  27. 27. D R C Data Analyst Group Leader Data Curator
  28. 28. Need 1: find data sets that match certain experimental characteristics. Need 2: find data sets that are similar (or dissimilar) to given data sets. Need 3: get an overview of the distribution of the experimental characteristics across a collection of data sets. Need 4: get an overview of the annotation term hierarchy and term usage.
  29. 29. Peter Pirolli and Stu Card
  30. 30. SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories http://satori.refinery-platform.org
  31. 31. [Figure; labels: Scenario 1, Scenario 2, Scenario 3, Data sets, Annotations, Term, List graph, Tree, Tree map; terms A, B, C; steps 1 to 4]
  32. 32. SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories http://satori.refinery-platform.org
  33. 33. SATORI: A System for Ontology-Guided Visual Exploration of Biomedical Data Repositories http://satori.refinery-platform.org
  34. 34. The Art Institute of Chicago
  35. 35. Acknowledgements. HARVARD MEDICAL SCHOOL: Peter J Park & all members of the Computational Genomics Lab; Fritz Lekschas, Jennifer K Marx, Scott Ouellette, Anton Xue, Psalm Haseley. HARVARD SCHOOL OF PUBLIC HEALTH: Ilya Sytchev, Shannan Ho Sui. UNIVERSITY OF SHEFFIELD: David R Jones, Winston Hide. JOHANNES KEPLER UNIVERSITY LINZ: Stefan Luger, Holger Stitz, Marc Streit. Funding: NIH/NHGRI R00 HG007583, Harvard Stem Cell Institute. Web: http://satori.refinery-platform.org · http://refinery-platform.org
  36. 36. We are hiring postdocs & developers! HARVARD MEDICAL SCHOOL DEPARTMENT OF BIOMEDICAL INFORMATICS See http://gehlenborglab.org or http://dbmi.med.harvard.edu for details. Data visualization, analysis, and management for: • genomic structural variants • dynamics of the 3D genome • cancer subtypes in patient cohorts • exploration tools for data repositories • provenance graphs
  37. 37. [Figure; legend: Term, Terminal term, To be deleted, To be duplicated, Term size, Cumulative size; panels: 1. Global, 2. Tree Map, 3. Node-Link Diagram]
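
Slide 37 labels each ontology term with a "Term size" and a "Cumulative size", and slide 28 lists finding data sets by experimental characteristics as Need 1. The slides do not define either quantity, so the following Python sketch rests on assumptions: term size is taken to be the number of data sets annotated directly with a term, and cumulative size the number of distinct data sets annotated with the term or any of its descendants along subclass-of links. The function names, term labels, and data set identifiers are hypothetical and are not taken from SATORI or the Refinery Platform.

```python
from collections import defaultdict


def build_children(subclass_of):
    """Invert child -> parents links into parent -> children links."""
    children = defaultdict(set)
    for child, parents in subclass_of.items():
        for parent in parents:
            children[parent].add(child)
    return children


def cumulative_data_sets(term, children, annotations, cache):
    """Return the data sets annotated with `term` or any descendant.

    Assumes the subclass-of hierarchy is acyclic; `cache` memoizes
    results so shared descendants are visited only once.
    """
    if term in cache:
        return cache[term]
    result = set(annotations.get(term, ()))
    for child in children.get(term, ()):
        result |= cumulative_data_sets(child, children, annotations, cache)
    cache[term] = result
    return result


def term_sizes(subclass_of, annotations):
    """Map each term to (term size, cumulative size) as assumed above."""
    children = build_children(subclass_of)
    terms = set(subclass_of) | set(children) | set(annotations)
    cache = {}
    return {
        term: (
            len(annotations.get(term, ())),
            len(cumulative_data_sets(term, children, annotations, cache)),
        )
        for term in terms
    }


# Hypothetical toy hierarchy: D is a subclass of both B and C,
# and B and C are subclasses of the root term A.
SUBCLASS_OF = {"B": ["A"], "C": ["A"], "D": ["B", "C"]}

# Hypothetical annotations: term -> data sets annotated directly with it.
ANNOTATIONS = {"B": {"ds1"}, "C": {"ds2", "ds3"}, "D": {"ds3", "ds4"}}

SIZES = term_sizes(SUBCLASS_OF, ANNOTATIONS)

# Need 1 style query: data sets matching term B, including its subclasses.
MATCHES_B = cumulative_data_sets(
    "B", build_children(SUBCLASS_OF), ANNOTATIONS, {}
)
```

Counting distinct data sets, rather than summing child sizes, keeps a data set from being counted twice when a term such as D has more than one parent. The figure on slide 37 may use a different definition.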
