Analysing biomedical data (ERS, October 2017)

Presented at the European Respiratory Society, Berlin, October 2017. A high-level talk for a mix of clinicians and scientists on the difficulties of biomedical analysis, including practical, statistical and data issues.

Published in: Health & Medicine

  1. Analysing biomedical data Paul Agapow / Translational Bioinformatics DSI-ICL / October 2017
  2. Biomedical science is now data science
  3. The 4-headed beast ● The 4 heads ○ Acquisition ○ Storage ○ Analysis ○ Sharing ● Big data 4Vs ○ Velocity ○ Volume ○ Variety ○ Veracity
  4. The problems of biomedical data Many ... ● Types ● Formats ● Silos ● Gaps ● Interactions Difficult analysis ● The curse of dimensionality ● Multiple hypothesis testing & false discovery ● Batch effects ● Life history ● Biased sampling ● Need for integrative analysis Practical issues ● Unstructured data ● Managing big data ● Security ● Legal & privacy
  5. Future medicine A mix of promise & peril ● More data ○ Genomic medicine ○ Other “omic” medicine ○ Wearables ○ EHR & digital health ● P4 medicine ○ Stratification ○ Analysis at the bedside ○ Patient participation ● Translational medicine ○ Leveraging health data for research
  6. Scientific data doubles every 18 months. A new paper is published every 30 seconds. Most papers are never cited or even read. “No new principle will declare itself from below a heap of facts.” (Peter Medawar)
  7. The analytical challenges
  8. Liberating health data ● Enabling EHR for research ● Text extraction ● Unstructured to structured data
  9. Computationally intensive approaches ● Deep learning ● Concurrent computation ● Which to use? Which works? ● Implementation ● Interpretation ● Assisted & auto-discovery
  10. Integrative analysis ● The genome is not enough ● Complex interactions ● Statistical power ● Which is best? ● Interpretation
  11. Building knowledge bases ● Extracting structured information from unstructured input ● Veracity ● Exploring / querying
  12. Reproducibility
  13. The “solutions”
  14. Standards ● Clinical descriptions ● Measurements: ○ Blood pressure ○ White cells ● Cross-study Yes! ● Allows combining & comparing studies ● CDISC ● HPO But! ● A lot of work
  15. Data formats & storage Yes! ● Plain text ● Open formats ● Structured formats ● Advantages: ○ Human & machine readable ○ Unambiguous ○ WYSIWYG ● Examples: ○ Open bio formats ○ CSV, TSV No! ● Homebrew formats ● Proprietary / closed formats ● Binary formats ● Excel
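The case for plain-text, structured formats on the slide above can be illustrated with a short sketch. It writes a small table as TSV and reads it back, using only Python's standard `csv` module. The column names and values are hypothetical. Gene symbols such as SEPT2 are used because they are a commonly cited example of values that spreadsheet software may silently reinterpret as dates, whereas a text format returns exactly what was written.

```python
import csv
import io

# Hypothetical records; the column names are illustrative only.
rows = [
    {"sample_id": "S001", "gene": "SEPT2", "expression": "12.5"},
    {"sample_id": "S002", "gene": "MARCH1", "expression": "0.7"},
]

# Write tab-separated text to an in-memory buffer
# (a file on disk would work the same way).
buffer = io.StringIO()
writer = csv.DictWriter(
    buffer,
    fieldnames=["sample_id", "gene", "expression"],
    delimiter="\t",
    lineterminator="\n",
)
writer.writeheader()
writer.writerows(rows)
text = buffer.getvalue()
print(text)  # readable by eye as well as by machine

# Read it back: every field returns as the exact string that was written.
reader = csv.DictReader(io.StringIO(text), delimiter="\t")
round_tripped = [dict(record) for record in reader]
assert round_tripped == rows
```

Everything comes back as text, so numeric columns such as `expression` still need an explicit conversion step. That is arguably an advantage: the conversion is visible in the analysis code instead of being applied silently by the tool.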
  16. Workflow systems & notebooks ● Analysis as: ○ An executable recipe ○ A document or commentary ● Many candidates: ○ Workflows: ■ Snakemake ■ Nextflow ■ CWL etc. ○ Computational notebooks: ■ Jupyter / IPython ■ RMarkdown
  17. Deep learning / machine learning How do you know a biologist is using deep learning in their research? Don’t worry, they’ll tell you. ● “Just” optimization and search techniques ● Takes a set of features and produces a model that performs a classification or a regression ● A series of layers that assemble features into higher level features ● Several high-quality toolkits ● Some need for specialised hardware (GPU) ● Interpretability ● Ground truths ● Needs lots of data
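The point that machine learning is "just" optimization, taking features and producing a model that classifies, can be made concrete with a minimal sketch. It is not a deep network: it fits a single-layer logistic regression by gradient descent, in plain Python, on made-up toy data. Deep learning stacks many such layers, but the fitting loop follows the same idea. The function names, learning rate and step count are assumptions chosen for illustration.

```python
import math

# Toy, made-up data: one feature per sample and a binary label.
features = [0.5, 1.0, 1.5, 3.0, 3.5, 4.0]
labels = [0, 0, 0, 1, 1, 1]

def sigmoid(z):
    """Squash a real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def fit(features, labels, learning_rate=0.1, steps=10000):
    """Fit a weight and a bias by gradient descent on the mean log-loss."""
    weight, bias = 0.0, 0.0
    n = len(features)
    for _ in range(steps):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(features, labels):
            # Difference between predicted probability and true label.
            error = sigmoid(weight * x + bias) - y
            grad_w += error * x / n
            grad_b += error / n
        # Step downhill along the gradient.
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

def predict(x, weight, bias):
    """Classify as 1 when the predicted probability is at least 0.5."""
    return 1 if sigmoid(weight * x + bias) >= 0.5 else 0

weight, bias = fit(features, labels)
predictions = [predict(x, weight, bias) for x in features]
print(predictions)
```

The model only reproduces the labels it was given. This is why the slide lists ground truths, interpretability and the need for lots of data as open issues rather than solved ones.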
  18. The pitfalls
  19. Batch effects ● Technical sources of variation ○ Reagents ○ Technician ○ Platform ○ ... ● Solutions: ○ Plot data ○ Don’t batch ○ ComBat etc. (but loss of information)
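A minimal sketch of the first and last suggestions on the slide above: inspect the data per batch, then apply a correction. The correction shown is naive per-batch mean-centering, not ComBat, and the measurements are hypothetical. It assumes the batches are biologically comparable. If batch is confounded with a real biological difference, centering removes that signal too, which is the loss of information the slide warns about.

```python
from statistics import mean

# Hypothetical measurements of one analyte, tagged with the batch they were run in.
samples = [
    ("batch_A", 5.1), ("batch_A", 4.9), ("batch_A", 5.0),
    ("batch_B", 7.0), ("batch_B", 7.2), ("batch_B", 6.8),
]

def batch_means(samples):
    """Mean value per batch; a large gap between batches is a warning sign."""
    by_batch = {}
    for batch, value in samples:
        by_batch.setdefault(batch, []).append(value)
    return {batch: mean(values) for batch, values in by_batch.items()}

def center_by_batch(samples):
    """Subtract each batch's own mean from every sample in that batch."""
    means = batch_means(samples)
    return [(batch, value - means[batch]) for batch, value in samples]

print("before:", batch_means(samples))
corrected = center_by_batch(samples)
print("after: ", batch_means(corrected))
```

In practice the per-batch summary would be plotted rather than printed. A dedicated method would normally be preferred over plain centering once more than a shift in the mean is involved.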
  20. Omnigenics What if every gene affected every other gene? ● Pritchard et al. 2017 ● FOAF / six degrees of separation effect ● Implicated genes are a few drivers and an enormous number of “related” loci ● Context?
  21. The garden of forking paths Multiple hypothesis testing
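One standard response to the multiple-testing problem named on the slide above is to control the false discovery rate. The sketch below uses the Benjamini-Hochberg procedure, which the slides do not name. It is brought in here purely as an illustration, and the p-values are made up.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected,
    controlling the false discovery rate at level alpha."""
    m = len(p_values)
    # Rank the p-values from smallest to largest, remembering original positions.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k whose p-value satisfies p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, index in enumerate(order, start=1):
        if p_values[index] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below that cutoff.
    rejected = [False] * m
    for rank, index in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[index] = True
    return rejected

# Made-up p-values for illustration only.
p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
naive = [p < 0.05 for p in p_values]
bh = benjamini_hochberg(p_values, alpha=0.05)
print(sum(naive), "naive hits;", sum(bh), "after Benjamini-Hochberg")
# prints: 5 naive hits; 2 after Benjamini-Hochberg
```

Five of the ten tests pass an uncorrected 0.05 threshold, but only two survive the correction. A correction of this kind only helps when every test that was run is counted, including the analyses that were tried and discarded along the forking paths.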
  22. Conclusion Taming the 4-headed beast Acquiring: interpret EHR Storing: data formats & systems Analysing: statistics, correct for batch effects, integrative analysis, deep learning Sharing: standards, data formats, workflow systems