Interpreting transcriptomics (ERS Berlin 2017)

Presented at the European Respiratory Society, Berlin, October 2017. A high-level talk for a mix of clinicians and scientists on analyzing transcriptomic / gene expression data.

Published in: Science

  1. 1. Clusters, pathways, context
     Interpreting transcriptomic data
     Paul Agapow, Translational Bioinformatics, Data Science Institute
     Syst. Medicine in Resp. Disease, Berlin, October 2017
  2. 2. Expression data can tell us ...
     • Which genes are transcribed more / less?
     • What’s the difference between:
       – Cell lines?
       – Healthy & unhealthy tissue?
       – Tissues?
       – Patients with & without a SNP?
  3. 3. Why study expression data?
     • Dynamic
     • Responsive
     • Quantifiable
     • More informative
     But:
     • (Processing)
     • Comparative analysis
     • Multiple technologies
     • Cut-offs
     • Batch effects
     • Power
     • Looking at the right place / time?
     • Interpretation
  4. 4. Platforms
     • Microarrays:
       – DNA anchored to a solid surface
       – Assess RNA that binds to it
       – “Old” (90s)
       – Noisy
       – Finds what’s on the chip
     • RNA-seq:
       – Deep-sequencing of RNA
       – More accurate & reliable
       – More expensive
       – High throughput
       – Finds everything
  5. 5. Tools: Bioconductor & limma
     1. Bioconductor: set of R software libraries for analysis of high-throughput data
       – Inter-operable
       – Documented
     2. limma: Bioconductor library for transcriptomic analysis
  6. 6. Interpretation: Clustering
     Put similar things together:
     • Gene expression patterns (co-regulation, modules)
     • Patients (stratification)
     But:
     • What’s a cluster / similarity?
     • Allow for noise
     • Comparison
     • Is it ontologically real?
  7. 7. How to cluster
     Many methods, but:
     • K-Means / K-Medians clustering
       – Simple
       – Stochastic, define K
       – Best with spherical data
     • Hierarchical clustering
       – Levels of granularity
       – Produces dendrogram
       – Computationally complex
     But:
     • Little comparative work
     • No support / confidence
     • Supervised vs unsupervised
     • Poor reproducibility
       – Bootstrap / Jackknife
     • Comparing clusters
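
To make the "simple, stochastic, define K" points concrete, here is a minimal sketch of plain k-means (Lloyd's algorithm) using only the Python standard library. It is not code from the talk: the function name `kmeans`, its parameters, and the toy expression values are all assumptions for illustration.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # random start: results can depend on the seed
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid
        # (squared Euclidean distance, hence the preference for spherical groups).
        labels = [
            min(range(k), key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return labels, centroids

# Toy "expression profiles" (invented values): two well-separated groups.
profiles = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels, centroids = kmeans(profiles, k=2)
print(labels)
```

K has to be supplied up front and the starting centroids are random, so on less tidy data different seeds can give different clusters. Rerunning on resampled data is one way to probe the reproducibility concern raised on the slide.
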
  8. 8. Clustering assumptions
     • Incorrect number of clusters
     • Non-spherical distribution
     • Unequal variance
     • Unequal group size
  9. 9. Comparing clusters
     How do you compare clusters obtained from 2+ different experiments?
     • Especially if clusters labelled differently
     • If separation poor
     • If clusters nest
     Possible measures:
     • Adjusted mutual information (sklearn)
       – No nesting
     • Conditional entropy
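
As a rough illustration of the conditional entropy option, the sketch below computes H(target | given) for two labelings of the same samples. It is an assumption-laden example, not the speaker's code: the function name and the three toy labelings are invented. Adjusted mutual information is available in scikit-learn as `sklearn.metrics.adjusted_mutual_info_score`; the sketch sticks to the standard library.

```python
from collections import Counter
from math import log2

def conditional_entropy(target, given):
    """H(target | given) in bits, for two equal-length label sequences."""
    n = len(target)
    joint = Counter(zip(given, target))   # counts of (given label, target label) pairs
    given_counts = Counter(given)         # counts of each given label
    # H(T|G) = sum over (g, t) of p(g, t) * log2(p(g) / p(g, t))
    return sum((c / n) * log2(given_counts[g] / c) for (g, _), c in joint.items())

# Invented labelings of the same six samples.
run_a = ["x", "x", "x", "y", "y", "y"]
run_b = [2, 2, 2, 1, 1, 1]   # same partition as run_a, different label names
run_c = [0, 0, 1, 2, 2, 3]   # finer partition nested inside run_a

same_partition = conditional_entropy(run_a, run_b)    # relabelling costs nothing
coarse_given_fine = conditional_entropy(run_a, run_c) # fine labels determine coarse ones
fine_given_coarse = conditional_entropy(run_c, run_a) # coarse labels leave uncertainty
print(same_partition, coarse_given_fine, fine_given_coarse)
```

The measure ignores label names, and its asymmetry is what makes nesting visible: a value of zero in one direction and a positive value in the other suggests that one clustering subdivides the other.
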
  10. 10. Interpretation: enrichment
     • Match genes against lists
     • Associate a gene with a compartment or pathway
     • Examine enrichment / downregulation
     But:
     • What’s a pathway?
     • Are they right?
     • Statistical basis
     • Many choices
     • Post-transcriptional regulation?
  11. 11. Enrichment
     • Popular tools:
       – DAVID (not updated?)
       – GSEA
       – Ingenuity / Metacore
       – Bioconductor
     • Individual cases:
       – Hypergeometric test
     • Gives you support
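
The hypergeometric test for an individual gene set can be sketched in a few lines. This is a minimal one-sided (over-representation) version using only the standard library; the function name, argument names, and the gene counts in the usage line are hypothetical, not figures from the talk.

```python
from math import comb

def hypergeom_enrichment_p(n_universe, n_in_set, n_selected, n_overlap):
    """P(overlap >= n_overlap) when n_selected genes are drawn without
    replacement from n_universe genes, n_in_set of which are in the gene set."""
    total = comb(n_universe, n_selected)
    upper = min(n_in_set, n_selected)  # the overlap cannot exceed either count
    tail = sum(
        comb(n_in_set, k) * comb(n_universe - n_in_set, n_selected - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Hypothetical numbers: 20000 genes measured, 100 in a pathway,
# 200 called differentially expressed, 5 of those in the pathway.
p_value = hypergeom_enrichment_p(20000, 100, 200, 5)
print(p_value)
```

When many gene sets are tested, the resulting p-values would normally need a multiple-testing correction, which this sketch leaves out.
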
  12. 12. Interpretation: contextualization
     • Many knowledge bases are a potpourri of undifferentiated “facts”
       – Incomplete
       – Where / what / how?
     • Use curated knowledge bases
     • Traverse graphs
  13. 13. Graph databases for knowledge representation
     • Use graph databases for
     • Traverse graphs for “neighbours”
       – Shortest paths connecting COL6A5, a protein implicated in airway remodelling, to asthma
     • Stats / support?
     • Hypothesis generation
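
The shortest-path idea can be illustrated without a graph database, using breadth-first search over an adjacency list. Everything in the toy graph below apart from the two endpoint names is a placeholder: the intermediate nodes and all edges are invented for illustration and do not represent real biological relationships or the knowledge base used in the talk.

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path as a list, or None."""
    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbour in graph.get(node, ()):
            if neighbour not in previous:
                previous[neighbour] = node
                queue.append(neighbour)
    return None

# Toy directed graph: placeholder nodes and invented edges, for illustration only.
toy_graph = {
    "COL6A5": ["placeholder_process", "placeholder_gene"],
    "placeholder_gene": ["placeholder_pathway"],
    "placeholder_pathway": ["asthma"],
    "placeholder_process": ["asthma"],
    "asthma": [],
}
path = shortest_path(toy_graph, "COL6A5", "asthma")
print(path)
```

A path found this way is only a candidate link for hypothesis generation; it carries no statistical support on its own, which is the open question the slide raises.
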
  14. 14. Conclusions?
     • Science is hard
     • Assumptions are important
     • Obtaining support / confidence / validation is difficult
     • ... but important
