Probabilistic refinement of cellular pathway models

    Probabilistic refinement of cellular pathway models - Presentation Transcript

    1. Probabilistic refinement of cellular pathway models. Cambridge Statistical Laboratory, Networks seminar series, 2009 Jan 21. Florian Markowetz, florian.markowetz@cancer.org.uk
    2. What is a signaling pathway? Environmental stimuli -> receptor protein in the cell membrane -> protein cascade -> transcription factors regulating target genes. (Figure labels: pathway, protein, mRNA, DNA.)
    3. Pathway reconstruction. Signaling pathways are important: deregulation causes many diseases, including cancer. Signaling pathways are poorly understood: we have only parts lists; the interactions within and between pathways are missing. Biological research has so far mostly focused on individual genes. New genome-scale datasets offer an opportunity for data integration and novel methods.
    4. What data do we have? Proteins: interactions between proteins; binding to DNA. mRNA (bulk of data, microarrays): expression under different stimuli. DNA sequence: binding motifs; epigenetic marks. Morphology.
    5. Pathways as graphs. • Nodes are (mostly) known • Goal: infer edges from data • Data are heterogeneous. Edges: co-expression between genes; interactions between proteins; binding motifs at genes; binding of proteins to DNA. Nodes: protein domains; functional annotation. Paths: cause-effect data from changing environments and experimental perturbations.
    6. Pathway reconstruction. "Classical" statistical approaches treat the genes/proteins as random variables and explore the correlation structure in the data: correlation graphs; Gaussian graphical models (partial correlation); Bayesian networks. Challenges/problems/opportunities: (1) correlation may be uninformative; (2) integrating heterogeneous, noisy, and complementary data sources. Review: Markowetz and Spang (2007).
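
To make the difference between a correlation graph and a Gaussian graphical model concrete, here is a minimal sketch of a first-order partial correlation. The three variables X, Y, Z and the correlation values are hypothetical and chosen only for illustration; the chain X -> Z -> Y is assumed to be linear with standardized variables, so that the X,Y correlation is the product of the two edge correlations.

```python
import math

def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of X and Y given Z."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz**2) * (1 - r_yz**2))

# Hypothetical chain X -> Z -> Y: X and Y are related only through Z.
r_xz, r_yz = 0.8, 0.8
r_xy = r_xz * r_yz  # correlation implied by the assumed linear chain

direct = partial_correlation(r_xy, r_xz, r_yz)
print(f"marginal correlation X,Y:    {r_xy:.2f}")
print(f"partial correlation X,Y | Z: {direct:.2f}")
```

A correlation graph would draw an X-Y edge here, while the partial correlation removes it once Z is accounted for.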
    7. – Part 1 – Nested Effects Models
    8. Experimental perturbations: drugs, small molecules, RNAi, stress, knockout. (Figure labels: protein, mRNA, DNA.) Readout: global gene expression measurements.
    9. Drosophila immune response. Columns: perturbed genes. Rows: effects on other genes. (1) Silencing tak1 reduces expression of all LPS-inducible transcripts. (2) Silencing rel (key) or mkk4/hep reduces expression of subsets of induced transcripts (Boutros et al., Dev Cell 2002).
    10. (!) Two types of entities: components of the signaling pathway, which are experimentally perturbed, and downstream effect reporters.
    11. (!!) Only indirect information. No direct observation of perturbation effects on other pathway components! Inference from observed perturbation effects on downstream reporters.
    12. The information gap. Direct information: effects are visible at other pathway components. Indirect information: effects are only visible at downstream reporters, e.g. cell survival or death, growth rate, downstream genes. (Figure: the same pathway with nodes A, B, C, D in both panels.)
    13. Correlation won't do. "Classical" approach: correlation, graphical models (Bayes nets, GGMs), mutual information. Alternative: Nested Effects Models, which work from the downstream regulated genes. (Figure: pathway with nodes A, B, C, D.)
    14. Nested Effects Models. INPUT: (1) a set of candidate pathway genes; (2) a high-dimensional phenotypic profile, e.g. microarray. OUTPUT: a graph representation of information flow explaining the phenotypes. (Figure: phenotypic profiles of gene perturbations A to H with their effects, next to the inferred pathway.)
    15. NEM: model formulation. Pathway genes X, Y, Z: core topology, to be reconstructed (= model M). Effect reporters E1, …, E6: states are observed (= data D); positions in the pathway are unknown (= parameters θ). (Figure: expected vs. observed effects of X, Y, Z on E1 to E6 for M'xyz, with false negatives (FN) and false positives (FP) marked.) Posterior: $P(M \mid D) = \frac{1}{Z}\, P(D \mid M)\, P(M)$, where $P(D \mid M)$ is the marginal likelihood.
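
The "expected" side of this formulation can be sketched in a few lines: perturbing a gene is predicted to affect every reporter attached to that gene or to any gene downstream of it. This is a minimal illustration, not the author's implementation; the linear pathway X -> Y -> Z and the attachment of E1 to E6 are assumptions, since the slide figure is not recoverable from the transcript.

```python
def transitive_closure(nodes, edges):
    """Return reach[g]: the set of genes reachable from g, including g itself."""
    reach = {g: {g} for g in nodes}
    changed = True
    while changed:
        changed = False
        for a, b in edges:
            for g in nodes:
                # If g reaches a, it also reaches everything b reaches.
                if a in reach[g] and not reach[b] <= reach[g]:
                    reach[g] |= reach[b]
                    changed = True
    return reach

def expected_effects(nodes, edges, attachment):
    """Map each perturbed gene to the set of reporters predicted to respond."""
    reach = transitive_closure(nodes, edges)
    return {
        g: {e for e, parent in attachment.items() if parent in reach[g]}
        for g in nodes
    }

# Hypothetical model M (pathway) and parameters theta (reporter positions).
genes = ["X", "Y", "Z"]
pathway = [("X", "Y"), ("Y", "Z")]
theta = {"E1": "X", "E2": "X", "E3": "Y", "E4": "Y", "E5": "Z", "E6": "Z"}

predicted = expected_effects(genes, pathway, theta)
for gene in genes:
    print(gene, sorted(predicted[gene]))
```

The predicted effect sets are nested (Z inside Y inside X), which is the pattern the model name refers to.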
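    16. Likelihood P(D | M, θ). Compare predictions with observations. Prediction: E1 = 0, E2 = 1. Observations: (1) E1 = 1, E2 = 1; (2) E1 = 0, E2 = 1. Error probabilities: e.g. false negative rate 20%, false positive rate 5%. E1 is predicted to be 0, so observing 1 is a false positive (0.05) and observing 0 has probability 0.95; E2 is predicted to be 1, so observing 1 has probability 1 − 0.20 = 0.80. Lik = Pr(E1 = 1) ⋅ Pr(E2 = 1) ⋅ Pr(E1 = 0) ⋅ Pr(E2 = 1) = 0.05 ⋅ 0.80 ⋅ 0.95 ⋅ 0.80 ≈ 0.030. (Figure: pathway genes X, Y, Z with reporters E1 and E2.)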
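
The arithmetic on this slide can be checked with a short sketch. It assumes the usual reading of the error rates: a reporter predicted to show an effect is observed as 1 with probability 1 minus the false negative rate, and a reporter predicted to show no effect is observed as 1 with probability equal to the false positive rate. Under that reading the four factors are 0.05, 0.80, 0.95 and 0.80. The function names are illustrative only.

```python
def observation_probability(predicted, observed, fp_rate, fn_rate):
    """Probability of one binary observation given the model's prediction."""
    if predicted == 1:
        return 1 - fn_rate if observed == 1 else fn_rate
    return fp_rate if observed == 1 else 1 - fp_rate

def likelihood(prediction, observations, fp_rate=0.05, fn_rate=0.20):
    """Product of observation probabilities over replicates and reporters."""
    lik = 1.0
    for obs in observations:
        for reporter, predicted in prediction.items():
            lik *= observation_probability(predicted, obs[reporter], fp_rate, fn_rate)
    return lik

prediction = {"E1": 0, "E2": 1}
observations = [{"E1": 1, "E2": 1}, {"E1": 0, "E2": 1}]

lik = likelihood(prediction, observations)
print(f"likelihood = {lik:.4f}")
```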
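    17. Marginal likelihood. $P(D \mid M) = \int P(D \mid M, \Theta)\, P(\Theta \mid M)\, d\Theta = \frac{1}{n^m} \prod_{i=1}^{m} \sum_{j=1}^{n} \prod_{k=1}^{l} P(e_{ik} \mid M, \theta_i = j)$. Reading the terms: $1/n^m$ is the uniform prior over positions; the outer product runs over all $m$ effect reporters; the sum averages over the $n$ possible positions in the pathway; the inner product runs over the $l$ replicate observations; $P(e_{ik} \mid M, \theta_i = j)$ is the distribution of a single effect reporter with known position.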
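
A minimal sketch of this marginal likelihood for binary data follows. It is an illustration under stated assumptions rather than the published implementation: each observation is taken to be one perturbation experiment recording a 0/1 state per reporter, the two-gene models and the data are invented, and the error rates are the example values of 5% false positives and 20% false negatives.

```python
def p_obs(predicted, observed, fp, fn):
    """Probability of a binary observation given the predicted state."""
    if predicted == 1:
        return 1 - fn if observed == 1 else fn
    return fp if observed == 1 else 1 - fp

def marginal_likelihood(genes, reach, reporters, experiments, fp=0.05, fn=0.20):
    """P(D | M) with reporter positions averaged out under a uniform prior.

    reach[g] is the set of genes affected when g is perturbed (g included).
    experiments is a list of (perturbed_gene, {reporter: 0 or 1}) pairs.
    """
    n = len(genes)
    total = 1.0
    for reporter in reporters:               # product over effect reporters i
        position_sum = 0.0
        for position in genes:               # sum over possible positions j
            term = 1.0
            for perturbed, obs in experiments:   # product over observations k
                predicted = 1 if position in reach[perturbed] else 0
                term *= p_obs(predicted, obs[reporter], fp, fn)
            position_sum += term
        total *= position_sum / n            # uniform prior: factor 1/n per reporter
    return total

genes = ["X", "Y"]
reporters = ["E1", "E2"]
experiments = [("X", {"E1": 1, "E2": 1}), ("Y", {"E1": 0, "E2": 1})]

model_a = {"X": {"X", "Y"}, "Y": {"Y"}}  # hypothetical model: X -> Y
model_b = {"X": {"X"}, "Y": {"Y"}}       # hypothetical model: no edge

ml_a = marginal_likelihood(genes, model_a, reporters, experiments)
ml_b = marginal_likelihood(genes, model_b, reporters, experiments)
print(f"P(D | X -> Y)   = {ml_a:.4f}")
print(f"P(D | no edge)  = {ml_b:.4f}")
```

In this toy data set the model with the edge X -> Y scores higher, because E2 responds to both perturbations and that is only explained when Y sits downstream of X.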
    18. NEM: inference. Model space: all transitively closed directed graphs. Exhaustive enumeration: score all models to find the one fitting the data best (Markowetz et al., Bioinformatics, 2005). MCMC, simulated annealing: take small probabilistic steps to explore model space (with A. Tresch; in preparation). Divide and conquer: break a big model into smaller, manageable pieces and then re-assemble (Markowetz et al., ISMB 2007).
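
To give a feel for why exhaustive enumeration only works for small pathways, the following sketch lists all transitively closed directed graphs on n nodes by brute force. It assumes self-loops are implicit, so two genes may point at each other (meaning they are indistinguishable); this is an illustration of the model space, not the search code used in the cited papers.

```python
from itertools import combinations, permutations

def is_transitive(edge_set, n):
    """Check that i -> j and j -> k imply i -> k for all distinct i, j, k."""
    for i, j, k in permutations(range(n), 3):
        if (i, j) in edge_set and (j, k) in edge_set and (i, k) not in edge_set:
            return False
    return True

def transitively_closed_graphs(n):
    """Yield every transitively closed edge set on nodes 0..n-1."""
    pairs = list(permutations(range(n), 2))
    for size in range(len(pairs) + 1):
        for subset in combinations(pairs, size):
            edge_set = frozenset(subset)
            if is_transitive(edge_set, n):
                yield edge_set

counts = {n: sum(1 for _ in transitively_closed_graphs(n)) for n in (2, 3, 4)}
print(counts)
```

The counts grow quickly with n, which is the motivation for the stochastic and divide-and-conquer strategies listed on the slide.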
    19. NEM: extensions. Likelihood based on log-ratios of effects. Drop the transitivity requirement. Feature selection to concentrate on informative effect reporters. Tresch and Markowetz (2008).
    20. NEMs on Drosophila data
    21. Summary of part 1. (1) Gene perturbation screens with gene-expression readouts. (2) Perturbation screens suffer from the information gap between pathways and reporters. (3) Nested Effects Models reconstruct pathway features from subset relations between observed effects.
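
The subset idea in point 3 can be illustrated for the noise-free case: if every reporter affected by perturbing B is also affected by perturbing A, propose an edge A -> B. The gene names, reporter names and effect sets below are hypothetical, and real data would need the probabilistic scoring described earlier rather than exact set inclusion.

```python
def nested_edges(effects):
    """Propose an edge a -> b whenever the effects of b are a subset of those of a."""
    return {
        (a, b)
        for a in effects
        for b in effects
        if a != b and effects[b] <= effects[a]
    }

# Hypothetical noise-free effect sets for three perturbed genes.
effects = {
    "A": {"E1", "E2", "E3", "E4"},
    "B": {"E3", "E4"},
    "C": {"E4"},
}

edges = nested_edges(effects)
print(sorted(edges))
```

Because set inclusion is transitive, the proposed graph is automatically transitively closed.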
    22. – Part 2 – Data integration and probabilistic refinement of a signaling pathway hypothesis
    23. Pathway refinement. (1) Start from a given pathway hypothesis: even if our understanding of pathways is poor, that does not mean we have none at all! (2) Evaluate the evidence for the hypothesis in data. (3) Identify weakly supported areas and likely extensions. Not reconstruction from scratch. Step 1: assemble a pathway hypothesis (KEGG, literature, …) for the pheromone response pathway in yeast.
    24. Edge data I: support for the hypothesis in protein-protein interaction data
    25. Edge data II: support for the hypothesis in co-expression data
    26. Edge data III: why is it so hard to reconstruct the nuclear regulatory network from correlations?
    27. Edge data IV: support for the hypothesis in TF-DNA binding data
    28. Paths: cause-effect data. Expression profiling of knock-out mutants (Hughes et al., 2000). Result: the transcriptional response to a perturbation is only visible on downstream genes (information gap!).
    29. Conclusion from data analysis. • Every data source is informative for a specific compartment of the pathway. • No data source is informative in all compartments. • We expect these observations also to hold for other MAPK and signaling pathways. Need: a compartment-specific integrative model encompassing edge, node, and path data.
    30. Integrative model. Conditional distributions for each data type; pathway graph as hidden/latent variables; prior; parameters. The graphical model defines the posterior P(G | data) -> inference by Gibbs sampler. Different data types contribute to each compartment.
    31. Evaluation. (1) Fit model parameters on the pheromone response pathway (training). (2) Use the fitted model on other MAPK pathways (generalization to closely related examples). (3) Use the fitted model on all other yeast signaling pathways (generalization to everything else). … work in progress …
    32. Acknowledgements. Nested Effects Models: Rainer Spang (Univ. Regensburg), Dennis Kostka (UC SF), Achim Tresch (Gene Center Munich), Holger Fröhlich (DKFZ Heidelberg), Tim Beißbarth (Univ. Göttingen), Josh Stuart and Charlie Vaske (UC SC). Data integration: Olga G. Troyanskaya (Princeton), Edoardo Airoldi (Harvard), David Blei (Princeton).
    33. Probabilistic refinement of cellular pathway models. Thank you! Florian Markowetz, florian.markowetz@cancer.org.uk

    Talk at the Network Seminar Series of the Cambridge more

    © All Rights Reserved