Pathway Discovery in Cancer: the Bayesian Approach


This is the 2008 presentation of my master's thesis, "Pathway discovery in cancer: the Bayesian approach".
In this thesis I focused on the analysis of ovarian cancer microarray data to predict gene-gene interactions using supervised learning and probabilistic methods.



  1. Pathway Discovery in Cancer: the Bayesian Approach
     Francesco Gadaleta
     Developed and written at ESAT, Department of Electrical Engineering, Faculty of Engineering, Katholieke Universiteit Leuven (Belgium)
  2. Genes and Diseases
     Biological assumptions
     • Cancer normally originates in a single cell
     • A cell's life is regulated by many genes, activated in different steps
     Types of genes
     • Oncogenes
     • Tumor-suppressor genes
     • DNA-repair genes
  4. Genes and Diseases: genetic predisposition
     [Figure: normal cell → first mutation → second mutation → third mutation → malignant cell]
  5. Genes and Diseases: microarray technology
     [Figure: mRNA is isolated from cancer cells and from normal cells (RNA isolation), converted to cDNA by reverse transcriptase and labeled with red (cancer) and green (normal) fluorescent probes; the two labeled cDNA samples are combined on the target]
  6. Genes and Diseases: the goal of biologists and geneticists
     • Prenatal diagnosis of recognized diseases, e.g. Down syndrome
     • Carrier testing, to help couples with a hereditary disease in the risky decision of having children
     • Patient-tailored diagnosis of genetic diseases
  7. Goal of this Thesis
     • Analyze microarray data with more complex tools
     • Integrate into a single model what is already known from other experiments
     • Identify the genes that form disease pathways
  8. Type of Data
     • Normalization (fluorescence intensity)
     • Filtering of microarray data (how to select subsets of genes)
     • Data discretization (are biological reactions discrete events?)
       ➡ Interval discretization
       ➡ Quantile discretization
       ➡ Exporting
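The slides do not say which normalization or filtering method the thesis used. As a purely illustrative sketch, assuming two-channel (red/green) intensities as in the microarray workflow of slide 5, the code below computes per-array log2(R/G) ratios, median-centers each array (a common global normalization), and keeps the k genes with the largest variance across arrays. The data layout and the names `log_ratios` and `top_variance_genes` are hypothetical.

```python
import math
import statistics


def log_ratios(red, green):
    """Median-centered log2(R/G) ratios.

    red, green: dict gene -> list of intensities, one value per array
    (hypothetical layout). After centering, each array's median ratio is 0.
    """
    genes = list(red)
    n_arrays = len(red[genes[0]])
    ratios = {g: [math.log2(r / gr) for r, gr in zip(red[g], green[g])] for g in genes}
    # Global normalization (assumption): subtract each array's median log ratio
    for a in range(n_arrays):
        med = statistics.median(ratios[g][a] for g in genes)
        for g in genes:
            ratios[g][a] -= med
    return ratios


def top_variance_genes(ratios, k):
    """Hypothetical filter: keep the k genes whose profiles vary most across arrays."""
    ranked = sorted(ratios, key=lambda g: statistics.pvariance(ratios[g]), reverse=True)
    return ranked[:k]
```

Median centering assumes that most genes are not differentially expressed, so the typical log ratio on each array should be zero.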
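  9. Interval Discretization
     • Sort the n observations, so that $x_0$ is the smallest and $x_{n-1}$ the largest
     • Divide the observations into d levels (uniformly spaced intervals)
     • The i-th observation is discretized to the j-th level iff
       $x_0 + \frac{j\,(x_{n-1} - x_0)}{d} \le x_i < x_0 + \frac{(j+1)\,(x_{n-1} - x_0)}{d}$
       (the largest observation, $x_{n-1}$, is assigned to the top level $j = d - 1$)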
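A minimal sketch of equal-width (interval) discretization for one gene, assuming real-valued expression levels and levels numbered 0 to d-1; `interval_discretize` is a hypothetical name. Taking the minimum and maximum gives the same $x_0$ and $x_{n-1}$ as sorting, so no explicit sort is needed.

```python
def interval_discretize(values, d):
    """Equal-width discretization into d levels numbered 0..d-1.

    Level j covers [x0 + j*w, x0 + (j+1)*w) with w = (x_max - x0) / d;
    the maximum value is put in the top level d - 1.
    """
    x0, x_max = min(values), max(values)
    if x_max == x0:  # constant profile: every observation in level 0
        return [0] * len(values)
    width = (x_max - x0) / d
    return [min(int((x - x0) / width), d - 1) for x in values]


print(interval_discretize([0, 1, 2, 3, 4, 10], 5))  # [0, 0, 1, 1, 2, 4]
```

Note how a single outlier (10) stretches the intervals so that level 3 stays empty; this is the usual drawback of equal-width bins on skewed expression data.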
  10. Quantile Discretization
      • Sort the n observations
      • Divide all observations into d levels by placing an equal number of observations in each bin, so that all levels are equally represented
      • The i-th observation (in sorted order, $i = 0, \dots, n-1$) belongs to the j-th level iff
        $\frac{j\,n}{d} \le i < \frac{(j+1)\,n}{d}$
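A matching sketch of equal-frequency (quantile) discretization. The condition $jn/d \le i < (j+1)n/d$ is equivalent to $j = \lfloor i\,d / n \rfloor$, which is what the hypothetical `quantile_discretize` computes for the observation of sorted rank i. Ties are broken by the original position of the observation, which is an assumption.

```python
def quantile_discretize(values, d):
    """Equal-frequency discretization into d levels numbered 0..d-1.

    The observation with sorted rank i (0-based) gets level floor(i * d / n),
    i.e. the j with j*n/d <= i < (j+1)*n/d.
    """
    n = len(values)
    # indices of the observations in increasing order of value (stable for ties)
    order = sorted(range(n), key=lambda idx: values[idx])
    levels = [0] * n
    for rank, idx in enumerate(order):
        levels[idx] = rank * d // n
    return levels


print(quantile_discretize([5, 1, 4, 2, 3, 6], 3))  # [2, 0, 1, 0, 1, 2]
```

Unlike interval discretization, each level receives the same number of observations regardless of outliers.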
  11. Exporting Data
      Gene name   Sample_0   Sample_1   Sample_2   Sample_3   ...   Sample_n
      rpoH        29.345     30.431     25.125     29.543     ...   29.987
      mopA        42.746     40.375     41.740     29.345     ...   29.345
      htpG        29.345     29.345     29.345     29.345     ...   29.345
      ...
      araE        29.345     29.345     29.345     29.345     ...   29.345
  12. Knowledge Base
      • Many biological processes are still unknown
      • Reliability of data
        ➡ hybridization is still a manual process
      • Small sample size, huge number of genes
        ➡ integration with heterogeneous data
  13. What do we want to solve?
      • Genetic cancer forecasting?
      • Need for a model that handles uncertain knowledge
      • A model that biologists and epidemiologists can understand
      • A model that can be updated at different times
  15. Bayesian Networks: features
      • Handle uncertain knowledge with probabilities
      • Handle subsequent changes (biological noise, multiple measurements)
      • An intuitive model a biologist can understand: white box vs. black box (e.g. neural networks)
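  23. Bayesian Networks: definition
      • Directed acyclic graph (how the variables interact with each other)
      • Set of local probability distributions, $p(x_i = k \mid Pa(x_i) = j) = \theta_{ijk}$
      [Figure: example network over A, …, G with edges A → C, B → C, B → F, C → D, C → E, F → G, annotated with the local distributions p(A), p(B), p(C|A,B), p(F|B), p(E|C), p(D|C), p(G|F)]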
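To make the definition concrete, here is a sketch that encodes the example network as parent sets read off its local distributions. It uses Python's standard `graphlib` to check acyclicity and obtain a topological order, and counts free parameters assuming all seven variables are binary: each node needs $(r_i - 1)\,q_i$ numbers, where $q_i$ is the number of parent configurations. The helper names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter

# Parent sets read off the local distributions
# p(A), p(B), p(C|A,B), p(F|B), p(E|C), p(D|C), p(G|F)
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "F": ["B"],
           "E": ["C"], "D": ["C"], "G": ["F"]}


def topological_order(parents):
    """Order in which every node follows its parents; raises CycleError if cyclic."""
    return list(TopologicalSorter(parents).static_order())


def is_dag(parents):
    try:
        topological_order(parents)
        return True
    except CycleError:
        return False


def n_free_parameters(parents, states=2):
    """Sum of (r_i - 1) * q_i, assuming every variable has `states` states."""
    return sum((states - 1) * states ** len(pa) for pa in parents.values())


print(is_dag(PARENTS), n_free_parameters(PARENTS))  # True 14
```

With binary variables the factored model needs 14 free parameters, against $2^7 - 1 = 127$ for an unrestricted joint distribution over the same seven variables.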
  24. Bayesian Networks: formal assumptions
      • Structure possibility
      • Complete data
      • Markov condition
      • Observational equivalence
      • Scoring function
  25. Bayesian Networks: formal assumptions (structure possibility)
      • Each possible structure $S_i$ over the n variables has non-zero prior probability: $p(S_i \mid \xi) > 0$
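The structure space is large. With n variables there are n! possible node orderings, and the number of distinct DAGs on n labeled nodes grows even faster. The sketch below computes it with Robinson's recurrence $a(n) = \sum_{k=1}^{n} (-1)^{k+1}\binom{n}{k}\,2^{k(n-k)}\,a(n-k)$, $a(0) = 1$, and prints it next to n!. `count_dags` is a hypothetical helper.

```python
from functools import lru_cache
from math import comb, factorial


@lru_cache(maxsize=None)
def count_dags(n):
    """Number of DAGs on n labeled nodes (Robinson's recurrence)."""
    if n == 0:
        return 1
    return sum((-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * count_dags(n - k)
               for k in range(1, n + 1))


for n in range(1, 8):
    print(n, factorial(n), count_dags(n))
```

For seven variables this gives 1,138,779,265 DAGs against 7! = 5040 orderings, one reason why search methods such as K2 start from a fixed ordering.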
  27. Bayesian Networks: formal assumptions (complete data)
      • No missing data, so that $p(\theta_S, S \mid \xi)$ and $p(C \mid D, S, \xi)$, where C is a new observation, can be computed in closed form
  29. Bayesian Networks: formal assumptions (Markov condition)
      • Allows the joint distribution to be factorized as $p(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid Pa(x_i))$
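A sketch of the factorization on the seven-node example network of the definition slide. All variables are assumed binary and the conditional probability values are made up for illustration; `joint` is a hypothetical helper. Summing the factorized joint over all $2^7$ assignments returns 1, as it must for a valid distribution.

```python
from itertools import product

PARENTS = {"A": (), "B": (), "C": ("A", "B"), "F": ("B",),
           "E": ("C",), "D": ("C",), "G": ("F",)}

# Made-up CPTs for binary variables: P(X = 1 | parent values)
P_ONE = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "F": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.3, (1,): 0.8},
    "D": {(0,): 0.5, (1,): 0.6},
    "G": {(0,): 0.1, (1,): 0.75},
}


def joint(assignment):
    """p(x_1, ..., x_n) = prod_i p(x_i | Pa(x_i)) for a full 0/1 assignment."""
    prob = 1.0
    for var, pa in PARENTS.items():
        p_one = P_ONE[var][tuple(assignment[p] for p in pa)]
        prob *= p_one if assignment[var] == 1 else 1.0 - p_one
    return prob


names = list(PARENTS)
total = sum(joint(dict(zip(names, values)))
            for values in product((0, 1), repeat=len(names)))
print(round(total, 12))  # 1.0
```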
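  31. Bayesian Networks: formal assumptions (observational equivalence)
      [Figure: two network structures over the variables X1, …, X5]
      • Observationally equivalent structures encode the same conditional independencies, so they cannot be distinguished from observational data alone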
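A standard characterization (Verma and Pearl) states that two DAGs are observationally (Markov) equivalent iff they have the same skeleton and the same v-structures. A v-structure is a collider $a \to c \leftarrow b$ in which a and b are not adjacent. The sketch below checks this on edge lists, assuming both inputs are acyclic. The three-node example graphs are hypothetical and are not the ones in the slide's figure.

```python
from itertools import combinations


def skeleton(edges):
    """Undirected version of a DAG given as (parent, child) pairs."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Unshielded colliders a -> c <- b with a and b non-adjacent."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(edges)
    found = set()
    for c, ps in parents.items():
        for a, b in combinations(sorted(ps), 2):
            if frozenset((a, b)) not in skel:
                found.add((frozenset((a, b)), c))
    return found


def observationally_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


chain = [("X1", "X2"), ("X2", "X3")]
reversed_chain = [("X3", "X2"), ("X2", "X1")]
collider = [("X1", "X2"), ("X3", "X2")]
print(observationally_equivalent(chain, reversed_chain))  # True
print(observationally_equivalent(chain, collider))        # False
```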
  33. Bayesian Networks: formal assumptions (scoring function)
      • A function that measures how well a structure fits the data
  35. Bayesian Networks: structure learning
      • Constraint satisfaction problem (CSP) vs. optimization problem (OP)
        • CSP tries to discover dependencies from the data with statistical hypothesis tests
        • OP searches the space of structures, trying to improve the score assigned by a scoring function
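To illustrate the CSP side, here is a sketch of one possible statistical test: Pearson's chi-square test of independence for two binary variables. The slides do not name the test used. With one degree of freedom the p-value has the closed form $\operatorname{erfc}(\sqrt{\chi^2/2})$, so only the standard library is needed. `chi2_independence_2x2` is a hypothetical name.

```python
import math


def chi2_independence_2x2(xs, ys):
    """Pearson chi-square test of independence for two 0/1 variables.

    Returns (statistic, p_value); with 1 degree of freedom,
    P(chi2 > s) = erfc(sqrt(s / 2)).
    """
    n = len(xs)
    counts = [[0, 0], [0, 0]]
    for x, y in zip(xs, ys):
        counts[x][y] += 1
    rows = [sum(r) for r in counts]
    cols = [counts[0][j] + counts[1][j] for j in range(2)]
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            if expected > 0:  # skip cells of an empty row or column
                stat += (counts[i][j] - expected) ** 2 / expected
    return stat, math.erfc(math.sqrt(stat / 2))


print(chi2_independence_2x2([0, 0, 1, 1], [0, 1, 0, 1]))  # (0.0, 1.0)
```

A constraint-based learner would treat two variables as dependent when the p-value falls below a chosen level, e.g. 0.05. Full constraint-based methods also test conditional independencies given subsets of other variables.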
  36. Bayesian Networks: K2 algorithm
      • Goal: maximize the probability of the structure given the data
      • An initial order of the variables is given (A, B, C, D, E, F, G)
      [Quality measure of the network given the data, by Cooper and Herskovits]
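  39. Bayesian Networks: K2 algorithm
      • Let D be the dataset, N the number of examples, n the number of variables, and G the network structure
      • Let $pa_{ij}$ be the j-th instantiation of $Pa(x_i)$, $r_i$ the number of states of $x_i$, and $q_i$ the number of instantiations of $Pa(x_i)$
      • Let $N_{ijk}$ be the number of cases in which $x_i = k$ and $Pa(x_i) = j$, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$
      $P(G, D) = P(G)\,P(D \mid G)$
      $P(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$
      [Quality measure of the network given the data, by Cooper and Herskovits]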
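A sketch of the Cooper-Herskovits measure $P(D \mid G)$ in log space, assuming discrete data stored as a list of dicts with states numbered from 0. Since $m! = \Gamma(m+1)$, each factorial is computed with `math.lgamma` to avoid overflow. The function names and the data layout are hypothetical.

```python
import math
from collections import Counter
from itertools import product


def log_ch_family(data, child, parents, arity):
    """log of prod_j (r_i-1)!/(N_ij+r_i-1)! * prod_k N_ijk! for variable `child`.

    data: list of dicts var -> state in 0..arity[var]-1
    parents: tuple of parent names; arity: dict var -> number of states r
    """
    r = arity[child]
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    score = 0.0
    for j in product(*(range(arity[p]) for p in parents)):
        counts = [n_ijk[(j, k)] for k in range(r)]
        n_ij = sum(counts)
        score += math.lgamma(r) - math.lgamma(n_ij + r)    # (r-1)! / (N_ij+r-1)!
        score += sum(math.lgamma(c + 1) for c in counts)   # prod_k N_ijk!
    return score


def log_p_data_given_graph(data, parents_of, arity):
    """log P(D | G): the score decomposes into one term per variable."""
    return sum(log_ch_family(data, x, parents_of[x], arity) for x in parents_of)
```

Because the score decomposes into one term per variable, a search only needs to recompute the term of the node whose parents changed.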
  40. Bayesian Networks: K2 algorithm
      • Possible actions
        • edge addition
        • edge deletion
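A sketch of a K2-style greedy search using the two moves listed above. As in K2, each node may only take parents that precede it in the given order, which keeps every candidate acyclic. The original K2 only adds parents, so the deletion move here is an extension. The parent limit `max_parents`, the function names, and the use of the log Cooper-Herskovits score as the objective are assumptions of this sketch.

```python
import math
from collections import Counter
from itertools import product


def family_score(data, child, parents, arity):
    """Log Cooper-Herskovits score of one node given its parent tuple."""
    r = arity[child]
    n_ijk = Counter((tuple(row[p] for p in parents), row[child]) for row in data)
    total = 0.0
    for j in product(*(range(arity[p]) for p in parents)):
        counts = [n_ijk[(j, k)] for k in range(r)]
        total += math.lgamma(r) - math.lgamma(sum(counts) + r)
        total += sum(math.lgamma(c + 1) for c in counts)
    return total


def greedy_order_search(data, order, arity, max_parents=2):
    """Per node, apply the best single parent addition or deletion while the score improves."""
    parents = {x: () for x in order}
    for i, x in enumerate(order):
        current = family_score(data, x, parents[x], arity)
        while True:
            # deletion moves: drop one existing parent
            candidates = [tuple(sorted(set(parents[x]) - {p})) for p in parents[x]]
            # addition moves: only predecessors in the order, so no cycles
            if len(parents[x]) < max_parents:
                candidates += [parents[x] + (p,) for p in order[:i] if p not in parents[x]]
            scored = [(family_score(data, x, c, arity), c) for c in candidates]
            if not scored:
                break
            best_score, best = max(scored)
            if best_score <= current:  # no move improves the score: stop
                break
            parents[x], current = best, best_score
    return parents
```

The search terminates because each accepted move strictly increases a node's score and there are finitely many parent sets.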
  41. Data Integration
      • Heterogeneous data integration
      • Binary gene-gene relations
      • Bayesian network collective learning (partial integration)
  42. Data Integration
      • Heterogeneous data integration
      • Binary gene-gene relations
      • Bayesian network collective learning (partial integration)
      [Figure: literature extraction. Abstracts for Gene1, …, GeneN are indexed with a fixed vocabulary; genes are compared with the cosine measure to build a gene similarity matrix, which is used as a prior on the network over genes G1, …, G9]
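A minimal sketch of the cosine step, assuming each gene is represented by a term-weight vector over the fixed vocabulary. The weighting scheme is not given on the slide. How the similarities are then turned into a structure prior is also not specified, so the sketch stops at the gene similarity matrix. The names and example vectors are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two term-weight vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def similarity_matrix(profiles):
    """profiles: gene -> term-weight vector over the same fixed vocabulary."""
    genes = sorted(profiles)
    return {(g, h): cosine(profiles[g], profiles[h]) for g in genes for h in genes}


profiles = {"G1": [1, 0, 1], "G2": [2, 0, 2], "G3": [0, 1, 0]}
sim = similarity_matrix(profiles)
```

Cosine similarity ignores vector length, so a gene described by many abstracts is not favored over a rarely studied one with the same term profile.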
  47. Data Integration (microarray data and clinical data)
      • Heterogeneous data integration
      • Binary gene-gene relations
      • Bayesian network collective learning (partial integration)
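  52. Experiments and results
      SynTReN: a generator of synthetic gene expression data for the design and analysis of structure learning algorithms
      [Figure: SynTReN produces a synthetic model and synthetic data; the structure learning framework learns a model from the synthetic data, and a validator compares the learned model with the synthetic model]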
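The slides do not say which measures the validator reports. As an illustration only, here is a sketch of common edge-level measures for comparing a learned structure with the true synthetic one: precision, recall and F1 over directed edges, plus a count of edges learned with reversed direction. `structure_metrics` and the example edge lists are hypothetical.

```python
def structure_metrics(true_edges, learned_edges):
    """Compare a learned DAG with the true one, both given as (parent, child) pairs."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    tp = len(true_set & learned_set)  # edges recovered with the right direction
    precision = tp / len(learned_set) if learned_set else 0.0
    recall = tp / len(true_set) if true_set else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    # learned edges whose reverse is a true edge (right pair, wrong direction)
    reversed_edges = sum(1 for (a, b) in learned_set if (b, a) in true_set)
    return {"precision": precision, "recall": recall, "f1": f1, "reversed": reversed_edges}


true_net = [("A", "C"), ("B", "C"), ("C", "D")]
learned_net = [("A", "C"), ("C", "B"), ("C", "D"), ("D", "E")]
print(structure_metrics(true_net, learned_net))
```

Reporting reversed edges separately is useful because observationally equivalent structures can differ only in edge directions that the data cannot resolve.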
  53. Experiments and results
      • Results (random network + biological network, without clinical data)
      • Idea: clinical data may improve structure learning, yielding more complete biological models (not bad, considering that medical centers are already equipped with this type of data)
  54. Learned Structure Network (DEMO)
      Microarray variables           232
      Clinical variables             11
      Patients (train)               78
      Patients (test)                19
      Structure learning time        12 h (*)
      (*) Matlab running on an Intel Core 2 Duo, 2 GHz
  55. Conclusions
      • Partial integration of two data sources improves performance within the Bayesian network framework
      • A huge pure-microarray dataset is not helpful
      • Data integration leads to fewer variables for each source (pure microarray data is expensive)
