Pathway Discovery in Cancer:
                the Bayesian Approach




                              Francesco Gadaleta
Developed and written at ESAT, Department of Electrical Engineering, Faculty of Engineering
                          Katholieke Universiteit Leuven (Belgium)
Genes and Diseases
Biological Assumptions
  •   Cancer normally originates in a single cell

  •   A cell's life is regulated by many genes, activated at different stages

Types of Genes
  • Oncogenes
  • Tumor-suppressor genes
  • DNA-repair genes
Genes and Diseases:
    genetic predisposition

Figure: a normal cell acquires a first, a second and a third mutation before becoming a malignant cell
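Genes and Diseases:
                Microarray Technology

Figure: two-color microarray workflow. mRNA is isolated from cancer cells and from normal cells (RNA isolation), reverse-transcribed into cDNA and labeled (reverse transcriptase labeling) with red fluorescent probes for the cancer sample and green fluorescent probes for the normal sample; the two labeled targets are then combined.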
Genes and Diseases:
the goal of biologists and geneticists

•   Prenatal diagnosis of recognized diseases, e.g. Down syndrome

•   Carrier testing to help couples with a hereditary disease make the risky
    decision of whether to have children

•   Patient-tailored diagnosis of genetic diseases
Goal of this Thesis

•   Microarray analysis with more sophisticated tools

•   Integrate what is already known from other experiments into a single
    model

•   Identify the genes that form disease pathways
Type of Data

•   Normalization (fluorescence intensity)

•   Filtering of microarray data (how to select subsets of genes)

•   Data discretization (are biological reactions discrete events?)
    ➡ Interval discretization

    ➡ Quantile discretization

    ➡ Exporting
Interval Discretization

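•   Sort the n observations

•   Divide the observations into d levels (uniformly spaced intervals)

•   The i-th observation is discretized as the j-th level (j = 0, ..., d-1) iff:

        $x_0 + \frac{j\,(x_{n-1} - x_0)}{d} \le x_i < x_0 + \frac{(j+1)\,(x_{n-1} - x_0)}{d}$

    where $x_0$ and $x_{n-1}$ are the smallest and largest sorted observations (the maximum $x_{n-1}$ is assigned to level $d-1$)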
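The interval rule above can be sketched in Python as follows. This is a minimal illustration, not the thesis's own code: the function name `interval_discretize` is hypothetical, and because only the extremes $x_0$ and $x_{n-1}$ matter, the sketch takes `min` and `max` instead of sorting.

```python
def interval_discretize(values, d):
    """Map each value to one of d equal-width levels spanning [min, max].

    Hypothetical helper illustrating interval discretization.
    """
    lo, hi = min(values), max(values)  # x_0 and x_{n-1} after sorting
    width = (hi - lo) / d
    if width == 0:
        # All observations are equal: put everything in level 0.
        return [0] * len(values)
    levels = []
    for x in values:
        j = int((x - lo) / width)
        # The maximum lies on the upper edge of the last interval; keep it in level d-1.
        levels.append(min(j, d - 1))
    return levels


print(interval_discretize([1.0, 2.0, 3.0, 4.0, 10.0], 3))  # [0, 0, 0, 1, 2]
```

With a skewed sample like this one, most observations fall into the lowest level, which is the situation quantile discretization is meant to avoid.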
Quantile Discretization

•   Sort the n observations

•   Divide the observations into d levels by placing an equal number of observations
    in each bin, so that all levels are equally represented

•   The i-th sorted observation (i = 0, ..., n-1) belongs to the j-th level iff:

        $\frac{j\,n}{d} \le i < \frac{(j+1)\,n}{d}$
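A minimal sketch of the quantile rule, assuming 0-based ranks: the condition $jn/d \le i < (j+1)n/d$ is equivalent to $j = \lfloor i d / n \rfloor$, which integer division computes exactly. The name `quantile_discretize` is hypothetical, and tied values may be split across neighboring levels because ties are broken by position; a real pipeline might treat ties differently.

```python
def quantile_discretize(values, d):
    """Assign each value a level so that every level holds about n/d observations.

    Hypothetical helper illustrating quantile discretization.
    """
    n = len(values)
    # Indices of the observations in increasing order of value (the "sort" step).
    order = sorted(range(n), key=lambda idx: values[idx])
    levels = [0] * n
    for i, idx in enumerate(order):
        # j*n/d <= i < (j+1)*n/d  is equivalent to  j = floor(i*d/n).
        levels[idx] = i * d // n
    return levels


print(quantile_discretize([5.0, 1.0, 4.0, 2.0, 3.0, 6.0], 3))  # [2, 0, 1, 0, 1, 2]
```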
Exporting Data
Gene Name   Sample_0   Sample_1   Sample_2   Sample_3   ...   Sample_n

 rpoH       29.345     30.431     25.125     29.543     ...   29.987
 mopA       42.746     40.375     41.740     29.345     ...   29.345
 htpG       29.345     29.345     29.345     29.345     ...   29.345
 ...
 araE       29.345     29.345     29.345     29.345     ...   29.345
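A minimal sketch of writing an expression matrix in this gene-by-sample layout, assuming a tab-separated text file with values rounded to three decimals; the thesis does not specify its export format, and `export_expression` is a hypothetical name.

```python
import csv
import io


def export_expression(gene_names, matrix):
    """Return a tab-separated table: one row per gene, one column per sample.

    Hypothetical helper; matrix[g][s] is the value of gene g in sample s.
    """
    n_samples = len(matrix[0]) if matrix else 0
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["Gene Name"] + [f"Sample_{s}" for s in range(n_samples)])
    for gene, row in zip(gene_names, matrix):
        writer.writerow([gene] + [f"{v:.3f}" for v in row])
    return buf.getvalue()


print(export_expression(["rpoH", "mopA"], [[29.345, 30.431], [42.746, 40.375]]), end="")
```

Writing to a `StringIO` keeps the sketch testable; passing an open file object to `csv.writer` instead would write the same rows to disk.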
Knowledge Base

•   Many biological processes are still unknown


•   Reliability of data
      ➡ hybridization is still a manual process

•   Small sample size, huge number of genes
      ➡ integration with heterogeneous data
What do we want to solve?
•   Genetic cancer forecasting?


•   Need for a model that handles uncertain knowledge

•   A model that biologists and epidemiologists can understand

•   A model that can be updated over time
Bayesian Networks:
                              features

•   Handle uncertain knowledge through probability

•   Handle subsequent changes (biological noise, multiple measurements)

•   Intuitive model that a biologist can understand: a white box, unlike
    black-box models such as neural networks
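Bayesian Networks:
                         definition

•   Directed Acyclic Graph
    (how the variables interact with each other)

•   Set of local probability distributions
    ($p(x_i = k \mid Pa(x_i) = j) = \theta_{ijk}$)

Figure: example network with edges A → C, B → C, B → F, C → D, C → E and F → G, annotated with the local distributions p(A), p(B), p(C|A,B), p(D|C), p(E|C), p(F|B) and p(G|F)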
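The factorization implied by this example network can be sketched as below. The graph is read off the local distributions on the slide; the variables are assumed binary, and every probability value in `P_ONE` is invented purely for illustration.

```python
from itertools import product

# Structure read off the slide: each variable maps to its list of parents.
PARENTS = {"A": [], "B": [], "C": ["A", "B"], "D": ["C"],
           "E": ["C"], "F": ["B"], "G": ["F"]}

# Hypothetical local distributions: P(var = 1 | parent values), binary variables.
P_ONE = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
    "D": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.3, (1,): 0.8},
    "F": {(0,): 0.25, (1,): 0.65},
    "G": {(0,): 0.1, (1,): 0.6},
}


def local_prob(var, value, assignment):
    """theta_ijk: probability of var = value given its parents' values in assignment."""
    key = tuple(assignment[p] for p in PARENTS[var])
    p_one = P_ONE[var][key]
    return p_one if value == 1 else 1.0 - p_one


def joint(assignment):
    """p(A, ..., G) as the product of the local distributions."""
    prob = 1.0
    for var, value in assignment.items():
        prob *= local_prob(var, value, assignment)
    return prob


# Summing the joint over all 2^7 assignments recovers a proper distribution.
total = sum(joint(dict(zip(PARENTS, values)))
            for values in product((0, 1), repeat=len(PARENTS)))
print(round(total, 10))  # 1.0
```

Only 1 + 1 + 4 + 2 + 2 + 2 + 2 = 14 free parameters are needed here, against 2^7 - 1 = 127 for an unrestricted joint table over seven binary variables.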
Bayesian Networks:
                     formal assumptions

• Structure Possibility
• Complete Data
• Markov Condition
• Observational Equivalence
• Scoring Function
Bayesian Networks:
                     formal assumptions

• Structure Possibility: every candidate structure is possible a priori,
  $p(S_i \mid \xi) > 0$ (ξ: background knowledge)
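To give a sense of the size of the structure space: n! counts the possible variable orderings, while the number of distinct DAGs over n labeled variables grows even faster. The sketch below computes that number with Robinson's recurrence $a(n) = \sum_{k=1}^{n} (-1)^{k+1} \binom{n}{k} 2^{k(n-k)} a(n-k)$, $a(0) = 1$; it is included only to illustrate why exhaustive search is impractical and heuristic searches such as K2 are used.

```python
from math import comb, factorial


def count_dags(n):
    """Number of labeled DAGs on n nodes, via Robinson's recurrence."""
    a = [1]  # a[0] = 1: the empty graph
    for m in range(1, n + 1):
        total = 0
        for k in range(1, m + 1):
            # Inclusion-exclusion over sets of k source nodes (no incoming edges).
            total += (-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
        a.append(total)
    return a[n]


for n in range(1, 6):
    print(n, factorial(n), count_dags(n))
```

For n = 5 this prints 120 orderings but 29281 DAGs.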
Bayesian Networks:
                     formal assumptions

• Complete Data: there is no missing data, so that $p(S, \theta_S \mid \xi)$ and
  $p(C \mid D, S, \xi)$, where C is a new observation, can be computed in closed form
Bayesian Networks:
                     formal assumptions

• Markov Condition: each variable is independent of its non-descendants given
  its parents, which allows us to factorize
  $p(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid Pa(x_i))$
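Bayesian Networks:
                     formal assumptions

• Observational Equivalence: structures that encode the same conditional
  independencies cannot be distinguished from observational data alone

Figure: two network structures over X1, X2, X3, X4 and X5 illustrating observational equivalence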
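Observational equivalence can be checked with Verma and Pearl's characterization: two DAGs are equivalent exactly when they have the same skeleton and the same v-structures (pairs of non-adjacent parents of a common child). In this sketch the function names and the example graphs are hypothetical, since the edges in the slide's figure are not recoverable.

```python
def skeleton(edges):
    """Undirected version of a directed edge list."""
    return {frozenset(e) for e in edges}


def v_structures(edges):
    """Triples (a, c, b) with a -> c <- b where a and b are not adjacent."""
    parents = {}
    for a, b in edges:
        parents.setdefault(b, set()).add(a)
    skel = skeleton(edges)
    found = set()
    for child, ps in parents.items():
        for a in ps:
            for b in ps:
                if a < b and frozenset((a, b)) not in skel:
                    found.add((a, child, b))
    return found


def observationally_equivalent(edges1, edges2):
    return (skeleton(edges1) == skeleton(edges2)
            and v_structures(edges1) == v_structures(edges2))


# Hypothetical graphs over X1..X5 (the slide's exact edges are not recoverable).
g1 = [("X1", "X2"), ("X1", "X3"), ("X2", "X4"), ("X3", "X4"), ("X4", "X5")]
g2 = [("X2", "X1"), ("X1", "X3"), ("X2", "X4"), ("X3", "X4"), ("X4", "X5")]  # X1-X2 reversed
g3 = [("X1", "X2"), ("X1", "X3"), ("X2", "X4"), ("X3", "X4"), ("X5", "X4")]  # X4-X5 reversed

print(observationally_equivalent(g1, g2))  # True
print(observationally_equivalent(g1, g3))  # False
```

Reversing X1 → X2 keeps the only v-structure X2 → X4 ← X3, whereas reversing X4 → X5 creates new v-structures at X4, so only the first pair is equivalent.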
Bayesian Networks:
                     formal assumptions

• Scoring Function: a function that measures how well a structure fits the data
Bayesian Networks:
                   structure learning

•   Constraint Satisfaction Problem (CSP) vs. Optimization Problem (OP)

       •   CSP discovers dependencies in the data with statistical
           hypothesis tests

       •   OP searches for a structure that improves the score assigned by a
           scoring function
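As an illustration of the constraint-based (CSP) approach, the sketch below runs a Pearson chi-square test of independence between two discretized variables. The helper name and the data are hypothetical; 3.841 is the 5% critical value of the chi-square distribution with one degree of freedom, so the hard-coded threshold only applies to 2x2 tables.

```python
from collections import Counter

CHI2_CRITICAL_DOF1_5PCT = 3.841  # 95th percentile of chi-square with 1 degree of freedom


def chi_square_test(xs, ys):
    """Pearson chi-square statistic and degrees of freedom for two discrete samples."""
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n
            stat += (joint[(x, y)] - expected) ** 2 / expected
    dof = (len(px) - 1) * (len(py) - 1)
    return stat, dof


gene_a = [0, 0, 0, 0, 1, 1, 1, 1]
gene_b = [0, 0, 0, 0, 1, 1, 1, 1]  # tracks gene_a exactly
gene_c = [0, 0, 0, 1, 1, 1, 1, 0]  # only weakly related to gene_a

for other in (gene_b, gene_c):
    stat, dof = chi_square_test(gene_a, other)
    dependent = dof == 1 and stat > CHI2_CRITICAL_DOF1_5PCT
    print(stat, dof, "dependent" if dependent else "no evidence of dependence")
```

With eight samples the weak relation is not detected, which illustrates why the small sample sizes mentioned earlier make test-based structure learning fragile.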
Bayesian Networks:
             K2 algorithm

•   Goal: maximize the probability of the structure given the data
•   An initial variable ordering is given (A, B, C, D, E, F, G)

                                                       [Quality measure of the
                                                        network given the data,
                                                        by Cooper and Herskovits]
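Bayesian Networks:
                       K2 algorithm
• Let D be the dataset, N the number of examples and n the number of variables,
• G the network structure, $pa_{ij}$ the j-th instantiation of $Pa(x_i)$,
• $r_i$ the number of states of $x_i$ and $q_i$ the number of instantiations of $Pa(x_i)$,
• $N_{ijk}$ the number of cases in which $x_i = k$ and $Pa(x_i) = j$, and
• $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$

  $P(G, D) = P(G)\,P(D \mid G)$

  $P(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$

                                                       [Quality measure of the
                                                        network given the data,
                                                        by Cooper and Herskovits]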
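The Cooper and Herskovits measure can be evaluated in log space to avoid overflowing factorials, using $\log m! = \ln\Gamma(m+1)$. The sketch below, with hypothetical function names and a toy dataset, scores two structures over binary variables A and B; assuming a uniform prior P(G), comparing P(G, D) reduces to comparing P(D | G). Parent instantiations that never occur in the data contribute a factor of 1 and are simply skipped.

```python
import math
from collections import Counter


def log_ch_node(data, child, parents, arity):
    """log of prod_j (r_i-1)!/(N_ij+r_i-1)! * prod_k N_ijk! for one variable."""
    r = arity[child]
    counts = {}  # parent instantiation j -> Counter of child values (N_ijk)
    for row in data:
        j = tuple(row[p] for p in parents)
        counts.setdefault(j, Counter())[row[child]] += 1
    score = 0.0
    for n_ijk in counts.values():
        n_ij = sum(n_ijk.values())
        score += math.lgamma(r) - math.lgamma(n_ij + r)  # log((r-1)!/(N_ij+r-1)!)
        score += sum(math.lgamma(n + 1) for n in n_ijk.values())  # log prod_k N_ijk!
    return score


def log_p_data_given_graph(data, graph, arity):
    """log P(D | G); graph maps each variable to its list of parents."""
    return sum(log_ch_node(data, v, ps, arity) for v, ps in graph.items())


# Toy dataset (hypothetical): B always equals A.
data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}, {"A": 1, "B": 1}]
arity = {"A": 2, "B": 2}
empty = {"A": [], "B": []}
a_to_b = {"A": [], "B": ["A"]}

print(math.exp(log_p_data_given_graph(data, empty, arity)))   # about 1/900
print(math.exp(log_p_data_given_graph(data, a_to_b, arity)))  # about 1/270
```

By hand: without edges each variable contributes $\frac{1!}{5!}\,2!\,2! = 1/30$, giving 1/900; with A → B, B contributes $\left(\frac{1!}{3!}\,2!\right)^2 = 1/9$, giving 1/270, so the structure with the edge scores higher.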
Bayesian Networks:
                       K2 algorithm

•   Possible actions

     •   edge addition


     •   edge deletion
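A compact sketch of the K2 search loop, assuming discrete data stored as dictionaries and the log-space Cooper and Herskovits score. Following Cooper and Herskovits, each variable greedily gains parents only from its predecessors in the given ordering, up to a bound `max_parents`, as long as the score improves. K2 itself only adds edges, so the edge-deletion move listed above would need a more general hill-climbing search that this sketch does not implement. All names and the toy data are hypothetical.

```python
import math
from collections import Counter


def log_ch_node(data, child, parents, arity):
    """Log Cooper-Herskovits score of one variable given a parent set."""
    r = arity[child]
    counts = {}
    for row in data:
        j = tuple(row[p] for p in parents)
        counts.setdefault(j, Counter())[row[child]] += 1
    score = 0.0
    for n_ijk in counts.values():
        n_ij = sum(n_ijk.values())
        score += math.lgamma(r) - math.lgamma(n_ij + r)
        score += sum(math.lgamma(n + 1) for n in n_ijk.values())
    return score


def k2(data, order, arity, max_parents=2):
    """Greedy K2: only edge additions, parents chosen among earlier variables."""
    graph = {}
    for pos, child in enumerate(order):
        parents = []
        best = log_ch_node(data, child, parents, arity)
        while len(parents) < max_parents:
            # Try each remaining predecessor as an extra parent.
            scored = [(log_ch_node(data, child, parents + [c], arity), c)
                      for c in order[:pos] if c not in parents]
            if not scored:
                break
            new_score, new_parent = max(scored)
            if new_score <= best:
                break  # no single addition improves the score: stop for this node
            best = new_score
            parents.append(new_parent)
        graph[child] = parents
    return graph


# Hypothetical data: B copies A, C is independent of both.
data = [{"A": a, "B": a, "C": c} for a in (0, 1) for c in (0, 1) for _ in range(3)]
arity = {"A": 2, "B": 2, "C": 2}
print(k2(data, ["A", "B", "C"], arity))  # {'A': [], 'B': ['A'], 'C': []}
```

The result depends on the initial ordering: with the order B, A, C the same data yields the edge B → A instead, which is why K2 needs the ordering as input.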
Data Integration

•   Heterogeneous data integration

•   Binary gene-gene relations

•   Bayesian network collective learning
    (partial integration)
Data Integration

Figure: literature-based gene-gene relations. Abstracts about Gene1 ... GeneN are indexed against a fixed vocabulary (literature extraction, abstract indexing); a cosine measure between the indexed gene profiles yields a gene similarity matrix, which serves as a prior for the network over genes G1 ... G9.
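The cosine measure that produces the gene similarity matrix can be sketched as follows, assuming each gene is represented by raw term counts over the fixed vocabulary. The thesis's actual indexing and weighting scheme (for example tf-idf) is not stated, and the function names and gene profiles below are invented for illustration.

```python
import math


def cosine(u, v):
    """Cosine of the angle between two term vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def similarity_matrix(profiles):
    """Pairwise cosine similarity for every pair of genes (including self-pairs)."""
    genes = list(profiles)
    return {(g, h): cosine(profiles[g], profiles[h]) for g in genes for h in genes}


# Hypothetical term counts over a four-term fixed vocabulary.
profiles = {
    "Gene1": [2, 0, 1, 0],
    "Gene2": [4, 0, 2, 0],
    "Gene3": [0, 3, 0, 1],
}
sim = similarity_matrix(profiles)
print(round(sim[("Gene1", "Gene2")], 6), sim[("Gene1", "Gene3")])  # 1.0 0.0
```

Cosine similarity ignores document length, so a gene mentioned in many abstracts is not automatically similar to every other gene; only the relative use of vocabulary terms matters.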
Data Integration

Figure: two data sources, microarray data and clinical data, combined by Bayesian network collective learning (partial integration)
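Experiments and results

SynTReN: a generator of synthetic gene expression data for the design and analysis of structure learning algorithms

Figure: validation pipeline. SynTReN produces a synthetic model and synthetic data; the structure learning framework learns a model from the synthetic data, and a validator compares the learned model with the synthetic model.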
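One way a validator could compare the learned model with the synthetic one is directed-edge precision and recall. This is a minimal sketch under that assumption: the slide does not say which metric the thesis's validator uses, and `compare_structures` and the edge lists are hypothetical.

```python
def compare_structures(true_edges, learned_edges):
    """Directed-edge precision and recall of a learned network against the true one."""
    true_set, learned_set = set(true_edges), set(learned_edges)
    hits = len(true_set & learned_set)
    precision = hits / len(learned_set) if learned_set else 0.0
    recall = hits / len(true_set) if true_set else 0.0
    return precision, recall


# Hypothetical synthetic model and learned model.
true_model = [("A", "C"), ("B", "C"), ("C", "D")]
learned_model = [("A", "C"), ("C", "D"), ("D", "E")]
print(compare_structures(true_model, learned_model))  # both values equal 2/3
```

Comparing directed edges penalizes reversed edges even when the two structures are observationally equivalent; comparing skeletons or equivalence classes is a common alternative.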
Experiments and results

•   Results on a random network and a biological network (without clinical data)

•   Clinical data may improve structure learning and yield more complete
    biological models; this is attractive because medical centers already
    collect this type of data
Learned Structure Network (DEMO)

    Microarray variables                      232
    Clinical variables                         11
    Patients (train)                           78
    Patients (test)                            19
    Structure learning computation time     12 h (*)

(*) MATLAB running on an Intel Core 2 Duo, 2 GHz
Conclusions

•   Partial integration of two data sources improves performance within
    the Bayesian network framework

•   A huge microarray-only dataset is not helpful on its own

•   Data integration requires fewer variables from each source (microarray-only
    studies are expensive)

More Related Content

Similar to Pathway Discovery in Cancer: the Bayesian Approach

Jylee probabilistic reasoning with bayesian networks
Jylee probabilistic reasoning with bayesian networksJylee probabilistic reasoning with bayesian networks
Jylee probabilistic reasoning with bayesian networks
Jungyeol
 
Bayesian network
Bayesian networkBayesian network
Bayesian network
Rafsan Siddiqui
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
Adnan Masood
 
Bayesian Core: Chapter 8
Bayesian Core: Chapter 8Bayesian Core: Chapter 8
Bayesian Core: Chapter 8
Christian Robert
 
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual IdentificationSubspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
United States Air Force Academy
 
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Gota Morota
 
Logistic Regression(SGD)
Logistic Regression(SGD)Logistic Regression(SGD)
Logistic Regression(SGD)
Prentice Xu
 
Causal Bayesian Networks
Causal Bayesian NetworksCausal Bayesian Networks
Causal Bayesian Networks
Flávio Codeço Coelho
 
01 graphical models
01 graphical models01 graphical models
01 graphical models
zukun
 
Marked Point Process For Neurite Tracing
Marked Point Process For Neurite TracingMarked Point Process For Neurite Tracing
Marked Point Process For Neurite Tracing
IPALab
 
icml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Iicml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part I
zukun
 

Similar to Pathway Discovery in Cancer: the Bayesian Approach (11)

Jylee probabilistic reasoning with bayesian networks
Jylee probabilistic reasoning with bayesian networksJylee probabilistic reasoning with bayesian networks
Jylee probabilistic reasoning with bayesian networks
 
Bayesian network
Bayesian networkBayesian network
Bayesian network
 
Bayesian Networks - A Brief Introduction
Bayesian Networks - A Brief IntroductionBayesian Networks - A Brief Introduction
Bayesian Networks - A Brief Introduction
 
Bayesian Core: Chapter 8
Bayesian Core: Chapter 8Bayesian Core: Chapter 8
Bayesian Core: Chapter 8
 
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual IdentificationSubspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
Subspace Indexing on Grassmannian Manifold for Large Scale Visual Identification
 
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
Application of Bayesian and Sparse Network Models for Assessing Linkage Diseq...
 
Logistic Regression(SGD)
Logistic Regression(SGD)Logistic Regression(SGD)
Logistic Regression(SGD)
 
Causal Bayesian Networks
Causal Bayesian NetworksCausal Bayesian Networks
Causal Bayesian Networks
 
01 graphical models
01 graphical models01 graphical models
01 graphical models
 
Marked Point Process For Neurite Tracing
Marked Point Process For Neurite TracingMarked Point Process For Neurite Tracing
Marked Point Process For Neurite Tracing
 
icml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part Iicml2004 tutorial on spectral clustering part I
icml2004 tutorial on spectral clustering part I
 

Recently uploaded

CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
camakaiclarkmusic
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
adhitya5119
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
chanes7
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
AyyanKhan40
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
RitikBhardwaj56
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
Academy of Science of South Africa
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
Ashokrao Mane college of Pharmacy Peth-Vadgaon
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
TechSoup
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
Jean Carlos Nunes Paixão
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
taiba qazi
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
simonomuemu
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
History of Stoke Newington
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
Priyankaranawat4
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
PECB
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
Peter Windle
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
IreneSebastianRueco1
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
Israel Genealogy Research Association
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
mulvey2
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
Jean Carlos Nunes Paixão
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
heathfieldcps1
 

Recently uploaded (20)

CACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdfCACJapan - GROUP Presentation 1- Wk 4.pdf
CACJapan - GROUP Presentation 1- Wk 4.pdf
 
Main Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docxMain Java[All of the Base Concepts}.docx
Main Java[All of the Base Concepts}.docx
 
Digital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments UnitDigital Artifact 1 - 10VCD Environments Unit
Digital Artifact 1 - 10VCD Environments Unit
 
PIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf IslamabadPIMS Job Advertisement 2024.pdf Islamabad
PIMS Job Advertisement 2024.pdf Islamabad
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)South African Journal of Science: Writing with integrity workshop (2024)
South African Journal of Science: Writing with integrity workshop (2024)
 
Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.Types of Herbal Cosmetics its standardization.
Types of Herbal Cosmetics its standardization.
 
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat  Leveraging AI for Diversity, Equity, and InclusionExecutive Directors Chat  Leveraging AI for Diversity, Equity, and Inclusion
Executive Directors Chat Leveraging AI for Diversity, Equity, and Inclusion
 
Lapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdfLapbook sobre os Regimes Totalitários.pdf
Lapbook sobre os Regimes Totalitários.pdf
 
DRUGS AND ITS classification slide share
DRUGS AND ITS classification slide shareDRUGS AND ITS classification slide share
DRUGS AND ITS classification slide share
 
Smart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICTSmart-Money for SMC traders good time and ICT
Smart-Money for SMC traders good time and ICT
 
The History of Stoke Newington Street Names
The History of Stoke Newington Street NamesThe History of Stoke Newington Street Names
The History of Stoke Newington Street Names
 
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdfANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
ANATOMY AND BIOMECHANICS OF HIP JOINT.pdf
 
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
ISO/IEC 27001, ISO/IEC 42001, and GDPR: Best Practices for Implementation and...
 
A Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in EducationA Strategic Approach: GenAI in Education
A Strategic Approach: GenAI in Education
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
The Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collectionThe Diamonds of 2023-2024 in the IGRA collection
The Diamonds of 2023-2024 in the IGRA collection
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
A Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdfA Independência da América Espanhola LAPBOOK.pdf
A Independência da América Espanhola LAPBOOK.pdf
 
The basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptxThe basics of sentences session 6pptx.pptx
The basics of sentences session 6pptx.pptx
 

Pathway Discovery in Cancer: the Bayesian Approach

  • 1. Pathway Discovery in Cancer: the Bayesian Approach Francesco Gadaleta Developed and written at ESAT dept. of Electrical Engineering of the Faculty of Engineering Katholieke Universiteit Leuven (Belgium)
  • 2. Genes and Diseases Biological Assumptions • Cancer normally originates in a single cell • A cell's life is regulated by many genes activated at different steps Types of Genes • Oncogene • Tumor-suppressor • DNA-repair
  • 4. Genes and Diseases: genetic predisposition Normal cell → first mutation → second mutation → third mutation → malignant cell
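  • 5. Genes and Diseases: Microarray Technology • RNA is isolated from cancer cells and from normal cells to obtain mRNA • Reverse transcriptase labeling turns each mRNA sample into cDNA, tagged with red fluorescent probes (cancer cells) or green fluorescent probes (normal cells) • The two labeled samples are combined into the target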
  • 6. Genes and Diseases: the goal of biologists and geneticists • Prenatal diagnosis of recognized diseases, e.g. Down syndrome • Carrier testing to help couples with a hereditary disease in the risky decision of whether to have children • Patient-tailored diagnosis of genetic diseases
  • 7. Goal of this Thesis • Microarray analysis with more sophisticated tools • Integrate into a single model what is already known from other experiments • Identify the genes that form disease pathways
  • 8. Types of Data • Normalization (fluorescence intensity) • Filtering of microarray data (how to select subsets of genes) • Data discretization (are biological reactions discrete events?) ➡ Interval discretization ➡ Quantile discretization ➡ Exporting
  • 9. Interval Discretization • Sort the n observations, $x_0 \le x_1 \le \dots \le x_{n-1}$ • Divide the observations into d levels (uniformly spaced intervals) • The i-th observation is discretized as the j-th level iff: $x_0 + \frac{j\,(x_{n-1} - x_0)}{d} < x_i < x_0 + \frac{(j+1)\,(x_{n-1} - x_0)}{d}$
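A minimal Python sketch of interval discretization, assuming levels numbered 0 to d-1 that span the range from the smallest observation $x_0$ to the largest $x_{n-1}$. The helper name `interval_discretize` is hypothetical. Because the slide's inequalities are strict, a value lying exactly on a boundary is assigned here to the upper level (an assumption), and the maximum $x_{n-1}$ is clamped into the top level d-1.

```python
from typing import List


def interval_discretize(values: List[float], d: int) -> List[int]:
    """Map each value to one of d equal-width levels (0 .. d-1).

    Hypothetical helper: the levels split [x_0, x_{n-1}], i.e. the minimum and
    maximum of the sorted observations, into d uniformly spaced intervals.
    """
    if d < 1:
        raise ValueError("d must be at least 1")
    lo, hi = min(values), max(values)  # x_0 and x_{n-1} after sorting
    width = (hi - lo) / d
    if width == 0:  # all observations identical: everything goes to level 0
        return [0 for _ in values]
    levels = []
    for x in values:
        j = int((x - lo) // width)    # index of the interval containing x
        levels.append(min(j, d - 1))  # the maximum value falls in the top level
    return levels


print(interval_discretize([1.0, 2.0, 3.0, 4.0, 10.0], d=3))  # [0, 0, 0, 1, 2]
```

Note how the single outlier (10.0) stretches the intervals so that most observations share level 0; this is the main weakness that quantile discretization addresses.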
  • 10. Quantile Discretization • Sort the n observations • Divide all observations into d levels by placing an equal number of observations in each bin, so that all levels are equally represented • The i-th observation belongs to the j-th level iff: $\frac{jn}{d} < i < \frac{(j+1)n}{d}$
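A matching sketch of quantile discretization, again with a hypothetical helper name. It assumes 0-based ranks and assigns the observation of rank i to level $j = \lfloor i d / n \rfloor$, i.e. $jn/d \le i < (j+1)n/d$; treating the lower boundary as inclusive is an assumption, since the slide writes both inequalities as strict. Tied values may end up in different levels.

```python
from typing import List


def quantile_discretize(values: List[float], d: int) -> List[int]:
    """Assign each value to one of d equally populated levels (0 .. d-1).

    Hypothetical helper: observations are ranked by value and the one with
    0-based rank i goes to level floor(i * d / n).
    """
    n = len(values)
    if d < 1 or n == 0:
        raise ValueError("need d >= 1 and at least one observation")
    # Indices of the observations sorted by value (ties keep input order).
    order = sorted(range(n), key=lambda idx: values[idx])
    levels = [0] * n
    for rank, idx in enumerate(order):
        levels[idx] = rank * d // n  # integer arithmetic avoids rounding issues
    return levels


print(quantile_discretize([0.9, 0.1, 0.5, 0.7, 0.3, 0.2], d=3))  # [2, 0, 1, 2, 1, 0]
```

Each of the three levels receives exactly two of the six observations, regardless of how the values are spread.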
  • 11. Exporting Data
      Gene Name   Sample_0   Sample_1   Sample_2   Sample_3   ...   Sample_n
      rpoH        29.345     30.431     25.125     29.543     ...   29.987
      mopA        42.746     40.375     41.740     29.345     ...   29.345
      htpG        29.345     29.345     29.345     29.345     ...   29.345
      ...
      araE        29.345     29.345     29.345     29.345     ...   29.345
  • 12. Knowledge Base • Many biological processes are still unknown • Reliability of data ➡ hybridization is still a manual process • Small sample size vs. huge number of genes ➡ integration with heterogeneous data
  • 13. What do we want to solve? • Genetic cancer forecasting? • A model that handles uncertain knowledge • A model that biologists and epidemiologists can understand • A model that can be updated over time
  • 15. Bayesian Networks: features • Handle uncertain knowledge with probabilities • Handle subsequent changes (biological noise, multiple measurements) • An intuitive model a biologist can understand: a white box, as opposed to a black box such as a neural network
  • 23. Bayesian Networks: definition • Directed Acyclic Graph over the nodes A, B, C, D, E, F, G (how the variables interact with each other) • Set of local probability distributions, $p(x_i = k \mid Pa(x_i) = j) = \theta_{ijk}$: p(A), p(B), p(C|A,B), p(D|C), p(E|C), p(F|B), p(G|F)
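The definition can be made concrete with a small Python sketch of the example network A to G. It assumes every variable is binary and uses made-up, purely illustrative probabilities; `PARENTS`, `P_TRUE`, `local_prob` and `joint_prob` are hypothetical names. Each table entry plays the role of a $\theta_{ijk}$, and the probability of a full assignment is the product of the local terms.

```python
from itertools import product
from typing import Dict, Tuple

# Parents of each node in the example DAG: A, B -> C; B -> F; C -> D, E; F -> G.
PARENTS: Dict[str, Tuple[str, ...]] = {
    "A": (), "B": (), "C": ("A", "B"), "D": ("C",),
    "E": ("C",), "F": ("B",), "G": ("F",),
}

# Hypothetical local distributions for binary variables: each maps a tuple of
# parent values to p(node = 1 | parents). The numbers are illustrative only.
P_TRUE: Dict[str, Dict[Tuple[int, ...], float]] = {
    "A": {(): 0.3},
    "B": {(): 0.6},
    "C": {(0, 0): 0.05, (0, 1): 0.4, (1, 0): 0.5, (1, 1): 0.9},
    "D": {(0,): 0.2, (1,): 0.7},
    "E": {(0,): 0.1, (1,): 0.8},
    "F": {(0,): 0.3, (1,): 0.6},
    "G": {(0,): 0.25, (1,): 0.75},
}


def local_prob(node: str, value: int, assignment: Dict[str, int]) -> float:
    """theta_ijk: p(node = value | its parents' values in the assignment)."""
    parent_values = tuple(assignment[p] for p in PARENTS[node])
    p_one = P_TRUE[node][parent_values]
    return p_one if value == 1 else 1.0 - p_one


def joint_prob(assignment: Dict[str, int]) -> float:
    """Joint probability as the product of the local distributions."""
    result = 1.0
    for node in PARENTS:
        result *= local_prob(node, assignment[node], assignment)
    return result


# Sanity check: the joint distribution sums to 1 over all 2^7 assignments.
nodes = list(PARENTS)
total = sum(joint_prob(dict(zip(nodes, values)))
            for values in product((0, 1), repeat=len(nodes)))
print(round(total, 10))  # 1.0
```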
  • 24. Bayesian Networks: formal assumptions • Structure Possibility • Complete Data • Markov Condition • Observational Equivalence • Scoring Function
  • 25. Bayesian Networks: formal assumptions • Structure Possibility: every candidate structure $S_i$ is possible a priori, $p(S_i \mid \xi) > 0$ • Complete Data • Markov Condition • Observational Equivalence • Scoring Function
  • 27. Bayesian Networks: formal assumptions • Structure Possibility • Complete Data: no missing data, so that $p(S, \theta_S \mid \xi)$ and $p(C \mid D, S, \xi)$, where C is a new observation, can be computed in closed form • Markov Condition • Observational Equivalence • Scoring Function
  • 29. Bayesian Networks: formal assumptions • Structure Possibility • Complete Data • Markov Condition: allows the joint distribution to be factorized as $p(x_1, x_2, \dots, x_n) = \prod_{i=1}^{n} p(x_i \mid Pa(x_i))$ • Observational Equivalence • Scoring Function
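For the example network A to G, whose local distributions are p(A), p(B), p(C|A,B), p(D|C), p(E|C), p(F|B) and p(G|F), the factorization reads:

```latex
p(A,B,C,D,E,F,G) = p(A)\,p(B)\,p(C \mid A,B)\,p(D \mid C)\,p(E \mid C)\,p(F \mid B)\,p(G \mid F)
```

If, for illustration, all seven variables were binary, this would require 1 + 1 + 4 + 2 + 2 + 2 + 2 = 14 free parameters instead of the $2^7 - 1 = 127$ needed for an unrestricted joint distribution.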
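  • 31. Bayesian Networks: formal assumptions • Structure Possibility • Complete Data • Markov Condition • Observational Equivalence: [Figure: two network structures over X1, X2, X3, X4, X5 that encode the same conditional independencies and therefore cannot be distinguished from observational data] • Scoring Function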
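A standard way to test observational (Markov) equivalence is the Verma and Pearl criterion: two DAGs are equivalent iff they have the same skeleton and the same v-structures (colliders a → c ← b with a and b not adjacent). The Python sketch below implements that criterion; the function names are hypothetical, and the example graphs are small three-node illustrations rather than the two structures drawn on the slide.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child)


def skeleton(edges: Set[Edge]) -> Set[FrozenSet[str]]:
    """Undirected version of the graph: each edge becomes an unordered pair."""
    return {frozenset(edge) for edge in edges}


def v_structures(edges: Set[Edge]) -> Set[Tuple[FrozenSet[str], str]]:
    """Colliders a -> c <- b in which a and b are not adjacent."""
    adjacent = skeleton(edges)
    parents: Dict[str, Set[str]] = {}
    for parent, child in edges:
        parents.setdefault(child, set()).add(parent)
    found = set()
    for child, pas in parents.items():
        for a in pas:
            for b in pas:
                if a < b and frozenset((a, b)) not in adjacent:
                    found.add((frozenset((a, b)), child))
    return found


def markov_equivalent(g1: Set[Edge], g2: Set[Edge]) -> bool:
    """Equivalent iff same skeleton and same v-structures (Verma and Pearl)."""
    return skeleton(g1) == skeleton(g2) and v_structures(g1) == v_structures(g2)


chain = {("X1", "X2"), ("X2", "X3")}
reversed_chain = {("X3", "X2"), ("X2", "X1")}
collider = {("X1", "X2"), ("X3", "X2")}
print(markov_equivalent(chain, reversed_chain))  # True
print(markov_equivalent(chain, collider))        # False
```

The chain and its reversal imply the same independence (X1 independent of X3 given X2), while the collider implies a different one, so only the first pair is equivalent.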
  • 33. Bayesian Networks: formal assumptions • Structure Possibility • Complete Data • Markov Condition • Observational Equivalence • Scoring Function: a function that measures how well a structure fits the data
  • 35. Bayesian Networks: structure learning • Constraint Satisfaction Problem (CSP) vs. Optimization Problem (OP) • CSP tries to discover dependencies from the data with statistical hypothesis tests • OP searches the space of structures, trying to improve the score assigned by a scoring function
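As an illustration of the constraint-based (CSP) side, the sketch below applies Pearson's chi-square test of independence to two discretized variables. The choice of this particular test and the helper names are assumptions, not necessarily what the thesis used. The default critical value 3.841 is the 95% quantile of the chi-square distribution with one degree of freedom and must be adjusted for other degrees of freedom.

```python
from collections import Counter
from typing import Hashable, Sequence, Tuple


def chi_square_statistic(xs: Sequence[Hashable],
                         ys: Sequence[Hashable]) -> Tuple[float, int]:
    """Pearson chi-square statistic and degrees of freedom for the independence
    of two discretized variables observed on the same samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty samples of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))                 # observed contingency table
    count_x, count_y = Counter(xs), Counter(ys)  # marginal counts
    stat = 0.0
    for x, nx in count_x.items():
        for y, ny in count_y.items():
            expected = nx * ny / n  # expected count under independence
            stat += (joint[(x, y)] - expected) ** 2 / expected
    dof = (len(count_x) - 1) * (len(count_y) - 1)
    return stat, dof


def looks_dependent(xs: Sequence[Hashable], ys: Sequence[Hashable],
                    critical_value: float = 3.841) -> bool:
    """Reject independence if the statistic exceeds the critical value.
    The default is the 95% chi-square quantile for 1 degree of freedom."""
    stat, _ = chi_square_statistic(xs, ys)
    return stat > critical_value


gene_x = [0] * 50 + [1] * 50
gene_y = [0] * 50 + [1] * 50                 # perfectly coupled with gene_x
print(chi_square_statistic(gene_x, gene_y))  # (100.0, 1)
print(looks_dependent(gene_x, [0, 1] * 50))  # False
```

A constraint-based learner runs many such tests (including conditional ones) and keeps only the edges between variables that are never found independent.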
  • 36. Bayesian Networks: K2 algorithm • Goal: maximize the probability of the structure given the data • An initial variable order is given (A, B, C, D, E, F, G) [Quality measure of the network given the data by Cooper and Herskovits]
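  • 39. Bayesian Networks: K2 algorithm • Let D be the dataset, N the number of examples, G the network structure, $pa_{ij}$ the j-th instantiation of $Pa(x_i)$, $r_i$ the number of values of $x_i$, $q_i$ the number of instantiations of $Pa(x_i)$, $N_{ijk}$ the number of cases in which $x_i = k$ and $Pa(x_i) = j$, and $N_{ij} = \sum_{k=1}^{r_i} N_{ijk}$ • $P(G, D) = P(G)\,P(D \mid G)$, with $P(D \mid G) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{(r_i - 1)!}{(N_{ij} + r_i - 1)!} \prod_{k=1}^{r_i} N_{ijk}!$ [Quality measure of the network given the data by Cooper and Herskovits]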
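Because the factorials grow very quickly, the Cooper-Herskovits score is usually evaluated in log space, where its products become sums (each $\log m!$ can be computed as $\log\Gamma(m+1)$):

```latex
\log P(D \mid G) = \sum_{i=1}^{n} \sum_{j=1}^{q_i} \left[ \log (r_i - 1)! - \log (N_{ij} + r_i - 1)! + \sum_{k=1}^{r_i} \log N_{ijk}! \right]
```

The outer sum also shows that the score decomposes into one term per node and its parent set, which is what lets K2 choose the parents of each variable separately.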
  • 40. Bayesian Networks: K2 algorithm • Possible actions • edge addition • edge deletion
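A compact sketch of the greedy K2 search in Python, assuming discrete data coded as integers 0 to $r_i - 1$, a given variable order, and a hypothetical cap `max_parents` on the number of parents; all function names are hypothetical. Following Cooper and Herskovits' original algorithm it uses only the edge-addition action: each node greedily adds the preceding variable that most increases the log of its score term and stops when no addition helps. The edge-deletion action listed on the slide is not implemented here.

```python
from collections import Counter
from math import lgamma
from typing import Dict, List, Sequence

Row = Dict[str, int]  # one sample: variable name -> discrete level


def log_k2_node_score(data: List[Row], node: str, parents: Sequence[str],
                      arity: Dict[str, int]) -> float:
    """Log Cooper-Herskovits term for one node and one parent set."""
    r = arity[node]
    # N_ij and N_ijk counted per observed parent configuration j.
    n_ij = Counter(tuple(row[p] for p in parents) for row in data)
    n_ijk = Counter((tuple(row[p] for p in parents), row[node]) for row in data)
    score = 0.0
    for j, nij in n_ij.items():  # unobserved configurations contribute 0
        score += lgamma(r) - lgamma(nij + r)  # log (r-1)! - log (N_ij + r - 1)!
        for k in range(r):
            score += lgamma(n_ijk[(j, k)] + 1)  # log N_ijk!
    return score


def k2_search(data: List[Row], order: Sequence[str], arity: Dict[str, int],
              max_parents: int = 2) -> Dict[str, List[str]]:
    """Greedy K2: a node may only take parents that precede it in `order`."""
    structure: Dict[str, List[str]] = {}
    for pos, node in enumerate(order):
        parents: List[str] = []
        best = log_k2_node_score(data, node, parents, arity)
        while len(parents) < max_parents:
            scored = [(log_k2_node_score(data, node, parents + [z], arity), z)
                      for z in order[:pos] if z not in parents]
            if not scored:
                break
            new_score, z = max(scored)
            if new_score <= best:  # no addition improves the score: stop
                break
            best = new_score
            parents.append(z)
        structure[node] = parents
    return structure


# Toy data: B copies A, while C is independent of both.
data = [{"A": i % 2, "B": i % 2, "C": (i // 2) % 2} for i in range(20)]
arity = {"A": 2, "B": 2, "C": 2}
print(k2_search(data, ["A", "B", "C"], arity))  # {'A': [], 'B': ['A'], 'C': []}
```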
  • 41. Data Integration • Heterogeneous data integration • Binary gene-gene relations • Bayesian network collective learning (Partial Integration)
  • 42. Data Integration • Heterogeneous data integration • Binary gene-gene relations • Bayesian network collective learning (Partial Integration) [Figure: Gene1 ... GeneN → literature extraction (abstract indexing with a fixed vocabulary) → cosine measure → gene similarity matrix, used as a prior on the network structure over G1 ... G9]
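The literature-based prior can be sketched as follows, under assumptions not stated on the slide: each gene is represented by raw counts of a fixed vocabulary of terms in its indexed abstracts (the original work may have used a different weighting, such as tf-idf), and gene-gene similarity is the cosine of the angle between these vectors. How the resulting matrix is turned into a structure prior is not shown. All names and the toy abstracts are hypothetical.

```python
import math
import re
from typing import Dict, List, Tuple


def term_vector(abstracts: List[str], vocabulary: List[str]) -> List[float]:
    """Raw counts of each (lower-case) vocabulary term in a gene's abstracts."""
    counts = {term: 0 for term in vocabulary}
    for abstract in abstracts:
        for word in re.findall(r"[a-z0-9-]+", abstract.lower()):
            if word in counts:
                counts[word] += 1
    return [float(counts[term]) for term in vocabulary]


def cosine(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def similarity_matrix(gene_abstracts: Dict[str, List[str]],
                      vocabulary: List[str]) -> Dict[Tuple[str, str], float]:
    """Pairwise cosine similarities between the genes' term vectors."""
    vectors = {g: term_vector(a, vocabulary) for g, a in gene_abstracts.items()}
    return {(g, h): cosine(vectors[g], vectors[h]) for g in vectors for h in vectors}


vocabulary = ["apoptosis", "dna", "repair", "cell", "cycle"]
abstracts = {
    "Gene1": ["Apoptosis and DNA repair."],
    "Gene2": ["A DNA repair pathway."],
    "Gene3": ["Cell cycle control."],
}
sim = similarity_matrix(abstracts, vocabulary)
print(round(sim[("Gene1", "Gene2")], 3))  # 0.816
```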
  • 47. Data Integration • Heterogeneous data integration • Binary gene-gene relations • Bayesian network collective learning (Partial Integration) [Figure: microarray data and clinical data as the two integrated sources]
  • 52. Experiments and results • SynTReN: a generator of synthetic gene expression data for the design and analysis of structure learning algorithms [Figure: synthetic model → synthetic data → structure learning framework → learned model; a validator compares the learned model with the synthetic model]
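The slide does not say which measures the validator reports. One common choice, assumed here, is precision, recall and F1 over the directed edges of the learned model compared with the synthetic (true) model; the function `edge_scores` and the toy edge sets are hypothetical.

```python
from typing import Set, Tuple

Edge = Tuple[str, str]  # directed edge (parent, child)


def edge_scores(true_edges: Set[Edge],
                learned_edges: Set[Edge]) -> Tuple[float, float, float]:
    """Precision, recall and F1 of the learned directed edges."""
    true_positives = len(true_edges & learned_edges)
    precision = true_positives / len(learned_edges) if learned_edges else 0.0
    recall = true_positives / len(true_edges) if true_edges else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


synthetic = {("A", "C"), ("B", "C"), ("C", "D")}
learned = {("A", "C"), ("C", "D"), ("D", "E")}
print(edge_scores(synthetic, learned))
```

Here a reversed edge counts as an error; comparing undirected skeletons instead would give a more lenient variant.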
  • 53. Experiments and results • Results on a random network and a biological network (without clinical data) • Clinical data may improve structure learning, yielding more complete biological models (a promising result, since medical centers already collect this type of data)
  • 54. Learned Structure Network (DEMO)
      Microarray variables                   232
      Clinical variables                     11
      Patients (train)                       78
      Patients (test)                        19
      Structure learning computation time    12 h (*)
      (*) MATLAB running on an Intel Core 2 Duo at 2 GHz
  • 55. Conclusions • Partial integration of two data sources improves performance within the Bayesian network framework • A huge microarray-only dataset is not helpful on its own • Data integration requires fewer variables from each source (microarray-only data are expensive)