Introduction to causal discovery:
 A Bayesian Networks approach

               Ioannis Tsamardinos (1,2)
               Sofia Triantafillou (1,2)
               Vincenzo Lagani (1)
               (1) Bioinformatics Laboratory, Institute of Computer Science,
                   Foundation for Research and Tech., Hellas
               (2) Computer Science Department, University of Crete
Democritus said that he
would rather discover a
single cause than be the
      king of Persia



                           “Beyond such discarded fundamentals
                           as ‘matter’ and ‘force’ lies still another
                           fetish amidst the inscrutable arcana of
                           modern science, namely the category of
                           cause and effect”
                                                     [K. Pearson]
What is causality?
• What do you understand when I say:
  – smoking causes lung cancer?
What is (probabilistic) causality?
• What do you understand when I say:
  – smoking causes lung cancer?

• A causes B:
  – A causally affects B
  – Probabilistically
  – Intervening onto values of A will affect the distribution
    of B
  – in some appropriate context
Statistical Association
        (Unconditional Dependency)
• Dep(X, Y | ∅)

• X and Y are associated
   – Observing the value of X may change the conditional
     distribution of the (observed) values of Y: P(Y | X) ≠ P(Y)
   – Knowledge of X provides information for Y
   – Observed X is predictive for observed Y and vice versa
   – Knowing X changes our beliefs for the distribution of Y

   – Makes no claims about the distribution of Y, if instead of
     observing, we intervene on the values of X

• Several means for measuring it
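One simple means of measuring (linear) unconditional association is the sample Pearson correlation coefficient. A minimal sketch on synthetic data (the variable names and the 0.8 coefficient are purely illustrative):

```python
import math
import random

random.seed(0)

# Synthetic data: Y depends on X, so the two should test as associated.
n = 5000
xs = [random.gauss(0, 1) for _ in range(n)]
ys = [0.8 * x + random.gauss(0, 1) for x in xs]

def pearson(a, b):
    """Sample Pearson correlation coefficient of two equal-length lists."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = math.sqrt(sum((x - ma) ** 2 for x in a))
    sb = math.sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (sa * sb)

# A clearly non-zero coefficient is evidence for Dep(X, Y | ∅).
print(f"r = {pearson(xs, ys):.2f}")
```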
Association is NOT Causation
• Yellow teeth and lung cancer are associated

• Can I bleach my teeth and reduce the
  probability of getting lung cancer?

• Is Smoking really causing Lung Cancer?
BUT

“If A and B are correlated, A causes
B OR B causes A OR they share
a latent common cause“




                                  [Hans Reichenbach]
Is Smoking Causing Lung Cancer?
              All possible models*

             common                               common
              cause                                cause

                       Lung                                 Lung
   Smoking                              Smoking
                      Cancer                               Cancer




                       Lung
   Smoking
                      Cancer
                               *assuming:
                               1. Smoking precedes Lung Cancer
                               2. No feedback cycles
                               3. Several hidden common causes can be modeled
                                  by a single hidden common cause
A way to learn causality
          1. Take 200 people
          2. Randomly split them into treatment and
             control groups
          3. Force the treatment group to smoke and the
             control group not to smoke
          4. Wait until they are 60 years old
          5. Measure correlation


                [ Randomized Control Trial ]
 [Sir Ronald Fisher]
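The logic of the RCT can be sketched in a short simulation: in a hypothetical world where a hidden factor (here called G, an assumption for illustration) drives both smoking and cancer while smoking itself has no causal effect, the observational association is strong, but randomizing smoking destroys it:

```python
import random

random.seed(1)
n = 20000

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Hidden common cause G drives both variables; smoking has NO effect on cancer.
g = [random.gauss(0, 1) for _ in range(n)]
cancer = [gi + random.gauss(0, 1) for gi in g]

# Observational study: who smokes is determined by G.
smoke_obs = [1.0 if gi + random.gauss(0, 1) > 0 else 0.0 for gi in g]
print("observational:", round(corr(smoke_obs, cancer), 2))  # clearly non-zero

# RCT: smoking is assigned by coin flip, cutting the arrow from G.
smoke_rct = [float(random.randint(0, 1)) for _ in range(n)]
print("randomized:   ", round(corr(smoke_rct, cancer), 2))  # near zero
```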
Manipulation
                  All possible models*

                common                            common
RCT              cause            RCT              cause

                          Lung                              Lung
      Smoking                           Smoking
                         Cancer                            Cancer


RCT

                          Lung
      Smoking
                         Cancer
Manipulation removes other causes
                  All possible models*

                common                            common
RCT              cause            RCT              cause

                          Lung                              Lung
      Smoking                           Smoking
                         Cancer                            Cancer


RCT

                          Lung
      Smoking
                         Cancer    Association persists
                                        only when
                                  relationship is causal
RCTs are hard
• Can we learn anything from observational
  data?

“If A and B are correlated, A causes
B OR B causes A OR they share
a latent common cause“


                                    [Hans Reichenbach]
Conditional Association
          (Conditional Dependency)
• Dep( X, Y | Z)

• X and Y are associated conditioned on Z
   – For some values of Z (some context)

   – Knowledge of X still provides information for Y

   – Observed X is still predictive for observed Y and vice versa

• Statistically estimable
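For jointly Gaussian variables, conditional (in)dependence can be estimated via the partial correlation. A minimal sketch on a simulated chain X → Z → Y (structure and coefficients are illustrative):

```python
import random

random.seed(2)
n = 10000

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / (((1 - rac**2) * (1 - rbc**2)) ** 0.5)

# Chain X -> Z -> Y: X and Y are marginally dependent,
# but independent once Z is known.
x = [random.gauss(0, 1) for _ in range(n)]
z = [xi + random.gauss(0, 1) for xi in x]
y = [zi + random.gauss(0, 1) for zi in z]

print(round(corr(x, y), 2))             # non-zero: Dep(X, Y | ∅)
print(round(partial_corr(x, y, z), 2))  # near zero: Ind(X, Y | Z)
```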
Conditioning and Causality


Burglar             Earthquake



          Alarm




             Call




                                 [example by Judea Pearl]
Conditioning and Causality


     Burglar             Earthquake



               Alarm




                  Call



Dep(Burglar, Call | ∅)
     Burglar provides information for Call
Conditioning and Causality

                                       Learning the value of
      Burglar             Earthquake
                                       intermediate and
                Alarm                  common causes
                                       renders variables
                   Call                independent
Ind (Burglar, Call| Alarm)
Burglar provides no information for Call once Alarm is known
Conditioning and Causality


    Burglar             Earthquake



              Alarm




                 Call




Ind(Burglar, Earthquake | ∅)
Conditioning and Causality

                                       Learning the value of
      Burglar             Earthquake
                                       common effects
                Alarm                  renders variables
                                       dependent
                   Call




Dep(Burglar, Earthquake | Alarm)
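The explaining-away effect on this slide can be reproduced with a short simulation; the 10% priors for burglaries and earthquakes are made up for illustration:

```python
import random

random.seed(3)
n = 50000

# Collider Burglar -> Alarm <- Earthquake; the priors are illustrative.
burglar = [random.random() < 0.1 for _ in range(n)]
quake = [random.random() < 0.1 for _ in range(n)]
alarm = [b or q for b, q in zip(burglar, quake)]

def frac_burglar(rows):
    """Fraction of (burglar, quake) rows in which a burglar was present."""
    rows = list(rows)
    return sum(1 for b, q in rows if b) / len(rows)

everyone = list(zip(burglar, quake))
# Marginally, Earthquake carries no information about Burglar.
print(frac_burglar(everyone))                     # ~0.10
print(frac_burglar(r for r in everyone if r[1]))  # also ~0.10

# Given that the alarm rang, an earthquake "explains away" the burglar.
ringing = [(b, q) for b, q, a in zip(burglar, quake, alarm) if a]
print(frac_burglar(ringing))                      # ~0.53
print(frac_burglar(r for r in ringing if r[1]))   # drops back to ~0.10
```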
Observing data from a causal model
               Smoking




Yellow Teeth             Lung Cancer




 What would you observe?
 • Dep(Lung Cancer, Yellow Teeth | ∅)
 • Dep(Smoking, Lung Cancer | ∅)
 • Dep(Smoking, Yellow Teeth | ∅)
 • Ind(Lung Cancer, Yellow Teeth | Smoking)
Observing data from a causal model
               Smoking




Yellow Teeth             Lung Cancer




 What would you observe?
 • Dep(Lung Cancer, Yellow Teeth | ∅)
 • Dep(Smoking, Lung Cancer | ∅)
 • Ind(Lung Cancer, Yellow Teeth | ∅)
 • Dep(Lung Cancer, Yellow Teeth | Smoking)
Markov Equivalent Networks
               Smoking                                   Smoking




Yellow Teeth             Lung Cancer      Yellow Teeth             Lung Cancer




               Smoking
                                       • Same conditional
                                         independencies
                                       • Same skeleton
Yellow Teeth             Lung Cancer   • Same v-structures (subgraphs
                                         X → Y ← Z with no edge between X and Z)
Causal Bayesian Networks*
       Graph G                                      JPD J
                Smoking                                    Lung Cancer

                                   Smoking     Yellow      Yes           No
                                               Teeth

Yellow Teeth                       Yes         Yes         0.01          0.04

                                   Yes         No          0.01          0.04

                                   No          Yes         0.000045      0.044955

                                   No          No          0.000855      0.854145
        Lung Cancer       Assumptions about the nature of causality
                          connect the graph G with the observed
                          distribution J and allow reasoning

                                                              *almost there
Causal Markov Condition (CMC)
                          Smoking                      Lung Cancer
                                    Smoking   Yellow   Yes       No
                                              Teeth

         Yellow Teeth               Yes       Yes      0.01      0.04
                                    Yes       No       0.01      0.04
                                    No        Yes      0.000045  0.044955

                                    No        No       0.000855  0.854145
                  Lung Cancer




 Every variable is (conditionally) independent of its
non-effects (non-descendants in the graph)
given its direct causes (parents)
Causal Markov Condition
               Smoking   P(Yellow Teeth, Smoking, Lung Cancer) =
                         P(Smoking)
                         P(Yellow Teeth| Smoking)
                         P(Lung Cancer| Smoking)
Yellow Teeth




        Lung Cancer
Factorization with the CMC
                              P(Yellow Teeth, Smoking, Lung Cancer) =
                              P(Smoking)
                              P(Yellow Teeth| Smoking)
                              P(Lung Cancer| Smoking)

                                         Smoking      P(Smoking) = 0.1
P(Yellow Teeth | Smoking) = 0.5
P(Yellow Teeth | ¬Smoking) = 0.05

                       Yellow Teeth



                                                P(Lung Cancer | Smoking) = 0.2
                                                P(Lung Cancer | ¬Smoking) = 0.001
                                  Lung Cancer
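The CMC factorization can be checked against the joint table a few slides back with a few lines of arithmetic; the dictionaries below simply encode the three local distributions shown on this slide:

```python
# Rebuild the joint table from the three local distributions using the
# CMC factorization P(YT, S, LC) = P(S) * P(YT | S) * P(LC | S).
p_smoke = 0.1
p_teeth = {True: 0.5, False: 0.05}    # P(Yellow Teeth = yes | Smoking = s)
p_cancer = {True: 0.2, False: 0.001}  # P(Lung Cancer  = yes | Smoking = s)

def joint(s, yt, lc):
    return ((p_smoke if s else 1 - p_smoke)
            * (p_teeth[s] if yt else 1 - p_teeth[s])
            * (p_cancer[s] if lc else 1 - p_cancer[s]))

# Matches the table: Smoking=yes, Teeth=yes gives 0.01 and 0.04 ...
print(joint(True, True, True), joint(True, True, False))
# ... and Smoking=no, Teeth=yes gives 0.000045 and 0.044955.
print(joint(False, True, True), joint(False, True, False))

# Sanity check: the eight cells form a proper distribution.
total = sum(joint(s, yt, lc)
            for s in (True, False)
            for yt in (True, False)
            for lc in (True, False))
print(round(total, 10))  # 1.0
```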
Using a Causal Bayesian Network
                                          Smoking
1. Factorize the jpd                                            Medicine Y

2. Answer questions like:
    1. P(Lung Cancer| Levels of Protein
      X) = ?
                                                    Levels of
                                                    Protein X              Lung
   2. Ind(Smoking, Fatigue| Levels of                                      Cancer
      Protein X)?
                                          Yellow-
                                          stained
   3. What will happen if I design a      Fingers
      drug that blocks the function of
      protein X (predict effect of                               Fatigue
      interventions)?
Bayesian Networks

                            I don’t like all these
                                assumptions




                I kind of liked reducing
                 the parameters of the
                      distribution
Drop the
Causal part!
Using a Bayesian Network
                                  Smoking
1. Factorize the jpd
                                                           Medicine Y

2. Answer questions like:
    1. P(Lung Cancer| Levels of
      Protein X) = ?
                                               Levels of
   2. Ind(Smoking, Fatigue|                    Protein X                Lung
                                                                        Cancer
      Levels of Protein X)?
                              Yellow-stained
                                  Fingers



                                                            Fatigue
Observing a causal model
                                                                  Raw data
                                            Subject #   Smoking      Yellow   Lung
                   Smoking
                                                                     Teeth    Cancer

                                            1           0            0        0

                                            2           1            1        0

                                            3           1            1        0

  Yellow Teeth               Lung Cancer    4           0            1        0

                                            5           0            0        0

                                            6           1            1        0

                                            7           1            1        1

                                            8           0            1        0

                                            9           0            0        0
What would you observe?                     10          1            1        0
• Dep(Lung Cancer, Yellow Teeth| )          11          1            1        0

• Dep(Smoking, Lung Cancer | )              .           .            .        .

• Dep(Lung Cancer, Yellow Teeth | )         .           .            .        .

• Ind(Lung Cancer, Yellow Teeth| Smoking)   .           .            .        .

                                            10000       1            1        1
Learning Set of Equivalent Networks

Constraint-Based Approach   Score-Based (Bayesian)



Test conditional            Find the DAG with the
independencies in           maximum a posteriori
data and find a DAG that    probability given the
encodes them                data
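A toy constraint-based search can be sketched in a few lines: start from the complete graph over simulated chain data and delete every edge that some conditioning set (here: empty or a single variable) renders independent. This is a simplification in the spirit of the PC family, not an implementation of any algorithm listed below:

```python
import random
from itertools import combinations

random.seed(9)
n = 50000

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / (((1 - rac**2) * (1 - rbc**2)) ** 0.5)

# Data from the chain X -> Y -> Z (standardized coefficients).
x = [random.gauss(0, 1) for _ in range(n)]
y = [0.8 * xi + 0.6 * random.gauss(0, 1) for xi in x]
z = [0.8 * yi + 0.6 * random.gauss(0, 1) for yi in y]
data = {"X": x, "Y": y, "Z": z}

# Start with the complete graph; remove an edge when a conditioning set
# makes the pair (approximately) independent.
names = sorted(data)
edges = set(combinations(names, 2))
for a, b in list(edges):
    if abs(corr(data[a], data[b])) < 0.05:
        edges.discard((a, b))
        continue
    for c in (v for v in names if v not in (a, b)):
        if abs(partial_corr(data[a], data[b], data[c])) < 0.05:
            edges.discard((a, b))
print(sorted(edges))  # [('X', 'Y'), ('Y', 'Z')]: the X-Z link was only indirect
```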
Learning the network
 Constraint-Based Approach                 Bayesian Approach
•SGS [Spirtes, Glymour, & Scheines 2000]   •K2 [Cooper and Herskowitz 1992]
• PC [Spirtes, Glymour, & Scheines 2000]   •GBPS [Spirtes and Meek 1995]
                                           •GES [Chickering and Meek 2002]
• TPDA [Cheng et al., 1997]
                                           •Sparse Candidate [Friedman et al. 1999]
•CPC [Ramsey et al, 2006]                  •Optimal Reinsertion [Moore and Wong
                                           2003]

Hybrid                                     •Rec [Xie, X, Geng, Zhi, JMLR 2008]
                                           •Exact Algorithms [Koivisto et al., 2004] ,
•MMHC [Tsamardinos et al. 2006]            [Koivisto, 2006] , [Silander & Myllymaki, 2006]

• CB [Provan et al. 1995]
• BENEDICT [Provan and de Campos 2001]
•ECOS [Kaname et al. 2010]                 Many more!!!
Assumptions*
• Tests of Conditional Independences / Scoring
  methods may not be appropriate for the type of
  data at hand

•   Faithfulness
•   No feedback cycles
•   No determinism
•   No latent variables
•   No measurement error
•   No averaging effects
                           *aka why the algorithms may not work
Faithfulness
• ALL conditional (in)dependencies stem from
  the CMC
                 Time in   +0.6
          +0.5
                 the sun

  Sunscreen                 Skin Cancer
                 -0.3


                                             Time in
•Markov Condition does not imply:            the sun
   •Ind(Skin Cancer, Sunscreen)
•Unfaithful if:                                        Skin Cancer

   •Ind(Skin Cancer, Sunscreen)           Sunscreen
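The cancellation can be replicated with a linear simulation in which all variables are standardized, so that path tracing gives a total Sunscreen-to-Skin-Cancer association of -0.3 + 0.5 × 0.6 = 0 (the coefficients are the slide's; the linear Gaussian parametrization is an assumption for illustration):

```python
import random

random.seed(4)
n = 100000

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

# Time in the sun -> Sunscreen (+0.5), Time in the sun -> Skin Cancer (+0.6),
# Sunscreen -> Skin Cancer (-0.3); Sunscreen is scaled to unit variance.
t = [random.gauss(0, 1) for _ in range(n)]                     # time in the sun
s = [0.5 * ti + 0.75 ** 0.5 * random.gauss(0, 1) for ti in t]  # sunscreen
c = [0.6 * ti - 0.3 * si + random.gauss(0, 1)                  # skin cancer
     for ti, si in zip(t, s)]

# Direct effect (-0.3) cancels the indirect path (+0.5 * +0.6), so Sunscreen
# and Skin Cancer look independent even though one causes the other.
print(round(corr(s, c), 3))  # near zero: an unfaithful distribution
```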
Collinearity and Determinism
              X                Y                  W



                               Z


              X                Z                  W



                               Y

• Assume Y and Z are information equivalent (e.g., one-to-
  one deterministic relation)
• Cannot distinguish the two graphs
• A specific type of violation of Faithfulness
No Feedback Cycles
        Studying                    Good Grades
• Studying causes Good Grades causes more studying (at a
  later time!)…
• Hard to define without explicitly representing time
• If all relations are linear, we can assume we sample from
  the distribution of the equilibrium of the system when
  external factors are kept constant
   – Path-diagrams (Structural Equation Models with no
     measurement model part) allow such feedback loops
• If there is feedback and relations are not linear, there may
  be chaos, literally (mathematically) and metaphorically
No Latent Confounders
                 Heat
            (Not recorded)
                                     • Dep (Ice Cream, Polio)
                                     • No CAUSAL Bayesian
                                       Network on the
Ice cream                    Polio
                                       modeled variables
                                       ONLY captures causal
Ice cream                    Polio     relations correctly

Ice cream
                                     • Both Bayesian
                             Polio
                                       Networks capture
                                       associations correctly
                                       (not always the case)
Effects of Measurement Error
             X            Y             W
True
Model
             X           Y              W


Possible
Induced      Y           X              W
Model

• X, Y, W are the actual physical quantities
• X′, Y′, W′ are the measured quantities (+ noise)
• If Y is measured with more error than X, then
  Dep(X′, W′ | Y′)
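The induced dependence can be demonstrated with a simulated chain X → Y → W where only Y is measured with substantial error; the noise levels are illustrative:

```python
import random

random.seed(5)
n = 100000

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / (((1 - rac**2) * (1 - rbc**2)) ** 0.5)

# True chain X -> Y -> W, so Ind(X, W | Y) holds for the true quantities.
x = [random.gauss(0, 1) for _ in range(n)]
y = [xi + 0.5 * random.gauss(0, 1) for xi in x]
w = [yi + random.gauss(0, 1) for yi in y]

# Measured versions: Y is measured with much more error than X or W.
xm = x                                      # X' = X (no error, an assumption)
ym = [yi + random.gauss(0, 1) for yi in y]  # Y' very noisy
wm = w                                      # W' = W

print(round(partial_corr(x, w, y), 3))     # near zero: Ind(X, W | Y)
print(round(partial_corr(xm, wm, ym), 3))  # non-zero:  Dep(X', W' | Y')
```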
Effects of Averaging
• Almost all omics technologies measure average
  quantities over millions of cells
• The quantities in the models though refer to
  single-cells
           X             Y              W
True
Model




Possibly
Induced     Xi            Yi            Wi
Model
Success Stories
• Identify genes that cause a phenotype.
   – [Schadt et al., Nature Genetics, 2005]
• Reconstruct causal pathways.
   – [K. Sachs, et al. Science , (2005)]
• Identify causal effects.
   – [Maathius et al., Nature Methods, 2010]
• Predict association among variables never measured
  together.
   – [Tsamardinos et al., JMLR, 2012]
• Select features that are most predictive of a target variable.
   – [Aliferis et. al., JMLR, 2010 ]
An integrative genomics approach to infer causal
    associations between gene expression and disease
L –Locus of DNA variation
R – gene expression
C- Phenotype (Omental Fat Pad Mass trait)

Biological knowledge: Nothing causally affects L


L             R               C                      Causal model


L             R               C                     Reactive model


L             R               C                    Independent model
An integrative genomics approach to infer causal
associations between gene expression and disease
1. Identify loci susceptible for causing the disease
    • 4 QTLs
2. Identify gene expression traits correlated with the
    disease
    • 440 genes
3. Identify genes with eQTLs that coincide with the QTLs
    • 113 genes, 267 eQTLs
4. Identify genes that support causal models
5. Rank genes by causal effect
                                     One of them ranked 152
                                     out of the 440 based on
                                         mere correlation
An integrative genomics approach to infer causal
associations between gene expression and disease
1. Identify loci susceptible for causing the disease
    • 4 QTLs
2. Identify gene expression traits correlated with the
    disease
    • 440 genes
3. Identify genes with eQTLs that coincide with the QTLs
    • 113 genes, 267 eQTLs
4. Identify genes that support causal models
5. Rank genes by causal effect
                                    4 of the top ranked genes
                                      were experimentally
                                        validated as causal
Causal Protein-Signaling Networks Derived
      from Multi-parameter Single-Cell Data
                       CD3                CD28                              LFA-1



                                  L             PI3K     RAS      Cytohesin    JAB-1
                        Zap70     A    VAV
                                       SLP-76
                                LckT
                                                         PKC

                                       PLC
                                                PIP3     Akt


                                       PIP2                PKA        Raf



                                                MAPKKK   MAPKKK     Mek1/2
• Protein Signaling Pathways
                                                                    Erk1/2
  resemble Causal Bayesian Networks MEK4/7               MEK3/6


• Use Causal Bayesian Networks learning JNK                p38

   to reconstruct a Protein Signaling
   pathway
                                [K. Sachs, et al. Science , (2005)]
Reconstructed vs. Actual Network
                                         Phospho-Proteins
                                         Phospho-Lipids
                      PKC
                                         Perturbed in data

                                           Expected Pathway
                    PKA
                                            Reported
                                Raf
Plc                                         Reversed
              Jnk   P38                      Missed

                                Mek
       PIP3

                                Erk    15/17 Classic

PIP2                      Akt          17/17 Reported
                                       3 Missed
Predicting causal effects in large-scale
      systems from observational data
  What will happen if you knock down Gene X?


                                                              Gene A   Gene B   …   Gene X

                                                          1     0.1      0.5          1.2
Intervention calculus when DAG is Absent                  2    0.56      2.32         0.7

                                                          …
1. For every DAG G faithful to the data
                                                          n     7        0.4          2.4
2. The causal effect c_G of X on V in DAG G is:
    1. 0, if V ∈ Pa_X(G)
    2. the coefficient of X in the regression V ~ X + Pa_X(G), otherwise

3. The causal effect c of X on V is the minimum of all c_G
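Step 2 can be illustrated on a toy DAG (assumed here for illustration: P → X → V and P → V, so Pa_X = {P} and the true effect of X on V is 2.0); adjusting for the parents recovers the causal coefficient, while the naive regression is confounded:

```python
import random

random.seed(6)
n = 50000

# Toy DAG: P -> X -> V and P -> V, so Pa_X = {P}; true effect of X on V is 2.0.
p = [random.gauss(0, 1) for _ in range(n)]
x = [0.7 * pi + random.gauss(0, 1) for pi in p]
v = [2.0 * xi + 0.5 * pi + random.gauss(0, 1) for xi, pi in zip(x, p)]

def coef_of_a(y, a, b):
    """Coefficient of a in the least-squares fit y ~ a + b (normal equations)."""
    m = len(y)
    my, ma, mb = sum(y) / m, sum(a) / m, sum(b) / m
    ya = [yi - my for yi in y]
    aa = [ai - ma for ai in a]
    bb = [bi - mb for bi in b]
    saa = sum(u * u for u in aa)
    sbb = sum(u * u for u in bb)
    sab = sum(u * w for u, w in zip(aa, bb))
    say = sum(u * w for u, w in zip(aa, ya))
    sby = sum(u * w for u, w in zip(bb, ya))
    return (say * sbb - sby * sab) / (saa * sbb - sab * sab)

# Regressing V on X alone is confounded by the open path through P ...
naive = sum(xi * vi for xi, vi in zip(x, v)) / sum(xi * xi for xi in x)
# ... while adjusting for Pa_X recovers the causal coefficient.
adjusted = coef_of_a(v, x, p)
print(round(naive, 2), round(adjusted, 2))  # ~2.23 vs ~2.00
```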
Predicting causal effects in large-scale
  systems from observational data
IDA Evaluation
 Experimental                   Observational
    Data*                          Data**


Rank causal effects                Apply IDA.
Take top m percent              Take top   q genes

                                                     Rosetta Compendium data:
                      Compare
                                                     • 5,361 genes
                                                     • 234 single-gene deletion mutants*
                                                     • 63 wild-type measurements**
Predicting causal effects in large-scale
  systems from observational data




• m=10%
Integrative Causal Analysis
• Make inferences from multiple heterogeneous
  datasets
  – That measure quantities under different experimental
    conditions
  – Measure different (overlapping) sets of quantities
  – In the context of prior knowledge

• General Idea:
  – Find all CAUSAL models that simultaneously fit all
    datasets and are consistent with prior knowledge
  – Reason with the set of all such models
Reason with Set of Solutions
                                                       X   Y   Z   W
                      Y-Z
                       ?
                                                       X   Y   Z   W

                Variables                              X   Y   Z   W
            Y     X         W   Z
                                                       X   Y   Z   W
  samples




            ρXW.Y = 0                                  X   Y   Z   W
                                    |ρXY| > |ρXZ|
                                                       X   Y   Z   W

                   ρXW.Z = 0
Make Predictions
                                                      X   Y   Z   W
                    Y-Z
                                                      X   Y   Z   W

              Variables                               X   Y   Z   W
          Y     X         W   Z
                                                      X   Y   Z   W
samples




          ρXW.Y = 0                                   X   Y   Z   W
                                  |ρXY| > |ρXZ|
                                                      X   Y   Z   W

                 ρXW.Z = 0



                                                     Nothing causes X
Incorporating Prior Knowledge
                                                        X   Y   Z   W
                      Y-Z
                                                        X   Y   Z   W

                Variables                               X   Y   Z   W
            Y     X         W   Z
                                                        X   Y   Z   W
  samples




            ρXW.Y = 0                                   X   Y   Z   W
                                    |ρXY| > |ρXZ|
                                                        X   Y   Z   W

                   ρXW.Z = 0



                                                       Nothing causes X
Further Inductions
                                                                      X   Y   Z   W
                  X         Y         Z       W
                                                                      X   Y   Z   W

                          Variables                                   X   Y   Z   W
                      Y     X         W   Z
        samples




                      ρXW.Y = 0                                       X   Y   Z   W
                                                  |ρXY| > |ρXZ|
                                                                      X   Y   Z   W

                             ρXW.Z = 0



                                                                     Nothing causes X
Changing Y will have an effect on Z
Proof-of-concept Results
                   I Tsamardinos, S Triantafillou and V Lagani,
   Towards Integrative Causal Analysis of Heterogeneous Datasets and Studies,
                Journal of Machine Learning Research, to appear

    20 datasets




                                              Actual correlation
  Biological, Financial, Text,
  Medical, Social

698,897 predictions

   98% accuracy

vs. 16% for random guessing                                        Predicted correlation


       0.79 R2 between predicted and sample correlation
Causality and Feature Selection
• Question: Find a minimal set of molecular quantities that
  collectively carries all the information for optimal prediction /
  diagnosis (target variable) (Molecular Signature)
    – Minimal: throw away irrelevant or superfluous features
    – Collectively: May need to consider interactions
    – Optimal: Requires constructing a classification / regression model and
      estimating its performance
• Answer*: It is the direct causes, the direct effects, and the direct causes
   of the direct effects of the target variable in the BN (called the Markov
   Blanket in this context)



                          * Adopting all causal assumptions
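The claim can be checked on a toy network (assumed purely for illustration): G → A → T → C ← S, where the Markov blanket of the target T is {A (parent), C (child), S (spouse)}. The grandparent G is predictive on its own but redundant given the blanket, while the spouse S is useless alone yet informative given the child:

```python
import random

random.seed(7)
n = 100000

def corr(a, b):
    """Sample Pearson correlation coefficient."""
    m = len(a)
    ma, mb = sum(a) / m, sum(b) / m
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    sa = sum((x - ma) ** 2 for x in a) ** 0.5
    sb = sum((y - mb) ** 2 for y in b) ** 0.5
    return cov / (sa * sb)

def partial_corr(a, b, c):
    """First-order partial correlation of a and b given c."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / (((1 - rac**2) * (1 - rbc**2)) ** 0.5)

# Toy network G -> A -> T -> C <- S; Markov blanket of T = {A, C, S}.
g = [random.gauss(0, 1) for _ in range(n)]
a = [0.8 * gi + 0.6 * random.gauss(0, 1) for gi in g]
t = [0.8 * ai + 0.6 * random.gauss(0, 1) for ai in a]
s = [random.gauss(0, 1) for _ in range(n)]
c = [ti + si + random.gauss(0, 1) for ti, si in zip(t, s)]

# The grandparent G predicts T on its own ...
print(round(corr(g, t), 2))             # non-zero
# ... but adds nothing once the blanket member A is known.
print(round(partial_corr(g, t, a), 2))  # near zero

# The spouse S alone is useless for T ...
print(round(corr(s, t), 2))             # near zero
# ... but becomes informative once the common child C is known.
print(round(partial_corr(s, t, c), 2))  # non-zero (negative)
```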
Markov Blanket
CD3                   CD28                             LFA-1




               L            PI3K     RAS      Cytohesin   JAB-1
 Zap70     A       VAV
           T       SLP-76
         Lck
                                     PKC

                   PLC
                            PIP3     Akt


                   PIP2                PKA       Raf

Quantity of
Interest                    MAPKKK   MAPKKK    Mek1/2


Signature /                 MEK4/7   MEK3/6     Erk1/2

Markov
                             JNK       p38
Blanket
Markov Blanket Algorithms

• Efficient and accurate algorithms applicable to
  datasets with hundreds of thousands of
  variables
  – Max-Min Markov Blanket, [Tsamardinos, Aliferis, Statnikov, KDD 2003]
  – HITON [Aliferis, Tsamardinos, Statnikov, AMIA 2003]
  – [Aliferis, Statnikov, Tsamardinos, et. al. JMLR 2010]


• State-of-the-art in variable selection
Objective

•Identifying a set of transcripts able to
predict IKAROS gene expression.
•The selected set should be:
  – Maximally informative: able to predict
    IKAROS expression with optimal accuracy
  – Minimal: containing no redundant or
    uninformative transcripts
Data
• Genome-wide transcriptome data from HapMap
  individuals of European descent [Montgomery et al., 2010]
   – Lymphoblast cells
   – 60 distinct individuals
   – Approximately 140K transcripts


• RPKM values freely available from ArrayExpress
   – www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197
Methods
• Constraint-based, local learning feature selection
  method for identifying multiple signatures
   – [Tsamardinos, Lagani and Pappas, 2012]


• Support Vector Machine (SVM) for providing testable
  predictions
   – [Chang and Lin, 2011]


• Nested cross validation procedure for:
   – setting algorithms’ parameters
   – providing unbiased performance estimations
   – [Statnikov, Aliferis, Tsamardinos, et al., 2005]
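The nested cross-validation structure can be sketched as below; the shrinkage regressor and its lambda grid are placeholders standing in for the SVM and its hyperparameters, not the tools actually used in the study:

```python
import random

random.seed(8)

# Synthetic 1-D regression task (noise variance 1).
n = 200
xs = [random.gauss(0, 1) for _ in range(n)]
data = [(x, 2.0 * x + random.gauss(0, 1)) for x in xs]

def fit(rows, lam):
    """Shrunken least-squares slope for y ~ x (toy stand-in model)."""
    sxx = sum(x * x for x, _ in rows)
    sxy = sum(x * y for x, y in rows)
    return sxy / (sxx + lam)

def mse(beta, rows):
    return sum((y - beta * x) ** 2 for x, y in rows) / len(rows)

def folds(rows, k=5):
    return [rows[i::k] for i in range(k)]

def inner_cv(rows, lam, k=5):
    """Inner loop: k-fold CV error, used ONLY for picking the hyperparameter."""
    fs = folds(rows, k)
    err = 0.0
    for i, held in enumerate(fs):
        train = [r for j, f in enumerate(fs) if j != i for r in f]
        err += mse(fit(train, lam), held)
    return err / k

grid = [0.0, 1.0, 10.0, 100.0]
outer_scores = []
for i, test in enumerate(folds(data)):
    rest = [r for j, f in enumerate(folds(data)) if j != i for r in f]
    lam = min(grid, key=lambda l: inner_cv(rest, l))  # tuned WITHOUT the test fold
    outer_scores.append(mse(fit(rest, lam), test))    # unbiased outer estimate
print(round(sum(outer_scores) / len(outer_scores), 2))  # ~1.0, the noise level
```

Because the outer test fold never participates in hyperparameter selection, the averaged outer score is an unbiased estimate of generalization performance.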
Results
• 22 different signatures found to be equally maximally
  predictive
   – Mean Absolute Error: 1.93
   – R2: 0.7159
   – Correlation of predictions and true expressions:
      – 0.8461 (p-value < 0.0001)
• Example signature:
   – ENST00000246549, ENST00000545189,
     ENST00000265495, ENST00000398483, ENST00000496570
• Corresponding to genes:
   – FFAR2, ZNF426, ELF2, MRPL48, DNMT3A
Predicted vs. Observed IKZF1 values
Beyond This Tutorial
Textbooks:
• Pearl, J. Causality: models, reasoning and inference (Cambridge University
   Press: 2000).
• Spirtes, P., Glymour, C. & Scheines, R. Causation, Prediction, and Search. (The
   MIT Press: 2001).
• Neapolitan, R. Learning Bayesian Networks. (Prentice Hall: 2003).
Beyond This Tutorial
Different principles for discovering causality
• Shimizu, S., Hoyer, P.O., Hyvärinen, A. & Kerminen, A. A Linear Non-Gaussian Acyclic
    Model for Causal Discovery. Journal of Machine Learning Research 7, 2003-2030 (2006).
• Hoyer, P., Janzing, D., Mooij, J., Peters, J. & Schölkopf, B. Nonlinear causal discovery with
    additive noise models. Neural Information Processing Systems (NIPS) 21, 689-696
    (2009).
Causality with Feedback cycles
• Hyttinen, A., Eberhardt, F., Hoyer, P.O., Learning Linear Cyclic Causal Models with Latent
    Variables. Journal of Machine Learning Research, 13(Nov):3387-3439, 2012.
Causality with Latent Variables
• Richardson, T. & Spirtes, P. Ancestral Graph Markov Models. The Annals of Statistics 30,
    962-1030 (2002).
• Leray, P., Meganck, S., Maes, S. & Manderick, B. Causal graphical models with latent
    variables: Learning and inference. Innovations in Bayesian Networks 156, 219-249
    (2008).
Conclusions
• Causal Discovery is possible from observational data or by
  limited experiments
• Beware of violations of the assumptions and of equivalences
• Causality provides a formal language for conceptualizing
  data analysis problems
• Necessary to predict the effect of interventions
• Deep connections to Feature Selection
• Allows integrative analysis in novel ways
• Advanced theory and algorithms exist for different sets of
  (less restrictive) assumptions
• Still a way to go, particularly in disseminating to non-experts
References
•   S. B. Montgomery et al. Transcriptome genetics using second generation
    sequencing in a Caucasian population. Nature 464, 773-777 (1 April 2010)
•   I. Tsamardinos, V. Lagani and D. Pappas. Discovering multiple, equivalent
    biomarker signatures. In Proceedings of the 7th Conference of the Hellenic Society
    for Computational Biology & Bioinformatics (HSCBB), 2012
•   C. Chang and C. Lin. LIBSVM: a library for support vector machines. ACM
    Transactions on Intelligent Systems and Technology, 2:27:1--27:27, 2011
•   A. Statnikov, C. F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy. A comprehensive
    evaluation of multicategory classification methods for microarray gene expression
    cancer diagnosis. Bioinformatics, 21:631-643, 2005.

An Introduction to Causal Discovery, a Bayesian Network Approach

  • 1. Introduction to causal discovery: A Bayesian Networks approach Ioannis Tsamardinos1, 2 Sofia Triantafillou1, 2 Vincenzo Lagani1 1Bioinformatics Laboratory, Institute of Computer Science, Foundation for Research and Tech., Hellas 2Computer Science Department University of Crete
  • 2. Democritus said that he would rather discover a single cause than be the king of Persia “Beyond such discarded fundamentals as ‘matter’ and ‘force’ lies still another fetish amidst the inscrutable arcana of modern science, namely the category of cause and effect” [K. Pearson]
  • 3. What is causality? • What do you understand when I say : – smoking causes lung cancer?
  • 4. What is (probabilistic) causality? • What do you understand when I say : – smoking causes lung cancer? • A causes B: – A causally affects B – Probabilistically – Intervening onto values of A will affect the distribution of B – in some appropriate context
  • 5. Statistical Association (Unconditional Dependency) • Dep( X, Y | ) • X and Y are associated – Observing the value of X may change the conditional distribution of the (observed) values of Y: P(Y | X) P(Y) – Knowledge of X provides information for Y – Observed X is predictive for observed Y and vice versa – Knowing X changes our beliefs for the distribution of Y – Makes no claims about the distribution of Y, if instead of observing, we intervene on the values of X • Several means for measuring it
  • 6. Association is NOT Causation • Yellow teeth and lung cancer are associated • Can I bleach my teeth and reduce the probability of getting lung cancer? • Is Smoking really causing Lung Cancer?
  • 7. BUT “If A and B are correlated, A causes B OR B causes A OR they share a latent common cause“ [Hans Reichenbach]
  • 8. Is Smoking Causing Lung Cancer? All possible models* common common cause cause Lung Lung Smoking Smoking Cancer Cancer Lung Smoking Cancer *assuming: 1. Smoking precedes Lung Cancer 2. No feedback cycles 3. Several hidden common causes can be modeled by a single hidden common cause
  • 9. A way to learn causality 1. Take 200 people 2. Randomly split them in control and treatment groups 3. Force control group to smoke, force treatment group not to smoke 4. Wait until they are 60 years old 5. Measure correlation [ Randomized Control Trial ] [Sir Ronald Fisher]
  • 10. Manipulation All possible models* common common RCT cause RCT cause Lung Lung Smoking Smoking Cancer Cancer RCT Lung Smoking Cancer
  • 11. Manipulation removes other causes All possible models* common common RCT cause RCT cause Lung Lung Smoking Smoking Cancer Cancer RCT Lung Smoking Cancer
  • 12. Manipulation removes other causes All possible models* common common RCT cause RCT cause Lung Lung Smoking Smoking Cancer Cancer RCT Lung Smoking Cancer Association persists only when relationship is causal
  • 13. RCTs are hard • Can we learn anything from observational data?
  • 14. RCTs are hard • Can we learn anything from observational data? “If A and B are correlated, A causes B OR B causes A OR they share a latent common cause“ [Hans Reichenbach]
  • 15. Conditional Association (Conditional Dependency) • Dep( X, Y | Z) • X and Y are associated conditioned on Z – For some values of Z (some context) – Knowledge of X still provides information for Y – Observed X is still predictive for observed Y and vice versa • Statistically estimable
  • 16. Conditioning and Causality Burglar Earthquake Alarm Call [example by Judea Pearl]
  • 17. Conditioning and Causality Burglar Earthquake Alarm Call Dep(Burglar, Call| ) Burglar provides information for Call
  • 18. Conditioning and Causality Learning the value of Burglar Earthquake intermediate and Alarm common causes renders variables Call independent Ind (Burglar, Call| Alarm) Burglar provides no information for Call once Alarm is known
  • 19. Conditioning and Causality Burglar Earthquake Alarm Call Ind (Burglar, Earthquake| )
  • 20. Conditioning and Causality Burglar Earthquake Alarm Call Ind (Burglar, Earthquake| )
  • 21. Conditioning and Causality Learning the value of Burglar Earthquake common effects Alarm renders variables dependent Call Dep (Burglar, Earthquake| Alarm)
  • 22. Observing data from a causal model Smoking Yellow Teeth Lung Cancer What would you observe? • Dep(Lung Cancer, Yellow Teeth| ) • Dep(Smoking, Lung Cancer | ) • Dep(Lung Cancer, Yellow Teeth | ) • Ind(Lung Cancer, Yellow Teeth| Smoking)
  • 23. Observing data from a causal model Smoking Yellow Teeth Lung Cancer What would you observe? • Dep(Lung Cancer, Yellow Teeth| ) • Dep(Smoking, Lung Cancer | ) • Dep(Lung Cancer, Yellow Teeth | ) • Ind(Lung Cancer, Yellow Teeth| Smoking)
  • 24. Observing data from a causal model Smoking Yellow Teeth Lung Cancer What would you observe? • Dep(Lung Cancer, Yellow Teeth| ) • Dep(Smoking, Lung Cancer | ) • Dep(Lung Cancer, Yellow Teeth | ) • Ind(Lung Cancer, Yellow Teeth| Smoking)
  • 25. Observing data from a causal model Smoking Yellow Teeth Lung Cancer What would you observe? • Dep(Lung Cancer, Yellow Teeth| ) • Dep(Smoking, Lung Cancer | ) • Ind(Lung Cancer, Yellow Teeth | ) • Dep(Lung Cancer, Yellow Teeth| Smoking)
  • 26. Markov Equivalent Networks Smoking Smoking Yellow Teeth Lung Cancer Yellow Teeth Lung Cancer Smoking • Same conditional Independencies • Same skeleton Yellow Teeth Lung Cancer • Same v-structures (subgraphs X Y Z no X-Z)
  • 27. Causal Bayesian Networks* Graph G JPD J Smoking Lung Cancer Smoking Yellow Yes No Teeth Yellow Teeth Yes Yes 0,01 0,04 Yes No 0,01 0,04 No Yes 0,000045 0,044955 No No 0,000855 0,854145 Lung Cancer Assumptions about the nature of causality connect the graph G with the observed distribution J and allow reasoning *almost there
  • 28. Causal Markov Condition (CMC) Smoking Lung Cancer Smoking Yellow Yes No Teeth Yellow Teeth Yes Yes 0,01 0,04 Yes No 0,01 0,04 No Yes 0,000045 0,044955 No No 0,000855 0,854145 Lung Cancer Every variable is (conditionally) independent of its non-effects (non-descendants in the graph) given its direct causes (parents)
  • 29. Causal Markov Condition Smoking P(Yellow Teeth, Smoking, Lung Cancer) = P(Smoking) P(Yellow Teeth| Smoking) P(Lung Cancer| Smoking) Yellow Teeth Lung Cancer
  • 30. Factorization with the CMC P(Yellow Teeth, Smoking, Lung Cancer) = P(Smoking) P(Yellow Teeth| Smoking) P(Lung Cancer| Smoking) Smoking P(Smoking) = 0.1 P(Yellow Teeth|Smoking) = 0.5 P(Yellow Teeth| Smoking) = 0.05 Yellow Teeth P(Lung Cancer|Smoking) = 0.2 P(Lung Cancer| Smoking) = 0.001 Lung Cancer
  • 31. Using a Causal Bayesian Network Smoking 1. Factorize the jpd Medicine Y 2. Answer questions like: 1. P(Lung Cancer| Levels of Protein X) = ? Levels of Protein X Lung 2. Ind(Smoking, Fatigue| Levels of Cancer Protein X)? Yellow- stained 3. What will happen if I design a Fingers drug that blocks the function of protein X (predict effect of Fatigue interventions)?
  • 32. Bayesian Networks I don’t like all these assumptions I kind of liked reducing the parameters of the distribution Drop the Causal part!
  • 33. Using a Bayesian Network Smoking 1. Factorize the jpd Medicine Y 2. Answer questions like: 1. P(Lung Cancer| Levels of Protein X) = ? Levels of 2. Ind(Smoking, Fatigue| Protein X Lung Cancer Levels of Protein X)? Yellow-stained Fingers Fatigue
  • 34. Observing a causal model Raw data Subject # Smoking Yellow Lung Smoking Teeth Cancer 1 0 0 0 2 1 1 0 3 1 1 0 Yellow Teeth Lung Cancer 4 0 1 0 5 0 0 0 6 1 1 0 7 1 1 1 8 0 1 0 9 0 0 0 What would you observe? 10 1 1 0 • Dep(Lung Cancer, Yellow Teeth| ) 11 1 1 0 • Dep(Smoking, Lung Cancer | ) . . . . • Dep(Lung Cancer, Yellow Teeth | ) . . . . • Ind(Lung Cancer, Yellow Teeth| Smoking) . . . . 10000 1 1 1
  • 35. Learning Set of Equivalent Networks Constraint-Based Approach Score-Based (Bayesian) Test conditional Find the DAG with the independencies in maximum a posteriori data and find a DAG that probability given the encodes them data
  • 36. Learning the network Constraint-Based Approach Bayesian Approach •SGS [Spirtes, Glymour, & Scheines 2000] •K2 [Cooper and Herskowitz 1992] • PC [Spirtes, Glymour, & Scheines 2000] •GBPS [Spirtes and Meek 1995] •GES [Chickering and Meek 2002] • TPDA [Cheng et al., 1997] •Sparse Candidate [Friedman et al. 1999] •CPC [Ramsey et al, 2006] •Optimal Reinsertion [Moore and Wong 2003] Hybrid •Rec [Xie, X, Geng, Zhi, JMLR 2008] •Exact Algorithms [Koivisto et al., 2004] , •MMHC [Tsamardinos et al. 2006] [Koivisto, 2006] , [Silander & Myllymaki, 2006] • CB [Provan et al. 1995] • BENEDICT [Provan and de Campos 2001] •ECOS [Kaname et al. 2010] Many more!!!
  • 37. Assumptions* • Tests of Conditional Independences / Scoring methods may not be appropriate for the type of data at hand • Faithfulness • No feedback cycles • No determinism • No latent variables • No measurement error • No averaging effects *aka why the algorithms may not work
  • 38. Faithfulness • ALL conditional (in)dependencies stem from the CMC Time in +0.6 +0.5 the sun Sunscreen Skin Cancer -0.3 Time in •Markov Condition does not imply: the sun •Ind(Skin Cancer, Sunscreen) •Unfaithful if: Skin Cancer •Ind(Skin Cancer, Sunscreen) Sunscreen
  • 39. Collinearity and Determinism X Y W Z X Z W Y • Assume Y and Z are information equivalent (e.g., one-to- one deterministic relation) • Cannot distinguish the two graphs • A specific type of violation of Faithfulness
  • 40. No Feedback Cycles Studying Good Grades • Studying causes Good Grades causes more studying (at a later time!)… • Hard to define without explicitly representing time • If all relations are linear, we can assume we sample from the distribution of the equilibrium of the system when external factors are kept constant – Path-diagrams (Structural Equation Models with no measurement model part) allow such feedback loops • If there is feedback and relations are not linear, there may be chaos, literally (mathematically) and metaphorically
  • 41. No Latent Confounders Heat (Not recorded) • Dep (Ice Cream, Polio) • No CAUSAL Bayesian Network on the Ice cream Polio modeled variables ONLY captures causal Ice cream Polio relations correctly Ice cream • Both Bayesian Polio Networks capture associations correctly (not always the case)
  • 42. Effects of Measurement Error X Y W True Model X Y W Possible Induced Y X W Model • X, Y, Z the actual physical quantities • X , Y , Z the measured quantities (+ noise) • If Y is measured with more error than X then Dep(X ;W | Y )
  • 43. Effects of Averaging • Almost all omics technologies measure average quantities over millions of cells • The quantities in the models though refer to single-cells X Y W True Model Possibly Induced Xi Yi Wi Model
  • 44. Success Stories • Identify genes that cause a phenotype. – [Schadt et al., Nature Genetics, 2005] • Reconstruct causal pathways. – [K. Sachs, et al. Science , (2005)] • Identify causal effects. – [Maathius et al., Nature Methods, 2010] • Predict association among variables never measured together. – [Tsamardinos et al., JMLR, 2012] • Select features that are most predictive of a target variable. – [Aliferis et. al., JMLR, 2010 ]
  • 45. An integrative genomics approach to infer causal associations between gene expression and disease L –Locus of DNA variation R – gene expression C- Phenotype (Omental Fat Pad Mass trait) Biological knowledge: Nothing causally affects L L R C Causal model L R C Reactive model L R C Independent model
  • 46. An integrative genomics approach to infer causal associations between gene expression and disease 1. Identify loci susceptible for causing the disease • 4 QTLs 2. Identify gene expression traits correlated with the disease • 440 genes 3. Identify genes with eQTLs that coincide with the QTLs • 113 genes, 267 eQTLs 4. Identify genes that support causal models 5. Rank genes by causal effect One of them ranked 152 out of the 440 based on mere correlation
  • 47. An integrative genomics approach to infer causal associations between gene expression and disease 1. Identify loci susceptible for causing the disease • 4 QTLs 2. Identify gene expression traits correlated with the disease • 440 genes 3. Identify genes with eQTLs that coincide with the QTLs • 113 genes, 267 eQTLs 4. Identify genes that support causal models 5. Rank genes by causal effect 4 of the top ranked genes where experimentally validated as causal
  • 48. Causal Protein-Signaling Networks Derived from Multi-parameter Single-Cell Data CD3 CD28 LFA-1 L PI3K RAS Cytohesin JAB-1 Zap70 A VAV SLP-76 LckT PKC PLC PIP3 Akt PIP2 PKA Raf MAPKKK MAPKKK Mek1/2 • Protein Signaling Pathways Erk1/2 resemble Causal Bayesian Networks MEK4/7 MEK3/6 • Use Causal Bayesian Networks learning JNK p38 to reconstruct a Protein Signaling pathway [K. Sachs, et al. Science , (2005)]
  • 49. Reconstructed vs. Actual Network Phospho-Proteins Phospho-Lipids PKC Perturbed in data Expected Pathway PKA Reported Raf Plc Reversed Jnk P38 Missed Mek PIP3 Erk  15/17 Classic PIP2 Akt  17/17 Reported  3 Missed
  • 50. Predicting causal effects in large-scale systems from observational data What will happen if you knock down Gene X? Gene A Gene B … Gene X 1 0.1 0.5 1.2 Intervention calculus when DAG is Absent 2 0.56 2.32 0.7 … 1. For every DAG G faithful to the data n 7 0.4 2.4 c 2. Causal effect G of X on V in DAG G is 1. 0, if V PaX (G) 2. coefficient of X in V∼X + PaX (G), otherwise 3. Causal effect c of X on V is the minimum of all cG
• 51. Predicting causal effects in large-scale systems from observational data
  IDA Evaluation
  • Observational data**: apply IDA to rank causal effects; take the top m percent
  • Experimental data*: take the top q genes
  • Compare the two lists
  Rosetta Compendium data:
  • 5,361 genes
  • 234 single-gene deletion mutants*
  • 63 wild-type measurements**
• 52. Predicting causal effects in large-scale systems from observational data
  [Figure: IDA evaluation results for m = 10%]
• 53. Integrative Causal Analysis
  • Make inferences from multiple heterogeneous datasets
    – that measure quantities under different experimental conditions
    – that measure different (overlapping) sets of quantities
    – in the context of prior knowledge
  • General idea:
    – Find all causal models that simultaneously fit all datasets and are consistent with prior knowledge
    – Reason with the set of all such models
• 54. Reason with Set of Solutions
  Given a dataset (samples × variables X, Y, Z, W) with the observed constraints:
  • ρXW.Y = 0
  • |ρXY| > |ρXZ|
  • ρXW.Z = 0
  [Figure: the set of causal graphs over X, Y, Z, W consistent with these constraints]
  Question: what is the relation Y–Z?
• 55. Make Predictions
  Constraints: ρXW.Y = 0, |ρXY| > |ρXZ|, ρXW.Z = 0
  [Figure: candidate causal graphs over X, Y, Z, W]
  Prediction: nothing causes X
• 56. Incorporating Prior Knowledge
  Constraints: ρXW.Y = 0, |ρXY| > |ρXZ|, ρXW.Z = 0
  Prior knowledge: nothing causes X
  [Figure: the candidate graphs over X, Y, Z, W remaining after imposing the prior]
• 57. Further Inductions
  Constraints: ρXW.Y = 0, |ρXY| > |ρXZ|, ρXW.Z = 0; nothing causes X
  [Figure: the remaining candidate graphs over X, Y, Z, W]
  Induction: changing Y will have an effect on Z
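The constraint pattern used in these slides can be reproduced on simulated data: for a chain X → Y → Z → W, both ρXW.Y and ρXW.Z vanish, and Y correlates more strongly with X than Z does. A minimal sketch, with illustrative coefficients and sample size:

```python
import math
import random

random.seed(1)
n = 20000
# Simulated chain X -> Y -> Z -> W
X = [random.gauss(0, 1) for _ in range(n)]
Y = [0.8 * x + random.gauss(0, 1) for x in X]
Z = [0.8 * y + random.gauss(0, 1) for y in Y]
W = [0.8 * z + random.gauss(0, 1) for z in Z]

def corr(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    ca, cb = [v - ma for v in a], [v - mb for v in b]
    return sum(p * q for p, q in zip(ca, cb)) / math.sqrt(
        sum(p * p for p in ca) * sum(q * q for q in cb))

def partial_corr(a, b, c):
    """First-order partial correlation rho_{ab.c}."""
    rab, rac, rbc = corr(a, b), corr(a, c), corr(b, c)
    return (rab - rac * rbc) / math.sqrt((1 - rac ** 2) * (1 - rbc ** 2))

print(abs(partial_corr(X, W, Y)) < 0.05)   # Y separates X from W: True
print(abs(partial_corr(X, W, Z)) < 0.05)   # Z separates X from W: True
print(abs(corr(X, Y)) > abs(corr(X, Z)))   # Y is "closer" to X: True
```

These are exactly the testable constraints the slides enumerate; the integrative analysis keeps only the causal graphs that entail all of them.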
• 58. Proof-of-concept Results
  I. Tsamardinos, S. Triantafillou and V. Lagani, Towards Integrative Causal Analysis of Heterogeneous Datasets and Studies, Journal of Machine Learning Research, to appear
  • 20 datasets: biological, financial, text, medical, social
  • 698,897 predictions
  • 98% accuracy vs. 16% for random guessing
  • R² of 0.79 between predicted and sample correlation
  [Figure: scatter plot of predicted vs. actual correlation]
• 59. Causality and Feature Selection
  • Question: find a minimal set of molecular quantities that collectively carries all the information for optimal prediction / diagnosis of a target variable (a Molecular Signature)
    – Minimal: throw away irrelevant or superfluous features
    – Collectively: may need to consider interactions
    – Optimal: requires constructing a classification / regression model and estimating its performance
  • Answer*: the direct causes, the direct effects, and the direct causes of the direct effects of the target variable in the BN (called the Markov Blanket in this context)
  * Adopting all causal assumptions
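Given a known DAG, the answer above is easy to compute: the Markov blanket of a target is its parents (direct causes), its children (direct effects), and its children's other parents (direct causes of the direct effects). A minimal sketch on a made-up toy DAG:

```python
from collections import defaultdict

def markov_blanket(edges, target):
    """Markov blanket of `target` in a DAG given as (parent, child) edges:
    its parents, its children, and its children's other parents."""
    parents, children = defaultdict(set), defaultdict(set)
    for a, b in edges:
        parents[b].add(a)
        children[a].add(b)
    mb = set(parents[target]) | set(children[target])
    for child in children[target]:
        mb |= parents[child]  # "spouses": co-parents of target's children
    mb.discard(target)
    return mb

# Toy DAG: A -> T, T -> C, B -> C, D -> A; D is NOT in T's blanket
edges = [("A", "T"), ("T", "C"), ("B", "C"), ("D", "A")]
print(sorted(markov_blanket(edges, "T")))  # ['A', 'B', 'C']
```

Note that D is excluded: it influences T only through A, so it is superfluous once A is in the signature, which is exactly the "minimal" requirement.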
• 60. Markov Blanket
  [Figure: T-cell signaling pathway (CD3, CD28, LFA-1, PI3K, RAS, Cytohesin, JAB-1, Zap70, VAV, SLP-76, Lck, PKC, PLC, PIP2/PIP3, Akt, PKA, Raf, MAPKKK, Mek1/2, MEK4/7, MEK3/6, Erk1/2, JNK, p38) with the Quantity of Interest highlighted]
• 61. Markov Blanket
  [Figure: the same pathway with the Signature / Markov Blanket of the Quantity of Interest highlighted]
• 62. Markov Blanket Algorithms
  • Efficient and accurate algorithms applicable to datasets with hundreds of thousands of variables
    – Max-Min Markov Blanket [Tsamardinos, Aliferis, Statnikov, KDD 2003]
    – HITON [Aliferis, Tsamardinos, Statnikov, AMIA 2003]
    – [Aliferis, Statnikov, Tsamardinos, et al., JMLR 2010]
  • State-of-the-art in variable selection
• 63. Objective
  • Identify a set of transcripts able to predict IKAROS gene expression.
  • The selected set should be:
    – Maximally informative: able to predict IKAROS expression with optimal accuracy
    – Minimal: containing no redundant or uninformative transcripts
• 64. Data
  • Genome-wide transcriptome data from HapMap individuals of European descent [Montgomery et al., 2010]
    – Lymphoblast cells
    – 60 distinct individuals
    – ~140K transcripts
  • RPKM values freely available from ArrayExpress
    – www.ebi.ac.uk/arrayexpress/experiments/E-MTAB-197
• 65. Methods
  • Constraint-based, local-learning feature selection method for identifying multiple signatures [Tsamardinos, Lagani and Pappas, 2012]
  • Support Vector Machine (SVM) for providing testable predictions [Chang and Lin, 2011]
  • Nested cross-validation procedure [Statnikov, Aliferis, Tsamardinos, et al., 2005] for:
    – setting the algorithms' parameters
    – providing unbiased performance estimations
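The nested cross-validation procedure can be sketched with a toy example. This is a hypothetical illustration, not the study's pipeline: a 1-D k-NN regressor stands in for the SVM and feature-selection steps, and k stands in for the algorithms' parameters. The inner loop tunes k on training folds only; the outer loop estimates error on data never used for tuning, which is what removes the optimistic bias:

```python
import random

random.seed(2)
# Toy regression data: y = 2x + noise (stands in for expression data)
xs = [random.uniform(0, 10) for _ in range(120)]
data = [(x, 2.0 * x + random.gauss(0, 0.5)) for x in xs]

def knn_predict(train, x, k):
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(y for _, y in nearest) / k

def mae(train, test, k):
    return sum(abs(knn_predict(train, x, k) - y) for x, y in test) / len(test)

def folds(rows, m):
    """Split rows into m (train, test) partitions."""
    return [(rows[:i * len(rows) // m] + rows[(i + 1) * len(rows) // m:],
             rows[i * len(rows) // m:(i + 1) * len(rows) // m])
            for i in range(m)]

outer_errors = []
for train, test in folds(data, 5):               # outer loop: performance estimate
    best_k = min([1, 3, 5, 9],                   # inner loop: tune the hyperparameter
                 key=lambda k: sum(mae(tr, va, k) for tr, va in folds(train, 4)))
    outer_errors.append(mae(train, test, best_k))

print(round(sum(outer_errors) / len(outer_errors), 2))  # unbiased error estimate
```

Reporting the inner-loop error instead of `outer_errors` would be the classic mistake: the hyperparameter was selected to minimize it, so it underestimates the true error.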
• 66. Results
  • 22 different signatures found to be equally maximally predictive
    – Mean Absolute Error: 1.93
    – R²: 0.7159
    – Correlation of predictions and true expressions: 0.8461 (p-value < 0.0001)
  • Example signature: ENST00000246549, ENST00000545189, ENST00000265495, ENST00000398483, ENST00000496570
  • Corresponding genes: FFAR2, ZNF426, ELF2, MRPL48, DNMT3A
• 67. Predicted vs. Observed IKZF1 values
  [Figure: scatter plot of predicted vs. observed IKZF1 expression values]
• 68. Beyond This Tutorial
  Textbooks:
  • Pearl, J. Causality: Models, Reasoning and Inference (Cambridge University Press, 2000).
  • Spirtes, P., Glymour, C. & Scheines, R. Causation, Prediction, and Search (The MIT Press, 2001).
  • Neapolitan, R. Learning Bayesian Networks (Prentice Hall, 2003).
• 69. Beyond This Tutorial
  Different principles for discovering causality:
  • Shimizu, S., Hoyer, P.O., Hyvärinen, A. & Kerminen, A. A Linear Non-Gaussian Acyclic Model for Causal Discovery. Journal of Machine Learning Research 7, 2003-2030 (2006).
  • Hoyer, P., Janzing, D., Mooij, J., Peters, J. & Schölkopf, B. Nonlinear causal discovery with additive noise models. Neural Information Processing Systems (NIPS) 21, 689-696 (2009).
  Causality with feedback cycles:
  • Hyttinen, A., Eberhardt, F. & Hoyer, P.O. Learning Linear Cyclic Causal Models with Latent Variables. Journal of Machine Learning Research 13, 3387-3439 (2012).
  Causality with latent variables:
  • Richardson, T. & Spirtes, P. Ancestral Graph Markov Models. The Annals of Statistics 30, 962-1030 (2002).
  • Leray, P., Meganck, S., Maes, S. & Manderick, B. Causal graphical models with latent variables: Learning and inference. Innovations in Bayesian Networks 156, 219-249 (2008).
• 70. Conclusions
  • Causal discovery is possible from observational data or from limited experiments
  • Beware of violated assumptions and of equivalences
  • Causality provides a formal language for conceptualizing data analysis problems
  • It is necessary for predicting the effect of interventions
  • Deep connections to feature selection
  • Allows integrative analysis in novel ways
  • Advanced theory and algorithms exist for different sets of (less restrictive) assumptions
  • Way to go still, particularly in disseminating to non-experts
• 71. References
  • S. B. Montgomery et al. Transcriptome genetics using second generation sequencing in a Caucasian population. Nature 464, 773-777 (1 April 2010).
  • I. Tsamardinos, V. Lagani and D. Pappas. Discovering multiple, equivalent biomarker signatures. In Proceedings of the 7th conference of the Hellenic Society for Computational Biology & Bioinformatics (HSCBB), 2012.
  • C. Chang and C. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology, 2:27:1-27:27, 2011.
  • A. Statnikov, C.F. Aliferis, I. Tsamardinos, D. Hardin, and S. Levy. A comprehensive evaluation of multicategory classification methods for microarray gene expression cancer diagnosis. Bioinformatics, 21:631-643, 2005.