Here, I expand on the idea of causality in order to define D-separation and the initial algorithm for finding it, and I include some extra examples of the algorithm. Hopefully, it is good enough for you.
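Since the slides center on D-separation, a minimal sketch of the standard test may help: X and Y are d-separated by Z exactly when they are disconnected after keeping only the ancestors of X ∪ Y ∪ Z, moralizing the result (marrying co-parents and dropping edge directions), and deleting Z. The graph encoding and the example below are illustrative assumptions, not taken from the slides.

```python
# A minimal sketch of a d-separation test via the moralized ancestral graph
# criterion. The DAG encoding ({node: set of children}) and the example graph
# are illustrative assumptions.
from collections import defaultdict

def d_separated(dag, xs, ys, zs):
    """Return True iff xs and ys are d-separated given zs in the DAG."""
    xs, ys, zs = set(xs), set(ys), set(zs)
    parents = defaultdict(set)
    for u, children in dag.items():
        for c in children:
            parents[c].add(u)
    # 1) Keep only the ancestors of xs, ys and zs (including themselves).
    keep, stack = set(), list(xs | ys | zs)
    while stack:
        v = stack.pop()
        if v not in keep:
            keep.add(v)
            stack.extend(parents[v])
    # 2) Moralize: undirected parent-child edges plus edges between co-parents.
    adj = defaultdict(set)
    for c in keep:
        ps = parents[c] & keep
        for p in ps:
            adj[p].add(c)
            adj[c].add(p)
        for p in ps:
            adj[p] |= ps - {p}
    # 3) Delete zs and check whether any node in xs can still reach ys.
    frontier, seen = list(xs - zs), set(xs - zs)
    while frontier:
        v = frontier.pop()
        if v in ys:
            return False
        for w in adj[v] - zs - seen:
            seen.add(w)
            frontier.append(w)
    return True

# Chain A -> B -> C: A and C are d-separated by B, but not marginally.
dag = {"A": {"B"}, "B": {"C"}, "C": set()}
print(d_separated(dag, {"A"}, {"C"}, {"B"}))   # True
print(d_separated(dag, {"A"}, {"C"}, set()))   # False
```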
Universal Approximation Theorem
Here, we prove that the multi-layer perceptron can approximate any continuous function on the hypercube [0,1]^n. For this, we use Cybenko's proof. I tried to include the basics of topology and mathematical analysis needed to make the slides more understandable; however, they still need some work. In addition, I am a little rusty in my mathematical analysis, so I am still not fully convinced by the linear functional I defined for the proof... Back to Rudin and Apostol! So expect changes in the future.
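For reference, the statement being proved is Cybenko's density theorem (stated here from the standard literature, not quoted from the slides): if σ is a continuous sigmoidal function, then finite sums of the form

G(x) = \sum_{j=1}^{N} \alpha_j \, \sigma\!\left( w_j^{\top} x + \theta_j \right)

are dense in C([0,1]^n); that is, for every continuous f on [0,1]^n and every ε > 0 there exist N, α_j, w_j, θ_j such that |G(x) - f(x)| < ε for all x in [0,1]^n.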
18 Machine Learning: Radial Basis Function Networks, Forward Heuristics - Andres Mendez-Vazquez
This document discusses radial basis function networks and forward selection heuristics for neural networks. It begins by outlining the topics to be covered, including predicting the variance of weights and outputs, selecting the regularization parameter, and forward selection algorithms. It then derives an expression for the variance of the weight vector w when the noise is assumed to be normally distributed. Next, it discusses how to calculate the variance matrix and select the regularization parameter λ. Finally, it explains how to determine the number of dimensions and provides an overview of forward selection algorithms.
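For reference, the standard result behind this kind of derivation (stated here under assumed notation, not quoted from the document): if y = Hw + ε with ε ~ N(0, σ²I) and the weights are estimated by regularized least squares, then

\hat{w} = (H^{\top} H + \lambda I)^{-1} H^{\top} y, \qquad \operatorname{Var}(\hat{w}) = \sigma^{2} (H^{\top} H + \lambda I)^{-1} H^{\top} H (H^{\top} H + \lambda I)^{-1},

which reduces to σ²(HᵀH)⁻¹ in the unregularized case λ = 0.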
The document introduces Bayesian classifiers and the naive Bayes model. It discusses key concepts like likelihood, prior probability, and evidence. The naive Bayes model uses Bayes' theorem to calculate the posterior probability of class membership given observed data. The optimal Bayesian classification rule is to assign the data point to the class with the highest posterior probability. Minimizing classification error is achieved by selecting the decision boundary that minimizes the sum of the error probabilities for each class.
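In symbols (a standard restatement with assumed notation, not quoted from the document), the naive Bayes posterior and decision rule are

P(C_k \mid x) = \frac{P(C_k) \prod_{i=1}^{d} P(x_i \mid C_k)}{P(x)}, \qquad \hat{C}(x) = \arg\max_{k} \; P(C_k) \prod_{i=1}^{d} P(x_i \mid C_k),

where the product over the d features is the naive conditional-independence assumption, and the evidence P(x) can be dropped when taking the arg max.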
This document outlines the plan for a seminar on kernel methods. The seminar will cover four main topics: kernel methods, random projections, deep learning, and more about kernels. The overall goal is to provide enough theoretical background to understand several related papers on convolutional kernel networks, fastfood kernel approximations, and deep fried convolutional neural networks. The document includes an agenda, definitions, theorems, and references to support understanding kernel methods and their applications.
Statistics (1): estimation, Chapter 2: Empirical distribution and bootstrap - Christian Robert
The document discusses the bootstrap method and its applications in statistical inference. It introduces the bootstrap as a technique for estimating properties of estimators like variance and distribution when the true sampling distribution is unknown. This is done by treating the observed sample as if it were the population and resampling with replacement to create new simulated samples. The bootstrap then approximates characteristics of the sampling distribution, allowing inferences like confidence intervals to be constructed.
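A minimal NumPy sketch of the resampling idea described above; the data and the statistic (the mean) are illustrative assumptions.

```python
# Nonparametric bootstrap: treat the observed sample as the population,
# resample with replacement, and use the replicates to approximate the
# sampling distribution of a statistic (here, the mean).
import numpy as np

rng = np.random.default_rng(0)
x = rng.exponential(scale=2.0, size=50)          # illustrative observed sample

B = 2000
boot_means = np.array([
    rng.choice(x, size=x.size, replace=True).mean()
    for _ in range(B)
])

se = boot_means.std(ddof=1)                      # bootstrap standard error
lo, hi = np.percentile(boot_means, [2.5, 97.5])  # percentile 95% interval
print(f"mean={x.mean():.3f}  se={se:.3f}  95% CI=({lo:.3f}, {hi:.3f})")
```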
This document summarizes a seminar on kernels and support vector machines. It begins by explaining why kernels are useful for increasing flexibility and speed compared to direct inner product calculations. It then covers definitions of positive definite kernels and how to prove a function is a kernel. Several kernel families are discussed, including translation invariant, polynomial, and non-Mercer kernels. Finally, the document derives the primal and dual problems for support vector machines and explains how the kernel trick allows non-linear classification.
Interval-valued intuitionistic fuzzy homomorphism of BF-algebras - Alexander Decker
This document discusses interval-valued intuitionistic fuzzy homomorphisms of BF-algebras. It begins with introducing BF-algebras and defining interval-valued intuitionistic fuzzy sets. It then defines interval-valued intuitionistic fuzzy ideals of BF-algebras and provides an example. Interval-valued intuitionistic fuzzy homomorphisms of BF-algebras are introduced and some properties are investigated, including showing that the image of an interval-valued intuitionistic fuzzy ideal under a homomorphism is also an interval-valued intuitionistic fuzzy ideal if it satisfies the "sup-inf" property.
Statistics (1): estimation, Chapter 3: likelihood function and likelihood esti... - Christian Robert
The document discusses likelihood functions and inference. It begins by defining the likelihood function as the function that gives the probability of observing a sample given a parameter value. The likelihood varies with the parameter, while the density function varies with the data. Maximum likelihood estimation chooses parameters that maximize the likelihood function. The score function is the gradient of the log-likelihood and has an expected value of zero at the true parameter value. The Fisher information matrix measures the curvature of the likelihood surface and provides information about the precision of parameter estimates. It relates to the concentration of likelihood functions around the true parameter value as sample size increases.
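In symbols (standard definitions, included here for convenience rather than quoted from the document):

s(\theta) = \nabla_{\theta} \log L(\theta \mid x), \qquad \mathbb{E}_{\theta}[\, s(\theta) \,] = 0,

I(\theta) = \mathbb{E}_{\theta}\!\left[ s(\theta)\, s(\theta)^{\top} \right] = - \mathbb{E}_{\theta}\!\left[ \nabla_{\theta}^{2} \log L(\theta \mid x) \right],

where the second expression for the Fisher information holds under the usual regularity conditions.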
A common fixed point theorem in cone metric spaces - Alexander Decker
This academic article summarizes a common fixed point theorem for continuous and asymptotically regular self-mappings on complete cone metric spaces. The theorem extends previous results to cone metric spaces, which generalize metric spaces by replacing real numbers with an ordered Banach space. It proves that under certain contractive conditions, the self-mapping has a unique fixed point. The proof constructs a Cauchy sequence that converges to the fixed point.
This document provides an overview of probability distributions and related concepts. It defines key probability distributions like the binomial, beta, multinomial, and Dirichlet distributions. It also describes probability distribution equations like the cumulative distribution function and probability density function. Additionally, it outlines descriptive parameters for distributions like mean, variance, skewness and kurtosis. Finally, it briefly discusses probability theorems such as the law of large numbers, central limit theorem, and Bayes' theorem.
This document presents a new coupled fixed point theorem for mappings having the mixed monotone property in partially ordered metric spaces. Specifically:
1) The theorem establishes the existence of a coupled fixed point for a mapping F that satisfies a contraction-type condition and has the mixed monotone property in a partially ordered, complete metric space.
2) It is shown that the coupled fixed point can be unique under additional conditions involving midpoint lower or upper bound properties.
3) An estimate is provided for the convergence rate as the iterates of the mapping F converge to the coupled fixed point.
The document discusses statistical models and exponential families. It states that, for most of the course, the data are assumed to be a random sample from a distribution F; repeated observations increase the information about F, by way of the law of large numbers and the central limit theorem. Exponential families are a class of parametric distributions with convenient analytic properties, whose densities can be written in exponential form as functions of natural parameters. Examples of exponential families include the binomial and normal distributions.
This document outlines the key concepts that will be covered in Lecture 2 on Bayesian modeling. It introduces the likelihood function and how it can be used to determine the most likely parameter values given observed data. It provides examples of applying Bayesian modeling to proportions, normal distributions, linear regression with one predictor, and linear regression with multiple predictors. The lecture aims to give students a basic understanding of how Bayesian analysis works and prepare them for fitting linear mixed models.
The document describes an algorithm for solving the Steiner Tree problem parameterized by treewidth. It uses dynamic programming over a tree decomposition of the input graph to compute the solution. The algorithm builds a dynamic programming table at each node of the tree decomposition and handles leaf, forget, introduce, and join nodes in t^{O(t)} time by updating and merging partial solutions. This gives an overall running time of t^{O(t)} (times a polynomial in the input size) for Steiner Tree parameterized by treewidth.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Herbrand-satisfiability of a Quantified Set-theoretical Fragment (Cantone, Lo... - Cristiano Longo
The document discusses the quantified fragment of set theory called ∀π0. ∀π0 allows for restricted quantification over sets and ordered pairs. A decision procedure for the satisfiability of ∀π0 formulas works by non-deterministically guessing a skeletal representation and checking if its realization is a model of the formula. The document considers encoding the conditions on skeletal representations as first-order formulas to view ∀π0 as a first-order logic and leverage tools developed for first-order logic fragments.
Higher-order (F, α, β, ρ, d)-convexity is considered for a multiobjective programming problem (MP). Mond-Weir and Wolfe type duals are formulated for the problem, and duality results are established under higher-order (F, α, β, ρ, d)-convexity assumptions. The results are also applied to a multiobjective fractional programming problem.
The document discusses matroids and parameterized algorithms. It begins with an overview of Kruskal's greedy algorithm for finding a minimum spanning tree in a graph. It then introduces matroids and defines them as structures where greedy algorithms find optimal solutions. Several examples of matroids are provided, including uniform matroids, partition matroids, graphic matroids, and gammoids. The document also presents an alternate definition of matroids using basis families.
This document provides an introduction to machine learning concepts including:
- Machine learning involves learning parameters of probabilistic models from data.
- Maximum likelihood and maximum a posteriori estimation are common techniques for learning parameters.
- Inductive learning involves constructing a hypothesis from examples so that the target function generalizes to new examples. Cross-validation is used to evaluate hypotheses on held-out data and avoid overfitting; a minimal sketch is given below.
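A minimal k-fold cross-validation sketch; the model (ridge regression via the normal equations), the data, and the fold count are illustrative assumptions rather than anything from the document.

```python
# k-fold cross-validation: split the data into k folds, train on k-1 folds,
# evaluate on the held-out fold, and average the held-out errors.
import numpy as np

def kfold_mse(X, y, k=5, lam=1.0, seed=0):
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)
    errors = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        # Ridge regression fit on the training folds (normal equations).
        A = X[train].T @ X[train] + lam * np.eye(X.shape[1])
        w = np.linalg.solve(A, X[train].T @ y[train])
        errors.append(np.mean((X[test] @ w - y[test]) ** 2))
    return float(np.mean(errors))

X = np.random.default_rng(1).normal(size=(100, 5))
y = X @ np.array([1.0, -2.0, 0.0, 0.5, 3.0]) + 0.1 * np.random.default_rng(2).normal(size=100)
print(kfold_mse(X, y, k=5))
```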
Fuzzy soft sets are one of the recent topics developed for dealing with the uncertainties present in most of our real-life situations. The parameterization tool of soft set theory enhances the flexibility of its application. In this paper, we have studied the membership grade, the power set, the α-cut set, the strong fuzzy α-cut set, some standard operations on fuzzy soft sets, and the degree of subsethood, and we propose some results with examples.
This document provides a summary of Lecture 10 on Bayesian decision theory and Naive Bayes machine learning algorithms. It begins with a recap of Lecture 9 on using probability to classify patterns into categories. It then discusses how to apply these probabilistic concepts to both nominal and continuous variables. A medical example is presented to illustrate Bayesian classification. The document concludes by explaining the Naive Bayes algorithm for classification tasks and providing a worked example of how it is trained and makes predictions.
"reflections on the probability space induced by moment conditions with impli...Christian Robert
This document discusses using moment conditions to perform Bayesian inference when the likelihood function is intractable or unknown. It outlines some approaches that have been proposed, including approximating the likelihood using empirical likelihood or pseudo-likelihoods. However, these approaches do not guarantee the same consistency as a true likelihood. Alternative approximative Bayesian methods are also discussed, such as Approximate Bayesian Computation, Integrated Nested Laplace Approximation, and variational Bayes. The empirical likelihood method constructs a likelihood from generalized moment conditions, but its use in Bayesian inference requires further analysis of consistency in each application.
This document discusses moment-generating functions and their properties and applications in probability theory. It defines the moment-generating function for both discrete and continuous random variables. Some key properties of moment-generating functions are that each probability distribution has a unique moment-generating function, and they can be used to find the distribution of sums of random variables. The document also describes several common probability distributions and derives their corresponding moment-generating functions.
Talk: Joint causal inference on observational and experimental data - NIPS 20... - Sara Magliacane
This document discusses methods for performing joint causal inference using both observational and experimental datasets. It proposes modeling the datasets within a single causal graph framework using additional dummy variables to represent experimental conditions. This allows applying constraint-based causal discovery methods, but requires modifications to handle violations of the causal faithfulness assumption that can occur due to deterministic relationships between variables. A strategy is presented that rephrases the constraint-based approach in terms of d-separations and derives sound conditional independence relationships to partially address these violations. The goal is to reconstruct the causal graph representing all datasets from independence test results.
This document discusses network representation and analysis. It defines networks as consisting of nodes (vertices) and edges, and describes different ways to represent networks mathematically using adjacency matrices, incidence matrices, and Laplacian matrices. It also discusses visualizing networks using multidimensional scaling and plotting them in R. Special types of networks like complete graphs and random graphs are briefly introduced.
This document discusses modeling networks using regression analysis with additive and multiplicative effects. It introduces network modeling and describes some common network regression models, including the social relations model (SRM) which captures sender and receiver effects. The document discusses incorporating covariates into these models and using multiplicative effects to better capture triadic behavior and homophily in networks. It also briefly mentions generalizing these models to ordinal outcomes.
A short and naive introduction to using network in prediction models - tuxette
The document provides an introduction to using network information in prediction models. It discusses representing a network as a graph with a Laplacian matrix. The Laplacian captures properties like random walks on the graph and heat diffusion. Eigenvectors of the Laplacian related to small eigenvalues are strongly tied to graph structure. The document discusses using the Laplacian in prediction models by working in the feature space defined by the Laplacian eigenvectors or directly regularizing a linear model with the Laplacian. This introduces network information and encourages similar contributions from connected nodes. The approaches are applied to problems like predicting phenotypes from gene expression using a known gene network.
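As a rough illustration of the second approach mentioned (regularizing a linear model directly with the Laplacian), here is a minimal NumPy sketch; the graph, the data, and the penalty weight are illustrative assumptions.

```python
# Laplacian-regularized linear regression: penalize w^T L w so that features
# connected in a known network receive similar coefficients.
# Closed-form solution of  min_w ||y - Xw||^2 + lam * w^T L w.
import numpy as np

def laplacian(adj):
    """Unnormalized graph Laplacian L = D - A from a symmetric adjacency matrix."""
    return np.diag(adj.sum(axis=1)) - adj

def fit_laplacian_ridge(X, y, adj, lam=1.0):
    L = laplacian(adj)
    return np.linalg.solve(X.T @ X + lam * L, X.T @ y)

# Toy example: 4 features, with features 0-1 and 2-3 linked in the network.
adj = np.array([[0, 1, 0, 0],
                [1, 0, 0, 0],
                [0, 0, 0, 1],
                [0, 0, 1, 0]], dtype=float)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = X @ np.array([1.0, 1.0, -0.5, -0.5]) + 0.1 * rng.normal(size=200)
print(fit_laplacian_ridge(X, y, adj, lam=5.0))
```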
Big Data analysis involves building predictive models from high-dimensional data using techniques like variable selection, cross-validation, and regularization to avoid overfitting. The document discusses an example analyzing web browsing data to predict online spending, highlighting challenges with large numbers of variables. It also covers summarizing high-dimensional data through dimension reduction and model building for prediction versus causal inference.
This document provides information about various probability distributions and Bayesian networks. It defines key concepts such as joint probability distribution, marginal distribution, conditional distribution, and independence assumptions. Examples are given to illustrate how to calculate joint distributions in Bayesian networks and perform causal reasoning using conditional probabilities. Naive Bayes classifiers are also introduced as applying conditional independence assumptions. Tools for simulating Bayesian networks are referenced.
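The identity underlying these joint-distribution calculations is the Bayesian-network factorization (stated here in general form):

P(X_1, \dots, X_n) = \prod_{i=1}^{n} P\!\left( X_i \mid \mathrm{Pa}(X_i) \right),

so in a chain A → B → C, for example, P(A, B, C) = P(A) P(B | A) P(C | B), and marginal or conditional queries are obtained by summing this product over the unobserved variables.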
Cointegration analysis: Modelling the complex interdependencies between finan... - Edward Thomas Jones
1) The document discusses cointegration analysis, which models the complex interdependencies between financial assets. It examines the non-stationary nature of financial time series data and explores vector autoregressive (VAR) models and cointegration techniques to analyze relationships between non-stationary variables.
2) VAR models provide a framework for modeling dynamic relationships between stationary time series variables. The document outlines univariate and multivariate VAR models and discusses estimations and lag order selection for VAR models.
3) Cointegration techniques allow modeling of relationships between non-stationary time series variables. The document reviews tests for identifying stationary and non-stationary time series, including the Augmented Dickey-Fuller and Phillips-Perron tests
Model-Based Diagnosis of Discrete Event Systems via Automatic Planning - LUCACERIANI1
The document discusses model-based diagnosis of discrete event systems via automatic planning. It begins with an introduction to model-based diagnosis and the hypothesis space approach. The talk then describes representing diagnosis as a planning problem by defining the diagnosis problem in a hypothesis space with a mapping function. Algorithms are presented that use AI planners to compute the diagnosis as the preferred hypotheses. Experimental results applying these algorithms to challenging diagnosis problems are also discussed.
Higher-order organization of complex networks - David Gleich
A talk I gave at the Park City Institute of Mathematics about our recent work on using motifs to analyze and cluster networks. This involves a higher-order Cheeger inequality in terms of motifs.
This document summarizes key points from a lecture on probability and Bayes networks:
1. Bayes networks provide a structured representation of probability distributions through a directed acyclic graph where nodes are variables and edges represent conditional dependencies.
2. Conditional probabilities allow calculating joint probabilities and the likelihood of events given other observed events through Bayes' rule.
3. Bayes networks encode conditional independence relationships between variables - observing certain variables can "block" influence between other variables depending on the network structure.
Multiple regression analysis is used to understand the relationship between a dependent variable and multiple independent variables. It allows us to determine the effect of independent variables like age and work experience on a dependent variable like income. The analysis examines metrics like the coefficient of determination (R2), F-test, t-test and assumptions around normality, multicollinearity, homoscedasticity and autocorrelation. Properly conducted, multiple regression can be used to predict the value of a dependent variable based on the values of independent variables.
1. The document discusses the chain rule for functions of several variables. It provides examples of how to use a "tree diagram" to represent variable dependencies and derive the appropriate chain rule statement.
2. It also gives examples of applying the chain rule to find derivatives like df/dt for functions where variables like x, y, and z depend on t, or to find partial derivatives like ∂f/∂s and ∂f/∂t.
3. One example works through applying the chain rule to find df/dt for the function f(x,y) = x^2 + y^2, where x = t^2 and y = t^4, and verifies the result by direct substitution; a worked version appears below.
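Worked out explicitly (the computation follows from the statement above, not from the original slides):

\frac{df}{dt} = \frac{\partial f}{\partial x}\frac{dx}{dt} + \frac{\partial f}{\partial y}\frac{dy}{dt} = (2x)(2t) + (2y)(4t^{3}) = (2t^{2})(2t) + (2t^{4})(4t^{3}) = 4t^{3} + 8t^{7},

which agrees with differentiating f(t) = t^{4} + t^{8} directly.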
Graph Neural Network for Phenotype Prediction - tuxette
This document describes a study on using graph neural networks (GNNs) for phenotype prediction from gene expression data. The objectives are to determine if including network information can improve predictions, which network types work best, and if GNNs can learn network inferences. It provides background on GNNs and how they generalize convolutional layers to graph data. The authors implemented a GNN model from previous work as a starting point and tested it on different network types to see which network information is most useful for predictions. Their methodology involves comparing GNN performance to other methods like random forests using 10-fold cross validation.
Introduction to Machine Learning Lectures - ssuserfece35
This lecture discusses ensemble methods in machine learning. It introduces bagging, which trains multiple models on random subsets of the training data and averages their predictions, in order to reduce variance and prevent overfitting. Bagging is effective because it decreases the correlation between predictions. Random forests apply bagging to decision trees while also introducing more randomness by selecting a random subset of features to consider at each node. The next lecture will cover boosting, which aims to reduce bias by training models sequentially to focus on examples previously misclassified.
When Classifier Selection meets Information Theory: A Unifying View - Mohamed Farouk
Classifier selection aims to reduce the size of an ensemble of classifiers in order to improve its efficiency and classification accuracy. Recently, an information-theoretic view was presented for feature selection. It derives a space of possible selection criteria and shows that several feature selection criteria in the literature are points within this continuous space. The contribution of this paper is to export this information-theoretic view to solve an open issue in ensemble learning, namely classifier selection. We investigated a couple of information-theoretic selection criteria that are used to rank classifiers.
The document discusses Bayesian networks and causal discovery methods. It provides definitions and examples of key concepts in Bayesian networks including directed acyclic graphs (DAGs), Markov blankets, and the Markov condition. It also describes different approaches to learning Bayesian network structures, including constraint-based methods such as the PC algorithm and score-based methods like greedy hill climbing. Causal discovery from data aims to infer causal relationships between variables using techniques like conditional independence tests on Bayesian networks.
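To illustrate the constraint-based approach mentioned above, here is a minimal sketch of the skeleton phase of the PC algorithm; the conditional-independence test is assumed to be supplied externally, and all names are illustrative rather than taken from the document.

```python
# Skeleton phase of the PC algorithm: start from the complete undirected graph
# and delete the edge (i, j) whenever some conditioning set S drawn from the
# current neighbours of i makes i and j conditionally independent.
from itertools import combinations

def pc_skeleton(nodes, ci_test):
    """ci_test(i, j, S) -> True if i is independent of j given the set S."""
    adj = {v: set(nodes) - {v} for v in nodes}
    sepset = {}
    level = 0
    while any(len(adj[i] - {j}) >= level for i in nodes for j in adj[i]):
        for i in nodes:
            for j in list(adj[i]):
                candidates = adj[i] - {j}
                if len(candidates) < level:
                    continue
                for S in combinations(sorted(candidates), level):
                    if ci_test(i, j, set(S)):
                        adj[i].discard(j)
                        adj[j].discard(i)
                        sepset[frozenset((i, j))] = set(S)
                        break
        level += 1
    return adj, sepset

# Toy oracle for the chain A -> B -> C: the only independence is A ⊥ C | {B}.
oracle = lambda i, j, S: {i, j} == {"A", "C"} and "B" in S
print(pc_skeleton(["A", "B", "C"], oracle))
```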
This document classifies and defines different types of differential equations:
Ordinary differential equations contain ordinary derivatives of a single unknown function, while partial differential equations contain partial derivatives of an unknown function of several variables. Systems of differential equations involve finding several unknown functions from a set of equations. Differential equations are also classified by their order, by whether they are linear or nonlinear, and by whether solutions exist and can be found.
Reading review of "Inferring Multiple Graphical Structures" - tuxette
This document summarizes and reviews methods for inferring gene co-expression networks from gene expression data, as presented in related articles including Chiquet et al. It describes various statistical approaches implemented in packages like GeneNet and glasso, including graphical Gaussian models using shrinkage and sparse linear regression. It compares the resulting network densities produced by different methods.
Similar to Artificial Intelligence 06.2 More on Causality Bayesian Networks
This document outlines an introduction to Bayesian estimation. It discusses key concepts like the likelihood principle, sufficiency, and Bayesian inference. The likelihood principle states that all experimental information about an unknown parameter is contained within the likelihood function. An example is provided testing the fairness of a coin using different data collection scenarios to illustrate how the likelihood function remains the same. The document also discusses the history of the likelihood principle and provides an outline of topics to be covered.
The document discusses linear transformations and their applications in mathematics for artificial intelligence. It begins by introducing linear transformations and how matrices can be used to define functions. It describes how a matrix A can define a linear transformation f_A that maps vectors in R^n to vectors in R^m. It also defines key concepts for linear transformations like the kernel, range, row space, and column space. The document will continue exploring topics like the derivative of transformations, linear regression, principal component analysis, and singular value decomposition.
This document provides an introduction and outline for a discussion of orthonormal bases and eigenvectors. It begins with an overview of orthonormal bases, including definitions of the dot product, norm, orthogonal vectors and subspaces, and orthogonal complements. It also discusses the relationship between the null space and row space of a matrix. The document then provides an introduction to eigenvectors and outlines topics that will be covered, including what eigenvectors are useful for and how to find and use them.
The document discusses square matrices and determinants. It begins by noting that square matrices are the only matrices that can have inverses. It then presents an algorithm for calculating the inverse of a square matrix A by forming the partitioned matrix (A|I) and applying Gauss-Jordan reduction. The document also discusses determinants, defining them recursively as the sum of products of diagonal entries with signs depending on row/column position, for matrices larger than 1x1. Complexity increases exponentially with matrix size.
This document provides an introduction to systems of linear equations and matrix operations. It defines key concepts such as matrices, matrix addition and multiplication, and transitions between different bases. It presents an example of multiplying two matrices using NumPy. The document outlines how systems of linear equations can be represented using matrices and discusses solving systems using techniques like Gauss-Jordan elimination and elementary row operations. It also introduces the concepts of homogeneous and inhomogeneous systems.
This document provides an outline and introduction to a course on mathematics for artificial intelligence, with a focus on vector spaces and linear algebra. It discusses:
1. A brief history of linear algebra, from ancient Babylonians solving systems of equations to modern definitions of matrices.
2. The definition of a vector space as a set that can be added and multiplied by elements of a field, with properties like closure under addition and scalar multiplication.
3. Examples of using matrices and vectors to model systems of linear equations and probabilities of transitions between web pages.
4. The importance of linear algebra concepts like bases, dimensions, and eigenvectors/eigenvalues for machine learning applications involving feature vectors and least squares error.
This document outlines and discusses backpropagation and automatic differentiation. It begins with an introduction to backpropagation, describing how it works in two phases: feed-forward to calculate outputs, and backpropagation to calculate gradients using the chain rule. It then discusses automatic differentiation, noting that it provides advantages over symbolic differentiation. The document explores the forward and reverse modes of automatic differentiation and examines their implementation and complexity. In summary, it covers the fundamental algorithms and methods for calculating gradients in neural networks.
This document summarizes the services of a company that provides data analysis and machine learning solutions. They have an interdisciplinary team with over 15 years of experience in areas like machine learning, artificial intelligence, big data, and data engineering. Their expertise includes developing data models, analysis products, and systems to help companies with forecasting, decision making, and improving data operations efficiency. They can help clients across various industries like telecom, finance, retail, and more.
My first set of slides (the NN and DL class I am preparing for the fall)... I included the problem of the vanishing gradient and the need for ReLU (mentioning, by the way, the saturation problem inherited from Hebbian learning).
Reinforcement learning is a method for learning behaviors through trial-and-error interactions with an environment. The goal is to maximize a numerical reward signal by discovering the actions that yield the most reward. The learner is not told which actions to take directly, but must instead determine which actions are best by trying them out. This document outlines reinforcement learning concepts like exploration versus exploitation, where exploration involves trying non-optimal actions to gain more information, while exploitation uses current knowledge to choose optimal actions. It also discusses formalisms like Markov decision processes and the tradeoff between maximizing short-term versus long-term rewards in reinforcement learning problems.
This document provides an overview of a 65-hour course on neural networks and deep learning taught by Andres Mendez Vazquez at Cinvestav Guadalajara. The course objectives are to introduce students to concepts of neural networks, with a focus on various neural network architectures and their applications. Topics covered include traditional neural networks, deep learning, optimization techniques for training deep models, and specific deep learning architectures like convolutional and recurrent neural networks. The course grades are based on midterms, homework assignments, and a final project.
This document provides a syllabus for an introduction to artificial intelligence course. It outlines 14 topics that will be covered in the class, including what AI is, the mathematics behind it like probability and linear algebra, search techniques, constraint satisfaction problems, probabilistic reasoning, Bayesian networks, graphical models, neural networks, machine learning, planning, knowledge representation, reinforcement learning, logic in AI, and genetic algorithms. It also lists the course requirements, which include exams, homework, and a group project to simulate predators and prey.
The document outlines a proposed 8 semester curriculum for a Bachelor's degree in Machine Learning and Data Science. The curriculum covers fundamental topics in mathematics, computer science, statistics, physics and artificial intelligence in the first 4 semesters. Later semesters focus on more advanced topics in artificial intelligence, machine learning, neural networks, databases, and parallel programming. The final semester emphasizes practical applications of machine learning and data science through courses on large-scale systems and non-traditional databases.
This document outlines the syllabus for a course on analysis of algorithms and complexity. The course will cover foundational topics like asymptotic analysis and randomized algorithms, as well as specific algorithms like sorting, searching trees, and graph algorithms. It will also cover advanced techniques like dynamic programming, greedy algorithms, and amortized analysis. Later topics will include NP-completeness, multi-threaded algorithms, and approaches for dealing with NP-complete problems. The requirements include exams, homework assignments, and a project, and the course will be taught in English.
A review of one of the most popular methods of clustering, part of what is known as unsupervised learning: K-Means. Here, we go from the basic heuristic used to solve the NP-hard problem to the K-Centers approximation algorithm. Additionally, we look at variations coming from fuzzy-set ideas. In the future, we will add more about on-line algorithms in the line of stochastic gradient ideas...
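Below is a minimal sketch of the basic heuristic (Lloyd's algorithm) referred to above; the data and the number of clusters are illustrative assumptions.

```python
# Lloyd's heuristic for K-Means: alternate between assigning each point to its
# nearest centroid and recomputing each centroid as the mean of its points.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assignment step: nearest centroid for every point.
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Update step: mean of the points assigned to each centroid.
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return centers, labels

X = np.vstack([np.random.default_rng(1).normal(loc=m, size=(50, 2)) for m in (0.0, 5.0)])
centers, labels = kmeans(X, k=2)
print(centers)
```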
Here is a review of the combination of machine learning models, from Bayesian averaging and committees to boosting... Specifically, a statistical analysis of boosting is given.
This document provides an introduction to machine learning concepts including loss functions, empirical risk, and two basic methods of learning - least squared error and nearest neighborhood. It describes how machine learning aims to find an optimal function that minimizes empirical risk under a given loss function. Least squared error learning is discussed as minimizing the squared differences between predictions and labels. Nearest neighborhood is also introduced as an alternative method. The document serves as a high-level overview of fundamental machine learning principles.
### Advantages of TDM
- **Efficient Use of Bandwidth**: TDM all
Literature Review Basics and Understanding Reference Management.pptxDr Ramhari Poudyal
Three-day training on academic research focuses on analytical tools at United Technical College, supported by the University Grant Commission, Nepal. 24-26 May 2024
2. Outline
1 Causality
Example
Definition of Causal Structure
Causal Networks
Causal Chains
Common Causes
Common Effect
2 Analyze the Graph
Going Further
D-Separation
Paths
Blocking
Definition of D-Separation
3 Algorithms to Find D-Separations
Introduction
Example of Reachability
Reachability Algorithm
Analysis
D-Separation Finding Algorithm
Analysis
Example of D-Separation
Application
4 Final Remarks
Encoding Causality
2 / 112
4. Causality
What do we do naturally?
A way of structuring a situation for reasoning under uncertainty is to construct a graph representing causal relations between events.
Example of events with possible outcomes
Fuel? {Yes, No}
Clean Spark Plugs? {Yes, No}
Start? {Yes, No}
4 / 112
7. Causality
We know
We know that the state of Fuel? and the state of Clean Spark Plugs?
have a causal impact on the state of Start?.
Thus we have something like
[Figure: Fuel → Start ← Clean Spark Plugs]
5 / 112
9. Causal Structure - Judea Pearl (1988)
Definition
A causal structure of a set of variables V is a directed acyclic graph (DAG) in which each node corresponds to a distinct element of V, and each edge represents a direct functional relationship among the corresponding variables.
Observation
A causal structure thus specifies, for each variable, which variables (its parents in the DAG) directly influence it.
7 / 112
11. Causal Model
Definition
A causal model is a pair M = ⟨D, Θ_D⟩ consisting of a causal structure D and a set of parameters Θ_D compatible with D.
Thus
The parameters Θ_D assign to each variable a structural equation and a noise distribution
x_i = f_i(pa_i, u_i)
u_i ∼ p(u_i)
Where
x_i is a variable in the model D.
pa_i are the parents of x_i in D.
u_i is independent of any other u_j.
8 / 112
16. Example
From the point of view of Statistics
Formulation (the chain Z → X → Y)
z = f_Z(u_Z)
x = f_X(z, u_X)
y = f_Y(x, u_Y)
9 / 112
20. Now add an observation x_0
From the point of view of Statistics
Formulation after blocking information
z = f_Z(u_Z)
x = x_0
y = f_Y(x, u_Y)
10 / 112
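As a concrete companion to the two formulations above, here is a minimal Python sketch; the particular functions f_Z, f_X, f_Y and the Gaussian noise terms are illustrative choices of mine, not part of the lecture:

import random

def sample(x0=None):
    """Draw one sample from the model z -> x -> y.

    If x0 is given, the structural assignment for x is replaced by the
    observed value, mirroring the "blocking information" formulation.
    """
    u_z, u_x, u_y = random.gauss(0, 1), random.gauss(0, 1), random.gauss(0, 1)
    z = u_z                                     # z = f_Z(u_Z)
    x = 0.5 * z + u_x if x0 is None else x0     # x = f_X(z, u_X)  or  x = x_0
    y = 2.0 * x + u_y                           # y = f_Y(x, u_Y)
    return z, x, y

print(sample())         # purely generative run
print(sample(x0=1.0))   # run after observing x = 1.0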
25. Causal Networks
Definition
A causal network consists of a set of variables and a set of directed links (also called edges) between variables.
Thus
In order to analyze a causal network, it is necessary to analyze:
Causal Chains
Common Causes
Common Effects
12 / 112
32. Causal Chains
This configuration, X → Y → Z, is a “causal chain”
What about the Joint Distribution?
We have, by the Chain Rule,
P(X, Y, Z) = P(X) P(Y|X) P(Z|Y, X) (1)
14 / 112
34. Propagation of Information
Given no information about Y
Information can propagate from X to Z.
Thus
The natural question is: what happens if Y = y for some value y?
15 / 112
36. Thus, we have that
Blocking Propagation of Information
16 / 112
37. Using Our Probabilities
Is Z independent of X once Y = y is observed?
Under the Markov condition for the chain we have
P(Z|X, Y = y) = P(Z|Y = y) (2)
YES!!!
Evidence along the chain “blocks” the influence.
17 / 112
39. Thus
Something Notable
Knowing that X has occurred does not make any difference to our beliefs about Z if we already know that Y has occurred.
Thus, the conditional independence can be written as
I_P(Z, X|Y = y) (3)
18 / 112
42. Thus
Is X independent of Z given Y = y?
P(Z|X, Y = y) = P(X, Y = y, Z) / P(X, Y = y)
             = [P(X) P(Y = y|X) P(Z|Y = y)] / [P(X) P(Y = y|X)]
             = P(Z|Y = y)
20 / 112
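The same manipulation can be checked numerically. The sketch below uses arbitrary CPTs of my own choosing (they are not from the slides) for a binary chain X → Y → Z, enumerates the joint P(X) P(Y|X) P(Z|Y), and confirms that P(Z|X, Y = y) = P(Z|Y = y) while P(Z|X) without evidence on Y still depends on X:

from itertools import product

# Arbitrary CPTs for a binary chain X -> Y -> Z; each value is P(variable = 1 | parent).
p_x = 0.3
p_y_given_x = {0: 0.2, 1: 0.7}
p_z_given_y = {0: 0.4, 1: 0.9}

def bern(p, v):
    """P(V = v) for a Bernoulli(p) variable."""
    return p if v == 1 else 1.0 - p

def joint(x, y, z):
    """P(X = x, Y = y, Z = z) = P(X) P(Y|X) P(Z|Y)."""
    return bern(p_x, x) * bern(p_y_given_x[x], y) * bern(p_z_given_y[y], z)

def p_z1_given(x=None, y=None):
    """P(Z = 1 | whichever of X, Y is fixed)."""
    keep = lambda xv, yv: (x is None or xv == x) and (y is None or yv == y)
    num = sum(joint(xv, yv, 1) for xv, yv in product([0, 1], repeat=2) if keep(xv, yv))
    den = sum(joint(xv, yv, zv) for xv, yv, zv in product([0, 1], repeat=3) if keep(xv, yv))
    return num / den

# Once Y = 1 is observed, X carries no further information about Z: all three agree.
print(p_z1_given(x=0, y=1), p_z1_given(x=1, y=1), p_z1_given(y=1))
# Without evidence on Y, the two values differ: information propagates from X to Z.
print(p_z1_given(x=0), p_z1_given(x=1))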
46. Common Causes
Another basic configuration: two effects of the same cause
Thus
P(X, Y = y, Z) = P(X) P(Y = y|X) P(Z|X, Y = y), where the last factor reduces to P(Z|Y = y) (5)
22 / 112
48. Common Causes
Is X independent of Z given Y = y?
P(Z|X, Y = y) = P(X, Y = y, Z) / P(X, Y = y)
             = [P(X) P(Y = y|X) P(Z|Y = y)] / [P(X) P(Y = y|X)]
             = P(Z|Y = y)
YES!!!
Evidence on the common cause at the top “blocks” the influence between X and Z.
23 / 112
51. Thus
It gives rise to the same conditional independence structure as chains
I_P(Z, X|Y = y) (6)
i.e.
If we already know about Y, then additional information about X will not tell us anything new about Z.
24 / 112
54. Common Effect
Last configuration: two causes of one effect (v-structures)
Are X and Z independent if we do not have information about Y ?
Yes!!! Because the ballgame and the rain can cause traffic, but they are
not correlated.
26 / 112
56. Proof
Using the v-structure factorization P(X, Y, Z) = P(X) P(Z) P(Y|X, Z) and marginalizing over Y, we have
P(Z|X) = P(X, Z) / P(X)
       = Σ_y P(X) P(Z) P(Y = y|X, Z) / P(X)
       = P(Z) Σ_y P(Y = y|X, Z)
       = P(Z)
27 / 112
60. Common Effects
Are X and Z independent given Y = y?
No!!! Because seeing traffic puts the rain and the ballgame in competition
as explanation!!!
Why?
P(X, Z|Y = y) = P(X, Z, Y = y) / P(Y = y)
             = [P(X|Z, Y = y) P(Z|Y = y) P(Y = y)] / P(Y = y)
             = P(X|Z, Y = y) P(Z|Y = y),
which in general does not factor as P(X|Y = y) P(Z|Y = y): conditioning on the common effect couples X and Z.
28 / 112
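The contrast between the marginal and the conditional case can also be verified numerically. The sketch below (binary variables and arbitrary CPTs of my own choosing, not from the slides) enumerates the joint of the v-structure X → Y ← Z and shows that evidence on X changes nothing about Z marginally, but does change P(Z | Y = 1):

from itertools import product

# Arbitrary CPTs for a binary v-structure X -> Y <- Z.
p_x, p_z = 0.3, 0.6
p_y_given_xz = {(0, 0): 0.1, (0, 1): 0.7, (1, 0): 0.8, (1, 1): 0.95}

def bern(p, v):
    return p if v == 1 else 1.0 - p

def joint(x, y, z):
    """P(X, Y, Z) = P(X) P(Z) P(Y|X, Z) for the v-structure."""
    return bern(p_x, x) * bern(p_z, z) * bern(p_y_given_xz[(x, z)], y)

def p_z1_given(**given):
    """P(Z = 1 | given), where `given` may fix x and/or y."""
    keep = lambda x, y: all({'x': x, 'y': y}[k] == v for k, v in given.items())
    num = sum(joint(x, y, 1) for x, y in product([0, 1], repeat=2) if keep(x, y))
    den = sum(joint(x, y, z) for x, y, z in product([0, 1], repeat=3) if keep(x, y))
    return num / den

# Equal: X and Z are independent when Y is not observed.
print(p_z1_given(), p_z1_given(x=0), p_z1_given(x=1))
# Different: observing the common effect Y makes X informative about Z.
print(p_z1_given(y=1), p_z1_given(x=0, y=1), p_z1_given(x=1, y=1))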
64. The General Case
Backwards from the other cases
Observing an effect activates influence between possible causes.
Fact
Any complex example can be analyzed using these three canonical cases
Question
In a given Bayesian network, are two variables independent (given evidence)?
Solution
Analyze Graph Deeply!!!
29 / 112
69. Analyze the Graph
Definition 2.1
Let G = (V, E) be a DAG, where V is a set of random variables. We say that, based on the Markov condition, G entails the conditional independence I_P(A, B|C), for A, B, C ⊆ V, if I_P(A, B|C) holds for every P ∈ P_G, where P_G is the set of all probability distributions P such that (G, P) satisfies the Markov condition.
Thus
We also say the Markov condition entails the conditional independence for G and that the conditional independence is in G.
31 / 112
72. Analyze the Graph
Question
In a given Bayesian network, are two variables independent (given evidence)?
Solution
Analyze Graph Deeply!!!
32 / 112
75. Examples of Entailed Conditional independence
Example
G: Graduate Program Quality.
F: First Job Quality.
B: Number of Publications.
C: Number of Citations.
F is given some evidence!!!
[Figure: DAG over G, F, B, C]
34 / 112
77. Using Markov Condition
If the graph satisfies the Markov Condition
[Figure: DAG over G, F, B, C]
Thus
P(C|G, F = f) = Σ_b P(C|B = b, G, F = f) P(B = b|G, F = f)
             = Σ_b P(C|B = b, F = f) P(B = b|F = f)
             = P(C|F = f)
35 / 112
82. D-Separation ≈ Conditional independence
F and G are given as evidence
Example: C and G are d-separated by {A, F} in the DAG below
[Figure: DAG over G, A, F, B, C]
37 / 112
84. Basic Definitions
Definition (Undirected Paths)
A path between two sets of nodes X and Y is any sequence of nodes
between a member of X and a member of Y such that every adjacent pair
of nodes is connected by an edge (regardless of direction) and no node
appears in the sequence twice.
39 / 112
87. Thus
Given a path [X_1, X_2, ..., X_k] in G = (V, E)
It consists of the edges connecting X_1, X_2, ..., X_k.
Therefore
Given the directed edge X → Y, we say the tail of the edge is at X and the head of the edge is at Y.
42 / 112
90. Basic Classifications of Meetings
Head-to-Tail
A path X → Y → Z is a head-to-tail meeting, the edges meet head-to-tail at Y, and Y is a head-to-tail node on the path.
Tail-to-Tail
A path X ← Y → Z is a tail-to-tail meeting, the edges meet tail-to-tail at Y, and Y is a tail-to-tail node on the path.
Head-to-Head
A path X → Y ← Z is a head-to-head meeting, the edges meet head-to-head at Y, and Y is a head-to-head node on the path.
43 / 112
96. Basic Classifications of Meetings
Finally
A path (undirected) X − Z − Y , such that X and Y are not
adjacent, is an uncoupled meeting.
47 / 112
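Assuming the DAG is stored simply as a set of directed edges, the three kinds of meetings can be classified mechanically; the helper below is an illustrative sketch with names of my own choosing:

def classify_meeting(edges, x, v, w):
    """Classify the meeting at v on the undirected path segment x - v - w.

    `edges` is a set of directed edges (u, t) meaning u -> t.
    Returns 'head-to-tail', 'tail-to-tail', or 'head-to-head'.
    """
    into_v_from_x = (x, v) in edges      # is x -> v an edge?
    into_v_from_w = (w, v) in edges      # is w -> v an edge?
    if into_v_from_x and into_v_from_w:
        return "head-to-head"
    if not into_v_from_x and not into_v_from_w:
        return "tail-to-tail"
    return "head-to-tail"

edges = {("X", "Y"), ("Z", "Y")}                  # X -> Y <- Z
print(classify_meeting(edges, "X", "Y", "Z"))     # head-to-head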
98. Blocking Information ≈ Conditional Independence
Definition 2.2
Let G = (V, E) be a DAG, A ⊆ V, X and Y be distinct nodes in V − A, and ρ be a path between X and Y.
Then ρ is blocked by A if one of the following holds:
1 There is a node Z ∈ A on the path ρ, and the edges incident to Z on ρ meet head-to-tail at Z.
2 There is a node Z ∈ A on the path ρ, and the edges incident to Z on ρ meet tail-to-tail at Z.
3 There is a node Z on the path ρ such that Z and all of Z’s descendants are not in A, and the edges incident to Z on ρ meet head-to-head at Z.
49 / 112
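Definition 2.2 translates almost literally into a path-blocking test. The sketch below is illustrative (the graph encoding and the names descendants/is_blocked are mine, not code from the lecture); it assumes the DAG is given as a set of directed edges and the path as a list of nodes:

def descendants(edges, z):
    """All nodes reachable from z by following directed edges."""
    out, stack = set(), [z]
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if a == u and b not in out:
                out.add(b)
                stack.append(b)
    return out

def is_blocked(edges, path, A):
    """Is `path` (a list of nodes) blocked by the set A? (Definition 2.2)"""
    A = set(A)
    for i in range(1, len(path) - 1):
        u, z, w = path[i - 1], path[i], path[i + 1]
        head_to_head = (u, z) in edges and (w, z) in edges
        if head_to_head:
            # Condition 3: z and all of z's descendants lie outside A.
            if z not in A and not (descendants(edges, z) & A):
                return True
        else:
            # Conditions 1 and 2: a head-to-tail or tail-to-tail node on the path is in A.
            if z in A:
                return True
    return False

edges = {("X", "Y"), ("X", "Z"), ("Z", "S")}              # a hypothetical DAG, not the slide's figure
print(is_blocked(edges, ["Y", "X", "Z", "S"], {"X"}))     # True: tail-to-tail at X, and X is in A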
103. Example
We have that the path [Y , X, Z, S] is blocked by {X} and {Z}
Because the edges on the chain incident to X meet tail-to-tail at X.
50 / 112
104. Example
We have that the path [W, Y, R, Z, S] is blocked by ∅
Because R ∉ ∅ and T ∉ ∅, and the edges on the path incident to R meet head-to-head at R.
51 / 112
106. Definition of D-Separation
Definition 2.3
Let G = (V , E) be a DAG, A ⊆ V , and X and Y be distinct nodes in
V − A. We say X and Y are D-Separated by A in G if every path
between X and Y is blocked by A.
Definition 2.4
Let G = (V , E) be a DAG, and A, B, and C be mutually disjoint subsets
of V . We say A and B are d-separated by C in G if for every X ∈ A and
Y ∈ B, X and Y are D-Separated by C.
We write
IG(A, B|C) or A ⊥⊥ B|C (8)
If C = ∅, we write only IG(A, B) or A ⊥⊥ B.
53 / 112
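Definitions 2.3 and 2.4 suggest a brute-force check: enumerate every path between the two nodes and test each one with the blocking rule (the algorithms later in the slides avoid this enumeration). A small sketch, reusing the hypothetical is_blocked helper from the earlier block:

def all_paths(edges, x, y):
    """All simple undirected paths between x and y in the DAG given by `edges`."""
    nodes = {a for e in edges for a in e}
    nbrs = {n: {b for (a, b) in edges if a == n} | {a for (a, b) in edges if b == n}
            for n in nodes}
    paths, stack = [], [[x]]
    while stack:
        p = stack.pop()
        if p[-1] == y:
            paths.append(p)
            continue
        for n in nbrs[p[-1]]:
            if n not in p:
                stack.append(p + [n])
    return paths

def d_separated(edges, x, y, A):
    """X and Y are D-Separated by A iff every path between them is blocked by A."""
    return all(is_blocked(edges, p, A) for p in all_paths(edges, x, y))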
109. Example
X and T are D-Separated by {Y , Z}
Because the chain [X, Y , R, T] is blocked at Y .
54 / 112
110. Example
X and T are D-Separated by {Y , Z}
Because the chain [X, Z, S, R, T] is blocked at Z, S.
55 / 112
111. D-Separation ⇒ Independence
D-Separation Theorem
Let P be a probability distribution of the variables in V and G = (V , E)
be a DAG. Then (G, P) satisfies the Markov condition if and only if
for every three mutually disjoint subsets A, B, C ⊆ V , whenever A
and B are D-Separated by C, A and B are conditionally independent
in P given C.
That is, (G, P) satisfies the Markov condition if and only if
IG (A, B|C) ⇒ IP (A, B|C) (9)
56 / 112
114. Proof
The proof that, if (G, P) satisfies the Markov condition
Then, each D-Separation implies the corresponding conditional
independence is quite lengthy and can be found in [Verma and Pearl,
1990] and in [Neapolitan, 1990].
Then, we will only prove the other direction
Suppose each D-Separation implies a conditional independence.
Thus, the following implication holds
IG (A, B|C) ⇒ IP (A, B|C) (10)
57 / 112
117. Proof
Something Notable
It is not hard to see that a node’s parents D-Separate the node from all its nondescendants that are not its parents.
This is
If we denote the sets of parents and nondescendants of X by PA_X and ND_X respectively, we have
I_G({X}, ND_X − PA_X | PA_X) (11)
Thus
I_P({X}, ND_X − PA_X | PA_X) (12)
58 / 112
120. Proof
This states the same as
I_P({X}, ND_X | PA_X) (13)
Meaning
The Markov condition is satisfied.
59 / 112
122. Every Entailed Conditional Independence is Identified by
D-Separation
Lemma 2.2
Any conditional independence entailed by a DAG, based on the Markov
condition, is equivalent to a conditional independence among disjoint sets
of random variables.
Theorem 2.1
Based on the Markov condition, a DAG G entails all and only those
conditional independences that are identified by D-Separations in G.
60 / 112
125. Now
We would like to find the D-Separations
Since D-Separations entail conditional independencies.
We want an efficient algorithm
For determining whether two sets are D-Separated by another set.
For this, we need to build an algorithm
One that can find all nodes D-Separated from one set of nodes by another.
62 / 112
128. How?
To accomplish this
We will first find every node X such that there is at least one active path given A between X and a node in B.
Something like
63 / 112
130. We solve the following problem
Suppose we have a directed graph
We say that certain edges cannot appear consecutively in our paths of interest.
Thus, we identify certain pairs of edges (u → v, v → w)
As legal and the rest as illegal!!
Legal?
We call a path legal if it does not contain any illegal ordered pairs of edges.
We say y is reachable from x if there is a legal path from x to y.
64 / 112
134. Thus
We can find the set R of all nodes reachable from x as follows
Any node v such that the edge x → v exists is reachable.
Then
We label each such edge with 1.
Next, for each such v
We check all unlabeled edges v → w and see if (x → v, v → w) is a legal pair.
65 / 112
137. Then
We label each such edge with a 2
And keep going!!!
This is similar to a breadth-first search
Except that here we are visiting edges rather than nodes.
66 / 112
139. What do we want?
Identifying the set of nodes D
The ones that are D-Separated from B by A in a DAG G = (V, E).
For this
We need to find the set R such that
y ∈ R ⇐⇒ either
y ∈ B, or
There is at least one active path given A between y and a node in B.
67 / 112
144. Then
If there is an active path ρ between node X and some other node
Then every 3-node sub-path u − v − w (u ≠ w) of ρ has the following property.
Either
u − v − w is not head-to-head at v and v is not in A.
Or
u − v − w is head-to-head at v and v is in A or has a descendant in A.
68 / 112
147. The Final Legal Rule
The algorithm find-reachable-nodes uses the RULE
Find whether (u → v, v → w) is legal in G.
The pair (u → v, v → w) is legal if and only if
u ≠ w (14)
And one of the following holds
1 (u − v − w) is not head-to-head in G and inA[v] is false.
2 (u − v − w) is head-to-head in G and descendent[v] is true.
69 / 112
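With the arrays inA and descendent already filled in, the RULE is a one-screen function. A minimal sketch (my own encoding: dag_edges are the edges of the original DAG G, used only to decide head-to-head orientation):

def legal(dag_edges, u, v, w, inA, descendent):
    """Is the ordered pair of edges (u -> v, v -> w) legal under the slide's RULE?"""
    if u == w:
        return False
    head_to_head = (u, v) in dag_edges and (w, v) in dag_edges   # orientation in G
    if head_to_head:
        return descendent[v]      # rule 2
    return not inA[v]             # rule 1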
152. Example
Reachable nodes from X when A = ∅; thus
inA[v] is false and descendent[v] is false for all v ∈ V
Therefore
Only rule 1 is applicable.
71 / 112
161. Algorithm for Finding Reachability
find-reachable-nodes(G, set of nodes B, set of nodes &R)
Input: G = (V, E), a subset B ⊂ V, and a RULE to decide whether two consecutive edges are legal
Output: R ⊂ V, the set of all nodes reachable from B
 1. for each x ∈ B
 2.     add x to R
 3.     for each v such that x → v ∈ E
 5.         add v to R
 6.         label x → v with 1
 7. i = 1
 8. found = true
79 / 112
165. Continuation
The following steps
 9. while (found)
10.     found = false
11.     for each v such that u → v is labeled i
12.         for each unlabeled edge v → w
13.             such that (u → v, v → w) is legal
14.                 add w to R
15.                 label v → w with i + 1
16.                 found = true
17.     i = i + 1
80 / 112
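The labeling of edges with 1, 2, 3, ... is simply a breadth-first traversal over edges. The sketch below is an illustrative Python rendering of the pseudocode above (a FIFO queue of edges replaces the explicit labels i and i + 1; function and variable names are mine):

from collections import deque

def find_reachable_nodes(search_edges, B, legal_pair):
    """Nodes reachable from B via legal paths.

    `search_edges` is the set of directed edges being searched (E' later on),
    and `legal_pair(u, v, w)` decides whether (u -> v, v -> w) may appear
    consecutively. Each edge is labeled (visited) at most once.
    """
    R, labeled, frontier = set(B), set(), deque()
    for x in B:
        for (a, v) in search_edges:
            if a == x:
                R.add(v)
                labeled.add((x, v))
                frontier.append((x, v))
    while frontier:
        u, v = frontier.popleft()
        for (a, w) in search_edges:
            if a == v and (v, w) not in labeled and legal_pair(u, v, w):
                R.add(w)
                labeled.add((v, w))
                frontier.append((v, w))
    return R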
169. Complexity of find-reachable-nodes
We have that
Let n be the number of nodes and m be the number of edges.
Something Notable
In the worst case, each of the nodes can be reached from n entry points.
Thus
Each time a node is reached, an edge emanating from it may need to be
re-examined.
82 / 112
175. Algorithm for D-Separation
Find-D-Separations(DAG G = (V, E), set of nodes A, B, set of nodes D)
Input: G = (V, E) and two disjoint subsets A, B ⊂ V
Output: D ⊂ V containing all nodes D-Separated from every node in B by A; that is, I_G(B, D|A) holds and no superset of D has this property.
 1. for each v ∈ V
 2.     if (v ∈ A)
 3.         inA[v] = true
 4.     else
 5.         inA[v] = false
 6.     if (v is in A or has a descendant in A)
 7.         descendent[v] = true
 8.     else
 9.         descendent[v] = false
10. E’ = E ∪ {u → v | v → u ∈ E}
11. G’ = (V, E’)
12. run find-reachable-nodes(G’, B, R)    (note that B ⊆ R)
13. return D = V − (A ∪ R)
85 / 112
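Putting the pieces together, here is an illustrative end-to-end sketch in Python; it reuses the find_reachable_nodes function from the earlier block, and the graph encoding and names are my own, not code from the lecture:

def find_d_separations(edges, V, A, B):
    """All nodes D-Separated from every node in B by A, for the DAG (V, edges)."""
    A, B = set(A), set(B)
    inA = {v: v in A for v in V}

    # descendent[v] is true iff v is in A or has a descendant in A
    # (computed by walking edges backwards from A, as described on the next slide).
    descendent = {v: v in A for v in V}
    stack = list(A)
    while stack:
        u = stack.pop()
        for (a, b) in edges:
            if b == u and not descendent[a]:
                descendent[a] = True
                stack.append(a)

    # E' = E plus every edge reversed, so legal paths may traverse edges either way.
    search_edges = set(edges) | {(b, a) for (a, b) in edges}

    def legal_pair(u, v, w):
        if u == w:
            return False
        head_to_head = (u, v) in edges and (w, v) in edges   # orientation in the original DAG
        return descendent[v] if head_to_head else not inA[v]

    R = find_reachable_nodes(search_edges, B, legal_pair)
    return set(V) - (A | R)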
183. Observation about descendent[v]
We can implement the construction of descendent[v] as follows
Initially, set descendent[v] = true for all nodes in A.
Then
Follow the incoming edges of the nodes in A to their parents, their parents’ parents, and so on.
Thus
We set descendent[v] = true for each node found along the way.
86 / 112
186. Observation about E’
The RULE about legal pairs and E’
The set E’ is necessary because using only the RULE on E will not get us all the active paths.
For example
87 / 112
188. Thus
Something Notable
Given that A is the only node in A and X → T is the only edge in B, the edges in that DAG are numbered according to the method.
Then
The active chain X → A ← Z ← T ← Y is missed because the edge T → Z is already numbered by the time the chain A ← Z ← T is investigated.
But
If we use the set of edges E’, this chain is not missed.
88 / 112
193. Complexity
Please take a look at page 81 of
“Learning Bayesian Networks” by Richard E. Neapolitan.
For the analysis of the algorithm for m edges and n nodes
Θ (m) with m ≥ n. (16)
91 / 112
195. The D-Separation Algorithm works
Theorem 2.2
The set D contains all and only nodes D-Separated from every node in B
by A.
That is, we have IG (B,D|A) and no superset of D has this property.
Proof
The set R determined by the algorithm contains
All nodes in B.
All nodes reachable from B via a legal path in G’.
92 / 112
197. Proof
For any two nodes x ∈ B and y ∉ A ∪ B
The path x − ... − y is active in G if and only if the path x → ... → y is legal in G’.
Thus
R contains the nodes in B plus all and only those nodes that have active paths between them and a node in B.
By the definition of D-Separation
A node is D-Separated from every node in B by A if the node is not in A ∪ B and there is no active path between the node and a node in B.
93 / 112
200. Proof
Thus
D = V − (A ∪ R) is the set of all nodes D-Separated from every node in B by A.
94 / 112
208. Application
Something Notable
In general, the inference problem in Bayesian networks is to determine
P (B|A), where A and B are two sets of variables.
Thus
We can use D-Separation for that.
102 / 112
212. Where
The extra nodes represent
The probabilities in the interval [0, 1], that is, the values P (X = x).
Let P be the set of these auxiliary parent nodes
Thus, if we want to determine P (B|A) in G, we can use the algorithm for D-Separation to find D.
Such that
IG (B,D|A) (17)
and no superset of D has this property; then take D ∩ P.
105 / 112
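A sketch of this use of the algorithm, reusing the find_d_separations function sketched earlier: attach one auxiliary parent P_X to every variable X, compute D, and keep the auxiliary parents outside D (the parameters in D ∩ P are exactly the ones that are not needed). The helper relevant_parameters and its names are illustrative, not the deck’s.

def relevant_parameters(nodes, edges, A, B):
    """Which auxiliary parameter nodes are needed to compute P(B|A)? (sketch)"""
    # Augment the DAG with one auxiliary parent "P_X" per variable X,
    # standing for the probabilities P(X = x).
    aux = {x: "P_" + str(x) for x in nodes}
    aug_nodes = list(nodes) + list(aux.values())
    aug_edges = list(edges) + [(aux[x], x) for x in nodes]

    # D contains every node d-separated from B by A in the augmented DAG.
    D = find_d_separations(aug_nodes, aug_edges, A, B)

    # D ∩ P are the parameters we can ignore; the rest of P is required.
    P = set(aux.values())
    return P - D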
215. For example
We want P (f )
To determine it, we need all and only the values of PH , PB , PL , and PF
Because
IG ({F} , {PX } |∅) (18)
Thus
PX is the only auxiliary parent variable D-Separated from {F} by the
empty set.
106 / 112
218. For example
We want P (f |b)
To determine it, we need all and only the values of PH , PL , and PF when the separation set is {B}
Then
IG ({F} , {PX , PB} | {B}) (19)
Thus
PX and PB are the only auxiliary parent variables D-Separated from {F} by {B}.
107 / 112
222. Remarks
Bayes Networks can reflect the true causal patterns
Often simpler (nodes have fewer parents).
Often easier to think about.
Often easier to elicit from experts.
Something Notable
Sometimes no causal net exists over the domain.
For example, consider the variables Traffic and Drips.
Arrows reflect correlation, not causation.
109 / 112
228. Remarks
What do the arrows really mean?
Topologies may happen to encode causal structure.
Topologies are only guaranteed to encode conditional independence!
110 / 112
233. Example
Thus
Adding edges allows us to make different conditional independence assumptions.
New Network
[Figure: a network over Burglary, Earthquake, Alarm, John Calls, and Mary Calls]
112 / 112
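The edges of the figure are not recoverable from this text, so purely as an illustrative run of the earlier find_d_separations sketch, assume the classic alarm structure (Burglary → Alarm ← Earthquake, Alarm → John Calls, Alarm → Mary Calls):

nodes = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
edges = [("Burglary", "Alarm"), ("Earthquake", "Alarm"),
         ("Alarm", "JohnCalls"), ("Alarm", "MaryCalls")]

# Nothing observed: the collider at Alarm blocks the only path between
# Burglary and Earthquake, so Earthquake is d-separated from Burglary.
print(find_d_separations(nodes, edges, A=[], B=["Burglary"]))
# -> {'Earthquake'}

# Observing Alarm opens the collider (Earthquake is no longer d-separated)
# but blocks the chains from Burglary to the two calls.
print(find_d_separations(nodes, edges, A=["Alarm"], B=["Burglary"]))
# -> {'JohnCalls', 'MaryCalls'}

This matches the intuition from the remarks above: once the alarm is known, learning that John called tells us nothing more about a burglary, while the burglary and the earthquake become dependent.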