This document provides an overview of Bayesian networks and probabilistic graphical models (PGMs). It outlines the goals of learning how to build graphical models using graph theory and perform inference under uncertainty using probability theory. It also lists some example PGM models like Markov random fields, hidden Markov models, dynamic Bayesian networks, naive Bayes models, and applications in computer vision. Finally, it provides the table of contents and references for further self-study on PGMs and Bayesian networks.
This document provides an introduction to Bayesian networks. It begins by explaining Bayesian networks using a medical example about determining the likelihood a patient has anthrax given various observed symptoms. It then provides a probability primer covering random variables, conditional probability, and independence. The document defines Bayesian networks as consisting of a directed acyclic graph and conditional probability tables at each node. It explains how Bayesian networks compactly represent joint probability distributions and allow for inference queries. The challenges of exact versus approximate inference in large networks are also noted.
Artificial Intelligence (AI), specifically deep learning, is revolutionizing industries, products, and core capabilities by delivering dramatically enhanced experiences. However, the deep neural networks of today use too much memory, compute, and energy. Plus, to make AI truly ubiquitous, networks need to run on the end device within a tight power and thermal budget. One approach to help address these issues is quantization, which attempts to reduce the number of bits used for weight parameters and activation calculations without sacrificing model accuracy. This presentation covers: why quantization is important, existing quantization challenges, Qualcomm AI Research's existing quantization research, and how developers and researchers can take advantage of quantization on Qualcomm Snapdragon.
This document contains notes from a machine learning discussion. It includes:
1. An introduction to BakFoo Inc. CEO Yuta Kashino's background in astrophysics, Python, and realtime data platforms.
2. References to papers and researchers in Bayesian deep learning and probabilistic programming, including Edward library creators Dustin Tran and Blei Lab.
3. An overview of how Edward combines TensorFlow for deep learning with probabilistic programming to perform Bayesian modeling, inference via VI and MCMC, and criticisms.
Learning to Balance: Bayesian Meta-Learning for Imbalanced and Out-of-distrib... (MLAI2)
While tasks in realistic settings may come with varying numbers of instances and classes, existing meta-learning approaches for few-shot classification assume that the number of instances per task and class is fixed. Due to this restriction, they learn to utilize the meta-knowledge equally across all tasks, even when the number of instances per task and class varies widely. Moreover, they do not consider the distributional difference in unseen tasks, on which the meta-knowledge may be less useful depending on task relatedness. To overcome these limitations, we propose a novel meta-learning model that adaptively balances the effect of meta-learning and task-specific learning within each task. Through learning of the balancing variables, we can decide whether to obtain a solution by relying on the meta-knowledge or on task-specific learning. We formulate this objective in a Bayesian inference framework and tackle it using variational inference. We validate our Bayesian Task-Adaptive Meta-Learning (Bayesian TAML) on two realistic task- and class-imbalanced datasets, on which it significantly outperforms existing meta-learning approaches. A further ablation study confirms the effectiveness of each balancing component and of the Bayesian learning framework.
This document provides an overview of pattern recognition techniques. It begins with an introduction to pattern recognition and its applications. It then outlines the syllabus, which includes topics like design principles, statistical pattern recognition, parameter estimation methods, principal component analysis, linear discriminant analysis, and classification techniques. Under each topic, it provides further details and explanations.
This document discusses causal discovery and its application to analyzing predictive models. It introduces causal discovery as the unsupervised learning of causal relations from data to estimate causal structures like directed acyclic graphs under certain assumptions. The document then discusses using causal discovery to analyze the mechanisms of predictive models by combining causal models with predictive models to model how interventions on features affect predictions. An example using an auto MPG dataset demonstrates how this approach can suggest which variable has the greatest intervention effect on MPG predictions.
[DL輪読会] Domain Adaptive Faster R-CNN for Object Detection in the Wild (Deep Learning JP)
The document discusses domain adaptive faster R-CNN for object detection. It proposes a method to adapt a model trained on labeled data from a source domain to detect objects in an unlabeled target domain. The method uses an end-to-end deep learning model with two stages. First, it reduces differences in image distributions between the source and target domains. Then it performs object detection on the target domain images using the adapted model.
This document discusses feature extraction and selection methods for principal component analysis. It provides an introduction to principal component analysis and how it can be used for dimensionality reduction by transforming correlated variables into a set of uncorrelated variables. The document serves as a tutorial on feature extraction, selection, and principal component analysis.
The document discusses different types of generative models including auto-regressive models, variational auto-encoders, and generative adversarial networks. It provides examples of each type of model and highlights some of their features and issues during training. Specific models discussed in more detail include PixelRNNs, DCGANs, WGANs, BEGANs, Pix2Pix, and CycleGANs. The document aims to introduce deep generative models and their applications.
This tutorial extensively covers the definitions, nuances, challenges, and requirements for the design of interpretable and explainable machine learning models and systems in healthcare. We discuss many uses in which interpretable machine learning models are needed in healthcare and how they should be deployed. Additionally, we explore the landscape of recent advances to address the challenges of model interpretability in healthcare and also describe how one would go about choosing the right interpretable machine learning algorithm for a given problem in healthcare.
This document summarizes recent advances in human pose estimation using deep learning methods. It first discusses traditional approaches like pictorial structures. It then covers several deep learning methods including global/holistic view using joint regression, local appearance using body part detection, and combining global and local information. Other methods discussed are using motion features and pose estimation in videos. Evaluation metrics like PCP and PDJ are also introduced. The document outlines many key papers in this area and provides examples of network architectures and results.
This document discusses a framework for mix-and-match tuning for self-supervised semantic segmentation. It proposes training with a proxy task of colorization before semantic segmentation to learn better representations. However, colorization alone may not discriminate high-level semantics well. The proposed method addresses this by taking features from colorization and mixing and matching local patches with unique labels in a graph-based framework for semantic segmentation. Evaluation shows improved mean IoU and per-class IoU over classic self-taught learning approaches.
Vector quantization maps high-dimensional vectors to codewords from a finite codebook. Each codeword defines a Voronoi region containing vectors closest to that codeword. The Lloyd and LBG algorithms are commonly used to optimize the codebook for a given dataset by iteratively clustering vectors and recomputing codeword averages. Tree-structured vector quantization improves efficiency by recursively partitioning the codebook into binary groups defined by test vectors. This reduces the number of distance comparisons needed at the cost of potential increases in distortion and storage requirements.
For the full video of this presentation, please visit:
https://www.embedded-vision.com/platinum-members/embedded-vision-alliance/embedded-vision-training/videos/pages/sep-2019-alliance-vitf-facebook
For more information about embedded vision, please visit:
http://www.embedded-vision.com
Raghuraman Krishnamoorthi, Software Engineer at Facebook, delivers the presentation "Quantizing Deep Networks for Efficient Inference at the Edge" at the Embedded Vision Alliance's September 2019 Vision Industry and Technology Forum. Krishnamoorthi gives an overview of practical deep neural network quantization techniques and tools.
Part 2 of the Deep Learning Fundamentals Series, this session discusses Tuning Training (including hyperparameters, overfitting/underfitting), Training Algorithms (including different learning rates, backpropagation), Optimization (including stochastic gradient descent, momentum, Nesterov Accelerated Gradient, RMSprop, Adaptive algorithms - Adam, Adadelta, etc.), and a primer on Convolutional Neural Networks. The demos included in these slides are running on Keras with TensorFlow backend on Databricks.
Transfer Learning and Fine-tuning Deep Neural Networks (PyData)
This document outlines Anusua Trivedi's talk on transfer learning and fine-tuning deep neural networks. The talk covers traditional machine learning versus deep learning, using deep convolutional neural networks (DCNNs) for image analysis, transfer learning and fine-tuning DCNNs, recurrent neural networks (RNNs), and case studies applying these techniques to diabetic retinopathy prediction and fashion image caption generation.
Continual Learning is one of the most promising research areas for shifting machine learning from solving a single task to something closer to general intelligence.
Machine learning (and especially deep neural network research) has shown outstanding results in the past 10 years, bringing us to the deep learning era, where learning models are everywhere and interact with many aspects of our lives.
However, machine learning has an enormous issue that completely separates it from biological learning: machines cannot learn continuously.
This is the so-called catastrophic forgetting problem, and continual learning tries to address it, making artificial intelligence able to learn continually for the entire duration of its "life".
Speaker: Taesung Park (Ph.D. student, UC Berkeley)
Date: June 2017
Taesung Park is a Ph.D. student at UC Berkeley in AI and computer vision, advised by Prof. Alexei Efros.
His research interest lies between computer vision and computational photography, such as generating realistic images or enhancing photo quality. He received a B.S. in mathematics and an M.S. in computer science from Stanford University.
Abstract:
Image-to-image translation is a class of vision and graphics problems where the goal is to learn the mapping between an input image and an output image using a training set of aligned image pairs.
However, for many tasks, paired training data will not be available.
We present an approach for learning to translate an image from a source domain X to a target domain Y in the absence of paired examples.
Our goal is to learn a mapping G: X → Y such that the distribution of images from G(X) is indistinguishable from the distribution Y using an adversarial loss.
Because this mapping is highly under-constrained, we couple it with an inverse mapping F: Y → X and introduce a cycle consistency loss to push F(G(X)) ≈ X (and vice versa).
Qualitative results are presented on several tasks where paired training data does not exist, including collection style transfer, object transfiguration, season transfer, photo enhancement, etc.
Quantitative comparisons against several prior methods demonstrate the superiority of our approach.
The document discusses pattern recognition and classification. It begins by defining pattern recognition as a method for determining what something is based on data like images, audio, or text. It then provides examples of common types of pattern recognition like image recognition and speech recognition. It notes that while pattern recognition comes easily to humans, it can be difficult for computers which lack abilities like unconscious, high-speed, high-accuracy recognition. The document then discusses the basic principle of computer-based pattern recognition as classifying inputs into predefined classes based on their similarity to training examples.
This document discusses Gaussian mixture models (GMMs) and their use in applications like speaker recognition and language identification. GMMs represent a probability density function as a weighted sum of Gaussian distributions. GMM parameters are estimated from training data using Expectation-Maximization or Maximum A Posteriori estimation. GMMs are computationally inexpensive and well-suited for text-independent tasks without strong prior knowledge of content.
This document provides an overview of Bayesian networks. It defines Bayesian networks as acyclic directed graphs combined with a joint probability distribution. Each node represents a variable, and edges represent conditional dependencies between variables. The document discusses how Bayesian networks can be used to model complex processes and learn from observations. It provides examples of different types of network structures and conditional dependencies. The document also describes software for working with Bayesian networks and gives an example of how a Bayesian network was developed to help doctors determine the optimal treatment for stomach lymphoma.
This material, prepared by Nishida Geio of Osaka University, summarizes normalization techniques.
It starts with why normalization is needed,
then explains the equations behind Batch, Weight, and Layer Normalization,
and finishes with a well-organized comparison of the three methods.
The explanation of how training proceeds uses the Fisher Information Matrix, so it will likely be of interest only to those who want to study the topic in depth.
This document discusses feature selection concepts and methods. It defines features as attributes that determine which class an instance belongs to. Feature selection aims to select a relevant subset of features by removing irrelevant, redundant and unnecessary data. This improves learning accuracy, model performance and interpretability. The document categorizes feature selection algorithms as filter, wrapper or embedded methods based on how they evaluate feature subsets. It also discusses concepts like feature relevance, search strategies, successor generation and evaluation measures used in feature selection algorithms.
The document is a presentation about monocular human pose estimation using Bayesian networks. It includes:
- An outline with sections on introduction, approach overview, model learning, pose estimation, feature extraction, experiments and conclusions.
- Discussion of applications of human motion capture such as animation, games, medical diagnosis and visual surveillance.
- Comparison of different sensor approaches for human pose estimation including active markers, passive markers and markerless methods using cameras.
- Description of the proposed approach which uses Bayesian networks to represent the articulated human body and estimate 2D and 3D joint positions through representation, learning and inference steps.
This document discusses approximate inference in Bayesian networks using sampling methods. It introduces random number generation, which is important for sampling algorithms. Random number generators in programming languages typically generate uniform random numbers, but different distributions are needed for sampling Bayesian networks. The document covers generating random numbers from univariate and multivariate distributions to estimate probabilities for approximate inference in Bayesian networks.
The document discusses probabilistic inference over time using Bayesian networks. It introduces the concepts of temporal models and the four types of inference in such models: filtering, prediction, smoothing, and most likely explanation. It outlines the goals of learning uncertainty in temporal models and examining hidden Markov models, Kalman filtering, particle filtering, and dynamic Bayesian networks. The document provides an overview of its structure and references related background units on probabilistic graphical models and inference.
This document introduces Bayesian networks and uncertainty inference with discrete variables. It discusses the goal of reviewing advanced statistical concepts like statistical inference and pattern recognition. The contents cover topics like acting under uncertainty, basic probability, marginal probability, inference using full joint distributions, independence, and Bayes' rule. Self-study materials on related topics are also referenced.
The document discusses exact inference in Bayesian networks. It begins by stating that the goal is to efficiently compute the sum-product form of the inference formula. It then lists some related topics that will be covered in subsequent units, such as approximate inference algorithms. The document outlines the structure of the related lecture notes, which will cover topics like variable elimination, belief propagation, and junction trees for exact inference. It also provides references for further self-study on probabilistic inference in graphical models.
This document provides an overview of statistics concepts for image processing and pattern recognition. It reviews key statistical measures including histograms, measures of central tendency (mean, median, mode), variance, frequency distributions, covariance, correlation, and charts/graphs. The goal is to review basic statistics concepts that will be useful for subsequent units on uncertainty inference. Key concepts covered include histograms, probability density functions, measures of central tendency, variance as a measure of dispersion, and expected values.
My slides for an academia talk about embedded vision, given in 2010. Some of our research results are also presented in this presentation.
A few slides contain Chinese characters.
Computer vision and pattern recognition algorithms are important for IoT applications like smart homes and healthcare that involve large camera networks. Academic expertise is needed for accuracy and efficiency, while industrial concerns focus on system integration, configuration and management. The presentation describes a large-scale video surveillance system using heterogeneous information fusion and visualization across a university campus. It also discusses implementing system self-awareness through fault, environment and context awareness, and presents methods for real-time camera anomaly detection.
This document discusses parallelizing computer vision algorithms using GPGPU computing. It begins with an introduction to multicore computing and GPUs. It explains that as CPU clock speeds can no longer increase due to power constraints, the industry has shifted to multicore CPUs and GPUs to continue improving performance. Computer vision algorithms are well-suited to parallelization on GPUs due to their massive data processing needs. The document reviews GPU architectures from Nvidia, Qualcomm, AMD, and ARM that can be used to accelerate computer vision. It also discusses parallel programming frameworks for GPUs like CUDA, OpenCL, and OpenACC.
The document discusses embedded computer vision and presents examples of embedded computer vision systems developed by Wang, Yuan-Kai and his team. It describes research in embedded computer vision using CPUs, DSPs and FPGAs. It also outlines challenges in embedded computer vision and provides examples of projects including an entertainment robot, vision sensor network, video surveillance system, and wearable camera.
This document discusses parallel computing with GPUs and CUDA. It begins by explaining that the multicore era requires parallel computing approaches. It then provides an overview of GPU architecture and programming with CUDA. Specific examples of using GPUs for image restoration, feature extraction, and video processing are mentioned.
This document describes a unit on uncertainty inference using continuous distributions. It covers Bayesian networks and Gaussian distributions, including univariate, bivariate, and multivariate Gaussian distributions. The key concepts covered are the Gaussian distribution parameters of mean and covariance matrix, properties of Gaussian distributions like axis-aligned and spherical Gaussians, and applications like using Gaussians for noise modeling in images. Self-study references on statistics and artificial intelligence are also provided.
This document provides an introduction to probability and probability distributions for Bayesian networks. It begins with a review of basic probability concepts like events, axioms of probability, and theorems derived from the axioms. It then discusses random variables, including discrete, continuous, and random vector variables. Examples of random variables in image processing and computer vision are provided. The document concludes with an overview of probability distributions as a set of probabilities assigned to a random variable or vector.
This is a presentation for an academia talk about cloud computing for intelligent video surveillance (VSaaS), given in 2010. Some of our research results are also presented in this presentation.
This document discusses intelligent video surveillance and sousveillance. It covers topics such as video surveillance market trends, important crime cases solved using CCTV footage, and technology used in intelligent video surveillance systems. Computer vision algorithms are used to add intelligence to video surveillance, going beyond just monitoring to visual surveillance. The document also presents examples of intelligent surveillance applications and research from universities and companies.
05 probabilistic graphical models
1. Bayesian Networks
Unit 5: Probabilistic Graphical Models (PGM)
Wang, Yuan-Kai, 王元凱
ykwang@mails.fju.edu.tw
http://www.ykwang.tw
Department of Electrical Engineering, Fu Jen Univ.
輔仁大學電機工程系
2006~2011
Reference this document as:
Wang, Yuan-Kai, "Probabilistic Graphical Models," Lecture Notes of Wang, Yuan-Kai, Fu Jen University, Taiwan, 2011.
2. Goal of This Unit
• Learn how to
  – Build a graphical model (network model) by graph theory
  – Perform inference under uncertainty according to probability theory
• Theory of Bayesian networks
  – Conditional independence
  – D-Separation
  – Basic algorithm:
    • Variable Elimination
• Introduce some BN models
  – MRF, HMM, DBN, Naïve Bayes, …
3. Related Units
• Background
  – Statistical inference
  – Graph theory
• Next units
  – Exact inference algorithms
  – Approximate inference algorithms
4. References for Self-Study
• Chapter 14, Artificial Intelligence: A Modern Approach, 2nd ed., S. Russell & P. Norvig, Prentice Hall, 2003
• E. Charniak, "Bayesian networks without tears," AI Magazine
• T. A. Stephenson, "An introduction to Bayesian network theory and usage," IDIAP Research Report IDIAP-RR-00-03, 2000
• B. D'Ambrosio, "Inference in Bayesian networks," AI Magazine, 1999
• M. I. Jordan & Y. Weiss, "Probabilistic inference in graphical models"
5. Contents
1. Representing Uncertain Knowledge
2. Various PGM Models
3. Conditional Independence
4. Inference
5. Applications on Computer Vision
6. Summary
7. References
6. Example – Car Diagnosis
[Figure-only slide: a car-diagnosis Bayesian network.]
7. Examples on Computer Vision
[Figure: a Bayesian network for articulated human body modeling, with four layers of nodes: anthropological measurements A (hand, forearm, upper-arm, head, and torso sizes), joints J (left/right wrists, elbows, shoulders, and the neck), components C (left/right hands, forearms, upper arms, head, and torso), and observations O (Oij).]
8. Where Do PGMs Come From?
• Common problems in real life are
  – Complex and uncertain
9. Graph + Probability
• A graph has
  – Nodes + edges
• Two kinds of graph
  – Directed graph
  – Undirected graph
• Probability has
  – Random variables → nodes
  – Probabilities → edges
• Directed graph: conditional probability P(X|Y)
• Undirected graph: joint probability P(X,Y)
10. Probabilistic Modeling of Problems (1/2)
• Usually a node has two semantics
  – Cause
  – Effect
• Causal relationships between nodes
  – Probabilistic
  – Conditional probability P(Y|X): P(Effect|Cause)
  – X and Y are not independent
  – Directed graph
[Figure: Burglary and Earthquake point to Alarm with P(A|B,E); Alarm points to John Calls with P(J|A) and to Mary Calls with P(M|A).]
11. Probabilistic Modeling of Problems (2/2)
• If nodes have no causal semantics but happen together (influence each other)
  – Probabilistic
  – Joint probability P(X,Y)
  – X and Y are not independent
  – Undirected graph
[Figure: two nodes, Student X and Student Y, linked by an undirected edge carrying P(X,Y).]
12. Cause/Effect → Class/Feature (1/2)
• In pattern recognition / computer vision
  – Cause → class
  – Effect → feature
[Figure: a Face Expression class node with feature children Eyebrow Motion and Mouth Motion, quantified by P(f1|class) and P(f2|class); the features compare a facial expression image against a base image with neutral expression.]
13. Cause/Effect → Class/Feature (2/2)
• Face detection: 2-class classification
[Figure: a Face object node with feature children Skin Color and Eye Pattern, quantified by P(f1|class) and P(f2|class).]
14. Cause/Effect → State/Observation
• In video analysis (tracking)
  – Cause → state
  – Effect → observation
[Figure: a temporal chain of real locations x(t-1) → x(t) → x(t+1) with transition probability P(xt|xt-1); each real location emits an observed location z with P(zt|xt). Real position: xt; detected position: zt; predicted position: x(t+1).]
15. What Are PGMs Good For?
• Application domains: medicine, bioinformatics, speech recognition, computer vision, text classification, stock market, computer troubleshooting
• Typical queries
  – Classification: P(class|feature)
  – Prediction: P(Effect|Cause) = ?
  – Diagnosis: P(Cause|Effect) = ?
16. Three Problems in PGM
• Representation
  – Given a problem
  – Build its graphical model (construction of a Bayesian network)
• Inference
  – Given a set of evidence nodes
  – Get probabilities of other node(s)
• Learning
  – Learn the CPTs of a BN
  – Learn the graphical structure of a BN
[Figure: the tracking network of real and observed locations; learning estimates P(xt|xt-1) and P(zt-1|xt-1) from observed (x, z) data pairs.]
17. Structure of Related Lecture Notes
• PGM representation (problem → network structure)
  – Unit 5: Probabilistic Graphical Models
  – Unit 9: Hybrid BN
  – Units 10~15: Naïve Bayes, MRF, HMM, DBN, Kalman filter
• Inference (answering queries)
  – Unit 6: Exact inference
  – Unit 7: Approximate inference
  – Unit 8: Temporal inference
• Learning structure and parameters (the CPTs P(B), P(E), P(A|B,E), P(J|A), P(M|A)) from data
  – Units 16~: MLE, EM
[Figure: the burglary network (B, E → A → J, M) used to illustrate representation, inference, and learning.]
18. Section 1: Representing Uncertain Knowledge
19. Review (1/3): Bayes' Theorem

    P(h | e) = P(e | h) P(h) / P(e)

  (posterior = likelihood × prior / probability of evidence)
• The probability of a hypothesis h can be updated when evidence e has been obtained
• It is usually not necessary to calculate P(e) directly
  – It can be obtained by normalizing the posterior probabilities P(hi | e) (see the sketch below)
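As a small illustration of the normalization remark above, here is a minimal sketch in Python (not from the slides; the three hypotheses and their numbers are invented for the example) that computes posteriors without evaluating P(e) directly:

    def posteriors(priors, likelihoods):
        """P(h_i | e) is proportional to P(e | h_i) * P(h_i); normalizing replaces computing P(e)."""
        unnormalized = [p * l for p, l in zip(priors, likelihoods)]
        z = sum(unnormalized)  # this normalizing constant equals P(e)
        return [u / z for u in unnormalized]

    # Three hypotheses with priors P(h_i) and likelihoods P(e | h_i):
    print(posteriors([0.5, 0.3, 0.2], [0.1, 0.7, 0.4]))  # ~ [0.147, 0.618, 0.235]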
20. Review (2/3): Marginalization

    P(X) = Σ_{h∈H} P(X, h)
21. Review (3/3)
• Full joint probability distribution (FJD)
  – Can answer any question P(X|E=e):
    P(X|E=e) ∝ Σh P(X, e, h)
  – But becomes intractably large as the number of variables grows
• Independence and conditional independence among random variables (CPTs)
  – CPTs ≡ FJD
  – But can greatly reduce the number of probabilities that need to be specified
22. A Simple Bayesian Network
• 1 FJD = 2 CPTs
  – P(Cavity, Toothache) = P(Toothache|Cavity) × P(Cavity)
  – P(X,Y) = P(X|Y)P(Y) = P(Y|X)P(X)
• A graphical model can represent
  – Causal relationships (Cavity → Toothache)
  – Joint relationships
• CPTs: P(C) = 0.002; P(T|C): C=T → 0.70, C=F → 0.01 (a small sketch follows)
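A minimal sketch in Python (assuming the CPT values on this slide) showing how the two CPTs reproduce the full joint P(Cavity, Toothache) = P(Toothache|Cavity) P(Cavity):

    p_cavity = {True: 0.002, False: 0.998}            # P(C)
    p_tooth_given_cavity = {True: 0.70, False: 0.01}  # P(T=true | C)

    def joint(cavity, toothache):
        """P(Cavity=cavity, Toothache=toothache) = P(T|C) * P(C)."""
        p_t = p_tooth_given_cavity[cavity]
        if not toothache:
            p_t = 1.0 - p_t
        return p_t * p_cavity[cavity]

    # The four joint entries sum to 1, and e.g. P(cavity, toothache) = 0.70 * 0.002 = 0.0014
    assert abs(sum(joint(c, t) for c in (True, False) for t in (True, False)) - 1.0) < 1e-12
    print(joint(True, True))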
23. A Burglary Network
• The graph is directed and acyclic; nodes are (random) variables
• Structure: Burglary and Earthquake are parents of Alarm; Alarm is the parent of John Calls and Mary Calls
• CPTs:
  – P(B) = 0.001; P(E) = 0.002
  – P(A|B,E): B=T,E=T → 0.95; B=T,E=F → 0.95; B=F,E=T → 0.29; B=F,E=F → 0.001
  – P(J|A): A=T → 0.90; A=F → 0.05
  – P(M|A): A=T → 0.70; A=F → 0.01
• A conditional probability distribution quantifies the effects of the parents on a node
24. Compact Representation
• If all n nodes have at most k parents: O(2^k · n) vs. O(2^n) parameters
• The burglary network and its CPTs from the previous slide illustrate this: five CPTs with 10 rows in total, instead of 2^5 − 1 = 31 joint entries
25. Formal Definition of a BN
• Directed Acyclic Graph (DAG)
  – Nodes: random variables
  – Edges: direct influence between 2 variables
• CPTs: quantify the dependency of a variable on its parents, P(X|Parent(X))
  – Ex: P(C|A,B), P(D|A)
• A priori distribution for each node with no parents
  – Ex: P(A) and P(B)
[Figure: a DAG over nodes A, B, C, D, E, with A and B as parents of C and A as a parent of D.] (A sketch of this definition follows.)
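A sketch in Python of this formal definition (the five-node DAG and all CPT numbers are invented for illustration): a BN is just a DAG plus one CPT per node, keyed by the values of Parent(X), and the joint is the product of the CPTs:

    from itertools import product

    # parents[X] lists Parent(X); cpt[X] maps a tuple of parent values to P(X=True | parents)
    parents = {"A": [], "B": [], "C": ["A", "B"], "D": ["A"], "E": ["C"]}
    cpt = {
        "A": {(): 0.3},
        "B": {(): 0.6},
        "C": {(a, b): 0.9 if a and b else 0.1 for a, b in product((True, False), repeat=2)},
        "D": {(a,): 0.7 if a else 0.2 for a in (True, False)},
        "E": {(c,): 0.5 if c else 0.05 for c in (True, False)},
    }

    def joint(assign):
        """P(x1..xn) = product over nodes of P(xi | pa(xi))."""
        p = 1.0
        for x, pa in parents.items():
            p_true = cpt[x][tuple(assign[u] for u in pa)]
            p *= p_true if assign[x] else 1.0 - p_true
        return p

    print(joint({"A": True, "B": False, "C": True, "D": False, "E": True}))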
26. Conditional Independence in the Directed Acyclic Graph
• The topology of the network encodes dependency/independence:
  – Weather is independent of the other variables
  – Cavity has direct influence on Toothache and Catch
  – Toothache and Catch are conditionally independent given Cavity
27. Conditional Probability Table (CPT)
• Example CPTs: P(W) = 0.001; P(C) = 0.02
  – P(T|C): C=T → 0.90; C=F → 0.05
  – P(Catch|C): C=T → 0.70; C=F → 0.01
• Each node stores P(Xi|Parent(Xi)), also written P(Xi|Pa(Xi))
28. Causality and Bayesian Networks
• Not every BN describes causal relationships between the variables
• Consider the dependence between Lung Cancer (L) and the X-ray test (X)
• A BN with causality: L → X
  – P(l) = 0.001; P(x|l) = 0.6; P(x|¬l) = 0.02
• Another BN represents the same distribution and independencies without causality: X → L
  – P(x) = 0.02058; P(l|x) = 0.02915; P(l|¬x) = 0.00041
  (verified numerically in the sketch below)
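A quick Python check (a sketch, using only the numbers on this slide) that the reversed network X → L encodes the same distribution as the causal network L → X, via marginalization and Bayes' theorem:

    p_l = 0.001          # P(l)
    p_x_given_l = 0.6    # P(x | l)
    p_x_given_not_l = 0.02

    p_x = p_x_given_l * p_l + p_x_given_not_l * (1 - p_l)   # marginalization
    p_l_given_x = p_x_given_l * p_l / p_x                   # Bayes' theorem
    p_l_given_not_x = (1 - p_x_given_l) * p_l / (1 - p_x)

    print(round(p_x, 5), round(p_l_given_x, 5), round(p_l_given_not_x, 5))
    # -> 0.02058 0.02915 0.00041, matching the reversed network's CPTs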
29. Example – Construction of BN (1/3)
• I have a burglar alarm installed at home
• I am at work
• Neighbor John calls to say my alarm is ringing
• But neighbor Mary doesn't call
• Sometimes it's set off by minor earthquakes
• Is there a burglar?
30. Example – Construction of BN (2/3)
• Step 1: Find the random variables
  – Burglary, Earthquake, Alarm, JohnCalls, MaryCalls
• Step 2: Represent the causal relationships among the random variables
  – A burglary can set the alarm off
  – An earthquake can set the alarm off
  – The alarm can cause Mary to call
  – The alarm can cause John to call
• Step 3: Quantify the network topology with probabilities (CPTs)
31. Example – Construction of BN (3/3)
• 5 Boolean random variables + 5 CPTs (a Python sketch of this network follows)
  – P(B) = 0.001; P(E) = 0.002
  – P(A|B,E): B=T,E=T → 0.95; B=T,E=F → 0.95; B=F,E=T → 0.29; B=F,E=F → 0.001
  – P(J|A): A=T → 0.90; A=F → 0.05
  – P(M|A): A=T → 0.70; A=F → 0.01
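A sketch in Python of this network (the CPT numbers are exactly those above) that evaluates one entry of the full joint via the chain-rule factorization, e.g. P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e):

    P_B, P_E = 0.001, 0.002
    P_A = {(True, True): 0.95, (True, False): 0.95,
           (False, True): 0.29, (False, False): 0.001}      # P(A=t | B, E)
    P_J = {True: 0.90, False: 0.05}                         # P(J=t | A)
    P_M = {True: 0.70, False: 0.01}                         # P(M=t | A)

    def joint(b, e, a, j, m):
        p = (P_B if b else 1 - P_B) * (P_E if e else 1 - P_E)
        p *= P_A[(b, e)] if a else 1 - P_A[(b, e)]
        p *= (P_J[a] if j else 1 - P_J[a]) * (P_M[a] if m else 1 - P_M[a])
        return p

    # Both neighbors call, the alarm rang, but no burglary and no earthquake:
    print(joint(False, False, True, True, True))  # ~ 0.000628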
32. Marginalization in a Bayesian Network

    P(b, e, a, j) = Σ_{h∈H} P(b, e, a, j, h) = Σ_{M∈{m,¬m}} P(b, e, a, j, M)

    P(b, e) = Σ_{h∈H} P(b, e, h) = Σ_{A∈{a,¬a}} Σ_{J∈{j,¬j}} Σ_{M∈{m,¬m}} P(b, e, A, J, M)

[Network: Burglary, Earthquake → Alarm → John Calls, Mary Calls; both sums are computed in the sketch below.]
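Continuing the Python sketch from the previous slide, both marginalizations are brute-force sums of joint() over the hidden variables:

    from itertools import product

    # P(b, e, a, j) = sum over M of P(b, e, a, j, M)
    p_beaj = sum(joint(True, True, True, True, m) for m in (True, False))

    # P(b, e) = sum over A, J, M of P(b, e, A, J, M)
    p_be = sum(joint(True, True, a, j, m)
               for a, j, m in product((True, False), repeat=3))

    print(p_beaj)  # 0.001 * 0.002 * 0.95 * 0.90 = 1.71e-06
    print(p_be)    # 0.001 * 0.002 = 2e-06; summing out the descendants gives 1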
33. Markov Chain, Conditional Probability, Independence, and Directed Edge
• Markov chain: L → X, with conditional probability P(X|L)
  – L and X are dependent, not independent
• A Markov chain has a conditional probability, is not independent, and has a directed edge
34. Common Causes
• Network (a DAG): Smoking → Bronchitis, Smoking → Lung Cancer
• Markov condition: I(B, L | S), i.e. P(b | l, s) = P(b | s)
• Given S→B and S→L, and "Joe is a smoker":
  – Compare "Joe has Bronchitis" vs. "Joe has Lung Cancer"
  – "Joe has Bronchitis" will not give us any more information about the probability of "Joe has Lung Cancer"
35. Bayesian Networks Unit : Probabilistic Graphical Models p. 35
Common Effects
Burglary Earthquake
Alarm
It is a DAG
• Markov condition: I(B, E), i.e. P(b | e) = P(b)
• Burglary and Earthquake are independent of
each other
• However they are conditionally dependent given
Alarm
• If the alarm has gone off, news that there had
been an earthquake would ‘explain away’ the
idea that a burglary had taken place
36. Bayesian Networks Unit : Probabilistic Graphical Models p. 36
Markov Assumption
• Markov chain vs. independence
• A random variable X is independent of its
non-descendants, given its parents Pa(X)
– Formally, I(X, NonDesc(X) | Pa(X))
[Figure: X with parents Y1 and Y2; ancestors above the parents, descendants below X, non-descendants beside X]
37. Bayesian Networks Unit : Probabilistic Graphical Models p. 37
Markov Assumption Example
• In this example
(network: Earthquake → Radio, Earthquake → Alarm ← Burglary, Alarm → Call):
– I ( E, B )
– I ( B, {E, R} )
– I ( R, {A, B, C} | E )
– I ( A, R | B, E )
– I ( C, {B, E, R} | A )
38. Bayesian Networks Unit : Probabilistic Graphical Models p. 38
Joint Probability Distribution
• Note that our joint distribution with 5 variables can
be represented as:
$$P(e,b,r,a,c) = P(e)\,P(b \mid e)\,P(r \mid e,b)\,P(a \mid e,b,r)\,P(c \mid e,b,r,a)$$
But due to the Markov condition we have, for example,
$$P(c \mid e,b,r,a) = P(c \mid a)$$
The joint probability distribution can be expressed as
$$P(e,b,r,a,c) = P(e)\,P(b \mid e)\,P(r \mid e)\,P(a \mid e,b)\,P(c \mid a)$$
• Ex: the probability that someone has a smoking history,
has lung cancer but not bronchitis, suffers from fatigue, and
tests positive in an X-ray test is
$$P(s, \neg b, l, f, x) = 0.2 \times 0.75 \times 0.003 \times 0.5 \times 0.6 = 0.000135$$
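A quick check of this worked example, reading each factor off the factorization above; which CPT entry each number corresponds to is my reading of the slide.

```python
# Verifying the slide's product; the factor-to-CPT pairing is an assumption.
p_s     = 0.2     # P(s): smoking history
p_not_b = 0.75    # P(not b | s): no bronchitis
p_l     = 0.003   # P(l | s): lung cancer
p_f     = 0.5     # P(f | ...): fatigue
p_x     = 0.6     # P(x | l): positive X-ray

print(p_s * p_not_b * p_l * p_f * p_x)  # 0.000135
```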
39. Bayesian Networks Unit : Probabilistic Graphical Models p. 39
40. Bayesian Networks Unit : Probabilistic Graphical Models p. 40
Representing the Joint Distribution
• For a BN with nodes X1, X2, …, Xn
$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(x_i))$$
(the full joint distribution, FJD, on the left; the n CPTs on the right)
• An enormous saving can be made in the
number of values required for the joint distribution
– For n binary variables, the FJD requires 2^n − 1 values
– For a BN with n binary variables where each node has
at most k parents, the CPTs require fewer than 2^k · n values
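A quick sketch of this saving for binary variables; the counting follows the slide, while the function names are mine.

```python
# Compare FJD storage with CPT storage for n binary variables.
def fjd_size(n):
    return 2**n - 1               # full joint distribution

def bn_size(n, k):
    return (2**k) * n             # upper bound: each CPT has <= 2^k rows

for n in (5, 10, 20, 30):
    print(n, fjd_size(n), bn_size(n, k=2))
# e.g. n=30: 1,073,741,823 values for the FJD vs. at most 120 CPT entries
```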
41. Bayesian Networks Unit : Probabilistic Graphical Models p. 41
Exercise (1/2)
[Figure: a DAG with edges S → G; S → U; D → U; G → E; U → E; U → H]
$$P(s, d, g, u, e^A, h^C) = P(s)\,P(d)\,P(g \mid s)\,P(u \mid s,d)\,P(e^A \mid g,u)\,P(h^C \mid u)$$
42. Bayesian Networks Unit : Probabilistic Graphical Models p. 42
Exercise (2/2)
• P(a, b, c, d, e)   (network: a → b, a → c; b, c → d; c → e)
= P(e | a, b, c, d) P(a, b, c, d)   (by the product rule)
= P(e | c) P(a, b, c, d)   (by the conditional independence assumption)
= P(e | c) P(d | a, b, c) P(a, b, c)
= P(e | c) P(d | b, c) P(c | a, b) P(a, b)
= P(e | c) P(d | b, c) P(c | a) P(b | a) P(a)
43. Bayesian Networks Unit : Probabilistic Graphical Models p. 43
Exercises
• Facial Expression Recognition
• Face Detection
• Face Tracking
• Body Segmentation
Using GeNIe: http://genie.sis.pitt.edu/
44. Bayesian Networks Unit : Probabilistic Graphical Models p. 44
Another Example : Water-Sprinkler
• Network: Cloudy → Sprinkler, Cloudy → Rain; Sprinkler, Rain → WetGrass

P(C) = 0.5

 C  P(S|C)          C  P(R|C)
 T  0.1             T  0.8
 F  0.5             F  0.2

 S  R  P(W|S,R)
 T  T  0.99
 T  F  0.9
 F  T  0.9
 F  F  0.0
45. Bayesian Networks Unit : Probabilistic Graphical Models p. 45
Inference in Water-Sprinkler (1/2)
• If the grass is wet (WetGrass=True)
– Two possible explanations : rain or sprinkler
– Which is the more likely?
$$\Pr(S{=}T \mid W{=}T) = \frac{\Pr(S{=}T, W{=}T)}{\Pr(W{=}T)} = \frac{\sum_{c,r} \Pr(C{=}c, R{=}r, S{=}T, W{=}T)}{\Pr(W{=}T)} = \frac{0.2781}{0.6471} = 0.430$$
$$\Pr(R{=}T \mid W{=}T) = \frac{\Pr(R{=}T, W{=}T)}{\Pr(W{=}T)} = \frac{\sum_{c,s} \Pr(C{=}c, S{=}s, R{=}T, W{=}T)}{\Pr(W{=}T)} = \frac{0.4581}{0.6471} = 0.708$$
The grass is more likely to be wet because of the rain.
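These two posteriors can be reproduced by brute-force enumeration over the joint, as the following sketch shows; the encoding is mine, the CPT values are the slide's.

```python
from itertools import product

# Enumeration over the water-sprinkler joint distribution.
P_C = 0.5
P_S = {True: 0.1, False: 0.5}                       # P(S=t | C)
P_R = {True: 0.8, False: 0.2}                       # P(R=t | C)
P_W = {(True, True): 0.99, (True, False): 0.9,
       (False, True): 0.9, (False, False): 0.0}     # P(W=t | S, R)

def bern(p, v):
    return p if v else 1.0 - p

def joint(c, s, r, w):
    return bern(P_C, c) * bern(P_S[c], s) * bern(P_R[c], r) * bern(P_W[(s, r)], w)

p_w  = sum(joint(c, s, r, True) for c, s, r in product([True, False], repeat=3))
p_sw = sum(joint(c, True, r, True) for c, r in product([True, False], repeat=2))
p_rw = sum(joint(c, s, True, True) for c, s in product([True, False], repeat=2))

print(p_w)         # 0.6471
print(p_sw / p_w)  # ~ 0.430
print(p_rw / p_w)  # ~ 0.708
```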
46. Bayesian Networks Unit : Probabilistic Graphical Models p. 46
Inference in Water-Sprinkler (2/2)
(Same water-sprinkler network and CPTs as above.)

Time needed for the calculations:
• Using the Bayes chain rule:
$$\Pr(C,R,S,W) = \Pr(C)\Pr(R \mid C)\Pr(S \mid R,C)\Pr(W \mid R,C,S) \;\Rightarrow\; 2 \times 4 \times 8 \times 16 = 1024$$
• Using conditional independence properties:
$$\Pr(C,R,S,W) = \Pr(C)\Pr(R \mid C)\Pr(S \mid C)\Pr(W \mid R,S) \;\Rightarrow\; 2 \times 4 \times 4 \times 8 = 256$$
47. Bayesian Networks Unit : Probabilistic Graphical Models p. 47
Inference (1/5)
P(E=t|C=t) = 0.1,  P(B=t|C=t) = 0.7
[Figure: bar charts of the posterior beliefs over Earthquake and Burglary in the network Earthquake → Radio, Earthquake → Alarm ← Burglary, Alarm → Call, given the evidence C=t]

 E   B   P(A=t|E,B)  P(A=f|E,B)
 e   b   0.9         0.1
 e   ¬b  0.2         0.8
 ¬e  b   0.9         0.1
 ¬e  ¬b  0.01        0.99

Evidence: C = t
48. Bayesian Networks Unit : Probabilistic Graphical Models p. 48
Inference (2/5)
P(E=t|C=t) = 0.1,  P(B=t|C=t) = 0.7
[Figure: the same network and bar charts, now with the additional evidence R=t entering at the Radio node]
Evidence: C = t, R = t
49. Bayesian Networks Unit : Probabilistic Graphical Models p. 49
Inference (3/5)
P(E=t|C=t) = 0.1,  P(B=t|C=t) = 0.7
P(E=t|C=t,R=t) = 0.97,  P(B=t|C=t,R=t) = 0.1
[Figure: bar charts before and after adding the evidence R=t; the belief in Earthquake rises while the belief in Burglary drops]
Evidence: C = t, R = t
50. Bayesian Networks Unit : Probabilistic Graphical Models p. 50
Inference (4/5)
P(E=t|C=t) = 0.1,  P(B=t|C=t) = 0.7
P(E=t|C=t,R=t) = 0.97,  P(B=t|C=t,R=t) = 0.1
[Figure: the same bar charts, highlighting how the radio report raises the Earthquake belief and lowers the Burglary belief]
Evidence: C = t, R = t
Explaining away effect
51. Bayesian Networks Unit : Probabilistic Graphical Models p. 51
Inference (5/5)
P(E=t|C=t) = 0.1,  P(B=t|C=t) = 0.7
P(E=t|C=t,R=t) = 0.97,  P(B=t|C=t,R=t) = 0.1
[Figure: the same network and bar charts as the previous slide]
“Probability theory is nothing but common sense reduced to calculation”
– Pierre Simon Laplace
52. Bayesian Networks Unit : Probabilistic Graphical Models p. 52
2. Various PGM Models
Taxonomy
[Figure: a taxonomy of PGM models, including factor graphs and naïve Bayes]
53. Bayesian Networks Unit : Probabilistic Graphical Models p. 53
Directed vs. Undirected
Directed (Bayesian networks)        Undirected (Markov networks)
[Figure: two small graphs over x1, x2, y1, y2, one with directed edges and one with undirected edges]
$$p(\mathbf{x}, \mathbf{y}) = \prod_i p(x_i \mid \mathbf{x}_{pa(i)}) \prod_j p(y_j \mid \mathbf{x}_{pa(j)}) \qquad p(\mathbf{x}, \mathbf{y}) = \frac{1}{Z} \prod_a \psi_a(\mathbf{x}, \mathbf{y})$$
54. Bayesian Networks Unit : Probabilistic Graphical Models p. 54
Naive Bayes Model
• Strong (naive) assumption about the problem
– A single cause directly influences a number
of effects
– All effects are conditionally independent,
given the cause
$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(x_i))$$
$$P(\mathit{Cause}, \mathit{Effect}_1, \ldots, \mathit{Effect}_n) = P(\mathit{Cause}) \prod_i P(\mathit{Effect}_i \mid \mathit{Cause})$$
2n+1 probabilities: O(n)
More details in another unit.
55. Bayesian Networks Unit : Probabilistic Graphical Models p. 55
Naïve Bayesian Classifier (NBC)
• Use Naïve Bayes for classification
$$P(\mathit{Class} \mid \mathit{Feature}_1, \ldots, \mathit{Feature}_n) \propto P(\mathit{Feature}_1, \ldots, \mathit{Feature}_n, \mathit{Class}) = P(\mathit{Class}) \prod_{i=1}^{n} P(\mathit{Feature}_i \mid \mathit{Class})$$
[Figure: a Class node with children Feature 1 … Feature n; two examples are shown, a Face object node with Skin Color and Eye Pattern features, and a Face Expression node with Eyebrow Motion and Mouth Motion features]
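A minimal naïve Bayes classifier sketch in this spirit; the class priors and feature likelihoods below are invented for illustration, not taken from the slides.

```python
import math

# Naive Bayes posterior over classes for boolean feature observations.
prior = {"smile": 0.6, "surprise": 0.4}              # P(class), illustrative
likelihood = {                                       # P(feature=true | class)
    "smile":    {"eyebrow_raised": 0.2, "mouth_open": 0.7},
    "surprise": {"eyebrow_raised": 0.9, "mouth_open": 0.8},
}

def posterior(features):
    """Return P(class | features) by normalizing P(class) * prod P(f | class)."""
    scores = {}
    for c in prior:
        s = math.log(prior[c])                       # log-space for stability
        for f, v in features.items():
            p = likelihood[c][f]
            s += math.log(p if v else 1.0 - p)
        scores[c] = math.exp(s)
    z = sum(scores.values())                         # normalization: P(e)
    return {c: s / z for c, s in scores.items()}

print(posterior({"eyebrow_raised": True, "mouth_open": True}))
```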
56. Bayesian Networks Unit : Probabilistic Graphical Models p. 56
Temporal Causality
Represented by Bayesian Networks
• Temporal causality
– In many systems, data arrives sequentially
– Causality must be dealt with over time
• Dynamic Bayes nets (DBNs) can be used
to model such time-series (sequence)
data
• Special cases of DBNs include
– State-space models (Kalman filter)
– Hidden Markov models (HMMs)
57. Bayesian Networks Unit : Probabilistic Graphical Models p. 57
State Space Models (SSM)
• Hidden Markov Model
• Kalman Filter
[Figure: chain X1 → X2 → X3 → … → XT, with an observation Yt attached to each Xt, for t = 1, 2, 3, …]
$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(x_i))$$
$$P(X_1, \ldots, X_T, Y_1, \ldots, Y_T) = P(X_{1:T}, Y_{1:T}) = P(X_1)P(Y_1 \mid X_1)\,P(X_2 \mid X_1)P(Y_2 \mid X_2) \cdots P(X_T \mid X_{T-1})P(Y_T \mid X_T) = \prod_{i=1}^{T} P(X_i \mid X_{i-1})\,P(Y_i \mid X_i), \text{ where } P(X_1 \mid X_0) \equiv P(X_1)$$
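Here is a minimal sketch of this factorization for a discrete HMM; the two-state transition and emission matrices are illustrative assumptions, not values from the slides.

```python
import numpy as np

# Joint probability P(x_{1:T}, y_{1:T}) under the SSM factorization above.
pi = np.array([0.6, 0.4])            # P(X_1)
A  = np.array([[0.7, 0.3],           # A[i, j] = P(X_t = j | X_{t-1} = i)
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],           # B[i, k] = P(Y_t = k | X_t = i)
               [0.3, 0.7]])

def joint_prob(states, obs):
    """P(X_{1:T} = states, Y_{1:T} = obs) = prod_t P(X_t|X_{t-1}) P(Y_t|X_t)."""
    p = pi[states[0]] * B[states[0], obs[0]]
    for t in range(1, len(states)):
        p *= A[states[t - 1], states[t]] * B[states[t], obs[t]]
    return p

print(joint_prob([0, 0, 1], [0, 1, 1]))
```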
58. Bayesian Networks Unit : Probabilistic Graphical Models p. 58
DBN (1/2)
More complex temporal models than the HMM & Kalman filter
[Figure: a two-slice template (slice 1 DAG, slice 2 DAG) that is repeated to unroll the network over t = 1, 2, 3, 4, 5]
59. Bayesian Networks Unit : Probabilistic Graphical Models p. 59
DBN (2/2)
[Figure: a DBN unrolled over t = 1, 2, 3, 4, 5]
$$P(x_1, x_2, \ldots, x_n) = \prod_{i=1}^{n} P(x_i \mid pa(x_i))$$
60. Bayesian Networks Unit : Probabilistic Graphical Models p. 60
Bayesian SSM
61. Bayesian Networks Unit : Probabilistic Graphical Models p. 61
Factorial SSM
• Multiple hidden sequences
• Avoid exponentially large hidden space
62. Bayesian Networks Unit : Probabilistic Graphical Models p. 62
Example: Markov Random Field
• Typical application: image region labelling
[Figure: a grid MRF with hidden labels y_i and observed pixels x_i]
63. Bayesian Networks Unit : Probabilistic Graphical Models p. 63
Example: Conditional Random Field
[Figure: a conditional random field with label nodes y whose potentials are conditioned on the observations x_i]
64. Bayesian Networks Unit : Probabilistic Graphical Models p. 64
Markov Random Fields (1/2)
Undirected graph
65. Bayesian Networks Unit : Probabilistic Graphical Models p. 65
MRF (2/2)
[Figure: a pairwise MRF with hidden labels y and observed image x; parameters are tied across the grid]
• Compatibility with neighbors
• Local evidence (compatibility with the image)
66. Bayesian Networks Unit : Probabilistic Graphical Models p. 66
3. Conditional Independencies
• A Bayesian network / probabilistic graphical
model G represents a set of Markov
independencies of a distribution P
• There is a factorization theorem
$$P(X_1, \ldots, X_n) = \prod_i P(X_i \mid Pa_i)$$
• This section inspects deeper meanings of
conditional independence for
– The factorization theorem
– Inference algorithms in later units
67. Bayesian Networks Unit : Probabilistic Graphical Models p. 67
Conditional Independence
• Dependencies
– Two connected nodes
influence each other
• Independent
– Example: I(B;E)
• Conditional Independent
– Example
• I(J;M|A)?
• I(B;E|A)?
– d-separation
68. Bayesian Networks Unit : Probabilistic Graphical Models p. 68
D-Separation
• It is a rule describing the influences
between nodes
69. Bayesian Networks Unit : Probabilistic Graphical Models p. 69
Serial (Intermediate Cause)
[Chain: B → A → M]
• Indirect causal effect, no evidence
• Clearly a burglary will affect whether Mary calls
• The same situation holds for an indirect evidential
effect, because independence is symmetric
• If I(B;M|A) then I(M;B|A)
70. Bayesian Networks Unit : Probabilistic Graphical Models p. 70
Diverging (Common Cause)
[Figure: J ← A → M]
• Influence can flow from John's call to
Mary's call if we don't know whether
or not there is an alarm
• But I(J;M|A)
71. Bayesian Networks Unit : Probabilistic Graphical Models p. 71
Converging (Common Effect)
[Figure: E → A ← B]
• Influence can't flow from Earthquake to
Burglary if we don't know whether or not
there is an alarm
• So I(E;B)
• A special structure that causes independence:
the v-structure
72. Bayesian Networks Unit : Probabilistic Graphical Models p. 72
Independence of Two Events
73. Bayesian Networks Unit : Probabilistic Graphical Models p. 73
D-Separation for 3 Nodes
74. Bayesian Networks Unit : Probabilistic Graphical Models p. 74
Path Blockage (1/3)
• Three cases:
– Common cause
– Intermediate cause
– Common effect
[Figure: the common-cause case R ← E → A; the path is blocked when E is given, unblocked (active) otherwise]
75. Bayesian Networks Unit : Probabilistic Graphical Models p. 75
Path Blockage (2/3)
• Three cases:
– Common cause
– Intermediate cause
– Common effect
[Figure: the intermediate-cause case E → A → C; the path is blocked when A is given, unblocked (active) otherwise]
76. Bayesian Networks Unit : Probabilistic Graphical Models p. 76
Path Blockage (3/3)
Three cases:
– Common cause
– Intermediate cause
– Common effect
[Figure: the common-effect case E → A ← B, with C a descendant of A; the path is blocked when neither A nor any of its descendants is given, unblocked (active) when A or a descendant such as C is given]
77. Bayesian Networks Unit : Probabilistic Graphical Models p. 77
General Case
78. Bayesian Networks Unit : Probabilistic Graphical Models p. 78
D-Separation in General
• X is d-separated from Y, given Z,
– if all paths from a node in X to a node in Y
are blocked, given Z
• Checking d-separation can be done efficiently
(linear time in the number of edges)
– Bottom-up phase:
mark all nodes whose descendants are in Z
– X-to-Y phase:
traverse (BFS) all edges on paths from X
to Y and check whether they are blocked
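Here is a sketch of such a check in the spirit of the two-phase algorithm above ("Bayes-ball" style reachability); the graph encoding and function names are mine, and the example network is this unit's Earthquake/Burglary/Radio/Alarm/Call DAG.

```python
from collections import deque

# d-separation via reachability: phase 1 marks Z and its ancestors, phase 2
# does a BFS over (node, direction) states, where "up" means the node was
# entered from a child and "down" means it was entered from a parent.
graph = {"E": ["R", "A"], "B": ["A"], "R": [], "A": ["C"], "C": []}

def parents(g, n):
    return [p for p in g if n in g[p]]

def d_separated(g, x, y, z):
    """Return True if x and y are d-separated given the set of nodes z."""
    # Phase 1 (bottom-up): mark z and all ancestors of z.
    anc, stack = set(), list(z)
    while stack:
        n = stack.pop()
        if n not in anc:
            anc.add(n)
            stack.extend(parents(g, n))
    # Phase 2: traverse active trails starting from x.
    visited, frontier = set(), deque([(x, "up")])
    while frontier:
        n, d = frontier.popleft()
        if (n, d) in visited:
            continue
        visited.add((n, d))
        if n == y:
            return False                 # reached y along an active trail
        if d == "up" and n not in z:
            for p in parents(g, n):      # trail continues toward parents
                frontier.append((p, "up"))
            for c in g[n]:               # common cause: branch to children
                frontier.append((c, "down"))
        elif d == "down":
            if n not in z:
                for c in g[n]:           # serial chain passes through
                    frontier.append((c, "down"))
            if n in anc:                 # collider active: n or one of its
                for p in parents(g, n):  # descendants is observed
                    frontier.append((p, "up"))
    return True

print(d_separated(graph, "R", "B", set()))       # True
print(d_separated(graph, "R", "B", {"A"}))       # False
print(d_separated(graph, "R", "B", {"E", "A"}))  # True
```

On this network the three calls reproduce the answers worked out in Example 1 below.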
79. Bayesian Networks Unit : Probabilistic Graphical Models p. 79
Paths (1/2)
• Intuition: dependency must "flow" along
paths in the graph
• A path is a sequence of neighboring variables
[Network: Earthquake → Radio, Earthquake → Alarm ← Burglary, Alarm → Call]
Examples:
• R ← E → A ← B
• C ← A ← E → R
80. Bayesian Networks Unit : Probabilistic Graphical Models p. 80
Paths (2/2)
• For a path between two end nodes X and Y,
the path is either
– an active path:
• if we can find dependency between X & Y
– or a blocked path:
• if we cannot find dependency between X & Y
• X & Y are then conditionally independent
• X & Y are d-separated
• We want to classify the situations in which
paths are active
81. Bayesian Networks Unit : Probabilistic Graphical Models p. 81
D-Separation Example 1 (1/3)
[Network: E → R, E → A ← B, A → C]
– d-sep(R,B)?
82. Bayesian Networks Unit : Probabilistic Graphical Models p. 82
D-Separation Example 1 (2/3)
– d-sep(R,B) = yes
– d-sep(R,B|A)?
[Network: E → R, E → A ← B, A → C]
83. Bayesian Networks Unit : Probabilistic Graphical Models p. 83
D-Separation Example 1 (3/3)
– d-sep(R,B) = yes
– d-sep(R,B|A) = no
– d-sep(R,B|E,A)?
[Network: E → R, E → A ← B, A → C]
84. Bayesian Networks Unit : Probabilistic Graphical Models p. 84
D-Separation Example 2
85. Bayesian Networks Unit : Probabilistic Graphical Models p. 85
D-Separation Example 3
86. Bayesian Networks Unit : Probabilistic Graphical Models p. 86
d-separation: Car Start Problem
• 1. ‘Start’ and ‘Fuel’ are dependent on each other.
• 2. ‘Start’ and ‘Clean Spark Plugs’ are dependent on each other.
• 3. ‘Fuel’ and ‘Fuel Meter Standing’ are dependent on each other.
• 4. ‘Fuel’ and ‘Clean Spark Plugs’ are conditionally dependent on
each other given the value of ‘Start’.
• 5. ‘Fuel Meter Standing’ and ‘Start’ are conditionally
independent given the value of ‘Fuel’.
87. Bayesian Networks Unit : Probabilistic Graphical Models p. 87
Exercises
[Figures: a face-tracking temporal model with real locations x_{t-1}, x_t, x_{t+1} and observed locations z_{t-1}, z_t, transition P(x_t|x_{t-1}) and observation P(z_{t-1}|x_{t-1}); and a facial-expression model with Eyebrow Motion and Mouth Motion features]
88. Bayesian Networks Unit : Probabilistic Graphical Models p. 88
4. Inference
• 4.1 What Is Inference
• 4.2 How Inference
• 4.3 Inference Methods
89. Bayesian Networks Unit : Probabilistic Graphical Models p. 89
4.1 What Is Inference
90. Bayesian Networks Unit : Probabilistic Graphical Models p. 90
Exercises (1/2)
• Face detection; facial expression recognition
[Figure: two naïve Bayes models, a Face object node with Skin Color and Eye Pattern features, and a Face Expression node with Eyebrow Motion and Mouth Motion features]
91. Bayesian Networks Unit : Probabilistic Graphical Models p. 91
Exercises (2/2)
• Face tracking
[Figure: a temporal model with real locations x_{t-1}, x_t, x_{t+1} and observed locations z_{t-1}, z_t; transition P(x_t|x_{t-1}), observation P(z_{t-1}|x_{t-1})]
Real position: x_t; detected position: z_t, with observation model P(z_t | x_t); predicted position: x̄_{t+1}
92. Bayesian Networks Unit : Probabilistic Graphical Models p. 92
3 Kinds of Variables in Inference
• Remember the general inference procedure
in the previous unit (the uncertainty inference unit)
• Let P(X|E=e) be the query
– X is the query variable
– E is the set of evidence variables
– e is the observed values of E
– H is the remaining unobserved variables
(hidden variables)
[Figure: the Asia network with nodes V, S, T, L, A, B, X, D]
93. Bayesian Networks Unit : Probabilistic Graphical Models p. 93
The Burglary Example
Query: P(Burglary | JohnCalls = true)
• Query variable X: Burglary
• Evidence variables E=e: JohnCalls = true
• Hidden variables H: Earthquake, Alarm, MaryCalls
[Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls, Alarm → MaryCalls]
94. Bayesian Networks Unit : Probabilistic Graphical Models p. 94
The Asia Example
• Query P(L | v, s, d)
– Query variable: L
– Evidence variables: V=true, S=true, D=true
– Hidden variables: T, X, A, B
[Figure: the Asia network with nodes V, S, T, L, A, B, X, D]
95. Bayesian Networks Unit : Probabilistic Graphical Models p. 95
arg max P(X|e)
• For P(X | e), if X is a Boolean variable,
P(X | e) comprises 2 probabilities, e.g.
P(X=true | e) = 0.8
P(X=false | e) = 0.2
• arg max_x P(X=x|e) yields a decision:
max{P(X=true | e) = 0.8, P(X=false | e) = 0.2} → X = true
96. Bayesian Networks Unit : Probabilistic Graphical Models p. 96
Five Types of Queries in Inference
• For a probabilistic graphical model G
• Given a set of evidence E=e
• Query the PGM with
– P(e): likelihood query
– arg max P(e):
maximum likelihood query
– P(X|e): posterior belief query
– arg max_x P(X=x|e): (single query variable)
maximum a posteriori (MAP) query
– arg max_{x_1,…,x_t} P(X_1=x_1, …, X_t=x_t | e):
most probable explanation (MPE) query,
also called Viterbi decoding
97. Bayesian Networks Unit : Probabilistic Graphical Models p. 97
Likelihood Query P(e) (1/2)
Probability of evidence
[Figure: frames e_1, e_2, …, e_t of an input video feed an HMM for "Surprise", with hidden states X_1, X_2, …, X_t and observations E_1, E_2, …, E_t; the query is P(E_{1:t} = e_{1:t})]
98. Bayesian Networks Unit : Probabilistic Graphical Models p. 98
Likelihood Query P(e) (2/2)
• Marginalization of all hidden variables
$$P(e) = \sum_{h \in H} P(E = e, H = h)$$
$$P(E_{1:t} = e_{1:t}) = \sum_{X_1} \cdots \sum_{X_t} P(E_{1:t} = e_{1:t}, X_1, \ldots, X_t) = \sum_{X_1} \cdots \sum_{X_t} \prod_{i=1}^{t} P(X_i \mid X_{i-1})\,P(E_i \mid X_i), \text{ where } P(X_1 \mid X_0) \equiv P(X_1)$$
[Figure: HMM chain X_1 → X_2 → … → X_t with observations E_1, E_2, …, E_t]
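Rather than enumerating all state sequences, this sum can be computed incrementally with the forward recursion. A minimal sketch for a discrete HMM follows; the two-state matrices are illustrative assumptions.

```python
import numpy as np

# Likelihood query P(e_{1:t}) via the forward recursion, which performs the
# marginalization above in O(t * |X|^2) instead of enumerating |X|^t sequences.
pi = np.array([0.6, 0.4])           # P(X_1)
A  = np.array([[0.7, 0.3],          # A[i, j] = P(X_t = j | X_{t-1} = i)
               [0.2, 0.8]])
B  = np.array([[0.9, 0.1],          # B[i, k] = P(E_t = k | X_t = i)
               [0.3, 0.7]])

def likelihood(obs):
    """Return P(E_{1:t} = obs) by summing out X_1..X_t incrementally."""
    alpha = pi * B[:, obs[0]]           # alpha_1(x) = P(X_1 = x, e_1)
    for e in obs[1:]:
        alpha = (alpha @ A) * B[:, e]   # alpha_t(x) = P(X_t = x, e_{1:t})
    return alpha.sum()                  # P(e_{1:t}) = sum_x alpha_t(x)

print(likelihood([0, 1, 1]))
```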
99. Bayesian Networks Unit : Probabilistic Graphical Models p. 99
Maximum Likelihood Query
arg max P(e)
[Figure: the input video e_{1:t} is evaluated by two HMMs, a "Surprise" HMM with parameters P_S(X_t|X_{t-1}), P_S(E_i|X_i) and a "Cry" HMM with parameters P_C(X_t|X_{t-1}), P_C(E_i|X_i); the class with the larger likelihood, max{P_Surprise(e_{1:t}), P_Cry(e_{1:t})}, is selected]
100. Bayesian Networks Unit : Probabilistic Graphical Models p. 100
Maximum Likelihood Query
arg max P(e)
• Likelihood query P(E=e)
– Step 1 (Bayes theorem): P(E=e)
– Step 2 (marginalization of all hidden variables):
$$P(E=e) = \sum_{h \in H} P(E=e, H=h)$$
• Query arg max P(E=e)
– Steps 1 and 2 as above, then
$$\arg\max \sum_{h \in H} P(E=e, H=h)$$
101. Bayesian Networks Unit : Probabilistic Graphical Models p. 101
Posterior Belief Query P(X|e)
• Usually applied to tracking
– using temporal PGM models
• Query types
– Filtering: P(X_t | E_1=e_1, …, E_t=e_t) = P(X_t | e_{1:t})
– Prediction: P(X_{t+1} | e_{1:t})
– Smoothing: P(X_{t-k} | e_{1:t})
(fixed-lag smoothing)
[Figure: HMM chain X_1, X_2, …, X_t, X_{t+1} with observations E_1, E_2, …, E_t]
102. Bayesian Networks Unit : Probabilistic Graphical Models p. 102
P(X|e) – Filtering (1/2)
• P(X_t | e_{1:t})
[Figure: HMM chain X_1, X_2, …, X_t with observations E_1, E_2, …, E_t; real position x_i, detected position e_i with observation model P(z_t | x_t), filtered position x'_t]
103. Bayesian Networks Unit : Probabilistic Graphical Models p. 103
P(X|e) – Filtering (2/2)
• Inference of the query P(X_t | e_{1:t}):
Step 1 (Bayes theorem):
$$P(X_t \mid e_{1:t}) = \frac{P(X_t, e_{1:t})}{P(e_{1:t})} \propto P(X_t, e_{1:t})$$
Step 2 (marginalization of all hidden variables):
$$P(X_t, e_{1:t}) = \sum_{X_1} \cdots \sum_{X_{t-1}} P(X_t, e_{1:t}, X_1, \ldots, X_{t-1})$$
Step 3 (chaining by conditional independence):
$$= \sum_{X_1 \cdots X_{t-1}} \prod_{i=1}^{t} P(X_i \mid X_{i-1})\,P(e_i \mid X_i)$$
104. Bayesian Networks Unit : Probabilistic Graphical Models p. 104
P(X|e) – Prediction (1/2)
• P(X_{t+k} | e_{1:t}) for k > 0; for k = 1:
[Figure: chain X_1, X_2, …, X_t, X_{t+1} with observations E_1, …, E_t; real position x_i, detected position e_i, predicted position x'_{t+1}]
105. Bayesian Networks Unit : Probabilistic Graphical Models p. 105
P(X|e) – Prediction (2/2)
• Inference of the query P(X_{t+1} | e_{1:t}):
Step 1 (Bayes theorem):
$$P(X_{t+1} \mid e_{1:t}) = \frac{P(X_{t+1}, e_{1:t})}{P(e_{1:t})} \propto P(X_{t+1}, e_{1:t})$$
Step 2 (marginalization of all hidden variables):
$$P(X_{t+1}, e_{1:t}) = \sum_{X_1} \cdots \sum_{X_t} P(X_{t+1}, e_{1:t}, X_1, \ldots, X_t)$$
Step 3 (chaining by conditional independence):
$$= \sum_{X_1 \cdots X_t} P(X_{t+1} \mid X_t) \prod_{i=1}^{t} P(X_i \mid X_{i-1})\,P(e_i \mid X_i)$$
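A minimal sketch of one-step (or k-step) prediction: filter up to time t with the forward recursion, then push the belief through the transition model. The HMM matrices are illustrative assumptions.

```python
import numpy as np

# Prediction P(X_{t+k} | e_{1:t}) from the filtered belief at time t.
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X_t = j | X_{t-1} = i)
B  = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(E_t = k | X_t = i)

def filter_belief(obs):
    """P(X_t | e_{1:t}) as a normalized forward message."""
    alpha = pi * B[:, obs[0]]
    for e in obs[1:]:
        alpha = (alpha @ A) * B[:, e]
    return alpha / alpha.sum()

def predict(obs, k=1):
    """P(X_{t+k} | e_{1:t}): apply the transition model k times."""
    belief = filter_belief(obs)
    for _ in range(k):
        belief = belief @ A               # sum_x P(X'|x) P(x | e_{1:t})
    return belief

print(predict([0, 1, 1], k=1))
```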
106. Bayesian Networks Unit : Probabilistic Graphical Models p. 106
P(X|e) – Smoothing (1/3)
• P(X_k | e_{1:t}) for 1 ≤ k < t
[Figure: chain X_1, X_2, …, X_k, …, X_t with observations E_1, E_2, …, E_k, …, E_t; real position x_t, detected position z_t, smoothed position x̂_t]
107. Bayesian Networks Unit : Probabilistic Graphical Models p. 107
P(X|e) – Smoothing (2/3)
• Inference of the query P(X_k | e_{1:t}):
Step 1 (Bayes theorem):
$$P(X_k \mid e_{1:t}) = \frac{P(X_k, e_{1:t})}{P(e_{1:t})} \propto P(X_k, e_{1:t})$$
Step 2 (marginalization of all hidden variables):
$$P(X_k, e_{1:t}) = \sum_{X_1 \cdots X_{k-1},\, X_{k+1} \cdots X_t} P(X_k, e_{1:t}, X_1, \ldots, X_{k-1}, X_{k+1}, \ldots, X_t)$$
Step 3 (chaining by conditional independence):
$$= \sum_{X_1 \cdots X_{k-1},\, X_{k+1} \cdots X_t} \prod_{i=1}^{t} P(X_i \mid X_{i-1})\,P(e_i \mid X_i)$$
108. Bayesian Networks Unit : Probabilistic Graphical Models p. 108
P(X|e) – Smoothing (3/3)
• Fixed-lag smoothing
109. Bayesian Networks Unit : Probabilistic Graphical Models p. 109
MAP Query (1/2)
• arg max_x P(X=x|e)
• Usually applied to classification
– Find the most likely class X=x,
given the evidence e (the features)
[Figure: a facial-expression model with Eyebrow Motion and Mouth Motion features; X = {Surprise, Smile, …}. If P(X=Smile|e) is the maximum among {P(X=Surprise|e), P(X=Smile|e), …}, then Smile = arg max_x P(X=x|e)]
110. Bayesian Networks Unit : Probabilistic Graphical Models p. 110
MAP Query (2/2)
• MAP query arg max_x P(X=x|E=e)
Step 1 (Bayes theorem):
$$\arg\max_x P(X{=}x \mid e) = \arg\max_x \frac{P(X{=}x, e)}{P(e)} = \arg\max_x P(X{=}x, e)$$
Step 2 (marginalization of all hidden variables):
$$= \arg\max_x \sum_{h \in H} P(X{=}x, e, H{=}h)$$
111. Bayesian Networks Unit : Probabilistic Graphical Models p. 111
MPE Query
• Also called Viterbi decoding
• arg max_{x_{1:t}} P(X_1=x_1, …, X_t=x_t | e_{1:t})
• = arg max_{x_{1:t}} P(X_{1:t} | e_{1:t})
• = smoothing for X_{1:t-1} + filtering for X_t
[Figure: HMM chain X_1, X_2, …, X_t with observations E_1, E_2, …, E_t]
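A minimal sketch of MPE/Viterbi decoding for a discrete HMM: the sums of the forward recursion are replaced by maximizations, and back-pointers recover the best sequence. The matrices are illustrative assumptions.

```python
import numpy as np

# Viterbi decoding: arg max over state sequences of P(x_{1:t}, e_{1:t}).
pi = np.array([0.6, 0.4])
A  = np.array([[0.7, 0.3], [0.2, 0.8]])   # P(X_t = j | X_{t-1} = i)
B  = np.array([[0.9, 0.1], [0.3, 0.7]])   # P(E_t = k | X_t = i)

def viterbi(obs):
    """Return the most probable state sequence and its joint probability
    P(x_{1:t}, e_{1:t}); the arg max equals that of P(x_{1:t} | e_{1:t})."""
    delta = pi * B[:, obs[0]]                 # best score ending in each state
    back = []
    for e in obs[1:]:
        scores = delta[:, None] * A           # scores[i, j]: via i into j
        back.append(scores.argmax(axis=0))    # best predecessor of each j
        delta = scores.max(axis=0) * B[:, e]
    path = [int(delta.argmax())]
    for bp in reversed(back):                 # follow back-pointers
        path.append(int(bp[path[-1]]))
    return path[::-1], float(delta.max())

print(viterbi([0, 1, 1]))
```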
112. Bayesian Networks Unit : Probabilistic Graphical Models p. 112
Exercises
• Face Detection
• Facial Expression Recognition
• Face Tracking
• Body Segmentation
X = {Surprise, Smile, …}
[Figures: the facial-expression model with Eyebrow Motion and Mouth Motion features, and the face-tracking temporal model with real locations x_{t-1}, x_t, x_{t+1}, observed locations z_{t-1}, z_t, transition P(x_t|x_{t-1}) and observation P(z_{t-1}|x_{t-1})]
113. Bayesian Networks Unit : Probabilistic Graphical Models p. 113
4.2 How Inference
• Inference of the query P(X | E=e):
Step 1 (Bayes theorem):
$$P(X \mid E{=}e) = \frac{P(X, E{=}e)}{P(E{=}e)} \propto P(X, E{=}e)$$
Step 2 (marginalization of all hidden variables):
$$P(X, E{=}e) = \sum_{h \in H} P(X, E{=}e, H{=}h)$$
Step 3 (chaining by conditional independence):
$$= \sum_{h \in H} \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
114. Bayesian Networks Unit : Probabilistic Graphical Models p. 114
The 4th Step of Inference
Steps 1-3 give
$$P(X \mid E{=}e) \propto \sum_{h \in H} \prod_{i=1}^{n} P(X_i \mid Pa(X_i))$$
• Step 4: compute the sum-product
– This needs an efficient algorithm
– First, we explain the computation of
the sum-product by an enumeration algorithm
• easy but not efficient
– Then, more efficient methods are
explained in the next two units
115. Bayesian Networks Unit : Probabilistic Graphical Models p. 115
The Burglary Example (1/3)
• A posterior query on the burglary
network
– P(B | j, m)
– = P(B, j, m) / P(j, m)
– ∝ P(B, j, m)
– = Σ_e Σ_a P(B, e, a, j, m)
• E and A are hidden variables
• This would use the full joint distribution table
116. Bayesian Networks Unit : Probabilistic Graphical Models p. 116
The Burglary Example (2/3)
• Rewrite the full joint entries using
products of CPT entries
– P(B | j, m)
– ∝ Σ_E Σ_A P(B, E, A, j, m)
– = Σ_E Σ_A P(j, m, A, B, E)
– = Σ_E Σ_A P(j | m, A, B, E) P(m | A, B, E) P(A | B, E) P(B | E) P(E)   (chain rule)
– = Σ_e Σ_a P(B) P(e) P(a | B, e) P(j | a) P(m | a)   (conditional independence)
– (all probabilities are CPT entries)
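As a sketch, the following enumeration carries out exactly this computation with the CPT values given earlier in this unit (note that the slide uses P(A | b, ¬e) = 0.95).

```python
from itertools import product

# P(B=t | j, m) by summing the CPT product over the hidden E and A.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.95,
       (False, True): 0.29, (False, False): 0.001}   # P(A=t | B, E)
P_J = {True: 0.90, False: 0.05}                      # P(J=t | A)
P_M = {True: 0.70, False: 0.01}                      # P(M=t | A)

def bern(p, v):
    return p if v else 1.0 - p

def posterior_burglary():
    """P(B=t | j, m): enumerate e, a; normalize over both values of B."""
    score = {}
    for b in (True, False):
        score[b] = sum(bern(P_B, b) * bern(P_E, e) * bern(P_A[(b, e)], a)
                       * P_J[a] * P_M[a]
                       for e, a in product((True, False), repeat=2))
    z = score[True] + score[False]        # normalization: P(j, m)
    return score[True] / z

print(posterior_burglary())   # ~ 0.286 with these CPT values
```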