This document provides an overview of Chapter 14 on probabilistic reasoning and Bayesian networks from an artificial intelligence textbook. It introduces Bayesian networks as a way to represent knowledge over uncertain domains using directed graphs. Each node corresponds to a variable and arrows represent conditional dependencies between variables. The document explains how Bayesian networks can encode a joint probability distribution and represent conditional independence relationships. It also discusses techniques for efficiently representing conditional distributions in Bayesian networks, including noisy logical relationships and continuous variables. The chapter covers exact and approximate inference methods for Bayesian networks.
2. Introduction
•Chapter 13
othe basic elements of probability theory
othe importance of independence and conditional
independence relationships
•This Chapter
oBayesian networks
systematic way to represent such relationships explicitly
3. Agenda
•14.1 Representing Knowledge in an Uncertain Domain
•14.2 The Semantics of Bayesian Networks
•14.3 Efficient Representation of Conditional
Distributions
•14.4 Exact Inference in Bayesian Networks
•14.5 Approximate Inference in Bayesian Networks
•14.6 Relational and First-Order Probability Models
•14.7 Other Approaches to Uncertain Reasoning
4. 14.1 Representing Knowledge in an Uncertain Domain
•Bayesian Networks
oA directed graph in which each node is annotated
with quantitative probability information
oDefinition
1. Each node corresponds to a random variable, which
may be discrete or continuous
2. A set of directed links or arrows connects pairs of
nodes. ( If there is an arrow from node X to node Y , X is
said to be a parent of Y. )
3. The graph has no directed cycle.
4. Each node Xi has a conditional probability distribution
P(Xi|Parents(Xi)) that quantifies the effect of the parents
on the node.
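To make the definition concrete, here is a minimal Python sketch (our own illustration, not from the chapter) of a Bayesian network as a data structure: each node stores its parents and a CPT mapping parent-value tuples to P(node = true).

class BayesNet:
    def __init__(self):
        self.parents = {}  # node name -> list of parent names
        self.cpt = {}      # node name -> {tuple of parent values: P(node = true)}

    def add_node(self, name, parents, cpt):
        # Parents must already exist, which forces a topological
        # insertion order and therefore guarantees no directed cycles.
        assert all(p in self.parents for p in parents)
        self.parents[name] = parents
        self.cpt[name] = cpt

    def prob(self, name, value, assignment):
        # P(name = value | the values of its parents in `assignment`)
        key = tuple(assignment[p] for p in self.parents[name])
        p_true = self.cpt[name][key]
        return p_true if value else 1.0 - p_true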
5. Simple Example of Bayesian Networks
•The variables Toothache , Cavity, Catch, and
Weather
oWeather is independent of the other variables
oToothache and Catch are conditionally
independent, given Cavity
Cavity is a direct cause of
Toothache and Catch
no direct causal relationship
exists between Toothache
and Catch.
6. Complex Example of Bayesian Networks(1/4)
•The variables Burglary, Earthquake, Alarm,
MaryCalls and JohnCalls
oNew burglar alarm installed at home
oFairly reliable at detecting a burglary
oResponds on occasion to minor earthquakes
oTwo neighbors, John and Mary
oThey call you at work when they hear the alarm
oJohn nearly always calls when he hears the alarm,
but sometimes confuses the telephone ringing with
the alarm
oMary likes rather loud music and sometimes
misses the alarm altogether
Given the evidence of who has or has not called,
estimate the probability of a burglary
7. Complex Example of Bayesian Networks(2/4)
Burglary and Earthquakes
directly affect the probability
of the alarm’s going off
Whether John and Mary call depends
only on the alarm
The network represents our assumptions that they do not perceive
burglaries directly, they do not notice minor earthquakes, and they
do not confer before calling
8. Complex Example of Bayesian Networks(3/4)
•Conditional Probability Table(CPT)
oEach row contains the conditional probability of each node value
oConditioning case is a combination of values for the parent nodes
oEach row must sum to 1
oThe entries represent an exhaustive set of cases for the variable
oFor Boolean variables, if the probability of a true
value is p, the probability of false must be 1 − p
oA Boolean variable with k Boolean parents needs 2^k
specifiable probabilities
oA node with no parents has only one row, representing the prior
probabilities of each possible value of the variable
10. 14.2 The Semantics of Bayesian Networks
•The two ways to understand the meaning of
Bayesian Networks
oTo see the network as a representation of the joint
probability distribution
To be helpful in understanding how to construct networks,
oTo view it as an encoding of a collection of
conditional independence statements
To be helpful in designing inference procedures
11. Representing the full joint distribution
•Full joint distribution
P(x1, …, xn) = ∏_{i=1}^{n} P(xi | parents(Xi))
•Ex.
oThe alarm has sounded(a), but neither a burglary(b)
nor an earthquake has occurred(e), and both John(j)
and Mary(m) call
P(j,m,a,¬b,¬e)
= P(j|parents(j)) P(m|parents(m)) P(a|parents(a)) P(¬b|parents(¬b)) P(¬e|parents(¬e))
= P(j|a) P(m|a) P(a|¬b∧¬e) P(¬b) P(¬e)
= 0.90 × 0.70 × 0.001 × 0.999 × 0.998 = 0.000628
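This product can be checked numerically. The small sketch below uses the standard textbook CPT values for the burglary network, listed explicitly so they can be treated as assumptions:

P_b = 0.001                      # P(Burglary)
P_e = 0.002                      # P(Earthquake)
P_a = {(True, True): 0.95,       # P(Alarm = true | Burglary, Earthquake)
       (True, False): 0.94,
       (False, True): 0.29,
       (False, False): 0.001}
P_j = {True: 0.90, False: 0.05}  # P(JohnCalls = true | Alarm)
P_m = {True: 0.70, False: 0.01}  # P(MaryCalls = true | Alarm)

# P(j, m, a, ¬b, ¬e) = P(j|a) P(m|a) P(a|¬b,¬e) P(¬b) P(¬e)
p = P_j[True] * P_m[True] * P_a[(False, False)] * (1 - P_b) * (1 - P_e)
print(p)  # ≈ 0.000628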
12. A method for constructing Bayesian networks(1)
•How to construct a GOOD Bayesian network
•Full Joint Distribution
P(x1, …, xn) = ∏_{i=1}^{n} P(xi | xi−1, …, x1)    (chain rule)
P(x1, …, xn) = ∏_{i=1}^{n} P(xi | parents(Xi))
•Correct representation
oonly if each node is conditionally independent of
its other predecessors in the node ordering, given
its parents
The parents of node Xi should contain all those
nodes in X1,..,Xi−1 that directly influence Xi.
13. A method for constructing Bayesian networks(2)
•Ex
oSuppose we have completed the network in
Figure except for choices of parents for MaryCalls
MaryCalls is certainly influenced by whether there is
a Burglary or an Earthquake, but not directly influenced by them
Also, given the state of the alarm, whether John calls has
no influence on Mary’s calling
P(MaryCalls | JohnCalls, Alarm, Earthquake, Burglary)
= P(MaryCalls | Alarm)
14. Compactness and node ordering
•Bayesian network can often be far more
compact than the full joint distribution
•It may not be worth the additional complexity
in the network for the small gain in accuracy.
•The correct procedure for adding nodes is to add
the root causes first, then the variables they
influence, and so on
15. Compactness and node ordering
•We will get a compact Bayesian network only
if we choose the node ordering well
•What happens if we happen to choose the
wrong order?
oMaryCalls → JohnCalls → Alarm → Burglary → Earthquake
oMaryCalls → JohnCalls → Earthquake → Burglary → Alarm
oBurglary → Earthquake → Alarm → MaryCalls → JohnCalls
16. Conditional independence relations in Bayesian networks
•“Numerical” semantics
oFull Joint Distribution
•“Topological” semantics
oConditional independence relationships by the
graph structure
The “numerical” semantics and the
“topological” semantics are equivalent
17. Conditional independence relations in Bayesian networks
•The topological semantics specifies that each
variable is conditionally independent of its
non-descendants, given its parents
•Ex.
oJohnCalls is independent of Burglary, Earthquake,
and MaryCalls given the value of Alarm
18. Conditional independence relations in Bayesian networks
•A node is conditionally independent of all
other nodes in the network, given its parents,
children, and children’s parents(Markov
blanket)
•Ex.
•Burglary is independent of JohnCalls and
MaryCalls, given Alarm and Earthquake
19. Conditional independence relations in Bayesian networks
A node X is conditionally independent of its
non-descendants (e.g., the Zij) given its parents
(the Ui, shown in the gray area).
A node X is conditionally independent of all other
nodes in the network given its Markov blanket
(the gray area).
20. 14.3 Efficient Representation of Conditional Distributions
•CPTs cannot handle variables with a large number of
values, or continuous variables.
•Relationships between parents and children are
usually describable by some proper canonical
distribution.
•The simplest case: deterministic nodes.
oThe value is specified exactly by some function
ono nondeterminism (no uncertainty)
oEx. X = f(parents(X))
oCan be logical
NorthAmerica ↔ Canada ∨ US ∨ Mexico
oOr numerical
Water level = inflow + precipitation − outflow − evaporation
21. 14.3 Efficient Representation of Conditional Distributions
•Uncertain relationships can be characterized
by noisy logical relationships.
•Ex. noisy-OR relation.
oLogical OR with probability
oEx. Cold ∨ Flu ∨ Malaria → Fever
In the real world, catching a cold does not always
induce a fever: each cause produces the effect only
with some probability.
22. 14.3 Efficient Representation of Conditional Distributions
•Noisy-OR
oAll possible causes are listed (missing causes can be
covered by a leak node)
oEach CPT entry is computed from the per-cause
inhibition probabilities
23. 14.3 Efficient Representation of Conditional Distributions
•Suppose these individual inhibition
probabilities are as follows:
A variable that depends on k parents can be described
using O(k) parameters instead of O(2^k)
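A short Python sketch of the noisy-OR computation; the inhibition values 0.6, 0.2, and 0.1 are the standard textbook ones for Cold, Flu, and Malaria, used here as assumptions:

from itertools import product

q = {"Cold": 0.6, "Flu": 0.2, "Malaria": 0.1}  # P(¬fever | only that cause present)

def p_fever(active_causes):
    # Fever is absent only if every active cause is independently inhibited.
    inhibit = 1.0
    for cause in active_causes:
        inhibit *= q[cause]
    return 1.0 - inhibit

# Recover the full 2^k-row CPT from just k parameters:
for combo in product([False, True], repeat=3):
    active = [c for c, on in zip(q, combo) if on]
    print(combo, round(p_fever(active), 3))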
24. Bayesian nets with continuous variables
•Many real world problems involve continuous
quantities
oInfinite number of possible values
oImpossible to specify conditional probabilities
•Discretization
odividing up the possible values into a fixed set of
intervals
oIt often results in a considerable loss of accuracy
and very large CPTs
•The alternative: define standard families of probability
density functions (Gaussian, etc.)
25. Bayesian nets with continuous variables
•Hybrid Bayesian network
oHave both discrete and continuous variables
oTwo new kinds of distributions
Continuous variable given discrete or continuous parents
Discrete variable given continuous parents
•Example
Customer buys some fruit depending
on its cost which depends in turn on
the size of the harvest and whether
the government’s subsidy scheme is
operating.
(Figure: Subsidy [discrete] and Harvest [continuous] are parents of Cost [continuous]; Cost is the parent of Buys [discrete])
26. Hybrid Bayesian network
•P(Cost|Harvest , Subsidy )
oSubsidy(Discrete)
P(Cost|Harvest,subsidy) and P(Cost|Harvest,¬subsidy)
oHarvest(Continuous)
How the distribution over the cost c depends on the
continuous value h of Harvest
Specify the parameters of the cost distribution as a
function of h
27. Hybrid Bayesian network
•The linear Gaussian distribution
oMost common choice
oThe child has a Gaussian distribution whose mean μ
varies linearly with the value of the parent and whose
standard deviation σ is fixed
oTwo distributions, for subsidy and ¬subsidy, with
different parameters a_t, b_t, σ_t, a_f, b_f, and σ_f:
P(c | h, subsidy) = N(a_t·h + b_t, σ_t²)(c)
P(c | h, ¬subsidy) = N(a_f·h + b_f, σ_f²)(c)
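A minimal sketch of evaluating such a linear Gaussian density; the parameter values are illustrative assumptions only:

import math

def linear_gaussian(c, h, a, b, sigma):
    # Density of N(a*h + b, sigma^2) evaluated at cost c
    mu = a * h + b
    return math.exp(-0.5 * ((c - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

# One linear Gaussian per value of the discrete parent Subsidy:
p_cost_subsidy = linear_gaussian(5.0, 10.0, a=-0.5, b=10.0, sigma=0.5)
p_cost_no_subsidy = linear_gaussian(5.0, 10.0, a=-0.5, b=12.0, sigma=1.0)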
28. Hybrid Bayesian network
•A : P(Cost|Harvest,subsidy)
•B : P(Cost|Harvest,¬subsidy)
•C : P (c | h)
oaveraging over the two possible values of Subsidy
oassuming that each has prior probability 0.5
29. Other Distribution
•Distributions for discrete variables with
continuous parents
•Consider the Buys node
oCustomer will buy if the cost is low
oCustomer will not buy if it is high
othe probability of buying varies smoothly in between
•Probit Distribution
•Logit Distribution
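A rough sketch of both soft-threshold choices for P(buys | cost); the location μ and scale σ are assumed, illustrative values:

import math

def probit_buys(cost, mu=6.0, sigma=1.0):
    # Probit: standard normal CDF applied to (mu - cost) / sigma
    return 0.5 * (1 + math.erf((mu - cost) / (sigma * math.sqrt(2))))

def logit_buys(cost, mu=6.0, sigma=1.0):
    # Logit: a logistic (sigmoid) curve in the same location/scale role;
    # it has longer tails than the probit
    return 1 / (1 + math.exp(-2 * (mu - cost) / sigma))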
30. 14.4 Exact Inference in Bayesian Networks
•The task for probabilistic inference system
ogiven some observed event
some assignment of values to a set of evidence
variables
ocompute the posterior probability distribution for a
set of query variables
•Ex. (in the burglary network)
oobserve JohnCalls = true and MaryCalls = true
ocompute the probability that a burglary has occurred:
P(Burglary | JohnCalls = true, MaryCalls = true)
31. Inference by enumeration
•Chapter 13
oAny conditional probability can be computed by
summing terms from the full joint distribution
X denotes the query variable;
E denotes the set of evidence variables E1, . . . ,Em, and
e is a particular observed event;
Y denotes the nonevidence, nonquery variables Y1, . . . , Yl
(called the hidden variables)
The complete set of variables is {X} ∪ E ∪ Y
The posterior distribution is P(X | e) = α P(X, e) = α Σy P(X, e, y)
32. Inference by enumeration
•P(X | e) can be answered using a Bayesian
network by computing sums of products of
conditional probabilities from the network
•Ex
oConsider the query P(Burglary | JohnCalls
=true,MaryCalls =true)
oThe hidden variables for this query are
Earthquake and Alarm
33. Inference by enumeration
oFor simplicity, we do this just for Burglary = true:
P(b | j, m) = α Σe Σa P(b) P(e) P(a | b, e) P(j | a) P(m | a)
oEvaluating this naively costs O(n·2^n); reusing
factored terms brings it down to O(2^n)
othe P(b) term is a constant and can be moved
outside the summations over a and e
The chance of a burglary, given calls from both neighbors, is about 28%
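As a rough illustration (not the book's code), this query can be evaluated by enumeration with the BayesNet sketch from earlier; the CPT values are again the standard textbook ones, stated as assumptions:

def enumerate_all(variables, assignment, bn):
    # Multiply CPT entries along the variable ordering, summing out
    # any variable not fixed in `assignment` (the hidden variables).
    if not variables:
        return 1.0
    V, rest = variables[0], variables[1:]
    if V in assignment:
        return bn.prob(V, assignment[V], assignment) * enumerate_all(rest, assignment, bn)
    return sum(bn.prob(V, v, assignment) * enumerate_all(rest, {**assignment, V: v}, bn)
               for v in (True, False))

def enumeration_ask(X, e, bn, order):
    # P(X | e) by full enumeration, then normalization
    dist = {x: enumerate_all(order, {**e, X: x}, bn) for x in (True, False)}
    total = sum(dist.values())
    return {x: p / total for x, p in dist.items()}

bn = BayesNet()
bn.add_node("Burglary", [], {(): 0.001})
bn.add_node("Earthquake", [], {(): 0.002})
bn.add_node("Alarm", ["Burglary", "Earthquake"],
            {(True, True): 0.95, (True, False): 0.94,
             (False, True): 0.29, (False, False): 0.001})
bn.add_node("JohnCalls", ["Alarm"], {(True,): 0.90, (False,): 0.05})
bn.add_node("MaryCalls", ["Alarm"], {(True,): 0.70, (False,): 0.01})

order = ["Burglary", "Earthquake", "Alarm", "JohnCalls", "MaryCalls"]
print(enumeration_ask("Burglary", {"JohnCalls": True, "MaryCalls": True}, bn, order))
# ≈ {True: 0.284, False: 0.716}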
35. The variable elimination algorithm
•The enumeration algorithm can be improved by
eliminating repeated calculations
•The idea:
odo the calculation once and save the results for
later use (dynamic programming)
36. Clustering algorithms
•Not Clustering of ML
•Join tree algorithms
oCan greatly reduce inference time
oThe basic idea of clustering is to join individual
nodes of the network to form cluster nodes in such
a way that the resulting network is a polytree
37. 14.5 Approximate Inference in Bayesian Networks
•Exact inference is difficult in multiply connected
networks
•It is essential to consider approximate
inference methods
•Monte Carlo algorithms
oRandomized sampling algorithms
otwo families of algorithms: direct sampling and
Markov chain sampling
oApply to the computation of posterior probabilities
38. Direct sampling methods
•Primitive element is the generation of
samples from a known probability distribution
•The sampling process for Bayesian networks
generates complete events from the network
•Variables are sampled in topological order
•Each variable's distribution is conditioned on
the values already assigned to the variable's
parents
39. Direct sampling methods
•Ex. Assuming an ordering
[Cloudy, Sprinkler, Rain, WetGrass]
1.Sample from P(Cloudy)
P(Cloudy) = <0.5, 0.5>
Cloudy = True
40. Direct sampling methods
•Ex. Assuming an ordering
[Cloudy, Sprinkler, Rain, WetGrass]
2.Sample from P(Sprinkler|Cloudy=true)
P(S|C=true) = <0.1, 0.9>
Sprinkler = false
41. Direct sampling methods
•Ex. Assuming an ordering
[Cloudy, Sprinkler, Rain, WetGrass]
3.Sample from P(Rain|Cloudy=true)
P(R|C=true) = <0.8, 0.2>
Rain = true
42. Direct sampling methods
•Ex. Assuming an ordering
[Cloudy, Sprinkler, Rain, WetGrass]
4.Sample from P(W|S=false, R=true)
P(W|S=false, R=true) = <0.9, 0.1>
WetGrass = true
•In this case, the event
[Cloudy, Sprinkler, Rain, WetGrass]
= [true, false, true, true]
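A sketch of this direct-sampling process; the sprinkler-network CPT values used here are the standard textbook ones, stated as assumptions:

import random

def sample_sprinkler_net():
    # Sample each variable in topological order, conditioned on its parents.
    cloudy = random.random() < 0.5
    sprinkler = random.random() < (0.1 if cloudy else 0.5)
    rain = random.random() < (0.8 if cloudy else 0.2)
    p_wet = {(True, True): 0.99, (True, False): 0.90,
             (False, True): 0.90, (False, False): 0.0}[(sprinkler, rain)]
    wet = random.random() < p_wet
    return cloudy, sprinkler, rain, wet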
43. Rejection sampling in Bayesian networks
•Produces samples from a hard-to-sample distribution
using an easy-to-sample distribution
•1. It generates samples from the prior
distribution
•2. It rejects all those that do not match the
evidence
•3. The estimate P(X =x | e) is obtained by
counting how often X =x occurs in the
remaining samples
44. Rejection sampling in Bayesian networks
•Ex. Estimate P(Rain|Sprinkler=true) using
100 samples.
o27 samples have Sprinkler = true
The remaining 73 have Sprinkler = false → reject those 73
oOf the 27 remaining samples:
n(Rain=true) : n(Rain=false) = 8 : 19
oP̂(Rain|Sprinkler=true)
= <8/27, 19/27>
≈ <0.296, 0.704>
•Rejection sampling is consistent: the estimate
converges to the true posterior as the number of
samples grows
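A sketch of rejection sampling built on the direct sampler above; N prior samples are drawn and only those matching the evidence are kept:

def rejection_sample_rain(N=10_000):
    counts = {True: 0, False: 0}
    for _ in range(N):
        cloudy, sprinkler, rain, wet = sample_sprinkler_net()
        if sprinkler:                 # evidence: Sprinkler = true
            counts[rain] += 1
    kept = counts[True] + counts[False]
    return {r: c / kept for r, c in counts.items()}  # estimate of P(Rain | sprinkler)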
47. Inference by Markov chain simulation
•Markov chain
oA random process over a state space in which the
next state depends only on the current state, not on
the earlier history (the Markov property)
•Monte Carlo
oA class of randomized algorithms that compute their
results by random sampling
•Markov chain Monte Carlo(MCMC)
oA sampling algorithm that generates each event
(state) by randomly modifying the preceding one
48. Gibbs sampling in Bayesian networks
•A randomized sampling method based on
MCMC
•Suitable for Bayesian network
•Start from an arbitrary state, with the evidence
variables fixed at their observed values
•Repeatedly sample a value for one of the
nonevidence variables Xi
oThe sampling is done conditioned on the current
values of the variables in the Markov blanket of Xi
oMarkov blanket = parents, children, children’s
parents
49. Gibbs sampling in Bayesian networks
•Example
•P(Rain|Sprinkler = true, WetGrass = true)
oEvidence var = Sprinkler, WetGrass
oNonevidence var = Rain, Cloudy
o1.Arbitrarily initialize Cloudy and Rain (say,
Cloudy = true and Rain = false)
50. Gibbs sampling in Bayesian networks
•Ex. Sampling
P(Rain|Sprinkler = true, WetGrass = true)
oCurrent state
[Cloudy, Sprinkler, Rain, WetGrass] = [T,T,F,T]
o2.Sample Cloudy
Its Markov blanket consists of Sprinkler and Rain
Sample from P(Cloudy|Sprinkler=true, Rain=false)
Suppose we get false
Move to the next state with Cloudy changed:
[Cloudy, Sprinkler, Rain, WetGrass] = [F,T,F,T]
51. Gibbs sampling in Bayesian networks
•Ex. Sampling
P(Rain|Sprinkler = true, WetGrass = true)
oCurrent state
[Cloudy, Sprinkler, Rain, WetGrass] = [F,T,F,T]
o3.Sample Rain
Its Markov blanket consists of Cloudy, Sprinkler,
and WetGrass
Sample from
P(Rain|Cloudy=false, Sprinkler=true, WetGrass=true)
Suppose we get true
Move to the next state with Rain changed:
[Cloudy, Sprinkler, Rain, WetGrass] = [F,T,T,T]
52. Gibbs sampling in Bayesian networks
•Ex. Sampling
P(Rain|Sprinkler = true, WetGrass = true)
oCurrent state
o4. Repeat steps 2 and 3 until the desired number
of samples has been drawn
Suppose we draw 80 samples:
20 states have Rain = true, 60 have Rain = false
P(Rain|Sprinkler = true, WetGrass = true)
= α<20, 60>
= <0.25, 0.75>
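A sketch of this Gibbs sampler; the blanket conditionals below are derived by hand from the same assumed sprinkler CPTs used earlier:

import random

def p_cloudy_given_blanket(sprinkler, rain):
    # P(Cloudy = c | blanket) ∝ P(c) · P(sprinkler | c) · P(rain | c)
    w = {}
    for c in (True, False):
        p_s = 0.1 if c else 0.5
        p_r = 0.8 if c else 0.2
        w[c] = 0.5 * (p_s if sprinkler else 1 - p_s) * (p_r if rain else 1 - p_r)
    return w[True] / (w[True] + w[False])

def p_rain_given_blanket(cloudy, sprinkler, wet):
    # P(Rain = r | blanket) ∝ P(r | cloudy) · P(wet | sprinkler, r)
    p_w = {(True, True): 0.99, (True, False): 0.90,
           (False, True): 0.90, (False, False): 0.0}
    w = {}
    for r in (True, False):
        p_r = 0.8 if cloudy else 0.2
        pw = p_w[(sprinkler, r)]
        w[r] = (p_r if r else 1 - p_r) * (pw if wet else 1 - pw)
    return w[True] / (w[True] + w[False])

def gibbs_rain(n_samples=10_000):
    cloudy, rain = True, False             # arbitrary initial state
    count_rain = 0
    for _ in range(n_samples):
        cloudy = random.random() < p_cloudy_given_blanket(True, rain)
        rain = random.random() < p_rain_given_blanket(cloudy, True, True)
        count_rain += rain
    return count_rain / n_samples          # estimate of P(Rain = true | evidence)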
53. Gibbs sampling
•With a large number of samples, Gibbs sampling
converges: the chain reaches its stationary
distribution
oThe fraction of time spent in each state is
proportional to its posterior probability
•Main computational problems
oHard to tell whether the chain has converged
oIf the Markov blanket is large, each sampling step
can be computationally expensive
54. 14.6 Relational and First-order Probability models
•Bayesian networks are essentially propositional:
the set of random variables is fixed and finite
•However, if the number of variables becomes large,
inference becomes intractable
•We need another method to represent such models
55. 14.6 Relational and First-order Probability models
•The set of first-order models is infinite.
•Models that use database semantics instead are
called “relational probability models” (RPMs)
•They make the unique names assumption and
assume domain closure
•Like first-order logic
oConstant
oFunction
oPredicate symbols
•Assume type signature
56. 14.6 Relational and First-order Probability models
•Example
oAn online book retailer would like to provide
overall evaluations of products based on
recommendations received from its customers
oFor a single customer C1, recommending a single
book B1, the Bayes net might look like :
57. 14.6 Relational and First-order Probability models
•Example
oWith two customers and two books, the Bayes net
looks like
oFor larger numbers of books and customers, it
becomes completely impractical to specify the
network by hand
58. 14.6 Relational and First-order Probability models
•We would like to say something like
oA customer’s recommendation for a book depends
on the customer’s honesty and kindness and the
book’s quality
•This section develops a language that lets us
say exactly this, and a lot more besides
59. Relational probability model
•Ex. Book recommendation
A customer C recommends a book B by giving it a
score based on the book’s Quality, but the score may
vary with the customer’s Kindness and Honesty
•Type signature = Customer, Book
•Function and predicates
oHonest : Customer → {true, false}
oKindness : Customer → {1, 2, 3, 4, 5}
oQuality : Book → {1, 2, 3, 4, 5}
oRecommendation : Customer × Book → {1, 2, 3, 4, 5}
•Constants are whatever customer and book
names appear in the data.
oEx. “Harry Potter and the ……..” or “John”
60. Relational probability model
•Ex. Book recommendation(Cont.)
A customer C recommends a book B by giving it a
score based on the book’s Quality, but the score may
vary with the customer’s Kindness and Honesty
•Finally, assign dependencies that govern the
variables.
oHonest(c) ~ <0.99, 0.01>
oKindness(c) ~ <0.1, 0.1, 0.2, 0.3, 0.3>
oQuality(b) ~ <0.05, 0.2, 0.4, 0.2, 0.15>
oRecommendation(c, b) ~ RecCPT(Honest(c),
Kindness(c), Quality(b))
oRecCPT is a separately defined conditional distribution
with 2 × 5 × 5 = 50 rows, each with 5 entries (scores 1–5)
61. Relational probability model
•A dependency can be made to follow different rules
in different contexts
•This is called “context-specific independence”
•For example, dishonest customers ignore quality
when giving a recommendation
When a criterion is irrelevant in a given context, the
variable becomes independent of it:
Recommendation(c, b) is independent of Kindness(c) and
Quality(b) when Honest(c) = false
Recommendation(c, b) ~
if Honest(c) then
HonestRecCPT(Kindness(c), Quality(b))
else <0.4, 0.1, 0.0, 0.1, 0.4>
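A minimal sketch of this context-specific dependency; HonestRecCPT is only stubbed as a lookup table here, since the slides define it separately:

DISHONEST_DIST = [0.4, 0.1, 0.0, 0.1, 0.4]      # over scores 1..5

def recommendation_dist(honest, kindness, quality, honest_rec_cpt):
    if honest:
        # honest_rec_cpt: {(kindness, quality): distribution over scores}
        return honest_rec_cpt[(kindness, quality)]
    # Dishonest: independent of Kindness(c) and Quality(b)
    return DISHONEST_DIST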
62. Relational probability model
•Inference in RPMs
•The idea is similar to propositionalization.
•Unrolling
oCollect the evidence, query, and constant symbols
oConstruct equivalent Bayesian network
oApply any inference methods previously mentioned
•Problem
oThe value of every symbol in the network must be
known beforehand
oEx. Author = {A1, A2}, Author(Book1) = ?
We haven’t specified Author(Book1), but it must be A1 or A2
Uncertainty about the value of Author(Book1) is called
relational uncertainty
63. Open-universe probability models
•Database semantics works well in settings where
every relevant object exists and can be identified
unambiguously
•Real-world settings are often not of that form
oEx. “father’s wife”, “aunt’s sister”, and “grandma’s
daughter” may all refer to the same object: my mom
•Bayesian network
oGenerates each possible world event by event, by
assigning a value to one variable at a time
•RPM
oGenerates entire sets of events defined by the possible
instantiations of the logical variables
•OUPM
oAdds objects to the world under construction
oRather than just assigning values, it can create the
very existence of an object
64. 14.7 Other Approaches to Uncertain Reasoning
•Rule-based methods for uncertain reasoning
•Emerged from logical inference
•Require 3 desirable properties
oLocality : If A ⇒ B, we can conclude B given
evidence A without worrying about any other rules.
But in probabilistic systems, we need to consider all
the evidence.
oDetachment : If we can derive B, we can use it
without caring how it was derived.
oTruth-functionality : truth value of complex
sentences can be computed from the truth of the
components.
Probability combination does not work this way
65. Representing ignorance: Dempster–Shafer theory
•Designed to deal with
ouncertainty : nothing is certain
oignorance : no idea whether the evidence can be trusted
•It does not compute the probability of a proposition;
it computes the probability that the evidence
supports the proposition
•Belief function Bel(X): measures how strongly the
evidence supports the event X
66. Representing ignorance: Dempster–Shafer theory
Ex. Pick a coin from a magician’s pocket: we have no
reason to believe that the coin is fair.
•Bel(Heads) = 0 × 0.5 = 0
•Bel(¬Heads) = 0
Ex. A coin vouched for by an expert who is 90% certain
that the coin is fair.
•Bel(Heads) = 0.9 × 0.5 = 0.45
•Bel(¬Heads) = 0.9 × (1 − 0.5) = 0.45
•1 − 0.45 − 0.45 = 0.1 ← the gap not accounted
for by the evidence
67. Representing ignorance: Dempster–Shafer theory
•Assign masses to sets of possible events
•The masses sum to 1 over all such sets
•Bel(A) is the sum of the masses of all sets that are
subsets of A, including A itself
•Bel(A) + Bel(¬A) is at most 1
•The interval [Bel(A), 1 − Bel(¬A)] bounds the
probability of A
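A small sketch of these definitions, using the mass assignment implied by the 90%-certain expert example above:

masses = {
    frozenset({"heads"}): 0.45,            # fair (0.9) × heads (0.5)
    frozenset({"tails"}): 0.45,            # fair (0.9) × tails (0.5)
    frozenset({"heads", "tails"}): 0.10,   # the 0.1 ignorance gap
}

def bel(A):
    # Sum the masses of every set that is a subset of A
    return sum(m for s, m in masses.items() if s <= A)

print(bel(frozenset({"heads"})), 1 - bel(frozenset({"tails"})))
# 0.45 0.55 — the interval [Bel(A), 1 − Bel(¬A)] bounding P(heads)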
68. Representing vagueness: Fuzzy sets and fuzzy logic
•Fuzzy set theory : specifying how well an
object satisfies a vague description.
oEx. Is a height of 170 cm “tall” or “short”?
Available answers = {tall, short}
Reality = “sort of…”
•Membership need not be at either extreme: values
somewhere in the middle are acceptable, and there
is no sharp boundary
oEx. Tall and Short are called fuzzy predicates
oTall(X) ranges between 0 and 1
•Fuzzy set theory ≠ uncertain reasoning
method.
69. Representing vagueness: Fuzzy sets and fuzzy logic
•Fuzzy logic : method for reasoning with
logical expression describing membership in
fuzzy sets.
•Suppose T(Tall(A)) = 0.6 and T(Heavy(A)) = 0.4
oT(Tall(A) ∧ Heavy(A)) = 0.4
o“A is not that tall and heavy.”
•Standard rules for evaluating fuzzy truth values:
T(A ∧ B) = min(T(A), T(B))
T(A ∨ B) = max(T(A), T(B))
T(¬A) = 1 − T(A)
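These connectives are one-liners; a quick sketch applied to the example above:

def f_and(a, b): return min(a, b)
def f_or(a, b):  return max(a, b)
def f_not(a):    return 1 - a

tall, heavy = 0.6, 0.4
print(f_and(tall, heavy))  # 0.4 = T(Tall(A) ∧ Heavy(A))
print(f_or(tall, heavy))   # 0.6
print(f_not(tall))         # 0.4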
70. Representing vagueness: Fuzzy sets and fuzzy logic
•Fuzzy control
oa methodology for constructing control systems
where the mapping between real-valued input and
output parameters is represented by fuzzy rules.
oSuccessful in commercial products such as
automatic transmission, video cameras, etc.
oIts success is likely due to small rule bases, the
absence of chained inference, etc.
71. Summary
oA Bayesian network is a directed acyclic graph
whose nodes correspond to random variables;
each node has a conditional distribution for the
node, given its parents.
oBayesian networks provide a concise way to
represent conditional independence
relationships in the domain.
oA Bayesian network specifies a full joint
distribution; each joint entry is defined as the
product of the corresponding entries in the local
conditional distributions. A Bayesian network is
often exponentially smaller than an explicitly
enumerated joint distribution
72. Summary
oMany conditional distributions can be represented
compactly by canonical families of distributions.
Hybrid Bayesian networks, which include both
discrete and continuous variables, use a variety of
canonical distributions.
oInference in Bayesian networks means computing
the probability distribution of a set of query
variables, given a set of evidence variables. Exact
inference algorithms, such as variable
elimination, evaluate sums of products of
conditional probabilities as efficiently as possible
73. Summary
oStochastic approximation techniques such as
likelihood weighting and MCMC can give reasonable
estimates of the true posterior probabilities in a
network and can cope with much larger networks than
can exact algorithms.
oProbability theory can be combined with
representational ideas from first-order logic to produce
very powerful systems for reasoning under uncertainty.
Relational probability models (RPMs) include
representational restrictions that guarantee a well-
defined probability distribution that can be expressed
as an equivalent Bayesian network. Open-universe
probability models handle existence and identity
uncertainty, defining probability distributions over the
infinite space of first-order possible worlds.
74. Summary
oVarious alternative systems for reasoning under
uncertainty have been suggested. Generally
speaking, truth-functional systems are not well
suited for such reasoning.