This document discusses uncertainty and statistical reasoning in artificial intelligence. It covers probability theory, Bayesian networks, and certainty factors. Key topics include probability distributions, Bayes' rule, building Bayesian networks, different types of probabilistic inferences using Bayesian networks, and defining and combining certainty factors. Case studies are provided to illustrate each algorithm.
FellowBuddy.com is an innovative platform that brings students together to share notes, exam papers, study guides, project reports and presentations for upcoming exams.
We connect Students who have an understanding of course material with Students who need help.
Benefits:-
# Students can catch up on notes they missed because of an absence.
# Underachievers can find peer-developed notes that break down lecture and study material in a way that they can understand.
# Students can earn better grades, save time and study effectively.
Our Vision & Mission – Simplifying Students' Lives
Our Belief – “The great breakthrough in your life comes when you realize that you can learn anything you need to learn to accomplish any goal that you have set for yourself. This means there are no limits on what you can be, have or do.”
Like Us - https://www.facebook.com/FellowBuddycom
Guest Lecture about genetic algorithms in the course ECE657: Computational Intelligence/Intelligent Systems Design, Spring 2016, Electrical and Computer Engineering (ECE) Department, University of Waterloo, Canada.
This document discusses uncertainty and probability theory. It begins by explaining sources of uncertainty for autonomous agents from limited sensors and an unknown future. It then covers representing uncertainty with probabilities and Bayes' rule for updating beliefs. Examples show inferring diagnoses from symptoms using conditional probabilities. Independence is described as reducing the information needed for joint distributions. The document emphasizes probability theory and Bayesian reasoning for handling uncertainty.
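The belief update via Bayes' rule described above can be sketched numerically. The diagnosis probabilities below are invented purely to make the arithmetic concrete:

```python
# Hypothetical diagnosis numbers, chosen only for illustration.
p_disease = 0.01                 # prior P(disease)
p_sym_given_d = 0.90             # P(symptom | disease)
p_sym_given_not_d = 0.10         # P(symptom | no disease)

# Total probability of observing the symptom
p_symptom = p_sym_given_d * p_disease + p_sym_given_not_d * (1 - p_disease)

# Bayes' rule: P(disease | symptom) = P(symptom | disease) P(disease) / P(symptom)
posterior = p_sym_given_d * p_disease / p_symptom
```

Even with a reliable test, the low prior keeps the posterior small, which is the classic intuition Bayes' rule formalizes.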
Problem solving
Problem formulation
Search Techniques for Artificial Intelligence
Classification of AI searching Strategies
What is a Search Strategy?
Defining a Search Problem
State Space Graph versus Search Trees
Graph vs. Tree
Problem Solving by Search
The document provides an overview of constraint satisfaction problems (CSPs). It defines a CSP as consisting of variables with domains of possible values, and constraints specifying allowed value combinations. CSPs can represent many problems using variables and constraints rather than explicit state representations. Backtracking search is commonly used to solve CSPs by trying value assignments and backtracking when constraints are violated.
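The backtracking search described above can be sketched on a toy map-colouring CSP. The variables, domains and adjacency below are a small invented instance, not taken from the document:

```python
# Minimal backtracking search for a toy map-colouring CSP (illustrative sketch).
def backtrack(assignment, variables, domains, neighbors):
    if len(assignment) == len(variables):
        return assignment
    var = next(v for v in variables if v not in assignment)
    for value in domains[var]:
        # Constraint: adjacent regions must receive different colours.
        if all(assignment.get(n) != value for n in neighbors[var]):
            assignment[var] = value
            result = backtrack(assignment, variables, domains, neighbors)
            if result is not None:
                return result
            del assignment[var]          # violated downstream: backtrack
    return None

variables = ["WA", "NT", "SA"]
domains = {v: ["red", "green", "blue"] for v in variables}
neighbors = {"WA": ["NT", "SA"], "NT": ["WA", "SA"], "SA": ["WA", "NT"]}
solution = backtrack({}, variables, domains, neighbors)
```

Each variable is assigned in turn; when no value satisfies the constraints, the last assignment is undone and the next value tried.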
The document discusses knowledge representation issues in artificial intelligence. It covers several key topics:
- Knowledge and its representation are distinct but related entities that are central to intelligent systems. Knowledge describes the world while representation defines how knowledge is encoded and manipulated.
- There are various ways to represent knowledge, including logical representations, inheritance hierarchies, rules-based systems, and procedural representations. Different types of knowledge require different representation schemes.
- Issues in knowledge representation include ensuring representations are adequately expressive and support effective inference, as well as how to structure knowledge at the appropriate level of granularity and represent sets of objects. Choosing the right representation approach is important for building intelligent systems.
This document provides an overview of natural language processing and planning topics including:
- NLP tasks like parsing, machine translation, and information extraction.
- The components of a planning system including the planning agent, state and goal representations, and planning techniques like forward and backward chaining.
- Methods for natural language processing including pattern matching, syntactic analysis, and the stages of NLP like phonological, morphological, syntactic, semantic, and pragmatic analysis.
Knowledge Representation and Predicate Logic – Amey Kerkar
1. The document discusses knowledge representation and predicate logic.
2. It explains that knowledge representation involves representing facts through internal representations that can then be manipulated to derive new knowledge. Predicate logic allows representing objects and relationships between them using predicates, quantifiers, and logical connectives.
3. Several examples are provided to demonstrate representing simple facts about individuals as predicates and using quantifiers like "forall" and "there exists" to represent generalized statements.
The Dempster-Shafer Theory was developed by Arthur Dempster in 1967 and Glenn Shafer in 1976 as an alternative to Bayesian probability. It allows one to combine evidence from different sources and obtain a degree of belief (or probability) for some event. The theory uses belief functions and plausibility functions to represent degrees of belief for various hypotheses given certain evidence. It was developed to describe ignorance and to consider sets of possible outcomes, unlike Bayesian probability, which assigns belief only to individual hypotheses. An example is given of using the theory to determine the murderer in a room with four people where the lights went out.
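The heart of the theory is Dempster's rule of combination, which merges two mass assignments and renormalizes away conflicting mass. The sketch below uses a two-hypothesis frame with invented masses, where assigning mass to the whole frame expresses ignorance:

```python
# Dempster's rule of combination on a tiny frame of discernment (illustrative).
# Masses are assigned to *sets* of hypotheses; frozensets serve as dict keys.
def combine(m1, m2):
    combined = {}
    conflict = 0.0
    for s1, v1 in m1.items():
        for s2, v2 in m2.items():
            inter = s1 & s2
            if inter:
                combined[inter] = combined.get(inter, 0.0) + v1 * v2
            else:
                conflict += v1 * v2      # mass assigned to contradictory sets
    # Normalize by 1 - K, where K is the total conflicting mass.
    return {s: v / (1 - conflict) for s, v in combined.items()}

A, B = frozenset({"A"}), frozenset({"B"})
theta = frozenset({"A", "B"})            # the whole frame: "A or B" (ignorance)
m1 = {A: 0.6, theta: 0.4}                # source 1 leans toward A
m2 = {B: 0.5, theta: 0.5}                # source 2 leans toward B
m = combine(m1, m2)
```

The combined masses still sum to one; the mass left on the full frame is what distinguishes ignorance from an even Bayesian split.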
The inference engine applies logical rules to facts in the knowledge base to infer new information. It uses two approaches:
- Forward chaining starts with known facts and fires rules until reaching the goal, applying rules in a bottom-up manner.
- Backward chaining starts with the goal and works backwards through rules to find supporting facts, taking a top-down approach.
Both are illustrated using examples of determining an animal's color. Forward chaining applies rules to known facts about an animal to conclude its color, while backward chaining starts with the color goal and applies rules in reverse to find facts proving the goal.
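The forward-chaining loop can be sketched in a few lines. The rules below are invented in the spirit of the animal-colour example, not taken from the document:

```python
# Forward chaining over simple if-then rules (illustrative animal example).
# Each rule is (set of condition facts, concluded fact).
rules = [
    ({"has stripes", "is a big cat"}, "is a tiger"),
    ({"is a tiger"}, "is orange"),
]
facts = {"has stripes", "is a big cat"}   # initial working memory

changed = True
while changed:                            # fire rules until nothing new derives
    changed = False
    for conditions, conclusion in rules:
        if conditions <= facts and conclusion not in facts:
            facts.add(conclusion)
            changed = True
```

Backward chaining would instead start from the goal "is orange" and search for a rule concluding it, recursively proving that rule's conditions.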
Lecture 14: Heuristic Search – A* Algorithm – Hema Kashyap
A* is a search algorithm that finds the shortest path through a graph to a goal state. It combines the best aspects of Dijkstra's algorithm and best-first search. A* uses a heuristic function to evaluate the cost of a path passing through each state to guide the search towards the lowest cost goal state. The algorithm initializes the start state, then iteratively selects the lowest cost node from its open list to expand, adding successors to the open list until it finds the goal state. A* is admissible, complete, and optimal under certain conditions relating to the heuristic function and graph structure.
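The open-list loop described above can be sketched directly. The graph and heuristic values below are an invented toy instance, with the heuristic chosen to be admissible (it never overestimates the remaining cost):

```python
import heapq

# A* on a small weighted graph; h is an assumed admissible heuristic.
def a_star(graph, h, start, goal):
    open_list = [(h[start], 0, start, [start])]   # entries: (f, g, node, path)
    best_g = {start: 0}
    while open_list:
        f, g, node, path = heapq.heappop(open_list)   # lowest f = g + h first
        if node == goal:
            return path, g
        for succ, cost in graph[node]:
            g2 = g + cost
            if g2 < best_g.get(succ, float("inf")):   # found a cheaper route
                best_g[succ] = g2
                heapq.heappush(open_list, (g2 + h[succ], g2, succ, path + [succ]))
    return None, float("inf")

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 12)],
         "B": [("G", 3)], "G": []}
h = {"S": 5, "A": 4, "B": 2, "G": 0}                  # admissible estimates
path, cost = a_star(graph, h, "S", "G")
```

With an admissible heuristic the first time the goal is popped from the open list, its path is guaranteed optimal.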
Genetic algorithms and traditional algorithms differ in their definitions, usages, and complexity. Genetic algorithms are based on genetics and natural selection, and help find optimal solutions to difficult problems. They are more advanced than traditional algorithms which provide step-by-step procedures. Genetic algorithms are used in fields like machine learning and artificial intelligence, while traditional algorithms are used in programming and mathematics.
The document discusses problem solving by searching. It describes problem solving agents and how they formulate goals and problems, search for solutions, and execute solutions. Tree search algorithms like breadth-first search, uniform-cost search, and depth-first search are described. Example problems discussed include the 8-puzzle, 8-queens, and route finding problems. The strategies of different uninformed search algorithms are explained.
Artificial Intelligence (AI) | Propositional Logic (PL) and First Order Predic... – Ashish Duggal
This presentation covers Propositional Logic (PL) and First-Order Predicate Logic (FOPL), which are used for knowledge representation in artificial intelligence (AI).
There are also sub-topics in this presentation like logical connective, atomic sentence, complex sentence, and quantifiers.
This PPT is very helpful for Computer Science and Computer Engineering students (B.C.A., M.C.A., B.Tech., M.Tech.).
The document discusses gradient descent methods for unconstrained convex optimization problems. It introduces gradient descent as an iterative method to find the minimum of a differentiable function by taking steps proportional to the negative gradient. It describes the basic gradient descent update rule and discusses convergence conditions such as Lipschitz continuity, strong convexity, and condition number. It also covers techniques like exact line search, backtracking line search, coordinate descent, and steepest descent methods.
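The basic update rule, a step against the gradient, can be sketched on a one-dimensional convex function. The function and step size below are chosen only for illustration:

```python
# Gradient descent on f(x) = (x - 3)^2, whose gradient is 2(x - 3).
def grad_descent(grad, x0, step=0.1, iters=100):
    x = x0
    for _ in range(iters):
        x = x - step * grad(x)       # move proportionally against the gradient
    return x

x_min = grad_descent(lambda x: 2 * (x - 3), x0=0.0)
```

For this function any fixed step below 1 converges geometrically to the minimum at x = 3; line-search variants adapt the step size instead of fixing it.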
The document discusses the 8-puzzle problem and the A* algorithm. The 8-puzzle problem involves a 3x3 grid with 8 numbered tiles and 1 blank space that can be moved. The A* algorithm maintains a tree of paths from the initial to final state, extending the paths one step at a time until the final state is reached. It is complete and optimal but depends on the accuracy of the heuristic used to estimate costs.
This document provides an overview of genetic algorithms. It discusses how genetic algorithms are inspired by natural evolution and use techniques like selection, crossover, and mutation to arrive at optimal solutions. The document covers the history of genetic algorithms, how they work, examples of using genetic algorithms to optimize problems, and their applications in fields like electromagnetism. Genetic algorithms provide a way to find optimal solutions to complex problems by simulating the natural evolutionary process of reproduction, mutation, and selection of offspring.
The document discusses sources and approaches to handling uncertainty in artificial intelligence. It provides examples of uncertain inputs, knowledge, and outputs in AI systems. Common methods for representing and reasoning with uncertain data include probability, Bayesian belief networks, hidden Markov models, and temporal models. Effectively handling uncertainty through probability and inference allows AI to make rational decisions with imperfect knowledge.
Best-first search is a heuristic search algorithm that expands the most promising node first. It uses an evaluation function f(n) that estimates the cost to reach the goal from each node n. Nodes are ordered in the fringe by increasing f(n). A* search is a special case of best-first search that uses an admissible heuristic function h(n) and is guaranteed to find the optimal solution.
In this tutorial, we will learn the following topics:
+ Voting Classifiers
+ Bagging and Pasting
+ Random Patches and Random Subspaces
+ Random Forests
+ Boosting
+ Stacking
This document provides an overview of first-order logic including:
- First-order logic is a formal system used in mathematics, philosophy, linguistics and computer science to represent knowledge.
- It models the world in terms of objects, properties, relations and functions.
- The syntax of first-order logic includes constant symbols, function symbols, predicate symbols, variables, and connectives like not, and, or as well as quantifiers like universal and existential.
- Examples show how first-order logic can represent statements about individuals and their relationships using predicates, terms, atomic and complex sentences with quantifiers.
The document discusses uncertainty and probabilistic reasoning. It describes sources of uncertainty like partial information, unreliable information, and conflicting information from multiple sources. It then discusses representing and reasoning with uncertainty using techniques like default logic, rules with probabilities, and probability theory. The key approaches covered are conditional probability, independence, conditional independence, and using Bayes' rule to update probabilities based on new evidence.
The document discusses procedural versus declarative knowledge representation and how logic programming languages like Prolog allow knowledge to be represented declaratively through logical rules. It also covers topics like forward and backward reasoning, matching rules to facts in working memory, and using control knowledge to guide the problem solving process. Logic programming represents knowledge through Horn clauses and uses backward chaining inference to attempt to prove goals.
Alpha-beta pruning is a modification of the minimax algorithm that optimizes it by pruning portions of the search tree that cannot affect the outcome. It uses two thresholds, alpha and beta, to track the best values found for the maximizing and minimizing players. By comparing alpha and beta at each node, it can avoid exploring subtrees where the minimum of the maximizing player's options will be greater than the maximum of the minimizing player's options. This allows it to often prune branches of the tree without calculating their values, improving the algorithm's efficiency.
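The alpha and beta thresholds described above can be sketched on a hand-built tree of leaf evaluations (the tree and values below are invented for illustration):

```python
# Minimax with alpha-beta pruning; leaves are static evaluation scores.
def alphabeta(node, alpha, beta, maximizing):
    if isinstance(node, (int, float)):       # leaf: return its evaluation
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False))
            alpha = max(alpha, value)
            if alpha >= beta:                # beta cutoff: MIN avoids this line
                break
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True))
        beta = min(beta, value)
        if alpha >= beta:                    # alpha cutoff: MAX avoids this line
            break
    return value

tree = [[3, 5], [2, 9], [0, 1]]              # MAX root over three MIN nodes
best = alphabeta(tree, float("-inf"), float("inf"), True)
```

In this tree the leaves 9 and 1 are never evaluated: once a MIN node's value falls below the best option MAX already has, the rest of that subtree is pruned.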
This document discusses inference in first-order logic. It defines sound and complete inference and introduces substitution. It then discusses propositional vs first-order inference and introduces universal and existential quantifiers. The key techniques of first-order inference are unification, which finds substitutions to make logical expressions identical, and forward chaining inference, which applies rules like modus ponens to iteratively derive new facts from a knowledge base.
1. The document describes an artificial intelligence implementation of the tic-tac-toe game using the minimax algorithm.
2. It provides details on the game rules, initial and goal states, and the state space tree and winning conditions.
3. The minimax approach is then explained as a recursive algorithm that evaluates all possible future moves from the current state and assumes the opponent will make the choice that results in the least preferred outcome.
Bayesian networks provide a graphical representation of the conditional independence relationships between variables in a probability distribution. The structure of a Bayesian network reflects the conditional independencies, where each node represents a variable and edges denote direct probabilistic influences between variables. The conditional probability tables quantify the network by specifying the probability of each variable given its parent variables. Efficient inference in Bayesian networks can be performed using an algorithm like variable elimination, which works by joining and multiplying factors representing portions of the probability distribution and summing out variables until the desired conditional probability is computed.
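On a minimal two-node network the factor operations reduce to one multiplication and one summing-out, which is variable elimination in miniature. The structure and numbers below are invented for illustration:

```python
# Tiny Bayesian network Rain -> WetGrass; compute P(Rain | WetGrass = true).
p_rain = {True: 0.2, False: 0.8}             # prior P(Rain)
p_wet_given_rain = {True: 0.9, False: 0.1}   # CPT: P(WetGrass=T | Rain)

# Join: P(Rain=r, Wet=T) = P(r) * P(Wet=T | r)
joint = {r: p_rain[r] * p_wet_given_rain[r] for r in (True, False)}

p_wet = sum(joint.values())                  # sum out Rain
posterior = joint[True] / p_wet              # normalize to get P(Rain=T | Wet=T)
```

In larger networks the same join/sum-out steps are applied factor by factor, with elimination order determining how big the intermediate factors grow.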
This document provides an overview of Bayes classifiers and Naive Bayes classifiers. It begins by introducing probabilistic classification and the goal of predicting a categorical output given input attributes. It then discusses the Naive Bayes assumption that attributes are conditionally independent given the class label. This allows estimating each p(attribute|class) separately rather than the full joint distribution. The document covers maximum likelihood estimation, Laplace smoothing, and using Naive Bayes for problems like spam filtering. It contrasts generative models like Naive Bayes that model p(attribute|class) with discriminative approaches that directly model p(class|attributes).
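A toy version of the spam-filtering setup described above; the two-document corpora and equal class priors are invented purely for illustration:

```python
import math
from collections import Counter

# Toy Naive Bayes text classifier with Laplace smoothing.
spam_docs = ["win money now", "free money"]
ham_docs = ["meeting at noon", "project report"]

def word_counts(docs):
    return Counter(w for d in docs for w in d.split())

spam_counts = word_counts(spam_docs)
ham_counts = word_counts(ham_docs)
vocab = set(spam_counts) | set(ham_counts)

def log_score(text, counts, prior):
    total = sum(counts.values())
    score = math.log(prior)
    for w in text.split():
        # Laplace smoothing: (count + 1) / (total + |V|) avoids zero probabilities
        score += math.log((counts[w] + 1) / (total + len(vocab)))
    return score

def classify(text):
    s = log_score(text, spam_counts, 0.5)    # assumed equal class priors
    h = log_score(text, ham_counts, 0.5)
    return "spam" if s > h else "ham"
```

The conditional independence assumption is what lets each p(word | class) be estimated from simple counts instead of the full joint distribution.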
1. The document discusses uncertainty and different methods for handling it, including probability theory.
2. It explains that probability can be used to represent an agent's degree of belief in a proposition given available evidence.
3. Key concepts covered include prior and conditional probability, Bayes' rule, and independence assumptions which allow reducing the size of probability models.
Uncertainty & Probability
Bayes' rule
Choosing Hypotheses- Maximum a posteriori
Maximum Likelihood - Bayes' concept learning
Maximum Likelihood of real valued function
Bayes optimal Classifier
Joint distributions
Naive Bayes Classifier
The document provides an overview of key concepts in probability theory and stochastic processes. It defines fundamental terms like sample space, events, probability, conditional probability, independence, random variables, and common probability distributions including binomial, Poisson, exponential, uniform, and Gaussian distributions. Examples are given for each concept to illustrate how it applies to modeling random experiments and computing probabilities. The three main axioms of probability are stated. Key properties and formulas for expectation, variance, and conditional expectation are also summarized.
Conditional independence assumptions allow simpler probabilistic models to be constructed that can still accurately model real-world phenomena. Bayesian networks provide a systematic way to construct probabilistic models using conditional independence assumptions. Representing knowledge with probabilities provides a coherent framework for representing uncertainty, which is important for building intelligent systems that can reason effectively with incomplete information.
Bayesian statistics uses probability to represent uncertainty about unknown parameters in statistical models. It differs from classical statistics in that parameters are treated as random variables rather than fixed unknown constants. Bayesian probability represents a degree of belief in an event rather than the physical probability of an event. The Bayes' formula provides a way to update beliefs based on new evidence or data using conditional probability. Bayesian networks are graphical models that compactly represent joint probability distributions over many variables and allow for efficient inference.
Equational axioms for probability calculus and modelling of Likelihood ratio ...Advanced-Concepts-Team
Based on the theory of meadows an equational axiomatisation is given for probability functions on finite event spaces. Completeness of the axioms is stated with some pointers to how that is shown.Then a simplified model courtroom subjective probabilistic reasoning is provided in terms of a protocol with two proponents: the trier of fact (TOF, the judge), and the moderator of evidence (MOE, the scientific witness). Then the idea is outlined of performing of a step of Bayesian reasoning by way of applying a transformation of the subjective probability function of TOF on the basis of different pieces of information obtained from MOE. The central role of the so-called Adams transformation is outlined. A simple protocol is considered where MOE transfers to TOF first a likelihood ratio for a hypothesis H and a potential piece of evidence E and thereupon the additional assertion that E holds true. As an alternative a second protocol is considered where MOE transfers two successive likelihoods (the quotient of both being the mentioned ratio) followed with the factuality of E. It is outlined how the Adams transformation allows to describe information processing at TOF side in both protocols and that the resulting probability distribution is the same in both cases. Finally it is indicated how the Adams transformation also allows the required update of subjective probability at MOE side so that both sides in the protocol may be assumed to comply with the demands of subjective probability.
The document provides an overview of Bayesian networks including definitions of marginal independence, conditional independence, and how Bayesian networks represent probabilistic relationships between variables through a directed acyclic graph. It discusses how Bayesian networks compactly represent joint distributions through conditional independence assumptions encoded in the graph structure. An example of constructing a Bayesian network for a burglar alarm problem is used to illustrate defining conditional probability tables for each variable given its parents.
The document discusses discrete probability concepts including sample spaces, events, axioms of probability, conditional probability, Bayes' theorem, random variables, probability distributions, expectation, and classical probability problems. It provides examples and explanations of key terms. The Monty Hall problem is used to demonstrate defining the sample space, event of interest, assigning probabilities, and computing the probability of winning by sticking or switching doors.
Accounting for uncertainty is a crucial component in decision making (e.g., classification) because of ambiguity in our measurements.
Probability theory is the proper mechanism for accounting for uncertainty.
This document provides an overview of key concepts in probability and statistics, including:
- Definitions of probability, sample spaces, events, and the axioms of probability
- Concepts of conditional probability, Bayes' rule, independence, and discrete random variables
- How to calculate probabilities of events, expected values, variance, and conditioning probabilities on other events or random variables
This document provides a concise probability cheatsheet compiled by William Chen and others. It covers key probability concepts like counting rules, sampling tables, definitions of probability, independence, unions and intersections, joint/marginal/conditional probabilities, Bayes' rule, random variables and their distributions, expected value, variance, indicators, moment generating functions, and independence of random variables. The cheatsheet is licensed under CC BY-NC-SA 4.0 and the last updated date is March 20, 2015.
The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document discusses joint, marginal, and conditional probability distributions and how to calculate probabilities using rules like the chain rule, total probability, and Bayes' rule. It also covers independence, conditional independence, mean, variance, and their properties. Finally, it gives the Monty Hall problem as an example and solves it using Bayes' rule.
The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document also covers joint and marginal probability distributions, independence, conditional independence, and rules like the chain rule and Bayes' rule. It defines key concepts like mean, variance, and correlation. Finally, it discusses the Monty Hall problem and solves it using Bayes' rule.
The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document discusses joint, marginal, and conditional probability distributions and how to calculate probabilities using rules like the chain rule, total probability, and Bayes' rule. It also covers independence, conditional independence, mean, variance, and their properties. Finally, it discusses using probability for statistical inference and provides the Monty Hall problem as an example.
Probability_Review HELPFUL IN STATISTICS.pptShamshadAli58
The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document also covers joint and marginal probability distributions, independence, conditional independence, and rules like the chain rule and Bayes' rule. It defines key concepts like mean, variance, and correlation. Finally, it discusses the Monty Hall problem and solves it using Bayes' rule.
The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document also covers joint and marginal probability distributions, independence, conditional independence, and rules like the chain rule and Bayes' rule. It defines key concepts like mean, variance, and correlation. Finally, it discusses the Monty Hall problem and solves it using Bayes' rule.
The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document discusses joint, marginal, and conditional probability distributions and how to calculate probabilities using rules like the chain rule, total probability, and Bayes' rule. It also covers independence, conditional independence, mean, variance, and their properties. Finally, it gives the Monty Hall problem as an example and solves it using Bayes' rule.
Unit IV: Uncertainty and Statistical Reasoning in AI – K. Sundar, AP/CSE, VEC
2. Syllabus
• Probability and Axioms - Bayes' Rule - Bayesian Networks - Inferences - Temporal Models - Hidden Markov Models - Fuzzy Reasoning - Certainty Factors - Bayesian Theory - Bayesian Networks - Dempster-Shafer Theory
• Case study on each algorithm
4. 1. Probability theory
1.1 Uncertain knowledge
∀p symptom(p, Toothache) → disease(p, cavity)
∀p symptom(p, Toothache) →
disease(p, cavity) ∨ disease(p, gum_disease) ∨ …
• Why pure predicate logic (PL) fails here:
- laziness
- theoretical ignorance
- practical ignorance
• Probability theory assigns a degree of belief or
plausibility to a statement – a numerical
measure in [0,1]
• Degree of truth (fuzzy logic) ≠ degree of belief
5. 1.2 Definitions
• Unconditional or prior probability of A – the degree of
belief in A in the absence of any other information – P(A)
• A – random variable
• Probability distribution – P(A), P(A,B)
Example
P(Weather = Sunny) = 0.1
P(Weather = Rain) = 0.7
P(Weather = Snow) = 0.2
Weather – random variable
• P(Weather) = (0.1, 0.7, 0.2) – probability distribution
• Conditional probability – posterior – once the agent
has obtained some evidence B for A – P(A|B)
• P(Cavity | Toothache) = 0.8
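The prior/posterior distinction above can be sketched in a few lines of Python (the variable names and the cavity prior are illustrative, not from the slides):

```python
# A discrete distribution P(Weather) as a dict; probabilities sum to 1.
weather = {"Sunny": 0.1, "Rain": 0.7, "Snow": 0.2}
assert abs(sum(weather.values()) - 1.0) < 1e-9

# Prior vs. posterior: P(Cavity) below is a hypothetical prior; the
# slide's P(Cavity | Toothache) = 0.8 is the belief after evidence.
p_cavity = 0.2                   # hypothetical prior, for illustration
p_cavity_given_toothache = 0.8   # posterior from the slide
print(weather["Rain"], p_cavity_given_toothache)
```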
6. Definitions - cont
• Axioms of probability
• The measure of the occurrence of an event
(random variable) A is a function P: S → R
satisfying the axioms:
• 0 ≤ P(A) ≤ 1
• P(S) = 1 (or P(true) = 1 and P(false) = 0)
• P(A ∨ B) = P(A) + P(B) − P(A ∧ B)
From these: P(A ∨ ¬A) = P(A) + P(¬A) − P(false) = P(true),
so P(¬A) = 1 − P(A)
7. Definitions - cont
A and B mutually exclusive ⇒ P(A ∨ B) = P(A) + P(B)
P(e1 ∨ e2 ∨ … ∨ en) = P(e1) + P(e2) + P(e3) + … + P(en)
The probability of a proposition a is equal to the
sum of the probabilities of the atomic events in
which a holds:
P(a) = Σ(ei ∈ e(a)) P(ei)
e(a) – the set of atomic events in which a holds
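The "sum over atomic events" rule is easy to check numerically; a sketch with a hypothetical two-variable joint distribution:

```python
# Atomic events are full assignments (cavity, toothache); the joint
# below is hypothetical but sums to 1 as the axioms require.
joint = {
    (True,  True):  0.12,
    (True,  False): 0.08,
    (False, True):  0.08,
    (False, False): 0.72,
}
assert abs(sum(joint.values()) - 1.0) < 1e-9

# P(cavity) = sum of probabilities of atomic events where cavity holds.
p_cavity = sum(p for (cavity, _), p in joint.items() if cavity)
print(p_cavity)  # ≈ 0.2
```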
8. 1.3 Product rule
Conditional probabilities can be defined in terms of
unconditional probabilities.
The conditional probability of the occurrence of
A given that event B occurs:
– P(A|B) = P(A ∧ B) / P(B)
This can also be written as:
– P(A ∧ B) = P(A|B) * P(B)
For probability distributions:
– P(A=a1 ∧ B=b1) = P(A=a1|B=b1) * P(B=b1)
– P(A=a1 ∧ B=b2) = P(A=a1|B=b2) * P(B=b2)
…
– P(X,Y) = P(X|Y) * P(Y)
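A quick numeric check of the product rule on a hypothetical joint table:

```python
# Product rule sketch: derive P(A|B) from a joint table, then recover
# P(A ∧ B) = P(A|B) * P(B). Numbers are made up for illustration.
p_joint = {("a1", "b1"): 0.3, ("a1", "b2"): 0.1,
           ("a2", "b1"): 0.2, ("a2", "b2"): 0.4}

p_b1 = sum(p for (_, b), p in p_joint.items() if b == "b1")  # P(B=b1)
p_a1_given_b1 = p_joint[("a1", "b1")] / p_b1                 # P(A=a1|B=b1)

# The product rule recovers the joint entry exactly.
assert abs(p_a1_given_b1 * p_b1 - p_joint[("a1", "b1")]) < 1e-12
print(p_b1, p_a1_given_b1)  # ≈ 0.5, ≈ 0.6
```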
9. 1.4 Bayes' rule and its use
P(A ∧ B) = P(A|B) * P(B)
P(A ∧ B) = P(B|A) * P(A)
Bayes' rule (theorem):
• P(B|A) = P(A|B) * P(B) / P(A)
10. Bayes Theorem
hi – hypotheses (i = 1,…,k);
e1,…,en – evidence
P(hi)
P(hi | e1,…,en)
P(e1,…,en | hi)

P(hi | e1,e2,…,en) = P(e1,e2,…,en | hi) P(hi) / Σ(j=1..k) P(e1,e2,…,en | hj) P(hj), i = 1,…,k
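The normalized form of Bayes' theorem above translates directly into code (the hypotheses and numbers are made up for illustration):

```python
# Bayes' theorem over k competing hypotheses:
# P(h_i | e) = P(e | h_i) P(h_i) / sum_j P(e | h_j) P(h_j)
priors = {"h1": 0.6, "h2": 0.3, "h3": 0.1}
likelihood = {"h1": 0.2, "h2": 0.5, "h3": 0.9}   # P(e | h_j), hypothetical

evidence = sum(likelihood[h] * priors[h] for h in priors)
posterior = {h: likelihood[h] * priors[h] / evidence for h in priors}

assert abs(sum(posterior.values()) - 1.0) < 1e-9  # posteriors normalize
print(max(posterior, key=posterior.get))          # most probable hypothesis
```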
11. Bayes' Theorem - cont
If e1,…,en are independent pieces of evidence given each hypothesis, then
P(e1,e2,…,en | hj) = P(e1 | hj) * P(e2 | hj) * … * P(en | hj), j = 1,…,k
This simplification was used in PROSPECTOR.
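Under this conditional-independence assumption the likelihood of all the evidence is just a product of per-item likelihoods, which is what makes PROSPECTOR-style updating cheap. A sketch with hypothetical numbers:

```python
import math

# Two hypotheses and two conditionally independent pieces of evidence;
# all numbers are hypothetical.
priors = {"ore": 0.01, "no_ore": 0.99}
lik = {"ore": [0.8, 0.7], "no_ore": [0.1, 0.2]}   # P(e_i | h)

# P(e1, e2 | h) factors into a product under independence.
unnorm = {h: priors[h] * math.prod(lik[h]) for h in priors}
z = sum(unnorm.values())
posterior = {h: unnorm[h] / z for h in unnorm}
print(posterior["ore"])  # ≈ 0.22
```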
14. 2 Bayesian networks
• Represent dependencies among random
variables
• Give a short specification of the conditional
probability distribution
• Many random variables are conditionally
independent
• Simplifies computations
• Graphical representation
• DAG – causal relationships among random variables
15. 2.1 Definition of Bayesian networks
A BN is a DAG in which each node is annotated
with quantitative probability information, namely:
• Nodes represent random variables (discrete or
continuous)
• Directed links X → Y: X has a direct influence on
Y; X is said to be a parent of Y
• Each node Xi has an associated conditional
probability table, P(Xi | Parents(Xi)), that quantifies
the effects of the parents on the node
Example: Weather, Cavity, Toothache, Catch
• Weather is independent of the others; Cavity → Toothache, Cavity → Catch
16. Bayesian network - example

Network: Burglary → Alarm ← Earthquake; Alarm → JohnCalls; Alarm → MaryCalls

P(B) = 0.001    P(E) = 0.002

Conditional probability table for Alarm:
B E | P(A|B,E)  P(¬A|B,E)
T T | 0.95      0.05
T F | 0.94      0.06
F T | 0.29      0.71
F F | 0.001     0.999

A | P(J): T 0.9, F 0.05
A | P(M): T 0.7, F 0.01
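Using the CPTs above, any joint probability follows by multiplying each node's entry given its parents; for example, the classic query P(B ∧ ¬E ∧ A ∧ J ∧ M):

```python
# CPTs from the burglar-alarm network of the slide.
P_B, P_E = 0.001, 0.002
P_A = {(True, True): 0.95, (True, False): 0.94,
       (False, True): 0.29, (False, False): 0.001}   # P(A | B, E)
P_J = {True: 0.9, False: 0.05}                        # P(J | A)
P_M = {True: 0.7, False: 0.01}                        # P(M | A)

# P(B ∧ ¬E ∧ A ∧ J ∧ M) = P(B) P(¬E) P(A|B,¬E) P(J|A) P(M|A)
p = P_B * (1 - P_E) * P_A[(True, False)] * P_J[True] * P_M[True]
print(p)  # ≈ 0.000591
```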
17. 2.2 Bayesian network semantics
A) Represent a probability distribution
B) Specify conditional independence – build the
network
A) Each value of the probability distribution can be
computed as:
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) = Π(i=1..n) P(xi | Parents(Xi))
18. 2.3 Building the network
P(X1=x1 ∧ … ∧ Xn=xn) = P(x1,…,xn) =
P(xn | xn−1,…,x1) * P(xn−1,…,x1) = … =
P(xn | xn−1,…,x1) * P(xn−1 | xn−2,…,x1) * … * P(x2|x1) * P(x1) =
Π(i=1..n) P(xi | xi−1,…,x1)
• We can see that P(Xi | Xi−1,…,X1) = P(Xi | Parents(Xi)) if
Parents(Xi) ⊆ {Xi−1,…,X1}
• The condition may be satisfied by labeling the nodes in
an order consistent with a DAG
• Intuitively, the parents of a node Xi must be all the nodes
Xi−1,…,X1 which have a direct influence on Xi
19. Building the network - cont
• Pick a set of random variables that describe the problem
• Pick an ordering of those variables
• while there are still variables repeat
(a) choose a variable Xi and add a node associated to Xi
(b) assign Parents(Xi) a minimal set of nodes that
already exist in the network such that the conditional
independence property is satisfied
(c) define the conditional probability table for Xi
• Because each node is linked only to previous nodes,
the result is a DAG
• P(MaryCalls | JohnCalls, Alarm, Burglary, Earthquake) =
P(MaryCalls | Alarm)
20. Compactness of node ordering
• Far more compact than the full joint probability distribution
• Example of a locally structured (sparse) system:
each component interacts directly only
with a limited number of other components
• Usually associated with linear growth in
complexity rather than exponential growth
• The order of adding the nodes is important
• The correct order in which to add nodes is to add
the "root causes" first, then the variables they
influence, and so on, until we reach the leaves
21. 2.4 Probabilistic inferences
Serial connection (chain) A → V → B:
P(A ∧ V ∧ B) = P(A) * P(V|A) * P(B|V)
Diverging connection (common cause) A ← V → B:
P(A ∧ V ∧ B) = P(V) * P(A|V) * P(B|V)
Converging connection (common effect) A → V ← B:
P(A ∧ V ∧ B) = P(A) * P(B) * P(V|A,B)
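Whichever of the three factorizations applies, the product of the local terms must define a proper joint distribution. A sanity check for the serial chain A → V → B, with hypothetical CPTs:

```python
from itertools import product

# Hypothetical CPTs for the chain A -> V -> B (Boolean variables).
P_A_true = 0.3
P_V_true_given_A = {True: 0.8, False: 0.2}   # P(V=T | A)
P_B_true_given_V = {True: 0.6, False: 0.5}   # P(B=T | V)

def p(a, v, b):
    # P(A,V,B) = P(A) * P(V|A) * P(B|V)
    pa = P_A_true if a else 1 - P_A_true
    pv = P_V_true_given_A[a] if v else 1 - P_V_true_given_A[a]
    pb = P_B_true_given_V[v] if b else 1 - P_B_true_given_V[v]
    return pa * pv * pb

total = sum(p(a, v, b) for a, v, b in product([True, False], repeat=3))
print(total)  # the factorization sums to 1 over all assignments
```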
25. 3. Certainty factors
• The MYCIN model
• Certainty factors / Confidence coefficients (CF)
• Heuristic model of uncertain knowledge
• In MYCIN – two probabilistic functions to model
the degree of belief and the degree of disbelief in
a hypothesis
– function to measure the degree of belief – MB
– function to measure the degree of disbelief – MD
• MB[h,e] – how much the belief in h increases
based on evidence e
• MD[h,e] – how much the disbelief in h increases
based on evidence e
27. Belief functions - features
• Value range:
0 ≤ MB[h,e] ≤ 1
0 ≤ MD[h,e] ≤ 1
−1 ≤ CF[h,e] ≤ 1
• If h is sure, i.e. P(h|e) = 1, then
MB[h,e] = (1 − P(h)) / (1 − P(h)) = 1, MD[h,e] = 0, CF[h,e] = 1
• If the negation of h is sure, i.e. P(h|e) = 0, then
MD[h,e] = (P(h) − 0) / (P(h) − 0) = 1, MB[h,e] = 0, CF[h,e] = −1
28. Example in MYCIN
• if (1) the type of the organism is gram-positive, and
• (2) the morphology of the organism is coccus, and
• (3) the growth of the organism is chain
• then there is strong evidence (0.7) that the identity of
the organism is streptococcus
Examples of facts in MYCIN:
• (identity organism-1 pseudomonas 0.8)
• (identity organism-2 e.coli 0.15)
• (morphology organism-2 coccus 1.0)
29. 3.2 Combining belief functions
(1) Incremental gathering of evidence
• The same attribute value, h, is obtained by two separate
paths of inference, with two separate CFs: CF[h,s1] and
CF[h,s2]
• The two different paths, corresponding to sources s1
and s2, may be different branches of the search tree
• CF[h, s1&s2] = CF[h,s1] + CF[h,s2] – CF[h,s1]*CF[h,s2]
• Example fact: (identity organism-1 pseudomonas 0.8)
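The incremental-evidence formula is a two-liner; note that this is the form used when both CFs are positive (MYCIN handles mixed-sign cases with different formulas):

```python
# Incremental combination of two certainty factors for the same
# hypothesis: CF[h, s1 & s2] = CF1 + CF2 - CF1*CF2 (both CFs >= 0).
def combine_cf(cf1, cf2):
    return cf1 + cf2 - cf1 * cf2

print(combine_cf(0.8, 0.6))  # ≈ 0.92: combined belief exceeds either alone
```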
30. Combining belief functions
(2) Conjunction of hypotheses
• Applied for computing the CF associated with the
premises of a rule which has several conditions
if A = a1 and B = b1 then …
WM: (A a1 h1 cf1) (B b1 h2 cf2)
• CF[h1&h2, s] = min(CF[h1,s], CF[h2,s])
31. Combining belief functions
(3) Combining beliefs
• An uncertain value is deduced by a rule whose
input conditions are themselves uncertain (they may be
obtained by applying other rules, for example)
• Allows the computation of the CF of the fact
deduced by the rule, based on the rule's CF and
the CF of the hypotheses
• CF[s,e] – belief in a hypothesis s based on
previous evidence e
• CF[h,s] – CF in h if s is sure
• CF'[h,s] = CF[h,s] * CF[s,e]
32. Combining belief functions
(3) Combining beliefs – cont
if A = a1 and B = b1 then C = c1 0.7
WM: (A a1 0.9) (B b1 0.6)
CF(premises) = min(0.9, 0.6) = 0.6
CF(conclusion) = CF(premises) * CF(rule) = 0.6 * 0.7 = 0.42
WM: (C c1 0.42)
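The rule-chaining computation of this slide in code:

```python
# CF(premises) = min over the condition CFs; CF(conclusion) scales it
# by the rule's own CF, exactly as on the slide.
def rule_cf(condition_cfs, rule_strength):
    return min(condition_cfs) * rule_strength

cf = rule_cf([0.9, 0.6], 0.7)
print(cf)  # min(0.9, 0.6) * 0.7 ≈ 0.42
```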
33. 3.3 Limits of CF
• The CF model of MYCIN assumes that the hypotheses are
supported by independent evidence
• An example shows what happens if this condition is
violated:
A: The sprinkler ran last night
U: The grass is wet in the morning
P: Last night it rained
34. 3.3 Limits of CF - cont
R1: if the sprinkler ran last night
then there is strong evidence (0.9) that the grass is wet in the
morning
R2: if the grass is wet in the morning
then there is strong evidence (0.8) that it rained last night
• CF[U,A] = 0.9
• therefore the evidence "sprinkler" supports the hypothesis "wet
grass" with CF = 0.9
• CF[P,U] = 0.8
• therefore the evidence "wet grass" supports the hypothesis "rain"
with CF = 0.8
• CF[P,A] = 0.8 * 0.9 = 0.72
• therefore the evidence "sprinkler" ends up supporting the hypothesis
"rain" with CF = 0.72 – an unintuitive conclusion, since the sprinkler
already explains the wet grass without any rain
35. Artificial Intelligence
Traditional Logic
• Based on predicate logic
• Three important assumptions:
– Predicate descriptions are sufficient w.r.t.
the domain
– Information is consistent
– Knowledge base grows monotonically
36. Non-monotonic Logic
• Addresses the three assumptions of traditional
logic:
– Knowledge is incomplete
• No knowledge about p: is it true or false?
• Prolog – closed world assumption
– Knowledge is inconsistent
• Based on how the world usually works
• Most birds fly, but the ostrich doesn't
– Knowledge base grows non-monotonically
• A new observation may contradict the existing knowledge, so
the existing knowledge may need removal
• Inference is based on assumptions – what happens if the
assumptions are later shown to be incorrect?
• Three modal operators are introduced
37. Unless Operator
• New information may invalidate previous results
• Implemented in TMS – Truth Maintenance Systems –
to keep track of the reasoning steps and preserve
KB consistency
• Introduce the unless operator:
– Supports inferences based on the belief that its argument is
not true
– Consider
• p(X) unless q(X) → r(X)
If p(X) is true and q(X) is not believed true, then r(X)
• p(Z)
• r(W) → s(W)
From the above, conclude s(X).
If we later change belief or find q(X) true, what happens?
Retract r(X) and s(X)
– unless deals with belief, not truth
• Either unknown or believed false
• Believed or known true
– Reasoning becomes non-monotonic
38. Is-consistent-with Operator M
• When reasoning, make sure the premises are
consistent
• Format: M p – p is consistent with the KB
• Consider
– ∀X good_student(X) ∧ M study_hard(X) →
graduates(X)
– For every X who is a good student, if the fact that X
studies hard is consistent with the KB, then X will
graduate
– It is not necessary to prove that X studies hard
• How to decide whether p is consistent with the KB:
– Negation as failure
– Heuristic-based and limited search
39. Default Logic
• Introduces a new format of inference rules:
– A(Z) : B(Z) / C(Z)
– If A(Z) is provable, and it is consistent with what we
know to assume B(Z), then conclude C(Z)
• Compare with the is-consistent-with operator:
– Similar
– The difference is the reasoning method
• In default logic, new rules are used to infer sets of plausible
extensions
– Example:
∀X good_student(X) : study_hard(X) / graduates(X)
∀Y party(Y) : ¬study_hard(Y) / ¬graduates(Y)
40. Fuzzy Sets
• Classic sets
– Completeness: x is in either A or ¬A
– Exclusivity: x cannot be in both A and ¬A
• Fuzzy sets
– Violate these two assumptions
– Possibility theory – a measure of confidence or belief
– Probability theory – randomness
– Process imprecision
– Introduce a membership function
– x belongs to A to some degree between 0 and 1,
inclusive
43. Fuzzy Set Operations
• Fuzzy set operations are defined as operations
on membership functions
• Complement: ¬A = C
– mC = 1 – mA
• Union: A ∪ B = C
– mC = max(mA, mB)
• Intersection: A ∩ B = C
– mC = min(mA, mB)
• Difference: A – B = C
– mC = max(0, mA – mB)
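The four membership-function operations, applied to one element's membership values:

```python
# Fuzzy set operations on membership values m_A, m_B in [0, 1].
def f_not(ma):        return 1 - ma          # complement
def f_union(ma, mb):  return max(ma, mb)     # union
def f_inter(ma, mb):  return min(ma, mb)     # intersection
def f_diff(ma, mb):   return max(0, ma - mb) # difference

ma, mb = 0.7, 0.4
print(f_not(ma), f_union(ma, mb), f_inter(ma, mb), f_diff(ma, mb))
```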
44. Fuzzy Inference Rules
• Rule format and computation
– If x is A and y is B then z is C
mC(z) = min(mA(x), mB(y))
– If x is A or y is B then z is C
mC(z) = max(mA(x), mB(y))
– If x is not A then z is C
mC(z) = 1 – mA(x)
45. (Figure) The fuzzy regions for the input values θ (a) and dθ/dt (b).
N – Negative, Z – Zero, P – Positive
46. (Figure) The fuzzy regions of the output value u, indicating the
movement of the pendulum base: Negative Big, Negative, Zero,
Positive, Positive Big
48. (Figure) The Fuzzy Associative Matrix (FAM) for the pendulum
problem; the input values are on the left and top, with the fuzzy rules
shown in the matrix
49. (Figure) The fuzzy consequents (a) and their union (b). The
centroid of the union (−2) is the crisp output
50. Dempster-Shafer Theory
• Probability theory limitations:
– Assigns a single number to measure any situation, no matter how
complex it is
– Cannot deal with missing evidence, heuristics, and limited knowledge
• Dempster-Shafer theory:
– Extends probability theory
– Considers a set of propositions as a whole
– Assigns a set of propositions an interval [belief, plausibility] to constrain
the degree of belief in each individual proposition in the set
– The belief measure bel is in [0,1]
• 0 – no supporting evidence for a set of propositions
• 1 – full supporting evidence for a set of propositions
– The plausibility of p:
• pl(p) = 1 – bel(not(p))
• Reflects how evidence for not(p) limits the possible belief in p
• bel(not(p)) = 1: full support for not(p), no possibility for p
• bel(not(p)) = 0: no support for not(p), full possibility for p
• Its range is also [0,1]
51. Properties of Dempster-Shafer
• Initially there is no supporting evidence for either of two competing
hypotheses, say h1 and h2:
– Dempster-Shafer: [bel, pl] = [0, 1]
– Probability theory: p(h1) = p(h2) = 0.5
• Dempster-Shafer belief functions satisfy weaker
axioms than probability functions
• Two fundamental ideas:
– Obtaining belief degrees for one question from
subjective probabilities for related questions
– Using Dempster's rule to combine these belief degrees
when they are based on independent evidence
52. An Example
• Two persons, M and B, of known reliability examine a computer and
report their findings independently. How much should you believe their claims?
• Question (Q): the detection claim
• Related question (RQ): the detectors' reliability
• Dempster-Shafer approach:
– Obtain belief degrees for Q from subjective (prior) probabilities for RQ,
for each person
– Combine the belief degrees from the two persons
• Person M:
– reliability 0.9, unreliability 0.1
– claims h1
– Belief degree of h1 is bel(h1) = 0.9
– Belief degree of not(h1) is bel(not(h1)) = 0.0 – different from probability
theory, since there is no evidence supporting not(h1)
– pl(h1) = 1 – bel(not(h1)) = 1 – 0 = 1
– Thus the belief measure for M's claim h1 is [0.9, 1]
• Person B:
– reliability 0.8, unreliability 0.2
– claims h2
– bel(h2) = 0.8, bel(not(h2)) = 0, pl(h2) = 1 – bel(not(h2)) = 1 – 0 = 1
– Thus the belief measure for B's claim h2 is [0.8, 1]
53. Combining Belief Measures
• Set of propositions: M claims h1 and B claims h2
– Case 1: h1 = h2
• Both M and B reliable: 0.9 × 0.8 = 0.72
• Both M and B unreliable: 0.1 × 0.2 = 0.02
• The probability that at least one of the two is reliable: 1 − 0.02 = 0.98
• Belief measure for h1 = h2 is [0.98, 1]
– Case 2: h1 = not(h2)
• M and B cannot both be correct, hence cannot both be reliable
• At least one is unreliable:
– Reliable M and unreliable B: 0.9 × (1 − 0.8) = 0.18
– Reliable B and unreliable M: 0.8 × (1 − 0.9) = 0.08
– Unreliable M and B: (1 − 0.9) × (1 − 0.8) = 0.02
– At least one unreliable: 0.18 + 0.08 + 0.02 = 0.28
• Given that at least one is unreliable, the posterior probabilities:
– Reliable M and unreliable B: 0.18/0.28 = 0.643
– Reliable B and unreliable M: 0.08/0.28 = 0.286
• Belief measure for h1:
– bel(h1) = 0.643, bel(not(h1)) = bel(h2) = 0.286
– pl(h1) = 1 − bel(not(h1)) = 1 − 0.286 = 0.714
– Belief measure: [0.643, 0.714]
• Belief measure for h2:
– bel(h2) = 0.286, bel(not(h2)) = bel(h1) = 0.643
– pl(h2) = 1 − bel(not(h2)) = 1 − 0.643 = 0.357
– Belief measure: [0.286, 0.357]
54. Dempster's Rule
• Assumption:
– The questions are independent a priori
– As new evidence is collected and conflicts arise, this independence may
disappear
• Two steps:
1. Sort the uncertainties into a priori independent pieces of evidence
2. Apply Dempster's rule
• Consider the previous example:
– After M and B made their claims, a repair person is called to check the
computer, and both M and B witnessed this
– Three independent items of evidence must now be combined
• Not all evidence directly supports individual
elements of a set of hypotheses; it often supports
different subsets of hypotheses, in favor of some and
against others
55. General Dempster's Rule
• Q – an exhaustive set of mutually exclusive
hypotheses
• Z – a subset of Q
• M – a probability (mass) function assigning a belief
measure to Z
• Mn(Z) – the belief degree assigned to Z, where n is the number of
sources of evidence
56. Discrete Markov Process
• Finite state machine
– A graphical representation
– State transitions depend on the input stream
– States and transitions reflect properties of a formal
language
• Probabilistic finite state machine
– A finite state machine
– Transition function represented by a probability
distribution on the current state
• Discrete Markov process (chain, machine)
– A specialization of the probabilistic finite state machine
– Ignores its input values
57. A Markov state machine or Markov chain with four states, s1,
…, s4
At any time the system is in one of the distinct states
The system undergoes a state change or remains in the same state
Divide time into discrete intervals: t1, t2, …, tn
The state changes according to the probability distribution of
each state
S(t) – the actual state at time t
p(S(t)) = p(S(t) | S(t−1), S(t−2), S(t−3), …)
First-order Markov chain
– Depends only on the direct predecessor state
– p(S(t)) = p(S(t) | S(t−1))
58. Observable Markov Model
• Assume p(S(t)|S(t−1)) is time invariant, that is, transitions between
specific states retain the same probabilistic relationship
• State transition probability aij between si and sj:
– aij = p(S(t) = si | S(t−1) = sj), 1 ≤ i, j ≤ N
– If i = j, the system remains in the same state
– Properties: aij ≥ 0, Σi aij = 1
59. S1 – sun; S2 – cloudy; S3 – fog; S4 – precipitation
Time intervals: noon to noon
Question: suppose that today is sunny; what is
the probability of the next five days being
sunny, sunny, cloudy, cloudy, precipitation?
(The transition matrix for this example is given in a figure not reproduced here.)
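The slide's 4-state transition matrix lives in a figure that is not reproduced here, so this sketch uses a hypothetical row-stochastic matrix just to show how such a sequence probability is computed:

```python
# Hypothetical transition matrix: A[i][j] = P(next = j | current = i).
# Rows sum to 1; the actual numbers from the slide's figure are unknown.
A = [
    [0.5, 0.25, 0.15, 0.10],   # from sun
    [0.3, 0.40, 0.20, 0.10],   # from cloudy
    [0.2, 0.30, 0.30, 0.20],   # from fog
    [0.1, 0.30, 0.20, 0.40],   # from precipitation
]
SUN, CLOUDY, FOG, PRECIP = range(4)

# P(sun, sun, cloudy, cloudy, precip | today = sun): chain the aij.
seq = [SUN, SUN, CLOUDY, CLOUDY, PRECIP]
p, state = 1.0, SUN
for nxt in seq:
    p *= A[state][nxt]
    state = nxt
print(p)  # ≈ 0.0025 with the hypothetical matrix above
```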
60. Restrictiveness of Markov models
• Are past and future really independent given the current state?
• E.g., suppose that when it rains, it rains for at most 2 days
S1 → S2 → S3 → S4 → …
• This would need a second-order Markov process
• Workaround: change the meaning of "state" to events of the last 2 days:
(S1,S2) → (S2,S3) → (S3,S4) → (S4,S5) → …
• Another approach: add more information to the state
• E.g., the full state of the world would include whether the
sky is full of water
– The additional information may not be observable
– Blowup of the number of states…
61. Hidden Markov models (HMMs)
• Same as Markov model, except we cannot see the
state
• Instead, we only see an observation each period,
which depends on the current state
[Hidden state chain: S1 → S2 → S3 → … → St → …]
• Still need a transition model: P(St+1 = j | St = i) = aij
• Also need an observation model: P(Ot = k | St = i) = bik
[Observation sequence: O1, O2, O3, …, Ot, …, with Ot emitted from St]
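To make the two models concrete, here is a toy generator with hypothetical two-state numbers: the state sequence evolves by the transition model a, while each observation is drawn from the observation model b of the current (hidden) state:

```python
import random

# Illustrative two-state HMM; all numbers are made up for the sketch.
# a[i][j] = P(S_{t+1}=j | S_t=i), b[i][k] = P(O_t=k | S_t=i)
a = {"s1": {"s1": 0.7, "s2": 0.3}, "s2": {"s1": 0.4, "s2": 0.6}}
b = {"s1": {"o1": 0.9, "o2": 0.1}, "s2": {"o1": 0.2, "o2": 0.8}}

def sample(dist, rng):
    """Draw one outcome from a {value: probability} distribution."""
    r, acc = rng.random(), 0.0
    for value, p in dist.items():
        acc += p
        if r < acc:
            return value
    return value  # guard against floating-point rounding

def generate(start, steps, seed=0):
    """Generate (state, observation) pairs: each observation is emitted
    from the current hidden state, then the state transitions."""
    rng = random.Random(seed)
    state, out = start, []
    for _ in range(steps):
        out.append((state, sample(b[state], rng)))
        state = sample(a[state], rng)
    return out
```

An observer of the output sees only the second element of each pair; the first (the state) is hidden, which is exactly what makes inference necessary.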
62. Weather example extended to HMM
• Transition probabilities:
[Figure: transition diagram over states s (sun), c (cloudy), r (rain) with
edge probabilities .1, .2, .6, .3, .4, .3, .3, .5, .3]
• Observation: labmate wet or dry
• bsw = .1, bcw = .3, brw = .8
63. HMM weather example: a question
• You have been stuck in the lab for three days (!)
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that it is now raining outside?
• P(S2 = r | O0 = d, O1 = w, O2 = w)
• By Bayes’ rule, it suffices to compute the joint P(S2, O0 = d, O1 = w, O2 = w) and normalize over S2
64. Solving the question
• Computationally efficient approach: first compute
P(S1 = i, O0 = d, O1 = w) for all states i
• General case: solve for P(St, O0 = o0, O1 = o1, …, Ot
= ot) for t=1, then t=2, … This is called monitoring
• P(St, O0 = o0, O1 = o1, …, Ot = ot) =
Σst-1 P(St-1 = st-1, O0 = o0, O1 = o1, …, Ot-1 = ot-1) P(St | St-1 = st-1) P(Ot = ot | St)
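The monitoring recursion on this slide is the forward algorithm; a sketch for the lab example follows. The wet-observation probabilities (.1, .3, .8) are from the slides; the transition matrix and the uniform prior are illustrative placeholders for the figure:

```python
# Forward (monitoring) recursion from the slide:
#   P(S_t, o_0..o_t) = sum_{s_{t-1}} P(S_{t-1}=s_{t-1}, o_0..o_{t-1})
#                      * P(S_t | S_{t-1}=s_{t-1}) * P(o_t | S_t)
states = ["s", "c", "r"]
a = {  # a[i][j] = P(next=j | current=i); placeholder numbers for the figure
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.5, "r": 0.3},
}
b_wet = {"s": 0.1, "c": 0.3, "r": 0.8}  # from the slide
b = {i: {"w": b_wet[i], "d": 1.0 - b_wet[i]} for i in states}
prior = {i: 1.0 / 3 for i in states}    # assumed uniform P(S_0)

def forward(obs):
    """Return alpha[t][i] = P(S_t = i, o_0, ..., o_t)."""
    alpha = [{i: prior[i] * b[i][obs[0]] for i in states}]
    for o in obs[1:]:
        prev = alpha[-1]
        alpha.append({
            j: sum(prev[i] * a[i][j] for i in states) * b[j][o]
            for j in states
        })
    return alpha

# Labmate was dry, wet, wet; normalize the final joint to get the posterior.
alpha = forward(["d", "w", "w"])
total = sum(alpha[-1].values())
posterior_rain = alpha[-1]["r"] / total  # P(S_2 = r | d, w, w)
```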
65. Predicting further out
• You have been stuck in the lab for three days
• On those days, your labmate was dry, wet, wet,
respectively
• What is the probability that two days from now it
will be raining outside?
• P(S4 = r | O0 = d, O1 = w, O2 = w)
66. Predicting further out, continued…
• Want to know: P(S4 = r | O0 = d, O1 = w, O2 = w)
• Already know how to get: P(S2 | O0 = d, O1 = w, O2 = w)
• P(S3 = r | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r, S2 = s2 | O0 = d, O1 = w, O2 = w) =
Σs2 P(S3 = r | S2 = s2) P(S2 = s2 | O0 = d, O1 = w, O2 = w)
• Etc. for S4
• So: monitoring first, then straightforward Markov process
updates
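Once monitoring has produced P(S2 | evidence), predicting further out needs only repeated transition updates, as the slide says. A sketch in which both the transition matrix and the starting posterior are illustrative placeholders:

```python
# Prediction step from the slide:
#   P(S_{t+1}=j | evidence) = sum_i P(S_{t+1}=j | S_t=i) P(S_t=i | evidence)
states = ["s", "c", "r"]
a = {  # placeholder transition numbers for the slide's figure
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.5, "r": 0.3},
}

def predict(posterior, steps):
    """Push a state distribution forward with the transition model alone."""
    dist = dict(posterior)
    for _ in range(steps):
        dist = {j: sum(dist[i] * a[i][j] for i in states) for j in states}
    return dist

# Suppose monitoring gave this posterior for S_2 (illustrative numbers):
p_s2 = {"s": 0.1, "c": 0.3, "r": 0.6}
p_s4 = predict(p_s2, 2)  # P(S_4 | O_0=d, O_1=w, O_2=w), two steps ahead
```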
67. Integrating newer information
• You have been stuck in the lab for four days (!)
• On those days, your labmate was dry, wet, wet, dry
respectively
• What is the probability that two days ago it was raining outside?
P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
– This is the smoothing or hindsight problem
68. Hindsight problem continued…
• Want: P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d)
• “Partial” application of Bayes’ rule:
P(S1 = r | O0 = d, O1 = w, O2 = w, O3 = d) =
P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) /
P(O2 = w, O3 = d | O0 = d, O1 = w)
• So really want to know P(S1, O2 = w, O3 = d | O0 = d, O1 = w)
69. Hindsight problem continued…
• Want to know P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w)
• P(S1 = r, O2 = w, O3 = d | O0 = d, O1 = w) =
P(S1 = r | O0 = d, O1 = w) P(O2 = w, O3 = d | S1 = r)
(future observations are independent of past observations given S1)
• Already know how to compute P(S1 = r | O0 = d, O1 = w)
• Just need to compute P(O2 = w, O3 = d | S1 = r)
70. Hindsight problem continued…
• Just need to compute P(O2 = w, O3 = d | S1 = r)
• P(O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2, O2 = w, O3 = d | S1 = r) =
Σs2 P(S2 = s2 | S1 = r) P(O2 = w | S2 = s2) P(O3 = d | S2 = s2)
• First two factors directly in the model; last factor is a
“smaller” problem of the same kind
• Use dynamic programming, backwards from the future
– Similar to forwards approach from the past
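The backwards dynamic program on this slide can be sketched as follows. It builds β[i] = P(future observations | current state i) from the latest observation backwards; the wet probabilities are from the slides, while the transition matrix is an illustrative placeholder for the figure:

```python
# Backward recursion from the slide, run from the future toward the past:
#   P(o_{t+1}..o_T | S_t=i) =
#       sum_j a[i][j] * b[j][o_{t+1}] * P(o_{t+2}..o_T | S_{t+1}=j)
states = ["s", "c", "r"]
a = {  # placeholder transition numbers for the slide's figure
    "s": {"s": 0.6, "c": 0.3, "r": 0.1},
    "c": {"s": 0.4, "c": 0.3, "r": 0.3},
    "r": {"s": 0.2, "c": 0.5, "r": 0.3},
}
b_wet = {"s": 0.1, "c": 0.3, "r": 0.8}  # from the slide
b = {i: {"w": b_wet[i], "d": 1.0 - b_wet[i]} for i in states}

def backward(future_obs):
    """beta[i] = P(future_obs | S_t = i), built by dynamic programming
    from the last observation back to the first (mirroring the forward pass)."""
    beta = {i: 1.0 for i in states}  # base case: no future evidence
    for o in reversed(future_obs):
        beta = {i: sum(a[i][j] * b[j][o] * beta[j] for j in states)
                for i in states}
    return beta

# The slide's smoothing question needs P(O_2 = w, O_3 = d | S_1 = i):
beta1 = backward(["w", "d"])
```

Multiplying beta1 by the monitoring result P(S1 | O0 = d, O1 = w) and normalizing gives the smoothed posterior from the previous slide.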