The document provides an overview of probability concepts including events, sample spaces, random variables, probability distributions, independence, conditional probability, Bayes' rule, mean, variance, and examples. It first defines events, sample spaces, and probability measures. It then introduces discrete and continuous random variables as well as common probability distributions. The document discusses joint, marginal, and conditional probability distributions and how to calculate probabilities using rules like the chain rule, total probability, and Bayes' rule. It also covers independence, conditional independence, mean, variance, and their properties. Finally, it gives the Monty Hall problem as an example and solves it using Bayes' rule.
2. Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
3. Sample space and Events
• Ω : Sample space, the set of possible results of an experiment
• If you toss a coin twice, Ω = {HH, HT, TH, TT}
• Event: a subset of Ω
• First toss is head = {HH, HT}
• S: event space, a set of events:
• Closed under finite union and complements
• Entails closure under other operations: intersection, difference, etc.
• Contains the empty event and Ω
4. Probability Measure
• Defined over (Ω, S) s.t.
• P(a) >= 0 for all a in S
• P(Ω) = 1
• If a, b are disjoint, then
• P(a U b) = P(a) + P(b)
• We can deduce other properties from the above axioms
• Ex: P(a U b) for non-disjoint events (checked numerically in the sketch below)
P(a U b) = P(a) + P(b) – P(a ∩ b)
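A minimal Python sketch (mine, not from the slides) that enumerates the two-coin-toss sample space above and checks the inclusion-exclusion identity; the event names are illustrative.

from fractions import Fraction

omega = {"HH", "HT", "TH", "TT"}                   # sample space for two coin tosses
def P(event):                                      # uniform probability measure
    return Fraction(len(event), len(omega))

a = {o for o in omega if o[0] == "H"}              # first toss is heads
b = {o for o in omega if o[1] == "H"}              # second toss is heads

assert P(a | b) == P(a) + P(b) - P(a & b)          # inclusion-exclusion holds
print(P(a | b))                                    # 3/4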
5. Visualization
• We can go on and define conditional
probability, using the above visualization
7. Rule of total probability
[Figure: the sample space partitioned into disjoint events B1, …, B7, with event A overlapping several of them]
P(A) = Σi P(A | Bi) P(Bi)   (see the sketch below)
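A small numeric sketch of the rule (the three-event partition and the conditional probabilities below are invented for illustration):

# Partition B1, B2, B3 of the sample space, with P(A | Bi) given
p_B       = {"B1": 0.5, "B2": 0.3, "B3": 0.2}      # P(Bi), sums to 1
p_A_given = {"B1": 0.9, "B2": 0.5, "B3": 0.1}      # P(A | Bi)

p_A = sum(p_A_given[b] * p_B[b] for b in p_B)      # P(A) = sum_i P(A | Bi) P(Bi)
print(round(p_A, 2))                               # 0.62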
8. From Events to Random Variable
• Almost all semester we will be dealing with RVs
• Concise way of specifying attributes of outcomes
• Modeling students (Grade and Intelligence):
• Ω = all possible students
• What are the events?
• Grade_A = all students with grade A
• Grade_B = all students with grade B
• Intelligence_High = … with high intelligence
• Very cumbersome
• We need “functions” that map from Ω to an
attribute space.
• P(G = A) = P({student ∈ Ω : G(student) = A})
10. Discrete Random Variables
• Random variables (RVs) which may take on
only a countable number of distinct values
– E.g. the total number of tails X you get if you flip
100 coins
• X is a RV with arity k if it can take on exactly
one value out of {x1, …, xk}
– E.g. the possible values that X can take on are 0, 1,
2, …, 100
11. Probability of Discrete RV
• Probability mass function (pmf): P(X = xi)
• Easy facts about pmf
Σi P(X = xi) = 1
P(X = xi∩X = xj) = 0 if i ≠ j
P(X = xi U X = xj) = P(X = xi) + P(X = xj) if i ≠ j
P(X = x1 U X = x2 U … U X = xk) = 1
12. Common Distributions
• Uniform: X ∼ U[1, …, N]
X takes values 1, 2, …, N
P(X = i) = 1/N
E.g. picking balls of different colors from a box
• Binomial: X ∼ Bin(n, p)
X takes values 0, 1, …, n
E.g. coin flips
P(X = i) = (n choose i) p^i (1 − p)^(n−i)   (evaluated in the sketch below)
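A short Python sketch (not part of the slides) that evaluates the binomial pmf with math.comb and checks that it sums to 1; the parameters are illustrative.

from math import comb

def binomial_pmf(i, n, p):
    # P(X = i) for X ~ Bin(n, p)
    return comb(n, i) * p**i * (1 - p)**(n - i)

n, p = 10, 0.3                                     # illustrative parameters
pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
print(round(sum(pmf), 10))                         # 1.0 (the pmf sums to one)
print(round(binomial_pmf(3, n, p), 4))             # P(X = 3) ≈ 0.2668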
13. Continuous Random Variables
• Probability density function (pdf) instead of
probability mass function (pmf)
• A pdf is any function f(x) that describes the
probability density in terms of the input
variable x.
14. Probability of Continuous RV
• Properties of the pdf:
f(x) ≥ 0 for all x
∫ f(x) dx = 1   (integrating over the whole range of X)
• Actual probability can be obtained by taking
the integral of the pdf
• E.g. the probability of X being between 0 and 1 is
P(0 ≤ X ≤ 1) = ∫_0^1 f(x) dx
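As a quick numeric check (my own sketch; the Exp(1) density f(x) = e^(−x) on x ≥ 0 is just an illustrative pdf), a Riemann sum recovers P(0 ≤ X ≤ 1) = 1 − e^(−1):

import math

f = lambda x: math.exp(-x)                         # Exp(1) density for x >= 0
dx = 1e-5
prob = sum(f(k * dx) * dx for k in range(int(1 / dx)))   # approximate integral over [0, 1]
print(round(prob, 4), round(1 - math.exp(-1), 4))  # both ≈ 0.6321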
15. Cumulative Distribution Function
• F_X(v) = P(X ≤ v)
• Discrete RVs:
F_X(v) = Σ_{vi ≤ v} P(X = vi)
• Continuous RVs:
F_X(v) = ∫_{−∞}^{v} f(x) dx
d/dx F_X(x) = f(x)
16. Common Distributions
• Normal: X ∼ N(μ, σ²)
E.g. the height of the entire population
f(x) = 1 / (σ √(2π)) · exp( −(x − μ)² / (2σ²) )
17. Multivariate Normal
• Generalization to higher dimensions of the
one-dimensional normal
f_X(x_1, …, x_d) = 1 / ( (2π)^(d/2) |Σ|^(1/2) ) · exp( −½ (x − μ)^T Σ^(−1) (x − μ) )
where μ is the mean vector and Σ is the covariance matrix
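A compact NumPy sketch (mine, not from the slides) evaluating this density; the mean vector and covariance matrix below are illustrative, and scipy.stats.multivariate_normal.pdf can serve as a cross-check.

import numpy as np

def mvn_pdf(x, mu, Sigma):
    # density of N(mu, Sigma) evaluated at the point x
    d = len(mu)
    diff = x - mu
    norm_const = 1.0 / np.sqrt((2 * np.pi) ** d * np.linalg.det(Sigma))
    return norm_const * np.exp(-0.5 * diff @ np.linalg.solve(Sigma, diff))

mu = np.array([0.0, 0.0])                          # illustrative mean vector
Sigma = np.array([[1.0, 0.5],                      # illustrative covariance matrix
                  [0.5, 2.0]])
print(round(mvn_pdf(np.array([0.0, 0.0]), mu, Sigma), 4))   # ≈ 0.1203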
18. Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
19. Joint Probability Distribution
• Random variables encode attributes
• Not all possible combinations of attributes are equally
likely
• Joint probability distributions quantify this
• P(X = x, Y = y) = P(x, y)
• Generalizes to N RVs
• Σ_x Σ_y P(X = x, Y = y) = 1
• ∫ ∫ f_{X,Y}(x, y) dx dy = 1
21. Conditional Probability
P(X = x | Y = y) = P(X = x, Y = y) / P(Y = y)
But we will always write it this way (for events x, y):
p(x | y) = p(x, y) / p(y)
22. Marginalization
• We know p(X, Y); what is P(X = x)?
• We can use the law of total probability. Why?
p(x) = Σ_y P(x, y) = Σ_y P(x | y) P(y)   (see the sketch below)
[Figure: the same partition diagram as on slide 7, with event A and B1, …, B7]
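A tiny sketch (the joint table is hypothetical) showing marginalization and conditioning on a discrete joint distribution:

# Hypothetical joint distribution P(X, Y) with X, Y in {0, 1}
P_xy = {(0, 0): 0.30, (0, 1): 0.20,
        (1, 0): 0.10, (1, 1): 0.40}

# Marginalization: P(X = x) = sum_y P(x, y)
P_x = {x: sum(p for (xx, _), p in P_xy.items() if xx == x) for x in (0, 1)}
print(P_x)                                         # {0: 0.5, 1: 0.5}

# Conditioning: P(Y = 1 | X = 1) = P(X = 1, Y = 1) / P(X = 1)
print(P_xy[(1, 1)] / P_x[1])                       # 0.8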
24. Bayes Rule
• We know that P(rain) = 0.5
• If we also know that the grass is wet, how does
this affect our belief about whether it rained or not?
P(rain | wet) = P(rain) P(wet | rain) / P(wet)   (worked numerically in the sketch below)
• In general:
P(x | y) = P(x) P(y | x) / P(y)
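A worked sketch of the grass example (only P(rain) = 0.5 comes from the slide; the likelihoods below are invented for illustration):

p_rain = 0.5                                       # prior, from the slide
p_wet_given_rain = 0.9                             # assumed likelihood (illustrative)
p_wet_given_dry  = 0.2                             # assumed likelihood (illustrative)

# evidence term via the law of total probability
p_wet = p_wet_given_rain * p_rain + p_wet_given_dry * (1 - p_rain)

# Bayes' rule: P(rain | wet) = P(rain) P(wet | rain) / P(wet)
p_rain_given_wet = p_rain * p_wet_given_rain / p_wet
print(round(p_rain_given_wet, 3))                  # 0.818, up from the prior 0.5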
25. Bayes Rule cont.
• You can condition on more variables
P(x | y, z) = P(x | z) P(y | x, z) / P(y | z)
26. Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
27. Independence
• X is independent of Y means that knowing Y
does not change our belief about X.
• P(X|Y=y) = P(X)
• P(X=x, Y=y) = P(X=x) P(Y=y)
• The above should hold for all x, y
• It is symmetric and written as X ⊥ Y
28. Independence
• X1, …, Xn are independent if and only if
• If X1, …, Xn are independent and identically
distributed we say they are iid (or that they
are a random sample) and we write
P(X1 ∈ A1, …, Xn ∈ An) = ∏_{i=1}^{n} P(Xi ∈ Ai)
X1, …, Xn ∼ P
29. CI: Conditional Independence
• RVs are rarely independent, but we can still
leverage local structural properties like
Conditional Independence.
• X ⊥ Y | Z if, once Z is observed, knowing the
value of Y does not change our belief about X
• E.g. rain ⊥ sprinkler's on | cloudy
• But rain ⊥ sprinkler's on | wet grass does not hold
(once we see wet grass, rain and the sprinkler compete to explain it)
31. Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
32. Mean and Variance
• Mean (Expectation): μ = E(X)
– Discrete RVs:
E(X) = Σ_{vi} vi P(X = vi)
– Continuous RVs:
E(X) = ∫ x f(x) dx
• For a function g of the RV:
– Discrete RVs: E(g(X)) = Σ_{vi} g(vi) P(X = vi)
– Continuous RVs: E(g(X)) = ∫ g(x) f(x) dx
33. Mean and Variance
• Variance: σ² = Var(X)
– Discrete RVs:
V(X) = Σ_{vi} (vi − μ)² P(X = vi)
– Continuous RVs:
V(X) = ∫ (x − μ)² f(x) dx
– Equivalently: Var(X) = E((X − μ)²) = E(X²) − μ²
• Covariance:
Cov(X, Y) = E((X − μ_x)(Y − μ_y)) = E(XY) − μ_x μ_y
(computed for a small example in the sketch below)
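A short Python sketch (illustrative joint pmf, not from the slides) computing mean, variance and covariance for discrete RVs:

# Hypothetical joint pmf of (X, Y); the probabilities sum to 1
P_xy = {(0, 0): 0.2, (0, 1): 0.3, (1, 0): 0.1, (1, 1): 0.4}

def E(g):                                          # E(g(X, Y)) under the pmf above
    return sum(g(x, y) * p for (x, y), p in P_xy.items())

mu_x, mu_y = E(lambda x, y: x), E(lambda x, y: y)
var_x = E(lambda x, y: (x - mu_x) ** 2)            # Var(X) = E((X - mu)^2)
cov   = E(lambda x, y: x * y) - mu_x * mu_y        # Cov(X, Y) = E(XY) - mu_x mu_y
print(round(mu_x, 2), round(var_x, 2), round(cov, 2))   # 0.5 0.25 0.05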
35. Properties
• Mean
– E(X + Y) = E(X) + E(Y)
– E(aX) = a E(X)
– If X and Y are independent, E(XY) = E(X) E(Y)
• Variance
– V(aX + b) = a² V(X)
– If X and Y are independent, V(X + Y) = V(X) + V(Y)
36. Some more properties
• The conditional expectation of Y given X, when
the value of X = x, is:
E(Y | X = x) = ∫ y · p(y | x) dy
• The Law of Total Expectation or Law of
Iterated Expectation:
E(Y) = E( E(Y | X) ) = ∫ E(Y | X = x) p(x) dx
37. Some more properties
• The law of Total Variance:
Var(Y) = Var( E(Y | X) ) + E( Var(Y | X) )   (verified numerically in the sketch below)
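A numeric sketch (same style of hypothetical joint pmf as above, not from the slides) verifying both laws on a small discrete example:

# Hypothetical joint pmf of (X, Y)
P_xy = {(0, 1): 0.2, (0, 3): 0.3, (1, 2): 0.1, (1, 6): 0.4}
P_x  = {x: sum(p for (xx, _), p in P_xy.items() if xx == x) for x in (0, 1)}

def E_Y_given(x):                                  # E(Y | X = x)
    return sum(y * p for (xx, y), p in P_xy.items() if xx == x) / P_x[x]

def Var_Y_given(x):                                # Var(Y | X = x)
    m = E_Y_given(x)
    return sum((y - m) ** 2 * p for (xx, y), p in P_xy.items() if xx == x) / P_x[x]

E_Y   = sum(y * p for (_, y), p in P_xy.items())
Var_Y = sum((y - E_Y) ** 2 * p for (_, y), p in P_xy.items())

EE_Y  = sum(E_Y_given(x) * P_x[x] for x in P_x)                    # E(E(Y | X))
tot   = sum((E_Y_given(x) - EE_Y) ** 2 * P_x[x] for x in P_x) \
        + sum(Var_Y_given(x) * P_x[x] for x in P_x)                # Var(E(Y|X)) + E(Var(Y|X))

print(round(E_Y, 3), round(EE_Y, 3))               # 3.7 3.7   (total expectation)
print(round(Var_Y, 3), round(tot, 3))              # 4.01 4.01 (total variance)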
38. Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
40. Statistical Inference
• Given observations from a model
– What (conditional) independence assumptions
hold?
• Structure learning
– If you know the family of the model (e.g.,
multinomial), what are the values of the
parameters? MLE, Bayesian estimation.
• Parameter learning
41. Probability Review
• Events and Event spaces
• Random variables
• Joint probability distributions
• Marginalization, conditioning, chain rule,
Bayes Rule, law of total probability, etc.
• Structural properties
• Independence, conditional independence
• Mean and Variance
• The big picture
• Examples
42. Monty Hall Problem
• You're given the choice of three doors: Behind one
door is a car; behind the others, goats.
• You pick a door, say No. 1
• The host, who knows what's behind the doors, opens
another door, say No. 3, which has a goat.
• Do you want to pick door No. 2 instead?
43. [Figure: decision tree for the host's options: if your door hides the car, the host may reveal Goat A or Goat B; if your door hides Goat A, the host must reveal Goat B; if it hides Goat B, the host must reveal Goat A]
44. Monty Hall Problem: Bayes Rule
• C_i : the car is behind door i, i = 1, 2, 3
• P(C_i) = 1/3
• H_ij : the host opens door j after you pick door i
• P(H_ij | C_k) =
0     if i = j or j = k
1/2   if i = k and j ≠ k
1     if i ≠ k, j ≠ k, i ≠ j
45. Monty Hall Problem: Bayes Rule cont.
• WLOG, i=1, j=3
• P(C_1 | H_13) = P(H_13 | C_1) P(C_1) / P(H_13)
• P(H_13 | C_1) P(C_1) = (1/2) · (1/3) = 1/6
46. Monty Hall Problem: Bayes Rule cont.
• P(H_13) = P(H_13, C_1) + P(H_13, C_2) + P(H_13, C_3)
= P(H_13 | C_1) P(C_1) + P(H_13 | C_2) P(C_2)
= 1/6 + 1/3 = 1/2
• P(C_1 | H_13) = (1/6) / (1/2) = 1/3
47. Monty Hall Problem: Bayes Rule cont.
• P(C_1 | H_13) = (1/6) / (1/2) = 1/3
• P(C_2 | H_13) = 1 − P(C_1 | H_13) = 2/3 > P(C_1 | H_13)
• You should switch! (simulated in the sketch below)
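A quick Monte Carlo sketch (mine, not on the slides) that reproduces the 1/3 vs. 2/3 split empirically:

import random

def monty_hall(switch, trials=100_000):
    wins = 0
    for _ in range(trials):
        car = random.randint(1, 3)
        pick = 1                                   # WLOG you always pick door 1
        if car == pick:
            host = random.choice([2, 3])           # both unpicked doors hide goats
        else:
            host = 2 if car == 3 else 3            # host must avoid the car door
        if switch:
            pick = 6 - pick - host                 # the remaining unopened door
        wins += (pick == car)
    return wins / trials

print(monty_hall(switch=False))                    # ≈ 1/3
print(monty_hall(switch=True))                     # ≈ 2/3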
48. Information Theory
• P(X) encodes our uncertainty about X
• Some variables are more uncertain than others
• How can we quantify this intuition?
• Entropy: average number of bits required to encode X
[Figure: two example distributions P(X) and P(Y) with different amounts of uncertainty]
H(X) = E_P[ log 1/P(x) ] = Σ_x P(x) log( 1/P(x) ) = −Σ_x P(x) log P(x)
49. Information Theory cont.
• Entropy: average number of bits required to encode X
H(X) = E_P[ log 1/P(x) ] = −Σ_x P(x) log P(x)
• We can define conditional entropy similarly
H_P(X | Y) = E_P[ log 1/p(x | y) ] = H(X, Y) − H(Y)
• i.e. once Y is known, we only need H(X, Y) − H(Y) bits
• We can also define a chain rule for entropies (not surprising)
H_P(X, Y, Z) = H_P(X) + H_P(Y | X) + H_P(Z | X, Y)
50. Mutual Information: MI
• Remember independence?
• If X ⊥ Y then knowing Y won't change our belief about X
• Mutual information can help quantify this! (not the only
way though)
• MI:
I_P(X; Y) = H_P(X) − H_P(X | Y) = Σ_{x,y} p(x, y) log[ p(x, y) / (p(x) p(y)) ]
• “The amount of uncertainty in X which is removed by
knowing Y”
• Symmetric
• I(X; Y) = 0 iff X and Y are independent!
(computed for a small joint table in the sketch below)
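A small sketch (hypothetical joint table) computing entropies and mutual information in bits:

from math import log2

# Hypothetical joint distribution p(x, y)
p_xy = {(0, 0): 0.4, (0, 1): 0.1,
        (1, 0): 0.1, (1, 1): 0.4}
p_x = {x: sum(p for (xx, _), p in p_xy.items() if xx == x) for x in (0, 1)}
p_y = {y: sum(p for (_, yy), p in p_xy.items() if yy == y) for y in (0, 1)}

H_X  = -sum(p * log2(p) for p in p_x.values())
H_Y  = -sum(p * log2(p) for p in p_y.values())
H_XY = -sum(p * log2(p) for p in p_xy.values())

I_direct = sum(p * log2(p / (p_x[x] * p_y[y])) for (x, y), p in p_xy.items())
print(round(H_X - (H_XY - H_Y), 3))                # I(X;Y) = H(X) - H(X|Y) ≈ 0.278
print(round(I_direct, 3))                          # same value via the sum formula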
51. Chi Square Test for Independence
(Example)
            Republican   Democrat   Independent   Total
Male            200          150          50        400
Female          250          300          50        600
Total           450          450         100       1000
• State the hypotheses
H0: Gender and voting preferences are independent.
Ha: Gender and voting preferences are not independent
• Choose significance level
Say, 0.05
53. Chi Square Test for Independence
• Chi-square test statistic:
Χ² = Σ_{g,v} (O_{g,v} − E_{g,v})² / E_{g,v}
where O_{g,v} is the observed count in cell (g, v) and
E_{g,v} = (row total × column total) / grand total is the expected count under independence
• Χ² = (200 − 180)²/180 + (150 − 180)²/180 + (50 − 40)²/40 +
(250 − 270)²/270 + (300 − 270)²/270 + (50 − 60)²/60
• Χ² = 400/180 + 900/180 + 100/40 + 400/270 + 900/270 + 100/60
• Χ² = 2.22 + 5.00 + 2.50 + 1.48 + 3.33 + 1.67 = 16.2

            Republican   Democrat   Independent   Total
Male            200          150          50        400
Female          250          300          50        600
Total           450          450         100       1000
54. Chi Square Test for Independence
• P-value
– Probability of observing a sample statistic as
extreme as the test statistic
– With (2 − 1)(3 − 1) = 2 degrees of freedom,
P(Χ² ≥ 16.2) = 0.0003
• Since the P-value (0.0003) is less than the
significance level (0.05), we reject the null
hypothesis (see the sketch below)
• There is a relationship between gender and
voting preference
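The same test in a few lines of Python (a sketch; scipy.stats.chi2_contingency returns the statistic, p-value, degrees of freedom, and expected counts in one call):

import numpy as np
from scipy.stats import chi2_contingency

observed = np.array([[200, 150, 50],               # Male:   Rep, Dem, Ind
                     [250, 300, 50]])              # Female: Rep, Dem, Ind

chi2, p, dof, expected = chi2_contingency(observed)
print(round(chi2, 1), dof, round(p, 4))            # 16.2 2 0.0003
print(expected)                                    # [[180. 180. 40.] [270. 270. 60.]]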
55. Acknowledgment
• Carlos Guestrin recitation slides:
http://www.cs.cmu.edu/~guestrin/Class/10708/recitations/r1/Probability_and_Statistics_Review.ppt
• Andrew Moore Tutorial:
http://www.autonlab.org/tutorials/prob.html
• Monty Hall problem:
http://en.wikipedia.org/wiki/Monty_Hall_problem
• http://www.cs.cmu.edu/~guestrin/Class/10701-F07/recitation_schedule.html
• Chi-square test for independence:
http://stattrek.com/chi-square-test/independence.aspx