VIP Call Girls Service Hitech City Hyderabad Call +91-8250192130
02 The Math for Analysis of Algorithms
1. Analysis of Algorithms
The Mathematics of Analysis of Algorithms
Andres Mendez-Vazquez
September 10, 2015
1 / 97
2. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
2 / 97
3. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
3 / 97
4. Basic Induction
Principle of Mathematical Induction
Let P(n) be a property that is defined for integers n, and let a be a fixed
integer.
Suppose the following two statements are true
1 P (a) is true.
2 For all integers k ≥ a, if P (k) is true then P (k + 1) is true.
Then the statement
For all integers n ≥ a, P (n) is true. (1)
4 / 97
5. Basic Induction
Principle of Mathematical Induction
Let P(n) be a property that is defined for integers n, and let a be a fixed
integer.
Suppose the following two statements are true
1 P (a) is true.
2 For all integers k ≥ a, if P (k) is true then P (k + 1) is true.
Then the statement
For all integers n ≥ a, P (n) is true. (1)
4 / 97
6. Basic Induction
Principle of Mathematical Induction
Let P(n) be a property that is defined for integers n, and let a be a fixed
integer.
Suppose the following two statements are true
1 P (a) is true.
2 For all integers k ≥ a, if P (k) is true then P (k + 1) is true.
Then the statement
For all integers n ≥ a, P (n) is true. (1)
4 / 97
7. Basic Induction
Principle of Mathematical Induction
Let P(n) be a property that is defined for integers n, and let a be a fixed
integer.
Suppose the following two statements are true
1 P (a) is true.
2 For all integers k ≥ a, if P (k) is true then P (k + 1) is true.
Then the statement
For all integers n ≥ a, P (n) is true. (1)
4 / 97
8. We have the following method for Mathematical Induction
Consider a statement of the form
For all integers n ≥ a, P (n) is true. (2)
To prove such a statement
Perform the following two steps
Step 1 (Basis step)
Show that P(a) is true - we normally use a = 1.
5 / 97
9. We have the following method for Mathematical Induction
Consider a statement of the form
For all integers n ≥ a, P (n) is true. (2)
To prove such a statement
Perform the following two steps
Step 1 (Basis step)
Show that P(a) is true - we normally use a = 1.
5 / 97
10. We have the following method for Mathematical Induction
Consider a statement of the form
For all integers n ≥ a, P (n) is true. (2)
To prove such a statement
Perform the following two steps
Step 1 (Basis step)
Show that P(a) is true - we normally use a = 1.
5 / 97
11. Then
Step 2 (Inductive step)
Show that for all integers k ≥ a, if P(k) is true, then P(k + 1) is true.
Inductive hypothesis
Suppose that P(k) is true, where k is any particular but arbitrarily chosen
integer with k ≥ a.
Then, you can prove that
P (k + 1) is true.
6 / 97
12. Then
Step 2 (Inductive step)
Show that for all integers k ≥ a, if P(k) is true, then P(k + 1) is true.
Inductive hypothesis
Suppose that P(k) is true, where k is any particular but arbitrarily chosen
integer with k ≥ a.
Then, you can prove that
P (k + 1) is true.
6 / 97
13. Then
Step 2 (Inductive step)
Show that for all integers k ≥ a, if P(k) is true, then P(k + 1) is true.
Inductive hypothesis
Suppose that P(k) is true, where k is any particular but arbitrarily chosen
integer with k ≥ a.
Then, you can prove that
P (k + 1) is true.
6 / 97
14. Example
Proposition
For all integers n ≥ 8, n¢ can be obtained using 3¢ and 5¢ coins.
Show that P(8) is true
P(8) is true because 8¢ can be obtained using one coin 3¢ and another
coin of 5¢.
Show that for all integers k ≥ 8, if P(k) is true then P(k + 1) is also
true
We can do the following.
7 / 97
15. Example
Proposition
For all integers n ≥ 8, n¢ can be obtained using 3¢ and 5¢ coins.
Show that P(8) is true
P(8) is true because 8¢ can be obtained using one coin 3¢ and another
coin of 5¢.
Show that for all integers k ≥ 8, if P(k) is true then P(k + 1) is also
true
We can do the following.
7 / 97
16. Example
Proposition
For all integers n ≥ 8, n¢ can be obtained using 3¢ and 5¢ coins.
Show that P(8) is true
P(8) is true because 8¢ can be obtained using one coin 3¢ and another
coin of 5¢.
Show that for all integers k ≥ 8, if P(k) is true then P(k + 1) is also
true
We can do the following.
7 / 97
17. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
18. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
19. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
20. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
21. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
22. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
23. Example
Inductive hypothesis
Suppose that k is any integer with k ≥ 8 such that
k¢ can be obtained using 3¢ and 5¢ coins.
Case 1 - There is a 5¢ among those making the change for k¢
In this case, replace 5¢ by two 3¢. Thus, we get the change for (k + 1) ¢
Case 2 - There is not a 5¢ among those making the change for k¢
Because k ≥ 8, at leat three coins must have been used.
At least three 3¢ coins must have been used.
Remove those coins and replaced them using two 5¢.
The result will be (k + 1) ¢.
8 / 97
24. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
9 / 97
25. We can go further...
Recursively defined sets and structures
Assume S is a set. We use two steps to define the elements of S.
Basis Step
Specify an initial collection of elements.
Recursive Step
Give a rule for forming new elements from those already known to be in S.
10 / 97
26. We can go further...
Recursively defined sets and structures
Assume S is a set. We use two steps to define the elements of S.
Basis Step
Specify an initial collection of elements.
Recursive Step
Give a rule for forming new elements from those already known to be in S.
10 / 97
27. We can go further...
Recursively defined sets and structures
Assume S is a set. We use two steps to define the elements of S.
Basis Step
Specify an initial collection of elements.
Recursive Step
Give a rule for forming new elements from those already known to be in S.
10 / 97
28. Thus
Let S be a set that has been defined recursively
To prove that every object in S satisfies a certain property:
1 Show that each object in the BASE for S satisfies the property.
2 Show that for each rule in the RECURSION, if the rule is applied to
objects in S that satisfy the property, then the objects defined by the
rule also satisfy the property.
11 / 97
29. Thus
Let S be a set that has been defined recursively
To prove that every object in S satisfies a certain property:
1 Show that each object in the BASE for S satisfies the property.
2 Show that for each rule in the RECURSION, if the rule is applied to
objects in S that satisfy the property, then the objects defined by the
rule also satisfy the property.
11 / 97
30. Thus
Let S be a set that has been defined recursively
To prove that every object in S satisfies a certain property:
1 Show that each object in the BASE for S satisfies the property.
2 Show that for each rule in the RECURSION, if the rule is applied to
objects in S that satisfy the property, then the objects defined by the
rule also satisfy the property.
11 / 97
31. Example: Binary trees recursive definition
Recall that the set B of binary trees over an alphabet A is defined as
follows
1 Basis: ∈ B
2 Recursive definition: If L, R ∈ B and x ∈ A then < L, x, R >∈ B.
Now define the function f : B → N defined as
f ( ) =0
f ( L, x, R ) =
1 if L = R =
f (L) + f (R) otherwise
Theorem
Let T in B be a binary tree. Then f (T) yields the number of leaves of T.
12 / 97
32. Example: Binary trees recursive definition
Recall that the set B of binary trees over an alphabet A is defined as
follows
1 Basis: ∈ B
2 Recursive definition: If L, R ∈ B and x ∈ A then < L, x, R >∈ B.
Now define the function f : B → N defined as
f ( ) =0
f ( L, x, R ) =
1 if L = R =
f (L) + f (R) otherwise
Theorem
Let T in B be a binary tree. Then f (T) yields the number of leaves of T.
12 / 97
33. Example: Binary trees recursive definition
Recall that the set B of binary trees over an alphabet A is defined as
follows
1 Basis: ∈ B
2 Recursive definition: If L, R ∈ B and x ∈ A then < L, x, R >∈ B.
Now define the function f : B → N defined as
f ( ) =0
f ( L, x, R ) =
1 if L = R =
f (L) + f (R) otherwise
Theorem
Let T in B be a binary tree. Then f (T) yields the number of leaves of T.
12 / 97
34. Proof
By structural induction on T
Basis: The empty tree has no leaves, so f ( ) = 0 is correct.
Induction
Let L, R be trees in B, x ∈ A.
Now
Suppose that f (L) and f (R) denotes the number of leaves of L and R,
respectively.
13 / 97
35. Proof
By structural induction on T
Basis: The empty tree has no leaves, so f ( ) = 0 is correct.
Induction
Let L, R be trees in B, x ∈ A.
Now
Suppose that f (L) and f (R) denotes the number of leaves of L and R,
respectively.
13 / 97
36. Proof
By structural induction on T
Basis: The empty tree has no leaves, so f ( ) = 0 is correct.
Induction
Let L, R be trees in B, x ∈ A.
Now
Suppose that f (L) and f (R) denotes the number of leaves of L and R,
respectively.
13 / 97
37. Proof
Case 1
If L = R = , then L, x, R = , x, has one leaf, namely x, so
f ( , x, ) = 1 is correct.
Case 2
If L and R are both not empty, then the number of leaves of the tree
L, x, R is equal to the number of leaves of L plus the number of leaves
of R.
f ( L, x, R ) = f (L) + f (R) (3)
Q.E.D.
14 / 97
38. Proof
Case 1
If L = R = , then L, x, R = , x, has one leaf, namely x, so
f ( , x, ) = 1 is correct.
Case 2
If L and R are both not empty, then the number of leaves of the tree
L, x, R is equal to the number of leaves of L plus the number of leaves
of R.
f ( L, x, R ) = f (L) + f (R) (3)
Q.E.D.
14 / 97
39. Introduction
When an algorithm contains an iterative control construct such as a
while or for loop
It is possible to express its running time as a series:
n
j=1
j (4)
Thus, what is the objective of using these series
To be able to find bounds for the complexities of algorithms
15 / 97
40. Introduction
When an algorithm contains an iterative control construct such as a
while or for loop
It is possible to express its running time as a series:
n
j=1
j (4)
Thus, what is the objective of using these series
To be able to find bounds for the complexities of algorithms
15 / 97
41. Definition of Series
Definition
Given a sequence a1, a2, ..., an of numbers, where n is a no-negative
integer, we can say that
a1 + a2 + ... + an =
n
k=1
ak (5)
In the case of infinite series
a1 + a2 + ... =
∞
k=1
ak = lim
n→∞
n
k=1
ak (6)
Here, we have concepts of convergence and divergence that I will allow
you to study.
16 / 97
42. Definition of Series
Definition
Given a sequence a1, a2, ..., an of numbers, where n is a no-negative
integer, we can say that
a1 + a2 + ... + an =
n
k=1
ak (5)
In the case of infinite series
a1 + a2 + ... =
∞
k=1
ak = lim
n→∞
n
k=1
ak (6)
Here, we have concepts of convergence and divergence that I will allow
you to study.
16 / 97
43. Definition of Series
Definition
Given a sequence a1, a2, ..., an of numbers, where n is a no-negative
integer, we can say that
a1 + a2 + ... + an =
n
k=1
ak (5)
In the case of infinite series
a1 + a2 + ... =
∞
k=1
ak = lim
n→∞
n
k=1
ak (6)
Here, we have concepts of convergence and divergence that I will allow
you to study.
16 / 97
44. Definition of Series
Definition
Given a sequence a1, a2, ..., an of numbers, where n is a no-negative
integer, we can say that
a1 + a2 + ... + an =
n
k=1
ak (5)
In the case of infinite series
a1 + a2 + ... =
∞
k=1
ak = lim
n→∞
n
k=1
ak (6)
Here, we have concepts of convergence and divergence that I will allow
you to study.
16 / 97
45. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
17 / 97
46. Linearity
For any real number c and any finite sequences a1, a2, ..., an and
b1, b2, ..., bn
n
k=1
[cak + bk] = c
n
k=1
ak +
n
k=1
bk (7)
For More
Please take a look at page 1146 of Cormen’s book.
18 / 97
47. Linearity
For any real number c and any finite sequences a1, a2, ..., an and
b1, b2, ..., bn
n
k=1
[cak + bk] = c
n
k=1
ak +
n
k=1
bk (7)
For More
Please take a look at page 1146 of Cormen’s book.
18 / 97
48. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
19 / 97
49. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
50. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
51. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
52. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
53. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
54. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
55. Telescopic Sum
Observation
In certain sums each term is a difference of two quantities.
You sometimes see that all the terms cancel except the first and the
last.
Example
Imagine that each term in the sum has the following structure:
1
k
−
1
k + 1
=
(k + 1) − k
k (k + 1)
=
1
k (k + 1)
(8)
What is the result of the following sum?
n
k=1
1
k (k + 1)
(9)
20 / 97
56. Telescopic Sum
Definition
For any sequence a0, a1, ..., an,
n
k=1
(ak − ak−1) = an − a0 (10)
Similarly
n−1
k=0
(ak − ak+1) = a0 − an (11)
21 / 97
57. Telescopic Sum
Definition
For any sequence a0, a1, ..., an,
n
k=1
(ak − ak−1) = an − a0 (10)
Similarly
n−1
k=0
(ak − ak+1) = a0 − an (11)
21 / 97
58. Arithmetic series
Summing over the set {1, 2, 3, ..., n}
We can prove that
n
i=1
i =
n (n + 1)
2
(12)
Proof
Basis: If n = 1 then 1×2
2 = 1
22 / 97
59. Arithmetic series
Summing over the set {1, 2, 3, ..., n}
We can prove that
n
i=1
i =
n (n + 1)
2
(12)
Proof
Basis: If n = 1 then 1×2
2 = 1
22 / 97
60. Proof
Induction, assume that is true for n
n+1
i=1
i =
n
i=1
i + n + 1
=
n (n + 1)
2
+ n + 1
=
n (n + 1) + 2 (n + 1)
2
=
(n + 1) (n + 2)
2
23 / 97
61. Proof
Induction, assume that is true for n
n+1
i=1
i =
n
i=1
i + n + 1
=
n (n + 1)
2
+ n + 1
=
n (n + 1) + 2 (n + 1)
2
=
(n + 1) (n + 2)
2
23 / 97
62. Proof
Induction, assume that is true for n
n+1
i=1
i =
n
i=1
i + n + 1
=
n (n + 1)
2
+ n + 1
=
n (n + 1) + 2 (n + 1)
2
=
(n + 1) (n + 2)
2
23 / 97
63. Proof
Induction, assume that is true for n
n+1
i=1
i =
n
i=1
i + n + 1
=
n (n + 1)
2
+ n + 1
=
n (n + 1) + 2 (n + 1)
2
=
(n + 1) (n + 2)
2
23 / 97
64. Series of Squares and Sums
Series of Squares
n
k=0
k2
=
n (n + 1) (2n + 1)
6
(13)
Series of Cubes
n
k=0
k2
=
n2 (n + 1)2
4
(14)
24 / 97
65. Series of Squares and Sums
Series of Squares
n
k=0
k2
=
n (n + 1) (2n + 1)
6
(13)
Series of Cubes
n
k=0
k2
=
n2 (n + 1)2
4
(14)
24 / 97
66. Geometric Series
Definition
For a real x = 1, we have that
n
k=0
xk
= 1 + x + x2
+ ... + xn
(15)
It is called the geometric series
It is possible to prove that
n
k=0
xk
=
xn+1 − 1
x − 1
(16)
Proof
Now multiply both sides by of (Eq. 15) by x
x
n
k=0
xk
= x + x2
+ x3
+ ... + xn+1
(17)
25 / 97
67. Geometric Series
Definition
For a real x = 1, we have that
n
k=0
xk
= 1 + x + x2
+ ... + xn
(15)
It is called the geometric series
It is possible to prove that
n
k=0
xk
=
xn+1 − 1
x − 1
(16)
Proof
Now multiply both sides by of (Eq. 15) by x
x
n
k=0
xk
= x + x2
+ x3
+ ... + xn+1
(17)
25 / 97
68. Geometric Series
Definition
For a real x = 1, we have that
n
k=0
xk
= 1 + x + x2
+ ... + xn
(15)
It is called the geometric series
It is possible to prove that
n
k=0
xk
=
xn+1 − 1
x − 1
(16)
Proof
Now multiply both sides by of (Eq. 15) by x
x
n
k=0
xk
= x + x2
+ x3
+ ... + xn+1
(17)
25 / 97
69. Proof
Subtract (Eq. 17) from (Eq. 15)
n
k=0
xk
− x
n
k=0
xk
= 1 − xn+1
(18)
Finally
n
k=0
xk
=
1 − xn+1
1 − x
=
xn+1 − 1
x − 1
(19)
26 / 97
70. Proof
Subtract (Eq. 17) from (Eq. 15)
n
k=0
xk
− x
n
k=0
xk
= 1 − xn+1
(18)
Finally
n
k=0
xk
=
1 − xn+1
1 − x
=
xn+1 − 1
x − 1
(19)
26 / 97
71. Infinite Geometric Series
When the summation is infinite and |x| < 1
∞
k=0
xk
=
1
1 − x
(20)
Proof
Given that
∞
k=0
xk
= lim
n→∞
n
k=0
xk
= lim
n→∞
1 − xn+1
1 − x
=
1
1 − x
(21)
27 / 97
72. Infinite Geometric Series
When the summation is infinite and |x| < 1
∞
k=0
xk
=
1
1 − x
(20)
Proof
Given that
∞
k=0
xk
= lim
n→∞
n
k=0
xk
= lim
n→∞
1 − xn+1
1 − x
=
1
1 − x
(21)
27 / 97
73. Infinite Geometric Series
When the summation is infinite and |x| < 1
∞
k=0
xk
=
1
1 − x
(20)
Proof
Given that
∞
k=0
xk
= lim
n→∞
n
k=0
xk
= lim
n→∞
1 − xn+1
1 − x
=
1
1 − x
(21)
27 / 97
74. Infinite Geometric Series
When the summation is infinite and |x| < 1
∞
k=0
xk
=
1
1 − x
(20)
Proof
Given that
∞
k=0
xk
= lim
n→∞
n
k=0
xk
= lim
n→∞
1 − xn+1
1 − x
=
1
1 − x
(21)
27 / 97
75. Infinite Geometric Series
When the summation is infinite and |x| < 1
∞
k=0
xk
=
1
1 − x
(20)
Proof
Given that
∞
k=0
xk
= lim
n→∞
n
k=0
xk
= lim
n→∞
1 − xn+1
1 − x
=
1
1 − x
(21)
27 / 97
76. For more on the series
Please take a look to
Cormen’s book - Appendix A.
28 / 97
77. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
29 / 97
78. This is quite useful for Analysis of Algorithms
Important
The most basic way to evaluate a series is to use mathematical induction.
Example
Prove that
n
k=0
3k
≤ c3n
(22)
30 / 97
79. This is quite useful for Analysis of Algorithms
Important
The most basic way to evaluate a series is to use mathematical induction.
Example
Prove that
n
k=0
3k
≤ c3n
(22)
30 / 97
80. Fast Bounding of Series
A quick upper bound on the arithmetic series
n
k=1
k ≤
n
k=1
n = n2
(23)
In general, for a series n
k=1 ak
If amax = max1≤k≤n ak then
n
k=1
ak ≤ n · amax (24)
Another fast way of bounding finite series is
Suppose that
ak+1
ak
≤ r for all k ≥ 0 where 0 < r < 1.
31 / 97
81. Fast Bounding of Series
A quick upper bound on the arithmetic series
n
k=1
k ≤
n
k=1
n = n2
(23)
In general, for a series n
k=1 ak
If amax = max1≤k≤n ak then
n
k=1
ak ≤ n · amax (24)
Another fast way of bounding finite series is
Suppose that
ak+1
ak
≤ r for all k ≥ 0 where 0 < r < 1.
31 / 97
82. Fast Bounding of Series
A quick upper bound on the arithmetic series
n
k=1
k ≤
n
k=1
n = n2
(23)
In general, for a series n
k=1 ak
If amax = max1≤k≤n ak then
n
k=1
ak ≤ n · amax (24)
Another fast way of bounding finite series is
Suppose that
ak+1
ak
≤ r for all k ≥ 0 where 0 < r < 1.
31 / 97
83. A More Elegant Method
Thus, we have
ak ≤ a0rk
(25)
Thus, we can use a infinite decreasing geometric series
n
k=0
ak ≤
∞
k=0
a0rk
=a0
∞
k=0
rk
=a0
1
1 − r
32 / 97
84. A More Elegant Method
Thus, we have
ak ≤ a0rk
(25)
Thus, we can use a infinite decreasing geometric series
n
k=0
ak ≤
∞
k=0
a0rk
=a0
∞
k=0
rk
=a0
1
1 − r
32 / 97
85. A More Elegant Method
Thus, we have
ak ≤ a0rk
(25)
Thus, we can use a infinite decreasing geometric series
n
k=0
ak ≤
∞
k=0
a0rk
=a0
∞
k=0
rk
=a0
1
1 − r
32 / 97
86. A More Elegant Method
Thus, we have
ak ≤ a0rk
(25)
Thus, we can use a infinite decreasing geometric series
n
k=0
ak ≤
∞
k=0
a0rk
=a0
∞
k=0
rk
=a0
1
1 − r
32 / 97
87. Approximation by integrals
When a summation has the from n
k=m f (k), where f (k) is a
monotonically increasing function
ˆ n
m−1
f (x) dx ≤
n
k=m
f (k) ≤
ˆ n+1
m
f (x) dx (26)
33 / 97
88. For example
Given
ln (n + 1) =
ˆ n+1
1
1
x
dx ≤
n
k=1
1
k
(27)
In addition
n
k=1
1
k
≤
ˆ n
1
1
x
dx = ln n (28)
34 / 97
89. For example
Given
ln (n + 1) =
ˆ n+1
1
1
x
dx ≤
n
k=1
1
k
(27)
In addition
n
k=1
1
k
≤
ˆ n
1
1
x
dx = ln n (28)
34 / 97
90. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
35 / 97
91. Gerolamo Cardano: Gambling out of Darkness
Gambling
Gambling shows our interest in quantifying the ideas of probability for
millennia, but exact mathematical descriptions arose much later.
Gerolamo Cardano (16th century)
While gambling he developed the following rule!!!
Equal conditions
“The most fundamental principle of all in gambling is simply equal
conditions, e.g. of opponents, of bystanders, of money, of situation, of the
dice box and of the dice itself. To the extent to which you depart from
that equity, if it is in your opponent’s favour, you are a fool, and if in your
own, you are unjust.”
36 / 97
92. Gerolamo Cardano: Gambling out of Darkness
Gambling
Gambling shows our interest in quantifying the ideas of probability for
millennia, but exact mathematical descriptions arose much later.
Gerolamo Cardano (16th century)
While gambling he developed the following rule!!!
Equal conditions
“The most fundamental principle of all in gambling is simply equal
conditions, e.g. of opponents, of bystanders, of money, of situation, of the
dice box and of the dice itself. To the extent to which you depart from
that equity, if it is in your opponent’s favour, you are a fool, and if in your
own, you are unjust.”
36 / 97
93. Gerolamo Cardano: Gambling out of Darkness
Gambling
Gambling shows our interest in quantifying the ideas of probability for
millennia, but exact mathematical descriptions arose much later.
Gerolamo Cardano (16th century)
While gambling he developed the following rule!!!
Equal conditions
“The most fundamental principle of all in gambling is simply equal
conditions, e.g. of opponents, of bystanders, of money, of situation, of the
dice box and of the dice itself. To the extent to which you depart from
that equity, if it is in your opponent’s favour, you are a fool, and if in your
own, you are unjust.”
36 / 97
94. Gerolamo Cardano’s Definition
Probability
“If therefore, someone should say, I want an ace, a deuce, or a trey, you
know that there are 27 favourable throws, and since the circuit is 36, the
rest of the throws in which these points will not turn up will be 9; the
odds will therefore be 3 to 1.”
Meaning
Probability as a ratio of favorable to all possible outcomes!!! As long all
events are equiprobable...
Thus, we get
P(All favourable throws) =
Number All favourable throws
Number of All throws
(29)
37 / 97
95. Gerolamo Cardano’s Definition
Probability
“If therefore, someone should say, I want an ace, a deuce, or a trey, you
know that there are 27 favourable throws, and since the circuit is 36, the
rest of the throws in which these points will not turn up will be 9; the
odds will therefore be 3 to 1.”
Meaning
Probability as a ratio of favorable to all possible outcomes!!! As long all
events are equiprobable...
Thus, we get
P(All favourable throws) =
Number All favourable throws
Number of All throws
(29)
37 / 97
96. Gerolamo Cardano’s Definition
Probability
“If therefore, someone should say, I want an ace, a deuce, or a trey, you
know that there are 27 favourable throws, and since the circuit is 36, the
rest of the throws in which these points will not turn up will be 9; the
odds will therefore be 3 to 1.”
Meaning
Probability as a ratio of favorable to all possible outcomes!!! As long all
events are equiprobable...
Thus, we get
P(All favourable throws) =
Number All favourable throws
Number of All throws
(29)
37 / 97
97. Intuitive Formulation
Empiric Definition
Intuitively, the probability of an event A could be defined as:
P(A) = lim
n→∞
N(A)
n
Where N(A) is the number that event a happens in n trials.
Example
Imagine you have three dices, then
The total number of outcomes is 63
If we have event A = all numbers are equal, |A| = 6
Then, we have that P(A) = 6
63 = 1
36
38 / 97
98. Intuitive Formulation
Empiric Definition
Intuitively, the probability of an event A could be defined as:
P(A) = lim
n→∞
N(A)
n
Where N(A) is the number that event a happens in n trials.
Example
Imagine you have three dices, then
The total number of outcomes is 63
If we have event A = all numbers are equal, |A| = 6
Then, we have that P(A) = 6
63 = 1
36
38 / 97
99. Intuitive Formulation
Empiric Definition
Intuitively, the probability of an event A could be defined as:
P(A) = lim
n→∞
N(A)
n
Where N(A) is the number that event a happens in n trials.
Example
Imagine you have three dices, then
The total number of outcomes is 63
If we have event A = all numbers are equal, |A| = 6
Then, we have that P(A) = 6
63 = 1
36
38 / 97
100. Intuitive Formulation
Empiric Definition
Intuitively, the probability of an event A could be defined as:
P(A) = lim
n→∞
N(A)
n
Where N(A) is the number that event a happens in n trials.
Example
Imagine you have three dices, then
The total number of outcomes is 63
If we have event A = all numbers are equal, |A| = 6
Then, we have that P(A) = 6
63 = 1
36
38 / 97
101. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
39 / 97
102. Axioms of Probability
Axioms
Given a sample space S of events, we have that
1 0 ≤ P(A) ≤ 1
2 P(S) = 1
3 If A1, A2, ..., An are mutually exclusive events (i.e. P(Ai ∩ Aj) = 0),
then:
P(A1 ∪ A2 ∪ ... ∪ An) =
n
i=1
P(Ai)
40 / 97
103. Axioms of Probability
Axioms
Given a sample space S of events, we have that
1 0 ≤ P(A) ≤ 1
2 P(S) = 1
3 If A1, A2, ..., An are mutually exclusive events (i.e. P(Ai ∩ Aj) = 0),
then:
P(A1 ∪ A2 ∪ ... ∪ An) =
n
i=1
P(Ai)
40 / 97
104. Axioms of Probability
Axioms
Given a sample space S of events, we have that
1 0 ≤ P(A) ≤ 1
2 P(S) = 1
3 If A1, A2, ..., An are mutually exclusive events (i.e. P(Ai ∩ Aj) = 0),
then:
P(A1 ∪ A2 ∪ ... ∪ An) =
n
i=1
P(Ai)
40 / 97
105. Axioms of Probability
Axioms
Given a sample space S of events, we have that
1 0 ≤ P(A) ≤ 1
2 P(S) = 1
3 If A1, A2, ..., An are mutually exclusive events (i.e. P(Ai ∩ Aj) = 0),
then:
P(A1 ∪ A2 ∪ ... ∪ An) =
n
i=1
P(Ai)
40 / 97
108. Example
Setup
Throw a biased coin twice
HH .36 HT .24
TH .24 TT .16
We have the following event
At least one head!!! Can you tell me which events are part of it?
What about this one?
Tail on first toss.
42 / 97
109. Example
Setup
Throw a biased coin twice
HH .36 HT .24
TH .24 TT .16
We have the following event
At least one head!!! Can you tell me which events are part of it?
What about this one?
Tail on first toss.
42 / 97
110. Example
Setup
Throw a biased coin twice
HH .36 HT .24
TH .24 TT .16
We have the following event
At least one head!!! Can you tell me which events are part of it?
What about this one?
Tail on first toss.
42 / 97
111. We need to count!!!
We have four main methods of counting
1 Ordered samples of size r with replacement
2 Ordered samples of size r without replacement
3 Unordered samples of size r without replacement
4 Unordered samples of size r with replacement
43 / 97
112. We need to count!!!
We have four main methods of counting
1 Ordered samples of size r with replacement
2 Ordered samples of size r without replacement
3 Unordered samples of size r without replacement
4 Unordered samples of size r with replacement
43 / 97
113. We need to count!!!
We have four main methods of counting
1 Ordered samples of size r with replacement
2 Ordered samples of size r without replacement
3 Unordered samples of size r without replacement
4 Unordered samples of size r with replacement
43 / 97
114. We need to count!!!
We have four main methods of counting
1 Ordered samples of size r with replacement
2 Ordered samples of size r without replacement
3 Unordered samples of size r without replacement
4 Unordered samples of size r with replacement
43 / 97
115. Ordered samples of size r with replacement
Definition
The number of possible sequences (ai1 , ..., air ) for n different numbers is
n × n × ... × n = nr
(30)
Example
If you throw three dices you have 6 × 6 × 6 = 216
44 / 97
116. Ordered samples of size r with replacement
Definition
The number of possible sequences (ai1 , ..., air ) for n different numbers is
n × n × ... × n = nr
(30)
Example
If you throw three dices you have 6 × 6 × 6 = 216
44 / 97
117. Ordered samples of size r without replacement
Definition
The number of possible sequences (ai1 , ..., air ) for n different numbers is
n × n − 1 × ... × (n − (r − 1)) =
n!
(n − r)!
(31)
Example
The number of different numbers that can be formed if no digit can be
repeated. For example, if you have 4 digits and you want numbers of size
3.
45 / 97
118. Ordered samples of size r without replacement
Definition
The number of possible sequences (ai1 , ..., air ) for n different numbers is
n × n − 1 × ... × (n − (r − 1)) =
n!
(n − r)!
(31)
Example
The number of different numbers that can be formed if no digit can be
repeated. For example, if you have 4 digits and you want numbers of size
3.
45 / 97
119. Unordered samples of size r without replacement
Definition
Actually, we want the number of possible unordered sets.
However
We have n!
(n−r)! collections where we care about the order. Thus
n!
(n−r)!
r!
=
n!
r! (n − r)!
=
n
r
(32)
46 / 97
120. Unordered samples of size r without replacement
Definition
Actually, we want the number of possible unordered sets.
However
We have n!
(n−r)! collections where we care about the order. Thus
n!
(n−r)!
r!
=
n!
r! (n − r)!
=
n
r
(32)
46 / 97
121. Unordered samples of size r with replacement
Definition
We want to find an unordered set {ai1 , ..., air } with replacement
Use a digit trick for that
Look at the Board
Thus
n + r − 1
r
(33)
47 / 97
122. Unordered samples of size r with replacement
Definition
We want to find an unordered set {ai1 , ..., air } with replacement
Use a digit trick for that
Look at the Board
Thus
n + r − 1
r
(33)
47 / 97
123. Unordered samples of size r with replacement
Definition
We want to find an unordered set {ai1 , ..., air } with replacement
Use a digit trick for that
Look at the Board
Thus
n + r − 1
r
(33)
47 / 97
124. How?
Change encoding by adding more signs
Imagine all the strings of three numbers with {1, 2, 3}
We have
Old String New String
111 1+0,1+1,1+2=123
112 1+0,1+1,2+2=124
113 1+0,1+1,3+2=125
122 1+0,2+1,2+2=134
123 1+0,2+1,3+2=135
133 1+0,3+1,3+2=145
222 2+0,2+1,2+2=234
223 2+0,2+1,3+2=235
233 1+0,3+1,3+2=245
333 3+0,3+1,3+2=345
48 / 97
125. How?
Change encoding by adding more signs
Imagine all the strings of three numbers with {1, 2, 3}
We have
Old String New String
111 1+0,1+1,1+2=123
112 1+0,1+1,2+2=124
113 1+0,1+1,3+2=125
122 1+0,2+1,2+2=134
123 1+0,2+1,3+2=135
133 1+0,3+1,3+2=145
222 2+0,2+1,2+2=234
223 2+0,2+1,3+2=235
233 1+0,3+1,3+2=245
333 3+0,3+1,3+2=345
48 / 97
126. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
49 / 97
127. Independence
Definition
Two events A and B are independent if and only if
P(A, B) = P(A ∩ B) = P(A)P(B)
Do you have any example?
Any idea?
50 / 97
128. Independence
Definition
Two events A and B are independent if and only if
P(A, B) = P(A ∩ B) = P(A)P(B)
Do you have any example?
Any idea?
50 / 97
129. Example
We have two dices
Thus, we have all pairs (i, j) such that i, j = 1, 2, 3, ..., 6
We have the following events
A ={First dice 1,2 or 3}
B = {First dice 3, 4 or 5}
C = {The sum of two faces is 9}
So, we can do
Look at the board!!! Independence between A, B, C
51 / 97
130. Example
We have two dices
Thus, we have all pairs (i, j) such that i, j = 1, 2, 3, ..., 6
We have the following events
A ={First dice 1,2 or 3}
B = {First dice 3, 4 or 5}
C = {The sum of two faces is 9}
So, we can do
Look at the board!!! Independence between A, B, C
51 / 97
131. Example
We have two dices
Thus, we have all pairs (i, j) such that i, j = 1, 2, 3, ..., 6
We have the following events
A ={First dice 1,2 or 3}
B = {First dice 3, 4 or 5}
C = {The sum of two faces is 9}
So, we can do
Look at the board!!! Independence between A, B, C
51 / 97
132. Example
We have two dices
Thus, we have all pairs (i, j) such that i, j = 1, 2, 3, ..., 6
We have the following events
A ={First dice 1,2 or 3}
B = {First dice 3, 4 or 5}
C = {The sum of two faces is 9}
So, we can do
Look at the board!!! Independence between A, B, C
51 / 97
133. Example
We have two dices
Thus, we have all pairs (i, j) such that i, j = 1, 2, 3, ..., 6
We have the following events
A ={First dice 1,2 or 3}
B = {First dice 3, 4 or 5}
C = {The sum of two faces is 9}
So, we can do
Look at the board!!! Independence between A, B, C
51 / 97
134. We can use it to derive the Binomial Distribution
WHAT?????
52 / 97
135. First, we use a sequence of n Bernoulli Trials
We have this
“Success” has a probability p.
“Failure” has a probability 1 − p.
Examples
Toss a coin independently n times.
Examine components produced on an assembly line.
Now
We take S =all 2n ordered sequences of length n, with components
0(failure) and 1(success).
53 / 97
136. First, we use a sequence of n Bernoulli Trials
We have this
“Success” has a probability p.
“Failure” has a probability 1 − p.
Examples
Toss a coin independently n times.
Examine components produced on an assembly line.
Now
We take S =all 2n ordered sequences of length n, with components
0(failure) and 1(success).
53 / 97
137. First, we use a sequence of n Bernoulli Trials
We have this
“Success” has a probability p.
“Failure” has a probability 1 − p.
Examples
Toss a coin independently n times.
Examine components produced on an assembly line.
Now
We take S =all 2n ordered sequences of length n, with components
0(failure) and 1(success).
53 / 97
138. First, we use a sequence of n Bernoulli Trials
We have this
“Success” has a probability p.
“Failure” has a probability 1 − p.
Examples
Toss a coin independently n times.
Examine components produced on an assembly line.
Now
We take S =all 2n ordered sequences of length n, with components
0(failure) and 1(success).
53 / 97
139. First, we use a sequence of n Bernoulli Trials
We have this
“Success” has a probability p.
“Failure” has a probability 1 − p.
Examples
Toss a coin independently n times.
Examine components produced on an assembly line.
Now
We take S =all 2n ordered sequences of length n, with components
0(failure) and 1(success).
53 / 97
140. Thus, taking a sample ω
ω = 11 · · · 10 · · · 0
k 1’s followed by n − k 0’s.
We have then
P (ω) = P A1 ∩ A2 ∩ . . . ∩ Ak ∩ Ac
k+1 ∩ . . . ∩ Ac
n
= P (A1) P (A2) · · · P (Ak) P Ac
k+1 · · · P (Ac
n)
= pk
(1 − p)n−k
Important
The number of such sample is the number of sets with k elements.... or...
n
k
54 / 97
141. Thus, taking a sample ω
ω = 11 · · · 10 · · · 0
k 1’s followed by n − k 0’s.
We have then
P (ω) = P A1 ∩ A2 ∩ . . . ∩ Ak ∩ Ac
k+1 ∩ . . . ∩ Ac
n
= P (A1) P (A2) · · · P (Ak) P Ac
k+1 · · · P (Ac
n)
= pk
(1 − p)n−k
Important
The number of such sample is the number of sets with k elements.... or...
n
k
54 / 97
142. Thus, taking a sample ω
ω = 11 · · · 10 · · · 0
k 1’s followed by n − k 0’s.
We have then
P (ω) = P A1 ∩ A2 ∩ . . . ∩ Ak ∩ Ac
k+1 ∩ . . . ∩ Ac
n
= P (A1) P (A2) · · · P (Ak) P Ac
k+1 · · · P (Ac
n)
= pk
(1 − p)n−k
Important
The number of such sample is the number of sets with k elements.... or...
n
k
54 / 97
143. Did you notice?
We do not care where the 1’s and 0’s are
Thus all the probabilities are equal to pk (1 − p)k
Thus, we are looking to sum all those probabilities of all those
combinations of 1’s and 0’s
k 1’s
p ωk
Then
k 1’s
p ωk
=
n
k
p (1 − p)n−k
55 / 97
144. Did you notice?
We do not care where the 1’s and 0’s are
Thus all the probabilities are equal to pk (1 − p)k
Thus, we are looking to sum all those probabilities of all those
combinations of 1’s and 0’s
k 1’s
p ωk
Then
k 1’s
p ωk
=
n
k
p (1 − p)n−k
55 / 97
145. Did you notice?
We do not care where the 1’s and 0’s are
Thus all the probabilities are equal to pk (1 − p)k
Thus, we are looking to sum all those probabilities of all those
combinations of 1’s and 0’s
k 1’s
p ωk
Then
k 1’s
p ωk
=
n
k
p (1 − p)n−k
55 / 97
146. Proving this is a probability
Sum of these probabilities is equal to 1
n
k=0
n
k
p (1 − p)n−k
= (p + (1 − p))n
= 1
The other is simple
0 ≤
n
k
p (1 − p)n−k
≤ 1 ∀k
This is know as
The Binomial probability function!!!
56 / 97
147. Proving this is a probability
Sum of these probabilities is equal to 1
n
k=0
n
k
p (1 − p)n−k
= (p + (1 − p))n
= 1
The other is simple
0 ≤
n
k
p (1 − p)n−k
≤ 1 ∀k
This is know as
The Binomial probability function!!!
56 / 97
148. Proving this is a probability
Sum of these probabilities is equal to 1
n
k=0
n
k
p (1 − p)n−k
= (p + (1 − p))n
= 1
The other is simple
0 ≤
n
k
p (1 − p)n−k
≤ 1 ∀k
This is know as
The Binomial probability function!!!
56 / 97
149. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
57 / 97
150. Different Probabilities
Unconditional
This is the probability of an event A prior to arrival of any evidence, it is
denoted by P(A). For example:
P(Cavity)=0.1 means that “in the absence of any other information,
there is a 10% chance that the patient is having a cavity”.
Conditional
This is the probability of an event A given some evidence B, it is denoted
P(A|B). For example:
P(Cavity/Toothache)=0.8 means that “there is an 80% chance that
the patient is having a cavity given that he is having a toothache”
58 / 97
151. Different Probabilities
Unconditional
This is the probability of an event A prior to arrival of any evidence, it is
denoted by P(A). For example:
P(Cavity)=0.1 means that “in the absence of any other information,
there is a 10% chance that the patient is having a cavity”.
Conditional
This is the probability of an event A given some evidence B, it is denoted
P(A|B). For example:
P(Cavity/Toothache)=0.8 means that “there is an 80% chance that
the patient is having a cavity given that he is having a toothache”
58 / 97
152. Different Probabilities
Unconditional
This is the probability of an event A prior to arrival of any evidence, it is
denoted by P(A). For example:
P(Cavity)=0.1 means that “in the absence of any other information,
there is a 10% chance that the patient is having a cavity”.
Conditional
This is the probability of an event A given some evidence B, it is denoted
P(A|B). For example:
P(Cavity/Toothache)=0.8 means that “there is an 80% chance that
the patient is having a cavity given that he is having a toothache”
58 / 97
153. Different Probabilities
Unconditional
This is the probability of an event A prior to arrival of any evidence, it is
denoted by P(A). For example:
P(Cavity)=0.1 means that “in the absence of any other information,
there is a 10% chance that the patient is having a cavity”.
Conditional
This is the probability of an event A given some evidence B, it is denoted
P(A|B). For example:
P(Cavity/Toothache)=0.8 means that “there is an 80% chance that
the patient is having a cavity given that he is having a toothache”
58 / 97
154. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
59 / 97
155. Posterior Probabilities
Relation between conditional and unconditional probabilities
Conditional probabilities can be defined in terms of unconditional probabilities:
P(A|B) =
P(A, B)
P(B)
which generalizes to the chain rule P(A, B) = P(B)P(A|B) = P(A)P(B|A).
Law of Total Probabilities
if B1, B2, ..., Bnis a partition of mutually exclusive events and Ais an event, then
P(A) =
n
i=1
P(A ∩ Bi). An special case P(A) = P(A, B) + P(A, B).
In addition, this can be rewritten into P(A) =
n
i=1
P(A|Bi)P(Bi).
60 / 97
156. Posterior Probabilities
Relation between conditional and unconditional probabilities
Conditional probabilities can be defined in terms of unconditional probabilities:
P(A|B) =
P(A, B)
P(B)
which generalizes to the chain rule P(A, B) = P(B)P(A|B) = P(A)P(B|A).
Law of Total Probabilities
if B1, B2, ..., Bnis a partition of mutually exclusive events and Ais an event, then
P(A) =
n
i=1
P(A ∩ Bi). An special case P(A) = P(A, B) + P(A, B).
In addition, this can be rewritten into P(A) =
n
i=1
P(A|Bi)P(Bi).
60 / 97
157. Posterior Probabilities
Relation between conditional and unconditional probabilities
Conditional probabilities can be defined in terms of unconditional probabilities:
P(A|B) =
P(A, B)
P(B)
which generalizes to the chain rule P(A, B) = P(B)P(A|B) = P(A)P(B|A).
Law of Total Probabilities
if B1, B2, ..., Bnis a partition of mutually exclusive events and Ais an event, then
P(A) =
n
i=1
P(A ∩ Bi). An special case P(A) = P(A, B) + P(A, B).
In addition, this can be rewritten into P(A) =
n
i=1
P(A|Bi)P(Bi).
60 / 97
158. Example
Three cards are drawn from a deck
Find the probability of no obtaining a heart
We have
52 cards
39 of them not a heart
Define
Ai ={Card i is not a heart} Then?
61 / 97
159. Example
Three cards are drawn from a deck
Find the probability of no obtaining a heart
We have
52 cards
39 of them not a heart
Define
Ai ={Card i is not a heart} Then?
61 / 97
160. Example
Three cards are drawn from a deck
Find the probability of no obtaining a heart
We have
52 cards
39 of them not a heart
Define
Ai ={Card i is not a heart} Then?
61 / 97
161. Independence and Conditional
From here, we have that...
P(A|B) = P(A) and P(B|A) = P(B).
Conditional independence
A and B are conditionally independent given C if and only if
P(A|B, C) = P(A|C)
Example: P(WetGrass|Season, Rain) = P(WetGrass|Rain).
62 / 97
162. Independence and Conditional
From here, we have that...
P(A|B) = P(A) and P(B|A) = P(B).
Conditional independence
A and B are conditionally independent given C if and only if
P(A|B, C) = P(A|C)
Example: P(WetGrass|Season, Rain) = P(WetGrass|Rain).
62 / 97
163. Bayes Theorem
One Version
P(A|B) =
P(B|A)P(A)
P(B)
Where
P(A) is the prior probability or marginal probability of A. It is
"prior" in the sense that it does not take into account any information
about B.
P(A|B) is the conditional probability of A, given B. It is also called
the posterior probability because it is derived from or depends upon
the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called
the likelihood.
P(B) is the prior or marginal probability of B, and acts as a
normalizing constant.
63 / 97
164. Bayes Theorem
One Version
P(A|B) =
P(B|A)P(A)
P(B)
Where
P(A) is the prior probability or marginal probability of A. It is
"prior" in the sense that it does not take into account any information
about B.
P(A|B) is the conditional probability of A, given B. It is also called
the posterior probability because it is derived from or depends upon
the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called
the likelihood.
P(B) is the prior or marginal probability of B, and acts as a
normalizing constant.
63 / 97
165. Bayes Theorem
One Version
P(A|B) =
P(B|A)P(A)
P(B)
Where
P(A) is the prior probability or marginal probability of A. It is
"prior" in the sense that it does not take into account any information
about B.
P(A|B) is the conditional probability of A, given B. It is also called
the posterior probability because it is derived from or depends upon
the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called
the likelihood.
P(B) is the prior or marginal probability of B, and acts as a
normalizing constant.
63 / 97
166. Bayes Theorem
One Version
P(A|B) =
P(B|A)P(A)
P(B)
Where
P(A) is the prior probability or marginal probability of A. It is
"prior" in the sense that it does not take into account any information
about B.
P(A|B) is the conditional probability of A, given B. It is also called
the posterior probability because it is derived from or depends upon
the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called
the likelihood.
P(B) is the prior or marginal probability of B, and acts as a
normalizing constant.
63 / 97
167. Bayes Theorem
One Version
P(A|B) =
P(B|A)P(A)
P(B)
Where
P(A) is the prior probability or marginal probability of A. It is
"prior" in the sense that it does not take into account any information
about B.
P(A|B) is the conditional probability of A, given B. It is also called
the posterior probability because it is derived from or depends upon
the specified value of B.
P(B|A) is the conditional probability of B given A. It is also called
the likelihood.
P(B) is the prior or marginal probability of B, and acts as a
normalizing constant.
63 / 97
168. General Form of the Bayes Rule
Definition
If A1, A2, ..., An is a partition of mutually exclusive events and B any
event, then:
P(Ai|B) =
P(B|Ai)P(Ai)
P(B)
=
P(B|Ai)P(Ai)
n
i=1 P(B|Ai)P(Ai)
where
P(B) =
n
i=1
P(B ∩ Ai) =
n
i=1
P(B|Ai)P(Ai)
64 / 97
169. General Form of the Bayes Rule
Definition
If A1, A2, ..., An is a partition of mutually exclusive events and B any
event, then:
P(Ai|B) =
P(B|Ai)P(Ai)
P(B)
=
P(B|Ai)P(Ai)
n
i=1 P(B|Ai)P(Ai)
where
P(B) =
n
i=1
P(B ∩ Ai) =
n
i=1
P(B|Ai)P(Ai)
64 / 97
170. Example
Setup
Throw two unbiased dice independently.
Let
1 A ={sum of the faces =8}
2 B ={faces are equal}
Then calculate P (B|A)
Look at the board
65 / 97
171. Example
Setup
Throw two unbiased dice independently.
Let
1 A ={sum of the faces =8}
2 B ={faces are equal}
Then calculate P (B|A)
Look at the board
65 / 97
172. Example
Setup
Throw two unbiased dice independently.
Let
1 A ={sum of the faces =8}
2 B ={faces are equal}
Then calculate P (B|A)
Look at the board
65 / 97
173. Another Example
We have the following
Two coins are available, one unbiased and the other two headed
Assume
That you have a probability of 3
4 to choose the unbiased
Events
A= {head comes up}
B1= {Unbiased coin chosen}
B2= {Biased coin chosen}
Find that if a head come up, find the probability that the two headed
coin was chosen
66 / 97
174. Another Example
We have the following
Two coins are available, one unbiased and the other two headed
Assume
That you have a probability of 3
4 to choose the unbiased
Events
A= {head comes up}
B1= {Unbiased coin chosen}
B2= {Biased coin chosen}
Find that if a head come up, find the probability that the two headed
coin was chosen
66 / 97
175. Another Example
We have the following
Two coins are available, one unbiased and the other two headed
Assume
That you have a probability of 3
4 to choose the unbiased
Events
A= {head comes up}
B1= {Unbiased coin chosen}
B2= {Biased coin chosen}
Find that if a head come up, find the probability that the two headed
coin was chosen
66 / 97
176. Another Example
We have the following
Two coins are available, one unbiased and the other two headed
Assume
That you have a probability of 3
4 to choose the unbiased
Events
A= {head comes up}
B1= {Unbiased coin chosen}
B2= {Biased coin chosen}
Find that if a head come up, find the probability that the two headed
coin was chosen
66 / 97
177. Another Example
We have the following
Two coins are available, one unbiased and the other two headed
Assume
That you have a probability of 3
4 to choose the unbiased
Events
A= {head comes up}
B1= {Unbiased coin chosen}
B2= {Biased coin chosen}
Find that if a head come up, find the probability that the two headed
coin was chosen
66 / 97
178. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
67 / 97
179. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
180. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
181. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
182. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
183. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
184. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
185. Random Variables
Definition
In many experiments, it is easier to deal with a summary variable than
with the original probability structure.
Example
In an opinion poll, we ask 50 people whether agree or disagree with a
certain issue.
Suppose we record a “1” for agree and “0” for disagree.
The sample space for this experiment has 250 elements. Why?
Suppose we are only interested in the number of people who agree.
Define the variable X =number of “1” ’s recorded out of 50.
Easier to deal with this sample space (has only 51 elements).
68 / 97
186. Thus...
It is necessary to define a function random variable as follow
X : S → R
Graphically
69 / 97
187. Thus...
It is necessary to define a function random variable as follow
X : S → R
Graphically
S
A
69 / 97
188. Random Variables
How?
What is the probability function of the random variable is being defined
from the probability function of the original sample space?
Suppose the sample space is S = {s1, s2, ..., sn}
Suppose the range of the random variable X =< x1, x2, ..., xm >
Then, we observe X = xi if and only if the outcome of the random
experiment is an sj ∈ S s.t. X(sj) = xj or
70 / 97
189. Random Variables
How?
What is the probability function of the random variable is being defined
from the probability function of the original sample space?
Suppose the sample space is S = {s1, s2, ..., sn}
Suppose the range of the random variable X =< x1, x2, ..., xm >
Then, we observe X = xi if and only if the outcome of the random
experiment is an sj ∈ S s.t. X(sj) = xj or
70 / 97
190. Random Variables
How?
What is the probability function of the random variable is being defined
from the probability function of the original sample space?
Suppose the sample space is S = {s1, s2, ..., sn}
Suppose the range of the random variable X =< x1, x2, ..., xm >
Then, we observe X = xi if and only if the outcome of the random
experiment is an sj ∈ S s.t. X(sj) = xj or
70 / 97
191. Random Variables
How?
What is the probability function of the random variable is being defined
from the probability function of the original sample space?
Suppose the sample space is S = {s1, s2, ..., sn}
Suppose the range of the random variable X =< x1, x2, ..., xm >
Then, we observe X = xi if and only if the outcome of the random
experiment is an sj ∈ S s.t. X(sj) = xj or
P(X = xj) = P(sj ∈ S|X(sj) = xj)
70 / 97
192. Example
Setup
Throw a coin 10 times, and let R be the number of heads.
Then
S = all sequences of length 10 with components H and T
We have for
ω =HHHHTTHTTH ⇒ R (ω) = 6
71 / 97
193. Example
Setup
Throw a coin 10 times, and let R be the number of heads.
Then
S = all sequences of length 10 with components H and T
We have for
ω =HHHHTTHTTH ⇒ R (ω) = 6
71 / 97
194. Example
Setup
Throw a coin 10 times, and let R be the number of heads.
Then
S = all sequences of length 10 with components H and T
We have for
ω =HHHHTTHTTH ⇒ R (ω) = 6
71 / 97
195. Example
Setup
Let R be the number of heads in two independent tosses of a coin.
Probability of head is .6
What are the probabilities?
Ω ={HH,HT,TH,TT}
Thus, we can calculate
P (R = 0) , P (R = 1) , P (R = 2)
72 / 97
196. Example
Setup
Let R be the number of heads in two independent tosses of a coin.
Probability of head is .6
What are the probabilities?
Ω ={HH,HT,TH,TT}
Thus, we can calculate
P (R = 0) , P (R = 1) , P (R = 2)
72 / 97
197. Example
Setup
Let R be the number of heads in two independent tosses of a coin.
Probability of head is .6
What are the probabilities?
Ω ={HH,HT,TH,TT}
Thus, we can calculate
P (R = 0) , P (R = 1) , P (R = 2)
72 / 97
198. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
73 / 97
199. Types of Random Variables
Discrete
A discrete random variable can assume only a countable number of values.
Continuous
A continuous random variable can assume a continuous range of values.
74 / 97
200. Types of Random Variables
Discrete
A discrete random variable can assume only a countable number of values.
Continuous
A continuous random variable can assume a continuous range of values.
74 / 97
201. Properties
Probability Mass Function (PMF) and Probability Density Function (PDF)
The pmf /pdf of a random variable X assigns a probability for each
possible value of X.
Properties of the pmf and pdf
Some properties of the pmf:
x p(x) = 1 and P(a < X < b) =
b
k=a p(k).
In a similar way for the pdf:
´ ∞
−∞
p(x)dx = 1 and P(a < X < b) =
´ b
a
p(t)dt .
75 / 97
202. Properties
Probability Mass Function (PMF) and Probability Density Function (PDF)
The pmf /pdf of a random variable X assigns a probability for each
possible value of X.
Properties of the pmf and pdf
Some properties of the pmf:
x p(x) = 1 and P(a < X < b) =
b
k=a p(k).
In a similar way for the pdf:
´ ∞
−∞
p(x)dx = 1 and P(a < X < b) =
´ b
a
p(t)dt .
75 / 97
203. Properties
Probability Mass Function (PMF) and Probability Density Function (PDF)
The pmf /pdf of a random variable X assigns a probability for each
possible value of X.
Properties of the pmf and pdf
Some properties of the pmf:
x p(x) = 1 and P(a < X < b) =
b
k=a p(k).
In a similar way for the pdf:
´ ∞
−∞
p(x)dx = 1 and P(a < X < b) =
´ b
a
p(t)dt .
75 / 97
204. Properties
Probability Mass Function (PMF) and Probability Density Function (PDF)
The pmf /pdf of a random variable X assigns a probability for each
possible value of X.
Properties of the pmf and pdf
Some properties of the pmf:
x p(x) = 1 and P(a < X < b) =
b
k=a p(k).
In a similar way for the pdf:
´ ∞
−∞
p(x)dx = 1 and P(a < X < b) =
´ b
a
p(t)dt .
75 / 97
205. Properties
Probability Mass Function (PMF) and Probability Density Function (PDF)
The pmf /pdf of a random variable X assigns a probability for each
possible value of X.
Properties of the pmf and pdf
Some properties of the pmf:
x p(x) = 1 and P(a < X < b) =
b
k=a p(k).
In a similar way for the pdf:
´ ∞
−∞
p(x)dx = 1 and P(a < X < b) =
´ b
a
p(t)dt .
75 / 97
206. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
76 / 97
207. Cumulative Distributive Function I
Cumulative Distribution Function
With every random variable, we associate a function called
Cumulative Distribution Function (CDF) which is defined as follows:
FX (x) = P(f (X) ≤ x)
With properties:
FX (x) ≥ 0
FX (x) in a non-decreasing function of X.
Example
If X is discrete, its CDF can be computed as follows:
FX (x) = P(f (X) ≤ x) = N
k=1 P(Xk = pk).
77 / 97
208. Cumulative Distributive Function I
Cumulative Distribution Function
With every random variable, we associate a function called
Cumulative Distribution Function (CDF) which is defined as follows:
FX (x) = P(f (X) ≤ x)
With properties:
FX (x) ≥ 0
FX (x) in a non-decreasing function of X.
Example
If X is discrete, its CDF can be computed as follows:
FX (x) = P(f (X) ≤ x) = N
k=1 P(Xk = pk).
77 / 97
209. Cumulative Distributive Function I
Cumulative Distribution Function
With every random variable, we associate a function called
Cumulative Distribution Function (CDF) which is defined as follows:
FX (x) = P(f (X) ≤ x)
With properties:
FX (x) ≥ 0
FX (x) in a non-decreasing function of X.
Example
If X is discrete, its CDF can be computed as follows:
FX (x) = P(f (X) ≤ x) = N
k=1 P(Xk = pk).
77 / 97
210. Cumulative Distributive Function I
Cumulative Distribution Function
With every random variable, we associate a function called
Cumulative Distribution Function (CDF) which is defined as follows:
FX (x) = P(f (X) ≤ x)
With properties:
FX (x) ≥ 0
FX (x) in a non-decreasing function of X.
Example
If X is discrete, its CDF can be computed as follows:
FX (x) = P(f (X) ≤ x) = N
k=1 P(Xk = pk).
77 / 97
212. Cumulative Distributive Function II
Continuous Function
If X is continuous, its CDF can be computed as follows:
F(x) =
ˆ x
−∞
f (t)dt.
Remark
Based in the fundamental theorem of calculus, we have the following
equality.
p(x) =
dF
dx
(x)
Note
This particular p(x) is known as the Probability Mass Function (PMF) or
Probability Distribution Function (PDF).
79 / 97
213. Cumulative Distributive Function II
Continuous Function
If X is continuous, its CDF can be computed as follows:
F(x) =
ˆ x
−∞
f (t)dt.
Remark
Based in the fundamental theorem of calculus, we have the following
equality.
p(x) =
dF
dx
(x)
Note
This particular p(x) is known as the Probability Mass Function (PMF) or
Probability Distribution Function (PDF).
79 / 97
214. Cumulative Distributive Function II
Continuous Function
If X is continuous, its CDF can be computed as follows:
F(x) =
ˆ x
−∞
f (t)dt.
Remark
Based in the fundamental theorem of calculus, we have the following
equality.
p(x) =
dF
dx
(x)
Note
This particular p(x) is known as the Probability Mass Function (PMF) or
Probability Distribution Function (PDF).
79 / 97
215. Example: Continuous Function
Setup
A number X is chosen at random between a and b
Xhas a uniform distribution
fX (x) = 1
b−a for a ≤ x ≤ b
fX (x) = 0 for x < a and x > b
We have
FX (x) = P {X ≤ x} =
ˆ x
−∞
fX (t) dt (34)
P {a < X ≤ b} =
ˆ b
a
fX (t) dt (35)
80 / 97
216. Example: Continuous Function
Setup
A number X is chosen at random between a and b
Xhas a uniform distribution
fX (x) = 1
b−a for a ≤ x ≤ b
fX (x) = 0 for x < a and x > b
We have
FX (x) = P {X ≤ x} =
ˆ x
−∞
fX (t) dt (34)
P {a < X ≤ b} =
ˆ b
a
fX (t) dt (35)
80 / 97
217. Example: Continuous Function
Setup
A number X is chosen at random between a and b
Xhas a uniform distribution
fX (x) = 1
b−a for a ≤ x ≤ b
fX (x) = 0 for x < a and x > b
We have
FX (x) = P {X ≤ x} =
ˆ x
−∞
fX (t) dt (34)
P {a < X ≤ b} =
ˆ b
a
fX (t) dt (35)
80 / 97
218. Example: Continuous Function
Setup
A number X is chosen at random between a and b
Xhas a uniform distribution
fX (x) = 1
b−a for a ≤ x ≤ b
fX (x) = 0 for x < a and x > b
We have
FX (x) = P {X ≤ x} =
ˆ x
−∞
fX (t) dt (34)
P {a < X ≤ b} =
ˆ b
a
fX (t) dt (35)
80 / 97
219. Example: Continuous Function
Setup
A number X is chosen at random between a and b
Xhas a uniform distribution
fX (x) = 1
b−a for a ≤ x ≤ b
fX (x) = 0 for x < a and x > b
We have
FX (x) = P {X ≤ x} =
ˆ x
−∞
fX (t) dt (34)
P {a < X ≤ b} =
ˆ b
a
fX (t) dt (35)
80 / 97
220. Example: Continuous Function
Setup
A number X is chosen at random between a and b
Xhas a uniform distribution
fX (x) = 1
b−a for a ≤ x ≤ b
fX (x) = 0 for x < a and x > b
We have
FX (x) = P {X ≤ x} =
ˆ x
−∞
fX (t) dt (34)
P {a < X ≤ b} =
ˆ b
a
fX (t) dt (35)
80 / 97
222. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
82 / 97
223. Properties of the PMF/PDF
Conditional PMF/PDF
We have the conditional pdf:
p(y|x) =
p(x, y)
p(x)
.
From this, we have the general chain rule
p(x1, x2, ..., xn) = p(x1|x2, ..., xn)p(x2|x3, ..., xn)...p(xn).
Independence
If X and Y are independent, then:
p(x, y) = p(x)p(y).
83 / 97
224. Properties of the PMF/PDF
Conditional PMF/PDF
We have the conditional pdf:
p(y|x) =
p(x, y)
p(x)
.
From this, we have the general chain rule
p(x1, x2, ..., xn) = p(x1|x2, ..., xn)p(x2|x3, ..., xn)...p(xn).
Independence
If X and Y are independent, then:
p(x, y) = p(x)p(y).
83 / 97
225. Properties of the PMF/PDF
Law of Total Probability
p(y) =
x
p(y|x)p(x).
84 / 97
226. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
85 / 97
227. Expectation
Something Notable
You have the random variables R1, R2 representing how long is a call and
how much you pay for an international call:
if 0 ≤ R1 ≤ 3(minute) R2 = 10(cents)
if 3 < R1 ≤ 6(minute) R2 = 20(cents)
if 6 < R1 ≤ 9(minute) R2 = 30(cents)
We have then the probabilities
P {R2 = 10} = 0.6, P {R2 = 20} = 0.25, P {R2 = 10} = 0.15.
If we observe N calls and N is very large
We can say that we have N × 0.6 calls and 10 × N × 0.6 the cost of those
calls.
86 / 97
228. Expectation
Something Notable
You have the random variables R1, R2 representing how long is a call and
how much you pay for an international call:
if 0 ≤ R1 ≤ 3(minute) R2 = 10(cents)
if 3 < R1 ≤ 6(minute) R2 = 20(cents)
if 6 < R1 ≤ 9(minute) R2 = 30(cents)
We have then the probabilities
P {R2 = 10} = 0.6, P {R2 = 20} = 0.25, P {R2 = 10} = 0.15.
If we observe N calls and N is very large
We can say that we have N × 0.6 calls and 10 × N × 0.6 the cost of those
calls.
86 / 97
229. Expectation
Something Notable
You have the random variables R1, R2 representing how long is a call and
how much you pay for an international call:
if 0 ≤ R1 ≤ 3(minute) R2 = 10(cents)
if 3 < R1 ≤ 6(minute) R2 = 20(cents)
if 6 < R1 ≤ 9(minute) R2 = 30(cents)
We have then the probabilities
P {R2 = 10} = 0.6, P {R2 = 20} = 0.25, P {R2 = 10} = 0.15.
If we observe N calls and N is very large
We can say that we have N × 0.6 calls and 10 × N × 0.6 the cost of those
calls.
86 / 97
230. Expectation
Something Notable
You have the random variables R1, R2 representing how long is a call and
how much you pay for an international call:
if 0 ≤ R1 ≤ 3(minute) R2 = 10(cents)
if 3 < R1 ≤ 6(minute) R2 = 20(cents)
if 6 < R1 ≤ 9(minute) R2 = 30(cents)
We have then the probabilities
P {R2 = 10} = 0.6, P {R2 = 20} = 0.25, P {R2 = 10} = 0.15.
If we observe N calls and N is very large
We can say that we have N × 0.6 calls and 10 × N × 0.6 the cost of those
calls.
86 / 97
231. Expectation
Something Notable
You have the random variables R1, R2 representing how long is a call and
how much you pay for an international call:
if 0 ≤ R1 ≤ 3(minute) R2 = 10(cents)
if 3 < R1 ≤ 6(minute) R2 = 20(cents)
if 6 < R1 ≤ 9(minute) R2 = 30(cents)
We have then the probabilities
P {R2 = 10} = 0.6, P {R2 = 20} = 0.25, P {R2 = 10} = 0.15.
If we observe N calls and N is very large
We can say that we have N × 0.6 calls and 10 × N × 0.6 the cost of those
calls.
86 / 97
232. Expectation
Something Notable
You have the random variables R1, R2 representing how long is a call and
how much you pay for an international call:
if 0 ≤ R1 ≤ 3(minute) R2 = 10(cents)
if 3 < R1 ≤ 6(minute) R2 = 20(cents)
if 6 < R1 ≤ 9(minute) R2 = 30(cents)
We have then the probabilities
P {R2 = 10} = 0.6, P {R2 = 20} = 0.25, P {R2 = 10} = 0.15.
If we observe N calls and N is very large
We can say that we have N × 0.6 calls and 10 × N × 0.6 the cost of those
calls.
86 / 97
233. Expectation
Similarly
{R2 = 20} =⇒ 0.25N and total cost 5N.
{R2 = 20} =⇒ 0.15N and total cost 4.5N.
We have then the probabilities
The total cost is 6N + 5N + 4.5N = 15.5N or in average 15.5 cents per
call.
The average
10 (0.6N) + 20 (.25N) + 30 (0.15N)
N
=10 (0.6) + 20 (.25) + 30 (0.15)
=
y
yP {R2 = y}
87 / 97
234. Expectation
Similarly
{R2 = 20} =⇒ 0.25N and total cost 5N.
{R2 = 20} =⇒ 0.15N and total cost 4.5N.
We have then the probabilities
The total cost is 6N + 5N + 4.5N = 15.5N or in average 15.5 cents per
call.
The average
10 (0.6N) + 20 (.25N) + 30 (0.15N)
N
=10 (0.6) + 20 (.25) + 30 (0.15)
=
y
yP {R2 = y}
87 / 97
235. Expectation
Similarly
{R2 = 20} =⇒ 0.25N and total cost 5N.
{R2 = 20} =⇒ 0.15N and total cost 4.5N.
We have then the probabilities
The total cost is 6N + 5N + 4.5N = 15.5N or in average 15.5 cents per
call.
The average
10 (0.6N) + 20 (.25N) + 30 (0.15N)
N
=10 (0.6) + 20 (.25) + 30 (0.15)
=
y
yP {R2 = y}
87 / 97
236. Expectation
Similarly
{R2 = 20} =⇒ 0.25N and total cost 5N.
{R2 = 20} =⇒ 0.15N and total cost 4.5N.
We have then the probabilities
The total cost is 6N + 5N + 4.5N = 15.5N or in average 15.5 cents per
call.
The average
10 (0.6N) + 20 (.25N) + 30 (0.15N)
N
=10 (0.6) + 20 (.25) + 30 (0.15)
=
y
yP {R2 = y}
87 / 97
237. Expected Value
Definition
Discrete random variable X: E(X) = x xp(x).
Continuous random variable Y : E(Y ) =
´
x xp(x)dx.
Extension to a function g (x)
E(g(X)) = x g(x)p(x) (Discrete case).
E(g(X)) =
´ +∞
−∞ g(x)p(x)dx (Continuous case).
88 / 97
238. Expected Value
Definition
Discrete random variable X: E(X) = x xp(x).
Continuous random variable Y : E(Y ) =
´
x xp(x)dx.
Extension to a function g (x)
E(g(X)) = x g(x)p(x) (Discrete case).
E(g(X)) =
´ +∞
−∞ g(x)p(x)dx (Continuous case).
88 / 97
239. Expected Value
Definition
Discrete random variable X: E(X) = x xp(x).
Continuous random variable Y : E(Y ) =
´
x xp(x)dx.
Extension to a function g (x)
E(g(X)) = x g(x)p(x) (Discrete case).
E(g(X)) =
´ +∞
−∞ g(x)p(x)dx (Continuous case).
88 / 97
240. Expected Value
Definition
Discrete random variable X: E(X) = x xp(x).
Continuous random variable Y : E(Y ) =
´
x xp(x)dx.
Extension to a function g (x)
E(g(X)) = x g(x)p(x) (Discrete case).
E(g(X)) =
´ +∞
−∞ g(x)p(x)dx (Continuous case).
88 / 97
241. Linear Property
Linearity property of the Expected Value
E(af (X) + bg(Y )) = aE(f (X)) + bE(g(Y )) (36)
Example for a discrete distribution
E [aX + b] =
x
[ax + b] p (x|θ)
=a
x
xp (x|θ) + b
x
p (x|θ)
=aE [X] + b
89 / 97
242. Linear Property
Linearity property of the Expected Value
E(af (X) + bg(Y )) = aE(f (X)) + bE(g(Y )) (36)
Example for a discrete distribution
E [aX + b] =
x
[ax + b] p (x|θ)
=a
x
xp (x|θ) + b
x
p (x|θ)
=aE [X] + b
89 / 97
243. Linear Property
Linearity property of the Expected Value
E(af (X) + bg(Y )) = aE(f (X)) + bE(g(Y )) (36)
Example for a discrete distribution
E [aX + b] =
x
[ax + b] p (x|θ)
=a
x
xp (x|θ) + b
x
p (x|θ)
=aE [X] + b
89 / 97
244. Linear Property
Linearity property of the Expected Value
E(af (X) + bg(Y )) = aE(f (X)) + bE(g(Y )) (36)
Example for a discrete distribution
E [aX + b] =
x
[ax + b] p (x|θ)
=a
x
xp (x|θ) + b
x
p (x|θ)
=aE [X] + b
89 / 97
245. Example
Imagine the following
We have the following functions
1 f (x) = e−x, x ≥ 0
2 g (x) = 0, x < 0
Find
The expected Value
90 / 97
246. Example
Imagine the following
We have the following functions
1 f (x) = e−x, x ≥ 0
2 g (x) = 0, x < 0
Find
The expected Value
90 / 97
247. Example
Imagine the following
We have the following functions
1 f (x) = e−x, x ≥ 0
2 g (x) = 0, x < 0
Find
The expected Value
90 / 97
248. Example
Imagine the following
We have the following functions
1 f (x) = e−x, x ≥ 0
2 g (x) = 0, x < 0
Find
The expected Value
90 / 97
251. Outline
1 Induction
Basic Induction
Structural Induction
2 Series
Properties
Important Series
Bounding the Series
3 Probability
Intuitive Formulation
Axioms
Independence
Unconditional and Conditional Probability
Posterior (Conditional) Probability
Random Variables
Types of Random Variables
Cumulative Distributive Function
Properties of the PMF/PDF
Expected Value and Variance
Indicator Random Variable
92 / 97
252. Indicator Random Variable
Definition
The indicator of an event A is a random variable IA defined as follows:
IA (ω) =
1 if ω ∈ A
0 if ω /∈ A
(37)
Thus
If IA = 1 if A occurs.
If IA = 0 if A does not occur.
Why is this useful?
Because we can count events!!!
93 / 97
253. Indicator Random Variable
Definition
The indicator of an event A is a random variable IA defined as follows:
IA (ω) =
1 if ω ∈ A
0 if ω /∈ A
(37)
Thus
If IA = 1 if A occurs.
If IA = 0 if A does not occur.
Why is this useful?
Because we can count events!!!
93 / 97
254. Indicator Random Variable
Definition
The indicator of an event A is a random variable IA defined as follows:
IA (ω) =
1 if ω ∈ A
0 if ω /∈ A
(37)
Thus
If IA = 1 if A occurs.
If IA = 0 if A does not occur.
Why is this useful?
Because we can count events!!!
93 / 97
255. Indicator Random Variable
Definition
The indicator of an event A is a random variable IA defined as follows:
IA (ω) =
1 if ω ∈ A
0 if ω /∈ A
(37)
Thus
If IA = 1 if A occurs.
If IA = 0 if A does not occur.
Why is this useful?
Because we can count events!!!
93 / 97
256. Useful Property
The Expected value of an indicator function
E [IA] = 0 × P (IA = 0) + 1 × P (IA = 1) = P (IA = 1) = P (A) (38)
Why is this useful?
Because we can use this to count things when a probability is involved!!!
94 / 97
257. Useful Property
The Expected value of an indicator function
E [IA] = 0 × P (IA = 0) + 1 × P (IA = 1) = P (IA = 1) = P (A) (38)
Why is this useful?
Because we can use this to count things when a probability is involved!!!
94 / 97
258. Useful Property
The Expected value of an indicator function
E [IA] = 0 × P (IA = 0) + 1 × P (IA = 1) = P (IA = 1) = P (A) (38)
Why is this useful?
Because we can use this to count things when a probability is involved!!!
94 / 97
259. Useful Property
The Expected value of an indicator function
E [IA] = 0 × P (IA = 0) + 1 × P (IA = 1) = P (IA = 1) = P (A) (38)
Why is this useful?
Because we can use this to count things when a probability is involved!!!
94 / 97
260. Useful Property
The Expected value of an indicator function
E [IA] = 0 × P (IA = 0) + 1 × P (IA = 1) = P (IA = 1) = P (A) (38)
Why is this useful?
Because we can use this to count things when a probability is involved!!!
94 / 97
261. Method of Indicators
It is possible to see random variables as a sum of indicators functions
R = IA1 + ... + IAn (39)
Then
E [R] =
n
j=1
E IAj =
n
j=1
P (Aj) (40)
Hopefully
It is easier to calculate P (Aj) than to evaluate E [R] directly.
95 / 97
262. Method of Indicators
It is possible to see random variables as a sum of indicators functions
R = IA1 + ... + IAn (39)
Then
E [R] =
n
j=1
E IAj =
n
j=1
P (Aj) (40)
Hopefully
It is easier to calculate P (Aj) than to evaluate E [R] directly.
95 / 97
263. Method of Indicators
It is possible to see random variables as a sum of indicators functions
R = IA1 + ... + IAn (39)
Then
E [R] =
n
j=1
E IAj =
n
j=1
P (Aj) (40)
Hopefully
It is easier to calculate P (Aj) than to evaluate E [R] directly.
95 / 97
264. Example
A single unbiased die is tossed independently n times
Let R1 be the numbers of 1’s.
Let R2 be the numbers of 2’s.
Find E [R1R2]
We can express each variable as
R1 =IA1 + ... + IAn
R2 =IB1 + ... + IBn
Thus
E [R1R2] =
n
i=1
n
j=1
E IAi IBj (41)
96 / 97
265. Example
A single unbiased die is tossed independently n times
Let R1 be the numbers of 1’s.
Let R2 be the numbers of 2’s.
Find E [R1R2]
We can express each variable as
R1 =IA1 + ... + IAn
R2 =IB1 + ... + IBn
Thus
E [R1R2] =
n
i=1
n
j=1
E IAi IBj (41)
96 / 97
266. Example
A single unbiased die is tossed independently n times
Let R1 be the numbers of 1’s.
Let R2 be the numbers of 2’s.
Find E [R1R2]
We can express each variable as
R1 =IA1 + ... + IAn
R2 =IB1 + ... + IBn
Thus
E [R1R2] =
n
i=1
n
j=1
E IAi IBj (41)
96 / 97
267. Next
Case 1 i = j IAi and IBj are independent
E IAi IBj = E [IAi ] E IBj = P (Ai) P (Bj) =
1
6
×
1
6
=
1
36
(42)
Case 2 i = j Ai and Bi are disjoint
Thus, IAi IBi = IAi∩Bi = 0
Thus
E [R1R2] =
n (n − 1)
36
(43)
97 / 97
268. Next
Case 1 i = j IAi and IBj are independent
E IAi IBj = E [IAi ] E IBj = P (Ai) P (Bj) =
1
6
×
1
6
=
1
36
(42)
Case 2 i = j Ai and Bi are disjoint
Thus, IAi IBi = IAi∩Bi = 0
Thus
E [R1R2] =
n (n − 1)
36
(43)
97 / 97
269. Next
Case 1 i = j IAi and IBj are independent
E IAi IBj = E [IAi ] E IBj = P (Ai) P (Bj) =
1
6
×
1
6
=
1
36
(42)
Case 2 i = j Ai and Bi are disjoint
Thus, IAi IBi = IAi∩Bi = 0
Thus
E [R1R2] =
n (n − 1)
36
(43)
97 / 97