2. Random Variables
• Random experiment: the outcome cannot be predicted with
certainty
• Statistics: model and analyze the outcomes
• Sample space S = set of all possible outcomes
• Die X = { 1, 2, 3, 4, 5, 6}
• Period of a pendulum
• Errors in the measuring process
• Fundamental unpredictability
Discrete random variable
Continuous random variable
3. • discrete vs. continuous probabilities
• discrete
– finite (or countably infinite) number of outcomes
• continuous
– outcomes vary along continuous scale
basic concepts (cont.)
Basic Concepts
5. Independent events
• one event has no influence on the outcome
of another event
• if events A & B are independent
then P(A&B) = P(A)*P(B)
• if P(A&B) = P(A)*P(B)
then events A & B are independent
• coin flipping
if P(H) = P(T) = .5 then
P(HTHTH) = P(HHHHH) =
.5*.5*.5*.5*.5 = .5^5 ≈ .03
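The multiplication rule for independent flips can be checked with a short sketch. The helper name `sequence_probability` is an illustrative assumption, not something from the slides.

```python
from itertools import product

P_HEADS = 0.5  # fair coin, as on the slide


def sequence_probability(seq, p_heads=P_HEADS):
    """Probability of one specific H/T sequence under independent flips."""
    prob = 1.0
    for flip in seq:
        prob *= p_heads if flip == "H" else 1.0 - p_heads
    return prob


p_hthth = sequence_probability("HTHTH")
p_hhhhh = sequence_probability("HHHHH")
print(p_hthth, p_hhhhh)  # both 0.03125, i.e. .5^5

# Sanity check: the 2**5 possible sequences together have probability 1.
total = sum(sequence_probability(s) for s in product("HT", repeat=5))
print(total)
```

Any specific sequence of five fair flips is equally likely, which is why the alternating and the all-heads sequences get the same value.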
6. • mutually exclusive events (each with nonzero probability) are not
independent
• rather, they are the most dependent kinds of events
– if not heads, then tails
– joint probability of 2 mutually exclusive events
is 0
• P(A&B)=0
Mutually Exclusive Events
7. • if A and B are mutually exclusive events:
P(A or B) = P(A) + P(B)
ex., die roll: P(1 or 6) = 1/6 + 1/6 = .33
• possibility set:
the probabilities of all possible outcomes sum to 1
~A = anything other than A
P(A or ~A) = P(A) + P(~A) = 1
basic concepts (cont.)
Basic Concepts
11. Bayes’ Theorem
An email message can pass through one of two server routes.
Route   % messages   Probability of error
                     Server1   Server2   Server3   Server4
1       30           0.01      0.015     -         -
2       70           -         -         0.02      0.003
1. What is the probability that a message will arrive without error?
2. If a message arrives in error, find the probability that it was sent
through Route 1.
12. Bayes’ Theorem
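Ans-1
P(R1) = 0.30; P(R2) = 0.70
P(Error|R1) = 0.01 + 0.015 = 0.025; P(Error|R2) = 0.02 + 0.003 = 0.023 ------(1)
P(Error) = (0.30 * 0.025) + (0.70 * 0.023) = 0.0075 + 0.0161 = 0.0236
Ans 1 = 1 - 0.0236 = 0.9764, i.e. 97.64%
Ans-2
P(R1|Error) = [P(R1) * P(Error|R1)] / P(Error) = (0.30 * 0.025) / 0.0236 = 0.3178
(The complementary value is P(R2|Error) = 0.0161 / 0.0236 = 0.6822.)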
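The two answers can be reproduced with a minimal sketch of the total-probability and Bayes steps. The dictionary names are illustrative assumptions. The per-route error probability is taken as the sum of the two server error probabilities, as in the worked answer; if the servers were assumed to fail independently, the exact value would be 1 - (1 - 0.01)(1 - 0.015), which is slightly smaller.

```python
# Prior probability of each route (share of messages).
p_route = {"R1": 0.30, "R2": 0.70}

# Error probability per route: sum of its two servers' error
# probabilities, following the worked answer (an approximation).
p_error_given_route = {"R1": 0.01 + 0.015, "R2": 0.02 + 0.003}

# Law of total probability: P(Error) = sum over routes of P(R) * P(Error|R).
p_error = sum(p_route[r] * p_error_given_route[r] for r in p_route)
p_no_error = 1.0 - p_error

# Bayes' theorem: P(R|Error) = P(R) * P(Error|R) / P(Error).
posterior = {r: p_route[r] * p_error_given_route[r] / p_error for r in p_route}

print(round(p_no_error, 4))       # 0.9764
print(round(posterior["R1"], 4))  # 0.3178
print(round(posterior["R2"], 4))  # 0.6822
```

Route 2 carries most of the traffic, so it accounts for most of the errors even though its per-message error probability is slightly lower.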
13. Assignment
The probability that an error is present in a piece of code is
0.05.
The probability of a tester detecting an error when the
error is present is 0.78, and the probability of
incorrectly detecting an error when the error is not
present is 0.06.
What is the probability that a code is tested as having an
error? What is the probability that an error is present when
a code is tested as having an error?
15. NB
• Let’s say we’re testing for a rare disease, where 1% of
the population is infected. We have a highly sensitive
and specific test, which is not quite perfect:
• 99% of sick patients test positive.
• 99% of healthy patients test negative.
• Given that a patient tests positive, what is the
probability that the patient is actually sick?
• Consider 10,000 perfectly representative people.
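The counting argument over 10,000 representative people can be sketched as follows. The variable names are illustrative assumptions; the numbers are the ones given on the slide.

```python
population = 10_000
prevalence = 0.01    # 1% of the population is infected
sensitivity = 0.99   # P(test positive | sick)
specificity = 0.99   # P(test negative | healthy)

sick = population * prevalence                 # about 100 people
healthy = population - sick                    # about 9,900 people

true_positives = sick * sensitivity            # about 99 sick people test positive
false_positives = healthy * (1 - specificity)  # about 99 healthy people test positive

# Among everyone who tests positive, the fraction that is actually sick.
p_sick_given_positive = true_positives / (true_positives + false_positives)
print(round(p_sick_given_positive, 3))  # 0.5
```

Because the disease is rare, the false positives from the large healthy group are as numerous as the true positives, so a positive result implies only about a 50% chance of being sick.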
17. PDF/ PMF
• Probability that an event occurs
– probability density function - continuous random variables or
– probability mass function - discrete random variables.
• To find the probability that a continuous random variable falls
in a particular interval of real numbers, calculate the
appropriate area under the curve of f(x).
• Thus, evaluate the integral of f(x) over the interval of values
corresponding to the event of interest. For the interval [a, b] this is
represented by
$P(a \le X \le b) = \int_a^b f(x)\,dx$
19. Cumulative Distribution Function (CDF)
The CDF F(x) is defined as the probability that the
random variable X assumes a value less than or
equal to a given x.
It is calculated from the probability density function:
$F(x) = P(X \le x) = \int_{-\infty}^{x} f(t)\,dt$
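As a rough illustration of "area under the curve", the sketch below integrates a density numerically. The density f(x) = 2x on [0, 1] and the helper names are assumptions chosen for illustration (they are not from the slides); its exact CDF is F(x) = x² on [0, 1].

```python
def pdf(x):
    """Example density (assumed for illustration): f(x) = 2x on [0, 1]."""
    return 2.0 * x if 0.0 <= x <= 1.0 else 0.0


def integrate(f, a, b, steps=10_000):
    """Trapezoidal approximation of the integral of f over [a, b]."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h


def cdf(x):
    """F(x) = P(X <= x); the density is zero below 0, so integrate from 0."""
    return integrate(pdf, 0.0, x) if x > 0.0 else 0.0


p_interval = integrate(pdf, 0.25, 0.5)  # P(0.25 <= X <= 0.5)
print(round(p_interval, 4))  # 0.1875, i.e. 0.5**2 - 0.25**2
print(round(cdf(0.5), 4))    # 0.25
```

The interval probability equals F(0.5) - F(0.25), which is the usual way a CDF is used once it is known.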
33. Random Variables(RV)
• A random experiment is defined as a process or action whose outcome cannot
be predicted with certainty and would likely change when the
experiment is repeated.
• The variability in the outcomes might arise from many
sources, e.g. slight errors in measurements.
• The sample space is the set of all outcomes from an
experiment, e.g. a die: {1,2,3,4,5,6}.
• The outcomes from random experiments are often represented
by an uppercase variable such as X.
• This is called a RV, and its value is subject to the uncertainty
intrinsic to the experiment.
34. Random Variables(RV)
• Formally, a RV is a real-valued function defined
on the sample space.
• RV can be discrete or continuous.
o A discrete RV takes values from a finite or countably infinite set of
numbers (e.g. the number of typographical errors on a page).
o A continuous RV takes on values from an interval of real
numbers (e.g. the inter-arrival times of planes at a runway).
35. Random Variables(RV)
• An event is a subset of outcomes in the sample space (e.g. the tensile strength
of cement is in the range 40 to 50 kg/cm²).
• Two events that cannot occur simultaneously or jointly are called
mutually exclusive events.
• Probability is a measure of the likelihood that some event will occur.
• Probabilities range between 0 and 1. A PDF of a RV describes the
probabilities associated with each possible value for the RV.
• Two ways to assign probabilities:
• Equal likelihood model: assign probability 1/n to each of n equally likely outcomes.
• Relative frequency method: conduct the experiment n times and record the
outcomes. The probability of event E is assigned by P(E) = f ⁄ n, where f denotes the
number of experimental outcomes that satisfy event E.
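The two assignment methods can be compared on a simulated die. This is a sketch under assumptions: the event E ("the roll is even"), the number of trials, and the seed are illustrative choices, not part of the slides.

```python
import random

random.seed(0)  # fixed seed so the run is repeatable

n = 60_000  # number of repetitions of the experiment
rolls = [random.randint(1, 6) for _ in range(n)]

# Relative frequency method: P(E) = f / n for the event E = "roll is even".
f = sum(1 for r in rolls if r % 2 == 0)
p_relative = f / n

# Equal likelihood model: 3 favourable outcomes out of 6 equally likely ones.
p_equal_likelihood = 3 / 6

print(p_relative, p_equal_likelihood)
```

With a large n the relative frequency settles close to the equal-likelihood value of 0.5.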
36. RV
• Discrete RV
o Binomial
o Poisson
• Continuous distributions:
o uniform,
o normal,
o exponential,
o gamma,
o chi-square, the Weibull, the beta and the multivariate
normal etc.
37. The binomial distribution
• A discrete probability distribution.
• It describes the outcome of n independent trials in an
experiment. Each trial is assumed to have only two
outcomes, either success or failure.
• If the probability of a successful trial is p, then the
probability of having x successful outcomes in an
experiment of n independent trials is as follows.
$P(X = x) = \binom{n}{x} p^x (1-p)^{n-x}, \quad x = 0, 1, \dots, n$
• Mean E[X] = np and variance V[X] = np(1-p)
38. The binomial distribution
• Frequently used to model the number of
successes in a sample of size n drawn with
replacement from a population of size N.
39. The binomial distribution
• Suppose there are twelve multiple choice questions in an
English class quiz. Each question has five possible answers,
and only one of them is correct.
• Find the probability of having
(a) Exactly four answers correct
(b) four or less correct answers
if a student attempts to answer every question at random.
40. The binomial distribution
• Solution
Since only one out of five possible answers is correct, the probability of answering a question correctly at
random is 1/5 = 0.2.
We can find the probability of having exactly 4 correct answers by random attempts as follows.
• Binomial probability with x = 4, size n = 12, prob p = 0.2: P(X = 4) ≈ 0.1329
A simulation gives approximately the same value:
import numpy as np
np.mean(np.random.binomial(12, 0.2, 20000) == 4)  # exactly four answers correct, close to 0.1329
• To find the probability of having four or fewer correct answers by random attempts, sum the binomial probabilities for x = 0,…,4.
• P(X ≤ 4) = P(X = 0) + P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) ≈ 0.9274, each term with size n = 12, prob p = 0.2
np.mean(np.random.binomial(12, 0.2, 20000) <= 4)  # four or fewer correct, close to 0.9274
The probability of four or fewer questions answered correctly at random in a twelve-question multiple choice
quiz is 92.7%.
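The quoted values can also be obtained exactly from the binomial formula rather than by simulation. The helper `binom_pmf` below is an illustrative name, written with the standard library only.

```python
from math import comb


def binom_pmf(x, n, p):
    """P(X = x) for a binomial RV with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p) ** (n - x)


n, p = 12, 0.2  # twelve questions, one correct answer out of five

p_exactly_4 = binom_pmf(4, n, p)
p_at_most_4 = sum(binom_pmf(x, n, p) for x in range(5))  # x = 0, ..., 4

print(round(p_exactly_4, 4))  # 0.1329
print(round(p_at_most_4, 4))  # 0.9274

# Mean and variance from E[X] = np and V[X] = np(1-p).
mean, variance = n * p, n * p * (1 - p)
```

A simulation with 20,000 draws will fluctuate around these values from run to run, which is why the exact computation is useful as a reference.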
41. Poisson Distribution
• Limiting case of the binomial, where the chance of success is very
small (p -> 0), n is large, and np remains a small finite
quantity. The binomial is awkward to apply in this regime, so the Poisson distribution (PD) is used instead.
• E.g.: number of printing mistakes in a book; defects in a length of
wire, i.e. the "law of improbable events".
• PD is appropriate for applications where events occur at points
in time or space: arrivals at a bank counter or fuel station, arrivals of
aircraft at a runway.
42. Poisson Distribution
• The Poisson distribution is the probability distribution of
independent event occurrences in an interval.
• If λ is the mean occurrence per interval, then the probability of
having x occurrences within a given interval is:
$P(X = x) = \frac{\lambda^x e^{-\lambda}}{x!}, \quad x = 0, 1, 2, \dots$
• E[X] = V[X] = λ
43. Poisson Distribution
Ex Problem
• If there are twelve cars crossing a bridge per minute on average, find the probability of having seventeen or
more cars crossing the bridge in a particular minute.
Solution
• The probability of having sixteen or fewer cars crossing the bridge in a particular minute is
P(X ≤ 16) = 0.89871
• Hence the probability of having seventeen or more cars crossing the bridge in a minute is the upper tail of
the probability mass function:
P(X ≥ 17) = 1 - 0.89871 = 0.10129
Answer
• If there are twelve cars crossing a bridge per minute on average, the probability of having seventeen or more
cars crossing the bridge in a particular minute is 10.1%.
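The two tail values in the bridge example can be reproduced by summing the Poisson probabilities directly. The helper `poisson_pmf` is an illustrative name, using only the standard library.

```python
from math import exp, factorial


def poisson_pmf(x, lam):
    """P(X = x) for a Poisson RV with mean lam occurrences per interval."""
    return lam**x * exp(-lam) / factorial(x)


lam = 12  # twelve cars per minute on average

# Lower tail: sixteen or fewer cars in a minute.
p_at_most_16 = sum(poisson_pmf(x, lam) for x in range(17))

# Upper tail: seventeen or more cars in a minute.
p_at_least_17 = 1.0 - p_at_most_16

print(round(p_at_most_16, 5))   # 0.89871
print(round(p_at_least_17, 5))  # 0.10129
```

Computing the upper tail as one minus the lower tail avoids summing an infinite number of terms.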
44. Expectation
• The mean or EV of a RV is defined using the PDF/PMF.
• A measure of central tendency of the distribution. If we observe many
values of the RV and take the average => expect that value to be close to
the mean.
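• EXPECTED VALUE - DISCRETE RV
$\mu = E[X] = \sum_{i} x_i f(x_i)$
• VARIANCE - DISCRETE RV
$\sigma^2 = V[X] = E[(X-\mu)^2] = \sum_{i} (x_i - \mu)^2 f(x_i)$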
If a RV has a large variance, then an observed value of the RV is more likely to be far
from the mean μ.
The SD is the square root of the variance.
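As a small worked case, the sketch below computes the mean, variance, and standard deviation of a fair six-sided die from its PMF. The die is an illustrative choice; the sums are the standard discrete definitions of expected value and variance.

```python
from math import sqrt

# PMF of a fair die: each face has probability 1/6.
pmf = {x: 1 / 6 for x in range(1, 7)}

# Expected value: sum of x * f(x) over all values.
mean = sum(x * p for x, p in pmf.items())

# Variance: sum of (x - mean)^2 * f(x) over all values.
variance = sum((x - mean) ** 2 * p for x, p in pmf.items())

# Standard deviation: square root of the variance.
sd = sqrt(variance)

print(round(mean, 4), round(variance, 4), round(sd, 4))  # 3.5 2.9167 1.7078
```

Note that the mean 3.5 is not itself a possible outcome; it is the long-run average of many rolls.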
47. Moments of a RV
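• Other expected values of interest in statistics are the moments of a RV.
• These are the expectations of powers of the RV: the r-th moment is $\mu'_r = E[X^r]$ and the r-th central moment is $\mu_r = E[(X-\mu)^r]$.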
48. Skewness
• The uniform and the normal distribution are examples of
symmetric distributions.
• The gamma and the exponential are examples of skewed or
asymmetric distributions.
• The 3rd central moment is a measure of asymmetry or
skewness in the distribution.
Coefficient of skewness: $\gamma_1 = \mu_3 / \mu_2^{3/2}$, where $\mu_r = E[(X-\mu)^r]$ is the r-th central moment.
Distributions skewed to the left have a negative coefficient of skewness,
distributions skewed to the right have a positive value, and
symmetric distributions have zero.
However, a coefficient of skewness equal to zero does not mean that the distribution must be
symmetric.
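The sign behaviour of the coefficient of skewness can be illustrated on simulated samples. This is a sketch under assumptions: the sample size, seed, and the choice of a standard normal and a unit-rate exponential are illustrative, and the function estimates the coefficient from sample central moments.

```python
import random


def skewness(xs):
    """Sample coefficient of skewness: m3 / m2**1.5 (central moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    return m3 / m2**1.5


random.seed(1)  # fixed seed so the run is repeatable
n = 100_000

normal_sample = [random.gauss(0.0, 1.0) for _ in range(n)]  # symmetric
right_skewed = [random.expovariate(1.0) for _ in range(n)]  # long right tail
left_skewed = [-x for x in right_skewed]                    # mirrored: long left tail

skew_normal = skewness(normal_sample)  # close to 0
skew_right = skewness(right_skewed)    # positive (theoretical value 2)
skew_left = skewness(left_skewed)      # negative
print(skew_normal, skew_right, skew_left)
```

Mirroring a sample flips the sign of its skewness, which matches the left/right convention stated above.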
49. January 2, 2024 49
Symmetric vs. Skewed Data
• Median, mean and mode of symmetric, positively and negatively
skewed data
• In a unimodal frequency curve with a perfectly symmetric data
distribution, the mean, median, and mode are all at the same center
value, as shown in Figure.
• Data in most real applications are not symmetric.
• They may instead be either positively skewed, where the mode
occurs at a value that is smaller than the median (Figure), or
• negatively skewed, where the mode occurs at a value greater
than the median (Figure).
50. Continuous Random Variables
• A continuous random variable X takes all values in an
interval of numbers (the set of values is not countable).
•The probability distribution of a continuous r.v. X is
described by a density curve.
•The probability of any event is the area under the density
curve and above the values of X that make up the event.
51. Kurtosis
• Kurtosis measures a different type of departure from normality -
indicating the extent of the peak (or the degree of flatness near its
center) in a distribution.
• The coefficient of kurtosis: $\gamma_2 = \mu_4 / \mu_2^2$, where $\mu_r = E[(X-\mu)^r]$ is the r-th central moment.
• If the distribution is normal, then this ratio is equal to 3.
> 3 => more values in the neighborhood of the mean (the curve is more peaked than the
normal distribution).
< 3 => the curve is flatter than the normal.
Sometimes the coefficient of excess kurtosis, $\gamma_2 - 3$, is used as a measure of kurtosis.
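The three cases (equal to 3, below 3, above 3) can be illustrated on simulated samples. This is a sketch under assumptions: the uniform and Laplace distributions, the sample size, and the seed are illustrative choices not taken from the slides; the Laplace sample is generated as the difference of two exponential draws.

```python
import random


def kurtosis(xs):
    """Sample coefficient of kurtosis: m4 / m2**2 (central moments)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    return m4 / m2**2


random.seed(2)  # fixed seed so the run is repeatable
n = 100_000

normal_sample = [random.gauss(0.0, 1.0) for _ in range(n)]
uniform_sample = [random.random() for _ in range(n)]
# Difference of two independent exponentials follows a Laplace distribution.
laplace_sample = [random.expovariate(1.0) - random.expovariate(1.0) for _ in range(n)]

kurt_normal = kurtosis(normal_sample)    # close to 3
kurt_uniform = kurtosis(uniform_sample)  # close to 1.8: flatter than the normal
kurt_laplace = kurtosis(laplace_sample)  # above 3: more peaked (theoretical value 6)

excess_normal = kurt_normal - 3.0        # excess kurtosis, close to 0
print(kurt_normal, kurt_uniform, kurt_laplace)
```

The fourth-moment estimate is sensitive to extreme values, so the Laplace figure varies noticeably between seeds even at this sample size.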
63. Six Sigma
• For any normal RV
– One sigma (μ ± σ) covers 68.27% of values
– Two sigma (μ ± 2σ) covers 95.45%
– A six sigma process is one in which 99.99966%
of all opportunities to produce some feature of
a part are statistically expected to be free of
defects (3.4 defective features per million
opportunities).
– Motorola set a goal of "six sigma" for all of its
manufacturing.
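The coverage percentages follow from the normal CDF and can be computed with the error function. The sketch below also reproduces the 3.4 per million figure under the conventional assumption, not stated on the slide, that the process mean may drift by 1.5σ, so that defects correspond to a one-sided tail beyond 4.5σ.

```python
from math import erf, erfc, sqrt


def coverage(k):
    """P(mu - k*sigma < X < mu + k*sigma) for any normal RV."""
    return erf(k / sqrt(2))


print(round(coverage(1) * 100, 2))  # 68.27
print(round(coverage(2) * 100, 2))  # 95.45
print(round(coverage(3) * 100, 2))  # 99.73

# Assumption: six sigma with a 1.5 sigma mean shift leaves a one-sided
# tail beyond 4.5 sigma; P(Z > 4.5) = 0.5 * erfc(4.5 / sqrt(2)).
defects_per_million = 0.5 * erfc(4.5 / sqrt(2)) * 1e6
print(round(defects_per_million, 1))  # 3.4
```

Without the shift assumption, the two-sided tail beyond ±6σ would be far smaller than 3.4 per million, which is why the convention matters when quoting the figure.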
64. Ex – Normal distribution
• Current in a strip of wire is assumed to
follow a normal distribution with mean 10
mA and variance 4 (mA)².
• Find the probability that the measurement
of current will exceed 13 mA
The continuous random variable X has the Normal distribution if the pdf is:
$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty$
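For the wire-current example (mean 10 mA, variance 4 (mA)², so σ = 2 mA), the required probability is P(X > 13) = P(Z > 1.5). A minimal sketch using the error function follows; `normal_cdf` is an illustrative helper name.

```python
from math import erf, sqrt


def normal_cdf(x, mu, sigma):
    """P(X <= x) for a normal RV with mean mu and standard deviation sigma."""
    return 0.5 * (1.0 + erf((x - mu) / (sigma * sqrt(2))))


mu = 10.0          # mean current in mA
sigma = sqrt(4.0)  # standard deviation in mA (variance is 4 (mA)^2)

z = (13.0 - mu) / sigma                         # standardised value, 1.5
p_exceeds_13 = 1.0 - normal_cdf(13.0, mu, sigma)

print(z, round(p_exceeds_13, 4))  # 1.5 0.0668
```

So roughly 6.7% of current measurements would be expected to exceed 13 mA under this model.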
74. Exponential Distribution
Definition
Cdf:
$F(x) = \begin{cases} 1 - e^{-\lambda x}, & x \ge 0 \\ 0, & \text{elsewhere} \end{cases}$
• Used to model the amount of time until a specific event occurs, or the
time between independent events.
• Examples:
• the time until the computer locks up,
• the time between arrivals of telephone calls, or
• the time until a part fails.
λ is the average arrival rate of those events.
75. • Memoryless property: $P(X > s + t \mid X > s) = P(X > t)$
=> the probability that the object will operate for time s+t, given it
has already operated for time s, is simply the probability that it
operates for time t.
When the exponential is used to represent inter-arrival times, then
the parameter λ is a rate with units of arrivals per time period.
When the exponential is used to model the time until a failure
occurs, then λ is the failure rate.
Exponential Distribution
76. • The time between arrivals of vehicles at an intersection
follows an exponential distribution with a mean of 12
seconds. What is the probability that the time between
arrivals is 10 seconds or less?
• The mean inter-arrival time is given as 12 seconds, so λ = 1 ⁄ 12 per second. The
required probability is
$P(X \le 10) = 1 - e^{-10/12} \approx 0.57$
Exponential Distribution
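The intersection example can be evaluated directly from the exponential CDF F(x) = 1 - exp(-λx). A minimal sketch, with `expon_cdf` as an illustrative helper name:

```python
from math import exp


def expon_cdf(x, lam):
    """P(X <= x) for an exponential RV with rate lam."""
    return 1.0 - exp(-lam * x) if x >= 0 else 0.0


mean_gap = 12.0       # mean time between arrivals, in seconds
lam = 1.0 / mean_gap  # arrival rate, in arrivals per second

p_within_10 = expon_cdf(10.0, lam)
print(round(p_within_10, 4))  # 0.5654
```

The rate is the reciprocal of the mean inter-arrival time, so a 12-second mean corresponds to 1/12 arrivals per second.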
77. Exponential Distribution
•An interesting property of an exponential random variable is the
lack of memory property: our starting point for observing the system does not matter.
In the example above, suppose that no vehicles arrive from 10:00
to 10:15 AM; the probability that a vehicle arrives in the
next 10 secs is still 0.57.
Because we have already been waiting for 15 minutes, we may feel that
we are "due", i.e. that the probability of a vehicle arriving in
the next 10 secs should be greater than 0.57. For an exponential distribution it is not.
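The lack of memory property can be checked numerically for the intersection example: the probability of an arrival in the next 10 seconds, given 15 minutes without one, equals the unconditional probability. This sketch uses the survival function P(X > t) = exp(-λt); the helper name is an illustrative assumption.

```python
from math import exp


def survival(t, lam):
    """P(X > t) for an exponential RV with rate lam."""
    return exp(-lam * t)


lam = 1.0 / 12.0  # arrivals per second (mean gap of 12 seconds)
s = 15 * 60       # already waited 15 minutes, in seconds
t = 10            # look-ahead window, in seconds

# Conditional: P(X <= s + t | X > s) = 1 - P(X > s + t) / P(X > s).
p_conditional = 1.0 - survival(s + t, lam) / survival(s, lam)

# Unconditional: P(X <= t).
p_unconditional = 1.0 - survival(t, lam)

print(round(p_conditional, 4), round(p_unconditional, 4))  # 0.5654 0.5654
```

The ratio of survival probabilities reduces to exp(-λt), so the waiting time s drops out entirely.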
81. Assignment
• Suppose the mean checkout time of a supermarket cashier is three
minutes. Assuming checkout times are exponentially distributed, find the
probability of a customer checkout being completed
by the cashier in less than two minutes. (0.48658)
• Assume that the test scores of a college entrance exam fit a normal
distribution. Furthermore, the mean test score is 72, and the standard
deviation is 15.2. What is the percentage of students scoring 84 or
more in the exam? (0.21492, i.e. about 21.5%)