SDM Lecture 4
Probability Distributions
• Probability distributions
• We have discussed some basic ways in
which we can describe a set of actual
statistical data:
– Mean, median and mode as measures of
central tendency
– Variance, Standard Deviation, Coefficient of
Variation as measures of spread.
– Histograms as a way of plotting the overall
distribution of data.
• But to draw useful conclusions from
statistical data, we need to go beyond
description, to analysis.
• To do this, we need to have some idea, or model
of the type of distribution of data we might
expect a priori. We can then compare the actual
data with the prior expectations.
• For example, a new drug is being tested to treat
a disease. The drug is given to a group of
patients, while a placebo (coloured, flavoured
liquid) is given to a control group, with neither
patients nor doctors knowing which group is
which.
• Records are kept as to how quickly/whether
patients recover in each group.
• Hopefully, more who get the drug will recover.
But if this is the case, how can we tell if this is
purely down to chance, or if the drug really is
working?
• We need an idea of what the distribution of
recovery rates would be likely to be if it were
purely down to chance. How often would 10%
recover, how often would 20% recover, etc.?
• Only then can we say, for example “50% of the
treatment group recovered, compared to only
30% of the control group. There would be a less
than 1% chance of this difference occurring
randomly.”
• Thus, to conduct meaningful statistical analysis,
we need to know about the shape of
distributions of different statistics we might
expect beforehand.
• This is based on probability theory – where we
make assumptions and deductions about the
probability of different events happening or
different values of data occurring.
Some standard probability
distributions
• Binomial - when the underlying probability
experiment has only two possible outcomes
(e.g. tossing a coin)
• Normal - when many small independent
factors influence a variable (e.g. height,
influenced by genes, diet, etc.)
• Poisson - for rare events, when the
probability of occurrence is low
• Tossing a coin – the binomial distribution
• We toss a coin 100 times. We get 60 heads.
Does this suggest that the coin is biased
towards heads?
• Put another way, if the coin were fair, that is
equally likely to give heads or tails, how likely
would it be to get a split at least as uneven as
60 to 40 of one side to the other?
• Or put another way: suppose we were to do the
experiment of 100 coin tosses a large number of
times, with a fair coin, and record all the
resulting scores from 0 to 100 heads. How often
would we expect to get 0 heads? How often 1
head? And so on, up to 100. What would we
expect the mean and the standard deviation to
be? What would a histogram of these scores be
expected to look like?
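The 60-heads question can be checked numerically. The following is a minimal sketch in Python, assuming a fair coin and independent tosses; the helper name binomial_tail is our own and not part of the lecture. It sums the binomial probabilities of every score from 60 heads upwards.

```python
from math import comb

def binomial_tail(n, k_min, p=0.5):
    """P(X >= k_min) for X ~ Binomial(n, p), by direct summation."""
    return sum(comb(n, k) * p**k * (1 - p)**(n - k) for k in range(k_min, n + 1))

# Probability of 60 or more heads in 100 tosses of a fair coin.
p_60_or_more = binomial_tail(100, 60)
# A fair coin is symmetric, so "60 or more of either side" is twice as likely.
p_either_side = 2 * p_60_or_more

print(f"P(at least 60 heads)       = {p_60_or_more:.4f}")
print(f"P(at least 60 of one side) = {p_either_side:.4f}")
```

Under these assumptions the first probability is a little under 3%, so 60 heads is unusual for a fair coin but far from impossible.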
• The resulting distribution is called a binomial
distribution, as it is based on two possible
outcomes in each trial.
• Let’s take an easier case, where we toss a coin
6 times and count the heads. There are seven
possible scores we can get for the number of
heads – 0 up to 6. How likely is each to occur?
• We can think about this in terms of the different
possible sequences of heads and tails we might
get – e.g. HHTHTT or HTTTTH.
• If the coin is fair, each sequence is equally likely.
So what we need to do is count up the number
of possible sequences that can give each
number of heads.
• How many possible sequences are there?
Well, there are six trials, and each can
have two possible outcomes. This means
there are 2x2x2x2x2x2 possible
sequences, or 2^6 = 64.
• How many of these sequences give
exactly 0 heads? Clearly only 1: TTTTTT.
• How many give 1 head? There are 6 of
these, as the sole head can occur in any
of the six tosses – that is, we can get
HTTTTT, THTTTT, TTHTTT, TTTHTT,
TTTTHT or TTTTTH.
• What about 2 heads? There are rather
more possibilities.
• HHTTTT
• HTHTTT
• HTTHTT
• HTTTHT
• HTTTTH
• THHTTT
• THTHTT
• THTTHT
• THTTTH
• TTHHTT
• TTHTHT
• TTHTTH
• TTTHHT
• TTTHTH
• TTTTHH
That is, there are 15 possibilities.
Similarly, we can show there are 20
ways of getting three heads. (There is
a formula, for any number of trials,
and for any number of ‘successes’, and
for any probability of success in a
single trial – but we won’t do that
here.)
The rest is easy – the number of ways
of getting 4 heads is the same as the
number of ways of getting 2 tails –
that is 15. Likewise, there are 6 ways
of getting 5 heads, and 1 way of
getting 6 heads (HHHHHH). Thus, the
distribution is symmetric.
No. of heads Frequency
0 1
1 6
2 15
3 20
4 15
5 6
6 1
Note, this is not a histogram of actual data, but of the expected distribution
based on a particular model of the situation – namely, six independent tosses
of a fair coin.
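The counting argument can be reproduced by brute force. This is an illustrative sketch, assuming six independent tosses of a fair coin; the variable names are our own. It lists all 64 sequences and counts how many give each number of heads.

```python
from collections import Counter
from itertools import product

# All 2**6 = 64 equally likely sequences of six tosses, e.g. "HHTHTT".
sequences = ["".join(tosses) for tosses in product("HT", repeat=6)]

# Number of sequences that give each possible number of heads.
heads_counts = Counter(seq.count("H") for seq in sequences)

print("No. of heads  Frequency  Probability")
for heads in range(7):
    freq = heads_counts[heads]
    print(f"{heads:>12}  {freq:>9}  {freq / len(sequences):.4f}")
```

Dividing each frequency by 64 turns the table of counts into a table of probabilities.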
We could also calculate the mean and standard deviation of this distribution –
you might not be surprised to learn that the mean is 3, and it can be shown
that the variance is 1.5. Again, these are not the mean and variance of a set of
observed data, but anticipated properties of the distribution based on a
theoretical model.
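The mean of 3 and variance of 1.5 can be verified directly from the frequency table. A short sketch, assuming the same six-toss fair-coin model; math.comb supplies the frequencies 1, 6, 15, 20, 15, 6, 1.

```python
from math import comb, sqrt

n = 6
frequencies = [comb(n, k) for k in range(n + 1)]   # 1, 6, 15, 20, 15, 6, 1
total = sum(frequencies)                           # 64 sequences in all

# Frequency-weighted mean and variance of the number of heads.
mean = sum(k * f for k, f in enumerate(frequencies)) / total
variance = sum((k - mean) ** 2 * f for k, f in enumerate(frequencies)) / total

print(f"mean = {mean}, variance = {variance}, s.d. = {sqrt(variance):.3f}")
```

These agree with the standard binomial results, mean = n x p and variance = n x p x (1 - p), with n = 6 and p = 0.5.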
• The Normal Distribution
• The binomial distribution is an example of a
discrete distribution – one where the variable in
question can take a certain number of discrete
values (e.g. 0,1,2,3,…).
• Other distributions are continuous – that is, they
can take any value, or any value in a given range –
for example the height, weight or income of a
randomly selected individual.
• A lot of variables in the real world tend to have a
particular shape of distribution – the normal
distribution – and a lot of statistical analysis is
based on the assumption that certain variables
follow some sort of normal distribution.
• Any Normal Distribution
• Bell-shaped
• Symmetric about mean
• Continuous
• Never touches the x-axis
• Total area under curve is 1.00
• Approximately 68% lies within 1 standard
deviation of the mean, 95% within 2
standard deviations, and 99.7% within 3
standard deviations of the mean.
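The 68%, 95% and 99.7% figures can be computed rather than memorised. A minimal sketch using the error function from Python's math module; the function name prob_within is our own.

```python
from math import erf, sqrt

def prob_within(k):
    """P(|Z| < k): share of a normal distribution within k s.d. of the mean."""
    return erf(k / sqrt(2))

for k in (1, 2, 3):
    print(f"within {k} standard deviation(s): {prob_within(k):.4f}")
```

The result does not depend on the mean or standard deviation, which is why the same three percentages apply to any normal distribution.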
• Data values represented by x which has
mean mu and standard deviation sigma.
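• Probability Function given by
f(x) = 1 / (σ √(2π)) × exp( −(x − μ)² / (2σ²) ), where μ is the mean (mu) and σ is the standard deviation (sigma)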
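As a sketch of how the normal probability function behaves, the code below evaluates the standard normal density formula f(x) = exp(-(x - mu)^2 / (2 sigma^2)) / (sigma sqrt(2 pi)) and approximates areas under it with the midpoint rule. The values mu = 40 and sigma = 10 and the helper names are illustrative assumptions.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu, sigma):
    """Height of the normal curve at x for mean mu and standard deviation sigma."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

def area(mu, sigma, lo, hi, steps=10_000):
    """Approximate area under the curve between lo and hi (midpoint rule)."""
    width = (hi - lo) / steps
    return sum(normal_pdf(lo + (i + 0.5) * width, mu, sigma) for i in range(steps)) * width

mu, sigma = 40, 10   # illustrative values only
total_area = area(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
within_one_sd = area(mu, sigma, mu - sigma, mu + sigma)

print(f"total area    = {total_area:.4f}")
print(f"within 1 s.d. = {within_one_sd:.4f}")
```

The total comes out at 1 (to rounding) and the central band at about 0.68, matching the properties listed for any normal distribution.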
The Standard Normal Curve
• We fix the horizontal scale so that units
of standard deviation are used (Z
values) instead of X values
• All normal distributions are now the
same
• Area = Probability
• Total Area under curve = 1 or 100%
• Standard Normal Distribution
• Same as a normal distribution, but also..
• Mean is zero
• Variance is one
• Standard Deviation is one
• Data values represented by z.
• Probability Function given by
f(z) = 1 / √(2π) × exp( −z² / 2 )
The Standard Normal Curve
[Figure: standard normal curve, horizontal axis Z from -3 to 3]
Z = (X − mean) / standard deviation
E.g. if mean = 40 and standard deviation = 10:
When x = 50, z = 1
When x = 30, z = -1
When x = 60, z = 2
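The conversion from x values to z values is a one-line calculation. A small sketch, using the mean of 40 and standard deviation of 10 from the example; the function name z_score is our own.

```python
def z_score(x, mean, sd):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / sd

mean, sd = 40, 10
for x in (50, 30, 60):
    print(f"x = {x}: z = {z_score(x, mean, sd):+.1f}")
```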
Areas under the Curve
[Figure: standard normal curve, horizontal axis Z from -3 to 3, with the area between Z = -1 and Z = 1 marked]
This area is 0.6826
68.26% of values are within 1 standard
deviation of the mean
There is a probability of 0.68 that a value
will lie in this region
Areas under the Curve (continued)
• 68.26% of the values are within ± 1
standard deviation of the mean
• 95.44% of the values are within ± 2
standard deviations of the mean
• 99.73% of the values are within ± 3
standard deviations of the mean
Z values
Z = (X − mean) / standard deviation
For the population we use
mean = μ
standard deviation = σ
z = (x − μ) / σ
Table of Areas
[Figure: standard normal curve, horizontal axis Z from -3 to 3, with the area to the left of a positive Z value marked]
The area to the left of a given z value is given in
tables
Areas to the left of positive z values are
tabulated
e.g. z = 1, area to the left = 0.8413
z = 1.5, area = 0.9332
z = 1.52, area = 0.9357
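The tabulated areas can be reproduced without a printed table. A minimal sketch using the error function in Python's math module; the name area_to_left is our own. It evaluates the area to the left of z under the standard normal curve.

```python
from math import erf, sqrt

def area_to_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))

for z in (1, 1.5, 1.52):
    print(f"z = {z}: area to the left = {area_to_left(z):.4f}")
```

For negative z the same function applies directly, so there is no need to use the symmetry trick required with a table of positive values only.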
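Example: Find Prob(x > 46)
μ = 40
σ = 4
a) z = (x − μ) / σ = (46 − 40) / 4
= 1.5
[Figure: standard normal curve, horizontal axis Z from -3 to 3, with the area to the right of Z = 1.5 marked "Area = ?"]
Area to the left of z = 1.5 is 0.9332, so the area to the right is 1 − 0.9332 = 0.0668
Proportion is 6.68%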
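Example (continued)
b) Find Prob(x < 42)
z = (42 − 40) / 4
= 0.5
Area = 0.6915
= 69.15%
c) Find Prob(42 < x < 46)
z1 = 1.5, z2 = 0.5
Area = 0.3085 − 0.0668 (area to the right of z = 0.5 minus area to the right of z = 1.5)
= 0.2417
= 24.17%
[Figures: two standard normal curves, horizontal axis Z from -3 to 3, one for each part]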
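Parts a) to c) of the example can be checked with Python's statistics.NormalDist class (available from Python 3.8). This is a sketch under the example's assumptions of mean 40 and standard deviation 4; the variable names are our own.

```python
from statistics import NormalDist

x = NormalDist(mu=40, sigma=4)

p_a = 1 - x.cdf(46)           # a) Prob(x > 46): area to the right of 46
p_b = x.cdf(42)               # b) Prob(x < 42): area to the left of 42
p_c = x.cdf(46) - x.cdf(42)   # c) Prob(42 < x < 46): area between the two

print(f"a) {p_a:.4f}   b) {p_b:.4f}   c) {p_c:.4f}")
```

Working with cdf values (areas to the left) throughout avoids having to remember which tail a given table reports.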
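Example (continued)
d) Which z value cuts off an area of 5% in the upper tail?
[Figure: standard normal curve, horizontal axis Z from -3 to 3, with the upper tail marked "Area is 5% or .05" and "Z = ?"]
Area = 0.05, z = 1.645
1.645 standard deviations above the mean is
40 + 1.645 x 4
= 46.58
This means that 90% of values lie between
33.42 and 46.58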
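Part d) runs the table lookup in reverse: from an area to a z value. A short sketch using the inv_cdf method of statistics.NormalDist, again assuming mean 40 and standard deviation 4; the variable names are our own.

```python
from statistics import NormalDist

x = NormalDist(mu=40, sigma=4)

z_95 = NormalDist().inv_cdf(0.95)   # z with 95% of the area to its left
upper = x.inv_cdf(0.95)             # x value with 5% of values above it
lower = x.inv_cdf(0.05)             # x value with 5% of values below it

print(f"z = {z_95:.3f}")
print(f"90% of values lie between {lower:.2f} and {upper:.2f}")
```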
Graph of men’s and women’s heights
[Figure: probability density curves for men’s and women’s heights; horizontal axis: height in centimetres, 140 to 200]
A graph of a continuous
distribution – a probability
density function – shows the
relative likelihood of getting
different values and ranges
of values for our variable.
Thus, in this example,
values nearest 166 and 174
are most common, and as
you get further from these
means, the proportion of
observations with these
values declines.
Parameters of the distribution
• The two parameters of the Normal distribution are the
mean μ and the variance σ²
x ~ N(μ, σ²)
• Men’s heights are Normally distributed with mean 174 cm
and variance 92.16
xM ~ N(174, 92.16)
• Women’s heights are Normally distributed with a mean of
166 cm and variance 40.32
xW ~ N(166, 40.32)
A normal distribution can have any mean and standard deviation.
We denote the mean by μ, and the standard deviation by σ. (So
the variance is σ².)
The mean μ will be the value in the middle of the normal curve –
thus it is also the median and the mode.
• So What?
• If we know the area under the curve this will tell us the
proportion of the population falling within certain values
of the variable e.g. height.
[Figure: probability density curves for men’s and women’s heights; horizontal axis: height in centimetres, 140 to 200]
We could calculate the
area by using a
mathematical
technique called
integration and
applying it to the
(very complicated)
formula for the area
under a normal curve.
Fortunately we do not
have to do this and a
nice person has
calculated the areas
for THE STANDARD
NORMAL CURVE.
The Standard Normal
distribution
The Standard Normal distribution has a mean μ = 0 and a standard
deviation σ = 1 unlike for example our distribution of women’s
heights with a mean of 166cm and a standard deviation of 6.35cm .
All other Normal distributions can be generated from this Standard
Normal distribution through a variable called z, which relates the
mean and standard deviation of any Normal distribution to the
Standard Normal distribution.
z = (x − μ) / σ
Areas under the distribution
• What proportion of women are taller than 175
cm?
[Figure: normal curve of women’s heights; horizontal axis: height in centimetres, 140 to 200; the area to the right of 175 cm is marked "Need this area"]
• How many standard deviations is 175 above
166?
• The standard deviation is 40.32 = 6.35, hence
• so 175 lies 1.42 s.d’s above the mean
• How much of the Normal distribution lies beyond
1.42 s.d’s above the mean? Use tables of the
STANDARD NORMAL DISTRIBUTION
42
.
1
35
.
6
166
175



z
Areas under the distribution
The standard Normal distribution
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.00 .5000 .5040 .5080 .5120 .5160 .5199 .5239 .5279 .5319 .5359
0.10 .5398 .5438 .5478 .5517 .5557 .5596 .5636 .5675 .5714 .5753
0.20 .5793 .5832 .5871 .5910 .5948 .5987 .6026 .6064 .6103 .6141
0.30 .6179 .6217 .6255 .6293 .6331 .6368 .6406 .6443 .6480 .6517
0.40 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.50 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.60 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
0.70 .7580 .7611 .7642 .7673 .7704 .7734 .7764 .7794 .7823 .7852
0.80 .7881 .7910 .7939 .7967 .7995 .8023 .8051 .8078 .8106 .8133
0.90 .8159 .8186 .8212 .8238 .8264 .8289 .8315 .8340 .8365 .8389
1.00 .8413 .8438 .8461 .8485 .8508 .8531 .8554 .8577 .8599 .8621
1.10 .8643 .8665 .8686 .8708 .8729 .8749 .8770 .8790 .8810 .8830
1.20 .8849 .8869 .8888 .8907 .8925 .8944 .8962 .8980 .8997 .9015
1.30 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.40 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.50 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
The area to the left of z = 1.42 is 0.9222, so the area beyond it is 1 − 0.9222 = 0.0778
Answer
• 7.78% of women are taller than 175 cm.
• Summary: to find the area in the tail of the
distribution, calculate the z-score, giving the
number of standard deviations between the
mean and the desired height. Then look the z-
score up in tables.
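The tail-area recipe in the summary can be sketched in a few lines. The code assumes the lecture's model of women's heights (mean 166 cm, variance 40.32) and uses statistics.NormalDist; the variable names are our own. It computes the tail both with z rounded to two decimals, as a table lookup requires, and without rounding.

```python
from math import sqrt
from statistics import NormalDist

# Women's heights: mean 166 cm, variance 40.32, so s.d. = sqrt(40.32).
women = NormalDist(mu=166, sigma=sqrt(40.32))

# Step 1: z-score of 175 cm.
z = (175 - women.mean) / women.stdev

# Step 2: area beyond z, with z rounded to two decimals as in a printed table.
tail_table = 1 - NormalDist().cdf(round(z, 2))

# The same tail area without rounding z.
tail_exact = 1 - women.cdf(175)

print(f"z = {z:.2f}")
print(f"tail area, z rounded as in the table: {tail_table:.4f}")
print(f"tail area, z unrounded:              {tail_exact:.4f}")
```

The two answers differ slightly in the fourth decimal place because the table forces z to be rounded to 1.42.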
• The standard deviation will tell us how spread
out the values are likely to be around the mean.
The following give an idea of the meaning of σ in
a given distribution:
– About 68% of the observations should fall within one
standard deviation either side of the mean.
– About 95% of the observations should fall within two
standard deviations either side of the mean.
– About 99.7% of the observations should fall within
three standard deviations of the mean.
• For example, suppose we have a normal
distribution with mean 100 and standard
deviation 10. Then, if we were to take random
samples from this distribution, 68% of the values
would be expected to fall between 90 and 110,
95% of the values between 80 and 120, and
99.7% between 70 and 130. Less than one in
300 values would fall outside this range.
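The claim about random samples can be tried out by simulation. This is an illustrative sketch, assuming a normal distribution with mean 100 and standard deviation 10 as in the example; the seed, sample size and helper name share_within are arbitrary choices of ours. Simulated shares will vary slightly from run to run if the seed is changed.

```python
import random

random.seed(0)                      # fixed seed so the run is repeatable
mean, sd, n = 100, 10, 100_000
samples = [random.gauss(mean, sd) for _ in range(n)]

def share_within(k):
    """Share of the samples lying within k standard deviations of the mean."""
    return sum(abs(x - mean) <= k * sd for x in samples) / n

for k in (1, 2, 3):
    low, high = mean - k * sd, mean + k * sd
    print(f"between {low} and {high}: {share_within(k):.3%}")
```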
Summary
• Most statistical problems concern random
variables which have an associated probability
distribution
• Common distributions are the Binomial,
Normal and Poisson (there are many others)
• Once the appropriate distribution for the
problem is recognised, the solution is
relatively straightforward

Lecture 4 - probability distributions (2).pptx

  • 2. • Probability distributions • We have discussed some basic ways in which we can describe a set of actual statistical data: – Mean, median and mode as measures of central tendency – Variance, Standard Deviation, Coefficient of Variation as measures of spread. – Histograms as a way of plotting the overall distribution of data. • But to draw useful conclusions from statistical data, we need to go beyond description, to analysis.
  • 3. • To do this, we need to have some idea, or model of the type of distribution of data we might expect a priori. We can then compare the actual data with the prior expectations. • For example, a new drug is being tested to treat a disease. The drug is given to a group of patients, while a placebo (coloured, flavoured liquid) is given to a control group, with neither patients nor doctors knowing which group is which. • Records are kept as to how quickly/whether patients recover in each group. • Hopefully, more who get the drug will recover. But if this is the case, how can we tell if this is purely down to chance, or if the drug really is working?
  • 4. • We need an idea of what the distribution of recovery rates would be likely to be if it was purely down to chance. How often would 10% recover, how often would 20% recover, etc. • Only then can we say, for example “50% of the treatment group recovered, compared to only 30% of the control group. There would be a less than 1% chance of this difference occurring randomly.” • Thus, to conduct meaningful statistical analysis, we need to know about the shape of distributions of different statistics we might expect beforehand. • This is based on probability theory – where we make assumptions and deductions about the probability of different events happening or different values of data occurring.
  • 5. Some standard probability distributions • Binomial - when the underlying probability experiment has only two possible outcomes (e.g. tossing a coin) • Normal - when many small independent factors influence a variable (e.g. height, influenced by genes, diet, etc.) • Poisson - for rare events, when the probability of occurrence is low
  • 6. • Tossing a coin – the binomial distribution • We toss a coin 100 times. We get 60 heads. Does this suggest that the coin is biased towards heads? • Put another way, if the coin were fair, that is equally likely to give heads or tails, how likely would it be to get as many as 60 to 40 of one side to the other? • Or put another way: suppose we were to do the experiment of 100 coin tosses a large number of times, with a fair coin, and record all the resulting scores from 0 to 100 heads. How often would we expect to get 0 heads? How often 1 head? And so on, up to 100. What would we expect the mean and the standard deviation to be? What would a histogram of these scores be expected to look like?
  • 7. • The resulting distribution is called a binomial distribution, as it is based on two possible outcomes in each trial. • Let’s take an easier case, where we toss a coin 6 times and count the heads. There are seven possible scores we can get for the number of heads – 0 up to 6. How likely is each to occur? • We can think about this in terms of the different possible sequences of heads and tails we might get – e.g. HHTHTT or HTTTTH. • If the coin is fair, each sequence is equally likely. So what we need to do is count up the number of possible sequences that can give each number of heads.
  • 8. • How many possible sequences are there? Well, there are six trials, and each can have two possible outcomes. This means there are 2x2x2x2x2x2 possible sequences, or 26 = 64. • How many of these sequences give exactly 0 heads? Clearly only 1: TTTTTT. • How many give 1 head? There are 6 of these, as the sole head can occur in any of the six tosses – that is, we can get HTTTTT, THTTTT, TTHTTT, TTTHTT, TTTTHT or TTTTTH. • What about 2 heads? There are rather more possibilities.
  • 9. • HHTTTT • HTHTTT • HTTHTT • HTTTHT • HTTTTH • THHTTT • THTHTT • THTTHT • THTTTH • TTHHTT • TTHTHT • TTHTTH • TTTHHT • TTTHTH • TTTTHH That is, there are 15 possibilities. Similarly, we can show there are 20 ways of getting three heads. (There is a formula, for any number of trials, and for any number of ‘successes’, and for any probability of success in a single trial – but we won’t do that here.) The rest is easy – the number of ways of getting 4 heads is the same as the number of ways of getting 2 tails – that is 15. Likewise, there are 6 ways of getting 5 heads, and 1 way of getting 6 heads. (HHHHHH). Thus, the distribution is symmetric.
  • 10.
    No. of heads   Frequency
    0              1
    1              6
    2              15
    3              20
    4              15
    5              6
    6              1
    Note, this is not a histogram of actual data, but of the expected distribution based on a particular model of the situation – namely, six independent tosses of a fair coin. We could also calculate the mean and standard deviation of this distribution – you might not be surprised to learn that the mean is 3, and it can be shown that the variance is 1.5. Again, these are not the mean and variance of a set of observed data, but anticipated properties of the distribution based on a theoretical model.
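The mean of 3 and variance of 1.5 quoted above can be reproduced from the frequencies, and the same model can be applied to the opening question about 60 heads in 100 tosses. This is a sketch under the lecture's assumption of independent tosses of a fair coin; the function names are our own.

```python
from fractions import Fraction
from math import comb


def binomial_fair_coin(n_tosses):
    """Probability of each number of heads in n_tosses tosses of a fair coin."""
    total = 2 ** n_tosses
    return [Fraction(comb(n_tosses, k), total) for k in range(n_tosses + 1)]


def mean_and_variance(probabilities):
    """Mean and variance of a distribution over the scores 0, 1, 2, ..."""
    mean = sum(k * p for k, p in enumerate(probabilities))
    variance = sum((k - mean) ** 2 * p for k, p in enumerate(probabilities))
    return mean, variance


mean6, var6 = mean_and_variance(binomial_fair_coin(6))
print(mean6, var6)  # 3 3/2

# Same model applied to 100 tosses: chance of 60 or more heads.
p100 = binomial_fair_coin(100)
prob_60_or_more = sum(p100[60:])
print(float(prob_60_or_more))  # roughly 0.028
```

Under this model a fair coin gives 60 or more heads in fewer than 3% of experiments, which is the kind of figure needed to judge whether an observed 60 heads suggests bias.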
  • 11. • The Normal Distribution • The binomial distribution is an example of a discrete distribution – one where the variable in question can take a certain number of discrete values (e.g. 0,1,2,3,…). • Other distributions are continuous – that is, they can take any value, or any value in a given range – for example the height, weight or income of a randomly selected individual. • A lot of variables in the real world tend to have a particular shape of distribution – the normal distribution – and a lot of statistical analysis is based on the assumption that certain variables follow some sort of normal distribution.
  • 12. • Any Normal Distribution • Bell-shaped • Symmetric about mean • Continuous • Never touches the x-axis • Total area under curve is 1.00 • Approximately 68% lies within 1 standard deviation of the mean, 95% within 2 standard deviations, and 99.7% within 3 standard deviations of the mean.
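The 68%, 95% and 99.7% figures can be computed rather than memorised. A minimal sketch, using the standard identity that the area within k standard deviations of the mean equals erf(k/√2) (the function name `prob_within` is our own):

```python
from math import erf, sqrt


def prob_within(k):
    """Area under any normal curve within k standard deviations of the mean."""
    return erf(k / sqrt(2))


for k in (1, 2, 3):
    print(k, round(prob_within(k), 4))
# 1 0.6827
# 2 0.9545
# 3 0.9973
```

Printed tables rounded to four decimal places give very slightly different values (for example 0.6826 and 0.9544), which is only rounding error.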
  • 13. • Data values represented by x, which has mean μ and standard deviation σ. • Probability Function given by $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
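As an illustration, the standard normal density formula, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)), can be evaluated numerically to confirm that the total area under the curve is 1. This is a sketch only: the trapezoidal integration, the function names, and the example values μ = 40 and σ = 10 are our own choices.

```python
from math import exp, pi, sqrt


def normal_pdf(x, mu, sigma):
    """Height of the normal curve with mean mu and standard deviation sigma at x."""
    return exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))


def area_under_curve(mu, sigma, lower, upper, steps=10_000):
    """Trapezoidal approximation to the area under the curve between lower and upper."""
    width = (upper - lower) / steps
    total = 0.5 * (normal_pdf(lower, mu, sigma) + normal_pdf(upper, mu, sigma))
    for i in range(1, steps):
        total += normal_pdf(lower + i * width, mu, sigma)
    return total * width


mu, sigma = 40, 10  # illustrative values, not data from the lecture
total_area = area_under_curve(mu, sigma, mu - 8 * sigma, mu + 8 * sigma)
within_one_sd = area_under_curve(mu, sigma, mu - sigma, mu + sigma)
print(total_area)     # very close to 1
print(within_one_sd)  # close to 0.68
```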
  • 14. The Standard Normal Curve • We fix the horizontal scale so that units of standard deviation are used (Z values) instead of X values • On this scale, all normal distributions are the same • Area = Probability • Total Area under curve = 1 or 100%
  • 15. • Standard Normal Distribution • Same as a normal distribution, but also: • Mean is zero • Variance is one • Standard Deviation is one • Data values represented by z. • Probability Function given by $f(z) = \frac{1}{\sqrt{2\pi}} \, e^{-\frac{z^2}{2}}$
  • 16. The Standard Normal Curve [Figure: standard normal curve, Z axis marked from -3 to 3] $Z = \frac{X - \text{mean}}{\text{standard deviation}}$ • E.g. if mean = 40 and standard deviation = 10: • When x = 50, z = 1 • When x = 30, z = -1 • When x = 60, z = 2
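The conversion from x values to z values is a one-line calculation. A small sketch reproducing the slide's example with mean 40 and standard deviation 10 (the function name `z_score` is our own):

```python
def z_score(x, mean, standard_deviation):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / standard_deviation


for x in (50, 30, 60):
    print(x, z_score(x, mean=40, standard_deviation=10))
# 50 1.0
# 30 -1.0
# 60 2.0
```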
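  • 17. Areas under the Curve [Figure: standard normal curve, Z axis marked from -3 to 3, with the region within 1 standard deviation of the mean marked] • This area is 0.6826 • 68.26% of values are within 1 standard deviation of the mean • There is a probability of 0.68 that a value will lie in this region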
  • 18. Areas under the Curve (continued) • 68.26% of the values are within ± 1 standard deviation of the mean • 95.44% of the values are within ± 2 standard deviations of the mean • 99.73% of the values are within ± 3 standard deviations of the mean
  • 19. Z values $Z = \frac{X - \text{mean}}{\text{standard deviation}}$ For the population we use mean = μ and standard deviation = σ, so $z = \frac{x - \mu}{\sigma}$
  • 20. Table of Areas [Figure: standard normal curve, Z axis marked from -3 to 3, with the area to the left of a Z value marked] • The tables give the area to the left of a z value • Areas to the left of positive z values are tabulated, e.g. • z = 1, area to the left = 0.8413 • z = 1.5, area = 0.9332 • z = 1.52, area = 0.9357
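The tabulated areas can also be computed directly. A minimal sketch, assuming the standard relation between the area to the left of z and the error function, Φ(z) = ½(1 + erf(z/√2)); the function name `area_to_left` is our own:

```python
from math import erf, sqrt


def area_to_left(z):
    """Area under the standard normal curve to the left of z."""
    return 0.5 * (1 + erf(z / sqrt(2)))


for z in (1.0, 1.5, 1.52):
    print(z, round(area_to_left(z), 4))
# 1.0 0.8413
# 1.5 0.9332
# 1.52 0.9357
```

Because the curve is symmetric, the area to the left of a negative z value is 1 minus the area to the left of the corresponding positive value, which is why tables usually list positive z only.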
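  • 21. Example: μ = 40, σ = 4, using $z = \frac{x - \mu}{\sigma}$ • a) Find Prob(x > 46) • z = (46 - 40)/4 = 1.5 • From tables, the area to the left of z = 1.5 is 0.9332, so the area to the right is 1 - 0.9332 = 0.0668 • Proportion is 6.68% [Figure: standard normal curve, Z axis marked from -3 to 3, with the area to the right of Z = 1.5 marked]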
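  • 22. Example (continued) • b) Find Prob(x < 42) • z = (42 - 40)/4 = 0.5 • Area to the left = 0.6915, i.e. 69.15% • c) Find Prob(42 < x < 46) • z1 = 1.5, z2 = 0.5 • Area = (area to the right of z = 0.5) - (area to the right of z = 1.5) = 0.3085 - 0.0668 = 0.2417 = 24.17% [Figures: standard normal curves, Z axis marked from -3 to 3, with the corresponding areas marked]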
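Parts a) to c) of the example, with mean 40 and standard deviation 4, can be checked without tables. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the variable names are our own.

```python
from statistics import NormalDist

x = NormalDist(mu=40, sigma=4)

p_above_46 = 1 - x.cdf(46)         # a) area to the right of 46
p_below_42 = x.cdf(42)             # b) area to the left of 42
p_between = x.cdf(46) - x.cdf(42)  # c) area between 42 and 46

print(round(p_above_46, 4))  # 0.0668
print(round(p_below_42, 4))  # 0.6915
print(round(p_between, 4))   # 0.2417
```

Here `cdf` returns the area to the left of a value, the same quantity the printed tables give for z, so the subtraction in c) mirrors the table calculation.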
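  • 23. Example (continued) • d) Find the value of x that has an area of 5% (0.05) above it [Figure: standard normal curve, Z axis marked from -3 to 3, with an upper tail area of 5% marked] • Area = 0.05 gives z = 1.645 • 1.645 standard deviations above the mean is 40 + 1.645 x 4 = 46.58 • By symmetry, 1.645 standard deviations below the mean is 40 - 1.645 x 4 = 33.42 • This means that 90% of values lie between 33.42 and 46.58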
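Part d) runs the table lookup in reverse: from an area to a z value. A sketch using `NormalDist.inv_cdf` from the Python standard library, again assuming mean 40 and standard deviation 4 (variable names are our own):

```python
from statistics import NormalDist

x = NormalDist(mu=40, sigma=4)

z_95 = NormalDist().inv_cdf(0.95)  # z value with 5% of the area above it
upper = x.inv_cdf(0.95)            # x value with 5% of the area above it
lower = x.inv_cdf(0.05)            # x value with 5% of the area below it

print(round(z_95, 3))   # 1.645
print(round(upper, 2))  # 46.58
print(round(lower, 2))  # 33.42
```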
  • 24. Graph of men’s and women’s heights [Figure: probability density curves for men’s and women’s heights; horizontal axis: height in centimetres, 140 to 200] A graph of a continuous distribution – a probability density function – shows the relative likelihood of getting different values and ranges of values for our variable. Thus, in this example, values nearest 166 and 174 are most common, and as you get further from these means, the proportion of observations with these values declines.
  • 25. Parameters of the distribution • The two parameters of the Normal distribution are the mean μ and the variance σ²: $x \sim N(\mu, \sigma^2)$ • Men’s heights are Normally distributed with mean 174 cm and variance 92.16: $x_M \sim N(174, 92.16)$ • Women’s heights are Normally distributed with a mean of 166 cm and variance 40.32: $x_W \sim N(166, 40.32)$ A normal distribution can have any mean and standard deviation. We denote the mean by μ, and the standard deviation by σ. (So the variance is σ².) The mean μ will be the value in the middle of the normal curve – thus it is also the median and the mode.
  • 26. • So What? • If we know the area under the curve this will tell us the proportion of the population falling within certain values of the variable e.g. height. [Figure: probability density curves for men’s and women’s heights; horizontal axis: height in centimetres, 140 to 200] We could calculate the area by using a mathematical technique called integration and applying it to the (very complicated) formula for the normal curve. Fortunately we do not have to do this and a nice person has calculated the areas for THE STANDARD NORMAL CURVE.
  • 27. The Standard Normal distribution The Standard Normal distribution has a mean μ = 0 and a standard deviation σ = 1, unlike for example our distribution of women’s heights with a mean of 166 cm and a standard deviation of 6.35 cm. All other Normal distributions can be generated from this Standard Normal distribution through a variable called z, which relates the mean and standard deviation of any Normal distribution to the Standard Normal distribution: $z = \frac{x - \mu}{\sigma}$
  • 28. Areas under the distribution • What proportion of women are taller than 175 cm? [Figure: probability density curve of heights; horizontal axis: height in centimetres, 140 to 200; the area to the right of 175 cm is the area needed]
  • 29. Areas under the distribution • How many standard deviations is 175 above 166? • The standard deviation is $\sqrt{40.32} = 6.35$, hence $z = \frac{175 - 166}{6.35} = 1.42$ • so 175 lies 1.42 s.d.’s above the mean • How much of the Normal distribution lies beyond 1.42 s.d.’s above the mean? Use tables of the STANDARD NORMAL DISTRIBUTION
  • 30. The standard Normal distribution (area to the left of z)
    z      .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
    0.00   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
    0.10   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
    0.20   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
    0.30   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
    0.40   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
    0.50   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
    0.60   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
    0.70   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
    0.80   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
    0.90   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
    1.00   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
    1.10   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
    1.20   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
    1.30   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
    1.40   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
    1.50   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
    For z = 1.42 (row 1.40, column .02): 1 - 0.9222 = 0.0778
  • 31. Answer • 7.78% of women are taller than 175 cm. • Summary: to find the area in the tail of the distribution, calculate the z-score, giving the number of standard deviations between the mean and the desired height. Then look the z-score up in tables.
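The whole height calculation can be sketched in a few lines. This assumes, as in the lecture, that women's heights are Normally distributed with mean 166 cm and variance 40.32; the variable names are our own. Rounding z to two decimal places mimics the table lookup, while the unrounded z gives a marginally different tail area.

```python
from math import sqrt
from statistics import NormalDist

mean_height = 166   # cm, from the lecture
variance = 40.32    # from the lecture
threshold = 175     # cm

sd = sqrt(variance)                # about 6.35
z = (threshold - mean_height) / sd # about 1.42

standard_normal = NormalDist()     # mean 0, standard deviation 1
tail_from_table = 1 - standard_normal.cdf(round(z, 2))  # z rounded as in the tables
tail_unrounded = 1 - standard_normal.cdf(z)             # no rounding of z

print(round(sd, 2), round(z, 2))  # 6.35 1.42
print(round(tail_from_table, 4))  # 0.0778
print(round(tail_unrounded, 3))   # about 0.078
```

The two answers agree to about three decimal places, so the table method loses very little accuracy here.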
  • 32. • The standard deviation will tell us how spread out the values are likely to be around the mean. The following give an idea of the meaning of σ in a given distribution: – About 68% of the observations should fall within one standard deviation either side of the mean. – About 95% of the observations should fall within two standard deviations either side of the mean. – About 99.7% of the observations should fall within three standard deviations of the mean. • For example, suppose we have a normal distribution with mean 100 and standard deviation 10. Then, if we were to take random samples from this distribution, 68% of the values would be expected to fall between 90 and 110, 95% of the values between 80 and 120, and 99.7% between 70 and 130. Less than one in 300 values would fall outside this range.
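The example with mean 100 and standard deviation 10 can be tried out by simulation. This is only a sketch: the sample size, the fixed seed and the function name `fraction_within` are our own choices, and the proportions from a random sample will be close to, not exactly equal to, 68%, 95% and 99.7%.

```python
import random


def fraction_within(samples, mean, sd, k):
    """Fraction of samples lying within k standard deviations of the mean."""
    inside = sum(1 for value in samples if abs(value - mean) < k * sd)
    return inside / len(samples)


rng = random.Random(0)  # fixed seed so the run is repeatable
samples = [rng.gauss(100, 10) for _ in range(100_000)]

for k in (1, 2, 3):
    low, high = 100 - 10 * k, 100 + 10 * k
    print(f"between {low} and {high}: {fraction_within(samples, 100, 10, k):.4f}")
```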
  • 33. Summary • Most statistical problems concern random variables which have an associated probability distribution • Common distributions are the Binomial, Normal and Poisson (there are many others) • Once the appropriate distribution for the problem is recognised, the solution is relatively straightforward