Quantitative Methods for Lawyers - Class #9 - Bayes Theorem (Part 2), Skewness, Kurtosis & Data Distributions - Professor Daniel Martin Katz

@ computational
computationallegalstudies.com
professor daniel martin katz danielmartinkatz.com
lexpredict.com slideshare.net/DanielKatz

Example:
Marie is getting married tomorrow, at an outdoor ceremony in the
desert.
In recent years, it has rained only 5 days each year.
Unfortunately, the weatherman has predicted rain for tomorrow.
When it actually rains, the weatherman correctly forecasts rain 90% of
the time.
When it doesn't rain, he incorrectly forecasts rain 10% of the time.
What is the probability that it will rain on the day of Marie's wedding?
Bayes Rule

Solution: The sample space is defined by two mutually exclusive
events: it rains or it does not rain.
Additionally, a third event occurs when the weatherman predicts
rain. Notation for these events appears below.
• Event A1. It rains on Marie's wedding.
• Event A2. It does not rain on Marie's wedding
• Event B. The weatherman predicts rain.
Bayes Rule

In terms of probabilities, we know the following:
• P( A1 ) = 5/365 =0.014 [rains = 5 days per year]
• P( A2 ) = 360/365 = 0.986 [Not rain = 360 days per year]
• P( B | A1 ) = 0.9
[When it rains, the weatherman predicts rain 90% of the time]
• P( B | A2 ) = 0.1 [When it does not rain, the weatherman predicts
rain 10% of the time]
Bayes Rule

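Let's Think About This Using a Diagram
[Tree diagram]
A1 (rain): P( A1 ) = 5/365 = .014, then B with P( B | A1 ) = .9, giving P( A1 ) P( B | A1 ) = .0126
A2 (no rain): P( A2 ) = 360/365 = .986, then B with P( B | A2 ) = .1, giving P( A2 ) P( B | A2 ) = .0986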

We want to know P( A1 | B ), the probability it will rain on the day of Marie's
wedding, given a forecast for rain by the weatherman. The answer can be
determined from Bayes' theorem, as shown below:
P( A1 | B ) = P( A1 ) P( B | A1 ) / [ P( A1 ) P( B | A1 ) + P( A2 ) P( B | A2 ) ]
P( A1 | B ) = (0.014)(0.9) / [ (0.014)(0.9) + (0.986)(0.1) ]
P( A1 | B ) = 0.0126 / 0.1112 = .1133
Note the somewhat unintuitive result. Even when the weatherman predicts rain, it
rains only about 11% of the time.
Bayes Rule
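The calculation above can be checked with a short sketch. The function name `posterior` and its parameter names are illustrative assumptions, not part of the slides. Note that using the unrounded prior 5/365 gives about .111, while the slide's .1133 comes from the rounded inputs 0.014 and 0.986.

```python
def posterior(prior, p_signal_given_event, p_signal_given_no_event):
    """P(event | signal) by Bayes' theorem for a two-outcome sample space.

    Hypothetical helper for illustration; names are not from the slides.
    """
    true_positive = prior * p_signal_given_event            # P(A1) P(B | A1)
    false_positive = (1 - prior) * p_signal_given_no_event  # P(A2) P(B | A2)
    return true_positive / (true_positive + false_positive)


# Marie's wedding: rain on 5 of 365 days, forecast right 90% of the time
# when it rains, and wrong 10% of the time when it does not.
p_rain = 5 / 365
p_rain_given_forecast = posterior(p_rain, 0.9, 0.1)
print(f"P(rain | forecast), unrounded prior: {p_rain_given_forecast:.4f}")

# Same calculation with the slide's rounded prior of 0.014.
print(f"P(rain | forecast), rounded prior:   {posterior(0.014, 0.9, 0.1):.4f}")
```

Either way, the posterior stays near 11% because days without rain vastly outnumber rainy days, so false alarms dominate the forecasts of rain.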

What Can We Say About The Weatherman?
Bayes Rule
Likelihood Increased from ~1.4% to ~11%
That is roughly an 8-fold increase in the likelihood
However, it is still pretty unlikely to rain

Bayes Rule
How Much Signal / Information ?
We Could Consider a Complex Version of the problem -
Weatherman Predicts Rain + It is the Monsoon Season
Compound Events
The Signal was of limited value because the ratio of Type I to
Type II error was not favorable: the 10% false-alarm rate is large
relative to the ~1.4% base rate of rain

Let's Try Another
Bayes Rule Problem ...

Bayes Rule
Imagine a particular test that:
correctly identifies those with a certain disease 94% of the time
and
correctly diagnoses those without the disease 98% of the time
A friend has just informed you that he has received a positive result
and asks for your advice about how to interpret these probabilities.
Before attempting to address your friend’s concern, you research
the illness and discover that 4% of men have this disease.
What is the probability your friend actually has the disease?

Define the events:
A1 = a man has this disease
A2 = a man does not have this disease
B = positive test result
B^C = negative test result
Express the given information and question in probability notation:
“test correctly identifies those with a certain serious disease 94% of the time”
⇒ P( B | A1 ) = 0.94
“test correctly diagnoses those without the disease 98% of the time”
⇒ P( B^C | A2 ) = 0.98
“you discover that 4% of men have this disease”
⇒ P( A1 ) = 0.04
this statement also tells us that 96% of men do not have the disease
⇒ P( A2 ) = 0.96
Bayes Rule

Key Question:
“Given a positive result, What is the
probability your friend actually has
the disease ?”
⇒ P( A1 | B ) = ?

Bayes Rule
a tree diagram:
[Tree diagram over the events A1 = a man has this disease, A2 = a man does not have this disease, B = positive test result, B^C = negative test result]

Use Bayes’ Theorem and your tree diagram to answer the question:
Since the test is correct for 98% of those without the disease, P( B | A2 ) = 1 − 0.98 = 0.02.
P( A1 | B ) = P( A1 ) P( B | A1 ) / [ P( A1 ) P( B | A1 ) + P( A2 ) P( B | A2 ) ]
= (0.04)(0.94) / [ (0.04)(0.94) + (0.96)(0.02) ]
= 0.0376 / ( 0.0376 + 0.0192 ) ≈ 0.662
There is a 66.2% probability that he actually has the disease. The probability is
high, but considerably lower than your friend feared.
Bayes Rule
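One way to make this result more intuitive is to restate it as counts in an imagined group of men. The sketch below does that; the group size of 10,000 and the function name `positive_test_breakdown` are assumptions chosen for illustration only.

```python
def positive_test_breakdown(prevalence, sensitivity, specificity, group_size=10_000):
    """Split an imagined group into true and false positives.

    Hypothetical helper; group_size is an arbitrary illustrative choice.
    """
    with_disease = group_size * prevalence
    without_disease = group_size - with_disease
    true_positives = with_disease * sensitivity             # correctly flagged
    false_positives = without_disease * (1 - specificity)   # wrongly flagged
    p_disease_given_positive = true_positives / (true_positives + false_positives)
    return true_positives, false_positives, p_disease_given_positive


tp, fp, p = positive_test_breakdown(prevalence=0.04, sensitivity=0.94, specificity=0.98)
print(f"true positives:  {tp:.0f}")
print(f"false positives: {fp:.0f}")
print(f"P(disease | positive) = {p:.3f}")
```

Of roughly 568 positive results in such a group, about 376 belong to men who have the disease, which is the same ratio as 0.0376 / 0.0568.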

http://www.agenarisk.com/resources/probability_puzzles/event_tree.shtml
Review This One on Your Own

Sampling
Take 2
Use the Sample to
Infer
Characteristics of
the Full Population

Why Sample?
Might Be Impossible to Get the Full Population
Cost of Getting Full Population
Sampling is concerned with the selection of a subset of
individuals from within a population to estimate
characteristics of the whole population
Sampling
Focus Upon Improving Precision v. Size

(1) Defining the population of concern
(2) Specifying a sampling frame, a set of items or events
possible to measure
(3) Specifying a sampling method for selecting items or
events from the frame
(4) Determining the sample size
(5) Implementing the sampling plan
(6) Sampling and data collecting
Sampling Stages

Determining the Sample Size
Conceptually We Understand that in order to
obtain a representative sample we need to acquire
somewhere between
1 < ? < Full Population
But Exactly How Many Observations do we need?

Random Sampling Error
Imagine a Political Poll
where the population of interest is actual voters.
When You Sample at Random It is Possible to
Have a Skewed Set of Observations in the
Sample
Pollsters take smaller samples that are intended to be
representative, that is, a random sample of the population.
It is possible that pollsters sample 1,013 voters who all happen to vote
for Candidate 1 when in fact the population is evenly split between
Candidate 1 and Candidate 2, but this is extremely unlikely
(p = 2^−1013 ≈ 1.1 × 10^−305) given that the sample is random.
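The size of that probability can be checked directly. This is a minimal sketch assuming each sampled voter independently favors Candidate 1 with probability 0.5; the variable names are illustrative.

```python
import math

n_voters = 1013

# Probability that every one of the 1,013 independently sampled voters
# favors Candidate 1 when the population is split 50/50.
p_all_candidate_1 = 0.5 ** n_voters

# The same quantity on a log scale, which is easier to read for tiny numbers.
log10_p = n_voters * math.log10(0.5)

print(f"p = {p_all_candidate_1:.2e}")
print(f"log10(p) = {log10_p:.2f}")
```

The value is still representable as an ordinary floating-point number, but only just; for larger samples the log-scale form is the safer one to report.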

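For a Simple Random Sample from a large
population, the maximum margin of error at a 95% confidence level is
approximately 0.98 times the Inverse of the Square Root of
the Sample Size: margin of error ≈ 0.98 / √n
The 95% Level is Very Typically Reported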

A random sample of size 400 will give a margin of error, at a
95% confidence level, of 0.98/20 or 0.049, just under 5%.
the Sample Size
Example:

A random sample of size 1600 will give a margin of error of
0.98/40, or 0.0245 - just under 2.5%.
the Sample Size
Example:
Notice: Doubling the Precision Requires
Four Times the Sample Size!
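The two examples above follow the rule 0.98 / √n, where 0.98 is 1.96 (the 95% normal critical value) times 0.5 (the largest possible standard deviation of a proportion). A minimal sketch, with the function name `margin_of_error_95` chosen here for illustration:

```python
import math


def margin_of_error_95(sample_size):
    """Maximum 95% margin of error for a simple random sample.

    Assumes a large population and the worst case of a 50/50 split.
    """
    return 0.98 / math.sqrt(sample_size)


for n in (400, 1600, 6400):
    print(f"n = {n:>5}: margin of error = {margin_of_error_95(n):.4f}")
```

Each fourfold increase in the sample size halves the margin of error, which is why precision gets expensive quickly.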

The top portion of this graphic depicts the relative likelihood
that the "true" percentage is in a particular area given a
reported percentage of 50%.
The bottom portion shows 95% confidence intervals
(horizontal line segments), the corresponding margins of
error (on the left), and sample sizes (on the right).
In other words, for each sample size, one is 95% confident that the
"true" percentage is in the region indicated by the corresponding
segment.
The larger the sample is, the smaller the margin of error.

Central Limit Theorem
Try this yourself: “Netlogo Central Limit Theorem”
http://ccl.northwestern.edu/netlogo/models/run.cgi?CentralLimitTheorem.715.627
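For readers who prefer code to the NetLogo model, a similar experiment can be sketched in a few lines. Everything here is an assumption for illustration: the skewed exponential population, the sample size of 30, the 2,000 repetitions, and the function name `simulate_sample_means`.

```python
import random
import statistics


def simulate_sample_means(sample_size, n_samples, seed=0):
    """Draw repeated samples from a skewed population and return their means."""
    rng = random.Random(seed)
    means = []
    for _ in range(n_samples):
        # Exponential population with mean 1 and standard deviation 1.
        sample = [rng.expovariate(1.0) for _ in range(sample_size)]
        means.append(statistics.fmean(sample))
    return means


means = simulate_sample_means(sample_size=30, n_samples=2000)
print(f"mean of sample means:  {statistics.fmean(means):.3f}")
print(f"stdev of sample means: {statistics.stdev(means):.3f}")
print(f"theory predicts about: {1 / 30 ** 0.5:.3f}")
```

Even though the population is strongly skewed, the sample means cluster around the population mean with a spread close to the population standard deviation divided by the square root of the sample size.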

Thinking of Data as
a Distribution: Histogram
Histogram - a graphical representation showing a
visual impression of the distribution of data
(1) Consists of tabular frequencies, shown as adjacent
rectangles, erected over discrete intervals (bins)
(2) The height of a rectangle is equal to the frequency
density of the interval, i.e., the frequency divided by the width
of the interval
(3) Total area of the histogram is equal to the number of data points
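The relationship between frequency, bin width, and height can be sketched as follows. The travel times and bin edges are made-up illustrative values, and `frequency_density` is a hypothetical helper name.

```python
def frequency_density(data, edges):
    """Count observations per bin and convert counts to rectangle heights."""
    n_bins = len(edges) - 1
    counts = [0] * n_bins
    for x in data:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            in_bin = edges[i] <= x < edges[i + 1]
            # The last bin also includes its right edge.
            if in_bin or (is_last and x == edges[-1]):
                counts[i] += 1
                break
    widths = [edges[i + 1] - edges[i] for i in range(n_bins)]
    heights = [count / width for count, width in zip(counts, widths)]
    return counts, widths, heights


# Hypothetical travel times in minutes, with one wide final bin.
times = [3, 7, 8, 12, 14, 15, 18, 22, 25, 31, 38, 45, 52, 58]
edges = [0, 10, 20, 30, 60]

counts, widths, heights = frequency_density(times, edges)
total_area = sum(h * w for h, w in zip(heights, widths))
print("counts: ", counts)
print("heights:", [round(h, 3) for h in heights])
print("total area:", round(total_area, 6), "observations:", len(times))
```

Dividing by the width keeps the wide final bin from looking taller than it should, and the rectangle areas still add up to the number of observations.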

Thinking of Data as
a Distribution: Histogram
Histogram of travel time, US 2000 census. Area under the curve equals
the total number of cases. This diagram uses Q/width from the table.

Ordinary v. Cumulative
Histogram

http://www.socr.ucla.edu/htmls/SOCR_Charts.html
http://www.socr.ucla.edu/
An Extra Online Resource

Data as a Distribution
Try to Start Thinking of Any Data Set as a Distribution
This allows you take a broader perspective about the
observations contained therein
When you get a new dataset you should generate some
summary statistics such as
(1) Measures of Central Tendency
(2) Measures of Variation
(including the first four moments of the distribution)

Thinking of Data as
a Distribution
Moment 1 = Mean
Moment 2 = Variance
Moment 3 = Skewness
Moment 4 = Kurtosis

Describing the Shape
of the Data

Skewness
skewness is a measure of the
asymmetry of a distribution

Skewness
Skewness in the Context of the Measures
of Central Tendency

a negative skew indicates that the tail on the left side
of the probability density function is longer than the
right side and the bulk of the values (possibly
including the median) lie to the right of the mean.
Skewness

Skewness
A positive skew indicates that the tail on the right side is
longer than the left side and the bulk of the values lie to
the left of the mean.

Calculating Skewness
1. Subtract the Mean from each Raw Score.
Aka, Deviations from the mean
2. Raise each of these deviations from the mean to the third power
and sum. Aka: Sum of third moment deviations
3. Calculate skewness: the sum of the cubed deviations from the
mean, divided by the product of (number of cases minus 1) and the
standard deviation raised to the third power:
skewness = Σ( x − x̄ )³ / [ ( n − 1 ) s³ ]
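The three steps above can be sketched in code. This assumes the standard deviation is the sample standard deviation (dividing by n − 1), which the slide does not state explicitly; the scores are made-up illustrative values.

```python
import math


def skewness(scores):
    """Skewness following the three steps described on the slide."""
    n = len(scores)
    mean = sum(scores) / n
    # Step 1: deviations from the mean.
    deviations = [x - mean for x in scores]
    # Step 2: cube each deviation and sum.
    sum_cubed = sum(d ** 3 for d in deviations)
    # Assumption: sample standard deviation with n - 1 in the denominator.
    std_dev = math.sqrt(sum(d ** 2 for d in deviations) / (n - 1))
    # Step 3: divide by (n - 1) times the cubed standard deviation.
    return sum_cubed / ((n - 1) * std_dev ** 3)


symmetric = [1, 2, 3, 4, 5]
right_tailed = [2, 3, 3, 4, 4, 4, 5, 12]   # one large value stretches the right tail

print(f"symmetric scores:    {skewness(symmetric):.3f}")
print(f"right-tailed scores: {skewness(right_tailed):.3f}")
```

A symmetric set of scores gives a skewness of zero, while a long right tail gives a positive value and a long left tail a negative one.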

Try This Problem:
http://www.indiana.edu/~educy520/sec5982/week_12/skewness_demo.pdf

kurtosis is any measure of the "peakedness" of a
distribution
A high kurtosis distribution has a sharper peak
and longer, fatter tails, while a low kurtosis
distribution has a more rounded peak and
shorter, thinner tails.
Kurtosis

Distributions with zero excess kurtosis are called mesokurtic, or
mesokurtotic. The most prominent example of a mesokurtic
distribution is the normal distribution
A distribution with positive excess kurtosis is called leptokurtic, or
leptokurtotic. "Lepto-" means "slender". In terms of shape, a
leptokurtic distribution has a more acute peak around the mean and
fatter tails.
A distribution with negative excess kurtosis is called platykurtic, or
platykurtotic. "Platy-" means "broad". In terms of shape, a
platykurtic distribution has a lower, wider peak around the mean and
thinner tails.
Kurtosis

The moment coefficient of kurtosis of a data set is
computed almost the same way as the coefficient of
skewness:
kurtosis = m4 / m2², where m4 = Σ( x − x̄ )⁴ / n and m2 = Σ( x − x̄ )² / n
and
“excess” kurtosis = kurtosis − 3
Calculating Kurtosis
Note: the excess kurtosis is
generally used because the
excess kurtosis of a normal
distribution is 0.

Example:
n = 100
x̄ = 67.45
variance m2 = 8.5275
kurtosis = 199.3760 / (8.5275)² = 2.7418
and the excess kurtosis = 2.7418 − 3 = −0.2582
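The example divides 199.3760 by the square of the variance 8.5275, which suggests the fourth moment about the mean divided by the squared second moment. The sketch below assumes that reading, with both moments dividing by n; the function name and the small data set are illustrative.

```python
def moment_kurtosis(data):
    """Return (kurtosis, excess kurtosis) using moments that divide by n."""
    n = len(data)
    mean = sum(data) / n
    m2 = sum((x - mean) ** 2 for x in data) / n   # second moment (variance)
    m4 = sum((x - mean) ** 4 for x in data) / n   # fourth moment
    kurtosis = m4 / m2 ** 2
    return kurtosis, kurtosis - 3


# Re-check the arithmetic of the slide example from its reported values.
slide_kurtosis = 199.3760 / 8.5275 ** 2
print(f"slide kurtosis: {slide_kurtosis:.4f}")
print(f"slide excess:   {slide_kurtosis - 3:.4f}")

# A small flat data set has negative excess kurtosis (platykurtic).
kurt, excess = moment_kurtosis([1, 2, 3, 4, 5])
print(f"flat data kurtosis: {kurt:.2f}, excess: {excess:.2f}")
```

Subtracting 3 puts the normal distribution at zero, so the sign of the excess kurtosis says directly whether the tails are heavier or lighter than normal.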

Calculating Skew & Kurtosis
http://www.youtube.com/watch?v=eKwJUWkD2FQ

Daniel Martin Katz
@ computational
computationallegalstudies.com
lexpredict.com
danielmartinkatz.com
illinois tech - chicago kent college of law
