STATISTICS - 2
Elements of Inference
A Small Recap of Previous Presentation
• Descriptive Vs Inferential Statistics
• Sample Vs Population
• Need for sampling
STATISTICS
Descriptive
• Descriptive statistics, in the simplest sense, describe the data that we currently have.
• Example: asked for the performance statistics of a class of students, an answer would be "the mean mark is 64.5".
Inferential
• Inferential statistics infer a conclusion about the whole by looking at only a small portion of the data.
• Example: asked who will win this election, an answer would be "a survey of 10,000 people suggests that XYZ will get 60-70% of the vote, with 95% confidence".
Sample Vs Population
Sample
• A sample is a portion of the population that is readily available or easily attainable.
• Example: the respondents to a survey, or a million people drawn from the population.
Population
• A population is the entire set of data that would ideally be used to compute a statistic.
• Example: a census, or the whole population of a country.
Why do we go for a sample?
• Going around and asking the entire
population of people who they are going to
vote for is impossible.
• Taking the heights of the entire population is
not feasible as a lot would die and a lot would
be born by the time we finish.
• Sometimes the sample would be a
“Representative sample” meaning it has the
same nature and characteristics of the
population.
Elements of Inference
• Population Vs Sample (Measures)
• Probability
• Random Variables
• Probability Distributions
• Statistical Inference – The Concept
Population Vs Sample (Measures)
Population Measures
• Mean = μ
• SD = σ
• Var = σ²
• Variance formula: average squared distance from the population mean, $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(x_i - \mu)^2$.
Sample Measures
• Mean = X̅
• SD = S
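• Var = S²
• Variance formula: MODIFIED average squared distance from the sample mean (divide by n − 1 instead of n), $S^2 = \frac{1}{n-1}\sum_{i=1}^{n}(x_i - \bar{X})^2$.
• (The modification is explained in a separate simulation.)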
Alternative formula for variance
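Population Variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$
Sample Variance: $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{X}^2\right)$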
Find the proof here
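A minimal sketch of the two variance measures, assuming the usual conventions (divide by N for a population, by n − 1 for a sample) and the standard sum-of-squares shortcut. The marks in `data` are hypothetical values chosen only for illustration.

```python
import statistics

# Hypothetical marks, treated first as a whole population, then as a sample.
data = [52.0, 61.0, 64.5, 70.0, 75.0]
n = len(data)
mean = sum(data) / n

# Definitional forms: average squared distance from the mean.
pop_var = sum((x - mean) ** 2 for x in data) / n           # divide by N
sample_var = sum((x - mean) ** 2 for x in data) / (n - 1)  # divide by n - 1

# Alternative (shortcut) forms based on the sum of squares.
sum_sq = sum(x * x for x in data)
pop_var_alt = sum_sq / n - mean ** 2
sample_var_alt = (sum_sq - n * mean ** 2) / (n - 1)

# The standard library offers the same two measures.
print(pop_var, statistics.pvariance(data))
print(sample_var, statistics.variance(data))
```

The sample variance is always the larger of the two for the same data, because the same sum of squared distances is divided by a smaller number.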
Probability
Probability – Measure of Randomness
• Probability can be thought of as a measure of randomness.
• It is usually defined on a population quantity.
A Simple rule in subsets
• Advanced rule: if the occurrence of A implies the occurrence of B (A is a subset of B), then the probability of A is at most the probability of B.
• P(A) ≤ P(B).
[Figure: Venn diagram with A drawn inside B]
Probability in Statistics
• Why is probability a part of statistics?
• Probability calculus helps model randomness and hence is a part of statistical measures.
• Mass functions and density functions are the starting point.
• Examples are the bell curve (normal distribution) and other similar distributions.
Types of Data
Quantitative
• Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values.
• S = {0, 1, 2, ..., 31} – students
passed
• S = {0, 1, 2, ...} – cars crossed in a
given hour
• Quantitative data are
called continuous if the sample
space contains an interval or
continuous span of real numbers.
• S = {h: h ≥ 0 hours} – number of
hours spent studying
Qualitative
• Qualitative data are
called categorical if the
sample space contains objects
that are grouped or
categorized based on some
qualitative trait.
• When there are only two such
groups or categories, the data
are considered binary.
• S = {yes, no} – binary
• S = {Male, Female, Other} -
Categorical
Random Variables
Random Variables
• Similar to variables in a computer program
• An RV is a variable that holds the numeric outcome of an experiment
• Types : Discrete and Continuous
Discrete random variables
• Examples of Discrete : Die roll
/ Coin toss etc.
• Coin toss – 2 possible values
{H, T}
• Roll of Dice – 6 possible
values {1,2,3,4,5,6}
• Modelling discrete: we associate a probability with each individual outcome.
• Web traffic on a given day: a whole-number count with no fixed upper bound
Continuous random variables
• Example of continuous: number of hours I sleep daily (discrete?).
• Let's develop the above into a continuous variable.
• If you answer 7, I may ask: is it 7 or 7.000001?
• Modelling continuous: we associate a probability with ranges of outcomes.
Continuous RV Example
• Height differentials of students in a class.
• Try to put the values in a SET -> { , , }
• Can you be sure if you get a value of 1.24?
• It could be 1.245... or 1.242...
• So a continuous RV does not have a fixed list of values; instead it can take infinitely many values.
• That is why we use ranges to model it, like [1-2], [2-3], [3-4].
Probability Distributions
Types of Distributions
Discrete Probability Distribution
• Bernoulli Distribution
• Binomial Distribution
• Geometric Distribution
• Poisson Distribution
Continuous Probability Distribution
• Uniform Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
Probability mass function (p.m.f)
• The probability that a discrete random variable X takes on a particular value x, that is, P(X = x), is also denoted f(x).
• The function f(x) is typically called the probability mass function.
• A.k.a. (in some texts)
• probability function
• frequency function
• probability density function (strictly, this name belongs to continuous random variables).
PMF of Discrete Random Variables
• PMF(H) -> f(coin toss) -> P(coin toss = H) = ½
• Definition: the PMF is a function of a value x of the random variable X that gives the probability associated with that value, f(x) = P(X = x).
• It cannot be negative, and the probabilities over all values of the RV sum to 1.
Cumulative Distribution Function
• The function: F(x) = P(X ≤ x)
is called a cumulative probability distribution.
• For a discrete random variable X, the cumulative probability distribution F(x) is determined by: $F(x) = \sum_{t \le x} f(t)$
PMF Vs CDF
• note that the probability mass function, f(x),
of a discrete random variable X is
distinguished from the cumulative probability
distribution, F(x), of a discrete random
variable X by the use of a lowercase f and an
uppercase F.
• That is, the notation f(3) means P(X = 3), while
the notation F(3) means P(X ≤ 3).
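The difference between f(3) and F(3) can be sketched for a fair six-sided die. The die is an assumed example, and `pmf` and `cdf` are hypothetical helper names; the cumulative value is simply the sum of the mass function over all values up to x.

```python
from fractions import Fraction

# Fair six-sided die: every face has probability 1/6 (an assumed example).
support = range(1, 7)

def pmf(x):
    """f(x) = P(X = x)."""
    return Fraction(1, 6) if x in support else Fraction(0)

def cdf(x):
    """F(x) = P(X <= x): the sum of f(t) over all t <= x."""
    return sum((pmf(t) for t in support if t <= x), Fraction(0))

print(pmf(3))  # f(3) = P(X = 3)
print(cdf(3))  # F(3) = P(X <= 3)
```

Exact fractions are used so that the two values can be compared without rounding.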
Survival function
• The CDF and the survival function are simply named functions that make life easier.
• CDF(x) is the probability that the random variable takes a value of x or lower.
• The survival function is the complement of the CDF.
• Survival fn(x) -> P(X > x) = 1 − CDF(x).
Bernoulli Distribution
• X = 0 -> Tails
• X = 1 -> Heads
• $P(x) = (\tfrac{1}{2})^{x} (\tfrac{1}{2})^{1-x}$
• $P(x) = \theta^{x} (1-\theta)^{1-x}$ for a biased coin
• The above is Bernoulli distribution and models
a coin toss.
Bernoulli Distributions
• Bernoulli
• $f(x) = P(X = x) = p^{x} (1-p)^{1-x}$, for x = 0, 1.
• Mean = p
• Variance = p.(1-p)
[Figure: bar chart of the Bernoulli PMF, with a bar of height 1 − p at x = 0 and a bar of height p at x = 1]
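A short sketch of the Bernoulli PMF, with the mean and variance computed directly from their definitions. The value p = 0.3 and the name `bernoulli_pmf` are assumptions made for illustration only.

```python
def bernoulli_pmf(x, p):
    """P(X = x) = p**x * (1 - p)**(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        return 0.0
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # assumed probability of heads for a biased coin

# Mean and variance from the definition, summing over both outcomes.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))
print(mean, variance)
```

The computed values agree with the closed forms p and p(1 − p).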
Binomial Distribution
• A discrete random variable X is a binomial random
variable if:
• An experiment, or trial, is performed in exactly the same
way n times.
• Each of the n trials has only two possible outcomes. One of
the outcomes is called a "success," while the other is called
a "failure." Such a trial is called a Bernoulli trial.
• The n trials are independent.
• The probability of success, denoted p, is the same for each
trial. The probability of failure is q = 1 − p.
• The random variable X = the number of successes in
the n trials.
Binomial Distribution
• The probability mass function of a binomial random
variable X is:
• $f(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$, or equivalently $\binom{n}{x} p^{x} q^{n-x}$, for x = 0, 1, ..., n
• We denote the binomial distribution as b(n, p). That is, we say:
• X ~ b(n, p)
• where the tilde (~) is read "is distributed as," and n and p are called parameters of the distribution.
Function, Mean and Variance of
Binomial Distribution
• P.m.f -> $f(x) = \binom{n}{x} p^{x} (1-p)^{n-x}$
• Mean -> np
• Sd -> $\sigma = \sqrt{np(1-p)}$
• Variance -> $\sigma^2 = np(1-p)$
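The binomial formulas can be checked numerically with a small sketch. The parameters n = 10 and p = 0.25 are assumed values, and `binomial_pmf` is a hypothetical helper name.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ b(n, p): x successes in n independent trials."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.25  # assumed parameters for illustration
probs = [binomial_pmf(x, n, p) for x in range(n + 1)]

# Mean and variance computed from the PMF itself.
mean = sum(x * f for x, f in enumerate(probs))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))
sd = sqrt(variance)
print(sum(probs), mean, variance, sd)
```

The probabilities sum to 1, and the computed mean and variance match np and np(1 − p) up to floating-point rounding.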
Geometric Distribution
• Assume Bernoulli trials — that is,
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the first success. Then, the
probability mass function of X is:
• $f(x) = P(X = x) = (1-p)^{x-1} p$
• for x = 1, 2, ... In this case, we say that X follows a geometric
distribution.
Geometric Distribution Example
• A representative from the National Football
League's Marketing Division randomly selects
people on a random street in Kansas City, Kansas
until he finds a person who attended the last
home football game. Let p, the probability that he
succeeds in finding such a person, equal 0.20.
And, let X denote the number of people he
selects until he finds his first success. What is the
probability that the marketing representative
must select 4 people before he finds one who
attended the last home football game?
Function, Mean and Variance of
Geometric Distribution
• P.m.f -> $f(x) = P(X = x) = (1-p)^{x-1} p$
• Mean -> 1/p
• Sd -> $\sigma = \sqrt{1-p}/p$
• Variance -> $\sigma^2 = (1-p)/p^2$
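A worked sketch of the marketing example with p = 0.20. It assumes the question means X = 4, that is, the first person who attended the game is the fourth person selected; `geometric_pmf` is a hypothetical helper name.

```python
from math import sqrt

def geometric_pmf(x, p):
    """P(X = x): first success on trial x, for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p if x >= 1 else 0.0

p = 0.20  # probability of finding a person who attended the game

prob_fourth = geometric_pmf(4, p)  # three failures, then one success
mean = 1 / p                       # expected number of people selected
variance = (1 - p) / p ** 2
sd = sqrt(variance)
print(round(prob_fourth, 4), mean, round(variance, 4), round(sd, 4))
```

Under that reading, the probability is 0.8³ × 0.2, and on average the representative would need to select five people.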
Negative Binomial Distribution
• Assume Bernoulli trials from the same geometric distribution
example— that is,
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the rth success. Then, the
probability mass function of X is:
• $f(x) = P(X = x) = \binom{x-1}{r-1} (1-p)^{x-r} p^{r}$
• for x = r, r + 1, r + 2, ... In this case, we say that X follows a negative
binomial distribution.
• A geometric distribution is a special case of a negative binomial
distribution with r = 1
Function, Mean and Variance of
Negative Binomial Distribution
• P.m.f -> $f(x) = \binom{x-1}{r-1} (1-p)^{x-r} p^{r}$
• Mean -> r/p
• Sd -> $\sqrt{r(1-p)}/p$
• Variance -> $r(1-p)/p^2$
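A small sketch of the negative binomial PMF, including a check that r = 1 gives back the geometric case. The values p = 0.20 and r = 3 are assumptions, and `neg_binomial_pmf` is a hypothetical helper name.

```python
from math import comb

def neg_binomial_pmf(x, r, p):
    """P(X = x): the r-th success occurs on trial x, for x = r, r + 1, ..."""
    if x < r:
        return 0.0
    return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

p, r = 0.20, 3  # assumed parameters for illustration

# With r = 1 the formula collapses to the geometric PMF (1 - p)**(x - 1) * p.
geometric_case = neg_binomial_pmf(4, 1, p)

# Mean computed from the PMF; the tail beyond 2000 trials is negligible here.
mean = sum(x * neg_binomial_pmf(x, r, p) for x in range(r, 2001))
print(geometric_case, mean)
```

The numerical mean agrees with r/p, which is 15 for these assumed values.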
Poisson Distribution
• Let the discrete random variable X denote the number of times an event occurs in an interval of time (or space). Then X may be a Poisson random variable with x = 0, 1, 2, ...
• Examples:
• Let X equal the number of typos on a printed page. (This is an example of
an interval of space — the space being the printed page.)
• Let X equal the number of cars passing through the intersection of Allen
Street and College Avenue in one minute. (This is an example of an
interval of time — the time being one minute.)
• Let X equal the number of Alaskan salmon caught in a squid driftnet. (This
is again an example of an interval of space — the space being the squid
driftnet.)
• Let X equal the number of customers at an ATM in 10-minute intervals.
• Let X equal the number of students arriving during office hours.
Function, Mean and Variance of
Poisson Distribution
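• P.m.f -> $f(x) = \frac{e^{-\lambda} \lambda^{x}}{x!}$ for x = 0, 1, 2, ..., where λ is the mean number of events in the interval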
• Mean -> λ
• Sd -> √ λ
• Variance -> λ
The binomial distribution can be approximated by the Poisson distribution (with λ = np) when n is large and p is small.
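The link between the binomial and the Poisson distribution can be sketched numerically. The values n = 1000 and p = 0.003 are assumed for illustration, and both helper names are hypothetical; the Poisson mean is set to λ = np.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = exp(-lam) * lam**x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def binomial_pmf(x, n, p):
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.003  # assumed: many trials, small success probability
lam = n * p         # matching Poisson mean

# Largest pointwise gap between the two PMFs over x = 0..20.
max_gap = max(abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(21))
print(lam, max_gap)
```

With these values the two mass functions differ by well under 0.001 at every point checked, which is why the simpler Poisson form is often used in this regime.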
Continuous Probability Distributions
Continuous Distributions
• In this section, as the title suggests, we investigate probability distributions of continuous random variables, that is, random variables whose set of possible values contains an interval with infinitely many possible outcomes.
Useful Pre-requisites
• Empirical rule (the 68-95-99.7 rule).
• It applies when the gathered data are mound-shaped or bell-shaped.
• It gives the approximate share of data points that lie within 1, 2 and 3 standard deviations of the mean.
• About 68% of the data lie within μ ± 1σ
• About 95% of the data lie within μ ± 2σ
• About 99.7% of the data lie within μ ± 3σ
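The three percentages can be reproduced from the standard normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library; the dictionary name `shares` is a hypothetical choice.

```python
from statistics import NormalDist

z = NormalDist()  # standard normal: mean 0, standard deviation 1

# Share of the distribution within k standard deviations of the mean.
shares = {k: z.cdf(k) - z.cdf(-k) for k in (1, 2, 3)}

for k, share in shares.items():
    print(f"within {k} sd: {share:.4f}")
```

The rule is exact only for a normal distribution; for real data that are merely bell-shaped it is an approximation.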
Useful Pre-requisites
• Quantiles
• A quantile is a cut point below which a given fraction of the data falls.
• A percentile is a quantile expressed on a scale of 0 to 100.
• The median is the 0.5 quantile, i.e. the 50th percentile.
• The 25th percentile (0.25 quantile) has one quarter of the data at or below it.
Useful Pre-requisite
• The 25th percentile is also called the first
quartile and is denoted as q1.
• The 50th percentile is also called the second
quartile or median, and is denoted as q2 or m.
• The 75th percentile is also called the third
quartile and is denoted as q3.
• The interquartile range (IQR) is the difference between the third and first quartiles: IQR = q3 − q1.
Useful Pre-requisite
• Five-Number Summary
• Example: for a random sample of 20 concentrations of calcium carbonate (CaCO3) in milligrams per litre, the summary is:
• Minimum: 127.8
• First quartile: 130.12
• Median: 131.45
• Third quartile: 132.70
• Maximum: 134.8
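A sketch of how quartiles, the IQR and a five-number summary can be computed. The 20 calcium carbonate readings are not listed on the slide, so the values in `data` are hypothetical. Note that `statistics.quantiles` defaults to the "exclusive" method, and other software may use a different interpolation rule and give slightly different quartiles.

```python
from statistics import quantiles

# Hypothetical data set (the slide's 20 readings are not available).
data = [2, 4, 4, 5, 6, 7, 8, 9, 10, 12, 13]

# n=4 splits the data into four equal parts and returns the three cut points.
q1, q2, q3 = quantiles(data, n=4)

five_number_summary = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1
print(five_number_summary, iqr)
```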
Useful Pre-requisite
• Skewness and Symmetry
• For a distribution that is skewed left, the bulk of the
data values (including the median) lie to the right of
the mean, and there is a long tail on the left side.
• For a distribution that is skewed right, the bulk of the
data values (including the median) lie to the left of the
mean, and there is a long tail on the right side.
• For a distribution that is symmetric, approximately half
of the data values lie to the left of the mean, and
approximately half of the data values lie to the right of
the mean.
Contd..
[Figure: three example distributions labelled symmetric, skewed right and skewed left]
Probability Density Function
• The probability that X takes on any particular
value x is 0. That is, finding P(X = x) for a
continuous random variable X is not going to
work.
• Instead, we'll need to find the probability
that X falls in some interval (a, b), that is, we'll
need to find P(a < X < b). We'll do that using a
probability density function ("p.d.f.").
Probability Density Function
• PDF -> helps model a continuous RV; probabilities are represented by areas under the curve.
• Its value is never negative anywhere along the curve: f(x) ≥ 0.
• Total area under the curve is 1.
• The area under the curve over an interval gives the probability that the random variable falls in that interval.
• Example: the probability that a person has an IQ greater than 100 but less than 115 is the area under the IQ PDF between 100 and 115.
[Figure: bell-shaped IQ PDF with the area between 100 and 115 shaded]
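The idea that a probability is an area under the PDF can be sketched by adding up thin rectangles. The IQ model N(100, 16²) is an assumption here, since this slide does not state the standard deviation; the result is compared with the CDF-based answer.

```python
from statistics import NormalDist

# Assumed IQ model: mean 100, standard deviation 16 (not fixed by this slide).
iq = NormalDist(mu=100.0, sigma=16.0)
low, high = 100.0, 115.0

# Midpoint Riemann sum: add up thin rectangles under the PDF curve.
slices = 10_000
width = (high - low) / slices
area = sum(iq.pdf(low + (i + 0.5) * width) for i in range(slices)) * width

# The same probability from the cumulative distribution function.
via_cdf = iq.cdf(high) - iq.cdf(low)
print(round(area, 4), round(via_cdf, 4))
```

Shrinking the interval towards a single point shrinks the area towards 0, which is why P(X = x) is 0 for a continuous random variable.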
Special Considerations on PDF
• The probability that the random variable takes any one specific value is 0, because the area above a single point (a line of zero width, not a region under the curve) is 0.
• The PDF always describes the population (NOT the sample).
Cumulative Distribution Function
• The function: F(x) = P(X ≤ x)
is called a cumulative probability distribution.
• For a discrete random variable X, F(x) is: $F(x) = \sum_{t \le x} f(t)$
• For a continuous RV X, F(x) is: $F(x) = \int_{-\infty}^{x} f(t)\,dt$
• The summation becomes an integral.
Survival function
• The CDF and the survival function are simply named functions that make life easier.
• CDF(x) is the probability that the random variable takes a value of x or lower.
• The survival function is the complement of the CDF.
• Survival fn(x) -> P(X > x) = 1 − CDF(x).
[Figure: two panels, CDF and SF]
Continuous Distributions
• Uniform Distribution
• Beta Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
Uniform Distribution
• A continuous random variable X has a uniform distribution, denoted U(a, b), if its probability density function is:
• $f(x) = \frac{1}{b-a}$ for a < x < b
P.d.f, mean and variance of a Uniform
distribution
• P.d.f -> $f(x) = \frac{1}{b-a}$ for a < x < b
• Mean -> $\frac{a+b}{2}$
• Variance -> $\frac{(b-a)^2}{12}$
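A sketch for the uniform distribution U(a, b), using the standard results that the density is 1/(b − a) on the interval, the mean is (a + b)/2 and the variance is (b − a)²/12. The endpoints a = 2 and b = 10 are assumed, and `uniform_pdf` is a hypothetical helper name; the closed forms are checked against a numerical integration.

```python
def uniform_pdf(x, a, b):
    """Constant density 1 / (b - a) on the interval, 0 elsewhere."""
    return 1.0 / (b - a) if a <= x <= b else 0.0

a, b = 2.0, 10.0  # assumed endpoints for illustration

mean = (a + b) / 2
variance = (b - a) ** 2 / 12

# Numerical check: midpoint-rule integration of x*f(x) and (x - mean)^2*f(x).
steps = 10_000
h = (b - a) / steps
xs = [a + (i + 0.5) * h for i in range(steps)]
num_mean = sum(x * uniform_pdf(x, a, b) for x in xs) * h
num_variance = sum((x - mean) ** 2 * uniform_pdf(x, a, b) for x in xs) * h
print(mean, num_mean, variance, num_variance)
```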
Exponential Distribution
• Suppose X, following an (approximate) Poisson process,
equals the number of customers arriving at a bank in an
interval of length 1.
• If λ, the mean number of customers arriving in an interval
of length 1, is 6, say, then we might observe something like
this:
[Figure: timeline of customer arrivals over time; w marks the waiting time until the first arrival]
Exponential Distribution – contd..
• Previously, our focus would have been on the
discrete random variable X, the number of
customers arriving.
• As the picture suggests, however, we could
alternatively be interested in the continuous
random variable W, the waiting time until
the first customer arrives.
p.d.f, mean and variance of
Exponential Distribution
• P.d.f -> $f(w) = \frac{1}{\theta} e^{-w/\theta}$ for w ≥ 0, where θ = 1/λ
• Mean -> θ = 1/λ
• Variance -> θ² = 1/λ²
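A sketch connecting the waiting time to the Poisson count, using λ = 6 arrivals per unit interval from the bank example and the standard exponential CDF 1 − e^(−w/θ) with θ = 1/λ. The waiting time w = 0.25 and the helper name are assumptions. Waiting longer than w for the first customer is the same event as seeing zero arrivals in an interval of length w.

```python
from math import exp

lam = 6.0          # mean number of arrivals per unit interval
theta = 1.0 / lam  # mean waiting time until the first arrival

def exponential_cdf(w, theta):
    """P(W <= w) = 1 - exp(-w / theta) for w >= 0."""
    return 1.0 - exp(-w / theta) if w >= 0 else 0.0

w = 0.25  # assumed waiting time, in the same units as the interval

p_wait_longer = 1.0 - exponential_cdf(w, theta)  # survival function of W
p_no_arrivals = exp(-lam * w)                    # Poisson P(X = 0) with mean lam * w
print(p_wait_longer, p_no_arrivals)
```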
Gamma Distributions
• We learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ.
• We now let W denote the waiting time until
the αth event occurs and find the distribution of W. We
could represent the situation as follows:
[Figure: timeline of events marking W, the waiting time until the αth event]
p.d.f, mean and variance of Gamma
Distribution
• P.d.f -> $f(w) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} w^{\alpha-1} e^{-w/\theta}$ for w > 0
• Mean -> αθ
• Variance -> αθ²
Chi-Square Distribution
• The chi-square distribution is just a special case of the gamma distribution!
• Let X follow a gamma distribution with θ = 2 and α = r/2, where r is a positive integer. Then we say that X follows a chi-square distribution with r degrees of freedom, denoted χ²(r) and read "chi-square-r."
p.d.f, mean and variance of Chi-Square
Distribution
• P.d.f -> $f(x) = \frac{1}{\Gamma(r/2)\,2^{r/2}} x^{r/2-1} e^{-x/2}$ for x > 0
• Mean -> r
• Variance -> 2r
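A sketch of the standard gamma and chi-square densities, showing that the chi-square density with r degrees of freedom coincides with the gamma density for θ = 2 and α = r/2. The helper names and the choice r = 4 are assumptions for illustration.

```python
from math import exp, gamma

def gamma_pdf(w, alpha, theta):
    """Gamma density with shape alpha and scale theta, for w > 0."""
    if w <= 0:
        return 0.0
    return w ** (alpha - 1) * exp(-w / theta) / (gamma(alpha) * theta ** alpha)

def chi_square_pdf(x, r):
    """Chi-square density with r degrees of freedom, for x > 0."""
    if x <= 0:
        return 0.0
    return x ** (r / 2 - 1) * exp(-x / 2) / (gamma(r / 2) * 2 ** (r / 2))

r = 4  # assumed degrees of freedom
for x in (0.5, 1.0, 3.0, 7.0):
    print(x, chi_square_pdf(x, r), gamma_pdf(x, r / 2, 2.0))
```

With α = 1 the same gamma density reduces to the exponential density, which matches the waiting time until the first event.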
Normal Distribution
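• One of the most frequently seen distributions in the natural world.
• P.d.f -> $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ for −∞ < x < ∞
• Mean -> μ
• Variance -> σ²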
Properties of Normal Distribution
• All normal curves are bell-shaped.
• All normal curves are symmetric about the mean μ.
• The area under an entire normal curve is 1.
• All normal curves are positive for all x. That is, f(x) > 0
for all x.
• The limit of f(x) as x goes to infinity is 0, and the limit
of f(x) as x goes to negative infinity is 0.
• The height of any normal curve is maximized at x = µ.
• The shape of any normal curve depends on its
mean μ and standard deviation σ.
Standard Normal Distribution
• If X ~ N(μ, σ²), then Z = (X − μ)/σ follows N(0, 1).
• This means that Z is a random variable which
follows the N(0,1) distribution, which is called
the standardized (or standard) normal
distribution.
• Now we can use the standard normal N(0,1)
table, typically referred to as the Z-table, to find
the desired probability.
Finding probabilities in normal
distribution?
• Let's look at the following question:
• Let X equal the IQ of a randomly selected American. Assume X ~ N(100, 16²). What is the probability that a randomly selected American has an IQ below 90?
Finding probabilities in normal
distribution?
• The following integral gives us the answer to the question: $P(X < 90) = \int_{-\infty}^{90} \frac{1}{16\sqrt{2\pi}} \exp\left(-\frac{(x-100)^2}{2 \cdot 16^2}\right) dx$
• But there is just one problem.
• The normal p.d.f. cannot be integrated in closed form. That is, no simple expression exists for the antiderivative. We can only approximate the integral using numerical analysis techniques.
• So we would need a normal probability table for a normal distribution with mean μ = 100 and standard deviation σ = 16. But then there would have to be an infinite number of normal probability tables, one for every combination of μ and σ.
Finding probabilities in normal
distribution? - Solution
• The cumulative probabilities have been tabled
for the N(0,1) distribution.
• All we need to do is transform our N(100, 16²) distribution to a N(0, 1) distribution
• Then use the cumulative probability table for
the N(0,1) distribution to calculate our desired
probability.
Transforming X ~ N(μ, σ²) to Z ~ N(0, 1)
• If X ~ N(μ, σ²), then: $Z = \frac{X - \mu}{\sigma}$
• follows the N(0,1) distribution, which is called
the standardized (or standard) normal
distribution.
• Use the standard normal N(0,1) table, typically
referred to as the Z-table, to find the desired
probability.
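The IQ question (X ~ N(100, 16²), probability of an IQ below 90) can be worked through as a sketch. `statistics.NormalDist` stands in for the Z-table here; it is a convenience assumed for illustration, not the method used on the slides.

```python
from statistics import NormalDist

mu, sigma = 100.0, 16.0  # IQ model from the question
x = 90.0

# Step 1: standardize the value.
z = (x - mu) / sigma

# Step 2: look up the cumulative probability for z in the standard normal.
p_via_z = NormalDist().cdf(z)

# Cross-check: the same probability computed directly on N(100, 16^2).
p_direct = NormalDist(mu, sigma).cdf(x)
print(z, round(p_via_z, 4), round(p_direct, 4))
```

Both routes give the same answer, which is the point of standardizing: one table for N(0, 1) serves every μ and σ.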
Functions to find probability or X value
• The first method is to remember the z-values for common cumulative probabilities: 90%, 95%, 97.5% and 99% of the area lies below μ + 1.28σ, μ + 1.645σ, μ + 1.96σ and μ + 2.33σ respectively.
• The second method is to use the Z-table.
• Excel and R formulas are described in detail in the following slides.
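The four multipliers can be recovered from the inverse CDF of the standard normal. This is a sketch using `statistics.NormalDist.inv_cdf`; the dictionary name `z_values` is a hypothetical choice.

```python
from statistics import NormalDist

std_normal = NormalDist()  # N(0, 1)

# z such that P(Z <= z) equals each cumulative probability.
z_values = {p: std_normal.inv_cdf(p) for p in (0.90, 0.95, 0.975, 0.99)}

for p, z in z_values.items():
    print(f"P(Z <= z) = {p}: z = {z:.3f}")
```

For a general N(μ, σ²) the corresponding x value is μ + zσ.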
Excel Formula for finding probability
• X value, μ, σ are given.
• NORMDIST(x, μ, σ, cumulative)
• X is the value for which you want the distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• Cumulative :
• If TRUE, returns the c.d.f. (Area to the left) P(X ≤ x)
• if FALSE, it returns the p.d.f.
• 1 - NORMDIST(x, μ, σ, TRUE) gives the area to the right, P(X > x)
Excel Formula for finding x value
• Probability, μ, σ are given.
• NORMINV(probability, μ, σ )
• Probability is a probability corresponding to the normal
distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• For STANDARD NORMAL.
• NORMSDIST(z)
• Z is the value for which you want the distribution.
• NORMSINV(probability)
• Probability is a probability corresponding to the normal
distribution.
R functions for finding probability
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• The defaults are shown in the call above.
• q - the x value(s) for which we need the probability.
• We can also calculate the z value ourselves and pass it directly, as below:
• pnorm(z, lower.tail = TRUE, log.p = FALSE)
R functions for finding x value
• qnorm(p, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• p - probability (no default), usually given as a fraction such as 0.92 for 92%
• mean - mean (defaults to 0)
• sd - standard deviation (defaults to 1)
• lower.tail - TRUE -> p is P(X ≤ x)
- FALSE -> p is P(X > x)
• log.p - TRUE -> p is given as log(p)
- FALSE -> p is given as a plain probability
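For readers without Excel or R, the same two lookups can be sketched in Python. The functions `pnorm` and `qnorm` below are hypothetical helpers that only mimic the R names; they are built on `statistics.NormalDist` and do not implement the `log.p` option.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    """Probability for a value: P(X <= q), or P(X > q) if lower_tail is False."""
    p = NormalDist(mean, sd).cdf(q)
    return p if lower_tail else 1.0 - p

def qnorm(p, mean=0.0, sd=1.0, lower_tail=True):
    """Value for a probability: x with P(X <= x) = p (or P(X > x) = p)."""
    if not lower_tail:
        p = 1.0 - p
    return NormalDist(mean, sd).inv_cdf(p)

left = pnorm(90, mean=100, sd=16)                     # area to the left of 90
right = pnorm(90, mean=100, sd=16, lower_tail=False)  # area to the right of 90
x_back = qnorm(left, mean=100, sd=16)                 # recovers 90
print(left, right, x_back)
```

The two functions are inverses of each other, in the same way that NORMDIST and NORMINV are.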
Relationship between normal and Chi-
Square Distribution
• If X is normally distributed with mean μ and variance σ² > 0, then: $V = \left(\frac{X - \mu}{\sigma}\right)^2 = Z^2$
• is distributed as a chi-square random variable
with 1 degree of freedom.
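A simulation sketch of this relationship: draw X from a normal distribution, form the squared standardized value ((X − μ)/σ)², and compare with two known properties of the chi-square distribution with 1 degree of freedom (mean 1, and CDF equal to erf(√(v/2))). The values μ = 100, σ = 16, the seed and the sample size are assumptions.

```python
import random
from math import erf, sqrt
from statistics import fmean

random.seed(0)            # fixed seed so the run is repeatable
mu, sigma = 100.0, 16.0   # assumed normal parameters

draws = [random.gauss(mu, sigma) for _ in range(100_000)]
v = [((x - mu) / sigma) ** 2 for x in draws]  # squared standardized values

sample_mean = fmean(v)                                    # chi-square(1) has mean 1
share_below_1 = sum(value <= 1.0 for value in v) / len(v)
chi2_cdf_at_1 = erf(sqrt(1.0 / 2.0))                      # chi-square(1) CDF at 1
print(sample_mean, share_below_1, chi2_cdf_at_1)
```

The simulated share below 1 is close to 0.68, the same figure as the 1σ band of the empirical rule, because V ≤ 1 is the same event as |Z| ≤ 1.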
Statistical Inference
What is Statistical Inference ?
• Generating conclusions about population from
a noisy sample
• We try to identify the estimates of population
from the data available in the form of samples
• The Historical data is one of the most widely
available data
• The Survey data is the other form of available
data
Statistical Inference – The process
• We have sample data.
• Hence we can compute a measure of it, like the mean, median or mode.
• The sample measure is called the estimator; it tries to estimate the corresponding population measure.
• The sample mean is an estimator of the population mean.
• The sample median is an estimator of the population median.
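The process can be sketched with a simulated population. Everything here is hypothetical: the population of 100,000 marks (centred on 64.5), the sample size of 500 and the seed are chosen only to illustrate how a sample measure estimates a population measure.

```python
import random
from statistics import fmean, median

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population of marks; in practice it is usually unobservable.
population = [random.gauss(64.5, 10.0) for _ in range(100_000)]

# A simple random sample drawn without replacement.
sample = random.sample(population, 500)

pop_mean, pop_median = fmean(population), median(population)
sample_mean, sample_median = fmean(sample), median(sample)
print(pop_mean, sample_mean)
print(pop_median, sample_median)
```

The sample values land near, but not exactly on, the population values; how far off they can be is the subject of the next module.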
END OF MODULE
~~X~~
Coming up Next = Statistical Inference - Core
76

More Related Content

What's hot

Introduction to Hypothesis Testing
Introduction to Hypothesis TestingIntroduction to Hypothesis Testing
Introduction to Hypothesis Testing
jasondroesch
 
Hipotez testi
Hipotez testiHipotez testi
Statistical analysis in SPSS_
Statistical analysis in SPSS_ Statistical analysis in SPSS_
Statistical analysis in SPSS_
Dr. Anugamini Priya
 
Binomial probability distribution
Binomial probability distributionBinomial probability distribution
Binomial probability distribution
hamza munir
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
Saradha Shyam
 
Goodness of Fit Notation
Goodness of Fit NotationGoodness of Fit Notation
Goodness of Fit Notation
Long Beach City College
 
2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
Andres Mendez-Vazquez
 
Normal distribution
Normal distributionNormal distribution
Normal distribution
Sarabjeet Kaur
 
hypothesis testing-tests of proportions and variances in six sigma
hypothesis testing-tests of proportions and variances in six sigmahypothesis testing-tests of proportions and variances in six sigma
hypothesis testing-tests of proportions and variances in six sigma
vdheerajk
 
Calculating p value
Calculating p valueCalculating p value
Calculating p value
Ramachandra Barik
 
Statistical analysis training course
Statistical analysis training courseStatistical analysis training course
Statistical analysis training course
Marwa Abo-Amra
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
praveen3030
 
Chap06 sampling and sampling distributions
Chap06 sampling and sampling distributionsChap06 sampling and sampling distributions
Chap06 sampling and sampling distributions
Judianto Nugroho
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
jasondroesch
 
Missing data handling
Missing data handlingMissing data handling
Missing data handling
QuantUniversity
 
Statistics-3 : Statistical Inference - Core
Statistics-3 : Statistical Inference - CoreStatistics-3 : Statistical Inference - Core
Statistics-3 : Statistical Inference - Core
Giridhar Chandrasekaran
 
Statistical inference 2
Statistical inference 2Statistical inference 2
Statistical inference 2
safi Ullah
 
Testing of hypothesis
Testing of hypothesisTesting of hypothesis
Testing of hypothesis
RuchiJainRuchiJain
 

What's hot (20)

Introduction to Hypothesis Testing
Introduction to Hypothesis TestingIntroduction to Hypothesis Testing
Introduction to Hypothesis Testing
 
Confidence Intervals
Confidence IntervalsConfidence Intervals
Confidence Intervals
 
Hipotez testi
Hipotez testiHipotez testi
Hipotez testi
 
Statistical analysis in SPSS_
Statistical analysis in SPSS_ Statistical analysis in SPSS_
Statistical analysis in SPSS_
 
Binomial probability distribution
Binomial probability distributionBinomial probability distribution
Binomial probability distribution
 
Binomial distribution
Binomial distributionBinomial distribution
Binomial distribution
 
Goodness of Fit Notation
Goodness of Fit NotationGoodness of Fit Notation
Goodness of Fit Notation
 
2.03 bayesian estimation
2.03 bayesian estimation2.03 bayesian estimation
2.03 bayesian estimation
 
Normal distribution
Normal distributionNormal distribution
Normal distribution
 
hypothesis testing-tests of proportions and variances in six sigma
hypothesis testing-tests of proportions and variances in six sigmahypothesis testing-tests of proportions and variances in six sigma
hypothesis testing-tests of proportions and variances in six sigma
 
Calculating p value
Calculating p valueCalculating p value
Calculating p value
 
Statistical analysis training course
Statistical analysis training courseStatistical analysis training course
Statistical analysis training course
 
Hypothesis testing
Hypothesis testingHypothesis testing
Hypothesis testing
 
Chap06 sampling and sampling distributions
Chap06 sampling and sampling distributionsChap06 sampling and sampling distributions
Chap06 sampling and sampling distributions
 
Frequency Distributions
Frequency DistributionsFrequency Distributions
Frequency Distributions
 
Missing data handling
Missing data handlingMissing data handling
Missing data handling
 
Statistics-3 : Statistical Inference - Core
Statistics-3 : Statistical Inference - CoreStatistics-3 : Statistical Inference - Core
Statistics-3 : Statistical Inference - Core
 
Statistical inference 2
Statistical inference 2Statistical inference 2
Statistical inference 2
 
Confidence interval
Confidence intervalConfidence interval
Confidence interval
 
Testing of hypothesis
Testing of hypothesisTesting of hypothesis
Testing of hypothesis
 

Similar to Statistics-2 : Elements of Inference

Probability distribution
Probability distributionProbability distribution
Probability distributionRanjan Kumar
 
lecture4.pdf
lecture4.pdflecture4.pdf
lecture4.pdf
TarikuArega1
 
5. RV and Distributions.pptx
5. RV and Distributions.pptx5. RV and Distributions.pptx
5. RV and Distributions.pptx
SaiMohnishMuralidhar
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
King Khalid University
 
Binomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distributionBinomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distribution
Bharath kumar Karanam
 
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptxBINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
letbestrong
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
마이캠퍼스
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic trading
QuantInsti
 
FandTtests.ppt
FandTtests.pptFandTtests.ppt
FandTtests.ppt
UMAIRASHFAQ20
 
Statistics Formulae for School Students
Statistics Formulae for School StudentsStatistics Formulae for School Students
Statistics Formulae for School Students
dhatiraghu
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural Networks
Natan Katz
 
Probability
ProbabilityProbability
Chapter 5 and Chapter 6
Chapter 5 and Chapter 6 Chapter 5 and Chapter 6
Chapter 5 and Chapter 6
Tara Kissel, M.Ed
 
04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrvPooja Sakhla
 
Statr sessions 9 to 10
Statr sessions 9 to 10Statr sessions 9 to 10
Statr sessions 9 to 10
Ruru Chowdhury
 
Unit3
Unit3Unit3
Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- III
Akhila Prabhakaran
 
7 Chi-square and F (1).ppt
7 Chi-square and F (1).ppt7 Chi-square and F (1).ppt
7 Chi-square and F (1).ppt
Abebe334138
 
Crv
CrvCrv
Crv
Ashar78
 
Discrete distributions: Binomial, Poisson & Hypergeometric distributions
Discrete distributions:  Binomial, Poisson & Hypergeometric distributionsDiscrete distributions:  Binomial, Poisson & Hypergeometric distributions
Discrete distributions: Binomial, Poisson & Hypergeometric distributions
ScholarsPoint1
 

Similar to Statistics-2 : Elements of Inference (20)

Probability distribution
Probability distributionProbability distribution
Probability distribution
 
lecture4.pdf
lecture4.pdflecture4.pdf
lecture4.pdf
 
5. RV and Distributions.pptx
5. RV and Distributions.pptx5. RV and Distributions.pptx
5. RV and Distributions.pptx
 
Machine learning mathematicals.pdf
Machine learning mathematicals.pdfMachine learning mathematicals.pdf
Machine learning mathematicals.pdf
 
Binomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distributionBinomial,Poisson,Geometric,Normal distribution
Binomial,Poisson,Geometric,Normal distribution
 
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptxBINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
BINOMIAL ,POISSON AND NORMAL DISTRIBUTION.pptx
 
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)슬로우캠퍼스:  scikit-learn & 머신러닝 (강박사)
슬로우캠퍼스: scikit-learn & 머신러닝 (강박사)
 
Basic statistics for algorithmic trading
Basic statistics for algorithmic tradingBasic statistics for algorithmic trading
Basic statistics for algorithmic trading
 
FandTtests.ppt
FandTtests.pptFandTtests.ppt
FandTtests.ppt
 
Statistics Formulae for School Students
Statistics Formulae for School StudentsStatistics Formulae for School Students
Statistics Formulae for School Students
 
Bayesian Neural Networks
Bayesian Neural NetworksBayesian Neural Networks
Bayesian Neural Networks
 
Probability
ProbabilityProbability
Probability
 
Chapter 5 and Chapter 6
Chapter 5 and Chapter 6 Chapter 5 and Chapter 6
Chapter 5 and Chapter 6
 
04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv04 random-variables-probability-distributionsrv
04 random-variables-probability-distributionsrv
 
Statr sessions 9 to 10
Statr sessions 9 to 10Statr sessions 9 to 10
Statr sessions 9 to 10
 
Unit3
Unit3Unit3
Unit3
 
Statistical Analysis with R- III
Statistical Analysis with R- IIIStatistical Analysis with R- III
Statistical Analysis with R- III
 
7 Chi-square and F (1).ppt
7 Chi-square and F (1).ppt7 Chi-square and F (1).ppt
7 Chi-square and F (1).ppt
 
Crv
CrvCrv
Crv
 
Discrete distributions: Binomial, Poisson & Hypergeometric distributions
Discrete distributions:  Binomial, Poisson & Hypergeometric distributionsDiscrete distributions:  Binomial, Poisson & Hypergeometric distributions
Discrete distributions: Binomial, Poisson & Hypergeometric distributions
 

Statistics-2 : Elements of Inference

  • 1. STATISTICS - 2 Elements of Inference
  • 2. A Small Recap of Previous Presentation • Descriptive Vs Inferential Statistics • Sample Vs Population • Need for sampling
  • 3. STATISTICS Descriptive • Descriptive statistics, in the simplest sense, describe the data that we currently have. • Example: asked how a class of students performed, an answer would be "the mean mark is 64.5". Inferential • Inferential statistics infer an outcome for the whole by looking at only a small portion of the data. • Example: asked who will win an election, an answer would be "a survey of 10,000 people suggests that XYZ will get 60-70% of the vote, with 95% confidence".
  • 4. Sample Vs Population Sample • A sample is a portion of the population that is readily available or easily attainable. • Example: a survey, or a million people drawn from the population. Population • A population is the entire set of data that would ideally be used to compute a statistic. • Example: a census, or the whole population of a country.
  • 5. Why do we use a sample? • Going around and asking the entire population who they are going to vote for is impossible. • Taking the heights of the entire population is not feasible, as many people would die and many would be born by the time we finish. • Ideally the sample is a “representative sample”, meaning it has the same nature and characteristics as the population.
  • 6. Elements of Inference • Population Vs Sample (Measures) • Probability • Random Variables • Probability Distributions • Statistical Inference – The Concept
  • 7. Population Vs Sample (Measures) Population Measures • Mean = μ • SD = σ • Var = σ² • Variance formula: average squared distance from the population mean. Sample Measures • Mean = X̅ • SD = S • Var = S² • Variance formula: MODIFIED average squared distance from the sample mean (the sum of squared distances is divided by n − 1 instead of n). Explained in a separate simulation.
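  • 8. Alternative formula for variance • Population variance: $\sigma^2 = \frac{1}{N}\sum_{i=1}^{N} x_i^2 - \mu^2$ • Sample variance: $S^2 = \frac{1}{n-1}\left(\sum_{i=1}^{n} x_i^2 - n\bar{X}^2\right)$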
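The difference between the population and sample variance can be sketched in a few lines of Python. The marks below are hypothetical values chosen so that their mean is 64.5; the only library calls used are `statistics.pvariance` and `statistics.variance` from the standard library.

```python
import statistics

# Hypothetical marks for a small class (illustrative values only).
marks = [52, 61, 64, 68, 70, 72]

n = len(marks)
mean = sum(marks) / n
sum_sq = sum((x - mean) ** 2 for x in marks)

pop_var = sum_sq / n           # population variance: divide by n
sample_var = sum_sq / (n - 1)  # sample variance: divide by n - 1

# The hand-computed values should agree with the library functions.
print(pop_var, statistics.pvariance(marks))
print(sample_var, statistics.variance(marks))
```

The sample variance is always slightly larger than the population-style calculation on the same numbers, because the same sum of squares is divided by a smaller number.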
  • 10. Probability – Measure of Randomness • Probability can be thought of as a measure of randomness • It is usually defined for a population quantity
  • 11. A Simple rule in subsets • Advanced rule: if the occurrence of A implies the occurrence of B (that is, A is a subset of B), then the probability of A is at most the probability of B. • P(A) ≤ P(B).
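As a small illustration of the subset rule, consider one roll of a fair die. The events A and B below are hypothetical choices made for this sketch: rolling a six implies rolling an even number, so A is a subset of B.

```python
from fractions import Fraction

# Sample space of one fair die roll; every outcome is equally likely.
outcomes = {1, 2, 3, 4, 5, 6}

A = {6}        # event A: the roll is a six
B = {2, 4, 6}  # event B: the roll is even (A implies B)

def prob(event):
    """Probability of an event when all outcomes are equally likely."""
    return Fraction(len(event & outcomes), len(outcomes))

print(prob(A), prob(B))  # 1/6 1/2
```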
  • 12. Probability in Statistics • Why is probability a part of statistics? • Probability calculus models randomness, so it underlies statistical measures. • Mass functions and density functions are the starting point. • An example is the bell curve (normal distribution), along with other similar distributions.
  • 13. Types of Data Quantitative • Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values. • S = {0, 1, 2, ..., 31} – students passed • S = {0, 1, 2, ...} – cars crossed in a given hour • Quantitative data are called continuous if the sample space contains an interval or continuous span of real numbers. • S = {h: h ≥ 0 hours} – number of hours spent studying Qualitative • Qualitative data are called categorical if the sample space contains objects that are grouped or categorized based on some qualitative trait. • When there are only two such groups or categories, the data are considered binary. • S = {yes, no} – binary • S = {Male, Female, Other} – categorical
  • 15. Random Variables • Similar to variables in a computer program • An RV is a variable which holds the numeric outcome of an experiment • Types: discrete and continuous
  • 16. Discrete random variables • Examples of discrete: die roll, coin toss, etc. • Coin toss: 2 possible values {H, T} • Roll of a die: 6 possible values {1,2,3,4,5,6} • Modelling discrete: we associate a probability with each individual outcome. • Web traffic in a given day: a whole-number count with no fixed upper bound on any given day. Continuous random variables • Example of continuous: number of hours I sleep daily (discrete?). • Let's develop the above into a continuous variable. • If you answer 7, I may ask: is it 7 or 7.000001? • Modelling continuous: we associate a probability with ranges of outcomes.
  • 17. Continuous RV Example • Height differentials of students in a class. • Try to put the values in a set -> { , , } • Can you be sure if you get a value of 1.24? • It could be 1.245... or 1.242... • So a continuous RV does not have a fixed list of values; instead it can take infinitely many values. • That is why we use ranges to model it, like [1-2], [2-3], [3-4].
  • 19. Types of Distributions Discrete Probability Distribution • Bernoulli Distribution • Binomial Distribution • Geometric Distribution • Poisson Distribution Continuous Probability Distribution • Uniform Distribution • Normal Distribution • Exponential Distribution • Gamma Distribution • Chi-Squared Distribution
  • 20. Probability mass function (p.m.f) • The probability that a discrete random variable X takes on a particular value x, that is, P(X = x), is also denoted f(x). • The function f(x) is typically called the probability mass function. • Some authors also call it the: • probability function • frequency function • probability density function.
  • 21. PMF of Discrete Random Variables • PMF(H) -> f(coin toss) -> P(coin toss = H) = ½ • Definition: the PMF is a function of a value x of the random variable X that gives the probability associated with that value, P(X = x). • f(x) > 0 for every value x that X can take, and the probabilities over all values of the RV sum to 1.
  • 22. Cumulative Distribution Function • The function F(x) = P(X ≤ x) is called the cumulative probability distribution. • For a discrete random variable X, the cumulative probability distribution F(x) is determined by: $F(x) = \sum_{t \le x} f(t)$
  • 23. PMF Vs CDF • Note that the probability mass function, f(x), of a discrete random variable X is distinguished from the cumulative probability distribution, F(x), of a discrete random variable X by the use of a lowercase f and an uppercase F. • That is, the notation f(3) means P(X = 3), while the notation F(3) means P(X ≤ 3).
  • 24. Survival function • The CDF and the survival function are simply named functions that make life easier. • CDF(x) is the probability that the random variable takes a value of x or lower. • The survival function is the complement of the CDF. • Survival fn(x) -> P(X > x) = 1 − F(x).
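The relationship between f(x), F(x) and the survival function can be sketched for a fair six-sided die. The function names `cdf` and `survival` are illustrative helpers, not part of any library.

```python
from fractions import Fraction

# PMF of a fair six-sided die: each face has probability 1/6.
pmf = {face: Fraction(1, 6) for face in range(1, 7)}

def cdf(x):
    """F(x) = P(X <= x): sum the PMF over all values up to x."""
    return sum((p for value, p in pmf.items() if value <= x), Fraction(0))

def survival(x):
    """S(x) = P(X > x) = 1 - F(x)."""
    return 1 - cdf(x)

print(pmf[3])       # f(3) = P(X = 3)  -> 1/6
print(cdf(3))       # F(3) = P(X <= 3) -> 1/2
print(survival(3))  # S(3) = P(X > 3)  -> 1/2
```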
  • 25. Bernoulli Distribution • X = 0 -> Tails • X = 1 -> Heads • P(x) = $(1/2)^x (1/2)^{1-x}$ for a fair coin • P(x) = $\theta^x (1-\theta)^{1-x}$ for a biased coin • The above is the Bernoulli distribution, which models a single coin toss.
  • 26. Bernoulli Distributions • Bernoulli • $f(x) = p^x (1-p)^{1-x}$ for x = 0, 1 • Mean = p • Variance = p(1−p) • [Figure: bar chart of the p.m.f. with bars labelled p and 1 − p]
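A minimal sketch of the Bernoulli p.m.f., checking the stated mean and variance numerically. The value p = 0.3 is a hypothetical bias chosen for illustration.

```python
def bernoulli_pmf(x, p):
    """f(x) = p^x * (1 - p)^(1 - x) for x in {0, 1}."""
    if x not in (0, 1):
        raise ValueError("a Bernoulli outcome is 0 or 1")
    return p ** x * (1 - p) ** (1 - x)

p = 0.3  # hypothetical probability of heads for a biased coin

# Mean and variance computed directly from the definition.
mean = sum(x * bernoulli_pmf(x, p) for x in (0, 1))
variance = sum((x - mean) ** 2 * bernoulli_pmf(x, p) for x in (0, 1))

print(bernoulli_pmf(1, p), bernoulli_pmf(0, p))  # P(heads), P(tails)
print(mean, variance)  # should match p and p * (1 - p) up to rounding
```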
  • 27. Binomial Distribution • A discrete random variable X is a binomial random variable if: • An experiment, or trial, is performed in exactly the same way n times. • Each of the n trials has only two possible outcomes. One of the outcomes is called a "success," while the other is called a "failure." Such a trial is called a Bernoulli trial. • The n trials are independent. • The probability of success, denoted p, is the same for each trial. The probability of failure is q = 1 − p. • The random variable X = the number of successes in the n trials.
  • 28. Binomial Distribution • The probability mass function of a binomial random variable X is: • $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$, or equivalently $\binom{n}{x} p^x q^{n-x}$ • We denote the binomial distribution as b(n, p). That is, we say: • X ~ b(n, p) • where the tilde (~) is read "as distributed as," and n and p are called parameters of the distribution.
  • 29. Function, Mean and Variance of Binomial Distribution • P.m.f -> $f(x) = \binom{n}{x} p^x (1-p)^{n-x}$ • Mean -> np • Sd -> $\sigma = \sqrt{np(1-p)}$ • Variance -> $\sigma^2 = np(1-p)$
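The binomial formulas above can be checked numerically with a short sketch. The parameters n = 10 and p = 0.5 (ten tosses of a fair coin) are hypothetical.

```python
from math import comb, sqrt

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x) for x = 0, 1, ..., n."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 10, 0.5  # hypothetical: 10 tosses of a fair coin

probs = [binomial_pmf(x, n, p) for x in range(n + 1)]
mean = sum(x * f for x, f in enumerate(probs))
variance = sum((x - mean) ** 2 * f for x, f in enumerate(probs))

print(sum(probs))                 # the probabilities sum to 1
print(mean, n * p)                # mean equals np
print(variance, n * p * (1 - p))  # variance equals np(1 - p)
print(sqrt(variance))             # standard deviation
```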
  • 30. Geometric Distribution • Assume Bernoulli trials, that is: • (1) there are two possible outcomes, • (2) the trials are independent, and • (3) p, the probability of success, remains the same from trial to trial. • Let X denote the number of trials until the first success. Then, the probability mass function of X is: • $f(x) = P(X = x) = (1-p)^{x-1} p$ • for x = 1, 2, ... In this case, we say that X follows a geometric distribution.
  • 31. Geometric Distribution Example • A representative from the National Football League's Marketing Division randomly selects people on a random street in Kansas City, Kansas until he finds a person who attended the last home football game. Let p, the probability that he succeeds in finding such a person, equal 0.20. And, let X denote the number of people he selects until he finds his first success. What is the probability that the marketing representative must select 4 people before he finds one who attended the last home football game?
  • 32. Function, Mean and Variance of Geometric Distribution • P.m.f -> $f(x) = P(X = x) = (1-p)^{x-1} p$ • Mean -> 1/p • Sd -> $\sigma = \sqrt{1-p}\,/\,p$ • Variance -> $\sigma^2 = (1-p)/p^2$
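The football example can be worked through with the geometric p.m.f. This sketch assumes the question means that the fourth person selected is the first success, that is, P(X = 4) with p = 0.20.

```python
def geometric_pmf(x, p):
    """f(x) = (1 - p)^(x - 1) * p: the first success occurs on trial x."""
    return (1 - p) ** (x - 1) * p

p = 0.20  # probability that a selected person attended the game

prob_fourth = geometric_pmf(4, p)  # three failures followed by one success
mean_trials = 1 / p                # expected number of people selected
variance = (1 - p) / p ** 2

print(round(prob_fourth, 4))  # 0.1024
print(mean_trials, variance)
```

Under this reading, the representative needs exactly four selections about 10% of the time, and five selections on average.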
  • 33. Negative Binomial Distribution • Assume Bernoulli trials as in the geometric distribution example, that is: • (1) there are two possible outcomes, • (2) the trials are independent, and • (3) p, the probability of success, remains the same from trial to trial. • Let X denote the number of trials until the rth success. Then, the probability mass function of X is: • $f(x) = P(X = x) = \binom{x-1}{r-1} (1-p)^{x-r} p^r$ • for x = r, r + 1, r + 2, ... In this case, we say that X follows a negative binomial distribution. • A geometric distribution is a special case of a negative binomial distribution with r = 1
  • 34. Function, Mean and Variance of Negative Binomial Distribution • P.m.f -> $f(x) = \binom{x-1}{r-1} (1-p)^{x-r} p^r$ • Mean -> r/p • Sd -> $\sqrt{r(1-p)}\,/\,p$ • Variance -> $r(1-p)/p^2$
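The claim that the geometric distribution is the r = 1 case of the negative binomial can be checked directly. The values r = 2 and x = 5 in the last line are hypothetical and only illustrate a call.

```python
from math import comb

def negative_binomial_pmf(x, r, p):
    """f(x) = C(x - 1, r - 1) * (1 - p)^(x - r) * p^r for x = r, r + 1, ..."""
    return comb(x - 1, r - 1) * (1 - p) ** (x - r) * p ** r

def geometric_pmf(x, p):
    """f(x) = (1 - p)^(x - 1) * p for x = 1, 2, ..."""
    return (1 - p) ** (x - 1) * p

p = 0.20

# With r = 1 the negative binomial p.m.f. reduces to the geometric p.m.f.
for x in range(1, 6):
    assert abs(negative_binomial_pmf(x, 1, p) - geometric_pmf(x, p)) < 1e-12

# Probability that the 2nd success arrives exactly on the 5th trial.
print(negative_binomial_pmf(5, 2, p))
```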
  • 35. Poisson Distribution • Let the discrete random variable X denote the number of times an event occurs in an interval of time (or space). Then X may be a Poisson random variable with x = 0, 1, 2, ... • Examples: • Let X equal the number of typos on a printed page. (This is an example of an interval of space, the space being the printed page.) • Let X equal the number of cars passing through the intersection of Allen Street and College Avenue in one minute. (This is an example of an interval of time, the time being one minute.) • Let X equal the number of Alaskan salmon caught in a squid driftnet. (This is again an example of an interval of space, the space being the squid driftnet.) • Let X equal the number of customers at an ATM in 10-minute intervals. • Let X equal the number of students arriving during office hours.
  • 36. Function, Mean and Variance of Poisson Distribution • P.m.f -> $f(x) = \frac{e^{-\lambda} \lambda^x}{x!}$ for x = 0, 1, 2, ... • Mean -> λ • Sd -> √λ • Variance -> λ • The binomial distribution can be approximated by the Poisson distribution with λ = np when n is large and p is small.
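The Poisson approximation to the binomial can be illustrated numerically. The values n = 1000 and p = 0.003 are hypothetical and were chosen only because n is large and p is small.

```python
from math import comb, exp, factorial

def poisson_pmf(x, lam):
    """f(x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

def binomial_pmf(x, n, p):
    """f(x) = C(n, x) * p^x * (1 - p)^(n - x)."""
    return comb(n, x) * p ** x * (1 - p) ** (n - x)

n, p = 1000, 0.003  # hypothetical: many trials, small success probability
lam = n * p         # matching Poisson mean

# Largest pointwise gap between the two p.m.f.s over x = 0..20.
max_gap = max(
    abs(binomial_pmf(x, n, p) - poisson_pmf(x, lam)) for x in range(21)
)
print(lam, max_gap)
```

The gap shrinks further as n grows while np is held fixed.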
  • 38. Continuous Distributions • In this section, as the title suggests, we investigate probability distributions of continuous random variables, that is, random variables whose set of possible values contains an interval of infinitely many outcomes.
  • 39. Useful Pre-requisites • Empirical rule (the 68-95-99.7 rule). • It applies when the gathered data is mound-shaped or bell-shaped. • We can use it to estimate the share of data points within a given distance of the mean. • About 68% of the data lies within μ ± 1σ • About 95% of the data lies within μ ± 2σ • About 99.7% of the data lies within μ ± 3σ
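The three percentages of the empirical rule can be recovered from the standard normal CDF. This sketch uses `statistics.NormalDist` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist(mu=0, sigma=1)

# Probability of falling within k standard deviations of the mean.
coverage = {}
for k in (1, 2, 3):
    coverage[k] = standard_normal.cdf(k) - standard_normal.cdf(-k)
    print(f"within {k} sd: {coverage[k]:.4f}")
# within 1 sd: 0.6827
# within 2 sd: 0.9545
# within 3 sd: 0.9973
```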
  • 40. Useful Pre-requisites • Quantiles • A percentile is a quantile expressed on a scale of 0 to 100. • A quantile expressed as a fraction runs from 0 to 1, so the 25th percentile is the 0.25 quantile. • The median is the 50th percentile (the 0.5 quantile). • The 25th percentile has 1/4th of the data at or below it.
  • 41. Useful Pre-requisite • The 25th percentile is also called the first quartile and is denoted as q1. • The 50th percentile is also called the second quartile or median, and is denoted as q2 or m. • The 75th percentile is also called the third quartile and is denoted as q3. • The interquartile range (IQR) is the difference between the third and first quartiles, IQR = q3 − q1.
  • 42. Useful Pre-requisite • Five-Number Summary • For a random sample of 20 concentrations of calcium carbonate (CaCO3) in milligrams per litre, the five-number summary is: • Minimum: 127.8 • First quartile: 130.12 • Median: 131.45 • Third quartile: 132.70 • Maximum: 134.8
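Quartiles, the IQR and a five-number summary can be computed with the standard library. The slide's 20 calcium carbonate readings are not listed, so the data below is a small hypothetical sample. Note that software packages use slightly different quartile conventions, so results can differ a little between tools.

```python
import statistics

# Hypothetical sample (the original 20 readings are not given in the slides).
data = sorted([2, 4, 4, 5, 7, 8, 9, 10, 12])

# quantiles(..., n=4) returns the three cut points q1, q2, q3.
q1, q2, q3 = statistics.quantiles(data, n=4)

five_number = (min(data), q1, q2, q3, max(data))
iqr = q3 - q1  # interquartile range

print(five_number)
print(iqr)
```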
  • 43. Useful Pre-requisite • Skewness and Symmetry • For a distribution that is skewed left, the bulk of the data values (including the median) lie to the right of the mean, and there is a long tail on the left side. • For a distribution that is skewed right, the bulk of the data values (including the median) lie to the left of the mean, and there is a long tail on the right side. • For a distribution that is symmetric, approximately half of the data values lie to the left of the mean, and approximately half of the data values lie to the right of the mean.
  • 45. Probability Density Function • The probability that X takes on any particular value x is 0. That is, finding P(X = x) for a continuous random variable X is not going to work. • Instead, we'll need to find the probability that X falls in some interval (a, b), that is, we'll need to find P(a < X < b). We'll do that using a probability density function ("p.d.f.").
  • 46. Probability Density Function • PDF -> Models a continuous RV, with probabilities represented by areas under the curve. • The value of the function is never negative, f(x) ≥ 0. • Total area under the curve is 1. • The area under the curve over an interval gives the probability that the random variable falls in that interval. • [Figure: the probability that a person has an IQ greater than 100 but less than 115, shown as the area under the p.d.f. between 100 and 115]
  • 47. Special Considerations on PDF • The probability that the random variable takes any specific value is 0, since the area above a single point (a line, not a region under the curve) is 0. • The PDF always deals with the population measure (NOT the sample measure).
  • 48. Cumulative Distribution Function • The function F(x) = P(X ≤ x) is called the cumulative probability distribution. • For a discrete random variable X, F(x) is: $F(x) = \sum_{t \le x} f(t)$ • For a continuous RV X, F(x) is: $F(x) = \int_{-\infty}^{x} f(t)\,dt$ • The summation is replaced by an integral.
  • 49. Survival function • The CDF and the survival function are simply named functions that make life easier. • CDF(x) is the probability that the random variable takes a value of x or lower. • The survival function is the complement of the CDF. • Survival fn(x) -> P(X > x) = 1 − F(x). • [Figure: CDF and SF curves]
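The idea that probability is area under the p.d.f. can be sketched with a simple numerical integration. The slide's IQ figure does not state its parameters, so this example assumes X ~ N(100, 16²), borrowed from the IQ example later in the deck. The helper `area_under_pdf` is a hypothetical name.

```python
from statistics import NormalDist

# Assumption: IQ ~ N(100, 16^2), as in the later worked example.
iq = NormalDist(mu=100, sigma=16)

def area_under_pdf(dist, a, b, steps=10_000):
    """Approximate the area under dist.pdf between a and b (trapezoid rule)."""
    width = (b - a) / steps
    total = 0.5 * (dist.pdf(a) + dist.pdf(b))
    total += sum(dist.pdf(a + i * width) for i in range(1, steps))
    return total * width

approx = area_under_pdf(iq, 100, 115)  # P(100 < X < 115) as an area
exact = iq.cdf(115) - iq.cdf(100)      # the same probability from the CDF

print(approx, exact)
print(area_under_pdf(iq, 100, 100))    # zero-width interval: probability 0.0
```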
  • 50. Continuous Distributions • Uniform Distribution • Beta Distribution • Normal Distribution • Exponential Distribution • Gamma Distribution • Chi-Squared Distribution
  • 51. Uniform Distribution • A continuous random variable X has a uniform distribution, denoted U(a, b), if its probability density function is constant on the interval from a to b: $f(x) = \frac{1}{b-a}$ for $a \le x \le b$
  • 52. P.d.f, mean and variance of a Uniform distribution • P.d.f -> $f(x) = \frac{1}{b-a}$ for $a \le x \le b$ • Mean -> $\frac{a+b}{2}$ • Variance -> $\frac{(b-a)^2}{12}$
  • 53. Exponential Distribution • Suppose X, following an (approximate) Poisson process, equals the number of customers arriving at a bank in an interval of length 1. • If λ, the mean number of customers arriving in an interval of length 1, is 6, say, then we might observe something like this: • [Figure: timeline of customer arrivals in an interval of length 1, with w marking the waiting time]
  • 54. Exponential Distribution – contd.. • Previously, our focus would have been on the discrete random variable X, the number of customers arriving. • As the picture suggests, however, we could alternatively be interested in the continuous random variable W, the waiting time until the first customer arrives.
  • 55. p.d.f, mean and variance of Exponential Distribution • P.d.f -> $f(w) = \frac{1}{\theta} e^{-w/\theta}$ for $w \ge 0$, where $\theta = 1/\lambda$ • Mean -> θ • Variance -> θ²
  • 56. Gamma Distributions • We learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ. • We now let W denote the waiting time until the αth event occurs and find the distribution of W. • [Figure: timeline showing the waiting time W until the αth event]
  • 57. p.d.f, mean and variance of Gamma Distribution • P.d.f -> $f(w) = \frac{1}{\Gamma(\alpha)\,\theta^{\alpha}} w^{\alpha-1} e^{-w/\theta}$ for w > 0 • Mean -> αθ • Variance -> αθ²
  • 58. Chi-Square Distribution • The chi-square distribution is just a special case of the gamma distribution! • Let X follow a gamma distribution with θ = 2 and α = r/2, where r is a positive integer. Then we say that X follows a chi-square distribution with r degrees of freedom, denoted χ²(r) and read "chi-square-r."
  • 59. p.d.f, mean and variance of Chi-Square Distribution • P.d.f -> $f(x) = \frac{1}{\Gamma(r/2)\,2^{r/2}} x^{r/2-1} e^{-x/2}$ for x > 0 • Mean -> r • Variance -> 2r
  • 60. Normal Distribution • One of the most frequently observed distributions in the natural world. • P.d.f -> $f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$ for all real x • Mean -> μ • Variance -> σ²
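The link between the Poisson count and the exponential waiting time can be sketched with the bank example, using λ = 6 from the slides. The waiting time w = 0.25 is a hypothetical value. The probability of waiting longer than w equals the Poisson probability of zero arrivals in an interval of length w.

```python
from math import exp

lam = 6          # mean number of customers per unit interval (from the slides)
theta = 1 / lam  # mean waiting time until the first customer

def exponential_pdf(w):
    """f(w) = (1 / theta) * e^(-w / theta) for w >= 0."""
    return (1 / theta) * exp(-w / theta) if w >= 0 else 0.0

def exponential_cdf(w):
    """F(w) = P(W <= w) = 1 - e^(-w / theta) for w >= 0."""
    return 1 - exp(-w / theta) if w >= 0 else 0.0

w = 0.25  # hypothetical waiting time, in the same units as the interval

wait_longer = 1 - exponential_cdf(w)  # P(W > w)
poisson_zero = exp(-lam * w)          # P(no arrivals in [0, w])

print(wait_longer, poisson_zero)
```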
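The special-case relationships between the gamma, exponential and chi-square distributions can be checked numerically. This is a sketch using the shape α and scale θ parameterization from the slides; the evaluation points are arbitrary.

```python
from math import exp, gamma

def gamma_pdf(w, alpha, theta):
    """Gamma density with shape alpha and scale theta, for w > 0."""
    if w <= 0:
        return 0.0
    return w ** (alpha - 1) * exp(-w / theta) / (gamma(alpha) * theta ** alpha)

def exponential_pdf(w, theta):
    """Exponential density with mean theta, for w >= 0."""
    return exp(-w / theta) / theta if w >= 0 else 0.0

def chi_square_pdf(x, r):
    """Chi-square density with r degrees of freedom: gamma with alpha = r/2, theta = 2."""
    return gamma_pdf(x, alpha=r / 2, theta=2)

# alpha = 1 gives the exponential distribution (waiting time to the first event).
print(gamma_pdf(1.3, 1, 0.5), exponential_pdf(1.3, 0.5))

# Chi-square with 2 degrees of freedom is exponential with mean 2.
print(chi_square_pdf(3.0, 2), exponential_pdf(3.0, 2))
```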
  • 61. Properties of Normal Distribution • All normal curves are bell-shaped. • All normal curves are symmetric about the mean μ. • The area under an entire normal curve is 1. • All normal curves are positive for all x. That is, f(x) > 0 for all x. • The limit of f(x) as x goes to infinity is 0, and the limit of f(x) as x goes to negative infinity is 0. • The height of any normal curve is maximized at x = µ. • The shape of any normal curve depends on its mean μ and standard deviation σ.
  • 62. Standard Normal Distribution • If X ~ N(μ, σ²), then Z = (X − μ)/σ follows N(0,1). • This means that Z is a random variable which follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution. • Now we can use the standard normal N(0,1) table, typically referred to as the Z-table, to find the desired probability.
  • 63. Finding probabilities in normal distribution? • Let's look at the following question: • Let X equal the IQ of a randomly selected American. Assume X ~ N(100, 16²). What is the probability that a randomly selected American has an IQ below 90?
  • 64. Finding probabilities in normal distribution? • The following integral gives us the answer to the question: $P(X < 90) = \int_{-\infty}^{90} \frac{1}{16\sqrt{2\pi}} \exp\left(-\frac{(x-100)^2}{2 \cdot 16^2}\right) dx$ • But there is just one problem. • The normal p.d.f. has no closed-form antiderivative, so the integral cannot be evaluated with a simple expression. We can only approximate it using numerical analysis techniques. • We could look for a normal probability table for a normal distribution with mean μ = 100 and standard deviation σ = 16, but then there would have to be an infinite number of normal probability tables, one for every combination of μ and σ.
  • 65. Finding probabilities in normal distribution? - Solution • The cumulative probabilities have been tabled for the N(0,1) distribution. • All we need to do is transform our N(100, 16²) distribution to a N(0,1) distribution. • Then use the cumulative probability table for the N(0,1) distribution to calculate our desired probability.
  • 66. Transforming X ~ N(μ, σ²) to Z ~ N(0,1) • If X ~ N(μ, σ²), then: • Z = (X − μ)/σ follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution. • Use the standard normal N(0,1) table, typically referred to as the Z-table, to find the desired probability.
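The IQ question can be worked through by standardizing first and then reading the standard normal CDF, which is what the Z-table tabulates. This sketch uses `statistics.NormalDist` in place of a printed table.

```python
from statistics import NormalDist

mu, sigma = 100, 16  # X ~ N(100, 16^2), from the IQ example
x = 90

# Step 1: standardize, z = (x - mu) / sigma.
z = (x - mu) / sigma  # -0.625

# Step 2: look up P(Z <= z) on the standard normal distribution.
p_via_z = NormalDist().cdf(z)

# Cross-check: the same probability computed directly on N(100, 16^2).
p_direct = NormalDist(mu, sigma).cdf(x)

print(z, p_via_z, p_direct)
```

Both routes give a probability of roughly 0.27, so about one person in four would have an IQ below 90 under this model.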
  • 67. Functions to find probability or X value • The first method is to use standard cut-offs: the 90%, 95%, 97.5% and 99% cumulative probabilities correspond to μ + 1.28σ, μ + 1.645σ, μ + 1.96σ and μ + 2.33σ respectively. • The second method is to use the Z-table. • Excel and R formulas are described in detail in the following slides.
  • 68. Excel Formula for finding probability • X value, μ, σ are given. • NORMDIST(x, μ, σ, cumulative) • x is the value for which you want the distribution. • μ is the arithmetic mean of the distribution. • σ is the standard deviation of the distribution. • cumulative: • if TRUE, it returns the c.d.f. (area to the left), P(X ≤ x) • if FALSE, it returns the p.d.f. • 1 - NORMDIST(x, μ, σ, TRUE) gives the area to the right, P(X > x)
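The cut-off multipliers quoted above are quantiles of the standard normal distribution. As a quick cross-check, they can be recomputed with `NormalDist.inv_cdf` from the Python standard library.

```python
from statistics import NormalDist

standard_normal = NormalDist()

# z such that P(Z <= z) equals each cumulative probability.
cutoffs = {p: standard_normal.inv_cdf(p) for p in (0.90, 0.95, 0.975, 0.99)}

for p, z in cutoffs.items():
    print(f"P(Z <= z) = {p}: z = {z:.3f}")
# P(Z <= z) = 0.9: z = 1.282
# P(Z <= z) = 0.95: z = 1.645
# P(Z <= z) = 0.975: z = 1.960
# P(Z <= z) = 0.99: z = 2.326
```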
  • 69. Excel Formula for finding x value • Probability, μ, σ are given. • NORMINV(probability, μ, σ ) • Probability is a probability corresponding to the normal distribution. • μ is the arithmetic mean of the distribution. • σ is the standard deviation of the distribution. • For STANDARD NORMAL. • NORMSDIST(z) • Z is the value for which you want the distribution. • NORMSINV(probability) • Probability is a probability corresponding to the normal distribution.
  • 70. R functions for finding probability • pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE) • Defaults are specified • q - The x values that we need to find the probability for. • We can also directly calculate z value and use z value like given below • pnorm(z, lower.tail = TRUE, log.p = FALSE)
  • 71. R functions for finding x value • qnorm(p, mean, sd, lower.tail, log.p) • p - probability (no default; usually given as a fraction, like 0.92 for 92%) • mean - mean (defaults to 0) • sd - standard deviation (defaults to 1) • lower.tail - TRUE -> p is P(X ≤ x); FALSE -> p is P(X > x) • log.p - TRUE -> p is given as log(p); FALSE -> p is given as a plain probability
  • 72. Relationship between normal and Chi-Square Distribution • If X is normally distributed with mean μ and variance σ² > 0, then: • $V = \left(\frac{X-\mu}{\sigma}\right)^2 = Z^2$ is distributed as a chi-square random variable with 1 degree of freedom.
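For readers without R at hand, the lower-tail and upper-tail behaviour can be imitated in Python. The functions `pnorm` and `qnorm` below are hypothetical helpers written for this sketch; they mirror only the mean, sd and lower-tail arguments and are not the R functions themselves.

```python
from statistics import NormalDist

def pnorm(q, mean=0.0, sd=1.0, lower_tail=True):
    """P(X <= q) if lower_tail is True, otherwise P(X > q)."""
    p = NormalDist(mean, sd).cdf(q)
    return p if lower_tail else 1 - p

def qnorm(p, mean=0.0, sd=1.0, lower_tail=True):
    """Value x such that P(X <= x) = p (lower tail) or P(X > x) = p (upper tail)."""
    return NormalDist(mean, sd).inv_cdf(p if lower_tail else 1 - p)

# Upper-tail probability for the IQ example, then inverted back to x.
upper = pnorm(90, mean=100, sd=16, lower_tail=False)      # P(X > 90)
x_back = qnorm(upper, mean=100, sd=16, lower_tail=False)  # recovers about 90

print(upper, x_back)
```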
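The relationship can be checked numerically: P(Z² ≤ v) computed from the standard normal should match the CDF of a chi-square variable with 1 degree of freedom, which can be written with the error function as erf(√(v/2)). The helper names below are hypothetical.

```python
from math import erf, sqrt
from statistics import NormalDist

def chi_square_1_cdf(v):
    """CDF of a chi-square variable with 1 degree of freedom."""
    return erf(sqrt(v / 2)) if v > 0 else 0.0

def prob_z_squared_below(v):
    """P(Z^2 <= v) = P(-sqrt(v) <= Z <= sqrt(v)) for a standard normal Z."""
    z = NormalDist()
    return z.cdf(sqrt(v)) - z.cdf(-sqrt(v))

v = 1.96 ** 2  # square of the familiar 1.96 cut-off

print(prob_z_squared_below(v), chi_square_1_cdf(v))
```

Both values come out at about 0.95, matching the two-sided 95% interval of the standard normal.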
  • 74. What is Statistical Inference ? • Generating conclusions about a population from a noisy sample • We try to estimate population quantities from the data available in the form of samples • Historical data is one of the most widely available forms of data • Survey data is the other commonly available form
  • 75. Statistical Inference – The process • We have sample data • Hence we will have a measure for it, like the mean, median or mode. • The sample measure is called an estimator: it tries to estimate the corresponding population measure. • The sample mean is an estimator of the population mean • The sample median is an estimator of the population median.
  • 76. END OF MODULE • Coming up next: Statistical Inference - Core
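The idea of an estimator can be illustrated with a small simulation. Everything here is hypothetical: a synthetic population of 100,000 values drawn from N(100, 16²), a fixed random seed, and a sample of 1,000.

```python
import random
import statistics

random.seed(42)  # fixed seed so the run is repeatable

# Hypothetical population, normally unknown in practice.
population = [random.gauss(100, 16) for _ in range(100_000)]

# A simple random sample drawn without replacement.
sample = random.sample(population, 1_000)

pop_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)        # estimator of the population mean
pop_median = statistics.median(population)
sample_median = statistics.median(sample)     # estimator of the population median

print(pop_mean, sample_mean)
print(pop_median, sample_median)
```

The sample values land close to, but not exactly on, the population values, which is the gap that statistical inference quantifies.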