Elements of Inference covers the following concepts and picks up right where the previous deck left off: https://www.slideshare.net/GiridharChandrasekar1/statistics1-the-basics-of-statistics.
Population Vs Sample (Measures)
Probability
Random Variables
Probability Distributions
Statistical Inference – The Concept
2. A Small Recap of Previous Presentation
• Descriptive Vs Inferential Statistics
• Sample Vs Population
• Need for sampling
3. STATISTICS
Descriptive
• Descriptive statistics, in a simple sense, provides a description of the data that we currently have.
• An example would be "what are the performance statistics of a class of students?", with an answer like "the mean mark is 64.5".
Inferential
• Inferential statistics is when we have to infer an outcome by looking at only a small portion of the data.
• An example would be "who will win this election?", with an answer like "a survey of 10,000 people suggests that XYZ has a 60-70% chance, with 95% confidence".
4. Sample Vs Population
Sample
• A sample is a portion of the population which is readily available or easily attainable.
• An example would be a survey, or just a million people from the population.
Population
• A population is the entire data that should ideally be used for a statistic.
• An example would be a census, or the population of a country.
5. Why do we go for a sample?
• Going around and asking the entire population who they are going to vote for is impossible.
• Taking the heights of the entire population is not feasible, as many people would die and many would be born by the time we finish.
• Sometimes the sample is a "representative sample", meaning it has the same nature and characteristics as the population.
6. Elements of Inference
• Population Vs Sample (Measures)
• Probability
• Random Variables
• Probability Distributions
• Statistical Inference – The Concept
7. Population Vs Sample (Measures)
Population Measures
• Mean = μ
• SD = σ
• Var = σ²
• Variance formula: average squared distance from the population mean.
Sample Measures
• Mean = X̅
• SD = s
• Var = s²
• Variance formula: MODIFIED average squared distance from the sample mean (explained in a separate simulation).
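The "modified" average can be seen directly in R, whose var() divides by n − 1 rather than n; a minimal sketch, with the data values purely illustrative:

x <- c(4, 8, 6, 5, 7)                        # a small illustrative sample
n <- length(x)
sum((x - mean(x))^2) / n                     # plain average squared distance (population style)
var_samp <- sum((x - mean(x))^2) / (n - 1)   # MODIFIED average (sample variance)
all.equal(var(x), var_samp)                  # TRUE: R's var() uses the n - 1 denominator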
10. Probability – Measure of Randomness
• Probability can be thought of as the measure of randomness.
• It is usually defined on a population quantity.
11. A Simple rule in subsets
• Advanced rule: if the occurrence of A implies the occurrence of B, then the probability of A occurring is at most the probability of B occurring.
• P(A) ≤ P(B).
(Figure: Venn diagram with A contained inside B)
12. Probability in Statistics
• Why is probability a part of statistics?
• Probability calculus helps model randomness, and hence is a part of statistical measures.
• The mass function and the density function are the starting points.
• An example is the bell curve (normal distribution) and all other similar distributions.
13. Types of Data
Quantitative
• Quantitative data are called discrete if the sample space contains a finite or countably infinite number of values.
• S = {0, 1, 2, ..., 31} – students passed
• S = {0, 1, 2, ...} – cars crossing in a given hour
• Quantitative data are called continuous if the sample space contains an interval or continuous span of real numbers.
• S = {h : h ≥ 0 hours} – hours spent studying
Qualitative
• Qualitative data are called categorical if the sample space contains objects that are grouped or categorized based on some qualitative trait.
• When there are only two such groups or categories, the data are considered binary.
• S = {yes, no} – binary
• S = {Male, Female, Other} – categorical
15. Random Variables
• Similar to variables in a computer program.
• An RV is a variable which holds the numeric outcome of an experiment.
• Types: discrete and continuous.
16. Discrete random variables
• Examples of discrete: die roll, coin toss, etc.
• Coin toss – 2 possible values: {H, T}
• Roll of a die – 6 possible values: {1, 2, 3, 4, 5, 6}
• Web traffic in a given day – can take a fixed but unbounded value on any given day.
• Modelling discrete: we associate a probability with each individual outcome.
Continuous random variables
• Example of continuous: the number of hours I sleep daily (discrete?).
• Let's develop the above into a continuous variable.
• If you answer 7, I may ask: is it 7 or 7.000001?
• Modelling continuous: we associate probabilities with various ranges of outcomes.
17. Continuous RV Example
• Height differentials of students in a class.
• Try to put the values in a SET -> { , , }
• Can you be sure you would get a value of exactly 1.24?
• It could be 1.245... or 1.242...
• So continuous RVs do not have defined values; instead they can take infinitely many values or states.
• That is why we model them with ranges, like [1-2], [2-3], [3-4].
19. Types of Distributions
Discrete Probability Distribution
• Bernoulli Distribution
• Binomial Distribution
• Geometric Distribution
• Poisson Distribution
Continuous Probability Distribution
• Uniform Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
20. Probability mass function (p.m.f)
• The probability that a discrete random variable X takes on a particular value x, that is, P(X = x), is also denoted f(x).
• The function f(x) is typically called the probability mass function.
• A.k.a.:
• probability function
• frequency function
• probability density function.
21. PMF of Discrete Random Variables
• PMF(H) -> f(coin toss) -> P(coin toss = H) = ½
• The PMF is defined as a function of a value x of the random variable X which gives the probability associated with that value, f(x) = P(X = x).
• It can't be negative, and its values over all outcomes of the RV sum to 1.
22. Cumulative Distribution Function
• The function F(x) = P(X ≤ x) is called the cumulative probability distribution.
• For a discrete random variable X, the cumulative probability distribution F(x) is determined by summing the p.m.f: F(x) = Σ f(t) over all t ≤ x.
23. PMF Vs CDF
• Note that the probability mass function, f(x), of a discrete random variable X is distinguished from the cumulative probability distribution, F(x), by the use of a lowercase f and an uppercase F.
• That is, the notation f(3) means P(X = 3), while the notation F(3) means P(X ≤ 3).
24. Survival function
• Both the CDF and the survival function are just functions which, once named, make life easier.
• CDF(x) is the probability that the variable takes the value x or lower.
• The survival function is just the opposite of the CDF.
• Survival fn(x) -> P(X > x).
25. Bernoulli Distribution
• X = 0 -> Tails
• X = 1 -> Heads
• P(x) = (½)^x · (½)^(1−x)
• P(x) = θ^x · (1−θ)^(1−x) for a biased coin
• The above is the Bernoulli distribution, and it models a coin toss.
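A Bernoulli trial can be checked numerically in R as a binomial with a single trial (size = 1); a minimal sketch, where the θ = 0.3 below is an illustrative value, not from the slides:

dbinom(1, size = 1, prob = 0.5)   # P(X = 1) for a fair coin: 0.5
dbinom(0, size = 1, prob = 0.3)   # P(X = 0) = 1 - θ = 0.7 for a biased coin with θ = 0.3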
27. Binomial Distribution
• A discrete random variable X is a binomial random variable if:
• An experiment, or trial, is performed in exactly the same way n times.
• Each of the n trials has only two possible outcomes. One of the outcomes is called a "success", while the other is called a "failure". Such a trial is called a Bernoulli trial.
• The n trials are independent.
• The probability of success, denoted p, is the same for each trial. The probability of failure is q = 1 − p.
• The random variable X = the number of successes in the n trials.
28. Binomial Distribution
• The probability mass function of a binomial random variable X is:
• f(x) = (nCx) p^x (1−p)^(n−x), or (nCx) p^x q^(n−x)
• We denote the binomial distribution as b(n, p). That is, we say:
• X ~ b(n, p)
• where the tilde (~) is read "is distributed as", and n and p are called the parameters of the distribution.
29. Function, Mean and Variance of Binomial Distribution
• P.m.f -> f(x) = (nCx) p^x (1−p)^(n−x)
• Mean -> np
• SD -> σ = √(np(1−p))
• Variance -> σ² = np(1−p)
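As a quick sketch in R, dbinom() and pbinom() give the binomial p.m.f and CDF; the setup below (n = 10 die throws, counting twos with p = 1/6) is illustrative:

dbinom(2, size = 10, prob = 1/6)         # P(X = 2), ≈ 0.29
pbinom(2, size = 10, prob = 1/6)         # CDF: P(X ≤ 2)
n <- 10; p <- 1/6
c(mean = n * p, var = n * p * (1 - p))   # matches np and np(1 - p)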
30. Geometric Distribution
• Assume Bernoulli trials, that is:
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the first success. Then the probability mass function of X is:
• f(x) = P(X = x) = (1−p)^(x−1) p
• for x = 1, 2, ... In this case, we say that X follows a geometric distribution.
31. Geometric Distribution Example
• A representative from the National Football League's Marketing Division randomly selects people on a random street in Kansas City, Kansas until he finds a person who attended the last home football game. Let p, the probability that he succeeds in finding such a person, equal 0.20. And let X denote the number of people he selects until he finds his first success. What is the probability that the marketing representative must select 4 people before he finds one who attended the last home football game?
32. Function, Mean and Variance of Geometric Distribution
• P.m.f -> f(x) = P(X = x) = (1−p)^(x−1) p
• Mean -> 1/p
• SD -> σ = √(1−p)/p
• Variance -> σ² = (1−p)/p²
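The slide-31 example can be worked in R; note that dgeom() counts failures before the first success, so the "4th person" case is x = 3:

p <- 0.20
(1 - p)^(4 - 1) * p                    # direct formula: P(X = 4) = 0.8^3 * 0.2 = 0.1024
dgeom(3, prob = p)                     # same value via R's failure-count parameterization
c(mean = 1 / p, var = (1 - p) / p^2)   # 5 trials on average, variance 20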
33. Negative Binomial Distribution
• Assume Bernoulli trials as in the geometric distribution example, that is:
• (1) there are two possible outcomes,
• (2) the trials are independent, and
• (3) p, the probability of success, remains the same from trial to trial.
• Let X denote the number of trials until the rth success. Then the probability mass function of X is:
• f(x) = P(X = x) = (x−1 C r−1) (1−p)^(x−r) p^r
• for x = r, r + 1, r + 2, ... In this case, we say that X follows a negative binomial distribution.
• A geometric distribution is a special case of a negative binomial distribution with r = 1.
34. Function, Mean and Variance of Negative Binomial Distribution
• P.m.f -> f(x) = (x−1 C r−1) (1−p)^(x−r) p^r
• Mean -> r/p
• SD -> √(r(1−p))/p
• Variance -> r(1−p)/p²
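A sketch of the same p.m.f in R; the values r = 3 and x = 10 are illustrative, and dnbinom() counts the x − r failures rather than the trial number:

r <- 3; p <- 0.2; x <- 10
choose(x - 1, r - 1) * (1 - p)^(x - r) * p^r   # direct formula for P(X = 10)
dnbinom(x - r, size = r, prob = p)             # same value from R
c(mean = r / p, var = r * (1 - p) / p^2)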
35. Poisson Distribution
• Let the discrete random variable X denote the number of times an event occurs in an interval of time (or space). Then X may be a Poisson random variable with x = 0, 1, 2, ...
• Examples:
• Let X equal the number of typos on a printed page. (This is an example of an interval of space, the space being the printed page.)
• Let X equal the number of cars passing through the intersection of Allen Street and College Avenue in one minute. (This is an example of an interval of time, the time being one minute.)
• Let X equal the number of Alaskan salmon caught in a squid driftnet. (This is again an example of an interval of space, the space being the squid driftnet.)
• Let X equal the number of customers at an ATM in 10-minute intervals.
• Let X equal the number of students arriving during office hours.
36. Function, Mean and Variance of Poisson Distribution
• P.m.f -> f(x) = e^(−λ) λ^x / x!  for x = 0, 1, 2, ...
• Mean -> λ
• SD -> √λ
• Variance -> λ
The binomial distribution can be approximated by the Poisson when n is large and p is small (with λ = np).
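A brief sketch in R, with λ = 6 as an illustrative rate, also showing the large-n, small-p binomial approximation:

dpois(4, lambda = 6)                    # P(X = 4) when the mean count per interval is 6
ppois(4, lambda = 6)                    # CDF: P(X ≤ 4)
dbinom(4, size = 1000, prob = 6/1000)   # binomial with λ = np = 6 is close to dpois(4, 6)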
38. Continuous Distributions
• In this section, as the title suggests, we are going to investigate probability distributions of continuous random variables, that is, random variables whose possible values fill an interval, giving infinitely many possible outcomes.
39. Useful Pre-requisites
• Empirical rule for the 68%, 95% and 99.7% levels.
• It applies when the gathered data is mound- or bell-shaped.
• We can use the following rules to identify the share of data points within each level:
• 68% of the data is between μ ± 1σ
• 95% of the data is between μ ± 2σ
• 99.7% of the data is between μ ± 3σ
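These three percentages can be verified with R's standard normal CDF:

pnorm(1) - pnorm(-1)   # ≈ 0.6827, the share of data within μ ± 1σ
pnorm(2) - pnorm(-2)   # ≈ 0.9545, within μ ± 2σ
pnorm(3) - pnorm(-3)   # ≈ 0.9973, within μ ± 3σ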
40. Useful Pre-requisites
• Quantiles
• A percentile is a derivative of a quantile.
• A percentile maps the value onto a scale of 100.
• A quantile maps the value onto a scale of 0 to 1, a fraction of the data.
• The median is the 0.5 quantile (the 50th percentile).
• The 25th percentile encompasses one quarter of the data.
41. Useful Pre-requisite
• The 25th percentile is also called the first quartile and is denoted q1.
• The 50th percentile is also called the second quartile or median, and is denoted q2 or m.
• The 75th percentile is also called the third quartile and is denoted q3.
• The interquartile range (IQR) is the difference between the first and third quartiles.
42. Useful Pre-requisite
• Five-Number Summary
• For a random sample of 20 concentrations of calcium carbonate (CaCO3) in milligrams per litre:
• Minimum: 127.8
• First quartile: 130.12
• Median: 131.45
• Third quartile: 132.70
• Maximum: 134.8
43. Useful Pre-requisite
• Skewness and Symmetry
• For a distribution that is skewed left, the bulk of the data values (including the median) lie to the right of the mean, and there is a long tail on the left side.
• For a distribution that is skewed right, the bulk of the data values (including the median) lie to the left of the mean, and there is a long tail on the right side.
• For a distribution that is symmetric, approximately half of the data values lie to the left of the mean, and approximately half lie to the right of the mean.
45. Probability Density Function
• The probability that X takes on any particular value x is 0. That is, finding P(X = x) for a continuous random variable X is not going to work.
• Instead, we'll need to find the probability that X falls in some interval (a, b), that is, we'll need to find P(a < X < b). We'll do that using a probability density function ("p.d.f.").
46. Probability Density Function
• The PDF helps model a continuous RV; areas under it represent the probabilities.
• Its value is never negative anywhere on the curve or function.
• The total area under the curve is 1.
• The area under the curve gives the probabilities associated with the random variable.
• Example: the probability that a person has an IQ greater than 100 but less than 115 is the area under the PDF between 100 and 115.
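This IQ area can be sketched in R, assuming IQ ~ N(100, 16²) as in the later slides:

pnorm(115, mean = 100, sd = 16) - pnorm(100, mean = 100, sd = 16)   # P(100 < IQ < 115) ≈ 0.326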
47. Special Considerations on PDF
• The probability that the random variable takes any specific value is 0, since the area under the curve at a single point is 0.
• The PDF always talks about or deals with the population measure (NOT the sample measure).
48. Cumulative Distribution Function
• The function F(x) = P(X ≤ x) is called the cumulative probability distribution.
• For a discrete random variable X, F(x) is the sum Σ f(t) over all t ≤ x.
• For a continuous RV X, F(x) is the integral ∫ f(t) dt from −∞ to x.
• The summation becomes an integral.
49. Survival function
• Both the CDF and the survival function are just functions which, once named, make life easier.
• CDF(x) is the probability that the variable takes the value x or lower.
• The survival function is just the opposite of the CDF.
• Survival fn(x) -> P(X > x).
(Figure: CDF and survival function curves side by side)
50. Continuous Distributions
• Uniform Distribution
• Beta Distribution
• Normal Distribution
• Exponential Distribution
• Gamma Distribution
• Chi-Squared Distribution
51. Uniform Distribution
• A continuous random variable X has a uniform distribution, denoted U(a, b), if its density is constant over the interval from a to b, as given in the next slide.
52. P.d.f, mean and variance of a Uniform distribution
• P.d.f -> f(x) = 1/(b − a) for a ≤ x ≤ b
• Mean -> (a + b)/2
• Variance -> (b − a)²/12
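A minimal R sketch with an illustrative U(0, 10):

a <- 0; b <- 10
dunif(3, min = a, max = b)    # density 1/(b - a) = 0.1 anywhere in [a, b]
punif(4, min = a, max = b)    # P(X ≤ 4) = 0.4
c(mean = (a + b) / 2, var = (b - a)^2 / 12)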
53. Exponential Distribution
• Suppose X, following an (approximate) Poisson process, equals the number of customers arriving at a bank in an interval of length 1.
• If λ, the mean number of customers arriving in an interval of length 1, is 6, say, then we might observe something like this:
(Figure: arrivals marked on a timeline, with w denoting the waiting time)
54. Exponential Distribution – contd..
• Previously, our focus would have been on the discrete random variable X, the number of customers arriving.
• As the picture suggests, however, we could alternatively be interested in the continuous random variable W, the waiting time until the first customer arrives.
55. p.d.f, mean and variance of Exponential Distribution
• P.d.f -> f(w) = (1/θ) e^(−w/θ) = λ e^(−λw), for w ≥ 0
• Mean -> θ = 1/λ
• Variance -> θ²
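A short R sketch reusing the λ = 6 arrival rate from slide 53:

lambda <- 6
pexp(0.5, rate = lambda)   # P(W ≤ 0.5): first customer within half an interval, ≈ 0.95
1 / lambda                 # mean waiting time θ
1 / lambda^2               # variance θ²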
56. Gamma Distributions
• We learned that in an approximate Poisson process with mean λ, the waiting time X until the first event occurs follows an exponential distribution with mean θ = 1/λ.
• We now let W denote the waiting time until the αth event occurs and find the distribution of W.
57. p.d.f, mean and variance of Gamma Distribution
• P.d.f -> f(w) = 1/(Γ(α) θ^α) · w^(α−1) e^(−w/θ), for w ≥ 0
• Mean -> αθ
• Variance -> αθ²
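A minimal R sketch, waiting for the α = 3rd event at the same illustrative rate λ = 6 (so θ = 1/6):

alpha <- 3; theta <- 1 / 6
pgamma(1, shape = alpha, scale = theta)          # P(W ≤ 1): third event within one interval, ≈ 0.94
c(mean = alpha * theta, var = alpha * theta^2)   # αθ and αθ²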
58. Chi-Square Distribution
• The chi-square distribution is just a special case of the gamma distribution!
• Let X follow a gamma distribution with θ = 2 and α = r/2, where r is a positive integer. Then we say that X follows a chi-square distribution with r degrees of freedom, denoted χ²(r) and read "chi-square-r".
59. p.d.f, mean and variance of Chi-Square Distribution
• P.d.f -> f(x) = 1/(Γ(r/2) 2^(r/2)) · x^(r/2 − 1) e^(−x/2), for x ≥ 0
• Mean -> r
• Variance -> 2r
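The gamma special case can be confirmed in R; r = 4 and the evaluation point are illustrative:

r <- 4
dchisq(2.5, df = r)                     # chi-square density at 2.5
dgamma(2.5, shape = r / 2, scale = 2)   # identical: gamma with α = r/2, θ = 2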
60. Normal Distribution
• The most frequent distribution seen in the natural world.
• P.d.f -> f(x) = 1/(σ√(2π)) · e^(−(x−μ)²/(2σ²))
• Mean -> μ
• Variance -> σ²
61. Properties of Normal Distribution
• All normal curves are bell-shaped.
• All normal curves are symmetric about the mean μ.
• The area under an entire normal curve is 1.
• All normal curves are positive for all x. That is, f(x) > 0 for all x.
• The limit of f(x) as x goes to infinity is 0, and the limit of f(x) as x goes to negative infinity is 0.
• The height of any normal curve is maximized at x = µ.
• The shape of any normal curve depends on its mean μ and standard deviation σ.
62. Standard Normal Distribution
• If X ~ N(μ, σ²), then Z = (X−μ)/σ follows N(0,1).
• This means that Z is a random variable which follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution.
• Now we can use the standard normal N(0,1) table, typically referred to as the Z-table, to find the desired probability.
63. Finding probabilities in normal distribution?
• Let's see the following question:
• Let X equal the IQ of a randomly selected American. Assume X ~ N(100, 16²). What is the probability that a randomly selected American has an IQ below 90?
64. Finding probabilities in normal distribution?
• The following integral gives us the answer to the question: P(X < 90) = ∫ f(x) dx from −∞ to 90.
• But there is just one problem:
• It is not possible to integrate the normal p.d.f analytically; no simple expression exists for the antiderivative. We can only approximate the integral using numerical analysis techniques.
• We could look for a normal probability table for a normal distribution with mean μ = 100 and standard deviation σ = 16, but then there would have to be an infinite number of normal probability tables, one for each pair of μ and σ.
65. Finding probabilities in normal distribution? - Solution
• The cumulative probabilities have been tabled for the N(0,1) distribution.
• All we need to do is transform our N(100, 16²) distribution to a N(0,1) distribution.
• Then use the cumulative probability table for the N(0,1) distribution to calculate our desired probability.
66. Transforming X ~ N(μ,σ²) to Z ~ N(0,1)
• If X ~ N(μ, σ²), then:
• Z = (X − μ)/σ
• follows the N(0,1) distribution, which is called the standardized (or standard) normal distribution.
• Use the standard normal N(0,1) table, typically referred to as the Z-table, to find the desired probability.
67. Functions to find probability or X value
• The first and foremost method is to use the empirical rule: the 90%, 95%, 97.5% and 99% points lie at 1.28σ, 1.645σ, 1.96σ and 2.33σ respectively.
• The second method is to use the Z-table.
• Excel and R formulas are described in detail in the following slides.
68. Excel Formula for finding probability
• X value, μ, σ are given.
• NORMDIST(x, μ, σ, cumulative)
• X is the value for which you want the distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• Cumulative:
• If TRUE, returns the c.d.f (area to the left), P(X ≤ x).
• If FALSE, returns the p.d.f.
• 1 - NORMDIST(x, μ, σ, TRUE) gives the area to the right, P(X > x).
69. Excel Formula for finding x value
• Probability, μ, σ are given.
• NORMINV(probability, μ, σ)
• Probability is a probability corresponding to the normal distribution.
• μ is the arithmetic mean of the distribution.
• σ is the standard deviation of the distribution.
• For the STANDARD NORMAL:
• NORMSDIST(z)
• Z is the value for which you want the distribution.
• NORMSINV(probability)
• Probability is a probability corresponding to the normal distribution.
70. R functions for finding probability
• pnorm(q, mean = 0, sd = 1, lower.tail = TRUE, log.p = FALSE)
• Defaults are specified.
• q – the x value that we need to find the probability for.
• We can also directly calculate the z value and use it as given below:
• pnorm(z, lower.tail = TRUE, log.p = FALSE)
71. R functions for finding x value
• qnorm(p, mean, sd, lower.tail, log.p)
• p – probability (no default); usually given as a level like 0.92 for 92%.
• mean – mean (defaults to 0)
• sd – standard deviation (defaults to 1)
• lower.tail – TRUE -> returns x with P(X ≤ x) = p; FALSE -> returns x with P(X > x) = p
• log.p – TRUE -> p is given as log(p); FALSE -> p is given as a plain probability
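A brief usage sketch, reusing the IQ distribution N(100, 16²) from the earlier slides:

qnorm(0.92)                        # z value with 92% of the area to its left, ≈ 1.41
qnorm(0.90, mean = 100, sd = 16)   # 90th-percentile IQ, ≈ 120.5
qnorm(0.05, lower.tail = FALSE)    # x with P(X > x) = 0.05, ≈ 1.645
pnorm(90, mean = 100, sd = 16)     # the slide-63 question: P(IQ < 90) ≈ 0.266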
72. Relationship between normal and Chi-Square Distribution
• If X is normally distributed with mean μ and variance σ² > 0, then:
• Z² = ((X − μ)/σ)²
• is distributed as a chi-square random variable with 1 degree of freedom.
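A quick simulation sketch of this fact in R (the sample size, seed, and the N(100, 16²) parameters are illustrative):

set.seed(42)
x <- rnorm(100000, mean = 100, sd = 16)
z2 <- ((x - 100) / 16)^2   # squared standardized values
c(mean(z2), var(z2))       # ≈ 1 and ≈ 2, the mean and variance of χ²(1)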
74. What is Statistical Inference?
• Generating conclusions about a population from a noisy sample.
• We try to identify estimates of the population from the data available in the form of samples.
• Historical data is one of the most widely available forms of data.
• Survey data is the other common form of available data.
75. Statistical Inference – The process
• We have sample data.
• Hence we have a measure for it, like the mean, median or mode.
• The sample measure is called the estimator, where it tries to estimate the population measure.
• The sample mean is an estimate of the population mean.
• The sample median is an estimate of the population median.
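As a closing sketch of the estimator idea in R (the population parameters and seed are illustrative, not from the slides):

set.seed(1)
sample_data <- rnorm(100, mean = 64.5, sd = 10)   # a noisy sample from a hypothetical population
mean(sample_data)     # sample mean: an estimator of the population mean 64.5
median(sample_data)   # sample median: an estimator of the population median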