2. ⚫ Definition: A portion of the population which is examined with a
view to determining the population characteristics is called a
sample.
⚫ In other words, a sample is a subset of the population. The size of the sample
is denoted by n. The process of selecting a sample is called
sampling.
⚫ There are two broad classes of sampling methods:
Probability Sampling Methods
Non-Probability Sampling Methods
3. Probability Sampling Methods:
a) Random Sampling (Probability Sampling): It is the process of drawing a sample from a
population in such a way that each member of the population has an equal chance of being included in
the sample.
Example: A hand of cards dealt from a well-shuffled pack of cards is a random sample.
Note: If N is the size of the population and n is the size of the sample, then
The number of possible samples with replacement (order of draws counted) = $N^n$
The number of possible samples without replacement (order ignored) = $\binom{N}{n} = {}^{N}C_{n}$
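The two counting rules can be checked by brute force. A minimal Python sketch, assuming a toy population of N = 6 numbered units and n = 2 (illustrative values, not from the text); `count_samples` is a hypothetical helper name, and the last lines draw one simple random sample with the standard-library `random.sample`:

```python
import itertools
import math
import random

def count_samples(N, n):
    """Return (with_replacement, without_replacement) sample counts."""
    with_repl = N ** n               # ordered draws, each draw from all N units
    without_repl = math.comb(N, n)   # unordered subsets of size n
    return with_repl, without_repl

# Cross-check by brute-force enumeration for a small population of N = 6 units.
units = list(range(1, 7))
n = 2
brute_with = sum(1 for _ in itertools.product(units, repeat=n))
brute_without = sum(1 for _ in itertools.combinations(units, n))
print(count_samples(6, 2), brute_with, brute_without)  # (36, 15) 36 15

# Draw one simple random sample without replacement:
# every 2-unit subset is equally likely to be chosen.
rng = random.Random(42)
sample = rng.sample(units, n)
```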
b) Stratified Sampling: In this method, the population is first divided into several smaller groups, called strata,
according to some relevant characteristic.
⚫ From each stratum, samples are selected at random, and all these samples are combined to form the
stratified sample.
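A minimal sketch of stratified sampling in Python. The employee population, the department strata, and the 3/2/1 allocation (proportional to stratum sizes 30/20/10) are hypothetical, and `stratified_sample` is an illustrative helper, not a library function:

```python
import random

def stratified_sample(population, stratum_of, per_stratum, seed=0):
    """Draw a simple random sample inside each stratum and combine them.

    population  -- list of units
    stratum_of  -- function mapping a unit to its stratum label
    per_stratum -- dict: stratum label -> number of units to draw from it
    """
    rng = random.Random(seed)
    strata = {}
    for unit in population:
        strata.setdefault(stratum_of(unit), []).append(unit)
    combined = []
    for label, members in strata.items():
        # Random selection within the stratum, without replacement.
        combined.extend(rng.sample(members, per_stratum[label]))
    return combined

# Hypothetical population: 60 employees tagged with a department.
employees = [(f"E{i:02d}", dept) for i, dept in
             enumerate(["Sales"] * 30 + ["IT"] * 20 + ["HR"] * 10)]
sample = stratified_sample(employees, lambda e: e[1],
                           {"Sales": 3, "IT": 2, "HR": 1})
```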
c) Cluster Sampling:
⚫ In cluster sampling, the population is divided into mutually exclusive clusters; a random sample of clusters is then
selected, and the units in the selected clusters are studied.
⚫ For example, assume that a researcher is interested in analyzing the life of smartphone batteries from a
specific manufacturer. The manufacturer may have different models (each model in this case will be a
cluster).
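A sketch of one-stage cluster sampling, assuming whole clusters are selected at random and every unit in a chosen cluster enters the sample. The model names and battery IDs are made up for illustration, and `cluster_sample` is a hypothetical helper:

```python
import random

def cluster_sample(clusters, num_clusters, seed=0):
    """One-stage cluster sampling: pick whole clusters at random.

    clusters     -- dict: cluster label -> list of units in that cluster
    num_clusters -- how many clusters to select
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), num_clusters)
    # Every unit of each selected cluster goes into the sample.
    return chosen, [unit for label in chosen for unit in clusters[label]]

# Hypothetical battery IDs grouped by phone model (each model = one cluster).
batteries = {
    "Model-A": ["A1", "A2", "A3"],
    "Model-B": ["B1", "B2"],
    "Model-C": ["C1", "C2", "C3", "C4"],
    "Model-D": ["D1", "D2", "D3"],
}
chosen, sample = cluster_sample(batteries, 2)
```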
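d) Systematic Sampling (Quasi-Random Sampling): In this method, all the units of the population
are arranged in some order. If the population size is N and the sample size is n, then we first define
the sample interval, denoted by $k = N/n$. A starting unit is chosen at random from the first k units, and every k-th unit after it is included in the sample.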
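A minimal sketch of systematic sampling, assuming N is a multiple of n so that the interval k = N/n is a whole number (the code uses integer division `N // n`); the population of 100 numbered units is hypothetical:

```python
import random

def systematic_sample(population, n, seed=0):
    """Systematic sampling with interval k = N // n and a random start."""
    N = len(population)
    k = N // n                                 # sample interval (assumes N is a multiple of n)
    start = random.Random(seed).randrange(k)   # random start among the first k positions
    return [population[start + i * k] for i in range(n)]

units = list(range(1, 101))            # hypothetical population, N = 100
sample = systematic_sample(units, 10)  # k = 10
```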
4. Non-Probability Sampling Methods:
⚫ Sample units are selected based on convenience and/or on a voluntary basis.
Ex: Assume that a data scientist is interested in studying attrition and the factors
influencing attrition. For this study, he/she may collect data from his/her friends and
colleagues, which may not be a true representation of the population. Such
sampling procedures come under the category of non-probability sampling.
Convenience Sampling:
Convenience sampling is a non-probability sampling technique in which the sample
units are not selected according to a probability distribution. For example, a
researcher may collect data from his/her school or workplace and from his/her
friends, since the cost of data collection in such cases is minimal. Convenience
sampling is not recommended since it is likely to result in biased estimates.
5. Voluntary Sampling: Under voluntary sampling, the data are collected from people
who volunteer for such data collection. For example, customer feedback in many
contexts falls under this sampling procedure. There could be bias in the case of voluntary
sampling. Many organizations, such as Amazon and TripAdvisor, collect customer
feedback. Often the feedback is provided by customers who had a bad
experience with the product/service, while many customers who were happy with the
product/service may not give feedback.
Purposive (Judgment) Sampling: In this method, the members constituting the
sample are chosen not according to some definite scientific procedure, but
according to the convenience and personal choice of the individual who selects the
sample. The choice of the individual items of a sample depends entirely on the
individual judgment of the investigator.
Sequential Sampling: It consists of a sequence of samples drawn one after another
from the population. Whether another sample is drawn depends on the results of the previous samples: if the result of
the first sample is not acceptable, a second sample is drawn, and the process
continues until a proper decision can be taken. But if the first sample is acceptable, then no
new sample is drawn.
6. Classification of Samples:
⚫ Large Samples: If the size of the sample n ≥ 30, then it is said to
be a large sample.
⚫ Small Samples: If the size of the sample n < 30, then it is said to
be a small sample or exact sample.
Parameters and Statistics:
⚫ A parameter is a statistical measure based on all the units of a
population.
⚫ A statistic is a statistical measure based only on the units selected in a
sample.
⚫ Note: In this unit, a parameter refers to the population and a statistic
refers to the sample.
7. SAMPLING DISTRIBUTION
⚫ Sampling distribution refers to the probability distribution of a
statistic, such as the sample mean or sample standard deviation,
computed from several random samples of the same size.
⚫ Understanding the sampling distribution is important for
hypothesis testing. The test statistic in hypothesis testing is derived
from knowledge of the sampling distribution.
⚫ In this example, the population is the weights (in pounds) of six pumpkins
displayed in a carnival "guess the weight" game booth. You
are asked to guess the average weight of the six pumpkins by taking
a random sample without replacement from the population.
8. Since we know the weights of all the pumpkins in the population, we can find the population
mean.
To demonstrate the sampling distribution, let's start by obtaining all of the
possible samples of size n = 2 from the population, sampling without
replacement. The table below shows all the possible samples, the weights of the
chosen pumpkins, the sample mean, and the probability of obtaining each sample.
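11. ⚫ The mean of the sample means is:
$$\mu_{\bar{x}} = 9.5\left(\tfrac{1}{15}\right) + 11.5\left(\tfrac{1}{15}\right) + 12\left(\tfrac{2}{15}\right) + 12.5\left(\tfrac{1}{15}\right) + 13\left(\tfrac{1}{15}\right) + 13.5\left(\tfrac{1}{15}\right) + 14\left(\tfrac{1}{15}\right) + 14.5\left(\tfrac{2}{15}\right) + 15.5\left(\tfrac{1}{15}\right) + 16\left(\tfrac{1}{15}\right) + 16.5\left(\tfrac{1}{15}\right) + 17\left(\tfrac{1}{15}\right) + 18\left(\tfrac{1}{15}\right) = 14$$
⚫ This equals the population mean μ.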
⚫ Now, let's do the same thing as above but with sample size n = 5.
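The whole sampling distribution can be enumerated in a few lines. This Python sketch assumes pumpkin weights of 19, 14, 15, 9, 10 and 17 pounds: the weights themselves are not listed in the text above, but this set reproduces every one of the 15 sample means used in the calculation. It builds the exact distribution of the sample mean for n = 2 and n = 5 (sampling without replacement) using exact fractions; `sampling_distribution` is an illustrative helper:

```python
from fractions import Fraction
from itertools import combinations

# Assumed pumpkin weights (pounds); consistent with the n = 2 sample means above.
weights = [19, 14, 15, 9, 10, 17]
population_mean = Fraction(sum(weights), len(weights))

def sampling_distribution(values, n):
    """Map each possible sample mean to its probability (sampling without replacement)."""
    samples = list(combinations(values, n))   # all equally likely samples
    dist = {}
    for s in samples:
        m = Fraction(sum(s), n)
        dist[m] = dist.get(m, 0) + Fraction(1, len(samples))
    return dist

for n in (2, 5):
    dist = sampling_distribution(weights, n)
    mean_of_means = sum(m * p for m, p in dist.items())
    print(n, len(dist), float(mean_of_means))
# prints:
# 2 13 14.0
# 5 6 14.0
```

With these assumed weights the population mean is 84/6 = 14, so the mean of the sample means matches it for both sample sizes, while the number of distinct sample means drops from 13 (n = 2) to 6 (n = 5).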
13. ⚫ Central Limit Theorem: If $\bar{x}$ is the mean of a random sample of size n
drawn from a population having mean μ and standard deviation σ, then
the sampling distribution of the sample mean $\bar{x}$ is approximately a normal
distribution with mean μ and standard deviation $\sigma/\sqrt{n}$ (the standard error, S.E., of $\bar{x}$), provided the
sample size n is large.
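A quick simulation illustrates the theorem. The sketch below assumes a strongly skewed exponential population with μ = 1 and σ = 1 (an illustrative choice), draws 5,000 samples of size n = 50, and compares the sample means with μ and σ/√n; the seed and sizes are arbitrary:

```python
import random
import statistics

def simulate_sample_means(n, reps, seed=1):
    """Draw `reps` samples of size n from an exponential population and return their means."""
    rng = random.Random(seed)
    # Exponential with rate 1: population mean mu = 1, population SD sigma = 1 (strongly skewed).
    return [statistics.fmean(rng.expovariate(1.0) for _ in range(n)) for _ in range(reps)]

n = 50
means = simulate_sample_means(n, reps=5000)
se = 1 / n ** 0.5                      # sigma / sqrt(n), about 0.141
print(statistics.fmean(means))         # close to mu = 1
print(statistics.stdev(means))         # close to se
# Under approximate normality, about 95% of sample means lie within mu +/- 1.96 * se.
share = sum(abs(m - 1) < 1.96 * se for m in means) / len(means)
print(share)
```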
14. ⚫ Estimate: An estimate is a value (or range of values), calculated from a sample, that is used to approximate an unknown population
parameter.
⚫ Estimator: The procedure or rule used to determine an unknown population
parameter is called an estimator.
Example: The sample proportion is an estimate of the population proportion, because
with the help of the sample proportion value we can estimate the population
proportion value.
Types of Estimation:
⚫ Point Estimation: If the estimate of the population parameter is given by a
single value, then the estimate is called a point estimate of the parameter.
⚫ Interval Estimation: If the estimate of the population parameter is given by
two different values between which the parameter is expected to lie, then the estimate is
called an interval estimate of the parameter.
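A minimal sketch of both kinds of estimate for the sample-proportion example, assuming a hypothetical survey in which 120 of 400 sampled customers give a "yes" answer. The interval uses the large-sample normal approximation $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$ (the Wald interval), which is one common choice rather than the only one; `estimate_proportion` is an illustrative helper:

```python
import math
from statistics import NormalDist

def estimate_proportion(successes, n, confidence=0.95):
    """Point estimate and large-sample interval estimate of a population proportion."""
    p_hat = successes / n                              # point estimate
    se = math.sqrt(p_hat * (1 - p_hat) / n)            # estimated standard error
    z = NormalDist().inv_cdf(0.5 + confidence / 2)     # about 1.96 for 95%
    return p_hat, (p_hat - z * se, p_hat + z * se)

# Hypothetical survey: 120 of 400 sampled customers answer "yes".
p_hat, (low, high) = estimate_proportion(120, 400)
```

The point estimate is 0.30 and the 95% interval estimate is roughly 0.255 to 0.345.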
16. In business, many claims are made by organizations. A few examples of such
claims are listed below:
1. Children who drink the health drink Complan (a health drink owned by
the company Heinz in India) are likely to grow taller.
2. If you drink Horlicks, you can grow taller, stronger, and sharper
(3 in 1).
3. Using Fair and Lovely (Fair and Handsome) cream can make one
fair and lovely (fair and handsome).
4. Wearing perfume (such as Axe) will help to attract the opposite
gender (known as the Axe effect).
5. Women use camera phones more than men (Freier, 2016).
There are many such claims and beliefs; many business rules and
17. ⚫ Decide the criteria for rejection and retention of the null hypothesis. This
is called the significance value, traditionally denoted by the symbol α. The value
of α depends on the context; usually 0.1, 0.05, and 0.01 are
used.
⚫ Calculate the p-value (probability value), which is the conditional
probability of observing a test statistic value at least as extreme as the one obtained, when the null
hypothesis is true. In simple terms, the p-value measures how well the data agree
with the null hypothesis: the smaller the p-value, the stronger the evidence against it.
⚫ Take the decision to reject or retain the null hypothesis based on the p-value
and the significance value α. The null hypothesis is rejected when the p-value
is less than α, and it is retained when the p-value is
greater than or equal to α.
⚫ For a left-tailed test, if the calculated statistic value is less than the critical value (the p-value will
be less than α), then we reject the null hypothesis, whereas if the
statistic value is greater than the critical value (the p-value will be greater
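The decision rule can be sketched with a right-tailed one-sample z-test (a test not described in the text above, chosen only because its p-value needs nothing beyond Python's standard library). The numbers (sample mean 52, hypothesized mean 50, σ = 8, n = 64) are hypothetical, and `right_tailed_z_test` is an illustrative helper:

```python
import math
from statistics import NormalDist

def right_tailed_z_test(sample_mean, mu0, sigma, n, alpha=0.05):
    """Right-tailed one-sample z-test: H0: mu = mu0 vs HA: mu > mu0 (sigma known)."""
    z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # test statistic
    p_value = 1 - NormalDist().cdf(z)                  # P(Z >= z | H0 true)
    decision = "reject H0" if p_value < alpha else "retain H0"
    return z, p_value, decision

z, p, decision = right_tailed_z_test(sample_mean=52, mu0=50, sigma=8, n=64)
```

Here z = 2.0 and the p-value is about 0.023, so H0 is rejected at α = 0.05 but retained at α = 0.01. For a right-tailed test, z exceeding the critical value `NormalDist().inv_cdf(1 - alpha)` (about 1.645 for α = 0.05) is equivalent to p < α.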
19. ⚫ TYPE I ERROR, TYPE II ERROR
⚫ In a hypothesis test we end up with one of the following two decisions:
1. Reject the null hypothesis.
2. Fail to reject (or retain) the null hypothesis.
⚫ Type I Error: The conditional probability of rejecting a null
hypothesis when it is true is called the Type I error or false
positive (falsely believing that the claim made in the alternative
hypothesis is true).
⚫ The significance value α is the probability of a Type I error.
⚫ Type I Error = α = P(Rejecting null hypothesis | H0 is true)
⚫ The probability value (p-value) is computed from the observed sample,
whereas the significance value α is a Type I error rate fixed in advance and
interpreted over repeated sampling.
20. ⚫ Type II Error: The conditional probability of failing to reject a null
hypothesis (or retaining a null hypothesis) when the alternative hypothesis
is true is called the Type II error or false negative (falsely believing that there
is no relationship).
⚫ Usually the Type II error is denoted by the symbol β.
⚫ Type II Error = β = P(Retain null hypothesis | H0 is false)
⚫ The value (1 − β) is known as the power of the hypothesis test.
⚫ Power of the test = 1 − β = 1 − P(Retain null hypothesis | H0 is false)
⚫ Alternatively, the power of the test = 1 − β = P(Reject null hypothesis | H0 is false)
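For a right-tailed one-sample z-test with known σ, α and the power can be computed directly. This is a sketch under that assumption, using hypothetical numbers (μ0 = 50, σ = 8, n = 64, α = 0.05); `power_right_tailed_z` is an illustrative helper:

```python
import math
from statistics import NormalDist

def power_right_tailed_z(mu_true, mu0, sigma, n, alpha=0.05):
    """Power = P(reject H0 | true mean = mu_true) for a right-tailed one-sample z-test."""
    z_crit = NormalDist().inv_cdf(1 - alpha)          # reject H0 when Z > z_crit
    shift = (mu_true - mu0) / (sigma / math.sqrt(n))
    # Under the true mean, the test statistic Z is distributed N(shift, 1).
    return 1 - NormalDist().cdf(z_crit - shift)

type_1 = power_right_tailed_z(50, 50, 8, 64)   # H0 true: rejection probability equals alpha
power = power_right_tailed_z(52, 50, 8, 64)    # H0 false (true mean 52)
beta = 1 - power                               # Type II error probability
```

When the true mean equals μ0 the rejection probability is α itself (the Type I error); with a true mean of 52 the power is roughly 0.64, so β is roughly 0.36.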
22. t-test:
⚫ The t-test is used when the population follows a normal distribution and the population standard
deviation σ is unknown and is estimated from the sample. The t-test is robust to violations of
normality as long as the data are close to symmetric and there are no outliers.
⚫ Let S be the standard deviation estimated from a sample of size n. Then the statistic
$$t = \frac{\bar{x} - \mu}{S/\sqrt{n}}$$
will follow a t-distribution with (n − 1) degrees of freedom if the sample is drawn from
a population that follows a normal distribution. Here 1 degree of freedom is lost since the
standard deviation is estimated from the sample. Thus, we use the t-statistic (hence the test is
called the t-test) to test the hypothesis when the population standard deviation is unknown.
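A minimal computation of the t-statistic and its degrees of freedom, using hypothetical fill weights (kg) and a claimed mean of 12; `one_sample_t` is an illustrative helper, and the p-value or critical value would still come from a t-table or a statistics library:

```python
import math
import statistics

def one_sample_t(data, mu0):
    """t = (x_bar - mu0) / (S / sqrt(n)), with n - 1 degrees of freedom."""
    n = len(data)
    x_bar = statistics.fmean(data)
    s = statistics.stdev(data)          # sample SD with divisor n - 1
    t = (x_bar - mu0) / (s / math.sqrt(n))
    return t, n - 1

# Hypothetical fill weights (kg) tested against a claimed mean of 12 kg.
fills = [12.1, 11.8, 12.4, 12.0, 11.6, 12.3, 12.2, 11.9]
t, df = one_sample_t(fills, mu0=12.0)
```

Here t ≈ 0.40 with 7 degrees of freedom, well inside the two-sided 5% critical value of about 2.365 from a t-table, so the claimed mean would be retained.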
23. Chi-Square Goodness of Fit Tests
⚫ Goodness of fit tests are hypothesis tests used to compare the
observed distribution of data with an expected distribution. They decide
whether there is any statistically significant difference between the observed
distribution and a theoretical distribution by comparing the observed
frequencies in the data with the frequencies expected if the data
followed the specified theoretical distribution.
⚫ The null and alternative hypotheses in chi-square goodness of fit
tests are:
H0: There is no statistically significant difference between the
observed frequencies and the expected frequencies from a
hypothesized distribution.
HA: There is a statistically significant difference between the
24. ⚫ Let Z be a standard normal random variable; then $Z^2$ follows a chi-square distribution with 1 degree of freedom.
⚫ If we have k independent standard normal random variables, namely $X_1, X_2, \ldots, X_k$, then a chi-square
distribution with k degrees of freedom is given by
$$\chi^2_k = X_1^2 + X_2^2 + \cdots + X_k^2$$
⚫ Consider a binomial random variable X with parameter p (probability of
success) and number of trials n.
⚫ Then, for a large sample, the standardized random variable
$$Z = \frac{X - np}{\sqrt{np(1-p)}}$$
approximately follows a standard normal distribution (central limit theorem for
proportions).
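Squaring this Z gives the two-category chi-square form, because $(n - X) - n(1-p) = -(X - np)$ and $\frac{1}{np} + \frac{1}{n(1-p)} = \frac{1}{np(1-p)}$. A short numerical check in Python, with hypothetical values X = 36, n = 100, p = 0.3 (the helper names are illustrative):

```python
import math

def z_squared(x, n, p):
    """Square of the standardized binomial variable (X - np) / sqrt(np(1 - p))."""
    return (x - n * p) ** 2 / (n * p * (1 - p))

def two_category_chi_square(x, n, p):
    """Sum of (O - E)^2 / E over the success and failure categories."""
    observed = [x, n - x]
    expected = [n * p, n * (1 - p)]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 36 successes in n = 100 trials when p = 0.3 under H0.
print(z_squared(36, 100, 0.3), two_category_chi_square(36, 100, 0.3))
```

Both values equal 36/21 ≈ 1.714, up to floating-point rounding.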
26. ⚫ Note that np and n(1 − p) are the expected values of the two categories (success
and failure) of the binomial distribution.
⚫ Thus, the chi-square statistic for the goodness of fit test is given by
$$\chi^2 = \sum_{i}\sum_{j}\frac{(O_{ij} - E_{ij})^2}{E_{ij}}$$
⚫ where $O_{ij}$ is the observed frequency in category (i, j) and $E_{ij}$ is the expected
frequency in category (i, j). Since the statistic is a sum of non-negative terms and only large
values indicate a poor fit, the chi-square test is always a right-tailed
test.
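A minimal sketch of the goodness of fit statistic for a single classification (one index instead of (i, j)), assuming a hypothetical test of whether a die is fair based on 60 rolls; the critical value 11.07 for 5 degrees of freedom at α = 0.05 is taken from a standard chi-square table, and `chi_square_gof` is an illustrative helper:

```python
def chi_square_gof(observed, expected):
    """Chi-square goodness-of-fit statistic: sum of (O - E)^2 / E over all categories."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same number of categories")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical: 60 rolls of a die, H0: the die is fair (expected 10 per face).
observed = [8, 12, 9, 11, 10, 10]
expected = [sum(observed) / 6] * 6
stat = chi_square_gof(observed, expected)   # (4 + 4 + 1 + 1 + 0 + 0) / 10 = 1.0
df = len(observed) - 1                      # 5 degrees of freedom (no parameters estimated)
# Right-tailed test: reject H0 only if stat exceeds the upper critical value.
reject = stat > 11.07
```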
27. INTRODUCTION TO ANALYSIS OF VARIANCE (ANOVA)
⚫ The objective of ANOVA is to check simultaneously whether the
population means of more than two populations are
different.
⚫ ANOVA stands for Analysis of Variance. It is a statistical method
used to analyze the differences between the means of two or more
groups or treatments.
⚫ It is often used to determine whether there are any
statistically significant differences between the means of
different groups.
⚫ ANOVA is used to compare treatments, analyze the impact of factors
on a variable, or compare means across multiple groups.
⚫ Types of ANOVA include one-way (for comparing means of
groups) and two-way (for examining effects of two independent
28. ⚫ One-way analysis of variance (ANOVA): It is a statistical method
for testing for differences in the means of three or more groups.
⚫ In statistics, ANOVA also uses a null hypothesis and an alternate
hypothesis.
⚫ The null hypothesis in ANOVA states that all the population means are
equal, i.e., the sample means do not differ significantly.
⚫ On the other hand, the alternate hypothesis states that at least one of
the means is different from the rest. In
mathematical form, they can be represented as:
$$H_0: \mu_1 = \mu_2 = \cdots = \mu_k \qquad H_A: \mu_i \neq \mu_j \text{ for at least one pair } (i, j)$$
⚫ where $\mu_i$ is the mean of the i-th level of the factor and k is the number of levels.
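30. ⚫ Sum of Squares of Total Variation (SST):
$$SST = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{\bar{x}}\right)^2$$
⚫ Mean Square Total (MST) variation is given by
$$MST = \frac{SST}{N - 1}$$
⚫ Sum of Squares of Between (SSB) Group Variation:
$$SSB = \sum_{i=1}^{k} n_i\left(\bar{x}_i - \bar{\bar{x}}\right)^2$$
⚫ Mean square between variation (MSB) is given by
$$MSB = \frac{SSB}{k - 1}$$
⚫ Sum of Squares of Within (SSW) Group Variation:
$$SSW = \sum_{i=1}^{k}\sum_{j=1}^{n_i}\left(x_{ij} - \bar{x}_i\right)^2$$
⚫ The mean square of variation within the group is
$$MSW = \frac{SSW}{N - k}$$
where $x_{ij}$ is the j-th observation in group i, $\bar{x}_i$ is the mean of group i, $\bar{\bar{x}}$ is the overall mean, $n_i$ is the size of group i, N is the total number of observations, and k is the number of groups.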
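These quantities can be computed directly. In the sketch below the three groups of scores are hypothetical, `one_way_anova` is an illustrative helper, and the final ratio F = MSB/MSW is the usual one-way ANOVA test statistic (compared with an F distribution with k − 1 and N − k degrees of freedom), which the slides above do not state explicitly:

```python
def one_way_anova(groups):
    """Return SST, MST, SSB, SSW, MSB, MSW and F = MSB / MSW for a list of groups."""
    all_values = [x for g in groups for x in g]
    N, k = len(all_values), len(groups)
    grand_mean = sum(all_values) / N
    means = [sum(g) / len(g) for g in groups]
    sst = sum((x - grand_mean) ** 2 for x in all_values)
    ssb = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ssw = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    msb, msw = ssb / (k - 1), ssw / (N - k)
    return {"SST": sst, "MST": sst / (N - 1), "SSB": ssb, "SSW": ssw,
            "MSB": msb, "MSW": msw, "F": msb / msw}

# Hypothetical scores under three treatments.
result = one_way_anova([[20, 22, 24], [28, 30, 32], [24, 26, 28]])
```

For this data SST = 120 = SSB + SSW = 96 + 24, MSB = 48, MSW = 4 and F = 12.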
31. Two-Way ANOVA: The two-way ANOVA technique is used when the
data are classified based on two factors.
Ex: The agricultural output may be classified on the basis of the different
varieties of seeds and also on the basis of the different varieties of fertilizers
used.
It is a statistical test used to determine the effect of two nominal predictor
variables on a continuous outcome variable.
A two-way ANOVA test analyzes the effect of each independent variable on
the expected outcome, along with their relationship to the outcome itself.
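A sketch of the seeds-and-fertilizers example as a two-way ANOVA with one observation per cell (no replication, so no interaction term can be estimated). The yields are hypothetical and `two_way_anova_no_replication` is an illustrative helper; each F ratio would be compared with an F-table value with the matching degrees of freedom:

```python
def two_way_anova_no_replication(table):
    """Two-way ANOVA with one observation per cell (rows = factor A, columns = factor B).

    Returns F statistics for the row factor and the column factor; with a single
    observation per cell the interaction cannot be separated from the error term.
    """
    r, c = len(table), len(table[0])
    grand = sum(map(sum, table)) / (r * c)
    row_means = [sum(row) / c for row in table]
    col_means = [sum(table[i][j] for i in range(r)) / r for j in range(c)]
    sst = sum((x - grand) ** 2 for row in table for x in row)
    ss_rows = c * sum((m - grand) ** 2 for m in row_means)
    ss_cols = r * sum((m - grand) ** 2 for m in col_means)
    ss_error = sst - ss_rows - ss_cols
    ms_error = ss_error / ((r - 1) * (c - 1))
    return {"F_rows": (ss_rows / (r - 1)) / ms_error,
            "F_cols": (ss_cols / (c - 1)) / ms_error}

# Hypothetical crop yields: rows = 3 seed varieties, columns = 3 fertilizers.
yields = [[10, 13, 14],
          [8, 10, 12],
          [12, 13, 16]]
result = two_way_anova_no_replication(yields)
```

With these numbers F is 31 for seed variety and 36 for fertilizer, each on (2, 4) degrees of freedom.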