Sampling, Hypothesis & Regression
TBS910 BUSINESS ANALYTICS
by
Prof. Stephen Ong
Visiting Professor, Shenzhen
University
Visiting Fellow, Sydney Business
School, University of Wollongong
Today’s Overview
Topics
1. Using Samples
2. Hypothesis Testing
3. Regression
USING SAMPLES
Learning Objectives:
1. Understand how and why to use sampling
2. Appreciate the aims of statistical inference
3. Use sampling distributions to find point estimates for population means
4. Calculate confidence intervals for means and proportions
5. Use one-sided distributions
6. Use t-distributions for small samples
Collecting data
• There are essentially two types of data collection
  ◦ Census
  ◦ Sample
• A census is often expensive, infeasible, impossible or unnecessary
• The usual choice is to collect a sample
Sampling is the basis of statistical inference
• This collects data from a random sample of the population
• Uses this data to estimate features of the whole population
Example
• The percentages of glucose in five bars of toffee are 7.2, 6.4, 7.2, 8.0 and 8.2
• The mean of these is: ∑x / n = 37/5 = 7.4
• This is our estimate of the population mean
• The variance is (∑x² − (∑x)²/n) / (n − 1) = 0.52
• This is our estimate of the population variance
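The toffee calculation can be checked with a short Python sketch. It uses only the standard library; the variable names are illustrative and do not come from the slides.

```python
import statistics

# Percentages of glucose in the five bars of toffee
glucose = [7.2, 6.4, 7.2, 8.0, 8.2]

# Point estimate of the population mean: sum(x) / n
sample_mean = statistics.mean(glucose)

# Point estimate of the population variance: divides by n - 1
sample_variance = statistics.variance(glucose)

print(round(sample_mean, 2), round(sample_variance, 2))
```

`statistics.variance` divides by n − 1, which matches the formula on the slide; `statistics.pvariance` would divide by n instead.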
Sampling fluctuations
• In practice, each sample will be slightly different
• The variation is described by a sampling distribution
Sampling distributions
• Data collected from samples gives a sampling distribution, such as the sampling distribution of the mean
• This has the features:
  ◦ If a population is Normally distributed, the sampling distribution of the mean is also Normally distributed
  ◦ With large samples the sampling distribution of the mean is approximately Normally distributed regardless of the population distribution
  ◦ The sampling distribution of the mean has a mean μ and standard deviation σ/√n
Example
• Chocolate bars have a mean weight of 50g and standard deviation of 10g. A sample of 64 bars is taken and if the mean weight is less than 46g the whole day's production is scrapped.
• A large sample gives a sampling distribution with mean 50g and standard deviation 10/√64 = 1.25g
• Z = (46 − 50)/1.25 = −3.2
• P(Z < −3.2) = 0.0007
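As a rough check of this example, the sketch below recomputes the standard error, the Z value and the tail probability with Python's standard library. The variable names are assumptions made for illustration.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 10.0, 64   # population mean, standard deviation, sample size
threshold = 46.0                # production is scrapped below this sample mean

standard_error = sigma / sqrt(n)        # 10 / 8 = 1.25
z = (threshold - mu) / standard_error   # number of standard errors below the mean
p_scrap = NormalDist().cdf(z)           # P(Z < z) for the standard Normal

print(standard_error, z, round(p_scrap, 4))
```

A probability is never negative, so the tail area is reported as 0.0007 rather than with a minus sign.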
• Sampling distributions can give point estimates of, say, a mean
• Interval estimates are more useful
• A 95% confidence interval, say, defines the range that we are 95% confident that the population mean lies within
  ◦ 90% confidence interval: x̄ − 1.645σ/√n to x̄ + 1.645σ/√n
  ◦ 95% confidence interval: x̄ − 1.96σ/√n to x̄ + 1.96σ/√n
  ◦ 99% confidence interval: x̄ − 2.58σ/√n to x̄ + 2.58σ/√n
• The best point estimate for a population mean is the mean of the sampling distribution
• The best estimate for the population standard deviation is the standard error
• The standard error can be biased for small samples
• Then Bessel’s correction uses σ = s/√(n − 1) rather than σ = s/√n
Example
• Attendance is normally distributed with variance of 225. Samples on five days give: 220, 196, 210, 186, 222
• The point estimate of the mean is ∑x/5 = 206.8
• 95% confidence interval is: 206.8 ± 1.96 × √(225/5) = 206.8 ± 13.15
• For 95% of samples an interval built this way (here 193.65 to 219.95) would include the population mean
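The interval can be reproduced with a small helper, sketched here in Python. The function name `z_interval` is hypothetical, and it assumes the population standard deviation is known, as in this example.

```python
from math import sqrt
from statistics import NormalDist, mean

def z_interval(sample_mean, sigma, n, confidence=0.95):
    """Two-sided confidence interval for a mean when sigma is known."""
    # Number of standard errors that leaves (1 - confidence) / 2 in each tail
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sigma / sqrt(n)
    return sample_mean - half_width, sample_mean + half_width

attendance = [220, 196, 210, 186, 222]
low, high = z_interval(mean(attendance), sqrt(225), len(attendance))

print(round(low, 2), round(high, 2))
```

Passing `confidence=0.90` or `confidence=0.99` gives the multipliers 1.645 and 2.58 listed on the earlier slide.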
Example
• Investment returns are normally distributed with standard deviation 0.5. A random sample of 10 has a mean of 0.9%.
• Point estimate of population mean is 0.9
• 95% confidence interval is: 0.9 ± 1.96 × 0.5/√10 = 0.9 ± 0.31, or 0.59 to 1.21
• 98% confidence interval is: 0.9 ± 2.33 × 0.5/√10 = 0.9 ± 0.37, or 0.53 to 1.27
The principles of sampling can be extended in many ways
• Population proportions, where sampling distributions are:
  ◦ Normally distributed
  ◦ with mean π
  ◦ and standard deviation √(π(1 − π)/n)
• One-sided confidence intervals
• Small samples
Population proportions
• With a large sample (more than 30) the sample proportions are:
  ◦ Normally distributed
  ◦ with mean π
  ◦ standard deviation √(π(1 − π)/n)
• The 95% confidence interval is:
  ◦ p − 1.96 × √(p(1 − p)/n) to p + 1.96 × √(p(1 − p)/n)
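The formula above can be sketched in Python as follows. The slides give no data for this case, so the sample of 120 successes out of 400 is purely hypothetical, as is the function name.

```python
from math import sqrt
from statistics import NormalDist

def proportion_interval(successes, n, confidence=0.95):
    """Large-sample confidence interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: 120 of 400 respondents have the attribute (p = 0.3)
low, high = proportion_interval(120, 400)

print(round(low, 3), round(high, 3))
```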
One-sided confidence interval
Sampling distributions with small samples are no longer Normal
• Small samples tend to include fewer outlying values and under-estimate the spread
• This effect is allowed for in t-distributions
• The shape of the t-distribution depends on the degrees of freedom
• The degrees of freedom is essentially one less than the sample size
Example
• Duration of rentals is normally distributed.
• A random sample of 14 has a mean of 2.1429 and variance of 1.6703.
• Point estimate of population mean = 2.1429
• Estimate of the standard error is √(1.6703/14) = 0.3454
• For a t-distribution with 13 degrees of freedom, a 99% confidence interval is within 3.012 standard errors of the mean
• Confidence interval is 2.1429 ± 3.012 × 0.3454, or 1.10 to 3.18
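The arithmetic of this example can be sketched in Python. The standard library has no t-distribution, so the critical value 3.012 is taken from the slide (standard tables) rather than computed; the variable names are illustrative.

```python
from math import sqrt

n = 14
sample_mean = 2.1429
sample_variance = 1.6703
t_critical = 3.012   # 99% two-sided value for 13 degrees of freedom, from tables

degrees_of_freedom = n - 1
standard_error = sqrt(sample_variance / n)

low = sample_mean - t_critical * standard_error
high = sample_mean + t_critical * standard_error

print(degrees_of_freedom, round(standard_error, 4), round(low, 2), round(high, 2))
```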
Finding probabilities for the t-distribution
• Calculations for t-distributions use:
  ◦ Standard tables
  ◦ Specialised statistical software
  ◦ The TINV function in spreadsheets
• For samples of more than about 30, the t-distribution is virtually identical to the Normal
Student-t distribution

Item                            Value    Spreadsheet formula
Sample size                     8
Mean                            37
Standard deviation              12

Part (a)
Degrees of freedom              7        D3 - 1
Standard error                  4.536    D5/SQRT(D3-1)
Confidence interval (%)         90
Number of standard deviations   1.895    TINV((100-D11)/100,D)
Confidence interval from        28.407   D4 - (D12*D9)
Confidence interval to          45.593   D4 + (D12*D9)

Part (b)
Confidence interval (%)         95
Number of standard deviations   2.365    TINV((100-D17)/100,D8)
Confidence interval from        26.275   D4 - (D18*D9)
Confidence interval to          47.725   D4 + (D18*D9)

Part (c)
Sample size                     20
Degrees of freedom              19
Standard error                  2.753    D5/SQRT(D23-1)
Confidence interval (%)         95
Number of standard deviations   2.093    TINV((100-D26)/100,D24)
Confidence interval from        31.238   D4 - (D27*D25)
Confidence interval to          42.762   D4 + (D27*D25)

Part (d) Normal distribution
Sample size                     20
Standard error                  2.753    D5/SQRT(D33-1)
Confidence interval (%)         95
Number of standard deviations   1.960    NORMSINV((100-D35)/200)
Confidence interval from        31.604   D4 - (D36*D34)
Confidence interval to          42.396   D4 + (D36*D34)
TESTING HYPOTHESIS
Learning Objectives
1. Understand the purpose of hypothesis testing
2. List the steps involved in hypothesis testing
3. Understand the errors involved and the use of significance levels
4. Test hypotheses about population means
5. Use one- and two-tail tests
6. Extend these tests to deal with small samples
7. Use the tests for a variety of problems
8. Consider non-parametric tests, particularly the chi-squared test
Hypothesis testing
• Considers a simple statement about a population
• This is the hypothesis
• Uses a sample to test whether the statement is likely to be true or is unlikely
• It sees whether or not data from a sample supports a hypothesis about the population
• If the hypothesis is unlikely, the hypothesis is rejected and another implied hypothesis must be true
Procedure for hypothesis testing
• Define a simple, precise statement about a population (the hypothesis)
• Take a sample from the population
• Test this sample to see if it supports the hypothesis, or if it makes the hypothesis highly improbable
• If the hypothesis is highly improbable reject it, otherwise accept it
Example
• A politician claims that 10% of factories in an area are losing money (hypothesis)
• A sample of 30 shows that all of them are profitable (test statistic)
• If the hypothesis is true, the probability that they are all profitable is 0.9³⁰ = 0.0424 (p-value)
• This is very unlikely, so we can reject the hypothesis (test result)
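The p-value in this example is a single power, sketched here in Python. The 5% significance level is an assumption; the slide only says the result is very unlikely.

```python
# If 10% of factories lose money, each sampled factory is profitable
# with probability 0.9, assuming the 30 factories are independent.
p_profitable = 0.9
sample_size = 30

p_value = p_profitable ** sample_size   # probability that all 30 are profitable

significance_level = 0.05               # assumed threshold, not stated on the slide
reject_hypothesis = p_value < significance_level

print(round(p_value, 4), reject_hypothesis)
```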
Elements of a hypothesis test
1. Hypothesis - a simple statement about a population
2. Test result - actual result from the sample
3. Test statistic - a calculation about a sample assuming that the hypothesis is true
4. Conclusion - either reject the hypothesis or not
Example
1. Hypothesis - half of all staff have a degree
   H0: P = 0.5   H1: P < 0.5
2. Test result - in a sample of 10 staff, three have degrees
3. Test statistic - the number with degrees is a binomial process. If the probability is 0.5, the probability of a value at least as extreme as 3 is:
   p = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.1719
4. Conclusion - 17.19% is greater than 5%, so the test is not significant and there is no evidence to reject H0
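The binomial tail probability in step 3 can be checked with a short Python sketch; the helper name `binomial_cdf` is illustrative.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial process with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# Sample of 10 staff, three with degrees, hypothesised proportion 0.5
p_value = binomial_cdf(3, 10, 0.5)   # P(0) + P(1) + P(2) + P(3)

significant = p_value < 0.05         # 5% significance level, as in the conclusion

print(round(p_value, 4), significant)
```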
Because of uncertainty, we can never be certain of the results
• Then we cannot really ‘accept’ a hypothesis, but instead we say that we ‘cannot reject’ it
• If we reject the hypothesis (called the null hypothesis) we implicitly accept another hypothesis (called the alternative hypothesis)
Uncertainty also means that we can make mistakes
Type I and Type II errors
• Ideally, both Type I and Type II errors would be small
• In practice, the probability of a Type I error (rejecting a true null hypothesis) is the significance level
• As this decreases, the probability of accepting a false null hypothesis (a Type II error) increases
• We have to balance the two types of error
Significance level
• Is the minimum acceptable probability that a value actually comes from the hypothesised population
• When the probability is less than this, we reject the null hypothesis; when the probability is more than this we do not reject it
• It is the maximum acceptable probability of making a Type I error
• Usually 0.05, but other values are possible, notably 0.01
Bringing ideas together gives the formal steps in a hypothesis test
• State the null and alternative hypotheses
• Specify the significance level
• Calculate the acceptance range for the variable tested
• Find the actual value for the variable tested
• Decide whether or not to reject the null hypothesis
• State the conclusion
Example
• The time a doctor spends with patients is N(7, 2²). A new doctor seems to work more slowly.
• H0: μ = 7   H1: μ > 7
• A sample of 56 patients takes 420 minutes, giving a sample mean of 7.5
• Probability of a sample mean ≥ 7.5 (with Z = 0.5/(2/√56) = 1.87) is 0.03
• 0.03 is less than 0.05, so the test is significant and we can reject H0. There is evidence to support H1 that the new doctor spends more than 7 minutes
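The steps of this one-sided test can be sketched in Python using the standard library. The variable names are illustrative and not from the slides.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma = 7.0, 2.0        # hypothesised mean and known standard deviation
n, total_minutes = 56, 420.0

sample_mean = total_minutes / n                  # 7.5 minutes per patient
z = (sample_mean - mu_0) / (sigma / sqrt(n))     # standard errors above the hypothesised mean
p_value = 1 - NormalDist().cdf(z)                # one-sided: P(sample mean >= 7.5 | H0)

reject_h0 = p_value < 0.05                       # 5% significance level

print(round(z, 2), round(p_value, 2), reject_h0)
```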
There is a huge number of variations on the general procedure including:
• One-sided tests
• Tests with small samples
• Tests for population proportions
• Testing for differences in means
• Paired tests
• Goodness of fit
• Tests of association
Paired tests
Amethyst Interviews

Data
Original interviews   Later interviews
10                    10
11                    9
9                     11
6                     10
8                     9
10                    12
7                     9
8                     11

t-Test: Paired Two Sample for Means
                               Original interviews   Later interviews
Mean                           8.625                 10.125
Variance                       2.8393                1.2679
Observations                   8                     8
Hypothesised Mean Difference   0
Degrees of freedom             7
t Statistic                    2.291
P(T<=t) one-tail               0.0279
t Critical one-tail            1.8946
P(T<=t) two-tail               0.0557
t Critical two-tail            2.3646
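The t statistic reported for the Amethyst interviews can be recomputed from the paired differences, as in this Python sketch. Differences are taken as later minus original (an assumption that gives the positive value shown), and the critical value 1.8946 is copied from the output rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

original = [10, 11, 9, 6, 8, 10, 7, 8]
later = [10, 9, 11, 10, 9, 12, 9, 11]

# A paired test works on the difference within each pair
differences = [b - a for a, b in zip(original, later)]
n = len(differences)

# t = mean difference / standard error of the mean difference
t_statistic = mean(differences) / (stdev(differences) / sqrt(n))

t_critical_one_tail = 1.8946   # 7 degrees of freedom, 5% level, from the output
significant_one_tail = t_statistic > t_critical_one_tail

print(round(t_statistic, 3), significant_one_tail)
```

The one-tail test is significant at 5%, while the two-tail critical value of 2.3646 is not exceeded, which matches the two p-values in the output.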
Non-parametric tests are used when there is no appropriate parameter to measure
• Then we have to use a non-parametric (or distribution-free) test
• The most common is the chi-squared test
• Where $\chi^2 = \sum \frac{(O - E)^2}{E}$, with O the observed and E the expected frequencies
• The shape of the curve depends on the degrees of freedom (the number of classes minus the number of estimated variables minus one)
Goodness of fit test
• We reject the hypothesis that the data follows a specified distribution when the calculated value is greater than a critical value for the distribution
• The critical value is found in
  ◦ Standard tables
  ◦ Statistical software
  ◦ The CHIINV function in spreadsheets
Example
• χ² = 66.35
• The probability of this is given by a χ² distribution with c − 1 degrees of freedom (c is the number of categories)
• This comes from standard tables or software
• In this case the probability that χ² > 66.35 is 0.000 (to three decimal places)
• This probability is less than 0.05 so we reject the null hypothesis that the observations come from the stated distribution
Example (continued)
There are five categories so we need the χ² distribution with 5 − 1 = 4 degrees of freedom and then P(χ² > 11.389) = 0.0225. This is less than 0.05 so we conclude that there is evidence that the probability distribution differs from the hypothesised one.
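The tail probability quoted above can be checked without tables. The sketch below uses the closed form of the χ² upper tail that holds when the degrees of freedom are even; the function name is hypothetical, and for odd degrees of freedom tables or statistical software would still be needed.

```python
from math import exp, factorial

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared > x) for an even number of degrees of freedom.

    For df = 2m the tail is exp(-x/2) * sum_{k<m} (x/2)^k / k!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))

categories = 5
p_value = chi2_upper_tail_even_df(11.389, categories - 1)

print(round(p_value, 4), p_value < 0.05)
```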
Example (continued)
Tests of association
• Test for relationships between variables described in a contingency table
Contingency tables continued
• The expected value in each cell is:
  E = (row total × column total) / number of observations
• Then we calculate χ² as usual and look up the related probability with degrees of freedom:
  (number of rows − 1) × (number of columns − 1) = (2 − 1) × (3 − 1) = 2
Results
REGRESSION
Learning Objectives
1. Understand the purpose of regression
2. See how the strength of a relationship is related to the amount of noise
3. Measure the errors introduced by noise
4. Use linear regression to find the line of best fit through a set of data
5. Use this line of best fit for causal forecasting
6. Calculate and interpret coefficients of determination and correlation
7. Use Spearman’s coefficient of rank correlation
8. Understand the results of multiple regression
9. Use curve fitting for more complex functions
Regression and curve fitting look for relationships between variables
• There are two basic questions:
  ◦ Finding the best relationship, which is regression
  ◦ Seeing how well the relationship fits the data, which is measured by correlation and determination
The most common approach is linear regression
• This draws the straight line of best fit through a set of data
• The general procedure is:
  ◦ Draw a scatter diagram
  ◦ Identify a linear relationship
  ◦ Find the line of best fit through the data
  ◦ Use this line to predict a value for the dependent variable from a known value of the independent variable
Paired data
[Figure: electricity consumption plotted against temperature, showing the underlying trend, the trend with some noise, and the trend with more noise]
To find the line of best fit
• We have to find values for the constants a and b in the equation y = a + bx
• There is noise - or a deviation or error - in every observation
• The ‘best’ line is defined as the one that minimises the mean squared error
• There is a calculation for a and b, but standard software is more reliable

$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \bar{y} - b\bar{x}$
Quality control at Olfentia Travel

Checks   Mistakes
0        92
1        86
2        81
3        72
4        67
5        59
6        53
7        43
8        32
9        24
10       12

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.994
R Square            0.988
Adjusted R Square   0.986
Standard Error      3.077
Observations        11

ANOVA
             df   SS         MS
Regression   1    6833.536   6833.536
Residual     9    85.191     9.466
Total        10   6918.727

               Coefficients   Standard Error
Intercept      95.864         1.735
X Variable 1   -7.882         0.293
[Figure: number of mistakes plotted against number of checks]
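The intercept and slope in the Olfentia Travel output can be recovered from the least-squares calculation for a and b. This is a minimal Python sketch with illustrative variable names; in practice the slides recommend standard software.

```python
checks = list(range(11))                                    # 0 to 10 checks
mistakes = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]

n = len(checks)
sum_x = sum(checks)
sum_y = sum(mistakes)
sum_xy = sum(x * y for x, y in zip(checks, mistakes))
sum_x2 = sum(x * x for x in checks)

# Slope: b = (n*sum(xy) - sum(x)*sum(y)) / (n*sum(x^2) - (sum(x))^2)
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
# Intercept: a = mean(y) - b * mean(x)
a = sum_y / n - b * sum_x / n

print(f"mistakes = {a:.3f} + ({b:.3f}) x checks")
```

The fitted line suggests each extra check removes roughly eight mistakes within the observed range of 0 to 10 checks; extrapolating beyond that range is not supported by the data.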
The strength of a relationship is measured by
• Coefficient of determination
  ◦ Which sees how much of the total sum of squared errors is explained by the regression
  ◦ Takes a value between zero and +1
• Coefficient of correlation
  ◦ Which asks how closely x and y are linearly related
  ◦ Is the square root of the coefficient of determination
  ◦ Takes a value between −1 and +1
• Spearman’s rank correlation
  ◦ For ordinal data
Figure 9.12 Interpreting the coefficient of correlation
[Six scatter diagrams of y against x, with panels: r = +1, perfect positive correlation; r close to 1, line of good fit; r decreasing with worse fit of line; r = 0, random points with no correlation; r close to −1, line of good fit; r = −1, perfect negative correlation]
Correlation and determination

x    y
4    13
17   47
3    24
21   41
10   29
8    33
4    28
9    38
13   46
12   32
2    14
6    22
15   26
8    21
19   50

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.797
R Square            0.635
Adjusted R Square   0.607
Standard Error      7.261
Observations        15

ANOVA
             df   SS
Regression   1    1191.630
Residual     13   685.304
Total        14   1876.933

               Coefficients
Intercept      15.376
X Variable 1   1.545
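The coefficients of correlation and determination in this output can be reproduced from the raw data. The sketch below is plain Python with illustrative names; it computes r from sums of squares and then squares it.

```python
from math import sqrt

x = [4, 17, 3, 21, 10, 8, 4, 9, 13, 12, 2, 6, 15, 8, 19]
y = [13, 47, 24, 41, 29, 33, 28, 38, 46, 32, 14, 22, 26, 21, 50]

n = len(x)
mean_x = sum(x) / n
mean_y = sum(y) / n

# Sums of squares and cross-products about the means
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)   # the Total SS in the ANOVA table

r = s_xy / sqrt(s_xx * s_yy)   # coefficient of correlation (Multiple R)
r_squared = r ** 2             # coefficient of determination (R Square)

print(round(r, 3), round(r_squared, 3))
```

The same R Square follows from the ANOVA table as Regression SS divided by Total SS, 1191.630 / 1876.933.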
Rank correlation

Service           V   W   X   Y   Z
Quality ranking   2   5   1   3   4
Cost ranking      1   3   2   4   5

$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5 \times (25 - 1)} = 0.6$
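Spearman's coefficient for the five services can be sketched in Python as below. The dictionary layout is just one convenient way to hold the rankings, and the formula assumes there are no tied ranks, as is the case here.

```python
# Rankings of the five services V, W, X, Y, Z
quality_rank = {"V": 2, "W": 5, "X": 1, "Y": 3, "Z": 4}
cost_rank = {"V": 1, "W": 3, "X": 2, "Y": 4, "Z": 5}

n = len(quality_rank)

# D is the difference between the two rankings of each service
sum_d_squared = sum((quality_rank[s] - cost_rank[s]) ** 2 for s in quality_rank)

# Spearman's coefficient: 1 - 6*sum(D^2) / (n*(n^2 - 1))
r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(sum_d_squared, r_s)
```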
Multiple (linear) regression
• Extends the principles of linear regression to more independent variables
  y = a + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄ + …
• The calculations for this are always done by standard software
• You have to be careful with the requirements (technically called multicollinearity, autocorrelation, extrapolation, etc.)
• You also have to be careful with the interpretation of results
Multiple Regression

DATA
Sales   Advertising   Price
2450    100           50
3010    130           56
3090    160           45
3700    190           63
3550    210           48
4280    240           70

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.996
R Square            0.992
Adjusted R Square   0.986
Standard Error      75.055
Observations        6

ANOVA
             df   SS
Regression   2    2003633.452
Residual     3    16899.882
Total        5    2020533.333

               Coefficients   Standard Error
Intercept      585.96         195.57
X Variable 1   9.92           0.76
X Variable 2   19.11          4.13
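The output above can be read as the equation Sales = 585.96 + 9.92 × Advertising + 19.11 × Price. The Python sketch below applies the reported coefficients rather than refitting the model; the function name is hypothetical, and it assumes X Variable 1 is Advertising and X Variable 2 is Price, following the column order of the data.

```python
# Coefficients copied from the summary output (rounded as printed)
INTERCEPT = 585.96
B_ADVERTISING = 9.92   # assumed to be X Variable 1
B_PRICE = 19.11        # assumed to be X Variable 2

def predict_sales(advertising, price):
    """Sales predicted by the fitted multiple regression equation."""
    return INTERCEPT + B_ADVERTISING * advertising + B_PRICE * price

# Prediction for the first data row (actual sales were 2450)
first_row_prediction = predict_sales(100, 50)
first_row_residual = 2450 - first_row_prediction

# Coefficient of determination from the ANOVA table
r_squared = 2003633.452 / 2020533.333

print(round(first_row_prediction, 2), round(first_row_residual, 2), round(r_squared, 3))
```

The residual for a single row is of the same order as the reported standard error of about 75, which is what the output would lead us to expect.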
Multiple Linear Regression Computer Results
Multiple regression

Production   Shifts   Bonus   Overtime   Morale
2810         6        15      8          5
2620         3        20      10         6
3080         3        5       22         3
4200         4        5       31         2
1500         1        7       9          8
3160         2        12      22         10
4680         2        25      30         7
2330         7        10      5          7
1780         1        12      7          5
3910         8        3       20         3

Correlations
             Production   Shifts   Bonus    Overtime   Morale
Production   1
Shifts       0.262        1
Bonus        0.153        -0.315   1
Overtime     0.878        -0.108   -0.022   1
Morale       -0.332       -0.395   0.451    -0.265     1

Regression
Multiple R          0.997
R Square            0.995
Adjusted R Square   0.990
Standard Error      100.807
Observations        10

ANOVA
             df   SS
Regression   4    9439000
Residual     5    50810
Total        9    9489810

               Coefficients   Standard Error
Intercept      346.33         160.22
X Variable 1   181.80         15.25
X Variable 2   50.13          5.44
X Variable 3   96.17          3.69
X Variable 4   -28.70         16.76
Curve fitting
• Is a more general term than regression
• It refers to the process of fitting different types of curve through sets of data
• Typically this involves:
  ◦ Exponential curves
  ◦ Growth curves
  ◦ Polynomials
Curve fitting
Line of fit: y = b × m^x, with b = 0.6541 and m = 1.2475

Year   Cost   Predictions
1      0.8    0.816
2      1      1.018
3      1.3    1.270
4      1.7    1.584
5      2      1.976
6      2.4    2.465
7      2.9    3.075
8      3.8    3.836
9      4.7    4.786
10     6.2    5.970
11     7.5    7.448
12            9.291
13            11.591
14            14.459

[Figure: cost and fitted curve plotted against year]
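The prediction column follows from the fitted exponential curve y = b × m^x. The Python sketch below applies the reported constants instead of refitting them, so small differences in the last decimal place are expected because b and m are rounded; the function name is illustrative.

```python
# Constants of the fitted curve, as reported in the table
B = 0.6541
M = 1.2475

def predict_cost(year):
    """Cost predicted by the exponential curve y = B * M**year."""
    return B * M ** year

# Forecasts for the three years with no observed cost
forecasts = {year: predict_cost(year) for year in (12, 13, 14)}

for year, cost in forecasts.items():
    print(year, round(cost, 2))
```

Each year multiplies the predicted cost by m = 1.2475, which corresponds to growth of about 25% a year.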
Further Reading
• Waters, Donald (2011) Quantitative Methods for Business, 5th Edition, Prentice Hall
• Evans, J.R. (2013) Business Analytics, 1st Edition, Pearson
• Render, B., Stair Jr., R.M. & Hanna, M.E. (2013) Quantitative Analysis for Management, 11th Edition, Pearson
QUESTIONS?

Tbs910 sampling hypothesis regression

  • 1. Sampling, Hypothesis & Regression
      TBS910 BUSINESS ANALYTICS
      by Prof. Stephen Ong
      Visiting Professor, Shenzhen University
      Visiting Fellow, Sydney Business School, University of Wollongong
  • 3. Topics
      1. Using Samples
      2. Hypothesis Testing
      3. Regression
  • 5. Learning Objectives:
      1. Understand how and why to use sampling
      2. Appreciate the aims of statistical inference
      3. Use sampling distributions to find point estimates for population means
      4. Calculate confidence intervals for means and proportions
      5. Use one-sided distributions
      6. Use t-distributions for small samples
  • 6. Collecting data
      - There are essentially two types of data collection:
          - Census
          - Sample
      - A census is often expensive, infeasible, impossible or unnecessary
      - The usual choice is to collect a sample
  • 7. Sampling is the basis of statistical inference
      - This collects data from a random sample of the population
      - It uses this data to estimate features of the whole population
  • 8. Example
      - The percentages of glucose in five bars of toffee are 7.2, 6.4, 7.2, 8.0 and 8.2
      - The mean of these is: ∑x / n = 37/5 = 7.4
      - This is our estimate of the population mean
      - The variance is (∑x² − (∑x)²/n) / (n − 1) = 0.52
      - This is our estimate of the population variance
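The arithmetic in the toffee example can be checked with a short Python sketch. It applies the shortcut variance formula from the slide and cross-checks it against the standard library's `statistics.variance`, which also divides by n − 1. The variable names are illustrative only.

```python
import statistics

# Glucose percentages for the five toffee bars in the example.
glucose = [7.2, 6.4, 7.2, 8.0, 8.2]
n = len(glucose)

# Point estimate of the population mean: sum(x) / n.
sum_x = sum(glucose)
mean = sum_x / n

# Shortcut formula from the slide: (sum(x^2) - (sum(x))^2 / n) / (n - 1).
sum_x2 = sum(x * x for x in glucose)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# Cross-check against the library's sample variance (divisor n - 1).
library_variance = statistics.variance(glucose)

print(round(mean, 2), round(variance, 2), round(library_variance, 2))
```

Both routes should agree to rounding error, which is a quick way to catch slips in the hand calculation.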
  • 9. Sampling fluctuations
      - In practice, each sample will be slightly different
      - The variation is described by a sampling distribution
  • 11. Sampling distributions
      - Data collected from samples gives a sampling distribution, such as the sampling distribution of the mean
      - This has the features:
          - If a population is Normally distributed, the sampling distribution of the mean is also Normally distributed
          - With large samples the sampling distribution of the mean is Normally distributed regardless of the population distribution
          - The sampling distribution of the mean has a mean μ and standard deviation σ/√n
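A small simulation can make these features concrete. The sketch below is not from the slides: it assumes an exponential population (mean 1, standard deviation 1), chosen only because it is clearly not Normal, and checks that the means of repeated samples of size 64 centre on μ with spread close to σ/√n.

```python
import math
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# ASSUMED population: exponential with mean 1 and standard deviation 1.
mu, sigma = 1.0, 1.0
n = 64              # size of each sample
num_samples = 2000  # number of repeated samples

# Draw many samples and record the mean of each one.
sample_means = [
    statistics.fmean(random.expovariate(1 / mu) for _ in range(n))
    for _ in range(num_samples)
]

observed_mean = statistics.fmean(sample_means)
observed_sd = statistics.stdev(sample_means)
predicted_sd = sigma / math.sqrt(n)  # sigma / sqrt(n) = 0.125

print(f"mean of sample means: {observed_mean:.3f} (population mean {mu})")
print(f"sd of sample means:   {observed_sd:.3f} (predicted {predicted_sd:.3f})")
```

Plotting a histogram of `sample_means` would also show the roughly Normal shape, even though the population itself is heavily skewed.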
  • 13. Example
      - Chocolate bars have a mean weight of 50g and standard deviation of 10g. A sample of 64 bars is taken and if the mean weight is less than 46g the whole day's production is scrapped.
      - A large sample gives a sampling distribution with mean 50g and standard deviation 10/√64 = 1.25g
      - Z = (46 − 50)/1.25 = −3.2
      - P(Z < −3.2) = 0.0007
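The chocolate bar calculation can be reproduced with the standard Normal distribution in Python's standard library. A minimal sketch, with variable names chosen for illustration:

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 10.0, 64      # population mean, standard deviation, sample size
threshold = 46.0                   # production is scrapped below this sample mean

# Standard deviation of the sampling distribution of the mean
standard_error = sigma / sqrt(n)

# Number of standard errors between the threshold and the mean
z = (threshold - mu) / standard_error

# Probability that the sample mean falls below the threshold
p_scrap = NormalDist().cdf(z)

print(standard_error, z, round(p_scrap, 4))
```

A probability can never be negative, so the result is a small positive value in the lower tail of the distribution.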
  • 14. Point and interval estimates
      - Sampling distributions can give point estimates of, say, a mean
      - Interval estimates are more useful
      - A 95% confidence interval, say, defines the range that we are 95% confident that the population mean lies within
      - 90% confidence interval: x̄ − 1.645σ/√n to x̄ + 1.645σ/√n
      - 95% confidence interval: x̄ − 1.96σ/√n to x̄ + 1.96σ/√n
      - 99% confidence interval: x̄ − 2.58σ/√n to x̄ + 2.58σ/√n
  • 15. Estimating the mean and standard deviation
      - The best point estimate for a population mean is the mean of the sampling distribution
      - The best estimate for the population standard deviation is the standard error
      - The standard error can be biased for small samples
      - Then Bessel's correction uses σ = s/√(n − 1) rather than σ = s/√n
  • 17. Example
      - Attendance is normally distributed with variance of 225. Samples on five days give: 220, 196, 210, 186, 222
      - The point estimate of the mean is ∑x/5 = 206.8
      - 95% confidence interval is: 206.8 ± 1.96 × √(225/5) = 206.8 ± 13.15
      - For 95% of samples the interval 193.65 to 219.95 would include the population mean
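The attendance interval can be sketched as a small function for the case where the population variance is known. The function name `mean_confidence_interval` is hypothetical, introduced here only for illustration.

```python
from math import sqrt
from statistics import NormalDist

def mean_confidence_interval(values, variance, confidence):
    """Two-sided interval for the mean when the population variance is known."""
    n = len(values)
    x_bar = sum(values) / n
    # Number of standard errors for the chosen confidence level (1.96 for 95%)
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * sqrt(variance / n)
    return x_bar - half_width, x_bar + half_width

attendance = [220, 196, 210, 186, 222]   # five daily samples from the slide
low, high = mean_confidence_interval(attendance, variance=225, confidence=0.95)
print(round(low, 2), round(high, 2))
```

Calling the same function with `confidence=0.90` or `confidence=0.99` gives the 1.645 and 2.58 multipliers listed earlier.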
  • 18. Example
      - Investment returns are normally distributed with standard deviation 0.5. A random sample of 10 has a mean of 0.9%.
      - Point estimate of population mean is 0.9
      - 95% confidence interval is: 0.9 ± 1.96 × 0.5/√10
      - 98% confidence interval is: 0.9 ± 2.33 × 0.5/√10
  • 19. The principles of sampling can be extended in many ways
      - Population proportions, where sampling distributions are:
          - Normally distributed
          - with mean π
          - and standard deviation √(π(1 − π)/n)
      - One-sided confidence intervals
      - Small samples
  • 20. Population proportions
      - With a large sample (more than 30) the sample proportions are:
          - Normally distributed
          - with mean π
          - and standard deviation √(π(1 − π)/n)
      - The 95% confidence interval is: p − 1.96 × √(p(1 − p)/n) to p + 1.96 × √(p(1 − p)/n)
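The interval for a proportion can be sketched in the same way. The sample below (120 successes out of 400) is hypothetical and is not taken from the slides; the function name is also an illustration.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Large-sample (Normal approximation) interval for a population proportion."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: 120 of 400 respondents have the attribute of interest
low, high = proportion_confidence_interval(120, 400)
print(round(low, 3), round(high, 3))
```

The Normal approximation is only appropriate for large samples, as the slide notes.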
  • 21. One-sided confidence interval
  • 22. Sampling distributions with small samples are no longer Normal
      - Small samples tend to include fewer outlying values and under-estimate the spread
      - This effect is allowed for in t-distributions
      - The shape of the t-distribution depends on the degrees of freedom
      - The number of degrees of freedom is essentially one less than the sample size
  • 24. Example
      - Duration of rentals is normally distributed.
      - A random sample of 14 has a mean of 2.1429 and variance of 1.6703.
      - Point estimate of population mean = 2.1429
      - Estimate of the standard error of the mean is √(1.6703/14) = 0.3454
      - For a t-distribution with 13 degrees of freedom, the 99% confidence interval extends 3.012 standard errors either side of the mean
      - Confidence interval is 2.1429 ± 3.012 × 0.3454, or 1.10 to 3.18
  • 25. Finding probabilities for the t-distribution
      - Calculations for t-distributions use:
          - Standard tables
          - Specialised statistical software
          - The TINV function in spreadsheets
      - For samples of more than about 30, the t-distribution is virtually identical to the Normal
  • 26. Student-t distribution (spreadsheet; cell formulas shown in brackets)
      Sample size 8; mean 37; standard deviation 12
      Part (a)
        Degrees of freedom: 7 [D3 - 1]
        Standard error: 4.536 [D5/SQRT(D3-1)]
        Confidence interval: 90
        Number of standard deviations: 1.895 [TINV((100-D11)/100,D8)]
        Confidence interval: from 28.407 [D4 - (D12*D9)] to 45.593 [D4 + (D12*D9)]
      Part (b)
        Confidence interval: 95
        Number of standard deviations: 2.365 [TINV((100-D17)/100,D8)]
        Confidence interval: from 26.275 [D4 - (D18*D9)] to 47.725 [D4 + (D18*D9)]
      Part (c)
        Sample size: 20
        Degrees of freedom: 19
        Standard error: 2.753 [D5/SQRT(D23-1)]
        Confidence interval: 95
        Number of standard deviations: 2.093 [TINV((100-D26)/100,D24)]
        Confidence interval: from 31.238 [D4 - (D27*D25)] to 42.762 [D4 + (D27*D25)]
      Part (d) Normal distribution
        Sample size: 20
        Standard error: 2.753 [D5/SQRT(D33-1)]
        Confidence interval: 95
        Number of standard deviations: 1.960 [NORMSINV((100-D35)/200), taken as a positive value]
        Confidence interval: from 31.604 [D4 - (D36*D34)] to 42.396 [D4 + (D36*D34)]
  • 28. Learning Objectives
      1. Understand the purpose of hypothesis testing
      2. List the steps involved in hypothesis testing
      3. Understand the errors involved and the use of significance levels
      4. Test hypotheses about population means
      5. Use one- and two-tail tests
      6. Extend these tests to deal with small samples
      7. Use the tests for a variety of problems
      8. Consider non-parametric tests, particularly the chi-squared test
  • 29. Hypothesis testing
      - Considers a simple statement about a population
      - This is the hypothesis
      - Uses a sample to test whether the statement is likely to be true or is unlikely
      - It sees whether or not data from a sample supports a hypothesis about the population
      - If the hypothesis is unlikely, it is rejected and another implied hypothesis must be true
  • 30. Procedure for hypothesis testing
      - Define a simple, precise statement about a population (the hypothesis)
      - Take a sample from the population
      - Test this sample to see if it supports the hypothesis, or if it makes the hypothesis highly improbable
      - If the hypothesis is highly improbable reject it, otherwise accept it
  • 31. Example
      - A politician claims that 10% of factories in an area are losing money (hypothesis)
      - A sample of 30 shows that all of them are profitable (test statistic)
      - If the hypothesis is true, the probability that they are all profitable is 0.9³⁰ = 0.0424 (p-value)
      - This is very unlikely, so we can reject the hypothesis (test result)
  • 32. Elements of a hypothesis test
      1. Hypothesis – a simple statement about a population
      2. Test result – actual result from the sample
      3. Test statistic – a calculation about a sample assuming that the hypothesis is true
      4. Conclusion – either reject the hypothesis or not
  • 33. Example
      1. Hypothesis – half of all staff have a degree: H₀: P = 0.5, H₁: P < 0.5
      2. Test result – in a sample of 10 staff, three have degrees
      3. Test statistic – the number with degrees is a binomial process. If the probability is 0.5, the probability of a value at least as extreme as 3 is: p = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.1719
      4. Conclusion – 17.19% is greater than 5%, so the test is not significant and there is no evidence to reject H₀
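The binomial p-value in the staff example can be checked directly. This is a sketch using only the standard library; `binomial_cdf` is a helper defined here, not a library function.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial distribution with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k + 1))

# H0: P = 0.5, H1: P < 0.5; three of the ten sampled staff have degrees
p_value = binomial_cdf(3, 10, 0.5)

significance_level = 0.05
reject_h0 = p_value < significance_level

print(round(p_value, 4), reject_h0)
```

Because the alternative hypothesis is one-sided (P < 0.5), only the lower tail P(x ≤ 3) is summed.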
  • 34. Because of uncertainty, we can never be certain of the results
      - Then we cannot really ‘accept’ a hypothesis, but instead we say that we ‘cannot reject’ it
      - If we reject the hypothesis (called the null hypothesis) we implicitly accept another hypothesis (called the alternative hypothesis)
  • 35. Uncertainty also means that we can make mistakes
  • 36. Type I and Type II errors
      - Ideally, both Type I and Type II errors would be small
      - In practice, the probability of a Type I error is the significance level
      - As this decreases, the probability of accepting a false null hypothesis increases
      - We have to balance the two types of error
  • 37. Significance level
      - Is the minimum acceptable probability that a value actually comes from the hypothesised population
      - When the probability is less than this, we reject the null hypothesis; when the probability is more than this we do not reject it
      - It is the maximum acceptable probability of making a Type I error
      - Usually 0.05, but other values are possible, notably 0.01
  • 39. Bringing ideas together gives the formal steps in a hypothesis test
      - State the null and alternative hypotheses
      - Specify the significance level
      - Calculate the acceptance range for the variable tested
      - Find the actual value for the variable tested
      - Decide whether or not to reject the null hypothesis
      - State the conclusion
  • 40. Example
      - The time a doctor spends with patients is N(7, 2²). A new doctor seems to work more slowly.
      - H₀: μ = 7, H₁: μ > 7
      - A sample of 56 patients takes 420 minutes, giving a sample mean of x̄ = 7.5
      - The probability of x̄ ≥ 7.5 (with Z = 0.5/(2/√56) = 1.87) is 0.03
      - 0.03 is less than 0.05, so the test is significant and we can reject H₀. There is evidence to support H₁ that the new doctor spends more than 7 minutes.
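The formal steps can be traced in code for the doctor example. A minimal sketch of a one-sided Z test, assuming the population standard deviation of 2 minutes is known; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

# Step 1: hypotheses H0: mu = 7, H1: mu > 7
mu_0, sigma = 7.0, 2.0

# Step 2: significance level
alpha = 0.05

# Steps 3-4: sample result and test statistic
n, total_minutes = 56, 420.0
x_bar = total_minutes / n
z = (x_bar - mu_0) / (sigma / sqrt(n))

# One-sided p-value: probability of a sample mean at least this large under H0
p_value = 1 - NormalDist().cdf(z)

# Steps 5-6: decision
reject_h0 = p_value < alpha
print(round(z, 2), round(p_value, 3), reject_h0)
```

A two-tail test of H₁: μ ≠ 7 would double the p-value, which would change the conclusion here.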
  • 41. There is a huge number of variations on the general procedure including:
      - One-sided tests
      - Tests with small samples
      - Tests for population proportions
      - Testing for differences in means
      - Paired tests
      - Goodness of fit
      - Tests of association
  • 43. Paired tests: Amethyst Interviews
      Data (original interviews, later interviews): (10, 10), (11, 9), (9, 11), (6, 10), (8, 9), (10, 12), (7, 9), (8, 11)
      t-Test: Paired Two Sample for Means
                                        Original interviews   Later interviews
        Mean                            8.625                 10.125
        Variance                        2.8393                1.2679
        Observations                    8                     8
        Hypothesised Mean Difference    0
        Degrees of freedom              7
        t Statistic                     2.291
        P(T<=t) one-tail                0.0279
        t Critical one-tail             1.8946
        P(T<=t) two-tail                0.0557
        t Critical two-tail             2.3646
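The t statistic in the Amethyst output can be reproduced from the paired differences. This sketch uses only the standard library, so the critical values are copied from the spreadsheet output above rather than computed.

```python
from math import sqrt
from statistics import mean, stdev

original = [10, 11, 9, 6, 8, 10, 7, 8]
later = [10, 9, 11, 10, 9, 12, 9, 11]

# A paired test works on the difference within each pair
differences = [b - a for a, b in zip(original, later)]
n = len(differences)

# t = mean difference / (sample standard deviation of differences / sqrt(n))
t_statistic = mean(differences) / (stdev(differences) / sqrt(n))
degrees_of_freedom = n - 1

# Critical values for 7 degrees of freedom at 5%, taken from the output above
t_critical_one_tail = 1.8946
t_critical_two_tail = 2.3646

significant_one_tail = t_statistic > t_critical_one_tail
significant_two_tail = t_statistic > t_critical_two_tail
print(round(t_statistic, 3), significant_one_tail, significant_two_tail)
```

The result is significant in a one-tail test but not in a two-tail test, which agrees with the two p-values (0.0279 and 0.0557) in the output.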
  • 44. Non-parametric tests are used when there is no appropriate parameter to measure
      - Then we have to use a non-parametric (or distribution-free) test
      - The most common is the chi-squared test, where χ² = ∑ (O − E)² / E
      - The shape of the curve depends on the degrees of freedom (the number of classes minus the number of estimated variables minus one)
  • 46. Goodness of fit test
      - We reject the hypothesis that the data follows a specified distribution when the calculated value is greater than a critical value for the distribution
      - The critical value is found in:
          - Standard tables
          - Statistical software
          - The CHIINV function in spreadsheets
  • 48. Example (continued)
      - The probability of this is given by a χ² distribution with c − 1 degrees of freedom (c is the number of categories)
      - This comes from standard tables or software
      - In this case the probability that χ² > 66.35 is 0.000 (to three decimal places)
      - This probability is less than 0.05 so we reject the null hypothesis that the observations come from the stated distribution
  • 49. Example (continued)
      - There are five categories so we need the χ² distribution with 5 − 1 = 4 degrees of freedom, and then P(χ² > 11.389) = 0.0225
      - This is less than 0.05 so we conclude that there is evidence that the probability distribution differs from the hypothesised one
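The probability for 4 degrees of freedom can be checked without tables. The sketch below relies on the closed form of the χ² upper tail that holds only when the degrees of freedom are even; the function name is an illustration, and statistical software would normally be used for the general case.

```python
from math import exp, factorial

def chi2_upper_tail_even_df(x, df):
    """P(chi-squared > x) for an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{j < df/2} (x/2)^j / j!,
    which is valid only when df is a positive even integer.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even df")
    half = x / 2
    return exp(-half) * sum(half**j / factorial(j) for j in range(df // 2))

# Five categories give 5 - 1 = 4 degrees of freedom
p_value = chi2_upper_tail_even_df(11.389, 4)
print(round(p_value, 4), p_value < 0.05)
```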
  • 50. Tests of association
      - Test for relationships between variables described in a contingency table
  • 51. Contingency tables continued
      - The expected value in each cell is: E = (row total × column total) / number of observations
      - Then we calculate χ² as usual and look up the related probability with degrees of freedom: (number of rows − 1) × (number of columns − 1) = (2 − 1) × (3 − 1) = 2
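The expected values and χ² statistic for a contingency table can be sketched as follows. The 2 × 3 table of observed counts is hypothetical, since the slide's own table is not reproduced in this transcript.

```python
from math import exp

# Hypothetical observed counts: 2 rows by 3 columns
observed = [
    [20, 30, 50],
    [30, 30, 40],
]

row_totals = [sum(row) for row in observed]
column_totals = [sum(col) for col in zip(*observed)]
n = sum(row_totals)

# E = row total x column total / number of observations
expected = [[r * c / n for c in column_totals] for r in row_totals]

# Chi-squared = sum of (O - E)^2 / E over every cell
chi_squared = sum(
    (o - e) ** 2 / e
    for obs_row, exp_row in zip(observed, expected)
    for o, e in zip(obs_row, exp_row)
)

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

# With exactly 2 degrees of freedom the upper tail is exp(-x/2)
p_value = exp(-chi_squared / 2)
print(expected, round(chi_squared, 3), degrees_of_freedom, round(p_value, 3))
```

The `exp(-x/2)` shortcut applies only to 2 degrees of freedom; other table sizes need standard tables or software.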
  • 54. Learning Objectives
      1. Understand the purpose of regression
      2. See how the strength of a relationship is related to the amount of noise
      3. Measure the errors introduced by noise
      4. Use linear regression to find the line of best fit through a set of data
      5. Use this line of best fit for causal forecasting
      6. Calculate and interpret coefficients of determination and correlation
      7. Use Spearman's coefficient of rank correlation
      8. Understand the results of multiple regression
      9. Use curve fitting for more complex functions
  • 55. Regression and curve fitting look for relationships between variables
      - There are two basic questions:
          - Finding the best relationship, which is regression
          - Seeing how well the relationship fits the data, which is measured by correlation and determination
  • 56. The most common approach is linear regression
      - This draws the straight line of best fit through a set of data
      - The general procedure is:
          - Draw a scatter diagram
          - Identify a linear relationship
          - Find the line of best fit through the data
          - Use this line to predict a value for the dependent variable from a known value of the independent variable
  • 58. [Figure: electricity consumption against temperature, showing the underlying trend, some noise and more noise]
  • 59. To find the line of best fit
      - We have to find values for the constants a and b in the equation y = a + bx
      - There is noise (or a deviation or error) in every observation
      - The ‘best’ line is defined as the one that minimises the mean squared error
      - There is a calculation for a and b, but standard software is more reliable:
          b = (n∑xy − ∑x∑y) / (n∑x² − (∑x)²)
          a = ȳ − b x̄
  • 61. Quality control at Olfentia Travel
      Data
        Checks      0   1   2   3   4   5   6   7   8   9  10
        Mistakes   92  86  81  72  67  59  53  43  32  24  12
      SUMMARY OUTPUT
      Regression Statistics
        Multiple R            0.994
        R Square              0.988
        Adjusted R Square     0.986
        Standard Error        3.077
        Observations          11
      ANOVA
                      df    SS          MS
        Regression    1     6833.536    6833.536
        Residual      9     85.191      9.466
        Total         10    6918.727
                        Coefficients    Standard Error
        Intercept       95.864          1.735
        X Variable 1    -7.882          0.293
      [Figure: number of mistakes against number of checks]
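The formulas for a and b on slide 59 can be checked against the Olfentia Travel output above. This is a sketch in plain Python; in practice a spreadsheet or statistics package would be used, as the slides recommend.

```python
from math import sqrt

checks = list(range(11))                                   # x: number of checks
mistakes = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]    # y: number of mistakes

n = len(checks)
sum_x = sum(checks)
sum_y = sum(mistakes)
sum_xy = sum(x * y for x, y in zip(checks, mistakes))
sum_x2 = sum(x * x for x in checks)
sum_y2 = sum(y * y for y in mistakes)

# Slope and intercept of the least-squares line y = a + bx
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

# Coefficient of correlation; its square is the coefficient of determination
r = (n * sum_xy - sum_x * sum_y) / sqrt(
    (n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2)
)

print(round(a, 3), round(b, 3), round(r ** 2, 3))
```

The negative slope means that r is negative as well, although the spreadsheet reports Multiple R as a positive value.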
  • 62. The strength of a relationship is measured by
      - Coefficient of determination
          - Shows how much of the total sum of squared errors is explained by the regression
          - Takes a value between zero and +1
      - Coefficient of correlation
          - Asks how closely x and y are linearly related
          - Is the square root of the coefficient of determination
          - Takes a value between −1 and +1
      - Spearman's rank correlation
          - For ordinal data
  • 64. [Figure 9.12: Interpreting the coefficient of correlation. Panels: r = +1, perfect positive correlation; r close to 1, line of good fit; r decreasing with worse fit of line; r = 0, random points with no correlation; r close to −1, line of good fit; r = −1, perfect negative correlation]
  • 65. Correlation and determination
      Data (x, y): (4, 13), (17, 47), (3, 24), (21, 41), (10, 29), (8, 33), (4, 28), (9, 38), (13, 46), (12, 32), (2, 14), (6, 22), (15, 26), (8, 21), (19, 50)
      SUMMARY OUTPUT
      Regression Statistics
        Multiple R            0.797
        R Square              0.635
        Adjusted R Square     0.607
        Standard Error        7.261
        Observations          15
      ANOVA
                      df    SS
        Regression    1     1191.630
        Residual      13    685.304
        Total         14    1876.933
                        Coefficients
        Intercept       15.376
        X Variable 1    1.545
  • 66. Rank correlation
      Service            V   W   X   Y   Z
      Quality ranking    2   5   1   3   4
      Cost ranking       1   3   2   4   5
      r_s = 1 − 6∑D² / (n(n² − 1)) = 1 − (6 × 8) / (5 × (25 − 1)) = 0.6
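Spearman's coefficient for the five services can be computed directly from the rank differences. A minimal sketch, assuming there are no tied ranks (the formula above does not allow for ties):

```python
services = ["V", "W", "X", "Y", "Z"]
quality_rank = [2, 5, 1, 3, 4]
cost_rank = [1, 3, 2, 4, 5]

n = len(services)

# D is the difference between the two rankings of each service
sum_d_squared = sum((q - c) ** 2 for q, c in zip(quality_rank, cost_rank))

# Spearman's coefficient of rank correlation
r_s = 1 - 6 * sum_d_squared / (n * (n ** 2 - 1))

print(sum_d_squared, round(r_s, 2))
```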
  • 67. Multiple (linear) regression
      - Extends the principles of linear regression to more independent variables: y = a + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄ + …
      - The calculations for this are always done by standard software
      - You have to be careful with the requirements (technically called multicollinearity, autocorrelation, extrapolation, etc.)
      - You also have to be careful with the interpretation of results
  • 68. Multiple Regression
      DATA
        Sales    Advertising    Price
        2450     100            50
        3010     130            56
        3090     160            45
        3700     190            63
        3550     210            48
        4280     240            70
      SUMMARY OUTPUT
      Regression Statistics
        Multiple R            0.996
        R Square              0.992
        Adjusted R Square     0.986
        Standard Error        75.055
        Observations          6
      ANOVA
                      df    SS
        Regression    2     2003633.452
        Residual      3     16899.882
        Total         5     2020533.333
                        Coefficients    Standard Error
        Intercept       585.96          195.57
        X Variable 1    9.92            0.76
        X Variable 2    19.11           4.13
  • 71. Multiple regression
      Data
        Production    Shifts    Bonus    Overtime    Morale
        2810          6         15       8           5
        2620          3         20       10          6
        3080          3         5        22          3
        4200          4         5        31          2
        1500          1         7        9           8
        3160          2         12       22          10
        4680          2         25       30          7
        2330          7         10       5           7
        1780          1         12       7           5
        3910          8         3        20          3
      Correlations
                      Production    Shifts    Bonus     Overtime    Morale
        Production    1
        Shifts        0.262         1
        Bonus         0.153         -0.315    1
        Overtime      0.878         -0.108    -0.022    1
        Morale        -0.332        -0.395    0.451     -0.265      1
      Regression
        Multiple R            0.997
        R Square              0.995
        Adjusted R Square     0.990
        Standard Error        100.807
        Observations          10
      ANOVA
                      df    SS
        Regression    4     9439000
        Residual      5     50810
        Total         9     9489810
                        Coefficients    Standard Error
        Intercept       346.33          160.22
        X Variable 1    181.80          15.25
        X Variable 2    50.13           5.44
        X Variable 3    96.17           3.69
        X Variable 4    -28.70          16.76
  • 72. Curve fitting
      - Is a more general term than regression
      - It refers to the process of fitting different types of curve through sets of data
      - Typically this involves:
          - Exponential curves
          - Growth curves
          - Polynomials
  • 73. Curve fitting
      Line of fit: y = b × m^x, with b = 0.6541 and m = 1.2475
        Year    Cost    Predictions
        1       0.8     0.816
        2       1       1.018
        3       1.3     1.270
        4       1.7     1.584
        5       2       1.976
        6       2.4     2.465
        7       2.9     3.075
        8       3.8     3.836
        9       4.7     4.786
        10      6.2     5.970
        11      7.5     7.448
        12              9.291
        13              11.591
        14              14.459
      [Figure: cost and fitted curve against year]
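One way to obtain the constants b and m is to fit a straight line to the logarithm of cost, since ln y = ln b + x ln m. The sketch below assumes this log-linear least-squares approach; the slide does not say how its values were produced, and the variable and function names are illustrative.

```python
from math import exp, log

years = list(range(1, 12))
costs = [0.8, 1.0, 1.3, 1.7, 2.0, 2.4, 2.9, 3.8, 4.7, 6.2, 7.5]

# Fit ln(y) = ln(b) + x * ln(m) by ordinary least squares
log_costs = [log(y) for y in costs]
n = len(years)
sum_x = sum(years)
sum_ly = sum(log_costs)
sum_xly = sum(x * ly for x, ly in zip(years, log_costs))
sum_x2 = sum(x * x for x in years)

slope = (n * sum_xly - sum_x * sum_ly) / (n * sum_x2 - sum_x ** 2)
intercept = sum_ly / n - slope * sum_x / n

# Transform back to the constants of y = b * m^x
b = exp(intercept)
m = exp(slope)

def predict(year):
    """Forecast cost for a given year from the fitted exponential curve."""
    return b * m ** year

print(round(b, 4), round(m, 4), [round(predict(y), 3) for y in (12, 13, 14)])
```

Fitting on the log scale minimises squared errors in ln y rather than in y, so a direct non-linear fit could give slightly different constants.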
  • 74. Further Reading
      - Waters, Donald (2011) Quantitative Methods for Business, Prentice Hall, 5th Edition
      - Evans, J.R. (2013) Business Analytics, Pearson, 1st Edition
      - Render, B., Stair Jr., R.M. & Hanna, M.E. (2013) Quantitative Analysis for Management, Pearson, 11th Edition