Tbs910 sampling hypothesis regression

Sampling, Hypothesis & Regression
TBS910 BUSINESS ANALYTICS
by
Prof. Stephen Ong
Visiting Professor, Shenzhen University
Visiting Fellow, Sydney Business School, University of Wollongong

Today’s Overview

Topics
1. Using Samples
2. Hypothesis Testing
3. Regression

USING SAMPLES

Learning Objectives:
1. Understand how and why to use sampling
2. Appreciate the aims of statistical inference
3. Use sampling distributions to find point estimates for population means
4. Calculate confidence intervals for means and proportions
5. Use one-sided distributions
6. Use t-distributions for small samples

Collecting data
• There are essentially two types of data collection
  • Census
  • Sample
• A census is often expensive, infeasible, impossible or unnecessary
• The usual choice is to collect a sample

Sampling is the basis of statistical inference
• This collects data from a random sample of the population
• It uses this data to estimate features of the whole population

Example
• The percentages of glucose in five bars of toffee are 7.2, 6.4, 7.2, 8.0 and 8.2
• The mean of these is: ∑x / n = 37/5 = 7.4
• This is our estimate of the population mean
• The variance is (∑x² − (∑x)²/n) / (n − 1) = 0.52
• This is our estimate of the population variance
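As a quick check of the toffee example, the following minimal sketch recomputes the sample mean and the sample variance (with the n − 1 divisor) using Python's standard `statistics` module. The variable names are illustrative only.

```python
import statistics

# Percentages of glucose in the five toffee bars from the example
glucose = [7.2, 6.4, 7.2, 8.0, 8.2]

# Point estimate of the population mean: sum(x) / n
sample_mean = statistics.mean(glucose)

# Point estimate of the population variance, using the n - 1 divisor
sample_variance = statistics.variance(glucose)

print(round(sample_mean, 2), round(sample_variance, 2))  # 7.4 0.52
```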

Sampling fluctuations
• In practice, each sample will be slightly different
• The variation is described by a sampling distribution

Sampling distributions
• Data collected from samples gives a sampling distribution, such as the sampling distribution of the mean
• This has the features:
  • If a population is Normally distributed, the sampling distribution of the mean is also Normally distributed
  • With large samples the sampling distribution of the mean is approximately Normally distributed regardless of the population distribution
  • The sampling distribution of the mean has a mean μ and standard deviation σ/√n

Example
• Chocolate bars have a mean weight of 50g and standard deviation of 10g. A sample of 64 bars is taken and if the mean weight is less than 46g the whole day's production is scrapped.
• A large sample gives a sampling distribution with mean 50g and standard deviation 10/√64 = 1.25g
• Z = (46 − 50)/1.25 = −3.2
• P(Z < −3.2) = 0.0007
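The chocolate bar calculation can be sketched with `statistics.NormalDist` from the Python standard library. This is only an illustration of the arithmetic in the example; the variable names are assumptions.

```python
from math import sqrt
from statistics import NormalDist

mu, sigma, n = 50.0, 10.0, 64   # population mean, standard deviation, sample size
threshold = 46.0                # production is scrapped if the sample mean is below this

standard_error = sigma / sqrt(n)          # 10 / 8 = 1.25
z = (threshold - mu) / standard_error     # -3.2
p_scrap = NormalDist().cdf(z)             # P(Z < -3.2)

print(z, round(p_scrap, 4))  # -3.2 0.0007
```

A probability is never negative, so the result is 0.0007: about 7 days in 10,000 would be scrapped by chance alone.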

• Sampling distributions can give point estimates of, say, a mean
• Interval estimates are more useful
• A 95% confidence interval, say, defines the range that we are 95% confident the population mean lies within
• 90% confidence interval: x̄ − 1.645σ/√n to x̄ + 1.645σ/√n

• The best point estimate for a population mean is the mean of the sampling distribution
• The standard deviation of the sampling distribution (the standard error) is estimated from the sample standard deviation s as s/√n
• This estimate can be biased for small samples
• Then Bessel’s correction uses s/√(n − 1) rather than s/√n

Example
• Attendance is normally distributed with variance of 225. Samples on five days give:
  220  196  210  186  222
• The point estimate of the mean is ∑x/5 = 206.8
• 95% confidence interval is: 206.8 ± 1.96√(225/5) = 206.8 ± 13.15
• For 95% of samples, an interval built this way (here 193.65 to 219.95) would include the population mean
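The attendance interval can be reproduced as follows. This is a minimal sketch that takes the 95% z-value from `NormalDist().inv_cdf(0.975)` (about 1.96) rather than from tables; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist, mean

attendance = [220, 196, 210, 186, 222]
population_variance = 225

x_bar = mean(attendance)                     # 206.8
z = NormalDist().inv_cdf(0.975)              # about 1.96 for a two-sided 95% interval
half_width = z * sqrt(population_variance / len(attendance))

lower, upper = x_bar - half_width, x_bar + half_width
print(round(lower, 2), round(upper, 2))  # 193.65 219.95
```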

Example
• Investment returns are normally distributed with standard deviation 0.5. A random sample of 10 has a mean of 0.9%.
• Point estimate of population mean is 0.9%

The principles of sampling can be extended in many ways
• Population proportions, where sampling distributions are:
  • Normally distributed
  • with mean π
  • and standard deviation √(π(1 − π)/n)
• One-sided confidence intervals
• Small samples

Population proportions
• With large samples (more than 30) the sample proportions are:
  • Normally distributed
  • with mean π
  • standard deviation √(π(1 − π)/n)
• The 95% confidence interval is:
  p − 1.96 × √(p(1 − p)/n) to p + 1.96 × √(p(1 − p)/n)
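The interval for a proportion can be sketched as a small function. The function name and the sample figures (40 of 100) are hypothetical and chosen only to illustrate the formula; they do not come from the slides.

```python
from math import sqrt
from statistics import NormalDist

def proportion_confidence_interval(successes, n, confidence=0.95):
    """Normal-approximation interval for a population proportion (large samples)."""
    p = successes / n
    z = NormalDist().inv_cdf(0.5 + confidence / 2)   # about 1.96 for 95%
    half_width = z * sqrt(p * (1 - p) / n)
    return p - half_width, p + half_width

# Hypothetical sample: 40 of 100 items have the attribute of interest
low, high = proportion_confidence_interval(40, 100)
print(round(low, 3), round(high, 3))  # 0.304 0.496
```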

One-sided confidence interval

Sampling distributions with small samples are no longer Normal
• Small samples tend to include fewer outlying values and under-estimate the spread
• This effect is allowed for in t-distributions
• The shape of the t-distribution depends on the degrees of freedom
• The number of degrees of freedom is essentially one less than the sample size

Example
• Duration of rentals is normally distributed.
• A random sample of 14 has a mean of 2.1429 and variance of 1.6703.
• Point estimate of population mean = 2.1429
• Estimated standard error of the mean is √(1.6703/14) = 0.3454
• For a t-distribution with 13 degrees of freedom, the critical value for a 99% confidence interval is 3.012
• Confidence interval is 2.1429 ± 3.012 × 0.3454, or 1.10 to 3.18
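The rental example can be checked with a few lines of arithmetic. The Python standard library has no t-distribution, so this sketch simply takes the critical value 3.012 from the example as given; the variable names are illustrative.

```python
from math import sqrt

n = 14
sample_mean = 2.1429
sample_variance = 1.6703
t_critical = 3.012   # 99% confidence, 13 degrees of freedom (value taken from the example)

standard_error = sqrt(sample_variance / n)   # about 0.3454
half_width = t_critical * standard_error

lower, upper = sample_mean - half_width, sample_mean + half_width
print(round(standard_error, 4), round(lower, 2), round(upper, 2))  # 0.3454 1.1 3.18
```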

Finding probabilities for the t-distribution
• Calculations for t-distributions use:
  • Standard tables
  • Specialised statistical software
  • The TINV function in spreadsheets
• For samples of more than about 30, the t-distribution is virtually identical to the Normal

Student-t distribution

                                Value     Spreadsheet formula
Sample size                     8
Mean                            37
Standard deviation              12

Part (a)
Degrees of freedom              7         D3 - 1
Standard error                  4.536     D5/SQRT(D3-1)
Confidence interval             90
Number of standard deviations   1.895     TINV((100-D11/100,D)
Confidence interval, from       28.407    D4 - (D12*D9)
Confidence interval, to         45.593    D4 + (D12*D9)

Part (b)
Number of standard deviations   2.365     TINV((100-D17/100,D8)
Confidence interval, to         47.725    D4 + (D18*D9)

Part (c)
Sample size                     20
Degrees of freedom              19
Number of standard deviations   2.093     TINV((100-D26/100,D24)
Confidence interval, to         42.762    D4 + (D27*D25)

Part (d) Normal distribution
Sample size                     20
Number of standard deviations   1.960     NORMSINV((100-D35/200)
Confidence interval, to         42.396    D4 + (D36*D34)

TESTING HYPOTHESIS

Learning Objectives
1. Understand the purpose of hypothesis testing
2. List the steps involved in hypothesis testing
3. Understand the errors involved and the use of significance levels
4. Test hypotheses about population means
5. Use one- and two-tail tests
6. Extend these tests to deal with small samples
7. Use the tests for a variety of problems
8. Consider non-parametric tests, particularly the chi-squared test

Hypothesis testing
• Considers a simple statement about a population
• This is the hypothesis
• Uses a sample to test whether the statement is likely to be true or is unlikely
• It sees whether or not data from a sample supports a hypothesis about the population
• If the hypothesis is unlikely, the hypothesis is rejected and another implied hypothesis must be true

Procedure for hypothesis testing
• Define a simple, precise statement about a population (the hypothesis)
• Take a sample from the population
• Test this sample to see if it supports the hypothesis, or if it makes the hypothesis highly improbable
• If the hypothesis is highly improbable reject it, otherwise accept it

Example
• A politician claims that 10% of factories in an area are losing money (hypothesis)
• A sample of 30 shows that all of them are profitable (test statistic)
• If the hypothesis is true, the probability that they are all profitable is 0.9³⁰ = 0.0424 (p-value)
• This is very unlikely, so we can reject the hypothesis (test result)
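The p-value in the factory example is a single power calculation, sketched below. The 0.05 threshold used for the comparison is the usual significance level discussed later in the deck; the variable names are illustrative.

```python
p_losing = 0.10      # claimed proportion of factories losing money
sample_size = 30

# Probability that every factory in the sample is profitable if the claim is true
p_all_profitable = (1 - p_losing) ** sample_size

print(round(p_all_profitable, 4))   # 0.0424
print(p_all_profitable < 0.05)      # True, so the claim is rejected at the 5% level
```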

Elements of a hypothesis test
1. Hypothesis – a simple statement about a population
2. Test result – actual result from the sample
3. Test statistic – a calculation about a sample assuming that the hypothesis is true
4. Conclusion – either reject the hypothesis or not

Example
1. Hypothesis – half of all staff have a degree
   H0: P = 0.5   H1: P < 0.5
2. Test result – in a sample of 10 staff, three have degrees
3. Test statistic – the number with degrees follows a binomial distribution. If the probability is 0.5, the probability of a value of 3 or fewer is:
   p = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.1719
4. Conclusion – 17.19% is greater than 5%, so the test is not significant and there is no evidence to reject H0
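The binomial tail probability in this example can be computed directly with `math.comb`. The helper name `binomial_cdf` is an assumption made for this sketch.

```python
from math import comb

def binomial_cdf(k, n, p):
    """P(X <= k) for a binomial distribution with n trials and success probability p."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

p_value = binomial_cdf(3, 10, 0.5)
print(p_value)           # 0.171875, i.e. about 17.19%
print(p_value < 0.05)    # False, so H0 is not rejected
```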

Because of uncertainty, we can never be certain of the results
• Then we cannot really ‘accept’ a hypothesis, but instead we say that we ‘cannot reject’ it
• If we reject the hypothesis – called the null hypothesis – we implicitly accept another hypothesis – called the alternative hypothesis

Uncertainty also means that we can make mistakes

Type I and Type II errors
• Ideally, both Type I and Type II errors would be small
• In practice, the probability of a Type I error (rejecting a true null hypothesis) is the significance level
• As this decreases, the probability of accepting a false null hypothesis (a Type II error) increases
• We have to balance the two types of error

Significance level
• Is the minimum acceptable probability that a value actually comes from the hypothesised population
• When the probability is less than this, we reject the null hypothesis; when the probability is more than this we do not reject it
• It is the maximum acceptable probability of making a Type I error
• Usually 0.05, but other values are possible, notably 0.01

Bringing ideas together gives the formal steps in a hypothesis test
• State the null and alternative hypotheses
• Specify the significance level
• Calculate the acceptance range for the variable tested
• Find the actual value for the variable tested
• Decide whether or not to reject the null hypothesis
• State the conclusion

Example
• The time a doctor spends with patients is N(7, 2²). A new doctor seems to work more slowly.
• H0: μ = 7   H1: μ > 7
• A sample of 56 patients takes 420 minutes, giving a sample mean x̄ = 420/56 = 7.5
• Probability of x̄ ≥ 7.5 (with Z = 0.5/(2/√56) = 1.87) is 0.03
• 0.03 is less than 0.05, so the test is significant and we can reject H0. There is evidence to support H1 that the new doctor spends more than 7 minutes
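The formal steps can be traced through the doctor example with the standard library. This is a minimal sketch of a one-tail z-test using the figures from the example; the variable names are illustrative.

```python
from math import sqrt
from statistics import NormalDist

mu_0, sigma = 7.0, 2.0         # hypothesised mean and known standard deviation (minutes)
n, total_minutes = 56, 420
significance_level = 0.05

x_bar = total_minutes / n                   # 7.5
z = (x_bar - mu_0) / (sigma / sqrt(n))      # about 1.87
p_value = 1 - NormalDist().cdf(z)           # one-tail probability P(Z >= z)

print(round(z, 2), round(p_value, 2))   # 1.87 0.03
print(p_value < significance_level)     # True, so H0 is rejected
```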

There is a huge number of variations on the general procedure including:
• One-sided tests
• Tests with small samples
• Tests for population proportions
• Testing for differences in means
• Paired tests
• Goodness of fit
• Tests of association

Paired tests
Amethyst Interviews

Data
Original interviews   Later interviews
10                    10
11                    9
9                     11
6                     10
8                     9
10                    12
7                     9
8                     11

t-Test: Paired Two Sample for Means
                               Original interviews   Later interviews
Mean                           8.625                 10.125
Variance                       2.8393                1.2679
Observations                   8                     8
Hypothesised Mean Difference   0
Degrees of freedom             7
t Statistic                    2.291
P(T<=t) one-tail               0.0279
t Critical one-tail            1.8946
P(T<=t) two-tail               0.0557
t Critical two-tail            2.3646
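The t statistic in the Amethyst Interviews output can be recomputed from the paired data. This sketch takes the difference as later minus original, which is an assumption about the sign convention, and uses the one-tail critical value 1.8946 from the output rather than computing it.

```python
from math import sqrt
from statistics import mean, stdev

original = [10, 11, 9, 6, 8, 10, 7, 8]
later = [10, 9, 11, 10, 9, 12, 9, 11]

# Paired test: work with the difference for each interviewee
differences = [b - a for a, b in zip(original, later)]
n = len(differences)

t_statistic = mean(differences) / (stdev(differences) / sqrt(n))
t_critical_one_tail = 1.8946   # 7 degrees of freedom, 5% level (from the output)

print(round(t_statistic, 3))               # 2.291
print(t_statistic > t_critical_one_tail)   # True
```

The statistic exceeds the one-tail critical value but not the two-tail value of 2.3646, which matches the two p-values in the output falling either side of 0.05.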

Non-parametric tests are used when there is no appropriate parameter to measure
• Then we have to use a non-parametric – or distribution-free – test
• The most common is the chi-squared test, where
  $\chi^2 = \sum \frac{(O - E)^2}{E}$
  (O is the observed frequency and E the expected frequency in each class)
• The shape of the curve depends on the degrees of freedom (the number of classes minus the number of estimated variables minus one)
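The chi-squared statistic is a sum over classes of (O − E)²/E, which the following sketch computes. The observed and expected counts are hypothetical, invented purely to show the arithmetic, and no variables are estimated from the data.

```python
def chi_squared(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

# Hypothetical counts: 100 observations over five classes, expected to be equally likely
observed = [18, 22, 20, 25, 15]
expected = [20, 20, 20, 20, 20]

statistic = chi_squared(observed, expected)
degrees_of_freedom = len(observed) - 0 - 1   # classes - estimated variables - 1

print(round(statistic, 2), degrees_of_freedom)  # 2.9 4
```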

Goodness of fit test
• We reject the hypothesis that the data follows a specified distribution when the calculated value is greater than a critical value for the distribution
• The critical value is found in
  • Standard tables
  • Statistical software
  • The CHIINV function in spreadsheets

Example (continued)
• The probability of this is given by a χ² distribution with c − 1 degrees of freedom (c is the number of categories)
• This comes from standard tables or software
• In this case the probability that χ² > 66.35 is 0.000 (to three decimal places)
• This probability is less than 0.05 so we reject the null hypothesis that the observations come from the stated distribution

Example (continued)
There are five categories so we need the χ² distribution with 5 – 1 = 4 degrees of freedom, and then P(χ² > 11.389) = 0.0225. This is less than 0.05 so we conclude that there is evidence that the probability distribution differs from the hypothesised one.
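The probability P(χ² > 11.389) with 4 degrees of freedom can be checked without tables. This sketch uses a closed form for the upper tail that holds only when the number of degrees of freedom is even; the function name is an assumption, and a statistics package would be the general-purpose choice.

```python
from math import exp, factorial

def chi_squared_upper_tail(x, df):
    """P(chi-squared > x); this closed form is valid only for even degrees of freedom."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this sketch handles even, positive degrees of freedom only")
    half = x / 2
    return exp(-half) * sum(half**k / factorial(k) for k in range(df // 2))

p_value = chi_squared_upper_tail(11.389, 4)
print(round(p_value, 4))   # 0.0225
print(p_value < 0.05)      # True, so the hypothesised distribution is rejected
```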

Tests of association
• Test for relationships between variables described in a contingency table

Contingency tables continued
• The expected value in each cell is:
  E = (row total × column total) / number of observations
• Then we calculate χ² as usual and look up the related probability with degrees of freedom:
  (number of rows − 1) × (number of columns − 1) = (2 − 1) × (3 − 1) = 2
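The expected values and degrees of freedom for a contingency table can be sketched as below. The 2 × 3 table of observed counts is hypothetical, since the original table is not reproduced in the text, and the function name is an assumption.

```python
def expected_counts(table):
    """Expected value for each cell: row total x column total / number of observations."""
    row_totals = [sum(row) for row in table]
    column_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    return [[r * c / total for c in column_totals] for r in row_totals]

# Hypothetical 2 x 3 contingency table of observed counts
observed = [[10, 20, 30],
            [20, 20, 20]]

expected = expected_counts(observed)
chi_squared = sum((o - e) ** 2 / e
                  for obs_row, exp_row in zip(observed, expected)
                  for o, e in zip(obs_row, exp_row))
degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)

print(expected)                                   # [[15.0, 20.0, 25.0], [15.0, 20.0, 25.0]]
print(round(chi_squared, 2), degrees_of_freedom)  # 5.33 2
```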

Learning Objectives
1. Understand the purpose of regression
2. See how the strength of a relationship is related to the amount of noise
3. Measure the errors introduced by noise
4. Use linear regression to find the line of best fit through a set of data
5. Use this line of best fit for causal forecasting
6. Calculate and interpret coefficients of determination and correlation
7. Use Spearman’s coefficient of rank correlation
8. Understand the results of multiple regression
9. Use curve fitting for more complex functions

Regression and curve fitting look for relationships between variables
• There are two basic questions:
  • Finding the best relationship, which is regression
  • Seeing how well the relationship fits the data, which is measured by correlation and determination

The most common approach is linear regression
• This draws the straight line of best fit through a set of data
• The general procedure is:
  • Draw a scatter diagram
  • Identify a linear relationship
  • Find the line of best fit through the data
  • Use this line to predict a value for the dependent variable from a known value of the independent variable

[Figure: electricity consumption plotted against temperature, showing the underlying trend, the trend with some noise, and the trend with more noise]

To find the line of best fit
• We have to find values for the constants a and b in the equation y = a + bx
• There is noise – or a deviation or error – in every observation
• The ‘best’ line is defined as the one that minimises the mean squared error
• There is a calculation for a and b, but standard software is more reliable

  $b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - (\sum x)^2}, \qquad a = \bar{y} - b\bar{x}$

Quality control at Olfentia Travel

Checks   Mistakes
0        92
1        86
2        81
3        72
4        67
5        59
6        53
7        43
8        32
9        24
10       12

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.994
R Square            0.988
Adjusted R Square   0.986
Standard Error      3.077
Observations        11

ANOVA
             df   SS         MS
Regression   1    6833.536   6833.536
Residual     9    85.191     9.466
Total        10   6918.727

               Coefficients   Standard Error
Intercept      95.864         1.735
X Variable 1   -7.882         0.293

[Figure: number of mistakes plotted against number of checks]
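The intercept and slope in the Olfentia Travel output can be reproduced from the least-squares calculation for y = a + bx. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
checks = list(range(11))                                   # 0 to 10 checks
mistakes = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]

n = len(checks)
sum_x, sum_y = sum(checks), sum(mistakes)
sum_xy = sum(x * y for x, y in zip(checks, mistakes))
sum_x2 = sum(x * x for x in checks)

# Least-squares slope and intercept for y = a + bx
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x**2)
a = sum_y / n - b * sum_x / n

print(round(a, 3), round(b, 3))  # 95.864 -7.882
```

Each extra check is associated with roughly eight fewer mistakes, in line with the "X Variable 1" coefficient in the summary output.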

The strength of a relationship is measured by
• Coefficient of determination
  • Which sees how much of the total sum of squared errors is explained by the regression
  • Takes a value between zero and +1
• Coefficient of correlation
  • Which asks how closely x and y are linearly related
  • Is the square root of the coefficient of determination
  • Takes a value between −1 and +1
• Spearman’s rank correlation
  • For ordinal data

Figure 9.12 Interpreting the coefficient of correlation
[Scatter diagram panels: r = +1, perfect positive correlation; r close to 1, line of good fit; r decreasing with worse fit of line; r = 0, random points with no correlation; r close to −1, line of good fit; r = −1, perfect negative correlation]

Correlation and determination

x    y
4    13
17   47
3    24
21   41
10   29
8    33
4    28
9    38
13   46
12   32
2    14
6    22
15   26
8    21
19   50

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.797
R Square            0.635
Adjusted R Square   0.607
Standard Error      7.261
Observations        15

ANOVA
             df   SS
Regression   1    1191.630
Residual     13   685.304
Total        14   1876.933

               Coefficients
Intercept      15.376
X Variable 1   1.545
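The "Multiple R" and "R Square" figures in this output are the coefficients of correlation and determination, and both can be recomputed from the 15 data pairs. This is a minimal sketch in plain Python; the variable names are illustrative.

```python
from math import sqrt

x = [4, 17, 3, 21, 10, 8, 4, 9, 13, 12, 2, 6, 15, 8, 19]
y = [13, 47, 24, 41, 29, 33, 28, 38, 46, 32, 14, 22, 26, 21, 50]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n

s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)   # total sum of squares

r = s_xy / sqrt(s_xx * s_yy)   # coefficient of correlation
r_squared = r**2               # coefficient of determination

print(round(r, 3), round(r_squared, 3))  # 0.797 0.635
```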

Rank correlation

                  Service
                  V   W   X   Y   Z
Quality ranking   2   5   1   3   4
Cost ranking      1   3   2   4   5

$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5 \times (25 - 1)} = 0.6$
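Spearman's coefficient for the five services can be checked in a few lines. This sketch assumes, as in the example, that there are no tied ranks; the variable names are illustrative.

```python
quality_rank = [2, 5, 1, 3, 4]   # services V, W, X, Y, Z
cost_rank = [1, 3, 2, 4, 5]

n = len(quality_rank)

# D is the difference between the two rankings for each service
sum_d_squared = sum((q - c) ** 2 for q, c in zip(quality_rank, cost_rank))

r_s = 1 - 6 * sum_d_squared / (n * (n**2 - 1))
print(sum_d_squared, r_s)  # 8 0.6
```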

Multiple (linear) regression
• Extends the principles of linear regression to more independent variables
  y = a + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄ + …
• The calculations for this are always done by standard software
• You have to be careful with the requirements (technically called multicollinearity, autocorrelation, extrapolation, etc.)
• You also have to be careful with the interpretation of results

Multiple Regression

DATA
Sales   Advertising   Price
2450    100           50
3010    130           56
3090    160           45
3700    190           63
3550    210           48
4280    240           70

SUMMARY OUTPUT
Regression Statistics
Multiple R          0.996
R Square            0.992
Adjusted R Square   0.986
Standard Error      75.055
Observations        6

ANOVA
             df   SS
Regression   2    2003633.452
Residual     3    16899.882
Total        5    2020533.333

               Coefficients   Standard Error
Intercept      585.96         195.57
X Variable 1   9.92           0.76
X Variable 2   19.11          4.13
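Once the coefficients are available, forecasting is a matter of substituting values into the fitted equation. The sketch below assumes that "X Variable 1" is Advertising and "X Variable 2" is Price, following the column order of the data; the function name and the input values are hypothetical.

```python
# Coefficients read from the summary output (rounded as shown there)
intercept = 585.96
b_advertising = 9.92   # assumed to be X Variable 1
b_price = 19.11        # assumed to be X Variable 2

def predict_sales(advertising, price):
    """Forecast from the fitted equation: sales = a + b1 * advertising + b2 * price."""
    return intercept + b_advertising * advertising + b_price * price

# Hypothetical inputs chosen inside the range of the data, to avoid extrapolation
forecast = predict_sales(advertising=200, price=60)
print(round(forecast, 2))  # 3716.56
```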

Multiple Linear Regression

Computer Results

Multiple regression

Production   Shifts   Bonus   Overtime   Morale
2810         6        15      8          5
2620         3        20      10         6
3080         3        5       22         3
4200         4        5       31         2
1500         1        7       9          8
3160         2        12      22         10
4680         2        25      30         7
2330         7        10      5          7
1780         1        12      7          5
3910         8        3       20         3

Correlations
             Production   Shifts   Bonus    Overtime   Morale
Production   1
Shifts       0.262        1
Bonus        0.153        -0.315   1
Overtime     0.878        -0.108   -0.022   1
Morale       -0.332       -0.395   0.451    -0.265     1

Regression
Multiple R          0.997
R Square            0.995
Adjusted R Square   0.990
Standard Error      100.807
Observations        10

ANOVA
             df   SS
Regression   4    9439000
Residual     5    50810
Total        9    9489810

               Coefficients   Standard Error
Intercept      346.33         160.22
X Variable 1   181.80         15.25
X Variable 2   50.13          5.44
X Variable 3   96.17          3.69
X Variable 4   -28.70         16.76

Curve fitting
• Is a more general term than regression
• It refers to the process of fitting different types of curve through sets of data
• Typically this involves:
  • Exponential curves
  • Growth curves
  • Polynomials

Curve fitting
Line of fit: y = bm^x, with b = 0.6541 and m = 1.2475

Year   Cost   Predictions
1      0.8    0.816
2      1      1.018
3      1.3    1.270
4      1.7    1.584
5      2      1.976
6      2.4    2.465
7      2.9    3.075
8      3.8    3.836
9      4.7    4.786
10     6.2    5.970
11     7.5    7.448
12            9.291
13            11.591
14            14.459
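The forecasts for years 12 to 14 come from evaluating the fitted exponential curve y = b·m^x. This sketch uses the rounded constants b = 0.6541 and m = 1.2475 as printed, so its results may differ from the Predictions column in the third decimal place; the function name is an assumption.

```python
b, m = 0.6541, 1.2475   # fitted constants from the line of fit y = b * m^x

def predicted_cost(year):
    """Exponential curve y = b * m**x."""
    return b * m**year

# Forecasts beyond the observed data (years 1 to 11)
for year in (12, 13, 14):
    print(year, round(predicted_cost(year), 2))
```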

Further Reading
• Waters, Donald (2011) Quantitative Methods for Business, Prentice Hall, 5th Edition
• Evans, J.R. (2013) Business Analytics, Pearson, 1st Edition
• Render, B., Stair Jr., R.M. & Hanna, M.E. (2013) Quantitative Analysis for Management, Pearson, 11th Edition
