5. Learning Objectives:
1. Understand how and why to use sampling
2. Appreciate the aims of statistical inference
3. Use sampling distributions to find point estimates for population means
4. Calculate confidence intervals for means and proportions
5. Use one-sided confidence intervals
6. Use t-distributions for small samples
6. Collecting data
There are essentially two types of data collection:
Census
Sample
A census is often expensive, infeasible, impossible or unnecessary
The usual choice is to collect a sample
7. Sampling is the basis of statistical inference
This collects data from a random sample of the population
It uses this data to estimate features of the whole population
8. Example
The percentages of glucose in five bars of toffee are 7.2, 6.4, 7.2, 8.0 and 8.2
The mean of these is:
∑x / n = 37 / 5 = 7.4
This is our estimate of the population mean
The variance is (∑x² − (∑x)²/n) / (n − 1) = (275.88 − 273.8) / 4 = 0.52
This is our estimate of the population variance
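As a quick check on the arithmetic, this short Python sketch recomputes the mean and the n − 1 variance for the toffee data. It assumes Python 3 and uses the standard `statistics` module only to confirm the hand formula.

```python
import statistics

# Glucose percentages in the five toffee bars from the example
bars = [7.2, 6.4, 7.2, 8.0, 8.2]

n = len(bars)
mean = sum(bars) / n  # point estimate of the population mean

# Variance with divisor n - 1, using the computational formula from the slide
sum_x = sum(bars)
sum_x2 = sum(x * x for x in bars)
variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)

# The standard library's sample variance uses the same n - 1 divisor
assert abs(variance - statistics.variance(bars)) < 1e-9

print(f"mean = {mean:.2f}, variance = {variance:.2f}")
```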
9. Sampling fluctuations
In practice, each sample will be slightly different
The variation is described by a sampling distribution
11. Sampling distributions
Data collected from samples gives a sampling distribution, such as the sampling distribution of the mean
This has the features:
If a population is Normally distributed, the sampling distribution of the mean is also Normally distributed
With large samples, the sampling distribution of the mean is approximately Normally distributed regardless of the population distribution
The sampling distribution of the mean has mean μ and standard deviation σ/√n
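The last two features can be checked by simulation. The following Python sketch is only an illustration: the uniform population on 0 to 100, the sample size of 64, the 2,000 repetitions and the random seed are arbitrary assumptions, not values from these slides. Even though the population is not Normal, the sample means should cluster around μ with a spread close to σ/√n.

```python
import math
import random
import statistics

# Illustrative population: uniform on [0, 100], which is clearly not Normal.
mu = 50.0                          # population mean
sigma = 100 / math.sqrt(12)        # population standard deviation, about 28.87

random.seed(1)                     # fixed seed so the run is repeatable
n = 64                             # size of each sample
trials = 2000                      # number of samples drawn

# Build the sampling distribution of the mean: one sample mean per trial
means = [statistics.fmean(random.uniform(0, 100) for _ in range(n))
         for _ in range(trials)]

mean_of_means = statistics.fmean(means)
sd_of_means = statistics.stdev(means)
theory_se = sigma / math.sqrt(n)   # σ/√n, about 3.61

print(f"mean of sample means: {mean_of_means:.2f} (theory {mu:.2f})")
print(f"sd of sample means:   {sd_of_means:.2f} (theory {theory_se:.2f})")
```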
13. Example
Chocolate bars have a mean weight of 50 g and a standard deviation of 10 g. A sample of 64 bars is taken, and if the mean weight is less than 46 g the whole day's production is scrapped.
A large sample gives a sampling distribution with mean 50 g and standard deviation 10/√64 = 1.25 g
Z = (46 − 50) / 1.25 = −3.2
P(Z < −3.2) = 0.0007
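The same calculation can be sketched in Python with the standard library's `NormalDist`; the variable names are illustrative.

```python
import math
from statistics import NormalDist

mu, sigma, n = 50.0, 10.0, 64   # population mean and sd (g), sample size
threshold = 46.0                # production is scrapped below this mean weight

standard_error = sigma / math.sqrt(n)       # 10 / 8 = 1.25 g
z = (threshold - mu) / standard_error       # -3.2
p_scrap = NormalDist().cdf(z)               # P(Z < -3.2), the chance of scrapping

print(f"SE = {standard_error}, Z = {z:.1f}, P = {p_scrap:.4f}")
```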
14. Sampling distributions can give point estimates of, say, a mean
Interval estimates are more useful
A 95% confidence interval, say, defines the range that we are 95% confident contains the population mean
With x̄ as the sample mean:
90% confidence interval: x̄ − 1.645σ/√n to x̄ + 1.645σ/√n
95% confidence interval: x̄ − 1.96σ/√n to x̄ + 1.96σ/√n
99% confidence interval: x̄ − 2.58σ/√n to x̄ + 2.58σ/√n
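The multipliers 1.645, 1.96 and 2.58 are points of the standard Normal distribution that leave (1 − c)/2 in each tail for confidence level c. A minimal Python sketch (the function name `z_multiplier` is hypothetical) recovers them; the 99% value is 2.576, which the slide rounds to 2.58.

```python
from statistics import NormalDist

def z_multiplier(confidence: float) -> float:
    """z-value leaving (1 - confidence) / 2 in the upper tail of N(0, 1)."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%}: ±{z_multiplier(level):.3f} σ/√n")
```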
15. The best point estimate for a population mean is the mean of the sampling distribution, which is estimated by the sample mean
The best estimate for the standard deviation of the sampling distribution is the standard error, s/√n, where s is the sample standard deviation
The standard error can be biased for small samples
Then Bessel's correction uses s/√(n − 1) rather than s/√n (with s calculated using divisor n)
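A small Python sketch, reusing the toffee data from the earlier example, shows the effect of the correction. It assumes s is calculated with divisor n (`statistics.pstdev`); dividing that by √(n − 1) gives exactly the same standard error as dividing the usual n − 1 sample standard deviation (`statistics.stdev`) by √n.

```python
import math
import statistics

bars = [7.2, 6.4, 7.2, 8.0, 8.2]   # toffee data from the earlier example
n = len(bars)

s_n = statistics.pstdev(bars)      # standard deviation with divisor n
s_n1 = statistics.stdev(bars)      # standard deviation with divisor n - 1

naive_se = s_n / math.sqrt(n)            # biased low for small samples
corrected_se = s_n / math.sqrt(n - 1)    # Bessel-corrected standard error

# Dividing s_n by sqrt(n - 1) gives the same value as dividing s_n1 by sqrt(n)
assert math.isclose(corrected_se, s_n1 / math.sqrt(n))
print(f"naive SE = {naive_se:.4f}, corrected SE = {corrected_se:.4f}")
```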
17. Example
Attendance is Normally distributed with a variance of 225. Samples on five days give:
220   196   210   186   222
The point estimate of the mean is ∑x / 5 = 206.8
The 95% confidence interval is:
206.8 ± 1.96 × √(225/5) = 206.8 ± 13.15, or 193.65 to 219.95
For 95% of samples, an interval calculated in this way would include the population mean
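The attendance interval can be reproduced with a few lines of Python. This sketch takes the 1.96 multiplier from `NormalDist().inv_cdf(0.975)` rather than from tables, so it carries slightly more precision.

```python
import math
from statistics import NormalDist, fmean

attendance = [220, 196, 210, 186, 222]
variance = 225                     # known population variance
n = len(attendance)

mean = fmean(attendance)           # 206.8
z = NormalDist().inv_cdf(0.975)    # about 1.96 for a 95% interval
half_width = z * math.sqrt(variance / n)

low, high = mean - half_width, mean + half_width
print(f"{mean:.1f} ± {half_width:.2f}  ->  {low:.2f} to {high:.2f}")
```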
18. Example
Investment returns are Normally distributed with standard deviation 0.5. A random sample of 10 has a mean of 0.9%.
Point estimate of population mean is 0.9%
95% confidence interval is: 0.9 ± 1.96 × 0.5/√10 = 0.9 ± 0.31, or 0.59% to 1.21%
98% confidence interval is: 0.9 ± 2.33 × 0.5/√10 = 0.9 ± 0.37, or 0.53% to 1.27%
19. The principles of sampling can be extended in many ways
Population proportions, where sampling distributions are:
Normally distributed
with mean π
and standard deviation √(π(1 − π)/n)
One-sided confidence intervals
Small samples
20. Population proportions
With a large sample (more than 30), the sample proportions are:
Normally distributed
with mean π
and standard deviation √(π(1 − π)/n)
The 95% confidence interval, where p is the sample proportion, is:
p − 1.96 × √(p(1 − p)/n) to p + 1.96 × √(p(1 − p)/n)
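The interval formula translates directly into code. In this Python sketch (which assumes Python 3.9 or later for the type hint), the function name `proportion_ci` and the sample of 40 "yes" answers out of 100 are hypothetical, chosen only to show the calculation.

```python
import math

def proportion_ci(p: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Large-sample (n > 30) confidence interval for a population proportion."""
    se = math.sqrt(p * (1 - p) / n)
    return p - z * se, p + z * se

# Hypothetical sample: 40 of 100 customers say yes, so p = 0.4
low, high = proportion_ci(0.4, 100)
print(f"95% CI: {low:.3f} to {high:.3f}")
```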
22. Sampling distributions with small samples are no longer Normal
Small samples tend to include fewer outlying values and so underestimate the spread
This effect is allowed for in t-distributions
The shape of the t-distribution depends on the degrees of freedom
For a single sample mean, the number of degrees of freedom is one less than the sample size, n − 1
24. Example
Duration of rentals is Normally distributed. A random sample of 14 has a mean of 2.1429 and a variance of 1.6703.
Point estimate of population mean = 2.1429
Estimate of the standard error of the mean is √(1.6703/14) = 0.3454
For a t-distribution with 13 degrees of freedom, the t-value for a 99% confidence interval is 3.012
Confidence interval is 2.1429 ± 3.012 × 0.3454, or 1.10 to 3.18
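The rentals interval can be checked with a short Python sketch. Like the slide, it takes the t-value 3.012 from tables rather than computing it.

```python
import math

n, mean, variance = 14, 2.1429, 1.6703
t_value = 3.012            # t for 99% confidence with 13 degrees of freedom, from tables

standard_error = math.sqrt(variance / n)    # about 0.3454
half_width = t_value * standard_error
low, high = mean - half_width, mean + half_width

print(f"99% CI: {low:.2f} to {high:.2f}")
```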
25. Finding probabilities for the t-distribution
Calculations for t-distributions use:
Standard tables
Specialised statistical software
The TINV function in spreadsheets
For samples of more than about 30, the t-distribution is very close to the Normal distribution
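Where neither tables nor a spreadsheet are to hand, the two-tailed t-value that TINV returns can be approximated numerically. The Python sketch below is one possible approach, not how any spreadsheet actually implements TINV: it integrates the t density with Simpson's rule and then searches for the value by bisection. The function names `t_cdf` and `two_tailed_t` are hypothetical.

```python
import math

def t_cdf(t: float, df: int, steps: int = 1000) -> float:
    """P(T <= t) for Student's t with df degrees of freedom.

    Integrates the density from 0 to |t| with Simpson's rule (steps must be even)
    and uses the symmetry of the distribution about zero.
    """
    const = math.exp(math.lgamma((df + 1) / 2) - math.lgamma(df / 2)) / math.sqrt(df * math.pi)

    def density(u: float) -> float:
        return const * (1 + u * u / df) ** (-(df + 1) / 2)

    x = abs(t)
    h = x / steps
    total = density(0.0) + density(x)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * density(i * h)
    area = total * h / 3                  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def two_tailed_t(prob: float, df: int) -> float:
    """Value t with P(|T| > t) = prob, found by bisection (like a two-tailed TINV)."""
    lo, hi = 0.0, 100.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if 2 * (1 - t_cdf(mid, df)) > prob:
            lo = mid                      # tail area too big, so t must be larger
        else:
            hi = mid
    return (lo + hi) / 2


for df in (7, 13, 19, 1000):
    print(df, round(two_tailed_t(0.05, df), 3))
```

For example, `two_tailed_t(0.05, 7)` should give about 2.365 and `two_tailed_t(0.01, 13)` about 3.012, matching values used in these slides, while for very large df the result approaches the Normal value of 1.96.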
26. Student-t distribution
Sample size                        8
Mean                              37
Standard deviation                12

Part (a)
Degrees of freedom                 7        D3 - 1
Standard error                     4.536    D5/SQRT(D3-1)
Confidence interval               90
Number of standard deviations      1.895    TINV((100-D11)/100,D8)
Confidence interval from          28.407    D4 - (D12*D9)
                    to            45.593    D4 + (D12*D9)

Part (b)
Confidence interval               95
Number of standard deviations      2.365    TINV((100-D17)/100,D8)
Confidence interval from          26.275    D4 - (D18*D9)
                    to            47.725    D4 + (D18*D9)

Part (c)
Sample size                       20
Degrees of freedom                19
Standard error                     2.753    D5/SQRT(D23-1)
Confidence interval               95
Number of standard deviations      2.093    TINV((100-D26)/100,D24)
Confidence interval from          31.238    D4 - (D27*D25)
                    to            42.762    D4 + (D27*D25)

Part (d)
Normal distribution
Sample size                       20
Standard error                     2.753    D5/SQRT(D33-1)
Confidence interval               95
Number of standard deviations      1.960    NORMSINV((100+D35)/200)
Confidence interval from          31.604    D4 - (D36*D34)
                    to            42.396    D4 + (D36*D34)
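The spreadsheet logic can be mirrored in Python. This sketch copies the multipliers from the TINV and NORMSINV results (to four decimal places) rather than recomputing them, and follows the spreadsheet in taking the standard error as s/√(n − 1). The function name `interval` is hypothetical, and the type hint assumes Python 3.9 or later.

```python
import math

def interval(mean: float, sd: float, n: int, multiplier: float) -> tuple[float, float]:
    """mean ± multiplier × standard error, with SE = sd / sqrt(n - 1) as in the spreadsheet."""
    se = sd / math.sqrt(n - 1)
    return mean - multiplier * se, mean + multiplier * se

mean, sd = 37, 12

# (sample size, multiplier) for each part; multipliers copied from the
# spreadsheet's TINV / NORMSINV results to four decimal places
parts = {
    "(a) t, 90%":      (8, 1.8946),
    "(b) t, 95%":      (8, 2.3646),
    "(c) t, 95%":      (20, 2.0930),
    "(d) Normal, 95%": (20, 1.9600),
}

results = {label: interval(mean, sd, n, k) for label, (n, k) in parts.items()}
for label, (low, high) in results.items():
    print(f"{label}: {low:.3f} to {high:.3f}")
```

Comparing parts (c) and (d) shows the point of the t-distribution: with the same sample of 20, the t-based interval is slightly wider than the Normal one.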
28. Learning Objectives
1. Understand the purpose of hypothesis testing
2. List the steps involved in hypothesis testing
3. Understand the errors involved and the use of significance levels
4. Test hypotheses about population means
5. Use one- and two-tail tests
6. Extend these tests to deal with small samples
7. Use the tests for a variety of problems
8. Consider non-parametric tests, particularly the chi-squared test
29. Hypothesis testing
Considers a simple statement about a population
This is the hypothesis
Uses a sample to test whether the statement is likely to be true or is unlikely
It sees whether or not data from a sample supports a hypothesis about the population
If the hypothesis is unlikely, the hypothesis is rejected and another implied hypothesis must be true
30. Procedure for hypothesis testing
Define a simple, precise statement about a population (the hypothesis)
Take a sample from the population
Test this sample to see if it supports the hypothesis, or if it makes the hypothesis highly improbable
If the hypothesis is highly improbable, reject it; otherwise accept it
31. Example
A politician claims that 10% of factories in an area are losing money (hypothesis)
A sample of 30 shows that all of them are profitable (test result)
If the hypothesis is true, the probability that they are all profitable is 0.9³⁰ = 0.0424 (test statistic, or p-value)
This is very unlikely, so we can reject the hypothesis (conclusion)
32. Elements of a hypothesis test
1. Hypothesis – a simple statement about a population
2. Test result – actual result from the sample
3. Test statistic – a calculation about a sample assuming that the hypothesis is true
4. Conclusion – either reject the hypothesis or not
33. Example
1. Hypothesis – is that half of all staff have a degree
   H0: P = 0.5    H1: P < 0.5
2. Test result – in a sample of 10 staff, three have degrees
3. Test statistic – the number with degrees is a binomial process. If the probability is 0.5, the probability of a value as extreme as 3, or more extreme, is:
   p = P(x ≤ 3) = P(0) + P(1) + P(2) + P(3) = 0.1719
4. Conclusion – 17.19% is greater than 5%, so the test is not significant and there is no evidence to reject H0
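The binomial p-value in step 3 can be sketched in Python using `math.comb` (available from Python 3.8).

```python
from math import comb

n, p0, observed = 10, 0.5, 3   # sample size, P under H0, staff with degrees
alpha = 0.05                   # significance level

# Lower-tail p-value for H1: P < 0.5, i.e. P(X <= 3) with X ~ Binomial(10, 0.5)
p_value = sum(comb(n, k) * p0 ** k * (1 - p0) ** (n - k) for k in range(observed + 1))

print(f"p-value = {p_value:.4f}; reject H0: {p_value < alpha}")
```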
34. Because of uncertainty, we can never be certain of the results
Then we cannot really ‘accept’ a hypothesis; instead we say that we ‘cannot reject’ it
If we reject the hypothesis – called the null hypothesis – we implicitly accept another hypothesis – called the alternative hypothesis
35. Uncertainty also means that we can make mistakes
36. Type I and Type II errors
Ideally, both Type I and Type II errors would be small
In practice, the probability of a Type I error (rejecting a true null hypothesis) is the significance level
As this decreases, the probability of a Type II error (accepting a false null hypothesis) increases
We have to balance the two types of error
37. Significance level
Is the minimum acceptable probability that a value actually comes from the hypothesised population
When the probability is less than this, we reject the null hypothesis; when the probability is more than this, we do not reject it
It is the maximum acceptable probability of making a Type I error
Usually 0.05, but other values are possible, notably 0.01
39. Bringing ideas together gives the formal steps in a hypothesis test
State the null and alternative hypotheses
Specify the significance level
Calculate the acceptance range for the variable tested
Find the actual value for the variable tested
Decide whether or not to reject the null hypothesis
State the conclusion
40. Example
The time a doctor spends with patients is N(7, 2²). A new doctor seems to work more slowly.
H0: μ = 7    H1: μ > 7
A sample of 56 patients takes 420 minutes, giving a sample mean x̄ = 7.5
The probability of x̄ ≥ 7.5 when μ = 7 (with Z = 0.5/(2/√56) = 1.87) is 0.03
Since 0.03 is less than 0.05, the test is significant and we can reject H0. There is evidence to support H1, that the new doctor spends more than 7 minutes with each patient
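A Python sketch of the same one-tailed test, assuming (as the slide does) that the population standard deviation of 2 is known, so the Normal distribution applies. It also shows the equivalent comparison against the critical value for a 5% significance level.

```python
import math
from statistics import NormalDist

mu0, sigma = 7.0, 2.0          # H0: consultation time is N(7, 2^2)
n, total_minutes = 56, 420
alpha = 0.05

sample_mean = total_minutes / n                    # 7.5 minutes
z = (sample_mean - mu0) / (sigma / math.sqrt(n))   # about 1.87
p_value = 1 - NormalDist().cdf(z)                  # upper tail, since H1: mu > 7

critical_z = NormalDist().inv_cdf(1 - alpha)       # about 1.645
print(f"z = {z:.3f}, p = {p_value:.4f}, reject H0: {z > critical_z}")
```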
41. There is a huge number of variations on the general procedure, including:
One-sided tests
Tests with small samples
Tests for population proportions
Testing for differences in means
Paired tests
Goodness of fit
Tests of association
43. Paired tests
Amethyst Interviews
t-Test: Paired Two Sample for Means

Data
Original interviews    Later interviews
10                     10
11                      9
 9                     11
 6                     10
 8                      9
10                     12
 7                      9
 8                     11

                               Original interviews    Later interviews
Mean                           8.625                  10.125
Variance                       2.8393                 1.2679
Observations                   8                      8
Hypothesised Mean Difference   0
Degrees of freedom             7
t Statistic                    2.291
P(T<=t) one-tail               0.0279
t Critical one-tail            1.8946
P(T<=t) two-tail               0.0557
t Critical two-tail            2.3646
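The t statistic in this output can be reproduced from the paired differences. The Python sketch below computes only the statistic; it reuses the one-tail critical value from the output above rather than computing a t-distribution probability.

```python
import math
import statistics

original = [10, 11, 9, 6, 8, 10, 7, 8]
later = [10, 9, 11, 10, 9, 12, 9, 11]

diffs = [b - a for a, b in zip(original, later)]   # later minus original
n = len(diffs)
mean_diff = statistics.fmean(diffs)                # 1.5
se = statistics.stdev(diffs) / math.sqrt(n)        # standard error of the mean difference
t_stat = mean_diff / se                            # with n - 1 = 7 degrees of freedom

t_critical_one_tail = 1.8946   # copied from the spreadsheet output above
print(f"t = {t_stat:.3f}; one-tail test significant: {t_stat > t_critical_one_tail}")
```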
44. Non-parametric tests are used when there is no appropriate parameter to measure
Then we have to use a non-parametric – or distribution-free – test
The most common is the chi-squared test, where

$$\chi^2 = \sum \frac{(O - E)^2}{E}$$

and O is an observed frequency and E the corresponding expected frequency
The shape of the curve depends on the degrees of freedom (the number of classes minus the number of estimated parameters minus one)
46. Goodness of fit test
We reject the hypothesis that the data follows a specified distribution when the calculated value of χ² is greater than a critical value for the distribution
The critical value is found in:
Standard tables
Statistical software
The CHIINV function in spreadsheets
48. Example (continued)
The probability of this is given by a χ² distribution with c − 1 degrees of freedom (c is the number of categories)
This comes from standard tables or software
In this case the probability that χ² > 66.35 is 0.000 (to three decimal places)
This probability is less than 0.05, so we reject the null hypothesis that the observations come from the stated distribution
49. Example (continued)
There are five categories, so we need the χ² distribution with 5 − 1 = 4 degrees of freedom, and then P(χ² > 11.389) = 0.0225.
This is less than 0.05, so we conclude that there is evidence that the probability distribution differs from the hypothesised one.
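For an even number of degrees of freedom the χ² tail probability has a closed form, so the 0.0225 above can be checked without tables. The following Python sketch relies on that restriction and rejects odd df rather than guessing; the function name `chi2_sf_even` is hypothetical.

```python
import math

def chi2_sf_even(x: float, df: int) -> float:
    """P(chi-squared > x) for an even number of degrees of freedom.

    Uses the closed form exp(-x/2) * sum_{k < df/2} (x/2)^k / k!,
    which holds only when df is even.
    """
    if df <= 0 or df % 2:
        raise ValueError("this sketch only handles positive, even df")
    half = x / 2
    term, total = 1.0, 1.0
    for k in range(1, df // 2):
        term *= half / k          # builds (x/2)^k / k! incrementally
        total += term
    return math.exp(-half) * total

p = chi2_sf_even(11.389, 4)       # five categories, so 4 degrees of freedom
print(f"P(chi2 > 11.389) = {p:.4f}; reject at 5%: {p < 0.05}")
```

As a further check, `chi2_sf_even(9.488, 4)` is about 0.05, consistent with the usual 5% critical value for 4 degrees of freedom.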
50. Tests of association
Test for relationships between variables described in a contingency table
51. Contingency tables continued
The expected value in each cell is:
E = (row total × column total) / number of observations
Then we calculate χ² as usual and look up the related probability with degrees of freedom:
(number of rows − 1) × (number of columns − 1)
Here, with 2 rows and 3 columns: (2 − 1) × (3 − 1) = 2
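The expected values and degrees of freedom can be sketched in Python. The 2 × 3 table of observed counts here is hypothetical, invented only to illustrate the calculation.

```python
def expected_counts(table):
    """Expected cell counts E = row total × column total / number of observations."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    return [[r * c / n for c in col_totals] for r in row_totals]

def chi_squared(table):
    """Sum of (O - E)^2 / E over every cell of the table."""
    expected = expected_counts(table)
    return sum((o - e) ** 2 / e
               for obs_row, exp_row in zip(table, expected)
               for o, e in zip(obs_row, exp_row))

# Hypothetical 2 x 3 table of observed counts (not from the slides)
observed = [[20, 30, 50],
            [30, 20, 50]]
df = (len(observed) - 1) * (len(observed[0]) - 1)   # (2 - 1) × (3 - 1) = 2
print(f"chi-squared = {chi_squared(observed):.2f} with {df} degrees of freedom")
```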
54. Learning Objectives
1. Understand the purpose of regression
2. See how the strength of a relationship is related to the amount of noise
3. Measure the errors introduced by noise
4. Use linear regression to find the line of best fit through a set of data
5. Use this line of best fit for causal forecasting
6. Calculate and interpret coefficients of determination and correlation
7. Use Spearman’s coefficient of rank correlation
8. Understand the results of multiple regression
9. Use curve fitting for more complex functions
55. Regression and curve fitting look for relationships between variables
There are two basic questions:
Finding the best relationship, which is regression
Seeing how well the relationship fits the data, which is measured by correlation and determination
56. The most common approach is linear regression
This draws the straight line of best fit through a set of data
The general procedure is:
Draw a scatter diagram
Identify a linear relationship
Find the line of best fit through the data
Use this line to predict a value for the dependent variable from a known value of the independent variable
59. To find the line of best fit
We have to find values for the constants a and b in the equation y = a + bx
There is noise – or a deviation or error – in every observation
The ‘best’ line is defined as the one that minimises the mean squared error
There is a calculation for a and b, but standard software is more reliable:

$$b = \frac{n\sum xy - \sum x \sum y}{n\sum x^2 - \left(\sum x\right)^2}, \qquad a = \bar{y} - b\bar{x}$$
61. Quality control at Olfentia Travel
Checks   Mistakes
0        92
1        86
2        81
3        72
4        67
5        59
6        53
7        43
8        32
9        24
10       12

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.994
R Square             0.988
Adjusted R Square    0.986
Standard Error       3.077
Observations         11

ANOVA        df   SS         MS
Regression    1   6833.536   6833.536
Residual      9     85.191      9.466
Total        10   6918.727

               Coefficients   Standard Error
Intercept        95.864         1.735
X Variable 1     -7.882         0.293

[Scatter diagram: number of mistakes (0 to 100) against number of checks (0 to 12)]
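The least-squares formulas for a and b can be applied directly to the Olfentia Travel data. This Python sketch should reproduce the spreadsheet's intercept and slope.

```python
checks = list(range(11))                              # 0 to 10 checks
mistakes = [92, 86, 81, 72, 67, 59, 53, 43, 32, 24, 12]

n = len(checks)
sum_x, sum_y = sum(checks), sum(mistakes)
sum_xy = sum(x * y for x, y in zip(checks, mistakes))
sum_x2 = sum(x * x for x in checks)

# Least-squares formulas for the line of best fit y = a + bx
b = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
a = sum_y / n - b * sum_x / n

print(f"mistakes = {a:.3f} + ({b:.3f}) × checks")
```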
62. The strength of a relationship is measured by:
Coefficient of determination
Which measures how much of the total sum of squared errors is explained by the regression
Takes a value between 0 and +1
Coefficient of correlation
Which measures how closely x and y are linearly related
Is the square root of the coefficient of determination, taking the same sign as the slope b
Takes a value between −1 and +1
Spearman’s rank correlation
For ordinal data
64. [Figure 9.12 Interpreting the coefficient of correlation. Scatter diagrams of y against x for: r = +1, perfect positive correlation; r close to 1, line of good fit; r decreasing with worse fit of line; r = 0, random points with no correlation; r close to −1, line of good fit; r = −1, perfect negative correlation]
65. Correlation and determination
x    y
4    13
17   47
3    24
21   41
10   29
8    33
4    28
9    38
13   46
12   32
2    14
6    22
15   26
8    21
19   50

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.797
R Square             0.635
Adjusted R Square    0.607
Standard Error       7.261
Observations         15

ANOVA        df   SS
Regression    1   1191.630
Residual     13    685.304
Total        14   1876.933

               Coefficients
Intercept        15.376
X Variable 1      1.545
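The correlation and determination figures can be recomputed from sums of squares. This Python sketch uses the 15 data points above; it calculates r from the deviation sums and the coefficient of determination as r².

```python
import math

x = [4, 17, 3, 21, 10, 8, 4, 9, 13, 12, 2, 6, 15, 8, 19]
y = [13, 47, 24, 41, 29, 33, 28, 38, 46, 32, 14, 22, 26, 21, 50]

n = len(x)
mean_x, mean_y = sum(x) / n, sum(y) / n
s_xx = sum((xi - mean_x) ** 2 for xi in x)
s_yy = sum((yi - mean_y) ** 2 for yi in y)         # total sum of squares
s_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

r = s_xy / math.sqrt(s_xx * s_yy)      # coefficient of correlation
r_squared = r ** 2                     # coefficient of determination
explained = s_xy ** 2 / s_xx           # sum of squares explained by the regression

print(f"r = {r:.3f}, r² = {r_squared:.3f}, explained SS = {explained:.3f} of {s_yy:.3f}")
```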
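66. Rank correlation
Service            V   W   X   Y   Z
Quality ranking    2   5   1   3   4
Cost ranking       1   3   2   4   5

$$r_s = 1 - \frac{6\sum D^2}{n(n^2 - 1)} = 1 - \frac{6 \times 8}{5 \times (25 - 1)} = 0.6$$

where D is the difference between the two rankings of each service and n is the number of services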
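Spearman's formula is straightforward to code. This Python sketch assumes there are no tied ranks, which holds for the five services here; the function name `spearman` is hypothetical.

```python
def spearman(rank_a, rank_b):
    """Spearman's coefficient r_s = 1 - 6ΣD² / (n(n² - 1)), assuming no tied ranks."""
    n = len(rank_a)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_a, rank_b))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))

quality = [2, 5, 1, 3, 4]   # services V, W, X, Y, Z
cost = [1, 3, 2, 4, 5]
print(spearman(quality, cost))
```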
67. Multiple (linear) regression
Extends the principles of linear regression to more independent variables
y = a + b₁x₁ + b₂x₂ + b₃x₃ + b₄x₄ + …
In practice, the calculations for this are always done by standard software
You have to be careful with the requirements (checking for problems technically called multicollinearity, autocorrelation, extrapolation, etc.)
You also have to be careful with the interpretation of results
68. Multiple Regression
DATA
Sales   Advertising   Price
2450    100           50
3010    130           56
3090    160           45
3700    190           63
3550    210           48
4280    240           70

SUMMARY OUTPUT
Regression Statistics
Multiple R           0.996
R Square             0.992
Adjusted R Square    0.986
Standard Error       75.055
Observations         6

ANOVA        df   SS
Regression    2   2003633.452
Residual      3     16899.882
Total         5   2020533.333

                             Coefficients   Standard Error
Intercept                      585.96         195.57
X Variable 1 (Advertising)       9.92           0.76
X Variable 2 (Price)            19.11           4.13
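With only two independent variables, the least-squares coefficients can still be found by hand from the normal equations. This Python sketch solves them in deviation form with Cramer's rule; it is only meant to confirm the coefficients above, and standard software remains the practical route when there are more variables.

```python
sales = [2450, 3010, 3090, 3700, 3550, 4280]
advertising = [100, 130, 160, 190, 210, 240]
price = [50, 56, 45, 63, 48, 70]

def mean(v):
    return sum(v) / len(v)

def cross(u, v):
    """Sum of cross-products of deviations from the mean."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))

# Normal equations for y = a + b1*x1 + b2*x2, written in deviation form
s11, s22 = cross(advertising, advertising), cross(price, price)
s12 = cross(advertising, price)
s1y, s2y = cross(advertising, sales), cross(price, sales)

det = s11 * s22 - s12 ** 2
b1 = (s22 * s1y - s12 * s2y) / det      # coefficient for Advertising
b2 = (s11 * s2y - s12 * s1y) / det      # coefficient for Price
a = mean(sales) - b1 * mean(advertising) - b2 * mean(price)

print(f"Sales = {a:.2f} + {b1:.2f} × Advertising + {b2:.2f} × Price")
```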
72. Curve fitting
Is a more general term than regression
It refers to the process of fitting different types of curve through sets of data
Typically this involves:
Exponential curves
Growth curves
Polynomials
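One common way to fit an exponential curve y = Ae^(Bx) is to take logarithms and apply linear regression to ln y. This Python sketch uses that log-transform approach, which minimises squared errors in ln y rather than in y itself. The data are hypothetical, generated from y = 3e^(0.5x), so the fit should recover A = 3 and B = 0.5; the function name `fit_exponential` is also hypothetical.

```python
import math

def fit_exponential(xs, ys):
    """Fit y = A·e^(Bx) by linear least squares on ln(y) (requires every y > 0)."""
    logs = [math.log(y) for y in ys]
    n = len(xs)
    mx, ml = sum(xs) / n, sum(logs) / n
    b = (sum((x - mx) * (l - ml) for x, l in zip(xs, logs))
         / sum((x - mx) ** 2 for x in xs))
    a = math.exp(ml - b * mx)          # intercept of the log line, transformed back
    return a, b

# Hypothetical data generated exactly from y = 3e^(0.5x)
xs = [0, 1, 2, 3, 4, 5]
ys = [3 * math.exp(0.5 * x) for x in xs]
A, B = fit_exponential(xs, ys)
print(f"y = {A:.3f} e^({B:.3f} x)")
```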