Business analytics© 2021 cengage learning. all rights

Business Analytics
© 2021 Cengage Learning. All Rights Reserved. May not be
scanned, copied or duplicated, or posted to a publicly accessible
website, in whole or in part.
Linear Regression
Chapter 7
Introduction (Slide 1 of 2)
Managerial decisions are often based on the relationship
between two or more variables:
Example: After considering the relationship between advertising
expenditures and sales, a marketing manager might attempt to
predict sales for a given level of advertising expenditures.
Sometimes a manager will rely on intuition to judge how two
variables are related.
If data can be obtained, a statistical procedure called regression

analysis can be used to develop an equation showing how the
variables are related.
scanned, copied or duplicated, or posted to a publ icly accessible
Dependent variable or response: Variable being predicted.
Independent variables or predictor variables: Variables being
used to predict the value of the dependent variable.
Simple linear regression: A regression analysis for which any
one unit change in the independent variable, x, is assumed to
result in the same change in the dependent variable, y.
Multiple linear regression: A regression analysis involving two
or more independent variables.
Simple Linear Regression Model

Regression Model
Estimated Regression Equation
Simple Linear Regression Model (Slide 1 of 5)
Regression Model:
The equation that describes how y is related to x and an error
term.
The error term accounts for the variability in y that cannot be
explained by the linear relationship between x and y.
In the Butler Trucking Company example, the population
consists of all the driving assignments that can be made by the
company.
For every driving assignment in the population, there is a value

of x (miles traveled) and a corresponding value of y (travel time
in hours).
6
Estimated Regression Equation:
The parameter values are usually not known and must be
estimated using sample data.
the regression equation and dropping the error term, we obtain
the estimated regression for simple linear regression.
To estimate the mean or expected value of travel time for a
driving assignment of 75 miles, Butler Trucking would
substitute the value of 75 for x in equation = b0 + b1x.
7
In the estimated simple linear regression equation:

The graph of the estimated simple linear regression equation is
called the estimated regression line.
scanned, copied or duplicated, or posted to a publicly access ible
To estimate the mean or expected value of travel time for a
driving assignment of 75 miles, Butler Trucking would
substitute the value of 75 for x in equation = b0 + b1x.
8
Meghan Cook (MC) - The slash in this equation should be a
vertical line. I have edited the alt text to show how this should
read as well.
Figure 7.1: The Estimation Process in Simple Linear Regression

9
Figure 7.2: Possible Regression Lines in Simple Linear
Regression
The regression line in Panel A shows that the mean value of y is
related positively to x, with larger values of associated with
larger values of x.
In Panel B, the mean value of y is related negatively to x, with
smaller values of associated with larger values of x.
In Panel C, the mean value of y is not related to x; that is, is
the same for every value of x.
10
Least Squares Method
Least Squares Estimates of the Regression Parameters
Using Excel’s Chart Tools to Compute the Estimated Regression
Equation

Least Squares Method (Slide 1 of 15)
Least squares method: A procedure for using sample data to find
the estimated regression equation.
12
Table 7.1: Miles Traveled and Travel Time for 10 Butler
Trucking Company Driving AssignmentsDriving Assignment ix
= Miles Traveledy = Travel Time (hours)1 1009.32 504.83
508.94 1006.55 504.26 806.27 757.48 656.09
907.610 906.1

To illustrate the least squares method, suppose data were
collected from a sample of 10 Butler Trucking Company driving
assignments.
For the ith observation or driving assignment in the sample, xi
is the miles traveled and yi is the travel time (in hours).
The values of xi and yi for the 10 driving assignments in the
sample are summarized in Table 7.1.
13
Figure 7.3: Scatter Chart of Miles Traveled and Travel Time for
Sample of 10 Butler Trucking Company Driving Assignments
Figure 7.3 is a scatter chart of the data in Table 7.1.
Miles traveled is shown on the horizontal axis, and travel time
(in hours) is shown on the vertical axis.
Longer travel times appear to coincide with more miles
traveled.
The relationship between the travel time and miles traveled
appears to be approximated by a straight line; indeed, a positive
linear relationship is indicated between x and y.
Therefore, the simple linear regression model is chosen to
represent this relationship.

14
15
Hence,
We are finding the regression that minimizes the sum of squared
errors.

16
Least Squares Estimates of the Regression Parameters:
For the Butler Trucking Company data in Table 7.1:
The estimated simple linear regression model:
17
If the length of a driving assignment were 1 unit (1 mile)
longer, the mean travel time for that driving assignment would
be 0.0678 units (0.0678 hours, or approximately 4 minutes)
longer.
If the driving distance for a driving assignment was 0 units (0
miles), the mean travel time would be 1.2739 units (1.2739

hours, or approximately 76 minutes).
We interpret b1 and b0 as we would the y-intercept and slope of
any straight line.
18
Experimental region: The range of values of the independent
variables in the data used to estimate the model.
The regression model is valid only over this region.
Extrapolation: Prediction of the value of the dependent variable
outside the experimental region.
It is risky.

Because we have no empirical evidence that the relationship we
have found holds true for values of x outside of the range of
values of x in the data used to estimate the relationship,
extrapolation is risky and should be avoided if possible.
For Butler Trucking, this means that any prediction outside the
travel time for a driving distance less than 50 miles or greater
than 100 miles is not a reliable estimate, and so for this model
the estimate of β0 is meaningless.
However, if the experimental region for a regression problem
includes zero, the y-intercept will have a meaningful
interpretation.
19
Butler Trucking Company example: Use the estimated model
and the known values for miles traveled for a driving
assignment (x) to estimate mean travel time in hours.
For example, the first driving assignment in Table 7.1 has a
value for miles
The mean travel time in hours for this driving assignment is
estimated to be:
The resulting residual of the estimate is:

The simple linear regression model underestimated travel time
for this driving assignment by 1.2461 hours (approximately 74
minutes).
20
Table 7.2: Predicted Travel Time and Residuals for 10 Butler
Trucking Company Driving Assignments
Note in Table 7.2 that:
The sum of predicted values is equal to the sum of the values
of the dependent variable y.
The sum of the residuals ei is 0.
The sum of the squared residuals has been minimized.
21
Figure 7.4: Scatter Chart of Miles Traveled and Travel Time for
Butler Trucking Company Driving Assignments with Regression
Line Superimposed

Figure 7.4 shows the simple linear regression line = 1.2739 +
0.0678xi superimposed on the scatter chart for the Butler
Trucking Company data in Table 7.1.
This figure, which also highlights the residuals for driving
assignment 3 (e3) and driving assignment 5 (e5), shows that the
regression model underpredicts travel time for some driving
assignments (such as driving assignment 3) and overpredicts
travel time for others (such as driving assignment 5), but in
general appears to fit the data relatively well.
22
Figure 7.5: A Geometric Interpretation of the
Least Squares Method
In Figure 7.5, a vertical line is drawn from each point in the
scatter chart to the linear regression line.
Each of these lines represents the difference between the actual
driving time and the driving time to be predicted using linear
regression for one of the assignments in the data.

The length of each line is equal to the absolute value of the
residual for one of the driving assignments.
When a residual is squared, the resulting value is equal to the
square that is formed using the vertical dashed line representing
the residual in Figure 7.4 as one side of a square.
Thus, when we find the linear regression model that minimizes
the sum of squared errors for the Butler Trucking example, we
are positioning the regression line in the manner that minimizes
the sum of the areas of the ten squares in Figure 7.5.
23
Using Excel’s Chart Tools to Compute the Estimated Regression
Equation:
After constructing a scatter chart with Excel’s chart tools:
Right-click on any data point and select Add Trendline.
When the Format Trendline task pane appears:
Select Linear in the Trendline Options area.
Select Display Equation on chart in the Trendline Options area.
Figure 7.6: Scatter Chart and Estimated Regression Line for
Butler Trucking Company

We can use Excel’s chart tools to compute the estimated
regression equation on a scatter chart of the Butler Trucking
Company data in Table 7.1.
The worksheet displayed in Figure 7.6 shows the original data,
scatter chart, estimated regression line, and estimated regression
equation.
Note that Excel uses y instead of to denote the predicted value
of the dependent variable and puts the regression equation into
slope-intercept form whereas we use the intercept-slope form
that is standard in statistics.
25
Slope Equation
y-Intercept Equation

scanned, copied or duplicated, or posted to a publicl y accessible
26
Assessing the Fit of the Simple Linear Regression Model
The Sums of Squares
The Coefficient of Determination
Using Excel’s Chart Tools to Compute the Coefficient of
Determination
Assessing the Fit of the Simple Linear Regression Model (Slide
1 of 10)
The Sums of Squares:
Sum of squares due to error: The value of SSE is a measure of
the error in using the estimated regression equation to predict
the values of the dependent variable in the sample.
From Table 7.2,

2 of 10)
scanned, copied or duplicated, or posted to a publ icly accessible
Without knowledge of any related variables, we would use the
sample mean as a predictor of travel time for any given driving
assignment.
To find , we divide the sum of the actual driving times yi from
Table 7.2 (67) by the number of observations n in the data (10);
this yields = 6.7.
Figure 7.7 provides insight on how well we would predict the
values of yi in the Butler Trucking company example using =
6.7.
From this figure, which again highlights the residuals for
driving assignments 3 and 5, we can see that tends to
overpredict travel times for driving assignments that have
relatively small values for miles traveled (such as driving
assignment 5) and tends to underpredict travel times for driving
assignments that have relatively large values for miles traveled
(such as driving assignment 3).
29

3 of 10)
Table 7.3 shows the sum of squared deviations obtained by
using
for each driving assignment in the sample.
Butler Trucking Example: For the ith driving assignment in the
SST is a measure of how well the observations cluster about the
line.
30
4 of 10)
The corresponding sum of squares is called the total sum of
squares (SST).

SST is a measure of how well the observations cluster about the
line.
31
Assessing the Fit of the Simple Linear Regression Model (Sli de
5 of 10)
Table 7.3: Calculations for the Sum of Squares Total for the
Butler Trucking Simple Linear Regression
The sum at the bottom of the last column in Table 7.3 is the
total sum of squares for Butler Trucking Company: SST = 23.9.
32
6 of 10)

In Figure 7.8, we show the estimated regression line = 1.2739 +
0.0678xi and the line corresponding to = 6.7.
Note that the points cluster more closely around the estimated
regression line = 1.2739 + 0.0678xi than they do about the
horizontal line = 6.7.
33
7 of 10)
Relation between SST, SSR, and SSE:
where
SST = total sum of squares
SSR = sum of squares due to regression
SSE = sum of squares due to error.
In the Butler Trucking Company example, SSE = 8.0288 and
SST = 23.9:
SSR = SST SSE = 23.9 8.0288 = 15.8712

34
8 of 10)
The Coefficient of Determination:
The ratio SSR/SST used to evaluate the goodness of fit for the
estimated regression equation; this ratio is called the coefficient
of determination and is denoted by
Take values between zero and one.
Interpreted as the percentage of the total sum of squares that
can be explained by using the estimated regression equation.
For the Butler Trucking Company example, the coefficient of
determination,
r2 = = = 0.6641.
It can be concluded that 66.41% of the total sum of squares can
be explained by using the estimated regression equation =
1.2739 + 0.0678xi to predict quarterly sales.
35

9 of 10)
Using Excel’s Chart Tools to Compute the Coefficient of
Determination:
To compute the coefficient of determination using the scatter
chart in Figure 7.3:
Right-click on any data point in the scatter chart and select Add
Trendline…
When the Format Trendline task pane appears:
Select Display R-squared value on chart in the Trendline
Options area.
36
10 of 10)

Figure 7.9 displays the scatter chart, the estimated regression
equation, the graph of the estimated regression equation, and
the coefficient of determination for the Butler Trucking
Company data. We see that r2 = 0.6641.
37
The Multiple Regression Model
Regression Model
Estimated Multiple Regression Equation
Least Squares Method and Multiple Regression
Butler Trucking Company and Multiple Regression
Using Excel’s Regression Tool to Develop the Estimated
Multiple Regression Equation
The Multiple Regression Model (Slide 1 of 11)
Regression Model:

39
Regression Model (cont.):
Represents the change in the mean value of the dependent
variable y that corresponds to a one unit increase in the
independent variable
holding the values of all other independent variables in the
model
constant.
The multiple regression equation that describes how the mean
value of y is related to
40

Estimated Multiple Regression Equation:
41
Least Squares Method and Multiple Regression:
The least squares method is used to develop the estimated
multiple regression equation:
Finding
Uses sample data to provide the values of
that minimize the sum of squared residuals.

42
Figure 7:10: The Estimation Process for Multiple Regression
The estimation process for multiple regression is shown in
Figure 7.10.
The estimated values of the dependent variable y are computed
by substituting values of the independent variables x1, x2, . . . ,
xq into the estimated multiple regression equation.
Computer software packages can be used to obtain the estimated
regression equation and determine the regression coefficients
b0, b1, b2, . . . , bq.
43
Butler Trucking Company and Multiple Regression:
The estimated simple linear regression equation,
The linear effect of the number of miles traveled explains
66.41%

This implies, 33.59% of the variability in sample travel times
remains unexplained
The managers might want to consider adding one or more
independent variables, such as number of deliveries, to the
model to explain some of the remaining variability in the
dependent variable.
44
Butler Trucking Company and Multiple Regression (cont.):
Estimated multiple linear regression with two independent
variables:

45
Figure 7.11: Data Analysis Tools Box
Using Excel’s Regression Tool to Develop the Estimated
Multiple Regression Equation:
The following steps describe how to use Excel’s Regression tool
to compute the estimated regression equation using the data in
the worksheet.
Copy the data values from the file ButlerWithDeliveries into an
Excel worksheet from columns A through D and rows 1 through
301.
Step 1. Click the DATA tab in the Ribbon
Step 2. Click Data Analysis in the Analysis group
Step 3. Select Regression from the list of Analysis Tools
in the Data Analysis tools box (shown in Figure 7.11) and click
OK
46

Figure 7.12: Regression Dialog Box
Step 4. When the Regression dialog box appears (as shown in
Figure 7.12):
1. Enter D1:D301 in the Input Y Range: box
2. Enter B1:C301 in the Input X Range: box
3. Select Labels
Selecting Labels tells Excel to use the names you have gi ven to
your variables in row 1 when displaying the regression model
output.
4. Select Confidence Level:
5. Enter 99 in the Confidence Level: box
6. Select New Worksheet Ply:
7. Click OK
47
Figure 7.13: Excel Regression Output for the Butler Trucking
Company with Miles and Deliveries as Independent Variables

In the Excel output shown in Figure 7.13, the label for the
independent variable x1 is Miles (see cell A18), and the label
for the independent variable x2 is Deliveries (see cell A19).
Estimated regression equation:
= 0.1273 + 0.0672x1 + 0.6900x2
Interpretation:
For a fixed number of deliveries, we estimate that the mean
travel time will increase by 0.0672 hours when the distance
traveled increases by 1 mile.
For a fixed distance traveled, we estimate that the mean travel
time will increase by 0.69 hours when the number of deliveries
increases by 1 delivery.
Multiple coefficient of determination, R2 = 0.8173.
Interpretation: 81.73% of the variability in sample values of the
dependent variable, travel time, is explained by the independent
variables, number of deliveries and distance traveled.
48
Figure 7.14: Graph of the Regression Equation for Multiple
Regression Analysis with Two Independent Variables

With two independent variables x1 and x2, we now generate a
predicted value of y for every combination of values of x1 and
x2.
Thus, instead of a regression line, we now create a regression
plane in three-dimensional space.
Figure 7.14 provides the graph of the estimated regression plane
for the Butler Trucking Company example and shows the
seventh driving assignment in the data.
As the plane slopes upward to larger values of total travel time
() as either the number of miles traveled (x1) or the number of
deliveries (x2) increases.
The residual for a driving assignment when x1 = 75 and x2 = 3
is the difference between the observed y value and the estimated
mean value of y when
x1 = 75 and x2 = 3.
The observed value lies above the regression plane, indicating
that the regression model underestimates the expected driving
time for the seventh driving assignment.
49
Inference and Regression
Conditions Necessary for Valid Inference in the Least Squares
Regression Model
Testing Individual Regression Parameters
Addressing Nonsignificant Independent Variables
Multicollinearity

Inference and Regression (Slide 1 of 17)
Statistical inference: Process of making estimates and drawing
conclusions about one or more characteristics of a population
(the value of one or more parameters) through the analysis of
sample data drawn from the population.
In regression, inference is commonly used to estimate and draw
conclusions about:
The regression parameters
The mean value and/or the predicted value of the dependent
variable y for specific values of the independent variables
Consider both hypothesis testing and interval estimation.
51
Conditions Necessary for Valid Inference in the Least Squares
Regression Model:
For any given combination of values of the independent
variables

distributed with a mean of 0 and a constant variance.
Figure 7.15: Illustration of the Conditions for Valid Inference in
Regression
…
Business Analytics

Time Series Analysis and Forecasting
Chapter 8
Forecasting methods can be classified as qualitative or
quantitative.
Qualitative methods generally involve the use of expert
judgment to develop forecasts.
Quantitative forecasting methods can be used when:
Past information about the variable being forecast is available.
The information can be quantified.
It is reasonable to assume that past is prologue.
The objective of time series analysis is to uncover a pattern in

the time series and then extrapolate the pattern into the future.
The forecast is based solely on past values of the variable
and/or on past forecast errors.
Modern data-collection technologies have enabled individuals,
businesses, and government agencies to collect vast amounts of
data that may be used for causal forecasting.
4
Time Series Patterns
Horizontal Pattern
Trend Pattern
Seasonal Pattern
Trend and Seasonal Pattern
Cyclical Pattern
Identifying Time Series Patterns

Time Series Patterns (Slide 1 of 20)
Time series: A sequence of observations on a variable measured
at successive points in time or over successive periods of time.
The measurements may be taken every hour, day, week, month,
year, or any other regular interval. The pattern of the data is
important in understanding the series’ past behavior.
If the behavior of the times series data of the past is expected to
continue in the future, it can be used as a guide in selecting an
appropriate forecasting method.
To identify the underlying pattern in the data, a useful first step
is to construct a time series plot, which is a graphical
presentation of the relationship between time and the time series
variable; time is represented on the horizontal axis and values
of the time series variable are shown on the vertical axis.
6
Horizontal Pattern:
Exists when the data fluctuate randomly around a constant mean
over time.

Stationary time series: It denotes a time series whose statistical
properties are independent of time:
The process generating the data has a constant mean.
The variability of the time series is constant over time.
A time series plot for a stationary time series will always
exhibit a horizontal pattern with random fluctuations.
Table 8.1: Gasoline Sales Time SeriesWeekSales (1,000s of
gallons)117221319423518616720818922102011151222
Data in Table 8.1 show the number of gallons of gasoline (in
1000s) sold by a gasoline distributor in Bennington, Vermont,
over the past 12 weeks.
The average value, or mean, for this time series is 19.25, or
19,250 gallons per week.
8

Figure 8.1: Gasoline Sales Time Series Plot
Figure 8.1 shows a time series plot for the data in Table 8.1.
Note how the data fluctuate around the sample mean of 19,250
gallons.
Although random variability is present, we would say that these
data follow a horizontal pattern.
9
Table 8.2: Gasoline Sales Time Series after Obtaining the
Contract with the Vermont State PoliceWeekSales (1,000s of
gallons)WeekSales (1,000s of
gallons)1171222221133131914344231531518163361617287201
832818193092220291020213411152233
Table 8.2 shows the number of gallons of gasoline sold for the
original time series and the 10 weeks after signing the new

contract with the Vermont State Police to provide gasoline for
state police cars located in southern Vermont.
10
Figure 8.2: Gasoline Sales Time Series Plot after Obtaining the
Contract with the Vermont State Police
Figure 8.2 shows the corresponding time series plot. Note the
increased level of the time series beginning in week 13.
This change in the level of the time series makes it more
difficult to choose an appropriate forecasting method.
11
Trend Pattern:
A trend pattern shows gradual shifts or movements to relatively
higher or lower values over a longer period of time.
A trend is usually the result of long-term factors such as:
Population increases or decreases.
Shifting demographic characteristics of the population.
Improving technology.
Changes in the competitive landscape.
Changes in consumer preferences.

Table 8.3: Bicycle Sales Time SeriesYearSales
(1,000s)121.6222.9325.5421.9523.9627.5731.5829.7928.61031.
4
Table 8.3 shows the time series of bicycle sales for a particular
manufacturer over the past 10 years.
13
Figure 8.3: Bicycle Sales Time Series Plot

Figure 8.3 shows the time series of bicycle sales for a particular
manufacturer over the past 10 years.
Note that 21,600 bicycles were sold in year 1, 22,900 were sold
in year 2, and so on.
In year 10, the most recent year, 31,400 bicycles were sold.
Visual inspection of the time series plot shows some up-and-
down movement over the past 10 years.
14
Table 8.4: Cholesterol Drug Revenue TimesYearRevenue ($
millions)123.1221.3327.4434.6533.8643.2759.5864.4974.21099.
3
The data in Table 8.4 and the corresponding time series plot in
Figure 8.4 show the sales revenue for a cholesterol drug since
the company won FDA approval for the drug 10 years ago.
The time series increases in a nonlinear fashion; that is, the rate
of change of revenue does not increase by a constant amount
from one year to the next.
In fact, the revenue appears to be growing in an exponential
fashion.
15

Figure 8.4: Cholesterol Drug Revenue Times Series Plot ($
millions)
The data in Table 8.4 and the corresponding time series plot in
Figure 8.4 show the sales revenue for a cholesterol drug since
the company won FDA approval for the drug 10 years ago.
The time series increases in a nonlinear fashion; that is, the rate
of change of revenue does not increase by a constant amount
from one year to the next.
In fact, the revenue appears to be growing in an exponential
fashion.
16
Seasonal Pattern:
Seasonal patterns are recurring patterns over successive periods
of time.
Example: A retailer that sells bathing suits expects low sales
activity in the fall and winter months, with peak sales in the
spring and summer months to occur every year.
The time series plot not only exhibits a seasonal pattern over a
one-year period but also for less than one year in duration.
Example: daily traffic volume shows within-the-day “seasonal”
behavior, with peak levels occurring during rush hour, moderate
flow during the rest of the day, and light flow from midnight to
early morning.

Table 8.5: Umbrella Sales Time SeriesYearQuarterSales11
1252 1533 1064 8821 1182 1613 1334 10231 1382 1443
1134 80
18
Table 8.5: Umbrella Sales Time Series
(cont.)YearQuarterSales41 1092 1373 1254 10951 1302
1653 1284 96

19
Figure 8.5: Umbrella Sales Time Series Plot
20
Trend and Seasonal Pattern:
Some time series include both a trend and a seasonal pattern.
Table 8.6: Quarterly Smartphone Sales Time Series
YearQuarterSales
($1,000s)114.824.136.046.5215.825.236.847.4

Table 8.6 and Figure 8.6 show quarterly smartphone sales for a
particular manufacturer over the past four years.
Clearly an increasing trend is present.
However, Figure 8.6 also indicates that sales are lowest in the
second quarter of each year and highest in quarters 3 and 4.
Thus, we conclude that a seasonal pattern also exists for
smartphone sales.
21
Table 8.6: Quarterly Smartphone Sales Time Series
(cont.)YearQuarterSales
($1,000s)316.025.637.547.8416.325.938.048.4
Figure 8.6 shows quarterly smartphone sales for a particular
manufacturer over the past four years.
smartphone sales.

22
Figure 8.6: Quarterly Smartphone Sales Time Series Plot
scanned, copied or duplicated, or posted to a publicly access ible
Table 8.6 and Figure 8.6 show quarterly smartphone sales for a
particular manufacturer over the past four years.
smartphone sales.
23
Cyclical Pattern:
A cyclical pattern exists if the time series plot shows an
alternating sequence of points below and above the trendline
that lasts for more than one year.
Example: Periods of moderate inflation followed by periods of
rapid inflation can lead to a time series that alternates below
and above a generally increasing trendline.
Cyclical effects are often combined with long-term trend effects
and referred to as trend-cycle effects.

24
Identifying Time Series Patterns:
The underlying pattern in the time series is an important factor
in selecting a forecasting method.
A time series plot should be one of the first analytic tools.
We need to use a forecasting method that is capable of handling
the pattern exhibited by the time series effectively.
25
Forecast Accuracy

Forecast Accuracy (Slide 1 of 10)
Table 8.7: Computing Forecasts and Measures of Forecast
Accuracy Using the Most Recent Value as the Forecast for the
Next PeriodWeekTime Series ValueForecastForecast
ErrorAbsolute Value of Forecast ErrorSquared Forecast
ErrorPercentage ErrorAbsolute Value of Percentage
Error11722117 4 4 16 19.05 19.0531921 −2 2
4 −10.53 10.5342319 4 4 16 17.39
17.3951823 −5 5 25 −27.78 27.7861618 −2
2 4 −12.50 12.50
scanned, copied or duplicated, or posted to a publicly accessib le
Table 8.7 shows the forecasts for the gasoline time series shown
in Table 8.1 using the simplest of all the forecasting methods.
We use the most recent week’s sales volume as the forecast for
the next week.
For instance, the distributor sold 17 thousand gallons of
gasoline in week 1; this value is used as the forecast for week 2.
Next, we use 21, the actual value of sales in week 2, as the
forecast for week 3, and so on.
This method is often referred to as a naïve forecasti ng method
because of its simplicity.

27
Accuracy Using the Most Recent Value as the Forecast for the
Next Period (cont.)WeekTime Series ValueForecastForecast
ErrorPercentage ErrorAbsolute Value of Percentage Error72016
4 4 16 20.00 20.0081820 −2 2 4
−11.11 11.1192218 4 4 16 18.18
18.18102022 −2 2 4 −10.00 10.00111520 −5
5 25 −33.33 33.33122215 7 7 49 31.82
31.82Totals 5 41 179 1.19 211.69
Table 8.7 shows the forecasts for the gasoline time series shown
in Table 8.1 using the simplest of all the forecasting methods.
We use the most recent week’s sales volume as the forecast for
the next week.
For instance, the distributor sold 17 thousand gallons of
gasoline in week 1; this value is used as the forecast for week 2.
Next, we use 21, the actual value of sales in week 2, as the
forecast for week 3, and so on.
This method is often referred to as a naïve forecasting method
because of its simplicity.
28

Naïve forecasting method: Using the most recent data to predict
future data.
The key concept associated with measuring forecast accuracy is
forecast error.
Measures to determine how well a particular forecasting method
is able to reproduce the time series data that are already
available.
Forecast error.
Mean forecast error (MFE).
Mean absolute error (MAE).
Mean squared error (MSE).
Mean absolute percentage error (MAPE).
scanned, copied or duplicated, or posted to a publicl y accessible
Forecast Error: Difference between the actual and the forecasted
values for period t.
Mean Forecast Error: Mean or average of the forecast errors.

Forecast error:
=
= actual value
= forecasted value
Example:
Consider Table 8.7.
The distributor actually sold 21 thousa nd gallons of gasoline in
week 2, and the forecast, using the sales volume in week 1, was
17 thousand gallons.
The forecast error in week 2 is = = 21 17 = 4.
Mean Forecast Error (MFE):
MFE =
n = Number of periods in time series
k = Number of periods at the beginning of the time series for
which we cannot produce a naïve forecast
Example:
Table 8.7 shows that the sum of the forecast errors for the
gasoline sales time series is 5.
Thus, the mean or average error is 5/11 = 0.45.
30
Mean Absolute Error (MAE): Measure of forecast accuracy that
avoids the problem of positive and negative forecast errors
offsetting one another.
Mean Squared Error (MSE): Measure that avoids the problem of
positive and negative errors offsetting each other is obtained by
computing the average of the squared forecast errors.

Mean Absolute Error (MAE)
It is also referred to as the mean absolute deviation (MAD).
Example:
Table 8.7 shows that the sum of the absolute values of the
forecast errors is 41.
Thus MAE = average of the absolute value of the forecast errors
= 41/11 = 3.73.
MEAN SQUARED ERROR (MSE)
Example:
From Table 8.7, the sum of the squared errors is 179.
Hence, MSE = average of the square of the forecast errors =
179/11 = 16.27.
31
Mean Absolute Percentage Error (MAPE): Average of the
absolute value of percentage forecast errors.

Mean Absolute Percentage Error (MAPE):
The size of MAE or MSE depends upon the scale of the data.
As a result, it is difficult to make comparisons for different time
intervals (such as comparing a method of forecasting monthly
gasoline sales to a method of forecasting weekly sales) or to
make comparisons across different time series (such as monthly
sales of gasoline and monthly sales of oil filters).
To make comparisons such as these, we need to work with
relative or percentage error measures.
The mean absolute percentage error (MAPE) is such a measure.
Example:
Table 8.7 shows that the sum of the absolute values of the
percentage errors is
Thus the MAPE, which is the average of the absolute value of
percentage forecast errors, is 211.69/11 = 19.24%
32
Accuracy Using the Average of All the Historical Data as the
Forecast for the Next PeriodWeekTime Series
ValueForecastForecast ErrorAbsolute Value of Forecast
ErrorSquared Forecast ErrorPercentage ErrorAbsolute Value of
Percentage Error 1 17 2 21 17.00 4.00 4.00
16.00 19.05 19.05 3 19 19.00 0.00
0.00 0.00 0.00 0.00 4 23 19.00 4.00 4.00 16.00
17.39 17.39 5 18 20.00 −2.00 2.00
4.00 −11.11 11.11

We begin by developing a forecast for week 2.
Because there is only one historical value available prior to
week 2, the forecast for week 2 is just the time series value in
week 1; thus, the forecast for week 2 is 17 thousand gallons of
gasoline.
To compute the forecast for week 3, we take the average of the
sales values in weeks 1 and 2. Thus,
= (17 + 21)/2 = 19
Similarly, the forecast for week 4 is = (17 + 21 + 19)/3 = 19.
Using the results shown in Table 8.8, we obtain the following
values of MAE, MSE, and MAPE:
MAE = 26.81/11 = 2.44
MSE = 89.07/11 = 8.10
MAPE = 141.34/11 = 12.85%
33
Forecast for the Next Period (cont.)WeekTime Series
Percentage Error 6 16 19.60 −3.60 3.60 12.96
−22.50 22.50 7 20 19.00 1.00 1.00 1.00
5.00 5.00 8 18 19.14 −1.14 1.14 1.31 −6.35
6.35 9 22 19.00 3.00 3.00 9.00 13.64 13.64

week 1; thus, the forecast for week 2 is 17 thousand gallons of
gasoline.
= (17 + 21)/2 = 19
MAE = 26.81/11 = 2.44
MSE = 89.07/11 = 8.10
MAPE = 141.34/11 = 12.85%
34
Forecast for the Next Period (cont.)WeekTime Series
Percentage Error 10 20 19.33 0.67 0.67 0.44 3.33
3.33 11 15 19.40 −4.40 4.40 19.36 −29.33
29.33 12 22 19.00 3.00 3.00 9.00 13.64
13.64Totals 4.52 26.81 89.07 2.75 141.34

week 1; thus, the forecast for week 2 is 17 thousand gall ons of
gasoline.
= (17 + 21)/2 = 19
MAE = 26.81/11 = 2.44
MSE = 89.07/11 = 8.10
MAPE = 141.34/11 = 12.85%
35
Compare the accuracy of the two forecasting methods by
comparing the values of MAE, MSE, and MAPE for each
method.Naïve MethodAverage of Past ValuesMAE 3.73
2.44MSE 16.27 8.10MAPE 19.24% 12.85%
The average of past values provides more accurate forecasts for
the next period than using the most recent observation.

36
Moving Averages and Exponential Smoothing
Moving Averages
Exponential Smoothing
Moving Averages and Exponential Smoothing (Slide 1 of 16)
Moving Averages:
Moving averages method: Uses the average of the most recent k
data values in the time series as the forecast for the next period.

In the formula for moving average forecast,
= forecast of the time series for period t + 1
= actual value of the time series in period t
k = number of periods of time series data used to generate the
forecast
To use moving averages to forecast a time series, we must first
select the order k.
If only the most recent values of the time series are considered
relevant, a small value of k is preferred.
If a greater number of past values are considered relevant, then
we generally opt for a larger value of k.
A moving average will adapt to the new level of the series and
continue to provide good forecasts in k periods.
Thus, a smaller value of k will track shifts in a time series more
quickly.
On the other hand, larger values of k will be more effective in
smoothing out random fluctuations.
38
Table 8.9: Summary of Three-Week Moving Average
CalculationsWeekTime Series ValueForecastForecast
ErrorPercentage ErrorAbsolute Value of Percentage Error 1
17 2 21 3 19 4 23 19 4 4 16
17.39 17.39 5 18 21 −3 3 9 −16.67
16.67 6 16 20 −4 4 16 −25.00 25.00

To illustrate the moving averages method, let us return to the
original 12 weeks of gasoline sales data in Table 8.1 and Figure
8.1.
We will use a three-week moving average (k = 3).
We begin by computing the forecast of sales in week 4 using the
average of the time series values in weeks 1 to 3.
= average for weeks 1 to 3 = (17 + 21 + 19)/3 = 19
The moving average forecast of sales in week 4 is 19, or 19,000
gallons of gasoline.
Because the actual value observed in week 4 is 23, the forecast
error in week 4 is e4 = 23 - 19 = 4.
We next compute the forecast of sales in week 5 by averaging
the time series values in weeks 2 to 4.
Hence, the forecast of sales in week 5 is 21 and the error
associated with this forecast is e5 = 18 - 21 = -3.
A complete summary of the three-week moving average
forecasts for the gasoline sales time series is provided in Table
8.9.
39
Table 8.9: Summary of Three-Week Moving Average
Calculations (cont.)WeekTime Series ValueForecastForecast

ErrorPercentage ErrorAbsolute Value of Percentage Error 7
20 19 1 1 1 5.00 5.00 8 18 18 0 0
0 0.00 0.00 9 22 18 4 4 16 18.18
18.18 10 20 20 0 0 0 0.00 0.00 11 15
20 −5 5 25 −33.33 33.33 12 22 19 3
3 9 13.64 13.64Totals 0 24 92 −20.79
129.21
To illustrate the moving averages method, let us return to the
original 12 weeks of gasoline sales data in Table 8.1 and Figure
8.1.
We will use a three-week moving average (k = 3).
We begin by computing the forecast of sales in week 4 using the
average of the time series values in weeks 1 to 3.
The moving average forecast of sales in week 4 is 19 or 19,000
gallons of gasoline.
Because the actual value observed in week 4 is 23, the forecast
error in week 4 is e4 = 23 - 19 = 4.
We next compute the forecast of sales in week 5 by averaging
the time series values in weeks 2 to 4.
Hence, the forecast of sales in week 5 is 21 and the error
associated with this forecast is e5 = 18 - 21 = -3.

A complete summary of the three-week moving average
forecasts …

Business analytics© 2021 cengage learning. all rights

Recommended

Recommended

More Related Content

Similar to Business analytics© 2021 cengage learning. all rights

Similar to Business analytics© 2021 cengage learning. all rights (20)

More from RAJU852744

More from RAJU852744 (20)

Recently uploaded

Recently uploaded (20)

Business analytics© 2021 cengage learning. all rights