SlideShare a Scribd company logo
Copyright © 2009 Cengage Learning
Introduction to Statistics and
Data Analysis
Resource Person:
Muhammad Zaheer
Copyright © 2009 Cengage Learning 1.2
Session 1
What is Statistics?
Copyright © 2009 Cengage Learning 1.3
What is Statistics?
“Statistics is a way to get information from data.”
Copyright © 2009 Cengage Learning 1.4
What is Statistics?
“Statistics is a way to get information from data”
Data
Statistics
Information
Definitions: Oxford English Dictionary
Statistics is a tool for creating new understanding from a set of numbers.
Copyright © 2009 Cengage Learning 1.5
Key Statistical Concepts
Population
— a population is the group of all items of interest to
a statistics practitioner.
— frequently very large; sometimes infinite.
E.g. All 5 million Florida voters, per Example
Sample
— A sample is a set of data drawn from the
population.
— Potentially very large, but less than the population.
E.g. a sample of 765 voters exit polled on election day.
Copyright © 2009 Cengage Learning 1.6
Key Statistical Concepts
Parameter
— A descriptive measure of a population.
Statistic
— A descriptive measure of a sample.
Copyright © 2009 Cengage Learning 1.7
Key Statistical Concepts
Populations have Parameters,
Samples have Statistics.
Parameter
Population Sample
Statistic
Subset
Copyright © 2009 Cengage Learning 1.8
Descriptive Statistics
…are methods of organizing, summarizing, and presenting
data in a convenient and informative way. These methods
include:
Graphical Techniques and
Numerical Techniques
The actual method used depends on what information we would
like to extract. Are we interested in…
• measure(s) of central location? and/or
• measure(s) of variability (dispersion)?
Descriptive Statistics helps to answer these questions…
Copyright © 2009 Cengage Learning 1.9
Inferential Statistics
Descriptive Statistics describe the data set that’s being
analyzed, but doesn’t allow us to draw any conclusions or
make any interferences about the data. Hence we need
another branch of statistics: inferential statistics.
Inferential statistics is also a set of methods, but it is used to
draw conclusions or inferences about characteristics of
populations based on data from a sample.
Copyright © 2009 Cengage Learning 1.10
Statistical Inference
Statistical inference is the process of making an estimate,
prediction, or decision about a population based on a sample.
Parameter
Population
Sample
Statistic
Inference
What can we infer about a Population’s Parameters
based on a Sample’s Statistics?
Copyright © 2009 Cengage Learning 1.11
Statistical Inference
We use statistics to make inferences about parameters.
Therefore, we can make an estimate, prediction, or decision
about a population based on sample data.
Thus, we can apply what we know about a sample to the
larger population from which it was drawn!
Copyright © 2009 Cengage Learning 2.12
Definitions…
A variable is some characteristic of a population or sample.
E.g. student grades.
Typically denoted with a capital letter: X, Y, Z…
The values of the variable are the range of possible values
for a variable.
E.g. student marks (0..100)
Data are the observed values of a variable.
E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
Copyright © 2009 Cengage Learning
Definitions
• Discrete Variable
A variable that can take only specified values, like whole
numbers.
example number of rooms, number of children etc
• Continuous Variable
A variable that that theoretically any value between two
specified limits a and b.
example: time, height, weight, age etc
Copyright © 2009 Cengage Learning 4.14
Numerical Descriptive Techniques…
Measures of Central Location
Mean, Median, Mode
Measures of Variability
Range, Standard Deviation, Variance, Coefficient of Variation
Measures of Relative Standing
Percentiles, Quartiles
Measures of Linear Relationship
Covariance, Correlation, Determination, Least Squares Line
Copyright © 2009 Cengage Learning
Four Elements of Descriptive Statistical Problems
1. The population or sample of interest.
2. One or more variables
3. Tables, graphs or numerical summary tools.
4. Conclusions about the data based on the patterns
revealed.
Copyright © 2009 Cengage Learning 4.16
Measures of Central Location…
The arithmetic mean, a.k.a. average, shortened to mean, is
the most popular & useful measure of central location.
It is computed by simply adding up all the observations and
dividing by the total number of observations:
Sum of the observations
Number of observations
Mean =
Copyright © 2009 Cengage Learning 4.17
Notation…
When referring to the number of observations in a
population, we use uppercase letter N
When referring to the number of observations in a
sample, we use lower case letter n
The arithmetic mean for a population is denoted with Greek
letter “mu”:
The arithmetic mean for a sample is denoted with an
“x-bar”:
Copyright © 2009 Cengage Learning 4.18
Arithmetic Mean…
Population Mean Sample Mean
Copyright © 2009 Cengage Learning 4.19
The Arithmetic Mean…
…is appropriate for describing measurement data, e.g. heights of
people, marks of student papers, etc.
Merits
•It is rigidly defined.
•Easy to calculate, simple to understand and easy to interpret.
•It is based on all the observations.
•Capable of further mathematical treatments.
•It is stable in experiments of repeated sampling from a distribution.
Demerits
•It is seriously affected by extreme values called “outliers”. E.g. as
soon as a billionaire moves into a neighborhood, the average household
income increases beyond what it was previously.
•Not a good measure for extremely skewed distributions.
•Can locate a value which doesn’t exist in the data.
Copyright © 2009 Cengage Learning 4.20
Measures of Central Location…
The median is calculated by placing all the observations in
order; the observation that falls in the middle is the median.
Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N=9 (odd)
Sort them bottom to top, find the middle:
0 0 5 7 8 9 12 14 22
Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 (even)
Sort them bottom to top, the middle is the
simple average between 8 & 9:
0 0 5 7 8 9 12 14 22 33
median = (8+9)÷2 = 8.5
Sample and population medians are computed the same way.
Copyright © 2009 Cengage Learning 4.21
Measures of Central Location…
The mode of a set of observations is the value that occurs
most frequently.
A set of data may have one mode (or modal class), or two,
or more modes.
Mode is a useful for all data types, though mainly used for
nominal data.
For large data sets the modal class is much more relevant
than a single-value mode.
Sample and population modes are computed the same way.
Copyright © 2009 Cengage Learning 4.22
Mode…
E.g. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10
Which observation appears most often?
The mode for this data set is 0. How is this a measure of
“central” location?
Frequency
Variable
A modal class
Copyright © 2009 Cengage Learning 4.23
Mean, Median, Mode…
If a distribution is symmetrical,
the mean, median and mode may coincide…
mode
mean
median
Copyright © 2009 Cengage Learning 4.24
Mean, Median, Mode…
If a distribution is asymmetrical, say skewed to the left or to
the right, the three measures may differ. E.g.:
mode
mean
median
Copyright © 2009 Cengage Learning
Mean, Median, Mode Relationship
• For symmetrical distribution
Mean=Median=Mode
• For positively skewed distribution
Mean>Median>Mode
• For negatively skewed distribution
Mean<Median<Mode
• Mode=3median-2mean
Copyright © 2009 Cengage Learning 4.26
Mean, Median, Mode: Which Is Best?
With three measures from which to choose, which one
should we use?
The mean is generally our first selection. However, there are
several circumstances when the median is better.
The mode is seldom the best measure of central location.
One advantage the median holds is that it not as sensitive to
extreme values as is the mean.
Copyright © 2009 Cengage Learning 4.27
Mean, Median, Mode: Which Is Best?
To illustrate, consider the data in an Example internet usage
in hours:
0 7 12 5 33 14 8 0 9 22
The mean is 11.0 and the median is 8.5.
Now suppose that the respondent who reported 33 hours
actually reported 133 hours (obviously an Internet addict).
The mean becomes
0.21
10
210
10
22081413351270
n
x
x
n
1i
i




Copyright © 2009 Cengage Learning 4.28
Mean, Median, Mode: Which Is Best?
What you see in this new MEAN?
The median stays the same. When there is a relatively small
number of extreme observations (either very small or very
large, but not both), the median usually produces a better
measure of the center of the data.
Copyright © 2009 Cengage Learning 4.29
Mean, Median, & Modes for Ordinal & Nominal Data
For ordinal and nominal data the calculation of the mean is
not valid.
Median is appropriate for ordinal data.
For nominal data, a mode calculation is useful for
determining highest frequency but not “central location”.
Copyright © 2009 Cengage Learning 4.30
Measures of Central Location • Summary…
Compute the Mean to
• Describe the central location of a single set of interval
data
Compute the Median to
• Describe the central location of a single set of interval or
ordinal data
Compute the Mode to
• Describe a single set of nominal data
Copyright © 2009 Cengage Learning 4.31
Geometric Mean
The arithmetic mean is the single most popular and useful
measure of central location.
However, there is another circumstance where neither the
mean nor the median is the best measure.
When the variable is a growth rate or rate of change, such as
the value of an investment over periods of time, we need
another measure.
This will become apparent from the following illustration.
Copyright © 2009 Cengage Learning 4.32
Geometric Mean
Suppose you make a 2-year investment of $1,000 and it
grows by 100% to $2,000 during the first year.
During the second year, however, the investment suffers a
50% loss, from $2,000 back to $1,000.
The rates of return for years 1 and 2 are R1 = 100% and R2 =
–50%, respectively. The arithmetic mean (and the median) is
computed as
%25
2
)50(100
2
RR
R 21





Copyright © 2009 Cengage Learning 4.33
Geometric Mean
But this figure is misleading. Because there was no change
in the value of the investment from the beginning to the end
of the 2-year period, the “average” compounded rate of
return is 0%.
As you will see, this is the value of the geometric mean.
Copyright © 2009 Cengage Learning 4.34
Geometric Mean
Let Ri denote the rate of return (in decimal form) in period i
(i = 1, 2, …, n). The geometric mean Rg of the returns is
defined such that
Solving for Rg we produce the following formula.
)R1)...(R1)(R1()R1( n21
n
g 
1)R1)...(R1)(R1(R n
n21g 
Copyright © 2009 Cengage Learning 4.35
Geometric Mean
The geometric mean of our investment illustration is
The geometric mean is therefore 0%. This is the single
“average” return that allows us to compute the value of the
investment at the end of the investment period from the
beginning value. Thus, using the formula for compound
interest with the rate = 0%, we find
Value at the end of the investment period = 1,000(1 + Rg)2 =
1,000(1 + 0) 2 = 1,000
1)R1)...(R1)(R1(R n
n21g 
0111])50.[1)(11(2 
Copyright © 2009 Cengage Learning 4.36
Geometric Mean
Thus, using the formula for compound interest with the rate
= 0%, we find
Value at the end of the investment period
= 1,000(1 + Rg)2 = 1,000(1 + 0) 2 = 1,000
Copyright © 2009 Cengage Learning 4.37
Geometric Mean
The geometric mean is used whenever we wish to find the
“average” growth rate, or rate of change, in a variable over
time.
However, the arithmetic mean of n returns (or growth rates)
is the appropriate mean to calculate if you wish to estimate
the mean rate of return (or growth rate) for any single period
in the future.
Copyright © 2009 Cengage Learning 4.38
Measures of Variability…
Measures of central location fail to tell the whole story about
the distribution; that is, how much are the observations
spread out around the mean value?
For example, two sets of class
grades are shown. The mean
(=50) is the same in each case…
But, the red class has greater
variability than the blue class.
Copyright © 2009 Cengage Learning 4.39
Range…
The range is the simplest measure of variability, calculated
as:
Range = Largest observation – Smallest observation
E.g.
Data: {4, 4, 4, 4, 50} Range = 46
Data: {4, 8, 15, 24, 39, 50} Range = 46
The range is the same in both cases,
but the data sets have very different distributions…
Copyright © 2009 Cengage Learning 4.40
Range…
Its major advantage is the ease with which it can be
computed.
Its major shortcoming is its failure to provide information on
the dispersion of the observations between the two end
points.
Hence we need a measure of variability that incorporates
all the data and not just two observations. Hence…
Copyright © 2009 Cengage Learning 4.41
Variance…
Variance and its related measure, standard deviation, are
arguably the most important statistics. Used to measure
variability, they also play a vital role in almost all statistical
inference procedures.
Population variance is denoted by
(Lower case Greek letter “sigma” squared)
Sample variance is denoted by
(Lower case “S” squared)
Copyright © 2009 Cengage Learning 4.42
Variance…
The variance of a population is:
The variance of a sample is:
population mean
sample mean
Note: The denominator is sample size (n) minus one !
population size
Copyright © 2009 Cengage Learning 4.43
Variance…
As you can see, you have to calculate the sample mean (x-
bar) in order to calculate the sample variance.
Alternatively, there is a short-cut formulation to calculate
sample variance directly from the data without the
intermediate step of calculating the mean. Its given by:
Copyright © 2009 Cengage Learning 4.44
Application…
Example: The following sample consists of the number of
jobs six students applied for: 17, 15, 23, 7, 9, 13.
Finds its mean and variance.
What are we looking to calculate?
The following sample consists of the number of jobs six
students applied for: 17, 15, 23, 7, 9, 13.
Finds its mean and variance.
…as opposed to  or 2
Copyright © 2009 Cengage Learning 4.45
Sample Mean & Variance…
Sample Mean
Sample Variance
Sample Variance (shortcut method)
Copyright © 2009 Cengage Learning 4.46
Standard Deviation…
The standard deviation is simply the square root of the
variance, thus:
Population standard deviation:
Sample standard deviation:
Copyright © 2009 Cengage Learning 4.47
Standard Deviation…
Consider Example 4.8 [Xm04-08]where a golf club manufacturer has
designed a new club and wants to determine if it is hit more
consistently (i.e. with less variability) than with an old club.
Using Data > Data Analysis > Descriptive Statistics in Excel, we
produce the following tables for interpretation…
You get more
consistent
distance with the
new club.
Copyright © 2009 Cengage Learning 4.48
Interpreting Standard Deviation…
The standard deviation can be used to compare the variability of
several distributions and make a statement about the general shape
of a distribution. If the histogram is bell shaped, we can use the
Empirical Rule, which states:
1) Approximately 68% of all observations fall within one standard
deviation of the mean.
2) Approximately 95% of all observations fall within two standard
deviations of the mean.
3) Approximately 99.7% of all observations fall within three standard
deviations of the mean.
Copyright © 2009 Cengage Learning 4.49
The Empirical Rule…
Approximately 68% of all observations fall
within one standard deviation of the mean.
Approximately 95% of all observations fall
within two standard deviations of the mean.
Approximately 99.7% of all observations fall
within three standard deviations of the mean.
Copyright © 2009 Cengage Learning 4.50
Chebysheff’s Theorem…
A more general interpretation of the standard deviation is
derived from Chebysheff’s Theorem, which applies to all
shapes of histograms (not just bell shaped).
The proportion of observations in any sample that lie
within k standard deviations of the mean is at least:
For k=2 (say), the theorem states
that at least 3/4 of all observations
lie within 2 standard deviations of
the mean. This is a “lower bound”
compared to Empirical Rule’s
approximation (95%).
Copyright © 2009 Cengage Learning 4.51
Interpreting Standard Deviation
Suppose that the mean and standard deviation of last year’s
midterm test marks are 70 and 5, respectively. If the
histogram is bell-shaped then we know that approximately
68% of the marks fell between 65 and 75, approximately
95% of the marks fell between 60 and 80, and approximately
99.7% of the marks fell between 55 and 85.
If the histogram is not at all bell-shaped we can say that at
least 75% of the marks fell between 60 and 80, and at least
88.9% of the marks fell between 55 and 85. (We can use
other values of k.)
Copyright © 2009 Cengage Learning 4.52
Coefficient of Variation…
The coefficient of variation of a set of observations is the
standard deviation of the observations divided by their mean,
that is:
Population coefficient of variation = CV =
Sample coefficient of variation = cv =
Copyright © 2009 Cengage Learning 4.53
Coefficient of Variation…
This coefficient provides a
proportionate measure of variation, e.g.
A standard deviation of 10 may be perceived as large when
the mean value is 100, but only moderately large when the
mean value is 500.
Copyright © 2009 Cengage Learning 4.54
Measures of Relative Standing
Measures of relative standing are designed to provide
information about the position of particular values relative to
the entire data set.
Percentile: the Pth percentile is the value for which P percent
are less than that value and (100-P)% are greater than that
value.
Suppose you scored in the 60th percentile on the GMAT, that
means 60% of the other scores were below yours, while 40% of
scores were above yours.
Copyright © 2009 Cengage Learning 4.55
Quartiles…
We have special names for the 25th, 50th, and 75th
percentiles, namely quartiles.
The first or lower quartile is labeled Q1 = 25th percentile.
The second quartile, Q2 = 50th percentile (which is also the
median).
The third or upper quartile, Q3 = 75th percentile.
We can also convert percentiles into quintiles (fifths) and deciles (tenths).
Copyright © 2009 Cengage Learning 4.56
Commonly Used Percentiles…
First (lower) decile = 10th percentile
First (lower) quartile, Q1, = 25th percentile
Second (middle)quartile,Q2, = 50th percentile
Third quartile, Q3, = 75th percentile
Ninth (upper) decile = 90th percentile
Note: If your exam mark places you in the 80th percentile,
that doesn’t mean you scored 80% on the exam – it means
that 80% of your peers scored lower than you on the exam;
It is about your position relative to others.
Copyright © 2009 Cengage Learning 4.57
Location of Percentiles…
The following formula allows us to approximate the location
of any percentile:
Copyright © 2009 Cengage Learning 4.58
Location of Percentiles…
Recall the data from Example 4.1:
0 0 5 7 8 9 12 14 22 33
Where is the location of the 25th percentile? That is, at
which point are 25% of the values lower and 75% of the
values higher?
L25 = (10+1)(25/100) = 2.75
0 0 5 7 8 9 12 14 22 33
The 25th percentile is three-quarters of the distance between the second
(which is 0) and the third (which is 5) observations. Three-quarters of the
distance is: (.75)(5 – 0) = 3.75
Because the second observation is 0, the 25th percentile is 0 + 3.75 = 3.75
Copyright © 2009 Cengage Learning 4.59
Location of Percentiles…
What about the upper quartile?
L75 = (10+1)(75/100) = 8.25
0 0 5 7 8 9 12 14 22 33
It is located one-quarter of the distance between the eighth and the ninth
observations, which are 14 and 22, respectively. One-quarter of the distance
is: (.25)(22 - 14) = 2, which means the 75th percentile is at: 14 + 2 = 16
Copyright © 2009 Cengage Learning 4.60
Location of Percentiles…
Please remember…
0 0 | 5 7 8 9 12 14 | 22 33
16
Lp determines the position in the data set where the percentile value lies,
not the value of the percentile itself.
3.75
position
8.25
position
2.75
Copyright © 2009 Cengage Learning 4.61
Interquartile Range…
The quartiles can be used to create another measure of
variability, the interquartile range, which is defined as
follows:
Interquartile Range = Q3 – Q1
The interquartile range measures the spread of the middle
50% of the observations.
Large values of this statistic mean that the 1st and 3rd
quartiles are far apart indicating a high level of variability.
Copyright © 2009 Cengage Learning 4.62
Measures of Linear Relationship…
We now present three numerical measures of linear
relationship that provide information as to the strength &
direction of a linear relationship between two variables (if
one exists).
They are the covariance, the coefficient of correlation, and
the coefficient of determination.
Copyright © 2009 Cengage Learning 4.63
Covariance…
population mean of variable X, variable Y
sample mean of variable X, variable Y
Note: divisor is n-1, not n as you may expect.
Copyright © 2009 Cengage Learning 4.64
Covariance…
In much the same way there was a “shortcut” for
calculating sample variance without having to calculate the
sample mean, there is also a shortcut for calculating sample
covariance without having to first calculate the means:
Copyright © 2009 Cengage Learning 4.65
Covariance Illustrated…
Consider the following three sets of data:
In each set, the values of X are the same, and the value for Y are the same;
the only thing that’s changed is the order of the Y’s.
In set #1, as X increases so does Y; Sxy is large & positive
In set #2, as X increases, Y decreases; Sxy is large & negative
In set #3, as X increases, Y doesn’t move in any particular way; Sxy is “small”
Copyright © 2009 Cengage Learning 4.66
Covariance… (Generally speaking)
When two variables move in the same direction (both increase or both
decrease), the covariance will be a large positive number.
When two variables move in opposite directions, the covariance is a
large negative number.
When there is no particular pattern, the covariance is a small number.
However, it is often difficult to determine whether a particular
covariance is large or small. The next parameter/statistic addresses this
problem.
Copyright © 2009 Cengage Learning 4.67
Coefficient of Correlation…
The coefficient of correlation is defined as the covariance
divided by the standard deviations of the variables:
Greek letter
“rho”
This coefficient answers the question:
How strong is the association between X and Y?
Copyright © 2009 Cengage Learning 4.68
Coefficient of Correlation…
The advantage of the coefficient of correlation over covariance is that
it has fixed range from -1 to +1, thus:
If the two variables are very strongly positively related, the coefficient
value is close to +1 (strong positive linear relationship).
If the two variables are very strongly negatively related, the coefficient
value is close to -1 (strong negative linear relationship).
No straight line relationship is indicated by a coefficient close to zero.
Copyright © 2009 Cengage Learning 4.69
Coefficient of Correlation…
r or r =
+1
0
-1
Strong positive linear relationship
No linear relationship
Strong negative linear relationship
Copyright © 2009 Cengage Learning 4.70
Example 4.16
Calculate the coefficient of correlation for the three sets of data above.
Copyright © 2009 Cengage Learning 4.71
Example 4.16
Because we’ve already calculated the covariances we only need
compute the standard deviations of X and Y.
0.5
3
762
x 


0.20
3
272013
y 


0.7
2
419
13
)57()56()52(
s
222
2
x 





0.49
2
490149
13
)2027()2020()2013(
s
222
2
y 





Copyright © 2009 Cengage Learning 4.72
Example 4.16
The standard deviations are
65.20.7sx 
00.70.49sy 
Copyright © 2009 Cengage Learning 4.73
Example 4.16
The coefficients of correlation are
Set 1:
Set 2:
Set 3:
943.
)0.7)(65.2(
5.17
ss
s
r
yx
xy

943.
)0.7)(65.2(
5.17
ss
s
r
yx
xy



189.
)0.7)(65.2(
5.3
ss
s
r
yx
xy



Copyright © 2009 Cengage Learning 4.74
Least Squares Method
The objective of the scatter diagram is to measure the
strength and direction of the linear relationship.
Both can be more easily judged by drawing a straight line
through the data.
We need an objective method of producing a straight line.
Such a method has been developed; it is called the least
squares method.
Copyright © 2009 Cengage Learning 4.75
Least Squares Method
Recall, the slope-intercept equation for a line is expressed in these
terms:
y = mx + b
Where:
m is the slope of the line
b is the y-intercept.
If we’ve determined there is a linear relationship between two
variables with covariance and the coefficient of correlation, can we
determine a linear function of the relationship?
Copyright © 2009 Cengage Learning 4.76
The Least Squares Method
…produces a straight line drawn through the points so that
the sum of squared deviations between the points and the
line is minimized. This line is represented by the equation:
b0 (“b” naught) is the y-intercept,
b1 is the slope, and
(“y” hat) is the value of y determined by the line.
Copyright © 2009 Cengage Learning 4.77
The Least Squares Method
The coefficients b0 and b1 are given by:
Copyright © 2009 Cengage Learning 4.78
Fixed and Variable Costs
Fixed costs are costs that must be paid whether or not any
units are produced.
These costs are "fixed" over a specified period of time or
range of production.
Variable costs are costs that vary directly with the number of
products produced.
Copyright © 2009 Cengage Learning 4.79
Fixed and Variable Costs
There are some expenses that are mixed.
There are several ways to break the mixed costs in its fixed
and variable components. One such method is the least
squares line. That is, we express the total costs of some
component as
y = b0 + b1x
where y = total mixed cost, b0 = fixed cost and b1 = variable
cost, and x is the number of units.
Copyright © 2009 Cengage Learning 4.80
Example 4.17
A tool and die maker operates out of a small shop making specialized
tools.
He is considering increasing the size of his business and needed to
know more about his costs.
One such cost is electricity, which he needs to operate his machines
and lights. (Some jobs require that he turn on extra bright lights to
illuminate his work.)
He keeps track of his daily electricity costs and the number of tools
that he made that day. Determine the fixed and variable electricity
costs. [Xm04-17]
Copyright © 2009 Cengage Learning 4.81
Example 4.17
y = 2.245x + 9.587
0.00
5.00
10.00
15.00
20.00
25.00
30.00
35.00
40.00
45.00
50.00
0 2 4 6 8 10 12 14 16
Electricalcosts
Number of tools
Copyright © 2009 Cengage Learning 4.82
Example 4.17
The slope is defined as rise/run, which means that it is the change in
y (rise) for a 1-unit increase in x (run).
The slope measures the marginal rate of change in the dependent
variable. The marginal rate of change refers to the effect of
increasing the independent variable by one additional unit.
In this example the slope is 2.25, which means that for each 1-unit
increase in the number of tools, the marginal increase in the
electricity cost 2.25. Thus, the estimated variable cost is $2.25 per
tool.
x245.2587.9yˆ 
Copyright © 2009 Cengage Learning 4.83
Example 4.17
The y-intercept is 9.57. That is, the line strikes the y-axis at 9.57.
This is simply the value of when x = 0.
However, when x = 0 we are producing no tools and hence the
estimated fixed cost of electricity is $9.59 per day.
x25.259.9yˆ 
Copyright © 2009 Cengage Learning 4.84
Coefficient of Determination
When we introduced the coefficient of correlation we pointed out that
except for −1, 0, and +1 we cannot precisely interpret its meaning.
We can judge the coefficient of correlation in relation to its proximity
to −1, 0, and +1 only.
Fortunately, we have another measure that can be precisely interpreted.
It is the coefficient of determination, which is calculated by squaring
the coefficient of correlation. For this reason we denote it R2 .
The coefficient of determination measures the amount of variation in
the dependent variable that is explained by the variation in the
independent variable.
Copyright © 2009 Cengage Learning 4.85
Example 4.18
Calculate the coefficient of determination for Example 4.17
y = 2.245x + 9.587
R² = 0.758
0.00
5.00
10.00
15.00
20.00
25.00
30.00
35.00
40.00
45.00
50.00
0 2 4 6 8 10 12 14 16
Electricalcosts
Number of tools
Copyright © 2009 Cengage Learning 4.86
Example 4.18
The coefficient of determination is
R2 = .758
This tells us that 75.8% of the variation in electrical costs is explained
by the number of tools. The remaining 24.2% is unexplained.
Copyright © 2009 Cengage Learning 4.87
Interpreting Correlation (or Determination)
Because of its importance we remind you about the correct
interpretation of the analysis of the relationship between two interval
variables. That is, if two variables are linearly related it does not mean
that X is causing Y. It may mean that another variable is causing both
X and Y or that Y is causing X. Remember
“Correlation is not Causation”
Copyright © 2009 Cengage Learning 4.88
Parameters and Statistics
Population Sample
Size N n
Mean
Variance S2
Standard
Deviation S
Coefficient of
Variation CV cv
Covariance Sxy
Coefficient of
Correlation r

More Related Content

What's hot

Regression analysis
Regression analysisRegression analysis
Regression analysis
Teachers Mitraa
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
Amira Talic
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Sarfraz Ahmad
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
Andi Koentary
 
Statistical inference concept, procedure of hypothesis testing
Statistical inference   concept, procedure of hypothesis testingStatistical inference   concept, procedure of hypothesis testing
Statistical inference concept, procedure of hypothesis testing
AmitaChaudhary19
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Anand Thokal
 
Statistics for Business Decision-making
Statistics for Business Decision-makingStatistics for Business Decision-making
Statistics for Business Decision-making
Jason Martuscello
 
Estimation and hypothesis
Estimation and hypothesisEstimation and hypothesis
Estimation and hypothesis
Junaid Ijaz
 
Introduction To Survival Analysis
Introduction To Survival AnalysisIntroduction To Survival Analysis
Introduction To Survival Analysis
federicorotolo
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
CIToolkit
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Aileen Balbido
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Venkata Reddy Konasani
 
statistic
statisticstatistic
statistic
Pwalmiki
 
Introduction to Statistics - Basic concepts
Introduction to Statistics - Basic conceptsIntroduction to Statistics - Basic concepts
Introduction to Statistics - Basic concepts
DocIbrahimAbdelmonaem
 
Introduction to Probability and Probability Distributions
Introduction to Probability and Probability DistributionsIntroduction to Probability and Probability Distributions
Introduction to Probability and Probability Distributions
Jezhabeth Villegas
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
Anjan Mahanta
 
Statistics
StatisticsStatistics
Statistics
AreehaRana
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
Rajesh Gunesh
 
Chap02 describing data; numerical
Chap02 describing data; numericalChap02 describing data; numerical
Chap02 describing data; numerical
Judianto Nugroho
 
Methods of point estimation
Methods of point estimationMethods of point estimation
Methods of point estimation
Suruchi Somwanshi
 

What's hot (20)

Regression analysis
Regression analysisRegression analysis
Regression analysis
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Introduction to statistics
Introduction to statisticsIntroduction to statistics
Introduction to statistics
 
Statistical inference concept, procedure of hypothesis testing
Statistical inference   concept, procedure of hypothesis testingStatistical inference   concept, procedure of hypothesis testing
Statistical inference concept, procedure of hypothesis testing
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Statistics for Business Decision-making
Statistics for Business Decision-makingStatistics for Business Decision-making
Statistics for Business Decision-making
 
Estimation and hypothesis
Estimation and hypothesisEstimation and hypothesis
Estimation and hypothesis
 
Introduction To Survival Analysis
Introduction To Survival AnalysisIntroduction To Survival Analysis
Introduction To Survival Analysis
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
statistic
statisticstatistic
statistic
 
Introduction to Statistics - Basic concepts
Introduction to Statistics - Basic conceptsIntroduction to Statistics - Basic concepts
Introduction to Statistics - Basic concepts
 
Introduction to Probability and Probability Distributions
Introduction to Probability and Probability DistributionsIntroduction to Probability and Probability Distributions
Introduction to Probability and Probability Distributions
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
 
Statistics
StatisticsStatistics
Statistics
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Chap02 describing data; numerical
Chap02 describing data; numericalChap02 describing data; numerical
Chap02 describing data; numerical
 
Methods of point estimation
Methods of point estimationMethods of point estimation
Methods of point estimation
 

Similar to Introduction to statistics & data analysis

Computing Descriptive Statistics © 2014 Argos.docx
 Computing Descriptive Statistics     © 2014 Argos.docx Computing Descriptive Statistics     © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docx
aryan532920
 
Computing Descriptive Statistics © 2014 Argos.docx
Computing Descriptive Statistics     © 2014 Argos.docxComputing Descriptive Statistics     © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docx
AASTHA76
 
Research Method for Business chapter 12
Research Method for Business chapter 12Research Method for Business chapter 12
Research Method for Business chapter 12
Mazhar Poohlah
 
Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
darwinming1
 
Statistics and permeability engineering reports
Statistics and permeability engineering reportsStatistics and permeability engineering reports
Statistics and permeability engineering reports
wwwmostafalaith99
 
STATISTICS.pptx
STATISTICS.pptxSTATISTICS.pptx
STATISTICS.pptx
theadarshagarwal
 
INTRODUCTION TO STATISTICS.pptx
INTRODUCTION TO STATISTICS.pptxINTRODUCTION TO STATISTICS.pptx
INTRODUCTION TO STATISTICS.pptx
AvilosErgelaKram
 
Statistics four
Statistics fourStatistics four
Statistics four
Mohamed Hefny
 
Measure OF Central Tendency
Measure OF Central TendencyMeasure OF Central Tendency
Measure OF Central Tendency
Iqrabutt038
 
Topic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptxTopic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptx
JohnLester81
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
Richard Paulino
 
Topic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptxTopic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptx
PreciousAntonetteOrp
 
Measures of central tendency ict integration
Measures of central tendency   ict integrationMeasures of central tendency   ict integration
Measures of central tendency ict integration
Richard Paulino
 
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docxANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
cullenrjzsme
 
Unit 3_1.pptx
Unit 3_1.pptxUnit 3_1.pptx
Unit 3_1.pptx
Mrunmayee Manjari
 
Upload 140103034715-phpapp01 (1)
Upload 140103034715-phpapp01 (1)Upload 140103034715-phpapp01 (1)
Upload 140103034715-phpapp01 (1)
captaininfantry
 
Sampling and statistical inference
Sampling and statistical inferenceSampling and statistical inference
Sampling and statistical inference
Bhavik A Shah
 
Mat 255 chapter 3 notes
Mat 255 chapter 3 notesMat 255 chapter 3 notes
Mat 255 chapter 3 notes
adrushle
 
Machine learning pre requisite
Machine learning pre requisiteMachine learning pre requisite
Machine learning pre requisite
Ram Singh
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
Gilbert Joseph Abueg
 

Similar to Introduction to statistics & data analysis (20)

Computing Descriptive Statistics © 2014 Argos.docx
 Computing Descriptive Statistics     © 2014 Argos.docx Computing Descriptive Statistics     © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docx
 
Computing Descriptive Statistics © 2014 Argos.docx
Computing Descriptive Statistics     © 2014 Argos.docxComputing Descriptive Statistics     © 2014 Argos.docx
Computing Descriptive Statistics © 2014 Argos.docx
 
Research Method for Business chapter 12
Research Method for Business chapter 12Research Method for Business chapter 12
Research Method for Business chapter 12
 
Statistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docxStatistical ProcessesCan descriptive statistical processes b.docx
Statistical ProcessesCan descriptive statistical processes b.docx
 
Statistics and permeability engineering reports
Statistics and permeability engineering reportsStatistics and permeability engineering reports
Statistics and permeability engineering reports
 
STATISTICS.pptx
STATISTICS.pptxSTATISTICS.pptx
STATISTICS.pptx
 
INTRODUCTION TO STATISTICS.pptx
INTRODUCTION TO STATISTICS.pptxINTRODUCTION TO STATISTICS.pptx
INTRODUCTION TO STATISTICS.pptx
 
Statistics four
Statistics fourStatistics four
Statistics four
 
Measure OF Central Tendency
Measure OF Central TendencyMeasure OF Central Tendency
Measure OF Central Tendency
 
Topic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptxTopic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptx
 
Measures of central tendency
Measures of central tendencyMeasures of central tendency
Measures of central tendency
 
Topic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptxTopic-1-Review-of-Basic-Statistics.pptx
Topic-1-Review-of-Basic-Statistics.pptx
 
Measures of central tendency ict integration
Measures of central tendency   ict integrationMeasures of central tendency   ict integration
Measures of central tendency ict integration
 
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docxANALYSIS ANDINTERPRETATION  OF DATA Analysis and Interpr.docx
ANALYSIS ANDINTERPRETATION OF DATA Analysis and Interpr.docx
 
Unit 3_1.pptx
Unit 3_1.pptxUnit 3_1.pptx
Unit 3_1.pptx
 
Upload 140103034715-phpapp01 (1)
Upload 140103034715-phpapp01 (1)Upload 140103034715-phpapp01 (1)
Upload 140103034715-phpapp01 (1)
 
Sampling and statistical inference
Sampling and statistical inferenceSampling and statistical inference
Sampling and statistical inference
 
Mat 255 chapter 3 notes
Mat 255 chapter 3 notesMat 255 chapter 3 notes
Mat 255 chapter 3 notes
 
Machine learning pre requisite
Machine learning pre requisiteMachine learning pre requisite
Machine learning pre requisite
 
CABT Math 8 measures of central tendency and dispersion
CABT Math 8   measures of central tendency and dispersionCABT Math 8   measures of central tendency and dispersion
CABT Math 8 measures of central tendency and dispersion
 

Recently uploaded

一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
74nqk8xf
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
ahzuo
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
ahzuo
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
AndrzejJarynowski
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
slg6lamcq
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Kiwi Creative
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
u86oixdj
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
apvysm8
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
slg6lamcq
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
kuntobimo2016
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
nyfuhyz
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
74nqk8xf
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
bopyb
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
Roger Valdez
 

Recently uploaded (20)

一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
一比一原版(Chester毕业证书)切斯特大学毕业证如何办理
 
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
一比一原版(CBU毕业证)卡普顿大学毕业证如何办理
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
一比一原版(UIUC毕业证)伊利诺伊大学|厄巴纳-香槟分校毕业证如何办理
 
University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
Intelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicineIntelligence supported media monitoring in veterinary medicine
Intelligence supported media monitoring in veterinary medicine
 
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
一比一原版(UniSA毕业证书)南澳大学毕业证如何办理
 
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging DataPredictably Improve Your B2B Tech Company's Performance by Leveraging Data
Predictably Improve Your B2B Tech Company's Performance by Leveraging Data
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
原版制作(Deakin毕业证书)迪肯大学毕业证学位证一模一样
 
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
办(uts毕业证书)悉尼科技大学毕业证学历证书原版一模一样
 
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
一比一原版(Adelaide毕业证书)阿德莱德大学毕业证如何办理
 
State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023State of Artificial intelligence Report 2023
State of Artificial intelligence Report 2023
 
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
一比一原版(UMN文凭证书)明尼苏达大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
一比一原版(Coventry毕业证书)考文垂大学毕业证如何办理
 
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
一比一原版(GWU,GW文凭证书)乔治·华盛顿大学毕业证如何办理
 
Everything you wanted to know about LIHTC
Everything you wanted to know about LIHTCEverything you wanted to know about LIHTC
Everything you wanted to know about LIHTC
 

Introduction to statistics & data analysis

  • 1. Copyright © 2009 Cengage Learning Introduction to Statistics and Data Analysis Resource Person: Muhammad Zaheer
  • 2. Copyright © 2009 Cengage Learning 1.2 Session 1 What is Statistics?
  • 3. Copyright © 2009 Cengage Learning 1.3 What is Statistics? “Statistics is a way to get information from data.”
  • 4. Copyright © 2009 Cengage Learning 1.4 What is Statistics? “Statistics is a way to get information from data” Data Statistics Information Definitions: Oxford English Dictionary Statistics is a tool for creating new understanding from a set of numbers.
  • 5. Copyright © 2009 Cengage Learning 1.5 Key Statistical Concepts Population — a population is the group of all items of interest to a statistics practitioner. — frequently very large; sometimes infinite. E.g. All 5 million Florida voters, per Example Sample — A sample is a set of data drawn from the population. — Potentially very large, but less than the population. E.g. a sample of 765 voters exit polled on election day.
  • 6. Copyright © 2009 Cengage Learning 1.6 Key Statistical Concepts Parameter — A descriptive measure of a population. Statistic — A descriptive measure of a sample.
  • 7. Copyright © 2009 Cengage Learning 1.7 Key Statistical Concepts Populations have Parameters, Samples have Statistics. Parameter Population Sample Statistic Subset
  • 8. Copyright © 2009 Cengage Learning 1.8 Descriptive Statistics …are methods of organizing, summarizing, and presenting data in a convenient and informative way. These methods include: Graphical Techniques and Numerical Techniques The actual method used depends on what information we would like to extract. Are we interested in… • measure(s) of central location? and/or • measure(s) of variability (dispersion)? Descriptive Statistics helps to answer these questions…
  • 9. Copyright © 2009 Cengage Learning 1.9 Inferential Statistics Descriptive Statistics describe the data set that’s being analyzed, but doesn’t allow us to draw any conclusions or make any interferences about the data. Hence we need another branch of statistics: inferential statistics. Inferential statistics is also a set of methods, but it is used to draw conclusions or inferences about characteristics of populations based on data from a sample.
  • 10. Copyright © 2009 Cengage Learning 1.10 Statistical Inference Statistical inference is the process of making an estimate, prediction, or decision about a population based on a sample. Parameter Population Sample Statistic Inference What can we infer about a Population’s Parameters based on a Sample’s Statistics?
  • 11. Copyright © 2009 Cengage Learning 1.11 Statistical Inference We use statistics to make inferences about parameters. Therefore, we can make an estimate, prediction, or decision about a population based on sample data. Thus, we can apply what we know about a sample to the larger population from which it was drawn!
  • 12. Copyright © 2009 Cengage Learning 2.12 Definitions… A variable is some characteristic of a population or sample. E.g. student grades. Typically denoted with a capital letter: X, Y, Z… The values of the variable are the range of possible values for a variable. E.g. student marks (0..100) Data are the observed values of a variable. E.g. student marks: {67, 74, 71, 83, 93, 55, 48}
  • 13. Copyright © 2009 Cengage Learning Definitions • Discrete Variable A variable that can take only specified values, like whole numbers. example number of rooms, number of children etc • Continuous Variable A variable that that theoretically any value between two specified limits a and b. example: time, height, weight, age etc
  • 14. Copyright © 2009 Cengage Learning 4.14 Numerical Descriptive Techniques… Measures of Central Location Mean, Median, Mode Measures of Variability Range, Standard Deviation, Variance, Coefficient of Variation Measures of Relative Standing Percentiles, Quartiles Measures of Linear Relationship Covariance, Correlation, Determination, Least Squares Line
  • 15. Copyright © 2009 Cengage Learning Four Elements of Descriptive Statistical Problems 1. The population or sample of interest. 2. One or more variables 3. Tables, graphs or numerical summary tools. 4. Conclusions about the data based on the patterns revealed.
  • 16. Copyright © 2009 Cengage Learning 4.16 Measures of Central Location… The arithmetic mean, a.k.a. average, shortened to mean, is the most popular & useful measure of central location. It is computed by simply adding up all the observations and dividing by the total number of observations: Sum of the observations Number of observations Mean =
  • 17. Copyright © 2009 Cengage Learning 4.17 Notation… When referring to the number of observations in a population, we use uppercase letter N When referring to the number of observations in a sample, we use lower case letter n The arithmetic mean for a population is denoted with Greek letter “mu”: The arithmetic mean for a sample is denoted with an “x-bar”:
  • 18. Copyright © 2009 Cengage Learning 4.18 Arithmetic Mean… Population Mean Sample Mean
  • 19. Copyright © 2009 Cengage Learning 4.19 The Arithmetic Mean… …is appropriate for describing measurement data, e.g. heights of people, marks of student papers, etc. Merits •It is rigidly defined. •Easy to calculate, simple to understand and easy to interpret. •It is based on all the observations. •Capable of further mathematical treatments. •It is stable in experiments of repeated sampling from a distribution. Demerits •It is seriously affected by extreme values called “outliers”. E.g. as soon as a billionaire moves into a neighborhood, the average household income increases beyond what it was previously. •Not a good measure for extremely skewed distributions. •Can locate a value which doesn’t exist in the data.
  • 20. Copyright © 2009 Cengage Learning 4.20 Measures of Central Location… The median is calculated by placing all the observations in order; the observation that falls in the middle is the median. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22} N=9 (odd) Sort them bottom to top, find the middle: 0 0 5 7 8 9 12 14 22 Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 (even) Sort them bottom to top, the middle is the simple average between 8 & 9: 0 0 5 7 8 9 12 14 22 33 median = (8+9)÷2 = 8.5 Sample and population medians are computed the same way.
  • 21. Copyright © 2009 Cengage Learning 4.21 Measures of Central Location… The mode of a set of observations is the value that occurs most frequently. A set of data may have one mode (or modal class), or two, or more modes. Mode is a useful for all data types, though mainly used for nominal data. For large data sets the modal class is much more relevant than a single-value mode. Sample and population modes are computed the same way.
  • 22. Copyright © 2009 Cengage Learning 4.22 Mode… E.g. Data: {0, 7, 12, 5, 14, 8, 0, 9, 22, 33} N=10 Which observation appears most often? The mode for this data set is 0. How is this a measure of “central” location? Frequency Variable A modal class
  • 23. Copyright © 2009 Cengage Learning 4.23 Mean, Median, Mode… If a distribution is symmetrical, the mean, median and mode may coincide… mode mean median
  • 24. Copyright © 2009 Cengage Learning 4.24 Mean, Median, Mode… If a distribution is asymmetrical, say skewed to the left or to the right, the three measures may differ. E.g.: mode mean median
  • 25. Copyright © 2009 Cengage Learning Mean, Median, Mode Relationship • For symmetrical distribution Mean=Median=Mode • For positively skewed distribution Mean>Median>Mode • For negatively skewed distribution Mean<Median<Mode • Mode=3median-2mean
  • 26. Copyright © 2009 Cengage Learning 4.26 Mean, Median, Mode: Which Is Best? With three measures from which to choose, which one should we use? The mean is generally our first selection. However, there are several circumstances when the median is better. The mode is seldom the best measure of central location. One advantage the median holds is that it not as sensitive to extreme values as is the mean.
  • 27. Copyright © 2009 Cengage Learning 4.27 Mean, Median, Mode: Which Is Best? To illustrate, consider the data in an Example internet usage in hours: 0 7 12 5 33 14 8 0 9 22 The mean is 11.0 and the median is 8.5. Now suppose that the respondent who reported 33 hours actually reported 133 hours (obviously an Internet addict). The mean becomes 0.21 10 210 10 22081413351270 n x x n 1i i    
  • 28. Copyright © 2009 Cengage Learning 4.28 Mean, Median, Mode: Which Is Best? What you see in this new MEAN? The median stays the same. When there is a relatively small number of extreme observations (either very small or very large, but not both), the median usually produces a better measure of the center of the data.
  • 29. Copyright © 2009 Cengage Learning 4.29 Mean, Median, & Modes for Ordinal & Nominal Data For ordinal and nominal data the calculation of the mean is not valid. Median is appropriate for ordinal data. For nominal data, a mode calculation is useful for determining highest frequency but not “central location”.
  • 30. Copyright © 2009 Cengage Learning 4.30 Measures of Central Location • Summary… Compute the Mean to • Describe the central location of a single set of interval data Compute the Median to • Describe the central location of a single set of interval or ordinal data Compute the Mode to • Describe a single set of nominal data
  • 31. Copyright © 2009 Cengage Learning 4.31 Geometric Mean The arithmetic mean is the single most popular and useful measure of central location. However, there is another circumstance where neither the mean nor the median is the best measure. When the variable is a growth rate or rate of change, such as the value of an investment over periods of time, we need another measure. This will become apparent from the following illustration.
  • 32. Copyright © 2009 Cengage Learning 4.32 Geometric Mean Suppose you make a 2-year investment of $1,000 and it grows by 100% to $2,000 during the first year. During the second year, however, the investment suffers a 50% loss, from $2,000 back to $1,000. The rates of return for years 1 and 2 are R1 = 100% and R2 = –50%, respectively. The arithmetic mean (and the median) is computed as %25 2 )50(100 2 RR R 21     
  • 33. Copyright © 2009 Cengage Learning 4.33 Geometric Mean But this figure is misleading. Because there was no change in the value of the investment from the beginning to the end of the 2-year period, the “average” compounded rate of return is 0%. As you will see, this is the value of the geometric mean.
  • 34. Copyright © 2009 Cengage Learning 4.34 Geometric Mean Let Ri denote the rate of return (in decimal form) in period i (i = 1, 2, …, n). The geometric mean Rg of the returns is defined such that Solving for Rg we produce the following formula. )R1)...(R1)(R1()R1( n21 n g  1)R1)...(R1)(R1(R n n21g 
  • 35. Copyright © 2009 Cengage Learning 4.35 Geometric Mean The geometric mean of our investment illustration is The geometric mean is therefore 0%. This is the single “average” return that allows us to compute the value of the investment at the end of the investment period from the beginning value. Thus, using the formula for compound interest with the rate = 0%, we find Value at the end of the investment period = 1,000(1 + Rg)2 = 1,000(1 + 0) 2 = 1,000 1)R1)...(R1)(R1(R n n21g  0111])50.[1)(11(2 
  • 36. Copyright © 2009 Cengage Learning 4.36 Geometric Mean Thus, using the formula for compound interest with the rate = 0%, we find Value at the end of the investment period = 1,000(1 + Rg)2 = 1,000(1 + 0) 2 = 1,000
  • 37. Copyright © 2009 Cengage Learning 4.37 Geometric Mean The geometric mean is used whenever we wish to find the “average” growth rate, or rate of change, in a variable over time. However, the arithmetic mean of n returns (or growth rates) is the appropriate mean to calculate if you wish to estimate the mean rate of return (or growth rate) for any single period in the future.
  • 38. Copyright © 2009 Cengage Learning 4.38 Measures of Variability… Measures of central location fail to tell the whole story about the distribution; that is, how much are the observations spread out around the mean value? For example, two sets of class grades are shown. The mean (=50) is the same in each case… But, the red class has greater variability than the blue class.
  • 39. Copyright © 2009 Cengage Learning 4.39 Range… The range is the simplest measure of variability, calculated as: Range = Largest observation – Smallest observation E.g. Data: {4, 4, 4, 4, 50} Range = 46 Data: {4, 8, 15, 24, 39, 50} Range = 46 The range is the same in both cases, but the data sets have very different distributions…
  • 40. Copyright © 2009 Cengage Learning 4.40 Range… Its major advantage is the ease with which it can be computed. Its major shortcoming is its failure to provide information on the dispersion of the observations between the two end points. Hence we need a measure of variability that incorporates all the data and not just two observations. Hence…
  • 41. Copyright © 2009 Cengage Learning 4.41 Variance… Variance and its related measure, standard deviation, are arguably the most important statistics. Used to measure variability, they also play a vital role in almost all statistical inference procedures. Population variance is denoted by (Lower case Greek letter “sigma” squared) Sample variance is denoted by (Lower case “S” squared)
  • 42. Copyright © 2009 Cengage Learning 4.42 Variance… The variance of a population is: The variance of a sample is: population mean sample mean Note: The denominator is sample size (n) minus one ! population size
  • 43. Copyright © 2009 Cengage Learning 4.43 Variance… As you can see, you have to calculate the sample mean (x- bar) in order to calculate the sample variance. Alternatively, there is a short-cut formulation to calculate sample variance directly from the data without the intermediate step of calculating the mean. Its given by:
  • 44. Copyright © 2009 Cengage Learning 4.44 Application… Example: The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Finds its mean and variance. What are we looking to calculate? The following sample consists of the number of jobs six students applied for: 17, 15, 23, 7, 9, 13. Finds its mean and variance. …as opposed to  or 2
  • 45. Copyright © 2009 Cengage Learning 4.45 Sample Mean & Variance… Sample Mean Sample Variance Sample Variance (shortcut method)
  • 46. Copyright © 2009 Cengage Learning 4.46 Standard Deviation… The standard deviation is simply the square root of the variance, thus: Population standard deviation: Sample standard deviation:
  • 47. Copyright © 2009 Cengage Learning 4.47 Standard Deviation… Consider Example 4.8 [Xm04-08]where a golf club manufacturer has designed a new club and wants to determine if it is hit more consistently (i.e. with less variability) than with an old club. Using Data > Data Analysis > Descriptive Statistics in Excel, we produce the following tables for interpretation… You get more consistent distance with the new club.
  • 48. Copyright © 2009 Cengage Learning 4.48 Interpreting Standard Deviation… The standard deviation can be used to compare the variability of several distributions and make a statement about the general shape of a distribution. If the histogram is bell shaped, we can use the Empirical Rule, which states: 1) Approximately 68% of all observations fall within one standard deviation of the mean. 2) Approximately 95% of all observations fall within two standard deviations of the mean. 3) Approximately 99.7% of all observations fall within three standard deviations of the mean.
  • 49. Copyright © 2009 Cengage Learning 4.49 The Empirical Rule… Approximately 68% of all observations fall within one standard deviation of the mean. Approximately 95% of all observations fall within two standard deviations of the mean. Approximately 99.7% of all observations fall within three standard deviations of the mean.
  • 50. Copyright © 2009 Cengage Learning 4.50 Chebysheff’s Theorem… A more general interpretation of the standard deviation is derived from Chebysheff’s Theorem, which applies to all shapes of histograms (not just bell shaped). The proportion of observations in any sample that lie within k standard deviations of the mean is at least: For k=2 (say), the theorem states that at least 3/4 of all observations lie within 2 standard deviations of the mean. This is a “lower bound” compared to Empirical Rule’s approximation (95%).
  • 51. Copyright © 2009 Cengage Learning 4.51 Interpreting Standard Deviation Suppose that the mean and standard deviation of last year’s midterm test marks are 70 and 5, respectively. If the histogram is bell-shaped then we know that approximately 68% of the marks fell between 65 and 75, approximately 95% of the marks fell between 60 and 80, and approximately 99.7% of the marks fell between 55 and 85. If the histogram is not at all bell-shaped we can say that at least 75% of the marks fell between 60 and 80, and at least 88.9% of the marks fell between 55 and 85. (We can use other values of k.)
  • 52. Copyright © 2009 Cengage Learning 4.52 Coefficient of Variation… The coefficient of variation of a set of observations is the standard deviation of the observations divided by their mean, that is: Population coefficient of variation = CV = Sample coefficient of variation = cv =
  • 53. Copyright © 2009 Cengage Learning 4.53 Coefficient of Variation… This coefficient provides a proportionate measure of variation, e.g. A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
  • 54. Copyright © 2009 Cengage Learning 4.54 Measures of Relative Standing Measures of relative standing are designed to provide information about the position of particular values relative to the entire data set. Percentile: the Pth percentile is the value for which P percent are less than that value and (100-P)% are greater than that value. Suppose you scored in the 60th percentile on the GMAT, that means 60% of the other scores were below yours, while 40% of scores were above yours.
  • 55. Copyright © 2009 Cengage Learning 4.55 Quartiles… We have special names for the 25th, 50th, and 75th percentiles, namely quartiles. The first or lower quartile is labeled Q1 = 25th percentile. The second quartile, Q2 = 50th percentile (which is also the median). The third or upper quartile, Q3 = 75th percentile. We can also convert percentiles into quintiles (fifths) and deciles (tenths).
  • 56. Copyright © 2009 Cengage Learning 4.56 Commonly Used Percentiles… First (lower) decile = 10th percentile First (lower) quartile, Q1, = 25th percentile Second (middle)quartile,Q2, = 50th percentile Third quartile, Q3, = 75th percentile Ninth (upper) decile = 90th percentile Note: If your exam mark places you in the 80th percentile, that doesn’t mean you scored 80% on the exam – it means that 80% of your peers scored lower than you on the exam; It is about your position relative to others.
  • 57. Copyright © 2009 Cengage Learning 4.57 Location of Percentiles… The following formula allows us to approximate the location of any percentile:
  • 58. Copyright © 2009 Cengage Learning 4.58 Location of Percentiles… Recall the data from Example 4.1: 0 0 5 7 8 9 12 14 22 33 Where is the location of the 25th percentile? That is, at which point are 25% of the values lower and 75% of the values higher? L25 = (10+1)(25/100) = 2.75 0 0 5 7 8 9 12 14 22 33 The 25th percentile is three-quarters of the distance between the second (which is 0) and the third (which is 5) observations. Three-quarters of the distance is: (.75)(5 – 0) = 3.75 Because the second observation is 0, the 25th percentile is 0 + 3.75 = 3.75
  • 59. Copyright © 2009 Cengage Learning 4.59 Location of Percentiles… What about the upper quartile? L75 = (10+1)(75/100) = 8.25 0 0 5 7 8 9 12 14 22 33 It is located one-quarter of the distance between the eighth and the ninth observations, which are 14 and 22, respectively. One-quarter of the distance is: (.25)(22 - 14) = 2, which means the 75th percentile is at: 14 + 2 = 16
  • 60. Copyright © 2009 Cengage Learning 4.60 Location of Percentiles… Please remember… 0 0 | 5 7 8 9 12 14 | 22 33 16 Lp determines the position in the data set where the percentile value lies, not the value of the percentile itself. 3.75 position 8.25 position 2.75
  • 61. Copyright © 2009 Cengage Learning 4.61 Interquartile Range… The quartiles can be used to create another measure of variability, the interquartile range, which is defined as follows: Interquartile Range = Q3 – Q1 The interquartile range measures the spread of the middle 50% of the observations. Large values of this statistic mean that the 1st and 3rd quartiles are far apart indicating a high level of variability.
  • 62. Copyright © 2009 Cengage Learning 4.62 Measures of Linear Relationship… We now present three numerical measures of linear relationship that provide information as to the strength & direction of a linear relationship between two variables (if one exists). They are the covariance, the coefficient of correlation, and the coefficient of determination.
  • 63. Copyright © 2009 Cengage Learning 4.63 Covariance… population mean of variable X, variable Y sample mean of variable X, variable Y Note: divisor is n-1, not n as you may expect.
  • 64. Copyright © 2009 Cengage Learning 4.64 Covariance… In much the same way there was a “shortcut” for calculating sample variance without having to calculate the sample mean, there is also a shortcut for calculating sample covariance without having to first calculate the means:
  • 65. Copyright © 2009 Cengage Learning 4.65 Covariance Illustrated… Consider the following three sets of data: In each set, the values of X are the same, and the value for Y are the same; the only thing that’s changed is the order of the Y’s. In set #1, as X increases so does Y; Sxy is large & positive In set #2, as X increases, Y decreases; Sxy is large & negative In set #3, as X increases, Y doesn’t move in any particular way; Sxy is “small”
  • 66. Copyright © 2009 Cengage Learning 4.66 Covariance… (Generally speaking) When two variables move in the same direction (both increase or both decrease), the covariance will be a large positive number. When two variables move in opposite directions, the covariance is a large negative number. When there is no particular pattern, the covariance is a small number. However, it is often difficult to determine whether a particular covariance is large or small. The next parameter/statistic addresses this problem.
  • 67. Copyright © 2009 Cengage Learning 4.67 Coefficient of Correlation… The coefficient of correlation is defined as the covariance divided by the standard deviations of the variables: Greek letter “rho” This coefficient answers the question: How strong is the association between X and Y?
  • 68. Copyright © 2009 Cengage Learning 4.68 Coefficient of Correlation… The advantage of the coefficient of correlation over covariance is that it has fixed range from -1 to +1, thus: If the two variables are very strongly positively related, the coefficient value is close to +1 (strong positive linear relationship). If the two variables are very strongly negatively related, the coefficient value is close to -1 (strong negative linear relationship). No straight line relationship is indicated by a coefficient close to zero.
  • 69. Copyright © 2009 Cengage Learning 4.69 Coefficient of Correlation… r or r = +1 0 -1 Strong positive linear relationship No linear relationship Strong negative linear relationship
  • 70. Copyright © 2009 Cengage Learning 4.70 Example 4.16 Calculate the coefficient of correlation for the three sets of data above.
  • 71. Copyright © 2009 Cengage Learning 4.71 Example 4.16 Because we’ve already calculated the covariances we only need compute the standard deviations of X and Y. 0.5 3 762 x    0.20 3 272013 y    0.7 2 419 13 )57()56()52( s 222 2 x       0.49 2 490149 13 )2027()2020()2013( s 222 2 y      
  • 72. Copyright © 2009 Cengage Learning 4.72 Example 4.16 The standard deviations are 65.20.7sx  00.70.49sy 
  • 73. Copyright © 2009 Cengage Learning 4.73 Example 4.16 The coefficients of correlation are Set 1: Set 2: Set 3: 943. )0.7)(65.2( 5.17 ss s r yx xy  943. )0.7)(65.2( 5.17 ss s r yx xy    189. )0.7)(65.2( 5.3 ss s r yx xy   
  • 74. Copyright © 2009 Cengage Learning 4.74 Least Squares Method The objective of the scatter diagram is to measure the strength and direction of the linear relationship. Both can be more easily judged by drawing a straight line through the data. We need an objective method of producing a straight line. Such a method has been developed; it is called the least squares method.
  • 75. Copyright © 2009 Cengage Learning 4.75 Least Squares Method Recall, the slope-intercept equation for a line is expressed in these terms: y = mx + b Where: m is the slope of the line b is the y-intercept. If we’ve determined there is a linear relationship between two variables with covariance and the coefficient of correlation, can we determine a linear function of the relationship?
  • 76. Copyright © 2009 Cengage Learning 4.76 The Least Squares Method …produces a straight line drawn through the points so that the sum of squared deviations between the points and the line is minimized. This line is represented by the equation: b0 (“b” naught) is the y-intercept, b1 is the slope, and (“y” hat) is the value of y determined by the line.
  • 77. Copyright © 2009 Cengage Learning 4.77 The Least Squares Method The coefficients b0 and b1 are given by:
  • 78. Copyright © 2009 Cengage Learning 4.78 Fixed and Variable Costs Fixed costs are costs that must be paid whether or not any units are produced. These costs are "fixed" over a specified period of time or range of production. Variable costs are costs that vary directly with the number of products produced.
  • 79. Copyright © 2009 Cengage Learning 4.79 Fixed and Variable Costs There are some expenses that are mixed. There are several ways to break the mixed costs in its fixed and variable components. One such method is the least squares line. That is, we express the total costs of some component as y = b0 + b1x where y = total mixed cost, b0 = fixed cost and b1 = variable cost, and x is the number of units.
  • 80. Copyright © 2009 Cengage Learning 4.80 Example 4.17 A tool and die maker operates out of a small shop making specialized tools. He is considering increasing the size of his business and needed to know more about his costs. One such cost is electricity, which he needs to operate his machines and lights. (Some jobs require that he turn on extra bright lights to illuminate his work.) He keeps track of his daily electricity costs and the number of tools that he made that day. Determine the fixed and variable electricity costs. [Xm04-17]
  • 81. Copyright © 2009 Cengage Learning 4.81 Example 4.17 y = 2.245x + 9.587 0.00 5.00 10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00 0 2 4 6 8 10 12 14 16 Electricalcosts Number of tools
  • 82. Copyright © 2009 Cengage Learning 4.82 Example 4.17 The slope is defined as rise/run, which means that it is the change in y (rise) for a 1-unit increase in x (run). The slope measures the marginal rate of change in the dependent variable. The marginal rate of change refers to the effect of increasing the independent variable by one additional unit. In this example the slope is 2.25, which means that for each 1-unit increase in the number of tools, the marginal increase in the electricity cost 2.25. Thus, the estimated variable cost is $2.25 per tool. x245.2587.9yˆ 
  • 83. Copyright © 2009 Cengage Learning 4.83 Example 4.17 The y-intercept is 9.57. That is, the line strikes the y-axis at 9.57. This is simply the value of when x = 0. However, when x = 0 we are producing no tools and hence the estimated fixed cost of electricity is $9.59 per day. x25.259.9yˆ 
  • 84. Copyright © 2009 Cengage Learning 4.84 Coefficient of Determination When we introduced the coefficient of correlation we pointed out that except for −1, 0, and +1 we cannot precisely interpret its meaning. We can judge the coefficient of correlation in relation to its proximity to −1, 0, and +1 only. Fortunately, we have another measure that can be precisely interpreted. It is the coefficient of determination, which is calculated by squaring the coefficient of correlation. For this reason we denote it R2 . The coefficient of determination measures the amount of variation in the dependent variable that is explained by the variation in the independent variable.
  • 85. Copyright © 2009 Cengage Learning 4.85 Example 4.18 Calculate the coefficient of determination for Example 4.17 y = 2.245x + 9.587 R² = 0.758 0.00 5.00 10.00 15.00 20.00 25.00 30.00 35.00 40.00 45.00 50.00 0 2 4 6 8 10 12 14 16 Electricalcosts Number of tools
  • 86. Copyright © 2009 Cengage Learning 4.86 Example 4.18 The coefficient of determination is R2 = .758 This tells us that 75.8% of the variation in electrical costs is explained by the number of tools. The remaining 24.2% is unexplained.
  • 87. Copyright © 2009 Cengage Learning 4.87 Interpreting Correlation (or Determination) Because of its importance we remind you about the correct interpretation of the analysis of the relationship between two interval variables. That is, if two variables are linearly related it does not mean that X is causing Y. It may mean that another variable is causing both X and Y or that Y is causing X. Remember “Correlation is not Causation”
  • 88. Copyright © 2009 Cengage Learning 4.88 Parameters and Statistics Population Sample Size N n Mean Variance S2 Standard Deviation S Coefficient of Variation CV cv Covariance Sxy Coefficient of Correlation r

Editor's Notes

  1. March 20, 2020
  2. March 20, 2020