Course contact information
• Course Provider: Dr Anis Fatima
• Office: Room 5, Second Floor, IM Building
• Phone Ext: 2250
• Email: anisf@neduet.edu.pk
• OneDrive link: https://1drv.ms/f/s!AkgKqDvMcQJRgjeZOGE0fgYluSRX
Books and lecture notes
Introduction and Data Collection
Basic Concepts of Statistics
Statistics is concerned with:
• Processing and analyzing data
• Collecting, presenting, and transforming
data to assist decision makers
Key Definitions
• A population (universe) is the collection of
all members of a group
• A sample is a portion of the population
selected for analysis
• A parameter is a numerical measure that
describes a characteristic of a population
• A statistic is a numerical measure that
describes a characteristic of a sample
Population vs. Sample
[Figure: a population of items a to z, with the sample b, c, g, i, n, o, r, u, y drawn from it]
• Measures used to describe a population are called parameters
• Measures computed from sample data are called statistics
Two Branches of Statistics
• Descriptive statistics
– Collecting, summarizing, and presenting data
• Inferential statistics
– Drawing conclusions about a population based only
on sample data
Descriptive Statistics
• Collect data
– e.g., Survey
• Present data
– e.g., Tables and graphs
• Characterize data
– e.g., Sample mean: $\bar{X} = \frac{\sum X_i}{n}$
Inferential Statistics
• Estimation
– e.g., Estimate the population
mean weight using the sample
mean weight
• Hypothesis testing
– e.g., Test the claim that the
population mean weight is 120
pounds
Drawing conclusions about a population based on sample results.
Collecting Data
• Primary (data collection)
– Observation
– Experimentation
– Survey
• Secondary (data compilation)
– Print or electronic sources
Types of Data
• Categorical (defined categories)
– Examples: Marital Status, Political Party, Eye Color
• Numerical
– Discrete (counted items). Examples: Number of Children, Defects per hour
– Continuous (measured characteristics). Examples: Weight, Voltage
Levels of Measurement
and Measurement Scales
• Ratio Data (highest level, strongest form of measurement): differences between measurements, true zero exists
• Interval Data (higher level): differences between measurements but no true zero
• Ordinal Data (higher level): ordered categories (rankings, order, or scaling)
• Nominal Data (lowest level, weakest form of measurement): categories (no ordering or direction)
Levels of Measurement
and Measurement Scales
EXAMPLES:
• Ratio Data (differences between measurements, true zero exists): Height, Age, Weekly Food Spending
• Interval Data (differences between measurements but no true zero): Temperature in Fahrenheit, Standardized exam score
• Ordinal Data (ordered categories: rankings, order, or scaling): Service quality rating, Standard & Poor’s bond rating, Student letter grades
• Nominal Data (categories, no ordering or direction): Marital status, Type of car owned
Organizing and Presenting
Data Graphically
• Data in raw form are usually not easy to use for decision
making
– Some type of organization is needed
• Table
• Graph
• Techniques reviewed here:
– Bar charts and pie charts
– Pareto diagram
– Ordered array
– Stem-and-leaf display
– Frequency distributions, histograms and polygons
– Cumulative distributions and ogives
– Contingency tables
– Scatter diagrams
Tables and Charts for Categorical Data
• Tabulating data: Summary Table
• Graphing data: Bar Charts, Pie Charts, Pareto Diagram
The Summary Table
Summarize data by category
Example: Current Investment Portfolio (variables are categorical)
Investment Type    Amount (in thousands $)    Percentage (%)
Stocks             46.5                       42.27
Bonds              32.0                       29.09
CD                 15.5                       14.09
Savings            16.0                       14.55
Total              110.0                      100.0
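The percentage column of a summary table is each category's amount divided by the total. A minimal Python sketch, using the portfolio amounts from the table (the variable names are illustrative only):

```python
# Amounts (in thousands of dollars) from the Current Investment Portfolio table.
amounts = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(amounts.values())

# Percentage of the total held in each category, rounded to two decimals.
percentages = {name: round(100 * value / total, 2) for name, value in amounts.items()}

for name, value in amounts.items():
    print(f"{name:<8} {value:>6.1f} {percentages[name]:>6.2f}")
print(f"{'Total':<8} {total:>6.1f}")
```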
Bar and Pie Charts
• Bar charts and Pie charts are often used
for qualitative data (categories or
nominal scale)
• Height of bar or size of pie slice shows
the frequency or percentage for each
category
Bar Chart Example
[Figure: horizontal bar chart "Investor's Portfolio"; one bar each for Stocks, Bonds, CD, Savings; horizontal axis: Amount in $1000's (0 to 50); data from the Current Investment Portfolio summary table]
Pie Chart Example
[Figure: pie chart "Current Investment Portfolio"; Stocks 42%, Bonds 29%, Savings 15%, CD 14%; percentages are rounded to the nearest percent]
Pareto Diagram
• Used to portray categorical data (nominal scale)
• A bar chart, where categories are shown in
descending order of frequency
• A cumulative polygon is often shown in the same
graph
• Used to separate the “vital few” from the “trivial
many”
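As a rough illustration of how a Pareto diagram is built, the sketch below sorts the portfolio percentages in descending order and accumulates them to get the cumulative polygon. It assumes the percentages from the Current Investment Portfolio example; the names are illustrative.

```python
from itertools import accumulate

# Percentage invested in each category (Current Investment Portfolio example).
percentages = {"Stocks": 42.27, "Bonds": 29.09, "CD": 14.09, "Savings": 14.55}

# Bars: categories in descending order of percentage.
ordered = sorted(percentages.items(), key=lambda item: item[1], reverse=True)

# Line: running total of the sorted percentages (the cumulative polygon).
cumulative = [round(c, 2) for c in accumulate(value for _, value in ordered)]

for (name, value), running in zip(ordered, cumulative):
    print(f"{name:<8} {value:>6.2f} {running:>7.2f}")
```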
Pareto Diagram Example
[Figure: Pareto diagram "Current Investment Portfolio"; categories in descending order: Stocks, Bonds, Savings, CD; bar graph: % invested in each category (axis 0% to 45%); line graph: cumulative % invested (axis 0% to 100%)]
Tables and Charts for Numerical Data
• Ordered Array
• Stem-and-Leaf Display
• Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
The Ordered Array
A sequence of data in rank order:
• Shows range (min to max)
• Provides some signals about variability within the range
• May help identify outliers (unusual observations)
• If the data set is large, the ordered array is less useful
The Ordered Array (continued)
• Data in raw form (as collected):
24, 26, 24, 21, 27, 27, 30, 41, 32, 38
• Data in ordered array from smallest to largest:
21, 24, 24, 26, 27, 27, 30, 32, 38, 41
Tabulating Numerical Data:
Frequency Distributions
What is a Frequency Distribution?
• A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category
Why Use a Frequency Distribution?
• It is a way to summarize numerical data
• It condenses the raw data into a more
useful form...
• It allows for a quick visual interpretation
of the data
Class Intervals
and Class Boundaries
• Each class grouping has the same width
• Determine the width of each interval by
$\text{Width of interval} \approx \dfrac{\text{range}}{\text{number of desired class groupings}}$
• Usually at least 5 but no more than 15 groupings
• Class boundaries never overlap
• Round up the interval width to get desirable endpoints
Frequency Distribution Example
Example: A manufacturer of insulation
randomly selects 20 winter days and records
the daily high temperature
24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
32, 13, 12, 38, 41, 43, 44, 27, 53, 27
Frequency Distribution Example (continued)
• Sort raw data in ascending order:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
• Find range: 58 - 12 = 46
• Select number of classes: 5 (usually between 5 and 15)
• Compute class interval (width): 10 (46/5 = 9.2, then round up)
• Determine class boundaries (limits): 10, 20, 30, 40, 50, 60
• Compute class midpoints: 15, 25, 35, 45, 55
• Count observations & assign to classes
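The steps for the temperature example can be sketched in Python as follows. Two choices here are assumptions rather than part of the slides: the width is rounded up to the next multiple of 10, and the first boundary is the minimum rounded down to a multiple of the width.

```python
import math

# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

num_classes = 5
data_range = max(temps) - min(temps)  # 58 - 12 = 46

# Raw width is 46 / 5 = 9.2; round up to the next multiple of 10 (assumed rounding unit).
width = math.ceil(data_range / num_classes / 10) * 10

# Assumption: start at the minimum rounded down to a multiple of the width.
start = (min(temps) // width) * width
boundaries = [start + i * width for i in range(num_classes + 1)]
midpoints = [low + width / 2 for low in boundaries[:-1]]

# Each class is "low but less than low + width".
frequencies = [sum(low <= t < low + width for t in temps) for low in boundaries[:-1]]

for low, mid, freq in zip(boundaries, midpoints, frequencies):
    print(f"{low} but less than {low + width}: midpoint {mid}, frequency {freq}")
```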
Frequency Distribution Example (continued)
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency    Relative Frequency    Percentage
10 but less than 20    3            .15                   15
20 but less than 30    6            .30                   30
30 but less than 40    5            .25                   25
40 but less than 50    4            .20                   20
50 but less than 60    2            .10                   10
Total                  20           1.00                  100
Tabulating Numerical Data:
Cumulative Frequency
Data in ordered array:
12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
Class                  Frequency    Percentage    Cumulative Frequency    Cumulative Percentage
10 but less than 20    3            15            3                       15
20 but less than 30    6            30            9                       45
30 but less than 40    5            25            14                      70
40 but less than 50    4            20            18                      90
50 but less than 60    2            10            20                      100
Total                  20           100
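The cumulative columns are running totals of the frequency column. A minimal sketch, assuming the class frequencies from the temperature example:

```python
from itertools import accumulate

# Class frequencies for 10-20, 20-30, 30-40, 40-50, 50-60.
frequencies = [3, 6, 5, 4, 2]
n = sum(frequencies)

percentages = [100 * f / n for f in frequencies]

# Running totals give the cumulative frequency and cumulative percentage.
cumulative_frequencies = list(accumulate(frequencies))
cumulative_percentages = [100 * c / n for c in cumulative_frequencies]

print(cumulative_frequencies)
print(cumulative_percentages)
```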
Graphing Numerical Data:
The Histogram
• A graph of the data in a frequency distribution is
called a histogram
• The class boundaries (or class midpoints) are
shown on the horizontal axis
• The vertical axis is either frequency, relative
frequency, or percentage
• Bars of the appropriate heights are used to
represent the number of observations within
each class
Histogram Example
[Figure: histogram "Daily High Temperature"; horizontal axis: class midpoints (5 to 65); vertical axis: frequency (0 to 7); no gaps between bars]
Class                  Class Midpoint    Frequency
10 but less than 20    15                3
20 but less than 30    25                6
30 but less than 40    35                5
40 but less than 50    45                4
50 but less than 60    55                2
Graphing Numerical Data:
The Frequency Polygon
[Figure: frequency polygon "Daily High Temperature"; horizontal axis: class midpoints (5 to 65); vertical axis: frequency (0 to 7)]
Class                  Class Midpoint    Frequency
10 but less than 20    15                3
20 but less than 30    25                6
30 but less than 40    35                5
40 but less than 50    45                4
50 but less than 60    55                2
(In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
Graphing Cumulative Frequencies:
The Ogive (Cumulative % Polygon)
[Figure: ogive "Daily High Temperature"; horizontal axis: class boundaries, not midpoints (10 to 60); vertical axis: cumulative percentage (0 to 100)]
Class                  Lower class boundary    Cumulative Percentage
Less than 10           0                       0
10 but less than 20    10                      15
20 but less than 30    20                      45
30 but less than 40    30                      70
40 but less than 50    40                      90
50 but less than 60    50                      100
Tabulating and Graphing
Multivariate Categorical Data
• Contingency Table for Investment Choices ($1000’s)
Investment Category    Investor A    Investor B    Investor C    Total
Stocks                 46.5          55            27.5          129
Bonds                  32.0          44            19.0          95
CD                     15.5          20            13.5          49
Savings                16.0          28            7.0           51
Total                  110.0         147           67.0          324
(Individual values could also be expressed as percentages of the overall total,
percentages of the row totals, or percentages of the column totals)
Tabulating and Graphing
Multivariate Categorical Data (continued)
• Side-by-side bar charts
[Figure: side-by-side bar chart "Comparing Investors"; categories: Stocks, Bonds, CD, Savings; one bar each for Investor A, Investor B, Investor C; horizontal axis 0 to 60]
Side-by-Side Chart Example
• Sales by quarter for three sales territories:
[Figure: side-by-side bar chart; bars for East, West, North in each quarter; vertical axis 0 to 60]
         1st Qtr    2nd Qtr    3rd Qtr    4th Qtr
East     20.4       27.4       59         20.4
West     30.6       38.6       34.6       31.6
North    45.9       46.9       45         43.9
Scatter Diagrams
• Scatter Diagrams are used to examine possible relationships between two numerical variables
• The Scatter Diagram:
– one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
Scatter Diagram Example
[Figure: scatter diagram "Cost per Day vs. Production Volume"; horizontal axis: Volume per Day (0 to 70); vertical axis: Cost per Day (0 to 250)]
Volume per day    Cost per day
23                131
24                120
26                140
29                151
33                160
38                167
41                185
42                170
50                188
55                195
60                200
Time Series Plot
• A Time Series Plot is used to study patterns in the values of a variable over time
• The Time Series Plot:
– one variable is measured on the vertical axis and the time period is measured on the horizontal axis
Time Series Plot Example
[Figure: time series plot "Number of Franchises, 1996-2004"; horizontal axis: Year (1994 to 2006); vertical axis: Number of Franchises (0 to 120)]
Year    Number of Franchises
1996    43
1997    54
1998    60
1999    73
2000    82
2001    95
2002    107
2003    99
2004    95
Summary Measures
Describing Data Numerically
• Central Tendency: Arithmetic Mean, Median, Mode
• Quartiles
• Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation
• Shape: Skewness
Measures of Central Tendency
Overview
• Arithmetic Mean: $\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n}$
• Median: midpoint of ranked values
• Mode: most frequently observed value
• Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
Arithmetic Mean
• The arithmetic mean (mean) is the most
common measure of central tendency
– For a sample of size n:
$\bar{X} = \dfrac{\sum_{i=1}^{n} X_i}{n} = \dfrac{X_1 + X_2 + \cdots + X_n}{n}$
where n is the sample size and $X_1, X_2, \ldots, X_n$ are the observed values
Arithmetic Mean (continued)
• The most common measure of central tendency
• Mean = sum of values divided by the number of values
• Affected by extreme values (outliers)
Values 1, 2, 3, 4, 5: $\dfrac{1+2+3+4+5}{5} = \dfrac{15}{5} = 3$, so Mean = 3
Values 1, 2, 3, 4, 10: $\dfrac{1+2+3+4+10}{5} = \dfrac{20}{5} = 4$, so Mean = 4
Median
• In an ordered array, the median is the “middle”
number (50% above, 50% below)
• Not affected by extreme values
[Figure: two dot plots on a 0 to 10 axis; Median = 3 in both]
Finding the Median
• The location of the median:
$\text{Median position} = \dfrac{n+1}{2}$ position in the ordered data
– If the number of values is odd, the median is the middle number
– If the number of values is even, the median is the average of the two middle numbers
• Note that $\dfrac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
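The position rule can be sketched in Python as below. The helper name `median_by_position` is hypothetical, and the two data sets are small samples that appear elsewhere in these slides; the standard library's `statistics.median` is printed alongside as a cross-check.

```python
import statistics

def median_by_position(values):
    """Return (position, median) using the (n + 1) / 2 position rule."""
    ranked = sorted(values)
    n = len(ranked)
    position = (n + 1) / 2             # 1-based position in the ranked data
    lower = ranked[int(position) - 1]  # value at the whole-number part of the position
    if position.is_integer():          # odd n: the middle value itself
        return position, lower
    upper = ranked[int(position)]      # even n: average the two middle values
    return position, (lower + upper) / 2

odd_data = [11, 12, 13, 16, 16, 17, 18, 21, 22]        # n = 9
even_data = [24, 26, 24, 21, 27, 27, 30, 41, 32, 38]   # n = 10

print(median_by_position(odd_data), statistics.median(odd_data))
print(median_by_position(even_data), statistics.median(even_data))
```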
Mode
• A measure of central tendency
• Value that occurs most often
• Not affected by extreme values
• Used for either numerical or categorical
(nominal) data
• There may be no mode
• There may be several modes
[Figure: dot plot on a 0 to 14 axis with Mode = 9; dot plot on a 0 to 6 axis with No Mode]
Review Example
• Five houses on a hill by the beach
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Review Example:
Summary Statistics
• Mean: ($3,000,000/5)
= $600,000
• Median: middle value of ranked data
= $300,000
• Mode: most frequent value
= $100,000
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Sum $3,000,000
Which measure of location is the “best”?
• Mean is generally used, unless extreme values (outliers) exist
• Then median is often used, since the median is not sensitive to extreme values.
– Example: Median home prices may be reported for a region because they are less sensitive to outliers
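A short Python sketch of the house price review example shows how the single $2,000,000 house pulls the mean but not the median. It uses only the standard `statistics` module; the variable names are illustrative.

```python
import statistics

# House prices from the review example.
prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(prices)      # sum 3,000,000 divided by 5
median_price = statistics.median(prices)  # middle value of the ranked data
mode_price = statistics.mode(prices)      # most frequent value

# Dropping the extreme value changes the mean far more than the median.
without_outlier = [p for p in prices if p != 2_000_000]
mean_without = statistics.mean(without_outlier)
median_without = statistics.median(without_outlier)

print(mean_price, median_price, mode_price)
print(mean_without, median_without)
```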
Quartiles
• Quartiles split the ranked data into 4 segments with an
equal number of values per segment
[Figure: ranked data split by Q1, Q2, Q3 into four segments of 25% each]
• The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger
• Q2 is the same as the median (50% are smaller, 50% are larger)
• Only 25% of the observations are greater than the third quartile
Quartile Formulas
Find a quartile by determining the value in the
appropriate position in the ranked data, where
First quartile position: Q1 = (n+1)/4
Second quartile position: Q2 = (n+1)/2 (the median position)
Third quartile position: Q3 = 3(n+1)/4
where n is the number of observed values
Quartiles
• Example: Find the first quartile
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so use the value halfway between the 2nd and 3rd values,
so Q1 = 12.5
Q1 and Q3 are measures of noncentral location
Q2 = median, a measure of central tendency
Quartiles (continued)
• Example:
Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22
(n = 9)
Q1 is in the (9+1)/4 = 2.5 position of the ranked data,
so Q1 = 12.5
Q2 is in the (9+1)/2 = 5th position of the ranked data,
so Q2 = median = 16
Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data,
so Q3 = 19.5
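The quartile positions can be sketched in Python as below. The helper `quartile` is hypothetical and interpolates linearly between neighbouring values, which matches the "halfway between" rule for .5 positions; how other fractional positions are handled is an assumption, since the slides only show the .5 case. It also assumes the position lies between 1 and n. The standard library's `statistics.quantiles` with `method="exclusive"` is used as a cross-check.

```python
import statistics

def quartile(ranked, k):
    """Value at position k(n + 1)/4 (1-based) in already-ranked data."""
    n = len(ranked)
    position = k * (n + 1) / 4
    lower_index = int(position) - 1        # 0-based index of the lower neighbour
    fraction = position - int(position)
    if fraction == 0:
        return ranked[lower_index]
    # Assumption: interpolate linearly between the two neighbouring values.
    return ranked[lower_index] + fraction * (ranked[lower_index + 1] - ranked[lower_index])

data = [11, 12, 13, 16, 16, 17, 18, 21, 22]   # n = 9, already ranked
q1, q2, q3 = (quartile(data, k) for k in (1, 2, 3))
iqr = q3 - q1

print(q1, q2, q3, iqr)
print(statistics.quantiles(data, n=4, method="exclusive"))
```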
Measures of Variation
• Range
• Interquartile Range
• Variance
• Standard Deviation
• Coefficient of Variation
• Measures of variation give information on the spread or variability of the data values.
[Figure: two distributions with the same center but different variation]
Range
• Simplest measure of variation
• Difference between the largest and the
smallest values in a set of data:
Range = Xlargest – Xsmallest
Example:
[Figure: dot plot on a 0 to 14 axis]
Range = 14 - 1 = 13
Disadvantages of the Range
• Ignores the way in which data are distributed
[Figure: two differently distributed data sets on a 7 to 12 axis, each with Range = 12 - 7 = 5]
• Sensitive to outliers
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5
Range = 5 - 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120
Range = 120 - 1 = 119
Basic Business Statistics, 10e ©
2006 Prentice-Hall, Inc.
Chap 3-60
Interquartile Range
• Can eliminate some outlier problems by using
the interquartile range
• Eliminate some high- and low-valued
observations and calculate the range from the
remaining values
• Interquartile range = 3rd quartile – 1st quartile
= Q3 – Q1
Interquartile Range
Example:
X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the observations in each of the four segments)
Interquartile range = 57 – 30 = 27
Variance
• Average (approximately) of squared deviations of values from the mean
– Sample variance:
$S^2 = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}$
Where $\bar{X}$ = mean
n = sample size
$X_i$ = ith value of the variable X
Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the variance
• Has the same units as the original data
– Sample standard deviation:
$S = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
Calculation Example:
Sample Standard Deviation
Sample Data ($X_i$): 10 12 14 15 17 18 18 24
n = 8, Mean = $\bar{X}$ = 16
$S = \sqrt{\dfrac{(10-\bar{X})^2 + (12-\bar{X})^2 + (14-\bar{X})^2 + \cdots + (24-\bar{X})^2}{n-1}}$
$= \sqrt{\dfrac{(10-16)^2 + (12-16)^2 + (14-16)^2 + \cdots + (24-16)^2}{8-1}} = \sqrt{\dfrac{130}{7}} = 4.3095$
A measure of the “average” scatter around the mean
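The calculation can be reproduced step by step in Python. This is a minimal sketch using the eight sample values from the example; `statistics.stdev` (which also divides by n - 1) is included only as a cross-check.

```python
import math
import statistics

data = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(data)

mean = sum(data) / n                                   # 128 / 8
sum_squared_dev = sum((x - mean) ** 2 for x in data)   # numerator of the variance

sample_variance = sum_squared_dev / (n - 1)            # divide by n - 1 for a sample
sample_sd = math.sqrt(sample_variance)

print(mean, sum_squared_dev, round(sample_sd, 4))
print(round(statistics.stdev(data), 4))
```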
Measuring variation
Small standard deviation
Large standard deviation
Comparing Standard Deviations
[Figure: three dot plots on an 11 to 21 axis, all with the same mean]
Data A: Mean = 15.5, S = 3.338
Data B: Mean = 15.5, S = 0.926
Data C: Mean = 15.5, S = 4.567
Advantages of Variance and
Standard Deviation
• Each value in the data set is used in the
calculation
• Values far from the mean are given extra
weight
(because deviations from the mean are squared)
Coefficient of Variation
• Measures relative variation
• Always in percentage (%)
• Shows variation relative to mean
• Can be used to compare two or more sets of
data measured in different units
$CV = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\%$
Comparing Coefficient
of Variation
• Stock A:
– Average price last year = $50
– Standard deviation = $5
$CV_A = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$50} \cdot 100\% = 10\%$
• Stock B:
– Average price last year = $100
– Standard deviation = $5
$CV_B = \left(\dfrac{S}{\bar{X}}\right) \cdot 100\% = \dfrac{\$5}{\$100} \cdot 100\% = 5\%$
Both stocks have the same standard deviation, but stock B is less variable relative to its price
Z Scores
• A measure of distance from the mean (for example, a Z-
score of 2.0 means that a value is 2.0 standard deviations
from the mean)
• The difference between a value and the mean, divided by
the standard deviation
• A Z score above 3.0 or below -3.0 is considered an outlier
$Z = \dfrac{X - \bar{X}}{S}$
Z Scores (continued)
Example:
• If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5?
$Z = \dfrac{X - \bar{X}}{S} = \dfrac{18.5 - 14.0}{3.0} = 1.5$
• The value 18.5 is 1.5 standard deviations above the mean
• (A negative Z-score would mean that a value is less than the mean)
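A minimal Python sketch of the Z score calculation, using the numbers from the example. The helper names are hypothetical, and the outlier check simply applies the "above 3.0 or below -3.0" rule of thumb stated in these slides.

```python
def z_score(value, mean, sd):
    """Distance of value from the mean, in standard deviations."""
    return (value - mean) / sd

def is_outlier(z, threshold=3.0):
    """Rule of thumb from the slides: |Z| above 3.0 flags an outlier."""
    return abs(z) > threshold

z = z_score(18.5, mean=14.0, sd=3.0)
print(z, is_outlier(z))

# A value below the mean gives a negative Z score.
print(z_score(9.5, mean=14.0, sd=3.0))
```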
Shape of a Distribution
• Describes how data are distributed
• Measures of shape
– Symmetric or skewed
Left-Skewed: Mean < Median
Symmetric: Mean = Median
Right-Skewed: Median < Mean
Using Microsoft Excel
• Descriptive Statistics can be obtained from
Microsoft® Excel
– Use menu choice:
tools / data analysis / descriptive statistics
– Enter details in dialog box
Using Excel (continued)
• Enter dialog box details
• Check box for summary statistics
• Click OK
Excel output
Microsoft Excel
descriptive statistics output,
using the house price data:
House Prices:
$2,000,000
500,000
300,000
100,000
100,000
Numerical Measures
for a Population
• Population summary measures are called parameters
• The population mean is the sum of the values in the
population divided by the population size, N
$\mu = \dfrac{\sum_{i=1}^{N} X_i}{N} = \dfrac{X_1 + X_2 + \cdots + X_N}{N}$
Where μ = population mean
N = population size
$X_i$ = ith value of the variable X
Population Variance
• Average of squared deviations of values from the mean
– Population variance:
$\sigma^2 = \dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$
Where μ = population mean
N = population size
$X_i$ = ith value of the variable X
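The only difference from the sample formula is the divisor: N for a population, n - 1 for a sample. The sketch below illustrates this by treating the eight values from the earlier standard deviation example as if they were a whole population, which is an assumption made purely for illustration.

```python
import math
import statistics

# Assumption for illustration: these eight values are the entire population.
values = [10, 12, 14, 15, 17, 18, 18, 24]
N = len(values)

mu = sum(values) / N
sum_squared_dev = sum((x - mu) ** 2 for x in values)

population_variance = sum_squared_dev / N        # divide by N
sample_variance = sum_squared_dev / (N - 1)      # divide by n - 1 (sample formula)
population_sd = math.sqrt(population_variance)   # square root of the population variance

print(population_variance, round(sample_variance, 4), round(population_sd, 4))
print(statistics.pvariance(values), statistics.pstdev(values))
```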
Population Standard Deviation
• Most commonly used measure of variation
• Shows variation about the mean
• Is the square root of the population variance
• Has the same units as the original data
– Population standard deviation:
$\sigma = \sqrt{\dfrac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
The Empirical Rule
• If the data distribution is approximately bell-shaped, then the interval:
• $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
• $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample
• $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
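The three percentages can be checked against an exactly normal distribution. This sketch assumes a standard normal model (mean 0, standard deviation 1) and uses `statistics.NormalDist` from the Python standard library; real data that are only approximately bell-shaped will match these figures only roughly.

```python
from statistics import NormalDist

# Assumption: an exactly normal distribution with mu = 0 and sigma = 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def share_within(k):
    """Proportion of values within k standard deviations of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(f"within {k} sigma: {100 * share_within(k):.2f}%")
```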
Exploratory Data Analysis
• Box-and-Whisker Plot: A Graphical display of
data using 5-number summary:
Minimum -- Q1 -- Median -- Q3 -- Maximum
Example:
[Figure: box-and-whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, Maximum, with 25% of the data in each of the four segments]
Shape of Box-and-Whisker Plots
• The Box and central line are centered between the
endpoints if data are symmetric around the median
• A Box-and-Whisker plot can be shown in either vertical or
horizontal format
Min Q1 Median Q3 Max
Distribution Shape and
Box-and-Whisker Plot
[Figure: box-and-whisker plots with Q1, Q2, Q3 marked for left-skewed, symmetric, and right-skewed distributions]
Box-and-Whisker Plot Example
• Below is a Box-and-Whisker plot for the following data:
0 2 2 2 3 3 4 5 5 10 27
[Figure: box-and-whisker plot of the data]
Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27
• The data are right skewed, as the plot depicts
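The 5-number summary for this example can be sketched in Python. `statistics.quantiles` with `method="exclusive"` places the cut points at positions i(n + 1)/4, which coincides with the quartile position rule used in these slides for this data set; the skewness checks are simple illustrative comparisons, not a formal test.

```python
import statistics

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]   # n = 11, already ranked

q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_number = (min(data), q1, q2, q3, max(data))
print(five_number)

# Signs of right skew: longer upper whisker and mean above the median.
upper_whisker = max(data) - q3
lower_whisker = q1 - min(data)
print(upper_whisker > lower_whisker, statistics.mean(data) > q2)
```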
The Sample Covariance
• The sample covariance measures the strength of the linear
relationship between two variables (called bivariate data)
• The sample covariance:
$\text{cov}(X,Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
– Only concerned with the strength of the relationship
– No causal effect is implied
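Interpreting Covariance
• Covariance between two random variables:
cov(X,Y) > 0: X and Y tend to move in the same direction
cov(X,Y) < 0: X and Y tend to move in opposite directions
cov(X,Y) = 0: X and Y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not establish independence)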
Coefficient of Correlation
• Measures the relative strength of the linear
relationship between two variables
• Sample coefficient of correlation:
$r = \dfrac{\text{cov}(X,Y)}{S_X S_Y}$
where
$\text{cov}(X,Y) = \dfrac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n-1}$
$S_X = \sqrt{\dfrac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n-1}}$
$S_Y = \sqrt{\dfrac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n-1}}$
Features of
Correlation Coefficient, r
• Unit free
• Ranges between –1 and 1
• The closer to –1, the stronger the negative linear relationship
• The closer to 1, the stronger the positive linear relationship
• The closer to 0, the weaker the linear relationship
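As an illustration, the sample covariance and the coefficient of correlation can be computed directly from their definitions. The sketch below uses the volume and cost figures from the scatter diagram example earlier in these slides; the function names are hypothetical.

```python
import math

# Volume per day (X) and cost per day (Y) from the scatter diagram example.
volume = [23, 24, 26, 29, 33, 38, 41, 42, 50, 55, 60]
cost = [131, 120, 140, 151, 160, 167, 185, 170, 188, 195, 200]

def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)

def sample_sd(xs):
    """Sample standard deviation (divisor n - 1)."""
    n = len(xs)
    x_bar = sum(xs) / n
    return math.sqrt(sum((x - x_bar) ** 2 for x in xs) / (n - 1))

cov_xy = sample_covariance(volume, cost)
r = cov_xy / (sample_sd(volume) * sample_sd(cost))

print(round(cov_xy, 2), round(r, 3))
```

A positive covariance and an r close to 1 are consistent with the upward pattern in the scatter diagram; r is unit free, while the covariance carries the units of both variables.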
Scatter Plots of Data with Various
Correlation Coefficients
[Figure: scatter plots of Y against X illustrating r = -1, r = -.6, r = 0 (two panels), r = +.3, and r = +1]
Using Excel to Find
the Correlation Coefficient
• Select
Tools/Data Analysis
• Choose Correlation from
the selection menu
• Click OK . . .
Using Excel to Find
the Correlation Coefficient (continued)
• Input data range and select appropriate options
• Click OK to get output
Interpreting the Result
• r = .733
• There is a relatively
strong positive linear
relationship between
test score #1
and test score #2
• Students who scored high on the first test tended to
score high on second test, and students who scored
low on the first test tended to score low on the
second test
[Figure: "Scatter Plot of Test Scores"; horizontal axis: Test #1 Score (70 to 100); vertical axis: Test #2 Score (70 to 100)]
Lecture 1 - Overview.pptx

  • 1. • Course Provider: Dr Anis Fatima • Office: Room 5, Second Floor IM Building • Phone Ext: 2250 • Email: anisf@neduet.edu.pk Driveonelink: https://1drv.ms/f/s!AkgKqDvMcQJRgjeZOGE0fgYluSRX Course contact information 1
  • 4. Basic Concepts of Statistics Statistics is concerned with: • Processing and analyzing data • Collecting, presenting, and transforming data to assist decision makers
  • 5. Key Definitions • A population (universe) is the collection of all members of a group • A sample is a portion of the population selected for analysis • A parameter is a numerical measure that describes a characteristic of a population • A statistic is a numerical measure that describes a characteristic of a sample
  • 6. Population vs. Sample • Population: a b c d e f g h i j k l m n o p q r s t u v w x y z • Sample: b c g i n o r u y • Measures used to describe a population are called parameters • Measures computed from sample data are called statistics
  • 7. Two Branches of Statistics • Descriptive statistics – Collecting, summarizing, and presenting data • Inferential statistics – Drawing conclusions about a population based only on sample data
  • 8. Descriptive Statistics • Collect data – e.g., Survey • Present data – e.g., Tables and graphs • Characterize data – e.g., Sample mean = $\bar{X} = \frac{\sum X_i}{n}$
  • 9. Inferential Statistics • Estimation – e.g., Estimate the population mean weight using the sample mean weight • Hypothesis testing – e.g., Test the claim that the population mean weight is 120 pounds Drawing conclusions about a population based on sample results.
  • 11. Types of Data • Categorical (defined categories). Examples: Marital Status, Political Party, Eye Color • Numerical, Discrete (counted items). Examples: Number of Children, Defects per hour • Numerical, Continuous (measured characteristics). Examples: Weight, Voltage
  • 12. Levels of Measurement and Measurement Scales (highest to lowest level) • Ratio Data (highest level, strongest form of measurement): differences between measurements, true zero exists • Interval Data: differences between measurements but no true zero • Ordinal Data: ordered categories (rankings, order, or scaling) • Nominal Data (lowest level, weakest form of measurement): categories (no ordering or direction)
  • 13. Levels of Measurement and Measurement Scales: Examples • Ratio Data (differences between measurements, true zero exists): Height, Age, Weekly Food Spending • Interval Data (differences between measurements but no true zero): Temperature in Fahrenheit, Standardized exam score • Ordinal Data (ordered categories: rankings, order, or scaling): Service quality rating, Standard & Poor’s bond rating, Student letter grades • Nominal Data (categories, no ordering or direction): Marital status, Type of car owned
  • 14. Organizing and Presenting Data Graphically • Data in raw form are usually not easy to use for decision making – Some type of organization is needed • Table • Graph • Techniques reviewed here: – Bar charts and pie charts – Pareto diagram – Ordered array – Stem-and-leaf display – Frequency distributions, histograms and polygons – Cumulative distributions and ogives – Contingency tables – Scatter diagrams
  • 15. Tables and Charts for Categorical Data • Tabulating Data: Summary Table • Graphing Data: Bar Charts, Pie Charts, Pareto Diagram
  • 16. The Summary Table • Summarize data by category (variables are categorical) • Example: Current Investment Portfolio
      Investment Type | Amount (in thousands $) | Percentage (%)
      Stocks          | 46.5                    | 42.27
      Bonds           | 32.0                    | 29.09
      CD              | 15.5                    | 14.09
      Savings         | 16.0                    | 14.55
      Total           | 110.0                   | 100.0
  • 17. Bar and Pie Charts • Bar charts and Pie charts are often used for qualitative data (categories or nominal scale) • Height of bar or size of pie slice shows the frequency or percentage for each category
  • 18. Bar Chart Example [Bar chart: Investor's Portfolio; categories Stocks, Bonds, CD, Savings; axis: Amount in $1000's] Data (Current Investment Portfolio, in thousands $): Stocks 46.5 (42.27%), Bonds 32.0 (29.09%), CD 15.5 (14.09%), Savings 16.0 (14.55%), Total 110.0 (100.0%)
  • 19. Pie Chart Example [Pie chart: Current Investment Portfolio; Stocks 42%, Bonds 29%, Savings 15%, CD 14%] Percentages are rounded to the nearest percent. Data (in thousands $): Stocks 46.5 (42.27%), Bonds 32.0 (29.09%), CD 15.5 (14.09%), Savings 16.0 (14.55%), Total 110.0 (100.0%)
  • 20. Pareto Diagram • Used to portray categorical data (nominal scale) • A bar chart, where categories are shown in descending order of frequency • A cumulative polygon is often shown in the same graph • Used to separate the “vital few” from the “trivial many”
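The ordering and cumulative polygon behind a Pareto diagram can be sketched in a few lines. This is only an illustration: it reuses the investment portfolio amounts from the summary table as the categories (an assumption, since the deck's own Pareto example slide is not in this transcript), and the variable names are hypothetical.

```python
# Portfolio amounts in thousands of dollars, taken from the summary table slide.
portfolio = {"Stocks": 46.5, "Bonds": 32.0, "CD": 15.5, "Savings": 16.0}

total = sum(portfolio.values())

# A Pareto diagram lists categories in descending order of frequency (here, amount).
ordered = sorted(portfolio.items(), key=lambda item: item[1], reverse=True)

pareto_rows = []
running = 0.0
for category, amount in ordered:
    running += amount
    percent = 100 * amount / total          # height of the bar
    cumulative = 100 * running / total      # point on the cumulative polygon
    pareto_rows.append((category, round(percent, 2), round(cumulative, 2)))

for row in pareto_rows:
    print(row)
```

The first rows of the output are the "vital few": the categories that account for most of the total.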
  • 22. Tables and Charts for Numerical Data • Ordered Array • Stem-and-Leaf Display • Frequency Distributions and Cumulative Distributions: Histogram, Polygon, Ogive
  • 23. The Ordered Array A sequence of data in rank order:  Shows range (min to max)  Provides some signals about variability within the range  May help identify outliers (unusual observations)  If the data set is large, the ordered array is less useful
  • 24. The Ordered Array (continued) • Data in raw form (as collected): 24, 26, 24, 21, 27, 27, 30, 41, 32, 38 • Data in ordered array from smallest to largest: 21, 24, 24, 26, 27, 27, 30, 32, 38, 41
  • 25. Tabulating Numerical Data: Frequency Distributions • What is a Frequency Distribution? • A frequency distribution is a list or a table containing class groupings (ranges within which the data fall) and the corresponding frequencies with which data fall within each grouping or category
  • 26. Why Use a Frequency Distribution? • It is a way to summarize numerical data • It condenses the raw data into a more useful form... • It allows for a quick visual interpretation of the data
  • 27. Class Intervals and Class Boundaries • Each class grouping has the same width • Determine the width of each interval by: Width of interval = $\frac{\text{range}}{\text{number of desired class groupings}}$ • Usually at least 5 but no more than 15 groupings • Class boundaries never overlap • Round up the interval width to get desirable endpoints
  • 28. Frequency Distribution Example Example: A manufacturer of insulation randomly selects 20 winter days and records the daily high temperature 24, 35, 17, 21, 24, 37, 26, 46, 58, 30, 32, 13, 12, 38, 41, 43, 44, 27, 53, 27
  • 29. Frequency Distribution Example (continued) • Sort raw data in ascending order: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58 • Find range: 58 - 12 = 46 • Select number of classes: 5 (usually between 5 and 15) • Compute class interval (width): 10 (46/5 = 9.2, then round up) • Determine class boundaries (limits): 10, 20, 30, 40, 50, 60 • Compute class midpoints: 15, 25, 35, 45, 55 • Count observations & assign to classes
  • 30. Frequency Distribution Example (continued) Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      Class               | Frequency | Relative Frequency | Percentage
      10 but less than 20 | 3         | .15                | 15
      20 but less than 30 | 6         | .30                | 30
      30 but less than 40 | 5         | .25                | 25
      40 but less than 50 | 4         | .20                | 20
      50 but less than 60 | 2         | .10                | 10
      Total               | 20        | 1.00               | 100
  • 31. Tabulating Numerical Data: Cumulative Frequency Data in ordered array: 12, 13, 17, 21, 24, 24, 26, 27, 27, 30, 32, 35, 37, 38, 41, 43, 44, 46, 53, 58
      Class               | Frequency | Percentage | Cumulative Frequency | Cumulative Percentage
      10 but less than 20 | 3         | 15         | 3                    | 15
      20 but less than 30 | 6         | 30         | 9                    | 45
      30 but less than 40 | 5         | 25         | 14                   | 70
      40 but less than 50 | 4         | 20         | 18                   | 90
      50 but less than 60 | 2         | 10         | 20                   | 100
      Total               | 20        | 100        |                      |
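As a rough sketch of how the frequency and cumulative tables for the temperature example could be reproduced, the snippet below bins the 20 daily highs into the five classes. The starting boundary of 10 and the variable names are choices made for illustration; only the data and the class width rule come from the slides.

```python
import math

# Daily high temperatures for the 20 sampled winter days.
temps = [24, 35, 17, 21, 24, 37, 26, 46, 58, 30,
         32, 13, 12, 38, 41, 43, 44, 27, 53, 27]

number_of_classes = 5
data_range = max(temps) - min(temps)                 # 58 - 12 = 46
width = math.ceil(data_range / number_of_classes)    # 9.2 rounded up to 10

# Assumption: start the first class at 10 to get convenient endpoints.
boundaries = [10 + width * k for k in range(number_of_classes + 1)]

frequencies = []
cumulative = []
running = 0
for lower, upper in zip(boundaries, boundaries[1:]):
    # Each class is "lower but less than upper", so classes never overlap.
    count = sum(1 for t in temps if lower <= t < upper)
    running += count
    frequencies.append(count)
    cumulative.append(running)

n = len(temps)
for (lower, upper), f, c in zip(zip(boundaries, boundaries[1:]), frequencies, cumulative):
    print(f"{lower} but less than {upper}: freq={f}, "
          f"pct={100 * f / n:.0f}, cum={c}, cum_pct={100 * c / n:.0f}")
```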
  • 32. Graphing Numerical Data: The Histogram • A graph of the data in a frequency distribution is called a histogram • The class boundaries (or class midpoints) are shown on the horizontal axis • The vertical axis is either frequency, relative frequency, or percentage • Bars of the appropriate heights are used to represent the number of observations within each class
  • 33. Histogram Example [Histogram: Daily High Temperature; horizontal axis: Class Midpoints; vertical axis: Frequency; no gaps between bars]
      Class               | Class Midpoint | Frequency
      10 but less than 20 | 15             | 3
      20 but less than 30 | 25             | 6
      30 but less than 40 | 35             | 5
      40 but less than 50 | 45             | 4
      50 but less than 60 | 55             | 2
  • 34. Graphing Numerical Data: The Frequency Polygon [Frequency polygon: Daily High Temperature; horizontal axis: Class Midpoints; vertical axis: Frequency] (In a percentage polygon the vertical axis would be defined to show the percentage of observations per class)
      Class               | Class Midpoint | Frequency
      10 but less than 20 | 15             | 3
      20 but less than 30 | 25             | 6
      30 but less than 40 | 35             | 5
      40 but less than 50 | 45             | 4
      50 but less than 60 | 55             | 2
  • 35. Graphing Cumulative Frequencies: The Ogive (Cumulative % Polygon) [Ogive: Daily High Temperature; horizontal axis: Class Boundaries (not midpoints), 10 to 60; vertical axis: Cumulative Percentage, 0 to 100]
      Class               | Lower class boundary | Cumulative Percentage
      Less than 10        | 0                    | 0
      10 but less than 20 | 10                   | 15
      20 but less than 30 | 20                   | 45
      30 but less than 40 | 30                   | 70
      40 but less than 50 | 40                   | 90
      50 but less than 60 | 50                   | 100
  • 36. Tabulating and Graphing Multivariate Categorical Data • Contingency Table for Investment Choices ($1000’s)
      Investment Category | Investor A | Investor B | Investor C | Total
      Stocks              | 46.5       | 55         | 27.5       | 129
      Bonds               | 32.0       | 44         | 19.0       | 95
      CD                  | 15.5       | 20         | 13.5       | 49
      Savings             | 16.0       | 28         | 7.0        | 51
      Total               | 110.0      | 147        | 67.0       | 324
    (Individual values could also be expressed as percentages of the overall total, percentages of the row totals, or percentages of the column totals)
  • 37. Tabulating and Graphing Multivariate Categorical Data (continued) • Side-by-side bar charts [Side-by-side bar chart: Comparing Investors; categories Stocks, Bonds, CD, Savings; series Investor A, Investor B, Investor C]
  • 38. Side-by-Side Chart Example • Sales by quarter for three sales territories [Side-by-side bar chart; categories 1st Qtr to 4th Qtr; series East, West, North]
      Territory | 1st Qtr | 2nd Qtr | 3rd Qtr | 4th Qtr
      East      | 20.4    | 27.4    | 59      | 20.4
      West      | 30.6    | 38.6    | 34.6    | 31.6
      North     | 45.9    | 46.9    | 45      | 43.9
  • 39. Scatter Diagrams • Scatter Diagrams are used to examine possible relationships between two numerical variables • The Scatter Diagram: – one variable is measured on the vertical axis and the other variable is measured on the horizontal axis
  • 40. Scatter Diagram Example [Scatter diagram: Cost per Day vs. Production Volume; horizontal axis: Volume per Day; vertical axis: Cost per Day]
      Volume per day | Cost per day
      23             | 131
      24             | 120
      26             | 140
      29             | 151
      33             | 160
      38             | 167
      41             | 185
      42             | 170
      50             | 188
      55             | 195
      60             | 200
  • 41. Time Series Plot • A Time Series Plot is used to study patterns in the values of a variable over time • The Time Series Plot: – one variable is measured on the vertical axis and the time period is measured on the horizontal axis
  • 42. Time Series Plot Example [Time series plot: Number of Franchises, 1996-2004; horizontal axis: Year; vertical axis: Number of Franchises]
      Year | Number of Franchises
      1996 | 43
      1997 | 54
      1998 | 60
      1999 | 73
      2000 | 82
      2001 | 95
      2002 | 107
      2003 | 99
      2004 | 95
  • 43. Summary Measures: Describing Data Numerically • Central Tendency: Arithmetic Mean, Median, Mode • Quartiles • Variation: Range, Interquartile Range, Variance, Standard Deviation, Coefficient of Variation • Shape: Skewness
  • 44. Measures of Central Tendency: Overview • Arithmetic Mean: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n}$ • Median: midpoint of ranked values • Mode: most frequently observed value • Geometric Mean: $\bar{X}_G = (X_1 \times X_2 \times \cdots \times X_n)^{1/n}$
  • 45. Arithmetic Mean • The arithmetic mean (mean) is the most common measure of central tendency – For a sample of size n: $\bar{X} = \frac{\sum_{i=1}^{n} X_i}{n} = \frac{X_1 + X_2 + \cdots + X_n}{n}$, where n is the sample size and the $X_i$ are the observed values
  • 46. Arithmetic Mean (continued) • The most common measure of central tendency • Mean = sum of values divided by the number of values • Affected by extreme values (outliers) • Values 1, 2, 3, 4, 5: Mean = $\frac{1 + 2 + 3 + 4 + 5}{5} = \frac{15}{5} = 3$ • Values 1, 2, 3, 4, 10: Mean = $\frac{1 + 2 + 3 + 4 + 10}{5} = \frac{20}{5} = 4$
  • 47. Median • In an ordered array, the median is the “middle” number (50% above, 50% below) • Not affected by extreme values • Values 1, 2, 3, 4, 5: Median = 3 • Values 1, 2, 3, 4, 10: Median = 3
  • 48. Finding the Median • The location of the median: Median position = $\frac{n+1}{2}$ position in the ordered data – If the number of values is odd, the median is the middle number – If the number of values is even, the median is the average of the two middle numbers • Note that $\frac{n+1}{2}$ is not the value of the median, only the position of the median in the ranked data
  • 49. Mode • A measure of central tendency • Value that occurs most often • Not affected by extreme values • Used for either numerical or categorical (nominal) data • There may be no mode • There may be several modes [Dot plots: one data set on a 0 to 14 scale with Mode = 9; one data set on a 0 to 6 scale with No Mode]
  • 50. Review Example • Five houses on a hill by the beach • House Prices: $2,000,000 500,000 300,000 100,000 100,000
  • 51. Review Example: Summary Statistics • Mean: ($3,000,000/5) = $600,000 • Median: middle value of ranked data = $300,000 • Mode: most frequent value = $100,000 House Prices: $2,000,000 500,000 300,000 100,000 100,000 Sum $3,000,000
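The three summary statistics for the house prices can be checked with Python's standard `statistics` module. This is a small sketch for verification, not part of the original lecture; the variable names are illustrative.

```python
import statistics

# House prices from the review example, in dollars.
house_prices = [2_000_000, 500_000, 300_000, 100_000, 100_000]

mean_price = statistics.mean(house_prices)      # sum of values / number of values
median_price = statistics.median(house_prices)  # middle value of the ranked data
mode_price = statistics.mode(house_prices)      # most frequent value

print(mean_price, median_price, mode_price)
```

The single $2,000,000 house pulls the mean well above the median, which is the outlier sensitivity the review example is meant to show.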
  • 52. Which measure of location is the “best”? • Mean is generally used, unless extreme values (outliers) exist • Then median is often used, since the median is not sensitive to extreme values. – Example: Median home prices may be reported for a region – less sensitive to outliers
  • 53. Quartiles • Quartiles split the ranked data into 4 segments with an equal number of values per segment (25% of the values in each, separated by Q1, Q2, and Q3) • The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger • Q2 is the same as the median (50% are smaller, 50% are larger) • Only 25% of the observations are greater than the third quartile, Q3
  • 54. Quartile Formulas Find a quartile by determining the value in the appropriate position in the ranked data, where First quartile position: Q1 = (n+1)/4 Second quartile position: Q2 = (n+1)/2 (the median position) Third quartile position: Q3 = 3(n+1)/4 where n is the number of observed values
  • 55. Quartiles • Example: Find the first quartile • Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9) • Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so use the value halfway between the 2nd and 3rd values: Q1 = 12.5 • Q1 and Q3 are measures of noncentral location • Q2 = median, a measure of central tendency
  • 56. Quartiles • Example (continued) • Sample Data in Ordered Array: 11 12 13 16 16 17 18 21 22 (n = 9) • Q1 is in the (9+1)/4 = 2.5 position of the ranked data, so Q1 = 12.5 • Q2 is in the (9+1)/2 = 5th position of the ranked data, so Q2 = median = 16 • Q3 is in the 3(9+1)/4 = 7.5 position of the ranked data, so Q3 = 19.5
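The position rule can be sketched as a small helper. The function name `quartile` is hypothetical, and the use of linear interpolation for fractional positions is an assumption: the slides only show the halfway case (positions ending in .5), which interpolation reproduces.

```python
def quartile(sorted_values, k):
    """Return the k-th quartile (k = 1, 2, 3) using the k(n + 1)/4 position rule."""
    n = len(sorted_values)
    position = k * (n + 1) / 4      # 1-based position in the ranked data
    lower_rank = int(position)      # rank at or just below the position
    fraction = position - lower_rank
    if lower_rank < 1:
        return sorted_values[0]
    if lower_rank >= n:
        return sorted_values[-1]
    below = sorted_values[lower_rank - 1]
    above = sorted_values[lower_rank]
    # Assumption: interpolate linearly between the two neighbouring ranked values.
    return below + fraction * (above - below)


sample = [11, 12, 13, 16, 16, 17, 18, 21, 22]
q1, q2, q3 = (quartile(sample, k) for k in (1, 2, 3))
print(q1, q2, q3)  # 12.5 16.0 19.5
```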
  • 57. Measures of Variation • Range • Interquartile Range • Variance • Standard Deviation • Coefficient of Variation • Measures of variation give information on the spread or variability of the data values. [Figure: two distributions with the same center but different variation]
  • 58. Range • Simplest measure of variation • Difference between the largest and the smallest values in a set of data: Range = Xlargest – Xsmallest • Example: for data running from 1 to 14, Range = 14 - 1 = 13
  • 59. Disadvantages of the Range • Ignores the way in which data are distributed [Two dot plots on a 7 to 12 scale with different distributions, each with Range = 12 - 7 = 5] • Sensitive to outliers: 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 has Range = 5 - 1 = 4, while 1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 has Range = 120 - 1 = 119
  • 60. Basic Business Statistics, 10e © 2006 Prentice-Hall, Inc. Chap 3-60 Interquartile Range • Can eliminate some outlier problems by using the interquartile range • Eliminate some high- and low-valued observations and calculate the range from the remaining values • Interquartile range = 3rd quartile – 1st quartile = Q3 – Q1
  • 61. Interquartile Range • Example: X minimum = 12, Q1 = 30, Median (Q2) = 45, Q3 = 57, X maximum = 70 (25% of the values fall in each of the four segments) • Interquartile range = 57 – 30 = 27
  • 62. Variance • Average (approximately) of squared deviations of values from the mean – Sample variance: $S^2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}$ where $\bar{X}$ = mean, n = sample size, $X_i$ = ith value of the variable X
  • 63. Standard Deviation • Most commonly used measure of variation • Shows variation about the mean • Is the square root of the variance • Has the same units as the original data – Sample standard deviation: $S = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$
  • 64. Calculation Example: Sample Standard Deviation • Sample Data ($X_i$): 10 12 14 15 17 18 18 24 • n = 8, Mean = $\bar{X}$ = 16 • $S = \sqrt{\frac{(10 - \bar{X})^2 + (12 - \bar{X})^2 + (14 - \bar{X})^2 + \cdots + (24 - \bar{X})^2}{n - 1}} = \sqrt{\frac{(10 - 16)^2 + (12 - 16)^2 + (14 - 16)^2 + \cdots + (24 - 16)^2}{8 - 1}} = \sqrt{\frac{130}{7}} = 4.3095$ • A measure of the “average” scatter around the mean
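The calculation above can be reproduced step by step and then cross-checked against the standard library. This is a sketch for verification only; the variable names are illustrative.

```python
import math
import statistics

sample = [10, 12, 14, 15, 17, 18, 18, 24]
n = len(sample)

mean = sum(sample) / n                                  # 128 / 8 = 16
squared_deviations = [(x - mean) ** 2 for x in sample]  # (X_i - mean)^2
sum_of_squares = sum(squared_deviations)                # 130
sample_variance = sum_of_squares / (n - 1)              # divide by n - 1, not n
sample_std = math.sqrt(sample_variance)

print(round(sample_std, 4))  # 4.3095

# statistics.stdev uses the same n - 1 denominator.
library_std = statistics.stdev(sample)
```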
  • 65. Measuring variation Small standard deviation Large standard deviation
  • 66. Comparing Standard Deviations [Three dot plots on an 11 to 21 scale, all with the same mean] • Data A: Mean = 15.5, S = 3.338 • Data B: Mean = 15.5, S = 0.926 • Data C: Mean = 15.5, S = 4.567
  • 67. Advantages of Variance and Standard Deviation • Each value in the data set is used in the calculation • Values far from the mean are given extra weight (because deviations from the mean are squared)
  • 68. Coefficient of Variation • Measures relative variation • Always in percentage (%) • Shows variation relative to mean • Can be used to compare two or more sets of data measured in different units • $CV = \left(\frac{S}{\bar{X}}\right) \cdot 100\%$
  • 69. Comparing Coefficient of Variation • Stock A: – Average price last year = $50 – Standard deviation = $5 – $CV_A = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$50} \cdot 100\% = 10\%$ • Stock B: – Average price last year = $100 – Standard deviation = $5 – $CV_B = \left(\frac{S}{\bar{X}}\right) \cdot 100\% = \frac{\$5}{\$100} \cdot 100\% = 5\%$ • Both stocks have the same standard deviation, but stock B is less variable relative to its price
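A short sketch of the stock comparison follows. The helper name `coefficient_of_variation` is hypothetical; the inputs are the two stocks' means and standard deviations from the slide.

```python
def coefficient_of_variation(mean, std_dev):
    """Return the standard deviation as a percentage of the mean."""
    return 100 * std_dev / mean


cv_a = coefficient_of_variation(mean=50, std_dev=5)    # Stock A
cv_b = coefficient_of_variation(mean=100, std_dev=5)   # Stock B

print(cv_a, cv_b)  # 10.0 5.0
```

Because the result is a unit-free percentage, the same helper could compare series measured in different units.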
  • 70. Z Scores • A measure of distance from the mean (for example, a Z-score of 2.0 means that a value is 2.0 standard deviations from the mean) • The difference between a value and the mean, divided by the standard deviation: $Z = \frac{X - \bar{X}}{S}$ • A Z score above 3.0 or below -3.0 is considered an outlier
  • 71. Z Scores (continued) Example: • If the mean is 14.0 and the standard deviation is 3.0, what is the Z score for the value 18.5? • $Z = \frac{X - \bar{X}}{S} = \frac{18.5 - 14.0}{3.0} = 1.5$ • The value 18.5 is 1.5 standard deviations above the mean • (A negative Z-score would mean that a value is less than the mean)
  • 72. Shape of a Distribution • Describes how data are distributed • Measures of shape – Symmetric or skewed • Left-Skewed: Mean < Median • Symmetric: Mean = Median • Right-Skewed: Median < Mean
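The Z score example can be sketched as follows. The function names are hypothetical, and the outlier check simply applies the "above 3.0 or below -3.0" rule of thumb from the previous slide.

```python
def z_score(value, mean, std_dev):
    """Distance of a value from the mean, in standard deviations."""
    return (value - mean) / std_dev


def is_outlier(z, threshold=3.0):
    """Flag a value whose Z score is above 3.0 or below -3.0."""
    return abs(z) > threshold


z = z_score(18.5, mean=14.0, std_dev=3.0)
print(z, is_outlier(z))  # 1.5 False
```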
  • 73. Using Microsoft Excel • Descriptive Statistics can be obtained from Microsoft® Excel – Use menu choice: tools / data analysis / descriptive statistics – Enter details in dialog box
  • 74. Using Excel • Use menu choice: tools / data analysis / descriptive statistics
  • 75. Using Excel (continued) • Enter dialog box details • Check box for summary statistics • Click OK
  • 76. Excel output • Microsoft Excel descriptive statistics output, using the house price data: House Prices: $2,000,000 500,000 300,000 100,000 100,000
  • 77. Numerical Measures for a Population • Population summary measures are called parameters • The population mean is the sum of the values in the population divided by the population size, N: $\mu = \frac{\sum_{i=1}^{N} X_i}{N} = \frac{X_1 + X_2 + \cdots + X_N}{N}$ where μ = population mean, N = population size, $X_i$ = ith value of the variable X
  • 78. Population Variance • Average of squared deviations of values from the mean – Population variance: $\sigma^2 = \frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}$ where μ = population mean, N = population size, $X_i$ = ith value of the variable X
  • 79. Population Standard Deviation • Most commonly used measure of variation • Shows variation about the mean • Is the square root of the population variance • Has the same units as the original data – Population standard deviation: $\sigma = \sqrt{\frac{\sum_{i=1}^{N} (X_i - \mu)^2}{N}}$
  • 80. The Empirical Rule • If the data distribution is approximately bell-shaped, then the interval $\mu \pm 1\sigma$ contains about 68% of the values in the population or the sample
  • 81. The Empirical Rule • The interval $\mu \pm 2\sigma$ contains about 95% of the values in the population or the sample • The interval $\mu \pm 3\sigma$ contains about 99.7% of the values in the population or the sample
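To see how the population formulas differ from the sample formulas, the sketch below treats the eight values from the earlier standard deviation example as if they were an entire population. That reinterpretation is an assumption made purely for illustration; the slides give no population data set.

```python
import statistics

# Assumption: the eight values from the sample example, treated as a full population.
values = [10, 12, 14, 15, 17, 18, 18, 24]

population_variance = statistics.pvariance(values)  # divides by N
sample_variance = statistics.variance(values)       # divides by n - 1
population_std = statistics.pstdev(values)

print(population_variance)  # 16.25
print(sample_variance > population_variance)  # True: the n - 1 divisor gives a larger value
```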
  • 82. Exploratory Data Analysis • Box-and-Whisker Plot: A graphical display of data using the 5-number summary: Minimum -- Q1 -- Median -- Q3 -- Maximum [Box-and-whisker plot marking Minimum, 1st Quartile, Median, 3rd Quartile, Maximum, with 25% of the values in each of the four segments]
  • 83. Shape of Box-and-Whisker Plots • The Box and central line are centered between the endpoints if data are symmetric around the median • A Box-and-Whisker plot can be shown in either vertical or horizontal format Min Q1 Median Q3 Max
  • 84. Distribution Shape and Box-and-Whisker Plot Right-Skewed Left-Skewed Symmetric Q1 Q2 Q3 Q1 Q2 Q3 Q1 Q2 Q3
  • 85. Box-and-Whisker Plot Example • Below is a Box-and-Whisker plot for the following data: 0 2 2 2 3 3 4 5 5 10 27 • Five-number summary: Min = 0, Q1 = 2, Q2 = 3, Q3 = 5, Max = 27 • The data are right skewed, as the plot depicts
  • 86. The Sample Covariance • The sample covariance measures the strength of the linear relationship between two variables (called bivariate data) • The sample covariance: $\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$ – Only concerned with the strength of the relationship – No causal effect is implied
  • 87. Interpreting Covariance • Covariance between two random variables: cov(X,Y) > 0: X and Y tend to move in the same direction • cov(X,Y) < 0: X and Y tend to move in opposite directions • cov(X,Y) = 0: X and Y have no linear relationship (independent variables have zero covariance, but zero covariance alone does not establish independence)
  • 88. Coefficient of Correlation • Measures the relative strength of the linear relationship between two variables • Sample coefficient of correlation: $r = \frac{\text{cov}(X, Y)}{S_X S_Y}$ where $\text{cov}(X, Y) = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n - 1}$, $S_X = \sqrt{\frac{\sum_{i=1}^{n} (X_i - \bar{X})^2}{n - 1}}$, $S_Y = \sqrt{\frac{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}{n - 1}}$
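The five-number summary behind this plot can be recomputed with the standard library. The sketch assumes Python 3.8 or later and uses `statistics.quantiles` with `method="exclusive"`, which places quartiles at the k(n + 1)/4 positions used earlier in the deck.

```python
import statistics

data = [0, 2, 2, 2, 3, 3, 4, 5, 5, 10, 27]

# With n = 11 the quartile positions are 3, 6 and 9, so no interpolation is needed.
q1, q2, q3 = statistics.quantiles(data, n=4, method="exclusive")
five_number_summary = (min(data), q1, q2, q3, max(data))
print(five_number_summary)  # (0, 2.0, 3.0, 5.0, 27)

# Two informal signs of right skew: a long upper whisker and mean > median.
upper_whisker = max(data) - q3
lower_whisker = q1 - min(data)
mean = statistics.mean(data)
print(upper_whisker > lower_whisker, mean > q2)  # True True
```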
  • 89. Features of Correlation Coefficient, r • Unit free • Ranges between –1 and 1 • The closer to –1, the stronger the negative linear relationship • The closer to 1, the stronger the positive linear relationship • The closer to 0, the weaker the linear relationship
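The covariance and correlation formulas can be sketched directly from their definitions. The function names are hypothetical, and the data are the volume and cost pairs from the scatter diagram example earlier in the deck, chosen here only for illustration (the slides do not report r for that data set).

```python
from math import sqrt


def sample_covariance(xs, ys):
    """Sum of products of deviations from the means, divided by n - 1."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (n - 1)


def sample_std(values):
    """Sample standard deviation with the n - 1 denominator."""
    n = len(values)
    mean = sum(values) / n
    return sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


def correlation(xs, ys):
    """r = cov(X, Y) / (S_X * S_Y); unit free and between -1 and 1."""
    return sample_covariance(xs, ys) / (sample_std(xs) * sample_std(ys))


volume = [23, 24, 26, 29, 33, 38, 41, 42, 50, 55, 60]
cost = [131, 120, 140, 151, 160, 167, 185, 170, 188, 195, 200]

r = correlation(volume, cost)
print(f"cov = {sample_covariance(volume, cost):.2f}, r = {r:.3f}")
```

A positive r here reflects cost rising with volume; a perfectly linear pair such as x and 2x gives r = 1.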
  • 90. Scatter Plots of Data with Various Correlation Coefficients [Six scatter plots of Y against X: r = -1, r = -.6, r = 0, r = +.3, r = +1, and a second pattern with r = 0]
  • 91. Using Excel to Find the Correlation Coefficient • Select Tools/Data Analysis • Choose Correlation from the selection menu • Click OK . . .
  • 92. Using Excel to Find the Correlation Coefficient • Input data range and select appropriate options • Click OK to get output (continued)
  • 93. Interpreting the Result • r = .733 • There is a relatively strong positive linear relationship between test score #1 and test score #2 • Students who scored high on the first test tended to score high on the second test, and students who scored low on the first test tended to score low on the second test [Scatter plot of Test Scores; horizontal axis: Test #1 Score; vertical axis: Test #2 Score; both axes 70 to 100]