DESCRIPTIVE STATISTICS
Part I: Numerical Description
In this chapter, we will learn how to describe a set of data using numerical methods. This is the
first of two chapters that together provide methods of descriptive statistics. The second
covers graphical description, which is the use of graphical methods to display data and explore key
statistics.
What are the basic features of a data set?
A data set is a collection of data representing a particular variable.
Examples of data sets are given below.
Data Sets:
• Students’ grades in a calculus test:
65, 85, 70, 75, 85, 80, 82, 85, 90, 78, 81, 82, 67, 80
• Property tax of a sample of houses:
$5000, $4500, $4000, $7200, $5000, $3800, $4100, $5000
• Driving distance to work of a group of employees (miles):
1.2, 2.0, 2.2, 15.0, 11.0, 5.0, 3.7, 4.9, 15.2, 16.0
• Ages of all students in a college:
18, 19, 21, ……………………..…, 22, 18, 19, 21
In general, establishing a data set requires consideration of a number of key questions:
Data Set Key Questions:
• Are the data qualitative or quantitative?
• What levels of measurement do the data exhibit (nominal, ordinal, interval, or ratio)?
• What is the source of the data (the population)?
• What is the appropriate sampling technique for collecting the samples (random or stratified)?
• What is the appropriate minimum sample size?
Data Set Types: (1) Univariate, (2) Bivariate, (3) Multivariate

Data Set       Variables        Typical Tasks
Univariate     One              Histograms, descriptive statistics, frequency tallies
Bivariate      Two              Scatter plots, correlations, simple regression
Multivariate   More than two    Multiple regression, data mining, modeling
Univariate Data Set

Person #   Weight (lb)
1          150
2          120
3          130
4          125
5          155
6          134
7          150
8          140
9          160
10         200
11         180
12         140

Bivariate Data Set

Person #   Years at work   Annual Salary ($)
1          5               50,000
2          20              73,000
3          10              65,000
4          5               55,000
5          8               60,000
6          10              60,000
7          15              68,000
8          15              69,000
9          20              68,000
10         20              69,000
11         18              68,000
12         10              62,000
13         3               48,000
Multivariate Data Set

Case   Name     Age   Income ($)   Position             Gender
1      Frieda   45    67,100       Consumer Analyst     F
2      Stefan   32    56,500       Operations analyst   M
3      John     55    88,200       Marketing VP         F
4      Donna    27    59,000       Statistician         F
5      Larry    46    26,000       Security guard       M
6      Alicia   52    68,500       QC Director          F
7      Alec     65    95,200       Chief executive      M
8      Jaime    50    71,200       Human Resources      M
Data Sets in the Context of Sampling:
• Cross sectional data set
• Time-series data set
Working Problem 2.2: Explain what inheritance tax is. What is the difference between inheritance tax and estate tax?
What is the level of measurement for each of the following variables: state, income tax, sales tax, and inheritance
tax? Why do some states have a wide income tax range?
http://portal.kiplinger.com/tools/slideshows/slideshow_pop.html?nm=TaxUnfriendlyStatesRetirees
U.S. State       State Income Tax (%)   Sales Tax (%)   Inheritance Tax
Alaska           0.0                    0.0             No
Wyoming          0.0                    4.0             No
Michigan         4.4                    6.0             No
Pennsylvania     3.1                    6.0             Yes
Colorado         4.6                    2.9             No
Delaware         4.6                    0.0             No
Hawaii           1.4 to 11              4.0             No
Georgia          1.0 to 6.0             4.0             No
South Carolina   3.0 to 7.0             6.0             No
Alabama          2.0 to 5.0             4.0             No
California       1.25 to 10.55          8.3             No
Rhode Island     3.75 to 9.9            7.0             No
New Jersey       1.4 to 8.97            7.0             Yes
Vermont          3.55 to 8.95           6.0             No
Iowa             0.36 to 8.98           6.0             Yes
Nebraska         2.56 to 6.84           5.5             Yes
Wisconsin        4.6 to 7.75            5.0             No
Oregon           5.0 to 11.0            0.0             Yes
Indiana          3.4                    7.0             Yes
North Dakota     1.84 to 4.86           5.0             No
Working Problem 2.3:
Identify the following data sets as ‘Cross-Sectional Data’ or ‘Time-Series Data’:
(a) Two weeks before the 56th quadrennial United States presidential election, which was held on November 4, 2008, a sample of
people taken randomly from undecided states revealed that Democrat Barack Obama was expected to earn 54% of the
popular vote and John McCain was expected to earn 46%
Cross Sectional ( ) Time-Series ( )
(b) A survey of 1000 students from a university of 10,000 students, revealed that 65% of the students do not prefer weekend
classes
Cross Sectional ( ) Time-Series ( )
(c) The U.S. city average price per gallon of unleaded regular gasoline from 2000 to 2009 was as follows:
Year Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
2000 1.301 1.369 1.541 1.506 1.498 1.617 1.593 1.51 1.582 1.559 1.555 1.489
2001 1.472 1.484 1.447 1.564 1.729 1.64 1.482 1.427 1.531 1.362 1.263 1.131
2002 1.139 1.13 1.241 1.407 1.421 1.404 1.412 1.423 1.422 1.449 1.448 1.394
2003 1.473 1.641 1.748 1.659 1.542 1.514 1.524 1.628 1.728 1.603 1.535 1.494
2004 1.592 1.672 1.766 1.833 2.009 2.041 1.939 1.898 1.891 2.029 2.01 1.882
2005 1.823 1.918 2.065 2.283 2.216 2.176 2.316 2.506 2.927 2.785 2.343 2.186
2006 2.315 2.31 2.401 2.757 2.947 2.917 2.999 2.985 2.589 2.272 2.241 2.334
2007 2.274 2.285 2.592 2.86 3.13 3.052 2.961 2.782 2.789 2.793 3.069 3.02
2008 3.047 3.033 3.258 3.441 3.764 4.065 4.09 3.786 3.698 3.173 2.151 1.689
2009 1.787 1.928 1.949 2.056 2.265 2.631 2.543 2.627 2.574 2.561 2.66 2.621
http://data.bls.gov/cgi-bin/surveymost
Cross Sectional ( ) Time-Series ( )
What are the different elements of descriptive statistics?
Two types of descriptive statistics:
(1) Numerical measures of data (the focus of this chapter), and
(2) Graphical displays of data.
Numerical measures of descriptive statistics consist of three types of measures:
• Measures of central tendency (mean, median, and mode)
• Measures of dispersion (range, standard deviation, and variance)
• Combined measures (coefficient of variation, signal-to-noise ratio, and standardized variable)
[Figure: the data set 10.0, 12.0, 13.0, 14.0, 14.0, 14.0, 24.0, 24.0, 24.0, 26.8, 27.0, 27.0, 29.0, 30, annotated with its measures of central tendency (mean, mode, median) and measures of dispersion (range, standard deviation, variance)]
Key points to perform good statistical analysis
1. Identify your objectives:
• What questions do you really need to answer?
• What variable do you need to examine?
• What population are you about to evaluate?
2. Collect the appropriate samples and data to address your questions:
• Do you have access to the entire population?
• Would a sample selected from the population be easier to access, less costly,
and less destructive than an evaluation of the whole population?
• Remember ‘GIGO’, or garbage in, garbage out. If the samples are not representative
of the population, or the data collected are not accurate and precise, the conclusions
drawn from the analysis will be meaningless.
3. Describe the data using descriptive statistics:
• Do you detect data abnormalities or outliers?
• Can you explore the data in a way that provides a clear description of the data
center and data variability?
• Use descriptive statistics as a guideline for other methods of analysis.
4. Perform inference:
• Can the sample statistics be used to estimate population parameters?
• Is your estimation of population parameters reliable?
• Do you have confidence in the population estimates?
What are the measures of central tendency?
10.0 12.0 13.0 14.0 14.0 14.0 24.0 24.0 24.0 26.8 27.0 27.0 29.0 30
Measures of Data Center (Central Tendency)
(1) Arithmetic Mean
Arithmetic Mean of Sample Observations:
\[ \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} \]
Arithmetic Mean of Population Observations:
\[ \mu = \frac{\sum_{i=1}^{N} x_i}{N} \]
Example: The table below compares gas prices in several states in
September 2009 and September 2008. Determine the mean gas price ($ per gallon)
for each year.
State Sep-2009 Sep-2008
California 3.099 3.75
Colorado 2.48 3.732
Florida 2.527 3.893
Massachusetts 2.597 3.582
Minnesota 2.452 3.765
New York 2.811 3.805
Ohio 2.411 3.933
Texas 2.404 3.729
Washington 2.947 3.785
Gas Prices of a number of states in September 2008, and September 2009
http://www.eia.doe.gov/oil_gas/petroleum/data_publications/wrgp/mogas_home_page.html
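For September 2009:
\[ \text{Mean} = \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3.099 + 2.48 + 2.527 + 2.597 + \dots + 2.947}{9} = \$2.636/\text{gallon} \]
For September 2008:
\[ \text{Mean} = \bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3.75 + 3.732 + 3.893 + 3.582 + \dots + 3.785}{9} = \$3.775/\text{gallon} \]
Comment on the Results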
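The two means can be checked with a short script. This is a minimal sketch using Python's standard library; the variable names are illustrative and the prices are copied from the table above.

```python
from statistics import mean

# Gas prices ($ per gallon), in the table's state order (California ... Washington).
sept_2009 = [3.099, 2.48, 2.527, 2.597, 2.452, 2.811, 2.411, 2.404, 2.947]
sept_2008 = [3.75, 3.732, 3.893, 3.582, 3.765, 3.805, 3.933, 3.729, 3.785]

# Arithmetic mean: sum of the observations divided by the number of observations.
mean_2009 = mean(sept_2009)
mean_2008 = mean(sept_2008)

print(f"September 2009 mean: ${mean_2009:.3f}/gallon")
print(f"September 2008 mean: ${mean_2008:.3f}/gallon")
```

Rounded to three decimals, the results agree with the hand calculation ($2.636 and $3.775 per gallon).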
Properties of arithmetic mean:
1. The mean of a set of data is unique and can be used as a single
measure of the data center
2. We can determine the mean of any data set that contains ratio or
interval level data
3. We need all observation values to be able to calculate the mean
4. You know it is the correct mean value when the sum of the deviations
of each value from it is zero:
\[ \sum_{i=1}^{n} (x_i - \bar{X}) = 0 \]
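Example: Determine the arithmetic mean of the three values of student grades:
80, 40, and 30. Using the mean value, show that
\[ \sum_{i=1}^{n} (x_i - \bar{X}) = 0 \]
Solution:
The arithmetic mean:
\[ \bar{X} = \frac{80 + 40 + 30}{3} = 50 \]
The sum of the deviations from the mean:
\[ (80 - 50) + (40 - 50) + (30 - 50) = 30 - 10 - 20 = 0 \]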
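The zero-sum property of deviations from the mean can also be checked numerically. The following is a small sketch for the three grades in the example; the variable names are illustrative.

```python
from statistics import mean

grades = [80, 40, 30]

# Arithmetic mean of the three grades.
x_bar = mean(grades)

# Deviation of each grade from the mean.
deviations = [x - x_bar for x in grades]

# The deviations sum to zero when x_bar is the arithmetic mean.
total = sum(deviations)

print("mean:", x_bar)
print("deviations:", deviations)
print("sum of deviations:", total)
```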
Working Problem 2.4:
Calculate the mean for the following data set of minimum wage ($):
7, 8, 6, 6, 8, 5, 6, 5, 8, 8
(2) Median:
The median of a set of numbers arranged in order of magnitude is the middle value or the
arithmetic mean of the two middle values.
Example: Calculate the median of the following data set:
14, 12, 14, 16, 15, 19, 17, 17, 17
Solution:
To determine the median, we first arrange the data in order of magnitude:
12, 14, 14, 15, 16, 17, 17, 17, 19
Thus, the median is 16
Example: Calculate the median of the following data set:
8, 9, 10, 9, 8, 6, 11, 7, 12, 8
Solution:
To determine the median, we first arrange the data in order of magnitude
6, 7, 8, 8, 8, 9, 9, 10, 11, 12
Since this data set consists of an even number of observations, the middle values that split this
data into equal number of observations on both sides are 8 and 9. Thus, the median of this set of
data is (8+9)/2 = 8.5
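Both median examples can be reproduced with Python's standard library. This is a minimal sketch: `statistics.median` sorts the data internally and averages the two middle values when the count is even.

```python
from statistics import median

odd_set = [14, 12, 14, 16, 15, 19, 17, 17, 17]   # 9 observations
even_set = [8, 9, 10, 9, 8, 6, 11, 7, 12, 8]     # 10 observations

# Odd count: the middle value of the sorted data.
median_odd = median(odd_set)

# Even count: the mean of the two middle values of the sorted data.
median_even = median(even_set)

print(sorted(odd_set), "-> median", median_odd)
print(sorted(even_set), "-> median", median_even)
```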
(3) Mode:
The mode is the value that occurs with the greatest frequency. Interestingly, mode is a
French word that means fashion; the mode is the most "fashionable", that is, the most common, value.
Example: Calculate the mode of the following observations:
80, 87, 90, 82, 78, 74, 80, 77, 80, 91, 81, 80
Solution:
The mode of this set is 80
Example: Calculate the mode of the following observations:
5, 7, 8, 9, 9, 9, 10, 11, 12, 14, 14, 14, 15
Solution:
This set exhibits two modes, 9 and 14, and is called bimodal.
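The mode examples can be sketched in code as well. The snippet below assumes Python 3.8 or later, where `statistics.multimode` returns every value that ties for the highest frequency, so a bimodal set yields two values.

```python
from statistics import multimode

first_set = [80, 87, 90, 82, 78, 74, 80, 77, 80, 91, 81, 80]
second_set = [5, 7, 8, 9, 9, 9, 10, 11, 12, 14, 14, 14, 15]

# multimode lists all most-frequent values, in order of first appearance.
modes_first = multimode(first_set)
modes_second = multimode(second_set)

print("modes of first set:", modes_first)
print("modes of second set:", modes_second)
print("second set is bimodal:", len(modes_second) == 2)
```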
Working Problem:
Calculate the mean and the mode and the median for the following data set
of minimum wage ($)
7, 8, 6, 6, 8, 5, 6, 5, 8, 8
Answer:
Mean = $6.7
Median = $6.5
Mode = $8
Working Problem 2.6:
Calculate the median and the mode for the following data set of
minimum wage ($):
7, 8, 6, 6, 8, 5, 6, 5, 8, 8
Geometric Mean, G
\[ G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} \]
Example: The return on investment earned by a manufacturer of a
sports car for four successive years was 20 percent, 15 percent, -40
percent, and 100 percent. What is the geometric mean rate of return on
investment?
Each rate is first converted to a growth factor (1 + rate), giving 1.2, 1.15, 0.6, and 2:
\[ G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} = \sqrt[4]{(1.2)(1.15)(0.6)(2)} = \sqrt[4]{1.656} = 1.1344 \]
Accordingly, the average rate of return, which is essentially a compound
annual growth rate, is 13.44%.
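The geometric mean rate of return can be sketched as follows. The helper name `geometric_mean_rate` is hypothetical (not from the slides), and the code assumes Python 3.8 or later for `math.prod`. Each rate is converted to a growth factor before the n-th root is taken.

```python
import math

def geometric_mean_rate(rates):
    """Geometric mean growth rate for a list of periodic rates (0.20 means 20%)."""
    factors = [1 + r for r in rates]          # growth factors, e.g. 1.2 for +20%
    g = math.prod(factors) ** (1 / len(factors))  # n-th root of the product
    return g - 1                               # back to a rate

# Returns on investment for four successive years: 20%, 15%, -40%, 100%.
roi_rates = [0.20, 0.15, -0.40, 1.00]
roi = geometric_mean_rate(roi_rates)

# Arithmetic mean of the same rates, for comparison only.
arithmetic = sum(roi_rates) / len(roi_rates)

print(f"geometric mean rate:  {roi:.2%}")
print(f"arithmetic mean rate: {arithmetic:.2%}")
```

The arithmetic mean of the same four rates is noticeably larger, which is why the geometric mean is preferred for compounding quantities such as returns.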
Example: Suppose the inflation rates for the last 5 years in a certain country are 5%, 4%, 2%, 8%,
and 6%, respectively. What is the mean rate of inflation over this five-year period?
Solution:
At the end of the first year, the price index will be 1.05 times the price index at the beginning of
the year; at the end of the second year, the price index will be (1.04)(1.05); at the end of the
third year, the price index will be (1.02)(1.04)(1.05) and so on. Thus, the geometric mean of 1.05, 1.04,
1.02, 1.08, and 1.06 is:
\[ G = \sqrt[5]{(1.05)(1.04)(1.02)(1.08)(1.06)} = \sqrt[5]{1.2751} = 1.0498 \]
Accordingly, the average rate of inflation over the five-year period is approximately 4.98%
Working Problem 2.8:
The percent increases in sales for the last 4 years at X-L Company were:
9.91, 10.75, 13.12, 26.6
(a) Find the geometric mean percent increase.
(b) Find the arithmetic mean percent increase.
(c) Is the arithmetic mean equal to or greater than the geometric mean?
What are the ‘dispersion’ or variability measures?
 Range
 Mean deviation
 Standard deviation
 Variance
Range
\[ R = X_{\max} - X_{\min} \]
Example: Calculate the range of the following set of data:
200, 205, 204, 202, 207, 208
The Range = R = 208 - 200 = 8
Properties of range:
• The range represents the most commonly used statistic after the arithmetic
mean
• It is simple as it relies on two values, the maximum value and the minimum
value
• It is easy to understand: the higher the range, the higher the variability
• Since the range relies on two values (maximum and minimum), a mistake in
any one of these two values or a presence of an outlier can result in a
misleading value of range
Mean deviation
\[ MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{X}| \]
Example : Calculate the mean deviation of the
following ten observations of metal sheet thickness (mm):
83, 90, 70, 90, 90, 60, 70, 70, 90, 100
Solution:
Step 1: Calculate the Mean
\[ \bar{X} = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 \text{ mm} \]
Step 2: Subtract the Mean value from each observation value, and add up the absolute
differences

Thickness x (mm)   (x - X̄)                   |x - X̄|
83                 (83 - 81.3) = 1.7          1.7
90                 (90 - 81.3) = 8.7          8.7
70                 (70 - 81.3) = -11.3        11.3
90                 (90 - 81.3) = 8.7          8.7
90                 (90 - 81.3) = 8.7          8.7
60                 (60 - 81.3) = -21.3        21.3
70                 (70 - 81.3) = -11.3        11.3
70                 (70 - 81.3) = -11.3        11.3
90                 (90 - 81.3) = 8.7          8.7
100                (100 - 81.3) = 18.7        18.7
Mean = 81.3                                   Sum = 110.4

Mean Deviation = 110.4 / 10 = 11.04 mm
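The two steps of the mean deviation calculation can be sketched in a few lines of Python. The variable names are illustrative; the data are the ten thickness observations from the example.

```python
thickness = [83, 90, 70, 90, 90, 60, 70, 70, 90, 100]  # mm
n = len(thickness)

# Step 1: the arithmetic mean.
x_bar = sum(thickness) / n

# Step 2: absolute deviation of each observation from the mean.
abs_deviations = [abs(x - x_bar) for x in thickness]

# Mean deviation: the average of the absolute deviations.
mean_deviation = sum(abs_deviations) / n

print(f"mean = {x_bar:.1f} mm")
print(f"sum of absolute deviations = {sum(abs_deviations):.1f}")
print(f"mean deviation = {mean_deviation:.2f} mm")
```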
Standard deviation
For a Population:
\[ \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}} \]
For a Sample:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n}} \]
For n < 30, we use (n-1) in the denominator
Example: Calculate the standard deviation of the following ten observations of metal sheet
thickness
Thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
\[ \bar{X} = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 \text{ mm} \]

Thickness x (mm)   (x - X̄)                   (x - X̄)²
83                 (83 - 81.3) = 1.7          2.89
90                 (90 - 81.3) = 8.7          75.69
70                 (70 - 81.3) = -11.3        127.69
90                 (90 - 81.3) = 8.7          75.69
90                 (90 - 81.3) = 8.7          75.69
60                 (60 - 81.3) = -21.3        453.69
70                 (70 - 81.3) = -11.3        127.69
70                 (70 - 81.3) = -11.3        127.69
90                 (90 - 81.3) = 8.7          75.69
100                (100 - 81.3) = 18.7        349.69
Mean = 81.3                                   Sum = 1492.1

For n < 30, we use (n-1) in the denominator:
\[ s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1}} = \sqrt{\frac{1492.1}{9}} = 12.88 \text{ mm} \]
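The standard deviation of the thickness data can be computed both ways for comparison. This sketch uses Python's `statistics` module, where `stdev` divides by (n-1) and `pstdev` divides by n; the variable names are illustrative.

```python
from statistics import mean, pstdev, stdev

thickness = [83, 90, 70, 90, 90, 60, 70, 70, 90, 100]  # mm

x_bar = mean(thickness)

# Sum of squared deviations from the mean (the "Sum" row of the table).
sum_sq = sum((x - x_bar) ** 2 for x in thickness)

# Sample standard deviation: denominator n - 1.
s_sample = stdev(thickness)

# Population-style standard deviation: denominator n.
s_population = pstdev(thickness)

print(f"sum of squared deviations = {sum_sq:.1f}")
print(f"s (n-1 denominator) = {s_sample:.2f} mm")
print(f"s (n denominator)   = {s_population:.2f} mm")
```

The (n-1) version reproduces the 12.88 mm obtained by hand; the n version is slightly smaller because it divides the same sum by a larger number.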
Variance
For a Population:
\[ \sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N} \]
For a Sample:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n} \]
For n < 30, we use (n-1) in the denominator
Properties of variance:
• The variance represents the most commonly used statistic to indicate variability
• It is easy to understand: the higher the variance, the higher the variability
• Unlike the range, the variance takes all observation values into account. It remains
sensitive to outliers, however, because each deviation from the mean is squared
• Variances cannot be subtracted to determine variability. They can only be added.
If U = X ± Y and X and Y are independent, Var (U) = Var (X) + Var (Y). This is the principle of analysis of variance
(Chapter 11)
Example: Calculate the variance of the following ten observations of metal sheet thickness
Thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
\[ \bar{X} = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 \text{ mm} \]

Thickness x (mm)   (x - X̄)                   (x - X̄)²
83                 (83 - 81.3) = 1.7          2.89
90                 (90 - 81.3) = 8.7          75.69
70                 (70 - 81.3) = -11.3        127.69
90                 (90 - 81.3) = 8.7          75.69
90                 (90 - 81.3) = 8.7          75.69
60                 (60 - 81.3) = -21.3        453.69
70                 (70 - 81.3) = -11.3        127.69
70                 (70 - 81.3) = -11.3        127.69
90                 (90 - 81.3) = 8.7          75.69
100                (100 - 81.3) = 18.7        349.69
Mean = 81.3                                   Sum = 1492.1

For n < 30, we use (n-1) in the denominator:
\[ s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{X})^2}{n - 1} = \frac{1492.1}{9} = 165.79 \text{ mm}^2 \]
Working Problem:
Calculate the minimum, maximum, range, standard deviation, and variance for the
following data set of minimum wage ($)
7, 8, 6, 6, 8, 5, 6, 5, 8, 8
Answer:
Minimum = $5
Maximum = $8
Range = $3
Standard deviation = $1.252
Variance = 1.567
Example: Suppose X and Y are independent random variables. The variance of X is equal to 16;
and the variance of Y is equal to 9. Let U = X - Y.
What is the standard deviation of U?
•2.65 ……….
•5.00 ……….
•7.00 ……….
•25.0 ……….
•None of the above ……….
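A worked derivation for this example, assuming the variance-addition rule for independent variables stated under the properties of variance (the symbol \sigma_U for the standard deviation of U is introduced here for illustration):

```latex
\[
\operatorname{Var}(U) = \operatorname{Var}(X - Y)
  = \operatorname{Var}(X) + \operatorname{Var}(Y)
  = 16 + 9 = 25,
\qquad
\sigma_U = \sqrt{\operatorname{Var}(U)} = \sqrt{25} = 5.00
\]
```

Note that the variances are added even though U is a difference; subtracting them (16 - 9 = 7, giving 2.65) is the tempting but incorrect route.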
Working Problem 2.9:
Question (1): Calculate the minimum, the maximum, the range, the mean deviation, the
standard deviation, and the variance for the following data set of minimum wage ($)
7, 8, 6, 6, 8, 5, 6, 5, 8, 8
Question (2): In two consecutive exams, the mean grade of the first test was 80 and the mean
grade of the second test was 90. The standard deviation of grade of the first test was 6 and
the standard deviation of grade of the second test was 8. Calculate the mean of the two tests
and the variance of the two tests?
What are Combined Descriptive Measures?
Coefficient of Variation (C.V%)
\[ C.V\% = \frac{s}{\bar{X}} \times 100 \]
Example: Calculate the Coefficient of Variation of the following ten observations of metal
sheet thickness
Thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
\[ \bar{X} = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 \text{ mm} \]

Thickness x (mm)   (x - X̄)                   (x - X̄)²
83                 (83 - 81.3) = 1.7          2.89
90                 (90 - 81.3) = 8.7          75.69
70                 (70 - 81.3) = -11.3        127.69
90                 (90 - 81.3) = 8.7          75.69
90                 (90 - 81.3) = 8.7          75.69
60                 (60 - 81.3) = -21.3        453.69
70                 (70 - 81.3) = -11.3        127.69
70                 (70 - 81.3) = -11.3        127.69
90                 (90 - 81.3) = 8.7          75.69
100                (100 - 81.3) = 18.7        349.69
Mean = 81.3                                   Sum = 1492.1

\[ s = \sqrt{\frac{1492.1}{9}} = 12.88 \text{ mm} \]
\[ C.V\% = \frac{s}{\bar{X}} \times 100 = \frac{12.88}{81.3} \times 100 = 15.84\% \]
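The coefficient of variation follows directly from the sample standard deviation and the mean. A minimal sketch with the same thickness data, using illustrative variable names:

```python
from statistics import mean, stdev

thickness = [83, 90, 70, 90, 90, 60, 70, 70, 90, 100]  # mm

x_bar = mean(thickness)   # sample mean
s = stdev(thickness)      # sample standard deviation (n - 1 denominator)

# Coefficient of variation: standard deviation as a percentage of the mean.
cv_percent = s / x_bar * 100

print(f"mean = {x_bar:.1f} mm, s = {s:.2f} mm, C.V% = {cv_percent:.2f}%")
```

Because the units cancel, the result is a pure percentage, which makes it usable for comparing variability across data sets measured in different units.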
Working Problem 2.11:
Calculate the Coefficient of Variation (CV%) for the following data set of
minimum wage ($):
7, 8, 6, 6, 8, 5, 6, 5, 8, 8
What are Combined Descriptive Measures?
Standardized Variable (the z Score)
A standardized variable is a measure of the deviation of an individual value from
the mean, in units of the standard deviation:
\[ z = \frac{x - \bar{X}}{s} \]
Example: An instructor who has been teaching statistics for twenty years has
observed that the average grade of students is 88% and the standard deviation is 3%.
After teaching the course for two classes in 2008, one in the fall semester and one in the
spring semester, the instructor found that the average grades were as follows:
Term          Mean Grade
Fall 2008     82%
Spring 2008   91%
How do these two semesters compare to the instructor’s average over the last twenty
years?
The standardized variable (z-score) is calculated for each semester as follows:
Term          Mean Grade   z-Score
Fall 2008     82%          z82 = (82 - 88)/3 = -2
Spring 2008   91%          z91 = (91 - 88)/3 = 1
From the above scores, you can conclude that the Fall 2008 class mean of 82% was 2
standard deviations below the instructor’s long-run mean grade, while the Spring 2008 class mean
of 91% was 1 standard deviation above it.
Example: The mean driving time of people living in Union City, near Atlanta, Georgia, to CNN Center
in downtown Atlanta is 40 minutes, with a standard deviation of 10 minutes. You asked four CNN
employees who live in Union City about their driving time to CNN Center, and you got the following
answers: 38 minutes, 52 minutes, 58 minutes, and 40 minutes. Find the z-score that
corresponds to each driving time. Interpret the difference in z-scores.
\[ z = \frac{t - \bar{t}}{s_t} \]
Where t is the actual driving time, \bar{t} is the mean driving time, and s_t is the standard deviation of
driving time.
At t = 38 minutes, z = (38 - 40)/10 = -0.2
At t = 52 minutes, z = (52 - 40)/10 = 1.2
At t = 58 minutes, z = (58 - 40)/10 = 1.8
At t = 40 minutes, z = (40 - 40)/10 = 0
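The z-score calculation for the four driving times can be sketched as a small function. The name `z_score` and its parameters are hypothetical helpers introduced for illustration; the mean (40 minutes) and standard deviation (10 minutes) come from the example.

```python
def z_score(value, mean, sd):
    """Deviation of a value from the mean, in units of the standard deviation."""
    return (value - mean) / sd

mean_time = 40   # minutes
sd_time = 10     # minutes
driving_times = [38, 52, 58, 40]

z_scores = [z_score(t, mean_time, sd_time) for t in driving_times]

for t, z in zip(driving_times, z_scores):
    # The sign says which side of the mean; the magnitude says how far.
    side = "above" if z > 0 else "below" if z < 0 else "at"
    print(f"t = {t} min -> z = {z:+.1f} ({side} the mean)")
```

A negative z-score marks a shorter-than-average drive, a positive one a longer drive, and zero a drive exactly at the mean.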
Working Problem 2.12:
The average scoring points per game (PTG) up to week 10 in the 2010 NFL football
season was 22 points and the standard deviation was 4 points. Using the z-score, compare
the following 3 teams and determine which team had a relatively better scoring season:
San Francisco 16 PTG, New England 29 PTG, Pittsburgh 24 PTG
Working Problem 2.13:
The annual salaries of engineers in the U.S. automobile industry are normally distributed
with a mean of $100,000 and a standard deviation of $10,000. What is the z-score for the
income x of an auto-engineer who earns $85,000 annually? And what is the z-score for an
auto-engineer who earns $105,000 annually?
Working Problem 2.14:
The annual salaries of U.S. state governors are normally distributed with a mean of $135,450 and a
standard deviation of $36,530. In 2007, the Arkansas governor (Mike Beebe) made an annual salary of
$85,000, and the California governor (Arnold Schwarzenegger) made $206,000.
Compare the annual salaries of these two governors using the z-score.
The Use of Computer for Performing Descriptive Statistics
Powerful tools are available to perform statistical analyses. The focus
should therefore be on:
• Planning for sample and data selection in view of the study
or application objectives
• Gathering and organizing data in a way that serves the purpose
of the application
• Selecting the appropriate type of analysis
• Organizing the analysis output
• Interpreting the analysis outcome
• Making a report addressing the case or application
in question
Data on Annual Tuition and Financial Aid by Different U.S. State Colleges
(http://www.ordoludus.com/costs.php, 2006)
School                            In-State Tuition   Out-of-State Tuition   Total Cost ($)   Fin. Aid ($)
Georgia Institute of Technology   $4,648             $18,990                $25,792          $8,222
University of Tennessee           $5,290             $16,060                $21,270          $6,954
University of Mississippi         $4,320             $9,744                 $14,442          $7,532
University of Kentucky            $5,812             $12,798                $18,027          $7,861
Louisiana State University        $4,515             $12,815                $19,145          $8,006
University of Florida             $3,094             $16,579                $22,839          $10,566
University of Virginia            $7,133             $23,877                $30,266          $13,449
University of South Carolina      $7,314             $18,956                $25,039          $9,501
University of North Carolina      $4,515             $18,313                $24,903          $9,687
University of Georgia             $4,628             $16,848                $23,224          $7,320
University of Alabama             $4,864             $13,516                $18,540          $7,980
University of California (UCLA)   $6,504             $24,324                $36,252          $13,462
North Dakota State University     $5,264             $12,545                $17,675          $5,487
Florida State University          $3,208             $16,340                $23,118          $8,269
Example:
[Screenshots: Analysis of Descriptive Statistics, Steps 1 through 6, and Output]
In the output, Largest(4) is the minimum of the largest 4 observations, and Smallest(4) is the maximum of the
smallest 4 observations.
The most critical aspect of statistics is learning how to interpret
the results. This is not your typical math course where all you
have to do is find answers. The true answer is not the outputs; it is
the interpretation of the outputs.
Statistic            In-State Tuition ($)   Out-State Tuition ($)   Total Cost ($)   Financial Aid ($)
Mean                 5079                   16550                   22895            8878
Median               4756                   16460                   22979            8114
Mode                 4515                   None                    None             None
Standard Deviation   1269.44                4196.51                 5602.14          2297.84
Sample Variance      1611486.64             17610692.25             31383983.67      5280083.14
Range                4220                   14580                   21810            7975
Minimum              3094                   9744                    14442            5487
Maximum              7314                   24324                   36252            13462
Count                14                     14                      14               14
Largest(4)           5812                   18956                   25039            9687
Smallest(4)          4515                   12815                   18540            7532
Outputs of descriptive statistics for tuition, cost, and financial aid
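The first column of the output table can be reproduced without a spreadsheet. The following is a sketch in Python rather than the tool used in the slides; the dictionary keys simply mirror the row labels of the table, and the data are the fourteen in-state tuition values listed earlier.

```python
from statistics import mean, median, multimode, stdev, variance

# In-state tuition ($) for the 14 schools, in the order of the data table.
in_state = [4648, 5290, 4320, 5812, 4515, 3094, 7133, 7314,
            4515, 4628, 4864, 6504, 5264, 3208]

ordered = sorted(in_state)

summary = {
    "Mean": mean(in_state),
    "Median": median(in_state),
    "Mode": multimode(in_state),           # list of most frequent values
    "Standard Deviation": stdev(in_state),  # sample (n - 1) version
    "Sample Variance": variance(in_state),
    "Range": ordered[-1] - ordered[0],
    "Minimum": ordered[0],
    "Maximum": ordered[-1],
    "Count": len(in_state),
    "Largest(4)": ordered[-4],    # 4th largest = minimum of the largest 4
    "Smallest(4)": ordered[3],    # 4th smallest = maximum of the smallest 4
}

for name, value in summary.items():
    print(f"{name:20s} {value}")
```

The same pattern applies to the other three columns by swapping in the corresponding data; interpreting the numbers remains the analyst's job.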
[Figure: Total Cost and Financial Aid by School. Bar chart of Total Cost ($) and Financial Aid ($) for each school, on an axis from $0 to $40,000, with an "Optimum Choice" annotation]
APPENDIX 2.A Steps to Add Data Analysis to Excel 2007
[Screenshots: Data Analysis Add-In, Steps 1 through 9]

More Related Content

What's hot

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsAileen Balbido
 
Basics of Educational Statistics (Descriptive statistics)
Basics of Educational Statistics (Descriptive statistics)Basics of Educational Statistics (Descriptive statistics)
Basics of Educational Statistics (Descriptive statistics)HennaAnsari
 
How to choose a right statistical test
How to choose a right statistical testHow to choose a right statistical test
How to choose a right statistical testKhalid Mahmood
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statisticsaan786
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsAnand Thokal
 
Quantitative data analysis
Quantitative data analysisQuantitative data analysis
Quantitative data analysisRonaldLucasia1
 
Basic Statistics & Data Analysis
Basic Statistics & Data AnalysisBasic Statistics & Data Analysis
Basic Statistics & Data AnalysisAjendra Sharma
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsAttaullah Khan
 
Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSSPhi Jack
 
Basic Descriptive Statistics
Basic Descriptive StatisticsBasic Descriptive Statistics
Basic Descriptive Statisticssikojp
 
Business Statistics Chapter 2
Business Statistics Chapter 2Business Statistics Chapter 2
Business Statistics Chapter 2Lux PP
 
Inferential statistics (2)
Inferential statistics (2)Inferential statistics (2)
Inferential statistics (2)rajnulada
 
Data presentation/ How to present Research outcome data
Data presentation/ How to present Research outcome dataData presentation/ How to present Research outcome data
Data presentation/ How to present Research outcome dataDr-Jitendra Patel
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear RegressionIndus University
 

What's hot (20)

Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Basics of Educational Statistics (Descriptive statistics)
Basics of Educational Statistics (Descriptive statistics)Basics of Educational Statistics (Descriptive statistics)
Basics of Educational Statistics (Descriptive statistics)
 
Spss
SpssSpss
Spss
 
Understanding statistics in research
Understanding statistics in researchUnderstanding statistics in research
Understanding statistics in research
 
How to choose a right statistical test
How to choose a right statistical testHow to choose a right statistical test
How to choose a right statistical test
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Introduction to Statistics
Introduction to StatisticsIntroduction to Statistics
Introduction to Statistics
 
INTRODUCTION TO BIO STATISTICS
INTRODUCTION TO BIO STATISTICS INTRODUCTION TO BIO STATISTICS
INTRODUCTION TO BIO STATISTICS
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Quantitative data analysis
Quantitative data analysisQuantitative data analysis
Quantitative data analysis
 
Basic Statistics & Data Analysis
Basic Statistics & Data AnalysisBasic Statistics & Data Analysis
Basic Statistics & Data Analysis
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statistics
 
Introduction To SPSS
Introduction To SPSSIntroduction To SPSS
Introduction To SPSS
 
Type of data
Type of dataType of data
Type of data
 
Basic Descriptive Statistics
Basic Descriptive StatisticsBasic Descriptive Statistics
Basic Descriptive Statistics
 
Business Statistics Chapter 2
Business Statistics Chapter 2Business Statistics Chapter 2
Business Statistics Chapter 2
 
Probability and statistics
Probability and statisticsProbability and statistics
Probability and statistics
 
Inferential statistics (2)
Inferential statistics (2)Inferential statistics (2)
Inferential statistics (2)
 
Data presentation/ How to present Research outcome data
Data presentation/ How to present Research outcome dataData presentation/ How to present Research outcome data
Data presentation/ How to present Research outcome data
 
Multiple Linear Regression
Multiple Linear RegressionMultiple Linear Regression
Multiple Linear Regression
 

Viewers also liked

Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive Statisticsguest290abe
 
Pfs3 a assignment two
Pfs3 a assignment twoPfs3 a assignment two
Pfs3 a assignment tworamakgahlele
 
Webquest Descriptive Statistics of the NCAA
Webquest Descriptive Statistics of the NCAAWebquest Descriptive Statistics of the NCAA
Webquest Descriptive Statistics of the NCAAsrthomas
 
Torturing numbers - Descriptive Statistics for Growers (2013)
Torturing numbers - Descriptive Statistics for Growers (2013)Torturing numbers - Descriptive Statistics for Growers (2013)
Torturing numbers - Descriptive Statistics for Growers (2013)jasondeveau
 
Descriptive Statistics with R
Descriptive Statistics with RDescriptive Statistics with R
Descriptive Statistics with RKazuki Yoshida
 
Chapter 3 260110 044503
Chapter 3 260110 044503Chapter 3 260110 044503
Chapter 3 260110 044503guest25d353
 
Das20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsDas20502 chapter 1 descriptive statistics
Das20502 chapter 1 descriptive statisticsRozainita Rosley
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive StatisticsL H
 
Malimu descriptive statistics.
Malimu descriptive statistics.Malimu descriptive statistics.
Malimu descriptive statistics.Miharbi Ignasm
 
02 descriptive statistics
02 descriptive statistics02 descriptive statistics
02 descriptive statisticsVasant Kothari
 
Midpoint of the line segment
Midpoint of the line segmentMidpoint of the line segment
Midpoint of the line segmentGrace Alilin
 
Descriptive Statistics
Descriptive StatisticsDescriptive Statistics
Descriptive StatisticsBhagya Silva
 
Areas In Statistics
Areas In StatisticsAreas In Statistics
Areas In Statisticsguestc94d8c
 
Descriptive statistics
Descriptive statisticsDescriptive statistics
Descriptive statisticsRajesh Gunesh
 

Descriptive Statistics, Numerical Description

  • 1. DESCRIPTIVE STATISTICS Part I: Numerical Description. In this chapter, we will learn how to describe a set of data using numerical methods. This is the first of two chapters that together aim to provide methods of descriptive statistics, which is the use of numerical and graphical methods to display data and explore key statistics.
  • 2. What are the basic features of a data set? A data set is a collection of data representing a particular variable. Examples of data sets are given below.
    Data Sets:
    • Students' grades in a calculus test: 65, 85, 70, 75, 85, 80, 82, 85, 90, 78, 81, 82, 67, 80
    • Property tax of a sample of houses: $5000, $4500, $4000, $7200, $5000, $3800, $4100, $5000
    • Driving distance to work of a group of employees (miles): 1.2, 2.0, 2.2, 15.0, 11.0, 5.0, 3.7, 4.9, 15.2, 16.0
    • Ages of all students in a college: 18, 19, 21, …, 22, 18, 19, 21
  • 3. In general, establishing a data set requires consideration of a number of key questions.
    Data Set Key Questions:
    • Are the data qualitative or quantitative?
    • What levels of measurement do the data exhibit (nominal, ordinal, interval, or ratio)?
    • What is the source of the data (the population)?
    • What is the appropriate sampling technique for collecting the samples (random or stratified)?
    • What is the appropriate minimum sample size?
  • 4. Data Set Types: (1) Univariate, (2) Bivariate, (3) Multivariate

    Data Set       Variables       Typical Tasks
    Univariate     One             Histograms, descriptive statistics, frequency tallies
    Bivariate      Two             Scatter plots, correlations, simple regression
    Multivariate   More than two   Multiple regression, data mining, modeling

    Univariate Data Set:
    Person #   Weight (lb)
    1          150
    2          120
    3          130
    4          125
    5          155
    6          134
    7          150
    8          140
    9          160
    10         200
    11         180
    12         140

    Bivariate Data Set:
    Person #   Years at work   Annual Salary ($)
    1          5               50,000
    2          20              73,000
    3          10              65,000
    4          5               55,000
    5          8               60,000
    6          10              60,000
    7          15              68,000
    8          15              69,000
    9          20              68,000
    10         20              69,000
    11         18              68,000
    12         10              62,000
    13         3               48,000

    Multivariate Data Set:
    Case   Name     Age   Income ($)   Position             Gender
    1      Frieda   45    67,100       Consumer Analyst     F
    2      Stefan   32    56,500       Operations analyst   M
    3      John     55    88,200       Marketing VP         F
    4      Donna    27    59,000       Statistician         F
    5      Larry    46    26,000       Security guard       M
    6      Alicia   52    68,500       QC Director          F
    7      Alec     65    95,200       Chief executive      M
    8      Jaime    50    71,200       Human Resources      M
  • 5. Data Sets in the Context of Sampling:
    • Cross-sectional data set
    • Time-series data set
  • 7. Working Problem 2.2: Explain what inheritance tax is. What is the difference between inheritance tax and estate tax? What is the level of measurement for each of the following variables: state, income tax, sales tax, and inheritance tax? Why do some states have a wide income tax range?
    http://portal.kiplinger.com/tools/slideshows/slideshow_pop.html?nm=TaxUnfriendlyStatesRetirees

    U.S. State       State Income Tax (%)   Sales Tax (%)   Inheritance Tax (Yes/No)
    Alaska           0.0                    0.0             No
    Wyoming          0.0                    4.0             No
    Michigan         4.4                    6.0             No
    Pennsylvania     3.1                    6.0             Yes
    Colorado         4.6                    2.9             No
    Delaware         4.6                    0.0             No
    Hawaii           1.4 to 11              4.0             No
    Georgia          1.0 to 6.0             4.0             No
    South Carolina   3.0 to 7.0             6.0             No
    Alabama          2.0 to 5.0             4.0             No
    California       1.25 to 10.55          8.3             No
    Rhode Island     3.75 to 9.9            7.0             No
    New Jersey       1.4 to 8.97            7.0             Yes
    Vermont          3.55 to 8.95           6.0             No
    Iowa             0.36 to 8.98           6.0             Yes
    Nebraska         2.56 to 6.84           5.5             Yes
    Wisconsin        4.6 to 7.75            5.0             No
    Oregon           5.0 to 11.0            0.0             Yes
    Indiana          3.4                    7.0             Yes
    North Dakota     1.84 to 4.86           5.0             No
  • 8. Working Problem 2.3: Identify the following data sets as 'Cross-Sectional Data' or 'Time-Series Data':
    (a) Two weeks before the 56th quadrennial United States presidential election, which was held on November 4, 2008, a sample of people taken randomly from undecided states revealed that Democrat Barack Obama was expected to earn 54% of the popular vote and John McCain was expected to earn 46% of the vote.
        Cross Sectional ( )   Time-Series ( )
    (b) A survey of 1000 students from a university of 10,000 students revealed that 65% of the students do not prefer weekend classes.
        Cross Sectional ( )   Time-Series ( )
    (c) The U.S. city average price per gallon of unleaded regular gasoline from 2000 to 2009 was as follows:

    Year   Jan     Feb     Mar     Apr     May     Jun     Jul     Aug     Sep     Oct     Nov     Dec
    2000   1.301   1.369   1.541   1.506   1.498   1.617   1.593   1.510   1.582   1.559   1.555   1.489
    2001   1.472   1.484   1.447   1.564   1.729   1.640   1.482   1.427   1.531   1.362   1.263   1.131
    2002   1.139   1.130   1.241   1.407   1.421   1.404   1.412   1.423   1.422   1.449   1.448   1.394
    2003   1.473   1.641   1.748   1.659   1.542   1.514   1.524   1.628   1.728   1.603   1.535   1.494
    2004   1.592   1.672   1.766   1.833   2.009   2.041   1.939   1.898   1.891   2.029   2.010   1.882
    2005   1.823   1.918   2.065   2.283   2.216   2.176   2.316   2.506   2.927   2.785   2.343   2.186
    2006   2.315   2.310   2.401   2.757   2.947   2.917   2.999   2.985   2.589   2.272   2.241   2.334
    2007   2.274   2.285   2.592   2.860   3.130   3.052   2.961   2.782   2.789   2.793   3.069   3.020
    2008   3.047   3.033   3.258   3.441   3.764   4.065   4.090   3.786   3.698   3.173   2.151   1.689
    2009   1.787   1.928   1.949   2.056   2.265   2.631   2.543   2.627   2.574   2.561   2.660   2.621

    http://data.bls.gov/cgi-bin/surveymost
        Cross Sectional ( )   Time-Series ( )
  • 10. Numerical measures of descriptive statistics consist of three types of measures:
    • Measures of central tendency (mean, median, and mode)
    • Measures of dispersion (range, standard deviation, and variance)
    • Combined measures (coefficient of variation, signal-to-noise ratio, and standardized variable)
    [Figure: sample values 10.0, 12.0, 13.0, 14.0, 14.0, 14.0, 24.0, 24.0, 24.0, 26.8, 27.0, 27.0, 29.0, 30, annotated with the measures of central tendency (mean, mode, median) and the measures of dispersion (range, standard deviation, variance)]
  • 11. Key points to perform good statistical analysis
    1. Identify your objectives:
       • What questions do you really need to answer?
       • What variable do you need to examine?
       • What population are you about to evaluate?
    2. Collect the appropriate samples and data to address your questions:
       • Do you have access to the entire population?
       • Would a selection of a sample from the population be easier to access, less costly, and less destructive than an evaluation of the whole population?
       • Remember 'GIGO', or garbage-in, garbage-out. If the samples are not representative of the population, and the data collected are not accurate and precise, the conclusions drawn from the analysis will be meaningless.
    3. Describe the data using the analysis of descriptive statistics:
       • Do you detect data abnormality or outliers?
       • Can you explore the data in such a way that will provide a clear description of data center and data variability?
       • Use descriptive statistics as a guideline for other methods of analysis.
    4. Perform inference:
       • Can the sample statistics be used to estimate population parameters?
       • Is your estimation of population parameters reliable?
       • Do you have confidence in the population estimates?
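  • 12. What are the measures of central tendency? The center values of a data set are described by the mean, the mode, and the median.
    [Figure: sample values 10.0, 12.0, 13.0, 14.0, 14.0, 14.0, 24.0, 24.0, 24.0, 26.8, 27.0, 27.0, 29.0, 30, annotated with the mean, mode, and median]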
  • 13. Measures of Data Center (Central Tendency): (1) Arithmetic Mean
    Arithmetic mean of sample observations: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$
    Arithmetic mean of population observations: $\mu = \frac{\sum_{i=1}^{N} x_i}{N}$
  • 14. Example: The table below compares gas prices in some states in September 2009 and September 2008. Determine the mean of gas prices ($ per gallon) for each year.

    Gas prices of a number of states in September 2008 and September 2009
    State           Sep-2009   Sep-2008
    California      3.099      3.750
    Colorado        2.480      3.732
    Florida         2.527      3.893
    Massachusetts   2.597      3.582
    Minnesota       2.452      3.765
    New York        2.811      3.805
    Ohio            2.411      3.933
    Texas           2.404      3.729
    Washington      2.947      3.785

    http://www.eia.doe.gov/oil_gas/petroleum/data_publications/wrgp/mogas_home_page.html

    For September 2009: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3.099 + 2.48 + 2.527 + 2.597 + \cdots + 2.947}{9} = \$2.636$ per gallon
    For September 2008: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n} = \frac{3.75 + 3.732 + 3.893 + 3.582 + \cdots + 3.785}{9} = \$3.775$ per gallon
    Comment on the results.
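The two means in the gas price example can be checked with a few lines of Python. This is only a sketch: the variable names are illustrative, and the prices are copied from the table in the slide.

```python
from statistics import mean

# Gas prices ($ per gallon) copied from the slide's table, in state order.
prices_2009 = [3.099, 2.48, 2.527, 2.597, 2.452, 2.811, 2.411, 2.404, 2.947]
prices_2008 = [3.75, 3.732, 3.893, 3.582, 3.765, 3.805, 3.933, 3.729, 3.785]

# Arithmetic mean: sum of the observations divided by their count.
mean_2009 = mean(prices_2009)
mean_2008 = mean(prices_2008)

print(f"Sep-2009 mean: ${mean_2009:.3f} per gallon")
print(f"Sep-2008 mean: ${mean_2008:.3f} per gallon")
```

Both values agree with the slide to three decimal places.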
  • 15. Properties of arithmetic mean:
    1. The mean of a set of data is unique and can be used as a single identifying measure of the data center.
    2. We can determine the mean of any data set that contains ratio or interval level data.
    3. We need all observation values to be able to calculate the mean.
    4. You know it is the correct mean value when the sum of the deviations of each value from it is zero: $\sum_{i=1}^{n} (x_i - \bar{X}) = 0$
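  • 16. Example: Determine the arithmetic mean of the three values of student grades: 80, 40, and 30. Using the mean value, show that $\sum_{i=1}^{n} (x_i - \bar{X}) = 0$.
    Solution: The arithmetic mean: $\bar{X} = \frac{80 + 40 + 30}{3} = 50$. The deviations from the mean are $(80 - 50) + (40 - 50) + (30 - 50) = 30 - 10 - 20 = 0$.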
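As a quick check of the zero-sum property, the following sketch computes the mean of the three grades and the deviations from it. The variable names are illustrative only.

```python
grades = [80, 40, 30]

# Arithmetic mean of the three grades.
mean_grade = sum(grades) / len(grades)

# Deviation of each grade from the mean; these always sum to zero.
deviations = [g - mean_grade for g in grades]

print("mean:", mean_grade)
print("deviations:", deviations)
print("sum of deviations:", sum(deviations))
```

With values that do not divide evenly, the sum may differ from zero by a tiny floating-point error, so a tolerance-based comparison would be safer in general.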
  • 17. Working Problem 2.4: Calculate the mean for the following data set of minimum wage ($): 7, 8, 6, 6, 8, 5, 6, 5, 8, 8
  • 18. (2) Median: The median of a set of numbers arranged in order of magnitude is the middle value or the arithmetic mean of the two middle values.
    Example: Calculate the median of the following data set: 14, 12, 14, 16, 15, 19, 17, 17, 17
    Solution: To determine the median, we first arrange the data in order of magnitude: 12, 14, 14, 15, 16, 17, 17, 17, 19. Thus, the median is 16.
    Example: Calculate the median of the following data set: 8, 9, 10, 9, 8, 6, 11, 7, 12, 8
    Solution: To determine the median, we first arrange the data in order of magnitude: 6, 7, 8, 8, 8, 9, 9, 10, 11, 12. Since this data set consists of an even number of observations, the middle values that split the data into an equal number of observations on both sides are 8 and 9. Thus, the median of this set of data is (8 + 9)/2 = 8.5.
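The sort-then-pick-the-middle procedure can be sketched as below. The helper name `median_by_sorting` is hypothetical, and the standard library's `statistics.median` is used only as a cross-check.

```python
from statistics import median

def median_by_sorting(values):
    """Median as described in the slide: sort, then take the middle value,
    or the mean of the two middle values when the count is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

odd_set = [14, 12, 14, 16, 15, 19, 17, 17, 17]      # 9 observations
even_set = [8, 9, 10, 9, 8, 6, 11, 7, 12, 8]        # 10 observations

print(median_by_sorting(odd_set), median(odd_set))
print(median_by_sorting(even_set), median(even_set))
```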
  • 19. (3) Mode: The mode is the value that occurs with the greatest frequency. Interestingly, 'mode' is a French word meaning fashion; the mode is, in a sense, the most fashionable (most common) value.
    Example: Calculate the mode of the following observations: 80, 87, 90, 82, 78, 74, 80, 77, 80, 91, 81, 80
    Solution: The mode of this set is 80.
    Example: Calculate the mode of the following observations: 5, 7, 8, 9, 9, 9, 10, 11, 12, 14, 14, 14, 15
    Solution: This set exhibits two modes, 9 and 14, and is called bimodal.
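A frequency count makes both mode examples easy to verify. The sketch below assumes Python 3.8 or later, where `statistics.multimode` is available; the variable names are illustrative.

```python
from collections import Counter
from statistics import multimode

unimodal = [80, 87, 90, 82, 78, 74, 80, 77, 80, 91, 81, 80]
bimodal = [5, 7, 8, 9, 9, 9, 10, 11, 12, 14, 14, 14, 15]

# multimode returns every value tied for the highest frequency,
# in the order each value first appears in the data.
modes_unimodal = multimode(unimodal)
modes_bimodal = multimode(bimodal)

# Counter exposes the underlying frequency tally.
frequency_of_80 = Counter(unimodal)[80]

print("modes:", modes_unimodal, "frequency of 80:", frequency_of_80)
print("modes:", modes_bimodal)
```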
  • 20. Working Problem: Calculate the mean, the median, and the mode for the following data set of minimum wage ($): 7, 8, 6, 6, 8, 5, 6, 5, 8, 8
    Answer: Mean = $6.7, Median = $6.5, Mode = $8
  • 21. Working Problem 2.6: Calculate the median and the mode for the following data set of minimum wage ($): 7, 8, 6, 6, 8, 5, 6, 5, 8, 8
  • 22. Geometric Mean, G: $G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n}$
    Example: The return on investment earned by a manufacturer of a sports car for four successive years was 20 percent, 15 percent, -40 percent, and 100 percent. What is the geometric mean rate of return on investment?
    $G = \sqrt[n]{x_1 x_2 x_3 \cdots x_n} = \sqrt[4]{(1.2)(1.15)(0.6)(2)} = \sqrt[4]{1.656} = 1.1344$
    Accordingly, the average rate of return, which is essentially a compound annual growth rate, is 13.44%.
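The geometric mean of the four yearly growth factors can be sketched as follows. The helper name `geometric_mean_of_factors` is hypothetical, and `math.prod` assumes Python 3.8 or later.

```python
import math

def geometric_mean_of_factors(factors):
    """n-th root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))

# Returns of 20%, 15%, -40%, and 100% expressed as growth factors (1 + rate).
returns_percent = [20, 15, -40, 100]
factors = [1 + r / 100 for r in returns_percent]

g = geometric_mean_of_factors(factors)
average_rate_percent = (g - 1) * 100

print(f"G = {g:.4f}, average rate of return = {average_rate_percent:.2f}%")
```

Note that each rate is converted to a factor before multiplying; taking the geometric mean of the raw percentages would fail for the negative return.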
  • 23. Geometric Mean, G
    Example: Suppose the inflation rates for the last 5 years in a certain country are 5%, 4%, 2%, 8%, and 6%, respectively. What is the mean rate of inflation over this five-year period?
    Solution: At the end of the first year, the price index will be 1.05 times the price index at the beginning of the year; at the end of the second year, the price index will be (1.04)(1.05) times the starting index; at the end of the third year, it will be (1.02)(1.04)(1.05) times the starting index, and so on. Thus, the geometric mean of 1.05, 1.04, 1.02, 1.08, and 1.06 is:
    $G = \sqrt[5]{(1.05)(1.04)(1.02)(1.08)(1.06)} = \sqrt[5]{1.2751} \approx 1.0498$
    Accordingly, the average rate of inflation over the five-year period is approximately 4.98%.
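The compounding argument in the inflation example can be illustrated with a short sketch: applying the geometric mean factor for five years reproduces the cumulative price index, while the arithmetic mean of the rates does not. It assumes Python 3.8 or later for `statistics.geometric_mean` and `math.prod`; the variable names are illustrative.

```python
import math
from statistics import geometric_mean, mean

inflation_percent = [5, 4, 2, 8, 6]
factors = [1 + r / 100 for r in inflation_percent]

# Cumulative price index after five years, relative to the starting index.
cumulative_index = math.prod(factors)

# Geometric mean growth factor and the corresponding mean inflation rate.
g = geometric_mean(factors)
geometric_rate = (g - 1) * 100

# Arithmetic mean of the rates, for comparison.
arithmetic_rate = mean(inflation_percent)

print(f"cumulative index: {cumulative_index:.4f}")
print(f"geometric mean rate: {geometric_rate:.2f}%")
print(f"arithmetic mean rate: {arithmetic_rate:.2f}%")
print(f"G compounded over 5 years: {g ** 5:.4f}")
```

The geometric mean rate comes out slightly below the arithmetic mean of 5%, which is the general pattern whenever the rates are not all equal.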
  • 24. Working Problem 2.8: The percent increases in sales for the last 4 years at X-L Company were: 9.91, 10.75, 13.12, 26.6
    (a) Find the geometric mean percent increase.
    (b) Find the arithmetic mean percent increase.
    (c) Is the arithmetic mean equal to or greater than the geometric mean?
  • 25. What are the 'dispersion' or variability measures?
    • Range
    • Mean deviation
    • Standard deviation
    • Variance
    [Figure: sample values 10.0, 12.0, 13.0, 14.0, 14.0, 14.0, 24.0, 24.0, 24.0, 26.8, 27.0, 27.0, 29.0, 30, annotated with the range, standard deviation, and variance]
  • 26. Range: $R = X_{max} - X_{min}$
    Example: Calculate the range of the following set of data: 200, 205, 204, 202, 207, 208
    The range is R = 208 - 200 = 8.
  • 27. Properties of range ($R = X_{max} - X_{min}$):
    • The range represents the most commonly used statistic after the arithmetic mean.
    • It is simple, as it relies on two values: the maximum value and the minimum value.
    • It is easy to understand: the higher the range, the higher the variability.
    • Since the range relies on two values (maximum and minimum), a mistake in either of these two values or the presence of an outlier can result in a misleading value of the range.
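The sensitivity of the range to a single bad value can be seen in a small sketch. The first list is the data from the range example; the second list contains a hypothetical recording mistake (280 entered instead of 208) that is not part of the original slides.

```python
def data_range(values):
    """Range R = X_max - X_min."""
    return max(values) - min(values)

observations = [200, 205, 204, 202, 207, 208]

# Hypothetical data-entry error: 208 mistyped as 280 (illustration only).
observations_with_typo = [200, 205, 204, 202, 207, 280]

print("range:", data_range(observations))
print("range with one mistyped value:", data_range(observations_with_typo))
```

One wrong value out of six changes the range by a factor of ten, which is the weakness the last property describes.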
  • 28. Mean deviation: $MD = \frac{1}{n} \sum_{i=1}^{n} |x_i - \bar{X}|$
    Example: Calculate the mean deviation of the following ten observations of metal sheet thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
    Solution:
    Step 1: Calculate the mean: $\bar{X}$ = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 mm
    Step 2: Subtract the mean from each observation value, and add up the absolute differences.

    Thickness (mm)   $(x - \bar{X})$           $|x - \bar{X}|$
    83               (83 - 81.3) = 1.7         1.7
    90               (90 - 81.3) = 8.7         8.7
    70               (70 - 81.3) = -11.3       11.3
    90               (90 - 81.3) = 8.7         8.7
    90               (90 - 81.3) = 8.7         8.7
    60               (60 - 81.3) = -21.3       21.3
    70               (70 - 81.3) = -11.3       11.3
    70               (70 - 81.3) = -11.3       11.3
    90               (90 - 81.3) = 8.7         8.7
    100              (100 - 81.3) = 18.7       18.7
    Mean = 81.3                                Sum = 110.4

    Mean Deviation = 110.4 / 10 = 11.04 mm
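The two steps of the mean deviation example translate directly into code. This is a minimal sketch with an illustrative helper name, using the thickness values from the slide.

```python
def mean_deviation(values):
    """Mean of the absolute deviations from the arithmetic mean."""
    n = len(values)
    center = sum(values) / n                              # Step 1: the mean
    return sum(abs(x - center) for x in values) / n       # Step 2: average |x - mean|

thickness_mm = [83, 90, 70, 90, 90, 60, 70, 70, 90, 100]

mean_thickness = sum(thickness_mm) / len(thickness_mm)
md = mean_deviation(thickness_mm)

print(f"mean = {mean_thickness:.1f} mm, mean deviation = {md:.2f} mm")
```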
  • 29. What are ‘Dispersion’ or Variability measures? Range, mean deviation, standard deviation, variance. Standard deviation: For a Population: \(\sigma = \sqrt{\frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}}\). For a Sample: \(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n}}\). For n < 30, we use (n-1) in the denominator.
  • 30. Standard deviation
     Example: Calculate the standard deviation of the following ten observations of metal sheet thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
     \(\bar{X}\) = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 mm
     \(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n}}\). For n < 30, we use (n-1) in the denominator: \(s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n-1}}\)
     Thickness (mm)    \(x_i - \bar{X}\)          \((x_i - \bar{X})^2\)
     83                (83-81.3) = 1.7            2.89
     90                (90-81.3) = 8.7            75.69
     70                (70-81.3) = -11.3          127.69
     90                (90-81.3) = 8.7            75.69
     90                (90-81.3) = 8.7            75.69
     60                (60-81.3) = -21.3          453.69
     70                (70-81.3) = -11.3          127.69
     70                (70-81.3) = -11.3          127.69
     90                (90-81.3) = 8.7            75.69
     100               (100-81.3) = 18.7          349.69
     Mean = 81.3                                  Sum = 1492.1
     \(s = \sqrt{1492.1/9} = 12.88\) mm
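
The same calculation can be written out step by step in Python. This is a minimal sketch that follows the slide's (n-1) rule for small samples; the variable names are illustrative and not from the original.

```python
import math

# Metal sheet thickness observations (mm) from the worked example.
thickness = [83, 90, 70, 90, 90, 60, 70, 70, 90, 100]

n = len(thickness)
mean = sum(thickness) / n                          # 81.3 mm
squared_devs = [(x - mean) ** 2 for x in thickness]
sum_sq = sum(squared_devs)                         # sum of squared deviations
sample_variance = sum_sq / (n - 1)                 # n < 30, so divide by (n-1)
sample_sd = math.sqrt(sample_variance)

print(round(sum_sq, 1), round(sample_variance, 2), round(sample_sd, 2))
# 1492.1 165.79 12.88
```
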
  • 31. What are ‘Dispersion’ or Variability measures? Range, mean deviation, standard deviation, variance. Variance: For a Population: \(\sigma^2 = \frac{\sum_{i=1}^{N}(x_i - \mu)^2}{N}\). For a Sample: \(s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n}\). For n < 30, we use (n-1) in the denominator.
  • 32. Properties of variance: • The variance represents the most commonly used statistic to indicate variability • It is easy to understand: the higher the variance, the higher the variability • Unlike the range, the variance takes into account all of the observation values, so it does not depend on the two extreme values alone. Because the deviations are squared, however, an outlier can still inflate it noticeably • Variance values cannot be subtracted to determine variability; they can only be added. If X and Y are independent and U = X ± Y, then Var(U) = Var(X) + Var(Y). This is the principle of analysis of variance (Chapter 11)
  • 33. What are ‘Dispersion’ or Variability measures? Variance
     Example: Calculate the variance of the following ten observations of metal sheet thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
     \(\bar{X}\) = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 mm
     \(s^2 = \frac{\sum_{i=1}^{n}(x_i - \bar{X})^2}{n}\). For n < 30, we use (n-1) in the denominator.
     Thickness (mm)    \(x_i - \bar{X}\)          \((x_i - \bar{X})^2\)
     83                (83-81.3) = 1.7            2.89
     90                (90-81.3) = 8.7            75.69
     70                (70-81.3) = -11.3          127.69
     90                (90-81.3) = 8.7            75.69
     90                (90-81.3) = 8.7            75.69
     60                (60-81.3) = -21.3          453.69
     70                (70-81.3) = -11.3          127.69
     70                (70-81.3) = -11.3          127.69
     90                (90-81.3) = 8.7            75.69
     100               (100-81.3) = 18.7          349.69
     Mean = 81.3                                  Sum = 1492.1
     \(s^2 = 1492.1/9 = 165.79\) mm²
  • 34. Working Problem: Calculate the minimum, maximum, range, standard deviation, and variance for the following data set of minimum wage ($): 7, 8, 6, 6, 8, 5, 6, 5, 8, 8. Answer: Minimum = $5, Maximum = $8, Range = $3, Standard deviation = $1.252, Variance = 1.567
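
The stated answers can be reproduced with Python's standard `statistics` module, whose `stdev` and `variance` functions use the (n-1) denominator. This is a minimal sketch for checking the working problem; the variable names are illustrative.

```python
import statistics

# Minimum wage observations ($) from the working problem.
wages = [7, 8, 6, 6, 8, 5, 6, 5, 8, 8]

minimum = min(wages)
maximum = max(wages)
value_range = maximum - minimum
sample_sd = statistics.stdev(wages)         # sample standard deviation (n-1)
sample_var = statistics.variance(wages)     # sample variance (n-1)

print(minimum, maximum, value_range, round(sample_sd, 3), round(sample_var, 3))
# 5 8 3 1.252 1.567
```
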
  • 35. 35 Example: Suppose X and Y are independent random variables. The variance of X is equal to 16; and the variance of Y is equal to 9. Let U = X - Y. What is the standard deviation of U? •2.65 ………. •5.00 ………. •7.00 ………. •25.0 ………. •None of the above ……….
  • 36. Working Problem 2.9: Question (1): Calculate the minimum, the maximum, the range, the mean deviation, the standard deviation, and the variance for the following data set of minimum wage ($): 7, 8, 6, 6, 8, 5, 6, 5, 8, 8. Question (2): In two consecutive exams, the mean grade of the first test was 80 and the mean grade of the second test was 90. The standard deviation of the grades of the first test was 6 and the standard deviation of the grades of the second test was 8. Calculate the mean of the two tests and the variance of the two tests.
  • 37. 37
  • 38. What are Combined Descriptive Measures? Coefficient of Variation (C.V%): \(C.V\% = \frac{s}{\bar{X}} \times 100\)
     Example: Calculate the Coefficient of Variation of the following ten observations of metal sheet thickness (mm): 83, 90, 70, 90, 90, 60, 70, 70, 90, 100
     \(\bar{X}\) = (83 + 90 + 70 + 90 + 90 + 60 + 70 + 70 + 90 + 100) / 10 = 81.3 mm
     Thickness (mm)    \(x_i - \bar{X}\)          \((x_i - \bar{X})^2\)
     83                (83-81.3) = 1.7            2.89
     90                (90-81.3) = 8.7            75.69
     70                (70-81.3) = -11.3          127.69
     90                (90-81.3) = 8.7            75.69
     90                (90-81.3) = 8.7            75.69
     60                (60-81.3) = -21.3          453.69
     70                (70-81.3) = -11.3          127.69
     70                (70-81.3) = -11.3          127.69
     90                (90-81.3) = 8.7            75.69
     100               (100-81.3) = 18.7          349.69
     Mean = 81.3                                  Sum = 1492.1
     \(s = \sqrt{1492.1/9} = 12.88\) mm
     \(C.V\% = \frac{s}{\bar{X}} \times 100 = \frac{12.88}{81.3} \times 100 = 15.84\%\)
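
The coefficient of variation is a one-line calculation once the mean and the sample standard deviation are available. This is a minimal sketch using Python's `statistics` module; `coefficient_of_variation` is an illustrative helper name, not a library function.

```python
import statistics

# Metal sheet thickness observations (mm) from the worked example.
thickness = [83, 90, 70, 90, 90, 60, 70, 70, 90, 100]


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100


cv = coefficient_of_variation(thickness)
print(f"C.V% = {cv:.2f}")  # C.V% = 15.84
```

Because the result is a percentage with no units, it can be used to compare the variability of data sets measured on different scales.
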
  • 39. 39 Working Problem 2.11: Calculate the Coefficient of Variation (CV%) for the following data set of minimum wage ($): 7, 8, 6, 6, 8, 5, 6, 5, 8, 8
  • 40. What are Combined Descriptive Measures? Standardized Variable (the z Score). A standardized variable is a measure of the deviation from the mean by an individual value in units of the standard deviation: \(z = \frac{x - \mu}{\sigma}\)
     Example: An instructor who has been teaching statistics for twenty years has observed that the average grade of students is 88% and the standard deviation is 3%. After teaching the course for two classes, one in the fall semester and one in the spring semester of 2008, the instructor found that the average grades were as follows:
     Term           Mean Grade
     Fall 2008      82%
     Spring 2008    91%
     How do these two semesters compare to the instructor’s average over the last twenty years?
  • 41. Standardized Variable (the z Score). Example (continued): for the instructor whose long-run average grade is 88% with a standard deviation of 3%, the standardized variable (z-score) is calculated for each semester as follows:
     Term           Mean Grade    z-Score
     Fall 2008      82%           z82 = (82-88)/3 = -2
     Spring 2008    91%           z91 = (91-88)/3 = 1
     From the above scores, you can conclude that the class’s grade of 82% in Fall 2008 was 2 standard deviations below the teacher’s mean grade, while the class’s grade of 91% in Spring 2008 was 1 standard deviation above the teacher’s mean grade.
  • 42. Example: The mean driving time of people living in Union City, near Atlanta, Georgia, to CNN Center in downtown Atlanta is 40 minutes, with a standard deviation of 10 minutes. You asked four CNN employees who live in Union City about their driving time to CNN Center, and you get the following answers: 38 minutes, 52 minutes, 58 minutes, and 40 minutes. Find the z-score that corresponds to each driving time. Interpret the difference in z-scores.
     \(z = \frac{t - \bar{t}}{\sigma_t}\), where \(t\) is the actual driving time, \(\bar{t}\) is the mean driving time, and \(\sigma_t\) is the standard deviation of driving time.
     At t = 38 minutes, z = (38-40)/10 = -0.2
     At t = 52 minutes, z = (52-40)/10 = 1.2
     At t = 58 minutes, z = (58-40)/10 = 1.8
     At t = 40 minutes, z = (40-40)/10 = 0
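
The z-score for each driving time can be computed in a short loop. This is a minimal sketch using the mean and standard deviation given in the example; the helper name `z_score` is illustrative.

```python
# Mean and standard deviation of the driving time (minutes) from the example.
mean_time = 40.0
sd_time = 10.0
driving_times = [38, 52, 58, 40]


def z_score(value, mean, sd):
    """Deviation of a value from the mean in units of the standard deviation."""
    return (value - mean) / sd


z_scores = [z_score(t, mean_time, sd_time) for t in driving_times]
for t, z in zip(driving_times, z_scores):
    side = "below" if z < 0 else "above" if z > 0 else "equal to"
    print(f"t = {t} min: z = {z:.1f} ({side} the mean)")
```

A negative z-score marks a driving time shorter than the mean, a positive one a longer time, and zero a time equal to the mean.
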
  • 43. 43 Working Problem 2.12: The average scoring points per game (PTG) up to week 10 in the 2010 NFL football season was 22 points and the standard deviation was 4 points. Using the z-score, compare the following 3 teams and determine which team had a relatively better scoring season: San Francisco 16 PTG, New England 29 PTG, Pittsburgh 24 PTG
  • 44. 44 Working Problem 2.13: The annual salaries of engineers in the U.S. automobile industry are normally distributed with a mean of $100,000 and a standard deviation of $10,000. What is the z-score for the income x of an auto-engineer who earns $85,000 annually? And what is the z-score for an auto-engineer who earns $105,000 annually?
  • 45. Working Problem 2.14: The annual salaries of U.S. state governors are normally distributed with a mean of $135,450 and a standard deviation of $36,530. In 2007, the Arkansas governor (Mike Beebe) made an annual salary of $85,000, and the California governor (Arnold Schwarzenegger) made $206,000. Compare the annual salaries of these two governors using the z-score.
  • 46. The Use of Computer for Performing Descriptive Statistics. Powerful tools are available to perform statistical analyses; the focus should therefore be on: • Planning for sample and data selection in view of the study or application objectives • Gathering and organizing data in such a way that serves the purpose of the application • Selecting the appropriate type of analysis • Organizing the analysis output • Interpreting the analysis outcome • Making a report addressing the case or application in question
  • 47. The Use of Computer for Performing Descriptive Statistics. Example: Data on Annual Tuition and Financial Aid by Different U.S. State Colleges (http://www.ordoludus.com/costs.php, 2006)
     School                               In-State Tuition ($)   Out-of-State Tuition ($)   Total Cost ($)   Fin. Aid ($)
     Georgia Institute of Technology      4,648                  18,990                     25,792           8,222
     University of Tennessee              5,290                  16,060                     21,270           6,954
     University of Mississippi            4,320                  9,744                      14,442           7,532
     University of Kentucky               5,812                  12,798                     18,027           7,861
     Louisiana State University           4,515                  12,815                     19,145           8,006
     University of Florida                3,094                  16,579                     22,839           10,566
     University of Virginia               7,133                  23,877                     30,266           13,449
     University of South Carolina         7,314                  18,956                     25,039           9,501
     University of North Carolina         4,515                  18,313                     24,903           9,687
     University of Georgia                4,628                  16,848                     23,224           7,320
     University of Alabama                4,864                  13,516                     18,540           7,980
     University of California (UCLA)      6,504                  24,324                     36,252           13,462
     North Dakota State University        5,264                  12,545                     17,675           5,487
     Florida State University             3,208                  16,340                     23,118           8,269
  • 48. Analysis of Descriptive Statistics: Steps 1 and 2 [screenshot]
  • 49. Analysis of Descriptive Statistics: Steps 3 and 4 [screenshot]
  • 50. Analysis of Descriptive Statistics: Steps 5 and 6 [screenshot; callouts: the minimum of the largest 4 observations, the maximum of the smallest 4 observations]
  • 51. Analysis of Descriptive Statistics: Output [screenshot]
  • 52. The most critical aspect of statistics is to learn how to interpret the results. This is not your typical Math course where all you have to do is find answers. The true answer is not the outputs; it is the interpretation of the outputs.
     Outputs of descriptive statistics for tuition, cost, and financial aid
     Statistic             In-State Tuition ($)   Out-State Tuition ($)   Total Cost ($)   Financial Aid ($)
     Mean                  5079                   16550                   22895            8878
     Median                4756                   16460                   22979            8114
     Mode                  4515                   None                    None             None
     Standard Deviation    1269.44                4196.51                 5602.14          2297.84
     Sample Variance       1611486.64             17610692.25             31383983.67      5280083.14
     Range                 4220                   14580                   21810            7975
     Minimum               3094                   9744                    14442            5487
     Maximum               7314                   24324                   36252            13462
     Count                 14                     14                      14               14
     Largest(4)            5812                   18956                   25039            9687
     Smallest(4)           4515                   12815                   18540            7532
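
For readers who do not use Excel, the In-State Tuition column of the output can be reproduced with Python's standard `statistics` module. This is a minimal sketch, not the method used in the slides; the function `describe` and its dictionary keys are illustrative names.

```python
import statistics

# In-state tuition ($) for the 14 schools in the example data set.
in_state_tuition = [4648, 5290, 4320, 5812, 4515, 3094, 7133,
                    7314, 4515, 4628, 4864, 6504, 5264, 3208]


def describe(values, k=4):
    """Summary statistics similar to a descriptive statistics report."""
    ordered = sorted(values)
    return {
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        # Note: in Python 3.8+ statistics.mode returns the first value when
        # nothing repeats, so a "None" mode must be checked separately.
        "mode": statistics.mode(values),
        "stdev": statistics.stdev(values),        # sample (n-1) version
        "variance": statistics.variance(values),  # sample (n-1) version
        "range": ordered[-1] - ordered[0],
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "count": len(values),
        "kth_largest": ordered[-k],    # minimum of the k largest values
        "kth_smallest": ordered[k - 1],  # maximum of the k smallest values
    }


summary = describe(in_state_tuition)
for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

The same function can be applied to the other three columns; the interpretation of the numbers remains the real task.
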
  • 54. 54
  • 55. APPENDIX 2.A: Steps to Add Data Analysis to Excel 2007
  • 58. Data Analysis Add-In: Steps 6 and 7 [screenshot]

Editor's Notes

  1. See Chapter 1 in the text for an overview of these issues
  2. Univariate data set: data of one variable (Example: people weight) Bivariate data set: data of two variables (Example: years at work, annual salary) Multivariate data set: data of more than two variables (Example: name, age, income, position, and gender) Note in the top table, the different statistical tools that can be used for each type of data set. These will be discussed throughout this course.
  3. A data set may also be divided on the basis of the sampling approach as follows:   1. Cross sectional data set 2. Time-series data set In a cross-sectional data set, all samples are collected at more or less the same point in time. In a time-series data set, samples are collected at specific points over time (e.g., weekly, monthly or quarterly). The Figure here illustrates these two types of sampling approach, and the common analyses used for the data collected in each type. In practice, cross-sectional samples are normally large in size as they are typically used to take a snap shot at a population that is assumed to be more or less stable. An example of cross-sectional data is the data of the prices of houses in a certain area collected in a given year. On the other hand, time-series samples are used for populations that encounter dynamic changes, periodic or seasonal. For example, the information on the quarterly revenue of a company is a time-series data set.
  4. This is an Optional Working Problem: This working problem aims at encouraging students to read about the origin and the meaning of data set before they analyze it. Students will be required to search the web for this information then use critical thinking to address why some states have a wide income tax range.
  5. These key points will become clear as we cover descriptive statistics and move to inferential statistics
  6. Note that X-Bar and mu are universal symbols…X-bar will always be used to describe a sample mean (or a statistic) and mu will always be used to describe a population mean (or a parameter)
  7. Comparing the arithmetic means of the two years shows that the average price in September 2009 was about $1.139 lower than in September 2008. This is obviously good news to everyone, particularly those who drive big SUVs and trucks. The results, however, do not tell us the cause of this drop. In searching for the cause of this significant drop, one may consult other sources that may partially or fully explain the trend. For example, a barrel of crude oil in this period in 2008 was about $105. In 2009, the price went down significantly and in September of 2009, it was only $69. Again, this may only represent a partial cause of the drop in gas price and one must also entertain other possible causes.
  8. WP 2.4: Mean = $6.7 WP2.5: $7580 $2,350 $1.26
  9. WP 2.6: Median = $6.5, Mode = $8. WP 2.7: (a) Median = $6500, Mode = $6000 (b) Median = $2.552, Mode = None (c) Median = $1.262, Mode = $1.265
  10. Geometric Mean: Commonly used when the variable under consideration is likely to change over time or periodically. It is used to find the average change over the entire period under study. Typical situations in which the geometric mean is useful include: population growth, quarterly or annual return on investment, and inflation. Note in the above Example: The value of 1.2 reflects a 20% return on investment (the original investment of 1.0 plus the return of 0.2); the value of 0.6 reflects a loss of 40% (the original investment of 1.0 less the loss of 0.4); and the value of 2 reflects a gain of 100% (the original investment of 1.0 plus the return of 1). In business calculations, the total return of each period is typically reinvested in the next period; that is, it becomes the base for the next period. This makes the base for, say, the second year 1.2, the base for the third year (1.2)(1.15), and that for the fourth year (1.2)(1.15)(0.6), and so forth.
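
The compounding described in this note can be sketched in Python. The four growth factors below (1.2, 1.15, 0.6, 2) are taken from the note itself; the function name `geometric_mean` is an illustrative choice, and `math.prod` requires Python 3.8 or later.

```python
import math

# Yearly growth factors quoted in the note (1.0 = no change).
growth_factors = [1.2, 1.15, 0.6, 2.0]


def geometric_mean(factors):
    """n-th root of the product of n growth factors."""
    return math.prod(factors) ** (1 / len(factors))


gm = geometric_mean(growth_factors)
am = sum(growth_factors) / len(growth_factors)

# Compounding the geometric mean over four years reproduces the true
# cumulative growth; compounding the arithmetic mean overstates it.
cumulative = math.prod(growth_factors)
print(f"geometric mean:  {gm:.4f}")
print(f"arithmetic mean: {am:.4f}")
print(f"cumulative growth: {cumulative:.3f}, gm^4 = {gm ** 4:.3f}, am^4 = {am ** 4:.3f}")
```
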
  11. Note that if we calculated the classic arithmetic mean for the annual inflations [(6+8+2+4+5)/5 = 5%], we would have a higher value than the geometric mean and this will overstate the true rate of inflation.
  12. WP 2.8: (a) 1.1491 (b) 1.15095 (c) Greater
  13. The advantage of using the range is its simplicity. The disadvantage is that it relies on two observations, the minimum value and the maximum value. If one of these two observations is an ‘outlier’, or a value that is inconsistent with the family of data under consideration, the range will be misleading. In addition, using the range may be associated with a loss of data resolution as will be seen shortly.
  14. For n < 30, we use (n-1) in the denominator to obtain a better estimate of the standard deviation of the population from which the sample data is taken.
  15. The correct answer is B. The solution requires us to recognize that variable U is a combination of two independent random variables. As such, the variance of U is equal to the variance of X plus the variance of Y. Var(U) = Var(X - Y) = Var(X) + Var(Y) = 16 + 9 = 25. The standard deviation of U is equal to the square root of the variance. Therefore, the standard deviation is equal to the square root of 25, or 5.
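
The arithmetic in this note can be written as a short Python sketch. It assumes, as the problem states, that X and Y are independent; the function name `sd_of_sum_or_difference` is illustrative.

```python
import math


def sd_of_sum_or_difference(var_x, var_y):
    """Standard deviation of X + Y or X - Y for independent X and Y.

    Variances add in both cases; standard deviations do not.
    """
    return math.sqrt(var_x + var_y)


sd_u = sd_of_sum_or_difference(16, 9)
print(sd_u)  # 5.0

# A common mistake is to add or subtract the standard deviations directly:
wrong_sum = math.sqrt(16) + math.sqrt(9)         # 7.0, option C
wrong_difference = math.sqrt(16 - 9)             # about 2.65, option A
```
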
  16. Q(1) Minimum = $5 Maximum =$8 Range = $3 Mean Deviation = $1.1 Standard deviation = $1.252 Variance = 1.567 Q(2): Mean = (80+90)/2 = 85 Variance (Exam 1 + Exam 2) = Var (Exam 1) + Var (Exam 2) = 36 + 64 = 100
  17. 4800, 12000, 2,538.07, 6,441,777.78 1.787, 2.66, 0.33121, 0.10970 1.23, 1.302, 0.02283, 0.00052
  18. C.V% = 18.69
  19. From the z-scores, you can conclude that a driving time of 38 minutes is 0.2 standard deviations below the mean; a driving time of 52 minutes is 1.2 standard deviations above the mean; a driving time of 58 minutes is 1.8 standard deviations above the mean; and a driving time of 40 minutes is zero standard deviations from the mean, that is, equal to the mean.
  20. San Francisco: z = (16-22)/4 = -1.5. New England: z = (29-22)/4 = 1.75. Pittsburgh: z = (24-22)/4 = 0.5. San Francisco is 1.5 standard deviations below the mean; New England is 1.75 standard deviations above the mean; and Pittsburgh is 0.5 standard deviations above the mean. Up to the point at which these statistics were recorded, New England had the best scoring record of the three teams, followed by Pittsburgh.
  21. For x = $85,000, z = (x-m)/s = (85,000 - 100,000)/10,000 = -1.5. For x = $105,000, z = (x-m)/s = (105,000 - 100,000)/10,000 = 0.5. The z of -1.5 indicates that an annual salary of $85,000 is one and a half standard deviations below the mean, and the z of 0.5 indicates that an annual salary of $105,000 is half a standard deviation above the mean.
  22. For the Arkansas governor: x = $85,000, z = (x-m)/s = (85,000 - 135,450)/36,530 = -1.381. For the California governor: x = $206,000, z = (x-m)/s = (206,000 - 135,450)/36,530 = 1.931. The z of -1.381 indicates that the Arkansas governor's annual salary of $85,000 was 1.381 standard deviations below the governors' mean salary, and the z of 1.931 indicates that the California governor's annual salary of $206,000 was 1.931 standard deviations above the governors' mean salary.
  23. Performing descriptive statistics using Excel®: Using the file in which the data in question is presented we take the following steps: Step 1: Click on Data Step 2: Click on Data Analysis
  24. Steps 3 and 4: Select ‘Descriptive Statistics’ and Click ‘OK’ . This will display the ‘Descriptive Statistics’ Menu shown in next slide
  25. Step 5: Select the Input Range, which is the column of data to be analyzed. As shown here, the input range is the data of In-State Tuition, or cells G2:G16. Since you have a label in cell G2, check the ‘Labels in first row’ box. Check ‘Summary statistics’. You may also check ‘Kth Largest’ and ‘Kth Smallest’ and specify the number of largest and smallest observations that you would like to set as thresholds for your analysis. With k = 4, this will give you the minimum value of the largest four observations and the maximum value of the smallest four observations. Step 6: Click ‘OK’ to obtain the final output, which will appear in a different Excel® sheet. You can also make the output appear in the same sheet by specifying a cell in the output range window.
  26. The output of the analysis of the previous example is shown here for all the variables in the data set, obtained by following the steps described above and changing the input range in Excel® ‘Descriptive Statistics’ to analyze the variable in question. Work with your students to interpret the results (see below).
     Description and interpretation of results: The steps described above take only a few seconds to perform. Therefore, the true effort should consist of reading the outputs, describing what we understood, and attempting to interpret the results. The following points reflect these key aspects of analysis and they are listed here as guidelines to students, since different people may have different readings and different interpretations of the analysis output.
     The average in-state tuition per year of the schools under study is $5,079. However, some schools can be as low as $3,094 (University of Florida) and some can be as high as $7,314 (University of South Carolina). The median of in-state tuition is $4,756. The mode of in-state tuition is $4,515; that is the only value charged by more than one school in this set.
     The average out-of-state tuition of the schools under study is $16,550. However, some schools can be as low as $9,744 (University of Mississippi) and some can be as high as $24,324 (University of California). The median of out-of-state tuition is $16,460. Out-of-state tuition has no mode, as no value occurs more than once.
     The average total cost of the schools under study is $22,895. However, some schools can be as low as $14,442 (University of Mississippi) and some can be as high as $36,252 (University of California). The median of total cost is $22,979. Total cost has no mode, as no value occurs more than once.
     The average financial aid of the schools under study is $8,878. However, some schools can be as low as $5,487 (North Dakota State University) and some can be as high as $13,462 (University of California). The median of financial aid is $8,114. Financial aid has no mode, as no value occurs more than once.
     You may also set thresholds for a certain variable of interest, say total cost, by specifying the maximum of the smallest four universities and the minimum of the largest four universities. For total cost, the maximum of the smallest four universities is $18,540 (University of Alabama), and the minimum of the largest four universities is $25,039 (University of South Carolina).
     As you can see from the above points, the guideline to describe and interpret analysis outputs will depend on the key questions that one wishes to address in the analysis application. Examples of these questions are as follows:
     Q1: Which school will be associated with the lowest total cost of education? University of Mississippi
     Q2: Which school will be associated with the largest financial aid for education? University of California
     Q3: Which school(s) will have the lowest total cost and the highest financial aid? This question may not be easy to answer if there is a positive association between total cost and financial aid (i.e. the higher the cost, the higher the financial aid). Unfortunately, this is normally the case for most schools, as shown in the next slide. In this case, you may have to make a compromise choice such as University of Kentucky, University of Mississippi, or University of Alabama.