Assignment Week 1.doc
Due by 11pm June 30th
Chapter 1
Overview of Statistics
Chapter 2
Data Collection
Chapter 3
Describing Data Visually
Upload the completed assignment using the file name
format Lastname_Firstname_Week1.doc.
Assignment
(32 points due by 11 pm June 30th)
Note: You can team up with one of your classmates to complete
the assignment (not more than two in a team); if you want to
work on the assignment individually, that’s also fine. If you are
working in teams, then only one submission is required per
team; include both the team members’ last names as part of the
assignment submission file name as well as in the assignment
submission document.
Please provide detailed solutions to the following
problems/exercises (4 problems/exercises x 8 points each):
1) What type of data (categorical, discrete numerical, or
continuous numerical) is each of the following variables?
a) Length of a TV commercial.
b) Number of peanuts in a can of Planter’s Mixed Nuts.
c) Occupation of a mortgage applicant.
d) Flight time from London Heathrow to Chicago O’Hare.
2) Which measurement level (nominal, ordinal, interval, ratio)
is each of the following variables? Explain.
a) Number of employees in the Walmart store in Hutchinson,
Kansas.
b) Number of merchandise returns on a randomly chosen
Monday at a Walmart store.
c) Temperature (in Fahrenheit) in the ice-cream freezer at a
Walmart store.
d) Name of the cashier at register 3 in a Walmart store.
e) Manager’s rating of the cashier at register 3 in a Walmart
store.
f) Social security number of the cashier at register 3 in a
Walmart store.
3) The results of a survey that collected the current credit card
balances for 36 undergraduate college students are given in the
file “College Credit Card.”
a) Using the 2^k > n rule, construct a frequency distribution for
these data.
b) Using the results from a), calculate the relative frequencies
for each class.
c) Using the results from a), calculate the cumulative relative
frequencies for each class.
d) Construct a histogram for these data.
4) The cost of manufacturing vehicles in Mexico is very
attractive to automakers. Global carmakers build approximately
1.9 million vehicles in Mexico. Of these, nearly 76% are
exported, primarily to the US. Although General Motors is the
largest manufacturer in Mexico, Daimler Chrysler exports the
most vehicles. Automotive analysts examine both the number of
vehicles produced and the number exported (see the data file
“Automotive”) to determine the potential market share of each
company.
a) For the data on vehicles produced in Mexico, construct a bar
chart displaying the amount produced by each company.
b) Repeat part a) using a pie-chart.
c) Construct a bar chart displaying the number of vehicles
exported from Mexico.
d) Repeat part c) using a pie-chart.
e) Do you prefer the bar charts or the pie charts for displaying
the data? Explain.
f) What differences do the charts reveal for the automotive
companies with respect to the number of vehicles produced and
number of vehicles exported?
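The 2^k > n rule in problem 3 picks the smallest number of classes k for which 2^k exceeds the sample size n. The following is a minimal Python sketch of that rule and of the relative and cumulative relative frequencies that follow from it. It assumes equal-width classes and uses a short made-up list of balances, not the values in the “College Credit Card” file; the helper names num_classes and frequency_table are illustrative only.

```python
import math

def num_classes(n):
    """Return the smallest k for which 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k

def frequency_table(values, k):
    """Count values in k equal-width classes starting at the minimum."""
    low = min(values)
    # Round the class width up so that k classes cover the whole range.
    width = max(1, math.ceil((max(values) - low) / k))
    counts = [0] * k
    for v in values:
        # Clamp so a maximum that lands on the last upper limit
        # is still counted in the final class.
        index = min(int((v - low) // width), k - 1)
        counts[index] += 1
    return low, width, counts

# Hypothetical balances in dollars, for illustration only.
balances = [120, 450, 980, 300, 760, 1500, 220, 640, 890, 1100, 50, 1340]
n = len(balances)
k = num_classes(n)
low, width, counts = frequency_table(balances, k)

# Relative frequency = class count / n; cumulative = running total.
relative = [c / n for c in counts]
cumulative = []
running = 0.0
for r in relative:
    running += r
    cumulative.append(running)

for i, (c, r, cr) in enumerate(zip(counts, relative, cumulative)):
    lower = low + i * width
    print(f"{lower} to under {lower + width}: {c}  {r:.3f}  {cr:.3f}")
```

For n = 36, as in problem 3, num_classes(36) returns 6, because 2^5 = 32 does not exceed 36 but 2^6 = 64 does.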
Automotive.xls
Sheet1
Manufacturer        Vehicles Produced   Vehicles Exported
General Motors      444,670             324,651
Volkswagen          425,703             338,825
DaimlerChrysler     404,637             375,002
Nissan              313,496             153,071
Ford Motor          280,585             234,994
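As a quick consistency check, the figures in this sheet can be compared with the claims in problem 4 (about 1.9 million vehicles built, nearly 76% exported, General Motors the largest producer, DaimlerChrysler the largest exporter). The sketch below is one way to do that in Python; the dictionary layout and variable names are illustrative choices, and only the numbers come from the sheet.

```python
# (vehicles produced, vehicles exported) copied from the Automotive sheet.
vehicles = {
    "General Motors": (444_670, 324_651),
    "Volkswagen": (425_703, 338_825),
    "DaimlerChrysler": (404_637, 375_002),
    "Nissan": (313_496, 153_071),
    "Ford Motor": (280_585, 234_994),
}

total_produced = sum(produced for produced, _ in vehicles.values())
total_exported = sum(exported for _, exported in vehicles.values())
export_share = total_exported / total_produced

largest_producer = max(vehicles, key=lambda name: vehicles[name][0])
largest_exporter = max(vehicles, key=lambda name: vehicles[name][1])

print(f"Total produced: {total_produced:,}")
print(f"Total exported: {total_exported:,}")
print(f"Share exported: {export_share:.1%}")
print(f"Largest producer: {largest_producer}")
print(f"Largest exporter: {largest_exporter}")

# Export rate per company, useful when comparing the two sets of charts.
for name, (produced, exported) in vehicles.items():
    print(f"{name}: {exported / produced:.1%} of production exported")
```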
College Credit Card.xlsx
Data2467337343414264628167592740644503260036734231178
85805539573348901455581711458467394370986572911591021
156396749935322972893751621740
Week 1 Overview.pdf
Chapter 1
Overview of Statistics
Chapter 2
Data Collection
Chapter 3
Describing Data Visually
Learning Objectives
After studying the material in Chapter 1, you should be able to:
1. Define statistics and explain some of its uses in business.
2. List reasons for a business student to study statistics.
3. State the common challenges facing business professionals
using statistics.
4. List and explain common statistical pitfalls.
After studying the material in Chapter 2, you should be able to:
1. Use basic terminology for describing data and samples.
2. Explain the distinction between numerical and categorical
data.
3. Explain the difference between time series and cross-
sectional data.
4. Recognize levels of measurement in data and ways of coding
data.
5. Recognize a Likert scale and know how to use it.
6. Use the correct terminology for samples and populations.
7. Explain the common sampling methods and how to implement
them.
8. Find everyday print or electronic data sources.
9. Describe basic elements of survey design, survey types, and
sources of error.
After studying the material in Chapter 3, you should be able to:
1. Make a stem-and-leaf or dot plot by hand or by using
software.
2. Create a frequency distribution for a dataset.
3. Make a histogram with appropriate bins.
4. Identify skewness, modal classes, and outliers in a histogram.
5. Make an effective line chart using Excel.
6. Know the rules for effective bar charts and pie charts.
7. Make and interpret a scatter plot using Excel.
8. Make simple tables and pivot tables.
9. Recognize deceptive graphing techniques.
Suggested Study Outline
1. First, briefly go through chapters 1 to 3 in the textbook to
familiarize yourself with the material.
2. Then, skim through the PowerPoint slides, which highlight
key chapter material, and the lecture
files in the “course media player” (these provide a synopsis of
the week’s chapters).
3. Go through chapters 1 to 3 in the textbook in detail again and
take a look at the sample problems
before attempting the assignment.
Assignment
(32 points due by 11 pm June 30th)
Note: You can team up with one of your classmates to complete
the assignment (not more than two in a
team); if you want to work on the assignment individually,
that’s also fine. If you are working in teams,
then only one submission is required per team; include both the
team members’ last names as part of the
assignment submission file name as well as in the assignment
submission document.
Please provide detailed solutions to the following
problems/exercises (4 problems/exercises x 8 points
each):
1) What type of data (categorical, discrete numerical, or
continuous numerical) is each of the following
variables?
a) Length of a TV commercial.
b) Number of peanuts in a can of Planter’s Mixed Nuts.
c) Occupation of a mortgage applicant.
d) Flight time from London Heathrow to Chicago O’Hare.
2) Which measurement level (nominal, ordinal, interval, ratio)
is each of the following variables? Explain.
a) Number of employees in the Walmart store in Hutchinson,
Kansas.
b) Number of merchandise returns on a randomly chosen
Monday at a Walmart store.
c) Temperature (in Fahrenheit) in the ice-cream freezer at a
Walmart store.
d) Name of the cashier at register 3 in a Walmart store.
e) Manager’s rating of the cashier at register 3 in a Walmart
store.
f) Social security number of the cashier at register 3 in a
Walmart store.
3) The results of a survey that collected the current credit card
balances for 36 undergraduate college
students are given in the file “College Credit Card.”
a) Using the 2^k > n rule, construct a frequency distribution for
these data.
b) Using the results from a), calculate the relative frequencies
for each class.
c) Using the results from a), calculate the cumulative relative
frequencies for each class.
d) Construct a histogram for these data.
4) The cost of manufacturing vehicles in Mexico is very
attractive to automakers. Global carmakers build
approximately 1.9 million vehicles in Mexico. Of these, nearly
76% are exported, primarily to the US.
Although General Motors is the largest manufacturer in Mexico,
Daimler Chrysler exports the most
vehicles. Automotive analysts examine both the number of
vehicles produced and the number exported
(see the data file “Automotive”) to determine the potential
market share of each company.
a) For the data on vehicles produced in Mexico, construct a bar
chart displaying the amount produced by
each company.
b) Repeat part a) using a pie-chart.
c) Construct a bar chart displaying the number of vehicles
exported from Mexico.
d) Repeat part c) using a pie-chart.
e) Do you prefer the bar charts or the pie charts for displaying
the data? Explain.
f) What differences do the charts reveal for the automotive
companies with respect to the number of
vehicles produced and number of vehicles exported?
Refer to the “Assignments” section in the syllabus and the
“Course Orientation” document for more
information/instructions regarding assignment submissions.
Zipped Chapters 1 & 2 Material.zip
Chapter 1 Power Point Slides.pdf
Chapter 1
Overview of Statistics
Chapter Contents
1.1 What is Statistics?
1.2 Why Study Statistics?
1.3 Uses of Statistics
1.4 Statistical Challenges
1.5 Critical Thinking
Chapter Learning Objectives
LO1-1: Define statistics and explain some of its uses in business.
LO1-2: List reasons for a business student to study statistics.
LO1-3: State the common challenges facing business professionals
using statistics.
LO1-4: List and explain common statistical pitfalls.
1.1 What is Statistics?
LO1-1: Define statistics and explain some of its uses in business.
• Statistics is the science of collecting, organizing, analyzing,
interpreting, and presenting data.
• A statistic is a single measure (number) used to summarize
a sample data set; for example, the average height of
students in a university.
• For the height of students, a graduation gown manufacturer
may need to know the average height for the length of the
gowns, or an architect may need to know the maximum height to
design the height of the doorways of the classrooms.
1.2 Why Study Statistics?
LO1-2: List reasons for a business student to study statistics.
• Knowing statistics will make you a better consumer of other
people's data.
• You should know enough to handle everyday data problems, to
feel confident that others cannot deceive you with spurious
arguments, and to know when you've reached the limits of your
expertise.
• Statistical knowledge gives a company a competitive
advantage against organizations that cannot understand
their internal or external market data.
• Mastery of basic statistics gives an individual manager a
competitive advantage as one works one’s way through the
promotion process, or when one moves to a new employer.
• Here are some reasons to study statistics.
Communication
Understanding the language of statistics facilitates
communication and improves problem solving.
Computer Skills
The use of spreadsheets for data analysis and word processors
or presentation software for reports improves upon your
existing skills.
Information Management
Statistics helps summarize small and large amounts of data
and reveal underlying relationships.
Technical Literacy
Career opportunities are in growth industries propelled by
advanced technology. The use of statistical software increases
your technical literacy.
Process Improvement
Statistics helps firms oversee their suppliers, monitor their
internal operations, and identify problems.
1.3 Uses of Statistics
Two primary kinds of statistics:
Descriptive statistics: the collection, organization,
presentation, and summary of data.
Inferential statistics: generalizing from a sample to a
population, estimating unknown parameters, drawing
conclusions, making decisions.
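The difference between the two kinds of statistics can be made concrete with a small Python sketch. It assumes a made-up sample of ten student heights: the mean and standard deviation describe the sample itself (descriptive), while the interval is an estimate of the unknown population mean (inferential). The 95% level and the normal approximation are assumptions chosen for brevity; a t critical value would be the more careful choice for a sample this small.

```python
import math
import statistics

# Hypothetical sample of student heights in centimeters.
heights = [162, 175, 168, 181, 170, 158, 177, 173, 166, 179]

# Descriptive statistics: summarize the sample we actually have.
sample_mean = statistics.mean(heights)
sample_sd = statistics.stdev(heights)

# Inferential statistics: estimate the unknown population mean.
# Normal-approximation 95% confidence interval.
z = statistics.NormalDist().inv_cdf(0.975)
margin = z * sample_sd / math.sqrt(len(heights))
interval = (sample_mean - margin, sample_mean + margin)

print(f"Sample mean: {sample_mean:.1f} cm, sample sd: {sample_sd:.1f} cm")
print(f"Approximate 95% interval for the population mean: "
      f"{interval[0]:.1f} to {interval[1]:.1f} cm")
```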
1.3 Uses of Statistics (continued)
LO1-1
Auditing
Sample from over 12,000 invoices to estimate the proportion of
incorrectly paid invoices.
Marketing
Identify likely repeat customers for Amazon.com and suggest
co-marketing opportunities based on a database of 5 million
Internet purchases.
Health Care
Evaluate 100 incoming patients using a 42-item physical and
mental assessment questionnaire.
Quality Improvement
Initiate a triple inspection program, setting penalties for
workers who produce poor-quality output.
Purchasing
Determine the defect rate of a shipment and whether that rate
has changed significantly over time.
Medicine
Determine whether a new drug is really better than the
placebo or if the difference is due to chance.
Operations Management
Manage inventory by forecasting consumer demand.
Product Warranty
Determine the average dollar cost of engine warranty claims on
a new hybrid engine.
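The auditing example above (sampling invoices to estimate a proportion) can be sketched in Python as follows. Everything numeric here is an assumption made for illustration: a population of exactly 12,000 simulated invoices, a true error rate of 4%, and a sample of 200. In a real audit the true rate is unknown, which is why a sample is taken.

```python
import math
import random

random.seed(1)  # fixed seed so the simulation is repeatable

POPULATION_SIZE = 12_000   # assumed; the slide says "over 12,000"
TRUE_ERROR_RATE = 0.04     # hypothetical share of incorrectly paid invoices
SAMPLE_SIZE = 200          # hypothetical audit sample size

# Simulated population: True marks an incorrectly paid invoice.
invoices = [random.random() < TRUE_ERROR_RATE for _ in range(POPULATION_SIZE)]

# Simple random sample without replacement.
sample = random.sample(invoices, SAMPLE_SIZE)

# Point estimate of the proportion and its approximate standard error.
p_hat = sum(sample) / SAMPLE_SIZE
standard_error = math.sqrt(p_hat * (1 - p_hat) / SAMPLE_SIZE)

print(f"Estimated proportion incorrectly paid: {p_hat:.3f}")
print(f"Approximate standard error: {standard_error:.3f}")
```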
1.4 Statistical Challenges
LO1-3: State the common challenges facing business
professionals using statistics.
The Ideal Data Analyst
• Is technically current (e.g., software-wise).
• Communicates well.
• Is proactive.
• Has a broad outlook.
• Is flexible.
• Focuses on the main problem.
• Meets deadlines.
• Knows his/her limitations and is willing to ask for help.
• Can deal with imperfect information.
• Has professional integrity.
Imperfect Data and Practical Constraints
State any assumptions and limitations and use generally
accepted statistical tests to detect unusual data points or to
deal with missing data. You will face constraints on the type
and quality of data you can collect.
1.4 Statistical Challenges (continued)
LO1-3
Business Ethics
Some broad ethical responsibilities of business are:
• Treating customers in a fair and honest manner.
• Complying with laws that prohibit discrimination.
• Ensuring that products and services meet safety regulations.
• Standing behind warranties.
• Advertising in a factual and informative manner.
• Encouraging employees to ask questions and voice concerns
about the company’s business practices.
• Being responsible for accurately reporting information to
management.
Upholding Ethical Standards
Ethical standards for the data analyst:
• Know and follow accepted procedures.
• Maintain data integrity.
• Carry out accurate calculations.
• Report procedures faithfully.
• Protect confidential information.
• Cite sources.
• Acknowledge sources of financial support.
1.4 Statistical Challenges (continued)
LO1-3
Using Consultants
Hire consultants at the beginning of the project, when your team
lacks certain skills or when an unbiased or informed view is
needed.
Communicating with Numbers
• Numbers have meaning only when communicated in the context of
a certain situation.
• Presentation should be such that managers will quickly
understand the information they need to use in order to make
good decisions.
Skills Needed for Success in Business (Table 1.1)
For initial job success   For long-range job success   Common weaknesses
Report-writing            Managerial accounting        Communication skills
Accounting principles     Managerial economics         Writing skills
Mathematics               Managerial finance           Immaturity
Statistics                Oral communication           Unrealistic expectations
1.5 Critical Thinking
• Statistics is an essential part of critical thinking because it
allows us to test an idea against empirical evidence.
• Empirical data represent data collected through observation
and experiments.
• Statistical tools are used to compare prior ideas with
empirical data, but pitfalls do occur.
LO1-4: List and explain common statistical pitfalls.
Pitfall 1: Making Conclusions about a Large Population from a
Small Sample
Be careful about making generalizations from small samples
(e.g., a group of 10 patients who showed improvement).
Pitfall 2: Making Conclusions from Nonrandom Samples
Be careful about making generalizations from small samples and
from retrospective studies of special groups (e.g., studying
heart attack patients without defining a matched control group).
Pitfall 3: Conclusions From Rare Events
Be careful about drawing strong inferences from events that are
not surprising when looking at the entire population (e.g.,
winning the lottery).
Pitfall 4: Using Poor Survey Methods
Be careful about using poor sampling methods or vaguely
worded questions (e.g., anonymous survey or quiz).
Pitfall 5: Assuming a Causal Link Based on Observations
Be careful about drawing conclusions when no cause-and-effect
link exists (e.g., most shark attacks occur between 12 p.m. and
2 p.m.).
Pitfall 6: Generalization to Individuals from Observations
about Groups
Avoid reading too much into statistical generalizations
(e.g., men are taller than women).
Pitfall 7: Unconscious Bias
Be careful about unconsciously or subtly allowing bias to color
handling of data (e.g., heart disease in men vs. women).
Pitfall 8: Significance versus Importance
Statistically significant effects may lack practical importance
(e.g., Austrian military recruits born in the spring average
0.6 cm taller than those born in the fall).
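Pitfall 8 can be illustrated numerically. The sketch below applies a two-sample z test to the same 0.6 cm difference at two sample sizes. Only the 0.6 cm figure comes from the slide; the 6.5 cm standard deviation, the group sizes, and the function name two_sample_p_value are assumptions for illustration. The difference is the same in both cases, and only the sample size changes whether it counts as statistically significant.

```python
import math
from statistics import NormalDist

def two_sample_p_value(diff, sd, n_per_group):
    """Two-sided p-value for a difference in means.

    Uses a z test and assumes both groups share the same standard
    deviation and the same sample size.
    """
    standard_error = sd * math.sqrt(2 / n_per_group)
    z = diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))

DIFF_CM = 0.6   # difference quoted in the slide
SD_CM = 6.5     # assumed standard deviation of height

p_small = two_sample_p_value(DIFF_CM, SD_CM, 100)       # 100 recruits per group
p_large = two_sample_p_value(DIFF_CM, SD_CM, 250_000)   # 250,000 per group

print(f"n = 100 per group:     p = {p_small:.3f}")
print(f"n = 250,000 per group: p = {p_large:.3g}")
```

Whether 0.6 cm matters in practice is a separate question that the p-value does not answer.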
Chapter 2 Power Point Slides.pdf
Chapter 2
Data Collection
Chapter Contents
2.1 Definitions
2.2 Level of Measurement
2.3 Sampling Concepts
2.4 Sampling Methods
2.5 Data Sources
2.6 Surveys
Chapter Learning Objectives
LO2-1: Use basic terminology for describing data and samples.
LO2-2: Explain the distinction between numerical and
categorical data.
LO2-3: Explain the difference between time series and
cross-sectional data.
LO2-4: Recognize levels of measurement in data and ways of
coding data.
LO2-5: Recognize a Likert scale and know how to use it.
LO2-6: Use the correct terminology for samples and
populations.
LO2-7: Explain the common sampling methods and how to
implement them.
LO2-8: Find everyday print or electronic data sources.
LO2-9: Describe basic elements of survey design, survey types,
and sources of error.
C
ha
LO2LO2--11 2.1 Definitions2.1 Definitions
pter
LO2LO2--1: 1: Use basic terminology for describing data and
Use basic terminology for describing data and
samples.samples.
2
samples.samples.
Observations, Observations, Variables, Data SetsVariables,
Data Sets
•• ObservationObservation: : a single member of a collection of
items that we want to study, such as a person, firm,
or region.
V i blV i bl h t i ti f th bj t•• Variable:Variable: a characteristic
of the subject or
individual, such as an employee’s income or an
i i tinvoice amount
•• Data SetData Set: consists of all the values of all
of the variables for all of the observations we haveof the
variables for all of the observations we have
chosen to observe.
Table 2.2: Number of Variables and Typical Tasks
Data Set      Variables      Example              Typical Tasks
Univariate    One            Income               Histograms, descriptive statistics, frequency tallies
Bivariate     Two            Income, Age          Scatter plots, correlations, regression modeling
Multivariate  More than two  Income, Age, Gender  Multiple regression, data mining, econometric modeling
Data Types
LO2-2: Explain the distinction between numerical and categorical data.
(Figure 2.1)
• Note: Ambiguity is introduced when continuous data are rounded to whole numbers. Be cautious.
Time Series versus Cross-Sectional Data
LO2-3: Explain the difference between time series and cross-sectional data.
Time Series Data
• Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days).
• Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc.
• We are interested in trends and patterns over time (e.g., personal bankruptcies from 1980 to 2008).
Cross-Sectional Data
• Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances).
• We are interested in:
- variation among observations or
- relationships.
• We can combine the two data types to get pooled cross-sectional and time series data.
2.2 Level of Measurement
LO2-4: Recognize levels of measurement in data and ways of coding data.
Levels of Measurement
Level of Measurement   Characteristics                                 Example
Nominal                Categories only                                 Eye color (blue, brown, green, etc.)
Ordinal                Rank has meaning. No clear meaning to distance  Rarely, never
Interval               Distance has meaning                            Temperature (57° Celsius)
Ratio                  Meaningful zero exists                          Accounts payable ($21.7 million)
Nominal Measurement
• Nominal data merely identify a category.
• Nominal data are qualitative, attribute, categorical or classification data and can be coded numerically (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP).
• Only mathematical operations are counting (e.g., frequencies) and simple statistics.
Ordinal Measurement
• Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never).
• Distance between codes is not meaningful (e.g., distance between 1 and 2, or between 2 and 3, or between 3 and 4 lacks meaning).
• Many useful statistical tests exist for ordinal data. Especially useful in social science, marketing and human resource research.
Interval Measurement
• Data can not only be ranked, but also have meaningful intervals between scale points (e.g., difference between
• Since intervals between numbers represent distances, mathematical operations can be performed (e.g., average).
• Zero point of interval scales is arbitrary, so ratios are not meaningful.
Ratio Measurement
• Ratio data have all properties of nominal, ordinal and interval data types and also possess a meaningful zero (absence of quantity being measured).
• Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million).
• Zero does not have to be observable in the data; it is an absolute reference point.
LO2-5: Recognize a Likert scale and know how to use it.
Likert Scales
• A special case of interval data frequently used in survey research.
• The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7).
Likert Scales (examples)
“College-bound high school students should be required to study a foreign language.” (check one)
Strongly Agree   Somewhat Agree   Neither Agree Nor Disagree   Somewhat Disagree   Strongly Disagree
How would you rate your marketing instructor? (check one)
Terrible   Poor   Adequate   Good   Excellent
Use the following procedure to recognize data types:
Question                                             If “Yes”
Q1. Is there a meaningful zero point?                Ratio data (statistical operations are allowed)
Q2. Are intervals between scale points meaningful?   Interval data (common statistics allowed, e.g., means and standard deviations)
Q3. Do scale points represent rankings?              Ordinal data (restricted to certain types of nonparametric statistical tests)
Q4. Are there discrete categories?                   Nominal data (only counting allowed, e.g., finding the mode)
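The four questions above can be read as a cascade: ask them in order and stop at the first “yes.” The Python sketch below is one possible reading of that procedure, not a definitive rule. The function name and its boolean arguments are hypothetical, and the example variables are taken from the Levels of Measurement table.

```python
def measurement_level(meaningful_zero, meaningful_intervals, ranked, categories):
    """Ask Q1..Q4 in order and return the level for the first 'yes' answer."""
    if meaningful_zero:          # Q1
        return "ratio"
    if meaningful_intervals:     # Q2
        return "interval"
    if ranked:                   # Q3
        return "ordinal"
    if categories:               # Q4
        return "nominal"
    return "unclassified"        # none of the four questions applied


# Answers to (Q1, Q2, Q3, Q4) for the examples in the Levels of Measurement table.
examples = {
    "accounts payable": measurement_level(True, True, True, True),
    "temperature in Celsius": measurement_level(False, True, True, True),
    "frequently/sometimes/rarely/never": measurement_level(False, False, True, True),
    "eye color": measurement_level(False, False, False, True),
}

for variable, level in examples.items():
    print(f"{variable}: {level}")
```

Checking the questions from the top down means a variable is always assigned the highest level it qualifies for.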
Changing Data By Recoding
• In order to simplify data or when exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).
• For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).
• The above recoded data are ordinal (ranking is preserved), but intervals are unequal and some information is lost.
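The blood pressure recoding can be sketched in a few lines of Python. This is only an illustration: the function name and the sample readings are hypothetical, and the cutoffs are the ones stated in the example above.

```python
def recode_systolic(reading):
    """Recode a ratio-level systolic reading into an ordinal category."""
    if reading < 130:
        return "normal"      # under 130
    if reading <= 140:
        return "elevated"    # 130 to 140
    return "high"            # over 140


# Hypothetical readings, used only to show the recoding.
readings = [118, 130, 135, 140, 162]
recoded = [recode_systolic(r) for r in readings]
print(recoded)
```

The recoding runs one way only: 118 can be turned into “normal”, but “normal” cannot be turned back into 118, which is the information loss mentioned above.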
2.3 Sampling Concepts
LO2-6: Use the correct terminology for samples and populations.
Sample or Census
• A sample involves looking only at some items selected from the population.
• A census is an examination of all items in a defined population.
• Why can’t the United States Census survey every person in the population? – mobility, undocumented workers, budget constraints, incomplete responses, etc.
Situations Where A Sample or Census May Be Preferred
Sample                  Census
Infinite population     Small population
Destructive testing     Large sample size
Timely results          Database exists
Accuracy                Legal requirements
Cost
Sensitive information
Parameters and Statistics
• Statistics are computed from a sample of n items, chosen from a population of N items.
• Statistics can be used as estimates of parameters found in the population.
• Symbols are used to represent population parameters and sample statistics.
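A small Python sketch may help separate the two ideas. The population below is made up (hypothetical travel distances in miles), and the variable names mu and x_bar are chosen only to suggest the usual symbols for a population mean and a sample mean.

```python
import random
import statistics

rng = random.Random(1)  # fixed seed so the sketch is repeatable

# Hypothetical population of N = 500 travel distances in miles (made-up values).
population = [round(rng.uniform(0.5, 30.0), 1) for _ in range(500)]

# A sample of n = 100 items chosen from the population.
sample = rng.sample(population, k=100)

mu = statistics.mean(population)   # parameter: computed from every population item
x_bar = statistics.mean(sample)    # statistic: computed from the sample only

print(f"population mean (parameter): {mu:.2f}")
print(f"sample mean (statistic):     {x_bar:.2f}")
```

In practice the population mean is usually unknown, and the sample mean is used as its estimate.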
Rule of Thumb: A population may be treated
as infinite when N is at least 20 times n
(i.e., when N/n ≥ 20).
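As a quick check of this rule of thumb, the sketch below applies it to two N and n pairs that appear elsewhere in these materials (18,000 employees with a sample of 100, and a list of 78 items with a sample of 20). The function name is hypothetical.

```python
def can_treat_as_infinite(population_size, sample_size):
    """Rule of thumb: treat the population as infinite when N/n >= 20."""
    return population_size / sample_size >= 20


print(can_treat_as_infinite(18_000, 100))  # N/n = 180
print(can_treat_as_infinite(78, 20))       # N/n = 3.9
```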
Target Population
• The population must be carefully specified and the sample must be drawn scientifically so that the sample is representative.
• The target population is the population we are interested in (e.g., U.S. gasoline prices).
• The sampling frame is the group from which we take the sample (e.g., 115,000 stations).
• The frame should not differ from the target population.
2.4 Sampling Methods
LO2-7: Explain the common sampling methods and how to implement them.
Random Sampling
Simple random sample   Use random numbers to select items from a list (e.g., VISA cardholders).
Systematic sample      Select every kth item from a list or sequence (e.g., restaurant customers).
Stratified sample      Select randomly within defined strata (e.g., by age, occupation, gender).
Cluster sample         Like stratified sampling except strata are geographical areas (e.g., zip codes).
Non-random Sampling
Judgment sample      Use expert knowledge to choose “typical” items (e.g., which employees to interview).
Convenience sample   Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).
Focus groups         In-depth dialog with a representative panel of individuals (e.g., iPod users).
With or Without Replacement
• If we allow duplicates when sampling, then we are sampling with replacement.
• Duplicates are unlikely when n is much smaller than large N.
• If we do not allow duplicates when sampling, then we are sampling without replacement.
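The difference can be illustrated with Python's standard random module: random.choices draws with replacement and random.sample draws without replacement. The list of 875 numbered items and the sample size of 10 are borrowed from the Excel example in this chapter; the seed is arbitrary.

```python
import random

rng = random.Random(42)          # arbitrary seed so the sketch is repeatable
population = list(range(1, 876)) # items numbered 1..875

# With replacement: the same item may be drawn more than once.
with_replacement = rng.choices(population, k=10)

# Without replacement: every drawn item is distinct.
without_replacement = rng.sample(population, k=10)

print(with_replacement)
print(without_replacement)
```

With n = 10 and N = 875, duplicates in the first list are possible but unlikely, which is the point made in the second bullet above.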
Computer Methods
Excel - Option A   Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.
Excel - Option B   Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.
Internet           The website www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc).
Minitab            Use Minitab’s Random Data menu with the Integer option.
These are pseudo-random generators because even the best algorithms eventually repeat themselves.
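For readers who work in Python rather than Excel, the two spreadsheet options have close analogs in the standard random module. This is a sketch of the same idea, not part of the original slide; the list names are hypothetical.

```python
import random

# Analog of Option A, =RANDBETWEEN(1,875): a random integer from 1 to 875 inclusive.
option_a = [random.randint(1, 875) for _ in range(10)]

# Analog of Option B, =INT(1+875*RAND()): scale a uniform number in [0, 1),
# shift it by 1, and truncate to an integer.
option_b = [int(1 + 875 * random.random()) for _ in range(10)]

print(option_a)
print(option_b)
```

Rerunning the script plays the role of pressing F9: each run gives a new sample. Python's generator is also pseudo-random.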
Row – Column Data Arrays
• When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column.
• For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4.
• This way, each item has an equal chance of being selected.
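A minimal Python sketch of the row and column selection follows. The 4 x 3 array of labels is hypothetical and stands in for whatever data the array holds.

```python
import random

rng = random.Random(7)  # arbitrary seed so the sketch is repeatable

# Hypothetical 4 x 3 array: 4 rows, 3 columns, 12 items in total.
array = [
    ["A1", "A2", "A3"],
    ["B1", "B2", "B3"],
    ["C1", "C2", "C3"],
    ["D1", "D2", "D3"],
]

row = rng.randint(1, 4)     # random row between 1 and 4
column = rng.randint(1, 3)  # random column between 1 and 3

# Python lists are 0-indexed, so subtract 1 from each position.
item = array[row - 1][column - 1]
print(row, column, item)
```

Each of the 12 cells is selected with probability (1/4) x (1/3) = 1/12, so every item has the same chance.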
Randomizing a List
• In Excel, use function =RAND() beside each row to create a column of random numbers between 0 and 1.
• Copy and paste these numbers into the same column using Paste Special > Values in order to paste only the values and not the formulas.
• Sort the spreadsheet on the random number column.
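The same three steps can be sketched in Python: attach a random number to each row, freeze the numbers, and sort on them. The list of names is hypothetical.

```python
import random

rng = random.Random(2024)  # arbitrary seed so the sketch is repeatable

# Hypothetical list to be randomized.
names = ["Avery", "Blake", "Casey", "Devon", "Emery", "Finley"]

# Step 1: put a random number between 0 and 1 beside each row.
# Step 2: storing the numbers in a list fixes their values, like Paste Special > Values.
keyed = [(rng.random(), name) for name in names]

# Step 3: sort on the random number column.
keyed.sort()
randomized = [name for _, name in keyed]

print(randomized)
```

Python's random.shuffle does the same job in a single call; the longer version is shown only because it mirrors the spreadsheet steps.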
Systematic Sampling
• Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.
• For example, starting at item 2, we sample every 4 items to obtain a sample of n = 20 items from a list of N = 78 items.
Note that N/n = 78/20 ≈ 4 (periodicity).
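The example can be reproduced with a list slice in Python. The function name is hypothetical, and the list simply numbers the items 1 through 78.

```python
def systematic_sample(items, k, start):
    """Take every kth item, beginning at the 1-based position `start`."""
    return items[start - 1::k]


items = list(range(1, 79))                      # N = 78 items, numbered 1..78
sample = systematic_sample(items, k=4, start=2) # start at item 2, take every 4th

print(len(sample))   # 20
print(sample[:5])    # [2, 6, 10, 14, 18]
print(sample[-1])    # 78
```

In a real application the starting item would itself be chosen at random, as the first bullet says; it is fixed at 2 here only to match the example.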
Stratified Sampling
• Utilizes prior information about the population.
• Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).
• A simple random sample of the desired size is taken within each stratum.
• For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200).
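The 110 and 90 in the example come from splitting n = 200 in proportion to the strata (55% and 45%). The sketch below shows that arithmetic and then draws a simple random sample within each stratum. The function name and the ID lists are hypothetical; the frame is made up to match the 55%/45% split.

```python
import random


def proportional_allocation(n, shares_pct):
    """Split a total sample size n across strata in proportion to their percentage shares."""
    return {stratum: round(n * pct / 100) for stratum, pct in shares_pct.items()}


allocation = proportional_allocation(200, {"male": 55, "female": 45})
print(allocation)  # {'male': 110, 'female': 90}

rng = random.Random(5)  # arbitrary seed so the sketch is repeatable

# Hypothetical sampling frame: 1,100 male IDs and 900 female IDs (a 55%/45% split).
frame = {
    "male": [f"M{i}" for i in range(1, 1101)],
    "female": [f"F{i}" for i in range(1, 901)],
}

# Simple random sample of the allocated size within each stratum.
stratified_sample = {
    stratum: rng.sample(frame[stratum], k=size) for stratum, size in allocation.items()
}
print({stratum: len(ids) for stratum, ids in stratified_sample.items()})
```

With other percentages the rounded sizes may not add up exactly to n, so the allocation should be checked before sampling.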
Cluster Sample
• Strata consist of geographical regions.
• One-stage cluster sampling – sample consists of all elements in each of k randomly chosen subregions (clusters).
• Two-stage cluster sampling – first choose k subregions (clusters), then choose a random sample of elements within each cluster.
• Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
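A two-stage cluster sample of this shape (3 clusters, 4 elements from each) can be sketched in Python as follows. The six regions and their ten units each are hypothetical; only the 3 and the 4 come from the example.

```python
import random

rng = random.Random(11)  # arbitrary seed so the sketch is repeatable

# Hypothetical population: 6 geographical clusters with 10 units each.
clusters = {
    f"region_{i}": [f"region_{i}_unit_{j}" for j in range(1, 11)]
    for i in range(1, 7)
}

# Stage 1: choose k = 3 clusters at random.
chosen_clusters = rng.sample(sorted(clusters), k=3)

# Stage 2: choose a random sample of 4 elements within each chosen cluster.
two_stage_sample = {
    name: rng.sample(clusters[name], k=4) for name in chosen_clusters
}

for name, units in two_stage_sample.items():
    print(name, units)
```

One-stage cluster sampling would stop after stage 1 and keep every unit in the chosen clusters.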
Judgment Sample
• A non-probability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.
• Can be affected by subconscious bias (i.e., non-randomness in the choice).
• Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.
Convenience Sample
• Take advantage of whatever sample is available at that moment. A quick way to sample.
Focus Groups
• A panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering.
2.5 Data Sources
LO2-8: Find everyday print or electronic data sources.
• One goal of a statistics course is to help you learn where to find data that might be needed. Fortunately, many excellent sources are widely available. Some sources are given in the following table.
2.6 Surveys
LO2-9: Describe basic elements of survey design, survey types, and sources of error.
Basic Steps of Survey Research
• Step 1: State the goals of the research.
• Step 2: Develop the budget (time, money, staff).
• Step 3: Create a research design (target population, frame, sample size).
• Step 4: Choose a survey type and method of administration.
• Step 5: Design a data collection instrument (questionnaire).
• Step 6: Pretest the survey instrument and revise as needed.
• Step 7: Administer the survey (follow up if needed).
• Step 8: Code the data and analyze it.
Survey Types          Survey Guidelines
Mail                  Planning
Telephone             Design
Interviews            Quality
Web                   Pilot test
Direct observation    Buy-in
                      Expertise
Questionnaire Design
• Use a lot of white space in layout.
• Begin with short, clear instructions.
• State the survey purpose.
• Instruct on how to submit the completed survey.
• Assure anonymity.
• Break survey into naturally occurring sections.
• Let respondents bypass sections that are not applicable (e.g., “if you answered no to question 7, skip directly to Question 15”).
• Pretest and revise as needed.
• Keep as short as possible.
Types of Questions
Open-ended
Fill-in-the-blank
Check boxes
Ranked choices
Pictograms
Likert scale
Question Wording
• The way a question is asked has a profound influence on the response. For example,
1. Shall state taxes be cut?
2. Shall state taxes be cut, if it means reducing highway maintenance?
3. Shall state taxes be cut, if it means firing teachers and police?
• Make sure you have covered all the possibilities. For example,
How old is your father?
– 45
– 55
– 65
• Overlapping classes or unclear categories are a problem. What if your father is deceased or is 45 years old?
Coding and Data Screening
• Responses are usually coded numerically (e.g., 1 = male, 2 = female).
• Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
• Discard questionnaires that are flawed or missing many responses.
• Watch for multiple responses, outrageous or inconsistent replies or out-of-range answers.
• Follow up if necessary and always document your data-coding decisions.
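A screening pass over coded responses might look like the Python sketch below. It assumes the coding scheme from the first bullet (1 = male, 2 = female) and the missing-value markers from the second bullet (blank, “.” or “*”). The function name, the status labels, and the raw entries are hypothetical.

```python
MISSING_MARKERS = {"", ".", "*"}  # blank, "." or "*" denote a missing value
VALID_CODES = {1, 2}              # 1 = male, 2 = female


def screen_response(raw):
    """Classify one raw entry as ok, missing, invalid, or out_of_range."""
    text = raw.strip()
    if text in MISSING_MARKERS:
        return ("missing", None)
    try:
        code = int(text)
    except ValueError:
        # Not a single integer, e.g. a multiple response such as "1,2".
        return ("invalid", None)
    if code not in VALID_CODES:
        return ("out_of_range", code)
    return ("ok", code)


# Hypothetical raw entries from a data file.
raw_entries = ["1", "2", " ", "*", "7", "1,2"]
screened = [screen_response(entry) for entry in raw_entries]

for entry, result in zip(raw_entries, screened):
    print(repr(entry), result)
```

Entries flagged as invalid or out of range are candidates for follow-up, and the rule used to resolve each one belongs in the documented data-coding decisions.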
Sample Problems - Chapters 1 & 2.pdf
Sample Problems
1) For the following situation indicate whether the statistical
application is primarily descriptive or
inferential.
“The manager of Anna’s Fabric Shop has collected data for 10
years on the number of each type of
dress fabric that has been sold at the store. She is interested in
making a presentation that will
illustrate these data effectively.”
This application is primarily descriptive in nature. The owner
wishes to develop a presentation. She
will most likely use charts, graphs, tables and numerical
measures to describe her data.
2) Consider the following graph that appeared in a company
annual report. What type of graph is this?
Explain.
The graph is a bar chart. A bar chart displays values associated
with categories. In this case the
categories are the departments at the food store. The values are
the total monthly sales (in dollars) in
each department. A bar chart also typically has gaps between
the bars. A histogram has no gaps and
the horizontal axis represents the possible values for a
numerical variable.
3) Consider the figures below. What differences do you see
between the histogram and the bar chart?
A bar chart is used whenever you want to display data that has
already been categorized while a
histogram is used to display data over a range of values for the
factor under consideration. Another
fundamental difference is that there typically are gaps between
the bars on a bar chart but there are
no gaps between the bars of a histogram.
4) Consider that you are working for an advertising firm.
Provide an example of how hypothesis
testing can be used to evaluate a product claim.
Businesses often make claims about their products that can be
tested using hypothesis testing. For
example, it is not enough for a pharmaceutical company to
claim that its new drug is effective in
treating a disease. In order for the drug to be approved by the
Food and Drug Administration the
company must present sufficient evidence that the drug first
does no harm and that it also provides an
effective treatment against the disease. The claims that the drug
does no harm and is an effective
treatment can be tested using hypothesis testing.
5) In what situations might a decision maker need to use
statistical inferences?
Statistical inference procedures are useful in situations where a
decision maker needs to reach an
estimate about a population based on a subset of data taken
from the population. For example, a
decision maker might want to know the starting annual salary of
all attorneys in the United States. If
it is not feasible or possible to look at the salary data for all
attorneys the decision maker could look
at a subset of attorneys and use statistical inference to reach a
conclusion about the population of all
attorneys.
6) Explain under what circumstances you would use hypothesis
testing as opposed to an estimation
procedure.
Hypothesis testing is used whenever one is interested in testing
claims that concern a population.
Using information taken from samples, hypothesis testing
evaluates the claim and makes a conclusion
about the population from which the sample was taken.
Estimation is used when we are interested in
knowing something about all the data, but the population is too
large, or the data set is too big for us
to work with all the data. In estimation, no claim is being made
or tested.
7) Discuss any advantages a graph showing a whole set of data
has over a single measure, such as an
average.
The major advantage of a graph is that it allows a more complete
representation of the information in the
data. Not only can a decision maker visualize the center of the
data but also how spread out the data
are. An average, for instance, nicely represents the center of a
data set, but contains no information about
how spread out the data are.
8) Discuss any advantages a single measure, such as an average,
has over a table showing a whole set
of data.
By its nature, a single measure is just one value and therefore
is simpler than a table. It allows an
easy method of comparison between two or more data sets,
something that is more difficult if the data
sets are represented in tabular form. In addition, although not
mentioned in this chapter, additional
statistical techniques, such as hypothesis testing and estimation,
involve calculations based on a
single measure from a subset of population data.
9) Suppose a survey is conducted using a telephone survey
method. The survey is conducted from 9
am to 11 am. on Tuesday. Indicate what potential problems the
data collectors might encounter.
There will likely be a high rate of nonresponse bias since many
people who work days will not be
home during the 9-11 AM time slot. Also, the data collectors
need to be careful where they get the
phone number list as some people do not have listed phones in
phone books and others have no phone
or only a cell phone. This may result in selection bias.
10) For each of the following situations, indicate what type of
data collection method you would
recommend and explain why:
a) Collecting data on the percentage of bike riders who wear
helmets.
b) Collecting data on the price of regular unleaded gasoline at
gas stations in your state.
c) Collecting data on customer satisfaction with the service
provided by a major US airline.
a) Observation would be the most likely method. Observers
could be located at various bike routes
and observe the number of riders with and without helmets.
This would likely be better than asking
people if they wear a helmet since the popular response might
be to say yes even when they don’t
always do so.
b) A telephone survey to gas stations in the state. This could be
a cost effective way of getting data
from across the state. The respondent would have the
information and be able to provide the correct
price.
c) A written survey of passengers. This could be given out on
the plane before the plane lands and
passengers could drop the surveys in a box as they de-plane.
This method would likely garner higher
response rates compared to sending the survey to passengers’
mailing address and asking them to
return the completed survey by mail.
11) Indicate which sampling method would most likely be used
in each of the following situations:
a) An interview conducted with mayors of a sample of cities in
Florida.
b) A poll of voters regarding a referendum calling for a national
value-added tax.
c) A survey of customers entering a shopping mall in
Minneapolis.
a) Because the population is spread over a large geographical
area, a cluster random sample could
be selected to reduce travel costs.
b) A stratified random sample would probably be used to keep
sample size as small as possible.
c) Most likely a convenience sample would be used since doing
a statistical sample would be too
difficult.
12) A company has 18,000 employees. The file containing the
names is ordered by employee number
from 1 to 18,000. If a sample of 100 employees is to be selected
from the 18,000 using a systematic
random sampling, within what range of employee numbers will
the first employee be selected from?
To determine the range of employee numbers for the first
employee selected in a systematic random
sample use the following: Part range = Population Size/Sample
Size = 18,000/100 = 180. Thus, the
first person selected will come from employees 1-180. Once
that person is randomly selected, the
second person will be the one numbered 180 higher than the
first, and so on.
13) Describe how systematic random sampling could be used to
select a random sample of 1,000
customers who have a CD at a commercial bank. Assume that
the bank has 25,000 customers who
own a CD.
From a numbered list of all customers who own a CD the bank
would need to randomly determine a
starting point between 1 and k, where k would be equal to
25000/1000 = 25. This could be done
using a random number table or by having a statistical package
or a spreadsheet generate a random
number between 1 and 25. Once this value is determined the
bank would select that numbered
customer as the first sampled customer and then select every
25th customer after that until 1,000
customers are sampled.
14) If the manager at First City Bank surveys a sample of 100
customers to determine how many
miles they live from the bank, is the mean travel distance for
this sample considered a parameter or a
statistic?
Values computed from a sample are always considered
statistics. In order for a value, such as an
average, to be considered a parameter it must be computed from
all items in the population.
15) For each of the following, indicate whether the data are
cross-sectional or time-series:
a) Quarterly employment rates
b) Unemployment rates by state
c) Monthly sales
d) Employment satisfaction data for a company.
a) Time-series
b) Cross-sectional
c) Time-series
d) Cross-sectional
16) For each of the following variables, indicate the level of
data measurement:
a) Product rating (1 = excellent, 2 = good, 3 = fair, 4 = poor, 5
= very poor)
b) Home ownership (own, rent, other)
c) College GPA
d) Marital Status (single, married, divorced, other)
a) Ordinal – categories with defined order
b) Nominal – categories with no defined order
c) Ratio
d) Nominal – categories with no defined order
17) Consumer Reports, in its ratings of cars, indicates repair
history with circles. The circles are
black, white, or half-and-half. To which level of data does this
correspond?
Since the circles involve a ranking from best to worst, this
would be ordinal data.
Chapters 1 & 2 Lecture Power Point Slides.pdf
Chapter 1 – Overview of Statistics
Chapter 2 – Data Collection
©2006 Thomson/South-Western
Areas of Business that Rely on Statistics
Yearly Reports
Basic Definitions
Descriptive Statistics: the collection and description of data
Inferential Statistics: analyzing, decision making or estimation based on the data
set of all possible measurements that is of interest
of the population from which information is gathered
Random Sample: a sample in which each item in the population has an equal chance of being selected
selection of all population items
calculated from the population
calculated from the sample
Discrete Data: data that contains only integers or counting numbers – usually the result of counting something
value over a particular range is possible – usually the result of measuring something
Level of Measurement for Numerical Data
Nominal: merely labels or assigned numbers
Ordinal: arranged in order, such as worst to best or best to worst
Interval: arranged in order, and the difference between numbers has meaning
Ratio: differs from interval data in that there is a definite zero point
Types of Data
Numerical data
Data Types:             Qualitative        Quantitative
Levels of Measurement:  Nominal, Ordinal   Interval, Ratio
                        Discrete           Discrete or continuous
Sources of Data
Primary data come from an original (primary) source and are collected with specific research questions in mind.
Secondary data represent previously recorded data collected for another purpose or as part of a regularly scheduled data collection procedure.
Data Collection
Frequently used data collection methods: Experiments, Telephone Surveys, Written Questionnaires and Surveys, Direct Observation and Personal Interviews
Issues to be aware of: Interviewer bias, Non-response bias, Selection bias, Observer bias, Measurement error, Internal validity, External validity
Random Sampling versus Nonrandom Sampling
Random Sampling ensures that the sample obtained is representative of the population
Nonrandom or nonprobability samples are generated using a deliberate selection procedure
Generating Random Numbers
Example (see Excel sheet)
... their 300 employees. Employees are numbered 1 to 300.
Use Excel 2007 to generate 10 random numbers between 1 and 300. Values must be integer numbers corresponding to employee numbers.
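The Excel worksheet is not reproduced in these notes, so the following is only a rough Python sketch of the same task. The function name `pick_employee_numbers` and the seed value are illustrative assumptions, and the sketch draws without replacement so that no employee is selected twice, which the slide does not explicitly require.

```python
import random

def pick_employee_numbers(population_size=300, sample_size=10, seed=None):
    """Return `sample_size` distinct integers between 1 and `population_size`."""
    rng = random.Random(seed)  # a seed makes the draw repeatable
    # sample() draws without replacement, so no employee number repeats
    return sorted(rng.sample(range(1, population_size + 1), sample_size))

chosen = pick_employee_numbers(seed=2007)
print(chosen)
```

Omitting the seed gives a fresh draw on every run, which is closer to how a recalculating spreadsheet behaves.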
Example 1
For the following situation, indicate whether the statistical application is primarily descriptive or inferential:
... has collected data for 10 years on the number of each type of dress fabric that has been sold in the store. She is interested in making a presentation that will illustrate these data effectively.
Example 2
For the following situations, indicate the type of data collection method to use:
Collecting data on the percentage of bike riders who wear helmets
Collecting data on the price of regular unleaded gasoline at gas stations in your state
Collecting data on customer satisfaction with the service provided by a major US airline
Example 3
According to a national CNN/USA/Gallup survey of 1025 adults, conducted March 14–16, 2008, 63% say they have experienced a hardship because of rising gasoline prices. How do you believe the survey was conducted and what type of bias could occur in the data collection process?
Example 4
Describe how systematic random sampling could be used to select a random sample of 1000 customers who have a certificate of deposit at a commercial bank. Assume that the bank has 25000 customers who own a certificate of deposit.
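One way to picture the procedure in Example 4 is the sketch below. It assumes the 25000 customers are numbered 1 to 25000 in a list, takes the sampling interval as 25000 / 1000 = 25, picks a random start among the first 25 customers, and then selects every 25th customer. The function name and the numbering scheme are assumptions made for illustration.

```python
import random

def systematic_sample(population_size, sample_size, seed=None):
    """Select every k-th item after a random start, with k = N // n."""
    k = population_size // sample_size   # sampling interval (25 here)
    rng = random.Random(seed)
    start = rng.randint(1, k)            # random start within the first interval
    return [start + i * k for i in range(sample_size)]

ids = systematic_sample(25000, 1000, seed=1)
print(ids[:5])
```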
Example 5
If the manager at First City Bank surveys a sample of 100 customers to determine how many miles they live from the bank, is the mean travel distance for this sample considered a parameter or a statistic?
Example 6
For each of the following, indicate whether the data are cross-sectional or time series:
Unemployment rates by state
Employment satisfaction data for a company
Example 7
For each of the following variables, indicate the level of data measurement:
Product rating [1 = excellent, 2 = good, 3 = fair, 4 = poor, 5 = very poor]
... rent, other]
Marital status [single, married, divorced, other]
Example 8
A maker of energy drinks is considering abandoning can containers and going exclusively to bottles because the sales manager believes customers prefer drinking from bottles. However, the VP in charge of marketing is not convinced the sales manager is correct.
Indicate the data collection method you would use
Example 8 (contd)
Indicate what procedures you would follow to apply this technique in this setting
Indicate which level of data measurement applies to the data you would collect
Are the data qualitative or quantitative?
Zipped Chapter 3 Material.zip
Chapter 3 Power Point Slides.pdf
Chapter 3
Describing Data Visually
Chapter Contents
3.1 Stem-and-Leaf Displays and Dot Plots
3.2 Frequency Distributions and Histograms
3.3 Excel Charts
3.4 Line Charts
3.5 Bar Charts
3.6 Pie Charts
3.7 Scatter Plots
3.8 Tables
3.9 Deceptive Graphs
Chapter Learning Objectives
LO3-1: Make a stem-and-leaf or dot plot by hand or by computer.
LO3-2: Create a frequency distribution for a data set.
LO3-3: Make a histogram with appropriate bins.
LO3-4: Identify skewness, modal classes, and outliers in a histogram.
LO3-5: Make an effective line chart using Excel.
LO3-6: Know the rules for effective bar charts and pie charts.
LO3-7: Make and interpret a scatter plot using Excel.
LO3-8: Make simple tables and pivot tables.
LO3-9: Recognize deceptive graphing techniques.
3.1 Stem-and-Leaf Displays and Dot Plots
• Methods of organizing, exploring and summarizing data include:
- Visual (charts and graphs): provides insight into characteristics of a data set without using mathematics.
- Numerical (statistics or tables): provides insight into characteristics of a data set using mathematics.
• Begin with univariate data (a set of n observations on one variable) and consider the following:
• Measurement
- Look at the data and visualize how they were collected and measured.
• Sorting (Example: Price/Earnings Ratios)
- Sort the data and then summarize in a graphical display. Here are the sorted P/E ratios (values from Table 3.2).
The type of graph you use to display your data is dependent on the type of data you have. Some charts are better suited for quantitative data, while others are better for displaying categorical data.
Stem-and-Leaf Plot
LO3-1: Make a stem-and-leaf or dot plot by hand or by computer.
One simple way to visualize small data sets is a stem-and-leaf plot. The stem-and-leaf plot is a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks. For two-digit or three-digit integer data, the stem is the tens digit of the data, and the leaf is the ones digit.
For the 44 P/E ratios, the stem-and-leaf plot is given below.
For example, the data values in the fourth stem are 31, 37, 37, 38. We always use equally spaced stems (even if some stems are empty). The stem-and-leaf can reveal central tendency (24 of the 44 P/E ratios were in the 10–19 stem) as well as dispersion (the range is from 7 to 59). In this illustration, the leaf digits have been sorted, although this is not necessary. The stem-and-leaf has the advantage that we can retrieve the raw data by concatenating a stem digit with each of its leaf digits. For example, the last stem has data values 50 and 59.
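The construction described above can be sketched in a few lines of Python. The data below are made-up two-digit values, not the 44 P/E ratios from Table 3.2 (which are not reproduced in these notes), and the function name `stem_and_leaf` is an assumption. The sketch keeps equally spaced stems, including empty ones, and sorts the leaves.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Return one text line per stem (tens digit) with its sorted leaves (ones digits)."""
    leaves = defaultdict(list)
    for v in sorted(values):
        leaves[v // 10].append(v % 10)
    lines = []
    # walk every stem between the smallest and largest, so empty stems still appear
    for stem in range(min(leaves), max(leaves) + 1):
        digits = "".join(str(d) for d in leaves.get(stem, []))
        lines.append(f"{stem} | {digits}".rstrip())
    return lines

# hypothetical data for illustration only
lines = stem_and_leaf([7, 8, 12, 15, 15, 19, 23, 31, 37, 37, 38, 50, 59])
print("\n".join(lines))
```

Reading a row back recovers the raw data: the stem 3 with leaves 1778 stands for 31, 37, 37, 38.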
Dot Plots
• A dot plot is the simplest graphical display of n individual values of numerical data.
- Easy to understand.
- It reveals dispersion, central tendency, and the shape of the distribution.
• Steps in Making a Dot Plot
1. Make a scale that covers the data range.
2. Mark the axes and label them.
3. Plot each data value as a dot above the scale at its approximate location.
Note: If more than one data value lies at about the same axis location, the dots are stacked vertically.
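As a rough illustration of the steps for making a dot plot, the sketch below draws a text-only dot plot for small integer data. The function name `dot_plot`, the use of `*` for dots, and the sample values are assumptions for illustration; a real chart would normally be drawn with charting software.

```python
from collections import Counter

def dot_plot(values):
    """Return text rows: stacked '*' marks above a scale covering the data range."""
    counts = Counter(values)
    lo, hi = min(values), max(values)
    rows = []
    # build from the tallest stack down so equal values stack vertically
    for level in range(max(counts.values()), 0, -1):
        row = "".join("*" if counts.get(x, 0) >= level else " "
                      for x in range(lo, hi + 1))
        rows.append(row.rstrip())
    rows.append("-" * (hi - lo + 1))  # the scale, one character per integer value
    return rows

rows = dot_plot([1, 2, 2, 3, 3, 3, 5])
print("\n".join(rows))
print("scale runs from 1 to 5")
```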
• The range is from 7 to 59.
• All but a few data values lie between 10 and 25.
• A typical “middle” data value would be around 17 or 18.
• The data are not symmetric due to a few large P/E ratios.
Comparing Groups
• A stacked dot plot compares two or more groups using a common X-axis scale.
3.2 Frequency Distributions and Histograms
LO3-2: Create a frequency distribution for a data set.
Bins and Bin Limits
• A frequency distribution is a table formed by classifying n data values into k classes (bins).
• Bin limits define the values to be included in each bin. Widths must all be the same except when we have open-ended bins.
• Frequencies are the number of observations within each bin.
• Express as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
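The definitions above can be illustrated with a short sketch. It assumes equal-width bins of the form "lower limit ≤ x < upper limit", and the function name, the bin settings, and the ten sample values are all hypothetical.

```python
def frequency_distribution(values, lower, width, k):
    """Classify values into k equal-width bins starting at `lower`.

    Each returned row is (bin lower limit, bin upper limit, frequency,
    relative frequency, percentage).
    """
    counts = [0] * k
    for v in values:
        i = int((v - lower) // width)
        if not 0 <= i < k:
            raise ValueError(f"{v} falls outside the {k} bins")
        counts[i] += 1
    n = len(values)
    return [(lower + i * width, lower + (i + 1) * width, f, f / n, 100 * f / n)
            for i, f in enumerate(counts)]

# hypothetical data for illustration only
table = frequency_distribution([7, 12, 15, 18, 21, 24, 27, 33, 38, 59],
                               lower=0, width=20, k=3)
for row in table:
    print(row)
```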
Constructing a Frequency Distribution
- Herbert Sturges proposed the following rule:
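The formula itself did not survive in these notes. Sturges' rule is commonly stated as k = 1 + log2(n), which is approximately 1 + 3.3 log10(n), and the sketch below assumes that form; whether the slide rounded the result up or to the nearest integer is not known, so the function returns the unrounded value. The function name is an assumption.

```python
import math

def sturges_k(n):
    """Suggested number of bins under Sturges' rule, k = 1 + log2(n) (unrounded)."""
    return 1 + math.log2(n)

for n in (44, 64, 1000):
    print(n, round(sturges_k(n), 2))
```

The result is a starting point only; bin choice still requires judgment.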
Histograms
• A histogram is a graphical representation of a frequency distribution.
• A histogram is a bar chart.
- Y-axis shows frequency within each bin.
- X-axis ticks show end points of each bin.
LO3-3: Make a histogram with appropriate bins.
• Consider 3 histograms for the P/E ratio data with different bin widths. What do they tell you?
• Choosing the number of bins and bin limits in creating histograms requires judgment.
• One can use software programs to create histograms with different bins. These include software such as:
- Excel
- MegaStat
- Minitab
Modal Class
• A histogram bar that is higher than those on either side.
• Unimodal: a single modal class.
• Bimodal: two modal classes.
• Multimodal: more than two modal classes.
• Modal classes may be artifacts of the way bin limits are chosen.
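A minimal sketch of the modal-class definition follows, assuming the bin frequencies are already available as a list. The function name and the sample frequencies are hypothetical, and ties between adjacent bars (plateaus) are deliberately not counted as modal classes in this version.

```python
def modal_classes(freqs):
    """Return the indexes of bins whose bar is strictly higher than both neighbors."""
    peaks = []
    for i, f in enumerate(freqs):
        left = freqs[i - 1] if i > 0 else -1               # no left neighbor at the first bin
        right = freqs[i + 1] if i < len(freqs) - 1 else -1  # no right neighbor at the last bin
        if f > left and f > right:
            peaks.append(i)
    return peaks

peaks = modal_classes([2, 8, 5, 3, 6, 1])
print(peaks, "-> bimodal" if len(peaks) == 2 else "")
```

Re-binning the same data can move or remove these peaks, which is the sense in which modal classes may be artifacts of the bin limits.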
LO3-4: Identify skewness, modes, and outliers in a histogram.
Shape
• A histogram may suggest the shape of the population.
• It is influenced by the number of bins and bin limits.
• Skewness: indicated by the direction of the longer tail of the histogram.
- Left-skewed (negatively skewed): a longer left tail.
- Right-skewed (positively skewed): a longer right tail.
- Symmetric: both tail areas are the same.
Frequency Polygons and Ogives
• A frequency polygon is a line graph that connects the midpoints of the histogram intervals, plus extra intervals at the beginning and end so that the line will touch the X-axis.
• It serves the same purpose as a histogram, but is attractive when you need to compare two data sets (since more than one frequency polygon can be plotted on the same scale).
• An ogive (pronounced “oh-jive”) is a line graph of the cumulative frequencies.
• It is useful for finding percentiles or in comparing the shape of the sample with a known benchmark such as the normal distribution (that you will be seeing in the next chapter).
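The plotting coordinates for both graphs can be sketched as follows, assuming equal-width bins. Two conventions are assumed here rather than taken from the slides: the extra polygon intervals have frequency 0, and the ogive is plotted at each bin's upper limit, starting from 0 at the first lower limit. The function names and the bin frequencies are illustrative.

```python
from itertools import accumulate

def polygon_points(lower, width, freqs):
    """(x, y) points at bin midpoints, plus a zero-height point one bin beyond each end."""
    points = [(lower - width / 2, 0)]
    points += [(lower + (i + 0.5) * width, f) for i, f in enumerate(freqs)]
    points.append((lower + (len(freqs) + 0.5) * width, 0))
    return points

def ogive_points(lower, width, freqs):
    """(x, y) points of cumulative frequency at each bin's upper limit."""
    cumulative = list(accumulate(freqs))
    return [(lower, 0)] + [(lower + (i + 1) * width, c)
                           for i, c in enumerate(cumulative)]

freqs = [10, 10, 5, 5, 20]          # hypothetical bin frequencies, bins of width 100
polygon = polygon_points(0, 100, freqs)
ogive = ogive_points(0, 100, freqs)
print(polygon)
print(ogive)
```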
3.3 Excel Charts
This section describes how to use Excel to create charts. Please refer to the text.
3.4 Line Charts
LO3-5: Make an effective line chart using Excel.
Simple Line Charts
• Used to display a time series or spot trends, or to compare time periods.
• Can display several variables at once.
• Two-scale line chart: used to compare variables that differ in magnitude or are measured in different units.
Log Scales
• Arithmetic scale: distances on the Y-axis are proportional to the magnitude of the variable being displayed.
• Logarithmic scale (ratio scale): equal distances represent equal ratios.
• Use a log scale for the vertical axis when data vary over a wide range, say, by more than an order of magnitude.
• This will reveal more detail for smaller data values.
A log scale is useful for time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, the national debt, or your future income). It reveals whether the quantity is growing at an
- increasing percent (concave upward),
- constant percent (straight line), or
- declining percent (concave downward).
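To see why constant percentage growth plots as a straight line on a log scale, the sketch below computes the vertical step between consecutive points after taking base-10 logs. The 5% growth rate, the starting value of 100, and the function name are assumptions chosen for illustration.

```python
import math

def log_steps(series):
    """Differences between consecutive log10 values (the rise per period on a log axis)."""
    logs = [math.log10(v) for v in series]
    return [b - a for a, b in zip(logs, logs[1:])]

constant_growth = [100 * 1.05 ** t for t in range(6)]   # grows 5% every period
steps = log_steps(constant_growth)
print([round(s, 4) for s in steps])
```

Every step equals log10(1.05), so the points rise by the same amount each period. Steps that grow over time would indicate an increasing percent, and steps that shrink would indicate a declining percent.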
3.5 Bar Charts
LO3-6: Know the rules for effective bar charts and pie charts.
Simple Bar Charts
• Most common way to display attribute data.
- Bars represent categories or attributes.
- Lengths of bars represent frequencies.
Pareto Charts
• Special type of bar chart used in quality management to display the frequency of defects or errors of different types.
• Categories are displayed in descending order of frequency.
• Focus on significant few (i.e., few categories that account for most defects or errors).
Stacked Bar Chart
• Bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and total.
3.6 Pie Charts
LO3-6: Know the rules for effective bar charts and pie charts.
An Oft-Abused Chart
• A pie chart can only convey a general idea of the data.
• Pie charts should be used to portray data which sum to a total (e.g., percent market shares).
• A pie chart should only have a few (i.e., 2 or 3) slices.
• Each slice can be labeled with data values or percents.
• Consider the following charts used to illustrate an article from the Wall Street Journal. Which type appears to be better?
Pie Chart Options
• Exploded and 3-D pie charts add strong visual impact.
3.7 Scatter Plots
LO3-7: Make and interpret a scatter plot using Excel.
• Scatter plots can convey patterns in data pairs that would not be apparent from a table.
Refer to the text for Excel outputs.
3.8 Tables
• Tables are the simplest form of data display.
• A compound table is a table that contains time series data down the columns and variables across the rows.
• Arrangement of data is in rows and columns to enhance meaning.
Example: School Expenditures
• The data can be viewed by focusing on the time pattern (down the columns) or by comparing the variables (across the rows).
• Units of measure are stated in the footnote.
• Note merged headings to group columns.
• See text for “Tips for Effective Tables.”
LO3-8: Make simple tables and pivot tables.
Here are some tips for creating effective tables:
1. Keep the table simple, consistent with its purpose. Put summary tables in the main body of the written report and detailed tables in an appendix.
2. Display the data to be compared in columns rather than rows.
3. For presentation purposes, round off to three or four significant digits.
4. Physical table layout should guide the eye toward the comparison you wish to emphasize.
5. Row and column headings should be simple yet descriptive.
6. Within a column, use a consistent number of decimal digits.
3.9 Deceptive Graphs
LO3-9: Recognize deceptive graphing techniques.
Error 1: Nonzero Origin
• A nonzero origin will exaggerate the trend.
[Figure panels: Deceptive, Correct]
Error 2: Elastic Graph Proportions
• Keep the aspect ratio (width/height) below 2.00 so as not to exaggerate the graph. By default, Excel uses an aspect ratio of 1.68.
Error 4: 3-D and Novelty Graphs
• Can make trends appear to dwindle into the distance or loom towards you.
Error 5: 3-D and Rotated Graphs
• Can make trends appear to dwindle into the distance or loom towards you.
Error 8: Complex Graphs
• Avoid if possible. Keep your main objective in mind. Break graph into smaller parts.
Error 11: Area Trick
• As figure height increases, so does width, distorting the graph.
C
hapter
LO3LO3--99 3.9 Deceptive Graphs3.9 Deceptive Graphs
•• Other deceptive graphing techniquesOther deceptive graphing
techniques
3
•• Other deceptive graphing techniques.Other deceptive
graphing techniques.
• Error 3: Dramatic Title and Distracting PicturesError 3:
Dramatic Title and Distracting Pictures
• Error 6: Unclear Definitions or Scales
• Error 7: Vague Sources
• Error 9: Gratuitous Effects
• Error 10: Estimated Data
3-44
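Errors 2 and 11 both come down to simple arithmetic. The sketch below is only an illustration: the function names are made up for this example, and it assumes a figure whose height and width are scaled by the same factor.

```python
def aspect_ratio(width: float, height: float) -> float:
    """Width divided by height, as defined under Error 2."""
    return width / height


def area_factor(height_factor: float) -> float:
    """Growth in area when height and width are both scaled by
    height_factor (the Area Trick, Error 11)."""
    return height_factor ** 2


# A chart 8.4 units wide and 5 units tall has the 1.68 ratio quoted
# for Excel, which is below the 2.00 guideline.
print(round(aspect_ratio(8.4, 5.0), 2))
# A figure drawn twice as tall and twice as wide covers four times the
# area, although the value it represents has only doubled.
print(area_factor(2.0))
```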
Sample Problems - Chapter 3.pdf
Sample Problems
1) Given the following data, develop a frequency distribution:
Step 1: List the possible values.
The possible values for the discrete variable are 0 through 12.
Step 2: Count the number of occurrences at each value.
The resulting frequency distribution is shown as follows:
2) Assuming you have data for a variable with 2,000 values, using the 2^k ≥ n guideline, what is the least number of groups that should be used in developing a grouped data frequency distribution?
Given n = 2,000, the minimum number of groups for a grouped data frequency distribution determined using the 2^k ≥ n guideline is:
2^k ≥ n or 2^11 = 2048 ≥ 2000; use k = 11 groups.
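The same search for the smallest k can be sketched in a few lines of Python. The function name min_classes is an assumption made for this illustration; it simply tries k = 0, 1, 2, ... until 2^k reaches n.

```python
def min_classes(n: int) -> int:
    """Smallest whole number k that satisfies 2**k >= n."""
    if n < 1:
        raise ValueError("n must be a positive number of observations")
    k = 0
    while 2 ** k < n:
        k += 1
    return k


print(min_classes(2000))  # 2**10 = 1024 is too small, 2**11 = 2048 is enough
```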
3) A study is being conducted in which a variable of interest has 1,000 observations. The minimum value in the dataset is 300 points and the maximum is 2,900 points.
a) Use the 2^k ≥ n guideline to determine the minimum number of classes to use in developing a grouped data frequency distribution.
b) Based on your answer in a), determine the class width that should be used.
a) Given n = 1,000, the minimum number of classes for a grouped data frequency distribution determined using the 2^k ≥ n guideline is:
2^k ≥ n or 2^10 = 1024 ≥ 1000; use k = 10 classes.
b) Assuming that the number of classes that will be used is 10, the class width is determined as follows:
w = (high – low)/classes = (2900 – 300)/10 = 2600/10 = 260.
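The class-width formula can be checked the same way. In this sketch the function name class_width and the round_to parameter are assumptions for illustration; rounding the raw width up to a convenient unit is one common convention, not something the handout prescribes.

```python
import math


def class_width(low: float, high: float, classes: int, round_to: float = 1) -> float:
    """Raw width (high - low) / classes, rounded up to a multiple of round_to."""
    raw = (high - low) / classes
    return math.ceil(raw / round_to) * round_to


print(class_width(300, 2900, 10))  # 2600 / 10 = 260, already a whole number
print(class_width(0, 17, 6))       # 17 / 6 = 2.833..., rounded up to 3
```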
4) Produce the relative frequency distribution from a sample of size 50 that gave rise to the following ogive:
Class Frequency Relative Frequency Cumulative Relative Frequency
0 – < 100 10 0.20 0.20
100 – < 200 10 0.20 0.40
200 – < 300 5 0.10 0.50
300 – < 400 5 0.10 0.60
400 – < 500 20 0.40 1.00
500 – < 600 0 0.00 1.00
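Reading the table backwards shows how an ogive is turned into a relative frequency distribution: each relative frequency is the rise in the cumulative relative frequency over that class. A minimal sketch, taking the cumulative values from the table above (the variable names are illustrative):

```python
n = 50
cumulative_relative = [0.20, 0.40, 0.50, 0.60, 1.00, 1.00]

relative = []
previous = 0.0
for value in cumulative_relative:
    # The rise of the ogive across a class is that class's relative frequency.
    relative.append(round(value - previous, 2))
    previous = value

# Multiplying by the sample size recovers the class frequencies.
frequencies = [round(r * n) for r in relative]

print(relative)
print(frequencies)
```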
5) You have the following data:
a) Construct a frequency distribution for these data. Use the 2^k ≥ n guideline to determine the number of classes to use.
b) Develop a relative frequency distribution using the classes you constructed in a).
c) Develop a cumulative frequency distribution and a cumulative relative frequency distribution using the classes you constructed in a).
d) Develop a histogram based on the frequency distribution you constructed in a).
a) There are n = 60 observations in the data set. Using the 2^k > n guideline, the number of classes, k, would be 6 (2^6 = 64 > 60). The maximum and minimum values in the data set are 17 and 0, respectively. The class width is computed to be: w = (17 – 0)/6 = 2.833, which is rounded to 3. The frequency distribution is
Class Frequency
0-2 6
3-5 13
6-8 20
9-11 14
12-14 5
15-17 2
Total = 60
b) To construct the relative frequency distribution, divide the number of occurrences (frequency) in each class by the total number of occurrences. The relative frequency distribution is shown below.
Class Frequency Relative Frequency
0-2 6 0.100
3-5 13 0.217
6-8 20 0.333
9-11 14 0.233
12-14 5 0.083
15-17 2 0.033
Total = 60
c) To develop the cumulative frequency distribution, compute a running sum for each class by adding the frequency for that class to the frequencies for all classes above it. The cumulative relative frequencies are computed by dividing the cumulative frequency for each class by the total number of observations. The cumulative frequency and the cumulative relative frequency distributions are shown below.
Class Frequency Relative Frequency Cumulative Frequency Cumulative Relative Frequency
0-2 6 0.100 6 0.100
3-5 13 0.217 19 0.317
6-8 20 0.333 39 0.650
9-11 14 0.233 53 0.883
12-14 5 0.083 58 0.967
15-17 2 0.033 60 1.000
Total = 60
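The running-sum calculation in parts b) and c) can be reproduced with a short script. This is a sketch using the class frequencies from part a); the variable names are illustrative.

```python
from itertools import accumulate

classes = ["0-2", "3-5", "6-8", "9-11", "12-14", "15-17"]
frequencies = [6, 13, 20, 14, 5, 2]
total = sum(frequencies)

relative = [f / total for f in frequencies]
cumulative = list(accumulate(frequencies))         # running sum of frequencies
cumulative_relative = [c / total for c in cumulative]

for row in zip(classes, frequencies, relative, cumulative, cumulative_relative):
    label, f, r, c, cr = row
    print(f"{label:>5} {f:3d} {r:.3f} {c:3d} {cr:.3f}")
```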
d) To develop the histogram, first construct a frequency distribution (see part a). The classes form the horizontal axis and the frequency forms the vertical axis. Bars corresponding to the frequency of each class are developed. The histogram based on the frequency distribution from part (a) is shown below.
[Histogram: frequency (vertical axis, 0 to 25) by classes 0-2, 3-5, 6-8, 9-11, 12-14, 15-17 (horizontal axis)]
6) Fill in the missing components of the following frequency distribution constructed for a sample size of 50:
Class Frequency Relative Frequency Cumulative
Relative Frequency
7.85 – < 7.95 6 0.12 0.12
7.95 – < 8.05 18 0.36 0.48
8.05 – < 8.15 12 0.24 0.72
8.15 – < 8.25 5 0.10 0.82
8.25 – < 8.35 9 0.18 1.00
7) The following cumulative frequency distribution summarizes data obtained in a study of the ending overages (in $) for the cash register balance at a firm:
a) Determine the proportion of the days in which there were no shortages.
b) Determine the proportion of the days the cash register was less than $20 off.
c) Determine the proportion of the days the cash register was less than $40 over or at the most $20 short.
a) Proportion of days in which no shortages occurred = 1 – proportion of days in which shortages occurred = 1 – 0.24 = 0.76
b) Less than $20 off implies that the overage was less than $20 and the shortage was less than $20 = (proportion of days with overages less than $20) – (proportion of days with shortages of $20 or more) = 0.56 – 0.08 = 0.48
c) Proportion of days with less than $40 over or at most $20 short = proportion of days with less than $40 over – proportion of days with more than $20 short = 0.96 – 0.08 = 0.88.
8) You are given the following data:
a) Construct a frequency distribution for these data.
b) Based on the frequency distribution, develop a histogram.
c) Construct a relative frequency distribution.
d) Develop a relative frequency histogram.
e) Compare the two histograms.
a) The data do not require grouping. The following frequency
distribution is given:
x Frequency
0 0
1 0
2 1
3 1
4 10
5 15
6 13
7 13
8 5
9 1
10 1
b) The following histogram could be developed.
[Histogram: frequency (vertical axis, 0 to 16) by x variable (horizontal axis, 0 to 10)]
c) The relative frequency distribution shows the fraction of
values falling at each value of x.
d) The relative frequency histogram is shown below.
[Relative frequency histogram: relative frequency (vertical axis, 0.00 to 0.30) by x variable (horizontal axis, 0 to 10)]
e) The two histograms look exactly alike since the same data are being graphed. The bars represent either the frequency or relative frequency.
9) The following data reflect the percentages of employees with different levels of education:
a) Develop a pie chart to illustrate these data.
b) Develop a horizontal bar chart to illustrate these data.
a) The pie chart is as follows:
[Pie chart "Education Levels"; slice labels 18%, 34%, 14%, 30%, 4%; legend: Less than HS Graduate, HS Graduate, Some College, College Graduate, Graduate Degree]
b) The horizontal bar chart is shown as follows:
[Horizontal bar chart "Education Levels"; bar labels 18, 34, 14, 30, 4; categories: Less than HS Graduate, HS Graduate, Some College, College Graduate, Graduate Degree; horizontal axis: Percent, 0 to 40]
10) Given the following data, construct a stem and leaf diagram.
Sort the data from low to high. The lowest value is 0.7 and the highest 6.4.
Split the values into a stem and leaf. Stem = units place; leaf = decimal place.
List all possible stems from lowest to highest.
Itemize the leaves from lowest to highest and place next to the appropriate stems.
11) A university has the following number of students at each grade level.
a) Construct a bar chart that effectively displays these data.
b) Construct a pie chart to display these data.
c) Refer to a) and b). Which graph is the most effective way to present these data and why?
a)
b)
c) A case can be made for either a bar chart or pie chart. Pie charts are especially good at showing how the total is divided into parts. The bar chart is best to draw attention to specific results. In this case, one could look at the apparent attrition that takes place in the number of students between Freshman and Senior years.
12) Given the following sales data for product category and sales region, construct two different bar charts that display the data effectively.
One possible bar chart is shown as follows:
[Bar chart "Sales By Product Type and Region": sales (0 to 450) by region (East, West, North, South), one bar per product type (XJ-6 Model, X-15-Y Model, Craftsman, Generic)]
Another way to present the same data is:
[Bar chart "Sales By Product Type and Region": sales (0 to 450) by product type (XJ-6 Model, X-15-Y Model, Craftsman, Generic), one bar per region (East, West, North, South)]
Still another possible way is called a “stacked” bar chart.
[Stacked bar chart "Sales By Product Type and Region": sales (0 to 1200) for East, West, North, South, stacked by product type (Generic, Craftsman, X-15-Y Model, XJ-6 Model)]
13) Boston Properties is a real estate investment trust that owns office properties in selected markets. Its income distribution by region (in percent) in 2007 is:
a) Construct a pie chart to display the income distribution by region for 2007.
b) Construct a bar chart to display the income distribution by region for 2007.
c) Which chart more effectively displays the information?
a) A pie chart displaying income distribution by region is shown below. The categories are the regions and the measure is the region’s percentage of total income.
[Pie chart "Income Distribution by Region": Princeton 4%, Washington, D.C. 21%, Boston 27%, New York 34%, San Francisco 14%]
b) The bar chart displaying income distribution by region is shown below. The categories are the regions and the measure for each category is the region’s percentage of total income.
[Bar chart "Income Distribution by Region": income distribution % (vertical axis, 0% to 40%) by region (horizontal axis)]
c) Both charts clearly indicate the income distribution for Boston Properties by region. The bar chart, however, makes it easier to compare percentages across regions.
14) The following data represents 11 observations of two quantitative variables:
x = contact hours with client
y = profit generated from client
a) Construct a scatter plot of the data. Indicate whether the plot suggests a linear or non-linear relationship between the dependent and the independent variables.
b) Determine how much influence one data point will have on your perception of the relationship between the independent and the dependent variables by deleting the data point with the smallest x value. What appears to be the relationship between the dependent and the independent variables?
a)
There appears to be a curvilinear relationship between the dependent and independent variables.
b)
[Scatter plot, series "Series1": vertical axis -1000 to 5000, horizontal axis 0 to 60]
Having removed the extreme data points, the relationship between dependent and independent variables seems to be linear and positive.
15) The following information shows the year-end dollar value (in millions) of deposits for the Bank of Ozarks, Inc., for the years 1997-2007.
Draw a line chart of the data and interpret the same.
The time-series variable is Year-End Dollar Value Deposits ($ in millions) measured over 8 years with a maximum value of 1,380 (million). The horizontal axis will have 8 time periods equally spaced. The vertical axis will start at 0 and go to a value exceeding 1,380. We will use 1,600. The vertical axis will also be divided into 200-unit increments. The line chart of the data is shown below.
[Line chart "Bank of the Ozarks Deposits": $ in millions (vertical axis, 0 to 3000) by year (horizontal axis, 1997 to 2007)]
The line chart shows that Year-End Deposits have been increasing since 1997, but have increased more sharply since 2002 and leveled off between 2006 and 2007.
Chapter 3 Lecture Power Point Slides.pdf
Chapter 3 – Describing Data Visually
Data Presentation Using Descriptive Graphs
©2006 Thomson/South-Western
Population of 500 Cities
Class Number Size of City Frequency
1 Under 10,000 4
2 10,000 and under 15,000 51
3 15,000 and under 20,000 77
4 20,000 and under 25,000 105
5 25,000 and under 30,000 84
6 30,000 and under 35,000 60
7 35,000 and under 40,000 45
8 40,000 and under 45,000 38
9 45,000 and under 50,000 31
10 50,000 and over 5
Total 500
Frequency Distribution
Lowest value in the data (L)
Number of classes (K): value chosen to best represent the data
Class width (CW) = range / number of classes = (H – L) / K
Starting Salaries
Major No. of Graduating Students Starting Salary (thousands of dollars)
Accounting 26 41.5 39.4 40.9 35.9 37.4 39.5 40.3
39.3 41.6 36.6 41.1 35.7 43.7 37.0
41.3 40.6 38.0 42.4 35.7 41.4 39.2
36.8 39.3 43.8 38.5 43.0
Information 10 36.3 35.6 36.2 38.1 34.8 38.1 35.7
systems 36.5 39.5 37.9
Marketing 14 34.3 36.8 33.8 35.0 37.8 38.7 37.2
32.8 38.2 37.0 39.7 38.8 35.2 36.2
Frequency Distribution for Continuous Data
Original Data
41.5 39.4 40.9 35.9 37.4
39.5 40.3 39.3 41.6 36.6
41.1 35.7 43.7 37.0 41.3
40.6 38.0 42.4 35.7 41.4
39.2 36.8 39.3 43.8 38.5
43.0 36.3 35.6 36.2 38.1
34.8 38.1 35.7 36.5 39.5
37.9 34.3 36.8 33.8 35.0
37.8 38.7 37.2 32.8 38.2
37.0 39.7 38.8 35.2 36.2
Ordered Array
32.8 33.8 34.3 34.8 35.0
35.2 35.6 35.7 35.7 35.7
35.9 36.2 36.2 36.3 36.5
36.6 36.8 36.8 37.0 37.0
37.2 37.4 37.8 37.9 38.0
38.1 38.1 38.2 38.5 38.7
38.8 39.2 39.3 39.3 39.4
39.5 39.5 39.7 40.3 40.6
40.9 41.1 41.3 41.4 41.5
41.6 42.4 43.0 43.7 43.8
Frequency Distribution for Continuous Data
Class Number Class Frequency Relative Frequency
1 32 and under 34 2 .04
2 34 and under 36 9 .18
3 36 and under 38 13 .26
4 38 and under 40 14 .28
5 40 and under 42 8 .16
6 42 and under 44 4 .08
Total 50 1.00
Constructing a Frequency Distribution
1. Gather the sample data
2. Arrange the data in an ordered array
3. Select the number of classes to be used
4. Determine the class width
5. Determine the class limits for each class
6. Count the number of data values in each class (the class frequencies)
7. Summarize the class frequencies in a frequency distribution table
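The seven steps can be traced in code using the 50 starting salaries from the earlier slide. This is a sketch rather than the slides' own procedure: the variable names are illustrative, and the six classes of width 2 starting at 32 are taken from the frequency table shown earlier.

```python
# Step 1: the sample data (starting salaries, thousands of dollars).
salaries = [
    41.5, 39.4, 40.9, 35.9, 37.4, 39.5, 40.3, 39.3, 41.6, 36.6,
    41.1, 35.7, 43.7, 37.0, 41.3, 40.6, 38.0, 42.4, 35.7, 41.4,
    39.2, 36.8, 39.3, 43.8, 38.5, 43.0, 36.3, 35.6, 36.2, 38.1,
    34.8, 38.1, 35.7, 36.5, 39.5, 37.9, 34.3, 36.8, 33.8, 35.0,
    37.8, 38.7, 37.2, 32.8, 38.2, 37.0, 39.7, 38.8, 35.2, 36.2,
]

# Step 2: ordered array.
ordered = sorted(salaries)

# Steps 3 to 5: number of classes, class width, and first lower limit,
# as used in the slide's table ("32 and under 34", and so on).
k, width, first_lower = 6, 2, 32

# Step 6: count the values in each class; a value on a boundary goes
# into the class that starts there ("and under" excludes the upper limit).
counts = [0] * k
for value in ordered:
    counts[int((value - first_lower) // width)] += 1

# Step 7: summarize in a table with relative frequencies.
n = len(ordered)
for i, count in enumerate(counts):
    lower = first_lower + i * width
    print(f"{i + 1}  {lower} and under {lower + width}  {count:2d}  {count / n:.2f}")
```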
Histogram
A Histogram is a graphical representation of a frequency distribution for continuous data.
The height of each bar is proportional to the frequency of that class.
[Histogram: frequency (vertical axis, 3 to 15) by starting salary in thousands of dollars (horizontal axis, class boundaries 32, 34, 36, 38, 40, 42, 44)]
[Relative frequency histogram: relative frequency (vertical axis, .06 to .30) by starting salary in thousands of dollars (horizontal axis, class boundaries 32, 34, 36, 38, 40, 42, 44)]
Stem-and-Leaf Diagrams
Stem-and-Leaf Diagrams were developed to summarize data without loss of information.
They are suited to moderately sized data sets (< 150 values).
Stem-and-Leaf Diagrams
Reports of the after-tax profits of 12 companies (recorded as cents per dollar of revenue) are as follows:
3.4, 4.5, 2.3, 2.7, 3.8, 5.9, 3.4, 4.7, 2.4, 4.1, 3.6, 5.1
Stem Leaf (unit = .1)
2 3 4 7
3 4 4 6 8
4 1 5 7
5 1 9
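The diagram above can be built mechanically: sort the values, then split each one into a stem (units digit) and a leaf (tenths digit). A minimal sketch, where the function name stem_and_leaf and its leaf_unit parameter are assumptions made for this illustration:

```python
from collections import defaultdict


def stem_and_leaf(values, leaf_unit=0.1):
    """Map each stem to its sorted list of leaves.

    Each value is expressed in whole leaf units; the last digit is the
    leaf and the remaining digits form the stem.
    """
    rows = defaultdict(list)
    for value in sorted(values):
        stem, leaf = divmod(round(value / leaf_unit), 10)
        rows[stem].append(leaf)
    # List every stem from lowest to highest, even one with no leaves.
    return {s: rows.get(s, []) for s in range(min(rows), max(rows) + 1)}


profits = [3.4, 4.5, 2.3, 2.7, 3.8, 5.9, 3.4, 4.7, 2.4, 4.1, 3.6, 5.1]
for stem, leaves in stem_and_leaf(profits).items():
    print(stem, " ".join(str(leaf) for leaf in leaves))
```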
Stem-and-Leaf
By rotating the Stem-and-Leaf we get an image of the shape of the data.
[Figure: the stem-and-leaf diagram for the 12 after-tax profits rotated so that the stems 2, 3, 4, 5 run along the horizontal axis and the leaves stack vertically above them]
Ordered Array of Aptitude Test Scores
22 44 56 68 78
25 44 57 68 78
28 46 59 69 80
31 48 60 71 82
34 49 61 72 83
35 51 63 72 85
39 53 63 74 88
39 53 63 75 90
40 55 65 75 92
42 55 66 76 96
Stem-and-Leaf Diagram for Aptitude Test Scores
Stem Leaf (unit = 1)
2 2 5 8
3 1 4 5 9 9
4 0 2 4 4 6 8 9
5 1 3 3 5 5 6 7 9
6 0 1 3 3 3 5 6 8 8 9
7 1 2 2 4 5 5 6 8 8
8 0 2 3 5 8
9 0 2 6
Stem-and-Leaf Diagram for Aptitude Test Scores (using repeated stems)
Stem Leaf (unit = 1)
2 2
2 5 8
3 1 4
3 5 9 9
4 0 2 4 4
4 6 8 9
5 1 3 3
5 5 5 6 7 9
6 0 1 3 3 3
6 5 6 8 8 9
7 1 2 2 4
7 5 5 6 8 8
8 0 2 3
8 5 8
9 0 2
9 6
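With repeated stems, each stem appears twice: once for leaves 0 to 4 and once for leaves 5 to 9. The sketch below reproduces the diagram from the 50 aptitude scores; the function name split_stems is an assumption for illustration.

```python
scores = [
    22, 25, 28, 31, 34, 35, 39, 39, 40, 42,
    44, 44, 46, 48, 49, 51, 53, 53, 55, 55,
    56, 57, 59, 60, 61, 63, 63, 63, 65, 66,
    68, 68, 69, 71, 72, 72, 74, 75, 75, 76,
    78, 78, 80, 82, 83, 85, 88, 90, 92, 96,
]


def split_stems(values):
    """Group leaves under (stem, half): half 0 holds leaves 0-4, half 1 holds 5-9."""
    rows = {}
    for value in sorted(values):
        stem, leaf = divmod(value, 10)
        half = 0 if leaf < 5 else 1
        rows.setdefault((stem, half), []).append(leaf)
    return rows


for (stem, _half), leaves in split_stems(scores).items():
    print(stem, " ".join(str(leaf) for leaf in leaves))
```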
Frequency Polygon
A frequency polygon is a graph that represents the shape of the data. It is conceptualized as a connection of the midpoints of the classes at the height specified by the frequency.
A relative frequency polygon is similar to a frequency polygon, except that the height is dictated by the relative frequency.
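The points that a frequency polygon connects are easy to compute. This sketch uses the starting-salary classes from the earlier frequency table (lower limits 32 to 42, width 2); the variable names are illustrative, and closing the polygon at zero on either side is a common convention that the slides do not state.

```python
lower_limits = [32, 34, 36, 38, 40, 42]
width = 2
frequencies = [2, 9, 13, 14, 8, 4]

# Each class is represented by its midpoint, plotted at the class frequency.
midpoints = [lower + width / 2 for lower in lower_limits]
points = list(zip(midpoints, frequencies))

# Assumed convention: anchor the polygon at zero one class width beyond
# the first and last midpoints.
closed = [(midpoints[0] - width, 0)] + points + [(midpoints[-1] + width, 0)]

print(points)
print(closed)
```

For a relative frequency polygon, the same midpoints are used with each frequency divided by the total of 50.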
[Frequency polygon for the population of 500 cities: frequency (vertical axis, 10 to 100) by population in thousands (horizontal axis, 10 to 50)]
* 4 cities had populations of less than 10,000
** 5 cities had populations of 50,000 or greater
[Frequency polygons for employees with no college degree and with a college degree: number of employees (vertical axis) by annual salaries in thousands of dollars (horizontal axis, 10 to 100)]
Cumulative Frequencies
A cumulative frequency table provides information on the number of values that are less than the upper class limit.
Results can be presented graphically with an ogive.
Starting Salaries
Class Number Class Frequency Cumulative Frequency Relative Frequency Cumulative Relative Frequency
1 32 and under 34 2 2 .04 .04
2 34 and under 36 9 11 .18 .22
3 36 and under 38 13 24 .26 .48
4 38 and under 40 14 38 .28 .76
5 40 and under 42 8 46 .16 .92
6 42 and under 44 4 50 .08 1.00
Total 50 1.00
Starting Salaries
[Ogive; horizontal axis: Salary]
Bar Charts
Bar charts are used for graphical representation of nominal and ordinal data.
As with a histogram, the height of the bar is proportional to the number of values in the category.
Graduating Business Majors
[Bar chart: number of majors (vertical axis, 5 to 30) by major; bar labels: Accounting 26, Information systems 10, Marketing 14]
Horizontal Bar Chart
Q. If the price of natural gas goes down by 25% in the next few years, would you and your family use more or less?
A. [Horizontal bar chart, scale 0 to 35; categories: Use more, Use about the same, Use less, Not sure]
Bar Chart of Quality Costs
[Bar chart: dollars spent in thousands (vertical axis, 20 to 80) by quality cost category (Prevention, Appraisal, Failure)]
Pie Chart
The Pie Chart is an alternative to the bar chart for nominal and ordinal data.
The proportion of the Pie represents the category’s percentage in the population or sample.
Percentage of Graduating Business Majors
[Pie chart with slices labeled A, B, C; categories: Accounting majors, Information systems majors, Marketing majors]
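The slice sizes follow directly from the counts in the Starting Salaries table (26 accounting, 10 information systems, and 14 marketing majors). A small sketch, with illustrative variable names, that converts each count to a percentage of the total and to the matching angle of a 360-degree pie:

```python
majors = {"Accounting": 26, "Information systems": 10, "Marketing": 14}
total = sum(majors.values())

# Each category's share of the total is its share of the pie.
shares = {name: count / total for name, count in majors.items()}

for name, share in shares.items():
    print(f"{name}: {share:.0%} of majors, {share * 360:.1f} degree slice")
```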
Due by 11pm June 30th
Chapter 1
Overview of Statistics
Chapter 2
Data Collection
Chapter 3
Describing Data Visually
Upload the completed assignment using the file name format
Lastname_Firstname_Week1.doc.
Assignment
(32 points due by 11 pm June 30th)
Note: You can team up with one of your classmates to complete
the assignment (not more than two in a team); if you want to
work on the assignment individually, that’s also fine. If you are
working in teams, then only one submission is required per
team; include both the team members’ last names as part of the
assignment submission file name as well as in the assignment
submission document.
Please provide detailed solutions to the following
problems/exercises (4 problems/exercises x 8 points each):
1) What type of data (categorical, discrete numerical, or
continuous numerical) is each of the following variables?
a) Length of a TV commercial.
b) Number of peanuts in a can of Planter’s Mixed Nuts.
c) Occupation of a mortgage applicant.
d) Flight time from London Heathrow to Chicago O’Hare.
2) Which measurement level (nominal, ordinal, interval, ratio)
is each of the following variables? Explain.
a) Number of employees in the Walmart store in Hutchinson,
Kansas.
b) Number of merchandise returns on a randomly chosen
Monday at a Walmart store.
c) Temperature (in Fahrenheit) in the ice-cream freezer at a
Walmart store.
d) Name of the cashier at register 3 in a Walmart store
e) Manager’s rating of the cashier at register 3 in a Walmart
store.
f) Social security number of the cashier at register 3 in a
Walmart store.
3) The results of a survey that collected the current credit card
balances for 36 undergraduate college students are given in the
file “College Credit Card.”
a) Using the 2^k > n rule, construct a frequency distribution for
these data.
b) Using the results from a), calculate the relative frequencies
for each class.
c) Using the results from a), calculate the cumulative relative
frequencies for each class.
d) Construct a histogram for these data.
4) The cost of manufacturing vehicles in Mexico is very
attractive to automakers. Global carmakers build approximately
1.9 million vehicles in Mexico. Of these, nearly 76% are
exported, primarily to the US. Although General Motors is the
largest manufacturer in Mexico, Daimler Chrysler exports the
most vehicles. Automotive analysts examine both the number of
vehicles produced and the number exported (see the data file
“Automotive”) to determine the potential market share of each
company.
a) For the data on vehicles produced in Mexico, construct a bar
chart displaying the amount produced by each company.
b) Repeat part a) using a pie-chart.
c) Construct a bar chart displaying the number of vehicles
exported from Mexico.
d) Repeat part c) using a pie-chart.
e) Do you prefer the bar charts or the pie charts for displaying
the data? Explain.
f) What differences do the charts reveal for the automotive
companies with respect to the number of vehicles produced and
number of vehicles exported?
1

Assignment Week 1.docDue by 11pm June 30th Chapter 1.docx

  • 1.
    Assignment Week 1.doc Dueby 11pm June 30th Chapter 1 Overview of Statistics Chapter 2 Data Collection Chapter 3 Describing Data Visually Upload the completed assignment using the file extension format Lastname_Firstname_Week1.doc. Assignment (32 points due by 11 pm June 30th) Note: You can team up with one of your classmates to complete the assignment (not more than two in a team); if you want to work on the assignment individually, that’s also fine. If you are working in teams, then only one submission is required per team; include both the team members’ last names as part of the assignment submission file name as well as in the assignment submission document. Please provide detailed solutions to the following problems/exercises (4 problems/exercises x 8 points each): 1) What type of data (categorical, discrete numerical, or continuous numerical) is each of the following variables?
  • 2.
    a) Length ofa TV commercial. b) Number of peanuts in a can of Planter’s Mixed Nuts. c) Occupation of a mortgage applicant. d) Flight time from London Heathrow to Chicago O’Hare. 2) Which measurement level (nominal, ordinal, interval, ratio) is each of the following variables? Explain. a) Number of employees in the Walmart store in Hutchinson, Kansas. b) Number of merchandise returns on a randomly chosen Monday at a Walmart store. c) Temperature (in Fahrenheit) in the ice-cream freezer at a Walmart store. d) Name of the cashier at register 3 in a Walmart store e) Manager’s rating of the cashier at register 3 in a Walmart store. f) Social security number of the cashier at register 3 in a Walmart store. 3) The results of a survey that collected the current credit card balances for 36 undergraduate college students are given in the file “College Credit Card.’ a) Using the 2k > n rule, construct a frequency distribution for these data. b) Using the results from a), calculate the relative frequencies for each class. c) Using the results from a), calculate the cumulative relative frequencies for each class.
  • 3.
    d) Construct ahistogram for these data. 4) The cost of manufacturing vehicles in Mexico is very attractive to automakers. Global carmakers build approximately 1.9 million vehicles in Mexico. Of these, nearly 76% are exported, primarily to the US. Although General Motors is the largest manufacturer in Mexico, Daimler Chrysler exports the most vehicles. Automotive analysts examine both the number of vehicles produced and the number exported (see the data file “Automotive”) to determine the potential market share of each company. a) For the data on vehicles produced in Mexico, construct a bar chart displaying the amount produced by each company. b) Repeat part a) using a pie-chart. c) Construct a bar chart displaying the number of vehicles exported from Mexico. d) Repeat part d) using a pie-chart. e) Do you prefer the bar charts or the pie charts for displaying the data? Explain. f) What differences do the charts reveal for the automotive companies with respect to the number of vehicles produced and number of vehicles exported? 1 Automotive.xls Sheet1ManufacturerVehiclesProducedVehiclesExportedGeneral Motors444,670324,651Volkswagen425,703338,825DaimlerChry sler404,637375,002Nissan313,496153,071Ford
  • 4.
    Motor280,585234,994 Sheet2 Sheet3 College Credit Card.xlsx Data2467337343414264628167592740644503260036734231178 85805539573348901455581711458467394370986572911591021 156396749935322972893751621740 Sheet2 Sheet3 Week1 Overview.pdf 1 Chapter 1 Overview of Statistics Chapter 2 Data Collection Chapter 3 Describing Data Visually Learning Objectives After studying the material in Chapter 1, you should be able to: 1. Define statistics and explain some of its uses in business. 2. List reasons for a business student to study statistics. 3. State the common challenges facing business professionals using statistics.
  • 5.
    4. List andexplain common statistical pitfalls. After studying the material in Chapter 2, you should be able to: 1. Use basic terminology for describing data and samples. 2. Explain the distinction between numerical and categorical data. 3. Explain the difference between time series and cross- sectional data. 4. Recognize levels of measurement in data and ways of coding data. 5. Recognize a Likert scale and know how to use it. 6. Use the correct terminology for samples and populations. 7. Explain the common sampling methods and how to implement them. 8. Find everyday print or electronic data sources. 9. Describe basic elements of survey design, survey types, and sources of error. After studying the material in Chapter 3, you should be able to: 1. Make a stem-and-leaf or dot plot by hand or by using software. 2. Create a frequency distribution for a dataset. 3. Make a histogram with appropriate bins. 4. Identify skewness, modal classes, and outliers in a histogram. 5. Make an effective line chart using Excel. 6. Know the rules for effective bar charts and pie charts. 7. Make and interpret a scatter plot using Excel. 8. Make simple tables and pivot tables. 9. Recognize deceptive graphing techniques. Suggested Study Outline
  • 6.
    1. First, brieflygo through chapters 1 to 3 in the textbook to familiarize yourself with the material. 2. Then, skim through the power point slides which highlight key chapter material, and the lecture files in the “course media player” (these provide a synopsis of the week’s chapters). 3. Go through chapters 1 to 3 in the textbook in detail again and take a look at the sample problems before attempting the assignment. 2 Assignment (32 points due by 11 pm June 30th) Note: You can team up with one of your classmates to complete the assignment (not more than two in a team); if you want to work on the assignment individually, that’s also fine. If you are working in teams, then only one submission is required per team; include both the team members’ last names as part of the assignment submission file name as well as in the assignment submission document. Please provide detailed solutions to the following
  • 7.
    problems/exercises (4 problems/exercisesx 8 points each): 1) What type of data (categorical, discrete numerical, or continuous numerical) is each of the following variables? a) Length of a TV commercial. b) Number of peanuts in a can of Planter’s Mixed Nuts. c) Occupation of a mortgage applicant. d) Flight time from London Heathrow to Chicago O’Hare. 2) Which measurement level (nominal, ordinal, interval, ratio) is each of the following variables? Explain. a) Number of employees in the Walmart store in Hutchinson, Kansas. b) Number of merchandise returns on a randomly chosen Monday at a Walmart store. c) Temperature (in Fahrenheit) in the ice-cream freezer at a Walmart store. d) Name of the cashier at register 3 in a Walmart store e) Manager’s rating of the cashier at register 3 in a Walmart store. f) Social security number of the cashier at register 3 in a Walmart store. 3) The results of a survey that collected the current credit card balances for 36 undergraduate college students are given in the file “College Credit Card.’ a) Using the 2k > n rule, construct a frequency distribution for these data. b) Using the results from a), calculate the relative frequencies for each class. c) Using the results from a), calculate the cumulative relative
  • 8.
    frequencies for each class. d) Construct a histogram for these data. 4) The cost of manufacturing vehicles in Mexico is very attractive to automakers. Global carmakers build approximately 1.9 million vehicles in Mexico. Of these, nearly 76% are exported, primarily to the US. Although General Motors is the largest manufacturer in Mexico, Daimler Chrysler exports the most vehicles. Automotive analysts examine both the number of vehicles produced and the number exported (see the data file “Automotive”) to determine the potential market share of each company. a) For the data on vehicles produced in Mexico, construct a bar chart displaying the amount produced by each company. b) Repeat part a) using a pie chart. c) Construct a bar chart displaying the number of vehicles exported from Mexico. d) Repeat part c) using a pie chart. e) Do you prefer the bar charts or the pie charts for displaying the data? Explain. f) What differences do the charts reveal for the automotive companies with respect to the number of vehicles produced and number of vehicles exported? Refer to the “Assignments” section in the syllabus and the “Course Orientation” document for more information/instructions regarding assignment submissions.
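    For problem 3, the 2^k > n rule picks the smallest number of classes k for which 2^k exceeds the sample size n; with n = 36, 2^5 = 32 is too small and 2^6 = 64 suffices, so k = 6. The sketch below is one possible way to work through parts a) to c) in Python. The balances are made-up placeholders (the “College Credit Card” file is not reproduced here), the function names are hypothetical, and equal-width classes starting at the minimum are an assumption; a textbook solution would typically round the class limits to convenient values.

```python
def num_classes(n):
    """Return the smallest k with 2**k > n (the 2^k > n rule)."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


def frequency_distribution(values, k):
    """Build k equal-width classes from the minimum to the maximum value.

    Returns one tuple per class:
    (lower, upper, frequency, relative frequency, cumulative relative frequency).
    """
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    if width == 0:
        raise ValueError("all values are identical; no classes to build")
    counts = [0] * k
    for v in values:
        # The maximum value would land in class k, so clamp it into the last class.
        idx = min(int((v - lo) / width), k - 1)
        counts[idx] += 1
    n = len(values)
    rows, cumulative = [], 0
    for i, freq in enumerate(counts):
        cumulative += freq
        rows.append((lo + i * width, lo + (i + 1) * width,
                     freq, freq / n, cumulative / n))
    return rows


# Hypothetical balances in dollars, NOT the assignment data.
balances = [100 * i for i in range(36)]
k = num_classes(len(balances))
table = frequency_distribution(balances, k)
for lower, upper, freq, rel, cum in table:
    print(f"{lower:8.1f} to {upper:8.1f}  f={freq:2d}  rel={rel:.3f}  cum={cum:.3f}")
```

    The frequency column of such a table is also what a histogram (part d) would plot, one bar per class.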
  • 9.
    Zipped Chapters 1& 2 Material.zip Chapter 1 Power Point Slides.pdf Chapter 1: Overview of Statistics Chapter Contents 1.1 What is Statistics? 1.2 Why Study Statistics? 1.3 Uses of Statistics 1.4 Statistical Challenges 1.5 Critical Thinking Chapter 1: Overview of Statistics
  • 10.
    Chapter Learning Objectives LO1-1: Define statistics and explain some of its uses in business. LO1-2: List reasons for a business student to study statistics. LO1-3: State the common challenges facing business professionals using statistics. LO1-4: List and explain common statistical pitfalls. 1.1 What is Statistics? (LO1-1) LO1-1: Define statistics and explain some of its uses in business. • Statistics is the science of collecting, organizing,
  • 11.
    analyzing, interpreting, and presenting data. • A statistic is a single measure (number) used to summarize a sample data set; for example, the average height of students in a university. 1.1 What is Statistics? (LO1-1) • For the height of students, a graduation gown manufacturer may need to know the average height for the length of the gowns, or an architect may need to know the maximum height to design the height of the doorways
  • 12.
    of the classrooms. 1.2 Why Study Statistics? (LO1-2) LO1-2: List reasons for a business student to study statistics. • Knowing statistics will make you a better consumer of other people's data. • You should know enough to handle everyday data problems, to feel confident that others cannot deceive you with spurious arguments, and to know when you've reached the limits of your expertise.
  • 13.
    1.2 Why Study Statistics? (LO1-2) • Statistical knowledge gives a company a competitive advantage against organizations that cannot understand their internal or external market data. • Mastery of basic statistics gives an individual manager a competitive advantage as one works one’s way through the promotion process, or when one moves to a new employer. • Here are some reasons to study statistics.
  • 14.
    1.2 Why Study Statistics? (LO1-2) Communication: Understanding the language of statistics facilitates communication and improves problem solving. Computer Skills: The use of spreadsheets for data analysis and word processors or presentation software for reports improves upon your existing skills. 1.2 Why Study Statistics? (LO1-2)
  • 15.
    Information Management: Statistics helps summarize small and large amounts of data and reveal underlying relationships. Technical Literacy: Career opportunities are in growth industries propelled by advanced technology. The use of statistical software increases your technical literacy. 1.2 Why Study Statistics? (LO1-2) Process Improvement:
  • 16.
    Statistics helps firms oversee their suppliers, monitor their internal operations, and identify problems. 1.3 Uses of Statistics Two primary kinds of statistics: Descriptive statistics – the collection, organization, presentation, and summary of data. Inferential statistics – generalizing from a sample to a population, estimating unknown parameters, drawing conclusions, making decisions.
  • 17.
    1.3 Uses of Statistics (LO1-1) Auditing: Sample from over 12,000 invoices to estimate the proportion of
  • 18.
    incorrectly paid invoices. Marketing: Identify likely repeat customers for Amazon.com and suggest co-marketing opportunities based on a database of 5 million Internet purchases. 1.3 Uses of Statistics (LO1-1) Health Care: Evaluate 100 incoming patients using a 42-item physical and mental assessment questionnaire. Quality Improvement:
  • 19.
    Initiate a triple inspection program, setting penalties for workers who produce poor-quality output. 1.3 Uses of Statistics (LO1-1) Purchasing: Determine the defect rate of a shipment and whether that rate has changed significantly over time. Medicine: Determine whether a new drug is really better than the placebo or if the difference is due to chance.
  • 20.
    1.3 Uses of Statistics (LO1-1) Operations Management: Manage inventory by forecasting consumer demand. Product Warranty: Determine the average dollar cost of engine warranty claims on a new hybrid engine.
  • 21.
    1.4 Statistical Challenges (LO1-3) LO1-3: State the common challenges facing business professionals using statistics. The Ideal Data Analyst • Is technically current (e.g., software-wise). • Communicates well. • Is proactive. 1.4 Statistical Challenges (LO1-3) The Ideal Data Analyst
  • 22.
    • Has a broad outlook. • Is flexible. • Focuses on the main problem. • Meets deadlines. 1.4 Statistical Challenges (LO1-3) The Ideal Data Analyst • Knows his/her limitations and is willing to ask for help. • Can deal with imperfect information. • Has professional integrity.
  • 23.
    1.4 Statistical Challenges (LO1-3) Imperfect Data and Practical Constraints: State any assumptions and limitations and use generally accepted statistical tests to detect unusual data points or to deal with missing data. You will face constraints on the type and quality of data you can collect. 1.4 Statistical Challenges
  • 24.
    (LO1-3) Business Ethics Some broad ethical responsibilities of business are • Treating customers in a fair and honest manner. • Complying with laws that prohibit discrimination. • Ensuring that products and services meet safety regulations. 1.4 Statistical Challenges (LO1-3) Business Ethics
  • 25.
    Some broad ethical responsibilities of business are (continued) • Standing behind warranties. • Advertising in a factual and informative manner. • Encouraging employees to ask questions and voice concerns about the company’s business practices. • Being responsible for accurately reporting information to management. 1.4 Statistical Challenges (LO1-3)
  • 26.
    Upholding Ethical Standards Ethical standards for the data analyst: • Know and follow accepted procedures. • Maintain data integrity. • Carry out accurate calculations. 1.4 Statistical Challenges (LO1-3) Upholding Ethical Standards Ethical standards for the data analyst (continued):
  • 27.
    • Report procedures faithfully. • Protect confidential information. • Cite sources. • Acknowledge sources of financial support. 1.4 Statistical Challenges (LO1-3) Using Consultants: Hire consultants at the beginning of the project, when your team lacks certain skills or when an unbiased or informed view is needed.
  • 28.
    1.4 Statistical Challenges (LO1-3) Communicating with Numbers • Numbers have meaning only when communicated in the context of a certain situation. • Presentation should be such that managers will quickly understand the information they need to use in order to make good decisions. 1.4 Statistical Challenges
  • 29.
    (LO1-3) Skills Needed for Success in Business (Table 1.1)
    For initial job success | For long-range job success | Common weaknesses
    Report-writing | Managerial accounting | Communication skills
    Accounting principles | Managerial economics | Writing skills
    Mathematics | Managerial finance | Immaturity
    Statistics | Oral communication | Unrealistic expectations
  • 30.
    1.5 Critical Thinking • Statistics is an essential part of critical thinking because it allows us to test an idea against empirical evidence. • Empirical data represent data collected through observation and experiments. • Statistical tools are used to compare prior ideas with empirical data, but pitfalls do occur. 1.5 Critical Thinking
  • 31.
    (LO1-4) LO1-4: List and explain common statistical pitfalls. Pitfall 1: Making Conclusions about a Large Population from a Small Sample. Be careful about making generalizations from small samples (e.g., a group of 10 patients who showed improvement). 1.5 Critical Thinking (LO1-4) Pitfall 2: Making Conclusions from Nonrandom Samples.
  • 32.
    Be careful about making generalizations from small samples and from retrospective studies of special groups (e.g., studying heart attack patients without defining matched control group). 1.5 Critical Thinking (LO1-4) Pitfall 3: Conclusions From Rare Events. Be careful about drawing strong inferences from events that are not surprising when looking at the entire population (e.g., winning the lottery). Pitfall 4: Using Poor Survey Methods. Be careful about using poor sampling methods or vaguely worded
  • 33.
    questions (e.g., anonymous survey or quiz). 1.5 Critical Thinking (LO1-4) Pitfall 5: Assuming a Causal Link Based on Observations. Be careful about drawing conclusions when no cause-and-effect link exists (e.g., most shark attacks occur between 12 p.m. and 2 p.m.).
  • 34.
    1.5 Critical Thinking (LO1-4) Pitfall 6: Generalization to Individuals from Observations about Groups. Avoid reading too much into statistical generalizations (e.g., men are taller than women). 1.5 Critical Thinking (LO1-4) Pitfall 7: Unconscious Bias. Be careful about unconsciously or subtly allowing bias to color handling of data (e.g., heart disease in men vs.
  • 35.
    women). Pitfall 8: Significance versus Importance. Statistically significant effects may lack practical importance (e.g., Austrian military recruits born in the spring average 0.6 cm taller than those born in the fall). Chapter 2 Power Point Slides.pdf Chapter 2: Data Collection Chapter Contents 2.1 Definitions 2.2 Level of Measurement 2.3 Sampling
  • 36.
    Concepts 2.4 Sampling Methods 2.5 Data Sources 2.6 Surveys Chapter 2: Data Collection Chapter Learning Objectives LO2-1: Use basic terminology for describing data and samples. LO2-2: Explain the distinction between numerical and categorical data. LO2-3: Explain the difference between time series and cross-sectional data. LO2-4: Recognize levels of measurement in data and
  • 37.
    ways of coding data. LO2-5: Recognize a Likert scale and know how to use it. Chapter 2: Data Collection Chapter Learning Objectives LO2-6: Use the correct terminology for samples and populations. LO2-7: Explain the common sampling methods and how to implement them. LO2-8: Find everyday print or electronic data sources. LO2-9: Describe basic elements of survey design, survey types,
  • 38.
    and sources of error. 2.1 Definitions (LO2-1) LO2-1: Use basic terminology for describing data and samples. Observations, Variables, Data Sets • Observation: a single member of a collection of items that we want to study, such as a person, firm, or region. • Variable: a characteristic of the subject or individual, such as an employee’s income or an invoice amount. • Data Set: consists of all the values of all of the variables for all of the observations we have chosen to observe.
  • 39.
    2.1 Definitions Table 2.2: Number of Variables and Typical Tasks
    Data Set | Variables | Example | Typical Tasks
    Univariate | One | Income | Histograms, descriptive statistics, frequency tallies
    Bivariate | Two | Income, Age | Scatter plots, correlations, regression modeling
    Multivariate | More than two | Income, Age, Gender | Multiple regression, data mining, econometric modeling
  • 40.
    Data Types (LO2-2) LO2-2: Explain the distinction between numerical and categorical data. (Figure 2.1) • Note: Ambiguity is introduced when continuous data are rounded to whole numbers. Be cautious. Time Series versus Cross-Sectional Data (LO2-3) LO2-3: Explain the difference between time series and cross-sectional data.
  • 41.
    Time Series Data • Each observation in the sample represents a different equally spaced point in time (e.g., years, months, days). • Periodicity may be annual, quarterly, monthly, weekly, daily, hourly, etc. • We are interested in trends and patterns over time (e.g., personal bankruptcies from 1980 to 2008). Time Series Versus Cross-Sectional Data (LO2-3) Cross-Sectional Data • Each observation represents a different individual unit (e.g., person) at the same point in time (e.g., monthly VISA balances). • We are interested in: - variation among observations or - relationships.
  • 42.
    • We can combine the two data types to get pooled cross-sectional and time series data. 2.2 Level of Measurement (LO2-4) LO2-4: Recognize levels of measurement in data and ways of coding data. Levels of Measurement
  • 43.
    Level of Measurement | Characteristics | Example
    Nominal | Categories only | Eye color (blue, brown, green, etc.)
    Ordinal | Rank has meaning. No clear meaning to distance | Rarely, never
    Interval | Distance has meaning | Temperature (57° Celsius)
    Ratio | Meaningful zero exists | Accounts payable ($21.7 million)
    2.2 Level of Measurement (LO2-4)
  • 44.
    Nominal Measurement • Nominal data merely identify a category. • Nominal data are qualitative, attribute, categorical or classification data and can be coded numerically (e.g., 1 = Apple, 2 = Compaq, 3 = Dell, 4 = HP). • Only mathematical operations are counting (e.g., frequencies) and simple statistics. Ordinal Measurement • Ordinal data codes can be ranked (e.g., 1 = Frequently, 2 = Sometimes, 3 = Rarely, 4 = Never). 2.2 Level of Measurement (LO2-4) Ordinal Measurement • Distance between codes is not meaningful
  • 45.
    (e.g., distance between 1 and 2, or between 2 and 3, or between 3 and 4 lacks meaning). • Many useful statistical tests exist for ordinal data. Especially useful in social science, marketing and human resource research. Interval Measurement • Data can not only be ranked, but also have meaningful intervals between scale points (e.g., difference between …). 2.2 Level of Measurement (LO2-4) Interval Measurement • Since intervals between numbers represent distances, mathematical operations can be performed (e.g., average). • Zero point of interval scales is arbitrary, so ratios are not meaningful.
  • 46.
    Ratio Measurement • Ratio data have all properties of nominal, ordinal and interval data types and also possess a meaningful zero (absence of quantity being measured). 2.2 Level of Measurement (LO2-4) Ratio Measurement • Because of this zero point, ratios of data values are meaningful (e.g., $20 million profit is twice as much as $10 million). • Zero does not have to be observable in the data; it is an absolute reference point.
  • 47.
    2.2 Level of Measurement (LO2-5) LO2-5: Recognize a Likert scale and know how to use it. Likert Scales • A special case of interval data frequently used in survey research. • The coarseness of a Likert scale refers to the number of scale points (typically 5 or 7). 2.2 Level of Measurement (LO2-5) Likert Scales (examples) “College-bound high school students should be required to
  • 48.
    study a foreign language.” (check one)
    Strongly Agree | Somewhat Agree | Neither Agree Nor Disagree | Somewhat Disagree | Strongly Disagree
    How would you rate your marketing instructor? (check one)
    Terrible | Poor | Adequate | Good | Excellent
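    Survey responses on a Likert scale are usually stored as numeric codes. The sketch below codes the five labels from the first example as 1 to 5 and averages them, which is only defensible under the slides' treatment of Likert data as interval. The coding direction (5 = Strongly Agree), the dictionary name, and the sample responses are assumptions made for illustration.

```python
from statistics import mean

# Hypothetical coding scheme: a higher code means stronger agreement.
LIKERT_CODES = {
    "Strongly Agree": 5,
    "Somewhat Agree": 4,
    "Neither Agree Nor Disagree": 3,
    "Somewhat Disagree": 2,
    "Strongly Disagree": 1,
}


def code_responses(responses):
    """Translate Likert labels into their numeric codes."""
    return [LIKERT_CODES[r] for r in responses]


# Made-up responses from five survey participants.
responses = ["Strongly Agree", "Somewhat Agree", "Somewhat Agree",
             "Neither Agree Nor Disagree", "Strongly Disagree"]
codes = code_responses(responses)
print(codes)        # numeric codes, in the order the responses were given
print(mean(codes))  # average code; meaningful only if intervals are treated as equal
```

    The number of entries in the coding dictionary is the coarseness of the scale (5 points here).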
  • 49.
    2.2 Level of Measurement (LO2-4) Use the following procedure to recognize data types:
    Question | If “Yes”
    Q1. Is there a meaningful zero point? | Ratio data (statistical operations are allowed)
    Q2. Are intervals between scale points meaningful? | Interval data (common statistics allowed, e.g., means and standard deviations)
    Q3. Do scale points represent rankings? | Ordinal data (restricted to certain types of nonparametric statistical tests)
    Q4. Are there discrete categories? | Nominal data (only counting allowed, e.g., finding the mode)
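    The four questions above are asked in order, and the first “yes” decides the level. A minimal sketch of that procedure follows; the function name and the yes/no answers supplied for each example are assumptions, while the examples themselves come from the levels-of-measurement table in the slides.

```python
def measurement_level(meaningful_zero, meaningful_intervals, ranked, categories):
    """Apply questions Q1 to Q4 in order and return the first level that fits."""
    if meaningful_zero:
        return "ratio"
    if meaningful_intervals:
        return "interval"
    if ranked:
        return "ordinal"
    if categories:
        return "nominal"
    return None  # none of the four questions was answered "yes"


# Answers to (Q1, Q2, Q3, Q4) for the examples in the slides' table.
examples = {
    "Accounts payable ($21.7 million)": (True, True, True, True),
    "Temperature (57 degrees Celsius)": (False, True, True, True),
    "Rarely / never responses": (False, False, True, True),
    "Eye color": (False, False, False, True),
}
for name, answers in examples.items():
    print(f"{name}: {measurement_level(*answers)}")
```

    Checking the highest level first matters because ratio data also satisfy Q2 to Q4; asking in the reverse order would label everything nominal.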
  • 50.
    Changing Data By Recoding
    • In order to simplify data, or when exact data magnitude is of little interest, ratio data can be recoded downward into ordinal or nominal measurements (but not conversely).
    • For example, recode systolic blood pressure as “normal” (under 130), “elevated” (130 to 140), or “high” (over 140).
    • The above recoded data are ordinal (ranking is preserved), but intervals are unequal and some information is lost.
    2.3 Sampling Concepts
  • 51.
    LO2-6: Use the correct terminology for samples and populations.
    Sample or Census
    • A sample involves looking only at some items selected from the population.
    • A census is an examination of all items in a defined population.
    • Why can’t the United States Census survey every person in the population? Mobility, undocumented workers, budget constraints, incomplete responses, etc.
    Situations Where A Sample or Census May Be Preferred
    Sample                     Census
  • 52.
    Infinite population        Small population
    Destructive testing        Large sample size
    Timely results             Database exists
    Accuracy                   Legal requirements
    Cost
    Sensitive information
    Parameters and Statistics
    • Statistics are computed from a sample of n items, chosen from a population of N items.
  • 53.
    • Statistics can be used as estimates of parameters found in the population.
    • Symbols are used to represent population parameters and sample statistics.
    Rule of Thumb: A population may be treated as infinite when N is at least 20 times n (i.e., when N/n ≥ 20).
    Target Population
    • The population must be carefully specified and the sample
  • 54.
    must be drawn scientifically so that the sample is representative.
    • The target population is the population we are interested in (e.g., U.S. gasoline prices).
    • The sampling frame is the group from which we take the sample (e.g., 115,000 stations).
    • The frame should not differ from the target population.
    2.4 Sampling Methods
    LO2-7: Explain the common sampling methods and how to implement them.
    Random Sampling
    Simple random sample: Use random numbers to select items from a list (e.g., VISA cardholders).
  • 55.
    Systematic sample: Select every kth item from a list or sequence (e.g., restaurant customers).
    Stratified sample: Select randomly within defined strata (e.g., by age, occupation, gender).
    Cluster sample: Like stratified sampling except strata are geographical areas (e.g., zip codes).
    Non-random Sampling
    Judgment sample: Use expert knowledge to choose “typical” items (e.g., which employees to interview).
    Convenience sample:
  • 56.
    Use a sample that happens to be available (e.g., ask co-worker opinions at lunch).
    Focus groups: In-depth dialog with a representative panel of individuals (e.g., iPod users).
    With or Without Replacement
    • If we allow duplicates when sampling, then we are sampling with replacement.
    • Duplicates are unlikely when n is much smaller than N.
    • If we do not allow duplicates when sampling, then we are sampling without replacement.
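The difference between the two schemes, and the claim that duplicates are unlikely when n is much smaller than N, can be checked with a short Python sketch. The population size 875 and sample size 10 are borrowed from the Excel example in this section; the helper name `prob_no_duplicates` is an illustrative assumption.

```python
import random


def prob_no_duplicates(n, N):
    """Probability that n draws WITH replacement from N items are all distinct."""
    p = 1.0
    for i in range(n):
        p *= (N - i) / N   # the (i+1)-th draw must avoid the i items already seen
    return p


rng = random.Random(7)                 # fixed seed so the run is repeatable
population = range(1, 876)             # items numbered 1..875

with_replacement = rng.choices(population, k=10)     # duplicates are allowed
without_replacement = rng.sample(population, k=10)   # duplicates are impossible

print(with_replacement)
print(without_replacement)
print(round(prob_no_duplicates(10, 875), 3))
```

With n = 10 and N = 875 the chance of seeing no duplicate is roughly 95%, which is why the two schemes behave almost identically for small samples from large lists.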
  • 57.
    Computer Methods
    Excel - Option A: Enter the Excel function =RANDBETWEEN(1,875) into 10 spreadsheet cells. Press F9 to get a new sample.
    Excel - Option B: Enter the function =INT(1+875*RAND()) into 10 spreadsheet cells. Press F9 to get a new sample.
    Internet: The website www.random.org will give you many kinds of excellent random numbers (integers, decimals, etc.).
    Minitab: Use Minitab’s Random Data menu with the Integer option.
    Software generators such as these are pseudo-random because even the best algorithms eventually repeat themselves.
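For readers working outside Excel, the same draw can be sketched in Python with the standard `random` module, which is also a pseudo-random generator. The function names below are illustrative assumptions; the range 1 to 875 and the sample of 10 come from the Excel options above.

```python
import random


def draw_sample_ids(n_items, population_size, seed=None):
    """Mimic =RANDBETWEEN(1, population_size) in n_items cells (duplicates possible)."""
    rng = random.Random(seed)
    return [rng.randint(1, population_size) for _ in range(n_items)]


def draw_unique_ids(n_items, population_size, seed=None):
    """Variant that samples without replacement, so no ID can repeat."""
    rng = random.Random(seed)
    return rng.sample(range(1, population_size + 1), n_items)


print(draw_sample_ids(10, 875))   # a fresh sample on every run, like pressing F9
print(draw_unique_ids(10, 875))
```

Passing a fixed `seed` reproduces the same sample, which is useful when the selection has to be documented.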
  • 58.
    Row – Column Data Arrays
    • When the data are arranged in a rectangular array, an item can be chosen at random by selecting a row and column.
    • For example, in the 4 x 3 array, select a random column between 1 and 3 and a random row between 1 and 4.
    • This way, each item has an equal chance of being selected.
    Randomizing a List
    • In Excel, use function =RAND() beside each row to create a column of random numbers between 0 and 1.
  • 59.
    • Copy and paste these numbers into the same column using Paste Special > Values in order to paste only the values and not the formulas.
    • Sort the spreadsheet on the random number column.
    Systematic Sampling
    • Sample by choosing every kth item from a list, starting from a randomly chosen entry on the list.
    • For example, starting at item 2, we sample every 4 items to obtain a sample of n = 20 items from a list of N = 78 items.
    • Note that N/n = 78/20 ≈ 4 (periodicity).
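The systematic sampling example above (N = 78, every 4th item, starting at item 2) can be reproduced with a few lines of Python. This is a sketch under the assumption that the list is numbered from 1; the function name is hypothetical.

```python
def systematic_sample(items, k, start):
    """Return every k-th item, beginning at the 1-based position `start`."""
    if k < 1 or not 1 <= start <= len(items):
        raise ValueError("k must be positive and start must index the list")
    return items[start - 1::k]


population = list(range(1, 79))                 # items numbered 1..78
sample = systematic_sample(population, k=4, start=2)
print(len(sample), sample[0], sample[-1])       # 20 2 78
```

In practice the starting position would itself be chosen at random, for example with `random.randint(1, k)`, rather than fixed at 2.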
  • 60.
    Stratified Sampling
    • Utilizes prior information about the population.
    • Applicable when the population can be divided into relatively homogeneous subgroups of known size (strata).
    • A simple random sample of the desired size is taken within each stratum.
    • For example, from a population containing 55% males and 45% females, randomly sample 110 males and 90 females (n = 200).
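The 55%/45% example can be sketched in Python as proportional allocation followed by a simple random sample inside each stratum. The population of 1,000 labelled members is a made-up illustration, and `stratified_sample` is a hypothetical helper, not a library function.

```python
import random


def stratified_sample(strata, n, seed=None):
    """Draw a proportional simple random sample from each stratum.

    strata: dict mapping a stratum name to the list of its members.
    n: total sample size; each stratum gets round(n * share) items.
    """
    rng = random.Random(seed)
    total = sum(len(members) for members in strata.values())
    picked = {}
    for name, members in strata.items():
        size = round(n * len(members) / total)   # proportional allocation
        picked[name] = rng.sample(members, size)
    return picked


# Hypothetical population: 550 males and 450 females (55% / 45%).
population = {
    "male": [f"M{i}" for i in range(550)],
    "female": [f"F{i}" for i in range(450)],
}
picked = stratified_sample(population, n=200, seed=1)
print({name: len(members) for name, members in picked.items()})  # {'male': 110, 'female': 90}
```

When the shares do not divide evenly, rounding can make the stratum sizes add up to slightly more or less than n, so the allocation may need a manual adjustment.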
  • 61.
    Cluster Sample
    • Strata consist of geographical regions.
    • One-stage cluster sampling: sample consists of all elements in each of k randomly chosen subregions (clusters).
    • Two-stage cluster sampling: first choose k subregions (clusters), then choose a random sample of elements within each cluster.
    • Here is an example of 4 elements sampled from each of 3 randomly chosen clusters (two-stage cluster sampling).
  • 62.
    Judgment Sample
    • A non-probability sampling method that relies on the expertise of the sampler to choose items that are representative of the population.
    • Can be affected by subconscious bias (i.e., non-randomness in the choice).
    • Quota sampling is a special kind of judgment sampling, in which the interviewer chooses a certain number of people in each category.
    Convenience Sample
  • 63.
    • Take advantage of whatever sample is available at that moment. A quick way to sample.
    Focus Groups
    • A panel of individuals chosen to be representative of a wider population, formed for open-ended discussion and idea gathering.
    2.5 Data Sources
    LO2-8: Find everyday print or electronic data sources.
    • One goal of a statistics course is to help you learn where to find data that might be needed. Fortunately, many excellent sources are widely available. Some sources are given in the following table.
  • 64.
    2.6 Surveys
    LO2-9: Describe basic elements of survey design, survey types, and sources of error.
    Basic Steps of Survey Research
    • Step 1: State the goals of the research.
    • Step 2: Develop the budget (time, money, staff).
    • Step 3: Create a research design (target population, frame, sample size).
    • Step 4: Choose a survey type and method of administration.
  • 65.
    • Step 5: Design a data collection instrument (questionnaire).
    • Step 6: Pretest the survey instrument and revise as needed.
    • Step 7: Administer the survey (follow up if needed).
    • Step 8: Code the data and analyze it.
    Survey Types               Survey Guidelines
    Mail                       Planning
  • 66.
    Telephone                  Design
    Interviews                 Quality
    Web                        Pilot test
    Direct observation         Buy-in
                               Expertise
    Questionnaire Design
    • Use a lot of white space in layout.
    • Begin with short, clear instructions.
    • State the survey purpose.
  • 67.
    • Instruct on how to submit the completed survey.
    • Assure anonymity.
    • Break survey into naturally occurring sections.
    • Let respondents bypass sections that are not applicable (e.g., “if you answered no to Question 7, skip directly to Question 15”).
    • Pretest and revise as needed.
    • Keep as short as possible.
    Types of Questions
    • Open-ended
  • 68.
    • Fill-in-the-blank
    • Check boxes
    • Ranked choices
    • Pictograms
    • Likert scale
    Question Wording
    • The way a question is asked has a profound influence on the response. For example,
    1. Shall state taxes be cut?
    2. Shall state taxes be cut, if it means reducing highway maintenance?
    3. Shall state taxes be cut, if it means firing teachers and police?
  • 69.
    • Make sure you have covered all the possibilities.
    • Overlapping classes or unclear categories are a problem. For example, if the question “How old is your father?” offers age classes ending at 45, 55, and 65, what should you check if your father is deceased or is 45 years old?
  • 70.
    Coding and Data Screening
    • Responses are usually coded numerically (e.g., 1 = male, 2 = female).
    • Missing values are typically denoted by special characters (e.g., blank, “.” or “*”).
    • Discard questionnaires that are flawed or missing many responses.
    • Watch for multiple responses, outrageous or inconsistent replies, or out-of-range answers.
    • Follow up if necessary and always document your data-coding decisions.
    Sample Problems - Chapters 1 & 2.pdf
    Sample Problems
    1) For the following situation indicate whether the statistical application is primarily descriptive or inferential.
  • 71.
    “The manager of Anna’s Fabric Shop has collected data for 10 years on the number of each type of dress fabric that has been sold at the store. She is interested in making a presentation that will illustrate these data effectively.”
    This application is primarily descriptive in nature. The owner wishes to develop a presentation. She will most likely use charts, graphs, tables and numerical measures to describe her data.
    2) Consider the following graph that appeared in a company annual report. What type of graph is this? Explain.
    The graph is a bar chart. A bar chart displays values associated with categories. In this case the categories are the departments at the food store. The values are the total monthly sales (in dollars) in each department. A bar chart also typically has gaps between the bars. A histogram has no gaps and the horizontal axis represents the possible values for a numerical variable.
    3) Consider the figures below. What differences do you see between the histogram and the bar chart?
  • 72.
    A bar chart is used whenever you want to display data that has already been categorized, while a histogram is used to display data over a range of values for the factor under consideration. Another fundamental difference is that there typically are gaps between the bars on a bar chart but there are no gaps between the bars of a histogram.
    4) Consider that you are working for an advertising firm. Provide an example of how hypothesis testing can be used to evaluate a product claim.
    Businesses often make claims about their products that can be tested using hypothesis testing. For example, it is not enough for a pharmaceutical company to claim that its new drug is effective in treating a disease. In order for the drug to be approved by the Food and Drug Administration, the company must present sufficient evidence that the drug first does no harm and that it also provides an effective treatment against the disease. The claims that the drug does no harm and is an effective treatment can be tested using hypothesis testing.
    5) In what situations might a decision maker need to use statistical inferences?
    Statistical inference procedures are useful in situations where a decision maker needs to reach an estimate about a population based on a subset of data taken from the population. For example, a decision maker might want to know the starting annual salary of all attorneys in the United States. If it is not feasible or possible to look at the salary data for all attorneys, the decision maker could look
  • 73.
    at a subset of attorneys and use statistical inference to reach a conclusion about the population of all attorneys.
    6) Explain under what circumstances you would use hypothesis testing as opposed to an estimation procedure.
    Hypothesis testing is used whenever one is interested in testing claims that concern a population. Using information taken from samples, hypothesis testing evaluates the claim and makes a conclusion about the population from which the sample was taken. Estimation is used when we are interested in knowing something about all the data, but the population is too large, or the data set is too big for us to work with all the data. In estimation, no claim is being made or tested.
    7) Discuss any advantages a graph showing a whole set of data has over a single measure, such as an average.
    The major advantage of a graph is that it allows a more complete representation of the information in the data. Not only can a decision maker visualize the center of the data but also how spread out the data is. An average, for instance, nicely represents the center of a data set, but contains no information about how spread out the data is.
    8) Discuss any advantages a single measure, such as an average, has over a table showing a whole set of data.
  • 74.
    By its nature, a single measure is just one value and therefore is simpler than a table. It allows an easy method of comparison between two or more data sets, something that is more difficult if the data sets are represented in tabular form. In addition, although not mentioned in this chapter, additional statistical techniques, such as hypothesis testing and estimation, involve calculations based on a single measure from a subset of population data.
    9) Suppose a survey is conducted using a telephone survey method. The survey is conducted from 9 a.m. to 11 a.m. on Tuesday. Indicate what potential problems the data collectors might encounter.
    There will likely be a high rate of nonresponse bias, since many people who work days will not be home during the 9-11 a.m. time slot. Also, the data collectors need to be careful where they get the phone number list, as some people do not have listed phones in phone books and others have no phone or only a cell phone. This may result in selection bias.
    10) For each of the following situations, indicate what type of data collection method you would recommend and discuss why:
    a) Collecting data on the percentage of bike riders who wear helmets.
    b) Collecting data on the price of regular unleaded gasoline at gas stations in your state.
  • 75.
    c) Collecting data on customer satisfaction with the service provided by a major US airline.
    a) Observation would be the most likely method. Observers could be located at various bike routes and observe the number of riders with and without helmets. This would likely be better than asking people if they wear a helmet, since the popular response might be to say yes even when they don’t always do so.
    b) A telephone survey of gas stations in the state. This could be a cost-effective way of getting data from across the state. The respondent would have the information and be able to provide the correct price.
    c) A written survey of passengers. This could be given out on the plane before the plane lands, and passengers could drop the surveys in a box as they deplane. This method would likely garner higher response rates compared to sending the survey to passengers’ mailing addresses and asking them to return the completed survey by mail.
    11) Indicate which sampling method would most likely be used in each of the following situations:
    a) An interview conducted with mayors of a sample of cities in Florida.
    b) A poll of voters regarding a referendum calling for a national value-added tax.
    c) A survey of customers entering a shopping mall in
  • 76.
    Minneapolis.
    a) Because the population is spread over a large geographical area, a cluster random sample could be selected to reduce travel costs.
    b) A stratified random sample would probably be used to keep sample size as small as possible.
    c) Most likely a convenience sample would be used, since doing a statistical sample would be too difficult.
    12) A company has 18,000 employees. The file containing the names is ordered by employee number from 1 to 18,000. If a sample of 100 employees is to be selected from the 18,000 using systematic random sampling, within what range of employee numbers will the first employee be selected?
    To determine the range of employee numbers for the first employee selected in a systematic random sample, use the following: Part range = Population Size/Sample Size = 18,000/100 = 180. Thus, the first person selected will come from employees 1-180. Once that person is randomly selected, the second person will be the one numbered 180 higher than the first, and so on.
    13) Describe how systematic random sampling could be used to
  • 77.
    select a random sample of 1,000 customers who have a CD at a commercial bank. Assume that the bank has 25,000 customers who own a CD.
    From a numbered list of all customers who own a CD, the bank would need to randomly determine a starting point between 1 and k, where k would be equal to 25,000/1,000 = 25. This could be done using a random number table or by having a statistical package or a spreadsheet generate a random number between 1 and 25. Once this value is determined, the bank would select that numbered customer as the first sampled customer and then select every 25th customer after that until 1,000 customers are sampled.
    14) If the manager at First City Bank surveys a sample of 100 customers to determine how many miles they live from the bank, is the mean travel distance for this sample considered a parameter or a statistic?
    Values computed from a sample are always considered statistics. In order for a value, such as an average, to be considered a parameter, it must be computed from all items in the population.
    15) For each of the following, indicate whether the data are cross-sectional or time-series:
    a) Quarterly employment rates
    b) Unemployment rates by state
    c) Monthly sales
  • 78.
    d) Employment satisfaction data for a company.
    a) Time-series
    b) Cross-sectional
    c) Time-series
    d) Cross-sectional
    16) For each of the following variables, indicate the level of data measurement:
    a) Product rating (1 = excellent, 2 = good, 3 = fair, 4 = poor, 5 = very poor)
    b) Home ownership (own, rent, other)
    c) College GPA
    d) Marital Status (single, married, divorced, other)
    a) Ordinal – categories with defined order
    b) Nominal – categories with no defined order
    c) Ratio
    d) Nominal – categories with no defined order
  • 79.
    17) Consumer Reports, in its ratings of cars, indicates repair history with circles. The circles are black, white, or half-and-half. To which level of data does this correspond?
    Since the circles involve a ranking from best to worst, this would be ordinal data.
    Chapters 1 & 2 Lecture Power Point Slides.pdf
    Chapter 1 – Overview of Statistics
    Chapter 2 – Data Collection
    ©2006 Thomson/South-Western
    Areas of Business that Rely on Statistics
    • Yearly Reports
  • 80.
    Basic Definitions
    • Descriptive Statistics: the collection and description of data
    • Inferential Statistics: analyzing, decision making or estimation based on the data
    • … set of all possible measurements that is of interest
    • … of the population from which information is gathered
    • Random Sample: a sample in which each item in the population has an equal chance of being selected
    • … selection of all population
  • 81.
    items
    • … calculated from the population
    • … calculated from the sample
    • Discrete Data: data that contains only integers or counting numbers – usually the result of counting something
    • … value over a particular range is possible – usually the result of measuring something
    Level of
  • 82.
    Measurement for Numerical Data
    • … merely labels or assigned numbers
    • … arranged in order such as worst to best or best to worst
    • … arranged in order and the difference between numbers has meaning
    • … from interval data in that there is a definite zero point
    Types of Data
    Numerical data
    Data Types: Qualitative, Quantitative
    Levels of Measurement: Nominal, Ordinal, Interval, Ratio
    Discrete; Discrete or continuous
  • 83.
    Sources of Data
    • Primary data come from an original (primary) source and are collected with specific research questions in mind
    • Secondary data represent previously recorded data collected for another purpose or as part of a regularly scheduled data collection procedure
    Data Collection
    • Frequently used data collection methods: Experiments, Telephone Surveys, Written Questionnaires and Surveys, Direct
  • 84.
    Observation and Personal Interviews
    • … to be aware of: Interviewer bias, Non-response bias, Selection bias, Observer bias, Measurement error, Internal validity, External validity
    Random Sampling versus Nonrandom Sampling
    • … Sampling ensures that the sample obtained is representative of the population
    • … or nonprobability samples are generated using a deliberate selection procedure
  • 85.
    Generating Random Numbers
    Example (see excel sheet)
    • … their 300 employees. Employees are numbered 1 to 300.
    • … Excel 2007 to generate 10 random numbers between 1 and 300. Values must be integer numbers corresponding to employee numbers
    Example 1
    • For the following situation, indicate whether the statistical application is primarily descriptive or inferential
    • The manager of Anna’s Fabric Shop has collected data for 10 years on the number of each
  • 86.
    type of dress fabric that has been sold in the store. She is interested in making a presentation that will illustrate these data effectively.
    Example 2
    • For the following situations, indicate the type of data collection method to use:
    • Collecting data on the percentage of bike riders who wear helmets
    • Collecting data on the price of regular unleaded gasoline at gas stations in your state
    • Collecting data on customer satisfaction with the service provided by a major US
  • 87.
    airline
    Example 3
    • According to a national CNN/USA/Gallup survey of 1025 adults, conducted March 14 – 16, 2008, 63% say they have experienced a hardship because of rising gasoline prices. How do you believe the survey was conducted and what type of bias could occur in the data collection process?
    Example 4
    • Describe how systematic random sampling could be used to select a random sample of 1000
  • 88.
    customers who have a certificate of deposit at a commercial bank. Assume that the bank has 25000 customers who own a certificate of deposit.
    Example 5
    • If the manager at First City Bank surveys a sample of 100 customers to determine how many miles they live from the bank, is the mean travel distance for this sample considered a parameter or a statistic?
    Example 6
    • For each of the
  • 89.
    following, indicate whether the data are cross-sectional or time series:
    • Unemployment rates by state
    • Employment satisfaction data for a company
    Example 7
    • For each of the following variables, indicate the level of data measurement:
    • Product rating [1 = excellent, 2 = good, 3 = fair, 4 = poor, 5 = very poor]
    • Home ownership [own, rent, other]
    • Marital status [single, married, divorced, other]
  • 90.
    Example 8
    • A maker of energy drinks is considering abandoning can containers and going exclusively to bottles because the sales manager believes customers prefer drinking from bottles. However, the VP in charge of marketing is not convinced the sales manager is correct.
    • Indicate the data collection method you would use
    Example 8 (contd)
    • Indicate what procedures you would follow to apply this technique in this
  • 91.
    setting. Indicate which level of data measurement applies to the data you would collect. Are the data qualitative or quantitative?
    Zipped Chapter 3 Material.zip
    Chapter 3 Power Point Slides.pdf
    Chapter 3: Describing Data Visually
    Chapter Contents
    3.1 Stem-and-Leaf Displays and Dot Plots
    3.2 Frequency Distributions and Histograms
    3.3 Excel Charts
    3.4 Line Charts
  • 92.
    3.5 Bar Charts
    3.6 Pie Charts
    3.7 Scatter Plots
    3.8 Tables
    3.9 Deceptive Graphs
    Chapter Learning Objectives
    LO3-1: Make a stem-and-leaf or dot plot by hand or by computer.
    LO3-2: Create a frequency distribution for a data set.
    LO3-3: Make a histogram with appropriate bins.
    LO3-4: Identify skewness, modal classes, and outliers in a histogram.
    LO3-5: Make an effective line chart using Excel.
  • 93.
    LO3-6: Know the rules for effective bar charts and pie charts.
    LO3-7: Make and interpret a scatter plot using Excel.
    LO3-8: Make simple tables and pivot tables.
    LO3-9: Recognize deceptive graphing techniques.
    3.1 Stem-and-Leaf Displays and
  • 94.
    Dot Plots
    • Methods of organizing, exploring and summarizing data include:
      - Visual (charts and graphs): provides insight into characteristics of a data set without using mathematics.
      - Numerical (statistics or tables): provides insight into characteristics of a data set using mathematics.
    • Begin with univariate data (a set of n observations on one variable) and consider the following:
  • 95.
    • Measurement: Look at the data and visualize how they were collected and measured.
    • Sorting (Example: Price/Earnings Ratios): Sort the data and then summarize in a graphical display. Here are the sorted P/E ratios (values from Table 3.2).
  • 96.
    The type of graph you use to display your data depends on the type of data you have. Some charts are better suited for quantitative data, while others are better for displaying categorical data.
    Stem-and-Leaf Plot
    LO3-1: Make a stem-and-leaf or dot plot by hand or by computer.
    One simple way to visualize small data sets is a stem-and-leaf plot. The stem-and-leaf plot is a tool of exploratory data analysis (EDA) that seeks to reveal essential data features in an intuitive way. A stem-and-leaf plot is basically a frequency tally, except that we use digits instead of tally marks. For two-digit or three-digit integer data, the stem is the tens digit of the data, and the leaf is the ones digit.
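    The stem/leaf split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `stem_and_leaf` and `render` are made up for this example, and the sample values are hypothetical, not the P/E ratios from Table 3.2.

```python
from collections import defaultdict

def stem_and_leaf(values):
    """Group non-negative integers by stem (value // 10) with sorted leaves (value % 10)."""
    groups = defaultdict(list)
    for v in sorted(values):
        groups[v // 10].append(v % 10)
    # Keep equally spaced stems: every stem between the smallest and largest
    # is listed, even when it has no leaves.
    lo, hi = min(groups), max(groups)
    return {stem: groups.get(stem, []) for stem in range(lo, hi + 1)}

def render(plot):
    """Format the display as one 'stem | leaves' row per stem."""
    return "\n".join(
        f"{stem:>2} | {' '.join(str(leaf) for leaf in leaves)}"
        for stem, leaves in plot.items()
    )

# Hypothetical sample values chosen for illustration only.
sample = [7, 12, 15, 15, 18, 23, 31, 37, 37, 38, 50, 59]
plot = stem_and_leaf(sample)
print(render(plot))
```

    Concatenating a stem with each of its leaves recovers the raw values, which is the property that distinguishes this display from a plain tally.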
  • 97.
    For the 44 P/E ratios, the stem-and-leaf plot is given below.
    For example, the data values in the fourth stem are 31, 37, 37, 38. We always use equally spaced stems (even if some stems are empty). The stem-and-leaf can reveal central tendency (24 of the 44 P/E ratios were in the 10–19 stem) as well as dispersion (the range is from 7 to 59). In this illustration, the leaf digits have been sorted, although this is not necessary. The stem-and-leaf has the advantage that we can retrieve the raw data by concatenating a stem digit with each of its leaf digits. For example, the last stem has data values 50 and 59.
  • 98.
    Dot Plots
    • A dot plot is the simplest graphical display of n individual values of numerical data.
      - Easy to understand.
      - It reveals dispersion, central tendency, and the shape of the distribution.
    • Steps in Making a Dot Plot
      1. Make a scale that covers the data range.
      2. Mark the axes and label them.
      3. Plot each data value as a dot above the scale at its approximate location.
    Note: If more than one data value lies at about the same axis location, the dots are stacked vertically.
  • 99.
    • The range is from 7 to 59.
    • All but a few data values lie between 10 and 25.
    • A typical "middle" data value would be around 17 or 18.
    • The data are not symmetric due to a few large P/E ratios.
    Comparing Groups
    • A stacked dot plot compares two or more groups using a common
  • 100.
    X-axis scale.
    3.2 Frequency Distributions and Histograms
    LO3-2: Create a frequency distribution for a data set.
    Bins and Bin Limits
    • A frequency distribution is a table formed by classifying n data values into k classes (bins).
    • Bin limits define the values to be included in each bin. Widths must all be the same except when we have open-ended bins.
    • Frequencies are the number of observations within each bin.
    • Express as relative frequencies (frequency divided by the total) or percentages (relative frequency times 100).
  • 101.
    Constructing a Frequency Distribution
    - Herbert Sturges proposed the following rule: k = 1 + 3.3 log10(n), where k is the number of bins and n is the sample size.
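    Sturges' rule is commonly written as k = 1 + 3.3 log10(n). The sketch below assumes that form and rounds to the nearest whole bin, which is a choice made for this example rather than something the slides specify; the function name `sturges_bins` is hypothetical.

```python
import math

def sturges_bins(n):
    """Suggested number of bins under Sturges' rule, k = 1 + 3.3 * log10(n).

    Rounding to the nearest integer is an assumption of this sketch.
    """
    if n < 1:
        raise ValueError("n must be a positive sample size")
    return round(1 + 3.3 * math.log10(n))

# For the 44 P/E ratios discussed in section 3.1:
k = sturges_bins(44)
print(k)
```

    The result is a starting point only; bin limits are usually adjusted to "nice" values afterward.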
  • 102.
    Histograms
    • A histogram is a graphical representation of a frequency distribution.
    • A histogram is a bar chart. The Y-axis shows the frequency within each bin; X-axis ticks show the end points of each bin.
  • 103.
    LO3-3: Make a histogram with appropriate bins.
    • Consider 3 histograms for the P/E ratio data with different bin widths. What do they tell you?
    • Choosing the number of bins and bin limits in creating histograms requires judgment.
    • One can use software programs to create histograms
  • 104.
    with different bins. These include software such as:
      - Excel
      - MegaStat
      - Minitab
    Modal Class
    • A histogram bar that is higher than those on either side.
    • Unimodal – a single modal class.
    • Bimodal – two modal classes.
    • Multimodal – more than two modal classes.
    • Modal classes may be artifacts of the way bin limits are
  • 105.
    chosen.
    LO3-4: Identify skewness, modes, and outliers in a histogram.
    Shape
    • A histogram may suggest the shape of the population.
    • It is influenced by the number of bins and bin limits.
    • Skewness – indicated by the direction of the longer tail of
  • 106.
    the histogram.
      - Left-skewed – (negatively skewed) a longer left tail.
      - Right-skewed – (positively skewed) a longer right tail.
      - Symmetric – both tail areas are the same.
  • 107.
    Frequency Polygons and Ogives
    • A frequency polygon is a line graph that connects the midpoints of the histogram intervals, plus extra intervals at the beginning and end so that the line will touch the X-axis.
    • It serves the same purpose as a histogram, but is attractive when you need to compare two data sets (since more than one frequency polygon can be plotted on the same scale).
    • An ogive (pronounced "oh-jive") is a line graph of the cumulative frequencies.
    • It is useful for finding percentiles or in comparing the shape of the sample with a known benchmark such as the normal distribution
  • 108.
    (that you will be seeing in the next chapter).
    3.3 Excel Charts
    This section describes how to use Excel to create charts. Please refer to the text.
  • 109.
    3.4 Line Charts
    LO3-5: Make an effective line chart using Excel.
    Simple Line Charts
    • Used to display a time series or spot trends, or to compare time periods.
    • Can display several variables at once.
  • 110.
    • Two-scale line chart – used to compare variables that differ in magnitude or are measured in different units.
    Log Scales
    • Arithmetic scale – distances on the Y-axis are proportional to the magnitude of the variable being displayed.
    • Logarithmic scale – (ratio scale) equal distances represent equal ratios.
    • Use a log scale for the vertical axis when data vary over a wide range, say, by
  • 111.
    more than an order of magnitude.
    • This will reveal more detail for smaller data values.
    A log scale is useful for time series data that might be expected to grow at a compound annual percentage rate (e.g., GDP, the national debt, or your future income). It reveals whether the quantity is growing at an increasing percent (concave upward), constant percent (straight line), or declining percent (concave downward).
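    The claim that constant-percent growth plots as a straight line on a log scale can be checked numerically. The sketch below uses a hypothetical series (start value 100, 5% growth per period, 10 periods); the function name `compound_series` is made up for this illustration.

```python
import math

def compound_series(start, rate, periods):
    """Values of a quantity growing at a constant percentage rate per period."""
    return [start * (1 + rate) ** t for t in range(periods + 1)]

# Hypothetical series: 100 growing at 5% per period for 10 periods.
values = compound_series(100.0, 0.05, 10)

# Period-to-period changes on the arithmetic scale keep getting larger...
arith_steps = [b - a for a, b in zip(values, values[1:])]

# ...while changes on the log scale are all equal to log10(1.05),
# which is why the plotted line is straight.
log_values = [math.log10(v) for v in values]
log_steps = [b - a for a, b in zip(log_values, log_values[1:])]

print(arith_steps[0], arith_steps[-1])
print(log_steps[0], log_steps[-1])
```

    If the log-scale steps grew over time the curve would be concave upward (increasing percent growth); if they shrank, concave downward.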
  • 112.
    3.5 Bar Charts
    LO3-6: Know the rules for effective bar charts and pie charts.
    Simple Bar Charts
    • Most common way to display attribute data.
      - Bars represent categories or attributes.
      - Lengths of bars represent frequencies.
    Pareto Charts
    • Special type of bar chart used in quality management to display the frequency of defects or errors of different types.
    • Categories are
  • 113.
    displayed in descending order of frequency.
    • Focus on the significant few (i.e., the few categories that account for most defects or errors).
    Stacked Bar Chart
    • Bar height is the sum of several subtotals. Areas may be compared by color to show patterns in the subgroups and total.
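    The Pareto ordering described in this section (categories sorted by descending frequency, with attention on the few that dominate) can be sketched as a small table builder. The defect categories and counts below are hypothetical, and `pareto_table` is a name invented for this example.

```python
def pareto_table(counts):
    """Return (category, count, cumulative percent) rows sorted by descending count."""
    total = sum(counts.values())
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    rows = []
    running = 0
    for category, count in ordered:
        running += count
        rows.append((category, count, round(100 * running / total, 1)))
    return rows

# Hypothetical defect counts, for illustration only.
defects = {"scratch": 12, "dent": 45, "misalignment": 8, "paint": 30, "other": 5}
rows = pareto_table(defects)
for category, count, cum_pct in rows:
    print(f"{category:<14}{count:>4}{cum_pct:>8}")
```

    In this made-up data the first two categories account for 75% of all defects, which is the kind of "significant few" a Pareto chart is meant to expose.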
  • 114.
    3.6 Pie Charts
    LO3-6: Know the rules for effective bar charts and pie charts.
    An Oft-Abused Chart
    • A pie chart can only convey a general idea of the data.
    • Pie charts should be used to portray data which sum to a total (e.g., percent market shares).
    • A pie chart should only have a few (i.e., 2 or 3) slices.
    • Each slice can be labeled with data values or percents.
  • 115.
    • Consider the following charts used to illustrate an article from the Wall Street Journal. Which type appears to be better?
    Pie Chart Options
    • Exploded and 3-D pie charts add strong visual impact.
  • 116.
    3.7 Scatter Plots
    LO3-7: Make and interpret a scatter plot using Excel.
    • Scatter plots can convey patterns in data pairs that would not be apparent from a table.
    Refer to the text for Excel outputs.
    3.8 Tables
    • Tables are the simplest form of data display.
    • A compound table is a table that contains time
  • 117.
    series data down the columns and variables across the rows.
    • Arrangement of data is in rows and columns to enhance meaning.
    Example: School Expenditures
    • The data can be viewed by focusing on the time pattern (down the columns) or by comparing the variables (across the rows).
    • Units of measure are stated in the footnote.
    • Note merged headings to group columns.
    • See text for “Tips for Effective Bar and Column
  • 118.
    Charts” and “Tables.”
    LO3-8: Make simple tables and pivot tables.
    Here are some tips for creating effective tables:
      1. Keep the table simple, consistent with its purpose. Put summary tables in the main body of the written report and detailed tables in an appendix.
      2. Display the data to be compared in columns rather than rows.
      3. For presentation purposes, round off to three or four significant digits.
      4. Physical table layout should guide the eye toward the comparison you wish to emphasize.
      5. Row and column headings should be simple yet descriptive.
      6. Within a column, use a consistent number of decimal digits.
  • 119.
    3.9 Deceptive Graphs
    LO3-9: Recognize deceptive graphing techniques.
    Error 1: Nonzero Origin
    • A nonzero origin will exaggerate the trend.
    [Figure: two panels labeled "Deceptive" and "Correct"]
    Error 2: Elastic Graph Proportions
    • Keep the aspect ratio (width/height)
  • 120.
    below 2.00 so as not to exaggerate the graph. By default, Excel uses an aspect ratio of 1.68.
    Error 4: 3-D and Novelty Graphs
    • Can make trends appear to dwindle into the distance or loom towards you.
    Error 5: 3-D and Rotated Graphs
  • 121.
    • Can make trends appear to dwindle into the distance or loom towards you.
    Error 8: Complex Graphs
    • Avoid if possible. Keep your main objective in mind. Break the graph into smaller parts.
    Error 11: Area Trick
  • 122.
    • As figure height increases, so does width, distorting the graph.
    Other deceptive graphing techniques:
    • Error 3: Dramatic Title and Distracting Pictures
    • Error 6: Unclear Definitions or Scales
    • Error 7: Vague Sources
    • Error 9: Gratuitous Effects
    • Error 10: Estimated Data
    Sample Problems - Chapter 3.pdf
  • 123.
    Sample Problems
    1) Given the following data, develop a frequency distribution:
    Step 1: List the possible values. The possible values for the discrete variable are 0 through 12.
    Step 2: Count the number of occurrences at each value. The resulting frequency distribution is shown as follows:
    2) Assuming you have data for a variable with 2,000 values, using the 2^k ≥ n guideline, what is the least number of groups that should be used in developing a grouped data frequency distribution?
    Given n = 2,000, the minimum number of groups for a grouped data frequency distribution determined using the 2^k ≥ n guideline is: 2^k ≥ n, or 2^11 = 2048 ≥ 2000; use k = 11 groups.
  • 124.
    3) A study is being conducted in which a variable of interest has 1,000 observations. The minimum value in the dataset is 300 points and the maximum is 2,900 points.
    a) Use the 2^k ≥ n guideline to determine the minimum number of classes to use in developing a grouped data frequency distribution.
    b) Based on your answer in a), determine the class width that should be used.
    a) Given n = 1,000, the minimum number of classes for a grouped data frequency distribution determined using the 2^k ≥ n guideline is: 2^k ≥ n, or 2^10 = 1024 ≥ 1000; use k = 10 classes.
    b) Assuming that the number of classes that will be used is 10, the class width is determined as follows: w = (high – low)/classes = (2900 – 300)/10 = 2600/10 = 260.
  • 125.
    4) Produce the relative frequency distribution from a sample of size 50 that gave rise to the following ogive:
    Class          Frequency   Relative Frequency   Cumulative Relative Frequency
    0 – < 100      10          0.20                 0.20
    100 – < 200    10          0.20                 0.40
    200 – < 300    5           0.10                 0.50
    300 – < 400    5           0.10                 0.60
    400 – < 500    20          0.40                 1.00
    500 – < 600    0           0.00                 1.00
    5) You have the following data:
  • 126.
    a) Construct a frequency distribution for these data. Use the 2^k ≥ n guideline to determine the number of classes to use.
    b) Develop a relative frequency distribution using the classes you constructed in a).
    c) Develop a cumulative frequency distribution and a cumulative relative frequency distribution using the classes you constructed in a).
    d) Develop a histogram based on the frequency distribution you constructed in a).
    a) There are n = 60 observations in the data set. Using the 2^k ≥ n guideline, the number of classes, k, would be 6. The maximum and minimum values in the data set are 17 and 0, respectively. The class width is computed to be: w = (17 – 0)/6 = 2.833, which is rounded to 3. The frequency distribution is
    Class    Frequency
    0-2      6
    3-5      13
    6-8      20
    9-11     14
    12-14    5
    15-17    2
    Total = 60
  • 127.
    b) To construct the relative frequency distribution, divide the number of occurrences (frequency) in each class by the total number of occurrences. The relative frequency distribution is shown below.
    Class    Frequency   Relative Frequency
    0-2      6           0.100
    3-5      13          0.217
    6-8      20          0.333
    9-11     14          0.233
    12-14    5           0.083
    15-17    2           0.033
    Total = 60
    c) To develop the cumulative frequency distribution, compute a running sum for each class by adding the
  • 128.
    frequency for that class to the frequencies for all classes above it. The cumulative relative frequencies are computed by dividing the cumulative frequency for each class by the total number of observations. The cumulative frequency and the cumulative relative frequency distributions are shown below.
    Class    Frequency   Relative Frequency   Cumulative Frequency   Cumulative Relative Frequency
    0-2      6           0.100                6                      0.100
    3-5      13          0.217                19                     0.317
    6-8      20          0.333                39                     0.650
    9-11     14          0.233                53                     0.883
    12-14    5           0.083                58                     0.967
    15-17    2           0.033                60                     1.000
    Total = 60
  • 129.
    d) To develop the histogram, first construct a frequency distribution (see part a). The classes form the horizontal axis and the frequency forms the vertical axis. Bars corresponding to the frequency of each class are developed. The histogram based on the frequency distribution from part (a) is shown below.
    [Figure: Histogram; x-axis: Classes (0-2, 3-5, 6-8, 9-11, 12-14, 15-17); y-axis: Frequency (0 to 25)]
  • 130.
    6) Fill in the missing components of the following frequency distribution constructed for a sample size of 50:
    Class            Frequency   Relative Frequency   Cumulative Relative Frequency
    7.85 – < 7.95    6           0.12                 0.12
    7.95 – < 8.05    18          0.36                 0.48
    8.05 – < 8.15    12          0.24                 0.72
    8.15 – < 8.25    5           0.10                 0.82
    8.25 – < 8.35    9           0.18                 1.00
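    The arithmetic used in problems 2, 3 and 5 (smallest k with 2^k ≥ n, class width, relative and cumulative frequencies) can be reproduced with a short Python sketch. The function names are invented for this illustration; the class frequencies are the ones reported in the solution to problem 5, since the raw data are not listed here.

```python
from itertools import accumulate

def min_classes(n):
    """Smallest k satisfying the 2**k >= n guideline."""
    k = 1
    while 2 ** k < n:
        k += 1
    return k

def class_width(low, high, k):
    """Unrounded class width, (high - low) / k."""
    return (high - low) / k

def distribution(freqs):
    """Relative, cumulative, and cumulative relative frequencies for class counts."""
    n = sum(freqs)
    relative = [f / n for f in freqs]
    cumulative = list(accumulate(freqs))
    cumulative_relative = [c / n for c in cumulative]
    return relative, cumulative, cumulative_relative

# Class frequencies from the solution to problem 5 (classes 0-2 through 15-17).
freqs = [6, 13, 20, 14, 5, 2]
relative, cumulative, cumulative_relative = distribution(freqs)

print(min_classes(2000), min_classes(1000), min_classes(60))
print(class_width(300, 2900, 10))
print([round(r, 3) for r in relative])
print(cumulative)
print([round(c, 3) for c in cumulative_relative])
```

    The unrounded width for problem 5, (17 - 0)/6, is about 2.833; rounding it up to 3 is a judgment step that the code leaves to the reader.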
  • 131.
    7) The following cumulative frequency distribution summarizes data obtained in a study of the ending overages (in $) for the cash register balance at a firm:
    a) Determine the proportion of the days in which there were no shortages.
    b) Determine the proportion of the days the cash register was less than $20 off.
    c) Determine the proportion of the days the cash register was less than $40 over or at the most $20 short.
    a) Proportion of days in which no shortages occurred = 1 – proportion of days in which shortages occurred = 1 – 0.24 = 0.76.
    b) Less than $20 off implies that the overage was less than $20 and the shortage was less than $20 = (proportion of overages less than $20) – (proportion of shortages at most $20) = 0.56 – 0.08 = 0.48.
    c) Proportion of days with less than $40 over or at most $20 short = proportion of days with less than $40 over – proportion of days with more than $20 short = 0.96 – 0.08 = 0.88.
    8) You are given the following data:
  • 132.
    a) Construct a frequency distribution for these data.
    b) Based on the frequency distribution, develop a histogram.
    c) Construct a relative frequency distribution.
    d) Develop a relative frequency histogram.
    e) Compare the two histograms.
    a) The data do not require grouping. The following frequency distribution is given:
    x     Frequency
    0     0
    1     0
    2     1
    3     1
    4     10
    5     15
    6     13
    7     13
    8     5
    9     1
    10    1
  • 133.
    b) The following histogram could be developed.
    [Figure: histogram; x-axis: x variable (0 to 10); y-axis: Frequency (0 to 16)]
  • 134.
    c) The relative frequency distribution shows the fraction of values falling at each value of x.
    d) The relative frequency histogram is shown below.
    [Figure: relative frequency histogram; x-axis: x variable (0 to 10); y-axis: Relative Frequency (0.00 to 0.30)]
    e) The two histograms look exactly alike since the same data are being graphed. The bars represent either the frequency or relative frequency.
  • 135.
    9) The following data reflect the percentages of employees with different levels of education:
    a) Develop a pie chart to illustrate these data.
    b) Develop a horizontal bar chart to illustrate these data.
    a) The pie chart is as follows:
    [Figure: pie chart "Education Levels"; slice labels 18%, 34%, 14%, 30%, 4%; legend: Less than HS Graduate, HS Graduate, Some College, College Graduate, Graduate Degree]
  • 136.
    b) The horizontal bar chart is shown as follows:
    [Figure: horizontal bar chart "Education Levels"; bar labels 18, 34, 14, 30, 4; categories: Less than HS Graduate, HS Graduate, Some College, College Graduate, Graduate Degree; axis: Percent (0 to 40)]
  • 137.
    10) Given the following data, construct a stem and leaf diagram.
    Sort the data from low to high. The lowest value is 0.7 and the highest 6.4.
  • 138.
    Split the values into a stem and leaf. Stem = units place; leaf = decimal place.
    List all possible stems from lowest to highest.
    Itemize the leaves from lowest to highest and place next to the appropriate stems.
    11) A university has the following number of students at each grade level.
    a) Construct a bar chart that effectively displays these data.
    b) Construct a pie chart to display these data.
    c) Refer to a) and b). Which graph is the most effective way to present these data and why?
    a) [Figure: bar chart]
    b) [Figure: pie chart]
    c) A case can be made for either a bar chart or pie chart. Pie charts are especially good at showing how the total is divided into parts. The bar chart is best to draw attention to specific results. In this case, one could
  • 139.
    look at theapparent attrition that takes place in the number of st udents between Freshman and Senior years. 12) Given the following sales data for product category and sale s region, construct two different bar charts that display the data effectively. One possible bar chart is shown as follows: Another way to present the same data is: Still another possible way is called a “stacked” bar chart. Sales By Product Type and Region 0 50 100 150 200 250 300 350
  • 140.
    400 450 East West NorthSouth Region S a le s XJ-6 Model X-15-Y Model Craftsman Generic Sales By Product Type and Region 0 50 100 150 200 250
  • 141.
    300 350 400 450 XJ-6 Model X-15-YModel Craftsman Generic Product Type S a le s East West North South 11 13) Boston Properties is a real estate investment trust that owns office properties in selected markets. Its income distribution by region (in percent) in 2007 i
  • 142.
    s: a) Construct apie chart to display the income distribution by re gion for 2007. b) Construct a bar chart to display the income distribution by re gion for 2007. c) Which chart more effectively displays the information? a) A pie chart displaying income distribution by region is shown below. The categories are the regions and the measure is the region’s percentage of total income. b) The bar chart displaying income distribution by region is sho wn below. The categories are the regions and the measure for each category is the region’s percentage of total income. Sales By Product Type and Region 0 200 400 600 800 1000
  • 143.
    1200 East West NorthSouth Product Type S a le s Generic Craftsman X-15-Y Model XJ-6 Model Princeton 4% Washingto n, D.C 21% Boston 27% New York 34% San Francisco
  • 144.
    14% Income Distribution byRegion 12 c) Both charts clearly indicate the income distribution for Boston Properties by region. The bar chart, however, makes it easier to compare percentages across regions. 14) The following data represents 11 observations of two quanti tative variables: x = contact hours with client y= profit generated from client a) Construct a scatter plot of the data. Indicate whether the plot suggests a linear or non‐linear relationship between the dependent and the independent variabl es. b) Determine how much influence one data point will have on y our perception of the relationship between the independent and the dependent variables by deletin g the data point with the smallest x value. What appears to be the relationship between the depende nt and the independent variables?
  • 145.
    a) There appears to be a curvilinear relationship between the dependent and independent variables.
    [Bar chart for 13 b) "Income Distribution by Region": Income Distribution % (0% to 40%) by Region]
    b)
  • 146.
    [Scatter plot: vertical axis from -1000 to 5000, horizontal axis from 0 to 60, one series (Series1)]
    Having removed the extreme data points, the relationship between dependent and independent variables seems to be linear and positive.
    15) The following information shows the year-end dollar value (in millions) of deposits for the Bank of the Ozarks, Inc., for the years 1997-2007. Draw a line chart of the data and interpret it.
    The time-series variable is Year-End Dollar Value Deposits ($ in millions) measured over 8 years with a
  • 147.
    maximum value of 1,380 (million). The horizontal axis will have 8 time periods equally spaced. The vertical axis will start at 0 and go to a value exceeding 1,380. We will use 1,600. The vertical axis will also be divided into 200-unit increments.
    The line chart of the data is shown below.
    The line chart shows that Year-End Deposits have been increasing since 1997, but have increased more sharply since 2002 and leveled off between 2006 and 2007.
    [Line chart "Bank of the Ozarks Deposits": $ in Millions (0 to 3000) by year, 1997 to 2007]
  • 148.
    Chapter 3 Lecture Power Point Slides.pdf
    Chapter 2: Data Presentation Using Descriptive Graphs
    Chapter 3: Describing Data Visually
    ©2006 Thomson/South-Western
    Population of 500 Cities
    Class Number   Size of City                 Frequency
    1              Under 10,000                         4
    2              10,000 and under 15,000             51
    3              15,000 and under 20,000             77
    4              20,000 and under 25,000            105
    5              25,000 and under 30,000             84
    6              30,000 and under 35,000             60
    7              35,000 and under 40,000             45
    8              40,000 and under 45,000             38
    9              45,000 and under 50,000             31
    10             50,000 and over                      5
                   Total                              500
  • 149.
    Frequency Distribution
    Highest value in the data (H)
    Lowest value in the data (L)
    Number of Classes (K): value chosen to best represent the data
    CW = range / number of classes = (H - L) / K
    Starting Salaries
    Major                 No. of Graduating Students   Starting Salary (thousands of dollars)
    Accounting            26                           41.5 39.4 40.9 35.9 37.4 39.5 40.3 39.3 41.6 36.6 41.1 35.7 43.7
                                                       37.0 41.3 40.6 38.0 42.4 35.7 41.4 39.2 36.8 39.3 43.8 38.5 43.0
    Information systems   10                           36.3 35.6 36.2 38.1 34.8 38.1 35.7 36.5 39.5 37.9
    Marketing             14                           34.3 36.8 33.8 35.0 37.8 38.7 37.2 32.8 38.2 37.0 39.7 38.8 35.2 36.2
  • 150.
    Frequency Distribution for Continuous Data
    Original Data
    41.5 39.4 40.9 35.9 37.4 39.5 40.3 39.3 41.6 36.6
    41.1 35.7 43.7 37.0 41.3 40.6 38.0 42.4 35.7 41.4
    39.2 36.8 39.3 43.8 38.5 43.0 36.3 35.6 36.2 38.1
    34.8 38.1 35.7 36.5 39.5 37.9 34.3 36.8 33.8 35.0
    37.8 38.7 37.2 32.8 38.2 37.0 39.7 38.8 35.2 36.2
    Ordered Array
    32.8 33.8 34.3 34.8 35.0 35.2 35.6 35.7 35.7 35.7
    35.9 36.2 36.2 36.3 36.5 36.6 36.8 36.8 37.0 37.0
    37.2 37.4 37.8 37.9 38.0 38.1 38.1 38.2 38.5 38.7
    38.8 39.2 39.3 39.3 39.4 39.5 39.5 39.7 40.3 40.6
    40.9 41.1 41.3 41.4 41.5 41.6 42.4 43.0 43.7 43.8
  • 151.
    Frequency Distribution for Continuous Data
    Class Number   Class             Frequency   Relative Frequency
    1              32 and under 34           2   .04
    2              34 and under 36           9   .18
    3              36 and under 38          13   .26
    4              38 and under 40          14   .28
    5              40 and under 42           8   .16
    6              42 and under 44           4   .08
                   Total                    50   1.00
  • 152.
    Constructing a Frequency Distribution
    1. Gather the sample data
    2. Arrange the data in an ordered array
    3. Select the number of classes to be used
    4. Determine the class width
    5. Determine the class limits for each class
    6. Count the number of data values in each class (the class frequencies)
    7. Summarize the class frequencies in a frequency distribution table
    Histogram
    A Histogram is a graphical representation of a frequency distribution for continuous data.
    The height of each bar is proportional to the frequency of that class.
  • 153.
    Histogram
    [Histogram: Frequency (3 to 15, in steps of 3) versus Starting salary (thousands of dollars), 32 to 44]
  • 154.
    Histogram
    [Relative frequency histogram: Relative frequency (.06 to .30, in steps of .06) versus Starting salary (thousands of dollars), 32 to 44]
  • 155.
    Stem-and-Leaf Diagrams
    Stem-and-Leaf Diagrams were developed to summarize data without loss of information.
    They are used with moderately sized data sets (< 150 values).
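    The construction steps and the Starting Salaries frequency table above can be sketched in Python as follows. This is only an illustration, not the slides' own procedure: the function name `frequency_distribution`, the starting limit of 32, and the rounding of the raw class width up to 2 are assumptions chosen to match the classes shown in the table.

```python
# Starting salaries (thousands of dollars) for the 50 graduates in the slides.
salaries = [
    41.5, 39.4, 40.9, 35.9, 37.4, 39.5, 40.3, 39.3, 41.6, 36.6,
    41.1, 35.7, 43.7, 37.0, 41.3, 40.6, 38.0, 42.4, 35.7, 41.4,
    39.2, 36.8, 39.3, 43.8, 38.5, 43.0, 36.3, 35.6, 36.2, 38.1,
    34.8, 38.1, 35.7, 36.5, 39.5, 37.9, 34.3, 36.8, 33.8, 35.0,
    37.8, 38.7, 37.2, 32.8, 38.2, 37.0, 39.7, 38.8, 35.2, 36.2,
]

NUM_CLASSES = 6   # K, as in the slides' table
LOWER_LIMIT = 32  # assumption: convenient limit just below the minimum (32.8)
CLASS_WIDTH = 2   # assumption: raw width (computed below) rounded up

# Raw class width: CW = (H - L) / K
raw_width = (max(salaries) - min(salaries)) / NUM_CLASSES


def frequency_distribution(data, lower, width, num_classes):
    """Return (lower limit, upper limit, frequency, relative frequency) per class.

    Each class covers lower limit <= value < upper limit ("and under").
    """
    counts = [0] * num_classes
    for value in data:
        counts[int((value - lower) // width)] += 1
    total = len(data)
    return [
        (lower + i * width, lower + (i + 1) * width, count, count / total)
        for i, count in enumerate(counts)
    ]


table = frequency_distribution(salaries, LOWER_LIMIT, CLASS_WIDTH, NUM_CLASSES)
print(f"raw class width = {raw_width:.2f}")
for lo, hi, count, rel in table:
    print(f"{lo} and under {hi}: frequency {count:2d}, relative frequency {rel:.2f}")
```

    The frequencies and relative frequencies printed by this sketch can be compared row by row with the table in the slides; a histogram would plot the frequency column against the class limits.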
  • 156.
    Stem-and-Leaf Diagrams
    Reports of the after-tax profits of 12 companies (recorded as cents per dollar of revenue) are as follows:
    3.4, 4.5, 2.3, 2.7, 3.8, 5.9, 3.4, 4.7, 2.4, 4.1, 3.6, 5.1
    Stem   Leaf (unit = .1)
    2      3 4 7
    3      4 4 6 8
    4      1 5 7
    5      1 9
    Stem-and-Leaf
    By rotating the Stem-and-Leaf we get an image of the shape of the data.
    [Figure: the diagram above rotated so that the leaves stack vertically over stems 2, 3, 4, 5]
  • 157.
    Ordered Array of Aptitude Test Scores
    22 44 56 68 78
    25 44 57 68 78
    28 46 59 69 80
    31 48 60 71 82
    34 49 61 72 83
    35 51 63 72 85
    39 53 63 74 88
    39 53 63 75 90
    40 55 65 75 92
    42 55 66 76 96
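    The stem-and-leaf diagram for the 12 after-tax profits above can be reproduced with a short Python sketch. The helper name `stem_and_leaf` is hypothetical, and the sketch assumes every value has exactly one decimal place, so that the units digit is the stem and the tenths digit is the leaf.

```python
from collections import defaultdict

# After-tax profits of 12 companies (cents per dollar of revenue), from the slide.
profits = [3.4, 4.5, 2.3, 2.7, 3.8, 5.9, 3.4, 4.7, 2.4, 4.1, 3.6, 5.1]


def stem_and_leaf(values):
    """Group one-decimal values by units digit (stem) and tenths digit (leaf)."""
    groups = defaultdict(list)
    for value in sorted(values):
        tenths = round(value * 10)       # 3.4 -> 34; rounding avoids float truncation
        stem, leaf = divmod(tenths, 10)  # 34 -> stem 3, leaf 4
        groups[stem].append(leaf)
    return dict(groups)


diagram = stem_and_leaf(profits)
for stem in sorted(diagram):
    print(stem, "|", " ".join(str(leaf) for leaf in diagram[stem]))
```

    Because the values are sorted first, the leaves on each stem come out in ascending order. A stem with no values is simply omitted by this sketch; none is empty for these data.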
  • 158.
    Stem-and-Leaf Diagram for Aptitude Test Scores
    Stem   Leaf (unit = 1)
    2      2 5 8
    3      1 4 5 9 9
    4      0 2 4 4 6 8 9
    5      1 3 3 5 5 6 7 9
    6      0 1 3 3 3 5 6 8 8 9
    7      1 2 2 4 5 5 6 8 8
    8      0 2 3 5 8
    9      0 2 6
    Stem-and-Leaf Diagram for Aptitude Test Scores (using repeated stems)
    Stem   Leaf (unit = 1)
    2      2
    2      5 8
    3      1 4
    3      5 9 9
    4      0 2 4 4
    4      6 8 9
    5      1 3 3
  • 159.
    5      5 5 6 7 9
    6      0 1 3 3 3
    6      5 6 8 8 9
    7      1 2 2 4
    7      5 5 6 8 8
    8      0 2 3
    8      5 8
    9      0 2
    9      6
    Frequency Polygon
    A frequency polygon is a graph that represents the shape of the data. It is conceptualized as a connection of the midpoints of the classes at the height specified by the frequency.
    A relative frequency polygon is similar to a frequency polygon, except that the height is dictated by the relative frequency.
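    As a small illustration of the definition above, the points of a frequency polygon are the class midpoints paired with the class frequencies. The sketch below uses the starting salary classes from the earlier slides as example data; the variable names are illustrative only.

```python
# Class limits and frequencies from the Starting Salaries frequency table.
class_limits = [(32, 34), (34, 36), (36, 38), (38, 40), (40, 42), (42, 44)]
frequencies = [2, 9, 13, 14, 8, 4]
n = sum(frequencies)

# Midpoint of each class: halfway between its lower and upper limits.
midpoints = [(lo + hi) / 2 for lo, hi in class_limits]

# Frequency polygon: (midpoint, frequency) points joined by straight lines.
polygon = list(zip(midpoints, frequencies))

# Relative frequency polygon: same midpoints, heights divided by n.
relative_polygon = [(m, f / n) for m, f in polygon]

for (m, f), (_, rel) in zip(polygon, relative_polygon):
    print(f"midpoint {m}: frequency {f}, relative frequency {rel:.2f}")
```

    Both polygons have the same shape; only the vertical scale changes.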
  • 160.
    Frequency Polygon
    [Frequency polygon: Frequency (10 to 100) versus Population (thousands), 10 to 50]
  • 161.
    * 4 cities had populations of less than 10,000
  • 162.
    ** 5 cities had populations of 50,000 or greater
    Frequency Polygon
    [Two frequency polygons, "No college degree" and "College degree": Number of employees versus Annual salaries (thousands of dollars)]
  • 163.
  • 164.
    Cumulative Frequencies
    A cumulative frequency table provides information on the number of values that are less than the upper class limit.
    Results can be presented graphically with an ogive.
    Starting Salaries
  • 165.
    Class Number   Class             Frequency   Cumulative Frequency   Relative Frequency   Cumulative Relative Frequency
    1              32 and under 34           2                      2   .04                  .04
    2              34 and under 36           9                     11   .18                  .22
    3              36 and under 38          13                     24   .26                  .48
    4              38 and under 40          14                     38   .28                  .76
    5              40 and under 42           8                     46   .16                  .92
    6              42 and under 44           4                     50   .08                  1.00
                   Total                    50                          1.00
    Starting Salaries
    [Ogive of the starting salaries; horizontal axis: Salary]
    Bar Charts
    Bar charts are used for graphical representation of nominal and ordinal data.
    As with a histogram, the height of the bar is proportional to the number of values in the category.
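    The cumulative columns of the Starting Salaries table can be obtained as running sums of the class frequencies. The following Python sketch is one way to do that with the standard library; the variable names are illustrative.

```python
from itertools import accumulate

# Class frequencies and upper class limits from the Starting Salaries table.
frequencies = [2, 9, 13, 14, 8, 4]
upper_limits = [34, 36, 38, 40, 42, 44]
n = sum(frequencies)

# Running totals: number of values below each upper class limit.
cumulative = list(accumulate(frequencies))

relative = [f / n for f in frequencies]
cumulative_relative = [c / n for c in cumulative]

# Ogive points: (upper class limit, cumulative relative frequency).
ogive = list(zip(upper_limits, cumulative_relative))

for limit, cum, cum_rel in zip(upper_limits, cumulative, cumulative_relative):
    print(f"under {limit}: cumulative frequency {cum:2d}, cumulative relative frequency {cum_rel:.2f}")
```

    The last cumulative frequency equals the sample size and the last cumulative relative frequency equals 1.00, which is a quick check on the table.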
  • 166.
    Graduating Business Majors
    [Bar chart: Number of majors (5 to 30) for Accounting (26), Information systems (10), Marketing (14)]
  • 167.
    Horizontal Bar Chart
    Q. If the price of natural gas goes down by 25% in the next few years, would you and your family use more or less?
    A. [Horizontal bar chart: scale 0 to 35, with bars for Use more, Use about the same, Use less, Not sure]
  • 168.
    Bar Chart of Quality Costs
    [Bar chart: Dollars spent (thousands), 20 to 80, by Quality cost category (Prevention, Appraisal, Failure)]
  • 169.
    Pie Chart
    The Pie Chart is an alternative to the bar chart for nominal and ordinal data.
    The proportion of the Pie represents the category's percentage in the population or sample.
  • 170.
    Percentage of Graduating Business Majors
    [Pie chart with slices labeled A, B, C; legend: Accounting majors, Information systems majors, Marketing majors]
  • 171.
    Due by 11pm June 30th
    Chapter 1 Overview of Statistics
    Chapter 2 Data Collection
    Chapter 3 Describing Data Visually
    Upload the completed assignment using the file extension format Lastname_Firstname_Week1.doc.
    Assignment (32 points due by 11 pm June 30th)
    Note: You can team up with one of your classmates to complete the assignment (not more than two in a team); if you want to work on the assignment individually, that’s also fine. If you are working in teams, then only one submission is required per team; include both the team members’ last names as part of the assignment submission file name as well as in the assignment submission document.
    Please provide detailed solutions to the following problems/exercises (4 problems/exercises x 8 points each):
    1) What type of data (categorical, discrete numerical, or continuous numerical) is each of the following variables?
    a) Length of a TV commercial.
    b) Number of peanuts in a can of Planter’s Mixed Nuts.
    c) Occupation of a mortgage applicant.
  • 172.
    d) Flight time from London Heathrow to Chicago O’Hare.
    2) Which measurement level (nominal, ordinal, interval, ratio) is each of the following variables? Explain.
    a) Number of employees in the Walmart store in Hutchinson, Kansas.
    b) Number of merchandise returns on a randomly chosen Monday at a Walmart store.
    c) Temperature (in Fahrenheit) in the ice-cream freezer at a Walmart store.
    d) Name of the cashier at register 3 in a Walmart store.
    e) Manager’s rating of the cashier at register 3 in a Walmart store.
    f) Social security number of the cashier at register 3 in a Walmart store.
    3) The results of a survey that collected the current credit card balances for 36 undergraduate college students are given in the file “College Credit Card.”
    a) Using the 2^k > n rule, construct a frequency distribution for these data.
    b) Using the results from a), calculate the relative frequencies for each class.
    c) Using the results from a), calculate the cumulative relative frequencies for each class.
    d) Construct a histogram for these data.
    4) The cost of manufacturing vehicles in Mexico is very attractive to automakers. Global carmakers build approximately 1.9 million vehicles in Mexico. Of these, nearly 76% are
  • 173.
    exported, primarily to the US. Although General Motors is the largest manufacturer in Mexico, Daimler Chrysler exports the most vehicles. Automotive analysts examine both the number of vehicles produced and the number exported (see the data file “Automotive”) to determine the potential market share of each company.
    a) For the data on vehicles produced in Mexico, construct a bar chart displaying the amount produced by each company.
    b) Repeat part a) using a pie chart.
    c) Construct a bar chart displaying the number of vehicles exported from Mexico.
    d) Repeat part c) using a pie chart.
    e) Do you prefer the bar charts or the pie charts for displaying the data? Explain.
    f) What differences do the charts reveal for the automotive companies with respect to the number of vehicles produced and number of vehicles exported?
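    The rule named in problem 3 a) picks the number of classes k as the smallest integer for which 2 raised to the power k exceeds the number of observations n. A minimal Python sketch of that rule follows; the function name `classes_by_2k_rule` is hypothetical, and the strict inequality follows the wording of the assignment.

```python
def classes_by_2k_rule(n):
    """Return the smallest integer k such that 2**k > n."""
    k = 1
    while 2 ** k <= n:
        k += 1
    return k


# Example: 36 observations, as in the credit card balance survey.
k = classes_by_2k_rule(36)
print(f"n = 36 -> k = {k} classes (2**{k} = {2 ** k})")
```

    The class width then follows from dividing the range of the data by k and rounding up to a convenient value.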