SlideShare a Scribd company logo
1 of 38
Chapter
©
2015
Cengage
Learning.
All
Rights
Reserved.
May
not
be
scanned,
copied
or
duplicated,
or
posted
to
a
publicly
accessible
website,
in
whole
or
in
part.
BUSINESS ANALYTICS:
DATA ANALYSIS AND
DECISION MAKING
Finding Relationships among Variables
3
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Introduction
 The primary interest in data analysis is usually in
relationships between variables.
 The most useful numerical summary measure is
correlation.
 The most useful graph is a scatterplot.
 To break down a numerical variable by a categorical
variable, it is useful to create side-by-side box plots.
 Excel’s® pivot table breaks down one variable by
others so that all sorts of relationships can be
uncovered very quickly.
 The diagram in the file Data Analysis
Taxonomy.xlsx gives you the big picture of which
analyses are appropriate for which data types and
which tools are best for performing the various
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Relationships Among
Categorical Variables
 The most meaningful way to examine
relationships between two categorical
variables is with counts and corresponding
charts of the counts.
 You can find counts of the categories of either
variable separately, as well as counts of the joint
categories of the two variables.
 Corresponding percentages of totals and charts
help tell the story.
 It is customary to display all such counts in a
table called a crosstabs (for
crosstabulations). This is also sometimes
called a contingency table.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.1:
Smoking Drinking.xlsx (slide 1 of 2)
 Objective: To use a
crosstabs to explore the
relationship between
smoking and drinking.
 Solution: Data set lists the
smoking and drinking habits
of 8761 adults.
 Categories have been
coded “N,” “O,” “H,” “S,” and
“D” for “Non,” “Occasional,”
“Heavy,” “Smoker,” and
“Drinker.”
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.1:
Smoking Drinking.xlsx (slide 2 of 2)
 To create the crosstabs,
enter the category
headings in Excel and
use the COUNTIFS
function to fill the table
with counts of joint
categories.
 Next, sum across rows
and down columns to
get totals.
 Then express the
counts as percentages
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Relationships Among Categorical
Variables and a Numerical Variable
 The comparison problem is one of the most
important problems in data analysis. It occurs
whenever you want to compare a numerical
measure across two or more subpopulations.
 Examples:
 The subpopulations are males and females, and the
numerical measure is salary.
 The subpopulations are different regions of the
country, and the numerical measure is the cost of
living.
 The subpopulations are different days of the week,
and the numerical measure is the number of
customers going to a particular fast-food chain.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Stacked and Unstacked
Formats
 There are two possible data formats, stacked
and unstacked.
 The data are stacked if there are two “long” variables,
such as Gender and Salary. The idea is that the male
salaries are stacked in with the female salaries.
 This is the format you will see in the vast majority of
situations.
 You will occasionally see data in unstacked format,
when there are two “short” variables, such as Male
Salary and Female Salary.
 StatTools is capable of dealing with either format
and can convert from stacked to unstacked or vice
versa.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Stacked and Unstacked Data
Stacked Data Unstacked Data
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.2:
Baseball Salaries 2011 Extra.xlsx (slide
1 of 2)
 Objective: To learn methods in StatTools for breaking down
baseball salaries by various categorical variables.
 Solution: Data set contains the same 2011 baseball data
examined previously, as well as several extra categorical
variables.
 Create summary measures by selecting One-Variable
Summary from the Summary Statistics dropdown list.
 Next, click the Format button and choose Stacked. Then
choose the Cat variable you want to categorize by and the
Val variable you want to summarize.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.2:
Baseball Salaries 2011 Extra.xlsx (slide
2 of 2)
 Create side-by-side
boxplots, by
selecting Box-
Whisker Plot from
the Summary
Graphs dropdown
list and filling in the
resulting dialog box.
 Select the Stacked
format so that you
can choose a Cat
variable and a Val
variable.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Relationships Among Numerical
Variables
 To study relationships among numerical
variables, a new type of chart, called a
scatterplot, and two new summary measures,
correlation and covariance, are used.
 These measures can be applied to any
variables that are displayed numerically.
 However, they are appropriate only for truly
numerical variables, not for categorical
variables that have been coded numerically.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Scatterplots
 A scatterplot is a scatter of points, where
each point denotes the values of an
observation for two selected variables.
 It is a graphical method for detecting relationships
between two numerical variables.
 The two variables are often labeled generically as
X and Y, so a scatterplot is sometimes called an
X-Y chart.
 The purpose of a scatterplot is to make a
relationship (or the lack of it) apparent.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.3:
GolfStats.xlsx (slide 1 of 2)
 Objective: To use scatterplots to search for
relationships in the golf data.
 Solution: Data set includes an observation (stats) for
each of the top 200 earners on the PGA Tour.
 In StatTools, designate a StatTools data set for a
particular year.
 Next, select Scatterplot from the Summary Graphs
dropdown list and then select at least one X variable
and at least one Y variable.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.3:
GolfStats.xlsx (slide 2 of 2)
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Trend Lines in Scatterplots
 Once you have a scatterplot, Excel enables
you to superimpose one of several trend lines
on the scatterplot.
 A trend line is a line or curve that “fits” the scatter
as well as possible.
 This could be a straight line, or it could be one of
several types of curves.
 To do this, right-click on any point in the chart,
select Add Trendline, and fill out the resulting
dialog box.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Scatterplot with Trend Line and
Equation Superimposed
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Correlation and Covariance
(slide 1 of 4)
 Correlation and covariance measure the strength
and direction of a linear relationship between two
numerical variables.
 The relationship is “strong” if the points in a scatterplot
cluster tightly around some straight line.
 If this straight line rises from left to right, the relationship is
positive and the measures will be positive numbers.
 If it falls from left to right, the relationship is negative and
the measures will be negative numbers.
 The two numerical variables must be “paired”
variables.
 They must have the same number of observations, and the
values for any observation should be naturally paired.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Correlation and Covariance
(slide 2 of 4)
 Covariance is essentially an average of
products of deviations from means.
 Excel has a built-in COVAR function, and
StatTools also calculates covariances
automatically.
 Covariance has a serious limitation as a
descriptive measure because it is very
sensitive to the units in which X and Y are
measured.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Correlation and Covariance
(slide 3 of 4)
 Correlation is a unitless quantity that is
unaffected by the measurement scale.
 The correlation is always between -1 and +1.
 The closer it is to either of these two extremes,
the closer the points in a scatterplot are to a
straight line.
 Excel has a built-in CORREL function, and
StatTools also calculates correlations
automatically.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Correlation and Covariance
(slide 4 of 4)
 Three important points about scatterplots,
correlations, and covariances:
 A correlation is a single-number summary of a
scatterplot. It never conveys as much information
as the full scatterplot.
 You are usually on the lookout for large
correlations, those near -1 or +1.
 Do not even try to interpret covariances
numerically except possibly to check whether
they are positive or negative. For interpretive
purposes, concentrate on correlations.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.3 (Continued)
GolfStats.xlsx (slide 1 of 2)
 Objective: To use correlations to understand
relationships in the golf data.
 Solution: In StatTools, create a table of
correlations by selecting Correlation and
Covariance from the Summary Statistics
dropdown list.
 Fill in the resulting dialog box and check
Correlations.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.3 (Continued)
GolfStats.xlsx (slide 2 of 2)
 You can learn more about a correlation by
creating the corresponding scatterplot.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Pivot Tables
 The pivot table is an Excel tool that allows
you to break data down by categories.
 Sometimes pivot tables are used to display
tables of counts, often called crosstabs or
contingency tables.
 However, crosstabs typically list only counts,
whereas pivot tables can list counts, sums,
averages, and other summary measures.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.4:
Elecmart Sales.xlsx (slide 1 of 2)
 Objective: To use pivot tables to break down the
customer order data by a number of categorical
variables.
 Solution: Data set contains data on 400 customer
orders during several months for Elecmart company.
 Create a pivot table by clicking the PivotTable button
on the Insert ribbon.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.4:
Elecmart Sales.xlsx (slide 2 of 2)
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Hiding Categories (Filtering)
 You can filter out any items in a pivot table that
you don’t want to see.
 Click the Row Labels dropdown arrow of the active
field and check the items you want to filter on.
 A pivot table with hidden categories is shown below.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Sorting on Values or Categories
 It is easy to sort in a pivot table, either by the
numbers in the Values area or by the labels in
a Rows or Columns field.
 To sort by the numbers in the Values area, right-
click any number and select Sort.
 To sort on the labels of a Rows or Columns field,
right-click any of the categories and select Sort.
 You can also click the dropdown arrow for the field and
get the dialog box that allows both sorting and filtering.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Changing Locations of Fields
(Pivoting)
 You can choose where to place variables in a
pivot table.
 For example, to place the Region variable in the
Columns area, drag the Region button from the
Rows area of the PivotTable Fields pane to the
Columns area.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Changing Field Settings
 You can change various settings in the Field
Settings dialog box.
 To get to this dialog box:
 Click the Field Setting button on the Analyze/Options
ribbon.
 OR right-click any of the pivot table cells and select the
Field Settings item.
 The pivot table with Value Field Settings changed to
Average is shown below.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Pivot Charts
 It is easy to accompany pivot tables with pivot
charts.
 These charts adapt automatically to the underlying
pivot table.
 To create a pivot chart, click anywhere inside the pivot
table, select the PivotChart button on the
Analyze/Options ribbon, and select a chart type.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Multiple Variables in the Values
Area
 More than a single variable can be placed in
the Values area.
 Also, a given variable in the Values area can
be summarized by more than one
summarizing function.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Summarizing by Count
 The variable in the Values area can be
summarized by the Count function.
 This is useful when you want to know, for example,
how many of the orders were placed by females in the
South.
 Right-click any number in the pivot table, select Value
Field Settings, and select the Count function.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Grouping
 Categories in a Rows or Columns variable can be
grouped.
 Suppose you want to summarize Sum of Total
Cost by Date.
 Starting with a blank pivot table, check both Date and
Total Cost in the PivotTable Fields pane.
 Then right-click any date and select Group.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Other Pivot Table Features
 Showing/hiding subtotals and grand totals (check the Layout
options on the Design ribbon)
 Dealing with blank rows, that is, categories with no data (right-click
any number, choose PivotTable Options, and check the options on
the Layout & Format tab)
 Displaying the data behind a given number in a pivot table (double-
click any number in the Values area to get a new worksheet)
 Formatting a pivot table with various styles (check the style options
on the Design ribbon)
 Moving or renaming pivot tables (check the PivotTable and Action
groups on the Analyze/Options ribbon)
 Refreshing pivot tables as the underlying data changes (check the
Refresh dropdown list on the Analyze/Options ribbon)
 Creating pivot table formulas for calculated fields or calculated
items (check the Formulas dropdown list on the Analyze/Options
ribbon)
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.5:
Lasagna Triers.xlsx (slide 1 of 2)
 Objective: To use pivot tables to explore which
demographic variables help to distinguish lasagna
triers from nontriers.
 Solution: Data set contains data on over 800
potential customers being tracked by a frozen
lasagna company.
 Set up a pivot table that shows counts of triers
and nontriers for different categories of the
variables.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Example 3.5:
Lasagna Triers.xlsx (slide 2 of 2)
Pivot Table and Pivot Chart for Examining the Effect of
Gender
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Slicers and Timelines
 In Excel 2010, Microsoft added slicers—lists
of the distinct values of any variable, which
you can then filter on.
 You add a slicer from the Analyze/Options ribbon
under PivotTable Tools.
 In Excel 2013, a Timeline feature was added.
A Timeline is like a slicer, but it is specifically
for filtering on a date variable.
© 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in
Pivot Table with Slicers and a
Timeline

More Related Content

What's hot

Using vlookup in excel
Using vlookup in excelUsing vlookup in excel
Using vlookup in excel
megankilb
 

What's hot (20)

Nurses Data Analysis by Applied SPSS
Nurses Data Analysis by Applied SPSSNurses Data Analysis by Applied SPSS
Nurses Data Analysis by Applied SPSS
 
Spss an introduction
Spss  an introductionSpss  an introduction
Spss an introduction
 
E miner
E minerE miner
E miner
 
Statistical softwares
Statistical softwaresStatistical softwares
Statistical softwares
 
USING VLOOKUP FUNCTION
USING VLOOKUP FUNCTIONUSING VLOOKUP FUNCTION
USING VLOOKUP FUNCTION
 
Data entry in Excel and SPSS
Data entry in Excel and SPSS Data entry in Excel and SPSS
Data entry in Excel and SPSS
 
WorldCat Local Lists for Serials Reporting
WorldCat Local Lists for Serials ReportingWorldCat Local Lists for Serials Reporting
WorldCat Local Lists for Serials Reporting
 
Using vlookup in excel
Using vlookup in excelUsing vlookup in excel
Using vlookup in excel
 
Chap01 intro & data collection
Chap01 intro & data collectionChap01 intro & data collection
Chap01 intro & data collection
 
Tableau Final Presentation
Tableau Final PresentationTableau Final Presentation
Tableau Final Presentation
 
spss teaching
spss teachingspss teaching
spss teaching
 
Spss by vijay ambast
Spss by vijay ambastSpss by vijay ambast
Spss by vijay ambast
 
Application of SPSS by umakant bhaskar gohatre
Application of SPSS by umakant bhaskar gohatre Application of SPSS by umakant bhaskar gohatre
Application of SPSS by umakant bhaskar gohatre
 
Excel Datamining Addin Beginner
Excel Datamining Addin BeginnerExcel Datamining Addin Beginner
Excel Datamining Addin Beginner
 
Moderation and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSSModeration and Meditation conducting in SPSS
Moderation and Meditation conducting in SPSS
 
Chap02 presenting data in chart & tables
Chap02 presenting data in chart & tablesChap02 presenting data in chart & tables
Chap02 presenting data in chart & tables
 
224-2009
224-2009224-2009
224-2009
 
Spss
SpssSpss
Spss
 
Basics of Graphpad prism
Basics of Graphpad prismBasics of Graphpad prism
Basics of Graphpad prism
 
Forecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear RegressionForecasting Stock Market using Multiple Linear Regression
Forecasting Stock Market using Multiple Linear Regression
 

Similar to 1133629601 400440

Quality management consultant
Quality management consultantQuality management consultant
Quality management consultant
selinasimpson2501
 
Quality document management system
Quality document management systemQuality document management system
Quality document management system
selinasimpson2501
 
Functions of quality management
Functions of quality managementFunctions of quality management
Functions of quality management
selinasimpson2901
 
Masters in quality management
Masters in quality managementMasters in quality management
Masters in quality management
selinasimpson0501
 
Asian institute of quality management
Asian institute of quality managementAsian institute of quality management
Asian institute of quality management
selinasimpson1301
 
Assignment DetailsInfluence ProcessesYou have been encourag.docx
Assignment DetailsInfluence ProcessesYou have been encourag.docxAssignment DetailsInfluence ProcessesYou have been encourag.docx
Assignment DetailsInfluence ProcessesYou have been encourag.docx
faithxdunce63732
 
Software quality management system
Software quality management systemSoftware quality management system
Software quality management system
selinasimpson1801
 

Similar to 1133629601 400440 (20)

Tqm old tools
Tqm old toolsTqm old tools
Tqm old tools
 
OLD SEVEN TOOLS OF QUALTIY MANAGEMENT
OLD SEVEN TOOLS OF QUALTIY MANAGEMENTOLD SEVEN TOOLS OF QUALTIY MANAGEMENT
OLD SEVEN TOOLS OF QUALTIY MANAGEMENT
 
Structural equation modeling in amos
Structural equation modeling in amosStructural equation modeling in amos
Structural equation modeling in amos
 
Quality management consultant
Quality management consultantQuality management consultant
Quality management consultant
 
Data analysis.pptx
Data analysis.pptxData analysis.pptx
Data analysis.pptx
 
Types of graphs and charts and their uses with examples and pics
Types of graphs and charts and their uses  with examples and picsTypes of graphs and charts and their uses  with examples and pics
Types of graphs and charts and their uses with examples and pics
 
Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)Weka term paper(siddharth 10 bm60086)
Weka term paper(siddharth 10 bm60086)
 
Quality document management system
Quality document management systemQuality document management system
Quality document management system
 
Dealing with data-excel
Dealing with data-excelDealing with data-excel
Dealing with data-excel
 
Advanced Models
Advanced ModelsAdvanced Models
Advanced Models
 
Functions of quality management
Functions of quality managementFunctions of quality management
Functions of quality management
 
Predictive Modeling with Enterprise Miner
Predictive Modeling with Enterprise MinerPredictive Modeling with Enterprise Miner
Predictive Modeling with Enterprise Miner
 
Masters in quality management
Masters in quality managementMasters in quality management
Masters in quality management
 
Asian institute of quality management
Asian institute of quality managementAsian institute of quality management
Asian institute of quality management
 
Quality data management
Quality data managementQuality data management
Quality data management
 
Quality data management
Quality data managementQuality data management
Quality data management
 
Assignment DetailsInfluence ProcessesYou have been encourag.docx
Assignment DetailsInfluence ProcessesYou have been encourag.docxAssignment DetailsInfluence ProcessesYou have been encourag.docx
Assignment DetailsInfluence ProcessesYou have been encourag.docx
 
The Art of Data Visualization in Microsoft Excel for Mac.pdf
The Art of Data Visualization in Microsoft Excel for Mac.pdfThe Art of Data Visualization in Microsoft Excel for Mac.pdf
The Art of Data Visualization in Microsoft Excel for Mac.pdf
 
Spss software
Spss softwareSpss software
Spss software
 
Software quality management system
Software quality management systemSoftware quality management system
Software quality management system
 

Recently uploaded

Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
PECB
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
QucHHunhnh
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
kauryashika82
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
heathfieldcps1
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
Chris Hunter
 

Recently uploaded (20)

Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104Nutritional Needs Presentation - HLTH 104
Nutritional Needs Presentation - HLTH 104
 
Application orientated numerical on hev.ppt
Application orientated numerical on hev.pptApplication orientated numerical on hev.ppt
Application orientated numerical on hev.ppt
 
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17  How to Extend Models Using Mixin ClassesMixin Classes in Odoo 17  How to Extend Models Using Mixin Classes
Mixin Classes in Odoo 17 How to Extend Models Using Mixin Classes
 
Unit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptxUnit-IV; Professional Sales Representative (PSR).pptx
Unit-IV; Professional Sales Representative (PSR).pptx
 
Asian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptxAsian American Pacific Islander Month DDSD 2024.pptx
Asian American Pacific Islander Month DDSD 2024.pptx
 
Python Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docxPython Notes for mca i year students osmania university.docx
Python Notes for mca i year students osmania university.docx
 
Key note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdfKey note speaker Neum_Admir Softic_ENG.pdf
Key note speaker Neum_Admir Softic_ENG.pdf
 
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
TỔNG ÔN TẬP THI VÀO LỚP 10 MÔN TIẾNG ANH NĂM HỌC 2023 - 2024 CÓ ĐÁP ÁN (NGỮ Â...
 
Micro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdfMicro-Scholarship, What it is, How can it help me.pdf
Micro-Scholarship, What it is, How can it help me.pdf
 
General Principles of Intellectual Property: Concepts of Intellectual Proper...
General Principles of Intellectual Property: Concepts of Intellectual  Proper...General Principles of Intellectual Property: Concepts of Intellectual  Proper...
General Principles of Intellectual Property: Concepts of Intellectual Proper...
 
microwave assisted reaction. General introduction
microwave assisted reaction. General introductionmicrowave assisted reaction. General introduction
microwave assisted reaction. General introduction
 
Beyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global ImpactBeyond the EU: DORA and NIS 2 Directive's Global Impact
Beyond the EU: DORA and NIS 2 Directive's Global Impact
 
1029-Danh muc Sach Giao Khoa khoi 6.pdf
1029-Danh muc Sach Giao Khoa khoi  6.pdf1029-Danh muc Sach Giao Khoa khoi  6.pdf
1029-Danh muc Sach Giao Khoa khoi 6.pdf
 
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in DelhiRussian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
Russian Escort Service in Delhi 11k Hotel Foreigner Russian Call Girls in Delhi
 
The basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptxThe basics of sentences session 3pptx.pptx
The basics of sentences session 3pptx.pptx
 
Unit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptxUnit-IV- Pharma. Marketing Channels.pptx
Unit-IV- Pharma. Marketing Channels.pptx
 
PROCESS RECORDING FORMAT.docx
PROCESS      RECORDING        FORMAT.docxPROCESS      RECORDING        FORMAT.docx
PROCESS RECORDING FORMAT.docx
 
Class 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdfClass 11th Physics NEET formula sheet pdf
Class 11th Physics NEET formula sheet pdf
 
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
Explore beautiful and ugly buildings. Mathematics helps us create beautiful d...
 
Making and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdfMaking and Justifying Mathematical Decisions.pdf
Making and Justifying Mathematical Decisions.pdf
 

1133629601 400440

  • 2. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Introduction  The primary interest in data analysis is usually in relationships between variables.  The most useful numerical summary measure is correlation.  The most useful graph is a scatterplot.  To break down a numerical variable by a categorical variable, it is useful to create side-by-side box plots.  Excel’s® pivot table breaks down one variable by others so that all sorts of relationships can be uncovered very quickly.  The diagram in the file Data Analysis Taxonomy.xlsx gives you the big picture of which analyses are appropriate for which data types and which tools are best for performing the various
  • 3. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Relationships Among Categorical Variables  The most meaningful way to examine relationships between two categorical variables is with counts and corresponding charts of the counts.  You can find counts of the categories of either variable separately, as well as counts of the joint categories of the two variables.  Corresponding percentages of totals and charts help tell the story.  It is customary to display all such counts in a table called a crosstabs (for crosstabulations). This is also sometimes called a contingency table.
  • 4. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.1: Smoking Drinking.xlsx (slide 1 of 2)  Objective: To use a crosstabs to explore the relationship between smoking and drinking.  Solution: Data set lists the smoking and drinking habits of 8761 adults.  Categories have been coded “N,” “O,” “H,” “S,” and “D” for “Non,” “Occasional,” “Heavy,” “Smoker,” and “Drinker.”
  • 5. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.1: Smoking Drinking.xlsx (slide 2 of 2)  To create the crosstabs, enter the category headings in Excel and use the COUNTIFS function to fill the table with counts of joint categories.  Next, sum across rows and down columns to get totals.  Then express the counts as percentages
  • 6. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Relationships Among Categorical Variables and a Numerical Variable  The comparison problem is one of the most important problems in data analysis. It occurs whenever you want to compare a numerical measure across two or more subpopulations.  Examples:  The subpopulations are males and females, and the numerical measure is salary.  The subpopulations are different regions of the country, and the numerical measure is the cost of living.  The subpopulations are different days of the week, and the numerical measure is the number of customers going to a particular fast-food chain.
  • 7. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Stacked and Unstacked Formats  There are two possible data formats, stacked and unstacked.  The data are stacked if there are two “long” variables, such as Gender and Salary. The idea is that the male salaries are stacked in with the female salaries.  This is the format you will see in the vast majority of situations.  You will occasionally see data in unstacked format, when there are two “short” variables, such as Male Salary and Female Salary.  StatTools is capable of dealing with either format and can convert from stacked to unstacked or vice versa.
  • 8. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Stacked and Unstacked Data Stacked Data Unstacked Data
  • 9. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.2: Baseball Salaries 2011 Extra.xlsx (slide 1 of 2)  Objective: To learn methods in StatTools for breaking down baseball salaries by various categorical variables.  Solution: Data set contains the same 2011 baseball data examined previously, as well as several extra categorical variables.  Create summary measures by selecting One-Variable Summary from the Summary Statistics dropdown list.  Next, click the Format button and choose Stacked. Then choose the Cat variable you want to categorize by and the Val variable you want to summarize.
  • 10. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.2: Baseball Salaries 2011 Extra.xlsx (slide 2 of 2)  Create side-by-side boxplots, by selecting Box- Whisker Plot from the Summary Graphs dropdown list and filling in the resulting dialog box.  Select the Stacked format so that you can choose a Cat variable and a Val variable.
  • 11. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Relationships Among Numerical Variables  To study relationships among numerical variables, a new type of chart, called a scatterplot, and two new summary measures, correlation and covariance, are used.  These measures can be applied to any variables that are displayed numerically.  However, they are appropriate only for truly numerical variables, not for categorical variables that have been coded numerically.
  • 12. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Scatterplots  A scatterplot is a scatter of points, where each point denotes the values of an observation for two selected variables.  It is a graphical method for detecting relationships between two numerical variables.  The two variables are often labeled generically as X and Y, so a scatterplot is sometimes called an X-Y chart.  The purpose of a scatterplot is to make a relationship (or the lack of it) apparent.
  • 13. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.3: GolfStats.xlsx (slide 1 of 2)  Objective: To use scatterplots to search for relationships in the golf data.  Solution: Data set includes an observation (stats) for each of the top 200 earners on the PGA Tour.  In StatTools, designate a StatTools data set for a particular year.  Next, select Scatterplot from the Summary Graphs dropdown list and then select at least one X variable and at least one Y variable.
  • 14. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.3: GolfStats.xlsx (slide 2 of 2)
  • 15. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Trend Lines in Scatterplots  Once you have a scatterplot, Excel enables you to superimpose one of several trend lines on the scatterplot.  A trend line is a line or curve that “fits” the scatter as well as possible.  This could be a straight line, or it could be one of several types of curves.  To do this, right-click on any point in the chart, select Add Trendline, and fill out the resulting dialog box.
  • 16. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Scatterplot with Trend Line and Equation Superimposed
  • 17. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Correlation and Covariance (slide 1 of 4)  Correlation and covariance measure the strength and direction of a linear relationship between two numerical variables.  The relationship is “strong” if the points in a scatterplot cluster tightly around some straight line.  If this straight line rises from left to right, the relationship is positive and the measures will be positive numbers.  If it falls from left to right, the relationship is negative and the measures will be negative numbers.  The two numerical variables must be “paired” variables.  They must have the same number of observations, and the values for any observation should be naturally paired.
  • 18. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Correlation and Covariance (slide 2 of 4)  Covariance is essentially an average of products of deviations from means.  Excel has a built-in COVAR function, and StatTools also calculates covariances automatically.  Covariance has a serious limitation as a descriptive measure because it is very sensitive to the units in which X and Y are measured.
  • 19. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Correlation and Covariance (slide 3 of 4)  Correlation is a unitless quantity that is unaffected by the measurement scale.  The correlation is always between -1 and +1.  The closer it is to either of these two extremes, the closer the points in a scatterplot are to a straight line.  Excel has a built-in CORREL function, and StatTools also calculates correlations automatically.
  • 20. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Correlation and Covariance (slide 4 of 4)  Three important points about scatterplots, correlations, and covariances:  A correlation is a single-number summary of a scatterplot. It never conveys as much information as the full scatterplot.  You are usually on the lookout for large correlations, those near -1 or +1.  Do not even try to interpret covariances numerically except possibly to check whether they are positive or negative. For interpretive purposes, concentrate on correlations.
  • 21. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.3 (Continued) GolfStats.xlsx (slide 1 of 2)  Objective: To use correlations to understand relationships in the golf data.  Solution: In StatTools, create a table of correlations by selecting Correlation and Covariance from the Summary Statistics dropdown list.  Fill in the resulting dialog box and check Correlations.
  • 22. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.3 (Continued) GolfStats.xlsx (slide 2 of 2)  You can learn more about a correlation by creating the corresponding scatterplot.
  • 23. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Pivot Tables  The pivot table is an Excel tool that allows you to break data down by categories.  Sometimes pivot tables are used to display tables of counts, often called crosstabs or contingency tables.  However, crosstabs typically list only counts, whereas pivot tables can list counts, sums, averages, and other summary measures.
  • 24. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.4: Elecmart Sales.xlsx (slide 1 of 2)  Objective: To use pivot tables to break down the customer order data by a number of categorical variables.  Solution: Data set contains data on 400 customer orders during several months for Elecmart company.  Create a pivot table by clicking the PivotTable button on the Insert ribbon.
  • 25. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.4: Elecmart Sales.xlsx (slide 2 of 2)
  • 26. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Hiding Categories (Filtering)  You can filter out any items in a pivot table that you don’t want to see.  Click the Row Labels dropdown arrow of the active field and check the items you want to filter on.  A pivot table with hidden categories is shown below.
  • 27. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Sorting on Values or Categories  It is easy to sort in a pivot table, either by the numbers in the Values area or by the labels in a Rows or Columns field.  To sort by the numbers in the Values area, right- click any number and select Sort.  To sort on the labels of a Rows or Columns field, right-click any of the categories and select Sort.  You can also click the dropdown arrow for the field and get the dialog box that allows both sorting and filtering.
  • 28. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Changing Locations of Fields (Pivoting)  You can choose where to place variables in a pivot table.  For example, to place the Region variable in the Columns area, drag the Region button from the Rows area of the PivotTable Fields pane to the Columns area.
  • 29. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Changing Field Settings  You can change various settings in the Field Settings dialog box.  To get to this dialog box:  Click the Field Setting button on the Analyze/Options ribbon.  OR right-click any of the pivot table cells and select the Field Settings item.  The pivot table with Value Field Settings changed to Average is shown below.
  • 30. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Pivot Charts  It is easy to accompany pivot tables with pivot charts.  These charts adapt automatically to the underlying pivot table.  To create a pivot chart, click anywhere inside the pivot table, select the PivotChart button on the Analyze/Options ribbon, and select a chart type.
  • 31. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Multiple Variables in the Values Area  More than a single variable can be placed in the Values area.  Also, a given variable in the Values area can be summarized by more than one summarizing function.
  • 32. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Summarizing by Count  The variable in the Values area can be summarized by the Count function.  This is useful when you want to know, for example, how many of the orders were placed by females in the South.  Right-click any number in the pivot table, select Value Field Settings, and select the Count function.
  • 33. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Grouping  Categories in a Rows or Columns variable can be grouped.  Suppose you want to summarize Sum of Total Cost by Date.  Starting with a blank pivot table, check both Date and Total Cost in the PivotTable Fields pane.  Then right-click any date and select Group.
  • 34. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Other Pivot Table Features  Showing/hiding subtotals and grand totals (check the Layout options on the Design ribbon)  Dealing with blank rows, that is, categories with no data (right-click any number, choose PivotTable Options, and check the options on the Layout & Format tab)  Displaying the data behind a given number in a pivot table (double- click any number in the Values area to get a new worksheet)  Formatting a pivot table with various styles (check the style options on the Design ribbon)  Moving or renaming pivot tables (check the PivotTable and Action groups on the Analyze/Options ribbon)  Refreshing pivot tables as the underlying data changes (check the Refresh dropdown list on the Analyze/Options ribbon)  Creating pivot table formulas for calculated fields or calculated items (check the Formulas dropdown list on the Analyze/Options ribbon)
  • 35. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.5: Lasagna Triers.xlsx (slide 1 of 2)  Objective: To use pivot tables to explore which demographic variables help to distinguish lasagna triers from nontriers.  Solution: Data set contains data on over 800 potential customers being tracked by a frozen lasagna company.  Set up a pivot table that shows counts of triers and nontriers for different categories of the variables.
  • 36. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Example 3.5: Lasagna Triers.xlsx (slide 2 of 2) Pivot Table and Pivot Chart for Examining the Effect of Gender
  • 37. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Slicers and Timelines  In Excel 2010, Microsoft added slicers—lists of the distinct values of any variable, which you can then filter on.  You add a slicer from the Analyze/Options ribbon under PivotTable Tools.  In Excel 2013, a Timeline feature was added. A Timeline is like a slicer, but it is specifically for filtering on a date variable.
  • 38. © 2015 Cengage Learning. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in Pivot Table with Slicers and a Timeline