MANAGEMENT
SCIENCE
The Art of Modeling with Spreadsheets
STEPHEN G. POWELL
KENNETH R. BAKER
Compatible with Analytic Solver Platform
FOURTH EDITION
CHAPTER 5 POWERPOINT
DATA EXPLORATION AND VISUALIZATION
INTRODUCTION
• Business analysts must know how to use data to derive business
insights and improve decisions.
• Analysts may use data to describe situations (e.g., profit over the
last year), predict situations (e.g., profit over the next year), or
prescribe actions the organization must take to achieve its goals.
• Several basic skills are required to understand a data set, explore
individual variables (or groups of them) for insights, and prepare
data for more complex analysis.
• Remain skeptical of data: datasets are only as good as their
collection methods (e.g., may have been collected with biases), and
may or may not be relevant to the problem at hand.
Chapter 5 Copyright © 2013 John Wiley & Sons, Inc. 2
DATABASE STRUCTURE
• Spreadsheet databases are two-dimensional files (versus
more complex relational databases).
• They consist of:
– Rows = records (sometimes “cases” or “instances”)
– Columns = fields (sometimes “variables,” “descriptors,” or
“predictors”)
• Most databases contain a data dictionary that
documents fields in detail.
DATABASE STRUCTURE, EXAMPLE
• The data dictionary for
this sample:
Field Name     Description
ID             Record number
ITEM           Item number
UPC            Uniform Product Code
DESCRIPTION    Description
SIZE           Items per container
STORE          Store number
WEEK           Week number
SALES          Sales volume in cases
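As a rough sketch, the data dictionary above can be held as a simple mapping and used to check that a record carries exactly the documented fields. The record values and the two helper functions are hypothetical, invented only for illustration.

```python
# Data dictionary from the slide: field name -> description.
DATA_DICTIONARY = {
    "ID": "Record number",
    "ITEM": "Item number",
    "UPC": "Uniform Product Code",
    "DESCRIPTION": "Description",
    "SIZE": "Items per container",
    "STORE": "Store number",
    "WEEK": "Week number",
    "SALES": "Sales volume in cases",
}

# One hypothetical record (row); every value here is made up.
record = {
    "ID": 1, "ITEM": 100, "UPC": "0000000000", "DESCRIPTION": "ADVIL",
    "SIZE": 50, "STORE": 2, "WEEK": 1, "SALES": 3,
}


def undocumented_fields(row, dictionary):
    """Return fields present in the row but absent from the data dictionary."""
    return sorted(set(row) - set(dictionary))


def missing_fields(row, dictionary):
    """Return documented fields that the row does not contain."""
    return sorted(set(dictionary) - set(row))


print(undocumented_fields(record, DATA_DICTIONARY))
print(missing_fields(record, DATA_DICTIONARY))
```

A check like this is only a first pass: it confirms that field names match the dictionary, not that the values mean what the descriptions say.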
DATABASE STRUCTURE, EXAMPLE
• We might use this
database to answer the
questions:
• What were the market shares
of the various brands?
• What were the weekly sales
volumes at the various
stores?
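Both questions come down to grouping records and summing SALES. The sketch below assumes that DESCRIPTION can stand in for brand and uses a handful of invented records; the function names are hypothetical.

```python
from collections import defaultdict

# Hypothetical records using fields from the data dictionary; values are made up.
records = [
    {"DESCRIPTION": "ADVIL", "STORE": 1, "WEEK": 1, "SALES": 6},
    {"DESCRIPTION": "ADVIL", "STORE": 2, "WEEK": 1, "SALES": 4},
    {"DESCRIPTION": "TYLENOL X/STRGTH LIQ", "STORE": 1, "WEEK": 1, "SALES": 5},
    {"DESCRIPTION": "TYLENOL X/STRGTH LIQ", "STORE": 1, "WEEK": 2, "SALES": 5},
]


def market_shares(rows):
    """Share of total SALES per DESCRIPTION (assumed here to identify the brand)."""
    totals = defaultdict(int)
    for row in rows:
        totals[row["DESCRIPTION"]] += row["SALES"]
    grand_total = sum(totals.values())
    return {brand: volume / grand_total for brand, volume in totals.items()}


def weekly_sales_by_store(rows):
    """Total SALES for each (STORE, WEEK) pair."""
    totals = defaultdict(int)
    for row in rows:
        totals[(row["STORE"], row["WEEK"])] += row["SALES"]
    return dict(totals)


print(market_shares(records))
print(weekly_sales_by_store(records))
```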
TYPES OF DATA
• Data come in endless variety, but a few types are common:
– Categorical data, which includes nominal and ordinal data
– Numerical data, which includes interval and ratio data
TYPES OF DATA: CATEGORICAL VARIABLES
• Nominal data, which simply names the category of a
record.
– Example: A GENDER field, with only two values (male
and female)
– Example: The DESCRIPTION field in previous slides, with
numerous values (e.g., ADVIL, TYLENOL X/STRGTH LIQ).
• Ordinal data, which also identifies the category of a record
but with a natural order to the values.
– Example: High, Medium, and Low
– Example: Numerical rankings, where 5 = most preferred and 1
= least preferred
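The practical difference shows up when sorting. Nominal values can be counted but not ranked, while ordinal values need their order stated explicitly, because an alphabetical sort does not respect it. A minimal sketch with made-up values:

```python
from collections import Counter

# Nominal: categories can be counted, but there is no meaningful order.
descriptions = ["ADVIL", "TYLENOL X/STRGTH LIQ", "ADVIL"]
counts = Counter(descriptions)

# Ordinal: the natural order has to be supplied; it is not alphabetical.
LEVEL_ORDER = {"Low": 0, "Medium": 1, "High": 2}
ratings = ["High", "Low", "Medium", "Low"]

alphabetical = sorted(ratings)                    # ignores the natural order
ordered = sorted(ratings, key=LEVEL_ORDER.__getitem__)

print(counts)
print(alphabetical)  # ['High', 'Low', 'Low', 'Medium']
print(ordered)       # ['Low', 'Low', 'Medium', 'High']
```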
TYPES OF DATA: NUMERICAL DATA
• Interval data, which conveys a meaningful difference between
values but has no true zero point.
– Example: The Fahrenheit scale.
• Ratio data, based on a scale with a meaningful zero
point.
– Example: Monetary units, ages.
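A short worked example may make the distinction concrete. On an interval scale such as Fahrenheit, the ratio of two readings changes when the unit changes, so "twice as hot" has no meaning. On a ratio scale such as money, the ratio survives a change of unit. The readings and amounts below are arbitrary.

```python
def fahrenheit_to_celsius(deg_f):
    """Standard conversion between the two temperature scales."""
    return (deg_f - 32) * 5 / 9


# Interval data: the ratio depends on the unit, so it carries no meaning.
warm_f, cool_f = 80.0, 40.0
ratio_in_fahrenheit = warm_f / cool_f
ratio_in_celsius = fahrenheit_to_celsius(warm_f) / fahrenheit_to_celsius(cool_f)

# Ratio data: the ratio is the same in dollars and in cents.
price_a, price_b = 80.0, 40.0
ratio_in_dollars = price_a / price_b
ratio_in_cents = (price_a * 100) / (price_b * 100)

print(ratio_in_fahrenheit, round(ratio_in_celsius, 6))  # 2.0 6.0
print(ratio_in_dollars, ratio_in_cents)                 # 2.0 2.0
```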
DATA EXPLORATION
• Databases are highly structured for storage but do not
automatically reveal patterns and insights.
• We explore databases in a five-step process:
1. Understand the data
2. Organize and subset the database
3. Examine individual variables and their distributions
4. Calculate summary measures for individual variables
5. Examine relationships among variables
UNDERSTAND THE DATA
• Be skeptical of data, and ask:
– How are fields defined?
– What types of data are represented?
– What units are the data in?
• Example: Job applicants database
– SEX and AGE are unambiguous, but does CITZ CODE (with U for US, N
for non-US) represent country of birth, citizenship, or where the
applicant currently lives? Know how each variable was coded.
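One quick way to start questioning a coded field is to tabulate the values it actually contains and compare them with the codes you expect. The sketch below uses invented CITZ CODE values, including a lowercase code and a blank, to show the kind of surprise this can surface. It cannot answer what the codes mean; that still requires the data dictionary or the people who collected the data.

```python
from collections import Counter

# Hypothetical contents of the CITZ CODE field; "" stands for a blank cell.
citz_codes = ["U", "N", "U", "u", "", "U"]

# Codes the slide says to expect: U for US, N for non-US.
EXPECTED_CODES = {"U", "N"}

value_counts = Counter(citz_codes)
unexpected = sorted(set(citz_codes) - EXPECTED_CODES)

print(value_counts)
print(unexpected)  # ['', 'u']
```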
ORGANIZE AND SUBSET THE DATABASE
• Two essential tools: Sort and Filter
– Found on the Home ribbon in the Editing group and on the Data
ribbon in the Sort & Filter group
• Question: In the Executives database below, do any
duplicate records (EXECID) appear?
ORGANIZE AND SUBSET THE DATABASE (CONT’D)
• Home►Editing►Sort & Filter►Custom Sort opens the
Sort window
– We sort by the EXECID column, sort on Values, choose the
order A to Z, and click OK.
– We can then scan for duplicate numbers (which now appear
directly above one another)
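The same sort-then-scan idea can be sketched outside Excel: once the IDs are sorted, any duplicates sit next to each other, so comparing each value with its neighbor is enough. The EXECID values and the function name are hypothetical.

```python
# Hypothetical EXECID values; 5003 appears twice on purpose.
exec_ids = [5012, 5003, 5020, 5003, 5011]


def adjacent_duplicates(ids):
    """Sort the IDs, then report every value equal to its next neighbor."""
    ordered = sorted(ids)
    return sorted({a for a, b in zip(ordered, ordered[1:]) if a == b})


print(adjacent_duplicates(exec_ids))  # [5003]
```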
ORGANIZE AND SUBSET THE DATABASE (CONT’D)
• We can sort by more than one criterion using Add Level,
for example:
– ROUND then INDUSTRY then JOB MONTHS
– Ties on the first criterion are broken by the second,
and ties on the second by the third.
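Tie-breaking across sort levels behaves like sorting on a tuple of keys. In this sketch the applicant records are invented, and ROUND is assumed to be stored as a number.

```python
# Hypothetical applicant records; field names follow the slide, values are made up.
applicants = [
    {"ROUND": 2, "INDUSTRY": "Nonprofit", "JOB MONTHS": 30},
    {"ROUND": 1, "INDUSTRY": "Finance", "JOB MONTHS": 48},
    {"ROUND": 1, "INDUSTRY": "Finance", "JOB MONTHS": 12},
    {"ROUND": 1, "INDUSTRY": "Consulting", "JOB MONTHS": 24},
]

# A tuple key sorts by ROUND first; INDUSTRY breaks ties on ROUND,
# and JOB MONTHS breaks ties on INDUSTRY.
by_levels = sorted(
    applicants,
    key=lambda r: (r["ROUND"], r["INDUSTRY"], r["JOB MONTHS"]),
)

for row in by_levels:
    print(row["ROUND"], row["INDUSTRY"], row["JOB MONTHS"])
```

The two Finance records from round 1 tie on the first two levels, so only the third level decides their order.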
ORGANIZE AND SUBSET THE DATABASE: FILTERING
• Filtering allows us to probe a large database and extract
what interests us.
• Example: In the Applicants database, what are the
characteristics of applicants from nonprofit
organizations?
• Home►Editing►Sort & Filter►Filter. Click on Industry
Description, and uncheck Select All, then check
Nonprofit.
• Filtering does not delete the other records; it only hides them.
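Filtering can be sketched as building a view of the matching records while leaving the full data set untouched. The records and the field key "INDUSTRY DESCRIPTION" are assumptions made for illustration.

```python
# Hypothetical applicant records; values are made up.
applicants = [
    {"ID": 1, "INDUSTRY DESCRIPTION": "Nonprofit", "AGE": 27},
    {"ID": 2, "INDUSTRY DESCRIPTION": "Finance", "AGE": 31},
    {"ID": 3, "INDUSTRY DESCRIPTION": "Nonprofit", "AGE": 25},
]

# The filter builds a new list; the original records are not removed.
nonprofit = [r for r in applicants if r["INDUSTRY DESCRIPTION"] == "Nonprofit"]

print([r["ID"] for r in nonprofit])  # [1, 3]
print(len(applicants))               # 3
```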
EXAMINE INDIVIDUAL VARIABLES AND THEIR DISTRIBUTION
• For numerical variables, we typically want to know the
range of values from lowest to highest, and the areas where
most outcomes lie.
• Example: In the Applicants database, what are typical values
for JOB MONTHS and what is the range from lowest to
highest?
• A common way to summarize a set of numerical values is
the histogram, although Excel provides eight choices.
EXAMINE INDIVIDUAL VARIABLES AND THEIR DISTRIBUTION
(CONT’D)
• In the XLMiner add-in, choose
Explore►Chart Wizard, and
the screen at top right
appears.
• In subsequent windows
choose Frequency for Y axis,
JOB MONTHS for X axis, and
the histogram at bottom
right appears.
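What a histogram does underneath is count how many values fall into each bin. The sketch below is not the XLMiner procedure; it is a plain illustration with invented JOB MONTHS values and an assumed bin width of 12 months.

```python
from collections import Counter

# Hypothetical JOB MONTHS values; the 12-month bin width is an assumption.
job_months = [6, 14, 22, 25, 31, 38, 40, 47, 59, 73]
BIN_WIDTH = 12


def histogram(values, bin_width):
    """Map each bin's lower edge to the number of values falling in that bin."""
    counts = Counter((v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


bins = histogram(job_months, BIN_WIDTH)

# Text rendering: one '#' per record in the bin. Empty bins are not listed.
for start, count in bins.items():
    print(f"{start:>3}-{start + BIN_WIDTH - 1:<3} {'#' * count}")
```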
CALCULATE SUMMARY MEASURES FOR INDIVIDUAL
VARIABLES
• Excel provides numerous functions useful for
investigating individual variables.
• Some can summarize the values of numerical variables;
others can be used to identify or count specific values,
both numerical and categorical.
• Example: What is the average age in the Applicants
database?
CALCULATE SUMMARY MEASURES FOR INDIVIDUAL
VARIABLES
• The most common summary measure of a numerical
variable is the average, or mean.
• Calculate it using the AVERAGE function in Excel, for
example:
AVERAGE(C2:C2918) = 28.97
• Other useful summary measures are the median, minimum,
and maximum.
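For comparison, the same four measures can be computed with Python's standard library. The ages below are invented and are not the Applicants data, so the result differs from the 28.97 reported above. The comments name the corresponding Excel functions.

```python
import statistics

# Hypothetical ages; not the values from the Applicants database.
ages = [24, 26, 27, 29, 31, 35]

mean_age = statistics.mean(ages)      # Excel: AVERAGE
median_age = statistics.median(ages)  # Excel: MEDIAN
youngest = min(ages)                  # Excel: MIN
oldest = max(ages)                    # Excel: MAX

print(round(mean_age, 2), median_age, youngest, oldest)  # 28.67 28.0 24 35
```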
EXAMINE RELATIONSHIPS AMONG VARIABLES
• In many cases relationships among variables are more
important in analysis than the properties of one variable.
• Graphical methods can track relationships.
• Example: How long have older applicants held their
current jobs?
EXAMINE RELATIONSHIPS AMONG VARIABLES (CONT’D)
• Use XLMiner to create a
scatterplot between AGE
and JOB MONTHS in the
Applicants database.
• Select Explore►Chart
Wizard►Scatterplot Matrix.
• Select variables AGE and
JOB MONTHS, then click
Finish for results at right.
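A scatterplot shows every (AGE, JOB MONTHS) pair at once. As a numeric companion, and not a substitute for the chart, the sketch below groups invented pairs into age bands and averages JOB MONTHS within each band. The 10-year band width and the function name are assumptions.

```python
import statistics
from collections import defaultdict

# Hypothetical (AGE, JOB MONTHS) pairs: the points a scatterplot would show.
pairs = [(24, 18), (26, 30), (29, 40), (33, 60), (36, 84), (41, 96)]


def mean_job_months_by_age_band(points, band_width=10):
    """Average JOB MONTHS within each age band, keyed by the band's lower edge."""
    bands = defaultdict(list)
    for age, months in points:
        bands[(age // band_width) * band_width].append(months)
    return {band: statistics.mean(vals) for band, vals in sorted(bands.items())}


summary = mean_job_months_by_age_band(pairs)
for band, avg in summary.items():
    print(f"ages {band}-{band + 9}: {avg:.1f} months on average")
```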
EXAMINE RELATIONSHIPS AMONG VARIABLES (CONT’D)
• Relationships may be more complex,
based on numerous variables.
– Example: How does the distribution of
GMAT scores of applicants compare
across the five application rounds?
• This asks us to compare five
distributions, each with considerable
information.
• The Boxplot option in XLMiner can
generate a chart summarizing
numerous statistics (e.g., mean,
median).
• Select Explore►Chart
Wizard►Boxplot, select variables
GMAT and ROUND, and click Finish.
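The statistics that a boxplot summarizes can also be computed per group. The sketch below uses invented GMAT scores for two rounds only. Quartile conventions differ between tools, so the "inclusive" method chosen here is an assumption and may not match XLMiner's chart exactly.

```python
import statistics
from collections import defaultdict

# Hypothetical (ROUND, GMAT) pairs; only two rounds are shown to keep it short.
applicants = [
    (1, 700), (1, 720), (1, 680), (1, 740),
    (2, 650), (2, 690), (2, 710), (2, 630),
]


def boxplot_stats(values):
    """Summary statistics of the kind a boxplot displays."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "min": min(values), "q1": q1, "median": median,
        "q3": q3, "max": max(values), "mean": statistics.mean(values),
    }


scores_by_round = defaultdict(list)
for round_number, gmat in applicants:
    scores_by_round[round_number].append(gmat)

stats_by_round = {
    r: boxplot_stats(scores) for r, scores in sorted(scores_by_round.items())
}
for round_number, stats in stats_by_round.items():
    print(round_number, stats)
```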
SUMMARY
• The ability to use data intelligently is a vital skill for business
analysts.
• Analysts tend to perform most of their analysis in Excel.
• Understanding the data is the most important step and must come
before any analysis.
• Careful preparation of raw data is often required before data
mining can succeed.
– Missing values may have to be removed or replaced with average
values.
– Numerical variables may need to be converted to categorical values
(or vice versa).
– Normalization of data may be required.
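The three preparation steps above can be sketched on a single made-up AGE column. The cutoff of 30 for the categories and the use of min-max scaling for normalization are assumptions chosen for illustration; other cutoffs and other normalization schemes are equally plausible.

```python
import statistics

# Hypothetical AGE column; None marks a missing value.
ages = [24, None, 30, 36]

# 1. Replace missing values with the average of the observed values.
observed = [a for a in ages if a is not None]
mean_age = statistics.mean(observed)
imputed = [mean_age if a is None else a for a in ages]


# 2. Convert the numerical variable to a categorical one (the cutoff is arbitrary).
def to_category(age):
    return "under 30" if age < 30 else "30 and over"


categories = [to_category(a) for a in imputed]

# 3. Normalize to the range 0..1 (min-max scaling, one of several options).
lowest, highest = min(imputed), max(imputed)
normalized = [(a - lowest) / (highest - lowest) for a in imputed]

print(imputed)      # [24, 30, 30, 36]
print(categories)
print(normalized)   # [0.0, 0.5, 0.5, 1.0]
```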
COPYRIGHT © 2013 JOHN WILEY & SONS, INC.
All rights reserved. Reproduction or translation of
this work beyond that permitted in section 117 of the 1976
United States Copyright Act without express permission of
the copyright owner is unlawful. Requests for further
information should be addressed to the Permissions
Department, John Wiley & Sons, Inc. The purchaser may
make back-up copies for his/her own use only and not for
distribution or resale. The Publisher assumes no
responsibility for errors, omissions, or damages caused by
the use of these programs or from the use of the information
herein.
