Marketing Database Analysis
Anna Andrusova
Nathan Bailey
James Ballard
Han Si
Demin Wang
MKT 6362. Database Marketing
Overview of Business Problem
• In the 1990s and early 2000s, Dominick’s was a chain of over
100 grocery stores in the Chicago Metropolitan area
• For this evaluation, we are performing a corporate-level as
well as a category-level data analysis
• Corporate Analysis – Relate store sales performance
with known demographics to facilitate corporate
planning activities and test potential locations
• Category Analysis – Relate category sales performance
with known demographics to improve sales
performance and expand product offerings
Data Description
Store-level historical sales data covering a period of more than seven years
Customer Count File
Daily sales of stores in 30 product categories:
• Bakery
• Beer
• Cosmetic
• Dairy
• Meat
• Pharmacy
• Grocery
• Cheese
• Wine
• Health and Beauty
• Deli
• Fish
• Floral
• Jewelry, etc.
Store-Specific Demographics
Demographic profiles of stores:
• Age
• Single / Retired / Unemployed
• Mortgage
• Poverty
• Income
• Education
• Household size
• Working woman, etc.
Data Preparation
Step 1. The latest year’s sales data were aggregated by store and summarized for the year from the Customer Count File
Step 2. Demographic variables were added from the Store Account File
Resulting data set:
• One record per store (94 stores) containing 12-month sales data and store demographic data
• Sales data on 30 product categories (the ‘Behavior’ variables)
• 43 demographic variables for residents living near the store
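The two preparation steps can be sketched in plain Python. This is only an illustration under stated assumptions, not the team's actual code: the field names (`store`, `category`, `sales`), the sample rows, and the helper `build_store_records` are all hypothetical.

```python
from collections import defaultdict


def build_store_records(daily_sales, demographics):
    """Aggregate daily category sales by store, then attach demographics.

    daily_sales: iterable of dicts with the hypothetical keys
        'store', 'category', 'sales' (one row per store, day, category).
    demographics: dict mapping store id -> dict of demographic variables.
    Returns one flat record per store that has demographic data.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for row in daily_sales:
        # Step 1: sum the year's daily sales per store and category.
        totals[row["store"]][row["category"]] += row["sales"]

    records = {}
    for store, by_category in totals.items():
        if store not in demographics:
            continue  # no demographic profile, so the store is dropped
        # Step 2: add the store's demographic variables to its sales totals.
        record = dict(by_category)
        record.update(demographics[store])
        records[store] = record
    return records


# Made-up example rows (not Dominick's data).
daily_sales = [
    {"store": 101, "category": "BEER", "sales": 120.0},
    {"store": 101, "category": "BEER", "sales": 80.0},
    {"store": 101, "category": "WINE", "sales": 50.0},
    {"store": 102, "category": "BEER", "sales": 30.0},
]
demographics = {101: {"INCOME": 10.5, "EDUC": 0.25}}

store_records = build_store_records(daily_sales, demographics)
print(store_records)
```

Store 102 is dropped in this toy run because it has no demographic row, which mirrors the one-record-per-store layout described above.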
Approach
1. Segmentation: create groups of stores that perform similarly on a given set of product categories and differently from the other groups on the same set of categories
Method: Non-hierarchical and hierarchical clustering
2. Response Analysis: find targetable characteristics of identified
groups of the stores
Method: Discriminant analysis
3. Model Validation: evaluate performance of the models on a hold-out
sample (20% of the stores)
4. Recommendations and conclusions
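The validation step depends on setting stores aside before the models are fitted. A minimal sketch of such a split follows, assuming a simple random draw; the deck does not say how the hold-out stores were chosen, and the store ids 1 to 94, the seed, and the helper `split_holdout` are hypothetical.

```python
import random


def split_holdout(store_ids, holdout_fraction=0.2, seed=0):
    """Randomly reserve a fraction of stores for model validation."""
    ids = sorted(store_ids)
    rng = random.Random(seed)  # fixed seed keeps the split reproducible
    rng.shuffle(ids)
    n_holdout = round(len(ids) * holdout_fraction)
    return ids[n_holdout:], ids[:n_holdout]  # (training, hold-out)


# Hypothetical store ids 1..94; the real ids are not listed in the deck.
train, holdout = split_holdout(range(1, 95))
print(len(train), len(holdout))
```

Clustering and discriminant analysis would then be fitted on `train` only, with `holdout` used solely to check classification accuracy.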
Flowchart of the Approach
[Flowchart: Dominick’s Data Set (General Data Set) → Data Preparation → Clusters (Hierarchical Clustering and Non-Hierarchical Clustering) → Response Analysis (Discriminant Analysis) → Model Test (Hold-Out Group, 20%) → Conclusion and Recommendation. The flow covers both the Corporate Analysis and the Category Analysis and ends in Corporate Analysis Results and Category Analysis Results.]
Corporate Analysis
Step #1 – Hierarchical Clustering
Cluster History
Number of  Clusters Joined  Freq  New Cluster  Semipartial  R-Square  Centroid  Tie
Clusters                          RMS Std Dev  R-Square               Distance
…
11         CL21   311       3     255876       0.0013       .955      2.09E6
10         CL15   112       15    223435       0.0018       .953      2.09E6
9          CL18   CL11      9     293813       0.0044       .949      2.25E6
8          CL14   314       5     281264       0.0020       .947      2.37E6
7          CL10   CL17      43    329098       0.0346       .912      2.84E6
6          304    315       2     376122       0.0018       .910      2.86E6
5          CL8    CL9       14    451590       0.0209       .889      3.85E6
4          CL13   CL7       76    455327       0.1236       .766      3.88E6
3          CL12   CL5       16    567698       0.0270       .739      5.93E6
2          CL3    CL6       18    679121       0.0365       .702      6.84E6
1          CL4    CL2       94    918977       0.7022       .000      1.05E7
Conclusion: optimal number of clusters is between 3 and 6
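One way to read the Cluster History is that the semipartial R-square of a merge is the R-square lost by making it, so a large value flags a merge of dissimilar groups. The sketch below checks that reading against the figures in the table; the helper `r_square_drops` is hypothetical and the comparison only holds up to the rounding in the printed output.

```python
# (number of clusters after the merge, semipartial R-square, R-square)
# copied from the Cluster History table above.
history = [
    (11, 0.0013, 0.955),
    (10, 0.0018, 0.953),
    (9, 0.0044, 0.949),
    (8, 0.0020, 0.947),
    (7, 0.0346, 0.912),
    (6, 0.0018, 0.910),
    (5, 0.0209, 0.889),
    (4, 0.1236, 0.766),
    (3, 0.0270, 0.739),
    (2, 0.0365, 0.702),
    (1, 0.7022, 0.000),
]


def r_square_drops(rows):
    """Return {k: R-square lost by the merge that leaves k clusters}."""
    drops = {}
    for (_, _, r2_before), (k, _, r2_after) in zip(rows, rows[1:]):
        drops[k] = round(r2_before - r2_after, 3)
    return drops


drops = r_square_drops(history)
for k, semipartial, _ in history[1:]:
    # The two figures should agree up to rounding in the printed table.
    print(k, drops[k], semipartial)
```

In these figures the merge from five to four clusters costs far more R-square (about 0.12) than any merge before it, which is the kind of jump that usually bounds the range of candidate solutions.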
Corporate Analysis (Cont.)
Step #2 – Non-Hierarchical Clustering
                                         3 clusters  4 clusters  5 clusters  6 clusters
Pseudo F Statistic                       256.19      245.65      246.97      260.81
Approximate Expected Over-All R-Squared  0.7364      0.77973     0.80166     0.8157
Cubic Clustering Criterion               5.517       6.505       7.813       16.200
Conclusion: based on the results of both hierarchical and non-hierarchical clustering, the 6-cluster solution is determined to be optimal
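The pseudo F statistic compares between-cluster to within-cluster variation: for n stores and k clusters, F = (R²/(k−1)) / ((1−R²)/(n−k)). The sketch below pairs it with a bare-bones k-means on one variable. It assumes the non-hierarchical step is a k-means-style procedure, which the deck does not state, and the sales figures, seeds, and function names are made up for illustration.

```python
def pseudo_f(r_square, n_obs, n_clusters):
    """Pseudo F: between-cluster variance over within-cluster variance."""
    between = r_square / (n_clusters - 1)
    within = (1.0 - r_square) / (n_obs - n_clusters)
    return between / within


def k_means(points, seeds, n_iter=20):
    """Plain Lloyd's algorithm on 1-D data; returns (centroids, labels)."""
    centroids = list(seeds)
    labels = []
    for _ in range(n_iter):
        # Assign every point to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: (p - centroids[j]) ** 2)
            for p in points
        ]
        # Move each centroid to the mean of its assigned points.
        for j in range(len(centroids)):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = sum(members) / len(members)
    return centroids, labels


def r_square(points, centroids, labels):
    """Share of total variation explained by the cluster assignment."""
    grand_mean = sum(points) / len(points)
    total_ss = sum((p - grand_mean) ** 2 for p in points)
    within_ss = sum((p - centroids[lab]) ** 2 for p, lab in zip(points, labels))
    return 1.0 - within_ss / total_ss


# Made-up annual sales (in $ millions) for nine hypothetical stores.
sales = [1.0, 1.2, 0.8, 5.0, 5.5, 4.5, 9.0, 9.5, 10.0]
centroids, labels = k_means(sales, seeds=[1.0, 5.0, 9.0])
r2 = r_square(sales, centroids, labels)
print(round(r2, 3), round(pseudo_f(r2, len(sales), 3), 1))
```

Running the same calculation for several values of k and comparing the resulting pseudo F values is the comparison the table above summarizes.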
Corporate Analysis – Clustering Results
Cluster Summary
Cluster  Freq  RMS Std    Max Distance from    Radius    Nearest  Distance Between
               Deviation  Seed to Observation  Exceeded  Cluster  Cluster Centroids
1        33    201245     2233427                        6        2948467
2        1     .          0                              3        4765417
3        6     336424     2455353                        4        4286141
4        9     293813     2207687                        3        4286141
5        16    274583     3058122                        6        3192018
6        29    213948     2063995                        1        2948467
% of stores vs. % of sales
Segment    % of stores  % of sales
Segment 1  35.1%        21.5%
Segment 2  1.1%         2.4%
Segment 3  6.4%         12.1%
Segment 4  9.6%         14.7%
Segment 5  17.0%        21.2%
Segment 6  30.9%        28.2%
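Dividing each segment's share of sales by its share of stores gives a simple per-store performance index. The sketch below uses the percentages from the chart above; the index itself and the helper `sales_index` are an illustrative addition, not a measure reported in the deck.

```python
# Percentages read from the "% of stores vs. % of sales" chart above.
pct_stores = {1: 35.1, 2: 1.1, 3: 6.4, 4: 9.6, 5: 17.0, 6: 30.9}
pct_sales = {1: 21.5, 2: 2.4, 3: 12.1, 4: 14.7, 5: 21.2, 6: 28.2}


def sales_index(stores, sales):
    """Sales share over store share; below 1.0 means below-average stores."""
    return {seg: round(sales[seg] / stores[seg], 2) for seg in stores}


index = sales_index(pct_stores, pct_sales)
below_average = sorted(seg for seg, value in index.items() if value < 1.0)
print(index)
print(below_average)
```

Segments whose index falls below 1.0 hold more of the stores than of the sales, which is one way to single out underperforming segments.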
Corporate Analysis - Discriminant Analysis
Confidence Level: 90%
Univariate Test Statistics
F Statistics, Num DF=5, Den DF=79
Variable  Total    Pooled   Between   R-Square  R-Square   F Value  Pr > F
          Std Dev  Std Dev  Std Dev             / (1-RSq)
EDUC      0.1129   0.1102   0.0394    0.1029    0.1147     1.81     0.1200
NOCAR     0.1316   0.1287   0.0453    0.1000    0.1111     1.76     0.1318
INCSIGMA  2323     2264     824.9388  0.1064    0.1190     1.88     0.1070
HSIZE1    0.0829   0.0809   0.0292    0.1045    0.1167     1.84     0.1138
SINHOUSE  0.2173   0.2103   0.0817    0.1194    0.1355     2.14     0.0690
HVAL200   0.1853   0.1758   0.0792    0.1541    0.1822     2.88     0.0194
SINGLE    0.0703   0.0665   0.0306    0.1593    0.1895     2.99     0.0158
NWRKCH17  0.0199   0.0194   0.006933  0.1024    0.1141     1.80     0.1218
TELEPHN   0.0309   0.0293   0.0134    0.1581    0.1879     2.97     0.0166
SHPINDX   0.2482   0.2405   0.0924    0.1168    0.1323     2.09     0.0753
* 17 statistically significant variables in total
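The F Value column follows from the R-Square column and the degrees of freedom: F = (R²/(1−R²)) × (Den DF / Num DF). A short check against a few rows of the table above, with the helper name `f_value` chosen for illustration:

```python
def f_value(r_square, num_df, den_df):
    """One-way ANOVA F statistic recovered from a variable's R-square."""
    return (r_square / (1.0 - r_square)) * (den_df / num_df)


# R-square values copied from the table above (Num DF = 5, Den DF = 79).
r_squares = {"EDUC": 0.1029, "HVAL200": 0.1541, "SINGLE": 0.1593, "TELEPHN": 0.1581}
for name, r2 in r_squares.items():
    print(name, round(f_value(r2, 5, 79), 2))
```

The results can be compared line by line with the F Value column; small differences would come only from rounding in the printed R-square.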
Corporate Analysis - Discriminant Analysis (Cont.)
   Canonical    Adjusted Canonical  Approximate     Squared Canonical
   Correlation  Correlation         Standard Error  Correlation
1  0.847077     0.761387            0.030819        0.717540
Multivariate Statistics and F Approximations
S=5 M=15 N=21
Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.02426163  1.39     180     223.58  0.0103
Pillai's Trace          2.50666011  1.34     180     240     0.0172
Hotelling-Lawley Trace  6.07753961  1.44     180     164.86  0.0093
Roy's Greatest Root     2.54031820  3.39     36      48      <.0001
• Means of the independent variables are statistically different among segments
• Only 2.4% of the variance in the discriminant scores is not explained by the differences among groups of the stores (Wilks' Lambda)
• Ratio of between-group SS to total SS => good set of descriptors
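Several of the reported figures are tied together by simple arithmetic: squaring the first canonical correlation gives the squared canonical correlation, r²/(1−r²) gives the first eigenvalue (reported as Roy's Greatest Root), and Wilks' Lambda read as a percentage gives the unexplained share quoted above. A short check, using only values from the tables:

```python
canonical_corr = 0.847077  # first canonical correlation from the table above
wilks_lambda = 0.02426163  # Wilks' Lambda from the table above

squared_corr = canonical_corr ** 2
# First eigenvalue: between-group variation relative to within-group variation.
eigenvalue = squared_corr / (1.0 - squared_corr)

print(round(squared_corr, 4))        # compare with Squared Canonical Correlation
print(round(eigenvalue, 4))          # compare with Roy's Greatest Root
print(round(100 * wilks_lambda, 1))  # percent of variance left unexplained
```

Agreement is limited to the precision of the printed canonical correlation.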
Corporate Analysis – Classification Results
Original Dataset
Error Count Estimates for CLUSTER
        1       2       3       4       5       6       Total
Rate    0.1818  0.0000  0.0000  0.1667  0.3571  0.1923  0.1497
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
~ 85% of the stores are classified correctly
Hold-out Sample
Error Count Estimates for CLUSTER
        1       3       4       5       6       Total
Rate    0.1429  0.0000  0.0000  0.3333  0.3333  0.1619
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.8333
~ 84% of the stores are classified correctly
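The Total error rate appears to be the prior-weighted average of the per-cluster rates, and the share classified correctly is one minus that total. The sketch below assumes the priors are rescaled to sum to 1 when a cluster is absent, an assumption that reproduces the printed totals; the helper `total_error` is hypothetical.

```python
def total_error(rates, priors):
    """Prior-weighted error rate; priors are rescaled to sum to 1."""
    weight_sum = sum(priors)
    return sum(r * p for r, p in zip(rates, priors)) / weight_sum


# Per-cluster error rates copied from the two tables above.
six_cluster_rates = [0.1818, 0.0000, 0.0000, 0.1667, 0.3571, 0.1923]
five_cluster_rates = [0.1429, 0.0000, 0.0000, 0.3333, 0.3333]

err_six = total_error(six_cluster_rates, [0.1667] * 6)
err_five = total_error(five_cluster_rates, [0.1667] * 5)
print(round(err_six, 4), round(1 - err_six, 2))
print(round(err_five, 4), round(1 - err_five, 2))
```

With equal priors this reduces to a plain average of the cluster error rates, so a small cluster counts as much as a large one.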
Category Analysis: Beer and Wine
Step #1 – Hierarchical Clustering
Cluster History
Number of  Clusters Joined  Freq  New Cluster  Semipartial  R-Square  Centroid  Tie
Clusters                          RMS Std Dev  R-Square               Distance
9          CL16   309       8     72804.2      0.0031       .906      197203
8          CL23   CL13      10    93539.6      0.0091       .897      200748
7          CL10   CL31      11    95378.9      0.0085       .888      239550
6          CL7    CL8       21    145510       0.0459       .842      311263
5          CL87   CL11      61    112380       0.0639       .778      318702
4          CL5    CL6       82    170030       0.2099       .568      385452
3          CL4    CL15      85    185394       0.0973       .471      609748
2          CL3    CL9       93    226017       0.3212       .150      696877
1          CL2    304       94    243807       0.1499       .000      1.29E6
Conclusion: optimal number of clusters is between 4 and 6
Category Analysis: Beer and Wine (Cont.)
Step #2 – Non-Hierarchical Clustering
                                         4 clusters  5 clusters  6 clusters
Pseudo F Statistic                       87.53       116.85      131.08
Approximate Expected Over-All R-Squared  0.7692      0.81988     0.85358
Cubic Clustering Criterion               -1.336      1.458       2.489
Conclusion: based on the results of both hierarchical and non-hierarchical clustering, the 6-cluster solution is determined to be optimal
Category Analysis: Beer and Wine (Cont.)
Cluster Summary
Cluster  Frequency  RMS Std    Maximum Distance from  Radius    Nearest  Distance Between
                    Deviation  Seed to Observation    Exceeded  Cluster  Cluster Centroids
1        35         83267.8    194999                           2        268532
2        32         78629.9    206948                           1        268532
3        8          131663     250170                           2        374603
4        9          82174.1    159203                           2        333646
5        9          80329.2    180104                           4        377389
6        1          .          0                                3        924906
Cluster Means
Cluster  BEER        WINE
1        144128.421  101864.577
2        326776.212  298713.241
3        493651.738  634093.243
4        649465.774  213912.842
5        955669.947  434505.459
6        383045.800  1552362.060
• Cluster #5 is the top seller of Beer
• Cluster #6 is the top seller of Wine
• Cluster #1 has the lowest sales of both Beer & Wine
• The single store in Cluster 6 is an outlier
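The observations above can be read directly off the Cluster Means table by ranking clusters within each category. A small sketch, with the helper `top_clusters` named for illustration:

```python
# Cluster means copied from the Cluster Means table above.
cluster_means = {
    1: {"BEER": 144128.421, "WINE": 101864.577},
    2: {"BEER": 326776.212, "WINE": 298713.241},
    3: {"BEER": 493651.738, "WINE": 634093.243},
    4: {"BEER": 649465.774, "WINE": 213912.842},
    5: {"BEER": 955669.947, "WINE": 434505.459},
    6: {"BEER": 383045.800, "WINE": 1552362.060},
}


def top_clusters(means, category, n=2):
    """Cluster ids with the highest mean sales in a category, best first."""
    ranked = sorted(means, key=lambda c: means[c][category], reverse=True)
    return ranked[:n]


print(top_clusters(cluster_means, "BEER"))
print(top_clusters(cluster_means, "WINE"))
```

Ranking the top two clusters per category gives a shortlist of segments where a category already sells strongly, while the full ranking also shows which cluster sits at the bottom for both.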
Discriminant Analysis: Beer and Wine
Confidence level: 95%
Univariate Test Statistics
F Statistics, Num DF=5, Den DF=79
Variable  Total     Pooled    Between   R-Square  R-Square   F Value  Pr > F
          Std Dev   Std Dev   Std Dev             / (1-RSq)
AGE9      0.0272    0.0261    0.0109    0.1347    0.1557     2.46     0.0400
EDUC      0.1129    0.1051    0.0528    0.1843    0.2259     3.57     0.0058
INCOME    0.2921    0.2793    0.1192    0.1405    0.1635     2.58     0.0324
INCSIGMA  2323      2191      1021      0.1630    0.1948     3.08     0.0137
HSIZEAVG  0.2686    0.2480    0.1303    0.1985    0.2477     3.91     0.0032
HSIZE2    0.0322    0.0298    0.0154    0.1942    0.2410     3.81     0.0038
HSIZE567  0.0325    0.0277    0.0200    0.3176    0.4655     7.35     <.0001
HH3PLUS   0.0844    0.0796    0.0371    0.1628    0.1944     3.07     0.0138
HH4PLUS   0.0650    0.0606    0.0303    0.1833    0.2244     3.55     0.0061
DENSITY   0.001250  0.001192  0.000518  0.1447    0.1692     2.67     0.0277
HVAL150   0.2460    0.2260    0.1217    0.2064    0.2601     4.11     0.0023
HVAL200   0.1853    0.1664    0.0992    0.2417    0.3188     5.04     0.0005
HVALMEAN  47.3071   42.9341   24.4560   0.2254    0.2909     4.60     0.0010
SINGLE    0.0703    0.0664    0.0308    0.1616    0.1927     3.04     0.0145
UNEMP     0.0239    0.0226    0.0103    0.1576    0.1871     2.96     0.0169
WRKWNCH   0.0446    0.0424    0.0187    0.1483    0.1742     2.75     0.0241
TELEPHN   0.0309    0.0287    0.0148    0.1929    0.2389     3.78     0.0041
POVERTY   0.0457    0.0441    0.0175    0.1238    0.1413     2.23     0.0590
Statistically significant variables in discriminating observations among groups
Discriminant Analysis: Beer and Wine (Cont.)
   Canonical    Adjusted Canonical  Approximate     Squared Canonical
   Correlation  Correlation         Standard Error  Correlation
1  0.846814     0.751237            0.030868        0.717094
Multivariate Statistics and F Approximations
S=5 M=15 N=21
Statistic               Value       F Value  Num DF  Den DF  Pr > F
Wilks' Lambda           0.01346418  1.72     180     223.58  <.0001
Pillai's Trace          2.81504177  1.72     180     240     <.0001
Hotelling-Lawley Trace  7.26639429  1.72     180     164.86  0.0002
Roy's Greatest Root     2.53474655  3.38     36      48      <.0001
• Means of the independent variables are statistically different among segments
• Only 1.3% of the variance in the discriminant scores is not explained by the differences among groups of the stores (Wilks' Lambda)
• Good set of descriptors
Beer & Wine Category Analysis – Classification Results
Original Dataset
Error Count Estimates for CLUSTER
        1       2       3       4       5       6       Total
Rate    0.5714  0.6207  0.7143  0.6000  0.8750  1.0000  0.7302
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
~ 27% of the stores are classified correctly
Hold-out Sample
Error Count Estimates for CLUSTER
        1       2       3       4       5       Total
Rate    0.1667  0.3333  0.5000  0.5000  0.5000  0.4000
Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.8333
~ 60% of the stores are classified correctly
Recommendations
Corporate Level:
• Resource allocation among the stores: perform additional analysis of the stores in underperforming segments (1 & 6)
• Evaluation of potential locations for a new store: deploy the discriminant function to predict the performance of stores in different product categories based on the demographic profiles of their locations
Category Level (Beer & Wine):
• Marketing strategy for a new brand of Beer or Wine: adjust the targeting strategy for a product based on the demographic profile of the location where it will be sold
• Choice of stores to test-market a new product: perform a market test for Beer in stores of segments 4 & 5 and for Wine in stores of segments 3 & 6
Limitations of the Analysis
Additional data would allow a higher-quality analysis at a more granular level:
• Product-level data: assessment of specific product sales in new stores and prediction of the performance of a new product being considered for launch
• Customer-specific data: ability to build better predictive models tied to customer demographics (scanner data from loyalty program members’ transactions)

Database Marketing - Dominick's stores in Chicago distric

  • 1. MarketingDatabaseAnalysis Anna Andrusova Nathan Bailey James Ballard Han Si DeminWangMKT 6362. Database Marketing
  • 2. Overview of Business Problem • In the 1990’s and early 2000’s, Dominick’s was a chain of over 100 grocery stores in the Chicago Metropolitan area • For this evaluation, we are performing a corporate-level as well as a category-level data analysis • Corporate Analysis – Relate store sales performance with known demographics to facilitate corporate planning activities and test potential locations • Category Analysis – Relate category sales performance with known demographics to improve sales performance and expand product offerings
  • 3. Data Description Store-level historical data on the sales over more than seven year period Customer Count File Daily sales of stores in 30 product categories: • Bakery • Beer • Cosmetic • Dairy • Meat • Pharmacy • Grocery Store-Specific Demographics Demographic profiles of stores: • Age • Single / Retired / Unemployed • Mortgage • Poverty • Income • Education • Household size • Working woman, etc • Cheese • Wine • Health and Beauty • Deli • Fish • Floral • Jewelry, etc.
  • 4. Data Preparation Step 1. The latest year’s sales data was aggregated by Store and summarized for the year from Customer Count File Step 2. Demographic variables were added from Store Account File Resulting data set: • 1-record per store (94 stores) containing 12-month sales data and store demographic data • Sales data on 30 product categories (the ‘Behavior’ variables) • 43 demographic variables for residents living near the store
  • 5. Approach 1. Segmentation: create groups of the stores similar in their performance according to certain group of product categories and dissimilar to the other groups according to the same group of categories Method: Non-hierarchical and hierarchical clustering 2. Response Analysis: find targetable characteristics of identified groups of the stores Method: Discriminant analysis 3. Model Validation: evaluate performance of the models on a hold-out sample (20% of the stores) 4. Recommendations and conclusions
  • 6. Dominick’s Data Set General Data Set Corporate Analysis Category Analysis Data Preparation Clusters Hierarchical Clustering and Non-Hierarchical Clustering Response Analysis Discriminate Analysis Hold-Out Group 20% Model Test Conclusion and Recommendation Corporate Analysis Results Category Analysis Results Flowchart of the Approach
  • 7. Cluster History Number of Clusters Clusters Joined Freq New Cluster RMS Std Dev Semipartial R-Square R-Square Centroid Distance Tie … … … … … … … …. 11 CL21 311 3 255876 0.0013 .955 2.09E6 10 CL15 112 15 223435 0.0018 .953 2.09E6 9 CL18 CL11 9 293813 0.0044 .949 2.25E6 8 CL14 314 5 281264 0.0020 .947 2.37E6 7 CL10 CL17 43 329098 0.0346 .912 2.84E6 6 304 315 2 376122 0.0018 .910 2.86E6 5 CL8 CL9 14 451590 0.0209 .889 3.85E6 4 CL13 CL7 76 455327 0.1236 .766 3.88E6 3 CL12 CL5 16 567698 0.0270 .739 5.93E6 2 CL3 CL6 18 679121 0.0365 .702 6.84E6 1 CL4 CL2 94 918977 0.7022 .000 1.05E7 Corporate Analysis Step #1 – Hierarchical Clustering Conclusion: optimal number of clusters is between 3 and 6
  • 8. 3 clusters 4 clusters 5 clusters 6 clusters Pseudo F Statistic 256.19 245.65 246.97 260.81 Approximate Expected Over-All R- Squared 0.7364 0.77973 0.80166 0.8157 Cubic Clustering Criterion 5.517 6.505 7.813 16.200 Corporate Analysis (Cont.) Step #2 – Non-Hierarchical Clustering Conclusion: based on the results of both Hierarchical and Non Hierarchical clustering 6-cluster solution is determined to be optimal
  • 9. Corporate Analysis – Clustering Results Cluster Summary Cluster Freq RMS Std Deviation Max Distance from Seed to Observation Radius Exceeded Nearest Cluster Distance Between Cluster Centroids 1 33 201245 2233427 6 2948467 2 1 . 0 3 4765417 3 6 336424 2455353 4 4286141 4 9 293813 2207687 3 4286141 5 16 274583 3058122 6 3192018 6 29 213948 2063995 1 2948467 35.1% 1.1% 6.4% 9.6% 17.0% 30.9% 21.5% 2.4% 12.1% 14.7% 21.2% 28.2% Segment 1 Segment 2 Segment 3 Segment 4 Segment 5 Segment 6 % of stores vs. % of sales % of stores % of sales
  • 10. Corporate Analysis - Discriminant Analysis Confidence Level: 90% Univariate Test Statistics F Statistics, Num DF=5, Den DF=79 Variable Total Standard Deviation Pooled Standard Deviation Between Standard Deviation R-Square R-Square / (1-RSq) F Value Pr > F EDUC 0.1129 0.1102 0.0394 0.1029 0.1147 1.81 0.1200 NOCAR 0.1316 0.1287 0.0453 0.1000 0.1111 1.76 0.1318 INCSIGMA 2323 2264 824.9388 0.1064 0.1190 1.88 0.1070 HSIZE1 0.0829 0.0809 0.0292 0.1045 0.1167 1.84 0.1138 SINHOUSE 0.2173 0.2103 0.0817 0.1194 0.1355 2.14 0.0690 HVAL200 0.1853 0.1758 0.0792 0.1541 0.1822 2.88 0.0194 SINGLE 0.0703 0.0665 0.0306 0.1593 0.1895 2.99 0.0158 NWRKCH17 0.0199 0.0194 0.006933 0.1024 0.1141 1.80 0.1218 TELEPHN 0.0309 0.0293 0.0134 0.1581 0.1879 2.97 0.0166 SHPINDX 0.2482 0.2405 0.0924 0.1168 0.1323 2.09 0.0753 * 17 statistically significant variables in total
  • 11. Corporate Analysis - Discriminant Analysis (Cont.)
       Canonical Correlation  Adjusted Canonical Correlation  Approximate Standard Error  Squared Canonical Correlation
    1  0.847077               0.761387                        0.030819                    0.717540
    Multivariate Statistics and F Approximations
    S=5 M=15 N=21
    Statistic               Value       F Value  Num DF  Den DF  Pr > F
    Wilks' Lambda           0.02426163  1.39     180     223.58  0.0103
    Pillai's Trace          2.50666011  1.34     180     240     0.0172
    Hotelling-Lawley Trace  6.07753961  1.44     180     164.86  0.0093
    Roy's Greatest Root     2.54031820  3.39     36      48      <.0001
    Means of the independent variables are statistically different among segments
    Only 2.4% of the variance in the discriminant scores is not explained by the differences among groups of the stores
    Ratio of between-group SS to total SS => good set of descriptors
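Several of the figures above are tied together by simple identities: the squared canonical correlation is the square of the first canonical correlation, Roy's greatest root equals r² / (1 - r²) for that correlation, and the "2.4%" statement is Wilks' Lambda expressed as a percentage. A minimal sketch checking these against the printed values (variable names are illustrative):

```python
canonical_corr = 0.847077       # first canonical correlation, from the slide
reported_squared = 0.717540     # squared canonical correlation, from the slide
reported_roy = 2.54031820       # Roy's greatest root, from the slide
wilks_lambda = 0.02426163       # Wilks' Lambda, from the slide

squared = canonical_corr ** 2
roy_root = squared / (1.0 - squared)   # largest eigenvalue implied by r^2
unexplained_pct = 100.0 * wilks_lambda # share of variance not explained by groups

print(round(squared, 6), reported_squared)
print(round(roy_root, 4), reported_roy)
print(round(unexplained_pct, 1))
```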
  • 12. Corporate Analysis – Classification Results
    Original Dataset
    Error Count Estimates for CLUSTER
            1       2       3       4       5       6       Total
    Rate    0.1818  0.0000  0.0000  0.1667  0.3571  0.1923  0.1497
    Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
    ~ 85% of the stores are classified correctly
    Hold-out Sample
    Error Count Estimates for CLUSTER
            1       3       4       5       6       Total
    Rate    0.1429  0.0000  0.0000  0.3333  0.3333  0.1619
    Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.8333
    ~ 84% of the stores are classified correctly
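The "Total" error rates and the quoted accuracy figures can be reproduced from the per-cluster rates and priors. The sketch below assumes the total is the prior-weighted average of the per-cluster rates, normalized by the sum of the priors; that normalization is inferred from the printed numbers (the hold-out priors sum to 0.8333), not documented in the deck.

```python
def total_error_rate(rates, priors):
    """Prior-weighted average of per-cluster error rates,
    divided by the sum of the priors (assumed normalization)."""
    weighted = sum(r * p for r, p in zip(rates, priors))
    return weighted / sum(priors)


PRIOR = 0.1667  # equal prior printed for every cluster

# Per-cluster error rates copied from the two tables.
original_rates = [0.1818, 0.0000, 0.0000, 0.1667, 0.3571, 0.1923]
holdout_rates = [0.1429, 0.0000, 0.0000, 0.3333, 0.3333]

original_total = total_error_rate(original_rates, [PRIOR] * len(original_rates))
holdout_total = total_error_rate(holdout_rates, [PRIOR] * len(holdout_rates))

print(round(original_total, 4), round(100 * (1 - original_total)))
print(round(holdout_total, 4), round(100 * (1 - holdout_total)))
```

The complements of the two totals correspond to the "~ 85%" and "~ 84%" correct-classification figures.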
  • 13. Category Analysis: Beer and Wine. Step #1 – Hierarchical Clustering
    Cluster History
    Number of Clusters  Clusters Joined  Freq  New Cluster RMS Std Dev  Semipartial R-Square  R-Square  Centroid Distance  Tie
    9                   CL16   309       8     72804.2                  0.0031                .906      197203
    8                   CL23   CL13      10    93539.6                  0.0091                .897      200748
    7                   CL10   CL31      11    95378.9                  0.0085                .888      239550
    6                   CL7    CL8       21    145510                   0.0459                .842      311263
    5                   CL87   CL11      61    112380                   0.0639                .778      318702
    4                   CL5    CL6       82    170030                   0.2099                .568      385452
    3                   CL4    CL15      85    185394                   0.0973                .471      609748
    2                   CL3    CL9       93    226017                   0.3212                .150      696877
    1                   CL2    304       94    243807                   0.1499                .000      1.29E6
    Conclusion: the optimal number of clusters is between 4 and 6
  • 14. Category Analysis: Beer and Wine (Cont.). Step #2 – Non-Hierarchical Clustering
                                             4 clusters  5 clusters  6 clusters
    Pseudo F Statistic                       87.53       116.85      131.08
    Approximate Expected Over-All R-Squared  0.7692      0.81988     0.85358
    Cubic Clustering Criterion               -1.336      1.458       2.489
    Conclusion: based on the results of both hierarchical and non-hierarchical clustering, the 6-cluster solution is determined to be optimal
  • 15. Category Analysis: Beer and Wine (Cont.)
    Cluster Summary
    Cluster  Frequency  RMS Std Deviation  Maximum Distance from Seed to Observation  Radius Exceeded  Nearest Cluster  Distance Between Cluster Centroids
    1        35         83267.8            194999                                                      2                268532
    2        32         78629.9            206948                                                      1                268532
    3        8          131663             250170                                                      2                374603
    4        9          82174.1            159203                                                      2                333646
    5        9          80329.2            180104                                                      4                377389
    6        1          .                  0                                                           3                924906
    Cluster Means
    Cluster  BEER        WINE
    1        144128.421  101864.577
    2        326776.212  298713.241
    3        493651.738  634093.243
    4        649465.774  213912.842
    5        955669.947  434505.459
    6        383045.800  1552362.060
    Cluster #5 is the top seller of Beer
    Cluster #6 is the top seller of Wine
    Cluster #1 has the lowest sales of both Beer & Wine
    The one store in Cluster 6 is an outlier
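The three observations about top and bottom sellers can be read straight off the Cluster Means table. A minimal sketch that ranks the clusters by each category mean, with the values copied from the slide (names are illustrative):

```python
# Cluster means (BEER, WINE) copied from the Cluster Means table.
cluster_means = {
    1: (144128.421, 101864.577),
    2: (326776.212, 298713.241),
    3: (493651.738, 634093.243),
    4: (649465.774, 213912.842),
    5: (955669.947, 434505.459),
    6: (383045.800, 1552362.060),
}

top_beer = max(cluster_means, key=lambda c: cluster_means[c][0])
top_wine = max(cluster_means, key=lambda c: cluster_means[c][1])
lowest_beer = min(cluster_means, key=lambda c: cluster_means[c][0])
lowest_wine = min(cluster_means, key=lambda c: cluster_means[c][1])

print("Top Beer cluster:", top_beer)
print("Top Wine cluster:", top_wine)
print("Lowest Beer / Wine cluster:", lowest_beer, lowest_wine)
```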
  • 16. Discriminant Analysis: Beer and Wine
    Confidence level: 95%
    Univariate Test Statistics
    F Statistics, Num DF=5, Den DF=79
    Variable  Total Std Dev  Pooled Std Dev  Between Std Dev  R-Square  R-Square/(1-RSq)  F Value  Pr > F
    AGE9      0.0272         0.0261          0.0109           0.1347    0.1557            2.46     0.0400
    EDUC      0.1129         0.1051          0.0528           0.1843    0.2259            3.57     0.0058
    INCOME    0.2921         0.2793          0.1192           0.1405    0.1635            2.58     0.0324
    INCSIGMA  2323           2191            1021             0.1630    0.1948            3.08     0.0137
    HSIZEAVG  0.2686         0.2480          0.1303           0.1985    0.2477            3.91     0.0032
    HSIZE2    0.0322         0.0298          0.0154           0.1942    0.2410            3.81     0.0038
    HSIZE567  0.0325         0.0277          0.0200           0.3176    0.4655            7.35     <.0001
    HH3PLUS   0.0844         0.0796          0.0371           0.1628    0.1944            3.07     0.0138
    HH4PLUS   0.0650         0.0606          0.0303           0.1833    0.2244            3.55     0.0061
    DENSITY   0.001250       0.001192        0.000518         0.1447    0.1692            2.67     0.0277
    HVAL150   0.2460         0.2260          0.1217           0.2064    0.2601            4.11     0.0023
    HVAL200   0.1853         0.1664          0.0992           0.2417    0.3188            5.04     0.0005
    HVALMEAN  47.3071        42.9341         24.4560          0.2254    0.2909            4.60     0.0010
    SINGLE    0.0703         0.0664          0.0308           0.1616    0.1927            3.04     0.0145
    UNEMP     0.0239         0.0226          0.0103           0.1576    0.1871            2.96     0.0169
    WRKWNCH   0.0446         0.0424          0.0187           0.1483    0.1742            2.75     0.0241
    TELEPHN   0.0309         0.0287          0.0148           0.1929    0.2389            3.78     0.0041
    POVERTY   0.0457         0.0441          0.0175           0.1238    0.1413            2.23     0.0590
    Statistically significant variables in discriminating observations among groups
  • 17. Discriminant Analysis: Beer and Wine (Cont.)
       Canonical Correlation  Adjusted Canonical Correlation  Approximate Standard Error  Squared Canonical Correlation
    1  0.846814               0.751237                        0.030868                    0.717094
    Multivariate Statistics and F Approximations
    S=5 M=15 N=21
    Statistic               Value       F Value  Num DF  Den DF  Pr > F
    Wilks' Lambda           0.01346418  1.72     180     223.58  <.0001
    Pillai's Trace          2.81504177  1.72     180     240     <.0001
    Hotelling-Lawley Trace  7.26639429  1.72     180     164.86  0.0002
    Roy's Greatest Root     2.53474655  3.38     36      48      <.0001
    Means of the independent variables are statistically different among segments
    Only 1.3% of the variance in the discriminant scores is not explained by the differences among groups of the stores
    Good set of descriptors
  • 18. Beer & Wine Category Analysis – Classification Results
    Original Dataset
    Error Count Estimates for CLUSTER
            1       2       3       4       5       6       Total
    Rate    0.5714  0.6207  0.7143  0.6000  0.8750  1.0000  0.7302
    Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.1667
    ~ 27% of the stores are classified correctly
    Hold-out Sample
    Error Count Estimates for CLUSTER
            1       2       3       4       5       Total
    Rate    0.1667  0.3333  0.5000  0.5000  0.5000  0.4000
    Priors  0.1667  0.1667  0.1667  0.1667  0.1667  0.8333
    ~ 60% of the stores are classified correctly
  • 19. Recommendations
    Corporate Level:
    • Resource allocation among the stores: perform additional analysis of the stores in the underperforming segments (1 & 6)
    • Evaluation of potential locations for a new store: deploy the discriminant function to predict store performance in different product categories based on the demographic profile of each location
    Category Level (Beer & Wine):
    • Marketing strategy for a new brand of Beer or Wine: adjust the targeting strategy for a product based on the demographic profile of the location where it will be sold
    • Choice of stores to test-market a new product: test-market Beer in stores of segments 4 & 5 and Wine in stores of segments 3 & 6
  • 20. Limitations of the Analysis
    Additional data:
    • Product-level data: assessment of specific product sales in new stores and prediction of the performance of a new product being considered for launch
    • Customer-specific data: ability to build better predictive models tied to customer demographics (scanner data from loyalty program members' transactions)
    Result: higher-quality analysis at a more granular level