Final Presentation
STATISTICAL MEASUREMENTS, ANALYSIS & RESEARCH
Oliver Gong
Net ID: zg2088
Instructor: Dr. Luyao Zhang
Contents
01 Self-Introduction
02 Key Learnings
03 Market Research Report
04 Appendix
01 Self-Introduction
PART ONE
Self-Introduction
I am Ziyu Gong, and I also go by Oliver. I received a Bachelor’s degree in
Accounting from China University of Mining and Technology (Beijing).
In my second year of university, I interned in the Corporate Business
Department of Industrial and Commercial Bank of China (Shangrao),
supporting the account manager in loan marketing by visiting clients
to learn about their funding needs and to collect the documents
required for loan approval. That is where I first experienced and
came to understand marketing practice. In my third year, I interned in
the Investment Banking Department of Everbright Securities, conducting
corporate and financial due diligence for an IPO project. My
responsibilities included collecting and checking confirmation requests
and analyzing financial statements. As an aspiring person willing to take
on challenges, I have set my goal to devote myself to the marketing
analysis industry in the future.
B.S. in Accounting |China University of Mining and Technology(Beijing)
LinkedIn URL: https://www.linkedin.com/in/%E5%AD%90%E8%88%86-%E9%BE%9A-18b8101b7/
GitHub Repo Link: https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing
Kaggle Notebook Link: https://www.kaggle.com/olivergong77/customer-segementation-zg2088
02 Key Learnings
PART TWO
Key Learnings
The most important thing I learned from this course was how to use tools such as
Google Data Studio, GitHub and Kaggle to analyze data. In terms of
application, I learned to conduct hypothesis tests such as the t-test, ANOVA,
the chi-square test and other testing methods. I also learned to analyze correlations
and run linear regressions. In addition, I learned to apply K-Means Clustering and
Hierarchical Clustering to segment customers.
This class has also played a large role in my professional growth. The professor
repeatedly stressed the importance of applying what is taught in class
to our future work. The tools taught in the classroom are
relatively advanced. Using them for data analysis in future work
should improve efficiency and reliability and make us more
competitive.
03 Market Research Report
PART THREE
Session 1: New Dataset
Supermarket XYZ has been operating since 2008, and business flourished until 2016.
It has a large database but does not use it to find better business
solutions. Its annual revenue has declined by 10%, and the decline appears to
persist every year.
Through its membership card, Supermarket XYZ collected basic information about
each customer: Customer ID, age, gender, annual income and spending score.
The spending score is a value assigned to each customer based on defined
parameters such as customer behavior and purchasing data.
Supermarket_CustomerMembers.csv: I used this dataset to analyze the
consumer age structure and spending, and for the regression analysis.
Supermarket XYZ Customer data:
https://www.kaggle.com/sindraanthony9985/marketing-data-for-a-supermarket-in-united-states
XYZ Supermarket Consumer Age Structure and Spending score Analysis Report
Session 2: Research Design and The Data
XYZ Supermarket, whose annual revenue has been
declining by 10% since 2016, has a large
database and holds basic information about
200 customers: Customer ID, age,
gender, annual income and spending score. By
analyzing the age structure and spending
score of customers, I want to understand the
consumption level of customers of different
ages at XYZ Supermarket, so that marketing
can be targeted accordingly in the future.
According to the study, the spending score of
customers aged 30-35 is relatively high, while that
of customers aged 19-29 is relatively low. In the
future, XYZ Supermarket should collect more
basic customer information to make the
results more reliable.
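The age-group comparison described above can be sketched in a few lines. This is only an illustration: the rows and the age bands below are made up for the example (they are not taken from Supermarket_CustomerMembers.csv), and the helper names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (age, spending_score) rows; the real values live in
# Supermarket_CustomerMembers.csv and are not reproduced here.
customers = [(19, 39), (23, 35), (27, 41), (31, 73),
             (33, 81), (35, 77), (47, 45), (58, 40)]

# Assumed age bands, chosen only to mirror the 19-29 and 30-35 groups in the text.
BANDS = [(19, 29), (30, 35), (36, 70)]


def band_of(age):
    """Return the label of the band containing `age`, or None if it fits no band."""
    for low, high in BANDS:
        if low <= age <= high:
            return f"{low}-{high}"
    return None


def mean_score_by_band(rows):
    """Average spending score for each age band that has at least one customer."""
    groups = defaultdict(list)
    for age, score in rows:
        label = band_of(age)
        if label is not None:
            groups[label].append(score)
    return {label: mean(scores) for label, scores in groups.items()}


result = mean_score_by_band(customers)
for label, avg in result.items():
    print(label, round(avg, 1))
```

With real data, the band edges would need to be chosen from the observed age distribution rather than fixed in advance.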
Google data studio report URL: https://datastudio.google.com/reporting/61b9ad02-8e3f-48c3-84f4-59a61a91ef07
• Supermarket XYZ Customer data:
• https://www.kaggle.com/sindraanthony9985/marketing-data-for-a-supermarket-in-united-states
Through its membership card, Supermarket XYZ collected basic information about each
customer: Customer ID, age, gender, annual income and spending score.
• URL of GitHub report:
https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing
I conducted a linear regression analysis and found that age has a statistically significant
effect on the spending score, while annual income does not.
Session 3: Regression
Conduct the Analysis --- Scatter Plot
Since I wanted to know whether there was a linear relationship between age
and the Spending Score, and whether there was a linear relationship between
annual income and the Spending Score, I made scatter plots of the data.
The scatter plots showed no clear linear relationship between these variables.
Conduct the Analysis --- Regression Result
Null hypothesis: β1=0 and β2=0
Result: The P-value for X1 is reported as 0, which is below 0.05: at the 0.05 significance
level, we reject the null hypothesis that β1=0.
The P-value for X2 is 0.931 > 0.05: at the 0.05 significance level, we cannot reject
the null hypothesis that β2=0.
Based on the previous step, I set
age as the independent
variable X1, annual income
as the independent variable X2,
and the spending score as the
dependent variable Y. Then I
stated the null hypothesis and
ran a linear regression analysis.
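A minimal sketch of this kind of two-predictor regression, written with the Python standard library only. It is not the notebook's code: the data are synthetic (generated so that the score depends on age but not on income), the helper names `solve` and `ols` are hypothetical, and the P-values use a large-sample normal approximation to the t distribution, which is an assumption that is reasonable for about 200 observations.

```python
import math
import random
from statistics import NormalDist


def solve(a, b):
    """Solve the linear system a @ x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols(xs, y):
    """Fit y = b0 + b1*x1 + b2*x2 + ... by least squares.

    Returns (coefficients, two-sided P-values for H0: coefficient = 0).
    """
    rows = [[1.0] + list(x) for x in xs]           # design matrix with intercept
    n, k = len(rows), len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    resid = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(rows, y)]
    sigma2 = sum(e * e for e in resid) / (n - k)   # residual variance
    p_values = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se = math.sqrt(sigma2 * solve(xtx, unit)[j])   # j-th diagonal of (X'X)^-1
        z = abs(beta[j] / se)
        p_values.append(2 * (1 - NormalDist().cdf(z)))  # normal approximation
    return beta, p_values


# Synthetic data (an assumption for illustration): score depends on age only.
rng = random.Random(0)
age = [rng.uniform(18, 70) for _ in range(200)]
income = [rng.uniform(15, 140) for _ in range(200)]
score = [70 - 0.5 * a + rng.gauss(0, 3) for a in age]

beta, p = ols(list(zip(age, income)), score)
print("coefficients:", [round(b, 3) for b in beta])
print("P-values:", [round(v, 3) for v in p])
```

A coefficient whose P-value falls below 0.05 is the one for which the null hypothesis β=0 would be rejected at that significance level.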
Conduct the Analysis --- Insights and making decisions
It can be concluded from the linear regression analysis that
(1) Age affects the spending score.
(2) Annual income does not affect the spending score.
Therefore, if we want to increase customers’ spending scores to generate
more revenue, we should further understand which age groups have
higher consumption levels and market to different age groups.
Assumptions Check
Then I checked whether the six assumptions of the regression are likely to be satisfied.
I found that one of them is not satisfied and five of them are satisfied.
Assumption 4: The variance of the error term is constant. This variance does not depend on
the values assumed by X.
We can see from the scatter plot below that assumption 4 is satisfied.
Assumptions Check
Assumption 2: The means of all these normal distributions of Y, given X, lie on a straight line
with slope b.
We can see from the scatter plot on page 11 that assumption 2 is not satisfied.
Assumptions 1 & 3: The error term is normally distributed. For each fixed value of X, the
distribution of Y is normal. The mean of the error term is 0.
We can see from the diagram below that assumptions 1 & 3 are satisfied.
Assumption 5: The error terms are uncorrelated. In other words, the
observations have been drawn independently.
Since our data is not time series data, assumption 5 is satisfied.
Assumption 6: The independent variables in X are not correlated. There is no
issue of multicollinearity.
The P-value = 0.781 > 0.05, so at the 0.05 significance
level we cannot reject the null hypothesis that the independent variables
in X are not correlated. Assumption 6 is therefore satisfied.
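Alongside the correlation test reported above, a related diagnostic for multicollinearity is the variance inflation factor (VIF). It is a swapped-in technique here, not the check used in the slides. With exactly two predictors, VIF = 1 / (1 - r²), where r is the Pearson correlation between them. The sketch below uses hypothetical ages and incomes, and the function names are illustrative.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def vif_two_predictors(x1, x2):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical predictor values, for illustration only.
age = [19, 21, 20, 23, 31, 22, 35, 23, 64, 30]
income = [15, 15, 16, 16, 17, 17, 18, 18, 19, 19]

r = pearson_r(age, income)
vif = vif_two_predictors(age, income)
print(f"r = {r:.3f}, VIF = {vif:.3f}")
```

A VIF close to 1 indicates that the predictors carry little shared information; larger values point to growing collinearity.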
Further Research
As a supermarket, we should expand the collection of customer
data and add sample points to make the results more reliable.
Then we should further understand which age groups have higher
consumption levels and segment customers by age group. Finally,
we should develop different marketing strategies for different
age groups.
04 Appendix
PART FOUR
Capstone Project Milestone 2: Research Design and The Data
Capstone Project Milestone 3: Hypothesis Testing
• Bank Marketing Data:
https://data.world/data-society/bank-marketing-data
The data relates to the direct marketing campaigns of a Portuguese banking institution. The
marketing campaigns were based on phone calls.
• G20 GDP Data:
https://stats.oecd.org/index.aspx?queryid=33940#
Quarterly GDP for each country.
• URL of GitHub report:
https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing
I used a paired test, a Spearman test and a one-sample t-test to test the null hypotheses. The
results were significant in all three cases.
Name: Oliver Gong
ID number: N14152886
NetID: zg2088
Since the two groups of data are metric and we want to compare the GDP of the same
countries between the fourth quarter of 2018 and the fourth quarter of 2019, we use a paired test.
Conclusion: The P-value is reported as 0, which is below 0.05: at the 0.05 significance level, we reject the null hypothesis
that the means of GDP per capita for the fourth quarter of 2018 and 2019 for all countries are the same.
Three Hypothesis Tests --- Paired Tests
Null hypothesis: the means of GDP per
capita for the fourth quarter of 2018 and
2019 for all countries are the same.
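A paired t-test of this kind can be sketched with the standard library alone. The figures below are hypothetical stand-ins for two quarters of GDP data (not the OECD values), and the helper names are illustrative. The P-value comes from numerically integrating the Student t density with Simpson's rule, which is a choice made for this sketch to avoid third-party dependencies.

```python
import math
from statistics import mean, stdev


def t_two_sided_p(t, df, steps=4000):
    """Two-sided P-value for Student's t, via Simpson integration of the density on [0, |t|]."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))

    def pdf(x):
        return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

    h = abs(t) / steps                      # steps must be even for Simpson's rule
    total = pdf(0.0) + pdf(abs(t))
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * pdf(i * h)
    return max(0.0, 1 - 2 * total * h / 3)


def paired_t_test(before, after):
    """Return (t statistic, two-sided P-value) for H0: mean difference = 0."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, t_two_sided_p(t, n - 1)


# Hypothetical paired observations for the same six countries.
q4_2018 = [100.0, 250.0, 80.0, 410.0, 95.0, 300.0]
q4_2019 = [104.0, 259.0, 82.0, 425.0, 99.0, 310.0]

t_stat, p_value = paired_t_test(q4_2018, q4_2019)
print(f"t = {t_stat:.3f}, P-value = {p_value:.4f}")
```

The same function covers a one-sample t-test if `before` is replaced by a constant list holding the hypothesized mean.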
Since the normality check returned False, we use the Spearman test to
test for correlation.
Null hypothesis: a given country’s GDP in the
fourth quarter of 2018 and in the fourth quarter of 2019 is not correlated.
Three Hypothesis Tests --- Spearman Tests
Conclusion: The P-value < 0.05: at the
0.05 significance level, we reject the null hypothesis
that a given country’s GDP in the fourth quarter
of 2018 and 2019 is not correlated.
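Spearman's coefficient is the Pearson correlation computed on ranks rather than on raw values, which is why it does not need the normality assumption. A minimal sketch follows, with hypothetical numbers and illustrative function names. It computes only the coefficient, not the P-value.

```python
import math


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1               # mean of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the two rank vectors."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sxx = sum((a - mx) ** 2 for a in rx)
    syy = sum((b - my) ** 2 for b in ry)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical values: y grows with x but not linearly.
x = [10, 20, 30, 40, 50]
y = [1, 4, 9, 16, 26]
rho = spearman_rho(x, y)
print("rho =", rho)
```

Because only the ordering matters, any strictly increasing relationship gives a coefficient of 1, even when the relationship is far from linear.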
Three Hypothesis Tests --- One-Sample T-test
Since I want to test whether the mean of a single group of data equals a hypothesized value, I use a one-sample
t-test on the mean of balance.
Null hypothesis: the mean of balance equals 300.
Result: The P-value < 0.05: at the 0.05 significance level, we reject the
null hypothesis that the mean of balance equals 300.
We want to know the sample size the research requires, so we set Cohen’s d, the power and alpha and run a power
analysis.
Result: For a Cohen’s d effect size of 0.77, a power of 0.80 and a type I error rate of 0.05, we need a sample size of 27 (for
each group).
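The required sample size can be approximated by hand. For a two-sided, two-sample comparison with equal group sizes, the normal approximation is n ≈ 2·(z₁₋α/₂ + z_power)² / d² per group. The sketch below applies that formula; it is an approximation chosen for this example, and exact t-based power calculations in statistical libraries can give slightly different values.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate per-group sample size for a two-sided two-sample test.

    Uses the normal approximation n = 2 * ((z_alpha + z_power) / d) ** 2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)      # critical value for the type I error
    z_power = z.inv_cdf(power)              # quantile matching the desired power
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


n = n_per_group(0.77)
print(f"approximate n per group: {n:.2f}, rounded up: {math.ceil(n)}")
```

Rounding the approximate value up to the next whole number is in line with the figure of 27 per group reported above.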
There are 224 countries and regions in the world, but we compare the quarterly GDP of only 20 countries, so our
conclusions are not very strong. In the future, we should increase the sample size and obtain quarterly GDP data for
all countries and regions in the world to make our conclusions more reliable.
Power Analysis and Final Remarks
• Customer Churn Prediction 2020
https://www.kaggle.com/c/customer-churn-prediction-2020
This competition is about predicting whether a customer will change telecommunications
provider, something known as “churning”. The dataset contains 4250 samples. Each sample
contains 19 features and 1 boolean variable "churn" which indicates the class of the sample.
• URL of Github report:
https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing
I conducted a linear regression analysis and found that the total minutes of day calls and the
total minutes of eve calls did not affect the total minutes of night calls.
Capstone Project Milestone 4: Regression
Conduct the Analysis --- Scatter Plot
Since I wanted to know whether there was a linear relationship between total
day minutes and total night minutes, and whether there was a linear relationship
between total eve minutes and total night minutes, I made scatter plots of the data.
Conduct the Analysis --- Regression Result
Null hypothesis: β1=0 and β2=0
Result: The P-value for X1 > 0.05: at the 0.05 significance level, we cannot reject
the null hypothesis that β1=0.
The P-value for X2 > 0.05: at the 0.05 significance level, we cannot reject the null
hypothesis that β2=0.
Based on the previous step, I set
the total day minutes as the
independent variable X1, the
total eve minutes as the
independent variable X2, and the
total night minutes as the
dependent variable Y. Then I
stated the null hypothesis and
ran a linear regression analysis.
Conduct the Analysis --- Insights and making decisions
It can be concluded from the linear regression analysis that
(1) The total minutes of day calls do not affect the total minutes of
night calls.
(2) The total minutes of eve calls do not affect the total minutes of
night calls.
Therefore, if we want to retain customers, we should offer discount
packages to customers who call at different periods.
Assumptions Check
Then I checked whether the six assumptions of the regression are likely to be satisfied.
I found that one of them is not satisfied and five of them are satisfied.
Assumption 4: The variance of the error term is constant. This variance does not depend on
the values assumed by X.
We can see from the scatter plot below that assumption 4 is satisfied.
Assumptions Check
Assumption 2: The means of all these normal distributions of Y, given X, lie on a straight line
with slope b.
We can see from the scatter plot on page 2 that assumption 2 is not satisfied.
Assumptions 1 & 3: The error term is normally distributed. For each fixed value of X, the
distribution of Y is normal. The mean of the error term is 0.
We can see from the diagram below that assumptions 1 & 3 are satisfied.
Assumption 5: The error terms are uncorrelated. In other words, the
observations have been drawn independently.
Since our data is not time series data, assumption 5 is satisfied.
Assumption 6: The independent variables in X are not correlated. There is no
issue of multicollinearity.
The P-value = 0.388 > 0.05, so at the 0.05 significance
level we cannot reject the null hypothesis that the independent variables
in X are not correlated. Assumption 6 is therefore satisfied.
Further Research
As a telecommunications company, we should identify the factors
that significantly influence the customer churn rate and then
make corresponding recommendations to reduce their
impact.
• Online retail customer clustering
https://www.kaggle.com/hellbuoy/online-retail-customer-clustering
Online retail is a transnational data set which contains all the transactions occurring between
01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company
mainly sells unique all-occasion gifts. Many customers of the company are wholesalers.
• URL of Kaggle Notebook:
https://www.kaggle.com/olivergong77/customer-segementation-zg2088
I chose the data of retail customers in France for the cluster analysis. I used K-Means
Clustering and Hierarchical Clustering to find the best k and to identify the target customer
clusters that deserve attention.
Capstone Project Milestone 5: Clustering
K-Means Clustering --- Finding the best k
I chose the data of retail customers in France for the
cluster analysis.
With metric = “distortion”, I got k = 4.
With metric = “silhouette”, I got k = 3.
With metric = “calinski_harabasz”, no k was identified.
I therefore settled on k = 3 as the best k.
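To make the “distortion” metric concrete, here is a minimal standard-library sketch of k-means that reports the within-cluster sum of squared distances for several values of k. It is not the notebook's code: the nine points are a made-up toy set, the function names are hypothetical, and the deterministic farthest-point initialisation is an assumption chosen so the example is reproducible (libraries commonly use randomised k-means++ instead).

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def init_centers(points, k):
    """Deterministic farthest-point initialisation (an assumption for this sketch)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(dist2(p, c) for c in centers)))
    return centers


def kmeans(points, k, iterations=50):
    """Return (labels, centers, distortion) after a fixed number of Lloyd iterations."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest center.
        labels = [min(range(k), key=lambda j: dist2(p, centers[j])) for p in points]
        # Update step: each center moves to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:                     # keep the old center if a cluster is empty
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    distortion = sum(dist2(p, centers[lab]) for p, lab in zip(points, labels))
    return labels, centers, distortion


# Toy two-dimensional points forming three well-separated groups (hypothetical).
points = [(1, 9), (2, 10), (1, 11),
          (10, 2), (11, 1), (12, 2),
          (20, 20), (21, 19), (22, 21)]

labels, centers, distortion = kmeans(points, 3)
distortions = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, d in distortions.items():
    print(k, round(d, 2))
```

Distortion always falls as k grows, so the elbow method looks for the k after which the drop flattens out, while the silhouette score balances cohesion against separation. This is one reason the two metrics can suggest different values of k.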
K-Means Clustering --- Visualize the cluster with the best k and summarize
By the RFM criteria, we should choose the customer clusters
with a lower recency and a higher frequency and amount.
From the K-Means clustering results, we can see
that customers with Cluster_Id=2 best fit the criteria.
K-Means Clustering returns 18 target
customers.
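One simple way to turn the RFM criteria into a rule is to compare per-cluster averages and pick the cluster that wins the most pairwise comparisons. The sketch below is only an illustration of that idea: the averages are invented, and the scoring rule and the name `rfm_wins` are assumptions, not the method used in the notebook.

```python
# Hypothetical per-cluster averages, keyed by Cluster_Id.
cluster_means = {
    0: {"recency": 120.0, "frequency": 2.0, "amount": 300.0},
    1: {"recency": 45.0, "frequency": 5.0, "amount": 900.0},
    2: {"recency": 10.0, "frequency": 14.0, "amount": 4200.0},
}


def rfm_wins(cid, means):
    """Count pairwise wins of cluster `cid` over the other clusters on R, F and M."""
    mine = means[cid]
    wins = 0
    for other_id, other in means.items():
        if other_id == cid:
            continue
        wins += mine["recency"] < other["recency"]      # a more recent purchase is better
        wins += mine["frequency"] > other["frequency"]  # more orders is better
        wins += mine["amount"] > other["amount"]        # higher spend is better
    return wins


target = max(cluster_means, key=lambda cid: rfm_wins(cid, cluster_means))
print("target cluster:", target)
```

When no single cluster dominates on all three dimensions, a rule like this needs explicit weights, and the choice of weights becomes a business decision.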
Hierarchical Clustering --- Linkage methods
Using three linkage methods, I drew the tree diagrams (dendrograms).
Then I ran the hierarchical clustering with k=3.
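The role of the linkage method can be shown with a small agglomerative clustering sketch. The slide does not name its three methods, so single, complete and average linkage are assumed here. The points are a toy set and the function names are hypothetical.

```python
import math


def linkage_distance(a, b, points, method):
    """Distance between clusters a and b (lists of point indices) under a linkage rule."""
    d = [math.dist(points[i], points[j]) for i in a for j in b]
    if method == "single":
        return min(d)                       # closest pair
    if method == "complete":
        return max(d)                       # farthest pair
    if method == "average":
        return sum(d) / len(d)              # mean over all pairs
    raise ValueError(f"unknown linkage method: {method}")


def agglomerative(points, k, method="average"):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        pairs = [(a, b) for a in range(len(clusters))
                 for b in range(a + 1, len(clusters))]
        a, b = min(pairs, key=lambda ab: linkage_distance(
            clusters[ab[0]], clusters[ab[1]], points, method))
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels


# Toy points forming three well-separated groups (hypothetical).
points = [(1, 9), (2, 10), (1, 11),
          (10, 2), (11, 1), (12, 2),
          (20, 20), (21, 19), (22, 21)]

results = {m: agglomerative(points, 3, m) for m in ("single", "complete", "average")}
for method, labels in results.items():
    print(method, labels)
```

On well-separated data all three rules agree. On real customer data they can differ sharply, and single linkage in particular tends to chain points together and leave very small clusters.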
Hierarchical Clustering --- Visualize the cluster with the best k and summarize
By the RFM criteria, we should choose the customer clusters
with a lower recency and a higher frequency and amount.
From the hierarchical clustering results, we can see that
customers with Cluster_Labels=2 best fit the criteria.
Hierarchical Clustering returns 2 target
customers.
Further Research
K-Means Clustering returns 18 target customers.
Hierarchical Clustering returns 2 target customers,
a much smaller group than the one K-Means Clustering
returns.
In practice, a target group of only 2 customers means very few people
can be surveyed, and the results are not reliable enough.
Therefore, I prefer to use K-Means Clustering.
THANKS
FOR
WATCHING

 
Life upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for studentLife upper-Intermediate B2 Workbook for student
Life upper-Intermediate B2 Workbook for student
 
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...The simplified electron and muon model, Oscillating Spacetime: The Foundation...
The simplified electron and muon model, Oscillating Spacetime: The Foundation...
 
Digital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental DesignDigital Artefact 1 - Tiny Home Environmental Design
Digital Artefact 1 - Tiny Home Environmental Design
 
How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17How to Make a Field Mandatory in Odoo 17
How to Make a Field Mandatory in Odoo 17
 
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptxChapter 4 - Islamic Financial Institutions in Malaysia.pptx
Chapter 4 - Islamic Financial Institutions in Malaysia.pptx
 
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
Pollock and Snow "DEIA in the Scholarly Landscape, Session One: Setting Expec...
 
Walmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdfWalmart Business+ and Spark Good for Nonprofits.pdf
Walmart Business+ and Spark Good for Nonprofits.pdf
 
clinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdfclinical examination of hip joint (1).pdf
clinical examination of hip joint (1).pdf
 
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptxC1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
C1 Rubenstein AP HuG xxxxxxxxxxxxxx.pptx
 
Hindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdfHindi varnamala | hindi alphabet PPT.pdf
Hindi varnamala | hindi alphabet PPT.pdf
 
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
What is Digital Literacy? A guest blog from Andy McLaughlin, University of Ab...
 
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
Exploiting Artificial Intelligence for Empowering Researchers and Faculty, In...
 
Pride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School DistrictPride Month Slides 2024 David Douglas School District
Pride Month Slides 2024 David Douglas School District
 
Advanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docxAdvanced Java[Extra Concepts, Not Difficult].docx
Advanced Java[Extra Concepts, Not Difficult].docx
 
S1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptxS1-Introduction-Biopesticides in ICM.pptx
S1-Introduction-Biopesticides in ICM.pptx
 
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
RPMS TEMPLATE FOR SCHOOL YEAR 2023-2024 FOR TEACHER 1 TO TEACHER 3
 
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
BÀI TẬP BỔ TRỢ TIẾNG ANH 8 CẢ NĂM - GLOBAL SUCCESS - NĂM HỌC 2023-2024 (CÓ FI...
 

Final presentation zg2088

  • 1. Final Presentation STATISTICAL MEASUREMENTS, ANALYSIS & RESEARCH Oliver Gong Net ID: zg2088 Instructor: Dr. Luyao Zhang
  • 4. Self-Introduction I am Ziyu Gong, and I also go by Oliver. I received a Bachelor's degree in Accounting from China University of Mining and Technology (Beijing). In my second year of university, I interned in the Corporate Business Department of Industrial and Commercial Bank of China (Shangrao), supporting the account manager in loan marketing by visiting clients to learn about their funding needs and to collect the documents required for loan approval. That was where I first experienced and understood marketing practice. In my third year, I interned in the Investment Banking Department of Everbright Securities, conducting corporate and financial due diligence for an IPO project; my responsibilities included collecting and checking confirmation requests and analyzing financial statements. As an aspiring person who is willing to take on challenges, I have set my goal to work in the marketing analysis industry. B.S. in Accounting | China University of Mining and Technology (Beijing) LinkedIn URL: https://www.linkedin.com/in/%E5%AD%90%E8%88%86-%E9%BE%9A-18b8101b7/ GitHub Repo Link: https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing Kaggle Notebook Link: https://www.kaggle.com/olivergong77/customer-segementation-zg2088
  • 6. Key Learnings The most important thing I learned from this course was how to use tools such as Google Data Studio, GitHub and Kaggle to analyze data. In terms of application, I learned to conduct hypothesis tests, such as the t-test, ANOVA and the chi-square test. I also learned to analyze correlations and run linear regressions. In addition, I learned to apply k-means clustering and hierarchical clustering to segment customers. This class has also played a large role in my professional growth. The professor repeatedly stressed the importance of applying what is taught in class to our future work. The tools taught in the classroom are relatively advanced; using them for data analysis in future work should improve both efficiency and reliability, and make us more competitive.
  • 8. Session 1: New Dataset Supermarket XYZ has been operating since 2008, and its business flourished until 2016. It has a large database but does not use it to find better business solutions. Its annual revenue has declined by 10%, and the decline appears to persist every year. Through its membership card, Supermarket XYZ has collected basic customer information: Customer ID, age, gender, annual income and spending score. The spending score is a value assigned to each customer based on defined parameters such as customer behavior and purchasing data. Supermarket_CustomerMembers.csv: I used this dataset to analyze the customers' age structure and spending, and for the regression analysis. Supermarket XYZ Customer data: https://www.kaggle.com/sindraanthony9985/marketing-data-for-a-supermarket-in-united-states
  • 9. Session 2: Research Design and The Data XYZ Supermarket Consumer Age Structure and Spending Score Analysis Report XYZ Supermarket, whose annual revenue has been dropping by 10% since 2016, has a large database containing basic information about 200 customers: Customer ID, age, gender, annual income and spending score. By analyzing the age structure and spending scores of these customers, I want to understand the consumption level of customers of different ages, so that marketing can be targeted accordingly in the future. According to the study, the spending score of customers aged 30-35 is relatively high, while that of customers aged 19-29 is relatively low. In the future, XYZ Supermarket should collect more basic customer information to make the results more reliable. Google Data Studio report URL: https://datastudio.google.com/reporting/61b9ad02-8e3f-48c3-84f4-59a61a91ef07
  • 10. Session 3: Regression • Supermarket XYZ Customer data: https://www.kaggle.com/sindraanthony9985/marketing-data-for-a-supermarket-in-united-states Through its membership card, Supermarket XYZ has collected basic customer information: Customer ID, age, gender, annual income and spending score. • URL of GitHub report: https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing I conducted a linear regression analysis and found that age affects the spending score, while annual income does not.
  • 11. Conduct the Analysis: Scatter Plot I wanted to know whether there is a linear relationship between age and spending score, and between annual income and spending score, so I made scatter plots of the data. The plots show no clear linear relationship between these variables.
  • 12. Conduct the Analysis: Regression Result Based on the previous step, I set age as the independent variable X1, annual income as the independent variable X2, and spending score as the dependent variable Y. Then I stated the null hypotheses and ran a linear regression. Null hypotheses: β1=0 and β2=0 Result: The P-value for X1 is reported as 0, which is below 0.05, so at the 0.05 significance level we reject the null hypothesis that β1=0. The P-value for X2 is 0.931 > 0.05, so at the 0.05 significance level we cannot reject the null hypothesis that β2=0.
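The regression on this slide has the form Y = β0 + β1·X1 + β2·X2. As a minimal sketch of how the coefficients of such a model can be estimated by ordinary least squares, the following uses only the Python standard library. The function name `fit_two_predictor_ols` and the sample values are hypothetical and are not taken from the supermarket dataset; the original notebook may well have used a statistics library instead, which would also report the P-values quoted above.

```python
from statistics import fmean


def fit_two_predictor_ols(x1, x2, y):
    """Return (b0, b1, b2) for the least-squares fit y ~ b0 + b1*x1 + b2*x2."""
    m1, m2, my = fmean(x1), fmean(x2), fmean(y)
    # Centered sums of squares and cross-products.
    s11 = sum((a - m1) ** 2 for a in x1)
    s22 = sum((b - m2) ** 2 for b in x2)
    s12 = sum((a - m1) * (b - m2) for a, b in zip(x1, x2))
    s1y = sum((a - m1) * (c - my) for a, c in zip(x1, y))
    s2y = sum((b - m2) * (c - my) for b, c in zip(x2, y))
    # Solve the 2x2 normal equations for the two slopes.
    det = s11 * s22 - s12 ** 2
    if det == 0:
        raise ValueError("predictors are perfectly collinear")
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = my - b1 * m1 - b2 * m2
    return b0, b1, b2


# Made-up illustrative values (assumption), built so that the score
# depends on age but not on income.
age = [20, 25, 30, 35, 40, 45]
income = [15, 40, 22, 60, 35, 80]
score = [80 - 0.5 * a + 0.0 * i for a, i in zip(age, income)]

b0, b1, b2 = fit_two_predictor_ols(age, income, score)
```

On this constructed data the fitted slope for income comes out at essentially zero, which mirrors the situation where β2=0 cannot be rejected.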
  • 13. Conduct the Analysis: Insights and Making Decisions The linear regression analysis leads to two conclusions: (1) age affects the spending score; (2) annual income does not affect the spending score. Therefore, if we want to raise customers' spending scores and generate more revenue, we should find out which age groups have higher consumption levels and market differently to each age group.
  • 14. Assumptions Check Next, I checked whether the six assumptions of the regression are likely to be satisfied. I found that five of them are satisfied and one is not. Assumption 4: The variance of the error term is constant. This variance does not depend on the values assumed by X. The scatter plot below shows that assumption 4 is satisfied.
  • 15. Assumptions Check Assumption 2: The means of all these normal distributions of Y, given X, lie on a straight line with slope b. The scatter plot on page 11 shows that assumption 2 is not satisfied. Assumptions 1 & 3: The error term is normally distributed. For each fixed value of X, the distribution of Y is normal. The mean of the error term is 0. The diagram below shows that assumptions 1 and 3 are satisfied. Assumption 5: The error terms are uncorrelated. In other words, the observations have been drawn independently. Since our data are not time series data, assumption 5 is satisfied. Assumption 6: The independent variables in X are not correlated, that is, there is no multicollinearity issue. The P-value is 0.781 > 0.05, so at the 0.05 significance level we cannot reject the null hypothesis that the independent variables in X are not correlated. Assumption 6 is therefore satisfied.
  • 16. Further Research As a supermarket, we should first expand our collection of customer data and add sample points to make the results more reliable. Then we should find out which age groups have higher consumption levels and segment customers by age group. Finally, we should develop different marketing strategies for the different age groups.
  • 18. Capstone Project Milestone 2: Research Design and The Data
  • 19. Capstone Project Milestone 3: Hypothesis Testing • Bank Marketing Data: https://data.world/data-society/bank-marketing-data The data relate to the direct marketing campaigns of a Portuguese banking institution. The marketing campaigns were based on phone calls. • G20 GDP Data: https://stats.oecd.org/index.aspx?queryid=33940# The GDP of each country for each quarter. • URL of GitHub report: https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing I used a paired test, a Spearman test and a one-sample t-test to test the null hypotheses; all three results are significant. Name: Oliver Gong ID number: N14152886 NetID: zg2088
  • 20. Three Hypothesis Tests: Paired Tests Null hypothesis: the means of GDP per capita for the fourth quarter of 2018 and the fourth quarter of 2019 are the same across all countries. Since both groups are metric data and each country is measured twice, once in the fourth quarter of 2018 and once in the fourth quarter of 2019, we use a paired test to compare them. Conclusion: The P-value is reported as 0, which is below 0.05, so at the 0.05 significance level we reject the null hypothesis that the means of GDP per capita for the fourth quarter of 2018 and 2019 are the same across all countries.
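A paired test works on the per-country differences between the two quarters. The sketch below computes the paired t statistic and its degrees of freedom with the standard library only; the function name `paired_t_statistic` and the GDP figures are made-up placeholders, not the OECD values used in the report.

```python
from math import sqrt
from statistics import fmean, stdev


def paired_t_statistic(before, after):
    """Return (t, degrees_of_freedom) for a paired t-test on two matched samples."""
    diffs = [a - b for b, a in zip(before, after)]
    n = len(diffs)
    # t = mean difference divided by its standard error.
    t = fmean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical GDP figures for four countries (assumption, for illustration).
q4_2018 = [100.0, 200.0, 300.0, 400.0]
q4_2019 = [103.0, 204.0, 306.0, 403.0]

t_stat, dof = paired_t_statistic(q4_2018, q4_2019)
```

The statistic is then compared with a t distribution with `dof` degrees of freedom to obtain the P-value; a statistics library would normally do that step.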
  • 21. Three Hypothesis Tests: Spearman Tests Since the normality check returned False (the data are not normally distributed), we use the Spearman test for correlation. Null hypothesis: a country's GDP in the fourth quarter of 2018 and its GDP in the fourth quarter of 2019 are not correlated. Conclusion: The P-value < 0.05, so at the 0.05 significance level we reject the null hypothesis that a country's GDP in the fourth quarter of 2018 and in the fourth quarter of 2019 are not correlated.
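Spearman's coefficient is the Pearson correlation computed on ranks, which is why it does not require normally distributed data. A minimal standard-library sketch follows; the helper names `average_ranks`, `pearson` and `spearman_rho` are illustrative choices, and tied values receive the average of their ranks.

```python
from math import sqrt
from statistics import fmean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole group of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = fmean(x), fmean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def spearman_rho(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# A monotonic but non-linear relationship still gives a coefficient of 1.
rho = spearman_rho([1, 2, 3, 4, 5], [1, 4, 9, 16, 25])
```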
  • 22. Three Hypothesis Tests: One-Sample T-test Since I want to test whether the mean of a single group of data equals a hypothesized value, I use a one-sample t-test on the mean of balance. Null hypothesis: the mean of balance equals 300. Result: The P-value < 0.05, so at the 0.05 significance level we reject the null hypothesis that the mean of balance equals 300.
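The one-sample t statistic measures how many standard errors the sample mean lies from the hypothesized value. The sketch below is a rough illustration using the standard library; `one_sample_t` is a hypothetical helper, the balances are invented, and the P-value uses a normal approximation that is only reasonable for large samples (an assumption made here because the standard library has no t distribution).

```python
from math import sqrt
from statistics import NormalDist, fmean, stdev


def one_sample_t(sample, mu0):
    """Return (t, approximate two-sided p-value) for H0: mean == mu0."""
    n = len(sample)
    t = (fmean(sample) - mu0) / (stdev(sample) / sqrt(n))
    # Large-sample normal approximation to the t distribution (assumption).
    p_approx = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p_approx


# Invented balances with a sample mean of exactly 300 (illustration only).
balances = [310, 290, 305, 295] * 25

t_same, p_same = one_sample_t(balances, 300)   # H0 matches the sample mean
t_diff, p_diff = one_sample_t(balances, 250)   # H0 far from the sample mean
```

With these invented values the first test gives no evidence against the null hypothesis, while the second rejects it at the 0.05 level.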
  • 23. Power Analysis and Final Remarks We want to know the sample size required for the research, so we set Cohen's d, the power and alpha, and run a power analysis. Result: For an effect size of Cohen's d = 0.77, a power of 0.80 and a type I error rate of 0.05, we need a sample size of 27 in each group. There are 224 countries and regions in the world, but we compared the quarterly GDP of only 20 countries, so our conclusions are not very strong. In the future, we should increase the sample size and obtain quarterly GDP data for all countries and regions to make our conclusions more reliable.
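For a two-sided, two-sample comparison, the required sample size per group can be approximated by n ≈ 2·((z₁₋α/₂ + z_power) / d)². The sketch below applies this normal approximation with the standard library; the function name `n_per_group` is hypothetical, and the report itself presumably used a power-analysis routine based on the t distribution, which can give a slightly different value before rounding.

```python
from math import ceil
from statistics import NormalDist


def n_per_group(effect_size, power=0.80, alpha=0.05):
    """Approximate sample size per group for a two-sided two-sample test."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value for the two-sided test
    z_power = z.inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


n_required = n_per_group(0.77)
```

For d = 0.77, power 0.80 and alpha 0.05, this approximation rounds up to 27 per group, in line with the figure on the slide.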
  • 24. Capstone Project Milestone 4: Regression • Customer Churn Prediction 2020 https://www.kaggle.com/c/customer-churn-prediction-2020 This competition is about predicting whether a customer will change telecommunications provider, which is known as "churning". The dataset contains 4250 samples. Each sample contains 19 features and 1 boolean variable "churn", which indicates the class of the sample. • URL of GitHub report: https://colab.research.google.com/github/OliverGong77/NYU_Integrated_Marketing I conducted a linear regression analysis and found that neither the total minutes of day calls nor the total minutes of evening calls affects the total minutes of night calls. Name: Oliver Gong ID number: N14152886 NetID: zg2088
  • 25. Conduct the Analysis: Scatter Plot I wanted to know whether there is a linear relationship between total day minutes and total night minutes, and between total eve minutes and total night minutes, so I made scatter plots of the data.
  • 26. Conduct the Analysis: Regression Result Based on the previous step, I set total day minutes as the independent variable X1, total eve minutes as the independent variable X2, and total night minutes as the dependent variable Y. Then I stated the null hypotheses and ran a linear regression. Null hypotheses: β1=0 and β2=0 Result: The P-value for X1 > 0.05, so at the 0.05 significance level we cannot reject the null hypothesis that β1=0. The P-value for X2 > 0.05, so at the 0.05 significance level we cannot reject the null hypothesis that β2=0.
  • 27. Conduct the Analysis: Insights and Making Decisions The linear regression analysis leads to two conclusions: (1) the total minutes of day calls do not affect the total minutes of night calls; (2) the total minutes of evening calls do not affect the total minutes of night calls. Therefore, if we want to retain customers, we should offer separate discount packages to customers who call at different times of day.
  • 28. Assumptions Check Next, I checked whether the six assumptions of the regression are likely to be satisfied. I found that five of them are satisfied and one is not. Assumption 4: The variance of the error term is constant. This variance does not depend on the values assumed by X. The scatter plot below shows that assumption 4 is satisfied.
  • 29. Assumptions Check Assumption 2: The means of all these normal distributions of Y, given X, lie on a straight line with slope b. The scatter plot on page 2 shows that assumption 2 is not satisfied. Assumptions 1 & 3: The error term is normally distributed. For each fixed value of X, the distribution of Y is normal. The mean of the error term is 0. The diagram below shows that assumptions 1 and 3 are satisfied. Assumption 5: The error terms are uncorrelated. In other words, the observations have been drawn independently. Since our data are not time series data, assumption 5 is satisfied. Assumption 6: The independent variables in X are not correlated, that is, there is no multicollinearity issue. The P-value is 0.388 > 0.05, so at the 0.05 significance level we cannot reject the null hypothesis that the independent variables in X are not correlated. Assumption 6 is therefore satisfied.
  • 30. Further Research As a telecommunications company, we should identify the factors that significantly influence the customer churn rate, and make corresponding recommendations to reduce churn.
  • 31. Capstone Project Milestone 5: Clustering • Online retail customer clustering https://www.kaggle.com/hellbuoy/online-retail-customer-clustering Online retail is a transnational dataset that contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts. Many of its customers are wholesalers. • URL of Kaggle Notebook: https://www.kaggle.com/olivergong77/customer-segementation-zg2088 I chose the data for retail customers in France for the cluster analysis. I used k-means clustering and hierarchical clustering to find the best k and to identify the target customer clusters that we need to pay attention to. Name: Oliver Gong ID number: N14152886 NetID: zg2088
  • 32. K-Means Clustering: Finding the Best k I chose the data for retail customers in France for the cluster analysis. With metric = "distortion", I got k = 4; with metric = "silhouette", I got k = 3; with metric = "calinski_harabasz", no k was returned. I therefore settled on k = 3 as the best k.
  • 33. K-Means Clustering: Visualize the Clusters with the Best k and Summarize By the RFM criteria, we should choose the customer cluster with lower recency and higher frequency and amount. The k-means clustering results show that customers with Cluster_Id=2 best fit the criteria. K-means clustering returns 18 target customers.
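To make the procedure concrete, here is a minimal standard-library sketch of k-means followed by an RFM-style choice of target cluster. Everything in it is an assumption for illustration: the function name `k_means`, the six (recency, frequency, amount) tuples and the hand-picked initial centroids are invented, and the report's notebook most likely relied on a clustering library with scaled features rather than this hand-written loop.

```python
from math import dist
from statistics import fmean


def k_means(points, centroids, max_iter=100):
    """Plain k-means: assign each point to its nearest centroid, then re-average."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest centroid for every point.
        labels = [
            min(range(len(centroids)), key=lambda j: dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(fmean(col) for col in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster where it was
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Invented (recency, frequency, amount) rows, not the France retail data.
rfm = [(5, 20, 900), (7, 18, 850), (60, 5, 200),
       (65, 4, 180), (200, 1, 20), (210, 1, 25)]

labels, centroids = k_means(rfm, centroids=[rfm[0], rfm[2], rfm[4]])

# RFM criterion: the target cluster has the lowest mean recency.
target = min(range(len(centroids)), key=lambda j: centroids[j][0])
target_customers = [p for p, lab in zip(rfm, labels) if lab == target]
```

In practice the three RFM columns would usually be standardized first, since the amount column otherwise dominates the Euclidean distance.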
  • 34. Hierarchical Clustering: Linkage Methods Using three linkage methods, I drew the tree diagrams (dendrograms). Then I performed the hierarchical clustering with k=3.
  • 35. Hierarchical Clustering: Visualize the Clusters with the Best k and Summarize By the RFM criteria, we should choose the customer cluster with lower recency and higher frequency and amount. The hierarchical clustering results show that customers with Cluster_Labels=2 best fit the criteria. Hierarchical clustering returns 2 target customers.
  • 36. Further Research K-means clustering returns 18 target customers, while hierarchical clustering returns only 2 target customers, a much smaller group. In actual work, a target group of only 2 customers means that very few people can be surveyed, and the results are not reliable enough. Therefore, I prefer to use k-means clustering.