1
Final Presentation
Yanle Wang-yw5178
Contents
Part 1 Self-introduction 2
Part 2 Course Summary 3
Part 3 Market research report 4
Session1 4
Session2 5
Session3 6
Part 4 Appendix 14
2
Self-introduction
URL of Kaggle Notebook
https://www.kaggle.com/yanlewang/customer-segementation-yw5178
GitHub URL:
https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
LinkedIn:
www.linkedin.com/in/yw-wang-04a3231b7
Yanle (Mike) Wang received his bachelor’s degree in
management from Shanghai University of International
Business and Economics. He is passionate about marketing
and enjoys creating value for customers. He has interned at
Hitachi (China) and Nike, where he learned different
market-oriented strategies for enhancing brand value. He also
enjoys volunteer work, which lets him meet and help many
kinds of people; he sees it as a way to learn the needs and pain
points of potential customers. He has won a prize in “Internet
Plus” innovation and entrepreneurship for a solution to online
tutoring in impoverished areas.
3
Course Summary
I have learned many theories and data modeling methods from the course. In the theoretical
part, the methods and logic behind many investigations gave me a deeper
understanding of what market research should do. The three platforms used in the class
also allowed me to experience the entire process, from finding data to final analysis and
modeling.
Through this course, I also discovered that LinkedIn is a very good platform for
letting others get to know you. I am becoming more and more interested in research and
investigation, and I want to learn more about programming in the future to make better
use of these platforms and software. I am looking for an internship in data marketing to
prepare for my future career.
4
Market research report-Session 1
Data Set
All annual data for the variables, from 1973 to 2016, were collected from the National Bureau of
Statistics.
http://www.stats.gov.cn (Some missing data were estimated by Yanle Wang)
Y: Life expectancy at birth of the population
Life expectancy at birth is the most commonly used life expectancy indicator. It shows the average
number of years a newborn is expected to live and is an important indicator of the
health of the population.
X1: College enrollment rate (% of total population)
The college enrollment rate shows, at the macro level, the education level of a country. We expect
that an increase in the college enrollment rate will lead to an increase in life expectancy.
However, we do not think this relationship is simply linear.
X2: GDP (taking 1973 as the fixed base index)
GDP is the total value of all final goods and services produced by a country in a given
period, and it is often considered an indicator of the country’s economic state. This paper
assumes that the economic situation reflects people’s consumption level in a period of time and
therefore affects how much people spend on all life-related behaviour. This paper predicts that
GDP has a significantly positive impact on Y. The year 1973 is used as the base.
X3: Hospital bed number (per thousand people)
This paper assumes that the number of hospital beds per thousand people is an
important indicator of the medical resources available to each person. In recent years, with the
continuous improvement of medical care, more diseases can be cured. From a macro
perspective, however, it is the medical resources available per person that contribute to life
expectancy. We also expect this relationship to be positive.
5
Market research report-Session 2
Research Abstract
The Chinese population has entered a period of healthy post-transformation: the mortality level
has shown a steady decline, and the life expectancy of the population has continued to increase.
Although it is difficult to predict how long a particular person will live, the life expectancy at
birth of a population can be calculated by scientific methods and is therefore feasible to study.
Studying life expectancy in the country is of great practical significance, and this paper
focuses on which factors influence the life expectancy at birth of the population from a
macro perspective. The research uses linear regression to study the relationship between life
expectancy at birth of the population and other variables.
Public Google Data Studio link:
https://datastudio.google.com/reporting/9adf893b-7baa-45df-aa28-46662f9dd349
6
Market research report-Session 3
Research Executive Summary
Yw5178
Yanle Wang
All annual data for the variables, from 1973 to 2016, were collected from the National
Bureau of Statistics.
http://www.stats.gov.cn (Some missing data were estimated by Yanle Wang)
[The GDP ($) takes 1973 as the fixed base index.]
Studying life expectancy in the country is of great practical significance, and
this project focuses on which factors influence life expectancy
at birth.
In this project, we use a linear regression model to examine the relationship
between College enrollment rate (% of total population) (X1), GDP (X2),
Hospital bed number (per thousand people) (X3) and Life expectancy at birth of
the population (Y). The assumed model and H0 are listed below.
Y = B1X1 + B2X2 + B3X3 + B0 + e   [H0: B1 = 0, B2 = 0, B3 = 0 (α = 0.05)]
The results showed that X1, X2 and X3 are each significantly related to Y. However,
the model needs further revision after the assumption check.
GitHub URL:
https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
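The model above can be fitted by ordinary least squares. The following is only a minimal sketch of that technique using the normal equations in plain Python; it is not the notebook code behind these slides. The helper names (`solve_linear_system`, `fit_ols`) and all numbers are hypothetical, and Y is generated from known coefficients so the fit can be checked.

```python
# Minimal ordinary least squares for Y = B0 + B1*X1 + B2*X2 + B3*X3 + e.
# All numbers are made-up illustration values, not the National Bureau of Statistics series.

def solve_linear_system(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_ols(rows, y):
    """Return [B0, B1, ..., Bk] from the normal equations (X'X) b = X'y."""
    x = [[1.0] + list(r) for r in rows]  # prepend the intercept column
    k = len(x[0])
    xtx = [[sum(xi[i] * xi[j] for xi in x) for j in range(k)] for i in range(k)]
    xty = [sum(xi[i] * yi for xi, yi in zip(x, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)

# Hypothetical predictors: (X1 enrollment %, X2 GDP index, X3 beds per 1,000)
rows = [(1.0, 1.0, 2.0), (2.0, 1.2, 2.6), (3.0, 2.5, 2.2),
        (5.0, 3.0, 3.1), (8.0, 5.5, 2.9), (12.0, 6.0, 4.0)]
true_b = [60.0, 0.3, 0.1, 2.0]  # B0, B1, B2, B3 chosen for the illustration
y = [true_b[0] + sum(b * v for b, v in zip(true_b[1:], r)) for r in rows]

coefficients = fit_ols(rows, y)
print([round(b, 4) for b in coefficients])
```

Because the illustrative Y contains no noise, the fitted coefficients should reproduce `true_b`. With real data a statistics package would also be needed to obtain the p-values used to test H0.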
7
Conduct the Analysis
Scatterplot 1
For X1: College enrollment rate (% of total population) and Y: Life
expectancy at birth of the population
The scatterplot shows a positive linear relationship.
College enrollment rate might have a positive effect on life
expectancy at birth of the population.
However, it is also possible that the relationship between them is not
linear but quadratic (Y increasing with the square of X1).
8
Conduct the Analysis
Scatterplot 2
For X2: GDP and Y: Life expectancy at birth of the population
The scatterplot does not show a clear relationship.
It seems that when GDP exceeds 1.5E+11, it might have a
positive effect on life expectancy at birth of the population.
9
Conduct the Analysis
Scatterplot 3
For X3: Hospital bed number (per thousand people) and Y: Life
expectancy at birth of the population
The scatterplot shows a positive linear relationship.
Providing more hospital beds might have a positive effect on life
expectancy at birth of the population.
10
Conduct the Analysis
Summary
•I chose this multiple regression to examine the relationship between X1, X2, X3 and Y.
Y = B1X1 + B2X2 + B3X3 + B0 + e   [H0: B1 = 0, B2 = 0, B3 = 0 (α = 0.05)]
X1:
•P-Value = 0.00 < 0.05 (the type I error rate), so we reject H0: B1 = 0. B1 = 0.3212.
For each one-percentage-point increase in College enrollment rate, the life expectancy
at birth of the population increases by 0.3212 years, holding the other variables constant.
X2:
•P-Value = 0.00 < 0.05 (the type I error rate), so we reject H0: B2 = 0. B2 = -4.106e-11.
For each one-unit increase in total GDP, the life expectancy at birth of the population
decreases by 4.106e-11 years, holding the other variables constant.
X3:
•P-Value = 0.00 < 0.05 (the type I error rate), so we reject H0: B3 = 0. B3 = 2.2161.
For each increase of one hospital bed per thousand people, the life
expectancy at birth of the population increases by 2.2161 years, holding the other variables constant.
11
12
Assumptions Check and Further Research
After the assumption check, 2 assumptions are satisfied and 4 are not.
1. Because p=0.00, we reject H0 (no correlation) for each of the pairs X1 and X2,
X2 and X3, and X1 and X3, so the independent variables are correlated. (Not Satisfied)
2. As the predictions increase, the spread of the residuals does not stay parallel,
which shows that the variance of the error term is not constant. (Not Satisfied)
3. The mean of the error term is close to 0. (Satisfied)
4. The observations were not drawn independently, and the error
term is not normally distributed. (Not Satisfied)
5. Error terms are uncorrelated. (Satisfied)
6. Linearity, checked with the scatter plots (Not Satisfied)
The means of Y, given X1, do not lie on a straight line with slope B1.
The means of Y, given X2, do not lie on a straight line with slope B2.
The means of Y, given X3, lie on a straight line with slope B3.
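The first check in the list above concerns correlation among the independent variables. The following is a minimal sketch of a pairwise Pearson correlation check, using hypothetical series rather than the project data. It reports only the correlation coefficients; the p-values quoted on the slide would need a statistics package.

```python
from itertools import combinations
from math import sqrt

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

# Hypothetical predictor series (not the project data)
predictors = {
    "X1": [1.0, 2.0, 3.0, 5.0, 8.0, 12.0],
    "X2": [1.0, 1.2, 2.5, 3.0, 5.5, 6.0],
    "X3": [2.0, 2.6, 2.2, 3.1, 2.9, 4.0],
}

# Correlation for every pair of predictors: X1-X2, X1-X3, X2-X3
pairwise_r = {
    (a, b): pearson(predictors[a], predictors[b])
    for a, b in combinations(predictors, 2)
}
for pair, r in pairwise_r.items():
    print(pair, round(r, 3))
```

A coefficient close to 1 or -1 for any pair is a warning sign of multicollinearity, which can distort the sign and size of the estimated slopes.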
13
Conclusion and Suggestion:
Our model needs further improvement. A purely linear regression might not be a
good fit for this data, especially for X1.
From a marketing manager's point of view, the model result does not seem reasonable.
1. GDP growth should have a positive effect on Y, yet the estimated B2 is negative. This
might be caused by correlation among the independent variables. Future research should
avoid using such correlated variable combinations.
2. It is also possible that the relationship between College enrollment rate and Life
expectancy at birth of the population is not linear but quadratic. A linear
regression of Y on X1^2 can be run to test this.
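The second suggestion can be sketched as a comparison of two simple regressions, Y on X1 and Y on X1^2. This is a minimal illustration on hypothetical data that was deliberately built to be quadratic, so it shows the mechanics rather than any finding about the real series. The helper name `simple_fit` is an assumption of this sketch.

```python
def simple_fit(x, y):
    """Least-squares slope, intercept and R^2 for y = b0 + b1 * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((v - mean_x) ** 2 for v in x)
    sxy = sum((v - mean_x) * (w - mean_y) for v, w in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((w - (intercept + slope * v)) ** 2 for v, w in zip(x, y))
    ss_tot = sum((w - mean_y) ** 2 for w in y)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical data, constructed so that Y depends on the square of X1
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [65.0 + 0.2 * v ** 2 for v in x1]

_, _, r2_linear = simple_fit(x1, y)                      # Y on X1
_, _, r2_squared = simple_fit([v ** 2 for v in x1], y)   # Y on X1^2

print(round(r2_linear, 4), round(r2_squared, 4))
```

If the squared term gives a clearly higher R^2 on the real data, that would support replacing X1 with X1^2 in the model.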
14
Appendix
Capstone Project Milestone 2: Research
Design and The Data
15
Capstone Project Milestone 3: Hypothesis
Testing
Executive Summary
Yw5178
Yanle Wang
Data Sources:
Direct Marketing campaigns data (phone calls) of a Portuguese banking institution:
https://data.world/data-society/bank-marketing-data
Economy data before and after COVID-19:
https://stats.oecd.org/index.aspx?queryid=33940#
Hypothesis Test - Parametric Tests: T-Test, Paired T-Test and Group T-Test
Assuming the null hypothesis is true in the population, the type I error rate is set to 0.05.
GitHub URL:
https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
16
Hypothesis Test 1
•I chose this test to see whether the data population is normally distributed.
•H0: The data population is normally distributed.
•Since the result is Normal=False, I chose Spearman correlation.
•For alpha=0.05, P-Value=0.005417 < 0.05 (the type I error rate), so H0 is rejected.
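Spearman correlation, chosen above because the normality assumption failed, is the Pearson correlation of the ranks. The following is a minimal sketch on hypothetical data; the slides do not show which library call was used, so the helper names here are illustrative.

```python
from math import sqrt

def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of the 1-based positions i..j
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson(a, b):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(a), average_ranks(b))

# Hypothetical data: monotonic but clearly non-linear
x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]
rho = spearman(x, y)
print(rho)
```

Because it works on ranks, the measure picks up any monotonic relationship and does not require normally distributed data.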
17
Hypothesis Test 2
•I chose this Paired T-Test to see whether the data population differs significantly
between time periods.
•H0: The data population in different time periods has no significant difference.
•Result: the data population in different time periods has a significant difference.
•With power=0.09 and Cohen's d=0.1, P-Value = 0.0 < 0.05 (the type I error rate), so H0 is rejected.
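The paired t-test works on the per-observation differences between the two periods. The following minimal sketch computes the t statistic and a paired Cohen's d on hypothetical before/after values; turning the statistic into a p-value requires the t distribution from a statistics package, which is left out here.

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(before, after):
    """Return (t statistic, Cohen's d for paired data, degrees of freedom)."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    sd = stdev(diffs)                       # sample standard deviation of differences
    t_stat = mean(diffs) / (sd / sqrt(n))   # mean difference over its standard error
    cohen_d = mean(diffs) / sd              # standardized mean difference
    return t_stat, cohen_d, n - 1

# Hypothetical measurements for the same five units in two periods
before = [10, 12, 9, 11, 13]
after = [12, 14, 10, 13, 15]

t_stat, cohen_d, dof = paired_t(before, after)
print(round(t_stat, 3), round(cohen_d, 3), dof)
```

A large t statistic with a small Cohen's d, as reported on the slide, is typical of very large samples, where even small differences become statistically significant.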
18
Hypothesis Test 3
•I chose this Group T-Test to see whether the two data populations differ
significantly.
•H0: The two data populations do not have a significant difference.
•Result: the two data populations have a significant difference.
•With power=1 and Cohen's d=0.23077, P-Value = 2.76e-137 < 0.05 (the type I error rate), so H0 is rejected.
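For two separate groups, Cohen's d is the difference in means divided by the pooled standard deviation. The sketch below assumes that the "Group T-Test" is an independent two-sample t-test with pooled variance, which the slides do not state, and uses hypothetical values.

```python
from math import sqrt
from statistics import mean, variance

def two_sample(a, b):
    """Return (pooled-variance t statistic, Cohen's d) for two independent groups."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    pooled_sd = sqrt(pooled_var)
    diff = mean(a) - mean(b)
    cohen_d = diff / pooled_sd
    t_stat = diff / (pooled_sd * sqrt(1 / na + 1 / nb))
    return t_stat, cohen_d

# Hypothetical groups (not the project data)
group_a = [2, 4, 6, 8]
group_b = [1, 3, 5, 7]

t_stat, cohen_d = two_sample(group_a, group_b)
print(round(t_stat, 4), round(cohen_d, 4))
```

The effect size does not depend on sample size, while the t statistic grows with it, which is why the two are reported together.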
19
Power Analysis and Final Remarks
Conclusion and limitation: For a Cohen's d effect size of 0.15, a power of 0.80, and a
type I error rate of 0.05, we need a sample size of 699 for each group, which is a
valuable input for business decisions.
If my boss told me that the revenue of the product would increase by only 10%
rather than 15% of the standard deviation (an effect size of 0.10), the required sample
size would increase to 1571.
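The sample sizes above can be roughly checked with the standard normal-approximation formula for a two-sided, two-sample comparison, n per group = 2 * ((z_(1-α/2) + z_power) / d)^2. This is a sketch under that approximation, not the solver used for the slides. It lands about one observation per group below the reported 699 and 1571, a gap that is consistent with a solver based on the t distribution (an assumption).

```python
from math import ceil
from statistics import NormalDist

def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the type I error rate
    z_power = NormalDist().inv_cdf(power)          # quantile matching the desired power
    return ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

for d in (0.15, 0.10):
    print(d, n_per_group(d))
```

Since the required n scales with 1/d^2, shrinking the effect size from 0.15 to 0.10 multiplies the sample size by 2.25.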
20
Capstone Project Milestone 4: Regression
Executive Summary
Yw5178
Yanle Wang
The data source is the Customer Churn Prediction 2020 data set, which is used to predict
whether a customer will change telco provider.
In this project, we use a linear regression model to examine the relationship
between Total_day_minutes (X1), Total_day_calls (X2) and Total_day_charge
(Y). The assumed model and H0 are listed below.
Y=B1X1+B2X2+B0+e
H0: B1=0, B2=0. (α=0.05)
The results showed that Total_day_minutes (X1) and Total_day_charge (Y)
have a strong positive relationship. Total_day_calls (X2) and Total_day_charge
(Y) are not significantly related.
Data Sources: https://www.kaggle.com/c/customer-churn-prediction-2020
GitHub URL:
https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
21
Conduct the Analysis
Scatterplot 1
For X1: Total_day_minutes and Y: Total_day_charge
The scatterplot shows a positive linear relationship.
Total_day_minutes might have a positive effect on
Total_day_charge.
22
Conduct the Analysis
Scatterplot 2
For X2: Total_day_calls and Y: Total_day_charge
The scatterplot does not show a clear relationship.
It is hard to predict the relation between Total_day_calls and
Total_day_charge at this stage.
23
Conduct the Analysis
Summary
•I chose this multiple regression to examine the relationship between X1, X2 and Y.
Y=B1X1+B2X2+B0+e
H0: B1=0, B2=0. (α=0.05)
X1:
•P-Value = 0.00 < 0.05 (the type I error rate), so we reject H0: B1 = 0. B1 = 0.170.
For each additional daytime minute, the total daytime charge
increases by 0.17 (0.17 dollars per minute).
X2:
•P-Value = 0.395 > 0.05, so we do not reject H0: B2 = 0.
For each additional daytime call, there is no significant increase in the total
daytime charge.
24
25
Assumptions Check and Further Research
After the assumption check, 4 assumptions are satisfied and 2 are not.
1. Because p=0.961, we fail to reject H0: X1 and X2 are not correlated. (Satisfied)
2. As the predictions increase, the residual plot shows a parallel spread, which
shows that the variance of the error term is constant. (Satisfied)
3. The mean of the error term is very close to 0. (Satisfied)
4. The observations were not drawn independently, and the error
term is not normally distributed. (Not Satisfied)
5. Error terms are uncorrelated (no time series data). (Satisfied)
6. The means of Y, given X1, lie on a straight line with slope B1.
The means of Y, given X2, do not lie on a straight line with slope B2. (Not Satisfied)
Suggestion: Our model needs further improvement. Linear regression might
not be a good fit for this data. From a marketing manager's point of view, the model
result seems reasonable: Total_day_minutes might have a positive effect on
Total_day_charge. However, more tests, such as the relationship between night_minutes
and night_charge or between eve_minutes and eve_charge, should be done to judge whether
total minutes and total charge have a positive and valid relationship.
26
Capstone Project Milestone 5: Clustering
Executive Summary
Yw5178
Yanle Wang
• Online retail customer clustering
https://www.kaggle.com/hellbuoy/online-retail-customer-clustering
Online retail is a transnational data set that contains all the transactions
occurring between 01/12/2010 and 09/12/2011 for a UK-based, registered
non-store online retailer. The company mainly sells unique all-occasion gifts.
Many of its customers are wholesalers.
The goal of customer segmentation: find the target customers (based on RFM:
Recency, Frequency, and Monetary).
• URL of Kaggle Notebook
https://www.kaggle.com/yanlewang/customer-segementation-yw5178
I chose the German retail customers’ data for the cluster analysis. I used
K-Means Clustering and Hierarchical Clustering to find K and
the target customers based on RFM.
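RFM reduces each customer's transaction history to three numbers: days since the last purchase (recency), number of distinct invoices (frequency), and total spend (monetary). The following is a minimal sketch with hypothetical transactions and field names. The snapshot date assumes the data set's end date 09/12/2011 is in day/month/year order.

```python
from collections import defaultdict
from datetime import date

# Hypothetical transactions: (customer_id, invoice_no, invoice_date, amount)
transactions = [
    ("C1", "A001", date(2011, 11, 30), 120.0),
    ("C1", "A002", date(2011, 12, 5), 80.0),
    ("C2", "A003", date(2011, 6, 1), 40.0),
    ("C3", "A004", date(2011, 12, 8), 300.0),
    ("C3", "A004", date(2011, 12, 8), 50.0),  # second line of the same invoice
]

def rfm_table(rows, snapshot):
    """Build {customer: {recency, frequency, monetary}} from transaction rows."""
    last_purchase = {}
    invoices = defaultdict(set)
    spend = defaultdict(float)
    for customer, invoice, day, amount in rows:
        last_purchase[customer] = max(day, last_purchase.get(customer, day))
        invoices[customer].add(invoice)   # a set, so repeated invoice lines count once
        spend[customer] += amount
    return {
        c: {
            "recency": (snapshot - last_purchase[c]).days,
            "frequency": len(invoices[c]),
            "monetary": spend[c],
        }
        for c in last_purchase
    }

rfm = rfm_table(transactions, snapshot=date(2011, 12, 9))
for customer, values in rfm.items():
    print(customer, values)
```

These three columns, usually rescaled so that no single one dominates the distance calculation, are what the clustering algorithms take as input.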
27
K-Means Clustering
I used the “distortion”, “silhouette”, and “calinski_harabasz” metrics to choose k.
When metric= “distortion”, K=4.
When metric= “silhouette”, I could not find K.
When metric= “calinski_harabasz”, K=4.
Finally, I chose K=4 (calinski_harabasz).
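The "distortion" metric is the sum of squared distances from each point to its nearest cluster center. Plotting it against k and looking for the bend, or elbow, is one common way to choose k. The sketch below is a small pure-Python k-means on hypothetical 2-D points with four obvious groups. The farthest-first seeding is a choice made here for reproducibility, and the slides do not name the tool that was actually used.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def init_centers(points, k):
    """Deterministic farthest-first seeding (an illustrative choice)."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    return centers

def kmeans(points, k, iterations=50):
    """Lloyd's algorithm; returns (centers, distortion)."""
    centers = init_centers(points, k)
    for _ in range(iterations):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: sq_dist(p, centers[i]))
            groups[nearest].append(p)
        new_centers = [
            tuple(sum(axis) / len(g) for axis in zip(*g)) if g else centers[i]
            for i, g in enumerate(groups)
        ]
        if new_centers == centers:  # assignments have stopped changing
            break
        centers = new_centers
    distortion = sum(min(sq_dist(p, c) for c in centers) for p in points)
    return centers, distortion

# Hypothetical points forming four well-separated groups
points = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
          (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
          (9.0, 1.0), (9.1, 1.2), (8.9, 0.9),
          (5.0, 9.0), (5.1, 9.2), (4.9, 8.8)]

distortions = {k: kmeans(points, k)[1] for k in range(1, 7)}
for k, value in distortions.items():
    print(k, round(value, 3))
```

On this toy data the distortion drops sharply up to k=4 and barely improves afterwards, which is the elbow pattern behind a choice such as K=4.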
28
By the RFM criteria, we should choose the customer clusters with lower
recency and higher frequency and monetary amount.
From the K-means clustering results, we can see that customers with
Cluster_Id=2 best fit the criteria.
K-Means Clustering returns 3 target customers.
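Applying the RFM criteria to the clustering output amounts to profiling each cluster by its mean recency, frequency and monetary value, then picking the cluster that ranks best across the three. The following is a minimal sketch with hypothetical customers and a simple rank-sum rule, which is an assumption of this illustration rather than the method used in the notebook.

```python
from statistics import mean

# Hypothetical customers: (cluster_id, recency_days, frequency, monetary)
customers = [
    (0, 200, 1, 150.0), (0, 180, 2, 220.0),
    (1, 60, 4, 900.0), (1, 75, 3, 700.0),
    (2, 5, 12, 5200.0), (2, 9, 10, 4100.0), (2, 3, 15, 6100.0),
    (3, 30, 6, 1500.0),
]

def cluster_profiles(rows):
    """Mean recency, frequency and monetary value per cluster, plus cluster size."""
    grouped = {}
    for cluster, recency, frequency, monetary in rows:
        grouped.setdefault(cluster, []).append((recency, frequency, monetary))
    return {
        c: {
            "size": len(v),
            "recency": mean([r for r, _, _ in v]),
            "frequency": mean([f for _, f, _ in v]),
            "monetary": mean([m for _, _, m in v]),
        }
        for c, v in grouped.items()
    }

def best_cluster(profiles):
    """Rank clusters on each RFM axis and return the one with the lowest rank sum."""
    def rank(key, reverse):
        ordered = sorted(profiles, key=lambda c: profiles[c][key], reverse=reverse)
        return {c: i for i, c in enumerate(ordered)}
    r = rank("recency", reverse=False)    # lower recency is better
    f = rank("frequency", reverse=True)   # higher frequency is better
    m = rank("monetary", reverse=True)    # higher monetary value is better
    return min(profiles, key=lambda c: r[c] + f[c] + m[c])

profiles = cluster_profiles(customers)
target = best_cluster(profiles)
print(target, profiles[target])
```

The size of the selected cluster is the number of target customers that the method returns.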
29
Hierarchical Clustering
After visualizing the tree for each linkage method, I chose K=4 here.
30
By the RFM criteria, we should choose the customer clusters with lower
recency and higher frequency and monetary amount.
From the hierarchical clustering results, we can see that customers with
Cluster_Labels=2 best fit the criteria.
Hierarchical Clustering returns 8 target customers.
31
Future plan
K-Means Clustering and Hierarchical Clustering return 3 and 8 target customers,
respectively.
As a marketing manager, I would target the 3-customer group with
precision marketing plans and campaigns in the short term.
However, since all 8 target customers are good based on RFM (Recency,
Frequency, and Monetary), I will keep paying attention to them in the long term.

More Related Content

Similar to Final presentation (yw5178)

Yuchen's final draft (yy3743)
Yuchen's final draft (yy3743)Yuchen's final draft (yy3743)
Yuchen's final draft (yy3743)YuchenYang34
 
Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...
Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...
Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...CSCJournals
 
Part 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docxPart 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docxssuser562afc1
 
Part 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docxPart 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docxkarlhennesey
 
A Comparative Analysis of the Level of a State’s Economic Development with th...
A Comparative Analysis of the Level of a State’s Economic Development with th...A Comparative Analysis of the Level of a State’s Economic Development with th...
A Comparative Analysis of the Level of a State’s Economic Development with th...James Darnbrook
 
An Empirical Study on the Change of Consumption Level of Chinese Residents
An Empirical Study on the Change of Consumption Level of Chinese ResidentsAn Empirical Study on the Change of Consumption Level of Chinese Residents
An Empirical Study on the Change of Consumption Level of Chinese ResidentsDr. Amarjeet Singh
 
Beyond GDP: Measuring well-being and progress of Nations
Beyond GDP: Measuring well-being and progress of NationsBeyond GDP: Measuring well-being and progress of Nations
Beyond GDP: Measuring well-being and progress of NationsKübra Bayram
 
A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...
A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...
A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...Dmitri Pisarenko
 
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATEREGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATEChaoyi WU
 
Sirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_PaperSirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_PaperSirui Zhang
 
Sirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_PaperSirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_PaperSirui Zhang
 
An Assignment On Advanced Biostatistics
An Assignment On Advanced BiostatisticsAn Assignment On Advanced Biostatistics
An Assignment On Advanced BiostatisticsAmy Roman
 

Similar to Final presentation (yw5178) (17)

Yuchen's final draft (yy3743)
Yuchen's final draft (yy3743)Yuchen's final draft (yy3743)
Yuchen's final draft (yy3743)
 
Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...
Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...
Twitter Based Sentimental Analysis of Impact of COVID-19 on Economy using Naï...
 
Analysis of the Influence of Economic Growth, Poverty, and Education on the S...
Analysis of the Influence of Economic Growth, Poverty, and Education on the S...Analysis of the Influence of Economic Growth, Poverty, and Education on the S...
Analysis of the Influence of Economic Growth, Poverty, and Education on the S...
 
Part 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docxPart 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docx
 
Part 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docxPart 1 Interest RatesMacroeconomic factors that influence inter.docx
Part 1 Interest RatesMacroeconomic factors that influence inter.docx
 
A Comparative Analysis of the Level of a State’s Economic Development with th...
A Comparative Analysis of the Level of a State’s Economic Development with th...A Comparative Analysis of the Level of a State’s Economic Development with th...
A Comparative Analysis of the Level of a State’s Economic Development with th...
 
An Empirical Study on the Change of Consumption Level of Chinese Residents
An Empirical Study on the Change of Consumption Level of Chinese ResidentsAn Empirical Study on the Change of Consumption Level of Chinese Residents
An Empirical Study on the Change of Consumption Level of Chinese Residents
 
Beyond GDP: Measuring well-being and progress of Nations
Beyond GDP: Measuring well-being and progress of NationsBeyond GDP: Measuring well-being and progress of Nations
Beyond GDP: Measuring well-being and progress of Nations
 
A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...
A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...
A. Krasovskii, D. Pisarenko, Modeling Control of Population Dynamics in Russi...
 
Sustainability 11-03686-v2
Sustainability 11-03686-v2Sustainability 11-03686-v2
Sustainability 11-03686-v2
 
An Analytical Study Of The Impact Of Unemployment On Economic Growth In Kenya
An Analytical Study Of The Impact Of Unemployment On Economic Growth In KenyaAn Analytical Study Of The Impact Of Unemployment On Economic Growth In Kenya
An Analytical Study Of The Impact Of Unemployment On Economic Growth In Kenya
 
Business statistcs
Business statistcsBusiness statistcs
Business statistcs
 
Business statistics
Business statistics Business statistics
Business statistics
 
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATEREGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
REGRESSION ANALYSIS ON HEALTH INSURANCE COVERAGE RATE
 
Sirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_PaperSirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_Paper
 
Sirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_PaperSirui_Zhang_Demograhpy_Term_Paper
Sirui_Zhang_Demograhpy_Term_Paper
 
An Assignment On Advanced Biostatistics
An Assignment On Advanced BiostatisticsAn Assignment On Advanced Biostatistics
An Assignment On Advanced Biostatistics
 

Recently uploaded

B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxJohnnyPlasten
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...dajasot375
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Delhi Call girls
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfLars Albertsson
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfMarinCaroMartnezBerg
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationBoston Institute of Analytics
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystSamantha Rae Coolbeth
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingNeil Barnes
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiSuhani Kapoor
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts ServiceSapana Sha
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptxAnupama Kate
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改atducpo
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxEmmanuel Dauda
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Jack DiGiovanna
 

Recently uploaded (20)

Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in  KishangarhDelhi 99530 vip 56974 Genuine Escort Service Call Girls in  Kishangarh
Delhi 99530 vip 56974 Genuine Escort Service Call Girls in Kishangarh
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Log Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptxLog Analysis using OSSEC sasoasasasas.pptx
Log Analysis using OSSEC sasoasasasas.pptx
 
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
Indian Call Girls in Abu Dhabi O5286O24O8 Call Girls in Abu Dhabi By Independ...
 
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
Best VIP Call Girls Noida Sector 39 Call Me: 8448380779
 
Schema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdfSchema on read is obsolete. Welcome metaprogramming..pdf
Schema on read is obsolete. Welcome metaprogramming..pdf
 
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
VIP Call Girls Service Charbagh { Lucknow Call Girls Service 9548273370 } Boo...
 
FESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdfFESE Capital Markets Fact Sheet 2024 Q1.pdf
FESE Capital Markets Fact Sheet 2024 Q1.pdf
 
Predicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project PresentationPredicting Employee Churn: A Data-Driven Approach Project Presentation
Predicting Employee Churn: A Data-Driven Approach Project Presentation
 
Unveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data AnalystUnveiling Insights: The Role of a Data Analyst
Unveiling Insights: The Role of a Data Analyst
 
Brighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data StorytellingBrighton SEO | April 2024 | Data Storytelling
Brighton SEO | April 2024 | Data Storytelling
 
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service BhilaiLow Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
Low Rate Call Girls Bhilai Anika 8250192130 Independent Escort Service Bhilai
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
Call Girls In Mahipalpur O9654467111 Escorts Service
Call Girls In Mahipalpur O9654467111  Escorts ServiceCall Girls In Mahipalpur O9654467111  Escorts Service
Call Girls In Mahipalpur O9654467111 Escorts Service
 
Decoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in ActionDecoding Loan Approval: Predictive Modeling in Action
Decoding Loan Approval: Predictive Modeling in Action
 
100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx100-Concepts-of-AI by Anupama Kate .pptx
100-Concepts-of-AI by Anupama Kate .pptx
 
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
代办国外大学文凭《原版美国UCLA文凭证书》加州大学洛杉矶分校毕业证制作成绩单修改
 
Customer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptxCustomer Service Analytics - Make Sense of All Your Data.pptx
Customer Service Analytics - Make Sense of All Your Data.pptx
 
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
Building on a FAIRly Strong Foundation to Connect Academic Research to Transl...
 

Final presentation (yw5178)

  • 1. 1 Final Presentation Yanle Wang-yw5178 Contents Part 1 Self-introduction 2 Part 2 Course Summary 3 Part 3 Market research report `4 Session1 4 Session2 5 Session3 6 Part 4 Appendix 14
  • 2. 2 Self-introduction. URL of Kaggle Notebook: https://www.kaggle.com/yanlewang/customer-segementation-yw5178 GitHub URL: https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing LinkedIn: www.linkedin.com/in/yw-wang-04a3231b7 Yanle (Mike) Wang received his bachelor’s degree in management from Shanghai University of International Business and Economics. He has a great passion for marketing and enjoys creating value for customers. He has interned at Hitachi (China) and Nike, learning different market-oriented strategies to enhance brand value. He is also interested in volunteer work as a way to meet and help different kinds of people, which he sees as a way to learn the needs and pain points of potential customers. He has won a prize for “Internet plus” innovation and entrepreneurship for work on online tutoring in impoverished areas.
  • 3. 3 Course Summary. I have learned a great deal of theory and many data modeling methods from the course. In the theoretical part, the methods and logic of many investigations have given me a deeper understanding of what market research should do. The three platforms used in class also allowed me to experience the entire process, from finding data to final analysis and modeling. Through this course, I discovered that LinkedIn is also a very good platform for letting others get to know you. I am becoming more and more interested in research and investigation, and I want to learn more about programming in the future to make better use of these platforms and software. I am looking for an internship in data marketing to prepare for my future career.
  • 4. 4 Market research report: Session 1. Data Set. All annual data for the variables, from 1973 to 2016, were collected from the National Bureau of Statistics, http://www.stats.gov.cn (some missing data were estimated by Yanle Wang). Y: Life expectancy at birth of the population. Life expectancy at birth is the most commonly used life expectancy indicator. It shows the average number of years a newborn is expected to live and is an important indicator of population health. X1: College enrollment rate (% of total population). The college enrollment rate reflects a country's education level at the macro level. We expect an increase in the college enrollment rate to lead to an increase in life expectancy; however, we do not think this relationship is purely linear. X2: GDP (taking 1973 as the fixed base index). GDP is the total value of all final products and services produced by a country in a given period and is often treated as an indicator of the country’s economic state. This paper assumes that the economic situation reflects people’s consumption level over a period of time, which affects how much people spend on all life-related behaviour. This paper predicts that GDP has a significantly positive impact on Y, and it uses 1973 as the base year. X3: Hospital bed number (per thousand people). This paper assumes that the number of hospital beds per thousand people is an important indicator of the medical resources available per person. In recent years, with continuous improvement in medical care, more diseases can be cured. From a macro angle, however, it is the medical resources available per person that contribute to life expectancy. We also expect this relationship to be positive.
  • 5. 5 Market research report: Session 2. Research Abstract. The Chinese population has entered a post-transition period in health: mortality has declined steadily and life expectancy has continued to increase. Although it is difficult to predict how long a particular person will live, the life expectancy at birth of a population can be calculated by scientific methods and is feasible to study. Studying life expectancy in the country is of great practical significance, and this paper focuses on which factors influence the life expectancy at birth of the population from a macro perspective. The research uses linear regression to study the relationship between life expectancy at birth and the other variables. Google Datastudio Link for Public: https://datastudio.google.com/reporting/9adf893b-7baa-45df-aa28-46662f9dd349
  • 6. 6 Market research report: Session 3. Research Executive Summary. Yw5178 Yanle Wang. All annual data for the variables, from 1973 to 2016, were collected from the National Bureau of Statistics, http://www.stats.gov.cn (some missing data were estimated by Yanle Wang). [The GDP ($) takes 1973 as the fixed base index.] Studying life expectancy in the country is of great practical significance, and this project focuses on which factors influence life expectancy at birth. In this project, we use a linear regression model to check the relationship between College enrollment rate (% of total population) (X1), GDP (X2), Hospital bed number (per thousand people) (X3) and Life expectancy at birth of the population (Y). The assumed model and H0 are listed below. Y = B0 + B1X1 + B2X2 + B3X3 + e, with H0: B1 = 0, B2 = 0, B3 = 0 (α = 0.05). The result showed that X1, X2 and X3 are each significantly related to Y. However, the model needs further revision after the assumption check. GitHub URL: https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
  • 7. 7 Conduct the Analysis: Scatterplot 1. For X1: College enrollment rate (% of total population) and Y: Life expectancy at birth of the population. The scatterplot shows a positive linear relationship. College enrollment rate might have a positive effect on life expectancy at birth. However, it is also possible that the relationship is not linear but quadratic.
  • 8. 8 Conduct the Analysis: Scatterplot 2. For X2: GDP and Y: Life expectancy at birth of the population. The scatterplot does not show a clear relationship. It seems that once GDP exceeds 1.5E+11, it might have a positive effect on life expectancy at birth.
  • 9. 9 Conduct the Analysis: Scatterplot 3. For X3: Hospital bed number (per thousand people) and Y: Life expectancy at birth of the population. The scatterplot shows a positive linear relationship. Providing more hospital beds might have a positive effect on life expectancy at birth.
  • 10. 10 Conduct the Analysis: Summary. •I chose this multiple regression to examine the relationship between X1, X2, X3 and Y. Y = B0 + B1X1 + B2X2 + B3X3 + e, with H0: B1 = 0, B2 = 0, B3 = 0 (α = 0.05). X1: •P-value = 0.00 < 0.05 (the type I error rate), so we reject H0; B1 = 0.3212. For each one-percentage-point increase in the college enrollment rate, life expectancy at birth increases by 0.3212. X2: •P-value = 0.00 < 0.05, so we reject H0; B2 = -4.106e-11. For each one-unit increase in total GDP, life expectancy at birth decreases by 4.106e-11. X3: •P-value = 0.00 < 0.05, so we reject H0; B3 = 2.2161. For each additional hospital bed per thousand people, life expectancy at birth increases by 2.2161.
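The coefficients on the slide above come from a multiple linear regression. As a rough illustration of how such coefficients can be estimated, here is a minimal sketch that solves the normal equations by Gaussian elimination. The function name `fit_ols` and the data are hypothetical placeholders (synthetic, two predictors for brevity); this is not the author's notebook code, and it produces no p-values.

```python
# Minimal OLS sketch: solve (X'X) b = X'y by Gauss-Jordan elimination.
# The data are synthetic placeholders, not the report's dataset.

def fit_ols(rows, y):
    """Return [B0, B1, ..., Bk] for y = B0 + B1*x1 + ... + Bk*xk + e."""
    X = [[1.0] + list(r) for r in rows]           # prepend the intercept column
    n_obs, k = len(X), len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(X[n][i] * X[n][j] for n in range(n_obs)) for j in range(k)]
         + [sum(X[n][i] * y[n] for n in range(n_obs))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting: move the largest remaining entry onto the diagonal.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Synthetic example generated exactly from y = 60 + 0.3*x1 + 2.0*x2 (no noise).
rows = [(1.0, 1.5), (2.0, 1.8), (4.0, 2.0), (7.0, 2.6), (11.0, 3.1), (16.0, 4.0)]
y = [60 + 0.3 * x1 + 2.0 * x2 for x1, x2 in rows]
coefs = fit_ols(rows, y)
print([round(c, 4) for c in coefs])
```

Because the synthetic response has no noise, the fit recovers the generating coefficients; with real data a statistics package would also be needed for the standard errors and p-values quoted on the slide.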
  • 11. 11
  • 12. 12 Assumptions Check and Further Research. After the assumption check, 2 assumptions are satisfied and 4 are not. 1. Because p = 0.00 for each pair, we reject H0 that X1 and X2, X2 and X3, and X1 and X3 are uncorrelated; the predictors are correlated with each other. (Not Satisfied) 2. As the predictions increase, the residual plot shows a non-parallel spread, which indicates that the variance of the error term is not constant. (Not Satisfied) 3. The mean of the error term is close to 0. (Satisfied) 4. The observations were not drawn independently, and the error term is not normally distributed. (Not Satisfied) 5. The error terms are uncorrelated. (Satisfied) 6. Scatter plots (Not Satisfied): the means of Y, given X1, do not lie on a straight line with slope B1; the means of Y, given X2, do not lie on a straight line with slope B2; the means of Y, given X3, lie on a straight line with slope B3.
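Assumption 1 on the slide above concerns correlation between predictors. A minimal sketch of that check follows, computing the Pearson correlation between two predictors and the variance inflation factor, which for a model with exactly two predictors reduces to 1 / (1 - r^2). The helper names and the numbers are hypothetical, and the sketch does not compute the p-values the slide refers to.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def vif_two_predictors(a, b):
    """VIF when the model has exactly two predictors: 1 / (1 - r^2)."""
    r = pearson(a, b)
    return 1.0 / (1.0 - r * r)

# Hypothetical predictor values, for illustration only.
x1 = [1, 2, 3, 4, 5]
x2 = [2, 4, 5, 4, 5]
r = pearson(x1, x2)
vif = vif_two_predictors(x1, x2)
print(round(r, 4), round(vif, 2))
```

A correlation near 1 in absolute value (and a correspondingly large VIF) suggests that the individual coefficients are hard to interpret separately.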
  • 13. 13 Conclusion and Suggestion: Our model needs further improvement. A purely linear regression might not be a good fit for this data, especially for X1. From a marketing manager's perspective, the model result does not seem reasonable. 1. GDP growth should have a positive effect on Y, yet the estimated coefficient B2 is negative. This might be caused by correlation among the independent variables. Future research should avoid using such correlated variable combinations. 2. It is also possible that the relationship between college enrollment rate and life expectancy at birth is not linear but quadratic. A linear regression of Y on X1^2 can be run to test this.
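The second suggestion above, regressing Y on X1^2, can be sketched by comparing the R^2 of a simple regression on X1 with that of a simple regression on X1 squared. The data below are synthetic and deliberately quadratic, so the numbers only show the mechanics; `r_squared` is a hypothetical helper name.

```python
def r_squared(x, y):
    """R^2 of the simple regression of y on x (the squared Pearson correlation)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy * sxy / (sxx * syy)

# Synthetic data: y depends on x1 squared, with no noise.
x1 = [1, 2, 3, 4, 5, 6, 7, 8]
y = [65 + 0.2 * v * v for v in x1]

r2_linear = r_squared(x1, y)                    # y regressed on x1
r2_squared = r_squared([v * v for v in x1], y)  # y regressed on x1^2
print(round(r2_linear, 4), round(r2_squared, 4))
```

If the squared term gives a clearly higher R^2 on the real data, that would support the quadratic hypothesis; the residual plots should still be rechecked afterwards.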
  • 14. 14 Appendix Capstone Project Milestone 2: Research Design and The Data
  • 15. 15 Capstone Project Milestone 3: Hypothesis Testing. Executive Summary. Yw5178 Yanle Wang. Data Sources: direct marketing campaign data (phone calls) of a Portuguese banking institution: https://data.world/data-society/bank-marketing-data Economy data before and after Covid-19: https://stats.oecd.org/index.aspx?queryid=33940# Hypothesis tests (parametric): T-Test, Paired T-Test and Group T-Test. Assuming the null hypothesis is true in the population, the type I error rate is set to 0.05. GitHub URL: https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
  • 16. 16 Hypothesis Test 1. •I chose this test to see whether the data population is normally distributed. •H0: The data population is normally distributed. •Since the result is Normal=False, I chose the Spearman correlation. •With alpha = 0.05, P-value = 0.005417 < 0.05, so we reject H0.
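The slide above switches to the Spearman correlation because the data are not normal. As a minimal sketch of what that statistic is, the code below ranks both variables (ties share the average rank) and then takes the Pearson correlation of the ranks. Function names and data are hypothetical, and no p-value is computed.

```python
import math

def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for pos in range(i, j + 1):
            result[order[pos]] = average_rank
        i = j + 1
    return result

def spearman(a, b):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    ra, rb = ranks(a), ranks(b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    return cov / math.sqrt(var_a * var_b)

# Hypothetical data: b grows monotonically, but not linearly, with a.
a = [10, 20, 30, 40, 50]
b = [1, 4, 9, 16, 30]
rho_up = spearman(a, b)
rho_down = spearman(a, list(reversed(b)))
print(rho_up, rho_down)
```

Because it uses ranks only, the statistic measures monotonic association and does not rely on normality.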
  • 17. 17 Hypothesis Test 2. •I chose this Paired T-Test to see whether the data population differs significantly between time periods. •H0: The data population in different time periods has no significant difference. •H1: The data population in different time periods has a significant difference. •With power = 0.09 and Cohen's d = 0.1, P-value = 0.0 < 0.05, so we reject H0.
  • 18. 18 Hypothesis Test 3. •I chose this Group T-Test to see whether the two data populations differ significantly. •H0: The two data populations do not have a significant difference. •H1: The two data populations have a significant difference. •With power = 1 and Cohen's d = 0.23077, P-value = 2.76e-137 < 0.05, so we reject H0.
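The two slides above report Cohen's d alongside the t-tests. A minimal sketch of how d and the pooled-variance two-sample t statistic relate is given below; the groups are hypothetical numbers, and the pooled-standard-deviation form of d is an assumption, since the slides do not say which variant was used.

```python
import math
from statistics import mean, variance

def cohens_d(group1, group2):
    """Cohen's d using the pooled sample standard deviation."""
    n1, n2 = len(group1), len(group2)
    pooled_var = ((n1 - 1) * variance(group1)
                  + (n2 - 1) * variance(group2)) / (n1 + n2 - 2)
    return (mean(group1) - mean(group2)) / math.sqrt(pooled_var)

def t_statistic(group1, group2):
    """Pooled-variance two-sample t statistic, expressed through Cohen's d."""
    n1, n2 = len(group1), len(group2)
    return cohens_d(group1, group2) * math.sqrt(n1 * n2 / (n1 + n2))

# Hypothetical groups, for illustration only.
g1 = [2, 4, 6, 8, 10]
g2 = [1, 3, 5, 7, 9]
d = cohens_d(g1, g2)
t = t_statistic(g1, g2)
print(round(d, 4), round(t, 4))
```

The relation t = d * sqrt(n1*n2 / (n1 + n2)) shows why a small effect size can still give a very small p-value when the groups are large.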
  • 19. 19 Power Analysis and Final Remarks. Conclusion and limitation: For a Cohen's d effect size of 0.15, a power of 0.80, and a type I error rate of 0.05, we need a sample size of 699 in each group, which is a useful input for business decisions. If my boss told me to detect a revenue increase of 10% rather than 15% of the standard deviation, the required sample size would increase to 1571.
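The sample sizes on the slide above can be approximated by hand. The sketch below uses the standard normal approximation for a two-sided, two-sample test, n = 2 * ((z_{1-α/2} + z_{power}) / d)^2 per group. This is an assumption about the method: the slide's 699 and 1571 presumably come from a t-distribution-based solver, so the approximation lands one observation lower in each case.

```python
import math
from statistics import NormalDist

def sample_size_per_group(effect_size, alpha=0.05, power=0.80):
    """Normal-approximation sample size per group for a two-sided two-sample test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)   # critical value for alpha
    z_power = NormalDist().inv_cdf(power)           # quantile for the target power
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)

n_15 = sample_size_per_group(0.15)
n_10 = sample_size_per_group(0.10)
print(n_15, n_10)  # 698 1570
```

Since the required n scales with 1/d^2, moving from d = 0.15 to d = 0.10 multiplies the sample size by 2.25, which matches the jump from roughly 700 to roughly 1570.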
  • 20. 20 Capstone Project Milestone 4: Regression. Executive Summary. Yw5178 Yanle Wang. The data source is the Customer Churn Prediction 2020 dataset, which is used to predict whether a customer will change telco provider. In this project, we use a linear regression model to check the relationship between Total_day_minutes (X1), Total_day_calls (X2) and Total_day_charge (Y). The assumed model and H0 are listed below. Y = B0 + B1X1 + B2X2 + e, with H0: B1 = 0, B2 = 0 (α = 0.05). The result showed that Total_day_minutes (X1) and Total_day_charge (Y) have a strong positive relationship, while Total_day_calls (X2) and Total_day_charge (Y) are not significantly related. Data Sources: https://www.kaggle.com/c/customer-churn-prediction-2020 GitHub URL: https://colab.research.google.com/github/Yanle57/NYU_Integrated_Marketing
  • 21. 21 Conduct the Analysis: Scatterplot 1. For X1: Total_day_minutes and Y: Total_day_charge. The scatterplot shows a positive linear relationship. Total_day_minutes might have a positive effect on Total_day_charge.
  • 22. 22 Conduct the Analysis: Scatterplot 2. For X2: Total_day_calls and Y: Total_day_charge. The scatterplot does not show a clear relationship. It is hard to predict the relation between Total_day_calls and Total_day_charge at this stage.
  • 23. 23 Conduct the Analysis: Summary. •I chose this multiple regression to examine the relationship between X1, X2 and Y. Y = B0 + B1X1 + B2X2 + e, with H0: B1 = 0, B2 = 0 (α = 0.05). X1: •P-value = 0.00 < 0.05 (the type I error rate), so we reject H0; B1 = 0.170. For each additional daytime minute, the total daytime charge increases by 0.17 (0.17 dollars per minute). X2: •P-value = 0.395 > 0.05, so we fail to reject H0: B2 = 0. An additional daytime call brings no significant increase in the total daytime charge.
  • 24. 24
  • 25. 25 Assumptions Check and Further Research. After the assumption check, 4 assumptions are satisfied and 2 are not. 1. Because p = 0.961, we fail to reject H0 that X1 and X2 are not correlated. (Satisfied) 2. As the predictions increase, the residual plot shows a parallel spread, which indicates that the variance of the error term is constant. (Satisfied) 3. The mean of the error term is very close to 0. (Satisfied) 4. The observations were not drawn independently, and the error term is not normally distributed. (Not Satisfied) 5. The error terms are uncorrelated (no time series data). (Satisfied) 6. The means of Y, given X1, lie on a straight line with slope B1, but the means of Y, given X2, do not lie on a straight line with slope B2. (Not Satisfied) Suggestion: Our model needs further improvement; linear regression might not be a good fit for this data. From a marketing manager's perspective, the result that Total_day_minutes might have a positive effect on Total_day_charge seems reasonable. However, more tests, such as the relation between night_minutes and night_charge and between eve_minutes and eve_charge, should be done to judge whether total minutes and total charge have a positive and valid relationship.
  • 26. 26 Capstone Project Milestone 5: Clustering. Executive Summary. Yw5178 Yanle Wang. • Online retail customer clustering: https://www.kaggle.com/hellbuoy/online-retail-customer-clustering Online retail is a transnational data set which contains all the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based and registered non-store online retailer. The company mainly sells unique all-occasion gifts, and many of its customers are wholesalers. The goal of customer segmentation: find the target customers, based on RFM (Recency, Frequency, and Monetary). • URL of Kaggle Notebook: https://www.kaggle.com/yanlewang/customer-segementation-yw5178 I chose the German retail customers’ data for the cluster analysis. I used the K-Means Clustering and Hierarchical Clustering statistical methods to find K and the target customers based on RFM.
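The segmentation above is based on RFM. As a minimal sketch of how the three values can be derived from raw transactions, the code below assumes hypothetical `(customer, date, amount)` rows and a chosen snapshot date; the field layout, the function name `rfm_table`, and counting each row as one purchase are all assumptions rather than details from the notebook.

```python
from datetime import date

def rfm_table(transactions, snapshot):
    """Map customer -> (recency in days, frequency, monetary total)."""
    table = {}
    for customer, day, amount in transactions:
        rec = table.setdefault(customer,
                               {"last": day, "frequency": 0, "monetary": 0.0})
        rec["last"] = max(rec["last"], day)   # most recent purchase date
        rec["frequency"] += 1                 # one row counted as one purchase
        rec["monetary"] += amount             # total amount spent
    return {c: ((snapshot - r["last"]).days, r["frequency"], r["monetary"])
            for c, r in table.items()}

# Hypothetical transactions, for illustration only.
transactions = [
    ("A", date(2011, 12, 1), 50.0),
    ("A", date(2011, 11, 1), 30.0),
    ("B", date(2011, 6, 1), 20.0),
]
rfm = rfm_table(transactions, snapshot=date(2011, 12, 9))
print(rfm)
```

Here customer A is the more attractive target under RFM: a lower recency, a higher frequency and a higher monetary value than customer B.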
  • 27. 27 K-Means Clustering. I used the “distortion”, “silhouette”, and “calinski_harabasz” metrics to choose k. With metric = “distortion”, K = 4. With metric = “silhouette”, I could not find a K. With metric = “calinski_harabasz”, K = 4. In the end I chose K = 4 (calinski_harabasz).
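The “distortion” metric mentioned above is the sum of squared distances from each point to its nearest cluster centre, and the elbow method looks for the k after which it stops dropping sharply. The sketch below is a small pure-Python k-means with a deterministic farthest-point initialisation (an assumption made here for reproducibility; the slides do not describe the tool or its settings) run on synthetic 2-D points with four obvious groups.

```python
def dist2(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iterations=50):
    """Return (centers, distortion) using Lloyd's algorithm."""
    # Deterministic farthest-point initialisation, starting from the first point.
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points,
                           key=lambda p: min(dist2(p, c) for c in centers)))
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist2(p, centers[i]))
            clusters[nearest].append(p)
        # Move each centre to the mean of its cluster (keep it if the cluster is empty).
        new_centers = [tuple(sum(col) / len(c) for col in zip(*c)) if c else centers[i]
                       for i, c in enumerate(clusters)]
        if new_centers == centers:
            break
        centers = new_centers
    distortion = sum(min(dist2(p, c) for c in centers) for p in points)
    return centers, distortion

# Synthetic points forming four tight groups near the corners of a square.
points = [(0, 0), (1, 0), (0, 1), (10, 0), (11, 0), (10, 1),
          (0, 10), (1, 10), (0, 11), (10, 10), (11, 10), (10, 11)]
distortions = {k: kmeans(points, k)[1] for k in range(1, 6)}
for k, value in distortions.items():
    print(k, round(value, 2))
```

On this toy data the distortion collapses once k reaches 4 and improves little afterwards, which is the elbow pattern the slide relies on.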
  • 28. 28 By the RFM criteria, we should choose the customer clusters with lower recency and higher frequency and amount. From the K-means clustering results, we can see that customers with Cluster_Id = 2 best fit the criteria. K-Means Clustering returns 3 target customers.
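The selection rule above (lower recency, higher frequency and amount) can be made explicit. The sketch below averages R, F and M per cluster, ranks the clusters on each dimension, and picks the cluster with the best summed rank. The summed-rank scoring, the function name `best_cluster` and all numbers are hypothetical; the slides pick the cluster by inspecting the results.

```python
def best_cluster(rfm_by_customer, labels):
    """Return (best cluster id, per-cluster mean R/F/M) under the RFM criteria."""
    sums = {}
    for customer, (r, f, m) in rfm_by_customer.items():
        s = sums.setdefault(labels[customer], [0.0, 0.0, 0.0, 0])
        s[0] += r
        s[1] += f
        s[2] += m
        s[3] += 1
    means = {c: (s[0] / s[3], s[1] / s[3], s[2] / s[3]) for c, s in sums.items()}

    def rank(index, high_is_good):
        ordered = sorted(means, key=lambda c: means[c][index], reverse=high_is_good)
        return {c: position for position, c in enumerate(ordered)}

    r_rank = rank(0, high_is_good=False)   # lower recency is better
    f_rank = rank(1, high_is_good=True)    # higher frequency is better
    m_rank = rank(2, high_is_good=True)    # higher monetary value is better
    best = min(means, key=lambda c: r_rank[c] + f_rank[c] + m_rank[c])
    return best, means

# Hypothetical RFM values and cluster labels, for illustration only.
rfm = {"a": (200, 1, 50.0), "b": (180, 2, 80.0),
       "c": (5, 12, 900.0), "d": (9, 10, 1100.0), "e": (60, 4, 300.0)}
labels = {"a": 0, "b": 0, "c": 2, "d": 2, "e": 1}
best, means = best_cluster(rfm, labels)
targets = sorted(c for c in rfm if labels[c] == best)
print(best, targets)
```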
  • 29. 29 Hierarchical Clustering. After visualizing the trees produced by the linkage methods, I chose K = 4 here.
  • 30. 30 By the RFM criteria, we should choose the customer clusters with lower recency and higher frequency and amount. From the hierarchical clustering results, we can see that customers with Cluster_Labels = 2 best fit the criteria. Hierarchical Clustering returns 8 target customers.
  • 31. 31 Future plan. K-Means Clustering and Hierarchical Clustering return 3 and 8 target customers respectively. As a marketing manager, I would target the 3 customers from K-Means with precision marketing plans and campaigns in the short term. However, since all 8 target customers score well on RFM (Recency, Frequency, and Monetary), I will keep paying attention to them in the long term.