My name is Yueyao Wang. These slides are the revised version of my final presentation for Statistical Measurements & Analysis, a course in the Integrated Marketing program at New York University. Thank you.
2. CONTENT
●Part I: Self-introduction slides
headshot and one paragraph of self-introduction
Github Repo Link
Kaggle Notebook Link(s)
LinkedIn URL
●Part II: Summary of what I have learned in this course and my takeaway for personal and professional growth
●Part III: My own market research report
Session 1. A new dataset that is not one of the instruction datasets used in this course.
Session 2. Reproduce: Capstone Project Milestone 2: Research Design and The Data
Session 3. Reproduce: Capstone Project Milestone 3: Hypothesis Testing
●Part IV: Appendix
Capstone Project Milestone 2: Research Design and The Data
Capstone Project Milestone 3: Hypothesis Testing
Capstone Project Milestone 4: Regression
Capstone Project Milestone 5: Clustering
3. Self-introduction
• I come from Nanjing, China. After finishing my bachelor's degree in Business English,
I went on to study Integrated Marketing at New York University. In 2019, I interned
at IKEA Nanjing for about 2 months. Although it was a short time, I
learned basic marketing knowledge that has been very important and
helpful to me. My supervisor also taught me the significance of pricing based on market
research.
• As a junior in university, I participated in the Business English
Contest, in which every team had to plan a crossover collaboration between two different
brands. During this process, I discovered my interest and creativity in brand marketing,
which gave me a direction for the future. My favorite hobby is jazz dance.
• Tel: 8613057556779 Email: yw5244@nyu.edu
Yueyao Wang
5. What I have learned in this course
• In this course, I have learned a lot about statistics. Because I majored in the arts in
college, I did not have much knowledge of math. At first, I felt a little afraid of this
course. Now, however, I am confident that I can handle it!
• When I did my own hypothesis testing for the final presentation, I figured out how to
upload the data on my own. This process was very meaningful to me and gave me a
sense of achievement.
7. The source of my data
●Name of Data: Students Performance
●Link:
https://www.kaggle.com/spscientist/students-performance-in-exams?select=StudentsPerformance.csv
●Summary of Data: writing, reading, and math scores,
examined from 5 angles (gender, race, parental level of
education, lunch, and test preparation); 1000 samples
8. Session 2. Reproduce: Capstone Project Milestone 2: Research Design and The Data
• https://datastudio.google.com/u/0/reporting/57c99570-7075-4c82-bc5d-095f8791adb9/page/vsQrB/edit
9. Session 3. Reproduce: Capstone Project Milestone 3: Hypothesis Testing
This chart shows the
whole dataset.
10. Session 3. Reproduce: Capstone Project Milestone 3: Hypothesis Testing
One-Sample T-test
Reason:
To test whether the mean Math
score equals a reference value, we
use a one-sample T-test.
Null hypothesis:
The mean Math score equals
60.
Conclusion:
Since the p-value is smaller
than 0.05, we reject the null
hypothesis. The mean
Math score is larger than 60.
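A minimal sketch of a one-sample T-test in Python, using only the standard library. The scores are randomly generated stand-ins, not the Kaggle data, and the p-value uses a normal approximation to the t distribution, which is an assumption that is reasonable only because the sample is large (1000 rows). The helper name `one_sample_t` is hypothetical.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def one_sample_t(sample, popmean):
    """Return (t statistic, approximate two-sided p-value)."""
    n = len(sample)
    standard_error = stdev(sample) / math.sqrt(n)
    t = (mean(sample) - popmean) / standard_error
    # Normal approximation to the t distribution (assumes large n).
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, p


# Hypothetical stand-in for the 1000 math scores (NOT the real dataset).
rng = random.Random(0)
scores = [rng.gauss(66, 15) for _ in range(1000)]

t_stat, p_value = one_sample_t(scores, popmean=60)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```

For small samples, the exact p-value should come from the t distribution, for example with `scipy.stats.ttest_1samp`.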
11. Two-Sample T-Test
Reason:
To compare the mean Math score of two
independent groups, we use a two-sample T-test.
Null hypothesis:
Students with free/reduced lunch and students with
standard lunch have the same mean Math score.
Conclusion:
Since the p-value is smaller than 0.05, we reject
the null hypothesis. Therefore, the mean Math
score differs between the free/reduced
lunch group and the standard lunch group.
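The comparison of two group means can be sketched as follows. This uses Welch's unequal-variance form of the test, which is an assumption, since the slide does not say which variant was run. The scores are hypothetical and `welch_t` is an illustrative helper name.

```python
import math
from statistics import mean, variance


def welch_t(a, b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    va = variance(a) / len(a)  # squared standard error of group a
    vb = variance(b) / len(b)  # squared standard error of group b
    t = (mean(a) - mean(b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical math scores for the two lunch groups (NOT the real dataset).
standard = [72, 65, 80, 69, 74, 77, 68, 71]
free_reduced = [58, 63, 55, 67, 60, 62, 59, 64]

t_stat, df = welch_t(standard, free_reduced)
print(f"t = {t_stat:.2f}, df = {df:.1f}")
```

The p-value is then read from a t distribution with `df` degrees of freedom. `scipy.stats.ttest_ind` with `equal_var=False` performs the same test and returns the p-value directly.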
12. Power analysis: T-test
Conclusion: For a Cohen's d effect size of 0.05, a power of 0.80, and a
type I error rate of 0.05, we need a sample size of 6280 in each group.
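The sample size above can be roughly checked with the standard normal-approximation formula, n = 2 * ((z_(1-alpha/2) + z_power) / d)^2. This is a sketch under that approximation, not necessarily the tool used for the slide. A t-based solver gives a slightly larger value, so agreement to within about one observation is the most that can be expected. The function name `n_per_group` is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(effect_size, alpha=0.05, power=0.80):
    """Approximate per-group n for a two-sided, two-sample T-test."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_power = NormalDist().inv_cdf(power)
    return 2 * ((z_alpha + z_power) / effect_size) ** 2


n = n_per_group(0.05)
print(math.ceil(n))  # round up to a whole number of observations
```

Because the required n scales with 1/d^2, halving the effect size quadruples the sample size.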
13. summary
Limitations:
We could test more detailed categories of lunch. A larger data sample would also help.
Conclusions:
The parental level of education may have a large effect on student performance. The two-sample T-test
also shows that Math scores differ between the free/reduced and standard lunch groups.
15. Capstone Project Milestone 3: Hypothesis Testing
Data Sources
1. Data of a Portuguese banking institution
URL: https://data.world/data-society/bank-marketing-data
2. GDP data of the G20 countries
URL: https://stats.oecd.org/index.aspx?queryid=33940#
3. Github Report URL: https://github.com/wyyyyy-627/NYU_Integrated_Marketing
16. Paired T-test
Reason: A paired T-test compares the GDP of the same countries before and after COVID-19,
using data from 2018 to 2020.
Null hypothesis: The countries' mean GDP is the same in 2018 Q2 and 2020 Q2.
Conclusion: Since the p-value is smaller than 0.05, we reject the null hypothesis. Most countries' GDP in
2018 is higher than their GDP in 2020, and COVID-19 has a negative impact on the countries' GDP.
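A minimal sketch of the paired T-test: it is a one-sample test on the per-country differences. The GDP figures below are made up for illustration and are not the OECD data, and `paired_t` is a hypothetical helper name.

```python
import math
from statistics import mean, stdev


def paired_t(before, after):
    """Return (t statistic, degrees of freedom) for paired samples."""
    diffs = [b - a for b, a in zip(before, after)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / math.sqrt(n))
    return t, n - 1


# Hypothetical GDP values for five countries (NOT the OECD data).
gdp_2018_q2 = [100.0, 250.0, 80.0, 400.0, 150.0]
gdp_2020_q2 = [92.0, 231.0, 77.0, 365.0, 141.0]

t_stat, df = paired_t(gdp_2018_q2, gdp_2020_q2)
print(f"t = {t_stat:.2f}, df = {df}")
```

A positive t here means the 2018 values are higher on average. `scipy.stats.ttest_rel` runs the same test and also returns the p-value.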
17. Assumption
Reason:
The plot shows a linear relationship between the
variables and the direction of the correlation.
Therefore, I can use the Pearson correlation
test.
Null hypothesis:
The GDP of the same country in 2018 Q2 and
2020 Q2 is not correlated.
Conclusion:
Since the p-value is smaller than 0.05, we reject
the null hypothesis. The GDP of the same
country in 2018 Q2 and 2020 Q2 is
correlated.
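The Pearson coefficient and its test statistic can be sketched in a few lines. The data are hypothetical, not the OECD figures, and the helper names are illustrative. The statistic t = r * sqrt((n - 2) / (1 - r^2)) is compared against a t distribution with n - 2 degrees of freedom.

```python
import math
from statistics import mean


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_from_r(r, n):
    """Test statistic for H0: no correlation (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Hypothetical GDP values for five countries (NOT the OECD data).
gdp_2018_q2 = [100.0, 250.0, 80.0, 400.0, 150.0]
gdp_2020_q2 = [92.0, 231.0, 77.0, 365.0, 141.0]

r = pearson_r(gdp_2018_q2, gdp_2020_q2)
t_stat = t_from_r(r, len(gdp_2018_q2))
print(f"r = {r:.4f}, t = {t_stat:.1f}")
```

A strong correlation is expected for this kind of data, because a country with a large GDP in one year also has a large GDP two years later.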
18. Two-Sample T-Test
Reason:
To compare the mean balance
of two independent groups,
we use a two-sample
T-test.
Null hypothesis:
The mean balance of people with a loan is the
same as that of people without.
Conclusion:
Since the p-value is smaller than
0.05, we reject the null hypothesis.
Therefore, the mean balance
of people with a loan differs
from that of people
without.
19. Power analysis: T-test
Conclusion: For a Cohen's d effect size of 0.2, a power of 0.80, and a type I error rate of 0.05, we need a
sample size of 393 in each group.
20. summary
Limitations:
We could test the effect of COVID-19 on the GDP of G20 countries more comprehensively, for example by adding
2021 data in the future. A larger data sample would also help.
Conclusions:
COVID-19 has a significant impact on the economies and GDP of G20 countries.
The GDP of the same country in 2018 Q2 and 2020 Q2 is correlated. The balances of people with a loan differ from
those of people without one.
21. Capstone Project Milestone 4: Regression
►Summary of the data sources:
Based on 4250 samples, I analyze whether a customer will change telecommunications
provider, which is known as “churning”.
►The regression model I chose and the result:
The regression model I chose is OLS. The result shows that call length influences the
charge, which may lead customers to reconsider their choice of provider.
►Github Report URL: https://colab.research.google.com/github/wyyyyy-627/NYU_Integrated_Marketing/blob/main
►URL of data sources: https://www.kaggle.com/c/customer-churn-prediction-2020
22. Scatterplots
These two charts show that total
day calls and total night charge have no
linear relationship. However, total day
minutes and total day charge have a linear
relationship.
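A linear relationship like the one between total day minutes and total day charge can be quantified with a simple least-squares fit. This sketch uses made-up call data and a per-minute rate of 0.17 chosen purely for illustration; `fit_line` is a hypothetical helper, not the OLS routine used in the notebook.

```python
from statistics import mean


def fit_line(x, y):
    """Least-squares fit y = intercept + slope * x; returns (slope, intercept, R^2)."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    intercept = my - slope * mx
    ss_res = sum((b - (intercept + slope * a)) ** 2 for a, b in zip(x, y))
    ss_tot = sum((b - my) ** 2 for b in y)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical data: charge is minutes times an assumed rate, rounded to cents.
day_minutes = [120.0, 180.5, 210.0, 95.3, 260.8]
day_charge = [round(0.17 * m, 2) for m in day_minutes]

slope, intercept, r_squared = fit_line(day_minutes, day_charge)
print(f"slope = {slope:.3f}, intercept = {intercept:.3f}, R^2 = {r_squared:.4f}")
```

An R^2 close to 1 with a near-zero intercept suggests the charge is essentially a fixed rate times the minutes, so the fitted slope estimates that rate.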
23. Regression Result
With X1 = total day calls and X2 = total day minutes, the p-values are 0.86 and 0.513. Both p-values are larger
than 0.05, so we do not reject the null hypothesis. There is no evidence of a linear relationship with total night charge.
24. Insights Gained from the Regression
From the only linear relationship found, between total day minutes and total day charge, I know that
call length strongly influences the charge. Therefore, the company could lower its
per-minute rate. It could also offer a wider variety of phone plans, which could decrease the
total day charge and attract more customers.
Although total day calls and total day minutes do not have a linear relationship with total night
charge, we should still pay attention to total night charge. For example, the night rate could be
lower than the day rate.
25. Assumptions Check and Further Research
First, the scatter plot shows that there is
no correlation. (Check Assumptions 2 and 4)
Second, a histogram of the residuals
indicates that they are normally
distributed. (Check Assumptions 1 and 3)
Third, the p-value is 0.961, which is
larger than 0.05, so there is no evidence that
the independent variables are correlated. (Check
Assumption 6) Therefore, all the
assumptions are satisfied.
For further research, we need to find
more relationships that may cause the
sales decline. For example, we can
analyze the relationship between night
calls and night charge.
26. Capstone Project Milestone 5: Clustering
• Data source URL: https://www.kaggle.com/hellbuoy/online-retail-customer-clustering
• The URL of my Kaggle notebook: https://www.kaggle.com/yueyaowang/customer-segementation-yw5244
• Summary of the data source: Online Retail is a transnational data set that contains all
the transactions occurring between 01/12/2010 and 09/12/2011 for a UK-based,
registered non-store online retailer. The company mainly sells unique all-occasion gifts.
Many of its customers are wholesalers.
• The statistical methods I chose and the result: The country I chose is Germany.
K-means cluster analysis and hierarchical clustering are the two methods used to study the
case. Hierarchical clustering: 2 target customer return.
28. Interpreting the Clustering
Based on the RFM rule, we should choose the customer
clusters with a lower recency and a higher frequency and
amount. From the K-means clustering results, we can see
that customers with Cluster_Id=1 best fit the
criteria.
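The selection rule above can be sketched with a small K-means implementation (plain Lloyd's algorithm). Everything here is illustrative: the RFM rows are invented and already scaled to 0-1, the first k points serve as starting centroids, and the scoring rule (frequency plus amount minus recency) is one simple assumption about how to rank clusters, not the method used in the notebook.

```python
import math


def kmeans(points, k, iters=20):
    """Plain Lloyd's algorithm with the first k points as initial centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical scaled (recency, frequency, amount) rows, NOT the retail data.
rfm = [
    (0.90, 0.10, 0.10), (0.10, 0.90, 0.80), (0.80, 0.20, 0.20),
    (0.20, 0.80, 0.90), (0.85, 0.15, 0.10), (0.15, 0.95, 0.85),
]
labels, centroids = kmeans(rfm, k=2)

# Prefer low recency and high frequency and amount (assumed scoring rule).
best = max(range(2), key=lambda c: centroids[c][1] + centroids[c][2] - centroids[c][0])
print(labels, "best cluster:", best)
```

In practice the RFM columns should be standardized before clustering, because K-means relies on Euclidean distance and an unscaled amount column would dominate.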
29. Hierarchical Clustering
Hierarchical clustering builds a tree of clusters using a linkage
method. In complete linkage hierarchical clustering,
the distance between two clusters is defined as the
longest distance between a point in one cluster and a point in the other.
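The complete linkage definition can be written directly as a maximum over all cross-cluster pairs. A minimal sketch with hypothetical two-dimensional points; the function name `complete_linkage` is illustrative.

```python
import math
from itertools import product


def complete_linkage(cluster_a, cluster_b):
    """Largest Euclidean distance between a point in A and a point in B."""
    return max(math.dist(p, q) for p, q in product(cluster_a, cluster_b))


# Hypothetical clusters on a line: the farthest pair is (0, 0) and (5, 0).
a = [(0.0, 0.0), (1.0, 0.0)]
b = [(4.0, 0.0), (5.0, 0.0)]

print(complete_linkage(a, b))  # 5.0
```

At each step, agglomerative clustering merges the two clusters with the smallest such distance, which tends to produce compact clusters of similar diameter.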
30. Hierarchical Clustering Analysis
Based on the RFM criteria, we should choose the customer
clusters with a lower recency and a higher frequency and amount.
From the K-means clustering results, we can see
that customers with Cluster_Id=1 best fit the criteria.