Scooter (Monopattino)
Sharing Service
Analysis of responses to a questionnaire for
marketing purposes, using machine learning.
Emre Danışan
Giovanni Roja
Mehmet Berk Souksu
Aslı Senel
Agenda
1 Introduction: Context of the Project
2 Data Preparation: Understanding Data, Cleaning Data, Factor Analysis
3 Segmentation and Profiling: Cluster Analysis, Discriminant Analysis, Descriptive Analysis
4 Regression Analysis: Logistic Regression, Decision Tree
5 Recommendation of Pricing
6 Conclusion
7 Appendix
Launching a scooter (monopattino) sharing service in Milan
Target population: university students in Milan
The objectives are as follows:
● Understand the potential customers
● Segment the potential customers and uncover insights about them
● Evaluate the customers' willingness to pay and recommend a pricing and billing method
1
Context of the Project
The survey consists of 25 questions, some of which include sub-questions in the form of Likert-scale ratings.
There are 353 data rows, of which 272 were completed. The data contained incomplete rows and some mistakes; we explain and demonstrate below how we handled them.
While analyzing the data, we found 4 main sections, dedicated to public transportation, car, shared mobility and scooter sharing.
The two sections for public transportation users (PT) and car owners (C) form two parallel paths through the survey. Both paths merge in the shared mobility section (SM) and continue to the scooter (SC) and personal (P) questions.
Understanding the Data
2
First, we separated the variables into two dataframes, one for each path. After doing so, we removed question 2, which separates the paths.
A “no” answer to the first question makes respondents exit the survey directly, so we removed that question's column and the observations that exited directly.
Since question 20 is an A/B test, we needed to separate our observations depending on the question viewed.
To avoid “Not Available” rows, we created sub-datasets for each part of the questionnaire:
Public_Trans : Public Transportation Users
Car : Car owners
Shared : Shared Mobility
Scooter : Scooter and Personal questions
Public_Shared : Public Transportation & Shared Mobility
Cleaning the Data - Questionnaire
2
We identified 9 metadata columns that are irrelevant to our data analysis, so the following columns were removed from the original data:
Respondent ID | Collector ID | Start Date | End Date | IP Address | Email Address | First Name | Last Name | Custom Data 1
Since the first two rows were both used for header information, we removed one of them and put the necessary names on the second row.
We added 11 to the age values so that they show actual ages.
The 14th question had no values filled in, so we wrote a for loop that corrects the problem by looking at the answers to the 15th question.
We also changed the answers of `Question Viewed` to 0 and 1 to simplify the data and convert all of it to numerical values.
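The cleaning steps above could be sketched roughly as follows. This is only an illustration in Python, not the code used for the project: the column names `age`, `q14`, `q15` and `question_viewed`, the sample rows, the rule used to fill question 14 from question 15, and the 0/1 coding of the two variants are all assumptions.

```python
# The nine metadata columns listed in the text.
META_COLUMNS = {
    "Respondent ID", "Collector ID", "Start Date", "End Date", "IP Address",
    "Email Address", "First Name", "Last Name", "Custom Data 1",
}

def clean_row(row):
    """Return a cleaned copy of one survey row (a dict of column -> value)."""
    # Drop metadata columns that are irrelevant to the analysis.
    cleaned = {k: v for k, v in row.items() if k not in META_COLUMNS}
    # Shift the coded age by 11 so that it reads as an actual age.
    cleaned["age"] = cleaned["age"] + 11
    # Assumption: question 14 is set to 1 when at least one question 15
    # sub-answer was given, and to 0 otherwise.
    cleaned["q14"] = 1 if any(a is not None for a in cleaned["q15"]) else 0
    # Assumption: variant "A" is coded as 0 and variant "B" as 1.
    cleaned["question_viewed"] = {"A": 0, "B": 1}[cleaned["question_viewed"]]
    return cleaned

# Hypothetical sample rows, invented for this example.
rows = [
    {"Respondent ID": 101, "age": 10, "q14": None,
     "q15": [3, None, 1], "question_viewed": "A"},
    {"Respondent ID": 102, "age": 13, "q14": None,
     "q15": [None, None, None], "question_viewed": "B"},
]

cleaned_rows = []
for row in rows:
    cleaned_rows.append(clean_row(row))
```

Each row is copied rather than modified in place, so the raw export stays available for checking the cleaning rules.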
Cleaning the Data - Excel Sheet
2
For our factor analysis, we factored together the variables that belong to the same segment and the same question. This way they share the same context, and the correlations between variables are easier to interpret.
To select the appropriate number of factors, we checked the eigenvalues, as shown in the graph. We gave priority to factors with eigenvalues above 1; however, depending on the elbow point and on which combinations of variables made sense, we also kept some with eigenvalues above 0.90.
Factor analysis was performed on the data related to: Public Transportation (PT), Shared Mobility (SM), Car (C), Personal (P)
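The retention rule described above (keep eigenvalues above 1, and consider those between 0.90 and 1 case by case) can be sketched as below. The function name and the eigenvalues are hypothetical; the real values come from the factor analysis output, and the elbow-point judgment is not automated here.

```python
def select_factors(eigenvalues, strict=1.0, relaxed=0.90):
    """Split factor numbers into clearly retained and borderline ones.

    Factors with an eigenvalue above `strict` are kept outright; those
    between `relaxed` and `strict` are only flagged for manual review.
    """
    kept = [i for i, ev in enumerate(eigenvalues, start=1) if ev > strict]
    borderline = [
        i for i, ev in enumerate(eigenvalues, start=1)
        if relaxed < ev <= strict
    ]
    return kept, borderline

# Made-up eigenvalues for one question block, sorted in descending order.
eigenvalues = [3.1, 1.4, 1.05, 0.93, 0.62, 0.40]
kept, borderline = select_factors(eigenvalues)
```

With these made-up values the first three factors are kept and the fourth is flagged as borderline.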
Factor Analysis
2
Example of Factor: Public Transportation
2
Factor Analysis
Other factor separations can be seen in the [Appendix].
3
Segmentation and Profiling
The main objectives of this section are as follows:
Homogeneity within clusters
Heterogeneity between clusters
Meaningful cluster sizes
Evaluation of the meaningfulness of differences and sizes
As a first step, a K-Means cluster analysis was performed.
It used the factors from the factor analysis, which are in Likert-scale form.
Even though Likert-scale variables are ordinal by nature,
for this marketing research they are treated as interval data.
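For readers unfamiliar with K-Means, a minimal pure-Python sketch of the algorithm (Lloyd's iteration) is given here. It is not the implementation used in the project: the starting centroids are simply the first k points, which is a simplification, and the factor scores are invented.

```python
def kmeans(points, k, iterations=100):
    """Cluster points with Lloyd's algorithm; returns (labels, centroids).

    Simplification: the first k points serve as starting centroids.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        new_labels = [
            min(
                range(k),
                key=lambda c: sum(
                    (a - b) ** 2 for a, b in zip(p, centroids[c])
                ),
            )
            for p in points
        ]
        if new_labels == labels:  # assignments are stable: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids

# Invented scores on two factors for six respondents.
scores = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (8.0, 8.0), (9.0, 8.0), (8.0, 9.0)]
labels, centroids = kmeans(scores, k=2)
```

Treating the Likert-based factor scores as interval data, as stated above, is what makes the mean in the update step meaningful.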
First, combinations of 2 to 5 factors were analyzed for 2 to 6 clusters each. In the decision process, the scatter plot of each factor combination was first interpreted for the respective number of clusters. To see how the clusters differ, boxplots of the combinations were evaluated. To understand whether the differences are meaningful, t-tests were examined and group means were compared. Finally, the size of each cluster was checked to see whether it is large enough to be a target segment.
18 combinations of 2 factors were evaluated. The Coolness(SM) and Convenience(SM) factors were selected since they gave the best result in this process. We also evaluated them with Tukey HSD, Bartlett, Levene and one-way tests to understand whether the clusters are normally distributed and consistent in variance. After interpreting the 2-factor combinations, combinations of 3 and 4 factors were also evaluated. At the end of the process the following 4 factors were selected, since they gave the best results in the tests. The tests applied to the data and their results are as follows:
3
K-Means Clustering Analysis
Factor Name | Type of Tests: Anova Test | Tukey HSD | Bartlett | Levene | Oneway Test | Result
Touristic(PT) | H0 is accepted | H0 is accepted | H0 is rejected | H0 is accepted | H0 is accepted | Bad
Coolness(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good
Convenience(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good
Environmental(SC) | H0 is rejected | H0 is rejected | H0 is accepted | H0 is accepted | H0 is rejected | Good
After evaluating the trials in terms of tests, sizes and meaningfulness, we decided to
use the factors Touristic(PT), Coolness(SM), Convenience(SM) and Environmental(SC)
for 3 clusters of sizes 97, 90 and 58. The result of the analysis for the selected factors can
be seen in the graph on the right-hand side:
3
K-Means Cluster Analysis
From the K-Means cluster analysis we learned that cluster 1 has very high interest in the Touristic(PT) and Coolness(SM) factors, cluster 2 has very high interest in Convenience(SM), and cluster 3 has very low interest in Environmental(SC). All clusters have high interest in the accessibility and infrastructure of public transportation and in social networks (Facebook and Instagram); at the same time, all of them have low interest in the type of sharing of shared mobility. In addition, cluster 2 has high interest in the convenience of shared mobility, where cluster 3 has low interest, and in traditional media. [Appendix]
Based on the interests of each cluster, we named the segments as follows:
Cluster 1: Social Hommies
Cluster 2: Independents
Cluster 3: Traditionals
Even though this method does not give a clear idea of the number of clusters, we
prefer to cut above a height of 40, since the distances between the lines are clearer there.
3
Other Cluster Analysis
As a second step, a hierarchical cluster analysis was performed to see whether the ordinal variables and the categorical variables on their own also affect the cluster differentiation.
The graph on the left shows that cutting above a height of 40 gives 3 meaningful clusters, whereas cutting below a height of 40 gives 4.
This also double-checked the result of the K-Means analysis: the hierarchical analysis likewise gave 3 clusters.
We also performed a latent class cluster analysis with the same factors used in the K-Means analysis, to compare the results and check the consistency of our clusters.
Since some of the variables are ordinal, we could not use them in the K-Means method. We also decided to use latent class analysis to see how variables such as age and gender affect the clusters.
However, as in the previous analysis, adding covariates made the analysis more complex and difficult to interpret. Therefore, the latent class analysis did not give us significant results.
After dividing the respondents into clusters, it is fundamental to analyze the clusters in order to evaluate their characteristics and how they differ. This leads to a better understanding of the market segments that will be approached.
This section discusses the relations between the variables and further analyzes the characteristics of the previously created clusters. Each type of variable requires a different approach, so the analysis is divided into:
● Categorical
● Ordinal
● Interval
3
Descriptive Analysis
First of all, we analyzed the overall satisfaction of the survey respondents with public transportation, in order to exploit its weaknesses and strengths.
Public Transportation
The conclusion of this table is that customers are:
1. Unhappy with Punctuality and Reliability
2. Satisfied with Cost and Availability
3
Descriptive Analysis - Gender
Gender: Our 3 clusters were broken down by gender to see whether there was any major difference between them. As the gender analysis in the figures below shows, the clusters are well balanced in terms of males and females.
Categorical Data
The categorical variables in the provided data were gender, age, living situation and living location. A graphical approach was used to look at the characteristics of the clusters.
Age: Secondly, the age variable is analyzed by cluster, keeping in mind that the age range ran from 16 and below to 30 and above. None of the clusters contained people aged 19 and below, which is why these ages do not appear in the following graphs; there is also no need to target these younger people in the marketing campaign.
3
Descriptive Analysis - Age
Age Composition by Clusters
3
Descriptive Analysis – Living Location
Living Location: The survey also recorded where the individuals in each cluster live, which makes it possible to analyze this variable as well and to see which areas could be explored in the marketing campaign.
Living Situation: Finally, the last categorical variable analyzed for the clusters is their living situation.
A/B response analysis: The clusters also responded differently when asked about the payment methods.
Question A was about paying a fee to use the scooter, and question B was about paying a subscription and then a reduced fee. The result was that Cluster 1 is more willing to pay when asked question A than question B. The same holds for Cluster 2, whereas Cluster 3 is not willing to pay for the scooter at all.
3
Descriptive Analysis - A/B Test
It is important not only to see the characteristics of the clusters but also to analyze the relations among the variables. Once again, the approach depends on the type of data, so three different analyses are performed, namely:
○ Contingency table
○ Multi-group boxplot
○ Scatter plot
3
Descriptive Analysis for Combinations
Where and how people live: The first analysis across variables checked where people live in relation to how they live. This information provides better insight into the living situation of the survey population.
From this analysis we can see that most of the population lives inside the city of Milan and shares a residence with flatmates. This result is expected, as the survey targeted university students. Also important are the individuals who live in another province with their families: even though they study at a university in Milan, a high percentage of respondents still live with their family.
Contingency Tables: Categorical – Categorical Variables
From these two tables we can see that most people have already used shared mobility, but surprisingly a large number have never used it. For instance, 1/3 of the respondents who live inside the city of Milan have never used shared mobility, and the share is even higher outside the Province of Milan, where 72% have never used it.
The second table also shows that most of the people who use shared mobility share their residence with friends.
Which people have already used shared mobility: Secondly, we wanted to know which people have already used shared mobility. This analysis is important for knowing whether the marketing campaign needs to create awareness of this type of transportation.
Where people are satisfied with public transportation: The first analysis looked at the relation between people's satisfaction with public transportation and where they live. This analysis used the previously created factor that measures respondents' satisfaction with public transportation.
From the figure it can be seen that people who live inside the city of Milan are more satisfied with public transportation than people from anywhere else.
3
Descriptive Analysis for Combinations
Multi-Group Box Plot: Interval - Categorical Variables
Since most of the variables were interval, our group chose the ones that had important relations with each other and would help most in understanding the survey.
In the second figure we can see that people from inside the city of Milan use public transportation much more than people from outside the city.
Finally, in the last figure we can see that people from Milan and the Province of Milan use shared mobility far more frequently than people from outside the Province, meaning that this could be the area to approach in the marketing campaign.
Scatter Plot: Interval - Interval
Last but not least, scatter plots were used to evaluate how interval variables are related. Using the same approach, the most important variables were evaluated, and a correlation test was performed to see whether these variables are actually correlated.
For example, for the factors Accessibility of Public Transportation and Consciousness of the Scooter Usage, the correlation is statistically significant, as the p-value is lower than the threshold of 0.05, although with a coefficient of 0.1289 it is weak.
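As an illustration of what such a correlation test computes, the sketch below derives the Pearson coefficient r and the t statistic t = r * sqrt((n - 2) / (1 - r^2)) that underlies the usual p-value. The data are invented, the function names are hypothetical, and the p-value lookup itself (a t distribution with n - 2 degrees of freedom) is left out.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for testing H0: no linear correlation (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Invented factor scores for five respondents.
accessibility = [1, 2, 3, 4, 5]
consciousness = [2, 1, 4, 3, 5]
r = pearson_r(accessibility, consciousness)
t = t_statistic(r, len(accessibility))
```

The t statistic grows with the sample size, which is why a small coefficient can still be statistically significant when there are enough respondents.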
3
Descriptive Analysis for Combinations
Gender: Since the p-value is greater than 0.05, H0 is accepted, which means that the gender composition is the same across the groups.
3
Discriminant Analysis
With the discriminant analysis we verify how different the clusters are.
Age: The same happens for age; the chi-square analysis shows that H0 is accepted as well, so the clusters do not differ in terms of age.
Living Location: Unlike the previous variables, living location has a p-value lower than the threshold, which means that the clusters differ in this characteristic.
Chi-square test: Used to compare categorical variables among clusters.
The chi-square analysis also shows whether the clusters follow the expected counts.
For instance, cluster 1 has as many people living in the city of Milan as expected. On the other hand, it has more individuals from the province of Milan than expected, and fewer from outside the province. Cluster 2 has slightly more people from the city than expected, but fewer from the province. Lastly, cluster 3 closely follows the expected counts.
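The "expected" counts mentioned above come from the chi-square test of independence: for each cell, expected = row total × column total / grand total. A minimal sketch with an invented cluster-by-location table (the real counts are not given in the slides):

```python
def chi_square(observed):
    """Return (expected counts, chi-square statistic, degrees of freedom)."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence for every cell.
    expected = [
        [rt * ct / total for ct in col_totals] for rt in row_totals
    ]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, statistic, dof

# Invented counts: rows are clusters 1 to 3, columns are
# city of Milan, province of Milan, outside the province.
observed = [
    [40, 30, 20],
    [50, 20, 20],
    [30, 15, 15],
]
expected, statistic, dof = chi_square(observed)
```

Comparing each observed cell with its expected value shows which cluster is over- or under-represented in a location, which is how the statements above can be read.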
3
Discriminant Analysis
Living Situation: The last variable, which also differs between the clusters, is the living situation.
Cluster 1: More people who live with parents than expected
Cluster 2: More people who live with friends and fewer who live with parents than expected
Cluster 3: Size as expected
After making the necessary adjustments to gain first insights about the clusters, we moved on to make sense of the data through regression analysis. The main purpose was to see which factors had the greatest effect on customers' willingness to pay (WTP). It was difficult to explain the response with the variables, because during the cluster analysis we saw that no matter how we combined the factors, they would not create distinct clusters. However, the main purpose of this analysis is to understand the most prominent factors in different pricing strategies rather than to predict the future.
In order to understand customers' preferred billing method, we divided the respondents into two separate datasets according to the A/B test and applied a separate regression analysis to each.
First, we used linear regression to evaluate the customers' WTP; however, our selected independent variables could not explain the dependent variable, the A/B test response, very well. [Appendix]
Since the linear regression results were relatively difficult to analyze, we decided to convert the A/B test responses, which are Likert-scale and therefore treated as interval variables, into binary values for use in logistic regression.
In this way, our independent variables directly explain the payment preferences of customers from the different segments. To do so, we assigned 1 to respondents who chose one of the two highest options in the A/B test and 0 to the others.
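The binarization step can be sketched as follows, assuming the 1-to-5 Likert coding mentioned in the willingness-to-pay evaluation, so that the two highest options are 4 and 5. The function name and the sample answers are hypothetical.

```python
def to_binary(likert_answer, top_choices=(4, 5)):
    """Map a 1-5 Likert answer to 1 (top two options) or 0 (otherwise)."""
    return 1 if likert_answer in top_choices else 0

# Invented A/B test answers for five respondents.
answers = [1, 2, 3, 4, 5]
willing = [to_binary(a) for a in answers]
```

The resulting 0/1 column is the kind of dependent variable a logistic regression expects.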
Regression Analysis
4
When deciding which regression model to use, we also took advantage of the heatmap function in R to better understand the correlations between the independent variables. The correlation output did not directly improve the regression result, but it gave us a broader understanding of our data.
Regression Analysis
4
We can see that some variables affect each other, as we expected, but removing them from the regression did not improve it much. That is why we left all of them in the model and interpreted it in a broader sense.
Logistic Regression Analysis – A Test
4
Coolness(SC)/Consciousness(SC)
The coolness factor explains willingness to pay well, as we expected. The consciousness perception of the product reflects the same insight as the environmental factors.
Time/# connections
Time spent in traffic increases the customer's willingness to pay, but the number of connections decreases it, because it means the distance is too long for a scooter.
Infrastructure(PT)
Well-developed public transportation reflects the welfare of its users. With these transportation habits, scooter usage will increase.
Satisfaction(PT)
An increase in satisfaction with public transportation negatively affects the customer's willingness to pay, because it becomes harder to use a scooter instead of public transportation.
Environmental(SC)/Environmental(PT)
Environmental variables play an important role in the customer's willingness to pay, because they are the main driver for using a scooter.
Logistic Regression Analysis – B Test
4
Coolness(SM)/Coolness(SC)/Consciousness(SC)
The coolness factor explains willingness to pay well, as we expected. The consciousness perception of the product reflects the same insight as the environmental factors.
Accessibility(PT)
Availability of public transportation limits the shift to scooters.
Touristic(PT)
An increase in touristic use of public transportation also positively affects scooter usage, which is logical given the objective of the customer.
Infrastructure(PT)
Well-developed public transportation reflects the welfare of its users. With these transportation habits, scooter usage will increase.
How often do you use public transportation?
An increase in the frequency of public transportation use decreases the intention to use a scooter.
Usage(S)
The habit of using shared mobility directly affects the customer's willingness to pay, as one would expect.
Environmental(PT)/Environmental(SC)
Environmental variables play an important role in the customer's willingness to pay, because they are the main driver for using a scooter.
Decision Tree - A/B Test
4
We also decided to use additional tools to understand the data, since regression gives a limited perspective on willingness to pay.
The simplest but most useful one is the decision tree classifier. It gives good insight into the variables and explains the part of the data that cannot be expressed by regression.
Some of the interactions were already seen in the regression, such as Consciousness(SC), Coolness(SC) and Satisfaction(PT), which explain the left part of the tree. The right part of the tree explains the willingness to pay of different age segments and the Accessibility(PT) of public transportation.
The analysis of Test B gave us a different insight about our customers that we could not obtain from the regression analysis. The media channels in Test B, namely online newspaper/magazine, on-demand radio, Twitter, YouTube and Snapchat, are more meaningful for targeting the people in this segment. We also saw the effect of gender in one of the leaves, with a small distinction that determines the difference between genders.
Test A Test B
5
Evaluation of the Willingness to Pay
In the analysis of the A/B test responses by cluster, we tried to assess the willingness to pay of the customers in each cluster. To do that, we summed up the A/B test values (Likert scale from 1 to 5) for each cluster that answered the A/B question and analyzed them as percentages. As the graphs on the right-hand side show, cluster 2 is superior to all other clusters in the A/B test. To understand the behaviour of the A/B test, we also added the cluster averages, because the number of responses differs between A and B (there are 130 responses to question A and 113 to question B). Since the average of cluster 2 is higher for A than for B, we can say that customers are more willing to pay under method A, and that cluster 2 is superior to the other clusters.
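Averaging per cluster and per variant, rather than comparing raw sums, removes the effect of the unequal number of responses to A and B. A small sketch of that aggregation, with invented responses and a hypothetical function name:

```python
from collections import defaultdict

def mean_response(responses):
    """Average Likert score for every (cluster, variant) pair."""
    sums = defaultdict(lambda: [0, 0])  # key -> [score total, count]
    for cluster, variant, score in responses:
        sums[(cluster, variant)][0] += score
        sums[(cluster, variant)][1] += 1
    return {key: total / count for key, (total, count) in sums.items()}

# Invented (cluster, variant, Likert score) triples.
responses = [
    (1, "A", 4), (1, "A", 2), (1, "B", 3),
    (2, "A", 5), (2, "A", 4), (2, "B", 3), (2, "B", 2),
]
averages = mean_response(responses)
```

In this made-up data cluster 2 averages higher for A than for B, which is the pattern the comparison above looks for.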
5
Recommendation of Pricing Method
Interpretation
As the previous analysis shows, cluster 2 is the most profitable of the clusters, and pricing method A is superior to method B. Cluster 2 can be named “Independents” because its members mostly live with flatmates or alone. They care most about Accessibility, Infrastructure and Convenience, and they are the majority of the respondents. They also have the highest percentage of respondents living in the Comune of Milano. There is no significant difference between the genders, so our persona will be genderless.
Recommendation
For all clusters, the average response is higher for A than for B. Pricing method A is therefore more attractive to people overall, which means that they do not want a monthly subscription. This shows us that respondents tend to use the scooter for short periods, so they are willing to pay more per minute rather than pay a fixed monthly subscription and a lower amount per minute. As a result, we recommend offering the pricing and billing method in which the service costs €0,15 per minute of use.
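The two price plans from question 20 can be compared with simple arithmetic: pay-per-use is cheaper as long as monthly usage stays below the break-even point. The sketch below works in euro cents to avoid rounding issues; the prices are the ones stated in the questionnaire, while the function and constant names are made up for this example.

```python
PAY_PER_USE_CENTS = 15       # method A: EUR 0,15 per minute
SUBSCRIPTION_CENTS = 800     # method B: EUR 8 monthly subscription
SUBSCRIBER_RATE_CENTS = 5    # method B: EUR 0,05 per minute

def monthly_cost_cents(minutes, method):
    """Monthly cost in cents for a given number of minutes of use."""
    if method == "A":
        return PAY_PER_USE_CENTS * minutes
    return SUBSCRIPTION_CENTS + SUBSCRIBER_RATE_CENTS * minutes

# Minutes per month at which both methods cost the same:
# 15 * m = 800 + 5 * m  ->  m = 800 / (15 - 5)
break_even_minutes = SUBSCRIPTION_CENTS / (
    PAY_PER_USE_CENTS - SUBSCRIBER_RATE_CENTS
)
```

Below 80 minutes of use per month method A is cheaper for the customer, which is consistent with the reading that respondents expect to use the scooter for short periods.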
6
Conclusion
Data Interpretation
X number of datasets were created so that the methods in the following steps could be analyzed easily.
Factor Analysis
22 factors were selected from 80 observations.
Discriminant, Descriptive and Cluster Analysis
Segmentation and Profiling: From the K-Means cluster analysis, 4 factors for 3 clusters were selected. The segments were named Social Hommies, Independents and Traditionals based on the results of the discriminant and descriptive analyses.
Regression Analysis
Evaluation of A/B Test: For Test A, the Consciousness(SC), Coolness(SC) and Satisfaction(PT) factors explain the dependent variable best. For Test B, the Social Media(P) factor has a significant impact.
Evaluation of WTP and Pricing
Recommendation of Pricing: Independents are selected as the target segment, since they have the highest willingness to pay for the first payment method, in which the service costs €0,15 per minute of use.
7
Appendix
Question # | Variable Type | Question | Factors
1 | Categorical (Nominal) | Do you travel at least once a week in the urban area of Milan? |
2 | Categorical | When you travel in the urban area of Milan, what kind of transportation do you use most often? |
3 | Likert Scale (Ordinal) | What are the reasons that you most often choose public transportation? | Factors1: Environmental(PT), Accessibility(PT), Touristic(PT), Infrastructure(PT), Traffic(PT)
4 | Ordinal | What kind of ticket for public transportation do you use most often? |
5 | Ordinal | How often do you use public transportation? |
6 | Ordinal | For most of these trips, how many connections do you need to make (change metro lines, change bus lines, etc.)? |
7 | Likert Scale | On average, how satisfied are you with the public transportation? | Factors2: Satisfaction(PT)
8 | Ordinal | For most of these trips, how much time would you spend with public transportation? |
9 | Likert Scale | What are the reasons that you most often use your own car? | Factors3: Time and Comfort(C), Cool to Drive(C), Unavailability(C)
10 | Ordinal | How often do you travel with your own car in the urban area of Milan? |
11 | Ordinal | For most of these trips, how easy could you find parking? |
12 | Ordinal | For most of these trips, how much time would you spend with your own car? |
13 | Likert Scale | On average, how satisfied are you with travelling with your car? | Factors4: Satisfaction(C)
14 | Categorical | Have you ever used shared mobility in the urban area of Milan? |
15 | Likert Scale | How often do you use the following types of shared mobility? | Factors5: Type of Sharing
16 | Likert Scale | How well do the following scenarios describe your usage of shared mobility? | Factors6: Usage(S)
17 | Likert Scale | What are the reasons that motivate you to use shared mobility? | Factors7: Coolness(SM), Convenience(SM)
18 | Likert Scale | On average, how satisfied are you with shared mobility that you have used? | Factors8: Satisfaction(SM)
19 | Likert Scale | What is your opinion about using a scooter sharing? | Factors9: Coolness(SC), Consciousness(SC), Environmental(SC)
20 (A/B Test) | Likert Scale | 50.0%: Consider that the service costs €0,15 per minute of use. How likely are you going to try the service? / Consider that the service requires a monthly subscription of €8, and costs €0,05 per minute of use. How likely are you going to try the service? |
21 | Categorical | Your gender is: |
22 | Ordinal | Your age is? |
23 | Categorical | Where do you live? |
24 | Categorical | What is your living situation? |
25 | Likert Scale | How often do you use the following media channels? | Factors10: Traditional(P), News(P), Informative Network(P), Social Media(P)
4 factor Analysis | Social Hommies | Independents | Traditionals
Environmental(PT) | high | medium | medium
Accessibility(PT) | high | high | high
Touristic(PT) | high | low | medium
Infrastructure(PT) | high | high | high
Traffic(PT) | medium | medium | medium
Satisfaction(PT) | high | medium | medium
Type of sharing(SM) | low | low | low
Usage(SM) | medium | medium | low
Coolness(SM) | high | low | medium
Convenience(SM) | medium | high | low
Satisfaction(SM) | medium | medium | medium
Coolness(SC) | medium | medium | medium
Consciousness(SC) | high | low | low
Environmental(SC) | medium | medium | low
Traditional(P) | medium | high | medium
News(P) | medium | medium | medium
Informative(P) | medium | medium | medium
Social Network(P) | high | high | high
7
Apendix
7
Apendix
7
Apendix
7
Apendix
7
Apendix

More Related Content

What's hot

Integrating Fuzzy Dematel and SMAA-2 for Maintenance Expenses
Integrating Fuzzy Dematel and SMAA-2 for Maintenance ExpensesIntegrating Fuzzy Dematel and SMAA-2 for Maintenance Expenses
Integrating Fuzzy Dematel and SMAA-2 for Maintenance Expensesinventionjournals
 
Final generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugcFinal generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugcId'rees Waris
 
Generalized Linear Models
Generalized Linear ModelsGeneralized Linear Models
Generalized Linear ModelsAvinash Chamwad
 
Mba2216 week 11 data analysis part 03 appendix
Mba2216 week 11 data analysis part 03 appendixMba2216 week 11 data analysis part 03 appendix
Mba2216 week 11 data analysis part 03 appendixStephen Ong
 
Comparative and Non-Comparative Scaling Techniques
Comparative and Non-Comparative Scaling TechniquesComparative and Non-Comparative Scaling Techniques
Comparative and Non-Comparative Scaling TechniquesVarsha Prakash
 
Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...
Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...
Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...IJCSIS Research Publications
 
Cluster Analysis Assignment 2013-2014(2)
Cluster Analysis Assignment 2013-2014(2)Cluster Analysis Assignment 2013-2014(2)
Cluster Analysis Assignment 2013-2014(2)TIEZHENG YUAN
 

What's hot (13)

Integrating Fuzzy Dematel and SMAA-2 for Maintenance Expenses
Integrating Fuzzy Dematel and SMAA-2 for Maintenance ExpensesIntegrating Fuzzy Dematel and SMAA-2 for Maintenance Expenses
Integrating Fuzzy Dematel and SMAA-2 for Maintenance Expenses
 
Final generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugcFinal generalized linear modeling by idrees waris iugc
Final generalized linear modeling by idrees waris iugc
 
Generalized Linear Models
Generalized Linear ModelsGeneralized Linear Models
Generalized Linear Models
 
Mba2216 week 11 data analysis part 03 appendix
Mba2216 week 11 data analysis part 03 appendixMba2216 week 11 data analysis part 03 appendix
Mba2216 week 11 data analysis part 03 appendix
 
Malhotra20
Malhotra20Malhotra20
Malhotra20
 
Comparative and Non-Comparative Scaling Techniques
Comparative and Non-Comparative Scaling TechniquesComparative and Non-Comparative Scaling Techniques
Comparative and Non-Comparative Scaling Techniques
 
Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...
Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...
Prediction of Changes That May Occur in the Neutral Cases in Conflict Theory ...
 
S34119122
S34119122S34119122
S34119122
 
Mr4 ms10
Mr4 ms10Mr4 ms10
Mr4 ms10
 
Chap019
Chap019Chap019
Chap019
 
Cluster Analysis Assignment 2013-2014(2)
Cluster Analysis Assignment 2013-2014(2)Cluster Analysis Assignment 2013-2014(2)
Cluster Analysis Assignment 2013-2014(2)
 
factor analysis
factor analysisfactor analysis
factor analysis
 
Approaches to the_analysis_of_survey_data
Approaches to the_analysis_of_survey_dataApproaches to the_analysis_of_survey_data
Approaches to the_analysis_of_survey_data
 


Marketing Analysis of Scooter (Monopattino) Sharing in Milan

  • 1. Scooter (Monopattino) Sharing Service: Analysis of responses to the questionnaire for marketing purposes, using machine learning. Emre Danışan, Giovanni Roja, Mehmet Berk Souksu, Aslı Senel
  • 2. Agenda. 1 Introduction: Context of the Project. 2 Data Preparation: Understanding Data, Cleaning Data, Factor Analysis. 3 Segmentation and Profiling: Cluster Analysis, Discriminant Analysis, Descriptive Analysis. 4 Regression Analysis: Logistic Regression, Decision Tree. 5 Recommendation of Pricing. 6 Conclusion. 7 Appendix.
  • 3. Context of the Project (section 1). Launching a scooter (monopattino) sharing service in Milan. Target population: university students in Milan. The objectives are: (1) understand the potential customers; (2) segment the potential customers and uncover insights about them; (3) evaluate customers' willingness to pay and recommend a pricing and billing method.
  • 4. Understanding the Data (section 2). The survey consists of 25 questions, some of which include sub-questions rated on a Likert scale. There are 353 data rows, of which 272 were completed. The data contained incomplete rows and some mistakes; we explain and demonstrate how we handled them. While analyzing the data, we found four main sections, dedicated to public transportation, car, shared mobility and scooter sharing. The public transportation users (PT) and car owners (C) sections form two parallel paths through the survey questions. Both paths merge in the shared mobility section (SM) and continue to the scooter (SC) and personal (P) questions.
  • 5. Cleaning the Data - Questionnaire (section 2). First we separated the variables into two dataframes, one per path. We then removed question 2, which only separates the paths. Answering "no" to the first question makes the respondent exit directly, so we removed that question's column and the observations that exited there. Because question 20 is an A/B test, we separated our observations according to which version of the question was viewed. To avoid rows with "Not Available" values, we created a sub-datasheet for each part of the questionnaire: Public_Trans: public transportation users; Car: car owners; Shared: shared mobility; Scooter: scooter and personal questions; Public_Shared: public transportation and shared mobility.
  • 6. Cleaning the Data - Excel Sheet (section 2). We identified 9 metadata columns that are irrelevant to our data analysis and removed them from the original data: Respondent ID | Collector ID | Start Date | End Date | IP Address | Email Address | First Name | Last Name | Custom Data 1. Since the first two rows were both used for header information, we removed one of them and put the necessary names on the second row. We added 11 to the age values so that they show real ages. The 14th question had no values filled in, so we wrote a for loop that corrects it based on the answers to the 15th question. We also recoded the answers of `Question Viewed` as 0 and 1, to simplify the data and make all of it numerical.
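The cleaning steps on this slide can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the metadata column names come from the slide, but the sample row, the `Age` column name, the raw age code and the "A"/"B" labels for `Question Viewed` are hypothetical, and the fix for the 14th question is not shown because the slide does not give its rule.

```python
# Metadata columns named on the slide; everything else in this sketch is hypothetical.
METADATA_COLUMNS = {
    "Respondent ID", "Collector ID", "Start Date", "End Date", "IP Address",
    "Email Address", "First Name", "Last Name", "Custom Data 1",
}

def clean_row(row, viewed_codes):
    """Drop metadata, shift the age code by 11, recode `Question Viewed` to 0/1."""
    cleaned = {key: value for key, value in row.items() if key not in METADATA_COLUMNS}
    cleaned["Age"] = int(cleaned["Age"]) + 11  # the slide adds 11 to show real ages
    cleaned["Question Viewed"] = viewed_codes[cleaned["Question Viewed"]]
    return cleaned

# Invented example row: a raw age code of 10 becomes 21 after the shift.
raw_row = {name: "" for name in METADATA_COLUMNS}
raw_row.update({"Age": "10", "Question Viewed": "A"})

cleaned_row = clean_row(raw_row, viewed_codes={"A": 0, "B": 1})
print(cleaned_row)
```

Which version of the question maps to 0 and which to 1 is an arbitrary choice here; the slide only says the answers were converted to 0 and 1.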
  • 7. Factor Analysis (section 2). For our factor analysis, we factored together the variables that belong to the same segment and the same question, so that they share a context and the correlations between variables are easier to interpret. To select the appropriate number of factors, we checked the eigenvalues, as shown in the graph. We gave priority to factors with eigenvalues above 1; however, depending on the elbow point and on whether the combination of variables made sense, we also kept some with eigenvalues above 0.90. Factor analysis was performed on the data related to: Public Transportation (PT), Shared Mobility (SM), Car (C), Personal (P).
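The eigenvalue rule described on this slide can be illustrated as follows. This is a minimal sketch, not the authors' code: the eigenvalues are invented, `choose_factor_count` is a hypothetical helper, and the elbow-point and interpretability judgment that the slide mentions still has to be made by hand.

```python
def choose_factor_count(eigenvalues, strict=1.0, relaxed=0.90):
    """Count factors above the strict cutoff and borderline ones above the relaxed cutoff."""
    ordered = sorted(eigenvalues, reverse=True)
    clearly_kept = sum(1 for value in ordered if value > strict)
    borderline = sum(1 for value in ordered if relaxed < value <= strict)
    return clearly_kept, borderline

# Invented eigenvalues for one block of questions.
example_eigenvalues = [2.4, 1.3, 0.95, 0.6, 0.4]
clearly_kept, borderline = choose_factor_count(example_eigenvalues)
print(f"keep {clearly_kept} factors; {borderline} borderline factor(s) to review")
```

The borderline count only flags candidates; whether a factor between 0.90 and 1 is kept depends on the scree plot and on whether its variables make sense together.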
  • 8. Factor Analysis (section 2). Example of factor: Public Transportation. Other factor separations can be seen in [Appendix].
  • 9. Segmentation and Profiling (section 3). The main objectives of this section are: homogeneity inside the clusters; heterogeneity between the clusters; meaningful size of the clusters; evaluation of the meaningfulness of differences and sizes. As a first step, K-Means cluster analysis was performed on the factors from the factor analysis, which are in Likert-scale form. Even though Likert-scale variables are ordinal by nature, for this marketing research they are treated as interval data.
  • 10. K-Means Clustering Analysis (section 3). First, combinations of 2 to 5 factors were analyzed for 2 to 6 clusters each. In the decision process, the scatter plot of each factor combination was first interpreted for the respective number of clusters. To see how the clusters differ, boxplots of the combinations were evaluated. To assess whether the differences are meaningful, t-tests were examined and the group means were compared. Finally, the size of each cluster was checked to see whether it is large enough to be a target segment. 18 combinations of 2 factors were evaluated. The Coolness(SM) and Convenience(SM) factors were selected since they gave the best result in the process. We also evaluated them with the Tukey HSD, Bartlett, Levene and one-way tests to check the clusters for normality of distribution and consistency of variance. After interpreting 2 factors, combinations of 3 and 4 factors were also evaluated. At the end of the process, the following 4 factors were selected since they gave the best test results. The tests applied to the data and their results are as follows:
      Factor Name | Anova Test | Tukey HSD | Bartlett | Levene | Oneway Test | Result
      Touristic(PT) | H0 is accepted | H0 is accepted | H0 is rejected | H0 is accepted | H0 is accepted | Bad
      Coolness(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good
      Convenience(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good
      Environmental(SC) | H0 is rejected | H0 is rejected | H0 is accepted | H0 is accepted | H0 is rejected | Good
  • 11. K-Means Cluster Analysis (section 3). After evaluating the trials in terms of tests, sizes and meaningfulness, we decided to use the factors Touristic(PT), Coolness(SM), Convenience(SM) and Environmental(SC) with 3 clusters of sizes 97, 90 and 58. The result of the analysis for the selected factors can be seen in the graph on the right-hand side. From the K-Means cluster analysis we gained the insight that cluster 1 has very high interest in the Touristic(PT) and Coolness(SM) factors, cluster 2 in Convenience(SM), and cluster 3 has very low interest in Environmental(SC). Both clusters have high interest in the accessibility of public transportation, infrastructure and social networks (Facebook and Instagram); at the same time, both have low interest in the type of sharing of shared mobility. In addition, cluster 2 has high interest in the convenience of shared mobility, whereas cluster 3 has low interest, and traditional media. [Appendix] Based on the interests of each cluster, we named the segments as follows: Cluster 1: Social Hommies. Cluster 2: Independents. Cluster 3: Traditionals.
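For readers unfamiliar with K-Means, the assign-then-update loop behind the clustering on these slides can be sketched in plain Python. This is an illustration only: the two-dimensional factor scores and the starting centroids are invented, and the slides do not say which implementation or initialization the authors used.

```python
import math

def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm: assign each point to its nearest centroid, then recompute means."""
    clusters = []
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for point in points:
            nearest = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
            clusters[nearest].append(point)
        new_centroids = []
        for old, members in zip(centroids, clusters):
            if members:
                new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep an empty cluster's centroid where it was
        if new_centroids == centroids:
            break  # assignments are stable
        centroids = new_centroids
    return centroids, clusters

# Invented (Coolness, Convenience) factor scores for nine respondents.
scores = [(4.5, 1.5), (4.2, 1.8), (5.0, 1.0),
          (1.5, 4.5), (1.8, 4.2), (1.0, 5.0),
          (3.0, 3.0), (3.2, 2.8), (2.8, 3.2)]
start = [(5.0, 1.0), (1.0, 5.0), (3.0, 3.0)]  # hypothetical starting centroids

final_centroids, final_clusters = kmeans(scores, start)
print([len(cluster) for cluster in final_clusters])
```

Checking cluster sizes after fitting mirrors the slide's last step, where each cluster must be large enough to serve as a target segment.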
  • 12. Other Cluster Analysis (section 3). As a second step of the cluster analysis, hierarchical cluster analysis was performed to see whether the ordinal variables by themselves and the categorical variables by themselves also affect the cluster differentiation. The graph on the left shows that cutting above a height of 40 yields 3 meaningful clusters, whereas cutting below a height of 40 yields 4. Even if this method does not give a clear idea of the number of clusters, we prefer to cut above height 40, since the distances between the lines are clearer there. This also double-checks the result we got from the K-Means analysis: the hierarchical analysis likewise gives 3 clusters. We also performed latent class cluster analysis with the same factors used in the K-Means analysis, to compare the results and check the consistency of our clusters. Since some of the variables are ordinal, we could not use them in the K-Means method. We also used latent class analysis to see how values such as age and gender affect the clusters. However, as we saw in the previous analysis, adding covariate values made the analysis more complex and difficult to interpret. Therefore, latent analysis did not give us significant results.
  • 13. Descriptive Analysis (section 3). After dividing the clusters, it is fundamental to analyze them in order to evaluate their characteristics and how they differ. This leads to a better understanding of the market segment that will be approached. This section discusses the relations between the variables and further analyzes the characteristics of the previously created clusters. Each type of variable calls for a different approach, so the analysis is divided into: ● Categorical ● Ordinal ● Interval. Public Transportation: First of all, we analyzed the respondents' overall satisfaction with public transportation, in order to exploit its weaknesses and strengths. The conclusion from this table is that customers are: 1. Unhappy with punctuality and reliability 2. Satisfied with cost and availability.
  • 14. Descriptive Analysis - Gender (section 3). Categorical Data: The categorical variables in the provided data were gender, age, living situation and living location. A graphical approach was used to look at the characteristics of the clusters. Gender: Our 3 clusters were broken down by gender composition to see whether there was any major difference between them. As the figures below show, the clusters are well balanced in terms of males and females.
  • 15. Descriptive Analysis - Age (section 3). Age Composition by Clusters. Age: Secondly, the age variable is analyzed by cluster, keeping in mind that the age range ran from "16 and below" to "30 and above". None of the clusters contained people aged 19 or below, which is why those ages do not appear in the following graphs; there is also no need to address these younger people in the marketing campaign.
  • 16. Descriptive Analysis – Living Location (section 3). Living Location: The survey also records where the individuals in each cluster live, which makes it possible to analyze this variable and see which areas could be explored in the marketing campaign. Living Situation: Finally, the last categorical variable analyzed for the clusters is their living situation.
  • 17. Descriptive Analysis - A/B Test (section 3). A/B response analysis: The clusters also responded differently when asked about payment methods. Question A was about paying a fee to use the scooter, and question B was about paying a subscription and then a reduced fee. The results show that Cluster 1 is more willing to pay when asked question A than question B. The same holds for Cluster 2, while Cluster 3 is not willing to pay for the scooter at all.
  • 18. Descriptive Analysis for Combinations (section 3). It is important not only to see the characteristics of the clusters but also to analyze the relations among the variables. Once again the approach depends on the type of data, so three different analyses are performed: ○ Contingency table ○ Multi-group boxplot ○ Scatter plot. Contingency Tables: Categorical - Categorical Variables. Where and how people live: The first analysis across variables relates where people live to how they live. This gives better insight into the living situation of the survey population. It shows that most of the population lives inside the city of Milan and shares a residence with flatmates, which is expected since the survey targeted university students. Also of great importance are the individuals who live in another province with their families: even though they study at a university in Milan, a high percentage of the respondents still live with their family. Which people have already used shared mobility: Secondly, we wanted to know which people have already used shared mobility. This matters for deciding whether the marketing campaign needs to create awareness of this type of transportation. From these two tables we can see that most people have already used shared mobility but, surprisingly, a great number never have. For instance, 1/3 of the respondents who live inside the city of Milan have never used shared mobility, and the share rises to 72% for those living outside the Province of Milan. The second table also shows that most of the people who use shared mobility share their residence with friends.
  • 19. Descriptive Analysis for Combinations (section 3). Multi-Group Box Plot: Interval - Categorical Variables. Since most of the variables were interval, our group chose the ones that had important relations among themselves and would help most in understanding the survey. Where people are satisfied with shared mobility: The first analysis looks at the relation between people's satisfaction with public transportation and where they live, using the previously created factor that captures the respondents' satisfaction with public transportation. The figure shows that people who live inside the city of Milan are more satisfied with public transportation than people from anywhere else. The second figure shows that people from inside the city of Milan use public transportation much more than people from outside the city. Finally, the last figure shows that people from Milan and the Province of Milan use shared mobility far more frequently than people from outside the Province, meaning that this could be the area to approach in the marketing campaign.
  • 20. Descriptive Analysis for Combinations (section 3). Scatter Plot: Interval - Interval. Last but not least, we used scatter plots to evaluate how the interval variables are related. Using the same approach, the most important variables were evaluated, and a correlation test was performed to see whether these variables are actually correlated. For example, for the factors Accessibility of Public Transportation and Consciousness of the Scooter Usage, the correlation is 0.1289 with a p-value below the 0.05 threshold, so the two are correlated, although weakly.
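The correlation test mentioned on this slide rests on the Pearson coefficient and its t statistic, which can be sketched as below. The slide does not state which correlation test or software was used, so Pearson is an assumption, and the sample scores are invented. Turning the t statistic into a p-value additionally requires the Student t distribution, which is not implemented here.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)

def t_statistic(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))

# Invented factor scores for five respondents.
accessibility = [1.0, 2.0, 3.0, 4.0, 5.0]
consciousness = [2.0, 1.0, 4.0, 3.0, 5.0]

r = pearson_r(accessibility, consciousness)
print(round(r, 3), round(t_statistic(r, len(accessibility)), 3))
```

A small coefficient can still be statistically significant when the sample is large, which is why the slide reports both the correlation and the p-value.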
  • 21. Discriminant Analysis (section 3). With the discriminant analysis we verify how different the clusters are. Chi-square test: used to compare categorical variables among clusters. Gender: Since the p-value is greater than 0.05, H0 is accepted, which means that the gender composition is the same across the groups. Age: The same holds for age; the chi-square analysis shows that H0 is accepted as well, so the clusters do not differ in terms of age. Living Location: Unlike the previous variables, living location has a p-value lower than the threshold, which means that the clusters differ in this characteristic.
  • 22. Discriminant Analysis (section 3). The chi-square analysis also shows whether the clusters follow the expected counts. For instance, cluster 1 has as many people living in the city of Milan as expected; on the other hand, it has more individuals from the province of Milan than expected and fewer people from outside the province. Cluster 2 has slightly more people from the city than expected, but fewer from the province. Lastly, cluster 3 closely follows the expected counts. Living Situation: The last variable, which also differs between clusters, is the living situation. Cluster 1: more people who live with parents than expected. Cluster 2: more people who live with friends and fewer who live with parents than expected. Cluster 3: size as expected.
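The comparison of observed and expected counts on these slides follows the standard chi-square test of independence, sketched here in plain Python. The contingency table is invented and is not the survey data; a p-value would additionally need the chi-square distribution, for example through SciPy's `scipy.stats.chi2_contingency`.

```python
def chi_square(observed):
    """Return the chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in observed]
    column_totals = [sum(column) for column in zip(*observed)]
    total = sum(row_totals)
    # Under H0 (independence), expected count = row total * column total / grand total.
    expected = [[r * c / total for c in column_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for observed_row, expected_row in zip(observed, expected)
        for o, e in zip(observed_row, expected_row)
    )
    degrees_of_freedom = (len(row_totals) - 1) * (len(column_totals) - 1)
    return statistic, degrees_of_freedom, expected

# Invented counts: rows are clusters 1-3, columns are city / province / outside the province.
example_counts = [[30, 20, 10],
                  [40, 10, 10],
                  [20, 10, 10]]

statistic, degrees_of_freedom, expected = chi_square(example_counts)
print(round(statistic, 2), degrees_of_freedom, expected[0])
```

Reading a cell's observed count against its expected count is what supports statements such as "more people from the province than expected".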
  • 23. Regression Analysis (section 4). After making the necessary adjustments to gain a first insight into the clusters, we went on to make sense of the data through regression analysis. The main purpose was to see which factors had the greatest effect on customers' willingness to pay (WTP). It was difficult to explain the response with the variables, because during the cluster analysis we saw that no combination of factors created distinct clusters. However, the main purpose of this analysis is to understand the most prominent factors under different pricing strategies rather than to predict the future. To understand customers' preferred billing method, we divided them into two separate datasets according to the A/B test and applied a separate regression analysis to each. First we used linear regression to evaluate customers' WTP; however, our selected independent variables could not explain the dependent variable, the A/B test response, very well. [Appendix] Since the results of the linear regression were relatively difficult to analyze, we converted the A/B test variables, which are Likert-scale and thus treated as interval variables, into binary values for use in logistic regression. In this way, our independent variables would directly explain the payment preferences of customers from different segments. To do so, we assigned 1 to the people who gave one of the two highest choices on the A/B test and 0 to the others.
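The recoding described on this slide, where the two highest Likert choices become 1 and everything else 0, can be sketched as follows. The 5-point scale and the sample answers are assumptions for illustration; the slide does not state the length of the scale.

```python
def top_two_box(score, scale_max=5):
    """Return 1 for the two highest choices on the scale, 0 otherwise."""
    return 1 if score >= scale_max - 1 else 0

# Invented willingness-to-pay answers on an assumed 5-point scale.
answers = [1, 2, 3, 4, 5, 4, 2]
binary_target = [top_two_box(answer) for answer in answers]
print(binary_target)
```

The resulting 0/1 column is the kind of dependent variable a logistic regression expects, at the cost of discarding the distinctions among the lower choices.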
  • 24. Section 4: Regression Analysis. When deciding which regression model to use, we also used the heatmap function in R to better understand the correlations between the independent variables. The correlation output did not directly improve the regression results, but it gave us a broader understanding of the data.
    As expected, some variables are correlated with each other, but removing them from the regression did not improve it much. We therefore left all of them in the model and interpreted it in a broader sense.
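The correlation matrix behind such a heatmap can be sketched as follows. The authors used R; this is a plain Python illustration of the Pearson correlation between pairs of variables, with made-up factor scores and hypothetical function names.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists.

    Raises ZeroDivisionError if either list is constant (zero variance).
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map each pair of column names to its Pearson correlation."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


if __name__ == "__main__":
    # Hypothetical factor scores for five respondents (NOT the survey data).
    data = {
        "Coolness(SC)": [1.0, 2.0, 3.0, 4.0, 5.0],
        "Consciousness(SC)": [2.0, 2.5, 3.5, 4.0, 4.5],
        "Satisfaction(PT)": [5.0, 3.0, 4.0, 2.0, 1.0],
    }
    for name, row in correlation_matrix(data).items():
        print(name, {k: round(v, 2) for k, v in row.items()})
```

Pairs with correlations close to 1 or -1 are the candidates for removal that the slide refers to.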
  • 25. Section 4: Logistic Regression Analysis, Test A
    Coolness(SC)/Consciousness(SC): As we expected, coolness explains willingness to pay well. The perceived consciousness of the product reflects the same insight as the environmental factors.
    Time/# connections: Time spent in traffic increases the customer's willingness to pay, but the number of connections decreases it, because more connections mean the distance is too long for a scooter.
    Infrastructure(PT): Well-developed public transportation reflects the welfare of its users. With these transportation habits, scooter usage will increase.
    Satisfaction(PT): Higher satisfaction with public transportation lowers willingness to pay, because it becomes harder to use a scooter instead of public transportation.
    Environmental(SC)/Environmental(PT): Environmental variables play an important role in willingness to pay because they are the main driver for customers to use a scooter.
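The direction of each effect above corresponds to the sign of a logistic regression coefficient. The sketch below shows how a coefficient translates into an odds ratio; the coefficient values are invented for illustration and are not the fitted values from the analysis, and the function names are hypothetical.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds of a '1' response per unit increase."""
    return math.exp(coefficient)


def direction(coefficient):
    """Describe whether a coefficient raises or lowers willingness to pay."""
    if coefficient > 0:
        return "raises"
    if coefficient < 0:
        return "lowers"
    return "does not change"


if __name__ == "__main__":
    # Hypothetical coefficients, chosen only to mirror the signs on the slide.
    coefficients = {
        "Coolness(SC)": 0.6,
        "Time": 0.3,
        "# connections": -0.4,
        "Satisfaction(PT)": -0.5,
    }
    for name, beta in coefficients.items():
        print(f"{name}: {direction(beta)} WTP, odds ratio {odds_ratio(beta):.2f}")
```

An odds ratio above 1 means the factor increases the odds of a high willingness-to-pay rating; below 1 means it decreases them.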
  • 26. Section 4: Logistic Regression Analysis, Test B
    Coolness(SM)/Coolness(SC)/Consciousness(SC): As we expected, coolness explains willingness to pay well. The perceived consciousness of the product reflects the same insight as the environmental factors.
    Accessibility(PT): The availability of public transportation limits the shift to scooters.
    Touristic(PT): More touristic use of public transportation also positively affects scooter usage, which is logical given the customer's objective.
    Infrastructure(PT): Well-developed public transportation reflects the welfare of its users. With these transportation habits, scooter usage will increase.
    How often do you use public transportation?: More frequent use of public transportation decreases the intention to use a scooter.
    Usage(S): As one would expect, a habit of using shared mobility directly affects willingness to pay.
    Environmental(PT)/Environmental(SC): Environmental variables play an important role in willingness to pay because they are the main driver for customers to use a scooter.
  • 27. Section 4: Decision Tree, A/B Test. We also used additional tools to understand the data, since regression gives a limited perspective on willingness to pay. The simplest but most useful one is the decision tree classifier. It gives great insight into the variables and explains the part of the data that cannot be expressed by regression.
    Test A: Some of the interactions were already visible in the regression, such as Consciousness(SC), Coolness(SC) and Satisfaction(PT), which explain the left part of the tree. The right part of the tree explains willingness to pay in terms of age segments and Accessibility(PT).
    Test B: The tree gave us insights about our customers that we could not obtain from the regression analysis. The media channels that appear in the Test B tree (online newspaper/magazine, on-demand radio, Twitter, YouTube and Snapchat) are more meaningful for targeting the people in this segment. We also saw an effect of gender in one of the leaves, with a small distinction that separates the genders.
    [Figure: decision trees for Test A and Test B]
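How a decision tree classifier chooses a split can be sketched as follows. This is a generic illustration using Gini impurity on a single numeric variable with binary labels; the deck does not state which tree algorithm or splitting criterion the authors used, and the data and function names here are hypothetical.

```python
def gini(labels):
    """Gini impurity of a list of 0/1 labels: 2 * p * (1 - p)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_threshold(values, labels):
    """Return (threshold, weighted impurity) of the best 'value <= threshold' split.

    Returns None when all values are identical, so no split is possible.
    """
    best = None
    for t in sorted(set(values))[:-1]:
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        weighted = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if best is None or weighted < best[1]:
            best = (t, weighted)
    return best


if __name__ == "__main__":
    # Hypothetical factor scores and binary WTP labels (NOT the survey data).
    coolness = [1, 2, 2, 3, 4, 5]
    high_wtp = [0, 0, 0, 1, 1, 1]
    print(best_threshold(coolness, high_wtp))
```

A full tree repeats this search over every variable and then recursively inside each resulting branch, which is how variables that are weak in the regression can still appear lower in the tree.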
  • 28. Section 5: Evaluation of the Willingness to Pay. In analyzing the A/B test responses by cluster, we tried to assess the willingness to pay of the customers in each cluster. To do that, we summed the A/B test ratings (Likert scale from 1 to 5) within each cluster and analyzed each cluster's percentage share of the total.
    As the graphs on the right-hand side show, cluster 2 is superior to all other clusters in the A/B test.
    To understand the behaviour of the A/B test, we also added the cluster averages, because the number of responses differs between the two versions (130 responses to question A and 113 to question B).
    Since the average of cluster 2 is higher for A than for B, we can say that customers show a higher willingness to pay under method A, and that cluster 2 is superior to the other clusters.
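The per-cluster sums, percentage shares and averages described above can be sketched as follows. The responses are made up and the function name is hypothetical; the sketch only shows why the average is needed when the groups have different sizes.

```python
from collections import defaultdict


def summarize_by_cluster(responses):
    """Summarize (cluster, likert_score) pairs.

    Returns, per cluster, its share of the summed ratings and its mean rating.
    """
    totals = defaultdict(int)
    counts = defaultdict(int)
    for cluster, score in responses:
        totals[cluster] += score
        counts[cluster] += 1
    grand_total = sum(totals.values())
    return {
        cluster: {
            "share": totals[cluster] / grand_total,
            "mean": totals[cluster] / counts[cluster],
        }
        for cluster in totals
    }


if __name__ == "__main__":
    # Hypothetical Test A responses as (cluster, rating) pairs, not survey data.
    test_a = [(1, 3), (1, 2), (2, 5), (2, 4), (2, 4), (3, 3)]
    for cluster, stats in sorted(summarize_by_cluster(test_a).items()):
        print(cluster, f"share={stats['share']:.0%}", f"mean={stats['mean']:.2f}")
```

The share depends on how many respondents a cluster has, while the mean does not, so the mean is the fairer basis for comparing Test A with Test B.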
  • 29. Section 5: Recommendation of Pricing Method
    Interpretation: As the previous analysis shows, cluster 2 is the most profitable of the clusters and pricing method A is superior to method B. Cluster 2 can be named "Independents" because they mostly live with flatmates or alone. They care most about Accessibility, Infrastructure and Convenience, and they are the majority of the respondents. They also have the highest percentage of respondents living in the Comune of Milano. There is no significant difference between the genders, so our persona is genderless.
    For all clusters, the average response is higher for A than for B. Pricing method A is therefore more attractive to people overall, which means that they do not want a monthly subscription. This shows that respondents tend to use the scooter for short periods, so they are willing to pay more per minute rather than pay a fixed monthly subscription plus a lower amount per minute.
    Recommendation: As a result, we recommend the pricing and billing method in which the service costs €0,15 per minute of use.
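The trade-off between the two billing methods can be made concrete with a break-even calculation. The prices are the ones stated in the survey (A: €0,15 per minute; B: €8 per month plus €0,05 per minute); the function names are hypothetical and amounts are kept in integer cents to avoid rounding issues.

```python
def cost_a_cents(minutes, per_minute=15):
    """Monthly cost of method A: pay per minute only."""
    return per_minute * minutes


def cost_b_cents(minutes, subscription=800, per_minute=5):
    """Monthly cost of method B: fixed subscription plus a lower per-minute rate."""
    return subscription + per_minute * minutes


def break_even_minutes(per_minute_a=15, subscription_b=800, per_minute_b=5):
    """Monthly usage at which both methods cost the same."""
    return subscription_b / (per_minute_a - per_minute_b)


if __name__ == "__main__":
    print("break-even minutes per month:", break_even_minutes())
    for minutes in (30, 80, 120):
        a, b = cost_a_cents(minutes), cost_b_cents(minutes)
        print(f"{minutes} min: A = {a / 100:.2f} EUR, B = {b / 100:.2f} EUR")
```

Below the break-even usage, method A is cheaper for the customer, which is consistent with respondents who expect short, occasional rides preferring A.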
  • 30. Section 6: Conclusion
    Process: Factor Analysis, Discriminant and Descriptive Analysis, Cluster Analysis, Regression Analysis, Evaluation of WTP and Pricing.
    Data Interpretation: X number of datasets were created to make the methods easier to analyze in the following steps. 22 factors were selected from 80 observations.
    Evaluation of A/B Test: For Test A, the factors Consciousness(SC), Coolness(SC) and Satisfaction(PT) explain the dependent variable better. For Test B, the factor Social Media(P) has a significant impact.
    Segmentation and Profiling: From the K-Means cluster analysis, 4 factors for 3 clusters were selected. The segments were named Social Hommies, Independents and Traditionals based on the results of the discriminant and descriptive analysis.
    Recommendation of Pricing: Independents are selected as the target segment since they have the highest willingness to pay for the first payment method, in which the service costs €0,15 per minute of use.
  • 35. Section 7: Appendix
    Question # | Variable Type | Question | Factors
    1 | Categorical (Nominal) | Do you travel at least once a week in the urban area of Milan? | -
    2 | Categorical | When you travel in the urban area of Milan, what kind of transportation do you use most often? | -
    3 | Likert Scale (Ordinal) | What are the reasons that you most often choose public transportation? | Factors 1: Environmental(PT), Accessibility(PT), Touristic(PT), Infrastructure(PT), Traffic(PT)
    4 | Ordinal | What kind of ticket for public transportation do you use most often? | -
    5 | Ordinal | How often do you use public transportation? | -
    6 | Ordinal | For most of these trips, how many connections do you need to make (change metro lines, change bus lines, etc.)? | -
    7 | Likert Scale | On average, how satisfied are you with the public transportation? | Factors 2: Satisfaction(PT)
    8 | Ordinal | For most of these trips, how much time would you spend with public transportation? | -
    9 | Likert Scale | What are the reasons that you most often use your own car? | Factors 3: Time and Comfort(C), Cool to Drive(C), Unavailability(C)
    10 | Ordinal | How often do you travel with your own car in the urban area of Milan? | -
    11 | Ordinal | For most of these trips, how easy could you find parking? | -
    12 | Ordinal | For most of these trips, how much time would you spend with your own car? | -
    13 | Likert Scale | On average, how satisfied are you with travelling with your car? | Factors 4: Satisfaction(C)
    14 | Categorical | Have you ever used shared mobility in the urban area of Milan? | -
    15 | Likert Scale | How often do you use the following types of shared mobility? | Factors 5: Type of Sharing
    16 | Likert Scale | How well do the following scenarios describe your usage of shared mobility? | Factors 6: Usage(S)
    17 | Likert Scale | What are the reasons that motivate you to use shared mobility? | Factors 7: Coolness(SM), Convenience(SM)
    18 | Likert Scale | On average, how satisfied are you with shared mobility that you have used? | Factors 8: Satisfaction(SM)
    19 | Likert Scale | What is your opinion about using a scooter sharing? | Factors 9: Coolness(SC), Consciousness(SC), Environmental(SC)
    20 (A/B Test, 50.0%) | Likert Scale | Consider that the service costs €0,15 per minute of use. How likely are you going to try the service? | -
    20 (A/B Test) | Likert Scale | Consider that the service requires a monthly subscription of €8, and costs €0,05 per minute of use. How likely are you going to try the service? | -
    21 | Categorical | Your gender is: | -
    22 | Ordinal | Your age is? | -
    23 | Categorical | Where do you live? | -
    24 | Categorical | What is your living situation? | -
    25 | Likert Scale | How often do you use the following media channels? | Factors 10: Traditional(P), News(P), Informative Network(P), Social Media(P)
  • 36. Section 7: Appendix, 4 Factor Analysis
    Factor | Social Hommies | Independents | Traditionals
    Environmental(PT) | high | medium | medium
    Accessibility(PT) | high | high | high
    Touristic(PT) | high | low | medium
    Infrastructure(PT) | high | high | high
    Traffic(PT) | medium | medium | medium
    Satisfaction(PT) | high | medium | medium
    Type of sharing(SM) | low | low | low
    Usage(SM) | medium | medium | low
    Coolness(SM) | high | low | medium
    Convenience(SM) | medium | high | low
    Satisfaction(SM) | medium | medium | medium
    Coolness(SC) | medium | medium | medium
    Consciousness(SC) | high | low | low
    Environmental(SC) | medium | medium | low
    Traditional(P) | medium | high | medium
    News(P) | medium | medium | medium
    Informative(P) | medium | medium | medium
    Social Network(P) | high | high | high