A marketing analysis of scooter (monopattino) sharing services in the city of Milan was carried out by Emre Danışan, Giovanni Roja, Mehmet Berk Souksu and Aslı Şenel in late 2019.
It draws on extensive research into shared mobility services across Europe and applies various machine learning methods to analyze the data and derive the results.
Marketing Analysis of Scooter (Monopattino) Sharing in Milan
1. Scooter (Monopattino) Sharing Service
Analysis of questionnaire responses for marketing purposes, using machine learning.
Emre Danışan
Giovanni Roja
Mehmet Berk Souksu
Aslı Şenel
2. Agenda
1 Introduction: Context of the Project
2 Data Preparation: Understanding Data, Cleaning Data, Factor Analysis
3 Segmentation and Profiling: Cluster Analysis, Discriminant Analysis, Descriptive Analysis
4 Regression Analysis: Logistic Regression, Decision Tree
5 Recommendation of Pricing
6 Conclusion
7 Appendix
3. Launching a scooter (monopattino) sharing service in Milan
Target population: university students in Milan
The objectives are as follows:
Understand the potential customers
Segment the potential customers and uncover insights about them
Evaluate the customers' willingness to pay and recommend a pricing and billing method
1
Context of the Project
4. The survey consists of 25 questions, some of which include sub-questions rated on a Likert scale.
There are 353 data rows, of which 272 were completed. The data contained incomplete rows and some mistakes; we explain and demonstrate how we handled them.
While analyzing the data, we found that there are 4 main sections, dedicated to public transportation, car, shared mobility and scooter sharing.
The public transportation users (PT) and car owners (C) sections form two parallel paths through the survey. Both paths merge in the shared mobility section (SM) and continue to the scooter (SC) and personal (P) questions.
Understanding the Data
2
5. First, we separated the variables into two dataframes, one for each path. After doing so, we removed question 2, which separates the paths.
The first question makes respondents exit directly if they answer no, so we removed that question's column and the observations that exited directly.
Since question 20 is an A/B test, we needed to separate our observations depending on which version of the question was viewed.
To avoid rows with “Not Available” values, we created sub-datasets for each part of the questionnaire:
Public_Trans : Public Transportation Users
Car : Car owners
Shared : Shared Mobility
Scooter : Scooter and Personal questions
Public_Shared : Public Transportation & Shared Mobility
Cleaning the Data - Questionnaire
2
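The splitting steps above can be sketched in Python as follows. This is only an illustration under stated assumptions: the field names (`q1`, `q2`, `question_viewed`) and the answer labels are hypothetical and are not taken from the original analysis.

```python
# Hypothetical responses: keys and answer labels are illustrative assumptions.
responses = [
    {"id": 1, "q1": "yes", "q2": "public", "question_viewed": "A"},
    {"id": 2, "q1": "yes", "q2": "car", "question_viewed": "B"},
    {"id": 3, "q1": "no", "q2": None, "question_viewed": None},
    {"id": 4, "q1": "yes", "q2": "public", "question_viewed": "B"},
]


def split_paths(rows):
    """Drop direct exits (q1 == 'no'), then split by the path chosen in q2."""
    kept = [r for r in rows if r["q1"] == "yes"]
    paths = {"public": [], "car": []}
    for r in kept:
        # Remove the routing questions once they have served their purpose.
        trimmed = {k: v for k, v in r.items() if k not in ("q1", "q2")}
        paths[r["q2"]].append(trimmed)
    return paths


def split_ab(rows):
    """Separate observations by which version of question 20 was shown."""
    return {v: [r for r in rows if r["question_viewed"] == v] for v in ("A", "B")}


paths = split_paths(responses)
ab_public = split_ab(paths["public"])
```

Keeping each path in its own structure means that a car owner's row never carries empty public transportation answers, which is the reason the source gives for creating separate sub-datasets.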
6. We identified 9 metadata columns that are irrelevant to our data analysis, so we removed them from the original data. The metadata columns are as follows:
Respondent ID | Collector ID | Start Date | End Date | IP Address | Email Address | First Name | Last Name | Custom Data 1
Since the first two rows were both used for header information, we removed one of them and put the necessary names on the second row.
We added 11 to the age values so that they show real ages.
The 14th question had no values filled in, so we wrote a for loop that corrects this by looking at the answers to the 15th question.
We also changed the answers of `Question Viewed` to 0 and 1, to simplify the data and make all of it numerical.
Cleaning the Data - Excel Sheet
2
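A minimal Python sketch of these cleaning steps is shown here. The rule used to fill question 14 from question 15 (any usage frequency above the lowest level counts as "has used shared mobility"), the A to 0 and B to 1 mapping, and the short column names `age`, `q14` and `q15` are assumptions made for illustration; the source does not spell them out.

```python
META_COLUMNS = {
    "Respondent ID", "Collector ID", "Start Date", "End Date", "IP Address",
    "Email Address", "First Name", "Last Name", "Custom Data 1",
}


def clean_row(row):
    """Return a cleaned copy of one survey row (a dict of column -> value)."""
    cleaned = {k: v for k, v in row.items() if k not in META_COLUMNS}
    # Age was stored as a code; the source adds 11 to recover the real age.
    cleaned["age"] = cleaned["age"] + 11
    # Assumption: question 14 is 1 when any question-15 usage frequency
    # is above the lowest level (1 is taken to mean "never").
    cleaned["q14"] = 1 if any(v > 1 for v in cleaned["q15"]) else 0
    # Assumption: variant A is recoded as 0 and variant B as 1.
    cleaned["Question Viewed"] = {"A": 0, "B": 1}[cleaned["Question Viewed"]]
    return cleaned


# Hypothetical raw row, not taken from the survey data.
raw_rows = [
    {"Respondent ID": 101, "IP Address": "0.0.0.0", "age": 10,
     "q14": None, "q15": [1, 3, 1], "Question Viewed": "B"},
]

# The for loop mentioned in the text, applied to every row.
cleaned_rows = []
for raw in raw_rows:
    cleaned_rows.append(clean_row(raw))
```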
7. For our factor analysis, we factored together the variables that belong to the same segment and the same question. They share the same context, which makes it easier to establish the correlations between variables.
While selecting the appropriate number of factors, we checked the eigenvalues, as shown in the graph. We prioritized factors with eigenvalues above 1; however, depending on the elbow point and on whether the combination of variables made sense, we also took some with eigenvalues above 0.90.
Factor analysis was performed on the data related to: Public Transportation (PT), Shared Mobility (SM), Car (C), Personal (P)
Factor Analysis
2
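The eigenvalue rule described above can be illustrated with a short Python sketch. The eigenvalues below are hypothetical and the function name is our own; the elbow-point judgment and the check that the variable combination makes sense remain manual steps.

```python
def select_factor_count(eigenvalues, strict=1.0, relaxed=0.90):
    """Count factors kept under the eigenvalue > 1 rule and a relaxed cut-off."""
    ordered = sorted(eigenvalues, reverse=True)
    strict_count = sum(1 for ev in ordered if ev > strict)
    relaxed_count = sum(1 for ev in ordered if ev > relaxed)
    return strict_count, relaxed_count


# Hypothetical eigenvalues for one block of Likert items (not from the study).
eigenvalues = [2.8, 1.4, 1.05, 0.93, 0.6, 0.22]
strict_count, relaxed_count = select_factor_count(eigenvalues)
```

The gap between the two counts marks the borderline factors, which are the ones that need the interpretability check mentioned in the text.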
8. Example of Factor: Public Transportation
2
Factor Analysis
Other factor separations can be seen in [Appendix].
9. 3
Segmentation and Profiling
The main objectives of this section are as follows:
Homogeneity inside each cluster
Heterogeneity between clusters
Meaningful cluster sizes
Evaluation of the meaningfulness of differences and sizes
As a first step, K-Means cluster analysis was performed.
It was performed on the factors from the factor analysis, which are in Likert scale form.
Even though Likert scale variables are ordinal by nature, for this marketing research they are treated as interval data.
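To make the K-Means step concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. The points are hypothetical two-factor scores and the starting centroids are picked by hand so the run is deterministic; the original analysis presumably relied on a statistics package rather than code like this.

```python
def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm on tuples of factor scores."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(
                range(len(centroids)),
                key=lambda c: sum((p - q) ** 2 for p, q in zip(pt, centroids[c])),
            )
            for pt in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for c in range(len(centroids)):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster as is
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Hypothetical (Coolness, Convenience) factor scores, not survey data.
points = [(1.0, 1.0), (1.2, 0.8), (4.0, 4.2), (4.1, 3.9), (1.0, 4.0), (0.8, 4.2)]
labels, centroids = kmeans(points, [points[0], points[2], points[4]])
cluster_sizes = [labels.count(c) for c in range(3)]
```

The `cluster_sizes` list corresponds to the size check in the objectives above: a cluster that is too small would not be usable as a target segment.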
10. First, combinations of 2 to 5 factors were analyzed with 2 to 6 clusters each. In the decision process, the scatter plot of each factor combination was first interpreted for the respective number of clusters. To see how the clusters differ, boxplots of the combinations were evaluated. To understand whether the differences were meaningful, t-tests were examined and the group means were compared. Finally, the size of each cluster was checked to see whether it was large enough to serve as a target segment.
18 combinations of 2 factors were evaluated. The Coolness(SM) and Convenience(SM) factors were selected since they gave the best result in this process. We also evaluated them with the Tukey HSD, Bartlett, Levene and one-way tests to check the normality of the distribution and the consistency of the variance across clusters. After interpreting the 2-factor combinations, combinations of 3 and 4 factors were also evaluated. At the end of the process, the following 4 factors were selected since they gave the best results in the tests. The tests applied to the data and their results are as follows:
3
K-Means Clustering Analysis
Factor Name | Anova Test | Tukey HSD | Bartlett | Levene | Oneway Test | Result
Touristic(PT) | H0 is accepted | H0 is accepted | H0 is rejected | H0 is accepted | H0 is accepted | Bad
Coolness(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good
Convenience(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good
Environmental(SC) | H0 is rejected | H0 is rejected | H0 is accepted | H0 is accepted | H0 is rejected | Good
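As an illustration of what the ANOVA column in this table tests, the following sketch computes the one-way ANOVA F statistic for one factor across clusters in plain Python. The scores are hypothetical; turning F into a p-value requires the F distribution, which is left to a statistics package.

```python
def one_way_anova_f(groups):
    """F statistic of a one-way ANOVA across lists of scores (one per cluster)."""
    all_values = [x for g in groups for x in g]
    grand_mean = sum(all_values) / len(all_values)
    means = [sum(g) / len(g) for g in groups]
    # Variation of the cluster means around the grand mean.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Variation of the scores around their own cluster mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical factor scores for three clusters (not survey data).
f_stat = one_way_anova_f([[1, 2, 3], [2, 3, 4], [5, 6, 7]])
```

A large F means the cluster means differ by more than the spread inside the clusters would explain, which is the situation labelled "H0 is rejected" in the table.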
11. After evaluating the trials in terms of tests, sizes and meaningfulness, we decided to use the factors Touristic(PT), Coolness(SM), Convenience(SM) and Environmental(SC) with 3 clusters of sizes 97, 90 and 58. The result of the analysis for the selected factors can be seen in the graph on the right-hand side:
3
K-Means Cluster Analysis
The K-Means cluster analysis showed that cluster 1 has very high interest in the Touristic(PT) and Coolness(SM) factors, cluster 2 has very high interest in Convenience(SM), and cluster 3 has very low interest in Environmental(SC). All clusters have high interest in the accessibility and infrastructure of public transportation and in social networks (Facebook and Instagram), and all of them have low interest in the type of sharing of shared mobility. In addition, cluster 2 has high interest in the convenience of shared mobility, where cluster 3 has low interest, and in traditional media. [Appendix]
Based on the interests of each cluster, we named the segments as follows:
Cluster 1: Social Hommies
Cluster 2: Independents
Cluster 3: Traditionals
12. Even though this method does not give a clear idea of the number of clusters, we prefer to cut above height 40, since the distances between the lines are clearer there.
3
Other Cluster Analysis
As a second step of the cluster analysis, hierarchical cluster analysis was performed to see whether the ordinal variables and the categorical variables, each on their own, also affect the cluster differentiation.
The graph on the left shows that cutting above a height of 40 gives 3 meaningful clusters, whereas cutting below a height of 40 gives 4.
This also served as a double check of our K-Means result: the hierarchical analysis likewise gave 3 clusters.
We also performed latent class cluster analysis with the same factors used in the K-Means analysis, to compare the results and check the consistency of our clusters.
Since some of the variables are ordinal, we could not use them in the K-Means method. We therefore also used latent class analysis to see how variables such as age and gender affect the clusters.
However, as we saw in the previous analysis, adding covariates made the analysis more complex and difficult to interpret. Therefore, latent class analysis did not give us significant results.
13. After dividing the respondents into clusters, it is fundamental to analyze the clusters in order to evaluate their characteristics and how they differ. This leads to a better understanding of the market segment that will be approached.
This section discusses the relations between the variables and further analyzes the characteristics of the previously created clusters. Each type of variable calls for a different approach, so the analysis is divided into:
● Categorical
● Ordinal
● Interval
3
Descriptive Analysis
First of all, we analyzed the survey respondents' overall satisfaction with public transportation, in order to exploit its weaknesses and strengths.
Public Transportation
The conclusion of this table is that customers are:
1. Unhappy with Punctuality and Reliability
2. Satisfied with Cost and Availability
14. 3
Descriptive Analysis - Gender
Gender: Our 3 clusters were broken down by gender composition to see whether there was any major difference between them. As the gender figures below show, the clusters are well balanced in terms of males and females.
Categorical Data
The categorical variables in the data were gender, age, living situation and living location. A graphical approach was used to look at the characteristics of the clusters.
15. Age: Secondly, the age variable is analyzed by cluster, keeping in mind that the age range in the survey ran from "16 and below" to "30 and above". None of the clusters contained people aged 19 or below, which is why those ages do not appear in the following graphs and why there is no need to target these younger people in the marketing campaign.
3
Descriptive Analysis - Age
Age Composition by Clusters
16. 3
Descriptive Analysis – Living Location
Living Location: The survey also recorded where the individuals within the clusters live, so this variable can be analyzed as well to see which areas could be explored in the marketing campaign.
Living Situation: Finally, the last categorical variable analyzed for the clusters is their living situation.
17. A/B response analysis: The clusters also responded differently when asked about the payment methods.
Question A was about paying a per-use fee for the scooter, and question B was about paying a subscription plus a reduced fee. The results show that Cluster 1 is more willing to pay when asked question A than question B. The same holds for Cluster 2, whereas Cluster 3 is not willing to pay for the scooter at all.
3
Descriptive Analysis - A/B Test
18. It is important not only to see the characteristics of the clusters but also to analyze the relations among the variables. Once again, the approach differs by type of data, so three different analyses are performed, namely:
○ Contingency table
○ Multi-group boxplot
○ Scatter plot
3
Descriptive Analysis for Combinations
Where and how people live: The first analysis across variables relates where people live to how they live. This gives better insight into the living situation of the survey population.
The analysis shows that most of the population lives inside the city of Milan and shares a residence with flatmates. This result is expected, as the survey targeted university students. Also of great importance are the individuals who live in another province with their families: even though they study at a university in Milan, a high percentage of the respondents still live with their family.
Contingency Tables: Categorical – Categorical Variables
Which people have already used shared mobility: Secondly, we wanted to know which people have already used shared mobility. This analysis is important for deciding whether the marketing campaign needs to create awareness of this type of transportation.
The two tables show that most people have already used shared mobility but, surprisingly, a large number never have. For instance, 1/3 of the respondents who live inside the city of Milan have never used shared mobility, and the share is even higher outside the Province of Milan, where 72% have never used it. The second table also shows that most of the people who use shared mobility share their residence with friends.
19. Where people are satisfied with public transportation: The first analysis looked at the relation between people's satisfaction with public transportation and where they live. It uses the previously created factor that captures the respondents' satisfaction with public transportation.
The figure shows that people who live inside the city of Milan are more satisfied with public transportation than people from anywhere else.
3
Descriptive Analysis for Combinations
Multi-Group Box Plot: Interval - Categorical Variables
Since most of the variables were interval, our group chose the ones that had important relations with each other and would help most in understanding the survey.
The second figure shows that people from inside the city of Milan use public transportation much more than people from outside the city.
Finally, the last figure shows that people from Milan and the Province of Milan use shared mobility far more frequently than people from outside the Province, which means that this could be the area to approach in the marketing campaign.
20. Scatter Plot: Interval - Interval
Finally, the relations between interval variables were evaluated with scatter plots. Using the same approach, the most important variables were evaluated, and a correlation test was performed to see whether these variables were actually correlated.
For example, the factors related to Accessibility of Public Transportation and Consciousness of the Scooter Usage are significantly correlated: the p-value is below the 0.05 threshold, although the correlation coefficient of 0.1289 indicates a weak relationship.
3
Descriptive Analysis for Combinations
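The correlation test mentioned on this slide can be sketched as follows. The code computes a Pearson correlation coefficient and the t statistic used to test it against zero. The sample size of 245 is an assumption (the sum of the cluster sizes 97, 90 and 58 reported earlier); the source does not state the sample size used for this particular test.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


def correlation_t(r, n):
    """t statistic for H0: no correlation, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r ** 2))


# Sanity check on a perfectly linear toy example.
r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])

# Reported coefficient from the slide; n = 245 is an assumed sample size.
t_value = correlation_t(0.1289, 245)
```

The t statistic grows with the sample size, which is how a coefficient as small as 0.1289 can still fall below the 0.05 threshold in a sample of a few hundred respondents.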
21. Gender: Since the p-value is greater than 0.05, H0 is not rejected, which means that the gender composition is the same across the groups.
3
Discriminant Analysis
With the discriminant analysis we are going to verify how different the clusters are.
Age: The same holds for age. The chi-square analysis shows that H0 is not rejected here either, so the clusters do not differ in terms of age.
Living Location: Unlike the previous variables, living location has a p-value lower than the threshold, which means that the clusters differ in this characteristic.
Chi-square test: Used to compare categorical variables among clusters.
22. The chi-square analysis also shows whether the clusters follow the expected counts.
For instance, cluster 1 has as many people living in the city of Milan as expected. On the other hand, it has more individuals from the province of Milan than expected, and fewer people from outside the province. Cluster 2 has slightly more people from the city than expected, but fewer people from the province. Lastly, cluster 3 largely follows the counts expected from the chi-square analysis.
3
Discriminant Analysis
Living Situation: The last variable that also shows a difference between the clusters is the living situation.
Cluster 1: More people who live with parents than expected
Cluster 2: More people who live with friends and fewer who live with parents than expected
Cluster 3: Size as expected
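The comparison of observed and expected counts behind these statements can be sketched in plain Python. The contingency table below is hypothetical (clusters in rows, living locations in columns) and does not reproduce the survey counts.

```python
def chi_square(table):
    """Chi-square statistic, degrees of freedom and expected counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[rt * ct / total for ct in col_totals] for rt in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof, expected


# Hypothetical counts: rows are clusters 1-3, columns are city, province, outside.
observed = [
    [30, 20, 10],
    [20, 20, 20],
    [10, 20, 30],
]
stat, dof, expected = chi_square(observed)
```

Comparing each cell of `observed` with the matching cell of `expected` gives statements of the kind made above, such as a cluster having more city residents than expected.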
23. After making the necessary adjustments to gain a first insight into the clusters, we went on to make sense of the data through regression analysis. The main purpose was to see which factors had the most effect on the customers' willingness to pay (WTP). It was difficult to explain the response with the variables, because during the cluster analysis we saw that no combination of factors created distinct clusters. However, the main purpose of this analysis is to understand the most prominent factors in different pricing strategies rather than to predict the future.
In order to understand customers' preferred billing method, we divided them into two separate datasets according to the A/B test and applied a separate regression analysis to each.
First, we used linear regression to evaluate the customers' WTP; however, our selected independent variables could not explain the dependent variable, the A/B test response, very well. [Appendix]
Since the linear regression results were relatively difficult to analyze, we decided to convert the A/B test responses, which are Likert scale variables treated as interval, into binary values for use in logistic regression.
In this way, our independent variables would directly explain the payment preferences of the customers from different segments. To do so, we assigned 1 to the people who gave one of the two highest ratings in the A/B test and 0 to the others.
Regression Analysis
4
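The conversion of the A/B test responses into a binary target can be sketched in a few lines of Python. It assumes the 1 to 5 Likert coding mentioned later in the deck, so the two highest choices are 4 and 5; the function name is our own.

```python
def binarize_likert(scores, top_levels=(4, 5)):
    """Map Likert answers to 1 for the two highest choices and 0 otherwise."""
    return [1 if score in top_levels else 0 for score in scores]


# Hypothetical A/B test answers on a 1-5 scale.
answers = [1, 2, 3, 4, 5, 4]
willing_to_pay = binarize_likert(answers)
```

The resulting 0/1 list is the kind of dependent variable a logistic regression expects.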
24. When deciding which regression model to use, we also used the heatmap function in R to better understand the correlations between the independent variables. The correlation output did not directly improve the regression results, but it gave us a broader understanding of our data.
Regression Analysis
4
Some variables affect each other, as we expected, but removing them from the regression did not improve it much. That is why we left all of them in the model and interpreted it in a broader sense.
25. Logistic Regression Analysis – A Test
4
Coolness(SC)/Consciousness(SC)
As we expected, coolness explains the willingness to pay well. The perceived consciousness of the product reflects the same insight as the environmental factors.
Time/# connection
Time spent in traffic increases the customer's willingness to pay, but the number of connections decreases it, because it means the distance is too long for a scooter.
Infrastructure(PT)
Well-developed public transportation reflects the welfare of its users. With these transportation habits, scooter usage will increase.
Satisfaction(PT)
Higher satisfaction with public transportation negatively affects the customer's willingness to pay, because it becomes harder to choose a scooter over public transportation.
Environmental(SC)/Environmental(PT)
Environmental variables play an important role in the customer's willingness to pay, because they are the main driver for customers to use a scooter.
26. Logistic Regression Analysis – B Test
4
Coolness(SM)/Coolness(SC)/Consciousness(SC)
As we expected, coolness explains the willingness to pay well. The perceived consciousness of the product reflects the same insight as the environmental factors.
Accessibility(PT)
The availability of public transportation limits the shift to scooters.
Touristic(PT)
An increase in the touristic use of public transportation also positively affects scooter usage, which is logical given the customer's objective.
Infrastructure(PT)
Well-developed public transportation reflects the welfare of its users. With these transportation habits, scooter usage will increase.
How often do you use public transportation?
An increase in the frequency of public transportation use decreases the intention to use a scooter.
Usage(S)
The habit of using shared mobility directly affects the customer's willingness to pay, as one would expect.
Environmental(PT)/Environmental(SC)
Environmental variables play an important role in the customer's willingness to pay, because they are the main driver for customers to use a scooter.
27. Decision Tree - A/B Test
4
We also decided to use additional tools to understand the data, since regression gives a limited perspective on willingness to pay.
The simplest but most useful one is the decision tree classifier. It gives great insight into the variables and explains the part of the data that cannot be expressed by regression.
Some of the interactions were already seen in the regression, such as Consciousness(SC), Coolness(SC) and Satisfaction(PT), which explain the left part of the tree. The right part of the tree explains the willingness to pay of different age segments and the Accessibility(PT) of public transportation.
Analyzing Test B gave us different insights about our customers that we could not obtain from the regression analysis. The media channels in Test B, namely online newspaper/magazine, on-demand radio, Twitter, YouTube and Snapchat, are more meaningful for targeting the people in this segment. We also saw the effect of gender in one of the leaves, with a small distinction that determines the difference between genders.
Test A Test B
28. 5
Evaluation of the Willingness to Pay
In analyzing the A/B test responses by cluster, we tried to assess the willingness to pay of the customers in each cluster. To do that, we summed the A/B test values (Likert scale from 1 to 5) for the clusters that answered the A/B question and analyzed them as percentages of these values. As the graphs on the right-hand side show, cluster 2 is superior to all other clusters in the A/B test. To understand the behaviour of the A/B test, we also added the cluster averages, because the numbers of responses to A and B differ (130 responses to question A and 113 to question B). Since the average of cluster 2 is higher for A than for B, we can say that customers are more willing to pay under method A and that cluster 2 is superior to the other clusters.
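The averaging step can be illustrated with a short Python sketch that computes the mean response per cluster and per question version. Using means rather than sums matters here because the two versions received different numbers of responses. The rows below are hypothetical, not survey data.

```python
from collections import defaultdict


def mean_score_by_cluster(rows):
    """Average Likert response for each (cluster, variant) pair."""
    totals = defaultdict(lambda: [0, 0])  # (cluster, variant) -> [sum, count]
    for cluster, variant, score in rows:
        totals[(cluster, variant)][0] += score
        totals[(cluster, variant)][1] += 1
    return {key: total / count for key, (total, count) in totals.items()}


# Hypothetical (cluster, question version, response) rows.
rows = [
    (1, "A", 4), (1, "A", 2), (1, "B", 3),
    (2, "A", 5), (2, "B", 4), (2, "B", 2),
]
averages = mean_score_by_cluster(rows)
```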
29. 5
Recommendation of Pricing Method
Interpretation
As the previous analysis shows, cluster 2 is the most profitable of the clusters and pricing method A is superior to the other. Cluster 2 can be named “Independents” because its members mostly live with flatmates or alone. They care most about Accessibility, Infrastructure and Convenience, and they are the majority of the respondents. They also have the highest percentage of respondents living in the Comune of Milano. There is no significant difference between the genders, so our persona will be genderless.
Recommendation
For all clusters, the average response is higher for A than for B. Pricing method A is therefore more attractive to people overall, which means that they do not want a monthly subscription. This shows that respondents tend to use the scooter for short periods, so they are willing to pay more per minute rather than pay a fixed monthly subscription and a lower amount per minute. As a result, we recommend offering the pricing and billing method in which the service costs €0,15 per minute of use.
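To put the two billing methods side by side, the following sketch compares their monthly cost using the prices from the questionnaire (method A: €0,15 per minute; method B: €8 monthly subscription plus €0,05 per minute). Amounts are kept in euro cents to avoid rounding issues; the function names are our own, and the break-even point is simple arithmetic rather than a result from the original analysis.

```python
def monthly_cost_cents(minutes, per_minute_cents, subscription_cents=0):
    """Monthly cost in euro cents for a given number of minutes of use."""
    return subscription_cents + minutes * per_minute_cents


def break_even_minutes(pay_per_use_cents, subscription_cents, reduced_cents):
    """Minutes per month at which both billing methods cost the same."""
    return subscription_cents / (pay_per_use_cents - reduced_cents)


# Method A: 15 cents per minute. Method B: 800 cents per month + 5 cents per minute.
threshold = break_even_minutes(15, 800, 5)
cost_a_light_user = monthly_cost_cents(40, 15)
cost_b_light_user = monthly_cost_cents(40, 5, 800)
```

Below the break-even usage, method A is the cheaper option for the customer, which fits the reading above that respondents expect to use the scooter for short periods.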
30. 6
Conclusion
Data Interpretation
X number of datasets were created so that the methods in the following steps could be applied easily.
22 factors were selected from 80 observations.
Factor Analysis
Discriminant Analysis
Descriptive Analysis
Cluster Analysis
Regression Analysis Evaluation of WTP and Pricing
Evaluation of A/B Test
For Test A, the Consciousness(SC), Coolness(SC) and Satisfaction(PT) factors explain the dependent variable best. For Test B, the Social Media(P) factor has a significant impact.
Segmentation and Profiling
From the K-Means cluster analysis, 4 factors for 3 clusters were selected. The segments were named Social Hommies, Independents and Traditionals based on the results of the discriminant and descriptive analyses.
Recommendation of Pricing
Independents are selected as the target segment since they have the highest willingness to pay for the first payment method, in which the service costs €0,15 per minute of use.
35. Question # | Variable Type | Question | Factors
1 | Categorical (Nominal) | Do you travel at least once a week in the urban area of Milan? |
2 | Categorical | When you travel in the urban area of Milan, what kind of transportation do you use most often? |
3 | Likert Scale (Ordinal) | What are the reasons that you most often choose public transportation? | Factors1: Environmental(PT), Accessibility(PT), Touristic(PT), Infrastructure(PT), Traffic(PT)
4 | Ordinal | What kind of ticket for public transportation do you use most often? |
5 | Ordinal | How often do you use public transportation? |
6 | Ordinal | For most of these trips, how many connections do you need to make (change metro lines, change bus lines, etc.)? |
7 | Likert Scale | On average, how satisfied are you with the public transportation? | Factors2: Satisfaction(PT)
8 | Ordinal | For most of these trips, how much time would you spend with public transportation? |
9 | Likert Scale | What are the reasons that you most often use your own car? | Factors3: Time and Comfort(C), Cool to Drive(C), Unavailability(C)
10 | Ordinal | How often do you travel with your own car in the urban area of Milan? |
11 | Ordinal | For most of these trips, how easy could you find parking? |
12 | Ordinal | For most of these trips, how much time would you spend with your own car? |
13 | Likert Scale | On average, how satisfied are you with travelling with your car? | Factors4: Satisfaction(C)
14 | Categorical | Have you ever used shared mobility in the urban area of Milan? |
15 | Likert Scale | How often do you use the following types of shared mobility? | Factors5: Type of Sharing
16 | Likert Scale | How well do the following scenarios describe your usage of shared mobility? | Factors6: Usage(S)
17 | Likert Scale | What are the reasons that motivate you to use shared mobility? | Factors7: Coolness(SM), Convenience(SM)
18 | Likert Scale | On average, how satisfied are you with shared mobility that you have used? | Factors8: Satisfaction(SM)
19 | Likert Scale | What is your opinion about using a scooter sharing? | Factors9: Coolness(SC), Consciousness(SC), Environmental(SC)
20 (A/B Test, 50.0%) | Likert Scale | A: Consider that the service costs €0,15 per minute of use. How likely are you going to try the service? |
20 (A/B Test, 50.0%) | Likert Scale | B: Consider that the service requires a monthly subscription of €8, and costs €0,05 per minute of use. How likely are you going to try the service? |
21 | Categorical | Your gender is: |
22 | Ordinal | Your age is? |
23 | Categorical | Where do you live? |
24 | Categorical | What is your living situation? |
25 | Likert Scale | How often do you use the following media channels? | Factors10: Traditional(P), News(P), Informative Network(P), Social Media(P)
7
Appendix
36. 4 Factor Analysis
Factor | Social Hommies | Independents | Traditionals
Environmental(PT) | high | medium | medium
Accessibility(PT) | high | high | high
Touristic(PT) | high | low | medium
Infrastructure(PT) | high | high | high
Traffic(PT) | medium | medium | medium
Satisfaction(PT) | high | medium | medium
Type of sharing(SM) | low | low | low
Usage(SM) | medium | medium | low
Coolness(SM) | high | low | medium
Convenience(SM) | medium | high | low
Satisfaction(SM) | medium | medium | medium
Coolness(SC) | medium | medium | medium
Consciousness(SC) | high | low | low
Environmental(SC) | medium | medium | low
Traditional(P) | medium | high | medium
News(P) | medium | medium | medium
Informative(P) | medium | medium | medium
Social Network(P) | high | high | high
7
Appendix