
Marketing Analysis of Scooter (Monopattino) Sharing in Milan


Emre Danışan, Giovanni Roja, Mehmet Berk Souksu and Aslı Şenel carried out this marketing analysis of scooter (monopattino) sharing services in the city of Milan in late 2019.

The analysis draws on extensive research into shared mobility services across Europe and applies several machine learning methods to the survey data.

Published in: Marketing

  1. Scooter (Monopattino) Sharing Service. Analysis of responses to the questionnaire for marketing purposes, using machine learning. Emre Danışan, Giovanni Roja, Mehmet Berk Souksu, Aslı Şenel
  2. Agenda
     1 Introduction: Context of the Project
     2 Data Preparation: Understanding Data, Cleaning Data, Factor Analysis
     3 Segmentation and Profiling: Cluster Analysis, Discriminant Analysis, Descriptive Analysis
     4 Regression Analysis: Logistic Regression, Decision Tree
     5 Recommendation of Pricing
     6 Conclusion
     7 Appendix
  3. Context of the Project (section 1). Launching a scooter (monopattino) sharing service in Milan. Target population: university students in Milan. The objectives are:
     • Understand the potential customers
     • Segment the potential customers and uncover insights about them
     • Evaluate customers' willingness to pay and recommend a pricing and billing method
  4. Understanding the Data (section 2). The survey consists of 25 questions, some of which include sub-questions rated on a Likert scale. There are 353 data rows, of which 272 were completed. The data contained incomplete rows and some mistakes; the following slides explain how we handled them. While analyzing the data, we found 4 main sections, dedicated to public transportation, car, shared mobility and scooter sharing. Two of them, public transportation users (PT) and car owners (C), form two parallel paths of questions in the survey. Both paths merge in the shared mobility section (SM) and continue to the scooter (SC) and personal (P) questions.
  5. Cleaning the Data - Questionnaire (section 2). First we separated the variables into two dataframes, one per path, and then removed question 2, which only splits respondents between the paths. The first question makes respondents exit the survey directly if they answer no, so we removed that question's column and the observations that exited. Question 20 is an A/B test, so we also separated the observations by the question variant each respondent viewed. To avoid rows with “Not Available” values, we created a sub-datasheet for each part of the questionnaire:
     • Public_Trans: Public Transportation Users
     • Car: Car owners
     • Shared: Shared Mobility
     • Scooter: Scooter and Personal questions
     • Public_Shared: Public Transportation & Shared Mobility
  6. Cleaning the Data - Excel Sheet (section 2). We identified 9 metadata columns that are irrelevant to our data analysis and removed them from the original data: Respondent ID | Collector ID | Start Date | End Date | IP Address | Email Address | First Name | Last Name | Custom Data 1. Since the first two rows were both used for header information, we removed one of them and put the necessary names on the second row. We added 11 to the age values so that they show real ages. The 14th question had no values filled in, so we wrote a for loop that corrects this by looking at the answers to the 15th question. We also recoded the answers of `Question Viewed` to 0 and 1, to simplify the data and make all of it numerical.
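The cleaning steps on this slide can be sketched roughly as follows. This is a minimal Python illustration, not the authors' script, which the slides do not show. The column names (`age_code`, `question_viewed`, `q14_used_shared`, `q15_usage`), the A/B encoding and the rule used to fill question 14 from question 15 are all assumptions made for the example.

```python
AGE_OFFSET = 11  # stated in the slides: 11 is added to the coded age


def clean_row(row):
    """Return a cleaned copy of one survey row (a dict with hypothetical keys)."""
    cleaned = dict(row)
    # Convert the coded age into a real age.
    cleaned["age"] = row["age_code"] + AGE_OFFSET
    del cleaned["age_code"]
    # Assumed encoding: variant "A" becomes 0, variant "B" becomes 1.
    cleaned["question_viewed"] = {"A": 0, "B": 1}[row["question_viewed"]]
    # Assumed rule: any Q15 frequency above 1 ("never") implies prior use.
    answered = [value for value in row["q15_usage"] if value is not None]
    cleaned["q14_used_shared"] = 1 if any(value > 1 for value in answered) else 0
    return cleaned


# Invented example rows, only to show the shape of the data.
rows = [
    {"age_code": 9, "question_viewed": "A", "q14_used_shared": None, "q15_usage": [1, 3, 1]},
    {"age_code": 12, "question_viewed": "B", "q14_used_shared": None, "q15_usage": [1, 1, None]},
]
cleaned_rows = [clean_row(row) for row in rows]
```

The loop over rows mirrors the "for loop" mentioned on the slide; the actual condition the authors used to derive question 14 is not given.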
  7. Factor Analysis (section 2). We factored variables that belong to the same segment and the same question, so that they share a context and the correlations between variables are easier to interpret. To select the number of factors we checked the eigenvalues, as shown in the graph. We prioritized eigenvalues above 1; however, depending on the elbow point and on which combinations of variables made sense, we also accepted some values above 0.90. Factor analysis was performed on the data related to Public Transportation (PT), Shared Mobility (SM), Car (C) and Personal (P).
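The eigenvalue rule described on this slide can be illustrated with a short Python sketch. The function name `retain_factors` and the eigenvalues are invented for the example; only the thresholds 1 and 0.90 come from the slide. Borderline factors are returned separately because the slide says they were accepted case by case, based on the elbow point and interpretability.

```python
def retain_factors(eigenvalues, kaiser=1.0, relaxed=0.90):
    """Split eigenvalues into those kept outright and borderline candidates."""
    ordered = sorted(eigenvalues, reverse=True)
    # Eigenvalues above 1 are prioritized.
    kept = [value for value in ordered if value > kaiser]
    # Values between 0.90 and 1 need a manual judgment call.
    borderline = [value for value in ordered if relaxed < value <= kaiser]
    return kept, borderline


eigenvalues = [3.2, 1.6, 1.1, 0.93, 0.7, 0.4]  # invented values
kept, borderline = retain_factors(eigenvalues)
print(len(kept), "factors kept;", len(borderline), "borderline")
```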
  8. Factor Analysis (section 2). Example of factor: Public Transportation. Other factor separations can be seen in [Appendix].
  9. Segmentation and Profiling (section 3). The main objectives of this section are:
     • Homogeneity inside clusters
     • Heterogeneity between clusters
     • Meaningful cluster sizes
     • Evaluation of the meaningfulness of differences and sizes
     As a first step, K-Means cluster analysis was performed on the factors from the factor analysis, which are in Likert-scale form. Even though Likert-scale variables are ordinal by nature, for this marketing research they are treated as interval data.
  10. K-Means Clustering Analysis (section 3). First, combinations of 2 to 5 factors were analyzed, each for 2 to 6 clusters. In the decision process, the scatter plot of each factor combination was interpreted for the respective number of clusters. To see how the clusters differ, boxplots of the combinations were evaluated. To assess whether the differences are meaningful, t-tests were examined and group means were compared. Finally, the size of each cluster was checked to see whether it is large enough to be a target segment. 18 combinations of 2 factors were evaluated; the Coolness(SM) and Convenience(SM) factors were selected since they gave the best result in the process. We also evaluated them with Tukey HSD, Bartlett, Levene and one-way tests to understand whether the clusters show normality of distribution and consistency of variance. After the 2-factor combinations, combinations of 3 and 4 factors were also evaluated. In the end the following 4 factors were selected, since they gave the best test results. The tests applied to the data and their results:

     | Factor Name | Anova Test | Tukey HSD | Bartlett | Levene | Oneway Test | Result |
     |---|---|---|---|---|---|---|
     | Touristic(PT) | H0 is accepted | H0 is accepted | H0 is rejected | H0 is accepted | H0 is accepted | Bad |
     | Coolness(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good |
     | Convenience(SM) | H0 is rejected | H0 is accepted for 1 factor | H0 is rejected | H0 is rejected | H0 is rejected | Good |
     | Environmental(SC) | H0 is rejected | H0 is rejected | H0 is accepted | H0 is accepted | H0 is rejected | Good |
  11. K-Means Cluster Analysis (section 3). After evaluating the trials in terms of tests, sizes and meaningfulness, we decided to use the factors Touristic(PT), Coolness(SM), Convenience(SM) and Environmental(SC) with 3 clusters of sizes 97, 90 and 58. The result of the analysis for the selected factors can be seen in the graph on the right-hand side. From the K-Means cluster analysis we gained the insight that cluster 1 has very high interest in the Touristic(PT) and Coolness(SM) factors, cluster 2 has very high interest in Convenience(SM), and cluster 3 has very low interest in Environmental(SC). Clusters 2 and 3 both have high interest in the accessibility of public transportation, infrastructure and social networks (Facebook and Instagram), and both have low interest in the type of sharing of shared mobility. In addition, cluster 2 has high interest in the convenience of shared mobility and in traditional media, whereas cluster 3 has low interest in convenience. [Appendix] Based on the interests of each cluster we gave them segment names: Cluster 1: Social Hommies. Cluster 2: Independents. Cluster 3: Traditionals.
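For readers unfamiliar with K-Means, the assignment and update steps behind the clustering above can be sketched in plain Python. This is an illustration only: the factor scores are invented, only two factors are used instead of the four selected on the slide, and the starting centroids are fixed by hand rather than chosen by the statistical package the authors used.

```python
from collections import Counter


def kmeans(points, centroids, iterations=20):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    labels = []
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(len(centroids)),
                key=lambda k: sum((p - c) ** 2 for p, c in zip(point, centroids[k])),
            )
            for point in points
        ]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for k, old in enumerate(centroids):
            members = [pt for pt, label in zip(points, labels) if label == k]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(old)  # an empty cluster keeps its centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids


# Invented (Coolness, Convenience) factor scores for six respondents.
points = [(4.5, 1.0), (4.8, 1.2), (1.0, 4.6), (1.2, 4.9), (1.1, 1.0), (0.9, 1.3)]
start = [(5.0, 1.0), (1.0, 5.0), (1.0, 1.0)]  # hand-picked starting centroids
labels, final_centroids = kmeans(points, start)
cluster_sizes = Counter(labels)
```

Checking `cluster_sizes` corresponds to the size check on the slide: a cluster that is too small is not a useful target segment.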
  12. Other Cluster Analysis (section 3). As a second step, hierarchical cluster analysis was performed to see whether the ordinal variables by themselves and the categorical variables by themselves also affect cluster differentiation. The graph on the left shows that cutting above a height of 40 gives 3 meaningful clusters, whereas cutting below a height of 40 gives 4. Even though this method does not give a clear idea of the number of clusters, we prefer to cut above height 40, since the distances between the lines are clearer there. This also double-checks our K-Means result: the hierarchical analysis likewise gives 3 clusters. We also performed latent class cluster analysis with the same factors used in the K-Means analysis, to compare the results and check the consistency of our clusters. Since some of the variables are ordinal, we could not use them in the K-Means method. We also decided to use latent class analysis to see how variables such as age and gender affect the clusters. However, as in the previous analysis, adding covariates made the analysis more complex and difficult to interpret. Therefore, latent class analysis did not give us significant results.
  13. Descriptive Analysis (section 3). After dividing the respondents into clusters, it is fundamental to analyze the clusters in order to evaluate their characteristics and how they differ. This leads to a better understanding of the market segment to be approached. This section discusses the relations between the variables and further analyzes the characteristics of the previously created clusters. Each type of variable requires a different approach, so the analysis is divided into: ● Categorical ● Ordinal ● Interval. Public Transportation: First of all, we analyzed the respondents' overall satisfaction with public transportation, in order to exploit its weaknesses and strengths. The conclusion from this table is that customers are: 1. Unhappy with Punctuality and Reliability 2. Satisfied with Cost and Availability
  14. Descriptive Analysis - Gender (section 3). Categorical Data: The categorical variables in the data were gender, age, living situation and living location. A graphical approach was used to look at the characteristics of the clusters. Gender: Our 3 clusters were broken down by gender composition to see whether there was any major difference between them. As the figures below show, the clusters are well balanced in terms of males and females.
  15. Descriptive Analysis - Age (section 3). Age: Secondly, the age variable is analyzed by cluster, keeping in mind that the age range ran from "16 and below" to "30 and above". None of the clusters contained people aged 19 or below, which is why these ages do not appear in the following graphs and why there is no need to address these younger people in the marketing campaign. Figure: Age Composition by Clusters
  16. Descriptive Analysis - Living Location (section 3). Living Location: The survey also recorded where the individuals in each cluster live, which makes it possible to analyze this variable as well and to see which areas could be explored in the marketing campaign. Living Situation: Finally, the last categorical variable analyzed for the clusters is their living situation.
  17. Descriptive Analysis - A/B Test (section 3). A/B response analysis: The clusters also responded differently when asked about payment methods. Question A was about paying a fee to use the scooter, and question B was about paying a subscription and then a reduced fee. The result was that Cluster 1 is more willing to pay when asked question A than question B. The same holds for Cluster 2, whereas Cluster 3 is not willing to pay for the scooter at all.
  18. Descriptive Analysis for Combinations (section 3). Besides the characteristics of the clusters, it is also fundamental to analyze the relations among the variables. Once again the approach depends on the type of data, so three different analyses are performed: ○ Contingency table ○ Multi-group boxplot ○ Scatter plot. Contingency Tables: Categorical - Categorical Variables. Where and how people live: The first analysis checked where people live in relation to how they live. This gives better insight into the living situation of the survey population. Most of the population lives inside the city of Milan and shares a residence with flatmates. This result is expected, as the survey targeted university students. Also important are the individuals who live in another province with their families: even though they study at a university in Milan, a high percentage of respondents still live with their family. Which people have already used shared mobility: Secondly, we wanted to know which people have already used shared mobility. This matters for deciding whether the marketing campaign needs to create awareness of this type of transportation. From these two tables we can see that most people have already used shared mobility but, surprisingly, a great number never have. For instance, 1/3 of the respondents who live inside the city of Milan have never used shared mobility, and the share rises to 72% among those living outside the Province of Milan. The second table also shows that most of the people who use shared mobility share their residence with friends.
  19. Descriptive Analysis for Combinations (section 3). Multi-Group Box Plot: Interval - Categorical Variables. Since most of the variables were interval, our group chose the ones that had important relations with each other and would help most in understanding the survey. Where people are satisfied with public transportation: The first analysis looked at the relation between people's satisfaction with public transportation and where they live, using the previously created factor for satisfaction with public transportation. The figure shows that people who live inside the city of Milan are more satisfied with public transportation than people from anywhere else. The second figure shows that people from inside the city of Milan use public transportation much more than those from outside the city. Finally, the last figure shows that people from Milan and the Province of Milan use shared mobility far more frequently than people from outside the Province, meaning that this could be the area to approach in the marketing campaign.
  20. Descriptive Analysis for Combinations (section 3). Scatter Plot: Interval - Interval. Last but not least, scatter plots were used to evaluate how interval variables relate to each other. Using the same approach, the most important variables were evaluated, and a correlation test was performed to see whether the variables were actually correlated. For example, the factors Accessibility of public transportation and Consciousness of scooter usage are significantly correlated: the p-value is below the 0.05 threshold, although the correlation itself is weak at 0.1289.
  21. Discriminant Analysis (section 3). With the discriminant analysis we verify how different the clusters are. Chi-square test: used to compare categorical variables among clusters. Gender: Since the p-value is greater than 0.05, H0 is accepted, which means that the gender composition is the same across the groups. Age: The same holds for age; the chi-square analysis shows that H0 is accepted as well, so the clusters do not differ in terms of age. Living Location: Unlike the previous variables, living location has a p-value lower than the threshold, which means that the clusters differ in this characteristic.
  22. Discriminant Analysis (section 3). The chi-square analysis also shows whether the clusters follow the expected counts. For instance, cluster 1 has as many people living in the city of Milan as expected; on the other hand, it has more individuals from the province of Milan than expected, and fewer from outside the province. Cluster 2 has slightly more people from the city than expected, but fewer from the province. Lastly, cluster 3 closely follows the expected counts. Living Situation: The last variable, which also shows a difference between clusters, is the living situation. Cluster 1: more people who live with parents than expected. Cluster 2: more people who live with friends and fewer who live with parents than expected. Cluster 3: size as expected.
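The expected counts and the chi-square statistic used in the discriminant analysis can be computed by hand, as in this Python sketch. The contingency table is invented (rows are clusters, columns are living locations) and does not reproduce the survey data. The critical value 9.488 is the standard 5% threshold for 4 degrees of freedom; no p-value is computed, to keep the example free of external libraries.

```python
def chi_square(observed):
    """Return (statistic, degrees_of_freedom, expected) for a contingency table."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total.
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    statistic = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    dof = (len(row_totals) - 1) * (len(col_totals) - 1)
    return statistic, dof, expected


# Invented counts: 3 clusters (rows) by 3 living locations (columns).
observed = [[30, 20, 10], [20, 20, 20], [10, 20, 30]]
statistic, dof, expected = chi_square(observed)

CRITICAL_5_PERCENT_DOF_4 = 9.488  # chi-square critical value, alpha = 0.05, 4 dof
reject_h0 = statistic > CRITICAL_5_PERCENT_DOF_4
```

Comparing each observed cell with its entry in `expected` gives the "more than expected" and "fewer than expected" readings reported per cluster on the slide.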
  23. Regression Analysis (section 4). After making the necessary adjustments to get a first insight into the clusters, we moved on to making sense of the data through regression analysis. The main purpose was to see which factors had the most effect on customers' willingness to pay (WTP). It was difficult to explain the response with the variables because, during the cluster analysis, we saw that no matter how we combined the factors, they would not create distinct clusters. However, the main purpose of this analysis is to understand the most prominent factors under different pricing strategies rather than to predict the future. To understand customers' preferred billing method, we divided them into two separate datasets according to the A/B test and applied a separate regression analysis to each. First we used linear regression to evaluate customers' WTP; however, our selected independent variables could not explain the dependent variable, the A/B test response, very well. [Appendix] Since the linear regression results were relatively difficult to analyze, we decided to convert the A/B test variables, which are Likert-scale and therefore treated as interval variables, into binary values for use in logistic regression. In this way, our independent variables would directly explain the payment preferences of customers from different segments. To do so, we assigned 1 to the people who gave one of the two highest choices in the A/B test, and 0 to the others.
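The binarization step described on this slide is small enough to show directly. In this Python sketch the function name `binarize_likert` is hypothetical; the 1 to 5 scale is taken from the A/B test description elsewhere in the slides.

```python
def binarize_likert(score, scale_max=5):
    """Return 1 for the two highest choices on the scale, otherwise 0."""
    return 1 if score >= scale_max - 1 else 0


responses = [1, 2, 3, 4, 5]  # every possible answer on the 1 to 5 scale
binary = [binarize_likert(score) for score in responses]
print(binary)
```

The resulting 0/1 column is what a logistic regression would take as its dependent variable.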
  24. Regression Analysis (section 4). When deciding which regression model to use, we also took advantage of the heatmap function in R to better understand the correlations between independent variables. The correlation output did not directly improve the regression results, but it gave a broader understanding of our data. Some variables affect each other, as we expected, but removing them from the regression did not improve it much. That is why we left all of them in the model and interpreted it in a broader sense.
  25. Logistic Regression Analysis - A Test (section 4).
     • Coolness(SC)/Consciousness(SC): The coolness phenomenon explains willingness to pay well, as we expected. The perceived consciousness of the product reflects the same insight as the environmental factors.
     • Time/# connections: Time spent in traffic increases customers' willingness to pay, but the number of connections decreases it, because it means the distance is too long for a scooter.
     • Infrastructure(PT): Well-developed public transportation indicates the welfare of users. With this kind of transportation habits, scooter usage will increase.
     • Satisfaction(PT): Higher satisfaction with public transportation affects willingness to pay negatively, because it becomes harder to choose a scooter over public transportation.
     • Environmental(SC)/Environmental(PT): Environmental variables play an important role in willingness to pay, because they are the main driver for customers to use a scooter.
  26. Logistic Regression Analysis - B Test (section 4).
     • Coolness(SM)/Coolness(SC)/Consciousness(SC): The coolness phenomenon explains willingness to pay well, as we expected. The perceived consciousness of the product reflects the same insight as the environmental factors.
     • Accessibility(PT): Availability of public transportation limits the shift to scooters.
     • Touristic(PT): An increase in touristic use of public transportation also positively affects scooter usage, which is logical given the customer's objective.
     • Infrastructure(PT): Well-developed public transportation indicates the welfare of users. With this kind of transportation habits, scooter usage will increase.
     • How often do you use public transportation?: More frequent use of public transportation decreases the intention to use a scooter.
     • Usage(S): The habit of using shared mobility directly affects willingness to pay, as one would expect.
     • Environmental(PT)/Environmental(SC): Environmental variables play an important role in willingness to pay, because they are the main driver for customers to use a scooter.
  27. Decision Tree - A/B Test (section 4). We also decided to use additional tools to understand the data, since regression gives a limited perspective on willingness to pay. The simplest but most useful one is the Decision Tree Classifier. It gives great insight into the variables and explains the part of the data that cannot be expressed by regression. Some of the interactions were already seen in the regression, such as Consciousness(SC), Coolness(SC) and Satisfaction(PT), which explain the left part of the tree. The right part of the tree explains the willingness to pay of different age segments and Accessibility(PT). The analysis of Test B gave us different insights about our customers that we could not obtain from the regression analysis. The media channels that appear in Test B, namely online newspapers/magazines, on-demand radio, Twitter, YouTube and Snapchat, are more meaningful for targeting the people in this segment. We also saw the effect of gender in one of the leaves, a small distinction that determines the difference between genders. Figures: Test A, Test B
  28. Evaluation of the Willingness to Pay (section 5). In the analysis of the A/B test responses by cluster, we tried to assess the willingness to pay of the customers in each cluster. To do that, we summed all A/B test values (Likert scale from 1 to 5) for each cluster among the respondents who answered the A/B question, and analyzed them as percentages. As the graphs on the right-hand side show, cluster 2 is superior to all other clusters in the A/B test. To understand the behaviour of the A/B test, we also added the cluster averages, because the number of responses differs between the variants (130 responses to question A and 113 to question B). Since the average of cluster 2 is higher for A than for B, we can say that customers are more willing to pay under method A, and that cluster 2 is superior to the other clusters.
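Because the two variants received different numbers of responses, averages are a fairer comparison than sums. A minimal Python sketch of that per-cluster, per-variant average follows; the records are invented and the function name `mean_wtp` is hypothetical.

```python
from collections import defaultdict


def mean_wtp(records):
    """Average Likert response for each (cluster, variant) pair."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cluster, variant, score in records:
        sums[(cluster, variant)] += score
        counts[(cluster, variant)] += 1
    return {key: sums[key] / counts[key] for key in sums}


# Invented (cluster, variant, score) records on the 1 to 5 scale.
records = [
    (1, "A", 3), (1, "A", 4), (1, "B", 2),
    (2, "A", 5), (2, "A", 4), (2, "B", 3), (2, "B", 4),
    (3, "A", 1), (3, "B", 1),
]
averages = mean_wtp(records)
best_cluster_for_a = max((1, 2, 3), key=lambda cluster: averages[(cluster, "A")])
```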
  29. Recommendation of Pricing Method (section 5). Interpretation: As the previous analysis shows, cluster 2 is the most profitable cluster and pricing method A is superior to the other. Cluster 2 can be named “Independents” because its members mostly live with flatmates or alone. They care most about Accessibility, Infrastructure and Convenience, and they are the majority of the respondents. They also have the highest percentage of respondents living in the Comune of Milano. There is no significant difference between the genders, so our persona will be genderless. For all clusters the average response is higher for A than for B. Therefore pricing method A is more attractive to people overall, which means that they do not want a monthly subscription. This shows that respondents tend to use scooters for short periods, so they are willing to pay more per minute rather than pay a fixed monthly subscription plus a lower per-minute rate. Recommendation: We recommend offering the pricing and billing method in which the service costs €0,15 per minute of use.
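The two billing methods from question 20 of the questionnaire (€0,15 per minute, versus an €8 monthly subscription plus €0,05 per minute) can be compared with a small calculation. This Python sketch is not part of the original analysis; it works in integer cents to avoid rounding issues, and the monthly usage figures are invented.

```python
PAY_PER_MINUTE_CENTS = 15      # variant A: 0,15 euro per minute
SUBSCRIPTION_FEE_CENTS = 800   # variant B: 8 euro per month
SUBSCRIPTION_RATE_CENTS = 5    # variant B: 0,05 euro per minute


def monthly_cost_a(minutes):
    """Monthly cost in cents under pay-per-minute pricing."""
    return PAY_PER_MINUTE_CENTS * minutes


def monthly_cost_b(minutes):
    """Monthly cost in cents under subscription pricing."""
    return SUBSCRIPTION_FEE_CENTS + SUBSCRIPTION_RATE_CENTS * minutes


# Both methods cost the same where 15 * m = 800 + 5 * m.
break_even_minutes = SUBSCRIPTION_FEE_CENTS / (PAY_PER_MINUTE_CENTS - SUBSCRIPTION_RATE_CENTS)

for minutes in (30, 120):  # invented monthly usage levels
    print(minutes, monthly_cost_a(minutes), monthly_cost_b(minutes))
```

Under these prices a rider who uses the scooter for less than the break-even number of minutes per month pays less with method A, which is consistent with the slide's reading that respondents expect short usage periods.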
  30. Conclusion (section 6). Data Interpretation: X number of datasets were created to make the methods in the following steps easier to apply. 22 factors were selected from 80 observations. Analysis steps: Factor Analysis, Discriminant and Descriptive Analysis, Cluster Analysis, Regression Analysis, Evaluation of WTP and Pricing. Evaluation of A/B Test: For Test A, the Consciousness(SC), Coolness(SC) and Satisfaction(PT) factors explain the dependent variable better. For Test B, the Social Media(P) factor has a significant impact. Segmentation and Profiling: From the K-Means cluster analysis, 4 factors and 3 clusters were selected. The segments were named Social Hommies, Independents and Traditionals based on the results of the discriminant and descriptive analysis. Recommendation of Pricing: Independents are selected as the target segment, since they have the highest willingness to pay for the first payment method, in which the service costs €0,15 per minute of use.
  31. Appendix (section 7)
  32. Appendix (section 7)
  33. Appendix (section 7)
  34. Appendix (section 7)
  35. Appendix (section 7). Questionnaire, variable types and factors:

     | Question # | Variable Type | Question | Factors |
     |---|---|---|---|
     | 1 | Categorical (Nominal) | Do you travel at least once a week in the urban area of Milan? | |
     | 2 | Categorical | When you travel in the urban area of Milan, what kind of transportation do you use most often? | |
     | 3 | Likert Scale (Ordinal) | What are the reasons that you most often choose public transportation? | Factors1: Environmental(PT), Accessibility(PT), Touristic(PT), Infrastructure(PT), Traffic(PT) |
     | 4 | Ordinal | What kind of ticket for public transportation do you use most often? | |
     | 5 | Ordinal | How often do you use public transportation? | |
     | 6 | Ordinal | For most of these trips, how many connections do you need to make (change metro lines, change bus lines, etc.)? | |
     | 7 | Likert Scale | On average, how satisfied are you with the public transportation? | Factors2: Satisfaction(PT) |
     | 8 | Ordinal | For most of these trips, how much time would you spend with public transportation? | |
     | 9 | Likert Scale | What are the reasons that you most often use your own car? | Factors3: Time and Comfort(C), Cool to Drive(C), Unavailability(C) |
     | 10 | Ordinal | How often do you travel with your own car in the urban area of Milan? | |
     | 11 | Ordinal | For most of these trips, how easy could you find parking? | |
     | 12 | Ordinal | For most of these trips, how much time would you spend with your own car? | |
     | 13 | Likert Scale | On average, how satisfied are you with travelling with your car? | Factors4: Satisfaction(C) |
     | 14 | Categorical | Have you ever used shared mobility in the urban area of Milan? | |
     | 15 | Likert Scale | How often do you use the following types of shared mobility? | Factors5: Type of Sharing |
     | 16 | Likert Scale | How well do the following scenarios describe your usage of shared mobility? | Factors6: Usage(S) |
     | 17 | Likert Scale | What are the reasons that motivate you to use shared mobility? | Factors7: Coolness(SM), Convenience(SM) |
     | 18 | Likert Scale | On average, how satisfied are you with shared mobility that you have used? | Factors8: Satisfaction(SM) |
     | 19 | Likert Scale | What is your opinion about using a scooter sharing? | Factors9: Coolness(SC), Consciousness(SC), Environmental(SC) |
     | 20 | Likert Scale | Variant A: Consider that the service costs €0,15 per minute of use. How likely are you going to try the service? Variant B: Consider that the service requires a monthly subscription of €8, and costs €0,05 per minute of use. How likely are you going to try the service? | A/B Test (50.0%) |
     | 21 | Categorical | Your gender is: | |
     | 22 | Ordinal | Your age is? | |
     | 23 | Categorical | Where do you live? | |
     | 24 | Categorical | What is your living situation? | |
     | 25 | Likert Scale | How often do you use the following media channels? | Factors10: Traditional(P), News(P), Informative Network(P), Social Media(P) |
  36. Appendix (section 7). 4-factor analysis, interest level by segment:

     | Factor | Social Hommies | Independents | Traditionals |
     |---|---|---|---|
     | Environmental(PT) | high | medium | medium |
     | Accessibility(PT) | high | high | high |
     | Touristic(PT) | high | low | medium |
     | Infrastructure(PT) | high | high | high |
     | Traffic(PT) | medium | medium | medium |
     | Satisfaction(PT) | high | medium | medium |
     | Type of sharing(SM) | low | low | low |
     | Usage(SM) | medium | medium | low |
     | Coolness(SM) | high | low | medium |
     | Convenience(SM) | medium | high | low |
     | Satisfaction(SM) | medium | medium | medium |
     | Coolness(SC) | medium | medium | medium |
     | Consciousness(SC) | high | low | low |
     | Environmental(SC) | medium | medium | low |
     | Traditional(P) | medium | high | medium |
     | News(P) | medium | medium | medium |
     | Informative(P) | medium | medium | medium |
     | Social Network(P) | high | high | high |
  37. Appendix (section 7)
  38. Appendix (section 7)
  39. Appendix (section 7)
  40. Appendix (section 7)