Become a citizen data scientist

Understand your customer:

Profiling, Segmentation, Targeting and Recommendation using
Microsoft Azure ML, SQL, Power BI

Published in: Data & Analytics

  1. Become a Citizen Data Scientist: Marketing Perspective ©uluumy, 2016
  2. Understand your customer: Profiling, Segmentation, Targeting and Recommendation using Microsoft Azure ML, SQL, Power BI
  3. Take a look at our course: 50% off "Become a Citizen Data Scientist"
  4. Syllabus ▪ Introduction ▪ Lay the foundation ▪ Explore ▪ Segment ▪ Target ▪ Recommend
  5. Introduction
  6. Citizen Data Scientist
  7. According to a McKinsey study, demand for data scientists is projected to exceed supply by more than 50% by 2018. Source: McKinsey, "Big data: The next frontier for innovation, competition, and productivity", 2011
  8. The term Citizen Data Scientist was introduced by Gartner in its 2015 Hype Cycle for Emerging Technologies, which we are going to present later in this lecture. Here is the definition given by Gartner: “A person who creates or generates models that leverage predictive or prescriptive analytics but whose primary job function is outside of the field of statistics” Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015
  9. Gartner Hype Cycle: The Gartner Hype Cycle provides a graphic representation of the maturity and adoption of technologies and applications, and how they are potentially relevant to solving real business problems and exploiting new opportunities.
  10. Gartner’s Emerging Technologies Hype Cycle contains a representative set of still-maturing technologies that receive interest from clients, and technologies that Gartner feels are significant and should be monitored. Gartner predicts that Citizen Data Scientist and Advanced Analytics with Self-Service Delivery will reach the Plateau of Productivity in 2 to 5 years.
  11. Source: GARTNER, "Hype Cycle for Emerging Technologies", 2015
  12. Why do we need citizen data scientists? ▪ First, the shortage of data scientists: the role requires highly specialized skills (computer science, coding, mathematics, machine learning, statistics) on top of in-depth knowledge of the business ▪ Second, the rise of self-service data preparation ▪ Third, the development of advanced analytics platforms (Microsoft Machine Learning, IBM Watson, ...)
  13. The need is such that, according to Gartner, by 2017 the number of citizen data scientists will grow 5 times faster than the number of highly skilled data scientists.
  14. Who could be a Citizen Data Scientist? In most companies, they are already there. They have: ▪ Solid business domain knowledge (marketing, finance, sales, operations, ...) ▪ An analytical mindset ▪ The willingness to learn new methods and use new tools
  15. Benefits: The rise of the Citizen Data Scientist is a great opportunity for every organization, because business people bring with them: ▪ Contextual knowledge ▪ The democratization of analytics in every department
  16. Lay the Foundations: Definitions. Data Science is not equal to Big Data…
  17. “You don’t have to have a petabyte of data and the expenses that come along with it in order to predict the interest of your customer base” Source: John W. Foreman, Data Smart: Using Data Science to Transform Information into Insight
  18. Data Science: “Data Science is the transformation of data using mathematics and statistics into valuable insight, decisions, and products” Source: Introduction to Machine Learning, 2nd Edition, MIT Press
  19. Machine learning: “The goal of machine learning is to program computers to use example data or past experience to solve a given problem.” Source: Introduction to Machine Learning, 2nd Edition, MIT Press
  20. Machine Learning: 2 (main) categories 1- Supervised Learning: Prediction. You want to predict unknown answers from answers you already have. 2- Unsupervised Learning: Categorization. You want to find unknown answers, mostly groupings, directly from data.
  21. Supervised Learning: Supervised learning can be separated into two general categories of algorithms: • Classification: used to predict categorical responses. Examples: ▪ Credit card fraud detection ▪ Customers likely to churn ▪ Customer targeting • Regression: used to predict a continuous variable. Example: predicting the future sales of a product
  22. Unsupervised Learning: Categorization. You want to find unknown answers, mostly groupings, directly from data. Examples: • Customer segmentation • Recommendation system
  23. Data Science Process: According to a KDnuggets poll, 43% of advanced analytics projects use the CRISP-DM methodology
  24. CRISP-DM Methodology. Source: Wikipedia
  25. The process is composed of 6 steps. The key point to note is that the process is circular rather than linear: we can and should go back and forth between the steps. Source: CRISP-DM: Cross Industry Standard Process for Data Mining http://spss.ch/upload/1107356429_CrispDM1.0.pdf
  26. 1- Business Understanding: The first step is BUSINESS UNDERSTANDING. It is the most critical step of the process. You need to frame the problem. At the end of this stage you should have a deep understanding of the problem you want to solve and a clear idea of the data you will use.
  27. 2- Data Understanding: The second step is DATA UNDERSTANDING. Your business knowledge will help you contextualize your data. Notice that the Business Understanding and Data Understanding steps are linked by a double arrow.
  28. 3- Data Preparation: The third step is DATA PREPARATION. In this stage you check for common issues such as missing values and outliers, and you perform operations such as filtering, merging and transformation. You also run some data exploration using graphics and tables.
  29. 4- Modeling: The fourth step is the MODELING step. In this stage you build your model (for example a regression or a classification). Notice that this step is linked to the previous one by a double arrow, which means that you will often need to step back to Data Preparation.
  30. 5- Evaluation: The next step is EVALUATION. Every model you build has to be evaluated in terms of accuracy, robustness and deployability. Notice that at this stage you may have to step back to the Business Understanding stage if the model you have built cannot be deployed.
  31. 6- Deployment: The last step is DEPLOYMENT. The final purpose of any data science project is to deliver actionable insight.
  32. Data Science Toolbox
  34. • Data preparation: SQL, a must-know tool • 60% of data scientists said they spent the most time cleaning and organizing data* • SQL: first among the top 10 in-demand skills for data scientists* • Data visualization: Power BI and Excel • Data analysis: Azure Machine Learning • 55% of data scientists think that Machine Learning had significant importance for their companies* (* Source: CrowdFlower, 2016 Data Science Report)
  35. CRISP-DM & Data Science Toolbox. Source: Wikipedia
  36. Data Preparation: The first tool is SQL, THE language of databases. According to the CrowdFlower 2016 Data Science Report, 60% of data scientists said they spent the most time cleaning and organizing data. You will need SQL to carry out the data preparation step, so it is a must-know tool.
  37. Data Visualization & Data Analysis • Data Visualization: we will use Power BI • Data Analysis: Microsoft Azure ML is one of the most relevant tools for a citizen data scientist because it lets you create machine learning experiments quickly and has a gentle learning curve.
  38. Microsoft Azure Machine Learning: «While machine learning has been around for a long time, usage was primarily restricted to people with deep skills and deep pockets. The cloud changes this dynamic completely» Joseph Sirosh, Corporate Vice President, Data Group at Microsoft
  39. Azure Machine Learning Workflow: At a high level we can divide it into 3 blocks: Data, Machine Learning Services, and Visualisation. You can clearly see how this workflow can be embedded into the CRISP-DM process presented in a previous lecture. Source: Microsoft
  40. Marketing Framework Analysis: About 90% of the data collected by companies today is related to customer actions and marketing activities
  41. “Marketing thinking is shifting from trying to maximize the company’s profit from each transaction to maximizing the long-run profit from each relationship” (Philip Kotler). To rephrase it: companies have shifted from being product-centric to being CUSTOMER-centric.
  42. That is why it is vitally important to know as much as possible about our customers and to customize our offer to each of them as much as possible. Despite the huge amount of data we now have on each customer (from CRM, web site, social media) and the complexity of building a 360° view of the customer, we can still frame the relationship with our customers with these 4 questions:
  43. 1- Who are my customers? 2- How to reach and interact with them? 3- Which customers should I target? 4- What is the next best offer? These 4 questions lead us to this 4-block Marketing Framework Analysis:
  44. [Diagram: 4-block Marketing Framework Analysis]
  45. Explore: Philip Kotler, the founder of modern marketing, said: “Marketing’s future lies in database marketing where we know enough about each customer to make relevant and customized offers to each”
  46. Case Study: Adventure Works. AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server. It models a company that manufactures and sells metal and composite bicycles to North American, European and Asian markets. It has two categories of customers: B2B (sales team) and B2C (e-commerce).
  47. SQL Basics: According to the CrowdFlower 2016 Data Science Report, SQL is first among the top 10 in-demand skills for data scientists.
  48. SQL Basics: SQL stands for Structured Query Language. It is the language of databases: creation, access, manipulation. Relational database: software that gives access to stored information and allows its manipulation. Information is stored in tables.
  49. SQL Basics: Tables: a table is a set of data arranged in columns and rows. The columns represent characteristics of the stored data and the rows represent actual data entries. A table is to a database what a spreadsheet is to Excel. We will use SQL Server Express (a free "lite" version of SQL Server).
  50. SQL Basics: Table relationships: One fundamental concept of databases is the relationship between tables. Let’s take an example from our AdventureWorks database with 2 tables: Product and Sales. Each table must have one primary key.
  51. SQL Basics: A primary key is a field in a table that uniquely identifies each row in the table. That means a primary key must contain unique values. In our example, ProductKey is the primary key of the table DimProduct.
  52. SQL Basics: A FOREIGN KEY in one table is a column that points to a PRIMARY KEY in another table. In our example, ProductKey is a foreign key column of the table FactInternetSales that refers to the primary key of the table DimProduct. It identifies the relationship between the two tables.
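To make the primary key and foreign key ideas concrete, here is a minimal sketch using Python's built-in sqlite3 module rather than SQL Server. The table names and ProductKey come from the slides; the other columns (ProductName, SalesOrderNumber, SalesAmount) and all rows are assumptions made up for illustration.

```python
import sqlite3

# In-memory database; table names mirror the slides, the rows are invented.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

conn.execute("""
    CREATE TABLE DimProduct (
        ProductKey INTEGER PRIMARY KEY,   -- uniquely identifies each product row
        ProductName TEXT NOT NULL
    )""")
conn.execute("""
    CREATE TABLE FactInternetSales (
        SalesOrderNumber TEXT PRIMARY KEY,
        ProductKey INTEGER NOT NULL REFERENCES DimProduct(ProductKey),
        SalesAmount REAL NOT NULL
    )""")

conn.execute("INSERT INTO DimProduct VALUES (1, 'Road bike')")
conn.execute("INSERT INTO FactInternetSales VALUES ('SO1', 1, 1500.0)")


def try_insert(sql):
    """Run one INSERT and report whether the key constraints accepted it."""
    try:
        conn.execute(sql)
        return "accepted"
    except sqlite3.IntegrityError:
        return "rejected"


# Duplicate primary key: ProductKey 1 already exists.
duplicate_pk = try_insert("INSERT INTO DimProduct VALUES (1, 'Mountain bike')")
# Dangling foreign key: there is no product with ProductKey 99.
dangling_fk = try_insert("INSERT INTO FactInternetSales VALUES ('SO2', 99, 10.0)")
print(duplicate_pk, dangling_fk)
```

Both extra inserts are refused: the first would break the uniqueness of the primary key, the second would create a sale pointing to a product that does not exist.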
  54. SQL main operations: SQL has 4 main operations. We can: • INSERT new data into a table • SELECT data from a table • UPDATE data already existing in a table • DELETE data from a table
  55. SQL main operations: The insert, update and delete operations are usually restricted to the database administrator. As a Citizen Data Scientist you will essentially need to select data from the database. Let’s look at how to select data from a table...
  56. SQL main operations: Selection. Here is the general syntax of a data selection: SELECT <Column List> FROM <Table Name> WHERE <Search Condition>
  57. SQL main operations: Aggregation. How to group data and use aggregates: SELECT <Column List>, <Aggregate Function>(<Column Name>) FROM <Table Name> WHERE <Search Condition> GROUP BY <Column List>
  58. SQL main operations: Selection from 2 tables. How to select data from more than one table: SELECT <Column List> FROM <Table1> JOIN <Table2> ON <Table1>.<Column1> = <Table2>.<Column1>
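The three syntax templates above can be tried on a toy database. This is a sketch using Python's sqlite3 module, not the real AdventureWorks warehouse: the two table names and ProductKey follow the slides, while ProductName, SalesAmount and every row are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE DimProduct (ProductKey INTEGER PRIMARY KEY, ProductName TEXT);
    CREATE TABLE FactInternetSales (ProductKey INTEGER, SalesAmount REAL);
    INSERT INTO DimProduct VALUES (1, 'Road bike'), (2, 'Helmet');
    INSERT INTO FactInternetSales VALUES (1, 1500.0), (1, 1200.0), (2, 35.0);
""")

# Selection: SELECT <columns> FROM <table> WHERE <condition>
big_sales = conn.execute(
    "SELECT ProductKey, SalesAmount FROM FactInternetSales "
    "WHERE SalesAmount > 1000 ORDER BY SalesAmount"
).fetchall()

# Aggregation: one row per ProductKey with the total amount sold
totals = conn.execute(
    "SELECT ProductKey, SUM(SalesAmount) FROM FactInternetSales "
    "GROUP BY ProductKey ORDER BY ProductKey"
).fetchall()

# Join: match each sale to its product through the shared ProductKey column
by_name = conn.execute(
    "SELECT p.ProductName, SUM(s.SalesAmount) "
    "FROM FactInternetSales AS s "
    "JOIN DimProduct AS p ON s.ProductKey = p.ProductKey "
    "GROUP BY p.ProductName ORDER BY p.ProductName"
).fetchall()

print(big_sales)  # [(1, 1200.0), (1, 1500.0)]
print(totals)     # [(1, 2700.0), (2, 35.0)]
print(by_name)    # [('Helmet', 35.0), ('Road bike', 2700.0)]
```

The same SELECT statements should run largely unchanged in SQL Server Management Studio against tables with these columns.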
  59. Customer Dashboard using Power BI
  60. Segment: “Customer segmentation is the process of dividing customers into groups of individuals who are similar in specific ways relevant to marketing” Source: “A Marketer’s Guide to Analytics”, SAS
  61. Segmentation: Types. The literature on types of segmentation is very diverse. The best classification I could find is the one given in the SAS paper “A Marketer’s Guide to Analytics”. It distinguishes between two main types of segmentation: • Foundation segmentation • Targeting segmentation
  62. Segmentation: Types. Foundation segmentation (core segments) has these properties: • All customers are included • Each customer falls into only one segment • Each segment can be subdivided into clusters • Attributes: value, profit, attrition, risk, demographics, firmographics, etc.
  63. Segmentation: Types. The second type is targeting segmentation. It identifies customers with specific needs and preferences and is useful for specific marketing programs and campaigns. It has these features: • Not all customers are included • Each customer may fall into many different segments
  64. Good segmentation: A good segmentation must have these three features: • Relevant to the business objective • Simple: understandable and easy to characterize • Actionable
  65. Managerial Segmentation: RFM method. We will use a very simple yet insightful method to build a customer segmentation that is relevant, simple and, most importantly, actionable. The RFM method has been around for decades, yet it is still very useful.
  66. RFM in a nutshell: RFM is an acronym for Recency, Frequency and Monetary value • Recency: number of days since the last purchase/use/visit • Frequency: number of purchases/uses/visits • Monetary: amount of purchases / time spent
  67. RFM in a nutshell: On each of these 3 factors, all customers are ranked and given a score from 1 to 4 (depending on which quartile they fall in), 1 being the best score. Each customer now has a composite R-F-M score. As each factor can take 4 different values (1, 2, 3 or 4), we can in theory divide our customers into up to 4 × 4 × 4 = 64 segments!
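The quartile scoring can be sketched in a few lines of Python. This is one possible reading of the method, not the course's SQL script: customers are ranked on each factor and the ranking is cut into four equal groups, with ties falling wherever the sort puts them. The eight customers and their figures are invented.

```python
def quartile_scores(values, higher_is_better):
    """Return one score per value, from 1 (best quartile) to 4 (worst quartile)."""
    # Positions of the values, ordered from best to worst.
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=higher_is_better)
    scores = [0] * len(values)
    for rank, idx in enumerate(order):
        scores[idx] = rank * 4 // len(values) + 1
    return scores


# Hypothetical customers: (days since last purchase, number of purchases, amount spent)
customers = {
    "C1": (3, 12, 3200.0),
    "C2": (10, 8, 300.0),
    "C3": (25, 6, 900.0),
    "C4": (40, 5, 1500.0),
    "C5": (90, 3, 400.0),
    "C6": (120, 2, 650.0),
    "C7": (200, 1, 80.0),
    "C8": (365, 1, 120.0),
}

names = list(customers)
# Recency: fewer days is better. Frequency and Monetary: more is better.
r = quartile_scores([customers[n][0] for n in names], higher_is_better=False)
f = quartile_scores([customers[n][1] for n in names], higher_is_better=True)
m = quartile_scores([customers[n][2] for n in names], higher_is_better=True)

rfm = {name: (r[i], f[i], m[i]) for i, name in enumerate(names)}
for name, score in rfm.items():
    print(name, score)
```

For instance C2 bought recently and often but spent little, so it ends up with the composite score (1, 1, 3).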
  68. RFM in a nutshell: It is a good first step, but we cannot stop here because we want a simple but ACTIONABLE segmentation. That is why we use the term Managerial Segmentation: as a managerial decision, we can decide that we need, say, 9 different segments based on the RFM scores we have already computed.
  69. RFM: 9 segments. Here is the description of each of the 9 segments: • Best: R (1) AND F (1) AND M (1): simply the customers with the highest score on every factor • Novice: R (1) AND F (3-4) • Active High Value: R (1) AND M (1,2) • Active: R (1)
  70. RFM: 9 segments • Warm High Value: R (2) AND M (1) • Warm: R (2) • Win-back: R (3,4) AND {F (1) OR M (1)} • Cold: R (3) • Almost lost: R (4)
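Several of the nine rules overlap (a "Best" customer also satisfies "Active", for example), so they need an order of precedence. The sketch below assumes the rules are checked in the order the slides list them and that the first match wins; that ordering is an assumption, and the function name is hypothetical.

```python
def rfm_segment(r, f, m):
    """Map R, F, M quartile scores (1 = best, 4 = worst) to one of the 9 segments."""
    if r == 1:
        if f == 1 and m == 1:
            return "Best"
        if f in (3, 4):
            return "Novice"
        if m in (1, 2):
            return "Active High Value"
        return "Active"
    if r == 2:
        return "Warm High Value" if m == 1 else "Warm"
    # From here on r is 3 or 4.
    if f == 1 or m == 1:
        return "Win-back"
    return "Cold" if r == 3 else "Almost lost"


# Every one of the 64 possible R-F-M combinations lands in exactly one segment.
all_segments = {
    rfm_segment(r, f, m)
    for r in range(1, 5) for f in range(1, 5) for m in range(1, 5)
}
print(sorted(all_segments))
```

Collapsing 64 score combinations into 9 named groups is what turns the raw scores into something a marketing team can act on.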
  71. RFM: actions. Now that we have our customer segmentation, what actions can we take based on it? Here are some ideas: • Best customers: • “Thank you” gift • “Exclusive preview” of a new service/product
  72. RFM: actions • Novice: • Connection on social media • Personal greeting message • Free shipping • Warm High Value: • Next best offer: “Get $50 in ‘ZZZZ’ Dollars for every $50 you spend” • Almost lost: • “Last chance” special offer
  73. Power BI TreeMap visualization of the 9 resulting segments
  74. Target
  75. Classification model: basics. Here is a basic data flow for any classification model. Training data is the input of the classification algorithm. The purpose is to “train” the algorithm with historical data that contains the label (target) variable. For example, say we want to create a model to predict which customers are likely to respond positively to a specific marketing campaign. The training data contains a list of customers who were targeted in the past by the same kind of campaign. The label variable is a Yes/No variable. Source: http://www.cs.princeton.edu/~schapire/talks/picasso-minicourse.pdf
  76. Classification Model: Evaluation. To evaluate and choose the model best fitted to our problem we can use several measures. Here are the most widely used: • Accuracy: the proportion of all predictions that were correct. • Positive Predictive Value or Precision: the proportion of predicted positive cases that are actually positive.
  77. Classification Model: Evaluation • Negative Predictive Value: the proportion of predicted negative cases that are actually negative. • Sensitivity or Recall: the proportion of actual positive cases that are correctly identified. • Specificity: the proportion of actual negative cases that are correctly identified. • ROC curve: created by plotting the recall (true positive rate) against the false positive rate
  78. Confusion Matrix. Source: http://www.analyticsvidhya.com/blog/2016/02/7-important-model-evaluation-error-metrics/
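All of the measures listed above can be computed from the four cells of a confusion matrix. The following sketch uses made-up counts (100 customers scored for a campaign); the function name is hypothetical and it does not guard against empty rows or columns, which would cause a division by zero.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute the usual evaluation measures from confusion-matrix counts.

    tp: predicted positive, actually positive    fp: predicted positive, actually negative
    fn: predicted negative, actually positive    tn: predicted negative, actually negative
    """
    total = tp + fp + fn + tn
    return {
        "accuracy": (tp + tn) / total,
        "precision": tp / (tp + fp),                  # positive predictive value
        "negative_predictive_value": tn / (tn + fn),
        "recall": tp / (tp + fn),                     # sensitivity, true positive rate
        "specificity": tn / (tn + fp),
        "false_positive_rate": fp / (fp + tn),        # x-axis of the ROC curve
    }


# Invented counts for illustration.
metrics = classification_metrics(tp=40, fp=10, fn=20, tn=30)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

With these counts the model is right 70% of the time overall, 80% of the customers it flags are real responders (precision), but it only finds two thirds of all responders (recall).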
  79. Classification fundamental concept: Bias-Variance Tradeoff. Google's Research Director Peter Norvig claimed that "We don’t have better algorithms. We just have more data."
  80. Prediction Error ▪ You can never have a prediction model without error. ▪ Without going into the maths behind it, prediction error is mainly divided into 2 elements: Bias and Variance. ▪ Error due to Bias is the difference between the model's expected (average) prediction and the correct value. ▪ Error due to Variance is the variability of the model's prediction for a given data point. ▪ As “a picture is worth a thousand words”, let’s look at this graphic taken from one of the best articles I have found on this subject, “Understanding the Bias-Variance Tradeoff”.
  81. Bias vs Variance: The bulls-eye diagram is a graphical visualization of bias and variance. Each point is the result of one iteration of the model building. The center of the target is a model that perfectly predicts the actual values. Source: scott.fortmann-roe.com/docs/BiasVariance.html
  82. Bias vs Variance: We have four main cases: ▪ Low Bias and Low Variance: that is where we want to be! We have a good model. ▪ High Bias and Low Variance: this is what we call an under-fitted model. The model lacks some information; it is too simple. Maybe we have to add variables to our training data. Trying other modeling methods could be a good option too.
  83. Bias vs Variance (continued): ▪ Low Bias and High Variance: we have an over-fitted model. The model is too complicated for the data we have. Put simply, the model does not generalize. The solution is to add more data to our training set, to reduce the number of features (the complexity), and/or to use ensemble methods such as random forests, bagging and boosting. ▪ High Bias and High Variance: we still need to work on our model. My suggestion is to tackle the bias error first by using other methods and adding variables if you can.
  84. Bias-Variance Tradeoff: Here is another way to sum up the bias-variance trade-off. Prediction error is plotted against model complexity twice: the green line is the result on the training data and the red line is the result on the test data. Source: Hastie, Tibshirani, Friedman, “The Elements of Statistical Learning”, 2001
  85. Under-Fitted Model: When the model is too simple (low complexity): ▪ The gap between the two curves is narrow, which indicates low variance ▪ The prediction error is high for both training and test data, which means high bias ▪ Hence we have an under-fitted model
  86. Over-Fitted Model ▪ The higher the complexity, the wider the gap between the two curves ▪ When the gap between the prediction error on the training data and on the test data becomes too wide, the model has reached the over-fitting regime
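The training-versus-test comparison can be reproduced on a toy regression problem. The sketch below is only an illustration under invented data (a straight line plus a small disturbance): it compares a model that is too simple (always predicts the training average), a reasonably sized one (a least-squares line) and one that memorizes the training set (nearest training point). None of this comes from the course labs.

```python
def mse(model, xs, ys):
    """Mean squared prediction error of `model` on the points (xs, ys)."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Invented data: y = 2x plus a +1/-1 disturbance that differs between the two sets.
train_x = [float(i) for i in range(10)]
train_y = [2 * x + (1 if i % 2 == 0 else -1) for i, x in enumerate(train_x)]
test_x = [i + 0.5 for i in range(9)]
test_y = [2 * x + (-1 if i % 2 == 0 else 1) for i, x in enumerate(test_x)]

mean_x = sum(train_x) / len(train_x)
mean_y = sum(train_y) / len(train_y)


def too_simple(x):
    """Under-fitted: ignores x and always predicts the training average."""
    return mean_y


# Least-squares line fitted on the training data only.
slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(train_x, train_y))
         / sum((x - mean_x) ** 2 for x in train_x))
intercept = mean_y - slope * mean_x


def straight_line(x):
    """Reasonable complexity for this data."""
    return intercept + slope * x


def memorizer(x):
    """Over-fitted: repeats the y of the closest training point, disturbance included."""
    nearest = min(range(len(train_x)), key=lambda i: abs(train_x[i] - x))
    return train_y[nearest]


errors = {
    model.__name__: (mse(model, train_x, train_y), mse(model, test_x, test_y))
    for model in (too_simple, straight_line, memorizer)
}
for name, (train_error, test_error) in errors.items():
    print(f"{name:14s} train={train_error:6.2f} test={test_error:6.2f}")
```

The too-simple model has a high error on both sets, the memorizer has zero training error but a clearly larger test error, and the straight line keeps both errors low and close to each other.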
  87. Overview diagram of Azure Machine Learning Studio: Microsoft Azure Machine Learning Studio is a drag-and-drop cloud-based service you can use to build, test, and deploy predictive analytics solutions on your data. Machine Learning Studio publishes models as web services that can easily be consumed by custom apps or BI tools such as Excel or Power BI. This figure summarizes the basic high-level steps required to create, test, and deploy a new Azure Machine Learning prediction model.
  88. Source: https://azure.microsoft.com/en-us/documentation/articles/machine-learning-studio-overview-diagram/
  89. Recommend: Recommendation systems are a subclass of information filtering systems that seek to predict the 'rating' or 'preference' that a user would give to an item. Source: Wikipedia
  90. Two primary methodologies • Collaborative filtering: the item recommended to the user is based on the past purchases and preferences of similar users • Content-based filtering: based on the attributes of items purchased by the user, suggest items with similar properties. Best examples: Amazon, Netflix
  91. Example
  92. Let’s look at how these two methods work using a very simple movie-rating example. We have a list of 6 movies (items) and 7 users. Each user has rated the movies he or she has watched (from 1 to 5 stars). Daniel has not seen the movie "The Notebook". We want to decide whether to recommend this movie to him, based on a prediction of his rating for it. Let’s start with a collaborative filtering approach.
  93. Collaborative Filtering: Daniel has not seen the movie "The Notebook". 1- We select the subgroup of users who watched the same movies as Daniel and who also watched "The Notebook". 2- Among this group, we select the users who are "similar" to Daniel in terms of ratings (for example using the KNN algorithm). 3- We compute the average rating that Daniel's "neighbors" gave to "The Notebook". 4- This gives us Daniel's predicted rating for "The Notebook". 5- We repeat steps 1 to 4 for all the movies that Daniel has not seen. 6- We recommend to Daniel the movies with the best predicted ratings.
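Steps 1 to 4 can be sketched as a small nearest-neighbor average. The rating table from the slide is not part of this transcript, so every rating below is invented, as are the user names other than Daniel. Similarity is measured here by the mean absolute difference on co-rated movies, which is one simple choice among many.

```python
# Hypothetical 1-5 star ratings; a missing movie means "not seen".
ratings = {
    "Daniel": {"Skyfall": 2, "Star Wars": 1, "X-Men": 2, "P.S. I Love You": 5, "Titanic": 4},
    "Anna": {"Skyfall": 1, "Star Wars": 2, "X-Men": 2, "P.S. I Love You": 5, "Titanic": 5, "The Notebook": 5},
    "Ben": {"Skyfall": 5, "Star Wars": 5, "X-Men": 4, "P.S. I Love You": 1, "Titanic": 2, "The Notebook": 1},
    "Chloe": {"Skyfall": 2, "Star Wars": 1, "X-Men": 3, "P.S. I Love You": 4, "Titanic": 4, "The Notebook": 4},
    "Eric": {"Skyfall": 4, "Star Wars": 5, "X-Men": 5, "Titanic": 2, "The Notebook": 2},
    "Fiona": {"Skyfall": 3, "P.S. I Love You": 5, "Titanic": 5},
    "Gina": {"Skyfall": 5, "Star Wars": 4, "X-Men": 5, "P.S. I Love You": 2, "Titanic": 1, "The Notebook": 2},
}


def predict_rating(ratings, user, item, k=2):
    """Average the item's rating among the k users whose tastes are closest to `user`."""
    mine = ratings[user]
    candidates = []
    for other, theirs in ratings.items():
        # Step 1: keep only other users who rated the target item
        # and share at least one rated movie with `user`.
        if other == user or item not in theirs:
            continue
        shared = [movie for movie in mine if movie in theirs]
        if not shared:
            continue
        # Step 2: distance = mean absolute rating difference on shared movies.
        distance = sum(abs(mine[movie] - theirs[movie]) for movie in shared) / len(shared)
        candidates.append((distance, other))
    neighbors = sorted(candidates)[:k]
    if not neighbors:
        return None
    # Steps 3 and 4: the neighbors' average rating is the prediction.
    return sum(ratings[name][item] for _, name in neighbors) / len(neighbors)


predicted = predict_rating(ratings, "Daniel", "The Notebook", k=2)
print(predicted)  # 4.5
```

With these invented ratings, Daniel's two closest neighbors are Chloe and Anna, who gave "The Notebook" 4 and 5 stars, hence the predicted 4.5.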
  94. Content-based filtering: We want to predict Daniel's rating for "The Notebook" using the similarity between items (in our example, movies), not users. 1- We select the movies that are similar to "The Notebook". Based on genre, we can divide the movies into two groups: "Action" (Skyfall, Star Wars, X-Men) and "Romance" ("P.S. I Love You", "Titanic", "The Notebook"). 2- Daniel has rated "P.S. I Love You" and "Titanic", which are similar to "The Notebook". Based on his ratings of these two movies, we give a predicted rating for "The Notebook". 3- We repeat steps 1 and 2 for all the movies that Daniel has not seen. 4- We recommend to Daniel the movies with the best predicted ratings.
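A minimal content-based sketch, assuming genre is the only item attribute and that the prediction is simply the average of Daniel's ratings for movies of the same genre. The genre grouping comes from the slide; Daniel's ratings are invented because the original table is not in the transcript.

```python
genres = {
    "Skyfall": "Action",
    "Star Wars": "Action",
    "X-Men": "Action",
    "P.S. I Love You": "Romance",
    "Titanic": "Romance",
    "The Notebook": "Romance",
}

# Hypothetical ratings given by Daniel (he has not seen "The Notebook").
daniel = {"Skyfall": 2, "Star Wars": 1, "X-Men": 2, "P.S. I Love You": 5, "Titanic": 4}


def predict_from_similar_items(user_ratings, genres, item):
    """Average the user's own ratings over movies that share the item's genre."""
    similar = [movie for movie in user_ratings if genres[movie] == genres[item]]
    if not similar:
        return None
    return sum(user_ratings[movie] for movie in similar) / len(similar)


predicted = predict_from_similar_items(daniel, genres, "The Notebook")
print(predicted)  # 4.5
```

Unlike the collaborative approach, this prediction needs no other users at all, only Daniel's own history and a description of the items.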
  95. Labs
  96. Case Study: Adventure Works ▪ AdventureWorks is a sample database created for use in demos and training on each version of Microsoft SQL Server. ▪ It models a company that manufactures and sells metal and composite bicycles to North American, European and Asian markets. ▪ Two categories of customers: ▪ B2B: sales team ▪ B2C: e-commerce
  97. Set up the Lab Environment: Tools • Office 2016 (free trial): products.office.com/en-us/try • Power BI (free): powerbi.microsoft.com/en-us/desktop/ • SQL Server 2014 Express (free): microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx • Microsoft Azure ML (free): studio.azureml.net/
  98. SQL Server 2014 Express Step 1: microsoft.com/en-us/server-cloud/Products/sql-server-editions/sql-server-express.aspx
  99. SQL Server 2014 Express Step 2
  100. SQL Server 2014 Express Step 3: Sign in
  101. SQL Server 2014 Express Step 4a: 1- Choose SQL Server 2014 Express 64 Bit 2- Choose your language 3- Scroll down and click Continue
  102. SQL Server 2014 Express Step 4b: 1- Choose SQL Server Management Studio Express 64 Bit 2- Choose your language 3- Scroll down and click Continue
  103. SQL Server 2014 Express Step 5: When the files are downloaded, execute them and choose the first option as shown here
  104. SQL Server 2014 Express: Finally, you should have Microsoft SQL Server Management Studio installed
  105. Microsoft Azure Machine Learning • Microsoft Azure ML is a cloud-based service, so you don’t have to install anything. All you need is a Microsoft account ID • Here is the address: studio.azureml.net/
  106. DATA
  107. Adventure Works 2014 Warehouse: Download the database AdventureWorks 2014 Warehouse (Adventure Works 2014 Warehouse Script.zip) from this address (the official Microsoft samples): msftdbprodsamples.codeplex.com/releases
  108. How to install the Adventure Works database in SQL Server Management Studio
  109. Step 1: Open Microsoft SQL Server Management Studio. Server name: YourLocalHost\SQLExpress (on my laptop: ULUUMY\SQLExpress)
  110. Step 2: If you get this error message…
  111. Step 2: Open SQL Server 2014 Configuration Manager
  112. Step 2: Then start the SQL Server service
  113. Step 3: Open the file instawdbdw.sql (from the Adventure Works 2014 Warehouse Script you have already downloaded)
  114. Step 4: Put Management Studio into SQLCMD mode: Tools > Options > Query Execution, then select "By default, open new queries in SQLCMD mode"
  115. Step 5: Change the path in the script to the location of the AdventureWorks database files you have already downloaded
  116. Step 6: Execute
  117. Final result
  118. Lab 1: Data preparation using SQL ▪ The first lab is the foundation for the following labs. ▪ We will use SQL to extract information about our customers. ▪ First we will create two tables: – Customers: socioeconomic and geographic data such as gender, income, education, number of children, postal code and province. – Sales: all the internet orders made by customers, including quantity, amount, date and product features such as model, category and subcategory.
  119. ▪ These tasks should already have been done: ▪ Install SQL Server Express 2014. ▪ Download the database Adventure Works 2014 Warehouse. ▪ Install the database in SQL Server Management Studio. ▪ You can download the SQL code here (Explore.sql)
  120. Lab 2: Customer Dashboard using Power BI ▪ You should already have installed Power BI. If not, please go back to the Lay the Foundation section, Setup the Lab Environment lecture. ▪ You should also have finished Lab 1, because you will need the two tables created during that lab. If not, please go back to Lab 1. ▪ We will build a customer dashboard using those two tables: Customers and Sales. ▪ Here is an example:
  121. [Image: example customer dashboard]
  122. Lab 3: Managerial Segmentation using SQL ▪ In this lab we are going to segment our customers based on their purchases. ▪ We will use the method presented in the previous lecture (managerial segmentation). ▪ At the end we will have divided our customers into 9 segments. ▪ The marketing team will be able to tailor a specific strategy for each segment.
  123. ▪ You should have finished Lab 1, because you will need the two tables created during that lab. If not, please go back to Lab 1. ▪ Here is a Power BI TreeMap visualization of the 9 resulting segments.
  124. [Image: Power BI TreeMap of the 9 segments]
  125. You can download from GitHub the SQL script ManagerialSegmentation.sql, which implements the solution.
  126. Lab 4: Bike Buyers targeting using Azure ML ▪ During Lab 2 we noticed that only 50% of our customers have already bought a bike from Adventure Works. ▪ The management team decided that by the end of the fiscal year, the company should have increased this percentage by 5%. ▪ To achieve this objective, the marketing team will launch a telemarketing campaign to reach those of our customers who have never bought a bike.
  127. ▪ As it is not possible to contact all of them (around 9,000 customers) due to time and budget constraints, the marketing team wants to target only those most likely to be interested in our offer. ▪ So the VP Marketing asked you, as the team's citizen data scientist, to build a model to achieve this goal.
  128. Our Challenge: Target customers who have never purchased a bike (from AdventureWorks) and who are the most likely to be interested in buying one.
  129. First, let's get the dataset. You have two options: ▪ Create the table in SQL Server using the script BikeBuyerTargeting.sql (download here) and then export it to a CSV file. ▪ Or download the dataset BikeBuyerTargeting.csv here. ▪ Now let's go to Azure ML Studio.
  130. 130. Part 1 : Predictive Model Final Experiment ©uluumy, 2016 130
  131. 131. Step 1: Sign in at studio.azureml.net/
  132. 132. Step 2: Go to DATASETS and click New
  133. 133. Step 2: DATASET: choose From Local File
  134. 134. Step 2: DATASET: upload the file BikeBuyerTargeting.csv
  135. 135. Step 2: DATASET: the file BikeBuyerTargeting is uploaded
  136. 136. Step 3: Create a new EXPERIMENT
  137. 137. Step 3: Choose Blank Experiment
  138. 138. Step 4: 1- Select the BikeBuyerTargeting dataset 2- Drag and drop the dataset
  139. 139. Step 4: Insert the module Select Columns in Dataset (Data Transformation / Manipulation)
  140. 140. Step 4: Drag and drop it
  141. 141. Step 4: Connect it to the dataset
  142. 142. Step 4: 1- On the Properties pane of the module “Select Columns in Dataset”, click Launch column selector 2- Remove these columns as shown here 3- Click the Check button
  143. 143. Step 5: Search for Split Data (Data Transformation / Sample and Split) 1- Drag and drop it 2- Connect it to “Select Columns in Dataset” 3- In Properties, change Fraction of Rows to 0.7
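Setting Fraction of Rows to 0.7 sends 70% of the rows to the first output (used for training) and the remaining 30% to the second (used for scoring). The following Python sketch shows the same idea as a random split; it does not reproduce the Azure ML module itself, and the function name and the fixed seed are assumptions.

```python
import random


def split_rows(rows, fraction=0.7, seed=42):
    """Shuffle the rows and split them into (first, second) parts.

    fraction: share of rows that goes to the first part (training set).
    seed: fixed here only so that the split is repeatable.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * fraction)
    return shuffled[:cut], shuffled[cut:]


train_rows, test_rows = split_rows(range(10))
```

The model only ever sees the first part during training, so the second part gives an honest estimate of how it behaves on customers it has not seen.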
  144. 144. Step 6: Add the module Two-Class Boosted Decision Tree (Machine Learning / Initialize Model / Classification)
  145. 145. Step 6: 1- Select the module Train Model (Machine Learning / Train) 2- Drag and drop it
  146. 146. Step 6: 3- Connect it to Two-Class Boosted Decision Tree (the connector on the left) 4- Connect it to Split Data (the connector on the right)
  147. 147. Step 6: Train Model properties 1- From the Properties pane, click Launch column selector 2- Select the label variable BikeBuyer
  148. 148. Step 7: 1- Select Score Model (Machine Learning / Score) 2- Drag and drop it
  149. 149. Step 7: 3- Connect it to Train Model (the connector on the left) 4- Connect it to Split Data (the connector on the right)
  150. 150. Step 8: Run the experiment
  151. 151. Step 9: Let’s visualize the result. 2 variables have been added: Scored Labels and Scored Probabilities
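The two new columns are closely related: the scored label is the class obtained by comparing the scored probability with a cut-off. A minimal sketch, assuming a cut-off of 0.5 and the class names "yes" and "no" (the function name is hypothetical):

```python
def scored_label(probability, threshold=0.5):
    """Turn a scored probability into a yes/no scored label.

    threshold: assumed cut-off; raising it makes the model more selective.
    """
    return "yes" if probability >= threshold else "no"
```

Raising the threshold targets fewer customers but with more confidence, which matters when the call budget is limited.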
  152. 152. Step 10: Add Evaluate Model (Machine Learning / Evaluate)
  153. 153. Step 11: Run… and visualize the result
  154. 154. Step 11: ROC…
  155. 155. Step 11: Measures…
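The measures reported by an evaluation of a two-class model are all derived from four counts: true positives, false positives, false negatives and true negatives. The sketch below computes the usual ones from actual and predicted labels. The function names and the sample labels are made up for illustration, and the code is a plain restatement of the standard definitions rather than Azure ML's own implementation.

```python
def confusion_counts(actual, predicted, positive="yes"):
    """Count true/false positives and negatives for one positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    return tp, fp, fn, tn


def measures(actual, predicted, positive="yes"):
    """Return accuracy, precision, recall and F1 as a dictionary."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {
        "accuracy": (tp + tn) / total if total else 0.0,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }


# Made-up labels for five customers.
actual_labels = ["yes", "yes", "no", "no", "yes"]
predicted_labels = ["yes", "no", "no", "yes", "yes"]
result = measures(actual_labels, predicted_labels)
```

For a telemarketing campaign, precision is the share of called customers who really are potential buyers, so it is the measure most directly tied to the cost of wasted calls.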
  156. 156. Part 2: Web Service Step 1: Click on Set Up Web Service and choose Predictive Web Service
  157. 157. Step 1: Here is the result… 3 modules have been added by Azure ML: 1- Web Service Input 2- Web Service Output 3- BikeBuyerTargeting (trained model)
  158. 158. Step 2: 1- Delete the connection between “Web Service Input” and the module “Select Column..” 2- Connect “Web Service Input” to “Score Model”
  159. 159. Step 3: 1- Select the module “Select Column..” 2- Launch column selector 3- Add “BikeBuyer” to the excluded variables
  160. 160. Step 4: Run… Then click Deploy Web Service
  161. 161. Step 4: Here is the result… Click on Test
  162. 162. Step 5: Let’s test the web service
  163. 163. Step 5: The prediction (“yes” with a probability equal to 0.831)
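Besides the Test button, a deployed web service can be called from code. The sketch below only builds the HTTP request and does not send it. It assumes the request-response JSON layout that Azure ML Studio displays on the service's API help page (an "Inputs" object with "ColumnNames" and "Values", and a Bearer API key). The input name "input1", the URL, the key and the column names are placeholders and assumptions, so copy the real values from your own service page.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, row):
    """Build (but do not send) a POST request scoring one customer row.

    The payload layout and the input name "input1" are assumptions; check
    the API help page of your own web service for the exact format.
    """
    payload = {
        "Inputs": {
            "input1": {
                "ColumnNames": list(column_names),
                "Values": [list(row)],
            }
        },
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
    )


# Placeholder values only; the column names here are hypothetical.
request = build_scoring_request(
    "https://example.invalid/execute",
    "YOUR_API_KEY",
    ["Age", "YearlyIncome"],
    [35, 60000],
)
# To send it: urllib.request.urlopen(request).read()
```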
  164. 164. Optional Challenge: Try to improve the model in terms of precision (0.836).
  165. 165. Lab 5: Next Best Offer using Azure ML ▪ Following the brilliant success of the previous targeted campaign, the VP Marketing has asked you to work on a way to improve our customers’ retention and loyalty. ▪ In fact, 50% of our customers belong to the 3 segments "Win-Back", "Cold" and "Almost Lost". Customers in these segments have not bought a product for at least 1 year. That is something to be addressed.
  166. 166. Your challenge: Build a recommendation system which suggests to each of our 18,484 customers 3 items that she/he could be interested in.
  167. 167. First, let's get the dataset. You have two options: ▪ Create the tables in SQL Server using the script Recommend.sql (available for download) and then export them to CSV files. ▪ Or download the datasets Rating.csv, User.csv and Item.csv directly. ▪ Now let's go to Azure ML Studio.
  168. 168. Part 1: Recommendation Model (final experiment) O'clock Shadow by Christopher (CC BY-SA
  169. 169. Step 1: Sign in at studio.azureml.net/
  170. 170. Step 2: Go to DATASETS and click New
  171. 171. Step 2: DATASET: choose From Local File and upload the files RatingRecommendation.csv, UserRecommendation.csv, ItemRecommendation.csv
  172. 172. Step 3: Create a new EXPERIMENT
  173. 173. Step 3: Choose Blank Experiment
  174. 174. Step 4: 1- Select the RatingRecommendation dataset 2- Drag and drop the dataset
  175. 175. Step 4: Insert the module Edit Metadata (Data Transformation / Manipulation). Connect it to the dataset
  176. 176. Step 4: 1- On the Properties pane of the module “Edit Metadata”, click Launch column selector 2- Include the variable “ImpliciteRating” 3- Click the Check button
  177. 177. Step 4: Choose Integer as Data Type
  178. 178. Step 5: Search for Split Data (Data Transformation / Sample and Split) 1- Drag and drop it 2- Connect it to “Edit Metadata”
  179. 179. Step 6: 1- Add the module Train Matchbox Recommender (Machine Learning / Train) 2- Connect it to Split Data (the left connector) 3- Change Number of Recommendation to 3
  180. 180. Step 7: Add the dataset UserRecommendation.csv
  181. 181. Step 8: Add the module “Remove Duplicate Rows”. Connect it to the UserRecommendation dataset and launch the column selector
  182. 182. Step 8: Include the variable CustomerKey
  183. 183. Step 8: Then connect it to Train Matchbox (the middle connector)
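Step 8 makes sure that each customer appears only once in the user features. A minimal Python sketch of that de-duplication, assuming that rows are dictionaries and that the first occurrence of each CustomerKey is the one to keep (the function name and the sample rows are hypothetical):

```python
def remove_duplicate_rows(rows, key="CustomerKey"):
    """Keep the first row seen for each value of `key`, preserving order."""
    seen = set()
    unique_rows = []
    for row in rows:
        if row[key] not in seen:
            seen.add(row[key])
            unique_rows.append(row)
    return unique_rows


# Made-up user rows; customer 11000 appears twice.
users = [
    {"CustomerKey": 11000, "Gender": "M"},
    {"CustomerKey": 11001, "Gender": "F"},
    {"CustomerKey": 11000, "Gender": "M"},
]
unique_users = remove_duplicate_rows(users)
```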
  184. 184. Step 9: Add the dataset ItemRecommendation.csv. Then connect it to Train Matchbox (the right connector)
  185. 185. Step 10: 1- Select Score Matchbox Recommender 2- Drag and drop it 3- Connect it, from left to right, to “Train Matchbox”, “Split Data”, “Remove Duplicate” and “ItemRecommendation”
  186. 186. Step 11: Run the experiment. The model recommends 3 items for each user
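To illustrate the shape of the result (a short list of items per customer), here is a deliberately simple stand-in written in Python. It is not the Matchbox algorithm used in the lab: it just recommends the most popular items that the customer has not rated yet. The function name, the tuple layout and the sample data are assumptions.

```python
from collections import Counter


def recommend_top_n(ratings, n=3):
    """Recommend up to n popular items that each customer has not rated.

    ratings: list of (customer, item, rating) tuples.
    This popularity baseline only stands in for a real recommender.
    """
    popularity = Counter()
    seen = {}
    for customer, item, rating in ratings:
        popularity[item] += rating
        seen.setdefault(customer, set()).add(item)

    # Most popular first; ties are broken by item name for stable output.
    ranked = [item for item, _ in
              sorted(popularity.items(), key=lambda pair: (-pair[1], pair[0]))]
    return {
        customer: [item for item in ranked if item not in items][:n]
        for customer, items in seen.items()
    }


# Made-up implicit ratings (1 = the customer bought the item).
sample_ratings = [
    ("c1", "A", 1), ("c1", "B", 1),
    ("c2", "A", 1), ("c2", "C", 1),
    ("c3", "A", 1), ("c3", "D", 1), ("c3", "B", 1),
]
recommendations = recommend_top_n(sample_ratings)
```

A baseline like this is also useful as a yardstick: a trained recommender should do better than simply suggesting best sellers.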
  187. 187. Part 2: Web Service Step 1: Click on Set Up Web Service and choose Predictive Web Service 7 O'clock Shadow by Christopher (CC BY-SA
  188. 188. Step 1: Here is the result… 3 modules have been added by Azure ML: 1- Web Service Input 2- Web Service Output 3- Lab5RecommendationSystem
  189. 189. Step 2: We want to keep only the variable CustomerKey from the module “Edit Metadata”. Add “Select Columns in Dataset”. Connect it with “Edit Metadata” and “Score Matchbox”. Launch the column selector and include CustomerKey
  190. 190. Step 2: 1- Delete the connection between “Web Service Input” and the module “Edit Metadata” 2- Connect “Web Service Input” to “Score Matchbox”
  191. 191. Step 4: Run… Then click Deploy Web Service
  192. 192. Step 4: Here is the result… Click on Test
  193. 193. Optional Challenge: Include the segment factor in your recommender system.
  194. 194. Resources ▪ SQL – Querying with Transact-SQL: mva.microsoft.com/en-US/training-courses/querying-with-transactsql-10530?l=TjT07f87_9804984382 ▪ MICROSOFT AZURE ML – Microsoft Azure Essentials: Fundamentals of Azure: https://mva.microsoft.com/ebooks – Building Recommendation Systems in Azure: mva.microsoft.com/en-US/training-courses/building-recommendation-systems-in-azure-13765?l=j6AfbmlXB_4505513172 (a very helpful resource for finding a good data example to use throughout the course and for creating the last lab)
  195. 195. ▪ POWER BI – Documentation: https://powerbi.microsoft.com/en-us/documentation/powerbi-service-get-started ▪ Book – Data Mining Techniques: For Marketing, Sales, and Customer Relationship Management, Gordon S. Linoff (Author), Michael J. A. Berry (Author) ▪ Blog – http://www.kdnuggets.com/
  196. 196. Image Credits ▪ Bike Hängärtner CC BY ▪ London Bike Show 2013 by Jon Arm CC BY ▪ bikes in malacca by CC BY-ND ▪ IA030694a simonsimages CC BY ▪ Ines Njers CC BY-ND ▪ The wheels Saku Takakusaki CC BY-ND ▪ Cruisers... by micadew CC BY-SA ▪ Rear end by Craig Sunter CC BY-ND
  197. 197. Image Credits ▪ elite classes-2528 by jim simonson CC BY ▪ corner by eflon CC BY ▪ The one Jesus del Toro Garcia CC BY ▪ 'Phantom' bicycle James Gardiner Collection CC0 ▪ Drahtesel mike goehler CC BY-ND ▪ rotating shadow by rippchenmitkraut66 CC BY-ND
  199. 199. Keep in touch: Uluumy.com
