Azure Machine Learning:
Welcome to the future of predictive analytics
Ruben Pertusa Lopez 
Microsoft SQL Server MVP 
Data Platform Architect at SolidQ 
rpertusa@solidq.com 
Twitter: @rpertusa
Rubén Pertusa
• MS SQL Server MVP
• Data Platform Architect at SolidQ
• PhD candidate in data mining
• SQLSaturday Barcelona founder
rpertusa@solidq.com
Twitter: @rpertusa
Say thank you to the volunteers:
• They spend their FREE time to give you this event.
• Because they are crazy.
• Because they want YOU to learn from the BEST IN THE WORLD.
• If you see someone with “STAFF” on their back, buy them a beer or a wine; they deserve it.
Ivan Daniel Campos:
Rui Barreira:
Paulo Matos:
Pedro Simões:
André Batista:
3 Sponsor Sessions at 15:05
• Don’t miss them; they might be distributing some awesome prizes!
• Rumos
• BI4ALL
• Devscope
Our Main Sponsors:
Goals
This session is about:
• Introduction to ML and AzureML
• Real ML Cases
• Integration between AzureML and BI
This session is NOT about:
• Deep Dive in Data Science and R
• Building the best ML model
10/28/2014
Agenda
• ML Overview
• Real ML Cases
• AzureML Overview
• Demos! Demos! & Special Demo!
• BI feeds AzureML
• AzureML feeds BI
• Conclusions
• Questions
MACHINE LEARNING 
OVERVIEW
What is Machine Learning? 
A system that can learn from data and discover patterns and rules
in order to exploit important business relationships
History of ML (and the BI Story)
[Timeline figure; labels in extraction order]
• Deep neural networks
• No improvements
• Big Data explosion
• Graphical models
• SSAS DM improvements
• Scoring systems
• SSAS 2000 DM features
• Expert systems & decision trees
• Neural networks
2014 = Perfect Timing 
Cheap & Scalable computing (Big Data) 
+ 
Best ML algorithms 
+ 
Data culture adoption 
= 
Move ML to the next level
Basic Problem: Text recognition
Transform it into an ML solution
Cleaned & labeled data → ML model trained → Score input
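The three stages on this slide can be sketched in a few lines. This is a minimal illustration rather than how AzureML does it: the 3x3 bitmaps are made-up stand-ins for cleaned, labeled scans, and the nearest-centroid classifier is a deliberately simple model chosen for readability. The names `LABELED_DATA`, `train` and `score` are hypothetical.

```python
from collections import defaultdict

# Hypothetical 3x3 bitmaps (row-major, 1 = ink) standing in for cleaned, labeled scans.
LABELED_DATA = [
    ("I", [0, 1, 0, 0, 1, 0, 0, 1, 0]),
    ("I", [0, 1, 0, 0, 1, 0, 0, 1, 1]),  # slightly noisy "I"
    ("L", [1, 0, 0, 1, 0, 0, 1, 1, 1]),
    ("L", [1, 0, 0, 1, 0, 0, 1, 1, 0]),  # slightly noisy "L"
    ("T", [1, 1, 1, 0, 1, 0, 0, 1, 0]),
    ("T", [1, 1, 1, 0, 1, 0, 0, 1, 1]),  # slightly noisy "T"
]


def train(examples):
    """Train: return one mean bitmap (centroid) per label."""
    grouped = defaultdict(list)
    for label, pixels in examples:
        grouped[label].append(pixels)
    return {
        label: [sum(column) / len(rows) for column in zip(*rows)]
        for label, rows in grouped.items()
    }


def score(model, pixels):
    """Score: return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(centroid):
        return sum((c - p) ** 2 for c, p in zip(centroid, pixels))

    return min(model, key=lambda label: distance(model[label]))


model = train(LABELED_DATA)
unseen = [1, 1, 1, 0, 1, 0, 0, 0, 0]  # a "T" with a short stem
print(score(model, unseen))
```

The point of the sketch is the shape of the workflow: the quality of the labeled examples drives everything downstream, and scoring only needs the trained model plus a new input.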
One ML model to rule them all…?
Some experiences with ML
• BBC Case Study
• SSAS Performance Issue Detection
• Big Automotive Manufacturer: Customer loyalty campaign & Stock calculator
• Retail Company: Automate decision making
BBC: Case Study
• Input
  • EntryId
  • Date
  • UserId
  • SiteId
  • ForumId
  • ThreadId
  • ParentId
  • PrevId
  • NextId
  • Text
• Case table
  1.- Thread (% Fails in a certain thread)
  2.- User (% Fails per User)
  3.- Diff Hour Forum Created (TimeDatePosted - TimeForumCreated)
  4.- User Forum (% Fails in a certain forum)
  5.- Diff Last for User (TimeDatePosted - TimeLastFailUser)
  6.- Hour of the day
  7.- Diff hour UserJoined-Now (TimeDatePosted - TimeUserJoined)
  8.- User Thread (% Fails per User in a thread)
  9.- Diff Hour Thread Created (TimeDatePosted - TimeThreadCreated)
  10.- Day of Week
More than 200 attributes.
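A few of the case-table attributes above can be derived from the raw post history as sketched below. This is an illustration under stated assumptions, not the code used in the BBC project: the tuple layout, the sample rows, `THREAD_CREATED`, and the function names are all hypothetical. One choice worth noting is that each row only looks at posts made before the one being scored, so the outcome being predicted cannot leak into its own features.

```python
from datetime import datetime

# Hypothetical moderation history: (entry_id, user_id, thread_id, posted_at, failed).
HISTORY = [
    (1, "u1", "t1", datetime(2014, 10, 1, 9, 0), False),
    (2, "u2", "t1", datetime(2014, 10, 1, 10, 0), True),
    (3, "u1", "t1", datetime(2014, 10, 1, 12, 0), True),
    (4, "u1", "t2", datetime(2014, 10, 2, 8, 30), False),
]
# Hypothetical thread creation times.
THREAD_CREATED = {
    "t1": datetime(2014, 10, 1, 8, 0),
    "t2": datetime(2014, 10, 2, 8, 0),
}


def fail_rate(rows):
    """Share of failed posts in rows; None when there is no history yet."""
    return sum(r[4] for r in rows) / len(rows) if rows else None


def case_row(entry_id, user_id, thread_id, posted_at, history):
    """Build one case-table row for a post from the history that precedes it."""
    # Only earlier posts may contribute; later outcomes would leak the label.
    past = [r for r in history if r[3] < posted_at]
    return {
        "entry_id": entry_id,
        "thread_fail_pct": fail_rate([r for r in past if r[2] == thread_id]),
        "user_fail_pct": fail_rate([r for r in past if r[1] == user_id]),
        "user_thread_fail_pct": fail_rate(
            [r for r in past if r[1] == user_id and r[2] == thread_id]
        ),
        "hours_since_thread_created": (
            posted_at - THREAD_CREATED[thread_id]
        ).total_seconds() / 3600,
        "hour_of_day": posted_at.hour,
        "day_of_week": posted_at.weekday(),  # Monday == 0
    }


row = case_row(5, "u1", "t1", datetime(2014, 10, 2, 9, 0), HISTORY)
print(row)
```

A brand-new user gets `None` for the per-user rates, which is exactly the cold-start situation the case table has to cope with; the time-based attributes are still available in that case.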
SSAS Performance Issue Detection
• Goal: Predict when the server is going to fail
• Steps
  • Monitor and collect all counters and events
  • Label errors
  • ML classification & time series algorithms
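The "label errors" step can be sketched as follows: every counter sample is marked as positive if an error was logged shortly after it, which turns the monitoring data into a training set for a classifier. This is a minimal sketch under assumptions; the counter names, the 15-minute horizon, and the function `label_samples` are hypothetical and not taken from the project described on the slide.

```python
from datetime import datetime, timedelta


def label_samples(samples, error_times, horizon=timedelta(minutes=15)):
    """Attach fails_soon=True to samples followed by an error within horizon."""
    labeled = []
    for taken_at, counters in samples:
        fails_soon = any(
            timedelta(0) <= error_at - taken_at <= horizon
            for error_at in error_times
        )
        labeled.append({"taken_at": taken_at, **counters, "fails_soon": fails_soon})
    return labeled


# Hypothetical counter readings every five minutes and one logged error.
start = datetime(2014, 10, 28, 9, 0)
samples = [
    (start + timedelta(minutes=5 * i), {"memory_pct": 60 + 8 * i, "queued_queries": i})
    for i in range(5)
]
errors = [start + timedelta(minutes=22)]

for row in label_samples(samples, errors):
    print(row["taken_at"].time(), row["memory_pct"], row["fails_soon"])
```

The horizon is the main design choice: a longer one gives earlier warnings but makes the positive class noisier.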
Customer loyalty campaign & Stock calculator
• Big Automotive Manufacturer
Retail Company: 
Automate decision making
More ML solutions
• Churn analysis
• Advertising analysis
• Pricing analysis
• Weather forecasting
• IT optimization
• Fraud detection
• Recommendation engines
• Personalized services
• Health issues detection
No limits
And Now… 
AZURE ML
AzureML
• Fully-managed & scalable cloud service
• Focus on ability to develop & deploy
• For emerging data scientists
• UI for Data Science workflow
• Quality ML algorithms
• Collaborative
• Accessible through a web browser
• Fastest deployment to production
First look at AzureML 
DEMO
CRISP Model
CRISP-DM = Cross Industry Standard Process for Data Mining
(http://en.wikipedia.org/wiki/Cross_Industry_Standard_Process_for_Data_Mining)
[Figure label: Transform]
AzureML Process Cycle
1.- Get/Prepare Data
2.- Build Experiment
3.- Run Experiment
4.- Review results
5.- Save Trained Model
6.- Add Trained Model to new Experiment
7.- Run Scoring and set Public Input/Output
8.- Publish Web Service
9.- Deploy to Prod.
Roles shown: Data Scientist, IT
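Outside the AzureML UI, the middle of this cycle (train, save the trained model, reuse it for scoring with a declared input and output) looks roughly like the sketch below. It is an analogy under assumptions, not AzureML code: the one-feature threshold "model", the JSON file, and the names `train`, `save_model` and `load_scorer` are hypothetical and only illustrate why the trained model is saved as a separate artifact.

```python
import json
import os
import tempfile


def train(rows):
    """'Run experiment': learn a threshold halfway between the two class means."""
    positives = [x for x, label in rows if label]
    negatives = [x for x, label in rows if not label]
    threshold = (sum(positives) / len(positives) + sum(negatives) / len(negatives)) / 2
    return {"feature": "memory_pct", "threshold": threshold}


def save_model(model, path):
    """'Save trained model' as a plain JSON artifact."""
    with open(path, "w", encoding="utf-8") as handle:
        json.dump(model, handle)


def load_scorer(path):
    """'Add trained model to a new experiment': rebuild a scoring function."""
    with open(path, encoding="utf-8") as handle:
        model = json.load(handle)

    def score(request):
        # Declared input: {"memory_pct": number}. Declared output: {"fails_soon": bool}.
        return {"fails_soon": request[model["feature"]] > model["threshold"]}

    return score


# Hypothetical training rows: (memory_pct, failed_soon_after).
training_rows = [(55, False), (60, False), (85, True), (90, True)]
path = os.path.join(tempfile.mkdtemp(), "model.json")
save_model(train(training_rows), path)
score = load_scorer(path)
print(score({"memory_pct": 88}))
```

Separating the saved model from the scoring function is what lets one person train and another publish: the scorer only needs the artifact and the agreed input/output shape.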
Data Scientists love R
• Most powerful statistical programming language
• Almost 400 of the most popular R packages already available and integrated
• Visualization using R plotting libraries
• Future:
  • Upload your own R packages
  • Python compatibility
R integration 
DEMO
AzureML Pricing 
Special demo 
DEMO PORTOFLIX
BI feeds AzureML
• Case table is critical
[Diagram labels: Historical Dataset, Cube, ETL, Mining Models, Cube]
AzureML feeds BI
• Consume results from AzureML
  • Azure Marketplace
  • C#, R
  • Excel add-in
  • Power Query
  • http://microsoftazuremachinelearning.azurewebsites.net/
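Consuming a published scoring service from code comes down to an authenticated HTTP POST with the rows to score. The sketch below only builds the request and does not send it. The endpoint URL and API key are placeholders, and the JSON body layout (`Inputs`, `input1`, `ColumnNames`, `Values`, `GlobalParameters`) is an assumption about the request-response format; the API help page of your own published service is the authority on the exact shape. The function name `build_scoring_request` is hypothetical.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, column_names, rows):
    """Build (but do not send) a POST request carrying rows to score."""
    # Assumed body layout; confirm it against the service's own API help page.
    body = {
        "Inputs": {"input1": {"ColumnNames": column_names, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


# Placeholder endpoint and key: substitute the values shown for your own service.
request = build_scoring_request(
    "https://example.invalid/score",
    "YOUR_API_KEY",
    ["memory_pct", "queued_queries"],
    [["88", "4"]],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
# To call the service: urllib.request.urlopen(request).read()
```

The same request can be issued from C#, R, Excel or Power Query; only the client changes, not the contract.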
Power Query consuming AzureML 
DEMO
Summary
• Convert problems into ML problems
• All about good data
• AzureML + Big Data + Data culture
Resources
• Machine Learning Blog
  http://blogs.technet.com/b/machinelearning/
• Forum
  http://social.msdn.microsoft.com/forums/azure/en-US/home?forum=MachineLearning
QUESTIONS
Contact me!
• Rubén Pertusa López (rpertusa@solidq.com)
• Twitter: @rpertusa
Thank you! 

Editor's Notes

  • #22 How did we get over this problem? Our research takes two approaches: one based on how people behave when they post, and a second based on the meaning of the text they write. We also have the history of every post together with its result (moderated or not), so we can train the model and guide the data mining. We don’t have to guess the result or classify posts by hand; we just learn from the history.
    Starting with the first approach, we looked at the information we have:
      • The EntryId, ThreadId and ForumId of the post
      • The UserId
      • The date and time of the post
      • The result of the moderation (fail or not)
    But this information is not enough to extract knowledge and find patterns. What happens if a new user posts on a new thread? We have no history of that user's behavior or of the thread's. What should our system do in that case? We started building attributes like these:
      • Percentage of fails in a certain thread
      • Percentage of fails per user
      • Difference in hours between the date of the post and the date the forum was created
      • Percentage of fails in a certain forum
      • Difference in hours between the date of the post and the date the user last failed on a forum
      • The hour of the day
      • Difference in hours between the date of the post and the date the user joined the forum
      • Percentage of fails per user in a thread
      • Difference in hours between the date of the post and the date the thread was created
      • The day of the week
    You can imagine how many combinations of these attributes are possible: percentage of fails per user in a thread on Mondays, percentage of fails during weekends, during national holidays, during the last week, and so on.
    There are lots of patterns and uses, for example:
      1.- Moderating half of what moderators are moderating right now, data mining will still catch more than 95% of the failing posts. That is, moderating 600,000 posts, you will fail 86,000 of the 91,000 that you are failing now.
      2.- On heavy posting days, when you are not able to moderate everything, you can automatically decide not to moderate posts that are unlikely to fail. This way you minimize the risk compared to a random selection of posts to skip.
  • #28 Microsoft Azure Machine Learning, a fully-managed cloud service for building predictive analytics solutions, helps overcome the challenges most businesses have in deploying and using machine learning. How? By delivering a comprehensive machine learning service that has all the benefits of the cloud. Azure ML brings together the capabilities of new analytics tools, powerful algorithms developed for Microsoft products like Xbox and Bing, and years of machine learning experience into one simple and easy-to-use cloud service.
  • #30 The “Transform” part of the virtuous cycle of data mining is further divided into steps. The Cross Industry Standard Process for Data Mining (CRISP) model is an informally standardized process for the “Transform” part. It splits the process into six phases. The sequence of the phases is not strict; moving back and forth between phases is always required. The outcome of each phase determines the next phase (or particular task of a phase) that has to be performed. The arrows indicate the most important and frequent dependencies between phases. The outer circle in the figure symbolizes the cyclic nature of data mining itself: a data mining process continues after a solution has been deployed. The lessons learned during the process can trigger new, often more focused business questions, and subsequent data mining processes benefit from the experience of previous ones.
    Each of the six CRISP phases should finish with some deliverables. The phases and their typical deliverables are:
      • Business understanding: data mining problem definition
      • Data understanding: data quality reports, descriptive statistics, graphical presentations of data, etc.
      • Data preparation: cleansed training and evaluation datasets, including derived variables
      • Modeling: different models using different algorithms with different parameters
      • Evaluation: decision whether to use a model and which model to use
      • Deployment: end-user reports, OLAP cube structure, OLTP “soft” constraints, etc.
    This course will focus on the “Transform” part of the virtuous cycle.
  • #32
      • Data scientists can bring their existing assets in R and integrate them seamlessly into their Azure ML workflows.
      • Using Azure ML Studio, R scripts can be operationalized as scalable, low-latency web services on Azure in a matter of minutes!
      • Data scientists have access to over 400 of the most popular CRAN packages, pre-installed. Additionally, they have access to optimized linear algebra kernels that are part of the Intel Math Kernel Library.
      • Data scientists can visualize their data using R plotting libraries such as ggplot2.
      • The platform and runtime environment automatically recognize and provide extensibility via high-fidelity bi-directional data frame and schema bridges, for interoperability.
      • Developers can access common ML algorithms from R and compose them with other algorithms provided by the Azure ML platform.
    R is the most widely used data analysis software, used by 2M+ data scientists, statisticians and analysts:
      • Most powerful statistical programming language; used with RStudio, it can boost your productivity
      • Create beautiful and unique data visualisations, as seen in the New York Times, Twitter and Flowing Data
      • Thriving open-source community at the leading edge of analytics research
      • Fills the talent gap: new graduates prefer R
      • It’s fun!
    Why else might you use R?
      • Pivot tables are not always enough
      • Scaling data (ScaleR)
      • R is very good at static data visualisation, while Power BI and Excel are very good at dynamic data visualisation
      • You want to double-check your results or do further analysis
      • You can use RODBC to connect R to SQL Server or to Excel; alternatively, you can import the data
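Note #22 argues that moderating only part of the posts can still catch most of the failing ones. The reasoning can be checked with a small calculation: rank posts by the model's predicted probability of failing, moderate only the top share, and measure how many of the truly failing posts fall inside it. The sketch below uses ten made-up posts and a hypothetical function `capture_rate`; it does not reproduce the figures from the note.

```python
def capture_rate(scored_posts, budget_fraction):
    """Share of truly failing posts caught when moderating only the riskiest posts.

    scored_posts: list of (predicted_fail_probability, actually_failed) pairs.
    budget_fraction: share of all posts that moderators can review.
    """
    ranked = sorted(scored_posts, key=lambda post: post[0], reverse=True)
    budget = int(len(ranked) * budget_fraction)
    caught = sum(failed for _, failed in ranked[:budget])
    total = sum(failed for _, failed in ranked)
    return caught / total if total else 0.0


# Hypothetical scores for ten posts, three of which really failed moderation.
posts = [
    (0.95, True), (0.90, True), (0.70, False), (0.65, True), (0.40, False),
    (0.30, False), (0.20, False), (0.15, False), (0.10, False), (0.05, False),
]
print(capture_rate(posts, 0.5))
```

Comparing this rate with the budget fraction itself shows the gain over picking posts at random, where the expected share of failing posts caught equals the share of posts reviewed.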