Create and operationalize a predictive model using Microsoft Azure Machine Learning.
– Perform the typical steps involved in building a predictive analytics solution, such as data ingestion, data cleansing, data exploration, feature engineering, model selection and evaluation of model results.
– Learn how to use machine learning in big data scenarios, using tools like Hadoop and SQL Server to process and work with such data.
Building a Predictive Analytics Solution with Azure ML
1. Building a Predictive Analytics Solution with Azure ML
Fidan Boylu Uz & Syed Fahad Allam Shah
Open Data Science Conference, Boston 2015
@opendatasci
2. Building a predictive analytics solution with Azure ML
Fidan Boylu Uz, Ph.D
Syed Fahad Allam Shah, Ph.D
Data Scientists, Microsoft
6. Predicting future performance from historical data
– Recommendation engines
– Advertising analysis
– Weather forecasting for business planning
– Social network analysis
– IT infrastructure and web app optimization
– Legal discovery and document archiving
– Pricing analysis
– Fraud detection
– Churn analysis
– Equipment monitoring
– Location-based tracking and services
– Personalized insurance
Predictive analytics should address the likelihood of something happening in the future, even if it is just an instant later…
11. This is Karl.
Karl owns a company that operates vending machines in Washington.
His job is to make sure that his 100 vending machines are selling drinks and bringing in revenue.
Karl wants revenue to always be high and his business to be profitable.
12. Sadly, vending machines will occasionally break and may take up to 7 days to fix, thus hurting sales.
To eliminate this occurrence, Karl must maintain operations and figure out the best way to use resources in order to optimize revenue.
Intro to Advanced Analytics + AML (Slides) – 15 minutes
One of the key use cases for Machine Learning is advanced analytics. Let’s start by level-setting what we mean by Advanced Analytics. We often get asked whether Advanced Analytics is just “BI” with fancier branding. This chart helps to illustrate why that is not the case.
Put simply, BI is a tool designed to show you what has happened so you can make your own decisions about what to do next. The next level, Advanced Analytics, has the computer predict what will happen next. These predictions can be more accurate than a person reading a BI dashboard, because the computer can reason over far more variables than a human can. But prediction is only the first step. Once you are accurately predicting the future, you can program the computer further to anticipate those occurrences and react accordingly.
ML takes a different approach. By applying concepts from a range of fields, including statistics, probability theory, and so on, we can build an ML system that “learns” to recognize handwritten digits by being trained on thousands, or even millions, of examples.
So, in order to get an accurate digit classifier, all we need to do is acquire lots of training examples with labels (this is called “labeled training data”), and then feed them into the ML system.
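To make “learning from labeled training data” concrete, here is a minimal sketch in plain Python. It is not what Azure ML does internally. It uses a simple nearest-centroid classifier as a stand-in, and the tiny 3x3 “digit” bitmaps and the function names are made up for illustration.

```python
from collections import defaultdict


def train_centroids(examples):
    """Learn one average feature vector (centroid) per label."""
    sums = {}
    counts = defaultdict(int)
    for features, label in examples:
        if label not in sums:
            sums[label] = [0.0] * len(features)
        for i, value in enumerate(features):
            sums[label][i] += value
        counts[label] += 1
    return {
        label: [total / counts[label] for total in vector]
        for label, vector in sums.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest (squared Euclidean distance)."""
    def distance(label):
        return sum((c - f) ** 2 for c, f in zip(centroids[label], features))
    return min(centroids, key=distance)


# Hypothetical labeled training data: 3x3 bitmaps flattened row by row.
TRAINING = [
    ([0, 1, 0, 0, 1, 0, 0, 1, 0], "1"),
    ([0, 1, 0, 1, 1, 0, 0, 1, 0], "1"),
    ([1, 1, 1, 1, 0, 1, 1, 1, 1], "0"),
    ([1, 1, 1, 1, 0, 1, 0, 1, 1], "0"),
]

model = train_centroids(TRAINING)

# An unseen, slightly noisy "1": the model has never seen this exact bitmap.
print(predict(model, [0, 1, 0, 0, 1, 0, 0, 1, 1]))
```

The point of the sketch is the workflow rather than the algorithm: nothing in the code describes what a “1” looks like, the system only sees labeled examples. A real digit classifier would use far more examples and a stronger model.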
So what are these converging factors?
The first is effectively unlimited scale, which the cloud now makes possible at low cost. You can harness massive amounts of data in a way that was previously inconceivable because of the cost of the on-premises hardware and software needed to get there.
And that’s a good thing too, since the amount of data you have to consume is exploding. Social data, streaming data, data from every corner of the world that actually matters to your business – it isn’t just noise.
That data out there is attached to your customer who expects a new relationship with your brand that was not accessible to them in the old world. They expect their experience with your brand to be seamless from online to in-store. They expect that when they have a complaint and post it on your Facebook page, you listen, respond and take action.
But then who is going to do this advanced analytics work? Many of you don’t have a data scientist; you didn’t think you needed one. And perhaps you don’t, or perhaps you’ve had trouble finding one. That’s because there is a talent gap today. McKinsey puts the gap between supply and demand at 300 thousand in the US alone. But that’s changing. Universities are turning out talent and spinning up new programs faster than we can count them, and companies like yours are snatching up this talent. But the market this new talent is entering is still filled with barriers.
As mentioned, advanced analytics and machine learning have been around a long time, but progress in this space has been glacial. Cloud adoption in larger companies has been slow, and on-premises advanced analytics deployments are prohibitively expensive in terms of both infrastructure and talent. Even when you do get a data scientist in-house, they often work for a department rather than within IT. That means, for example, that they have access to the data of the finance department in which they sit, but as we’ve discussed, the value of advanced analytics is found in reasoning over many variables, and getting access to those variables can be extremely hard. Then let’s say that data scientist gets the data but designs the solution in an open source language such as R or Python, and the rest of the organization doesn’t use that language. When the solution is handed to a developer to put into production, the two literally don’t speak the same language. You can see why adoption has been so slow.
But we’re changing all that through our vision of accessibility to all.
We first provide a modeling experience that welcomes all skill levels. Data scientists can use trusted algorithms from Xbox and Bing without writing one line of code. Or, more seasoned data scientists can mix and match with Python and R built in, or drop in their custom code. So – literally – the tool speaks their language.
Then we can deploy in minutes as a web service – our one-click deployment is unique to Microsoft.
Then partners and data scientists can scale through the Marketplace, and developers can grab APIs or finished solutions with the data science inside. No machine learning skills are needed.
This all converges into differentiation for business: a business that can not only consume the massive amounts of data being generated every day, but turn them into knowledge, action and advantage. Let’s talk now about some companies who are doing just that today.
Demo of AML product, audience passive, listening only - 10 minutes
So what does that look like from an architectural perspective? With advanced analytics, you work from the business problem backwards. All of the products listed can come into play, or only a few, depending on what job the technology needs to do.
Let’s say I have an issue of customer churn. I don’t know why my best customers are leaving and I need to find out. I have things like Twitter/Facebook/Blog entries in HDInsight – our Hadoop implementation in the cloud – and it’s streaming in daily from the web. On-premises I have my customer sales data and buying behavior.
I can then bring in the training set data from HDInsight and a subset of my on-premises customer data into the built-in storage space. I can then model against that training set in ML Studio – which is the playground for the data scientist or advanced analytic developer. In this space the implementer trains and tests the model until she is satisfied that the model will deliver the answer to the question of customer churn. Not only why the customers are departing, but predictive analytics to tell the company which ones are currently at risk based on past data. That way the sales and marketing departments can target those specific customers with the right activities to solve for why they’re leaving in the first place.
The implementer then literally pushes a “Yes” button in the tool to send the finished model into staging, with a flag on the Microsoft Azure portal letting the owner of the all-up portal experience know the model is ready to go. Again, this is a unique and differentiated experience with Azure ML: we are the only ones who offer the ability to push a customized model to production this easily and quickly. Once pushed live, the model is surfaced as a web service which can run over any data, anywhere. If it runs over on-premises data, that data is never persisted in the cloud. The only data that must be in the cloud is the original training set, and for customers with compliance or security concerns, it can be anonymized and then removed once the modeling is done.
This finished web service can now be called from the company dashboard, where the CMO can easily consume the results and advise the teams accordingly. And, as the company’s needs change, the implementer need only revisit the model in ML Studio, adjust it and push it to staging again to literally have the model swap out underneath the live web service.
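For anyone wondering what “calling the web service” looks like from a dashboard or app, here is a rough sketch using only the Python standard library. Treat the specifics as assumptions: the URL and key are placeholders, the column names are invented, and the payload shape is modeled on the sample request code that Azure ML generates for a published service, so check your own service’s API help page for the exact fields.

```python
import json
import urllib.request

# Placeholders (assumptions): copy the real values from your deployed service.
SCORING_URL = "https://example.invalid/score"
API_KEY = "YOUR-API-KEY"


def build_scoring_request(columns, rows, url=SCORING_URL, api_key=API_KEY):
    """Build an HTTP POST that sends rows of input data to the scoring service."""
    payload = {
        # Assumed payload shape; verify against the service's API help page.
        "Inputs": {"input1": {"ColumnNames": columns, "Values": rows}},
        "GlobalParameters": {},
    }
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,
        },
        method="POST",
    )


def score(request):
    """Send the request and return the parsed JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


# Hypothetical churn features for one customer.
request = build_scoring_request(
    ["customer_id", "monthly_spend"],
    [["42", "19.99"]],
)
# result = score(request)  # uncomment once SCORING_URL and API_KEY are real
```

Because the caller only depends on the URL and the input columns, the model behind the service can be retrained and swapped without changing this client code, as long as the inputs stay the same.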
But what if the company doesn’t have an implementer in house? In that case, they can go right to the Azure Machine Learning Marketplace, where there are live hosted web services already existing to solve common problems such as this. They can be simply hooked up to apps, services and dashboards for this type of solution. This is also a value-add for companies and implementers looking to monetize their own machine learning solutions. On the Machine Learning Center at azure.com/ml, we have detailed instructions on how to create, monetize and scale your own ML offerings.
This advanced analytics process guide provides a map of the data science tasks typically involved in building and deploying predictive models using Azure Machine Learning. It shows how the Azure platform enables tasks such as ingesting data from various sources, preparing it for use in Azure Machine Learning, and then creating operationalized models with an Azure Machine Learning experiment that can be consumed by end user applications, programmatically or otherwise. While the map shows the core series of steps involved in a typical end-to-end data science exercise, not all steps are required, and their precise sequence can vary depending on the location, size and complexity of the data.
Generic walkthrough
Talk about NYC (10 mins)
Describe the dataset
What can we predict?
What does the end result look like - show the app here.
Hands On (60 mins)
Build an AML experiment with sample of NYC data
Operationalize and consume
Demo (30 mins)
Talk about original dataset being 50GB, but the sample was only 0.5GB. How do we bridge the gap?
Show ADAPT
Point people to ADAPT resources on Azure.com
Talk about setting up environments for data science
Ingesting data
Using IPython Notebook (IPNB) - do a hands-on demo of visualizing 50GB of data, feature engineering, down-sampling etc.
Now you've come full circle to loading the sample in Azure ML.
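For the down-sampling step, going from 50GB to 0.5GB means keeping roughly 1 row in 100. A minimal sketch of one way to do that, assuming the data is line-oriented text such as CSV: stream the rows and keep each one with probability 0.01, so the full file never has to fit in memory. The function name and the toy columns are made up for illustration, and this is not necessarily how the sample used in the hands-on was produced.

```python
import random


def downsample_lines(lines, fraction, seed=0, has_header=True):
    """Yield the header (if any) plus a random subset of rows.

    Each row is kept independently with probability `fraction`, so the
    output size is approximately, not exactly, fraction * input size.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    iterator = iter(lines)
    if has_header:
        header = next(iterator, None)
        if header is None:
            return
        yield header
    for line in iterator:
        if rng.random() < fraction:
            yield line


# 0.5 GB sample out of a 50 GB dataset: keep about 1% of the rows.
FRACTION = 0.5 / 50

# Toy stand-in for the big file (hypothetical columns).
rows = ["id,fare"] + [f"{i},{i % 30}" for i in range(10_000)]
sample = list(downsample_lines(rows, FRACTION))
print(len(sample) - 1, "rows kept out of", len(rows) - 1)
```

With a real file, pass the open file object instead of the list, for example `downsample_lines(open(path), FRACTION)`, and write the yielded lines to a new file that can then be loaded into Azure ML.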