The Machine Learning Workflow with Azure

April 21
Real World Machine Learning in Azure
The Machine Learning Workflow
Step by Step and in Azure

About me
• Project Manager @
o 16+ years professional experience
• Microsoft Azure MVP
• External Expert Horizon 2020
• External Expert Eurostars, InnoFund DK
• Business Interests
o Web Development, SOA, Integration
o IoT, Machine Learning
o Security & Performance Optimization
• Contact
o ivelin.andreev@icb.bg
o www.linkedin.com/in/ivelin
o www.slideshare.net/ivoandreev

Agenda
• Domain Challenges
• The ML Workflow (step by step)
• ML Options in Azure
• Demo

Programming vs Machine Learning
• How does classic programming work?
o Developer is the intelligence
o Array of statements:
• Does a bird fly?
• Yes!... Unless: dead, injured, flightless, missing a wing
o Problems arise at scale: rules and exceptions are endless
o System does not adapt
• ML model is …
o System, answering questions correctly (most of the time)
o Created via training process
o Learns from data and finds patterns
• Use Cases
o Classification, Regression, Recommendation, Anomaly detection

Machine Learning Challenges
• Asking the right questions
• Requires training data
o Real-world data is messy (wrong or missing data)
o Feature engineering transforms raw data into predictive features (e.g. DNA)
o Feature extraction (e.g. IP Address -> population density)
o Feature selection for informative features
• Overfitting model
o “Kicks ass” while training, fails badly on real predictions
• Model validation
o “Sense” how well your model will work on new data

Model Types
The purpose of ML modelling is:
• Generate predictions
• Understand true relations

• Parametric Methods
o Step 1: Select a form for the function (e.g. f(X) = a*X + b)
o Step 2: Learn the coefficients from the training data
o Pros: Simple, Speed, Less training data
o e.g. Linear Regression: y = β0 + β1*Credit_Line + β2*Education_Level + β3*Age
• Nonparametric Methods
o No fixed functional form
o Pros: Flexible, No assumptions, Predictive power
o Cons: Overfitting, Slower, More training data
o e.g. Decision Tree
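The two parametric steps above can be sketched in a few lines. This is a minimal illustration, assuming a single feature and ordinary least squares as the learning rule; the helper names `fit_line` and `predict` and the training data are made up for the example.

```python
# Parametric method sketch: assume the form f(X) = a*X + b (Step 1),
# then learn the coefficients a and b from training data (Step 2).

def fit_line(xs, ys):
    """Ordinary least squares for one feature; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x)
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x
    return a, b


def predict(a, b, x):
    """Apply the learned function to a new value."""
    return a * x + b


# Made-up training data that lies exactly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]

a, b = fit_line(xs, ys)
print("a =", a, "b =", b)
print("prediction for x = 10:", predict(a, b, 10.0))
```

Only two numbers are learned, which is why parametric methods are fast and need little data. A nonparametric method such as a decision tree would instead grow its structure from the data.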

ML vs. Statistical Modelling
• Statistical Models
o Require understanding how data were collected
o Aggregate data into numbers to understand structure
o Easily interpretable on lower dimensional datasets
• Data Science
o Bridges the gap
o Find out patterns in data and come with initial insights
• ML Models
o Make data speak instead of following initial hypothesis
o Customizable to fit business domain
o Scale to handle thousands of features

Do you know which is
the “sexiest” job
of 21st century?

You nailed it!
Harvard Business Review
claims that the answer is
DATA SCIENTIST

The Truth about Data Science
• Appealing
o 64% believe they are working in this century’s most appealing job
• In demand
o 90% contacted at least once a month with job offer
o 50% - weekly, 30% - several times/week, 35% have <2y experience
• The dark side…
o All models are wrong, some are useful
o 80% of the time is data preparation
o Real life, not academic problems
o Non-linear process
o No full automation
• No one cares how you do it
• Presentation is the key

MASTERING THE TOOLS
does not make you
a watchmaker
It still takes
process and experience

Data Understanding
• Mosaic plot
o Categorical distribution
o Visualizes the relation between X and Y
o Strong relation = Y-splits are far apart
• Box plot
o Continuous distribution
o Distribution of numeric variable
o Identify and discard outliers (IQR)
• Scatter plot
o How much a variable determines another
https://www.kaggle.com/saisivasriram/titanic-feature-understanding-from-plots
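The box-plot rule for discarding outliers can be sketched as follows. This is an illustration only: it assumes the common 1.5 x IQR whisker convention, and the `ages` sample and helper names are invented for the example.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (low, high) whisker limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _median, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def drop_outliers(values, k=1.5):
    """Keep only the values that fall inside the whisker limits."""
    low, high = iqr_bounds(values, k)
    return [v for v in values if low <= v <= high]


# Hypothetical numeric column with one obviously wrong entry
ages = [22, 25, 27, 29, 30, 31, 33, 35, 38, 120]

low, high = iqr_bounds(ages)
cleaned = drop_outliers(ages)
print("limits:", low, high)
print("cleaned:", cleaned)
```

Whether a flagged value is an error or a rare but valid observation is still a domain decision.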

Data Preprocessing
• Make features usable
o Numerical
o Categorical (e.g. week day)
o PCA dimensionality reduction (clustering, low covariance)
o Dummy variables
• Handle missing data
• Normalize data
o Standard range of numerical scale (e.g. from [-1000;1000] -> [0;1], [-1;1])
o Value range influences the importance of a feature compared to the others
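Two of the preprocessing steps above, normalization and dummy variables, can be sketched in plain Python. This is a minimal sketch, assuming min-max scaling as the normalization method; the function names and sample values are hypothetical.

```python
def min_max_scale(values, new_min=0.0, new_max=1.0):
    """Linearly map values from [min(values), max(values)] to [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no information; map it to the lower bound.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]


def dummy_variables(categories):
    """One 0/1 column per distinct category (one-hot encoding)."""
    levels = sorted(set(categories))
    rows = [[1 if c == level else 0 for level in levels] for c in categories]
    return levels, rows


raw = [-1000, 0, 500, 1000]
unit_range = min_max_scale(raw)              # target range [0;1]
symmetric = min_max_scale(raw, -1.0, 1.0)    # target range [-1;1]

levels, encoded = dummy_variables(["Mon", "Tue", "Mon", "Sun"])
print(unit_range, symmetric)
print(levels, encoded)
```

In practice the minimum and maximum should be computed on the training data only and then reused for validation data.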

Feature Engineering
Def: Using transformations on raw data to create new
features, more closely related to target variable
• Create features more closely related to target variable
o e.g. defaulting customer: debt-to-balance ratio = debt / balance
• Bring external data sources (e.g. Google places from IP address)
• Create features that are easily interpreted (e.g. date to day & month)
• Use unstructured data sources (e.g. text, video)
• Create features, experiment, choose those with the best predictive power
Note: Domain knowledge is important (e.g. 7th is a pension day)
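The examples from this slide (debt-to-balance ratio, date to day and month, the pension-day rule) could be derived from a raw record as shown below. The record layout and field names are assumptions made for this illustration.

```python
from datetime import date


def engineer_features(record):
    """Derive features that relate more closely to the target than the raw fields."""
    debt = record["debt"]
    balance = record["balance"]
    paid_on = record["payment_date"]
    return {
        # Ratio feature; None when the balance is zero to avoid division by zero
        "debt_to_balance": debt / balance if balance else None,
        # Date split into easily interpreted parts
        "day": paid_on.day,
        "month": paid_on.month,
        "weekday": paid_on.weekday(),  # Monday = 0 ... Sunday = 6
        # Domain knowledge from the slide: the 7th is a pension day
        "is_pension_day": paid_on.day == 7,
    }


# Hypothetical raw record
raw = {"debt": 1500.0, "balance": 6000.0, "payment_date": date(2018, 4, 7)}
features = engineer_features(raw)
print(features)
```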

Digital Media Feature Engineering
Note: All information is encoded in the digital media
• Images
o Step 1: Colour statistics, EXIF metadata, edges, shapes
o Step 2: Extract knowledge in fixed set of numeric characteristics
• Text
o Step 1
• Bagging, N-grams, term frequency, topic modelling, stemming
• Named entity recognition (e.g. Wikipedia)
o Step 2: Extract knowledge in fixed set of numeric characteristics
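For text, the two steps might look like the sketch below: N-grams and term frequency (Step 1) are turned into a fixed-length numeric vector (Step 2). The tokenizer, the sample documents and the function names are assumptions for the example; real pipelines usually add stemming and stop-word handling.

```python
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def ngrams(tokens, n):
    """All runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency_vector(text, vocabulary, n=1):
    """Relative frequency of each vocabulary term: one fixed-length numeric vector."""
    counts = Counter(ngrams(tokenize(text), n))
    total = sum(counts.values()) or 1
    return [counts[term] / total for term in vocabulary]


# Made-up documents
docs = [
    "The pump failed after the pump overheated",
    "The sensor reading is normal",
]

# The vocabulary fixes the number and order of the numeric characteristics
vocabulary = sorted({token for doc in docs for token in tokenize(doc)})
vectors = [term_frequency_vector(doc, vocabulary) for doc in docs]

print(vocabulary)
print(vectors[0])
print(ngrams(tokenize(docs[1]), 2))
```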

Modelling Starts by Selecting Algorithm
• There are other ML tools
• There are many more algorithms
• You could make custom
implementations

Performance Evaluation
Basic evaluation workflow
• Pick performance metric based on algorithm type
• Tweak data and model until target performance is reached
CAUTION: Common problems
• Using the same data for validation and training
o Split data - 20-40% of data for validation
o K-fold cross-validation - repeated splits beat the noise of a single split
• Overfitting and model optimism
o Do not get tempted to model noise (bias-variance tradeoff)
o Do not use temporal features (future features) to predict values in the past
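The two ways of keeping validation data separate from training data can be sketched like this. It is a minimal sketch using only the standard library; the 30% validation share and the fixed seed are arbitrary choices for the example.

```python
import random


def train_validation_split(rows, validation_fraction=0.3, seed=42):
    """Shuffle once, then hold out a fraction of the rows for validation."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_val = int(round(len(shuffled) * validation_fraction))
    return shuffled[n_val:], shuffled[:n_val]


def k_fold_indices(n_rows, k=5, seed=42):
    """Yield (training, validation) index lists; each row is validated exactly once."""
    rng = random.Random(seed)
    indices = list(range(n_rows))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


rows = list(range(10))
train, val = train_validation_split(rows)
print(len(train), "training rows,", len(val), "validation rows")

folds = list(k_fold_indices(len(rows), k=5))
for training, validation in folds:
    print("validate on", sorted(validation))
```

For time-dependent data a random shuffle would leak future values into training, so a chronological split is the safer choice there.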

Performance Metrics
• Regression model
o Root Mean Squared Error (RMSE)
o Coefficient of Determination, R² ∈ [0;1]
• Classification model
o Confusion matrix
• Binary classification model
o Accuracy based on correct answers
o Area under ROC curve (AUC)
• Threshold
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
o PR-curve is better for imbalanced distribution
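The metrics listed above follow directly from their definitions. The sketch below uses made-up labels and predictions; the 0.5 threshold in `apply_threshold` is only a default for the illustration.

```python
import math


def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_t = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def apply_threshold(scores, threshold=0.5):
    """Turn predicted probabilities into 0/1 labels."""
    return [1 if s >= threshold else 0 for s in scores]


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision_recall(y_true, y_pred):
    """Precision = TP / (TP + FP), Recall = TP / (TP + FN)."""
    tp, fp, fn, _tn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Made-up binary classification result
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.2, 0.7, 0.1, 0.3, 0.45]
predicted = apply_threshold(scores)
precision, recall = precision_recall(labels, predicted)
print("confusion (TP, FP, FN, TN):", confusion_counts(labels, predicted))
print("precision:", precision, "recall:", recall)

# Made-up regression result
print("RMSE:", rmse([3, 5, 7], [2, 5, 9]), "R2:", r2([3, 5, 7], [2, 5, 9]))
```

Lowering the threshold trades precision for recall, which is the trade-off the ROC and PR curves visualize.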

Tuning Model Parameters
• Model parameters control inner behaviour
o More sophisticated algorithm, more parameters
• e.g. Locally Deep SVM with kernel
o Kernel type, kernel coefficient
• How does parameter tuning work?
1. Choose metric for evaluation (AUC - classification, R2-regression, etc.)
2. Select parameters for optimization
3. Define a grid as Cartesian product between arrays
4. For each combination, cross-validate on training set
5. Select the parameters for the best evaluation
Note: Expected improvement is 3%-8%
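The five tuning steps can be sketched as a small grid search. The model here is a stand-in chosen to keep the example short: a one-feature ridge regression with two parameters (`alpha`, `fit_intercept`), scored by mean squared error, where lower is better. All names and data are assumptions for the illustration.

```python
import itertools


def fit_ridge_1d(xs, ys, alpha, fit_intercept):
    """Closed-form ridge regression for one feature; returns (a, b)."""
    mean_x = sum(xs) / len(xs) if fit_intercept else 0.0
    mean_y = sum(ys) / len(ys) if fit_intercept else 0.0
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / (sxx + alpha)
    return a, mean_y - a * mean_x


def mse(model, xs, ys):
    """Step 1: the evaluation metric (mean squared error)."""
    a, b = model
    return sum((a * x + b - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def cross_val_mse(xs, ys, params, k=3):
    """Step 4: k-fold cross-validation on the training set for one combination."""
    scores = []
    for i in range(k):
        val_idx = set(range(i, len(xs), k))
        train = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j not in val_idx]
        val = [(x, y) for j, (x, y) in enumerate(zip(xs, ys)) if j in val_idx]
        model = fit_ridge_1d([x for x, _ in train], [y for _, y in train], **params)
        scores.append(mse(model, [x for x, _ in val], [y for _, y in val]))
    return sum(scores) / k


def grid_search(xs, ys, grid, k=3):
    """Steps 3-5: Cartesian product of parameter arrays, keep the best score."""
    names = list(grid)
    best_params, best_score = None, float("inf")
    for combo in itertools.product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        score = cross_val_mse(xs, ys, params, k)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Made-up noiseless training data on y = 3x + 10
xs = [float(x) for x in range(9)]
ys = [3 * x + 10 for x in xs]

# Step 2: parameters selected for optimization
grid = {"alpha": [0.0, 1.0, 10.0], "fit_intercept": [False, True]}
best_params, best_score = grid_search(xs, ys, grid)
print(best_params, best_score)
```

The number of combinations is the product of the array lengths, so the grid grows quickly with every added parameter.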

Feature Selection - select the most predictive features
• ML handles x1000 params, but not all params are equal
• Adding features: common approach to increase accuracy
• Poor performance: correlated features could lead to poor model performance
• Overfitting: learning relations in more detail may lead to overfitting
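One simple way to spot correlated features before they hurt the model is a pairwise Pearson correlation check. This is a sketch only: the 0.9 cut-off and the sample columns are assumptions, not values from the slides.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric columns."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y)


def correlated_pairs(columns, threshold=0.9):
    """List feature pairs whose absolute correlation reaches the threshold."""
    names = list(columns)
    pairs = []
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            r = pearson(columns[first], columns[second])
            if abs(r) >= threshold:
                pairs.append((first, second, r))
    return pairs


# Made-up columns: temp_f is just temp_c in another unit
columns = {
    "temp_c": [10.0, 15.0, 20.0, 25.0, 30.0],
    "temp_f": [50.0, 59.0, 68.0, 77.0, 86.0],
    "vibration": [0.3, 0.1, 0.4, 0.1, 0.5],
}

pairs = correlated_pairs(columns)
print(pairs)
```

For each flagged pair, one of the two features is usually a candidate for removal.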

Selecting Good Features
• Motivation
o Sometimes the ML goal is not to predict but identify predictive features
o Computational costs are related to number of features
• Approach
o Trying all combinations of features? (that would be infeasible)
o Algorithms with built-in feature selection (e.g. decision trees)
• Algorithms
o Iterative Forward selection & Backward elimination
o Permutation feature importance
• High importance features are more sensitive to random shuffling of values
o Filter based feature selection
!!!Some features may have more predictive power when paired!!!
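Permutation feature importance can be sketched as follows: shuffle one feature column at a time and measure how much the error grows. The model here is a hypothetical fixed function standing in for a trained model, and the data are made up.

```python
import random


def mean_squared_error(model, rows, targets):
    """Average squared difference between model output and target."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)


def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average error increase per feature after randomly shuffling its values."""
    rng = random.Random(seed)
    baseline = mean_squared_error(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            # Rebuild the rows with only this one column permuted
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mean_squared_error(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Made-up data: the target depends on feature 0 only, feature 1 is irrelevant
rows = [[float(i), float((i * 7) % 5)] for i in range(20)]
targets = [3 * r[0] + 1 for r in rows]


def model(row):
    """Stand-in for a trained model (hypothetical)."""
    return 3 * row[0] + 1


importances = permutation_importance(model, rows, targets)
print(importances)
```

Shuffling one column at a time cannot reveal features that are only predictive in combination, which is the caveat in the note above.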

And now…
The Microsoft Azure tools
Data preparation
Building models
Consuming models

Azure Machine Learning
• Azure Machine Learning is an integrated, end-to-end
data science and advanced analytics solution
• ML related services and tools
• Highlights
o Built on open source technologies (Jupyter Notebook, Spark, Python, Docker)
o Execute experiments in isolated environments
o GPU-enabled VMs
o Azure ML Workbench
o Azure ML Experimentation Service
o Azure ML Model Management Service
o Azure ML Studio
o Data Science VM
o Libraries for Apache Spark (MMLSpark)
o Visual Studio Code Tools for AI
o Cognitive Toolkit (CNTK)
o Microsoft Cognitive Services
o ML Services for SQL Server (R, Python)

Azure ML Workbench
• Desktop application (Windows, macOS)
• Built-in Jupyter Notebook services and Git integration
• End-to-end process support
o Powerful inspectors for data analysis
o Data transformations by example
o Model development and experimentation (Python)
o Model history and deployment (local, Docker)

Azure ML Studio
• Visual workspace to build, test and deploy ML solutions
• Highlights
o Cross-browser drag and drop, no programming
o Rich set of modules
o Fits beginners and advanced users
o Unlimited extensibility (R Script, Python Script)
o Enterprise grade cloud service (SLA 99.95%)
o ML REST web services consumption
o Jupyter Notebook
o Azure AI Gallery (8000+ samples)
• At what price?
o Free plan available
o €8.5 per seat + €0.85 per experiment/hour
o Recommended: €85/month (100K requests)

Azure Data Science VM
• Pre-configured cloud environment for AI & Data Science
• Highlights
o Preconfigured, fully operational environment
o 50+ tools DEV, ML, BigData, Data management
o Windows and Linux (Ubuntu/CentOS)
o Updated every few months
o On-demand elastic capacity
o GPU optimized VMs for deep learning
o Up to 4x NV K80 or V100 GPUs
o Up to 128 cores, 3.8TiB RAM
• At what price?
o From €10 to €28’620 per month

Azure ML Experimentation Service
• Handles execution of ML experiments in virtual environments for isolated, consistent and reproducible results (since 09.2017)
o Local native
o Docker (Local and Remote)
o Azure Spark cluster
• Supports Workbench, records and presents run history
• Scalable model consumption
https://docs.microsoft.com/en-us/azure/machine-learning/preview/experimentation-service-configuration

Azure ML Model Management Service
• Provides deployment, hosting, versioning and management of models in Azure, on-prem and IoT Edge
• Deployment
o Model manifest for Docker image
https://docs.microsoft.com/en-us/azure/machine-learning/preview/deployment-setup-configuration
• Consumption
o Models exposed on REST API
o Sample code (Java, C#, Python)
• Scalability
o Scale-out to 100x replicas/cluster
o 10 requests/replica (default)
o Autoscaling based on load
• Retraining
o APIs to retrain models and update model version
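Consuming a model exposed on a REST API could look roughly like the sketch below. Everything service-specific is a placeholder: the URL, the key, the bearer-token header and the `{"data": ...}` payload shape are assumptions for illustration, since the real contract depends on the deployed model and its published schema.

```python
import json
import urllib.request


def build_scoring_request(url, api_key, rows):
    """Prepare a JSON POST request for a scoring endpoint (payload shape is assumed)."""
    body = json.dumps({"data": rows}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": "Bearer " + api_key,  # assumed auth scheme
        },
        method="POST",
    )


def score(url, api_key, rows, timeout=10):
    """Send the request and decode the JSON response."""
    request = build_scoring_request(url, api_key, rows)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


# Placeholder endpoint and key; the request is only built here, not sent
request = build_scoring_request(
    "https://example.invalid/score",
    "PLACEHOLDER_KEY",
    [[0.5, 1.2, 3.4]],
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Calling `score(...)` with a real endpoint and key would perform the network call; the sample code generated by the service remains the authoritative reference for the request format.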

Takeaways
• ML in the Microsoft World
o https://docs.microsoft.com/en-us/azure/machine-learning/
• Python for AI
o https://wiki.python.org/moin/PythonForArtificialIntelligence
• Data Science Blog
o https://data-flair.training/blogs/category/machine-learning/
• Starter Books

DEMO
• Data Analysis (Azure ML Workbench)
• Data Preparation (Azure ML Workbench)
• Predictive Maintenance (Azure ML Studio)

Upcoming events
SQLSaturday #711 Plovdiv
02 June 2018
www.sqlsaturday.com/711/
SQLSaturday #763 Sofia
13 Oct 2018
www.sqlsaturday.com/763/

