DECEMBER 14
GLOBAL AI BOOTCAMP IS POWERED BY:
The Power of Auto ML
How Does AutoML “Magic” Happen?
Thanks to our Sponsors:
About me
• Software Architect @
o 17+ years professional experience
• Microsoft Azure MVP
• External Expert Horizon 2020, Eurostars-Eureka
• External Expert InnoFund Denmark, RIF Cyprus
• Business Interests
o Web Development, SOA, Integration
o IoT, Machine Learning, Computer Intelligence
o Security & Performance Optimization
• Contact
ivelin.andreev@icb.bg
www.linkedin.com/in/ivelin
www.slideshare.net/ivoandreev
Contents
1. Machine Learning Workflow
2. Visual Interface for Azure ML Service
3. Automated ML
4. Advanced ML with Azure Monitor
5. Deep Learning with Tensorflow
6. AI Ops
7. Cognitive Vision Services
8. Insights with Text Analytics and Vision
9. Cognitive Decision Service
10. Cognitive Search Service
11. Version Control for ML
12. VS Code for Python ML
13. Bot Framework
14. Search Bots with Cognitive Services
15. Bot Architecture Best Practices
16. AI and Cognitive Services in Power BI
17. Form Processing with AI Builder
AGENDA
Auto ML
Pipelines
Auto ML Under the Hood
Azure ML Designer
Demo (AutoML Python SDK)
ML is a Process
• Iterative data science process:
o Business problem understanding
o Data collection, cleaning, exploration
o Model building
o Performance evaluation
o Deployment
• Auto ML: automate environment, data preparation, experimentation, deployment
AutoML is not Auto Data Science
• Any ML Task = {data} + {problem type} + {loss function}
• ML project effort and budget
o 80% data preparation, 15% modeling and evaluation
o Repetitive effort (react to changes in objectives and data)
• AutoML as a tool
o A recommender system for ML pipelines to achieve accuracy in less time
• Objective
o Relieve data scientists of repetitive tasks
o Automate problem solution on data with minimal loss
AutoML fills the gap between “supply” and “demand” on the ML market
AutoML outperforms an average Data Scientist
Auto ML Builds ML Pipelines
User Input: Dataset, Performance goals, Constraints (CPU, RAM, time)
Auto ML Magic
Results: Automatically determine a pipeline structure with minimal loss on the validation set within CPU/Memory constraints
Auto ML Steps
1. Determine pipeline structure
2. Select algorithm for each step
3. Tune hyper-parameters
Performance Evaluation
• All 3 steps shall be completed;
• Iterate until performance goals reached
ML Pipeline Steps
An ML pipeline is a technical solution to stitch ML phases and automate workflows
• Data
o Select preprocessing strategy (imbalanced and missing data, normalization, outliers)
o Features (feature extraction, engineering, selection)
• Modeling
o Select algorithm
o Tune hyperparameters (e.g. number of trees)
o Train multiple models, create ensemble
o Score, evaluate, select the best model
• Training & Deployment
o Parallel training on a cluster, Maintain versioning
ML Pipeline Benefits
• Advantages of ML Pipelines
o Parallel and unattended execution
o Reusability through pipeline templates for specific scenarios
o Versioning data and results using pipeline SDK
o Modularity separating areas of concern
o Collaboration among data scientists across ML design process
o Scalability – a single ML pipeline can be trained on multiple machines; different ML pipelines can be tested in parallel on many nodes
• Open Issue
How do pipelines “learn” what to do?
“No free lunch” theorem simplified (David Wolpert, 1996)
1. Model is simplification of reality
2. Simplification is based on bias
3. Bias fails in some situations
Conclusion 1: No algorithm or parameter set is always the best.
Conclusion 2: Use knowledge about data and context.
Automated Data Preparation
Step 1: Data Ingestion
• Requires data storage (Azure Blob mounted by default)
• Data quality issues are common (missing data, mixed units and formats)
• Evaluate quality, select initial features (statistical analysis and visualization)
Rule of Thumb: No algorithm could achieve good results with bad data input
Step 2: Data profiling and cleansing
• AutoML provides a variety of statistics to verify dataset is ready for modelling
o Non-numeric (Min, Max, Count)
o Numeric (Mean, StdDev, Variance, Distribution histogram)
• Cleansing cannot be done in the GUI
o Python SDK: azureml.dataprep
o Auto ML: turn on the “Automatic preprocessing” option
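The profiling statistics listed above could be computed roughly as follows. This is a minimal standard-library sketch, not the azureml.dataprep API; the helper name `profile_column` and the sample columns are made up for illustration.

```python
import statistics
from collections import Counter

def profile_column(values):
    """Return simple profile statistics for one column (None marks a missing value)."""
    present = [v for v in values if v is not None]
    profile = {"count": len(present), "missing": len(values) - len(present)}
    if present and all(isinstance(v, (int, float)) for v in present):
        # Numeric column: mean, standard deviation, variance
        profile["mean"] = statistics.mean(present)
        profile["stdev"] = statistics.pstdev(present)
        profile["variance"] = statistics.pvariance(present)
    elif present:
        # Non-numeric column: min, max and the most frequent value
        profile["min"] = min(present)
        profile["max"] = max(present)
        profile["most_common"] = Counter(present).most_common(1)[0][0]
    return profile

ages = [30, 40, None, 50]
cities = ["Sofia", "Plovdiv", "Sofia", None]
age_profile = profile_column(ages)
city_profile = profile_column(cities)
print(age_profile)
print(city_profile)
```

A high "missing" count or a zero variance in such a profile is a hint that the column needs cleansing before modelling.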
Auto ML Guardrails
What is: Safeguard users against common issues with data and make corrections
Missing Values
• Strategies: Drop rows; intelligently replace missing values based on other data
Class Imbalances
• Most ML algorithms assume a balanced class distribution; a dominant majority class biases the model
• Strategies: Oversampling (add instances to minority class); Undersampling (majority)
Data Leakage
• Dataset includes information that would not be available at time of prediction
• The actual outcome is effectively already known, so model performance looks unrealistically perfect
• Strategies: Remove leaky features; Add noise; Hold back unseen test data
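As an illustration of the oversampling strategy for class imbalance, here is a minimal sketch of random oversampling with replacement. The function name `oversample` and the toy rows are assumptions for the example; this is not how the Auto ML guardrails are implemented.

```python
import random
from collections import Counter

def oversample(rows, label_of, seed=0):
    """Randomly duplicate minority-class rows until every class matches the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for row in rows:
        by_class.setdefault(label_of(row), []).append(row)
    target = max(len(group) for group in by_class.values())
    balanced = []
    for group in by_class.values():
        balanced.extend(group)
        # Draw extra copies (with replacement) for under-represented classes
        balanced.extend(rng.choice(group) for _ in range(target - len(group)))
    return balanced

rows = [("a", 0)] * 8 + [("b", 1)] * 2   # 8 majority rows, 2 minority rows
balanced = oversample(rows, label_of=lambda r: r[1])
counts = Counter(label for _, label in balanced)
print(counts)
```

Oversampling should be applied to the training split only; duplicating rows before the train/validation split would leak copies into validation.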
Automated Data Preparation
Step 3: Feature Engineering
• Impute missing values (mode for categorical, mean for numerical)
• Create categorical features from numeric with low diversity
• YYYY, MM, dd, HH, mm, ss, Day of week, Day of year, Quarter, Week Nr from date
• One-hot encode low cardinality categorical vars (e.g. Gender -> IsMale, IsFemale)
• K-means clustering on each numeric column for distance to centroid feature
• Term frequency for text variables
• Outlier treatment
Note: General-purpose steps are not domain specific (e.g. they will not derive an income/debt ratio)
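Three of the steps above (imputation, date expansion, one-hot encoding) can be sketched with the standard library. The helper names are hypothetical and the code only mirrors the ideas on the slide, not the actual Auto ML featurizers.

```python
from datetime import datetime
from statistics import mean, mode

def impute(values):
    """Fill None with the mean (numeric column) or the mode (categorical column)."""
    present = [v for v in values if v is not None]
    numeric = all(isinstance(v, (int, float)) for v in present)
    fill = mean(present) if numeric else mode(present)
    return [fill if v is None else v for v in values]

def date_features(ts):
    """Expand a timestamp into calendar features."""
    return {
        "year": ts.year, "month": ts.month, "day": ts.day,
        "hour": ts.hour, "minute": ts.minute, "second": ts.second,
        "day_of_week": ts.weekday(),            # Monday = 0
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "week": ts.isocalendar()[1],            # ISO week number
    }

def one_hot(values):
    """Encode a low-cardinality categorical column as 0/1 indicator features."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

filled_numbers = impute([1.0, None, 3.0])
filled_labels = impute(["m", None, "m", "f"])
features = date_features(datetime(2019, 12, 14, 9, 30, 0))
encoded = one_hot(["male", "female"])
print(filled_numbers, filled_labels)
print(features)
print(encoded)
```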
Automated Data Preparation
Step 3 just got you into a problem
• Feature engineering could generate too many features
• The solution needs to avoid overfitting and reduce model training time
• We did not put in any domain knowledge
Step 4: Feature Selection (limited in AutoML)
• Drop high cardinality variables (noise)
• Drop no variance variables (non-informative)
Possible future improvements
• Drop highly correlated fields
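The two drop rules of Step 4 might look like this in code. It is a sketch under assumptions: the function name and the `max_unique_ratio` threshold are invented for the example and are not the values Auto ML uses.

```python
def select_features(columns, max_unique_ratio=0.9):
    """Keep columns that vary and are not near-unique identifiers.

    columns: dict mapping column name -> list of values.
    max_unique_ratio is a hypothetical threshold chosen for this sketch.
    """
    kept = {}
    for name, values in columns.items():
        unique = len(set(values))
        if unique <= 1:
            continue  # no variance: non-informative
        if isinstance(values[0], str) and unique / len(values) > max_unique_ratio:
            continue  # high-cardinality categorical: mostly noise
        kept[name] = values
    return kept

data = {
    "customer_id": ["c1", "c2", "c3", "c4"],   # unique per row
    "country": ["BG", "BG", "BG", "BG"],       # constant
    "segment": ["a", "b", "a", "b"],
    "income": [1000, 2500, 1800, 3200],
}
selected = select_features(data)
print(list(selected))
```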
Algorithm Selection and Hyperparametrization
Challenges of Configuration Space
• High-dimensionality (multiple continuous, categorical, binary variables)
• Conditionality (some parameter values are relevant in combination)
• No Gradient (loss function has no gradient, expensive evaluation)
Opt1: Grid Search / Brute Force
• Cartesian product on hyperparameter combinations
• The simplest method, dimensionality curse
Opt2: Random Search
• Random configurations within certain budget
• Good baseline, no assumptions, easy parallelization
Opt3: Bayesian Optimization
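To make the difference between grid search and random search concrete, here is a small sketch on a toy problem. The `loss` function and the search space are made up; in practice each evaluation would be a full train-and-validate run.

```python
import itertools
import random

def loss(params):
    """Toy validation loss with its minimum at depth=6, rate=0.1 (made up for the example)."""
    return (params["depth"] - 6) ** 2 + 100 * (params["rate"] - 0.1) ** 2

def grid_search(space):
    """Evaluate the full Cartesian product of hyperparameter values."""
    names = list(space)
    candidates = (dict(zip(names, combo)) for combo in itertools.product(*space.values()))
    return min(candidates, key=loss)

def random_search(space, budget, seed=0):
    """Evaluate only `budget` randomly drawn configurations."""
    rng = random.Random(seed)
    candidates = ({n: rng.choice(v) for n, v in space.items()} for _ in range(budget))
    return min(candidates, key=loss)

space = {"depth": [2, 4, 6, 8], "rate": [0.01, 0.1, 0.3]}
best_grid = grid_search(space)                  # 12 evaluations
best_random = random_search(space, budget=5)    # 5 evaluations
print(best_grid, loss(best_grid))
print(best_random, loss(best_random))
```

Grid search cost grows multiplicatively with every added hyperparameter, while random search cost is fixed by the budget, which is why it is a common baseline.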
Meta Learning in AutoML
Challenges
• Avoid starting from scratch on new ML tasks
• Learn from experience, efficiently and in systematic data-driven way
Prerequisite
• Collect meta-data to describe previous tasks (parameters, pipeline structure, evaluations)
Result
• Meta-learner to recommend promising configurations w/o exhaustive search
Notes
• If datasets have similar results on few pipelines => similar results on remaining pipelines
• Operates similarly to recommender systems
• Privacy: AML has no need to access customer data, only pipeline results
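The recommender idea can be sketched as a nearest-neighbour lookup over past results: probe the new dataset with a few pipelines, find the previous dataset that behaved most similarly, and suggest its best untried pipelines. All names, pipelines and scores below are invented for illustration, and the sketch assumes at least one probed pipeline appears in every history entry; it is not the meta-learner Azure uses.

```python
def recommend(history, probe_scores, top_n=1):
    """Recommend pipelines for a new dataset from results of previous datasets.

    history: {dataset: {pipeline: score}} for past tasks (the meta-data).
    probe_scores: {pipeline: score} measured on the new dataset for a few pipelines.
    """
    def distance(past):
        # Mean squared difference over the pipelines tried on both datasets
        shared = [p for p in probe_scores if p in past]
        return sum((past[p] - probe_scores[p]) ** 2 for p in shared) / len(shared)

    nearest = min(history, key=lambda name: distance(history[name]))
    untried = {p: s for p, s in history[nearest].items() if p not in probe_scores}
    ranked = sorted(untried, key=untried.get, reverse=True)
    return nearest, ranked[:top_n]

history = {
    "churn":  {"scale+logreg": 0.71, "pca+svm": 0.65, "gbm": 0.88, "knn": 0.60},
    "sensor": {"scale+logreg": 0.90, "pca+svm": 0.93, "gbm": 0.80, "knn": 0.95},
}
probe = {"scale+logreg": 0.70, "pca+svm": 0.66}
nearest, suggestions = recommend(history, probe)
print(nearest, suggestions)
```

Note that only pipeline names and scores are stored, which matches the privacy point above: no customer data is needed.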
Cross-Validation and Ensembling
Cross Validation
• Divide training data in k-subsets
• Repeat k times: hold out subset k_i, train on the remaining k-1 subsets, validate on k_i
• Average error estimation across k error estimations
Ensembling (bagging, boosting, stacking)
• Combine a few of the best ML models for improved accuracy at little extra cost (the candidate models are already trained)
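The cross-validation procedure can be written out in a few lines. This is a plain-Python sketch without shuffling; the toy "model" simply predicts the mean of the training targets, and all names are illustrative.

```python
from statistics import mean

def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        validation = indices[start:start + size]          # the held-out fold
        train = indices[:start] + indices[start + size:]  # the other k-1 folds
        yield train, validation
        start += size

def cross_validate(xs, ys, k, fit, error):
    """Average the validation error over k folds."""
    errors = []
    for train, validation in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(mean(error(model, xs[i], ys[i]) for i in validation))
    return mean(errors)

def fit_mean(xs, ys):
    """Toy model: always predict the mean of the training targets."""
    return mean(ys)

def abs_error(model, x, y):
    return abs(model - y)

xs = list(range(6))
ys = [1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
folds = list(k_fold_indices(6, 3))
cv_error = cross_validate(xs, ys, k=3, fit=fit_mean, error=abs_error)
print(folds)
print(cv_error)
```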
Building Azure ML Pipelines
Azure ML Designer vs Azure ML Studio
• ML Studio – collaborative drag-drop workspace to build, test and deploy ML
• Azure ML – designer, SDK and CLI for data prep., train and deploy ML at scale
Feature | Azure ML Designer | ML Studio (Classic)
Availability | Preview (2019) | Generally available (GA) (2015)
Drag-drop interface | Yes | Yes
Scalability | With compute target | Up to 10 GB training data limit
Module rich | Important only | Multiple
Compute | AML compute, CPU/GPU | Proprietary compute, CPU only
ML Pipeline | Authoring, publishing | N/A
ML Ops | Flexible deployment and versioning | Basic management and deploy
Model portability | Portable | Proprietary, non-portable
Auto ML | Through SDK | N/A
Azure ML
What is: cloud-based environment to rapidly build and deploy machine learning
models, by auto-scaling powerful CPU or GPU clusters
How to:
1. 4 Development environments for AML – cloud-based notebook VM (easiest);
local (with Azure subscription), Data Science VM and Azure Databricks
2. Create workspace (Python SDK or Azure Portal)
3. azureml.dataprep Python package to explore, cleanse and transform
4. Train target (Local PC, Azure Linux VM, HDInsight for Spark)
5. azureml.train recommends a pipeline based on target metrics
6. Register models for tag, search and deploy (even models trained outside AML)
7. Deploy to Azure Container Instance serverless containers
Interpreting Learning Results (Classification)
• Confusion Matrix
o Rows – true class, Columns – predicted class
o Good model = most values along the diagonal
• Precision-Recall Chart
o Precision = TP / (TP + FP), ability to label correctly
o Recall = TP / (TP + FN), ability to find all instances
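o Macro Average PR – unweighted mean of per-class precision/recall (each class counts equally)
o Micro Average PR – computed from TP/FP/FN pooled over all classes, so frequent classes weigh more (relevant for imbalanced data)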
o Draw PR chart - at different threshold values
• ROC Chart – TP Rate vs FP Rate over different thresholds
FPR = FP / (FP + TN) (best is close to 0), TPR = TP / (TP + FN) (best is close to 1)
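The formulas above translate directly into code. The counts in this sketch are made up; the second function contrasts macro and micro averaging of precision on an imbalanced two-class example.

```python
def binary_metrics(tp, fp, fn, tn):
    """Precision, recall (= TPR) and FPR from confusion-matrix counts."""
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),   # identical to the TP rate on the ROC chart
        "fpr": fp / (fp + tn),
    }

def macro_micro_precision(per_class):
    """per_class: list of (tp, fp) tuples, one per class."""
    # Macro: average the per-class precisions, every class counts equally
    macro = sum(tp / (tp + fp) for tp, fp in per_class) / len(per_class)
    # Micro: pool the counts first, so large classes dominate
    total_tp = sum(tp for tp, _ in per_class)
    total_fp = sum(fp for _, fp in per_class)
    micro = total_tp / (total_tp + total_fp)
    return macro, micro

metrics = binary_metrics(tp=40, fp=10, fn=20, tn=30)
macro, micro = macro_micro_precision([(90, 10), (1, 9)])
print(metrics)
print(macro, micro)
```

In the second example the rare class has precision 0.1, which pulls the macro average down to 0.5 while the micro average stays above 0.8.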
Lift, Gain and Calibration Charts
• Lift Chart – How many times the model is better than random
o Ratio of gain%/random expectation% at a given decile level
o Green line – baseline random guess
• Gain Chart – how much to sample to get target sensitivity (TPR)
o X – percentile addressed, Y - portion positive responses
o Green line - baseline random guess
• Calibration Chart
o Confidence of a predictive model
o Predicted vs actual probability
o Good model: y=x
o Overly confident: y=0 and y=1
Note: perfectly calibrated classifier != perfect classifier
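Gain, lift and calibration can each be computed from a list of predicted scores and true labels. The following sketch uses invented scores and hypothetical helper names; it illustrates the definitions rather than the charts Azure ML renders.

```python
def gain_and_lift(scores, labels, fraction):
    """Gain and lift when targeting the top `fraction` of cases ranked by score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)
    cutoff = max(1, round(len(ranked) * fraction))
    captured = sum(label for _, label in ranked[:cutoff])
    gain = captured / sum(labels)           # share of all positives found
    lift = gain / (cutoff / len(ranked))    # improvement over random targeting
    return gain, lift

def calibration_bins(probabilities, labels, n_bins=2):
    """Per bin: (mean predicted probability, observed positive rate)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probabilities, labels):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    return [
        (sum(p for p, _ in b) / len(b), sum(y for _, y in b) / len(b))
        for b in bins if b
    ]

scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]
labels = [1, 1, 0, 1, 0, 0, 0, 1, 0, 0]
gain, lift = gain_and_lift(scores, labels, fraction=0.2)
calibration = calibration_bins(scores, labels)
print(gain, lift)
print(calibration)
```

For a well calibrated model the two numbers in each calibration bin are close, which corresponds to the y=x line on the chart.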
Containers meet Machine Learning
• Steps: (from Portal or AML SDK management API)
o Add model (from local workspace or upload model)
o Add driver script
o Add package dependency file (YML)
o The system creates a Docker image and registers it to the Workspace
• Deployment
o Azure Container Instance (ACI) - test, Azure Kubernetes Service (AKS) - prod
o Azure ML Compute, Azure IoT Edge
• Operationalization
o REST API is created automatically
Operationalization
• REST APIs
o Deploying an AML model as a web service creates single and batch REST APIs
o APIs consumed by azureml.core.webservice
• Performance Degradation
o Performance in real life may differ from performance during training
o Data drift - change in characteristics of input data over time
• Monitoring and Drift Analysis
o Input data changes over time and leads to performance degradation
o Configure inference data to snapshot and profile against baseline
o ML model trained to detect differences
o Model performance converted to drift coefficient
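The idea of training a model to tell baseline data from current data, and turning its performance into a drift coefficient, can be illustrated with a one-feature threshold classifier. This is a simplified stand-in, not Azure ML's drift metric: the function name, the single feature, the accuracy-based scaling and the equal sample sizes are all assumptions of the sketch.

```python
def drift_coefficient(baseline, current):
    """Score how well a one-feature threshold classifier separates two samples.

    0 means the samples are indistinguishable, 1 means perfectly separable.
    Assumes both samples have the same size, so chance accuracy is 0.5.
    """
    labeled = [(x, 0) for x in baseline] + [(x, 1) for x in current]
    best_accuracy = 0.5
    for threshold in sorted({x for x, _ in labeled}):
        # Rule: predict "current" when x >= threshold (or the inverted rule)
        hits = sum((x >= threshold) == bool(label) for x, label in labeled)
        accuracy = max(hits, len(labeled) - hits) / len(labeled)
        best_accuracy = max(best_accuracy, accuracy)
    return 2 * best_accuracy - 1

baseline = [1, 2, 3, 4]
no_drift = drift_coefficient(baseline, [1, 2, 3, 4])
strong_drift = drift_coefficient(baseline, [11, 12, 13, 14])
print(no_drift, strong_drift)
```

A coefficient near 0 suggests the inference data still looks like the baseline; a value near 1 suggests the inputs have shifted and the model may need retraining.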
Takeaways
• Books
o AI MVP Book: Automated Machine Learning
https://www.amazon.com/gp/aw/d/B082P5MK8Y
o Practical Automated ML on Azure
• The No Free Lunch Theorem
https://www.kdnuggets.com/2019/09/no-free-lunch-data-science.html
• Azure ML Studio vs Azure ML Services designer
https://www.codit.eu/blog/azure-machine-learning-studio-vs-services/
https://docs.microsoft.com/en-us/azure/machine-learning/compare-azure-ml-to-studio-classic
• Bayes Theorem
https://towardsdatascience.com/understanding-bayes-theorem-7e31b8434d4b
Thanks to our Sponsors:

More Related Content

What's hot

What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
Henrik Skogström
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine Learning
Setu Chokshi
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
Knoldus Inc.
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
Databricks
 
ML-Ops how to bring your data science to production
ML-Ops  how to bring your data science to productionML-Ops  how to bring your data science to production
ML-Ops how to bring your data science to production
Herman Wu
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
Databricks
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
Pieter de Bruin
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerAmazon Web Services
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
Nisha Talagala
 
Azure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopAzure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshop
Parashar Shah
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
AllenPeter7
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
Databricks
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
Weaveworks
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
Sasha Rosenbaum
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
Databricks
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
Stepan Pushkarev
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
cnvrg.io AI OS - Hands-on ML Workshops
 
Introduction to Azure Machine Learning
Introduction to Azure Machine LearningIntroduction to Azure Machine Learning
Introduction to Azure Machine Learning
Paul Prae
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
Carl W. Handlin
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
Joaquin Vanschoren
 

What's hot (20)

What is MLOps
What is MLOpsWhat is MLOps
What is MLOps
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine Learning
 
MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.MLOps Bridging the gap between Data Scientists and Ops.
MLOps Bridging the gap between Data Scientists and Ops.
 
Introduction to MLflow
Introduction to MLflowIntroduction to MLflow
Introduction to MLflow
 
ML-Ops how to bring your data science to production
ML-Ops  how to bring your data science to productionML-Ops  how to bring your data science to production
ML-Ops how to bring your data science to production
 
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full LifecycleMLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
MLOps Virtual Event | Building Machine Learning Platforms for the Full Lifecycle
 
MLOps in action
MLOps in actionMLOps in action
MLOps in action
 
Machine Learning & Amazon SageMaker
Machine Learning & Amazon SageMakerMachine Learning & Amazon SageMaker
Machine Learning & Amazon SageMaker
 
Ml ops past_present_future
Ml ops past_present_futureMl ops past_present_future
Ml ops past_present_future
 
Azure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshopAzure AI platform - Automated ML workshop
Azure AI platform - Automated ML workshop
 
MLOps.pptx
MLOps.pptxMLOps.pptx
MLOps.pptx
 
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ... MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
MLflow: Infrastructure for a Complete Machine Learning Life Cycle with Mani ...
 
Using MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOpsUsing MLOps to Bring ML to Production/The Promise of MLOps
Using MLOps to Bring ML to Production/The Promise of MLOps
 
MLOps by Sasha Rosenbaum
MLOps by Sasha RosenbaumMLOps by Sasha Rosenbaum
MLOps by Sasha Rosenbaum
 
MLOps Using MLflow
MLOps Using MLflowMLOps Using MLflow
MLOps Using MLflow
 
Monitoring AI with AI
Monitoring AI with AIMonitoring AI with AI
Monitoring AI with AI
 
MLOps for production-level machine learning
MLOps for production-level machine learningMLOps for production-level machine learning
MLOps for production-level machine learning
 
Introduction to Azure Machine Learning
Introduction to Azure Machine LearningIntroduction to Azure Machine Learning
Introduction to Azure Machine Learning
 
From Data Science to MLOps
From Data Science to MLOpsFrom Data Science to MLOps
From Data Science to MLOps
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 

Similar to The Power of Auto ML and How Does it Work

The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
Ivo Andreev
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
Ivo Andreev
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
Ivo Andreev
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
Ivo Andreev
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Ventures
microsoftventures
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
Institute of Contemporary Sciences
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
vitm11
 
Machine learning
Machine learningMachine learning
Machine learning
Saravanan Subburayal
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
Alon Bochman, CFA
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
Turi, Inc.
 
Practical data science
Practical data sciencePractical data science
Practical data science
Ding Li
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
Michael Gerke
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
Matei Zaharia
 
Aws autopilot
Aws autopilotAws autopilot
Aws autopilot
Vivek Raja P S
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Databricks
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
Julien SIMON
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
Miroslaw Staron
 
Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on Premises
Ivo Andreev
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
Dev Raj Gautam
 

Similar to The Power of Auto ML and How Does it Work (20)

The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
The Machine Learning Workflow with Azure
The Machine Learning Workflow with AzureThe Machine Learning Workflow with Azure
The Machine Learning Workflow with Azure
 
Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Intro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft VenturesIntro to Machine Learning by Microsoft Ventures
Intro to Machine Learning by Microsoft Ventures
 
AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...AutoML for user segmentation: how to match millions of users with hundreds of...
AutoML for user segmentation: how to match millions of users with hundreds of...
 
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdfSlides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
Slides-Артем Коваль-Cloud-Native MLOps Framework - DataFest 2021.pdf
 
Machine learning
Machine learningMachine learning
Machine learning
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
 
Practical data science
Practical data sciencePractical data science
Practical data science
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?What are the Unique Challenges and Opportunities in Systems for ML?
What are the Unique Challenges and Opportunities in Systems for ML?
 
Aws autopilot
Aws autopilotAws autopilot
Aws autopilot
 
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML ToolkitAugmenting Machine Learning with Databricks Labs AutoML Toolkit
Augmenting Machine Learning with Databricks Labs AutoML Toolkit
 
Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)Building Machine Learning Models Automatically (June 2020)
Building Machine Learning Models Automatically (June 2020)
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on Premises
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 

More from Ivo Andreev

Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
Ivo Andreev
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
Ivo Andreev
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
Ivo Andreev
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
Ivo Andreev
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
Ivo Andreev
 
OpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and MisconceptionsOpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and Misconceptions
Ivo Andreev
 
Collecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn DataCollecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn Data
Ivo Andreev
 
Collecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure OrbitalCollecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure Orbital
Ivo Andreev
 
Language Studio and Custom Models
Language Studio and Custom ModelsLanguage Studio and Custom Models
Language Studio and Custom Models
Ivo Andreev
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
Ivo Andreev
 
Forecasting time series powerful and simple
Forecasting time series powerful and simpleForecasting time series powerful and simple
Forecasting time series powerful and simple
Ivo Andreev
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
Ivo Andreev
 
Azure security guidelines for developers
Azure security guidelines for developers Azure security guidelines for developers
Azure security guidelines for developers
Ivo Andreev
 
Autonomous Machines with Project Bonsai
Autonomous Machines with Project BonsaiAutonomous Machines with Project Bonsai
Autonomous Machines with Project Bonsai
Ivo Andreev
 
Global azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure LighthouseGlobal azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure Lighthouse
Ivo Andreev
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JS
Ivo Andreev
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
Ivo Andreev
 
Industrial IoT on Azure
Industrial IoT on AzureIndustrial IoT on Azure
Industrial IoT on Azure
Ivo Andreev
 
Flying a Drone with JavaScript and Computer Vision
Flying a Drone with JavaScript and Computer VisionFlying a Drone with JavaScript and Computer Vision
Flying a Drone with JavaScript and Computer Vision
Ivo Andreev
 
ML with Power BI for Business and Pros
ML with Power BI for Business and ProsML with Power BI for Business and Pros
ML with Power BI for Business and Pros
Ivo Andreev
 

More from Ivo Andreev (20)

Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
 



The Power of Auto ML and How Does it Work

  • 1. DECEMBER 14 GLOBAL AI BOOTCAMP IS POWERED BY: The Power of Auto ML How does AutoML “Magic” Happen
  • 2. Thanks to our Sponsors:
  • 3. • Software Architect @ o 17+ years professional experience • Microsoft Azure MVP • External Expert Horizon 2020, Eurostars-Eureka • External Expert InnoFund Denmark, RIF Cyprus • Business Interests o Web Development, SOA, Integration o IoT, Machine Learning, Computer Intelligence o Security & Performance Optimization • Contact ivelin.andreev@icb.bg www.linkedin.com/in/ivelin www.slideshare.net/ivoandreev About me
  • 4. Contents 1. Machine Learning Workflow 2. Visual Interface for Azure ML Service 3. Automated ML 4. Advanced ML with Azure Monitor 5. Deep Learning with Tensorflow 6. AI Ops 7. Cognitive Vision Services 8. Insights with Text Analytics and Vision 9. Cognitive Decision Service 10. Cognitive Search Service 11. Version Control for ML 12. VS Code for Python ML 13. Bot Framework 14. Search Bots with Cognitive Services 15. Bot Architecture Best Practices 16. AI and Cognitive Services in Power BI 17. Form Processing with AI Builder
  • 5. AGENDA Auto ML Pipelines Auto ML Under the Hood Azure ML Designer Demo (AutoML Python SDK)
  • 6. ML is a Process • Iterative data science process: o Business problem understanding o Data collection, cleaning, exploration o Model building o Performance evaluation o Deployment • Auto ML: Automate environment, data preparation, experimentation, deployment
  • 7. AutoML is not Auto Data Science • Any ML Task = {data} + {problem type} + {loss function} • ML project effort and budget o 80% data preparation, 15% modeling and evaluation o Repetitive effort (react to changes in objectives and data) • AutoML as a tool o A recommender system for ML pipelines to achieve accuracy with less time • Objective o Relieve data scientists of repetitive tasks o Automate problem solution on data with minimal loss
  • 8. AutoML fills the gap between “supply” and “demand” on ML market AutoML outperforms an average Data Scientist
  • 9. Auto ML Builds ML Pipelines User Input: Dataset, Performance goals, Constraints (CPU, RAM, time) Auto ML Magic Results: Automatically determine a pipeline structure with minimal loss on the validation set within CPU/Memory constraints Auto ML Steps 1. Determine pipeline structure 2. Select algorithm for each step 3. Tune hyper-parameters Performance Evaluation • All 3 steps shall be completed; • Iterate until performance goals reached
  • 10. ML Pipeline Steps An ML pipeline is a technical solution to stitch ML phases and automate workflows • Data o Select preprocessing strategy (imbalanced and missing data, normalization, outliers) o Features (feature extraction, engineering, selection) • Modeling o Select algorithm o Tune hyperparameters (e.g. number of trees) o Train multiple models, create ensemble o Score, evaluate, select the best model • Training & Deployment o Parallel training on a cluster, Maintain versioning
  • 11. ML Pipeline Benefits • Advantages of ML Pipelines o Parallel and unattended execution o Reusability through pipeline templates for specific scenarios o Versioning data and results using pipeline SDK o Modularity separating areas of concern o Collaboration among data scientists across ML design process o Scalability – single ML pipeline can be trained on multiple machines; different ML pipelines can be tested in parallel on many nodes • Open Issue How do pipelines “learn” what to do???
  • 12. “No free lunch” theorem simplified (David Wolpert, 1996) 1. Model is simplification of reality 2. Simplification is based on bias 3. Bias fails in some situations Conclusion 1: No algorithm or parameter set is always the best. Conclusion 2: Use knowledge about data and context.
  • 13. Automated Data Preparation Step 1: Data Ingestion • Requires data storage (Azure Blob mounted by default) • Data quality issues are common (missing data, mixed units and formats) • Evaluate quality, select initial features (statistical analysis and visualization) Rule of Thumb: No algorithm could achieve good results with bad data input Step 2: Data profiling and cleansing • AutoML provides a variety of statistics to verify dataset is ready for modelling o Non-numeric (Min, Max, Count) o Numeric (Mean, StdDev, Variance, Distribution histogram) • Cleansing cannot be done in GUI o Python SDK: azureml.dataprep o ML Turn on “Automatic preprocessing” option
  • 14. Auto ML Guardrails What is: Safeguard users against common issues with data and make corrections Missing Values • Strategies: Drop rows; intelligently replace missing values based on other data Class Imbalances • Most ML algorithms assume equal distribution, majority classes add more bias • Strategies: Oversampling (add instances to minority class); Undersampling (majority) Data Leakage • Dataset includes information that would not be available at time of prediction • Actual outcome is already known, model performance will be perfect • Strategies: Remove leaky features; Add noise; Hold back unseen test data
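The oversampling strategy named on the guardrails slide can be illustrated with a minimal sketch. This is plain random oversampling written with the Python standard library; it is not the Azure AutoML implementation, and the function name `random_oversample` and the toy data are assumptions made for illustration.

```python
import random
from collections import Counter

def random_oversample(rows, labels, seed=0):
    """Duplicate randomly chosen minority-class rows until every class
    has as many rows as the largest class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    out_rows, out_labels = list(rows), list(labels)
    for cls, n in counts.items():
        # All original rows belonging to this class
        pool = [r for r, y in zip(rows, labels) if y == cls]
        for _ in range(target - n):
            out_rows.append(rng.choice(pool))
            out_labels.append(cls)
    return out_rows, out_labels

# Toy imbalanced dataset: four negatives, one positive
rows = [[0.1], [0.2], [0.3], [0.4], [0.9]]
labels = ["neg", "neg", "neg", "neg", "pos"]
balanced_rows, balanced_labels = random_oversample(rows, labels)
balanced_counts = Counter(balanced_labels)
```

Undersampling would do the opposite and drop rows from the majority class; either way the resampling should be applied to training data only, so that the held-back test data keeps the real class distribution.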
  • 15. Automated Data Preparation Step 3: Feature Engineering • Impute missing values (mode for categorical, mean for numerical) • Create categorical features from numeric with low diversity • YYYY, MM, dd, HH, mm, ss, Day of week, Day of year, Quarter, Week Nr from date • One-hot encode low cardinality categorical vars (e.g. Gender -> IsMale, IsFemale) • K-means clustering on each numeric column for distance to centroid feature • Term frequency for text variables • Outlier treatment Note: General-purpose steps are not domain specific (e.g. income/debt ratio)
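A few of the general-purpose feature engineering steps from this slide (imputation, one-hot encoding, date expansion) can be sketched as follows. The helper names and sample values are hypothetical and only the Python standard library is used; the actual AutoML featurization is more elaborate.

```python
from collections import Counter
from datetime import datetime
from statistics import mean

def impute_mean(values):
    """Replace None in a numeric column with the mean of the known values."""
    known = [v for v in values if v is not None]
    fill = mean(known)
    return [fill if v is None else v for v in values]

def impute_mode(values):
    """Replace None in a categorical column with the most frequent value."""
    known = [v for v in values if v is not None]
    fill = Counter(known).most_common(1)[0][0]
    return [fill if v is None else v for v in values]

def one_hot(values):
    """One-hot encode a low-cardinality categorical column."""
    categories = sorted(set(values))
    return [{f"is_{c}": int(v == c) for c in categories} for v in values]

def date_parts(ts):
    """Expand a timestamp into the calendar features listed on the slide."""
    return {
        "year": ts.year, "month": ts.month, "day": ts.day, "hour": ts.hour,
        "day_of_week": ts.weekday(),            # Monday = 0
        "day_of_year": ts.timetuple().tm_yday,
        "quarter": (ts.month - 1) // 3 + 1,
        "week": ts.isocalendar()[1],
    }

ages = impute_mean([30, None, 50])
genders = impute_mode(["F", None, "F", "M"])
encoded = one_hot(genders)
parts = date_parts(datetime(2019, 12, 14, 9, 30))
```

None of these steps uses domain knowledge, which is the point of the note on the slide: a ratio such as income/debt still has to come from someone who knows the problem.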
  • 16. Automated Data Preparation Step 3 just got you into a problem • Feature engineering could generate too many features • The solution needs to avoid overfitting and reduce model training time • We did not add domain knowledge Step 4: Feature Selection (limited in AutoML) • Drop high cardinality variables (noise) • Drop no variance variables (non-informative) Possible future improvements • Drop highly correlated fields
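The two feature selection rules on this slide (drop no-variance columns, drop high-cardinality categorical columns) can be sketched in a few lines. The function name, the 0.9 cardinality threshold and the sample table are assumptions chosen for illustration, not values taken from AutoML.

```python
def select_features(table, max_cardinality_ratio=0.9):
    """Keep columns that carry signal.

    table: dict mapping column name -> list of values.
    max_cardinality_ratio: hypothetical threshold; a string column whose
    share of distinct values exceeds it is treated as noise (e.g. an ID).
    """
    kept = {}
    for name, values in table.items():
        distinct = len(set(values))
        if distinct <= 1:
            continue  # no variance: non-informative
        is_categorical = isinstance(values[0], str)
        if is_categorical and distinct / len(values) > max_cardinality_ratio:
            continue  # high cardinality: mostly noise
        kept[name] = values
    return kept

table = {
    "country": ["BG", "BG", "DK", "CY"],
    "constant": [1, 1, 1, 1],
    "user_id": ["a1", "b2", "c3", "d4"],
    "income": [10.0, 12.5, 9.0, 30.0],
}
selected = sorted(select_features(table))
```

Here `constant` is dropped for having a single value and `user_id` for being unique per row, leaving `country` and `income`.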
  • 17. Algorithm Selection and Hyperparametrization Challenges of Configuration Space • High-dimensionality (multiple continuous, categorical, binary variables) • Conditionality (some parameter values are relevant only in combination) • No Gradient (loss function has no gradient, expensive evaluation) Opt1: Grid Search / Brute Force • Cartesian product on hyperparameter combinations • The simplest method, dimensionality curse Opt2: Random Search • Random configurations within certain budget • Good baseline, no assumptions, easy parallelization Opt3: Bayesian Optimization
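Grid search and random search from this slide can be compared with a small sketch. The `loss` function below is a made-up stand-in for an expensive train-and-validate run, and the search space values are arbitrary; Bayesian optimization is not shown here.

```python
import itertools
import random

def loss(config):
    """Hypothetical validation loss with its minimum at lr=0.1, depth=5."""
    return (config["lr"] - 0.1) ** 2 + 0.01 * (config["depth"] - 5) ** 2

# Hypothetical search space: every key is a hyperparameter
space = {"lr": [0.001, 0.01, 0.1, 1.0], "depth": [3, 5, 7]}

def grid_search(space):
    """Evaluate the full Cartesian product of hyperparameter values."""
    names = list(space)
    configs = [dict(zip(names, combo))
               for combo in itertools.product(*space.values())]
    return min(configs, key=loss), len(configs)

def random_search(space, budget, seed=0):
    """Evaluate `budget` randomly drawn configurations."""
    rng = random.Random(seed)
    configs = [{k: rng.choice(v) for k, v in space.items()}
               for _ in range(budget)]
    return min(configs, key=loss)

best_grid, n_evaluated = grid_search(space)
best_random = random_search(space, budget=6)
```

The grid needs 4 x 3 = 12 evaluations and the count multiplies with every added hyperparameter, which is the dimensionality curse mentioned above. Random search stays within its fixed budget of 6 evaluations but may miss the best configuration.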
  • 18. Meta Learning in AutoML Challenges • Avoid starting from scratch on new ML tasks • Learn from experience, efficiently and in systematic data-driven way Prerequisite • Collect meta-data to describe previous tasks (parameters, pipeline structure, evaluations) Result • Meta-learner to recommend promising configurations w/o exhaustive search Notes • If datasets have similar results on few pipelines => similar results on remaining pipelines • Operates similarly to recommender systems • Privacy: AML has no need to access customer data, only pipeline results
  • 19. Cross-Validation and Ensembling Cross Validation • Divide training data in k subsets • Repeat k times: hold out subset ki for validation, train on the remaining k-1 subsets • Average error estimation across k error estimations Ensembling (bagging, boosting, stacking) • Combine few of best ML models for improved accuracy at no extra cost
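A common form of k-fold cross-validation trains on k-1 subsets, validates on the held-out one, and averages the k error estimates. The sketch below uses only the standard library; the "model" is a deliberately trivial mean predictor and all names are hypothetical.

```python
def k_fold_indices(n_samples, k):
    """Yield (train_indices, validation_indices) for each of k folds."""
    # Spread the remainder so fold sizes differ by at most one
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = [i for i in range(n_samples)
                 if i < start or i >= start + size]
        yield train, val
        start += size

def cross_validate(xs, ys, k, fit, error):
    """Average the validation error over k folds."""
    errors = []
    for train, val in k_fold_indices(len(xs), k):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors.append(error(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(errors) / k

def fit_mean(xs, ys):
    """Toy model: always predict the mean of the training targets."""
    return sum(ys) / len(ys)

def mean_abs_error(model, xs, ys):
    return sum(abs(y - model) for y in ys) / len(ys)

xs = list(range(6))
ys = [0.0, 0.0, 2.0, 2.0, 4.0, 4.0]
folds = list(k_fold_indices(len(xs), 3))
cv_error = cross_validate(xs, ys, 3, fit_mean, mean_abs_error)
```

With these toy targets the three fold errors are 3, 0 and 3, so the averaged estimate is 2.0.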
  • 20. Building Azure ML Pipelines
  • 21. Azure ML Designer vs Azure ML Studio • ML Studio – collaborative drag-drop workspace to build, test and deploy ML • Azure ML – designer, SDK and CLI for data prep., train and deploy ML at scale
    Feature | Azure ML Designer | ML Studio (Classic)
    Availability | Preview (2019) | Generally available (GA) (2015)
    Drag-drop interface | Yes | Yes
    Scalability | With compute target | Up to 10GB training data limit
    Module rich | Important only | Multiple
    Compute | AML compute CPU/GPU | Proprietary compute, CPU only
    ML Pipeline | Authoring, publishing | N/A
    ML Ops | Flexible deployment and versioning | Basic management and deploy
    Model portability | Portable | Proprietary, non-portable
    Auto ML | Through SDK | N/A
  • 22. Azure ML What is: cloud-based environment to rapidly build and deploy machine learning models, by auto-scaling powerful CPU or GPU clusters How to: 1. 4 Development environments for AML – cloud-based notebook VM (easiest); local (with Azure subscription), Data Science VM and Azure Databricks 2. Create workspace (Python SDK or Azure Portal) 3. azureml.dataprep Python package to explore, cleanse and transform 4. Train target (Local PC, Azure Linux VM, HDInsight for Spark) 5. azureml.train recommend pipeline based on target metrics 6. Register models for tag, search and deploy (even models trained outside AML) 7. Deploy to Azure Container Instance serverless containers
  • 23. Interpreting Learning Results (Classification) • Confusion Matrix o Rows – true class, Columns – predicted class o Good model = most values along the diagonal • Precision-Recall Chart o Precision = TP / (TP + FP), ability to label correctly o Recall = TP / (TP + FN), ability to find all instances o Macro Average PR – independent PR average o Micro Average PR – weighted PR average (imbalanced) o Draw PR chart - at different threshold values • ROC Chart – TP Rate / FP Rate over different thresholds FPR = FP / (FP + TN) (best is close to 0), TPR = TP / (TP + FN) (best is close to 1)
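The formulas on this slide can be checked on a small example. The sketch below computes precision, recall (TPR) and FPR per class from raw predictions, then the macro and micro averaged precision; the labels and helper names are made up for illustration.

```python
def confusion_counts(y_true, y_pred, cls):
    """Return (TP, FP, FN, TN), treating `cls` as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == cls and p == cls)
    fp = sum(1 for t, p in pairs if t != cls and p == cls)
    fn = sum(1 for t, p in pairs if t == cls and p != cls)
    tn = len(pairs) - tp - fp - fn
    return tp, fp, fn, tn

def rates(tp, fp, fn, tn):
    """Precision, recall (= TPR) and FPR, guarding against empty denominators."""
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "fpr": fp / (fp + tn) if fp + tn else 0.0,
    }

y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

per_class = {c: confusion_counts(y_true, y_pred, c) for c in (0, 1)}
positive = rates(*per_class[1])

# Macro: average the per-class precision, every class counts equally
macro_precision = sum(rates(*per_class[c])["precision"] for c in per_class) / len(per_class)

# Micro: pool TP and FP over all classes first, then compute precision
total_tp = sum(counts[0] for counts in per_class.values())
total_fp = sum(counts[1] for counts in per_class.values())
micro_precision = total_tp / (total_tp + total_fp)
```

For the positive class the counts are TP=3, FP=1, FN=1, TN=5, so precision and recall are both 0.75 and FPR is 1/6. Repeating the calculation at several decision thresholds gives the points of the PR and ROC charts.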
  • 24. Lift, Gain and Calibration Charts • Lift Chart – How many times the model is better than random o Ratio of gain%/random expectation% at a given decile level o Green line – baseline random guess • Gain Chart – how much to sample to get target sensitivity (TPR) o X – percentile addressed, Y - portion positive responses o Green line - baseline random guess • Calibration Chart o Confidence of a predictive model o Predicted vs actual probability o Good model: y=x o Overly confident: y=0 and y=1 Note: perfectly calibrated classifier != perfect classifier
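A minimal sketch of the gain and lift values behind these charts, assuming binary labels and one model score per sample. The function names and toy scores are hypothetical.

```python
def cumulative_gain(y_true, scores, fraction):
    """Share of all positives found in the top `fraction` of samples
    when ranked by descending model score."""
    ranked = [y for _, y in sorted(zip(scores, y_true),
                                   key=lambda pair: pair[0], reverse=True)]
    top_n = round(len(ranked) * fraction)
    return sum(ranked[:top_n]) / sum(ranked)

def lift(y_true, scores, fraction):
    """Gain relative to random selection, which would find `fraction` of positives."""
    return cumulative_gain(y_true, scores, fraction) / fraction

# Toy data: 3 positives among 10 samples, already listed by descending score
y_true = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

gain_20 = cumulative_gain(y_true, scores, 0.2)
lift_20 = lift(y_true, scores, 0.2)
gain_100 = cumulative_gain(y_true, scores, 1.0)
```

Addressing the top 20% of samples captures 2 of the 3 positives, so the gain is about 0.67 and the lift about 3.3 times the random baseline. At 100% the gain is always 1 and the lift falls back to 1.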
  • 25. Containers meet Machine Learning • Steps: (from Portal or AML SDK management API) o Add model (from local workspace or upload model) o Add driver script o Add package dependency file (YML) o The system creates Docker image and register to Workspace • Deployment o Azure Container Instance (ACI) - test, Azure Kubernetes Service (AKS) - prod o Azure ML Compute, Azure IoT Edge • Operationalization o REST API is created automatically
  • 26. Operationalization • REST APIs o Deploying an AML model as a web service creates single and batch REST APIs o APIs consumed by azureml.core.webservice • Performance Degradation o Performance in real life may differ from during training o Data drift - change in characteristics of input data over time • Monitoring and Drift Analysis o Input data changes over time and leads to performance degradation o Configure inference data to snapshot and profile against baseline o ML model trained to detect differences o Model performance converted to drift coefficient
  • 27. Takeaways • Books o AI MVP Book: Automated Machine Learning https://www.amazon.com/gp/aw/d/B082P5MK8Y o Practical Automated ML on Azure • The No Free Lunch Theorem https://www.kdnuggets.com/2019/09/no-free-lunch-data-science.html • Azure ML Studio vs Azure ML Services designer https://www.codit.eu/blog/azure-machine-learning-studio-vs-services/ https://docs.microsoft.com/en-us/azure/machine-learning/compare-azure-ml-to-studio-classic • Bayes Theorem https://towardsdatascience.com/understanding-bayes-theorem-7e31b8434d4b
  • 28. Azure ML Studio | Azure ML Service
  • 29. Thanks to our Sponsors: