April 21
Real World Machine Learning in Azure
The Machine Learning Workflow
Step by Step and in Azure
About me
• Project Manager @
o 16+ years professional experience
• Microsoft Azure MVP
• External Expert Horizon 2020
• External Expert Eurostars, InnoFund DK
• Business Interests
o Web Development, SOA, Integration
o IoT, Machine Learning
o Security & Performance Optimization
• Contact
o ivelin.andreev@icb.bg
o www.linkedin.com/in/ivelin
o www.slideshare.net/ivoandreev
Agenda
• Domain Challenges
• The ML Workflow (step by step)
• ML Options in Azure
• Demo
Programming vs Machine Learning
• How does classic programming work?
o Developer is the intelligence
o Array of statements:
• Does a bird fly?
• Yes!... Unless: dead, injured, flightless, missing a wing
o Problems arise at scale: rules and exceptions are endless
o System does not adapt
• ML model is …
o A system that answers questions correctly (most of the time)
o Created via training process
o Learns from data and finds patterns
• Use Cases
o Classification, Regression, Recommendation, Anomaly detection
Machine Learning Challenges
• Asking the right questions
• Requires training data
o Real-world data is messy (wrong or missing data)
o Feature engineering transforms raw data into predictive features (e.g. DNA)
o Feature extraction (e.g. IP address -> population density)
o Feature selection for informative features
• Overfitting model
o “Kicks ass” while training, fails badly on real predictions
• Model validation
o “Sense” how well your model will work on new data
The purpose of ML modelling is:
• Generate predictions
• Understand true relations
• Parametric Methods
o Step 1: Select a form for the function (e.g. f(X) = a*X + b)
o Step 2: Learn the coefficients from the training data
o Pros: Simple, Fast, Less training data
o e.g. Linear Regression: y = β0 + β1*Credit_Line + β2*Education_Level + β3*Age
• Nonparametric Methods
o No fixed functional form
o Pros: Flexible, No assumptions, Predictive power
o Cons: Overfitting, Slower, More training data
o e.g. Decision Tree
Model Types
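The two model types above can be contrasted with a minimal sketch. It assumes a single numeric feature, uses closed-form least squares for the parametric line f(X) = a*X + b, and uses a depth-1 regression tree (a "stump") as a deliberately tiny stand-in for a decision tree. The function names and the toy data are illustrative assumptions, not part of the original slides.

```python
from statistics import mean

def fit_line(xs, ys):
    """Parametric: fix the form f(x) = a*x + b, then learn a and b by least squares."""
    x_bar, y_bar = mean(xs), mean(ys)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    a = s_xy / s_xx
    b = y_bar - a * x_bar
    return a, b

def fit_stump(xs, ys):
    """No fixed functional form: pick the single split that minimises squared error."""
    points = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(points)):
        if points[i - 1][0] == points[i][0]:
            continue  # no threshold fits between identical x values
        threshold = (points[i - 1][0] + points[i][0]) / 2
        left = [y for x, y in points if x <= threshold]
        right = [y for x, y in points if x > threshold]
        l_mean, r_mean = mean(left), mean(right)
        sse = sum((y - l_mean) ** 2 for y in left) + sum((y - r_mean) ** 2 for y in right)
        if best is None or sse < best[0]:
            best = (sse, threshold, l_mean, r_mean)
    _, threshold, l_mean, r_mean = best
    return threshold, l_mean, r_mean

# Toy data generated from y = 2x + 1 (hypothetical values for illustration)
xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0, 13.0]

a, b = fit_line(xs, ys)
threshold, left_mean, right_mean = fit_stump(xs, ys)
print(f"line:  y = {a:.2f}*x + {b:.2f}")
print(f"stump: x <= {threshold} -> {left_mean}, else {right_mean}")
```

The line recovers the two coefficients from six points, while the stump only learns a coarse step; a deeper tree would fit better but needs more data and is more prone to overfitting.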
ML vs. Statistical Modelling
• Statistical Models
o Require understanding how data were collected
o Aggregate data into numbers to understand structure
o Easily interpretable on lower dimensional datasets
• Data Science
o Bridges the gap
o Find out patterns in data and come with initial insights
• ML Models
o Make data speak instead of following initial hypothesis
o Customizable to fit business domain
o Scale to handle thousands of features
Do you know which is
the “sexiest” job
of 21st century?
You nailed it!
Harvard Business Review
claims that the answer is
DATA SCIENTIST
• Appealing
o 64% believe they are working in this century’s most appealing job
• In demand
o 90% contacted at least once a month with job offer
o 50% - weekly, 30% - several times/week, 35% have <2y experience
• The dark side…
o All models are wrong, some are useful
o 80% of the time is data preparation
o Real life, not academic problems
o Non-linear process
o No full automation
• No one cares how you do it
• Presentation is the key
The Truth about Data Science
MASTERING THE TOOLS
does not turn you into a watchmaker
Process and experience are still needed
Iterative ML Process
Data Understanding
• Mosaic plot
o Categorical distribution
o Visualizes the relation between X and Y
o Strong relation = Y-splits are far apart
• Box plot
o Continuous distribution
o Distribution of numeric variable
o Identify and discard outliers (IQR)
• Scatter plot
o How much a variable determines another
https://www.kaggle.com/saisivasriram/titanic-feature-understanding-from-plots
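The box-plot rule for outliers (IQR) mentioned above can be sketched as follows. This assumes the common 1.5 x IQR fence and the default quartile method of Python's `statistics.quantiles`; the sample readings are hypothetical.

```python
from statistics import quantiles

def iqr_bounds(values, factor=1.5):
    """Return the (lower, upper) fences used by a standard box plot."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return q1 - factor * iqr, q3 + factor * iqr

def split_outliers(values, factor=1.5):
    """Separate values inside the fences from those outside."""
    lower, upper = iqr_bounds(values, factor)
    kept = [v for v in values if lower <= v <= upper]
    outliers = [v for v in values if v < lower or v > upper]
    return kept, outliers

# Hypothetical sensor readings with one obviously wrong value
readings = [10, 12, 11, 13, 12, 14, 11, 95]
kept, outliers = split_outliers(readings)
print("kept:", kept)
print("outliers:", outliers)
```

Whether an outlier should be discarded or investigated is a domain decision; the fence only flags candidates.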
• Make features usable
o Numerical
o Categorical (e.g. week day)
o PCA dimensionality reduction (clustering, low covariance)
o Dummy variables
• Handle missing data
• Normalize data
o Standard range of numerical scale (e.g. from [-1000;1000] -> [0;1], [-1;1])
o The value range influences the importance of a feature compared to the others
Data Preprocessing
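Two of the preprocessing steps above, normalization and dummy variables, can be illustrated with a short standard-library sketch. The helper names `min_max_scale` and `one_hot` are assumptions made for this example.

```python
def min_max_scale(values, low=0.0, high=1.0):
    """Linearly map values from [min; max] to [low; high]."""
    v_min, v_max = min(values), max(values)
    span = v_max - v_min
    if span == 0:
        return [low for _ in values]  # constant column: nothing to scale
    return [low + (v - v_min) * (high - low) / span for v in values]

def one_hot(values):
    """Turn a categorical column into 0/1 dummy variables, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return categories, rows

raw = [-1000, 0, 500, 1000]
unit = min_max_scale(raw)                   # target range [0;1]
symmetric = min_max_scale(raw, -1.0, 1.0)   # target range [-1;1]

categories, dummies = one_hot(["Mon", "Tue", "Mon", "Sun"])
print(unit, symmetric)
print(categories, dummies)
```

In practice the minimum and maximum should be learned on the training split only and reused for validation data, otherwise information leaks between the two sets.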
Feature Engineering
Def: Using transformations on raw data to create new features, more closely related to the target variable
• Create features more closely related to the target variable
o e.g. defaulting customer: debt-to-balance ratio = debt / balance
• Bring in external data sources (e.g. Google Places from IP address)
• Create features that are easily interpreted (e.g. date to day & month)
• Use unstructured data sources (e.g. text, video)
• Create features, experiment, choose those with the best predictive power
Note: Domain knowledge is important (e.g. the 7th is a pension day)
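As a small illustration of the ideas above, the sketch below derives a debt-to-balance ratio, splits a date into day and month, and adds a domain-knowledge flag for the 7th of the month. The record layout and field names are hypothetical.

```python
from datetime import date

def engineer_features(record):
    """Derive model-friendly features from one raw customer record."""
    balance = record["balance"]
    when = record["date"]
    return {
        # Ratio that relates more directly to default risk than either raw value
        "debt_to_balance": record["debt"] / balance if balance else None,
        # Date broken into parts that are easy to interpret
        "day": when.day,
        "month": when.month,
        "weekday": when.weekday(),  # 0 = Monday
        # Domain knowledge: the 7th is a pension day
        "is_pension_day": when.day == 7,
    }

raw = {"debt": 500.0, "balance": 2000.0, "date": date(2018, 4, 7)}
features = engineer_features(raw)
print(features)
```

Each derived feature is a candidate only: keep it if experiments show that it improves predictive power.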
Note: All information is encoded in the digital media
• Images
o Step 1: Colour statistics, EXIF metadata, edges, shapes
o Step 2: Extract knowledge in fixed set of numeric characteristics
• Text
o Step 1
• Bag of words, N-grams, term frequency, topic modelling, stemming
• Named entity recognition (e.g. Wikipedia)
o Step 2: Extract knowledge in fixed set of numeric characteristics
Digital Media Feature Engineering
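For text, the two steps above (extract terms, then map them to a fixed set of numeric characteristics) can be sketched with N-grams and term frequency. Tokenization by whitespace and the helper names are simplifying assumptions; real pipelines would also handle punctuation and stemming.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all runs of n consecutive tokens, joined by a space."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def term_frequencies(text, n=1):
    """Step 1: relative frequency of each n-gram in one document."""
    tokens = text.lower().split()
    counts = Counter(ngrams(tokens, n))
    total = sum(counts.values())
    return {term: count / total for term, count in counts.items()} if total else {}

def vectorize(texts, n=1):
    """Step 2: one fixed-length numeric vector per document."""
    per_doc = [term_frequencies(t, n) for t in texts]
    vocabulary = sorted({term for tf in per_doc for term in tf})
    vectors = [[tf.get(term, 0.0) for term in vocabulary] for tf in per_doc]
    return vocabulary, vectors

docs = ["the pump is hot", "the pump is the problem"]  # hypothetical maintenance notes
vocabulary, vectors = vectorize(docs)
print(vocabulary)
print(vectors)
```

Every document ends up with the same number of columns, which is what most learning algorithms require.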
Modelling Starts by Selecting Algorithm
• There are other ML tools
• There are many more algorithms
• You could make custom implementations
Basic evaluation workflow
• Pick performance metric based on algorithm type
• Tweak data and model until target performance reached
CAUTION: Common problems
• Using the same data for validation and training
o Split data: hold out 20-40% of the data for validation
o K-fold cross-validation: rotating the validation fold averages out the noise of a single split
• Overfitting and model optimism
o Do not be tempted to model noise (bias-variance tradeoff)
o Do not use features from the future to predict values in the past (data leakage)
Performance Evaluation
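A minimal sketch of K-fold splitting, assuming the samples are independent so that shuffling is acceptable (for time-ordered data a chronological split would be needed instead). The generator name and the fixed seed are choices made for this example.

```python
import random

def k_fold_indices(n_samples, k, seed=0):
    """Yield (training, validation) index lists; each sample validates exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)        # fixed seed keeps the split reproducible
    folds = [indices[i::k] for i in range(k)]   # k roughly equal folds
    for i, validation in enumerate(folds):
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation

splits = list(k_fold_indices(n_samples=10, k=5))
for training, validation in splits:
    print("train:", sorted(training), "validate:", sorted(validation))
```

Training and validation indices never overlap within a split, which addresses the first common problem listed above.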
Performance Metrics
• Regression model
o Root Mean Squared Error (RMSE)
o Coefficient of Determination, R² ∈ [0;1]
• Classification model
o Confusion matrix
• Binary classification model
o Accuracy based on correct answers
o Area under ROC curve (AUC)
• Threshold
• Precision = TP / (TP + FP)
• Recall = TP / (TP + FN)
o PR-curve is better for imbalanced distribution
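The metrics above follow directly from their definitions. The sketch below computes RMSE and R² for a regression example and accuracy, precision and recall for a binary classifier; the label and prediction lists are hypothetical.

```python
from math import sqrt

def rmse(y_true, y_pred):
    """Root Mean Squared Error."""
    return sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def confusion(y_true, y_pred):
    """Return (TP, FP, FN, TN) for labels encoded as 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn

# Regression example
reg_rmse = rmse([3, 5, 7], [2, 5, 8])
reg_r2 = r2([3, 5, 7], [2, 5, 8])

# Binary classification example
labels      = [1, 1, 1, 0, 0, 0, 0, 0]
predictions = [1, 1, 0, 1, 0, 0, 0, 0]
tp, fp, fn, tn = confusion(labels, predictions)
accuracy = (tp + tn) / len(labels)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(reg_rmse, reg_r2, accuracy, precision, recall)
```

Moving the classification threshold trades precision against recall, which is why a single accuracy figure can be misleading on imbalanced data.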
Tuning Model Parameters
• Model parameters control inner behaviour
o More sophisticated algorithm, more parameters
• e.g. Locally Deep SVM with kernel
o Kernel type, kernel coefficient
• How does parameter tuning work?
1. Choose metric for evaluation (AUC - classification, R2-regression, etc.)
2. Select parameters for optimization
3. Define a grid as Cartesian product between arrays
4. For each combination, cross-validate on training set
5. Select the parameters for the best evaluation
Note: Expected improvement is 3%-8%
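The five tuning steps above can be sketched end to end. As assumptions for this example, the model is a toy one-dimensional k-nearest-neighbour regressor, the metric is mean squared error (lower is better) rather than AUC or R², and the parameter names `k` and `weighted` are hypothetical.

```python
from itertools import product

def knn_predict(train, x, k, weighted):
    """Predict y at x from the k nearest training points."""
    neighbours = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    if weighted:
        weights = [1.0 / (abs(px - x) + 1e-9) for px, _ in neighbours]
    else:
        weights = [1.0] * len(neighbours)
    return sum(w * py for w, (_, py) in zip(weights, neighbours)) / sum(weights)

def cross_val_mse(data, k_folds, **params):
    """Step 4: cross-validate one parameter combination on the training set."""
    folds = [data[i::k_folds] for i in range(k_folds)]
    errors = []
    for i, validation in enumerate(folds):
        train = [p for j, fold in enumerate(folds) if j != i for p in fold]
        errors += [(knn_predict(train, x, **params) - y) ** 2 for x, y in validation]
    return sum(errors) / len(errors)

def grid_search(data, grid, k_folds=3):
    """Steps 3-5: Cartesian product of parameter arrays, evaluate each, keep the best."""
    names = list(grid)
    results = []
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        results.append((cross_val_mse(data, k_folds, **params), params))
    return min(results, key=lambda result: result[0]), results

# Toy training set drawn from y = 2x + 1 (hypothetical)
data = [(float(x), 2.0 * x + 1.0) for x in range(12)]
grid = {"k": [1, 2, 4], "weighted": [False, True]}   # step 2: parameters to optimize

(best_score, best_params), all_results = grid_search(data, grid)
print("best:", best_params, "mse:", best_score)
```

The grid grows multiplicatively with every added parameter, so the number of values per parameter is usually kept small.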
Feature Selection - select the most predictive features
ML handles thousands of params
Not all params are equal
Adding features: Common approach to increase accuracy
Poor performance: Correlated features could lead to poor model performance
Overfitting: Learning relations in more detail may lead to overfitting
Selecting Good Features
• Motivation
o Sometimes the ML goal is not to predict but identify predictive features
o Computational costs are related to number of features
• Approach
o Trying all combinations of features? (that would be infeasible)
o Algorithms with built-in feature selection (e.g. decision trees)
• Algorithms
o Iterative Forward selection & Backward elimination
o Permutation feature importance
• High importance features are more sensitive to random shuffling of values
o Filter based feature selection
!!!Some features may have more predictive power when paired!!!
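Permutation feature importance, listed above, can be sketched as follows: shuffle one column at a time and measure how much the error grows. The `model` function is a hypothetical stand-in for an already trained model that depends only on the first feature.

```python
import random

def mse(model, rows, targets):
    """Mean squared error of the model on the given rows."""
    return sum((model(r) - t) ** 2 for r, t in zip(rows, targets)) / len(rows)

def permutation_importance(model, rows, targets, n_repeats=10, seed=0):
    """Average increase in error after randomly shuffling each feature column."""
    rng = random.Random(seed)
    baseline = mse(model, rows, targets)
    importances = []
    for col in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            shuffled = [r[col] for r in rows]
            rng.shuffle(shuffled)
            permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
            increases.append(mse(model, permuted, targets) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances

def model(row):
    """Stand-in for a trained model: uses feature 0 and ignores feature 1."""
    return 3.0 * row[0]

rows = [[float(i), float((i * 7) % 10)] for i in range(10)]
targets = [3.0 * r[0] for r in rows]

importances = permutation_importance(model, rows, targets)
print(importances)
```

Shuffling the ignored feature leaves the error unchanged, so its importance is zero, while shuffling the informative one raises the error. Because each column is shuffled on its own, features that are only predictive in combination can be underestimated.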
And now…
The Microsoft Azure tools
Data preparation
Building models
Consuming models
Azure Machine Learning
• Azure Machine Learning is an integrated, end-to-end
data science and advanced analytics solution
• ML related services and tools
• Highlights
o Built on open source technologies (Jupyter Notebook, Spark, Python, Docker)
o Execute experiments in isolated environments
o GPU-enabled VMs
o Azure ML Workbench
o Azure ML Experimentation Service
o Azure ML Model Management Service
o Azure ML Studio
o Data Science VM
o Libraries for Apache Spark (MMLSpark)
o Visual Studio Code Tools for AI
o Cognitive Toolkit (CNTK)
o Microsoft Cognitive Services
o ML Services for SQL Server (R, Python)
Azure ML Workbench
• Desktop application (Windows, macOS)
• Built-in Jupyter Notebook services and Git integration
• End-to-end process support
o Powerful inspectors for data analysis
o Data transformations by example
o Model development and experimentation (Python)
o Model history and deployment (local, Docker)
Azure ML Studio
• Visual workspace to build, test and deploy ML solutions
• Highlights
o Cross-browser drag and drop, no programming
o Rich set of modules
o Fits beginners and advanced users
o Unlimited extensibility (R Script, Python Script)
o Enterprise grade cloud service (SLA 99.95%)
o ML REST web services consumption
o Jupyter Notebook
o Azure AI Gallery (8000+ samples)
• At what price?
o Free plan available
o €8.5 per seat + €0.85 per experiment/hour
o Recommended: €85/month (100K requests)
Azure Data Science VM
• Pre-configured cloud environment for AI & Data Science
• Highlights
o Preconfigured, fully operational environment
o 50+ tools DEV, ML, BigData, Data management
o Windows and Linux (Ubuntu/CentOS)
o Updated every few months
o On-demand elastic capacity
o GPU optimized VMs for deep learning
o Up to 4x NV K80 or V100 GPUs
o Up to 128 cores, 3.8TiB RAM
• At what price?
o From €10 to €28’620 per month
Azure ML Experimentation Service
• Handle execution of ML experiments in virtual environment
for isolated, consistent and reproducible results (since 09.2017)
o Local native
o Docker (Local and Remote)
o Azure Spark cluster
• Supports Workbench, records and presents run history
• Scalable model consumption
https://docs.microsoft.com/en-us/azure/machine-learning/preview/experimentation-service-configuration
Azure ML Model Management Service
• Provide deployment, hosting, versioning and
management of models in Azure, on-prem and IoT Edge
• Deployment
o Model manifest for Docker image
https://docs.microsoft.com/en-us/azure/machine-learning/preview/deployment-setup-configuration
• Consumption
o Models exposed on REST API
o Sample code (Java, C#, Python)
• Scalability
o Scale-out to 100x replicas/cluster
o 10 requests/replica (default)
o Autoscaling based on load
• Retraining
o APIs to retrain models and update model version
Takeaways
• ML in the Microsoft World
o https://docs.microsoft.com/en-us/azure/machine-learning/
• Python for AI
o https://wiki.python.org/moin/PythonForArtificialIntelligence
• Data Science Blog
o https://data-flair.training/blogs/category/machine-learning/
• Starter Books
DEMO
• Data Analysis (Azure ML Workbench)
• Data Preparation (Azure ML Workbench)
• Predictive Maintenance (Azure ML Studio)
Upcoming events
SQLSaturday #711 Plovdiv
02 June 2018
www.sqlsaturday.com/711/
SQLSaturday #763 Sofia
13 Oct 2018
www.sqlsaturday.com/763/

More Related Content

What's hot

Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
Niko Vuokko
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
thamizh arasi
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
Roshan86572
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
live_and_let_live
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Simplilearn
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
Ramakant Soni
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
hktripathy
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
Liangqun Lu
 
Semi-supervised Learning
Semi-supervised LearningSemi-supervised Learning
Semi-supervised Learningbutest
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
Gopal Sakarkar
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
Amr Abd El Latief
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
Jason Rodrigues
 
Object persistence
Object persistenceObject persistence
Object persistenceVlad Vega
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
Vishal Patel
 
Data mining
Data miningData mining
Data mining
Birju Tank
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Edureka!
 
Database Basics Theory
Database Basics TheoryDatabase Basics Theory
Database Basics Theory
sunmitraeducation
 
DBMS Bascis
DBMS BascisDBMS Bascis

What's hot (20)

Introduction to Data Science
Introduction to Data ScienceIntroduction to Data Science
Introduction to Data Science
 
Decision tree induction
Decision tree inductionDecision tree induction
Decision tree induction
 
k medoid clustering.pptx
k medoid clustering.pptxk medoid clustering.pptx
k medoid clustering.pptx
 
NAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITIONNAMED ENTITY RECOGNITION
NAMED ENTITY RECOGNITION
 
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
Recurrent Neural Network (RNN) | RNN LSTM Tutorial | Deep Learning Course | S...
 
NOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQLNOSQL- Presentation on NoSQL
NOSQL- Presentation on NoSQL
 
Lecture1 introduction to big data
Lecture1 introduction to big dataLecture1 introduction to big data
Lecture1 introduction to big data
 
web mining
web miningweb mining
web mining
 
BERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from TransformersBERT: Bidirectional Encoder Representations from Transformers
BERT: Bidirectional Encoder Representations from Transformers
 
Semi-supervised Learning
Semi-supervised LearningSemi-supervised Learning
Semi-supervised Learning
 
Data preprocessing using Machine Learning
Data  preprocessing using Machine Learning Data  preprocessing using Machine Learning
Data preprocessing using Machine Learning
 
Data mining concepts and work
Data mining concepts and workData mining concepts and work
Data mining concepts and work
 
Data preprocessing
Data preprocessingData preprocessing
Data preprocessing
 
Object persistence
Object persistenceObject persistence
Object persistence
 
The Data Science Process
The Data Science ProcessThe Data Science Process
The Data Science Process
 
Data mining
Data miningData mining
Data mining
 
Data Mining
Data MiningData Mining
Data Mining
 
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
Recurrent Neural Networks (RNN) | RNN LSTM | Deep Learning Tutorial | Tensorf...
 
Database Basics Theory
Database Basics TheoryDatabase Basics Theory
Database Basics Theory
 
DBMS Bascis
DBMS BascisDBMS Bascis
DBMS Bascis
 

Similar to The Machine Learning Workflow with Azure

Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
Ivo Andreev
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
Ivo Andreev
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
Ivo Andreev
 
Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on Premises
Ivo Andreev
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
Ivo Andreev
 
IoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDBIoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDB
Ivo Andreev
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
Albert Y. C. Chen
 
Machine learning
Machine learningMachine learning
Machine learning
Saravanan Subburayal
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
GibDevs
 
MachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptxMachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptx
AmanDixit74
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
Dev Raj Gautam
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
Rising Media, Inc.
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningPaco Nathan
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
Ivo Andreev
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
Miroslaw Staron
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Rodney Joyce
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
Alon Bochman, CFA
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
Si Krishan
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
Manish Pandey
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
Paco Nathan
 

Similar to The Machine Learning Workflow with Azure (20)

Prepare your data for machine learning
Prepare your data for machine learningPrepare your data for machine learning
Prepare your data for machine learning
 
The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?The Data Science Process - Do we need it and how to apply?
The Data Science Process - Do we need it and how to apply?
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Azure Machine Learning and ML on Premises
Azure Machine Learning and ML on PremisesAzure Machine Learning and ML on Premises
Azure Machine Learning and ML on Premises
 
Machine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackboxMachine learning for IoT - unpacking the blackbox
Machine learning for IoT - unpacking the blackbox
 
IoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDBIoT with Azure Machine Learning and InfluxDB
IoT with Azure Machine Learning and InfluxDB
 
Machine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional ManagersMachine Learning Foundations for Professional Managers
Machine Learning Foundations for Professional Managers
 
Machine learning
Machine learningMachine learning
Machine learning
 
Choosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your needChoosing a Machine Learning technique to solve your need
Choosing a Machine Learning technique to solve your need
 
MachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptxMachineLearning Seminar PPT.pptx
MachineLearning Seminar PPT.pptx
 
Machine Learning With ML.NET
Machine Learning With ML.NETMachine Learning With ML.NET
Machine Learning With ML.NET
 
1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop1440 track 2 boire_using our laptop
1440 track 2 boire_using our laptop
 
OSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine LearningOSCON 2014: Data Workflows for Machine Learning
OSCON 2014: Data Workflows for Machine Learning
 
Constrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project BonsaiConstrained Optimization with Genetic Algorithms and Project Bonsai
Constrained Optimization with Genetic Algorithms and Project Bonsai
 
AI for Software Engineering
AI for Software EngineeringAI for Software Engineering
AI for Software Engineering
 
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
Data Science for Dummies - Data Engineering with Titanic dataset + Databricks...
 
Kaggle Gold Medal Case Study
Kaggle Gold Medal Case StudyKaggle Gold Medal Case Study
Kaggle Gold Medal Case Study
 
Machine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An IntroMachine Learning 2 deep Learning: An Intro
Machine Learning 2 deep Learning: An Intro
 
Keynote at IWLS 2017
Keynote at IWLS 2017Keynote at IWLS 2017
Keynote at IWLS 2017
 
Data Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAMLData Workflows for Machine Learning - Seattle DAML
Data Workflows for Machine Learning - Seattle DAML
 

More from Ivo Andreev

Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
Ivo Andreev
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
Ivo Andreev
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
Ivo Andreev
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
Ivo Andreev
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
Ivo Andreev
 
OpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and MisconceptionsOpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and Misconceptions
Ivo Andreev
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
Ivo Andreev
 
Collecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn DataCollecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn Data
Ivo Andreev
 
Collecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure OrbitalCollecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure Orbital
Ivo Andreev
 
Language Studio and Custom Models
Language Studio and Custom ModelsLanguage Studio and Custom Models
Language Studio and Custom Models
Ivo Andreev
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
Ivo Andreev
 
Forecasting time series powerful and simple
Forecasting time series powerful and simpleForecasting time series powerful and simple
Forecasting time series powerful and simple
Ivo Andreev
 
Azure security guidelines for developers
Azure security guidelines for developers Azure security guidelines for developers
Azure security guidelines for developers
Ivo Andreev
 
Autonomous Machines with Project Bonsai
Autonomous Machines with Project BonsaiAutonomous Machines with Project Bonsai
Autonomous Machines with Project Bonsai
Ivo Andreev
 
Global azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure LighthouseGlobal azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure Lighthouse
Ivo Andreev
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JS
Ivo Andreev
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
Ivo Andreev
 
Industrial IoT on Azure
Industrial IoT on AzureIndustrial IoT on Azure
Industrial IoT on Azure
Ivo Andreev
 
Flying a Drone with JavaScript and Computer Vision
Flying a Drone with JavaScript and Computer VisionFlying a Drone with JavaScript and Computer Vision
Flying a Drone with JavaScript and Computer Vision
Ivo Andreev
 
ML with Power BI for Business and Pros
ML with Power BI for Business and ProsML with Power BI for Business and Pros
ML with Power BI for Business and Pros
Ivo Andreev
 

More from Ivo Andreev (20)

Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2Cybersecurity and Generative AI - for Good and Bad vol.2
Cybersecurity and Generative AI - for Good and Bad vol.2
 
Architecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for BusinessArchitecting AI Solutions in Azure for Business
Architecting AI Solutions in Azure for Business
 
Cybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and BadCybersecurity Challenges with Generative AI - for Good and Bad
Cybersecurity Challenges with Generative AI - for Good and Bad
 
JS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AIJS-Experts - Cybersecurity for Generative AI
JS-Experts - Cybersecurity for Generative AI
 
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for DevelopersHow do OpenAI GPT Models Work - Misconceptions and Tips for Developers
How do OpenAI GPT Models Work - Misconceptions and Tips for Developers
 
OpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and MisconceptionsOpenAI GPT in Depth - Questions and Misconceptions
OpenAI GPT in Depth - Questions and Misconceptions
 
Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Collecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn DataCollecting and Analysing Spaceborn Data
Collecting and Analysing Spaceborn Data
 
Collecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure OrbitalCollecting and Analysing Satellite Data with Azure Orbital
Collecting and Analysing Satellite Data with Azure Orbital
 
Language Studio and Custom Models
Language Studio and Custom ModelsLanguage Studio and Custom Models
Language Studio and Custom Models
 
CosmosDB for IoT Scenarios
CosmosDB for IoT ScenariosCosmosDB for IoT Scenarios
CosmosDB for IoT Scenarios
 
Forecasting time series powerful and simple
Forecasting time series powerful and simpleForecasting time series powerful and simple
Forecasting time series powerful and simple
 
Azure security guidelines for developers
Azure security guidelines for developers Azure security guidelines for developers
Azure security guidelines for developers
 
Autonomous Machines with Project Bonsai
Autonomous Machines with Project BonsaiAutonomous Machines with Project Bonsai
Autonomous Machines with Project Bonsai
 
Global azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure LighthouseGlobal azure virtual 2021 - Azure Lighthouse
Global azure virtual 2021 - Azure Lighthouse
 
Flux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JSFlux QL - Nexgen Management of Time Series Inspired by JS
Flux QL - Nexgen Management of Time Series Inspired by JS
 
Azure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challengesAzure architecture design patterns - proven solutions to common challenges
Azure architecture design patterns - proven solutions to common challenges
 
Industrial IoT on Azure
Industrial IoT on AzureIndustrial IoT on Azure
Industrial IoT on Azure
 
Flying a Drone with JavaScript and Computer Vision
Flying a Drone with JavaScript and Computer VisionFlying a Drone with JavaScript and Computer Vision
Flying a Drone with JavaScript and Computer Vision
 
ML with Power BI for Business and Pros
ML with Power BI for Business and ProsML with Power BI for Business and Pros
ML with Power BI for Business and Pros
 

The Machine Learning Workflow with Azure

  • 1. April 21 Real World Machine Learning in Azure The Machine Learning Workflow Step by Step and in Azure
  • 2. About me
      • Project Manager @
          o 16+ years professional experience
      • Microsoft Azure MVP
      • External Expert Horizon 2020
      • External Expert Eurostars, InnoFund DK
      • Business Interests
          o Web Development, SOA, Integration
          o IoT, Machine Learning
          o Security & Performance Optimization
      • Contact
          o ivelin.andreev@icb.bg
          o www.linkedin.com/in/ivelin
          o www.slideshare.net/ivoandreev
  • 3. Agenda • Domain Challenges • The ML Workflow (step by step) • ML Options in Azure • Demo
  • 4. Programming vs Machine Learning
      • How does classic programming work?
          o The developer is the intelligence
          o An array of statements:
              • Does a bird fly?
              • Yes!... Unless: dead, injured, flightless, missing a wing
          o Problems arise at scale; rules and exceptions are endless
          o The system does not adapt
      • An ML model is …
          o A system that answers questions correctly (most of the time)
          o Created via a training process
          o Learns from data and finds patterns
      • Use Cases
          o Classification, Regression, Recommendation, Anomaly detection
  • 5. Machine Learning Challenges
      • Asking the right questions
      • Requires training data
          o Real-world data is messy (wrong or missing data)
          o Feature engineering transforms raw data into predictive features (e.g. DNA)
          o Feature extraction (e.g. IP address -> population density)
          o Feature selection for informative features
      • Overfitting model
          o “Kicks ass” while training, fails badly on real predictions
      • Model validation
          o “Sense” how well your model will work on new data
  • 6. The purpose of ML modelling is: • Generate predictions • Understand true relations
  • 7. Model Types
      • Parametric Methods
          o Step 1: Select a form for the function (e.g. f(X) = a·X + b)
          o Step 2: Learn the coefficients from the training data
          o Pros: Simple, Speed, Less training data
          o e.g. Linear Regression: y = β0 + β1*Credit_Line + β2*Education_Level + β3*Age
      • Nonparametric Methods
          o No fixed functional form
          o Pros: Flexible, No assumptions, Predictive power
          o Cons: Overfitting, Slower, More training data
          o e.g. Decision Tree
  • 8. ML vs. Statistical Modelling
      • Statistical Models
          o Require understanding how data were collected
          o Aggregate data into numbers to understand structure
          o Easily interpretable on lower dimensional datasets
      • Data Science
          o Bridges the gap
          o Finds patterns in data and comes up with initial insights
      • ML Models
          o Make data speak instead of following an initial hypothesis
          o Customizable to fit the business domain
          o Scale to handle thousands of features
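
The two parametric steps above can be sketched in a few lines: the form f(X) = a·X + b is fixed first, then the coefficients are learned from training data by ordinary least squares. The function name `fit_line` and the sample numbers are illustrative assumptions, not material from the talk.

```python
def fit_line(xs, ys):
    """Learn a and b for the fixed form f(x) = a*x + b (ordinary least squares)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean, and co-variation of x and y
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    b = mean_y - a * mean_x
    return a, b


# Made-up training data lying exactly on y = 2x + 1
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
a, b = fit_line(xs, ys)
print(f"f(x) = {a:.2f}*x + {b:.2f}")
```

A nonparametric method such as a decision tree would skip the first step and let the data decide the shape of the function, at the cost of more training data.
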
  • 9. Do you know which is the “sexiest” job of the 21st century?
  • 10. You nailed it! Harvard Business Review claims that the answer is DATA SCIENTIST
  • 11. The Truth about Data Science
      • Appealing
          o 64% believe they are working in this century’s most appealing job
      • In demand
          o 90% are contacted at least once a month with a job offer
          o 50% weekly, 30% several times a week, 35% have <2y experience
      • The dark side…
          o All models are wrong, some are useful
          o 80% of the time is data preparation
          o Real-life, not academic problems
          o Non-linear process
          o No full automation
      • No one cares how you do it
      • Presentation is the key
  • 12. MASTERING THE TOOLS does not turn you into a watchmaker. Process and experience are still needed.
  • 14. Data Understanding
      • Mosaic plot
          o Categorical distribution
          o Visualizes the relation between X and Y
          o Strong relation = Y-splits are far apart
      • Box plot
          o Continuous distribution
          o Distribution of a numeric variable
          o Identify and discard outliers (IQR)
      • Scatter plot
          o How much a variable determines another
      https://www.kaggle.com/saisivasriram/titanic-feature-understanding-from-plots
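
The box-plot bullet mentions discarding outliers by IQR. A minimal sketch of that idea follows, assuming the common 1.5 × IQR fence (the slide does not state a multiplier) and using made-up values; `iqr_outliers` is a hypothetical helper name.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR]; k=1.5 is an assumed convention."""
    q1, _median, q3 = statistics.quantiles(values, n=4)  # Python 3.8+
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Made-up sample with one obviously wrong reading
ages = [10, 12, 12, 13, 12, 11, 14, 13, 15, 102]
outliers = iqr_outliers(ages)
cleaned = [v for v in ages if v not in outliers]
print(outliers, cleaned)
```
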
  • 15. Data Preprocessing
      • Make features usable
          o Numerical
          o Categorical (e.g. week day)
          o PCA dimensionality reduction (clustering, low covariance)
          o Dummy variables
      • Handle missing data
      • Normalize data
          o Standard range of numerical scale (e.g. from [-1000;1000] -> [0;1], [-1;1])
          o The value range influences the importance of a feature compared to the others
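
Two of the preprocessing steps above, normalization to a standard range and dummy variables for a categorical feature, can be sketched as follows. The helper names `min_max_scale` and `one_hot` and the sample values are assumptions for illustration.

```python
def min_max_scale(values, lo=0.0, hi=1.0):
    """Linearly rescale values so that min -> lo and max -> hi."""
    vmin, vmax = min(values), max(values)
    span = vmax - vmin
    if span == 0:
        # A constant column carries no information; map everything to lo
        return [lo for _ in values]
    return [lo + (v - vmin) * (hi - lo) / span for v in values]


def one_hot(values):
    """Turn a categorical column into dummy (0/1) columns, one per category."""
    categories = sorted(set(values))
    rows = [[1 if v == c else 0 for c in categories] for v in values]
    return rows, categories


raw = [-1000, 0, 1000]
print(min_max_scale(raw))             # rescaled to [0;1]
print(min_max_scale(raw, -1.0, 1.0))  # rescaled to [-1;1]

dummies, names = one_hot(["Mon", "Tue", "Mon"])
print(names, dummies)
```
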
  • 16. Feature Engineering
      Def: Using transformations on raw data to create new features, more closely related to the target variable
      • Create features more closely related to the target variable
          o e.g. defaulting customer: debt-to-balance ratio = debt / balance
      • Bring in external data sources (e.g. Google Places from IP address)
      • Create features that are easily interpreted (e.g. date to day & month)
      • Use unstructured data sources (e.g. text, video)
      • Create features, experiment, choose those with the best predictive power
      Note: Domain knowledge is important (e.g. the 7th is a pension day)
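
A minimal sketch of the examples on this slide: a debt-to-balance ratio, a date split into day and month, and a domain-knowledge flag for the pension day. The record layout and the name `engineer_features` are hypothetical.

```python
from datetime import date


def engineer_features(record):
    """Derive new features from a raw customer record (hypothetical field names)."""
    balance = record["balance"]
    d = record["date"]
    return {
        # Ratio feature, more closely related to default risk than debt alone
        "debt_to_balance": record["debt"] / balance if balance else None,
        # Easily interpreted date parts
        "day": d.day,
        "month": d.month,
        # Domain knowledge from the slide: the 7th is a pension day
        "is_pension_day": d.day == 7,
    }


features = engineer_features(
    {"debt": 500.0, "balance": 2000.0, "date": date(2018, 4, 7)}
)
print(features)
```
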
  • 17. Digital Media Feature Engineering
      Note: All information is encoded in the digital media
      • Images
          o Step 1: Colour statistics, EXIF metadata, edges, shapes
          o Step 2: Extract knowledge in a fixed set of numeric characteristics
      • Text
          o Step 1
              • Bagging, N-grams, term frequency, topic modelling, stemming
              • Named entity recognition (e.g. Wikipedia)
          o Step 2: Extract knowledge in a fixed set of numeric characteristics
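
For the text branch, two of the listed techniques (N-grams and term frequency) can be sketched with the standard library alone. Whitespace tokenization and the sample sentence are simplifying assumptions; real pipelines usually add stemming and stop-word handling.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-token windows as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def term_frequency(tokens):
    """Relative frequency of each token: count / total tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {term: count / total for term, count in counts.items()}


tokens = "the ship hit the iceberg".split()
bigrams = ngrams(tokens, 2)
tf = term_frequency(tokens)
print(bigrams)
print(tf)
```

Both outputs are a step towards the "fixed set of numeric characteristics" from Step 2: each distinct term or n-gram becomes one numeric column.
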
  • 18. Modelling Starts by Selecting Algorithm • There are other ML tools • There are many more algorithms • You could make custom implementations
  • 19. Performance Evaluation
      Basic evaluation workflow
      • Pick a performance metric based on algorithm type
      • Tweak data and model until the target performance is reached
      CAUTION: Common problems
      • Using the same data for validation and training
          o Split data: 20-40% of data for validation
          o K-fold cross-validation: repeated random splits, which reduce the noise of a single split
      • Overfitting and model optimism
          o Do not get tempted to model noise (bias-variance tradeoff)
          o Do not use temporal features (future features) to predict values in the past
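
A minimal sketch of the K-fold idea: shuffle once, cut the data into k folds, and let each fold serve as the validation set exactly once so that training and validation data never overlap. The generator name `k_fold_indices` and the fixed seed are assumptions for illustration.

```python
import random


def k_fold_indices(n_samples, k, seed=0):
    """Yield (training_indices, validation_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)  # random split, reproducible via seed
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        validation = folds[i]
        training = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield training, validation


splits = list(k_fold_indices(n_samples=10, k=5))
for training, validation in splits:
    print(sorted(validation), "validated against", len(training), "training rows")
```

With k=5, each round holds out 20% of the rows, which sits at the lower end of the 20-40% range quoted on the slide.
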
  • 20. Performance Metrics
      • Regression model
          o Root Mean Squared Error (RMSE)
          o Coefficient of Determination, R² (typically in [0;1]; it can be negative for a model that fits worse than the mean)
      • Classification model
          o Confusion matrix
      • Binary classification model
          o Accuracy based on correct answers
          o Area under ROC curve (AUC)
              • Threshold
              • Precision = TP / (TP + FP)
              • Recall = TP / (TP + FN)
          o PR-curve is better for imbalanced distributions
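
The formulas on this slide translate directly into code. The sketch below computes precision and recall from confusion-matrix counts, plus RMSE and R² for a regression model; the labels and predictions are made up, and the function names are illustrative.

```python
import math


def confusion_counts(y_true, y_pred):
    """Return (TP, FP, FN, TN) for binary labels 0/1."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    return tp, fp, fn, tn


def precision(tp, fp):
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    return tp / (tp + fn) if tp + fn else 0.0


def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


tp, fp, fn, tn = confusion_counts([1, 1, 1, 0, 0, 0, 0, 0], [1, 1, 0, 1, 0, 0, 0, 0])
print(precision(tp, fp), recall(tp, fn))
print(rmse([3, 5, 7], [2, 5, 9]), r_squared([3, 5, 7], [2, 5, 9]))
```
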
  • 21. Tuning Model Parameters
      • Model parameters control inner behaviour
          o The more sophisticated the algorithm, the more parameters
              • e.g. Locally Deep SVM with kernel
          o Kernel type, kernel coefficient
      • How does parameter tuning work?
          1. Choose a metric for evaluation (AUC for classification, R² for regression, etc.)
          2. Select parameters for optimization
          3. Define a grid as the Cartesian product between arrays
          4. For each combination, cross-validate on the training set
          5. Select the parameters with the best evaluation
      Note: Expected improvement is 3%-8%
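
Steps 3 to 5 above can be sketched as a plain grid search. Here `evaluate` stands in for "cross-validate on the training set and return the chosen metric"; the toy scoring function, the parameter names and their values are assumptions so that the example runs without a real model.

```python
from itertools import product


def grid_search(param_grid, evaluate):
    """Try every combination in the Cartesian product and keep the best score."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = evaluate(params)  # stand-in for a cross-validated metric
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(params):
    """Hypothetical metric that peaks at an rbf kernel with coefficient 0.1."""
    penalty = 0.0 if params["kernel_type"] == "rbf" else 0.5
    return -((params["kernel_coefficient"] - 0.1) ** 2) - penalty


grid = {
    "kernel_type": ["linear", "rbf"],
    "kernel_coefficient": [0.01, 0.1, 1.0],
}
best_params, best_score = grid_search(grid, toy_score)
print(best_params, best_score)
```

The number of evaluations is the product of the array lengths (2 × 3 = 6 here), which is why grids are usually kept small.
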
  • 22. Feature Selection - select the most predictive features
      • ML handles x1000 params: not all params are equal
      • Adding features: a common approach to increase accuracy
      • Poor performance: correlated features could lead to poor model performance
      • Overfitting: learning relations in more detail may lead to overfitting
  • 23. Selecting Good Features
      • Motivation
          o Sometimes the ML goal is not to predict but to identify predictive features
          o Computational costs are related to the number of features
      • Approach
          o Trying all combinations of features? (That would be infeasible)
          o Algorithms with built-in feature selection (e.g. decision trees)
      • Algorithms
          o Iterative forward selection & backward elimination
          o Permutation feature importance
              • High-importance features are more sensitive to random shuffling of values
          o Filter-based feature selection
      !!! Some features may have more predictive power when paired !!!
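
Permutation feature importance can be sketched without any ML library: shuffle one column at a time and measure how much the score drops. The threshold "model", the data and the helper names below are hypothetical; only the shuffling idea comes from the slide.

```python
import random


def accuracy(y_true, y_pred):
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


def permutation_importance(predict, rows, targets, score, seed=0):
    """Score drop per feature after randomly shuffling that feature's values."""
    rng = random.Random(seed)
    baseline = score(targets, [predict(r) for r in rows])
    importances = []
    for col in range(len(rows[0])):
        shuffled = [r[col] for r in rows]
        rng.shuffle(shuffled)
        # Rebuild the rows with only this one column permuted
        permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
        importances.append(baseline - score(targets, [predict(r) for r in permuted]))
    return importances


def predict(row):
    """Toy model: uses only the first feature and ignores the second."""
    return 1 if row[0] > 0.5 else 0


rows = [[0.1, 5.0], [0.9, 3.0], [0.2, 8.0], [0.8, 1.0], [0.3, 7.0], [0.7, 2.0]]
targets = [0, 1, 0, 1, 0, 1]
importances = permutation_importance(predict, rows, targets, accuracy)
print(importances)
```

Shuffling the ignored second feature cannot change the predictions, so its importance is exactly zero, while shuffling the first feature can only hurt a model that was perfect on the original rows.
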
  • 24. And now… The Microsoft Azure tools Data preparation Building models Consuming models
  • 25. Azure Machine Learning
      • Azure Machine Learning is an integrated, end-to-end data science and advanced analytics solution
      • Highlights
          o Built on open source technologies (Jupyter Notebook, Spark, Python, Docker)
          o Execute experiments in isolated environments
          o GPU-enabled VMs
      • ML related services and tools
          o Azure ML Workbench
          o Azure ML Experimentation Service
          o Azure ML Model Management Service
          o Azure ML Studio
          o Data Science VM
          o Libraries for Apache Spark (MMLSpark)
          o Visual Studio Code Tools for AI
          o Cognitive Toolkit (CNTK)
          o Microsoft Cognitive Services
          o ML Services for SQL Server (R, Python)
  • 26. Azure ML Workbench
      • Desktop application (Windows, macOS)
      • Built-in Jupyter Notebook services and Git integration
      • End-to-end process support
          o Powerful inspectors for data analysis
          o Data transformations by example
          o Model development and experimentation (Python)
          o Model history and deployment (local, Docker)
  • 27. Azure ML Studio
      • Visual workspace to build, test and deploy ML solutions
      • Highlights
          o Cross-browser drag and drop, no programming
          o Rich set of modules
          o Fits beginners and advanced users
          o Unlimited extensibility (R Script, Python Script)
          o Enterprise-grade cloud service (SLA 99.95%)
          o ML REST web services consumption
          o Jupyter Notebook
          o Azure AI Gallery (8000+ samples)
      • At what price?
          o Free plan available
          o €8.5 per seat + €0.85 per experiment/hour
          o Recommended: €85/month (100K requests)
  • 28. Azure Data Science VM
      • Pre-configured cloud environment for AI & Data Science
      • Highlights
          o Preconfigured, fully operational environment
          o 50+ tools: DEV, ML, BigData, Data management
          o Windows and Linux (Ubuntu/CentOS)
          o Updated every few months
          o On-demand elastic capacity
          o GPU-optimized VMs for deep learning
          o Up to 4x NV K80 or V100 GPUs
          o Up to 128 cores, 3.8TiB RAM
      • At what price?
          o From €10 to €28’620 per month
  • 29. Azure ML Experimentation Service
      • Handles execution of ML experiments in a virtual environment for isolated, consistent and reproducible results (since 09.2017)
          o Local native
          o Docker (local and remote)
          o Azure Spark cluster
      • Supports Workbench, records and presents run history
      • Scalable model consumption
      https://docs.microsoft.com/en-us/azure/machine-learning/preview/experimentation-service-configuration
  • 30. Azure ML Model Management Service
      • Provides deployment, hosting, versioning and management of models in Azure, on-prem and IoT Edge
      • Deployment
          o Model manifest for Docker image
          https://docs.microsoft.com/en-us/azure/machine-learning/preview/deployment-setup-configuration
      • Consumption
          o Models exposed on REST API
          o Sample code (Java, C#, Python)
      • Scalability
          o Scale-out to 100x replicas/cluster
          o 10 requests/replica (default)
          o Autoscaling based on load
      • Retraining
          o APIs to retrain models and update model version
  • 31. Takeaways
      • ML in the Microsoft World
          o https://docs.microsoft.com/en-us/azure/machine-learning/
      • Python for AI
          o https://wiki.python.org/moin/PythonForArtificialIntelligence
      • Data Science Blog
          o https://data-flair.training/blogs/category/machine-learning/
      • Starter Books
  • 32. DEMO
      • Data Analysis (Azure ML Workbench)
      • Data Preparation (Azure ML Workbench)
      • Predictive Maintenance (Azure ML Studio)
  • 33. Upcoming events SQLSaturday #711 Plovdiv 02 June 2018 www.sqlsaturday.com/711/ SQLSaturday #763 Sofia 13 Oct 2018 www.sqlsaturday.com/763/
  • 34. Thanks to our Sponsors: Global Sponsor: Platinum Sponsors: Swag Sponsors: Media Partners: