Building Machine Learning Classifiers
Mostafa Elzoghbi
Sr. Technical Evangelist – Microsoft
@MostafaElzoghbi
http://mostafa.rocks
Session Objectives & Takeaways
• What is Machine Learning?
• Azure Machine Learning (AML) for ML Solutions
• Machine Learning Classifiers
• Business Use Cases
• Let’s build smart apps initiative!
What is Machine Learning?
• Using known data, develop a model to predict unknown data.
Known Data: Big enough archive, previous observations, past data
Model: Known data + Algorithms (ML algorithms)
Unknown Data: Missing, unseen, nonexistent, or future data
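The idea of "known data + algorithm = model" can be sketched in a few lines. This is only an illustration: the data values, the labels, and the choice of a nearest-neighbour rule are assumptions, not something taken from the slides.

```python
# Known data: past observations with known outcomes (made-up values).
# Each pair is (hours played per week, customer type).
known_data = [
    (1.0, "casual"),
    (2.0, "casual"),
    (9.0, "gamer"),
    (12.0, "gamer"),
]

def build_model(data):
    """Algorithm + known data -> model (here: a 1-nearest-neighbour lookup)."""
    def predict(x):
        # Return the label of the known observation closest to x.
        nearest = min(data, key=lambda pair: abs(pair[0] - x))
        return nearest[1]
    return predict

model = build_model(known_data)

# Unknown data: a value the model has not seen before.
prediction = model(10.0)
print(prediction)
```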
Microsoft Azure Machine Learning
• Web-based UI accessible from different browsers
• Share|collaborate with any other ML workspace
• Drag & Drop visual design|development
• Catalog with a wide range of ML algorithms
• Extend with OSS R|Python scripts
• Share|Document with IPython|Jupyter
• Deploy|Publish|Scale rapidly (APIs)
Azure Machine Learning Ecosystem
Provision Workspace
• Get Azure Subscription
• Create Workspace
Build ML Model
• Get/Prepare Data
• Build/Edit Experiment
• Create/Update Model
• Evaluate Model Results
Deploy as Web Service
• Publish Web Service
• Publish an App
• Azure Data Marketplace
Blobs and Tables
Hadoop (HDInsight)
Relational DB (Azure SQL DB)
Data Clients
Model is now a web service that is callable
Monetize the API through our marketplace
API
Integrated development environment for Machine Learning
ML STUDIO
DEMO: Azure Machine Learning Studio Capabilities
EXAMPLES
Model (Decision Tree)
Age < 30
Income > $50K
Xbox-One Customer
Not Xbox-One Customer
Days Played > 728
Income > $50K
Xbox-One Customer
Not Xbox-One Customer
Xbox-One Customer
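A decision tree like this one is a set of nested questions. The sketch below writes it as plain conditionals. The slide text does not say which branch is "yes" and which is "no", so the branch layout here is an assumption made for illustration only.

```python
def classify_customer(age, income, days_played):
    """Walk a hand-written decision tree and return a customer label.

    Assumed layout (not confirmed by the slide):
      Age < 30?   yes -> check income
                  no  -> check days played, then income
    """
    if age < 30:
        if income > 50_000:
            return "Xbox-One Customer"
        return "Not Xbox-One Customer"
    if days_played > 728:
        return "Xbox-One Customer"
    if income > 50_000:
        return "Xbox-One Customer"
    return "Not Xbox-One Customer"

print(classify_customer(age=25, income=60_000, days_played=100))
```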
EXAMPLE
Classify a news article as (politics, sports, technology, health, …)
Politics Sports Tech Health
Model (Classification)
Using known data, develop a model to predict unknown data.
Known data (Training data)
Using known data, develop a model to predict unknown data.
Documents Labels
Tech
Health
Politics
Politics
Sports
Documents consist of unstructured text. Machine learning typically assumes a more structured format of examples.
Process the raw data
Known data (Training data)
Using known data, develop a model to predict unknown data.
Labels Documents
Feature
Documents Labels
Tech
Health
Politics
Politics
Sports
Process each data instance to represent it as a feature vector
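One common way to turn unstructured text into feature vectors is a bag-of-words count. The sketch below assumes that approach; the slides do not name a specific technique, and the sample documents are made up.

```python
from collections import Counter

# Made-up training documents with their labels (illustrative only).
training_data = [
    ("new phone chip released", "Tech"),
    ("team wins the final match", "Sports"),
    ("senate votes on the new bill", "Politics"),
]

def build_vocabulary(texts):
    """Collect every distinct lowercase word, in sorted order."""
    return sorted({word for text in texts for word in text.lower().split()})

def to_feature_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

vocabulary = build_vocabulary(text for text, _ in training_data)
feature_vectors = [to_feature_vector(text, vocabulary) for text, _ in training_data]
labels = [label for _, label in training_data]
```

Every document now has a vector of the same length, which is the structured format most learning algorithms expect.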
Feature vector
Known data
Data instance, e.g.
{40, (180, 82), (11,7), 70, …..} : Healthy
Features: Age, Height/Weight, Blood Pressure, Heart Rate
Label: Healthy
Feature Vector
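The data instance above can be flattened into a plain numeric vector with its label kept separately. This is a minimal sketch; the `Patient` type and `flatten_instance` helper are hypothetical names introduced only for illustration.

```python
from typing import NamedTuple, Tuple

class Patient(NamedTuple):
    """One data instance, mirroring the fields named on the slide."""
    age: int
    height_weight: Tuple[int, int]
    blood_pressure: Tuple[int, int]
    heart_rate: int

def flatten_instance(patient):
    """Turn the nested record into a flat list of floats."""
    return [
        float(patient.age),
        *map(float, patient.height_weight),
        *map(float, patient.blood_pressure),
        float(patient.heart_rate),
    ]

instance = Patient(40, (180, 82), (11, 7), 70)
label = "Healthy"
features = flatten_instance(instance)
print(features, label)
```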
Developing a Model
Using known data, develop a model to predict unknown data.
Documents Labels
Tech
Health
Politics
Politics
Sports
Training data
Train the Model
Feature Vectors
Base Model
Adjust Parameters
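"Train the model" and "adjust parameters" can be made concrete with a small training loop. The sketch below uses a perceptron as the base model; that choice, the toy feature vectors, and the +1/-1 label encoding are assumptions, since the slide does not name an algorithm.

```python
def score(weights, bias, features):
    """Weighted sum of the features plus a bias term."""
    return sum(w * x for w, x in zip(weights, features)) + bias

def predict(weights, bias, features):
    """Return +1 or -1 depending on the sign of the score."""
    return 1 if score(weights, bias, features) > 0 else -1

def train_perceptron(examples, epochs=10, learning_rate=1.0):
    """Start from a base model (all zeros) and adjust it after each mistake."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for features, label in examples:
            if predict(weights, bias, features) != label:
                # Nudge the parameters toward the correct label.
                weights = [w + learning_rate * label * x
                           for w, x in zip(weights, features)]
                bias += learning_rate * label
    return weights, bias

# Toy, linearly separable training data: (feature vector, label).
training_examples = [
    ([2.0, 1.0], 1),
    ([3.0, 2.0], 1),
    ([-1.0, -1.5], -1),
    ([-2.0, -1.0], -1),
]
weights, bias = train_perceptron(training_examples)
```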
Model’s Performance
Known data with true labels: Tech, Health, Politics, Politics, Sports
Split, Detach, Train the Model
True labels vs. predicted labels: +/-, +/-, +/-
Model’s Performance: Difference between “True Labels” and “Predicted Labels”
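Measuring performance as the difference between true and predicted labels can be sketched as follows. The code reads "split" as holding back part of the known data and "detach" as hiding its labels from the model; both readings, the sample documents, and the predicted labels are assumptions for illustration.

```python
import random

def split(data, test_fraction=0.4, seed=0):
    """Shuffle the known data and split it into training and test parts."""
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)
    test_size = int(len(shuffled) * test_fraction)
    return shuffled[test_size:], shuffled[:test_size]

def accuracy(true_labels, predicted_labels):
    """Fraction of instances where the predicted label equals the true label."""
    matches = sum(t == p for t, p in zip(true_labels, predicted_labels))
    return matches / len(true_labels)

# Hypothetical known data: (document id, true label).
known_data = [
    ("doc1", "Tech"), ("doc2", "Health"), ("doc3", "Politics"),
    ("doc4", "Politics"), ("doc5", "Sports"),
]
train_part, test_part = split(known_data)

# Detach the labels: the model only sees the inputs of the test part.
test_inputs = [doc for doc, _ in test_part]
test_true_labels = [label for _, label in test_part]

# Comparing true labels with a hypothetical set of predictions.
true_labels = ["Tech", "Health", "Politics", "Politics", "Sports"]
predicted_labels = ["Tech", "Health", "Politics", "Sports", "Sports"]
print(accuracy(true_labels, predicted_labels))
```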
Steps to Build a Machine Learning Solution
1. Problem Framing
2. Get/Prepare Data
3. Develop Model
   3.1 Analysis/Metric definition
   3.2 Feature Engineering
   3.3 Model Training
   3.4 Parameter Tuning
   3.5 Evaluation
4. Deploy Model
5. Evaluate / Track Performance
Machine Learning Algorithms
Flavors of machine learning algorithms
• Supervised
• Unsupervised
• Reinforcement learning (n/a in AML)
The most commonly used machine learning algorithms are supervised (they require labels)
• Supervised learning examples
• This customer will like coffee
• This network traffic indicates a denial of service attack
• Unsupervised learning examples
• These customers are similar
• This network traffic is unusual
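The "these customers are similar" case needs no labels at all. As a rough sketch, the code below groups one-dimensional values into two clusters with a simplified k-means loop. The customer ages are made up, and the function assumes the input holds at least two distinct values.

```python
def two_means(values, iterations=10):
    """Group 1-D values into two clusters around two moving centers."""
    low, high = min(values), max(values)  # initial centers
    group_low, group_high = [], []
    for _ in range(iterations):
        # Assign each value to the nearer center.
        group_low = [v for v in values if abs(v - low) <= abs(v - high)]
        group_high = [v for v in values if abs(v - low) > abs(v - high)]
        # Move each center to the mean of its group.
        low = sum(group_low) / len(group_low)
        high = sum(group_high) / len(group_high)
    return group_low, group_high

customer_ages = [20, 22, 25, 60, 65]  # no labels attached
younger, older = two_means(customer_ages)
print(younger, older)
```

A supervised algorithm would instead need a label for every customer before it could learn anything.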
Common Classes of Algorithms (Supervised|Unsupervised)
• Classification: Supervised
• Regression: Supervised
• Anomaly Detection: Supervised
• Clustering: Unsupervised
Classification
A classification technique (or classifier) is a systematic approach to building classification models from an input data set.
Examples include decision tree classifiers, rule-based classifiers, neural networks, support vector machines, and naïve Bayes classifiers.
Scenarios:
▪ Which customers are more likely to buy, stay, or leave (churn analysis)
▪ Which transactions|actions are fraudulent
▪ Which quotes are more likely to become orders
▪ Recognition of patterns: speech, speaker, image, movement, etc.
Algorithms: Boosted Decision Tree, Decision Forest, Decision Jungle, Logistic Regression, SVM, ANN, etc. (14 algorithms so far)
Classification
Binary versus Multiclass Classification
Does your customer want a yes|no answer?
• Binary examples
• click prediction
• yes|no
• over|under
• win|loss
• Multiclass examples
• kind of tree
• kind of network attack
• type of heart disease
ML Classifier Types
• Two Class Classifiers:
• Answer: Yes/No, T/F
• Multi-Class Classifiers:
• Multiple answers
• A fixed list of options
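The difference between the two classifier types shows up in the final decision step. In this sketch a two-class classifier thresholds a single probability, while a multi-class classifier picks the highest-scoring option from a fixed list. The 0.5 threshold and the example scores are assumptions.

```python
def binary_decision(probability, threshold=0.5):
    """Two-class answer: yes if the probability reaches the threshold."""
    return "yes" if probability >= threshold else "no"

def multiclass_decision(scores):
    """Multi-class answer: the option with the highest score."""
    return max(scores, key=scores.get)

# Hypothetical model outputs.
print(binary_decision(0.73))
print(multiclass_decision({"oak": 0.2, "pine": 0.7, "birch": 0.1}))
```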
DEMO: Predict an Individual's Income (>=50K) - Binary Classifier
Linear Classifier
Logistic Regression Classifier
Trees, forests, and jungles
Neural networks and perceptrons
Support Vector Machines (SVMs)
Bayesian methods
Bayesian methods have a highly desirable quality: they tend to resist overfitting.
They do this by making some assumptions beforehand about the likely distribution of the answer.
Another byproduct of this approach is that they have very few parameters.
Azure Machine Learning has Bayesian algorithms for both classification (Two-class Bayes' point machine) and regression (Bayesian linear regression).
Note that these assume that the data can be split or fit with a straight line.
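How a prior assumption guards against overfitting can be seen in a much simpler setting than the Bayes point machine. The sketch below estimates a success rate from a handful of observations, once without a prior and once with a uniform Beta(1, 1) prior. It is a simplified illustration of the principle, not the algorithm Azure Machine Learning uses.

```python
def max_likelihood_rate(successes, trials):
    """Estimate from the observed data alone."""
    return successes / trials

def posterior_mean_rate(successes, trials,
                        prior_successes=1.0, prior_failures=1.0):
    """Posterior mean under a Beta prior (default: uniform Beta(1, 1))."""
    return (successes + prior_successes) / (
        trials + prior_successes + prior_failures
    )

# Two customers, both bought: the data alone claims a 100% purchase rate.
print(max_likelihood_rate(2, 2))
# The prior pulls the estimate away from that extreme conclusion.
print(posterior_mean_rate(2, 2))
```

With more data the two estimates converge, so the prior matters most exactly when the data is too sparse to trust.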
Business Use Cases
DEMO: Azure Machine Learning Classification Algorithms
DEMO: Letter Recognition (Multi-Class Classifier)
References
• Free e-book “Azure Machine Learning”
• https://mva.microsoft.com/ebooks#9780735698178
• Azure Machine Learning documentation
• https://azure.microsoft.com/en-us/documentation/services/machine-learning/
• Data Science and Machine Learning Essentials
• www.edx.org
• Azure ML Camp Files (labs & presentation) on GitHub
• https://github.com/melzoghbi/DataCamp
• Azure ML HOL (GitHub)
• https://github.com/Azure-Readiness/hol-azure-machine-learning/
Thank you
• Check out my blog for Azure ML articles: http://mostafa.rocks
• Follow me on Twitter: @MostafaElzoghbi
• Want some help building ML solutions? Contact me to learn more.
