AutoML 101
2018 Copyright QuantUniversity LLC.
Presented By:
Sri Krishnamurthy, CFA, CAP
sri@quantuniversity.com
www.quantuniversity.com
10/25/2018
QuantUniversity Meetup
Boston
2
About us:
• Data Science, Quant Finance and
Model Governance Advisory
• Technologies using MATLAB, Python
and R
• Programs
▫ Analytics Certificate Program
▫ Fintech programs
• Platform
3
www.analyticscertificate.com/MachineLearning
Use code “Affiliate” for 20% off by Oct 30th
Upcoming workshop
November 7-8, 2018
4
• Your challenge is to design an artificial intelligence and machine
learning (AI/ML) framework capable of flying a drone through
several professional drone racing courses without human
intervention or navigational pre-programming.
AlphaPilot Drone AI Challenge
5
6
• Machine Learning
• Automatic Machine Learning
• Demos
Agenda
7
• “AI is the theory and development of computer systems able to
perform tasks that traditionally have required human intelligence.
• AI is a broad field, of which ‘machine learning’ is a sub-category”
What is Machine Learning and AI?
Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
8
The Machine Learning Process
• Data cleansing
• Feature engineering
• Training and testing
• Model building
• Model selection
• Hyperparameter optimization
• Model deployment
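To make these stages concrete, here is a minimal Python sketch (NumPy only) that walks one synthetic data set through cleansing, feature engineering, a train/test split, model building and a deployable predict function. The data, the 5% missing-value rate, mean imputation, standardization and ordinary least squares are all assumptions chosen for brevity; model selection and hyperparameter optimization are left out to keep it short.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumed for illustration): 3 features, a linear target, ~5% missing values
X = rng.normal(size=(300, 3))
y = 1.5 * X[:, 0] - 2.0 * X[:, 1] + rng.normal(scale=0.1, size=300)
X[rng.random(X.shape) < 0.05] = np.nan

# Training and testing: hold out the last 100 rows as a test set
X_train, X_test = X[:200], X[200:]
y_train, y_test = y[:200], y[200:]

# Data cleansing: fill missing values with training-set column means
col_means = np.nanmean(X_train, axis=0)


def cleanse(A):
    return np.where(np.isnan(A), col_means, A)


# Feature engineering: standardize with training-set statistics
mu = cleanse(X_train).mean(axis=0)
sd = cleanse(X_train).std(axis=0)


def engineer(A):
    return (cleanse(A) - mu) / sd


# Model building: ordinary least squares with an intercept column
def design(A):
    Z = engineer(A)
    return np.column_stack([np.ones(len(Z)), Z])


beta, *_ = np.linalg.lstsq(design(X_train), y_train, rcond=None)


# Deployment: one predict function that replays every step on new data
def predict(A):
    return design(A) @ beta


test_mse = np.mean((predict(X_test) - y_test) ** 2)
print(f"test MSE: {test_mse:.3f}")
```

Fitting the imputation means and scaling statistics on the training rows only, then reusing them inside `predict`, keeps information from the test rows out of training.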
9
• Supervised Algorithms
▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a
given data set:
▫ If y is numeric => Prediction
▫ If y is categorical => Classification
Machine Learning
x1, x2, x3, … → Model F(X) → y
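A minimal NumPy sketch of the two supervised cases on synthetic data: when y is numeric, F(X) is a least-squares linear fit; when y is categorical, a nearest-centroid rule (picked here only because it is short, not because the slides prescribe it) assigns each point to the class whose mean is closest. The data and the class labels "up"/"down" are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(100, 2))  # inputs x1, x2

# y numeric => prediction: fit F(X) = b0 + b1*x1 + b2*x2 by least squares
y_num = 3.0 + 2.0 * X[:, 0] - 1.0 * X[:, 1] + rng.normal(scale=0.05, size=100)
A = np.column_stack([np.ones(len(X)), X])
coef, *_ = np.linalg.lstsq(A, y_num, rcond=None)
print("regression coefficients:", coef.round(2))

# y categorical => classification: nearest-centroid rule
y_cat = np.where(X[:, 0] + X[:, 1] > 0, "up", "down")
labels = np.unique(y_cat)
centroids = np.array([X[y_cat == c].mean(axis=0) for c in labels])


def classify(points):
    # Distance from each point to each class centroid; pick the closest class
    d = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
    return labels[d.argmin(axis=1)]


accuracy = np.mean(classify(X) == y_cat)
print(f"training accuracy: {accuracy:.2f}")
```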
10
• Unsupervised Algorithms
▫ Given a dataset with variables $x_i$, build a model that captures the
similarities in different observations and assigns them to different
buckets => Clustering
Machine Learning
Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
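Clustering can be sketched with k-means, one common algorithm for this task (the slides do not name one). The NumPy version below alternates between assigning each observation to its nearest center and moving each center to the mean of its observations; the two synthetic blobs and k = 2 are assumptions.

```python
import numpy as np


def kmeans(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initial centers
    for _ in range(n_iter):
        # Assign each observation to its nearest center
        d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its assigned observations
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels, centers


rng = np.random.default_rng(2)
X = np.vstack([
    rng.normal(0.0, 0.5, size=(50, 2)),  # observations around (0, 0)
    rng.normal(5.0, 0.5, size=(50, 2)),  # observations around (5, 5)
])
labels, centers = kmeans(X, k=2)
print("first labels:", labels[:3], "last labels:", labels[-3:])
```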
11
Supervised learning algorithms
▫ Parametric models
▫ Non-parametric models
Supervised learning Algorithms - Prediction
12
• Parametric models
▫ Assume some functional form
▫ Fit coefficients
• Examples: Linear Regression, Neural Networks
Supervised Learning models - Prediction
$y = \beta_0 + \beta_1 x_1$
Linear Regression Model Neural network Model
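For a one-predictor linear model $y = \beta_0 + \beta_1 x_1$, "fit coefficients" has a closed form: $\hat\beta_1 = \sum (x_1 - \bar x_1)(y - \bar y) / \sum (x_1 - \bar x_1)^2$ and $\hat\beta_0 = \bar y - \hat\beta_1 \bar x_1$. A short NumPy check on synthetic data whose true coefficients are assumed to be 2.0 and 0.5:

```python
import numpy as np

rng = np.random.default_rng(3)
x1 = rng.uniform(0, 10, size=50)
y = 2.0 + 0.5 * x1 + rng.normal(scale=0.2, size=50)  # assumed truth: beta0=2.0, beta1=0.5

# Closed-form least-squares estimates for simple linear regression
beta1 = np.sum((x1 - x1.mean()) * (y - y.mean())) / np.sum((x1 - x1.mean()) ** 2)
beta0 = y.mean() - beta1 * x1.mean()
print(f"beta0 = {beta0:.2f}, beta1 = {beta1:.2f}")
```

The functional form is fixed in advance; only the two numbers $\beta_0$ and $\beta_1$ are learned, which is what makes the model parametric.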
13
• Non-Parametric models
▫ No functional form assumed
• Examples: K-nearest neighbors, Decision Trees
Supervised Learning models
K-nearest neighbor Model Decision tree Model
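k-nearest neighbors shows what "no functional form assumed" means: the prediction at a point is simply the average target of its k closest training points, with no coefficients fitted. A from-scratch NumPy sketch; the sine-shaped data and k = 5 are assumptions.

```python
import numpy as np


def knn_predict(X_train, y_train, X_query, k=5):
    # Distances from every query point to every training point
    d = np.linalg.norm(X_query[:, None, :] - X_train[None, :, :], axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]  # indices of the k closest training points
    return y_train[nearest].mean(axis=1)    # average their targets


rng = np.random.default_rng(4)
X_train = rng.uniform(0, 2 * np.pi, size=(200, 1))
y_train = np.sin(X_train[:, 0])  # nonlinear target; the model is never told it is a sine
X_query = np.array([[np.pi / 2], [3 * np.pi / 2]])
print(knn_predict(X_train, y_train, X_query, k=5).round(2))
```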
14
15
• Automated machine learning (AutoML) is the process of
automating the end-to-end application of machine learning to
real-world problems.
AutoML
16
• Automated Feature Engineering
▫ Feature selection
▫ Feature extraction
▫ Meta learning and transfer learning
▫ Detection and handling of skewed data and/or missing values
• Hyper-parameter optimization
• Model Selection
• Reference:
https://en.wikipedia.org/wiki/Automated_machine_learning
Types of frameworks
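Several of the automated feature engineering items can be illustrated with a few simple rules. The hypothetical `auto_preprocess` helper below fills missing values with the column median, drops constant columns as a crude form of feature selection, and log-transforms right-skewed, non-negative columns. The rules and thresholds are assumptions for illustration, not how any particular AutoML framework works.

```python
import numpy as np


def skewness(v):
    """Sample skewness: third central moment divided by variance**1.5."""
    v = v - v.mean()
    return np.mean(v ** 3) / np.mean(v ** 2) ** 1.5


def auto_preprocess(X, skew_threshold=1.0, min_variance=1e-8):
    """Hypothetical helper: impute, drop constant columns, log-transform skewed ones."""
    X = np.array(X, dtype=float)  # work on a copy
    keep = []
    for j in range(X.shape[1]):
        col = X[:, j]  # a view, so in-place edits update X
        # Missing values: fill NaNs with the column median
        col[np.isnan(col)] = np.nanmedian(col)
        # Feature selection: drop (near-)constant columns
        if col.var() <= min_variance:
            continue
        # Skewed data: apply log(1 + x) to right-skewed, non-negative columns
        if col.min() >= 0 and skewness(col) > skew_threshold:
            X[:, j] = np.log1p(col)
        keep.append(j)
    return X[:, keep], keep


rng = np.random.default_rng(5)
X = np.column_stack([
    rng.normal(size=100),     # roughly symmetric feature
    rng.lognormal(size=100),  # right-skewed, positive feature
    np.full(100, 3.0),        # constant feature
])
X[::10, 0] = np.nan           # every 10th value of feature 0 is missing
X_clean, kept = auto_preprocess(X)
print("kept columns:", kept, "shape:", X_clean.shape)
```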
17
• Parameters: Values that can be estimated from data
▫ Examples:
– Regression Coefficients
– Weights in a Neural Network
• Hyperparameters: Values external to the model that cannot be
learned from the data
▫ Examples:
– Learning rate in a Neural Network
– Regularization parameters
Parameters vs. Hyperparameters
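Ridge regression separates the two cleanly: the penalty strength `alpha` (a regularization parameter) is a hyperparameter we choose, and the coefficients are parameters estimated from the data for that choice. A minimal NumPy sketch on synthetic, zero-mean data, so no intercept is fitted; `fit_ridge` is a hypothetical helper name.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Estimate the parameters (coefficients) for a given hyperparameter alpha."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


rng = np.random.default_rng(6)
X = rng.normal(size=(50, 3))
y = X @ np.array([1.0, -2.0, 0.5]) + rng.normal(scale=0.1, size=50)

for alpha in [0.0, 10.0, 1000.0]:  # hyperparameter: chosen by us, not learned
    coef = fit_ridge(X, y, alpha)  # parameters: estimated from the data
    print(f"alpha={alpha:>6}: coefficients={coef.round(3)}")
```

Larger `alpha` shrinks the learned coefficients toward zero; the data alone cannot tell us which `alpha` to use, which is why it has to be tuned.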
18
• Hyperparameter optimization finds a tuple of hyperparameters that yields an
optimal model which minimizes a predefined loss function on given
independent data.[1]
• [1] Claesen, Marc; Bart De Moor (2015). "Hyperparameter Search in Machine
Learning".
• Image from:
https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
Hyperparameter optimization
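The simplest hyperparameter optimization strategy is grid search: fit a model for each candidate value on training data and keep the one with the lowest loss on independent validation data. The NumPy sketch below does this for a ridge penalty `alpha`, with mean squared error as the predefined loss; the grid, data sizes and helper names are assumptions. Random search or Bayesian optimization would change only how candidate values are proposed.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    """Ridge coefficients for penalty alpha (no intercept; data assumed centered)."""
    return np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)


def mse(X, y, coef):
    return np.mean((X @ coef - y) ** 2)


rng = np.random.default_rng(7)
X = rng.normal(size=(60, 20))  # many features relative to the number of rows
true_coef = np.zeros(20)
true_coef[:3] = [2.0, -1.0, 0.5]
y = X @ true_coef + rng.normal(scale=1.0, size=60)

# Independent data: fit on the first 40 rows, measure the loss on the last 20
X_tr, y_tr = X[:40], y[:40]
X_val, y_val = X[40:], y[40:]

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
losses = {a: mse(X_val, y_val, fit_ridge(X_tr, y_tr, a)) for a in grid}
best_alpha = min(losses, key=losses.get)
print("validation MSE by alpha:", {a: round(float(v), 3) for a, v in losses.items()})
print("selected alpha:", best_alpha)
```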
19
• Interpretability: Ability of users to understand the model, the
parameters of the model and their effect on the outcome
• Example:
▫ In regression, coefficients enable us to interpret the influence of an
independent variable on the dependent variable.
▫ The standard errors of the coefficient estimates enable us to
determine how confident we are in these estimates
Model selection considerations
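For ordinary least squares, these standard errors come from $\widehat{\mathrm{Var}}(\hat\beta) = \hat\sigma^2 (X^\top X)^{-1}$ with $\hat\sigma^2 = \mathrm{RSS}/(n-p)$, where $p$ counts the intercept. A NumPy sketch on synthetic data; the true intercept 1.0 and slope 3.0 are assumptions.

```python
import numpy as np

rng = np.random.default_rng(8)
n = 100
x = rng.normal(size=n)
y = 1.0 + 3.0 * x + rng.normal(scale=0.5, size=n)

X = np.column_stack([np.ones(n), x])           # intercept + one predictor
beta, *_ = np.linalg.lstsq(X, y, rcond=None)   # coefficient estimates
resid = y - X @ beta
sigma2 = resid @ resid / (n - X.shape[1])      # unbiased residual variance
se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))  # standard errors
t_stats = beta / se
for name, b, s, t in zip(["intercept", "x"], beta, se, t_stats):
    print(f"{name:>9}: coef={b:.3f}  std.err={s:.3f}  t={t:.1f}")
```

A large |t| (coefficient divided by its standard error) says the estimate sits many standard errors away from zero, which is one common way to quantify that confidence.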
20
• Parsimonious models: A parsimonious model is a model that
accomplishes a desired level of explanation or prediction with as
few predictor variables as possible.
• Example:
▫ In regression, using exhaustive search, forward search, backward
search, or stepwise regression for model selection
▫ Using PCA on the feature space prior to model building
Model selection considerations
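Forward search can be sketched greedily: start with no predictors and repeatedly add the one that lowers the residual sum of squares the most. This NumPy version stops at a fixed `max_features`, which is an assumption; practical versions usually stop on a criterion such as AIC, BIC or cross-validated error. The data, in which only features 2 and 4 matter, is synthetic.

```python
import numpy as np


def rss(X, y, cols):
    """Residual sum of squares of an OLS fit with an intercept and the given columns."""
    A = np.column_stack([np.ones(len(y))] + [X[:, c] for c in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    r = y - A @ beta
    return r @ r


def forward_selection(X, y, max_features):
    selected, remaining = [], list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Add the candidate whose inclusion gives the lowest RSS
        best = min(remaining, key=lambda c: rss(X, y, selected + [c]))
        selected.append(best)
        remaining.remove(best)
    return selected


rng = np.random.default_rng(9)
X = rng.normal(size=(200, 6))
y = 4.0 * X[:, 2] - 3.0 * X[:, 4] + rng.normal(scale=0.5, size=200)
print(forward_selection(X, y, max_features=2))
```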
21
• Ensemble models: Ensemble methods use multiple learning
algorithms to obtain better predictive performance than could be
obtained from any of the constituent learning algorithms alone.
Image from:
https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions/
Model selection considerations
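The referenced figure shows a stacked ensemble; the NumPy sketch below uses the simpler unweighted average instead, with three polynomial regressions of different degree as the constituent models. The data and the choice of degrees are assumptions.

```python
import numpy as np

rng = np.random.default_rng(10)
x = rng.uniform(-3, 3, size=120)
y = np.sin(x) + 0.3 * x + rng.normal(scale=0.2, size=120)
x_tr, y_tr, x_te, y_te = x[:80], y[:80], x[80:], y[80:]

# Constituent models: polynomial regressions of different degree
degrees = [1, 3, 7]
preds = np.array([np.polyval(np.polyfit(x_tr, y_tr, d), x_te) for d in degrees])

# Ensemble: unweighted average of the constituent predictions
ensemble = preds.mean(axis=0)


def mse(p):
    return np.mean((p - y_te) ** 2)


for d, p in zip(degrees, preds):
    print(f"degree {d}: test MSE = {mse(p):.3f}")
print(f"average ensemble: test MSE = {mse(ensemble):.3f}")
```

Averaging is not guaranteed to beat the best single model, but because squared error is convex, the ensemble's test MSE can never exceed the average of the constituents' test MSEs.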
22
Full pipeline automation
• AutoWEKA is an approach for the simultaneous selection of a machine
learning algorithm and its hyperparameters; combined with
the WEKA package it automatically yields good models for a wide variety
of data sets.
• Auto-sklearn is an extension of AutoWEKA built on the Python library scikit-learn;
it is a drop-in replacement for regular scikit-learn classifiers and
regressors. It improves over AutoWEKA by using meta-learning to
increase search efficiency and post-hoc ensemble building to combine the
models generated during the hyperparameter optimization process.
• TPOT is a data-science assistant which optimizes machine learning
pipelines using genetic programming.
Ref: https://www.ml4aad.org/automl/
Frameworks
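AutoWEKA and Auto-sklearn search over learning algorithms and their hyperparameters jointly. A toy version of that joint search, which exhaustively evaluates every (algorithm, hyperparameter) pair on a validation split, is sketched below in NumPy. The two algorithms, the grids and the data are assumptions, and the real frameworks use model-based (Bayesian) optimization rather than enumerating a fixed list.

```python
import numpy as np


def fit_ridge(X, y, alpha):
    coef = np.linalg.solve(X.T @ X + alpha * np.eye(X.shape[1]), X.T @ y)
    return lambda Q: Q @ coef


def fit_knn(X, y, k):
    def predict(Q):
        d = np.linalg.norm(Q[:, None, :] - X[None, :, :], axis=2)
        return y[np.argsort(d, axis=1)[:, :k]].mean(axis=1)
    return predict


# Joint search space: (algorithm name, fitting function, hyperparameter value)
search_space = (
    [("ridge", fit_ridge, a) for a in [0.1, 1.0, 10.0]]
    + [("knn", fit_knn, k) for k in [1, 5, 15]]
)

rng = np.random.default_rng(11)
X = rng.uniform(-2, 2, size=(300, 2))
y = np.sin(2 * X[:, 0]) * np.cos(X[:, 1]) + rng.normal(scale=0.1, size=300)  # nonlinear target
X_tr, y_tr, X_val, y_val = X[:200], y[:200], X[200:], y[200:]

results = []
for name, fit, h in search_space:
    model = fit(X_tr, y_tr, h)
    loss = np.mean((model(X_val) - y_val) ** 2)  # validation MSE
    results.append((loss, name, h))

best_loss, best_name, best_h = min(results)
print(f"best: {best_name} with hyperparameter {best_h} (validation MSE {best_loss:.3f})")
```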
23
Hyper-parameter optimization and Model Selection
• H2O AutoML provides automated model selection and ensembling
for the H2O machine learning and data analytics platform.
• mlr is an R package that contains several hyperparameter
optimization techniques for machine learning problems.
Ref: https://www.ml4aad.org/automl/
Frameworks
24
Deep Neural Network Architecture search
• Google Cloud AutoML is a cloud-based machine learning service
which so far provides automated generation of computer vision
pipelines.
• Auto-Keras is an open-source Python package for neural architecture
search.
• Ref:
▫ https://www.ml4aad.org/automl/
▫ https://en.wikipedia.org/wiki/Automated_machine_learning
Frameworks
25
Hardware Considerations
26
Hardware Considerations
Reference: https://azure.microsoft.com/en-us/blog/release-models-at-pace-using-microsoft-s-automl/
27
So, which one to choose?
Let’s try some of them
28
www.QuSandbox.com
Model
Analytics
Studio
QuResearchHub
QuSandbox
Prototype, Iterate and tune Standardize workflows
Productionize and share
29
www.analyticscertificate.com/MachineLearning
Use code “Affiliate” for 20% off by Oct 30th
Continue your learning here!
November 7-8, 2018
Sri Krishnamurthy, CFA, CAP
Founder and Chief Data Scientist
sri@quantuniversity.com
srikrishnamurthy
www.QuantUniversity.com
www.analyticscertificate.com
www.qusandbox.com
Information, data and drawings embodied in this presentation are strictly the property of QuantUniversity LLC. and shall not be
distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
30
• Founder of QuantUniversity LLC. and
www.analyticscertificate.com
• Advisory and Consultancy for Financial Analytics
• Prior Experience at MathWorks, Citigroup and
Endeca and 25+ financial services and energy
customers.
• Regular Columnist for the Wilmott Magazine
• Author of the forthcoming book
“Financial Modeling: A case study approach”
to be published by Wiley
• Chartered Financial Analyst and Certified Analytics
Professional
• Teaches Analytics in the Babson College MBA
program and at Northeastern University, Boston
Sri Krishnamurthy
Founder and CEO
31
