
Automatic machine learning (AutoML) 101

"Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. In a typical machine learning application, practitioners must apply the appropriate data pre-processing, feature engineering, feature extraction, and feature selection methods that make the dataset amenable for machine learning. Following those preprocessing steps, practitioners must then perform algorithm selection and hyperparameter optimization to maximize the predictive performance of their final machine learning model. As many of these steps are often beyond the abilities of non-experts, AutoML was proposed as an artificial intelligence-based solution to the ever-growing challenge of applying machine learning. Automating the end-to-end process of applying machine learning offers the advantages of producing simpler solutions, faster creation of those solutions, and models that often outperform models that were designed by hand."

In this talk we will discuss how QuSandbox and the Model Analytics Studio can be used in the selection of machine learning models. We will also illustrate AutoML frameworks through demos and examples and show you how to get started.

  1. AutoML 101. Presented by: Sri Krishnamurthy, CFA, CAP (sri@quantuniversity.com, www.quantuniversity.com). QuantUniversity Meetup, Boston, 10/25/2018. Copyright 2018 QuantUniversity LLC.
  2. About us: • Data Science, Quant Finance and Model Governance Advisory • Technologies using MATLAB, Python and R • Programs ▫ Analytics Certificate Program ▫ Fintech programs • Platform
  3. Upcoming workshop, November 7-8, 2018: www.analyticscertificate.com/MachineLearning. Use code “Affiliate” for 20% off by Oct 30th.
  4. AlphaPilot Drone AI Challenge: • Your challenge is to design an artificial intelligence and machine learning (AI/ML) framework capable of flying a drone through several professional drone racing courses without human intervention or navigational pre-programming.
  5. (No text on this slide.)
  6. Agenda: • Machine Learning • Automatic Machine Learning • Demos
  7. What is Machine Learning and AI? • “AI is the theory and development of computer systems able to perform tasks that traditionally have required human intelligence. • AI is a broad field, of which ‘machine learning’ is a sub-category” Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
  8. The Machine Learning Process: • Data cleansing • Feature Engineering • Training and Testing • Model building • Model selection • Hyper parameter optimization • Model Deployment
  9. Machine Learning: • Supervised Algorithms ▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a given data set such that $y = F(X)$ ▫ If $y$ is numeric => Prediction ▫ If $y$ is categorical => Classification. Diagram: x1, x2, x3, … → Model F(X) → y
  10. Machine Learning: • Unsupervised Algorithms ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering. Diagram: Obs1, Obs2, Obs3, etc. → Model → Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
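
As a rough illustration of the clustering idea on slide 10, the sketch below groups one-dimensional observations into buckets with a bare-bones k-means loop. The slide does not name an algorithm, so k-means is an assumption here, and the function name `kmeans_1d`, the toy data, and the starting centers are all hypothetical.

```python
def kmeans_1d(values, centers, iterations=10):
    """Cluster 1-D values around the given starting centers (k-means sketch)."""
    centers = list(centers)
    labels = [0] * len(values)
    for _ in range(iterations):
        # Assignment step: each observation joins its nearest center.
        labels = [
            min(range(len(centers)), key=lambda c: abs(v - centers[c]))
            for v in values
        ]
        # Update step: each center moves to the mean of its members.
        for c in range(len(centers)):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels, centers


# Toy observations (made up for illustration): two visibly separate groups.
observations = [1.0, 1.2, 0.8, 5.0, 5.3, 4.9]
labels, centers = kmeans_1d(observations, centers=[0.0, 6.0])
print(labels, centers)
```

No target variable is supplied anywhere: the bucket assignments come only from how similar the observations are to each other.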
  11. Supervised learning Algorithms - Prediction: Supervised Learning algorithms • Parametric models • Non-Parametric models
  12. Supervised Learning models - Prediction: • Parametric models ▫ Assume some functional form ▫ Fit coefficients • Examples: Linear Regression, Neural Networks. Figures: Linear Regression Model ($y = \beta_0 + \beta_1 x_1$), Neural network Model.
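
To make "assume some functional form, fit coefficients" concrete, here is a minimal sketch that fits the two coefficients of the linear form $y = \beta_0 + \beta_1 x_1$ by ordinary least squares. The function name and the toy data are assumptions made for illustration; the slides do not prescribe a fitting method.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing squared error for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by variance of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    beta1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    beta0 = mean_y - beta1 * mean_x
    return beta0, beta1


# Toy data generated from y = 2 + 3x (illustrative only).
xs = [0.0, 1.0, 2.0, 3.0]
ys = [2.0, 5.0, 8.0, 11.0]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(beta0, beta1)
```

Once the two coefficients are fitted, the training data can be discarded; the whole model is the pair `(beta0, beta1)`.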
  13. Supervised Learning models: • Non-Parametric models ▫ No functional form assumed • Examples: K-nearest neighbors, Decision Trees. Figures: K-nearest neighbor Model, Decision tree Model.
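
For contrast with the parametric case, the following sketch shows a k-nearest-neighbors classifier, which assumes no functional form and instead keeps the training data around to answer each query. The helper name `knn_classify` and the toy points are hypothetical.

```python
from collections import Counter


def knn_classify(train_points, train_labels, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    # Rank training points by squared Euclidean distance to the query.
    ranked = sorted(
        zip(train_points, train_labels),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy training set (made up): class "A" near (1, 1), class "B" near (5, 5).
points = [(1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (5.0, 5.0), (5.2, 4.8), (4.9, 5.1)]
labels = ["A", "A", "A", "B", "B", "B"]
prediction = knn_classify(points, labels, query=(1.1, 1.0), k=3)
print(prediction)
```

There are no coefficients to fit here; the value of `k` is a choice made before training, which is the kind of setting discussed on the hyperparameter slides.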
  14. (No text on this slide.)
  15. AutoML: • Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems.
  16. Types of frameworks: • Automated Feature Engineering ▫ Feature selection ▫ Feature extraction ▫ Meta learning and transfer learning ▫ Detection and handling of skewed data and/or missing values • Hyper-parameter optimization • Model Selection • Reference: https://en.wikipedia.org/wiki/Automated_machine_learning
  17. Parameters vs Hyper Parameters: • Parameters: Values that can be estimated from data ▫ Examples: – Regression Coefficients – Weights in a Neural Network • Hyperparameters: Values external to the model that cannot be learned from the data ▫ Examples: – Learning rate in a Neural Network – Regularization parameters
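
The distinction can be seen in a small gradient-descent sketch: the weight `w` is a parameter estimated from the data, while `learning_rate` and `epochs` are hyperparameters fixed before training starts. The model form (`y = w * x`), the function name, and the data are assumptions chosen to keep the example short.

```python
def train_weight(xs, ys, learning_rate, epochs=200):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # parameter: learned from the data
    n = len(xs)
    for _ in range(epochs):
        # Gradient of the mean squared error with respect to w.
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * gradient  # learning_rate: hyperparameter, set by us
    return w


# Toy data generated from y = 2x (illustrative only).
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
w_good = train_weight(xs, ys, learning_rate=0.05)
w_slow = train_weight(xs, ys, learning_rate=0.0001)
print(w_good, w_slow)
```

With the same data and the same number of epochs, the larger learning rate reaches the true weight while the smaller one stops well short of it, which is why such settings are tuned rather than left at arbitrary values.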
  18. Hyperparameter optimization: • Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given independent data.[1] • [1] Claesen, Marc; Bart De Moor (2015). "Hyperparameter Search in Machine Learning". • Image from: https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf
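
One simple way to carry out the search described on slide 18 is an exhaustive grid search: train once per hyperparameter tuple and keep the tuple with the lowest loss on held-out data. The sketch below assumes a toy one-weight model, a hypothetical grid, and made-up training and validation sets; grid search is only one of several possible search strategies.

```python
from itertools import product


def train_weight(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    for _ in range(epochs):
        gradient = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= learning_rate * gradient
    return w


def mse(w, xs, ys):
    """Predefined loss: mean squared error of y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data (made up); the validation set plays the role of "independent data".
train_x, train_y = [1.0, 2.0, 3.0], [2.1, 3.9, 6.2]
valid_x, valid_y = [4.0, 5.0], [8.1, 9.8]

# Hypothetical grid of hyperparameter values to try.
grid = {"learning_rate": [0.001, 0.01, 0.1], "epochs": [10, 100]}

results = {}
for lr, epochs in product(grid["learning_rate"], grid["epochs"]):
    w = train_weight(train_x, train_y, lr, epochs)
    results[(lr, epochs)] = mse(w, valid_x, valid_y)

# The winning tuple is the one with the lowest validation loss.
best = min(results, key=results.get)
print(best, results[best])
```

The cost grows with the product of the grid sizes, which is one reason AutoML tools tend to look for more economical search strategies.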
  19. Model selection considerations: • Interpretability: Ability of users to understand the model, the parameters of the model and their effect on the outcome • Example: ▫ In regression, coefficients enable us to interpret the influence of an independent variable on the dependent variable. ▫ The standard errors of the coefficient estimates enable us to determine how confident we are in these estimates.
  20. Model selection considerations: • Parsimonious models: A parsimonious model is a model that accomplishes a desired level of explanation or prediction with as few predictor variables as possible. • Example: ▫ In regression, using Exhaustive search, Forward search, Backward search or Stepwise regression in model selection ▫ Using PCA on the feature space prior to model building
  21. Model selection considerations: • Ensemble models: Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Image from: https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions/
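
A toy majority-vote ensemble illustrates how combining models can beat each constituent. The predictions below are hand-constructed so that each model errs on a different observation; real models rarely have such conveniently uncorrelated errors, so treat this as a sketch of the mechanism rather than a guarantee. The names `majority_vote` and `accuracy` are hypothetical helpers.

```python
from collections import Counter


def majority_vote(predictions_per_model):
    """Combine per-model label lists into one list by majority vote."""
    # zip(*...) regroups the votes so each tuple holds every model's
    # prediction for a single observation.
    return [
        Counter(votes).most_common(1)[0][0]
        for votes in zip(*predictions_per_model)
    ]


def accuracy(predicted, truth):
    """Fraction of observations predicted correctly."""
    return sum(p == t for p, t in zip(predicted, truth)) / len(truth)


truth = [1, 0, 1, 1, 0]
model_a = [1, 0, 1, 0, 0]  # wrong on the 4th observation
model_b = [1, 1, 1, 1, 0]  # wrong on the 2nd observation
model_c = [0, 0, 1, 1, 0]  # wrong on the 1st observation

ensemble = majority_vote([model_a, model_b, model_c])
print(accuracy(model_a, truth), accuracy(ensemble, truth))
```

Each single model scores 4 out of 5, while the vote recovers every label because no two models are wrong on the same observation.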
  22. Frameworks: Full pipeline automation • AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package it automatically yields good models for a wide variety of data sets. • Auto-sklearn extends the AutoWEKA approach using the Python library scikit-learn and is a drop-in replacement for regular scikit-learn classifiers and regressors. It improves over AutoWEKA by using meta-learning to increase search efficiency and post-hoc ensemble building to combine the models generated during the hyperparameter optimization process. • TPOT is a data-science assistant which optimizes machine learning pipelines using genetic programming. Ref: https://www.ml4aad.org/automl/
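
The "simultaneous selection of a machine learning algorithm and its hyperparameters" can be sketched as a single search over (algorithm, hyperparameters) pairs scored on validation data. The code below is not how AutoWEKA, Auto-sklearn, or TPOT work internally; it is a plain exhaustive loop over a small hypothetical search space, with toy models and made-up data, meant only to show the shape of the problem.

```python
def fit_linear(xs, ys, fit_intercept):
    """Least-squares line, with or without an intercept term."""
    if fit_intercept:
        mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
        slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
                 / sum((x - mean_x) ** 2 for x in xs))
        intercept = mean_y - slope * mean_x
    else:
        slope = sum(x * y for x, y in zip(xs, ys)) / sum(x * x for x in xs)
        intercept = 0.0
    return lambda x: intercept + slope * x


def fit_knn(xs, ys, k):
    """k-nearest-neighbors regressor: average the k closest training targets."""
    def predict(x):
        nearest = sorted(zip(xs, ys), key=lambda p: abs(p[0] - x))[:k]
        return sum(y for _, y in nearest) / k
    return predict


def validation_loss(model, xs, ys):
    """Mean squared error of a fitted model on held-out data."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy data from y = 1 + 2x (illustrative only).
train_x = [float(i) for i in range(8)]
train_y = [1.0 + 2.0 * x for x in train_x]
valid_x = [0.5, 3.5, 6.5]
valid_y = [1.0 + 2.0 * x for x in valid_x]

# Hypothetical joint search space: (algorithm name, fit function, hyperparameters).
search_space = [
    ("linear", fit_linear, {"fit_intercept": True}),
    ("linear", fit_linear, {"fit_intercept": False}),
    ("knn", fit_knn, {"k": 1}),
    ("knn", fit_knn, {"k": 3}),
    ("knn", fit_knn, {"k": 5}),
]

best_config, best_loss = None, float("inf")
for name, fit, params in search_space:
    model = fit(train_x, train_y, **params)
    loss = validation_loss(model, valid_x, valid_y)
    if loss < best_loss:
        best_config, best_loss = (name, params), loss

print(best_config, best_loss)
```

Treating the algorithm choice as just another dimension of the search space is what lets one loop handle both model selection and hyperparameter tuning.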
  23. Frameworks: Hyper-parameter optimization and Model Selection • H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analytics platform. • mlr is an R package that contains several hyperparameter optimization techniques for machine learning problems. Ref: https://www.ml4aad.org/automl/
  24. Frameworks: Deep Neural Network Architecture search • Google Cloud AutoML is a cloud-based machine learning service which so far provides the automated generation of computer vision pipelines. • Auto Keras is an open-source Python package for neural architecture search. • Ref: ▫ https://www.ml4aad.org/automl/ ▫ https://en.wikipedia.org/wiki/Automated_machine_learning
  25. Hardware Considerations
  26. Hardware Considerations. Reference: https://azure.microsoft.com/en-us/blog/release-models-at-pace-using-microsoft-s-automl/
  27. So, which one to choose? Let’s try some of them
  28. www.QuSandbox.com • Model Analytics Studio • QuResearchHub • QuSandbox • Prototype, Iterate and tune • Standardize workflows • Productionize and share
  29. Continue your learning here! November 7-8, 2018: www.analyticscertificate.com/MachineLearning. Use code “Affiliate” for 20% off by Oct 30th.
  30. Sri Krishnamurthy, CFA, CAP, Founder and Chief Data Scientist. sri@quantuniversity.com, srikrishnamurthy, www.QuantUniversity.com, www.analyticscertificate.com, www.qusandbox.com. Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC.
  31. Sri Krishnamurthy, Founder and CEO: • Founder of QuantUniversity LLC. and www.analyticscertificate.com • Advisory and Consultancy for Financial Analytics • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers. • Regular Columnist for the Wilmott Magazine • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley • Chartered Financial Analyst and Certified Analytics Professional • Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston
