- 1. AutoML 101 2018 Copyright QuantUniversity LLC. Presented By: Sri Krishnamurthy, CFA, CAP sri@quantuniversity.com www.quantuniversity.com 10/25/2018 QuantUniversity Meetup Boston
- 2. 2 About us: • Data Science, Quant Finance and Model Governance Advisory • Technologies using MATLAB, Python and R • Programs ▫ Analytics Certificate Program ▫ Fintech programs • Platform
- 3. 3 www.analyticscertificate.com/MachineLearning Use code “Affiliate” for 20% off by Oct 30th. Upcoming workshop: November 7-8, 2018
- 4. 4 • Your challenge is to design an artificial intelligence and machine learning (AI/ML) framework capable of flying a drone through several professional drone racing courses without human intervention or navigational pre-programming. AlphaPilot Drone AI Challenge
- 6. 6 • Machine Learning • Automatic Machine Learning • Demos Agenda
- 7. 7 • “AI is the theory and development of computer systems able to perform tasks that traditionally have required human intelligence. • AI is a broad field, of which ‘machine learning’ is a sub-category” What is Machine Learning and AI? Source: http://www.fsb.org/wp-content/uploads/P011117.pdf
- 8. 8 The Machine Learning Process Data cleansing Feature Engineering Training and Testing Model building Model selection Hyper parameter optimization Model Deployment
- 9. 9 • Supervised Algorithms ▫ Given a set of variables $x_i$, predict the value of another variable $y$ in a given data set ▫ If $y$ is numeric => Prediction ▫ If $y$ is categorical => Classification Machine Learning x1, x2, x3… => Model F(X) => y
- 10. 10 • Unsupervised Algorithms ▫ Given a dataset with variables $x_i$, build a model that captures the similarities in different observations and assigns them to different buckets => Clustering Machine Learning Obs1, Obs2, Obs3 etc. => Model => Obs1: Class 1, Obs2: Class 2, Obs3: Class 1
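As an illustration of assigning observations to buckets, here is a minimal k-means sketch in plain Python. The slide does not name a clustering algorithm, so k-means, the toy observations, and the choice of the first two observations as starting centroids are all assumptions made for this example.

```python
import math

def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
        for p in points
    ]

def kmeans(points, centroids, iterations=10):
    """Alternate assignment and centroid updates for a fixed number of rounds."""
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c, old in enumerate(centroids):
            members = [p for p, label in zip(points, labels) if label == c]
            # Keep the old centroid if a bucket ends up empty.
            updated.append(
                tuple(sum(dim) / len(members) for dim in zip(*members))
                if members else old
            )
        centroids = updated
    return assign(points, centroids), centroids

# Hypothetical observations: two visibly separate groups, no labels given.
observations = [(1.0, 1.0), (8.0, 8.0), (1.5, 2.0), (9.0, 8.5), (1.0, 1.5)]
labels, centroids = kmeans(observations, centroids=[observations[0], observations[1]])
print(labels)  # bucket index per observation
```

No target variable is supplied anywhere; the buckets come only from distances between observations.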
- 11. 11 Supervised Learning algorithms Parametric models Non- Parametric models Supervised learning Algorithms - Prediction
- 12. 12 • Parametric models ▫ Assume some functional form ▫ Fit coefficients • Examples: Linear Regression, Neural Networks Supervised Learning models - Prediction $Y = \beta_0 + \beta_1 X_1$ Linear Regression Model Neural network Model
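To make "assume a functional form, then fit coefficients" concrete, the sketch below fits the one-variable linear model Y = beta0 + beta1 * X1 by ordinary least squares. The data points are invented for illustration, and the function name is hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Return (beta0, beta1) minimizing squared error for y = beta0 + beta1 * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx                 # slope
    beta0 = mean_y - beta1 * mean_x   # intercept
    return beta0, beta1

# Made-up data lying roughly on the line y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
beta0, beta1 = fit_simple_linear_regression(xs, ys)
print(f"beta0={beta0:.2f}, beta1={beta1:.2f}")  # beta0=1.09, beta1=1.97
```

The whole fitted model is summarized by two numbers, which is what makes it parametric.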
- 13. 13 • Non-Parametric models ▫ No functional form assumed • Examples : K-nearest neighbors, Decision Trees Supervised Learning models K-nearest neighbor Model Decision tree Model
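For contrast with the parametric case, here is a small k-nearest-neighbors classifier in plain Python. No functional form is fitted; the prediction is a majority vote among the stored training points closest to the query. The training points, the labels "A" and "B", and the default `k=3` are assumptions chosen for this sketch.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Classify `query` by majority vote among its k nearest training points."""
    by_distance = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    top_labels = [label for _, label in by_distance[:k]]
    return Counter(top_labels).most_common(1)[0][0]

# Hypothetical training data: two well-separated groups.
train_X = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (5.0, 5.0), (5.2, 4.9), (4.8, 5.1)]
train_y = ["A", "A", "A", "B", "B", "B"]

print(knn_predict(train_X, train_y, (1.1, 1.0)))  # A
print(knn_predict(train_X, train_y, (5.1, 5.0)))  # B
```

The "model" here is the training set itself, so its size grows with the data instead of being fixed by a formula.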
- 15. 15 • Automated machine learning (AutoML) is the process of automating the end-to-end process of applying machine learning to real-world problems. AutoML
- 16. 16 • Automated Feature Engineering ▫ Feature selection ▫ Feature extraction ▫ Meta learning and transfer learning ▫ Detection and handling of skewed data and/or missing values • Hyper-parameter optimization • Model Selection • Reference: https://en.wikipedia.org/wiki/Automated_machine_learning Types of frameworks
- 17. 17 • Parameters: Values that can be estimated from data ▫ Examples: regression coefficients, weights in a neural network • Hyperparameters: Values that are external to the model and cannot be learnt from the data ▫ Examples: learning rate in a neural network, regularization parameters Parameters vs Hyperparameters
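The distinction can be seen in a few lines of gradient descent. In this sketch, which assumes a one-weight model y = w * x and made-up data, the weight `w` is a parameter estimated from the data, while `learning_rate` and `epochs` are hyperparameters fixed before training starts.

```python
def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0  # parameter: learned from the data
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad  # learning_rate is set by the user, not learned
    return w

xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # generated from y = 2x

w_good = train(xs, ys, learning_rate=0.05, epochs=200)
w_bad = train(xs, ys, learning_rate=0.3, epochs=20)
print(w_good)  # close to 2.0
print(w_bad)   # far from 2.0: this step size makes the updates diverge
```

The same data and model give a usable or useless weight depending only on the hyperparameter choice, which is why tuning it is a separate problem.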
- 18. 18 • Hyperparameter optimization finds a tuple of hyperparameters that yields an optimal model which minimizes a predefined loss function on given independent data.[1] • [1] Claesen, Marc; Bart De Moor (2015). "Hyperparameter Search in Machine Learning". • Image from: https://support.sas.com/resources/papers/proceedings17/SAS0514-2017.pdf Hyperparameter optimization
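One simple way to search for such a tuple is an exhaustive grid search, sketched below. The model (a one-weight regression trained by gradient descent), the candidate grid, and the train/validation split are all assumptions for illustration; the predefined loss is mean squared error on the held-out validation data.

```python
from itertools import product

def train(xs, ys, learning_rate, epochs):
    """Fit y = w * x by gradient descent on mean squared error."""
    w = 0.0
    n = len(xs)
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / n
        w -= learning_rate * grad
    return w

def mse(w, xs, ys):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Made-up data: the model is fitted on the training set and scored on
# independent validation data.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]
valid_x, valid_y = [5.0, 6.0], [10.1, 11.9]

grid = {"learning_rate": [0.001, 0.01, 0.05], "epochs": [5, 50]}

# Try every (learning_rate, epochs) tuple and keep the one with the
# lowest validation loss.
best = min(
    product(grid["learning_rate"], grid["epochs"]),
    key=lambda hp: mse(train(train_x, train_y, *hp), valid_x, valid_y),
)
print(best)  # (0.05, 50)
```

Grid search is only the most basic strategy; random search or Bayesian optimization explore the same kind of space with fewer model fits.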
- 19. 19 • Interpretability: Ability of users to understand the model, the parameters of the model and their effect on the outcome • Example: ▫ In regression, coefficients enable us to interpret the influence of an independent variable on the dependent variable. ▫ The standard errors of the coefficient estimates enable us to determine how confident we are in these estimates Model selection considerations
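The regression example can be worked through numerically. This sketch uses the textbook formulas for simple linear regression with one predictor; the data and the function name are invented for illustration.

```python
import math

def ols_with_standard_errors(xs, ys):
    """Fit y = beta0 + beta1 * x and return each coefficient with its standard error."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    beta1 = sxy / sxx
    beta0 = mean_y - beta1 * mean_x
    # Residual variance with n - 2 degrees of freedom (two fitted coefficients).
    sse = sum((y - (beta0 + beta1 * x)) ** 2 for x, y in zip(xs, ys))
    sigma2 = sse / (n - 2)
    se_beta1 = math.sqrt(sigma2 / sxx)
    se_beta0 = math.sqrt(sigma2 * (1 / n + mean_x ** 2 / sxx))
    return (beta0, se_beta0), (beta1, se_beta1)

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]
(beta0, se_beta0), (beta1, se_beta1) = ols_with_standard_errors(xs, ys)
print(f"slope     = {beta1:.3f} +/- {se_beta1:.3f}")
print(f"intercept = {beta0:.3f} +/- {se_beta0:.3f}")
```

The slope reads as "one unit more of x goes with about beta1 more of y", and a standard error that is small relative to the slope indicates the estimate is well determined by the data.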
- 20. 20 • Parsimonious models: A parsimonious model is a model that accomplishes a desired level of explanation or prediction with as few predictor variables as possible. • Example: ▫ In regression, using Exhaustive search, Forward search, Backward search or Stepwise regression in model selection ▫ Using PCA on the feature space prior to model building Model selection considerations
- 21. 21 • Ensemble models: Ensemble methods use multiple learning algorithms to obtain better predictive performance than could be obtained from any of the constituent learning algorithms alone. Image from: https://blogs.sas.com/content/subconsciousmusings/2017/05/18/stacked-ensemble-models-win-data-science-competitions/ Model selection considerations
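A minimal sketch of the idea, using majority voting instead of the stacking shown in the referenced image. The three "models" are hand-written threshold rules standing in for trained learners, and the data set is contrived so that each rule errs on different samples; both are assumptions made to keep the example short.

```python
from collections import Counter

def make_threshold_rule(feature_index, threshold=0.5):
    """Stand-in for a trained model: predict 1 if one feature exceeds a threshold."""
    return lambda x: int(x[feature_index] > threshold)

def majority_vote(models, x):
    """Combine the models' predictions by taking the most common one."""
    votes = Counter(model(x) for model in models)
    return votes.most_common(1)[0][0]

def accuracy(predict, X, y):
    return sum(predict(x) == label for x, label in zip(X, y)) / len(y)

# Contrived data: every rule is wrong on two samples, but never the same two.
X = [
    (0.9, 0.8, 0.2), (0.1, 0.7, 0.9), (0.8, 0.3, 0.6),
    (0.2, 0.1, 0.7), (0.6, 0.2, 0.3), (0.4, 0.9, 0.1),
]
y = [1, 1, 1, 0, 0, 0]

models = [make_threshold_rule(i) for i in range(3)]
individual = [accuracy(m, X, y) for m in models]
ensemble = accuracy(lambda x: majority_vote(models, x), X, y)
print(individual)  # each rule alone is right on 4 of 6 samples
print(ensemble)    # 1.0
```

The gain comes entirely from the models making different mistakes; three copies of the same rule would vote no better than one.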
- 22. 22 Full pipeline Automation • AutoWEKA is an approach for the simultaneous selection of a machine learning algorithm and its hyperparameters; combined with the WEKA package it automatically yields good models for a wide variety of data sets. • Auto-sklearn is an extension of AutoWEKA built on the Python library scikit-learn, and is a drop-in replacement for regular scikit-learn classifiers and regressors. It improves over AutoWEKA by using meta-learning to increase search efficiency and post-hoc ensemble building to combine the models generated during the hyperparameter optimization process. • TPOT is a data-science assistant which optimizes machine learning pipelines using genetic programming. Ref: https://www.ml4aad.org/automl/ Frameworks
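The "simultaneous selection of an algorithm and its hyperparameters" can be pictured as one search over (algorithm, hyperparameter) pairs scored on held-out data. The sketch below is a toy illustration of that framing only: it does not use or reproduce AutoWEKA, Auto-sklearn, or TPOT, it enumerates the space exhaustively where those tools use smarter search strategies, and the two candidate algorithms, their settings, and the data are all assumptions.

```python
import math
from collections import Counter

def make_knn(k):
    """k-nearest-neighbors classifier factory; k is the hyperparameter."""
    def fit(X, y):
        def predict(q):
            nearest = sorted(zip((math.dist(x, q) for x in X), y))[:k]
            return Counter(label for _, label in nearest).most_common(1)[0][0]
        return predict
    return fit

def make_nearest_centroid(p):
    """Nearest-centroid classifier factory; p is the exponent of the distance."""
    def fit(X, y):
        centroids = {}
        for label in set(y):
            members = [x for x, l in zip(X, y) if l == label]
            centroids[label] = tuple(sum(d) / len(members) for d in zip(*members))
        def predict(q):
            return min(
                centroids,
                key=lambda l: sum(abs(a - b) ** p for a, b in zip(centroids[l], q)),
            )
        return predict
    return fit

# One joint search space over algorithm AND hyperparameter.
search_space = (
    [("knn", {"k": k}, make_knn(k)) for k in (1, 3, 5)]
    + [("nearest_centroid", {"p": p}, make_nearest_centroid(p)) for p in (1, 2)]
)

# Hypothetical data; the training point (2.4, 2.1) is deliberately mislabeled.
train_X = [(1, 1), (1, 2), (2, 1), (2, 2), (2.4, 2.1), (6, 6), (6, 7), (7, 6), (7, 7)]
train_y = [0, 0, 0, 0, 1, 1, 1, 1, 1]
valid_X = [(1.5, 1.5), (2.5, 2.0), (6.5, 6.5), (5.5, 6.0)]
valid_y = [0, 0, 1, 1]

def validation_accuracy(fit):
    predict = fit(train_X, train_y)
    return sum(predict(q) == label for q, label in zip(valid_X, valid_y)) / len(valid_y)

scores = [(validation_accuracy(fit), name, params) for name, params, fit in search_space]
best_acc, best_name, best_params = max(scores, key=lambda s: s[0])
print(best_name, best_params, best_acc)  # knn {'k': 3} 1.0
```

Here k=1 is misled by the mislabeled point, while several other candidates tie at full accuracy and the first of them is reported.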
- 23. 23 Hyper-parameter optimization and Model Selection • H2O AutoML provides automated model selection and ensembling for the H2O machine learning and data analytics platform. • mlr is an R package that contains several hyperparameter optimization techniques for machine learning problems. Ref: https://www.ml4aad.org/automl/ Frameworks
- 24. 24 Deep Neural Network Architecture search • Google Cloud AutoML is a cloud-based machine learning service which so far provides the automated generation of computer vision pipelines. • Auto Keras is an open-source Python package for neural architecture search. • Ref: ▫ https://www.ml4aad.org/automl/ ▫ https://en.wikipedia.org/wiki/Automated_machine_learning Frameworks
- 25. 25 Hardware Considerations
- 26. 26 Hardware Considerations Reference: https://azure.microsoft.com/en-us/blog/release-models-at-pace-using-microsoft-s-automl/
- 27. 27 So, which one to choose? Let’s try some of them
- 28. 28 www.QuSandbox.com Model Analytics Studio QuResearchHub QuSandbox Prototype, Iterate and tune Standardize workflows Productionize and share
- 30. Sri Krishnamurthy, CFA, CAP Founder and Chief Data Scientist sri@quantuniversity.com srikrishnamurthy www.QuantUniversity.com www.analyticscertificate.com www.qusandbox.com Information, data and drawings embodied in this presentation are strictly a property of QuantUniversity LLC. and shall not be distributed or used in any other publication without the prior written consent of QuantUniversity LLC. 30
- 31. • Founder of QuantUniversity LLC. and www.analyticscertificate.com • Advisory and Consultancy for Financial Analytics • Prior Experience at MathWorks, Citigroup and Endeca and 25+ financial services and energy customers. • Regular Columnist for the Wilmott Magazine • Author of forthcoming book “Financial Modeling: A case study approach” published by Wiley • Chartered Financial Analyst and Certified Analytics Professional • Teaches Analytics in the Babson College MBA program and at Northeastern University, Boston Sri Krishnamurthy Founder and CEO 31