Automatic Machine Learning
By: Himadri Mishra, 13074014
Overview: What is Machine Learning?
● Subfield of computer science
● Evolved from the study of pattern recognition and
computational learning theory in artificial intelligence
● Gives computers the ability to learn without being
explicitly programmed
● Explores the study and construction of algorithms that
can learn from and make predictions on data
Basic Flow of Machine Learning
Overview: Why Machine Learning?
● Some tasks are difficult to define algorithmically.
Example: Learning to recognize objects.
● High-value predictions that can guide better decisions
and smart actions in real time without human intervention
● Machine learning is a technology that helps analyze
large volumes of data
What is Automatic Machine Learning?
● Research area that targets progressive automation of
machine learning
● Also known as AutoML
● Focuses on end users without expert knowledge
● Offers new tools to machine learning experts:
○ Perform architecture search over deep representations
○ Analyse the importance of hyperparameters
○ Develop flexible software packages that can be instantiated
automatically in a data-driven way
● Follows the paradigm of Programming by Optimization (PbO)
Examples of AutoML
● AutoWEKA: Approach for the simultaneous selection of a machine learning
algorithm and its hyperparameters
● Deep Neural Networks: notoriously dependent on their hyperparameters, and
modern hyperparameter optimizers have achieved better results in setting them
than humans (Bergstra et al., Snoek et al.).
● Making a science of model search: a complex computer vision architecture
could automatically be instantiated to yield state-of-the-art results on 3
different tasks: face matching, face identification, and object
recognition.
Methods of AutoML
● Bayesian optimization
● Regression models for structured data and big data
● Meta learning
● Transfer learning
● Combinatorial optimization.
An AutoML Framework
Modules of AutoML Framework, unraveled
● Data Pre-Processing
● Problem Identification and Data Splitting
● Feature Engineering
● Feature Stacking
● Application of various models to data
● Decomposition
● Feature Selection
● Model selection and HyperParameter tuning
● Evaluation of Model
Data Pre-Processing
● Tabular data is the most common way of representing data in
machine learning or data mining
● Data must be converted to a tabular form
Problem Identification and Data Splitting
Types of Labels
● Single column, binary values (Binary Classification)
● Single column, real values (Regression problem)
● Multiple columns, binary values (Multi-Class
Classification)
● Multiple columns, real values (Multiple target Regression
problem)
● Multilabel Classification

● Stratified KFold splitting for Classification
● Plain KFold splitting for Regression
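The difference between the two splitting strategies can be sketched in a few lines of plain Python. This is a minimal illustration rather than the framework's actual code: the function names `kfold_indices` and `stratified_kfold_indices` are made up for this example, no shuffling is done, and a real pipeline would more likely rely on a library implementation.

```python
from collections import defaultdict

def kfold_indices(n_samples, n_splits):
    """Plain k-fold: sample i goes to fold i % n_splits, ignoring labels."""
    folds = [[] for _ in range(n_splits)]
    for i in range(n_samples):
        folds[i % n_splits].append(i)
    return folds

def stratified_kfold_indices(labels, n_splits):
    """Stratified k-fold: deal each class's samples round-robin across folds,
    so every fold keeps roughly the same class proportions."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        for position, i in enumerate(indices):
            folds[position % n_splits].append(i)
    return folds

labels = [0] * 8 + [1] * 4          # imbalanced binary labels (toy data)
strat_folds = stratified_kfold_indices(labels, n_splits=4)
# Number of positive samples that ended up in each fold
positives_per_fold = [sum(labels[i] for i in fold) for fold in strat_folds]
```

With stratification, each of the four folds receives one of the four positive samples, which is the property that matters for classification with imbalanced classes.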
Feature Engineering
Types of Variables
● Numerical Variables
○ No Processing Required
● Categorical Variables
○ Label Encoders
○ One Hot Encoders
● Text Variables
○ Count Vectorizer
○ TF-IDF Vectorizer
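As a rough illustration of what the encoders for categorical and text variables produce, here is a hand-rolled sketch in plain Python. The helper names (`label_encode`, `one_hot_encode`, `count_vectorize`) and the toy data are assumptions made for this example; they are not the slides' code and not a library API, and TF-IDF weighting is left out to keep the sketch short.

```python
def label_encode(values):
    """Map each distinct category to an integer code (sorted order)."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """Turn each category into a 0/1 indicator row with one column per category."""
    codes, mapping = label_encode(values)
    width = len(mapping)
    return [[1 if j == code else 0 for j in range(width)] for code in codes]

def count_vectorize(documents):
    """Bag-of-words counts over a lower-cased, whitespace-tokenised vocabulary."""
    vocabulary = sorted({tok for doc in documents for tok in doc.lower().split()})
    index = {tok: j for j, tok in enumerate(vocabulary)}
    rows = []
    for doc in documents:
        row = [0] * len(vocabulary)
        for tok in doc.lower().split():
            row[index[tok]] += 1
        rows.append(row)
    return rows, vocabulary

colours = ["red", "green", "red", "blue"]
codes, mapping = label_encode(colours)
one_hot = one_hot_encode(colours)
counts, vocab = count_vectorize(["good movie", "bad movie movie"])
```

Label encoding yields one integer column, which suits tree-based models, while one-hot encoding yields one column per category and avoids implying an order between categories.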
Feature Stacking
● Two Kinds of Stacking
○ Model Stacking
■ An Ensemble Approach
■ Combines the power of diverse models into a single model
○ Feature Stacking
■ The different features, after processing, are combined
● Our Stacker Module is a feature stacker
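Feature stacking in this sense amounts to joining the processed feature blocks column-wise, one row per sample. The sketch below assumes small dense lists of lists and a hypothetical helper named `stack_features`; with large sparse matrices one would normally use a sparse horizontal stack instead.

```python
def stack_features(*blocks):
    """Concatenate feature blocks column-wise; each block needs one row per sample."""
    n_rows = len(blocks[0])
    if any(len(block) != n_rows for block in blocks):
        raise ValueError("all feature blocks need the same number of rows")
    return [sum((list(block[i]) for block in blocks), []) for i in range(n_rows)]

# Toy feature blocks for two samples (values are made up)
numeric = [[5.1, 3.5], [4.9, 3.0]]        # numerical variables, used as-is
one_hot = [[1, 0], [0, 1]]                # encoded categorical variable
text_counts = [[0, 2, 1], [1, 0, 0]]      # count-vectorized text variable

stacked = stack_features(numeric, one_hot, text_counts)
```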
Application of models and Decomposition
● We should go for Ensemble tree based models:
○ Random Forest Regressor/Classifier
○ Extra Trees Regressor/Classifier
○ Gradient Boosting Machine Regressor/Classifier
● Linear models cannot be applied without normalization
○ For dense features, use Standard Scaler normalization
○ For sparse features, scale to unit variance only, without
centering on the mean
● If the above steps give a “good” model, move on to the
hyperparameter optimization module; otherwise continue
● For high-dimensional data, PCA is used for decomposition
● For images, start with 10-15 components and increase the
number as long as results improve
● For other kinds of data, start with 50-60 components
● For text data, use Singular Value Decomposition after
converting the text to a sparse matrix
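The two normalization variants mentioned above differ only in whether the mean is subtracted. A minimal sketch on single columns, assuming non-constant columns (so the standard deviation is non-zero) and using the population standard deviation; the function names are illustrative only:

```python
from statistics import mean, pstdev

def standard_scale(column):
    """Dense features: subtract the mean, then divide by the standard deviation."""
    mu, sigma = mean(column), pstdev(column)
    return [(x - mu) / sigma for x in column]

def scale_to_unit_variance(column):
    """Sparse features: divide by the standard deviation only, so zeros stay zero."""
    sigma = pstdev(column)
    return [x / sigma for x in column]

dense = [2.0, 4.0, 6.0, 8.0]
sparse = [0.0, 0.0, 0.0, 4.0]           # mostly zeros, as in a sparse matrix

dense_scaled = standard_scale(dense)
sparse_scaled = scale_to_unit_variance(sparse)
```

Skipping the centering step matters for sparse data because subtracting a non-zero mean would turn every zero entry into a non-zero one and destroy the sparsity.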
Feature Selection
● Greedy Forward Selection
○ Selecting best features iteratively
○ Selecting features based on coefficients of model
● Greedy backward elimination
● To evaluate features, use GBM for dense features and
Random Forest for sparse features
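Greedy forward selection can be sketched as a loop that keeps adding whichever feature improves a score the most and stops when nothing helps. In the sketch below the scoring function is a toy stand-in for a cross-validated model score; the feature names, the `USEFULNESS` values, and the penalty are all made up for illustration.

```python
def greedy_forward_selection(candidates, score):
    """Add one feature at a time, keeping the addition that improves the score most."""
    selected, best_score = [], float("-inf")
    remaining = list(candidates)
    while remaining:
        trial_scores = {f: score(selected + [f]) for f in remaining}
        best_feature = max(trial_scores, key=trial_scores.get)
        if trial_scores[best_feature] <= best_score:
            break  # no remaining feature improves the score, so stop
        selected.append(best_feature)
        best_score = trial_scores[best_feature]
        remaining.remove(best_feature)
    return selected, best_score

# Made-up usefulness per feature; each selected feature also costs a small
# complexity penalty, so weak features are not worth adding.
USEFULNESS = {"age": 0.30, "income": 0.25, "zip_code": 0.02, "noise": 0.00}

def toy_score(features):
    return sum(USEFULNESS[f] for f in features) - 0.05 * len(features)

chosen, final_score = greedy_forward_selection(USEFULNESS, toy_score)
```

Greedy backward elimination follows the same pattern in reverse: start from all features and repeatedly drop the one whose removal improves the score the most.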
Model selection and HyperParameter tuning
● Most important and fundamental process of Machine
Learning
Choice of Model and Hyperparameters
● Classification:
○ Random Forest
○ GBM
○ Logistic Regression
○ Naive Bayes
○ Support Vector Machines
○ k-Nearest Neighbors
● Regression:
○ Random Forest
○ GBM
○ Linear Regression
○ Ridge
○ Lasso
○ SVR
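One simple way to search jointly over models and their hyperparameters is random search: sample a model, sample its settings, evaluate, and keep the best. The slides do not say which search strategy the framework uses, so the sketch below is only an illustration; the search space, its value ranges, and `toy_evaluate` (a stand-in for a cross-validated score) are assumptions.

```python
import random

# Hypothetical search space: model name -> {hyperparameter: candidate values}
SEARCH_SPACE = {
    "random_forest": {"n_estimators": [100, 300, 500], "max_depth": [5, 10, 20]},
    "gbm": {"learning_rate": [0.01, 0.05, 0.1], "n_estimators": [100, 500]},
    "logistic_regression": {"C": [0.01, 0.1, 1.0, 10.0]},
}

def sample_configuration(rng):
    """Pick a model at random, then one value for each of its hyperparameters."""
    model = rng.choice(sorted(SEARCH_SPACE))
    params = {name: rng.choice(values) for name, values in SEARCH_SPACE[model].items()}
    return model, params

def random_search(evaluate, n_trials, seed=0):
    """Evaluate n_trials random configurations and return the best one found."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        model, params = sample_configuration(rng)
        result = evaluate(model, params)
        if best is None or result > best[0]:
            best = (result, model, params)
    return best

def toy_evaluate(model, params):
    """Made-up score standing in for a cross-validated metric."""
    base = {"random_forest": 0.80, "gbm": 0.82, "logistic_regression": 0.75}[model]
    return base + 0.0001 * params.get("n_estimators", 0)

best_score, best_model, best_params = random_search(toy_evaluate, n_trials=50)
```

Bayesian optimization, listed earlier among the AutoML methods, replaces the random sampling step with a model that proposes promising configurations based on past evaluations.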
Evaluation of Model
Saving all Transformations on Train Data for reuse
Re-Use of saved transformations for Evaluation on validation set
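The idea behind saving transformations is that every statistic is learned from the training data only and then applied unchanged to the validation set. A minimal sketch with a scaler, assuming the parameters are serialised as JSON (the slides do not specify a storage format) and using hypothetical helper names:

```python
import json
from statistics import mean, pstdev

def fit_scaler(column):
    """Learn scaling statistics from the training data only."""
    return {"mean": mean(column), "std": pstdev(column)}

def apply_scaler(params, column):
    """Apply previously learned statistics to any data set."""
    return [(x - params["mean"]) / params["std"] for x in column]

train = [10.0, 20.0, 30.0]
validation = [20.0, 40.0]

params = fit_scaler(train)
saved = json.dumps(params)              # in practice, this would be written to disk

restored = json.loads(saved)            # later, at evaluation time
validation_scaled = apply_scaler(restored, validation)
```

Refitting the scaler on the validation set instead would leak information about that set into the evaluation and make the two sets inconsistent with each other.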
Current Research
Automatic Architecture selection for Neural Network
Automatically Tuned Neural Network
● Auto-Net is a system that automatically configures neural networks
● Achieved the best performance on two datasets in the human expert track of
the recent ChaLearn AutoML Challenge
● Works by tuning:
○ layer-independent network hyperparameters
○ per-layer hyperparameters
● Auto-Net submission reached an AUC score of 90%, while the best human
competitor (Ideal Intel Analytics) only reached 80%
● First time an automatically-constructed neural network won a competition
dataset
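The split between layer-independent and per-layer hyperparameters can be pictured as a conditional search space: network-wide settings are sampled first, and one block of settings is then sampled for each layer that exists. The sketch below only illustrates that structure; the hyperparameter names and ranges are invented for this example and are not Auto-Net's actual configuration space.

```python
import random

def sample_network_config(rng, max_layers=4):
    """Sample network-wide settings, then one block of settings per layer."""
    config = {
        # Layer-independent (network-wide) hyperparameters; ranges are made up
        "learning_rate": 10 ** rng.uniform(-4, -1),
        "batch_size": rng.choice([32, 64, 128, 256]),
        "num_layers": rng.randint(1, max_layers),
    }
    # Per-layer hyperparameters, created only for the layers that exist
    config["layers"] = [
        {
            "units": rng.choice([64, 128, 256, 512]),
            "activation": rng.choice(["relu", "tanh", "sigmoid"]),
            "dropout": rng.uniform(0.0, 0.5),
        }
        for _ in range(config["num_layers"])
    ]
    return config

config = sample_network_config(random.Random(0))
```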
Conclusion
● Machine learning (ML) has achieved considerable successes
in recent years and an ever-growing number of disciplines
rely on it.
● However, its success crucially relies on human machine
learning experts to perform various tasks manually
● The rapid growth of machine learning applications has
created a demand for off-the-shelf machine learning
methods that can be used easily and without expert
knowledge
● AutoML is an open research topic and is expected to soon
challenge state-of-the-art results in various domains
Thank You
