SlideShare a Scribd company logo
1 of 21
Talk
By
Axel de Romblay
AUTOMATED MACHINE LEARNING
• Why Auto-ML is gaining momentum ?
• Focus on the automation process
• Overview of various solutions
• Conclusion
AUTOMATED MACHINE LEARNING
Data ScientistData Computation means
Data pre-processing Model tuning
Machine Learning
Almost an automated process…
Auto Machine Learning
A fully automated process
Data Computation meansRobot
• Supervised tasks
- classification
- regression
• Structured data
- csv files
- json files
- …
• Unsupervised tasks
- outlier detection
- clustering
- …
• Unstructured data
- images
- texts
- …
What is auto-ML ?
We want to automate…
…the maximum number of steps in a ML pipeline…
…with minimum human intervention…
…while conserving a high performance !
Some companies struggle to deploy ML models
« I need to recruit
more Data
Scientits ! »
« Machine Learning
cannot replace a
human »
« I have a
model ! Now
what ? »
Others don’t…
Identify a generic and
repetitive/reusable
pipeline
Implement or
buy or use
automated tools
Deployment and
monitoring : A&B
testing, …
Data
cleaning
(duplicates, ids,
correlations,
leaks, … )
Data
encoding
(NA, dates, text,
categorical
features, … )
STEP 2 : Preprocessing
STEP 1 : Reading /
merging
STEP 3 : Optimisation
Feature
selection
Feature
engineering
Model
selection
Prediction
Model
interpretation
STEP 4 : Application
Focus on the automation process
Diagram of a standard ML pipeline
Step 1: reading and merging
Maybe the hardest step. Not a priority for auto-ML at the moment.
Some inputs : paths to the data + target name
Difficult to auto-merge different sources
Step 2: preprocessing
Not all packages tackle preprocessing
No inputs : heuristics to detect feature types are easy
Naive encoding for most auto-ML packages
Step 3: optimization
Top priority for auto-ML community
Some inputs : scoring function + a hyper-parameter space
Computation time can be long !
Step 4: application
Prediction is also a priority for auto-ML
No auto-monitoring after model deployment
Fast and accurate step
Maintenance
Ease of setup
Automated
steps
- Reading
- Encoding
- Optimisation
- Encoding
- Optimisation
- Reading/merging
- Cleaning
- Encoding
- Optimisation
- Encoding
- Optimisation - Optimisation
Quality of
automation
Open source ?
Overview of various solutions
MLBox: a fully automated package
STEP 1 : Reading / merging
From a several raw datasets to one structured dataset.
• List of paths to all the datasets
• Target name
• Reading of several files (csv, xls, json and hdf5)
• Auto-merging of all the sources – information crunching
• Task detection (binary/multiclass classification or regression)
• Split between train and test sets
• Basic information display
• Dense structured train and test files (without duplicates)
Features
what is
Automated by MLBox ?
• A dataset
• Auto-cleaning/dropping : duplicates, drifts / covariate shifts (*), high correlations, highly sparse features
/ samples, constants, …
• Feature encoding : missing values, lists, dates, categorical features, text, … - SEVERAL STRATEGIES
• A cleaned dataset with numerical features
STEP 2 : Preprocessing
From a dirty dataset to a cleaned numerical one
Features
what is
Automated by MLBox ?
STEP 3 : Optimisation
A wide range of models are tested and cross-validated
• A metric (a wide choice, otherwise can be implemented) - OPTIONAL
• A hyper-parameter space – OPTIONAL
• The train set
• Feature engineering : using neural networks (*)
• Feature selection : filter methods, wrapper methods, embedded methods
• Model selection : a wide range of accurate models (LightGBM, XGBoost, Random Forest, Linear, …)
• Hyper-parameter optimisation : TPE (Bayesian optimisation method) – dumping of fitted pipelines
• Ensembling : multi-layer stacking, boosting, bagging, …
• The optimal pipeline configuration
Features
what is
Automated by MLBox ?
STEP 4 : Prediction
The best model is fitted and predicts on the test set
• Train and test datasets
• The optimal pipeline configuration - OPTIONAL
• Target prediction : classification and regression - dumping of predictions + optimal fitted pipeline
• Model interpretation : feature importances (saved)
• Leak detection : warning
• The predictions on the test set
Features
what is
Automated by MLBox ?
 Quality: functional code : tested on Kaggle
 Performance: fully distributed and optimised
 AI: dumping and automatic reading of computations
 Updates: latest algorithms
MLBox: about the package
 Compatibility: Python 2.7-3.6, Linux OS
 Quick setup: $ pip install mlbox
 User friendly: tutorials, docs, examples…
Conclusion: the benefits of auto-ML
Democratizes Machine Learning
Machine Learning for everybody (no coding)
Avoids errors
A robot never makes mistakes…
Increases productivity
Repetitive tasks are automated and accelerated ! A Data
Scientist can focus more on non-traditional issues !
Thank you !
Questions ?

More Related Content

What's hot

Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixJustin Basilico
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningSetu Chokshi
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine LearningYuriy Guts
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AINing Jiang
 
Jake Mannix, MLconf 2013
Jake Mannix, MLconf 2013Jake Mannix, MLconf 2013
Jake Mannix, MLconf 2013MLconf
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareJustin Basilico
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLHimadri Mishra
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkPetr Zapletal
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkIvo Andreev
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learningsafa cimenli
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine LearningJoaquin Vanschoren
 
Intro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopIntro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopGrant Ingersoll
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whyKorea Sdec
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableJustin Basilico
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101QuantUniversity
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Nikhil Garg
 
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...Lola Burgueño
 
Pipeline oriented data analytics
Pipeline oriented data analyticsPipeline oriented data analytics
Pipeline oriented data analyticsBorys Biletskyy
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2OSri Ambati
 

What's hot (20)

Lessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at NetflixLessons Learned from Building Machine Learning Software at Netflix
Lessons Learned from Building Machine Learning Software at Netflix
 
Microsoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine LearningMicrosoft Introduction to Automated Machine Learning
Microsoft Introduction to Automated Machine Learning
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
AutoML - The Future of AI
AutoML - The Future of AIAutoML - The Future of AI
AutoML - The Future of AI
 
Jake Mannix, MLconf 2013
Jake Mannix, MLconf 2013Jake Mannix, MLconf 2013
Jake Mannix, MLconf 2013
 
Recommendations for Building Machine Learning Software
Recommendations for Building Machine Learning SoftwareRecommendations for Building Machine Learning Software
Recommendations for Building Machine Learning Software
 
Automatic Machine Learning, AutoML
Automatic Machine Learning, AutoMLAutomatic Machine Learning, AutoML
Automatic Machine Learning, AutoML
 
MLlib and Machine Learning on Spark
MLlib and Machine Learning on SparkMLlib and Machine Learning on Spark
MLlib and Machine Learning on Spark
 
The Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it WorkThe Power of Auto ML and How Does it Work
The Power of Auto ML and How Does it Work
 
Automated Machine Learning
Automated Machine LearningAutomated Machine Learning
Automated Machine Learning
 
Open and Automated Machine Learning
Open and Automated Machine LearningOpen and Automated Machine Learning
Open and Automated Machine Learning
 
Intro to Mahout -- DC Hadoop
Intro to Mahout -- DC HadoopIntro to Mahout -- DC Hadoop
Intro to Mahout -- DC Hadoop
 
SDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the whySDEC2011 Mahout - the what, the how and the why
SDEC2011 Mahout - the what, the how and the why
 
Making Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms ReliableMaking Netflix Machine Learning Algorithms Reliable
Making Netflix Machine Learning Algorithms Reliable
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)Building A Machine Learning Platform At Quora (1)
Building A Machine Learning Platform At Quora (1)
 
OpenML 2019
OpenML 2019OpenML 2019
OpenML 2019
 
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
A Generic Neural Network Architecture to Infer Heterogeneous Model Transforma...
 
Pipeline oriented data analytics
Pipeline oriented data analyticsPipeline oriented data analytics
Pipeline oriented data analytics
 
GLM & GBM in H2O
GLM & GBM in H2OGLM & GBM in H2O
GLM & GBM in H2O
 

Similar to How to automate Machine Learning pipeline ?

Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneIvo Andreev
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineMichael Gerke
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsAnyscale
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureDatabricks
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning ModelsTash Bickley
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesStitch Fix Algorithms
 
AzureML – zero to hero
AzureML – zero to heroAzureML – zero to hero
AzureML – zero to heroGovind Kanshi
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningSergey Karayev
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OSri Ambati
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoMLArpitha Gurumurthy
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-finalsupportlogic
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKJan Wiegelmann
 
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021Sandesh Rao
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment Databricks
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsYalçın Yenigün
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesDataWorks Summit
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTuri, Inc.
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesNick Pentreath
 

Similar to How to automate Machine Learning pipeline ? (20)

Cutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for EveryoneCutting Edge Computer Vision for Everyone
Cutting Edge Computer Vision for Everyone
 
Guiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning PipelineGuiding through a typical Machine Learning Pipeline
Guiding through a typical Machine Learning Pipeline
 
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning ModelsApache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
Apache ® Spark™ MLlib 2.x: How to Productionize your Machine Learning Models
 
Apache Spark MLlib
Apache Spark MLlib Apache Spark MLlib
Apache Spark MLlib
 
Tuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and ArchitectureTuning ML Models: Scaling, Workflows, and Architecture
Tuning ML Models: Scaling, Workflows, and Architecture
 
Productionising Machine Learning Models
Productionising Machine Learning ModelsProductionising Machine Learning Models
Productionising Machine Learning Models
 
AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)AutoML lectures (ACDL 2019)
AutoML lectures (ACDL 2019)
 
When We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML PipelinesWhen We Spark and When We Don’t: Developing Data and ML Pipelines
When We Spark and When We Don’t: Developing Data and ML Pipelines
 
AzureML – zero to hero
AzureML – zero to heroAzureML – zero to hero
AzureML – zero to hero
 
Data Management - Full Stack Deep Learning
Data Management - Full Stack Deep LearningData Management - Full Stack Deep Learning
Data Management - Full Stack Deep Learning
 
Scalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2OScalable Automatic Machine Learning in H2O
Scalable Automatic Machine Learning in H2O
 
Everything you need to know about AutoML
Everything you need to know about AutoMLEverything you need to know about AutoML
Everything you need to know about AutoML
 
Ideas spracklen-final
Ideas spracklen-finalIdeas spracklen-final
Ideas spracklen-final
 
END-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACKEND-TO-END MACHINE LEARNING STACK
END-TO-END MACHINE LEARNING STACK
 
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
AutoML - Heralding a New Era of Machine Learning - CASOUG Oct 2021
 
Apache Spark Model Deployment
Apache Spark Model Deployment Apache Spark Model Deployment
Apache Spark Model Deployment
 
Building High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning ApplicationsBuilding High Available and Scalable Machine Learning Applications
Building High Available and Scalable Machine Learning Applications
 
A machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companiesA machine learning and data science pipeline for real companies
A machine learning and data science pipeline for real companies
 
Towards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning BenchmarkTowards a Comprehensive Machine Learning Benchmark
Towards a Comprehensive Machine Learning Benchmark
 
Open, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI PipelinesOpen, Secure & Transparent AI Pipelines
Open, Secure & Transparent AI Pipelines
 

Recently uploaded

zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzohaibmir069
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxAleenaTreesaSaji
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCEPRINCE C P
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSarthak Sekhar Mondal
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptxanandsmhk
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsAArockiyaNisha
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxSwapnil Therkar
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Lokesh Kothari
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...Sérgio Sacani
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Patrick Diehl
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfWadeK3
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxkessiyaTpeter
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTSérgio Sacani
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhousejana861314
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bSérgio Sacani
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.aasikanpl
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfSwapnil Therkar
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)PraveenaKalaiselvan1
 

Recently uploaded (20)

zoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistanzoogeography of pakistan.pptx fauna of Pakistan
zoogeography of pakistan.pptx fauna of Pakistan
 
GFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptxGFP in rDNA Technology (Biotechnology).pptx
GFP in rDNA Technology (Biotechnology).pptx
 
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCESTERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
STERILITY TESTING OF PHARMACEUTICALS ppt by DR.C.P.PRINCE
 
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatidSpermiogenesis or Spermateleosis or metamorphosis of spermatid
Spermiogenesis or Spermateleosis or metamorphosis of spermatid
 
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Mayapuri Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptxUnlocking  the Potential: Deep dive into ocean of Ceramic Magnets.pptx
Unlocking the Potential: Deep dive into ocean of Ceramic Magnets.pptx
 
Natural Polymer Based Nanomaterials
Natural Polymer Based NanomaterialsNatural Polymer Based Nanomaterials
Natural Polymer Based Nanomaterials
 
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptxAnalytical Profile of Coleus Forskohlii | Forskolin .pptx
Analytical Profile of Coleus Forskohlii | Forskolin .pptx
 
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
Labelling Requirements and Label Claims for Dietary Supplements and Recommend...
 
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
PossibleEoarcheanRecordsoftheGeomagneticFieldPreservedintheIsuaSupracrustalBe...
 
Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?Is RISC-V ready for HPC workload? Maybe?
Is RISC-V ready for HPC workload? Maybe?
 
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdfNAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
NAVSEA PEO USC - Unmanned & Small Combatants 26Oct23.pdf
 
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptxSOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
SOLUBLE PATTERN RECOGNITION RECEPTORS.pptx
 
Disentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOSTDisentangling the origin of chemical differences using GHOST
Disentangling the origin of chemical differences using GHOST
 
The Philosophy of Science
The Philosophy of ScienceThe Philosophy of Science
The Philosophy of Science
 
Orientation, design and principles of polyhouse
Orientation, design and principles of polyhouseOrientation, design and principles of polyhouse
Orientation, design and principles of polyhouse
 
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43bNightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
Nightside clouds and disequilibrium chemistry on the hot Jupiter WASP-43b
 
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
Call Girls in Munirka Delhi 💯Call Us 🔝9953322196🔝 💯Escort.
 
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdfAnalytical Profile of Coleus Forskohlii | Forskolin .pdf
Analytical Profile of Coleus Forskohlii | Forskolin .pdf
 
Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)Recombinant DNA technology (Immunological screening)
Recombinant DNA technology (Immunological screening)
 

How to automate Machine Learning pipeline ?

  • 2. • Why Auto-ML is gaining momentum ? • Focus on the automation process • Overview of various solutions • Conclusion AUTOMATED MACHINE LEARNING
  • 3. Data ScientistData Computation means Data pre-processing Model tuning Machine Learning Almost an automated process…
  • 4. Auto Machine Learning A fully automated process Data Computation meansRobot • Supervised tasks - classification - regression • Structured data - csv files - json files - … • Unsupervised tasks - outlier detection - clustering - … • Unstructured data - images - texts - …
  • 5. What is auto-ML ? We want to automate… …the maximum number of steps in a ML pipeline… …with minimum human intervention… …while conserving a high performance !
  • 6. Some companies struggle to deploy ML models « I need to recruit more Data Scientits ! » « Machine Learning cannot replace a human » « I have a model ! Now what ? »
  • 7. Others don’t… Identify a generic and repetitive/reusable pipeline Implement or buy or use automated tools Deployment and monitoring : A&B testing, …
  • 8. Data cleaning (duplicates, ids, correlations, leaks, … ) Data encoding (NA, dates, text, categorical features, … ) STEP 2 : Preprocessing STEP 1 : Reading / merging STEP 3 : Optimisation Feature selection Feature engineering Model selection Prediction Model interpretation STEP 4 : Application Focus on the automation process Diagram of a standard ML pipeline
  • 9. Step 1: reading and merging Maybe the hardest step. Not a priority for auto-ML at the moment. Some inputs : paths to the data + target name Difficult to auto-merge different sources
  • 10. Step 2: preprocessing Not all packages tackle preprocessing No inputs : heuristics to detect feature types are easy Naive encoding for most auto-ML packages
  • 11. Step 3: optimization Top priority for auto-ML community Some inputs : scoring function + a hyper-parameter space Computation time can be long !
  • 12. Step 4: application Prediction is also a priority for auto-ML No auto-monitoring after model deployment Fast and accurate step
  • 13. Maintenance Ease of setup Automated steps - Reading - Encoding - Optimisation - Encoding - Optimisation - Reading/merging - Cleaning - Encoding - Optimisation - Encoding - Optimisation - Optimisation Quality of automation Open source ? Overview of various solutions
  • 14. MLBox: a fully automated package
  • 15. STEP 1 : Reading / merging From a several raw datasets to one structured dataset. • List of paths to all the datasets • Target name • Reading of several files (csv, xls, json and hdf5) • Auto-merging of all the sources – information crunching • Task detection (binary/multiclass classification or regression) • Split between train and test sets • Basic information display • Dense structured train and test files (without duplicates) Features what is Automated by MLBox ?
  • 16. • A dataset • Auto-cleaning/dropping : duplicates, drifts / covariate shifts (*), high correlations, highly sparse features / samples, constants, … • Feature encoding : missing values, lists, dates, categorical features, text, … - SEVERAL STRATEGIES • A cleaned dataset with numerical features STEP 2 : Preprocessing From a dirty dataset to a cleaned numerical one Features what is Automated by MLBox ?
  • 17. STEP 3 : Optimisation A wide range of models are tested and cross-validated • A metric (a wide choice, otherwise can be implemented) - OPTIONAL • A hyper-parameter space – OPTIONAL • The train set • Feature engineering : using neural networks (*) • Feature selection : filter methods, wrapper methods, embedded methods • Model selection : a wide range of accurate models (LightGBM, XGBoost, Random Forest, Linear, …) • Hyper-parameter optimisation : TPE (Bayesian optimisation method) – dumping of fitted pipelines • Ensembling : multi-layer stacking, boosting, bagging, … • The optimal pipeline configuration Features what is Automated by MLBox ?
  • 18. STEP 4 : Prediction The best model is fitted and predicts on the test set • Train and test datasets • The optimal pipeline configuration - OPTIONAL • Target prediction : classification and regression - dumping of predictions + optimal fitted pipeline • Model interpretation : feature importances (saved) • Leak detection : warning • The predictions on the test set Features what is Automated by MLBox ?
  • 19.  Quality: functional code : tested on Kaggle  Performance: fully distributed and optimised  AI: dumping and automatic reading of computations  Updates: latest algorithms MLBox: about the package  Compatibility: Python 2.7-3.6, Linux OS  Quick setup: $ pip install mlbox  User friendly: tutorials, docs, examples…
  • 20. Conclusion: the benefits of auto-ML Democratizes Machine Learning Machine Learning for everybody (no coding) Avoids errors A robot never makes mistakes… Increases productivity Repetitive tasks are automated and accelerated ! A Data Scientist can focus more on non-traditional issues !

Editor's Notes

  1. 1min
  2. 1min
  3. 2min Data preprocessing and model tuning are both repetitive tasks that take a lot of time… A Data Scientist is expensive !
  4. 2min So why don’t we replace the DS by a robot ??? We would save time and money ! Let’s see what can be automated !
  5. 1min Performance = computation time + accuracy
  6. 2min 2012-2017 : Machine Learning (understand the benefit of Machine Learning | identify use cases) 2018- ? : Auto ML (model deployment issue !)
  7. 2min Companies will benefit from Auto-ml to deploy ML models ! Solution : identify supervised tasks that can be automated with a reusable/generic pipeline ! Other tasks are specifics (so difficult to be automated with a single tool/method: DS will focus on these tasks !). Gap between IT companies that have already automated pipelines and other companies that are late !
  8. 2min 90% of machine learning tasks follow this pipeline
  9. 1min Reading and creating the datasets has always been manual ! Here is the key point for automation !!!
  10. 1min Preprocessing issues : drifts (ids), leaks, duplicates ?? Heuristics : str -> categorical features, long str -> text, … Naive encoding : no NN !
  11. 1min First step automated : no needs to automate more !! (some solution pretend to test « thousands » of models but it is useless !) Optional inputs Optimization algo : a lot of ways to tackle the high dimension issue Meta learning !
  12. 1min NO monitoring ! Feature interpretation : ok
  13. 3min MLBox tackles merging !!
  14. 1min Available on PyPI Github with tutos, examples Docs with articles, kaggle kernels, … Performance : tested on Kaggle ! Features : drifts, embeddings, stacking, leak, feature importances,…
  15. 2min
  16. 2min
  17. 2min
  18. 2min
  19. 2min
  20. 2min Democratize ML !! Google CEO Sundar Pichai said that, “We hope AutoML will take an ability that a few Ph.D.s have today and will make it possible in three to five years for hundreds of thousands of developers to design new neural nets for their particular needs.”
  21. 10min THANKS ! Q&A ?