4. Auto Machine Learning
A fully automated process
Data Computation meansRobot
• Supervised tasks
- classification
- regression
• Structured data
- csv files
- json files
- …
• Unsupervised tasks
- outlier detection
- clustering
- …
• Unstructured data
- images
- texts
- …
5. What is auto-ML ?
We want to automate…
…the maximum number of steps in a ML pipeline…
…with minimum human intervention…
…while conserving a high performance !
6. Data
cleaning
(duplicates, ids,
correlations,
leaks, … )
Data
encoding
(NA, dates, text,
categorical
features, … )
STEP 2 : Preprocessing
STEP 1 : Reading /
merging
STEP 3 : Optimisation
Feature
selection
Feature
engineering
Model
selection
Prediction
Model
interpretation
STEP 4 : Application
Focus on the automation process
Diagram of a standard ML pipeline
7.
8. Quality: functional code : tested on Kaggle
Performance: fully distributed and optimised
AI: dumping and automatic reading of computations
Updates: latest algorithms
MLBox: a fully automated python package
Compatibility: Python 2.7-3.6, Linux OS
Quick setup: $ pip install mlbox
User friendly: tutorials, docs, examples…
2min
Data preprocessing and model tuning are both repetitive tasks that take a lot of time…
A Data Scientist is expensive !
2min
So why don’t we replace the DS by a robot ??? We would save time and money !
Let’s see what can be automated !
1min
Performance = computation time + accuracy
2min
90% of machine learning tasks follow this pipeline
1min
Available on PyPI
Github with tutos, examples
Docs with articles, kaggle kernels, …
Performance : tested on Kaggle !
Features : drifts, embeddings, stacking, leak, feature importances,…