Hack Session
By
Axel de Romblay
AUTOMATED MACHINE LEARNING
• Introduction on Auto-ML
• MLBox : a powerful Auto-ML python package
• Hack session on a dataset
AUTOMATED MACHINE LEARNING
Data ScientistData Computation means
Data pre-processing Model tuning
Machine Learning
Almost an automated process…
Auto Machine Learning
A fully automated process
Data Computation meansRobot
• Supervised tasks
- classification
- regression
• Structured data
- csv files
- json files
- …
• Unsupervised tasks
- outlier detection
- clustering
- …
• Unstructured data
- images
- texts
- …
What is auto-ML ?
We want to automate…
…the maximum number of steps in a ML pipeline…
…with minimum human intervention…
…while conserving a high performance !
Data
cleaning
(duplicates, ids,
correlations,
leaks, … )
Data
encoding
(NA, dates, text,
categorical
features, … )
STEP 2 : Preprocessing
STEP 1 : Reading /
merging
STEP 3 : Optimisation
Feature
selection
Feature
engineering
Model
selection
Prediction
Model
interpretation
STEP 4 : Application
Focus on the automation process
Diagram of a standard ML pipeline
 Quality: functional code : tested on Kaggle
 Performance: fully distributed and optimised
 AI: dumping and automatic reading of computations
 Updates: latest algorithms
MLBox: a fully automated python package
 Compatibility: Python 2.7-3.6, Linux OS
 Quick setup: $ pip install mlbox
 User friendly: tutorials, docs, examples…
Hack Session
https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries
Manual kernel : https://www.kaggle.com/sudalairajkumar/xgb-starter-in-python/
Auto kernel : https://www.kaggle.com/axelderomblay/mlbox-a-fully-automated-package/
Thank you !
Questions ?

Automate Machine Learning Pipeline Using MLBox

  • 1.
    Hack Session By Axel deRomblay AUTOMATED MACHINE LEARNING
  • 2.
    • Introduction onAuto-ML • MLBox : a powerful Auto-ML python package • Hack session on a dataset AUTOMATED MACHINE LEARNING
  • 3.
    Data ScientistData Computationmeans Data pre-processing Model tuning Machine Learning Almost an automated process…
  • 4.
    Auto Machine Learning Afully automated process Data Computation meansRobot • Supervised tasks - classification - regression • Structured data - csv files - json files - … • Unsupervised tasks - outlier detection - clustering - … • Unstructured data - images - texts - …
  • 5.
    What is auto-ML? We want to automate… …the maximum number of steps in a ML pipeline… …with minimum human intervention… …while conserving a high performance !
  • 6.
    Data cleaning (duplicates, ids, correlations, leaks, …) Data encoding (NA, dates, text, categorical features, … ) STEP 2 : Preprocessing STEP 1 : Reading / merging STEP 3 : Optimisation Feature selection Feature engineering Model selection Prediction Model interpretation STEP 4 : Application Focus on the automation process Diagram of a standard ML pipeline
  • 8.
     Quality: functionalcode : tested on Kaggle  Performance: fully distributed and optimised  AI: dumping and automatic reading of computations  Updates: latest algorithms MLBox: a fully automated python package  Compatibility: Python 2.7-3.6, Linux OS  Quick setup: $ pip install mlbox  User friendly: tutorials, docs, examples…
  • 9.
    Hack Session https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries Manual kernel: https://www.kaggle.com/sudalairajkumar/xgb-starter-in-python/ Auto kernel : https://www.kaggle.com/axelderomblay/mlbox-a-fully-automated-package/
  • 10.

Editor's Notes

  • #2 1min
  • #3 1min
  • #4 2min Data preprocessing and model tuning are both repetitive tasks that take a lot of time… A Data Scientist is expensive !
  • #5 2min So why don’t we replace the DS by a robot ??? We would save time and money ! Let’s see what can be automated !
  • #6 1min Performance = computation time + accuracy
  • #7 2min 90% of machine learning tasks follow this pipeline
  • #8 1min Available on PyPI Github with tutos, examples Docs with articles, kaggle kernels, … Performance : tested on Kaggle ! Features : drifts, embeddings, stacking, leak, feature importances,…
  • #9 2min
  • #10 40min
  • #11 10min THANKS ! Q&A ?