Automate Machine Learning Pipeline Using MLBox

Axel de Romblay

Speaker at DataHack Summit 2017, India's largest conference on Artificial Intelligence & Machine Learning

AUTOMATED MACHINE LEARNING
Hack Session by Axel de Romblay

• Introduction to Auto-ML
• MLBox: a powerful Auto-ML Python package
• Hack session on a dataset

Machine Learning
Almost an automated process…
• Data Scientist, Data, Computation means
• Data pre-processing, Model tuning

Auto Machine Learning
A fully automated process
Data, Computation means, Robot
• Supervised tasks
- classification
- regression
• Structured data
- csv files
- json files
- …
• Unsupervised tasks
- outlier detection
- clustering
- …
• Unstructured data
- images
- texts
- …

What is Auto-ML?
We want to automate…
…the maximum number of steps in an ML pipeline…
…with minimum human intervention…
…while maintaining high performance!

Diagram of a standard ML pipeline
Focus on the automation process
• STEP 1: Reading / merging
• STEP 2: Preprocessing
• STEP 3: Optimisation
• STEP 4: Application
Pipeline blocks shown in the diagram:
- Data cleaning (duplicates, ids, correlations, leaks, …)
- Data encoding (NA, dates, text, categorical features, …)
- Feature selection
- Feature engineering
- Model selection
- Prediction
- Model interpretation
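
The four steps of the pipeline can be illustrated with a toy, standard-library-only sketch. Everything in it is an assumption made for illustration: the inline dataset, the column names (id, rooms, city, price_high), and the one-rule "model" are hypothetical and are not taken from the talk or from MLBox. The optimisation step scores candidates on the training rows only, which a real pipeline would replace with cross-validation.

```python
import csv
import io
from statistics import mean

# Hypothetical toy dataset: one duplicated row (id 4) and one missing value.
RAW = """id,rooms,city,price_high
1,1,paris,0
2,3,paris,1
3,,lyon,0
4,4,lyon,1
4,4,lyon,1
5,2,paris,0
6,5,lyon,1
"""

def read(text):
    """STEP 1: read rows from CSV text."""
    return list(csv.DictReader(io.StringIO(text)))

def clean(rows, id_col="id"):
    """STEP 2 (cleaning): drop exact duplicate rows, then drop the id column."""
    seen, out = set(), []
    for row in rows:
        key = tuple(sorted(row.items()))
        if key not in seen:
            seen.add(key)
            out.append({k: v for k, v in row.items() if k != id_col})
    return out

def encode(rows, target="price_high"):
    """STEP 2 (encoding): fill missing 'rooms' with the mean, label-encode 'city'."""
    fill = mean(float(r["rooms"]) for r in rows if r["rooms"])
    cities = sorted({r["city"] for r in rows})
    X = [[float(r["rooms"]) if r["rooms"] else fill,
          float(cities.index(r["city"]))] for r in rows]
    y = [int(r[target]) for r in rows]
    return X, y

def accuracy(threshold, X, y):
    """Score a one-rule model that predicts 1 when rooms > threshold."""
    return mean(int((x[0] > threshold) == bool(t)) for x, t in zip(X, y))

def optimise(X, y, candidates):
    """STEP 3: keep the candidate threshold with the best (training) accuracy."""
    return max(candidates, key=lambda c: accuracy(c, X, y))

def predict(threshold, X):
    """STEP 4: apply the selected model."""
    return [int(x[0] > threshold) for x in X]

rows = clean(read(RAW))
X, y = encode(rows)
best = optimise(X, y, candidates=[1.5, 2.5, 3.5])
preds = predict(best, X)
```

Each function maps to one box of the diagram, so swapping in a real reader, encoder, or model only changes the body of the corresponding step.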

MLBox: a fully automated Python package
• Quality: functional code, tested on Kaggle
• Performance: fully distributed and optimised
• AI: dumping and automatic reading of computations
• Updates: latest algorithms
• Compatibility: Python 2.7-3.6, Linux OS
• Quick setup: $ pip install mlbox
• User friendly: tutorials, docs, examples…
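
As a rough sketch of what a run looks like after installation, the snippet below follows the read / drift-filter / optimise / predict pattern shown in MLBox's public README. It is an assumption that this pattern matches the version used in the talk; class names, parameters, and search-space keys may differ between releases. The helper name run_mlbox, the search space values, and the file paths in the usage note are hypothetical.

```python
# Hypothetical search space, written in the key format used in MLBox's README:
# "ne" = NA encoder, "ce" = categorical encoder, "fs" = feature selector,
# "est" = estimator. The values are illustrative, not tuned.
SEARCH_SPACE = {
    "ne__numerical_strategy": {"search": "choice", "space": [0, "mean"]},
    "ce__strategy": {"search": "choice",
                     "space": ["label_encoding", "random_projection"]},
    "fs__threshold": {"search": "uniform", "space": [0.01, 0.3]},
    "est__max_depth": {"search": "choice", "space": [3, 5, 7]},
}

def run_mlbox(paths, target_name, max_evals=15):
    """Run the four pipeline steps with MLBox (assumed README-style API)."""
    # Imported lazily so this module loads even when MLBox is not installed.
    from mlbox.preprocessing import Reader, Drift_thresholder
    from mlbox.optimisation import Optimiser
    from mlbox.prediction import Predictor

    # Read and merge the train/test files, splitting off the target column.
    data = Reader(sep=",").train_test_split(paths, target_name)
    # Drop variables whose distribution drifts between train and test.
    data = Drift_thresholder().fit_transform(data)
    # Search the space with cross-validation (default scoring is assumed).
    best_params = Optimiser(n_folds=3).optimise(SEARCH_SPACE, data, max_evals)
    # Refit with the best parameters and write predictions to disk.
    Predictor().fit_predict(best_params, data)
    return best_params
```

A call would look like run_mlbox(["train.csv", "test.csv"], "target"), where both file names and the target column are placeholders for your own data.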

Hack Session
https://www.kaggle.com/c/two-sigma-connect-rental-listing-inquiries
Manual kernel : https://www.kaggle.com/sudalairajkumar/xgb-starter-in-python/
Auto kernel : https://www.kaggle.com/axelderomblay/mlbox-a-fully-automated-package/

Thank you! Questions?

Editor's Notes

1min
1min
2min Data preprocessing and model tuning are both repetitive, time-consuming tasks… and a Data Scientist is expensive!
2min So why not replace the Data Scientist with a robot? We would save time and money! Let's see what can be automated!
1min Performance = computation time + accuracy
2min 90% of machine learning tasks follow this pipeline
1min Available on PyPI. GitHub with tutorials and examples. Docs with articles, Kaggle kernels, … Performance: tested on Kaggle! Features: drifts, embeddings, stacking, leak, feature importances, …
2min
40min
10min Thanks! Q&A?
