Automatic Machine Learning using Python & scikit-learn

Talk at PyData Paris

Published in: Data & Analytics

  2. https://competitions.codalab.org/competitions/2321
     Image source: http://www.causality.inf.ethz.ch/AutoML/spiral.png
  4. AutoCompete: A Framework for Machine Learning Competitions, A. Thakur and A. Krohn-Grimberghe, ICML AutoML Workshop, 2015
  5. ● Numerical Data:
       ○ Do nothing
  6. ● Numerical Data:
       ○ Do nothing
     ● Categorical Data:
       ○ Label encoding
       ○ One-hot encoding
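
The two categorical encodings named on this slide can be sketched with scikit-learn as follows. This is a minimal illustration, not the code used in the talk: the `colors` column is made-up data, and applying `LabelEncoder` to a feature column (it is documented for target labels) is an assumption made for brevity.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical categorical column (illustrative data, not from the talk).
colors = np.array(["red", "green", "blue", "green"])

# Label encoding: each category becomes an integer.
# Classes are sorted, so blue=0, green=1, red=2.
label_encoder = LabelEncoder()
labels = label_encoder.fit_transform(colors)

# One-hot encoding: each category becomes its own 0/1 column.
# The encoder expects a 2-D input and returns a sparse matrix by default.
onehot_encoder = OneHotEncoder(handle_unknown="ignore")
onehot = onehot_encoder.fit_transform(colors.reshape(-1, 1)).toarray()

print(labels)
print(onehot)
```

Label encoding keeps a single column but imposes an arbitrary order on the categories, which tree-based models usually tolerate; one-hot encoding avoids that order at the cost of one column per category.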
  9. ● Numerical Data:
       ○ Do nothing
     ● Text Data:
       ○ Counts
       ○ TF-IDF
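
A rough sketch of the two text representations listed here, using scikit-learn's `CountVectorizer` and `TfidfVectorizer` with default settings. The three documents are invented for illustration, and the talk does not say which vectorizer settings were used.

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer

# Hypothetical toy corpus (not from the talk).
docs = ["the cat sat", "the cat sat on the mat", "dogs chase cats"]

# Counts: one column per vocabulary term, values are raw term counts.
count_vec = CountVectorizer()
counts = count_vec.fit_transform(docs)

# TF-IDF: counts reweighted by inverse document frequency,
# with each row L2-normalized by default.
tfidf_vec = TfidfVectorizer()
tfidf = tfidf_vec.fit_transform(docs)

# Row norms of the TF-IDF matrix, to show the default normalization.
row_norms = np.linalg.norm(tfidf.toarray(), axis=1)

print(counts.shape, tfidf.shape)
print(row_norms)
```

Both vectorizers return sparse matrices, which most scikit-learn linear models and Naive Bayes classifiers accept directly.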
  11. ● Multiple ways of feature selection
      ● Random forest based feature importances
      ● Feature importances from GBM
      ● Chi2 feature selection
      ● Greedy feature selection
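
The feature selection options on this slide might look like the sketch below in scikit-learn. The iris dataset, the choice of `k=2`, and the helper `greedy_forward_selection` are all assumptions for illustration; the talk does not specify how greedy selection was implemented, so the forward-selection loop here is one plausible reading.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

X, y = load_iris(return_X_y=True)

# Tree-based importances: one score per feature, normalized to sum to 1.
rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
gbm = GradientBoostingClassifier(random_state=0).fit(X, y)
rf_importances = rf.feature_importances_
gbm_importances = gbm.feature_importances_

# Chi2 selection: keep the k features with the highest chi-squared
# statistic against the target (requires non-negative features).
selector = SelectKBest(chi2, k=2)
X_chi2 = selector.fit_transform(X, y)


def greedy_forward_selection(X, y, n_features):
    """Hypothetical helper: add one feature at a time, keeping the
    candidate that gives the best mean cross-validation accuracy."""
    selected = []
    remaining = list(range(X.shape[1]))
    while len(selected) < n_features:
        scores = {}
        for col in remaining:
            model = LogisticRegression(max_iter=1000)
            cols = selected + [col]
            scores[col] = cross_val_score(model, X[:, cols], y, cv=3).mean()
        best = max(scores, key=scores.get)
        selected.append(best)
        remaining.remove(best)
    return selected


greedy_features = greedy_forward_selection(X, y, n_features=2)
print(rf_importances, gbm_importances, X_chi2.shape, greedy_features)
```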
  16. ● Grid Search
      ● Random Search
  17. ● Classification:
        ○ Random Forest
        ○ GBM
        ○ Logistic Regression
        ○ Naive Bayes
        ○ Support Vector Machines
        ○ k-Nearest Neighbors
      ● Grid Search
      ● Random Search
  18. ● Classification:
        ○ Random Forest
        ○ GBM
        ○ Logistic Regression
        ○ Naive Bayes
        ○ Support Vector Machines
        ○ k-Nearest Neighbors
      ● Regression
        ○ Random Forest
        ○ GBM
        ○ Linear Regression
        ○ Ridge
        ○ Lasso
        ○ SVR
      ● Grid Search
      ● Random Search
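
Grid search and random search over one of the listed classifiers could be set up roughly as follows. The dataset, the parameter values, and the choice of random forest are assumptions for illustration only; the slides do not give the search spaces used.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = load_iris(return_X_y=True)

# Grid search: evaluate every combination in the grid (2 x 3 = 6 candidates).
param_grid = {"n_estimators": [10, 50], "max_depth": [2, 4, None]}
grid = GridSearchCV(RandomForestClassifier(random_state=0), param_grid, cv=3)
grid.fit(X, y)

# Random search: sample a fixed number of candidates (n_iter) from a
# larger space instead of enumerating all 4 x 5 = 20 combinations.
param_distributions = {
    "n_estimators": [10, 25, 50, 100],
    "max_depth": [2, 3, 4, 5, None],
}
rand = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions,
    n_iter=5,
    cv=3,
    random_state=0,
)
rand.fit(X, y)

print(grid.best_params_, grid.best_score_)
print(rand.best_params_, rand.best_score_)
```

Random search trades exhaustiveness for a fixed evaluation budget, which tends to matter once several hyperparameters are tuned at the same time.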
  19. To appear: AutoCompete 2.0: A Framework for Optimizing Parameters of Neural Networks, A. Thakur, ICML AutoML Workshop, System Desc Track, 2016
  21. Results on Newsgroups-20 dataset
  22. AutoML Final1 Results
  23. AutoML Final4 Results
  24. AutoML GPU Track Results
  25. ● @abhi1thakur
      ● bit.ly/thakurabhishek
      ● kaggle.com/abhishek
