AIMeetup #3: Machine learning - rocket science or our daily bread?

At AIMeetup #3 in Kraków, organized by 2040.io, Vladimir Alekseichenko talked about how machine learning already has a significant impact on our lives and how it will affect the lives of our children and grandchildren even more.

Published in: Technology

  1. Machine learning: "rocket science" or our daily bread? Vladimir Alekseichenko
  2. Changes over time
  3. 10 minutes for one; 36,500,000 minutes ≈ 70 years
  4. Driver vs. mechanic
  5. dataworkshop.eu
  6. Bike Sharing Demand. Task: Kaggle; solution: github.com/dataworkshop
  7. Machine learning pipeline
      • Understand Business & Data: read and explore the data
      • Feature Engineering: create new features based on existing ones
      • Feature Selection: select only the useful features
      • Model Selection: find the best model(s) among the candidates (model A, model B, model C, model D, model E)
      • Tuning Hyperparameters: find the best hyperparameters for a given model
      • Ensemble Modeling: combine several models into one better model (0.6 × model B + 0.4 × model E)
      Example data before feature engineering:
        datetime             season  temp   count
        2011-01-01 08:32:02  1        9.23      5
        2012-04-02 12:10:00  2       18.78     32
        2012-08-07 15:47:01  3       15.45     15
      After feature engineering (count_log is the natural log of count); this table is fed to model B and model E:
        datetime             season  temp   hour  day  month  …  count  count_log
        2011-01-01 08:32:02  1        9.23     8    1      1  …      5      1.609
        2012-04-02 12:10:00  2       18.78    12    2      4  …     32      3.466
        2012-08-07 15:47:01  3       15.45    15    7      8  …     15      2.708
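The step from the raw table to the engineered table on slide 7 can be sketched in pandas. This is a minimal sketch, assuming only the raw columns shown on the slide; the helper name `add_features` is hypothetical, and using the natural log for `count_log` is inferred from the slide's values (ln 5 ≈ 1.609) rather than stated there.

```python
import numpy as np
import pandas as pd

# Raw rows copied from the slide (Kaggle "Bike Sharing Demand" columns)
raw = pd.DataFrame({
    "datetime": ["2011-01-01 08:32:02", "2012-04-02 12:10:00", "2012-08-07 15:47:01"],
    "season": [1, 2, 3],
    "temp": [9.23, 18.78, 15.45],
    "count": [5, 32, 15],
})

def add_features(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: derive hour/day/month from the timestamp and a log-scaled target."""
    out = df.copy()
    ts = pd.to_datetime(out["datetime"])
    out["hour"] = ts.dt.hour
    out["day"] = ts.dt.day
    out["month"] = ts.dt.month
    # Natural log reproduces the slide's count_log column (ln 5 ≈ 1.609)
    out["count_log"] = np.log(out["count"])
    return out

features = add_features(raw)
print(features[["hour", "day", "month", "count", "count_log"]].round(3))
```

A log-scaled target is a common choice when counts are skewed; predictions made on `count_log` have to be converted back with `np.exp` before scoring on the original scale.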
  8. Pipeline overview (same diagram as slide 7)
  9. Understand the business and the data
  10. Working days
  11. Weekend
  12. Pipeline overview (same diagram as slide 7)
  13. Feature engineering
      • quantitative => bins: 1 to 10, 11 to 20…
      • dates => day, month, year, hour, is it a weekend…
      • categorical/qualitative (red, green, white):
          • assign a numeric identifier (1, 2, 3)
          • create n binary columns (is it red? etc.)
          • probabilities with respect to the target variable (target encoding)
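The encodings listed on slide 13 can be sketched with pandas. The `colour`/`target`/`amount` data below is a made-up toy example; the bin edges follow the slide's "1 to 10, 11 to 20".

```python
import pandas as pd

# Made-up toy data: a colour feature, a numeric feature and a binary target
df = pd.DataFrame({
    "colour": ["red", "green", "white", "red", "green", "red"],
    "amount": [3, 14, 7, 19, 11, 2],
    "target": [1, 0, 0, 1, 1, 0],
})

# Quantitative -> bins: (0, 10] is 1-10 and (10, 20] is 11-20 for integers
df["amount_bin"] = pd.cut(df["amount"], bins=[0, 10, 20], labels=["1-10", "11-20"])

# Categorical option 1: a numeric identifier per category
df["colour_id"] = df["colour"].map({"red": 1, "green": 2, "white": 3})

# Categorical option 2: one binary column per category ("is it red?" etc.)
one_hot = pd.get_dummies(df["colour"], prefix="is", dtype=int)
df = pd.concat([df, one_hot], axis=1)

# Categorical option 3: target (mean) encoding, the share of target == 1 per category
df["colour_target_mean"] = df.groupby("colour")["target"].transform("mean")

print(df)
```

Computing the target encoding on the same rows the model trains on leaks the target into the feature; in practice it is usually computed out-of-fold.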
  14. Pipeline overview (same diagram as slide 7)
  15. Feature selection
      • The fewer, the better (simpler model)
      • Keep the most valuable ones (ideally just one :)
      • Features are (usually) interdependent, so be careful… (check empirically)
      • Faster
  16. Variance • Univariate • Recursive
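The three approaches on slide 16 correspond to scikit-learn's `VarianceThreshold`, `SelectKBest` (univariate scoring, here with `f_regression`) and `RFE` (recursive feature elimination). A minimal sketch on synthetic data where we know which features matter; the data and the choice of `LinearRegression` inside `RFE` are our assumptions.

```python
import numpy as np
from sklearn.feature_selection import RFE, SelectKBest, VarianceThreshold, f_regression
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 200
# Synthetic data: x0 and x1 drive the target, x2 is pure noise, x3 is constant
X = np.column_stack([
    rng.normal(size=n),
    rng.normal(size=n),
    rng.normal(size=n),
    np.full(n, 5.0),
])
y = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(scale=0.1, size=n)

# Variance: drop features whose values never change (zero variance by default)
variance = VarianceThreshold().fit(X)
X_var = variance.transform(X)  # columns 0, 1, 2 remain

# Univariate: score each remaining feature on its own against the target
univariate = SelectKBest(score_func=f_regression, k=2).fit(X_var, y)

# Recursive: fit a model, drop the feature with the smallest |coefficient|, repeat
recursive = RFE(LinearRegression(), n_features_to_select=2).fit(X_var, y)

print("variance keeps:", variance.get_support(indices=True))
print("univariate keeps:", univariate.get_support(indices=True))
print("recursive keeps:", recursive.get_support(indices=True))
```

Univariate scores ignore interactions between features, which is one reason slide 15 recommends checking the selection empirically.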
  17. xgbfir https://github.com/limexp/xgbfir
  18. Pipeline overview (same diagram as slide 7)
  19. Model selection
      • Linear
      • Decision Tree
      • Random Forest
      • Gradient Boosting
      • Neural Network
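One way to choose among the model families on slide 19 is to score each of them with the same cross-validation. A minimal scikit-learn sketch on synthetic data from `make_regression`; the hyperparameters are arbitrary, untuned choices.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

models = {
    "Linear": LinearRegression(),
    "Decision Tree": DecisionTreeRegressor(max_depth=5, random_state=0),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "Gradient Boosting": GradientBoostingRegressor(random_state=0),
    "Neural Network": MLPRegressor(hidden_layer_sizes=(32,), max_iter=1000, random_state=0),
}

scores = {}
for name, model in models.items():
    # Mean R^2 over 5 folds; higher is better
    scores[name] = cross_val_score(model, X, y, cv=5, scoring="r2").mean()
    print(f"{name:18s} R^2 = {scores[name]:.3f}")

best = max(scores, key=scores.get)
print("best:", best)
```

Because this synthetic target is linear in the features, the linear model is expected to do well here; on real data such as bike-sharing demand the ranking can be quite different.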
  20. Linear https://github.com/dataworkshop/model_evaluation/blob/master/step1-regression.ipynb
  21. Decision Tree http://xgboost.readthedocs.io/en/latest/model.html
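A regression tree grows by repeatedly choosing the split that makes the two child nodes as homogeneous as possible. A minimal sketch of that search for a single feature, using the summed squared error of the children as the impurity (the default criterion of scikit-learn's `DecisionTreeRegressor`); `best_split` is a hypothetical helper and the data is a toy example.

```python
import numpy as np

def best_split(x, y):
    """Return (threshold, sse) for the split of one feature that minimises
    the total squared error of the left and right child nodes."""
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    best_threshold, best_sse = None, np.inf
    for i in range(1, len(xs)):
        if xs[i] == xs[i - 1]:
            continue  # equal values cannot be separated by a threshold
        left, right = ys[:i], ys[i:]
        sse = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if sse < best_sse:
            # Place the threshold halfway between the two neighbouring values
            best_threshold, best_sse = (xs[i - 1] + xs[i]) / 2, sse
    return best_threshold, best_sse

# Toy data with an obvious jump between x = 3 and x = 10
x = np.array([1.0, 2.0, 3.0, 10.0, 11.0, 12.0])
y = np.array([5.0, 5.0, 5.0, 20.0, 20.0, 20.0])
threshold, sse = best_split(x, y)
print(f"threshold={threshold}, sse={sse}")  # threshold=6.5, sse=0.0
```

A full tree applies this search to every feature, keeps the best split, and recurses into each child until a stopping rule such as a maximum depth is reached.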
  22. Ensemble trees http://xgboost.readthedocs.io/en/latest/model.html
  23. Ensemble trees
      • Bagging (bootstrap aggregation)
      • Random Forest
      • Extra Trees
      • Boosting
      • Gradient Boosting
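Boosting builds trees one after another, each correcting the errors of the ensemble so far. For squared error the negative gradient is simply the residual, so a minimal gradient-boosting sketch only needs a regression tree fitted to residuals. This is a teaching sketch built on scikit-learn's `DecisionTreeRegressor` with made-up data and hypothetical helper names; it is not how XGBoost is implemented (XGBoost also uses a regularized objective and second-order gradient information).

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def fit_boosting(X, y, n_rounds=100, learning_rate=0.1, max_depth=2):
    """Gradient boosting for squared error: each tree fits the current residuals."""
    base = float(np.mean(y))              # start from a constant prediction
    pred = np.full(len(y), base)
    trees = []
    for _ in range(n_rounds):
        residuals = y - pred              # negative gradient of 0.5 * (y - pred)**2
        tree = DecisionTreeRegressor(max_depth=max_depth, random_state=0).fit(X, residuals)
        pred += learning_rate * tree.predict(X)  # small step towards the residuals
        trees.append(tree)
    return base, learning_rate, trees

def predict_boosting(model, X):
    base, learning_rate, trees = model
    pred = np.full(X.shape[0], base)
    for tree in trees:
        pred += learning_rate * tree.predict(X)
    return pred

rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(300, 1))
y = np.sin(X[:, 0]) + rng.normal(scale=0.1, size=300)
model = fit_boosting(X, y)
mse = np.mean((predict_boosting(model, X) - y) ** 2)
print(f"training MSE: {mse:.4f} (variance of y: {np.var(y):.4f})")
```

Bagging, by contrast, trains its trees independently on bootstrap samples and averages them, which is what `RandomForestRegressor` does (with extra feature subsampling).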
  24. XGBoost (Extreme Gradient Boosting): "When in doubt, use xgboost." (Owen Zhang)
  25. Model selection
  26. Pipeline overview (same diagram as slide 7)
  27. Tuning hyperparameters
      • Grid Search
      • Random Search
      • Bayesian optimization
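Grid search and random search are available in scikit-learn as `GridSearchCV` and `RandomizedSearchCV`. A minimal sketch tuning two random-forest hyperparameters on synthetic data; the parameter ranges and the budget of six candidates are arbitrary. Bayesian optimization (for example with hyperopt) is not shown.

```python
from scipy.stats import randint
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=5, noise=5.0, random_state=0)
forest = RandomForestRegressor(n_estimators=50, random_state=0)

# Grid search: evaluates every combination (3 x 2 = 6 candidates)
grid = GridSearchCV(
    forest,
    param_grid={"max_depth": [2, 4, 8], "min_samples_leaf": [1, 5]},
    cv=3,
).fit(X, y)

# Random search: samples a fixed budget of candidates from distributions
random_search = RandomizedSearchCV(
    forest,
    param_distributions={"max_depth": randint(2, 11), "min_samples_leaf": randint(1, 11)},
    n_iter=6,
    cv=3,
    random_state=0,
).fit(X, y)

print("grid:  ", grid.best_params_, round(grid.best_score_, 3))
print("random:", random_search.best_params_, round(random_search.best_score_, 3))
```

With the same budget, random search can cover more distinct values of each hyperparameter than a grid, which helps when only a few hyperparameters really matter.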
  28. hyperopt
  29. Pipeline overview (same diagram as slide 7)
  30. Ensemble modeling
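The ensemble in the pipeline overview is a weighted average, 0.6 × model B + 0.4 × model E. A minimal sketch of such a blend on synthetic data; the slides do not say which models B and E are, so ridge regression and a random forest are our arbitrary stand-ins.

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=400, n_features=6, noise=15.0, random_state=1)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=1)

# Arbitrary stand-ins for "model B" and "model E"
model_b = Ridge(alpha=1.0).fit(X_train, y_train)
model_e = RandomForestRegressor(n_estimators=200, random_state=1).fit(X_train, y_train)

pred_b = model_b.predict(X_test)
pred_e = model_e.predict(X_test)
blend = 0.6 * pred_b + 0.4 * pred_e  # weights taken from the slide

errors = {
    "model B": mean_squared_error(y_test, pred_b),
    "model E": mean_squared_error(y_test, pred_e),
    "blend": mean_squared_error(y_test, blend),
}
for name, err in errors.items():
    print(f"{name:8s} MSE = {err:.1f}")
```

Because squared error is convex, the blend's MSE is never above the weighted average of the two models' MSEs. In practice the weights should be chosen on validation or out-of-fold predictions, not on the test set.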
  31. Neuron
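An artificial neuron computes a weighted sum of its inputs plus a bias and passes the result through an activation function. A minimal sketch in plain Python; the sigmoid activation is our choice, and other activations such as ReLU or tanh are just as common.

```python
import math

def neuron(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through a sigmoid activation."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# z = 1.0 * 0.5 + 2.0 * (-0.25) + 0.0 = 0, and sigmoid(0) = 0.5
print(neuron([1.0, 2.0], [0.5, -0.25], 0.0))  # 0.5
```

A neural network stacks layers of such units; training adjusts the weights and biases so that the network's outputs match the targets.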
  32. (Artificial) Neural Network
  33. MNIST
  34. Data
  35. Neural Network Error: 1.60%
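Training a small neural network on handwritten digits takes only a few lines with scikit-learn. MNIST itself is not bundled with scikit-learn, so this minimal sketch uses `load_digits` (1,797 images of 8×8 pixels) as a small stand-in; the network size is an arbitrary choice, and the resulting error rate is not comparable to the 1.60% on slide 35.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_digits(return_X_y=True)  # each row is 64 pixel intensities
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Scale the pixels, then train one hidden layer of 64 units
clf = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0),
)
clf.fit(X_train, y_train)

error = 1.0 - clf.score(X_test, y_test)
print(f"test error: {error:.2%}")
```

The error is measured on held-out images; measuring it on the training images would hide overfitting.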
  36. http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html
  37. Source
  38. Challenges
  39. Overfitting http://mlwiki.org/index.php/Overfitting
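Overfitting is easy to reproduce: fit polynomials of increasing degree to a few noisy points and compare the error on the training points with the error on fresh points. A minimal NumPy sketch on synthetic data (a noisy sine curve); the degrees 1, 3 and 12 are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(42)
x_train = np.sort(rng.uniform(0, 1, 15))
y_train = np.sin(2 * np.pi * x_train) + rng.normal(scale=0.2, size=15)
x_test = np.sort(rng.uniform(0, 1, 200))
y_test = np.sin(2 * np.pi * x_test) + rng.normal(scale=0.2, size=200)

def mse(coeffs, x, y):
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 12):
    # np.polyfit may warn that the degree-12 fit is poorly conditioned
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (mse(coeffs, x_train, y_train), mse(coeffs, x_test, y_test))
    print(f"degree {degree:2d}: train MSE {results[degree][0]:.3f}, test MSE {results[degree][1]:.3f}")
```

Training error keeps falling as the degree grows, while the degree-12 polynomial typically does much worse on new points than on the ones it memorized.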
  40. Cross-validation http://blog.goldenhelix.com/bchristensen/cross-validation-for-genomic-prediction-in-svs/
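The mechanics of k-fold cross-validation fit in a few lines of NumPy: shuffle the indices, split them into k folds, and let each fold serve once as the validation set. A minimal sketch with hypothetical helper names and synthetic data; scikit-learn's `KFold` and `cross_val_score` do the same job in practice.

```python
import numpy as np

def k_fold_indices(n_samples, k, seed=0):
    """Shuffle the sample indices and split them into k nearly equal folds."""
    rng = np.random.default_rng(seed)
    return np.array_split(rng.permutation(n_samples), k)

def cross_validate(fit, score, X, y, k=5):
    """Train on k-1 folds, score on the held-out fold; every fold is held out once."""
    folds = k_fold_indices(len(y), k)
    scores = []
    for i, val_idx in enumerate(folds):
        train_idx = np.concatenate([fold for j, fold in enumerate(folds) if j != i])
        model = fit(X[train_idx], y[train_idx])
        scores.append(score(model, X[val_idx], y[val_idx]))
    return np.array(scores)

def fit_line(X, y):
    # Least-squares fit of y = a * x + b
    A = np.column_stack([X, np.ones(len(X))])
    return np.linalg.lstsq(A, y, rcond=None)[0]

def mse_line(coef, X, y):
    A = np.column_stack([X, np.ones(len(X))])
    return float(np.mean((A @ coef - y) ** 2))

rng = np.random.default_rng(1)
X = rng.uniform(0, 10, size=100)
y = 2.0 * X + 1.0 + rng.normal(scale=0.5, size=100)
cv_scores = cross_validate(fit_line, mse_line, X, y, k=5)
print("MSE per fold:", cv_scores.round(3), "mean:", round(float(cv_scores.mean()), 3))
```

The spread of the per-fold scores is informative too: a large spread suggests the estimate depends heavily on which samples end up in which fold.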
  41. Creativity is worth a lot
  42. https://techcrunch.com/2016/11/19/how-data-science-and-rocket-science-will-get-humans-to-mars
  43. Source. The wave is already coming… are you ready?
  44. Thank you! @slon1024 hello@vova.me dataworkshop.eu
