Uvrgrp ml


Published in: Technology, Education

  1. David Callender
     • Finished in the top 2% (18th out of >1300) in a 3-year, $3 million machine learning competition.
     • Studied disease propagation in an urban setting using probabilistic graphical models at Dartmouth College.
     • Studied computational protein design at the University of Washington.
     • Studied the mathematical foundations of quantum mechanics at Macalester College.
  2. Machine Learning in R, circa 2013. David Callender
  3. a.k.a. Using R on Kaggle
     • Who will end up in the hospital
     • Drug effectiveness
     • Computer security: determining employee access needs
     • What the salary will be for a given job advertisement
  4. Not Just Kaggle
     • Movie recommendations
     • Popular productions
     • Product recommendations
     • Good business opportunities
     • The entire Internet
     • Probably a lot more too
  5. Talk Outline
     • Motivation
     • Concepts
     • Algorithms
       • Decision trees and forests
       • Neural networks
     • Kaggle
     • Interactive session with R packages
       • randomForest
       • gbm
       • neuralnet
  6. Supervised Learning: predicting survival for passengers of the Titanic

     Train the model with examples where you know the value of "Survived":

     Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
     0         3       male    22    1      0      7.25     S
     1         1       female  38    1      0      71.2833  C
     1         3       female  26    0      0      7.925    S
     1         1       female  35    1      0      53.1     S
     0         3       male    35    0      0      8.05     S
     0         3       male    33    0      0      8.4583   Q
     0         1       male    54    0      0      51.8625  S
     0         3       male    2     3      1      21.075   S
     1         3       female  27    0      2      11.1333  S
     1         2       female  14    1      0      30.0708  C

     Use the model to predict the value of "Survived":

     Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
     ?         3       male    34.5  0      0      7.8292   Q
     ?         3       female  47    1      0      7        S
     ?         2       male    62    0      0      9.6875   Q
     ?         3       male    27    0      0      8.6625   S
     ?         3       female  22    1      1      12.2875  S
     ?         3       male    14    0      0      9.225    S
     ?         3       female  30    0      0      7.6292   Q
     ?         2       male    26    1      1      29       S
     ?         3       female  18    0      0      7.2292   C
     ?         3       male    21    2      0      24.15    S

     Column types annotated on the slide: binary, numeric, categorical.
  7. Overfitting
     Figure: http://en.wikipedia.org/wiki/File:Overfitting_on_Training_Set_Data.pdf | Tomaso Poggio
  8. Decision Trees
     Figure: http://en.wikipedia.org/wiki/File:CART_tree_titanic_survivors.png | Stephen Milborrow | Made using R

     Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
     ?         3       male    34.5  0      0      7.8292   Q
     ?         3       female  47    1      0      7        S
     ?         2       male    62    0      0      9.6875   Q
     ?         3       male    27    0      0      8.7      S
     ?         3       female  22    1      1      12.2875  S
     ?         3       male    14    0      0      9.225    S
     ?         3       female  30    0      0      7.6292   Q
     ?         2       male    26    1      1      29       S
     ?         3       female  18    0      0      7.2292   C
     ?         3       male    21    2      0      24.15    S
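As a minimal sketch of how a decision tree chooses its first question, the Python below scores a few candidate yes/no splits by Gini impurity on the ten labelled rows from the Supervised Learning slide. The talk itself uses R; Python is used here only for illustration. The candidate tests (including the age cut-off of 30) and all helper names are assumptions, not taken from the talk or from the figure.

```python
# Ten labelled Titanic rows from the Supervised Learning slide:
# (survived, pclass, sex, age)
ROWS = [
    (0, 3, "male", 22.0), (1, 1, "female", 38.0), (1, 3, "female", 26.0),
    (1, 1, "female", 35.0), (0, 3, "male", 35.0), (0, 3, "male", 33.0),
    (0, 1, "male", 54.0), (0, 3, "male", 2.0), (1, 3, "female", 27.0),
    (1, 2, "female", 14.0),
]

def gini(labels):
    """Gini impurity of a list of 0/1 labels (0.0 means perfectly pure)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def split_impurity(rows, test):
    """Size-weighted Gini impurity of the two groups produced by `test`."""
    yes = [r[0] for r in rows if test(r)]
    no = [r[0] for r in rows if not test(r)]
    return (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)

# Hypothetical candidate questions; a real tree learner searches many more.
CANDIDATES = {
    "sex == male": lambda r: r[2] == "male",
    "pclass == 3": lambda r: r[1] == 3,
    "age > 30": lambda r: r[3] > 30,
}

# The root of the tree is the question with the lowest impurity.
best = min(CANDIDATES, key=lambda name: split_impurity(ROWS, CANDIDATES[name]))
print(best, split_impurity(ROWS, CANDIDATES[best]))
```

On these ten rows the split on sex separates survivors from non-survivors completely, so it is chosen with impurity 0. A full tree would repeat the same search inside each resulting group.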
  9. Random Forest (RF)
     • Training: bagging and random sub-spaces
     • Prediction: voting/averaging

     The diagram shows the training table twice:

     Survived  Pclass  Sex     Age   SibSp  Parch  Fare     Embarked
     0         3       male    22    1      0      7.25     S
     1         1       female  38    1      0      71.2833  C
     1         3       female  26    0      0      7.925    S
     1         1       female  35    1      0      53.1     S
     0         3       male    35    0      0      8.05     S
     0         3       male    33    0      0      8.4583   Q
     0         1       male    54    0      0      51.8625  S
     0         3       male    2     3      1      21.075   S
     1         3       female  27    0      2      11.1333  S
     1         2       female  14    1      0      30.0708  C
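The three labels on the Random Forest slide (bagging, random sub-spaces, voting) can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the randomForest package's algorithm: each "tree" is only a one-question stump, the feature subset is drawn once per tree, and the three yes/no tests, the `adult` cut-off of 18 and every function name are hypothetical.

```python
import random
from collections import Counter

# Ten labelled Titanic rows from the slides: (survived, pclass, sex, age)
ROWS = [
    (0, 3, "male", 22.0), (1, 1, "female", 38.0), (1, 3, "female", 26.0),
    (1, 1, "female", 35.0), (0, 3, "male", 35.0), (0, 3, "male", 33.0),
    (0, 1, "male", 54.0), (0, 3, "male", 2.0), (1, 3, "female", 27.0),
    (1, 2, "female", 14.0),
]

# Hypothetical yes/no features a stump may ask about.
TESTS = {
    "male": lambda r: r[2] == "male",
    "third_class": lambda r: r[1] == 3,
    "adult": lambda r: r[3] >= 18,
}

def majority(labels, default=0):
    """Most common label, or `default` when the list is empty."""
    return Counter(labels).most_common(1)[0][0] if labels else default

def gini(labels):
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)

def fit_stump(rows, names):
    """Pick the lowest-impurity test among `names`; store a label per branch."""
    def impurity(name):
        yes = [r[0] for r in rows if TESTS[name](r)]
        no = [r[0] for r in rows if not TESTS[name](r)]
        return len(yes) * gini(yes) + len(no) * gini(no)
    name = min(names, key=impurity)
    overall = majority([r[0] for r in rows])
    yes = majority([r[0] for r in rows if TESTS[name](r)], overall)
    no = majority([r[0] for r in rows if not TESTS[name](r)], overall)
    return name, yes, no

def fit_forest(rows, n_trees=25, n_features=2, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        sample = [rng.choice(rows) for _ in rows]       # bagging: resample rows
        names = rng.sample(sorted(TESTS), n_features)   # random sub-space of features
        forest.append(fit_stump(sample, names))
    return forest

def predict(forest, row):
    """Majority vote over all stumps."""
    votes = [yes if TESTS[name](row) else no for name, yes, no in forest]
    return majority(votes)

forest = fit_forest(ROWS)
# First unlabelled passenger from the slides; the label slot is None.
print(predict(forest, (None, 3, "male", 34.5)))
```

Because every stump sees a different resample of rows and a different subset of features, the individual stumps disagree, and the vote averages those disagreements out.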
  10. AdaBoost & Gradient Boosting
     • Initialize a set of weights, one for each training example, all with equal value
     • Train a tree with the weighted training examples
     • Add the tree to the set of trees
     • Make predictions with the set of trees
     • Adjust the weights so that the training examples you got wrong have more weight
     • Repeat
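The loop on the boosting slide can be sketched as discrete AdaBoost with one-threshold stumps standing in for trees. This is a hedged illustration only: the one-dimensional toy data set is invented for the example and is not from the talk, gradient boosting (as in the gbm package) is not shown, and all names are assumptions.

```python
import math

# Toy 1-D data, invented for illustration: no single threshold separates it.
XS = list(range(10))
YS = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]

def stump_predict(threshold, sign, x):
    """Predict `sign` below the threshold and `-sign` at or above it."""
    return sign if x < threshold else -sign

def best_stump(weights):
    """Return the stump with the lowest weighted error, plus that error."""
    candidates = [(t - 0.5, s) for t in range(11) for s in (1, -1)]
    def weighted_error(c):
        return sum(w for w, x, y in zip(weights, XS, YS)
                   if stump_predict(c[0], c[1], x) != y)
    best = min(candidates, key=weighted_error)
    return best, weighted_error(best)

def adaboost(rounds):
    weights = [1 / len(XS)] * len(XS)            # equal initial weights
    ensemble = []
    for _ in range(rounds):
        (threshold, sign), err = best_stump(weights)   # train on weighted examples
        err = min(max(err, 1e-10), 1 - 1e-10)          # guard against log(0)
        alpha = 0.5 * math.log((1 - err) / err)        # vote strength of this stump
        ensemble.append((alpha, threshold, sign))      # add it to the set
        # Misclassified examples are scaled up, correct ones scaled down.
        weights = [w * math.exp(-alpha * y * stump_predict(threshold, sign, x))
                   for w, x, y in zip(weights, XS, YS)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Sign of the alpha-weighted vote of all stumps."""
    score = sum(a * stump_predict(t, s, x) for a, t, s in ensemble)
    return 1 if score >= 0 else -1

ensemble = adaboost(3)
print([predict(ensemble, x) for x in XS])
```

On this toy set the first stump alone gets three of the ten points wrong, while the weighted vote of three stumps classifies all ten training points correctly.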
  11. Logistic Regression, a.k.a. the Perceptron
     Diagram labels: weighted sum, activation function
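The two labels on this slide, a weighted sum followed by an activation function, can be sketched as a single unit in Python. The input values, weights and bias below are hypothetical numbers chosen for illustration. The sketch shows two common activations: the logistic (sigmoid) function used in logistic regression, and the hard step used in the classic perceptron.

```python
import math

def weighted_sum(inputs, weights, bias):
    """z = w1*x1 + w2*x2 + ... + bias"""
    return sum(w * x for w, x in zip(weights, inputs)) + bias

def sigmoid(z):
    """Logistic activation: squashes z into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def step(z):
    """Hard-threshold activation: outputs 1 when z >= 0, else 0."""
    return 1 if z >= 0 else 0

def unit(inputs, weights, bias, activation):
    """One unit: weighted sum of the inputs, passed through an activation."""
    return activation(weighted_sum(inputs, weights, bias))

# Hypothetical weights and inputs, chosen so the weighted sum is exactly 0.
weights, bias = [2.0, -1.0], -2.0
print(unit([1.0, 0.0], weights, bias, sigmoid))  # 0.5
print(unit([1.0, 0.0], weights, bias, step))     # 1
```

With the sigmoid the output can be read as a probability; with the step function the same unit makes a hard yes/no decision.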
  12. Multilayer Feed-forward Neural Network
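A multilayer feed-forward network stacks several of the units from the previous slide: each layer's outputs become the next layer's inputs. The Python sketch below shows only the forward pass, with no training. The two-layer network and its hand-picked weights (wired to compute XOR) are assumptions made for illustration, not something from the talk or from the neuralnet package.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One layer: every unit takes a weighted sum of all inputs, then a sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def forward(inputs, network):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in network:
        activations = layer(activations, weights, biases)
    return activations

# Hand-picked, hypothetical weights: hidden units act like OR and NAND,
# and the output unit acts like AND, which together give XOR.
XOR_NET = [
    ([[20.0, 20.0], [-20.0, -20.0]], [-10.0, 30.0]),  # hidden layer, 2 units
    ([[20.0, 20.0]], [-30.0]),                        # output layer, 1 unit
]

for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(forward([a, b], XOR_NET)[0]))
```

A single unit cannot represent XOR, because no one weighted sum separates the two classes; the hidden layer is what makes it possible here.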
  13. R’s Popularity
     Tools mentioned in Kaggle user profiles. From a blog entry by Ben Hamner: http://blog.kaggle.com/2011/11/27/kagglers-favorite-tools/
  14. Summary of Recent Competition Winners

     Position           Algorithm   Other Algs.         Tools
     Adzuna Salary 1st  NN*         -                   Python GPU
     Adzuna Salary 2nd  NN          -                   C++
     Adzuna Salary 3rd  NN          NB, SVM, LR         Python
     Merck 1st          NN*         -                   Python GPU
     Merck 2nd          GBM & SVM   RF, PCA, KNN, SVM   R & Python
     Merck 3rd          RF & SVM    GBM, NN             R
  15. Learning More
     • Pedro Domingos at the University of Washington
     • www.coursera.org/course/machlearning
     • www.coursera.org/uw
     • "A Few Useful Things to Know about Machine Learning", Communications of the ACM
     • homes.cs.washington.edu/~pedrod
     • blog.kaggle.com
     • ufldl.stanford.edu/wiki/
