10 R Packages to Win Kaggle Competitions

10 R Packages to Win Kaggle Competitions by Xavier Conort

10 R Packages to Win Kaggle Competitions Presentation Transcript

  • 1. 10 R Packages to Win Kaggle Competitions. Xavier Conort, Data Scientist
  • 2. Previously... … now!
  • 3. Competitions that boosted my R learning curve: The Machine seems much smarter than I am at capturing complexity in the data, even for simple datasets! Humans can help the Machine too! But don’t oversimplify, and don’t discard any data. Don’t be impatient: my best GBM had 24,500 trees with learning rate = 0.01! SVM and feature selection matter too!
  • 4. Competitions that boosted my R learning curve (continued): Word n-grams and character n-grams can make a big difference. Parallel processing and big servers can help with complex feature engineering! There are still many awesome tools in R that I don’t know! glmnet can do a great job!
  • 5. 10 R Packages
    ● Allow the Machine to Capture Complexity: 1. gbm, 2. randomForest, 3. e1071
    ● Take Advantage of High-Cardinality Categorical or Text Data: 4. glmnet, 5. tau
    ● Make Your Code More Efficient: 6. Matrix, 7. SOAR, 8. foreach, 9. doMC, 10. data.table
  • 6. Capture Complexity Automatically
  • 7. 1. gbm: Gradient Boosting Machine (Friedman; the package also implements extensions of Freund & Schapire’s AdaBoost). Greg Ridgeway / Harry Southworth. Key Trick: Use gbm.more to write your own early-stopping procedure.
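The early-stopping trick on slide 7 can be sketched independently of any package. The following is a minimal Python sketch, not the gbm API: `add_trees` and `validation_loss` are hypothetical callbacks (in R, the first would presumably wrap a gbm.more call), and `ToyModel` with its made-up loss curve only stands in for a real model so that the loop can run.

```python
from typing import Callable, Tuple


def early_stopping(add_trees: Callable[[int], None],
                   validation_loss: Callable[[], float],
                   batch_size: int = 100,
                   patience: int = 3,
                   max_trees: int = 50_000) -> Tuple[int, float]:
    """Grow an ensemble in batches and stop once validation loss stalls.

    Returns the tree count with the lowest validation loss and that loss.
    """
    best_loss = float("inf")
    best_n_trees = 0
    n_trees = 0
    stale_batches = 0
    while n_trees < max_trees and stale_batches < patience:
        add_trees(batch_size)          # assumption: appends trees to an existing fit
        n_trees += batch_size
        loss = validation_loss()       # loss on data not used for fitting
        if loss < best_loss:
            best_loss, best_n_trees, stale_batches = loss, n_trees, 0
        else:
            stale_batches += 1
    return best_n_trees, best_loss


class ToyModel:
    """Stand-in model with an invented U-shaped validation curve (minimum at 700 trees)."""

    def __init__(self) -> None:
        self.n_trees = 0

    def add(self, k: int) -> None:
        self.n_trees += k

    def loss(self) -> float:
        return 0.3 + (self.n_trees - 700) ** 2 / 1e6


model = ToyModel()
best_n, best_loss = early_stopping(model.add, model.loss)
print(best_n, best_loss)
```

With a small learning rate the best tree count can be very large (slide 3 mentions 24,500 trees), so growing in batches and checking a validation set avoids guessing that number up front. The `patience` parameter is a design choice of this sketch: it tolerates a few non-improving batches before stopping.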
  • 8. 2. randomForest: Random Forests. Authors: Breiman and Cutler. Maintainer: Andy Liaw. Key Tricks: Set importance = TRUE for permutation importance. Tune the sampsize parameter for faster computation and for handling unbalanced classes.
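To make "permutation importance" from slide 8 concrete, here is a small Python sketch of the general idea: shuffle one feature at a time and measure how much the error grows. It is not the randomForest implementation (which, as far as I know, works on out-of-bag samples per tree); `permutation_importance`, `mse`, and the toy `predict` function are names invented for this illustration.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, seed=0):
    """Increase in MSE when each column of X is shuffled in turn.

    predict: callable mapping one row (a list of floats) to a prediction.
    X: list of rows; y: list of targets. Larger values mean the model
    relies more on that feature.
    """
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        column = [row[j] for row in X]
        rng.shuffle(column)            # break the link between feature j and y
        X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
        permuted = mse(y, [predict(row) for row in X_perm])
        importances.append(permuted - baseline)
    return importances


# Toy data: the target depends only on the first feature.
X = [[float(i), float(i % 3)] for i in range(50)]
y = [row[0] for row in X]
predict = lambda row: row[0]           # a "model" that ignores feature 1

print(permutation_importance(predict, X, y))
```

In this toy setup the second feature is never used by the model, so shuffling it cannot change the error and its importance is exactly zero.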
  • 9. 3. e1071: Support Vector Machines. Maintainer: David Meyer. Key Tricks: Use kernlab (Karatzoglou, Smola and Hornik) to get a heuristic for kernel parameters (for example, its sigest function estimates the RBF kernel width). Write your own pattern search.
  • 10. Take Advantage of High-Cardinality Categorical or Text Features
  • 11. 4. glmnet. Authors: Friedman, Hastie, Simon, Tibshirani. Penalties: L1 / elastic net / L2. Key Tricks: Try interactions of 2 or more categorical variables. Test your code on the Kaggle “Amazon Employee Access Challenge”.
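The interaction trick on slide 11 amounts to treating each combination of categorical values as a new category before one-hot encoding. A minimal Python sketch of that step follows; it does not use glmnet, and `add_interactions` plus the column names in the example are hypothetical.

```python
from itertools import combinations


def add_interactions(rows, columns, order=2):
    """Return copies of rows with one extra categorical feature per column combination.

    rows: list of dicts mapping column name to category.
    columns: names of the categorical columns to combine.
    order: how many columns to combine at a time (2 gives pairwise interactions).
    """
    out = []
    for row in rows:
        new_row = dict(row)            # leave the input untouched
        for combo in combinations(columns, order):
            name = "_x_".join(combo)
            new_row[name] = "|".join(str(row[c]) for c in combo)
        out.append(new_row)
    return out


rows = [{"role": "r1", "dept": "d9", "mgr": "m3"}]
print(add_interactions(rows, ["role", "dept", "mgr"]))
```

The new columns are high-cardinality by construction, which is where a sparse encoding and a penalized linear model become useful.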
  • 12. 5. tau. Maintainer: Kurt Hornik. Used for automating text mining. Key Trick: Try character n-grams. They work surprisingly well!
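For readers unfamiliar with character n-grams, the idea from slide 12 is to count overlapping substrings of length n inside each word. The Python sketch below illustrates this under simple assumptions (whitespace tokenization, lowercase, an underscore marking word boundaries); it is not the tau package, and `char_ngrams` is a name made up for this example.

```python
from collections import Counter


def char_ngrams(text, n=3, marker="_"):
    """Count character n-grams within each whitespace-separated word.

    Each word is padded with `marker` on both sides so that prefixes and
    suffixes produce their own n-grams.
    """
    counts = Counter()
    for word in text.lower().split():
        padded = f"{marker}{word}{marker}"
        for i in range(len(padded) - n + 1):
            counts[padded[i:i + n]] += 1
    return counts


print(char_ngrams("win wins", n=3))
```

Because "win" and "wins" share most of their trigrams, such features tolerate misspellings and inflections that word-level features treat as unrelated tokens.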
  • 13. Make Your Code More Efficient
  • 14. 6. Matrix. Authors / Maintainers: Douglas Bates and Martin Maechler. Key Trick: Use sparse.model.matrix for one-hot encoding.
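Why a sparse matrix suits one-hot encoding can be shown in a few lines: every row has exactly one nonzero entry, so only its position needs storing. This is a conceptual Python sketch, not the Matrix package; `one_hot_sparse` and `to_dense` are hypothetical helper names.

```python
def one_hot_sparse(values):
    """One-hot encode a categorical column as (row, column) positions of the 1s.

    Returns the sorted list of levels (one column per level) and the list
    of nonzero coordinates; all other entries are implicitly 0.
    """
    levels = sorted(set(values))
    column_of = {level: j for j, level in enumerate(levels)}
    nonzeros = [(i, column_of[v]) for i, v in enumerate(values)]
    return levels, nonzeros


def to_dense(nonzeros, n_rows, n_cols):
    """Expand the coordinates into a dense 0/1 matrix (for inspection only)."""
    dense = [[0] * n_cols for _ in range(n_rows)]
    for i, j in nonzeros:
        dense[i][j] = 1
    return dense


levels, nonzeros = one_hot_sparse(["sg", "fr", "sg", "us"])
print(levels)
print(nonzeros)
print(to_dense(nonzeros, n_rows=4, n_cols=len(levels)))
```

With thousands of levels the dense form grows as rows times levels, while the coordinate list grows only with the number of rows.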
  • 15. 7. SOAR. Author / Maintainer: Bill Venables. Used to store large R objects in an on-disk cache and release memory. Key Trick: Once I found out about it, it made my R experience great! (Just remember to empty your cache …)
  • 16. 8. foreach and 9. doMC. Authors: Revolution Analytics. Key Trick: Use them for parallel processing to speed up computation.
  • 17. 10. data.table. Authors: M Dowle, T Short and others. Maintainer: Matt Dowle. Key Trick: Essential for fast data aggregation at scale.
  • 18. Don’t Forget: Use your intuition to help the machine!
    ● Always compute differences / ratios of features
      o This can help the Machine a lot!
    ● Always consider discarding features that are “too good”
      o They can make the Machine lazy!
      o An example: GE Flight Quest
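The "differences / ratios" advice on slide 18 can be sketched as a small feature-generation step. One plausible reason it helps is that tree-based models split on one feature at a time, so a relation between two features is hard for them to find unaided. The Python function below is a minimal illustration; `add_diffs_and_ratios` and the feature names are invented, and returning None for a zero denominator is an arbitrary choice of this sketch.

```python
from itertools import combinations


def add_diffs_and_ratios(row):
    """Return a copy of a numeric feature dict with pairwise differences and ratios added.

    Pairs are taken in sorted key order, so the output names are deterministic.
    A ratio with a zero denominator is stored as None.
    """
    out = dict(row)
    for a, b in combinations(sorted(row), 2):
        out[f"{a}_minus_{b}"] = row[a] - row[b]
        out[f"{a}_over_{b}"] = row[a] / row[b] if row[b] != 0 else None
    return out


print(add_diffs_and_ratios({"price": 10.0, "cost": 4.0}))
```

The number of generated features grows quadratically with the number of inputs, so in practice one would likely restrict this to pairs that intuition suggests are related.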
  • 19. Thank you!