10 R Packages to Win Kaggle Competitions
Xavier Conort
Data Scientist

Previously... now!

Competitions that boosted my R learning curve
- The Machine seems much smarter than I am at capturing complexity in the data, even for simple datasets!
- Humans can help the Machine too! But don’t oversimplify or discard any data.
- Don’t be impatient: my best GBM had 24,500 trees with a learning rate of 0.01!
- SVM and feature selection matter too!

Competitions that boosted my R learning curve (cont.)
- Word n-grams and character n-grams can make a big difference.
- Parallel processing and big servers can help with complex feature engineering!
- There are still many awesome tools in R that I don’t know!
- glmnet can do a great job!

10 R Packages:
Allow the Machine to Capture Complexity
1. gbm
2. randomForest
3. e1071
Take Advantage of High-Cardinality Categorical or Text Data
4. glmnet
5. tau
Make Your Code More Efficient
6. Matrix
7. SOAR
8. foreach
9. doMC
10. data.table

Capture Complexity Automatically

1. gbm
Gradient Boosting Machine (Freund & Schapire; Friedman)
Greg Ridgeway / Harry Southworth
Key Trick: Use gbm.more to write your own early-stopping procedure.
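
A minimal sketch of the gbm.more trick, assuming a binary target scored by hold-out log-loss: the model is grown 100 trees at a time and stops once the validation loss has not improved for three consecutive rounds. The simulated data, the chunk size, the patience and the cap of 5,000 trees are illustrative assumptions, not settings from the talk.

```r
# Sketch: hand-rolled early stopping for gbm using gbm.more().
library(gbm)

set.seed(1)
n  <- 2000
df <- data.frame(x1 = runif(n), x2 = runif(n))
df$y <- rbinom(n, 1, plogis(3 * sin(2 * pi * df$x1) + 2 * df$x2 - 1))  # simulated target
train <- df[1:1500, ]
valid <- df[1501:n, ]

logloss <- function(y, p) {
  p <- pmin(pmax(p, 1e-15), 1 - 1e-15)  # guard against log(0)
  -mean(y * log(p) + (1 - y) * log(1 - p))
}

step      <- 100   # trees added per round (hypothetical choice)
max_trees <- 5000  # hard cap on model size (hypothetical choice)
patience  <- 3     # rounds without improvement before stopping

fit <- gbm(y ~ x1 + x2, data = train, distribution = "bernoulli",
           n.trees = step, shrinkage = 0.01, interaction.depth = 3,
           bag.fraction = 0.5, verbose = FALSE)

best_loss <- Inf; best_trees <- 0; bad_rounds <- 0
repeat {
  p    <- predict(fit, newdata = valid, n.trees = fit$n.trees, type = "response")
  loss <- logloss(valid$y, p)
  if (loss < best_loss) {
    best_loss <- loss; best_trees <- fit$n.trees; bad_rounds <- 0
  } else {
    bad_rounds <- bad_rounds + 1
  }
  if (bad_rounds >= patience || fit$n.trees >= max_trees) break
  fit <- gbm.more(fit, n.new.trees = step, verbose = FALSE)  # keep boosting the same model
}
cat("best number of trees:", best_trees, " hold-out log-loss:", round(best_loss, 4), "\n")
```

The final object still contains the non-improving trees, so predict with `n.trees = best_trees`. gbm's own `gbm.perf()` gives ready-made estimates, but a hand-written loop lets you plug in the competition metric directly.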

2. randomForest
Random Forests (Breiman & Cutler)
Authors: Breiman and Cutler
Maintainer: Andy Liaw
Key Tricks:
- Set importance = TRUE for permutation importance.
- Tune the sampsize parameter for faster computation and for handling unbalanced classes.
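
Both randomForest tricks can be sketched on a simulated, heavily unbalanced binary problem (roughly 5% positives; the data and column names are made up). `importance = TRUE` makes the forest compute permutation importance, which `importance(rf, type = 1)` reports as mean decrease in accuracy. `strata` plus a per-class `sampsize` draws a small, balanced sample for every tree, which is both faster and less dominated by the majority class. Using the rare-class size for both classes is an assumption, not a recommendation from the talk.

```r
# Sketch: permutation importance and stratified, balanced sampsize.
library(randomForest)

set.seed(42)
n <- 3000
X <- data.frame(a = rnorm(n), b = rnorm(n), noise = rnorm(n))  # hypothetical features
y <- factor(ifelse(X$a + 0.5 * X$b + rnorm(n, sd = 0.5) > 2, "yes", "no"))
print(table(y))                          # heavily unbalanced classes

n_min <- min(table(y))                   # size of the rare class
rf <- randomForest(x = X, y = y, ntree = 300,
                   importance = TRUE,             # compute permutation importance
                   strata = y,                    # sample each class separately...
                   sampsize = c(n_min, n_min))    # ...with equal counts per tree

imp <- importance(rf, type = 1)          # type 1 = mean decrease in accuracy
print(imp)
print(rf$confusion)                      # out-of-bag confusion matrix
```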

3. e1071: Support Vector Machines
Maintainer: David Meyer
Key Tricks:
- Use kernlab (Karatzoglou, Smola and Hornik) to get a heuristic starting point.
- Write your own pattern search.
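
One way to combine the two e1071 tricks, sketched under assumptions: `kernlab::sigest()` supplies a data-driven starting value for the RBF kernel width, and a hand-written compass (pattern) search then moves over log2(cost) and log2(gamma), halving its step whenever no neighbour lowers the 5-fold cross-validation error that `svm(..., cross = 5)` reports. The ring-shaped toy data, the starting cost of 1, the step sizes and the evaluation budget are hypothetical choices.

```r
# Sketch: kernlab's sigma heuristic as the start of a compass search for e1071::svm.
library(e1071)

set.seed(7)
n <- 400
X <- matrix(rnorm(n * 2), ncol = 2)
y <- factor(ifelse(rowSums(X^2) > 1.4, "out", "in"))  # simulated ring-shaped classes

# sigest() returns three candidate widths from quantiles of ||x - x'||^2;
# kernlab's sigma and e1071's gamma share the form exp(-g * ||x - x'||^2).
sig <- as.numeric(kernlab::sigest(X, frac = 0.5))

evals <- 0
cv_error <- function(log2C, log2g) {
  evals <<- evals + 1
  set.seed(123)  # identical folds for every candidate
  fit <- svm(X, y, kernel = "radial", cost = 2^log2C, gamma = 2^log2g, cross = 5)
  1 - fit$tot.accuracy / 100  # tot.accuracy is a percentage
}

pos   <- c(0, log2(sig[2]))  # start: cost = 1, gamma = middle heuristic value
best  <- cv_error(pos[1], pos[2])
step  <- 2                   # initial step in log2 units
moves <- rbind(c(1, 0), c(-1, 0), c(0, 1), c(0, -1))
max_evals <- 60              # evaluation budget (hypothetical)

while (step >= 0.25 && evals < max_evals) {
  improved <- FALSE
  for (k in seq_len(nrow(moves))) {
    cand <- pos + step * moves[k, ]
    err  <- cv_error(cand[1], cand[2])
    if (err < best) { best <- err; pos <- cand; improved <- TRUE; break }
  }
  if (!improved) step <- step / 2  # no better neighbour: refine the mesh
}
cat(sprintf("cost = %.3g, gamma = %.3g, CV error = %.3f\n", 2^pos[1], 2^pos[2], best))
```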

Take Advantage of High-Cardinality Categorical or Text Features

4. glmnet
Authors: Friedman, Hastie, Simon, Tibshirani
Lasso (L1) / elastic net / ridge (L2) penalties
Key Tricks:
- Try interactions of 2 or more categorical variables.
- Test your code on the Kaggle “Amazon Employee Access Challenge”.
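
A sketch of the interaction trick, assuming data shaped like the Amazon Employee Access Challenge (a binary target and several high-cardinality categorical IDs). The columns `role`, `dept` and `res` and the simulated target, which depends only on role/dept combinations, are hypothetical. Each pair of categorical columns is combined into a new factor with `interaction()`, everything is one-hot encoded into a sparse matrix, and `cv.glmnet()` fits an elastic-net logistic regression (`alpha = 0.5` is an arbitrary choice between L1 and L2).

```r
# Sketch: pairwise categorical interactions + sparse one-hot + glmnet.
library(glmnet)
library(Matrix)

set.seed(2013)
n  <- 4000
df <- data.frame(  # hypothetical high-cardinality IDs
  role = factor(sample(sprintf("r%02d", 1:30),  n, replace = TRUE)),
  dept = factor(sample(sprintf("d%02d", 1:20),  n, replace = TRUE)),
  res  = factor(sample(sprintf("x%03d", 1:100), n, replace = TRUE))
)
# Simulated target: access is likely only for some role/dept combinations
combo <- interaction(df$role, df$dept, drop = TRUE)
good  <- sample(levels(combo), nlevels(combo) %/% 3)
y     <- rbinom(n, 1, ifelse(combo %in% good, 0.9, 0.2))

# Baseline: main effects only
X_main  <- sparse.model.matrix(~ role + dept + res - 1, data = df)
cv_main <- cv.glmnet(X_main, y, family = "binomial", alpha = 0.5,
                     nfolds = 5, type.measure = "auc")

# Add every pairwise interaction of the categorical columns as a new factor
for (p in combn(c("role", "dept", "res"), 2, simplify = FALSE)) {
  df[[paste(p, collapse = "_")]] <- interaction(df[[p[1]]], df[[p[2]]], drop = TRUE)
}
X_int  <- sparse.model.matrix(~ . - 1, data = df)
cv_int <- cv.glmnet(X_int, y, family = "binomial", alpha = 0.5,
                    nfolds = 5, type.measure = "auc")

cat("CV AUC, main effects only:", round(max(cv_main$cvm), 3), "\n")
cat("CV AUC, with interactions:", round(max(cv_int$cvm), 3), "\n")
```

Passing `3` instead of `2` to `combn()` adds three-way interactions the same way; the column count grows quickly, which is where the sparse matrix and glmnet's regularization pay off.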

5. tau
Maintainer: Kurt Hornik
Used for automating text mining
Key Trick: Try character n-grams. They work surprisingly well!
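
A minimal sketch of character n-grams with tau, on a tiny made-up corpus: `textcnt()` counts character n-grams (its `"ngram"` method, here with `n = 3L`) in each document, and the counts are assembled with `Matrix::sparseMatrix()` into a sparse document-term matrix that glmnet can take directly. Because words are broken into overlapping fragments, different spellings or inflections of a word still share most of their features. The corpus and the choice of `n` are assumptions.

```r
# Sketch: per-document character n-gram counts -> sparse document-term matrix.
library(tau)
library(Matrix)

docs <- c("The service was excellent and fast",  # hypothetical corpus
          "Terrible service, very slow",
          "Excellent food, excellent staff",
          "Slow and terrible")

# Character n-gram counts per document (lower-cased; "_" marks word edges)
counts <- lapply(docs, function(d) textcnt(d, n = 3L, method = "ngram"))

# Assemble a sparse document-term matrix from the per-document counts
grams <- lapply(counts, names)
vocab <- sort(unique(unlist(grams)))
dtm <- sparseMatrix(i = rep(seq_along(counts), lengths(counts)),
                    j = match(unlist(grams), vocab),
                    x = as.numeric(unlist(counts)),
                    dims = c(length(docs), length(vocab)),
                    dimnames = list(NULL, vocab))
dim(dtm)
```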

Make Your Code More Efficient

6. Matrix
Authors / Maintainers: Douglas Bates and Martin Maechler
Key Trick: Use sparse.model.matrix for one-hot encoding.
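
A small sketch of one-hot encoding with `sparse.model.matrix()`, on a made-up data frame. Even with `- 1` in the formula, R applies treatment contrasts to every factor after the first and drops one level each; passing `contrasts = FALSE` versions through `contrasts.arg` (a common idiom, assumed here) keeps one indicator column per level of every factor. The result is a sparse `dgCMatrix`.

```r
# Sketch: full one-hot encoding of every factor into a sparse matrix.
library(Matrix)

df <- data.frame(colour = factor(c("red", "green", "blue", "red")),  # hypothetical data
                 size   = factor(c("S", "M", "L", "M")),
                 price  = c(10, 12, 9, 11))

# Identity "contrasts" keep one indicator column per level of every factor
is_fac <- vapply(df, is.factor, logical(1))
X <- sparse.model.matrix(~ . - 1, data = df,
                         contrasts.arg = lapply(df[is_fac], contrasts, contrasts = FALSE))
colnames(X)
```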

7. SOAR
Author / Maintainer: Bill Venables
Used to store large R objects in a disk cache and release memory
Key Trick: Once I found out about it, it made my R experience great! (Just remember to empty your cache…)
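
A minimal sketch of the SOAR workflow, assuming the package defaults (a `.R_Cache` directory under the working directory, moved to a temporary directory here so the demo leaves nothing behind). `Store()` writes an object to the disk cache and removes it from memory, the object stays reachable by name and is reloaded lazily when used, `Objects()` lists what is cached, and `Remove()` empties the cache. The object `big` is a hypothetical stand-in for a large feature matrix.

```r
# Sketch: move a large object to SOAR's disk cache, use it, then clean up.
library(SOAR)

old_wd <- setwd(tempdir())   # demo only: create the default .R_Cache in a temp dir

big <- matrix(runif(1e6), ncol = 100)   # hypothetical large object (~8 MB of doubles)
Store(big)                   # save to the disk cache and drop the in-memory copy

cached    <- Objects()       # names of objects currently in the cache
in_memory <- exists("big", envir = globalenv(), inherits = FALSE)
total     <- sum(big)        # still usable by name: loaded from disk on demand

Remove(big)                  # empty the cache once the object is no longer needed
left <- Objects()
setwd(old_wd)
```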

8. foreach and 9. doMC
Authors: Revolution Analytics
Key Trick: Use them for parallel processing to speed up computation.
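
A sketch of parallel processing with foreach and doMC, assuming a Unix-like system (doMC forks processes and is not available on Windows) and two cores. As a hypothetical workload, eight bootstrap linear models are fitted in parallel and their predictions averaged; `.combine = "+"` sums the per-iteration prediction vectors. The bootstrap indices are drawn up front in the main process, so the result does not depend on how random numbers behave in the forked workers.

```r
# Sketch: parallel bagging of simple models with foreach + doMC.
library(foreach)
library(doMC)

registerDoMC(cores = 2)   # fork 2 workers (Unix-like systems only)

set.seed(10)
n <- 1000
d <- data.frame(x = runif(n))
d$y <- 2 * d$x + rnorm(n, sd = 0.1)   # simulated data

B <- 8
boot_idx <- replicate(B, sample.int(n, n, replace = TRUE), simplify = FALSE)

# Each iteration fits one bootstrap model; "+" sums the prediction vectors
pred_sum <- foreach(idx = boot_idx, .combine = "+") %dopar% {
  fit <- lm(y ~ x, data = d[idx, ])
  predict(fit, newdata = d)
}
bagged <- pred_sum / B
getDoParWorkers()
```

Replacing `%dopar%` with `%do%` runs the same loop sequentially, which is handy for debugging.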

10. data.table
Authors: M. Dowle, T. Short and others
Maintainer: Matt Dowle
Key Trick: Essential for fast data aggregation at scale.
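
A sketch of typical data.table aggregations on a simulated table of one million transactions (column names are hypothetical): grouped summaries with `by`, group-level features added in place with `:=`, and a keyed two-way summary via `keyby` that is ready for fast joins.

```r
# Sketch: fast grouped aggregation and in-place feature creation with data.table.
library(data.table)

set.seed(5)
n  <- 1e6
dt <- data.table(user   = sample(1e4, n, replace = TRUE),        # hypothetical columns
                 shop   = sample(letters[1:5], n, replace = TRUE),
                 amount = round(runif(n, 1, 100), 2))

# Grouped aggregation: one row per user
per_user <- dt[, .(n_tx = .N, total = sum(amount), avg = mean(amount)), by = user]

# Group-level features added by reference (no copy of the 1e6-row table)
dt[, user_avg := mean(amount), by = user]
dt[, amount_vs_avg := amount / user_avg]

# Two-way aggregation; keyby sorts and sets the key for later joins
per_user_shop <- dt[, .(total = sum(amount)), keyby = .(user, shop)]
```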

Don’t Forget...
Use your intuition to help the machine!
● Always compute differences / ratios of features
   o This can help the Machine a lot!
● Always consider discarding features that are “too good”
   o They can make the Machine lazy!
   o An example: GE Flight Quest
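
The two rules of thumb can be sketched as small helpers. `add_diff_ratio()` (a hypothetical name) adds the difference and ratio of every pair of chosen numeric columns, features that trees otherwise approximate with many splits; zero denominators become `NA`. `flag_too_good()` (also hypothetical) is one possible screen for features that are “too good”: it flags columns whose single-feature AUC, computed with the Mann-Whitney rank formula, reaches a threshold, which often points to leakage. The 0.99 threshold and the toy data are assumptions.

```r
# Hypothetical helper: add a - b and a / b for every pair of the given columns.
add_diff_ratio <- function(df, cols) {
  for (p in combn(cols, 2, simplify = FALSE)) {
    a <- df[[p[1]]]
    b <- df[[p[2]]]
    df[[paste0(p[1], "_minus_", p[2])]] <- a - b
    df[[paste0(p[1], "_over_", p[2])]]  <- ifelse(b == 0, NA, a / b)  # no Inf values
  }
  df
}

# Hypothetical screen: direction-free single-feature AUC via the rank formula.
single_feature_auc <- function(x, y) {
  r  <- rank(x)
  n1 <- sum(y == 1); n0 <- sum(y == 0)
  auc <- (sum(r[y == 1]) - n1 * (n1 + 1) / 2) / (n1 * n0)
  max(auc, 1 - auc)
}
flag_too_good <- function(df, y, threshold = 0.99) {
  aucs <- vapply(df, single_feature_auc, numeric(1), y = y)
  names(aucs)[aucs >= threshold]
}

# Usage on toy data
houses <- data.frame(price = c(300, 450, 200), area = c(100, 150, 0))
out <- add_diff_ratio(houses, c("price", "area"))

set.seed(3)
y <- rbinom(500, 1, 0.5)
feats <- data.frame(ok   = rnorm(500) + 0.5 * y,          # useful but imperfect
                    leak = y + rnorm(500, sd = 0.01))     # simulated leakage
too_good <- flag_too_good(feats, y)
```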

Thank you!