SlideShare a Scribd company logo
10 R Packages to Win
Kaggle Competitions
Xavier Conort
Data Scientist
Previously... … now!
Competitions that boosted my R learning curve
The Machine seems much smarter than I am at capturing complexity in
the data even for simple datasets!
Humans can help the Machine too! But don’t oversimplify and discard
any data.
Don’t be impatient. My best GBM had 24,500 trees with learning rate =
0.01!
SVM and feature selection matter too!
Word n-grams and character n-grams can make a big difference
Parallel processing and big servers can help with complex feature
engineering!
Still many awesome tools in R that I don’t know!
Glmnet can do a great job!
Competitions that boosted my R learning curve
10 R Packages:
Allow the Machine to Capture Complexity
1. gbm
2. randomForest
3. e1071
Take Advantage of High-Cardinality Categorical or Text Data
4. glmnet
5. tau
Make Your Code More Efficient
6. Matrix
7. SOAR
8. forEach
9. doMC
10. data.table
Capture Complexity Automatically
1. gbm
Gradient Boosting Machine (Freud & Schapiro)
Greg Ridgeway / Harry Southworth
Key Trick:
Use gbm.more to write your own early-stopping procedure
2. randomForest
Random Forests (Breiman & Cutler)
Authors: Breiman and Cutler
Maintainer: Andy Liaw
Key Trick:
Importance=True for permutation importance
Tune the sampsize parameter for faster computation and
handling unbalanced classes
3. e1071
3. e1071:Support Vector Machines
Maintainer: David Meyer
Key Tricks:
Use kernlab (Karatzoglou, Smola and Hornik) to get
heuristic
Write own pattern search
Take Advantage of High-Cardinality
Categorical or Text Features
4. glmnet
Authors: Friedman, Hastie, Simon, Tibshirani
L1 / Elasticnet / L2
Key Tricks:
- Try interactions of 2 or more categorical variables
- Test your code on the Kaggle: “Amazon Employ Access
Challenge”
5. tau
Maintainer: Kurt Hornik
Used for automating text-mining
Key Trick:
Try character n-grams. They work surprisingly well!
Make Your Code More Efficient
6. Matrix
Authors / Maintainers: Douglas Bates and Martin Maechler
Key Trick:
Use sparse.model.matrix for one-hot encoding
7. SOAR
Author / Maintainer: Bill Venables
Used to store large R objects in the cache and release
memory
Key Trick:
Once I found out about it, it made my R Experience great!
(Just remember to empty your cache … )
8. forEach and 9. doMC
Authors: Revolution Analytics
Key Trick:
Use for parallel-processing to speed up computation
10. data.table
Authors: M Dowle, T Short and others
Maintainer: Matt Dowle
Key Trick:
Essential for doing fast data aggregation operations at
scale
Don’t Forget ..
Use your intuition to help the machine!
● Always compute differences / ratios of features
o This can help the Machine a lot!
● Always consider discarding features that are “too good”
o They can make the Machine lazy!
o An example: GE Flight Quest
Thank you!

More Related Content

What's hot

General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
Mark Peng
 
Brief introduction to Machine Learning
Brief introduction to Machine LearningBrief introduction to Machine Learning
Brief introduction to Machine Learning
CodeForFrankfurt
 
Europython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with PythonEuropython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with Python
Javier Arias Losada
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
Sebastian Raschka
 
Pybcn machine learning for dummies with python
Pybcn machine learning for dummies with pythonPybcn machine learning for dummies with python
Pybcn machine learning for dummies with python
Javier Arias Losada
 
kaggle_meet_up
kaggle_meet_upkaggle_meet_up
kaggle_meet_up
Marios Michailidis
 
Deep learning at nmc devin jones
Deep learning at nmc devin jones Deep learning at nmc devin jones
Deep learning at nmc devin jones
Ido Shilon
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
Sujit Pal
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
Pranav Ainavolu
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
Thomas da Silva Paula
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
Natalia Díaz Rodríguez
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
Nandita Naik
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
Philip Zheng
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
HJ van Veen
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
BICA Labs
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
Babu Priyavrat
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Aseda Owusua Addai-Deseh
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised Learning
Lukas Tencer
 
PyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine LearningPyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine Learning
Rebecca Bilbro
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
butest
 

What's hot (20)

General Tips for participating Kaggle Competitions
General Tips for participating Kaggle CompetitionsGeneral Tips for participating Kaggle Competitions
General Tips for participating Kaggle Competitions
 
Brief introduction to Machine Learning
Brief introduction to Machine LearningBrief introduction to Machine Learning
Brief introduction to Machine Learning
 
Europython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with PythonEuropython - Machine Learning for dummies with Python
Europython - Machine Learning for dummies with Python
 
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...An Introduction to Supervised Machine Learning and Pattern Classification: Th...
An Introduction to Supervised Machine Learning and Pattern Classification: Th...
 
Pybcn machine learning for dummies with python
Pybcn machine learning for dummies with pythonPybcn machine learning for dummies with python
Pybcn machine learning for dummies with python
 
kaggle_meet_up
kaggle_meet_upkaggle_meet_up
kaggle_meet_up
 
Deep learning at nmc devin jones
Deep learning at nmc devin jones Deep learning at nmc devin jones
Deep learning at nmc devin jones
 
Artificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep LearningArtificial Intelligence, Machine Learning and Deep Learning
Artificial Intelligence, Machine Learning and Deep Learning
 
Understanding Basics of Machine Learning
Understanding Basics of Machine LearningUnderstanding Basics of Machine Learning
Understanding Basics of Machine Learning
 
An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)An introduction to Machine Learning (and a little bit of Deep Learning)
An introduction to Machine Learning (and a little bit of Deep Learning)
 
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
A Folksonomy of styles, aka: other stylists also said and Subjective Influenc...
 
Introduction to Machine Learning
Introduction to Machine LearningIntroduction to Machine Learning
Introduction to Machine Learning
 
A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013A tutorial on deep learning at icml 2013
A tutorial on deep learning at icml 2013
 
Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018Hacking Predictive Modeling - RoadSec 2018
Hacking Predictive Modeling - RoadSec 2018
 
Data Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural NetworksData Science, Machine Learning and Neural Networks
Data Science, Machine Learning and Neural Networks
 
Introduction to-machine-learning
Introduction to-machine-learningIntroduction to-machine-learning
Introduction to-machine-learning
 
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
Day 2 (Lecture 5): A Practitioner's Perspective on Building Machine Product i...
 
Semi-Supervised Learning
Semi-Supervised LearningSemi-Supervised Learning
Semi-Supervised Learning
 
PyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine LearningPyData Global: Thrifty Machine Learning
PyData Global: Thrifty Machine Learning
 
An introduction to Machine Learning
An introduction to Machine LearningAn introduction to Machine Learning
An introduction to Machine Learning
 

Viewers also liked

Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
Pier Luca Lanzi
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
Varad Meru
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
joshwills
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
Prof. Dr. Diego Kuonen
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
Daniel Tunkelang
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
Daniel Tunkelang
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
Imry Kissos
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
Xavier Amatriain
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Data Science London
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
ryanorban
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
Devashish Shanker
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
David Pittman
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
Si Haem
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
Lars Marius Garshol
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
DEEPASHRI HK
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
lpaviglianiti
 
Id3,c4.5 algorithim
Id3,c4.5 algorithimId3,c4.5 algorithim
Id3,c4.5 algorithim
Abdelfattah Al Zaqqa
 
Rstudio上でのパッケージインストールを便利にするaddin4githubinstall
Rstudio上でのパッケージインストールを便利にするaddin4githubinstallRstudio上でのパッケージインストールを便利にするaddin4githubinstall
Rstudio上でのパッケージインストールを便利にするaddin4githubinstall
Atsushi Hayakawa
 
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
Nagi Teramo
 
How i became a data scientist
How i became a data scientistHow i became a data scientist
How i became a data scientist
Owen Zhang
 

Viewers also liked (20)

Machine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification RulesMachine Learning and Data Mining: 12 Classification Rules
Machine Learning and Data Mining: 12 Classification Rules
 
Introduction to Mahout and Machine Learning
Introduction to Mahout and Machine LearningIntroduction to Mahout and Machine Learning
Introduction to Mahout and Machine Learning
 
Hadoop and Machine Learning
Hadoop and Machine LearningHadoop and Machine Learning
Hadoop and Machine Learning
 
A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)A Statistician's View on Big Data and Data Science (Version 1)
A Statistician's View on Big Data and Data Science (Version 1)
 
Data By The People, For The People
Data By The People, For The PeopleData By The People, For The People
Data By The People, For The People
 
How to Interview a Data Scientist
How to Interview a Data ScientistHow to Interview a Data Scientist
How to Interview a Data Scientist
 
Hands-on Deep Learning in Python
Hands-on Deep Learning in PythonHands-on Deep Learning in Python
Hands-on Deep Learning in Python
 
10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems10 Lessons Learned from Building Machine Learning Systems
10 Lessons Learned from Building Machine Learning Systems
 
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?Big Data [sorry] & Data Science: What Does a Data Scientist Do?
Big Data [sorry] & Data Science: What Does a Data Scientist Do?
 
How to Become a Data Scientist
How to Become a Data ScientistHow to Become a Data Scientist
How to Become a Data Scientist
 
Deep Learning for Natural Language Processing
Deep Learning for Natural Language ProcessingDeep Learning for Natural Language Processing
Deep Learning for Natural Language Processing
 
Myths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data ScientistsMyths and Mathemagical Superpowers of Data Scientists
Myths and Mathemagical Superpowers of Data Scientists
 
Deep neural networks
Deep neural networksDeep neural networks
Deep neural networks
 
Introduction to Big Data/Machine Learning
Introduction to Big Data/Machine LearningIntroduction to Big Data/Machine Learning
Introduction to Big Data/Machine Learning
 
Artificial neural network
Artificial neural networkArtificial neural network
Artificial neural network
 
Artificial Intelligence Presentation
Artificial Intelligence PresentationArtificial Intelligence Presentation
Artificial Intelligence Presentation
 
Id3,c4.5 algorithim
Id3,c4.5 algorithimId3,c4.5 algorithim
Id3,c4.5 algorithim
 
Rstudio上でのパッケージインストールを便利にするaddin4githubinstall
Rstudio上でのパッケージインストールを便利にするaddin4githubinstallRstudio上でのパッケージインストールを便利にするaddin4githubinstall
Rstudio上でのパッケージインストールを便利にするaddin4githubinstall
 
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』~knitr+pandocではじめる~『R MarkdownでReproducible Research』
~knitr+pandocではじめる~『R MarkdownでReproducible Research』
 
How i became a data scientist
How i became a data scientistHow i became a data scientist
How i became a data scientist
 

Similar to 10 R Packages to Win Kaggle Competitions

Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
GoDataDriven
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
Databricks
 
Spark meetup london share and analyse genomic data at scale with spark, adam...
Spark meetup london  share and analyse genomic data at scale with spark, adam...Spark meetup london  share and analyse genomic data at scale with spark, adam...
Spark meetup london share and analyse genomic data at scale with spark, adam...
Andy Petrella
 
The Art of Evolutionary Algorithms Programming
The Art of Evolutionary Algorithms ProgrammingThe Art of Evolutionary Algorithms Programming
The Art of Evolutionary Algorithms Programming
Juan J. Merelo
 
2014 pycon-talk
2014 pycon-talk2014 pycon-talk
2014 pycon-talk
c.titus.brown
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
BigML, Inc
 
VSSML18. Deepnets and Time Series
VSSML18. Deepnets and Time SeriesVSSML18. Deepnets and Time Series
VSSML18. Deepnets and Time Series
BigML, Inc
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocols
c.titus.brown
 
DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...
DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...
DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...
Dataconomy Media
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
Poonam Kshirsagar
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Spark Summit
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
Doug Needham
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
Mark Nichols, P.E.
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18
DataconomyGmbH
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
Dataconomy Media
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Mihai Criveti
 
Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)
Anand Sampat
 
Matlab programming
Matlab programmingMatlab programming
Matlab programming
Md. Rayid Hasan Mojumder
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
inside-BigData.com
 
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Pramati Technologies
 

Similar to 10 R Packages to Win Kaggle Competitions (20)

Data Science Accelerator Program
Data Science Accelerator ProgramData Science Accelerator Program
Data Science Accelerator Program
 
Tensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with HummingbirdTensors Are All You Need: Faster Inference with Hummingbird
Tensors Are All You Need: Faster Inference with Hummingbird
 
Spark meetup london share and analyse genomic data at scale with spark, adam...
Spark meetup london  share and analyse genomic data at scale with spark, adam...Spark meetup london  share and analyse genomic data at scale with spark, adam...
Spark meetup london share and analyse genomic data at scale with spark, adam...
 
The Art of Evolutionary Algorithms Programming
The Art of Evolutionary Algorithms ProgrammingThe Art of Evolutionary Algorithms Programming
The Art of Evolutionary Algorithms Programming
 
2014 pycon-talk
2014 pycon-talk2014 pycon-talk
2014 pycon-talk
 
BSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 SessionsBSSML16 L10. Summary Day 2 Sessions
BSSML16 L10. Summary Day 2 Sessions
 
VSSML18. Deepnets and Time Series
VSSML18. Deepnets and Time SeriesVSSML18. Deepnets and Time Series
VSSML18. Deepnets and Time Series
 
2014 khmer protocols
2014 khmer protocols2014 khmer protocols
2014 khmer protocols
 
DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...
DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...
DN 2017 | Connecting the Enterprise with Machine Learning and Neo4j | Tim War...
 
Current clustering techniques
Current clustering techniquesCurrent clustering techniques
Current clustering techniques
 
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth LoganMulti Model Machine Learning by Maximo Gurmendez and Beth Logan
Multi Model Machine Learning by Maximo Gurmendez and Beth Logan
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
 
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
Retrieval Augmented Generation in Practice: Scalable GenAI platforms with k8s...
 
Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)Version Control in Machine Learning + AI (Stanford)
Version Control in Machine Learning + AI (Stanford)
 
Matlab programming
Matlab programmingMatlab programming
Matlab programming
 
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
Efficient Model Selection for Deep Neural Networks on Massively Parallel Proc...
 
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
Document Clustering using LDA | Haridas Narayanaswamy [Pramati]
 

Recently uploaded

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
soxrziqu
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
Timothy Spann
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
slg6lamcq
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
mkkikqvo
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
taqyea
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
Bill641377
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
Timothy Spann
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
Social Samosa
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
yuvarajkumar334
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Aggregage
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
bmucuha
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
wyddcwye1
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
jitskeb
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
Sm321
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
Lars Albertsson
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
nuttdpt
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
Sachin Paul
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
y3i0qsdzb
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
sameer shah
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
sameer shah
 

Recently uploaded (20)

University of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma TranscriptUniversity of New South Wales degree offer diploma Transcript
University of New South Wales degree offer diploma Transcript
 
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
06-12-2024-BudapestDataForum-BuildingReal-timePipelineswithFLaNK AIM
 
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
一比一原版南十字星大学毕业证(SCU毕业证书)学历如何办理
 
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
原版一比一多伦多大学毕业证(UofT毕业证书)如何办理
 
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
一比一原版(harvard毕业证书)哈佛大学毕业证如何办理
 
Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...Population Growth in Bataan: The effects of population growth around rural pl...
Population Growth in Bataan: The effects of population growth around rural pl...
 
DSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelinesDSSML24_tspann_CodelessGenerativeAIPipelines
DSSML24_tspann_CodelessGenerativeAIPipelines
 
The Ipsos - AI - Monitor 2024 Report.pdf
The  Ipsos - AI - Monitor 2024 Report.pdfThe  Ipsos - AI - Monitor 2024 Report.pdf
The Ipsos - AI - Monitor 2024 Report.pdf
 
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCAModule 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
Module 1 ppt BIG DATA ANALYTICS_NOTES FOR MCA
 
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
Beyond the Basics of A/B Tests: Highly Innovative Experimentation Tactics You...
 
一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理一比一原版(UO毕业证)渥太华大学毕业证如何办理
一比一原版(UO毕业证)渥太华大学毕业证如何办理
 
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
原版一比一利兹贝克特大学毕业证(LeedsBeckett毕业证书)如何办理
 
Experts live - Improving user adoption with AI
Experts live - Improving user adoption with AIExperts live - Improving user adoption with AI
Experts live - Improving user adoption with AI
 
Challenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more importantChallenges of Nation Building-1.pptx with more important
Challenges of Nation Building-1.pptx with more important
 
End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024End-to-end pipeline agility - Berlin Buzzwords 2024
End-to-end pipeline agility - Berlin Buzzwords 2024
 
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
一比一原版(UCSB文凭证书)圣芭芭拉分校毕业证如何办理
 
Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......Palo Alto Cortex XDR presentation .......
Palo Alto Cortex XDR presentation .......
 
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
一比一原版巴斯大学毕业证(Bath毕业证书)学历如何办理
 
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
STATATHON: Unleashing the Power of Statistics in a 48-Hour Knowledge Extravag...
 
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens""Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
"Financial Odyssey: Navigating Past Performance Through Diverse Analytical Lens"
 

10 R Packages to Win Kaggle Competitions

  • 1. 10 R Packages to Win Kaggle Competitions Xavier Conort Data Scientist
  • 3. Competitions that boosted my R learning curve The Machine seems much smarter than I am at capturing complexity in the data even for simple datasets! Humans can help the Machine too! But don’t oversimplify and discard any data. Don’t be impatient. My best GBM had 24,500 trees with learning rate = 0.01! SVM and feature selection matter too!
  • 4. Word n-grams and character n-grams can make a big difference Parallel processing and big servers can help with complex feature engineering! Still many awesome tools in R that I don’t know! Glmnet can do a great job! Competitions that boosted my R learning curve
  • 5. 10 R Packages: Allow the Machine to Capture Complexity 1. gbm 2. randomForest 3. e1071 Take Advantage of High-Cardinality Categorical or Text Data 4. glmnet 5. tau Make Your Code More Efficient 6. Matrix 7. SOAR 8. forEach 9. doMC 10. data.table
  • 7. 1. gbm Gradient Boosting Machine (Freud & Schapiro) Greg Ridgeway / Harry Southworth Key Trick: Use gbm.more to write your own early-stopping procedure
  • 8. 2. randomForest Random Forests (Breiman & Cutler) Authors: Breiman and Cutler Maintainer: Andy Liaw Key Trick: Importance=True for permutation importance Tune the sampsize parameter for faster computation and handling unbalanced classes
  • 9. 3. e1071 3. e1071:Support Vector Machines Maintainer: David Meyer Key Tricks: Use kernlab (Karatzoglou, Smola and Hornik) to get heuristic Write own pattern search
  • 10. Take Advantage of High-Cardinality Categorical or Text Features
  • 11. 4. glmnet Authors: Friedman, Hastie, Simon, Tibshirani L1 / Elasticnet / L2 Key Tricks: - Try interactions of 2 or more categorical variables - Test your code on the Kaggle: “Amazon Employ Access Challenge”
  • 12. 5. tau Maintainer: Kurt Hornik Used for automating text-mining Key Trick: Try character n-grams. They work surprisingly well!
  • 13. Make Your Code More Efficient
  • 14. 6. Matrix Authors / Maintainers: Douglas Bates and Martin Maechler Key Trick: Use sparse.model.matrix for one-hot encoding
  • 15. 7. SOAR Author / Maintainer: Bill Venables Used to store large R objects in the cache and release memory Key Trick: Once I found out about it, it made my R Experience great! (Just remember to empty your cache … )
  • 16. 8. forEach and 9. doMC Authors: Revolution Analytics Key Trick: Use for parallel-processing to speed up computation
  • 17. 10. data.table Authors: M Dowle, T Short and others Maintainer: Matt Dowle Key Trick: Essential for doing fast data aggregation operations at scale
  • 18. Don’t Forget .. Use your intuition to help the machine! ● Always compute differences / ratios of features o This can help the Machine a lot! ● Always consider discarding features that are “too good” o They can make the Machine lazy! o An example: GE Flight Quest