In this presentation, I talk about data science competitions. After an introduction to data science competitions, I go through their benefits, common misconceptions, and best practices.
33. NO EDA?
• Most competitions provide actual labels - typical EDA
• Anonymized data - more creative EDA
• People decode age, states, time intervals, income, etc.
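As an illustration of the kind of decoding mentioned above, here is a minimal sketch of one possible trick. It is not taken from the slides. It assumes an anonymized column is an integer quantity (such as age) that was standardized, and that at least two observed values differ by exactly one unit. The function name `recover_integer_grid` and the toy ages are hypothetical.

```python
import statistics


def recover_integer_grid(values, tol=1e-3):
    """Map standardized values back to integer offsets from the minimum.

    Assumption: the original column was integer-valued and the smallest
    gap between distinct values corresponds to exactly one unit.
    """
    # Round before de-duplicating so float noise does not create fake gaps.
    unique = sorted({round(v, 9) for v in values})
    gaps = [b - a for a, b in zip(unique, unique[1:])]
    step = min(gaps)  # estimated size of one original unit after scaling

    offsets = []
    for v in values:
        k = (v - unique[0]) / step
        if abs(k - round(k)) > tol:
            raise ValueError("values do not lie on a regular integer grid")
        offsets.append(round(k))
    return step, offsets


# Toy data: hypothetical ages, standardized the way an organizer might do it.
ages = [23, 24, 31, 31, 38, 45, 52]
mean, std = statistics.mean(ages), statistics.pstdev(ages)
anonymized = [(a - mean) / std for a in ages]

step, offsets = recover_integer_grid(anonymized)
print(offsets)  # integer offsets from the smallest value
```

The sketch recovers the values only up to an unknown shift (the true minimum). In practice, domain knowledge such as a plausible age range would be needed to pin that down.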
41. ALGORITHMS
Algorithm | Tool | Note
Gradient Boosting Machine | XGBoost, LightGBM | The most popular algorithm in competitions
Random Forests | Scikit-Learn, randomForest |
Extremely Randomized Trees | Scikit-Learn |
Neural Networks / Deep Learning | Keras, MXNet | Blends well with GBM. Best at image recognition and NLP competitions.
Logistic/Linear Regression | Scikit-Learn, Vowpal Wabbit | Fastest. Good for ensembles.
Support Vector Machine | Scikit-Learn |
FTRL | Vowpal Wabbit | Competitive solution for CTR estimation competitions
Factorization Machine | libFM | Winning solution for KDD Cup 2012
Field-aware Factorization Machine | libFFM | Winning solution for CTR estimation competitions (Criteo, Avazu)
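To make the FTRL entry more concrete, here is a minimal pure-Python sketch of per-coordinate FTRL-Proximal for logistic regression on sparse binary features, the setting typical of CTR estimation. It is not Vowpal Wabbit's implementation. The class name `FTRLProximal`, the hyperparameter defaults, and the toy click data are all assumptions made for illustration.

```python
import math
from collections import defaultdict


class FTRLProximal:
    """Per-coordinate FTRL-Proximal for logistic regression (sketch).

    Each sample is a list of active feature names (implicit value 1).
    """

    def __init__(self, alpha=0.1, beta=1.0, l1=0.1, l2=1.0):
        # Hypothetical defaults; real competitions tune these.
        self.alpha, self.beta, self.l1, self.l2 = alpha, beta, l1, l2
        self.z = defaultdict(float)  # accumulated adjusted gradients
        self.n = defaultdict(float)  # accumulated squared gradients

    def _weight(self, feature):
        z = self.z.get(feature, 0.0)
        n = self.n.get(feature, 0.0)
        if abs(z) <= self.l1:
            return 0.0  # L1 keeps weakly supported features at exactly zero
        sign = 1.0 if z > 0 else -1.0
        learning_term = (self.beta + math.sqrt(n)) / self.alpha
        return -(z - sign * self.l1) / (learning_term + self.l2)

    def predict(self, features):
        score = sum(self._weight(f) for f in features)
        score = max(min(score, 35.0), -35.0)  # avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-score))

    def update(self, features, label):
        p = self.predict(features)
        g = p - label  # log-loss gradient for every active feature
        for f in features:
            w = self._weight(f)
            sigma = (math.sqrt(self.n[f] + g * g) - math.sqrt(self.n[f])) / self.alpha
            self.z[f] += g - sigma * w
            self.n[f] += g * g
        return p


# Toy click log (made up): one ad is always clicked, the other never.
clicks = [
    (["site=news", "ad=shoes"], 1),
    (["site=news", "ad=insurance"], 0),
]

model = FTRLProximal()
for _ in range(20):  # a few passes over the toy data
    for features, label in clicks:
        model.update(features, label)

print(model.predict(["ad=shoes"]), model.predict(["ad=insurance"]))
```

Because the model updates one sample at a time and stores only two numbers per feature, the same loop could stream over a large click log. A production setup would usually add feature hashing, which is omitted here.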