The Higgs Machine Learning Challenge is not only a place for PhDs! As an undergraduate with a student license of MATLAB and a couple of dollars for Amazon Web Services, I entered in the last 8 days of the challenge and overtook more than half of the competitors! In this talk, I'll present the challenge and my approach, and walk through the code.
3. Err… Kaggle?!
Platform for data science competitions
Machine Learning, Big Data, Statistics, Data mining ...
Community for data scientists
Users, leaderboard, forums …
Sponsors!
21. 755th/1785 secrets
I entered in the last 8 days of the 127-day
challenge and overtook more than half of
the competitors using:
MATLAB 2014b (student license)
Neural Networks Toolbox
$20 of EC2 at Amazon Web Services
9 code files totaling 674 words
41. Day 8
Oops!
(weighted errors using AMS, regularization, mapstd, … nothing worked!)
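For readers unfamiliar with the metric named on this slide: AMS (approximate median significance) is the score the challenge used to rank submissions. The following is a minimal Python sketch of it, not the MATLAB code from the talk. The function names `ams` and `weighted_counts` are hypothetical, and the regularization term of 10 is the value from the challenge's published definition.

```python
import math


def ams(s, b, b_reg=10.0):
    """Approximate median significance.

    s: sum of weights of true signal events selected as signal
    b: sum of weights of background events selected as signal
    b_reg: regularization term (10 in the challenge definition)
    """
    if s < 0 or b < 0:
        raise ValueError("s and b must be non-negative")
    inner = (s + b + b_reg) * math.log(1.0 + s / (b + b_reg)) - s
    # Guard against tiny negative values caused by floating-point rounding.
    return math.sqrt(2.0 * max(0.0, inner))


def weighted_counts(labels, predictions, weights):
    """Return (s, b) for events predicted as signal (prediction == 1).

    labels: 1 for signal, 0 for background (hypothetical encoding)
    """
    s = sum(w for y, p, w in zip(labels, predictions, weights) if p == 1 and y == 1)
    b = sum(w for y, p, w in zip(labels, predictions, weights) if p == 1 and y == 0)
    return s, b
```

A typical use would be `ams(*weighted_counts(labels, predictions, weights))` on a held-out set, with the weights taken from the training data.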
42. Lessons learned
+ Optimize self-learning by doing things from scratch (or
from a default baseline)
+ Kaggle is way more fun than studying with traditional
datasets (iris, cancer, thyroid...)
+ Data science needs good engineering practices!
+ The competition fact sheet was a great way of
assessing what I know I know, what I know I don’t
know…
45. Let’s hack?!
Re-considering PCA
PCD?
Dimensionality Reduction
Stop on best AMS (hack nn toolbox!)
Ensemble
Auto-encoder
MATLAB unit tests
MATLAB continuous integration
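The "Stop on best AMS" idea above can be sketched independently of any toolbox. The slide does not show how the Neural Networks Toolbox would be modified, so this is only an illustration in Python under assumptions: the per-epoch validation AMS values are taken as already computed, and the function name `best_epoch_by_ams` and the `patience` parameter are hypothetical.

```python
def best_epoch_by_ams(val_ams_per_epoch, patience=5):
    """Pick the epoch with the best validation AMS, stopping early.

    val_ams_per_epoch: iterable of validation AMS scores, one per epoch
    patience: stop after this many epochs without improvement
    Returns (best_epoch, best_score); (-1, -inf) if no scores are given.
    """
    best_epoch = -1
    best_score = float("-inf")
    for epoch, score in enumerate(val_ams_per_epoch):
        if score > best_score:
            # New best: remember it (in real training, also save the weights).
            best_epoch, best_score = epoch, score
        elif epoch - best_epoch >= patience:
            # No improvement for `patience` epochs: stop training.
            break
    return best_epoch, best_score
```

The point of the design is that training stops on the competition metric itself rather than on the default validation error, which may not peak at the same epoch.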