SlideShare a Scribd company logo
1 of 46
Download to read offline
Higgs Challenge 
My first attempt at Kaggle: 755st and proud!
@dhianadeva
Err… Kaggle?! 
Platform for data science competitions 
Machine Learning, Big Data, Statistics, Data 
mining ... 
Community for data scientists 
Users, leaderboard, forums … 
Sponsors!
$$$posored competitions!
We don’t need no PhD!
Yes, we can! 
My guilty pleasure: 
Student license of MATLAB <3 
Open source alternatives: 
Python + Scikit + Numpy + … 
R + randomForest + e1071 + caret + … 
Octave!?
Higgs Challenge
Datasets 
Training (labeled): 
250k events 
30 features 
Event id, weight and class (s/b) 
Test (unlabeled): 
18% Public (500k events) 
72% Private
training.csv 
EventId , DER_mass_MMC , … , Weight , Class 
100000 , 138.47 , … , 0.00265331133733 , s 
100001 , 160.937 , … , 2.23358448717 , b 
100002 , -999.0 , … , 2.34738894364 , b 
100003 , 143.905 , … , 5.44637821192 , b 
…
test.csv 
EventId , DER_mass_MMC , … , PRI_jet_all_pt 
350000 , -999 , … , -0.0 
350001 , 106.398 , … , 47.575 
350002 , 117.794 , … , 0.0 
350003 , 135.861 , … , 0.0 
…
submission.csv 
EventId , RankOrder , Class 
350000 , 262328 , b 
350001 , 201479 , b 
350002 , 212810 , b 
350003 , 134945 , b 
…
End-to-end
A little math... 
(Aproximate Median Significance)
755th/1785 secrets 
I’ve entered on the last 8 days of the 127-days 
challenge and could overtake more than half of 
the competitors using: 
MATLAB 2014b (student license) 
Neural Networks Toolbox 
20$ EC2 at Amazon Web Services 
9 code files totaling 674 words
Neural netwhat?! 
Inputs Output 
Neurons
For now, a Black box! 
Inputs Output
It trains 
Output 
Target 
Inputs 
Error
It runs 
Inputs Output
Moonlighting!
8 days! 
1. nprtool 
2. fixunknowns 
3. trainlm 
4. processpca 
5. 0.8 threshold 
6. ams threshold pick 
7. hidden neurons pick 
8. 0.25*targets + amsweights
Some stats...
Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Day 8 
Oops! 
(weighted errors using ams, regularization, mapstd, … nothing worked!)
Lessons learned 
+ Optimize self-learning doing things from scratch (or 
from default baseline) 
+ Kaggle is way funnier than studying with traditional 
datasets (iris, cancer, thyroid...) 
+ Data science needs good engineering practices! 
+ The competition fact sheet was a great way of 
accessing what I know I know, what I know I don’t 
know…
Let’s hack?! 
Re-considering PCA 
PCD? 
Dimensionality Reduction 
Stop on best AMS (hack nn toolbox!) 
Ensemble 
Auto-encoder 
MATLAB unit tests 
MATLAB continuous integration
Thanks! ;)

More Related Content

What's hot

Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
MLconf
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
MLconf
 
Surveys of Image Recoginition.ppt
Surveys of Image Recoginition.pptSurveys of Image Recoginition.ppt
Surveys of Image Recoginition.ppt
Sarang Rakhecha
 

What's hot (19)

Machine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlowMachine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlow
 
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017
 
K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
Josh Patterson MLconf slides
Josh Patterson MLconf slidesJosh Patterson MLconf slides
Josh Patterson MLconf slides
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
 
Towards Machine Intelligence
Towards Machine IntelligenceTowards Machine Intelligence
Towards Machine Intelligence
 
ModuLab DLC-Medical3
ModuLab DLC-Medical3ModuLab DLC-Medical3
ModuLab DLC-Medical3
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
 
Reinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsReinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving Cars
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
 
Surveys of Image Recoginition.ppt
Surveys of Image Recoginition.pptSurveys of Image Recoginition.ppt
Surveys of Image Recoginition.ppt
 

Similar to My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!

Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
Ilya Grigorik
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
Doug Needham
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
Eric Chiang
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
butest
 

Similar to My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud! (20)

Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and Kaggle
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
ML Basic Concepts.pdf
ML Basic Concepts.pdfML Basic Concepts.pdf
ML Basic Concepts.pdf
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
 
CascadiaJS 2015 - Adding intelligence to your JS applications
CascadiaJS 2015 - Adding intelligence to your JS applicationsCascadiaJS 2015 - Adding intelligence to your JS applications
CascadiaJS 2015 - Adding intelligence to your JS applications
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for Everyone
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
A Developer's Guide To Machine Learning
A Developer's Guide To Machine LearningA Developer's Guide To Machine Learning
A Developer's Guide To Machine Learning
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
 
Deep learning
Deep learningDeep learning
Deep learning
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Barga Data Science lecture 4
Barga Data Science lecture 4Barga Data Science lecture 4
Barga Data Science lecture 4
 
Eye deep
Eye deepEye deep
Eye deep
 
Get connected with python
Get connected with pythonGet connected with python
Get connected with python
 
data mining with weka application
data mining with weka applicationdata mining with weka application
data mining with weka application
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 

More from Dhiana Deva

Sistemas de recomendação
Sistemas de recomendaçãoSistemas de recomendação
Sistemas de recomendação
Dhiana Deva
 

More from Dhiana Deva (7)

Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
 
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SP
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SPUm Pouquinho Sobre Métodos Ágeis - Rails Girls SP
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SP
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
We love NLTK
We love NLTKWe love NLTK
We love NLTK
 
AR Post-its @ CBSOFT
AR Post-its @ CBSOFTAR Post-its @ CBSOFT
AR Post-its @ CBSOFT
 
Self-Organizing Maps 101 (Dhiana Deva)
Self-Organizing Maps 101 (Dhiana Deva)Self-Organizing Maps 101 (Dhiana Deva)
Self-Organizing Maps 101 (Dhiana Deva)
 
Sistemas de recomendação
Sistemas de recomendaçãoSistemas de recomendação
Sistemas de recomendação
 

Recently uploaded

Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
amitlee9823
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
amitlee9823
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
amitlee9823
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
amitlee9823
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
amitlee9823
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
9953056974 Low Rate Call Girls In Saket, Delhi NCR
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
amitlee9823
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
amitlee9823
 

Recently uploaded (20)

Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% SecureCall me @ 9892124323  Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
Call me @ 9892124323 Cheap Rate Call Girls in Vashi with Real Photo 100% Secure
 
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
Call Girls Bannerghatta Road Just Call 👗 7737669865 👗 Top Class Call Girl Ser...
 
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
Mg Road Call Girls Service: 🍓 7737669865 🍓 High Profile Model Escorts | Banga...
 
Generative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and MilvusGenerative AI on Enterprise Cloud with NiFi and Milvus
Generative AI on Enterprise Cloud with NiFi and Milvus
 
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service BangaloreCall Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
Call Girls Begur Just Call 👗 7737669865 👗 Top Class Call Girl Service Bangalore
 
BabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptxBabyOno dropshipping via API with DroFx.pptx
BabyOno dropshipping via API with DroFx.pptx
 
ALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptxALSO dropshipping via API with DroFx.pptx
ALSO dropshipping via API with DroFx.pptx
 
Invezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signalsInvezz.com - Grow your wealth with trading signals
Invezz.com - Grow your wealth with trading signals
 
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
Call Girls Bommasandra Just Call 👗 7737669865 👗 Top Class Call Girl Service B...
 
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
Call Girls in Sarai Kale Khan Delhi 💯 Call Us 🔝9205541914 🔝( Delhi) Escorts S...
 
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort ServiceBDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
BDSM⚡Call Girls in Mandawali Delhi >༒8448380779 Escort Service
 
April 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's AnalysisApril 2024 - Crypto Market Report's Analysis
April 2024 - Crypto Market Report's Analysis
 
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
Junnasandra Call Girls: 🍓 7737669865 🍓 High Profile Model Escorts | Bangalore...
 
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICECHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
CHEAP Call Girls in Saket (-DELHI )🔝 9953056974🔝(=)/CALL GIRLS SERVICE
 
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
Vip Mumbai Call Girls Marol Naka Call On 9920725232 With Body to body massage...
 
Anomaly detection and data imputation within time series
Anomaly detection and data imputation within time seriesAnomaly detection and data imputation within time series
Anomaly detection and data imputation within time series
 
Halmar dropshipping via API with DroFx
Halmar  dropshipping  via API with DroFxHalmar  dropshipping  via API with DroFx
Halmar dropshipping via API with DroFx
 
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 nightCheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
Cheap Rate Call girls Sarita Vihar Delhi 9205541914 shot 1500 night
 
VidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptxVidaXL dropshipping via API with DroFx.pptx
VidaXL dropshipping via API with DroFx.pptx
 
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night StandCall Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
Call Girls In Bellandur ☎ 7737669865 🥵 Book Your One night Stand
 

My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!

  • 1. Higgs Challenge My first attempt at Kaggle: 755st and proud!
  • 3. Err… Kaggle?! Platform for data science competitions Machine Learning, Big Data, Statistics, Data mining ... Community for data scientists Users, leaderboard, forums … Sponsors!
  • 4.
  • 5.
  • 7. We don’t need no PhD!
  • 8. Yes, we can! My guilty pleasure: Student license of MATLAB <3 Open source alternatives: Python + Scikit + Numpy + … R + randomForest + e1071 + caret + … Octave!?
  • 9.
  • 11.
  • 12.
  • 13.
  • 14. Datasets Training (labeled): 250k events 30 features Event id, weight and class (s/b) Test (unlabeled): 18% Public (500k events) 72% Private
  • 15.
  • 16. training.csv EventId , DER_mass_MMC , … , Weight , Class 100000 , 138.47 , … , 0.00265331133733 , s 100001 , 160.937 , … , 2.23358448717 , b 100002 , -999.0 , … , 2.34738894364 , b 100003 , 143.905 , … , 5.44637821192 , b …
  • 17. test.csv EventId , DER_mass_MMC , … , PRI_jet_all_pt 350000 , -999 , … , -0.0 350001 , 106.398 , … , 47.575 350002 , 117.794 , … , 0.0 350003 , 135.861 , … , 0.0 …
  • 18. submission.csv EventId , RankOrder , Class 350000 , 262328 , b 350001 , 201479 , b 350002 , 212810 , b 350003 , 134945 , b …
  • 20. A little math... (Aproximate Median Significance)
  • 21. 755th/1785 secrets I’ve entered on the last 8 days of the 127-days challenge and could overtake more than half of the competitors using: MATLAB 2014b (student license) Neural Networks Toolbox 20$ EC2 at Amazon Web Services 9 code files totaling 674 words
  • 22. Neural netwhat?! Inputs Output Neurons
  • 23. For now, a Black box! Inputs Output
  • 24. It trains Output Target Inputs Error
  • 25. It runs Inputs Output
  • 26.
  • 28. 8 days! 1. nprtool 2. fixunknowns 3. trainlm 4. processpca 5. 0.8 threshold 6. ams threshold pick 7. hidden neurons pick 8. 0.25*targets + amsweights
  • 30. Day 1
  • 31.
  • 32. Day 2
  • 33. Day 3
  • 34. Day 4
  • 35.
  • 36.
  • 37. Day 5
  • 38.
  • 39. Day 6
  • 40. Day 7
  • 41. Day 8 Oops! (weighted errors using ams, regularization, mapstd, … nothing worked!)
  • 42. Lessons learned + Optimize self-learning doing things from scratch (or from default baseline) + Kaggle is way funnier than studying with traditional datasets (iris, cancer, thyroid...) + Data science needs good engineering practices! + The competition fact sheet was a great way of accessing what I know I know, what I know I don’t know…
  • 43.
  • 44.
  • 45. Let’s hack?! Re-considering PCA PCD? Dimensionality Reduction Stop on best AMS (hack nn toolbox!) Ensemble Auto-encoder MATLAB unit tests MATLAB continuous integration