SlideShare a Scribd company logo
1 of 46
Download to read offline
Higgs Challenge 
My first attempt at Kaggle: 755st and proud!
@dhianadeva
Err… Kaggle?! 
Platform for data science competitions 
Machine Learning, Big Data, Statistics, Data 
mining ... 
Community for data scientists 
Users, leaderboard, forums … 
Sponsors!
$$$posored competitions!
We don’t need no PhD!
Yes, we can! 
My guilty pleasure: 
Student license of MATLAB <3 
Open source alternatives: 
Python + Scikit + Numpy + … 
R + randomForest + e1071 + caret + … 
Octave!?
Higgs Challenge
Datasets 
Training (labeled): 
250k events 
30 features 
Event id, weight and class (s/b) 
Test (unlabeled): 
18% Public (500k events) 
72% Private
training.csv 
EventId , DER_mass_MMC , … , Weight , Class 
100000 , 138.47 , … , 0.00265331133733 , s 
100001 , 160.937 , … , 2.23358448717 , b 
100002 , -999.0 , … , 2.34738894364 , b 
100003 , 143.905 , … , 5.44637821192 , b 
…
test.csv 
EventId , DER_mass_MMC , … , PRI_jet_all_pt 
350000 , -999 , … , -0.0 
350001 , 106.398 , … , 47.575 
350002 , 117.794 , … , 0.0 
350003 , 135.861 , … , 0.0 
…
submission.csv 
EventId , RankOrder , Class 
350000 , 262328 , b 
350001 , 201479 , b 
350002 , 212810 , b 
350003 , 134945 , b 
…
End-to-end
A little math... 
(Aproximate Median Significance)
755th/1785 secrets 
I’ve entered on the last 8 days of the 127-days 
challenge and could overtake more than half of 
the competitors using: 
MATLAB 2014b (student license) 
Neural Networks Toolbox 
20$ EC2 at Amazon Web Services 
9 code files totaling 674 words
Neural netwhat?! 
Inputs Output 
Neurons
For now, a Black box! 
Inputs Output
It trains 
Output 
Target 
Inputs 
Error
It runs 
Inputs Output
Moonlighting!
8 days! 
1. nprtool 
2. fixunknowns 
3. trainlm 
4. processpca 
5. 0.8 threshold 
6. ams threshold pick 
7. hidden neurons pick 
8. 0.25*targets + amsweights
Some stats...
Day 1
Day 2
Day 3
Day 4
Day 5
Day 6
Day 7
Day 8 
Oops! 
(weighted errors using ams, regularization, mapstd, … nothing worked!)
Lessons learned 
+ Optimize self-learning doing things from scratch (or 
from default baseline) 
+ Kaggle is way funnier than studying with traditional 
datasets (iris, cancer, thyroid...) 
+ Data science needs good engineering practices! 
+ The competition fact sheet was a great way of 
accessing what I know I know, what I know I don’t 
know…
Let’s hack?! 
Re-considering PCA 
PCD? 
Dimensionality Reduction 
Stop on best AMS (hack nn toolbox!) 
Ensemble 
Auto-encoder 
MATLAB unit tests 
MATLAB continuous integration
Thanks! ;)

More Related Content

What's hot

Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...MLconf
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Manuel Martín
 
Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Julian Lee
 
K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션DACON AI 데이콘
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnBenjamin Bengfort
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning OverviewMykhailo Koval
 
Josh Patterson MLconf slides
Josh Patterson MLconf slidesJosh Patterson MLconf slides
Josh Patterson MLconf slidesMLconf
 
ModuLab DLC-Medical3
ModuLab DLC-Medical3ModuLab DLC-Medical3
ModuLab DLC-Medical3Dongheon Lee
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016MLconf
 
Reinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsReinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsSneha Ravikumar
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainabilitygeetachauhan
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesHJ van Veen
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016MLconf
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_financeStefan Duprey
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in Rmikaelhuss
 
Surveys of Image Recoginition.ppt
Surveys of Image Recoginition.pptSurveys of Image Recoginition.ppt
Surveys of Image Recoginition.pptSarang Rakhecha
 

What's hot (19)

Machine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlowMachine Intelligence at Google Scale: TensorFlow
Machine Intelligence at Google Scale: TensorFlow
 
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
Le Song, Assistant Professor, College of Computing, Georgia Institute of Tech...
 
Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?Automating Machine Learning - Is it feasible?
Automating Machine Learning - Is it feasible?
 
Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017Microsoft Data Wranglers - 8august2017
Microsoft Data Wranglers - 8august2017
 
K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션K-Fashion 경진대회 3등 수상자 솔루션
K-Fashion 경진대회 3등 수상자 솔루션
 
Introduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-LearnIntroduction to Machine Learning with SciKit-Learn
Introduction to Machine Learning with SciKit-Learn
 
Machine Learning Overview
Machine Learning OverviewMachine Learning Overview
Machine Learning Overview
 
Josh Patterson MLconf slides
Josh Patterson MLconf slidesJosh Patterson MLconf slides
Josh Patterson MLconf slides
 
Ds for finance day 3
Ds for finance day 3Ds for finance day 3
Ds for finance day 3
 
Towards Machine Intelligence
Towards Machine IntelligenceTowards Machine Intelligence
Towards Machine Intelligence
 
ModuLab DLC-Medical3
ModuLab DLC-Medical3ModuLab DLC-Medical3
ModuLab DLC-Medical3
 
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016Daniel Shank, Data Scientist, Talla at MLconf SF 2016
Daniel Shank, Data Scientist, Talla at MLconf SF 2016
 
Reinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving CarsReinforcement Learning for Self Driving Cars
Reinforcement Learning for Self Driving Cars
 
Profiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & SustainabilityProfiling PyTorch for Efficiency & Sustainability
Profiling PyTorch for Efficiency & Sustainability
 
Using Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar DressesUsing Deep Learning to Find Similar Dresses
Using Deep Learning to Find Similar Dresses
 
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
Erin LeDell, Machine Learning Scientist, H2O.ai at MLconf ATL 2016
 
Machine learning for_finance
Machine learning for_financeMachine learning for_finance
Machine learning for_finance
 
Deep learning with Tensorflow in R
Deep learning with Tensorflow in RDeep learning with Tensorflow in R
Deep learning with Tensorflow in R
 
Surveys of Image Recoginition.ppt
Surveys of Image Recoginition.pptSurveys of Image Recoginition.ppt
Surveys of Image Recoginition.ppt
 

Similar to My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!

Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleYvonne K. Matos
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsKrishna Sankar
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017StampedeCon
 
ML Basic Concepts.pdf
ML Basic Concepts.pdfML Basic Concepts.pdf
ML Basic Concepts.pdfManishaS49
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIlya Grigorik
 
CascadiaJS 2015 - Adding intelligence to your JS applications
CascadiaJS 2015 - Adding intelligence to your JS applicationsCascadiaJS 2015 - Adding intelligence to your JS applications
CascadiaJS 2015 - Adding intelligence to your JS applicationsKevin Dela Rosa
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneDhiana Deva
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science ChallengeMark Nichols, P.E.
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupDoug Needham
 
A Developer's Guide To Machine Learning
A Developer's Guide To Machine LearningA Developer's Guide To Machine Learning
A Developer's Guide To Machine LearningSefik Ilkin Serengil
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify Dataconomy Media
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18DataconomyGmbH
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsEric Chiang
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273Abutest
 
Barga Data Science lecture 4
Barga Data Science lecture 4Barga Data Science lecture 4
Barga Data Science lecture 4Roger Barga
 
Get connected with python
Get connected with pythonGet connected with python
Get connected with pythonJan Kroon
 
data mining with weka application
data mining with weka applicationdata mining with weka application
data mining with weka applicationRezapourabbas
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...Herman Wu
 

Similar to My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud! (20)

Learning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and KaggleLearning Predictive Modeling with TSA and Kaggle
Learning Predictive Modeling with TSA and Kaggle
 
Data Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science CompetitionsData Wrangling For Kaggle Data Science Competitions
Data Wrangling For Kaggle Data Science Competitions
 
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
End-to-end Big Data Projects with Python - StampedeCon Big Data Conference 2017
 
ML Basic Concepts.pdf
ML Basic Concepts.pdfML Basic Concepts.pdf
ML Basic Concepts.pdf
 
Intelligent Ruby + Machine Learning
Intelligent Ruby + Machine LearningIntelligent Ruby + Machine Learning
Intelligent Ruby + Machine Learning
 
CascadiaJS 2015 - Adding intelligence to your JS applications
CascadiaJS 2015 - Adding intelligence to your JS applicationsCascadiaJS 2015 - Adding intelligence to your JS applications
CascadiaJS 2015 - Adding intelligence to your JS applications
 
QCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for EveryoneQCon Rio - Machine Learning for Everyone
QCon Rio - Machine Learning for Everyone
 
Cloudera Data Science Challenge
Cloudera Data Science ChallengeCloudera Data Science Challenge
Cloudera Data Science Challenge
 
Data Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup GroupData Science Challenge presentation given to the CinBITools Meetup Group
Data Science Challenge presentation given to the CinBITools Meetup Group
 
A Developer's Guide To Machine Learning
A Developer's Guide To Machine LearningA Developer's Guide To Machine Learning
A Developer's Guide To Machine Learning
 
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
DN18 | The Data Janitor Returns | Daniel Molnar | Oberlo/Shopify
 
The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18The Data Janitor Returns | Daniel Molnar | DN18
The Data Janitor Returns | Daniel Molnar | DN18
 
Next.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev OpsNext.ml Boston: Data Science Dev Ops
Next.ml Boston: Data Science Dev Ops
 
Deep learning
Deep learningDeep learning
Deep learning
 
Machine Learning ICS 273A
Machine Learning ICS 273AMachine Learning ICS 273A
Machine Learning ICS 273A
 
Barga Data Science lecture 4
Barga Data Science lecture 4Barga Data Science lecture 4
Barga Data Science lecture 4
 
Eye deep
Eye deepEye deep
Eye deep
 
Get connected with python
Get connected with pythonGet connected with python
Get connected with python
 
data mining with weka application
data mining with weka applicationdata mining with weka application
data mining with weka application
 
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
運用CNTK 實作深度學習物件辨識 Deep Learning based Object Detection with Microsoft Cogniti...
 

More from Dhiana Deva

Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Dhiana Deva
 
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SP
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SPUm Pouquinho Sobre Métodos Ágeis - Rails Girls SP
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SPDhiana Deva
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data ScienceDhiana Deva
 
AR Post-its @ CBSOFT
AR Post-its @ CBSOFTAR Post-its @ CBSOFT
AR Post-its @ CBSOFTDhiana Deva
 
Self-Organizing Maps 101 (Dhiana Deva)
Self-Organizing Maps 101 (Dhiana Deva)Self-Organizing Maps 101 (Dhiana Deva)
Self-Organizing Maps 101 (Dhiana Deva)Dhiana Deva
 
Sistemas de recomendação
Sistemas de recomendaçãoSistemas de recomendação
Sistemas de recomendaçãoDhiana Deva
 

More from Dhiana Deva (7)

Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
Machine Learning: Opening the Pandora's Box - Dhiana Deva @ QCon São Paulo 2019
 
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SP
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SPUm Pouquinho Sobre Métodos Ágeis - Rails Girls SP
Um Pouquinho Sobre Métodos Ágeis - Rails Girls SP
 
Agile Data Science
Agile Data ScienceAgile Data Science
Agile Data Science
 
We love NLTK
We love NLTKWe love NLTK
We love NLTK
 
AR Post-its @ CBSOFT
AR Post-its @ CBSOFTAR Post-its @ CBSOFT
AR Post-its @ CBSOFT
 
Self-Organizing Maps 101 (Dhiana Deva)
Self-Organizing Maps 101 (Dhiana Deva)Self-Organizing Maps 101 (Dhiana Deva)
Self-Organizing Maps 101 (Dhiana Deva)
 
Sistemas de recomendação
Sistemas de recomendaçãoSistemas de recomendação
Sistemas de recomendação
 

Recently uploaded

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝soniya singh
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfSocial Samosa
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxStephen266013
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一fhwihughh
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degreeyuu sss
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfJohn Sterrett
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Colleen Farrelly
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home ServiceSapana Sha
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationshipsccctableauusergroup
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...soniya singh
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdfHuman37
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDRafezzaman
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubaihf8803863
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptSonatrach
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queensdataanalyticsqueen03
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFAAndrei Kaleshka
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024thyngster
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...Boston Institute of Analytics
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...limedy534
 

Recently uploaded (20)

Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
Call Girls in Defence Colony Delhi 💯Call Us 🔝8264348440🔝
 
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdfKantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
Kantar AI Summit- Under Embargo till Wednesday, 24th April 2024, 4 PM, IST.pdf
 
B2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docxB2 Creative Industry Response Evaluation.docx
B2 Creative Industry Response Evaluation.docx
 
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
办理学位证纽约大学毕业证(NYU毕业证书)原版一比一
 
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
办美国阿肯色大学小石城分校毕业证成绩单pdf电子版制作修改#真实留信入库#永久存档#真实可查#diploma#degree
 
DBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdfDBA Basics: Getting Started with Performance Tuning.pdf
DBA Basics: Getting Started with Performance Tuning.pdf
 
Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024Generative AI for Social Good at Open Data Science East 2024
Generative AI for Social Good at Open Data Science East 2024
 
9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service9654467111 Call Girls In Munirka Hotel And Home Service
9654467111 Call Girls In Munirka Hotel And Home Service
 
04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships04242024_CCC TUG_Joins and Relationships
04242024_CCC TUG_Joins and Relationships
 
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
High Class Call Girls Noida Sector 39 Aarushi 🔝8264348440🔝 Independent Escort...
 
20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf20240419 - Measurecamp Amsterdam - SAM.pdf
20240419 - Measurecamp Amsterdam - SAM.pdf
 
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTDINTERNSHIP ON PURBASHA COMPOSITE TEX LTD
INTERNSHIP ON PURBASHA COMPOSITE TEX LTD
 
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
Deep Generative Learning for All - The Gen AI Hype (Spring 2024)
 
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls DubaiDubai Call Girls Wifey O52&786472 Call Girls Dubai
Dubai Call Girls Wifey O52&786472 Call Girls Dubai
 
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.pptdokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
dokumen.tips_chapter-4-transient-heat-conduction-mehmet-kanoglu.ppt
 
Top 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In QueensTop 5 Best Data Analytics Courses In Queens
Top 5 Best Data Analytics Courses In Queens
 
How we prevented account sharing with MFA
How we prevented account sharing with MFAHow we prevented account sharing with MFA
How we prevented account sharing with MFA
 
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
Consent & Privacy Signals on Google *Pixels* - MeasureCamp Amsterdam 2024
 
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
NLP Data Science Project Presentation:Predicting Heart Disease with NLP Data ...
 
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
Effects of Smartphone Addiction on the Academic Performances of Grades 9 to 1...
 

My First Attempt on Kaggle - Higgs Machine Learning Challenge: 755st and Proud!

  • 1. Higgs Challenge My first attempt at Kaggle: 755st and proud!
  • 3. Err… Kaggle?! Platform for data science competitions Machine Learning, Big Data, Statistics, Data mining ... Community for data scientists Users, leaderboard, forums … Sponsors!
  • 4.
  • 5.
  • 7. We don’t need no PhD!
  • 8. Yes, we can! My guilty pleasure: Student license of MATLAB <3 Open source alternatives: Python + Scikit + Numpy + … R + randomForest + e1071 + caret + … Octave!?
  • 9.
  • 11.
  • 12.
  • 13.
  • 14. Datasets Training (labeled): 250k events 30 features Event id, weight and class (s/b) Test (unlabeled): 18% Public (500k events) 72% Private
  • 15.
  • 16. training.csv EventId , DER_mass_MMC , … , Weight , Class 100000 , 138.47 , … , 0.00265331133733 , s 100001 , 160.937 , … , 2.23358448717 , b 100002 , -999.0 , … , 2.34738894364 , b 100003 , 143.905 , … , 5.44637821192 , b …
  • 17. test.csv EventId , DER_mass_MMC , … , PRI_jet_all_pt 350000 , -999 , … , -0.0 350001 , 106.398 , … , 47.575 350002 , 117.794 , … , 0.0 350003 , 135.861 , … , 0.0 …
  • 18. submission.csv EventId , RankOrder , Class 350000 , 262328 , b 350001 , 201479 , b 350002 , 212810 , b 350003 , 134945 , b …
  • 20. A little math... (Aproximate Median Significance)
  • 21. 755th/1785 secrets I’ve entered on the last 8 days of the 127-days challenge and could overtake more than half of the competitors using: MATLAB 2014b (student license) Neural Networks Toolbox 20$ EC2 at Amazon Web Services 9 code files totaling 674 words
  • 22. Neural netwhat?! Inputs Output Neurons
  • 23. For now, a Black box! Inputs Output
  • 24. It trains Output Target Inputs Error
  • 25. It runs Inputs Output
  • 26.
  • 28. 8 days! 1. nprtool 2. fixunknowns 3. trainlm 4. processpca 5. 0.8 threshold 6. ams threshold pick 7. hidden neurons pick 8. 0.25*targets + amsweights
  • 30. Day 1
  • 31.
  • 32. Day 2
  • 33. Day 3
  • 34. Day 4
  • 35.
  • 36.
  • 37. Day 5
  • 38.
  • 39. Day 6
  • 40. Day 7
  • 41. Day 8 Oops! (weighted errors using ams, regularization, mapstd, … nothing worked!)
  • 42. Lessons learned + Optimize self-learning doing things from scratch (or from default baseline) + Kaggle is way funnier than studying with traditional datasets (iris, cancer, thyroid...) + Data science needs good engineering practices! + The competition fact sheet was a great way of accessing what I know I know, what I know I don’t know…
  • 43.
  • 44.
  • 45. Let’s hack?! Re-considering PCA PCD? Dimensionality Reduction Stop on best AMS (hack nn toolbox!) Ensemble Auto-encoder MATLAB unit tests MATLAB continuous integration