How to get into Kaggle?
Philipp Singer & Dmitry Gordeev
Vienna Data Science Meetup, Vienna, Dec 5th 2019
Who we are
● Philipp
○ Data scientist at UNIQA
○ PhD in CS at TU Graz
○ Profound experience in ML research and applications
○ Kaggle competition master currently ranked 36th
● Dmitry
○ Data scientist at UNIQA
○ Master’s degree in data mining
○ In-depth experience of ML applications in financial institutes
○ Kaggle competition grandmaster currently ranked 34th
● Competing successfully together on Kaggle for 1 year: The Zoo
What is Kaggle?
● “Your home for Data Science”
○ Online community of data scientists and machine learners
○ Founded in 2010
○ Acquired by Google in 2017
● Data science competitions
● Share notebooks, datasets, and discussions
● Courses and tutorials
● Free notebook infrastructure with CPUs and GPUs
How big is Kaggle?
● The most popular ML competition platform
● The largest ML community
● 125 000+ users
● 350 completed competitions
● Up to 10 000 users per competition
● Prize fund usually $20,000 to $100,000
Kaggle survey results
Competitions on Kaggle
● Usually hosted by companies or research institutes
● Main goal: prediction
● Wide range of different types of competitions
○ Different types of domains (e.g., financial, medical, sports, …)
○ Different types of data (e.g., tabular, text (NLP), image, video, time series, …)
○ Different types of objectives (e.g., classification, regression, segmentation, …)
○ Different goals of competitions (featured, research, playground, in-class)
● Built-in progression system with medals and ranks
● Top spots usually receive prize money
Competition medals
User ranking + titles
How competitions usually work
https://mc.ai/pseudo-labeling/
The Zoo
● Started competing under the team name “The Zoo” exactly one year ago
● Little prior experience on Kaggle
● Participated in 7 competitions
● Strategy: diversify types of competitions for learning purposes
Our Journey
Quora
Develop models that identify and flag insincere questions.
● 1 306 122 labelled questions
● 6.2% insincere questions
● 4 037 teams
● 2 hours to fit and predict
Quora - sincere/insincere
How can I become a data scientist?
How come Trump is so stupid?
Is it possible for a vegan who does crossfit to go 10 minutes without telling someone about it?
Everytime I slap myself in the face, it hurts. How can I prevent this?
Quora - solution
Quora - final standings
Santander
Identify which customers will make a specific transaction in the future.
● 200 000 transactions
● 8 802 teams
● 2 months duration
Santander - the mysterious data
Santander - solution
Santander - final standings
LANL Earthquake Prediction
Predict the time remaining before laboratory earthquakes occur from real-time seismic data.
● 629 145 480 data points
● 4 200 training segments
● 4 540 teams
● 30 minutes to fit and predict
LANL - the physics
LANL - solution
● Derived a handful of features from the data, capturing peaks and volatility of the acoustic signal
● Combination (ensemble) of two state-of-the-art modeling approaches
○ Gradient Boosting Regression Trees
○ Neural Network (Deep Learning)
● Novel statistical data adjustment to account for different earthquake cycles
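A minimal sketch of what peak and volatility features for one acoustic segment might look like. The specific statistics, the function name `segment_features`, and the 2-standard-deviation peak threshold are illustrative assumptions; the slides do not list the features the team actually derived.

```python
import statistics


def segment_features(signal, peak_sigmas=2.0):
    """Summarise one acoustic segment (needs at least 2 samples).

    All feature choices here are hypothetical examples of "peak" and
    "volatility" statistics, not the features from the talk.
    """
    mean = statistics.fmean(signal)
    std = statistics.pstdev(signal)  # overall volatility of the segment
    abs_dev = [abs(x - mean) for x in signal]
    abs_diff = [abs(b - a) for a, b in zip(signal, signal[1:])]
    return {
        "std": std,
        # Largest excursion from the segment mean (the highest peak).
        "max_abs_dev": max(abs_dev),
        # Average sample-to-sample jump, a simple roughness measure.
        "mean_abs_diff": statistics.fmean(abs_diff),
        # Samples further than `peak_sigmas` standard deviations from the mean.
        "peak_count": sum(1 for d in abs_dev if d > peak_sigmas * std),
    }


# Toy segment: a flat signal with a single spike.
features = segment_features([0, 0, 0, 0, 10, 0, 0, 0, 0, 0])
print(features)
```

Each training segment would yield one such feature row, which could then be fed to a gradient boosting model or a neural network.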
LANL - final standings
APTOS Blindness Detection
Detect diabetic retinopathy to stop blindness before it's too late!
● 3 662 retina images
● 0 - 4 retinopathy levels
● 2 943 teams
● 15 000 evaluation images
APTOS
Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.
https://www.eyeops.com/contents/our-services/eye-diseases/diabetic-retinopathy; https://www.vequill.com/how-to-cure-temporary-blindness/
APTOS - solution
● Careful image pre-processing to remove any kind of bias (e.g., device)
● Combination of several current best deep neural networks
● Models are pre-trained on a large collection of image data (ImageNet + extra retina images)
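One simple way to combine several networks is to average their outputs per image. The sketch below assumes each model emits a continuous severity score that is then rounded to a retinopathy level from 0 to 4. That assumption, the function name `blend_levels`, and the optional weights are illustrative only; the slides do not say how the team combined its models.

```python
import math


def blend_levels(model_predictions, weights=None):
    """Blend per-image scores from several models into levels 0..4.

    `model_predictions` is a list with one list of scores per model
    (a hypothetical format chosen for this example).
    """
    n_models = len(model_predictions)
    if weights is None:
        weights = [1.0 / n_models] * n_models  # plain average by default
    n_images = len(model_predictions[0])
    levels = []
    for i in range(n_images):
        score = sum(w * preds[i] for w, preds in zip(weights, model_predictions))
        # Round half up, then clip to the valid label range 0..4.
        levels.append(min(4, max(0, math.floor(score + 0.5))))
    return levels


# Three hypothetical models scoring the same three images.
scores = [[0.2, 2.6, 4.4], [0.4, 3.0, 3.9], [0.0, 2.2, 4.9]]
print(blend_levels(scores))
```

Averaging tends to help most when the individual models make different kinds of errors, which is one reason to combine several architectures.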
APTOS - final standings
Quiz
● Did I have relevant experience to enter this competition?
Data: Atomic elements (H for hydrogen, C for carbon, etc.) and their X, Y, Z Cartesian coordinates.
Task: Develop an algorithm that can predict the magnetic interaction between two atoms in a molecule.
Why should you start on Kaggle?
● Doing is the best way to learn
● Get in touch with data and use cases outside your main domain
● Keep up-to-date with state-of-the-art methods
● Learn from others
● Measure yourself and know where you stand
● Hardware and software are provided by Kaggle
Easy start
How can you start on Kaggle?
● Don’t be afraid! Just do it!
● Overcome self-handicapping behavior
● You gain points regardless of the result
● “Getting started” competitions
● Pick a competition that sounds exciting to you; don’t be afraid to pick one where you have no prior experience
● Research similar previous competitions and read solutions
● Follow published notebooks and discussions
Learn from the community
How to approach a competition?
● Choose a programming language (usually Python or R)
● Understand the problem setting, get a feeling for the data and the metric
● Exploratory Data Analysis (EDA)
● Implement a basic script / notebook from scratch that does training and prediction, OR just fork someone’s model ;-)
● Think hard about a robust cross-validation (CV) setup
● Keep up-to-date on discussions and developments in the competition
● Experiment a lot and iterate quickly
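One possible starting point for a robust CV setup on an imbalanced target (for example, the 6.2% insincere share in the Quora data) is stratified k-fold splitting, so that every fold keeps roughly the same class ratio. The sketch below uses only the Python standard library; the function names and the 5-fold default are assumptions for illustration, not the setup used by The Zoo.

```python
import random
from collections import defaultdict


def stratified_folds(labels, n_folds=5, seed=0):
    """Assign sample indices to folds so each fold keeps the class ratio."""
    rng = random.Random(seed)  # fixed seed keeps the folds reproducible
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)
    folds = [[] for _ in range(n_folds)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the indices of each class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % n_folds].append(idx)
    return folds


def train_valid_splits(labels, n_folds=5, seed=0):
    """Yield (train_indices, valid_indices), one pair per fold."""
    folds = stratified_folds(labels, n_folds, seed)
    for k, valid in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != k for idx in fold]
        yield sorted(train), sorted(valid)


# Toy target with 10% positives.
toy_labels = [1] * 10 + [0] * 90
for train_idx, valid_idx in train_valid_splits(toy_labels):
    positives = sum(toy_labels[i] for i in valid_idx)
    print(len(train_idx), len(valid_idx), positives)
```

Reusing the same folds for every experiment makes local scores comparable between iterations. Data with groups or a time order would need a different splitting scheme.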
Try more, fail fast
(Figure labels: baseline model, final model)
Thanks!
Get in touch with us! We are open to any inquiries.
me@philippsinger.com
dott1718@gmail.com
@ph_singer @dott1718

How to get into Kaggle? by Philipp Singer and Dmitry Gordeev

  • 1. How to get into Kaggle? Philipp Singer & Dmitry Gordeev. Vienna Data Science Meetup, Vienna, Dec 5th 2019
  • 2. Who we are
    ● Philipp
      ○ Data scientist at UNIQA
      ○ PhD in CS at TU Graz
      ○ Profound experience in ML research and applications
      ○ Kaggle competition master currently ranked 36th
    ● Dmitry
      ○ Data scientist at UNIQA
      ○ Master’s degree in data mining
      ○ In-depth experience of ML applications in financial institutes
      ○ Kaggle competition grandmaster currently ranked 34th
    ● Competing successfully together on Kaggle for 1 year: The Zoo
  • 3. What is Kaggle?
    ● “Your home for Data Science”
      ○ Online community of data scientists and machine learners
      ○ Founded in 2010
      ○ Acquired by Google in 2017
    ● Data science competitions
    ● Share notebooks, datasets, and discussions
    ● Courses and tutorials
    ● Free notebook infrastructure with CPUs and GPUs
  • 4. How big is Kaggle
    ● The most popular ML competition platform
    ● The largest ML community
    125 000+ users
    350 completed competitions
    up to 10 000 users per competition
    Usually 20,000 $ - 100,000 $ prize fund
  • 9. Competitions on Kaggle
    ● Usually hosted by companies or research institutes
    ● Main goal: prediction
    ● Wide range of different types of competitions
      ○ Different types of domains (e.g., financial, medical, sports, …)
      ○ Different types of data (e.g., tabular, NLP, image, videos, time-series, …)
      ○ Different types of objectives (e.g., classification, regression, segmentation, …)
      ○ Different goals of competitions (featured, research, playground, in-class)
    ● Built-in progression system with medals and ranks
    ● Top spots usually receive prize money
  • 11. User ranking + titles
  • 12. How competitions usually work
    https://mc.ai/pseudo-labeling/
  • 13. The Zoo
    ● Started competing under the team name “The Zoo” exactly one year ago
    ● Little prior experience on Kaggle
    ● Participated in 7 competitions
    ● Strategy: diversify types of competitions for learning purposes
  • 15. Quora
    Develop models that identify and flag insincere questions.
    1 306 122 labelled questions
    6.2% insincere questions
    4 037 teams
    2 hours to fit and predict
  • 16. Quora - sincere/insincere
    How can I become a data scientist?
    How come Trump is so stupid?
    Is it possible for a vegan who does crossfit to go 10 minutes without telling someone about it?
    Everytime I slap myself in the face, it hurts. How can I prevent this?
  • 18. Quora - final standings
  • 19. Santander
    Identify which customers will make a specific transaction in the future
    200 000 transactions
    8 802 teams
    2 months duration
  • 20. Santander - the mysterious data
  • 22. Santander - final standings
  • 23. LANL Earthquake Prediction
    Predict the time remaining before laboratory earthquakes occur from real-time seismic data.
    629 145 480 data points
    4 200 training segments
    4 540 teams
    30 minutes to fit and predict
  • 24. LANL - the physics
  • 25. LANL - solution
    ● Derived a handful of features from the data capturing peaks and volatility of the acoustic signal
    ● Combination (ensemble) of two state-of-the-art modeling approaches
      ○ Gradient Boosting Regression Trees
      ○ Neural Network (Deep Learning)
    ● Novel statistical data adjustment to account for different earthquake cycles
  • 26. LANL - final standings
  • 27. APTOS Blindness Detection
    Detect diabetic retinopathy to stop blindness before it's too late!
    3 662 retina images
    0 - 4 retinopathy levels
    2 943 teams
    15 000 evaluation images
    Diabetic retinopathy is the leading cause of blindness in the working-age population of the developed world. It is estimated to affect over 93 million people.
  • 29. APTOS - solution
    ● Careful image pre-processing to remove any kind of bias (e.g., device)
    ● Combination of several of the current best deep neural networks
    ● Models are pre-trained on a large collection of image data (ImageNet + extra retina images)
  • 30. APTOS - final standings
  • 31. Quiz
    ● Did I have relevant experience to enter this competition?
    Data: Atomic elements (H for hydrogen, C for carbon etc.) and their X, Y, Z cartesian coordinates.
    Task: Develop an algorithm that can predict the magnetic interaction between two atoms in a molecule.
  • 32. Why should you start on Kaggle?
    ● Doing is the best way to learn
    ● Get in touch with data and use cases outside your main domain
    ● Keep up-to-date with state-of-the-art methods
    ● Learn from others
    ● Measure yourself and know where you stand
    ● Hardware and software are provided by Kaggle
  • 34. How can you start on Kaggle?
    ● Don’t be afraid! Just do it!
    ● Overcome self-handicapping behavior
    ● You gain points regardless of the result
    ● “Getting started” competitions
    ● Pick a competition that sounds exciting to you; don’t be afraid to pick one where you have no prior experience
    ● Research similar previous competitions and read solutions
    ● Follow published notebooks and discussions
  • 35. Learn from the community
  • 36. How to approach a competition?
    ● Choose a programming language (usually Python or R)
    ● Understand the problem setting, get a feeling for the data and the metric
    ● Exploratory Data Analysis (EDA)
    ● Implement a basic script / notebook from scratch doing training and prediction OR just fork someone’s model ;-)
    ● Think hard about a robust CV setup
    ● Keep up-to-date on discussions and developments of the competition
    ● Experiment a lot and iterate quickly
  • 37. Try more, fail fast
    Figure labels: Baseline model, Final model
  • 38. Thanks! Get in touch with us! We are open to any inquiries.
    me@philippsinger.com
    dott1718@gmail.com
    @ph_singer
    @dott1718
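
The advice on slide 36 to "think hard about a robust CV setup" can be made concrete with a small sketch. The slides do not say which validation scheme the authors used, so the following is only an illustration: a minimal stratified k-fold splitter in plain Python, applied to toy labels that mimic the 6.2% positive rate quoted for the Quora data. The function name `stratified_kfold` and the toy data are assumptions made for this example; in practice, a library implementation such as scikit-learn's `StratifiedKFold` would normally be used instead.

```python
import random
from collections import defaultdict


def stratified_kfold(labels, n_splits=5, seed=0):
    """Yield (train_idx, valid_idx) pairs whose folds keep similar class ratios.

    Hypothetical helper written for illustration; not taken from the slides.
    """
    rng = random.Random(seed)

    # Group sample indices by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    # Shuffle within each class, then deal indices round-robin into folds
    # so that every fold receives a near-equal share of each class.
    folds = [[] for _ in range(n_splits)]
    for indices in by_class.values():
        rng.shuffle(indices)
        for position, idx in enumerate(indices):
            folds[position % n_splits].append(idx)

    # Each fold serves once as the validation set; the rest form the training set.
    for k in range(n_splits):
        valid_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, valid_idx


# Toy labels (assumed data): 62 positives out of 1 000, i.e. a 6.2% positive rate.
labels = [1] * 62 + [0] * 938
splits = list(stratified_kfold(labels, n_splits=5, seed=42))

# Share of positives in each validation fold; all should stay close to 6.2%.
fold_positive_rates = [
    sum(labels[i] for i in valid_idx) / len(valid_idx)
    for _, valid_idx in splits
]
```

With a rare positive class, a purely random split could leave some folds with noticeably fewer positives than others, which makes fold scores noisy. Stratifying keeps each validation fold representative, so local scores are more likely to track the leaderboard.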