Kaggle: The home of data science
GE Flight Quest 2. Optimize flight routes based on weather & traffic. $250,000, 122 teams
Hewlett Foundation: Automated Essay Scoring. Develop an automated scoring algorithm for student-written essays. $100,000, 155 teams
Allstate Purchase Prediction Challenge. Predict a purchased policy based on transaction history. $50,000, 1,570 teams
Merck Molecular Activity Challenge. Help develop safe and effective medicines by predicting molecular activity. $40,000, 236 teams
Higgs Boson Machine Learning Challenge. Use data from the ATLAS experiment to identify the Higgs boson. $13,000, 1,302 teams
The Kaggle Approach

Training Data
Age   Income    Default
58    $95,824   True
73    $20,708   False
59    $82,152   False
66    $25,334   True

Test Data
Age   Income    Default
73    $53,445   ?
61    $36,679   ?
47    $90,422   ?
44    $79,040   ?
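A minimal sketch of the setup on this slide, assuming a toy model: competitors see the labeled training rows, while the host keeps the test labels hidden and scores submitted predictions against them. The 1-nearest-neighbour classifier and the `accuracy` scorer below are hypothetical illustrations, not Kaggle's scoring code or any competitor's method.

```python
import math
import statistics

# Labeled training rows from the slide: (age, income, default)
TRAIN = [
    (58, 95_824, True),
    (73, 20_708, False),
    (59, 82_152, False),
    (66, 25_334, True),
]

# Test rows from the slide: (age, income); the host withholds the labels
TEST = [
    (73, 53_445),
    (61, 36_679),
    (47, 90_422),
    (44, 79_040),
]


def fit_scaler(rows):
    """Per-feature (mean, population std), computed on training rows only."""
    return [(statistics.mean(col), statistics.pstdev(col)) for col in zip(*rows)]


def scale(row, params):
    """Standardise a row so age and income contribute on comparable scales."""
    # `or 1.0` guards against a constant column (std == 0)
    return [(x - mean) / (std or 1.0) for x, (mean, std) in zip(row, params)]


def predict_1nn(train, test):
    """Give each test row the label of its nearest training row (Euclidean distance)."""
    features = [(age, income) for age, income, _ in train]
    labels = [label for _, _, label in train]
    params = fit_scaler(features)
    scaled_train = [scale(f, params) for f in features]
    predictions = []
    for row in test:
        x = scale(row, params)
        distances = [math.dist(x, t) for t in scaled_train]
        predictions.append(labels[distances.index(min(distances))])
    return predictions


def accuracy(predictions, hidden_labels):
    """Hypothetical host-side scorer: share of predictions matching the withheld labels."""
    if len(predictions) != len(hidden_labels):
        raise ValueError("need exactly one prediction per test row")
    return sum(p == y for p, y in zip(predictions, hidden_labels)) / len(hidden_labels)


if __name__ == "__main__":
    print(predict_1nn(TRAIN, TEST))  # one True/False prediction per test row
```

Running the script prints one prediction per test row; a host would then call `accuracy` with the labels it withheld. Standardising first matters here because income is in the tens of thousands while age is in the tens, so unscaled distances would ignore age almost entirely.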
Mapping Dark Matter
[Chart: competition progress, score by week (lower is better) from Week 1 through Week 7 to the end of the competition, y-axis marked at .0150 and .0170. Annotated entrant: Martin O’Leary, PhD student in glaciology, Cambridge University]
“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms”
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe”
Mapping Dark Matter
[Chart: competition progress, score by week (lower is better) from Week 1 through Week 7 to the end of the competition, y-axis marked at .0150 and .0170. Annotated entrants: Martin O’Leary, PhD student in glaciology, Cambridge University; Marius Cobzarenco, grad student in computer vision, UC London; Ali Haissaine & Eu Jin Loc, signature verification, Qatar University, and grad student at Deloitte; deepZot (David Kirkby & Daniel Margala), particle physicist & cosmologist; Other]
We can work with difficult data:
Example essay question: We all understand the benefits of laughter. For example, someone once said, “Laughter is the shortest distance between two people.” Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.
Mayo Clinic: Seizure detection from EEG readings
The winning model correctly predicted seizures 82% of the time. Until then, researchers had struggled to develop an algorithm that did better than chance.
We’ve worked with many of the world’s largest companies
Sectors: Healthcare & Pharma, Consumer Internet, Finance, Industrial, Consumer Marketing, Oil & Gas
Clients include a $50b+ beverage company, a global bank, a top credit card issuer, a top 5 E&P (exploration & production) company, and a top 20 E&P company
A community of over 320K data scientists who submit over 100K machine learning models per month
[Chart: Monthly Submissions to Kaggle Competitions, May 2010 to May 2015; y-axis from 0 to 160,000 submissions]
Feature engineering matters most
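Feature engineering means deriving informative inputs from raw data before any model sees it. As a hedged illustration only, the function below computes a few simple surface features from an essay, such as a response to the example essay question; the feature set and the name `essay_features` are hypothetical and are not drawn from any winning entry in the essay-scoring competition.

```python
import re


def essay_features(text: str) -> dict:
    """Compute a few illustrative, hand-crafted features from raw essay text."""
    words = re.findall(r"[A-Za-z']+", text)  # crude tokenizer: runs of letters/apostrophes
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]  # crude sentence split
    n_words = len(words)
    return {
        "word_count": n_words,
        "sentence_count": len(sentences),
        "avg_word_length": sum(len(w) for w in words) / n_words if n_words else 0.0,
        "avg_sentence_length": n_words / len(sentences) if sentences else 0.0,
        # vocabulary diversity: distinct words (case-insensitive) per word
        "unique_word_ratio": len({w.lower() for w in words}) / n_words if n_words else 0.0,
    }


if __name__ == "__main__":
    print(essay_features("Laughter helps. It brings people together!"))
```

Each feature is a plain number, so the resulting dictionaries can be stacked into a table and fed to any standard model.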
Good software engineering practices and robust statistical methods are key
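One common example of a robust statistical method in predictive modeling is k-fold cross-validation, which estimates performance on unseen data without relying on a single train/test split or on leaderboard feedback. The plain-Python sketch below is illustrative only: `k_fold_indices`, `cross_val_accuracy`, and the toy `fit_majority` baseline are hypothetical names, not part of any library.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Shuffle row indices once, then deal them round-robin into k folds."""
    if not 2 <= k <= n:
        raise ValueError("need 2 <= k <= n")
    indices = list(range(n))
    random.Random(seed).shuffle(indices)  # seeded, so folds are reproducible
    return [indices[i::k] for i in range(k)]


def cross_val_accuracy(X, y, fit, k=5, seed=0):
    """Mean held-out accuracy over k folds; `fit` returns a predict(x) function."""
    scores = []
    for held_out in k_fold_indices(len(X), k, seed):
        held = set(held_out)
        train_idx = [i for i in range(len(X)) if i not in held]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


def fit_majority(X_train, y_train):
    """Toy baseline model: always predict the most common training label."""
    majority = max(set(y_train), key=y_train.count)
    return lambda x: majority


if __name__ == "__main__":
    X = list(range(10))
    y = [1] * 8 + [0] * 2
    print(cross_val_accuracy(X, y, fit_majority, k=5))
```

Every row is held out exactly once, and the model is refit from scratch on each fold, so no held-out row ever influences the model that scores it.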
80% of data science is grunt work and only 20% involves deep thinking
A good pipeline makes data scientists more productive and their work higher quality and more enjoyable
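To make "pipeline" concrete, here is a minimal sketch under an assumption the slides do not spell out: a pipeline is an ordered list of named, side-effect-free steps, so the same input always yields the same output and each step can be tested or swapped on its own. The step names and the income-parsing logic are hypothetical, loosely modeled on the Age/Income table from "The Kaggle Approach" slide; this is not a description of Kaggle's workflow environment.

```python
def run_pipeline(data, steps):
    """Apply (name, function) steps in order; return the result and the names of steps run."""
    log = []
    for name, func in steps:
        data = func(data)
        log.append(name)
    return data, log


def drop_incomplete(rows):
    """Hypothetical cleaning step: drop rows with any missing value."""
    return [r for r in rows if None not in r.values()]


def parse_income(rows):
    """Hypothetical step: turn strings like "$95,824" into integers (new dicts, no mutation)."""
    return [{**r, "income": int(r["income"].replace("$", "").replace(",", ""))} for r in rows]


def add_income_per_age(rows):
    """Hypothetical engineered feature: income divided by age."""
    return [{**r, "income_per_age": r["income"] / r["age"]} for r in rows]


STEPS = [
    ("drop_incomplete", drop_incomplete),
    ("parse_income", parse_income),
    ("add_income_per_age", add_income_per_age),
]

if __name__ == "__main__":
    raw = [{"age": 58, "income": "$95,824"}, {"age": 73, "income": None}]
    cleaned, ran = run_pipeline(raw, STEPS)
    print(ran)
    print(cleaned)
```

Because every step returns new data instead of editing its input, the raw rows stay intact and any intermediate result can be recomputed by rerunning a prefix of `STEPS`.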
Our workflow environment will be the central repository for all data science work in a company
Anthony Goldbloom
a@kaggle.com
650 283 9781

Kaggle presentation at SF Data Mining Meetup - Trulia June 23, 2015
