KAGGLE: THE HOME OF DATA SCIENCE
Anthony Goldbloom
Open Data Science Conference, Boston 2015
@opendatasci
Kaggle: the home of data science
GE Flight Quest 2: Optimize flight routes based on weather & traffic. $250,000, 122 teams
Hewlett Foundation Automated Essay Scoring: Develop an automated scoring algorithm for student-written essays. $100,000, 155 teams
Allstate Purchase Prediction Challenge: Predict which insurance policy a customer will purchase. $50,000, 1,570 teams
Merck Molecular Activity Challenge: Help develop safe and effective medicines by predicting molecular activity. $40,000, 236 teams
Higgs Boson Machine Learning Challenge: Use the ATLAS experiment to identify the Higgs boson. $13,000, 1,302 teams
The Kaggle Approach

Training Data
Age   Income    Default
58    $95,824   True
73    $20,708   False
59    $82,152   False
66    $25,334   True

Test Data
Age   Income    Default
73    $53,445   ?
61    $36,679   ?
47    $90,422   ?
44    $79,040   ?
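
The slide contrasts labeled training rows with unlabeled test rows: competitors fit a model to the former and submit predictions for the latter. Below is a minimal sketch of that workflow, assuming a 1-nearest-neighbour classifier with min-max scaling as a stand-in model. The model choice and every function name here are illustrative assumptions, not something the talk or Kaggle prescribes.

```python
# Illustrative sketch only: a 1-nearest-neighbour classifier stands in for
# whatever model a competitor might build. Kaggle does not prescribe a model.

TRAIN = [  # (age, income, default) rows from the slide's training data
    (58, 95824, True),
    (73, 20708, False),
    (59, 82152, False),
    (66, 25334, True),
]
TEST = [  # (age, income) rows; the default label is what competitors predict
    (73, 53445),
    (61, 36679),
    (47, 90422),
    (44, 79040),
]


def fit_scaler(rows):
    """Return per-feature (min, range) computed on the training features only."""
    columns = list(zip(*rows))
    # "or 1" guards against a zero range when a feature is constant.
    return [(min(col), (max(col) - min(col)) or 1) for col in columns]


def scale(row, scaler):
    """Min-max scale one feature row so age and income carry comparable weight."""
    return [(x - lo) / span for x, (lo, span) in zip(row, scaler)]


def predict_1nn(train, test):
    """Label each test row with the label of its nearest training row."""
    features = [(age, income) for age, income, _ in train]
    scaler = fit_scaler(features)
    scaled_train = [(scale(f, scaler), label) for f, (_, _, label) in zip(features, train)]
    predictions = []
    for row in test:
        x = scale(row, scaler)
        # Squared Euclidean distance is enough to find the nearest neighbour.
        _, label = min(
            scaled_train,
            key=lambda item: sum((a - b) ** 2 for a, b in zip(item[0], x)),
        )
        predictions.append(label)
    return predictions


if __name__ == "__main__":
    print(predict_1nn(TRAIN, TEST))
```

Run on the slide's four test rows, this sketch predicts `[False, True, True, True]`. Scaling matters here: without it, income (in the tens of thousands) would swamp age in the distance calculation.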
Mapping Dark Matter: Competition Progress
[Chart: error score (lower is better) at Week 1, Week 3, Week 5, Week 7, and End; y-axis from .0150 to .0170]
Martin O’Leary, PhD student in Glaciology, Cambridge U
“In less than a week, Martin O’Leary, a PhD student in glaciology, outperformed the state-of-the-art algorithms.”
“The world’s brightest physicists have been working for decades on solving one of the great unifying problems of our universe.”
Mapping Dark Matter: Competition Progress
[Chart: error score (lower is better) at Week 1, Week 3, Week 5, Week 7, and End; y-axis from .0150 to .0170]
Martin O’Leary, PhD student in Glaciology, Cambridge U
Marius Cobzarenco, grad student in computer vision, UC London
Ali Hassaine & Eu Jin Lok, signature verification, Qatar U; grad student @ Deloitte
deepZot (David Kirkby & Daniel Margala), particle physicist & cosmologist
Other teams
EXAMPLE ESSAY QUESTION:
We all understand the benefits of laughter. For example, someone once said, “Laughter is the shortest distance between two people.” Many other people believe that laughter is an important part of any relationship. Tell a true story in which laughter was one element or part.
We can work with difficult data.
Mayo Clinic: Seizure detection from EEG readings
The winning model correctly predicted seizures 82% of the time. Until that point, researchers had struggled to develop an algorithm that did better than chance.
We’ve worked with many of the world’s largest companies
Healthcare & Pharma, Consumer Internet, Finance, Industrial, Consumer Marketing, Oil & Gas
$50b+ Beverage Co., Global Bank, Top Credit Card Issuer, Top 5 E&P, Top 20 E&P
Community of over 320K data scientists
who submit over 100K machine learning models per month
[Chart: Monthly Submissions to Kaggle Competitions, May 2010 to May 2015; y-axis from 0 to 160,000]
Feature engineering matters most
Good software engineering practices and robust statistical methods are key
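
The slide does not name specific statistical methods. One widely used example in competition work is k-fold cross-validation, which estimates out-of-sample performance by rotating which slice of the training data is held out for validation. Below is a minimal sketch; the function names, the contiguous folds, and the toy mean-predictor model are all illustrative assumptions.

```python
def kfold_indices(n_samples, k):
    """Yield (train_idx, valid_idx) pairs for k contiguous, non-overlapping folds."""
    if not 2 <= k <= n_samples:
        raise ValueError("k must satisfy 2 <= k <= n_samples")
    # Spread the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        valid_idx = list(range(start, start + size))
        train_idx = list(range(start)) + list(range(start + size, n_samples))
        yield train_idx, valid_idx
        start += size


def cross_validate(xs, ys, k, fit, score):
    """Average a validation score over k folds.

    fit(train_xs, train_ys) returns a model; score(model, valid_xs, valid_ys)
    returns a number. Both are supplied by the caller.
    """
    scores = []
    for train_idx, valid_idx in kfold_indices(len(xs), k):
        model = fit([xs[i] for i in train_idx], [ys[i] for i in train_idx])
        scores.append(score(model, [xs[i] for i in valid_idx], [ys[i] for i in valid_idx]))
    return sum(scores) / len(scores)


def fit_mean(train_xs, train_ys):
    """Hypothetical toy model: ignore the features and predict the training mean."""
    return sum(train_ys) / len(train_ys)


def mean_absolute_error(model, valid_xs, valid_ys):
    """Score the constant-mean model by mean absolute error on the held-out fold."""
    return sum(abs(model - y) for y in valid_ys) / len(valid_ys)


if __name__ == "__main__":
    xs = [0, 1, 2, 3]
    ys = [0.0, 0.0, 2.0, 2.0]
    print(cross_validate(xs, ys, k=2, fit=fit_mean, score=mean_absolute_error))
```

Library implementations usually shuffle or stratify the rows before splitting. This sketch keeps folds contiguous so the index arithmetic stays easy to follow.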
80% of data science is grunt work and only 20% involves deep thinking
A good pipeline makes data scientists more productive and their work higher quality and more enjoyable
Our workflow environment will be the central repository for all data science work in a company
Anthony Goldbloom
a@kaggle.com
650 283 9781


Editor's Notes

  1. People currently come to Kaggle
  2. We score their solutions in real time.
  3. Palantir is really starting to own the reporting layer. We want to own optimization and predictive modeling. Our competition platform gives us a head start in this direction.
  4. People don’t come to us with churn or cross-sell problems; they typically come to us with their hardest problems, and I’ll talk more about this soon. It’s for these reasons that we continue to invest in the competition platform. It’s a very efficient operation, currently running with a headcount of 4; we believe 6 is the right long-term number of people to invest in competitions. We decided to focus on Oil & Gas because, after working with ~25 Fortune 500s across 12 industries, we believe it’s the biggest opportunity for machine learning and the most ripe for disruption, specifically because:
     - Greatest value add: there is a huge gap between what they’re doing and what’s possible.
     - Shale is disruptive: the industry is looking for new ideas, making it a good environment to be selling into.
  5. Kaggle Competitions:
     - Breakeven business
     - Access to the most advanced and proven techniques
     - Recruiting the very best of a scarce resource
     - C-level access from leadership positioning in media