SlideShare a Scribd company logo
Data Science Competition
2. 25. 2017
The 27th Annual KSEA South-Western Regional Conference
Jeong-Yoon Lee, Ph.D.
Chief Data Scientist, Conversion Logic
Ph.D. in Computer Science, USC
M.S. in Electrical Engineering, USC
B.S. in Electrical Engineering, SNU
KDD Cup Winner 2012 & 2015
Top 10, Kaggle 2015
Jeong-Yoon Lee, Ph.D.
Why Data Science Competition
Why Compete
• For fun
• For experience
• For learning
• For networking
4
Fun
• Competing with others
• Incremental improvement
5
Experience
6
Learning
7
Learning
8
Networking
9
10
Data Science Competition
Data Science Competitions
Since 1997
2006 - 2009
Since 2010
Competition Structure
Training Data
Test Data
Feature Label
Provided Submission Public LB Score Private LB Score
Kaggle
• 250+ competitions since 2010
• 500K+ users
• 50K+ competitors
• $3MM+ prize paid out
Kaggle
Kaggle
Misconceptions on Competitions
Misconceptions on Competitions
• No ETL
• No EDA
• Not worth it
• Not for production
18
No ETL? - Deloitte Western Australia Rental Prices
19
No ETL? - Outbrain Click Prediction
20
2B page views. 16.9MM clicks. 700MM users. 560 sites
No ETL? - YouTube-8M Video Understanding Challenge
21
1.7TB feature-level data. 31GB video-level data.
No ETL?
22
No EDA?
• Most of competitions provide actual labels - typical EDA
• Anonymized data - more creative EDA
o People decode age, states, time intervals, income, etc.
23
No EDA?
• Anonymized data - more creative EDA
24
Not worth it?
• Performance matters
• You walk easier when you can run
25
Not for Production?
• Kaggle Kernel
o Max execution time:10 minutes
o Max file output: 500MB
o Memory limit: 8GB
26
Ensemble Pipeline at Conversion Logic
27
Best Practices
Best Practices
• Feature Engineering
• Algorithms
• Cross Validation
• Ensemble
29
Feature Engineering
• Numerical - Log, Log(1 + x), Normalization, Binarization
• Categorical - One-hot-encode, TF-IDF (text), Weight-of-
Evidence
• Timeseries - Stats, FFT, MFCC, ERP (EEG)
• Numerical/Timeseries to Categorical - RF/GBM*
• Dimensionality Reduction - PCA, SVD, Autoencoder
* http://www.csie.ntu.edu.tw/~r01922136/kaggle-2014-criteo.pdf
30
Algorithms
Algorithm Tool Note
Gradient Boosting Machine XGBoost, LightGBM The most popular algorithm in competitions
Random Forests Scikit-Learn, randomForest Used to be popular before GBM
Extremely Random Trees Scikit-Learn
Neural Networks/ Deep Learning Keras, MXNet, CNTK, Torch Blends well with GBM. Best at image and speech recognition competitions
Logistic/Linear Regression Scikit-Learn, Vowpal Wabbit Fastest. Good for ensemble.
Support Vector Machine Scikit-Learn
FTRL Vowpal Wabbit Competitive solution for CTR estimation competitions
Factorization Machine libFM Winning solution for KDD Cup 2012
Field-aware Factorization Machine libFFM Winning solution for CTR estimation competitions (Criteo, Avazu)
31
Cross Validation
Training data are split into five folds where the sample size and dropout rate are preserved (stratified).
32
Ensemble
* for other types of ensemble, see http://mlwave.com/kaggle-ensembling-guide/
34
KDDCup 2015 Solution
35
Why Competition
• For fun
• For experiences
• For learning
• For networking
36
One Last Thing
37
Google: 20K applications per week
Conversion Logic: 200 applications per week
Thank You

More Related Content

What's hot

BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - Deepnets
BigML, Inc
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
QuantUniversity
 
Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...
Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...
Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...
TigerGraph
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
Owen Zhang
 
Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat
omarodibat
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
Ning Jiang
 
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
TigerGraph
 
Deep feature synthesis
Deep feature synthesisDeep feature synthesis
Deep feature synthesis
Durra Sahtout
 
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2
TigerGraph
 
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 ClassificationUsing Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
TigerGraph
 
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.aiPractical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Sri Ambati
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
Joonyoung Yi
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
Vivek Raja P S
 
Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...
Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...
Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...
Simplilearn
 

What's hot (14)

BSSML17 - Deepnets
BSSML17 - DeepnetsBSSML17 - Deepnets
BSSML17 - Deepnets
 
Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101Automatic machine learning (AutoML) 101
Automatic machine learning (AutoML) 101
 
Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...
Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...
Graph Gurus Episode 28: In-Database Machine Learning Solution for Real-Time R...
 
Winning data science competitions
Winning data science competitionsWinning data science competitions
Winning data science competitions
 
Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat Boosting Algorithms Omar Odibat
Boosting Algorithms Omar Odibat
 
The Evolution of AutoML
The Evolution of AutoMLThe Evolution of AutoML
The Evolution of AutoML
 
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
Graph Gurus Episode 19: Deep Learning Implemented by GSQL on a Native Paralle...
 
Deep feature synthesis
Deep feature synthesisDeep feature synthesis
Deep feature synthesis
 
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2
Graph Gurus Episode 27: Using Graph Algorithms for Advanced Analytics Part 2
 
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 ClassificationUsing Graph Algorithms for Advanced Analytics - Part 5 Classification
Using Graph Algorithms for Advanced Analytics - Part 5 Classification
 
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.aiPractical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
Practical Tips for Interpreting Machine Learning Models - Patrick Hall, H2O.ai
 
Introduction to XGBoost
Introduction to XGBoostIntroduction to XGBoost
Introduction to XGBoost
 
Model Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model AnalysisModel Drift Monitoring using Tensorflow Model Analysis
Model Drift Monitoring using Tensorflow Model Analysis
 
Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...
Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...
Scikit-Learn Tutorial | Machine Learning With Scikit-Learn | Sklearn | Python...
 

Viewers also liked

How hackathons can drive top line revenue growth
How hackathons can drive top line revenue growthHow hackathons can drive top line revenue growth
How hackathons can drive top line revenue growth
HackerEarth
 
Tda presentation
Tda presentationTda presentation
Tda presentation
HJ van Veen
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
Sharat Chikkerur
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)
Domino Data Lab
 
DataRobot R Package
DataRobot R PackageDataRobot R Package
DataRobot R Package
DataRobot
 
HackerEarth Sourcing Solution
HackerEarth Sourcing SolutionHackerEarth Sourcing Solution
HackerEarth Sourcing Solution
HackerEarth
 
6 rules of enterprise innovation
6 rules of enterprise innovation6 rules of enterprise innovation
6 rules of enterprise innovation
HackerEarth
 
How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?
HackerEarth
 
No-Bullshit Data Science
No-Bullshit Data ScienceNo-Bullshit Data Science
No-Bullshit Data Science
Domino Data Lab
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field
Domino Data Lab
 
Fairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningFairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine Learning
HJ van Veen
 
Leverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and RecruitingLeverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and RecruitingHackerEarth
 
Ethics in Data Science and Machine Learning
Ethics in Data Science and Machine LearningEthics in Data Science and Machine Learning
Ethics in Data Science and Machine Learning
HJ van Veen
 
Open Innovation - A Case Study
Open Innovation - A Case StudyOpen Innovation - A Case Study
Open Innovation - A Case Study
HackerEarth
 
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Spark Summit
 
USC LIGHT Ministry Introduction
USC LIGHT Ministry IntroductionUSC LIGHT Ministry Introduction
USC LIGHT Ministry Introduction
Jeong-Yoon Lee
 
Intra company hackathons using HackerEarth
Intra company hackathons using HackerEarthIntra company hackathons using HackerEarth
Intra company hackathons using HackerEarth
HackerEarth
 
Kill the wabbit
Kill the wabbitKill the wabbit
Kill the wabbit
Joe Kleinwaechter
 
Druva Casestudy - HackerEarth
Druva Casestudy - HackerEarthDruva Casestudy - HackerEarth
Druva Casestudy - HackerEarthHackerEarth
 

Viewers also liked (20)

How hackathons can drive top line revenue growth
How hackathons can drive top line revenue growthHow hackathons can drive top line revenue growth
How hackathons can drive top line revenue growth
 
Tda presentation
Tda presentationTda presentation
Tda presentation
 
Data science at the command line
Data science at the command lineData science at the command line
Data science at the command line
 
Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)Doing your first Kaggle (Python for Big Data sets)
Doing your first Kaggle (Python for Big Data sets)
 
DataRobot R Package
DataRobot R PackageDataRobot R Package
DataRobot R Package
 
HackerEarth Sourcing Solution
HackerEarth Sourcing SolutionHackerEarth Sourcing Solution
HackerEarth Sourcing Solution
 
6 rules of enterprise innovation
6 rules of enterprise innovation6 rules of enterprise innovation
6 rules of enterprise innovation
 
How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?How to assess & hire Java developers accurately?
How to assess & hire Java developers accurately?
 
No-Bullshit Data Science
No-Bullshit Data ScienceNo-Bullshit Data Science
No-Bullshit Data Science
 
Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field Managing Data Science | Lessons from the Field
Managing Data Science | Lessons from the Field
 
Fairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine LearningFairly Measuring Fairness In Machine Learning
Fairly Measuring Fairness In Machine Learning
 
Leverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and RecruitingLeverage Social Media for Employer Brand and Recruiting
Leverage Social Media for Employer Brand and Recruiting
 
Ethics in Data Science and Machine Learning
Ethics in Data Science and Machine LearningEthics in Data Science and Machine Learning
Ethics in Data Science and Machine Learning
 
Work - LIGHT Ministry
Work - LIGHT MinistryWork - LIGHT Ministry
Work - LIGHT Ministry
 
Open Innovation - A Case Study
Open Innovation - A Case StudyOpen Innovation - A Case Study
Open Innovation - A Case Study
 
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
Feature Hashing for Scalable Machine Learning: Spark Summit East talk by Nick...
 
USC LIGHT Ministry Introduction
USC LIGHT Ministry IntroductionUSC LIGHT Ministry Introduction
USC LIGHT Ministry Introduction
 
Intra company hackathons using HackerEarth
Intra company hackathons using HackerEarthIntra company hackathons using HackerEarth
Intra company hackathons using HackerEarth
 
Kill the wabbit
Kill the wabbitKill the wabbit
Kill the wabbit
 
Druva Casestudy - HackerEarth
Druva Casestudy - HackerEarthDruva Casestudy - HackerEarth
Druva Casestudy - HackerEarth
 

Similar to Data Science Competition

E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
Sundance Multiprocessor Technology Ltd.
 
Melt iron heterogeneous computing - lspe v3
Melt iron   heterogeneous computing - lspe v3Melt iron   heterogeneous computing - lspe v3
Melt iron heterogeneous computing - lspe v3
Rinka Singh
 
Predictive Analytics in Manufacturing
Predictive Analytics in ManufacturingPredictive Analytics in Manufacturing
Predictive Analytics in Manufacturing
Data Science Thailand
 
ACIC: Automatic Cloud I/O Configurator for HPC Applications
ACIC: Automatic Cloud I/O Configurator for HPC ApplicationsACIC: Automatic Cloud I/O Configurator for HPC Applications
ACIC: Automatic Cloud I/O Configurator for HPC Applications
Mingliang Liu
 
Hyun joong
Hyun joongHyun joong
Hyun joong
Hyun Joong Kim
 
Datalake project
Datalake projectDatalake project
Datalake project
Hyun Joong Kim
 
lec01.pdf
lec01.pdflec01.pdf
lec01.pdf
BeiYu6
 
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
Amazon Web Services
 
FPGA-enhanced Bioinformatics @ NECST
FPGA-enhanced Bioinformatics @ NECSTFPGA-enhanced Bioinformatics @ NECST
FPGA-enhanced Bioinformatics @ NECST
NECST Lab @ Politecnico di Milano
 
MN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFMN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOF
Preferred Networks
 
Physical Design Services
Physical Design ServicesPhysical Design Services
Physical Design Services
eInfochips (An Arrow Company)
 
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM Research
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
Larry Smarr
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
inside-BigData.com
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
adil raja
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
adil raja
 
Scaling face recognition with big data - Bogdan Bocse
 Scaling face recognition with big data - Bogdan Bocse Scaling face recognition with big data - Bogdan Bocse
Scaling face recognition with big data - Bogdan Bocse
ITCamp
 
2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen
Takeshi Ohkawa
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
NECST Lab @ Politecnico di Milano
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Jim Dowling
 

Similar to Data Science Competition (20)

E3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - SundanceE3MV - Embedded Vision - Sundance
E3MV - Embedded Vision - Sundance
 
Melt iron heterogeneous computing - lspe v3
Melt iron   heterogeneous computing - lspe v3Melt iron   heterogeneous computing - lspe v3
Melt iron heterogeneous computing - lspe v3
 
Predictive Analytics in Manufacturing
Predictive Analytics in ManufacturingPredictive Analytics in Manufacturing
Predictive Analytics in Manufacturing
 
ACIC: Automatic Cloud I/O Configurator for HPC Applications
ACIC: Automatic Cloud I/O Configurator for HPC ApplicationsACIC: Automatic Cloud I/O Configurator for HPC Applications
ACIC: Automatic Cloud I/O Configurator for HPC Applications
 
Hyun joong
Hyun joongHyun joong
Hyun joong
 
Datalake project
Datalake projectDatalake project
Datalake project
 
lec01.pdf
lec01.pdflec01.pdf
lec01.pdf
 
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
AWS re:Invent 2016: Hardware-Accelerating Graphics Desktop Workloads with Ama...
 
FPGA-enhanced Bioinformatics @ NECST
FPGA-enhanced Bioinformatics @ NECSTFPGA-enhanced Bioinformatics @ NECST
FPGA-enhanced Bioinformatics @ NECST
 
MN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOFMN-3, MN-Core and HPL - SC21 Green500 BOF
MN-3, MN-Core and HPL - SC21 Green500 BOF
 
Physical Design Services
Physical Design ServicesPhysical Design Services
Physical Design Services
 
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
IBM and ASTRON 64-Bit Microserver Prototype Prepares for Big Bang's Big Data,...
 
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
A High-Performance Campus-Scale Cyberinfrastructure: The Technical, Political...
 
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
40 Powers of 10 - Simulating the Universe with the DiRAC HPC Facility
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
 
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
Modeling the Effect of Packet Loss on Speech Quality: Genetic Programming Bas...
 
Scaling face recognition with big data - Bogdan Bocse
 Scaling face recognition with big data - Bogdan Bocse Scaling face recognition with big data - Bogdan Bocse
Scaling face recognition with big data - Bogdan Bocse
 
2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen2017 09-ohkawa-MCSoC2017-presen
2017 09-ohkawa-MCSoC2017-presen
 
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...Architectural Optimizations for High Performance and Energy Efficient Smith-W...
Architectural Optimizations for High Performance and Energy Efficient Smith-W...
 
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala UniversityInvited Lecture on GPUs and Distributed Deep Learning at Uppsala University
Invited Lecture on GPUs and Distributed Deep Learning at Uppsala University
 

Recently uploaded

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
Frank van Harmelen
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
Kari Kakkonen
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Product School
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
Fwdays
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
DianaGray10
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
RTTS
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
Laura Byrne
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
BookNet Canada
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
Ralf Eggert
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
Product School
 

Recently uploaded (20)

FIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdfFIDO Alliance Osaka Seminar: Overview.pdf
FIDO Alliance Osaka Seminar: Overview.pdf
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*Neuro-symbolic is not enough, we need neuro-*semantic*
Neuro-symbolic is not enough, we need neuro-*semantic*
 
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdfFIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
FIDO Alliance Osaka Seminar: Passkeys at Amazon.pdf
 
DevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA ConnectDevOps and Testing slides at DASA Connect
DevOps and Testing slides at DASA Connect
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 
Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...Mission to Decommission: Importance of Decommissioning Products to Increase E...
Mission to Decommission: Importance of Decommissioning Products to Increase E...
 
"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi"Impact of front-end architecture on development cost", Viktor Turskyi
"Impact of front-end architecture on development cost", Viktor Turskyi
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4UiPath Test Automation using UiPath Test Suite series, part 4
UiPath Test Automation using UiPath Test Suite series, part 4
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdfFIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
FIDO Alliance Osaka Seminar: The WebAuthn API and Discoverable Credentials.pdf
 
JMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and GrafanaJMeter webinar - integration with InfluxDB and Grafana
JMeter webinar - integration with InfluxDB and Grafana
 
The Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and SalesThe Art of the Pitch: WordPress Relationships and Sales
The Art of the Pitch: WordPress Relationships and Sales
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...Transcript: Selling digital books in 2024: Insights from industry leaders - T...
Transcript: Selling digital books in 2024: Insights from industry leaders - T...
 
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdfFIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
FIDO Alliance Osaka Seminar: FIDO Security Aspects.pdf
 
PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)PHP Frameworks: I want to break free (IPC Berlin 2024)
PHP Frameworks: I want to break free (IPC Berlin 2024)
 
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
From Daily Decisions to Bottom Line: Connecting Product Work to Revenue by VP...
 

Data Science Competition

Editor's Notes

  1. I am Jeong-Yoon Lee, Chief Data Scientist at Conversion Logic. I am going to tell you little bit about our attribution approach.
  2. states, age, time interval, weekday,
  3. states, age, time interval, weekday,
  4. Training data are split into five folds while the sample size and dropout rate are preserved across folds. For validation, each of single and ensemble models is trained five times. Each time, one fold is held out and the remain- ing four folds are used for training. Then, predictions for the hold-out folds are combined and form the model’s CV pre- diction. CV predictions are used in AUC score calculation and/or as inputs in ensemble model training. For test, each of single and ensemble models is retrained with whole training data. Then predictions for test data are used for submission and/or as inputs in ensemble model prediction.
  5. For validation, each of single and ensemble models is trained five times. Each time, one fold is held out and the remain- ing four folds are used for training. Then, predictions for the hold-out folds are combined and form the model’s CV pre- diction. CV predictions are used in AUC score calculation and/or as inputs in ensemble model training. For test, each of single and ensemble models is retrained with whole training data. Then predictions for test data are used for submission and/or as inputs in ensemble model prediction.
  6. Stage-I Ensemble: We trained 15 stage-I ensemble classifiers with different subsets of CV predictions of 64 individual classifiers. Stage-II Ensemble: We trained 2 stage-II ensemble classifiers with different subsets of CV predictions of 15 stage-I ensemble classifiers. Stage-III Ensemble: We trained a stage-III ensemble classifier with CV predictions of 5 classifiers: 1 stage-II ensemble, 3 stage-I ensemble, and 1 individual classifiers