SlideShare a Scribd company logo
4/23/13
Movie Night: Data Science
- Even On Our Night Off
May 27, 2014
Anqi and Irene – (H2O)
• Anqi is the in-house R expert and is responsible for K-means and PCA
• Irene is the pencil and paper stats nerd and technical writer
• Part of a data science team that’s 75% women, and on a technical team that’s
23% women (well above average).
Sergei- (Collective) VP, Data Sciences at Collective, where he is
responsible for the architecture, development and scaling of data-driven
technology products for digital advertising.
What is H2O?
• Same statistics - new volumes of data
• On a distributed cluster models on a terabyte of data can finish in
minutes.
• Provide an interface to give more people the power of data science.
• Also hook H2O into R and Scala
Overview
Walk through the practical problem of what movie to go see
together.
Examine work flow from data to prediction, and let the best
model inform our choice
Extend to production setting applications with a customer use
case
Movie Lens Data
Data is the 100,000 observation MovieLens data set
Demographic Features:
State Age Occupation Gender
Factor Integer Factor Factor
Levels: 62 Range
(7,73)
Levels: 21 Levels: 2
Largest class:
California
Mean: 32.9 Largest Class:
Student
M:F is about
3:1
Movie Classes
Movies are classified by types, types are not exclusive.
Dependent Variable
Users rated movies on a Likert scale of 1 to 5.
We converted this to a binomial indicator:
Ratings >= 4: recoded to 1, indicating liked movie
Ratings < 4: recoded to 0, indicating disliked the movie
Super Models
Both models are predicting the same dependent variable as a function
of the same set of features.
First model with tree based GBM - start simple and let the model get
as complex as it needs to with depth
Alternative model with regularized GLM - start with complexity
and let model generalize with regularization
WWIM
Using Gradient Boosted Classification on two classes
GBM is nonparametric, great when there’s no theoretical
model.
Accounts for complex interaction
Control overfitting with learning rate
WWAM: Alternative – Logistic GLM
Logistic binomial regression
End model has interpretability
Control for overfitting introducing penalty into objective
function - aids in feature selection and generalizability
Ridge regression- all L2 Penalty
Rubber; Meet Road
Comparison of error rates on holdout set
GBM Model GLM Model
Error on Dislike (0) 28% 30%
Error on Like (1) 18% 50%
Overall 22% 40%
GBM Predictions GLM Predictions
Like: 300, Her, Need For Speed
Dislike: Frozen, Pebody
Like: 300, Her, Capt. America
Dislike: Frozen, Divergent
Lights Out - Some Closing
Points
We didn't address a serious problem here - but this is the
general process used in a production environment.
To give you a sense for the real world implementation, we’ve
asked one of our users to share his use case with you.
Stories change people, while statistics gives
them something to argue about
- Bernie Siegel
Ad Server
(publisher)
Ad Server
(advertiser)
AgencyBrowser
BrandsPublishers
Content
Inventory
Ads
Audience
Audience Modeling
1. Build the Audience Cloud of stable cookies.
2. Define target audience using Cookie level data.
3. Assemble 1,000s of features on every cookie.
4. Build a predictive model using machine learning.
5. Score every cookie in the Audience Cloud.
6. Create a targetable segment with the top X users.
7. Adjust X daily to optimize delivery & performance.
8. Rebuild models weekly (daily if warranted).
Audience Cloud
(200M+ Stable Cookies)
Target Audience
(100K Cookies)
1M Cookies
3M Cookies
bit.ly/MLatScale
Preprint of paper submitted to KDD’14
Audience Extension: audiences (age 25-40, buys toys, watches TNT)
Audience Optimization: actions (clicks, online purchases)
Modeling Platform
MODEL BUILDING
Computing predictive models
on
Current Future
DATA SIZES
Size of data
ALGORITHM
Complexity and performance
GBMglmnet
1 million
1,000
1 billion
100,000
SCORING
Predicting outcomes
Batch
Real
Time
+ H2O

More Related Content

Similar to MLconf NYC 0xdata

Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
IRJET Journal
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
SigOpt
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
Anuj Gupta
 
Predicting Defects Using Change Genealogies (ISSE 2013)
Predicting Defects Using Change Genealogies (ISSE 2013)Predicting Defects Using Change Genealogies (ISSE 2013)
Predicting Defects Using Change Genealogies (ISSE 2013)Kim Herzig
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
Akin Osman Kazakci
 
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
DataScienceConferenc1
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
Ted Dunning
 
Age and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdfAge and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdf
MohammedMuzammil83
 
IRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate RecognitionIRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate Recognition
IRJET Journal
 
Using synthetic data for computer vision model training
Using synthetic data for computer vision model trainingUsing synthetic data for computer vision model training
Using synthetic data for computer vision model training
Unity Technologies
 
BigML Education - Deepnets
BigML Education - DeepnetsBigML Education - Deepnets
BigML Education - Deepnets
BigML, Inc
 
Age and Gender Detection.docx
Age and Gender Detection.docxAge and Gender Detection.docx
Age and Gender Detection.docx
MohammedMuzammil83
 
DATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITODATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITO
MarcoMellia
 
Freenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning PlatformFreenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning Platform
Brandon White
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
MLReview
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
Jinseob Kim
 
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
Dataconomy Media
 
IRJET- Generating 3D Models Using 3D Generative Adversarial Network
IRJET- Generating 3D Models Using 3D Generative Adversarial NetworkIRJET- Generating 3D Models Using 3D Generative Adversarial Network
IRJET- Generating 3D Models Using 3D Generative Adversarial Network
IRJET Journal
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Pramit Choudhary
 

Similar to MLconf NYC 0xdata (20)

Partial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather ConditionsPartial Object Detection in Inclined Weather Conditions
Partial Object Detection in Inclined Weather Conditions
 
Experiment Management for the Enterprise
Experiment Management for the EnterpriseExperiment Management for the Enterprise
Experiment Management for the Enterprise
 
Building Continuous Learning Systems
Building Continuous Learning SystemsBuilding Continuous Learning Systems
Building Continuous Learning Systems
 
Predicting Defects Using Change Genealogies (ISSE 2013)
Predicting Defects Using Change Genealogies (ISSE 2013)Predicting Defects Using Change Genealogies (ISSE 2013)
Predicting Defects Using Change Genealogies (ISSE 2013)
 
Data Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analyticsData Science for Business Managers - An intro to ROI for predictive analytics
Data Science for Business Managers - An intro to ROI for predictive analytics
 
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
[DSC Croatia 22] The Ethos of Process in ML - Leanne Fitzpatrick
 
Which Algorithms Really Matter
Which Algorithms Really MatterWhich Algorithms Really Matter
Which Algorithms Really Matter
 
Age and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdfAge and Gender Detection-converted.pdf
Age and Gender Detection-converted.pdf
 
IRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate RecognitionIRJET- Analysis of Vehicle Number Plate Recognition
IRJET- Analysis of Vehicle Number Plate Recognition
 
Using synthetic data for computer vision model training
Using synthetic data for computer vision model trainingUsing synthetic data for computer vision model training
Using synthetic data for computer vision model training
 
BigML Education - Deepnets
BigML Education - DeepnetsBigML Education - Deepnets
BigML Education - Deepnets
 
Age and Gender Detection.docx
Age and Gender Detection.docxAge and Gender Detection.docx
Age and Gender Detection.docx
 
DATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITODATI, AI E ROBOTICA @POLITO
DATI, AI E ROBOTICA @POLITO
 
Freenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning PlatformFreenome's Biological Machine Learning Platform
Freenome's Biological Machine Learning Platform
 
Tutorial on Deep Generative Models
 Tutorial on Deep Generative Models Tutorial on Deep Generative Models
Tutorial on Deep Generative Models
 
useR 2014 jskim
useR 2014 jskimuseR 2014 jskim
useR 2014 jskim
 
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
DN18 | Demystifying the Buzz in Machine Learning! (This Time for Real) | Dat ...
 
IRJET- Generating 3D Models Using 3D Generative Adversarial Network
IRJET- Generating 3D Models Using 3D Generative Adversarial NetworkIRJET- Generating 3D Models Using 3D Generative Adversarial Network
IRJET- Generating 3D Models Using 3D Generative Adversarial Network
 
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )Learning to Learn Model Behavior ( Capital One: data intelligence conference )
Learning to Learn Model Behavior ( Capital One: data intelligence conference )
 
PowerPoint Presentation
PowerPoint PresentationPowerPoint Presentation
PowerPoint Presentation
 

More from MLconf

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
MLconf
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
MLconf
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
MLconf
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
MLconf
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
MLconf
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
MLconf
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
MLconf
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
MLconf
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
MLconf
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
MLconf
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
MLconf
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
MLconf
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
MLconf
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
MLconf
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
MLconf
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
MLconf
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
MLconf
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
MLconf
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
MLconf
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
MLconf
 

More from MLconf (20)

Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
Jamila Smith-Loud - Understanding Human Impact: Social and Equity Assessments...
 
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language UnderstandingTed Willke - The Brain’s Guide to Dealing with Context in Language Understanding
Ted Willke - The Brain’s Guide to Dealing with Context in Language Understanding
 
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
Justin Armstrong - Applying Computer Vision to Reduce Contamination in the Re...
 
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold RushIgor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
Igor Markov - Quantum Computing: a Treasure Hunt, not a Gold Rush
 
Josh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious ExperienceJosh Wills - Data Labeling as Religious Experience
Josh Wills - Data Labeling as Religious Experience
 
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
Vinay Prabhu - Project GaitNet: Ushering in the ImageNet moment for human Gai...
 
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
Jekaterina Novikova - Machine Learning Methods in Detecting Alzheimer’s Disea...
 
Meghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the CheapMeghana Ravikumar - Optimized Image Classification on the Cheap
Meghana Ravikumar - Optimized Image Classification on the Cheap
 
Noam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data CollectionNoam Finkelstein - The Importance of Modeling Data Collection
Noam Finkelstein - The Importance of Modeling Data Collection
 
June Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of MLJune Andrews - The Uncanny Valley of ML
June Andrews - The Uncanny Valley of ML
 
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection TasksSneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
Sneha Rajana - Deep Learning Architectures for Semantic Relation Detection Tasks
 
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
Anoop Deoras - Building an Incrementally Trained, Local Taste Aware, Global D...
 
Vito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI WorldVito Ostuni - The Voice: New Challenges in a Zero UI World
Vito Ostuni - The Voice: New Challenges in a Zero UI World
 
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
Anna choromanska - Data-driven Challenges in AI: Scale, Information Selection...
 
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
Janani Kalyanam - Machine Learning to Detect Illegal Online Sales of Prescrip...
 
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
Esperanza Lopez Aguilera - Using a Bayesian Neural Network in the Detection o...
 
Neel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to codeNeel Sundaresan - Teaching a machine to code
Neel Sundaresan - Teaching a machine to code
 
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
Rishabh Mehrotra - Recommendations in a Marketplace: Personalizing Explainabl...
 
Soumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better SoftwareSoumith Chintala - Increasing the Impact of AI Through Better Software
Soumith Chintala - Increasing the Impact of AI Through Better Software
 
Roy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime ChangesRoy Lowrance - Predicting Bond Prices: Regime Changes
Roy Lowrance - Predicting Bond Prices: Regime Changes
 

Recently uploaded

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Inflectra
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
OnBoard
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
Dorra BARTAGUIZ
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Product School
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
Elena Simperl
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
Thijs Feryn
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
DianaGray10
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
KatiaHIMEUR1
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
Alison B. Lowndes
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
Elena Simperl
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Thierry Lestable
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
Sri Ambati
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
Guy Korland
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Product School
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
James Anderson
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
Product School
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
DanBrown980551
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Ramesh Iyer
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
Paul Groth
 

Recently uploaded (20)

Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered QualitySoftware Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
Software Delivery At the Speed of AI: Inflectra Invests In AI-Powered Quality
 
Leading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdfLeading Change strategies and insights for effective change management pdf 1.pdf
Leading Change strategies and insights for effective change management pdf 1.pdf
 
Elevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object CalisthenicsElevating Tactical DDD Patterns Through Object Calisthenics
Elevating Tactical DDD Patterns Through Object Calisthenics
 
Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...Designing Great Products: The Power of Design and Leadership by Chief Designe...
Designing Great Products: The Power of Design and Leadership by Chief Designe...
 
Knowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and backKnowledge engineering: from people to machines and back
Knowledge engineering: from people to machines and back
 
Accelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish CachingAccelerate your Kubernetes clusters with Varnish Caching
Accelerate your Kubernetes clusters with Varnish Caching
 
UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3UiPath Test Automation using UiPath Test Suite series, part 3
UiPath Test Automation using UiPath Test Suite series, part 3
 
Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !Securing your Kubernetes cluster_ a step-by-step guide to success !
Securing your Kubernetes cluster_ a step-by-step guide to success !
 
Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........Bits & Pixels using AI for Good.........
Bits & Pixels using AI for Good.........
 
When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...When stars align: studies in data quality, knowledge graphs, and machine lear...
When stars align: studies in data quality, knowledge graphs, and machine lear...
 
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
Empowering NextGen Mobility via Large Action Model Infrastructure (LAMI): pav...
 
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdfFIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
FIDO Alliance Osaka Seminar: Passkeys and the Road Ahead.pdf
 
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
GenAISummit 2024 May 28 Sri Ambati Keynote: AGI Belongs to The Community in O...
 
GraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge GraphGraphRAG is All You need? LLM & Knowledge Graph
GraphRAG is All You need? LLM & Knowledge Graph
 
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
Unsubscribed: Combat Subscription Fatigue With a Membership Mentality by Head...
 
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo...
 
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
AI for Every Business: Unlocking Your Product's Universal Potential by VP of ...
 
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...
 
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
Builder.ai Founder Sachin Dev Duggal's Strategic Approach to Create an Innova...
 
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMsTo Graph or Not to Graph Knowledge Graph Architectures and LLMs
To Graph or Not to Graph Knowledge Graph Architectures and LLMs
 

MLconf NYC 0xdata

  • 1. 4/23/13 Movie Night: Data Science - Even On Our Night Off May 27, 2014
  • 2. Anqi and Irene – (H2O) • Anqi is the in-house R expert and is responsible for K-means and PCA • Irene is the pencil and paper stats nerd and technical writer • Part of a data science team that’s 75% women, and on a technical team that’s 23% women (well above average). Sergei- (Collective) VP, Data Sciences at Collective, where he is responsible for the architecture, development and scaling of data-driven technology products for digital advertising.
  • 3. What is H2O? • Same statistics - new volumes of data • On a distributed cluster models on a terabyte of data can finish in minutes. • Provide an interface to give more people the power of data science. • Also hook H2O into R and Scala
  • 4. Overview Walk through the practical problem of what movie to go see together. Examine work flow from data to prediction, and let the best model inform our choice Extend to production setting applications with a customer use case
  • 5. Movie Lens Data Data is the 100,000 observation MovieLens data set Demographic Features: State Age Occupation Gender Factor Integer Factor Factor Levels: 62 Range (7,73) Levels: 21 Levels: 2 Largest class: California Mean: 32.9 Largest Class: Student M:F is about 3:1
  • 6. Movie Classes Movies are classified by types, types are not exclusive.
  • 7. Dependent Variable Users rated movies on a Likert scale of 1 to 5. We converted this to a binomial indicator: Ratings >= 4: recoded to 1, indicating liked movie Ratings < 4: recoded to 0, indicating disliked the movie
  • 8. Super Models Both models are predicting the same dependent variable as a function of the same set of features. First model with tree based GBM - start simple and let the model get as complex as it needs to with depth Alternative model with regularized GLM - start with complexity and let model generalize with regularization
  • 9. WWIM Using Gradient Boosted Classification on two classes GBM is nonparametric, great when there’s no theoretical model. Accounts for complex interaction Control overfitting with learning rate
  • 10. WWAM: Alternative – Logistic GLM Logistic binomial regression End model has interpretability Control for overfitting introducing penalty into objective function - aids in feature selection and generalizability Ridge regression- all L2 Penalty
  • 11. Rubber; Meet Road Comparison of error rates on holdout set GBM Model GLM Model Error on Dislike (0) 28% 30% Error on Like (1) 18% 50% Overall 22% 40%
  • 12. GBM Predictions GLM Predictions Like: 300, Her, Need For Speed Dislike: Frozen, Pebody Like: 300, Her, Capt. America Dislike: Frozen, Divergent
  • 13. Lights Out - Some Closing Points We didn't address a serious problem here - but this is the general process used in a production environment. To give you a sense for the real world implementation, we’ve asked one of our users to share his use case with you.
  • 14. Stories change people, while statistics gives them something to argue about - Bernie Siegel
  • 16. Audience Modeling 1. Build the Audience Cloud of stable cookies. 2. Define target audience using Cookie level data. 3. Assemble 1,000s of features on every cookie. 4. Build a predictive model using machine learning. 5. Score every cookie in the Audience Cloud. 6. Create a targetable segment with the top X users. 7. Adjust X daily to optimize delivery & performance. 8. Rebuild models weekly (daily if warranted). Audience Cloud (200M+ Stable Cookies) Target Audience (100K Cookies) 1M Cookies 3M Cookies bit.ly/MLatScale Preprint of paper submitted to KDD’14 Audience Extension: audiences (age 25-40, buys toys, watches TNT) Audience Optimization: actions (clicks, online purchases)
  • 17. Modeling Platform MODEL BUILDING Computing predictive models on Current Future DATA SIZES Size of data ALGORITHM Complexity and performance GBMglmnet 1 million 1,000 1 billion 100,000 SCORING Predicting outcomes Batch Real Time + H2O