From practice to theory
in learning from massive data
Charles Elkan
Amazon Fellow
August 14, 2016
Important
Information here is already public.
Opinions are mine, not Amazon’s.
Outline
Only 30 minutes!
1. Detecting anomalies in streaming data
2. Making Spark usable for real-time predictions
3. Amazon’s most important algorithm for recommendations
4. Uplift: We want causation, not merely correlation
From practice to theory
From theory to practice
Now for everyone!
From practice to practice
Academic versus applied
In theory, researchers favor simplicity. In practice, they don’t.
In industry, simplicity genuinely wins.
Example: Desiderata for recommender systems:
1. Respect the privacy of users; don’t be creepy.
2. Make recommendations understandable.
3. Make them responsive to the user’s most recent interests.
4. Generate them with millisecond latency.
Amazon’s most important recommender system
1. Respect the privacy of users; don’t be creepy.
2. Make recommendations understandable.
3. And responsive to the user’s most recent interests.
4. Generate them with millisecond latency.
What data scientists do every day
Let x be a user and let R = 0 or 1 be a response. For example, R=1
means the user buys shoes in the next month.
Routinely, we train models to predict the probability p(R=1|x).
We send messages and coupons to users with high p(R=1|x).
Is p(R=1|x) actually useful?
In principle, no. "Our goal is not to predict the future; it is to
change the future."
• Merely predicting user behavior is of limited interest.
We want to select treatments that influence users.
• T = t means we choose treatment t.
• For each available t, compute p(R=1|x,T=t).
• Choose the t that gives highest probability.
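The selection rule in the bullets above can be sketched as follows. This is a minimal illustration, not the speaker's implementation: the function name `choose_treatment`, the feature `recent_buyer`, and both response models with their probabilities are hypothetical stand-ins for trained models of p(R=1|x,T=t).

```python
from typing import Callable, Dict, Tuple

def choose_treatment(
    x: dict, models: Dict[str, Callable[[dict], float]]
) -> Tuple[str, Dict[str, float]]:
    """Score user x under every available treatment and return the best one."""
    scores = {t: model(x) for t, model in models.items()}
    best = max(scores, key=scores.get)
    return best, scores

# Hypothetical response models: one callable per treatment t,
# each returning an estimate of p(R=1 | x, T=t).
models = {
    "none": lambda x: 0.10,
    "email": lambda x: 0.10 + (0.05 if x["recent_buyer"] else -0.02),
}

best, scores = choose_treatment({"recent_buyer": True}, models)
print(best, scores)
```

Including "no treatment" as one of the candidate treatments lets the same rule decide not to contact a user at all.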
The risk of ignoring uplift
Users are ranked by p(R=1|x), shown by the brown line.
The blue dashed line shows p(R=1|x,T=t) .
The treatment t has a negative effect for users in the top 5%:
p(R=1|x,T=t) < p(R=1|x).
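To make the risk concrete, here is a small sketch that ranks users two ways: by predicted response, and by the difference p(R=1|x,T=t) - p(R=1|x) that the slide compares. All user names and probabilities are invented for illustration and do not come from the talk's figure.

```python
# (user, p(R=1|x), p(R=1|x,T=t)); all values are made up for illustration.
users = [
    ("a", 0.60, 0.55),  # most likely to respond, but the treatment hurts
    ("b", 0.30, 0.38),
    ("c", 0.10, 0.12),
    ("d", 0.05, 0.05),
]

# Uplift of treatment t for each user, following the slide's comparison.
uplift = {name: treated - base for name, base, treated in users}

by_response = [name for name, base, _ in sorted(users, key=lambda u: -u[1])]
by_uplift = sorted(uplift, key=uplift.get, reverse=True)

# Only users with positive uplift are worth treating.
targeted = [name for name in by_uplift if uplift[name] > 0]

print("ranked by response:", by_response)
print("ranked by uplift:  ", by_uplift)
print("targeted:", targeted)
```

In this toy data the user ranked first by response is ranked last by uplift, which is the pattern the slide warns about.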
Politicians know this …
If you are a Republican, don’t target confirmed Democrat voters!
Instead:
• Send persuasive messages to undecided voters.
• Send “get out the vote” messages to confirmed supporters.
• Send “please donate” messages to these people also.
A common scenario for uplift
Many treatments are almost free to apply, such as sending email.
The uplift question is then which treatment is most effective.
For each user x, we want to know which t has highest value
p(R=1|x,T=t).
Keep in mind: The same treatment may be the best for all x.
A public dataset
Published by Kevin Hillstrom, former VP of database marketing
at Nordstrom.
Studied in several published papers on uplift, notably by Nicholas
Radcliffe, professor at the University of Edinburgh.
• 64,000 past customers of an e-commerce site selling clothing.
• Randomized to no email, men’s email, or women’s email.
• Three outcomes: binary visit, binary purchase, and numerical spend.
Looking at the data
Treatments have a larger effect on “visit” than on “purchase
given visit” or on “spend given purchase.”
We'll analyze uplift (i.e., the causal influence of treatments)
for visits.
Table from Hillstrom’s MineThatData email analytics challenge by Radcliffe.
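Because customers were randomized to the three arms, the difference between an arm's visit rate and the control arm's visit rate is a direct estimate of that email's average uplift. The sketch below shows the computation on a tiny hand-made sample; the records and the arm labels `none`, `mens`, and `womens` are assumptions for illustration and are not the Hillstrom data.

```python
from collections import defaultdict

# Made-up (arm, visited) records for illustration; NOT the Hillstrom data.
records = (
    [("none", 1)] * 2 + [("none", 0)] * 18
    + [("mens", 1)] * 4 + [("mens", 0)] * 16
    + [("womens", 1)] * 3 + [("womens", 0)] * 17
)

def visit_rates(records):
    """Return the fraction of users who visited, per treatment arm."""
    counts = defaultdict(lambda: [0, 0])  # arm -> [visits, users]
    for arm, visited in records:
        counts[arm][0] += visited
        counts[arm][1] += 1
    return {arm: visits / n for arm, (visits, n) in counts.items()}

rates = visit_rates(records)

# Average uplift of each email relative to the no-email control arm.
uplift = {arm: rates[arm] - rates["none"] for arm in rates if arm != "none"}
print(rates)
print(uplift)
```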
The linear probability model
Assume the linear function p(R=1|x) = b0 + ∑i bi * xi.
• Find coefficients bi to minimize square loss.
Square loss is a proper scoring rule, so predicted probabilities are calibrated.
Avoid overfitting and predictions <0 or >1 by not having too
many predictors.
Commonly used in econometrics, not in ML. In practice, often
quite similar to logistic regression.
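A minimal sketch of fitting the linear probability model by least squares, using only the Python standard library. The solver (normal equations with Gaussian elimination) is one convenient choice for a handful of predictors, not necessarily what the speaker used, and the toy data with a single binary predictor is invented.

```python
def fit_linear_probability(X, r):
    """Least-squares fit of p(R=1|x) = b0 + sum_i b_i * x_i.

    X is a list of feature lists, r a list of 0/1 responses.
    Returns [b0, b1, ...] by solving the normal equations.
    """
    rows = [[1.0] + [float(v) for v in x] for x in X]  # prepend intercept
    k = len(rows[0])
    A = [[sum(row[i] * row[j] for row in rows) for j in range(k)] for i in range(k)]
    c = [sum(row[i] * y for row, y in zip(rows, r)) for i in range(k)]
    # Gaussian elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda i: abs(A[i][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("predictors are collinear")
        A[col], A[pivot] = A[pivot], A[col]
        c[col], c[pivot] = c[pivot], c[col]
        for i in range(col + 1, k):
            f = A[i][col] / A[col][col]
            for j in range(col, k):
                A[i][j] -= f * A[col][j]
            c[i] -= f * c[col]
    # Back substitution.
    b = [0.0] * k
    for i in reversed(range(k)):
        b[i] = (c[i] - sum(A[i][j] * b[j] for j in range(i + 1, k))) / A[i][i]
    return b

def predict(b, x):
    """Predicted probability; may fall outside [0, 1] for extreme x."""
    return b[0] + sum(bi * xi for bi, xi in zip(b[1:], x))

# Invented toy data: one binary predictor (1 = received an email).
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
r = [0, 0, 0, 1, 1, 1, 0, 0]
b = fit_linear_probability(X, r)
print(b, predict(b, [1]))
```

With a single indicator predictor, the fitted intercept is the response rate of the untreated group and the coefficient is the difference between the two groups' rates, which is why the coefficients on the slides can be read directly as percentage-point effects.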
Including treatment indicators M and W
probability of visit = 7.5% + …
  + 6.5% IF (men’s past AND men’s email)
  + 6.6% IF (women’s past AND men’s email)
  + 6.1% IF (women’s past AND women’s email)
The men’s email is effective for customers who have
previously purchased men’s or women’s clothing.
The women’s email is not effective for customers who have
previously purchased only men’s clothing.
Optimal treatment policy:
• If only men’s previous purchases: send men’s email.
• If only women’s purchases: send either email.
• If both: send men’s email.
Hypothesis: Women tend to buy clothing for their families,
but men tend to buy clothing only for themselves.
Validation
How can we confirm that we have found an optimal policy?
Approach:
1. Train models of response for each treatment.
2. For each user x in a test set, plot both predicted probabilities.
3. Three separate test sets: users who previously purchased only
women’s clothing, only men’s, or both.
4. The latter two sets should show p(R=1|x, T=M) > p(R=1|x, T=W)
for most x.
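The approach above can be sketched in a few lines. To keep the example self-contained, simple lookup tables stand in for the trained per-treatment models (the talk's results use random forests); the segment names and every probability below are hypothetical.

```python
# Hypothetical per-treatment response models, replaced here by lookup
# tables keyed on purchase history. The numbers are invented.
P_M = {"mens_only": 0.19, "womens_only": 0.16, "both": 0.25}
P_W = {"mens_only": 0.12, "womens_only": 0.16, "both": 0.18}

def model_m(x):
    """Stand-in for a trained model of p(R=1 | x, T=M)."""
    return P_M[x["segment"]]

def model_w(x):
    """Stand-in for a trained model of p(R=1 | x, T=W)."""
    return P_W[x["segment"]]

def mean_gap(users):
    """Average of p(R=1|x,T=M) - p(R=1|x,T=W) over a test set."""
    return sum(model_m(x) - model_w(x) for x in users) / len(users)

# Three separate test sets, one per purchase-history segment.
test_sets = {
    seg: [{"segment": seg} for _ in range(5)]
    for seg in ("womens_only", "mens_only", "both")
}
gaps = {seg: mean_gap(users) for seg, users in test_sets.items()}
print(gaps)
```

A clearly positive gap in the men's-only and both segments, and a gap near zero in the women's-only segment, would be consistent with the proposed policy.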
Results using random forests:
Lower two panels: As expected,
p(R=1|x, T=M) > p(R=1|x, T=W).
Top panel: The two treatments
M and W are equally effective.
What comes next?
Conclusion: Indeed, one treatment (the men’s email) can be
optimal for all customers.
The step beyond uplift modeling is reinforcement learning:
Learning a sequence of actions that is best for each user.
• The goal is to maximize total lifetime reward from each
customer.
• Learn simultaneously how customers evolve and how
they respond to actions that we take.
Questions?

From Practice to Theory in Learning from Massive Data by Charles Elkan at BigMine16

  • 1. 1 From practice to theory in learning from massive data Charles Elkan Amazon Fellow August 14, 2016
  • 2. Important Information here is already public. Opinions are mine, not Amazon’s.
  • 3. 3
  • 4. Outline Only 30 minutes! 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 5. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 7. From theory to practice
  • 9. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 10. From practice to practice
  • 11.
  • 12. Outline 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation
  • 13. Academic versus applied: In theory, researchers favor simplicity. In practice, they don’t. In industry, simplicity genuinely wins. Example: desiderata for recommender systems: 1. Respect the privacy of users; don’t be creepy. 2. Make recommendations understandable. 3. Make them responsive to the user’s most recent interests. 4. Generate them with millisecond latency.
  • 14. Amazon’s most important recommender system: 1. Respect the privacy of users; don’t be creepy. 2. Make recommendations understandable. 3. Make them responsive to the user’s most recent interests. 4. Generate them with millisecond latency.
  • 16. What data scientists do every day: Let x be a user and let R = 0 or 1 be a response. For example, R=1 means the user buys shoes in the next month. Routinely, we train models to predict the probability p(R=1|x). We send messages and coupons to users with high p(R=1|x).
  • 17. Is p(R=1|x) actually useful? In principle, no. "Our goal is not to predict the future; it is to change the future." • Merely predicting user behavior is of limited interest. We want to select treatments that influence users. • T = t means we choose treatment t. • For each available t, compute p(R=1|x,T=t). • Choose the t that gives the highest probability.
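
The selection rule on this slide can be sketched as follows. This is a minimal illustration, assuming the per-treatment probabilities p(R=1|x,T=t) have already been estimated by some model; the function name `best_treatment` and the example numbers are hypothetical and do not come from the talk.

```python
def best_treatment(prob_by_treatment):
    """Return the treatment t with the highest estimated p(R=1|x,T=t).

    prob_by_treatment maps a treatment label to an estimated response
    probability for one user x. How those estimates are produced is
    outside this sketch.
    """
    if not prob_by_treatment:
        raise ValueError("need at least one candidate treatment")
    return max(prob_by_treatment, key=prob_by_treatment.get)


# Hypothetical estimates for a single user (illustrative numbers only).
estimates = {"no_email": 0.10, "coupon": 0.14, "newsletter": 0.12}
chosen = best_treatment(estimates)
```

Including a "no treatment" option among the candidates, as in the hypothetical example, lets the rule decline to act when every treatment is predicted to hurt.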
  • 18. The risk of ignoring uplift: Users are ranked by p(R=1|x), shown by the brown line. The blue dashed line shows p(R=1|x,T=t). The treatment t has a negative effect for users in the top 5%: p(R=1|x,T=t) < p(R=1|x).
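
One way to write down the quantity at stake, as a sketch: the symbol u_t(x) is notation introduced here for illustration and does not appear in the slides. It measures how far treatment t moves the response probability away from the baseline p(R=1|x) used on the slide.

```latex
u_t(x) = p(R=1 \mid x, T=t) - p(R=1 \mid x)
```

In this notation the users in the top 5% have u_t(x) < 0, so a campaign that targets users purely by high p(R=1|x) would reach exactly the users whom the treatment harms.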
  • 19. Politicians know this … If you are a Republican, don’t target confirmed Democratic voters! Instead: • Send persuasive messages to undecided voters. • Send “get out the vote” messages to confirmed supporters. • Send “please donate” messages to confirmed supporters as well.
  • 20. A common scenario for uplift: Many treatments are almost free to apply, such as sending email. The uplift question is then which treatment is most effective. For each user x, we want to know which t gives the highest value of p(R=1|x,T=t). Keep in mind: The same treatment may be the best for all x.
  • 21. A public dataset: Published by Kevin Hillstrom, former VP of database marketing at Nordstrom. Studied in several published papers on uplift, notably by Nicholas Radcliffe, professor at the University of Edinburgh. • 64,000 past customers of an e-commerce site selling clothing. • Randomized to no email, men’s email, or women’s email. • Three outcomes: binary visit, binary purchase, and numerical spend.
  • 22. Looking at the data: Treatments have a larger effect on “visit” than on “purchase given visit” or on “spend given purchase.” We'll analyze uplift (i.e., the causal influence of treatments) for visits. Table from Hillstrom’s MineThatData email analytics challenge by Radcliffe.
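
Because the treatments were randomized, a simple first estimate of uplift on visits is the difference between the visit rate in each email arm and the visit rate in the no-email arm. The sketch below assumes records given as (arm, visited) pairs; the arm labels and the tiny sample are made up for illustration and are not the Hillstrom data.

```python
from collections import defaultdict


def visit_rates(records):
    """Visit rate per arm from (arm, visited) pairs, with visited in {0, 1}."""
    totals = defaultdict(int)
    visits = defaultdict(int)
    for arm, visited in records:
        totals[arm] += 1
        visits[arm] += visited
    return {arm: visits[arm] / totals[arm] for arm in totals}


def uplift_vs_control(records, control="none"):
    """Visit rate of each treated arm minus the visit rate of the control arm."""
    rates = visit_rates(records)
    return {arm: rate - rates[control]
            for arm, rate in rates.items() if arm != control}


# Made-up toy sample with 4 customers per arm (not real data).
toy = [
    ("none", 0), ("none", 0), ("none", 0), ("none", 1),
    ("mens", 1), ("mens", 1), ("mens", 0), ("mens", 0),
    ("womens", 1), ("womens", 0), ("womens", 0), ("womens", 1),
]
uplift = uplift_vs_control(toy)
```

The same calculation can be repeated within customer segments (for example, by purchase history) to see whether the effect of an email differs across users.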
  • 23. The linear probability model: Assume the linear function p(R=1|x) = b_0 + ∑_i b_i x_i. • Find coefficients b_i to minimize square loss. Square loss is a proper scoring rule, so predicted probabilities tend to be calibrated. Avoid overfitting, and predictions below 0 or above 1, by not using too many predictors. Commonly used in econometrics, not in ML. In practice, often quite similar to logistic regression.
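
As a rough illustration of the fit described on this slide, the sketch below solves the least-squares normal equations in plain Python. The helper names `fit_lpm` and `predict_lpm` and the toy data are assumptions made for illustration; a real analysis would more likely rely on a statistics library.

```python
def fit_lpm(X, y):
    """Least-squares fit of p(R=1|x) = b0 + sum_i b_i * x_i.

    X is a list of feature rows, y a list of 0/1 responses.
    Returns [b0, b1, ...]. Raises ZeroDivisionError if the
    normal equations are singular (e.g. collinear features).
    """
    rows = [[1.0] + [float(v) for v in r] for r in X]  # prepend intercept
    k = len(rows[0])
    # Normal equations (A^T A) b = A^T y, stored as an augmented matrix.
    m = [
        [sum(r[i] * r[j] for r in rows) for j in range(k)]
        + [sum(r[i] * yi for r, yi in zip(rows, y))]
        for i in range(k)
    ]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(k):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [a - f * b for a, b in zip(m[r], m[col])]
    return [m[i][k] / m[i][i] for i in range(k)]


def predict_lpm(coefs, x):
    """Predicted probability; not clipped, so it can leave [0, 1]."""
    return coefs[0] + sum(b * xi for b, xi in zip(coefs[1:], x))


# Toy data: one binary predictor, made-up responses.
X = [[0], [0], [0], [0], [1], [1], [1], [1]]
y = [0, 0, 0, 1, 0, 1, 1, 1]
coefs = fit_lpm(X, y)
```

With a single binary predictor, the fitted values are simply the two group means (0.25 and 0.75 for the toy data), which is one reason the coefficients of this model are easy to read.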
  • 24. Including treatment indicators M and W: probability of visit = 7.5% + … + 6.5% IF (men’s past AND men’s email) + 6.6% IF (women’s past AND men’s email) + 6.1% IF (women’s past AND women’s email).
  • 25. The men’s email is effective for customers who have previously purchased men’s or women’s clothing. The women’s email is not effective for customers who have previously purchased only men’s clothing.
  • 26. Optimal treatment policy: • If only men’s previous purchases: send men’s email. • If only women’s previous purchases: send either email. • If both: send men’s email. Hypothesis: Women tend to buy clothing for their families, but men tend to buy clothing only for themselves.
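
The policy can be read off from the three interaction terms shown on slide 24. The sketch below sums, for each purchase history, the terms that each email switches on. It assumes that the terms hidden behind the "…" on that slide do not differ between the two emails, which the slides do not state; the function and variable names are illustrative.

```python
# Percentage-point terms copied from slide 24: (past purchase, email) -> increment.
TERMS = {
    ("mens", "M"): 6.5,
    ("womens", "M"): 6.6,
    ("womens", "W"): 6.1,
}


def email_increment(history, email):
    """Sum of the listed terms that apply to one customer and one email.

    history is a set drawn from {"mens", "womens"}; email is "M" or "W".
    Terms elided on the slide are not modeled here (assumption).
    """
    return sum(TERMS.get((past, email), 0.0) for past in history)


HISTORIES = {
    "only_mens": {"mens"},
    "only_womens": {"womens"},
    "both": {"mens", "womens"},
}

increments = {
    name: {email: email_increment(history, email) for email in ("M", "W")}
    for name, history in HISTORIES.items()
}
```

Under that assumption the men’s email adds 6.5 points versus 0 for men’s-only customers and about 13.1 versus 6.1 for customers with both histories, while for women’s-only customers the two emails are within half a point of each other (6.6 versus 6.1), which is consistent with "send either email".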
  • 27. Validation: How can we confirm that we have found an optimal policy? Approach: 1. Train models of response for each treatment. 2. For each user x in a test set, plot both predicted probabilities. 3. Use three separate test sets: users who previously purchased only women’s clothing, only men’s, or both. 4. The latter two sets (only men’s, and both) should show p(R=1|x, T=M) > p(R=1|x, T=W) for most x.
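
Steps 2 to 4 can be sketched as below, assuming step 1 has already produced two fitted models exposed as plain functions that return p(R=1|x, T=M) and p(R=1|x, T=W). The names `share_m_beats_w` and `segment_of`, the feature layout, and the toy predictors are hypothetical stand-ins, not the models used in the talk. Instead of a plot, the sketch reports, for each test segment, the share of users for whom the men’s email is predicted to do better.

```python
from collections import defaultdict


def share_m_beats_w(test_users, predict_m, predict_w, segment_of):
    """Per segment, fraction of test users with p(R=1|x,T=M) > p(R=1|x,T=W)."""
    wins = defaultdict(int)
    counts = defaultdict(int)
    for x in test_users:
        seg = segment_of(x)
        counts[seg] += 1
        if predict_m(x) > predict_w(x):
            wins[seg] += 1
    return {seg: wins[seg] / counts[seg] for seg in counts}


# Toy users: (bought_mens, bought_womens, recency_months); made-up values.
users = [(1, 0, 2), (1, 0, 9), (0, 1, 3), (0, 1, 8), (1, 1, 1), (1, 1, 6)]


def segment_of(x):
    """Assign a user to one of the three test sets by purchase history."""
    if x[0] and x[1]:
        return "both"
    return "only_mens" if x[0] else "only_womens"


# Stand-in predictors with arbitrary coefficients, for illustration only.
def predict_m(x):
    return 0.08 + 0.06 * x[0] + 0.06 * x[1] - 0.002 * x[2]


def predict_w(x):
    return 0.08 + 0.06 * x[1] - 0.002 * x[2]


shares = share_m_beats_w(users, predict_m, predict_w, segment_of)
```

A share near 1 in the "only men’s" and "both" segments would support the policy; a share near one half, or predicted probabilities that are nearly equal, in the "only women’s" segment would be consistent with the two emails being interchangeable there.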
  • 28. Results using random forests: Lower two panels: As expected, p(R=1|x, T=M) > p(R=1|x, T=W). Top panel: The two treatments M and W are equally effective.
  • 29. What comes next? Conclusion: Indeed, one treatment (the men’s email) can be optimal for all customers. The step beyond uplift modeling is reinforcement learning: Learning a sequence of actions that is best for each user. • The goal is to maximize total lifetime reward from each customer. • Learn simultaneously how customers evolve and how they respond to actions that we take.
  • 30. Questions? 1. Detecting anomalies in streaming data 2. Making Spark usable for real-time predictions 3. Amazon’s most important algorithm for recommendations 4. Uplift: We want causation, not merely correlation