DataScience@SMU
Language Empowered
Recommendations
Albert Asuncion, Peter Kouvaris, Ekaterina
Pirogova, Hari Sanadhya and Arun Rajagopal
1
Recommendations Today
2
Recommendation Systems
Types
3
Earliest Modern New generation
Item, User, Rating
Items, Users, Context, Time,
Location, Rating
Algorithm
Recommendation
Recommendation Systems
Types
4
Earliest Modern New generation
+
Item, User, Rating
Items, Users, Context, Time, Location, Rating
Everything in modern + Reviews text data
Algorithm
Recommendation Algorithm
Recommendation
Not All Recommendation Data is
Equal
5
Yelp – Some Data Limitations
6
Most non-mobile app users are not registered
Recommendation success/failure is difficult to measure
User time of visit is unknown
Ideal data structure
-?
Yelp's Primary Data Point - The Review
7
Yelp Star Rating
What Does it Mean?
• Ambiguity
• A 2-star rating is greater than 1-star, but by
how much?
8
Star Bias
9
Current Stars Distribution (left), Ideal Shape (right)
But What If We Could
Create a Super Star?
10
The Information Within Text
11
Objectives
12
I. Make ordinal star ratings more meaningful
II. Improve the modern generation recommendation system with the value from the text
High Level View Of Our
Strategy
13
Our Super Star Factory
14
Building the Recommendations
15
How We Measure Each Method
16
How We Measure Each Method
17
Overview of Our New Star
Results
18
Basic Stars
Overview of Super Star
Results
19
Basic Stars Sentiment Stars
Overview of Super
Star Results
20
Basic Stars Sentiment Stars Deep Learning Stars
How to Think About The Deep
Learning Approach
21
Recommendation with
Super Stars
22
Basic: 0.4307
NLP: 0.5289
*22% FCP improvement (Fraction of Concordant Pairs)
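FCP, the metric quoted on this slide, measures how often a model orders pairs of a user's items the same way that user's true ratings do. The sketch below is one simple reading of that definition, not the authors' evaluation code. The function name `fcp` and the sample data are hypothetical, and counting tied predictions as discordant is an assumption. The last lines check the relative gain implied by the two scores on the slide.

```python
from collections import defaultdict
from itertools import combinations


def fcp(predictions):
    """Fraction of Concordant Pairs.

    predictions: iterable of (user, true_rating, predicted_rating).
    Pairs are formed within each user only.
    """
    by_user = defaultdict(list)
    for user, true_r, pred_r in predictions:
        by_user[user].append((true_r, pred_r))

    concordant = 0
    discordant = 0
    for rated in by_user.values():
        for (t1, p1), (t2, p2) in combinations(rated, 2):
            if t1 == t2:
                continue  # no true ordering to agree or disagree with
            if (t1 - t2) * (p1 - p2) > 0:
                concordant += 1  # predicted order matches true order
            else:
                discordant += 1  # wrong order, or tied prediction (assumption)

    total = concordant + discordant
    return concordant / total if total else float("nan")


# Hypothetical sample: (user, true stars, predicted stars)
sample = [
    ("u1", 5, 4.6), ("u1", 3, 3.9), ("u1", 1, 4.1),
    ("u2", 4, 3.2), ("u2", 2, 2.5),
]
score = fcp(sample)  # 3 of the 4 comparable pairs are ordered correctly

# Relative gain implied by the two FCP scores on the slide
basic_fcp, nlp_fcp = 0.4307, 0.5289
relative_gain = (nlp_fcp - basic_fcp) / basic_fcp  # about 0.228
```

Because FCP looks only at ordering, it does not depend on how far apart the star levels are assumed to be, which suits ordinal ratings.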
Where Yelp's
Recommendations Are Today
• Yelp aims for scalable solutions.
23
Limitations of this Approach
• Spark API is limited. Must remain
distributed.
• Preprocessing spam is too harsh
24
Why this Way? Yelp's Incentive:
25
Revenue Sources (2017)
Features in 3rd Generation
Recommendations
26
New Generation Data Example for Random User
Methods in 3rd Generation
Recommendations
27
New generation
+
Algorithm
Recommendation
• Text processing
• Image processing
• Feature creation
Thinking About
Recommendation and Ethics
Search rank in recommendation systems
becomes economically valuable as the
product gains popularity.
28
Lawsuits Due to Bad Reviews
29
The Effects of Unsafe
Recommendation Products
• Attackers gain the economic benefit.
• Good products or companies lose
market share.
30
Are users’ livelihoods directly tied
to our system?
Privacy for Users
31
• Anonymity is important in cases where
recommendations hold value.
• e.g. Businesses pursuing positive
reviews
Weaknesses with Reliance on
Text Data
Adversarial agents that were trained in the
same way can get to the top!
32
Conclusions
• The use of text data makes star ratings
more meaningful.
• The text of reviews improves
recommendation system quality, even
with small sample sizes.
• The fake-review penalty in the Yelp
system is too harsh.
33


Editor's Notes

  1. Recommendations are ubiquitous in today's technology products. They save us time so we don’t have to continuously search. On LinkedIn we are recommended to connect with individuals closely related to us, and on YouTube we see an endless array of quality data science videos. On Amazon we are recommended products we may never have heard of but would like to buy. On Yelp we are shown popular restaurants we may not have heard of. Let's take a look at how these systems are built today.
  2. What started out as a simple item, user, and rating algorithm has evolved. Today these systems include not just the recommendation layer but also preprocessing of inputs, so that the recommendations can leverage new types of data. New data points are now included to add value.
  3. As data science methods related to natural language processing evolve, a new layer is being introduced to this stack. The power of language is being added in what experts are calling an ontology model, where a user is no longer just a rating tied to an item, but instead a complex web of interests and opinions.
  4. From this added complexity we can conclude that while data may be valuable for a business purpose, not all of these valuable data points naturally translate over to being used in recommendations. We want to know our data is representative and can be trusted for these algorithms!
  5. Looking at our Venn diagram, Yelp is missing a majority of these dimensions. Most non-mobile users are not registered, so no additional data is captured. Measuring success and failure is generally difficult, and finally we rarely know when a user visited.
     1. Users do not need to be logged in to view ratings and reviews. Most visitors to Yelp are therefore unregistered, which makes personalized, user-based recommendations less useful.
     2. Without knowing which user received a recommendation, it is not possible to verify whether that user actually went to the recommended restaurant or to collect their feedback on it. If this were possible, the accuracy of the recommendation system could be improved over time.
     3. The time of a reviewer's visit to the business is not recorded, so the delay between the visit and the writing of the review cannot be calculated. This information would be critical, because the longer the delay, the more likely it is that the rating will be a little lower. "Almost everyone remembers negative things more strongly and in more detail" (Clifford Nass, a professor of communication at Stanford University).
  6. Yelp's primary data comes in the form of a business review like this. Users post these with a star rating and review text, and can add images or check in if they want. There is also some metadata about the reviews themselves. For recommendations, the 1-to-5 star rating seems like the clear starting point.
  7. The Yelp star rating is an ordinal data point. It is naturally ambiguous: we can’t be sure exactly how far each rank is from the next. This distance is key, because the algorithms that allow us to provide useful recommendations leverage it in a very literal mathematical way.
  8. The ambiguous nature of star ratings is clear on a closer look. We noted that star distributions were heavily skewed, which becomes a large issue when training the recommender algorithm. A better shape would be what we see on the right, a distribution centered on an average experience.
  9. So what if we can transform these stars into something better? By infusing each with the text of its review, perhaps we can achieve a more normal distribution and in turn improve the recommendations our model produces. _________________________________ Recommendation algorithms, at a high level, provide you with information based on other users who look similar to you. If we cannot clearly pin down what is bad, average, and great for a user in a clean numeric format, then we are not using these tools as well as we could. Out of the box our data looks like the left here, but what we would actually prefer is something closer to the right. So what if we could create this better star rating? We just want something that carries this clear information in a numeric format. Our recommendations would benefit heavily, because what was previously noise that confused our algorithm would become relevant signal. Our recommendations will greatly benefit from this data! https://docs.google.com/drawings/d/1bLuT9jKVSbOXag8GFGljIWhTSUHFeR6m7bKhvVMSV7o/edit?usp=sharing
  10. The information in the text of a review is very personal. Different individuals use different word choices, word order, grammar, and more. Using these pieces of information we will attempt to redistribute stars so that they hold more information for our recommender models. 
  11. Our strategy, at a high level, is to focus our efforts on the star creation part of the pipeline, taking the Yelp data and creating new stars. We will then build recommendation models and review the effect of these stars to get a sense of what our methods add. https://docs.google.com/drawings/d/1NV41F5ZzstL9TcgRneOyTnQNcnTV5dBfrjiT0VbeFkA/edit?usp=sharing
  12. Our star factory has two sides. One neural network learns the normal review style for each business, and then another learns it for each user. The combined star output considers text regarding both entities. Our final result is therefore a new star rating. https://docs.google.com/drawings/d/131waFzcnjVKWB8CWjPlOG_jQNOZQiGgjT6RkBbSrxQw/edit?usp=sharing
  13. Building a recommendation algorithm, and making sure that its format does not distort the results, is key. We used an algorithm from the same family as Yelp's, the collaborative filtering family. These methods will make much better use of our adjusted stars compared to the previous type. Drawings https://docs.google.com/drawings/d/1tBXzUZKp_1Item7H6sb3TRpWPXC4rLrwXe_6rxn5aZo/edit?usp=sharing
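To make the collaborative filtering idea concrete, here is a minimal user-based sketch in plain Python. It is not the model used in this project or by Yelp; the toy ratings, the names `cosine_similarity` and `predict`, and the choice of cosine similarity are all assumptions made for illustration.

```python
from math import sqrt

# Toy ratings: user -> {business: stars}. All names and values are made up.
ratings = {
    "ann": {"cafe": 5.0, "diner": 3.0, "sushi": 4.0},
    "bob": {"cafe": 4.0, "diner": 2.0, "sushi": 5.0, "tacos": 4.0},
    "cat": {"cafe": 1.0, "diner": 5.0, "tacos": 2.0},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the businesses both users rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def predict(user, item):
    """Similarity-weighted average of other users' ratings for `item`."""
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        weight = cosine_similarity(ratings[user], their)
        num += weight * their[item]
        den += weight
    return num / den if den else None

# Ann has not rated "tacos"; the estimate leans toward Bob, who rates like her.
tacos_for_ann = predict("ann", "tacos")
```

The star values enter the similarity and the weighted average directly, which is why the spacing between star levels matters so much to the result.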
  14. We measure whether our methods improve the final recommendation model. To do this, we first look at the adjusted star distribution at the aggregate level: a single histogram of all the new star ratings. https://docs.google.com/drawings/d/1TzaYyh4KV2ywfXKenQH00MIa1mEI2uuITfr-ZF50tq8/edit?usp=sharing
  15. We also measure the skewness of every user's star distribution to ensure the method is having an effect across all our users. We will explain this in more detail in a moment, but you can think of our second metric as a way to summarize the histograms of each user's own star distribution. https://docs.google.com/drawings/d/1TzaYyh4KV2ywfXKenQH00MIa1mEI2uuITfr-ZF50tq8/edit?usp=sharing
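A rough sketch of the per-user skewness metric, in plain Python, might look like the following. The population (Fisher-Pearson) skewness formula, the three-review minimum, and the function names are assumptions for illustration; the notes do not say which skewness estimator was used.

```python
from collections import defaultdict
from statistics import mean

def skewness(values):
    """Population (Fisher-Pearson) skewness, or None when it is undefined."""
    n = len(values)
    if n < 3:
        return None  # assumption: too few reviews to say anything useful
    mu = mean(values)
    m2 = sum((v - mu) ** 2 for v in values) / n
    if m2 == 0:
        return None  # every rating identical, skewness undefined
    m3 = sum((v - mu) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

def per_user_skewness(reviews):
    """reviews: iterable of (user_id, stars). Returns {user_id: skewness}."""
    by_user = defaultdict(list)
    for user, stars in reviews:
        by_user[user].append(stars)
    skews = {u: skewness(s) for u, s in by_user.items()}
    return {u: s for u, s in skews.items() if s is not None}

# Made-up reviews: u1 mostly gives 5s, u2 is symmetric, u3 has one review.
sample = [("u1", 5), ("u1", 5), ("u1", 5), ("u1", 4), ("u1", 1),
          ("u2", 1), ("u2", 2), ("u2", 3), ("u2", 4), ("u2", 5),
          ("u3", 4)]
skews = per_user_skewness(sample)
```

A histogram of the values in `skews` is one way to draw the user skewness plot: a tight cluster around zero means most users have roughly symmetric star distributions.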
  16. Our basic star method is, as we would expect, weak along the dimensions we are using to measure success. Its distribution is heavily skewed, and the tails of the skew plot extend from –3 to 2. Simply put, our standard format isn't ideal for recommendation.
  17. If we use a basic sentiment score as our new stars, one of the simpler NLP applications, we notice a change in the overall distribution. Looking at the skewness of users as a whole, however, we note that the user skewness plot is almost an exact match for the basic stars plot. This means that the method reshapes the stars overall but does not help individual user distributions.
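As an illustration of what "a sentiment score as the new star" could mean, here is a toy sketch in Python. The tiny word lists, the function names, and the linear mapping from a score in [-1, 1] to the 1 to 5 range are all assumptions; the notes do not name the sentiment tool that was actually used.

```python
# Hypothetical mini-lexicon, for illustration only.
POSITIVE = {"great", "delicious", "friendly", "love"}
NEGATIVE = {"bad", "rude", "cold", "slow"}

def sentiment_score(text):
    """Score in [-1, 1]: (positive hits - negative hits) / total hits."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    total = pos + neg
    return (pos - neg) / total if total else 0.0

def sentiment_stars(text):
    """Map the score linearly onto the 1 to 5 star range (0 maps to 3)."""
    return 3.0 + 2.0 * sentiment_score(text)

happy = sentiment_stars("Great food, friendly staff!")
unhappy = sentiment_stars("Slow service and cold, bad fries.")
neutral = sentiment_stars("The menu is printed on paper.")
```

Because every review is scored the same way regardless of who wrote it, a mapping like this can change the overall histogram without correcting for any individual user's habits.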
  18. Our final combined star method achieves both of our goals, producing a normal distribution on the top plot and reducing the tails of our users' skewness by a fair margin. ________________ Looking at the plot on the bottom right, we see the tails have shortened as well. This highlights that our users' star data now looks closer to a bell curve on average, and we have begun to successfully change star ratings at the user level, adding a great depth of information.
  19. The best way to think about our Deep Learning approach is to understand the basic goal of the architecture. The goal is to learn what a user's and a business's historic reviews look like, then analyze the current text and use that respective value to adjust the star rating. __________________ In this manner, deviations from normal behavior, like a very strict user giving a positive and flowery review for the first time, will be noted as very important compared to the same review text from another user who is generally very positive. https://docs.google.com/drawings/d/10JTmP1x3o3Cq1yFVtayavczlcE8YmRCwzz_J2xfEJIs/edit?usp=sharing
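The intuition of adjusting a star by how unusual the text is for that user and that business can be sketched without any neural network. The snippet below is a deliberately simplified stand-in, not the architecture described in these notes: it assumes each review text has already been reduced to a single score in [-1, 1], uses plain historic averages as the "normal style", and the name `adjusted_star` and the `weight` parameter are hypothetical.

```python
from statistics import mean

def adjusted_star(stars, text_score, user_history, business_history, weight=1.0):
    """Shift `stars` by how far the current text score deviates from the
    historic average text score of the user and of the business."""
    user_dev = text_score - mean(user_history)
    business_dev = text_score - mean(business_history)
    raw = stars + weight * (user_dev + business_dev) / 2
    return min(5.0, max(1.0, raw))  # clamp to the 1 to 5 star range

business_history = [0.2, 0.3, 0.1]    # made-up past text scores
strict_user = [-0.6, -0.4, -0.5]      # usually negative reviewer
generous_user = [0.7, 0.8, 0.9]       # usually positive reviewer

# Same 4-star review with the same glowing text score from both users.
strict = adjusted_star(4, 0.8, strict_user, business_history)
generous = adjusted_star(4, 0.8, generous_user, business_history)
```

The strict user's glowing review is boosted more than the generous user's, which is the behavior described above: the same words carry more information when they are out of character.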
  20. Our new stars perform well on an out-of-sample dataset using the FCP (fraction of concordant pairs) score, a common metric for these methods where higher is better. The ultimate validation for this method, however, would be delivering our recommendations to users via a product and reviewing the response. https://docs.google.com/drawings/d/10JTmP1x3o3Cq1yFVtayavczlcE8YmRCwzz_J2xfEJIs/edit?usp=sharing
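For readers unfamiliar with FCP, the following Python sketch shows one way to compute it. For each user it looks at every pair of rated items whose true ratings differ and checks whether the predictions rank them in the same order. The function name, the input format, and the choice to count tied predictions as discordant are assumptions; library implementations may handle ties and per-user averaging slightly differently.

```python
from collections import defaultdict
from itertools import combinations

def fcp(predictions):
    """predictions: iterable of (user, true_rating, predicted_rating).
    Returns concordant / (concordant + discordant) over within-user pairs."""
    by_user = defaultdict(list)
    for user, true_r, pred_r in predictions:
        by_user[user].append((true_r, pred_r))

    concordant = discordant = 0
    for rated in by_user.values():
        for (t1, p1), (t2, p2) in combinations(rated, 2):
            if t1 == t2:
                continue  # no true ordering to agree or disagree with
            if (t1 - t2) * (p1 - p2) > 0:
                concordant += 1
            else:
                discordant += 1  # wrong order, or tied predictions

    total = concordant + discordant
    return concordant / total if total else 0.0

# Made-up predictions: two of the four comparable pairs are ordered correctly.
toy = [("u1", 5, 4.5), ("u1", 3, 3.0), ("u1", 1, 3.5),
       ("u2", 4, 2.0), ("u2", 2, 3.0)]
score = fcp(toy)
```

A score of 0.5 is what random ordering would give on average, and 1.0 means every comparable pair is ranked correctly.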
  21. Let's compare where Yelp is today. Why aren't they doing this? Yelp implements its recommendation model in Spark, sandwiching it between two layers of filtering and machine learning. Within these layers they use things like NLP and Deep Learning to remove bad (perhaps better described as fake) reviews. https://docs.google.com/drawings/d/1IGARr_gqf1CMmqGEfoHrs3sKxkn_a45bbWhCGL_hpDM/edit?usp=sharing
  22. Yelp's implementation has a few limitations. The Spark API, while scalable, is difficult to extend. Yelp's preprocessing is often considered clunky, with valid and useful reviews seemingly discarded at random. This can cascade into some odd suggestions for users. https://docs.google.com/drawings/d/1BuM0HodwdqTQpdMcnZi9S8DejneYFqZnITRaj2tQ4VU/edit?usp=sharing
  23. While Yelp is focused on providing a valuable service, its incentive is somewhat at odds with providing the best recommendations. Because it makes most of its revenue from advertising businesses near the top of search results, adjusting what is shown to users would damage revenues. Should recommendation results be mixed between paid and unpaid? Ethics discussion. http://www.yelp-ir.com/news-releases/news-release-details/yelp-reports-fourth-quarter-and-full-year-2017-financial-results
  24. New methods extend the data past just what's in the data set 
  25. Third-generation recommender systems are no longer just about the collaborative filtering algorithm. Building additional components to amplify accuracy or increase robustness against noise is the key to building these products in the coming age. As we have shown in this presentation so far, the algorithm layer of the recommendation stack is open for data scientists to do whatever they feel could improve the end result.
  26. Thinking about the consequences of a heavily relied-upon recommendation system becomes important as the product grows. Google has had growing pains in this space as it became important enough to cause damage, helping criminals find protected persons or destroying businesses that relied on their search rank through an unfair automated ban.
  27. These cascading effects can be anticipated by asking what type of value the platform delivers, and what additional benefit comes from being ranked near the top. For Yelp, the additional free publicity on one of the most trafficked restaurant review websites is highly valuable. Not only do attackers who game the ranking benefit, but good products and companies also lose market share. We must ask ourselves: are users' livelihoods directly tied to our system?
  28. When these systems become economically important, privacy becomes an issue of user safety. If an Elite Yelp user has heavy sway over the success of a business, what is expected of technologists in maintaining anonymity for users without sacrificing quality?
  29. There are some weaknesses when relying on these methods. Adversarial agents that have been trained in a similar way can dupe our system and get ranked highly. Adding a large volume of fake reviews to a system like this would cause a large restructuring of the results delivered, which would have some cascading effects. 
  30. 1. We can improve ordinal star ratings by translating them with text data to create a new star rating. 2. These new star ratings can then simply be passed to a Yelp recommendation algorithm and will test highly on out-of-sample data.