SlideShare a Scribd company logo
Rayana & Akoglu
Shebuti Rayana Leman Akoglu*
Metadata
Rayana & Akoglu 2
 How do consumers learn about product quality?
 Advertisements
 Consumer review websites (Yelp, TripAdvisor etc.)
 Impact of consumer reviews on sales?
Collective Opinion Spam Detection
+1 star-rating increases
revenue by 5-9%
Harvard Study by M. Luca
Reviews, Reputation, andRevenue: The Case ofYelp.com
Rayana & Akoglu 3
 Paid/Biased reviewers write fake reviews
 unjustly promote / demote products or businesses
Collective Opinion Spam Detection
Problem ?
Humans only slightly better than chance
Finding Deceptive Opinion Spam by Any Stretch of the Imagination Ott et al. 2011
Rayana & Akoglu 4Collective Opinion Spam Detection
OnlineReviewSystem
Review network
Meta Data
spammer
target product
fake review
Rayana & Akoglu 5
Main contributions:
 SpEagle : a Collective approach to opinion spam
 is unsupervised
 can easily leverage labels (SpEagle+)
 improves detection performance
 Computationally light version : SpLite
 significant speed-up
Collective Opinion Spam Detection
Reviewnetwork
Metadata
Rayana & Akoglu 6Collective Opinion Spam Detection
Review
Network
Review
Text
Review
Behavior
Supervision
Ott’2011 ✓ supervised
Mukherjee’
2013
✓ ✓ supervised
Jindal’2008 ✓ supervised
Co-training
[Li’2011]
✓ semi-supervised
Wang’2011 ✓ ✓ unsupervised
FraudEagle ✓ unsupervised
SpEagle ✓ ✓ ✓ unsupervised
SpEagle+ ✓ ✓ ✓ semi-supervised
Rayana & Akoglu 7
A network classification problem
 Given
 User-Review-Product network (tri-partite)
 Features extracted from metadata (i.e. text, behavior)
– for users, reviews, and products
 Classify network objects into type-specific classes
 Users (‘benign’ vs. ‘spammer’)
 Products (‘non-target’ vs. ‘target’)
 Reviews (‘genuine’ vs. ‘fake’)
Collective Opinion Spam Detection
writes belongs
U R P
Rayana & Akoglu 8Collective Opinion Spam Detection
Workflow
Rayana & Akoglu 9Collective Opinion Spam Detection
Workflow
Rayana & Akoglu 10
 A collective classification approach (unsupervised)
 Objective function utilizes pairwise Markov Random Fields
Collective Opinion Spam Detection
Node labels as
random variables
edge
potential
(label-label)
prior belief edge potential
(label-observed label)
max
y
edge type
t
t
Rayana & Akoglu 11
 A collective classification approach (unsupervised)
 Objective function utilizes pairwise Markov Random Fields
- Inference problem (NP-hard)
 Loopy Belief Propagation (LBP)
Collective Opinion Spam Detection
edge type
prior
1) Repeat for each node:
2) At convergence:
edge
potential
j
i
Rayana & Akoglu 12
 A collective classification approach (unsupervised)
 Objective function utilizes pairwise Markov Random Fields
- Inference problem (NP-hard)
 Loopy Belief Propagation (LBP)
Collective Opinion Spam Detection
edge type
prior
1) Repeat for each node:
2) At convergence:
edge
potential
Rayana & Akoglu 13
Metadata
Features
Spam Scores
Priors
Collective Opinion Spam Detection
Users: ‘benign’ ‘spammer’
Products: ‘non-target’ ‘target’
Reviews: ‘genuine’ ‘fake’
Rayana & Akoglu 14Collective Opinion Spam Detection
User Features Product Features Review Features
• maximum #reviews/day
• ratio of +ve/-ve reviews
• avg/weighted rating
deviation
• rating deviation entropy
• temporal gaps entropy
• burstiness of reviews
• maximum #reviews/day
• ratio of +ve/-ve reviews
• avg/weighted rating
deviation
• rating deviation entropy
• temporal gaps entropy
• rank order of reviews
• absolute/thresholded
rating deviation
• extremity of rating
• early time frame
• singleton review
• review length (#words)
• avg content similarity
• max content similarity
• review length
• average content similarity
• maximum content
similarity
• ratio subjective/objective
• description length
• ratio of exclamation sent.
• freq. of similar reviews
• % capital letters
• review length
• ratio 1st person pronoun
TextBehavioral
Rayana & Akoglu 15Collective Opinion Spam Detection
User Features Product Features Review Features
• maximum #reviews/day
• ratio of +ve/-ve reviews
• avg/weighted rating
deviation
• rating deviation entropy
• temporal gaps entropy
• burstiness of reviews
• maximum #reviews/day
• ratio of +ve/-ve reviews
• avg/weighted rating
deviation
• rating deviation entropy
• temporal gaps entropy
• rank order of reviews
• absolute/thresholded
rating deviation
• extremity of rating
• early time frame
• singleton review
• review length (#words)
• avg content similarity
• max content similarity
• review length
• avg content similarity
• max content similarity
• ratio subjective/objective
• description length
• ratio of exclamation sent.
• freq. of similar reviews
• % capital letters
• review length
• ratio 1st person pronoun
TextBehavioral
Rayana & Akoglu 16Collective Opinion Spam Detection
User Features Product Features Review Features
• maximum #reviews/day
• ratio of +ve/-ve reviews
• avg/Weighted Rating
Deviation
• rating deviation entropy
• temporal gaps entropy
• burstiness of reviews
• maximum #reviews/day
• ratio of +ve/-ve reviews
• avg/Weighted Rating
Deviation
• rating deviation entropy
• temporal gaps entropy
• rank order of reviews
• absolute/thresholded
rating deviation
• extremity of rating
• early time frame
• singleton review
• review length (#words)
• avg content similarity
• max content similarity
• review length
• avg content similarity
• max content similarity
• ratio subjective/objective
• Description Length
• ratio of exclamation sent.
• freq. of similar reviews
• % capital letters
• review length
• ratio 1st person pronoun
TextBehavioral
temporal order
words in review
Rayana & Akoglu 17Collective Opinion Spam Detection
(H)igher more suspicious
(L)ower more suspicious
Rayana & Akoglu 18
Q: How to handle features with different
scales?A: Cumulative distribution:
 For each feature l , 1 ≤ l ≤ F and its corresponding
value xli for node i
 Combine F values for each node i:
 Priors :
Collective Opinion Spam Detection
Rayana & Akoglu 19
 A collective classification approach (unsupervised)
 Objective function utilizes pairwise Markov Random Fields
- Inference problem (NP-hard)
 Loopy Belief Propagation (LBP)
Collective Opinion Spam Detection
edge type
prior
1) Repeat for each node:
2) At convergence:
edge
potential
Rayana & Akoglu 20Collective Opinion Spam Detection
writes belongs
Rayana & Akoglu 21
Beliefs as class probabilities:
 Probi(spammer) = bi(yi : spammer)
 Probk(fake) = bk(yk : fake)
 Probj(target) = bj(yj : target)
Collective Opinion Spam Detection
Rayana & Akoglu 22
 SpEagle can work semi-supervised
 Can incorporate labels seamlessly
 Can use user, review, and/or product labels
 For labeled nodes, priors are set to:
 φ ← {ϵ, 1 − ϵ} for spam category
(i.e., fake, spammer, or target)
 φ ← {1− ϵ, ϵ} for non-spam
category
Collective Opinion Spam Detection
Rayana & Akoglu 23
 3 Yelp datasets1: recommended vs. non-recommended
 YelpChi –hotel and restaurant reviews from Chicago
 YelpNYC –restaurant reviews from New York City
 YelpZip –restaurants reviews from zipcodes in NJ, VT, CT, PA
Collective Opinion Spam Detection
1 Datasets are made available to the community
2 A spammer has at least one filtered review
2
Rayana & Akoglu 25
 Area Under Curve (AUC)
 for ROC curve (TPR vs. FPR)
 Average Precision (AP)
 AUC for Precision-Recall curve
 Precision@k : ratio of spam in top k
 NDCG@k : weighted scoring which favors top items
Collective Opinion Spam Detection
Rayana & Akoglu 26
 SpEagle superior to existing methods
 Different priors: User & Review priors most informative
Collective Opinion Spam Detection
Rayana & Akoglu 27
 Labels improve performance significantly
Collective Opinion Spam Detection
Rayana & Akoglu 28Collective Opinion Spam Detection
 Labels improve performance significantly
Rayana & Akoglu 29Collective Opinion Spam Detection
 Labels improve performance significantly
Rayana & Akoglu 30
Light version of
SpEagle
SpLite (SpLite+)
Rayana & Akoglu 31Collective Opinion Spam Detection
 SpEagle+ vs SpLite+ perform comparably
Rayana & Akoglu 32Collective Opinion Spam Detection
 SpLite+ is orders of magnitude faster than SpEagle+
Rayana & Akoglu 33
Main contributions:
 SpEagle : a Collective approach to opinion spam
 is unsupervised
 can easily leverage labels (SpEagle+)
 improves detection performance
 Computationally light version : SpLite (SpLite+)
 significant speed-up
Collective Opinion Spam Detection
Reviewnetwork
Metadata
Rayana & Akoglu 34Collective Opinion Spam Detection
Code and Data available:
http://shebuti.com/collective-opinion-spam-detection/
srayana@cs.stonybrook.edu
http://www.cs.stonybrook.edu/~datalab/

More Related Content

Similar to Collective Opinion Spam Detection Bridging Review Networks and Metadata

Yelper Helper Concept
Yelper Helper ConceptYelper Helper Concept
Yelper Helper Concept
alexruizeuler
 
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...
Nobal Niraula
 
Opinion Fraud Detection in Online Reviews by Network Effects
Opinion Fraud Detection in Online Reviews by Network EffectsOpinion Fraud Detection in Online Reviews by Network Effects
Opinion Fraud Detection in Online Reviews by Network Effects
SOYEON KIM
 
Yelp Fake Reviews Detection_new_v23.pptx
Yelp Fake Reviews Detection_new_v23.pptxYelp Fake Reviews Detection_new_v23.pptx
Yelp Fake Reviews Detection_new_v23.pptx
ridhimamittal3011
 
How to Interpret Implicit User Feedback
How to Interpret Implicit User FeedbackHow to Interpret Implicit User Feedback
How to Interpret Implicit User Feedback
Ladislav Peska
 
Rating System Algorithms Document
Rating System Algorithms DocumentRating System Algorithms Document
Rating System Algorithms Document
Scandala Tamang
 
V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...
V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...
V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...
eMadrid network
 
Opinion-Based Entity Ranking
Opinion-Based Entity RankingOpinion-Based Entity Ranking
Opinion-Based Entity Ranking
Kavita Ganesan
 
Fashiondatasc
FashiondatascFashiondatasc
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
ssuser059331
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
ssuser059331
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
Ernesto Mislej
 
Online feedback correlation using clustering
Online feedback correlation using clusteringOnline feedback correlation using clustering
Online feedback correlation using clustering
awesomesos
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
Stanley Wang
 
Spot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor ReviewsSpot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor Reviews
Yousef Fadila
 
Detection of Fraud Reviews for a Product
Detection of Fraud Reviews for a ProductDetection of Fraud Reviews for a Product
Detection of Fraud Reviews for a Product
IJSRD
 
2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...
2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...
2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...
Crossref
 
Mahendra nath
Mahendra nathMahendra nath
Mahendra nath
MahendraDwivedi7
 
Recommendation Systems Roadtrip
Recommendation Systems RoadtripRecommendation Systems Roadtrip
Recommendation Systems Roadtrip
The Real Dyl
 
Personalizing the web building effective recommender systems
Personalizing the web building effective recommender systemsPersonalizing the web building effective recommender systems
Personalizing the web building effective recommender systems
Aravindharamanan S
 

Similar to Collective Opinion Spam Detection Bridging Review Networks and Metadata (20)

Yelper Helper Concept
Yelper Helper ConceptYelper Helper Concept
Yelper Helper Concept
 
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...
Estimating the Helpfulness and Economic Impact of Product Reviews: Mining Tex...
 
Opinion Fraud Detection in Online Reviews by Network Effects
Opinion Fraud Detection in Online Reviews by Network EffectsOpinion Fraud Detection in Online Reviews by Network Effects
Opinion Fraud Detection in Online Reviews by Network Effects
 
Yelp Fake Reviews Detection_new_v23.pptx
Yelp Fake Reviews Detection_new_v23.pptxYelp Fake Reviews Detection_new_v23.pptx
Yelp Fake Reviews Detection_new_v23.pptx
 
How to Interpret Implicit User Feedback
How to Interpret Implicit User FeedbackHow to Interpret Implicit User Feedback
How to Interpret Implicit User Feedback
 
Rating System Algorithms Document
Rating System Algorithms DocumentRating System Algorithms Document
Rating System Algorithms Document
 
V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...
V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...
V Jornadas eMadrid sobre “Educación Digital”. Roberto Centeno, Universidad Na...
 
Opinion-Based Entity Ranking
Opinion-Based Entity RankingOpinion-Based Entity Ranking
Opinion-Based Entity Ranking
 
Fashiondatasc
FashiondatascFashiondatasc
Fashiondatasc
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
 
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.pptopinionminingkavitahyunduk00-110407113230-phpapp01.ppt
opinionminingkavitahyunduk00-110407113230-phpapp01.ppt
 
Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011Recommender Systems! @ASAI 2011
Recommender Systems! @ASAI 2011
 
Online feedback correlation using clustering
Online feedback correlation using clusteringOnline feedback correlation using clustering
Online feedback correlation using clustering
 
Overview of recommender system
Overview of recommender systemOverview of recommender system
Overview of recommender system
 
Spot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor ReviewsSpot deceptive TripAdvisor Reviews
Spot deceptive TripAdvisor Reviews
 
Detection of Fraud Reviews for a Product
Detection of Fraud Reviews for a ProductDetection of Fraud Reviews for a Product
Detection of Fraud Reviews for a Product
 
2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...
2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...
2014 CrossRef Annual Meeting Peer Review Panel: PRE: Securing Trust & Transpa...
 
Mahendra nath
Mahendra nathMahendra nath
Mahendra nath
 
Recommendation Systems Roadtrip
Recommendation Systems RoadtripRecommendation Systems Roadtrip
Recommendation Systems Roadtrip
 
Personalizing the web building effective recommender systems
Personalizing the web building effective recommender systemsPersonalizing the web building effective recommender systems
Personalizing the web building effective recommender systems
 

Recently uploaded

VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
sukaniyasunnu
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
kinni singh$A17
 
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
tanupasswan6
 
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
kinni singh$A17
 
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
Grant McAlister
 
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
weiwchu
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
45unexpected
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
fatima shekh$A17
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
SamanArshad11
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
lenjisoHussein
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
huseindihon
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
huseindihon
 
History and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big DataHistory and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big Data
Jongwook Woo
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
BaraDaniel1
 
PyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache ArrowPyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache Arrow
Uwe Korn
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
huseindihon
 
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
tanupasswan6
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
ginni singh$A17
 
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
sheetal singh$A17
 
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
gargjiya84
 

Recently uploaded (20)

VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
VIP Kolkata Girls Call Kolkata 0X0000000X Doorstep High-Profile Girl Service ...
 
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
New Girls Call Noida 9873940964 Unlimited Short Providing Girls Service Avail...
 
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
Busty Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And...
 
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
High Profile Girls Call Delhi 🛵🚡9711199171 💃 Choose Best And Top Girl Service...
 
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
AWS re:Invent 2023 - Deep dive into Amazon Aurora and its innovations DAT408
 
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
Harnessing Wild and Untamed (Publicly Available) Data for the Cost efficient ...
 
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
Female Girls Call Mumbai 9920725232 Unlimited Short Providing Girls Service A...
 
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
BDSM Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service And ...
 
Histology of Muscle types histology o.ppt
Histology of Muscle types histology o.pptHistology of Muscle types histology o.ppt
Histology of Muscle types histology o.ppt
 
Training on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptxTraining on CSPro and step by steps.pptx
Training on CSPro and step by steps.pptx
 
the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...the potential of the development of the Ford–Fulkerson algorithm to solve the...
the potential of the development of the Ford–Fulkerson algorithm to solve the...
 
Potential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriatePotential Uses of the Floyd-Warshall Algorithm as appropriate
Potential Uses of the Floyd-Warshall Algorithm as appropriate
 
History and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big DataHistory and Application of LLM Leveraging Big Data
History and Application of LLM Leveraging Big Data
 
Cyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & PricingCyber Insurance Mathematical Model & Pricing
Cyber Insurance Mathematical Model & Pricing
 
PyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache ArrowPyData Sofia May 2024 - Intro to Apache Arrow
PyData Sofia May 2024 - Intro to Apache Arrow
 
potential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in generalpotential usefulness of multi-agent maze-solving in general
potential usefulness of multi-agent maze-solving in general
 
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
Celebrity Girls Call Delhi 🎈🔥9711199171 🔥💋🎈 Provide Best And Top Girl Service...
 
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
Celebrity Girls Call Noida 9873940964 Unlimited Short Providing Girls Service...
 
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
Female Girls Call Noida 🎈🔥9873940964 🔥💋🎈 Provide Best And Top Girl Service An...
 
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
High Girls Call Mohali 000XX00000 Provide Best And Top Girl Service And No1 i...
 

Collective Opinion Spam Detection Bridging Review Networks and Metadata

  • 1. Rayana & Akoglu Shebuti Rayana Leman Akoglu* Metadata
  • 2. Rayana & Akoglu 2  How do consumers learn about product quality?  Advertisements  Consumer review websites (Yelp, TripAdvisor etc.)  Impact of consumer reviews on sales? Collective Opinion Spam Detection +1 star-rating increases revenue by 5-9% Harvard Study by M. Luca Reviews, Reputation, andRevenue: The Case ofYelp.com
  • 3. Rayana & Akoglu 3  Paid/Biased reviewers write fake reviews  unjustly promote / demote products or businesses Collective Opinion Spam Detection Problem ? Humans only slightly better than chance Finding Deceptive Opinion Spam by Any Stretch of the Imagination Ott et al. 2011
  • 4. Rayana & Akoglu 4Collective Opinion Spam Detection OnlineReviewSystem Review network Meta Data spammer target product fake review
  • 5. Rayana & Akoglu 5 Main contributions:  SpEagle : a Collective approach to opinion spam  is unsupervised  can easily leverage labels (SpEagle+)  improves detection performance  Computationally light version : SpLite  significant speed-up Collective Opinion Spam Detection Reviewnetwork Metadata
  • 6. Rayana & Akoglu 6Collective Opinion Spam Detection Review Network Review Text Review Behavior Supervision Ott’2011 ✓ supervised Mukherjee’ 2013 ✓ ✓ supervised Jindal’2008 ✓ supervised Co-training [Li’2011] ✓ semi-supervised Wang’2011 ✓ ✓ unsupervised FraudEagle ✓ unsupervised SpEagle ✓ ✓ ✓ unsupervised SpEagle+ ✓ ✓ ✓ semi-supervised
  • 7. Rayana & Akoglu 7 A network classification problem  Given  User-Review-Product network (tri-partite)  Features extracted from metadata (i.e. text, behavior) – for users, reviews, and products  Classify network objects into type-specific classes  Users (‘benign’ vs. ‘spammer’)  Products (‘non-target’ vs. ‘target’)  Reviews (‘genuine’ vs. ‘fake’) Collective Opinion Spam Detection writes belongs U R P
  • 8. Rayana & Akoglu 8Collective Opinion Spam Detection Workflow
  • 9. Rayana & Akoglu 9Collective Opinion Spam Detection Workflow
  • 10. Rayana & Akoglu 10  A collective classification approach (unsupervised)  Objective function utilizes pairwise Markov Random Fields Collective Opinion Spam Detection Node labels as random variables edge potential (label-label) prior belief edge potential (label-observed label) max y edge type t t
  • 11. Rayana & Akoglu 11  A collective classification approach (unsupervised)  Objective function utilizes pairwise Markov Random Fields - Inference problem (NP-hard)  Loopy Belief Propagation (LBP) Collective Opinion Spam Detection edge type prior 1) Repeat for each node: 2) At convergence: edge potential j i
  • 12. Rayana & Akoglu 12  A collective classification approach (unsupervised)  Objective function utilizes pairwise Markov Random Fields - Inference problem (NP-hard)  Loopy Belief Propagation (LBP) Collective Opinion Spam Detection edge type prior 1) Repeat for each node: 2) At convergence: edge potential
  • 13. Rayana & Akoglu 13 Metadata Features Spam Scores Priors Collective Opinion Spam Detection Users: ‘benign’ ‘spammer’ Products: ‘non-target’ ‘target’ Reviews: ‘genuine’ ‘fake’
  • 14. Rayana & Akoglu 14Collective Opinion Spam Detection User Features Product Features Review Features • maximum #reviews/day • ratio of +ve/-ve reviews • avg/weighted rating deviation • rating deviation entropy • temporal gaps entropy • burstiness of reviews • maximum #reviews/day • ratio of +ve/-ve reviews • avg/weighted rating deviation • rating deviation entropy • temporal gaps entropy • rank order of reviews • absolute/thresholded rating deviation • extremity of rating • early time frame • singleton review • review length (#words) • avg content similarity • max content similarity • review length • average content similarity • maximum content similarity • ratio subjective/objective • description length • ratio of exclamation sent. • freq. of similar reviews • % capital letters • review length • ratio 1st person pronoun TextBehavioral
  • 15. Rayana & Akoglu 15Collective Opinion Spam Detection User Features Product Features Review Features • maximum #reviews/day • ratio of +ve/-ve reviews • avg/weighted rating deviation • rating deviation entropy • temporal gaps entropy • burstiness of reviews • maximum #reviews/day • ratio of +ve/-ve reviews • avg/weighted rating deviation • rating deviation entropy • temporal gaps entropy • rank order of reviews • absolute/thresholded rating deviation • extremity of rating • early time frame • singleton review • review length (#words) • avg content similarity • max content similarity • review length • avg content similarity • max content similarity • ratio subjective/objective • description length • ratio of exclamation sent. • freq. of similar reviews • % capital letters • review length • ratio 1st person pronoun TextBehavioral
  • 16. Rayana & Akoglu 16Collective Opinion Spam Detection User Features Product Features Review Features • maximum #reviews/day • ratio of +ve/-ve reviews • avg/Weighted Rating Deviation • rating deviation entropy • temporal gaps entropy • burstiness of reviews • maximum #reviews/day • ratio of +ve/-ve reviews • avg/Weighted Rating Deviation • rating deviation entropy • temporal gaps entropy • rank order of reviews • absolute/thresholded rating deviation • extremity of rating • early time frame • singleton review • review length (#words) • avg content similarity • max content similarity • review length • avg content similarity • max content similarity • ratio subjective/objective • Description Length • ratio of exclamation sent. • freq. of similar reviews • % capital letters • review length • ratio 1st person pronoun TextBehavioral temporal order words in review
  • 17. Rayana & Akoglu 17Collective Opinion Spam Detection (H)igher more suspicious (L)ower more suspicious
  • 18. Rayana & Akoglu 18 Q: How to handle features with different scales?A: Cumulative distribution:  For each feature l , 1 ≤ l ≤ F and its corresponding value xli for node i  Combine F values for each node i:  Priors : Collective Opinion Spam Detection
  • 19. Rayana & Akoglu 19  A collective classification approach (unsupervised)  Objective function utilizes pairwise Markov Random Fields - Inference problem (NP-hard)  Loopy Belief Propagation (LBP) Collective Opinion Spam Detection edge type prior 1) Repeat for each node: 2) At convergence: edge potential
  • 20. Rayana & Akoglu 20Collective Opinion Spam Detection writes belongs
  • 21. Rayana & Akoglu 21 Beliefs as class probabilities:  Probi(spammer) = bi(yi : spammer)  Probk(fake) = bk(yk : fake)  Probj(target) = bj(yj : target) Collective Opinion Spam Detection
  • 22. Rayana & Akoglu 22  SpEagle can work semi-supervised  Can incorporate labels seamlessly  Can use user, review, and/or product labels  For labeled nodes, priors are set to:  φ ← {ϵ, 1 − ϵ} for spam category (i.e., fake, spammer, or target)  φ ← {1− ϵ, ϵ} for non-spam category Collective Opinion Spam Detection
  • 23. Rayana & Akoglu 23  3 Yelp datasets1: recommended vs. non-recommended  YelpChi –hotel and restaurant reviews from Chicago  YelpNYC –restaurant reviews from New York City  YelpZip –restaurants reviews from zipcodes in NJ, VT, CT, PA Collective Opinion Spam Detection 1 Datasets are made available to the community 2 A spammer has at least one filtered review 2
  • 24. Rayana & Akoglu 25  Area Under Curve (AUC)  for ROC curve (TPR vs. FPR)  Average Precision (AP)  AUC for Precision-Recall curve  Precision@k : ratio of spam in top k  NDCG@k : weighted scoring which favors top items Collective Opinion Spam Detection
  • 25. Rayana & Akoglu 26  SpEagle superior to existing methods  Different priors: User & Review priors most informative Collective Opinion Spam Detection
  • 26. Rayana & Akoglu 27  Labels improve performance significantly Collective Opinion Spam Detection
  • 27. Rayana & Akoglu 28Collective Opinion Spam Detection  Labels improve performance significantly
  • 28. Rayana & Akoglu 29Collective Opinion Spam Detection  Labels improve performance significantly
  • 29. Rayana & Akoglu 30 Light version of SpEagle SpLite (SpLite+)
  • 30. Rayana & Akoglu 31Collective Opinion Spam Detection  SpEagle+ vs SpLite+ perform comparably
  • 31. Rayana & Akoglu 32Collective Opinion Spam Detection  SpLite+ is orders of magnitude faster than SpEagle+
  • 32. Rayana & Akoglu 33 Main contributions:  SpEagle : a Collective approach to opinion spam  is unsupervised  can easily leverage labels (SpEagle+)  improves detection performance  Computationally light version : SpLite (SpLite+)  significant speed-up Collective Opinion Spam Detection Reviewnetwork Metadata
  • 33. Rayana & Akoglu 34Collective Opinion Spam Detection Code and Data available: http://shebuti.com/collective-opinion-spam-detection/ srayana@cs.stonybrook.edu http://www.cs.stonybrook.edu/~datalab/

Editor's Notes

  1. My work focuses on discovering patterns and detecting anomalies in real-world data, using graph analytics techniques, and developing effective and efficient tools to do so .