2. Outline
Introduction
Fake Review Identification
Motivation
Related Work
Research Question
Research Methodology
Results & Analysis
Conclusion and Future Work
3. Introduction
E-commerce website
An online platform for the sale and purchase of services and products
Examples of services: restaurants, beauty parlors, home cleaners
Examples of goods: vehicles, garments, electronic devices
4. Reviews
Also called an opinion or suggestion
User-generated content describing experience with a product or service
Consists of review content and a rating
5. Importance of User Reviews
Guide new customers
Beneficial for businesses
Can be positive or negative [Jindal et al. 2008]
6. Fake Review Detection
Fake/untruthful reviews mislead users and customers
Posted by spammers for the financial gain of a business
Can have positive or negative polarity
Types [Jindal et al. 2007]:
Untruthful reviews
Brand reviews
Non-reviews
7. Motivation
Untruthful reviews have far-reaching effects:
Influence user decisions
Mislead new customers
Affect business marketing strategies
Affect businesses financially
Erode trust in e-commerce websites
Goal: identify untruthful reviews by exploiting different features
8. Related Work
Fake Review Detection
[Jindal et al. 2007, Jindal et al. 2008, Algur et al. 2010, F. Li et al. 2011, Ott et al. 2011, Wu, Greene et al. 2010, Lai et al. 2010, H. Li et al. 2014, Lin et al. 2014, Ott et al. 2013, Mukherjee et al. 2013, D. Zhang et al. 2016]
Spammer Identification
[Wang et al. 2011, Akoglu et al. 2013, Fei et al. 2013]
Group Spammer Detection
[Liu et al. 2012, Mukherjee et al. 2011]
9. Related Work on Fake Review Detection
Year | Author | Dataset Type/Source | Classifier | Feature Type
2007 | Nitin Jindal et al. | Pseudo fake / Amazon | LR | Contextual
2008 | Nitin Jindal et al. | Pseudo fake / Amazon | LR | Contextual
2010 | C. Lai et al. | Pseudo fake / Amazon | SVM | Contextual
2010 | Siddu Algur et al. | Pseudo fake / Web pages | - | Contextual
2011 | Fangtao Li et al. | Pseudo fake / Epinions | LR, SVM, NB | Contextual
2013 | Arjun Mukherjee et al. | Real life / Yelp | SVM | Contextual, Behavioral
2014 | H. Li et al. | Real life / Dianping | SVM | Contextual, Behavioral
2014 | Yuming Lin et al. | Pseudo fake / Amazon | LR, SVM | Contextual, Behavioral
2016 | Istiaq Ahsan et al. | Pseudo fake + Real life / AMT + Yelp | NB, SVM | Contextual
2016 | Dongsong Zhang et al. | Real life / Yelp | SVM, DT, RF, NB | Contextual, Behavioral
10. Related Work on Fake Review Detection
Features
Contextual Features
Behavioral Features
Dataset
Pseudo Fake Review
Real-life Review
Classifiers
11. Contextual Features
Extracted from the content of the review [Li et al. 2011, Mukherjee et al. 2013, Algur et al. 2010, Zhang et al. 2016]
For example: review length, capital diversity (see the sketch below)
Behavioral Features
Represent the behavior of the reviewer and the review [Mukherjee et al. 2013, Algur et al. 2010, Zhang et al. 2016]
For example: average posting rate, positive ratio, review duration
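A minimal sketch of the two contextual features named above; the slides only name them, so the exact definitions below are assumptions for illustration:

```python
# Hypothetical definitions of two contextual features named on this slide;
# the real feature definitions may differ.
def review_length(text: str) -> int:
    """Number of word tokens in the review content."""
    return len(text.split())

def capital_diversity(text: str) -> float:
    """Assumed here: share of words starting with a capital letter."""
    words = text.split()
    return sum(w[0].isupper() for w in words) / len(words) if words else 0.0

print(review_length("Great food and Great staff"))      # 5
print(capital_diversity("Great food and Great staff"))  # 0.4
```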
12. Features
"Reviewer Content Similarity" [Zhang et al. 2016, Mukherjee et al. 2013]
A contextual feature
The average text similarity over all reviews posted by a reviewer
"Reviewer Deviation" [Mukherjee et al. 2013]
A behavioral feature
Captures the variation of a review's rating from the other ratings on a restaurant
13. Research Questions
RQ1: What is the effect of "Reviewer Deviation" when combined with other contextual and behavioral features to identify fake reviews on the Yelp dataset?
RQ2: What is the importance of "Reviewer Deviation" compared with other behavioral features when training a fake review detection model?
RQ3: What is the effect of different weighting schemes when calculating the "Reviewer Content Similarity" feature of a reviewer?
16. Preprocessing
Remove invalid values
Transform attribute values into the desired format
E.g. the "date" attribute of a review: convert the String into DATE format and remove the "Updated" keyword, i.e. "Updated – 01-08-2010" becomes "01-08-2010" (see the sketch below)
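A minimal sketch of this date-cleanup step, assuming a day-month-year format as in the slide's example:

```python
# Sketch of the date-cleanup step; the day-month-year format is an
# assumption based on the slide's example.
import re
from datetime import datetime

def clean_review_date(raw: str) -> datetime:
    """Strip an optional 'Updated -' prefix, then parse the date string."""
    cleaned = re.sub(r"^Updated\s*[-–]\s*", "", raw.strip())
    return datetime.strptime(cleaned, "%d-%m-%Y")

print(clean_review_date("Updated – 01-08-2010").date())  # 2010-08-01
```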
18. Experimental Setup
Feature Sets: FS1, FS2, FS3, FS4, FS5
Datasets: Restaurant and Hotel reviews
Feature sets drawn from Zhang et al. 2016 and Mukherjee et al. 2013
Classifiers:
1) Random Forest (RF) [Zhang et al. 2016]
2) Support Vector Machine (SVM) [Yuming Lin et al. 2014, C. Lai et al. 2010, Fangtao Li et al. 2011, Arjun Mukherjee et al. 2013, H. Li et al. 2014, Istiaq Ahsan et al. 2016, Zhang et al. 2016]
Evaluation: 10-fold Cross Validation; Precision, Recall, F1 and Accuracy (see the sketch below)
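A minimal sketch of this evaluation setup using scikit-learn; the feature matrix X and labels y are random placeholders standing in for one of the feature sets FS1-FS5:

```python
# Sketch of the evaluation: 10-fold CV with SVM and RF, reporting
# precision, recall, F1 and accuracy; X and y are random placeholders.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate
from sklearn.svm import SVC

X = np.random.rand(200, 6)        # placeholder feature matrix (6 features)
y = np.random.randint(0, 2, 200)  # placeholder fake(1)/genuine(0) labels

scoring = ["precision", "recall", "f1", "accuracy"]
for name, clf in [("SVM", SVC(kernel="linear")),
                  ("RF", RandomForestClassifier(n_estimators=100))]:
    scores = cross_validate(clf, X, y, cv=10, scoring=scoring)
    print(name, {m: round(scores[f"test_{m}"].mean(), 3) for m in scoring})
```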
19. Feature Set (FS3) [Arjun Mukherjee et al. 2013]
Behavioral:
1) Reviewer Deviation
2) Positive Ratio
3) Maximum Number of Reviews
Contextual:
4) Content Length
5) N-grams
6) Reviewer Content Similarity
24. Feature Set Comparison
[Chart: Precision (P), Recall (R), F1 and Accuracy (A) of SVM and RF with feature sets FS1, FS2 and FS3 on datasets D1 and D2; y-axis from 69 to 94]
29. Results on Hotel Reviews
[Figure: Importance Score of Features on Hotel Reviews]
30. Achievements
Title: "Exploring Behavioral Features with Contextual Feature to Identify Fake Reviews"
Conference: The 23rd Conference on Natural Language & Information Systems (NLDB 2018), 13th-15th June 2018, Paris, France
Status: ACCEPTED
31. Conclusion & Future Work
The behavioral feature "Reviewer Deviation" improves the overall accuracy
Scaling up the dataset can further increase the effect of behavioral features
The BM25 term weighting scheme also improves the classification results
Spammer and spammer-group detection can be explored with a variety of features
Deep learning approaches can also be adopted
32. References
1. Heydari, A., Tavakoli, M. A., Salim, N., & Heydari, Z. (2015). Detection of review spam: A survey. Expert Systems with Applications.
2. Jindal, N., & Liu, B. (2007). Analyzing and detecting review spam. Proceedings - IEEE International Conference on Data Mining, ICDM, 547-552.
3. Algur, S., Hiremath, E., Patil, A., & Shivashankar, S. (2010). Spam detection of customer reviews from web pages. In Proceedings of the 2nd International Conference on IT and Business Intelligence (pp. 1-13).
4. Algur, S. P., Patil, A. P., Hiremath, P. S., & Shivashankar, S. (2010). Conceptual level similarity measure based review spam detection. Signal and Image Processing (ICSIP), 2010 International Conference on, 416-423.
5. Istiaq Ahsan, M., Nahian, T., All Kafi, A., Ismail Hossain, M., & Muhammad Shah, F. (2016). An ensemble approach to detect review spam using hybrid machine learning technique. Computer and Information Technology (ICCIT), 19th International Conference on, IEEE, 388-394.
33. References
6. Jindal, N., & Liu, B. (2008). Opinion spam and analysis. Proceedings of the International Conference on Web Search and Web Data Mining 2008, 219-230.
7. Lai, C. L., Xu, K. Q., Lau, R. Y. K., Li, Y., & Jing, L. (2010). Toward a language modeling approach for consumer review spam detection. Proceedings - IEEE International Conference on E-Business Engineering, ICEBE 2010, 1-8.
8. Li, F., Huang, M., Yang, Y., & Zhu, X. (2011). Learning to identify review spam. In IJCAI Proceedings - International Joint Conference on Artificial Intelligence (Vol. 22, p. 2488).
9. Zhang, D., Zhou, L., Kehoe, J. L., & Kilic, I. Y. (2016). What online reviewer behaviors really matter? Effects of verbal and nonverbal behaviors on detection of fake online reviews. Journal of Management Information Systems, 33(2), 456-481.
10. Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. (2013). What Yelp fake review filter might be doing? Seventh International AAAI Conference on Weblogs and Social Media, 409-418.
37. Average Posting Rate
$APR(a) = \dfrac{N_r(a)}{N(\text{posting days})}$
The ratio of a reviewer's total number of reviews to the number of the reviewer's active days. An active day is one on which the reviewer posted at least one review.
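A minimal sketch of this feature, assuming a reviewer's reviews are given as (date, rating, text) tuples:

```python
# Sketch of Average Posting Rate; the (date, rating, text) tuple layout is
# an assumed representation of one reviewer's reviews.
from datetime import date

def average_posting_rate(reviews):
    active_days = {d for d, _, _ in reviews}  # days with at least one review
    return len(reviews) / len(active_days)

reviews = [(date(2010, 8, 1), 5, "Great!"),
           (date(2010, 8, 1), 4, "Nice."),
           (date(2010, 8, 3), 1, "Bad.")]
print(average_posting_rate(reviews))  # 3 reviews / 2 active days = 1.5
```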
38. Positive Ratio
$R_{pos}(a) = \dfrac{N_r(\{r_a \mid rating_r \ge 4\})}{N_r(a)}$
The number of a reviewer's reviews with a rating of 4 or higher, divided by the reviewer's total number of reviews.
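A minimal sketch, assuming the reviewer's star ratings are given as a plain list:

```python
# Sketch of Positive Ratio over an assumed list of star ratings (1-5).
def positive_ratio(ratings):
    return sum(r >= 4 for r in ratings) / len(ratings)

print(positive_ratio([5, 4, 1]))  # 2 positive out of 3 reviews -> 0.67
```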
39. Positive-to-Negative Ratio
$R_{pn}(a) = \dfrac{N_r(\{r_a \mid rating_r \ge 4\})}{N_r(\{r_a \mid rating_r \le 2\})}$
The ratio of a reviewer's reviews with a rating of 4 or higher to the reviews with a rating of 2 or lower.
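A minimal sketch; the handling of reviewers with no negative reviews is an assumption, since the slide does not define the zero-denominator case:

```python
# Sketch of Positive-to-Negative Ratio; the zero-denominator fallback is assumed.
def positive_to_negative_ratio(ratings):
    pos = sum(r >= 4 for r in ratings)
    neg = sum(r <= 2 for r in ratings)
    return pos / neg if neg else float("inf")  # assumed handling

print(positive_to_negative_ratio([5, 4, 1]))  # 2 positive / 1 negative = 2.0
```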
40. Review Duration
$RD(a) = D_l(a) - D_f(a)$
The difference between the dates of a reviewer's last and first posted reviews.
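A minimal sketch, assuming posting dates are available as datetime.date values:

```python
# Sketch of Review Duration in days over a reviewer's posting dates.
from datetime import date

def review_duration(dates):
    return (max(dates) - min(dates)).days

print(review_duration([date(2010, 8, 1), date(2010, 9, 15)]))  # 45
```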
41. Reviewer Deviation
$RevDev(r) = \left|\, rating_r - \dfrac{\sum_{r' \in p} rating_{r'}}{N_r(p)} \,\right|$
Captures the variation of a review's rating on a restaurant: the absolute difference between the review's rating and the average of all ratings on that restaurant p.
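A minimal sketch of this feature as read from the formula above (absolute difference from the restaurant's average rating):

```python
# Sketch of Reviewer Deviation: |review rating - restaurant's average rating|.
def reviewer_deviation(review_rating, restaurant_ratings):
    avg = sum(restaurant_ratings) / len(restaurant_ratings)
    return abs(review_rating - avg)

print(reviewer_deviation(5, [1, 2, 2, 3]))  # |5 - 2.0| = 3.0
```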
42. Reviewer Content Similarity
$RCS(a) = \dfrac{\sum_{i=1}^{n} \max_{j \ne i}\, similarity(r_i, r_j)}{n}$
The average text similarity over all n reviews posted by a reviewer, where each review is compared against its most similar other review.
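A minimal sketch using TF-IDF weighted cosine similarity; RQ3 compares different weighting schemes (e.g. BM25), so TF-IDF here is only one illustrative choice:

```python
# Sketch of Reviewer Content Similarity with TF-IDF cosine similarity;
# other weighting schemes (e.g. BM25) can be swapped in.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def reviewer_content_similarity(reviews):
    tfidf = TfidfVectorizer().fit_transform(reviews)
    sim = cosine_similarity(tfidf)
    np.fill_diagonal(sim, -1.0)    # exclude each review's self-similarity
    return sim.max(axis=1).mean()  # average best-match similarity

reviews = ["Great food and staff", "Great food, great staff!", "Terrible wait"]
print(reviewer_content_similarity(reviews))
```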
44. Jindal et al. 2007-2008
Discovered spamming activities, including identification of duplicate or near-duplicate reviews using the shingle method (see the sketch below).
Brand reviews and non-reviews were identified using the dissimilarity between product metadata and review content.
Spammer groups were identified by calculating the content similarity between reviews of different reviewers.
Reported 78% AUC.
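A minimal sketch of w-shingling with Jaccard similarity for near-duplicate detection; the shingle width and the decision threshold are illustrative assumptions:

```python
# Sketch of the shingle method: compare word w-shingles of two reviews with
# Jaccard similarity; w=3 and the threshold are assumed for illustration.
def shingles(text, w=3):
    words = text.lower().split()
    return {tuple(words[i:i + w]) for i in range(len(words) - w + 1)}

def jaccard(a, b):
    return len(a & b) / len(a | b)

r1 = "this product is great and works perfectly fine"
r2 = "this product is great and works really well"
print(jaccard(shingles(r1), shingles(r2)) > 0.3)  # True -> near duplicate
```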
45. Algur et al. 2010
Two annotators were hired for a dataset containing 960 reviews.
Identified duplicate and near-duplicate reviews using the Hamming distance.
Reported 57% accuracy.
46. Lai, Xu, Lau, Li, & Jing 2010
Identified untruthful reviews and non-reviews.
The feature set for identifying non-reviews included lexical, syntactic and stylistic features.
Two annotators were hired.
SVM achieved 96% recall in classifying non-reviews.
Three types of contextual features were used to identify untruthful reviews.
47. Istiaq Ahsan et al. 2016
Used an unlabeled dataset of Yelp reviews together with the labeled dataset of (Ott, Cardie, & Hancock, 2013).
Duplicate reviews were identified in the unlabeled dataset using the KL-JS distance (see the sketch below).
An accuracy of 88% was reported using NB.
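A minimal sketch of the Jensen-Shannon divergence between the unigram word distributions of two reviews, one reading of the slide's "KL-JS distance"; the tokenization and unigram modeling are illustrative assumptions:

```python
# Sketch of Jensen-Shannon divergence between two reviews' word
# distributions; unigram modeling here is an assumed simplification.
from collections import Counter
from math import log2

def js_divergence(text_a, text_b):
    ca, cb = Counter(text_a.lower().split()), Counter(text_b.lower().split())
    vocab = sorted(set(ca) | set(cb))
    p = [ca[w] / sum(ca.values()) for w in vocab]
    q = [cb[w] / sum(cb.values()) for w in vocab]
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    kl = lambda x, y: sum(xi * log2(xi / yi) for xi, yi in zip(x, y) if xi > 0)
    return (kl(p, m) + kl(q, m)) / 2  # 0 for identical texts, up to 1

print(js_divergence("great food great staff", "great food great staff"))  # 0.0
```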