Fake Review Detection Using Behavioral and Contextual Features
Outline
 Introduction
 Fake Review Identification
 Motivation
 Related Work
 Research Question
 Research Methodology
 Results & Analysis
 Conclusion and Future Work
Introduction
 E-commerce website
 Online platform for sale and purchase
 Services and products
 Example of service: Restaurant, Beauty Parlor, Home cleaners
 Example of goods: Vehicles, Garments, Electronic devices
Reviews
 Also called Opinion/ Suggestion
 User generated content
 Experience about the product /service
 Review content
 Rating
Importance of User Reviews
 Guide new customers
 Beneficial for businesses
 Positive or negative [Jindal et al., 2008]
[Figure: example of a positive review and a negative review]
Fake Review Detection
 Fake/Untruthful reviews
 Mislead users/customers
 Posted by Spammers
 Financial gain for business
 Positive/Negative Polarity
 Types [Jindal et al., 2007]: untruthful reviews, brand reviews, non-reviews
Motivation
 Untruthful reviews have a large impact
 Influence user decisions
 Mislead new customers
 Affect business marketing strategies
 Affect businesses financially
 Loss of trust in e-commerce websites
 Identification of untruthful reviews
 Exploitation of different features
Related Work
 Fake Review Detection [Jindal et al., 2007; Jindal et al., 2008; Algur et al., 2010; F. Li et al., 2011; Ott et al., 2011; Wu, Greene et al., 2010; Lai et al., 2010; H. Li et al., 2014; Lin Zhu et al., 2014; Ott et al., 2013; Mukherjee et al., 2013; D. Zhang et al., 2016]
 Spammer Identification [Wang et al., 2011; Akoglu et al., 2013; Fei Geli et al., 2013]
 Group Spammer Detection [Liu et al., 2012; Mukherjee et al., 2011]
Related Work on Fake Review Detection
| Year | Author | Dataset Type/Source | Classifier | Feature Type |
|------|--------|---------------------|------------|--------------|
| 2007 | Nitin Jindal et al. | Pseudo Fake/Amazon | LR | Contextual |
| 2008 | Nitin Jindal et al. | Pseudo Fake/Amazon | LR | Contextual |
| 2010 | C. Lai et al. | Pseudo Fake/Amazon | SVM | Contextual |
| 2010 | Siddu Algur et al. | Pseudo Fake/Web Page | - | Contextual |
| 2011 | Fangtao Li et al. | Pseudo Fake/Epinions | LR, SVM, NB | Contextual |
| 2013 | Arjun Mukherjee et al. | Real life/Yelp | SVM | Contextual, Behavioral |
| 2014 | H. Li et al. | Real life/Dianping | SVM | Contextual, Behavioral |
| 2014 | Yuming Lin et al. | Pseudo Fake/Amazon | LR, SVM | Contextual, Behavioral |
| 2016 | Istiaq Ahsan et al. | Pseudo Fake, Real life/AMT+Yelp | NB, SVM | Contextual |
| 2016 | Dongsong Zhang et al. | Real life/Yelp | SVM, DT, RF, NB | Contextual, Behavioral |
Related Work on Fake Review Detection
 Features
   Contextual Features
   Behavioral Features
 Dataset
   Pseudo Fake Review
   Real-life Review
 Classifiers
Contextual Features
 Extracted from the content of the review [Li et al., 2011; Mukherjee et al., 2013; Algur et al., 2010; Zhang et al., 2016]
For example: review length, capital diversity
Behavioral Features
 Represent the behavior of the reviewer and the review [Mukherjee et al., 2013; Algur et al., 2010; Zhang et al., 2016]
For example: average posting rate, positive ratio, review duration
Features
 “Reviewer Content Similarity” [Zhang et al., 2016; Mukherjee et al., 2013]
A contextual feature
It is the average text similarity across all reviews posted by a reviewer.
 “Reviewer Deviation” [Mukherjee et al., 2013]
A behavioral feature
It captures how much a review's rating varies from the other ratings on the same restaurant.
Research Questions
RQ1: What is the effect of “Reviewer Deviation” when combined with other contextual and behavioral features to identify fake reviews on the Yelp dataset?
RQ2: How important is “Reviewer Deviation” compared with other behavioral features when training a fake review detection model?
RQ3: What is the effect of different weighting schemes when calculating the “Reviewer Content Similarity” feature of a reviewer?
Research Methodology
Yelp Attribute Selection
Preprocessing
 Remove invalid values
 Transform attribute values into the desired format
 E.g. the review attribute “date” is converted from String format into DATE, and the “Updated” keyword is removed
 i.e. “Updated – 01-08-2010” becomes “01-08-2010”
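The date clean-up step above could be sketched roughly as follows. This is only an illustration: the function name `parse_review_date`, the regular expression, and the day-month-year reading of “01-08-2010” are assumptions, since the slides do not state the date format.

```python
import re
from datetime import date, datetime

# Hypothetical pattern: the word "Updated", optional spaces, then a hyphen or dash.
_UPDATED_PREFIX = re.compile(r"^\s*Updated\s*[-\u2013\u2014]\s*", re.IGNORECASE)


def parse_review_date(raw: str, fmt: str = "%d-%m-%Y") -> date:
    """Strip an optional "Updated" prefix and parse the rest as a date.

    The default format (day-month-year) is an assumption; adjust `fmt`
    if the data actually uses month-day-year.
    """
    cleaned = _UPDATED_PREFIX.sub("", raw).strip()
    return datetime.strptime(cleaned, fmt).date()
```

Values that cannot be parsed raise `ValueError`, which gives a natural hook for the "remove invalid values" step.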
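Restaurant Dataset
[Istiaq et al., 2016; Mukherjee et al., 2013; Zhang et al., 2016]
| Dataset | # Restaurants | Reviews | Reviewers | Non-Fake | Fake |
|---------|---------------|---------|-----------|----------|------|
| Dataset 1 | 31 | 2060 | 1964 | 1030 | 1030 |
| Dataset 2 | 92 | 12000 | 9754 | 6000 | 6000 |
Hotel Dataset
| Dataset | # Hotels | Reviews | Reviewers | Non-Fake | Fake |
|---------|----------|---------|-----------|----------|------|
| Dataset 3 | 70 | 1550 | 1499 | 775 | 775 |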
Experimental Setup
Feature Sets
 Restaurant: FS1, FS2 [Zhang et al., 2016]; FS3 [Mukherjee et al., 2013]
 Hotel: FS4, FS5 [Zhang et al., 2016]
Classifiers
1) Random Forest (RF) [Zhang et al., 2016]
2) Support Vector Machine (SVM) [Yuming Lin et al., 2014; C. Lai et al., 2010; Fangtao Li et al., 2011; Arjun Mukherjee et al., 2013; H. Li et al., 2014; Istiaq Ahsan et al., 2016; Zhang et al., 2016]
Evaluation
 10-fold Cross Validation
 Precision, Recall, F1 and Accuracy
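As a rough illustration of the evaluation protocol, the sketch below shows how fold indices and the four reported metrics could be computed in plain Python. The helper names `k_fold_indices` and `binary_metrics` are hypothetical, the split is unshuffled for simplicity, and treating label `1` as the fake class is an assumption; the slides do not say which tooling was used.

```python
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n_samples: int, k: int = 10) -> List[Tuple[List[int], List[int]]]:
    """Return k (train_indices, test_indices) pairs covering range(n_samples)."""
    # The first (n_samples % k) folds get one extra sample.
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        folds.append((train, test))
        start += size
    return folds


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int],
                   positive: int = 1) -> Dict[str, float]:
    """Precision, recall, F1 and accuracy for one class treated as positive."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}
```

In a full run, a classifier would be trained on each `train` split, scored on the matching `test` split, and the per-fold metrics averaged.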
Feature Set FS3 [Arjun Mukherjee et al., 2013] (behavioral and contextual features)
1) Reviewer Deviation
2) Positive Ratio
3) Maximum Number of Reviews
4) Content Length
5) N-grams
6) Reviewer Content Similarity
Feature Sets [Zhang et al., 2016] (behavioral and contextual features)
Restaurant Reviews (FS1, FS2)
1) Useful Votes Count
2) Cool Votes Count
3) Funny Votes Count
4) Friend Count
5) Review Count
6) Average Posting Rate
7) Positive Ratio
8) Membership Length
9) Review Duration
10) Positive-to-Negative Ratio
11) Reviewer Content Similarity
12) Reviewer Deviation
Hotel Reviews (FS4, FS5)
1) Useful Votes Count
2) Cool Votes Count
3) Funny Votes Count
4) Friend Count
5) Review Count
6) Average Posting Rate
7) Tips Count
8) Membership Length
9) Review Duration
10) Capital Diversity
11) Reviewer Content Similarity
12) Reviewer Deviation
Weighting schemes for Reviewer Content Similarity: NNC, BM25
Results on Restaurant Reviews
[Figure: Feature Set Comparison on D1 using Random Forest. Precision (P), Recall (R), F1 and Accuracy (A) for FS1, FS2 and FS3.]
[Figure: Feature Set Comparison on D2 using Random Forest. P, R, F1 and A for FS1, FS2 and FS3.]
[Figure: Feature Set and Classifier Comparison. P, R, F1 and A for SVM and RF on D1 and D2, with FS1, FS2 and FS3.]
[Figure: Importance Score of Features on Restaurant Reviews.]
Results on Hotel Reviews
[Figure: Feature Set Comparison on Hotel Dataset (D3). P, R, F1 and A for RF and SVM, with FS4 and FS5.]
[Figure: Importance Score of Features on Hotel Reviews.]
Achievements
 Title: “Exploring Behavioral Features with Contextual Feature to Identify Fake Reviews”
 Conference: The 23rd Conference on Natural Language & Information Systems (NLDB 2018), 13th-15th June 2018, Paris, France
 Status: ACCEPTED
Conclusion & Future Work
 The behavioral feature “Reviewer Deviation” improves the overall accuracy
 Scaling up the dataset can also increase the effect of behavioral features
 The BM25 term weighting scheme also improves the classification results
 Spammer and spammer group detection can be explored with a variety of features
 Deep learning approaches can also be adopted
References
1. Heydari, A., Tavakoli, M. A., Salim, N., & Heydari, Z. (2015). Detection of review spam: A survey. Expert Systems with Applications.
2. Jindal, N., & Liu, B. (2007a). Analyzing and detecting review spam. Proceedings - IEEE International Conference on Data Mining, ICDM, 547–552.
3. Algur, S., Hiremath, E., Patil, A., & Shivashankar, S. (2010). Spam detection of customer reviews from web pages. In Proceedings of the 2nd International Conference on IT and Business Intelligence (pp. 1–13).
4. Algur, S. P., Patil, A. P., Hiremath, P. S., & Shivashankar, S. (2010). Conceptual level similarity measure based review spam detection. Signal and Image Processing ICSIP 2010 International Conference on, 416–423.
5. Istiaq Ahsan, M., Nahian, T., All Kafi, A., Ismail Hossain, M., & Muhammad Shah, F. (2016). An Ensemble approach to detect Review Spam using hybrid Machine Learning Technique. Computer and Information Technology (ICCIT) 19th International Conference on IEEE, 388–394.
6. Jindal, N., & Liu, B. (2008). Opinion spam and analysis. Proceedings of the International Conference on Web Search and Web Data Mining 2008, 219–230.
7. Lai, C. L., Xu, K. Q., Lau, R. Y. K., Li, Y., & Jing, L. (2010). Toward a language modeling approach for consumer review spam detection. Proceedings - IEEE International Conference on E-Business Engineering, ICEBE 2010, 1–8.
8. Li, F., Huang, M., Yang, Y., & Zhu, X. (2011). Learning to identify review spam. In IJCAI Proceedings - International Joint Conference on Artificial Intelligence (Vol. 22, p. 2488).
9. Zhang, D., Zhou, L., Kehoe, J. L., & Kilic, I. Y. (2016). What Online Reviewer Behaviors Really Matter? Effects of Verbal and Nonverbal Behaviors on Detection of Fake Online Reviews. Journal of Management Information Systems, 33(2), 456–481.
10. Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. (2013b). What Yelp Fake Review Filter Might Be Doing? Seventh International AAAI, 409–418.
Thank You!
Any Questions?
Average Posting Rate
$$APR(a) = \frac{N_r(a)}{N(\text{posting days})}$$
The ratio of a reviewer's total number of reviews to the reviewer's number of active days. An active day is a day on which the reviewer has posted at least one review.
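A minimal sketch of this ratio, assuming each review carries a posting date; the function name `average_posting_rate` and the choice to return 0.0 for a reviewer with no reviews are illustrative assumptions.

```python
from datetime import date
from typing import Iterable


def average_posting_rate(review_dates: Iterable[date]) -> float:
    """Total reviews divided by the number of distinct days with a review."""
    dates = list(review_dates)
    if not dates:
        return 0.0  # assumption: a reviewer with no reviews has rate 0
    active_days = len(set(dates))  # a day counts once, however many reviews
    return len(dates) / active_days
```

For example, four reviews posted over two distinct days give a rate of 2.0.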
Positive Ratio
$$R_{pos}(a) = \frac{N_r(\{ r_a \mid rating_r \geq 4 \})}{N_r(a)}$$
The number of a reviewer's reviews with a rating of 4 or higher, divided by the reviewer's total number of reviews.
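The positive ratio can be sketched as below, assuming ratings on a star scale where 4 and above counts as positive, as in the formula. The name `positive_ratio` and the zero result for an empty list are assumptions.

```python
from typing import Sequence


def positive_ratio(ratings: Sequence[float], threshold: float = 4) -> float:
    """Share of a reviewer's ratings that are at or above the threshold."""
    if not ratings:
        return 0.0  # assumption: no reviews means no positive share
    positives = sum(1 for r in ratings if r >= threshold)
    return positives / len(ratings)
```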
Positive-to-Negative Ratio
$$R_{pn}(a) = \frac{N_r(\{ r_a \mid rating_r \geq 4 \})}{N_r(\{ r_a \mid rating_r \leq 2 \})}$$
The ratio of a reviewer's reviews with a rating of 4 or higher to the reviewer's reviews with a rating of 2 or lower.
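A small sketch of this ratio follows. The slides do not say what happens when a reviewer has no negative reviews, so returning infinity in that case (and 0.0 when there are no positive reviews either) is purely an assumption of this example, as is the name `positive_to_negative_ratio`.

```python
import math
from typing import Sequence


def positive_to_negative_ratio(ratings: Sequence[float]) -> float:
    """Count of ratings >= 4 divided by count of ratings <= 2."""
    positives = sum(1 for r in ratings if r >= 4)
    negatives = sum(1 for r in ratings if r <= 2)
    if negatives == 0:
        # Assumption: undefined ratio mapped to inf (or 0.0 with no positives).
        return math.inf if positives else 0.0
    return positives / negatives
```

In practice an unbounded value would likely need capping or smoothing before being fed to a classifier.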
Review Duration
$$RD(a) = D_l(a) - D_f(a)$$
The difference between the date of the reviewer's last posted review, $D_l(a)$, and the date of the first posted review, $D_f(a)$.
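As an illustration, the duration can be computed from the earliest and latest review dates. Measuring it in whole days is an assumption, since the slides do not state the unit, and `review_duration_days` is a hypothetical name.

```python
from datetime import date
from typing import Iterable


def review_duration_days(review_dates: Iterable[date]) -> int:
    """Days between a reviewer's first and last posted review."""
    dates = list(review_dates)
    if not dates:
        return 0  # assumption: no reviews means zero duration
    return (max(dates) - min(dates)).days
```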
Reviewer Deviation
$$RevDev(r) = \left| rating_r - \frac{\sum rating_r(p)}{N_r(p)} \right|$$
It captures how much a review's rating varies from the other ratings on the same restaurant. It is computed as the absolute difference between the review's rating and the average of all ratings on that restaurant $p$.
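One way to read this feature is sketched below: the absolute gap between a review's rating and the mean rating of the restaurant. Whether the mean should include the review itself is not stated in the slides; this sketch assumes it does, and `reviewer_deviation` is a hypothetical name.

```python
from typing import Sequence


def reviewer_deviation(rating: float, restaurant_ratings: Sequence[float]) -> float:
    """Absolute difference between one rating and the restaurant's mean rating.

    `restaurant_ratings` is assumed to hold every rating for the restaurant,
    including the review being scored.
    """
    if not restaurant_ratings:
        return 0.0  # assumption: nothing to deviate from
    mean_rating = sum(restaurant_ratings) / len(restaurant_ratings)
    return abs(rating - mean_rating)
```

A 5-star review on a restaurant whose ratings average 3 would score 2.0.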
Reviewer Content Similarity
$$RCS(a) = \frac{\sum_{i=1}^{n} \max_{j} \, similarity(r_i, r_j)}{n}$$
The average text similarity across all reviews posted by a reviewer.
Membership Length
Jindal et al., 2007 - 2008
 They studied spamming activities, including identifying duplicate and near-duplicate reviews using the shingle method.
 Brand reviews and non-reviews were identified using the dissimilarity between product metadata and review content.
 Spammer groups were identified by calculating the content similarity of reviews from different reviewers.
 78% AUC
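The shingle method mentioned above can be illustrated with word-level shingles compared by Jaccard similarity. This is a generic sketch, not the cited authors' code: the shingle width `w=2` and the 0.9 threshold are illustrative defaults that are not taken from these slides, and all function names are hypothetical.

```python
from typing import Set, Tuple


def shingles(text: str, w: int = 2) -> Set[Tuple[str, ...]]:
    """Set of all runs of w consecutive lowercased words in the text."""
    tokens = text.lower().split()
    if len(tokens) < w:
        # Texts shorter than w words yield one short shingle (or none).
        return {tuple(tokens)} if tokens else set()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def is_near_duplicate(text_a: str, text_b: str, w: int = 2,
                      threshold: float = 0.9) -> bool:
    """Flag two reviews whose shingle sets overlap at or above the threshold."""
    return jaccard(shingles(text_a, w), shingles(text_b, w)) >= threshold
```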
Algur et al., 2010
 Two annotators were hired
 The dataset contained 960 reviews
 Duplicate and near-duplicate reviews were identified using Hamming distance
 57% accuracy
Lai, Xu, Lau, Li, & Jing, 2010
 Identified untruthful reviews and non-reviews
 The feature set for identifying non-reviews included lexical, syntactic and stylistic features
 Two annotators were hired
 SVM achieved 96% recall in classifying non-reviews
 Three types of contextual features were used to identify untruthful reviews
Istiaq Ahsan et al., 2016
 An unlabeled dataset of Yelp reviews and the labeled dataset of (Ott, Cardie, & Hancock, 2013) were used.
 Duplicate reviews were identified in the unlabeled dataset using KL-JS distance
 An accuracy of 88% was reported using NB.
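Assuming “KL-JS distance” refers to the Jensen-Shannon divergence, which is built from two Kullback-Leibler terms, a comparison of two reviews' word distributions might look like the sketch below. The tokenisation, the base-2 logarithm and the function names are assumptions of this example, not details from the cited paper.

```python
import math
import re
from collections import Counter
from typing import Dict


def word_distribution(text: str) -> Dict[str, float]:
    """Relative frequency of each lowercased word in the text."""
    counts = Counter(re.findall(r"[a-z0-9']+", text.lower()))
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()} if total else {}


def kl_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """KL(p || q) in bits; q must be positive wherever p is positive."""
    return sum(pw * math.log2(pw / q[w]) for w, pw in p.items() if pw > 0)


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence: mean KL of p and q to their midpoint m."""
    vocab = set(p) | set(q)
    m = {w: 0.5 * (p.get(w, 0.0) + q.get(w, 0.0)) for w in vocab}
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)
```

With base-2 logarithms the value lies between 0 (identical distributions) and 1 (no shared words), so a low score would suggest a duplicate candidate.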

More Related Content

Similar to Fake Review Detection Using Contextual and Behavioral Features 2.pptx

Bo Chen Resume
Bo Chen ResumeBo Chen Resume
Bo Chen Resume
Bo Chen
 
GA – Client Project General Guidelines Mgmt5074 Fanshaw.docx
GA – Client Project General Guidelines Mgmt5074  Fanshaw.docxGA – Client Project General Guidelines Mgmt5074  Fanshaw.docx
GA – Client Project General Guidelines Mgmt5074 Fanshaw.docx
hanneloremccaffery
 
Power point template 修改後 (2)
Power point template 修改後 (2)Power point template 修改後 (2)
Power point template 修改後 (2)
Larua Chen
 

Similar to Fake Review Detection Using Contextual and Behavioral Features 2.pptx (20)

IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
IRJET- Review on Different Recommendation Techniques for GRS in Online Social...
 
Bo Chen Resume
Bo Chen ResumeBo Chen Resume
Bo Chen Resume
 
IRJET- A New Approach to Product Recommendation Systems
IRJET- A New Approach to Product Recommendation SystemsIRJET- A New Approach to Product Recommendation Systems
IRJET- A New Approach to Product Recommendation Systems
 
IRJET- A New Approach to Product Recommendation Systems
IRJET-  	  A New Approach to Product Recommendation SystemsIRJET-  	  A New Approach to Product Recommendation Systems
IRJET- A New Approach to Product Recommendation Systems
 
GA – Client Project General Guidelines Mgmt5074 Fanshaw.docx
GA – Client Project General Guidelines Mgmt5074  Fanshaw.docxGA – Client Project General Guidelines Mgmt5074  Fanshaw.docx
GA – Client Project General Guidelines Mgmt5074 Fanshaw.docx
 
IRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review DetectionIRJET- Enhancing NLP Techniques for Fake Review Detection
IRJET- Enhancing NLP Techniques for Fake Review Detection
 
20120140506003
2012014050600320120140506003
20120140506003
 
An Evaluation Framework for Collaborative Filtering on Purchase Information i...
An Evaluation Framework for Collaborative Filtering on Purchase Information i...An Evaluation Framework for Collaborative Filtering on Purchase Information i...
An Evaluation Framework for Collaborative Filtering on Purchase Information i...
 
INTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATEL
INTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATELINTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATEL
INTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATEL
 
INTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATEL
INTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATELINTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATEL
INTEGRATION OF IMPORTANCEPERFORMANCE ANALYSIS AND FUZZY DEMATEL
 
Customer Journey Mapping Research Report
Customer Journey Mapping Research ReportCustomer Journey Mapping Research Report
Customer Journey Mapping Research Report
 
IRJET-A Novel Technic to Notice Spam Reviews on e-Shopping
IRJET-A Novel Technic to Notice Spam Reviews on e-ShoppingIRJET-A Novel Technic to Notice Spam Reviews on e-Shopping
IRJET-A Novel Technic to Notice Spam Reviews on e-Shopping
 
SWOT Analysis and Competition Approaches of Hotels in Guangzhou
SWOT Analysis and Competition Approaches of Hotels in GuangzhouSWOT Analysis and Competition Approaches of Hotels in Guangzhou
SWOT Analysis and Competition Approaches of Hotels in Guangzhou
 
Combining the opinion profile modeling with complex context filtering for Con...
Combining the opinion profile modeling with complex context filtering for Con...Combining the opinion profile modeling with complex context filtering for Con...
Combining the opinion profile modeling with complex context filtering for Con...
 
A location based movie recommender system
A location based movie recommender systemA location based movie recommender system
A location based movie recommender system
 
A LOCATION-BASED MOVIE RECOMMENDER SYSTEM USING COLLABORATIVE FILTERING
A LOCATION-BASED MOVIE RECOMMENDER SYSTEM USING COLLABORATIVE FILTERINGA LOCATION-BASED MOVIE RECOMMENDER SYSTEM USING COLLABORATIVE FILTERING
A LOCATION-BASED MOVIE RECOMMENDER SYSTEM USING COLLABORATIVE FILTERING
 
Yelp Fake Reviews Detection_new_v23.pptx
Yelp Fake Reviews Detection_new_v23.pptxYelp Fake Reviews Detection_new_v23.pptx
Yelp Fake Reviews Detection_new_v23.pptx
 
76 s201909
76 s20190976 s201909
76 s201909
 
Prioritization of various dimensions of service quality in hospitality industry
Prioritization of various dimensions of service quality in hospitality industryPrioritization of various dimensions of service quality in hospitality industry
Prioritization of various dimensions of service quality in hospitality industry
 
Power point template 修改後 (2)
Power point template 修改後 (2)Power point template 修改後 (2)
Power point template 修改後 (2)
 

Recently uploaded

如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
zifhagzkk
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh +966572737505 get cytotec
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
varanasisatyanvesh
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
wsppdmt
 
如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样
如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样
如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样
wsppdmt
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
ahmedjiabur940
 

Recently uploaded (20)

Digital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham WareDigital Transformation Playbook by Graham Ware
Digital Transformation Playbook by Graham Ware
 
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
如何办理(Dalhousie毕业证书)达尔豪斯大学毕业证成绩单留信学历认证
 
Seven tools of quality control.slideshare
Seven tools of quality control.slideshareSeven tools of quality control.slideshare
Seven tools of quality control.slideshare
 
Harnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptxHarnessing the Power of GenAI for BI and Reporting.pptx
Harnessing the Power of GenAI for BI and Reporting.pptx
 
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
Identify Customer Segments to Create Customer Offers for Each Segment - Appli...
 
Pentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AIPentesting_AI and security challenges of AI
Pentesting_AI and security challenges of AI
 
Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?Case Study 4 Where the cry of rebellion happen?
Case Study 4 Where the cry of rebellion happen?
 
Introduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptxIntroduction to Statistics Presentation.pptx
Introduction to Statistics Presentation.pptx
 
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital AgeCredit Card Fraud Detection: Safeguarding Transactions in the Digital Age
Credit Card Fraud Detection: Safeguarding Transactions in the Digital Age
 
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotecAbortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
Abortion pills in Riyadh Saudi Arabia (+966572737505 buy cytotec
 
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...Simplify hybrid data integration at an enterprise scale. Integrate all your d...
Simplify hybrid data integration at an enterprise scale. Integrate all your d...
 
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
Jual Obat Aborsi Bandung (Asli No.1) Wa 082134680322 Klinik Obat Penggugur Ka...
 
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
SAC 25 Final National, Regional & Local Angel Group Investing Insights 2024 0...
 
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
如何办理英国诺森比亚大学毕业证(NU毕业证书)成绩单原件一模一样
 
社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction社内勉強会資料_Object Recognition as Next Token Prediction
社内勉強会資料_Object Recognition as Next Token Prediction
 
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATIONCapstone in Interprofessional Informatic  // IMPACT OF COVID 19 ON EDUCATION
Capstone in Interprofessional Informatic // IMPACT OF COVID 19 ON EDUCATION
 
如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样
如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样
如何办理澳洲拉筹伯大学毕业证(LaTrobe毕业证书)成绩单原件一模一样
 
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptxRESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
RESEARCH-FINAL-DEFENSE-PPT-TEMPLATE.pptx
 
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi ArabiaIn Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
In Riyadh ((+919101817206)) Cytotec kit @ Abortion Pills Saudi Arabia
 
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24  Building Real-Time Pipelines With FLaNKDATA SUMMIT 24  Building Real-Time Pipelines With FLaNK
DATA SUMMIT 24 Building Real-Time Pipelines With FLaNK
 

Fake Review Detection Using Contextual and Behavioral Features 2.pptx

  • 1. Fake Review Detection Using Behavioral and Contextual Features
  • 2. Outline  Introduction  Fake Review Identification  Motivation  Related Work  Research Question  Research Methodology  Results & Analysis  Conclusion and Future Work
  • 3. Introduction  E-commerce website  Online platform for sale and purchase  Services and products  Example of service: Restaurant, Beauty Parlor, Home cleaners  Example of goods: Vehicles, Garments, Electronic devices 3
  • 4. Reviews  Also called Opinion/ Suggestion  User generated content  Experience about the product /service  Review content  Rating 4
  • 5.  Guide new customers  Beneficial for Businesses  Positive or Negative [Jindal et, al. 2008] 5 Importance of User Reviews Positive Review Negative Review
  • 6. Fake Review Detection  Fake/Untruthful reviews  Mislead users/customers  Posted by Spammers  Financial gain for business  Positive/Negative Polarity  Types: [Jindal et, al. 2007] Untruthful Brand Reviews Non-reviews 6
  • 7. Motivation  Untruthful reviews effects so high  Influence User Decision  Mislead new Customers  Effect business market strategies  Effect business financially  Losing trust over e-commerce website  Identification of untruthful reviews  Exploitation of different features 7
  • 8. Related Work  Fake Review Detection [Jindal et al. 2007, Jindal et al. 2008, Algur et al., 2010, F. Li et al. 2011, Ott et al. 2011, Wu, Greene et al. 2010, Lai et al. 2010, H. Li et al. 2014, Lin Zhu et al. 2014, Ott et al. 2013, Mukherjee et al., 2013, D. Zhang et al., 2016 ]  Spammer Identification [Wang et al. 2011, Akoglu et al. 2013, Fei Geli et al. 2013]  Group Spammer Detection [Liu et al. 2012, Mukherjee et al. 2011] 8
  • 9. Related Work on Fake Review Detection Year Author Dataset Type/Source Classifier Feature Type 2007 Nitin Jindal et. al Pseudo Fake/Amazon LR Contextual 2008 Nitin Jindal et. al Pseudo Fake/Amazon LR Contextual 2010 C. Lai et. al Pseudo Fake/Amazon SVM Contextual 2010 Siddu Algur et. al Pseudo Fake/ Web Page - Contextual 2011 Fangtao Li et. al Pseudo Fake/Epinions LR, SVM,NB Contextual 2013 Arjun Mukherjee et. al Real life/Yelp SVM Contextual, Behavioral 2014 H. Li et. al Real life/Diaping SVM Contextual, Behavioral 2014 Yuming Lin et. al Pseudo Fake/Amazon LR, SVM Contextual, Behavioral 2016 Istiaq Ahsan et. al Pseudo Fake, Real Life/AMT+Yelp NB, SVM Contextual 2016 Dongsong Zhang et. al Real life/Yelp SVM, DT, RF, NB Contextual, Behavioral Year Author Dataset Type/Source Classifier Feature Type 2007 Nitin Jindal et. al Pseudo Fake/Amazon LR Contextual 2008 Nitin Jindal et. al Pseudo Fake/Amazon LR Contextual 2010 C. Lai et. al Pseudo Fake/Amazon SVM Contextual 2010 Siddu Algur et. al Pseudo Fake/ Web Page - Contextual 2011 Fangtao Li et. al Pseudo Fake/Epinions LR, SVM,NB Contextual 2013 Arjun Mukherjee et. al Real life/Yelp SVM Contextual, Behavioral 2014 H. Li et. al Real life/Diaping SVM Contextual, Behavioral 2014 Yuming Lin et. al Pseudo Fake/Amazon LR, SVM Contextual, Behavioral 2016 Istiaq Ahsan et. al Pseudo Fake, Real Life/AMT+Yelp NB, SVM Contextual 2016 Dongsong Zhang et. al Real life/Yelp SVM, DT, RF, NB Contextual, Behavioral 9
  • 10. Related Work on Fake Review Detection  Features  Contextual Features Behavioral Features  Dataset Pseudo Fake Review Real-life Review  Classifiers 10
  • 11. Contextual Features  Extracted from content of the review [Li et, al 2011, Mukherjee et, 2013,Algur, et. 2010, Zhang et, al.2016] For example: review length, capital diversity Behavioral Features  Represent the behavior of reviewer and review [Mukherjee et, 2013,Algur, et. 2010, Zhang et, al.2016] For example: average posting rate, positive ratio, review duration 11
  • 12. Features  “Reviewer Content Similarity”: [Zhang et, al 2016, Mukherjee et, al. 2013] A Contextual Feature It shows average text similarity of all posted reviews of a reviewer  “Reviewer Deviation” : [Mukherjee et al, 2013] A Behavioral Feature It captures variation in review rating on a restaurant. 12
  • 13. Research Questions RQ1: What is effect of “reviewer deviation” if combined with other contextual and behavioral features to identify fake reviews on Yelp dataset? RQ2: What is the importance of “reviewer deviation” compared with other behavioral features for fake review detection model training? RQ3: What is effect of different weighting schemes calculating the “Reviewer Content Similarity” feature of reviewer? 13
  • 16. Preprocessing  Remove Invalid Values  Transform the values of attribute in desired format  E.g. The attribute “date” of review, conversion of String format into DATE and removing “Update” keyword  i.e. “Updated – 01-08-2010” into “01-08-2010” 16
  • 17. Restaurant Dataset 17 [Istiaq et al., 2016, Mukherjee et al, 2013, Zhang et al 2016] # Restaurants Reviews Reviewers Non-Fake Fake Dataset 1 31 2060 1964 1030 1030 Dataset 2 92 12000 9754 6000 6000 # Hotels Reviews Reviewers Non-Fake Fake Dataset 3 70 1550 1499 775 775 Hotel Dataset Experimental Setup
  • 18.  FS1  FS2  FS3  FS4  FS5 Classifiers 1) Random Forest (RF) [Zhang et. al 2016 ] 2) Support Vector Machine (SVM) [Yuming Lin et. al 2014, C. Lai et. al 2010, Fangtao Li et. Al 2011, Arjun Mukherjee et. al 2013, H. Li et. al 2014, Yuming Lin et. al 2014, Istiaq Ahsan et. Al 2016, Zhang et al 2016 ] Experimental Setup Feature Sets Restaurant Hotel Zhang et al. 2016 Zhang et al. 2016 Mukherjee et al. 2013 Evaluation  10-fold Cross Validation  Precision, Recall, F1 and Accuracy
  • 19. Features Set 1) Reviewer Deviation 2) Positive Ratio 3) Maximum Number of Reviews 4) Content Length 5) N-grams 6) Reviewer Content Similarity (FS3) Arjun Mukherjee et. al, 2013 Behavioral Contextual 19
  • 20. Features Set Restaurant Reviews Hotel Reviews 1) Useful Votes Count 2) Cool Votes Count 3) Funny Votes Count 4) Friend Count 5) Review Count 6) Average Posting Rate 7) Positive Ratio 8) Membership Length 9) Review Duration 10)Positive-to-Negative Ratio 11)Reviewer Content Similarity 12)Reviewer Deviation 1) Useful Votes Count 2) Cool Votes Count 3) Funny Votes Count 4) Friend Count 5) Review Count 6) Average Posting Rate 7) Tips Count 8) Membership Length 9) Review Duration 10)Capital Diversity 11)Reviewer Content Similarity 12)Reviewer Deviation ( NNC , BM25 ) (FS1) (FS4) (FS2) (FS5) Zhang et. al, 2016 Behavioral Behavioral Contextual Contextual 20
  • 21. Results on Restaurant Reviews 21
  • 22. 22 60 65 70 75 80 85 90 95 P R F1 A Feature Set Comparison on D1 using RandomForest FS1 FS2 FS3
  • 23. 23 68 73 78 83 88 93 98 P R F1 A Feature Set Comparison on D2 Using RF FS1 FS2 FS3
  • 24. 24 69 74 79 84 89 94 D1 D1 D1 D1 D1 D1 D1 D1 D2 D2 D2 D2 D2 D2 D2 D2 P R F1 A P R F1 A P R F1 A P R F1 A SVM RF SVM RF FS1 FS2 FS3 Feature set Comparison
  • 25. Results on Restaurant Reviews Classifier Comparison 25
  • 26. Results on Restaurant Reviews Importance Score of Features on Restaurant Reviews 26
  • 27. Results on Hotel Reviews 27
  • 28. 28 84 85 86 87 88 89 90 91 92 93 94 P R F1 A P R F1 A RF SVM FS4 FS5 Feature set Comparison on Hotel Dataset (D3)
  • 29. Results on Hotel Reviews Importance Score of Features on Hotel Reviews 29
  • 30.  Title: “Exploring Behavioral Features with Contextual Feature to Identify Fake Reviews ”  Conference: The 23rd Conference on Natural Language & Information Systems (NLDB2018) 13rd - 15th June 2018, Paris, France  Status: ACCEPTED Achievements 30
  • 31. Conclusion & Future Work  Behavioral feature “Reviewer Deviation” improves the overall accuracy  Dataset scaling can also increase affect of behavioral features  BM25 term weighting scheme also effects the classification results with improvement  Spammer and spammer group detection can be explored with variety of features  Deep Learning Approaches can also be adopted 31
  • 32. References 1. Heydari, A., Tavakoli, M. A., Salim, N., & Heydari, Z. (2015). Detection of review spam: A survey. Expert Systems with Applications. 2. Jindal, N., & Liu, B. (2007a). Analyzing and detecting review spam. Proceedings – IEEE International Conference on Data Mining, ICDM, 547–552. 3. Algur, S., Hiremath, E., Patil, A., & Shivashankar, S. (2010). Spam detection of customer reviews from web pages. In Proceedings of the 2nd International Conference on IT and Business Intelligence (pp. 1–13). 4. Algur, S. P., Patil, A. P., Hiremath, P. S., & Shivashankar, S. (2010). Conceptual level similarity measure based review spam detection. Signal and Image Processing ICSIP 2010 International Conference on, 416–423. 5. Istiaq Ahsan, M., Nahian, T., All Kafi, A., Ismail Hossain, M., & Muhammad Shah, F. (2016). An Ensemble approach to detect Review Spam using hybrid Machine Learning Technique. Computer and Information Technology (ICCIT) 19th International Conference on IEEE, 388–394. 32
  • 33. References 6. Jindal, N., & Liu, B. (2008). Opinion spam and analysis. Proceedings of the International Conference on Web Search and Web Data Mining 2008, 219–230. 7. Lai, C. L., Xu, K. Q., Lau, R. Y. K., Li, Y., & Jing, L. (2010). Toward a language modeling approach for consumer review spam detection. Proceedings - IEEE International Conference on E-Business Engineering, ICEBE 2010, 1–8. 8. Li, F., Huang, M., Yang, Y., & Zhu, X. (2011). Learning to identify review spam. In IJCAI Proceedings - International Joint Conference on Artificial Intelligence (Vol. 22, p. 2488). 9. Zhang, D., Zhou, L., Kehoe, J. L., & Kilic, I. Y. (2016). What Online Reviewer Behaviors Really Matter? Effects of Verbal and Nonverbal Behaviors on Detection of Fake Online Reviews. Journal of Management Information Systems, 33(2), 456–481. 10. Mukherjee, A., Venkataraman, V., Liu, B., & Glance, N. (2013b). What Yelp Fake Review Filter Might Be Doing? Seventh International AAAI, 409–418. 33
  • 37. Average Posting Rate $APR(a) = \frac{N_r(a)}{N(\text{posting days})}$ It is the ratio of a reviewer's total number of reviews to the reviewer's number of active days. An active day is a day on which the reviewer has posted at least one review. 37
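A minimal sketch of the Average Posting Rate computation, assuming each review carries a posting date. The function name `average_posting_rate` and the choice to return 0.0 for a reviewer with no reviews are illustrative assumptions, not part of the slides.

```python
from datetime import date

def average_posting_rate(review_dates):
    """Total reviews divided by the number of distinct active days."""
    if not review_dates:
        return 0.0  # assumption: no reviews means a rate of zero
    active_days = len(set(review_dates))  # days with at least one review
    return len(review_dates) / active_days

# Three reviews posted on two distinct days
dates = [date(2018, 1, 1), date(2018, 1, 1), date(2018, 1, 3)]
apr = average_posting_rate(dates)
```

A reviewer who posts many reviews on few days gets a high value, which is the burst-like behavior this feature is meant to expose.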
  • 38. Positive Ratio $R_{pos}(a) = \frac{N_r(\{r_a \mid rating_r \geq 4\})}{N_r(a)}$ It is the number of a reviewer's reviews with a rating of 4 or higher, divided by the reviewer's total number of reviews. 38
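The Positive Ratio can be sketched as follows, assuming ratings are numeric values on a 1 to 5 scale. The function name and the zero result for an empty rating list are assumptions made for illustration.

```python
def positive_ratio(ratings, threshold=4):
    """Share of a reviewer's ratings that are at or above the threshold."""
    if not ratings:
        return 0.0  # assumption: undefined ratio is reported as zero
    positives = sum(1 for r in ratings if r >= threshold)
    return positives / len(ratings)

# Two of the four ratings are 4 or higher
ratio = positive_ratio([5, 4, 3, 1])
```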
  • 39. Positive-to-Negative Ratio $R_{pn}(a) = \frac{N_r(\{r_a \mid rating_r \geq 4\})}{N_r(\{r_a \mid rating_r \leq 2\})}$ It is the ratio of a reviewer's reviews rated 4 or higher to the reviewer's reviews rated 2 or lower. 39
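A small sketch of the Positive-to-Negative Ratio under the same rating-scale assumption. The slide does not say what happens when a reviewer has no negative reviews; returning `None` in that case is an assumption of this example.

```python
def positive_to_negative_ratio(ratings, pos_min=4, neg_max=2):
    """Count of ratings >= pos_min divided by count of ratings <= neg_max."""
    positives = sum(1 for r in ratings if r >= pos_min)
    negatives = sum(1 for r in ratings if r <= neg_max)
    if negatives == 0:
        return None  # assumption: ratio is treated as undefined
    return positives / negatives

# Three positive ratings (5, 5, 4) and two negative ratings (2, 1)
rpn = positive_to_negative_ratio([5, 5, 4, 2, 1, 3])
```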
  • 40. Review Duration $RD(a) = D_l(a) - D_f(a)$ It is the difference between the dates of a reviewer's last posted review ($D_l$) and first posted review ($D_f$). 40
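Review Duration can be sketched with the standard `datetime` module. Measuring the duration in whole days is an assumption; the slide does not state the unit.

```python
from datetime import date

def review_duration(review_dates):
    """Days between a reviewer's first and last posted review."""
    if not review_dates:
        return 0  # assumption: no reviews means zero duration
    return (max(review_dates) - min(review_dates)).days

# First review on 1 January, last on 10 January
rd = review_duration([date(2018, 1, 1), date(2018, 1, 10), date(2018, 1, 5)])
```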
  • 41. Reviewer Deviation $RevDev(r) = \frac{rating - rating_r(p)}{N_r(p)}$ It captures the variation in review ratings on a restaurant. It is computed by subtracting the absolute deviation of all ratings on a restaurant from the review rating. 41
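One possible reading of Reviewer Deviation, assumed here, is the mean absolute difference between a review's rating and all ratings on the same restaurant. This interpretation and the function name are assumptions for illustration, not the authors' confirmed definition.

```python
def reviewer_deviation(rating, restaurant_ratings):
    """Mean absolute difference between one rating and all ratings on a restaurant."""
    if not restaurant_ratings:
        return 0.0  # assumption: no ratings to compare against
    total = sum(abs(rating - other) for other in restaurant_ratings)
    return total / len(restaurant_ratings)

# A 5-star review on a restaurant whose ratings are 5, 1 and 3
dev = reviewer_deviation(5, [5, 1, 3])
```

Under this reading, a rating far from the consensus on a restaurant yields a larger deviation.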
  • 42. Reviewer Content Similarity $RCS(a) = \frac{\sum_{i}^{n} \max_{j}^{n}\big(similarity(r_i, r_j)\big)}{n}$ It is the average, over all reviews posted by a reviewer, of each review's maximum text similarity to the reviewer's other reviews. 42
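A sketch of Reviewer Content Similarity. The slide does not name the similarity function, so word-set Jaccard similarity is used here purely as a stand-in assumption; excluding a review's comparison with itself (j != i) is also an assumption.

```python
def jaccard(a, b):
    """Jaccard similarity between the word sets of two texts (stand-in measure)."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 0.0
    return len(sa & sb) / len(sa | sb)

def reviewer_content_similarity(reviews, similarity=jaccard):
    """Average of each review's maximum similarity to the reviewer's other reviews."""
    n = len(reviews)
    if n < 2:
        return 0.0  # assumption: a single review has nothing to compare with
    total = 0.0
    for i in range(n):
        total += max(similarity(reviews[i], reviews[j]) for j in range(n) if j != i)
    return total / n

reviews = [
    "great food great service",
    "great food great service",
    "terrible slow service",
]
rcs = reviewer_content_similarity(reviews)
```

Any other text similarity, such as cosine similarity over term weights, can be passed through the `similarity` parameter.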
  • 43. Membership Length 43
  • 44. Jindal et al. 2007 - 2008  They identified spamming activities, including duplicate and near-duplicate reviews detected with the shingle method.  To identify brand reviews and non-reviews, they used the dissimilarity between product metadata and review content.  Spammer groups were identified by calculating the content similarity of reviews by different reviewers.  Reported 78% AUC 44
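The shingle idea mentioned above can be sketched as follows: each review is broken into overlapping word n-grams, and two reviews are compared by the Jaccard similarity of their shingle sets. The shingle width of 2 and the function names are assumptions for illustration, not parameters taken from the slides.

```python
def shingles(text, w=2):
    """Set of overlapping word w-grams in a text."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + w]) for i in range(len(tokens) - w + 1)}

def shingle_similarity(a, b, w=2):
    """Jaccard similarity between the shingle sets of two texts."""
    sa, sb = shingles(a, w), shingles(b, w)
    if not sa or not sb:
        return 0.0  # assumption: texts shorter than w words match nothing
    return len(sa & sb) / len(sa | sb)

# The two reviews share 3 of 5 distinct word pairs
sim = shingle_similarity("the food was really great", "the food was really good")
```

Pairs whose similarity exceeds a chosen threshold would then be flagged as near duplicates.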
  • 45. Algur et al. 2010  Two annotators were hired  The dataset contained 960 reviews  Duplicate and near-duplicate reviews were identified using Hamming distance  57% accuracy 45
  • 46. Lai, Xu, Lau, Li, & Jing, 2010  Identified untruthful reviews and non-reviews  The feature set for identifying non-reviews includes lexical, syntactic and stylistic features  Two annotators were hired  SVM achieved 96% recall in classifying non-reviews  Three types of contextual features were used to identify untruthful reviews 46
  • 47. Istiaq Ahsan et al. 2016  An unlabeled dataset of Yelp reviews and the labeled dataset of (Ott, Cardie, & Hancock, 2013) were used.  Duplicate reviews were identified in the unlabeled dataset using KL-JS distance  An accuracy of 88% was reported using NB. 47
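As a rough illustration of a KL-based distance between reviews, the sketch below computes the Jensen-Shannon divergence between the unigram distributions of two texts. Unigram distributions, base-2 logarithms, and the function names are assumptions here; the slide does not describe how the cited work computed its distance.

```python
import math
from collections import Counter

def unigram_dist(text, vocab):
    """Relative word frequencies of a text over a shared vocabulary."""
    counts = Counter(text.lower().split())
    total = sum(counts[w] for w in vocab)
    return [counts[w] / total for w in vocab]

def kl(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(a, b):
    """Jensen-Shannon divergence between two non-empty texts (0 = identical)."""
    if not a.split() or not b.split():
        raise ValueError("both texts must contain at least one word")
    vocab = sorted(set(a.lower().split()) | set(b.lower().split()))
    p, q = unigram_dist(a, vocab), unigram_dist(b, vocab)
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]  # mixture distribution
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

same = js_divergence("good food", "good food")
disjoint = js_divergence("a b", "c d")
```

With base-2 logarithms the value lies between 0 and 1, so a low value would suggest a likely duplicate.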