A Presentation on
“Fake User Detection”
SUBMITTED BY
Mahendra Nath Dwivedi
Roll No.: 502202216004
Enroll No.: AA/3522
Department of Computer Science & Engineering
Central College of Engineering and Management
GUIDED BY
Mr. Abhishek Badholia
DEPT. OF COMPUTER SCIENCE & ENGINEERING
CONTENT
• INTRODUCTION
- INTRODUCTION OF PROJECT
- REVIEW ANALYSIS OF AMAZON.COM
- BEHAVIOR FEATURES OF SPAMMERS
• LITERATURE REVIEW
• PROBLEM IDENTIFICATION
• METHODOLOGY
• RESULT AND FUTURE SCOPE
• REFERENCES
INTRODUCTION
With the development of the Internet, people are increasingly likely to express their
views and opinions on the Web. They can write reviews or other opinions on
e-commerce sites, forums, and blogs. Product manufacturers also use these reviews
to identify problems with their products and to gather competitive
intelligence about their competitors. Unfortunately, the
importance of reviews also creates a strong incentive for spam, which contains
falsely positive or maliciously negative opinions.
Review (Comment) Analysis of Website
The table shows some selected mobile phone reviews from the Amazon website. For
the mobile phone product topic, reviews 1 and 2 are relevant, and
review 1 is the most relevant of all the reviews. However, it is hard to decide how
relevant review 3 is to the topic. Moreover, reviews 4 and 5 partly
plagiarize review 3, and review 6 is an advertisement. Only two of the six reviews are
relevant to the mobile phone product topic. Fake reviews not only increase
the cost of making a decision but also reduce the accuracy of that decision.
BEHAVIOR FEATURES OF SPAMMERS
1) Star User
2) Deviation Rate
3) Bias Rate
4) Review Similarity Rate
5) Review Relevancy Rate
6) Content Length
7) Illustration
8) Burst Review
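The slide names these behavior features without defining them. As a rough sketch only, three of them could be computed as shown here. The definitions are assumptions made for illustration: deviation rate is taken as the mean absolute gap between a user's rating and the product's average rating, content length as the mean word count per review, and burst size as the largest number of reviews a user posts within a short time window. The data layout and all function names are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Toy reviews: (user, product, stars, day_index, text). Illustrative data only.
reviews = [
    ("u1", "p1", 5, 1, "Great phone, battery lasts two days"),
    ("u1", "p2", 5, 1, "Great phone"),
    ("u1", "p3", 5, 2, "Great phone"),
    ("u2", "p1", 3, 4, "Decent screen but the camera is weak in low light"),
    ("u3", "p1", 1, 9, "Stopped working after a week"),
]

def product_means(reviews):
    """Average star rating of every product."""
    by_product = defaultdict(list)
    for _, product, stars, _, _ in reviews:
        by_product[product].append(stars)
    return {p: mean(s) for p, s in by_product.items()}

def deviation_rate(user, reviews):
    """Assumed definition: mean |user rating - product average| over the user's reviews."""
    means = product_means(reviews)
    return mean(abs(stars - means[product])
                for u, product, stars, _, _ in reviews if u == user)

def content_length(user, reviews):
    """Assumed definition: mean number of words per review written by the user."""
    return mean(len(text.split()) for u, _, _, _, text in reviews if u == user)

def burst_size(user, reviews, window_days=2):
    """Assumed definition: most reviews posted by the user within any window_days span."""
    days = sorted(day for u, _, _, day, _ in reviews if u == user)
    best = 0
    for i, start in enumerate(days):
        count = sum(1 for d in days[i:] if d - start < window_days)
        best = max(best, count)
    return best

for user in ("u1", "u2", "u3"):
    print(user, deviation_rate(user, reviews),
          content_length(user, reviews), burst_size(user, reviews))
```

In this toy data, u1 posts three short five-star reviews within two days, which is the kind of pattern the burst and content-length features are meant to surface.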
Types of Spammers
Types of Review Spam
Three types of review spam exist [6]. These are:
Type 1 (Untruthful review spam): Fictitious positive reviews are given to
products in order to promote them, and unreasonable negative reviews
are given to competing products to harm their reputation among
consumers. In this way, untruthful reviews mislead consumers into
believing the spam.
Type 2 (Reviews with brand mentions): These reviews focus only on
the brand. They comment on the manufacturer, the seller, or the brand name
alone. These reviews are biased and can easily be identified because they do not
talk about the product and only mention brand names.
Type 3 (Non-reviews): These reviews are either junk, with no relation to the
product, or are used purely for advertising. They take
two forms:
i. marketing text, and
ii. irrelevant text or random write-ups.
Rule Based Classification Of Spammers
METHODOLOGY
This section discusses the proposed framework in detail. The
proposed spam detection and blocking framework consists of four
modules:
• Feature Discretization
• Negative Set Extraction
• Expectation Maximization Algorithm
• Blocking of Users
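The slides do not say how the Feature Discretization module listed above bins its features. A minimal sketch, assuming simple equal-width binning (an illustrative choice, not necessarily the method used in this project), might look as follows. The function name `discretize` and the sample values are hypothetical.

```python
def discretize(values, n_bins=3):
    """Map each numeric value to a bin index in 0..n_bins-1 using equal-width bins."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so every value falls into bin 0.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # min(...) keeps the maximum value inside the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]

# Hypothetical continuous feature values for five users.
deviation = [0.0, 0.2, 0.9, 1.5, 2.0]
print(discretize(deviation))  # [0, 0, 1, 2, 2]
```

Turning continuous features into a few discrete levels lets later steps treat each level (for example "high deviation") as a countable token.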
Rating Deviation from Mean Agreement
Filter Mean Target Difference
Group Filter Mean Variance
Target Model Focus
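The slide lists these features by name only. As an illustration, the first one, Rating Deviation from Mean Agreement (RDMA), is sketched here using the definition commonly given in the rating-attack literature: for each item a user rated, take the absolute gap between the user's rating and the item's average, divide it by the number of ratings the item received, and average the result over the user's items. Whether this project used exactly this formula is an assumption; the `rdma` function and the toy ratings are hypothetical.

```python
from collections import defaultdict

def rdma(ratings):
    """ratings: dict mapping (user, item) -> rating. Returns dict user -> RDMA."""
    item_ratings = defaultdict(list)
    for (_, item), r in ratings.items():
        item_ratings[item].append(r)

    totals = defaultdict(float)  # sum of |r - avg_item| / n_ratings_item per user
    counts = defaultdict(int)    # number of items rated per user
    for (user, item), r in ratings.items():
        rs = item_ratings[item]
        avg = sum(rs) / len(rs)
        totals[user] += abs(r - avg) / len(rs)
        counts[user] += 1
    return {u: totals[u] / counts[u] for u in totals}

# Toy data: users a, b, c rate items x and y.
toy = {("a", "x"): 5, ("b", "x"): 1, ("c", "x"): 3,
       ("a", "y"): 5, ("b", "y"): 5}
print(rdma(toy))
```

Dividing by the number of ratings per item means that disagreeing on a sparsely rated item counts for more than disagreeing on a popular one.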
The algorithm for negative set extraction is presented below.
Algorithm: Negative Set Extraction
Input: P → positive set of spammers
       U → unlabeled set of users
Output: RN → reliable negative set
RN ← N initially
RN_Extract (P, U)
    For each feature do
        Calculate [quantity missing in source]
    End for
    For each feature (evaluated in decreasing order) do
        Remove from RN the instances that contain [symbol missing in source]
        If Size(RN) is close enough to Size(P) then
            Return RN
        End if
    End for
End
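Because the scoring formula did not survive in the pseudocode, the following is only a sketch of the same loop under stated assumptions. Each user is represented as a set of discretized feature tokens, RN starts as the whole unlabeled set U, a feature's score is assumed to be the fraction of known spammers in P that exhibit it, and "close enough" is assumed to mean Size(RN) ≤ slack × Size(P). The function name, the `slack` parameter, and the toy data are all hypothetical.

```python
def extract_reliable_negatives(P, U, slack=1.5):
    """P, U: dicts mapping user -> set of discretized feature tokens.
    Returns the users of U kept as reliable negatives (likely non-spammers)."""
    # Assumed score: fraction of known spammers that exhibit the feature.
    features = set().union(*P.values())
    score = {f: sum(f in feats for feats in P.values()) / len(P) for f in features}

    rn = set(U)  # assumption: start from every unlabeled user
    # Visit features in decreasing score order (ties broken alphabetically).
    for f in sorted(score, key=lambda f: (-score[f], f)):
        # Drop unlabeled users that share this spammer-typical feature.
        rn = {u for u in rn if f not in U[u]}
        if len(rn) <= slack * len(P):  # assumed meaning of "close enough"
            return rn
    return rn

P = {"s1": {"dev=high", "burst=high"}, "s2": {"dev=high", "len=short"}}
U = {
    "u1": {"dev=high", "len=short"},
    "u2": {"dev=low", "len=long"},
    "u3": {"dev=low", "burst=high"},
    "u4": {"dev=low", "len=long"},
    "u5": {"dev=high", "burst=high"},
    "u6": {"dev=mid", "len=long"},
}
print(sorted(extract_reliable_negatives(P, U)))  # ['u2', 'u4', 'u6']
```

The users that remain share none of the most spammer-typical features, so they can serve as negative examples for the later learning step.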
BLOCK DIAGRAM OF GENERATING A LIST OF SPAMMERS
LITERATURE REVIEW
To detect spam reviews, researchers have applied data mining and natural language
processing techniques. This project builds on the work of several of these
researchers.
REVIEW OF PAPERS:
1. The paper entitled “Fake Review Detection from a Product Review Using
Modified Method of Iterative Computation Framework”, by Eka Dyar Wahyuni and Arif
Djunaidy, was published by EDP Sciences (DOI: 10.1051/matecconf/20165803003).
The authors measure the honesty value of a review using text
mining and opinion mining techniques. Their experimental results show that
the proposed system identifies fake reviews more accurately than the iterative
computation framework (ICF) method. The
drawback of this method is that some processes still need to be optimized before it can detect a
fake review in a short amount of time.
2. The paper entitled “Spammers Detection from Product Reviews: A Hybrid Model”
was published by IEEE in 2015 (DOI: 10.1109/ICDM.2015.73).
This paper focuses on detecting hidden spam users based on product
reviews. Many earlier studies in the literature suggest a variety of
methods for spammer detection. This paper proposes a principled hybrid learning
model called hPSD that combines user features and user-product relations for
spammer detection. It elaborates the three essential components of hPSD: feature
discretization, reliable negative set extraction, and the hybrid learning
scheme.
3. The paper entitled “Mining the Peanut Gallery: Opinion Extraction and Semantic
Classification of Product Reviews” was written by Kushal Dave of NEC Laboratories
America.
The opinion mining tool it describes would process a set of search results for a given
item, generating a list of product attributes (quality, features, etc.) and aggregating
opinions about each of them (poor, mixed, good). The authors begin by identifying the unique
properties of this problem and develop a method for automatically distinguishing
between positive and negative reviews. A number of issues make this problem
difficult: rating inconsistency, ambivalence and comparison, sparse data, and skewed
distribution.
PROBLEM IDENTIFICATION
The main problem with user reviews is identifying the spam reviews
among the genuine ones. A review posted by any user may or may not be
spam. Consider the example of a user, Alice. Alice constantly posts reviews for a
publisher “X”, which has published many books. Alice posts good content
and seemingly genuine reviews for publisher “X”. She purchases most of the books of “X” and
reviews each of them. Looking at these posts, an algorithm can
conclude that Alice is a genuine user and that her comments are genuine too.
In fact, however, Alice has been hired by publisher “X” to post reviews. She gives
good reviews and 5-star ratings to the books of publisher “X”. This is the problem of
identifying users who appear to be genuine but are not.
Fig. 3.1. Alice's Behaviour of Reviewing Books
Percentage of Users Being Spammer and Ham
Lastly, the users are classified into spam and non-spam categories. The
probabilities of a user falling into the spam and non-spam categories are presented in
the figure. In our dataset, the probability of a user being a spammer is 49% and of being
a non-spammer is 51%. The dataset is flooded with spam users. These users need to be blocked
so that they cannot further affect the reviews and comments.
Fig. Probability of Spammers and Non-Spammers
Users Blocking
After the spam users are identified, they are blocked. The
blocking stage is depicted in the figure.
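The slides do not describe how blocking is implemented. As a minimal sketch, assuming each user already has an estimated spam probability and that a fixed threshold decides who is blocked, the stage could be as simple as the following. The threshold value, function names, and toy probabilities are illustrative assumptions.

```python
def blocked_users(spam_probability, threshold=0.5):
    """Return the set of users whose estimated spam probability reaches the threshold."""
    return {user for user, p in spam_probability.items() if p >= threshold}

def accept_review(user, blocked):
    """A review is accepted only if its author is not on the block list."""
    return user not in blocked

# Hypothetical classifier output for three users.
spam_probability = {"u1": 0.92, "u2": 0.08, "u3": 0.50}
blocked = blocked_users(spam_probability)
print(sorted(blocked))               # ['u1', 'u3']
print(accept_review("u2", blocked))  # True
```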
The results produced by the EM (Expectation Maximization) algorithm with 6 features are
compared with those of the base paper, which uses a larger number of features. The figure shows the
comparison between the proposed and existing approaches.
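The slides name the EM algorithm but do not give its model. Purely as an illustration, the sketch below runs semi-supervised EM on a two-class Bernoulli naive Bayes mixture over binary features: known spammers and reliable negatives keep fixed labels, and only the unlabeled users have their spam probability re-estimated. The choice of model, the Laplace smoothing, the function name, and the three-feature toy data are all assumptions rather than the project's actual setup.

```python
import math

def em_spam_posteriors(X, labels, n_iter=20, alpha=1.0):
    """X: list of binary feature vectors. labels: 1 (known spammer),
    0 (reliable negative) or None (unlabeled). Returns P(spam) per user."""
    n, d = len(X), len(X[0])
    # Labeled users keep their label; unlabeled users start undecided at 0.5.
    resp = [0.5 if y is None else float(y) for y in labels]
    for _ in range(n_iter):
        # M-step: class prior and per-feature Bernoulli parameters (smoothed).
        w1 = sum(resp)
        w0 = n - w1
        prior = (w1 + alpha) / (n + 2 * alpha)
        theta1 = [(sum(r * x[j] for r, x in zip(resp, X)) + alpha)
                  / (w1 + 2 * alpha) for j in range(d)]
        theta0 = [(sum((1 - r) * x[j] for r, x in zip(resp, X)) + alpha)
                  / (w0 + 2 * alpha) for j in range(d)]
        # E-step: update the posterior of unlabeled users only.
        for i, (x, y) in enumerate(zip(X, labels)):
            if y is not None:
                continue
            log1, log0 = math.log(prior), math.log(1 - prior)
            for j in range(d):
                log1 += math.log(theta1[j] if x[j] else 1 - theta1[j])
                log0 += math.log(theta0[j] if x[j] else 1 - theta0[j])
            resp[i] = 1 / (1 + math.exp(log0 - log1))
    return resp

X = [[1, 1, 1], [1, 1, 0], [0, 0, 0], [0, 0, 1], [1, 1, 1], [0, 0, 0]]
labels = [1, 1, 0, 0, None, None]
posteriors = em_spam_posteriors(X, labels)
print([round(p, 3) for p in posteriors])
```

In this toy run the fifth user, whose features match the known spammers, ends with a spam probability above 0.5, while the sixth user ends below 0.5.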
Future Scope of Work
In this project, the majority of the work has concerned the spammer
detection technique. The major drawback of this work is that it uses only one
dataset. Future work might use multiple datasets to analyse
attackers on other websites too.
References
Jindal N. and Liu B. 2007. Analyzing and Detecting Review Spam, Seventh IEEE International Conference on Data
Mining.
Dixit S. and Agrawal A. J. 2013. Review Spam Detection, International Journal of Computational
Linguistics and Natural Language Processing, Vol. 2, Issue 6, ISSN 2279-0756.
Gera T., Thakur D. and Singh J. 2015. BILD Testing for Spotting Out Suspicious Reviews, Suspicious Reviewers and
Group Spammers, International Conference on Communication Systems and Network Technologies (CSNT.2015.138).
Liang D., Liu X. and Shen H. 2014. Detecting Spam Reviewers by Combing Reviewer Feature and Relationship,
International Conference on Informative and Cybernetics for Computational Social Systems (ICCSS).
Mukherjee A., Kumar A., Liu B., Wang J. and Ghosh R. 2013. Spotting Opinion Spammers using Behavioral Footprints.
Mukherjee A., Glance N. and Liu B. 2012. Spotting Fake Reviewer Groups in Consumer Reviews.
Wang G., Xie S., Liu B. and Yu P. S. 2011. Review Graph based Online Store Review Spammer Detection, IEEE
International Conference on Data Mining (ICDM).
Zhang X., Xiong G., Zhu F. and Dong X. 2016. A Method of SMS Spam Filtering Based on AdaBoost Algorithm, World
Congress on Intelligent Control and Automation (WCICA).
