Detection of fake online reviews using semi-supervised and supervised learning
By
G.Manoj Kumar
(20BF1F0033)
Under the guidance of
P.LOKESH KUMAR REDDY
Assistant Professor
DEPARTMENT OF COMPUTER APPLICATION
SRI VENKATESWARA COLLEGE OF ENGINEERING
Karakambadi Road, TIRUPATI – 517507
2020– 2022
MCA IV Semester Project II Review Presentation
Contents
• ABSTRACT
• INTRODUCTION
• PROBLEM DEFINITION
• EXISTING SYSTEM
  • DISADVANTAGES OF EXISTING SYSTEM
• PROPOSED SYSTEM
  • ADVANTAGES OF PROPOSED SYSTEM
• SOFTWARE AND HARDWARE REQUIREMENTS
• ALGORITHMS
• MODULES
• ARCHITECTURE
• UML DIAGRAMS
• TESTING STRATEGIES
ABSTRACT
Online reviews have a great impact on today’s business and commerce.
Purchase decisions for online products depend largely on reviews
given by users. Hence, opportunistic individuals or groups try to
manipulate product reviews for their own interests. This project introduces
some semi-supervised and supervised text mining models to detect fake
online reviews and compares the efficiency of both techniques on a
dataset containing hotel reviews.
INTRODUCTION
• Technologies are evolving swiftly. Old technologies are constantly being replaced
by modern and emerging ones.
• These emerging technologies allow individuals to carry out their work effectively. The
online marketplace is one such technical advancement.
• Through online portals, we can shop and make reservations. Before consuming
those goods or services, almost every one of us seeks out reviews.
• Reviews also have a huge influence on the advertising and marketing of goods and
services. Fake online reviews are becoming an increasingly serious concern with the
spread of the online marketplace.
• To market their own goods, people may create fake reviews that harm
genuine consumers.
• Researchers have been exploring several ways to recognise these fake online reviews.
Some methods focus on the content of the review and some are based on the
behaviour of the user who posts the review.
PROBLEM DEFINITION
This project introduces some semi-supervised and supervised text mining models to
detect fake online reviews and compares the efficiency of both techniques on a
dataset containing hotel reviews.
EXISTING SYSTEM
• Content-based methods focus on the content of the review, that is, the
text of the review or what is said in it. Heydari et al. attempted to detect
spam reviews by analyzing the linguistic features of the review.
• Ott used three techniques to perform classification: genre identification,
detection of psycholinguistic deception, and text categorization.
• Behavior-feature-based studies focus on the reviewer, including the
characteristics of the person who is giving the review.
• Lim et al. addressed the problem of review spammer detection, that is, finding users
who are the source of spam reviews. People who intentionally post fake reviews
behave significantly differently from normal users.
DISADVANTAGES OF EXISTING SYSTEM
• The existing work uses only semi-supervised learning.
• It only classifies text by sentiment and never identifies fake reviews.
PROPOSED SYSTEM
• In the proposed system, each review first goes through a tokenization process. Then
unnecessary words are removed and candidate feature words are generated.
• Each candidate feature word is checked against the dictionary; if it has an entry in
the dictionary, its frequency is counted and added to the column of the feature
vector that corresponds to the numeric map of the word.
• Along with the frequency counts, the length of the review is measured and
added to the feature vector.
• Finally, the sentiment score, which is available in the dataset, is added to the feature
vector. Negative sentiment is assigned the value zero and positive sentiment is
assigned a positive value in the feature vector.
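The feature-extraction steps above can be sketched in Python, the back-end language of the project. This is a minimal illustration and not the project's actual code: the stop-word list, the dictionary contents, the function names, measuring length in tokens, and using 1 as the positive sentiment value are all assumptions made for the example.

```python
import re

# Hypothetical stop-word list and dictionary (word -> column index).
# The real lists used in the project are not given in the slides.
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it"}
DICTIONARY = {"great": 0, "clean": 1, "dirty": 2, "staff": 3}


def tokenize(review):
    """Lowercase the review and split it into word tokens."""
    return re.findall(r"[a-z']+", review.lower())


def extract_features(review, sentiment):
    """Build the feature vector: word counts, review length, sentiment."""
    tokens = tokenize(review)
    # Remove unnecessary words to obtain the candidate feature words.
    candidates = [t for t in tokens if t not in STOP_WORDS]
    # One column per dictionary word, holding that word's frequency.
    vector = [0] * len(DICTIONARY)
    for word in candidates:
        column = DICTIONARY.get(word)
        if column is not None:
            vector[column] += 1
    # Review length, measured here in tokens (an assumption).
    vector.append(len(tokens))
    # Negative sentiment -> 0, positive sentiment -> 1 (assumed positive value).
    vector.append(1 if sentiment == "positive" else 0)
    return vector


features = extract_features(
    "The staff was great and the room was clean, great stay!", "positive"
)
print(features)  # → [2, 1, 0, 1, 11, 1]
```

The first four columns are the dictionary word counts, followed by the review length and the sentiment value, so every review maps to a vector of the same size regardless of its text.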
ADVANTAGES OF PROPOSED SYSTEM
• Detect Unseen Attacks
• Low False Positive Rates
• Support Large Datasets
• Able to Handle Imbalanced Datasets
HARDWARE REQUIREMENTS
• Processor: Intel(R) Core(TM) i3-4200U
• CPU: 1.6 GHz
• RAM: 4 GB
• Hard Disk: 500 GB
SOFTWARE REQUIREMENTS
• Operating System: Windows 10
• Server: XAMPP
• Front End: HTML, CSS, JS
• Back End: Python
• Database: MySQL
ALGORITHMS
The following are some of the algorithms used in this project:
1. Naïve Bayes Classifier
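As an illustration of the Naïve Bayes classifier named above, here is a minimal from-scratch sketch in Python. The slides do not say which Naïve Bayes variant or library the project uses, so the multinomial form, the add-one (Laplace) smoothing, the class name, and the toy training reviews are all assumptions made for the example. It covers only the supervised case.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes over word tokens with add-one smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word frequencies per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior of the class.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for word in tokens:
                if word not in self.vocabulary:
                    continue  # ignore words never seen during training
                count = self.word_counts[label][word]
                # Smoothed log likelihood of the word given the class.
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy, made-up training reviews (not taken from the project's dataset).
train_docs = [
    "best hotel ever amazing amazing perfect".split(),
    "perfect amazing luxury best deal".split(),
    "room was clean staff friendly".split(),
    "breakfast was average room small".split(),
]
train_labels = ["fake", "fake", "genuine", "genuine"]

model = NaiveBayes().fit(train_docs, train_labels)
prediction = model.predict("amazing perfect best".split())
print(prediction)  # → fake
```

Working in log space avoids numerical underflow when many word probabilities are multiplied, and the smoothing keeps a single unseen word from driving a class probability to zero.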
MODULES
The proposed system has two modules:
1. Service Provider
2. Remote User
MODULES DESCRIPTION
This project has two modules:
Service Provider
In this module, the service provider can log in, add movies, view uploaded movies, view
positive, negative, and neutral sentiment reviews, view rating results, view like and
dislike results, view remote users, view movie reviews, view trending movies, view
recommended movies, view fake reviews and ratings, and log out.
Remote User
In this module, the user registers and then logs in to perform operations such as viewing
all added movies, viewing all movie reviews, viewing trending movies, viewing their
profile, viewing all recommended movies, and logging out.
SYSTEM ARCHITECTURE
UML DIAGRAMS
Class Diagram
USE CASE DIAGRAM FOR REMOTE USER
Actor: Remote User
• Register & Login
• View all Added Movies
• View all Movies Reviews
• View Trending Movies
• View Your Profile
• View all Movies Recommended
• Log out
USE CASE DIAGRAM FOR SERVICE PROVIDER
Actor: Service Provider
• Login
• Add Movies
• View Uploaded Movies
• View Positive / Negative / Neutral Sentiment Reviews
• View Rating Results
• Dislike / Like Results
• View Remote Users
• View Movie Reviews
• View Trending Movies
• View Movies Recommended
• View Fake Reviews / Rating
• Log out
Sequence diagram
Participants: Service Provider, Server, Remote User
Service Provider: Login
Remote User: Register & Login
1. Add movies
2. View uploaded movies
3. View positive / negative / neutral sentiment reviews
4. View rating results
5. Dislike / Likes results
6. View remote users
7. View movie reviews
8. View trending movies
9. View movies recommended
10. View fake reviews / rating
11. View all added movies
12. View all movies reviews
Collaboration diagram
Objects: Service Provider, Server, Remote User
Remote User and Server:
• View all Added Movies
• View all Movies Reviews
• View Trending Movies
• View Your Profile
• View all Movies Recommended
Service Provider and Server:
• Add Movies
• View Uploaded Movies
• View Positive / Negative / Neutral Sentiment Reviews
• View Rating Results
• Dislikes / Likes Results
• View Remote Users
• View Movies Reviews
• View Trending Movies
• View Movies Recommended
• View Fake Reviews / Rating
Activity diagram for Remote User
Register & Login
Check: Valid / Invalid
• View all Added Movies
• View all Movies Reviews
• View Trending Movies
• View Your Profile
• View all Movies Recommended
• Log out
Activity diagram for Service Provider
Login
Check: Valid / Invalid
• Add Movies
• View Uploaded Movies
• View Positive / Negative / Neutral Sentiment Reviews
• View Rating Results
• Dislike / Likes Results
• View Remote Users
• View Movie Reviews
• View Trending Movies
• View Movies Recommended
• View Fake Reviews / Rating
• Log out
Deployment diagram
Nodes: admin, server, user
SYSTEM TESTING
The common view of testing held by users is that it is performed to prove that there
are no errors in a program. This is extremely difficult, since the designer cannot prove
the program to be one hundred percent accurate.
Testing requires a focus on basic critical factors:
• Planning
• Project and process control
• Risk management
• Organization and professionalism
• Inspections
• Measurement tools
• Level of Testing
The term end-to-end testing is also used in many organizations and tends to refer to a
combination of systems testing and systems integration testing. In some organizations, the
term systems testing is used interchangeably with end-to-end testing.
• Test Plan
Before testing, first decide on the type of testing to be carried out. The
following factors are taken into consideration:
To ensure that information flows properly into and out of the program.
To check whether the local data structures maintain their integrity during all steps of an
algorithm's execution.
To ensure that the module operates properly at the boundaries established to limit or
restrict processing.
To find out whether error-handling paths work correctly.
To find out whether values are correctly updated, and to check validation.
Objectives of Testing
• Testing is done to ensure that:
No bugs occur in future usage of the application.
The quality assurance standard is achieved.
Symptoms caused by bugs are discovered and clearly diagnosed so that bugs can be
easily prevented.
Test Case Design Techniques
During testing, the program under test is executed with a set of test cases, and the output
of the program for each test case is evaluated to determine whether the program is
performing as expected. To accomplish this objective, the following test case design
techniques are used:
• Unit Testing
• Integration Testing
• User Acceptance Testing
• Output Testing
• Validation Testing