This document presents a project that aims to detect fake online reviews using semi-supervised and supervised learning techniques. The problem definition is the detection of fake reviews. The proposed system generates feature vectors from reviews and classifies them with algorithms such as Naive Bayes. The system has two modules, one for service providers and one for remote users. UML diagrams (use case, sequence, collaboration, and activity diagrams) model the system. Testing strategies such as unit testing and integration testing are also discussed.
1. Detection of fake online reviews using semi-supervised and supervised learning
By
G.Manoj Kumar
(20BF1F0033)
Under the guidance of
P.LOKESH KUMAR REDDY
Assistant Professor
DEPARTMENT OF COMPUTER APPLICATION
SRI VENKATESWARA COLLEGE OF ENGINEERING
Karakambadi Road, TIRUPATI – 517507
2020– 2022
MCA IV Semester Project II Review Presentation
2. Contents
ABSTRACT
INTRODUCTION
PROBLEM DEFINITION
EXISTING SYSTEM
DISADVANTAGES OF EXISTING SYSTEM
PROPOSED SYSTEM
ADVANTAGES OF PROPOSED SYSTEM
4. ABSTRACT
Online reviews have a great impact on today’s business and commerce.
Purchase decisions for online products depend mostly on reviews
given by users. Hence, opportunistic individuals or groups try to
manipulate product reviews for their own interests. This project introduces
some semi-supervised and supervised text mining models to detect fake
online reviews and compares the efficiency of both techniques on a
dataset containing hotel reviews.
5. INTRODUCTION
Technologies are evolving swiftly. Old innovations are constantly being replaced
by modern and emerging technologies.
These emerging innovations allow individuals to carry out their work effectively. The
online marketplace is one such technical advancement.
Through online portals, people can shop and make reservations. Before consuming
those goods or services, almost every one of us seeks out feedback in the form of reviews.
Reviews also have a huge influence on the advertising and marketing of goods and
services. With the spread of the online marketplace, fake online reviews are becoming an
increasingly serious concern.
To market their own goods, people may create fake reviews that harm
genuine consumers.
Researchers have been exploring several ways to recognise these fake online reviews.
Some methods are based on the content of the review, and some are based on the
behaviour of the user who posts it.
6. PROBLEM DEFINITION
This project introduces some semi-supervised and supervised text mining models to
detect fake online reviews and compares the efficiency of both techniques on a
dataset containing hotel reviews.
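The slides do not say which semi-supervised algorithm the project uses. As an illustration only, the sketch below shows self-training, one common semi-supervised scheme: a small Naive Bayes model is trained on the labeled reviews, confident predictions on unlabeled reviews are added as pseudo-labels, and the model is retrained. The function names, the 0.75 confidence threshold, and the toy reviews are all assumptions made for this example.

```python
import math
from collections import Counter


def train(docs, labels):
    """Fit per-class word counts for a tiny multinomial Naive Bayes model."""
    counts = {label: Counter() for label in set(labels)}
    for tokens, label in zip(docs, labels):
        counts[label].update(tokens)
    vocab = {word for c in counts.values() for word in c}
    priors = Counter(labels)
    return counts, vocab, priors


def predict_proba(model, tokens):
    """Return {label: probability}, using add-one smoothing."""
    counts, vocab, priors = model
    total = sum(priors.values())
    logs = {}
    for label, word_counts in counts.items():
        denom = sum(word_counts.values()) + len(vocab)
        logs[label] = math.log(priors[label] / total) + sum(
            math.log((word_counts[t] + 1) / denom) for t in tokens if t in vocab
        )
    # Normalise the log scores into probabilities.
    top = max(logs.values())
    exp = {label: math.exp(value - top) for label, value in logs.items()}
    norm = sum(exp.values())
    return {label: value / norm for label, value in exp.items()}


def self_train(labeled_docs, labels, unlabeled_docs, threshold=0.75, max_rounds=5):
    """Self-training: repeatedly pseudo-label confident unlabeled reviews."""
    docs, labs, pool = list(labeled_docs), list(labels), list(unlabeled_docs)
    for _ in range(max_rounds):
        model = train(docs, labs)
        remaining = []
        for tokens in pool:
            probs = predict_proba(model, tokens)
            best = max(probs, key=probs.get)
            if probs[best] >= threshold:
                docs.append(tokens)      # accept the pseudo-label
                labs.append(best)
            else:
                remaining.append(tokens)  # still too uncertain
        if len(remaining) == len(pool):
            break                         # nothing new was labeled
        pool = remaining
    return train(docs, labs), labs


# Toy, made-up data: two labeled reviews and three unlabeled ones.
labeled = [["best", "hotel", "ever"], ["room", "was", "clean"]]
labels = ["fake", "genuine"]
unlabeled = [["best", "stay", "ever"], ["clean", "room", "small"], ["stay", "small"]]

final_model, final_labels = self_train(labeled, labels, unlabeled)
print(final_labels)
```

A purely supervised baseline corresponds to calling train on the labeled reviews alone, which gives a simple way to compare the two settings on the same data.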
7. EXISTING SYSTEM
Content-based methods focus on the content of the review, that is, the
text of the review or what is said in it. Heydari et al. attempted to detect
spam reviews by analyzing the linguistic features of the review.
Ott et al. used three techniques to perform classification: genre identification,
detection of psycholinguistic deception, and text categorization.
Behavior-feature-based studies focus on the reviewer, including the
characteristics of the person who is giving the review.
Lim et al. addressed the problem of review spammer detection, that is, finding users
who are the source of spam reviews. People who intentionally post fake reviews
behave significantly differently from normal users.
8. DISADVANTAGES OF EXISTING SYSTEM
The existing system uses only semi-supervised learning.
It only classifies text by sentiment and never identifies fake reviews.
9. PROPOSED SYSTEM
In the proposed system, each review first goes through a tokenization process. Then,
unnecessary words are removed and candidate feature words are generated.
Each candidate feature word is checked against the dictionary. If an entry for it is
available in the dictionary, its frequency is counted and added to the column of
the feature vector that corresponds to the numeric map of the word.
Along with the frequency counts, the length of the review is measured and
added to the feature vector.
Finally, the sentiment score, which is available in the data set, is added to the feature
vector. Negative sentiment is assigned a value of zero and positive sentiment
is assigned some positive value in the feature vector.
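The feature-vector steps described above can be sketched in Python as follows. This is a minimal illustration, not the project's code: the stop-word list, the dictionary, the use of token count as the review length, and the value 1 for positive sentiment are all assumptions, since the slides do not give them.

```python
import re

# Hypothetical stop-word list and feature dictionary (word -> column index).
STOP_WORDS = {"the", "a", "an", "is", "was", "and", "of", "to", "in", "it"}
DICTIONARY = {"clean": 0, "dirty": 1, "staff": 2, "amazing": 3, "room": 4}


def build_feature_vector(review, sentiment_is_positive):
    """Return dictionary word counts, review length, and sentiment as one list."""
    # Step 1: tokenization (lowercase, keep runs of letters and apostrophes).
    tokens = re.findall(r"[a-z']+", review.lower())
    # Step 2: remove unnecessary words to get the candidate feature words.
    candidates = [t for t in tokens if t not in STOP_WORDS]
    # Step 3: count every candidate that has an entry in the dictionary.
    vector = [0] * len(DICTIONARY)
    for word in candidates:
        index = DICTIONARY.get(word)
        if index is not None:
            vector[index] += 1
    # Step 4: append the review length (assumed here to be the token count).
    vector.append(len(tokens))
    # Step 5: append the sentiment: 0 for negative, 1 (assumed) for positive.
    vector.append(1 if sentiment_is_positive else 0)
    return vector


features = build_feature_vector(
    "The room was clean and the staff was amazing, amazing!", True
)
print(features)  # [1, 0, 1, 2, 1, 10, 1]
```

The first five columns are the counts for the dictionary words, followed by the review length and the sentiment value.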
10. ADVANTAGES OF PROPOSED SYSTEM
Detect Unseen Attacks
Low False Positive rates
Support Large Datasets
Able to handle imbalanced Dataset
12. SOFTWARE REQUIREMENTS
Operating System - Windows 10
Server - XAMPP
Front End - HTML, CSS, JS
Back end - Python
Database - MySQL
13. ALGORITHMS
The following are some of the algorithms used in this project:
1. Naïve Bayes Classifier
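As a rough illustration of how a Naïve Bayes classifier can label a review, the sketch below implements multinomial Naive Bayes with add-one (Laplace) smoothing in plain Python. The slides do not state which Naive Bayes variant or library the project uses, so the class name, the smoothing choice, and the toy training reviews are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)       # documents per class
        self.word_counts = defaultdict(Counter)   # word counts per class
        self.vocabulary = set()
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def log_scores(self, tokens):
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        scores = {}
        for label, doc_count in self.class_counts.items():
            # Start from the log prior, log P(class).
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                if token not in self.vocabulary:
                    continue  # ignore words never seen in training
                count = self.word_counts[label][token]
                # Add the smoothed log likelihood, log P(word | class).
                score += math.log((count + 1) / (total_words + vocab_size))
            scores[label] = score
        return scores

    def predict(self, tokens):
        scores = self.log_scores(tokens)
        return max(scores, key=scores.get)


# Toy, made-up training reviews (already tokenized).
train_docs = [
    ["best", "hotel", "ever", "amazing", "amazing"],
    ["perfect", "stay", "best", "ever"],
    ["room", "was", "clean", "staff", "helpful"],
    ["breakfast", "was", "cold", "room", "small"],
]
train_labels = ["fake", "fake", "genuine", "genuine"]

model = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(model.predict(["amazing", "best", "stay"]))  # fake
print(model.predict(["room", "was", "small"]))     # genuine
```

Working in log space avoids numeric underflow when many word probabilities are multiplied together.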
14. MODULES
The proposed system has two modules:
1. Service Provider
2. Remote User
15. MODULES DESCRIPTION
This project has two modules:
Service Provider
In this module, the service provider can log in, add movies, view uploaded movies, view positive,
negative, and neutral sentiment reviews, view rating results, view dislike/like results,
view remote users, view movie reviews, view trending movies, view recommended
movies, view fake reviews/ratings, and log out.
Remote User
In this module, the user registers, logs in, and can then view all
added movies, view all movie reviews, view trending movies, view their profile, view
all recommended movies, and log out.
18. USE CASE DIAGRAM FOR REMOTE USER
UML DIAGRAMS
Actor: Remote User
Register & Login
View All Added Movies
View All Movies Reviews
View Trending Movies
View Your Profile
View All Movies Recommended
Log out
19. USE CASE DIAGRAM FOR SERVICE PROVIDER
Actor: Service Provider
Login
Add Movies
View Uploaded Movies
View Positive / Negative / Neutral Sentiment Reviews
View Rating Results
Dislike / Like Results
View Remote Users
View Movie Reviews
View Trending Movies
View Movies Recommended
View Fake Reviews / Rating
Log out
20. Sequence diagram
Lifelines: Service Provider, Server, Remote User
Service Provider to Server:
Login
1. Add movies
2. View uploaded movies
3. View positive / negative / neutral sentiment reviews
4. View rating results
5. Dislike / like results
6. View remote users
7. View movie reviews
8. View trending movies
9. View movies recommended
10. View fake reviews / rating
Remote User to Server:
Register & Login
11. View all added movies
12. View all movies reviews
21. Collaboration diagram
Objects: Service Provider, Server, Remote User
Remote User to Server:
View all added movies
View all movies reviews
View trending movies
View your profile
View all movies recommended
Service Provider to Server:
Add movies
View uploaded movies
View positive / negative / neutral sentiment reviews
View rating results
Dislike / like results
View remote users
View movie reviews
View trending movies
View movies recommended
View fake reviews / rating
22. ACTIVITY diagram For Remote User
Register & Login
Check: Valid / Invalid
View all Added Movies
View all Movies Reviews
View Trending Movies
View Your Profile
View all Movies Recommended
Log out
23. Activity diagram For Service Provider
Login
Check: Valid / Invalid
Add Movies
View Uploaded Movies
View Positive / Negative / Neutral Sentiment Reviews
View Rating Results
Dislike / Like Results
View Remote Users
View Movie Reviews
View Trending Movies
View Movies Recommended
View Fake Reviews / Rating
Log out
25. SYSTEM TESTING
The common view of testing held by users is that it is performed to prove that there
are no errors in a program. This is extremely difficult, since the designer cannot prove the program to be one
hundred percent accurate.
Effective testing requires a focus on these basic critical factors:
Planning
Project and process control
Risk management
Organization and professionalism
Inspections
Measurement tools
26. Level of Testing
The term end-to-end testing is also used in many organizations and tends to refer to a
combination of Systems Testing and Systems Integration Testing. Also, in some organizations the
term Systems Testing is used interchangeably with end-to-end testing.
Test Plan
Before testing begins, first decide on the type of testing to be carried out. The
following factors are taken into consideration:
To ensure that information flows properly into and out of the program.
To find out whether the local data structures maintain their integrity during all steps of an
algorithm's execution.
27. To ensure that the module operates properly at the boundaries established to limit or restrict
processing.
To find out whether the error-handling paths work correctly.
To find out whether values are updated correctly and validation checks are applied.
Objectives of Testing
Testing is done to ensure that:
No bugs occur in future use of the application.
The quality assurance standard is achieved.
Symptoms caused by bugs are discovered and clearly diagnosed, so that the bugs can be
easily prevented.
28. Test Case Design Techniques
During testing, the program under test is executed with a set of test cases, and the output of
the program for each test case is evaluated to determine whether the program is performing as
expected. To accomplish these objectives, the following test case design techniques are used:
Unit Testing.
Integration Testing.
User Acceptance Testing.
Output Testing.
Validation Testing.