E-COMMERCE FAKE PRODUCT REVIEWS MONITORING AND DETECTION SYSTEM
Abstract
Online consumer reviews play an important role in helping consumers judge the quality
and authenticity of products on e-commerce platforms. However, the constant presence of fake
reviews has significantly harmed the operation and development of these platforms. In this
study, we develop a novel supervised probabilistic method to
detect fake reviews by utilizing the difference in the distribution of non-fraudulent reviews and
that of fake reviews. Specifically, we first derive the univariate distributions of several unique
features (linguistic, behavioral, and interrelationship features). We then integrate these
distributions into two mixed distributions according to their labels to represent the overall
difference between non-fraudulent reviews and fake reviews. Next, we randomly generate
synthetic review data points with different labels from the above mixed distributions. Finally,
we train a Multilayer Perceptron model by using these synthetic review data to obtain a
classifier. We tested the model in experiments on several original real-world review
datasets. The numerical results indicate that the proposed supervised method outperforms
some well-known sampling models and fake review detection methods in terms of
classification accuracy. Moreover, we extend the proposed method to handle scenarios
with small samples of raw review data. This study contributes to the literature by exploiting
the difference in the distribution of non-fraudulent reviews and that of fraudulent reviews,
which can improve the accuracy of fake review detection for online platforms.
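The sampling step described in the abstract can be illustrated with a minimal sketch. It assumes that every feature of a given label follows an independent Gaussian distribution; the abstract does not name the distribution families, so this is only a stand-in. The class name SyntheticReviewSampler, its methods, and the toy feature values are hypothetical, and training the Multilayer Perceptron on the sampled points is not shown.

```java
import java.util.Random;

/** Hypothetical sketch: fit one Gaussian per feature for a single label, then sample synthetic points. */
public class SyntheticReviewSampler {
    private final double[] mean;
    private final double[] std;

    /** rows = reviews that share one label, columns = numeric features. */
    public SyntheticReviewSampler(double[][] rows) {
        int d = rows[0].length;
        mean = new double[d];
        std = new double[d];
        for (double[] r : rows)
            for (int j = 0; j < d; j++) mean[j] += r[j] / rows.length;
        for (double[] r : rows)
            for (int j = 0; j < d; j++) std[j] += (r[j] - mean[j]) * (r[j] - mean[j]) / rows.length;
        for (int j = 0; j < d; j++) std[j] = Math.sqrt(std[j]);
    }

    /** Returns a copy of the fitted per-feature means. */
    public double[] mean() { return mean.clone(); }

    /** Draws n synthetic feature vectors, sampling each feature independently (an assumption). */
    public double[][] sample(int n, Random rng) {
        double[][] out = new double[n][mean.length];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < mean.length; j++)
                out[i][j] = mean[j] + std[j] * rng.nextGaussian();
        return out;
    }

    public static void main(String[] args) {
        // Toy features (assumed): {review length in words, reviews posted by the author that day}
        double[][] genuine = {{120, 1}, {95, 1}, {140, 2}};
        double[][] fake = {{20, 6}, {25, 8}, {15, 7}};
        Random rng = new Random(42);
        double[][] synthGenuine = new SyntheticReviewSampler(genuine).sample(1000, rng);
        double[][] synthFake = new SyntheticReviewSampler(fake).sample(1000, rng);
        System.out.println(synthGenuine.length + " genuine and " + synthFake.length + " fake synthetic points");
    }
}
```

One sampler is fitted per label, so the two synthetic sets differ only through the fitted per-label distributions, which is the difference the method sets out to exploit.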
Existing System
Detecting fake reviews on e-commerce platforms is critical for maintaining credibility
and trust among users. A supervised general mixed probability approach offers a robust method
to sift through reviews and identify potential fraudulent ones. By leveraging a combination of
machine learning algorithms and probabilistic models, this approach aims to analyze various
features within reviews to distinguish between genuine and fake content.

The system employs a supervised learning framework, utilizing a labeled dataset to train the
model. Features such as sentiment analysis, linguistic patterns, reviewer behavior, review
timing, and product information are considered to create a comprehensive feature set. These
features are then processed through a mixed probability model, which combines the strengths
of different probabilistic techniques, such as Bayesian methods or Hidden Markov Models, to
assess the likelihood of a review being authentic or deceptive.

By employing a mixed probability approach, this system can effectively handle diverse types
of fake reviews, adapting to evolving strategies used by malicious actors. Additionally,
continuous model retraining and adaptation ensure its ability to stay updated with new trends
in fraudulent review practices.

The goal of this approach is not only to accurately detect fake reviews but also to provide
e-commerce platforms with a scalable and adaptable solution to maintain the integrity of their
review systems, fostering a trustworthy environment for both consumers and businesses.
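As a rough illustration of how a Bayesian method could score the likelihood that a review is deceptive, the sketch below applies Bayes' rule with Gaussian class-conditional likelihoods and a naive independence assumption between features. This is not the system's actual mixed probability model, which the text does not specify in detail; the class name BayesReviewScorer, the parameter layout, and the toy numbers are all hypothetical.

```java
/** Hypothetical sketch: posterior probability that a review is fake, via Bayes' rule. */
public class BayesReviewScorer {
    /** Log-density of a normal distribution N(mean, std^2) at x. */
    public static double logGaussian(double x, double mean, double std) {
        double z = (x - mean) / std;
        return -0.5 * z * z - Math.log(std) - 0.5 * Math.log(2 * Math.PI);
    }

    /**
     * P(fake | x) under a naive independence assumption.
     * fakeParams[j] and genuineParams[j] hold {mean, std} for feature j.
     */
    public static double posteriorFake(double[] x, double[][] fakeParams,
                                       double[][] genuineParams, double priorFake) {
        double logFake = Math.log(priorFake);
        double logGenuine = Math.log(1.0 - priorFake);
        for (int j = 0; j < x.length; j++) {
            logFake += logGaussian(x[j], fakeParams[j][0], fakeParams[j][1]);
            logGenuine += logGaussian(x[j], genuineParams[j][0], genuineParams[j][1]);
        }
        // Subtract the larger log-score before exponentiating to avoid underflow.
        double max = Math.max(logFake, logGenuine);
        double pf = Math.exp(logFake - max);
        double pg = Math.exp(logGenuine - max);
        return pf / (pf + pg);
    }

    public static void main(String[] args) {
        // Toy parameters (assumed): {mean, std} for review length and same-day review count.
        double[][] fake = {{20, 5}, {7, 1}};
        double[][] genuine = {{120, 20}, {1, 0.5}};
        double p = posteriorFake(new double[] {25, 6}, fake, genuine, 0.3);
        System.out.printf("P(fake | review) = %.4f%n", p);
    }
}
```

Working in log space keeps the product of many small likelihoods from underflowing, which matters once the feature set grows beyond a handful of attributes.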
Drawbacks of the Existing System
Data Dependence: This approach heavily relies on labeled datasets for training.
Obtaining and maintaining a large and diverse labeled dataset can be challenging and
costly. Moreover, the model's effectiveness might decrease if the dataset doesn’t
adequately represent evolving fraudulent review tactics.
Feature Engineering Complexity: Extracting relevant features from reviews requires
sophisticated natural language processing (NLP) techniques. Designing and
engineering these features can be complex and computationally intensive. Additionally,
the model's performance heavily relies on the quality and relevance of these features.
Adaptability to New Techniques: Fraudulent review strategies evolve over time, and
new methods constantly emerge. The model might struggle to adapt quickly to these
changes, requiring frequent updates and retraining to maintain its effectiveness.
Resource Intensive: Implementing and maintaining a mixed probability approach can
be computationally demanding. This might pose challenges for smaller e-commerce
platforms with limited resources.
Proposed System
Data Collection: Description of the dataset acquisition process, emphasizing the need
for a diverse and labeled dataset.
Preprocessing and Feature Engineering: Details on data preprocessing techniques
and the selection of various features (linguistic, behavioral, temporal) for model
training.
Supervised Learning Framework: Explanation of the mixed probability approach
involving Bayesian classifiers, Hidden Markov Models, or ensemble methods.
Model Training and Evaluation: Methodology for model training, validation, and
performance evaluation using appropriate metrics.
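The evaluation step can be sketched as follows, assuming the "appropriate metrics" are the usual binary classification measures (accuracy, precision, recall, F1); the text does not name them. The class ReviewMetrics and the label encoding (1 = fake, 0 = genuine) are hypothetical choices made for illustration.

```java
/** Hypothetical sketch: accuracy, precision, recall and F1 for binary fake-review labels. */
public class ReviewMetrics {
    /** Returns {accuracy, precision, recall, f1}; label 1 = fake, 0 = genuine (assumed encoding). */
    public static double[] evaluate(int[] actual, int[] predicted) {
        int tp = 0, fp = 0, fn = 0, tn = 0;
        for (int i = 0; i < actual.length; i++) {
            if (predicted[i] == 1 && actual[i] == 1) tp++;      // fake, flagged
            else if (predicted[i] == 1) fp++;                   // genuine, wrongly flagged
            else if (actual[i] == 1) fn++;                      // fake, missed
            else tn++;                                          // genuine, passed
        }
        double accuracy = (double) (tp + tn) / actual.length;
        double precision = (tp + fp == 0) ? 0.0 : (double) tp / (tp + fp);
        double recall = (tp + fn == 0) ? 0.0 : (double) tp / (tp + fn);
        double f1 = (precision + recall == 0) ? 0.0 : 2 * precision * recall / (precision + recall);
        return new double[] {accuracy, precision, recall, f1};
    }

    public static void main(String[] args) {
        int[] actual = {1, 1, 0, 0, 1};
        int[] predicted = {1, 0, 0, 1, 1};
        double[] m = evaluate(actual, predicted);
        System.out.printf("accuracy=%.3f precision=%.3f recall=%.3f f1=%.3f%n", m[0], m[1], m[2], m[3]);
    }
}
```

Precision is the measure most closely tied to false alarms (genuine reviews flagged as fake), while recall tracks how many fake reviews slip through, so reporting both alongside accuracy gives a fuller picture on imbalanced review data.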
Algorithm
Sentiment Analysis: Using algorithms like VADER (Valence Aware Dictionary and
sEntiment Reasoner) or supervised machine learning models to determine sentiment
polarity.
NLP Techniques: Leveraging techniques like word embeddings (Word2Vec, GloVe)
or language models (BERT, GPT) for semantic understanding.
Linguistic Features: Analyzing word frequency, syntactic patterns, or grammar
structures.
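A few of the simpler linguistic features can be sketched with the Java standard library alone. The example below computes word frequencies, a type-token ratio, and a crude lexicon-based polarity score. The polarity function is a simplified stand-in and does not implement VADER; embeddings and language models such as Word2Vec or BERT need external libraries and pretrained models and are not shown. The class name LinguisticFeatures, its method names, and the tiny word lists are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: simple linguistic features extracted from one review text. */
public class LinguisticFeatures {
    /** Lower-cases the text and counts occurrences of each word token. */
    public static Map<String, Integer> wordFrequencies(String review) {
        Map<String, Integer> freq = new HashMap<>();
        for (String tok : review.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            if (!tok.isEmpty()) freq.merge(tok, 1, Integer::sum);
        }
        return freq;
    }

    /** Ratio of distinct words to total words; low values indicate repetitive wording. */
    public static double typeTokenRatio(String review) {
        Map<String, Integer> freq = wordFrequencies(review);
        int total = 0;
        for (int c : freq.values()) total += c;
        return total == 0 ? 0.0 : (double) freq.size() / total;
    }

    /** Crude polarity in [-1, 1] from word lists; a stand-in for illustration, NOT VADER. */
    public static double lexiconPolarity(String review, Set<String> positive, Set<String> negative) {
        int pos = 0, neg = 0;
        for (Map.Entry<String, Integer> e : wordFrequencies(review).entrySet()) {
            if (positive.contains(e.getKey())) pos += e.getValue();
            if (negative.contains(e.getKey())) neg += e.getValue();
        }
        return pos + neg == 0 ? 0.0 : (double) (pos - neg) / (pos + neg);
    }

    public static void main(String[] args) {
        String review = "Great phone, great price!";
        Set<String> positive = Set.of("great", "good", "excellent"); // toy lexicon (assumed)
        Set<String> negative = Set.of("bad", "poor", "terrible");    // toy lexicon (assumed)
        System.out.println("frequencies = " + wordFrequencies(review));
        System.out.println("type-token ratio = " + typeTokenRatio(review));
        System.out.println("polarity = " + lexiconPolarity(review, positive, negative));
    }
}
```

Each method returns a plain number or map, so the outputs can be placed directly into the feature vector used for model training.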
Advantages
Incorporates Various Features: Leverages linguistic, temporal, and behavioral
attributes within reviews, offering a comprehensive assessment for identifying
fraudulent patterns.
Comprehensive Feature Set: Utilizes diverse features such as sentiment analysis,
linguistic patterns, reviewer behavior, and temporal information, improving the
accuracy of detecting fake reviews.
Mixed Probability Models: Combines different probabilistic techniques, allowing the
system to adapt to emerging fraudulent review strategies over time.
Robust Classification: Considers multiple dimensions, minimizing misclassification
of genuine reviews as fake, thus reducing false alarms.
Hardware Specification
Processor        : Intel Core i3
RAM              : 4 GB
Hard Disk        : 500 GB
Software Specification
Operating System : Windows 10/11
Front End        : Java Swing
Back End         : MySQL Server
IDE Tools        : Eclipse
Browser          : Microsoft Edge