Base paper Title: A Multilingual Spam Reviews Detection Based on Pre-Trained Word
Embedding and Weighted Swarm Support Vector Machines
Modified Title: A Multilingual Spam Reviews Detection Using Weighted Swarm Support
Vector Machines and Pre-Trained Word Embedding
Abstract
Online reviews are important information that customers seek when deciding to buy
products or services. Organizations also benefit from these reviews as essential feedback on
their products or services. Such information requires reliability, especially during the
COVID-19 pandemic, which brought a massive increase in online reviews due to quarantine and
staying at home. Not only did the number of reviews grow, but the context and preferences
of reviews also changed during the pandemic. Spam reviewers have therefore adapted to these
changes and improved their deception techniques. Spam reviews usually consist of misleading,
fake, or fraudulent reviews that aim to deceive customers for the purpose of making money or
harming competitors. Hence, this work presents a Weighted Support Vector Machine (WSVM)
combined with Harris Hawks Optimization (HHO) for spam review detection. HHO serves as the
algorithm for optimizing hyperparameters and feature weighting. Three language corpora,
namely English, Spanish, and Arabic, have been used as datasets in order to address the
multilingual problem in spam reviews. Moreover, pre-trained word embedding (BERT) has been
applied alongside three word representation methods (NGram-3, TFIDF, and One-hot encoding).
Four experiments have been conducted, each focused on solving and demonstrating different
aspects. In all experiments, the proposed approach showed excellent results compared with
other state-of-the-art algorithms. Specifically, the WSVM-HHO achieved an accuracy of
88.163%, 71.913%, 89.565%, and 84.270% for the English, Spanish, Arabic, and Multilingual
datasets, respectively. Further, a deep analysis has been conducted to investigate the context
of reviews before and after the COVID-19 situation. In addition, a new dataset with
statistical features has been created and merged with the previous textual features to
improve detection performance.
Existing System
Due to the evolution of technology, the Internet, and mobile devices, information has
become accessible to virtually anyone. Information is usually provided in various forms,
including images, sounds, videos, and text. On the internet, this information can be used to
verify, inform, and explain various things. Such information can be critical and important to
numerous individuals and organizations across different domains [1]. For example, in the
medical and especially in the pharmaceutical field, it involves gathering, processing, and
disseminating information about medications, as well as an explanation of how to safely and
correctly use them. As for economics, goods and services are produced primarily as a result of
information-intensive activities that augment scientific and technical innovation. Hence, a
society’s integration and development are determined by the flow of information within it. For
instance, [2] stated that information takes three different forms: information as a process,
information as knowledge, and information as a thing. Therefore, information may be considered
more than just data that exists on the web; it can also be regarded as a
means of communication between people. Information can also be found in reviews, which are
feedback written about customers' experience of products or services. According
to De et al. [3], reviews, or online reviews, are a form of electronic Word-Of-Mouth (eWOM),
which involves communicating information about product usage, services, or the behavior of
sellers to consumers using internet-based technology.
Drawbacks in Existing System
• Bias in Pre-Trained Word Embeddings:
Pre-trained word embeddings might carry biases from the data they were trained on.
If the embeddings are biased, the model may inherit and perpetuate these biases,
impacting the fairness and reliability of the spam detection system.
• Complexity and Resource Intensiveness:
Weighted Swarm Support Vector Machines, especially when applied to multilingual
scenarios, can be computationally intensive. Training and deploying such models may
require significant computational resources, limiting their applicability in
resource-constrained environments.
• Adaptability to Evolving Language:
Languages evolve over time as new words, phrases, and expressions emerge.
Pre-trained embeddings might struggle to capture the latest linguistic trends, making the
model less effective at identifying spam written in recently evolved language patterns.
• Imbalance in Class Distribution:
Spam and non-spam reviews may not be equally represented within or across
languages. This imbalance can impair the model's
ability to learn and generalize effectively.
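The class-imbalance drawback above is often mitigated by weighting each class inversely to its frequency, so that errors on the rarer class cost more during training. The sketch below illustrates that general idea only; it is not a step described in the source, and the function name `balanced_class_weights` and the toy labels are hypothetical.

```python
from collections import Counter


def balanced_class_weights(labels):
    """Weight each class as n_samples / (n_classes * class_count).

    Rare classes receive larger weights, frequent classes smaller ones.
    """
    counts = Counter(labels)
    n_samples = len(labels)
    n_classes = len(counts)
    return {cls: n_samples / (n_classes * cnt) for cls, cnt in counts.items()}


# Hypothetical toy labels: 8 genuine ("ham") reviews and 2 spam reviews.
labels = ["ham"] * 8 + ["spam"] * 2
weights = balanced_class_weights(labels)
print(weights)
```

A classifier that accepts per-class weights could then penalize a misclassified spam review more heavily than a misclassified genuine one.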
Proposed System
• Cuckoo and Harmony features are combined in an effective hybrid feature
selection technique, and Naive Bayes is used as the classification method for labeling
reviews as spam or ham.
• The proposed WSVM-HHO algorithm is compared with other well-known classifiers
from the literature combined with different metaheuristic algorithms.
• The proposed algorithm outperformed the other algorithms and achieved the highest
results on the benchmark functions.
• Compared with other competitive methods, the experimental results indicate
that this approach selected only the informative features and gave higher classification
accuracy.
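To make the idea of jointly optimizing hyperparameters and feature weights more concrete, the sketch below shows one plausible way a metaheuristic such as HHO could represent a candidate solution: a position vector in [0, 1] that is decoded into an SVM penalty `C`, an RBF `gamma`, and one weight per feature, with the weights applied inside the kernel. The encoding, the value ranges, and the names `decode_candidate` and `weighted_rbf_kernel` are assumptions for illustration, not taken from the base paper, and the HHO search loop itself is omitted.

```python
import math


def decode_candidate(position, n_features,
                     c_range=(0.01, 100.0), gamma_range=(1e-4, 10.0)):
    """Map a position in [0, 1]^(2 + n_features) to (C, gamma, feature_weights).

    The ranges are assumed defaults, chosen only for this illustration.
    """
    if len(position) != 2 + n_features:
        raise ValueError("position must hold C, gamma, and one weight per feature")
    clipped = [min(1.0, max(0.0, p)) for p in position]
    c = c_range[0] + clipped[0] * (c_range[1] - c_range[0])
    gamma = gamma_range[0] + clipped[1] * (gamma_range[1] - gamma_range[0])
    return c, gamma, clipped[2:]


def weighted_rbf_kernel(x, y, feature_weights, gamma):
    """RBF kernel over a feature-weighted squared distance.

    A weight of 0 removes a feature from the similarity computation.
    """
    sq_dist = sum(w * (a - b) ** 2 for w, a, b in zip(feature_weights, x, y))
    return math.exp(-gamma * sq_dist)


# Hypothetical candidate for a 3-feature problem.
position = [0.5, 0.0, 1.0, 0.0, 0.5]
c, gamma, feature_weights = decode_candidate(position, n_features=3)

# The two vectors differ only in the second feature, whose weight is 0.
k_masked = weighted_rbf_kernel([1.0, 2.0, 3.0], [1.0, 9.0, 3.0],
                               feature_weights, gamma)
print(c, gamma, feature_weights, k_masked)
```

In a full search, each candidate would be scored by training a classifier with the decoded settings and measuring validation accuracy; that fitness function is left out here.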
Algorithm
• Embedding Matrix:
Utilize pre-trained word embeddings (e.g., Word2Vec, GloVe, or FastText) for
multiple languages.
Create an embedding matrix mapping words to their respective vectors.
• Feature Extraction:
Sentence Embedding:
Convert individual reviews into fixed-size vectors by aggregating word embeddings
(e.g., averaging or pooling).
• Model Interpretability:
Explainability:
Incorporate techniques for explaining model decisions, especially in cases where
interpretability is crucial.
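The embedding-matrix and sentence-embedding steps above can be sketched as follows. This is a minimal illustration, assuming tiny hand-made 2-dimensional vectors in place of real Word2Vec, GloVe, or FastText files; the helper names `build_embedding_matrix` and `sentence_embedding` are hypothetical, and averaging is only one of the aggregation options mentioned.

```python
def build_embedding_matrix(pretrained, vocabulary, dim):
    """Return (word_to_index, matrix) with one row per vocabulary word.

    Words missing from `pretrained` get a zero vector.
    """
    word_to_index = {word: i for i, word in enumerate(vocabulary)}
    matrix = [list(pretrained.get(word, [0.0] * dim)) for word in vocabulary]
    return word_to_index, matrix


def sentence_embedding(tokens, word_to_index, matrix, dim):
    """Average the vectors of in-vocabulary tokens into one fixed-size vector."""
    rows = [matrix[word_to_index[t]] for t in tokens if t in word_to_index]
    if not rows:
        return [0.0] * dim  # no known token: fall back to a zero vector
    return [sum(column) / len(rows) for column in zip(*rows)]


# Hypothetical toy vectors standing in for a pre-trained model.
pretrained = {"great": [1.0, 0.0], "product": [0.0, 1.0], "fake": [-1.0, 1.0]}
vocabulary = ["great", "product", "fake", "unseen"]

word_to_index, matrix = build_embedding_matrix(pretrained, vocabulary, dim=2)
review_vector = sentence_embedding("great product".split(),
                                   word_to_index, matrix, dim=2)
print(review_vector)  # [0.5, 0.5]
```

Every review, regardless of length, maps to a vector of the same dimension, which is what a classifier such as an SVM expects as input.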
Advantages
• Transfer Learning Benefits:
Leveraging pre-trained word embeddings allows the model to benefit from
knowledge learned on large general corpora, improving performance, especially when
labeled data is limited.
• Interpretability:
While SVMs can be considered "black-box" models, the use of swarm intelligence
for optimization may make the model easier to interpret, since the selected
hyperparameters and feature weights can be inspected.
• Scalability:
Pre-trained word embeddings and swarm intelligence optimization contribute to a
solution that can scale to large datasets, provided sufficient computational resources
are available.
• High Accuracy in Multilingual Environments:
The combination of pre-trained embeddings and weighted SVMs enhances the
accuracy of spam detection in reviews across diverse linguistic backgrounds.
Hardware Specification
• Processor : Core i3 processor
• RAM : 4 GB
• Hard disk : 500 GB
Software Specification
• Operating System : Windows 10 / 11
• Front End : Python
• Back End : MySQL Server
• IDE Tools : PyCharm
