ConvGRUText: A Deep Learning
Method for Fake Text Detection on
Online Social Media
Gaurav Sarin [1,2] & Pradeep Kumar [2]
[1] Delhi School of Business, VIPS-TC
[2] Indian Institute of Management, Lucknow
India
Presentation Outline
Introduction
Related Research
Research Problem and Objectives
Research Methodology
Proposed Solution
Experimental Setup and Results
Conclusions
Future Research Directions and Limitations
Q&As
Introduction
Deception detection is treated as a linguistic problem, independent of the source of the text (Saquete, E. et al., 2020)
Fake news detection continues to receive growing attention from research communities and industry professionals (Alkhodair, S. A. et al., 2019)
The widespread transmission of fake news has a negative impact on both individuals and society (Zhang, X. et al., 2019)
Uncovering fake reviews on social media is crucial and technically challenging
Human readers cannot reliably distinguish true text from false text, so an automated system is needed
Related Research
Amazon continues to sue individuals and companies that offer to post untruthful reviews on behalf of sellers as a paid service (Kieler et al., 2016)
A key obstacle to applying classification methods to fake text detection is the dearth of ground truth datasets (Aghakhani et al., 2015)
Rubin et al. (2015) stated that there are different kinds of fake news, each with potential textual indicators
Ott et al. (2014) designed a multi-domain dataset by asking Turkers to write fake reviews
Qazvinian et al. (2011) used unigram, bigram and part-of-speech (PoS) features of tweets to detect rumours
Liu et al. (2006) described text classification as the problem of constructing models that can classify new documents into pre-defined classes
Related Research …
Ren et al. (2017) stated that a Convolutional Neural Network (CNN) provides the best representation for sentences and a Gated Recurrent Unit (GRU) provides the best representation for documents
As per Jurafsky et al. (2015), sequential models such as RNN and LSTM can also be used for semantic composition
Deep learning methods such as CNN and RNN (LeCun et al., 2015) have achieved great success in image recognition, NLP tasks (Yih et al., 2014), sentence modelling (Kalchbrenner et al., 2014) and other tasks (Collobert et al., 2011)
Related Research …
The “Social Distance Theory” postulated by DePaulo et al. (1996) states that social disapproval of deception and its negative psychological effect on deceivers lead deceivers to distance themselves from the deceit and from the people they attempt to deceive
“Deceptive Communication” involves deliberately hiding the truth by either exclusion or authority (Ekman and Friesen, 1969); it is a message consciously shared by the sender to create a false notion or conclusion in the receiver. The “Interpersonal Deception Theory” (Buller et al., 1996) reinforces this notion, explaining how the deceiver strategically manages communication behaviour in response to the feelings and suspicion of the receivers
The “Media Richness Theory” states that deceptive actors are most likely to use communication media that offer several cues, the option to customize messages and quick feedback (synchronous and immediate). These features help them strengthen the non-committal character of the message and adjust their deceptive communication strategy in real time to obscure their deceptive purpose (Daft et al., 1987)
Research Problem and Objectives
Research Problem
For a given text from online sources such as news, reviews, posts, tweets or blogs, the task is to determine whether it is fake or real using a hybrid deep learning method in the online social media space
Research Objectives
To design a Deep Neural Network (DNN) based hybrid model for fake text identification
To check whether the DNN model outperforms machine learning models
To identify the best text representation methods and their impact on classification metrics
Research Methodology
Text pre-processing was performed (stemming and removal of stop words, punctuation, numbers and special characters)
A quantitative approach was used, with three real-world datasets
A novel state-of-the-art deep learning model, ConvGRUText, was designed
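The pre-processing steps listed on this slide can be sketched roughly as follows. This is a minimal illustration only: the stop-word list and the suffix-stripping `naive_stem` helper are hypothetical stand-ins, since the slides do not say which stop-word list or stemmer was used.

```python
import re

# Tiny illustrative stop-word list (an assumption; the slides do not name one).
STOP_WORDS = {"a", "an", "and", "are", "is", "of", "on", "the", "this", "to", "was"}

def naive_stem(token):
    """Strip a few common English suffixes; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token

def preprocess(text):
    """Lowercase, drop punctuation, numbers and special characters,
    remove stop words, then stem the remaining tokens."""
    letters_only = re.sub(r"[^a-z\s]", " ", text.lower())
    tokens = letters_only.split()
    return [naive_stem(t) for t in tokens if t not in STOP_WORDS]

print(preprocess("BREAKING: 3 senators leaked the reports on 12/05!!!"))
# ['break', 'senator', 'leak', 'report']
```

A production pipeline would more likely use an established stemmer and stop-word list; the point here is only the order of the steps.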
Proposed Solution (Process View)
Stages: Text Collection, Text pre-processing and representation, Data Mining, Knowledge Discovery
[Process diagram labels: Collect text from multiple sources; Feature Engineering; Data Representation; Sentence; Uni-Gram; N-Gram; Document; Term Frequency; Term Frequency-Inverse Document Frequency; Boolean; Validation using ground truth dataset]
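The three data representations named in the process view (Boolean, Term Frequency and TF-IDF), along with n-gram features, can be illustrated with a small sketch. The function names are hypothetical, raw counts are assumed for term frequency, and idf = ln(N / df) is assumed for the inverse document frequency; the slides do not state which variants were used.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-grams of a token list, joined with spaces (n=1 gives unigrams)."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_vocab(docs):
    """Sorted list of distinct terms across all tokenized documents."""
    return sorted({term for doc in docs for term in doc})

def boolean_vector(doc, vocab):
    """1 if the term occurs in the document, else 0."""
    present = set(doc)
    return [1 if term in present else 0 for term in vocab]

def tf_vector(doc, vocab):
    """Raw term counts (one common definition of term frequency)."""
    counts = Counter(doc)
    return [counts[term] for term in vocab]

def tfidf_vectors(docs, vocab):
    """TF-IDF with idf = ln(N / df); smoothed variants also exist."""
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    return [[tf * idf[term] for tf, term in zip(tf_vector(doc, vocab), vocab)]
            for doc in docs]

docs = [["fake", "news", "fake"], ["real", "news"]]
vocab = build_vocab(docs)                 # ['fake', 'news', 'real']
print(boolean_vector(docs[0], vocab))     # [1, 1, 0]
print(tf_vector(docs[0], vocab))          # [2, 1, 0]
print(ngrams(docs[0], 2))                 # ['fake news', 'news fake']
```

With this idf definition, a term that appears in every document (here "news") gets weight zero, which is what distinguishes TF-IDF from the plain TF representation.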
Proposed Solution (Model View)
ConvGRUText Network Diagram
Data Inputs → CNN Layer 1 → CNN Layer 2 [ReLU Activation] → GRU Layer 1 [ReLU Activation] → Text Classification (Fake or Real) [Sigmoid Activation]
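To make the data flow of the network diagram concrete, here is a dependency-free forward pass through two convolution layers, one GRU layer and a sigmoid output. It is a sketch under stated assumptions, not the authors' implementation: the slides give only the layer order and activations, so the kernel width, channel counts, hidden size, random untrained weights, omitted biases and the choice to treat the input vector as a one-channel sequence are all hypothetical.

```python
import math
import random

random.seed(0)

def rand_matrix(rows, cols):
    """Random weights in [-0.5, 0.5); stands in for trained parameters."""
    return [[random.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]

def relu(v):
    return [max(0.0, x) for x in v]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def conv1d_relu(seq, weights, kernel):
    """Valid 1-D convolution along the sequence, followed by ReLU.
    seq: list of T vectors; weights: out_channels x (kernel * in_channels)."""
    out = []
    for t in range(len(seq) - kernel + 1):
        window = [x for step in seq[t:t + kernel] for x in step]  # flatten the window
        out.append(relu(matvec(weights, window)))
    return out

def gru_last_state(seq, wz, wr, wh, hidden):
    """Run a GRU over seq and return the final hidden state.
    Each weight matrix acts on the concatenation [x_t, h_prev]."""
    h = [0.0] * hidden
    for x in seq:
        z = [sigmoid(a) for a in matvec(wz, x + h)]   # update gate
        r = [sigmoid(a) for a in matvec(wr, x + h)]   # reset gate
        gated = [ri * hi for ri, hi in zip(r, h)]
        cand = relu(matvec(wh, x + gated))            # ReLU candidate, as on the slide
        h = [(1 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h, cand)]
    return h

def conv_gru_forward(features, p):
    seq = [[float(x)] for x in features]              # one-channel sequence (assumption)
    seq = conv1d_relu(seq, p["conv1"], p["kernel"])   # CNN Layer 1
    seq = conv1d_relu(seq, p["conv2"], p["kernel"])   # CNN Layer 2
    h = gru_last_state(seq, p["wz"], p["wr"], p["wh"], p["hidden"])  # GRU Layer 1
    logit = sum(w * x for w, x in zip(p["out"], h))
    return sigmoid(logit)                             # score in (0, 1) for the binary label

params = {
    "kernel": 3, "hidden": 5,
    "conv1": rand_matrix(4, 3),    # 4 filters over 3 steps x 1 channel
    "conv2": rand_matrix(4, 12),   # 4 filters over 3 steps x 4 channels
    "wz": rand_matrix(5, 9), "wr": rand_matrix(5, 9), "wh": rand_matrix(5, 9),
    "out": rand_matrix(1, 5)[0],
}
score = conv_gru_forward([1, 0, 2, 0, 0, 1, 3, 0, 1, 0], params)
```

In practice a deep learning framework would supply these layers and the training loop; the sketch only shows how each stage's output feeds the next.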
Experimental Setup and Results
Datasets: https://www.kaggle.com/jruvika/fake-news-detection, https://www.kaggle.com/rtatman/deceptive-opinion-spam-corpus (M. Ott et al., 2014) and https://www.kaggle.com/c/nlp-getting-started
The first has 4000 records, the second 1600 records and the third 7500 records. Two attributes were used in the data analysis: the text (news, review or tweet) and its label (fake or real)
Confusion matrices were created and compared for fake news, reviews and tweets under the different text representations
Different models and classification metrics were compared for fake news, reviews and tweets
Receiver Operating Characteristic (ROC) curves were created for fake news, reviews and tweets
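The metrics reported in the results tables all follow from the four confusion-matrix counts. A short sketch, using made-up counts and assuming the positive class is the one the recall and precision columns refer to (the slides do not say which label is treated as positive):

```python
def classification_metrics(tp, fp, tn, fn):
    """Derive the tabulated metrics from confusion-matrix counts."""
    recall = tp / (tp + fn)                       # true positive rate (sensitivity)
    precision = tp / (tp + fp)
    specificity = tn / (tn + fp)                  # true negative rate
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "recall": recall,
        "precision": precision,
        "f1": f1,
        "specificity": specificity,
    }

# Illustrative counts only, not taken from the experiments.
m = classification_metrics(tp=80, fp=10, tn=90, fn=20)
print(m["accuracy"], m["recall"], m["specificity"])  # 0.85 0.8 0.9
```

Two identities are useful when reading the tables: F1 is the harmonic mean of precision and recall, and accuracy is a class-weighted average of recall and specificity, so it always lies between those two values.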
Results (News TF Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
Naïve Bayes (NB) 0.7466 0.8021 0.6070 0.6910 0.7162 0.7379
Decision Tree (DT) 0.8989 0.9058 0.8743 0.8897 0.8932 0.8974
Logistic Regression (LR) 0.9076 0.9360 0.8610 0.8969 0.8862 0.9047
Random Forest (RF) 0.9262 0.9070 0.9385 0.9224 0.9443 0.927
Support Vector Machine (SVM) 0.9326 0.9545 0.8984 0.9256 0.9154 0.9305
Artificial Neural Network (ANN) 0.9596 0.9648 0.9558 0.9602 0.9543 0.9597
ConvGRUText (Highest) 0.9761 0.9781 0.9701 0.9741 0.9744 0.9762
Results (News TF-IDF Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
NB 0.8826 0.8784 0.8690 0.8736 0.8863 0.8818
DT 0.9076 0.9167 0.8824 0.8992 0.9002 0.9060
LR 0.9201 0.9282 0.8984 0.9130 0.9134 0.9188
RF 0.9451 0.9365 0.9465 0.9414 0.9527 0.9452
SVM 0.9476 0.9611 0.9251 0.9427 0.9365 0.9462
ANN 0.962 0.9671 0.9581 0.9625 0.9567 0.9621
ConvGRUText (Highest) 0.9811 0.9918 0.968 0.9797 0.972 0.9819
Results (News Boolean Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
NB 0.7481 0.7893 0.6293 0.7 0.7237 0.7409
DT 0.8953 0.9008 0.872 0.8861 0.8907 0.8938
LR 0.9076 0.936 0.861 0.8969 0.8862 0.9047
RF 0.9262 0.907 0.9385 0.9224 0.9443 0.92
SVM 0.9326 0.9545 0.8984 0.9256 0.9154 0.9305
ANN 0.9561 0.9781 0.9349 0.956 0.935 0.9565
ConvGRUText (Highest) 0.9899 0.9945 0.9837 0.9891 0.986 0.9902
Results (Reviews TF Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
RF 0.7031 0.7152 0.675 0.6945 0.6923 0.7031
DT 0.7406 0.7692 0.6875 0.726 0.7175 0.7406
NB 0.7875 0.7614 0.8375 0.7976 0.8194 0.7875
ANN 0.7934 0.8011 0.8057 0.8033 0.7848 0.7928
LR 0.8281 0.8302 0.825 0.8275 0.8261 0.8281
SVM 0.8281 0.8344 0.8187 0.8264 0.8221 0.8281
ConvGRUText (Highest) 0.8375 0.822 0.8535 0.8375 0.8535 0.8377
Results (Reviews TF-IDF Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
DT 0.7031 0.7019 0.7063 0.704 0.7044 0.7031
RF 0.7531 0.7341 0.7938 0.7627 0.7755 0.7531
ANN 0.8054 0.8235 0.8 0.8115 0.7866 0.8057
NB 0.8094 0.8037 0.8188 0.8111 0.8153 0.8094
SVM 0.8188 0.811 0.8312 0.8209 0.8269 0.8188
LR 0.8344 0.8365 0.8313 0.8338 0.8323 0.8344
ConvGRUText (Highest) 0.8562 0.8527 0.8633 0.858 0.8598 0.8563
Results (Reviews Boolean Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
RF 0.7031 0.7152 0.675 0.6945 0.6923 0.7031
DT 0.7406 0.7692 0.6875 0.726 0.7175 0.7406
ANN 0.8084 0.8323 0.7943 0.8128 0.7844 0.8091
NB 0.7875 0.7614 0.8375 0.7976 0.8194 0.7875
SVM 0.8281 0.8344 0.8187 0.8264 0.8221 0.8281
LR 0.8281 0.8302 0.825 0.8275 0.8261 0.8281
ConvGRUText (Highest) 0.9593 0.8466 0.8734 0.8598 0.8726 0.8596
Results (Tweets TF Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
DT 0.6767 0.8063 0.3187 0.4568 0.6504 0.6309
SVM 0.714 0.723 0.5344 0.6145 0.7098 0.691
ANN 0.7176 0.7106 0.5478 0.6186 0.721 0.6937
NB 0.7073 0.6798 0.5938 0.6338 0.7273 0.6928
LR 0.722 0.7165 0.5766 0.6389 0.7249 0.7034
RF 0.724 0.7018 0.6141 0.655 0.7372 0.7099
ConvGRUText (Highest) 0.77 0.7251 0.7354 0.7302 0.7953 0.7644
Results (Tweets TF-IDF Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
DT 0.6953 0.7798 0.3984 0.5273 0.6718 0.6574
ANN 0.7151 0.7126 0.5141 0.5972 0.7163 0.6896
RF 0.7133 0.6923 0.5906 0.6374 0.7254 0.6976
NB 0.6353 0.5525 0.7641 0.6412 0.7545 0.6518
SVM 0.7233 0.7176 0.5797 0.6413 0.7263 0.705
LR 0.7333 0.7308 0.5938 0.6552 0.7347 0.7155
ConvGRUText (Highest) 0.7613 0.694 0.7351 0.714 0.7791 0.753
Results (Tweets Boolean Models)
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
DT 0.6873 0.7422 0.4094 0.5277 0.6704 0.6518
SVM 0.714 0.7230 0.5344 0.6145 0.7098 0.691
NB 0.694 0.6574 0.5906 0.6222 0.7186 0.6808
LR 0.722 0.7165 0.5766 0.6389 0.7249 0.7034
ANN 0.7291 0.7094 0.5964 0.648 0.7397 0.7104
RF 0.724 0.7018 0.6141 0.655 0.7372 0.7099
ConvGRUText (Highest) 0.772 0.6801 0.763 0.7192 0.7775 0.7606
ROC Curves (News ConvGRUText: a. TF, b. TF-IDF, c. Boolean)
[Figure: ROC curve panels a, b and c]
ROC Curves (Reviews ConvGRUText: a. TF, b. TF-IDF, c. Boolean)
[Figure: ROC curve panels a, b and c]
ROC Curves (Tweets ConvGRUText: a. TF, b. TF-IDF, c. Boolean)
[Figure: ROC curve panels a, b and c]
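For readers unfamiliar with how an ROC curve and its AUC are obtained from classifier scores, the following sketch sweeps a threshold over the predicted scores and integrates with the trapezoidal rule. The labels and scores are invented for illustration and do not come from the experiments.

```python
def roc_points(labels, scores):
    """Return (false positive rate, true positive rate) points obtained by
    lowering the decision threshold through the sorted scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    tp = fp = 0
    i = 0
    while i < len(ranked):
        j = i
        # Process tied scores together so ties form one diagonal segment.
        while j < len(ranked) and ranked[j][0] == ranked[i][0]:
            tp += ranked[j][1]
            fp += 1 - ranked[j][1]
            j += 1
        points.append((fp / neg, tp / pos))
        i = j
    return points

def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(points, points[1:]))

labels = [1, 1, 0, 1, 0, 0]               # 1 = positive class
scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]   # hypothetical model outputs
curve = roc_points(labels, scores)
print(round(auc(curve), 4))               # 0.8889
```

The value 8/9 equals the fraction of positive-negative pairs in which the positive example receives the higher score, which is the usual probabilistic reading of AUC.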
News Dataset Comparison
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
ConvGRUText TF 0.9761 0.9781 0.9701 0.9741 0.9744 0.9762
ConvGRUText TF-IDF 0.9811 0.9918 0.968 0.9797 0.972 0.9819
ConvGRUText Boolean 0.9899 0.9945 0.9837 0.9891 0.986 0.9902
Reviews Dataset Comparison
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
ConvGRUText TF 0.8375 0.822 0.8535 0.8375 0.8535 0.8377
ConvGRUText TF-IDF 0.8562 0.8527 0.8633 0.858 0.8598 0.8563
ConvGRUText Boolean 0.9593 0.8466 0.8734 0.8598 0.8726 0.8596
Tweets Dataset Comparison
Metrics/Models Accuracy Recall Precision F1-Score Specificity AUC
ConvGRUText TF 0.77 0.7251 0.7354 0.7302 0.7953 0.7644
ConvGRUText TF-IDF 0.7613 0.694 0.7351 0.714 0.7791 0.753
ConvGRUText Boolean 0.772 0.6801 0.763 0.7192 0.7775 0.7606
Conclusions
Online businesses rely on machine learning models to determine whether a text is fake or real. They can use the proposed hybrid deep learning model for fake text identification
Government regulation is required to curb the spread of fake texts. This will need appropriate public policy changes
If fake texts are controlled, market fluctuations will be reduced, leading to a more stable economic environment
Data analytics is the enabler for solving the critical business problem of distinguishing fake from real texts with high predictive accuracy, thereby assisting policy makers, businesses, communities, consumers and society
Future Research Directions and Limitations
Apply the developed hybrid model to other domains to further test its generalizability
Use a larger dataset to test the scalability of the model
Word embeddings such as GloVe, FastText or Word2Vec could have been used for data representation
Questions and Answers (10 mins)