1. ConvGRUText: A Deep Learning
Method for Fake Text Detection on
Online Social Media
Gaurav Sarin [1,2] & Pradeep Kumar [2]
[1] Delhi School of Business, VIPS-TC
[2] Indian Institute of Management, Lucknow
India
3. Introduction
The issue of deception detection is tackled as linguistic one not dependent on the source utilized (Saquete, E. et
al., 2020)
Fake news detection continues to receive rising focus from research communities and industry professionals
(Alkhodair, S. A. et al., 2019)
Major transmission of fake news has negative impact on both individuals and society (Zhang, X. et al., 2019)
Uncovering fake reviews on social media is decisive and technically challenging
Human eye cannot precisely differentiate between true and false text, so need for automated system
4. Related Research
Amazon, continues to sue individuals and companies, which offer to post untruthful reviews on behalf of sellers
as a paid service (Kieler et al., 2016)
Issue with applying classification methods to detect fake text is dearth of ground truth datasets (Aghakhani et al.,
2015)
Rubin et al., 2015 stated that there are different kinds of fake news, each with potential textual indicators
Ott et al., 2014 designed a multi-domain dataset by asking Turkers to pen down fake reviews
Qazvinian et al., 2011 utilized unigram, bigram and part-of-speech (PoS) features of tweets to detect rumour
Liu et al., 2006 stated that text classification is a construction problem of models that can classify new documents
into pre-defined classes
5. Related Research …
Ren et al., 2017 stated that Convolutional Neural Network (CNN) provides best representation for sentence and
Gated Recurrent Unit (GRU) provides best representation for document
As per Jurafsky et al., 2015 the sequential models like RNN and LSTM can be utilized for semantic composition
as well
Deep Learning methods CNN and RNN (LeCun et al., 2015), have garnered great success in image recognition
and NLP jobs (Yih et al., 2014), sentence modelling (Kalchbrenner et al. 2014), and other tasks (Collobert et al.,
2011)
6. Related Research …
The “Social Distance Theory” postulated by DePaulo et al., 1996 states that social non-acceptance of
deception and psychological negative effect to the deceivers leads to deceivers’ distancing themselves from
deceit and persons they attempt to deceive
The term “Deceptive Communication” factors in deliberately hiding the truth by either exclusion or authority
(Ekman and Friesen, 1969) and is a message consciously shared by sender to serve a false notion or conclusion
by the receiver. The “Interpersonal Deception Theory” (Buller et al., 1996) boosts this notion, justifying how
deceiver critically manages the communication attitude in reply to the feelings and mistrust of the receivers
The “Media Richness Theory” states that deceptive actors may most likely utilize communication mediums,
which give several clues, an option to customize and to share quick feedback (synchronous and immediate) –
aiding them to strengthen the non-committal message characteristic and regulate deceptive communication
strategy at runtime to blur their deceptive purpose (Daft et al., 1987)
7. Research Problem and Objectives
Research Problem
For a given text coming from different online sources such as news, reviews, posts, tweets or blogs, the task is to
figure out if it is a fake or real text using deep learning hybrid method in the online social media space
Research Objectives
To design Deep Neural Network (DNN) based hybrid model for fake text identification
To check whether DNN model outclasses machine learning models
Figure out best text representations methods and their impact on classification metrics
8. Research Methodology
Text pre-processing (stemming, removal of stop words, removal of punctuations, removal of numbers and
removal of special characters) done
Quantitative approach used in this work using three real-world datasets
A state-of-art novel deep learning model, ConvGRUText was designed
9. Proposed Solution (Process View)
Text Collection
Text pre-processing and
representation
Data Mining
Knowledge Discovery
11. ConvGRUText Network Diagram
Data Inputs CNN Layer 1
CNN Layer 2
[ReLU Activation]
GRU Layer 1
[ReLU Activation]
Text Classification (Fake or Real)
[Sigmoid Activation]
12. Experimental Setup and Results
Datasets – https://www.kaggle.com/jruvika/fake-news-detection, https://www.kaggle.com/rtatman/deceptive-
opinion-spam-corpus (M. Ott et al., 2014) and https://www.kaggle.com/c/nlp-getting-started
First one 4000 records, second one 1600 records and third one 7500 records. 2 attributes – News text and News
label (fake or real) or Review text and Review label (fake or real) or Tweet text and Tweet label (fake or real) used
in data analysis
Confusion matrix for fake news, reviews and tweets for different text representations created and compared
Different models and classification metrics for fake news, reviews and tweets compared
Receiver Operating Characteristics curves created for fake news, reviews and tweets
28. Conclusions
Online businesses capitalize on machine learning models to find out if a text is fake or real. They can use the
proposed hybrid deep learning model for fake text identification
Regulation is required by governments to stop the advancement of fake texts. This will need appropriate public
policy changes
If fake texts are controlled then market fluctuations will be less, leading to a stable economic environment
Data Analytics is the enabler to solve this critical business problem of distinguishing between fake and real texts
with high predictive accuracy, hence assisting the policy makers, businesses, communities, consumers and
society
29. Future Research Directions and Limitations
Execute developed hybrid model on other domains to further test generalizability
Larger dataset to test scalability of model
Could have used word embeddings such as GloVe, FastText or Word2Vec for data representations