Fake News Detection on Social Media using Machine Learning
Dept. of CSE 1 IESCE, Chittilappilly
1. INTRODUCTION
Social media is one of the most accessible news sources for many people worldwide today
because of its low cost, quick access and fast spread. However, this comes with mixed
signals of credibility and a significant risk of exposure to 'false stories' written to
mislead readers. Such information can sway public opinion and allow malicious groups to
influence the outcome of public events, such as elections. Fake and misleading news can
have a real impact on those who find themselves as targets. Examples of fake news include
misinformation about the COVID-19 pandemic, such as self-verification of infection, spread
of the virus based on temperature, and vaccination; unverified statements by political
figures in public addresses about military invasions or about development and public
works; false and misleading images used to malign or praise people; and manipulated video
and audio.
These days fake news creates problems ranging from satirical articles to fabricated news
and planned government propaganda in some outlets. Fake news and lack of trust in the
media are growing problems with huge ramifications in our society. Obviously, a
purposely misleading story is "fake news", but lately the discourse on social media is
changing its definition. Some people now use the term to dismiss facts that run counter to
their preferred viewpoints. One individual's view becomes information for others, and
others build their understanding of the world on this biased and unverified information.
The growth of information produced this way leaves society running on false ideas.
Individuals rarely verify such information, as they are busy in their own personal and
virtual worlds. But a society based on false and biased ideas is a bomb that ticks away,
ready to burst whenever a new idea intervenes and threatens the dominance of the existing
one, which is good for neither the individual nor society.
However, in order to solve a problem, it is necessary to understand what fake news is and
how techniques from machine learning and natural language processing help us to detect
it.
2. LITERATURE REVIEW
Authors: Monther Aldwairi, Ali Alwahedi, in [1], observe that fake news and clickbait
interfere with the ability of a user to discern useful information from internet
services, especially when news becomes critical for decision making. Considering the
changing landscape of the modern business world, the issue of fake news has become
more than just a marketing problem, as it warrants serious effort from security
researchers. It is imperative that any attempt to manipulate or troll the internet through
fake news or clickbait is countered with absolute effectiveness. The authors propose a
simple but effective approach that allows users to install a simple tool in their personal
browser and use it to detect and filter out potential clickbait. The preliminary
experimental results, conducted to assess the method's ability to attain its intended
objective, showed outstanding performance in identifying possible sources of fake news.
Since the authors started this work, a few fake news databases have been made available,
and they are currently expanding their approach to test its effectiveness against the new
data sets.
Authors: Xinyi Zhou, Reza Zafarani, in [2], stress the importance of multidisciplinary
fake news research, reviewing and organizing fake news detection studies from multiple
perspectives, including the news content and the medium on which the news spreads. The
rate of detection, i.e., the response time for deciding whether news is real or fake, was
measured to be very slow. They detail fact extraction, KB/KG (knowledge base/knowledge
graph) construction and fact checking. There are some open issues and several potential
research tasks. First, when collecting facts to construct a KB (KG), one concern is the
sources from which facts are extracted. In addition to traditional sources such as
Wikipedia, other sources, e.g., fact-checking websites that contain expert analysis and
justification for checked news content, might help provide high-quality domain knowledge.
However, such sources have rarely been considered in current research. As fake news
research is evolving, the authors accompany their survey with an online repository that
provides summaries and timely updates on research developments on fake news, including
tutorials, recent publications and methods, data sets and other related resources.
Authors: Srishti Agrawal, Vaishali Arora, in [3], take the key expressions of news items
in a form that needs to be verified. The filtered data is stored in a database known as
MongoDB. The data pre-processing unit is responsible for setting up the data for the
additional processing that is required. Classification depends mainly on
the number of tweets, number of hashtags, number of followers, verified-user status,
sentiment score, number of retweets, and NLP methods. Because stance detection is used to
examine the stance of the author, three results are expected rather than two. Stance
detection is a psychological model used by the authors, and it has many other
applications. The stance of the author can be classified as Agreed, Neutral or Disagreed.
Once all the classes have been considered, a news story can be judged fake or genuine,
and the authenticity of the news story is also reported. The output is then classified
using classification algorithms. However, when the detected stance is Neutral, the story
is considered neither true nor false. The complete process is therefore of limited use,
because the result itself leaves it unclear whether to trust the story or not, which
ultimately defeats the very purpose of building the program.
Authors: H. Parveen Sultanaa, Srijan Malhotra, in [4], report results that are not
satisfactory across the variety of news. The results show that the SVM and logistic
regression classifiers have the best performance on this data set, with SVM performing
slightly better than logistic regression. The same can be seen from the F1 scores. Also,
the training data is largely based on US politics and economic news, so in the test cases
the news statements related to US politics were correctly classified and fake news was
detected, but the test cases with news related to technology were wrongly predicted. The
biggest drawback that comes packaged with this problem is that the data is erratic, which
means that any type of prediction model can have anomalies and can make mistakes. For
future improvements, concepts like POS tagging, word2vec and topic modelling can be
utilized. These will give the model a lot more depth in terms of feature extraction and
fine-tuned classification.
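The F1 score referred to above is the harmonic mean of precision and recall. A minimal
sketch of the calculation is given here; the function name and the confusion-matrix counts
are hypothetical and are not taken from [4].

```python
def f1_from_counts(true_positives, false_positives, false_negatives):
    """Compute the F1 score from raw confusion-matrix counts."""
    # Precision: share of items predicted "fake" that really are fake.
    precision = true_positives / (true_positives + false_positives)
    # Recall: share of truly fake items that the classifier found.
    recall = true_positives / (true_positives + false_negatives)
    # F1 is the harmonic mean of precision and recall.
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts, for illustration only.
score = f1_from_counts(true_positives=80, false_positives=20, false_negatives=20)
print(round(score, 2))
```

Because F1 combines both error types, it is a more informative comparison than accuracy
alone when the classes are unevenly represented.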
Authors: Rajendra Chatse, Pradeep Kumar Kale, in [5], describe the process of their
project as tedious, and not an easy experience even for an expert. The workflow is:
system login, then registration, Twitter data scraping, Twitter data to CSV conversion,
applying NLP and the algorithm to predict positive, negative and neutral sentiment, and
finally fake news detection. The paper describes a simple fake news detection method
based on one of the machine learning algorithms, the naive Bayes classifier. The goal of
the research is to examine how naive Bayes works for this particular problem, given a
manually labeled news data set, and to support the idea of using artificial intelligence
for fake news detection. Further, this technique can be applied to social platforms like
Facebook and Twitter by adding recent
news and enhancing the fake news detection system. The main drawback was that the stored
dataset had to be manually labeled, which is time consuming and not convenient for large
datasets. The difference between this paper and other papers on similar topics is that
here the naive Bayes classifier was specifically used for fake news detection. The authors
tested the difference in accuracy by taking articles of different lengths for detecting
fake news. A concept of web scraping was also introduced, which gives an insight into how
the dataset can be updated on a regular basis to check the truthfulness of recently
updated Facebook posts.
Authors: Shruthy S Shetty, KB Shreejith, in [6], note that fake news detection on social
media has recently become an emerging research area that is attracting attention. Fake
news is generated on purpose to mislead readers into believing false information, which
makes it difficult and non-trivial to detect based on content. Fake news on social media
has been occurring for several years; however, there is no agreed definition of the term
"fake news". For better guidance of the future directions of fake news detection research,
appropriate classifications are necessary. Social media has proved to be a powerful source
for spreading fake news. It is important to utilize some of the emerging patterns for fake
news detection on social media. The only drawback here is the SVM algorithm, because it is
not suitable for large data sets. SVM does not perform very well when the data set has
more noise, i.e., when target classes overlap. In cases where the number of features for
each data point exceeds the number of training data samples, the SVM will underperform.
Authors: Nerissa Pereira, Sirman Dabreo, in [7], present a model for fake news detection
using a variety of machine learning and deep learning algorithms. In the first level of
implementation, they investigated four different classifiers and compared their
accuracies. The model that achieves the highest accuracy is LSTM, at 93%. Fake news
detection is a popular and trending research area that has an extremely scarce number of
datasets. The current model was run against the existing dataset, indicating that the
model performs well against it. At the next level, they analyzed real-time data from
Twitter. Here the model was trained using the logistic regression algorithm, due to the
inability of the LSTM to perform well on real-time tweets of considerably small length.
The accuracy of tweet classification using logistic regression was found to be around
87%. Also, there is no visual presentation of the result. Hence, future work needs
to verify not just the language but also the images and audio embedded in the content.
The method is Twitter-oriented only, hence any news that is not on Twitter cannot be
analyzed or predicted as real or fake. In that case, the data set will be of no use.
Authors: Z Khanam, B N Alwasel, H Sirafi, in [8], focus on detecting fake news by
reviewing it in two stages: characterization and disclosure. In the first stage, the basic
concepts and principles of fake news in social media are highlighted. In the second stage,
current methods for detecting fake news using different supervised learning algorithms are
reviewed. The fake news detection approaches presented in the paper are based on text
analysis and utilize models based on speech characteristics and predictive models that do
not fit with other current models. Among the reviewed works, one utilized the naive Bayes
classifier to detect fake news from different sources, with an accuracy of 74%. Another
used combined ML algorithms, but these depend on an unreliable probability threshold, with
85-91% accuracy. A third used naive Bayes to detect fake news from different social media
websites, but the results were not accurate for untruthful sources.
Authors: Christian Janze, Marten Risius, in [9], suggest that fake news sites could
falsely suggest probity by selecting names, profile pictures and logos similar to those of
reliable sources. Thus, the respective source-centric attributes should be considered in
future. Their study only considered the most apparent features of the news post, which are
probably most influential due to their exposed position. However, characteristics of the
actual fake news text should also be assessed to determine whether it is real or fake
news. Beyond these considerations, the authors also excluded some seemingly relevant
metrics, like the percentage of post likes and the overall number of reactions, due to
multicollinearity. Other limiting aspects concern the generalizability of the findings.
The news detection in their work only revolves around political topics. While these are
currently of predominant public interest, fake news can also target other areas like
science, sports or economics, which are not part of the study's sample. Nevertheless, as
no topic-specific features are considered, the authors are confident in the
generalizability of their results. Furthermore, they only considered messages from
Facebook, which is structurally and functionally distinct from other social media
platforms. While Facebook represents the social media platform where most news is
consumed, other platforms are also subject to fake news, which needs individual means of
detection. Next to this
limitation, it is possible that future advances in natural language generation could
bypass the detection system by incorporating the authors' findings to create fake news
that is indistinguishable from genuine news. Considering the alleged substantial effects
of fake news on recent political events, the automatic detection of fake news has
important practical consequences. For future research, the study provides a starting point
to improve the detection of fake news, which could also be expanded to other topics and
tested using data from additional social media platforms. Current efforts of major
platform operators to manually tag fake news are not an efficient process.
Mykhailo Granik et al., in their paper [3], show a simple approach for fake news detection
using a naive Bayes classifier. The approach was implemented as a software system and
tested against a dataset of Facebook news posts. The posts were collected from five
Facebook pages each from the right and from the left, as well as three large mainstream
political news pages (Politico News). They achieved a classification accuracy of
approximately 74%. Classification accuracy for fake news is slightly worse. This is caused
by the skewness of the dataset: only 4.9% of it is fake news.
Himank Gupta et al. [10] gave a framework based on different machine learning approaches
that deals with various problems, such as accuracy shortage, time lag (BotMaker) and the
high processing time needed to handle thousands of tweets per second. First, they
collected 400,000 tweets from the HSpam14 dataset. They then characterized the 150,000
spam tweets and 250,000 non-spam tweets, and derived some lightweight features along with
the top 30 words providing the highest information gain from a bag-of-words model. They
were able to achieve an accuracy of 91.65% and surpassed the existing solution by
approximately 18%.
Marco L Della Vedova et al. [11] first proposed a novel ML fake news detection method
which, by combining news content and social context features, outperforms existing methods
in the literature, increasing its accuracy up to 78.8%. Second, they implemented their
method within a Facebook Messenger chatbot and validated it with a real-world application,
obtaining a fake news detection accuracy of 81.7%. Their goal was to classify a news item
as reliable or fake; they first described the datasets they used for their test, then the
content-based approach they implemented and the method they proposed to combine it with a
social-based approach from the literature. The resulting dataset is composed of 15,500
posts, coming from 32 pages (14
conspiracy pages, 18 scientific pages), with more than 2,300,000 likes by 900,000+ users;
8,923 (57.6%) posts are hoaxes and 6,577 (42.4%) are non-hoaxes.
Cody Buntain et al. [12] develop a method for automating fake news detection on Twitter
by learning to predict accuracy assessments in two credibility-focused Twitter datasets:
CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter, and PHEME,
a dataset of potential rumors in Twitter and journalistic assessments of their accuracies.
They apply this method to Twitter content sourced from BuzzFeed's fake news dataset. A
feature analysis identifies the features that are most predictive for crowdsourced and
journalistic accuracy assessments, the results of which are consistent with prior work.
They rely on identifying highly retweeted conversation threads and use the features of
these threads to classify stories, limiting this work's applicability to the set of
popular tweets. Since the majority of tweets are rarely retweeted, this method is usable
on only a minority of Twitter conversations.
In their paper, Shivam B Parekh et al. [13] aim to present an insight into the
characterization of news stories in the modern diaspora, combined with the differential
content types of news stories and their impact on readers. Subsequently, they dive into
existing fake news detection approaches that are heavily based on text-based analysis, and
also describe popular fake news datasets. They conclude by identifying four key open
research challenges that can guide future research. It is a theoretical approach that
illustrates fake news detection by analyzing psychological factors.
3. SOFTWARE DEVELOPMENTS
3.1 PROPOSED MODEL
Social media is one of the most accessible news sources for many people worldwide today
because of its low cost, quick access and fast spread. However, this comes with mixed
signals of credibility and a significant risk of exposure to 'false stories' written to
mislead readers. Such information can sway public opinion and allow malicious groups to
influence the outcome of public events, such as elections. These days fake news creates
problems ranging from satirical articles to fabricated news and planned government
propaganda in some outlets. Fake news and lack of trust in the media have huge
ramifications in our society.
So, we propose a system to detect fake news, which is a classic text classification
problem with a straightforward proposition: build a model that can differentiate between
"Real" news and "Fake" news. The methods to be followed are as follows. Pre-processing the
data is a normal first step before training and evaluating machine learning algorithms.
Machine learning algorithms are only as good as the data you feed them. It is crucial
that the data is formatted properly and that meaningful features are included, so that it
is consistent enough to give the best possible results. A TF-IDF vectorizer is used to
extract features from the content, and the extracted features are used to train the ML
algorithm (a passive aggressive classifier).
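The proposed combination of a TF-IDF vectorizer and a passive aggressive classifier can be
sketched with scikit-learn as follows. This is only a minimal illustration under stated
assumptions: the headlines and labels are invented toy data, and the parameter values
(stop_words, max_iter, random_state) are illustrative choices, not settings reported in
this work.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Invented toy headlines and labels (assumption: not a real dataset).
train_texts = [
    "Government announces new budget for public schools",
    "Local council approves road repair plan",
    "Miracle pill cures every disease overnight, doctors stunned",
    "Secret plot revealed: celebrities replaced by clones",
]
train_labels = ["REAL", "REAL", "FAKE", "FAKE"]

# Step 1: turn raw text into TF-IDF feature vectors.
# Step 2: train the passive aggressive classifier on those vectors.
model = make_pipeline(
    TfidfVectorizer(stop_words="english"),
    PassiveAggressiveClassifier(max_iter=1000, random_state=0),
)
model.fit(train_texts, train_labels)

# Classify an unseen headline; the result is one of the training labels.
new_headlines = ["Council approves new budget plan"]
predicted = model.predict(new_headlines)
print(predicted)
```

Wrapping both steps in one pipeline means the vectorizer vocabulary is learned from the
training text only and is reused unchanged at prediction time.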
3.2 EXISTING SYSTEM
A simple approach for fake news detection is performed using a KNN classifier.
Probabilities are obtained using KNN, which describes the probability of a feature, but it
suffers from misclassification and lower prediction accuracy. In this model, both training
and testing data are first pre-processed by removing unwanted punctuation and words; next,
feature extraction is used to extract the needed information from the pre-processed data.
Supervised machine learning algorithms are then applied to perform feature extraction and
prediction. Finally, classification is done using SVM to predict whether the news is fake
or real.
In this paper the models are Support Vector Machine and Naive Bayes. SVM and Naive Bayes
are types of classification algorithm capable of learning order dependence in sequence
prediction problems. This classification algorithm is used. It was demonstrated that two
layers were sufficient to detect more complex features.
The main drawbacks of the existing system are:
Models with more layers can perform better but are also more difficult to train. One layer
works for simple problems, and two layers are sufficient to find relatively complex
features.
Developing a false perception about someone is one major drawback of fake news.
3.3 REQUIREMENTS SPECIFICATION
3.3.1 HARDWARE REQUIREMENTS
RAM capacity: 8GB minimum, 16GB or higher
CPU: Intel Core i5 6th Generation processor or higher
Accessories: Computer system powerful enough to handle the computing power necessary
3.3.2 SOFTWARE REQUIREMENTS
Operating system: Microsoft Windows 10 or Ubuntu
Language: Python 3.6
Tools: Anaconda, NumPy, Matplotlib, Sklearn
3.3.3 PYTHON
Python is an interpreted, object-oriented, high-level programming language with dynamic
semantics. Its high-level built in data structures, combined with dynamic typing and
dynamic binding, make it very attractive for Rapid Application Development, as well as
for use as a scripting or glue language to connect existing components together. Python's
simple, easy to learn syntax emphasizes readability and therefore reduces the cost of
program maintenance. Python supports modules and packages, which encourages
program modularity and code reuse. The Python interpreter and the extensive standard
library are available in source or binary form without charge for all major platforms, and
can be freely distributed.
Often, programmers fall in love with Python because of the increased productivity it
provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast.
Debugging Python programs is easy: a bug or bad input will never cause a segmentation
fault. Instead, when the interpreter discovers an error, it raises an exception. When the
program doesn't catch the exception, the interpreter prints a stack trace. A source level
debugger allows inspection of local and global variables, evaluation of arbitrary
expressions, setting breakpoints, stepping through the code a line at a time, and so on.
The debugger is written in Python itself, testifying to Python's introspective power. On
the other hand, often the quickest way to debug a program is to add a few print statements
to the source: the fast edit-test-debug cycle makes this simple approach very effective.
Python is dynamically typed and garbage-collected. It supports multiple programming
paradigms, including structured (particularly procedural), object-oriented and functional
programming. It is often described as a "batteries included" language due to its
comprehensive standard library. Python's large standard library provides tools suited to
many tasks, and is commonly cited as one of its greatest strengths. For Internet-facing
applications, many standard formats and protocols such as MIME and HTTP are
supported. It includes modules for creating graphical user interfaces, connecting to
relational databases, generating pseudorandom numbers, arithmetic with arbitrary-precision
decimals, manipulating regular expressions, and unit testing. Python consistently
ranks as one of the most popular programming languages.
3.3.4 ANACONDA
Anaconda is an open-source distribution of the Python and R programming languages for
data science that aims to simplify package management and deployment. Package
versions in Anaconda are managed by the package management system, conda, which
analyzes the current environment before executing an installation to avoid disrupting
other frameworks and packages.
The Anaconda distribution comes with over 250 packages automatically installed. Over
7500 additional open-source packages can be installed from PyPI as well as the conda
package and virtual environment manager. It also includes a GUI (graphical user
interface), Anaconda Navigator, as a graphical alternative to the command line interface.
Anaconda Navigator is included in the Anaconda distribution, and allows users to launch
applications and manage conda packages, environments and channels without using
command-line commands. Navigator can search for packages, install them in an
environment, run the packages and update them. Anaconda targets scientific computing
(data science, machine learning applications, large-scale data processing, predictive
analysis, etc.). The distribution includes data-science packages suitable for Windows,
Linux, and macOS. It is developed and maintained by Anaconda, Inc., which was founded by
Peter Wang and Travis Oliphant in 2012. As an Anaconda, Inc. product, it is also known as
Anaconda Distribution or Anaconda Individual Edition, while other products from the
company are Anaconda Team Edition and Anaconda Enterprise Edition, both of which are not
free.
With more than 300 libraries for data science available, Anaconda is a convenient choice
for programmers working in data science. It comes with a wide variety of tools for
collecting data from various sources and for applying machine learning and AI algorithms.
It also helps in setting up an easily manageable environment from which a project can be
deployed with the click of a single button.
3.3.5 NUMPY
NumPy is a Python library used for working with arrays. NumPy stands for Numerical
Python. It also has functions for working in the domains of linear algebra, Fourier
transforms, and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open
source project and you can use it freely. NumPy is written partially in Python, but most
of the parts that require fast computation are written in C or C++. In Python we have
lists that serve the purpose of arrays, but they are slow to process. NumPy aims to
provide an array object that is up to 50x faster than traditional Python lists. The array
object in NumPy is called ndarray, and it provides a lot of supporting functions that make
working with ndarray very easy. Arrays are very frequently used in data science, where
speed and resources are very important. NumPy arrays are stored at one continuous place
in memory, unlike lists, so processes can access and manipulate them very efficiently.
This behavior is called locality of reference in computer science. This is the main reason
why NumPy is faster than lists. It is also optimized to work with the latest CPU
architectures. The source code for NumPy is hosted on GitHub.
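The difference between a Python list and an ndarray can be illustrated with a short
sketch. The word counts below are hypothetical values chosen only for the example.

```python
import numpy as np

# Hypothetical per-word counts, first as a plain Python list.
word_counts = [3, 0, 5, 2]

# The same data as a NumPy ndarray.
counts = np.array(word_counts)

# With a list, element-wise work needs an explicit Python-level loop.
doubled_list = [c * 2 for c in word_counts]

# With an ndarray, the same operation is one vectorized expression.
doubled = counts * 2

print(type(counts).__name__)          # name of the array class
print(doubled.tolist())               # same values as doubled_list
print(counts.flags["C_CONTIGUOUS"])   # whether the data sits in one contiguous block
```

The contiguity flag reflects the storage layout described above, which is what lets NumPy
process the whole array without per-element Python overhead.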
3.3.6 PANDAS
Pandas is an open source Python package that is widely used for data science, data
analysis and machine learning tasks. It is built on top of another package named NumPy,
which provides support for multi-dimensional arrays. As one of the most popular data
wrangling packages, Pandas works well with many other data science modules inside the
Python ecosystem, and is included in many Python distributions, such as commercial vendor
distributions like ActiveState's ActivePython. Pandas makes it simple to do many of the
time-consuming, repetitive tasks associated with working with data.
Pandas is defined as an open-source library that provides high-performance data
manipulation in Python. The name Pandas is derived from "panel data", an econometrics
term for multidimensional data. It is used for data analysis in Python and was developed
by Wes McKinney in 2008. Data analysis requires lots of processing, such as restructuring,
cleaning or merging. Different tools are available for fast data processing, such as
NumPy, SciPy, Cython, and Pandas, but we prefer Pandas because working with it is fast,
simple and more expressive than other tools. Pandas is built on top of the NumPy package,
which means NumPy is required for operating Pandas. Before Pandas, Python was capable of
data preparation, but it only provided limited support for data analysis. Pandas came into
the picture and enhanced the capabilities of data analysis. It can perform the five
significant steps required for processing and analysis of data irrespective of the origin
of the data, i.e., load, manipulate, prepare, model, and analyze.
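Some of the steps listed above (load, prepare, manipulate, analyze) can be sketched on a
tiny in-memory table. The column names title and label and the rows themselves are
assumptions made for illustration; they do not describe the dataset used in this work.

```python
import io

import pandas as pd

# Small in-memory CSV standing in for a news dataset (invented rows).
csv_text = """title,label
Scientists confirm water is wet,REAL
SHOCKING cure doctors hide from you,FAKE
,FAKE
Council approves new budget,REAL
"""

# Load: read the CSV text into a DataFrame.
df = pd.read_csv(io.StringIO(csv_text))

# Prepare: drop rows whose title is missing.
df = df.dropna(subset=["title"])

# Manipulate: normalise the text column to lower case.
df["title"] = df["title"].str.lower()

# Analyze: count how many articles fall in each class.
label_counts = df["label"].value_counts()

print(df.shape)
print(label_counts.to_dict())
```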
3.3.7 MATPLOTLIB
Matplotlib is a cross-platform, data visualization and graphical plotting library for Python
and its numerical extension NumPy. As such, it offers a viable open source alternative to
MATLAB. Developers can also use matplotlib's APIs (Application Programming
Interfaces) to embed plots in GUI applications. A Python matplotlib script is structured so
that a few lines of code are all that is required in most instances to generate a visual data
plot.
Matplotlib is a visualization library in Python for 2D plots of arrays. It is a
multi-platform data visualization library built on NumPy arrays and designed to work
with the broader SciPy stack. It was introduced by John Hunter in the year 2002. One of
the greatest benefits of visualization is that it gives us visual access to huge amounts
of data in easily digestible form. Matplotlib offers several plot types, such as line,
bar, scatter and histogram.
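As an illustration of how few lines a plot needs, the sketch below draws a bar chart of
class counts. The counts are hypothetical, and the non-interactive Agg backend and the
in-memory buffer are choices made here so that the example also runs without a display.

```python
import io

import matplotlib

matplotlib.use("Agg")  # non-interactive backend; no window is opened
import matplotlib.pyplot as plt

# Hypothetical number of articles per class (illustrative values only).
labels = ["REAL", "FAKE"]
article_counts = [120, 80]

fig, ax = plt.subplots()
bars = ax.bar(labels, article_counts)   # one bar per class
ax.set_xlabel("Class")
ax.set_ylabel("Number of articles")
ax.set_title("Class distribution (illustrative data)")

# Render the figure as PNG bytes into memory instead of a file on disk.
buffer = io.BytesIO()
fig.savefig(buffer, format="png")
plt.close(fig)
```

Passing a file name such as a .png path to savefig instead of the buffer would write the
same image to disk.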
3.3.8 SKLEARN
Scikit-learn (Sklearn) is a widely used and robust library for machine learning in Python.
It provides a selection of efficient tools for machine learning and statistical modeling,
including classification, regression, clustering and dimensionality reduction, via a
consistent interface in Python. This library, which is largely written in Python, is built
upon NumPy, SciPy and Matplotlib.
It was originally called scikits.learn and was initially developed by David Cournapeau as
a Google Summer of Code project in 2007. Later, in 2010, Fabian Pedregosa, Gael
Varoquaux, Alexandre Gramfort, and Vincent Michel, from INRIA (French Institute for
Research in Computer Science and Automation), took the project to another level and
made the first public release (v0.1 beta) on 1st Feb. 2010.
4. MODULE DESCRIPTION
4.1 DATA PREPROCESSING
Data preprocessing is the normal first step before training and evaluating a machine
learning model. Machine learning algorithms are only as useful as the data they are fed.
It is important to format the data correctly and to include relevant features so that the
data is consistent enough to produce the best possible outcomes. Stop word removal,
tokenization, lower-casing, sentence segmentation and punctuation removal are all
examples of data refinement. Irrelevant information must be removed, which reduces the
size of the actual data. We created a generic processing function that, for each document,
removes punctuation and non-letter characters and then converts the text to lower case.
The text is cleaned in several steps (removing all non-alphanumeric characters, deleting
stop words, deleting rows with missing values, etc.).
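The cleaning steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the project's actual function: the names `clean_text` and `clean_corpus` and the tiny stop-word set are assumptions made for this example, and a real pipeline would use a fuller stop-word list.

```python
import re

# Assumed, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "to", "in", "and", "on"}


def clean_text(document):
    """Lower-case a document, strip non-letters, and drop stop words."""
    lowered = document.lower()
    # Replace every character that is not a letter or whitespace with a space.
    letters_only = re.sub(r"[^a-z\s]", " ", lowered)
    # Tokenize on whitespace and remove stop words.
    return [tok for tok in letters_only.split() if tok not in STOP_WORDS]


def clean_corpus(documents):
    """Clean every document, skipping missing (None or blank) rows."""
    return [clean_text(doc) for doc in documents if doc and doc.strip()]


print(clean_text("The Senate is VOTING on 3 bills!"))
```

Returning a list of tokens rather than a joined string keeps the tokenization step explicit; joining the tokens with spaces would give text suitable for a vectorizer.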
4.2 FEATURE EXTRACTION
Feature extraction is a dimensionality-reduction method that reduces an initial set of raw
data to more manageable groups for processing. A distinguishing characteristic of such
large data sets is the large number of variables, which require a lot of computing
resources to process. To begin, we extract a number of language features for the fake
news detection models. Building a model on a count vectorizer (using word tallies) or on
a term frequency-inverse document frequency (TF-IDF) matrix can only get you so far,
because these models do not consider important qualities such as word order and context.
It is quite possible for two articles that are similar in their word counts to be completely
different in meaning. The data science community has responded by taking action
against the problem.
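The limitation described above can be made concrete with a small sketch. It assumes the textbook weighting tf * log(N / df); libraries such as scikit-learn apply smoothed variants, so the absolute values will differ from theirs. The function names and the toy corpus are illustrative only.

```python
import math
from collections import Counter


def term_counts(tokens):
    """Bag-of-words representation: word -> number of occurrences."""
    return Counter(tokens)


def tf_idf(corpus):
    """Return one {word: weight} dict per tokenized document.

    Uses the textbook weighting tf * log(N / df), which is an assumption
    of this sketch rather than the formula of any particular library.
    """
    n_docs = len(corpus)
    # Document frequency: in how many documents does each word appear?
    doc_freq = Counter(word for doc in corpus for word in set(doc))
    weights = []
    for doc in corpus:
        counts = term_counts(doc)
        weights.append({
            word: (count / len(doc)) * math.log(n_docs / doc_freq[word])
            for word, count in counts.items()
        })
    return weights


corpus = [
    "dog bites man".split(),
    "man bites dog".split(),
    "senate passes budget".split(),
]
vectors = tf_idf(corpus)
# Same words in a different order give the same vector: word order is lost.
print(vectors[0] == vectors[1])
```

The first two documents mean opposite things, yet their vectors are identical, which is exactly the word-order blindness of count-based and TF-IDF features.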
4.3 ALGORITHM TRAINING
The idea of using training data in machine learning programs is simple, yet it is
fundamental to how such techniques work. The training set is an initial set of data used
to help a program learn how to apply machine learning techniques and produce useful
results. It may be complemented by subsequent sets of data called validation and test
sets. It can process not only individual data points, but also whole data sequences.
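The division into training, validation and test sets can be sketched as follows. The 20% test fraction and the seed of 7 mirror the values used in the sample code of this report; the 10% validation fraction and the function name `split_dataset` are assumptions made for illustration.

```python
import random


def split_dataset(rows, val_fraction=0.1, test_fraction=0.2, seed=7):
    """Shuffle rows and split them into training, validation and test sets."""
    shuffled = list(rows)
    # A seeded generator makes the split reproducible.
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    n_val = int(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    validation = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, validation, test


rows = [(f"article {i}", "FAKE" if i % 2 else "REAL") for i in range(10)]
train, validation, test = split_dataset(rows)
print(len(train), len(validation), len(test))  # 7 1 2
```

The model is fitted on the training rows only; the validation rows guide parameter choices, and the test rows are held back for the final accuracy estimate.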
4.4 PREDICTION
Usually, a data set is separated into a training set and a test set. The majority of the data
is used for training, while only a small portion is used for testing. A prediction is a
statement about a particular outcome, and predictions can be helpful for planning. Using
the trained model, the machine can predict the output for new data; the same
preprocessing and feature extraction are applied to the test data. A message box module
is used to generate an interface that reports whether a statement is fake or original. In
today's society, it is crucial to monitor fake stories online, as news reporting is produced
quickly because of easily accessible technology. There are seven major categories of
false stories, and a piece of counterfeit news content can be textual or visual. Linguistic
as well as non-linguistic indicators can be analyzed by several techniques to identify false
news. Although several of these methods are usually efficient in identifying fake news,
they are limited.
5. METHODOLOGY
The main objective is to detect fake news, which is a classic text classification problem
with a straightforward proposition: build a model that can differentiate
between "Real news" and "Fake news". The steps are as follows:
1. Acquiring and loading the data.
2. Cleaning the dataset.
3. Removing extra symbols.
4. Removing punctuation.
5. Removing the stop words.
6. Stemming.
7. Tokenization.
8. Feature extraction.
9. TF-IDF vectorizer.
10. Count vectorizer with TF-IDF transformer.
11. Machine learning model training and verification.
Preprocessing is the normal first step before training and evaluating machine learning
algorithms. Machine learning algorithms are only as good as the data you feed them. It is
crucial that the data is formatted properly and that meaningful features are included, so
that it is consistent enough to give the best possible results. A TF-IDF vectorizer is used
to extract features from the content. The extracted features are then used to train the ML
algorithm (a passive aggressive classifier).
Fig. 5.1 Architecture
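To give an intuition for the passive aggressive classifier named above, the following is a minimal sketch of the PA-I update rule for binary labels, using sparse feature dictionaries. It is an illustration under stated assumptions, not the library implementation used in this project: the function names, the label encoding (+1 for REAL, -1 for FAKE) and the toy feature values are all hypothetical.

```python
def dot(weights, features):
    """Sparse dot product between a weight dict and a feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def pa_update(weights, features, label, aggressiveness=1.0):
    """One PA-I step; label is +1 (REAL) or -1 (FAKE). Mutates weights."""
    # Hinge loss: zero when the example is classified with margin >= 1.
    loss = max(0.0, 1.0 - label * dot(weights, features))
    if loss == 0.0:
        return weights  # passive: leave the weights unchanged
    squared_norm = sum(v * v for v in features.values())
    if squared_norm == 0.0:
        return weights  # no features to update on
    # Aggressive: move just far enough to fix the mistake, capped by C.
    step = min(aggressiveness, loss / squared_norm)
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + step * label * value
    return weights


def train(examples, epochs=5):
    """Run several passes of PA-I updates over (features, label) pairs."""
    weights = {}
    for _ in range(epochs):
        for features, label in examples:
            pa_update(weights, features, label)
    return weights


def predict(weights, features):
    """Map the sign of the score back to a class name."""
    return "REAL" if dot(weights, features) >= 0 else "FAKE"


# Toy feature values standing in for TF-IDF weights (assumed).
examples = [
    ({"shocking": 1.0, "miracle": 1.0}, -1),
    ({"senate": 1.0, "budget": 1.0}, +1),
]
weights = train(examples)
print(predict(weights, {"miracle": 1.0}))  # FAKE
print(predict(weights, {"budget": 1.0}))   # REAL
```

The algorithm stays passive when an example is already classified with sufficient margin and updates aggressively otherwise, which makes it suited to streams of incoming news items.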
6. EXPERIMENTAL ANALYSIS
6.1 SAMPLE CODE
import itertools
import pickle

import numpy as np
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Read the data
df = pd.read_csv('news.csv')

# Shape and head
print('Rows and columns:', df.shape)
print('First 5 rows:', df.head())
labels = df.label
print('Labels:', labels.head())

# Split into training and test sets (80% / 20%)
x_train, x_test, y_train, y_test = train_test_split(
    df['text'], labels, test_size=0.2, random_state=7)

# Initialize a TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

# Fit and transform the train set, transform the test set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

# Initialize and train a PassiveAggressiveClassifier
pac = PassiveAggressiveClassifier(max_iter=100)
pac.fit(tfidf_train, y_train)

# Save the fitted vectorizer and the trained model
with open('vectors.pickle', 'wb') as f:
    pickle.dump(tfidf_vectorizer, f)
with open('fakenews.pickle', 'wb') as f:
    pickle.dump(pac, f)

# Load the model and the vectorizer back from disk
with open('fakenews.pickle', 'rb') as pkl:
    pac = pickle.load(pkl)
with open('vectors.pickle', 'rb') as vec:
    tf_vect = pickle.load(vec)

# Predict on the test set and calculate accuracy
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score * 100, 2)}%')

# Build the confusion matrix
print(confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL']))

# Classify a single headline
text = ('Watch The Exact Moment Paul Ryan Committed Political Suicide '
        'At A Trump Rally (VIDEO)')
tf_text = tf_vect.transform([text])
pred = pac.predict(tf_text)
print(pred)
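The two evaluation measures printed by the sample code can also be computed by hand, which clarifies what the numbers mean. The sketch below is a plain-Python illustration with made-up labels; the helper names `accuracy` and `confusion` are assumptions for this example. Rows of the matrix correspond to true labels and columns to predicted labels, in the order given by `labels`.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def confusion(y_true, y_pred, labels):
    """Confusion matrix: rows are true labels, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return matrix


# Made-up labels for illustration only.
y_true = ["FAKE", "FAKE", "REAL", "REAL", "REAL"]
y_pred = ["FAKE", "REAL", "REAL", "REAL", "FAKE"]
print(accuracy(y_true, y_pred))                     # 0.6
print(confusion(y_true, y_pred, ["FAKE", "REAL"]))  # [[1, 1], [1, 2]]
```

In this toy case one fake item is caught, one fake item is missed, one real item is wrongly flagged and two real items are correctly passed, so the diagonal of the matrix sums to the three correct predictions.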
6.2 IMPLEMENTATION
Fig. 6.1 User Interface
Fig. 6.2 Home page
Fig. 6.3 Admin login
Fig. 6.4 User login
Fig. 6.5 News Uploading
Fig. 6.6 News prediction
Fig. 6.7 User registration
7. CONCLUSION
The concept of deception detection in social media is relatively new, and there is ongoing
research in the hope that scholars can find more accurate ways to detect false
information in this booming, fake-news-infested domain. For this reason, this research
may help other researchers discover which combination of methods should be
used in order to accurately detect fake news in social media. The proposed method
described in this paper is an idea for a more accurate fake news detection algorithm. It is
important that we have some mechanism for detecting fake news or, at the very least, an
awareness that not everything we read on social media may be true, so we always need to
think critically. In this way we can help people make more informed decisions, and
they will not be fooled into believing what others want to manipulate them into believing.
Fake news interferes with the ability of a user to discern useful information from
Internet services, especially when news becomes critical for decision making. Considering
the changing landscape of the modern business world, the issue of fake news has become
more than just a marketing problem, as it warrants serious efforts from security
researchers. It is imperative that any attempts to manipulate or troll the Internet through
fake news are countered with absolute effectiveness. We proposed a simple but effective
approach that allows users to install a simple tool into their personal browser and use it to
detect and filter out potential clickbait. The preliminary experimental results, obtained
to assess the method's ability to attain its intended objective, showed outstanding
performance in identifying possible sources of fake news. Since we started this work, a few
fake news databases have been made available, and we are currently expanding our
approach using R to test its effectiveness against the new datasets.
In the 21st century, the majority of tasks are done online. Newspapers that were
earlier preferred as hard copies are now being replaced by applications like Facebook
and Twitter and by news articles read online. WhatsApp forwards are also a major source.
The growing problem of fake news only makes things more complicated and tends to
change or hamper the opinion and attitude of people towards the use of digital technology.
When a person is deceived by fake news, one possible consequence is that they start
believing that their perceptions about a particular topic are true as assumed. Thus, in order
to curb the phenomenon, we have developed our fake news detection system, which takes
input from the user and classifies it as true or fake. To implement this, various NLP and
Machine Learning techniques have to be used. The model is trained using an appropriate
dataset, and performance evaluation is done using various performance measures. The
best model, i.e. the model with the highest accuracy, is used to classify the news headlines
or articles. As evident above, for static search our best model was Logistic
Regression, with an accuracy of 65%. We then used grid search parameter
optimization to improve the performance of logistic regression, which gave an
accuracy of 75%. Hence we can say that if a user feeds a particular news article or its
headline into our model, there is a 75% chance that it will be classified according to its
true nature. The user can check the news article or keywords online and can also check the
authenticity of the website. The accuracy of the dynamic system is 93%, and it increases
with every iteration. We intend to build our own dataset, which will be kept up to date
with the latest news. All the live news and latest data will be kept in a database
using a web crawler and an online database.
8. REFERENCES
[1] Abu-Nimeh, S., Chen, T., Alzubi, O., 2011. Malicious and spam posts in online social
networks. Computer 44, 23-28. doi:10.1109/MC.2011.222.
[2] Al Messabi, K., Aldwairi, M., Al Yousif, A., Thoban, A., Belqasmi, F., 2018.
Malware detection using dns records and domain name features, in: International
Conference on Future Networks and Distributed Systems (ICFNDS), ACM. URL:
https://doi.org/10.1145/3231053.3231082.
[3] Aldwairi, M., Abu-Dalo, A.M., Jarrah, M., 2017a. Pattern matching of
signature-based ids using myers algorithm under mapreduce framework. EURASIP J.
Information Security 2017. URL:
http://dblp.uni-trier.de/db/journals/ejisec/ejisec2017.html#AldwairiAJ17.
[4] Aldwairi, M., Al-Salman, R., 2011. Malurls: Malicious urls classification system, in:
Annual International Conference on Information Theory and Applications, GSTF Digital
Library (GSTF-DL), Singapore. doi:10.5176/978-981-08-8113-9_ITA2011-29. Best
paper award.
[5] Aldwairi, M., Alsaadi, H.H., 2017. Flukes: Autonomous log forensics, intelligence
and visualization tool, in: Proceedings of the International Conference on Future Networks
and Distributed Systems, ACM, New York, NY, USA. pp. 33:1-3
[6] Aldwairi, M., Hasan, M., Balbahaith, Z., 2017b. Detection of drive-by download
attacks using machine learning approach. Int. J. Inf. Sec. Priv. 11, 16-28. URL:
https://doi.org/10.4018/IJISP.2017100102, doi:10.4018/IJISP.2017100102.
[7] Balmas, M., 2014. When fake news becomes real: Combined exposure to multiple
news sources and political attitudes of inefficacy, alienation, and cynicism.
Communication Research 41, 430-454. doi:10.1177/0093650212453600.
[8] Baym, G., Jones, J.P., 2012. News parody in global perspective: Politics, power, and
resistance. Popular Communication 10, 2-13. URL:
https://doi.org/10.1080/15405702.2012.638566, doi:10.1080/15405702.2012.638566.
[9] Brewer, P.R., Young, D.G., Morreale, M., 2013. The impact of real news about "fake
news": Intertextual processes and political satire. International Journal of Public Opinion
Research 25, 323-343. URL: http://dx.doi.org/10.1093/ijpor/edt015,
doi:10.1093/ijpor/edt015.
[10] Chakraborty, A., Paranjape, B., Kakarla, S., Ganguly, N., 2016. Stop clickbait:
Detecting and preventing clickbaits in online news media, in: 2016 IEEE/ACM
International Conference on Advances in Social Networks Analysis and Mining
(ASONAM), pp. 9-16. doi:10.1109/ASONAM.2016.7752207.
[11] Chen, Y., Conroy, N.J., Rubin, V.L., 2015. News in an online world: The need for an
"automatic crap detector", in: Proceedings of the 78th ASIS&T Annual Meeting:
Information Science with Impact: Research in and for the Community, American Society
for Information Science, Silver Springs, MD, USA. pp. 81:1-81:4. URL:
http://dl.acm.org/citation.cfm?id=2857070.2857151.
[12] Conroy, N.J., Rubin, V.L., Chen, Y., 2015. Automatic deception detection: Methods
for finding fake news, in: Proceedings of the 78th ASIS&T Annual Meeting: Information
Science with Impact: Research in and for the Community, American Society for
Information Science, Silver Springs, MD, USA. pp. 82:1-82:4. URL:
http://dl.acm.org/citation.cfm?id=2857070.2857152.
[13] Hassid, J., 2011. Four models of the fourth estate: A typology of contemporary
Chinese journalists. The China Quarterly 208, 813-832. doi:10.1017/S0305741011001019.
[14] Lewis, S., 2011. Journalists, social media, and the use of humor on twitter. The
Electronic Journal of Communication / La Revue Electronic de Communication 21, 1-2.
[15] Marchi, R., 2012. With facebook, blogs, and fake news, teens reject journalistic
objectivity. Journal of Communication Inquiry 36, 246-262. URL:
https://doi.org/10.1177/0196859912458700, doi:10.1177/0196859912458700.
[16] Masri, R., Aldwairi, M., 2017. Automated malicious advertisement detection using
virustotal, urlvoid, and trendmicro, in: 2017 8th International Conference on Information
and Communication Systems (ICICS), pp. 336-341. doi:10.1109/IACS.2017.7921994.
[17] Nah, F.F.H., 2015. Fake-website detection tools: Identifying elements that promote
individuals' use and enhance their performance.
[18] Pogue, D., 2017. How to stamp out fake news. Scientific American 316, 24-24.
doi:10.1038/scientificamerican0217-24.
[19] Qbeitah, M.A., Aldwairi, M., 2018. Dynamic malware analysis of phishing emails,
in: 2018 9th International Conference on Information and Communication Systems
(ICICS), pp. 18-24. doi:10.1109/IACS.2018.8355435.
[20] Riedel, B., Augenstein, I., Spithourakis, G.P., Riedel, S., 2017. A simple but
tough-to-beat baseline for the fake news challenge stance detection task. CoRR
abs/1707.03264. URL: http://arxiv.org/abs/1707.03264, arXiv:1707.03264.
[21] Rubin, V.L., Chen, Y., Conroy, N.J., 2015. Deception detection for news: Three
types of fakes, in: Proceedings of the 78th ASIS&T Annual Meeting: Information Science
with Impact: Research in and for the Community, American Society for Information
Science, Silver Springs, MD, USA. pp. 83:1-83:4. URL:
http://dl.acm.org/citation.cfm?id=2857070.2857153.
[22] Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H., 2017. Fake news detection on social
media: A data mining perspective. SIGKDD Explor. Newsl. 19, 22-36. URL:
http://doi.acm.org/10.1145/3137597.3137600, doi:10.1145/3137597.3137600.
[23] Smith, J., Leavitt, A., Jackson, G., 2018. Designing new ways to give context to
news stories.
https://medium.com/facebook-design/designing-new-ways-to-give-context-to-newsstories-f6c13604f450.
[24] Spicer, R.N., 2018. Lies, Damn Lies, Alternative Facts, Fake News, Propaganda,
Pinocchios, Pants on Fire, Disinformation, Misinformation, Post-Truth, Data, and
Statistics. Springer International Publishing, Cham. pp. 1-31. URL:
https://doi.org/10.1007/978-3-319-69820-5_1, doi:10.1007/978-3-319-69820-5_1.
[25] University of Waikato, 2017. Waikato environment for knowledge analysis. URL:
https://www.cs.waikato.ac.nz/ml/weka/.

Fake News Detection on Social Media using Machine Learning

  • 1. Fake News Detection on Social Media using Machine Learning. Dept. of CSE, IESCE, Chittilappilly. 1. INTRODUCTION Social media is now one of the most accessible news sources for many people worldwide because of its low cost, quick access and rapid spread. However, this comes with mixed signals and a significant risk of exposure to 'false stories' written to mislead readers. Such information can sway public opinion and allow malicious groups to influence the outcome of public events such as elections. Fake and misleading news can have a real impact on those who find themselves targeted. Examples of fake news include claims about the COVID-19 pandemic (self-verification of infection, spread depending on temperature, vaccination); unverified statements by political figures in public addresses about military invasions or public works; false and misleading images used to malign or praise people; and manipulated video and audio. Today fake news ranges from satirical articles to fabricated stories and planned government propaganda in some outlets. Fake news and a lack of trust in the media are growing problems with huge ramifications for society. A purposely misleading story is obviously "fake news", but social media discourse is lately changing the definition: some now use the term to dismiss facts that run counter to their preferred viewpoints. One individual's view becomes information for others, who build their picture of the world on that biased and unverified information. As information of this kind grows, society comes to run on false ideas. Individuals rarely verify such falsified information because they are absorbed in their own personal and virtual worlds. 
But a society built on false and biased ideas is a ticking bomb, liable to burst whenever a new idea intervenes and threatens the dominance of the existing one, which is good for neither the individual nor society. To solve this problem, it is necessary to understand what fake news is and how techniques from machine learning and natural language processing can help detect it.
  • 2. 2. LITERATURE REVIEW Monther Aldwairi and Ali Alwahedi [1] observe that fake news and clickbait interfere with a user's ability to discern useful information from internet services, especially when news becomes critical for decision making. Given the changing landscape of the modern business world, fake news has become more than a marketing problem and warrants serious effort from security researchers. Any attempt to manipulate or troll the internet through fake news or clickbait must be countered effectively. They propose a simple but effective approach in which users install a simple tool in their personal browser and use it to detect and filter out potential clickbait. Preliminary experiments conducted to assess the method showed outstanding performance in identifying possible sources of fake news. Since that work began, a few fake news databases have become available, and the authors are extending the approach to test its effectiveness against the new datasets. Xinyi Zhou and Reza Zafarani [2] examine the importance of multidisciplinary fake news research, reviewing and organizing detection studies from multiple perspectives, including the news content and the medium on which the news spreads. The rate of detection, i.e. the response time for deciding whether news is real or fake, was measured to be very slow. They detail fact extraction, knowledge base (KB) / knowledge graph (KG) construction, and fact checking. They identify open issues and several potential research tasks. First, when collecting facts to construct a KB (KG), one concern is the sources from which the facts are extracted. 
In addition to traditional sources such as Wikipedia, other sources, e.g. fact-checking websites that contain expert analysis and justification for checked news content, might help provide high-quality domain knowledge. However, such sources have rarely been considered in current research. Because fake news research is evolving, the authors accompany their survey with an online repository that provides summaries and timely updates on research developments, including tutorials, recent publications and methods, datasets and other related resources. Srishti Agrawal and Vaishali Arora [3] take the key expressions of news items in a form that needs to be verified. The filtered data is stored in a MongoDB database. A data pre-processing unit is responsible for preparing the data for the further processing required. Classification basically depends on
  • 3. the number of tweets, the number of hashtags, the number of followers, confirmed (verified) user status, sentiment score, the number of retweets, and NLP methods. Stance detection is used to examine the stance of the author, so three outcomes are expected rather than two. It is a psychological model used by the authors, and stance detection has many other applications. The stance of the author can be Agreed, Neutral or Disagreed. Once all classes have been considered, a news story can be judged fake or genuine, and an authenticity rating for the story is given. The output is then classified using classification algorithms. However, when the detected stance is neutral, the story is judged neither true nor false. The process is then of limited use, because the result itself leaves the reader unsure whether to trust the story, which defeats the purpose of building the program. H. Parveen Sultanaa and Srijan Malhotra [4] report results that are not satisfactory across a variety of news. The results show that the SVM and logistic regression classifiers perform best on this dataset, with SVM slightly ahead of logistic regression; the F1 scores show the same. Because the training data is largely based on US politics and economic news, test statements related to US politics were classified correctly and fake news was detected, whereas test cases related to technology were predicted wrongly. The biggest drawback is that the data is erratic, so any prediction model can show anomalies and make mistakes. For future improvement, concepts such as POS tagging, word2vec and topic modelling can be used. 
These would give the model more depth in feature extraction and more fine-tuned classification. Rajendra Chatse and Pradeep Kumar Kale [5] describe the process of their project as tedious, even for an expert. The steps were: system login and registration, Twitter data scraping, conversion of the Twitter data to CSV, applying NLP and the algorithm, predicting positive, negative and neutral sentiment, and fake news detection. The paper describes a simple fake news detection method based on one machine learning algorithm, the naive Bayes classifier. The goal of the research is to examine how naive Bayes works for this particular problem, given a manually labeled news dataset, and to support the idea of using artificial intelligence for fake news detection. Further, this technique can be applied to social platforms like Facebook and Twitter by adding recent
  • 4. news and enhancing the fake news detection system. The main drawback was that the stored dataset had to be manually labeled, which is time consuming and inconvenient for large datasets. The difference between this paper and others on similar topics is that here the naive Bayes classifier was used specifically for fake news detection. The authors tested the difference in accuracy for articles of different lengths, and introduced web scraping, which gave insight into how the dataset can be updated regularly to check the truthfulness of recent Facebook posts. Shruthy S Shetty and KB Shreejith [6] note that fake news detection on social media has recently become an emerging research area that is attracting attention. Fake news is generated on purpose to mislead readers into believing false information, which makes it difficult and non-trivial to detect from content alone. Fake news on social media has existed for several years; however, there is no agreed definition of the term "fake news". To better guide future directions in fake news detection research, appropriate classifications are necessary. Social media has proved to be a powerful source for spreading fake news, so it is important to use some of the emerging patterns for detecting it. The main drawback here is the SVM algorithm, which is not well suited to large datasets. SVM does not perform well when the dataset is noisy, i.e. when the target classes overlap. When the number of features for each data point exceeds the number of training samples, the SVM will underperform. 
Nerissa Pereira and Sirman Dabreo [7] present a model for fake news detection using a variety of machine learning and deep learning algorithms. In the first level of implementation they investigated four different classifiers and compared their accuracies. The model with the highest accuracy is LSTM, at 93%. Fake news detection is a popular and trending research area with extremely few datasets. The model was run against the existing dataset and performs well on it. At the next level they analyzed real-time data from Twitter. Here the model was trained with logistic regression, because LSTM did not perform well on real-time tweets, which are considerably short. The accuracy of tweet classification using logistic regression was around 87%. There is also no visual presentation of the results. Hence, in future work we need
  • 5. to verify not just the language but also the images and audio embedded in the content. The method is Twitter-oriented only, so any news that is not on Twitter cannot be analyzed or predicted as real or fake, and for this method such data is useless. Z Khanam, B N Alwasel and H Sirafi [8] focus on detecting fake news by reviewing it in two stages: characterization and disclosure. In the first stage, the basic concepts and principles of fake news in social media are highlighted. In the discovery stage, current methods for detecting fake news with different supervised learning algorithms are reviewed. The text-analysis-based detection approaches presented in the paper use models based on speech characteristics and predictive models that do not fit with other current models. Among the reviewed work, one approach used a naive Bayes classifier to detect fake news from different sources, with an accuracy of 74%; another combined ML algorithms but depended on an unreliable probability threshold, with 85-91% accuracy; a third used naive Bayes to detect fake news from different social media websites, but the results were not accurate for untruthful sources. Christian Janze and Marten Risius [9] suggest that fake news sites could falsely suggest probity by choosing names, profile pictures and logos similar to those of reliable sources. Thus, source-centric attributes should be considered in future. Their study considered only the most apparent features of the news post, which are probably the most influential because of their exposed position. However, characteristics of the actual fake news text should also be assessed in future to determine whether a post is real or fake news. 
Beyond these considerations, the authors also excluded some seemingly relevant metrics, such as the percentage of post likes and the overall number of reactions, because of multicollinearity. Other limitations concern the generalizability of the findings. The detection in that work revolves only around political topics. While these are currently of predominant public interest, fake news can also target other areas such as science, sports or economics, which are not part of the study's sample. Nevertheless, as no topic-specific features are used, the authors are confident in the generalizability of their results. Furthermore, they only considered messages from Facebook, which are structurally and functionally distinct from those on other social media platforms. While Facebook is the social media platform where most news is consumed, other platforms are also subject to fake news and need their own means of detection. Next to this
  • 6. limitation, future advances in natural language generation could potentially bypass the detection system by incorporating its findings to create fake news that is indistinguishable from genuine news. Considering the alleged substantial effects of fake news on recent political events, the automatic detection of fake news has important practical consequences. For future research, the study provides a starting point for improving the detection of fake news, which could be expanded to other topics and tested using data from additional social media platforms. The current efforts of major platform operators to tag fake news manually are not efficient. Mykhailo Granik et al. [3] show a simple approach to fake news detection using a naive Bayes classifier. It was implemented as a software system and tested against a dataset of Facebook news posts. The posts were collected from five Facebook pages each from the right and from the left, as well as from three large mainstream political news pages (Politico News). They achieved a classification accuracy of approximately 74%. Classification accuracy for fake news is slightly worse, which is caused by the skew of the dataset: only 4.9% of it is fake news. Himank Gupta et al. [10] gave a framework based on different machine learning approaches that deals with problems such as accuracy shortage, time lag (BotMaker) and the high processing time needed to handle thousands of tweets per second. They start with 400,000 tweets from the HSpam14 dataset, comprising 150,000 spam tweets and 250,000 non-spam tweets, and derive some lightweight features along with the top 30 words providing the highest information gain from a bag-of-words model. They achieved an accuracy of 91.65% and surpassed the existing solution by approximately 18%. 
Marco L. Della Vedova et al. [11] first propose a novel ML fake news detection method which, by combining news content and social context features, outperforms existing methods in the literature, increasing accuracy up to 78.8%. Second, they implement the method within a Facebook Messenger chatbot and validate it with a real-world application, obtaining a fake news detection accuracy of 81.7%. Their goal was to classify a news item as reliable or fake; they first describe the datasets used for testing, then the content-based approach they implemented and the method they proposed to combine it with a social-based approach from the literature. The resulting dataset is composed of 15,500 posts from 32 pages (14
  • 7. conspiracy pages, 18 scientific pages), with more than 2,300,000 likes by 900,000+ users; 8,923 (57.6%) of the posts are hoaxes and 6,577 (42.4%) are non-hoaxes. Cody Buntain et al. [12] develop a method for automating fake news detection on Twitter by learning to predict accuracy assessments in two credibility-focused Twitter datasets: CREDBANK, a crowdsourced dataset of accuracy assessments for events in Twitter, and PHEME, a dataset of potential rumors in Twitter and journalistic assessments of their accuracies. They apply this method to Twitter content sourced from BuzzFeed's fake news dataset. A feature analysis identifies the features most predictive of crowdsourced and journalistic accuracy assessments, and the results are consistent with prior work. The method relies on identifying highly retweeted conversation threads and uses the features of these threads to classify stories, which limits its applicability to popular tweets. Since the majority of tweets are rarely retweeted, the method is usable on only a minority of Twitter conversations. Shivam B Parekh et al. [13] aim to give insight into the characterization of news stories in the modern diaspora, along with the different content types of news stories and their impact on readers. They then examine existing fake news detection approaches, which rely heavily on text-based analysis, and describe popular fake news datasets. They conclude by identifying four key open research challenges that can guide future research. It is a theoretical approach that illustrates fake news detection by analyzing psychological factors.
  • 8. 3. SOFTWARE DEVELOPMENTS 3.1 PROPOSED MODEL Social media is now one of the most accessible news sources for many people worldwide because of its low cost, quick access and rapid spread. However, this comes with mixed signals and a significant risk of exposure to 'false stories' written to mislead readers. Such information can sway public opinion and allow malicious groups to influence the outcome of public events such as elections. Today fake news ranges from satirical articles to fabricated stories and planned government propaganda in some outlets. Fake news and a lack of trust in the media have huge ramifications for society. We therefore propose a system to detect fake news, which is a classic text classification problem with a straightforward proposition: build a model that can differentiate between "Real" news and "Fake" news. The method is as follows. Pre-processing the data is the normal first step before training and evaluating machine learning algorithms. Machine learning algorithms are only as good as the data fed to them, so it is crucial that the data is formatted properly and that meaningful features are included, giving enough consistency to produce the best possible results. A TF-IDF vectorizer is used to extract features from the content, and these extracted features are used to train the ML algorithm (a passive aggressive classifier). 3.2 EXISTING SYSTEM A simple approach to fake news detection uses a KNN classifier. The feature probabilities are obtained using KNN, which is prone to misclassification and poor prediction. 
In this model, both training and testing data are first pre-processed by removing unwanted punctuation and words; next, feature extraction is used to extract the needed information from the pre-processed data. Supervised machine learning algorithms are applied to perform feature extraction and prediction. Classification is then done with an SVM, which predicts whether the news is fake or real. The models used in this paper are Support Vector Machine and Naive Bayes. SVM and Bayes are a type of classification algorithm capable of learning order dependence in sequence
  • 9. prediction problems. This classification algorithm is used. It was demonstrated that two layers were sufficient to detect more complex features. The main drawbacks of the existing system are: more layers can be better but are also more difficult to train. One layer works for simple problems, and two are sufficient to find relatively complex features. Developing a false perception about someone is one major drawback of fake news. 3.3 REQUIREMENTS SPECIFICATION 3.3.1 HARDWARE REQUIREMENTS RAM capacity: 8GB minimum, 16GB or higher. CPU: Intel Core i5 6th Generation processor or higher. Accessories: a computer system powerful enough to handle the necessary computation. 3.3.2 SOFTWARE REQUIREMENTS Operating system: Microsoft Windows 10 or Ubuntu. Language: Python 3.6. Tools: Anaconda, NumPy, Matplotlib, Sklearn. 3.3.3 PYTHON Python is an interpreted, object-oriented, high-level programming language with dynamic semantics. Its high-level built-in data structures, combined with dynamic typing and dynamic binding, make it very attractive for Rapid Application Development, as well as for use as a scripting or glue language to connect existing components together. Python's simple, easy to learn syntax emphasizes readability and therefore reduces the cost of program maintenance. Python supports modules and packages, which encourages program modularity and code reuse. The Python interpreter and the extensive standard library are available in source or binary form without charge for all major platforms, and can be freely distributed.
  • 10. Often, programmers fall in love with Python because of the increased productivity it provides. Since there is no compilation step, the edit-test-debug cycle is incredibly fast. Debugging Python programs is easy: a bug or bad input will never cause a segmentation fault. Instead, when the interpreter discovers an error, it raises an exception. When the program doesn't catch the exception, the interpreter prints a stack trace. A source level debugger allows inspection of local and global variables, evaluation of arbitrary expressions, setting breakpoints, stepping through the code a line at a time, and so on. The debugger is written in Python itself, testifying to Python's introspective power. On the other hand, often the quickest way to debug a program is to add a few print statements to the source: the fast edit-test-debug cycle makes this simple approach very effective. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library. Python's large standard library provides tools suited to many tasks, and is commonly cited as one of its greatest strengths. For Internet-facing applications, many standard formats and protocols such as MIME and HTTP are supported. It includes modules for creating graphical user interfaces, connecting to relational databases, generating pseudorandom numbers, arithmetic with arbitrary-precision decimals, manipulating regular expressions, and unit testing. Python consistently ranks as one of the most popular programming languages. 3.3.4 ANACONDA Anaconda is an open-source distribution of the Python and R programming languages for data science that aims to simplify package management and deployment. 
Package versions in Anaconda are managed by the package management system, conda, which analyzes the current environment before executing an installation to avoid disrupting other frameworks and packages. The Anaconda distribution comes with over 250 packages automatically installed. Over 7500 additional open-source packages can be installed from PyPI as well as the conda package and virtual environment manager. It also includes a GUI (graphical user interface), Anaconda Navigator, as a graphical alternative to the command line interface. Anaconda Navigator is included in the Anaconda distribution, and allows users to launch applications and manage conda packages, environments and channels without using
  • 11. command-line commands. Navigator can search for packages, install them in an environment, run the packages and update them. Anaconda is a distribution of the Python and R programming languages for scientific computing (data science, machine learning applications, large-scale data processing, predictive analysis, etc.) that aims to simplify package management and deployment. The distribution includes data-science packages suitable for Windows, Linux and macOS. It is developed and maintained by Anaconda, Inc., which was founded by Peter Wang and Travis Oliphant in 2012. As an Anaconda, Inc. product, it is also known as Anaconda Distribution or Anaconda Individual Edition, while other products from the company are Anaconda Team Edition and Anaconda Enterprise Edition, both of which are not free. With more than 300 libraries for data science available, Anaconda is a convenient choice for programmers working in data science. It comes with a wide variety of tools for collecting data from various sources and for applying machine learning and AI algorithms. It also helps in setting up an easily manageable environment from which a project can be deployed with a single click. 3.3.5 NUMPY NumPy is a Python library used for working with arrays. NumPy stands for Numerical Python. It also has functions for working in the domains of linear algebra, Fourier transforms and matrices. NumPy was created in 2005 by Travis Oliphant. It is an open source project and can be used freely. NumPy is written partially in Python, but most of the parts that require fast computation are written in C or C++. In Python, lists serve the purpose of arrays, but they are slow to process. 
NumPy aims to provide an array object that is up to 50x faster than traditional Python lists. The array object in NumPy is called ndarray; it comes with many supporting functions that make working with it easy. Arrays are used very frequently in data science, where speed and resources are very important. Unlike lists, NumPy arrays are stored in one contiguous block of memory, so processes can access and manipulate them very efficiently. This behavior is called locality of reference in computer science, and it is the main reason why NumPy is faster than lists. NumPy is also optimized to work with the latest CPU architectures. The source code for NumPy is hosted on GitHub.
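As a rough illustration of the ndarray behaviour described above, the following minimal sketch builds the same data as a Python list and as a NumPy array and applies one element-wise operation to each. The variable names and the array size are illustrative assumptions and are not taken from the project; no timing figures are claimed here.

```python
import numpy as np

# Illustrative data: the same 1,000 values as a list and as an ndarray.
values = list(range(1_000))
arr = np.array(values, dtype=np.float64)

# With a list, the element-wise square needs an explicit Python-level loop.
squared_list = [v * v for v in values]

# With an ndarray, the same operation is a single vectorized expression.
squared_arr = arr * arr

print(type(arr).__name__)                      # ndarray
print(arr.flags["C_CONTIGUOUS"])               # True: one contiguous memory block
print(np.allclose(squared_arr, squared_list))  # True: both give the same values
```

The vectorized form moves the loop out of Python and into compiled code that walks a contiguous buffer, which is where the speed advantage over lists comes from.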
  • 12. 3.3.6 PANDAS Pandas is an open source Python package that is widely used for data science, data analysis and machine learning tasks. It is built on top of another package named NumPy, which provides support for multi-dimensional arrays. As one of the most popular data wrangling packages, Pandas works well with many other data science modules inside the Python ecosystem, and is typically included in every Python distribution, from those that come with your operating system to commercial vendor distributions like ActiveState's ActivePython. Pandas makes it simple to do many of the time consuming, repetitive tasks associated with working with data. Pandas is an open-source library that provides high-performance data manipulation in Python. It was developed by Wes McKinney in 2008. The name Pandas is derived from "panel data", an econometrics term for multidimensional data. Data analysis requires a lot of processing, such as restructuring, cleaning or merging. Different tools are available for fast data processing, such as NumPy, SciPy, Cython and Pandas, but we prefer Pandas because working with it is fast, simple and more expressive than other tools. Pandas is built on top of the NumPy package, which means NumPy is required to operate Pandas. Before Pandas, Python was capable of data preparation but provided only limited support for data analysis. 
Pandas came into the picture and enhanced these data analysis capabilities. It can perform the five significant steps required for processing and analyzing data irrespective of its origin: load, manipulate, prepare, model and analyze. 3.3.7 MATPLOTLIB Matplotlib is a cross-platform data visualization and graphical plotting library for Python and its numerical extension NumPy. As such, it offers a viable open source alternative to MATLAB. Developers can also use Matplotlib's APIs (Application Programming Interfaces) to embed plots in GUI applications. A Python Matplotlib script is structured so
  • 13. that a few lines of code are all that is required in most instances to generate a visual data plot. Matplotlib is a visualization library in Python for 2D plots of arrays. It is a multi-platform data visualization library built on NumPy arrays and designed to work with the broader SciPy stack. It was introduced by John Hunter in 2002. One of the greatest benefits of visualization is that it gives visual access to huge amounts of data in easily digestible form. Matplotlib provides several plot types such as line, bar, scatter and histogram. 3.3.8 SKLEARN Scikit-learn (Sklearn) is a widely used and robust library for machine learning in Python. It provides a selection of efficient tools for machine learning and statistical modeling, including classification, regression, clustering and dimensionality reduction, via a consistent interface in Python. This library, which is largely written in Python, is built upon NumPy, SciPy and Matplotlib. 
It was originally called scikits.learn and was initially developed by David Cournapeau as a Google Summer of Code project in 2007. Later, in 2010, Fabian Pedregosa, Gael Varoquaux, Alexandre Gramfort and Vincent Michel, from INRIA (French Institute for Research in Computer Science and Automation), took the project to another level and made the first public release (v0.1 beta) on 1 February 2010.
  • 14. 4. MODULE DESCRIPTION 4.1 DATA PREPROCESSING Data pre-processing is the normal first step before training and evaluating machine learning models. Machine learning algorithms are only as useful as the data fed to them. It is important to format the data correctly and to include relevant features so that the data is consistent enough to produce the best possible outcomes. Stop-word removal, tokenization, lowercasing, sentence segmentation and punctuation removal are all examples of data refinement. Irrelevant information must be removed, which reduces the size of the actual data. We created a generic processing function that is applied to each document: it removes punctuation and non-letter characters and then lowercases the text. The cleaning steps include removing all non-alphanumeric characters, deleting stop words and deleting rows with missing values. 4.2 FEATURE EXTRACTION Feature extraction is a dimensionality-reduction method that reduces an initial set of raw data to more manageable groups for processing. A distinguishing characteristic of these large datasets is the large number of variables, which require substantial computing resources to process. To begin, we extract a number of language features for the fake news detection models. A model built on a count vectorizer (word tallies) or on a term frequency-inverse document frequency (TF-IDF) matrix can only get us so far, because these models ignore important qualities such as word order and context. Two articles with similar word counts may have completely different meanings. The data science community has responded with approaches that address this problem. 
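The cleaning steps and the word-order limitation described above can be sketched in a few lines of plain Python. This is a minimal sketch under stated assumptions, not the project's actual processing function: `clean_text`, `bag_of_words`, the tiny `STOP_WORDS` set and the two example sentences are all hypothetical names and data chosen for illustration.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real projects use a fuller list).
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "and"}

def clean_text(text):
    """Lowercase, replace non-letter characters with spaces, tokenize, drop stop words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOP_WORDS]

def bag_of_words(tokens):
    """Count how often each token occurs, ignoring where it occurs."""
    return Counter(tokens)

doc_a = "The dog bit the man!"
doc_b = "The man bit the dog."

tokens_a = clean_text(doc_a)
tokens_b = clean_text(doc_b)

print(tokens_a)  # ['dog', 'bit', 'man']
print(tokens_b)  # ['man', 'bit', 'dog']

# The two sentences mean different things, yet their word counts are identical,
# so any count-based representation cannot tell them apart.
print(bag_of_words(tokens_a) == bag_of_words(tokens_b))  # True
```

Because TF-IDF weights are computed from these same per-document counts, the two sentences would also receive identical TF-IDF vectors, which is the limitation the section points out.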
4.3 ALGORITHM TRAINING
The idea of using training data in machine learning programs is simple, but it is also fundamental to the way these technologies work. The training data is an initial set of data used to help a program learn how to apply machine learning techniques and produce specialized results. It may be complemented by subsequent sets of data called validation and test sets. It can process not only individual data points, but also whole data sequences.
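As a rough illustration of the training, validation and test sets mentioned above, the following sketch shuffles a data set and cuts it into three parts. The function name `train_val_test_split`, the 70/10/20 proportions and the synthetic rows are assumptions made for this example. The report's own sample code uses scikit-learn's `train_test_split` with a single 80/20 split.

```python
import random

def train_val_test_split(rows, val_fraction=0.1, test_fraction=0.2, seed=7):
    """Shuffle `rows` and cut them into training, validation and test lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed gives a repeatable split
    n_test = round(len(shuffled) * test_fraction)
    n_val = round(len(shuffled) * val_fraction)
    test = shuffled[:n_test]
    val = shuffled[n_test:n_test + n_val]
    train = shuffled[n_test + n_val:]
    return train, val, test

# Synthetic placeholder rows of the form (text, label).
rows = [(f"article {i}", "FAKE" if i % 2 else "REAL") for i in range(100)]
train, val, test = train_val_test_split(rows)
print(len(train), len(val), len(test))
```

The model is fitted on the training part only. The validation part is typically used to compare settings, and the test part is kept aside for the final accuracy estimate.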
4.4 PREDICTION
A prediction is a statement about a particular outcome, and predictions can be helpful for planning. Usually the data set is separated into a training set and a test set: the majority of the data is used for training, while only a small portion is used for testing. A message box module is used to build an interface that reports whether a statement is fake or original. Using the trained model, the machine can predict the output. The same preprocessing and feature extraction are also applied to the test data.
In today's society, it is crucial to monitor fake stories online, as news reporting is produced quickly because of easily accessible technology. There are seven major categories of fake news, and a piece of fake news content can be textual or visual. Linguistic as well as non-linguistic indicators can be analyzed by several techniques to identify false news. Although several of these methods are usually efficient in identifying fake news, they are limited.
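Section 4.4 notes that feature extraction is applied to the test data as well. A minimal sketch of that idea follows: the weights are learned from the training documents only and then reused, unchanged, on a new statement. The helpers `fit_idf` and `transform`, the three training sentences and the new statement are hypothetical. The smoothed IDF formula is an assumption chosen for illustration, and other TF-IDF variants exist.

```python
import math
from collections import Counter

def fit_idf(train_docs):
    """Learn inverse document frequencies from the training documents only."""
    n_docs = len(train_docs)
    doc_freq = Counter()
    for doc in train_docs:
        doc_freq.update(set(doc.lower().split()))  # count each word once per document
    # Smoothed IDF: ln((1 + N) / (1 + df)) + 1 (an assumed variant).
    return {term: math.log((1 + n_docs) / (1 + df)) + 1
            for term, df in doc_freq.items()}

def transform(doc, idf):
    """Map a document to TF-IDF weights; words unseen in training are dropped."""
    counts = Counter(word for word in doc.lower().split() if word in idf)
    return {term: count * idf[term] for term, count in counts.items()}

train_docs = [
    "election results announced today",
    "shocking secret about election",
    "weather forecast for today",
]
idf = fit_idf(train_docs)

# A new statement is transformed with the weights learned above, not refitted.
new_statement = "shocking election rumour"
features = transform(new_statement, idf)
print(features)
```

The word "rumour" never appeared in training, so it contributes nothing, and "shocking" receives a higher weight than "election" because it occurs in fewer training documents.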
5. METHODOLOGY
The main objective is to detect fake news, which is a classic text classification problem with a straightforward proposition: build a model that can differentiate between "Real news" and "Fake news". The method consists of the following steps:
1. Acquiring and loading the data.
2. Cleaning the dataset.
3. Removing extra symbols.
4. Removing punctuation.
5. Removing the stop words.
6. Stemming.
7. Tokenization.
8. Feature extraction.
9. TF-IDF vectorizer.
10. Count vectorizer with TF-IDF transformer.
11. Machine learning model training and verification.
Preprocessing the data is the normal first step before training and evaluating machine learning algorithms. Machine learning algorithms are only as good as the data you feed them. It is crucial that the data is formatted properly and that meaningful features are included, so that it is consistent enough to give the best possible results. A TF-IDF vectorizer is used to extract features from the content. The extracted features are then used to train the machine learning algorithm (a passive aggressive classifier).
Fig. 5.1 Architecture
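The cleaning steps listed above (removing symbols and punctuation, removing stop words, tokenization, stemming) could be sketched roughly as follows. This is a simplified illustration, not the report's actual processing function: the stop-word list is a tiny hypothetical sample, `crude_stem` is a stand-in for a real stemmer such as Porter's, and tokenization is done before stemming for convenience.

```python
import re

# A tiny illustrative stop-word list; real lists are much longer.
STOP_WORDS = {"a", "an", "the", "is", "are", "was", "of", "to", "in", "on", "at", "and"}

def crude_stem(word):
    """Strip a few common English suffixes (a stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

def preprocess(text):
    """Clean one document and return its list of stemmed tokens."""
    text = text.lower()                      # lower case
    text = re.sub(r"[^a-z\s]", " ", text)    # drop punctuation, digits, symbols
    tokens = text.split()                    # whitespace tokenization
    tokens = [t for t in tokens if t not in STOP_WORDS]
    return [crude_stem(t) for t in tokens]

tokens = preprocess("BREAKING: The senators voted on 3 shocking bills!!!")
print(tokens)
```

The crude stemmer produces stems such as "vot" for "voted". Stems only need to be consistent across documents, not to be dictionary words.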
6. EXPERIMENTAL ANALYSIS
6.1 SAMPLE CODE
import pickle

import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Read the data
df = pd.read_csv('news.csv')

# Shape and head
print('Rows and columns', df.shape)
print('First 5 rows', df.head())
labels = df.label
print('labels:', labels.head())

# Hold out 20% of the articles for testing
x_train, x_test, y_train, y_test = train_test_split(df['text'], labels, test_size=0.2, random_state=7)

# Initialize a TfidfVectorizer
tfidf_vectorizer = TfidfVectorizer(stop_words='english', max_df=0.7)

# Fit and transform train set, transform test set
tfidf_train = tfidf_vectorizer.fit_transform(x_train)
tfidf_test = tfidf_vectorizer.transform(x_test)

# Initialize a PassiveAggressiveClassifier
pac = PassiveAggressiveClassifier(max_iter=100)
pac.fit(tfidf_train, y_train)

# Save the fitted vectorizer and classifier
with open('vectors.pickle', 'wb') as f:
    pickle.dump(tfidf_vectorizer, f)
with open('fakenews.pickle', 'wb') as f:
    pickle.dump(pac, f)

# Load them back, as the prediction interface would
with open('fakenews.pickle', 'rb') as pkl:
    pac = pickle.load(pkl)
with open('vectors.pickle', 'rb') as vec:
    tf_vect = pickle.load(vec)

# Predict on the test set and calculate accuracy
y_pred = pac.predict(tfidf_test)
score = accuracy_score(y_test, y_pred)
print(f'Accuracy: {round(score * 100, 2)}%')

# Build confusion matrix
print(confusion_matrix(y_test, y_pred, labels=['FAKE', 'REAL']))

# Classify a single headline
text = 'Watch The Exact Moment Paul Ryan Committed Political Suicide At A Trump Rally (VIDEO)'
tf_text = tf_vect.transform([text])
pred = pac.predict(tf_text)
print(pred)
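To help read the accuracy and confusion matrix printed by the sample code, the following plain-Python sketch computes both by hand on a small set of made-up labels. The helpers `confusion_counts` and `accuracy` are illustrative stand-ins, not scikit-learn functions. They follow the same convention as `confusion_matrix` with `labels=['FAKE', 'REAL']`: rows are true labels and columns are predicted labels.

```python
def confusion_counts(y_true, y_pred, labels):
    """Return a matrix with true labels as rows and predicted labels as columns."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, guess in zip(y_true, y_pred):
        matrix[index[truth]][index[guess]] += 1
    return matrix

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels for eight articles (not results from the report).
y_true = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL", "REAL", "FAKE"]
y_pred = ["FAKE", "FAKE", "REAL", "REAL", "REAL", "FAKE", "REAL", "FAKE"]

matrix = confusion_counts(y_true, y_pred, labels=["FAKE", "REAL"])
print(matrix)                    # [[3, 1], [1, 3]]
print(accuracy(y_true, y_pred))  # 0.75
```

The first row shows that three fake articles were caught and one was missed. The second row shows that one real article was wrongly flagged and three were classified correctly.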
6.2 IMPLEMENTATION
Fig. 6.1 User Interface
Fig. 6.2 Home page
Fig. 6.3 Admin login
Fig. 6.4 User login
Fig. 6.5 News Uploading
Fig. 6.6 News prediction
Fig. 6.7 User registration
7. CONCLUSION
The concept of deception detection in social media is particularly new, and there is ongoing research in the hope that scholars can find more accurate ways to detect false information in this booming, fake-news-infested domain. For this reason, this research may help other researchers discover which combination of methods should be used to accurately detect fake news in social media. The proposed method described in this paper is an idea for a more accurate fake news detection algorithm. It is important that we have some mechanism for detecting fake news, or at the very least an awareness that not everything we read on social media may be true, so we always need to think critically. This way we can help people make more informed decisions, and they will not be fooled into believing what others want to manipulate them into believing.
Fake news interferes with a user's ability to discern useful information from Internet services, especially when news becomes critical for decision making. Considering the changing landscape of the modern business world, the issue of fake news has become more than just a marketing problem, as it warrants serious efforts from security researchers. It is imperative that any attempts to manipulate or troll the Internet through fake news are countered with absolute effectiveness. We proposed a simple but effective approach that allows users to install a simple tool into their personal browser and use it to detect and filter out potential clickbait. The preliminary experiments conducted to assess the method's ability to attain its intended objective showed outstanding performance in identifying possible sources of fake news.
Since we started this work, a few fake news datasets have been made available, and we are currently expanding our approach using R to test its effectiveness against the new datasets. In the 21st century, the majority of tasks are done online. Newspapers that were earlier preferred as hard copies are now being replaced by applications like Facebook and Twitter and by news articles read online. WhatsApp forwards are also a major source. The growing problem of fake news only makes things more complicated, and it tends to change or hamper people's opinions and attitudes towards the use of digital technology. When a person is deceived by such news, they start to believe that their perceptions about a particular topic are true as assumed. Thus, in order to curb the phenomenon, we have developed our fake news detection system, which takes input from the user and classifies it as true or fake. To implement this, various NLP and machine learning techniques have to be used. The model is trained using an appropriate dataset, and performance evaluation is done using various performance measures. The best model, i.e. the model with the highest accuracy, is used to classify the news headlines or articles. As evident above for static search, our best model came out to be logistic regression with an accuracy of 65%. We then used grid search parameter optimization to increase the performance of logistic regression, which gave us an accuracy of 75%. Hence we can say that if a user feeds a particular news article or its headline into our model, there is a 75% chance that it will be classified according to its true nature. The user can check the news article or keywords online, and can also check the authenticity of the website. The accuracy of the dynamic system is 93%, and it increases with every iteration. We intend to build our own dataset, which will be kept up to date with the latest news. All the live news and latest data will be kept in a database using a web crawler and an online database.