SRI SHANMUGHA COLLEGE OF ENGINEERING AND TECHNOLOGY
FAKE NEWS DETECTION
USING PYTHON AND MACHINE LEARNING
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE
PRESENTED BY
RAJESHWARI.J (732720243021)
SHANILA RINSI.T (732720243025)
SHAMILSHA (732720243024)
GUIDED BY
JASEENASH.R , AP/CSE
ABSTRACT
• Fake news is news designed to deliberately spread hoaxes,
propaganda, misinformation, or falsehoods, and its volume is
overwhelming.
• The main objective of this project is to detect fake news, a classic
text classification problem with a straightforward proposition: build
a model that can differentiate real news from fake news.
• The project uses the Logistic Regression, Decision Tree, Gradient
Boosting, and Random Forest algorithms to detect fake news, with term
frequency-inverse document frequency (TF-IDF) used to vectorize the
news content.
DOMAIN INTRODUCTION
• Fake news and lack of trust in the media are growing problems with huge
ramifications for our society. A purposely misleading story is clearly
“fake news”, but the discourse on social media is lately changing that
definition.
• Disinformation, also known as fake news, is overwhelming.
• The phrase “fake news” was named word of the year for 2016 by the
Macquarie Dictionary. Traditional media is therefore urged to be more
creative so that it can gain more attention from the public.
• During a social event such as the coronavirus outbreak in 2020, official
organizations have been countering fake news from the beginning. EPI-WIN,
a project led by the World Health Organization, is available online to
provide credible information and to flag fake news about the disease.
PROBLEM DEFINITION
• The effects of fake news can reach politics, the economy, businesses,
organizations, health, or even personal life. A model is needed
that can differentiate between “Real” news and “Fake” news.
• The extensive spread of fake news has the potential for extremely
negative impacts on individuals and society. Therefore, fake news
detection has become a challenging task in today's world.
• Using scikit-learn, build a TF-IDF vectorizer on the provided dataset,
then initialize a Passive Aggressive Classifier and fit the model. In
the end, the accuracy score and the confusion matrix tell us how well
the model fares.
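The workflow in the last bullet can be sketched as follows. This is a minimal illustration, not the project's actual code: the eight-sentence corpus and its labels are hypothetical placeholders, and the parameter values (`test_size`, `max_iter`, `random_state`) are assumptions chosen only to make the example small and repeatable.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Hypothetical toy corpus; the real project would load its dataset instead.
texts = [
    "aliens secretly control the world government",
    "miracle pill cures every disease overnight",
    "celebrity clone spotted at secret moon base",
    "scientists admit the earth is actually flat",
    "parliament approves the annual budget bill",
    "central bank keeps interest rates unchanged",
    "city council opens a new public library",
    "health ministry publishes vaccination schedule",
]
labels = ["FAKE"] * 4 + ["REAL"] * 4

# Hold out a quarter of the samples, keeping both classes in each split.
x_train, x_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=0
)

# Learn the TF-IDF vocabulary on the training texts only, then reuse it.
vectorizer = TfidfVectorizer(stop_words="english")
tfidf_train = vectorizer.fit_transform(x_train)
tfidf_test = vectorizer.transform(x_test)

# Fit the Passive Aggressive Classifier on the TF-IDF features.
classifier = PassiveAggressiveClassifier(max_iter=50, random_state=0)
classifier.fit(tfidf_train, y_train)

# Evaluate with the accuracy score and the confusion matrix.
predictions = classifier.predict(tfidf_test)
accuracy = accuracy_score(y_test, predictions)
matrix = confusion_matrix(y_test, predictions, labels=["FAKE", "REAL"])
print(f"Accuracy: {accuracy:.2f}")
print(matrix)
```

With a corpus this small the accuracy figure carries no meaning; the sketch only shows where each step sits in the pipeline.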
EXISTING SYSTEM
• How to enforce user privacy preferences.
• How to secure data when it is stored in the PDS.
• Users are not skilled enough to translate their privacy requirements
into privacy preferences.
• Average users might have difficulty properly setting potentially
complex privacy preferences.
DISADVANTAGES OF EXISTING SYSTEM
• The personal data we produce digitally are scattered and managed by
different providers (e.g., online social media, hospitals, banks,
airlines, etc.).
• Users are losing control of their data, whose protection is the
responsibility of the data provider.
• Users cannot fully exploit their data, since each provider keeps a
separate view of them.
PROPOSED SYSTEM
By using the Logistic Regression, Decision Tree, Gradient Boosting, and
Random Forest algorithms, we build our model so as to increase
performance and accuracy.
The proposed system is cost-effective.
• Does not require any external hardware implementation.
• Early detection reduces fatalities.
• System-generated results are less prone to error.
• Reduces human effort and intervention.
• Drastically reduces time compared to manual detection.
• Accuracy in prediction.
• Flexible and portable.
Advantages
• This project applies NLP (Natural Language Processing) techniques
to detect 'fake news', that is, misleading news stories that come
from non-reputable sources.
• The main objective of this project is to detect fake news by
using Natural Language Processing techniques.
OBJECTIVES
REQUIREMENTS SPECIFICATION
HARDWARE REQUIREMENTS
• System : Intel Core i5.
• Hard Disk : 500 GB.
• RAM : 4 GB.
• Any desktop or laptop system with the above configuration or higher.
SOFTWARE REQUIREMENTS
• Operating System : Windows XP/7/10
• Coding language : Python and various libraries.
• Software : Anaconda.
• IDE : Jupyter Notebook, PyCharm.
• Front-end : HTML.
• Back-end : Flask framework
METHODOLOGY
DATA COLLECTION
In this project, the dataset is taken from Kaggle.com. The size of the
dataset is 6335 × 4, meaning that there are 6335 rows and 4 columns. The
column names are ‘URLs’, ‘Headline’, ‘Body’ and ‘Label’. The first column
identifies the news item, the second and third are the title and text, and
the fourth column holds the label denoting whether the news is REAL or
FAKE.
FEATURE EXTRACTION
To analyze and model text after it has been preprocessed, it must
first be converted into features. Techniques include Bag of Words
and the TF-IDF Vectorizer.
Term Frequency-Inverse Document Frequency
The TF-IDF weight of a word increases proportionally with the number of times
the word appears in a document but is offset by the word's frequency in the
overall corpus. While TF-IDF is a good basic metric for extracting descriptive
terms, it does not take into consideration a word’s position or context.
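To make the weighting concrete, here is a minimal pure-Python sketch of the textbook form, tf × log(N / df). The function name `tf_idf` and the three example sentences are hypothetical. scikit-learn's TfidfVectorizer uses a smoothed and normalized variant by default, so its numbers will differ from these.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    tokenized = [doc.lower().split() for doc in docs]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for tokens in tokenized for term in set(tokens))
    weights = []
    for tokens in tokenized:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Hypothetical example documents.
docs = [
    "the senate passed the bill",
    "the aliens passed the moon",
    "the bill shocked the senate",
]
scores = tf_idf(docs)
for term, weight in sorted(scores[1].items(), key=lambda item: -item[1]):
    print(f"{term:8s} {weight:.3f}")
```

A word that occurs in every document ("the") gets weight zero, while a word unique to one document ("aliens") gets the highest weight in that document.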
Bag of words
This model analyzes the text from all input documents and converts it into
bag-of-words form. For example, for a set of text documents we can have one
bag of words that contains all the distinct words from all the texts.
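A small sketch of the idea, assuming simple whitespace tokenization: the shared vocabulary is the set of distinct words across all texts, and each document becomes a vector of word counts over that vocabulary. The helper name `bag_of_words` and the two sentences are hypothetical.

```python
from collections import Counter


def bag_of_words(docs):
    """Build one shared vocabulary and a count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    # One bag holding every distinct word from all documents.
    vocabulary = sorted({term for tokens in tokenized for term in tokens})
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        vectors.append([counts[term] for term in vocabulary])
    return vocabulary, vectors


# Hypothetical example documents.
docs = ["fake news spreads fast", "real news spreads slowly"]
vocabulary, vectors = bag_of_words(docs)
print(vocabulary)
for vector in vectors:
    print(vector)
```

Word order is discarded: only the counts per vocabulary entry survive, which is why the representation is called a "bag".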
CLASSIFICATION
Naive Bayes Classifier
In machine learning, Naive Bayes classifiers are a family of simple
“probabilistic classifiers” based on applying Bayes’ theorem with
strong (naïve) independence assumptions between the features.
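The independence assumption means the score of a class is the log prior plus a sum of per-word log likelihoods. Below is a minimal sketch of a multinomial Naive Bayes with add-one (Laplace) smoothing. The function names and the four training sentences are hypothetical, and this is not the scikit-learn `MultinomialNB` implementation.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels):
    """Count documents per class and word occurrences per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc.lower().split())
    vocabulary = {word for counts in word_counts.values() for word in counts}
    return class_counts, word_counts, vocabulary


def predict_naive_bayes(model, doc):
    """Return the class with the highest log posterior score."""
    class_counts, word_counts, vocabulary = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        # log P(class) + sum of log P(word | class); words treated as independent.
        score = math.log(n_docs / total_docs)
        total_words = sum(word_counts[label].values())
        for word in doc.lower().split():
            if word not in vocabulary:
                continue  # ignore words never seen in training
            # Add-one smoothing avoids zero probabilities.
            likelihood = (word_counts[label][word] + 1) / (total_words + len(vocabulary))
            score += math.log(likelihood)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical training data.
train_docs = [
    "shocking miracle cure doctors hate",
    "aliens secretly control the government",
    "senate passes annual budget bill",
    "central bank holds interest rates",
]
train_labels = ["FAKE", "FAKE", "REAL", "REAL"]
model = train_naive_bayes(train_docs, train_labels)
print(predict_naive_bayes(model, "miracle cure shocking"))
print(predict_naive_bayes(model, "senate budget bill"))
```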
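Passive Aggressive Classifier
The Passive Aggressive Classifier is an online learning algorithm. It suits
fake news detection on social media, where new data is added every second:
the model stays passive when a prediction is correct and updates
aggressively when it is wrong.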
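The online behaviour can be sketched with the PA-I update rule for a linear model, processing one sample at a time. The feature layout (count of "shocking", count of "official", constant bias term), the label encoding (+1 for real, -1 for fake), and the tiny stream are assumptions made for illustration only.

```python
def pa_update(weights, features, label, aggressiveness=1.0):
    """One Passive Aggressive (PA-I) step; label must be +1 or -1."""
    margin = label * sum(w * x for w, x in zip(weights, features))
    loss = max(0.0, 1.0 - margin)  # hinge loss
    if loss == 0.0:
        return weights  # passive: the sample is classified with enough margin
    norm_sq = sum(x * x for x in features)
    if norm_sq == 0.0:
        return weights  # nothing to learn from an all-zero sample
    # Aggressive: move just far enough to fix the mistake, capped by C.
    step = min(aggressiveness, loss / norm_sq)
    return [w + step * label * x for w, x in zip(weights, features)]


def score(weights, features):
    return sum(w * x for w, x in zip(weights, features))


# Hypothetical stream of (features, label) pairs arriving one by one.
stream = [
    ([1.0, 0.0, 1.0], -1),
    ([0.0, 1.0, 1.0], +1),
    ([1.0, 0.0, 1.0], -1),
    ([0.0, 1.0, 1.0], +1),
]
weights = [0.0, 0.0, 0.0]
for features, label in stream:
    weights = pa_update(weights, features, label)
print(weights)
```

Because each update needs only the current sample, the model never has to be retrained on the full history as new articles arrive.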
LANGUAGE USED FOR IMPLEMENTATION
Python is an interpreted, object-oriented programming language similar
to Perl that has gained popularity because of its clear syntax and
readability.
The source code is freely available and open for modification and reuse.
Features Of Python
• Easy to understand and read
• Interpreted Language
• Cross-platform Language
• Free and Open Source
• Object-Oriented Language
• Extensible
• GUI Programming Support
Advantages Of Python
• Presence of Third-Party Modules
• Open Source and Community Development
• Learning Ease and Support Available
• User-friendly Data Structures
• Productivity and Speed
FRONT-END: HTML
HTML provides a means to create structured documents by denoting structural
semantics for text such as headings, paragraphs, lists, links, quotes and other items.
Browsers do not display the HTML tags but use them to interpret the content of the page.
BACK-END: FLASK FRAMEWORK
Flask is a micro web framework written in Python that can be used for building complex
database-driven websites, starting with mostly static pages. It is classified as a
microframework because it does not require particular tools or libraries. Applications that
use the Flask framework include Pinterest and LinkedIn.
PLATFORM
PyCharm is an integrated development environment (IDE) used in computer
programming, specifically for the Python language. It is developed by the Czech
company JetBrains. It provides code analysis, a graphical debugger, an integrated unit
tester, integration with version control systems (VCS), and supports web development
with Django as well as data science with Anaconda.
• Python refactoring: includes rename, extract method, introduce variable,
introduce constant, pull up, push down and others.
• Support for web frameworks: Django, web2py and Flask.
• Integrated Python debugger.
JUPYTER NOTEBOOK
A Jupyter Notebook can be converted to a number of open standard output formats, such
as HTML, presentation slides, LaTeX, PDF, reStructuredText, Markdown, and Python, through
‘Download As’ in the web interface or via the nbconvert library. The nbviewer service takes
a URL to any publicly available notebook document, converts it to HTML on the fly, and
displays it to the user.
ALGORITHMS
LOGISTIC REGRESSION
• Logistic regression is one of the most popular machine learning algorithms and
comes under the supervised learning technique.
• It is used for predicting a categorical dependent variable from a given set of
independent variables.
• It gives probabilistic values that lie between 0 and 1.
• Logistic regression is similar to linear regression; the two differ in how they
are used.
• Logistic regression is used for solving classification problems.
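The reason the outputs lie between 0 and 1 is that logistic regression passes a linear combination of the features through the sigmoid function. A minimal sketch, in which the weights, bias, and feature values are hypothetical numbers rather than fitted parameters:

```python
import math


def sigmoid(z):
    """Map any real number to the open interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_probability(weights, bias, features):
    """Linear score, as in linear regression, squashed into a probability."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)


# Hypothetical parameters and input.
weights = [1.5, -2.0]
bias = 0.1
features = [2.0, 0.5]
probability = predict_probability(weights, bias, features)
predicted_class = 1 if probability >= 0.5 else 0
print(f"{probability:.3f} -> class {predicted_class}")
```

The linear part is the same as in linear regression. Thresholding the probability (here at 0.5, a common default) turns it into a classifier.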
STEPS IN LOGISTIC REGRESSION
• Data pre-processing
• Fitting logistic regression to the training set
• Predicting the test result
• Testing the accuracy of the result (creation of a confusion matrix)
• Visualizing the test set result
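The steps above, except the visualization, can be sketched with scikit-learn as follows. The training and test sentences are hypothetical, and chaining TF-IDF pre-processing with the classifier in a pipeline is one reasonable arrangement, not necessarily the one used in the project.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.pipeline import make_pipeline

# Hypothetical labelled examples.
train_texts = [
    "miracle cure shocks doctors everywhere",
    "secret aliens control the government",
    "shocking secret cure banned by doctors",
    "parliament passes the annual budget",
    "central bank holds interest rates",
    "government publishes annual budget report",
]
train_labels = ["FAKE", "FAKE", "FAKE", "REAL", "REAL", "REAL"]
test_texts = ["shocking miracle cure", "bank publishes annual report"]
test_labels = ["FAKE", "REAL"]

# Steps 1 and 2: pre-process (TF-IDF) and fit logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Step 3: predict the test result, with class probabilities.
predicted = model.predict(test_texts)
probabilities = model.predict_proba(test_texts)

# Step 4: test accuracy and build the confusion matrix.
accuracy = accuracy_score(test_labels, predicted)
matrix = confusion_matrix(test_labels, predicted, labels=["FAKE", "REAL"])
print(predicted, accuracy)
print(matrix)
```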
DECISION TREE CLASSIFICATION
• Decision Tree is a Supervised learning technique that can be used for both classification
and Regression problems, but mostly it is preferred for solving Classification problems.
• It is a tree-structured classifier, where internal nodes represent the features of a dataset,
branches represent the decision rules and each leaf node represents the outcome.
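The structure described above can be illustrated with a hand-written tree stored as nested dictionaries. The features `has_author` and `all_caps_title` and the rules themselves are hypothetical; a real decision tree learns its splits from training data.

```python
# Hypothetical hand-written tree: internal nodes test a feature,
# branches carry the decision rule, leaves carry the outcome.
tree = {
    "feature": "has_author",
    "branches": {
        False: {"leaf": "FAKE"},
        True: {
            "feature": "all_caps_title",
            "branches": {
                True: {"leaf": "FAKE"},
                False: {"leaf": "REAL"},
            },
        },
    },
}


def classify(node, sample):
    """Follow decision rules from the root until a leaf is reached."""
    while "leaf" not in node:
        feature_value = sample[node["feature"]]
        node = node["branches"][feature_value]
    return node["leaf"]


print(classify(tree, {"has_author": True, "all_caps_title": False}))
```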
GRADIENT BOOSTING CLASSIFIER
• Gradient Boosting is a popular boosting algorithm. In gradient boosting, each
predictor corrects its predecessor’s error.
• In contrast to AdaBoost, the weights of the training instances are not tweaked;
instead, each predictor is trained using the residual errors of its predecessor as labels.
• Gradient Boosted Trees is a technique whose base learner is CART
(Classification and Regression Trees).
• [Diagram: how gradient boosted trees are trained for regression problems.]
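The residual-fitting idea can be sketched for a one-dimensional regression problem. Here depth-1 trees (stumps) stand in for CART base learners, the loss is squared error, and the data, the learning rate, and the number of rounds are all assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimises squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, n_rounds=20, learning_rate=0.5):
    """Each new stump is trained on the residuals of the current ensemble."""
    base = sum(ys) / len(ys)
    stumps = []
    predictions = [base] * len(ys)
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, predictions)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        predictions = [p + learning_rate * stump(x)
                       for x, p in zip(xs, predictions)]

    def predict(x):
        return base + learning_rate * sum(stump(x) for stump in stumps)

    return predict


def mean_squared_error(ys, predictions):
    return sum((y - p) ** 2 for y, p in zip(ys, predictions)) / len(ys)


# Hypothetical training data.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 0.9, 3.1, 3.0, 2.9]
model = gradient_boost(xs, ys)
baseline_mse = mean_squared_error(ys, [sum(ys) / len(ys)] * len(ys))
boosted_mse = mean_squared_error(ys, [model(x) for x in xs])
print(f"baseline MSE {baseline_mse:.4f}, boosted MSE {boosted_mse:.4f}")
```

Unlike AdaBoost, no sample weights appear anywhere: the only thing passed from one round to the next is the list of residuals.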
SOURCE CODE
# This Python 3 environment comes with many helpful analytics libraries installed
# It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
# For example, here's several helpful packages to load
import numpy as np  # linear algebra
import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)
# Input data files are available in the read-only "../input/" directory
# For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory
# Modelling Algorithms
from sklearn.naive_bayes import MultinomialNB
from sklearn.linear_model import LogisticRegression
from sklearn.linear_model import PassiveAggressiveClassifier
# Modelling Helpers
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn import metrics
# Computations
import itertools
# Visualization
import matplotlib.pyplot as plt
test = pd.read_csv("test.csv")
submit = pd.read_csv("submit.csv")
# NOTE: train is also read from "test.csv" here, which is why the train and test
# shapes printed further down are identical; a separate labelled training file
# was presumably intended.
train = pd.read_csv("test.csv")
train.head()
      id  title                                              author                   text
0  20800  Specter of Trump Loosens Tongues, if Not Purse...  David Streitfeld         PALO ALTO, Calif. — After years of scorning...
1  20801  Russian warships ready to strike terrorists ne...  NaN                      Russian warships ready to strike terrorists ne...
2  20802  #NoDAPL: Native American Leaders Vow to Stay A...  Common Dreams            Videos #NoDAPL: Native American Leaders Vow to...
3  20803  Tim Tebow Will Attempt Another Comeback, This ...  Daniel Victor            If at first you don’t succeed, try a different...
4  20804  Keiser Report: Meme Wars (E995)                    Truth Broadcast Network  42 mins ago 1 Views 0 Comments 0 Likes 'For th...
print(f"Train Shape : {train.shape}")
print(f"Test Shape : {test.shape}")
print(f"Submit Shape : {submit.shape}")
Train Shape : (5200, 4)
Test Shape : (5200, 4)
Submit Shape : (5200, 2)
train.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5200 entries, 0 to 5199
Data columns (total 4 columns):
 #   Column  Non-Null Count  Dtype
 0   id      5200 non-null   int64
 1   title   5078 non-null   object
 2   author  4697 non-null   object
 3   text    5193 non-null   object
dtypes: int64(1), object(3)
memory usage: 162.6+ KB
train.isnull().sum()
id          0
title     122
author    503
text        7
dtype: int64
train.dtypes.value_counts()
object 3
int64 1
dtype: int64
test = test.fillna(' ')
train = train.fillna(' ')
# Create a column with all the data available
test['total'] = test['title'] + ' ' + test['author'] + ' ' + test['text']
train['total'] = train['title'] + ' ' + train['author'] + ' ' + train['text']
# Have a glance at our training set
train.info()
train.head()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5200 entries, 0 to 5199
Data columns (total 5 columns):
 #   Column  Non-Null Count  Dtype
 0   id      5200 non-null   int64
 1   title   5200 non-null   object
 2   author  5200 non-null   object
 3   text    5200 non-null   object
 4   total   5200 non-null   object
dtypes: int64(1), object(4)
memory usage: 203.2+ KB
      id  title                                              author                   text                                               total
0  20800  Specter of Trump Loosens Tongues, if Not Purse...  David Streitfeld         PALO ALTO, Calif. — After years of scorning...     Specter of Trump Loosens Tongues, if Not Purse...
1  20801  Russian warships ready to strike terrorists ne...                           Russian warships ready to strike terrorists ne...  Russian warships ready to strike terrorists ne...
2  20802  #NoDAPL: Native American Leaders Vow to Stay A...  Common Dreams            Videos #NoDAPL: Native American Leaders Vow to...  #NoDAPL: Native American Leaders Vow to Stay A...
3  20803  Tim Tebow Will Attempt Another Comeback, This ...  Daniel Victor            If at first you don’t succeed, try a different...  Tim Tebow Will Attempt Another Comeback, This ...
4  20804  Keiser Report: Meme Wars (E995)                    Truth Broadcast Network  42 mins ago 1 Views 0 Comments 0 Likes 'For th...  Keiser Report: Meme Wars (E995) Truth Broadcas...
OUTPUT
CONCLUSION:
• In our project, we have used the Logistic Regression, Decision Tree,
Gradient Boosting, and Random Forest classifiers. These algorithms were
helpful for predicting the veracity of news that a user gives as input.
• After the user enters a news item, the model makes its prediction from
features extracted with Count Vectorization and TF-IDF.
• Both feature types are useful for assessing the extent of accuracy.
FUTURE SCOPE
• In the future, statistical linear models could be combined with
context-related metrics to increase accuracy while keeping time complexity
low. For instance, a more complex detection method could be set up with the
Passive Aggressive (PA) classifier as the first screening step.
• Other specialized machine learning technologies that take metadata into
account could also be used to increase accuracy.
REFERENCES
[1]. By ‘Murari Choudhary, Prashant, Shashank Jha, Deepika Saxena and Ashutosh
Kumar Singh’ in the year 2021.
[2]. S. B. Parikh and P. K. Atrey, "Media-Rich Fake News Detection: A Survey",
2018 IEEE Conference on Multimedia Information Processing and Retrieval
(MIPR), pp. 436-441, 2018, April.
[3]. M. Granik and V.Mesyura, "Fake news detection using naive
Bayesclassifier",2017 IEEE First Ukraine Conference on Electrical and Computer
Engineering (UKRCON), pp. 900-903,2017.
[4.]. J. Zhang, L. Cui, Y. Fu and F. B. Gouza, "Fake news detection with deep
diffusive network model", 2018.
[5.] By ‘Terry Traylor, Jeremy Straub, Gurmeet and Nicholas Snell’, in the year 2019.
[6]. By ‘Rahul R Mandical, N Mamatha, N Shivakumar, R Monica and AN
Krishna’, in theyear 2020.
[7]. Ammara Habib, Muhammad Zubair Asghar, Adil Khan, Anam Habib and Aurangzeb
Khan, "False information detection in online content and its role in decision making:a
systematic literature review" in, Austria: Springer-Verlag GmbH, 2019.
[8]. Aswini Thota, Priyanka Tilak, Simrat Ahluwalia and Nibrat Lohia, "Fake News
Detection: A Deep Learning Approach", SMU Data Science Review, vol. 1, no. 3, 2018.
[9]. Kuai Xu, Feng Wang, Haiyan Wang and Bo Yang, "Detecting Fake News Over Online
Social Media via Domain Reputations and Content Understanding", the proceeding
of Tsinghua Science and Technology, vol. 25, no. 1, Feb. 2020.
[10]. Chaowei Zhang, Ashish Gupta, Christian Kauten, Amit V. Deokar and Xiao Qin,
"Detecting fake news for reducing misinformation risks using analytics approaches",
the proceeding of ELSEVIER European Journal of Operational Research, 2019.
[11]. By: Sadia Afroz*, Michael Brennan and Rachel Greenstadt* in theYear: 2012.
[12]. By S. S. Y. L. Natali Ruchansky, "CSI: A Hybrid Deep Model for Fake News
Detection", CIKM 2017 Internationall Conference on Information and
Knowledgemangaement, 2017.
[13]. By J. S. G. N. S. Terry Traylor, "Classifying Fake News Articles Using Natural
Language Processing to Identify In-Article Attribution as a Supervised Learning
Estimator", IEEE 13th International Conference on Semantic Computing, 2019.

 
The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13The Most Excellent Way | 1 Corinthians 13
The Most Excellent Way | 1 Corinthians 13
 
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111Call Girls in Dwarka Mor Delhi Contact Us 9654467111
Call Girls in Dwarka Mor Delhi Contact Us 9654467111
 
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data_Math 4-Q4 Week 5.pptx Steps in Collecting Data
_Math 4-Q4 Week 5.pptx Steps in Collecting Data
 
Employee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptxEmployee wellbeing at the workplace.pptx
Employee wellbeing at the workplace.pptx
 
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdfTataKelola dan KamSiber Kecerdasan Buatan v022.pdf
TataKelola dan KamSiber Kecerdasan Buatan v022.pdf
 
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
KSHARA STURA .pptx---KSHARA KARMA THERAPY (CAUSTIC THERAPY)————IMP.OF KSHARA ...
 
Interactive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communicationInteractive Powerpoint_How to Master effective communication
Interactive Powerpoint_How to Master effective communication
 
Introduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher EducationIntroduction to ArtificiaI Intelligence in Higher Education
Introduction to ArtificiaI Intelligence in Higher Education
 
Crayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon ACrayon Activity Handout For the Crayon A
Crayon Activity Handout For the Crayon A
 
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝Model Call Girl in Bikash Puri  Delhi reach out to us at 🔝9953056974🔝
Model Call Girl in Bikash Puri Delhi reach out to us at 🔝9953056974🔝
 
Mastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory InspectionMastering the Unannounced Regulatory Inspection
Mastering the Unannounced Regulatory Inspection
 
CARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptxCARE OF CHILD IN INCUBATOR..........pptx
CARE OF CHILD IN INCUBATOR..........pptx
 
MENTAL STATUS EXAMINATION format.docx
MENTAL     STATUS EXAMINATION format.docxMENTAL     STATUS EXAMINATION format.docx
MENTAL STATUS EXAMINATION format.docx
 
Alper Gobel In Media Res Media Component
Alper Gobel In Media Res Media ComponentAlper Gobel In Media Res Media Component
Alper Gobel In Media Res Media Component
 
Science 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its CharacteristicsScience 7 - LAND and SEA BREEZE and its Characteristics
Science 7 - LAND and SEA BREEZE and its Characteristics
 
Accessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impactAccessible design: Minimum effort, maximum impact
Accessible design: Minimum effort, maximum impact
 
Introduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptxIntroduction to AI in Higher Education_draft.pptx
Introduction to AI in Higher Education_draft.pptx
 

Fake news detection

  • 1. SRI SHANMUGHA COLLEGE OF ENGINEERING AND TECHNOLOGY FAKE NEWS DETECTION USING PYTHON AND MACHINE LEARNING DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND DATA SCIENCE PRESENTED BY RAJESHWARI.J (732720243021) SHANILA RINSI.T (732720243025) SHAMILSHA (732720243024) GUIDED BY JASEENASH.R , AP/CSE
  • 2. ABSTRACT • Fake news is news designed to deliberately spread hoaxes, propaganda, misinformation, or falsehoods, and its volume is overwhelming. • The main objective of this project is to detect fake news, a classic text classification problem with a straightforward proposition: build a model that can differentiate real news from fake news. • The project uses Logistic Regression, Decision Tree, Gradient Boosting, and Random Forest algorithms to detect fake news, with term frequency-inverse document frequency (TF-IDF) used to vectorize the news content.
  • 3. DOMAIN INTRODUCTION • Fake news and lack of trust in the media are growing problems with huge ramifications in our society. A purposely misleading story is obviously “fake news”, but lately the discourse on social media is changing the definition. • Disinformation, also known as fake news, is overwhelming. • The phrase “fake news” was named word of the year in 2016 by the Macquarie Dictionary. Traditional media is therefore urged to be more creative so that it can gain more attention from the public. • During a social event such as the coronavirus outbreak in 2020, official organizations have been confronting fake news from the beginning. EPI-WIN, a project led by the World Health Organization, is available online to provide credible information and flag fake news regarding this disease.
  • 4. PROBLEM DEFINITION • The effects of fake news can be political, economic, business-related, organizational, health-related, or even personal. A model is needed that can differentiate between “Real” news and “Fake” news. • The extensive spread of fake news has the potential for extremely negative impacts on individuals and society. Fake news detection has therefore become a challenging task in today's world. • Using scikit-learn, build a TF-IDF vectorizer on the provided dataset, then initialize a Passive Aggressive Classifier and fit the model. In the end, the accuracy score and the confusion matrix tell us how well the model fares.
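The workflow described on slide 4 (TF-IDF vectorizer, Passive Aggressive Classifier, accuracy score, confusion matrix) could be sketched roughly as follows. This is only an illustration: the tiny corpus, the train/test split, and the parameter values are made up here and are not taken from the slides or the Kaggle data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.metrics import accuracy_score, confusion_matrix

# Hypothetical toy corpus (assumption); the project itself uses a Kaggle dataset.
train_texts = [
    "government publishes annual budget report",
    "scientists confirm results in peer reviewed study",
    "city council approves new transport plan",
    "shocking miracle cure doctors hate revealed",
    "celebrity secretly replaced by clone claims insider",
    "aliens endorse candidate in shocking leaked video",
]
train_labels = ["REAL", "REAL", "REAL", "FAKE", "FAKE", "FAKE"]
test_texts = [
    "council publishes transport budget report",
    "shocking leaked video reveals miracle clone",
]
test_labels = ["REAL", "FAKE"]

# Fit the TF-IDF vocabulary on the training text only, then reuse it for the test text.
vectorizer = TfidfVectorizer(stop_words="english")
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# Initialize and fit the classifier (parameter values are illustrative).
classifier = PassiveAggressiveClassifier(max_iter=50, random_state=0)
classifier.fit(X_train, train_labels)

# Evaluate with the two metrics named on the slide.
predictions = classifier.predict(X_test)
accuracy = accuracy_score(test_labels, predictions)
cm = confusion_matrix(test_labels, predictions, labels=["FAKE", "REAL"])
print(f"Accuracy: {accuracy:.2f}")
print(cm)  # rows are true labels, columns are predicted labels
```

Fitting the vectorizer on the training split alone keeps test-set vocabulary from leaking into the features.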
  • 5. EXISTING SYSTEM • How to enforce user privacy preferences. • How to secure data when it is stored in the PDS. • Users are not skilled enough to translate their privacy requirements into privacy preferences. • Average users might have difficulty properly setting potentially complex privacy preferences. DISADVANTAGES OF EXISTING SYSTEM • The personal data we produce digitally is scattered and managed by different providers (e.g., online social media, hospitals, banks, airlines). • Users are losing control over their data, whose protection is the responsibility of the data provider. • Users cannot fully exploit their data, since each provider keeps a separate view of it.
  • 6. PROPOSED SYSTEM By utilizing Logistic Regression, Decision Tree, Gradient Boosting, and Random Forest algorithms, we build our model to increase performance and accuracy. The proposed system is cost-effective. ADVANTAGES • Does not require any external hardware. • Early detection reduces fatalities. • System-generated results are less prone to error. • Reduces human effort and intervention. • Drastically reduces time compared to manual detection. • Accurate prediction. • Flexible and portable.
  • 7. OBJECTIVES • This project applies NLP (Natural Language Processing) techniques to detect 'fake news', that is, misleading news stories that come from non-reputable sources. • The main objective of this project is to detect fake news using Natural Language Processing techniques.
  • 8. REQUIREMENTS SPECIFICATION HARDWARE REQUIREMENTS • System : Intel Core i5. • Hard Disk : 500GB. • RAM : 4GB. • Any desktop or laptop system with the above configuration or higher. SOFTWARE REQUIREMENTS • Operating System : Windows XP/7/10 • Coding language : Python and various libraries. • Software : Anaconda. • IDE : Jupyter Notebook, PyCharm. • Front-end : HTML. • Back-end : Flask framework
  • 9. METHODOLOGY DATA COLLECTION In this project, the dataset is taken from Kaggle.com. The size of the dataset is 6335 × 4, that is, 6335 rows and 4 columns. The column names are ‘URLs’, ‘Headline’, ‘Body’ and ‘Label’. The first column identifies the news item, the second and third are the title and text, and the fourth column holds the label denoting whether the news is REAL or FAKE. FEATURE EXTRACTION To analyze and model text after it has been preprocessed, it must first be converted into features. Techniques include Bag of Words and the TF-IDF vectorizer.
  • 10. Term Frequency-Inverse Document Frequency A term's TF-IDF weight increases proportionally with the number of times the word appears in a document but is offset by the word's frequency in the overall corpus. While TF-IDF is a good basic metric for extracting descriptive terms, it does not take a word's position or context into consideration. Bag of words This model analyzes the text of all input documents and converts it into bag-of-words form. For example, for a set of text documents we can build one bag of words that contains all distinct words from all the texts.
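As a rough illustration of the two feature-extraction ideas on slide 10, the sketch below builds a bag of words and a textbook TF-IDF weighting in plain Python. The three sample sentences and the function names are hypothetical, and the formula used (term frequency times log(N / document frequency)) is the basic textbook variant; scikit-learn's TfidfVectorizer applies a smoothed IDF and L2 normalization, so its numbers will differ.

```python
import math
from collections import Counter


def bag_of_words(docs):
    """Return the shared vocabulary and one count vector per document."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({word for tokens in tokenized for word in tokens})
    counts = []
    for tokens in tokenized:
        token_counts = Counter(tokens)
        counts.append([token_counts[word] for word in vocab])
    return vocab, counts


def tf_idf(docs):
    """Weight each count by how rare the word is across the corpus."""
    vocab, counts = bag_of_words(docs)
    n_docs = len(docs)
    # Document frequency: in how many documents does each word occur?
    doc_freq = [sum(1 for row in counts if row[j] > 0) for j in range(len(vocab))]
    weights = []
    for row in counts:
        total = sum(row)
        weights.append(
            [(c / total) * math.log(n_docs / doc_freq[j]) for j, c in enumerate(row)]
        )
    return vocab, weights


# Hypothetical sample documents (assumption, not from the dataset).
docs = ["the news is fake", "the news is real", "the fake fake story"]
vocab, counts = bag_of_words(docs)
_, weights = tf_idf(docs)
print(vocab)
print(counts)
print(weights)
```

In this sample, "the" occurs in every document, so its weight drops to zero, while "real" occurs in only one document and outweighs "news" there. That is the offsetting effect the slide describes.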
  • 11. CLASSIFICATION Naive Bayes Classifier In machine learning, Naive Bayes classifiers are a family of simple “probabilistic classifiers” based on applying Bayes' theorem with strong (naive) independence assumptions between the features. Passive Aggressive Classifier The Passive Aggressive Classifier is an online learning algorithm, which makes it well suited to detecting fake news on social media, where new data is added every second.
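To make the "online learning" idea on slide 11 concrete, here is a minimal sketch of one Passive-Aggressive update on a sparse feature dictionary. It follows the basic hinge-loss rule: stay passive when the margin is at least 1, otherwise move the weights just far enough to fix the example. The function names, feature names, and label encoding (+1 for real, -1 for fake) are assumptions for illustration, and this is not the scikit-learn implementation.

```python
def score(weights, features):
    """Dot product between a weight dict and a sparse feature dict."""
    return sum(weights.get(name, 0.0) * value for name, value in features.items())


def pa_update(weights, features, label):
    """Apply one Passive-Aggressive step in place; label must be +1 or -1."""
    loss = max(0.0, 1.0 - label * score(weights, features))
    norm_sq = sum(value * value for value in features.values())
    if loss == 0.0 or norm_sq == 0.0:
        return weights  # passive: the example is already classified with enough margin
    tau = loss / norm_sq  # aggressive: smallest step that removes the loss
    for name, value in features.items():
        weights[name] = weights.get(name, 0.0) + tau * label * value
    return weights


# Hypothetical stream of one repeated example (label -1 stands for FAKE).
weights = {}
example = {"shocking": 1.0, "miracle": 1.0}
pa_update(weights, example, -1)
after_first = dict(weights)
pa_update(weights, example, -1)  # second pass changes nothing: margin is now 1
print(after_first, weights)
```

Because each step touches only the features of the current example, the model can be updated as new posts arrive without retraining on the full history.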
  • 12. LANGUAGE USED FOR IMPLEMENTATION Python is an interpreted, object-oriented programming language similar to Perl that has gained popularity because of its clear syntax and readability. The source code is freely available and open for modification and reuse. Features Of Python • Easy to understand and read • Interpreted language • Cross-platform language • Free and open source • Object-oriented language • Extensible • GUI programming support
  • 13. Advantages Of Python • Presence of third-party modules • Open source and community development • Ease of learning and available support • User-friendly data structures • Productivity and speed FRONT-END: HTML HTML provides a means to create structured documents by denoting structural semantics for text such as headings, paragraphs, lists, links, quotes and other items. Browsers do not display the HTML tags but use them to interpret the content of the page. BACK-END: FLASK FRAMEWORK Flask is a micro web framework written in Python that can be used to build websites ranging from mostly static pages to complex database-driven sites. It is classified as a micro framework because it does not require particular tools or libraries. Applications that use the Flask framework include Pinterest and LinkedIn.
  • 14. PLATFORM PYCHARM PyCharm is an integrated development environment (IDE) used in computer programming, specifically for the Python language. It is developed by the Czech company JetBrains. It provides code analysis, a graphical debugger, an integrated unit tester, and integration with version control systems (VCS), and it supports web development with Django as well as data science with Anaconda. • Python refactoring: includes rename, extract method, introduce variable, introduce constant, pull up, push down and others. • Support for web frameworks: Django, web2py and Flask. • Integrated Python debugger. JUPYTER NOTEBOOK A Jupyter Notebook can be converted to a number of open standard output formats, such as HTML, presentation slides, LaTeX, PDF, reStructuredText, Markdown, and Python, through ‘Download As’ in the web interface via the nbconvert library. The same library is offered as a service through nbviewer, which takes a URL to any publicly available notebook document, converts it to HTML on the fly, and displays it to the user.
  • 15. ALGORITHMS LOGISTIC REGRESSION • Logistic regression is one of the most popular machine learning algorithms and belongs to the supervised learning family. • It is used for predicting a categorical dependent variable from a given set of independent variables. • It gives probabilistic values that lie between 0 and 1. • Logistic regression is similar to linear regression except in how it is used: logistic regression is used for solving classification problems. STEPS IN LOGISTIC REGRESSION • Data pre-processing • Fitting logistic regression to the training set • Predicting the test result • Testing the accuracy of the result (creating the confusion matrix) • Visualizing the test set result.
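The steps on slide 15 can be illustrated with a very small logistic regression written from scratch. This is a sketch under stated assumptions: the single feature (a made-up count of sensational words per headline), the four data points, the learning rate, and the number of steps are all hypothetical, and the model is fitted and evaluated on the same points only to keep the example short.

```python
import math


def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs, ys, learning_rate=0.5, steps=2000):
    """Fit weight and bias by gradient descent on the log loss."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [sigmoid(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * sum(errors) / n
    return w, b


# Hypothetical data: x = sensational words in a headline, y = 1 for FAKE, 0 for REAL.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [0, 0, 1, 1]

# Fit the model, then predict probabilities and class labels.
w, b = fit_logistic(xs, ys)
probabilities = [sigmoid(w * x + b) for x in xs]
predictions = [1 if p >= 0.5 else 0 for p in probabilities]

# Confusion matrix counts, with FAKE (1) as the positive class.
tp = sum(1 for p, y in zip(predictions, ys) if p == 1 and y == 1)
tn = sum(1 for p, y in zip(predictions, ys) if p == 0 and y == 0)
fp = sum(1 for p, y in zip(predictions, ys) if p == 1 and y == 0)
fn = sum(1 for p, y in zip(predictions, ys) if p == 0 and y == 1)
accuracy = (tp + tn) / len(ys)
print(probabilities, predictions, accuracy)
```

The sigmoid is what keeps every output strictly between 0 and 1, and thresholding at 0.5 turns those probabilities into class labels.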
  • 16. DECISION TREE CLASSIFICATION • A decision tree is a supervised learning technique that can be used for both classification and regression problems, but it is mostly preferred for solving classification problems. • It is a tree-structured classifier, where internal nodes represent the features of a dataset, branches represent the decision rules, and each leaf node represents the outcome. GRADIENT BOOSTING CLASSIFIER • Gradient boosting is a popular boosting algorithm in which each predictor corrects its predecessor's error. • In contrast to AdaBoost, the weights of the training instances are not tweaked; instead, each predictor is trained using the residual errors of its predecessor as labels. • A technique called Gradient Boosted Trees uses CART (Classification and Regression Trees) as its base learner. • The diagram on this slide explains how gradient boosted trees are trained for regression problems.
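The residual-fitting idea on slide 16 can be sketched for a regression problem with one-split trees (stumps) as the base learner. Everything below is illustrative: the data points, the learning rate, and the function names are assumptions, and a real Gradient Boosted Trees implementation uses deeper CART trees and other loss functions.

```python
def fit_stump(xs, targets):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [t for x, t in zip(xs, targets) if x <= threshold]
        right = [t for x, t in zip(xs, targets) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = sum((t - left_mean) ** 2 for t in left)
        error += sum((t - right_mean) ** 2 for t in right)
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def gradient_boost(xs, ys, rounds, learning_rate=0.5):
    """Each new stump is trained on the residuals left by the previous ones."""
    base = sum(ys) / len(ys)
    stumps = []
    current = [base] * len(xs)
    for _ in range(rounds):
        residuals = [y - p for y, p in zip(ys, current)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        current = [p + learning_rate * stump(x) for x, p in zip(xs, current)]
    return lambda x: base + learning_rate * sum(s(x) for s in stumps)


def mse(model, xs, ys):
    """Mean squared error of a model on the given points."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical one-dimensional regression data (assumption).
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.2, 3.1, 2.9, 5.0, 5.2]

mse_mean_only = mse(gradient_boost(xs, ys, rounds=0), xs, ys)
mse_one_round = mse(gradient_boost(xs, ys, rounds=1), xs, ys)
mse_twenty_rounds = mse(gradient_boost(xs, ys, rounds=20), xs, ys)
print(mse_mean_only, mse_one_round, mse_twenty_rounds)
```

The training error shrinks as rounds are added because every stump only has to explain what the earlier ones missed. Note that the instance weights are never changed, which is the contrast with AdaBoost drawn on the slide.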
  • 17. SOURCE CODE
    # This Python 3 environment comes with many helpful analytics libraries installed
    # It is defined by the kaggle/python Docker image: https://github.com/kaggle/docker-python
    # For example, here's several helpful packages to load
    import numpy as np  # linear algebra
    import pandas as pd  # data processing, CSV file I/O (e.g. pd.read_csv)

    # Input data files are available in the read-only "../input/" directory
    # For example, running this (by clicking run or pressing Shift+Enter) will list all files under the input directory

    # Modelling Algorithms
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.linear_model import LogisticRegression
    from sklearn.linear_model import PassiveAggressiveClassifier

    # Modelling Helpers
    from sklearn.model_selection import train_test_split
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn import metrics

    # Computations
    import itertools

    # Visualization
    import matplotlib.pyplot as plt

    test = pd.read_csv("test.csv")
    submit = pd.read_csv("submit.csv")
    # As written in the slides, train is also read from test.csv,
    # so it has the same 5200 rows as test and no label column
    train = pd.read_csv("test.csv")
    train.head()
  • 18. Output of train.head():
          id  title                                              author                   text
    0  20800  Specter of Trump Loosens Tongues, if Not Purse...  David Streitfeld         PALO ALTO, Calif. — After years of scorning...
    1  20801  Russian warships ready to strike terrorists ne...  NaN                      Russian warships ready to strike terrorists ne...
    2  20802  #NoDAPL: Native American Leaders Vow to Stay A...  Common Dreams            Videos #NoDAPL: Native American Leaders Vow to...
    3  20803  Tim Tebow Will Attempt Another Comeback, This ...  Daniel Victor            If at first you don’t succeed, try a different...
    4  20804  Keiser Report: Meme Wars (E995)                    Truth Broadcast Network  42 mins ago 1 Views 0 Comments 0 Likes 'For th...

    print(f"Train Shape : {train.shape}")
    print(f"Test Shape : {test.shape}")
    print(f"Submit Shape : {submit.shape}")

    Train Shape : (5200, 4)
    Test Shape : (5200, 4)
    Submit Shape : (5200, 2)

    train.info()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5200 entries, 0 to 5199
    Data columns (total 4 columns):
  • 19. Output of train.info(), continued:
     #   Column  Non-Null Count  Dtype
     0   id      5200 non-null   int64
     1   title   5078 non-null   object
     2   author  4697 non-null   object
     3   text    5193 non-null   object
    dtypes: int64(1), object(3)
    memory usage: 162.6+ KB

    train.isnull().sum()

    id          0
    title     122
    author    503
    text        7
    dtype: int64

    train.dtypes.value_counts()

    object    3
    int64     1
    dtype: int64

    # Replace missing values with a single space so the columns can be concatenated as strings
    test = test.fillna(' ')
    train = train.fillna(' ')
  • 20.
    # Create a column with all the data available
    test['total'] = test['title'] + ' ' + test['author'] + ' ' + test['text']
    train['total'] = train['title'] + ' ' + train['author'] + ' ' + train['text']

    # Have a glance at our training set
    train.info()
    train.head()

    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 5200 entries, 0 to 5199
    Data columns (total 5 columns):
     #   Column  Non-Null Count  Dtype
     0   id      5200 non-null   int64
     1   title   5200 non-null   object
     2   author  5200 non-null   object
     3   text    5200 non-null   object
     4   total   5200 non-null   object
    dtypes: int64(1), object(4)
    memory usage: 203.2+ KB
  • 21. OUTPUT
          id  title                                              author                   text                                               total
    0  20800  Specter of Trump Loosens Tongues, if Not Purse...  David Streitfeld         PALO ALTO, Calif. — After years of scorning...     Specter of Trump Loosens Tongues, if Not Purse...
    1  20801  Russian warships ready to strike terrorists ne...                           Russian warships ready to strike terrorists ne...  Russian warships ready to strike terrorists ne...
    2  20802  #NoDAPL: Native American Leaders Vow to Stay A...  Common Dreams            Videos #NoDAPL: Native American Leaders Vow to...  #NoDAPL: Native American Leaders Vow to Stay A...
    3  20803  Tim Tebow Will Attempt Another Comeback, This ...  Daniel Victor            If at first you don’t succeed, try a different...  Tim Tebow Will Attempt Another Comeback, This ...
    4  20804  Keiser Report: Meme Wars (E995)                    Truth Broadcast Network  42 mins ago 1 Views 0 Comments 0 Likes 'For th...  Keiser Report: Meme Wars (E995) Truth Broadcas...
  • 22. CONCLUSION: • In our project, we used the Logistic Regression, Decision Tree, Gradient Boosting, and Random Forest classifiers, which were helpful for predicting the truthfulness of news entered by the user. • After the user enters a news item, the model makes its prediction using features extracted by count vectorization and TF-IDF. • Both feature types are useful for assessing the achievable accuracy. FUTURE SCOPE • In the future, statistical linear models could be combined with context-related metrics to increase accuracy while keeping time complexity low. For instance, a more complex detection method could be set up with the Passive Aggressive (PA) classifier as the first screening step. • Other specialized machine learning techniques could take metadata into account to increase accuracy.