2. Team
Anand Mohan - 20BCE0146
Vihith Arekatla - 20BCE2878
Amit Kumar - 20BCE0135
Abhishek Kumar - 20BCE0210
3. 1. Abstract
E-news is among the most widely read content in the world, and reading the news is part of everyday life.
India has many languages, but so far news classifiers exist only for English; there are none for regional
languages such as Hindi, Telugu, Malayalam and Tamil. The aim of this project is to train a neural network
model to classify news articles into the following categories: Tamil Nadu, India, Cinema, Sports, Politics
and World. We use two kinds of neural network layers to achieve these results: dense layers and Long
Short-Term Memory (LSTM) layers.
We use six individual binary neural networks built with LSTM layers, one for each category, and a voting
algorithm to predict the most apt category for a given news article. We realise this with a simple Graphical
User Interface (GUI) that takes the input from the user and returns the predicted category.
4. 2. Problem Statement
In this project, we aim to train a simple neural network model to classify Tamil-language
news articles into six distinct categories. We train six binary classifiers, one for
each of the six categories, and use a simple voting algorithm to get the final predicted
category. We also develop a simple graphical user interface that takes a news headline
from the user and returns the predicted category.
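The voting step described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes each of the six binary classifiers returns a score between 0 and 1 for its own category, and that the vote simply picks the highest-scoring category. The names `CATEGORIES`, `vote` and `example_scores` are hypothetical, and the scores are made-up values for illustration only.

```python
from typing import Dict

# The six categories named in the problem statement.
CATEGORIES = ["Tamil Nadu", "India", "Cinema", "Sports", "Politics", "World"]


def vote(scores: Dict[str, float]) -> str:
    """Return the category whose binary classifier gave the highest score.

    Assumption: scores[c] is the output of the binary classifier for
    category c, where a higher value means "more likely this category".
    Ties are broken in favour of the category listed first in CATEGORIES.
    """
    missing = [c for c in CATEGORIES if c not in scores]
    if missing:
        raise ValueError(f"missing scores for: {missing}")
    return max(CATEGORIES, key=lambda c: scores[c])


# Made-up scores standing in for the outputs of the six classifiers.
example_scores = {
    "Tamil Nadu": 0.12,
    "India": 0.30,
    "Cinema": 0.05,
    "Sports": 0.91,
    "Politics": 0.22,
    "World": 0.08,
}

predicted = vote(example_scores)
print(predicted)
```

In a GUI, the headline entered by the user would be passed to each of the six classifiers, and the result of `vote` would be displayed as the predicted category.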
5. 3. Literature review
Research Paper: 1. News Classification and Its Techniques: A Review. Authors: Gurmeet Kaur, Karan Bajaj.
Methodology and Detailed Description: The paper describes the steps and techniques used for news
classification. News tokenisation divides the large text into small tokens, and the words in the news
are treated as strings.
Advantages & Research Gap: The techniques are difficult to apply to larger corpora. The algorithms can be
improved to increase the efficiency of categorisation and can then be tried on bigger corpora.
Research Paper: 2. Sentiment analysis of tweets in three Indian languages. In Proceedings of the 6th
Workshop on South and Southeast Asian Natural Language Processing. Authors: Phani, S., Lahiri, S., and Biswas.
Methodology and Detailed Description: The paper describes sentiment analysis of tweets in three Indian
regional languages, namely Tamil, Hindi and Bengali, using the SAIL dataset, which was released in 2015.
Advantages & Research Gap: Not all classifiers could be used. The system is limited to the multinomial
Naive Bayes classifier in WEKA, because the authors' experiments with other classifiers gave poorer
performance on the SAIL dataset.
Research Paper: 3. Graph Convolutional Network for Swahili News Classification. Authors: Alexandros
Kastanos, Tyler Martin.
Methodology and Detailed Description: Experimentation is done in the sparsely-labelled semi-supervised
context, which is representative of the practical constraints facing low-resourced African languages.
Advantages & Research Gap: Alternative graph structures could be used instead. The authors could also
consider implementing inductive GNN methods for text.
Research Paper: 4. Categorization of Tamil News Articles using Pre Trained Word2Vec Embeddings with
Convolutional Neural Network. Authors: Mr. Ramraj S, Arthi R.
Methodology and Detailed Description: The Convolutional Neural Network system is designed with three
convolutions followed by a merge layer. Input for the convolutions is fed from the embedding layer. Three
types of convolutions, 3×3, 4×4 and 5×5, are used, through which features are formalized.
Advantages & Research Gap: The precision, recall and F1 score for the politics class are low compared to
the other two classes. This may be due to the occurrence of more new tokens in the politics test data than
in cinema and sports. In future, the work can be extended by applying the same methodology to other social
media data, as was done for news web data. The sentiment of the data can also be analysed after
topic categorization.
Research Paper: 5. A Deep Learning Approach for URL based Health Information Search. Authors:
R. Rajalakshmi and S. Ramraj.
Methodology and Detailed Description: A URL-based design has been suggested to ease the task of health
information search. Content-based methods are not suitable, as they are time consuming and do not
reflect the dynamic changes in the web.
Advantages & Research Gap: This issue has been addressed by combining the outputs of two individual CNN
models. To examine the effectiveness of the proposed ensemble approach, 5-fold cross-validation was
performed.
Research Paper: 6. News Text Classification Method and Simulation Based on the Hybrid Deep Learning
Model. Authors: Ningfeng Sun and Chengye Du.
Methodology and Detailed Description: The simulation based on the hybrid deep learning model consists
essentially of four parts: news text pre-processing, word-vector-based news text representation, news
text feature extraction and classification, and evaluation of the classification results.
Advantages & Research Gap: The paper examines the influence of dropout parameter changes on the accuracy
of news text classification, which could be improved by adopting different methodologies.
Research Paper: 7. A systematic review of text classification research based on deep learning models in
Arabic language. Authors: Ahlam Wahdan, Sendeyah Hantoobi, Said A. Salloum, Khaled Shaalan.
Methodology and Detailed Description: Deep learning techniques for classification and their types are
discussed in this paper. Neural networks of various types, namely RNN, CNN, FFNN and LSTM, are
identified as the subject of study.
Advantages & Research Gap: The researchers did not indicate in detail the parameters used in these
networks and how they were tuned. Usually, machine learning algorithms are tuned by changing parameters
and re-running the experiments to get significant results.
Research Paper: 8. Analyzing sentiment in Indian languages micro text using recurrent neural networks.
IIOAB Journal: A Journal of Multidisciplinary Science and Technology. Authors: S. Seshadri, A.K. Madasamy.
Methodology and Detailed Description: In this work, tweets are classified into three polarity categories,
namely positive, negative and neutral. Twitter data in three languages, namely Tamil, Hindi and Bengali,
was provided by the SAIL 2015 task organizers, as the authors participated in the contest.
Advantages & Research Gap: The research is limited to only three languages. Even though the accuracy is
quite high, more languages can be added in future.