2. CONTENTS
1. Abstract
2. Introduction
2.1 Background and Motivation
2.2 Objectives of the Project
2.3 Overview of Promptify and its Role in Text Classification
3. Literature Survey
3.1 Overview of Text Classification Techniques
3.2 Previous Approaches to Multi-Class Text Classification
3.3 Review of Promptify and its Applications in NLP
4. Methodology
5. Implementation
6. Results and Evaluation
3. ABSTRACT
Multi-class text classification has evolved significantly over time, with new algorithms
and techniques improving accuracy and efficiency. Early work focused on binary classification,
and multi-class classification followed. In the 1990s, decision trees and SVMs became
popular, while deep learning algorithms gained popularity from the 2000s onward. Text
classification, the process of categorizing textual data into predefined classes or
categories, plays a crucial role in various natural language processing (NLP) tasks. In
this project, we explore the application of Promptify, a method for generating prompts to
guide language model behavior, in the context of multi-class text classification. We begin
by reviewing existing literature on text classification techniques and previous approaches
to multi-class classification. Additionally, we provide an overview of Promptify and its
potential applications in NLP tasks.
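As a rough illustration of what generating a prompt to guide a language model can look like for multi-class classification, the sketch below assembles a classification prompt from a label set and a few labelled examples. It uses plain Python string handling and is not Promptify's own API; the function name `build_classification_prompt`, the labels, and the example texts are all hypothetical.

```python
from typing import Sequence, Tuple


def build_classification_prompt(
    text: str,
    labels: Sequence[str],
    examples: Sequence[Tuple[str, str]] = (),
) -> str:
    """Assemble a plain-text prompt asking a language model to pick one label.

    Hypothetical helper for illustration only; this is not Promptify's API.
    """
    if not labels:
        raise ValueError("at least one label is required")
    lines = [
        "Classify the text into exactly one of these categories: "
        + ", ".join(labels) + ".",
        "Answer with the category name only.",
        "",
    ]
    # Optional few-shot examples show the model the expected answer format.
    for example_text, example_label in examples:
        if example_label not in labels:
            raise ValueError(f"unknown label in example: {example_label}")
        lines += [f"Text: {example_text}", f"Category: {example_label}", ""]
    # The final block is left open for the model to complete.
    lines += [f"Text: {text}", "Category:"]
    return "\n".join(lines)


# Toy inputs (made up for this sketch).
prompt = build_classification_prompt(
    "The new phone battery lasts two days.",
    labels=["sports", "technology", "politics"],
    examples=[("The striker scored twice.", "sports")],
)
print(prompt)
```

The returned string would then be sent to whichever language model is in use, and the model's completion read back as the predicted category.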
4. 2. Introduction
2.1 Background and Motivation
Text classification is a fundamental task in natural language processing (NLP) that involves categorizing
text documents into predefined classes or categories based on their content.
Traditional approaches to multi-class text classification often involve feature engineering, such as bag-of-
words or TF-IDF representations, coupled with machine learning algorithms like Naive Bayes, Support
Vector Machines (SVM), or deep learning models such as recurrent neural networks (RNNs) and
convolutional neural networks (CNNs).
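To make the traditional pipeline concrete, here is a minimal sketch of a bag-of-words representation coupled with a multinomial Naive Bayes classifier, written in pure Python. The class name, the whitespace tokenizer, and the six training sentences are assumptions made for illustration; a real project would more likely use an established library and a proper dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes over bag-of-words counts, with Laplace smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)          # label -> number of documents
        self.word_counts = defaultdict(Counter)      # label -> word -> count
        self.vocabulary = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.word_counts[label].update(tokens)
            self.vocabulary.update(tokens)
        return self

    def predict(self, text):
        # Words never seen in training carry no information and are skipped.
        tokens = [t for t in tokenize(text) if t in self.vocabulary]
        total_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, doc_count in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(doc_count / total_docs)
            total_words = sum(self.word_counts[label].values())
            for token in tokens:
                count = self.word_counts[label][token]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data (made up for this sketch).
train_texts = [
    "the team won the match",
    "the striker scored a goal",
    "the election results were announced",
    "parliament passed the new law",
    "the new laptop has a fast processor",
    "the phone software update fixed bugs",
]
train_labels = ["sports", "sports", "politics", "politics", "technology", "technology"]

model = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(model.predict("the striker scored in the match"))
print(model.predict("parliament passed the law"))
```

Replacing the raw counts with TF-IDF weights, or the classifier with an SVM, changes only the scoring step; the overall shape of the pipeline (tokenize, build features, fit, predict) stays the same.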
Why “Multiclass Text Classification”?
1. Information Organization
2. Decision Support Systems
3. Personalization and Recommendation Systems
4. Enhanced User Experience
5. Research and Innovation
5. OBJECTIVE
Text classification aims to categorize documents into predefined
categories, such as positive or negative sentiment. Standard
machine learning systems, such as the Naive Bayes classifier, linear
Support Vector Machine, Logistic Regression, Word2vec, Doc2vec,
and Bag of Words (BOW) with Keras, outperform human-delivered
baselines when used on a dataset of Stack Overflow questions,
answers, and tags. This paper examines and compares the
accuracies of these algorithms on that dataset.
7. Dating Texts by Multi-class Classification with Sliding Time Intervals
Authors: Xiwu Han, Gregory Toner
Published: May 2017
Technologies used: NBM, support vector machines, dating text extraction
Accuracy: STI NBM 47.01* for 6 years, 49.90* for 12 years, 55.92* for 20 years
Result: Texts are dated by multi-class classification with sliding time intervals (STI),
which categorizes each text into one of several date ranges.
Drawbacks: Only two types of classification features are involved, although the STI
method significantly outperformed FTI classifiers and the NBM STI achieved the best
dating precision on DTE Subtask 2.
8. Convolutional Neural Network Based Classification of App Reviews
Author: Kewen Xia
Published: October 8, 2020
Technologies used: Senti4SD, deep learning
Result: The proposed approach is evaluated on a public dataset, and the results
suggest that it significantly improves the state of the art.
Accuracy: The approach improves precision from 75.72% to 95.49%, average recall
from 69.40% to 93.94%, and f-measure from 72.41% to 94.71%.
Drawbacks:
1. Fixed-size input requirement
2. Difficulty in handling out-of-vocabulary words
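The precision, recall, and f-measure figures reported for this approach are standard classification metrics. As a minimal sketch of how such averaged scores can be computed for a multi-class problem, the function below macro-averages the per-class values in pure Python. The label names and predictions are made-up toy data, and the choice of macro averaging is an assumption; the slide does not say which averaging the paper uses.

```python
def macro_scores(y_true, y_pred):
    """Return macro-averaged (precision, recall, f_measure) over all labels."""
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f_measures = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)  # true positives
        fp = sum(t != label and p == label for t, p in pairs)  # false positives
        fn = sum(t == label and p != label for t, p in pairs)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F-measure is the harmonic mean of precision and recall.
        if precision + recall:
            f_measure = 2 * precision * recall / (precision + recall)
        else:
            f_measure = 0.0
        precisions.append(precision)
        recalls.append(recall)
        f_measures.append(f_measure)
    n = len(labels)
    return sum(precisions) / n, sum(recalls) / n, sum(f_measures) / n


# Toy review labels (made up for this sketch).
y_true = ["bug", "bug", "feature", "feature", "other", "other"]
y_pred = ["bug", "feature", "feature", "feature", "other", "bug"]

precision, recall, f_measure = macro_scores(y_true, y_pred)
print(f"precision={precision:.4f} recall={recall:.4f} f-measure={f_measure:.4f}")
```

Macro averaging weights every class equally, which matters when some review categories are much rarer than others.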
9. Patent Text Classification Study Based on Bi-LSTM-A Model
Authors: Jinhong Wu, Caiyun Huang, Yongyue Chen
Published: 2020
Algorithms implemented: Bidirectional LSTM (Bi-LSTM), attention mechanism
Result: Compared with the traditional classification method, the Bi-LSTM-A method
showed better results at all levels of classification and was able to improve the
classification at the group level with higher text similarity.
Accuracy: Most of the results are above 94%. While not 100% accurate, the method
reduces classification errors in some categories.
Limitations:
1. Overfitting
2. Scalability
10. Multi-class Text Classification Using Machine Learning Models for Online Drug Reviews
Author: Shreehar Joshi
Published: 2021
Technologies used: LR, Random Forest, Gradient Boosting Machines (GBM)
Accuracy: approximately 80%
Result: The evaluation shows how well the models understand and sort online drug
reviews, which is important for improving patient safety and healthcare quality.
Drawbacks: Data quality. Some reviews may be incomplete, biased, or contain
irrelevant information, which can affect the performance of the classification models.
11. Research and Implementation of Text Topic Classification Based
on Text CNN
Author: Wanbo Luo
Published: 2022
Technologies used: TextCNN with a self-attention mechanism and an added classification layer
Result: Introduced a CNN for text classification problems; it performed well on the
training data.
Accuracy: CNN loss 0.0216, accuracy 0.9872, MAE 0.0725
Drawbacks:
1. Limited contextual information
2. Difficulty in handling out-of-vocabulary words
3. Cannot work well with network structure
12. MCNN-LSTM: Combining CNN and LSTM to Classify
Multi-Class Text in Imbalanced News Data
Author: Asif Karim
Published: 2023
Algorithms implemented: MCNN-LSTM, Tomek-Link
Result: The model performed well on unbalanced data: it first rebalances the data and
then applies a multi-class convolutional neural network with LSTM.
Accuracy: MCNN-LSTM (95.4%) outperforms current methods on the balanced text data.
Limitations:
1. Limited contextual understanding
2. Hyperparameter tuning
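Tomek-Link appears among the algorithms on this slide as a way of handling class imbalance. A Tomek link is a pair of samples from different classes that are each other's nearest neighbour; dropping the majority-class member of each pair is a common undersampling step. The sketch below shows the idea on 2-D points with Euclidean distance. It is not the paper's implementation: the function names and toy points are hypothetical, and in a text setting the points would be document vectors.

```python
import math
from collections import Counter


def nearest_neighbor(index, points):
    """Return the index of the closest other point (Euclidean distance)."""
    return min(
        (j for j in range(len(points)) if j != index),
        key=lambda j: math.dist(points[index], points[j]),
    )


def find_tomek_links(points, labels):
    """Return pairs (i, j), i < j, that are mutual nearest neighbours
    and carry different labels."""
    nn = [nearest_neighbor(i, points) for i in range(len(points))]
    return [
        (i, j)
        for i, j in enumerate(nn)
        if i < j and nn[j] == i and labels[i] != labels[j]
    ]


def remove_majority_in_links(points, labels):
    """Drop the more frequent class's member of every Tomek link."""
    counts = Counter(labels)
    to_drop = set()
    for i, j in find_tomek_links(points, labels):
        if counts[labels[i]] > counts[labels[j]]:
            to_drop.add(i)
        elif counts[labels[j]] > counts[labels[i]]:
            to_drop.add(j)
        # Equal class sizes: neither side is the majority, so keep both.
    keep = [k for k in range(len(points)) if k not in to_drop]
    return [points[k] for k in keep], [labels[k] for k in keep]


# Toy data (made up for this sketch): four majority points, two minority points.
points = [(0.0, 0.0), (0.3, 0.0), (0.0, 0.3), (1.0, 1.0), (1.1, 1.0), (3.0, 3.0)]
labels = ["major", "major", "major", "major", "minor", "minor"]

links = find_tomek_links(points, labels)
new_points, new_labels = remove_majority_in_links(points, labels)
print(links)
print(Counter(new_labels))
```

The nearest-neighbour search here is quadratic in the number of samples, which is fine for a sketch but would need an index structure or a library implementation on a real news corpus.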