SlideShare a Scribd company logo
1 of 4
Download to read offline
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 366
Text Segmentation for Online Subjective Examination Using Machine
Learning
Shahid Khan1, Rakshanda Chavan2 , Diksha Singh3, Tina Sajwan4
1,2,3,4 Modern Education Society’s College of Engineering, Pune-411001
---------------------------------------------------------------------***---------------------------------------------------------------------
Abstract - This paper focuses on text segmentation for
natural language using k-Nearest Neighbour (K-NN)
classifier , which is a type of instance-based learning, or lazy
learning, where the function is only approximated locally
and all computation is deferred until classification. The Text
segmentation divides written text into meaningful units,
which is used by humans when reading text, and artificial
processes implemented in computers which are subject to
natural language processing. K-NN computes the similarity
measure among attributes to determine similarity between
feature vectors after which K-NN is modified based on the
similarity measure, this version is applied into the text
segmentation task. The goal of this paper is to implement
natural language processing using text segmentation which
provides the benefits.
Key Words: K-NN, text segmentation, feature similarity,
NLP
1.INTRODUCTION
The text segmentation is defined as process of segmenting
automatically a large text into many parts based on its
topic or content. The information retrieval (IR) systems
tend to retrieve long texts which contain more than one
topic, as very high relevant texts to the given query, so the
long texts need to be segmented into text partitions topic
by topic. The task of text segmentation is to partition the
text into sentences and paragraphs and judge whether the
topic boundary is put or not between two adjacent
sentences or paragraph. In this task, the text is given as the
input and segmented into paragraphs, a list of pairs of
adjacent paragraphs is generated, and each pair is judged
whether we put the topic boundary between them, or not.
The task is interpreted into a binary classification where
each pair of paragraphs is classified into separation or
non-separation. The task may be interpreted into the
binary classification where each sentence or paragraph
pair into the transition to the different topic or the
continuation of the identical topic.
Some issues are caused by encoding texts into numerical
vectors and computing their similarities based on only
attribute values. This problem causes very high costs for
processing each numerical vector representing a
document in terms of time and system resources. Much
more training examples are required proportionally to the
dimension for avoiding overfitting. The second problem is
sparse distribution where each numerical vector has zero
values dominantly.
Let us mention what we propose in this research as some
agenda. In this research, we assume that words are given
as features of numerical vectors in encoding texts, and
they have their semantic relations with others. Based on
the assumption, we define the similarity measure for
computing the similarity between feature vectors,
considering both feature values and features. We modify
the KNN into the version where both the feature similarity
and the feature value similarity are used, and apply it to
the classification task mapped from the text segmentation.
As benefits from this research, we expect its more
tolerance to the sparse distributions and the potential
avoidance of the huge dimensionality.
Let us mention what is expected from this research as
benefits by implementing the above ideas. We may cut
down the dimensionality in encoding texts into numerical
vectors, potentially. The information loss in computing the
similarity between texts may be reduced by reflecting the
similarities among the features.
We present some benefits which are expected from this
research. By representing the texts into alternative one to
the numerical vectors, we may escape from the two main
problems in doing so. The proposed approach becomes
less sensitive to the sparse distribution of numerical
vectors, because the similarity among features is captured
as well as among feature values.
2. RELATED WORK
Let us survey the previous cases of encoding texts into
structured forms for using the machine learning
algorithms for text mining tasks. The three main problems,
huge dimensionality, sparse distribution, and poor
transparency, have existed inherently in encoding them
into numerical vectors. In previous works, various
schemes of pre-processing texts have been proposed, in
order to solve the problems.
In paper [1], it is given that text segmentation refers to the
process of segmenting an article into its several parts
based on its content. Because in the information retrieval
systems, a long text tends to be retrieved most frequently
by overestimation of its relevancy to a query, we need to
segment it into its several parts, in order to avoid the
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 367
problem. In this task, the text is given as the input and
segmented into paragraphs, a list of pairs of adjacent
paragraphs is generated, and each pair is judged whether
we put the topic boundary between them, or not.
In paper [2], The task of text segmentation is to partition
the text into sentences and paragraphs and judge whether
the topic boundary is put or not between two adjacent
sentences or paragraph. The task may be interpreted into
the binary classification where each sentence or
paragraph pair into the transition to the different topic or
the continuation of the identical topic. Segmentation of
speech texts into sentences or paragraphs may be
considered but covered in the next research. In the text
categorization, the sample texts may span over various
domains, whereas in the text segmentation, the sample
paragraphs should be within a domain. Therefore,
although the text segmentation belongs to the
classification task, it should be distinguished from the
topic based text categorization. The text segmentation is
mapped into a binary classification.
In paper [3], the application of the back propagation to the
judgment of keywords is validated restrictedly. The
definition of the back propagation to the judgment of
keywords may be considered in various ways. The
Information systems dealing with documents, such as
Knowledge Management (KM), Information Retrieval (IR)
and Digital Library (DL) systems require the storage of
documents and structured data, called the document
surrogate, associated with documents. Documents are
written in natural language and cannot be processed
directly by computers. A typical document surrogate,
which is converted from the natural language document
by computer, contains indices of the document and
includes main words reflecting the contents. Indexing
defines the process of converting a document into a list of
words included in it. This paper proposed the application
of back propagation and consideration of more factors
with the addition to TF (Term Frequency) and IDF
(Inverse Document Frequency).
Paper [4], states that text categorization is the process of
assigning one or some among predefined categories to
each document. The task belongs to pattern classification
where texts or documents are given as patterns. Note that
almost information in any system is given as textual
formats dominantly over numerical one. For managing
efficiently the kind of information given as the textual
format, techniques of text categorization are necessary;
text categorization became a very interesting research
topic in both academic and industrial worlds. In this
version of the proposed text categorization system, the
number of entries of tables is fixed constantly. The
proposed one is called static index based approach.
However, the optimal number of entries is very dependent
on the given document or corpus. The size of each table
should be optimized in terms of two factors: reliability and
efficiency.
In paper [5], authors tried to understand the automated
categorization (or classification) of texts into predefined
categories has witnessed a booming interest in the last 10
years, due to the increased availability of documents in
digital form and the ensuing need to organize them. It is
important to bear in mind that the considerations above
are not absolute statements (if there may be any) on the
comparative effectiveness of these TC methods. One of the
reasons is that a particular applicative context may exhibit
very different characteristics from the ones to be found in
Reuters, and different classifiers may respond differently
to these characteristics. An experimental study by
Joachims [1998] involving support vector machines, k-NN,
decision trees, Rocchio, and Naive Bayes, showed all these
classifiers to have similar effectiveness on categories with
300 positive training examples each. The fact that this
experiment involved the methods which have scored best
(support vector machines, k-NN) and worst (Rocchio and
Naive Bayes).Most popular approach to TC, at least in the
operational (i.e., real world applications) community, was
a knowledge engineering (KE).
In paper [6], the authors have studied that text clustering
refers to the process of segmenting a particular group of
documents into sub groups each of which contains content
based similar documents. A collection or group of
documents is given as the input of the task. Several smaller
groups of content-based similar documents are generated
from the task as its output. Although there are many
heuristic approaches to the task, unsupervised learning
algorithms have been used as state of the art approaches
to it. The process of encoding documents into numerical
vectors for using traditional unsupervised learning
algorithms for text clustering causes the two main
problems. The first problem is huge dimensionality where
documents must be encoded into very large dimensional
numerical vectors for preventing information loss. In
general, documents must be encoded at least into several
hundreds dimensional numerical vectors in previous
literatures. This problem causes very expensive cost for
processing each numerical vector representing a
document in terms of time and system resources.
Furthermore, much more training examples are required
proportionally to the dimension for avoiding
overfitting.The second problem is sparse distribution
where each numerical vector has zero values dominantly.
In other words, more than 90 degree 0 of its elements are
zero values in each numerical vector. This phenomenon
degrades the discrimination among numerical vectors.
This causes poor performance of text categorization or
text clustering. In order to improve performance of both
tasks, the two problems should be solved.
3. PROPOSED SYSTEM
KNN Classifier:
This section tells about the KNN classifier which is an
algorithm used for text segmentation. It keeps the record
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 368
of all the previous cases and another unknown case is
been classified. It is a type of supervised learning. The
unknown case is been classified by the maximum votes of
its K nearest neighbours. It is a kind of Machine Learning
algorithm and also one of the simplest algorithm used for
classification. It considers the similarity between the
attributes of the answers written by the user and then
computes the similarities between the features of the
answer and specimen answers. In this research, we
encode sentence pairs or paragraph pairs into string
vectors, and apply the string vector based version of KNN
to the classification task mapped from the text
segmentation
NLP:
This section is concerned with Natural Language
Processing which is a field of AI(Artificial Intelligence). It
is about the co-operation between the computer and the
Natural Language used by humans. NLP is helpful in
solving many problems like machine translation and text
segmentation.
Text Segmentation:
This section is concerned about Text segmentation which
is the process where the text which is been written is
divided into small parts. The term applies both to mental
processes used by humans when reading text, and to
artificial processes implemented in computers, which are
the subject of natural language processing. It is very
helpful in assisting computers so that it is possible for the
computers to do artificial things. It is a precursor Natural
Language Processing. Text Segmentation recognizes the
boundaries in between the words.
Data Store:
This section tells us about the role of data store in the
process. A data store is a repository for storing collections
of data, such as database. A data store is basically a
connection to the repository of data, whether the data is
stored in a single database or in one more different files.
The data store can be used to gain data or you can export
the data from results and then store it in the data store, or
both. The data collected from the users is stored in the
data store. For the processing the data stored in the data
store is processed and stored back into the data store for
the users to retrieve their processed data whenever he
wants. Hence data store plays a major role in the entire
process. For the data to be stored in the data store it need
not compulsorily be arranged in some relational format.
Fig -1: Architecture Diagram
4. CONCLUSIONS
An examination system is developed based on the web.
This paper describes the principle of the system, presents
the main functions of the system, analyzes the auto-
generating test paper algorithm, and discusses the
security of the system. With the help of the algorithm we
can conduct online subjective exams anywhere and
everywhere.
It saves time as it allows number of students to give the
exam at a time and displays the results as the test gets
over, so no need to wait for the result. It is automatically
generated by the server. Staff has a privilege to create,
modify and delete the test papers and its particular
questions. Student can register, login and give the test
with his specific id, and can see the results as well.
ACKNOWLEDGEMENT
We thank our guide Prof. A. D. Dhawale for his guidance
and support.
REFERENCES
1) Taeho Jo, “Using K Nearest Neighbors for Text
Segmentation with Feature Similarity”,
International Conference on Communication,
Control, Computing and Electronics Engineering
(ICCCCEE), 2017.
International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056
Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072
© 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 369
2) Taeho Jo, “Content based Segmentation of Texts
using Table based KNN”, IKE, 2017.
3) Taeho Jo, Malrey Lee , and Thomas M Gatton,
“Keyword Extraction from Documents Using a
Neural Network Model”, IEEE, 2016.
4) T. Jo, “NTC (Neural Text Categorizer): Neural
Network for Text Categorization”, pp83-96,
International Journal of Information Studies, Vol 2,
No 2, 2010.
5) T. Jo, “Normalized Table Matching Algorithm as
Approach to Text Categorization”, pp839-849, Soft
Computing, Vol 19, No 4, 2015.
6) T. Jo, “Single Pass Algorithm for Text Clustering by
Encoding Documents into Tables”, pp1749-1757,
Journal of Korea Multimedia Society, Vol 11, No 12,
2008.
7) T. Jo and D. Cho, “Index based Approach for Text
Categorization”, International Journal of
Mathematics and Computers in Simulation, Vol 2,
No 1, 2008.
8) H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini,
and C. Watkins, “Text Classification with String
Kernels”, pp419-444, Journal of Machine Learning
Research, Vol 2, No 2, 2002.
9) F. Sebastiani, “Machine Learning in Automated Text
Categorization”, pp1-47, ACM Computing Survey,
Vol 34, No 1, 2002
10) T. Jo, “Representation of Texts into String Vectors
for Text Categorization”, pp110-127, Journal of
Computing Science and Engineering, Vol 4, No 2,
2010.

More Related Content

What's hot

A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...IJERA Editor
 
G04124041046
G04124041046G04124041046
G04124041046IOSR-JEN
 
An efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentAn efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentSaleihGero
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text miningredpel dot com
 
An Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text ClassificationAn Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text ClassificationIJCSIS Research Publications
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationIJMER
 
IRJET- Text Document Clustering using K-Means Algorithm
IRJET-  	  Text Document Clustering using K-Means Algorithm IRJET-  	  Text Document Clustering using K-Means Algorithm
IRJET- Text Document Clustering using K-Means Algorithm IRJET Journal
 
IRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between SentencesIRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between SentencesIRJET Journal
 
Legal Document
Legal DocumentLegal Document
Legal Documentlegal4
 
Text Document categorization using support vector machine
Text Document categorization using support vector machineText Document categorization using support vector machine
Text Document categorization using support vector machineIRJET Journal
 
Suitability of naïve bayesian methods for paragraph level text classification...
Suitability of naïve bayesian methods for paragraph level text classification...Suitability of naïve bayesian methods for paragraph level text classification...
Suitability of naïve bayesian methods for paragraph level text classification...ijaia
 
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MININGA CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MININGijcsit
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 

What's hot (15)

A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
A Comparative Study of Centroid-Based and Naïve Bayes Classifiers for Documen...
 
G04124041046
G04124041046G04124041046
G04124041046
 
An efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-documentAn efficient-classification-model-for-unstructured-text-document
An efficient-classification-model-for-unstructured-text-document
 
Bl24409420
Bl24409420Bl24409420
Bl24409420
 
Relevance feature discovery for text mining
Relevance feature discovery for text miningRelevance feature discovery for text mining
Relevance feature discovery for text mining
 
An Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text ClassificationAn Evaluation of Preprocessing Techniques for Text Classification
An Evaluation of Preprocessing Techniques for Text Classification
 
Query Answering Approach Based on Document Summarization
Query Answering Approach Based on Document SummarizationQuery Answering Approach Based on Document Summarization
Query Answering Approach Based on Document Summarization
 
IRJET- Text Document Clustering using K-Means Algorithm
IRJET-  	  Text Document Clustering using K-Means Algorithm IRJET-  	  Text Document Clustering using K-Means Algorithm
IRJET- Text Document Clustering using K-Means Algorithm
 
IRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between SentencesIRJET-Semantic Similarity Between Sentences
IRJET-Semantic Similarity Between Sentences
 
Hc3612711275
Hc3612711275Hc3612711275
Hc3612711275
 
Legal Document
Legal DocumentLegal Document
Legal Document
 
Text Document categorization using support vector machine
Text Document categorization using support vector machineText Document categorization using support vector machine
Text Document categorization using support vector machine
 
Suitability of naïve bayesian methods for paragraph level text classification...
Suitability of naïve bayesian methods for paragraph level text classification...Suitability of naïve bayesian methods for paragraph level text classification...
Suitability of naïve bayesian methods for paragraph level text classification...
 
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MININGA CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
A CLUSTERING TECHNIQUE FOR EMAIL CONTENT MINING
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 

Similar to Machine Learning Text Segmentation IRJET

IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET Journal
 
An in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPAn in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPIRJET Journal
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringIRJET Journal
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsIRJET Journal
 
Text Document Classification System
Text Document Classification SystemText Document Classification System
Text Document Classification SystemIRJET Journal
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemIRJET Journal
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...IRJET Journal
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsIRJET Journal
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search EngineIRJET Journal
 
Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach IJCSIS Research Publications
 
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft ComputingIRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft ComputingIRJET Journal
 
Survey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document ClassificationSurvey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document ClassificationIOSR Journals
 
Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase IJECEIAES
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET Journal
 
Context Driven Technique for Document Classification
Context Driven Technique for Document ClassificationContext Driven Technique for Document Classification
Context Driven Technique for Document ClassificationIDES Editor
 
Meta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval processMeta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval processeSAT Journals
 
Group4 doc
Group4 docGroup4 doc
Group4 docfirati
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...IAESIJAI
 

Similar to Machine Learning Text Segmentation IRJET (20)

IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
IRJET- Diverse Approaches for Document Clustering in Product Development Anal...
 
An in-depth review on News Classification through NLP
An in-depth review on News Classification through NLPAn in-depth review on News Classification through NLP
An in-depth review on News Classification through NLP
 
Reviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clusteringReviews on swarm intelligence algorithms for text document clustering
Reviews on swarm intelligence algorithms for text document clustering
 
Semantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical ChainsSemantic Based Document Clustering Using Lexical Chains
Semantic Based Document Clustering Using Lexical Chains
 
C017321319
C017321319C017321319
C017321319
 
Text Document Classification System
Text Document Classification SystemText Document Classification System
Text Document Classification System
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documents
 
IRJET- Automated Document Summarization and Classification using Deep Lear...
IRJET- 	  Automated Document Summarization and Classification using Deep Lear...IRJET- 	  Automated Document Summarization and Classification using Deep Lear...
IRJET- Automated Document Summarization and Classification using Deep Lear...
 
Exploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining ApplicationsExploiting Wikipedia and Twitter for Text Mining Applications
Exploiting Wikipedia and Twitter for Text Mining Applications
 
IRJET- Review on Information Retrieval for Desktop Search Engine
IRJET-  	  Review on Information Retrieval for Desktop Search EngineIRJET-  	  Review on Information Retrieval for Desktop Search Engine
IRJET- Review on Information Retrieval for Desktop Search Engine
 
Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach Improved Text Mining for Bulk Data Using Deep Learning Approach
Improved Text Mining for Bulk Data Using Deep Learning Approach
 
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft ComputingIRJET- Semantic based Automatic Text Summarization based on Soft Computing
IRJET- Semantic based Automatic Text Summarization based on Soft Computing
 
Survey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document ClassificationSurvey of Machine Learning Techniques in Textual Document Classification
Survey of Machine Learning Techniques in Textual Document Classification
 
Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase Converting UML Class Diagrams into Temporal Object Relational DataBase
Converting UML Class Diagrams into Temporal Object Relational DataBase
 
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word EmbeddingIRJET- Short-Text Semantic Similarity using Glove Word Embedding
IRJET- Short-Text Semantic Similarity using Glove Word Embedding
 
Context Driven Technique for Document Classification
Context Driven Technique for Document ClassificationContext Driven Technique for Document Classification
Context Driven Technique for Document Classification
 
Meta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval processMeta documents and query extension to enhance information retrieval process
Meta documents and query extension to enhance information retrieval process
 
Group4 doc
Group4 docGroup4 doc
Group4 doc
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...
 

More from IRJET Journal

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...IRJET Journal
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTUREIRJET Journal
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...IRJET Journal
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsIRJET Journal
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...IRJET Journal
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...IRJET Journal
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...IRJET Journal
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...IRJET Journal
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASIRJET Journal
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...IRJET Journal
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProIRJET Journal
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...IRJET Journal
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemIRJET Journal
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesIRJET Journal
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web applicationIRJET Journal
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...IRJET Journal
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.IRJET Journal
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...IRJET Journal
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignIRJET Journal
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...IRJET Journal
 

More from IRJET Journal (20)

TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
TUNNELING IN HIMALAYAS WITH NATM METHOD: A SPECIAL REFERENCES TO SUNGAL TUNNE...
 
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURESTUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
STUDY THE EFFECT OF RESPONSE REDUCTION FACTOR ON RC FRAMED STRUCTURE
 
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
A COMPARATIVE ANALYSIS OF RCC ELEMENT OF SLAB WITH STARK STEEL (HYSD STEEL) A...
 
Effect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil CharacteristicsEffect of Camber and Angles of Attack on Airfoil Characteristics
Effect of Camber and Angles of Attack on Airfoil Characteristics
 
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
A Review on the Progress and Challenges of Aluminum-Based Metal Matrix Compos...
 
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
Dynamic Urban Transit Optimization: A Graph Neural Network Approach for Real-...
 
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
Structural Analysis and Design of Multi-Storey Symmetric and Asymmetric Shape...
 
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
A Review of “Seismic Response of RC Structures Having Plan and Vertical Irreg...
 
A REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADASA REVIEW ON MACHINE LEARNING IN ADAS
A REVIEW ON MACHINE LEARNING IN ADAS
 
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
Long Term Trend Analysis of Precipitation and Temperature for Asosa district,...
 
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD ProP.E.B. Framed Structure Design and Analysis Using STAAD Pro
P.E.B. Framed Structure Design and Analysis Using STAAD Pro
 
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
A Review on Innovative Fiber Integration for Enhanced Reinforcement of Concre...
 
Survey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare SystemSurvey Paper on Cloud-Based Secured Healthcare System
Survey Paper on Cloud-Based Secured Healthcare System
 
Review on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridgesReview on studies and research on widening of existing concrete bridges
Review on studies and research on widening of existing concrete bridges
 
React based fullstack edtech web application
React based fullstack edtech web applicationReact based fullstack edtech web application
React based fullstack edtech web application
 
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
A Comprehensive Review of Integrating IoT and Blockchain Technologies in the ...
 
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
A REVIEW ON THE PERFORMANCE OF COCONUT FIBRE REINFORCED CONCRETE.
 
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
Optimizing Business Management Process Workflows: The Dynamic Influence of Mi...
 
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic DesignMultistoried and Multi Bay Steel Building Frame by using Seismic Design
Multistoried and Multi Bay Steel Building Frame by using Seismic Design
 
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
Cost Optimization of Construction Using Plastic Waste as a Sustainable Constr...
 

Recently uploaded

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSKurinjimalarL3
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZTE
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxbritheesh05
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSCAESB
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxpurnimasatapathy1234
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidNikhilNagaraju
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfAsst.prof M.Gokilavani
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionDr.Costas Sachpazis
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLDeelipZope
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort servicejennyeacort
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacingjaychoudhary37
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineeringmalavadedarshan25
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxPoojaBan
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxwendy cai
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 

Recently uploaded (20)

APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICSAPPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
APPLICATIONS-AC/DC DRIVES-OPERATING CHARACTERISTICS
 
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
ZXCTN 5804 / ZTE PTN / ZTE POTN / ZTE 5804 PTN / ZTE POTN 5804 ( 100/200 GE Z...
 
Artificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptxArtificial-Intelligence-in-Electronics (K).pptx
Artificial-Intelligence-in-Electronics (K).pptx
 
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptxExploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
Exploring_Network_Security_with_JA3_by_Rakesh Seal.pptx
 
GDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentationGDSC ASEB Gen AI study jams presentation
GDSC ASEB Gen AI study jams presentation
 
Microscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptxMicroscopic Analysis of Ceramic Materials.pptx
Microscopic Analysis of Ceramic Materials.pptx
 
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
9953056974 Call Girls In South Ex, Escorts (Delhi) NCR.pdf
 
main PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfidmain PPT.pptx of girls hostel security using rfid
main PPT.pptx of girls hostel security using rfid
 
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdfCCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
CCS355 Neural Network & Deep Learning Unit II Notes with Question bank .pdf
 
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective IntroductionSachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
Sachpazis Costas: Geotechnical Engineering: A student's Perspective Introduction
 
Current Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCLCurrent Transformer Drawing and GTP for MSETCL
Current Transformer Drawing and GTP for MSETCL
 
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort serviceGurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
Gurgaon ✡️9711147426✨Call In girls Gurgaon Sector 51 escort service
 
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(ANVI) Koregaon Park Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
microprocessor 8085 and its interfacing
microprocessor 8085  and its interfacingmicroprocessor 8085  and its interfacing
microprocessor 8085 and its interfacing
 
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
🔝9953056974🔝!!-YOUNG call girls in Rajendra Nagar Escort rvice Shot 2000 nigh...
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Internship report on mechanical engineering
Internship report on mechanical engineeringInternship report on mechanical engineering
Internship report on mechanical engineering
 
Heart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptxHeart Disease Prediction using machine learning.pptx
Heart Disease Prediction using machine learning.pptx
 
What are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptxWhat are the advantages and disadvantages of membrane structures.pptx
What are the advantages and disadvantages of membrane structures.pptx
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 

Machine Learning Text Segmentation IRJET

  • 1. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 366 Text Segmentation for Online Subjective Examination Using Machine Learning Shahid Khan1, Rakshanda Chavan2 , Diksha Singh3, Tina Sajwan4 1,2,3,4 Modern Education Society’s College of Engineering, Pune-411001 ---------------------------------------------------------------------***--------------------------------------------------------------------- Abstract - This paper focuses on text segmentation for natural language using k-Nearest Neighbour (K-NN) classifier , which is a type of instance-based learning, or lazy learning, where the function is only approximated locally and all computation is deferred until classification. The Text segmentation divides written text into meaningful units, which is used by humans when reading text, and artificial processes implemented in computers which are subject to natural language processing. K-NN computes the similarity measure among attributes to determine similarity between feature vectors after which K-NN is modified based on the similarity measure, this version is applied into the text segmentation task. The goal of this paper is to implement natural language processing using text segmentation which provides the benefits. Key Words: K-NN, text segmentation, feature similarity, NLP 1.INTRODUCTION The text segmentation is defined as process of segmenting automatically a large text into many parts based on its topic or content. The information retrieval (IR) systems tend to retrieve long texts which contain more than one topic, as very high relevant texts to the given query, so the long texts need to be segmented into text partitions topic by topic. The task of text segmentation is to partition the text into sentences and paragraphs and judge whether the topic boundary is put or not between two adjacent sentences or paragraph. In this task, the text is given as the input and segmented into paragraphs, a list of pairs of adjacent paragraphs is generated, and each pair is judged whether we put the topic boundary between them, or not. The task is interpreted into a binary classification where each pair of paragraphs is classified into separation or non-separation. The task may be interpreted into the binary classification where each sentence or paragraph pair into the transition to the different topic or the continuation of the identical topic. Some issues are caused by encoding texts into numerical vectors and computing their similarities based on only attribute values. This problem causes very high costs for processing each numerical vector representing a document in terms of time and system resources. Much more training examples are required proportionally to the dimension for avoiding overfitting. The second problem is sparse distribution where each numerical vector has zero values dominantly. Let us mention what we propose in this research as some agenda. In this research, we assume that words are given as features of numerical vectors in encoding texts, and they have their semantic relations with others. Based on the assumption, we define the similarity measure for computing the similarity between feature vectors, considering both feature values and features. We modify the KNN into the version where both the feature similarity and the feature value similarity are used, and apply it to the classification task mapped from the text segmentation. As benefits from this research, we expect its more tolerance to the sparse distributions and the potential avoidance of the huge dimensionality. Let us mention what is expected from this research as benefits by implementing the above ideas. We may cut down the dimensionality in encoding texts into numerical vectors, potentially. The information loss in computing the similarity between texts may be reduced by reflecting the similarities among the features. We present some benefits which are expected from this research. By representing the texts into alternative one to the numerical vectors, we may escape from the two main problems in doing so. The proposed approach becomes less sensitive to the sparse distribution of numerical vectors, because the similarity among features is captured as well as among feature values. 2. RELATED WORK Let us survey the previous cases of encoding texts into structured forms for using the machine learning algorithms for text mining tasks. The three main problems, huge dimensionality, sparse distribution, and poor transparency, have existed inherently in encoding them into numerical vectors. In previous works, various schemes of pre-processing texts have been proposed, in order to solve the problems. In paper [1], it is given that text segmentation refers to the process of segmenting an article into its several parts based on its content. Because in the information retrieval systems, a long text tends to be retrieved most frequently by overestimation of its relevancy to a query, we need to segment it into its several parts, in order to avoid the
  • 2. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 367 problem. In this task, the text is given as the input and segmented into paragraphs, a list of pairs of adjacent paragraphs is generated, and each pair is judged whether we put the topic boundary between them, or not. In paper [2], The task of text segmentation is to partition the text into sentences and paragraphs and judge whether the topic boundary is put or not between two adjacent sentences or paragraph. The task may be interpreted into the binary classification where each sentence or paragraph pair into the transition to the different topic or the continuation of the identical topic. Segmentation of speech texts into sentences or paragraphs may be considered but covered in the next research. In the text categorization, the sample texts may span over various domains, whereas in the text segmentation, the sample paragraphs should be within a domain. Therefore, although the text segmentation belongs to the classification task, it should be distinguished from the topic based text categorization. The text segmentation is mapped into a binary classification. In paper [3], the application of the back propagation to the judgment of keywords is validated restrictedly. The definition of the back propagation to the judgment of keywords may be considered in various ways. The Information systems dealing with documents, such as Knowledge Management (KM), Information Retrieval (IR) and Digital Library (DL) systems require the storage of documents and structured data, called the document surrogate, associated with documents. Documents are written in natural language and cannot be processed directly by computers. A typical document surrogate, which is converted from the natural language document by computer, contains indices of the document and includes main words reflecting the contents. Indexing defines the process of converting a document into a list of words included in it. This paper proposed the application of back propagation and consideration of more factors with the addition to TF (Term Frequency) and IDF (Inverse Document Frequency). Paper [4], states that text categorization is the process of assigning one or some among predefined categories to each document. The task belongs to pattern classification where texts or documents are given as patterns. Note that almost information in any system is given as textual formats dominantly over numerical one. For managing efficiently the kind of information given as the textual format, techniques of text categorization are necessary; text categorization became a very interesting research topic in both academic and industrial worlds. In this version of the proposed text categorization system, the number of entries of tables is fixed constantly. The proposed one is called static index based approach. However, the optimal number of entries is very dependent on the given document or corpus. The size of each table should be optimized in terms of two factors: reliability and efficiency. In paper [5], authors tried to understand the automated categorization (or classification) of texts into predefined categories has witnessed a booming interest in the last 10 years, due to the increased availability of documents in digital form and the ensuing need to organize them. It is important to bear in mind that the considerations above are not absolute statements (if there may be any) on the comparative effectiveness of these TC methods. One of the reasons is that a particular applicative context may exhibit very different characteristics from the ones to be found in Reuters, and different classifiers may respond differently to these characteristics. An experimental study by Joachims [1998] involving support vector machines, k-NN, decision trees, Rocchio, and Naive Bayes, showed all these classifiers to have similar effectiveness on categories with 300 positive training examples each. The fact that this experiment involved the methods which have scored best (support vector machines, k-NN) and worst (Rocchio and Naive Bayes).Most popular approach to TC, at least in the operational (i.e., real world applications) community, was a knowledge engineering (KE). In paper [6], the authors have studied that text clustering refers to the process of segmenting a particular group of documents into sub groups each of which contains content based similar documents. A collection or group of documents is given as the input of the task. Several smaller groups of content-based similar documents are generated from the task as its output. Although there are many heuristic approaches to the task, unsupervised learning algorithms have been used as state of the art approaches to it. The process of encoding documents into numerical vectors for using traditional unsupervised learning algorithms for text clustering causes the two main problems. The first problem is huge dimensionality where documents must be encoded into very large dimensional numerical vectors for preventing information loss. In general, documents must be encoded at least into several hundreds dimensional numerical vectors in previous literatures. This problem causes very expensive cost for processing each numerical vector representing a document in terms of time and system resources. Furthermore, much more training examples are required proportionally to the dimension for avoiding overfitting.The second problem is sparse distribution where each numerical vector has zero values dominantly. In other words, more than 90 degree 0 of its elements are zero values in each numerical vector. This phenomenon degrades the discrimination among numerical vectors. This causes poor performance of text categorization or text clustering. In order to improve performance of both tasks, the two problems should be solved. 3. PROPOSED SYSTEM KNN Classifier: This section tells about the KNN classifier which is an algorithm used for text segmentation. It keeps the record
  • 3. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 368 of all the previous cases and another unknown case is been classified. It is a type of supervised learning. The unknown case is been classified by the maximum votes of its K nearest neighbours. It is a kind of Machine Learning algorithm and also one of the simplest algorithm used for classification. It considers the similarity between the attributes of the answers written by the user and then computes the similarities between the features of the answer and specimen answers. In this research, we encode sentence pairs or paragraph pairs into string vectors, and apply the string vector based version of KNN to the classification task mapped from the text segmentation NLP: This section is concerned with Natural Language Processing which is a field of AI(Artificial Intelligence). It is about the co-operation between the computer and the Natural Language used by humans. NLP is helpful in solving many problems like machine translation and text segmentation. Text Segmentation: This section is concerned about Text segmentation which is the process where the text which is been written is divided into small parts. The term applies both to mental processes used by humans when reading text, and to artificial processes implemented in computers, which are the subject of natural language processing. It is very helpful in assisting computers so that it is possible for the computers to do artificial things. It is a precursor Natural Language Processing. Text Segmentation recognizes the boundaries in between the words. Data Store: This section tells us about the role of data store in the process. A data store is a repository for storing collections of data, such as database. A data store is basically a connection to the repository of data, whether the data is stored in a single database or in one more different files. The data store can be used to gain data or you can export the data from results and then store it in the data store, or both. The data collected from the users is stored in the data store. For the processing the data stored in the data store is processed and stored back into the data store for the users to retrieve their processed data whenever he wants. Hence data store plays a major role in the entire process. For the data to be stored in the data store it need not compulsorily be arranged in some relational format. Fig -1: Architecture Diagram 4. CONCLUSIONS An examination system is developed based on the web. This paper describes the principle of the system, presents the main functions of the system, analyzes the auto- generating test paper algorithm, and discusses the security of the system. With the help of the algorithm we can conduct online subjective exams anywhere and everywhere. It saves time as it allows number of students to give the exam at a time and displays the results as the test gets over, so no need to wait for the result. It is automatically generated by the server. Staff has a privilege to create, modify and delete the test papers and its particular questions. Student can register, login and give the test with his specific id, and can see the results as well. ACKNOWLEDGEMENT We thank our guide Prof. A. D. Dhawale for his guidance and support. REFERENCES 1) Taeho Jo, “Using K Nearest Neighbors for Text Segmentation with Feature Similarity”, International Conference on Communication, Control, Computing and Electronics Engineering (ICCCCEE), 2017.
  • 4. International Research Journal of Engineering and Technology (IRJET) e-ISSN: 2395-0056 Volume: 04 Issue: 12 | Dec-2017 www.irjet.net p-ISSN: 2395-0072 © 2017, IRJET | Impact Factor value: 6.171 | ISO 9001:2008 Certified Journal | Page 369 2) Taeho Jo, “Content based Segmentation of Texts using Table based KNN”, IKE, 2017. 3) Taeho Jo, Malrey Lee , and Thomas M Gatton, “Keyword Extraction from Documents Using a Neural Network Model”, IEEE, 2016. 4) T. Jo, “NTC (Neural Text Categorizer): Neural Network for Text Categorization”, pp83-96, International Journal of Information Studies, Vol 2, No 2, 2010. 5) T. Jo, “Normalized Table Matching Algorithm as Approach to Text Categorization”, pp839-849, Soft Computing, Vol 19, No 4, 2015. 6) T. Jo, “Single Pass Algorithm for Text Clustering by Encoding Documents into Tables”, pp1749-1757, Journal of Korea Multimedia Society, Vol 11, No 12, 2008. 7) T. Jo and D. Cho, “Index based Approach for Text Categorization”, International Journal of Mathematics and Computers in Simulation, Vol 2, No 1, 2008. 8) H. Lodhi, C. Saunders, J. Shawe-Taylor, N. Cristianini, and C. Watkins, “Text Classification with String Kernels”, pp419-444, Journal of Machine Learning Research, Vol 2, No 2, 2002. 9) F. Sebastiani, “Machine Learning in Automated Text Categorization”, pp1-47, ACM Computing Survey, Vol 34, No 1, 2002 10) T. Jo, “Representation of Texts into String Vectors for Text Categorization”, pp110-127, Journal of Computing Science and Engineering, Vol 4, No 2, 2010.