SlideShare a Scribd company logo
1 of 7
Download to read offline
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 334
A TEMPLATE BASED ALGORITHM FOR AUTOMATIC
SUMMARIZATION AND DIALOGUE MANAGEMENT FOR TEXT
DOCUMENTS
Prashant G. Desai1
, Sarojadevi H2
, Niranjan N. Chiplunkar3
1
Lecturer, Department of Computer Science & Engineering, N.R.A.M.P., Nitte, Karnataka, India
2
Professor & Head, Department of Computer Science, N.M.A.M.I.T., Nitte, Karnataka, India
3
Principal, N.M.A.M.I.T., Nitte, Karnataka, India
Abstract
This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of
research is to come out with a methodology which helps in extracting important events such as dates, places, and subjects of
interest. It would be also convenient if the methodology helps in presenting the users with a shorter version of the text which
contain all non-trivial information. We also discuss implementation of algorithms which exactly does this task, developed by us.
Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
--------------------------------------------------------------------***----------------------------------------------------------------------
1. INTRODUCTION
The world is witnessing a large quantity of digital
information today. This digital information is available in
different ways and from different sources. The World Wide
Web has been the dominant source of digital information.
This digital information has been presented to user mainly in
two formats: structured information and unstructured
information. Understanding and processing of structured
information is quite easier due to its very basic nature that
the information is arranged in a specific format. The same is
not true with unstructured information since there is no
specific format in which the information is arranged. The
information no matter how significant it is, spread across the
whole document. Therefore it is a challenging task to
understand, process and extract the useful and significant
information from unstructured documents such as emails,
blogs, and research documents available in doc, pdf and txt
formats. The ultimate goal of text mining is to discover
useful information and make knowledge out of it.
This paper describes a natural language processing approach
for identifying and extracting the important information
from the unstructured text.
2. RELATED WORK
The work presented in [6] describes a model for answering
the user inputted question, by referring to a large database
collection of question-answer sets. The words in the
question submitted by the user are expanded using the
synonyms of respective words. Researchers have mainly
concentrated on obtaining the synonyms of noun, verbs and
adjectives. Database search for finding the answer is done
after the question is expanded.
Researchers of [18] present a two-phased process for text
summarization. In the 1st phase text from original document
is analyzed for topic and passages are extracted. In the next
phase the extracted text is understood with the help of
WordNet and FrameNet based on which text is extracted,
joined together before being presented to the user as output.
[23] is a research work for extracting answer from on-line
psychological counseling websites. The information
available from these websites is used as knowledge base. An
index is created for making the search of key words
efficient. Using this index based search candidate answers
fetched for a user query is generated as response.
Three attributes‟ information is used for summarizing a
document in [1]. Those 3 words are- field, associated terms
and attribute grammars. The results of experiments
conducted by the authors‟ show that high level accuracy is
attained while implementing their concept.
The work illustrated by [5], formulates a set of rules for
summarizing the text document. To start with algorithm
analyses the structure of the natural language text. Later, the
prescribed rules formulated by the researchers are applied to
generate the summary.
[17] discusses a summarization technique based on fuzzy
logic. Before applying the fuzzy logic, sentences from the
original text document, are selected based on 8 pre-defined
rules such as title feature, sentence length, term weight,
sentence position, sentence to sentence similarity, proper
noun, thematic word, and numerical data. These are
implemented as simple IF-THEN rules in an inference
engine module of the algorithm.
[9] illustrates the significance of artificial neural networks
for finding key phrases from the text. Authors compare the
key words extracted from neural networks, naïve-bayes and
decision tree algorithms. Key words extracted from all these
algorithms are then compared with publicly available key
phrase extraction algorithm KEA. Laters authors are of the
opinion that, neural networks perform better for extracting
keywords.
A text mining approach is discussed in [22] for creating
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 335
ontology from text belonging to a specific domain. The
methodology adopted for building ontology uses data pre-
processing, natural language analysis, concept extraction,
semantic relation extraction and ontology building.
In the study of [14], it is learnt that quality of automated
summary is dependent of key phrases. Therefore, the author
focuses on locating high quality key words from the text.
The procedure adopted for summarizing is pre-processing,
key phrase extraction and then sentence selection.
[16] provides guidelines which are implemented by them for
achieving text mining independent of the language of
original text. Important points made by the authors‟ are to
formulate the rules independent of the language, minimizing
the use of language specific resources, if the use of such
resources become inventible, keep them in a modular
representation so that adding new ones become easier, for
cross-lingual applications, represent the docuSDment in a
neutral language to overcome the complexity of dealing
with multiple languages.
3. METHODOLOGY
This methodology is implemented in two phases: 1. Text
Pre-Processing 2. Information Extraction. Soon after the
document is chosen by the user, document goes to file
classification module. This module identifies whether the
input document is a pdf or MS-Word document before
reading the contents.
Phase 1: Text Pre-Processing
This part of the implementation includes the modules shown
below:
1.Syntactic analysis
2. Tokenization
3. Semantic analysis
4. Stop word removal
5. Stemming
Phase 2: Information extraction
This part of the implementation includes the modules shown
below:
1. Training the dialogue control
2. Knowledge base discovery
3. Dialogue Management
4. Template based summarization
5. Domain intelligence suites
Figure 1 and Figure 2 show the building blocks of our
system architecture.
Phase1: Text Pre-Processing
3.1 Syntactic Analysis
The unstructured text does not have any pre-defined format
in which data is organized. Hence, it is very much required
to decide where a particular piece of data begins and where
exactly it ends. The task of syntactic analysis module is
exactly this, i.e., to decide beginning and ending of each
sentence in a document. Presently we assume that full stop
symbol marks the end of sentence. Any string of characters
up to full stop symbol is treated as one sentence. We do this
with help of java class string tokenizer.
Text
Document
File
classification
Processed text
Text Pre-processing
Syntactic
analysis
Semantic
analysis
Tokeniza
tion
Stop
word
removal
Stemming
Fig -1: Text Pre-Processing modules
3.2 Tokenization
The task of tokenizer is to break the sentence into tokens.
The broken pieces of a sentence may include words,
numbers and punctuation marks. This step is very much
essential for next level of processing the text. [15].
3.3 Semantic Analysis
Semantic analysis module understands the part of each word
played in a sentence. Then it assigns a tag to every word
such as noun, verb, adjective, adverb and so on. This
process of understanding the part played by each word and
then assigning an appropriate tag is called Part-Of-Speech
tagging or POS tagging. In order to implement this module,
Stanford University [19] POS tagger is used
3.4 Stop Word Removal
Some of the words appear in the natural language text more
often but have very little significance when we take over all
meaning of the sentence. Such words are termed stop words.
These words are removed before the document is considered
for next level of processing. [10]
3.5 Stemming
Stemming is the process of finding base form of a certain
word in the text document [4, 3]. Any text document may
include repetition of same word but in different forms of the
grammar such as word being present in past tense, or in
present tense and sometimes in present continuous tense and
so on. To avoid all these forms of the same considered as
different words, stemming is performed. Presently porter
stemmer [11] is used for stemming.
The text after going through all these modules is fed into the
second phase called Information extraction module.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 336
Phase2: Information extraction
This module takes the pre-processed text for discovering
important information using machine learning algorithms
designed and implemented by us. The process of the
algorithm begins with training the model.
3.6 Training The Dialogue Control
Here the administrator of the module trains the system.
During training, the system learns important or index terms,
named entities such as name of the persons, places, temporal
formats according to rules. These rules are defined and
discussed in our previous paper [12]. Intelligence of the
algorithm increases with every training. The concepts learnt
during training are stored in the knowledge base of the
system.
3.7 Knowledge Base Discovery
Knowledge base discovery is the task of creating
intelligence from structured (e.g., database, XML, etc.), and
unstructured text (e.g., documents, blogs, emails, etc.)
[8, 7]. Our research is purely focused on processing
unstructured text. Hence, knowledge discovery here means
the process of deriving intelligent information and storing
from unstructured text.
The knowledge base is capable of storing important terms
such as index terms, name of the persons, name of the
locations, in an efficient may. By calling the knowledge
base design as efficient, we mean that, structure is formed in
such a way that all the terms are stored together yet in a
distinguishable format. There by reducing the need for
creating multiple storage structures for storing different
category terms, reducing the search time and thus improving
the overall performance of the algorithm.
Pre-Processed
Text
Domain Intelligence
Suites
Training
Knowledge
Discovery
Data base
Information Extraction
Dialogue
Control
Template Based
Summary
Fig -2:Information Extraction modules
3.8 Dialogue Management
Dialogue is an interaction occurring between two parties:
either between 2 humans or between human and machine
[21]. This paper discusses implementation of an algorithm
for a dialogue between human and machine. The dialogue
management module is a program which offers interaction
between human and the computer. With this module the end
user can request the information using natural language text.
The dialogue control module which built upon the training
model, accepts the user request, understands, refers the
knowledge base and then produces the answers which
possibly contain the sought information. Algorithm below
illustrates the functioning of dialogue management.
1. Choose the input document
2. User is presented with 2 choices:
1) Event related options
2) Writing a customized query
3. With choice-1 events such as important dates, locations,
subjects of interest available in the document are extracted.
4. With choice-2 user has the provision for writing a
customized query
5. Tokenize the query given in natural language text
6. Identify and extract the index terms (which are also called
as index words, cue words, key words and so on)
7. Search the knowledge base using the index terms.
7.1 While searching the answers, algorithm considers the
synonyms of every index word.
7.2 Word Net [13] is used for collecting the synonyms of a
term.
8. Answers which have the index terms appearing in them
are short listed.
9. Then sequence of appearance of index terms in answer
and query is matched
10. Answers are presented to user along with score for every
answer.
11. Score of the answer is count of the index terms
appearing in sequence in which they appear in query.
3.9 Template Based Summarization
Text summarization is the process of putting together
meaningful text present in a document in a condensed
format. Template based summarization in our system is
designed to perform this operation.
Here user has the freedom of choosing what should be
present in the summary. In other words, user prepares
template based on which summary is generated. This
prepared template can include POS tags like Noun, Verb,
Adverb, etc, and the sequence in which user wants them to
appear in the document. User is not confined to only one
pattern of this kind but can have as many patterns as
possible. Once the POS patterns are finalized, user can
decide whether to include named entities and dates in the
summary expected. This completes the step of preparing the
template. The template based summary module takes into
account all these requisites of the user while it generates
summary.
4. EXPERIMENTS AND RESULTS
4.1 Evaluation Metrics
There are many similarity measures to decide the closeness
of two text documents [2] such as Cosine similarity,
Euclidean distance, Jaccard coefficient, Pearson Correlation
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 337
Coefficient, Averaged Kullback-Leibler Divergence. The
experimental study carried out by [20] is of the opinion that
Cosine similarity is better suited for text documents. This
cosine similarity measure is used to evaluate automatically
obtained summary. While evaluation metrics such as
Precision, Recall, and F-Measure are used to evaluate the
dialogue management interface.
4.2 Cosine Similarity
If D = {d1, . . . , dn} is a set of documents and T = {t1, . . .
,tm} the set of unique terms appearing in D. The m-
dimensional vector is used represent the document . The
frequency of term t ∈ T in document d ∈ D is denoted by
tf(d, t). The vector of a document d is represented by the
equation = (tf(d, t1), . . . , tf(d, tm)). Given two
documents „a‟ and „b‟ whose term vectors are represented
by respectively, their cosine similarity is
calculated by the formula
4.3 Precision
Precision measures the number of correctly identified items
as a percentage of the number of items identified. It is
formally defined as
Precision (P) =
 
1
Correct Partial
2
Correct Spurious Partial
 
 
 
 
4.4 Recall
Recall measures the number of correctly identified items as
a percentage of the total number of correct items.
Recall is formally defined as
Recall(R) =
 
1
Correct Partial
2
Correct Missing Partial
 
 
 
 
4.5 Measure
The F-measure is often used in conjunction with Precision
and Recall, as a weighted average of the two. If P and R are
to be given equal weights, then we can use the equation
F1=
 
 
*
0.5* P R
P R

4.6 Experimental Setup
Ten text documents belonging mainly to research documents
and conference papers (CFP) are chosen for evaluating the
results. We manually generated a summary for each of these
ten documents. These manually created summaries are also
called reference summaries. The reference or manual
summaries for the research papers are generated on the basis
of contents having important parts-of-speech (POS) such as
Noun, Verb, Adverb, Proper nouns and so on. The manual
summaries for conference papers are chosen on the basis of
text containing important dates, subjects of interests and
names of the locations. Templates are prepared for
generating automated summary in such a way that,
templates contain patterns having sequences of POS which
were chosen for preparing manual summaries. This is done
mainly to verify the accuracy of the automated summary.
That is how humans prepare the manual summary and
algorithm prepares the summary with same references. Then
we compared the template based summary generated by the
system, with the respective manual summaries. In order to
compare system summary with the manual summary cosine
similarity measure is used, as both of these summaries are
stored as text documents. Table 1 displays the results in
terms of cosine similarity values.
Table -1: Cosine Similarity Indices
Length
Document
Category
Manual
Summary
Length
System
Summary
Length
Cosine
Similarity
Value
168
Technical
/ Research
Document
114 52 0.534
215 CFP 114 246 0.696
221
Technical
/ Research
Document
168 140 0.81
236
Technical
/ Research
Document
156 35 0.56
311
Technical
/ Research
Document
172 226 0.844
325
Technical
/ Research
Document
85 72 0.788
348 CFP 174 295 0.6666
512
Technical
/ Research
Document
265 246 0.785
676
Technical
/ Research
Document
165 379 0.84
1328
Technical
/ Research
Document
151 571 0.656
Table 2 illustrates results analysis for the dialogue
management interface, tested on 02 technical documents and
02 conference papers from the chosen dataset.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 338
4.7 Screen Shots
Figure 3 shows the template prepared and the summary
generated by the system. The template is prepared with two
patterns :
1. A “Determiner” POS tag followed “Plural Noun”
2. A “Base form of Verb” POS tag followed by a
“Singular Noun”.
The original document has 6 sentences. After processing this
original document with reference to the template prepared
by the user, system has produces an automated summary
which contains only 3 sentences of significance. This is
shown in figure 3.
Fig -3 Summary generated based on a template
Fig -4 Cosine similarity index between system and reference summaries.
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 339
Table -2: Result Analysis of Dialogue Management Interface
Index Words
Used for information
extraction Submitted
Document
Category
Expected
Information
Information
Extracted
Correct Partial Spurious Missing Precision Recall
F
Measure
Text, Summarization
Technical
research
paper
Information
from the
document
on Text,
Summarizat
ion
All the
information
present in
the
document
having the
input words
23 8 8 0 0.69 0.87 0.19
Dates
CFP
Dates
embedded
inside the
text
Dates
present in
the
document
1 0 0 0 1.00 1.00 0.25
Places
Name of
Places
Places
present in
the text
2 0 0 1 1.00 0.67 0.20
Subjects of Interest
Techincal
subjects
Areas of
Interest
11 1 6 0 0.64 0.96 0.19
5. CONCLUSIONS
Text mining is a trending research area in the present
scenario. The simple reason for which it is being considered
as trending topic is the vast availability of information in the
form of natural language. In this paper, we presented
algorithms for automatic text summarization and dialogue
management. The focus of automatic text summarization
algorithm is on the requirements of user before original text
is summarized while the algorithm used for dialogue
management establishes interactions between user and the
computer. The main contribution of work reported here is
the template based algorithm for text summarization and
dialogue management. The experiments conducted during
the implementation of work have produced encouraging
results. These results indicate that the accuracy of both
automatic summarization and dialogue management
algorithms is more than70%. Hence, it can be concluded that
the performance of algorithmic implementation of our work
is satisfactory.
REFERENCES
[1] Abdunabi UBUL, EI-Sayed ATLAM, Kazuhiro
MORITA, Masao FUKETA, Junichi AOE, “A Method
for Generating Document Summary using Field
Association Knowledge and Subjectively Information”,
Proceedings of IEEE, 2010
[2] Anna Huang, “Similarity Measures for Text Document
Clustering”, proceedings of NZCSRSC, 2008, pp 49-56
[3] Brill, E. “A Simple Rule-Based Part-of-speech Tagger”.
Proceedings of 3rd Conference on Applied Natural
Language Processing, 1992, pp.152–155.
[4] Church, K.W., “A Stochastic parts program and noun
phrase parser for unrestricted text”. Proceedings 1st
Conference on Applied Natural Language Processing,
ANLP, pp. 136–143. ACL, 1988.
[5] C. Lakshmi Devasenal, M. Hemalatha, “ Automatic
Text Categorization and Summarization using Rule
Reduction”, Proceedings of ICAESM, 2012, pp594-598
[6] Dang Truong Son, Dao Tien Dung, “Apply a mapping
question approach in building the question answering
system for vietnamese language”, Proceedings of
Journal of Engineering Technology and Education,
2012, pp 380-386
[7] en.wikipedia.org/wiki/Knowledge_extraction.
[8] Jantima Polpinij , “Ontology-based Knowledge
Discovery from Unstructured Text”, Proceedings of
IJIPM, Volume4, Number4, 2013, pp 21-31
[9] Kamal Sarkar, Mita Nasipuri,Suranjan Ghose,
“Machine Learning Based Keyphrase Extraction:
Comparing Decision Trees, Naïve Bayes, and Artificial
Neural Networks”, Proceedings of JIPS, 2012, pp693-
712
[10]M.Suneetha, S. Sameen Fatima, “Corpus based
Automatic Text Summarization System with HMM
Tagger”, Proceedings of IJSCE, ISSN: 2231-2307,
Volume-1, Issue-3, 2011, pp 118-123
[11]Porter stemmer,
http://www.tartarus.org/~martin/PorterStemmer
[12]Prashant G. Desai , Sarojadevi H. , Niranjan N.
Chiplunkar, “Rule-Knowledge Based Algorithm for
Event Extraction”, Proceedings of IJARCCE, ISSN
(Online) : 2278-1021, ISSN (Print) : 2319-5940,
Volume 4, Issue 1, 2015, pp 79-85
[13]Princeton University. Word Net,
http://wordnet.princeton.edu/
[14]Rafeeq Al-Hashemi, “Text Summarization Extraction
System (TSES) Using Extracted Keywords”,
IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308
_______________________________________________________________________________________
Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 340
Proceedings of International Arab Journal of e-
Technology, Volume. 1, Number. 4, 2010, pp 164-168
[15]Raju Barskar, Gulfishan Firdose Ahmed, Nepal Barska,
“An Approach for Extracting Exact Answers to
Question Answering (QA) System for English
Sentences”, Proceedings of ICCTSD, 2012, pp 1187 –
1194
[16]Ralf Steinberger, Bruno Pouliquen, Camelia Ignat,
“Using language-independent rules to achieve high
multilinguality in Text Mining”, Proceedings of Mining
Massive Data Sets for Security, 2008, pp 217-240
[17]Rucha S. Dixit, Prof. Dr.S.S.Apte, “ Improvement of
Text Summarization using Fuzzy Logic Based
Method”, Proceedings of IOSRJCE, ISSN: 2278-0661,
ISBN: 2278-8727 , Volume 5, Issue 6, 2012, pp 05-10
[18]Shuhua Liu, “Enhancing E-Business-Intelligence-
Service: A Topic-Guided Text Summarization
Framework”, Proceedings of the Seventh IEEE
International Conference on E-Commerce Technology,
2005
[19]Stanford University POS Tagger
[20]Subhashini, Kumar, “Evaluating the Performance of
Similarity Measures Used in Document Clustering and
Information Retrieval”, Proceedings of ICIIC, 2010, pp
27 - 31
[21]Trung H. BUI, Multimodal Dialogue Management -
State of the art. January 3, 2006
[22]Xing Jiang,Ah-Hwee Tan “Mining Ontological
Knowledge from Domain-Specific Text Documents ”,
Proceedings of the Fifth IEEE International Conference
on Data Mining, 2005
[23]Yuanchao Liu1, Ming Liu, Zhimao Lu, Mingkai Song,
“ Extracting Knowledge from On-Line Forums for
Non-Obstructive Psychological Counseling Q&A
System”, International Journal of Intelligence Science,
2012, pp 40-48

More Related Content

What's hot

Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguisticsAhmad Mashhood
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...IJwest
 
Sentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersSentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersMOHDSAIFWAJID1
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESijnlc
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUEJournal For Research
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALijaia
 
IRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET Journal
 
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION cscpconf
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibEl Habib NFAOUI
 
Optimal approach for text summarization
Optimal approach for text summarizationOptimal approach for text summarization
Optimal approach for text summarizationIAEME Publication
 
A domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicA domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicIAEME Publication
 
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONTRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONijnlc
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval forWaheeb Ahmed
 
Legal Document
Legal DocumentLegal Document
Legal Documentlegal4
 
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...idescitation
 

What's hot (18)

Summarization in Computational linguistics
Summarization in Computational linguisticsSummarization in Computational linguistics
Summarization in Computational linguistics
 
Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...Architecture of an ontology based domain-specific natural language question a...
Architecture of an ontology based domain-specific natural language question a...
 
Sentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clustersSentence similarity-based-text-summarization-using-clusters
Sentence similarity-based-text-summarization-using-clusters
 
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGESA NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
A NOVEL APPROACH FOR WORD RETRIEVAL FROM DEVANAGARI DOCUMENT IMAGES
 
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUECOMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
COMPREHENSIVE ANALYSIS OF NATURAL LANGUAGE PROCESSING TECHNIQUE
 
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVALONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL
 
L1803058388
L1803058388L1803058388
L1803058388
 
IRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text DocumentIRJET- Automatic Recapitulation of Text Document
IRJET- Automatic Recapitulation of Text Document
 
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
DOCUMENT SUMMARIZATION IN KANNADA USING KEYWORD EXTRACTION
 
Information_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_HabibInformation_Retrieval_Models_Nfaoui_El_Habib
Information_Retrieval_Models_Nfaoui_El_Habib
 
Optimal approach for text summarization
Optimal approach for text summarizationOptimal approach for text summarization
Optimal approach for text summarization
 
A domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logicA domain specific automatic text summarization using fuzzy logic
A domain specific automatic text summarization using fuzzy logic
 
[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma[IJET-V2I3P19] Authors: Priyanka Sharma
[IJET-V2I3P19] Authors: Priyanka Sharma
 
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONTRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
 
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTIONTRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION
 
Answer extraction and passage retrieval for
Answer extraction and passage retrieval forAnswer extraction and passage retrieval for
Answer extraction and passage retrieval for
 
Legal Document
Legal DocumentLegal Document
Legal Document
 
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
A Novel Method for Keyword Retrieval using Weighted Standard Deviation: “D4 A...
 

Similar to A template based algorithm for automatic summarization and dialogue management for text documents

CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMIRJET Journal
 
Review of Topic Modeling and Summarization
Review of Topic Modeling and SummarizationReview of Topic Modeling and Summarization
Review of Topic Modeling and SummarizationIRJET Journal
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationijnlc
 
Automatic Text Summarization Using Natural Language Processing (1)
Automatic Text Summarization Using Natural Language Processing (1)Automatic Text Summarization Using Natural Language Processing (1)
Automatic Text Summarization Using Natural Language Processing (1)Don Dooley
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...IJECEIAES
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news articleIJDKP
 
A Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept ExtractionA Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept ExtractionAM Publications
 
6.domain extraction from research papers
6.domain extraction from research papers6.domain extraction from research papers
6.domain extraction from research papersEditorJST
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniquesugginaramesh
 
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical ApproachIRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical ApproachIRJET Journal
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemIRJET Journal
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...IAESIJAI
 
A novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching techniqueA novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching techniqueeSAT Journals
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...mlaij
 
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESCLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESecij
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET Journal
 

Similar to A template based algorithm for automatic summarization and dialogue management for text documents (20)

CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEMCANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
CANDIDATE SET KEY DOCUMENT RETRIEVAL SYSTEM
 
Review of Topic Modeling and Summarization
Review of Topic Modeling and SummarizationReview of Topic Modeling and Summarization
Review of Topic Modeling and Summarization
 
Conceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarizationConceptual framework for abstractive text summarization
Conceptual framework for abstractive text summarization
 
Summarization of Software Artifacts : A Review
Summarization of Software Artifacts : A ReviewSummarization of Software Artifacts : A Review
Summarization of Software Artifacts : A Review
 
Summarization of Software Artifacts : A Review
Summarization of Software Artifacts : A ReviewSummarization of Software Artifacts : A Review
Summarization of Software Artifacts : A Review
 
Automatic Text Summarization Using Natural Language Processing (1)
Automatic Text Summarization Using Natural Language Processing (1)Automatic Text Summarization Using Natural Language Processing (1)
Automatic Text Summarization Using Natural Language Processing (1)
 
A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...A hybrid approach for text summarization using semantic latent Dirichlet allo...
A hybrid approach for text summarization using semantic latent Dirichlet allo...
 
8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network8 efficient multi-document summary generation using neural network
8 efficient multi-document summary generation using neural network
 
A statistical model for gist generation a case study on hindi news article
A statistical model for gist generation  a case study on hindi news articleA statistical model for gist generation  a case study on hindi news article
A statistical model for gist generation a case study on hindi news article
 
A Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept ExtractionA Novel approach for Document Clustering using Concept Extraction
A Novel approach for Document Clustering using Concept Extraction
 
6.domain extraction from research papers
6.domain extraction from research papers6.domain extraction from research papers
6.domain extraction from research papers
 
Comparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization TechniquesComparative Analysis of Text Summarization Techniques
Comparative Analysis of Text Summarization Techniques
 
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical ApproachIRJET-  	  Multi-Document Summarization using Fuzzy and Hierarchical Approach
IRJET- Multi-Document Summarization using Fuzzy and Hierarchical Approach
 
Feature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documentsFeature selection, optimization and clustering strategies of text documents
Feature selection, optimization and clustering strategies of text documents
 
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering SystemKnowledge Graph and Similarity Based Retrieval Method for Query Answering System
Knowledge Graph and Similarity Based Retrieval Method for Query Answering System
 
Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...Machine learning for text document classification-efficient classification ap...
Machine learning for text document classification-efficient classification ap...
 
A novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching techniqueA novel approach for text extraction using effective pattern matching technique
A novel approach for text extraction using effective pattern matching technique
 
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
A Newly Proposed Technique for Summarizing the Abstractive Newspapers’ Articl...
 
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIESCLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
CLUSTER PRIORITY BASED SENTENCE RANKING FOR EFFICIENT EXTRACTIVE TEXT SUMMARIES
 
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-MeansIRJET- Concept Extraction from Ambiguous Text Document using K-Means
IRJET- Concept Extraction from Ambiguous Text Document using K-Means
 

More from eSAT Journals

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementseSAT Journals
 
Material management in construction – a case study
Material management in construction – a case studyMaterial management in construction – a case study
Material management in construction – a case studyeSAT Journals
 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case studyeSAT Journals
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreeSAT Journals
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialseSAT Journals
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...eSAT Journals
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...eSAT Journals
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizereSAT Journals
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementeSAT Journals
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...eSAT Journals
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteeSAT Journals
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...eSAT Journals
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...eSAT Journals
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabseSAT Journals
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaeSAT Journals
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...eSAT Journals
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodeSAT Journals
 
Estimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniquesEstimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniqueseSAT Journals
 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...eSAT Journals
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...eSAT Journals
 

More from eSAT Journals (20)

Mechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavementsMechanical properties of hybrid fiber reinforced concrete for pavements
Mechanical properties of hybrid fiber reinforced concrete for pavements
 
Material management in construction – a case study
Material management in construction – a case studyMaterial management in construction – a case study
Material management in construction – a case study
 
Managing drought short term strategies in semi arid regions a case study
Managing drought    short term strategies in semi arid regions  a case studyManaging drought    short term strategies in semi arid regions  a case study
Managing drought short term strategies in semi arid regions a case study
 
Life cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangaloreLife cycle cost analysis of overlay for an urban road in bangalore
Life cycle cost analysis of overlay for an urban road in bangalore
 
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materialsLaboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
Laboratory studies of dense bituminous mixes ii with reclaimed asphalt materials
 
Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...Laboratory investigation of expansive soil stabilized with natural inorganic ...
Laboratory investigation of expansive soil stabilized with natural inorganic ...
 
Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...Influence of reinforcement on the behavior of hollow concrete block masonry p...
Influence of reinforcement on the behavior of hollow concrete block masonry p...
 
Influence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizerInfluence of compaction energy on soil stabilized with chemical stabilizer
Influence of compaction energy on soil stabilized with chemical stabilizer
 
Geographical information system (gis) for water resources management
Geographical information system (gis) for water resources managementGeographical information system (gis) for water resources management
Geographical information system (gis) for water resources management
 
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...Forest type mapping of bidar forest division, karnataka using geoinformatics ...
Forest type mapping of bidar forest division, karnataka using geoinformatics ...
 
Factors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concreteFactors influencing compressive strength of geopolymer concrete
Factors influencing compressive strength of geopolymer concrete
 
Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...Experimental investigation on circular hollow steel columns in filled with li...
Experimental investigation on circular hollow steel columns in filled with li...
 
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...Experimental behavior of circular hsscfrc filled steel tubular columns under ...
Experimental behavior of circular hsscfrc filled steel tubular columns under ...
 
Evaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabsEvaluation of punching shear in flat slabs
Evaluation of punching shear in flat slabs
 
Evaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in indiaEvaluation of performance of intake tower dam for recent earthquake in india
Evaluation of performance of intake tower dam for recent earthquake in india
 
Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...Evaluation of operational efficiency of urban road network using travel time ...
Evaluation of operational efficiency of urban road network using travel time ...
 
Estimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn methodEstimation of surface runoff in nallur amanikere watershed using scs cn method
Estimation of surface runoff in nallur amanikere watershed using scs cn method
 
Estimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniquesEstimation of morphometric parameters and runoff using rs & gis techniques
Estimation of morphometric parameters and runoff using rs & gis techniques
 
Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...Effect of variation of plastic hinge length on the results of non linear anal...
Effect of variation of plastic hinge length on the results of non linear anal...
 
Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...Effect of use of recycled materials on indirect tensile strength of asphalt c...
Effect of use of recycled materials on indirect tensile strength of asphalt c...
 

Recently uploaded

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxfenichawla
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxupamatechverse
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...ranjana rawat
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)simmis5
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escortsranjana rawat
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingrknatarajan
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCall Girls in Nagpur High Profile
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Christo Ananth
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Dr.Costas Sachpazis
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur High Profile
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college projectTonystark477637
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSSIVASHANKAR N
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduitsrknatarajan
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlysanyuktamishra911
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...Call Girls in Nagpur High Profile
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...Call Girls in Nagpur High Profile
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSISrknatarajan
 

Recently uploaded (20)

BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptxBSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
BSides Seattle 2024 - Stopping Ethan Hunt From Taking Your Data.pptx
 
Introduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptxIntroduction and different types of Ethernet.pptx
Introduction and different types of Ethernet.pptx
 
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
(PRIYA) Rajgurunagar Call Girls Just Call 7001035870 [ Cash on Delivery ] Pun...
 
Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)Java Programming :Event Handling(Types of Events)
Java Programming :Event Handling(Types of Events)
 
Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024Water Industry Process Automation & Control Monthly - April 2024
Water Industry Process Automation & Control Monthly - April 2024
 
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Isha Call 7001035870 Meet With Nagpur Escorts
 
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and workingUNIT-V FMM.HYDRAULIC TURBINE - Construction and working
UNIT-V FMM.HYDRAULIC TURBINE - Construction and working
 
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service NashikCollege Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
College Call Girls Nashik Nehal 7001305949 Independent Escort Service Nashik
 
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur EscortsCall Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
Call Girls in Nagpur Suman Call 7001035870 Meet With Nagpur Escorts
 
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
Call for Papers - Educational Administration: Theory and Practice, E-ISSN: 21...
 
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
Structural Analysis and Design of Foundations: A Comprehensive Handbook for S...
 
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur EscortsHigh Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
High Profile Call Girls Nagpur Meera Call 7001035870 Meet With Nagpur Escorts
 
result management system report for college project
result management system report for college projectresult management system report for college project
result management system report for college project
 
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLSMANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
MANUFACTURING PROCESS-II UNIT-5 NC MACHINE TOOLS
 
UNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular ConduitsUNIT-II FMM-Flow Through Circular Conduits
UNIT-II FMM-Flow Through Circular Conduits
 
KubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghlyKubeKraft presentation @CloudNativeHooghly
KubeKraft presentation @CloudNativeHooghly
 
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...Booking open Available Pune Call Girls Koregaon Park  6297143586 Call Hot Ind...
Booking open Available Pune Call Girls Koregaon Park 6297143586 Call Hot Ind...
 
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...Booking open Available Pune Call Girls Pargaon  6297143586 Call Hot Indian Gi...
Booking open Available Pune Call Girls Pargaon 6297143586 Call Hot Indian Gi...
 
Roadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and RoutesRoadmap to Membership of RICS - Pathways and Routes
Roadmap to Membership of RICS - Pathways and Routes
 
UNIT-III FMM. DIMENSIONAL ANALYSIS
UNIT-III FMM.        DIMENSIONAL ANALYSISUNIT-III FMM.        DIMENSIONAL ANALYSIS
UNIT-III FMM. DIMENSIONAL ANALYSIS
 

A template based algorithm for automatic summarization and dialogue management for text documents

  • 1. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 334 A TEMPLATE BASED ALGORITHM FOR AUTOMATIC SUMMARIZATION AND DIALOGUE MANAGEMENT FOR TEXT DOCUMENTS Prashant G. Desai1 , Sarojadevi H2 , Niranjan N. Chiplunkar3 1 Lecturer, Department of Computer Science & Engineering, N.R.A.M.P., Nitte, Karnataka, India 2 Professor & Head, Department of Computer Science, N.M.A.M.I.T., Nitte, Karnataka, India 3 Principal, N.M.A.M.I.T., Nitte, Karnataka, India Abstract This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of research is to come out with a methodology which helps in extracting important events such as dates, places, and subjects of interest. It would be also convenient if the methodology helps in presenting the users with a shorter version of the text which contain all non-trivial information. We also discuss implementation of algorithms which exactly does this task, developed by us. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining --------------------------------------------------------------------***---------------------------------------------------------------------- 1. INTRODUCTION The world is witnessing a large quantity of digital information today. This digital information is available in different ways and from different sources. The World Wide Web has been the dominant source of digital information. This digital information has been presented to user mainly in two formats: structured information and unstructured information. Understanding and processing of structured information is quite easier due to its very basic nature that the information is arranged in a specific format. The same is not true with unstructured information since there is no specific format in which the information is arranged. The information no matter how significant it is, spread across the whole document. Therefore it is a challenging task to understand, process and extract the useful and significant information from unstructured documents such as emails, blogs, and research documents available in doc, pdf and txt formats. The ultimate goal of text mining is to discover useful information and make knowledge out of it. This paper describes a natural language processing approach for identifying and extracting the important information from the unstructured text. 2. RELATED WORK The work presented in [6] describes a model for answering the user inputted question, by referring to a large database collection of question-answer sets. The words in the question submitted by the user are expanded using the synonyms of respective words. Researchers have mainly concentrated on obtaining the synonyms of noun, verbs and adjectives. Database search for finding the answer is done after the question is expanded. Researchers of [18] present a two-phased process for text summarization. In the 1st phase text from original document is analyzed for topic and passages are extracted. In the next phase the extracted text is understood with the help of WordNet and FrameNet based on which text is extracted, joined together before being presented to the user as output. [23] is a research work for extracting answer from on-line psychological counseling websites. The information available from these websites is used as knowledge base. An index is created for making the search of key words efficient. Using this index based search candidate answers fetched for a user query is generated as response. Three attributes‟ information is used for summarizing a document in [1]. Those 3 words are- field, associated terms and attribute grammars. The results of experiments conducted by the authors‟ show that high level accuracy is attained while implementing their concept. The work illustrated by [5], formulates a set of rules for summarizing the text document. To start with algorithm analyses the structure of the natural language text. Later, the prescribed rules formulated by the researchers are applied to generate the summary. [17] discusses a summarization technique based on fuzzy logic. Before applying the fuzzy logic, sentences from the original text document, are selected based on 8 pre-defined rules such as title feature, sentence length, term weight, sentence position, sentence to sentence similarity, proper noun, thematic word, and numerical data. These are implemented as simple IF-THEN rules in an inference engine module of the algorithm. [9] illustrates the significance of artificial neural networks for finding key phrases from the text. Authors compare the key words extracted from neural networks, naïve-bayes and decision tree algorithms. Key words extracted from all these algorithms are then compared with publicly available key phrase extraction algorithm KEA. Laters authors are of the opinion that, neural networks perform better for extracting keywords. A text mining approach is discussed in [22] for creating
  • 2. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 335 ontology from text belonging to a specific domain. The methodology adopted for building ontology uses data pre- processing, natural language analysis, concept extraction, semantic relation extraction and ontology building. In the study of [14], it is learnt that quality of automated summary is dependent of key phrases. Therefore, the author focuses on locating high quality key words from the text. The procedure adopted for summarizing is pre-processing, key phrase extraction and then sentence selection. [16] provides guidelines which are implemented by them for achieving text mining independent of the language of original text. Important points made by the authors‟ are to formulate the rules independent of the language, minimizing the use of language specific resources, if the use of such resources become inventible, keep them in a modular representation so that adding new ones become easier, for cross-lingual applications, represent the docuSDment in a neutral language to overcome the complexity of dealing with multiple languages. 3. METHODOLOGY This methodology is implemented in two phases: 1. Text Pre-Processing 2. Information Extraction. Soon after the document is chosen by the user, document goes to file classification module. This module identifies whether the input document is a pdf or MS-Word document before reading the contents. Phase 1: Text Pre-Processing This part of the implementation includes the modules shown below: 1.Syntactic analysis 2. Tokenization 3. Semantic analysis 4. Stop word removal 5. Stemming Phase 2: Information extraction This part of the implementation includes the modules shown below: 1. Training the dialogue control 2. Knowledge base discovery 3. Dialogue Management 4. Template based summarization 5. Domain intelligence suites Figure 1 and Figure 2 show the building blocks of our system architecture. Phase1: Text Pre-Processing 3.1 Syntactic Analysis The unstructured text does not have any pre-defined format in which data is organized. Hence, it is very much required to decide where a particular piece of data begins and where exactly it ends. The task of syntactic analysis module is exactly this, i.e., to decide beginning and ending of each sentence in a document. Presently we assume that full stop symbol marks the end of sentence. Any string of characters up to full stop symbol is treated as one sentence. We do this with help of java class string tokenizer. Text Document File classification Processed text Text Pre-processing Syntactic analysis Semantic analysis Tokeniza tion Stop word removal Stemming Fig -1: Text Pre-Processing modules 3.2 Tokenization The task of tokenizer is to break the sentence into tokens. The broken pieces of a sentence may include words, numbers and punctuation marks. This step is very much essential for next level of processing the text. [15]. 3.3 Semantic Analysis Semantic analysis module understands the part of each word played in a sentence. Then it assigns a tag to every word such as noun, verb, adjective, adverb and so on. This process of understanding the part played by each word and then assigning an appropriate tag is called Part-Of-Speech tagging or POS tagging. In order to implement this module, Stanford University [19] POS tagger is used 3.4 Stop Word Removal Some of the words appear in the natural language text more often but have very little significance when we take over all meaning of the sentence. Such words are termed stop words. These words are removed before the document is considered for next level of processing. [10] 3.5 Stemming Stemming is the process of finding base form of a certain word in the text document [4, 3]. Any text document may include repetition of same word but in different forms of the grammar such as word being present in past tense, or in present tense and sometimes in present continuous tense and so on. To avoid all these forms of the same considered as different words, stemming is performed. Presently porter stemmer [11] is used for stemming. The text after going through all these modules is fed into the second phase called Information extraction module.
  • 3. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 336 Phase2: Information extraction This module takes the pre-processed text for discovering important information using machine learning algorithms designed and implemented by us. The process of the algorithm begins with training the model. 3.6 Training The Dialogue Control Here the administrator of the module trains the system. During training, the system learns important or index terms, named entities such as name of the persons, places, temporal formats according to rules. These rules are defined and discussed in our previous paper [12]. Intelligence of the algorithm increases with every training. The concepts learnt during training are stored in the knowledge base of the system. 3.7 Knowledge Base Discovery Knowledge base discovery is the task of creating intelligence from structured (e.g., database, XML, etc.), and unstructured text (e.g., documents, blogs, emails, etc.) [8, 7]. Our research is purely focused on processing unstructured text. Hence, knowledge discovery here means the process of deriving intelligent information and storing from unstructured text. The knowledge base is capable of storing important terms such as index terms, name of the persons, name of the locations, in an efficient may. By calling the knowledge base design as efficient, we mean that, structure is formed in such a way that all the terms are stored together yet in a distinguishable format. There by reducing the need for creating multiple storage structures for storing different category terms, reducing the search time and thus improving the overall performance of the algorithm. Pre-Processed Text Domain Intelligence Suites Training Knowledge Discovery Data base Information Extraction Dialogue Control Template Based Summary Fig -2:Information Extraction modules 3.8 Dialogue Management Dialogue is an interaction occurring between two parties: either between 2 humans or between human and machine [21]. This paper discusses implementation of an algorithm for a dialogue between human and machine. The dialogue management module is a program which offers interaction between human and the computer. With this module the end user can request the information using natural language text. The dialogue control module which built upon the training model, accepts the user request, understands, refers the knowledge base and then produces the answers which possibly contain the sought information. Algorithm below illustrates the functioning of dialogue management. 1. Choose the input document 2. User is presented with 2 choices: 1) Event related options 2) Writing a customized query 3. With choice-1 events such as important dates, locations, subjects of interest available in the document are extracted. 4. With choice-2 user has the provision for writing a customized query 5. Tokenize the query given in natural language text 6. Identify and extract the index terms (which are also called as index words, cue words, key words and so on) 7. Search the knowledge base using the index terms. 7.1 While searching the answers, algorithm considers the synonyms of every index word. 7.2 Word Net [13] is used for collecting the synonyms of a term. 8. Answers which have the index terms appearing in them are short listed. 9. Then sequence of appearance of index terms in answer and query is matched 10. Answers are presented to user along with score for every answer. 11. Score of the answer is count of the index terms appearing in sequence in which they appear in query. 3.9 Template Based Summarization Text summarization is the process of putting together meaningful text present in a document in a condensed format. Template based summarization in our system is designed to perform this operation. Here user has the freedom of choosing what should be present in the summary. In other words, user prepares template based on which summary is generated. This prepared template can include POS tags like Noun, Verb, Adverb, etc, and the sequence in which user wants them to appear in the document. User is not confined to only one pattern of this kind but can have as many patterns as possible. Once the POS patterns are finalized, user can decide whether to include named entities and dates in the summary expected. This completes the step of preparing the template. The template based summary module takes into account all these requisites of the user while it generates summary. 4. EXPERIMENTS AND RESULTS 4.1 Evaluation Metrics There are many similarity measures to decide the closeness of two text documents [2] such as Cosine similarity, Euclidean distance, Jaccard coefficient, Pearson Correlation
  • 4. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 337 Coefficient, Averaged Kullback-Leibler Divergence. The experimental study carried out by [20] is of the opinion that Cosine similarity is better suited for text documents. This cosine similarity measure is used to evaluate automatically obtained summary. While evaluation metrics such as Precision, Recall, and F-Measure are used to evaluate the dialogue management interface. 4.2 Cosine Similarity If D = {d1, . . . , dn} is a set of documents and T = {t1, . . . ,tm} the set of unique terms appearing in D. The m- dimensional vector is used represent the document . The frequency of term t ∈ T in document d ∈ D is denoted by tf(d, t). The vector of a document d is represented by the equation = (tf(d, t1), . . . , tf(d, tm)). Given two documents „a‟ and „b‟ whose term vectors are represented by respectively, their cosine similarity is calculated by the formula 4.3 Precision Precision measures the number of correctly identified items as a percentage of the number of items identified. It is formally defined as Precision (P) =   1 Correct Partial 2 Correct Spurious Partial         4.4 Recall Recall measures the number of correctly identified items as a percentage of the total number of correct items. Recall is formally defined as Recall(R) =   1 Correct Partial 2 Correct Missing Partial         4.5 Measure The F-measure is often used in conjunction with Precision and Recall, as a weighted average of the two. If P and R are to be given equal weights, then we can use the equation F1=     * 0.5* P R P R  4.6 Experimental Setup Ten text documents belonging mainly to research documents and conference papers (CFP) are chosen for evaluating the results. We manually generated a summary for each of these ten documents. These manually created summaries are also called reference summaries. The reference or manual summaries for the research papers are generated on the basis of contents having important parts-of-speech (POS) such as Noun, Verb, Adverb, Proper nouns and so on. The manual summaries for conference papers are chosen on the basis of text containing important dates, subjects of interests and names of the locations. Templates are prepared for generating automated summary in such a way that, templates contain patterns having sequences of POS which were chosen for preparing manual summaries. This is done mainly to verify the accuracy of the automated summary. That is how humans prepare the manual summary and algorithm prepares the summary with same references. Then we compared the template based summary generated by the system, with the respective manual summaries. In order to compare system summary with the manual summary cosine similarity measure is used, as both of these summaries are stored as text documents. Table 1 displays the results in terms of cosine similarity values. Table -1: Cosine Similarity Indices Length Document Category Manual Summary Length System Summary Length Cosine Similarity Value 168 Technical / Research Document 114 52 0.534 215 CFP 114 246 0.696 221 Technical / Research Document 168 140 0.81 236 Technical / Research Document 156 35 0.56 311 Technical / Research Document 172 226 0.844 325 Technical / Research Document 85 72 0.788 348 CFP 174 295 0.6666 512 Technical / Research Document 265 246 0.785 676 Technical / Research Document 165 379 0.84 1328 Technical / Research Document 151 571 0.656 Table 2 illustrates results analysis for the dialogue management interface, tested on 02 technical documents and 02 conference papers from the chosen dataset.
  • 5. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 338 4.7 Screen Shots Figure 3 shows the template prepared and the summary generated by the system. The template is prepared with two patterns : 1. A “Determiner” POS tag followed “Plural Noun” 2. A “Base form of Verb” POS tag followed by a “Singular Noun”. The original document has 6 sentences. After processing this original document with reference to the template prepared by the user, system has produces an automated summary which contains only 3 sentences of significance. This is shown in figure 3. Fig -3 Summary generated based on a template Fig -4 Cosine similarity index between system and reference summaries.
  • 6. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 339 Table -2: Result Analysis of Dialogue Management Interface Index Words Used for information extraction Submitted Document Category Expected Information Information Extracted Correct Partial Spurious Missing Precision Recall F Measure Text, Summarization Technical research paper Information from the document on Text, Summarizat ion All the information present in the document having the input words 23 8 8 0 0.69 0.87 0.19 Dates CFP Dates embedded inside the text Dates present in the document 1 0 0 0 1.00 1.00 0.25 Places Name of Places Places present in the text 2 0 0 1 1.00 0.67 0.20 Subjects of Interest Techincal subjects Areas of Interest 11 1 6 0 0.64 0.96 0.19 5. CONCLUSIONS Text mining is a trending research area in the present scenario. The simple reason for which it is being considered as trending topic is the vast availability of information in the form of natural language. In this paper, we presented algorithms for automatic text summarization and dialogue management. The focus of automatic text summarization algorithm is on the requirements of user before original text is summarized while the algorithm used for dialogue management establishes interactions between user and the computer. The main contribution of work reported here is the template based algorithm for text summarization and dialogue management. The experiments conducted during the implementation of work have produced encouraging results. These results indicate that the accuracy of both automatic summarization and dialogue management algorithms is more than70%. Hence, it can be concluded that the performance of algorithmic implementation of our work is satisfactory. REFERENCES [1] Abdunabi UBUL, EI-Sayed ATLAM, Kazuhiro MORITA, Masao FUKETA, Junichi AOE, “A Method for Generating Document Summary using Field Association Knowledge and Subjectively Information”, Proceedings of IEEE, 2010 [2] Anna Huang, “Similarity Measures for Text Document Clustering”, proceedings of NZCSRSC, 2008, pp 49-56 [3] Brill, E. “A Simple Rule-Based Part-of-speech Tagger”. Proceedings of 3rd Conference on Applied Natural Language Processing, 1992, pp.152–155. [4] Church, K.W., “A Stochastic parts program and noun phrase parser for unrestricted text”. Proceedings 1st Conference on Applied Natural Language Processing, ANLP, pp. 136–143. ACL, 1988. [5] C. Lakshmi Devasenal, M. Hemalatha, “ Automatic Text Categorization and Summarization using Rule Reduction”, Proceedings of ICAESM, 2012, pp594-598 [6] Dang Truong Son, Dao Tien Dung, “Apply a mapping question approach in building the question answering system for vietnamese language”, Proceedings of Journal of Engineering Technology and Education, 2012, pp 380-386 [7] en.wikipedia.org/wiki/Knowledge_extraction. [8] Jantima Polpinij , “Ontology-based Knowledge Discovery from Unstructured Text”, Proceedings of IJIPM, Volume4, Number4, 2013, pp 21-31 [9] Kamal Sarkar, Mita Nasipuri,Suranjan Ghose, “Machine Learning Based Keyphrase Extraction: Comparing Decision Trees, Naïve Bayes, and Artificial Neural Networks”, Proceedings of JIPS, 2012, pp693- 712 [10]M.Suneetha, S. Sameen Fatima, “Corpus based Automatic Text Summarization System with HMM Tagger”, Proceedings of IJSCE, ISSN: 2231-2307, Volume-1, Issue-3, 2011, pp 118-123 [11]Porter stemmer, http://www.tartarus.org/~martin/PorterStemmer [12]Prashant G. Desai , Sarojadevi H. , Niranjan N. Chiplunkar, “Rule-Knowledge Based Algorithm for Event Extraction”, Proceedings of IJARCCE, ISSN (Online) : 2278-1021, ISSN (Print) : 2319-5940, Volume 4, Issue 1, 2015, pp 79-85 [13]Princeton University. Word Net, http://wordnet.princeton.edu/ [14]Rafeeq Al-Hashemi, “Text Summarization Extraction System (TSES) Using Extracted Keywords”,
  • 7. IJRET: International Journal of Research in Engineering and Technology eISSN: 2319-1163 | pISSN: 2321-7308 _______________________________________________________________________________________ Volume: 04 Issue: 11 | Nov-2015, Available @ http://www.ijret.org 340 Proceedings of International Arab Journal of e- Technology, Volume. 1, Number. 4, 2010, pp 164-168 [15]Raju Barskar, Gulfishan Firdose Ahmed, Nepal Barska, “An Approach for Extracting Exact Answers to Question Answering (QA) System for English Sentences”, Proceedings of ICCTSD, 2012, pp 1187 – 1194 [16]Ralf Steinberger, Bruno Pouliquen, Camelia Ignat, “Using language-independent rules to achieve high multilinguality in Text Mining”, Proceedings of Mining Massive Data Sets for Security, 2008, pp 217-240 [17]Rucha S. Dixit, Prof. Dr.S.S.Apte, “ Improvement of Text Summarization using Fuzzy Logic Based Method”, Proceedings of IOSRJCE, ISSN: 2278-0661, ISBN: 2278-8727 , Volume 5, Issue 6, 2012, pp 05-10 [18]Shuhua Liu, “Enhancing E-Business-Intelligence- Service: A Topic-Guided Text Summarization Framework”, Proceedings of the Seventh IEEE International Conference on E-Commerce Technology, 2005 [19]Stanford University POS Tagger [20]Subhashini, Kumar, “Evaluating the Performance of Similarity Measures Used in Document Clustering and Information Retrieval”, Proceedings of ICIIC, 2010, pp 27 - 31 [21]Trung H. BUI, Multimodal Dialogue Management - State of the art. January 3, 2006 [22]Xing Jiang,Ah-Hwee Tan “Mining Ontological Knowledge from Domain-Specific Text Documents ”, Proceedings of the Fifth IEEE International Conference on Data Mining, 2005 [23]Yuanchao Liu1, Ming Liu, Zhimao Lu, Mingkai Song, “ Extracting Knowledge from On-Line Forums for Non-Obstructive Psychological Counseling Q&A System”, International Journal of Intelligence Science, 2012, pp 40-48