This document summarizes an article that proposes an automatic text summarization technique using feature terms to calculate sentence relevance. The technique uses both statistical and linguistic methods to identify semantically important sentences for creating a generic summary. It determines the relevance of sentences based on feature term ranks and performs semantic analysis of sentences with the highest ranks to select those most important for the summary. The performance is evaluated by comparing summaries to those created by human evaluators.
IOSR Journal of Computer Engineering (IOSR-JCE)
e-ISSN: 2278-0661, p-ISSN: 2278-8727, Volume 9, Issue 3 (Mar.-Apr. 2013), PP 62-66
www.iosrjournals.org
An Automatic Text Summarization using Feature Terms for Relevance Measure
Anita R. Kulkarni (1), Dr. Mrs. S. S. Apte (2)
(1) Computer Science & Engg Department, Walchand Institute of Technology, India
(2) Computer Science & Engg Department, Walchand Institute of Technology, India
Abstract: Text summarization is the process of generating a short summary of a document that conveys its overall meaning. This paper explains the extractive technique of summarization, which consists of selecting important sentences from the document and concatenating them into a short summary. This work presents a method that identifies feature terms in sentences and calculates their ranks; the relevance measure of each sentence is then determined from those ranks. A combination of statistical and linguistic methods is then used to identify semantically important sentences for summary creation. Performance is evaluated by comparing the system's summaries with manual summaries produced by three independent human evaluators.
Keywords: Generic text summarization, Relevance measure, Semantic analysis, Term-frequency rank.
I. INTRODUCTION
With the enormous growth of information on the WWW, conventional IR techniques have become inefficient for finding relevant information effectively. A keyword-based search on the internet returns thousands of documents, overwhelming the user, and finding the relevant documents becomes a difficult and time-consuming task. This paper therefore offers text summarization as a solution: it reduces the time required to find web documents containing relevant and useful data. Text summarization is the process of automatically creating a compressed version of a given text; this compressed version is called a summary. Text summarization has two approaches, namely extraction and abstraction. This paper focuses on extractive summarization.
Text summaries can be either query-relevant or generic. Query-relevant summaries contain sentences or passages from the document that are specific to a query, and are produced using conventional IR techniques. A generic summary, on the other hand, provides an overall sense of the document's content; neither a query nor a topic is provided to the summarizer, and producing a good-quality generic summary is a significant challenge. In this paper, we propose an extractive technique for text summarization that uses feature terms to calculate the relevance measure of sentences and extracts the highest-ranked sentences. We then perform semantic analysis on these sentences to identify the semantically important ones for creating a generic summary. Various techniques have been applied in text summarization, including:
1. Statistical approach
2. Knowledge-based approach
3. Linguistic approach
I.1 Statistical approach
The statistical approach summarizes using statistical features. It uses information retrieval (IR) methods, which rely on the frequency of terms and phrases, to determine the relevance of sentences. However, IR-based methods do not give satisfactory results, as the summaries they generate are not very coherent. More recently, classification-based and position-based methods have also been used for summarization: a classifier trained on labeled data decides whether each sentence is relevant, while position-based methods use the position of a sentence to determine whether it should be included in the summary. Statistical approaches are fast and domain-independent but do not produce high-quality summaries.
I.2 Knowledge-based approach
The knowledge-based approach stands in direct contrast to the statistical approach: it interprets the text using extensive domain knowledge as well as natural language techniques, and then summarizes it.
I.3 Linguistic approach
Natural language processing techniques offer another way to produce an abstracted summary, by understanding the context of the original content. NLP techniques and the statistical approach can be combined to generate more useful and meaningful summaries.
This paper focuses on a method that utilizes both types of text summarization. In our study we concentrate on sentence-based extractive summarization, which attempts to identify the set of sentences most important for the overall understanding of a given document. This paper presents an approach to generic summarization of a single document using both statistical and linguistic methods. The rest of the paper is organized as follows: Section II reviews related work, Section III explains the proposed method, Sections IV and V describe sentence selection and summary generation, Section VI presents the evaluation, Section VII concludes, and Section VIII outlines future scope.
II. Related work
The earliest research on summarizing scientific documents proposed paradigms for extracting salient sentences from text using features such as word and phrase frequency, position in the text, and key terms or key phrases [1].
In the late 1980s, most research focused on extraction, producing extracts rather than abstracts, alongside renewed interest in the earlier surface-level approaches. The evolution of different sentence features and their use in sentence scoring has been studied in various research papers [1][2]. Other significant approaches, such as hidden Markov models and log-linear models, have also been studied to improve extractive summarization [3][4].
Various published works have concentrated on the different domains where text summarization is used. Domain-specific text summarization, which uses a corpus for keyword frequencies, then became popular [4][7]; this line of work emphasizes extractive approaches using statistical methods.
Recent papers have shown the use of fuzzy logic and neural networks [5] for text summarization. To improve the quality of summaries created by general statistical methods, fuzzy-logic-based text summarization was proposed, along with an improved fuzzy-logic feature-scoring technique that addresses the problem of inaccurate and uncertain feature scores [8].
A neural network was used for summarizing news articles in recent work [9]. The network was trained to learn the significant features of sentences suitable for inclusion in an article summary; these features were then generalized, combined and modified, after which the network acts as a filter that summarizes news articles.
We also discuss some summarization tools:
1. SweSum [1], a summarization tool from the Royal Institute of Technology, Sweden.
2. MEAD, a public-domain multi-lingual multi-document summarization system developed by the research group of Dragomir Radev.
3. LEMUR [2], a summarizer toolkit that provides summaries together with its own search engine.
2.1 SweSum
SweSum is an online summarizer [1] first constructed by Hercules Dalianis and further developed by Martin Hassel. It is a traditional extraction-based, domain-specific text summarizer that works on sentences from news text using HTML tags. For topic identification, SweSum uses the hypothesis that high-frequency content words are keys to the topic of the text; sentences containing keywords are scored high [5], and sentences containing numerical data are also considered to carry important information. These parameters are combined in a function with modifiable weights to obtain the total score of each sentence. The tool is heavily user-dependent, and it is difficult for an inexperienced user to set SweSum's parameters.
2.2 MEAD
The centroid-based method [6] is the most popular extractive summarization method, and MEAD is an implementation of it. MEAD uses three features to determine the rank of a sentence: centroid score, position, and overlap with the first sentence. It computes the centroid using tf-idf-type data and ranks candidate summary sentences by combining each sentence's score against the centroid, its position in the text, and its tf-idf overlap with the title. Sentence selection is constrained by the summary length, and redundant sentences are avoided by checking cosine similarity against previously selected ones. MEAD only works with news text, not with web pages, whose structure differs from that of news articles.
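To make the redundancy check concrete, the following is a minimal Python sketch, not MEAD's actual implementation, of cosine-similarity filtering over plain term-frequency vectors; the function names and the 0.7 threshold are illustrative assumptions.

import math
from collections import Counter

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in set(a) & set(b))
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def select_non_redundant(ranked_sentences, threshold=0.7, max_sentences=5):
    """Greedily keep top-ranked sentences that are not too similar
    to the sentences already chosen."""
    chosen, vectors = [], []
    for sent in ranked_sentences:
        vec = Counter(sent.lower().split())
        if all(cosine(vec, v) < threshold for v in vectors):
            chosen.append(sent)
            vectors.append(vec)
        if len(chosen) == max_sentences:
            break
    return chosen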
2.3 LEMUR
LEMUR is a toolkit [2] for searching the web that produces summaries of single documents and of multiple documents [7]. It uses TF-IDF (vector model) and Okapi (probabilistic model) for multi-document summarization, and a standard query language for relevance feedback. Lemur also provides a standard tokenizer with options for stemming and stop words.
III. A better approach to summarization
Our work uses a combination of statistical and linguistic [4] methods to improve the quality of the summary. It works in four phases:
i) Preprocessing of the text
ii) Feature extraction for both words and sentences
iii) A summarization algorithm that calculates ranks from the feature scores
iv) Extraction of the higher-ranked sentences to generate the summary
Figure 1 shows the architecture of this technique.
III.1 PREPROCESSING OF TEXT
Preprocessing involves four steps:
Sentence segmentation
Tokenization and POS tagging
Stop-word removal
Word stemming
The system accepts a document from DUC 2002 and divides it into sentences using sentence segmentation. The text is then fed to a standard parser for generating tokens. A parser-cum-POS-tagger provided by Stanford is used to tag the input text into parts of speech such as nouns (NN), verbs (VBZ), adjectives (JJ), adverbs (ADVB), determiners (DT), coordinating conjunctions (CC), etc. It also divides the text into groups of syntactically correlated words, such as noun phrases [NP], verb phrases [VP], adjective phrases [AP], etc.
Example: The rose has a variety of colors, shapes and sizes.
[NP the/DT rose/NN] [VP has/VBZ] [NP a/DT variety/NN] [PP of/IN] [NP colors/NNS shapes/NNS and/CC sizes/NNS]
Next, stop words are removed. Stop words are words that appear frequently in a document but contribute little to identifying its important content, such as 'a', 'an', 'the', etc.
The last preprocessing step is word stemming, the process of removing prefixes and suffixes from each word.
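The following is a rough Python sketch of these four steps, using NLTK as a stand-in for the Stanford parser/tagger named above; the function name and structure are our own, not the paper's.

# Requires: nltk.download('punkt'); nltk.download('stopwords');
# nltk.download('averaged_perceptron_tagger')
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

def preprocess(document: str):
    stemmer = PorterStemmer()
    stops = set(stopwords.words("english"))
    sentences = nltk.sent_tokenize(document)        # 1. sentence segmentation
    processed = []
    for sent in sentences:
        tokens = nltk.word_tokenize(sent)           # 2. tokenization
        tagged = nltk.pos_tag(tokens)               #    POS tagging (NN, VBZ, JJ, ...)
        content = [(w, t) for w, t in tagged
                   if w.isalpha() and w.lower() not in stops]  # 3. stop-word removal
        stems = [(stemmer.stem(w), t) for w, t in content]     # 4. word stemming
        processed.append(stems)
    return sentences, processed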
III.2 FEATURE EXTRACTION
After preprocessing, the text document D is subjected to feature extraction, by which each sentence obtains a feature score based on its importance. Each feature is given a value between 0 and 1. The important text features used in this system are as follows:
III.2.1 Title feature
Sentences that contain words from the title, or their synonyms, should be considered for inclusion in the summary, as they reveal the theme of the document.
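A minimal sketch of how such a title feature could be scored as an overlap ratio; the function name is ours, and the synonym matching mentioned above is omitted here.

def title_feature(sentence_tokens, title_tokens):
    """Score in [0, 1]: fraction of title words present in the sentence."""
    title = {w.lower() for w in title_tokens}
    if not title:
        return 0.0
    sent = {w.lower() for w in sentence_tokens}
    return len(sent & title) / len(title)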
III.2.2 Term frequency-inverse sentence frequency
The term frequency TF(t, d) of term t in document d is defined as the number of times t occurs in d. Inverse sentence frequency measures the information content of a word: terms that occur in most of the sentences are less important than those that occur in only a few. It is calculated as
ISFi = log(N / ni)
where N denotes the number of sentences in the document and ni denotes the number of sentences in which term i occurs. The combined score is
TF-ISFi = TFi * log(N / ni)
The TF-ISF values of all terms in a sentence are summed, and a sentence whose total exceeds a threshold is selected for inclusion in the summary.
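The TF-ISF computation above translates directly into code; this sketch tokenizes by whitespace for simplicity and follows the formulas just given.

import math
from collections import Counter

def tf_isf_scores(sentences):
    """Score each sentence as the sum over its terms of TF * log(N / n_i),
    where N is the number of sentences and n_i the number of sentences
    containing term i."""
    tokenized = [s.lower().split() for s in sentences]
    n = len(tokenized)
    sent_freq = Counter()
    for tokens in tokenized:
        sent_freq.update(set(tokens))          # n_i: sentences containing term i
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)                   # TF of each term in this sentence
        scores.append(sum(tf[t] * math.log(n / sent_freq[t]) for t in tf))
    return scores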
III.2.3 Existence of indicated words
Indicated words are information-carrying words that help to extract important sentences; they may be domain-specific. For example, if the domain is research articles or papers, the following are some indicated words:
Purpose: indicates the need for, or the motivation behind, the research work.
Methods: indicates the methods or experimental procedures used in the research.
Conclusions: indicates the significance of the research.
Sentences containing indicated words are considered important for inclusion in the summary. The list of indicated words is predefined.
III.2.4 Existence of cue phrases
Sentences containing cue phrases such as "this letter", "this paper", "the proposed work", "this report", "develop", etc., are candidate sentences for inclusion in the summary.
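Both the indicated-word feature (III.2.3) and the cue-phrase feature (III.2.4) reduce to simple membership tests. In this sketch the word and phrase lists are hypothetical stand-ins, since the paper's predefined lists are not given in full.

# Hypothetical lists; the paper's actual predefined lists are not given.
INDICATED_WORDS = {"purpose", "methods", "conclusions"}
CUE_PHRASES = ("this letter", "this paper", "the proposed work",
               "this report", "develop")

def indicator_feature(sentence: str) -> float:
    """1.0 if the sentence contains an indicated word, else 0.0."""
    words = {w.lower().strip(".,") for w in sentence.split()}
    return 1.0 if words & INDICATED_WORDS else 0.0

def cue_phrase_feature(sentence: str) -> float:
    """1.0 if the sentence contains a cue phrase, else 0.0."""
    s = sentence.lower()
    return 1.0 if any(p in s for p in CUE_PHRASES) else 0.0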
III.2.5 Sentence position
The first and last sentences of a paragraph are usually included in the summary.
III.2.6 Existence of key phrases
Sentences that contain a noun phrase or verb phrase are considered important for inclusion in the summary. This is made possible by noun and verb chunking; the n-best output of the tagger can be used to define chunks, making the POS tagging operate at the lexical level.
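One plausible way to detect such chunks is a regular-expression chunker over POS-tagged input, sketched here with NLTK's RegexpParser; the grammar is an illustrative assumption, not the paper's.

import nltk

# Simple chunk grammar: optional determiner/adjectives + nouns, and verb runs.
CHUNKER = nltk.RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}\nVP: {<VB.*>+}")

def key_phrase_feature(tagged_sentence):
    """1.0 if the POS-tagged sentence (list of (word, tag) pairs) contains
    a noun-phrase or verb-phrase chunk, else 0.0."""
    tree = CHUNKER.parse(tagged_sentence)
    has_chunk = any(isinstance(st, nltk.Tree) and st.label() in ("NP", "VP")
                    for st in tree)
    return 1.0 if has_chunk else 0.0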
III.2.7 Correlation among sentences
Correlation between sentences is very important for the summary, since a sentence often refers to the previous or the next sentence. If we consider only the relation of a sentence to the previous one, then sentences starting with connectives such as "such", "although", "however", "moreover", "also", "this", "those" and "that" are related to the preceding sentence. In such cases the preceding sentence is also selected for inclusion in the summary: if the rank of the preceding sentence is equal to or greater than 70% of the rank of the selected sentence, it is included.
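The 70% rule can be sketched as a post-processing pass over the indices of the selected sentences; the names and data layout here are our own.

CONNECTIVES = {"such", "although", "however", "moreover",
               "also", "this", "those", "that"}

def expand_with_preceding(selected_ids, ranks, sentences, factor=0.7):
    """If a selected sentence opens with a connective and its predecessor's
    rank is at least `factor` times its own, pull the predecessor in too."""
    extended = set(selected_ids)
    for i in selected_ids:
        if i == 0 or not sentences[i].split():
            continue
        first_word = sentences[i].split()[0].lower()
        if first_word in CONNECTIVES and ranks[i - 1] >= factor * ranks[i]:
            extended.add(i - 1)
    return sorted(extended)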
IV. Sentence Selection and Assembly
The score of every feature is normalized between 0 and 1, and the score of a sentence is the sum of its feature scores; this sum is called the rank of the sentence. The sentences are
sorted in descending order of rank, and the top n highest-scoring sentences are considered for the summary, where n depends on the chosen summary percentage.
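A sketch of the normalization, ranking and top-n selection just described, assuming per-sentence feature scores are held in dictionaries (our own data layout; min-max normalization is our choice for mapping scores to [0, 1]).

def rank_sentences(feature_scores, percentage=0.25):
    """feature_scores: one dict {feature_name: raw_score} per sentence.
    Normalizes each feature to [0, 1] across sentences, sums the features
    into a rank, and returns (ranks, indices of the top-n sentences)."""
    if not feature_scores:
        return [], []
    normalized = [dict(s) for s in feature_scores]
    for f in feature_scores[0]:
        values = [s[f] for s in feature_scores]
        lo, hi = min(values), max(values)
        for s in normalized:
            s[f] = (s[f] - lo) / (hi - lo) if hi > lo else 0.0
    ranks = [sum(s.values()) for s in normalized]
    n = max(1, round(len(ranks) * percentage))
    top = sorted(range(len(ranks)), key=lambda i: ranks[i], reverse=True)[:n]
    return ranks, sorted(top)      # document order, ready for assembly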
V. Summary Generation
The sentences are placed into the summary in the order of their positions in the original document. URLs and e-mail addresses are removed from them, as they do not contain important information.
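A sketch of this assembly step, using simple regular expressions (our own, not from the paper) to strip URLs and e-mail addresses.

import re

URL_RE = re.compile(r"https?://\S+|www\.\S+")
EMAIL_RE = re.compile(r"\S+@\S+\.\S+")

def assemble_summary(sentences, selected_ids):
    """Emit the selected sentences in original document order, with URLs
    and e-mail addresses stripped."""
    parts = []
    for i in sorted(selected_ids):
        cleaned = EMAIL_RE.sub("", URL_RE.sub("", sentences[i])).strip()
        if cleaned:
            parts.append(cleaned)
    return " ".join(parts)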
VI. Evaluation
Evaluation is a key part of any research and development effort. Each approach should be evaluated; evaluation not only tells how effective the approach is, but can also be used to study and improve the sentence selection criteria.
Text summarization can be evaluated using precision and recall, well-known measures from the information retrieval discipline. Precision measures the correctness of the output, based on the relevance of the retrieved information; recall measures the completeness of the output with respect to the relevant information. Relevant sentences are those that occur in the summary generated by human experts, and retrieved sentences are those selected by the summarization system. Concretely, Precision = |retrieved ∩ relevant| / |retrieved| and Recall = |retrieved ∩ relevant| / |relevant|. The harmonic mean of precision and recall is called the F-measure: F = 2 * Precision * Recall / (Precision + Recall). All three measures are calculated for the generated summaries of the test documents.
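Expressed over sentence indices, the three measures can be computed as in this straightforward sketch (function and variable names are ours).

def evaluate(system_ids, human_ids):
    """Precision, recall and F-measure over sentence indices, where
    `human_ids` are the sentences in the expert summary (relevant) and
    `system_ids` are the sentences the summarizer retrieved."""
    system, human = set(system_ids), set(human_ids)
    overlap = len(system & human)
    precision = overlap / len(system) if system else 0.0
    recall = overlap / len(human) if human else 0.0
    f_measure = (2 * precision * recall / (precision + recall)
                 if precision + recall else 0.0)
    return precision, recall, f_measure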
VII. Conclusions
In this paper we explained an approach to summarizing a single document using statistical and linguistic methods. We calculated scores for word and sentence features, then computed the rank of each sentence by summing these scores. The top n ranked sentences were selected for inclusion in the summary, and some minor post-processing was performed on them to generate the final summary.
VIII. Future Scope
This work focuses on single-document summarization. It can be extended to multi-document and multilingual summarization.
References
[1] H. Dalianis, "SweSum - A Text Summarizer for Swedish", Technical Report TRITA-NA-P0015, IPLab-174, KTH NADA, Sweden, 2000.
[2] N. McCracken, "IR Experiments with Lemur", available at: http://www-2.cs.cmu.edu/~lemur/ [December 16, 2009].
[3] T. Chang and W. Hsiao, "A hybrid approach to automatic text summarization", 8th IEEE International Conference on Computer and Information Technology (CIT 2008), Sydney, Australia, 2008.
[4] G. Natshah, Y. Ta'amra, B. Amar and M. Tamini, "Text Summarization: Using Combinational Statistical and Linguistic Methods".
[5] V. Gupta and G. S. Lehal, "A Survey of Text Summarization Techniques", Journal of Emerging Technologies in Web Intelligence, Vol. 2, No. 3, August 2010.
[6] S. Manne and S. Sammen Fatima, "A Feature Terms based Method for Improving Text Summarization with Supervised POS Tagging".
[7] R. Shams, A. Elsayed and Q. M. Akter, "A corpus-based evaluation of a domain-specific text to knowledge mapping prototype", special issue of Journal of Computers, Academy Publisher, 2010 (in press).
[8] T. Chang and W. Hsiao, "A hybrid approach to automatic text summarization", 8th IEEE International Conference on Computer and Information Technology (CIT 2008), Sydney, Australia, 2008.
[9] Stanford NLP Group, "Stanford Log-linear Part-of-Speech Tagger", available at: http://nlp.stanford.edu/software/tagger.shtml [June 15, 2009].
[10] L. Chengcheng, "Automatic Text Summarization Based on Rhetorical Structure Theory", 2010 International Conference on Computer Application and System Modeling (ICCASM 2010), IEEE, 2010.