The growth of online information has made thorough research in automatic text summarization a necessity within the Natural Language Processing (NLP) community. The aim of this paper is to propose a novel, language-independent approach to automatic summarization that combines three main techniques: Rhetorical Structure Theory (RST), query processing, and the Network Representation Approach (NRA). RST, a theory concerned with the structure of natural text, is used to extract the semantic relations behind the text. The query-processing component classifies the question type and finds an answer that suits the user's needs. The NRA is used to create a graph representing the extracted semantic relations. The output is an answer that not only responds to the question but also gives the user an opportunity to find additional information related to it. We implemented the proposed approach and, as a case study, applied it to Arabic text in the agriculture field. The implemented approach succeeded in summarizing extension documents according to the user's query. The results have been evaluated using Recall, Precision, and F-score measures.
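At the sentence level, the Recall, Precision, and F-score used above can be computed as in this minimal sketch (function and variable names are illustrative, not taken from the paper):

```python
def precision_recall_f1(selected, reference):
    """Score an extractive summary: `selected` and `reference` are
    collections of sentence ids chosen by the system and by a human."""
    selected, reference = set(selected), set(reference)
    true_positives = len(selected & reference)
    precision = true_positives / len(selected) if selected else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1
```

For example, selecting sentences {1, 2, 3} against a reference of {2, 3, 4} gives precision, recall, and F-score of 2/3 each.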
IRJET – A Survey Paper on Text Summarization Methods (IRJET Journal)
This document summarizes various approaches for automatic text summarization, including extractive and abstractive methods. It discusses early surface-level approaches from the 1950s that identified important sentences based on word frequency. It also reviews corpus-based, cohesion-based, rhetoric-based, and graph-based approaches. The document then examines single document summarization techniques like naive Bayes methods, log-linear models, and deep natural language analysis. It concludes with a discussion of multi-document summarization, including abstraction and information fusion as well as graph spreading activation approaches. The goal of the survey is to provide an overview of the major existing methods that have been used for automatic text summarization.
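The early word-frequency approach the survey mentions can be sketched in a few lines, in the spirit of Luhn's 1950s method (the stop-word list and scoring scheme below are simplified illustrations, not the original algorithm):

```python
import re
from collections import Counter

STOPWORDS = {"the", "a", "an", "is", "of", "to", "in", "and", "that"}

def frequency_summary(text, n=2):
    """Rank sentences by average frequency of their content words,
    then return the top n sentences in their original order."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    words = [w for w in re.findall(r"[a-z']+", text.lower())
             if w not in STOPWORDS]
    freq = Counter(words)

    def score(sentence):
        toks = [w for w in re.findall(r"[a-z']+", sentence.lower())
                if w not in STOPWORDS]
        return sum(freq[w] for w in toks) / (len(toks) or 1)

    top = sorted(sentences, key=score, reverse=True)[:n]
    return [s for s in sentences if s in top]  # keep original order
```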
The document proposes using conditional random fields (CRFs) to improve legal document summarization. CRFs are applied to segment legal documents into seven labeled rhetorical components. Feature sets are used to improve CRF performance. A term distribution model and structured domain knowledge are then used to extract key sentences for each rhetorical category. The resulting structured summary is found to be 80% accurate compared to ideal summaries generated by experts.
Novelty detection via topic modeling in research articles (csandit)
In today's world, redundancy is one of the most pressing problems in almost all domains. Novelty detection is the identification of new or unknown data or signals that a machine learning system was not aware of during training. The problem becomes more acute for research articles: a method of identifying novelty in each section of an article is needed to determine the novel idea a paper proposes. Since research articles are semi-structured, detecting novel information in them requires more accurate systems. Topic models provide a useful means to process them and a simple way to analyze them. This work compares the most predominantly used topic model, Latent Dirichlet Allocation, with the hierarchical Pachinko Allocation Model. The results favor the hierarchical Pachinko Allocation Model when used for document retrieval.
Rhetorical Sentence Classification for Automatic Title Generation in Scientif... (TELKOMNIKA JOURNAL)
In this paper, we present work on rhetorical corpus construction and a sentence classification model that can be incorporated into the task of automatic title generation for scientific articles. Rhetorical classification is treated as sequence labeling. A rhetorical sentence classification model is useful in tasks that consider a document's discourse structure. We performed experiments using datasets from two domains: computer science (CS dataset) and chemistry (GaN dataset). We evaluated the models using 10-fold cross-validation (0.70-0.79 weighted average F-measure) as well as on-the-run evaluation (0.30-0.36 error rate at best). We argue that our models performed best when the imbalanced data was handled with the SMOTE filter.
This document summarizes an article that proposes an automatic text summarization technique using feature terms to calculate sentence relevance. The technique uses both statistical and linguistic methods to identify semantically important sentences for creating a generic summary. It determines the relevance of sentences based on feature term ranks and performs semantic analysis of sentences with the highest ranks to select those most important for the summary. The performance is evaluated by comparing summaries to those created by human evaluators.
This document describes a method for sentence similarity based text summarization using clusters. It involves preprocessing text, extracting primitives from sentences, linking primitives, computing sentence similarity, merging similarity values, clustering similar sentences, and extracting a representative sentence from each cluster to generate a summary. Key steps include identifying common elements (primitives) between sentences, representing sentences as vectors of primitives, computing similarity based on shared primitives, clustering similar sentences, pruning clusters to remove dissimilar sentences, ranking clusters by importance, and selecting a representative sentence from each cluster for the summary. The goal is to automatically generate a short summary that captures the essential information from a collection of documents or text on the same topic.
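The clustering pipeline described above can be sketched with word overlap standing in for the paper's "primitives" (the Jaccard threshold and greedy single-pass strategy are simplifying assumptions, not the authors' exact algorithm):

```python
def jaccard(a, b):
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def cluster_sentences(sentences, threshold=0.4):
    """Greedy clustering: each tokenized sentence joins the first cluster
    whose pooled primitives it overlaps above the threshold."""
    clusters = []
    for toks in sentences:
        for c in clusters:
            if jaccard(toks, c["primitives"]) >= threshold:
                c["members"].append(toks)
                c["primitives"] |= set(toks)
                break
        else:
            clusters.append({"members": [toks], "primitives": set(toks)})
    return clusters

def representative(cluster):
    # the member sharing the most primitives with the cluster as a whole
    return max(cluster["members"],
               key=lambda toks: len(set(toks) & cluster["primitives"]))
```

Picking one representative per cluster, in cluster-importance order, yields the extractive summary.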
TRANSLATING LEGAL SENTENCE BY SEGMENTATION AND RULE SELECTION (ijnlc)
This document discusses an approach for translating legal sentences by segmenting them based on their logical structure and then selecting appropriate translation rules. It first uses conditional random fields to recognize the logical structure of legal sentences, which involves a requisite and effectuation. It then segments the sentences based on this structure. Finally, it proposes a maximum entropy model for rule selection in statistical machine translation that incorporates linguistic and contextual information from the segmented sentences. An experiment on English to Japanese legal translation showed improvements over baseline systems.
This document proposes a method for translating legal sentences that involves two steps: 1) segmenting legal sentences based on their logical structure using conditional random fields, and 2) selecting translation rules for the segmented sentences using maximum entropy and linguistic/contextual features. The method was tested on a corpus of 516 English-Japanese sentence pairs and showed improvements over baseline systems in recognizing logical structures and BLEU scores.
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES (acijjournal)
Named Entity Recognition, an important subject of Natural Language Processing, is a key technology for information extraction, information retrieval, question answering, and other text processing applications. In this study, we evaluate previously well-established association measures as an initial attempt to extract two-word named entities from a Turkish corpus. Furthermore, we propose a new association measure and compare it with the other methods. The evaluation of these methods is performed using precision and recall measures.
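One classic association measure for two-word candidates is pointwise mutual information (PMI); a minimal sketch over adjacent token pairs (PMI is a standard baseline measure here, not necessarily the paper's proposed one):

```python
import math
from collections import Counter

def pmi_bigrams(tokens):
    """Pointwise mutual information for adjacent word pairs; high-PMI
    pairs are candidate two-word named entities / collocations."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n, nb = len(tokens), len(tokens) - 1
    scores = {}
    for (w1, w2), count in bigrams.items():
        p_pair = count / nb
        p1, p2 = unigrams[w1] / n, unigrams[w2] / n
        scores[(w1, w2)] = math.log2(p_pair / (p1 * p2))
    return scores
```

Pairs that always co-occur (like a two-word name) score higher than pairs that merely happen to be adjacent once.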
Taxonomy extraction from automotive natural language requirements using unsup... (ijnlc)
In this paper we present a novel approach to semi-automatically learn concept hierarchies from natural
language requirements of the automotive industry. The approach is based on the distributional hypothesis
and the special characteristics of domain-specific German compounds. We extract taxonomies by using
clustering techniques in combination with general thesauri. Such a taxonomy can be used to support
requirements engineering in early stages by providing a common system understanding and an agreed-upon
terminology. This work is part of an ontology-driven requirements engineering process, which builds
on top of the taxonomy. Evaluation shows that this taxonomy extraction approach outperforms common
hierarchical clustering techniques.
Text Segmentation for Online Subjective Examination using Machine Learning (IRJET Journal)
This document discusses using k-Nearest Neighbor (K-NN) machine learning for text segmentation of online exams. K-NN is an instance-based learning method that computes similarity between feature vectors to determine the similarity between texts. The goal is to implement natural language processing using text segmentation. It reviews related work applying various machine learning methods, such as K-NN, support vector machines, and decision trees, to tasks like text categorization and clustering.
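The instance-based K-NN idea can be sketched as a cosine-similarity vote over bag-of-words vectors (a toy illustration; a real exam-grading system would use richer features):

```python
import math
from collections import Counter

def cosine(a, b):
    common = set(a) & set(b)
    num = sum(a[t] * b[t] for t in common)
    den = (math.sqrt(sum(v * v for v in a.values()))
           * math.sqrt(sum(v * v for v in b.values())))
    return num / den if den else 0.0

def knn_label(query, labelled, k=3):
    """Instance-based (k-NN) labelling: vote among the k texts whose
    term-frequency vectors are most similar to the query's."""
    vec = lambda text: Counter(text.lower().split())
    q = vec(query)
    ranked = sorted(labelled, key=lambda item: cosine(q, vec(item[0])),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]
```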
Review of Various Text Categorization Methods (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double-blind peer-reviewed international journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes high-quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high-quality technical notes are invited for publication.
Complete agglomerative hierarchy document’s clustering based on fuzzy luhn’s ... (IJECEIAES)
Agglomerative hierarchical clustering is a bottom-up clustering method in which the distances between documents can be obtained by extracting feature values using a topic-based Latent Dirichlet Allocation method. To reduce the number of features, term selection can be done using Luhn's Idea. These methods can be used to build better document clusters, yet little research discusses them. Therefore, in this research, the term-weighting calculation uses Luhn's Idea to select terms by defining upper and lower cut-offs, and then extracts term features using Gibbs-sampling Latent Dirichlet Allocation combined with term frequency and the fuzzy Sugeno method. The feature values serve as the distances between documents, which are clustered with single-, complete-, and average-link algorithms. The evaluations show that feature extraction with and without the lower cut-off differs little, but topic determination for each term based on term frequency and the fuzzy Sugeno method is better than the Tsukamoto method at finding more relevant documents. The use of the lower cut-off and fuzzy Sugeno Gibbs Latent Dirichlet Allocation for complete agglomerative hierarchical clustering yields consistent metric values. This clustering method is suggested as a better way to produce document clusters that are more relevant to the gold standard.
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE... (acijjournal)
Representation of the semantic information contained in words is needed for any Arabic text mining application. More precisely, the purpose is to better take into account the semantic dependencies between words, expressed by the co-occurrence frequencies of these words. There have been many proposals to compute similarities between words based on their distributions in contexts. In this paper, we compare and contrast the effect of two preprocessing techniques applied to an Arabic corpus, the Root-based (Stemming) and Stem-based (Light Stemming) approaches, for measuring the similarity between Arabic words with the well-known abstractive model, Latent Semantic Analysis (LSA), under a wide variety of distance functions and similarity measures, such as the Euclidean Distance, Cosine Similarity, Jaccard Coefficient, and the Pearson Correlation Coefficient. The obtained results show that, on the one hand, the variety of the corpus produces more accurate results; on the other hand, the Stem-based approach outperformed the Root-based one, because the latter affects word meanings.
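The four measures named above can each be sketched over dense word vectors (e.g. rows of an LSA matrix); a minimal stdlib version:

```python
import math

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    den = (math.sqrt(sum(a * a for a in u))
           * math.sqrt(sum(b * b for b in v)))
    return dot / den if den else 0.0

def jaccard(u, v):
    # interpreted here for binary-weighted vectors
    inter = sum(1 for a, b in zip(u, v) if a and b)
    union = sum(1 for a, b in zip(u, v) if a or b)
    return inter / union if union else 0.0

def pearson(u, v):
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    num = sum((a - mu) * (b - mv) for a, b in zip(u, v))
    den = (math.sqrt(sum((a - mu) ** 2 for a in u))
           * math.sqrt(sum((b - mv) ** 2 for b in v)))
    return num / den if den else 0.0
```

Note that Euclidean distance falls as words get more similar while the other three rise, so rankings must be inverted accordingly when comparing measures.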
Much previous research has shown that the use of rhetorical relations can enhance many applications, such as text summarization, question answering, and natural language generation. This work proposes an approach that extends the benefit of rhetorical relations to address the redundancy problem in cluster-based text summarization of multiple documents. We exploit the rhetorical relations that exist between sentences to group similar sentences into multiple clusters and identify themes of common information, from which the candidate summary is extracted. Then, cluster-based text summarization is performed using the Conditional Markov Random Walk Model to measure the saliency scores of the candidate summary. We evaluated our method by measuring the cohesion and separation of the clusters constructed by exploiting rhetorical relations, and the ROUGE scores of the generated summaries. The experimental results show that our method performed well, which indicates the promising potential of applying rhetorical relations to text clustering for multi-document summarization.
A comparative analysis of particle swarm optimization and k means algorithm f... (ijnlc)
The volume of digitized text documents on the web has been increasing rapidly. Given this huge collection of data, documents need to be grouped (clustered) for speedy information retrieval. Document clustering collects documents into groups such that the documents within each group are similar to each other and dissimilar to documents of other groups. The quality of the clustering result depends greatly on the representation of the text and on the clustering algorithm. This paper presents a comparative analysis of three algorithms, namely K-means, Particle Swarm Optimization (PSO), and a hybrid PSO+K-means algorithm, for clustering text documents using WordNet. The common way of representing a text document is as a bag of terms, which is often unsatisfactory because it does not exploit semantics. In this paper, texts are represented in terms of the synsets corresponding to each word; the bag-of-terms representation is thus enriched with synonyms from WordNet. K-means, PSO, and the hybrid PSO+K-means algorithms are applied to clustering text in the Nepali language. Experimental evaluation is performed using intra-cluster and inter-cluster similarity.
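The K-means half of the hybrid can be sketched as plain Lloyd's iteration (the PSO half, omitted here, would instead evolve candidate centroid positions through velocity updates):

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's K-means on dense vectors represented as tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            # assign each point to its nearest centroid (squared distance)
            nearest = min(range(k), key=lambda i: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[i])))
            clusters[nearest].append(p)
        # recompute centroids as cluster means; keep old one if cluster empty
        centroids = [tuple(sum(dim) / len(c) for dim in zip(*c)) if c
                     else centroids[i]
                     for i, c in enumerate(clusters)]
    return clusters
```

In the document-clustering setting, each point would be a (synset-enriched) term-weight vector rather than a 2-D coordinate.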
This document summarizes a research paper that investigates authorship attribution on short historical Arabic texts using naive Bayes classification. It explores using n-gram word and character level features and the naive Bayes classifier to attribute 30 short texts by 10 authors. The experiments show the naive Bayes classifier achieves high accuracy, up to 96%, especially using word 1-gram features. It also finds punctuation marks can help distinguish authors and increase performance.
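The word 1-gram naive Bayes setup described there can be sketched as follows (English toy data stands in for the Arabic corpus, and the add-one smoothing choice is an assumption of this sketch):

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (text, author) pairs. Word 1-gram counts per author."""
    counts, totals, prior = {}, Counter(), Counter()
    for text, author in docs:
        prior[author] += 1
        c = counts.setdefault(author, Counter())
        for w in text.lower().split():
            c[w] += 1
            totals[author] += 1
    vocab = {w for c in counts.values() for w in c}
    return counts, totals, prior, vocab

def attribute(text, model):
    """Return the author maximizing log P(author) + sum log P(word|author)."""
    counts, totals, prior, vocab = model
    n_docs = sum(prior.values())
    best_author, best_lp = None, -math.inf
    for author in counts:
        lp = math.log(prior[author] / n_docs)
        for w in text.lower().split():
            # add-one (Laplace) smoothing over the shared vocabulary
            lp += math.log((counts[author][w] + 1)
                           / (totals[author] + len(vocab)))
        if lp > best_lp:
            best_author, best_lp = author, lp
    return best_author
```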
Much previous research has shown that the use of rhetorical relations can enhance many applications, such as text summarization, question answering, and natural language generation. This work proposes an approach that extends the benefit of rhetorical relations to address the redundancy problem in text summarization. We first examined and redefined the types of rhetorical relations that are useful for retrieving sentences with identical content, and performed the identification of those relations using SVMs. By exploiting the rhetorical relations that exist between sentences, we generate clusters of similar sentences from document sets. Then, cluster-based text summarization is performed using the Conditional Markov Random Walk Model to measure the saliency scores of candidate summaries. We evaluated our method by measuring the cohesion and separation of the clusters and the ROUGE scores of the generated summaries. The experimental results show that our method performed well, which indicates the promising potential of applying rhetorical relations in cluster-based text summarization.
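Summaries here are scored with ROUGE; ROUGE-1 recall, the simplest variant, can be sketched as follows (a minimal illustration, not the official ROUGE toolkit):

```python
from collections import Counter

def rouge_1_recall(candidate, reference):
    """Fraction of reference unigrams covered by the candidate summary,
    with counts clipped so repeated words are not over-credited."""
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum(min(cand[w], ref[w]) for w in ref)
    return overlap / sum(ref.values()) if ref else 0.0
```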
This paper proposes a natural-language discourse analysis method for extracting information from news articles in different domains. The discourse analysis uses Rhetorical Structure Theory (RST), which finds the coherent groups of text most prominent for information extraction. RST uses the nucleus-satellite concept to find the most prominent text in a document. After the discourse analysis, text analysis is performed to extract domain-related objects and relate them. For extracting the information, a knowledge-based system is used, consisting of a domain dictionary that holds a bag of words for the domain. The system is evaluated against a gold standard and human judgment of the extracted information.
The document discusses query-based summarization, including defining the task, evaluation criteria, and different approaches used. Some key approaches discussed are using document graphs to identify relevant sections, rhetorical structure theory to create a graph representation, linguistics techniques like Hidden Markov Models for sentence selection, and machine learning methods like using support vector machines to rank sentences. Different domains like medical and opinion summarization are also outlined.
This summarizes an academic paper that proposes an automatic ontology creation method for classifying research papers. It uses text mining techniques like classification and clustering algorithms. It first builds a research ontology by extracting keywords and patterns from previous papers. It then uses a decision tree algorithm to classify new papers into disciplines defined in the ontology. The classified papers are then clustered based on similarities to group them. The method was tested on a dataset of 100 papers and achieved average precision of 85.7% for term-based and 89.3% for pattern-based keyword extraction.
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURESacijjournal
Named Entity Recognition which is an important subject of Natural Language Processing is a key technology of information extraction, information retrieval, question answering and other text processing applications. In this study, we evaluate previously well-established association measures as an initial
attempt to extract two-worded named entities in a Turkish corpus. Furthermore we propose a new association measure, and compare it with the other methods. The evaluation of these methods is performed by precision and recall measures.
Taxonomy extraction from automotive natural language requirements using unsup...ijnlc
In this paper we present a novel approach to semi-automatically learn concept hierarchies from natural
language requirements of the automotive industry. The approach is based on the distributional hypothesis
and the special characteristics of domain-specific German compounds. We extract taxonomies by using
clustering techniques in combination with general thesauri. Such a taxonomy can be used to support
requirements engineering in early stages by providing a common system understanding and an agreedupon
terminology. This work is part of an ontology-driven requirements engineering process, which builds
on top of the taxonomy. Evaluation shows that this taxonomy extraction approach outperforms common
hierarchical clustering techniques.
Text Segmentation for Online Subjective Examination using Machine LearningIRJET Journal
This document discusses using k-Nearest Neighbor (K-NN) machine learning for text segmentation of online exams. K-NN is an instance-based learning method that computes similarity between feature vectors to determine similarity between texts. The goal is to implement natural language processing using text segmentation, which provides benefits. It reviews related work applying various machine learning methods like K-NN, support vector machines, decision trees to tasks like text categorization and clustering.
Review of Various Text Categorization Methods (iosrjce)
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
Complete agglomerative hierarchy document’s clustering based on fuzzy luhn’s ... (IJECEIAES)
Agglomerative hierarchical clustering is a bottom-up method in which the distances between documents can be obtained by extracting feature values with a topic-based latent Dirichlet allocation method. To reduce the number of features, term selection can be done using Luhn’s Idea. These methods can be used to build better document clusters, but little research discusses them. Therefore, in this research, term weighting uses Luhn’s Idea to select terms by defining upper and lower cut-offs, and then extracts term features using Gibbs-sampling latent Dirichlet allocation combined with term frequency and the fuzzy Sugeno method. The feature values serve as the distances between documents, which are clustered with single-, complete-, and average-link algorithms. The evaluations show that feature extraction with and without the lower cut-off differs little, but topic determination for each term based on term frequency and the fuzzy Sugeno method is better than the Tsukamoto method at finding relevant documents. Using the lower cut-off and fuzzy Sugeno Gibbs latent Dirichlet allocation for complete agglomerative hierarchical clustering yields consistent metric values. This clustering method is therefore suggested as a better method for clustering documents into groups more relevant to the gold standard.
A COMPARATIVE STUDY OF ROOT-BASED AND STEM-BASED APPROACHES FOR MEASURING THE... (acijjournal)
Representation of the semantic information contained in words is needed for any Arabic text mining application. More precisely, the purpose is to better take into account the semantic dependencies between words expressed by their co-occurrence frequencies. There have been many proposals to compute similarities between words based on their distributions in contexts. In this paper, we compare and contrast the effect of two preprocessing techniques applied to an Arabic corpus: the root-based (stemming) and stem-based (light stemming) approaches for measuring the similarity between Arabic words with the well-known Latent Semantic Analysis (LSA) model, using a wide variety of distance functions and similarity measures, such as the Euclidean distance, cosine similarity, Jaccard coefficient, and the Pearson correlation coefficient. The obtained results show that, on the one hand, the variety of the corpus produces more accurate results; on the other hand, the stem-based approach outperformed the root-based one, because the latter distorts word meanings.
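As an illustration of the similarity measures the study compares, here is a minimal sketch of cosine similarity versus Euclidean distance on toy 3-dimensional vectors. The words and vector values are invented stand-ins; real LSA vectors would come from an SVD of the word-by-context co-occurrence matrix:

```python
import numpy as np

def cosine(u, v):
    """Cosine similarity: 1.0 for parallel vectors, 0.0 for orthogonal."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def euclidean(u, v):
    """Plain Euclidean distance between two vectors."""
    return float(np.linalg.norm(u - v))

# Toy "LSA" vectors for three hypothetical Arabic words (transliterated):
kitab  = np.array([0.9, 0.1, 0.2])   # "book"
maktab = np.array([0.8, 0.2, 0.3])   # "office" (shares the root k-t-b)
shams  = np.array([0.1, 0.9, 0.1])   # "sun" (unrelated)

sim_related   = cosine(kitab, maktab)   # high: similar contexts
sim_unrelated = cosine(kitab, shams)    # low: different contexts
dist_related  = euclidean(kitab, maktab)
```

With such vectors in hand, the root-based versus stem-based comparison reduces to asking which preprocessing keeps related words close under a chosen measure.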
A comparative analysis of particle swarm optimization and k means algorithm f... (ijnlc)
The volume of digitized text documents on the web has been increasing rapidly. Given this huge collection of data, there is a need for grouping (clustering) documents into clusters for speedy information retrieval. Document clustering is the grouping of documents such that the documents within each group are similar to each other and dissimilar to documents of other groups. The quality of a
clustering result depends greatly on the representation of text and the clustering algorithm. This paper
presents a comparative analysis of three algorithms namely K-means, Particle swarm Optimization (PSO)
and hybrid PSO+K-means algorithm for clustering of text documents using WordNet. The common way of
representing a text document is bag of terms. The bag of terms representation is often unsatisfactory as it
does not exploit the semantics. In this paper, texts are represented in terms of synsets corresponding to a
word. Bag of terms data representation of text is thus enriched with synonyms from WordNet. K-means,
Particle Swarm Optimization (PSO) and hybrid PSO+K-means algorithms are applied for clustering of
text in Nepali language. Experimental evaluation is performed by using intra cluster similarity and inter
cluster similarity.
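The K-means baseline of the comparison above can be sketched with scikit-learn on a TF-IDF bag of terms. The toy documents are invented stand-ins for the synset-enriched Nepali corpus, and the WordNet enrichment is imitated by simply including synonyms in the text rather than looking them up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Two invented topic groups; the paper enriches each document with WordNet
# synonyms, which we imitate here by writing synonyms directly into the text.
docs = [
    "stock market shares trade equity",      # finance
    "market equity shares investor stock",   # finance
    "football match goal player team",       # sports
    "team player goal football league",      # sports
]

X = TfidfVectorizer().fit_transform(docs)    # bag-of-terms representation
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
```

Intra-cluster and inter-cluster similarity, the paper's evaluation criteria, can then be computed from the TF-IDF vectors and the resulting `labels`. The hybrid PSO+K-means variant would replace the random centroid initialization with positions found by particle swarm optimization.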
This document summarizes a research paper that investigates authorship attribution on short historical Arabic texts using naive Bayes classification. It explores using n-gram word and character level features and the naive Bayes classifier to attribute 30 short texts by 10 authors. The experiments show the naive Bayes classifier achieves high accuracy, up to 96%, especially using word 1-gram features. It also finds punctuation marks can help distinguish authors and increase performance.
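The paper's setup, n-gram features fed to a naive Bayes classifier, can be sketched with scikit-learn. The two "authors" and their texts below are invented for illustration; character n-grams are used because, as the paper observes, they capture punctuation habits:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Invented training texts for two hypothetical authors with distinct habits:
# author A favors "Indeed," and semicolons; author B starts with "So".
texts = [
    "Indeed, the night was long; indeed, we waited.",
    "Indeed, the road was dark; indeed, we walked.",
    "So the night was long and we waited a lot",
    "So the road was dark and we walked a lot",
]
authors = ["A", "A", "B", "B"]

# Character 1- to 3-grams pick up punctuation and function-word patterns.
clf = make_pipeline(
    CountVectorizer(analyzer="char", ngram_range=(1, 3)),
    MultinomialNB(),
).fit(texts, authors)

pred = clf.predict(["Indeed, the sea was calm; indeed, we sailed."])[0]
```

The unseen sentence shares author A's punctuation signature, so the classifier attributes it to A; the study's word 1-gram variant would use `analyzer="word"` instead.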
Much previous research has shown that the use of rhetorical relations can enhance many applications, such as text summarization, question answering, and natural language generation. This work proposes an approach that expands the benefit of rhetorical relations to address the redundancy problem in text summarization. We first examined and redefined the types of rhetorical relations that are useful for retrieving sentences with identical content, and performed the identification of those relations using SVMs. By exploiting the rhetorical relations that exist between sentences, we generate clusters of similar sentences from document sets. Then, cluster-based text summarization is performed using a Conditional Markov Random Walk Model to measure the saliency scores of candidate summary sentences. We evaluated our method by measuring the cohesion and separation of the clusters and the ROUGE score of the generated summaries. The experimental results show that our method performed well, which indicates the promising potential of applying rhetorical relations in cluster-based text summarization.
This paper proposes a natural-language discourse analysis method for extracting information from news articles of different domains. The discourse analysis uses Rhetorical Structure Theory (RST) to find the coherent groups of text that are most prominent for extracting information. RST uses the nucleus-satellite concept to find the most prominent text in a document. After the discourse analysis, text analysis is performed to extract domain-related objects and relate them. For extracting the information, a knowledge-based system consisting of a domain dictionary is used; the domain dictionary holds a bag of words for the domain. The system is evaluated against a gold-standard analysis and human judgment of the extracted information.
The document discusses query-based summarization, including defining the task, evaluation criteria, and different approaches used. Some key approaches discussed are using document graphs to identify relevant sections, rhetorical structure theory to create a graph representation, linguistics techniques like Hidden Markov Models for sentence selection, and machine learning methods like using support vector machines to rank sentences. Different domains like medical and opinion summarization are also outlined.
This summarizes an academic paper that proposes an automatic ontology creation method for classifying research papers. It uses text mining techniques like classification and clustering algorithms. It first builds a research ontology by extracting keywords and patterns from previous papers. It then uses a decision tree algorithm to classify new papers into disciplines defined in the ontology. The classified papers are then clustered based on similarities to group them. The method was tested on a dataset of 100 papers and achieved average precision of 85.7% for term-based and 89.3% for pattern-based keyword extraction.
Web search engines help users find useful information on the WWW. However, when the same query is submitted by different users, typical search engines return the same results regardless of who submitted the query. Generally, each user has different information needs for his or her query, so search results should be adapted to users with different information needs. There is therefore a need for approaches that adapt search results to each user's need for relevant information without any user effort. Such search systems, which adapt to each user's preferences, can be achieved by constructing user profiles based on modified collaborative filtering with a detailed analysis of the user's browsing history. There are three possible types of web search system that can provide personalized information: (1) systems using relevance feedback, (2) systems in which users register their interests, and (3) systems that recommend information based on the user's history. In the first technique, users have to provide relevance judgments as feedback, which is time consuming; the second needs registration of users with their static interests, which requires extra effort from the user. The third technique is therefore best: users do not have to give explicit ratings, since relevancy is tracked automatically from user behavior with the search results and the history of data usage. It does not require registration of interests, and it captures the user's changing interests dynamically by itself. The results section shows that the user's browsing history allows each user to perform a more fine-grained search by capturing changes in each user's preferences without any user effort. Users need less time to find the relevant snippet in personalized search results compared to the original results.
This document presents a new approach for multiclass image segmentation and categorization using Bayesian networks and spatial Markov kernels. It first constructs an over-segmented image and Bayesian network to model relationships between image elements. Interactive segmentation is performed to match pixels to an outline provided by the user. The segmented image is then categorized using a spatial Markov kernel algorithm based on visual keywords assigned to image blocks. The approach achieves 93.5% accuracy on test images. It provides a probabilistic way to model image segmentation and allows new knowledge to be incorporated through the Bayesian network framework.
This document summarizes an article that proposes a new algorithm for efficiently mining both positive and negative association rules from transactional databases. The algorithm first constructs a frequent pattern tree (FP-tree) to store the transaction information. It then uses an FP-growth approach to iteratively find frequent patterns and generate the positive and negative association rules without candidate generation. The algorithm aims to overcome limitations of previous methods and efficiently find all valid comparative association rules.
The document describes a proposed system for automatic text summarization using latent semantic analysis and natural language processing. The system would take in a user-submitted document, preprocess it by removing stop words, generate a singular value decomposition matrix to derive the latent semantic structure, and output a summary by semantically arranging sentences to encompass the original text's concepts. The document provides an implementation example in Python using the LSA method from the scikit-learn machine learning library for natural language processing and matrix decomposition.
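A minimal sketch of such an LSA summarizer follows, using scikit-learn's TruncatedSVD for the singular value decomposition. The ranking heuristic (scoring each sentence by the norm of its topic weights) is one common simplification and not necessarily the document's exact method; the example sentences are invented:

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

def lsa_summary(sentences, n_sentences=2, n_topics=2):
    """Rank sentences by their weight in the latent topic space and
    return the top ones in original order (a minimal LSA summarizer)."""
    tfidf = TfidfVectorizer(stop_words="english")  # stop-word removal step
    X = tfidf.fit_transform(sentences)             # sentence-term matrix
    svd = TruncatedSVD(n_components=min(n_topics, X.shape[0] - 1))
    topic_weights = svd.fit_transform(X)           # sentence-topic matrix
    scores = np.linalg.norm(topic_weights, axis=1) # salience per sentence
    top = sorted(np.argsort(scores)[::-1][:n_sentences])
    return [sentences[i] for i in top]

sentences = [
    "The summarizer removes stop words before building the matrix.",
    "Singular value decomposition reveals the latent semantic structure.",
    "The weather was pleasant that day.",
    "Top-ranked sentences form the final summary of the text.",
]
summary = lsa_summary(sentences)
```

The decomposition concentrates the document's dominant concepts in a few singular vectors, so sentences with large weights there cover the text's core meaning.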
A hybrid approach for text summarization using semantic latent Dirichlet allo... (IJECEIAES)
Automatic text summarization generates a summary that contains sentences reflecting the essential and relevant information of the original documents. Extractive summarization requires semantic understanding, while abstractive summarization requires a better intermediate text representation. This paper proposes a hybrid approach for generating text summaries that combines extractive and abstractive methods. To improve the semantic understanding of the model, we propose two novel extractive methods: semantic latent Dirichlet allocation (semantic LDA) and sentence concept mapping. We then generate an intermediate summary by applying our proposed sentence ranking algorithm over the sentence concept mapping. This intermediate summary is input to a transformer-based abstractive model fine-tuned with a multi-head attention mechanism. Our experimental results demonstrate that the proposed hybrid model generates coherent summaries using the intermediate extractive summary covering semantics. As we increase the number of concepts and words in the summary, the ROUGE precision and F1 scores of our proposed model improve.
EXPLOITING RHETORICAL RELATIONS TO MULTIPLE DOCUMENTS TEXT SUMMARIZATION (IJNSA Journal)
Much previous research has shown that the use of rhetorical relations can enhance many applications, such as text summarization, question answering, and natural language generation. This work proposes an approach that expands the benefit of rhetorical relations to address the redundancy problem in cluster-based text summarization of multiple documents. We exploited the rhetorical relations that exist between sentences to group similar sentences into multiple clusters and identify themes of common information. Candidate summary sentences were extracted from these clusters. Then, cluster-based text summarization is performed using a Conditional Markov Random Walk Model to measure the saliency scores of the candidate summary. We evaluated our method by measuring the cohesion and separation of the clusters constructed by exploiting rhetorical relations, and the ROUGE score of the generated summaries. The experimental results show that our method performed well, which indicates the promising potential of applying rhetorical relations in text clustering to benefit text summarization of multiple documents.
A template based algorithm for automatic summarization and dialogue managemen... (eSAT Journals)
Abstract: This paper describes an automated approach for extracting significant and useful events from unstructured text. The goal of the research is to arrive at a methodology that helps in extracting important events such as dates, places, and subjects of interest. It would also be convenient if the methodology presented users with a shorter version of the text containing all non-trivial information. We also discuss the implementation of algorithms, developed by us, that perform exactly this task. Key Words: Cosine Similarity, Information, Natural Language, Summarization, Text Mining
A hybrid composite features based sentence level sentiment analyzer (IAESIJAI)
Current lexicon- and machine-learning-based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and machine training are time consuming and error-prone. Second, prediction accuracy requires that sentences and their corresponding training text fall under the same domain. In this article, we experimentally
evaluate four sentiment classifiers, namely support vector machines (SVMs),
Naive Bayes (NB), logistic regression (LR) and random forest (RF). We
quantify the quality of each of these models using three real-world datasets
that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic
movie reviews. Specifically, we study the impact of a variety of natural
language processing (NLP) pipelines on the quality of the predicted
sentiment orientations. Additionally, we measure the impact of incorporating
lexical semantic knowledge captured by WordNet on expanding original
words in sentences. Findings demonstrate that using different NLP
pipelines and semantic relationships impacts the quality of the sentiment
analyzers. In particular, results indicate that coupling lemmatization and
knowledge-based n-gram features proved to produce higher accuracy results.
With this coupling, the accuracy of the SVM classifier has improved to
90.43%, while it was 86.83%, 90.11%, 86.20%, respectively using the three
other classifiers.
A domain specific automatic text summarization using fuzzy logic (IAEME Publication)
This paper proposes a domain-specific automatic text summarization technique using fuzzy logic. It extracts important sentences from documents using both statistical and linguistic features. The technique first preprocesses documents by segmenting sentences, removing stop words, and stemming words. It then extracts eight features from each sentence, including title words, sentence length, position, and similarity. Fuzzy logic is used to assign an importance score to each sentence based on these features. Sentences are ranked and the top sentences are selected for the summary based on the compression rate. The technique was evaluated on news articles and shown to generate good quality summaries compared to other summarizers based on precision, recall, and F-measure.
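The feature-scoring stage can be sketched as follows. For brevity this uses a crisp weighted sum in place of the paper's fuzzy inference, and the three features and their weights are illustrative assumptions (the paper itself extracts eight features):

```python
def sentence_features(sentence, position, total, title_words):
    """Compute a few of the per-sentence features the paper describes,
    each normalised into [0, 1]."""
    words = sentence.lower().split()
    return {
        "title_overlap": len(set(words) & title_words) / max(len(title_words), 1),
        "length": min(len(words) / 20.0, 1.0),           # normalised length
        "position": 1.0 - position / max(total - 1, 1),  # earlier is better
    }

def score(features, weights={"title_overlap": 0.5, "length": 0.2, "position": 0.3}):
    # Crisp weighted sum standing in for the paper's fuzzy inference step.
    return sum(weights[k] * v for k, v in features.items())

title_words = {"automatic", "text", "summarization"}
doc = [
    "Automatic text summarization selects the most important sentences.",
    "It rained heavily yesterday.",
]
scores = [score(sentence_features(s, i, len(doc), title_words))
          for i, s in enumerate(doc)]
```

Sentences are then ranked by score and the top fraction, set by the compression rate, is kept. A fuzzy version would map each feature through membership functions (low, medium, high) and combine them with IF-THEN rules instead of fixed weights.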
The document proposes using conditional random fields (CRFs) to improve legal document summarization. CRFs are applied to segment legal documents into seven labeled rhetorical components. Feature sets are used to improve CRF performance. A term distribution model and structured domain knowledge are then used to extract key sentences for each rhetorical category. The resulting structured summary is found to be 80% accurate compared to ideal summaries generated by experts.
A new approach based on the detection of opinion by sentiwordnet for automati... (csandit)
In this paper, we propose a new approach based on the detection of opinion by SentiWordNet for the production of text summarization, using a scoring-extraction technique adapted to opinion detection. The texts are decomposed into sentences and then represented by a vector of opinion scores for those sentences. The summary is produced by eliminating sentences whose opinion differs from that of the original text; this difference is expressed by an opinion threshold. The following hypothesis, "textual units that do not share the same opinion as the text are ideas used for development or comparison, and their absence does not harm the semantics of the abstract", has been verified by the Chi_2 statistical measure, which we used to calculate the dependence between a textual unit and the text. Finally, we found an opinion threshold interval that generates the optimal assessments.
A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATI... (cscpconf)
In this paper, we propose a new approach to producing text summaries based on opinion detection with SentiWordNet, using a score-extraction technique adapted to opinion detection. Texts are decomposed into sentences, each represented by a vector of opinion scores. The summary is produced by eliminating sentences whose opinion differs from that of the original text, the difference being expressed by an opinion threshold. The hypothesis that "textual units that do not share the opinion of the text are ideas used for development or comparison, and their absence does not affect the semantics of the abstract" was verified using the chi-square statistic (Chi_2), which we used to measure the dependence between a textual unit and the text. Finally, we found an opinion-threshold interval that generates the optimal assessments.
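The elimination step of this scheme can be sketched as follows. This is an illustration, not the authors' implementation: the tiny polarity lexicon stands in for SentiWordNet, and the threshold value is arbitrary.

```python
# Sketch of opinion-threshold summarization with a toy polarity lexicon
# standing in for SentiWordNet scores.
LEXICON = {"good": 0.8, "great": 0.9, "bad": -0.7, "poor": -0.6}

def opinion(sentence):
    """Mean word-level polarity of a sentence."""
    words = [w.strip(".,").lower() for w in sentence.split()]
    return sum(LEXICON.get(w, 0.0) for w in words) / len(words) if words else 0.0

def summarize(sentences, threshold=0.2):
    """Keep sentences whose opinion stays within `threshold` of the
    document-level opinion; the rest are eliminated."""
    doc = sum(opinion(s) for s in sentences) / len(sentences)
    return [s for s in sentences if abs(opinion(s) - doc) <= threshold]

sents = ["The harvest was good.", "The service was great.", "The weather was bad."]
print(summarize(sents))  # drops the sentence whose opinion diverges from the text
```

In the paper the threshold interval is itself tuned, with the chi-square statistic used to check the dependence between a textual unit and the whole text.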
Text mining helps users find useful information in large collections of text documents on the web or in databases. Most popular text-mining and classification methods adopt term-based approaches, while pattern-based methods describe user preferences. This review paper analyses how text mining works at three levels: the sentence level, the document level, and the feature level. We review related prior work and discuss the problems that arise when text mining is performed at the feature level. The paper also presents a text-mining technique for compound sentences.
A statistical model for gist generation: a case study on Hindi news article (IJDKP)
Every day, a huge number of news articles is reported and disseminated on the Internet. By generating the gist of an article, a reader can go through its main topics instead of reading the whole article, which takes considerable time. An ideal system would understand the document and generate the appropriate theme(s) directly from the results of that understanding; in the absence of a natural language understanding system, an appropriate alternative must be designed. Gist generation is a difficult task because it requires both maximizing the content conveyed by a short summary and maintaining the grammaticality of the text. In this paper we present a statistical approach to generating the gist of a Hindi news article. The experimental results are evaluated using standard measures, namely precision, recall, and F1, for different statistical models and their combination, on articles both before and after pre-processing.
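The standard measures used in such evaluations can be computed as below; the sentence-id sets in the example are invented for illustration.

```python
def prf(selected, reference):
    """Precision, recall and F1 of a generated gist against a reference,
    treating each as a set of selected sentence ids."""
    sel, ref = set(selected), set(reference)
    tp = len(sel & ref)                              # sentences both agree on
    precision = tp / len(sel) if sel else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if tp else 0.0
    return precision, recall, f1

# System picked sentences 1-3; the reference gist contains 2-4.
print(prf({1, 2, 3}, {2, 3, 4}))  # (0.666..., 0.666..., 0.666...)
```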
Semantic Based Document Clustering Using Lexical Chains (IRJET Journal)
The document proposes a semantic-based document clustering approach using lexical chains. It uses WordNet to perform word sense disambiguation on documents to identify core semantic features. Lexical chains of semantically related words are then generated from the documents based on these features. The lexical chains represent the semantic content of documents and are used to cluster the documents. The results show improved clustering performance compared to traditional approaches. The approach aims to address challenges in text clustering like extracting core semantics, assigning meaningful cluster descriptions, and vocabulary diversity.
IRJET: Semantic Based Document Clustering Using Lexical Chains (IRJET Journal)
This document discusses a semantic-based document clustering approach using lexical chains. It proposes using WordNet to perform word sense disambiguation on documents to extract core semantic features represented as lexical chains. Lexical chains identify semantically related words in a text based on relations like synonyms and hypernyms. Documents are then clustered based on the lexical chains extracted. The approach aims to overcome issues in traditional clustering like synonyms and polysemy by incorporating semantic information from WordNet ontology. It is argued that identifying themes based on disambiguated semantic features extracted via lexical chains can improve text clustering performance compared to bag-of-words models. An evaluation of the approach showed better results when using a threshold of 50% for lexical chain selection.
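A greedy form of lexical chaining can be sketched as follows. The toy relatedness table is an invented stand-in for the WordNet relations (synonymy, hypernymy) used in the approach above.

```python
# Greedy lexical-chain construction with a toy relatedness relation standing
# in for WordNet synonym/hypernym lookups.
RELATED = {("car", "vehicle"), ("truck", "vehicle"), ("car", "truck"),
           ("apple", "fruit"), ("pear", "fruit"), ("apple", "pear")}

def related(a, b):
    return a == b or (a, b) in RELATED or (b, a) in RELATED

def build_chains(words):
    """Attach each word to the first chain containing a related word,
    else start a new chain."""
    chains = []
    for w in words:
        for chain in chains:
            if any(related(w, c) for c in chain):
                chain.append(w)
                break
        else:
            chains.append([w])
    return chains

print(build_chains(["car", "apple", "truck", "pear", "vehicle"]))
# [['car', 'truck', 'vehicle'], ['apple', 'pear']]
```

The resulting chains serve as a semantic signature of a document; clustering then compares documents by their chains rather than by raw bags of words.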
The document describes a new multi-topic multi-document summarization technique that uses automatically extracted keyphrases to evaluate the importance of sentences and documents. It introduces two keyphrase-based techniques: Sen-Rich, which extracts summary sentences rich in important topics, and Doc-Rich, which selects sentences from important centroid documents. An evaluation of the techniques on Arabic documents found that Doc-Rich performed better, producing summaries with greater topic coverage and more cohesion.
IRJET: Automatic Recapitulation of Text Document (IRJET Journal)
The document describes an approach for automatically generating summaries of text documents. It involves preprocessing the input text by tokenizing, removing stop words, and stemming words. It then extracts features from the preprocessed text, such as term frequency-inverse document frequency (tf-idf) values. Sentences are scored based on these features and sentences with higher scores are selected to form the summary. Keywords from the text are also identified using WordNet to help select the most relevant sentences for the summary. The proposed approach aims to generate concise yet meaningful summaries using natural language processing techniques.
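A minimal version of the tf-idf sentence-scoring step might look like the sketch below. The tokenisation and example sentences are invented, and a real system would add the stop-word removal, stemming, and keyword selection described above.

```python
import math
from collections import Counter

def summarize(sentences, k=1):
    """Rank sentences by the summed tf-idf of their terms and keep the top k
    in original order. Tokenisation is deliberately naive; there is no
    stop-word list or length normalisation."""
    docs = [[w.strip(".,").lower() for w in s.split()] for s in sentences]
    n = len(docs)
    df = Counter(w for d in docs for w in set(d))     # document frequency
    def score(d):
        tf = Counter(d)                               # term frequency
        return sum(tf[w] * math.log(n / df[w]) for w in tf)
    top = sorted(range(n), key=lambda i: score(docs[i]), reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]

sents = ["Rainfall boosts wheat harvest.", "Wheat exports grow.", "Trade talks continue."]
print(summarize(sents, k=1))
```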
The effect of training set size in authorship attribution: application on sho... (IJECEIAES)
Authorship attribution (AA) is a subfield of linguistic analysis that aims to identify the original author of a text from a set of candidate authors. Several research papers have been published and several methods and models developed for many languages; however, the number of related works for Arabic is limited. Moreover, the impact of short word lengths and training set size is not well addressed; to the best of our knowledge, no published work in this direction, even for other languages, is available. We therefore propose to investigate this effect, taking different stylometric combinations into account. The Mahalanobis distance (MD), Linear Regression (LR), and Multilayer Perceptron (MP) are selected as AA classifiers. During the experiment, the training dataset size is increased and the accuracy of each classifier is recorded. The results are quite interesting and reveal different classifier behaviours. Combining word-based stylometric features with n-grams provides the best accuracy, reaching 93% on average.
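A common baseline for this kind of stylometric attribution, character n-gram profiles compared by cosine similarity, can be sketched as follows. This is not the MD/LR/MP setup evaluated in the paper, and the author samples are invented.

```python
import math
from collections import Counter

def ngrams(text, n=3):
    """Character n-gram profile of a text."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))

def cosine(a, b):
    dot = sum(v * b.get(g, 0) for g, v in a.items())
    norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
    na, nb = norm(a), norm(b)
    return dot / (na * nb) if na and nb else 0.0

def attribute(text, profiles):
    """profiles: {author: sample of that author's writing}.
    Returns the author whose n-gram profile best matches the text."""
    q = ngrams(text)
    return max(profiles, key=lambda a: cosine(q, ngrams(profiles[a])))

profiles = {"author_A": "the quick brown fox jumps over the lazy dog " * 3,
            "author_B": "colourless green ideas sleep furiously tonight " * 3}
print(attribute("the lazy fox", profiles))
```

Growing the training samples per author, as the paper does, directly enriches each profile, which is why training set size matters for this family of methods.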
Query Answering Approach Based on Document Summarization
International OPEN ACCESS Journal of Modern Engineering Research (IJMER)
| IJMER | ISSN: 2249–6645 | www.ijmer.com | Vol. 4 | Iss. 12 | Dec. 2014 | 50 |
Hesham Ahmed Hassan1, Mohamed Yehia Dahab2, Khaled Bahnassy3, Amira M. Idrees4, Fatma Gamal5
1 Faculty of Computers and Information, Computer Science Department, Cairo University
2 Faculty of Computers and Information, Computer Science Department, King Abdulaziz University
3 Faculty of Computers and Information, Computer Science Department, Ain Shams University
4 Faculty of Computers and Information, Information Systems Department, Fayoum University
5 Faculty of Computers and Information, Computer Science Department, Cairo University
I. Introduction
Summarization is "a brief restatement within the document (usually at the end) of its salient findings and conclusions, intended to complete the orientation of a reader who has studied the preceding text", while an abstract is, according to the same standard, a "short representation of the content of a document without interpretation or criticism" [Martin, 2008]. According to [Mikael, 2014], automatic text summarization approaches can be classified into vector-based, fuzzy-based, genetic-algorithm-based, and neural-network-based approaches.
A semantic pattern can be defined, according to [Mohamed, 2008], as "a generic format for natural language expression, to declare a specific meaning". Distinguishing these semantic patterns is not straightforward, since natural languages may use different lexical items to refer to the same situation, as well as different syntactic realizations of the same arguments.
The elements of a semantic pattern are:
1. Abstract ontological class. These classes were imported from the Agrovoc thesaurus and the publications of CLEASE, as we will discuss later.
2. Verb group. These groups were extracted from different lexicons such as WordNet.
3. Text constant expression.
The first two elements are non-terminal; only the third is a terminal element. We write an abstract ontological class as a word between "<>" signs.
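As an illustration of this notation, a pattern such as <crop> suffers from <disease> mixes non-terminal classes with terminal tokens. The following sketch matches such a pattern against a sentence; the toy ontology, the pattern, and all names here are illustrative assumptions standing in for the Agrovoc-derived classes, not the paper's implementation:

```python
import re

# Illustrative ontology: class membership is invented, not from Agrovoc.
ONTOLOGY = {
    "crop": {"wheat", "maize", "cotton"},
    "disease": {"blight", "rust"},
}

def match_pattern(pattern, sentence):
    """Return a binding of abstract classes to words if the sentence
    matches the pattern token by token, else None."""
    bindings = {}
    p_tokens = pattern.split()
    s_tokens = sentence.lower().split()
    if len(p_tokens) != len(s_tokens):
        return None
    for p, s in zip(p_tokens, s_tokens):
        m = re.fullmatch(r"<(\w+)>", p)
        if m:  # non-terminal: abstract ontological class between "<>"
            cls = m.group(1)
            if s not in ONTOLOGY.get(cls, set()):
                return None
            bindings[cls] = s
        elif p != s:  # terminal: text constant / verb-group member
            return None
    return bindings

print(match_pattern("<crop> suffers from <disease>",
                    "Wheat suffers from blight"))
# {'crop': 'wheat', 'disease': 'blight'}
```

A sentence whose words fall outside the referenced classes simply fails to match, which is how the terminal/non-terminal distinction above plays out in practice.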
In this proposed approach we work with single-document summarization as an experimental study, and our aim is to produce a short summary that best suits both the user's criteria and the writer's point of view. We first introduce some common terms of the summarization field: extraction is the process of detecting important segments of content and reproducing these segments verbatim; abstraction aims to convey the significant information in a new, non-verbatim way; fusion merges extracted segments coherently; and compression discards unimportant segments of text [Radev et al., 2004]. Initial studies on summarizing documents proposed models for extracting salient sentences from text using features such as word or phrase frequency [Luhn, 1958], position in the text [Baxendale, 1958], and key phrases [Edmundson, 1969]. Our summary is an extractive summarization that takes advantage of three approaches: Rhetorical Structure Theory, the query processing approach, and the Network Representation approach.
Abstract: The growth of online information has made thorough research in the domain of automatic text summarization a necessity within the Natural Language Processing (NLP) community. The aim of this paper is to propose a novel, language-independent automatic summarization approach that combines three main approaches: Rhetorical Structure Theory (RST), the query processing approach, and the Network Representation approach (NRA). RST, as a theory of the major aspects of the structure of natural text, is used to extract the semantic relations behind the text. The query processing approach classifies the question type and finds the answer in a way that suits the user's needs. The NRA is used to create a graph representing the extracted semantic relations. The output is an answer which not only responds to the question, but also gives the user an opportunity to find additional information related to the question. We implemented the proposed approach and, as a case study, applied it to Arabic text in the agriculture field. The implemented approach succeeded in summarizing extension documents according to the user's query. The results have been evaluated using Recall, Precision, and F-score measures.
Keywords: Information Extraction, Text Summarization, Natural Language Processing.
II. Related Work
Several automatic text summarization techniques have been proposed. These techniques are classified according to [Mikael, 2014] into four categories: heuristic techniques, semantics-based techniques, query-oriented techniques, and cluster-based techniques. Based on these different techniques, we review the work done on text summarization in the last few years.
Barzilay and Elhadad (1997) presented an algorithm that computes lexical chains in a text. Based on lexical chains, they identify nominal groups of sentences and segment the text using different sources, such as a part-of-speech tagger; other sources, such as the WordNet thesaurus, may also be used [Barzilay, 1997]. According to [Mikael, 2014], this method shows improvement over commercially available summarizer systems but still has two limitations. First, sentence granularity: there is a high probability of selecting long sentences to be included in the summary. Second, the sentences selected for the summary may contain anaphoric links to other parts of the text that are not included in the summary.
In 2006, Wang and Yang [Wang, 2006] suggested a "fractal summarization technique". It uses a fractal approach for controlling the information viewed [Dolores, 2008]; the fractal theory converts the text document into a tree hierarchy [Mohsen, 2012]. Their technique applies fractal theory to produce a summary by determining the salient features of the text and its hierarchical structure. Because the technique is statistical, it can be used for multilingual text documents with only minor modifications to account for the features of each language [Mikael, 2014]. The Wang and Yang technique improves the information coverage of a summary: the user can control the compression ratio, and the system produces a summary that expands the information coverage and reduces the dissimilarity from the source document. By considering both the abstraction level of the document and the statistical properties of the text, fractal theory yields superior results compared to flat methods and the other structured summarization methods in the literature [Mohsen, 2012].
In 2007, Steinberger et al. proposed a new method for incorporating anaphoric information into Latent Semantic Analysis (LSA) and used it to develop an LSA-based summarizer [Steinberger, 2013]. This method achieved better performance than methods that do not use anaphoric information, and by the ROUGE measures it also outperformed all but the best of the single-document summarizers participating in DUC-2002. LSA has some limitations. First, word order has no effect, so it captures nothing of syntactic relations, logic, or morphology. Remarkably, despite this limitation, the system extracts accurate reflections of segment and word meaning quite well, although a few errors occur on some occasions [phiên, 2008]. Another limitation concerns the resulting dimensions, which are not easy to interpret; this leads to results that are acceptable on the mathematical level but have no meaning in natural language [Steinberger, 2007]. According to [phiên, 2008], a further limitation is that LSA cannot handle polysemy (a polysemous word has different meanings in different contexts). LSA treats each word as if it had the same meaning regardless of context, which weakens the output, as the output depends on the average occurrence of the words. This way of producing a summary may make text comparison difficult [Steinberger, 2007]. Therefore, another approach was introduced, namely "probabilistic latent semantic analysis"; this approach is based on a multinomial model and achieved better results than LSA [Thomas, 2007].
Based on the literature survey, the problems and challenges in the area of summarization are identified,
providing the basis for the work to be carried out.
III. Proposed Approach
The aim of our proposed approach is to compose an extractive summary of a document using the previously mentioned approaches. To reach this goal, we combined three main approaches: RST, query processing, and network representation. In our proposed approach, we used RST to determine whether a sentence is a nucleus or a satellite; we were then able to decide sentence priority, since nuclear sentences matter more to the writer and must therefore have higher priority than satellite sentences. The final summary may include both nuclear and satellite sentences: if a satellite sentence has a high priority, it is still included in the final summary, and its nucleus is also dragged into the summary to convey the full meaning to the reader. Many relations are considered in the proposed approach; these relations are imported from the following sources:
1. The Penn Discourse Treebank corpus - Marcu, D., Romera, M. and Amorrortu, E. (1999b) and the relation analysis done by the PDTB Research Group: http://www.seas.upenn.edu/~pdtb/
2. Mann and Thompson's paper, which includes 24 relations [Mann, 2006], called "Classical RST".
3. Alsanie [Al-Sanie, et al., 2005], who defined 11 Arabic relations.
4. The Leeds Arabic Discourse Treebank (LADTB), a discourse-annotated Arabic treebank: http://www.arabicdiscourse.net/
In our proposed approach, we investigate a graph-based, language-independent approach to extractive text
summarization inspired by recent developments in the area of networks. We argue that if two sentences are
connected in this network they probably convey complementary information about related topics, possibly about
the same topic. As our goal is to construct informative extracts, the concept of complementary sentences is
crucial for the development of our summarization techniques. The document under consideration is mapped into
a network representation according to the adjacency and weight matrices of order N X N (where N is the
number of nodes/sentences). Table 1 is an N X N Matrix for the example in Figure 1.
Table 1: N X N Matrix
According to this matrix, the sentence with the highest priority is S1, followed by S3, then S2. According to RST, S1 matters most to the writer, while S2 and S3 are used only to describe S1, and S2 is also used to describe S3. In all summarization approaches S1 cannot be excluded, while S2 and S3 cannot be included without S1. In our proposed approach, S1 is included in the final summary if it has a high similarity to the user's query. If S2 or S3 has a high similarity to the user's query, i.e. the answer to the query lies in a satellite sentence, then including the satellite sentence alone would be meaningless to the user (satellite sentences are only used by the writer to support the nuclear ones). In such situations we do not ignore the satellite sentence; instead, we include its nuclear sentence as well.
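The priority computation and the nucleus-dragging rule just described can be sketched as follows; the matrix values are those of Table 1 (entry [i][j] is 1 when sentence j describes sentence i), and the code is a minimal illustration, not the authors' implementation:

```python
sentences = ["S1", "S2", "S3"]
adjacency = [
    [0, 1, 1],  # S2 and S3 describe S1
    [0, 0, 0],  # nothing describes S2
    [0, 1, 0],  # S2 describes S3
]

# Row sums give each sentence's priority (the "Sum" column of Table 1).
priority = [sum(row) for row in adjacency]
ranked = sorted(zip(sentences, priority), key=lambda p: -p[1])
print(ranked)  # [('S1', 2), ('S3', 1), ('S2', 0)]

def drag_nuclei(selected, adjacency):
    """If a satellite is selected, also include every nucleus it describes."""
    result = set(selected)
    for j in list(result):
        for i, row in enumerate(adjacency):
            if row[j]:  # sentence i is a nucleus described by satellite j
                result.add(i)
    return result

# Selecting only satellite S2 (index 1) drags in its nuclei S1 and S3.
print(sorted(drag_nuclei({1}, adjacency)))  # [0, 1, 2]
```

The row-sum ranking reproduces the ordering stated in the text (S1, then S3, then S2), and the dragging step shows why a selected satellite never appears without its nucleus.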
IV. Proposed Framework
The proposed approach, represented in Figure 1, is composed of three main phases, namely the Document Summarization phase, the Query Processing phase, and the Generate Final Summary phase. The objective of the first phase - the Document Summarization phase - is to process the user's document, to identify the document's rhetorical relations, and to rank the document's sentences. The objective of the second phase - the Query Processing phase - is to measure the semantic similarity between the user query and the document. The objective of the last phase - the Generate Final Summary phase - is to generate the final summary by selecting the sentences that are most relevant to both the user's query and the document's writer.
Figure 1: Proposed Approach
The Query Processing phase responds automatically to a user's query; this includes determining the relevance ranking of the document's sentences with respect to the query using a keyword search. To calculate the sentence score, the ranking is performed using "document similarities theory" [Suganya, 2014]: according to [Sylvia, 2014], the query and the sentences of the document are represented as vectors, and the similarity between each sentence and the query is found by comparing the deviation of the angles between the sentence vectors and the query vector. This representation makes it efficient to calculate the similarity between the query vector and each sentence vector in the document.
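The angle-comparison step can be sketched with a plain bag-of-words cosine similarity. The paper's actual score (the SimQidf of Figure 1) presumably adds idf weighting and Arabic-specific tokenization, so the tokenization and weighting below are illustrative assumptions:

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine of the angle between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Toy document and query; a real system would add idf weighting
# and language-specific tokenization.
sentences = [
    "blight disease damages wheat crops",
    "irrigation schedules depend on the season",
]
query = "how to treat wheat blight"

q_vec = Counter(query.lower().split())
scores = [cosine(Counter(s.lower().split()), q_vec) for s in sentences]
best = max(range(len(sentences)), key=lambda i: scores[i])
print(best)  # 0 - the sentence sharing "wheat" and "blight" with the query
```

A smaller angle (larger cosine) between a sentence vector and the query vector means higher relevance, which is exactly the ranking criterion described above.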
The document summarization phase performs the main objective, which is producing the summary of the document. The first step is to classify the document submitted by the user; the classes we used were imported from the Agrovoc thesaurus [Boris, 2006]. The classification will help in the sentence ranking
Table 1: N × N Matrix

         S1   S2   S3   Sum
    S1    -    1    1    2
    S2    -    -    -    -
    S3    -    1    -    1

[Figure 1: Proposed Approach - the document and the user query feed the Document Summarization phase and the Query Processing phase (which computes the SimQidf score for the query); the resulting document summary is passed to the Generate Final Summary phase, which produces the final summary according to the user query.]
component. According to the document class, the sentence ranking component determines the keywords that raise a sentence's ranking. The second step is determining the relation between each sentence and the preceding sentence within the same paragraph. The document is then transformed into a directed graph in which an edge is drawn between two sentences if a relation exists between them. According to these relations, the sentence type is determined and a weight is given to the relation. The third step transforms the document submitted by the user into an RST discourse graph representing the relations between the sentences of the document. The final phase, namely the summary generator phase, selects the sentences that are most relevant to both the user's query and the document's writer. A rank is computed for each sentence depending on its importance to the user, similar sentences are eliminated, and the summary is generated according to the following rules:
1- Select the highest-priority sentences; the number of sentences included in the summary is determined by the user.
2- If the sentence type is body, then the sentence's header must be included in the summary.
3- If the sentence type is head and no body sentence underneath it has high priority, then it is excluded.
4- If the sentence is a nucleus, then it is included in the summary.
5- If the RST type of the sentence is satellite, then its nuclear sentence is included as well, even if its priority is low.
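The five rules above can be composed as a small selection routine. The record fields used here (id, rst, nucleus_id, kind, header_id) and the example sentences are hypothetical stand-ins for the system's actual data structures:

```python
def generate_summary(ranked, limit):
    """ranked: all sentence records in descending priority order."""
    by_id = {s["id"]: s for s in ranked}
    summary = []

    def add(sid):
        if sid is not None and sid not in summary:
            summary.append(sid)

    for s in ranked[:limit]:            # rule 1: user-chosen length
        add(s["id"])                    # rule 4: nuclei go straight in
        if s["rst"] == "satellite":     # rule 5: drag in its nucleus
            add(s["nucleus_id"])
        if s["kind"] == "body":         # rule 2: body implies header
            add(s["header_id"])

    # rule 3: drop a head with no selected body sentence beneath it
    covered = {s["header_id"] for s in ranked
               if s["kind"] == "body" and s["id"] in summary}
    return [i for i in summary if by_id[i]["kind"] != "head" or i in covered]

ranked = [
    {"id": 1, "rst": "nucleus", "nucleus_id": None, "kind": "head", "header_id": None},
    {"id": 3, "rst": "satellite", "nucleus_id": 2, "kind": "body", "header_id": 1},
    {"id": 2, "rst": "nucleus", "nucleus_id": None, "kind": "body", "header_id": 1},
]
print(generate_summary(ranked, 2))  # [1, 3, 2]
```

Note how selecting the satellite (id 3) pulls in both its nucleus (id 2, rule 5) and its header (id 1, rule 2), while a head with no selected body underneath it would be filtered out at the end (rule 3).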
V. Results Analysis
We present a comprehensive evaluation of automatic text summarization methods based on Rhetorical Structure Theory (RST), claimed to be among the best. We also propose a new approach and compare our results to those of our expert. To the best of our knowledge, most of our results are new in the area and reveal very interesting conclusions. We applied the proposed system in 15 experiments; each experiment consists of a query and a document. The results of the test cases are listed in Table 12. According to the test results, the system produces accurate results in most cases: 10 cases retrieved correct results, where all retrieved sentences were relevant and none were irrelevant; these cases had an F-score of 1. The cases where some relevant sentences were not retrieved were found to be due to insufficient data exported from Agrovoc, such as disease names and crop names. For example, the disease اللفحه (blight disease) was not included in Agrovoc, which made it impossible for the engine to identify it. This situation should be handled using a defined methodology.
In other cases, some irrelevant sentences were retrieved by the engine due to the problem of synonyms and antonyms discussed earlier; this kind of error is better handled when the RST is used in combination with the semantic patterns. The RST improved the retrieval process, but several problems remained. First, some relations are inclusive rather than exclusive, which means that there are no keywords by which to recognize them; these relations are outside the scope of our research. Secondly, relations sometimes hold between non-coherent sentences: when we tried to handle the relation between each sentence and all other sentences within the same paragraph, the results were mostly wrong, so we considered only the relation between each sentence and the preceding sentence. This, too, was not completely successful, mainly because in Arabic the relations usually involve more than two sentences, and sometimes a relation holds between two sentences that are not even adjacent - for example, between a sentence at the beginning of the paragraph and another at its end. As a result, not all relations between the sentences could be recognized, and the priority of the sentences was therefore not completely accurate. The lexical chain approach can be used to solve this problem.
We faced another problem when dealing with the keywords imported from the different corpora. At first we thought that stemming a word would give a better result in the similarity-checking process, but that turned out to be completely misleading: we found that not stemming the words greatly reduced the problem of synonyms and antonyms. When the semantic patterns of the terms were used and combined with all their senses, the results were much more accurate. The problem of ambiguity varies across languages; it is amplified in Arabic due to the rules that combine words with clitics and affixes [grammar-lexis specifications]. Another source of confusion is that Arabic verbs can inflect for the imperative mood and the passive voice. One final problem was the readability of the summary. Mostly the readability was fine; only in rare situations involving a hidden pronoun were sentences misunderstood. The RST minimized this problem to a great extent, as a hidden pronoun in most cases occurs between two coherent sentences, but it was still present in very rare situations.
Table 12: The results of the test cases

 ID   Relevant and    Relevant and not   Irrelevant       Precision   Recall   F
      retrieved (A)   retrieved (B)      retrieved (C)
  1        4                2                 0              1.000     0.667   0.800
  2        6                0                 0              1.000     1.000   1.000
  3        5                0                 1              0.833     1.000   0.909
  4        6                0                 0              1.000     1.000   1.000
  5        8                0                 0              1.000     1.000   1.000
  6        5                1                 1              0.833     0.833   0.833
  7        5                2                 0              1.000     0.714   0.833
  8        5                0                 0              1.000     1.000   1.000
  9        7                0                 0              1.000     1.000   1.000
 10        4                2                 0              1.000     0.667   0.800
 11        6                1                 1              0.857     0.857   0.857
 12        8                0                 0              1.000     1.000   1.000
 13        5                0                 0              1.000     1.000   1.000
 14        4                3                 1              0.800     0.571   0.667
 15        6                0                 0              1.000     1.000   1.000
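The scores in Table 12 follow the standard definitions of precision, recall, and F-score over the counts A (relevant and retrieved), B (relevant and not retrieved), and C (irrelevant retrieved); a minimal check:

```python
def prf(a, b, c):
    """Precision, recall, and F-score from the Table 12 counts:
    A = relevant and retrieved, B = relevant and not retrieved,
    C = irrelevant retrieved."""
    precision = a / (a + c) if a + c else 0.0
    recall = a / (a + b) if a + b else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

# Test case 1 of Table 12: A=4, B=2, C=0.
p, r, f = prf(4, 2, 0)
print(round(p, 3), round(r, 3), round(f, 3))  # 1.0 0.667 0.8
```

Running the same function over the other rows reproduces the remaining table values.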
VI. Conclusion
This paper shows how question answering systems, which aim at finding precise answers to questions, can be improved by exploiting summarization techniques to extract more than just the answer from the document in which it resides. This is done using a graph search algorithm that looks for relevant sentences in the discourse structure, which is represented as a graph; Rhetorical Structure Theory (RST) is used to create this graph representation of a text document. The output is an extensive answer, which not only answers the question but also gives the user an opportunity to assess the accuracy of the answer (is this what I am looking for?) and to find additional information related to the question, which may satisfy an information need. This has been implemented in a working multimodal question answering system, where it operates with two independently developed question answering modules. The classification process has two phases: the first is done offline, while the second is done online.
We presented a system for document summarization that satisfies a user query based on RST. We proposed an approach and applied it in fifteen experiments. The experimental results showed success in most cases, while also exposing some problems: the insufficiency of the data imported from the different corpora, the irrelevant sentences and hidden pronouns included in some summaries, and the problem of synonyms and antonyms, which remains one of the crucial issues to be monitored.
REFERENCES
[1] [Barzilay, 1997] Barzilay, R. and Elhadad, M., 1997, "Using lexical chains for text summarization". In Proceedings of the 35th Annual Meeting of the Association for Computational Linguistics and the 8th European Chapter Meeting of the Association for Computational Linguistics.
[Baxendale, 1958] Baxendale, P., 1958, "Machine-made index for technical literature - an experiment". IBM Journal of Research and Development, 2(4).
[2] [Boris, 2006] Boris Lauser, Margherita Sini, Gauri Salokhe, 2006, "Agrovoc Web Services - improved, real-time access to an agricultural thesaurus".
[3] [Dolores, 2008] M. Dolores Ruiz, Antonio B. Bailón, 2008, "Summarizing Structured Documents through a Fractal Technique", Enterprise Information Systems, Springer Lecture Notes in Business Information Processing, Volume 12, pp. 328-340.
[4] [Edmundson, 1969] Edmundson, H.P., 1969, "New methods in automatic abstracting". Journal of the Association for Computing Machinery, 16(2), 264-285. Reprinted in: Mani, I.; Maybury, M.T. (eds.), Advances in Automatic Text Summarization. Cambridge, Massachusetts: MIT Press, 21-42.
[5] [Luhn, 1958] Luhn, Hans Peter, 1958, "The Automatic Creation of Literature Abstracts". IBM Journal of Research and Development, 2(2):159-165.
[6] [Mann, 2006] Maite Taboada and William C. Mann, 2006, "Rhetorical Structure Theory: looking back and moving ahead", Sage Publications, London.
[7] [Martin, 2008] Martin Hassel, 2008, "Resource Lean and Portable Automatic Text Summarization", Doctoral Thesis, Stockholm, Sweden.
[8] [Mikael, 2014] Mikael Kågebäck, Olof Mogren, Nina Tahmasebi, Devdatt Dubhashi, 2014, "Extractive Summarization using Continuous Vector Space Models", in: 2nd Workshop on Continuous Vector Space Models and their Compositionality (CVSC), Gothenburg, Sweden.
[9] [Mohamed, 2008] Mohamed Yehia Dahab, Hesham Ahmed Hassan, Ahmed Rafea, 2008, "TextOntoEx: Automatic ontology construction from natural English text", Expert Systems with Applications, 34, 1474-1480.
[10] [Mohsen, 2012] Mohsen Tofighy, Omid Kashefi, Azadeh Amanifar, Hamid Haj Seyyed Javadi, 2012, "Persian Text Summarization Using Fractal Theory", University of Science and Technology, Tehran, Iran.
[11] [phiên, 2008] "Toward Classification And Clustering In Vietnamese Web Documents", 2008, Viet Nam National University, Hanoi College of Technology.
[12] [Radev, et al., 2004] Radev, D. R., E. Hovy, K. McKeown, 2004, "Introduction to the special issue on summarization". Computational Linguistics, Vol. 28(4), 399-408.
[13] [Steinberger, 2013] Josef Steinberger, April 2013, "Multilingual Summarization and Sentiment Analysis", Habilitation.
[14] [Suganya, 2014] B. Suganya, 2014, "Analysis on Clustering Techniques based on Similarity of Text Documents", International Journal of Advance Research in Computer Science and Management Studies. Available online at: www.ijarcsms.com.
[15] [Sylvia, 2014] Sylvia Poulimenou, Sofia Stamou, Sozon Papavlasopoulos, and Marios Poulos, 2014, "Keywords Extraction from Articles' Title for Ontological Purposes", Mathematics and Computers in Science and Engineering Series.
[16] [Thomas, 2007] Thomas Landauer, Susan T. Dumais, 2007, "A Solution to Plato's Problem: The Latent Semantic Analysis Theory of Acquisition, Induction, and Representation of Knowledge".
[17] [Wang, 2006] Wang, F.L., & Yang, C.C., 2006, "The impact analysis of language differences on an automatic multilingual text summarization system", Journal of the American Society for Information Science and Technology, 57, 684-696.
[18] http://www.arabicdiscourse.net/
[19] http://www.medar.info/BLARK/unannotated_corpora.php
[20] http://www.elda.org/medar_lri/monolingual_corpora.php
[21] http://www.seas.upenn.edu/~pdtb/
[22] [FAO] AGROVOC Multilingual agricultural thesaurus, Food and Agriculture Organization of the United Nations (FAO), http://aims.fao.org/standards/agrovoc, URI: http://ring.ciard.net/node/1910