This paper proposes an unsupervised model for sentence compression based on clustering the nodes of a sentence's dependency graph. The model first clusters related nodes into chunks using the Louvain clustering method. It then merges chunks based on linguistic rules to improve coherence. Candidate compressions are generated by removing chunks, and scored based on language models and word importance to select the best compression. An experiment found the proposed method performed better than a recent supervised technique.
Sentence Compression via Clustering of Dependency Graph Nodes (NLP-KE 2012)
Ayman R. El-Kilany
Faculty of Computers and Information, Cairo University
Cairo, Egypt
a.elkilany@fci-cu.edu.eg
Samhaa R. El-Beltagy
Nile University
Cairo, Egypt
samhaa@computer.org
Mohamed E. El-Sharkawi
Faculty of Computers and Information, Cairo University
Cairo, Egypt
m.elsharkawi@fci-cu.edu.eg
Sentence compression is the process of removing words or phrases from a sentence in a
manner that would abbreviate the sentence while conserving its original meaning. This work
introduces a model for sentence compression based on dependency graph clustering. The main
idea of this work is to cluster related dependency graph nodes in a single chunk, and to then
remove the chunk which has the least significant effect on a sentence. The proposed model
does not require any training parallel corpus. Instead, it uses the grammatical structure graph
of the sentence itself to find which parts should be removed. The paper also presents the
results of an experiment in which the proposed work was compared to a recent supervised
technique and was found to perform better.
Keywords: Sentence Compression; Summarization; Unsupervised Model; Dependency Graph.
1. Introduction
Sentence compression is the task of producing a shorter form of a single given
sentence, so that the new form is grammatical and retains the most important
information of the original one (Jing, 2000). The main application of sentence
compression is within text summarization systems.
Many models have been developed for sentence compression, with different methods and learning techniques, but most focus on selecting individual words to remove from a sentence. Most existing models for sentence compression are supervised models that rely on a training set in order to learn how to compress a new sentence. These systems require large training sets, the construction of which demands extensive human effort. Unsupervised models, on the other hand,
can handle the problems of lack of training sets and text ambiguity. In such systems,
most of the effort is directed towards crafting good rules.
In this paper, we propose a new unsupervised model for sentence compression.
Rather than considering individual words, phrases or even dependency graph nodes as
candidates for elimination in the sentence compression task as previous work has
done, the proposed model operates on the level of a data chunk in which related nodes
of a sentence's dependency graph are grouped together. The basic premise of this work is that grouping related nodes into one chunk will most likely yield chunks whose nodes are only weakly related to nodes in other chunks, so that, with appropriate constraints on which chunk or chunks to remove, removing a chunk has minimal effect on the overall sentence. Our model follows a two-stage procedure to reach a compressed version of a sentence. In
the first stage, the sentence is converted into a group of chunks where each chunk
represents a coherent group of words acquired by clustering dependency graph nodes
of a sentence. Initial chunk generation is done through the use of the Louvain method
(Blondel, Guillaume, et al, 2008) which hasn’t been used for sentence dependency
graph node clustering before. Chunks are then further merged through a set of
heuristic rules. In the second stage, one or more chunks are removed from sentence
chunks to satisfy the required compression ratio such that the removed chunks have
the least effect on the sentence. Effect is measured using language model scores and word
importance scores.
The rest of this paper is organized as follows: Section 2 discusses work related to
the sentence compression problem followed by a brief description of the Louvain
method in Section 3. A detailed description of the proposed approach is given in
Section 4. Section 5 describes the experimental setup while Section 6 presents the
results. Section 7 analyzes the results and section 8 presents the conclusion.
2. Related Work
In this paper, we develop an unsupervised model for sentence compression based
on graph clustering. Before presenting our algorithm, we briefly summarize some of
the highly related work on sentence compression.
The model developed by (Jing 2000) was one of the earliest works that tackled the
sentence compression problem. Given that a sentence consists of a number of phrases,
the model has to decide which phrases to remove according to many factors. One of
the main factors is grammar checking, which specifies which phrases are grammatically obligatory and cannot be removed. Also, the lexical relations between words are used to weigh the importance of a word in the sentence context. Another factor is the deletion
probability of a phrase which shows how frequently the phrase was deleted in
compressions produced by humans. This deletion probability is estimated from a
sentence compression parallel corpus.
In contrast to (Jing 2000), a noisy-channel model (Knight and Marcu, 2002)
depends exclusively on a parallel corpus to build the compression model. The
noisy-channel model is considered the baseline model for comparing many supervised
approaches that appeared later. Given a source sentence (x), and a target compression
(y), the model consists of a language model P(y) whose role is to guarantee that the
compression output is grammatically correct, a channel model P(x|y) that captures the
probability that the source sentence x is an expansion of the target compression y, and
a decoder which searches for the compression y that maximizes P(y)P(x|y). The
channel model is acquired from a parsed version of a parallel corpus. To train and
evaluate their system, (Knight and Marcu, 2002) used 1035 sentences from the
Ziff–Davis corpus for training and 32 sentences for testing.
Another supervised approach is exemplified by the model developed by
(Sporleder and Lapata 2005) which utilizes discourse chunking. According to their
definition, discourse chunking is the process of identifying central and non-central
segments in the text. The model defines a supervised discourse chunking approach to
break the sentence into labeled chunks (which in this context is different from the
definition we’ve given to a chunk) before removing non-central chunks in order to
compress a sentence.
(Nomoto 2009) presents a supervised model for sentence compression based on a
parallel corpus gathered automatically from the internet. The model generates
candidate compressions by following defined paths in the dependency graph of the
sentence where each path represents a candidate compression. A candidate
compression gets classified to determine if it would work as a compression or not.
The classifier used for this process was trained using 2,000 RSS feeds and their
corresponding ‘sentence pairs’ gathered automatically from internet news, and was
tested using 116 feeds. Nomoto was able to prove that his system significantly
outperforms what was then the “state-of-art model intensive [sentence compression]
approach known as Tree-to-Tree Transducer, or T3” (Nomoto 2009) proposed by
(Cohn and Lapata, 2009).
As for unsupervised approaches, (Hori and Furui 2004) proposed a model for
automatically compressing transcribed spoken text. The model operates on the word
level where each word is considered as a candidate for removal either individually or
with any other word in the sentence, producing candidate compressions. For each
candidate compression, the model calculates a language model score, a significance
score and a confidence score, and then chooses the one with the highest score as its output.
Another word-based compression approach was developed by (Clarke and Lapata
2008). The approach views the sentence compression task as an optimization problem
with respect to the number of decisions that have to be taken to find the best
compression. Three optimization models (unsupervised, semi-supervised and supervised) were developed to represent the sentence compression problem in the presence of linguistically motivated constraints; a language model score and a significance score are used to improve the quality of the decisions taken.
(Filippova and Strube 2008) present an unsupervised model for sentence
compression based on the dependency tree of a sentence. Their approach shortens a sentence by pruning sub-trees out of its dependency tree. Each directed edge from head H to word W in the dependency tree is scored. They score each edge with a significance
score for word W and a probability of dependencies which is the conditional
probability of the edge type given the head word H. Probabilities of dependencies
were gathered from a parsed corpus of English sentences. They built an optimization
model to search for the smallest pruned dependency graph possible with the highest
score. In their work they were able to prove that their system can outperform that of
another unsupervised approach, which was presented by (Clarke and Lapata, 2008).
Following the same strategy of (Nomoto 2009) by gathering resources
automatically from news resources, (Cordeiro, Dias and Brazdil 2009) presented an
unsupervised model which automatically gathers similar stories from internet news in
order to find sentences sharing paraphrases but differing in length. The model then induces compression rules using alignment algorithms from bioinformatics, so that these rules can later be used to compress new sentences.
3. The Louvain Method
The Louvain method (Blondel, Guillaume, et al, 2008) is an algorithm for finding
communities in large social networks. Given a network, the Louvain method produces
a hierarchy out of the network where each level of the hierarchy is divided into
communities and where nodes in each community are more related to each other than to nodes in other communities. Community quality is measured using modularity.
Modularity measures the strength of division of a network into communities (also
called groups, clusters or modules). Networks with high modularity have dense
connections between the nodes within communities but sparse connections between
nodes in different communities.
The Louvain method has two steps that are repeated iteratively. Given a graph of
n nodes, each node is said to represent a community. All nodes are processed in order
such that each node either joins one of the communities of its neighbors or remains in
its current community according to the highest modularity gained. This step is
performed iteratively until a local maximum of modularity is gained. Second, once a
local maximum has been reached, the method builds a new graph whose nodes are the
generated communities. The weight of the edges between new nodes in the new graph
is the total weight of the edges between the nodes of old communities. After the new
graph is built, steps one and two are repeated on this new graph until there are no
more changes and maximum modularity is gained.
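As a concrete illustration, the sketch below runs the Louvain method on a small hand-built graph using the python-louvain package together with networkx (the experiments in this paper used the authors' C++ implementation); the edge list is a simplified stand-in for a sentence's dependency graph, not actual parser output.

# Minimal sketch of Louvain community detection, assuming the python-louvain
# package (imported as community_louvain) and networkx. The paper used the
# authors' C++ implementation; the edges below are illustrative only.
import networkx as nx
import community as community_louvain  # pip install python-louvain

g = nx.Graph()
g.add_edges_from([
    ("makes", "Bell"), ("makes", "distributes"), ("makes", "products"),
    ("Bell", "based"), ("based", "Angeles"), ("Angeles", "Los"),
    ("products", "electronic"), ("products", "computer"), ("products", "building"),
])

# Single partition at the best modularity found by the method.
partition = community_louvain.best_partition(g)            # {node: community id}
print("modularity:", community_louvain.modularity(partition, g))

# Full hierarchy: level 0 is the finest partition, the last level the coarsest.
dendrogram = community_louvain.generate_dendrogram(g)
for level in range(len(dendrogram)):
    print(level, community_louvain.partition_at_level(dendrogram, level))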
4. The Proposed Approach
We propose an approach for the sentence compression task. Our goal is to cluster
related dependency graph nodes in a single chunk, and to then remove the chunk
which has the least significant effect on a sentence. Towards this end, we follow six
steps to reach a compression for the sentence:
1. Chunking: In this step, the dependency graph of a sentence is generated and then the nodes of the dependency graph are grouped into chunks. The chunking process is done mainly through clustering the dependency graph nodes of a sentence using the Louvain method.
2. Chunk Merging: In this step, chunks are further refined using a set of heuristic rules that make use of grammatical relations between words in a sentence to enhance the quality of the resulting chunks.
3. Word Scoring: In this step, each word in a sentence is assigned an importance score. The word scores are used in later steps to calculate the importance of words in candidate compressions relative to a generated dependency graph.
4. Candidate Compression Generation: In this step, a number of candidate compressions are generated based on the obtained chunks.
5. Candidate Compression Scoring: In this step, each of the candidate compressions generated in step 4 is scored.
6. Compression Generation: In this step, the compression which best fits a desired compression ratio is produced.
Each of these steps is detailed in the following subsections.
4.1. Chunking
Given an input sentence, the goal of this phase is to turn this sentence into a
number of chunks where each chunk represents a coherent group of words from the
sentence. To accomplish this, the dependency graph of the sentence is first generated and the Louvain method is then applied to it. The Louvain method is used because of its ability to find related communities/chunks inside a network/graph of nodes (which in our case represents the sentence) regardless of the size of the network.
The Louvain method was chosen over other graph clustering algorithms for two
main reasons. First, a dependency graph has no defined distance measure between its nodes; only the links between nodes define a relationship. Since the Louvain method is designed for large networks with the same characteristics and relies primarily on the links between nodes, it is an excellent candidate for what we want to do. Second, the Louvain method's strategy of finding the best local cluster for a node from its direct neighbors' clusters before
finding global clusters is suited for finding coherent chunks of words inside the
dependency graph since the nodes with the most number of links between each other
in a dependency graph should represent a coherent chunk.
The Stanford parser (de Marneffe, MacCartney, and Manning 2006) was used to
generate the dependency graph of a sentence using the parser’s basic typed
dependencies. As for the Louvain method, the C++ implementation provided by the
authors was used.
Since the Louvain method produces a hierarchical decomposition of the graph, we
had the choice to either use the lowest or highest level of the hierarchy, where the
highest level yields fewer chunks. We use the lowest level of the hierarchy if the
number of resulting chunks is <= 5; otherwise, the highest level of hierarchy is
used. It's important to note that this value was obtained empirically by
experimenting with a dataset of 100 randomly collected internet news sentences.
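A sketch of this level-selection rule is given below, assuming the python-louvain dendrogram API (generate_dendrogram / partition_at_level) and a networkx graph built from the parser's basic typed dependencies; the threshold of 5 chunks is the empirically chosen value mentioned above.

# Sketch of the hierarchy-level rule: use the lowest (finest) Louvain level if it
# yields at most 5 chunks, otherwise use the highest level. `dependency_graph` is
# assumed to be a networkx graph of the sentence's dependency relations.
import community as community_louvain

def choose_chunks(dependency_graph, max_chunks_at_lowest_level=5):
    dendrogram = community_louvain.generate_dendrogram(dependency_graph)
    lowest = community_louvain.partition_at_level(dendrogram, 0)
    highest = community_louvain.partition_at_level(dendrogram, len(dendrogram) - 1)
    partition = (lowest
                 if len(set(lowest.values())) <= max_chunks_at_lowest_level
                 else highest)
    # Group graph nodes by community id to obtain the chunks.
    chunks = {}
    for node, community_id in partition.items():
        chunks.setdefault(community_id, []).append(node)
    return list(chunks.values())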
For the sentence "Bell, based in Los Angeles, makes and distributes electronic,
computer and building products" clustering results obtained from the first level of the
hierarchy are illustrated in Fig. 1, where nodes enclosed in each circle represent a
generated chunk. To our knowledge, the Louvain method has not been used in this
context before.
Fig. 1: Example of clustering results for a dependency graph
4.2. Chunk Merging
The Louvain method is a general-purpose algorithm that does not address
linguistic constraints; it just works on dependencies between nodes in a graph as pure
relations without considering the grammatical type of relationships between the
nodes/words. So, the method can easily separate a verb and its subject by placing
them in different chunks, making it hard to produce a good compression for the
sentence later.
To solve this problem, we had to do some post-processing over the resulting
chunks using the grammatical relations between nodes/words from the dependency graph
to improve their quality. Towards this end, some linguistic rules that preserve
verb/object/subject relation in all its forms (clausal subject, regular subject, passive
subject, regular object, indirect object, clausal compliments with external/internal
subject…) were crafted. For example one of these rules dictates that if there is a verb
in a chunk and if its subject is in another chunk, then both chunks should be merged.
Another rule ensures that all conjunctions, prepositions and determiners are attached to the chunks that contain the words to their right in the natural order of the sentence.
Since such types of words affect just the words beside them, there is no need to
distribute them across chunks. For example, if we have the following sentence: "Bell,
based in Los Angeles, makes and distributes electronic, computer and building
products.", then the proposition "in" will be placed in the same chunk as the noun
"Los", the conjunction "and" will be placed in the same chunk as the verb
"distributes" and the other conjunction "and" will be in the same chunk with the noun
"building".
Chunk merging is not used if merging leads to extreme cases. Extreme cases are
those in which all chunks are merged into one or two chunks or where one chunk
contains more than 80% of the original sentence. In such cases the merging
conditions are relaxed and the highest level of the hierarchy is used.
When this step was applied on the experimental dataset described in section 5, the
average chunk size was 8.4 words ±5.1 words.
4.3. Word Scoring
We chose a scoring scheme that reflects the importance of words in the context of
the sentence such that a word that appears at a high level within a dependency graph
and has many dependents should be more important than other words at lower
levels with fewer dependents.
Using the structure of the dependency graph where the root of the graph gets a
score=1, we score each word in the sentence – except stop words – according to the
following equation:
S(W) = \frac{cc(W)}{1 + cc(W) + \sum_{i=1}^{n(W)} cc(w_i)} \times S(p(W))            (1)

where S is the score, W is the word, cc(W) is the child count of W within the dependency tree, p(W) is the parent of W within the dependency tree, n(W) is the total number of siblings of W, and w_i is the i-th sibling of W in the dependency graph.
According to Eq. (1), each word is scored according to its level in the dependency
graph and the number of children it has relative to the number of children of its
siblings. The score of a word is a fraction of the score of its parent in the dependency
graph, which means that the deeper the word sits in the graph, the lower its score.
The number of children of the word relative to the number of children of its siblings
defines the fraction of the parent's score the word receives, so a word with more
children gets a higher score than a sibling with fewer children.
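A compact sketch of Eq. (1) is given below, assuming the dependency tree is available as a map from each word to its list of dependents; the function and variable names are illustrative.

```python
# Sketch of the word-scoring scheme of Eq. (1): the root gets score 1, and each
# word receives a fraction of its parent's score proportional to its child count.

def score_words(root, children, stop_words=frozenset()):
    """`children` maps a word to the list of its dependents in the dependency tree."""
    def child_count(word):
        return len(children.get(word, []))

    scores = {root: 1.0}
    stack = [root]
    while stack:
        parent = stack.pop()
        kids = children.get(parent, [])
        # Shared denominator: 1 plus the child counts of the word and its siblings.
        denominator = 1 + sum(child_count(k) for k in kids)
        for word in kids:
            scores[word] = (child_count(word) / denominator) * scores[parent]
            stack.append(word)
    # Stop words are excluded from scoring.
    return {w: s for w, s in scores.items() if w not in stop_words}
```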
4.4. Candidate Compression Generation
Candidate compressions are generated from the chunks obtained in the previously
described steps. For example, if the chunking and merging steps produced three chunks
(Chunk1, Chunk2, Chunk3), then the different combinations of these chunks constitute
the candidate compressions (Chunk1 – Chunk2 – Chunk3 – Chunk1Chunk2 –
Chunk1Chunk3 – Chunk2Chunk3). The words in each combination are placed in their
natural order in the original sentence. Since one of these combinations will become the
resulting compression of our system, we enforce an additional rule to ensure the quality of the
model's output: every combination must include the chunk that contains the root
word of the dependency graph. The root of the dependency graph is the most
important word in the sentence, as all other words in the sentence depend on it.
Given the previous example, if Chunk2 is the chunk containing the root of the
dependency graph, then the generated combinations are limited to: (Chunk2 –
Chunk1Chunk2 – Chunk2Chunk3).
For example, the candidate compressions for the sentence "Bell, based in Los Angeles,
makes and distributes electronic, computer and building products", given that the
chunk containing the root node is "Bell makes and distributes products", are as
follows:
• Bell makes and distributes products
• Bell, based in Los Angeles, makes and distributes products
• Bell, makes and distributes electronic, computer and building products
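The enumeration just described can be sketched as follows; representing each chunk as a set of word positions is an assumption made for illustration only.

```python
# Sketch of candidate generation: every combination of chunks that keeps the
# chunk containing the root word, excluding the combination of all chunks
# (which would reproduce the original sentence).
from itertools import combinations

def generate_candidates(chunks, root_chunk_id):
    """`chunks` maps chunk id -> set of word positions in the original sentence."""
    other_ids = [cid for cid in chunks if cid != root_chunk_id]
    candidates = []
    for size in range(len(other_ids)):  # 0 .. all-but-one of the other chunks
        for subset in combinations(other_ids, size):
            kept = set(chunks[root_chunk_id])
            for cid in subset:
                kept |= chunks[cid]
            candidates.append(sorted(kept))  # restore natural sentence order
    return candidates
```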
4.5. Candidate Compression Scoring
Given the candidate compressions produced in the previous subsection,
one of them needs to be selected as the output. To do so, each candidate compression
is scored with a language model score and an importance score. The candidate
compression with the highest combined score is produced as the output.
For the language model, we used a 3-gram language model provided by Keith
Vertanen, trained on the newswire text of the English Gigaword
corpus with a vocabulary of the top 64k words occurring in the training text.
Using the 3-gram language model, we score each candidate compression
according to the following equation. Given a candidate compression of n words:

LMScore(CandidateCompression) = \log P(W_0 \mid start) + \log P(end \mid W_n, W_{n-1}) + \sum_{i,j} \left( \log P(W_{i+1} \mid W_i, W_j) + \log P(W_i \mid W_j, W_{j-1}) \right)    (2)

where, for every junction between two different chunks in the candidate compression, j indexes the word
that ends the preceding chunk and i indexes the word that starts the following chunk.
Contrary to the usual usage of a language model, we use it only to find the
probability of the start word of the compression, the end word of the compression and
the adjacent words that represent the connection of different chunks,
even though these words follow the natural order of the original sentence.
For example, if a candidate compression X keeps the following words from the
original sentence: w0, w1, w2, w3, w6, w7, w8, w9, w10, where w0, w1, w2, w3 are
in chunk 1 and w6, w7, w8, w9, w10 are in chunk 3, then:

LMScore(X) = \log P(W_0 \mid start) + \log P(end \mid W_{10}, W_9) + \log P(W_7 \mid W_6, W_3) + \log P(W_6 \mid W_3, W_2)
In this way, candidate compressions with few chunks are given a better LMScore
than other candidate compressions.
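A sketch of Eq. (2) follows; it assumes a trigram log-probability function is available (for example from the Gigaword model loaded with any standard n-gram toolkit), and the function signature used here is an assumption.

```python
# Sketch of Eq. (2): the language model is queried only at the sentence start,
# the sentence end, and at each junction between two different chunks.
# `logprob(w, prev2, prev1)` is assumed to return log P(w | prev2 prev1).

def lm_score(words, junctions, logprob):
    """`words` is the candidate compression in order; `junctions` lists the
    positions k at which words[k] ends one chunk and words[k + 1] starts the next."""
    score = logprob(words[0], "<s>", "<s>")
    score += logprob("</s>", words[-2], words[-1])
    for k in junctions:
        j, i = k, k + 1  # j: last word of one chunk, i: first word of the next
        score += logprob(words[i], words[j - 1], words[j])
        if i + 1 < len(words):
            score += logprob(words[i + 1], words[j], words[i])
    return score
```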
The importance score for a candidate compression X that has n words is
calculated using the following equation:

ImportanceScore(X) = \sum_{k=1}^{n} Score(W_k)    (3)

where Score(W_k) is the importance score of word k, calculated using Eq. (1) presented in Section 4.3.
4.6. Compression Generation
The model now has a number of candidate compressions for an input sentence,
each with an LMScore and an ImportanceScore. Given the required
compression ratio range, we search among the candidate compressions for those that satisfy
the desired range, and among them we choose the compression that maximizes
LMScore ÷ ImportanceScore.
For example, given a desired compression range of 0.6 to 0.69, the resulting
compression for the sentence "Bell, based in Los Angeles, makes and distributes
electronic, computer and building products" is "Bell, based in Los Angeles,
makes and distributes products", as it obtains the highest score.
It is possible that there will be no compression which satisfies the required
compression ratio. In such cases, the original sentence will be returned with no
compression.
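Putting the two scores together, the selection step can be sketched as below; computing the ratio over word counts is an assumption, and the fallback to the original sentence mirrors the behaviour described above.

```python
# Sketch of compression selection: keep candidates whose compression ratio
# falls in the requested range, maximize LMScore / ImportanceScore, and fall
# back to the original sentence when no candidate fits the range.

def select_compression(original_words, candidates, lm_scores, importance_scores,
                       ratio_low, ratio_high):
    best_words, best_value = None, float("-inf")
    for idx, words in enumerate(candidates):
        ratio = len(words) / len(original_words)
        if not (ratio_low <= ratio <= ratio_high):
            continue
        # LMScore is a log probability (negative); dividing by a larger
        # importance score pulls it closer to zero, i.e. a better value.
        value = lm_scores[idx] / importance_scores[idx]
        if value > best_value:
            best_words, best_value = words, value
    return best_words if best_words is not None else original_words
```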
5. Experimental Setup
The goal of our experiment was to show that our unsupervised algorithm can
achieve results at least comparable to those of a highly-rated supervised approach
in the domain of sentence compression. Towards this end, we compared our results
to those of the free approach model developed by (Nomoto 2009) through manual and
automatic evaluation.
We chose to compare our approach to a supervised one which utilizes dependency
graphs for sentence compression since supervised approaches are usually trained for a
specific dataset which means that they should be highly competent in carrying out
their task on this specific dataset. So, by selecting a reputable supervised model, and
demonstrating comparable or superior performance, we can assert the validity of our
approach.
In our evaluation, we tried to build a fair evaluation environment as suggested by
(Napoles, Van Durme and Callison-Burch 2011) in their work on the best conditions
under which to evaluate sentence compression models.
Nomoto kindly provided us with the dataset he used to evaluate his system, as well
as with the output of his system at different compression ratios. We evaluated our
model's output and his model's output on the same sentences and the
same compression ratios, so we considered the 0.6 and 0.7 compression ratios only. It is
important to note that with both our approach and Nomoto's free approach it is
impossible to hit an exact compression ratio; for example, a desired compression
ratio of 0.6 is matched with any compression that falls in the range of 0.6 to
0.69.
We evaluated 92 sentences at the 0.6 compression ratio and 90 sentences at the 0.7
compression ratio. It is important to note that these numbers represent the sentences that
both models succeeded in compressing at the specified compression ratios.
Table 1: Guidelines for rating compressions

Score  Meaning    Explanation
1      Very Bad   Intelligibility: the sentence in question is rubbish; no sense can be made out of it.
                  Representativeness: there is no way in which the compression could be viewed as representing its source.
2      Poor       Either the sentence is broken or fails to make sense for the most part, or it focuses on points of least significance in the source.
3      Fair       The sentence can be understood, though with some mental effort; it covers some of the important points in the source sentence.
4      Good       The sentence allows easy comprehension; it covers most of the important points in the source sentence.
5      Excellent  The sentence reads as if it were written by a human; it gives a very good idea of what is being discussed in the source sentence.
Sentences that our model failed to compress at the specified compression
ratios were ignored. The reason our model fails to compress certain sentences to a
specific ratio is that our system removes chunks, not words, so
adjusting the compression to a specific ratio is not always possible. The RSS feeds of
the original sentences were considered the gold standard.
For manual evaluation, scores were solicited from two human judges who defined
their English level as fluent. The human judges followed the scoring guidelines shown
in Table 1 which are the same guidelines used for evaluating Nomoto’s approach.
Intelligibility means how well the compression reads. Representativeness indicates
how well the compression represents its source sentence.
A tool was developed specifically for the judges to collect their evaluation scores.
Through this tool the judges were given the original sentence as well as various
compressions generated by the proposed system, by Nomoto’s system and the RSS
feed. The judges were unaware of which compression was generated by which model
in order to eliminate any bias on their part.
As for automatic evaluation, we used the parsing-based evaluation measure
proposed by Riezler et al. (2003) where grammatical relations found in a compression
are compared to grammatical relations found in the gold standard to find precision,
recall and F1-score. We calculated precision, recall and F1-score for our model and
Nomoto's model on both 0.6 and 0.7 compression ratios against RSS feeds (gold
standard). We used the Stanford parser to find grammatical relations in both
compressions and the gold standard.
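As an illustration of this parsing-based measure, the sketch below computes precision, recall and F1 over sets of typed grammatical relations, assuming the relations have already been extracted (e.g. with the Stanford parser) as (relation, head, dependent) triples.

```python
# Sketch of the parsing-based evaluation: precision, recall and F1 over the
# grammatical relations of a compression versus those of the gold standard.

def relation_prf(candidate_relations, gold_relations):
    """Each argument is a set of (relation, head, dependent) triples."""
    overlap = len(candidate_relations & gold_relations)
    precision = overlap / len(candidate_relations) if candidate_relations else 0.0
    recall = overlap / len(gold_relations) if gold_relations else 0.0
    f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
    return precision, recall, f1
```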
6. Results
6.1. Manual Evaluation Results
Evaluation scores for each evaluation criterion and each model were collected
from the tool and averaged.
Table 2: Manual evaluation results for our proposed model, Free Approach model and gold standard.
Model CompRatio Intelligibility Representativeness
Gold Standard 0.69 4.71± 0.38 3.79±0.76
Proposed Model 0.64 4.09± 0.79 3.57±0.66
Free Approach 0.64 3.34±0.78 3.25±0.65
Proposed Model 0.74 4.15±0.73 3.78±0.59
Free Approach 0.73 3.63±0.78 3.52±0.66
Table 2 shows how our model and the free approach model performed on the
dataset along with the gold standard. Table 3 shows the differences between results of
our evaluation of the free approach model and the results reported in the Nomoto
(2009) paper.
Table 3: Results of our manual evaluation of the Free Approach model and the gold standard, alongside the evaluation reported in the paper by Nomoto.

Model                  Intelligibility                   Representativeness
                       Our judges   Nomoto's judges      Our judges   Nomoto's judges
Gold Standard (0.69)   4.71         4.78                 3.79         3.93
Free Approach (0.64)   3.34         3.06                 3.25         3.26
Free Approach (0.74)   3.63         3.33                 3.52         3.62
When reading Table 3, it is important to note that Nomoto's judges evaluated a
slightly larger subset, since, as mentioned previously, there are cases where our
system failed to produce a result at a specified compression ratio. The difference between
the results of our evaluation of the free approach model and the results reported in
the original paper was as follows: intelligibility has a mean difference of +0.2 and
representativeness has a mean difference of +0.1. So our judges actually gave
slightly higher scores to compressions produced by Nomoto's system than his own
evaluators did. All in all, however, the scores are very close.
Table 2 shows that our model performed better than the free approach model on
both intelligibility and representativeness, with the larger margin on intelligibility.
We ran a t-test on the evaluation scores obtained for both our
model and Nomoto's (shown in Table 2) to determine whether the difference between our
model and Nomoto's model is significant. The t-test revealed that the difference
over both criteria and both compression ratios is statistically significant, as all
resulting p-values were less than 0.01.
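The paper does not state which variant of the t-test was used; the sketch below shows one plausible instantiation with scipy, using a paired test since both systems were judged on the same sentences. The score arrays are placeholders, not the study's data.

```python
# Illustrative significance check over per-sentence judge scores (placeholder data).
from scipy import stats

proposed_scores = [4, 5, 4, 3, 5, 4, 4]       # hypothetical ratings, not real data
free_approach_scores = [3, 4, 3, 3, 4, 3, 4]  # hypothetical ratings, not real data

t_statistic, p_value = stats.ttest_rel(proposed_scores, free_approach_scores)
print(f"t = {t_statistic:.3f}, p = {p_value:.4f}")  # significant here if p < 0.01
```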
Moreover, at the 0.7 compression ratio our model achieved results close to
those of the gold standard, especially on the representativeness ratings.
6.2. Automatic Evaluation Results
Table 4 shows how our model and the free approach model compare against the
RSS feeds gold standard on 0.6 and 0.7 compression ratios in terms of precision,
recall and F1-score. The results show that our model performed better than the free
approach model.
Table 4: Automatic evaluation results for our proposed model and the Free Approach model against the gold standard.

Model            CompRatio   Precision   Recall   F1-score
Free Approach    0.64        46.2        43.0     44.6
Proposed Model   0.64        50.1        47.5     48.8
Free Approach    0.73        50.3        50.8     50.5
Proposed Model   0.74        51.9        53.7     52.8
To summarize, the results show that our proposed model manages to achieve
better results than the free approach model on two different compression ratios using
two different evaluation methods.
Table 5 shows examples of compressions created by our model, Nomoto's model,
gold standard compressions and relevant source sentences.
6.3. Results Analysis
We believe that the reason that our model performed better than Nomoto's model
on intelligibility can be attributed to the way we used the dependency graph.
Table 5: Examples from our model output, Nomoto's model output and gold standard
Source Officials at the Iraqi special tribunal set up to try the former dictator and his top
aides have said they expect to put him on trial by the end of the year in the deaths
of nearly 160 men and boys from dujail , all Shiites , some in their early teens .
Gold Iraqi officials expect to put saddam hussein on trial by the end of the year in the
deaths of nearly 160 men and boys from dujail.
Nomoto's Officials at the Special Tribunal set up to try the dictator and his aides have said
they expect to put him by the end of the year in the deaths of 160 men.
Our Model Officials at the Iraqi special tribunal set up to try the former dictator by the end
of the year in the deaths of nearly 160 men and boys from dujail.
Source Microsoft's truce with palm, its longtime rival in palmtop software, was forged
with a rare agreement to allow palm to tinker with the windows mobile software ,
the companies ' leaders said here Monday .
Gold Microsoft's truce with palm, its longtime rival in palmtop software, was forged
with a rare agreement to allow palm to tinker with the windows mobile software.
Nomoto's Microsoft truce with Palm, rival in palmtop software, was forged with agreement
to allow Palm to tinker , companies' said here Monday.
Our Model Microsoft's truce with palm was forged with a rare agreement to allow palm to
tinker with the windows mobile software.
Source The six major Hollywood studios, hoping to gain more control over their
technological destiny, have agreed to jointly finance a multimillion-dollar
research laboratory to speed the development of new ways to foil movie pirates .
Gold The six major Hollywood studios have agreed to jointly finance a
multimillion-dollar research laboratory to speed the development of new ways to
foil movie pirates.
Nomoto's The six major Hollywood studios, hoping to gain, have agreed to jointly finance
laboratory to speed the development of new ways to foil pirates.
Our Model The six major Hollywood studios, over their technological destiny, have agreed
to jointly finance a multimillion-dollar research laboratory to speed to foil movie
pirates.
Nomoto's model finds its candidate compressions by selecting paths in the
dependency graph from the root node to a terminal node, which means that the words of
a compression can be scattered all over the graph; this does not guarantee high
intelligibility. Our model, on the other hand, searches for densities inside the
dependency graph and then chooses some to keep and others to discard, which yields
more grammatical compressions than Nomoto's model.
As for performing better on representativeness, because we keep or discard whole
densities inside the dependency graph, the discarded chunks usually bear little relevance
to the rest of the chunks. This contrasts with Nomoto's model, in which the words of a
compression may be scattered across the dependency graph nodes.
The automatic evaluation results support the superiority of our model over
Nomoto's model, showing that our model's output is more similar to the gold
standard than Nomoto's, thus reinforcing the results of the manual evaluation.
7. Conclusion
This paper presented a method for sentence compression by considering the
clusters of dependency graph nodes of a sentence as the main unit of information.
Using dependency graphs for sentence compression is not novel in itself, as both
(Nomoto 2009) and (Filippova and Strube 2008) have employed them for the same
task. However, clustering dependency graph nodes into chunks and eliminating chunks
rather than nodes does present a new approach to the sentence compression problem.
Comparing this work to that of a supervised approach that employs dependency graphs
as we do has shown that the presented model performs better when compressing
sentences, in terms of both manual and automatic evaluation.
Future work will consider incorporating the proposed model into a more general
context such as paragraph summarization or multi-document summarization. We also
plan to experiment on other datasets.
Acknowledgments
The authors wish to thank Professor Tadashi Nomoto for kindly making his
experimental dataset and results available to the authors.
References
1. V. D. Blondel, J.-L. Guillaume, R. Lambiotte, and E. Lefebvre. Fast unfolding of
communities in large networks. Journal of Statistical Mechanics: Theory and Experiment,
Vol. 2008, No. 10, P10008, 2008.
2. M. de Marneffe, B. MacCartney, and C. Manning. Generating typed dependency parses
from phrase structure parses. In Proceedings of LREC 2006, Genoa, Italy, 2006.
3. H. Jing. Sentence reduction for automatic text summarization. In Proceedings of the
6th Applied Natural Language Processing Conference, pp. 310–315, Seattle, WA, USA, 2000.
4. K. Knight and D. Marcu. Summarization beyond sentence extraction: A probabilistic
approach to sentence compression. Artificial Intelligence, 139(1):91–107, 2002.
5. C. Hori and S. Furui. Speech summarization: an approach through word extraction and a
method for evaluation. IEICE Transactions on Information and Systems, E87-D(1):15–25,
2004.
6. C. Sporleder and M. Lapata. Discourse chunking and its application to sentence compression.
In Proceedings of the Human Language Technology Conference and the Conference on
Empirical Methods in Natural Language Processing, pp. 257–264, Vancouver, Canada,
2005.
7. J. Clarke and M. Lapata. Global inference for sentence compression: An integer linear
programming approach. Journal of Artificial Intelligence Research, 31(1):399–429, 2008.
8. K. Filippova and M. Strube. Dependency tree based sentence compression. In
Proceedings of INLG-08, pp. 25–32, Salt Fork, Ohio, USA, 2008.
9. J. Cordeiro, G. Dias and P. Brazdil. Unsupervised induction of sentence
compression rules. In Proceedings of the ACL-IJCNLP Workshop on Language Generation
and Summarisation (UCNLG+Sum), pp. 15–22, Singapore, 2009.
10. T. Nomoto. A comparison of model free versus model intensive approaches to sentence
compression. In Proceedings of EMNLP, pp. 391–399, Singapore, 2009.
11. C. Napoles, B. Van Durme and C. Callison-Burch. Evaluating sentence compression:
Pitfalls and suggested remedies. In Workshop on Monolingual Text-To-Text Generation,
ACL HLT 2011, Portland, Oregon, USA, 2011.
12. T. Cohn and M. Lapata. Sentence compression as tree transduction. Journal of Artificial
Intelligence Research, 34:637–674, 2009.
13. S. Riezler, T. H. King, et al. Parsing the Wall Street Journal using a Lexical-Functional
Grammar and discriminative estimation techniques. In Proceedings of ACL, 2002.
14. K. Vertanen. English Gigaword language model training recipe.
http://www.keithv.com/software/giga/