In this paper, we propose a new approach to text summarization based on opinion detection with SentiWordNet, using a score-extraction technique adapted to opinion detection. Texts are decomposed into sentences, each represented by a vector of opinion scores. The summary is produced by eliminating sentences whose opinion differs from that of the original text, where the difference is expressed by an opinion threshold. The following hypothesis has been verified: "textual units that do not share the opinion of the text are ideas used for elaboration or comparison, and their absence does not alter the semantics of the summary". It was tested with the Chi-squared statistic, which we used to measure the dependence between each textual unit and the text. Finally, we found an opinion threshold interval that generates the optimal evaluations.
CONTEXT-AWARE CLUSTERING USING GLOVE AND K-MEANS (ijseajournal)
ABSTRACT
In this paper we propose a novel method to cluster categorical data while retaining their context. Typically, clustering is performed on numerical data. However, it is often useful to cluster categorical data as well, especially when dealing with data in real-world contexts. Several methods exist which can cluster categorical data, but our approach is unique in that we use recent text-processing and machine-learning advancements like GloVe and t-SNE to develop a context-aware clustering approach (using pre-trained word embeddings). We encode words or categorical data into numerical, context-aware vectors that we use to cluster the data points using common clustering algorithms like K-means.
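A minimal sketch of the described pipeline, assuming pre-trained GloVe vectors in the standard whitespace-separated text format; the file name glove.6B.50d.txt, the toy category list, and k = 2 are illustrative choices.

```python
import numpy as np
from sklearn.cluster import KMeans

def load_glove(path):
    """Parse 'word v1 v2 ...' lines into a word -> vector dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

glove = load_glove("glove.6B.50d.txt")  # assumed local GloVe file
categories = ["red", "green", "blue", "dog", "cat", "horse"]
X = np.stack([glove[c] for c in categories if c in glove])

# cluster the context-aware vectors with ordinary K-means
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
for cat, lab in zip(categories, labels):
    print(cat, "->", lab)  # colors and animals should separate
```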
This document discusses an integrated approach to ontology development methodology and provides a case study using a shopping mall domain. It begins by reviewing existing ontology development methodologies and identifying their pitfalls. An integrated methodology is then proposed which aims to reduce these pitfalls. The key steps in the proposed methodology are: 1) capturing motivating user scenarios or keywords, 2) generating formal/informal questions and answers from the scenarios, 3) extracting terms and constraints, and 4) building the ontology using a top-down approach. The methodology is applied to developing an ontology for a shopping mall domain to provide multilingual information to visitors.
Novelty detection via topic modeling in research articles (csandit)
In today's world, redundancy is one of the most vital problems faced in almost all domains. Novelty detection is the identification of new or unknown data or signals that a machine learning system is not aware of during training. The problem becomes more intense when it comes to research articles. A method of identifying novelty in each section of an article is highly desirable for determining the novel idea proposed in a research paper. Since research articles are semi-structured, detecting novel information in them requires more accurate systems. Topic models provide a useful means of processing them and a simple way to analyze them. This work compares the most predominantly used topic model, Latent Dirichlet Allocation, with the hierarchical Pachinko Allocation Model. The results obtained favor the hierarchical Pachinko Allocation Model when used for document retrieval.
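Of the two topic models compared, LDA has a standard scikit-learn implementation; the sketch below fits it on a toy corpus (Pachinko Allocation has no mainstream Python implementation and is omitted). The corpus and topic count are illustrative.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "topic models analyze research articles",
    "novelty detection finds unknown data",
    "latent dirichlet allocation models topics in documents",
]
counts = CountVectorizer(stop_words="english").fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(counts)
print(lda.transform(counts))  # per-document topic distributions
```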
Query Answering Approach Based on Document Summarization (IJMER)
The growth of online information has necessitated thorough research in the domain of automatic text summarization within the Natural Language Processing (NLP) community. The aim of this paper is to propose a novel, language-independent automatic summarization approach that combines three main approaches: Rhetorical Structure Theory (RST), the query processing approach, and the Network Representation Approach (NRA). RST, as a theory of the major aspects of the structure of natural text, is used to extract the semantic relations behind the text. The query processing approach classifies the question type and finds the answer in a way that suits the user's needs. The NRA is used to create a graph representing the extracted semantic relations. The output is an answer which not only responds to the question but also gives the user an opportunity to find additional information related to the question. We implemented the proposed approach. As a case study, the implemented approach is applied to Arabic text in the agriculture field. The implemented approach succeeded in summarizing extension documents according to the user's query. The results have been evaluated using Recall, Precision, and F-score measures.
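A minimal sketch of the reported evaluation measures, treating summaries as sets of selected sentences; the sentence identifiers are illustrative.

```python
def prf(extracted, reference):
    """Precision, recall, F-score over sets of selected sentences."""
    tp = len(extracted & reference)
    precision = tp / len(extracted) if extracted else 0.0
    recall = tp / len(reference) if reference else 0.0
    f = (2 * precision * recall / (precision + recall)
         if precision + recall else 0.0)
    return precision, recall, f

extracted = {"s1", "s2", "s4"}    # sentences selected by the system
reference = {"s1", "s2", "s3"}    # sentences in the human reference summary
print(prf(extracted, reference))  # -> (0.666..., 0.666..., 0.666...)
```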
PATENT DOCUMENT SUMMARIZATION USING CONCEPTUAL GRAPHS (ijnlc)
In this paper, a methodology to mine concepts from documents and use these concepts to generate an objective summary of the claims section of patent documents is proposed. The Conceptual Graph (CG) formalism proposed by Sowa (Sowa 1984) is used in this work for representing the concepts and their relationships. Automatic identification of concepts and conceptual relations from text documents is a challenging task. In this work the focus is on the analysis of patent documents, mainly on the claims section (Claim) of the documents. There are several complexities in the writing style of these documents, as they are technical as well as legal. It is observed that the general in-depth parsers available in the open domain fail to parse the claims-section sentences in patent documents. The failure of in-depth parsers has motivated us to develop a methodology to extract CGs using other resources. Thus, in the present work, shallow parsing, NER, and machine learning techniques are used for extracting concepts and conceptual relationships from sentences in the claims section of patent documents. This paper therefore discusses i) generation of CGs, a semantic network, and ii) generation of an abstractive summary of the claims section of the patent. The aim is to generate a summary which is 30% of the whole claims section. Here we use Restricted Boltzmann Machines (RBMs), a deep learning technique, for automatically extracting CGs. We have tested our methodology on a corpus of 5000 patent documents from the electronics domain. The results obtained are encouraging and comparable with state-of-the-art systems.
Tools for Ontology Building from Texts: Analysis and Improvement of the Resul... (IOSR Journals)
Text2Onto is a tool that learns ontologies from textual data by extracting ontology components such as concepts, relations, instances, and hierarchies. It analyzes texts through linguistic preprocessing with GATE to tokenize, tag parts of speech, and identify noun and verb phrases. Algorithms then extract ontology components and store them probabilistically in a Preliminary Ontology Model that is independent of any representation language. The study aimed to understand Text2Onto's architecture, analyze errors in its extractions, and attempt improvements by using a meta-model of the text to better classify concepts under core concepts.
Semantic Based Model for Text Document Clustering with Idioms (Waqas Tariq)
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms online, such as on the web, social networks, and other information networks. Clustering is a very powerful data mining technique for organizing the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to capture the semantic structure of text documents. The developed method performs the following steps: tagging the documents for parsing, replacing idioms with their literal meaning, calculating semantic weights for document words, and applying a semantic grammar. A similarity measure is then obtained between the documents, and the documents are clustered using a hierarchical clustering algorithm. The method is evaluated on different data sets with standard performance measures, and its effectiveness in producing meaningful clusters has been demonstrated.
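A minimal sketch of the final two steps, with TF-IDF cosine distance standing in for the paper's semantic weights (an assumption) and agglomerative clustering over the precomputed distances; requires scikit-learn >= 1.2 for the metric parameter.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics.pairwise import cosine_distances

docs = [
    "the stock market fell sharply today",
    "investors worry as shares decline",
    "the team won the championship game",
    "fans celebrated the football victory",
]
X = TfidfVectorizer().fit_transform(docs)
D = cosine_distances(X)  # pairwise distance matrix between documents

# hierarchical (agglomerative) clustering on the precomputed distances
clusterer = AgglomerativeClustering(n_clusters=2, metric="precomputed",
                                    linkage="average")
print(clusterer.fit_predict(D))  # e.g. [0 0 1 1]
```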
NAMED ENTITY RECOGNITION IN TURKISH USING ASSOCIATION MEASURES (acijjournal)
Named Entity Recognition, an important subject of Natural Language Processing, is a key technology for information extraction, information retrieval, question answering, and other text processing applications. In this study, we evaluate previously well-established association measures as an initial attempt to extract two-word named entities from a Turkish corpus. Furthermore, we propose a new association measure and compare it with the other methods. The evaluation of these methods is performed with precision and recall measures.
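A minimal sketch of scoring two-word candidates with one classical association measure (PMI) via NLTK's collocation tools; the tiny corpus is a toy stand-in for the Turkish corpus, and the frequency filter is an illustrative choice.

```python
from nltk.collocations import BigramAssocMeasures, BigramCollocationFinder

words = ("mustafa kemal founded the republic and mustafa kemal "
         "led the reforms").split()
finder = BigramCollocationFinder.from_words(words)
finder.apply_freq_filter(2)  # keep bigrams seen at least twice

# rank candidate two-word named entities by pointwise mutual information
for bigram in finder.nbest(BigramAssocMeasures.pmi, 5):
    print(bigram)  # -> ('mustafa', 'kemal')
```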
Ontology matching finds correspondences between similar entities of different ontologies. Two ontologies may be similar in some aspects, such as structure or semantics. Most ontology matching systems integrate multiple matchers to extract all the similarities that two ontologies may share. Thus, we face a major problem: how to aggregate the different similarities.
Some matching systems use experimental weights for aggregation of similarities among different matchers while others use machine learning approaches and optimization algorithms to find optimal weights to assign to different matchers. However, both approaches have their own deficiencies.
Feature selection, optimization and clustering strategies of text documents (IJECEIAES)
Clustering is one of the most researched areas of data mining in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, collaborative filtering, document management, and indexing. Research on the clustering task must be performed before adapting it to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers. Efforts have also been put forward to achieve efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper details prominent models proposed for clustering, along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
The increased potential of ontologies to reduce human interference has a wide range of applications. This paper identifies requirements for an ontology development platform to enable an artificially intelligent web. To facilitate this process, RDF and OWL have been developed as standard formats for the sharing and integration of data and knowledge, where knowledge takes the form of rich conceptual schemas called ontologies. Based on this framework, an architectural paradigm is put forward for ontology engineering and the development of ontology applications, together with a development portal designed to support ontology engineering, content authoring, and application development, with a view to maximal scalability in the size and complexity of semantic knowledge and flexible reuse of ontology models and ontology application processes in a distributed and collaborative engineering environment.
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION (ijistjournal)
User-generated content on the web grows rapidly in this emergent information age. Evolutionary changes in technology make use of such information to capture the user's essence, so that only useful information is exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis over the text data available in each forum. The approach analyses the forum text data and computes a value for each word of text. The proposed approach combines K-means clustering and a Support Vector Machine with PSO (SVM-PSO) classification algorithm to group the forums into two clusters, hotspot forums and non-hotspot forums, within the current time span. The proposed system's accuracy is compared with other classification algorithms such as Naïve Bayes, decision trees, and SVM. The experiment shows that K-means and SVM-PSO together achieve highly consistent results.
Generation of Question and Answer from Unstructured Document using Gaussian M... (IJACEE)
The document describes a system that automatically generates questions and answers from an unstructured document. It involves several steps: (1) simplifying complex sentences, (2) generating initial questions using named entities and semantic role labeling, (3) identifying subtopics using LDA and GMNTM models, (4) measuring syntactic correctness of questions, and (5) extracting answers using pattern matching. The system is expected to produce more accurate results compared to using only LDA for subtopic identification, as GMNTM also considers word order and semantics. Key techniques include semantic role labeling, Extended String Subsequence Kernel for similarity measurement, and syntactic tree kernel for question ranking.
A statistical model for gist generation: a case study on a Hindi news article (IJDKP)
Every day, a huge number of news articles are reported and disseminated on the Internet. By generating the gist of an article, a reader can go through the main topics instead of reading the whole article, which takes much time. An ideal system would understand the document and generate the appropriate theme(s) directly from the results of that understanding. In the absence of a natural language understanding system, an appropriate substitute must be designed. Gist generation is a difficult task because it requires both maximizing text content in a short summary and maintaining the grammaticality of the text. In this paper we present a statistical approach to generating the gist of a Hindi news article. The experimental results are evaluated using standard measures such as precision, recall, and F1 for different statistical models and their combination, on articles both before and after pre-processing.
Modeling Text Independent Speaker Identification with Vector Quantization (TELKOMNIKA JOURNAL)
Speaker identification is one of the most important technologies nowadays. Many fields, such as bioinformatics and security, use speaker identification, and almost all electronic devices use this technology too. Based on the text used, speaker identification is divided into text-dependent and text-independent. In many fields, text-independent identification is mostly used because the number of possible texts is unlimited, so it is generally more challenging than text-dependent identification. In this research, text-independent speaker identification with Indonesian speaker data was modelled with Vector Quantization (VQ). VQ with K-Means initialization was used: K-Means clustering initialized the means, and Hierarchical Agglomerative Clustering was used to identify the K value for VQ. The best VQ accuracy was 59.67%, obtained when k was 5. According to the results, the Indonesian language can be modelled by VQ. This research can be extended using optimization methods for the VQ parameters, such as Genetic Algorithms or Particle Swarm Optimization.
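A minimal sketch of VQ-based speaker identification: one K-means codebook per speaker and identification by lowest average quantization distortion. Random arrays stand in for real acoustic features (e.g. MFCC frames), and k = 5 mirrors the paper's best setting.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
train = {  # speaker -> (n_frames, n_features) training features
    "spk_a": rng.normal(0.0, 1.0, size=(200, 13)),
    "spk_b": rng.normal(2.0, 1.0, size=(200, 13)),
}
# one K-means codebook (k=5) per enrolled speaker
codebooks = {s: KMeans(n_clusters=5, n_init=10, random_state=0).fit(X)
             for s, X in train.items()}

def identify(frames):
    def distortion(km):
        # average distance from each frame to its nearest codeword
        d = np.min(np.linalg.norm(
            frames[:, None, :] - km.cluster_centers_[None, :, :], axis=2),
            axis=1)
        return d.mean()
    return min(codebooks, key=lambda s: distortion(codebooks[s]))

print(identify(rng.normal(2.0, 1.0, size=(50, 13))))  # -> "spk_b"
```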
An important problem in Thai word segmentation is the sentential noun phrase. Existing studies try to minimize the problem, but no research solves it directly. This study investigates an approach to resolve this problem using conditional random fields, a probabilistic model for segmenting and labeling sequence data. The results show that noun phrases were correctly detected in more than 78.61% of cases using our technique.
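A minimal sketch of CRF sequence labeling with the sklearn-crfsuite package, using character-level begin/inside tags as a crude stand-in for Thai noun-phrase segmentation; the features and the two training strings are toy assumptions.

```python
import sklearn_crfsuite

def char_features(text, i):
    """Simple context-window features for the character at position i."""
    return {"char": text[i],
            "prev": text[i - 1] if i else "<s>",
            "next": text[i + 1] if i < len(text) - 1 else "</s>"}

texts = ["thecat", "adog"]
labels = [["B", "I", "I", "B", "I", "I"],  # B = word start, I = inside
          ["B", "B", "I", "I"]]
X = [[char_features(t, i) for i in range(len(t))] for t in texts]

crf = sklearn_crfsuite.CRF(algorithm="lbfgs", max_iterations=50)
crf.fit(X, labels)
print(crf.predict([[char_features("thedog", i) for i in range(6)]]))
```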
A Semi-Automatic Ontology Extension Method for Semantic Web Services (IDES Editor)
This paper provides a novel semi-automatic ontology extension method for Semantic Web Services (SWS). This is significant since the ontology extension methods existing in the literature mostly deal with semantic description of static Web resources such as text documents. Hence, there is a need for methods that can serve dynamic Web resources such as SWS. The method developed in this paper avoids redundancy and respects consistency so as to assure high quality of the resulting shared ontologies.
A Survey on Sentiment Categorization of Movie Reviews (Editor IJMTER)
Sentiment categorization is the process of mining user-generated text content to determine the sentiment of users towards a particular thing. It is the approach of detecting the sentiment of an author with regard to some topic, and is also known as sentiment detection, sentiment analysis, or opinion mining. It is very useful for movie production companies interested in knowing how users feel about their movies. For example, the word "excellent" indicates that a review expresses positive emotion about a particular movie. The same applies to movies, songs, cars, holiday destinations, political parties, social network sites, web blogs, discussion forums, and so on. Sentiment categorization can be carried out using three approaches: first, supervised machine-learning text classifiers based on Naïve Bayes, Maximum Entropy, SVM, kNN, or Hidden Markov Models; second, an unsupervised semantic-orientation scheme that extracts relevant N-grams of the text and then labels them; third, the publicly available SentiWordNet library.
This document summarizes a proposed framework for sentiment classification using fuzzy logic. The framework aims to detect both implicit and explicit sentiment expressions in text by incorporating multiple datasets and techniques. It involves preprocessing text data, classifying sentiment, applying fuzzy logic to reduce emotions, and using fuzzy c-means clustering to further group similar emotions. The goal is to more accurately extract sentiment from transcripts by identifying both implicit and explicit expressions as well as topics through this combined approach. Evaluation metrics like precision, recall and f-measure will be used to assess performance.
This document summarizes an article that proposes an automatic text summarization technique using feature terms to calculate sentence relevance. The technique uses both statistical and linguistic methods to identify semantically important sentences for creating a generic summary. It determines the relevance of sentences based on feature term ranks and performs semantic analysis of sentences with the highest ranks to select those most important for the summary. The performance is evaluated by comparing summaries to those created by human evaluators.
Sentence compression via clustering of dependency graph nodes - NLP-KE 2012 (Ayman El-Kilany)
This paper proposes an unsupervised model for sentence compression based on clustering the nodes of a sentence's dependency graph. The model first clusters related nodes into chunks using the Louvain clustering method. It then merges chunks based on linguistic rules to improve coherence. Candidate compressions are generated by removing chunks, and scored based on language models and word importance to select the best compression. An experiment found the proposed method performed better than a recent supervised technique.
An Efficient Semantic Relation Extraction Method For Arabic Texts Based On Si... (CSCJournals)
The document presents a method for extracting semantic relations between concepts in Arabic texts. It constructs context vectors for concepts based on their co-occurrence with other concepts. It then uses several semantic similarity measures (Cosine, Jaccard, Lin) to calculate similarity scores between candidate concept vectors and seed concept vectors. Relations are extracted between candidates and seeds if their similarity score is above the average threshold for that seed. The method was evaluated on an Arabic corpus and achieved a precision of 83-85% for relation extraction, showing it is an effective unsupervised approach for extracting relations to construct Arabic ontologies.
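A minimal sketch of the similarity-and-threshold step on hand-made co-occurrence vectors; cosine and Jaccard are shown (the Lin measure is omitted), and the average-threshold rule follows the description above.

```python
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def jaccard(u, v):
    # Jaccard on binarized co-occurrence vectors
    a, b = u > 0, v > 0
    return float((a & b).sum() / (a | b).sum())

seed = np.array([3, 0, 2, 1, 0], dtype=float)  # seed concept context vector
candidates = {"c1": np.array([2, 0, 1, 1, 0], dtype=float),
              "c2": np.array([0, 4, 0, 0, 2], dtype=float)}

scores = {name: cosine(v, seed) for name, v in candidates.items()}
threshold = sum(scores.values()) / len(scores)  # average threshold for this seed
related = [n for n, s in scores.items() if s > threshold]
print(scores, "->", related)  # only c1 passes the threshold
```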
Dynamic Question Answer Generator: An Enhanced Approach to Question Generation (ijtsrd)
Teachers and educational institutions seek new questions with different difficulty levels when setting up tests for their students. Students likewise long for distinct, new questions to practice for their tests, as redundant questions are found everywhere. However, setting up new questions every time is a tedious task for teachers. To overcome this conundrum, we have built an artificially intelligent system which generates questions and answers for the mathematical topic of quadratic equations. The system uses (i) a randomization technique for generating unique questions each time and (ii) first-order logic and automated deduction to produce a solution for each generated question. The goal was achieved and the system works efficiently. It is robust, reliable, and helpful for teachers, students, and other organizations that need quadratic-equation questions, hassle-free. Rahul Bhatia | Vishakha Gautam | Yash Kumar | Ankush Garg, "Dynamic Question Answer Generator: An Enhanced Approach to Question Generation", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-3, Issue-4, June 2019, URL: https://www.ijtsrd.com/papers/ijtsrd23730.pdf
Paper URL: https://www.ijtsrd.com/computer-science/artificial-intelligence/23730/dynamic-question-answer-generator-an-enhanced-approach-to-question-generation/rahul-bhatia
Data mining is knowledge discovery in databases; the goal is to extract patterns and knowledge from large amounts of data. An important branch of data mining is text mining, which extracts high-quality information from text; statistical pattern learning is used to obtain this high-quality information. High quality in text mining means a combination of relevance, novelty, and interestingness. Tasks in text mining include text categorization, text clustering, entity extraction, and sentiment analysis. Applications of natural language processing and analytical methods are highly preferred to turn text into data for analysis.
TALASH: A SEMANTIC AND CONTEXT BASED OPTIMIZED HINDI SEARCH ENGINE (IJCSEIT Journal)
This document summarizes a research paper that proposes three models for query expansion in a Hindi search engine: 1) Using lexical resources like HindiWordNet to find synonyms and related terms, 2) Using user context information like location, interests and profession, 3) Combining lexical resources and user context. An experiment compares the precision of results from simple Google searches to searches using each model. Precision was highest using the combined Model III at 0.79, showing that integrating lexical and user context information improves search quality in Hindi.
The sarcasm detection with the method of logistic regression (EditorIJAERD)
The document discusses sarcasm detection using logistic regression. It compares the performance of logistic regression and SVM classification for sarcasm detection. Logistic regression achieved higher accuracy of 93.5% for sarcasm detection, with lower execution time compared to SVM classification. The proposed approach uses data preprocessing, feature extraction using N-grams, and trains a logistic regression classifier on a manually labeled dataset to classify text as sarcastic or non-sarcastic. Accuracy and execution time analysis shows logistic regression performs better than SVM for this task.
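A minimal sketch of the described pipeline: n-gram features feeding a logistic regression classifier in scikit-learn; the four labeled examples are toy stand-ins for the paper's manually labeled dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["oh great, another monday",
         "i love sunny days",
         "sure, because waiting in line is so much fun",
         "the cake was delicious"]
labels = [1, 0, 1, 0]  # 1 = sarcastic, 0 = non-sarcastic

model = make_pipeline(CountVectorizer(ngram_range=(1, 2)),  # unigrams + bigrams
                      LogisticRegression(max_iter=1000))
model.fit(texts, labels)
print(model.predict(["wow, what a fantastic traffic jam"]))  # -> [1]
```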
This summarizes an academic paper that proposes an automatic ontology creation method for classifying research papers. It uses text mining techniques like classification and clustering algorithms. It first builds a research ontology by extracting keywords and patterns from previous papers. It then uses a decision tree algorithm to classify new papers into disciplines defined in the ontology. The classified papers are then clustered based on similarities to group them. The method was tested on a dataset of 100 papers and achieved average precision of 85.7% for term-based and 89.3% for pattern-based keyword extraction.
Improved method for pattern discovery in text mining (eSAT Journals)
This document summarizes an improved method for pattern discovery in text mining proposed by Bharate Laxman and D. Sujatha. The method implements a novel pattern discovery technique proposed by Zhong et al. that discovers patterns from text documents and computes pattern specificities to evaluate term weights. The authors built a prototype application to test the technique. Experimental results showed the solution is useful for text mining as it avoids problems of misinterpretation and low frequency compared to previous methods.
A Novel approach for Document Clustering using Concept Extraction (AM Publications)
In this paper we present a novel approach to extract the concept from a document and cluster a set of such documents according to the concepts extracted from each of them. We transform the corpus into a vector space using term frequency-inverse document frequency, calculate the cosine distance between each pair of documents, and then cluster them using the K-means algorithm. We also use multidimensional scaling to reduce the dimensionality of the corpus. The result is a grouping of the documents that are most similar to each other with respect to their content and genre.
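A minimal sketch of this pipeline: TF-IDF vectors, pairwise cosine distances, multidimensional scaling to two dimensions, then K-means; corpus and parameter values are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_distances
from sklearn.manifold import MDS
from sklearn.cluster import KMeans

docs = ["the economy grew last quarter",
        "markets rallied on growth data",
        "the striker scored twice",
        "a late goal won the match"]
tfidf = TfidfVectorizer(stop_words="english").fit_transform(docs)
dist = cosine_distances(tfidf)

# embed the documents in 2-D from the precomputed distance matrix
coords = MDS(n_components=2, dissimilarity="precomputed",
             random_state=0).fit_transform(dist)
print(KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(coords))
```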
The document reviews question answering systems by discussing their history, frameworks, issues, and classifications. It describes the typical three-stage framework of question analysis, passage retrieval, and answer extraction. It also examines important issues for QA systems like question processing, data sources, and multilingual capabilities. Finally, it classifies QA systems based on criteria like application domain, question type, data source, and answer format.
French machine reading for question answering (Ali Kabbadj)
This paper proposes to unlock the main barrier to machine reading and comprehension of French natural-language texts. This opens the way for a machine to find, for a given question, a precise answer buried in a mass of unstructured French text, or to create a universal French chatbot. Deep learning has produced extremely promising results for various tasks in natural language understanding, particularly topic classification, sentiment analysis, question answering, and language translation. But to be effective, deep learning methods need very large training datasets. Until now these techniques could not actually be used for French-text Question Answering (Q&A) applications, since there was no large Q&A training dataset. We produced a large (100,000+) French training dataset for Q&A by translating and adapting the English SQuAD v1.1 dataset, along with GloVe French word and character embedding vectors from the French Wikipedia dump. We trained and evaluated three different Q&A neural network architectures in French and obtained French Q&A models with F1 scores around 70%.
In this paper we try to correlate text sequences that provide common topics as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for common topics in the sequences and isolates them with their timestamps. Step two takes a topic and tries to assign a timestamp to the text document. After multiple repetitions of step two, we can produce an optimal result.
The document describes a proposed system for automatic text summarization using latent semantic analysis and natural language processing. The system would take in a user-submitted document, preprocess it by removing stop words, generate a singular value decomposition matrix to derive the latent semantic structure, and output a summary by semantically arranging sentences to encompass the original text's concepts. The document provides an implementation example in Python using the LSA method from the scikit-learn machine learning library for natural language processing and matrix decomposition.
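A minimal sketch in the spirit of that example, using scikit-learn's TruncatedSVD for the latent semantic decomposition and scoring sentences by their loading on the top latent topic; the sentence list and the top-2 cutoff are illustrative assumptions.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD

sentences = ["Solar power capacity grew rapidly this decade.",
             "Wind farms also expanded across the region.",
             "My neighbor repainted his fence last week.",
             "Renewable energy now supplies a large share of demand."]
X = TfidfVectorizer(stop_words="english").fit_transform(sentences)

svd = TruncatedSVD(n_components=2, random_state=0)
weights = np.abs(svd.fit_transform(X))  # sentence loadings per latent topic
top = np.argsort(-weights[:, 0])[:2]    # strongest sentences on topic 0
print([sentences[i] for i in sorted(top)])  # extractive 2-sentence summary
```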
Text mining is a technique that helps users find useful information in large collections of text documents on the web or in databases. Most popular text mining and classification methods have adopted term-based approaches, while pattern-based methods are used for describing user preferences. This review paper analyses how text mining works at three levels: the sentence level, the document level, and the feature level. We review related work done previously, demonstrate the problems that arise when text mining is performed at the feature level, and present a technique for text mining on compound sentences.
A Novel Method for An Intelligent Based Voice Meeting System Using Machine Le...IRJET Journal
This document proposes a method for an intelligent voice meeting system using machine learning. It involves speech recognition to convert speech to text, followed by text summarization. The proposed method achieved high accuracy, precision, and F1 scores in experiments. It provides an effective way to transcribe and summarize meetings by utilizing speech recognition and text summarization technologies. The system architecture involves speech to text conversion, text summarization, storage in a database, and display of summarized text to users. The experimental results demonstrate the system's ability to accurately transcribe and summarize spoken content.
A N E XTENSION OF P ROTÉGÉ FOR AN AUTOMA TIC F UZZY - O NTOLOGY BUILDING U...ijcsit
The process of building ontology is a very
complex and time
-
consuming process
especially when dealing
with huge amount of data. Unfortunately current
marketed
tools are very limited and don’t meet
all
user
needs.
Indeed, t
hese software build the core of the ontology from initial data that generates
a
big number of
information.
In this paper, we
aim to resolve these problems
by adding an extension to the well known
ontology editor Protégé in order to work towards a complete
FCA
-
based framework
which resolves the
limitation of other tools in
building fuzzy
-
ontology
.
W
e will give
, in this paper
, some
details on
our
sem
i
-
automat
ic collaborative tool
called FOD Tab Plug
-
in
which
takes into consideration another degree of
granularity in the process of generation
.
In fact, i
t follows a bottom
-
up strategy based on conceptual
clustering, fuzzy logic and Formal Concept Analysis (FCA) a
nd it defines ontology between classes
resulting from a preliminary classification of data and not from the initial large amount of data
.
Classification of News and Research Articles Using Text Pattern MiningIOSR Journals
This document summarizes a research paper that proposes a method for classifying news and research articles using text pattern mining. The method involves preprocessing text to remove stop words and perform stemming. Frequent and closed patterns are then discovered from the preprocessed text. These patterns are structured into a taxonomy and deployed to classify new documents. The method also involves evolving patterns by reshuffling term supports within patterns to reduce the effects of noise from negative documents. Over 80% of documents were successfully classified using this pattern-based approach.
A Review Of Text Mining Techniques And ApplicationsLisa Graves
This document provides a review of various text mining techniques and applications. It discusses techniques used for text classification and summarization, including Naive Bayes classification, backpropagation neural networks, keyword matching, and information extraction. It also covers applications of text mining in areas like sentiment analysis of social media posts and hotel reviews. Finally, it discusses the need for organizational text mining to extract useful information and insights from large amounts of unstructured text data.
IRJET - Deep Collaborrative Filtering with Aspect InformationIRJET Journal
This document discusses a proposed system for deep collaborative filtering with aspect information. The system aims to help web users efficiently locate relevant information on unfamiliar topics to increase their knowledge. It utilizes techniques like multi-keyword search, synonym matching, and ontology mapping to return relevant web links, images, and news articles to the user based on their search terms. The proposed system architecture includes an index structure to efficiently search and rank results based on similarity to the search query terms. The implementation and evaluation of the proposed system are also discussed.
A Survey of Ontology-based Information Extraction for Social Media Content An...ijcnes
The amount of information generated in the Web has grown enormously over the years. This information is significant to individuals, businesses and organizations. If analyzed, understood and utilized, it will provide a valuable insight to its stakeholders. However, many of these information are semi-structured or unstructured which makes it difficult to draw in-depth understanding of the implications behind those information. This is where Ontology-based Information Extraction (OBIE) and social media content analysis come into play. OBIE has now become a popular way to extract information coming from machine-readable sources. This paper presents a survey of OBIE, Ontology languages and tools and the process to build an ontology model and framework. The author made a comparison of two ontology building frameworks and identified which framework is complete.
IRJET- Automatic Recapitulation of Text DocumentIRJET Journal
The document describes an approach for automatically generating summaries of text documents. It involves preprocessing the input text by tokenizing, removing stop words, and stemming words. It then extracts features from the preprocessed text, such as term frequency-inverse document frequency (tf-idf) values. Sentences are scored based on these features and sentences with higher scores are selected to form the summary. Keywords from the text are also identified using WordNet to help select the most relevant sentences for the summary. The proposed approach aims to generate concise yet meaningful summaries using natural language processing techniques.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Similar to A NEW APPROACH BASED ON THE DETECTION OF OPINION BY SENTIWORDNET FOR AUTOMATIC TEXT SUMMARIES BY EXTRACTION (20)
ANALYSIS OF LAND SURFACE DEFORMATION GRADIENT BY DINSAR cscpconf
The progressive development of Synthetic Aperture Radar (SAR) systems diversify the exploitation of the generated images by these systems in different applications of geoscience. Detection and monitoring surface deformations, procreated by various phenomena had benefited from this evolution and had been realized by interferometry (InSAR) and differential interferometry (DInSAR) techniques. Nevertheless, spatial and temporal decorrelations of the interferometric couples used, limit strongly the precision of analysis results by these techniques. In this context, we propose, in this work, a methodological approach of surface deformation detection and analysis by differential interferograms to show the limits of this technique according to noise quality and level. The detectability model is generated from the deformation signatures, by simulating a linear fault merged to the images couples of ERS1 / ERS2 sensors acquired in a region of the Algerian south.
4D AUTOMATIC LIP-READING FOR SPEAKER'S FACE IDENTIFCATIONcscpconf
A novel based a trajectory-guided, concatenating approach for synthesizing high-quality image real sample renders video is proposed . The lips reading automated is seeking for modeled the closest real image sample sequence preserve in the library under the data video to the HMM predicted trajectory. The object trajectory is modeled obtained by projecting the face patterns into an KDA feature space is estimated. The approach for speaker's face identification by using synthesise the identity surface of a subject face from a small sample of patterns which sparsely each the view sphere. An KDA algorithm use to the Lip-reading image is discrimination, after that work consisted of in the low dimensional for the fundamental lip features vector is reduced by using the 2D-DCT.The mouth of the set area dimensionality is ordered by a normally reduction base on the PCA to obtain the Eigen lips approach, their proposed approach by[33]. The subjective performance results of the cost function under the automatic lips reading modeled , which wasn’t illustrate the superior performance of the
method.
MOVING FROM WATERFALL TO AGILE PROCESS IN SOFTWARE ENGINEERING CAPSTONE PROJE...cscpconf
Universities offer software engineering capstone course to simulate a real world-working environment in which students can work in a team for a fixed period to deliver a quality product. The objective of the paper is to report on our experience in moving from Waterfall process to Agile process in conducting the software engineering capstone project. We present the capstone course designs for both Waterfall driven and Agile driven methodologies that highlight the structure, deliverables and assessment plans.To evaluate the improvement, we conducted a survey for two different sections taught by two different instructors to evaluate students’ experience in moving from traditional Waterfall model to Agile like process. Twentyeight students filled the survey. The survey consisted of eight multiple-choice questions and an open-ended question to collect feedback from students. The survey results show that students were able to attain hands one experience, which simulate a real world-working environment. The results also show that the Agile approach helped students to have overall better design and avoid mistakes they have made in the initial design completed in of the first phase of the capstone project. In addition, they were able to decide on their team capabilities, training needs and thus learn the required technologies earlier which is reflected on the final product quality
PROMOTING STUDENT ENGAGEMENT USING SOCIAL MEDIA TECHNOLOGIEScscpconf
This document discusses using social media technologies to promote student engagement in a software project management course. It describes the course and objectives of enhancing communication. It discusses using Facebook for 4 years, then switching to WhatsApp based on student feedback, and finally introducing Slack to enable personalized team communication. Surveys found students engaged and satisfied with all three tools, though less familiar with Slack. The conclusion is that social media promotes engagement but familiarity with the tool also impacts satisfaction.
A SURVEY ON QUESTION ANSWERING SYSTEMS: THE ADVANCES OF FUZZY LOGICcscpconf
In real world computing environment with using a computer to answer questions has been a human dream since the beginning of the digital era, Question-answering systems are referred to as intelligent systems, that can be used to provide responses for the questions being asked by the user based on certain facts or rules stored in the knowledge base it can generate answers of questions asked in natural , and the first main idea of fuzzy logic was to working on the problem of computer understanding of natural language, so this survey paper provides an overview on what Question-Answering is and its system architecture and the possible relationship and
different with fuzzy logic, as well as the previous related research with respect to approaches that were followed. At the end, the survey provides an analytical discussion of the proposed QA models, along or combined with fuzzy logic and their main contributions and limitations.
DYNAMIC PHONE WARPING – A METHOD TO MEASURE THE DISTANCE BETWEEN PRONUNCIATIONS cscpconf
Human beings generate different speech waveforms while speaking the same word at different times. Also, different human beings have different accents and generate significantly varying speech waveforms for the same word. There is a need to measure the distances between various words which facilitate preparation of pronunciation dictionaries. A new algorithm called Dynamic Phone Warping (DPW) is presented in this paper. It uses dynamic programming technique for global alignment and shortest distance measurements. The DPW algorithm can be used to enhance the pronunciation dictionaries of the well-known languages like English or to build pronunciation dictionaries to the less known sparse languages. The precision measurement experiments show 88.9% accuracy.
INTELLIGENT ELECTRONIC ASSESSMENT FOR SUBJECTIVE EXAMS cscpconf
In education, the use of electronic (E) examination systems is not a novel idea, as Eexamination systems have been used to conduct objective assessments for the last few years. This research deals with randomly designed E-examinations and proposes an E-assessment system that can be used for subjective questions. This system assesses answers to subjective questions by finding a matching ratio for the keywords in instructor and student answers. The matching ratio is achieved based on semantic and document similarity. The assessment system is composed of four modules: preprocessing, keyword expansion, matching, and grading. A survey and case study were used in the research design to validate the proposed system. The examination assessment system will help instructors to save time, costs, and resources, while increasing efficiency and improving the productivity of exam setting and assessments.
TWO DISCRETE BINARY VERSIONS OF AFRICAN BUFFALO OPTIMIZATION METAHEURISTICcscpconf
African Buffalo Optimization (ABO) is one of the most recent swarms intelligence based metaheuristics. ABO algorithm is inspired by the buffalo’s behavior and lifestyle. Unfortunately, the standard ABO algorithm is proposed only for continuous optimization problems. In this paper, the authors propose two discrete binary ABO algorithms to deal with binary optimization problems. In the first version (called SBABO) they use the sigmoid function and probability model to generate binary solutions. In the second version (called LBABO) they use some logical operator to operate the binary solutions. Computational results on two knapsack problems (KP and MKP) instances show the effectiveness of the proposed algorithm and their ability to achieve good and promising solutions.
DETECTION OF ALGORITHMICALLY GENERATED MALICIOUS DOMAINcscpconf
In recent years, many malware writers have relied on Dynamic Domain Name Services (DDNS) to maintain their Command and Control (C&C) network infrastructure to ensure a persistence presence on a compromised host. Amongst the various DDNS techniques, Domain Generation Algorithm (DGA) is often perceived as the most difficult to detect using traditional methods. This paper presents an approach for detecting DGA using frequency analysis of the character distribution and the weighted scores of the domain names. The approach’s feasibility is demonstrated using a range of legitimate domains and a number of malicious algorithmicallygenerated domain names. Findings from this study show that domain names made up of English characters “a-z” achieving a weighted score of < 45 are often associated with DGA. When a weighted score of < 45 is applied to the Alexa one million list of domain names, only 15% of the domain names were treated as non-human generated.
GLOBAL MUSIC ASSET ASSURANCE DIGITAL CURRENCY: A DRM SOLUTION FOR STREAMING C...cscpconf
The document proposes a blockchain-based digital currency and streaming platform called GoMAA to address issues of piracy in the online music streaming industry. Key points:
- GoMAA would use a digital token on the iMediaStreams blockchain to enable secure dissemination and tracking of streamed content. Content owners could control access and track consumption of released content.
- Original media files would be converted to a Secure Portable Streaming (SPS) format, embedding watermarks and smart contract data to indicate ownership and enable validation on the blockchain.
- A browser plugin would provide wallets for fans to collect GoMAA tokens as rewards for consuming content, incentivizing participation and addressing royalty discrepancies by recording
IMPORTANCE OF VERB SUFFIX MAPPING IN DISCOURSE TRANSLATION SYSTEMcscpconf
This document discusses the importance of verb suffix mapping in discourse translation from English to Telugu. It explains that after anaphora resolution, the verbs must be changed to agree with the gender, number, and person features of the subject or anaphoric pronoun. Verbs in Telugu inflect based on these features, while verbs in English only inflect based on number and person. Several examples are provided that demonstrate how the Telugu verb changes based on whether the subject or pronoun is masculine, feminine, neuter, singular or plural. Proper verb suffix mapping is essential for generating natural and coherent translations while preserving the context and meaning of the original discourse.
EXACT SOLUTIONS OF A FAMILY OF HIGHER-DIMENSIONAL SPACE-TIME FRACTIONAL KDV-T...cscpconf
In this paper, based on the definition of conformable fractional derivative, the functional
variable method (FVM) is proposed to seek the exact traveling wave solutions of two higherdimensional
space-time fractional KdV-type equations in mathematical physics, namely the
(3+1)-dimensional space–time fractional Zakharov-Kuznetsov (ZK) equation and the (2+1)-
dimensional space–time fractional Generalized Zakharov-Kuznetsov-Benjamin-Bona-Mahony
(GZK-BBM) equation. Some new solutions are procured and depicted. These solutions, which
contain kink-shaped, singular kink, bell-shaped soliton, singular soliton and periodic wave
solutions, have many potential applications in mathematical physics and engineering. The
simplicity and reliability of the proposed method is verified.
AUTOMATED PENETRATION TESTING: AN OVERVIEWcscpconf
The document discusses automated penetration testing and provides an overview. It compares manual and automated penetration testing, noting that automated testing allows for faster, more standardized and repeatable tests but has limitations in developing new exploits. It also reviews some current automated penetration testing methodologies and tools, including those using HTTP/TCP/IP attacks, linking common scanning tools, a Python-based tool targeting databases, and one using POMDPs for multi-step penetration test planning under uncertainty. The document concludes that automated testing is more efficient than manual for known vulnerabilities but cannot replace manual testing for discovering new exploits.
CLASSIFICATION OF ALZHEIMER USING fMRI DATA AND BRAIN NETWORKcscpconf
Since the mid of 1990s, functional connectivity study using fMRI (fcMRI) has drawn increasing
attention of neuroscientists and computer scientists, since it opens a new window to explore
functional network of human brain with relatively high resolution. BOLD technique provides
almost accurate state of brain. Past researches prove that neuro diseases damage the brain
network interaction, protein- protein interaction and gene-gene interaction. A number of
neurological research paper also analyse the relationship among damaged part. By
computational method especially machine learning technique we can show such classifications.
In this paper we used OASIS fMRI dataset affected with Alzheimer’s disease and normal
patient’s dataset. After proper processing the fMRI data we use the processed data to form
classifier models using SVM (Support Vector Machine), KNN (K- nearest neighbour) & Naïve
Bayes. We also compare the accuracy of our proposed method with existing methods. In future,
we will other combinations of methods for better accuracy.
VALIDATION METHOD OF FUZZY ASSOCIATION RULES BASED ON FUZZY FORMAL CONCEPT AN...cscpconf
The document proposes a new validation method for fuzzy association rules based on three steps: (1) applying the EFAR-PN algorithm to extract a generic base of non-redundant fuzzy association rules using fuzzy formal concept analysis, (2) categorizing the extracted rules into groups, and (3) evaluating the relevance of the rules using structural equation modeling, specifically partial least squares. The method aims to address issues with existing fuzzy association rule extraction algorithms such as large numbers of extracted rules, redundancy, and difficulties with manual validation.
PROBABILITY BASED CLUSTER EXPANSION OVERSAMPLING TECHNIQUE FOR IMBALANCED DATAcscpconf
In many applications of data mining, class imbalance is noticed when examples in one class are
overrepresented. Traditional classifiers result in poor accuracy of the minority class due to the
class imbalance. Further, the presence of within class imbalance where classes are composed of
multiple sub-concepts with different number of examples also affect the performance of
classifier. In this paper, we propose an oversampling technique that handles between class and
within class imbalance simultaneously and also takes into consideration the generalization
ability in data space. The proposed method is based on two steps- performing Model Based
Clustering with respect to classes to identify the sub-concepts; and then computing the
separating hyperplane based on equal posterior probability between the classes. The proposed
method is tested on 10 publicly available data sets and the result shows that the proposed
method is statistically superior to other existing oversampling methods.
CHARACTER AND IMAGE RECOGNITION FOR DATA CATALOGING IN ECOLOGICAL RESEARCHcscpconf
Data collection is an essential, but manpower intensive procedure in ecological research. An
algorithm was developed by the author which incorporated two important computer vision
techniques to automate data cataloging for butterfly measurements. Optical Character
Recognition is used for character recognition and Contour Detection is used for imageprocessing.
Proper pre-processing is first done on the images to improve accuracy. Although
there are limitations to Tesseract’s detection of certain fonts, overall, it can successfully identify
words of basic fonts. Contour detection is an advanced technique that can be utilized to
measure an image. Shapes and mathematical calculations are crucial in determining the precise
location of the points on which to draw the body and forewing lines of the butterfly. Overall,
92% accuracy were achieved by the program for the set of butterflies measured.
SOCIAL MEDIA ANALYTICS FOR SENTIMENT ANALYSIS AND EVENT DETECTION IN SMART CI...cscpconf
Smart cities utilize Internet of Things (IoT) devices and sensors to enhance the quality of the city
services including energy, transportation, health, and much more. They generate massive
volumes of structured and unstructured data on a daily basis. Also, social networks, such as
Twitter, Facebook, and Google+, are becoming a new source of real-time information in smart
cities. Social network users are acting as social sensors. These datasets so large and complex
are difficult to manage with conventional data management tools and methods. To become
valuable, this massive amount of data, known as 'big data,' needs to be processed and
comprehended to hold the promise of supporting a broad range of urban and smart cities
functions, including among others transportation, water, and energy consumption, pollution
surveillance, and smart city governance. In this work, we investigate how social media analytics
help to analyze smart city data collected from various social media sources, such as Twitter and
Facebook, to detect various events taking place in a smart city and identify the importance of
events and concerns of citizens regarding some events. A case scenario analyses the opinions of
users concerning the traffic in three largest cities in the UAE
SOCIAL NETWORK HATE SPEECH DETECTION FOR AMHARIC LANGUAGEcscpconf
The anonymity of social networks makes it attractive for hate speech to mask their criminal
activities online posing a challenge to the world and in particular Ethiopia. With this everincreasing
volume of social media data, hate speech identification becomes a challenge in
aggravating conflict between citizens of nations. The high rate of production, has become
difficult to collect, store and analyze such big data using traditional detection methods. This
paper proposed the application of apache spark in hate speech detection to reduce the
challenges. Authors developed an apache spark based model to classify Amharic Facebook
posts and comments into hate and not hate. Authors employed Random forest and Naïve Bayes
for learning and Word2Vec and TF-IDF for feature selection. Tested by 10-fold crossvalidation,
the model based on word2vec embedding performed best with 79.83%accuracy. The
proposed method achieve a promising result with unique feature of spark for big data.
GENERAL REGRESSION NEURAL NETWORK BASED POS TAGGING FOR NEPALI TEXTcscpconf
This article presents Part of Speech tagging for Nepali text using General Regression Neural
Network (GRNN). The corpus is divided into two parts viz. training and testing. The network is
trained and validated on both training and testing data. It is observed that 96.13% words are
correctly being tagged on training set whereas 74.38% words are tagged correctly on testing
data set using GRNN. The result is compared with the traditional Viterbi algorithm based on
Hidden Markov Model. Viterbi algorithm yields 97.2% and 40% classification accuracies on
training and testing data sets respectively. GRNN based POS Tagger is more consistent than the
traditional Viterbi decoding technique.
Let's Integrate MuleSoft RPA, COMPOSER, APM with AWS IDP along with Slackshyamraj55
Discover the seamless integration of RPA (Robotic Process Automation), COMPOSER, and APM with AWS IDP enhanced with Slack notifications. Explore how these technologies converge to streamline workflows, optimize performance, and ensure secure access, all while leveraging the power of AWS IDP and real-time communication via Slack notifications.
Cosa hanno in comune un mattoncino Lego e la backdoor XZ?Speck&Tech
ABSTRACT: A prima vista, un mattoncino Lego e la backdoor XZ potrebbero avere in comune il fatto di essere entrambi blocchi di costruzione, o dipendenze di progetti creativi e software. La realtà è che un mattoncino Lego e il caso della backdoor XZ hanno molto di più di tutto ciò in comune.
Partecipate alla presentazione per immergervi in una storia di interoperabilità, standard e formati aperti, per poi discutere del ruolo importante che i contributori hanno in una comunità open source sostenibile.
BIO: Sostenitrice del software libero e dei formati standard e aperti. È stata un membro attivo dei progetti Fedora e openSUSE e ha co-fondato l'Associazione LibreItalia dove è stata coinvolta in diversi eventi, migrazioni e formazione relativi a LibreOffice. In precedenza ha lavorato a migrazioni e corsi di formazione su LibreOffice per diverse amministrazioni pubbliche e privati. Da gennaio 2020 lavora in SUSE come Software Release Engineer per Uyuni e SUSE Manager e quando non segue la sua passione per i computer e per Geeko coltiva la sua curiosità per l'astronomia (da cui deriva il suo nickname deneb_alpha).
Observability Concepts EVERY Developer Should Know -- DeveloperWeek Europe.pdfPaige Cruz
Monitoring and observability aren’t traditionally found in software curriculums and many of us cobble this knowledge together from whatever vendor or ecosystem we were first introduced to and whatever is a part of your current company’s observability stack.
While the dev and ops silo continues to crumble….many organizations still relegate monitoring & observability as the purview of ops, infra and SRE teams. This is a mistake - achieving a highly observable system requires collaboration up and down the stack.
I, a former op, would like to extend an invitation to all application developers to join the observability party will share these foundational concepts to build on:
Unlocking Productivity: Leveraging the Potential of Copilot in Microsoft 365, a presentation by Christoforos Vlachos, Senior Solutions Manager – Modern Workplace, Uni Systems
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
GraphSummit Singapore | The Future of Agility: Supercharging Digital Transfor...Neo4j
Leonard Jayamohan, Partner & Generative AI Lead, Deloitte
This keynote will reveal how Deloitte leverages Neo4j’s graph power for groundbreaking digital twin solutions, achieving a staggering 100x performance boost. Discover the essential role knowledge graphs play in successful generative AI implementations. Plus, get an exclusive look at an innovative Neo4j + Generative AI solution Deloitte is developing in-house.
Securing your Kubernetes cluster_ a step-by-step guide to success !KatiaHIMEUR1
Today, after several years of existence, an extremely active community and an ultra-dynamic ecosystem, Kubernetes has established itself as the de facto standard in container orchestration. Thanks to a wide range of managed services, it has never been so easy to set up a ready-to-use Kubernetes cluster.
However, this ease of use means that the subject of security in Kubernetes is often left for later, or even neglected. This exposes companies to significant risks.
In this talk, I'll show you step-by-step how to secure your Kubernetes cluster for greater peace of mind and reliability.
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer’s life and facilitate a rapid transition from concept to production-ready applications.He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
Maruthi Prithivirajan, Head of ASEAN & IN Solution Architecture, Neo4j
Get an inside look at the latest Neo4j innovations that enable relationship-driven intelligence at scale. Learn more about the newest cloud integrations and product enhancements that make Neo4j an essential choice for developers building apps with interconnected data and generative AI.
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
UiPath Test Automation using UiPath Test Suite series, part 5DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 5. In this session, we will cover CI/CD with devops.
Topics covered:
CI/CD with in UiPath
End-to-end overview of CI/CD pipeline with Azure devops
Speaker:
Lyndsey Byblow, Test Suite Sales Engineer @ UiPath, Inc.
Text summaries are valuable for fast access to the content of textual information. A summary is a shorter reissue of the original text, produced under the constraint of preserving the semantics of the document, i.e. minimizing the loss of meaning. The purpose of this operation is to help readers identify the information of interest to them without having to read the entire document. Automatic summaries aim to reduce the time needed to find relevant documents, or to shorten the processing of long texts by identifying their key information. The volume of electronic textual information keeps increasing, making access to information difficult. Producing a summary can facilitate this access, but it is also a complex task because it requires language skills.
To perform automatic summarization, the current literature presents three approaches:
• Automatic Summarization by extraction
• Automatic Summarization by understanding
• Automatic Summarization by automatic classification
Another line of research that has gained momentum in recent years is Opinion Mining, i.e. detecting the opinion expressed by a sentence, a paragraph or a text. Our goal is to use opinion detection methods to produce an opinion-aware summary. We propose the hypothesis:
"Textual units that do not share the same opinion as the text are ideas used for development or comparison, and their absence does not affect the semantics of the abstract"
In this work we generate a summary automatically by the extraction approach, using a scoring technique in which the score is computed according to opinion by means of SentiWordNet. We build the summary from the sentences whose opinion is similar to that of the full text, according to an opinion threshold. Our work answers the following questions:
• Is our hypothesis testable? If so, is it valid?
• What is the impact of the opinion threshold on the quality of the summary?
• Can opinion mining bring added value to automatic summarization?
2. LITERATURE REVIEW
Automatically producing a summary is an idea that emerged in the early 1950s; it is a branch of natural language processing (NLP). In the first attempts, the community tried to implement simple approaches, such as extracting relevant sentences according to a score, in order to obtain a summary that is understandable and easily readable by a human. (Luhn, 1958) [1] and (Edmundson, 1969) [2] proposed to identify semantically important lexical units by manual analysis; the result is called an extract, i.e. a summary built from sentences of the original text considered pertinent. Their idea is to assign to each sentence a weight that represents its pertinence, then to extract either the N sentences with the highest weights (a reduction rate), or all the sentences whose score is greater than or equal to a scoring threshold.
Other work focuses on automating the detection of pertinent lexical units: several methods and variants have been proposed (Radev et al, 2001)[3]; (Radev et al, 2004)[4]; (Boudin et al, 2007)[5]; (Carbonell et al, 1998)[6]. (Boudin et al, 2008)[7] observed that two pertinent but similar sentences may both be selected; they worked on eliminating this redundancy using sentence similarity measures, and even showed that their method detects the topic of the text.
The proposed approaches are intuitive: since the 1960s, (Edmundson, 1969)[2] proposed to identify keywords (based on a theme), combined with taking into account the position of sentences in the document. The MEAD system, developed by Radev, is the most popular nowadays and implements this type of approach (Radev et al., 2001)[3]. It identifies the most salient words in each text, which it calls the "centroid", and builds an extract from the sentences that contain the greatest number of centroid words. Another system, NeoCortex, was developed at the University of Avignon; it is based on a combination of sentence-scoring measures used to select the sentences best covered by each of them (action selection). This system obtained very good results when evaluated against the MMR (Maximal Marginal Relevance) algorithm of (Carbonell & Goldstein, 1998)[6].
Some years later, other approaches were presented, based on knowledge representation (Mani, 2004)[8], thematic segmentation (Farzindar et al, 2004)[9], or the recognition of user profiles (author of the full text) (Crispino & Couto, 2004)[10].
Some attempts have introduced deeper linguistic processing into automatic summarization systems. The most recent work operates on syntax trees, providing a method for comparing syntactic trees and for eliminating or merging them (Barzilay & McKeown, 2005)[11]. This attempt was followed by work offering alternative methods of syntactic compression based on theoretical and empirical linguistic properties (Yousfi-Monod, 2007)[12].
Most works do not consider the opinion contained in the text. At the end of the 2000s, the community began to produce more specialized, query-oriented summaries, including work dealing with the detection and analysis of opinion. (Eyrich et al, 2001)[13] propose a system diagram that has not been implemented: this system integrates a question-answering module and an opinion analysis module to produce a summary of the answers to an opinion question without changing the content.
Early work on automatic summarization neglected opinion analysis. Opinion analysis is divided into three main levels of subtasks: the first subtask is to distinguish between subjective and objective texts (Yu & Hatzivassiloglou, 2003)[14]. The second subtask is to classify subjective texts as positive or negative (Turney, 2002)[15]. The third level of refinement tries to determine the degree to which texts are positive or negative (Wilson et al., 2004)[16]. The impulse given by campaigns such as the TREC Blog opinion task since 2006 is undeniable (Pang & Lee, 2008)[17]; (Zhang et al, 2007)[18]; (Dey & Haque, 2009)[19].
Opinion Mining is an area that has attracted many researchers and produced several works. There are two types of approaches for detecting opinion: corpus-based approaches (Hatzivassiloglou and McKeown, 1997)[20]; (Wiebe, 2000)[21]; (Kanayama and Nasukawa, 2006)[22]; (Esuli and Sebastiani, 2006)[23]; (Qiu et al, 2009)[24], and dictionary-based approaches (Hu and Liu, 2004)[25]; (Kim and Hovy, 2004)[26]; (Kamps et al, 2004)[27]; (Esuli and Sebastiani, 2005)[28]; (Andreevskaya and Bergler, 2006)[29]; (Bouchlaghem et al, 2010)[30].
The second approach is the one used in this article. (Hu and Liu, 2004) use adjectives for the detection of opinions: they manually build a list of adjectives used to predict the orientation of a sentence, and use WordNet to extend the list with synonyms and antonyms of adjectives whose polarity is known. In (Liu et al, 2007)[31], the authors count the number of occurrences of each entity in the "Pros", which express a positive opinion, and in the "Cons", which express a negative one. In (Zhang and Liu, 2011)[32], the authors showed that nouns and noun phrases can also convey opinions; they count the number of positive and negative sentences for each feature of the product, using the opinion lexicon prepared by (Ding et al, 2008)[33]. The strength (intensity) of an opinion also matters: subjectivity is expressed in different ways, and "good battery" is different from "great battery" and from "excellent battery". (Pang and Lee, 2008)[17] focus on detecting the strength of opinion using boosting, rule learning and support vector regression techniques. (Pang and Lee, 2008)[17] and (Turney, 2002)[15] classify documents as "thumbs up" or "thumbs down" according to the opinion they convey, while (Pang and Lee, 2005)[34] exploit machine learning techniques to give opinion passages a score from 1 to 5. (Esuli and Sebastiani, 2006)[28] constructed SentiWordNet, a general-purpose opinion dictionary, currently in its third version, SentiWordNet 3.0. SentiWordNet can be defined as a lexical resource designed specifically for use by opinion and sentiment detection applications. SentiWordNet is the result of an annotation of synsets of WordNet that assigns an opinion score to each word (synset). SentiWordNet contains 1000 synsets, which makes it very small compared with WordNet; besides these 1000 synsets, SentiWordNet automatically ignores all other inputs. Another weakness is that several synsets do not carry any opinion [35].
We cannot pretend that our opinion detection for the production of text summaries goes beyond assessing degrees of positivity or negativity; we must also mention recent efforts to introduce more linguistic and discourse-based approaches (taking into account the modality of the speaker), as accomplished by (Asher et al., 2008)[36].
The evaluation of abstracts is a crucial problem; we emphasize the contribution of (Goulet, 2007)[37], which goes beyond n-gram coverage and offers a terminology adapted to French. In recent years, large-scale assessments independent of system designers have emerged and several evaluation measures have been proposed. Regarding the assessment of automatic summaries, two evaluation campaigns have already been conducted by the U.S. DARPA (Defense Advanced Research Projects Agency). The first, entitled SUMMAC, ran from 1996 to 1998 under the TIPSTER program (Lin et al., 2003)[38]; the second, entitled DUC (Document Understanding Conferences) (Das et al., 2007)[39], followed from 2000 to 2007 (noting that France still lags behind several countries in computer science). Since 2008 the campaign has been the Text Analysis Conference. Such measures may be applied to the distribution of units in the system summaries P and in the reference summaries Q. The method was evaluated by Lin et al. (2006) on the DUC'02 corpus for mono- and multi-document summarization tasks; a good correlation was found between the divergence measures and the two rankings obtained with ROUGE and coverage. (Louis & Nenkova, 2009)[39] went further and proposed to compare the distribution of words in the complete documents with the words in automatic summaries, to infer a content-based evaluation measure.
3. OUR PROPOSED APPROACH
Our approach is based on identifying the opinion of textual units (phrases, clauses, sentences, paragraphs), identifying the opinion of the original text, and finally extracting the textual units that share the same opinion as the original text.
We start from the hypothesis stated in the previous section.
Recall that an opinion is an expression of the feelings of a person towards an entity or an aspect of an entity (Liu, 2010). An entity may be a product, a person, an event, an organization or a topic. SentiWordNet is used first to filter the opinion-bearing words; then we use the score returned by SentiWordNet as follows:
If (score_opinion(term i) < 0) then the opinion of (term i)
is negative, else the opinion is positive
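To make this concrete, here is a minimal sketch of that lookup, assuming NLTK's SentiWordNet corpus reader (the paper does not name an implementation; averaging over a term's synsets is our simplification):

# A minimal sketch of the sign rule above, using NLTK's SentiWordNet
# corpus reader (an implementation assumption; the paper does not name
# a library). Requires nltk.download('wordnet') and nltk.download('sentiwordnet').
from nltk.corpus import sentiwordnet as swn

def score_opinion(term):
    """Opinion score of a term: positive minus negative SentiWordNet
    scores, averaged over the term's synsets (our simplification).
    Returns None when the term is absent from the dictionary."""
    synsets = list(swn.senti_synsets(term))
    if not synsets:
        return None
    return sum(s.pos_score() - s.neg_score() for s in synsets) / len(synsets)

# The rule from the text: negative score => negative opinion, else positive.
s = score_opinion("excellent")
if s is not None:
    opinion = "negative" if s < 0 else "positive"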
Our approach follows these steps:
3.1 Pretreatment
Simple cleaning: stop words are not deleted, because automatic summarization by extraction keeps the most informative sentences unchanged, and the final result is a text (the abstract): if words were deleted without regard for their morphosyntactic and semantic impact in the sentences, the resulting summary could be incoherent.
The cleaning is therefore limited to deleting emoticons, replacing spaces with "_", and removing the special characters that do not belong to French or English text (#, [, ], ...).
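As an illustration, a minimal cleaning sketch under these constraints (the emoticon pattern and the character list are our assumptions):

# A minimal sketch of the cleaning step: stop words are kept; only
# emoticons and special characters are removed. The emoticon regex and
# character list are assumptions; the space-to-"_" replacement mentioned
# in the text is omitted here.
import re

def pretreat(text):
    text = re.sub(r"[:;=8][\-o']?[()DPp/\\]", " ", text)  # crude emoticon pattern
    text = re.sub(r"[#\[\]{}<>|~^*]", " ", text)          # characters foreign to FR/EN prose
    return text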
Choice of terms: for automatic summarization by extraction we need two representations:
• Bag-of-words representation.
• Bag-of-sentences representation.
The two representations are expressed in the vector space model. The first transforms the text into a vector $v_i = (w_1, w_2, \dots, w_{|T|})$, where $|T|$ is the number of distinct words appearing at least once in the text; the weight $w_k$ indicates the number of occurrences of word $t_k$ in the document. The second transforms the text into a vector $v'_i = (w'_1, w'_2, \dots, w'_{|R|})$, where $|R|$ is the number of distinct sentences appearing at least once in the text; the weight $w'_k$ indicates the number of occurrences of sentence $t_k$ in the document.
Finally, a sentence-by-word occurrence matrix is generated from the two previous representations. The size of the matrix is:
• The number of sentences in the text × the number of words in the text;
• The weight $p_{ik}$ represents the number of occurrences of word k in sentence i.
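A sketch of these two representations and of the occurrence matrix, using NLTK and scikit-learn (implementation choices of ours; "document.txt" is a hypothetical input file):

# Build the sentence-by-word occurrence matrix described above.
from nltk.tokenize import sent_tokenize
from sklearn.feature_extraction.text import CountVectorizer

text = pretreat(open("document.txt").read())  # hypothetical input file
sentences = sent_tokenize(text)               # bag-of-sentences representation
vectorizer = CountVectorizer()                # bag-of-words representation
M = vectorizer.fit_transform(sentences)       # M[i, k] = occurrences of word k in sentence i
words = vectorizer.get_feature_names_out()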
3.2 Detecting opinion by SentiWordNet
The "sentence-term" matrix is reduced to a "sentence-by-opinion-bearing-term" matrix by filtering the term vector $v_i$ through SentiWordNet; terms that do not exist in the opinion dictionary are eliminated.
At the end of this step we obtain a matrix M of size n × p, where n is the number of sentences and p is the number of opinion-bearing terms. $M_{ij}$ indicates the number of occurrences of the opinion-bearing word j in sentence i.
$O_{ij} = M_{ij} \times score(j)$  (1)
where score(j) is the score obtained from SentiWordNet for the term j.
3.3 Construction of the Summary
Weighting: once the "sentence-by-opinion-bearing-term" matrix is ready, we compute the score of each sentence as well as the score of the text, in order to proceed to the final step.
The opinion score of a textual unit (sentence, paragraph or text) is the average of the SentiWordNet scores of the opinion-bearing terms it contains.
The opinion score of each sentence is therefore computed as follows:
$Score_{phrase}[i] = \frac{\sum_{j=1}^{p} O_{ij}}{\sum_{j=1}^{p} M_{ij}}$  (2)
where p is the number of opinion-bearing terms in the textual unit.
Finally, the opinion of the text is the average of the opinion scores of the sentences that compose it:
$Score_{text} = \frac{\sum_{i=1}^{R} Score_{phrase}[i]}{R}$  (3)
where R is the size of the vector V' (the number of sentences).
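A minimal sketch of equations (1)-(3), reusing M, words and score_opinion() from the previous sketches (the division-by-zero guard is our addition):

# Keep only SentiWordNet terms, weight the counts by their scores,
# then average per sentence and over the text.
import numpy as np

term_scores = [score_opinion(w) for w in words]
keep = np.array([s is not None for s in term_scores])
scores = np.array([s if s is not None else 0.0 for s in term_scores])[keep]
M_op = M.toarray()[:, keep]                    # sentence x opinion-bearing-term matrix

O = M_op * scores                              # equation (1): O_ij = M_ij * score(j)
counts = np.maximum(M_op.sum(axis=1), 1)       # avoid division by zero (our guard)
score_phrase = O.sum(axis=1) / counts          # equation (2)
score_text = score_phrase.mean()               # equation (3)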
Final summary: "The suggested procedure claims on the principle that high-frequency words in a document are important words" [Luhn 1958]. In our case we adapt this quote as follows: "The suggested procedure rests on the principle that sentences that share the same opinion as the document (text) are important sentences", i.e. the sentences that do not share the same opinion as the text are sentences used by the author to develop an idea or a comparison, which is equivalent to saying that we can eliminate them without causing a large loss of meaning.
The final step is to select the sentences whose opinion matches that of the text; for this we propose the following method:
By neighborhood of the text's opinion: we keep a sentence if the difference between its opinion score and the opinion score of the text is smaller than the neighborhood threshold, also called the opinion threshold:
For each sentence k do
If (score_text − threshold_opinion < score_phrase[k]) and (score_phrase[k] < score_text + threshold_opinion)
Then select sentence k
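A direct transcription of this selection rule, reusing sentences, score_phrase and score_text from the previous sketches (the threshold value echoes the 0.00125 increment used in the experiments of Section 4):

# Select the sentences whose opinion lies within the opinion threshold
# of the text's opinion.
threshold_opinion = 0.00125

summary = [sentences[k] for k in range(len(sentences))
           if abs(score_phrase[k] - score_text) < threshold_opinion]
print(" ".join(summary))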
Fig. 1. Full process of the proposed approach
4. EXPERIMENTATION
To test the hypothesis stated above, we use the Chi2 measure, a well-known statistical measure that evaluates the lack of independence between a textual unit and a text. It uses the same word/text co-occurrence concepts as mutual information, but differs in its normalization, which makes the terms comparable. The Chi2 measure nevertheless loses relevance for infrequent terms [41].
$\chi^2(tu, text) = \frac{|T| \cdot [P(tu, text) \cdot P(\overline{tu}, \overline{text}) - P(tu, \overline{text}) \cdot P(\overline{tu}, text)]^2}{P(tu) \cdot P(\overline{tu}) \cdot P(text) \cdot P(\overline{text})}$  (4)
The Chi2 measure is used to determine the independence between the sentences eliminated by opinion detection and the original text. Chi2 accounts for the absence of terms as well as for the most common ones, and takes into consideration information from the text's terms. A high value of Chi2(k, i) reflects a dependency between sentence k and text i.
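For concreteness, a direct transcription of equation (4) in Python, assuming the joint and marginal probabilities have already been estimated from co-occurrence counts:

# Chi2 dependence between a textual unit tu and a text, per equation (4).
def chi2(p_tu_text, p_tu, p_text, T):
    """p_tu_text = P(tu, text); p_tu = P(tu); p_text = P(text);
    T = |T|, the size of the collection (all inputs are assumptions
    about how the probabilities were estimated)."""
    p_tu_ntext = p_tu - p_tu_text                  # P(tu, not text)
    p_ntu_text = p_text - p_tu_text                # P(not tu, text)
    p_ntu_ntext = 1.0 - p_tu - p_text + p_tu_text  # P(not tu, not text)
    num = T * (p_tu_text * p_ntu_ntext - p_tu_ntext * p_ntu_text) ** 2
    den = p_tu * (1.0 - p_tu) * p_text * (1.0 - p_text)
    return num / den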
The second step of our experiment studies the robustness of the summary; we use two evaluation methods: ROUGE-SU(2) (Recall-Oriented Understudy for Gisting Evaluation - Skip Unit) and the F-measure.
4.1 Corpus used
We used as corpus the text "Hurricane", in English (in order to use SentiWordNet): the text contains a title and 20 sentences.
Reference summaries: we took three reference summaries, produced respectively by the CORTEX summarizer, the Essential Summarizer, and a human expert.
Cortex
Hurricane Gilbert swept toward the Dominican Republic Sunday, and the Civil Defense alerted its heavily populated south coast to prepare for high winds, heavy rains, and high seas. The National Hurricane Center in Miami reported its position at 2 a.m. Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo. The National Weather Service in San Juan, Puerto Rico, said Gilbert was moving westward at 15 mph with a "broad area of cloudiness and heavy weather" rotating around the center of the storm. Strong winds associated with Gilbert brought coastal flooding, strong southeast winds, and up to feet to Puerto Rico's south coast.
Essential Summarizer
"There is no need for alarm," Civil Defense Director Eugenio Cabral said in a television alert shortly after midnight Saturday. Cabral said residents of the province of Barahona should closely follow Gilbert's movement. On Saturday, Hurricane Florence was downgraded to a tropical storm, and its remnants pushed inland from the U.S. Gulf Coast. Residents returned home, happy to find little damage from 90 mph winds and sheets of rain.
Human expert
Hurricane Gilbert is moving toward the Dominican Republic, where the residents of the south coast, especially the Barahona Province, have been alerted to prepare for heavy rain, and high winds and seas. Tropical storm Gilbert formed in the eastern Caribbean and became a hurricane on Saturday night. By 2 a.m. Sunday it was about 200 miles southeast of Santo Domingo and moving westward at 15 mph with winds of 75 mph. Flooding is expected in Puerto Rico and the Virgin Islands. The second hurricane of the season, Florence, is now over the southern United States and downgraded to a tropical storm.
4.2 Validation
To estimate the robustness of our summary, we compute the ROUGE-SU(2) metric (Lin, 2004), which compares a candidate summary (automatically generated by the system to be evaluated) with a reference summary (created by human experts or by other known automatic summarization systems), and an F-Measure metric that we proposed in earlier work.
The ROUGE evaluation measure
The evaluation of abstracts can be done semi-automatically through similarity measures computed between a candidate summary and one or more reference summaries. We evaluate the results of this work by the Recall-Oriented Understudy for Gisting Evaluation
(ROUGE) measure proposed by (Lin, 2004), which involves the differences between distributions of words. Heavily used in the DUC campaigns, these measures are increasingly considered standard by the community because of their strong correlation with manual ratings. Two variants of ROUGE are used; they are the measures used in the DUC campaigns.
ROUGE(N)
A recall measure computed on the co-occurrences of N-grams between a candidate summary R_can and a set of reference summaries R_ref: the maximum number of N-grams co-occurring in R_ref and R_can, divided by the number of N-grams appearing in the reference summaries:
$ROUGE(N) = \frac{\sum_{s \in R_{ref}} \sum_{s \in R_{can}} Co\text{-}occurrences(R_{ref}, R_{can}, N)}{NbrNGram(N)_{R_{ref}}}$  (5)
ROUGE-SU(M)
An adaptation of ROUGE-2 using skip bigrams (skip units, SU) with a maximum gap of M, and also counting unigrams.
F-Measure for the evaluation of automatic summarization by extraction
We proposed in our previous work an adaptation of the F-measure for the validation of automatic summarization by extraction: since this technique keeps some sentences and deletes the others according to some philosophy (scoring, thematic detection, ...), it can be considered as a two-class classification.
Our method is a hybrid of the two evaluation families: intrinsic and extrinsic.
We compare the candidate summary with the full text to identify the textual units that were kept and those that were deleted, then perform the same operation between the reference summary and the full text; finally, a comparison between the two summaries (candidate and reference) is performed to obtain the following confusion matrix.
Table 1. Confusion matrix for automatic summaries
The recall of the "Tu-K" (kept) class is defined as the number of textual units kept in both the candidate and the reference summary (shared), divided by the number of textual units kept by the reference summary; in parallel, the recall of the "Tu-D" (deleted) class is computed in the same way, i.e. the number of textual units deleted in both the candidate and the reference summary (shared), divided by the number of textual units deleted by the reference summary:
$Recall_{Tu\text{-}K} = \frac{X}{X + Y}$  (6)        $Recall_{Tu\text{-}D} = \frac{W}{W + Z}$  (7)
The precision of the "Tu-K" class is defined as the number of textual units kept in both the candidate and the reference summary (shared), divided by the number of textual units kept by the candidate summary; in parallel, the precision of the "Tu-D" class is computed in the same way, i.e. the number of textual units deleted in both the candidate and the reference summary (shared), divided by the number of textual units deleted by the candidate summary.
Since automatic summarization by extraction is a two-class classification, the precision of each class is thus:
$Precision_{Tu\text{-}K} = \frac{X}{X + Z}$  (8)        $Precision_{Tu\text{-}D} = \frac{W}{W + Y}$  (9)
The global recall and precision are the averages over the two classes:
$Recall = \frac{Recall_{Tu\text{-}K} + Recall_{Tu\text{-}D}}{2}$  (10)        $Precision = \frac{Precision_{Tu\text{-}K} + Precision_{Tu\text{-}D}}{2}$  (11)
Finally, precision and recall are combined, with equal weighting, into the F-Measure:
$F\text{-}Measure = \frac{2 \cdot (Precision \cdot Recall)}{Precision + Recall}$  (12)
4.3 The algorithm description
Begin
  Pretreat the document
  M = integer array [1 .. number of sentences][1 .. number of terms]
  M = vectorization(bag of words, bag of sentences)
  O = real array [1 .. number of sentences][1 .. number of terms]
  For each sentence i do
    For each term j do
      O[i][j] = M[i][j] × score(j)    (13)
    End for
  End for
  Score_phrase = real array [1 .. number of sentences]
  For each sentence i do
    Score_phrase[i] = (Σj O[i][j]) / (Σj M[i][j])    (14)
  End for
  Score_text = (Σi Score_phrase[i]) / R    (15)
  Threshold_opinion = defined by the user
  For each sentence i do
    If (Score_text − Threshold_opinion < Score_phrase[i]) and (Score_phrase[i] < Score_text + Threshold_opinion)
    Then keep sentence[i], else eliminate sentence[i]
  End for
End
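For concreteness, a minimal sketch of equations (6)-(12); the variable names X, Y, Z, W follow our reading of the confusion matrix of Table 1:

# Adapted F-Measure for extraction-based summaries.
# X = units kept by both summaries, Y = kept by the reference only,
# Z = kept by the candidate only, W = deleted by both (our reading).
def f_measure(X, Y, Z, W):
    recall_k, recall_d = X / (X + Y), W / (W + Z)         # (6), (7)
    prec_k, prec_d = X / (X + Z), W / (W + Y)             # (8), (9)
    recall = (recall_k + recall_d) / 2                    # (10)
    precision = (prec_k + prec_d) / 2                     # (11)
    return 2 * precision * recall / (precision + recall)  # (12)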
Table 2. Evaluation of the produced (candidate) summary with ROUGE and F-measure, using the 3 reference summaries Cortex, Essential Summarizer and human expert (second part)
The table above reports the evaluation of the summaries obtained at different opinion thresholds, compared against all the reference summaries using the F-measure and ROUGE. The following table shows explicitly the sentences kept (K) and deleted (D) at each opinion threshold, gives the Chi_2 value of each sentence with respect to the original text, and reports the Chi_2 rate of each summary relative to the original text.

The Chi_2 rate, equal to the sum of the Chi_2 values of the retained sentences divided by the number of sentences constituting the original text, indicates the soundness of the sentence selection in terms of dependence on the original text.
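Restated in formula form (the notation is ours), the Chi_2 rate of a candidate summary is:

$$\mathrm{rate}_{\chi^2} = \frac{\sum_{s \in \mathrm{kept}} \chi^2(s, \mathrm{text})}{N}$$

where "kept" is the set of retained sentences and N is the total number of sentences of the original text.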
Table 3. Distribution of the sentences in the produced (candidate) summary for each opinion threshold, with its Chi_2 rate and reduction rate, and the Chi_2 value of each sentence
4.5 The best summary
The following table consolidates all the results reported in the two previous tables, Table 2 and Table 3:
Table 4. Recapitulation of Tables 2 and 3 (shown in two parts): ROUGE-SU(2) vs F-Measure vs summary Chi_2 rate vs reduction rate
5. INTERPRETATION
We tested our approach with a threshold increment of 0.00125 in order to observe the impact of the opinion threshold on summary quality and to recommend a threshold range that yields good results.
ROUGE is an intrinsic, semi-automatic evaluation metric based on the number of co-occurrences between a candidate summary and one or more reference summaries, divided by the size of the latter. Its weakness is that it relies on the reference summaries and neglects the original text.

The value given by ROUGE for a summary with a negligible reduction rate is high. This high value is explained by the increased number of co-occurrences between the candidate summary and the reference summaries: a candidate that keeps almost the whole text mechanically shares more N-grams with any reference.
The F-measure is one of the most robust and most widely used metrics for the evaluation of classification; it combines recall and precision. Our adaptation adds to the strength of the F-measure an extrinsic evaluation at the beginning, followed by an intrinsic evaluation: it is therefore a hybrid evaluation. For automatic summaries with a low reduction rate, the F-measure gives better assessments than ROUGE because it takes the absence of terms into account. But unlike ROUGE, its evaluation of a candidate summary with a high reduction rate can be distorted, because the False Negatives (FN) and True Negatives (TN) are then at their maximum, which yields good scores for generally poor summaries (a very high reduction rate leads to an increase in information entropy).
Fig. 2. Chi_2 rate of the candidate summary vs. reduction rate of the candidate summary
Precision indicates the purity of the candidate summary, while recall reflects its similarity to the reference summary.

We can deduce from the graph above that the two variables, reduction rate and Chi_2 rate of the text, are inversely correlated, indicating that increasing the number of selected sentences increases the dependence on the original text. This indication is logical and expected: returning to Table 2, we can see that by retaining the sentences P4, P13, and P18 we increase the Chi_2 rate by 25%, thanks to their strong dependence on the text, which is equal to 0.57 for P4 and greater than 0.71 for P13 and P18.
Fig. 3. ROUGE vs F-Measure (with 3 reference summaries) vs Chi_2 rate vs reduction rate
We see that the ROUGE indices (black, red, and green curves) are higher when the reduction rate is low, even where the Chi_2 rate indicates a great loss of sentences dependent on the original text; in such cases the ROUGE assessment is false. This is the result of the high co-occurrence count between the candidate summary and the reference summary: the probability of co-occurrence between a long text and a short reference is higher than that between a short summary and the same reference.
The graph shows that the F-measure overstates automatic summaries with a high reduction rate, whereas it does not overstate summaries with a low reduction rate; this is explained by its consideration of missing words (False Negatives: textual units deleted in the candidate summary but retained by the reference; True Negatives: textual units deleted in both the candidate and the reference summaries). This is a strong point of the adapted F-measure for evaluating automatic extraction summaries, although its weakness against very small summaries must be noted, where the True Negatives reach their maximum value (all sentences deleted in the reference summary will also be deleted in a summary with a high reduction rate).
Finally, we can see that all the evaluation indices reach their optimal values for thresholds between 0.0125 and 0.0175; within this interval, every evaluation value is good for the candidate summary.
We can see from Figure 3 and Table 4 that selecting sentences less dependent on the original text does not improve the results. For example, between the thresholds 0.0125 and 0.01375, the only difference is the selection of the fifth sentence at 0.01375 (it was not selected at the 0.0125 threshold); Table 4 shows that the Chi_2 value of this sentence is low, which is also visible on the graph as a stagnation of ROUGE and a slight drop of the F-measure for the three reference summaries. This confirms our hypothesis.
6. CONCLUSION AND PERSPECTIVE
In this article, we presented a new approach for the production of an automatic extraction summary based on opinion detection with SentiWordNet.
First, we proposed the hypothesis supporting this approach: "textual units that do not share the same opinion as the text are ideas used for development or comparison, and their absence does not affect the semantics of the abstract".

Second, we explained our opinion-detection approach and proposed a flexible technique for selecting the sentences whose opinion is close to that of the original text, governed by an opinion threshold.
Given the results obtained, we have validated our hypothesis; this work can therefore help solve one of the major problems of automatic summarization: reducing information entropy while preserving semantics.
Looking ahead, we will try to improve automatic summarization by extraction based on opinion detection by combining it with other conventional techniques such as thematic detection.
7. ANNEX
Title : Hurricaine Gilbert
Hurricaine Gilbert Swept towrd the Dominican Republic Sunday, and the Civil Defense alerted
its heavily populated south coast to prepare for high winds, heavy rains, and high seas. The storm
was approaching from the southeast with sustained winds of 75 mph gusting to 92 mph. "There is
no need for alarm," Civil Defense Director Eugenio Cabral said in a television alert shortly after
midnight Saturday. Cabral said residents of the province of Barahona should closely follow
Gilbert's movement.
An estimated 100,000 people live in the province, including 70,000 in the city of Barahona, about
125 miles west of Santo Domingo. Tropical storm Gilbert formed in the eastern Carribean and
strenghtened into a hurricaine Saturday night. The National Hurricaine Center in Miami reported
its position at 2 a.m. Sunday at latitude 16.1 north, longitude 67.5 west, about 140 miles south of
Ponce, Puerto Rico, and 200 miles southeast of Santo Domingo. The National Weather Service in
San Juan, Puerto Rico, said Gilbert was moving westard at 15 mph with a "broad area of
cloudiness and heavy weather" rotating around the center of the storm. The weather service
issued a flash flood watch for Puerto Rico and the Virgin Islands until at least 6 p.m. Sunday.
Strong winds associated with the Gilbert brought coastal flooding, strong southeast winds, and up
to 12 feet to Puerto Rico's south coast. There were no reports on casualties. San Juan, on the north
coast, had heavy rains and gusts Saturday, but they subsided during the night. On Saturday,
Hurricane Florence was downgraded to a tropical storm, and its remnants pushed inland from the
U.S. Gulf Coast. Residents returned home, happy to find little damage from 90 mph winds and
sheets of rain. Florence, the sixth named storm of the 1988 Atlantic storm season, was the second
hurricane. The first, Debby, reached minimal hurricane strength briefly before hitting the
Mexican coast last month.