In this paper we correlate text sequences that share common topics to obtain semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for the common topics in the sequences and isolates them with their timestamps. Step two takes a topic and estimates the timestamp of the text document. After multiple repetitions of step two, an optimal result can be obtained.
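The two-step idea above can be sketched minimally as follows. The data structures and word sets here are hypothetical illustrations; the paper does not specify its topic model, so step one's output is stubbed in by hand and step two matches a document to its most-overlapping topic.

```python
# Step one (stubbed): common topics isolated from text sequences,
# each with the timestamps at which the topic occurred.
topic_times = {
    "election": ["2014-05", "2014-06"],
    "monsoon": ["2014-07"],
}
topic_words = {
    "election": {"vote", "poll", "candidate"},
    "monsoon": {"rain", "flood", "season"},
}

# Step two: estimate a document's timestamp from its best-matching
# topic, taking the topic's latest occurrence as the estimate.
def estimate_timestamp(doc_words):
    best = max(topic_words, key=lambda t: len(topic_words[t] & doc_words))
    return topic_times[best][-1]

print(estimate_timestamp({"heavy", "rain", "flood"}))  # "2014-07"
```

Repeating step two over many documents would refine the per-topic timestamp estimates, which is the iteration the abstract alludes to.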
A rough set based hybrid method to text categorization - Ninad Samel
This document summarizes a hybrid text categorization method that combines Latent Semantic Indexing (LSI) and Rough Sets theory to reduce the dimensionality of text data and generate classification rules. It introduces LSI to reduce the feature space of text documents represented as high-dimensional vectors. Then it applies Rough Sets theory to the reduced feature space to locate a minimal set of keywords that can distinguish document classes and generate multiple knowledge bases for classification instead of a single one. The method is tested on text categorization tasks and shown to improve accuracy over previous Rough Sets approaches.
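The LSI half of this hybrid can be sketched with a truncated SVD on a toy term-document matrix. The counts below are hypothetical, and the Rough Sets rule-generation stage is not reproduced; this only shows how the feature space is reduced to a k-dimensional latent space.

```python
import numpy as np

# Toy term-document matrix: rows = terms, columns = documents
# (hypothetical counts, for illustration only).
A = np.array([
    [2, 0, 1, 0],   # "rough"
    [1, 0, 2, 0],   # "set"
    [0, 3, 0, 1],   # "neural"
    [0, 1, 0, 2],   # "network"
], dtype=float)

# LSI: keep the k largest singular values, projecting each document
# into a k-dimensional latent semantic space.
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
docs_reduced = (np.diag(s[:k]) @ Vt[:k]).T   # one k-dim row per document

print(docs_reduced.shape)  # (4, 2): 4 documents, 2 latent dimensions
```

Documents sharing vocabulary (columns 0 and 2 above) end up close in the latent space, which is what makes the reduced representation useful as input to the rule-induction step.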
This document discusses integrating natural language processing and parse tree query language with text mining and topic summarization methods to more efficiently extract relevant content from documents. It presents an approach that uses natural language processing to automatically generate queries from sentences, and then applies a topic summarization method called TSCAN to identify themes, segment events, and construct an evolution graph to show relationships between events. The integrated system aims to make content extraction more effective and easier to use for real-time applications. Evaluation of the methods showed benefits for tasks like information extraction.
IRJET - Deep Collaborative Filtering with Aspect Information - IRJET Journal
This document discusses a proposed system for deep collaborative filtering with aspect information. The system aims to help web users efficiently locate relevant information on unfamiliar topics to increase their knowledge. It utilizes techniques like multi-keyword search, synonym matching, and ontology mapping to return relevant web links, images, and news articles to the user based on their search terms. The proposed system architecture includes an index structure to efficiently search and rank results based on similarity to the search query terms. The implementation and evaluation of the proposed system are also discussed.
A SEMANTIC METADATA ENRICHMENT SOFTWARE ECOSYSTEM BASED ON TOPIC METADATA ENR... - IJDKP
As existing computer search engines struggle to understand the meaning of natural language, semantically enriched metadata may improve interest-based search engine capabilities and user satisfaction. This paper presents an enhanced version of the ecosystem focusing on semantic topic metadata detection and enrichment. It builds on a previous paper describing a semantic metadata enrichment software ecosystem (SMESE). Through text analysis approaches for topic detection and metadata enrichment, this paper proposes an algorithm to enhance search engine capabilities and consequently help users find content according to their interests. It presents the design, implementation and evaluation of the SATD (Scalable Annotation-based Topic Detection) model and algorithm, which use metadata from the web, linked open data, concordance rules, and bibliographic record authorities. It includes a prototype of a semantic engine that uses keyword extraction, classification and concept extraction to generate semantic topics from text and multimedia document analysis with the proposed SATD model and algorithm. The performance of the proposed ecosystem is evaluated through a number of prototype simulations, comparing it to existing metadata enrichment techniques (e.g., AlchemyAPI, DBpedia, Wikimeta, Bitext, AIDA, TextRazor). The SATD algorithm was found to support more attributes than the other algorithms. The results show that the enhanced platform and its algorithm enable greater understanding of documents related to user interests.
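The keyword-extraction step mentioned above can be illustrated with a minimal frequency-based sketch. The text and stopword list are toy assumptions; SATD's actual pipeline additionally draws on linked open data, concordance rules, and bibliographic authorities, none of which are reproduced here.

```python
from collections import Counter

# A tiny stopword list (assumption; real systems use much larger ones).
STOPWORDS = {"the", "of", "and", "a", "to", "in", "is"}

def extract_keywords(text, k=3):
    # Rank non-stopword terms by raw frequency and keep the top k.
    words = [w for w in text.lower().split() if w not in STOPWORDS]
    return [w for w, _ in Counter(words).most_common(k)]

text = ("the enrichment of metadata improves search and "
        "metadata enrichment helps search engines")
print(extract_keywords(text))
```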
SEMANTICS GRAPH MINING FOR TOPIC DISCOVERY AND WORD ASSOCIATIONS - IJDKP
Big Data creates many challenges for data mining experts, in particular in extracting the meaning of text data. It is beneficial for text mining to build a bridge between the word embedding process and the capacity of graphs to connect the dots and represent complex correlations between entities. In this study we examine the process of building a semantic graph model to determine word associations and discover document topics. We introduce a novel Word2Vec2Graph model that is built on top of the Word2Vec word embedding model. We demonstrate how this model can be used to analyze long documents, find unexpected word associations and uncover document topics. To validate the topic discovery method, we transform words into vectors and vectors into images, and apply CNN deep learning image classification.
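The core construction, a graph whose edges link words with similar embeddings, can be sketched as follows. The 3-dimensional "embeddings" and the threshold are hand-made stand-ins; a real Word2Vec2Graph pipeline would train Word2Vec on a corpus first.

```python
import numpy as np

# Hypothetical toy embeddings (illustration only).
vectors = {
    "data":    np.array([0.9, 0.1, 0.0]),
    "mining":  np.array([0.8, 0.2, 0.1]),
    "graph":   np.array([0.1, 0.9, 0.2]),
    "network": np.array([0.2, 0.8, 0.1]),
}

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Connect two words with an edge when their cosine similarity exceeds
# a threshold; the resulting graph exposes word associations.
threshold = 0.9
edges = {
    (w1, w2)
    for i, w1 in enumerate(vectors)
    for w2 in list(vectors)[i + 1:]
    if cosine(vectors[w1], vectors[w2]) > threshold
}
print(sorted(edges))
```

Connected components or communities of this graph then serve as candidate document topics.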
Semantic Based Model for Text Document Clustering with Idioms - Waqas Tariq
Text document clustering has become an increasingly important problem in recent years because of the tremendous amount of unstructured data available in various forms in online forums such as the web, social networks, and other information networks. Clustering is a very powerful data mining technique for organizing the large amount of information on the web. Traditionally, document clustering methods do not consider the semantic structure of the document. This paper addresses the task of developing an effective and efficient method to improve the semantic structure of text documents. A method has been developed that tags the documents for parsing, replaces idioms with their original meaning, calculates semantic weights for document words and applies a semantic grammar. A similarity measure is obtained between the documents, and the documents are then clustered using a hierarchical clustering algorithm. The method is evaluated on different data sets with standard performance measures, and its effectiveness in producing meaningful clusters has been demonstrated.
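The final clustering step can be sketched with a minimal single-link agglomerative pass over pairwise document similarities. The similarity scores below are hypothetical; in the paper they would come from the semantic-weight and grammar analysis described above.

```python
# Hypothetical pairwise document similarities (illustration only).
sims = {
    ("d1", "d2"): 0.9, ("d1", "d3"): 0.2, ("d1", "d4"): 0.1,
    ("d2", "d3"): 0.3, ("d2", "d4"): 0.2, ("d3", "d4"): 0.8,
}

def sim(a, b):
    return sims.get((a, b)) or sims.get((b, a), 0.0)

# Single-link agglomerative clustering: repeatedly merge the pair of
# clusters with the strongest single link until two clusters remain.
clusters = [{"d1"}, {"d2"}, {"d3"}, {"d4"}]
while len(clusters) > 2:
    i, j = max(
        ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
        key=lambda p: max(sim(a, b) for a in clusters[p[0]] for b in clusters[p[1]]),
    )
    clusters[i] |= clusters.pop(j)

print(sorted(sorted(c) for c in clusters))  # [['d1', 'd2'], ['d3', 'd4']]
```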
Text mining seeks to discover new, previously unknown or hidden information by automatically extracting it from various written resources. Applying knowledge discovery methods to unstructured text is known as Knowledge Discovery in Text or Text Data Mining, also called Text Mining. Most techniques used in text mining are based on the statistical analysis of a term, either a word or a phrase. Different algorithms have been used in earlier text mining methods. For example, the Single-Link algorithm and the Self-Organizing Map (SOM) offer an approach for visualizing high-dimensional data and are very useful tools for processing textual data based on the projection method. Genetic and sequential algorithms provide the capability for multiscale representation of datasets and are fast to compute with less CPU time, based on Isolet-reduced subsets in unsupervised feature selection. We propose a Vector Space Model and concept-based analysis algorithm that improves text clustering quality so that a better text clustering result may be achieved. The proposed algorithm behaves well in terms of robustness and consistency with respect to the formation of the neural network.
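The Vector Space Model underlying the proposal above can be sketched with standard TF-IDF weighting and cosine similarity. The three-document corpus is a toy assumption; the paper's concept-based analysis layer is not reproduced.

```python
import math
from collections import Counter

# Toy corpus (illustration only).
docs = {
    "a": "text mining finds patterns in text".split(),
    "b": "clustering groups similar text documents".split(),
    "c": "neural networks classify images".split(),
}

vocab = sorted({w for words in docs.values() for w in words})
df = Counter(w for words in docs.values() for w in set(words))
N = len(docs)

def tfidf(words):
    # Term frequency scaled by inverse document frequency.
    tf = Counter(words)
    return [tf[w] / len(words) * math.log(N / df[w]) for w in vocab]

def cosine(u, v):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu * nv) if nu and nv else 0.0

va, vb, vc = (tfidf(docs[k]) for k in "abc")
print(cosine(va, vb), cosine(va, vc))
```

Documents "a" and "b" share the term "text" and so score a positive similarity, while "a" and "c" share no vocabulary and score zero, which is the property the clustering step exploits.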
TEXT SENTIMENTS FOR FORUMS HOTSPOT DETECTION - ijistjournal
User-generated content on the web grows rapidly in this emergent information age. Evolutionary changes in technology make use of such information to capture the user's essence, and finally the useful information is exposed to information seekers. Most existing research on text information processing focuses on the factual domain rather than the opinion domain. In this paper we detect online hotspot forums by computing sentiment analysis for the text data available in each forum. The approach analyses the forum text data and computes a value for each word of the text. The proposed approach combines K-means clustering with a Support Vector Machine with PSO (SVM-PSO) classification algorithm to group forums into two clusters, hotspot forums and non-hotspot forums, within the current time span. The accuracy of the proposed system is compared with other classification algorithms such as Naïve Bayes, decision tree and SVM. The experiments show that K-means and SVM-PSO together achieve highly consistent results.
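The grouping step above can be sketched with a toy k-means (k=2) over per-forum sentiment scores. The scores are hypothetical aggregates; the SVM-PSO classification stage that follows in the paper is not reproduced here.

```python
# Hypothetical aggregate sentiment score per forum (illustration only).
scores = [0.9, 0.8, 0.85, 0.1, 0.2, 0.15]

# k-means with k=2 in one dimension: seed the centroids at the score
# extremes, then alternate assignment and centroid-update steps.
centroids = [min(scores), max(scores)]
for _ in range(10):  # a few iterations suffice in one dimension
    groups = [[], []]
    for s in scores:
        groups[abs(s - centroids[0]) > abs(s - centroids[1])].append(s)
    centroids = [sum(g) / len(g) for g in groups]

# With centroids seeded [min, max], group 1 tracks the high-score
# cluster here: the candidate "hotspot" forums.
hotspot = groups[1]
print(sorted(hotspot))
```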
Feature selection, optimization and clustering strategies of text documents - IJECEIAES
Clustering is one of the most researched areas of data mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, shared filtering, document management, and indexing. Research on the clustering task must be performed before it can be adapted to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numeric. Efforts have also been put forward to achieve efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper details prominent models proposed for clustering along with the pros and cons of each model. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
Efficient multi-document summary generation using neural network - INFOGAIN PUBLICATION
This paper proposes a multi-document summarization system that uses bisect k-means clustering, an optimal merge function, and a neural network. The system first preprocesses input documents through stemming and removing stop words. It then applies bisect k-means clustering to group similar sentences. The clusters are merged using an optimal merge function to find important keywords. The NEWSUM algorithm is used to generate a primary summary for each keyword. A neural network trained on sentence classifications is then used to classify sentences in the primary summary as positive or negative. Only positively classified sentences are included in the final summary to improve accuracy. The system aims to generate a concise and accurate summary in a short period of time from multiple documents on a given topic.
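One common building block of extractive summarizers like this one, scoring sentences by the corpus frequency of their words, can be sketched as follows. The sentences are toy assumptions, and the paper's full pipeline (bisect k-means, optimal merge, NEWSUM, neural filtering) is not reproduced.

```python
from collections import Counter

# Toy pool of candidate sentences from multiple documents.
sentences = [
    "neural networks summarize documents",
    "clustering groups similar sentences",
    "networks and clustering help summarize documents quickly",
]

# Corpus-wide word frequencies.
words = Counter(w for s in sentences for w in s.split())

def score(sentence):
    # Average word frequency, so long sentences are not favored unfairly.
    toks = sentence.split()
    return sum(words[w] for w in toks) / len(toks)

best = max(sentences, key=score)
print(best)
```

In the paper, a trained neural network then filters such candidates, keeping only positively classified sentences in the final summary.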
This document analyzes a single student learning episode using two theoretical lenses: the instrumental genesis perspective and the onto-semiotic approach. The instrumental genesis perspective focuses on how students develop techniques for using tools or artifacts to solve mathematical tasks, and the relationships between thinking and gestures. The onto-semiotic approach views mathematical knowledge and learning as involving systems of practices within social and institutional contexts. Analyzing the same episode from both perspectives provides complementary insights and a richer understanding of the phenomena, while also helping to identify the strengths and limitations of each theoretical approach. Networking the two theories in this way contributes to theoretical development in mathematics education.
Text mining is the technique that helps users find useful information in a large amount of text documents on the web or in a database. Most popular text mining and classification methods have adopted term-based approaches; term-based approaches and pattern-based methods describe user preferences. This review paper analyses how text mining works at three levels: the sentence level, the document level and the feature level. We review the related work done previously and demonstrate the problems that arise when text mining is performed at the feature level. This paper also presents a technique for text mining of compound sentences.
A Competent and Empirical Model of Distributed Clustering - IRJET Journal
This document discusses distributed document clustering. It begins with an introduction to how documents are stored and indexed in computers. It then discusses different clustering algorithms like hierarchical and k-means clustering that are used to group similar documents. The document proposes a new framework for efficiently clustering text documents stored across different distributed resources. It argues that traditional clustering algorithms cannot perfectly cluster text data in decentralized systems. The framework uses properties of traditional algorithms with the ability to cluster in distributed systems.
This document summarizes an article that proposes an automatic text summarization technique using feature terms to calculate sentence relevance. The technique uses both statistical and linguistic methods to identify semantically important sentences for creating a generic summary. It determines the relevance of sentences based on feature term ranks and performs semantic analysis of sentences with the highest ranks to select those most important for the summary. The performance is evaluated by comparing summaries to those created by human evaluators.
A Survey on Sentiment Categorization of Movie Reviews - Editor IJMTER
Sentiment categorization is the process of mining user-generated text content to determine the sentiment of users towards a particular thing. It is the approach of detecting the sentiment of the author with regard to some topic, and is also known as sentiment detection, sentiment analysis and opinion mining. It is very useful for movie production companies interested in knowing how users feel about their movies. For example, the word "excellent" indicates that a review expresses positive emotion about a particular movie. The same applies to movies, songs, cars, holiday destinations, political parties, social network sites, web blogs, discussion forums and so on. Sentiment categorization can be carried out using three approaches: first, supervised machine learning with text classifiers such as Naïve Bayes, Maximum Entropy, SVM, the kNN classifier, and hidden Markov models; second, an unsupervised semantic orientation scheme that extracts relevant N-grams of the text and then labels them; third, the publicly available SentiWordNet library.
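The first approach can be illustrated with a minimal multinomial Naïve Bayes classifier. The four-review training set is a hand-made toy; class priors are equal here and therefore omitted from the score.

```python
import math
from collections import Counter

# Tiny hand-made training set (illustration only).
train = [
    ("excellent movie great acting", "pos"),
    ("great plot excellent cast", "pos"),
    ("terrible movie boring plot", "neg"),
    ("boring and terrible acting", "neg"),
]

vocab = {w for text, _ in train for w in text.split()}
counts = {"pos": Counter(), "neg": Counter()}
for text, label in train:
    counts[label].update(text.split())

def predict(text):
    def log_prob(label):
        c = counts[label]
        total = sum(c.values())
        # Laplace smoothing avoids zero probabilities for unseen words.
        return sum(
            math.log((c[w] + 1) / (total + len(vocab)))
            for w in text.split()
        )
    return max(("pos", "neg"), key=log_prob)

print(predict("excellent acting"))
```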
The International Journal of Engineering and Science (IJES) - theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Automatically Generating Wikipedia Articles: A Structure-Aware Approach - George Ang
The document describes an approach for automatically generating Wikipedia-style articles by using the structure of existing human-authored articles as templates. It involves inducing templates by analyzing section headings across documents, retrieving relevant excerpts from the internet for each template topic, and jointly training extractors to select excerpts that optimize both local relevance and global coherence across the entire article. The results confirm the benefits of incorporating structural information into the content selection process.
IRJET - A Survey Paper on Text Summarization Methods - IRJET Journal
This document summarizes various approaches for automatic text summarization, including extractive and abstractive methods. It discusses early surface-level approaches from the 1950s that identified important sentences based on word frequency. It also reviews corpus-based, cohesion-based, rhetoric-based, and graph-based approaches. The document then examines single document summarization techniques like naive Bayes methods, log-linear models, and deep natural language analysis. It concludes with a discussion of multi-document summarization, including abstraction and information fusion as well as graph spreading activation approaches. The goal of the survey is to provide an overview of the major existing methods that have been used for automatic text summarization.
This document presents a general framework for building classifiers and clustering models using hidden topics to deal with short and sparse text data. It analyzes hidden topics from a large universal dataset using LDA. These topics are then used to enrich both the training data and new short text data by combining them with the topic distributions. This helps reduce data sparseness and improves classification and clustering accuracy for short texts like web snippets. The framework is also applied to contextual advertising by matching web pages and ads based on their hidden topic similarity.
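The enrichment idea, appending a topic distribution to a sparse snippet's word features, can be sketched as follows. The topic-word weights are hand-made stand-ins for what LDA would learn from the large universal dataset; the inference here is a crude word-overlap heuristic, not the framework's actual LDA inference.

```python
# Hand-made topic-word weights (stand-ins for learned LDA topics).
topics = {
    "sports":  {"match": 0.4, "team": 0.3, "goal": 0.3},
    "finance": {"stock": 0.5, "market": 0.3, "bank": 0.2},
}

def enrich(snippet_words):
    # Infer a crude topic distribution from word overlap, then attach
    # it alongside the bag-of-words features to reduce sparseness.
    weights = {
        t: sum(words.get(w, 0.0) for w in snippet_words)
        for t, words in topics.items()
    }
    total = sum(weights.values()) or 1.0
    dist = {t: w / total for t, w in weights.items()}
    return {"words": set(snippet_words), "topics": dist}

features = enrich(["team", "goal"])
print(features["topics"])
```

Two snippets with no words in common can still match through similar topic distributions, which is what improves classification and clustering of short texts.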
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING - cscpconf
In the last decade, ontologies have played a key technology role for information sharing and agent interoperability in different application domains. In the semantic web domain, ontologies are used to face the great challenge of representing the semantics of data, in order to bring the actual web to its full power and hence achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To meet this requirement, ontology mapping is a solution that cannot be avoided. Indeed, ontology mapping builds a meta layer that allows different applications and information systems to access and share their information, after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we integrate the semantic aspect, based on an external lexical resource, WordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character is the main difference between our contribution and most existing semi-automatic ontology mapping algorithms, such as Chimaera, Prompt, Onion, and Glue. To further enhance the performance of our algorithm, the mapping discovery stage is based on the combination of two sub-modules: the former analyzes the concepts' names and the latter analyzes their properties. Each of these two sub-modules is itself based on the combination of lexical and semantic similarity measures.
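The lexical side of such a similarity measure can be sketched with a normalized edit-distance comparison of concept names. This is a generic illustration, not the paper's exact measure, and the semantic side (via WordNet) is not reproduced.

```python
def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance, one row at a time.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,          # deletion
                           cur[j - 1] + 1,       # insertion
                           prev[j - 1] + (ca != cb)))  # substitution
        prev = cur
    return prev[-1]

def name_similarity(a, b):
    # Normalize to [0, 1]: 1.0 means identical names.
    a, b = a.lower(), b.lower()
    return 1 - edit_distance(a, b) / max(len(a), len(b))

print(name_similarity("Publication", "Publications"))  # near 1
print(name_similarity("Author", "Writer"))             # low
```

The "Author"/"Writer" pair shows why a lexical measure alone is insufficient: the names are synonyms but lexically distant, which is exactly the gap the WordNet-based semantic sub-module is meant to close.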
IRJET - Text Document Clustering using K-Means Algorithm - IRJET Journal
This document discusses using the K-Means clustering algorithm to cluster text documents and compares it to using K-Means clustering with dimension reduction techniques. It uses the BBC Sports dataset containing 737 documents in 5 classes. The document outlines preprocessing the text, creating a document term matrix, applying K-Means clustering, and using dimension reduction techniques like InfoGain before clustering. It evaluates the different methods using precision, recall, accuracy, and F-measure, finding that K-Means with InfoGain dimension reduction outperforms standard K-Means clustering.
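The evaluation measures used to compare the clustering variants can be written down directly. The counts below are hypothetical, for illustration only; they are not results from the BBC Sports experiments.

```python
def evaluate(tp, fp, fn):
    # Standard precision, recall, and F-measure from confusion counts.
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure

# Hypothetical counts for one class of one clustering run.
p, r, f = evaluate(tp=90, fp=10, fn=30)
print(f"precision={p:.2f} recall={r:.2f} F={f:.2f}")
```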
Text document clustering and similarity detection are a major part of document management, where every document should be identified by its key terms and domain knowledge. Based on similarity, the documents are grouped into clusters. Several approaches to document similarity calculation have been proposed in existing systems, but these systems are either term based or pattern based and suffer from several problems. To make progress in this challenging environment, the proposed system presents an innovative model for document similarity that applies a back-propagation time stamp (BPTT) algorithm. It discovers patterns in text documents as higher-level features and creates a network for fast grouping. It also detects the most appropriate patterns based on their weights, and BPTT performs the document similarity measures. Using this approach, documents can be categorized easily, and the training process problems are reduced. The framework, named BPTT, has been implemented and evaluated on the .NET platform with different sets of datasets.
Review of Various Text Categorization Methods - iosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews across the whole field of engineering science and technology, new teaching methods, assessment, validation and the impact of new technologies, and it will continue to provide information on the latest trends and developments in this ever-expanding subject. Papers are selected through double peer review to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Prediction of Answer Keywords using Char-RNN - IJECEIAES
Generating sequences of characters using a Recurrent Neural Network (RNN) is a tried and tested method for creating unique and context-aware words, and is fundamental in Natural Language Processing tasks. These types of neural networks can also be used as a question-answering system. The main drawback of most of these systems is that they work from a factoid database of information, and when queried about new and current information, the responses are usually bleak. In this paper, the author proposes a novel approach to finding answer keywords from a given body of news text or a headline, based on the query that was asked, where the query concerns current affairs or recent news, using the Gated Recurrent Unit (GRU) variant of RNNs. This ensures that the answers provided are relevant to the content of the query that was put forth.
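A single forward step of the GRU gating that such a Char-RNN relies on can be sketched in NumPy. The weights are random toy values and the alphabet is a 4-symbol assumption; a real model learns these parameters from news text.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_hid = 4, 3  # e.g. a 4-symbol one-hot alphabet, 3 hidden units

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# One weight matrix per gate, acting on [input, previous hidden state].
Wz = rng.normal(size=(n_hid, n_in + n_hid))  # update gate
Wr = rng.normal(size=(n_hid, n_in + n_hid))  # reset gate
Wh = rng.normal(size=(n_hid, n_in + n_hid))  # candidate state

def gru_step(x, h):
    xh = np.concatenate([x, h])
    z = sigmoid(Wz @ xh)                 # how much of the state to update
    r = sigmoid(Wr @ xh)                 # how much of the past to expose
    h_tilde = np.tanh(Wh @ np.concatenate([x, r * h]))
    return (1 - z) * h + z * h_tilde     # interpolate old and candidate

h = np.zeros(n_hid)
for ch in [0, 2, 1]:                     # a toy character sequence
    x = np.eye(n_in)[ch]                 # one-hot encode each character
    h = gru_step(x, h)
print(h.shape)  # (3,)
```

The final hidden state summarizes the character sequence; stacking such steps over news text and decoding from the state is what lets the model emit answer keywords.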
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION - cscpconf
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, fast fuzzy feature clustering for text classification is proposed. It is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011. The words in the feature vector of a document are grouped into clusters in fewer iterations. The number of iterations required to obtain cluster centers is reduced by transforming the cluster-center dimension from n dimensions to 2 dimensions. Principal Component Analysis with a slight change is used for dimension reduction. Experimental results show that this method improves performance by significantly reducing the number of iterations required to obtain the cluster centers. The same is verified with three benchmark datasets.
The Virtual Museum Report summarizes the activities of the Animal Demography Unit's Virtual Museum in 2014. It notes that over 56,000 records were added in 2014, a 76% increase over 2013. This pushed the total records in the Virtual Museum database past the 1 million record milestone. Several citizen science projects like LepiMAP contributed large numbers of records. New projects in mushrooms, lacewings, and orchids also joined. A key publication was an atlas and red list of South African reptiles. Work is ongoing to red list mammals with a call for more mammal records.
Feature selection, optimization and clustering strategies of text documentsIJECEIAES
Clustering is one of the most researched areas of data-mining applications in the contemporary literature. The need for efficient clustering is observed across wide sectors including consumer segmentation, categorization, collaborative filtering, document management, and indexing. Research on the clustering task must be performed prior to its adaptation to the text environment. Conventional approaches typically emphasized quantitative information, where the selected features are numbers. Efforts have also been put forward toward achieving efficient clustering in the context of categorical information, where the selected features can assume nominal values. This manuscript presents an in-depth analysis of the challenges of clustering in the text environment. Further, this paper details prominent models proposed for clustering, along with the pros and cons of each. In addition, it focuses on various recent developments in the clustering task in social networks and associated environments.
8 efficient multi-document summary generation using neural networkINFOGAIN PUBLICATION
This paper proposes a multi-document summarization system that uses bisect k-means clustering, an optimal merge function, and a neural network. The system first preprocesses input documents through stemming and removing stop words. It then applies bisect k-means clustering to group similar sentences. The clusters are merged using an optimal merge function to find important keywords. The NEWSUM algorithm is used to generate a primary summary for each keyword. A neural network trained on sentence classifications is then used to classify sentences in the primary summary as positive or negative. Only positively classified sentences are included in the final summary to improve accuracy. The system aims to generate a concise and accurate summary in a short period of time from multiple documents on a given topic.
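The bisecting k-means step in the pipeline above can be sketched as follows: repeatedly split the largest cluster with plain 2-means until the desired number of clusters remains. This is a minimal NumPy version, not the authors' implementation; the optimal merge function, NEWSUM, and the neural-network stage are omitted:

```python
import numpy as np

def two_means(X, iters=20, seed=0):
    """Split the rows of X into two clusters with plain k-means (k=2)."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), 2, replace=False)]
    for _ in range(iters):
        d = np.linalg.norm(X[:, None] - centers[None], axis=2)
        labels = d.argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centers[k] = X[labels == k].mean(axis=0)
    return labels

def bisect_kmeans(X, k):
    """Repeatedly bisect the largest cluster until k clusters remain."""
    clusters = [np.arange(len(X))]
    while len(clusters) < k:
        biggest = max(range(len(clusters)), key=lambda i: len(clusters[i]))
        idx = clusters.pop(biggest)
        labels = two_means(X[idx])
        clusters.append(idx[labels == 0])
        clusters.append(idx[labels == 1])
    return clusters

# Three well-separated toy "sentence embedding" blobs of 10 points each.
X = np.vstack([np.random.default_rng(i).normal(i * 5, 0.5, (10, 2)) for i in range(3)])
parts = bisect_kmeans(X, 3)
print(len(parts))  # 3
```

In the system described, the points would be sentence vectors rather than 2-D toy data, and the resulting clusters feed the merge and summarization stages.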
This document analyzes a single student learning episode using two theoretical lenses: the instrumental genesis perspective and the onto-semiotic approach. The instrumental genesis perspective focuses on how students develop techniques for using tools or artifacts to solve mathematical tasks, and the relationships between thinking and gestures. The onto-semiotic approach views mathematical knowledge and learning as involving systems of practices within social and institutional contexts. Analyzing the same episode from both perspectives provides complementary insights and a richer understanding of the phenomena, while also helping to identify the strengths and limitations of each theoretical approach. Networking the two theories in this way contributes to theoretical development in mathematics education.
Text mining is a technique that helps users find useful information in a large collection of text documents on the web or in a database. Most popular text-mining and classification methods have adopted term-based approaches, while pattern-based methods describe user preferences. This review paper analyses how text mining works at three levels, i.e. the sentence level, the document level and the feature level. We review related work that was previously done, and demonstrate the problems that arise when text mining is performed at the feature level. The paper also presents a technique for text mining on compound sentences.
A Competent and Empirical Model of Distributed ClusteringIRJET Journal
This document discusses distributed document clustering. It begins with an introduction to how documents are stored and indexed in computers. It then discusses different clustering algorithms like hierarchical and k-means clustering that are used to group similar documents. The document proposes a new framework for efficiently clustering text documents stored across different distributed resources. It argues that traditional clustering algorithms cannot perfectly cluster text data in decentralized systems. The framework uses properties of traditional algorithms with the ability to cluster in distributed systems.
This document summarizes an article that proposes an automatic text summarization technique using feature terms to calculate sentence relevance. The technique uses both statistical and linguistic methods to identify semantically important sentences for creating a generic summary. It determines the relevance of sentences based on feature term ranks and performs semantic analysis of sentences with the highest ranks to select those most important for the summary. The performance is evaluated by comparing summaries to those created by human evaluators.
A Survey on Sentiment Categorization of Movie ReviewsEditor IJMTER
Sentiment categorization is a process of mining user-generated text content to determine the sentiment of users towards a particular thing. It is the approach of detecting the sentiment of the author with regard to some topic, and is also known as sentiment detection, sentiment analysis and opinion mining. It is very useful for movie production companies interested in knowing how users feel about their movies. For example, the word "excellent" indicates that a review expresses positive emotion about a particular movie. The same applies to movies, songs, cars, holiday destinations, political parties, social network sites, web blogs, discussion forums and so on. Sentiment categorization can be carried out using three approaches: first, supervised machine learning with text classifiers based on Naïve Bayes, Maximum Entropy, SVM, kNN or hidden Markov models; second, an unsupervised semantic orientation scheme of extracting relevant n-grams of the text and then labelling them; third, the publicly available SentiWordNet library.
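The first (supervised) approach can be illustrated with a tiny multinomial Naive Bayes classifier. The reviews and labels below are invented toy data, not any of the survey's datasets:

```python
import math
from collections import Counter

def train_nb(docs):
    """docs: list of (token_list, label). Multinomial NB with Laplace smoothing."""
    labels = Counter(lab for _, lab in docs)
    word_counts = {lab: Counter() for lab in labels}
    vocab = set()
    for tokens, lab in docs:
        word_counts[lab].update(tokens)
        vocab.update(tokens)
    return labels, word_counts, vocab

def predict_nb(model, tokens):
    labels, word_counts, vocab = model
    total = sum(labels.values())
    best, best_lp = None, -math.inf
    for lab, n in labels.items():
        lp = math.log(n / total)                         # class prior
        denom = sum(word_counts[lab].values()) + len(vocab)
        for t in tokens:
            lp += math.log((word_counts[lab][t] + 1) / denom)  # Laplace smoothing
        if lp > best_lp:
            best, best_lp = lab, lp
    return best

train = [
    ("an excellent and moving film".split(), "pos"),
    ("truly great acting wonderful story".split(), "pos"),
    ("a dull boring waste of time".split(), "neg"),
    ("terrible plot awful dialogue".split(), "neg"),
]
model = train_nb(train)
print(predict_nb(model, "excellent story".split()))  # pos
```

The SVM, Maximum Entropy and kNN classifiers mentioned above would consume the same bag-of-words features with a different decision rule.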
The International Journal of Engineering and Science (IJES)theijes
The International Journal of Engineering & Science is aimed at providing a platform for researchers, engineers, scientists, or educators to publish their original research results, to exchange new ideas, to disseminate information in innovative designs, engineering experiences and technological skills. It is also the Journal's objective to promote engineering and technology education. All papers submitted to the Journal will be blind peer-reviewed. Only original articles will be published.
Automatically Generating Wikipedia Articles: A Structure-Aware ApproachGeorge Ang
The document describes an approach for automatically generating Wikipedia-style articles by using the structure of existing human-authored articles as templates. It involves inducing templates by analyzing section headings across documents, retrieving relevant excerpts from the internet for each template topic, and jointly training extractors to select excerpts that optimize both local relevance and global coherence across the entire article. The results confirm the benefits of incorporating structural information into the content selection process.
IRJET- A Survey Paper on Text Summarization MethodsIRJET Journal
This document summarizes various approaches for automatic text summarization, including extractive and abstractive methods. It discusses early surface-level approaches from the 1950s that identified important sentences based on word frequency. It also reviews corpus-based, cohesion-based, rhetoric-based, and graph-based approaches. The document then examines single document summarization techniques like naive Bayes methods, log-linear models, and deep natural language analysis. It concludes with a discussion of multi-document summarization, including abstraction and information fusion as well as graph spreading activation approaches. The goal of the survey is to provide an overview of the major existing methods that have been used for automatic text summarization.
This document presents a general framework for building classifiers and clustering models using hidden topics to deal with short and sparse text data. It analyzes hidden topics from a large universal dataset using LDA. These topics are then used to enrich both the training data and new short text data by combining them with the topic distributions. This helps reduce data sparseness and improves classification and clustering accuracy for short texts like web snippets. The framework is also applied to contextual advertising by matching web pages and ads based on their hidden topic similarity.
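The enrichment idea, appending a short text's hidden-topic distribution to its sparse word features, can be sketched with scikit-learn's LDA. The four-document "universal" corpus here is a toy stand-in for the large universal dataset the framework assumes:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy stand-in for the large universal dataset used to learn hidden topics.
universal = [
    "stocks market trading shares investors",
    "market economy bank finance stocks",
    "football match goal team player",
    "team season player league football",
]
vec = CountVectorizer()
X = vec.fit_transform(universal)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Enrich a short snippet: append its topic distribution to its word features.
snippet = ["stocks and shares slump"]
words = vec.transform(snippet).toarray()
topics = lda.transform(vec.transform(snippet))   # hidden-topic distribution
enriched = np.hstack([words, topics])
print(enriched.shape)
```

The enriched vectors, rather than the sparse word counts alone, are what the classifier or clustering model is trained on, which is how the framework reduces sparseness for web snippets.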
SEMANTIC INTEGRATION FOR AUTOMATIC ONTOLOGY MAPPING cscpconf
In the last decade, ontologies have played a key technology role for information sharing and agent interoperability in different application domains. In the semantic web domain, ontologies are efficiently used to face the great challenge of representing the semantics of data, in order to bring the actual web to its full power and hence achieve its objective. However, using ontologies as common and shared vocabularies requires a certain degree of interoperability between them. To meet this requirement, mapping ontologies is a solution that cannot be avoided. Indeed, ontology mapping builds a meta layer that allows different applications and information systems to access and share their information, of course after resolving the different forms of syntactic, semantic and lexical mismatches. In the contribution presented in this paper, we have integrated the semantic aspect, based on an external lexical resource, WordNet, to design a new algorithm for fully automatic ontology mapping. This fully automatic character is the main difference of our contribution with regard to most of the existing semi-automatic algorithms for ontology mapping, such as Chimaera, Prompt, Onion, Glue, etc. To further enhance the performance of our algorithm, the mapping-discovery stage is based on the combination of two sub-modules: the former analyses the concepts' names and the latter analyses their properties. Each of these two sub-modules is itself based on the combination of lexical and semantic similarity measures.
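The combination of lexical and semantic similarity over concept names can be sketched as follows. A toy synonym table stands in for WordNet, and the 50/50 weighting is an assumption for illustration, not the paper's tuned combination:

```python
from difflib import SequenceMatcher

# Toy synonym table standing in for WordNet (hypothetical entries).
SYNONYMS = {
    "car": {"auto", "automobile", "vehicle"},
    "author": {"writer"},
}

def lexical_sim(a, b):
    """String-level similarity of two concept names."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def semantic_sim(a, b):
    """1.0 if the names are synonyms in the lexical resource, else 0.0."""
    a, b = a.lower(), b.lower()
    if a == b or b in SYNONYMS.get(a, set()) or a in SYNONYMS.get(b, set()):
        return 1.0
    return 0.0

def combined_sim(a, b, w=0.5):
    """Weighted combination of lexical and semantic similarity."""
    return w * lexical_sim(a, b) + (1 - w) * semantic_sim(a, b)

print(combined_sim("Car", "Automobile"))
```

The property-analysis sub-module would apply the same combined measure to the concepts' attribute names, and a mapping is declared when the combined score crosses a threshold.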
IRJET- Text Document Clustering using K-Means Algorithm IRJET Journal
This document discusses using the K-Means clustering algorithm to cluster text documents and compares it to using K-Means clustering with dimension reduction techniques. It uses the BBC Sports dataset containing 737 documents in 5 classes. The document outlines preprocessing the text, creating a document term matrix, applying K-Means clustering, and using dimension reduction techniques like InfoGain before clustering. It evaluates the different methods using precision, recall, accuracy, and F-measure, finding that K-Means with InfoGain dimension reduction outperforms standard K-Means clustering.
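The core of the pipeline above can be sketched with scikit-learn. Four toy documents stand in for the 737-document BBC Sports dataset, and the InfoGain dimension-reduction step is omitted:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# Toy corpus: two sports documents and two finance documents.
docs = [
    "the team scored a goal in the football match",
    "the football team won the match with a late goal",
    "the central bank raised interest rates today",
    "shares fell after the bank raised rates",
]
X = TfidfVectorizer(stop_words="english").fit_transform(docs)  # document-term matrix
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)
```

Precision, recall, accuracy and F-measure would then be computed by comparing `km.labels_` against the known class labels, as the evaluation in the paper does.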
Text-document clustering and similarity detection are a major part of document management, where every document should be identified by its key terms and domain knowledge. Based on similarity, documents are grouped into clusters. Several approaches to document-similarity calculation have been proposed in existing systems, but those systems are either term based or pattern based and suffer from several problems. To address this challenging environment, the proposed system presents an innovative model for document similarity that applies a back-propagation time stamp algorithm, named BPTT. It discovers patterns in text documents as higher-level features and creates a network for fast grouping. It also detects the most appropriate patterns based on their weights, and BPTT performs the document-similarity measures. Using this approach documents can be categorized easily, and the problems of the training process are reduced. BPTT has been implemented and evaluated on the .NET platform with different sets of datasets.
Review of Various Text Categorization Methodsiosrjce
IOSR Journal of Computer Engineering (IOSR-JCE) is a double blind peer reviewed International Journal that provides rapid publication (within a month) of articles in all areas of computer engineering and its applications. The journal welcomes publications of high quality papers on theoretical developments and practical applications in computer technology. Original research papers, state-of-the-art reviews, and high quality technical notes are invited for publications.
International Journal of Engineering and Science Invention (IJESI) is an international journal intended for professionals and researchers in all fields of computer science and electronics. IJESI publishes research articles and reviews within the whole field Engineering Science and Technology, new teaching methods, assessment, validation and the impact of new technologies and it will continue to provide information on the latest trends and developments in this ever-expanding subject. The publications of papers are selected through double peer reviewed to ensure originality, relevance, and readability. The articles published in our journal can be accessed online.
Photography is about how you observe things, not just what you see. The photographer advocates for natural light in photos and wants pictures to focus on both the model and the background, as the whole picture including clothes and fashion tells a story that defines the model's persona.
This document presents a computer simulation model for radar target detection. The model considers two moving targets generated using a keypad and applies nine levels of noise to simulate changing signal-to-noise ratios. As noise levels increase, the brightness of target blips on the radar display decrease by around 5% for each level. The simulation is programmed in Turbo C++ and interfaces with a computer through a parallel port to simulate a radar display and evaluate the effects of noise on target detection performance.
Electronic bands structure and gap in mid-infrared detector InAs/GaSb type II...IJERA Editor
We present a theoretical study of the electronic band structure E(d1) of the InAs (d1 = 25 Å)/GaSb (d2 = 25 Å) type II superlattice at 4.2 K, performed in the envelope-function formalism. We study the effect of d1 and of the offset Δ between the heavy-hole band edges of InAs and GaSb on the band gap Eg(Γ) at the center of the first Brillouin zone, and on the semiconductor-to-semimetal transition. Eg(Γ, T) decreases from 288.7 meV at 4.2 K to 230 meV at 300 K. In the investigated temperature range, the cut-off wavelength 4.3 µm ≤ λc ≤ 5.4 µm situates this sample as a mid-wavelength infrared (MWIR) detector. Our results are in good agreement with the experimental data obtained by C. Cervera et al.
The document consists of repeated text stating a website address (www.paksociety.com) and attribution to Waqar Azeem Pakistanipoint as the scanner. It appears to be a scan of a blank or empty document with no other substantive information provided.
Grasp and Manipulation of Five Fingered Hand Robot in Unstructured EnvironmentsIJERA Editor
Handling of objects with irregular shapes and that of flexible/soft objects by ordinary robot grippers is difficult. It is required that various objects with different shapes or sizes could be grasped and manipulated by one robot hand mechanism for the sake of factory automation and labor saving. Dexterous grippers will be the appropriate solution to such problems. Corresponding to such needs, the present work is towards the design and development of an articulated mechanical hand with five fingers and twenty five degrees-of-freedom having an improved grasp capability. Since the designed hand is capable of enveloping and grasping an object mechanically, it can be conveniently used in manufacturing automation as well as for medical rehabilitation purpose. This work presents the kinematic design and the grasping analysis of such a hand.
The document is a list of repeated links to the URL http://imranseries.tumblr.com/forcompletelist. There is no other textual content. The list contains over 200 repetitions of this same link.
Discover with MailUp how to make your communication more effective. Send a newsletter, a DEM or a transactional email to the address you will receive by email after filling out the form at this address: http://www.mailup.it/email-checkup/ You can use your own console or another sending system.
Navigation through citation network based on content similarity using cosine ...Salam Shah
The rate of scientific literature has increased in the past few decades; new topics and information are added in the form of articles, papers, text documents, web logs, and patents. This rapid growth of information has caused a tremendous amount of additions to current and past knowledge; during this process new topics emerged, some topics split into many sub-topics, and, on the other hand, many topics merged to form a single topic. Manually selecting and searching for a topic in such a huge amount of information is an expensive and workforce-intensive task. To meet the emerging need for an automatic process to locate, organize, connect, and make associations among these sources, researchers have proposed different techniques that automatically extract components of the information presented in various formats and organize or structure them. The targeted data to be processed for component extraction might be in the form of text, video or audio. Different algorithms have structured information, grouped similar information into clusters, and weighted it on the basis of importance. The organized, structured and weighted data is then compared with other structures to find similarity, using various algorithms. Semantic patterns can be found by employing visualization techniques that show similarity or relations between topics over time or related to a specific event. In this paper, we propose a model based on the cosine-similarity algorithm for citation networks, which answers questions such as how to connect documents with the help of citations and content similarity, and how to visualize and navigate through the documents.
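The cosine-similarity measure at the heart of the proposed model can be sketched over raw term-frequency vectors. TF-IDF weighting and the citation graph itself are omitted here:

```python
import math
from collections import Counter

def cosine(a_tokens, b_tokens):
    """Cosine similarity between two documents' term-frequency vectors."""
    a, b = Counter(a_tokens), Counter(b_tokens)
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

d1 = "citation network analysis of research papers".split()
d2 = "analysis of citation graphs in research".split()
print(round(cosine(d1, d2), 3))  # 0.667
```

In the full model this content similarity is computed between citing and cited documents, so that edges in the citation network can be weighted and navigated by topical relatedness.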
This document describes a proposed concept-based mining model that aims to improve document clustering and information retrieval by extracting concepts and semantic relationships rather than just keywords. The model uses natural language processing techniques like part-of-speech tagging and parsing to extract concepts from text. It represents concepts and their relationships in a semantic network and clusters documents based on conceptual similarity rather than term frequency. The model is evaluated using singular value decomposition to increase the precision of key term and phrase extraction.
RAPID INDUCTION OF MULTIPLE TAXONOMIES FOR ENHANCED FACETED TEXT BROWSINGijaia
In this paper we present and compare two methodologies for rapidly inducing multiple subject-specific
taxonomies from crawled data. The first method involves a sentence-level words co-occurrence frequency
method for building the taxonomy, while the second involves the bootstrapping of a Word2Vec based
algorithm with a directed crawler. We exploit the multilingual open-content directory of the World Wide
Web, DMOZ1
to seed the crawl, and the domain name to direct the crawl. This domain corpus is then input
to our algorithm that can automatically induce taxonomies. The induced taxonomies provide hierarchical
semantic dimensions for the purposes of faceted browsing. As part of an ongoing personal semantics
project, we applied the resulting taxonomies to personal social media data (Twitter, Gmail, Facebook,
Instagram, Flickr) with an objective of enhancing an individual’s exploration of their personal information
through faceted searching. We also perform a comprehensive corpus based evaluation of the algorithms
based on many datasets drawn from the fields of medicine (diseases) and leisure (hobbies) and show that
the induced taxonomies are of high quality.
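The first method's sentence-level word co-occurrence counting can be sketched as follows, over toy sentences; the subsequent step of inducing the taxonomy from the counts is omitted:

```python
from collections import Counter
from itertools import combinations

sentences = [
    "deep learning improves image recognition",
    "deep learning models need training data",
    "image recognition uses training data",
]

# Count how often each word pair co-occurs within a sentence.
cooc = Counter()
for s in sentences:
    words = sorted(set(s.split()))        # sorted so each pair has one canonical key
    for a, b in combinations(words, 2):
        cooc[(a, b)] += 1

print(cooc[("deep", "learning")])  # 2
```

A taxonomy can then be induced from such counts, for example by attaching each term under the more general term it most frequently co-occurs with, yielding the hierarchical facets used for browsing.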
This document describes a method for sentence similarity based text summarization using clusters. It involves preprocessing text, extracting primitives from sentences, linking primitives, computing sentence similarity, merging similarity values, clustering similar sentences, and extracting a representative sentence from each cluster to generate a summary. Key steps include identifying common elements (primitives) between sentences, representing sentences as vectors of primitives, computing similarity based on shared primitives, clustering similar sentences, pruning clusters to remove dissimilar sentences, ranking clusters by importance, and selecting a representative sentence from each cluster for the summary. The goal is to automatically generate a short summary that captures the essential information from a collection of documents or text on the same topic.
Research on ontology based information retrieval techniquesKausar Mukadam
The document summarizes and compares three novel ontology-based information retrieval techniques. It discusses a technique for retrieving information in the domain of Traditional Chinese Medicine that uses an ontology to represent concepts and measures concept similarity to sort search results. It also describes a framework for semantic indexing and querying that uses an ontology and entity-attribute-value model to improve scalability, usability, and retrieval performance for transport systems. Additionally, it outlines a semantic extension retrieval model that uses ontology annotation and semantic extension of queries to address limitations of keyword-based search. The techniques are evaluated based on precision and recall measures to analyze their effectiveness compared to traditional methods.
This document summarizes a survey on string similarity matching search techniques. It discusses how string similarity matching is used to find relevant information in text collections. The document reviews different algorithms for string matching, including edit distance, NR-grep, n-grams, and approaches based on hashing and locality-sensitive hashing. It analyzes techniques like pattern matching, threshold-based joins, and vector representations. The goal is to present an overview of the field and compare algorithm performance for similarity searches.
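Of the algorithms surveyed, edit distance is the simplest to sketch. Here is the classic dynamic-programming Levenshtein formulation, using a rolling row to keep memory linear in the shorter string:

```python
def edit_distance(a, b):
    """Levenshtein distance via dynamic programming with a rolling row."""
    m, n = len(a), len(b)
    dp = list(range(n + 1))            # row for the empty prefix of a
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i         # prev holds dp[i-1][j-1]
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                          # deletion
                        dp[j - 1] + 1,                      # insertion
                        prev + (a[i - 1] != b[j - 1]))      # substitution
            prev = cur
    return dp[n]

print(edit_distance("kitten", "sitting"))  # 3
```

A threshold-based similarity join, as discussed above, would keep only string pairs whose edit distance falls below a chosen bound.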
A simplified classification computational model of opinion mining using deep ...IJECEIAES
Opinion mining attempts to develop an automated system to determine people's viewpoints towards various units such as events, topics, products, services, organizations, individuals, and issues. Opinion analysis of natural text can be regarded as a text- and sequence-classification problem that poses a high feature space, owing to the involvement of dynamic information that needs to be addressed precisely. This paper introduces effective modelling of human opinion analysis from social-media data subject to complex and dynamic content. First, a customized preprocessing operation based on natural-language-processing mechanisms serves as an effective data-treatment process for building quality-aware input data. A suitable deep-learning technique, bidirectional long short-term memory (Bi-LSTM), is then implemented for opinion classification, followed by a data-modelling process in which truncating and padding are performed manually to achieve better data generalization in the training phase. The design and development of the model are carried out in MATLAB. The performance analysis shows that the proposed system offers a significant advantage in terms of classification accuracy and less training time, owing to the reduction of the feature space by the data-treatment operation.
Concurrent Inference of Topic Models and Distributed Vector RepresentationsParang Saraf
Abstract: Topic modeling techniques have been widely used to uncover dominant themes hidden inside an unstructured document collection. Though these techniques first originated in the probabilistic analysis of word distributions, many deep learning approaches have been adopted recently. In this paper, we propose a novel neural network based architecture that produces distributed representation of topics to capture topical themes in a dataset. Unlike many state-of-the-art techniques for generating distributed representation of words and documents that directly use neighboring words for training, we leverage the outcome of a sophisticated deep neural network to estimate the topic labels of each document. The networks, for topic modeling and generation of distributed representations, are trained concurrently in a cascaded style with better runtime without sacrificing the quality of the topics. Empirical studies reported in the paper show that the distributed representations of topics represent intuitive themes using smaller dimensions than conventional topic modeling approaches.
For more information, please visit: http://people.cs.vt.edu/parang/ or contact parang at firstname at cs vt edu
A Semantic Retrieval System for Extracting Relationships from Biological Corpusijcsit
The World Wide Web holds a large amount of diverse information. Sometimes, while searching the World Wide Web, users do not obtain the type of information they expect. In the field of information extraction, extracting semantic relationships between terms in documents has become a challenge. This paper proposes a system that helps retrieve documents based on query expansion and tackles the extraction of semantic relationships from biological documents. The system retrieves documents relevant to the input terms and then extracts the existence of a relationship. We use the Boolean model and pattern recognition, which help in determining the relevant documents and locating the relationship in the biological document. The system constructs a term-relation table that accelerates the relation-extraction part. The proposed method offers another usage of the system: researchers can use it to figure out the relationship between two biological terms through the information available in biological documents. For the retrieved documents, the system also measures precision and recall.
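The Boolean-model retrieval step can be sketched with an inverted index and an AND query. The documents and biological terms below are invented for illustration:

```python
# Build an inverted index and answer an AND query (Boolean model).
docs = {
    1: "the TP53 gene regulates the cell cycle",
    2: "mutations in TP53 are linked to cancer",
    3: "the cell cycle is controlled by cyclins",
}

index = {}
for doc_id, text in docs.items():
    for term in set(text.lower().split()):
        index.setdefault(term, set()).add(doc_id)

def boolean_and(*terms):
    """Return the IDs of documents containing every query term."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()

print(sorted(boolean_and("TP53", "cell")))  # [1]
```

Pattern recognition over the sentences of the matching documents would then locate where the relationship between the two query terms is expressed.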
A hybrid composite features based sentence level sentiment analyzerIAESIJAI
Current lexicon- and machine-learning-based sentiment analysis approaches still suffer from a two-fold limitation. First, manual lexicon construction and machine training are time-consuming and error-prone. Second, the prediction's accuracy requires that sentences and their corresponding training text fall under the same domain. In this article, we experimentally evaluate four sentiment classifiers, namely support vector machines (SVM), Naive Bayes (NB), logistic regression (LR) and random forest (RF). We quantify the quality of each of these models using three real-world datasets that comprise 50,000 movie reviews, 10,662 sentences, and 300 generic movie reviews. Specifically, we study the impact of a variety of natural language processing (NLP) pipelines on the quality of the predicted sentiment orientations. Additionally, we measure the impact of incorporating lexical semantic knowledge captured by WordNet on expanding original words in sentences. Findings demonstrate that utilizing different NLP pipelines and semantic relationships impacts the quality of the sentiment analyzers. In particular, results indicate that coupling lemmatization with knowledge-based n-gram features produces higher accuracy: with this coupling, the accuracy of the SVM classifier improved to 90.43%, while it was 86.83%, 90.11% and 86.20%, respectively, with the three other classifiers.
Ontology Based Approach for Classifying Biomedical Text AbstractsWaqas Tariq
Classifying biomedical literature is a difficult and challenging task, especially when a large number of biomedical articles should be organized into a hierarchical structure. Due to this problem, various classification methods were proposed by many researchers for classifying biomedical literature in order to help users find relevant articles on the web. In this paper, we propose a new approach to classifying a collection of biomedical text abstracts by using ontology alignment algorithm that we have developed. To accomplish our goal, we construct the OHSUMED disease hierarchy as the initial training hierarchy and the Medline abstract disease hierarchies as our testing hierarchy. For enriching our training hierarchy, we use the relevant features that extracted from selected categories in the OHSUMED dataset as feature vectors. These feature vectors then are mapped to each node or concept in the OHSUMED disease hierarchy according to their specific category. Afterward, we align and match the concepts in both hierarchies using our ontology alignment algorithm for finding probable concepts or categories. Subsequently, we compute the cosine similarity score between the feature vectors in probable concepts, in the genrichedh OHSUMED disease hierarchy and the Medline abstract disease hierarchy. Finally, we predict a category to the new Medline abstracts based on the highest cosine similarity score. The results obtained from the experiments demonstrate that our proposed approach for hierarchical classification performs slightly better than the multi-class flat classification.
This paper proposes Natural language based Discourse Analysis method used for extracting
information from the news article of different domain. The Discourse analysis used the Rhetorical Structure
theory which is used to find coherent group of text which are most prominent for extracting information
from text. RST theory used the Nucleus- Satellite concept for finding most prominent text from the text
document. After Discourse analysis the text analysis has been done for extracting domain related object
and relates this object. For extracting the information knowledge based system has been used which
consist of domain dictionary .The domain dictionary has a bag of words for domain. The system is
evaluated according gold-of-art analysis and human decision for extracted information.
This document discusses challenging issues and similarity measures for web document clustering. It begins with an introduction to text mining and document clustering. Some key challenges discussed include ambiguity in natural language, efficiently measuring semantic similarity between words, and cluster validity. Various string-based, term-based, and corpus-based similarity measures are then described that can be used for document clustering, including Jaro-Winkler distance, cosine similarity, latent semantic analysis, and pointwise mutual information. The conclusion states that accurate clustering requires a precise definition of similarity between document pairs.
Challenging Issues and Similarity Measures for Web Document ClusteringIOSR Journals
This document discusses challenging issues and similarity measures for web document clustering. It begins with an introduction to text mining and document clustering. It then reviews related work on similarity approaches and measures. Some key challenging issues in web document clustering are discussed, such as measuring semantic similarity between words and evaluating cluster validity. Various types of similarity measures are also described, including string-based measures like Jaro-Winkler distance and corpus-based measures like latent semantic analysis. The conclusion states that accurate clustering requires a precise definition of similarity between document pairs and discusses different similarity measures that can be used.
Relevance feature discovery for text miningredpel dot com
The document discusses relevance feature discovery for text mining. It presents an innovative model that discovers both positive and negative patterns in text documents as higher-level features and uses them to classify terms into categories and update term weights based on their specificity and distribution in patterns. Experiments on standard datasets show the proposed model outperforms both term-based and pattern-based methods.
There is a vast amount of unstructured Arabic information on the Web, this data is always organized in
semi-structured text and cannot be used directly. This research proposes a semi-supervised technique that
extracts binary relations between two Arabic named entities from the Web. Several works have been
performed for relation extraction from Latin texts and as far as we know, there isn’t any work for Arabic
text using a semi-supervised technique. The goal of this research is to extract a large list or table from
named entities and relations in a specific domain. A small set of a handful of instance relations are
required as input from the user. The system exploits summaries from Google search engine as a source
text. These instances are used to extract patterns. The output is a set of new entities and their relations. The
results from four experiments show that precision and recall varies according to relation type. Precision
ranges from 0.61 to 0.75 while recall ranges from 0.71 to 0.83. The best result is obtained for (player, club)
relationship, 0.72 and 0.83 for precision and recall respectively.
The document discusses text mining and summarizes several key points:
1) Text mining involves deriving patterns and trends from text to discover useful knowledge, but it is challenging to accurately evaluate features due to issues like polysemy and synonymy.
2) Phrase-based approaches could perform better than term-based approaches by carrying more semantic meaning, but have faced challenges due to low phrase frequencies and redundant/noisy phrases.
3) The proposed approach uses pattern mining to discover specific patterns and evaluates term weights based on pattern distributions rather than full document distributions to address misinterpretation issues and improve accuracy.
Sheema Khan, Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 4, Issue 12 (Part 4), December 2014, pp. 55-59
Text Mining: (Asynchronous Sequences)
Sheema Khan(Student ME CSE), Zafar Ul Hasan, (ME (CSE), MBA(Mktg), LMISTE)
Department of Computer Science and Engineering, Everest Educational Societies College of Engineering &
Technology, Aurangabad, 431001 India.
Intellisense Research & Development Consultancy (IRD), Aurangabad, 431001 India.
Abstract
In this paper we try to correlate text sequences that share common topics, using these topics as semantic clues. We propose a two-step method for asynchronous text mining. Step one checks for the common topics in the sequences and isolates them together with their timestamps. Step two takes each topic and re-estimates the timestamps of the text documents. After multiple repetitions of step two, the method yields an optimal result.
Keywords: Correlation, Asynchronous sequences, Timestamps
I. Introduction
A vast amount of text has been generated in recent decades, from sources as varied as the Internet, application packages, research investigation reports, communication, and entertainment. One can easily get lost searching for knowledge in these text sources, so methods are needed for discovering knowledge in this enormous body of text. In this paper we formulate a method to extract knowledge from an available source of text sequences. The method discovers knowledge from text in two steps: in step one, we determine the distribution of word intensity from the meaning of the topic to be mined, and in step two, we refine that distribution using a time-distribution method.
In reality, many topics and text sequences are correlated: a semantic, comprehensive topic is discovered through the interaction of multiple sequences rather than from a single, individual stream. Recent work assumes that a common topic shares the same time distribution over different sequences, i.e., that the sequences are synchronous in time. In practice, however, asynchronism among multiple sequences is very common. For example, articles on the same topic from different news feeds are unlikely to be indexed with the same timestamps: news agencies report with delays of hours, newspapers of days, and periodicals of weeks.
We propose an effective method to solve the problem of mining common topics from multiple asynchronous text sequences. We formally define a principled framework for the problem, from which a unified objective function can be derived. To optimize the objective function, we define an algorithm that exploits the mutual impact between topic discovery and time synchronization.
The key point is to use the temporal and semantic correlation between sequences to build a strong, mutually reinforcing process. First, we extract common topics, with their timestamps, from a given set of sequences. Second, we update the timestamps of documents by matching each document to its most closely related topic, based on the extracted topics and their word distributions; this step reduces the asynchronism among the sequences. The common topics are then refined according to the new timestamps after synchronization. These steps are repeated alternately to maximize the unified objective function, and the objective provably converges monotonically.
The main contributions of our work are:
- mining common topics from multiple text sequences;
- an objective function that formalizes our problem within a principled, probabilistic framework; and
- an optimization algorithm, guaranteed to reach an optimum, that maximizes the objective function.
II. Literature Survey
Yuekui Yang et al. [1] parsed every web page as a DOM tree. They proposed rules over the tree aimed at extracting the relationships among different paragraphs, and then presented a new topic-specific web crawler that calculates a prediction score for unvisited URLs based on the web page hierarchy and text semantic similarity. They calculated text similarity using the vector space model (VSM), which treats a query or paragraph as a vector of independent terms, and connected different paragraphs in a web page according to their hierarchy in the DOM tree. Xiaofeng Liao et al. [2] considered the problem of modeling the topics in a sequence of images with known timestamps.
Detecting and tracking temporal topics is an important task in many applications, such as finding hot research topics in scientific literature, analyzing series of news articles, email surveillance, and search query log mining.
Besides collections of text documents, they also considered mining temporal topic trends from image data sets. Chenghua Lin et al. [3] proposed the joint sentiment-topic (JST) model based on latent Dirichlet allocation [7], which detects sentiment and topics simultaneously from text; JST is equivalent to Reverse-JST without a hierarchical prior. Neustein [4] showed how sequence package analysis is informed by algorithms that can work with, rather than be hindered by, less-than-perfect natural speech for intelligent mining of doctor-patient recordings and blogs. Watts et al. [5] shed light on organizational knowledge gained through technical conferences, observing that in many processes the knowledge gained is limited to the experiences and communication skills of the individuals attending the conference.
Many conference proceedings are published and provided to attendees in electronic format, such as on CD-ROM and/or on the Internet (IEEE conference proceedings, for example), and these proceedings form a rich repository that can be mined. They compiled the hot topics as defined by researchers in the field and delineated the technical approaches being applied. As per their work, R&D profiling can more fully exploit recorded conference proceedings to enhance corporate knowledge. They illustrated the potential of profiling conference proceedings using the WebQL information retrieval and TechOasis (VantagePoint) text mining software, showing how tracking research patterns and changes over a sequence of conferences can illuminate R&D trends, map dominant issues, and spotlight key research organizations. Walenz et al. [6] described a sequencer system for the temporal analysis of named entities in news articles, spanning media-reported stories and user-generated content. They explored the evolution of social contexts over time, which can provide unique insights into human social dynamics.
Wenfeng Li et al. [7] mined users' web browsing histories, presenting an innovative method to extract a user's interests from his or her browsing history. They applied an algorithm to extract useful text from the web pages in the user's browsed URL sequence [10]. Unlike other works that need a large amount of training data to adopt supervised information, they directly introduced raw supervised information into the LLDA-TF procedure. Fotiadis et al. [8] presented a methodology for biosequence classification that employs sequential pattern mining and optimization algorithms. In the first stage, a sequential pattern mining algorithm is applied to a set of biological sequences and the sequential patterns are extracted; the score of each pattern with respect to each sequence is then calculated using a scoring function, the score of each class under consideration is estimated, and the scores of the patterns and classes are updated, multiplied by a weight. In the second stage, an optimization technique is employed to calculate the weight values that achieve the optimal classification accuracy. The methodology is applied to the protein class and fold prediction problem, with extensive evaluation on a dataset obtained from the Protein Data Bank. Itoh [9] gave a contextual analysis processing technique, consisting in determining context understanding, together with coherence in sentences, of concepts and phenomena related to each other, so as to simultaneously and accurately interpret a sequence of multiple semantic representations.
By applying a semantic analysis of co-occurrence expressions, based on phrases with an absolute evaluation polarity, he developed a system capable of detecting the role relations between words and the relationships of meaning in a sentence, identifying topic transitions, anaphora, and endophora, and analyzing even idiomatic expressions and textual emoticons; the system correctly evaluated the "positive" or "negative" nuance of 75.0% of those expressions. Subasic et al. [11] proposed a method and visualization tool for mapping and interacting with stories published in web pages and articles. In contrast to existing approaches, their method concentrated on relational information and local patterns rather than on the occurrence of individual concepts and global models. They also presented an evaluation framework.
Sekiya et al. [13] proposed that a word sequence can be used to identify context; both the contexts identified by word sequences and the word sets related to those contexts are shown concretely. They used the confabulation model and five statistical measures as relations, and comparing the measures they found cogency and mutual information to be the most effective. Creamer et al. [14] analyzed the relationship between asset return, volatility, and the centrality indicators of a corporate news network by conducting a longitudinal network analysis. They built a sequence of daily corporate news networks using the companies of the STOXX 50 index as nodes, with the weight of each edge being the number of news items on the same topic shared by a pair of companies, as identified by the topic model methodology. They performed the Granger causality test and the Brownian distance covariance test of independence among several measures of centrality, return, and volatility. They found that the average
eigenvector centrality of the corporate news networks at different points in time has an impact on the return and volatility of the STOXX 50 index; likewise, the return and volatility of the STOXX 50 index also affect average eigenvector centrality.
Perez et al. [15] proposed an architecture for the integration of a corporate warehouse of structured data with a warehouse of text-rich XML documents.
Yanpeng Li et al. [16] present a feature coupling generalization (FCG) framework for generating new features from unlabeled data. It selects two special types of features, example-distinguishing features (EDFs) and class-distinguishing features (CDFs), from the original feature set, and then generalizes the EDFs into higher-level features based on their coupling degrees with the CDFs in the unlabeled data. EDFs that are extremely sparse in the labeled data can thus be enriched by their co-occurrences with CDFs in the unlabeled data, so that the performance of these low-frequency features is greatly boosted and new information from the unlabeled data is incorporated.
Navigli et al. [17] presented a method called structural semantic interconnections (SSI), which creates structural specifications of the possible senses for each word in a context and selects the best hypothesis according to a grammar G describing relations between sense specifications. They created the sense specifications from several available lexical resources, integrated partly manually and partly with the help of automatic procedures. The SSI algorithm has been applied to different semantic disambiguation problems.
III. Proposed Model
Fig. a: Proposed generative model for asynchronous text mining, relating Document (D), Timestamp (Ts), Topic (Tc), and Word (W), over a given Number of sequences (Ns) and Number of topics (Nt).
The process, as per Fig. a, is as follows:
1. Select a document D from its source, such as a website or data warehouse.
2. Select a timestamp T for that document; every document is associated with exactly one timestamp.
3. Select a common topic Tc and then draw the word from that topic.
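The three generative steps above can be sketched as a small sampling routine. The probability tables, vocabularies, and document name below are illustrative assumptions, not the paper's actual parameters.

```python
import random

random.seed(0)

# Hypothetical model parameters: P(timestamp | doc), P(topic | timestamp), P(word | topic).
p_timestamp = {"doc1": [0.7, 0.3]}                  # two candidate timestamps
p_topic = [[0.9, 0.1], [0.2, 0.8]]                  # per timestamp, two common topics
p_word = [["election", "vote"], ["match", "goal"]]  # words dominating each topic

def generate(doc):
    """Follow steps 1-3: pick a timestamp for the document, then a topic, then a word."""
    ts = random.choices([0, 1], weights=p_timestamp[doc])[0]
    topic = random.choices([0, 1], weights=p_topic[ts])[0]
    word = random.choice(p_word[topic])
    return ts, topic, word

print(generate("doc1"))  # prints one sampled (timestamp, topic, word) triple
```

Conventional topic mining would fit the probability tables; here they are fixed by hand purely to make the sampling order of the three steps concrete.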
Conventional topic-mining methods try to maximize the likelihood function L by adjusting the probabilities of topics and words, assuming the probability of the timestamp is known. In our work, however, we must consider the potential asynchronism among different sequences. Thus, besides finding optimal probabilities of topics and words, we also need to determine the probability of the timestamp to further maximize the likelihood function. In other words, we want to reassign a document with timestamp T to a new timestamp by determining its relevance to the respective topics, so that we obtain a larger L or, equivalently, topics of better quality. By the term asynchronism, we refer to the time distortion among different sequences; the relative temporal order within each individual sequence is still considered meaningful and generally correct. Therefore, during each synchronization step, we preserve the relative temporal order of documents within each individual sequence: a document whose timestamp is earlier than its successors' before adjustment will never be assigned a later timestamp than theirs after adjustment. This constraint aims to protect local temporal information within each individual sequence while fixing the asynchronism among different sequences.
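The order-preserving constraint can be enforced with a simple check on each proposed reassignment; the helper and the example timestamps below are illustrative, not part of the paper.

```python
def preserves_order(old_ts, new_ts):
    """True iff the proposed timestamps keep the sequence's original relative order:
    a document earlier than its successor before adjustment never ends up later."""
    order = sorted(range(len(old_ts)), key=lambda i: old_ts[i])
    return all(new_ts[a] <= new_ts[b] for a, b in zip(order, order[1:]))

# Three documents of one sequence, originally at times 1 < 3 < 7.
print(preserves_order([1, 3, 7], [2, 2, 8]))  # True: relative order kept (ties allowed)
print(preserves_order([1, 3, 7], [5, 2, 8]))  # False: first document jumped past the second
```

A synchronization step would only accept candidate timestamps for which this check holds, discarding adjustments that reverse a sequence's internal order.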
3.1 Algorithm
In this section, we show how to solve our objective function through an alternate (constrained) optimization scheme. Our algorithm has two steps. The first assumes that the current timestamps of the sequences are synchronous and extracts common topics from them. The second synchronizes the timestamps of all documents by matching each to its most related topic. We then return to the first step and iterate until convergence.
3.2 Topic Extraction
We assume the current timestamps of all sequences are already synchronous and extract common topics from them. Our algorithm is summarized below, where K is the number of topics specified by the user. The initial values of the timestamps and the objective function are computed from the original timestamps in the sequences.
3.3 Algorithm: Topic mining with time synchronization
Input: K, timestamps, objective function
Output: words, topics, timestamps

Initialize topic and word values with random numbers
Repeat
    Repeat
        Update word and topic values using the current timestamps and objective function
    Until convergence
    For m = 1 to M do (M is the number of synchronization steps)
        For u = 1 to T do initialize the objective function
        For v = 2 to T do
            For w = 1 to T do compute the objective function
        End
        Update timestamps
    End
Until convergence
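The alternation between topic extraction and time synchronization can be sketched on a toy corpus. The paper's actual update equations are not given here, so the sketch below substitutes deliberately simple stand-ins: each timestamp's pooled word counts act as its "topic", and synchronization moves each document to the timestamp whose topic overlaps it most. The corpus, timestamps, and iteration count are all assumptions for illustration.

```python
from collections import Counter

# Toy corpus: each document has a bag of words and a (possibly distorted) timestamp.
docs = [
    {"words": ["election", "vote", "vote"], "ts": 0},
    {"words": ["election", "vote"],         "ts": 2},  # same story from a delayed source
    {"words": ["match", "goal", "goal"],    "ts": 1},
]
T = 3  # number of discrete timestamps

def extract_topics(docs):
    """Step 1: treat each timestamp's pooled words as one topic (a word-count profile)."""
    topics = [Counter() for _ in range(T)]
    for d in docs:
        topics[d["ts"]].update(d["words"])
    return topics

def synchronize(docs, topics):
    """Step 2: reassign each document to the timestamp whose topic overlaps it most."""
    for d in docs:
        scores = [sum(t[w] for w in d["words"]) for t in topics]
        d["ts"] = max(range(T), key=lambda i: scores[i])

for _ in range(5):  # alternate the two steps (a fixed iteration count stands in for convergence)
    topics = extract_topics(docs)
    synchronize(docs, topics)

print([d["ts"] for d in docs])  # → [0, 0, 1]: the delayed article joins the election story's timestamp
```

The second document starts at timestamp 2 but shares its vocabulary with the first, so the alternation pulls it to timestamp 0, which is the asynchronism-fixing behavior the algorithm targets.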
3.4 Constraint on Time Synchronization
We assume asynchronism in the given sequences: timestamps
may be distorted, but the sequential order of documents
within each sequence is correct. This assumption is based
on observations from real-world applications; for example,
news stories published by different news agencies may vary
in their absolute timestamps, but their relative order
conforms to the order in which the events occurred.
We argue that the relaxed option works better in
practice, since real-world data sets are not perfect.
Although we assume that the sequential information
of the given sequences is correct in general, a small
number of documents will still violate this assumption.
Our iterative updating process, together with the relaxed
constraint, helps recover such outlying documents and
assign them to the correct topics.
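The relaxed constraint can be pictured as a projection step: adjusted timestamps may take new values, but within a sequence they must not reverse the documents' order. This helper is an illustrative assumption of ours, not code from the paper:

```python
def enforce_order(timestamps):
    """Project adjusted timestamps back onto the order-preserving
    constraint: within a sequence, each document's timestamp must be
    >= its predecessor's. Ties are allowed (the relaxed part), so an
    outlying document is clamped to the boundary rather than rejected."""
    fixed = list(timestamps)
    for i in range(1, len(fixed)):
        if fixed[i] < fixed[i - 1]:
            fixed[i] = fixed[i - 1]  # clamp to preserve sequential order
    return fixed
```

For example, a sequence whose synchronization step proposed timestamps [1, 3, 2, 5] would be projected to [1, 3, 3, 5], keeping the document order intact while still allowing the other timestamps to move.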
3.5 Convergence
Our objective function converges to a local
optimum after a finite number of iterations. Note that the
objective function has a trivial solution: assign
all documents to a single arbitrary timestamp, at which point
the algorithm would terminate at this local optimum.
This local optimum is clearly meaningless, since
it is equivalent to discarding all temporal information in the
text sequences and treating them as a flat collection of
documents. This trivial solution, however, exists only
in theory. In practice, our algorithm does
not converge to it, as long as we
use the original timestamps of the text sequences as
initial values and extract more than one topic.
3.6 The Local Search Strategy
In some real-world applications we have a
quantitative estimate of the asynchronism among
sequences, so it is unnecessary to search the entire
time dimension when adjusting the timestamps of
documents. This gives us the opportunity to reduce
the complexity of the time-synchronization step without
substantial performance loss, by setting an
upper bound on the difference between a document's
timestamps before and after
adjustment in each iteration. Specifically, given a
document D with timestamp T, we search for the optimal
timestamp only within a bounded neighborhood of T.
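The bounded search can be sketched in a few lines; `objective` and `bound` are illustrative names of ours, standing in for the paper's objective function and asynchronism estimate.

```python
def local_search_timestamp(ts, objective, bound):
    """Instead of scanning the whole time axis, evaluate the objective
    only within +/- bound of the current timestamp ts and keep the
    best candidate (at integer time slots, for simplicity)."""
    candidates = range(ts - bound, ts + bound + 1)
    return max(candidates, key=objective)
```

With a bound of 3, a document at timestamp 10 is only ever compared against slots 7 through 13, so each synchronization step costs O(bound) evaluations per document instead of O(T).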
IV. Conclusion
Our primary aim is to extract common topics
from multiple asynchronous text sequences.
We propose a method that extracts
common topics while fixing the potential asynchronism
among the sequences. A self-refining process is
introduced by exploiting the correlation between the
semantic and temporal information in the sequences,
optimizing a unified objective function by alternating
between common-topic extraction and time synchronization.
Our algorithm is evaluated against two baseline methods.
From asynchronous text sequences it extracts
meaningful and discriminative topics while maintaining
both the quality and quantity of the results, and its
performance is robust to random initialization and parameter settings.