Mining relevant facts from the web at the time of need has been an arduous task. Research across
diverse fields is fine-tuning methodologies toward this goal, extracting the information most relevant
to the user's search query. The methodology proposed in this paper eases search complexity by tackling
the severe issues that hinder the performance of traditional approaches. It first uses the FP-Growth
algorithm to find all possible semantically relatable frequent sets. Their outcome, in turn, fuels a
bio-inspired Fuzzy PSO that finds the optimal attractor points around which web documents cluster,
meeting the requirements of the search query without losing relevance. On the whole, the proposed
system optimizes an objective function that minimizes intra-cluster differences and maximizes
inter-cluster distances while keeping all possible relationships with the search context intact. The
major contribution is that the system finds all possible combinations matching the user's search
transaction, thereby making the results more meaningful. The relatable sets form the set of particles
for both Fuzzy Clustering and PSO; the system is thus unbiased and retains its innate behaviour as any
number of new additions follow the herd. Evaluations reveal that the proposed methodology fares well
as an optimized and effective enhancement over conventional approaches.
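The abstract leaves the Fuzzy PSO step unspecified. As a rough, hypothetical sketch of how particle swarm optimization drifts a swarm toward an optimal attractor point, here is a minimal PSO on a toy 1-D objective standing in for the intra/inter-cluster criterion (all parameter values are illustrative assumptions, not the paper's):

```python
import random

def pso_minimize(f, lo, hi, n_particles=20, iters=100, seed=0):
    """Minimal particle swarm optimization of a 1-D objective f over [lo, hi]."""
    rng = random.Random(seed)
    # Initialise particle positions and velocities.
    xs = [rng.uniform(lo, hi) for _ in range(n_particles)]
    vs = [0.0] * n_particles
    pbest = xs[:]                      # each particle's best position so far
    gbest = min(xs, key=f)             # swarm-wide best position
    w, c1, c2 = 0.7, 1.5, 1.5          # inertia and attraction coefficients
    for _ in range(iters):
        for i in range(n_particles):
            r1, r2 = rng.random(), rng.random()
            # Velocity update: inertia + pull toward personal and global best.
            vs[i] = (w * vs[i]
                     + c1 * r1 * (pbest[i] - xs[i])
                     + c2 * r2 * (gbest - xs[i]))
            xs[i] = min(hi, max(lo, xs[i] + vs[i]))
            if f(xs[i]) < f(pbest[i]):
                pbest[i] = xs[i]
                if f(xs[i]) < f(gbest):
                    gbest = xs[i]
    return gbest

# Toy objective: documents "cluster" best around x = 3.
best = pso_minimize(lambda x: (x - 3.0) ** 2, 0.0, 10.0)
```

The herd behaviour mentioned above corresponds to the `gbest` pull: every particle, including newly added ones, is attracted toward the swarm-wide best point.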
TEXT CLUSTERING USING INCREMENTAL FREQUENT PATTERN MINING APPROACH (IJDKP)
Text mining is an emerging research field evolving from the information retrieval area. Clustering and
classification are the two approaches in data mining which may also be used to perform text classification
and text clustering. The former is supervised while the latter is unsupervised. In this paper, our objective is
to perform text clustering by defining an improved distance metric to compute the similarity between two
text files. We use incremental frequent pattern mining to find frequent items and reduce dimensionality.
The improved distance metric may also be used to perform text classification. The distance metric is
validated for the worst, average and best case situations [15]. The results show the proposed distance
metric outperforms the existing measures.
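The improved distance metric itself is not reproduced in this summary. A minimal sketch of the general idea — a Jaccard-style distance restricted to mined frequent terms — might look as follows (the functions, thresholds, and toy documents are illustrative assumptions, not the paper's actual metric):

```python
def frequent_terms(docs, min_support):
    """Terms appearing in at least min_support documents."""
    counts = {}
    for doc in docs:
        for term in set(doc.split()):
            counts[term] = counts.get(term, 0) + 1
    return {t for t, c in counts.items() if c >= min_support}

def distance(doc_a, doc_b, frequent):
    """Jaccard distance restricted to the frequent-term dimensions."""
    a = set(doc_a.split()) & frequent
    b = set(doc_b.split()) & frequent
    if not (a | b):
        return 1.0
    return 1.0 - len(a & b) / len(a | b)

docs = ["data mining of text",
        "text clustering with frequent patterns",
        "frequent pattern mining for text"]
freq = frequent_terms(docs, min_support=2)  # dimensionality reduction
d01 = distance(docs[0], docs[1], freq)
```

Restricting the comparison to frequent terms is what gives the dimensionality reduction described above: rare terms simply never enter the distance computation.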
This document discusses the use of fuzzy queries to retrieve information from databases. Fuzzy queries allow for imprecise or vague terms to be used in queries, similar to natural language. The document first provides background on limitations of traditional database queries. It then discusses how fuzzy set theory and membership functions can be applied to queries and data to handle uncertain terms. The proposed approach applies fuzzy queries to a relational database, defining linguistic variables and membership functions. This allows information to be retrieved based on fuzzy criteria and improves the ability to query databases using human-like terms. Benefits of fuzzy queries include more natural interaction and accounting for real-world data imperfections.
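As a small illustration of the membership-function idea, a fuzzy selection over a toy relation might be sketched as follows (the relation, the linguistic variable "young", and its triangular membership function are hypothetical, not taken from the document):

```python
def triangular(x, a, b, c):
    """Triangular membership function: 0 outside [a, c], peak 1 at b."""
    if x <= a or x >= c:
        return 0.0
    if x <= b:
        return (x - a) / (b - a)
    return (c - x) / (c - b)

# Hypothetical relation (name, age) and linguistic variable "young".
people = [("ann", 22), ("bob", 35), ("cid", 60)]

def young(age):
    return triangular(age, 0, 20, 45)

# Fuzzy query: select rows that are "young" to degree >= 0.3.
result = [(name, young(age)) for name, age in people if young(age) >= 0.3]
```

Unlike a crisp `age < 30` predicate, the fuzzy query returns both ann and bob, each with a degree of match, which is the "human-like" behaviour the document describes.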
ONTOLOGICAL TREE GENERATION FOR ENHANCED INFORMATION RETRIEVAL (ijaia)
This document proposes a methodology to extract information from big data sources like course handouts and directories and represent it in a graphical, ontological tree format. Keywords are extracted from documents using natural language processing techniques and used to generate a hierarchical tree based on the DMOZ open directory project. The trees provide a comprehensive overview of document content and structure. The method is implemented using Python for natural language processing and Java for visualization. Evaluation on computer science course handouts shows the trees accurately represent topic coverage and depth. Future work aims to increase the number of keywords extracted.
Seeds Affinity Propagation Based on Text Clustering (IJRES Journal)
The objective is to find, among all partitions of the data set, the best partitioning according to some quality measure. Affinity propagation is a low-error, high-speed, flexible, and remarkably simple clustering algorithm that may be used in forming teams of participants for business simulations and experiential exercises, and in organizing participants' preferences for the parameters of simulations. This paper proposes an efficient Affinity Propagation algorithm that guarantees the same clustering result as the original algorithm after convergence. The heart of our approach is (1) to prune unnecessary message exchanges in the iterations and (2) to compute the convergence values of pruned messages after the iterations to determine clusters.
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
This document provides a survey of optimization approaches that have been applied to text document clustering. It discusses several clustering algorithms and categorizes them as partitioning methods, hierarchical methods, density-based methods, grid-based methods, model-based methods, frequent pattern-based clustering, and constraint-based clustering. It then describes several soft computing techniques that have been used as optimization approaches for text document clustering, including genetic algorithms, bees algorithms, particle swarm optimization, and ant colony optimization. These optimization techniques perform a global search to improve the quality and efficiency of document clustering algorithms.
Text document clustering and similarity detection is a major part of document management, where every document should be identified by its key terms and domain knowledge. Based on similarity, the documents are grouped into clusters. Several approaches to document similarity calculation have been proposed in existing systems, but they are either term based or pattern based, and they suffer from several problems. To address this challenging environment, the proposed system presents an innovative model for document similarity by applying a back propagation time stamp (BPTT) algorithm. It discovers patterns in text documents as higher-level features and creates a network for fast grouping. It also detects the most appropriate patterns based on their weights, and BPTT performs the document similarity measures. Using this approach, documents can be categorized easily, and the training process problems are reduced. The BPTT framework has been implemented and evaluated on the .NET platform with different sets of datasets.
The document discusses text mining and summarizes several key points:
1) Text mining involves deriving patterns and trends from text to discover useful knowledge, but it is challenging to accurately evaluate features due to issues like polysemy and synonymy.
2) Phrase-based approaches could perform better than term-based approaches by carrying more semantic meaning, but have faced challenges due to low phrase frequencies and redundant/noisy phrases.
3) The proposed approach uses pattern mining to discover specific patterns and evaluates term weights based on pattern distributions rather than full document distributions to address misinterpretation issues and improve accuracy.
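Point 3 can be sketched in miniature. Assuming discovered patterns are frequent term sets with support counts (the patterns and numbers below are made up for illustration), a pattern-distribution term weight might be computed as:

```python
def term_weights(patterns):
    """Weight each term by the summed support of the discovered
    patterns containing it, normalised to [0, 1].
    `patterns` maps a frozenset of terms to its support count."""
    weights = {}
    for pattern, support in patterns.items():
        for term in pattern:
            weights[term] = weights.get(term, 0) + support
    top = max(weights.values())
    return {term: w / top for term, w in weights.items()}

# Hypothetical discovered patterns and supports (illustrative numbers).
patterns = {
    frozenset({"text", "mining"}): 4,
    frozenset({"text", "clustering"}): 3,
    frozenset({"fuzzy"}): 1,
}
w = term_weights(patterns)
```

A term's weight here depends on how it is distributed across mined patterns rather than on raw document frequency, which is the shift the summary describes.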
This document presents a feature clustering algorithm to reduce the dimensionality of feature vectors for text classification. The algorithm groups words in documents into clusters based on similarity, with each cluster characterized by a membership function. Words not similar to existing clusters form new clusters. This avoids specifying features in advance and the need for trial and error. Experimental results showed the method can classify text faster and with better extracted features than other methods.
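A minimal sketch of such incremental feature clustering, under the assumption that words are compared to cluster centres by cosine similarity and a new cluster is opened when no centre is similar enough (the word vectors and threshold below are hypothetical):

```python
def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

def incremental_cluster(word_vectors, threshold):
    """Assign each word to the most similar existing cluster centre,
    or start a new cluster when no centre is similar enough."""
    centres, members = [], []
    for word, vec in word_vectors:
        sims = [cosine(vec, c) for c in centres]
        if sims and max(sims) >= threshold:
            i = sims.index(max(sims))
            members[i].append(word)
            # Update the centre as the running mean of its members.
            n = len(members[i])
            centres[i] = [(c * (n - 1) + x) / n for c, x in zip(centres[i], vec)]
        else:
            centres.append(list(vec))
            members.append([word])
    return members

# Toy class-distribution vectors for words (hypothetical values).
words = [("ball", [0.9, 0.1]), ("goal", [0.8, 0.2]),
         ("vote", [0.1, 0.9]), ("poll", [0.2, 0.8])]
clusters = incremental_cluster(words, threshold=0.95)
```

Because clusters are created on demand, the number of extracted features never has to be specified in advance, which is the trial-and-error the method avoids.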
Semi-supervised learning approach using modified self-training algorithm to c... (IJECEIAES)
Burst header packet flooding is an attack on optical burst switching (OBS) network which may cause denial of service. Application of machine learning technique to detect malicious nodes in OBS network is relatively new. As finding sufficient amount of labeled data to perform supervised learning is difficult, semi-supervised method of learning (SSML) can be leveraged. In this paper, we studied the classical self-training algorithm (ST) which uses SSML paradigm. Generally, in ST, the available true-labeled data (L) is used to train a base classifier. Then it predicts the labels of unlabeled data (U). A portion from the newly labeled data is removed from U based on prediction confidence and combined with L. The resulting data is then used to re-train the classifier. This process is repeated until convergence. This paper proposes a modified self-training method (MST). We trained multiple classifiers on L in two stages and leveraged agreement among those classifiers to determine labels. The performance of MST was compared with ST on several datasets and significant improvement was found. We applied the MST on a simulated OBS network dataset and found very high accuracy with a small number of labeled data. Finally we compared this work with some related works.
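The classical self-training loop described above can be sketched with a toy 1-D nearest-centroid classifier (the data, margin-based confidence, and threshold are illustrative assumptions, not the paper's OBS setup):

```python
def nearest_centroid_fit(labeled):
    """Compute a 1-D centroid per class from (x, label) pairs."""
    sums, counts = {}, {}
    for x, y in labeled:
        sums[y] = sums.get(y, 0.0) + x
        counts[y] = counts.get(y, 0) + 1
    return {y: sums[y] / counts[y] for y in sums}

def predict(centroids, x):
    """Label of the nearest centroid, with a margin-based confidence."""
    dists = sorted((abs(x - c), y) for y, c in centroids.items())
    (d1, label), (d2, _) = dists[0], dists[1]
    return label, d2 - d1   # margin between the two nearest centroids

def self_train(labeled, unlabeled, conf_threshold, rounds=10):
    """Classical self-training: repeatedly label the unlabeled points we
    are most confident about and fold them into the training set."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        centroids = nearest_centroid_fit(labeled)
        scored = [(predict(centroids, x), x) for x in unlabeled]
        confident = [(yc, x) for yc, x in scored if yc[1] >= conf_threshold]
        if not confident:
            break
        for (y, _), x in confident:
            labeled.append((x, y))
            unlabeled.remove(x)
    return nearest_centroid_fit(labeled)

seed = [(0.0, "benign"), (10.0, "attack")]   # small true-labeled set L
pool = [1.0, 2.0, 8.5, 9.0]                  # unlabeled set U
model = self_train(seed, pool, conf_threshold=3.0)
pred, _ = predict(model, 1.5)
```

The proposed MST differs from this classical loop by training multiple classifiers in two stages and using their agreement, rather than a single classifier's confidence, to pick labels.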
A Survey on Fuzzy Association Rule Mining Methodologies (IOSR Journals)
Abstract: Fuzzy association rule mining (Fuzzy ARM) uses fuzzy logic to generate interesting association
rules. These association relationships can help in decision making for the solution of a given problem. Fuzzy
ARM is a variant of classical association rule mining, which uses the concept of crisp sets and for that
reason has several drawbacks. The concept of fuzzy association rule mining arose to overcome those
drawbacks. Today a huge number of different types of fuzzy association rule mining algorithms appear in
research works, and day by day these algorithms are getting better; but as the problem domain is also
becoming more complex in nature, continuous research work is still going on. In this paper, we have studied
several well-known methodologies and algorithms for fuzzy association rule mining. Four important
methodologies are briefly discussed, showing the recent trends and the future scope of research in the field
of fuzzy association rule mining.
Keywords: Knowledge discovery in databases, Data mining, Fuzzy association rule mining, Classical
association rule mining, Very large datasets, Minimum support, Cardinality, Certainty factor, Redundant rule,
Equivalence, Equivalent rules
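As a concrete miniature of how fuzzy ARM departs from crisp sets: the fuzzy support of an itemset is commonly computed by combining membership degrees with a t-norm such as min, then averaging over transactions. A hypothetical sketch (the items and degrees are invented for illustration):

```python
def fuzzy_support(transactions, itemset):
    """Fuzzy support: per transaction, combine the membership degrees
    of the itemset's fuzzy items with the min t-norm, then average."""
    total = 0.0
    for t in transactions:
        total += min(t.get(item, 0.0) for item in itemset)
    return total / len(transactions)

# Hypothetical transactions: membership degree of each fuzzy item,
# e.g. "income=high" holds to degree 0.8.
tx = [
    {"income=high": 0.8, "spend=high": 0.7},
    {"income=high": 0.4, "spend=high": 0.9},
    {"income=high": 0.0, "spend=high": 0.2},
]
support = fuzzy_support(tx, ["income=high", "spend=high"])
# Confidence of the rule income=high -> spend=high.
confidence = support / fuzzy_support(tx, ["income=high"])
```

A crisp miner would count the second transaction either fully in or fully out; the fuzzy version lets it contribute partially (here to degree 0.4), which is the core difference the survey covers.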
Using data mining methods knowledge discovery for text mining (eSAT Journals)
Abstract: Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than the term-based ones, but many experiments do not support this hypothesis. The proposed work presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Keywords: Text mining, text classification, pattern mining, pattern evolving, information filtering.
Data mining involves using machine learning and statistical methods to discover patterns in large datasets and is useful in bioinformatics for analyzing biological data. Bioinformatics analyzes data from sequences, molecules, gene expressions, and pathways. Data mining can help understand these rapidly growing biological datasets. Common data mining tools in bioinformatics include BLAST for sequence comparisons, Entrez for integrated database searching, and ORF Finder for identifying open reading frames. Data mining approaches are well-suited to the enormous volumes of data in bioinformatics databases.
The document presents Kuralagam, a concept relation based search engine for Thirukkural (a classic Tamil literature). It uses CoReX, a concept relation indexing technique, to index Thirukkurals based on concepts and relations rather than keywords. This allows retrieval of semantically related Thirukkurals. The search engine was evaluated against traditional keyword search and found to have higher average precision (0.83 vs 0.52). It presents results in three tabs based on exact matches, related concepts, and expanded queries.
The document proposes a new approach to compare stock market patterns to DNA sequences using compression techniques. Stock market data is converted to binary sequences representing increases and decreases, which are then encoded into DNA nucleotides. These nucleotide sequences are divided and matched against human genome sequences using BLAST. The analysis found certain sub-sequences of the stock market patterns matched 100% to the human genome, suggesting this approach could potentially predict stock market behavior.
Conceptual similarity measurement algorithm for domain specific ontology (Zac Darcy)
This paper presents the similarity measurement algorithm for domain specific terms collected in the ontology based data integration system. This similarity measurement algorithm can be used in the ontology mapping and query service of an ontology based data integration system. In this paper, we focus on the web query service to apply this proposed algorithm. Concept similarity is important for the web query service because the words in the user's input query are not wholly the same as the concepts in the ontology. So, we need to extract the possible concepts that match or are related to the input words with the help of the machine readable dictionary WordNet. Sometimes, we use the generated mapping rules in the query generation procedure for words whose similarity cannot be confirmed by WordNet. We prove the effect of this algorithm with the two degree semantic results of web mining by generating the concept results obtained from the input query.
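Concept similarity over a taxonomy such as WordNet can be illustrated with a Wu-Palmer-style measure on a toy is-a hierarchy (the tiny taxonomy below is a hypothetical stand-in, not WordNet itself):

```python
# Toy is-a taxonomy standing in for WordNet (hypothetical concepts).
parent = {
    "cat": "mammal", "dog": "mammal", "mammal": "animal",
    "sparrow": "bird", "bird": "animal", "animal": "entity",
}

def path_to_root(concept):
    """The concept followed by its ancestors up to the root."""
    path = [concept]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path

def wu_palmer(a, b):
    """Wu-Palmer-style similarity: 2*depth(lcs) / (depth(a) + depth(b)),
    with depth counted from the root ('entity' has depth 1)."""
    pa, pb = path_to_root(a), path_to_root(b)
    depth = {c: len(pa) - i for i, c in enumerate(pa)}  # depths along a's path
    lcs_depth = max(depth[c] for c in pb if c in depth)  # deepest shared ancestor
    return 2.0 * lcs_depth / (len(pa) + len(pb))

sim_cat_dog = wu_palmer("cat", "dog")          # share the deep node "mammal"
sim_cat_sparrow = wu_palmer("cat", "sparrow")  # share only "animal"
```

Concepts sharing a deeper common ancestor score higher, which is how query words can be matched to ontology concepts that are related rather than identical.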
Review: Semi-Supervised Learning Methods for Word Sense Disambiguation (IOSR Journals)
This document provides a review of semi-supervised learning methods for word sense disambiguation. It discusses how semi-supervised learning uses both labeled and unlabeled data, requiring only a small amount of labeled data. The document outlines several semi-supervised learning techniques for word sense disambiguation, including bootstrapping algorithms like Yarowsky's algorithm, and graph-based approaches like label propagation. It provides details on Yarowsky's bootstrapping algorithm and how it is able to generalize to label new examples through exploiting properties like one-sense-per-collocation and language redundancy.
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION (cscpconf)
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text
classification. In this paper, fast fuzzy feature clustering for text classification is proposed. It
is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011.
Each word in the feature vector of a document is grouped into a cluster in fewer iterations. The
number of iterations required to obtain the cluster centers is reduced by transforming the cluster
center dimension from n dimensions to 2 dimensions. Principal Component Analysis with a slight
change is used for the dimension reduction. Experimental results show that this method improves
performance by significantly reducing the number of iterations required to obtain the cluster
centers. The same is verified with three benchmark datasets.
Enhancing Keyword Query Results Over Database for Improving User Satisfaction (ijmpict)
Storing data in relational databases that support keyword queries is widely increasing, but search results often do not give effective answers to a keyword query, which is inflexible from the user's perspective. It would be helpful to recognize the queries that return low-ranking results. Here we estimate query performance prediction to find out the effectiveness of a search performed in response to a query, and the features of such hard queries are studied by taking into account the contents of the database and the result list. One related database problem is the presence of missing data, which can be handled by imputation. Here an inTeractive Retrieving-Inferring data imputation method (TRIP) is used, which retrieves and infers alternately to fill the missing attribute values in the database. So by considering both the prediction of hard queries and imputation over the database, we can get better keyword search results.
An efficient algorithm for sequence generation in data mining (ijcisjournal)
Data mining is the method or the activity of analyzing data from different perspectives and summarizing it
into useful information. There are several major data mining techniques that have been developed and are
used in the data mining projects which include association, classification, clustering, sequential patterns,
prediction and decision tree. Among different tasks in data mining, sequential pattern mining is one of the
most important tasks. Sequential pattern mining involves the mining of the subsequences that appear
frequently in a set of sequences. It has a variety of applications in several domains such as the analysis of
customer purchase patterns, protein sequence analysis, DNA analysis, gene sequence analysis, web access
patterns, seismologic data and weather observations. Various models and algorithms have been developed
for the efficient mining of sequential patterns in large amounts of data. This research paper analyzes the
efficiency of three sequence generation algorithms namely GSP, SPADE and PrefixSpan on a retail dataset
by applying various performance factors. From the experimental results, it is observed that the PrefixSpan
algorithm is more efficient than other two algorithms.
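The projected-database idea behind PrefixSpan, the best performer in the comparison above, can be shown with a minimal sketch for sequences of single items (this simplification ignores itemset elements within sequence positions and is not the benchmarked implementation):

```python
def prefixspan(sequences, min_support):
    """Minimal PrefixSpan for sequences of single items: recursively
    grow frequent prefixes by scanning projected databases."""
    results = []

    def mine(prefix, projected):
        # Count, per item, how many projected suffixes contain it.
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, count in sorted(counts.items()):
            if count < min_support:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, count))
            # Project each suffix past the first occurrence of item.
            new_proj = [seq[seq.index(item) + 1:]
                        for seq in projected if item in seq]
            mine(new_prefix, new_proj)

    mine([], sequences)
    return results

db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
patterns = prefixspan(db, min_support=2)
```

Because each recursive call scans only the suffixes that can still extend the current prefix, no candidate sequences are generated and re-tested against the whole database, which is the source of PrefixSpan's efficiency relative to GSP.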
This document summarizes a research paper that proposes a multidimensional data mining algorithm to determine association rules across different granularities. The algorithm addresses weaknesses in existing techniques, such as having to rescan the entire database when new attributes are added. It uses a concept taxonomy structure to represent the search space and finds association patterns by selecting concepts from individual taxonomies. An experiment on a wholesale business dataset demonstrates that the algorithm is linear and highly scalable to the number of records and can flexibly handle different data types.
This document summarizes a study on multilabel text classification and the effect of label hierarchy. The study implements various algorithms for multilabel classification, including naive Bayes, k-nearest neighbors, random forests, SVMs, RBMs, and hierarchical classification algorithms. It evaluates the algorithms on four datasets that vary in features, labels, training/test sizes, and label cardinality. The goal is to analyze how different algorithmic approaches and dataset properties affect classification performance, particularly for hierarchical learning algorithms. Evaluation measures include micro/macro-averaged precision, recall and F1-score. The document provides details on the problem formulation, algorithms, implementation, datasets and evaluation.
This document discusses hierarchical clustering and similarity measures for document clustering. It summarizes that hierarchical clustering creates a hierarchical decomposition of data objects through either agglomerative or divisive approaches. The success of clustering depends on the similarity measure used, with traditional measures using a single viewpoint, while multiviewpoint measures use different viewpoints to increase accuracy. The paper then focuses on applying a multiviewpoint similarity measure to hierarchical clustering of documents.
IRJET- Text Document Clustering using K-Means Algorithm (IRJET Journal)
This document discusses using the K-Means clustering algorithm to cluster text documents and compares it to using K-Means clustering with dimension reduction techniques. It uses the BBC Sports dataset containing 737 documents in 5 classes. The document outlines preprocessing the text, creating a document term matrix, applying K-Means clustering, and using dimension reduction techniques like InfoGain before clustering. It evaluates the different methods using precision, recall, accuracy, and F-measure, finding that K-Means with InfoGain dimension reduction outperforms standard K-Means clustering.
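The K-Means step described above can be sketched in plain Python on toy term-frequency vectors (the deterministic first-k initialisation, the tiny vocabulary, and the vectors are simplifying assumptions, not the BBC Sports pipeline):

```python
def kmeans(vectors, k, iters=20):
    """Plain k-means on dense term-frequency vectors (squared Euclidean)."""
    # Naive deterministic initialisation: the first k vectors as centres.
    centres = [list(v) for v in vectors[:k]]
    assign = [0] * len(vectors)
    for _ in range(iters):
        # Assignment step: nearest centre by squared Euclidean distance.
        for i, v in enumerate(vectors):
            assign[i] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(v, centres[c])))
        # Update step: each centre becomes the mean of its members.
        for c in range(k):
            members = [vectors[i] for i in range(len(vectors)) if assign[i] == c]
            if members:
                centres[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

# Toy term-frequency vectors over a vocabulary (goal, match, vote, poll).
docs = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
labels = kmeans(docs, k=2)
```

Dimension reduction such as InfoGain, as evaluated in the document, would shrink each vector before this loop runs, which both speeds up the distance computations and removes noisy terms.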
Combining inductive and analytical learning (swapnac12)
The document discusses combining inductive and analytical learning methods. Inductive methods like decision trees and neural networks seek general hypotheses based solely on training data, while analytical methods like PROLOG-EBG seek hypotheses based on prior knowledge and training data. The document argues that combining these approaches could overcome their individual limitations by using prior knowledge to guide inductive learning methods. It examines using prior knowledge to initialize hypotheses, alter the search objective, and modify the search operators of inductive learners. As an example, it describes the KBANN algorithm, which initializes a neural network to perfectly fit a domain theory before refining it inductively to the training data.
How does your media product represent particular social groups? (chloe_mullahy)
The document discusses how the media product represents particular social groups. It aims its R&B magazine at youth aged 15-30 and those in social groups D and E, who are unemployed, students, or unskilled workers. To attract this demographic, the magazine features models wearing basic clothing like hoodies that would be relatable and affordable. It also keeps pricing around £1.99 to be accessible. Through this representation of models and consideration of pricing, the magazine adapts its content to effectively reach its target demographic.
UNIVERSITY BUSES ROUTING AND TRACKING SYSTEM (ijcseit)
This paper proposes the development of an Android app to improve the transportation services of the bus
rental companies that transport Taibah University students. It intends to reduce students' waiting time
for buses and thereby to stimulate the sharing of updated information between bus drivers and students.
The application runs only on Android devices. It informs the students about the exact arrival and
departure times of buses on a route. The proposed app would specifically be used by students and drivers
of Taibah University. Any change in the scheduled movement of the buses is updated in the software, and
regular alerts are sent in case of delays or cancellation of buses. Bus locations and routes are shown on
dynamic maps using Google Maps. The application was designed and tested, and the users confirmed that it
gives real-time service and is very helpful for them.
This document discusses hematopoietic stem cell transplantation (HSCT) associated thrombotic microangiopathy (TMA). TMA causes symptoms like elevated blood pressure, acute kidney injury, proteinuria, hemolysis, anemia, and thrombocytopenia. Causes of TMA include viruses, calcineurin inhibitors, sirolimus, chemotherapy, radiation, complement disorders, graft-versus-host disease, and coagulation abnormalities. Treatment focuses on altering graft-versus-host disease treatment, though data on therapies like plasma exchange is limited. Newer complement-based therapies also show promise.
Semi-supervised learning approach using modified self-training algorithm to c...IJECEIAES
Burst header packet flooding is an attack on optical burst switching (OBS) network which may cause denial of service. Application of machine learning technique to detect malicious nodes in OBS network is relatively new. As finding sufficient amount of labeled data to perform supervised learning is difficult, semi-supervised method of learning (SSML) can be leveraged. In this paper, we studied the classical self-training algorithm (ST) which uses SSML paradigm. Generally, in ST, the available true-labeled data (L) is used to train a base classifier. Then it predicts the labels of unlabeled data (U). A portion from the newly labeled data is removed from U based on prediction confidence and combined with L. The resulting data is then used to re-train the classifier. This process is repeated until convergence. This paper proposes a modified self-training method (MST). We trained multiple classifiers on L in two stages and leveraged agreement among those classifiers to determine labels. The performance of MST was compared with ST on several datasets and significant improvement was found. We applied the MST on a simulated OBS network dataset and found very high accuracy with a small number of labeled data. Finally we compared this work with some related works.
A Survey on Fuzzy Association Rule Mining MethodologiesIOSR Journals
Abstract: Fuzzy association rule mining (Fuzzy ARM) uses fuzzy logic to generate interesting association rules. These association relationships can help in decision making for the solution of a given problem. Fuzzy ARM is a variant of classical association rule mining, which uses the concept of crisp sets and therefore has several drawbacks; the concept of fuzzy association rule mining was introduced to overcome them. Today a large number of different types of fuzzy association rule mining algorithms exist in the research literature, and these algorithms are improving day by day. But as the problem domain is also becoming more complex in nature, continuous research is still going on. In this paper, we have studied several well-known methodologies and algorithms for fuzzy association rule mining. Four important methodologies are briefly discussed, which show the recent trends and future scope of research in the field of fuzzy association rule mining.
Keywords: Knowledge discovery in databases, Data mining, Fuzzy association rule mining, Classical association rule mining, Very large datasets, Minimum support, Cardinality, Certainty factor, Redundant rule, Equivalence, Equivalent rules
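The fuzzy support measure these methodologies build on can be illustrated in a few lines; the fuzzified attributes and membership degrees below are invented for the example, and the minimum t-norm stands in for whichever combination operator a given algorithm uses:

```python
# Sketch: fuzzy support of an itemset, assuming each transaction stores a
# membership degree in [0, 1] per fuzzified attribute; data is illustrative.
def fuzzy_support(transactions, itemset):
    """Fuzzy support = sum over transactions of the minimum membership
    degree among the itemset's items, divided by the transaction count."""
    total = 0.0
    for t in transactions:
        degrees = [t.get(item, 0.0) for item in itemset]
        total += min(degrees)  # t-norm: minimum
    return total / len(transactions)

# Each transaction maps a fuzzified attribute to a membership degree.
transactions = [
    {"age.young": 0.8, "income.low": 0.6},
    {"age.young": 0.3, "income.low": 0.9},
    {"age.young": 0.0, "income.low": 0.4},
    {"age.young": 1.0, "income.low": 1.0},
]
s = fuzzy_support(transactions, ["age.young", "income.low"])
```

Because degrees are graded rather than crisp, partially matching transactions still contribute to support, which is the drawback of crisp sets that fuzzy ARM addresses.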
Using data mining methods knowledge discovery for text miningeSAT Journals
Abstract: Many data mining techniques have been proposed for mining useful patterns in text documents. However, how to effectively use and update discovered patterns is still an open research issue, especially in the domain of text mining. Since most existing text mining methods adopted term-based approaches, they all suffer from the problems of polysemy and synonymy. Over the years, people have often held the hypothesis that pattern (or phrase)-based approaches should perform better than term-based ones, but many experiments do not support this hypothesis. The proposed work presents an innovative and effective pattern discovery technique which includes the processes of pattern deploying and pattern evolving, to improve the effectiveness of using and updating discovered patterns for finding relevant and interesting information. Keywords: Text mining, text classification, pattern mining, pattern evolving, information filtering.
Data mining involves using machine learning and statistical methods to discover patterns in large datasets and is useful in bioinformatics for analyzing biological data. Bioinformatics analyzes data from sequences, molecules, gene expressions, and pathways. Data mining can help understand these rapidly growing biological datasets. Common data mining tools in bioinformatics include BLAST for sequence comparisons, Entrez for integrated database searching, and ORF Finder for identifying open reading frames. Data mining approaches are well-suited to the enormous volumes of data in bioinformatics databases.
The document presents Kuralagam, a concept relation based search engine for Thirukkural (a classic Tamil literature). It uses CoReX, a concept relation indexing technique, to index Thirukkurals based on concepts and relations rather than keywords. This allows retrieval of semantically related Thirukkurals. The search engine was evaluated against traditional keyword search and found to have higher average precision (0.83 vs 0.52). It presents results in three tabs based on exact matches, related concepts, and expanded queries.
The document proposes a new approach to compare stock market patterns to DNA sequences using compression techniques. Stock market data is converted to binary sequences representing increases and decreases, which are then encoded into DNA nucleotides. These nucleotide sequences are divided and matched against human genome sequences using BLAST. The analysis found certain sub-sequences of the stock market patterns matched 100% to the human genome, suggesting this approach could potentially predict stock market behavior.
Conceptual similarity measurement algorithm for domain specific ontologyZac Darcy
This paper presents a similarity measurement algorithm for domain-specific terms collected in an ontology-based data integration system. The algorithm can be used in the ontology mapping and query services of such a system; in this paper, we focus on applying it to the web query service. Concept similarity is important for the web query service because the words in a user's input query do not wholly match the concepts in the ontology. So we need to extract the possible concepts that match or relate to the input words with the help of the machine-readable dictionary WordNet. Sometimes we use generated mapping rules in the query generation procedure for words whose similarity cannot be confirmed by WordNet. We demonstrate the effect of this algorithm with two-degree semantic results of web mining by generating concept results from the input query.
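A path-based concept similarity of the kind computed over WordNet can be sketched on a toy is-a taxonomy; the hierarchy below is hand-made and the Wu-Palmer formula is one common choice, not necessarily this paper's exact measure:

```python
# Sketch of a path-based concept similarity (Wu-Palmer style) over a
# hand-made toy is-a taxonomy; in practice WordNet supplies the hierarchy.
TOY_IS_A = {  # child -> parent
    "dog": "canine", "cat": "feline", "canine": "mammal", "feline": "mammal",
    "mammal": "animal", "bird": "animal", "animal": "entity",
}

def ancestors(concept):
    """Chain from the concept up to the root, inclusive."""
    chain = [concept]
    while concept in TOY_IS_A:
        concept = TOY_IS_A[concept]
        chain.append(concept)
    return chain

def depth(concept):
    return len(ancestors(concept))  # the root "entity" has depth 1

def wu_palmer(c1, c2):
    """sim = 2 * depth(lcs) / (depth(c1) + depth(c2))."""
    common = set(ancestors(c2))
    lcs = next(a for a in ancestors(c1) if a in common)  # lowest common subsumer
    return 2 * depth(lcs) / (depth(c1) + depth(c2))

sim = wu_palmer("dog", "cat")
```

"dog" and "cat" share the subsumer "mammal" and score 0.6, while "dog" and "bird" only meet at "animal" and score lower, mirroring how related-but-not-identical query words get graded similarity.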
Review: Semi-Supervised Learning Methods for Word Sense DisambiguationIOSR Journals
This document provides a review of semi-supervised learning methods for word sense disambiguation. It discusses how semi-supervised learning uses both labeled and unlabeled data, requiring only a small amount of labeled data. The document outlines several semi-supervised learning techniques for word sense disambiguation, including bootstrapping algorithms like Yarowsky's algorithm, and graph-based approaches like label propagation. It provides details on Yarowsky's bootstrapping algorithm and how it is able to generalize to label new examples through exploiting properties like one-sense-per-collocation and language redundancy.
FAST FUZZY FEATURE CLUSTERING FOR TEXT CLASSIFICATION cscpconf
Feature clustering is a powerful method to reduce the dimensionality of feature vectors for text classification. In this paper, fast fuzzy feature clustering for text classification is proposed. It is based on the framework proposed by Jung-Yi Jiang, Ren-Jia Liou and Shie-Jue Lee in 2011. The words in the feature vector of a document are grouped into clusters in fewer iterations. The number of iterations required to obtain cluster centers is reduced by transforming the cluster-center dimension from n dimensions to 2 dimensions. Principal Component Analysis with a slight change is used for dimension reduction. Experimental results show that this method improves performance by significantly reducing the number of iterations required to obtain the cluster centers. The same is verified with three benchmark datasets.
Enhancing Keyword Query Results Over Database for Improving User Satisfaction ijmpict
Storing data in relational databases to support keyword queries is widely increasing, but search results often do not give effective answers to a keyword query, which makes search inflexible from the user's perspective. It would be helpful to recognize the type of queries that yield results with low ranking. Here we estimate query performance prediction to find out the effectiveness of a search performed in response to a query, and the features of such hard queries are studied by taking into account the contents of the database and the result list. One related database problem is the presence of missing data, which can be handled by imputation. Here an inTeractive Retrieving-Inferring data imputation method (TRIP) is used, which alternates retrieving and inferring to fill the missing attribute values in the database. So by considering both the prediction of hard queries and imputation over the database, we can get better keyword search results.
An efficient algorithm for sequence generation in data miningijcisjournal
Data mining is the process of analyzing data from different perspectives and summarizing it
into useful information. There are several major data mining techniques that have been developed and are
used in the data mining projects which include association, classification, clustering, sequential patterns,
prediction and decision tree. Among different tasks in data mining, sequential pattern mining is one of the
most important tasks. Sequential pattern mining involves the mining of the subsequences that appear
frequently in a set of sequences. It has a variety of applications in several domains such as the analysis of
customer purchase patterns, protein sequence analysis, DNA analysis, gene sequence analysis, web access
patterns, seismologic data and weather observations. Various models and algorithms have been developed
for the efficient mining of sequential patterns in large amounts of data. This research paper analyzes the
efficiency of three sequence generation algorithms namely GSP, SPADE and PrefixSpan on a retail dataset
by applying various performance factors. From the experimental results, it is observed that the PrefixSpan
algorithm is more efficient than other two algorithms.
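A minimal PrefixSpan over sequences of single items conveys the projection idea (the full algorithm also handles itemset elements); the sequence database and support threshold below are illustrative:

```python
# Minimal PrefixSpan sketch for sequences of single items; the database
# and min_support are illustrative, not the paper's retail dataset.
def prefixspan(sequences, min_support):
    results = []

    def mine(prefix, projected):
        # Count each item once per projected suffix (sequence support).
        counts = {}
        for seq in projected:
            for item in set(seq):
                counts[item] = counts.get(item, 0) + 1
        for item, cnt in sorted(counts.items()):
            if cnt < min_support:
                continue
            new_prefix = prefix + [item]
            results.append((new_prefix, cnt))
            # Project the database: suffix after the item's first occurrence.
            new_proj = [seq[seq.index(item) + 1:] for seq in projected if item in seq]
            mine(new_prefix, new_proj)

    mine([], sequences)
    return results

db = [["a", "b", "c"], ["a", "c"], ["b", "a", "c"]]
patterns = {tuple(p): c for p, c in prefixspan(db, 2)}
```

Each recursive call works only on projected suffixes, which is why PrefixSpan avoids the candidate-generation cost of GSP-style algorithms.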
This document summarizes a research paper that proposes a multidimensional data mining algorithm to determine association rules across different granularities. The algorithm addresses weaknesses in existing techniques, such as having to rescan the entire database when new attributes are added. It uses a concept taxonomy structure to represent the search space and finds association patterns by selecting concepts from individual taxonomies. An experiment on a wholesale business dataset demonstrates that the algorithm is linear and highly scalable to the number of records and can flexibly handle different data types.
This document summarizes a study on multilabel text classification and the effect of label hierarchy. The study implements various algorithms for multilabel classification, including naive Bayes, k-nearest neighbors, random forests, SVMs, RBMs, and hierarchical classification algorithms. It evaluates the algorithms on four datasets that vary in features, labels, training/test sizes, and label cardinality. The goal is to analyze how different algorithmic approaches and dataset properties affect classification performance, particularly for hierarchical learning algorithms. Evaluation measures include micro/macro-averaged precision, recall and F1-score. The document provides details on the problem formulation, algorithms, implementation, datasets and evaluation.
This document discusses hierarchical clustering and similarity measures for document clustering. It summarizes that hierarchical clustering creates a hierarchical decomposition of data objects through either agglomerative or divisive approaches. The success of clustering depends on the similarity measure used, with traditional measures using a single viewpoint, while multiviewpoint measures use different viewpoints to increase accuracy. The paper then focuses on applying a multiviewpoint similarity measure to hierarchical clustering of documents.
IRJET- Text Document Clustering using K-Means Algorithm IRJET Journal
This document discusses using the K-Means clustering algorithm to cluster text documents and compares it to using K-Means clustering with dimension reduction techniques. It uses the BBC Sports dataset containing 737 documents in 5 classes. The document outlines preprocessing the text, creating a document term matrix, applying K-Means clustering, and using dimension reduction techniques like InfoGain before clustering. It evaluates the different methods using precision, recall, accuracy, and F-measure, finding that K-Means with InfoGain dimension reduction outperforms standard K-Means clustering.
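The pipeline of building a document-term matrix and clustering it with K-Means can be sketched as follows; the tiny corpus, whitespace tokenizer and k are placeholders, not the paper's BBC Sports setup:

```python
# Sketch of the clustering pipeline: bag-of-words vectors, then a tiny
# k-means; corpus, tokenization and k are illustrative assumptions.
import math
import random

def vectorize(docs):
    """Build a document-term matrix of raw term counts."""
    vocab = sorted({w for d in docs for w in d.split()})
    return [[d.split().count(w) for w in vocab] for d in docs], vocab

def kmeans(points, k, iters=20, seed=0):
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]  # random initial centers
    assign = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point joins its nearest center.
        for i, p in enumerate(points):
            assign[i] = min(range(k), key=lambda c: math.dist(p, centers[c]))
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for i, p in enumerate(points) if assign[i] == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign

docs = ["goal match score", "match goal win", "election vote poll", "vote poll win"]
vectors, vocab = vectorize(docs)
labels = kmeans(vectors, 2)
```

Dimension-reduction variants such as the InfoGain step the paper evaluates would simply shrink `vectors` before `kmeans` is called.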
Combining inductive and analytical learningswapnac12
The document discusses combining inductive and analytical learning methods. Inductive methods like decision trees and neural networks seek general hypotheses based solely on training data, while analytical methods like PROLOG-EBG seek hypotheses based on prior knowledge and training data. The document argues that combining these approaches could overcome their individual limitations by using prior knowledge to guide inductive learning methods. It examines using prior knowledge to initialize hypotheses, alter the search objective, and modify the search operators of inductive learners. As an example, it describes the KBANN algorithm, which initializes a neural network to perfectly fit a domain theory before refining it inductively to the training data.
How does your media product represent particular socialchloe_mullahy
The document discusses how the media product represents particular social groups. It aims its R&B magazine at youth aged 15-30 and those in social groups D and E, who are unemployed, students, or unskilled workers. To attract this demographic, the magazine features models wearing basic clothing like hoodies that would be relatable and affordable. It also keeps pricing around £1.99 to be accessible. Through this representation of models and consideration of pricing, the magazine adapts its content to effectively reach its target demographic.
UNIVERSITY BUSES ROUTING AND TRACKING SYSTEMijcseit
This paper proposes the development of an Android app to improve the transportation services of bus rental
companies that transport Taibah University students. It intends to reduce students' waiting time for buses
and to stimulate sharing of updated information between the bus drivers and students. The application
can run only on android devices. It would inform the students about the exact time of arrival and departure
of buses on route. This proposed app would specifically be used by students and drivers of Taibah
University. Any change in the scheduled movement of the buses would be updated in the software. Regular
alerts would be sent in case of delays or cancelation of buses. Bus locations and routes are shown on
dynamic maps using Google Maps. The application was designed and tested, and the users confirmed that it
gives real-time service and is very helpful for them.
The female prison population in Spain and Catalonia has grown spectacularly in recent years. The main lines of social intervention with incarcerated women include alternative measures to prison, working toward social reintegration from the moment of entry into prison, providing basic services, caring for inmates' children outside prison, and improving the physical and mental health of the inmates. However, the disconnect between penitentiary social services and the social services
The document describes the design of a counterfort retaining wall. It includes the calculation of the internal friction angle, the determination of the dimensions and reinforcement of the wall stem, the sizing of the footing, and the verification of stability and of the pressures on the ground.
Ramón Alberto Valenzuela Llanos was a Chilean painter born in 1869 who studied at the Academia de Bellas Artes in Santiago and later at the Académie Julian in Paris, where he gained recognition. He was a professor at the Escuela de Bellas Artes in Santiago and was noted for his oil landscapes of the surroundings of Santiago. He received numerous prizes and honors in Chile and abroad for his distinguished artistic career.
Gujarati cuisine originates from the western Indian state of Gujarat. The cuisine is primarily vegetarian due to Jain and Buddhist influences in the region. Staple foods include millet, yogurt, buttermilk, coconut, groundnut, sesame seeds and jaggery. Signature dishes include dhokla, thepla, khandvi, shrikhand, methi dhal and undhiyu. The cuisine varies regionally, with southern Gujarat using more green vegetables and fruit and Kathiawar and Kutch regions having a predominant garlic flavor and use of spices. A basic Gujarati thali consists of rice or wheat, dal, kadhi, pulses, vegetables, sal
The mood board analysis discusses how the images collected for the mood board are unified by a combative or foreboding atmosphere, which fits well with the violent nature of stick figures in internet culture. The mood board will influence the final product's design by emulating the precision of the drawn figures and capturing the fast-paced energy of combat through a neon-shadow color scheme to indicate character alignment or motivation.
Survey on Efficient Techniques of Text Miningvivatechijri
In the current era, with the advancement of technology, more and more data is available in digital form, of which most (approximately 85%) is unstructured text. So it has become essential to develop better techniques and algorithms to extract useful and interesting information from this large amount of textual data. Text mining is the process of extracting useful data from unstructured text. The algorithms used for text mining have advantages and disadvantages. Moreover, the issues in the field of text mining that affect the accuracy and relevance of the results are identified.
Ontology based clustering algorithms aim to standardize clustering by incorporating domain knowledge through ontologies. They calculate similarity matrices between objects using ontology-based methods, then merge the closest clusters and recalculate the matrix in an iterative process. Several ontology based clustering algorithms are discussed, including Apriori, which generates frequent item sets to cluster data, and algorithms that use ontologies to weight features or perform recursive mining on an FP-tree. These algorithms integrate distributed semantic web data through ontologies to improve search, classification and reuse of knowledge resources.
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
This document discusses using particle swarm optimization to improve the k-prototype clustering algorithm. The k-prototype algorithm clusters data with both numeric and categorical attributes but can get stuck in local optima. The proposed method uses particle swarm optimization, a global optimization technique, to guide the k-prototype algorithm towards better clusterings. Particle swarm optimization models potential solutions as particles that explore the search space. It is integrated with k-prototype clustering to avoid locally optimal solutions and produce better clusterings. The method is tested on standard benchmark datasets and shown to outperform traditional k-modes and k-prototype clustering algorithms.
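How PSO explores centroid placements can be illustrated with a toy swarm minimizing clustering error on 1-D numeric data; the swarm size, coefficients and data below are illustrative assumptions, and the categorical half of k-prototype is omitted:

```python
# Toy PSO sketch: each particle encodes two 1-D centroids, and the swarm
# minimizes the sum of squared errors; all parameters are illustrative.
import random

DATA = [1.0, 1.2, 0.8, 8.0, 8.3, 7.7]

def sse(centroids):
    """Sum of squared distances of each point to its nearest centroid."""
    return sum(min((x - c) ** 2 for c in centroids) for x in DATA)

def pso(n_particles=10, iters=60, w=0.7, c1=1.5, c2=1.5, seed=1):
    rng = random.Random(seed)
    pos = [[rng.uniform(0, 10), rng.uniform(0, 10)] for _ in range(n_particles)]
    vel = [[0.0, 0.0] for _ in range(n_particles)]
    pbest = [p[:] for p in pos]                # personal best positions
    gbest = min(pbest, key=sse)[:]             # global best position
    for _ in range(iters):
        for i in range(n_particles):
            for d in range(2):
                # Velocity update: inertia + cognitive + social terms.
                vel[i][d] = (w * vel[i][d]
                             + c1 * rng.random() * (pbest[i][d] - pos[i][d])
                             + c2 * rng.random() * (gbest[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            if sse(pos[i]) < sse(pbest[i]):
                pbest[i] = pos[i][:]
                if sse(pbest[i]) < sse(gbest):
                    gbest = pbest[i][:]
    return gbest

best = pso()
```

Because the global best pulls every particle, the swarm is far less likely to settle in the poor single-cluster solution that plain k-means/k-prototype initialization can fall into.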
Extended pso algorithm for improvement problems k means clustering algorithmIJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. The purpose of clustering is to group similar data together, so that instances within a cluster are most similar to each other and most different from instances in other clusters. In this paper we focus on partitional k-means clustering which, due to its ease of implementation and high-speed performance on large data sets, is still very popular after 30 years among the clustering algorithms developed since. To address the problem of k-means settling in local optima, we propose an extended PSO algorithm named ECPSO. Our new algorithm is able to escape from local optima and produces the problem's optimal answer with high probability. The experimental results show that the proposed algorithm performs better than other clustering algorithms, especially on two indices: the accuracy of clustering and the quality of clustering.
Mining frequent patterns is an NP-hard problem and has become a hot topic in recent research. Moreover, protein datasets contain distinct patterns that can be used in many areas such as drug discovery, disease prediction, etc. In earlier decades, pattern discovery and protein fold recognition were carried out with biophysics and biochemistry approaches, and X-ray and NMR have been used for protein structure prediction; these are very expensive and time consuming, while a mathematical approach can reduce the cost of such laboratory experiments. Many computer-based tests have been applied to protein fold detection, such as graph-based algorithms and data mining viewpoints like classification or clustering, and all have their advantages and drawbacks. Pattern matching in protein sequence datasets for fold recognition plays a meaningful role in the field of bioinformatics since it enables prediction of unknown protein function. There are many pattern recognition algorithms, but in this work we used PrefixSpan. The reason for selecting this algorithm is discussed below in section 2. For evaluating the results of the experiments we used the SCOPE dataset, which is a classified protein dataset, and ASTRAL, a discriminative sequential dataset of SCOPE.
1. The document presents a hybrid algorithm that combines Kernelized Fuzzy C-Means (KFCM), Hybrid Ant Colony Optimization (HACO), and Fuzzy Adaptive Particle Swarm Optimization (FAPSO) to improve clustering of electrocardiogram (ECG) beat data.
2. The algorithm maps data into a higher dimensional space using kernel functions to make clusters more linearly separable, addresses issues with KFCM being sensitive to initialization and prone to local minima.
3. It uses HACO to optimize cluster centers and membership degrees, and FAPSO to evaluate fitness values and optimize weight vectors, forming usable clusters for applications like ECG classification.
A Novel Penalized and Compensated Constraints Based Modified Fuzzy Possibilis...ijsrd.com
A cluster is a group of objects which are similar to each other within the cluster and dissimilar to the objects of other clusters. The similarity is typically calculated on the basis of the distance between two objects or clusters: two or more objects belong to the same cluster if and only if they are close to each other according to the distance between them. The major objective of clustering is to discover collections of comparable objects based on a similarity metric. Fuzzy Possibilistic C-Means (FPCM) is an effective clustering algorithm for unlabeled data that produces both membership and typicality values during the clustering process. In this approach, the efficiency of Fuzzy Possibilistic C-Means clustering is enhanced by using penalized and compensated constraints based FPCM (PCFPCM). The proposed PCFPCM approach differs from conventional clustering techniques by imposing a possibilistic reasoning strategy on fuzzy clustering, with penalized and compensated constraints for updating the grades of membership and typicality. The performance of the proposed approach is evaluated on University of California, Irvine (UCI) machine learning repository datasets such as Iris, Wine, Lung Cancer and Lymphography. The parameters used for the evaluation are clustering accuracy, Mean Squared Error (MSE), execution time and convergence behavior.
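The standard fuzzy c-means updates that FPCM and PCFPCM build on can be sketched on 1-D toy data; the possibilistic typicality values and the penalized/compensated constraint terms of PCFPCM are omitted, and the data, initialization and fuzzifier m are illustrative:

```python
# Sketch of standard fuzzy c-means (the base that FPCM/PCFPCM extend);
# 1-D toy data, fuzzifier m = 2, naive initialization from the data.
def fcm(data, c=2, m=2.0, iters=25):
    centers = data[:c]  # naive init: first c points
    U = []
    for _ in range(iters):
        # Membership update: u_ik = 1 / sum_j (d_ik / d_jk)^(2/(m-1)).
        U = []
        for x in data:
            row = []
            for ci in centers:
                d_i = abs(x - ci) or 1e-12  # guard against zero distance
                row.append(1.0 / sum((d_i / (abs(x - cj) or 1e-12)) ** (2 / (m - 1))
                                     for cj in centers))
            U.append(row)
        # Center update: mean of the data weighted by u^m.
        centers = [sum(U[k][i] ** m * x for k, x in enumerate(data))
                   / sum(U[k][i] ** m for k in range(len(data)))
                   for i in range(c)]
    return centers, U

data = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
centers, U = fcm(data)
```

Each row of `U` sums to 1, which is the membership constraint; FPCM adds a separate typicality matrix, and PCFPCM further modifies both updates with penalty and compensation terms.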
Re-Mining Association Mining Results through Visualization, Data Envelopment ...Gurdal Ertek
Re-mining is a general framework which suggests the execution of additional data mining steps based on the results of an original data mining process. This study investigates the multi-faceted re-mining of association mining results, develops and presents a practical methodology, and shows the applicability of the developed methodology through real world data. The methodology suggests re-mining using data visualization, data envelopment analysis, and decision trees. Six hypotheses, regarding how re-mining can be carried out on association mining results, are answered in the case study through empirical analysis.
http://research.sabanciuniv.edu.
Efficient Parallel Pruning of Associative Rules with Optimized SearchIOSR Journals
This document describes a proposed algorithm called Fuzzy Optimal Search Space Pruning (FOSSP) to improve the efficiency of association rule mining. FOSSP uses parallel pruning techniques to simultaneously mine large transactional datasets at different item set levels, reducing execution time. It generates candidate item sets based on relative item values and information requirements. Frequent item sets and strong association rules are then generated from the parallel pruned item levels based on minimum support and conditional probability. The algorithm aims to minimize candidate sets and maximize informative rules while reducing execution time for association rule mining.
Delineation of techniques to implement on the enhanced proposed model using d...ijdms
In the post-genomic era, with the advent of new technologies, a huge amount of complex molecular data is generated with high throughput. The management of this biological data for discovering new knowledge is definitely a challenging task due to the complexity and heterogeneity of the data. Issues like managing noisy and incomplete data need to be dealt with. The use of data mining in the biological domain has proved successful, yet discovering new knowledge from biological data remains a major challenge for data mining techniques. The novelty of the proposed model is its combined use of intelligent techniques to classify protein sequences faster and more efficiently. The use of FFT, a fuzzy classifier, a string weighted algorithm, a gram encoding method, a neural network model and a rough set classifier in a single model, each in an appropriate place, can enhance the quality of the classification system. Thus the primary challenge is to identify and classify large protein sequences in a very fast and easy but intelligent way to decrease the time complexity and space complexity.
This document outlines the aim, objectives, scope, and structure of a dissertation on using genetic programming to optimize and combine K nearest neighbor classifiers for intrusion detection. The aim is to use genetic programming with the KDD Cup 1999 dataset to develop a numeric classifier that shows improved performance over individual KNN classifiers. The objectives are to determine if a GP-based numeric classifier outperforms individual KNN classifiers, if GP combination techniques produce higher performance than KNN component classifiers, and if heterogeneous KNN classifier combination performs better than homogeneous combination. The document describes the methodology that will be used, including developing an optimal KNN classifier using fitness evaluation in the first phase and combining optimal KNN classifiers based on ROC curves in the second phase.
Parallel Key Value Pattern Matching Modelijsrd.com
Mining frequent itemsets from the huge transactional database is an important task in data mining. To find frequent itemsets in databases involves big decision in data mining for the purpose of extracting association rules. Association rule mining is used to find relationships among large datasets. Many algorithms were developed to find those frequent itemsets. This work presents a summarization and new model of parallel key value pattern matching model which shards a large-scale mining task into independent, parallel tasks. It produces a frequent pattern showing their capabilities and efficiency in terms of time consumption. It also avoids the high computational cost. It discovers the frequent item set from the database.
Classifier Model using Artificial Neural NetworkAI Publications
This document summarizes a research paper that investigates using supervised instance selection (SIS) as a preprocessing step to improve the performance of artificial neural networks (ANNs) for classification tasks. SIS aims to select a subset of examples from the original dataset to enhance the accuracy of future classifications. The goal of applying SIS before ANNs is to provide a cleaner input dataset that handles noisy or redundant data better. The paper presents the architecture of feedforward neural networks and the backpropagation algorithm for training networks. It also discusses using mutual information-based feature selection as part of the SIS preprocessing approach.
Nature Inspired Models And The Semantic WebStefan Ceriu
In this paper we present a series of nature inspired models used as alternative solutions for Semantic Web concerns. Some of the methods presented in this article perform better than classic algorithms by enhancing response time and computational costs. Others are just proof of concept, first steps towards new techniques that will improve their respective field. The intricate nature of the Semantic Web urges the need for faster, more intelligent algorithms and nature inspired models have been proven to be more than suitable for such complex tasks.
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Similar to AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SWARM OPTIMIZATION (20)
Null Bangalore | Pentesters Approach to AWS IAMDivyanshu
#Abstract:
- Learn more about the real-world methods for auditing AWS IAM (Identity and Access Management) as a pentester. So let us proceed with a brief discussion of IAM as well as some typical misconfigurations and their potential exploits in order to reinforce the understanding of IAM security best practices.
- Gain actionable insights into AWS IAM policies and roles, using hands on approach.
#Prerequisites:
- Basic understanding of AWS services and architecture
- Familiarity with cloud security concepts
- Experience using the AWS Management Console or AWS CLI.
- For hands on lab create account on [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
# Scenario Covered:
- Basics of IAM in AWS
- Implementing IAM Policies with Least Privilege to Manage S3 Bucket
- Objective: Create an S3 bucket with least privilege IAM policy and validate access.
- Steps:
- Create S3 bucket.
- Attach least privilege policy to IAM user.
- Validate access.
- Exploiting IAM PassRole Misconfiguration
-Allows a user to pass a specific IAM role to an AWS service (ec2), typically used for service access delegation. Then exploit PassRole Misconfiguration granting unauthorized access to sensitive resources.
- Objective: Demonstrate how a PassRole misconfiguration can grant unauthorized access.
- Steps:
- Allow user to pass IAM role to EC2.
- Exploit misconfiguration for unauthorized access.
- Access sensitive resources.
- Exploiting IAM AssumeRole Misconfiguration with Overly Permissive Role
- An overly permissive IAM role configuration can lead to privilege escalation by creating a role with administrative privileges and allow a user to assume this role.
- Objective: Show how overly permissive IAM roles can lead to privilege escalation.
- Steps:
- Create role with administrative privileges.
- Allow user to assume the role.
- Perform administrative actions.
- Differentiation between PassRole vs AssumeRole
Try at [killercoda.com](https://killercoda.com/cloudsecurity-scenario/)
Redefining brain tumor segmentation: a cutting-edge convolutional neural netw...IJECEIAES
Medical image analysis has witnessed significant advancements with deep learning techniques. In the domain of brain tumor segmentation, the ability to
precisely delineate tumor boundaries from magnetic resonance imaging (MRI)
scans holds profound implications for diagnosis. This study presents an ensemble convolutional neural network (CNN) with transfer learning, integrating
the state-of-the-art Deeplabv3+ architecture with the ResNet18 backbone. The
model is rigorously trained and evaluated, exhibiting remarkable performance
metrics, including an impressive global accuracy of 99.286%, a high-class accuracy of 82.191%, a mean intersection over union (IoU) of 79.900%, a weighted
IoU of 98.620%, and a Boundary F1 (BF) score of 83.303%. Notably, a detailed comparative analysis with existing methods showcases the superiority of
our proposed model. These findings underscore the model’s competence in precise brain tumor localization, underscoring its potential to revolutionize medical
image analysis and enhance healthcare outcomes. This research paves the way
for future exploration and optimization of advanced CNN models in medical
imaging, emphasizing addressing false positives and resource efficiency.
Rainfall intensity duration frequency curve statistical analysis and modeling...bijceesjournal
Using data from 41 years in Patna’ India’ the study’s goal is to analyze the trends of how often it rains on a weekly, seasonal, and annual basis (1981−2020). First, utilizing the intensity-duration-frequency (IDF) curve and the relationship by statistically analyzing rainfall’ the historical rainfall data set for Patna’ India’ during a 41 year period (1981−2020), was evaluated for its quality. Changes in the hydrologic cycle as a result of increased greenhouse gas emissions are expected to induce variations in the intensity, length, and frequency of precipitation events. One strategy to lessen vulnerability is to quantify probable changes and adapt to them. Techniques such as log-normal, normal, and Gumbel are used (EV-I). Distributions were created with durations of 1, 2, 3, 6, and 24 h and return times of 2, 5, 10, 25, and 100 years. There were also mathematical correlations discovered between rainfall and recurrence interval.
Findings: Based on findings, the Gumbel approach produced the highest intensity values, whereas the other approaches produced values that were close to each other. The data indicates that 461.9 mm of rain fell during the monsoon season’s 301st week. However, it was found that the 29th week had the greatest average rainfall, 92.6 mm. With 952.6 mm on average, the monsoon season saw the highest rainfall. Calculations revealed that the yearly rainfall averaged 1171.1 mm. Using Weibull’s method, the study was subsequently expanded to examine rainfall distribution at different recurrence intervals of 2, 5, 10, and 25 years. Rainfall and recurrence interval mathematical correlations were also developed. Further regression analysis revealed that short wave irrigation, wind direction, wind speed, pressure, relative humidity, and temperature all had a substantial influence on rainfall.
Originality and value: The results of the rainfall IDF curves can provide useful information to policymakers in making appropriate decisions in managing and minimizing floods in the study area.
Optimizing Gradle Builds - Gradle DPE Tour Berlin 2024Sinan KOZAK
Sinan from the Delivery Hero mobile infrastructure engineering team shares a deep dive into performance acceleration with Gradle build cache optimizations. Sinan shares their journey into solving complex build-cache problems that affect Gradle builds. By understanding the challenges and solutions found in our journey, we aim to demonstrate the possibilities for faster builds. The case study reveals how overlapping outputs and cache misconfigurations led to significant increases in build times, especially as the project scaled up with numerous modules using Paparazzi tests. The journey from diagnosing to defeating cache issues offers invaluable lessons on maintaining cache integrity without sacrificing functionality.
An improved modulation technique suitable for a three level flying capacitor ...IJECEIAES
This research paper introduces an innovative modulation technique for controlling a 3-level flying capacitor multilevel inverter (FCMLI), aiming to streamline the modulation process in contrast to conventional methods. The proposed
simplified modulation technique paves the way for more straightforward and
efficient control of multilevel inverters, enabling their widespread adoption and
integration into modern power electronic systems. Through the amalgamation of
sinusoidal pulse width modulation (SPWM) with a high-frequency square wave
pulse, this controlling technique attains energy equilibrium across the coupling
capacitor. The modulation scheme incorporates a simplified switching pattern
and a decreased count of voltage references, thereby simplifying the control
algorithm.
Embedded machine learning-based road conditions and driving behavior monitoringIJECEIAES
Car accident rates have increased in recent years, resulting in losses in human lives, properties, and other financial costs. An embedded machine learning-based system is developed to address this critical issue. The system can monitor road conditions, detect driving patterns, and identify aggressive driving behaviors. The system is based on neural networks trained on a comprehensive dataset of driving events, driving styles, and road conditions. The system effectively detects potential risks and helps mitigate the frequency and impact of accidents. The primary goal is to ensure the safety of drivers and vehicles. Collecting data involved gathering information on three key road events: normal street and normal drive, speed bumps, circular yellow speed bumps, and three aggressive driving actions: sudden start, sudden stop, and sudden entry. The gathered data is processed and analyzed using a machine learning system designed for limited power and memory devices. The developed system resulted in 91.9% accuracy, 93.6% precision, and 92% recall. The achieved inference time on an Arduino Nano 33 BLE Sense with a 32-bit CPU running at 64 MHz is 34 ms and requires 2.6 kB peak RAM and 139.9 kB program flash memory, making it suitable for resource-constrained embedded systems.
Use PyCharm for remote debugging of WSL on a Windo cf5c162d672e4e58b4dde5d797...shadow0702a
This document serves as a comprehensive step-by-step guide on how to effectively use PyCharm for remote debugging of the Windows Subsystem for Linux (WSL) on a local Windows machine. It meticulously outlines several critical steps in the process, starting with the crucial task of enabling permissions, followed by the installation and configuration of WSL.
The guide then proceeds to explain how to set up the SSH service within the WSL environment, an integral part of the process. Alongside this, it also provides detailed instructions on how to modify the inbound rules of the Windows firewall to facilitate the process, ensuring that there are no connectivity issues that could potentially hinder the debugging process.
The document further emphasizes on the importance of checking the connection between the Windows and WSL environments, providing instructions on how to ensure that the connection is optimal and ready for remote debugging.
It also offers an in-depth guide on how to configure the WSL interpreter and files within the PyCharm environment. This is essential for ensuring that the debugging process is set up correctly and that the program can be run effectively within the WSL terminal.
Additionally, the document provides guidance on how to set up breakpoints for debugging, a fundamental aspect of the debugging process which allows the developer to stop the execution of their code at certain points and inspect their program at those stages.
Finally, the document concludes by providing a link to a reference blog. This blog offers additional information and guidance on configuring the remote Python interpreter in PyCharm, providing the reader with a well-rounded understanding of the process.
2008 BUILDING CONSTRUCTION Illustrated - Ching Chapter 02 The Building.pdf
AUTOMATED INFORMATION RETRIEVAL MODEL USING FP GROWTH BASED FUZZY PARTICLE SWARM OPTIMIZATION

International Journal of Computer Science & Information Technology (IJCSIT) Vol 9, No 1, February 2017
DOI: 10.5121/ijcsit.2017.9109

Raja Varma Pamba¹, Elizabeth Sherly² and Kiran Mohan³

¹School of Computer Sciences, Mahatma Gandhi University, Kottayam, India
²Indian Institute of Information Technology and Management-Kerala, Trivandrum, India
³Payszone LLC LTD, Dubai, UAE
ABSTRACT
Mining relevant facts from the web at the time of need is a tenuous task. Research in diverse fields is fine-tuning methodologies toward extracting the information most relevant to a user's search query. The methodology proposed in this paper eases search complexity by tackling the severe issues that hinder the performance of traditional approaches. It finds all possible semantically relatable frequent sets with the FP-Growth algorithm; these sets in turn fuel a bio-inspired Fuzzy PSO that finds the optimal attraction points around which the web documents cluster, meeting the requirements of the search query without losing relevance. On the whole, the proposed system optimizes an objective function that minimizes intra-cluster differences and maximizes inter-cluster distances while keeping all possible relationships with the search context intact. The major contribution is that the system finds all possible combinations matching the user's search transaction, thereby making retrieval more meaningful. These relatable sets form the set of particles for both fuzzy clustering and PSO, keeping the system unbiased and allowing any number of new additions to follow the herd behaviour. Evaluations reveal that the proposed methodology fares well as an optimized and effective enhancement over conventional approaches.
KEYWORDS
Information retrieval, Clustering, Fuzzy particle swarm optimization & Frequent Pattern growth
1. INTRODUCTION
The huge influx of information on the web poses a great challenge to researchers: finding an effective way to retrieve the information most relevant to the search context. Searching and indexing methods for retrieval systems exist in multitudes, but one that automatically finds information relevant to the search query remains an unresolved problem. Web documents exhibit the property of belonging to more than one cluster, a quality that favours partitional clustering algorithms over hierarchical ones. Hierarchical clustering fails in situations where it needs to trace back and form new clusters after changes, whereas partitional clustering adapts well to soft clustering. The traditional approaches K-means and FCM, popular as they are, fail with regard to the initial supply of random cluster centroids, which biases the system toward the user inputs. The proposed methodology is, to our knowledge, the first of its kind in which the system itself generates meaningful inputs to fuel Fuzzy Particle Swarm Optimization. Except for the initial set of documents to be clustered, no other inputs are given, making the system automated.
The rest of the paper is organized as follows. Section 2 deals with related work. Section 3 describes the problem and the research objectives. Section 4 discusses the proposed methodology. Sections 5 and 6 present the experimental results and conclusions, respectively.
2. RELATED WORK
Liu et al. (2008) discuss the combination of PSO and Fuzzy C-Means; there is also an integration of PSO and the K-means algorithm (KPSO). Yau et al. (2013) improve the KPSO algorithm by proposing an enhanced cluster matching, and the authors proposed an innovative approach of using PSO to overcome the shortcomings of FCM clustering. A statistical-analysis-based principal component analysis was used as a variant of fuzzy clustering in the work of J. C. Bezdek and C. Coray.
3. RESEARCH OBJECTIVE
The main objectives of the proposed methodology are the following:
1. Find all possible semantically relatable frequent sets as the reduced search space for clustering to begin with.
2. From among the semantically relatable sets, find the initial optimal cluster centroids and the set of particles, where partitional clustering fails to find them.
3. Retrieve all possible fuzzy clusters matching the user query relevance using Frequent Pattern growth based Fuzzy Particle Swarm Optimization, abbreviated FPFPSO for the rest of the paper.
4. Evaluate the internal quality of the resulting clusters.
4. METHODOLOGIES
4.1 SEMANTICALLY RELATABLE SETS FOR SEARCH SPACE
4.1.1 FREQUENT PATTERN GROWTH ALGORITHM
In the proposed methodology we make use of the Frequent Pattern Growth (FP-Growth) algorithm to reduce the search space by retaining all possible semantically relatable frequent item sets matching the search transactions. In the initial phase, frequent pattern tree generation, we compress all of the keywords into a tree after preprocessing the documents. The keywords are checked against the threshold frequency to match support and confidence, pruning out all terms not relevant to the transaction. By this we retrieve only terms relevant to the search query. Once the FP-tree is generated, the second stage of FP-Growth generates all possible semantically relatable frequent item sets. The algorithm retrieves all conditional frequent item sets, which for the rest of the paper are presumed to be the semantically relatable sets related to the search context.
For example, if the key terms after preprocessing are Computer, Network and Social, the FP-tree shows the co-occurrence of each keyword and, in every chain of keywords, how many times the particular terms have been traversed. FP-Growth then generates all possible combinations matching the threshold support, such as [Computer Network, Social Network, Computer Social Network]. A template of the semantically relatable sets retrieved from the FP-Growth algorithm is given below in Table 1.
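To make the example above concrete, here is a minimal sketch of frequent item set mining on toy search transactions. For brevity it uses brute-force candidate enumeration rather than an actual FP-tree, but on data this small it yields the same frequent sets FP-Growth would; the transactions and the support threshold are illustrative.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_support):
    """Enumerate all item sets occurring in at least min_support transactions.

    Brute-force stand-in for FP-Growth: by anti-monotonicity, once no
    item set of a given size is frequent, no larger one can be either."""
    items = sorted({item for t in transactions for item in t})
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(1 for t in transactions if set(candidate) <= t)
            if count >= min_support:
                result[candidate] = count
                found_any = True
        if not found_any:
            break
    return result

# Toy search transactions built from the key terms of the example
transactions = [
    {"computer", "network"},
    {"network", "social"},
    {"computer", "network", "social"},
    {"computer", "network"},
]
for itemset, count in sorted(frequent_itemsets(transactions, 2).items()):
    print(itemset, count)
```

With a support threshold of 2, the combinations [Computer Network] and [Social Network] survive, while the full triple occurs only once and is pruned.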
From these, the sets matching the threshold support are filtered and the rest are pruned to reduce the search space. The retrieved bunch contains all possible semantically related term sets with high relevancy to the user's search context, as shown in Table 1. Indirectly, the system builds a concept tagging of highly related terms that carry meaning as well as relevance to the search context. A projected-data-structure-based tree generation, a divide-and-conquer strategy, effectively overcomes the space constraints of the FP-Growth algorithm, as in [Han J., Pei J., Yin Y. and Mao R., 2003]. The major constraint of every conventional clustering approach is that it is skewed by the random initialization of inputs; to avoid this, the proposed methodology recasts the outcomes of FP-Growth to suit the demands of Fuzzy PSO.
4.2. FUZZY CLUSTERING
4.2.1. FUZZY PARTICLE SWARM OPTIMIZATION
Fuzzy clustering can be considered a stochastic metaheuristic optimization problem that yields optimized clusters. The objective of optimization is to seek values for a set of parameters that maximize or minimize objective functions subject to certain constraints. Swarm intelligence is an intelligent paradigm for solving optimization problems that evolved from nature-inspired mechanisms such as fish schooling, bird flocking and the swarm behaviour of honey bees. The drawbacks of fuzzy clustering, notably FCM, are resolved with optimization tools such as PSO, ACO and genetic algorithms. Of all evolutionary algorithms, the simplest to implement and manage is particle swarm optimization, which does not require any crossover or mutation operators. Web documents being fuzzy and vague, the need of the hour is Fuzzy Particle Swarm Optimization, which caters to their dynamicity and vagueness.
The fuzzy clustering of objects is described by a fuzzy matrix µ with n rows and c columns, in which n is the number of data objects and c is the number of clusters. The element µ_{ij} in the i-th row and j-th column indicates the degree of association, or membership, of the i-th object with the j-th cluster. The characteristics of µ are as follows:
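The constraint equations themselves did not survive extraction; restated here are the standard fuzzy-partition conditions consistent with the definition of µ above (not reproduced from the original):

```latex
\mu_{ij} \in [0,1], \qquad 1 \le i \le n,\ 1 \le j \le c
\sum_{j=1}^{c} \mu_{ij} = 1, \qquad 1 \le i \le n
0 < \sum_{i=1}^{n} \mu_{ij} < n, \qquad 1 \le j \le c
```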
4.3. HYBRID FP GROWTH BASED FUZZY PARTICLE SWARM OPTIMIZATION
In this proposed methodology, as noted in subsection 4.1.1, the parameters for Fuzzy PSO are generated by the system itself from the FP-Growth algorithm. The set of conditional frequent item sets is considered the swarm, and the cluster centroids are calculated by finding the average in every set.
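As an illustrative sketch of "finding the average in every set" (the term-vector encoding is our assumption; the paper does not specify the representation), averaging frequent item sets encoded as term-presence vectors yields an initial centroid:

```python
import numpy as np

# Frequent item sets as term-presence vectors over [computer, network, social]
itemsets = np.array([
    [1, 1, 0],   # {computer, network}
    [0, 1, 1],   # {network, social}
    [1, 1, 1],   # {computer, network, social}
], dtype=float)

# Average of the member vectors serves as an initial cluster centroid
centroid = itemsets.mean(axis=0)
print(centroid)  # [0.6667 1. 0.6667]
```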
Fuzzy PSO starts with a population of particles and an initial number of clusters obtained from the FP-Growth algorithm; the particles' positions represent potential solutions for the studied problem, and their velocities are randomly initialized in the search space. The position matrix X is redefined in this algorithm to represent the fuzzy relation (membership function) between the frequent item sets (particles) in the columns and the cluster centres in the rows. The position matrix is given below:
In each iteration, the search for an optimal solution is executed by updating the particle velocities and positions. The fitness value of each frequent item set (particle) is determined using a fitness function Jm based on the Euclidean distance measure as defined in Equation 3. The velocity of each particle is updated using two best positions: the individual best position ibest, the best position that particle has visited so far, and the social best position sbest, the best position the swarm has visited so far. A particle's velocity and position are updated as follows:
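The update equations appear as images in the original; the standard PSO updates consistent with the notation described below (X the position, Y the velocity) are:

```latex
Y_i(t+1) = w\,Y_i(t) + c_1 r_1 \big(ibest_i - X_i(t)\big) + c_2 r_2 \big(sbest - X_i(t)\big)
X_i(t+1) = X_i(t) + Y_i(t+1)
```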
where X and Y are the position and velocity of the particles respectively, w is the inertia weight, c1 and c2 are constants called acceleration coefficients that control the influence of ibest and sbest on the search process, P is the number of particles in the swarm derived from the frequent item sets generated by FP-Growth, and r1 and r2 are random values in the range [0, 1].
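A minimal sketch of one such update step follows, assuming illustrative parameter values and NumPy arrays for the membership positions; the row renormalization that restores the fuzzy constraint after the move is our addition, not spelled out in the paper.

```python
import numpy as np

rng = np.random.default_rng(seed=1)

def pso_step(x, y, ibest, sbest, w=0.7, c1=1.5, c2=1.5):
    """One PSO update: new velocity y from inertia plus cognitive (ibest)
    and social (sbest) pulls, then new position x, renormalized so each
    row of the fuzzy membership matrix sums to 1."""
    r1 = rng.random(x.shape)
    r2 = rng.random(x.shape)
    y = w * y + c1 * r1 * (ibest - x) + c2 * r2 * (sbest - x)
    x = np.clip(x + y, 1e-9, None)          # keep memberships positive
    x = x / x.sum(axis=1, keepdims=True)    # fuzzy constraint: rows sum to 1
    return x, y

# 4 particles (frequent item sets) x 2 clusters, rows normalized
x = rng.random((4, 2))
x /= x.sum(axis=1, keepdims=True)
y = np.zeros_like(x)
ibest, sbest = x.copy(), x[0].copy()        # illustrative best positions
x, y = pso_step(x, y, ibest, sbest)
print(x.sum(axis=1))                        # each row still sums to 1
```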
Based on the evaluated fitness function, the particle with the worst fitness can be eliminated to help refine the search space. Finally, FPFPSO clustering is executed on the resultant set to fine-tune the elements for clustering.
A stage is reached where the particle positions saturate, with no further changes. Here the algorithm converges and retrieves the final cluster assignments of the individual documents to their respective clusters, as shown in Table 2.
5. EXPERIMENTAL RESULTS
The conventional approaches K-means, FCM and the FPFCM algorithm were run on Reuters-21578 for performance evaluation. In the experiment, 50 particles were trained for 100 iterations. The documents in the Reuters-21578 collection were originally taken from the Reuters newswire in 1987. The collection contains 22 files: each of the first 21 (reut2-000.sgm through reut2-020.sgm) contains 1000 documents, while the last (reut2-021.sgm) contains 578 documents. The documents are broadly divided into five categories (Exchanges, People, Topics, Organizations and Places), which are further divided into subcategories. The Reuters-21578 test collection is available at [9]. Figure 1 shows that the fitness value decreases with every iteration and that FPFPSO performs better than K-means, FCM and FPFCM. As the number of clusters increases, the fitness function value decreases. The K-means algorithm yields only a locally optimal result. Two similar documents in the same cluster count as a true positive (TP) and two dissimilar documents in different clusters as a true negative (TN), while two dissimilar documents in the same cluster form a false positive (FP) and two similar documents in different clusters a false negative (FN). Precision (P) and recall (R) are two popular measures for clustering; as a rule of thumb, the higher the precision, the lower the recall. The F-measure (F) is used to evaluate the performance of the proposed system; a higher F-measure indicates better performance, which the proposed methodology achieves as shown in Figure 2. The F-measure is computed as the harmonic mean of precision and recall.
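The precision, recall and F-measure formulas appear as images in the original; the standard definitions, in terms of the pair counts above, are:

```latex
P = \frac{TP}{TP + FP}, \qquad
R = \frac{TP}{TP + FN}, \qquad
F = \frac{2PR}{P + R}
```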
FPFPSO clustering maximizes the similarity between documents in the same cluster and also maximizes the distance between cluster centres. The fitness function of the hybrid algorithm is the objective function of the FCM algorithm.
FCM performs well, but with random initialization the system turns biased and gets hooked at local minima. The bio-inspired particle swarm algorithm is a stochastic metaheuristic tool mostly used to supplement the performance of conventional clustering approaches. The proposed algorithm benefits from the convergence speed of PSO and also overcomes FCM's inability to escape from local optima.
6. CONCLUSIONS
In this proposed approach, the methodology primarily and effectively addresses the base issues of information retrieval systems. To instil concept tagging and relationships in the retrieved patterns, we use FP-Growth, which also helps reduce the search space by identifying all transactions relevant to the search context and generating all possible combinations of related terms. From this bunch of highly related co-occurrence data points, taken as the swarm, and the optimal number of clusters taken from FP-Growth, we direct the same to Fuzzy PSO to instil a specific behaviour in the system; this takes care of attracting the documents to their respective cluster nests by following Fuzzy PSO. The evaluation results shown in Table 3 and Figure 1 give a promising perspective toward an efficient and effective clustering that minimizes intra-cluster distances and maximizes inter-cluster differences.
REFERENCES
[1] Tan, P.N., Steinbach, M. and Kumar, V., "Cluster Analysis: Basic Concepts and Algorithms", Introduction to Data Mining, Pearson Addison Wesley, Boston, pp. 487-568, 2006.
[2] Baeza-Yates, R. and Ribeiro-Neto, B., "Modern Information Retrieval", Addison Wesley, 1999.
[3] Bezdek, J., "Pattern Recognition with Fuzzy Objective Function Algorithms", Plenum Press, New York, 1981.
[4] Cui, X., Potok, T.E. and Palathingal, P., "Document Clustering Using Particle Swarm Optimization", Proceedings of the IEEE Swarm Intelligence Symposium, pp. 185-191, 2005.
[5] Abraham, A., Guo, H. and Liu, H., "Swarm Intelligence: Foundations, Perspectives and Applications", in Swarm Intelligent Systems, Nedjah, N. and Mourelle, L., Eds., Studies in Computational Intelligence, Springer Verlag, Germany, pp. 3-25, 2006.
[6] Han, J., Pei, J., Yin, Y. and Mao, R., "Mining Frequent Patterns Without Candidate Generation: A Frequent-Pattern Tree Approach", Data Mining and Knowledge Discovery, 2003.
AUTHORS
Raja Varma Pamba is an Assistant Professor at the LBS Institute of Technology for Women, Trivandrum, Kerala, India, and is pursuing a Ph.D. in the School of Computer Sciences, Mahatma Gandhi University, Kerala, India.
Dr Elizabeth Sherly is the Professor & Head of the VRCLC department at the Indian Institute of Information Technology and Management-Kerala, India.
Kiran Mohan is the Operations Director for Payszone LLC Ltd, Dubai.