Patterns that occur only in objects belonging to a single class are called Jumping Emerging Patterns (JEPs). JEP-based classifiers are considered among the successful classification systems. Due to their comprehensibility, simplicity, and strong discriminating ability, JEPs have gained significant recognition. However, discovering JEPs in a large pattern space is normally a time-consuming and challenging task because the number of candidate patterns grows exponentially. In this work a novel method based on a genetic algorithm (GA) is proposed to discover JEPs in a large pattern space. Since the complexity of a GA is lower than that of exhaustive algorithms, we combine the power of JEPs and GAs to find high-quality JEPs from datasets and improve the performance of the classification system. Unlike other methods in the literature that compute the complete set of JEPs, our proposed method explores a set of high-quality JEPs in the pattern search space; large numbers of duplicate and redundant JEPs are filtered out during the discovery process. Experimental results show that our proposed Genetic-JEPs are effective and accurate for classifying a variety of data sets and in general achieve higher accuracy than other standard classifiers.
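The abstract does not give the algorithm's details, so the following is only a minimal, hypothetical sketch of how a GA could search for JEPs: a chromosome is a bit-mask over the item universe, and fitness is the itemset's support in the target class, zeroed whenever the itemset also occurs in the contrasting class (the defining JEP condition). All function names here are illustrative, not the paper's.

```python
import random

def support(itemset, transactions):
    """Fraction of transactions (sets of items) containing the itemset."""
    if not itemset:
        return 0.0
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def jep_fitness(itemset, pos, neg):
    """A jumping emerging pattern must have zero support in the
    contrasting class; fitness is then its support in the target class."""
    if support(itemset, neg) > 0:
        return 0.0
    return support(itemset, pos)

def ga_discover_jep(pos, neg, pop_size=30, generations=40, p_mut=0.1, seed=0):
    rng = random.Random(seed)
    items = sorted(set().union(*pos, *neg))
    n = len(items)
    decode = lambda bits: {items[i] for i in range(n) if bits[i]}
    fit = lambda bits: jep_fitness(decode(bits), pos, neg)
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        def pick():                         # binary tournament selection
            a, b = rng.sample(pop, 2)
            return a if fit(a) >= fit(b) else b
        nxt = []
        while len(nxt) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n) if n > 1 else 1
            child = p1[:cut] + p2[cut:]     # one-point crossover
            nxt.append([b ^ (rng.random() < p_mut) for b in child])  # bit flips
        pop = nxt
    best = max(pop, key=fit)
    return decode(best), fit(best)
```

On a toy dataset where item `c` appears only in the positive class, the GA converges to a pattern containing `c`, since any itemset with non-zero negative support scores zero.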
The document discusses finding the optimal parameter K for the emerging pattern-based classifier PCL. PCL selects the top-K emerging patterns from each class that match a test instance and uses these patterns to predict the class label. The authors develop an algorithm to determine the best value of K for PCL on different datasets, as K's optimal value varies. They sample subsets of training data to find the K that performs best on average, maximizing the likelihood of good performance on the whole dataset. Experimental results show that finding the best K improves PCL's accuracy and that using incremental frequent pattern maintenance techniques speeds up the algorithm significantly.
This document discusses using genetic programming (GP) for data classification. It begins with an introduction to classification problems and algorithms like decision trees, neural networks, and Bayesian classification. It then provides background on genetic algorithms and genetic programming, explaining that GP uses tree-based representations of computer programs to evolve solutions. The document presents a case study on using GP for classification, discussing fitness measures, creating training sets, and an interleaved data format to address class imbalance. The goal is to automatically generate classification models for data sets using GP.
Automatic Feature Subset Selection using Genetic Algorithm for Clustering
Feature subset selection is the process of selecting a minimal subset of relevant features and is a pre-processing technique for a wide variety of applications. High-dimensional data clustering is a challenging task in data mining. A reduced set of features makes the patterns easier to understand, and is more significant when it is application specific. Almost all existing feature subset selection algorithms are neither automatic nor application specific. This paper attempts to find the feature subset that yields optimal clusters during clustering. The proposed Automatic Feature Subset Selection using Genetic Algorithm (AFSGA) identifies the required features automatically and reduces the computational cost of determining good clusters. The performance of AFSGA is tested using public and synthetic datasets with varying dimensionality. Experimental results show the improved efficacy of the algorithm in terms of cluster quality and computational cost.
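AFSGA's exact fitness criterion is not given in the abstract; the sketch below is a hypothetical stand-in that shows the general shape of GA-based feature selection for clustering: each chromosome is a feature bit-mask, and fitness is a cluster-separation score (between-centroid distance over within-cluster scatter) computed by a tiny deterministic 2-means on the selected features. The names `separation` and `afsga_sketch` are my own, not the paper's.

```python
import random

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def separation(points, iters=10):
    """Deterministic 2-means started from the farthest pair of points;
    returns between-centroid distance over within-cluster scatter."""
    _, i, j = max((dist2(points[i], points[j]), i, j)
                  for i in range(len(points))
                  for j in range(i + 1, len(points)))
    c = [list(points[i]), list(points[j])]
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [0 if dist2(p, c[0]) <= dist2(p, c[1]) else 1 for p in points]
        for k in (0, 1):
            members = [p for p, l in zip(points, labels) if l == k]
            if members:
                c[k] = [sum(col) / len(members) for col in zip(*members)]
    wcss = sum(dist2(p, c[l]) for p, l in zip(points, labels))
    return dist2(c[0], c[1]) / (wcss + 1e-9)

def afsga_sketch(rows, pop_size=20, generations=30, p_mut=0.15, seed=0):
    """GA over feature bit-masks; fitness is the cluster-separation score
    on the selected features (a stand-in for AFSGA's real criterion)."""
    rng = random.Random(seed)
    n = len(rows[0])
    def fitness(mask):
        feats = [j for j in range(n) if mask[j]]
        if not feats:
            return 0.0
        return separation([[r[j] for j in feats] for r in rows])
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        keep = pop[:pop_size // 2]          # elitist truncation selection
        children = []
        while len(keep) + len(children) < pop_size:
            p1, p2 = rng.sample(keep, 2)
            cut = rng.randrange(1, n)
            child = p1[:cut] + p2[cut:]     # one-point crossover
            children.append([b ^ (rng.random() < p_mut) for b in child])
        pop = keep + children
    best = max(pop, key=fitness)
    return [j for j in range(n) if best[j]]
```

On synthetic data where one feature cleanly separates two clusters and the rest are noise, the GA reliably keeps the informative feature, since masks containing it score orders of magnitude higher.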
Fuzzy clustering has been widely studied and applied in a variety of key areas of science and engineering. In this paper the Improved Teaching-Learning-Based Optimization (ITLBO) algorithm is used for data clustering, in which objects in the same cluster are similar. The algorithm has been tested on several datasets and compared with other popular clustering algorithms. Results show that the proposed method improves clustering output and can be used efficiently for fuzzy clustering.
Diagnosis of health conditions is a very challenging task for every human being because life is directly related to health. Data-mining-based classification is one of the important applications for classifying such data. In this research work, we have used various classification techniques to classify thyroid data; CART gives the highest accuracy, 99.47%, as the best model. Feature selection plays a very important role in making a model computationally efficient and increasing its performance. This research work focuses on the Info Gain and Gain Ratio feature selection techniques to remove irrelevant features from the original data set and improve the model's computational performance. We applied both feature selection techniques to the best model, i.e., CART. Our proposed CART-Info Gain and CART-Gain Ratio give 99.47% and 99.20% accuracy with 25 and 3 features respectively.
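The Info Gain and Gain Ratio criteria the abstract names are standard and easy to state concretely. A minimal sketch for discrete features (function names are mine): information gain is the drop in class entropy from splitting on a feature, and gain ratio divides that by the split's own entropy, as in C4.5.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def info_gain(feature, labels):
    """Reduction in class entropy obtained by splitting on a discrete feature."""
    n = len(labels)
    remainder = 0.0
    for v in set(feature):
        subset = [l for f, l in zip(feature, labels) if f == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder

def gain_ratio(feature, labels):
    """Info gain normalised by the split's own entropy (as in C4.5)."""
    iv = entropy(feature)
    return info_gain(feature, labels) / iv if iv > 0 else 0.0
```

Ranking features by either score and keeping the top-scoring ones is the filter step the paper applies before training CART; a feature that mirrors the class labels scores the full class entropy, while a constant feature scores zero.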
Dynamic Radius Species Conserving Genetic Algorithm for Test Generation for S...
This document summarizes a research paper that proposes a new approach called Dynamic-radius Species-conserving Genetic Algorithm (DSGA) for generating structural test cases using a genetic algorithm. DSGA aims to generate a complete test suite with a single run by finding test cases that cover different areas of the program structure. It begins by finding test cases that cover some areas, then excludes those areas to search for test cases covering other uncovered areas, similar to how humans generate structural test cases. The paper evaluates DSGA on the Triangle Classification algorithm and finds it able to generate a complete test suite without limitations of other genetic algorithm approaches for structural test case generation.
The document discusses deductive databases and how they differ from conventional databases. Deductive databases contain facts and rules that allow implicit facts to be deduced from the stored information. This reduces the amount of storage needed compared to explicitly storing all facts. Deductive databases use logic programming through languages like Datalog to specify rules that define virtual relations. The rules allow new facts to be inferred through an inference engine even if they are not explicitly represented.
This document proposes a generalized definition language for implementing an object-based fuzzy class model. It begins by reviewing related work on defining fuzzy classes and identifying limitations in existing approaches. It then summarizes the authors' previous work developing a generalized fuzzy class structure and model. The document introduces several new data types for representing different types of fuzzy attributes. Finally, it proposes a formal definition language for the fuzzy class model that utilizes the new data types to define fuzzy class structure and accurately represent fuzzy data types and attribute values. The language is intended to serve as a data definition language for object-based fuzzy database systems.
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...
Classification is a step-by-step practice for allocating a given piece of input to one of the given categories, and is an essential machine learning technique. Many classification problems occur in different application areas and need to be solved. Different types of classification algorithms, such as memory-based, tree-based, and rule-based, are widely used. This work studies the performance of different memory-based classifiers for classification of a multivariate data set from the UCI machine learning repository using an open-source machine learning tool. A comparison of the memory-based classifiers used and a practical guideline for selecting the most suitable algorithm for a classification task are presented. Apart from that, some empirical criteria for describing and evaluating the best classifiers are discussed.
Multi-Population Methods with Adaptive Mutation for Multi-Modal Optimization ...
This paper presents an efficient scheme to locate multiple peaks of multi-modal optimization problems using genetic algorithms (GAs). The premature convergence problem arises from loss of diversity; the multi-population technique can be applied to maintain diversity in the population and the convergence capacity of GAs. The proposed scheme combines multiple populations with an adaptive mutation operator that determines two different mutation probabilities for different sites of the solutions. The probabilities are updated according to the fitness and distribution of solutions in the search space during the evolution process. The experimental results demonstrate the performance of the proposed algorithm on a set of benchmark problems in comparison with relevant algorithms.
The Application of IRT Using the Rasch Model
The document discusses the application of Item Response Theory (IRT) using the Rasch model to construct cognitive measures. It provides an overview of psychometric theory, classical test theory, and IRT approaches like the Rasch model. The Rasch model assumes that the probability of a correct response depends only on the difference between a person's ability and the item difficulty. It provides sample-independent item calibrations and person measures. The document outlines the assumptions, uses, and procedures of the Rasch model for test analysis.
This document provides an overview of classification techniques. It defines classification as assigning records to predefined classes based on their attribute values. The key steps are building a classification model from training data and then using the model to classify new, unseen records. Decision trees are discussed as a popular classification method that uses a tree structure with internal nodes for attributes and leaf nodes for classes. The document covers decision tree induction, handling overfitting, and performance evaluation methods like holdout validation and cross-validation.
This paper presents a set of methods that use a genetic algorithm for automatic test-data generation in software testing. For several years researchers have proposed methods for generating test data, each with different drawbacks. In this paper, we present various Genetic Algorithm (GA) based test methods with different parameters to automate structural-oriented test data generation on the basis of internal program structure. The factors discovered are used in evaluating the fitness function of the genetic algorithm for selecting the best possible test method. These methods take test populations as input and then evaluate the test cases for the program. This integration helps improve the overall performance of the genetic algorithm in search-space exploration and exploitation, with a better convergence rate.
To solve complex decision-making problems, many approaches and systems based on fuzzy theory have been proposed. In 1998, Smarandache introduced the concept of the single-valued neutrosophic set as a complete development of fuzzy theory. In this paper, we study the distance measure between single-valued neutrosophic sets based on the H-max measure of Ngan et al. [8]. The proposed measure is also a distance measure between picture fuzzy sets, which were introduced by Cuong in 2013 [15]. Based on the proposed measure, an Adaptive Neuro Picture Fuzzy Inference System (ANPFIS) is built and applied to decision making for link states in interconnection networks. In an experimental evaluation on real datasets taken from the UPV (Universitat Politècnica de València), the performance of the proposed model is better than that of related fuzzy methods.
Relationships Among Classical Test Theory and Item Response Theory Frameworks...
This document discusses relationships between classical test theory (CTT) and item response theory (IRT) frameworks. It reviews prior literature that has found the frameworks to be quite comparable in estimating person and item parameters. However, previous studies only used IRT to generate data and did not consider CTT models with an underlying normal variable assumption. The current study aims to compare item and person statistics from IRT and CTT models with an underlying normal variable assumption using simulated data generated from both frameworks. A simulation was conducted varying test length (20, 40, 60 items) and number of examinees (500, 1000) to compare one-parameter logistic, two-parameter logistic, parallel, and congeneric models.
This paper addresses the problem of determinizing probabilistic data to enable storage in legacy systems that only accept deterministic data. The paper explores this problem in the context of triggers and selection queries. Existing approaches like thresholding or top-1 selection are shown to provide suboptimal performance. Instead, the paper develops a query-aware strategy and demonstrates its advantages over existing solutions through empirical evaluation on real and synthetic datasets.
Introduction to Optimization with Genetic Algorithm (GA)
Selection of the optimal parameters for machine learning tasks is challenging. Some results may be bad not because the data is noisy or the learning algorithm is weak, but because of a bad selection of parameter values. This article gives a brief introduction to evolutionary algorithms (EAs) and describes the genetic algorithm (GA), one of the simplest random-based EAs.
References:
Eiben, Agoston E., and James E. Smith. Introduction to Evolutionary Computing. Vol. 53. Heidelberg: Springer, 2003.
https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad
https://www.kdnuggets.com/2018/03/introduction-optimization-with-genetic-algorithm.html
This document discusses different types of machine learning problems including association rule learning and classification. Association rule learning aims to discover relationships between variables in large datasets, such as customers that buy onions and potatoes also tend to buy hamburgers. Classification involves identifying which category a new observation belongs to based on a training set where category membership is known, such as recognizing characters, identifying faces, or diagnosing illnesses. Several common machine learning algorithms used for classification are also listed.
This document summarizes a research paper that analyzes the performance and convergence of a novel genetic algorithm model towards finding global minima. The paper introduces genetic algorithms, which are probabilistic search algorithms inspired by natural evolution. It describes the components of genetic algorithms, including chromosomes, fitness functions, reproduction, crossover, and mutation operators. It also discusses encoding solutions as chromosomes and two common genetic algorithm models: Holland's original model and the common model. The paper aims to present an analysis of applying genetic algorithms to optimize test functions and finding their global minima.
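The components that summary lists (chromosomes, fitness function, reproduction, crossover, mutation) can be shown end to end in a few lines. This is a generic minimal GA, not the paper's specific model, applied to the sphere test function whose global minimum is 0 at the origin; population sizes and mutation scale are illustrative choices.

```python
import random

def sphere(x):
    """Benchmark test function: global minimum 0 at the origin."""
    return sum(v * v for v in x)

def minimise_ga(f, dim=3, pop_size=50, generations=200, seed=0):
    rng = random.Random(seed)
    # chromosomes: real-valued vectors sampled uniformly in [-5, 5]^dim
    pop = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=f)                      # fittest (lowest f) first
        elite = pop[:pop_size // 5]          # reproduction: keep top 20%
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            w = rng.random()
            child = [w * a + (1 - w) * b for a, b in zip(p1, p2)]  # blend crossover
            child = [v + rng.gauss(0, 0.1) for v in child]         # gaussian mutation
            children.append(child)
        pop = elite + children
    return min(pop, key=f)
```

Selection pressure pulls the elite toward the basin of the global minimum while gaussian mutation keeps exploring around it; on the sphere function the best individual ends up very close to the origin.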
A Modified KS-test for Feature Selection
This document proposes a modified Kolmogorov-Smirnov (KS) test-based feature selection algorithm. It begins with an overview of feature selection and its benefits. It then discusses two common feature selection approaches: filter and wrapper models. The document proposes a fast redundancy removal filter based on a modified KS statistic that utilizes class label information to compare feature pairs. It compares the proposed algorithm to other methods like Correlation Feature Selection (CFS) and KS-Correlation Based Filter (KS-CBF). The efficiency and effectiveness of the various methods are tested on standard classifiers. In most cases, the proposed approach achieved equal or better classification accuracy compared to using all features or the other algorithms.
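The paper's modification of the KS statistic is not spelled out in the summary, but the plain two-sample KS statistic it builds on is standard: the largest vertical gap between two empirical CDFs. A minimal filter-style sketch (function names are mine) scores each feature by how differently it is distributed across the two classes.

```python
import bisect

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical gap
    between the empirical CDFs of the two samples."""
    xs, ys = sorted(xs), sorted(ys)
    def ecdf(sample, v):                     # fraction of sample <= v
        return bisect.bisect_right(sample, v) / len(sample)
    return max(abs(ecdf(xs, v) - ecdf(ys, v)) for v in set(xs) | set(ys))

def ks_feature_scores(rows, labels):
    """Filter-model scoring for a two-class dataset: a high KS statistic
    means the feature's class-conditional distributions differ strongly."""
    lo, hi = sorted(set(labels))
    a = [r for r, l in zip(rows, labels) if l == lo]
    b = [r for r, l in zip(rows, labels) if l == hi]
    return [ks_statistic([r[j] for r in a], [r[j] for r in b])
            for j in range(len(rows[0]))]
```

A feature whose values in one class all lie below the other class scores 1.0; a constant or identically distributed feature scores 0.0, so ranking by this score and truncating gives a simple filter selector.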
Particle Swarm Optimization based K-Prototype Clustering Algorithm
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
ASSOCIATION RULE DISCOVERY FOR STUDENT PERFORMANCE PREDICTION USING METAHEURI...
With the increasing use of data mining techniques to improve the operation of educational systems, Educational Data Mining has been introduced as a new and fast-growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper a new associative classification technique is proposed to predict students' final performance. Unlike several machine learning approaches such as ANNs and SVMs, associative classifiers maintain interpretability along with high accuracy. In this research work, we employ Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules for student performance prediction as a multi-objective classification problem. Results indicate that the proposed swarm-based algorithm outperforms well-known classification techniques on the student performance prediction problem.
In the present day a huge amount of data is generated every minute and transferred frequently. Although the data is sometimes static, most commonly it is dynamic and transactional, and newly generated data is constantly added to the existing data. To discover knowledge from this incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time-consuming. To analyze the datasets properly, construction of an efficient classifier model is also necessary; the objective of such a classifier is to assign unlabeled data to the appropriate classes. The paper proposes a dimension reduction algorithm that can be applied in a dynamic environment to generate a reduced attribute set as a dynamic reduct, and an optimization algorithm that uses the reduct to build the corresponding classification system. The method analyzes the new dataset when it becomes available and modifies the reduct accordingly to fit the entire dataset, from which interesting optimal classification rule sets are generated. The concepts of discernibility relation, attribute dependency, and attribute significance from Rough Set Theory are integrated for the generation of the dynamic reduct set, and optimal classification rules are selected using PSO, which not only reduces complexity but also helps achieve higher accuracy of the decision system. The proposed method has been applied to benchmark datasets collected from the UCI repository; the dynamic reduct is computed and optimal classification rules are generated from it. Experimental results show the efficiency of the proposed method.
Efficiently Finding the Best Parameter for the Emerging Pattern-Based Classifier
This document proposes algorithms to efficiently find the best parameter K for the emerging pattern-based classifier PCL. It first summarizes the PCL classifier and discusses how the parameter K impacts its performance. It then presents the rPCL and ePCL algorithms. rPCL repeatedly runs cross-validation to determine the best K by averaging accuracies. ePCL enhances rPCL by using an incremental pattern maintenance technique to avoid repeatedly constructing emerging patterns from scratch, improving efficiency. Experimental results show ePCL finds the best K much faster than rPCL by leveraging pattern maintenance.
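The selection loop rPCL uses, stripped of PCL itself, is just "pick the K with the best average held-out accuracy over repeated random subsamples". The sketch below illustrates that loop with a 1-D nearest-neighbour classifier as a stand-in for PCL (it does not reimplement PCL or the PSM maintenance technique; all names are illustrative).

```python
import random
from collections import Counter

def knn_predict(train, x, k):
    """Majority label among the k nearest 1-D training points."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def accuracy(train, test, k):
    return sum(knn_predict(train, x, k) == l for x, l in test) / len(test)

def best_k(data, candidates, trials=20, split=0.7, seed=0):
    """Pick the K with the highest average held-out accuracy over
    repeated random subsamples of the training data (the rPCL idea)."""
    rng = random.Random(seed)
    data = list(data)                      # do not mutate the caller's list
    avg = {k: 0.0 for k in candidates}
    for _ in range(trials):
        rng.shuffle(data)
        cut = int(split * len(data))
        train, test = data[:cut], data[cut:]
        for k in candidates:
            avg[k] += accuracy(train, test, k) / trials
    return max(candidates, key=lambda k: avg[k])
```

The cost of this loop is trials × |candidates| model evaluations, which is exactly why ePCL's incremental pattern maintenance matters: it avoids rebuilding the pattern set from scratch on every subsample.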
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODEL
Predicting student performance is a great concern to higher education managements. This prediction helps to identify and improve students' performance, and several factors may contribute to that improvement. In the present study, we employ data mining processes, particularly classification, to enhance the quality of the higher educational system. Recently, a new direction has been the improvement of classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning algorithm using an AdaBoost ensemble with a simple genetic algorithm, called "Ada-GA", where the genetic algorithm is demonstrated to successfully improve the accuracy of the combined classifier. The Ada-GA algorithm proved to be of considerable usefulness in identifying at-risk students early, especially in very large classes; this early prediction allows the instructor to provide appropriate advising to those students. The Ada-GA algorithm was implemented and tested on the ASSISTments dataset, and the results showed that it successfully improved detection accuracy while reducing computational complexity.
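Ada-GA couples AdaBoost with a GA, and the GA part is specific to the paper; what can be sketched from standard knowledge is the AdaBoost loop it builds on. Below is a minimal from-scratch AdaBoost with decision stumps over ±1 labels (all names are mine): each round fits the stump with lowest weighted error, weights it by its accuracy, and re-weights the training points to emphasize mistakes.

```python
import math

def stump_predict(x, feat, thr, sign):
    """One-feature threshold classifier with output in {+1, -1}."""
    return sign if x[feat] <= thr else -sign

def best_stump(X, y, w):
    """Stump (feature, threshold, sign) minimising the weighted error."""
    best, best_err = None, float("inf")
    for feat in range(len(X[0])):
        for thr in sorted({x[feat] for x in X}):
            for sign in (1, -1):
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if stump_predict(x, feat, thr, sign) != yi)
                if err < best_err:
                    best_err, best = err, (feat, thr, sign)
    return best, best_err

def adaboost(X, y, rounds=10):
    n = len(X)
    w = [1 / n] * n
    ensemble = []
    for _ in range(rounds):
        (feat, thr, sign), err = best_stump(X, y, w)
        err = max(err, 1e-10)                       # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)     # stump weight
        ensemble.append((alpha, feat, thr, sign))
        # re-weight: increase weight on misclassified points
        w = [wi * math.exp(-alpha * yi * stump_predict(x, feat, thr, sign))
             for x, yi, wi in zip(X, y, w)]
        s = sum(w)
        w = [wi / s for wi in w]
        if err < 1e-9:                              # perfect stump found
            break
    return ensemble

def predict(ensemble, x):
    score = sum(a * stump_predict(x, f, t, s) for a, f, t, s in ensemble)
    return 1 if score >= 0 else -1
```

In Ada-GA, as described in the abstract, a GA would sit around this loop tuning the ensemble; the boosting core itself is what makes the combined classifier stronger than any single stump.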
This document proposes a generalized definition language for implementing an object-based fuzzy class model. It begins by reviewing related work on defining fuzzy classes and identifying limitations in existing approaches. It then summarizes the authors' previous work developing a generalized fuzzy class structure and model. The document introduces several new data types for representing different types of fuzzy attributes. Finally, it proposes a formal definition language for the fuzzy class model that utilizes the new data types to define fuzzy class structure and accurately represent fuzzy data types and attribute values. The language is intended to serve as a data definition language for object-based fuzzy database systems.
EFFECTIVENESS PREDICTION OF MEMORY BASED CLASSIFIERS FOR THE CLASSIFICATION O...cscpconf
Classification is a step by step practice for allocating a given piece of input into any of the given
category. Classification is an essential Machine Learning technique. There are many
classification problem occurs in different application areas and need to be solved. Different
types are classification algorithms like memory-based, tree-based, rule-based, etc are widely
used. This work studies the performance of different memory based classifiers for classification
of Multivariate data set from UCI machine learning repository using the open source machine
learning tool. A comparison of different memory based classifiers used and a practical
guideline for selecting the most suited algorithm for a classification is presented. Apart fromthat some empirical criteria for describing and evaluating the best classifiers are discussed
Multi-Population Methods with Adaptive Mutation for Multi-Modal Optimization ...ijscai
This paper presents an efficient scheme to locate multiple peaks on multi-modal optimization problems by
using genetic algorithms (GAs). The premature convergence problem shows due to the loss of diversity,
the multi-population technique can be applied to maintain the diversity in the population and the
convergence capacity of GAs. The proposed scheme is the combination of multi-population with adaptive
mutation operator, which determines two different mutation probabilities for different sites of the
solutions. The probabilities are updated by the fitness and distribution of solutions in the search space
during the evolution process. The experimental results demonstrate the performance of the proposed
algorithm based on a set of benchmark problems in comparison with relevant algorithms.
The+application+of+irt+using+the+rasch+model presnetation1Carlo Magno
The document discusses the application of Item Response Theory (IRT) using the Rasch model to construct cognitive measures. It provides an overview of psychometric theory, classical test theory, and IRT approaches like the Rasch model. The Rasch model assumes that the probability of a correct response depends only on the difference between a person's ability and the item difficulty. It provides sample-independent item calibrations and person measures. The document outlines the assumptions, uses, and procedures of the Rasch model for test analysis.
This document provides an overview of classification techniques. It defines classification as assigning records to predefined classes based on their attribute values. The key steps are building a classification model from training data and then using the model to classify new, unseen records. Decision trees are discussed as a popular classification method that uses a tree structure with internal nodes for attributes and leaf nodes for classes. The document covers decision tree induction, handling overfitting, and performance evaluation methods like holdout validation and cross-validation.
This paper presents a set of methods that use a genetic algorithm for automatic test-data generation in
software testing. For several years researchers have proposed methods for generating test data, each
with different drawbacks. In this paper, we present various Genetic Algorithm (GA) based test
methods with different parameters to automate structural-oriented test data generation
on the basis of internal program structure. The factors discovered are used in evaluating the fitness
function of the genetic algorithm for selecting the best possible test method. These methods take test
populations as input and then evaluate the test cases for the program. This integration helps
improve the overall performance of the genetic algorithm in search space exploration and exploitation,
with a better convergence rate.
In order to solve complex decision-making problems, many approaches and systems based on fuzzy theory have been proposed. In 1998, Smarandache introduced the concept of the single-valued neutrosophic set as a complete development of fuzzy theory. In this paper, we study the distance measure between single-valued neutrosophic sets based on the H-max measure of Ngan et al. [8]. The proposed measure is also a distance measure between picture fuzzy sets, which were introduced by Cuong in 2013 [15]. Based on the proposed measure, an Adaptive Neuro Picture Fuzzy Inference System (ANPFIS) is built and applied to decision making for the link states in interconnection networks. In an experimental evaluation on real datasets taken from the UPV (Universitat Politècnica de València) university, the performance of the proposed model is better than that of related fuzzy methods.
Relationships Among Classical Test Theory and Item Response Theory Frameworks...AnusornKoedsri3
This document discusses relationships between classical test theory (CTT) and item response theory (IRT) frameworks. It reviews prior literature that has found the frameworks to be quite comparable in estimating person and item parameters. However, previous studies only used IRT to generate data and did not consider CTT models with an underlying normal variable assumption. The current study aims to compare item and person statistics from IRT and CTT models with an underlying normal variable assumption using simulated data generated from both frameworks. A simulation was conducted varying test length (20, 40, 60 items) and number of examinees (500, 1000) to compare one-parameter logistic, two-parameter logistic, parallel, and congeneric models.
This paper addresses the problem of determinizing probabilistic data to enable storage in legacy systems that only accept deterministic data. The paper explores this problem in the context of triggers and selection queries. Existing approaches like thresholding or top-1 selection are shown to provide suboptimal performance. Instead, the paper develops a query-aware strategy and demonstrates its advantages over existing solutions through empirical evaluation on real and synthetic datasets.
Introduction to Optimization with Genetic Algorithm (GA)Ahmed Gad
Selection of the optimal parameters for machine learning tasks is challenging. Some results may be bad not because the data is noisy or the learning algorithm used is weak, but due to a bad choice of parameter values. This article gives a brief introduction to evolutionary algorithms (EAs) and describes the genetic algorithm (GA), one of the simplest random-based EAs.
References:
Eiben, Agoston E., and James E. Smith. Introduction to Evolutionary Computing. Vol. 53. Heidelberg: Springer, 2003.
https://www.linkedin.com/pulse/introduction-optimization-genetic-algorithm-ahmed-gad
https://www.kdnuggets.com/2018/03/introduction-optimization-with-genetic-algorithm.html
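As a companion to this introduction, the GA loop described in such tutorials (selection, crossover, mutation, repeated over generations) can be sketched in a minimal, self-contained form; all parameter values below are illustrative defaults, not taken from the article, and the fitness function is the classic OneMax toy problem.

```python
import random

def genetic_algorithm(fitness, n_bits=16, pop_size=20, generations=50,
                      p_cross=0.9, p_mut=0.02, rng=random.Random(42)):
    """Minimal generational GA over bitstrings: binary tournament
    selection, one-point crossover, bit-flip mutation. Maximizes
    `fitness` and returns the best individual ever seen."""
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = max(pop, key=fitness)
    for _ in range(generations):
        def tournament():
            a, b = rng.sample(pop, 2)
            return a if fitness(a) >= fitness(b) else b
        children = []
        while len(children) < pop_size:
            p1, p2 = tournament(), tournament()
            if rng.random() < p_cross:
                cut = rng.randrange(1, n_bits)
                p1, p2 = p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
            # Bit-flip mutation produces the two offspring.
            children += [[1 - g if rng.random() < p_mut else g for g in c]
                         for c in (p1, p2)]
        pop = children[:pop_size]
        best = max(pop + [best], key=fitness)
    return best

# OneMax: fitness is simply the number of ones in the bitstring.
solution = genetic_algorithm(sum)
```

The GA should drive the population toward the all-ones string, illustrating how selection pressure plus variation operators optimize without gradient information.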
This document discusses different types of machine learning problems including association rule learning and classification. Association rule learning aims to discover relationships between variables in large datasets, such as customers that buy onions and potatoes also tend to buy hamburgers. Classification involves identifying which category a new observation belongs to based on a training set where category membership is known, such as recognizing characters, identifying faces, or diagnosing illnesses. Several common machine learning algorithms used for classification are also listed.
This document summarizes a research paper that analyzes the performance and convergence of a novel genetic algorithm model towards finding global minima. The paper introduces genetic algorithms, which are probabilistic search algorithms inspired by natural evolution. It describes the components of genetic algorithms, including chromosomes, fitness functions, reproduction, crossover, and mutation operators. It also discusses encoding solutions as chromosomes and two common genetic algorithm models: Holland's original model and the common model. The paper aims to present an analysis of applying genetic algorithms to optimize test functions and finding their global minima.
A Modified KS-test for Feature SelectionIOSR Journals
This document proposes a modified Kolmogorov-Smirnov (KS) test-based feature selection algorithm. It begins with an overview of feature selection and its benefits. It then discusses two common feature selection approaches: filter and wrapper models. The document proposes a fast redundancy removal filter based on a modified KS statistic that utilizes class label information to compare feature pairs. It compares the proposed algorithm to other methods like Correlation Feature Selection (CFS) and KS-Correlation Based Filter (KS-CBF). The efficiency and effectiveness of the various methods are tested on standard classifiers. In most cases, the proposed approach achieved equal or better classification accuracy compared to using all features or the other algorithms.
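The paper's modified, class-aware KS statistic is not reproduced in this summary; as background, the plain two-sample Kolmogorov-Smirnov statistic it builds on (the maximum gap between two empirical CDFs) can be computed as follows:

```python
import bisect

def ks_statistic(xs, ys):
    """Two-sample Kolmogorov-Smirnov statistic: the maximum absolute
    difference between the empirical CDFs of the two samples."""
    xs, ys = sorted(xs), sorted(ys)

    def ecdf(vals, t):
        # Fraction of values <= t in an already-sorted list.
        return bisect.bisect_right(vals, t) / len(vals)

    # Evaluating at every sample point suffices, since the ECDFs
    # only change value at sample points.
    return max(abs(ecdf(xs, t) - ecdf(ys, t)) for t in xs + ys)
```

Identical samples give 0 and fully separated samples give 1, which is what makes the statistic usable as a redundancy/similarity score between feature distributions.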
Particle Swarm Optimization based K-Prototype Clustering Algorithm iosrjce
This document summarizes a research paper that proposes a new Particle Swarm Optimization (PSO) based K-Prototype clustering algorithm to cluster mixed numeric and categorical data. It begins with background information on clustering algorithms like K-Means, K-Modes, and K-Prototype. It then describes the K-Prototype algorithm, PSO, and discrete binary PSO. Related work integrating PSO with other clustering algorithms is also reviewed. The proposed approach uses binary PSO to select improved initial prototypes for K-Prototype clustering in order to obtain better clustering results than traditional K-Prototype and avoid local optima.
Association rule discovery for student performance prediction using metaheuri...csandit
With the increasing use of data mining techniques to improve the operation of educational systems, Educational Data Mining has been introduced as a new and fast-growing research area. Educational Data Mining aims to analyze data in educational environments in order to solve educational research problems. In this paper a new associative classification technique is proposed to predict students' final performance. In contrast to machine learning approaches such as ANNs and SVMs, associative classifiers maintain interpretability along with high accuracy. In this research work, we employ Honeybee Colony Optimization and Particle Swarm Optimization to extract association rules for student performance prediction as a multi-objective classification problem. Results indicate that the proposed swarm-based algorithm outperforms well-known classification techniques on the student performance prediction problem.
In the present day, a huge amount of data is generated every minute and transferred frequently. Although
the data is sometimes static, most commonly it is dynamic and transactional. New data that is being
generated is constantly added to the old/existing data. To discover knowledge from this
incremental data, one approach is to run the algorithm repeatedly on the modified data sets, which is time
consuming. Moreover, to analyze the datasets properly, construction of an efficient classifier model is necessary.
The objective of developing such a classifier is to classify unlabeled dataset into appropriate classes. The
paper proposes a dimension reduction algorithm that can be applied in dynamic environment for
generation of reduced attribute set as dynamic reduct, and an optimization algorithm which uses the
reduct and build up the corresponding classification system. The method analyzes the new dataset, when it
becomes available, and modifies the reduct accordingly to fit the entire dataset and from the entire data
set, interesting optimal classification rule sets are generated. The concepts of discernibility relation,
attribute dependency and attribute significance of Rough Set Theory are integrated for the generation of
dynamic reduct set, and optimal classification rules are selected using PSO method, which not only
reduces the complexity but also helps to achieve higher accuracy of the decision system. The proposed
method has been applied on some benchmark dataset collected from the UCI repository and dynamic
reduct is computed, and from the reduct optimal classification rules are also generated. Experimental
result shows the efficiency of the proposed method.
This document proposes algorithms to efficiently find the best parameter K for the emerging pattern-based classifier PCL. It first summarizes the PCL classifier and discusses how the parameter K impacts its performance. It then presents the rPCL and ePCL algorithms to determine the optimal K value. rPCL repeatedly subsamples the training data and evaluates different K values to find the best average accuracy. ePCL enhances rPCL by using an incremental pattern maintenance technique called PSM to avoid recomputing emerging patterns from scratch for each subsample, improving runtime efficiency. Experimental results show that finding the best K through these algorithms significantly improves PCL's accuracy.
Efficiently Finding the Best Parameter for the Emerging Pattern-Based ClassifierGanesh Pavan Kambhampati
This document proposes algorithms to efficiently find the best parameter K for the emerging pattern-based classifier PCL. It first summarizes the PCL classifier and discusses how the parameter K impacts its performance. It then presents the rPCL and ePCL algorithms. rPCL repeatedly runs cross-validation to determine the best K by averaging accuracies. ePCL enhances rPCL by using an incremental pattern maintenance technique to avoid repeatedly constructing emerging patterns from scratch, improving efficiency. Experimental results show ePCL finds the best K much faster than rPCL by leveraging pattern maintenance.
ADABOOST ENSEMBLE WITH SIMPLE GENETIC ALGORITHM FOR STUDENT PREDICTION MODELijcsit
Predicting student performance is a great concern to higher education managements. This
prediction helps to identify and to improve students' performance. Several factors may improve this
performance. In the present study, we employ data mining processes, particularly classification, to
enhance the quality of the higher educational system. Recently, a new direction has been taken to improve
classification accuracy by combining classifiers. In this paper, we design and evaluate a fast learning
algorithm using an AdaBoost ensemble with a simple genetic algorithm, called “Ada-GA”, where the genetic
algorithm is demonstrated to successfully improve the accuracy of the combined classifier.
The Ada-GA algorithm proved to be of considerable usefulness in identifying at-risk students early,
especially in very large classes. This early prediction allows the instructor to provide appropriate advising
to those students. The Ada-GA algorithm is implemented and tested on the ASSISTments dataset; the results
showed that this algorithm successfully improved the detection accuracy as well as reducing the
complexity of computation.
This document discusses using evolutionary techniques like genetic algorithms and particle swarm optimization to reduce the order of large-scale linear systems. Specifically, it examines using GA and PSO to find stable reduced order models by minimizing the integral squared error between the original and reduced system responses. Both techniques guarantee stability if the original system is stable. The paper provides background on model order reduction techniques and describes how GA and PSO are applied, including representing systems as chromosomes and using selection, crossover, and mutation operators. Results from a numerical example are compared to a conventional method.
A Preference Model on Adaptive Affinity PropagationIJECEIAES
In recent years, two new data clustering algorithms have been proposed. One of them is Affinity Propagation (AP). AP is a data clustering technique that uses iterative message passing and considers all data points as potential exemplars. Two important inputs of AP are a similarity matrix (SM) of the data and the parameter ”preference” p. Although the original AP algorithm has shown much success in data clustering, it still suffers from one limitation: it is not easy to determine the value of the parameter ”preference” p that yields an optimal clustering solution. To address this limitation, we propose a new model of the parameter ”preference” p, based on the similarity distribution. Given the SM and p, the Modified Adaptive AP (MAAP) procedure is run; MAAP omits the adaptive p-scanning algorithm of the original Adaptive-AP (AAP) procedure. Experimental results on random non-partition and partition data sets show that (i) the proposed algorithm, MAAP-DDP, is slower than original AP for the random non-partition dataset, and (ii) for the random 4-partition dataset and real datasets the proposed algorithm succeeds in identifying clusters according to the number of the datasets' true labels, with execution times comparable to those of original AP. Besides that, the MAAP-DDP algorithm is more feasible and effective than the original AAP procedure.
A Survey Ondecision Tree Learning Algorithms for Knowledge DiscoveryIJERA Editor
Immense volumes of data are populated into repositories from various applications. In order to find desired information and knowledge in large datasets, data mining techniques are very helpful. Classification is one of the knowledge discovery techniques. In classification, decision trees are very popular in the research community due to their simplicity and easy comprehensibility. This paper presents an updated review of recent developments in the field of decision trees.
Oversampling technique in student performance classification from engineering...IJECEIAES
This document discusses various oversampling techniques for dealing with imbalanced data in student performance classification. It compares SMOTE, Borderline-SMOTE, SVMSMOTE, and ADASYN oversampling combined with MLP, gradient boosting, AdaBoost, and random forest classifiers. The results show that Borderline-SMOTE gave the best performance for predicting the minority (low performance) class according to several evaluation metrics. SVMSMOTE also performed well overall, particularly for recall, F1-measure, and AUC. Gradient boosting provided high and consistent precision, recall, F1-measure, and AUC across the different oversampling methods.
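The core step shared by the SMOTE variants compared above is interpolating a synthetic minority sample toward a randomly chosen minority-class neighbour; a minimal sketch of that step (not the Borderline, SVM, or ADASYN refinements, which differ in how candidate points and neighbours are chosen) looks like this:

```python
import random

def smote_sample(x, minority_neighbors, rng=random):
    """One SMOTE synthesis step: pick a random minority-class
    neighbour of x and interpolate a random fraction of the way
    toward it, yielding a new synthetic minority point."""
    nb = rng.choice(minority_neighbors)
    gap = rng.random()  # uniform in [0, 1): how far along the segment
    return [xi + gap * (ni - xi) for xi, ni in zip(x, nb)]
```

Because the new point lies on the segment between two real minority samples, it enlarges the minority region without simply duplicating existing records.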
This document discusses using particle swarm optimization to improve the k-prototype clustering algorithm. The k-prototype algorithm clusters data with both numeric and categorical attributes but can get stuck in local optima. The proposed method uses particle swarm optimization, a global optimization technique, to guide the k-prototype algorithm towards better clusterings. Particle swarm optimization models potential solutions as particles that explore the search space. It is integrated with k-prototype clustering to avoid locally optimal solutions and produce better clusterings. The method is tested on standard benchmark datasets and shown to outperform traditional k-modes and k-prototype clustering algorithms.
EFFICIENTLY PROCESSING OF TOP-K TYPICALITY QUERY FOR STRUCTURED DATAcsandit
This work presents a novel ranking scheme for structured data. We show how to apply the
notion of typicality analysis from cognitive science and how to use this notion to formulate the
problem of ranking data with categorical attributes. First, we formalize the typicality query
model for relational databases. We adopt Pearson correlation coefficient to quantify the extent
of the typicality of an object. The correlation coefficient estimates the extent of statistical
relationships between two variables based on the patterns of occurrences and absences of their
values. Second, we develop a top-k query processing method for efficient computation. TPFilter
prunes unpromising objects based on tight upper bounds and selectively joins tuples of highest
typicality score. Our methods efficiently prune unpromising objects based on upper bounds.
Experimental results show our approach is promising for real data.
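The typicality score above rests on Pearson's correlation coefficient; the coefficient itself (though not the TPFilter pruning, which this summary does not detail) is straightforward to compute:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length
    numeric sequences: covariance normalized by both standard
    deviations, ranging from -1 to 1."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)
```

Perfectly co-occurring value patterns score 1, perfectly opposed patterns score -1, which is what lets the coefficient rank how typical an object's attribute-value occurrences are.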
Sakanashi, h.; kakazu, y. (1994): co evolving genetic algorithm with filtered...ArchiLab 7
This document describes a co-evolving genetic algorithm called the filtering genetic algorithm (GA). It consists of two interacting GAs, GAfit and GAsch, and a filter. GAfit's population is evaluated using a modified function that incorporates the filter, which aims to direct the population away from areas of convergence found by GAsch. GAsch's population, consisting of schemas, is evaluated based on how well the schemas cover strings in GAfit's population. The filter is updated according to the degree of convergence in GAfit's population, and with varying strength it influences GAfit's evaluation and drives it toward new areas. The approach allows the algorithm to efficiently locate multiple optima in multimodal functions without getting stuck in local optima.
Genetic Programming for Generating Prototypes in Classification ProblemsTarundeep Dhot
This document summarizes a research paper that proposes using genetic programming to generate prototypes for classification problems. It describes encoding prototypes as variable-length logical expressions and using a multi-tree representation. The approach evolves a population of these prototypes using genetic operators like crossover and mutation. Classification involves matching samples to prototypes, and fitness is based on recognition rate on a training set. The method was tested on several datasets and achieved higher recognition rates than an alternative genetic programming approach.
An experimental study on hypothyroid using rotation forestIJDKP
This paper focuses mainly on hypothyroidism, a medical condition caused by an underactive thyroid gland. The
dataset used for the study on hypothyroid is taken from the UCI repository. Classification of this thyroid
disease is a considerable task. An experimental study is carried out using rotation forest with feature
selection methods to achieve better accuracy. An important step toward good accuracy is pre-processing,
so two feature selection techniques are used here. A filter method, correlation-based feature subset
selection, and a wrapper method helped remove irrelevant as well as useless features from the data
set. Fourteen different machine learning algorithms were tested on the hypothyroid data set using rotation
forest, which yielded positively improved results.
Markov Chain and Adaptive Parameter Selection on Particle Swarm Optimizer ijsc
Particle Swarm Optimizer (PSO) is such a complex stochastic process that analysis of its stochastic behavior is not easy. The choice of parameters plays an important role, since it is critical to the performance of PSO. As far as our investigation is concerned, most of the relevant research is based on computer simulations, and little of it on a theoretical approach. In this paper, a theoretical approach is used to investigate the behavior of PSO. Firstly, a state of PSO is defined, which contains all the information needed for the future evolution. Then the memoryless property of this state is investigated and proved. Secondly, by using the concept of the state and suitably dividing the whole process of PSO into a countable number of stages (levels), a stationary Markov chain is established. Finally, according to the properties of a stationary Markov chain, an adaptive method for parameter selection is proposed.
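The state whose evolution such analyses study is produced by the standard PSO update, sketched below; the inertia and acceleration values are typical textbook defaults, not the adaptive parameters proposed in the paper.

```python
import random

def pso_step(positions, velocities, pbest, gbest,
             w=0.7, c1=1.5, c2=1.5, rng=random.Random(1)):
    """One synchronous PSO update for every particle:
    v <- w*v + c1*r1*(pbest - x) + c2*r2*(gbest - x);  x <- x + v."""
    new_p, new_v = [], []
    for x, v, pb in zip(positions, velocities, pbest):
        r1, r2 = rng.random(), rng.random()
        nv = [w * vj + c1 * r1 * (pbj - xj) + c2 * r2 * (gbj - xj)
              for xj, vj, pbj, gbj in zip(x, v, pb, gbest)]
        new_v.append(nv)
        new_p.append([xj + vj for xj, vj in zip(x, nv)])
    return new_p, new_v
```

A quick sanity check: a particle sitting exactly at its personal and global best with zero velocity stays put, since both attraction terms vanish.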
Rd1 r17a19 datawarehousing and mining_cap617t_cap617Ravi Kumar
Some other areas where we can apply data mining in University are:
1. Predicting student performance and identifying at-risk students to provide early interventions.
2. Analyzing course evaluations and feedback to identify strengths and weaknesses in teaching methods.
3. Examining enrollment and registration patterns to understand student preferences and inform course scheduling.
4. Mining alumni data to understand career paths, further education choices and how well programs prepare students.
5. Analyzing library usage data to optimize resources, collections and services based on user needs.
6. Applying clustering and segmentation to understand different student profiles and tailor support services.
7. Mining online learning platforms to understand engagement, predict dropouts
This document discusses using fuzzy set theory and decision trees to predict student performance. It proposes using fuzzy sets to represent numeric student data like test scores and attendance to allow for imprecise values. A decision tree is generated on this fuzzy data set to classify students as passing or failing. The fuzzy decision tree achieves an accuracy of 81.5% compared to 76% for a non-fuzzy decision tree, indicating fuzzy sets improve predictive performance. Location, attendance, and prior academic performance were identified as important factors impacting student results.
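Fuzzifying a numeric attribute like a test score means mapping it to a membership degree in [0, 1]; a triangular membership function is a common (assumed) choice for this, though the document does not specify which shape it uses.

```python
def triangular(x, a, b, c):
    """Triangular fuzzy membership function: 0 outside [a, c],
    rising linearly to a peak of 1 at b, then falling linearly."""
    if x <= a or x >= c:
        return 0.0
    if x < b:
        return (x - a) / (b - a)
    if x == b:
        return 1.0
    return (c - x) / (c - b)
```

For instance, with a "medium score" set defined by (a, b, c) = (0, 50, 100), a score of 50 is fully "medium" while 25 belongs only to degree 0.5.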
An Efficient PSO Based Ensemble Classification Model on High Dimensional Data...ijsc
The document proposes a Particle Swarm Optimization (PSO) based ensemble classification model to improve classification of high-dimensional biomedical datasets. It develops an optimized PSO technique to select optimal features and initialize weights for base classifiers in the ensemble model. Experimental results on microarray datasets show the proposed model achieves higher accuracy, true positive rate, and lower error rate compared to traditional feature selection based classification models.
Similar to Discovery of Jumping Emerging Patterns Using Genetic Algorithm (20)
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid -Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptxSitimaJohn
Ocean Lotus cyber threat actors represent a sophisticated, persistent, and politically motivated group that poses a significant risk to organizations and individuals in the Southeast Asian region. Their continuous evolution and adaptability underscore the need for robust cybersecurity measures and international cooperation to identify and mitigate the threats posed by such advanced persistent threat groups.
Ocean lotus Threat actors project by John Sitima 2024 (1).pptx
Discovery of Jumping Emerging Patterns Using Genetic Algorithm
Sumera Qurat ul Ain (sumera_quratulain@yahoo.com)
Saif ur Rehman (saif@uaar.edu.pk)
UIIT, Arid Agriculture University, Rawalpindi, Pakistan
Abstract— Patterns that occur only in objects belonging to a single class are called Jumping Emerging Patterns (JEPs). JEP-based classifiers are considered among the most successful classification systems. Owing to their comprehensibility, simplicity, and strong differentiating ability, JEPs have gained significant recognition. However, discovering JEPs in a large pattern space is normally a time-consuming and challenging task because of their exponential behaviour. In this work, a novel method based on a genetic algorithm (GA) is proposed to discover JEPs in large pattern spaces. Since the complexity of a GA is lower than that of other algorithms, we combine the power of JEPs and GAs to find high-quality JEPs from datasets and improve the performance of the classification system. Unlike other methods in the literature that compute the complete set of JEPs, our proposed method explores a set of high-quality JEPs from the pattern search space; large numbers of duplicate and redundant JEPs are filtered out during the discovery process. Experimental results show that our proposed Genetic-JEPs are effective and accurate for classifying a variety of datasets and in general achieve higher accuracy than other standard classifiers.
Keywords- Classification, Jumping Emerging Patterns, Genetic
Algorithm.
I. INTRODUCTION
Classification is one of the primary objectives in data mining and has been researched broadly in the fields of expert systems [1], machine learning [6], and neural networks [7] over the decades. In machine learning, classification is the procedure of finding classifiers, or predictive models, that can assign test data to the correct class. To derive a model, training data is first examined on the basis of a quantifiable feature set (feature types include ordinal, categorical, integer-valued, and real-valued) [8]. This training data contains, for each instance, the class to which it belongs. An essential part of classification is the "classifier": an implementation of an algorithm, or a mathematical function, that assigns unlabeled test instances to a category or class.
Problems arise when high accuracy is required together with an explanation of the basis on which an object is assigned to a class, so that the decision is understandable by users [2]. Many classifiers lack such explanations, which is an important drawback to their use. For example, under the US Equal Credit Opportunity Act, if credit has been denied by a financial institution, an explanation is also required stating the grounds on which credit has been rejected; if such reasons are not explained properly, the denial of credit is considered illegal [4]. Medical diagnosis and mineral prospection are other fields, to name a few, where explanation and clarification are key user requirements for classifiers [3]. One family of understandable classifiers is developed from emerging patterns [5]. A pattern can be defined as an expression in a language describing a group of objects [7]; an emerging pattern (EP) occurs abundantly in objects of one class but is rare in objects of the other classes [9]. A Jumping Emerging Pattern (JEP) is a type of EP: an itemset whose support jumps from zero in one class of data to nonzero in another class, so that the ratio of supports rises to infinity [1]. Due to their strong predictive power, JEPs are extensively used in emerging-pattern-based classifiers [10]. Moreover, JEPs are conjunctions of simple conditions (e.g. [Color = red] AND [Gender = female] AND [Age > 26]) and are therefore easily understandable by the user [11].
With all the merits of JEP-based classification, a major drawback lies in the number of discovered patterns that are useful for classification. If the data is noisy, the number of patterns becomes larger. Searching for useful JEPs in training data is a key procedure in a JEP-based classification system. Previous techniques use either a support threshold value or a predefined number of patterns [12]. Because of these problems, JEP discovery is considered a challenging task. To avoid redundancy among JEPs, we use a subset of JEPs called minimal JEPs. A genetic algorithm adopts a global search method that is better at finding solutions, particularly in large search spaces. Since the time complexity of a GA is lower than that of other algorithms [13], we combine the power of JEPs and GAs to find significant JEPs from different datasets for classification.
In our work we propose a new method to discover minimal JEPs that have a significant effect on classification accuracy. Exploring the fittest JEPs in a dataset is a multistep process. In the first step, chromosomes (candidate solutions) are initialized; this set of chromosomes is termed the population. In subsequent steps, the solutions of each population are used to generate the population of the next stage, in the hope of obtaining a better population in each generation compared with the previous one. Population generation is regulated by a fitness function, and the whole process is repeated until the terminating condition is reached. After the genetic algorithm completes its iterations, we obtain a list of high-quality, fittest JEPs, which is evaluated on test data for classification accuracy.
International Journal of Computer Science and Information Security (IJCSIS),
Vol. 16, No. 4, April 2018
331 https://sites.google.com/site/ijcsis/
ISSN 1947-5500
This paper is organized as follows: Section II presents background knowledge and related work. Section III presents the proposed algorithm for the discovery of JEPs. Section IV presents experimental results and analysis, and Section V concludes the paper.
II. BACKGROUND AND RELATED WORK
This section presents basic concepts and definitions used throughout this paper. A dataset is a set of objects having multiple attributes (A1, A2, ..., An); each data object is referred to as an instance. Each instance of the dataset has an associated class C ϵ {C1, C2, ..., Cn} [14]. Let I represent the set of items; a pattern P is an itemset, i.e. a subset of I. An instance S contains an itemset P provided P ⊆ S. The support of an itemset P in a dataset D, written suppD(P), is countD(P)/|D|, where countD(P) is the number of instances in D containing P and |D| denotes the total number of instances in D. The Growth Rate is the ratio of the support of a pattern in its native class to its support in the other class; it represents the predictive power of the pattern. For a dataset D partitioned into D1 and D2, GrowthRate(P) is defined as:
GrowthRate(P) =
  0,                        if suppD1(P) = 0 and suppD2(P) = 0
  ∞,                        if suppD1(P) = 0 and suppD2(P) > 0
  suppD2(P) / suppD1(P),    otherwise
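The definitions above translate directly into code. The following is a minimal sketch in Python (the paper's implementation is in C#, so this is only an illustration of the definitions, using the Table I data):

```python
from typing import FrozenSet, List

def support(pattern: FrozenSet[str], dataset: List[FrozenSet[str]]) -> float:
    """suppD(P) = countD(P) / |D|: the fraction of instances containing P."""
    return sum(1 for inst in dataset if pattern <= inst) / len(dataset)

def growth_rate(pattern: FrozenSet[str],
                d1: List[FrozenSet[str]],
                d2: List[FrozenSet[str]]) -> float:
    """GrowthRate of P from D1 to D2, following the case split above."""
    s1, s2 = support(pattern, d1), support(pattern, d2)
    if s1 == 0 and s2 == 0:
        return 0.0
    if s1 == 0:                    # zero support in D1, positive in D2
        return float("inf")        # infinite growth rate: P is a JEP
    return s2 / s1

# Table I data: {l,p} never occurs in D2, so its growth rate from
# D2 to D1 is infinite, i.e. it is a JEP of class 1.
D1 = [frozenset("lnop"), frozenset("l"), frozenset("mp"), frozenset("mnop")]
D2 = [frozenset("lm"), frozenset("np"), frozenset("lmno"), frozenset("op")]
print(growth_rate(frozenset("lp"), D2, D1))   # → inf
```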
Definition 1. Let D be a dataset composed of D1 and D2, where the objects of D1 belong to one class and the objects of D2 belong to a different class. A Jumping Emerging Pattern (JEP) from D1 to D2 is an itemset X that satisfies the condition:
suppD1(X) = 0 and suppD2(X) > 0
Definition 2. If a JEP does not contain another JEP as a proper subset, then it is called a minimal JEP.
Example 1. For the dataset presented in Table I, the itemsets {l,p}(1:0), {l,n,p}(1:0), {l,o,p}(1:0), {l,n,o,p}(1:0), {m,p}(2:0), {m,n,p}(1:0), {m,o,p}(1:0), {m,n,o,p}(1:0), and {n,o,p}(2:0) are JEPs of class 1, where (x:y) gives the support counts in D1 and D2; the itemsets {l,m}(0:2), {l,m,n}(0:1), {l,m,o}(0:1), and {l,m,n,o}(0:1) are JEPs of class 2. The total number of JEPs is 13, and four of them are minimal JEPs, namely {l,p}(1:0), {m,p}(2:0), {n,o,p}(2:0), and {l,m}(0:2) [1].
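Example 1 can be checked mechanically. The following brute-force sketch (ours, not the paper's method) enumerates every itemset over the Table I data and confirms the counts of 13 JEPs and 4 minimal JEPs:

```python
from itertools import combinations

D1 = [frozenset("lnop"), frozenset("l"), frozenset("mp"), frozenset("mnop")]
D2 = [frozenset("lm"), frozenset("np"), frozenset("lmno"), frozenset("op")]
ITEMS = sorted(set().union(*D1, *D2))            # ['l', 'm', 'n', 'o', 'p']

def count(p, data):
    return sum(1 for inst in data if p <= inst)

def jeps(d_zero, d_pos):
    """All itemsets with zero support in d_zero and nonzero in d_pos."""
    return [frozenset(c)
            for r in range(1, len(ITEMS) + 1)
            for c in combinations(ITEMS, r)
            if count(frozenset(c), d_zero) == 0
            and count(frozenset(c), d_pos) > 0]

class1 = jeps(D2, D1)    # JEPs of class 1 (support zero in D2)
class2 = jeps(D1, D2)    # JEPs of class 2 (support zero in D1)

def minimal(jep_list):
    """Keep only JEPs with no proper JEP subset (Definition 2)."""
    return [p for p in jep_list if not any(q < p for q in jep_list)]

print(len(class1) + len(class2))                 # → 13
print(sorted("".join(sorted(p)) for p in minimal(class1) + minimal(class2)))
# → ['lm', 'lp', 'mp', 'nop']
```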
In the available literature, researchers have proposed several JEP-based classifiers [15]. In [16] the concept of essential Jumping Emerging Patterns (eJEPs) is proposed; this classifier utilizes fewer JEPs than other JEP classifiers. Its limitations include that it is neither complete nor sound for discovering all JEPs, and that it yields itemsets other than actual JEPs [2]. The classifier proposed in [2] is an FP-tree-based technique which
TABLE I
Example dataset having two classes
ID | Class Label | Instance (Itemset)
1 | D1 | {l,n,o,p}
2 | D1 | {l}
3 | D1 | {m,p}
4 | D1 | {m,n,o,p}
5 | D2 | {l,m}
6 | D2 | {n,p}
7 | D2 | {l,m,n,o}
8 | D2 | {o,p}
uses the FP-growth algorithm to explore JEPs; exploring a tree-based structure has high time and space complexity compared with other data structures. In [1] the Strong JEP (SJEP) is proposed; the authors claim that SJEPs achieve higher quality than JEPs, and the classifier is efficient due to the small number of SJEPs. In the same research the authors propose noise-tolerant emerging patterns (NEPs) and generalized noise-tolerant emerging patterns (GNEPs). A limitation of these approaches is that SJEPs are found in two steps: initially a large number of JEPs is generated, and in the second step the generated JEPs are filtered to produce a small set of SJEPs. This two-step process is time consuming, and its time and space complexity is higher than that of its competitors in the literature; another major demerit of this study is its requirement of a minimum threshold value to prune the large set of JEPs. In [17] the negJEP-classifier and JEPN-classifier are introduced; these classifiers use negative information for classification. The results revealed that a limitation of this research is the infeasibility of building the JEPN classifier for some datasets [17]. [18] discovers the top-k minimal jumping emerging patterns; the proposed method finds strong JEPs using a CP-tree and, instead of exploring all JEPs, reduces the search space and explores only the top-k JEPs on the basis of an increasing minimum support threshold. The drawback of this research is the additional pattern counting it requires [18]. In [19] highest-impact JEPs are introduced, based on a new coefficient, REAL/ALL; this coefficient is helpful for comparing the discriminative power of different JEP collections, and the demerit of this research is its limited applicability to specific datasets. An improved tree-based method is proposed in [14]; this method explores SJEPs more efficiently, but its drawback is that it cannot mine SJEPs directly. [20] extends the border-based mining method and proposes a classification model that discovers JEPs with occurrence counts. The drawback of this method, which could be improved, is its discovery of a large number of JEPs. The scheme presented in [21] is based on a CP-tree whose growth rate increases dynamically; it prunes the search space with the help of a new pattern-pruning method. Its demerits, as in every tree-based search, are its space complexity and its inability to handle more complex patterns [21]. In [12] a method for the discovery of minimal JEPs is introduced, and to our knowledge the most recently
reported work on JEPs uses cosine similarity with JEPs for classification [10], though its accuracy is compromised.
Various methods have been proposed in the past to reduce the huge pattern set, but it remains an open research area to considerably reduce the number of derived patterns and improve the worth of the selected patterns.
III. PROPOSED ALGORITHM FOR DISCOVERY OF JEPS
Inspired by Darwin's theory of evolution, the genetic algorithm is a solution approach for many problems. A genetic algorithm adopts a global search strategy that is well suited to finding optimized solutions, particularly in large search spaces. In the following sections, the detailed procedure for discovering JEPs using a genetic algorithm is given; the major steps are: individual representation, initial population generation, fitness evaluation, and creation of the next generation using selection, reproduction, crossover, and mutation.
A. Individual Representation
Different types of attributes exist, e.g. categorical and numerical; for numerical attributes we assume their value ranges are discretized into intervals. Patterns are normally expressed as combinations of attributes and their values, like (Color = red, Gender = female, Age = 26), or as logical properties, like [Color = red] AND [Gender = female] AND [Age > 26]. Normally, data needs to be encoded into chromosomes because a genetic algorithm cannot directly handle data in the solution search space. Every chromosome represents a candidate pattern and has the form (A1 = V1j) Λ ... Λ (An = Vnj), where Ai represents the ith feature, Vij represents the jth value of the ith feature's domain, and n represents the length of the chromosome [22].
Fig. 1. Individual Representation
B. Initial Population Generation
We randomly create a population of n chromosomes (where n is the number of chromosomes in the population); the population size can be fixed according to the problem. In our method we use a population of 100 chromosomes. These chromosomes represent patterns formed by randomly selecting attributes and their values.
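As a concrete illustration, chromosomes and the initial population might be encoded as follows. This is a sketch with hypothetical attribute domains of our choosing; the paper does not give its encoding in code:

```python
import random

# Hypothetical discretized attribute domains, for illustration only
DOMAINS = [
    ["red", "green", "blue"],    # Color
    ["male", "female"],          # Gender
    ["<=26", ">26"],             # Age, discretized into intervals
]

def random_chromosome(rng: random.Random) -> list:
    """One gene slot per attribute; None marks an attribute absent from
    the pattern, so patterns of any effective length can arise."""
    chrom = [rng.choice(dom) if rng.random() < 0.5 else None
             for dom in DOMAINS]
    if all(g is None for g in chrom):            # avoid the empty pattern
        i = rng.randrange(len(DOMAINS))
        chrom[i] = rng.choice(DOMAINS[i])
    return chrom

def initial_population(size: int, seed: int = 0) -> list:
    rng = random.Random(seed)
    return [random_chromosome(rng) for _ in range(size)]

population = initial_population(100)             # the paper uses PSize = 100
```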
C. Fitness Evaluation of Individual
When applying a genetic algorithm, the fitness function is the central component and plays a vital role in problem optimization. We need to compute measures of these patterns (the support and growth ratio described in Section II) to analyze their power to differentiate between classes. After calculating the support and growth ratio of a pattern, the proposed algorithm checks whether the pattern is a JEP or not. Since our method explores minimal JEPs, extra attributes are removed by applying a pruning method. The growth ratio of a pattern is its fitness value; patterns with very low fitness values are less likely to survive into the next generation. A class is assigned to each pattern according to its growth rate. Discovered minimal JEPs are then added to a global pattern list; each time a discovered pattern is added, a check is performed to avoid duplicates in the global pattern list.
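The pruning of extra attributes can be sketched as a greedy minimization: drop items one at a time as long as the pattern remains a JEP. This is our reading of the step (the paper does not spell out the pruning method), shown on the itemset form of patterns:

```python
def count(p, data):
    return sum(1 for inst in data if p <= inst)

def minimize_jep(p, d_zero, d_pos):
    """Greedily drop items while the pattern keeps zero support in
    d_zero and positive support in d_pos, yielding a minimal JEP."""
    p = set(p)
    for item in sorted(p):
        q = p - {item}
        if q and count(q, d_zero) == 0 and count(q, d_pos) > 0:
            p = q
    return frozenset(p)

pattern_list = []
def record(p, d_zero, d_pos, cls):
    """Add a discovered JEP to the global list, skipping duplicates."""
    m = minimize_jep(p, d_zero, d_pos)
    if (m, cls) not in pattern_list:
        pattern_list.append((m, cls))

# On Table I, the class-1 JEP {m,n,o,p} minimizes to {n,o,p}
D1 = [frozenset("lnop"), frozenset("l"), frozenset("mp"), frozenset("mnop")]
D2 = [frozenset("lm"), frozenset("np"), frozenset("lmno"), frozenset("op")]
print(sorted(minimize_jep(set("mnop"), D2, D1)))   # → ['n', 'o', 'p']
```

Greedy dropping yields one minimal JEP contained in the input; the order in which items are tried decides which minimal subset is reached.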
D. Create Next Generation
After fitness evaluation, we create the new population for the next generation by repeating the following steps until the new population reaches the population size.
1) Selection: In the selection process, patterns with good fitness values are used to produce the next generation, with the expectation that the offspring have higher fitness values than their parents. To obtain better members we use proportional roulette-wheel selection. Each chromosome has a selection probability directly proportional to the individual's fitness value. Chromosomes with higher probability capture a larger fragment of the roulette wheel, while the less fit capture a correspondingly smaller fragment. Clearly, chromosomes with larger fragments have a greater chance of being chosen as parents for the next generation. Let f1, f2, ..., fn be the fitness values of individuals 1, 2, ..., n. Then the selection probability Pi for individual i is calculated as [23]:

Pi = fi / Σⱼ fj
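Roulette-wheel selection with Pi = fi / Σ fj might look like this (a sketch of ours; infinite JEP fitness values would first need capping to keep the probabilities finite):

```python
import random

def roulette_select(population, fitnesses, rng):
    """Proportional roulette wheel: individual i is chosen with
    probability f_i / sum_j f_j."""
    total = sum(fitnesses)
    pick = rng.uniform(0, total)
    acc = 0.0
    for individual, f in zip(population, fitnesses):
        acc += f
        if pick <= acc:
            return individual
    return population[-1]          # guard against floating-point round-off

rng = random.Random(0)
parents = [roulette_select(["a", "b", "c"], [8.0, 1.0, 1.0], rng)
           for _ in range(10)]     # "a" holds 80% of the wheel
```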
2) Crossover: The crossover procedure begins by selecting two parent solutions (patterns) and generates two child solutions from them. We use single-point crossover: first a random point is selected, then all data beyond that point is swapped between the two parents. This results in two offspring patterns.
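Single-point crossover on the gene-list encoding can be sketched as follows (ours; parents are assumed to share the same gene length):

```python
import random

def single_point_crossover(p1: list, p2: list, rng: random.Random):
    """Pick one random cut point and swap all genes beyond it,
    producing two offspring."""
    point = rng.randrange(1, len(p1))        # cut strictly inside the list
    return p1[:point] + p2[point:], p2[:point] + p1[point:]

rng = random.Random(0)
c1, c2 = single_point_crossover(["red", "female", ">26"],
                                ["blue", "male", "<=26"], rng)
```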
3) Mutation: Mutation is applied to a single chromosome at a time. It preserves gene diversity in the population and ensures that the entire solution space is searched. In our proposed method the mutation rate is very low, i.e. 5%, because we do not want major changes in the population; mutation is used only for global optimization. The complete algorithm is given as follows:
Algorithm 1: Algorithm for discovery of JEPs using GA
Input:  D     : m x n matrix, training dataset
        GenNo : maximum number of allowed generations
        MR    : mutation rate
        CR    : crossover rate
        PSize : maximum number of chromosomes in the population
Output: the best jumping emerging patterns over all generations
1. Generate a random population of PSize chromosomes.
2. for i = 1 to GenNo do
3.   for j = 1 to PSize do
4.     Calculate the fitness of each individual.
5.     Add discovered patterns to the pattern list.
6.     Prune the pattern list.
7.     Calculate the selection probability of each individual.
8.   end for
9.   Create a fresh population by repeating the following steps until the population is complete:
10.    Selection()
11.    Crossover()
12.    Mutation()
13.    Add the new children to the fresh population.
14.  Use the newly created population for the next iteration of the algorithm.
15.  Check the termination criterion (a predefined number of generations): if it is satisfied, stop; otherwise go to step 2.
16. end for
17. Evaluate the pattern list on the test set.
18. Display the result.
19. end
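Putting the steps of Algorithm 1 together, here is a compact end-to-end sketch on the Table I data. The bit-vector encoding over the items and the large constant standing in for a JEP's infinite fitness are both our assumptions; the paper's parameters GenNo = 50, CR = 0.85, and a 5% mutation rate are kept:

```python
import random
from itertools import compress

ITEMS = "lmnop"
D1 = [set("lnop"), set("l"), set("mp"), set("mnop")]
D2 = [set("lm"), set("np"), set("lmno"), set("op")]
CAP = 1e6                      # stand-in for a JEP's infinite growth rate

def supp(p, data):
    return sum(1 for inst in data if p <= inst) / len(data)

def fitness(bits):
    p = set(compress(ITEMS, bits))
    if not p:
        return 0.0
    s1, s2 = supp(p, D1), supp(p, D2)
    hi, lo = max(s1, s2), min(s1, s2)
    if hi == 0:
        return 0.0
    return CAP if lo == 0 else hi / lo     # JEPs get the capped maximum

rng = random.Random(1)
pop = [[rng.randint(0, 1) for _ in ITEMS] for _ in range(100)]  # PSize = 100
found = set()
for gen in range(50):                                           # GenNo = 50
    scores = [fitness(c) for c in pop]
    found |= {frozenset(compress(ITEMS, c))
              for c, f in zip(pop, scores) if f == CAP}
    total = sum(scores)

    def select():
        pick, acc = rng.uniform(0, total), 0.0
        for c, f in zip(pop, scores):
            acc += f
            if pick <= acc:
                return c
        return pop[-1]

    nxt = []
    while len(nxt) < len(pop):
        a, b = select(), select()
        if rng.random() < 0.85:                                 # CR = 0.85
            cut = rng.randrange(1, len(ITEMS))
            a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
        for c in (a, b):                                        # MR = 5%
            nxt.append([1 - g if rng.random() < 0.05 else g for g in c])
    pop = nxt[:len(pop)]

print(len(found), "distinct JEPs discovered")
```

Because only chromosomes whose fitness reaches the cap are recorded, every pattern collected in `found` is a genuine JEP by construction.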
Classification may be binary or multiclass. In binary classification, test objects can be assigned to only two classes, whereas in multiclass classification more than two classes can be assigned to test objects. Our proposed method can be used for multiclass classification, for which we use the one-against-all technique.
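The one-against-all reduction might be sketched as follows (ours): each class in turn plays the role of D1 while the union of the remaining classes plays D2, and the binary JEP discovery is run once per split.

```python
def one_vs_all_splits(instances, labels):
    """For each class c, D1 = instances of c, D2 = all other instances;
    the binary GA discovery then runs once per (D1, D2) split."""
    splits = {}
    for c in sorted(set(labels)):
        d1 = [x for x, y in zip(instances, labels) if y == c]
        d2 = [x for x, y in zip(instances, labels) if y != c]
        splits[c] = (d1, d2)
    return splits

# Toy three-class data
X = [frozenset("ab"), frozenset("bc"), frozenset("cd")]
y = ["r", "g", "b"]
splits = one_vs_all_splits(X, y)        # three (D1, D2) splits
```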
IV. EXPERIMENTAL RESULTS AND ANALYSIS
The databases used in the experiments were downloaded from the UCI ML repository; a description of the datasets is given in Table II. All experimental results are expressed as averages, i.e. the ratio of correctly classified instances to the overall number of instances in the database. To ensure accuracy, 10-fold cross-validation (CV-10) is used, and results represent the average classification performance over the 10 folds. Moreover, each experiment was executed five times; to ensure robustness of the values, the results of all five executions were then averaged.
TABLE II
Description of Datasets
Datasets | # of Instances | # of attributes | # of classes
adult+stretch_data | 20 | 5 | 2
breast-cancer | 286 | 10 | 2
Crx | 690 | 16 | 2
Diabete | 768 | 9 | 2
Glass | 214 | 10 | 6
Hayes Roth | 132 | 5 | 3
Iris | 150 | 5 | 3
Monks | 432 | 7 | 2
Nursery | 12960 | 9 | 5
Sonar | 208 | 61 | 2
Soybean-small | 47 | 22 | 4
Tic Tac Toe | 958 | 10 | 2
Vehicle | 846 | 19 | 4
Xad | 94 | 19 | 4
Xae | 94 | 19 | 4
All these experiments are aimed at assessing the success and predictive power of the discovered JEPs and the discovery method. For the classifier experiments, 15 common benchmark classification databases are used. The proposed technique is implemented in Microsoft Visual C# (.NET Framework 4.0) using Microsoft Visual Studio 2015. All experiments were conducted on an HP ProBook 4530 (Core i5-2430M CPU @ 2.40 GHz, 4 GB RAM) running Windows 7 Home Premium, 64-bit. Statistical details of the datasets used in the experiments are given in Table II.
Different parameters were used in the experiments for evaluation purposes. GenNo is the total number of generations used for the iterations of the proposed algorithm. MR is the mutation rate for global optimization. CR is the crossover rate, used for generating the next population. PSize is the population size. Finally, the parameter D is the training dataset given to the proposed algorithm. The values used for these parameters are given in Table III.
TABLE III
Experimental Parameters
Parameter | Value
GenNo | 50
MR | 0.15
CR | 0.85
PSize | 100
D | m x n matrix training dataset
The accuracy of Genetic-JEPs (G-JEPs) has been compared with seven popular classifiers: Naive Bayes, J48, ID3, Zero-R, One-R, bagging, and Random Forest. Their accuracies were obtained using the Weka 3.6.9 implementations.
TABLE IV
G-JEPs Accuracy on various datasets
Datasets | G-JEPs Accuracy
adult+stretch_data | 100.00
breast-cancer | 96.494
Crx | 85.285
Diabete | 88.34
Glass | 59.345
Hayes-roth | 100.00
Iris | 95.066
Monks | 99.459
Nursery | 90.848
Sonar | 57.211
Soybean-small | 100.00
tic-tac-toe | 98.121
vehicle | 56.146
xad | 71.276
xae | 67.021
We present a best/same/poor summary in Table VI to evaluate the performance of G-JEPs against its competitors. Best/same/poor gives the number of datasets on which
G-JEPs achieves greater accuracy than the comparator, the number on which both classifiers achieve the same accuracy, and the number on which the comparator achieves greater accuracy. Tables V and VI clearly show that G-JEPs obtains significantly better accuracy on the majority of datasets: the G-JEPs classifier has the best performance on 10 out of 15 datasets. Moreover, our proposed method never proves to be the worst performer among its competitors.
TABLE V
Accuracy comparison with standard classifiers
Datasets | Naive Bayes | J48 | bagging | ID3 | Random Forest | ZeroR | One-R | G-JEPs
adult+stretch_data | 100.00 | 100.00 | 100.00 | 100.00 | 100.00 | 60.00 | 70.00 | 100.00
breast-cancer | 72.08 | 75.524 | 67.832 | 56.993 | 69.231 | 70.279 | 65.734 | 96.494
crx | 84.787 | 85.36 | 85.942 | 72.029 | 83.913 | 55.507 | 85.507 | 85.285
diabete | 72.91 | 74.218 | 73.046 | 63.411 | 69.531 | 65.104 | 74.088 | 88.34
glass | 63.081 | 59.813 | 63.084 | 61.215 | 69.53 | 35.514 | 44.392 | 59.345
Hayes-roth | 80.303 | 72.727 | 77.272 | 62.121 | 77.272 | 37.878 | 43.930 | 100.00
iris | 87.333 | 90.667 | 90.00 | 89.333 | 90.00 | 33.333 | 82.667 | 95.066
monks | 75.00 | 96.527 | 98.611 | 95.370 | 96.527 | 49.074 | 75.00 | 99.459
nursery | 90.324 | 97.052 | 97.268 | 98.186 | 97.938 | 33.333 | 70.972 | 90.848
sonar | 67.788 | 71.153 | 77.884 | --- | 79.326 | 53.365 | 62.50 | 57.211
Soybean-small | 100.00 | 97.872 | 100.00 | 95.744 | 100.00 | 36.170 | 87.234 | 100.00
tic-tac-toe | 69.624 | 85.073 | 90.710 | 83.298 | 90.709 | 65.344 | 69.937 | 98.121
vehicle | 57.10 | 65.484 | 67.021 | 61.820 | 65.484 | 25.650 | 41.489 | 56.146
xad | 56.383 | 53.191 | 48.936 | 38.297 | 57.446 | 30.851 | 41.489 | 71.276
xae | 57.447 | 53.191 | 48.936 | 38.297 | 57.446 | 30.851 | 41.489 | 67.021

TABLE VI
Best/same/poor record of G-JEPs vs. alternatives
G-JEPs vs | Naive Bayes | J48 | bagging | ID3 | Random Forest | 0-R | 1-R
best/same/poor | 10,2,3 | 9,1,5 | 8,2,5 | 11,1,3 | 9,2,4 | 15,0,0 | 13,0,2
Fig. 2. G-JEPs accuracy on various UCI datasets
Fig. 3. Best/same/poor performance of G-JEPs vs. alternatives
V. CONCLUSION
A comprehensive study of Jumping Emerging Patterns (JEPs) and earlier related research has been presented, highlighting their limitations in addressing classification problems. The contribution of this work is a novel method for JEP discovery using a genetic algorithm, aimed at overcoming the deficiencies of previous approaches in this domain. To decrease the number of JEPs, we used a pruning function; large numbers of duplicate and redundant JEPs are filtered out during the discovery process. We developed an accurate and effective classification model based on JEPs. We performed experiments on standard datasets to illustrate that the proposed G-JEPs are effective and accurate for classifying a variety of datasets and in general achieve higher accuracy than other standard classifiers.
REFERENCES
[1] H. Fan and K. Ramamohanarao, "Fast discovery and the generalization of strong jumping emerging patterns for building compact and accurate classifiers," IEEE Transactions on Knowledge and Data Engineering, vol. 18, pp. 721-737, 2006.
[2] J. Bailey, T. Manoukian, and K. Ramamohanarao, "Fast algorithms for mining emerging patterns," in Proc. Sixth European Conf. Principles and Practice of Knowledge Discovery in Databases (PKDD'02), 2002.
[3] M. García-Borroto, J. F. Martínez-Trinidad, and J. A. Carrasco-Ochoa, "A survey of emerging patterns for supervised classification," Artificial Intelligence Review, 42(4), pp. 705-721, 2014.
[4] M. García-Borroto, J. F. Martínez-Trinidad, J. A. Carrasco-Ochoa, M. A. Medina-Pérez, and J. Ruiz-Shulcloper, "LCMine: An efficient algorithm for mining discriminative regularities and its application in supervised classification," Pattern Recognition, 43(9), pp. 3025-3034, 2010.
[5] K. Ramamohanarao and H. Fan, "Patterns based classifiers," World Wide Web, 10(1), pp. 71-83, 2007.
[6] T. M. Mitchell, Machine Learning. McGraw-Hill Higher Education, 1997.
[7] G. Piatetsky-Shapiro and W. Frawley, Knowledge Discovery in Databases. MIT Press, 1991.
[8] B. Liu, W. Hsu, and Y. Ma, "Integrating classification and association rule mining," in Proc. Fourth International Conference on Knowledge Discovery and Data Mining, 1998.
[9] P. Andruszkiewicz, "Lazy approach to preserving classification with emerging patterns," in Emerging Intelligent Technologies in Industry, pp. 253-268, Springer Berlin Heidelberg, 2011.
[10] M. Ferrandin, A. Boava, and A. S. R. Pinto, "Classification using jumping emerging patterns and cosine similarity," in Proceedings of the International Conference on Artificial Intelligence (ICAI), p. 682, 2015.
[11] G. Dong and J. Li, "Efficient mining of emerging patterns: Discovering trends and differences," in Proc. KDD, pp. 43-52, 1999.
[12] B. Kane, B. Cuissart, and B. Cremilleux, "Minimal jumping emerging patterns: Computation and practical assessment," in Advances in Knowledge Discovery and Data Mining, pp. 722-733, Springer International Publishing, 2015.
[13] M. J. Kabir, S. Xu, B. H. Kang, and Z. Zhao, "Comparative analysis of genetic based approach and Apriori algorithm for mining maximal frequent item sets," in IEEE Congress on Evolutionary Computation (CEC), pp. 39-45, 2015.
[14] X. Chen and L. Lu, "An improved algorithm of mining strong jumping emerging patterns based on sorted SJEP-Tree," in Bio-Inspired Computing: Theories and Applications (BIC-TA), Fifth IEEE International Conference on, IEEE, 2010.
[15] J. Li, G. Dong, and K. Ramamohanarao, "Making use of the most expressive jumping emerging patterns for classification," Knowledge and Information Systems, 3(2), pp. 131-145, 2001.
[16] H. Fan and K. Ramamohanarao, "An efficient single-scan algorithm for mining essential jumping emerging patterns for classification," in Advances in Knowledge Discovery and Data Mining, Springer Berlin Heidelberg, pp. 456-462, 2002.
[17] P. Terlecki and K. Walczak, "Jumping emerging patterns with negation in transaction databases: Classification and discovery," Information Sciences, 177(24), pp. 5675-5690, 2007.
[18] P. Terlecki and K. Walczak, "Efficient discovery of top-k minimal jumping emerging patterns," in Rough Sets and Current Trends in Computing, Springer Berlin Heidelberg, pp. 438-447, 2008.
[19] T. Gambin and K. Walczak, "Classification based on the highest impact jumping emerging patterns," in Computer Science and Information Technology, IMCSIT'09, International Multiconference on, pp. 37-42, IEEE, 2009.
[20] K. Walczak, "Classification based on the highest impact jumping emerging patterns," in Computer Science and Information Technology, IMCSIT'09, International Multiconference on, pp. 37-42, IEEE, 2009.
[21] Ł. Kobyliński and K. Walczak, "Efficient mining of jumping emerging patterns with occurrence counts for classification," in Transactions on Rough Sets XIII, Springer Berlin Heidelberg, pp. 73-88, 2011.
[22] Q. Liu, P. Shi, Z. Hu, and Y. Zhang, "A novel approach of mining strong jumping emerging patterns based on BSC-tree," International Journal of Systems Science, 45(3), pp. 598-615, 2014.
[23] X. J. Shi and H. Lei, "A genetic algorithm-based approach for classification rule discovery," in International Conference on Information Management, Innovation Management and Industrial Engineering, vol. 1, pp. 175-178, IEEE, 2008.