Clustering is the process of grouping a set of objects
into classes of similar objects. Dynamic clustering is a
newer research area concerned with datasets that have dynamic
aspects: the clusters must be updated whenever new data
records are added to the dataset, so the clustering may
change over time. When data is updated continuously and in
huge volumes, rescanning the entire database, as static data
mining requires, is not feasible; the dynamic data mining
process avoids this. Dynamic data mining arises when
the derived information must be kept available for analysis
while the environment is dynamic, i.e. many updates occur.
Since this need has now been established by most researchers,
the open work is to solve the remaining problems of
mining dynamic databases. This paper surveys
existing work in papers related to
dynamic clustering and incremental data clustering.
A statistical data fusion technique in virtual data integration environment (IJDKP)
Data fusion in a virtual data integration environment starts after duplicated
records from the different integrated data sources have been detected and clustered.
It refers to the process of selecting or fusing attribute
values from the clustered duplicates into a single record representing the real-world object. In this paper, a
statistical technique for data fusion is introduced, based on probabilistic scores derived from both the data
sources and the clustered duplicates.
Applying K-Means Clustering Algorithm to Discover Knowledge from Insurance Da... (theijes)
Data mining extracts information not known in advance from enormous quantities of data, which can lead to knowledge. It provides information that helps in making good decisions. The goal of data mining is the discovery, through the use of multiple technologies, of hidden facts contained in databases. Clustering organizes data into clusters or groups with high intra-cluster similarity and low inter-cluster similarity. This paper deals with the K-means clustering algorithm, which groups data items according to their characteristics and attributes, performing the clustering by reducing the distances between items and their cluster centres. The algorithm is applied using the open-source tool WEKA, with an insurance dataset as its input.
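As a concrete reference for the algorithm this abstract describes, the following is a minimal k-means sketch in plain Python; it is not the WEKA implementation the paper uses, and the toy points and k=2 are illustrative assumptions:

```python
import math
import random

def kmeans(points, k, iters=20, seed=0):
    # Start from k distinct points chosen at random as initial centres.
    rng = random.Random(seed)
    centres = rng.sample(points, k)
    for _ in range(iters):
        # Assignment step: each point joins its nearest centre.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centres[c]))
            clusters[i].append(p)
        # Update step: each centre moves to the mean of its cluster.
        for i, cl in enumerate(clusters):
            if cl:  # keep the old centre if a cluster went empty
                centres[i] = tuple(sum(x) / len(cl) for x in zip(*cl))
    return centres, clusters

# Two obvious groups; k-means should recover them.
points = [(1, 1), (1, 2), (2, 1), (8, 8), (8, 9), (9, 8)]
centres, clusters = kmeans(points, 2)
```

On well-separated data like this, any random initialisation converges to the two group means within a few iterations.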
TUPLE VALUE BASED MULTIPLICATIVE DATA PERTURBATION APPROACH TO PRESERVE PRIVA... (IJDKP)
Huge volumes of data from domain-specific applications such as medical, financial, library, telephone,
shopping and individual records are generated regularly. Sharing these data has proved beneficial
for data mining applications. On the one hand, such data is an important asset for business decision making
when analysed. On the other hand, data privacy concerns may prevent data owners from sharing information
for data analysis. To share data while preserving privacy, the data owner must adopt a solution
that achieves the dual goal of privacy preservation and accuracy of the data mining tasks of
clustering and classification. An efficient and effective approach is proposed that aims to protect
the privacy of sensitive information while obtaining a data clustering with minimal information loss.
With the development of databases, the volume of stored data increases rapidly, and much
important information is hidden in these large amounts of data. If that information can be extracted from the
database, it can create a lot of profit for the organization. The question is how to extract
this value; the answer is data mining. Many technologies are available to data mining practitioners,
including artificial neural networks, genetic algorithms, fuzzy logic and decision trees. Many practitioners are
wary of neural networks because of their black-box nature, even though they have proven themselves in many
situations. This paper is an overview of artificial neural networks and questions their position as a
preferred tool of data mining practitioners.
Recommendation system using bloom filter in mapreduce (IJDKP)
Many clients use the Web to discover product details in the form of online reviews
provided by other clients and specialists. Recommender systems are an important response to the
information overload problem, as they present users with more practical and personalized information.
Collaborative filtering methods are a vital component of recommender systems, as they generate high-quality
recommendations by leveraging the preferences of communities of similar users. Collaborative filtering
assumes that people with the same tastes choose the same items. The conventional collaborative
filtering system suffers from the sparse data problem and a lack of scalability, so a new recommender system is
required to deal with sparse data and produce high-quality recommendations in a large-scale
mobile environment. MapReduce is a programming model widely used for large-scale data
analysis. The recommendation mechanism described here for mobile commerce is user-based
collaborative filtering implemented in MapReduce, which reduces the scalability problem of conventional CF systems.
One of the essential operations for this analysis is the join, but MapReduce is not very
efficient at joins, since it always processes all records in the datasets even when only a small
fraction is relevant to the join. This problem can be reduced by applying the
bloomjoin algorithm: Bloom filters are constructed and used to filter out redundant intermediate
records. The proposed algorithm using Bloom filters will reduce the number of intermediate results and
improve join performance.
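The bloomjoin idea described above can be sketched as follows. This is a minimal, single-machine illustration of the filtering step, not the paper's MapReduce implementation; the bit-array size, hash scheme and toy user/rating records are assumptions for the example:

```python
import hashlib

class BloomFilter:
    # k hash probes into an m-bit array; membership tests can give
    # false positives but never false negatives.
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, bytearray(m)

    def _probes(self, key):
        for i in range(self.k):
            h = hashlib.sha256(f"{i}:{key}".encode()).hexdigest()
            yield int(h, 16) % self.m

    def add(self, key):
        for p in self._probes(key):
            self.bits[p] = 1

    def might_contain(self, key):
        return all(self.bits[p] for p in self._probes(key))

# Build the filter from the small side of the join (e.g. known users) ...
users = {"u1": "Alice", "u2": "Bob"}
bf = BloomFilter()
for uid in users:
    bf.add(uid)

# ... then drop rating records whose key cannot possibly join,
# before the (expensive) join itself runs.
ratings = [("u1", 5), ("u3", 2), ("u2", 4), ("u9", 1)]
candidates = [r for r in ratings if bf.might_contain(r[0])]
joined = [(uid, users[uid], score) for uid, score in candidates if uid in users]
```

Because the filter only over-approximates the key set, any false positives that survive the filter are removed by the final join; correctness is unaffected while the volume of intermediate records shrinks.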
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE (IJDKP)
Metadata represents the information about the data stored in a data warehouse and is a mandatory
element in building an efficient one. Metadata helps with data integration,
lineage, data quality and populating transformed data into the warehouse. Spatial data warehouses (SDWs) are
based on spatial data, mostly collected from Geographical Information Systems (GIS) and from transactional
systems specific to an application or enterprise. Metadata design and deployment is the most
critical phase in building a data warehouse, where spatial information and
data modeling must be brought together. In this paper, we present a holistic metadata framework that drives metadata
creation for a spatial data warehouse. Theoretically, the proposed metadata framework improves the
efficiency of data access in response to frequent queries on SDWs. In other words, the proposed
framework decreases query response time while accurate information, including the spatial
information, is fetched from the data warehouse.
New proximity estimate for incremental update of non uniformly distributed cl... (IJDKP)
Conventional clustering algorithms mine static databases and generate a set of patterns in the form of
clusters. Many real-life databases keep growing incrementally, and for such dynamic databases the patterns
extracted from the original database become obsolete. Conventional clustering algorithms are thus not
suitable for incremental databases, as they lack the capability to modify clustering results in accordance
with recent updates. In this paper, the author proposes a new incremental clustering algorithm called
CFICA (Cluster Feature-based Incremental Clustering Approach for numerical data) to handle numerical
data and suggests a new proximity metric called the Inverse Proximity Estimate (IPE), which considers the
proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity.
CFICA uses the proposed proximity metric to determine the membership of a data point in a
cluster.
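The abstract does not reproduce the paper's exact IPE formula; the sketch below shows one plausible reading of the idea, combining distance to the cluster representative with distance to the farthest point in the vicinity, with the combination rule chosen purely for illustration:

```python
import math

def inverse_proximity_estimate(point, centroid, members):
    # Hypothetical reading of IPE: the plain centroid distance is scaled
    # down when the cluster already extends far toward the point, so
    # elongated (non-uniform) clusters are not unduly penalised.
    d_centroid = math.dist(point, centroid)
    d_far = max(math.dist(point, m) for m in members)
    return d_centroid * (d_centroid / d_far)

cluster = [(0, 0), (4, 0)]   # a cluster stretched along the x-axis
centroid = (2, 0)
# Both candidate points are distance 3 from the centroid, but the one
# lying along the cluster's axis of elongation scores as closer.
near_axis = inverse_proximity_estimate((5, 0), centroid, cluster)
off_axis = inverse_proximity_estimate((2, 3), centroid, cluster)
```

Under a pure centroid distance the two points would be indistinguishable; the farthest-point term is what lets the estimate react to the cluster's shape.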
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE (cscpconf)
Traditional medical analysis is based on static data: the medical data is analysed only after the collection of the data sets is complete, which falls far short of actual demand. Large amounts of medical data are generated in real time, so real-time analysis can yield more value. This paper introduces the design of Sentinel, a real-time analysis system based on a clustering algorithm. Sentinel performs clustering analysis of real-time data and issues early alerts.
NETWORK FAULT DIAGNOSIS USING DATA MINING CLASSIFIERS (csandit)
Mobile networks are under more pressure than ever before because of the increasing number of
smartphone users and the number of people relying on mobile data networks. With larger
numbers of users, the issue of service quality has become more important for network operators.
Faults in mobile networks that reduce the quality of service must be found within
minutes so that problems can be addressed and networks returned to optimised performance. In
this paper, a method of automated fault diagnosis is presented using decision trees, rules and
Bayesian classifiers for visualization of network faults. Using data mining techniques, the model
classifies optimisation criteria based on key performance indicator metrics to identify
network faults, supporting the most efficient optimisation decisions. The goal is to help wireless
providers localize key performance indicator alarms and determine which Quality of
Service factors should be addressed first, and at which locations.
Novel Ensemble Tree for Fast Prediction on Data Streams (IJERA Editor)
Data streams are sequential sets of data records. When data arrives continuously and at high speed, predicting
the class in a timely manner is essential. Ensemble modeling techniques are currently growing
rapidly in data stream classification. Ensemble learning is accepted because of its ability to manage huge
amounts of stream data, both in volume and under concept
drift. Prior work focused mostly on the accuracy of the ensemble model; prediction efficiency has received
little attention, since existing ensemble models predict in linear time, which is enough for small applications, and
available models work by integrating only a few classifiers. Real-time applications, however, involve huge
data streams, so many base classifiers are required to recognize dissimilar models and build a high-grade
ensemble model. To address these challenges, the authors developed the Ensemble Tree, a height-balanced tree
index over base classifiers for quick prediction on data streams with ensemble modeling techniques. The Ensemble
Tree manages ensembles like spatial databases and utilizes an R-tree-like structure to achieve sub-linear time
complexity.
Data mining is used to manage the huge amounts of information stored in data warehouses and databases, to discover required information and knowledge. Numerous data mining techniques have been proposed, for example association rules, decision trees, neural networks and clustering, and the field has been a focus of attention for many years. One well-known data mining strategy, and among the most effective, is clustering of the dataset: it groups the dataset into a number of clusters based on certain predefined guidelines, and it can reliably discover the connections between the different characteristics of the data.
In the k-means clustering algorithm, the function is selected on the basis of its relevancy for predicting the data, and the Euclidean distance between the centroid of a cluster and the data objects outside the cluster is computed for clustering the data points. In this work, the author enhanced the Euclidean distance formula to increase cluster quality.
The problems of accuracy and of dissimilar, redundant points in the clusters remain in the improved k-means, for which a new enhanced approach is proposed that uses a similarity function to check the similarity level of a point before including it in a cluster.
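A minimal sketch of such a similarity-gated assignment is shown below. The similarity function 1/(1 + distance) and the threshold value are illustrative assumptions, not the proposal's actual choices:

```python
import math

def assign_with_similarity_gate(point, centroids, threshold):
    # Nearest centroid by Euclidean distance.
    best = min(range(len(centroids)), key=lambda i: math.dist(point, centroids[i]))
    # Illustrative similarity in (0, 1]: 1 at the centroid, falling with distance.
    similarity = 1.0 / (1.0 + math.dist(point, centroids[best]))
    # Gate: only admit the point if it is similar enough to the cluster.
    return best if similarity >= threshold else None

centroids = [(0.0, 0.0), (10.0, 10.0)]
member = assign_with_similarity_gate((1.0, 0.0), centroids, 0.4)   # close to cluster 0
outlier = assign_with_similarity_gate((5.0, 5.0), centroids, 0.4)  # far from both
```

The gate is what separates this from plain k-means assignment: a point equidistant from all centroids is rejected rather than forced into whichever cluster happens to be marginally nearest.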
Assessment of Cluster Tree Analysis based on Data Linkages (journal ijrtem)
Abstract: Data linkage is a procedure that joins two or more sets of data (surveyed or proprietary) from different organizations to generate a valuable store of information that can be used for further analysis. This allows for the real application of the data. One-to-many data linkage associates an entity from the first data set with a number of related entities from the other data sets; previous work concentrated on achieving one-to-one data linkage. A two-level clustering tree known as the One-Class Clustering Tree (OCCT), with a built-in Jaccard similarity measure, was previously suggested, in which each leaf contains a group rather than only one classified sequence. OCCT's use of the Jaccard similarity coefficient increases time complexity significantly, so we propose to substitute the Jaccard coefficient with the Jaro-Winkler similarity measure to obtain the group similarity, because it takes order into consideration, using positional indices to calculate relevance, unlike Jaccard. An assessment of our suggested idea serves as validation of an enhanced one-to-many data linkage system.
Index Terms: Maximum-Weighted Bipartite Matching, Ant Colony Optimization, Graph Partitioning Technique
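The claimed difference between the two measures (Jaccard ignores order, while Jaro, the basis of Jaro-Winkler, rewards characters matching in similar positions) can be illustrated directly. This sketch implements plain Jaro and omits the Winkler common-prefix boost; it is an illustration of the measures, not the paper's linkage system:

```python
def jaccard(a, b):
    # Set-based similarity on characters: completely order-insensitive.
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)

def jaro(a, b):
    # Position-aware similarity: characters only "match" when they occur
    # within a window of each other, and out-of-order matches are penalised
    # as transpositions.
    if a == b:
        return 1.0
    window = max(len(a), len(b)) // 2 - 1
    match_a, match_b = [False] * len(a), [False] * len(b)
    matches = 0
    for i, ca in enumerate(a):
        for j in range(max(0, i - window), min(len(b), i + window + 1)):
            if not match_b[j] and b[j] == ca:
                match_a[i] = match_b[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Transpositions: matched characters appearing in different orders,
    # counted in pairs.
    sa = [c for c, m in zip(a, match_a) if m]
    sb = [c for c, m in zip(b, match_b) if m]
    t = sum(x != y for x, y in zip(sa, sb)) / 2
    return (matches / len(a) + matches / len(b) + (matches - t) / matches) / 3
```

On the anagram pair "abcd" / "dcba", Jaccard reports perfect similarity while Jaro does not, which is exactly the positional sensitivity the abstract appeals to.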
Characterizing and Processing of Big Data Using Data Mining Techniques (IJTET Journal)
Abstract— Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. It concerns large-volume, complex and growing data sets from multiple, autonomous sources. Big data is now rapidly expanding not only in science and engineering but in all domains, such as the physical and biological sciences. The main objective of this paper is to characterize the features of big data. The HACE theorem, which characterizes the features of the Big Data revolution and proposes a Big Data processing model from the data mining perspective, is used here. The model involves the aggregation of mining, analysis, information sources, user interest modeling, privacy and security. Exploring large volumes of data and extracting useful information or knowledge from them is the most fundamental challenge in Big Data, so we should analyse these problems and the knowledge revolution.
Enhancement techniques for data warehouse staging area (IJDKP)
Poor performance can turn a successful data warehousing project into a failure. Consequently, several
attempts have been made by various researchers to deal with the problem of scheduling the Extract-
Transform-Load (ETL) process. In this paper we therefore present several approaches for
enhancing the extract, transform and load stages of data warehousing. We focus on enhancing the
performance of the extract and transform phases by proposing two algorithms that reduce the time needed in
each phase by exploiting the hidden semantic information in the data. Using the semantic
information, a large volume of useless data can be pruned at an early design stage. We also address the
problem of scheduling the execution of ETL activities, with the goal of minimizing ETL execution time.
We investigate this area by choosing three scheduling techniques for ETL. Finally, we
experimentally show their behavior in terms of execution time in the sales domain, to understand the impact
of implementing each of them and to choose the one leading to the maximum performance enhancement.
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer (IJERA Editor)
An institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his own definition of toughness and easiness, and there is no absolute scale for measuring knowledge, but examination scores indicate the performance of a student. In this case study, knowledge of data mining is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data: it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in a large relational database. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students: an exam was given to students of the computer science major using MOODLE (an LMS), the data generated was analysed using RapidMiner (data mining software), and clustering was then performed on the data. This method helps to identify the students who need special advising or counselling by the teacher, in order to deliver a high quality of education.
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R... (Editor IJMTER)
This technique performs efficient data mining in SRMS (Student Records
Management System) through a vertical approach with association rules in distributed databases. The
current leading technique is that of Kantarcioglu and Clifton [1]. This system deals with two
challenges: one that computes the union of private subsets that each of the interacting users
holds, and another that tests the inclusion of an element held by one user in a subset held by another.
The existing system uses techniques such as the Apriori algorithm for data mining, and the
Fast Distributed Mining (FDM) algorithm of Cheung et al. [2], an unsecured distributed
version of Apriori. The proposed system offers enhanced privacy and data mining by means of
encryption techniques and association rules with the FP-Growth algorithm in a private
cloud (the system contains different files of subjects according to their branches). With these
techniques, the expected effect is a system that is simpler and more efficient in terms of
communication cost and combinational cost. They also improve parameters such as
execution time, code length (decreased), speed of finding data, and extraction of hidden
predictive information from large databases; the efficiency of the proposed system should
increase by 20%.
Concept Drift Identification using Classifier Ensemble Approach (IJECEIAES)
Abstract:- In an internetworking system, huge amounts of data are scattered, generated and processed over the network. Data mining techniques are used to discover unknown patterns from the underlying data. A traditional classification model classifies data based on past labelled data. However, in many current applications data is increasing in size with fluctuating patterns, so new features may arrive in the data. This occurs in many applications, such as sensor networks, banking and telecommunication systems, the financial domain, and electricity usage and prices based on demand and supply. Such a change in data distribution reduces the accuracy of classifying the data: some patterns may be discovered as frequent while other patterns tend to disappear, and items are classified wrongly. To mine such data distributions, traditional classification techniques may not be suitable, as the distribution generating the items can change over time, so data from the past may become irrelevant or even false for the current prediction. For handling such varying patterns of data, the concept drift mining approach is used to improve the accuracy of classification techniques. In this paper we propose an ensemble approach for improving the accuracy of a classifier. The ensemble classifier is applied on three different data sets. We investigated different features for the different chunks of data, which are then given to the ensemble classifier, and we observed that the proposed approach improves the accuracy of the classifier for different chunks of data.
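A toy version of the chunk-wise ensemble idea can be sketched as follows; the per-chunk "learner" here is a deliberately trivial stand-in (it just memorizes the chunk's majority label), not one of the classifiers used in the paper:

```python
from collections import Counter

def train_stub(chunk):
    # Illustrative stand-in for a real learner: "learn" the majority
    # label of the chunk and always predict it, ignoring the input.
    majority = Counter(label for _, label in chunk).most_common(1)[0][0]
    return lambda x: majority

def ensemble_predict(models, x):
    # Majority vote across the per-chunk models: as drift shifts the
    # label distribution, models from newer chunks shift the vote.
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

# The stream arrives chunk by chunk; one model is trained per chunk.
chunks = [
    [(1, "A"), (2, "A"), (3, "B")],   # early data: mostly A
    [(4, "A"), (5, "B"), (6, "A")],
    [(7, "B"), (8, "B"), (9, "B")],   # drifted data: B takes over
]
models = [train_stub(c) for c in chunks]
```

With these three chunks the vote is still "A"; were more drifted chunks to arrive, the B-trained models would come to dominate the vote, which is the mechanism by which such ensembles track concept drift.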
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem... (IJECEIAES)
Data analysis plays a prominent role in interpreting various phenomena, and data mining is the process of hypothesizing useful knowledge from extensive data. Based on classical statistical prototypes, the data can be exploited beyond its mere storage and management. Cluster analysis, a primary means of investigation when there is little or no prior knowledge, spans research and development across a wide variety of communities. Cluster ensembles are blends of individual solutions obtained from different clusterings, producing a final high-quality clustering, which is required in wider applications; the method arises in the context of increasing robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensembles; the survey analyses the various techniques and cluster ensemble methods.
Recent Trends in Incremental Clustering: A Review (IOSRjournaljce)
This paper presents a review of recent trends in incremental clustering algorithms. It tries to cover both clustering based on a similarity measure and clustering not based on a similarity measure. In this context, the paper is devoted to various typical incremental clustering algorithms; mainly the optimization, genetic and fuzzy approaches of these algorithms are covered. The paper is original in one respect: it provides a complete overview fully devoted to evolutionary algorithms for incremental clustering. A number of references are provided that describe applications of evolutionary algorithms for incremental clustering in different domains, such as human activity detection, online fault detection, information security, and tracking an object consistently throughout a network while solving the boundary problem.
A SURVEY ON DATA MINING IN STEEL INDUSTRIES (IJCSES Journal)
In industrial environments, huge amounts of data are generated and collected in databases and data warehouses from all involved areas, such as planning, process design, materials, assembly, production, quality, process control, scheduling, fault detection, shutdown, customer relationship management, and so on. Data mining has become a useful tool for knowledge acquisition in the industrial processes of iron and steel making. Due to the rapid growth of data mining, various industries have started using it to search for hidden patterns, which might feed the system with new knowledge and support new models that enhance production quality, productivity, optimum cost, maintenance, etc. The continuous improvement of all steel production processes with regard to the avoidance of quality deficiencies, and the related improvement of production yield, is an essential task of a steel producer; a zero-defect strategy is therefore popular today, and several quality assurance techniques are used to maintain it. The present report explains the methods of data mining and describes their application in the industrial environment and especially in the steel industry.
New proximity estimate for incremental update of non uniformly distributed cl...IJDKP
The conventional clustering algorithms mine static databases and generate a set of patterns in the form of
clusters. Many real life databases keep growing incrementally. For such dynamic databases, the patterns
extracted from the original database become obsolete. Thus the conventional clustering algorithms are not
suitable for incremental databases due to lack of capability to modify the clustering results in accordance
with recent updates. In this paper, the author proposes a new incremental clustering algorithm called
CFICA(Cluster Feature-Based Incremental Clustering Approach for numerical data) to handle numerical
data and suggests a new proximity metric called Inverse Proximity Estimate (IPE) which considers the
proximity of a data point to a cluster representative as well as its proximity to a farthest point in its vicinity.
CFICA makes use of the proposed proximity metric to determine the membership of a data point into a
cluster.
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE cscpconf
The traditional medical analysis is based on the static data, the medical data is about to be analysis after the collection of these data sets is completed, but this is far from satisfying the actual demand. Large amounts of medical data are generated in real time, so that real-time analysis can yield more value. This paper introduces the design of the Sentinel which can realize the real-time analysis system based on the clustering algorithm. Sentinel can realize clustering analysis of real-time data based on the clustering algorithm and issue an early alert.
NETWORK FAULT DIAGNOSIS USING DATA MINING CLASSIFIERS - csandit
Mobile networks are under more pressure than ever before because of the increasing number of
smartphone users and the number of people relying on mobile data networks. With larger numbers of
users, the issue of service quality has become more important for network operators. Faults in mobile
networks that reduce the quality of service must be found within minutes so that problems can be
addressed and networks returned to optimised performance. In this paper, a method of automated fault
diagnosis is presented using decision trees, rules and Bayesian classifiers for visualization of network
faults. Using data mining techniques, the model classifies optimisation criteria based on key
performance indicator metrics to identify network faults, supporting the most efficient optimisation
decisions. The goal is to help wireless providers to localize key performance indicator alarms and
determine which Quality of Service factors should be addressed first, and at which locations.
Novel Ensemble Tree for Fast Prediction on Data Streams - IJERA Editor
Data streams are sequential sets of data records. When data arrives continuously and at high speed,
predicting the class in a timely manner is essential. Ensemble modeling techniques are currently
growing rapidly in data stream classification, and ensemble learning is widely accepted because of its
ability to manage huge volumes of streaming data and to handle concept drift. Prior work focused
mostly on the accuracy of the ensemble model; prediction efficiency has received little attention, since
existing ensemble models predict in linear time, which is enough for small applications, and available
models work by integrating a handful of classifiers. Real-time applications, however, produce huge data
streams, so many base classifiers are needed to recognize dissimilar models and build a high-grade
ensemble. To address these challenges, the authors develop the Ensemble Tree, a height-balanced tree
index over base classifiers for fast prediction on data streams with ensemble modeling techniques. The
Ensemble Tree manages ensembles as geo-databases and uses an R-tree-like structure to achieve
sub-linear time complexity.
Data mining is used to manage the huge amounts of information stored in data warehouses and databases, in order to discover required information. Numerous data mining techniques have been proposed, for example association rules, decision trees, neural networks, clustering, and so on, and the field has been a focus of attention for many years. A well-known and effective data mining strategy is clustering, which groups the dataset into a number of clusters based on certain predefined guidelines and is relied upon to discover the connections between the distinct characteristics of the data.
In the k-means clustering algorithm, a function is selected on the basis of its relevance for predicting the data, and the Euclidean distance between the centroid of a cluster and the data objects outside the cluster is computed to cluster the data points. In this work, the author enhanced the Euclidean distance formula to increase cluster quality.
The problem of accuracy and redundancy of dissimilar points in the clusters remains in the improved k-means, for which a new enhanced approach has been proposed that uses a similarity function to check the similarity level of a point before including it in a cluster.
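The enhanced-distance variants surveyed here all start from the same baseline assignment/update loop of k-means; a minimal sketch of that loop follows (the function names and toy structure are ours, not any of the papers'):

```python
import math
import random

def euclidean(a, b):
    # Straight-line distance between two equal-length points.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def kmeans(points, k, iterations=20):
    # Seed with k distinct points drawn from the data.
    centroids = random.sample(points, k)
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        # Assignment step: each point joins the cluster of its nearest centroid.
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            clusters[nearest].append(p)
        # Update step: each centroid moves to the mean of its cluster
        # (an empty cluster keeps its previous centroid).
        centroids = [
            tuple(sum(dim) / len(cl) for dim in zip(*cl)) if cl else centroids[i]
            for i, cl in enumerate(clusters)
        ]
    return centroids, clusters
```

The papers above modify the distance used in the assignment step or add a similarity check before a point is admitted to a cluster; the loop structure stays the same.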
Assessment of Cluster Tree Analysis based on Data Linkages - journal ijrtem
Abstract: Data linkage is a procedure which joins two or more sets of data (surveyed or proprietary) from different organizations to generate a valuable store of information that can be used for further analysis; this enables the practical application of the data. One-to-many data linkage associates an entity from the first data set with a group of matching entities from the other data sets, whereas earlier work concentrated on accomplishing one-to-one data linkage. A two-level clustering tree known as the One-Class Clustering Tree (OCCT), with a built-in Jaccard similarity measure, was previously suggested, in which each leaf contains a group rather than a single classified sequence. OCCT's use of the Jaccard similarity coefficient increases time complexity significantly, so we propose to substitute it with the Jaro-Winkler similarity measure to obtain the group similarity, because, unlike Jaccard's, it takes order into consideration, using positional indices to calculate relevance. An evaluation of our proposed idea serves as validation of an enhanced one-to-many data linkage system.
Index Terms: Maximum-Weighted Bipartite Matching, Ant Colony Optimization, Graph Partitioning Technique
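For reference, the Jaccard coefficient that OCCT builds on is position-blind, which is exactly the property the Jaro-Winkler substitution targets; a minimal sketch (the function name is ours):

```python
def jaccard_similarity(a, b):
    # Jaccard coefficient: |A ∩ B| / |A ∪ B| over two collections
    # treated as sets; order and repetition are ignored, which is
    # the limitation the positional Jaro-Winkler measure addresses.
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)
```

Note that "ab" and "ba" score 1.0 under this measure, since set membership discards position.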
Characterizing and Processing of Big Data Using Data Mining Techniques - IJTET Journal
Abstract— Big data is a popular term used to describe the exponential growth and availability of data, both structured and unstructured. It concerns large-volume, complex and growing data sets from multiple, autonomous sources. Big data is now rapidly expanding not only in science and engineering but in all domains, such as the physical and biological sciences. The main objective of this paper is to characterize the features of big data. The HACE theorem, which characterizes the features of the big data revolution and proposes a big data processing model from the data mining perspective, is used here. The model involves the aggregation of mining, analysis, information sources, user-interest modeling, privacy and security. Exploring large volumes of data and extracting useful information or knowledge from them is the most fundamental challenge in big data, so these problems and the knowledge revolution should be analyzed.
Enhancement techniques for data warehouse staging area - IJDKP
Poor performance can turn a successful data warehousing project into a failure. Consequently, several
attempts have been made by various researchers to deal with the problem of scheduling the Extract-
Transform-Load (ETL) process. In this paper we therefore present several approaches in the context of
enhancing the data warehousing Extract, Transform and loading stages. We focus on enhancing the
performance of extract and transform phases by proposing two algorithms that reduce the time needed in
each phase through employing the hidden semantic information in the data. Using the semantic
information, a large volume of useless data can be pruned in early design stage. We also focus on the
problem of scheduling the execution of the ETL activities, with the goal of minimizing ETL execution time.
We explore this area by evaluating three scheduling techniques for ETL. Finally, we
experimentally show their behavior in terms of execution time in the sales domain, to understand the
impact of implementing each of them and to choose the one leading to the maximum performance enhancement.
Study and Analysis of K-Means Clustering Algorithm Using Rapidminer - IJERA Editor
An institution is a place where the teacher explains and the student understands and learns the lesson. Every student has his own definition of toughness and easiness, and there is no absolute scale for measuring knowledge, but examination scores indicate the performance of a student. In this case study, knowledge of data mining is combined with educational strategies to improve students' performance. Generally, data mining (sometimes called data or knowledge discovery) is the process of analysing data from different perspectives and summarizing it into useful information. Data mining software is one of a number of analytical tools for data: it allows users to analyse data from many different dimensions or angles, categorize it, and summarize the relationships identified. Technically, data mining is the process of finding correlations or patterns among dozens of fields in large relational databases. Cluster analysis, or clustering, is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar, in some sense, to each other than to those in other groups (clusters). This project describes the use of the clustering data mining technique to improve the efficiency of academic performance in educational institutions. A live experiment was conducted on students of the computer science major: an exam was given using MOODLE (an LMS), the generated data was analysed using RapidMiner (data mining software), and clustering was then performed on the data. This method helps to identify the students who need special advising or counselling by the teacher, to deliver a high quality of education.
PERFORMING DATA MINING IN (SRMS) THROUGH VERTICAL APPROACH WITH ASSOCIATION R... - Editor IJMTER
This technique performs efficient data mining in SRMS (Student Records Management System)
through a vertical approach with association rules in distributed databases. The current leading
technique is that of Kantarcioglu and Clifton [1]. In this system I deal with two challenges: one is to
compute the union of private subsets that each of the interacting users holds, and the other is to test
the inclusion of an element held by one user in a subset held by another. The existing system uses
techniques such as the Apriori algorithm for data mining, and the Fast Distributed Mining (FDM)
algorithm of Cheung et al. [2], which is an unsecured distributed version of the Apriori algorithm. The
proposed system offers enhanced privacy and data mining through encryption techniques and
association rules with the FP-Growth algorithm in a private cloud (the system contains files for
different subjects, organized by branch). With these techniques the system is expected to be simpler
and more efficient in terms of communication and computational cost. They also improve parameters
such as execution time, code length (decreased), speed of finding data, and extraction of hidden
predictive information from large databases; the efficiency of the proposed system should increase
by 20%.
Concept Drift Identification using Classifier Ensemble Approach - IJECEIAES
Abstract: In internetworked systems, huge amounts of data are scattered, generated and processed over the network. Data mining techniques are used to discover unknown patterns in the underlying data. A traditional classification model classifies data based on past labelled data. In many current applications, however, data is increasing in size with fluctuating patterns, so new features may arrive in the data. This occurs in many applications such as sensor networks, banking and telecommunication systems, the financial domain, and electricity usage and prices based on demand and supply. Such change in the data distribution reduces the accuracy of classification: some patterns may be discovered as frequent while others tend to disappear and are wrongly classified. To mine such data, traditional classification techniques may not be suitable, as the distribution generating the items can change over time, so data from the past may become irrelevant or even false for the current prediction. To handle such varying patterns of data, a concept drift mining approach is used to improve the accuracy of classification techniques. In this paper we propose an ensemble approach for improving the accuracy of a classifier. The ensemble classifier is applied on three different data sets; we investigated different features for the different chunks of data, which are then given to the ensemble classifier. We observed that the proposed approach improves the accuracy of the classifier for different chunks of data.
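One common shape for such a chunk-based, drift-aware ensemble — a sketch under our own naming, not the paper's exact method — is to rescore the base models on each newly labelled chunk and vote with those weights, so models fitted to the current concept dominate:

```python
def reweight(classifiers, chunk):
    # Score each base classifier on the newest labelled chunk
    # (a list of (x, y) pairs); stale models get low weights.
    return [
        sum(1 for x, y in chunk if clf(x) == y) / len(chunk)
        for clf in classifiers
    ]

def ensemble_predict(classifiers, weights, x):
    # Weighted majority vote over the chunk-trained base classifiers.
    votes = {}
    for clf, w in zip(classifiers, weights):
        label = clf(x)
        votes[label] = votes.get(label, 0.0) + w
    return max(votes, key=votes.get)
```

Here each classifier is any callable from an instance to a label; after drift, a classifier trained on the old concept scores poorly on the new chunk and loses its influence on the vote.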
Extensive Analysis on Generation and Consensus Mechanisms of Clustering Ensem... - IJECEIAES
Data analysis plays a prominent role in interpreting various phenomena, and data mining is the process of extracting useful knowledge from extensive data. Based on classical statistical prototypes, the data can be exploited beyond mere storage and management. Cluster analysis, a primary investigation with little or no prior knowledge, involves research and development across a wide variety of communities. Cluster ensembles are a blend of individual solutions obtained from different clusterings, combined to produce a final high-quality clustering, which is required in many applications; the method arises from the need for increased robustness, scalability and accuracy. This paper gives a brief overview of the generation methods and consensus functions included in cluster ensembles, and surveys the various techniques and cluster ensemble methods.
Recent Trends in Incremental Clustering: A Review - IOSRjournaljce
This paper presents a review of recent trends in incremental clustering algorithms. It focuses on both clustering based on a similarity measure and clustering not based on a similarity measure, and in this context covers various typical incremental clustering algorithms, mainly their optimization, genetic and fuzzy approaches. The paper is original in one respect: it provides a complete overview fully devoted to evolutionary algorithms for incremental clustering. A number of references are provided that describe applications of evolutionary algorithms for incremental clustering in different domains, such as human activity detection, online fault detection, information security, and tracking an object consistently throughout a network to solve the boundary problem.
A SURVEY ON DATA MINING IN STEEL INDUSTRIES - IJCSES Journal
In industrial environments, huge amounts of data are generated and collected in databases and data warehouses from all involved areas, such as planning, process design, materials, assembly, production, quality, process control, scheduling, fault detection, shutdown, customer relationship management, and so on. Data mining has become a useful tool for knowledge acquisition in the industrial processes of iron and steel making. Due to the rapid growth in data mining, various industries have started using the technology to search for hidden patterns, which might further supply the system with new knowledge and inform new models that enhance production quality, productivity, optimal cost and maintenance. The continuous improvement of all steel production processes, regarding the avoidance of quality deficiencies and the related improvement of production yield, is an essential task of a steel producer; hence a zero-defect strategy is popular today, and several quality assurance techniques are used to maintain it. The present report explains the methods of data mining and describes their application in the industrial environment and especially in the steel industry.
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCE - IJCSEA Journal
Traditional medical analysis is based on static data: the medical data is analyzed only after the collection of the data sets is complete, but this falls far short of actual demand. Large amounts of medical data are generated in real time, so real-time analysis can yield more value. This paper introduces the design of Sentinel, a real-time analysis system based on a clustering algorithm. Sentinel performs clustering analysis of real-time data and issues early alerts.
Ensemble based Distributed K-Modes Clustering - IJERD Editor
Clustering is recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches to distributed clustering, in which distributed datasets are clustered without gathering all the data at a single site. K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets, but it cannot directly handle datasets with categorical attributes, which commonly occur in real-life data. Huang proposed the K-Modes clustering algorithm, introducing a new dissimilarity measure to cluster categorical data: it replaces the means of clusters with a frequency-based method that updates modes during the clustering process to minimize the cost function. Most distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handling categorical data sets and performs the distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with existing distributed K-Means clustering algorithms and a K-Modes based centralized clustering algorithm. The experiments are carried out on various datasets from the UCI machine learning repository.
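The frequency-based replacement for the mean that this abstract mentions can be sketched as follows — a minimal illustration of Huang's simple matching measure and the per-attribute mode, with our own function names:

```python
from collections import Counter

def matching_dissimilarity(record, mode):
    # Huang's simple matching measure for categorical data:
    # the number of attributes on which the two records disagree.
    return sum(1 for r, m in zip(record, mode) if r != m)

def cluster_mode(records):
    # Per attribute, take the most frequent category; this "mode"
    # plays the role the mean plays in k-means and minimizes the
    # total matching dissimilarity within the cluster.
    return tuple(Counter(col).most_common(1)[0][0] for col in zip(*records))
```

K-Modes then iterates exactly like k-means, assigning each record to the cluster whose mode it least disagrees with and recomputing the modes.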
A study and survey on various progressive duplicate detection mechanisms - eSAT Journals
Abstract: Duplicate detection is one of the serious problems faced in several applications involving personal data management, customer relationship management, data mining, etc. This survey covers various duplicate record detection techniques for both small and large datasets. To detect duplicates with less execution time and without disturbing dataset quality, methods such as Progressive Blocking and the Progressive Sorted Neighborhood Method are used. The Progressive Sorted Neighborhood Method, also called PSNM, is used in this model for detecting duplicates in a parallel approach, while the Progressive Blocking algorithm works on large datasets where finding duplicates requires immense time. These algorithms are used to enhance the duplicate detection system; with them, efficiency can be doubled over the conventional duplicate detection method.
A review on data mining techniques for Digital Mammographic Analysis - ijdmtaiir
Medical data mining is the search for relationships and patterns within medical data that could
provide useful knowledge for effective medical diagnosis. Through computational applications,
disease prediction will become more effective, and early detection will aid in increased exposure to
required patient care and improved cure rates. The review shows that the reasons for feature
selection include improvement in prediction performance, reduction in computational requirements,
reduction in data storage requirements, reduction in the cost of future measurements, and
improvement in data or model understanding.
Comparison on PCA ICA and LDA in Face Recognition - ijdmtaiir
Face recognition is used in a wide range of applications, and in recent years it has become one of
the most successful applications of image analysis and understanding. Different statistical methods
and research groups have reported contradictory results when comparing the principal component
analysis (PCA), independent component analysis (ICA), and linear discriminant analysis (LDA)
algorithms proposed in recent years. The goal of this paper is to compare and analyze the three
algorithms and conclude which performs best. The FERET dataset is used for consistency.
A Novel Approach to Mathematical Concepts in Data Mining - ijdmtaiir
This paper describes three fundamental mathematical programming approaches that are relevant to
data mining: feature selection, clustering and robust representation. It covers two clustering
algorithms, the k-means algorithm and the k-median algorithm. Clustering is illustrated by the
unsupervised learning of patterns and clusters that may exist in a given database, and is a useful
tool for Knowledge Discovery in Databases (KDD). The results of the k-median algorithm are used
to identify blood cancer patients in a medical database. K-means clustering is a data
mining/machine learning algorithm used to cluster observations into groups of related observations
without any prior knowledge of those relationships; it is one of the simplest clustering techniques
and is commonly used in medical imaging, biometrics and related fields.
Analysis of Classification Algorithm in Data Mining - ijdmtaiir
Data mining is the extraction of hidden predictive information from large databases. Classification
is the process of finding a model that describes and distinguishes data classes or concepts. This
paper studies the prediction of class labels using the C4.5 and Naïve Bayesian algorithms. C4.5
generates classifiers expressed as decision trees from a fixed set of examples, and the resulting tree
is used to classify future samples. The leaf nodes of the decision tree contain the class name,
whereas a non-leaf node is a decision node: an attribute test with each branch (to another decision
tree) being a possible value of the attribute. C4.5 uses information gain to decide which attribute
goes into a decision node. A Naïve Bayesian classifier is a simple probabilistic classifier based on
applying Bayes' theorem with strong (naive) independence assumptions: it assumes that the effect
of an attribute value on a given class is independent of the values of the other attributes, an
assumption called class conditional independence. The results indicate that predicting the class
label using the Naïve Bayesian classifier is very effective and simple compared to the C4.5
classifier.
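The class-conditional-independence assumption described above can be sketched for categorical attributes as follows — our own minimal implementation with Laplace smoothing, not the paper's code:

```python
from collections import Counter, defaultdict

def train_nb(rows, labels):
    # Class priors plus, for each (class, attribute index) pair,
    # how often each attribute value was observed.
    priors = Counter(labels)
    counts = defaultdict(Counter)
    for row, y in zip(rows, labels):
        for i, value in enumerate(row):
            counts[(y, i)][value] += 1
    return priors, counts

def predict_nb(priors, counts, row):
    # Class conditional independence: a class's score is its prior
    # times the product of independent per-attribute likelihoods.
    total = sum(priors.values())
    best, best_score = None, -1.0
    for y, n_y in priors.items():
        score = n_y / total
        for i, value in enumerate(row):
            seen = counts[(y, i)]
            # Laplace smoothing so an unseen value never zeroes the product.
            score *= (seen[value] + 1) / (n_y + len(seen))
        if score > best_score:
            best, best_score = y, score
    return best
```

On a toy weather-style dataset, training on labelled rows and calling `predict_nb` on a new row returns the class whose prior-times-likelihood product is highest.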
Performance Analysis of Selected Classifiers in User Profiling - ijdmtaiir
User profiles can serve as indicators of personal preferences which can be effectively used when
providing personalized services, but building profiles that capture accurate information about
individuals has been a daunting task. Several attempts have been made by researchers to extract
information from different data sources to build user profiles for different application domains.
Towards this end, in this paper we employ different classification algorithms to create accurate user
profiles based on information gathered from demographic data. The aim of this work is to analyze
the performance of five of the most effective classification methods, namely Bayesian Network
(BN), Naïve Bayes (NB), Naïve Bayes Updateable (NBU), J48, and Decision Table (DT). Our
simulation results show that, in general, J48 has the highest classification accuracy with the lowest
error rate, while the Naïve Bayes and Naïve Bayes Updateable classifiers require the least time to
build the classification model.
Analysis of Sales and Distribution of an IT Industry Using Data Mining Techni... - ijdmtaiir
The goal of this work is to allow a corporation to
improve its marketing, sales, and customer support operations
through a better understanding of its customers. Keep in mind,
however, that the data mining techniques and tools described
here are equally applicable in fields ranging from law
enforcement to radio astronomy, medicine, and industrial
process control. Businesses in today’s environment
increasingly focus on gaining competitive advantages.
Organizations have recognized that the effective use of data is
the key element for the next generation: predicting the sales
value and emerging trends of the technology market. Data is
becoming an important resource for the companies to analyze
existing sales value with current technology trends and this
will be more useful for the companies to identify future sales
value. There are a variety of data analysis and modeling techniques
to discover patterns and relationships in data that are used to
understand what your customers want and predict what they
will do. The main focus of this is to help companies to select
the right prospects on whom to focus, offer the right additional
products to company’s existing customers and identify good
customers who may be about to leave. This results in improved
revenue because of a greatly improved ability to respond to
each individual contact in the best way and reduced costs due
to properly allocated resources.
Keywords: sales, customer, technology, profit.
Analysis of Influences of memory on Cognitive load Using Neural Network Back ... - ijdmtaiir
Educational mining is used to evaluate the learner's performance and the learning environment.
The learning process involves, and is influenced by, different components, and memory plays a
vital role in learning. Long-term, short-term, working, instant, responsive, process, recollect,
reference, instruction and action memory are all involved in the process of learning. The factors
influencing these memories are identified through the construction and analysis of a neural
network back-propagation algorithm. The observed data set is represented in a cubical dataset
format for the mining approach. The mining process is carried out using a neural-network-based
back-propagation model to decide the influencing cognitive load for the different learning
challenges. The learners' difficulties are identified through the experimental results.
An Analysis of Data Mining Applications for Fraud Detection in Securities Market - ijdmtaiir
Securities fraud broadly refers to deceptive practices in connection with the offering of securities
for sale. There are many challenges involved in developing data mining applications for fraud
detection in the securities market, including massive datasets, accuracy, privacy, performance
measures and complexity. The impacts on the market and the training of regulators are other issues
that need to be addressed. In this paper we present the results of a comprehensive systematic
literature review on data mining techniques for detecting fraudulent activities and market
manipulation in the securities market. We identify the best practices based on data mining methods
for detecting known fraudulent patterns and discovering new predatory strategies. Furthermore, we
highlight the challenges faced in the development and implementation of data mining systems for
detecting market manipulation, and we provide recommendations for future research accordingly.
An Ill-identified Classification to Predict Cardiac Disease Using Data Cluste... - ijdmtaiir
The health care industry contains large amounts of health care data with hidden information that is
useful for making effective decisions, and computer-based data mining techniques are used to
obtain appropriate results from it. Previously, neural networks (NN) were widely used for
predicting cardiac disease. In this paper, a Cardiac Disease Prediction System (CDPS) is developed
using data clustering. The CDPS system uses 15 parameters to predict the disease, for example
blood pressure, obesity and cholesterol; these 15 attributes, such as sex, age and weight, are given
as the input. Using the patient's medical record, an ill-defined classification is applied at the
patient's early stage to diagnose cardiac disease. Based on the result, the patients are advised to
keep the sensor for prediction.
Scaling Down Dimensions and Feature Extraction in Document Repository Classif... - ijdmtaiir
In this study a comprehensive evaluation of two supervised feature selection methods for
dimensionality reduction is performed: Latent Semantic Indexing (LSI) and Principal Component
Analysis (PCA). These are gauged against unsupervised techniques such as fuzzy feature clustering
using hard fuzzy C-means (FCM). The main objective of the study is to estimate the relative
efficiency of the two supervised techniques against unsupervised fuzzy techniques while reducing
the feature space. It is found that clustering using FCM leads to better accuracy in classifying
documents than algorithms like LSI and PCA; the results show that clustering features improves the
accuracy of document classification.
Music Promotes Gross National Happiness Using Neutrosophic Fuzzy Cognitive Map... - ijdmtaiir
This paper provides an investigation into promoting gross national happiness through music, using
a fuzzy logic model. Music influences the rate of learning and has been a subject of study for many
years; researchers have confirmed that loud background noise impedes learning, concentration,
and information acquisition. An interesting phenomenon that occurs frequently when students
listen to new music is a sense of anxiety, even without a proper understanding of the music.
Happiness is the emotion that expresses various degrees of positive and negative feelings, ranging
from satisfaction to extreme joy; it is the goal most people strive to achieve, and happy people are
satisfied with their lives. The goal of this work is to find the particular component of music which
will ultimately promote people's happiness, given the indeterminacy in the components of music.
A Study on Youth Violence and Aggression using DEMATEL with FCM Methodsijdmtaiir
The DEMATEL method is then a good technique for
making decisions. In this paper we analyzed the risk factors of
youth violence and what makes them more aggressive. Since
there are more risk factors of youth violence, to relate each
other more complex to construct FCM and analyze them.
Moreover the data is an unsupervised one obtained from
survey as well as interviews. Hence fuzzy alone has the
capacity to analyses these concepts.
Analyzing the Role of a Family in Constructing Gender Roles Using Combined Ov...ijdmtaiir
Family, as a social institution and as the
fundamental unit of the society, plays a vital role in forming
persons. We become full-fledged members of the society
through the process of socialization which starts in the family.
It provides the first foundational formation of personhood.
Especially in traditional societies like India the role of the
family assumes even greater importance. We learn to
differentiate and to discriminate between man and woman from
the role our parents play. In this paper we analyze the role of
the family in constructing gender roles using Combined
Overlap Block Fuzzy Cognitive Maps
An Interval Based Fuzzy Multiple Expert System to Analyze the Impacts of Clim...ijdmtaiir
Indian agriculture is completely dependent on the
environment and any undesirable change in the environment
has an adverse impact on agriculture. Climate change and
pollution in India have caused great damage to the
environment. In this paper we analyse the impact of climate
change on Indian agriculture. The first section gives an
introduction to the problem. In section two we introduce a new
fuzzy tool called interval based fuzzy multiple expert system.
Section three adapt the new fuzzy tool to analyse the problem
of agricultural impacts of climate change and in section four
we give the results and suggestions based on our analysis
An Approach for the Detection of Vascular Abnormalities in Diabetic Retinopathyijdmtaiir
Diabetic Retinopathy is a common complication of
diabetes that is caused by changes in the blood vessels of the
retina. The blood vessels in the retina get altered. Exudates are
secreted, micro-aneurysms and hemorrhages occur in the
retina. The appearance of these features represents the degree
of severity of the disease. In this paper the proposed approach
detects the presence of abnormalities in the retina using image
processing techniques by applying morphological processing
techniques to the fundus images to extract features such as
blood vessels, micro aneurysms and exudates. These features
are used for the detection of severity of Diabetic Retinopathy.
It can quickly process a large number of fundus images
obtained from mass screening to help reduce the cost, increase
productivity and efficiency for ophthalmologists.
Improve the Performance of Clustering Using Combination of Multiple Clusterin...ijdmtaiir
The ever-increasing availability of textual
documents has lead to a growing challenge for information
systems to effectively manage and retrieve the information
comprised in large collections of texts according to the user’s
information needs. There is no clustering method that can
adequately handle all sorts of cluster structures and properties
(e.g. shape, size, overlapping, and density). Combining
multiple clustering methods is an approach to overcome the
deficiency of single algorithms and further enhance their
performances. A disadvantage of the cluster ensemble is the
highly computational load of combing the clustering results
especially for large and high dimensional datasets. In this paper
we propose a multiclustering algorithm , it is a combination of
Cooperative Hard-Fuzzy Clustering model based on
intermediate cooperation between the hard k-means (KM) and
fuzzy c-means (FCM) to produce better intermediate clusters
and ant colony algorithm. This proposed method gives better
result than individual clusters.
The Study of Symptoms of Tuberculosis Using Induced Fuzzy Coginitive Maps (IF...ijdmtaiir
Tuberculosis or TB is a common, infectious disease
caused by various strains of mycobacterium usually called as
mycobacterium Tuberculosis. Tb attacks the lungs but can also
affect other parts of the body. Most infections are
asymptomatic and latent, but about one in ten latent infections
eventually progresses to active disease which, if left untreated,
kills more than 50% of those so infected. Hence, this paper
analysisthes y m p t o m s o f Tuberculosis using Induced
Fuzzy Cognitive Maps (IFCMs). IFCMs area fuzzy-graph
modeling approach based on expert’s opinion. This is the nonstatistical approach to study the problems with imprecise
information.
A Study on Finding the Key Motive of Happiness Using Fuzzy Cognitive Maps (FCMs)ijdmtaiir
Happiness is subjective. It is difficult to compare one
person’s happiness with another. It can be especially difficult
to compare happiness across cultures. The function of man is
to live a certain kind of life and this activity implies a rational
principle. The function of a good man is the good and noble
performance. If any action is well performed it is performed in
accord with the appropriate excellence. If this is the case, then
happiness turns out to be an activity of the soul in accordance
with virtue. That is happiness as the exercise of virtue. Every
human being thinks happiness in his own perspective.
Everyone wants to be happy and searching happiness in all
their activities. Happiness is typically measured using
subjective measures. Happiness cannot be defined in terms of
rigid boundaries and it is a vague term and it is appropriate to
use Fuzzy Logic. In particular, we use Fuzzy Cognitive
Mapping to find the key motive of happiness. This paper
consists of four sections. The first section is of inductive
nature about happiness. The section two introduces the
concept of Fuzzy Cognitive Mapping (FCMs). In section three
Fuzzy Cognitive Mapping applied on the concept of
Happiness. Section four gives the conclusions and
suggestions
Study of sustainable development using Fuzzy Cognitive Relational Maps (FCM)ijdmtaiir
Sustainable development provides a framework under
which communities can use resources efficiently, create efficient
infrastructures, protect and enhance quality of life, and create new
businesses to strengthen their economies. It can help us create
healthy communities that can sustain our generation, as well as
those that follow ours. Sustainable development is not a new
concept. Rather, it is the latest expression of a long-standing ethic
involving peoples' relationships with the environment and the
current generation's responsibilities to future generations. For a
community to be truly sustainable, it must adopt a three-pronged
approach that considers economic, environmental and cultural
resources. Communities must consider these needs in the short
term as well as the long term. Sustainable Development also can
be defined simply as a better quality of life for everyone, now and
for generations to come. It is a vision of progress that links
economic development, protection of the environment and social
justice, and its values are recognised by democratic governments
and political movements the world over. Sustainable
Development is therefore closely linked to Governance, Better
Regulation and Impact Assessment. Indicators to measure
progress are also vital.This paper has four sections. In the first
section we introduce the notion of fuzzy cognitive maps and
Combined Fuzzy Cognitive Maps (CFCMs). In section two we
describe the problem and justification for the use of FCMs. In
section three we give the adaptation of FCM to the problem. In
the final section we give conclusions based on our analysis of the
problem using FCM
A Study of Personality Influence in Building Work Life Balance Using Fuzzy Re...ijdmtaiir
Personality plays an important role in work life
balance irrespective of the organizational setups and other
factors. It has become a subject of concern in terms of
technological, market and organizational changes associated
with an individual’s personality. Here in this study an attempt is
made to study about the holistic picture of personality influence
in work-life balance on the basis of experts’ opinion. The
influence of personality is studied from the big five factors of
personality traits. The data were analyzed using Fuzzy
Relational mapping (FRM) model and conclusions arrived for
which personality has more influence in building work life
balance and which one is more vulnerable for work life
imbalance
Integrated Intelligent Research (IIR) International Journal of Data Mining Techniques and Applications
Volume: 02 Issue: 02 December 2013 Page No. 71-75
ISSN: 2278-2419
Certain Investigation on Dynamic Clustering in
Dynamic Datamining
S. Angel Latha Mary¹, Dr. K. R. Shankar Kumar²
¹Associate Professor, CSE Department, Karpagam College of Engineering, Coimbatore, India
²Professor, ECE Department, Ranganathan Engineering College
Email: xavierangellatha@gmail.com, shanwire@gmail.com
Abstract - Clustering is the process of grouping a set of objects into classes of similar objects. Dynamic clustering is a new research area concerned with datasets that have dynamic aspects: the clusters must be updated whenever new data records are added to the dataset, and the clustering may therefore change over time. When a huge amount of data is updated continuously, rescanning the database is not feasible in static data mining, but it becomes feasible in the dynamic data mining process. Dynamic data mining arises when the derived information is kept available for analysis while the environment is dynamic, i.e. many updates occur. Since the value of this approach has now been established, researchers are moving on to solving its open problems, and current research concentrates on mining dynamic databases. This paper surveys existing work on dynamic clustering and incremental data clustering.
Keywords- Clustering, incremental data clustering, dynamic clustering, dynamic data mining, cluster evaluation, cluster validity index.
I. INTRODUCTION
Clustering is the process of grouping a set of objects into classes of similar objects. The data objects are similar to one another within the same cluster and dissimilar to the objects in other clusters [1]. Clustering poses challenges in scalability, the ability to deal with different types of attributes, the discovery of clusters with arbitrary shape, the domain knowledge needed to determine input parameters, the updating of new datasets, and the visualization of high-dimensional sparse data [2]. Dynamic clustering is a new research area concerned with datasets that have dynamic aspects. It requires updates of the clusters whenever new data records are added to the dataset and may result in a change of clustering over time. For example, a bank customer is interested in obtaining his current account status, and an economic analyst who receives many new articles every day would like to update the relevant associations based on all current articles. Recent clustering systems therefore work on dynamic data and perform the clustering process dynamically [3][4].
II. DYNAMIC DATA MINING (DDM)
Data mining, the process of knowledge discovery in databases (KDD), is concerned with finding patterns in raw data in order to extract useful information or predict trends. Recently, data volumes have been growing at an unpredictable rate, and discovering knowledge in these data is a very expensive operation [5]. Re-running data mining algorithms every time the data change is a challenging problem; updating the knowledge dynamically solves it. Dynamic data mining is a shift from static analysis to dynamic analysis, which discovers and updates knowledge along with newly updated data [3]. It is very useful for obtaining high-quality results in time series analysis, telecommunications, mobile networking, nanotechnology, physics, chemistry, biology, health care, sociology and economics [6]. When there is a continuous update and a huge amount of dynamic data, rescanning the database is not possible in static data mining, but it is possible in the dynamic data mining process [7].
Dynamic data mining applies data mining algorithms to a dynamic database and updates the existing set of clusters dynamically. A data warehouse is not updated immediately when insertions and deletions take place in the source databases; these updates are applied to the data warehouse periodically, e.g. each night. Dynamic data mining occurs when the derived information is kept available for analysis while the environment is dynamic, i.e. many updates occur [8][9]. The data mining tasks clustering, classification and summarization all use dynamic data mining. Examples include finding groups of customers (clustering sales transactions), the symptoms distinguishing disease A from disease B (classification in a medical data warehouse), and a description of typical WWW access patterns (summarization in the data warehouse of an internet provider).
Some factors for dynamic data mining:
- A database may have frequent updates, being modified on every insertion or deletion.
- Every modification (insertion or deletion) would otherwise require rescanning the whole database, so the time complexity is high.
- Hence an incremental clustering algorithm includes the logic for insertion into and deletion from the database as separate dynamic operations.
- After insertions and deletions to the database, the existing clusters have to be updated.
- Rescanning the whole database after each insertion or deletion increases the time complexity. A system built on the dynamic concept does not rescan the database but merges the new data into the existing data, so it is more efficient and suitable for a large multidimensional dynamic database.
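The update-without-rescan idea above can be sketched for centroid-based clusters: each cluster keeps only a running coordinate sum and a point count, so an insertion or deletion touches a single cluster and previously seen records are never revisited. A minimal illustration (the class and method names are hypothetical, not from any cited system):

```python
import math

class IncrementalClusters:
    """Centroid clusters maintained under insertions and deletions without rescanning."""

    def __init__(self, centroids):
        # Each cluster stores only a running coordinate sum and a point count,
        # so updates never require revisiting previously seen records.
        self.sums = [list(c) for c in centroids]
        self.counts = [1] * len(centroids)

    def centroid(self, i):
        return [s / self.counts[i] for s in self.sums[i]]

    def _nearest(self, point):
        return min(range(len(self.sums)),
                   key=lambda i: math.dist(point, self.centroid(i)))

    def insert(self, point):
        # Assign the new record to its nearest cluster and adjust that cluster only.
        i = self._nearest(point)
        self.counts[i] += 1
        for d, x in enumerate(point):
            self.sums[i][d] += x
        return i

    def delete(self, point):
        # Remove the record's contribution from its nearest cluster.
        i = self._nearest(point)
        self.counts[i] -= 1
        for d, x in enumerate(point):
            self.sums[i][d] -= x
        return i
```

Both operations cost only one pass over the existing centroids, independent of how many records the database already holds.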
III. DYNAMIC CLUSTERING
Clustering and visualizing high-dimensional dynamic data is a challenging problem in data mining. Most existing clustering algorithms are based on static statistical relationships among the data. In the clustering process there are no predefined classes and no examples showing what kind of relations should hold among the data [2]. Databases change dynamically due to frequent insertions and deletions, which alters the clustering structure over time. Completely reapplying the clustering algorithm to detect changes in the clustering structure and to update the uncovered patterns after such deletions and insertions is very expensive for large, high-dimensional, fast-changing datasets. Dynamic clustering is a mechanism for adapting and discovering clusters when a dataset is updated periodically through insertions and deletions. Many applications, such as incremental data mining in data warehousing and sensor networks, rely on dynamic data clustering algorithms.
IV. EVALUATING THE QUALITY OF THE RESULTING CLUSTERS
The choice of the number of clusters is an important sub-problem of clustering, since it generally needs a priori knowledge, and the vector dimensions are often higher than two, so clusters are not visually apparent. The solution of this problem directly affects the quality of the result. If the number of clusters is too small, different objects in the data will not be separated; if it is too large, relatively homogeneous regions may be split into a number of smaller regions [10]. Both situations are to be avoided. This problem is known as the cluster validation problem. The aim is to estimate the number of clusters during the clustering process. The basic idea is to evaluate a clustering structure by generating several clusterings for various numbers of clusters and comparing them against some evaluation criteria. The procedure of evaluating the results of a clustering algorithm is known as cluster validity. In general terms there are three approaches to investigating cluster validity. The first is based on external criteria: the result of a clustering algorithm is evaluated against a pre-specified structure, which is imposed on the data set and reflects the user's intuition about its clustering structure [11]. The Rand statistic, Jaccard coefficient, Fowlkes–Mallows index and Hubert's statistic are indices that measure the degree of similarity based on external criteria.
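The Rand statistic and Jaccard coefficient named above are both computed from pair counts over two labelings of the same n points: pairs grouped together in both clusterings (a), together in only one (b, c), or apart in both (d). A small sketch of the standard definitions:

```python
from itertools import combinations

def pair_counts(labels_a, labels_b):
    """Count point pairs by agreement between two clusterings of the same data."""
    a = b = c = d = 0
    for i, j in combinations(range(len(labels_a)), 2):
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a and same_b:
            a += 1          # together in both clusterings
        elif same_a:
            b += 1          # together only in the first
        elif same_b:
            c += 1          # together only in the second
        else:
            d += 1          # apart in both
    return a, b, c, d

def rand_statistic(labels_a, labels_b):
    # Fraction of pairs on which the two clusterings agree: (a + d) / all pairs.
    a, b, c, d = pair_counts(labels_a, labels_b)
    return (a + d) / (a + b + c + d)

def jaccard_coefficient(labels_a, labels_b):
    # Like the Rand statistic but ignoring pairs that are apart in both: a / (a + b + c).
    a, b, c, _ = pair_counts(labels_a, labels_b)
    return a / (a + b + c)
```

Because only pairwise co-membership matters, both indices are invariant to permutations of the cluster labels.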
The second approach is based on internal criteria: the clustering results are evaluated in terms of quantities that involve the vectors of the data set themselves (e.g. the proximity matrix). The Davies–Bouldin index, the Dunn index and the F-measure can be used to assess the quality of clustering algorithms based on internal criteria. The third approach to clustering validity is based on relative criteria. Here the basic idea is to evaluate a clustering structure by comparing it to other clustering schemes produced by the same algorithm with different input parameter values. The first two approaches are based on statistical tests, and their major drawback is their high computational cost. The third approach aims at finding the best clustering scheme that a clustering algorithm can define under certain assumptions and parameters [11].
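Of the internal indices listed, the Dunn index is the simplest to state: the smallest distance between points of different clusters divided by the largest intra-cluster diameter, so higher values indicate compact, well-separated clusters. A sketch of the standard definition, which can also serve the relative-criteria procedure of scoring clusterings produced for several parameter values:

```python
import math
from itertools import combinations

def dunn_index(clusters):
    """clusters: list of point lists. Higher means compact, well-separated clusters."""
    # Largest intra-cluster diameter: max pairwise distance within any one cluster.
    diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    # Smallest inter-cluster separation: min distance across any two clusters.
    separation = min(
        math.dist(p, q)
        for ca, cb in combinations(clusters, 2)
        for p in ca for q in cb
    )
    return separation / diameter if diameter > 0 else float("inf")
```

Computing this for clusterings generated with several candidate cluster counts and keeping the highest-scoring one is exactly the evaluation loop described above, though the brute-force pairwise distances make it expensive on large data.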
V. WORKS RELATED TO DYNAMIC CLUSTERING
The most promising direction is based on attempts to model the working of the human brain, a highly complex, nonlinear and parallel information-processing system. The complex cortex structure is modeled by artificial neuron lattices joined by a great number of interlinks, and this global linking of simple neurons produces their collective behavior. Each neuron plays the role of a processor, which is why the neural network structure is a natural basis for parallel computing; there is no need to prepare the data, since in a neural network the input data are already parallelized. For parallel computing to work correctly, software should be able to partition its work, and the data it operates on, over hundreds of processors. High-speed and at the same time high-quality solutions to many complicated problems can be obtained by means of the collective-behavior property of such microsystems. The main idea of self-organization lies in the distributed character of data processing: the dynamics of a single element means nothing, but the group dynamics define the macroscopic state of the whole system, which allows the system to reveal capabilities for adaptation, learning and data mining and, as one result, high computational effectiveness [5]. Advances in experimental brain science support the hypothesis that cognition, memory and attention are the results of cooperative chaotic dynamics of brain cortex elements (neurons). Thus the design of artificial dynamic neural networks on the basis of a neurobiological prototype seems to be the right direction in the search for innovative clustering techniques. The development of computer science opened promising possibilities for computer modeling, making it possible to study complex nonlinear systems. The exponential unpredictability and extreme instability of chaotic dynamics generate a variety of possible system states that can help describe all the multiformity of our planet.
It is assumed to be very advantageous to obtain clustering
solutions using effects produced by the interaction of chaotic
systems. This research tries to make the next step in the
development of a universal clustering technique. Dynamic data
mining combines modern data mining techniques with modern
time-series analysis techniques. Saka and Nasraoui [2]
designed a simultaneous clustering and visualization
algorithm, the Dynamic FClust algorithm, which uses flocks of
agents as a biological metaphor. This algorithm falls within the
swarm-based clustering family, which is unique compared to
other approaches because its model is an ongoing swarm of
agents that socially interact with each other, and it is therefore
inherently dynamic. Yucheng Kao and Szu-Yuan Lee [12]
presented a new dynamic data clustering algorithm based on
K-Means and combinatorial particle swarm optimization,
called KCPSO. Unlike the traditional K-Means method,
KCPSO does not need a specific number of clusters given
before performing the clustering process and is able to find the
optimal number of clusters during the clustering process. In
each iteration of KCPSO, a discrete PSO (Particle Swarm
Optimization) is used to optimize the number of clusters, with
which K-Means is used to find the best clustering result.
KCPSO has been developed into a software system and
evaluated on several datasets. Encouraging results show
that KCPSO is an effective algorithm for solving dynamic
clustering problems.
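The two-level structure of KCPSO can be sketched roughly as follows. This is not the authors' algorithm: the discrete PSO is replaced by a simple drift-plus-jitter move on k with greedy probing of the best k's neighbours, and a tiny 1-D K-Means stands in for the inner clustering step.

```python
import random

def cluster_cost(points, k, iters=15):
    """Inner K-Means pass (1-D): total within-cluster absolute deviation."""
    pts = sorted(points)
    centers = pts[::max(1, len(pts) // k)][:k]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[min(range(k), key=lambda c: abs(p - centers[c]))].append(p)
        centers = [sum(g) / len(g) if g else centers[i]
                   for i, g in enumerate(groups)]
    return sum(min(abs(p - c) for c in centers) for p in points)

def kcpso_like(points, k_max=8, particles=6, rounds=12, penalty=0.4, seed=1):
    """Outer swarm-style search over k; fitness = cost + penalty * k (lower is better)."""
    rnd = random.Random(seed)
    fit = lambda k: cluster_cost(points, k) + penalty * k
    swarm = [rnd.randint(2, k_max) for _ in range(particles)]
    gbest = min(swarm, key=fit)
    for _ in range(rounds):
        for i, k in enumerate(swarm):
            # each particle drifts one step toward the global best, with +-1 jitter
            step = (gbest > k) - (gbest < k) or rnd.choice([-1, 1])
            swarm[i] = min(k_max, max(2, k + step))
        # greedy refinement: also probe the neighbours of the current best k
        cands = swarm + [gbest, max(2, gbest - 1), min(k_max, gbest + 1)]
        gbest = min(cands, key=fit)
    return gbest

data = [0.9, 1.0, 1.2, 4.8, 5.0, 5.1, 8.9, 9.0, 9.2]
print(kcpso_like(data))  # 3
```

The penalty term keeps the search from inflating k, so the outer loop settles on the number of clusters at which the inner K-Means cost stops improving enough.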
Elghazel Haytham et al [13] presented a dynamic version of
the b-coloring based clustering approach, which relies only on
the dissimilarity matrix and cluster dominating vertices in order
to cluster new data as they are added to the collection, or to
rearrange a partition when existing data are removed. A real
advantage of this method is that it performs a dynamic
classification that correctly satisfies the b-coloring properties
and preserves clustering performance in terms of quality and
runtime, without requiring the number of clusters to be
pre-defined and without any restriction on the type of data. The
results obtained over three UCI data sets illustrated the
efficiency of the algorithm, which generates better results than
the Single-Pass and k-NN (k-nearest neighbor) algorithms.
There are many interesting issues to pursue: (1) leading
additional experiments on a larger medical data set where a
patient-stay typology is required and newly arriving patient
stays have to be incorporated into the typology; (2) extending
the incremental concept to add or remove sets of instances
simultaneously; and (3) defining operators that permit easy
combination of different clusterings constructed on different
data. Ester et al [14] presented an
incremental clustering algorithm based on the clustering
algorithm DBSCAN for mining in a data warehousing
environment which is applicable to any database containing
data from a metric space, e.g., to a spatial database or to a
WWW-log database. Due to the density based nature of
DBSCAN, the insertion or deletion of an object affects the
current clustering only in the neighborhood of that object. Thus
efficient algorithms could be given for incremental insertions
and deletions to an existing clustering. Based on the formal
definition of clusters, this incremental algorithm yields the
same result as DBSCAN. Incremental DBSCAN yields
significant speed-up factors over DBSCAN even for large
numbers of daily updates in a data warehouse. The authors
assumed that the parameter values Eps and MinPts of
incremental DBSCAN do not change significantly when
inserting and deleting objects.
Elena and Sofya [5] described how for centuries humans have
admired animate nature and the mechanisms living creatures
use to fulfill various functions. At first this was just formal
resemblance and mechanistic imitation; then, as the sciences
matured, the focus shifted to the inner construction of living
systems. However, due to the complexity of a living system, it
can only be reproduced in part: separate subsystems embody a
limited set of functions and principles. Thus there appeared,
largely independently, artificial neural networks (attempts to
mimic the nervous system), genetic algorithms (data transfer
by means of inheritance), artificial immune systems (partial
reproduction of the immune system) and evolutionary
modeling (imitation of the principles of evolutionary
development). The idea of natural self-organization within
groups of individuals is the basis for swarm and ant colony
technologies. It is important to note that nearly all the
mentioned technologies deal with distributed parallel data
processing, thanks to numerous simple processing units
comprising self-organized networks that adapt to an
ever-changing environment (the input information). Of course,
there exist substantial peculiarities in the types of local
cooperation and global behavior mechanisms, predetermined
by the system's goal (as is well known, systems demonstrate
not only interconnectivity of elements but also their ability to
serve one purpose). The evolution of society and new computer
technologies share the idea of small-world modeling:
communities of various natures (interest clubs, computer
clusters, marketing networks, etc.) exhibit strong local linkage
of units and weak connectivity beyond their nearest neighbors
(the nodes of the net).
Recent research on brain activity gives evidence for its cluster
organization, so in general small-world models reflect both
animate and inanimate nature. Originally the notion of
bio-inspired comprised problem-solving approaches borrowed
from living systems, but nowadays it is understood more
widely. Results from physics in the fields of chaos theory and
nonlinear dynamics contribute greatly to the bio-inspired
methodology, since nonlinear chaotic models find their
application in data mining, first and foremost a bio-inspired
scientific area. Bio-inspired methods can be classified along
different dimensions:
a. Structure and connection: neural networks (macro level)
and artificial immune systems (micro level);
b. Collective behavior: ant-based networks, swarm methods,
multi-agent systems, small-world networks;
c. Evolution and selection: genetic algorithm, evolutionary
programming and evolutionary modeling, evolutionary
computations;
d. Linguistics: fuzzy logic.
To generalize further, one can note that nearly all the
mentioned methods realize collective data processing through
adaptation to the external environment. The exception is fuzzy
logic, which is closer to classical mathematics (interval logic
reflects the diversity of natural-language descriptions). Though
bio-inspired methods are applied to a wide set of problems, the
focus here is on the clustering problem as the most complex
and resource-consuming one. The division of an input set of
objects into subsets (mainly non-overlapping) is in most cases
interpreted as an optimization task, with a goal function
determined by inter- and intra-cluster distances. This approach
obliges the user to give a priori information about priorities:
what is of most importance, the compactness of clusters and
their separation in feature space, or inner cluster density and a
small number of clusters. The formalization of clustering
problems in terms of optimization procedures is one of the
cutting-edge issues in data mining.
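This optimization view can be made concrete with a toy goal function (one arbitrary choice among many, not a canonical index): the ratio of mean within-cluster distance to mean between-centroid distance, which a clustering procedure would seek to minimize.

```python
def goal(clusters):
    """Toy clustering objective for 1-D data: mean within-cluster distance
    divided by mean between-centroid distance; lower values are better."""
    centers = [sum(c) / len(c) for c in clusters]
    within = [abs(p - centers[i]) for i, c in enumerate(clusters) for p in c]
    between = [abs(centers[i] - centers[j])
               for i in range(len(centers)) for j in range(i + 1, len(centers))]
    return (sum(within) / len(within)) / (sum(between) / len(between))

good = [[1.0, 1.1, 0.9], [8.0, 8.2, 7.8]]   # compact, well-separated partition
bad = [[1.0, 8.0, 0.9], [8.2, 1.1, 7.8]]    # same points, mixed partition
print(goal(good) < goal(bad))  # True
```

Swapping in a different weighting of the two distance terms encodes exactly the a priori priorities mentioned above (compactness versus separation).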
Recent modifications of bio-inspired methods have been
developed as heuristics. Out of the desire to enlarge the
abilities of intelligent systems, a separate knowledge domain
within the artificial intelligence field has emerged: soft
computing (SC), which considers various combinations of
bio-inspired methods. As a result there appeared hybrid
methods such as neuro-fuzzy methods, genetic algorithms with
elements of fuzzy logic (FL), hybrids comprising genetic
algorithms (GA) and neural networks (NN), fuzzy logic with a
genetic algorithm constituent, fuzzy systems with a neural
network constituent, etc. One of the main ideas of such
combinations is to obtain a flexible tool that allows solving
complex problems and compensating for the drawbacks of one
approach by means of cooperation with another. For example,
the FL and NN combination provides learning abilities while
formalized knowledge can still be represented thanks to the
fuzzy logic element. Fuzzy logic is often applied simply to
add some flexibility to a data mining technique. The main
drawbacks of all fuzzy systems are the absence of learning
capabilities, the absence of parallel distributed processing and,
more critically, the reliance on experts' opinions when
membership functions are tuned. In addition to input-parameter
sensitivity, almost all of these methods suffer from the curse of
dimensionality and remain resource-consuming. Their
efficiency depends greatly on parallel processing hardware that
simulates the processing units: neurons in neural networks,
lymphocytes in artificial immune systems, ants and swarm
members, agents in multi-agent systems, nodes in small-world
networks, and chromosomes in genetic algorithms, genetic
programming and genetic modeling.
The benefit from synergetic effects concerns not only
collective dynamics but also the physical and chemical nature
of the constituent elements: nonlinear oscillators with chaotic
dynamics. As numerous investigations in nonlinear dynamics
show, the greater the complexity of the problem, the more
complex the system's dynamics should be. All over the world,
investigations at the molecular level are under way to obtain
new materials, find new medicines, solve pattern recognition
problems, etc. Most of them draw on knowledge from adjacent
disciplines: biology, chemistry, mathematics, informatics,
nonlinear dynamics and synergetics.
VI. RELATED WORK
Due to the continuous, unbounded and high-speed character of
dynamic data, there is a huge amount of data and not enough
time to rescan the whole database, as traditional data mining
algorithms would, whenever an update occurs [4]. Ganti et al
[15] examined the mining of data streams. They introduced a
block evolution model in which a data set is updated
periodically through insertions and deletions: the data set
consists of a conceptually infinite sequence of data blocks D1,
D2, ... arriving at times 1, 2, ..., where each block is a set of
records. The authors highlight two challenges in mining
evolving blocks of data: change detection and data mining
model maintenance. In change detection, the differences
between two data blocks are determined. Next, a data mining
model must be maintained under the insertion and deletion of
blocks of data according to a specified data span and
block-selection sequence. Crespo and Weber [3] presented a
methodology for dynamic data mining using fuzzy clustering
that assigns static objects to dynamic classes. The changes they
studied are movement, creation and elimination of classes, and
any combination of these. Once a
data mining system is installed and is being used in daily
operations, the user has to be concerned with the system's
future performance, because the extracted knowledge is based
on the past behavior of the analyzed objects. If future
performance is very similar to past performance (e.g., if
company customers do not change their files over time),
continued use of the initial data mining system can be justified.
If, however, performance changes over time (e.g., if hospital
patients' files do change over time), continued use of the early
system could lead to unsuitable results and, in consequence, to
unacceptable decisions based on those results. Here dynamic
data mining can be extremely helpful in making the right
decision at the right time, and it affects the efficiency of that
decision.
There are three strategies a user can follow in order to keep
applying his/her data mining system in a changing
environment:
a. The user can neglect changes in the environment and keep
applying the initial system without any further updates. This
has the advantage of being computationally cheap, since no
update of the data mining system is performed, and it requires
no changes in subsequent processes. Its disadvantage is that
recent changes cannot be detected.
b. At certain intervals, depending on the application, a new
system is developed from scratch using all the available data.
The advantage is that the user always has an up-to-date system
based on current data; the disadvantage is the computational
cost of creating a new system from scratch every time.
c. Based on the initial system and the new data, an update of
the system is performed. This was shown to be a viable method
in [3].
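The third strategy, updating the existing system with the new data, can be sketched as an online centroid update (a generic incremental K-Means-style rule, assumed here for illustration; it is not the fuzzy-clustering method of [3]):

```python
def update(centroids, counts, new_points):
    """Fold new 1-D points into existing centroids instead of re-clustering
    from scratch; counts[i] tracks how many points cluster i has absorbed."""
    for p in new_points:
        i = min(range(len(centroids)), key=lambda j: abs(p - centroids[j]))
        counts[i] += 1
        centroids[i] += (p - centroids[i]) / counts[i]  # running-mean update
    return centroids

centroids, counts = [1.0, 9.0], [3, 3]      # model built on past data
update(centroids, counts, [1.2, 8.8, 9.1])  # cheap update, no full rebuild
print(round(centroids[0], 2), round(centroids[1], 2))  # 1.05 8.98
```

Each new point costs one nearest-centroid lookup and one running-mean adjustment, which is exactly the trade-off of strategy c: the model stays current without the cost of rebuilding it.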
In the area of data mining, various methods have been
developed to find useful information patterns within data;
among the most important are association rules, clustering and
decision tree methods. For each of these methods, updating has
different aspects and several updating approaches have been
proposed:
a. Decision trees: various techniques for incremental learning
and tree restructuring.
b. Neural networks: updating is often used in the sense of
re-learning or improving the net's performance by learning
from new examples presented to the network.
c. Clustering: Chung and McLeod describe detailed approaches
for dynamic data mining using clustering techniques.
d. Association rules: Raghavan et al developed a system for
dynamic data mining of association rules.
Chung and McLeod proposed a mining framework that
supports the identification of useful patterns based on
incremental data clustering. Focusing on news stream mining,
they presented a sophisticated incremental hierarchical
document clustering algorithm that uses a neighbourhood
search.
Reigrotzki et al (2001) presented the application of several
process-control-related methods in the context of monitoring
and controlling data quality in financial databases. They
showed that the quality-control process can be considered a
classical control loop, measured via quality tests that exploit
data redundancy defined by meta-information or extracted
from the data by statistical models. Appropriate processing and
visualization of the test results enables human or automatic
detection and diagnosis of data quality problems. Moreover,
the model-based methods give insight into business-related
information contained in the data. The methods were applied
to the data quality monitoring of a real financial database at a
customer site, delivering business benefits such as
improvements in modeling quality, a reduction in the number
of modeling cycles, and better data understanding; these
benefits in turn lead to financial savings. In many situations
new information is more important than old information, for
example in publication databases, stock transactions, grocery
markets, or web-log records. Consequently, an itemset that is
frequent in the newly arrived data is important even if it is
infrequent in the updated database as a whole.
Incremental clustering is the process of updating an existing set
of clusters incrementally, rather than mining them from scratch
on each database update. A brief overview of work done on
incremental clustering algorithms is given next.
COBWEB was proposed by Fisher [16]. It is an incremental
clustering algorithm that builds a taxonomy of clusters without
a pre-defined number of clusters. Gennary et al [17] proposed
CLASSIT, which associates normal distributions with cluster
nodes. The main drawback of both COBWEB and CLASSIT is
that they result in highly unbalanced trees.
Charikar et al [18] introduced new deterministic and
randomized incremental clustering algorithms that try to
minimize the maximum diameter of the clusters. The diameter
of a cluster, the maximum distance among its points, is used in
the restructuring process of the clusters. When a new point
arrives, it is either assigned to one of the current clusters or it
starts its own cluster, while two existing clusters may be
combined into one. Ester et al [14] presented Incremental
DBSCAN suitable for mining in a data warehousing
environment. Incremental DBSCAN is based on the DBSCAN
algorithm, a density-based clustering algorithm, and uses an
R*-tree as an index structure to perform region queries. Due to
its density-based nature, in Incremental DBSCAN the effects
of inserting and deleting objects are limited to the
neighborhood of these objects. Incremental DBSCAN requires
only a distance function and is applicable to any data set from a
metric space. However, the proposed method does not address
the problem of point densities changing over time, which
would require adapting the input parameters of Incremental
DBSCAN over time. Another limitation of the algorithm is that
it adds or deletes only one data point at a time. An incremental
clustering algorithm based on swarm intelligence is given in
Chen and Meng [19].
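The diameter-bounded insertion rule of Charikar et al [18] can be sketched as follows (a simplified greedy variant; the published algorithms also merge clusters to bound their number, which is omitted here):

```python
def insert_point(clusters, p, max_diameter):
    """Assign p to the first cluster whose diameter stays bounded after the
    insertion; otherwise open a new cluster (1-D points, diameter = max - min)."""
    for c in clusters:
        if max(max(c), p) - min(min(c), p) <= max_diameter:
            c.append(p)
            return clusters
    clusters.append([p])  # no cluster can absorb p without growing too wide
    return clusters

clusters = []
for p in [1.0, 1.3, 5.0, 1.1, 5.2, 9.0]:
    insert_point(clusters, p, max_diameter=0.5)
print(clusters)  # [[1.0, 1.3, 1.1], [5.0, 5.2], [9.0]]
```

Because each arrival is handled with a single pass over the current clusters, the set of clusters is maintained incrementally without ever re-clustering the points seen so far.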
VII. CONCLUSION
The ability to solve complex clustering problems, for instance
in terms of an oscillatory clustering language, can in future
research be extended to dynamic inputs; this line of work is
still at the beginning of the road, as there are many aspects of
data mining that have not been tested. To date, most data
mining projects have been concerned with verifying the actual
data mining concepts. Since this has now been established,
most researchers will move on to solving some of the open
problems, and in this case the research concentrates on the
problem of applying data mining to dynamic databases. This
paper has surveyed existing work related to dynamic clustering
and incremental data clustering.
REFERENCES
[1] Han, J. and Kamber, M. “Data Mining: Concepts and Techniques”,
Morgan Kaufmann Publishers, 2006.
[2] Saka, E. and Nasraoui, O. “On dynamic data clustering and visualization
using swarm intelligence”, in Proc. IEEE 26th International Conference on
Data Engineering Workshops (ICDEW), pp. 337-340, 2010.
[3] Crespo, F. and Weber, R. “A methodology for dynamic data mining
based on fuzzy clustering”, Fuzzy Sets and Systems, Elsevier,
Vol. 150, pp. 267-284, 2005.
[4] Nasereddin, H.H.O. “Stream Data Mining”, International Journal of Web
Applications, Vol. 1, No. 4, pp. 183-190, 2009.
[5] Elena N. Benderskaya and Sofya V. Zhukova, “Dynamic Data Mining:
Synergy of Bio-Inspired Clustering Methods”, Knowledge-Oriented
Applications in Data Mining.
[6] Xu, R. and Wunsch, D. II, “Survey of clustering algorithms”, IEEE Trans.
on Neural Networks, Vol. 16, No. 3, pp. 645-678, 2005.
[7] Zhang, J., Tianrui Li, Da Ruan and Dun Liu, “Neighborhood Rough Sets
for Dynamic Data Mining”, International Journal of Intelligent Systems, Wiley
Periodicals, Inc., Vol. 27, pp. 317-342, 2012.
[8] Sanjay Chakraborty and Nagwani, N.K. “Analysis and Study of
Incremental DBSCAN Clustering Algorithm”, International Journal
of Enterprise Computing and Business Systems, Vol. 1, Issue 2,
pp. 2230-8849, 2011.
[9] Sanjay Chakraborty, Nagwani, N.K. and Lopamudra Dey, “Performance
Comparison of Incremental K-Means and Incremental DBSCAN Algorithms”,
International Journal of Computer Applications (0975 – 8887) , Vol. 27, No.
11, 2011.
[10] Svetlana Cherednichenko, Outlier Detection in Clustering, 2005.
[11] Halkidi, M., Batistakis, Y. and Vazirgiannis, M. “Cluster Validity
Methods: part I”, In Proceedings of the ACM SIGMOD International
Conference on Management of Data, Vol. 31, Issue 2, pp. 40-45, 2002.
[12] Yucheng Kao and Szu-Yuan Lee, “Combining K-Means and particle
swarm optimization for dynamic data clustering problems”, in Proc. IEEE
International Conference on Intelligent Computing and Intelligent Systems
(ICIS 2009), Vol. 1, pp. 757-761, 2009.
[13] Elghazel Haytham, Hamamache Kheddouci, Véronique Deslandres and
Alain Dussauchoy, “A Partially Dynamic Clustering Algorithm for Data
Insertion and Removal”, Discovery Science ,Lecture Notes in Computer
Science, Vol. 4755, pp. 78-90, 2007.
[14] Ester, M., Kriegel, H.-P., Sander, J., Wimmer, M. and Xu, X.
“Incremental Clustering for Mining in a Data Warehousing Environment”,
Proc. 24th Int. Conf. on Very Large Data Bases (VLDB), New York, USA,
pp. 323-333, 1998.
[15] Ganti, V., Gehrke, J., Ramakrishnan, R. and Loh, W. “Mining data
streams under block evolution”, In ACM SIGKDD Explorations,
Vol. 3(2), pp. 1-10, 2002.
[16] Fisher, D.H. “Knowledge acquisition via incremental conceptual
clustering”, Machine Learning, Vol. 2, pp. 139-172, 1987.
[17] Gennary, J., Langley, P. and Fisher, D. “Model of Incremental Concept
Formation”, Artificial Intelligence Journal, Vol. 40, pp. 11-61, 1989.
[18] Charikar, M., Chekuri, C., Feder, T. and Motwani, R. “Incremental
clustering and dynamic information retrieval”, 29th Symposium on Theory of
Computing, pp. 626-635, 1997.
[19] Chen, Z. and Meng, Q.C. “An incremental clustering algorithm based on
SWARM intelligence theory”, Proc. of the 3rd Int. Conf. on Machine Learning
and Cybernetics, Shanghai, 26-29 August, 2004.