The document proposes a strategy for clustering distributed databases using self-organizing maps (SOM) and K-means algorithms. The strategy applies SOM locally to each distributed data set to obtain representative subsets, then combines the results and applies SOM and K-means globally. Specifically, it performs local SOM clustering, sends representative data to a central site, applies SOM again on the combined data, then uses K-means on the unified map to produce the final clustering result.
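The local-SOM, global-SOM, then K-means pipeline can be sketched end-to-end on toy data. All function names, the 1-D map topology, and the parameters below are illustrative choices, not the paper's implementation:

```python
import numpy as np

def train_som(data, n_units=4, epochs=50, lr=0.5, rng=None):
    """Minimal 1-D SOM: returns unit weight vectors summarizing `data`."""
    rng = rng or np.random.default_rng(0)
    weights = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    for epoch in range(epochs):
        radius = max(1.0, n_units / 2 * (1 - epoch / epochs))
        for x in data:
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best matching unit
            for j in range(n_units):
                h = np.exp(-((j - bmu) ** 2) / (2 * radius ** 2))  # neighborhood pull
                weights[j] += lr * (epochs - epoch) / epochs * h * (x - weights[j])
    return weights

def kmeans(points, k, iters=20, rng=None):
    rng = rng or np.random.default_rng(1)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return centers, labels

# Two "sites", each holding part of the data around two true clusters.
rng = np.random.default_rng(42)
site_a = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])
site_b = np.vstack([rng.normal(0, 0.3, (50, 2)), rng.normal(5, 0.3, (50, 2))])

# Step 1: local SOM at each site -> representative units sent to the center.
reps = np.vstack([train_som(site_a), train_som(site_b)])
# Step 2: global SOM on the combined representatives.
global_units = train_som(reps)
# Step 3: K-means on the unified map gives the final clustering.
centers, _ = kmeans(global_units, k=2)
```

Only the SOM unit weights travel between sites, which is the point of the strategy: the raw distributed data never leaves its site.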
Comparison Between Clustering Algorithms for Microarray Data Analysis (IOSR Journals)
Currently, there are two techniques used for large-scale gene-expression profiling: microarray and RNA sequencing (RNA-Seq). This paper studies and compares different clustering algorithms used in microarray data analysis. A microarray is an array of DNA molecules that allows multiple hybridization experiments to be carried out simultaneously, tracing the expression levels of thousands of genes. It is a high-throughput technology for gene expression analysis and has become an effective tool for biomedical research. Microarray analysis aims to interpret the data produced from experiments on DNA, RNA, and protein microarrays, which enable researchers to investigate the expression state of a large number of genes. Data clustering is the first and main step in microarray data analysis. The k-means, fuzzy c-means, self-organizing map, and hierarchical clustering algorithms are investigated in this paper and compared on the basis of their clustering models.
In the machine learning community, there is a recent trend of constructing a nonlinear version of a linear algorithm through the 'kernel method'; examples include kernel principal component analysis, kernel Fisher discriminant analysis, support vector machines (SVMs), and the current kernel clustering algorithms. Typically, in unsupervised kernel clustering methods, a nonlinear mapping is first applied to map the data into a much higher-dimensional feature space, and clustering is then performed there. A drawback of these kernel clustering algorithms is that the cluster prototypes reside in that high-dimensional feature space and therefore lack clear, intuitive descriptions unless an additional approximate projection from the feature space back to the data space is used, as done in the literature. This paper uses the 'kernel method' to propose a novel clustering algorithm founded on the conventional fuzzy c-means algorithm (FCM), called the kernel fuzzy c-means algorithm (KFCM). KFCM adopts a kernel-induced metric in the data space to replace the original Euclidean norm, while the cluster prototypes still reside in the data space, so the clustering results can be interpreted in the original space. This property is also exploited for clustering incomplete data. Experiments on synthetic data illustrate that KFCM achieves better clustering performance and is more robust than other variants of FCM for clustering incomplete data.
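The KFCM iteration described above can be sketched with a Gaussian kernel, where the kernel-induced distance is d²(x, v) = 2·(1 − K(x, v)) and the prototypes remain in the input space. Function names, the initialization scheme, and parameter values are illustrative, not the paper's exact formulation:

```python
import numpy as np

def kfcm(X, k, m=2.0, sigma=1.0, iters=50, init=None):
    """Kernel fuzzy c-means sketch with Gaussian kernel K(x,v) = exp(-||x-v||^2 / sigma^2)."""
    idx = list(init) if init is not None else list(range(k))
    V = X[idx].astype(float)                     # prototypes stay in data space
    U = None
    for _ in range(iters):
        sq = ((X[:, None, :] - V[None, :, :]) ** 2).sum(axis=2)
        K = np.exp(-sq / sigma ** 2)             # Gaussian kernel values
        d2 = np.maximum(2.0 * (1.0 - K), 1e-12)  # kernel-induced squared distance
        # Membership update: the standard FCM form, using the kernel distance.
        U = 1.0 / d2 ** (1.0 / (m - 1))
        U /= U.sum(axis=1, keepdims=True)
        # Prototype update: kernel-weighted mean, still in the input space.
        W = (U ** m) * K
        V = (W.T @ X) / W.sum(axis=0)[:, None]
    return U, V

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.2, (30, 2)), rng.normal(3, 0.2, (30, 2))])
U, V = kfcm(X, k=2, init=[0, -1])   # seed one prototype in each cluster
labels = U.argmax(axis=1)
```

Because the prototypes V live in the original data space, they can be read directly as cluster representatives, which is the interpretability property the abstract emphasizes.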
Clustering, also known as data segmentation, aims to partition a data set into groups (clusters) according to similarity. Cluster analysis has been extensively studied, and many algorithms exist for different types of clustering, but these classical algorithms cannot be applied directly to big data because of its distinct features; applying traditional techniques to large unstructured data is a challenge. This study proposes a hybrid model to cluster big data using the well-known traditional K-means clustering algorithm. The proposed model consists of three phases: a Mapper phase, a Clustering phase, and a Reduce phase. The first phase uses the map-reduce paradigm to split big data into small data sets, the second phase runs the traditional K-means algorithm on each of the split data sets, and the last phase produces the general cluster output for the complete data set. Two functions, Mode and Fuzzy Gaussian, were implemented and compared in the last phase to determine the more suitable one. The experimental study used four benchmark big data sets: Covtype, Covtype-2, Poker, and Poker-2. The results demonstrate the efficiency of the proposed model in clustering big data with the traditional K-means algorithm, and the experiments show that the Fuzzy Gaussian function produces more accurate results than the traditional Mode function.
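The three-phase split-cluster-merge structure can be sketched in a few lines. Here plain K-means over the per-chunk centers stands in for the paper's Mode / Fuzzy Gaussian merge functions; the seeding strategy and all names are illustrative:

```python
import numpy as np

def kmeans(points, k, iters=25):
    # Greedy farthest-point seeding keeps this sketch deterministic.
    centers = [points[0]]
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centers], axis=0)
        centers.append(points[int(np.argmax(d))])
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.argmin(
            np.linalg.norm(points[:, None] - centers[None], axis=2), axis=1)
        for c in range(k):
            if np.any(labels == c):
                centers[c] = points[labels == c].mean(axis=0)
    return centers

# Mapper phase: split the full data set into small chunks.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0, 0.3, (200, 2)), rng.normal(4, 0.3, (200, 2))])
rng.shuffle(data)
chunks = np.array_split(data, 4)

# Clustering phase: run plain K-means independently on every chunk.
local_centers = np.vstack([kmeans(chunk, k=2) for chunk in chunks])

# Reduce phase: merge the local centers into k global clusters.
global_centers = kmeans(local_centers, k=2)
```

The reduce phase only ever sees k centers per chunk rather than the full data, which is what makes the scheme scale.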
SVD BASED LATENT SEMANTIC INDEXING WITH USE OF THE GPU COMPUTATIONS (ijscmcj)
The purpose of this article is to determine the usefulness of Graphics Processing Unit (GPU) computations for implementing the Latent Semantic Indexing (LSI) reduction of the term-by-document matrix. The considered reduction of the matrix is based on the SVD (Singular Value Decomposition). The high computational complexity of the SVD, O(n^3), makes the reduction of a large indexing structure a difficult task. The article compares the time complexity and accuracy of the algorithms implemented in two different environments: the first is associated with the CPU and MATLAB R2011a, the second with graphics processors and the CULA library. The calculations were carried out on publicly available benchmark matrices, which were combined to obtain a large resulting matrix. In both environments, computations were performed for double- and single-precision data.
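The rank-k LSI reduction via SVD can be shown on a toy term-by-document matrix (the matrix and term labels below are made up for illustration; the article's experiments use large benchmark matrices and the CULA library on the GPU):

```python
import numpy as np

# Toy term-by-document matrix: rows = terms, columns = documents.
A = np.array([
    [2, 1, 0, 0],   # "data"
    [1, 2, 0, 0],   # "mining"
    [0, 0, 2, 1],   # "granite"
    [0, 0, 1, 2],   # "quartz"
], dtype=float)

# Full SVD, then keep only the k largest singular values (rank-k LSI).
U, s, Vt = np.linalg.svd(A, full_matrices=False)
k = 2
A_k = U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]

# Documents are compared in the reduced k-dimensional concept space.
doc_vecs = (np.diag(s[:k]) @ Vt[:k, :]).T

def cos(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Documents 0 and 1 share terms; documents 0 and 2 share none.
sim_01 = cos(doc_vecs[0], doc_vecs[1])
sim_02 = cos(doc_vecs[0], doc_vecs[2])
```

The O(n^3) cost the article measures lives in the `np.linalg.svd` call; the GPU comparison swaps exactly that step for a CULA implementation.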
Ensemble based Distributed K-Modes Clustering (IJERD)
Clustering has been recognized as the unsupervised classification of data items into groups. Due to the explosion in the number of autonomous data sources, there is an emergent need for effective approaches in distributed clustering. The distributed clustering algorithm is used to cluster the distributed datasets without gathering all the data in a single site. The K-Means is a popular clustering method owing to its simplicity and speed in clustering large datasets. But it fails to handle directly the datasets with categorical attributes which are generally occurred in real life datasets. Huang proposed the K-Modes clustering algorithm by introducing a new dissimilarity measure to cluster categorical data. This algorithm replaces means of clusters with a frequency based method which updates modes in the clustering process to minimize the cost function. Most of the distributed clustering algorithms found in the literature seek to cluster numerical data. In this paper, a novel Ensemble based Distributed K-Modes clustering algorithm is proposed, which is well suited to handle categorical data sets as well as to perform distributed clustering process in an asynchronous manner. The performance of the proposed algorithm is compared with the existing distributed K-Means clustering algorithms, and K-Modes based Centralized Clustering algorithm. The experiments are carried out for various datasets of UCI machine learning data repository.
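Huang's two core ingredients, the simple matching dissimilarity and the frequency-based mode update, can be sketched in plain Python (the tiny dataset and the seeding are illustrative; the paper's contribution is the distributed ensemble layer on top of this):

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Huang's simple matching distance: count of attribute mismatches."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, k, iters=10):
    modes = [list(records[i]) for i in range(k)]    # seed with first k records
    labels = [0] * len(records)
    for _ in range(iters):
        # Assignment step: nearest mode under matching dissimilarity.
        labels = [min(range(k), key=lambda c: matching_dissimilarity(r, modes[c]))
                  for r in records]
        # Update step: per attribute, the most frequent value in the cluster.
        for c in range(k):
            members = [r for r, l in zip(records, labels) if l == c]
            if members:
                modes[c] = [Counter(col).most_common(1)[0][0]
                            for col in zip(*members)]
    return modes, labels

records = [("red", "small", "round"), ("red", "small", "oval"),
           ("blue", "large", "square"), ("blue", "large", "round")]
modes, labels = k_modes(records, k=2)
```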
In this video from the 2015 HPC User Forum in Broomfield, Barry Bolding from Cray presents: HPC + D + A = HPDA?
"The flexible, multi-use Cray Urika-XA extreme analytics platform addresses perhaps the most critical obstacle in data analytics today — limitation. Analytics problems are getting more varied and complex but the available solution technologies have significant constraints. Traditional analytics appliances lock you into a single approach and building a custom solution in-house is so difficult and time consuming that the business value derived from analytics fails to materialize. In contrast, the Urika-XA platform is open, high performing and cost effective, serving a wide range of analytics tools with varying computing demands in a single environment. Pre-integrated with the Hadoop and Spark frameworks, the Urika-XA system combines the benefits of a turnkey analytics appliance with a flexible, open platform that you can modify for future analytics workloads. This single-platform consolidation of workloads reduces your analytics footprint and total cost of ownership."
Learn more: http://www.cray.com/products/analytics/urika-xa
Watch the video presentation: http://wp.me/p3RLEV-3yR
This article was published in the February edition of the Software Developer's Journal.
It describes the use of the MapReduce paradigm to design clustering algorithms and explains three such algorithms:
- K-Means Clustering
- Canopy Clustering
- MinHash Clustering
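Of the three, canopy clustering is the simplest to sketch: it builds cheap, possibly overlapping pre-clusters with two thresholds T1 > T2, which is why it parallelizes well under MapReduce. The function names, points, and thresholds below are illustrative:

```python
def canopy(points, t1, t2, dist):
    """Canopy clustering: cheap overlapping pre-clusters.

    Points within t2 of a canopy center leave the candidate pool;
    points within t1 join the canopy but may still seed or join others.
    """
    assert t1 > t2
    candidates = list(range(len(points)))
    canopies = []
    while candidates:
        center = candidates.pop(0)
        members = [center]
        remaining = []
        for i in candidates:
            d = dist(points[center], points[i])
            if d < t1:
                members.append(i)
            if d >= t2:
                remaining.append(i)   # still available as a future center
        candidates = remaining
        canopies.append(members)
    return canopies

pts = [(0, 0), (0.5, 0), (5, 5), (5.2, 5.1), (10, 0)]
euclid = lambda a, b: ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5
groups = canopy(pts, t1=3.0, t2=1.0, dist=euclid)
```

In a MapReduce setting, each mapper typically runs this on its shard with a cheap distance, and an expensive algorithm such as K-Means is then run only within canopies.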
An Efficient Clustering Method for Aggregation on Data Fragments (IJMER)
Clustering is an important step in data analysis, with applications in numerous fields. Clustering ensembles have emerged as a powerful technique for combining different clustering results into a single quality clustering. Existing clustering aggregation algorithms are applied directly to the data points and become inefficient when the number of points is large. This project defines an efficient approach to clustering aggregation based on data fragments, where a data fragment is any subset of the data. To increase efficiency, clustering aggregation is performed directly on data fragments under a comparison measure and normalized mutual information measures, and enhanced versions of three clustering aggregation algorithms (Agglomerative, Furthest, and Local Search) are described; the fragment-based approach lowers the computational complexity while increasing accuracy.
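A natural way to form fragments is to group points by their label vector across all input clusterings: points that every input partition assigns identically fall into the same fragment. A small sketch of that grouping (illustrative, not the paper's code):

```python
from collections import defaultdict

def fragments(partitions):
    """Group points by their label vector across all input clusterings.

    `partitions` is a list of labelings; each labeling assigns a cluster
    id to every point. Points sharing the same id in *every* labeling
    fall into the same fragment, so aggregation can operate on fragments
    instead of individual points.
    """
    n = len(partitions[0])
    groups = defaultdict(list)
    for i in range(n):
        key = tuple(p[i] for p in partitions)
        groups[key].append(i)
    return list(groups.values())

# Two input clusterings of six points; they disagree on point 2 only.
p1 = [0, 0, 0, 1, 1, 1]
p2 = [0, 0, 1, 1, 1, 1]
frags = fragments([p1, p2])
```

Here six points collapse into three fragments, so any aggregation algorithm downstream works on three units instead of six.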
A Novel Multi-Viewpoint based Similarity Measure for Document Clustering (IJMER)
The International Journal of Modern Engineering Research (IJMER) is a peer-reviewed online journal. It serves as an international archival forum of scholarly research related to engineering and science education.
The journal covers all fields of engineering and science, including Electrical Engineering, Mechanical Engineering, Civil Engineering, Chemical Engineering, Computer Engineering, Agricultural Engineering, Aerospace Engineering, Thermodynamics, Structural Engineering, Control Engineering, Robotics, Mechatronics, Fluid Mechanics, Nanotechnology, Simulators, Web-based Learning, Remote Laboratories, Engineering Design Methods, Education Research, Students' Satisfaction and Motivation, Global Projects, and Assessment, among many others.
IJERA (International Journal of Engineering Research and Applications) is an international online, ... peer-reviewed journal. For more details or to submit an article, please visit www.ijera.com
Data mining is used to manage the huge amounts of information stored in data warehouses and databases in order to discover the required knowledge. Many data mining techniques have been proposed, for example association rules, decision trees, neural networks, and clustering, and the field has been a focus of attention for many years. Among the available data mining strategies, clustering of the dataset is one of the most effective: it groups the dataset into a number of clusters based on certain predefined criteria and can be relied on to discover relationships between the different attributes of the data.
In the k-means clustering algorithm, the distance function is selected on the basis of its relevance for predicting the data, and the Euclidean distance between the centroid of a cluster and the data objects outside the cluster is computed when clustering the data points. In this work, the authors enhance the Euclidean distance formula to increase cluster quality.
The problem of accuracy and of redundant dissimilar points in the clusters remains in the improved k-means, so a new enhanced approach is proposed that uses a similarity function to check the similarity level of a point before including it in a cluster.
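The idea of gating assignment on a similarity check can be sketched as follows. The Gaussian similarity and the threshold are illustrative stand-ins, not the paper's exact formulation:

```python
import math

def assign_with_similarity_gate(point, centroids, threshold=0.5):
    """Assign `point` to the nearest centroid only if it is similar enough.

    Points whose best similarity falls below the threshold are flagged
    as outliers instead of being forced into a cluster.
    """
    def euclid(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    best = min(range(len(centroids)), key=lambda c: euclid(point, centroids[c]))
    similarity = math.exp(-euclid(point, centroids[best]))  # in (0, 1]
    return best if similarity >= threshold else None        # None = outlier

centroids = [(0.0, 0.0), (5.0, 5.0)]
near = assign_with_similarity_gate((0.2, 0.1), centroids)
far = assign_with_similarity_gate((2.5, 2.5), centroids)
```

This keeps dissimilar points from inflating a cluster, which is the redundancy problem the passage describes.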
International Journal of Engineering Research and Applications (IJERA) is an open access online peer reviewed international journal that publishes research and review articles in the fields of Computer Science, Neural Networks, Electrical Engineering, Software Engineering, Information Technology, Mechanical Engineering, Chemical Engineering, Plastic Engineering, Food Technology, Textile Engineering, Nano Technology & science, Power Electronics, Electronics & Communication Engineering, Computational mathematics, Image processing, Civil Engineering, Structural Engineering, Environmental Engineering, VLSI Testing & Low Power VLSI Design etc.
Influence of Mineralogy and Fabric on the Engineering Properties of the Miang... (IJMER)
Abstract: The major problem associated with the use of rock aggregates in engineering construction is the difficulty of predicting their probable field performance, mainly due to inadequate understanding of the decisive factors that control their engineering behavior. The mineralogy and fabric of the Miango granite porphyry were studied to assess their influence on engineering properties. The uniaxial compressive strength and aggregate crushing values show that the rocks are weak, while other tests such as aggregate impact strength, water absorption, absorption under saturation, soundness, and specific gravity give fairly good values. However, thin-section studies revealed three distinctive features which greatly influence the physico-mechanical properties: (a) abundant fractures of varying sizes, (b) sericitization of the orthoclase and/or plagioclase feldspars, and (c) intergrowth of quartz and/or orthoclase feldspars. The quartz grains show extensive cracking, which further reduces the mechanical strength. The strength loss of the granite porphyry could be attributed to the fractures in the quartz grains and the sericitization of the orthoclase and plagioclase feldspars. Geotechnical characterization shows that the rocks can be utilized as roadstone, or could be polished and used as facing stones, because they do not disintegrate under sulphate attack and because of the large feldspar phenocrysts in the rock.
Keywords: Aggregate Impact Value Tests; Aggregate Crushing Value; Water Absorption; Crushing Load
Improvement of Surface Roughness of Nickel Alloy Specimen by Removing Recast ... (IJMER)
In this investigation, experimental and computational work are combined to improve the surface roughness of a nickel alloy specimen machined by CNC wire electric discharge machining (WEDM). Brass wire is used as the tool electrode and nickel alloy (Inconel 600) as the workpiece material. Machining parameters such as pulse-on time (Ton), pulse-off time (Toff), peak current (Ip), and bed speed are taken as input parameters, while surface roughness and the recast layer are the output parameters. The experiments, with a pre-planned set of input parameters, are designed based on Taguchi's orthogonal array. Surface roughness is measured using a stylus-type roughness tester, and the thickness of the recast layer is measured using a scanning electron microscope (SEM). The experimental results are fed into the Minitab software, and the optimum input parameters for the desired outputs are identified. The software uses analysis of variance (ANOVA) to indicate the nature of the effect of the input parameters on the output parameters, with confirmation by validation experiments. Once the recast layer thickness is obtained, chemical etching and abrasive blasting are performed to remove the recast layer, and the surface roughness is measured again with the stylus-type roughness tester. The results show a significant improvement in the surface roughness of the nickel alloy material. In addition, the work is simulated by a computational method using regression analysis, and the corresponding results are obtained.
On ranges and null spaces of a special type of operator named λ-jection. – ... (IJMER)
In this article, the λ-jection is introduced as a generalization of the trijection operator introduced in P. Chandra's Ph.D. thesis, "Investigation into the theory of operators and linear spaces" (Patna University, 1977). We obtain relations between the ranges and null spaces of two given λ-jections under suitable conditions.
A Survey of User Authentication Schemes for Mobile Device (IJMER)
On some locally closed sets and spaces in Ideal Topological Spaces (IJMER)
In this paper we introduce and characterize some new generalized locally closed sets known as δ̂s-locally closed sets, and spaces known as δ̂s-normal spaces and δ̂s-connected spaces, and discuss some of their properties.
A Novel Approach To Answer Continuous Aggregation Queries Using Data Aggregat... (IJMER)
A SURVEY ON OPTIMIZATION APPROACHES TO TEXT DOCUMENT CLUSTERING (ijcsa)
Text document clustering is one of the fastest growing research areas because of the availability of a huge amount of information in electronic form. A number of techniques have been proposed for clustering documents so that documents within a cluster have high intra-similarity and low similarity to other clusters. Many document clustering algorithms provide a localized search for effectively navigating, summarizing, and organizing information, whereas a globally optimal solution can be obtained by applying high-speed, high-quality optimization algorithms that search the entire solution space. In this paper, a brief survey of optimization approaches to text document clustering is carried out.
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS (ijdkp)
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions, and heuristics used. A comprehensive classification scheme is needed that considers all such characteristics to divide subspace clustering approaches into families whose member algorithms satisfy common characteristics. Such a categorization will help future developers to better understand the quality criteria to be used and to choose similar algorithms against which to compare their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family), whose characteristics are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "axis-parallel, overlapping, density-based" SCAF.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN... (acijjournal)
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial
step in analysing structured data and in finding association relationships between items. This stands as an
elementary foundation for supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understanding the behaviour of structured data. Most
structured data in the scientific domain are voluminous, and processing such data requires state-of-the-art
computing machines. Setting up such an infrastructure is expensive, so a distributed environment
such as a clustered setup is employed for tackling such scenarios. Apache Hadoop is one of
the cluster frameworks for distributed environments; it helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on the map/reduce design and implementation of the
Apriori algorithm for structured data analysis.
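The map/reduce decomposition of a single Apriori candidate-counting pass can be sketched as follows. This is plain Python simulating two partitions (the paper uses Hadoop); the transactions and minimum support are invented for illustration:

```python
from itertools import combinations
from collections import Counter

def map_phase(partition, candidates):
    """Map: count candidate itemsets within one data partition."""
    counts = Counter()
    for transaction in partition:
        tset = set(transaction)
        for cand in candidates:
            if set(cand) <= tset:
                counts[cand] += 1
    return counts

def reduce_phase(partial_counts, min_support):
    """Reduce: merge the per-partition counts and keep frequent itemsets."""
    total = Counter()
    for counts in partial_counts:
        total.update(counts)
    return {items: n for items, n in total.items() if n >= min_support}

# One Apriori pass for 2-itemsets over two "nodes" (partitions).
transactions = [["milk", "bread"], ["milk", "eggs"],
                ["bread", "eggs"], ["milk", "bread", "eggs"]]
partitions = [transactions[:2], transactions[2:]]
items = sorted({i for t in transactions for i in t})
candidates = [tuple(c) for c in combinations(items, 2)]
partials = [map_phase(p, candidates) for p in partitions]
frequent = reduce_phase(partials, min_support=2)
print(frequent)
```

In a real Hadoop job, `map_phase` runs on each data node and the framework's shuffle stage groups the (itemset, count) pairs before `reduce_phase` aggregates them.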
A Novel Approach for Clustering Big Data based on MapReduce (IJECEIAES)
Clustering is one of the most important applications of data mining and has attracted the attention of researchers in statistics and machine learning. It is used in many applications such as information retrieval, image processing, and social network analytics, and it helps the user understand the similarity and dissimilarity between objects. Cluster analysis lets users understand complex and large data sets more clearly. Various researchers have analyzed different types of clustering algorithms. K-means is the most popular partitioning-based algorithm, as it provides good results through accurate calculation on numerical data; however, K-means works well for numerical data only, whereas big data is a combination of numerical and categorical data. The K-prototype algorithm deals with numerical as well as categorical data by combining the distances calculated from the numeric and categorical attributes. With the growth of data due to social networking websites, business transactions, scientific computation, and so on, there are vast collections of structured, semi-structured, and unstructured data, so K-prototype must be optimized to analyze these varieties of data efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype implemented on MapReduce gives better performance on multiple nodes than on a single node; CPU execution time and speedup are used as evaluation metrics. An intelligent splitter is also proposed, which splits mixed big data into numerical and categorical data. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
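The mixed-type dissimilarity that K-prototype combines can be sketched as follows: squared Euclidean distance on the numeric attributes plus a weighted mismatch count on the categorical ones. The record, prototype, attribute indices, and weight `gamma` below are hypothetical, not taken from the paper:

```python
def kprototype_distance(x, prototype, num_idx, cat_idx, gamma=1.0):
    """K-prototype dissimilarity: squared Euclidean on numeric attributes
    plus a gamma-weighted count of categorical mismatches."""
    numeric = sum((x[i] - prototype[i]) ** 2 for i in num_idx)
    categorical = sum(1 for i in cat_idx if x[i] != prototype[i])
    return numeric + gamma * categorical

# Illustrative mixed record: [age, income, area, subscribed]
record    = [35.0, 60000.0, "urban", "yes"]
prototype = [30.0, 58000.0, "urban", "no"]
d = kprototype_distance(record, prototype, num_idx=[0, 1], cat_idx=[2, 3], gamma=0.5)
print(d)  # → 4000025.5
```

The weight `gamma` balances the two scales; in practice numeric attributes are normalized first so the squared distances do not dominate the mismatch term as drastically as in this toy example.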
Data mining is an integrated field that combines technologies from databases, machine learning, statistics, pattern recognition, information retrieval, artificial neural networks, knowledge-based systems, artificial intelligence, and data visualization. In practical terms, data mining is the investigation of large data sets to find hidden relationships and to summarize the data in forms that are both valid and understandable to the data owner. Clustering is an unsupervised procedure that partitions data items into groups so that items in the same group are more similar to one another than to items in other groups, according to some measure of similarity. Cluster analysis is used to identify groups of similar objects and is one of the most widely used methods in practical data mining applications. It is a method of grouping objects, where an object can be physical, such as a student, or abstract, such as customer behaviour or handwriting. Many clustering algorithms have been proposed, falling into different clustering methods. The intention of this paper is to provide a classification of some prominent clustering algorithms.
A PSO-Based Subtractive Data Clustering Algorithm (IJORCS)
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature convergence to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization, Subtractive + (PSO) clustering algorithm that performs fast clustering. For comparison purposes, we applied the Subtractive + (PSO) clustering algorithm, PSO, and the Subtractive clustering algorithm to three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm generates the most compact clustering results compared to the other algorithms.
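The one-pass subtractive step used for initialization can be sketched as follows: every point is a candidate center, potentials are computed from pairwise distances, the highest-potential point becomes a center, and nearby potentials are suppressed before the next pick. The radii and stopping ratio are illustrative defaults, not values from the paper:

```python
import math

def subtractive_clustering(points, ra=1.0, rb=1.5, stop_ratio=0.15):
    """Chiu-style subtractive clustering: pick centers by peak potential,
    subtracting each chosen peak's influence from its neighbourhood."""
    alpha, beta = 4.0 / ra**2, 4.0 / rb**2

    def sqdist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    # Initial potential of each point: density of its neighbourhood.
    potential = [sum(math.exp(-alpha * sqdist(p, q)) for q in points)
                 for p in points]
    centers, first_peak = [], max(potential)
    while True:
        k = max(range(len(points)), key=lambda i: potential[i])
        if potential[k] < stop_ratio * first_peak:
            break
        centers.append(points[k])
        peak = potential[k]
        # Suppress potential near the new center.
        potential = [p - peak * math.exp(-beta * sqdist(points[i], points[k]))
                     for i, p in enumerate(potential)]
    return centers

data = [(0, 0), (0.2, 0.1), (0.1, 0.3), (5, 5), (5.1, 4.9), (4.8, 5.2)]
print(subtractive_clustering(data))  # two centers, one per blob
```

The returned centers (and their count) are exactly what the hybrid algorithm would feed to PSO or k-means as an initial partition.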
A survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. In data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis which aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
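The assign-to-nearest-mean / recompute-means loop shared by the K-means variants being compared can be sketched as follows (a minimal pure-Python version of Lloyd's algorithm; the data points and k are invented for illustration):

```python
import random

def kmeans(points, k, iters=100, seed=0):
    """Lloyd's k-means: assign each point to the nearest mean,
    then recompute each mean, until the means stabilise."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            j = min(range(k),
                    key=lambda c: sum((a - b) ** 2
                                      for a, b in zip(p, centers[c])))
            clusters[j].append(p)
        new_centers = [
            tuple(sum(vals) / len(vals) for vals in zip(*cl)) if cl else centers[j]
            for j, cl in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, clusters

data = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 8), (9, 9)]
centers, clusters = kmeans(data, k=2)
print(sorted(centers))
```

The enhanced variants surveyed differ mainly in how the initial `centers` are chosen and how distance computations are pruned; the core loop above stays the same.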
Introduction to Multi-Objective Clustering Ensemble (IJSRD)
Association rule mining is a popular and well-researched method for discovering interesting relations between variables in large databases. In this paper we introduce the concepts of data mining, association rules, and multilevel association rules with different algorithms and their advantages, along with the concepts of fuzzy logic and genetic algorithms. Multilevel association rules can be mined efficiently using concept hierarchies under a support-confidence framework.
A Study on Translucent Concrete Product and Its Properties by Using Optical F... (IJMER)
- Translucent concrete is a concrete-based material with light-transmitting properties,
obtained by embedding optical elements, such as optical fibers, in the concrete. Light is conducted
through the concrete from one end to the other, which results in a certain light pattern on the other
surface, depending on the fiber structure. Optical fibers transmit light so effectively that there is
virtually no loss of light conducted through the fibers. This paper deals with the modeling of such
translucent or transparent concrete blocks and panels, their usage, and the advantages they bring
to the field. The main purpose is to use sunlight as a light source to reduce the power consumption of
illumination, to use the optical fibers to sense the stress of structures, and to use this concrete for
architectural purposes in buildings.
Developing Cost Effective Automation for Cotton Seed Delinting (IJMER)
A low-cost automation system for removal of lint from cottonseed is designed and
developed. The setup consists of a stainless steel drum with a stirrer, in which cottonseed bearing lint is mixed
with concentrated sulphuric acid so that the lint is burned off. The lint-free cottonseed is then treated with lime
water to neutralize its acidity. After washing with water, the cottonseed can be used for agricultural purposes.
Study & Testing Of Bio-Composite Material Based On Munja Fibre (IJMER)
Natural fibres such as munja fibre have found increasing application in
many areas of engineering and technology. The aim of this study is to
evaluate mechanical properties, such as flexural and tensile properties, of reinforced epoxy composites.
Interest in these materials is mainly due to their practical benefits: they are lightweight and low cost compared to
synthetic fibre composites. Munja fibres have recently become a substitute material in many weight-critical
applications in areas such as aerospace, automotive, and other demanding industrial sectors. In
this study, natural munja fibre composites and munja/fibreglass hybrid composites were fabricated by a
combination of hand lay-up and cold-press methods. The present work introduces a new variety of munja fibre;
the main aim of the work is to extract the neat fibre and characterize its flexural behaviour.
The composites are fabricated by reinforcing untreated and treated fibre and are tested for their
mechanical properties strictly as per ASTM procedures.
Hybrid Engine (Stirling Engine + IC Engine + Electric Motor) (IJMER)
A hybrid engine is a combination of a Stirling engine, an IC engine, and an electric motor, all three
connected to a single shaft. The power source of the Stirling engine is a solar panel. The aim
is to run an automobile using this hybrid engine.
Fabrication & Characterization of Bio Composite Materials Based On Sunnhemp F... (IJMER)
Present-day technology demands eco-friendly developments. In this era,
composite materials are playing a vital role in different fields of engineering and
are being used as principal materials. While the importance of
composite applications is well known, the use of natural fibres for reinforcement has been given priority for some
time. However, changing from synthetic fibres to natural fibres provides only a half-green composite; a
fully green composite is achieved only if the matrix component is also eco-friendly. Keeping this in
view, a detailed literature survey has been carried out through various issues of the journals
related to this field. The material system used is sunnhemp fibre, with some epoxy and hardener
added for stability and drying of the bio-composites. Various graphs and bar charts are
superimposed on each other for comparison, with graphs plotted using MATLAB
and ORIGIN 6.0 software. Tensile strengths and various other properties of the different bio-composites
have been determined and compared among themselves, and the behaviour of the bio-composites of
this work has also been compared with other works. The bio-composites developed in this work are
likely to find applications in false ceilings, partitions, biodegradable packaging, automotive interiors,
sporting goods (e.g. rackets, nets, etc.), toys, etc.
Geochemistry and Genesis of Kammatturu Iron Ores of Devagiri Formation, Sandu... (IJMER)
The greenstone belts of Karnataka are enriched in BIFs in the Dharwar craton, where iron
formations are confined to the basin shelf, clearly separated from the deeper-water iron formation that
accumulated at the basin margin flanking the marine basin. Geochemical data procured in terms of
major, trace, and REE elements are plotted in various diagrams to interpret the genesis of the BIFs. Al2O3, Fe2O3 (T),
TiO2, CaO, and SiO2 abundances and ratios show wide variation. Ni, Co, Zr, Sc, V, Rb, Sr, U, Th,
ΣREE, La, Ce, and Eu anomalies and their binary relationships indicate that wherever the terrigenous
component has increased, the concentration of felsic elements such as Zr and Hf has gone up. Elevated
concentrations of Ni, Co, and Sc are contributed by chlorite and other components characteristic of basic
volcanic debris. The data suggest that these formations were generated by chemical and clastic
sedimentary processes on a shallow shelf: during transgression, chemical precipitation took place at the
sediment-water interface, whereas during regression, iron ore formed with sedimentary structures
and textures in the Kammatturu area, in a setting where the water column was oxygenated.
Experimental Investigation on Characteristic Study of the Carbon Steel C45 in... (IJMER)
In this paper, the mechanical characteristics of C45 medium carbon steel are investigated
under various working conditions. The main characteristic studied in this paper is the impact toughness
of the material with different configurations, and the experiments were carried out on Charpy impact testing
equipment. This study reveals the ability of the material to absorb energy up to failure for various
specimen configurations under different heat-treated conditions, and the corresponding results were
compared with the analysis outcome.
Non linear analysis of Robot Gun Support Structure using Equivalent Dynamic A... (IJMER)
Robot guns are increasingly employed in automotive manufacturing to replace
risky jobs and to increase productivity. Using a single robot for a single operation proves
expensive, so for cost optimization, multiple guns are mounted on a single robot and multiple
operations are performed. A robot gun structure is an efficient way for multiple welds to be done
simultaneously. However, mounting several weld guns on a single structure induces a variety of
dynamic loads, especially during movement of the robot arm as it maneuvers to reach the weld
locations. The primary idea employed in this paper is to model those dynamic loads as equivalent G-force
loads in FEA. This approach is conservative, saves time, and is
consequently cost efficient. The paper works towards creating a standard operating
procedure for the analysis of such structures, with emphasis on deploying various technical
aspects of FEA such as nonlinear geometry, multipoint constraint contact algorithms, and multizone
meshing.
Static Analysis of Go-Kart Chassis by Analytical and Solid Works Simulation (IJMER)
This paper covers the modelling, simulation, and static analysis of a go-kart
chassis consisting of circular beams. Modelling, simulation, and analysis are performed using the 3-D
modelling software SolidWorks and ANSYS, according to the rulebook provided by the Indian Society of
New Era Engineers (ISNEE) for the National Go Kart Championship (NGKC-14). The maximum deflection is
determined by performing static analysis. Computed results are then compared with analytical calculation,
where it is found that the location of maximum deflection agrees well with the theoretical approximation but
varies in magnitude.
In recent years various vehicles have been introduced in the market, but limits on
carbon emission and BS-series standards restrict the speeds of the vehicles available, and fuel
vehicles cause environmental pollution, so there is a need to decrease dependency on them.
The bicycle can be modified as an option for the future by implementing a new technique: changing the
pedal assembly and adding a variable-speed gearbox, such as a planetary gear set, to optimise the speed
of the vehicle through variable speed ratios, to increase the efficiency of the bicycle for a comfortable
ride, and to reduce the torque applied to the bicycle. We introduce an epicyclic gearbox in which
transmission is carried through a chain drive (i.e. sprocket) to the rear wheel, with the epicyclic
gearbox providing a number of different speeds during driving and reducing the torque required, together
with a change in the pedal mechanism.
Integration of Struts & Spring & Hibernate for Enterprise Applications (IJMER)
The proposal of this paper is to present the Spring Framework, which is widely used in
developing enterprise applications. In contrast to the current state where applications are developed using
the EJB model, the Spring Framework asserts that ordinary Java beans (POJOs) can be utilized with minimal
modifications. This modular framework can be used to develop applications faster and to reduce
complexity. This paper highlights the design overview of the Spring Framework along with the features that
have made the framework useful. The integration of multiple frameworks for an e-commerce system is
also addressed, and the paper proposes a structure for a website based on the integration of the
Spring, Hibernate, and Struts frameworks.
Microcontroller Based Automatic Sprinkler Irrigation System (IJMER)
The microcontroller-based automatic sprinkler system is a new concept that applies the
intelligence of embedded technology to sprinkler irrigation. The designed system replaces
the conventional manual work involved in sprinkler irrigation with an automatic process. Using this system, a
farmer is protected against adverse weather conditions, the tedious work of changing over
sprinkler water pipelines, and the risk of accident due to high pressure in the water pipeline. Overall,
sprinkler irrigation work is transformed into comfortable automatic work. This system provides
flexibility and accuracy with respect to the time set for the operation of the sprinkler water pipelines. In the
present work the author has designed and developed an automatic sprinkler irrigation system which is
controlled and monitored by a microcontroller interfaced with solenoid valves.
Intrusion Detection and Forensics based on decision tree and Association rule... (IJMER)
This paper presents an approach based on the combination of two techniques,
decision trees and association rule mining, for probe attack detection. This approach proves to be
better than the traditional approach of generating rules for a fuzzy expert system by clustering methods:
association rule mining is used for selecting the best attributes together, and decision trees for identifying the
best parameters together, to create the rules for the fuzzy expert system. Rules for the fuzzy expert
system are then generated using association rule mining and decision trees. A decision tree is generated for the
dataset and used to find the basic parameters for creating the membership functions of the fuzzy inference
system. Membership functions are generated for the probe attack. Based on these rules, we
create the fuzzy inference system that is used as an input to a neuro-fuzzy system. The fuzzy inference
system is loaded into the neuro-fuzzy toolbox as input, and the final ANFIS structure is generated as the
outcome of the neuro-fuzzy approach. The experiments and evaluations of the proposed method were
done with the NSL-KDD intrusion detection dataset. The experimental results show that the proposed approach,
based on the combination of decision trees and association rule mining,
efficiently detects probe attacks and gives better results for detecting intrusions
compared to other existing methods.
Natural Language Ambiguity and its Effect on Machine Learning (IJMER)
"Natural language processing" here refers to the use and ability of systems to process
sentences in a natural language such as English, rather than in a specialized artificial computer
language such as C++. The systems of real interest here are digital computers of the type we think of as
personal computers and mainframes. Of course humans can process natural languages, but for us the
question is whether digital computers can or ever will process natural languages. We have tried to
explore in depth and break down the types of ambiguities persistent throughout the natural languages
and provide an answer to the question “How it affects the machine translation process and thereby
machine learning as whole?” .
Today, in the era of the software industry, there is no perfect software framework available for
analysis and software development. Currently an enormous number of software development
processes exist which can be implemented to stabilize the process of developing a software system, but no
perfect system has yet been recognized that can help software developers opt for the best software
development process. This paper presents the framework of a skillful system combined with a Likert scale. With
the help of the Likert scale we define a rule-based model, delegate a mass score to every process, and
develop a tool named MuxSet which will help software developers select an appropriate
development process that may enhance the probability of system success.
Material Parameter and Effect of Thermal Load on Functionally Graded Cylinders (IJMER)
The present study investigates creep in thick-walled composite cylinders made
of an aluminum/aluminum alloy matrix reinforced with silicon carbide particles. The distribution
of SiCp is assumed to be either uniform or decreasing linearly from the inner to the outer radius of
the cylinder. The creep behavior of the cylinder is described by a threshold-stress-based creep
law with a stress exponent of 5. The composite cylinders are subjected to internal pressure, which is
applied gradually, and a steady-state condition of stress is assumed. The creep parameters required
in the creep law are extracted by conducting regression analysis on the available experimental
results. Mathematical models have been developed to describe steady-state creep in the composite
cylinder using the von Mises criterion. The basic equilibrium equation of the cylinder and the other constitutive
equations have been solved to obtain the creep stresses in the cylinder. The effect of varying particle size, particle
content, and temperature on the stresses in the composite cylinder has been analyzed. The study
revealed that the stress distributions in the cylinder do not vary significantly for various combinations
of particle size, particle content, and operating temperature, except for a slight variation observed for
varying particle content. Functionally graded materials (FGMs) emerged from this line of work and led to
the development of superior heat-resistant materials.
An energy audit is the systematic process for finding energy conservation
opportunities in industrial processes. This project carried out studies on various energy conservation
measures in areas such as lighting, motors, compressors, transformers, and ventilation systems.
In this investigation, we studied the technical aspects of the various measures along with their cost-benefit
analysis.
The investigation found that the major areas of energy conservation are:
1. Energy efficient lighting schemes.
2. Use of electronic ballast instead of copper ballast.
3. Use of wind ventilators for ventilation.
4. Use of VFD for compressor.
5. Transparent roofing sheets to reduce energy consumption.
An energy audit is thus a thorough, analytical way of achieving industrial energy conservation.
An Implementation of I2C Slave Interface using Verilog HDL (IJMER)
The focus of this paper is on the implementation of an Inter-Integrated Circuit (I2C)
slave module with no data loss. The principle and operation of the I2C bus protocol
are introduced. The module follows the I2C specification to provide device addressing, read/write operation, and
acknowledgement. The programmable nature of the device provides users with the flexibility of configuring
the I2C slave device to any legal slave address, to avoid slave address collisions on an I2C bus with
multiple slave devices. This paper demonstrates how an I2C master controller transmits data to and receives
data from the slave with proper synchronization.
The module is designed in Verilog and simulated in ModelSim. The design is also synthesized in Xilinx
XST 14.1. The module acts as a slave for a microprocessor and can be customized for no data loss.
Discrete Model of Two Predators competing for One Prey (IJMER)
This paper investigates the dynamical behavior of a discrete model of a one-prey, two-predator
system. The equilibrium points and their stability are analyzed. Time series plots are obtained
for different sets of parameter values, and bifurcation diagrams are plotted to show the dynamical behavior
of the system over a selected range of the growth parameter.
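The abstract does not state the model equations, so as a generic illustration only, a discrete-time one-prey (x), two-predator (y, z) map of Lotka-Volterra type with hypothetical parameter values might be iterated like this to produce the time series being plotted:

```python
def step(x, y, z, r=1.2, a1=1.0, a2=1.0, b1=1.0, b2=0.8, d1=0.4, d2=0.3):
    """One iteration of an illustrative discrete one-prey, two-predator map:
    logistic prey growth minus predation, predators grow with prey density."""
    xn = x + x * (r * (1 - x) - a1 * y - a2 * z)
    yn = y + y * (b1 * x - d1)
    zn = z + z * (b2 * x - d2)
    return xn, yn, zn

state = (0.5, 0.2, 0.2)       # initial prey and predator densities (made up)
trajectory = [state]
for _ in range(100):
    state = step(*state)
    trajectory.append(state)
print(trajectory[1])
```

Sweeping the growth parameter `r` over a range and recording the late-trajectory values of `x` is exactly how a bifurcation diagram for such a map is produced.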
Application of Parabolic Trough Collector for Reduction of Pressure Drop in Oi... (IJMER)
Pipelines are the least expensive and most effective method for oil transportation.
Due to the high viscosity of crude oil, the pressure drop and pumping power requirements are very high,
so it is necessary to bring down the viscosity of the crude oil. Heated pipelines are used to reduce the oil
viscosity by increasing the oil temperature; electrical heating and direct flame heating are the common
methods used for heating the oil pipeline. In this work, a new application of the parabolic trough collector
(PTC) in the field of oil pipeline transport is introduced for reducing pressure drop in oil pipelines. The oil
pipeline is heated by applying concentrated solar radiation to the pipe surface using a parabolic
trough collector in which the oil pipeline acts as the absorber pipe. A 3-D steady-state analysis is
carried out on a heated oil pipeline using the commercial CFD software package ANSYS Fluent 14.5,
investigating the effect of concentrated solar radiation on
pressure drop in the oil pipeline. The results from the numerical analysis show that the pressure drop
in the oil pipeline is reduced by heating the pipeline using concentrated solar radiation. From this
work, the application of PTC in oil pipeline transportation is justified.
Accelerate your Kubernetes clusters with Varnish Caching (Thijs Feryn)
A presentation about the usage and availability of Varnish on Kubernetes. This talk explores the capabilities of Varnish caching and shows how to use the Varnish Helm chart to deploy it to Kubernetes.
This presentation was delivered at K8SUG Singapore. See https://feryn.eu/presentations/accelerate-your-kubernetes-clusters-with-varnish-caching-k8sug-singapore-28-2024 for more details.
GDG Cloud Southlake #33: Boule & Rebala: Effective AppSec in SDLC using Deplo... (James Anderson)
Effective Application Security in Software Delivery lifecycle using Deployment Firewall and DBOM
The modern software delivery process (or the CI/CD process) includes many tools, distributed teams, open-source code, and cloud platforms. Constant focus on speed to release software to market, along with the traditional slow and manual security checks has caused gaps in continuous security as an important piece in the software supply chain. Today organizations feel more susceptible to external and internal cyber threats due to the vast attack surface in their applications supply chain and the lack of end-to-end governance and risk management.
The software team must secure its software delivery process to avoid vulnerability and security breaches. This needs to be achieved with existing tool chains and without extensive rework of the delivery processes. This talk will present strategies and techniques for providing visibility into the true risk of the existing vulnerabilities, preventing the introduction of security issues in the software, resolving vulnerabilities in production environments quickly, and capturing the deployment bill of materials (DBOM).
Speakers:
Bob Boule
Robert Boule is a technology enthusiast with a passion for technology and for making things work, along with a knack for helping others understand how things work. He has around 20 years of solution engineering experience in application security, software continuous delivery, and SaaS platforms. He is known for his dynamic presentations on CI/CD and application security integrated into the software delivery lifecycle.
Gopinath Rebala
Gopinath Rebala is the CTO of OpsMx, where he has overall responsibility for the machine learning and data processing architectures for Secure Software Delivery. Gopi also has a strong connection with our customers, leading design and architecture for strategic implementations. Gopi is a frequent speaker and well-known leader in continuous delivery and integrating security into software delivery.
Neuro-symbolic is not enough, we need neuro-*semantic* (Frank van Harmelen)
Neuro-symbolic (NeSy) AI is on the rise. However, simply machine learning on just any symbolic structure is not sufficient to really harvest the gains of NeSy. These will only be gained when the symbolic structures have an actual semantics. I give an operational definition of semantics as “predictable inference”.
All of this illustrated with link prediction over knowledge graphs, but the argument is general.
Connector Corner: Automate dynamic content and events by pushing a button (DianaGray10)
Here is something new! In our next Connector Corner webinar, we will demonstrate how you can use a single workflow to:
Create a campaign using Mailchimp with merge tags/fields
Send an interactive Slack channel message (using buttons)
Have the message received by managers and peers along with a test email for review
But there’s more:
In a second workflow supporting the same use case, you’ll see:
Your campaign sent to target colleagues for approval
If the “Approve” button is clicked, a Jira/Zendesk ticket is created for the marketing design team
But—if the “Reject” button is pushed, colleagues will be alerted via Slack message
Join us to learn more about this new, human-in-the-loop capability, brought to you by Integration Service connectors.
And...
Speakers:
Akshay Agnihotri, Product Manager
Charlie Greenberg, Host
Transcript: Selling digital books in 2024: Insights from industry leaders - T... (BookNet Canada)
The publishing industry has been selling digital audiobooks and ebooks for over a decade and has found its groove. What’s changed? What has stayed the same? Where do we go from here? Join a group of leading sales peers from across the industry for a conversation about the lessons learned since the popularization of digital books, best practices, digital book supply chain management, and more.
Link to video recording: https://bnctechforum.ca/sessions/selling-digital-books-in-2024-insights-from-industry-leaders/
Presented by BookNet Canada on May 28, 2024, with support from the Department of Canadian Heritage.
GraphRAG is All You need? LLM & Knowledge Graph (Guy Korland)
Guy Korland, CEO and Co-founder of FalkorDB, will review two articles on the integration of language models with knowledge graphs.
1. Unifying Large Language Models and Knowledge Graphs: A Roadmap.
https://arxiv.org/abs/2306.08302
2. Microsoft Research's GraphRAG paper and a review paper on various uses of knowledge graphs:
https://www.microsoft.com/en-us/research/blog/graphrag-unlocking-llm-discovery-on-narrative-private-data/
Epistemic Interaction - tuning interfaces to provide information for AI support (Alan Dix)
Paper presented at SYNERGY workshop at AVI 2024, Genoa, Italy. 3rd June 2024
https://alandix.com/academic/papers/synergy2024-epistemic/
As machine learning integrates deeper into human-computer interactions, the concept of epistemic interaction emerges, aiming to refine these interactions to enhance system adaptability. This approach encourages minor, intentional adjustments in user behaviour to enrich the data available for system learning. This paper introduces epistemic interaction within the context of human-system communication, illustrating how deliberate interaction design can improve system understanding and adaptation. Through concrete examples, we demonstrate the potential of epistemic interaction to significantly advance human-computer interaction by leveraging intuitive human communication strategies to inform system design and functionality, offering a novel pathway for enriching user-system engagements.
LF Energy Webinar: Electrical Grid Modelling and Simulation Through PowSyBl -...DanBrown980551
Do you want to learn how to model and simulate an electrical network from scratch in under an hour?
Then welcome to this PowSyBl workshop, hosted by Rte, the French Transmission System Operator (TSO)!
During the webinar, you will discover the PowSyBl ecosystem as well as handle and study an electrical network through an interactive Python notebook.
PowSyBl is an open source project hosted by LF Energy, which offers a comprehensive set of features for electrical grid modelling and simulation. Among other advanced features, PowSyBl provides:
- A fully editable and extendable library for grid component modelling;
- Visualization tools to display your network;
- Grid simulation tools, such as power flows, security analyses (with or without remedial actions) and sensitivity analyses;
The framework is mostly written in Java, with a Python binding so that Python developers can access PowSyBl functionalities as well.
What you will learn during the webinar:
- For beginners: discover PowSyBl's functionalities through a quick general presentation and the notebook, without needing any expert coding skills;
- For advanced developers: master the skills to efficiently apply PowSyBl functionalities to your real-world scenarios.
Encryption in Microsoft 365 - ExpertsLive Netherlands 2024Albert Hoitingh
In this session I delve into the encryption technology used in Microsoft 365 and Microsoft Purview. Including the concepts of Customer Key and Double Key Encryption.
Elevating Tactical DDD Patterns Through Object CalisthenicsDorra BARTAGUIZ
After immersing yourself in the blue book and its red counterpart, attending DDD-focused conferences, and applying tactical patterns, you're left with a crucial question: How do I ensure my design is effective? Tactical patterns within Domain-Driven Design (DDD) serve as guiding principles for creating clear and manageable domain models. However, achieving success with these patterns requires additional guidance. Interestingly, we've observed that a set of constraints initially designed for training purposes remarkably aligns with effective pattern implementation, offering a more ‘mechanical’ approach. Let's explore together how Object Calisthenics can elevate the design of your tactical DDD patterns, offering concrete help for those venturing into DDD for the first time!
Smart TV Buyer Insights Survey 2024 by 91mobiles.pdf91mobiles
91mobiles recently conducted a Smart TV Buyer Insights Survey in which we asked over 3,000 respondents about the TV they own, aspects they look at on a new TV, and their TV buying preferences.
Knowledge engineering: from people to machines and back
Ba2419551957
International Journal of Modern Engineering Research (IJMER)
www.ijmer.com Vol.2, Issue.4, July-Aug 2012 pp-1955-1957 ISSN: 2249-6645
Protection of Privacy in Distributed Databases using Clustering
Ganesh P., KamalRaj R., Dr. Karthik S.
Department of Computer Science and Engineering, SNS College of Technology
Abstract: Clustering is a technique that discovers partitions (groups) over huge amounts of data, based on similarities, regardless of their structure (multi-dimensional or two-dimensional). We applied an algorithm (DSOM) to cluster distributed datasets, based on self-organizing maps (SOM), and extend this approach by presenting a strategy for efficient cluster analysis in distributed databases using SOM and K-means. The proposed strategy applies the SOM algorithm separately to each distributed dataset (horizontal partitions of the database) to obtain a representative subset of each local dataset. These representative subsets are then sent to a central site, which performs a fusion of the partial results and applies the SOM and K-means algorithms to obtain a final result.

I. Introduction
In recent years, the volume of data in organizations has been increasing, due to factors such as automated data acquisition and reduced storage costs. For that reason, there has also been growing interest in computational algorithms that can extract relevant information from recorded data.

Data mining is the process of applying various methods and techniques to databases with the objective of extracting information hidden in large amounts of data. A frequently used method is cluster analysis, which can be defined as the process of partitioning data into a certain number of clusters (or groups) of similar objects, where each group consists of objects that are similar to each other (internal homogeneity) and different from the objects of the other groups (external heterogeneity); i.e., patterns in the same cluster should be similar to each other, while patterns in different clusters should not [1].

More formally, given a set of N input patterns X = {x1, ..., xN}, where each xj = (xj1, ..., xjp) is a p-dimensional vector and each measure xji represents an attribute (or variable) of the dataset, a clustering process attempts to find a K-partition of X, denoted by C = {C1, ..., CK} (K <= N). Artificial neural networks are an important computational tool, strongly inspired by neurobiology and widely used to solve complex problems that cannot be handled with traditional algorithmic solutions [13]. Applications of artificial neural networks include pattern recognition, signal analysis and processing, diagnosis and prognosis, and data classification and clustering.

Previous work presented a simple and efficient algorithm to cluster distributed datasets based on multiple parallel SOMs, denominated partSOM [5]. The algorithm is particularly interesting in situations where the data volume is very large or where data privacy and security policies forbid consolidating the data in a single location.

This work extends that approach by presenting a strategy for efficient cluster analysis in distributed databases using SOM and K-means. The strategy is to apply the SOM algorithm separately to each distributed dataset (horizontal partitions of the database) to obtain a representative subset of each local dataset. These representative subsets are then sent to a central site, which performs a fusion of the results and applies the SOM and K-means algorithms to obtain the final result.

The remainder of the article is organized as follows: section II presents a brief review of distributed data clustering algorithms and section III describes the main aspects of the SOM. Section IV presents the proposed methodology. Finally, section V presents conclusions and future research directions.

II. Bibliography Review
Cluster analysis algorithms group data based on the similarities between patterns. The complexity of the cluster analysis process increases with data cardinality (N, the number of objects in the database) and dimensionality (p, the number of attributes). Clustering methods range from largely heuristic to statistical. Several algorithms have been developed based on different strategies, including hierarchical clustering, vector quantization, graph theory, fuzzy logic, neural networks, and others. A recent survey of cluster analysis algorithms is presented in Xu and Wunsch [1].

Searching for clusters in high-dimensional databases is a non-trivial task. Some common algorithms, such as traditional agglomerative hierarchical methods, are unsuitable for large datasets. An increase in the number of attributes of each input not only affects the processing time of the algorithm negatively, but also hinders the identification of the clusters. An alternative approach is to divide the database into partitions and to cluster each one separately.

Some current applications have databases so large that it is not possible to keep them entirely in main memory, even on robust machines. Kantardzic [2] points out three approaches to solve this problem:

a) The data are stored in secondary memory and data subsets are clustered separately. A subsequent stage is needed to merge the results;
b) Use of an incremental clustering algorithm. Each element is individually brought into main memory and associated with one of the existing groups or allocated to a new group;
c) Use of a parallel implementation. Several algorithms work simultaneously on the stored data.

Two approaches are usually used to partition a dataset. The first, and more usual, is to divide the database horizontally, creating homogeneous subsets of the data; each algorithm operates on the same attributes. The other approach is to divide the database vertically, creating heterogeneous data subsets; in this case, each algorithm operates on the same records but handles a different set of attributes.
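As an illustrative sketch (not from the paper, and using made-up toy data), the two partitioning schemes can be expressed over a data matrix whose rows are records and whose columns are attributes:

```python
# Toy dataset: 6 records (rows), each with 4 attributes (columns).
data = [[r * 10 + c for c in range(4)] for r in range(6)]

# Horizontal partitioning: homogeneous subsets -- each site holds
# different records but all of the attributes.
site_a = data[:3]
site_b = data[3:]

# Vertical partitioning: heterogeneous subsets -- each site holds
# all of the records but a different subset of the attributes.
site_c = [row[:2] for row in data]   # attributes 0-1 for every record
site_d = [row[2:] for row in data]   # attributes 2-3 for every record

print(len(site_a), len(site_a[0]))   # 3 records x 4 attributes
print(len(site_c), len(site_c[0]))   # 6 records x 2 attributes
```

The paper's proposed strategy works on horizontal partitions, so each site runs the same local algorithm over the full attribute set of its own records.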
Some recent works on distributed data clustering include Forman and Zhang [3], who describe a technique to parallelize several algorithms in order to obtain greater efficiency in the data mining process over multiple distributed databases. The authors reinforce the need to reduce communication.

Several organizations maintain geographically distributed databases as a way of increasing the safety of their information: if security policies fail, an invader gains access to only a part of the existing information. Vaidya and Clifton [18] approach vertically partitioned databases using a distributed K-means algorithm. Jagannathan et al. [11] present a variant of the K-means algorithm for clustering horizontally partitioned databases. Oliveira and Zaïane [9] proposed a spatial data transformation method, called RBT, for protecting attribute values when sharing data for clustering, which is independent of any clustering algorithm.

In databases with a large number of attributes, another approach sometimes used is to carry out the analysis considering only a subset of the attributes, instead of all of them. An obvious difficulty of this approach is identifying which attributes are most important in the cluster identification process. Papers related to this approach have frequently used statistical methods such as Principal Component Analysis (PCA) and Factor Analysis to treat this problem. Kargupta et al. [10] presented a PCA-based technique denominated Collective Principal Component Analysis (CPCA) for cluster analysis of high-dimensional heterogeneous distributed databases. The authors were concerned with reducing data transfer rates among distributed sites.

Other works consider partitioning the attributes into subsets while still considering each of them in the data mining process. This is of particular interest for preserving all the characteristics present in the initial dataset. He et al. [12] analyzed the influence of data types on the clustering process and presented a strategy that divides the attributes into two subsets, one with the numerical attributes and the other with the categorical ones. They then propose clustering each subset separately, using algorithms appropriate to each type. The cluster results were combined into a new database, which was again submitted to a clustering algorithm for categorical data.

III. Self-Organizing Map
The self-organizing feature map (SOM) has been widely used as a tool for visualization of high-dimensional data. Important features include information compression while preserving the topological and metric relationships of the primary data items [14]. A SOM is composed of two layers of neurons, the input and output layers. A neighbouring relation among the neurons defines the topology of the map. Training is similar to neural competitive learning, but the best-matching unit (c, or BMU) is updated along with its neighbours. Each input is mapped to its BMU, the unit whose weight vector is most similar to the presented data.

A number of methods for visualizing data relations in a trained SOM have been proposed [17], such as multiple views of component planes, mesh visualization using projections, and 2D and 3D surface plots of distance matrices. The U-matrix method [17] enables visualization of the topological relations of the neurons in an organized SOM. A gradient image (2D) or a surface plot is generated by computing distances between adjacent neurons. High values in the U-matrix encode dissimilarities between neurons and correspond to cluster borders. Strategies for cluster detection using the U-matrix were proposed by Costa and Netto [16]. Their algorithms were developed for automatic partitioning and labeling of a trained SOM network. The result is a segmented SOM output with regions of neurons describing the data clusters.

IV. Proposed Methodology
Distributed clustering algorithms usually work in two stages. Initially, the data are analyzed locally, in each unit that is part of the distributed database. In a second stage, a central instance gathers the partial results and combines them into an overall result.

This section presents a strategy for clustering similar objects located in distributed databases, using parallel self-organizing maps and the K-means algorithm. The process is divided into three stages:

a) The traditional SOM algorithm is applied locally in each of the distributed bases, in order to elect a representative subset of the input data;
b) The traditional SOM is applied again, this time to the representatives of each of the distributed bases, which are unified in a central unit;
c) The K-means algorithm is applied over the trained self-organizing map, to create the definitive result.

The proposed algorithm consists of six steps. Step 1 applies local clustering to each local dataset (horizontal partitions of the database) using the traditional SOM. The algorithm is thus applied in each of the remote units, obtaining a reference vector from each data subset. This reference vector, known as the codebook, is the trained self-organizing map.

In step 2, the input data are projected onto the map trained in the previous stage, in each local unit. Each input is presented to the trained map and the index of the closest vector (the BMU) in the codebook is stored in an index vector. Thus, a data index is created based on representative objects instead of the original objects. Despite differing from the original dataset, the representative objects in the index vector are very similar to the original data, since maintenance of the data topology is an important characteristic of the SOM.

In step 3, each remote unit sends its index vector and reference vector to the central unit, which is responsible for unifying all the partial results. An additional advantage of the proposed algorithm is that the amount of transferred data is considerably reduced, since index vectors have only one column (containing an integer value) and the codebook is usually much smaller than the original data. Reduced data transfer and communication overhead are thus built into the proposed algorithm.

Step 4 is responsible for receiving the index vector and the codebook from each local unit and combining the partial results to reassemble a database from the received data. To reassemble each dataset, the indexes in the index vector are substituted by the equivalent values in the codebook. The datasets are combined by juxtaposing the partial datasets; however, it is important to ensure that the objects are in the same order as in the original datasets. Note that the new database is slightly different from the original data, but the data topology is maintained.
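A minimal sketch of the local stage and the central remount (steps 1-4) is given below. The function names, the toy 1-D SOM, and the data are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def train_som(data, n_units=2, epochs=50, lr=0.5, seed=0):
    """Step 1 (toy 1-D SOM): train and return the codebook (reference vectors)."""
    rng = np.random.default_rng(seed)
    codebook = data[rng.choice(len(data), n_units, replace=False)].astype(float)
    for t in range(epochs):
        radius = max(1.0, n_units / 2 * (1 - t / epochs))  # shrinking neighbourhood
        for x in data:
            # Best-matching unit: codebook vector closest to the input.
            bmu = int(np.argmin(np.linalg.norm(codebook - x, axis=1)))
            for j in range(n_units):  # update the BMU and its neighbours
                h = np.exp(-((j - bmu) ** 2) / (2 * radius ** 2))
                codebook[j] += lr * (1 - t / epochs) * h * (x - codebook[j])
    return codebook

def project(data, codebook):
    """Step 2: build the index vector of BMU indexes, one per input."""
    return np.array([np.argmin(np.linalg.norm(codebook - x, axis=1)) for x in data])

# Two horizontal partitions of a toy dataset (steps 1-2 run at each site).
site1 = np.array([[0.0, 0.0], [0.1, 0.1], [0.9, 1.0]])
site2 = np.array([[1.0, 0.9], [0.0, 0.2], [1.1, 1.0]])
cb1, cb2 = train_som(site1), train_som(site2)
idx1, idx2 = project(site1, cb1), project(site2, cb2)

# Steps 3-4: the central site receives each (index vector, codebook) pair
# and reassembles an approximate database by substituting each index with
# its codebook vector, juxtaposing the partial datasets.
remounted = np.vstack([cb1[idx1], cb2[idx2]])
print(remounted.shape)  # (6, 2)
```

Only `idx1`, `idx2` (one integer per record) and the small codebooks cross the network, which is the source of the data-transfer savings described above; the remounted database then feeds the global SOM and K-means stages.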
In step 5, the SOM algorithm is applied again over the complete database obtained in step 4. The expectation is that the results obtained in this stage can be generalized as being equivalent to clustering the entire database. The data obtained in step 4, which serve as input to step 5, correspond to values close to the original ones, because the corresponding vectors in the codebook are representatives of the input dataset.

In step 6, the K-means algorithm is applied over the final trained map, in order to improve the quality of the visualization results.

V. Conclusion
The self-organizing map, a neural-network concept with an unsupervised learning strategy, has been widely used in clustering applications. However, the SOM approach is normally applied to single, local datasets. Previous research introduced partSOM, an efficient SOM-based strategy for performing distributed data clustering on geographically distributed databases.

However, the SOM and partSOM approaches have some limitations in presenting results. In this work we join the partSOM strategy with an alternative approach for cluster detection using the K-means algorithm.

References
[1] R. Xu and D. Wunsch II, “Survey of Clustering Algorithms”, IEEE Trans. on Neural Networks, Vol. 16, Iss. 3, May 2005, pp. 645-678.
[2] M. Kantardzic, Data Mining: Concepts, Models, Methods, and Algorithms, IEEE Press, 2003.
[3] G. Forman and B. Zhang, “Distributed data clustering can be efficient and exact”, SIGKDD Explor. Newsl. 2, Dec. 2000, pp. 34-38.
[4] Chak-Man Lam, Xiao-Feng Zhang and W. K. Cheung, “Mining Local Data Sources for Learning Global Cluster Models”, In: Proceedings of the IEEE/WIC/ACM International Conference on Web Intelligence (WI 2004), Sept. 2004, pp. 748-751.
[5] F. L. Gorgônio and J. A. F. Costa, “Parallel Self-Organizing Maps with Applications in Clustering Distributed Data”, International Joint Conference on Neural Networks (IJCNN 2008), (Accepted).
[6] A. Asuncion and D. J. Newman, UCI Machine Learning Repository, available at http://www.ics.uci.edu/~mlearn/MLRepository.html, Irvine, CA, 2007.
[7] J. C. Silva and M. Klusch, “Inference in distributed data clustering”, Engineering Applications of Artificial Intelligence, Vol. 19, 2006, pp. 363-369.
[8] P. Berkhin, “A Survey of Clustering Data Mining Techniques”, In: J. Kogan et al. (Eds.), Grouping Multidimensional Data: Recent Advances in Clustering, New York: Springer-Verlag, 2006.
[9] S. R. M. Oliveira and O. R. Zaïane, “Privacy Preservation When Sharing Data For Clustering”, In: Proceedings of the International Workshop on Secure Data Management in a Connected World, 2004.
[10] H. Kargupta, W. Huang, K. Sivakumar and E. Johnson, “Distributed Clustering Using Collective Principal Component Analysis”, Knowledge and Information Systems Journal, Vol. 3, Num. 4, May 2001, pp. 422-448.
[11] G. Jagannathan, K. Pillaipakkamnatt and R. N. Wright, “A New Privacy-Preserving Distributed k-Clustering Algorithm”, In: Proceedings of the 2006 SIAM International Conference on Data Mining (SDM), 2006.
[12] Z. He, X. Xu and S. Deng, “Clustering Mixed Numeric and Categorical Data: A Cluster Ensemble Approach”, Technical Report, http://aps.arxiv.org/ftp/cs/papers/0509/0509011.pdf, 2005.
[13] S. Haykin, Neural Networks: A Comprehensive Foundation, 2nd ed., New York: Macmillan College Publishing Company, 1999.
[14] T. Kohonen, Self-Organizing Maps, 3rd ed., New York: Springer-Verlag, 2001.
[15] J. Vesanto, “Using SOM in Data Mining”, Licentiate's Thesis, Department of Computer Science and Engineering, Helsinki University of Technology, Espoo, Finland, 2000.
[16] J. A. F. Costa and M. L. de Andrade Netto, “Clustering of complex shaped data sets via Kohonen maps and mathematical morphology”, In: Proceedings of the SPIE, Data Mining and Knowledge Discovery, B. Dasarathy (Ed.), Vol. 4384, 2001, pp. 16-27.
[17] A. Ultsch, “Self-Organizing Neural Networks for Visualization and Classification”, In: O. Opitz et al. (Eds.), Information and Classification, Springer Berlin, 1993, pp. 301-306.
[18] J. Vaidya and C. Clifton, “Privacy-preserving k-means clustering over vertically partitioned data”, In: Proc. of the Ninth ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, New York, 2003, pp. 206-215.