The quality of clustering is an important issue in the application of clustering techniques. Most traditional cluster validity indices are geometry-based measures of cluster quality. This work proposes a cluster validity index based on the decision-theoretic rough set model, considering various loss functions. Experiments on real retail data show the usefulness of the proposed validity index for evaluating both rough and crisp clustering. The measure is shown to help determine the optimal number of clusters, as well as an important rough-clustering parameter called the threshold. Experiments with a promotional campaign on the retail data illustrate the ability of the proposed measure to incorporate financial considerations when evaluating the quality of a clustering scheme. This ability to deal with monetary values distinguishes the proposed decision-theoretic measure from other, distance-based measures. The proposed validity index can also be used to evaluate other clustering algorithms, such as fuzzy clustering.
Clustering is the step-by-step process of grouping objects whose attributes are nearly similar. A cluster is thus a collection of objects with nearly the same attribute values: an object in a cluster is similar to the other objects in the same cluster and different from the objects of other clusters. Clustering is used in a wide range of applications such as pattern recognition, image processing, data analysis and machine learning. Nowadays, more attention is being paid to categorical data than to numerical data, where the range of a numerical attribute is organized into classes such as small, medium and high. A wide range of algorithms exists for clustering categorical data. Our approach enhances the well-known k-modes clustering algorithm to improve its accuracy. We propose a new approach named "High Accuracy Clustering Algorithm for Categorical Datasets".
A brief description of clustering, two relevant clustering algorithms (K-means and Fuzzy C-means), cluster validation, and two internal validity indices (Dunn and Davies-Bouldin).
A Survey on Efficient Enhanced K-Means Clustering Algorithm (ijsrd.com)
Data mining is the process of using technology to identify patterns and prospects from large amounts of information. Within data mining, clustering is an important research topic with a wide range of unsupervised classification applications. Clustering is a technique that divides data into meaningful groups. K-means clustering is a method of cluster analysis that aims to partition n observations into k clusters in which each observation belongs to the cluster with the nearest mean. In this paper, we present a comparison of different K-means clustering algorithms.
Cluster analysis is a major tool in a number of applications in many fields such as business and engineering (Theodoridis and Koutroubas, 1999):
Data reduction.
Hypothesis generation.
Hypothesis testing.
Prediction based on groups.
K-MEDOIDS CLUSTERING USING PARTITIONING AROUND MEDOIDS FOR PERFORMING FACE R... (ijscmc)
Face recognition is one of the most unobtrusive biometric techniques and can be used for access control as well as surveillance purposes. Various methods for implementing face recognition have been proposed, with varying degrees of performance in different scenarios. The most common issue with effective facial biometric systems is high susceptibility to variations in the face owing to factors like changes in pose, varying illumination, different expressions, and the presence of outliers and noise. This paper explores a novel technique for face recognition that classifies face images using an unsupervised learning approach, K-Medoids clustering. The Partitioning Around Medoids (PAM) algorithm has been used to perform the K-Medoids clustering of the data. The results suggest increased robustness to noise and outliers in comparison with other clustering methods. The technique can therefore be used to increase the overall robustness of a face recognition system, increase its invariance and make it a reliably usable biometric modality.
Improved Performance of Unsupervised Method by Renovated K-Means (IJASCSE)
Clustering is a separation of data into groups of similar objects. Every group, called a cluster, consists of objects that are similar to one another and dissimilar to objects of other groups. In this paper, the K-Means algorithm is implemented with three distance functions in order to identify the optimal distance function for clustering. The proposed K-Means algorithm is compared with K-Means, Static Weighted K-Means (SWK-Means) and Dynamic Weighted K-Means (DWK-Means) using the Davies-Bouldin index, execution time and iteration count. Experimental results show that the proposed K-Means algorithm performs better on the Iris and Wine datasets when compared with the other three clustering methods.
k-Means is a rather simple but well-known algorithm for grouping objects, i.e., clustering. Again, all objects need to be represented as a set of numerical features. In addition, the user has to specify the number of groups (referred to as k) they wish to identify. Each object can be thought of as a feature vector in an n-dimensional space, n being the number of features used to describe the objects to cluster. The algorithm then randomly chooses k points in that vector space, which serve as the initial centers of the clusters. Afterwards, each object is assigned to the center it is closest to; usually the distance measure is chosen by the user and determined by the learning task. Then, for each cluster, a new center is computed by averaging the feature vectors of all objects assigned to it. The process of assigning objects and recomputing centers is repeated until it converges, and the algorithm can be proven to converge after a finite number of iterations. Several tweaks concerning the distance measure, the initial choice of centers and the computation of new average centers have been explored, as well as the estimation of the number of clusters k, yet the main principle always remains the same. In this project we discuss the K-means clustering algorithm, its implementation and its application to the problem of unsupervised learning.
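The loop just described (random initial centers, assign each object to the nearest center, recompute centers as means, repeat until stable) can be sketched in a few lines. This is a minimal illustration under the assumption of Euclidean distance, not an optimized implementation:

```python
import math
import random

def kmeans(points, k, iters=100, seed=0):
    """Minimal k-means: random initial centers, assign each point to the
    nearest center, recompute centers as cluster means, repeat."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)          # k random initial centers
    clusters = [[] for _ in range(k)]
    for _ in range(iters):
        # Assignment step: each object goes to the center it is closest to.
        clusters = [[] for _ in range(k)]
        for p in points:
            i = min(range(k), key=lambda c: math.dist(p, centers[c]))
            clusters[i].append(p)
        # Update step: each center becomes the mean of its assigned objects.
        new_centers = [
            tuple(sum(xs) / len(cl) for xs in zip(*cl)) if cl else centers[i]
            for i, cl in enumerate(clusters)
        ]
        if new_centers == centers:           # converged
            break
        centers = new_centers
    return centers, clusters
```

For example, `kmeans([(0.0, 0.0), (0.1, 0.2), (5.0, 5.0), (5.1, 4.9)], 2)` separates the two visually obvious groups.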
Optimising Data Using K-Means Clustering Algorithm (IJERA Editor)
K-means is one of the simplest unsupervised learning algorithms that solves the well-known clustering problem. The procedure follows a simple and easy way to classify a given data set through a certain number of clusters (say, k clusters) fixed a priori. The main idea is to define k centroids, one for each cluster. These centroids should be placed carefully, because different locations cause different results; the better choice is to place them as far away from each other as possible.
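The heuristic of placing initial centroids as far from each other as possible can be realized with a farthest-first traversal. This is one common way to sketch the idea, not necessarily the exact procedure the abstract's authors intend:

```python
import math
import random

def farthest_first_centers(points, k, seed=0):
    """Spread k initial centroids far apart: start from one random point,
    then repeatedly add the point farthest from all centers chosen so far."""
    rng = random.Random(seed)
    centers = [rng.choice(points)]
    while len(centers) < k:
        def dist_to_nearest(p):
            # Distance from p to its closest already-chosen center.
            return min(math.dist(p, c) for c in centers)
        centers.append(max(points, key=dist_to_nearest))
    return centers
```

On two well-separated groups of points, the chosen centers land in different groups rather than next to each other.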
Unsupervised Learning Algorithms and Assumptions (refedey275)
Topics :
Introduction to unsupervised learning
Unsupervised learning Algorithms and Assumptions
K-Means algorithm - introduction
Implementation of K-means algorithm
Hierarchical Clustering - need and importance of hierarchical clustering
Agglomerative Hierarchical Clustering
Working of dendrogram
Steps for implementation of AHC using Python
Gaussian Mixture Models - introduction, importance and need of the model
Normal (Gaussian) distribution
Implementation of Gaussian mixture model
Understand the different distance metrics used in clustering
Euclidean, Manhattan, Cosine, Mahalanobis
Features of a Cluster - labels, centroids, inertia, eigenvectors and eigenvalues
Principal component analysis
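Of the distance metrics listed in the topics above, Euclidean, Manhattan and cosine have simple closed forms; a minimal sketch follows (Mahalanobis is omitted because it additionally requires an estimated inverse covariance matrix):

```python
import math

def euclidean(a, b):
    """Straight-line distance between two vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """Sum of absolute coordinate differences (city-block distance)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def cosine_distance(a, b):
    """1 minus the cosine of the angle between the vectors:
    0 when they point the same way, 1 when they are orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)
```

Note that cosine distance ignores vector magnitude, which is why it behaves differently from the two metric distances in clustering.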
Supervised learning (classification)
Supervision: The training data (observations, measurements, etc.) are accompanied by labels indicating the class of the observations
New data is classified based on the training set
Unsupervised learning (clustering)
The class labels of the training data are unknown
Given a set of measurements, observations, etc., the aim is to establish the existence of classes or clusters in the data
Types of Hierarchical Clustering
There are mainly two types of hierarchical clustering:
Agglomerative hierarchical clustering
Divisive Hierarchical clustering
A distribution in statistics is a function that shows the possible values for a variable and how often they occur. In probability theory and statistics, the Normal Distribution, also called the Gaussian Distribution, is the most significant continuous probability distribution. It is sometimes also called a bell curve.
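The bell curve just described is the Gaussian density, which can be evaluated directly; a minimal sketch:

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal (Gaussian) distribution, i.e. the bell curve."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))
```

The density peaks at the mean `mu` and is symmetric around it, which is exactly the bell shape.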
Fuzzy clustering and fuzzy c-means partition cluster analysis and validation ... (IJECEIAES)
A hard partition clustering algorithm assigns equally distant points to exactly one of the clusters, even though each such datum could plausibly be assigned to several clusters simultaneously. Fuzzy cluster analysis instead assigns membership coefficients to data points that are equidistant between two clusters, so that a data point can belong to more than one cluster at the same time. For a subset of the CiteScore dataset, the fuzzy clustering (fanny) and fuzzy c-means (fcm) algorithms were implemented to study data points that lie equally distant from each other. Before analysis, the clusterability of the dataset was evaluated with the Hopkins statistic, which resulted in 0.4371, a value < 0.5, indicating that the data is highly clusterable. The optimal number of clusters was determined using the NbClust package, where 9 different indices proposed 3-cluster solutions as the best clustering. Further, an appropriate value of the fuzziness parameter m was evaluated to determine the distribution of membership values as m varies from 1 to 2. The coefficient of variation (CV), also known as relative variability, was evaluated to study the spread of the data. The time complexity of the fuzzy clustering (fanny) and fuzzy c-means algorithms was evaluated by keeping the number of data points constant and varying the number of clusters.
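For a point equidistant from two centers, the membership coefficients described above split evenly. A minimal sketch of the standard fuzzy c-means membership formula for a single point; this illustrates the idea only and is not the fanny/fcm implementations used in the paper:

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster:
    u_i = 1 / sum_k (d_i / d_k) ** (2 / (m - 1)),
    where d_i is the distance from the point to center i."""
    dists = [math.dist(point, c) for c in centers]
    # A point sitting exactly on a center belongs fully to that cluster.
    if any(d == 0.0 for d in dists):
        return [1.0 if d == 0.0 else 0.0 for d in dists]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((di / dk) ** exponent for dk in dists) for di in dists]
```

A point exactly halfway between two centers gets membership 0.5 in each, which is the equidistance case the abstract discusses; the memberships always sum to 1.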
K-means clustering uses an iterative procedure that is very sensitive to, and dependent upon, the initial centroids. The initial centroids in k-means clustering are chosen randomly, and hence the clustering also changes with respect to the initial centroids. This paper tries to overcome the problem of random selection of centroids, and hence of clusterings that change from run to run, with a premeditated selection of initial centroids. We have used the iris, abalone and wine data sets to demonstrate that the proposed method of finding the initial centroids and using them in the k-means algorithm improves clustering performance. The clustering also remains the same in every run, as the initial centroids are not selected randomly but through the premeditated method.
Comparison of Methods for Combination of Multiple Classifiers that Predict B... (IJERA Editor)
Predictive analysis includes techniques from data mining that analyze current and historical data and make predictions about the future. Predictive analytics is used in actuarial science, financial services, retail, travel, healthcare, insurance, pharmaceuticals, marketing, telecommunications and other fields. Predicting patterns can be considered a classification problem, and combining different classifiers gives better results. We study and compare three methods used to combine multiple classifiers. Bayesian networks perform classification based on conditional probability; a naïve Bayesian network is efficient and easy to interpret, as it assumes that the predictors are independent. Tree augmented naïve Bayes (TAN) constructs a maximum weighted spanning tree that maximizes the likelihood of the training data to perform classification; this tree structure eliminates the independence assumption of naïve Bayesian networks. The behavior-knowledge space method works in two phases and can provide very good performance if large and representative data sets are available.
Help the Genetic Algorithm to Minimize the Urban Traffic on Intersections (IJORCS)
Controlling traffic lights at intersections is one of the main issues in optimal traffic management. Intersections are used to regulate the flow of vehicles and to eliminate conflicting traffic flows. Modeling and simulation of traffic are widely used in industry; in fact, an industrial system is modeled and simulated before it is built, when doing so is economical and affordable. The aim of this article is to control traffic in a smart way. The first stage of the project collects statistical data (the cycle time of each traffic light at the intersection and the number of vehicles waiting at a red light); the next stage finds optimal values from the collected data. Optimization of the parameters is performed by a genetic algorithm: the GA begins with a coding step using binary variables (whose ranges are obtained from the initial data set), starts from an initial population, produces new generations with the mutation and crossover operators, and finally selects the members with the best fitness values as the solution set. The optimal output has been modeled and implemented with Petri nets in the CPN Tools software. The results indicate that the project improves the performance of intersection traffic control systems, and that evolutionary methods such as genetic algorithms, applied to data collected at other intersections, can reduce the waiting time behind red lights and determine an appropriate cycle.
License plate recognition is one of the core technologies in intelligent traffic control. In this paper, a new and tunable algorithm that can detect multiple license plates in high-resolution applications is proposed. The algorithm identifies the novel Iranian plates and some European plates, characterized both by the inclusion of a blue area and by their geometric shape. The suggested algorithm achieves suitable speed because it avoids heavy pre-processing operations such as image-improvement filters, edge detection and noise removal at the early stages. Our method is adapted to the plate model, namely the blue section of the plate, and can successfully detect several plates when more than one appears in the image. We evaluated our method on two Persian single-vehicle license plate data sets, obtaining correct recognition rates of 99.33% and 99% respectively. Further, we tested our algorithm on a Persian multiple-vehicle license plate data set and achieved a 98% accuracy rate. We also obtained approximately 99% accuracy in the character recognition stage.
FPGA Implementation of FIR Filter using Various Algorithms: A Retrospective (IJORCS)
This paper is a review of FPGA implementations of finite impulse response (FIR) filters with low cost and high performance. The key contribution is an elaborate analysis of hardware implementations of FIR filters using different algorithms, i.e., Distributed Arithmetic (DA), DA with Offset Binary Coding (DA-OBC), Common Sub-expression Elimination (CSE) and sum-of-powers-of-two (SOPOT), using fewer resources without affecting the performance of the original FIR filter.
Using Virtualization Technique to Increase Security and Reduce Energy Consump... (IJORCS)
This paper presents an approach to generating a secure environment on an internet-based virtual computing platform and to reducing energy consumption in green cloud computing. The proposed approach constantly checks the accuracy of stored data by means of a central control service inside the network environment, and checks system security by isolating individual virtual machines within a common virtual environment. The approach has been simulated on two types of Virtual Machine Manager (VMM), Quick EMUlator (QEMU) and Xen HVM (Hardware Virtual Machine), and the simulation outputs in VMInsight show that when a service is used singly, its performance overhead increases. As a secure system, the proposed approach is able to recognize malicious behaviors and to assure service security by means of operational integrity measurement. Moreover, system efficiency has been evaluated in terms of energy consumption on five applications (defragmentation, compression, Linux boot, decompression and kernel boot). It follows that to secure a multi-tenant environment, managers and supervisors would otherwise have to install a separate security monitoring system for each virtual machine (VM), which imposes a heavy management workload; the proposed approach can instead monitor all VMs with just one virtual machine acting as a supervisor.
Algebraic Fault Attack on the SHA-256 Compression Function (IJORCS)
The cryptographic hash function SHA-256 is one member of the SHA-2 hash family, which was proposed in 2000 and standardized by NIST in 2002 as a successor of SHA-1. Although a differential fault attack on the SHA-1 compression function has been proposed, it seems hard to adapt directly to SHA-256. In this paper, an efficient algebraic fault attack on the SHA-256 compression function is proposed under the word-oriented random fault model. During the attack, the automatic tool STP is exploited, which constructs binary expressions for the word-based operations in the SHA-256 compression function and then invokes a SAT solver to solve the equations. The simulation of the new attack needs about 65 fault injections to recover the chaining value and the input message block, taking about 200 seconds on average. Moreover, based on the attack on the SHA-256 compression function, an almost universal forgery attack on HMAC-SHA-256 is presented. Our algebraic fault analysis is generic, automatic and can be applied to other ARX-based primitives.
Enhancement of DES Algorithm with Multi State Logic (IJORCS)
The principal goal in designing any encryption algorithm must be security against unauthorized access or attacks. The Data Encryption Standard (DES) is a symmetric key algorithm used to secure data. Enhanced DES algorithms work by increasing the key length, using a more complex S-box design, increasing the number of states in which the information is represented, or a combination of the above. Increasing the key length increases the number of possible keys, which makes a brute-force attack harder for an intruder. As the S-box design becomes more complex, the avalanche effect improves. As the number of states in which the information is represented increases, it becomes harder for an intruder to recover the actual information. The proposed algorithm replaces the predefined XOR operation applied during the 16 rounds of the standard algorithm with a new operation, called a "hash function", that depends on two keys: one key is used in the F function, and the other consists of a combination of 16 states (0, 1, 2, ..., 14, 15) instead of the ordinary 2-state key (0, 1). This replacement adds a new level of protection and more robustness against breaking methods.
Hybrid Simulated Annealing and Nelder-Mead Algorithm for Solving Large-Scale ... (IJORCS)
This paper presents a new algorithm for solving large-scale global optimization problems based on a hybridization of simulated annealing and the Nelder-Mead algorithm. The new algorithm, called the simulated Nelder-Mead algorithm with random variable updating (SNMRVU), starts with an initial solution, generated randomly, which is then divided into partitions. A neighborhood zone is generated, a random number of partitions is selected, and the variable-updating process starts in order to generate trial neighbor solutions. This process helps the SNMRVU algorithm explore the region around the current iterate. The Nelder-Mead algorithm is used in the final stage to improve the best solution found so far and to accelerate convergence. The performance of the SNMRVU algorithm is evaluated using 27 scalable benchmark functions and compared with four algorithms. The results show that SNMRVU is promising and produces high-quality solutions with low computational costs.
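The simulated-annealing component of such hybrids follows the usual scheme: always accept improving moves, accept worsening moves with probability exp(-delta/T), and cool T over time. A minimal one-dimensional sketch of plain simulated annealing follows; it is not the SNMRVU algorithm itself, whose partitioning and Nelder-Mead stages are omitted, and all parameter values are illustrative:

```python
import math
import random

def simulated_annealing(f, x0, step=0.5, t0=1.0, cooling=0.95,
                        iters=500, seed=0):
    """Minimal simulated annealing for a 1-D objective f: always accept
    improving moves, accept worsening moves with probability
    exp(-delta / T), and cool T geometrically."""
    rng = random.Random(seed)
    x, fx, t = x0, f(x0), t0
    best, f_best = x, fx
    for _ in range(iters):
        cand = x + rng.uniform(-step, step)   # random neighbor
        f_cand = f(cand)
        delta = f_cand - fx
        if delta < 0 or rng.random() < math.exp(-delta / t):
            x, fx = cand, f_cand              # move accepted
            if fx < f_best:
                best, f_best = x, fx
        t *= cooling                          # geometric cooling schedule
    return best, f_best
```

On a simple quadratic such as f(x) = (x - 3)^2 starting from x = 0, the walk settles near the minimum at 3.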
Voice Recognition System using Template Matching (IJORCS)
It is easy for a human to recognize a familiar voice, but using computer programs to identify a voice by comparing it with others is a herculean task, owing to the problems encountered when developing an algorithm to recognize human voices: it is impossible to say a word the same way on two different occasions, and computer analysis of human speech gives different interpretations depending on the varying speed of speech delivery. This paper gives a detailed description of the process behind the implementation of an effective voice recognition algorithm. The algorithm utilizes the discrete Fourier transform to compare the frequency spectra of two voice samples, because the spectrum remains largely unchanged when the speech is slightly varied. Chebyshev's inequality is then used to determine whether the two voices came from the same person. The algorithm is implemented and tested using MATLAB.
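The spectrum-comparison idea can be illustrated with a direct discrete Fourier transform: shifting a signal in time changes the phases of its DFT but not the magnitudes, so magnitude spectra tolerate small timing variations. This is a toy sketch of that idea only, not the paper's MATLAB implementation (which additionally applies Chebyshev's inequality), and the tolerance value is arbitrary:

```python
import cmath

def dft_magnitudes(signal):
    """Magnitude spectrum via a direct (O(n^2)) discrete Fourier transform."""
    n = len(signal)
    return [
        abs(sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n)))
        for k in range(n)
    ]

def spectra_match(a, b, tol=1.0):
    """Crude same-source test: average difference of magnitude spectra."""
    sa, sb = dft_magnitudes(a), dft_magnitudes(b)
    return sum(abs(x - y) for x, y in zip(sa, sb)) / len(sa) < tol
```

For instance, a pure sine wave and a circularly time-shifted copy of it have identical magnitude spectra, so `spectra_match` accepts the pair.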
Channel Aware MAC Protocol for Maximizing Throughput and Fairness (IJORCS)
Proper channel utilization and a queue-length-aware routing protocol are challenging tasks in a MANET. To overcome this drawback, we extend previous work by improving the MAC protocol to maximize throughput and fairness. In this work we estimate the channel condition and contention for channel-aware packet scheduling, and the queue length is also calculated for the queue-length-aware routing protocol. The channel is scheduled based on the channel condition, and routing is carried out by considering the queue length, which provides a measurement of the traffic load at the mobile node itself. Depending on this load, the node with the lesser load is selected for routing; this effectively balances the load and improves the throughput of the ad hoc network.
A Review and Analysis on Mobile Application Development Processes using Agile... (IJORCS)
Over the last decade, the mobile telecommunication industry has observed rapid growth and has proved to be a highly competitive, uncertain and dynamic environment. Besides its advancement, it has also raised a number of questions and gained concern both in industry and in research. The development process of mobile applications differs from that of traditional software, as users expect the same features as in their desktop computer applications, with additional mobile-specific functionality. Advanced mobile applications require integration with existing enterprise computing systems such as databases, legacy applications and Web services. In addition, the lifecycle of a mobile application moves much faster than that of a traditional Web application, and the associated lifecycle management must therefore be adjusted accordingly. Security and application testing are more challenging in mobile applications than in Web applications, since the technology in mobile devices progresses rapidly and developers must stay in touch with the latest developments, news and trends in their area of work. With the rising competitiveness of the software market, researchers are seeking more flexible methods that can adjust to dynamic situations where software system requirements change over time, producing valuable software in a short duration and within a low budget. The intrinsic uncertainty and complexity of any software project therefore require an iterative development plan to cope with uncertainty and a large number of unknown variables. Agile methodologies were thus introduced to meet the new requirements of software development companies; they aim at facilitating software development processes where changes are acceptable at any stage, and provide a structure for highly collaborative software development.
Therefore, the present paper reviews and analyses different prevalent methodologies utilizing agile techniques that are currently in use for the development of mobile applications. The paper provides a detailed review and analysis of the use of agile methodologies in the processes associated with mobile application development, and highlights their benefits and constraints. In addition, based on this analysis, future research needs are identified and discussed.
Congestion Prediction and Adaptive Rate Adjustment Technique for Wireless Sen... (IJORCS)
In general, nodes in Wireless Sensor Networks (WSNs) are equipped with limited battery and computation capabilities, yet the occurrence of congestion consumes extra energy and computation power through the retransmission of data packets. Thus, congestion should be regulated to improve network performance. In this paper, we propose a congestion prediction and adaptive rate adjustment technique for Wireless Sensor Networks. This technique predicts the congestion level using a fuzzy logic system: node degree, data arrival rate and queue length are taken as inputs to the fuzzy system, and the congestion level is obtained as the output. When the congestion level falls between the moderate and maximum ranges, the adaptive rate adjustment technique is triggered. Our technique prevents congestion by controlling the data sending rate and avoids unnecessary packet losses. By simulation, we demonstrate the proficiency of our technique: it increases system throughput and network performance significantly.
A Study of Routing Techniques in Intermittently Connected MANETs (IJORCS)
A Mobile Ad hoc Network (MANET) is a self-configuring, infrastructure-less network of mobile devices connected wirelessly. MANETs are a kind of wireless ad hoc network that usually has a routable networking environment on top of a link-layer ad hoc network. The routing approaches in MANETs fall mainly into three categories: reactive protocols, proactive protocols and hybrid protocols. These traditional routing schemes are not applicable to the so-called Intermittently Connected Mobile Ad hoc Network (ICMANET). An ICMANET is a form of Delay Tolerant Network in which a complete end-to-end path between two nodes wishing to communicate may never exist. Intermittent connectivity arises when the network is sparse or highly mobile, and routing in such a spasmodic environment is arduous. In this paper, we survey the prevailing routing approaches for ICMANETs, with their benefits and detriments.
Improving the Efficiency of Spectral Subtraction Method by Combining it with ...IJORCS
Β
In the field of speech signal processing, Spectral subtraction method (SSM) has been successfully implemented to suppress the noise that is added acoustically. SSM does reduce the noise at satisfactory level but musical noise is a major drawback of this method. To implement spectral subtraction method, transformation of speech signal from time domain to frequency domain is required. On the other hand, Wavelet transform displays another aspect of speech signal. In this paper we have applied a new approach in which SSM is cascaded with wavelet thresholding technique (WTT) for improving the quality of speech signal by removing the problem of musical noise to a great extent. Results of this proposed system have been simulated on MATLAB.
An Adaptive Load Sharing Algorithm for Heterogeneous Distributed SystemIJORCS
Β
Due to the restriction of designing faster and faster computers, one has to find the ways to maximize the performance of the available hardware. A distributed system consists of several autonomous nodes, where some nodes are busy with processing, while some nodes are idle without any processing. To make better utilization of the hardware, the tasks or load of the overloaded node will be sent to the under loaded node that has less processing weight to minimize the response time of the tasks. Load balancing is a tool used effectively for balancing the load among the systems. Dynamic load balancing takes into account of the current system state for migration of the tasks from heavily loaded nodes to the lightly loaded nodes. In this paper, we devised an adaptive load-sharing algorithm to balance the load by taking into consideration of connectivity among the nodes, processing capacity of each node and link capacity.
The Design of Cognitive Social Simulation Framework using Statistical Methodo...IJORCS
Β
Modeling the behavior of the cognitive architecture in the context of social simulation using statistical methodologies is currently a growing research area. Normally, a cognitive architecture for an intelligent agent involves artificial computational process which exemplifies theories of cognition in computer algorithms under the consideration of state space. More specifically, for such cognitive system with large state space the problem like large tables and data sparsity are faced. Hence in this paper, we have proposed a method using a value iterative approach based on Q-learning algorithm, with function approximation technique to handle the cognitive systems with large state space. From the experimental results in the application domain of academic science it has been verified that the proposed approach has better performance compared to its existing approaches.
An Enhanced Framework for Improving Spatio-Temporal Queries for Global Positi...IJORCS
Β
To efficiently process continuous spatio-temporal queries, we need to efficiently and effectively handle large number of moving objects and continuous updates on these queries. In this paper, we propose a framework that employs a new indexing algorithm that is built on top of SQL Server 2008 and avoid the overhead related to R-Tree indexing. To answer range queries, we utilize dynamic materialized view concept to efficiently handle update queries. We propose an adaptive safe region to reduce communication costs between the client and the server and to minimize position update load. Caching of results was utilized to enhance the overall performance of the framework. To handle concurrent spatio-temporal queries, we utilize publish/subscribe paradigm to group similar queries and efficiently process these requests. Experiments show that the overall proposed framework performance was able to outperform R-Tree index and produce promising and satisfactory results.
A PSO-Based Subtractive Data Clustering AlgorithmIJORCS
Β
There is a tremendous proliferation in the amount of information available on the largest shared information source, the World Wide Web. Fast and high-quality clustering algorithms play an important role in helping users to effectively navigate, summarize, and organize the information. Recent studies have shown that partitional clustering algorithms such as the k-means algorithm are the most popular algorithms for clustering large datasets. The major problem with partitional clustering algorithms is that they are sensitive to the selection of the initial partitions and are prone to premature converge to local optima. Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and cluster centers for any given set of data. The cluster estimates can be used to initialize iterative optimization-based clustering methods and model identification methods. In this paper, we present a hybrid Particle Swarm Optimization, Subtractive + (PSO) clustering algorithm that performs fast clustering. For comparison purpose, we applied the Subtractive + (PSO) clustering algorithm, PSO, and the Subtractive clustering algorithms on three different datasets. The results illustrate that the Subtractive + (PSO) clustering algorithm can generate the most compact clustering results as compared to other algorithms.
B. Rajasekhar, B. Sunil Kumar, Rajesh Vibhudi, B. V. Rama Krishna

Secondly, every point in the data is assigned to the closest centroid, and each collection of points assigned to a centroid forms a cluster. The centroid of each cluster is then updated based on the points assigned to it. This process is repeated until no point changes clusters.

A. Crisp Clustering Method: The objective of k-means is to assign n objects to k clusters. The process begins by randomly choosing k objects as the centroids of the k clusters. Each object is assigned to one of the k clusters based on the minimum value of the distance d(x_i, c_j) between the object vector x_i and the cluster vector c_j; the distance d(x_i, c_j) can be the standard Euclidean distance. After all the objects have been assigned to clusters, the new centroid vectors of the clusters are calculated as

    c_j = ( Σ_{x_i ∈ c_j} x_i ) / |c_j|, where 1 ≤ j ≤ k

Here |c_j| is the cardinality of cluster c_j. The process stops when the centroids of the clusters stabilize, i.e., the centroid vectors from the previous iteration are identical to those generated in the current iteration.

B. Cluster Validity: A new validity index, Conn_index, for prototype-based clustering of data sets is applicable to a wide variety of cluster characteristics: clusters of different shapes, sizes, and densities, and even overlapping clusters. Conn_index is based on a weighted Delaunay triangulation called the "connectivity matrix". For crisp clustering, the Davies-Bouldin index and the generalized Dunn index are among the most commonly used indices; they depend on a separation measure between clusters and a distance-based measure of cluster compactness. When the clusters have a homogeneous density distribution, one effective approach to correctly evaluate a clustering of the data set is CDbw (composite density between and within clusters) [16]. CDbw finds prototypes for the clusters instead of representing them by their centroids, and calculates the validity measure based on inter- and intra-cluster densities and cluster separation.

C. Compactness of Clusters: Assume k clusters and N prototypes v in a data set, and let C_k and C_l be two different clusters, where 1 ≤ k, l ≤ K. The proposed Conn_index is defined with the help of the quantities Intra and Inter, which measure compactness and separation, respectively. The compactness of C_k, Intra(C_k), is the ratio of the number of data vectors in C_k whose second best-matching unit (BMU) is also in C_k to the number of data vectors in C_k. Intra(C_k) is defined by

    Intra_Conn(C_k) = Σ_{i,j} { CADJ(i,j) : v_i, v_j ∈ C_k } / Σ_{i,j} { CADJ(i,j) : v_i ∈ C_k }

and Intra(C_k) ≤ 1; the greater the value of Intra, the more compact the cluster [1]. If the second BMUs of all data vectors in C_k are also in C_k, then Intra(C_k) = 1. The intra-cluster connectivity of all clusters, Intra, is the average compactness, given by

    Intra = ( Σ_{k=1}^{K} Intra_Conn(C_k) ) / K

D. Cluster Quality: Several cluster validity indices exist to evaluate the quality of the clusters obtained by different clustering algorithms. An excellent summary of various validity measures is given in [10], covering two classical cluster validity indices and one used for fuzzy clusters.

1. Davies-Bouldin Index: This index [6] is a function of the ratio of the sum of within-cluster scatter to between-cluster separation. The scatter within the ith cluster, denoted S_i, and the distance between clusters c_i and c_j, denoted d_ij, are defined as follows:

    S_{i,q} = [ (1/|c_i|) Σ_{x ∈ c_i} ||x − c_i||^q ]^{1/q}

    d_{ij,t} = || c_i − c_j ||_t

where c_i is the center of the ith cluster and |c_i| is the number of objects in c_i. The integers q and t can be selected independently such that q, t ≥ 1. The Davies-Bouldin index for a clustering scheme (CS) is then defined as

    DB(CS) = (1/k) Σ_{i=1}^{k} R_{i,qt}, where R_{i,qt} = max_{1 ≤ j ≤ k, j ≠ i} { (S_{i,q} + S_{j,q}) / d_{ij,t} }

The Davies-Bouldin index considers the average similarity between each cluster and the one most similar to it. A lower Davies-Bouldin index means a better clustering scheme.

2. Dunn Index: Dunn proposed another cluster validity index [7]. The index corresponding to a clustering scheme (CS) is defined by

    D(CS) = min_{1 ≤ i ≤ k} { min_{1 ≤ j ≤ k, j ≠ i} { δ(c_i, c_j) / max_{1 ≤ l ≤ k} Δ(c_l) } }

    δ(c_i, c_j) = min_{x_s ∈ c_i, x_t ∈ c_j} || x_s − x_t ||

    Δ(c_i) = max_{x_s, x_t ∈ c_i} || x_s − x_t ||

If a data set is well separated by a clustering scheme, the distances among the clusters, δ(c_i, c_j) (1 ≤ i, j ≤ k), are usually large, and the diameters of the clusters, Δ(c_i) (1 ≤ i ≤ k), are expected to be small. Therefore, a large value of D(CS) corresponds to a good clustering.
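As a concrete check, the two indices above can be computed directly from their definitions. The following is a minimal pure-Python sketch on hypothetical toy data (not from the paper), taking q = 1 and t = 2 in the Davies-Bouldin formulas:

```python
import math

def dist(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def centroid(cluster):
    n = len(cluster)
    return tuple(sum(p[d] for p in cluster) / n for d in range(len(cluster[0])))

def davies_bouldin(clusters):
    # S_i: mean distance of points to their centroid (q = 1);
    # d_ij: Euclidean distance between centroids (t = 2).
    cents = [centroid(c) for c in clusters]
    S = [sum(dist(p, cents[i]) for p in c) / len(c) for i, c in enumerate(clusters)]
    k = len(clusters)
    R = [max((S[i] + S[j]) / dist(cents[i], cents[j])
             for j in range(k) if j != i) for i in range(k)]
    return sum(R) / k

def dunn(clusters):
    # delta(c_i, c_j): minimum distance between points of different clusters;
    # Delta(c_l): cluster diameter (maximum intra-cluster distance).
    k = len(clusters)
    delta = min(dist(p, q) for i in range(k) for j in range(k) if i != j
                for p in clusters[i] for q in clusters[j])
    diam = max(dist(p, q) for c in clusters for p in c for q in c)
    return delta / diam

# Hypothetical toy clustering: two well-separated clusters.
cs = [[(0.0, 0.0), (0.0, 1.0), (1.0, 0.0)],
      [(10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]]
print(davies_bouldin(cs), dunn(cs))
```

On these well-separated toy clusters the Davies-Bouldin value is small and the Dunn value is large, matching the interpretation given above.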
www.ijorcs.org
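The k-means procedure described in Section A (random initial centroids, nearest-centroid assignment, centroid update, repeat until no point changes clusters) can be sketched as follows; the data points and seed are illustrative assumptions, not from the paper:

```python
import math
import random

def kmeans(points, k, seed=0):
    """Plain k-means: random initial centroids, assign each point to the
    nearest centroid, recompute centroids, repeat until assignments stop
    changing."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    assignment = None
    while True:
        new_assignment = [min(range(k), key=lambda j: math.dist(p, centroids[j]))
                          for p in points]
        if new_assignment == assignment:      # no point changed clusters
            return centroids, assignment
        assignment = new_assignment
        for j in range(k):                    # update centroid of each cluster
            members = [p for p, a in zip(points, assignment) if a == j]
            if members:
                centroids[j] = tuple(sum(c) / len(members) for c in zip(*members))

pts = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
cents, labels = kmeans(pts, 2)
print(sorted(cents))
```

Real implementations add safeguards such as multiple random restarts, since this greedy search only guarantees a local minimum of the criterion function.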
[Figure 3: the proposed system ([6]), comprising Dataset, Cluster Validity Measure, Decision Tree, Loss Function, and Result.]

We choose k-means clustering because (1) it is a data-driven method that makes relatively few assumptions on the distributions of the underlying data, and (2) the greedy search strategy of k-means guarantees at least a local minimum of the criterion function, thereby accelerating the convergence of the clusters on large datasets.

A. Cluster Quality on Decision Theory:

Unsupervised learning methods are the techniques we apply when the only data available are unlabeled, and the algorithms need to know the number of clusters. Cluster validity measures such as Davies-Bouldin can help us assess whether a clustering method accurately represents the structure of the data set, and there are several cluster indices to evaluate crisp and fuzzy clustering. The decision framework has been helpful in providing a better understanding of the classification model. The decision-theoretic rough set model considers various classes of loss functions, and its extension to multiple categories makes it possible to construct a cluster validity measure by considering various loss functions based on decision theory. Within a given set of objects there may be clusters such that objects in the same cluster are more similar to each other than to those in different clusters; clustering is to find the right groups, or clusters, for the given set of objects. Finding the right clusters requires exponentially many comparisons and has been proved to be NP-hard. To define the framework, we assume a partition of a set of objects X = {x1, ..., xn} into clusters CS = {c1, ..., ck}; the k-means algorithm approximates the actual clustering. It is possible that an object may not belong to only one cluster. However, corresponding to each cluster within the clustering scheme there is a hypothetical core, whose centroid will be used as the cluster core. Let core(ci) be the core of the cluster ci, which is used to calculate the centroid of the cluster. Any x ∈ core(ci) cannot belong to other clusters; therefore, core(ci) can be considered the best representation of ci to a certain extent.

B. Comparison of Clustering and Classification:

Clustering works well for finding unlabeled clusters in small to large sets of data points. A strength of the k-means algorithm is its favorable execution time, although the user has to know in advance how many clusters are to be searched; k-means is data driven and is efficient for smaller data sets and for anomaly detection. Instead of taking the mean value of the objects in a cluster as a reference point, a medoid can be used, which is the most centrally located object in a cluster. This form of clustering requires the distance between every pair of objects only once and uses the distances at every stage of the iteration.

Compared to clustering [8], classification algorithms perform efficiently on complex datasets and for noise and outlier detection; algorithm designers have had much success with the equal-width and equal-depth approaches to building class descriptions. Decision tree learners, made popular by ID3, C4.5 and CART, were chosen because they are relatively fast and typically produce competitive classifiers. In fact, the decision tree generator C4.5, a successor to ID3, has become a standard basis for comparison in machine learning research, because it produces good classifiers quickly. For non-numeric datasets, the run time of ID3 (and C4.5) grows linearly in the number of examples.

The practical run-time complexity of C4.5 has been determined empirically to be worse than O(e^2) on some datasets. One possible explanation is based on the observation of Oates and Jensen (1998) that the size of C4.5 trees increases linearly with the number of examples. One of the factors in C4.5's run-time complexity corresponds to the tree depth, which cannot be larger than the number of attributes; tree depth is related to tree size, and thereby to the number of examples. When compared with C4.5, the run-time complexity of CART is satisfactory.

V. CONCLUSION

The proposed cluster quality index based on decision theory uses a loss function to construct the quality index: the cluster quality is evaluated by considering the total risk of classifying all the objects. Such a decision-theoretic representation of cluster quality may be more useful in business-oriented data mining than traditional geometry-based cluster quality measures. In addition to evaluating crisp clustering, the proposal is an evaluation measure for rough clustering. This is the first measure that takes into account special features of rough clustering that allow an object to belong to more than one cluster. The measure is shown to be useful in determining important aspects of a clustering exercise, such as the appropriate number of clusters and the size of the boundary region. The application of the measure to synthetic data with a known number of clusters and boundary region lends credence to the proposal.

A real advantage of the decision-theoretic cluster validity measure is its ability to include monetary
Quality of Cluster Index Based on Study of Decision Tree
considerations in evaluating a clustering scheme. Use of the measure to derive an appropriate clustering scheme for a promotional campaign in a retail store highlighted its unique ability to include cost and benefit considerations in commercial data mining. The measure can also be extended to evaluating other clustering algorithms, such as fuzzy clustering. Such a cluster validity measure can be useful in further theoretical
development in clustering.

VI. REFERENCES
[1] J.C. Bezdek, Pattern Recognition with Fuzzy Objective Function Algorithms. Plenum Press, 1981.
[2] D.L. Davies and D.W. Bouldin, "A Cluster Separation Measure," IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 1, no. 2, pp. 224-227, Apr. 1979.
[3] J.C. Dunn, "Well Separated Clusters and Optimal Fuzzy Partitions," J. Cybernetics, vol. 4, pp. 95-104, 1974.
[4] S. Hirano and S. Tsumoto, "On Constructing Clusters from Non-Euclidean Dissimilarity Matrix by Using Rough Clustering," Proc. Japanese Soc. for Artificial Intelligence (JSAI) Workshops, pp. 5-16, 2005.
[5] T.B. Ho and N.B. Nguyen, "Nonhierarchical Document Clustering by a Tolerance Rough Set Model," Int'l J. Intelligent Systems, vol. 17, no. 2, pp. 199-212, 2002.
[6] P. Lingras, M. Chen, and D. Miao, "Rough Cluster Quality Index Based on Decision Theory," IEEE Trans. Knowledge and Data Engineering, vol. 21, no. 7, July 2009.
[7] W. Pedrycz and J. Waletzky, "Fuzzy Clustering with Partial Supervision," IEEE Trans. Systems, Man, and Cybernetics, vol. 27, no. 5, pp. 787-795, Sept. 1997.
[8] "Partition Algorithms - A Study and Emergence of Mining Projected Clusters in High-Dimensional Dataset," Int'l J. Computer Science and Telecommunications, vol. 2, issue 4, July 2011.
[9] D.D. Jensen and P.R. Cohen, "Multiple Comparisons in Induction Algorithms," Machine Learning, 1999. An excellent discussion of the bias inherent in selecting an input; see http://www.cs.umass.edu/~jensen/papers.