Many real-time systems generate data streams, which are massive and fast-changing. Traditional mining methods are inefficient on such data, so many methodologies have been developed specifically for stream processing. Many applications require grouping data by its characteristics, so clustering is applied to data streams; for non-linear data, density-based clustering is used. This paper reviews stream-clustering algorithms and methodologies and evaluates whether they meet user requirements. Density-based clustering algorithms are studied in particular because of their advantages over other clustering methods.
Dynamic K-Means Algorithm for Optimized Routing in Mobile Ad Hoc Networks - IJCSES Journal
In this paper, a dynamic K-means algorithm to improve the routing process in Mobile Ad-Hoc Networks (MANETs) is presented. Mobile ad-hoc networks are collections of mobile wireless nodes that can operate without fixed access points, pre-existing infrastructure, or a centralized management point. In MANETs, the rapid motion of nodes continually changes the network topology. This feature leads to various problems in the routing process, such as an increase in overhead messages and inefficient routing between network nodes. A large variety of clustering methods have been developed for establishing an efficient routing process in MANETs. Routing is one of the crucial topics with a significant impact on MANET performance, and the K-means algorithm is an effective clustering method for reducing routing difficulties related to bandwidth, throughput, and power consumption. This paper proposes a new K-means clustering algorithm to find the optimal path from a source node to destination nodes in MANETs. The main goal of the proposed approach, called the dynamic K-means clustering method, is to overcome limitations of the basic K-means method such as a permanent cluster head and fixed cluster members. The experimental results demonstrate that the dynamic K-means scheme enhances the performance of the routing process in mobile ad-hoc networks.
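As a concrete illustration of the clustering step, a minimal plain K-means over 2-D node coordinates can be sketched as follows. The function and data are illustrative only, not the paper's dynamic variant, which additionally re-elects cluster heads and members as nodes move:

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means on 2-D points: returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # assignment step: nearest centroid by squared Euclidean distance
        for i, (x, y) in enumerate(points):
            labels[i] = min(range(k),
                            key=lambda c: (x - centroids[c][0]) ** 2 +
                                          (y - centroids[c][1]) ** 2)
        # update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = (sum(x for x, _ in members) / len(members),
                                sum(y for _, y in members) / len(members))
    return centroids, labels

# Node positions forming two obvious spatial groups
nodes = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
cents, labs = kmeans(nodes, k=2)
```

In a MANET setting the points would be node coordinates and each cluster would elect a head near its centroid; the dynamic variant would rerun this as the topology changes.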
AutoML and Neural Architecture Search (2019-07) - DaeJin Kim
Brief introduction of NAS
Review of EfficientNet (Google Brain), RandWire (FAIR) papers
NAS flow slide from KihoSuh's slideshare (https://www.slideshare.net/KihoSuh/neural-architecture-search-with-reinforcement-learning-76883153)
[References]
[1] EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks (https://arxiv.org/abs/1905.11946)
[2] Exploring Randomly Wired Neural Networks for Image Recognition (https://arxiv.org/abs/1904.01569)
DEVELOPING A NOVEL MULTIDIMENSIONAL MULTIGRANULARITY DATA MINING APPROACH FOR... - cscpconf
Data Mining is one of the most significant tools for discovering association patterns useful to many knowledge domains. Yet there are drawbacks in existing mining techniques. Three main weaknesses of current data-mining techniques are: 1) the entire database must be re-scanned whenever new attributes are added; 2) an association rule may hold at a certain granularity but fail at a smaller one, and vice versa; 3) current methods can find either frequent rules or infrequent rules, but not both at the same time. This research proposes a novel data schema and an algorithm that address these weaknesses while improving the efficiency and effectiveness of data-mining strategies. The crucial mechanisms of each step are clarified in this paper. Finally, experimental results on the efficiency, scalability, and information loss of the proposed approach are presented to demonstrate its advantages.
Machine learning in Dynamic Adaptive Streaming over HTTP (DASH) - Eswar Publications
Recently, machine learning has been introduced into the area of adaptive video streaming. This paper presents a taxonomy covering six state-of-the-art machine-learning techniques that have been applied to Dynamic Adaptive Streaming over HTTP (DASH): (1) Q-learning, (2) reinforcement learning, (3) regression, (4) classification, (5) decision-tree learning, and (6) neural networks.
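To make the first technique concrete, the toy sketch below applies tabular Q-learning to bitrate selection. The two-state bandwidth model, the reward function, and the bitrate ladder are invented for illustration and do not come from any of the surveyed systems:

```python
import random

BITRATES = [1, 3, 5]        # Mbps options (the actions)
random.seed(1)

def reward(bandwidth, bitrate):
    # reward higher quality, heavily penalise exceeding the available bandwidth
    return bitrate - 10 * max(0, bitrate - bandwidth)

# Q-table over (observed throughput state, action)
q = {(bw, a): 0.0 for bw in (2, 6) for a in range(len(BITRATES))}
alpha, epsilon = 0.2, 0.1
for _ in range(2000):
    bw = random.choice((2, 6))                   # observe throughput
    if random.random() < epsilon:                # epsilon-greedy exploration
        a = random.randrange(len(BITRATES))
    else:
        a = max(range(len(BITRATES)), key=lambda x: q[(bw, x)])
    r = reward(bw, BITRATES[a])
    q[(bw, a)] += alpha * (r - q[(bw, a)])       # one-step (bandit-style) update

best = {bw: max(range(len(BITRATES)), key=lambda a: q[(bw, a)]) for bw in (2, 6)}
```

After training, the learned policy picks the low bitrate when throughput is low and the high bitrate when throughput is high; a real DASH agent would also include buffer state and a discounted next-state term in the update.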
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval - IJECEIAES
Data mining is an essential process for identifying patterns in large datasets through machine-learning techniques and database systems. Clustering high-dimensional data is challenging due to the curse of dimensionality, and existing approaches improve neither space complexity nor retrieval performance. To overcome these limitations, a Spectral Clustering Based VP Tree Indexing Technique is introduced. The technique clusters and indexes densely populated high-dimensional data points for effective retrieval based on user queries. A Normalized Spectral Clustering Algorithm groups similar high-dimensional data points; a Vantage Point Tree is then constructed to index the clustered points with minimal space overhead; finally, indexed data is retrieved for a user query using a Vantage Point Tree based Data Retrieval Algorithm. This improves the true positive rate with minimal retrieval time. Performance is measured in terms of space complexity, true positive rate, and retrieval time on the El Nino weather dataset from the UCI Machine Learning Repository. Experimental results show that the proposed technique reduces space complexity by 33% and retrieval time by 24% compared with state-of-the-art works.
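The normalized spectral clustering step can be sketched in a few lines of NumPy. This is a generic Ng-Jordan-Weiss-style sketch under an assumed Gaussian affinity, not the paper's exact algorithm, and it omits the VP-tree indexing entirely:

```python
import numpy as np

def spectral_clusters(X, k, sigma=1.0):
    # Gaussian affinity between all pairs of points
    d2 = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    W = np.exp(-d2 / (2 * sigma ** 2))
    np.fill_diagonal(W, 0.0)
    # symmetric normalized Laplacian: L = I - D^(-1/2) W D^(-1/2)
    Dinv = np.diag(1.0 / np.sqrt(W.sum(1)))
    L = np.eye(len(X)) - Dinv @ W @ Dinv
    # embed each point via the eigenvectors of the k smallest eigenvalues
    _, vecs = np.linalg.eigh(L)              # eigh: ascending eigenvalues
    emb = vecs[:, :k]
    emb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # a few rounds of Lloyd's algorithm on the embedding,
    # with initial centroids spread across the data
    cents = emb[np.linspace(0, len(X) - 1, k).astype(int)].copy()
    for _ in range(20):
        labels = np.argmin(((emb[:, None] - cents[None]) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                cents[c] = emb[labels == c].mean(0)
    return labels

X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1],
              [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]])
labels = spectral_clusters(X, k=2)
```

The key design point is that k-means runs on the spectral embedding rather than the raw coordinates, which is what lets the method separate clusters that are not linearly separable in the original space.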
A STUDY OF IMAGE COMPRESSION BASED TRANSMISSION ALGORITHM USING SPIHT FOR LOW... - acijjournal
Image compression is a widely used tool for reducing communication bandwidth and saving transmission power; it should reproduce a good-quality image after compression at low bit rates. Set partitioning in hierarchical trees (SPIHT) is a wavelet-based, computationally fast algorithm and among the best image-compression-based transmission algorithms, offering good compression ratios, fast execution time, and good image quality. Precise Rate Control (PRC) is a distinctive characteristic of SPIHT. Image compression based on Precise Rate Control and fast coding time is the principal subject of this paper. Experimental results show that, at low bit rates, the modified algorithm with fast coding time and Precise Rate Control reduces execution time and improves the quality of the reconstructed image, both in PSNR and perceptually, compared with the original at the same low bit rate.
Tail-biting convolutional codes (TBCC) have been extensively applied in communication systems. They are implemented by replacing the fixed tail with tail-biting data, which enables efficient encoding but makes the decoding computation more complex. Several algorithms have been developed to overcome this issue, most of them iterative with an uncertain number of iterations. In this paper, we propose a VLSI architecture implementing our reversed-trellis TBCC (RT-TBCC) algorithm, which modifies the direct-terminating maximum-likelihood (ML) decoding process to achieve a better correction rate. The goal is an alternative tail-biting convolutional decoder requiring fewer computations than existing solutions. The proposed architecture has been evaluated for the LTE standard, where it significantly reduces computation time and resources compared with the existing direct-terminating ML decoder. Functionality and bit-error-rate (BER) behaviour are evaluated through simulations, a System-on-Chip (SoC) implementation, and FPGA synthesis.
A Study of BFLOAT16 for Deep Learning Training - Subhajit Sahu
Highlighted notes of:
A Study of BFLOAT16 for Deep Learning Training
This paper presents the first comprehensive empirical study demonstrating the efficacy of the Brain Floating Point (BFLOAT16) half-precision format for Deep Learning training across image classification, speech recognition, language modeling, generative networks, and industrial recommendation systems. BFLOAT16 is attractive for Deep Learning training for two reasons: the range of values it can represent is the same as that of the IEEE 754 floating-point format (FP32), and conversion to/from FP32 is simple. Maintaining the same range as FP32 is important to ensure that no hyper-parameter tuning is required for convergence; e.g., IEEE 754-compliant half-precision floating point (FP16) requires hyper-parameter tuning. In this paper, we discuss the flow of tensors and various key operations in mixed-precision training and delve into details of operations such as the rounding modes for converting FP32 tensors to BFLOAT16. We have implemented a method to emulate BFLOAT16 operations in Tensorflow, Caffe2, IntelCaffe, and Neon for our experiments. Our results show that deep learning training using BFLOAT16 tensors achieves the same state-of-the-art (SOTA) results across domains as FP32 tensors in the same number of iterations and with no changes to hyper-parameters.
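The FP32-to-BFLOAT16 conversion with round-to-nearest-even on the dropped mantissa bits can be sketched directly on the bit pattern. This is a generic illustration of the rounding mode discussed above, not the paper's emulation code, and NaN handling is omitted:

```python
import struct

def fp32_to_bf16_rne(x):
    """Keep the top 16 bits of an FP32 value, rounding the dropped
    16 mantissa bits to nearest, ties to even."""
    bits = struct.unpack('<I', struct.pack('<f', x))[0]
    lsb = (bits >> 16) & 1                     # last bit that survives
    rounded = (bits + 0x7FFF + lsb) & 0xFFFF0000
    return struct.unpack('<f', struct.pack('<I', rounded))[0]
```

For example, `fp32_to_bf16_rne(1.0 + 3 * 2 ** -9)` rounds up to `1.0 + 2 ** -7`, the nearest value representable with BFLOAT16's 7 mantissa bits, while an exact halfway value such as `1.0 + 2 ** -8` rounds to the even neighbour `1.0`.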
Clustering, also known as data segmentation, aims to partition a data set into groups (clusters) according to similarity. Cluster analysis has been extensively studied, and many algorithms exist for different types of clustering, but these classical algorithms cannot be applied directly to big data because of its distinct features; applying traditional techniques to large unstructured data is a challenge. This study proposes a hybrid model to cluster big data using the traditional K-means clustering algorithm. The proposed model consists of three phases: a Mapper phase, a Clustering phase, and a Reduce phase. The first phase uses a map-reduce algorithm to split the big data into small datasets; the second phase runs the traditional K-means algorithm on each of the split small datasets; the last phase produces the overall clustering of the complete data set. Two functions, Mode and Fuzzy Gaussian, were implemented and compared in the last phase to determine the more suitable one. The experimental study used four benchmark big data sets: Covtype, Covtype-2, Poker, and Poker-2. The results demonstrate the efficiency of the proposed model in clustering big data with the traditional K-means algorithm, and the experiments show that the Fuzzy Gaussian function produces more accurate results than the Mode function.
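The three phases can be sketched end to end in a few lines. The chunking, the tiny k-means, and the "merge local centroids by re-clustering" reduce step are illustrative stand-ins; the paper's actual reduce phase uses the Mode or Fuzzy Gaussian functions, which are not reproduced here:

```python
def nearest(p, cents):
    return min(range(len(cents)),
               key=lambda c: sum((a - b) ** 2 for a, b in zip(p, cents[c])))

def kmeans(points, k, iters=10):
    cents = [list(points[i]) for i in range(k)]    # init from first k points
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            groups[nearest(p, cents)].append(p)
        for c, g in enumerate(groups):
            if g:
                cents[c] = [sum(col) / len(g) for col in zip(*g)]
    return cents

data = [(0.0, 0.0), (9.0, 9.0), (0.2, 0.1), (9.1, 9.2), (0.1, 0.2), (9.2, 9.0)]
chunks = [data[0:2], data[2:4], data[4:6]]                    # map: split the data
local = [c for chunk in chunks for c in kmeans(chunk, k=2)]   # cluster each chunk
final = kmeans(local, k=2)                                    # reduce: merge locals
```

Because each chunk is clustered independently, the expensive step parallelises naturally; only the small set of local centroids reaches the reduce phase.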
On the Tree Construction of Multi hop Wireless Mesh Networks with Evolutionar... - CSCJournals
Abstract: In this paper, we study the structure of WiMAX mesh networks and the influence of tree structure on network performance. From a given network graph, we search for trees that fulfill certain network QoS requirements. Since the search space is huge, we use a genetic algorithm to find a solution in acceptable time. We use the NetKey representation, an unbiased representation with high locality; because of this high locality, standard genetic operators such as n-point crossover and mutation work properly, and no problem-specific operators are needed. This encoding belongs to the family of weighted encodings. In contrast to representations such as characteristic-vector encoding, which can only indicate whether a link is established or not, weighted encodings assign weights in the genotype and can thus encode the importance of links. Moreover, with a proper fitness function we can search for any desired QoS constraint in the network.
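The weighted-encoding idea can be illustrated with a minimal random-key decoder: the genotype assigns a weight to every potential link, and the phenotype keeps the highest-weight links that connect all nodes without cycles. This is a Kruskal-style sketch of the decoding step only, not the exact NetKey decoder or the paper's fitness function:

```python
import random

def decode_tree(n, weights):
    """weights: dict {(u, v): w} for u < v. Returns the chosen tree edges."""
    parent = list(range(n))
    def find(a):                         # union-find with path halving
        while parent[a] != a:
            parent[a] = parent[parent[a]]
            a = parent[a]
        return a
    tree = []
    for (u, v), _ in sorted(weights.items(), key=lambda kv: -kv[1]):
        ru, rv = find(u), find(v)
        if ru != rv:                     # keep the link only if it adds no cycle
            parent[ru] = rv
            tree.append((u, v))
    return tree

rng = random.Random(42)
n = 5
# one genotype: a random weight ("key") for every possible link
genotype = {(u, v): rng.random() for u in range(n) for v in range(u + 1, n)}
tree = decode_tree(n, genotype)
```

A GA would evolve the weight vectors with standard crossover and mutation; any genotype decodes to a valid spanning tree, which is the property that makes the encoding unbiased.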
RunPool: A Dynamic Pooling Layer for Convolution Neural Network - Putra Wanda
Deep learning (DL) has achieved significant performance in computer vision problems, mainly in automatic feature extraction and representation. However, it is not easy to determine the best pooling method across different case studies: the pooling types that work best for image processing may not be optimal for other tasks, which runs against the philosophy of DL. In a dynamic neural network architecture it is not practically possible to hand-pick a proper pooling technique for each layer, which is the primary reason fixed pooling cannot be applied to dynamic and multidimensional datasets. To deal with these limitations, an optimal pooling method is needed as a better option than max pooling and average pooling. We therefore introduce a dynamic pooling layer called RunPool for training convolutional neural network (CNN) architectures. RunPool pooling is proposed to regularize the neural network by replacing the deterministic pooling functions. In the final section, we test the proposed pooling layer on classification problems with an online social network (OSN) dataset.
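For contrast with the proposed layer, the two deterministic pooling functions it is meant to replace can be sketched as follows (RunPool's own stochastic rule is not reproduced here):

```python
def pool2x2(fm, op):
    """Apply op (e.g. max or mean) to each non-overlapping 2x2 patch."""
    out = []
    for i in range(0, len(fm), 2):
        row = []
        for j in range(0, len(fm[0]), 2):
            patch = [fm[i][j], fm[i][j + 1], fm[i + 1][j], fm[i + 1][j + 1]]
            row.append(op(patch))
        out.append(row)
    return out

fm = [[1, 2, 5, 6],
      [3, 4, 7, 8],
      [0, 0, 1, 1],
      [0, 4, 1, 1]]
mx = pool2x2(fm, max)                       # max pooling
avg = pool2x2(fm, lambda p: sum(p) / 4)     # average pooling
```

Both rules are fixed in advance for every layer and every input, which is exactly the rigidity the abstract argues against.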
Bio-medical (EMG) Signal Analysis and Feature Extraction Using Wavelet Transform - IJERA Editor
In this paper, a multi-channel electromyogram (EMG) acquisition system is developed using a programmable system-on-chip (PSoC) microcontroller to obtain the surface EMG signal. Two pairs of single-channel surface electrodes measure the EMG signal from the forearm muscles. Different levels of a wavelet family are then used to analyze the EMG signal, and features in terms of root mean square, logarithm of root mean square, frequency centroid, and standard deviation are extracted. The proposed feature-extraction study finds that the root-mean-square feature gives better performance than the other features. In the near future, this method could be used to control a mechanical or robotic arm in real-time processing.
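The time-domain features named above can be sketched for a single analysis window as follows; the window values and feature set are illustrative, and the frequency centroid is omitted since it requires an FFT:

```python
import math

def emg_features(window):
    """RMS, log-RMS, and standard deviation for one EMG window."""
    n = len(window)
    rms = math.sqrt(sum(x * x for x in window) / n)
    mean = sum(window) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / n)
    return {"rms": rms, "log_rms": math.log(rms), "std": std}

# toy window: a pure alternating signal of unit amplitude
feats = emg_features([1.0, -1.0, 1.0, -1.0])
```

In practice these features would be computed per wavelet decomposition level and per channel, then fed to a classifier or controller.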
Analysis, Design and Application of Continuously Variable Transmission (CVT) - IJERA Editor
Continuously Variable Transmission (CVT) offers a continuum of gear ratios between desired limits. This allows the engine to operate in its optimum range for more of the time; in contrast, traditional automatic and manual transmissions have several fixed transmission ratios that force the engine to operate outside the optimum range. The need for a transmission system and the working principle of the CVT are discussed in depth. An attempt has been made to understand the contribution of the hydraulic actuators, which are an integral part of a CVT. Furthermore, the question of how and why a torque converter has effectively replaced the conventional clutch is answered. The materials used, constructional aspects, and stress analysis of the belt are discussed in detail. A graphical comparison of fuel efficiency between a manual transmission and a CVT in different high-end vehicles is included. Recent developments in clamping-force control for the push-belt CVT have resulted in increased efficiency combined with improved robustness; current control strategies attempt to prevent macro slip between elements and pulleys at all times for maximum robustness.
Due to technological advances, vast data sets are growing rapidly nowadays. "Big Data" is the new term used to describe these collected datasets; because of their large size and complexity, our current methodologies and data-mining software tools cannot manage them or extract value from them. Such datasets provide unparalleled opportunities for modelling and predicting the future, along with new challenges, so awareness of both the weaknesses and the possibilities of these large data sets is necessary. Today we face overwhelming growth of data on the web in terms of volume, velocity, and variety; moreover, from the security and privacy viewpoints, both areas show unpredictable growth. The big data challenge is thus becoming one of the most exciting opportunities for researchers in the coming years. Hence, this paper gives a broad overview of the topic: its current status, controversies, and the challenges of forecasting the future. It examines some of these problems, with illustrations and applications from various areas. Finally, the paper discusses secure management and privacy of big data as one of the essential issues.
Experimental Investigation of Heat Transfer Using “Twisted Aluminium Tape” - IJERA Editor
Heat exchangers have wide industrial and engineering applications, and increasing a heat exchanger's efficiency increases the overall performance of the unit. This paper is about increasing the heat transfer coefficient of a heat exchanger by causing turbulence in the liquid flowing through the pipe. To cause this turbulence, the liquid flow is passed over a Twisted Aluminium Tape (TAT), which increases the Reynolds number of the flow. This ensures mixing along the length of the flow and thus an increased heat transfer rate. The increased turbulence and higher shear caused by the Twisted Aluminium Tape (TAT) also offer resistance to fouling.
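The role of the Reynolds number mentioned above can be made concrete with a quick calculation; the fluid properties below are rough values for water near room temperature, chosen purely for illustration:

```python
# Reynolds number Re = rho * v * D / mu distinguishes laminar from turbulent
# pipe flow; twisted-tape inserts increase swirl and mixing at a given Re.
def reynolds(rho, v, d, mu):
    return rho * v * d / mu

re = reynolds(rho=997.0, v=1.5, d=0.025, mu=0.00089)  # kg/m^3, m/s, m, Pa*s
regime = "turbulent" if re > 4000 else "laminar/transitional"
```

For this 25 mm pipe at 1.5 m/s the flow is already well into the turbulent regime (Re on the order of 40,000), so the tape's contribution is additional swirl and near-wall shear rather than the laminar-to-turbulent transition itself.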
A Survey of Privacy-Preserving Algorithms for Finding meeting point in Mobile... - IJERA Editor
Location privacy in Location-Based Services (LBS) is the capability to protect the connection between a user's identity, uncertainty sources, servers and databases, thereby restraining a potential attacker from conveniently linking users of LBS to specific locations. Smartphones have become the most important gadget for managing daily activities, and the highly interconnected urban population is increasingly dependent on these gadgets to regulate and schedule their daily lives. These applications often depend on the current location of a user or a class of users. Use of smart mapping technology is also increasing over a large area; this system provides an easily attainable online platform that can be used for accessing many services. This survey paper presents privacy-preserving algorithms to find the most favorable meeting location for a class of users. GSM calculates the locations of all users.
Fabrication of aluminum foam from aluminum scrapIJERA Editor
In this study the optimum parameters affecting the preparation of aluminum foam from recycled aluminum were studied. These parameters are: temperature, CaCO3 to aluminum scrap wt. ratio as foaming agent, Al2O3 to aluminum scrap wt. ratio as thickening agent, and stirring time. The results show that the optimum parameters are a temperature in the range 800 to 850 °C, a CaCO3 to aluminum scrap wt. ratio of 5%, an Al2O3 to aluminum scrap wt. ratio of 3% and a stirring time of 45 seconds at a stirring speed of 1200 rpm. The produced foam's apparent densities ranged from 0.40 to 0.60 g/cm3. The microstructure of the aluminum foam was examined using SEM, EDX and XRD; the results show that the pores were uniformly distributed throughout the matrix and the cell walls were covered by a thin oxide film.
Comparative Study of Manual Lubrication and Automatic LubricationIJERA Editor
Lubrication is the process or technique employed to reduce the wear of one or both surfaces in close proximity. Most bearing lubrication fails due to too much or too little grease being injected during manual lubrication. So, to replace manual lubrication, a PLC-based lubrication system is employed in heavy machines. PLCs can be used to overcome these shortcomings and are mainly used to reduce labor cost, power consumption and circuit complication.
Removal of Harmful Textile Dye Congo Red from Aqueous Solution Using Chitosan...IJERA Editor
Color is an important aspect of human life, and textile industries are the major consumers of dye stuffs. During the coloration process, 10 to 15 percent of the dyes are lost and discharged with the effluents coming from textile industries. These dyes are very difficult to degrade, and they may degrade to form products that are highly toxic to humans. Today, methods such as coagulation, flocculation and activated carbon adsorption are available for the removal of dyes, but they are all quite expensive. Chitosan is a natural hetero-polymer derived from chitin; it has proved effective in removing hazardous compounds from the environment due to its multiple functional groups, and is available as flakes and powder. In the present work, chitosan beads were prepared and modified with the cationic surfactant CTAB for the removal of the dye Congo Red. Batch experiments were conducted to study the effect of CTAB concentration, contact time, agitation speed, adsorbent dosage, initial dye concentration and pH. Batch equilibrium data were analyzed using the Langmuir and Freundlich isotherms, and batch kinetic data were analyzed using the pseudo-first-order and pseudo-second-order kinetic models.
Acquiring Ecg Signals And Analysing For Different Heart AilmentsIJERA Editor
This paper describes and focuses on acquiring ECG waveforms and identifying cardiac diseases in LabVIEW software, bridging the gap between engineers and medical physicians. The model collects the waveform of an affected person; the waveform is analyzed for diseases and a report is then sent to the doctor through mail. Initially the waveforms are collected from the person using an EKG sensor with surface electrodes; the hardware, controlled by an MCU C8051, acquires ECG and phonocardiogram (PCG) synchronously, and the waveform is sent through a DAQ-6211 to a PC with LabVIEW software installed. The waveform, in digital format, is saved and passed to loops containing conditions for different diseases; if the waveform parameters coincide with any of the looping statements, the particular disease is indicated. Simultaneously the patient's PCG report is collected in a separate database containing all information, which is sent to the doctor through mail.
Effect of Composition of Sand Mold on Mechanical Properties and Density of Al...IJERA Editor
Sand casting involves many parameters such as sand grain size, amount of clay, percentage of moisture, green compressive strength, permeability, number of rammings, shatter index, mold type and mold hardness, to name a few. In this paper the effect of sand mold casting process parameters, especially the composition of the molding sand, on the hardness, tensile strength and density of aluminium alloy castings is studied. While other parameters are kept constant, grain fineness number, amount of clay, amount of moisture and number of rammings are varied. Experiments are conducted based on Taguchi's L9 orthogonal array, and cast specimens are tested to obtain their mechanical properties and density. The results are evaluated to optimize the process parameters at three levels. The optimum levels are found to be a grain fineness number of 55, two rammings, 12 percent clay and 13 percent moisture. Based on the optimum levels of the process parameters, a confirmation test is conducted and the results are found to be within the confidence level.
The Adoption of a National Cloud Framework for Healthcare Delivery in NigeriaIJERA Editor
A national cloud framework (NCF), based on the cloud computing framework, is the idea of a cloud that is completely owned and managed by the government of a country for the sole purpose of delivering social amenities to its citizenry, with the aim of achieving the mandate of governance. The emergence of the internet, and more specifically the idea of the cloud, has brought about the application of information technology in almost all areas of human activity, one of which is the healthcare sector. This paper draws out a framework for the adoption of eHealth into cloud computing through the NCF model for the easy delivery of healthcare services by the government of Nigeria. The paper addresses challenges concerning eHealth in the Nigerian health sector, shows how the NCF model can help resolve these issues, and proposes the adoption of this framework by other developing countries.
A SURVEY OF CLUSTERING ALGORITHMS IN ASSOCIATION RULES MININGijcsit
The main goal of cluster analysis is to classify elements into groups based on their similarity. Clustering
has many applications, such as astronomy, bioinformatics, bibliography and pattern recognition. In this
paper, a survey of clustering methods and techniques, with identification of the advantages and
disadvantages of these methods, is presented to give a solid background for choosing the best method to
extract strong association rules.
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN...acijjournal
Apriori is one of the key algorithms to generate frequent itemsets. Analysing frequent itemset is a crucial
step in analysing structured data and in finding association relationship between items. This stands as an
elementary foundation to supervised learning, which encompasses classifier and feature extraction
methods. Applying this algorithm is crucial to understand the behaviour of structured data. Most of the
structured data in scientific domain are voluminous. Processing such kind of data requires state of the art
computing machines. Setting up such an infrastructure is expensive. Hence a distributed environment
such as a clustered setup is employed for tackling such scenarios. Apache Hadoop distribution is one of
the cluster frameworks in distributed environment that helps by distributing voluminous data across a
number of nodes in the framework. This paper focuses on map/reduce design and implementation of
Apriori algorithm for structured data analysis.
DECISION TREE CLUSTERING: A COLUMN-STORES TUPLE RECONSTRUCTION cscpconf
Column-stores have gained market share as a promising physical storage alternative for analytical queries. However, for multi-attribute queries column-stores pay performance
penalties due to on-the-fly tuple reconstruction. This paper presents an adaptive approach for reducing tuple reconstruction time. The proposed approach exploits a decision tree algorithm to
cluster attributes for each projection and also eliminates frequent database scanning. Experimentation with TPC-H data shows the effectiveness of the proposed approach.
Extended PSO algorithm for improving the k-means clustering algorithm IJMIT JOURNAL
Clustering is an unsupervised process and one of the most common data mining techniques. The
purpose of clustering is to group similar data together, so that instances within a cluster are most similar
to each other and most different from instances in other clusters. In this paper we focus on partitional
k-means clustering: owing to its ease of implementation and high-speed performance on large data sets,
it is still very popular among clustering algorithms after 30 years. To address the problem of k-means
settling in local optima, we propose an extended PSO algorithm named ECPSO.
Our new algorithm is able to escape local optima and produces the problem's optimal answer with high
probability. The results show that the proposed algorithm performs better than other clustering
algorithms, especially on two indexes: clustering accuracy and clustering quality.
Experimental study of data clustering using k-Means and modified algorithms IJDKP
The k-Means clustering algorithm is an old algorithm that has been intensely researched owing to its ease
and simplicity of implementation. Clustering algorithms have broad appeal and usefulness in
exploratory data analysis. This paper presents results of an experimental study of different approaches to
k-Means clustering, comparing results on different datasets using the original k-Means and other
modified algorithms implemented in MATLAB R2009b. The results are calculated on performance
measures such as number of iterations, number of points misclassified, accuracy, Silhouette validity index
and execution time.
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA...ijcsit
This paper introduces a novel approach for efficient video categorization. It relies on two main
components. The first one is a new relational clustering technique that identifies video key frames by
learning cluster dependent Gaussian kernels. The proposed algorithm, called clustering and Local Scale
Learning algorithm (LSL) learns the underlying cluster dependent dissimilarity measure while finding
compact clusters in the given dataset. The learned measure is a Gaussian dissimilarity function defined
with respect to each cluster. We minimize a single objective function to obtain both the optimal partition and the
cluster dependent parameter. This optimization is done iteratively by dynamically updating the partition
and the local measure. The kernel learning task exploits the unlabeled data and reciprocally, the
categorization task takes advantages of the local learned kernel. The second component of the proposed
video categorization system consists in discovering the video categories in an unsupervised manner using
the proposed LSL. We illustrate the clustering performance of LSL on synthetic 2D datasets and on high
dimensional real data. Also, we assess the proposed video categorization system using a real video
collection and LSL algorithm.
Online stream mining approach for clustering network traffic eSAT Journals
Abstract: A large number of approaches have been proposed for intrusion detection systems, leading to the implementation of agent-based intelligent IDS (IIDS), non-intelligent IDS (NIDS), signature-based IDS, etc. While building such IDS models, learning algorithms over the flow of network traffic play a crucial role in the accuracy of IDS systems. The proposed work focuses on implementing a novel method to cluster network traffic that eliminates the limitations of existing online clustering algorithms and proves its robustness and accuracy over large streams of network traffic arriving at an extremely high rate. We compare the existing algorithm with the novel methods to analyse accuracy and complexity. Keywords: NIDS, Data Stream Mining, Online Clustering, RAH algorithm, Online Efficient Incremental Clustering algorithm
Data mining is a process to extract information from a huge amount of data and transform it into an
understandable structure. Data mining provides a number of tasks to extract data from large databases,
such as classification, clustering, regression and association rule mining. This paper deals with
classification, an important data mining technique based on machine learning which is used to classify
each item on the basis of its features with respect to a predefined set of classes or groups. The paper
summarises various techniques implemented for classification, such as k-NN, Decision Tree, Naïve
Bayes, SVM, ANN and RF. The techniques are analysed and compared on the basis of their advantages
and disadvantages.
Study of Density Based Clustering Techniques on Data Streams
Darshali Thoriya, Int. Journal of Engineering Research and Applications, www.ijera.com
ISSN: 2248-9622, Vol. 5, Issue 2, Part 2, February 2015, pp. 40-47
Study of Density Based Clustering Techniques on Data Streams
Darshali Thoriya*, Madhu Shukla**
*(Department of Computer Science, Marwadi Education Foundation Group of Institutes, Rajkot-3)
** (Department of Computer Science, Marwadi Education Foundation Group of Institutes, Rajkot-3)
ABSTRACT
Data streams are generated by many real-time systems. Stream data is fast-changing and massive.
Traditional data mining methods are not efficient for stream data, so many methodologies have been
developed for stream data processing. Many applications require grouping data based on its
characteristics, so clustering is applied to data streams; for clustering non-linear data, density-based
clustering is used. A review of clustering algorithms and methodologies is presented and evaluated against
user requirements. A study of density-based clustering algorithms is presented here because of the
advantages of density-based methods over other clustering methods.
Keywords - Data Streams, Clustering, Density Based Clustering, Algorithms, Non-Linear Data
I. Introduction
Nowadays, organizations and scientific fields have
very large databases. Astronomy,
telecommunication operations, stock market
applications, social media, website analysis, banking,
e-commerce, network data, network intrusion
detection, weather monitoring, planetary remote
sensing, meteorological data and phone records are
all examples of such large data. This kind of data is
known as stream data. Stream data is an ordered,
massive, fast-changing, continuous and potentially
infinite sequence. Mining of stream data is necessary
to find patterns, detect changes in data, make better
decisions and discover new facts. Traditional data
mining methods are not useful for mining data
streams, so many algorithms and methodologies have
been developed to mine stream data. Processing
stream data effectively requires new techniques, data
structures and algorithms, because there is not
enough space to store such a large amount of data.
Random sampling, sliding windows, histograms,
multi-resolution methods, sketches and randomized
algorithms are the basic data structures and
methodologies for mining data streams [13].
Classification of stream data is not efficiently
possible with the simple classification algorithms of
data mining; it is possible with the Hoeffding tree
algorithm, the very fast decision tree (VFDT), the
concept-adaptive very fast decision tree (CVFDT)
and classifier ensemble approaches. Web clickstream
analysis, stock market analysis and network intrusion
detection require clustering of stream data.
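Of the basic methodologies listed above, random sampling is commonly realized on streams with reservoir sampling (Vitter's Algorithm R), which keeps a uniform sample without storing the whole stream. A minimal sketch, with the stream and sample size chosen purely for illustration:

```python
import random

def reservoir_sample(stream, k, seed=42):
    """Keep a uniform random sample of k items from a stream of
    unknown length, using O(k) memory (Algorithm R)."""
    rng = random.Random(seed)
    sample = []
    for i, item in enumerate(stream):
        if i < k:
            sample.append(item)          # fill the reservoir first
        else:
            j = rng.randrange(i + 1)     # replace with probability k/(i+1)
            if j < k:
                sample[j] = item
    return sample

print(reservoir_sample(range(1000), 5))
```

Each arriving item is processed once in O(1) time, matching the one-pass requirement of stream mining.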
There are some requirements for any clustering
algorithm. The algorithm's representation must be
compact, because a lengthy representation is not
always affordable. Processing of new data points
must be fast, and identification of outliers must be
clear and fast; the decision of what to do with the
outliers should be taken simultaneously [1]. There
are also challenges and issues in data stream
clustering. Accuracy, efficiency, compactness,
separateness, space limitation and cluster validity are
important issues for the quality of clusters. Data in
streams is uncertain, which is another challenge for
clustering, and different data types should be treated
differently. The arbitrary shape of clusters makes it
hard to distinguish their accurate shape [2].
Clustering methods are as follows: partitioning
methods, hierarchical methods, model-based
methods, density-based methods, grid-based
methods, constraint-based methods and evolutionary
methods.
II. ALGORITHMS OF CLUSTERING
1. BIRCH
The BIRCH algorithm builds a CF-tree, a
height-balanced hierarchical data structure. Each
node of the CF-tree contains a CF-vector that
summarizes the information of the nodes beneath it
[1]. BIRCH produces the best clusters possible with
the available main memory and decreases the I/O
requirement. The complexity of the algorithm is
O(N) because it is a one-pass algorithm. BIRCH is
not useful for arbitrary shapes because the nodes of
the CF-tree have limited space. It does not provide a
compact representation, because more memory is
required to process the data, and it takes a
considerable amount of time to minimize the I/O
requirements. BIRCH reserves some space to handle
outliers and periodically re-evaluates them to check
whether they can be absorbed into the current tree
without increasing its size [1]. Fig. 1 shows how the
BIRCH algorithm works, in four phases. In the first
phase, data are loaded into main memory by building
the CF-tree; subsequent phases then become fast,
accurate and less order sensitive. The second phase is
optional: the data are condensed to build a smaller
CF-tree. The third phase is the global clustering
phase, where a clustering algorithm is applied to the
CF-leaves. The fourth phase is an optional, offline
phase: additional passes are applied over the dataset
to reassign data points to the closest centroids from
phase 3; refining is optional. This is how the BIRCH
algorithm works. The figure shows the flow diagram
of the BIRCH algorithm, where the four phases are
displayed with their inputs and individual workings.
As per the diagram, data is given as input and the
final clusters are obtained as output.
Fig.1. Working of BIRCH Algorithm
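The CF-vector stored at each node can be sketched as the triple (N, LS, SS): the point count, the linear sum and the squared sum of the members. The sketch below is a 1-D illustration of that data structure, not the paper's implementation; real CF-trees keep d-dimensional LS and SS.

```python
import math

class CFVector:
    """Clustering feature (N, LS, SS) for 1-D points: a constant-size
    summary from which centroid and radius can be recovered."""
    def __init__(self, n=0, ls=0.0, ss=0.0):
        self.n, self.ls, self.ss = n, ls, ss

    def add(self, x):
        # absorbing one new point updates the summary in O(1)
        self.n += 1
        self.ls += x
        self.ss += x * x

    def merge(self, other):
        # CF additivity: two sub-clusters combine by component-wise sums
        return CFVector(self.n + other.n, self.ls + other.ls, self.ss + other.ss)

    def centroid(self):
        return self.ls / self.n

    def radius(self):
        # root-mean-square distance of the members from the centroid
        return math.sqrt(max(self.ss / self.n - self.centroid() ** 2, 0.0))

cf = CFVector()
for x in (1.0, 2.0, 3.0):
    cf.add(x)
print(cf.centroid())  # → 2.0
```

The additivity property is what lets BIRCH summarize an entire subtree in its parent node without revisiting raw data.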
2. COBWEB
COBWEB is a clustering algorithm that uses an
incremental clustering technique. For classification
and tree formation it uses a category utility function,
and a hierarchical clustering model is used as the
classification tree. In this tree, each node holds a
description of the objects classified under it. The
algorithm has certain disadvantages: COBWEB does
not provide a compact representation, and the time
complexity of adding a new point is high. However,
identification of outliers is done effectively by
COBWEB [1].
3. STREAM
The STREAM algorithm makes batches of points
that fit in main memory and then processes the data
stream. STREAM uses the LOCALSEARCH
algorithm, which runs in time linear in the number of
points [1]. STREAM provides a compact
representation but has a higher time complexity, and
it does not identify outliers, which is a considerable
disadvantage.
4. Fractal Clustering Algorithm
The FC algorithm uses the parameter of fractal
impact: it places each point in the cluster with
minimum fractal impact [1]. FC is capable of finding
clusters of arbitrary shapes, has a compact
representation and a lower time complexity, and
identifies outliers effectively. So the impact of the
FC algorithm is better than that of BIRCH, COBWEB
and STREAM.
5. CluStream
CluStream combines ideas from BIRCH and
STREAM. The concepts of micro-clustering and
macro-clustering are applied in an online and an
offline component. The algorithm uses a CF-vector
to store the data summary and employs a pyramid
structure for organizing clusters. It achieves
acceptably high efficiency and accuracy.
6. SPKMEANS Algorithm
SPKMEANS, the spherical k-means algorithm, is an
improved version of k-means. In SPKMEANS all
vectors are normalized, and the distance measure
between them is cosine similarity [2]. The cluster
center is set so as to make the angles between it and
its component vectors both uniform and minimal.
7. Density Based Clustering
Density-based clustering methods are useful because
they can find clusters of arbitrary shapes and handle
noisy data efficiently. They are one-scan algorithms,
because the raw data is examined only once [3].
These are the advantages of density-based clustering
algorithms. In density-based clustering, clusters are
defined as areas of higher density than the remainder
of the data set, and one cluster is separated from the
others by lower-density regions. Many algorithms
have been developed under the density-based
clustering method: DBSCAN, GDBSCAN, OPTICS
and DenClu.
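The core DBSCAN procedure can be sketched in a few lines. The toy points, eps and MinPts below are chosen only for the demo:

```python
import math

def dbscan(points, eps, min_pts):
    """Minimal DBSCAN sketch: labels of -1 mark noise, other labels are
    cluster ids. min_pts counts the point itself."""
    def neighbors(i):
        return [j for j, q in enumerate(points) if math.dist(points[i], q) <= eps]

    labels = [None] * len(points)
    cluster = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:      # not a core point: tentatively noise
            labels[i] = -1
            continue
        labels[i] = cluster           # start a new cluster at this core point
        queue = [j for j in seeds if j != i]
        while queue:                  # grow through density-reachable points
            j = queue.pop()
            if labels[j] == -1:       # border point: reclaim former noise
                labels[j] = cluster
            if labels[j] is not None:
                continue
            labels[j] = cluster
            nb = neighbors(j)
            if len(nb) >= min_pts:    # j is also a core point: keep expanding
                queue.extend(nb)
        cluster += 1
    return labels

# hypothetical toy data: two dense groups and one far-away noise point
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (50, 50)]
print(dbscan(pts, eps=2.0, min_pts=3))  # → [0, 0, 0, 1, 1, 1, -1]
```

The cluster shape is never assumed: a cluster is simply whatever region the density-reachability expansion covers, which is why arbitrary shapes fall out naturally.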
8. OCTS and OCTSM algorithm
OCTS is an online clustering algorithm, and
OCTSM is its extended version. There are two
phases in the OCTS algorithm: 1) an offline
initialization phase and 2) an online clustering
process. A topic signature model is generated in the
offline phase [2], where the Xract tool is used. OCTS
provides good cluster quality. OCTSM was
developed to improve the accuracy of the algorithm:
it introduces a merging process, using a MergeFactor,
that gives inactive clusters a second chance.
[Figure: the four phases of BIRCH. Phase 1: load data
into memory by building a CF-tree. Phase 2: condense
into a desirable range by building a smaller CF-tree.
Phase 3: global clustering. Phase 4: cluster refining.
Flow: data → initial CF-tree → smaller CF-tree →
good clusters → better clusters.]
3. Darshali Thoriya Int. Journal of Engineering Research and Applications www.ijera.com
ISSN : 2248-9622, Vol. 5, Issue 2, ( Part -2) February 2015, pp.40-47
www.ijera.com 42 | P a g e
9. New Weighted Fuzzy C-Means Algorithm
The new weighted fuzzy C-Means algorithm
(NW-FCM) was developed to overcome the
disadvantages of FCM and FWCM. The algorithm is
useful for unsupervised learning. NW-FCM uses two
concepts: the weighted mean from nonparametric
weighted feature extraction (NWFE) and the cluster
mean from discriminant analysis feature extraction
(DAFE) [2]. It achieves better classification accuracy
and stability than other fuzzy logic algorithms and is
used for high-dimensional multiclass pattern
recognition problems.
III. DENSITY BASED CLUSTERING
Density based clustering methods can detect
clusters of arbitrary shape and can also handle noise.
The main density based clustering algorithms are
DBSCAN, OPTICS and DENCLUE [11], of which
DBSCAN is the algorithm that represents the density
based method most effectively. DBSCAN is a density
based clustering algorithm which turns high-density
regions into clusters; it can find clusters of arbitrary
shape and can handle noisy data. Its parameters are
the radius of the neighbourhood (ε) and the minimum
number of objects in a neighbourhood (MinPts). The
algorithm works as follows: it randomly selects a
point and checks whether the ε-neighbourhood of the
point contains at least MinPts points [3]. If it does,
the point is considered a core point and a new cluster
is created; if not, it is considered a noise point.
DBSCAN then iteratively adds the data points which
do not yet belong to any cluster and are directly
density-reachable. The algorithm continues until all
points have been visited and no new point can be
added. Figure 3 shows DBSCAN performed on a
simplified data set; the dashed line displays the two-
dimensional distance radius and the minimum points
threshold is 1.
Fig 3 DBSCAN on simplified dataset
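The steps above can be sketched directly in Python; in this illustrative version neighbourhood counts include the point itself, the label -1 marks noise, and the brute-force neighbour search is kept only for clarity:

```python
def dbscan(points, eps, min_pts):
    """Plain DBSCAN sketch: a point whose eps-neighbourhood (itself
    included) holds at least min_pts points is a core point and seeds a
    cluster; density-reachable points join it; the rest is noise (-1)."""
    def neighbours(i):
        return [j for j, q in enumerate(points)
                if sum((a - b) ** 2 for a, b in zip(points[i], q)) <= eps ** 2]

    labels = [None] * len(points)      # None = unvisited, -1 = noise
    cid = 0
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = -1             # tentatively noise; may become a border point
            continue
        labels[i] = cid                # core point: start a new cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cid        # noise reclaimed as a border point
                continue
            if labels[j] is not None:
                continue
            labels[j] = cid
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:   # j is itself core: keep expanding
                queue.extend(nbrs)
        cid += 1
    return labels
```

Note how a point first labelled noise can later be reclaimed as a border point of a cluster, exactly as in the description above.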
IV. DENSITY BASED CLUSTERING
ALGORITHMS ON DATA STREAMS
1. DenStream algorithm
DenStream can handle noise and uses a fading
window model for clustering stream data. The
algorithm extends the micro-cluster concept into core
micro-clusters, potential micro-clusters and outlier
micro-clusters in order to distinguish real data from
outliers [11]. It is based on the online-offline
framework.
Fig 3 Algorithm of DenStream
A core micro-cluster is defined as CMC(W, C, R),
where W is the weight, C is the center and R is the
radius. The DenStream procedure works as follows:
first define the minimal time span for a micro-cluster,
then get the next point at the current time from the
data stream and run the merging process. In the
merging process we first try to merge the point into
the nearest potential micro-cluster; if it does not fit,
we try to merge it with the nearest outlier micro-
cluster and check the weight of the resulting micro-
cluster [11]. This process is repeated, and whenever a
clustering request arrives the final clusters are
generated. A drawback is that DenStream does not
release any memory space by either deleting a micro-
cluster or merging two old micro-clusters [11].
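A minimal sketch of the CMC(W, C, R) summary and one merging step follows; the eps radius bound and the promotion threshold beta_mu are assumed illustrative parameters, and the fading step (applied between arrivals in the full algorithm) is included as a method but not exercised here:

```python
import copy
import math

class MicroCluster:
    """Core-micro-cluster summary CMC(W, C, R), kept incrementally as a
    weight plus linear and squared sums of the absorbed points."""
    def __init__(self, point):
        self.w = 1.0
        self.ls = list(point)
        self.ss = [x * x for x in point]

    def center(self):
        return [s / self.w for s in self.ls]

    def radius(self):
        c = self.center()
        var = sum(s / self.w - ci * ci for s, ci in zip(self.ss, c))
        return math.sqrt(max(var, 0.0))

    def fade(self, dt, lam):
        f = 2.0 ** (-lam * dt)          # fading window: old points lose weight
        self.w *= f
        self.ls = [x * f for x in self.ls]
        self.ss = [x * f for x in self.ss]

    def absorb(self, point):
        self.w += 1.0
        self.ls = [a + b for a, b in zip(self.ls, point)]
        self.ss = [a + b * b for a, b in zip(self.ss, point)]

def merge(point, p_clusters, o_clusters, eps=2.0, beta_mu=3.0):
    """One merging step: try the nearest potential micro-cluster, then the
    nearest outlier micro-cluster (promoting it when its weight reaches
    beta_mu), else open a new outlier micro-cluster."""
    for pool, promote in ((p_clusters, False), (o_clusters, True)):
        if pool:
            i = min(range(len(pool)),
                    key=lambda k: math.dist(pool[k].center(), point))
            trial = copy.deepcopy(pool[i])
            trial.absorb(point)
            if trial.radius() <= eps:   # merge only if the radius stays bounded
                pool[i] = trial
                if promote and trial.w >= beta_mu:
                    p_clusters.append(pool.pop(i))
                return
    o_clusters.append(MicroCluster(point))
```

The trial-copy pattern mirrors the "try to merge, then check" wording above: the point is only committed if the enlarged micro-cluster still satisfies the radius constraint.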
2. StreamOptics
OPTICS stands for Ordering Points To Identify
the Clustering Structure. StreamOptics extends the
OPTICS algorithm to data streams using the micro-
cluster concept. It computes an augmented cluster
ordering for automatic and interactive analysis,
maintaining a proper ordering of the objects in the
database and storing the core distance and
reachability distance of each object [12]. Like
DenStream, it uses potential micro-clusters and
outlier micro-clusters. The StreamOptics algorithm
works as follows: first the neighbourhood of each
potential micro-cluster is determined, then an ordered
list of potential micro-clusters is built using the
reachability distance, and finally a reachability plot
representing the micro-cluster structure is produced
using the OPTICS algorithm [12]. The algorithm
provides a three-dimensional plot which shows the
evolution of the cluster structure over time, but it is
not useful for automatic cluster extraction and the
three-dimensional plot requires manual checking.
3. MR-Stream Algorithm
MR-Stream applies density based clustering to
data streams at multiple resolutions. It improves
performance by running the offline component at a
steady time [10] and uses a tree-like data structure.
MR-Stream has an online phase and an offline phase.
The online component works as follows: first the tree
is initialized; then, while the data stream is active,
records are read and the two methods updateTree and
pruneTree are applied using the algorithm's
parameters [10]. The next step is to sample the
number of nodes in the tree T, after which the update
and prune methods are applied again. After the online
component, the offline component generates the
cluster cells. MR-Stream proposes a memory
sampling method to trigger the offline component at
the right time [10], and it relates the nodes of the tree
to the evaluated clusters. A disadvantage is that MR-
Stream keeps the sparse grids and merges them for
consideration as a noise cluster.
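A rough sketch of the tree-based online step follows; it assumes a binary split per dimension at each level and omits decay, so the updateTree/pruneTree names match the description above but the details are illustrative:

```python
class Cell:
    """One node of the MR-Stream-style tree: a cell at some resolution,
    holding a weight and child cells at the next finer resolution."""
    def __init__(self):
        self.w = 0.0
        self.children = {}

def update_tree(root, point, bounds, H):
    """updateTree sketch: descend H levels, halving the cell per dimension
    and incrementing the weight of every cell on the path."""
    node = root
    lo, hi = list(bounds[0]), list(bounds[1])
    node.w += 1.0
    for _ in range(H):
        idx = []
        for d in range(len(point)):
            mid = (lo[d] + hi[d]) / 2.0
            if point[d] < mid:
                hi[d] = mid
                idx.append(0)
            else:
                lo[d] = mid
                idx.append(1)
        node = node.children.setdefault(tuple(idx), Cell())
        node.w += 1.0

def prune_tree(node, sparse_w):
    """pruneTree sketch: drop subtrees whose weight fell below the sparse
    threshold, then recurse into the survivors."""
    node.children = {k: c for k, c in node.children.items() if c.w >= sparse_w}
    for c in node.children.values():
        prune_tree(c, sparse_w)
```

Coarser clusterings come from reading the tree at a shallow depth and finer ones from a deeper depth, which is what "multiple resolutions" refers to.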
4. D-Stream Algorithm
D-Stream was developed to cluster data in real
time. It addresses two main challenges: first, it is not
desirable to treat the data stream as one long
sequence of data, and second, it is not possible to
retain the density information for every data point
[3]. The overall process of D-Stream is as follows.
First an empty hash table is initialized as the grid list;
then a record is read from the data stream and its
density grid is determined. If the density grid g is not
yet in the grid list, it is inserted; then the
characteristic vector of the density grid is updated,
the initial clustering and adjust clustering methods
are called, and the timestamp is increased
continuously. In the initial clustering process the
density of all grids that are active at the current time
is updated, after which the clustering procedure is the
same as standard density based clustering. The
process of dynamically adjusting clusters takes the
grid list as input: if a density grid g is a sparse grid, g
is deleted from its cluster and labelled NO_CLASS,
and if the cluster becomes unconnected it is split into
two clusters [3]. If g is a dense grid, the algorithm
finds, among all neighbouring grids of g, the grid h
whose cluster has the largest size and checks whether
h is a sparse, dense or transitional grid. In this way
the adjust clustering process takes place [3]. Figure 4
illustrates the density grid: data points are assigned to
grids, grids are weighted by the recency of the points
added to them, and each grid is associated with one
label. After the offline processing we get the final
clusters. D-Stream handles outliers effectively and
proposes a density decaying function to adjust the
clusters on real-time data. However, for the time
interval gap it considers only the minimum time,
whereas the gap depends on many parameters, which
is a disadvantage of the algorithm.
Fig 4 Illustration of the use of density grid
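The hash-table grid list with a lazily decayed characteristic vector can be sketched as follows; the cell size and decay factor are illustrative parameters:

```python
import math

class DStreamGrids:
    """Sketch of the D-Stream online step: each record maps to a density
    grid whose characteristic vector (last update time, decayed density)
    lives in a hash table and is updated lazily on access."""
    def __init__(self, cell=1.0, lam=0.998):
        self.cell = cell          # side length of each density grid
        self.lam = lam            # decay factor applied per time unit
        self.grids = {}           # hash table: grid coords -> (t_last, density)

    def key(self, record):
        """Map a record to the coordinates of its density grid."""
        return tuple(math.floor(x / self.cell) for x in record)

    def density(self, g, t):
        """Decayed density of grid g at time t (0 if the grid is unseen)."""
        t_last, d = self.grids.get(g, (t, 0.0))
        return d * self.lam ** (t - t_last)

    def add(self, record, t):
        """Update the characteristic vector of the record's grid."""
        g = self.key(record)
        self.grids[g] = (t, self.density(g, t) + 1.0)
```

Storing only the last update time and letting the density decay lazily is what makes it feasible to avoid retaining density information for every individual data point.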
5. HDDStream Algorithm
HDDStream clusters high-dimensional data
streams. Its online phase keeps a summary of both
the points and the dimensions, and its offline phase
generates the final clusters based on the projected
clusters [8]. HDDStream has three main parts: first,
an initial set of micro-clusters is extracted; then the
micro-clusters are maintained online as new points
arrive and old points expire due to ageing, using an
adding method that updates the dimension
preferences and finds the closest micro-cluster;
finally, the clusters are extracted on demand [8].
HDDStream can cluster high-dimensional data
effectively, but at pruning time it checks only the
micro-cluster weights, whereas the dimension
preferences should also be checked. This is the
disadvantage of the HDDStream algorithm.
6. DENGRIS Algorithm
DENGRIS-Stream is a DENsity GRId-based
algorithm for clustering data streams over a sliding
window. The algorithm maps each input into grids
and, like the other algorithms, has online and offline
components. The online component reads the data
stream and maps the data to the related grid cells [4],
updating their feature vectors; the offline component
adjusts the clusters within the sliding window. The
DENGRIS algorithm works as follows: first an
empty tree is initialized for the grid list; then, while
the data stream is active, records are read from the
stream and their grid cells are determined. If a grid
cell is not in the grid list it is inserted, otherwise the
feature vector of the grid cell is updated; after that
the generate initial clusters, adjust clusters and
remove expired grids procedures are applied [4]. The
RemoveExpiredGrids process detects and removes
expired grids based on the grid's timestamp. Two
other processes are used in the DENGRIS algorithm:
Generate Initial Clusters and Adjust Clusters.
Generate Initial Clusters initializes the clusters by
considering each dense grid as a unique cluster [4].
In Adjust Clusters the sparse grids are removed from
the grid list while the dense grids are clustered;
neighbouring grids are checked with respect to the
largest cluster [4] and each grid is assigned to a
cluster according to its size. DENGRIS has not yet
been compared with any other current algorithm, so a
comparison is needed for a proper evaluation.
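The RemoveExpiredGrids step can be sketched as follows, assuming each grid's feature vector carries a last-update timestamp (the data layout is illustrative):

```python
def remove_expired_grids(grid_list, t_now, window):
    """DENGRIS-style pruning sketch: every grid whose last-update
    timestamp has slid out of the window [t_now - window, t_now] is
    dropped before any further processing; returns the removed keys."""
    expired = [g for g, (t_last, _) in grid_list.items()
               if t_last < t_now - window]
    for g in expired:
        del grid_list[g]
    return expired
```

Removing expired grids before the clustering steps is what the paper credits for the time and memory savings.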
7. SOMKE Algorithm
SOMKE performs kernel density estimation
(KDE) over streaming data and data streams, based
on a sequence of self-organizing maps (SOMs). It
performs well compared to other algorithms.
SOMKE consists of three main steps: the first creates
the SOM sequence, which is built to summarize the
information of Wt needed for the KDE analysis; the
second merges the SOM sequences, based on the
Kullback-Leibler divergence [5].
Fig. 5 SOMKE: the proposed system architecture of
the SOM-based KDE method (example of 2-D data).
Final estimation, the last step of SOMKE, uses the
SOM sequence to estimate the PDFs over arbitrary
time periods of the data stream. An advantage of
SOMKE is that it can be used effectively and
efficiently in the context of non-stationary data
streams. Figure 5 shows the proposed system
architecture of the SOM-based KDE method.
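The final-estimation step can be sketched for one-dimensional data: the PDF is a weighted sum of kernels placed on the SOM prototype vectors rather than on the raw stream points. The Gaussian kernel and the bandwidth h here are illustrative choices, not the paper's exact settings:

```python
import math

def som_kde(x, prototypes, weights, h=1.0):
    """Estimate the PDF at x from SOM prototype (neuron) positions and
    their hit weights, using a weighted Gaussian kernel sum."""
    total = sum(weights)
    return sum(w * math.exp(-(x - p) ** 2 / (2.0 * h * h))
               for p, w in zip(prototypes, weights)) / (total * h * math.sqrt(2.0 * math.pi))
```

Because only the prototypes and their weights are kept, the estimate can be formed over arbitrary time periods without revisiting the raw stream.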
8. LeaDen-Stream Algorithm
LeaDen-Stream is a Leader Density-based
clustering algorithm over data streams. It is a two-
phase clustering method: the online phase selects the
appropriate mini-micro cluster leaders or micro
cluster leaders depending on the distribution of the
data points in the micro-cluster, and the offline phase
makes the final clusters.
The LeaDen-Stream algorithm takes a data
stream as input and outputs arbitrary-shape clusters.
It reads the data stream and applies two methods to
it, Adjust Leader Cluster and Pruning Leader Cluster
[6]. The Adjust Leader Cluster method outputs a list
of mini-micro leader clusters and micro leader
clusters: every data point is analysed and the nearest
micro leader cluster is found. If the distance is less
than lm, the distance to the nearest mini-micro leader
cluster is computed; if that is also small enough, the
data point is merged into the cluster, otherwise a new
mini-micro leader cluster or micro leader cluster is
formed as the conditions require [6]. Then all the
MLC (micro leader clusters) and MMLC (mini-
micro leader clusters) are checked: if all the MMLC
of an MLC are sparse, the MLC is deleted; if all the
MMLC are dense, the MLC center is added to the
list; if some MMLC are dense and some sparse, all
the DMMLC (dense mini-micro leader cluster)
centers are added and the SMMLC (sparse mini-
micro leader clusters) are removed [6]. In this way a
list of centers is obtained, from which the final
clusters are formed. A clustering algorithm is still
needed to get the final clusters: when a clustering
request arrives, DBSCAN is run on the micro and
mini-micro leader cluster centers, with each center
used as a virtual point for clustering.
Fig. 3 Mini micro and micro leader clusters
Efficiency is measured by execution time. The clustering quality of LeaDen-Stream is higher than that of
CluStream with lower execution time, and its quality is equal to that of DenStream while it runs faster than
DenStream.
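One Adjust Leader Cluster step can be sketched as follows, under assumed distance thresholds lm (micro leader) and lmm (mini-micro leader); the dictionary layout is illustrative:

```python
import math

def adjust_leader(point, leaders, lm=2.0, lmm=0.5):
    """Find the nearest micro leader; if it is within lm, either merge the
    point into a mini-micro leader within lmm or open a new mini-micro
    leader under it; otherwise start a new micro leader for the point."""
    if leaders:
        ml = min(leaders, key=lambda L: math.dist(L["center"], point))
        if math.dist(ml["center"], point) <= lm:
            if ml["minis"]:
                mm = min(ml["minis"], key=lambda m: math.dist(m["center"], point))
                if math.dist(mm["center"], point) <= lmm:
                    mm["w"] += 1        # point merged into the mini-micro leader
                    return
            ml["minis"].append({"center": list(point), "w": 1})
            return
    # no micro leader close enough: the point becomes a new leader
    leaders.append({"center": list(point),
                    "minis": [{"center": list(point), "w": 1}]})
```

The offline phase would then run DBSCAN over the surviving leader centers as virtual points, as described above.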
V. ANALYSIS OF DENSITY BASED CLUSTERING ALGORITHMS ON DATA
STREAMS
Here we compare the density based clustering algorithms on data streams in terms of their parameters,
advantages, disadvantages, technology and future scope. The DenStream, StreamOptics, MR-Stream,
D-Stream, HDDStream, DENGRIS, SOMKE and LeaDen-Stream algorithms are compared in the following
table.
Table 1: Comparison of Algorithms
DenStream Algorithm [11]
Parameters: cluster radius, cluster weight, outlier threshold, decay factor.
Advantages: gives arbitrary-shape clusters; handles the data stream effectively by distinguishing the potential clusters from the real outliers.
Disadvantages: does not release any memory space by either deleting a micro-cluster or merging two old micro-clusters; the pruning phase for removing outliers is a time-consuming process.
Technology: MOA tool.
Future scope: discovery of clusters with arbitrary shape at multiple levels of granularity; dynamic adaptation of the parameters in data streams; investigation of the framework for outlier detection and density-based clustering in other stream models, in particular the sliding window model.

StreamOptics Algorithm [12]
Parameters: potential micro-cluster list, core distance, reachability distance.
Advantages: cluster structure plot over time; provides a three-dimensional plot.
Disadvantages: not an automated method for cluster extraction; the generated three-dimensional plot needs manual checking.
Technology: WEKA and R.
Future scope: automate the generation of the three-dimensional plot and improve efficiency.

MR-Stream Algorithm [10]
Parameters: data stream, decay factor, dense cell threshold, sparse cell threshold.
Advantages: memory sampling method defines the right time for running the offline component; improves the performance of the clustering; gives clusters at multiple resolutions.
Disadvantages: keeps the sparse grids and merges them for consideration as a noise cluster; cannot work properly on high-dimensional data.
Technology: Apache Hadoop.
Future scope: make the algorithm more effective at detecting noise by using various approaches.

D-Stream Algorithm [3]
Parameters: data stream, decay factor, dense grid threshold, sparse grid threshold.
Advantages: proposes density decaying to adjust the clusters in real time and captures the evolving behaviour of data streams; has techniques for handling outliers; provides clusters of arbitrary shape; improves the quality parameters.
Disadvantages: for determining the time interval gap the algorithm considers only the minimum time, but the gap depends on many parameters; cannot handle high-dimensional data.
Technology: R and VC++ 6.0.
Future scope: define the time gap based only on the conversion of dense grids to sparse ones, since the conversion of a sparse grid to a dense one is already considered in the weight of the grid.

HDDStream Algorithm [8]
Parameters: cluster radius, cluster weight, outlier threshold, decay factor.
Advantages: arbitrary-shape clusters; clusters high-dimensional data; allows noise handling; does not require the number of clusters; monitors the behaviour of the data.
Disadvantages: searching the affected neighbouring clusters is a time-consuming process.
Technology: Java and C++.
Future scope: improve the time complexity by using various concepts.

DENGRIS Algorithm [4]
Parameters: data stream, sliding window size.
Advantages: captures the distribution of recent records precisely using the sliding window model; removes the expired grids before any processing on the grid list, which saves time and memory.
Disadvantages: no evaluation to show its effectiveness compared with other state-of-the-art algorithms.
Technology: KEEL software.
Future scope: compare with other density based clustering approaches to show its effectiveness.

SOMKE Algorithm [5]
Parameters: window size, neurons, topological neighbourhood.
Advantages: good performance compared to other algorithms; can be used on non-stationary data efficiently and effectively.
Disadvantages: cannot handle imbalanced data.
Technology: WEKA.
Future scope: compare with other SOM-based time-series approaches such as the patch clustering method, SOMAR and GSOMAR, and analyse the results for new insights and findings; study the magnification effect on SOM-based techniques; apply the algorithm to imbalanced data.

LeaDen-Stream Algorithm [6]
Parameters: MMLC and MLC weight, MMLC and MLC center, MMLC and MLC radius.
Advantages: increases cluster quality; decreases time complexity.
Disadvantages: cannot handle high-dimensional data; quality is the same as the DenStream algorithm.
Technology: MOA framework.
Future scope: automate the parameters of the algorithm and examine it in the sliding window model.
VI. CONCLUSION
Density-based clustering methods have the
special characteristic of being able to detect clusters
of arbitrary shape and to handle noise, which is why
so many clustering algorithms on data streams use
the density-based approach. In this paper we
surveyed a number of density based clustering
algorithms over data streams. The main contribution
of this paper is a comprehensive overview of the
density-based data stream clustering algorithms,
together with an evaluation table that summarizes
their parameters, advantages and disadvantages.
REFERENCES
Journal Papers
[1] Daniel Barbará, "Requirements for clustering
data streams," SIGKDD Explorations,
Volume 3, Issue 2, pp. 23-27.
[2] Madjid Khalilian, Norwati Mustapha, "Data
stream clustering: Challenges and issues,"
IMECS 2010, March 17-19, 2010, Hong
Kong.
[3] Yixin Chen and Li Tu, "Density-based
clustering for real-time stream data."
[4] Amineh Amini and Teh Ying Wah,
"DENGRIS-Stream: A density-grid based
clustering algorithm for evolving data
streams over sliding window," ICDMCE
2012, December 21-22, 2012, Bangkok,
Thailand, pp. 206-210.
[5] Yuan Cao, Haibo He and Hong Man,
"SOMKE: Kernel density estimation over
data streams by sequences of self-organizing
maps," IEEE Transactions on Neural
Networks and Learning Systems, Vol. 23,
No. 8, August 2012, pp. 1254-1267.
[6] Amineh Amini, Teh Ying Wah, "LeaDen-
Stream: A leader density-based clustering
algorithm over evolving data stream,"
Journal of Computer and Communications,
2013, 1, pp. 26-31.
[7] David Breitkreutz and Kate Casey,
"Clusterers: A comparison of partitioning
and density-based algorithms and a
discussion of optimisations."
[8] Irene Ntoutsi, Arthur Zimek, Themis
Palpanas, Peer Kröger, Hans-Peter Kriegel,
"Density-based projected clustering over
high dimensional data streams."
[9] Amineh Amini, Teh Ying Wah, and Hadi
Saboohi, "On density-based data streams
clustering algorithms: A survey," Journal of
Computer Science and Technology, 29(1):
116-141, Jan. 2014. DOI 10.1007/s11390-
013-1416-3.
[10] Li Wan, "Density-based clustering of data
streams at multiple resolutions," presented at
the Discover URECA @ NTU poster
exhibition and competition, Nanyang
Technological University, Singapore, March
2008.
[11] Feng Cao, Martin Ester, Weining Qian,
Aoying Zhou, "Density-based clustering
over an evolving data stream with noise."
[12] Ankerst M., Breunig M. M., Kriegel H.-P.,
Sander J., "OPTICS: Ordering points to
identify the clustering structure," ACM
SIGMOD Record, 1999, 28(2): 49-60.
Books:
[13] J. Han and M. Kamber, Data Mining:
Concepts and Techniques, Second edition,
2001.