Information extraction from data is a key necessity for data analysis, and the unsupervised nature of much real-world data demands complex computational methods. This paper presents Noise Reduced DBSCAN (NRDBSCAN), a modified variant of DBSCAN that integrates density-based spatial clustering with a one-class Support Vector Machine (SVM), a machine learning technique used here for noise reduction. Analysis of DBSCAN shows that it depends critically on accurate thresholds, without which it yields suboptimal results; yet identifying accurate threshold settings is rarely attainable in practice, and noise is one of the major side effects of this threshold gap. The proposed work reduces noise by integrating a machine learning classifier into the operating structure of DBSCAN. Experimental results indicate high homogeneity levels in the clustering process.
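A minimal sketch of the general idea in Python with scikit-learn: run DBSCAN, treat its noise label (-1) as candidate points, and let a one-class SVM trained on the accepted cluster members decide which noise points to reclaim. The pairing and all parameter values are illustrative assumptions, not the paper's exact pipeline.

```python
# Illustrative sketch: post-filter DBSCAN noise with a one-class SVM.
# eps, min_samples and nu are arbitrary demo values (assumptions).
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_blobs
from sklearn.svm import OneClassSVM

X, _ = make_blobs(n_samples=500, centers=3, cluster_std=0.8, random_state=0)
labels = DBSCAN(eps=0.5, min_samples=5).fit_predict(X)  # -1 marks noise

# Learn the "normal" region from points DBSCAN accepted as cluster members,
# then re-test the noise points against that boundary.
inliers, noise = X[labels != -1], X[labels == -1]
ocsvm = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale").fit(inliers)
rescued = noise[ocsvm.predict(noise) == 1]  # noise the SVM deems normal
print(f"{len(noise)} noise points, {len(rescued)} reclaimed by the SVM")
```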
Survey Paper on Clustering Data Streams Based on Shared Density between Micro... (IRJET Journal)
This document discusses a survey of clustering data streams based on shared density between micro-clusters. It describes how current reclustering approaches for data stream clustering ignore density information between micro-clusters, which can result in inaccurate cluster assignments. The paper proposes DBSTREAM, a new approach that captures shared density between micro-clusters using a density graph. This density information is then used in the reclustering process to generate final clusters based on actual density between adjacent micro-clusters rather than assumptions about data distribution.
A fuzzy clustering algorithm for high dimensional streaming data (Alexander Decker)
This document summarizes a research paper that proposes a new dimension-reduced weighted fuzzy clustering algorithm (sWFCM-HD) for high-dimensional streaming data. The algorithm can cluster datasets that have both high dimensionality and a streaming (continuously arriving) nature. It combines previous work on clustering algorithms for streaming data and high-dimensional data. The paper introduces the algorithm and compares it experimentally to show improvements in memory usage and runtime over other approaches for these types of datasets.
This document summarizes a research paper that proposes a new density-based clustering technique called Triangle-Density Based Clustering Technique (TDCT) to efficiently cluster large spatial datasets. TDCT uses a polygon approach where the number of data points inside each triangle of a polygon is calculated to determine triangle densities. Triangle densities are used to identify clusters based on a density confidence threshold. The technique aims to identify clusters of arbitrary shapes and densities while minimizing computational costs. Experimental results demonstrate the technique's superiority in terms of cluster quality and complexity compared to other density-based clustering algorithms.
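The counting step at the heart of TDCT can be sketched as follows; the triangles, dataset, and merge threshold are invented for illustration, and the paper's polygon construction is not reproduced.

```python
# Sketch of triangle-density counting via a barycentric sign test.
import numpy as np

def points_in_triangle(pts, a, b, c):
    """Boolean mask: which rows of pts lie inside triangle (a, b, c)."""
    def orient(p, q, r):
        return (p[:, 0] - r[0]) * (q[1] - r[1]) - (q[0] - r[0]) * (p[:, 1] - r[1])
    d1, d2, d3 = orient(pts, a, b), orient(pts, b, c), orient(pts, c, a)
    has_neg = (d1 < 0) | (d2 < 0) | (d3 < 0)
    has_pos = (d1 > 0) | (d2 > 0) | (d3 > 0)
    return ~(has_neg & has_pos)  # inside iff all orientations agree

rng = np.random.default_rng(0)
pts = rng.uniform(0, 10, size=(1000, 2))
t1 = (np.array([0, 0]), np.array([5, 0]), np.array([0, 5]))
t2 = (np.array([5, 0]), np.array([10, 0]), np.array([10, 5]))
d1 = points_in_triangle(pts, *t1).sum()  # density of triangle 1
d2 = points_in_triangle(pts, *t2).sum()  # density of triangle 2
# Merge neighboring triangles when their density ratio clears a threshold.
print("merge" if min(d1, d2) / max(d1, d2) > 0.5 else "keep separate")
```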
The document proposes a Modified Pure Radix Sort algorithm for large heterogeneous datasets. The algorithm divides the data into numeric and string processes that work simultaneously. The numeric process further divides data into sublists by element length and sorts them simultaneously using an even/odd logic across digits. The string process identifies common patterns to convert strings to numbers that are then sorted. This optimizes problems with traditional radix sort through a distributed computing approach.
This document summarizes a research paper on developing an improved LEACH (Low-Energy Adaptive Clustering Hierarchy) communication protocol for energy efficient data mining in multi-feature sensor networks. It begins with background on wireless sensor networks and issues like energy efficiency. It then discusses the existing LEACH protocol and its drawbacks. The proposed improved LEACH protocol includes cluster heads, sub-cluster heads, and cluster nodes to address LEACH's limitations. This new version aims to minimize energy consumption during cluster formation and data aggregation in multi-feature sensor networks.
This document summarizes an article that proposes a new image steganography technique using discrete wavelet transform. The technique applies an adaptive pixel pair matching method from the spatial domain to the frequency domain. Data is embedded in the middle frequencies of the discrete wavelet transformed image because they are more robust to attacks than high frequencies. The coefficients in the low frequency sub-band are preserved unchanged to improve image quality. The experimental results showed better performance with discrete wavelet transform compared to the spatial domain.
IRJET- Customer Segmentation from Massive Customer Transaction Data (IRJET Journal)
This document discusses various methods for customer segmentation through analysis of massive customer transaction data, including K-Means clustering, PAM clustering, agglomerative clustering, divisive clustering, and density-based clustering. It finds that K-Means is the most commonly used partitioning method. The document also reviews related work on customer segmentation and clustering algorithms like CLARA, CLARANS, BIRCH, ROCK, CHAMELEON, CURE, DHCC, DBSCAN, and LOF. It proposes a framework for an online shopping site that would apply these techniques to group customers based on their product preferences in transaction data.
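As a concrete illustration of the partitioning step, the following is a hypothetical RFM-style (recency, frequency, monetary) segmentation with scikit-learn's KMeans; the synthetic features and the choice of four segments are assumptions for the example.

```python
# Hypothetical K-Means customer segmentation on synthetic RFM features.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
rfm = np.column_stack([
    rng.exponential(30, 1000),   # recency: days since last purchase
    rng.poisson(5, 1000),        # frequency: number of transactions
    rng.gamma(2.0, 50.0, 1000),  # monetary: total spend
])
X = StandardScaler().fit_transform(rfm)  # scale so no feature dominates
segments = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X)
for k in range(4):
    print(f"segment {k}: {np.mean(segments == k):.1%} of customers")
```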
This document summarizes an article from the International Journal of Computer Engineering and Technology (IJCET) that proposes an algorithm called Replica Placement in Graph Topology Grid (RPGTG) to optimally place data replicas in a graph-based data grid while ensuring quality of service (QoS). The algorithm aims to minimize data access time, balance load among replica servers, and avoid unnecessary replications, while restricting QoS in terms of number of hops and deadline to complete requests. The article describes how the algorithm converts the graph structure of the data grid to a hierarchical structure to better manage replica servers and proposes services to facilitate dynamic replication, including a replica catalog to track replica locations and a replica manager to perform replication.
A Novel Dencos Model For High Dimensional Data Using Genetic Algorithms (ijcseit)
Subspace clustering is an emerging task that aims at detecting clusters entrenched in subspaces. Recent approaches fail to reduce their results to the relevant subspace clusters: the results are typically highly redundant, and they ignore the critical "density divergence problem" by using an absolute density value as the threshold for identifying dense regions in all subspaces. Given the varying region densities across subspace cardinalities, a more appropriate way to decide whether a region in a subspace is dense is to compare its density with the other region densities in that subspace. Based on this idea, and because previous techniques cannot be applied to this novel clustering model, the paper devises an algorithm called DENCOS (DENsity COnscious Subspace clustering) that adopts a divide-and-conquer scheme to efficiently discover clusters satisfying different density thresholds in different subspace cardinalities. DENCOS discovers clusters in all subspaces with high quality and significantly outperforms previous works in efficiency, demonstrating its practicability for subspace clustering, as validated by extensive experiments on a retail dataset. The work is extended with a genetic-algorithm-based clustering technique capable of optimizing the number of clusters for tasks with well-formed, well-separated clusters.
This document discusses using a particle swarm optimization algorithm to solve the K-node set reliability optimization problem for distributed computing systems. The K-node set reliability optimization problem aims to maximize the reliability of a subset of k nodes in a distributed computing system while satisfying a specified capacity constraint. It presents the problem formulation and describes particle swarm optimization, a metaheuristic optimization technique inspired by swarm intelligence. The proposed algorithm applies a discrete particle swarm optimization approach to solve the K-node set reliability optimization problem, which is demonstrated on an example distributed computing system topology.
Density Based Clustering Approach for Solving the Software Component Restruct... (IRJET Journal)
This document presents research on using the DBSCAN clustering algorithm to solve the problem of software component restructuring. It begins with an abstract that introduces DBSCAN and describes how it can group related software components. It then provides background on software component clustering and describes DBSCAN in more detail. The methodology section outlines the 4 phases of the proposed approach: data collection and processing, clustering with DBSCAN, visualization and analysis, and final restructuring. Experimental results show that DBSCAN produces more evenly distributed clusters compared to fuzzy clustering. The conclusion is that DBSCAN is a better technique for software restructuring as it can identify clusters of varying shapes and sizes without specifying the number of clusters in advance.
Data collection in multi application sharing wireless sensor networks (Pvrtechnologies Nellore)
- This document discusses algorithms for minimizing data collection in wireless sensor networks that are shared by multiple applications. It introduces the interval data sharing problem, where each application requires continuous interval data sampling rather than single data points.
- The problem is formulated as a non-linear, non-convex optimization problem. A 2-factor approximation algorithm is proposed with time complexity O(n^2) and memory complexity O(n) to address the high complexity of solving the optimization problem on resource-constrained sensor nodes.
- A special case where sampling intervals are the same length is analyzed, and a dynamic programming algorithm is provided that runs in optimal O(n^2) time and O(n) memory. Three online algorithms
Finding Relationships between the Our-NIR Cluster Results (CSCJournals)
Evaluating node importance in clustering has been an active research problem, and many methods have been developed. Most clustering algorithms deal with general similarity measures, but in real situations data often changes over time; clustering such data naively not only degrades cluster quality but also disregards users' expectation of recent clustering results. The previously proposed Our-NIR method improves on the method of Ming-Syan Chen, as demonstrated by its node-importance results, which are useful for clustering categorical data; it still has deficiencies, however, in data labeling and outlier detection. This paper modifies Our-NIR's evaluation of node importance by introducing a probability distribution, which yields better results in comparison.
IRJET- Clustering of Hierarchical Documents based on the Similarity Deduc... (IRJET Journal)
This document discusses techniques for clustering hierarchical documents based on their structural similarity. It summarizes several existing approaches:
1) A tree edit distance-based method that represents trees as paths and computes the distance between subtrees. However, it requires trees to have a pre-specified structure.
2) Chawathe's algorithm that uses pre-order tree traversal and transforms trees into sequences of node labels and depths to calculate distances. It allows efficient assignment of new documents to clusters.
3) The XCLSC algorithm that clusters documents in two phases - grouping structurally similar documents and then searching to further improve clustering results and performance. However, it has high computational requirements.
4) The XPattern and PathXP
IRJET- Enhanced Density Based Method for Clustering Data Stream (IRJET Journal)
The document presents a new incremental density-based algorithm called Enhanced Density-based Data Stream (EDDS) for clustering data streams. EDDS modifies the traditional DBSCAN algorithm to represent clusters using only surface core points. It detects clusters and outliers in incoming data chunks, merges new clusters with existing ones, and filters outliers for the next round. The algorithm prunes internal core points using heuristics and removes aged core points/outliers using a fading function. It was evaluated on datasets and found to improve clustering correctness with time complexity comparable to existing methods.
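The aging step is typically realized with an exponential fading function of the form w(t) = 2^(-λt); a small sketch follows, where the decay rate and deletion threshold are arbitrary demo values rather than EDDS's settings.

```python
# Exponential fading: core points whose weight decays below a threshold
# are removed. LAMBDA and THRESHOLD are demo values (assumptions).
LAMBDA, THRESHOLD = 0.01, 0.25

def weight(age):
    """Weight of a core point last updated `age` time steps ago."""
    return 2 ** (-LAMBDA * age)

core_points = {"p1": 10, "p2": 150, "p3": 400}  # point -> age in steps
alive = {p: a for p, a in core_points.items() if weight(a) >= THRESHOLD}
print(alive)  # p3 (weight ~0.06) has aged out and is dropped
```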
A Survey on Balancing the Network Load Using Geographic Hash Tables (IOSR Journals)
This document summarizes a survey on balancing network load using geographic hash tables. It discusses how geographic hash tables are used to store and retrieve data from nodes in a wireless network. Two approaches to balancing the network load are proposed: 1) An analytical approach that adds new nodes to servers when load exceeds thresholds. 2) A heuristic approach that moves data between nodes to balance load without changing underlying routing protocols. The approaches aim to prevent many requests from going to single nodes. Load balancing improves network lifespan by distributing transmission and reception operations across nodes.
This document describes MMRetrieval.net, a multimodal search engine that allows retrieval of information from multiple modalities including text and images. It discusses challenges in multimodal retrieval like defining modality, fusing scores across modalities, and improving efficiency through two-stage retrieval. MMRetrieval.net was developed by researchers at DUTH to participate in the ImageCLEF 2010 Wikipedia retrieval task, where it achieved the second best performance overall by combining text and visual features through various fusion techniques. The system demonstrates the promise of multimodal search but also identifies open challenges around modality definitions, fusion methods, and scaling to large databases.
Clustering using kernel entropy principal component analysis and variable ker... (IJECEIAES)
Clustering, as an unsupervised learning method, is the task of dividing data objects into clusters with common characteristics. This paper introduces an enhanced version of the existing EPCA data transformation method. By incorporating a kernel function into EPCA, the input space can be mapped implicitly into a high-dimensional feature space. Shannon's entropy, estimated via the inertia contributed by every mapped object in the data, is then the key measure for determining the optimal extracted feature space. The proposed method works well with the clustering algorithm based on fast search of cluster centers from local density computation. Experimental results show that the approach is feasible and efficient in terms of query performance.
Parallel KNN for Big Data using Adaptive Indexing (IRJET Journal)
This document presents an evaluation of different algorithms for performing parallel k-nearest neighbor (kNN) queries on big data using the MapReduce framework. It first discusses how kNN algorithms do not scale well to large datasets, then reviews existing MapReduce-based kNN algorithms such as H-BNLJ, H-zkNNJ, and RankReduce that improve performance by partitioning data and distributing computation. The document also proposes combining an adaptive indexing technique with the RankReduce algorithm. An implementation of this approach on an airline on-time statistics dataset achieves better precision and speed than the other algorithms.
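Algorithms like H-zkNNJ rest on mapping points to a space-filling z-order curve so that nearby points get nearby one-dimensional keys, which makes range partitioning across reducers straightforward. A minimal bit-interleaving sketch (two dimensions, non-negative integer coordinates; real implementations also query several randomly shifted copies of the data, omitted here):

```python
# Minimal z-order (Morton) key: interleave the bits of x and y so that
# sorting by key roughly preserves 2-D locality.
def z_value(x: int, y: int, bits: int = 16) -> int:
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # even bit positions from x
        z |= ((y >> i) & 1) << (2 * i + 1)  # odd bit positions from y
    return z

points = [(3, 5), (4, 4), (100, 7), (3, 6)]
for p in sorted(points, key=lambda p: z_value(*p)):
    print(p, format(z_value(*p), "016b"))
```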
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMS (ijdkp)
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high-dimensional data. Many significant subspace clustering algorithms exist, each with different characteristics arising from the techniques, assumptions, and heuristics used. A comprehensive classification scheme is needed that considers all such characteristics to divide subspace clustering approaches into families, where algorithms in the same family satisfy common characteristics. Such a categorization helps future developers understand the quality criteria to use and identify similar algorithms against which to compare their proposed clustering algorithms. This paper first proposes the concept of SCAF (Subspace Clustering Algorithms' Family), whose characteristics are based on classes such as cluster orientation and overlap of dimensions. As an illustration, it further provides a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "axis-parallel, overlapping, density-based" SCAF.
IDENTIFICATION AND INVESTIGATION OF THE USER SESSION FOR LAN CONNECTIVITY VIA... (ijcseit)
This paper presents a technical discussion of the identification and analysis of LAN user sessions. Identifying a user session is non-trivial: classical approaches rely on threshold-based mechanisms, which are very sensitive to the chosen threshold value, and that value may be difficult to set correctly. Clustering techniques are used instead to define a novel methodology for identifying LAN user sessions without requiring an a priori definition of threshold values. The clustering-based approach is defined in detail, its positives and negatives are discussed, and it is applied to real traffic traces. The methodology is also applied to artificially generated traces to evaluate its benefits against traditional threshold-based approaches, and the characteristics and statistical properties of the user sessions extracted from real traces are analyzed.
GRAPH BASED LOCAL RECODING FOR DATA ANONYMIZATION (ijdms)
This document summarizes a research paper that proposes a new local recoding approach for data anonymization based on minimum spanning tree partitioning. The approach aims to achieve k-anonymity while minimizing information loss. It involves constructing a minimum spanning tree using distances between data points based on attribute hierarchies, removing edges to form initial clusters, and generating equivalence classes that satisfy the anonymity requirement k. Experiments showed the proposed local recoding framework produced better quality anonymized tables than existing global recoding and clustering approaches.
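The tree-partitioning step can be sketched with SciPy: build the minimum spanning tree over pairwise record distances, cut the heaviest edges, and read the resulting components off as initial clusters. The toy numeric data and the single-cut choice are assumptions; the paper's hierarchy-based attribute distances are not reproduced here.

```python
# Sketch: MST construction, heaviest-edge removal, initial clusters.
import numpy as np
from scipy.sparse.csgraph import connected_components, minimum_spanning_tree
from scipy.spatial.distance import pdist, squareform

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(8, 1, (20, 2))])

dist = squareform(pdist(X))                 # pairwise distance matrix
mst = minimum_spanning_tree(dist).toarray()

heaviest = np.sort(mst[mst > 0])[-1]        # weight of the heaviest MST edge
mst[mst >= heaviest] = 0                    # cut it to split the tree
n_clusters, labels = connected_components(mst, directed=False)
print(n_clusters, np.bincount(labels))      # 2 initial equivalence groups
```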
Spectral Clustering and Vantage Point Indexing for Efficient Data Retrieval (IJECEIAES)
Data mining is an essential process for identifying patterns in large datasets through machine learning techniques and database systems. Clustering of high-dimensional data is becoming a very challenging process due to the curse of dimensionality, and existing techniques have not improved space complexity or data retrieval performance. To overcome these limitations, a Spectral Clustering Based VP Tree Indexing Technique is introduced: it clusters and indexes densely populated high-dimensional data points for effective retrieval against user queries. A normalized spectral clustering algorithm groups similar high-dimensional data points; a vantage point tree is then constructed to index the clustered points with minimum space complexity; finally, indexed data is retrieved for a user query by a vantage point tree based retrieval algorithm, improving the true positive rate with minimum retrieval time. Performance is measured in terms of space complexity, true positive rate, and data retrieval time on the El Nino weather datasets from the UCI Machine Learning Repository. Experimental results show that the proposed technique reduces space complexity by 33% and data retrieval time by 24% compared to state-of-the-art works.
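A brief sketch of the clustering half with scikit-learn; the VP-tree indexing half is omitted, and the parameters are illustrative.

```python
# Normalized spectral clustering on non-convex data, standing in for the
# technique's clustering stage; the indexing/retrieval stage is not shown.
from sklearn.cluster import SpectralClustering
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=400, noise=0.05, random_state=0)
labels = SpectralClustering(
    n_clusters=2,
    affinity="nearest_neighbors",  # sparse k-NN graph instead of full RBF
    n_neighbors=10,
    assign_labels="kmeans",
    random_state=0,
).fit_predict(X)
print(labels[:20])
```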
With the ever increasing number of documents on the web and in other repositories, organizing and categorizing them for the diverse needs of users by manual means is a complicated job, so the machine learning technique of clustering is very useful. Text documents are typically clustered by pairwise similarity using measures such as cosine, Jaccard, or Pearson, and the best clustering results appear when the overlap of terms between documents is small, that is, when clusters are distinguishable. To address this, document similarity is computed here using the link and neighbor notions introduced in ROCK: significantly similar documents are called neighbors, and the link of a pair of documents is the number of neighbors they share. This work applies links and neighbors to Bisecting K-means clustering for identifying seed documents in the dataset, as a heuristic for choosing which cluster to partition, and as a means of finding the number of partitions possible in the dataset. Experiments on real datasets showed a significant improvement in accuracy with minimal time.
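The link computation can be sketched in a few lines, assuming cosine similarity and an arbitrary neighbor threshold; the toy term-count vectors are invented for the example.

```python
# ROCK-style links: documents are neighbors if cosine similarity >= THETA;
# links[a, b] counts the neighbors documents a and b share.
import numpy as np

docs = np.array([[2, 0, 1, 0],   # toy term-count vectors
                 [1, 0, 2, 0],
                 [0, 3, 0, 1],
                 [0, 2, 0, 2]], dtype=float)
norms = np.linalg.norm(docs, axis=1)
cos = (docs @ docs.T) / np.outer(norms, norms)

THETA = 0.5
adj = (cos >= THETA).astype(int)
np.fill_diagonal(adj, 0)   # a document is not its own neighbor
links = adj @ adj          # paths of length two = shared neighbors
print(links)
```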
Analysis of mass based and density based clustering techniques on numerical d... (Alexander Decker)
This document compares and analyzes mass-based and density-based clustering techniques. It summarizes DBSCAN and OPTICS, two popular density-based clustering algorithms, and introduces DEMassDBSCAN, a mass-based clustering algorithm. The document tests the algorithms on several datasets and finds that DEMassDBSCAN has better runtime than DBSCAN, especially on larger datasets, while producing fewer unassigned clusters.
A Density Based Clustering Technique For Large Spatial Data Using Polygon App... (IOSR Journals)
This document presents a density-based clustering technique called TDCT (Triangle-density based clustering technique) for efficiently clustering large spatial datasets. The technique uses a polygon approach where the number of data points inside each triangle of a polygon is calculated. If the ratio of point densities between two neighboring triangles exceeds a threshold, the triangles are merged into the same cluster. The technique is capable of identifying clusters of arbitrary shapes and densities. Experimental results demonstrate the technique has superior cluster quality and complexity compared to other methods.
The document is a research paper that proposes and compares several clustering algorithms for remote sensing data:
1) DBSCAN, a density-based clustering algorithm that groups together densely populated areas.
2) OPTICS, an improved version of DBSCAN that handles varying cluster densities better.
3) Grid-based clustering that divides data into a grid for faster processing time.
4) Hybrid approaches like Grid-DBSCAN and Grid-OPTICS that combine grid-based clustering with DBSCAN and OPTICS to reduce computational complexity.
The paper evaluates and compares the accuracy and runtime of these algorithms on remote sensing image data.
Feature Subset Selection for High Dimensional Data Using Clustering Techniques (IRJET Journal)
The document discusses feature subset selection for high dimensional data using clustering techniques. It proposes the FAST algorithm which has three steps: 1) remove irrelevant features, 2) divide features into clusters using DBSCAN, and 3) select the most representative feature from each cluster. DBSCAN is a density-based clustering algorithm that can identify clusters of varying densities and detect outliers. The FAST algorithm is evaluated to select a small number of discriminative features from high dimensional data in an efficient manner. It aims to remove irrelevant and redundant features to improve predictive accuracy while handling large feature sets.
A NOVEL DENSITY-BASED CLUSTERING ALGORITHM FOR PREDICTING CARDIOVASCULAR DISEASE (indexPub)
Cardiovascular diseases (CVDs) remain a leading cause of global morbidity and mortality, and early identification of individuals at risk of heart disease is crucial for effective preventive intervention. To improve prediction accuracy, this paper proposes a heart disease prediction framework based on Density-Based Ordering of Clustering Objects (DBOCO). The dataset is pre-processed using Weighted Transform K-Means Clustering (WTKMC), and features are selected using Ensemble Feature Selection (EFS) with a Weighted Binary Bat Algorithm (WBBAT) so that the emphasis falls on the most relevant predictors. Prediction is then performed with the density-based ordering method, designed specifically for cardiovascular disease prediction. As a density-based clustering approach, DBOCO effectively finds dense clusters within the data and accommodates the inherent overlap among cardiovascular risk variables; by detecting these overlapping clusters it captures complicated patterns and improves the accuracy of disease prediction models. The proposed approach was verified on heart disease datasets, showing higher performance than traditional methods. This study marks a substantial step forward in cardiovascular disease prediction, providing a comprehensive and dependable framework for early identification and preventive care.
Intrusion Detection System using K-Means Clustering and SMOTE (IRJET Journal)
The document proposes a hybrid sampling technique combining K-Means clustering and SMOTE oversampling to address data imbalance in network intrusion detection systems. It first applies K-Means clustering to handle outliers in the data, and then uses SMOTE to generate additional samples for the minority (intrusion) class. This produces a balanced dataset for training classification models like Random Forest and CNN. The technique is evaluated on the NSL-KDD dataset and achieves accuracy above 94% with these models, outperforming an alternative approach using DBSCAN and SMOTE.
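A minimal sketch of the balancing stage, assuming scikit-learn plus imbalanced-learn's SMOTE; the K-Means outlier-handling step is simplified here to dropping points far from their centroid, which is an assumption rather than the paper's exact procedure.

```python
# Sketch: crude K-Means outlier filtering, then SMOTE class balancing.
import numpy as np
from imblearn.over_sampling import SMOTE
from sklearn.cluster import KMeans
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=2000, weights=[0.95], random_state=0)

# 1) Drop points unusually far from their K-Means centroid (simplified).
km = KMeans(n_clusters=8, n_init=10, random_state=0).fit(X)
dist = np.linalg.norm(X - km.cluster_centers_[km.labels_], axis=1)
keep = dist < np.percentile(dist, 98)
X, y = X[keep], y[keep]

# 2) Oversample the minority (intrusion) class with SMOTE.
X_bal, y_bal = SMOTE(random_state=0).fit_resample(X, y)
print(np.bincount(y), "->", np.bincount(y_bal))
```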
A COST EFFECTIVE COMPRESSIVE DATA AGGREGATION TECHNIQUE FOR WIRELESS SENSOR N... (ijasuc)
In wireless sensor networks (WSNs), two main problems arise when employing conventional compression techniques: compression performance depends to a large extent on how the routes are organized, and the efficiency of an in-network data compression scheme is determined not only by the compression ratio but also by the computational and communication overheads. In compressive data aggregation, data is gathered at intermediate nodes, where its size is reduced by compression without losing any information from the complete data. In previous work, the authors developed an adaptive traffic-aware aggregation technique in which aggregation switches adaptively between structured and structure-free modes depending on the traffic load. As an extension, this paper provides a cost-effective compressive data gathering technique that handles the traffic load using a structured data aggregation scheme, together with a technique that effectively reduces the computation and communication costs involved in compressive data gathering. Compressive data gathering provides compressed sensor readings to reduce global data traffic and distributes energy consumption evenly to prolong network lifetime. Simulation results show that the proposed technique improves the delivery ratio while reducing energy and delay.
Survey on classification algorithms for data mining (comparison and evaluation) (Alexander Decker)
This document provides an overview and comparison of three classification algorithms: K-Nearest Neighbors (KNN), Decision Trees, and Bayesian Networks. It discusses each algorithm, including how KNN classifies data based on its k nearest neighbors. Decision Trees classify data based on a tree structure of decisions, and Bayesian Networks classify data based on probabilities of relationships between variables. The document conducts an analysis of these three algorithms to determine which has the best performance and lowest time complexity for classification tasks based on evaluating a mock dataset over 24 months.
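For instance, the KNN rule (label a point by majority vote among its k closest training points) takes only a few lines with scikit-learn; the Iris data and k=5 are stand-ins, not the survey's mock dataset.

```python
# k-nearest-neighbors classification: majority vote among the k closest
# training points.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_tr, y_tr)
print(f"test accuracy: {knn.score(X_te, y_te):.2f}")
```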
This document summarizes a research paper that evaluates cluster quality using a modified density subspace clustering approach. It discusses how density subspace clustering can be used to identify clusters in high-dimensional datasets by detecting density-connected clusters in all subspaces. The proposed approach uses a density subspace clustering algorithm to select attribute subsets and identify the best clusters. It then calculates intra-cluster and inter-cluster distances to evaluate cluster quality and compares the results to other clustering algorithms in terms of accuracy and runtime. Experimental results showed that the proposed method improves clustering quality and performs faster than existing techniques.
This document discusses various algorithms used for clustering data streams. It begins by introducing the problem of clustering streaming data and the common approach of using micro-clusters to summarize streaming data. It then reviews several prominent clustering algorithms like DBSCAN, DENCLUE, SNN, and CHAMELEON. The document focuses on the DBSTREAM algorithm, which explicitly captures density between micro-clusters using a shared density graph to improve reclustering. Experimental results show DBSTREAM's reclustering using shared density outperforms other reclustering strategies while using fewer micro-clusters.
A framework for clustering time evolving data (iaemedu)
The document proposes a framework for clustering time-evolving categorical data using a sliding window technique. It uses an existing clustering algorithm (Node Importance Representative) and a Drifting Concept Detection algorithm to detect changes in cluster distributions between the current and previous data windows. If a threshold difference in clusters is exceeded, reclustering is performed on the new window. Otherwise, the new clusters are added to the previous results. The framework aims to improve on prior work by handling drifting concepts in categorical time-series data.
TOWARDS REDUCTION OF DATA FLOW IN A DISTRIBUTED NETWORK USING PRINCIPAL COMPO... (cscpconf)
Two approaches are possible for distributed data mining: first, data from several sources can be copied to a data warehouse and mining algorithms applied there; second, mining can be performed at the local sites and the results aggregated. When the number of features is high, transferring datasets to a centralized location consumes a lot of bandwidth, so dimensionality reduction can be performed at the local sites. In dimensionality reduction, an encoding is applied to the data to obtain a compressed form; the reduced features obtained at the local sites are then aggregated, and data mining algorithms are applied to them. Of the several methods for performing dimensionality reduction, two of the most important are Discrete Wavelet Transforms (DWT) and Principal Component Analysis (PCA). This work is a detailed study of how PCA can be used to reduce data flow across a distributed network.
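A sketch of the local-site reduction step with scikit-learn's PCA; the synthetic 10-feature data and the 3-component choice are assumptions for illustration.

```python
# Local-site step: project records onto a few principal components
# before shipping them to the central aggregation site.
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 3))                  # 3 hidden factors
local_data = latent @ rng.normal(size=(3, 10)) \
    + 0.1 * rng.normal(size=(500, 10))              # 500 records, 10 features

pca = PCA(n_components=3).fit(local_data)
reduced = pca.transform(local_data)                 # 500 x 3: ~70% less traffic
print(reduced.shape, f"{pca.explained_variance_ratio_.sum():.1%} variance kept")
```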
CONTENT BASED VIDEO CATEGORIZATION USING RELATIONAL CLUSTERING WITH LOCAL SCA... (ijcsit)
This paper introduces a novel approach for efficient video categorization built on two main components. The first is a new relational clustering technique that identifies video key frames by learning cluster-dependent Gaussian kernels. The proposed algorithm, called the clustering and Local Scale Learning algorithm (LSL), learns the underlying cluster-dependent dissimilarity measure while finding compact clusters in the given dataset; the learned measure is a Gaussian dissimilarity function defined with respect to each cluster. A single objective function is minimized to obtain the optimal partition and the cluster-dependent parameter, and this optimization proceeds iteratively by dynamically updating the partition and the local measure. The kernel learning task exploits the unlabeled data and, reciprocally, the categorization task takes advantage of the locally learned kernel. The second component of the system discovers the video categories in an unsupervised manner using LSL. The clustering performance of LSL is illustrated on synthetic 2-D datasets and on high-dimensional real data, and the video categorization system is assessed on a real video collection.
The document proposes an error-aware data clustering technique for in-network data reduction in wireless sensor networks. It consists of three modules: 1) Histogram-based data clustering groups similar sensor data into clusters over time to reduce redundancy. 2) Recursive outlier detection and smoothing detects and replaces random outliers while maintaining a predefined error threshold. 3) Verification of RODS detects both random and frequent outliers using temporal and spatial correlations to provide more robust error-aware clustering. The technique aims to significantly reduce redundant data with minimum error for applications monitoring environmental conditions.
Data mining is used to manage the huge amounts of information stored in data warehouses and databases and to discover the required knowledge within them. Numerous data mining techniques have been proposed, such as association rules, decision trees, neural networks, and clustering, and the field has been a point of attention for many years. Among the available strategies, clustering of the dataset is one of the most effective: it groups the dataset into a number of clusters based on predefined guidelines and can reveal the relationships between the different characteristics of the data.
In the k-means clustering algorithm, the function is selected on the basis of its relevancy for predicting the data, and the Euclidean distance between the centroid of a cluster and the data objects outside it is computed for clustering the data points. In this work, the author enhances the Euclidean distance formula to increase cluster quality.
The problem of accuracy, and of dissimilar points redundantly assigned to clusters, remains in the improved k-means, for which a new enhanced approach is proposed that uses a similarity function to check the similarity level of a point before including it in a cluster.
The document describes a novel algorithm called the Selecting Sub Attribute Algorithm (SSAA) for fast attribute selection in categorical data clustering. SSAA improves upon the existing Maximum Degree of Dominance of Soft Set (MDDS) algorithm by excluding attributes that are dominated by others, reducing computational time. The SSAA and MDDS algorithms are compared using UCI benchmark datasets, demonstrating that SSAA achieves lower execution time. SSAA has potential for applications involving high-dimensional categorical data clustering.
K- means clustering method based Data Mining of Network Shared Resources .pptxSaiPragnaKancheti
K-means clustering is an unsupervised machine learning algorithm that is useful for clustering and categorizing unlabeled data points. It works by assigning data points to a set number of clusters, K, where each data point belongs to the cluster with the nearest mean. The document discusses how k-means clustering can be applied to network shared resources mining to overcome limitations of existing methods. It provides details on how k-means clustering works, compares it to other clustering algorithms, and demonstrates how it can accurately and efficiently cluster network resource data into groups within 0.6 seconds on average.
The document discusses the applicability of fuzzy theory in remote sensing image classification. It presents three experiments comparing different classification methods: 1) Unsupervised fuzzy c-means classification, 2) Supervised classification using fuzzy signatures, 3) Supervised classification using fuzzy signatures and membership functions. The supervised fuzzy methods achieved higher accuracy than the unsupervised method, with the third method performing best with an overall accuracy of 83.9%. Fuzzy convolution can further optimize results by combining classification bands.
Extended Fast Search Clustering Algorithm: Widely Density Clusters,...csandit
CFSFDP (clustering by fast search and find of density peaks) is a recently developed density-based clustering algorithm. Compared to DBSCAN, it needs fewer parameters and is computationally cheap owing to its non-iterative nature. Alex et al. have demonstrated its power in many applications. However, CFSFDP performs poorly when there is more than one density peak for a single cluster, a situation we call "no density peaks". In this paper, inspired by the hierarchical clustering algorithm CHAMELEON, we propose an extension of CFSFDP, E_CFSFDP, to suit more applications. In particular, we use the original CFSFDP to generate initial clusters first, then merge the sub-clusters in a second phase. We have applied the algorithm to several datasets, among which some exhibit "no density peaks". Experimental results show that our approach outperforms the original one because it breaks through the strict requirements on the datasets.
This document describes an electronic doorbell system that uses a keypad and GSM for home security. The system consists of a doorbell connected to a microcontroller that triggers a GSM module to send an SMS to the homeowner when the doorbell is pressed. The homeowner can then respond via button press to open the door, or a message will be displayed if they do not respond. An authorized person can enter a password on the keypad, and if multiple wrong passwords are entered, a message will be sent to the homeowner about a potential burglary attempt. The system aims to provide notification to homeowners and prevent unauthorized access through use of passwords and messaging capabilities.
Augmented reality, the new-age technology, has widespread applications in every field imaginable and has proven to be an inflection point in numerous verticals, improving lives and performance. In this paper, we explore the various possible applications of Augmented Reality (AR) in the field of medicine. The objective of using AR in medicine, or in any field, is that AR helps motivate the user, makes sessions interactive, and assists faster learning. We discuss the applicability of AR in medical diagnosis: augmented reality reinforces remote collaboration, allowing doctors to diagnose patients from a different locality. Additionally, we believe that a much more pronounced effect can be achieved by bringing together the cutting-edge technology of AR and the lifesaving field of medical sciences. AR is a mechanism that could be applied in the learning process too; similarly, virtual reality could be used in fields where more practical experience is needed, such as driving, sports, and neonatal care training.
Image fusion is a subfield of image processing in which multiple images are fused to create an image in which all the objects are in focus. Image fusion is performed for multi-sensor and multi-focus images of the same scene: multi-sensor images are captured by different sensors, whereas multi-focus images are captured by the same sensor. In multi-focus images, the objects closer to the camera are in focus while farther objects are blurred; conversely, when the farther objects are focused, the closer objects become blurred. To achieve an image in which all the objects are in focus, image fusion is performed either in the spatial domain or in a transformed domain. In recent times, the applications of image processing have grown immensely. Usually, due to the limited depth of field of optical lenses, especially at longer focal lengths, it is impossible to obtain an image in which all the objects are in focus, and such an image plays an important role in other image processing tasks such as image segmentation, edge detection, stereo matching, and image enhancement. Hence, a novel feature-level multi-focus image fusion technique is proposed, and the results of extensive experimentation performed to highlight the efficiency and utility of the proposed technique are presented. The work further compares fuzzy based image fusion with a neuro-fuzzy fusion technique along with quality evaluation indices.
Graphs have become the dominant representation for many tasks, as they provide a structure to capture those tasks and the corresponding relations. A powerful role of networks/graphs is to bridge local features that exist at vertices as they blossom into patterns that help explain how nodal relations and their edges ripple into complex effects across a graph. User clusters are formed as a result of interactions between entities, yet many users today can hardly categorize their contacts into groups such as "family", "friends", or "colleagues". Thus, analyzing a user's social graph via implicit clusters enables dynamism in contact management. This study seeks to implement this dynamism via a comparative study of a deep neural network and a friend-suggest algorithm. We analyze a user's implicit social graph and seek to automatically create custom contact groups using metrics that classify contacts based on the user's affinity to them. Experimental results demonstrate the importance of both the implicit group relationships and the interaction-based affinity in suggesting friends.
This paper applies the Gryllidae Optimization Algorithm (GOA) to solve the optimal reactive power problem. The proposed GOA approach is based on the chirping characteristics of Gryllidae (crickets). Commonly, male Gryllidae chirp, though some females do as well; males draw females with the sound they produce and also use it to warn other Gryllidae against dangers. The hearing organs of Gryllidae are housed in an expansion of their forelegs, through which they attune to the produced fluttering sounds. The proposed GOA has been tested on the standard IEEE 14- and 30-bus test systems, and simulation results show that the projected algorithm reduces the real power loss considerably.
In the wake of the sudden replacement of wood and kerosene by gas cookers for several purposes in Nigeria, gas leakage has caused several damages in homes, laboratories, and elsewhere, and the installation of gas leakage detection devices has been globally embraced to eliminate accidents related to gas leakage. We present an alternative approach to developing a device that can automatically detect and control gas leakages and also monitor temperature. The system detects leakage of LPG (Liquefied Petroleum Gas) using a gas sensor, then triggers the control system response, which employs a ventilator system, a mobile phone alert, and an alarm when the LPG concentration in the air exceeds a certain level. The performance of two gas sensors (MQ5 and MQ6) was tested for a guided decision. Also, when the temperature of the environment poses a danger, an LED (indicator), buzzer, and LCD (16x2) display indicate the temperature and gas leakage status in degrees Celsius and PPM respectively. Attention was given to the response time of the control system, and it was ascertained that this system significantly increases the chances and efficiency of eliminating gas leakage related accidents.
The feature selection problem is one of the main problems in the text and data mining domain. This paper presents a comparative study of feature selection methods for Arabic text classification. Five feature selection methods were selected: ICHI square, CHI square, Information Gain, Mutual Information, and Wrapper. They were tested with five classification algorithms: Bayes Net, Naive Bayes, Random Forest, Decision Tree, and Artificial Neural Networks. An Arabic data collection consisting of 9,055 documents was used, and the methods were compared by four criteria: Precision, Recall, F-measure, and time to build the model. The results showed that the improved ICHI feature selection obtained almost all the best results in comparison with the other methods.
The document proposes the Gentoo Penguin Algorithm (GPA) to solve the optimal reactive power problem. The goal is to minimize real power loss. GPA is inspired by the natural behaviors of Gentoo penguins. In GPA, penguin positions represent potential solutions. Penguins move toward other penguins with lower "cost" or higher heat concentration, representing better solutions. Cost is defined by heat concentration and distance between penguins. Heat radiation decreases with distance. The algorithm is tested on the IEEE 57 bus system and reduces real power loss effectively compared to other methods.
08 20272 academic insight on applicationIAESIJEECS
This research has thrown up many questions in need of further investigation. An expressive quantitative-qualitative design was used, based on a common survey instrument, and an interview item was applied to discover whether participants felt the media-based approach supplements their learning in academic English writing classes. Data on academics' attitudes toward using Skype as a supporting tool for delivering lessons, analyzed against selected variables (occupation, year of education, and experience with Skype), revealed no statistically significant differences in the use of Skype units due to the medical academics' major, but statistically significant differences due to the experience-with-Skype variable, in favor of academics with no prior Skype practice. Skype as an instructional medium is a positive channel for delivering academic medical writing content and assisting education. Academics who do not have enough time to participate in classes feel comfortable using the Skype-based approach in scientific writing, and those who took part in the course attributed their approval of this medium to learning innovative academic medical writing.
Cloud computing has a sweeping impact on human productivity. Today it is used for computing, storage, prediction, and intelligent decision making, among others. Intelligent decision making using machine learning has pushed cloud services to be even faster, more robust, and more accurate. Security remains one of the major concerns affecting cloud computing growth; further research challenges in cloud adoption include the lack of well-managed service level agreements (SLAs), frequent disconnections, resource scarcity, interoperability, privacy, and reliability. A tremendous amount of work still needs to be done to explore the security challenges arising from the widespread usage of cloud deployments using containers. We also discuss the impact of cloud computing and cloud standards. Hence, in this research paper, a detailed survey of cloud computing concepts, architectural principles, key services, and implementation, design, and deployment challenges is presented, and important future research directions in the era of Machine Learning and Data Science are identified.
A notary is an official authorized to make authentic deeds regarding all deeds, agreements, and stipulations required by general rules. Activities carried out at a notary office, such as recording client data and file data, still use traditional systems that tend to be manual. The resulting problem is inefficiency in data processing and in providing information to clients: clients have difficulty getting information on the progress of documents being handled at the notary's office and must repeatedly take the time to visit the office to check on their files. The purpose of this study is to make it easier for clients to obtain information about work in progress and for employees to process incoming documents by implementing an administrative system. The system was developed with the waterfall development method and uses multi-channel access technology integrated into a website to simplify delivering information to and receiving requests from clients via Telegram and an SMS gateway. Clients come to the office only when the system notifies them via Telegram or SMS that they must appear in person, leading to time savings and avoiding excessive transportation costs. All system functions worked properly based on the results of alpha testing, and beta testing, conducted by distributing a feasibility questionnaire to end users, found that 96% of users agree the system is feasible to implement.
In this work the Tundra Wolf Algorithm (TWA) is proposed to solve the optimal reactive power problem. In the projected TWA, in order to prevent the searching agents from becoming trapped in local optima, convergence toward the global optimum is divided according to two different conditions. In the proposed TWA, the omega tundra wolf is taken as a searching agent instead of being obliged to pursue the three best candidates. Increasing the number of searching agents improves the exploration capability of the tundra wolves over an extensive range. The proposed TWA has been tested on the standard IEEE 14- and 30-bus test systems, and simulation results show the proposed algorithm reduces the real power loss effectively.
In this work the Predestination of Particles Wavering Search (PPS) algorithm is applied to solve the optimal reactive power problem. The PPS algorithm is modeled on the motion of particles in the exploration space, where movement is based on gradient and swarming motion. Particles are permitted to progress at steady velocity in gradient-based movement, but when the outcome is poor compared to the previous result, the particle's velocity is immediately reversed at half the magnitude, which helps reach a local optimal solution; this is termed the wavering movement. The proposed PPS algorithm is evaluated on the standard IEEE 14-, 30-, 57-, 118-, and 300-bus systems, and simulation results show that PPS reduces the power loss efficiently.
In this paper, the Mine Blast Algorithm (MBA) is hybridized with the Harmony Search (HS) algorithm for solving the optimal reactive power dispatch problem. MBA is based on the explosion of landmines and HS on the creative process of musicians; both are combined to solve the problem. In MBA, the initial distance of shrapnel pieces is reduced gradually to let the mine bombs search the probable global minimum location, amplifying the global exploration capability. Harmony search imitates the music creation process, in which musicians adjust their instruments' pitch while searching for the best state of harmony. Hybridizing MBA with HS (MH) improves the search in the solution space: the mine blast component improves exploration and the harmony search component augments exploitation, so the proposed algorithm starts with exploration and gradually moves to the exploitation phase. The proposed hybridized MH algorithm has been tested on the standard IEEE 14- and 300-bus test systems, where it reduces real power loss considerably. MH was then tested on the IEEE 30-bus system (considering the voltage stability index), attaining real power loss minimization, voltage deviation minimization, and voltage stability index enhancement.
Artificial Neural Networks have proved their efficiency in a large number of research domains. In this paper, we apply Artificial Neural Networks to Arabic text for language modeling, text generation, and missing-text prediction. On one hand, we adapt Recurrent Neural Network architectures to model the Arabic language in order to generate correct Arabic sequences. On the other hand, Convolutional Neural Networks are parameterized, based on some specific features of Arabic, to predict missing text in Arabic documents. We demonstrate the power of our adapted models in generating and predicting correct Arabic text compared to the standard model. The models were trained and tested on well-known free Arabic datasets, and the results were promising with sufficient accuracy.
In present-day communications, speech signals get contaminated by various sorts of noise that degrade speech quality and adversely impact speech recognition performance. To overcome these issues, a novel approach for speech enhancement using modified Wiener filtering is developed, in which power spectrum computation is applied to the degraded signal to obtain the noise characteristics from a noisy spectrum. In the next phase, an MMSE technique is applied in which the Gaussian distribution of each signal, i.e., the original and the noisy signal, is analyzed. The Gaussian distribution provides spectrum estimation and spectral coefficient parameters that can be used for probabilistic model formulation. Moreover, a-priori-SNR computation is also incorporated for coefficient updating and noise presence estimation, which operates similarly to conventional VAD. However, the conventional VAD scheme is based on a hard threshold that cannot deliver satisfactory performance, so a soft-decision based threshold is developed to improve the performance of speech enhancement. An extensive simulation study is carried out in MATLAB on the NOIZEUS speech database, and a comparative study is presented in which the proposed approach proves better than the existing technique.
Previous research has highlighted that the neuro-signals of Alzheimer's disease patients are less complex and show lower synchronization than those of healthy, normal subjects. The changes in the EEG signals of Alzheimer's subjects start at an early stage but are not clinically observed and detected. To detect these abnormalities, three synchrony measures and wavelet-based features were computed and studied on an experimental database. After computing these measures, it was observed that phase synchrony and coherence based features are able to distinguish between Alzheimer's disease patients and healthy subjects. A Support Vector Machine classifier was used for classification, giving 94% accuracy on the experimental database used. Combining these synchrony features with other relevant features can yield a reliable system for diagnosing Alzheimer's disease.
Attenuation correction for PET/MR hybrid imaging systems and dose planning for MR-based radiation treatment remain challenging due to insufficient high-energy photon attenuation information. We present a new method that uses learned nonlinear local descriptors and feature matching to predict pseudo-CT images from T1w and T2w MRI data. The nonlinear local descriptors are obtained by projecting the linear descriptors into a nonlinear high-dimensional space using an explicit feature map and low-rank approximation with supervised manifold regularization. The nearest neighbors of each local descriptor in the input MR images are searched within a constrained spatial range of the MR images in the training dataset, and the pseudo-CT patches are then estimated through k-nearest-neighbor regression. The proposed method for pseudo-CT prediction is quantitatively analyzed on a dataset consisting of paired brain MRI and CT images from 13 subjects.
The cognitive radio paradigm aims to alleviate the scarcity of spectral resources for wireless communication through intelligent sensing and quick resource allocation techniques. Secondary users (SUs) actively obtain spectrum access opportunities by supporting primary users (PUs) in cognitive radio networks (CRNs). At present, spectrum access is provided through the cooperative-communication-based link-level frame-based cooperation (LLC) principle, in which SUs independently act as relays for PUs to gain spectrum access opportunities. Unfortunately, this LLC approach cannot fully exploit spectrum access opportunities to enhance the throughput of CRNs and fails to motivate PUs to join the spectrum sharing process. To overcome this limitation, the network-level cooperation (NLC) principle was introduced, where SUs collaborate with PUs session by session instead of frame by frame for spectrum access opportunities; NLC addresses the challenges faced by the LLC approach. In this paper we survey models that have been proposed to tackle the problem of LLC and show the relevant aspects of each model, in order to characterize the parameters that should be taken into account to achieve a spectrum access opportunity.
In this paper, the author provides insights and lessons that can be learned from colleagues at American universities about their online education experiences. The literature and previous studies of online education gains are explored and summarized. Emerging trends in online education are discussed in detail, and strategies to implement these trends are explained. The author provides several tools and strategies that enable universities to ensure the quality of online education. At the end of the paper, the researcher provides examples of Arab universities that have successfully implemented online education and expanded their impact on society. This research provides a strategy and a model that universities in the Middle East can use as a roadmap to implement online education in their regions.
Driving Business Innovation: Latest Generative AI Advancements & Success StorySafe Software
Are you ready to revolutionize how you handle data? Join us for a webinar where we’ll bring you up to speed with the latest advancements in Generative AI technology and discover how leveraging FME with tools from giants like Google Gemini, Amazon, and Microsoft OpenAI can supercharge your workflow efficiency.
During the hour, we’ll take you through:
Guest Speaker Segment with Hannah Barrington: Dive into the world of dynamic real estate marketing with Hannah, the Marketing Manager at Workspace Group. Hear firsthand how their team generates engaging descriptions for thousands of office units by integrating diverse data sources—from PDF floorplans to web pages—using FME transformers, like OpenAIVisionConnector and AnthropicVisionConnector. This use case will show you how GenAI can streamline content creation for marketing across the board.
Ollama Use Case: Learn how Scenario Specialist Dmitri Bagh has utilized Ollama within FME to input data, create custom models, and enhance security protocols. This segment will include demos to illustrate the full capabilities of FME in AI-driven processes.
Custom AI Models: Discover how to leverage FME to build personalized AI models using your data. Whether it’s populating a model with local data for added security or integrating public AI tools, find out how FME facilitates a versatile and secure approach to AI.
We’ll wrap up with a live Q&A session where you can engage with our experts on your specific use cases, and learn more about optimizing your data workflows with AI.
This webinar is ideal for professionals seeking to harness the power of AI within their data management systems while ensuring high levels of customization and security. Whether you're a novice or an expert, gain actionable insights and strategies to elevate your data processes. Join us to see how FME and AI can revolutionize how you work with data!
Generating privacy-protected synthetic data using Secludy and MilvusZilliz
During this demo, the founders of Secludy will demonstrate how their system utilizes Milvus to store and manipulate embeddings for generating privacy-protected synthetic data. Their approach not only maintains the confidentiality of the original data but also enhances the utility and scalability of LLMs under privacy constraints. Attendees, including machine learning engineers, data scientists, and data managers, will witness first-hand how Secludy's integration with Milvus empowers organizations to harness the power of LLMs securely and efficiently.
Your One-Stop Shop for Python Success: Top 10 US Python Development Providersakankshawande
Simplify your search for a reliable Python development partner! This list presents the top 10 trusted US providers offering comprehensive Python development services, ensuring your project's success from conception to completion.
Have you ever been confused by the myriad of choices offered by AWS for hosting a website or an API?
Lambda, Elastic Beanstalk, Lightsail, Amplify, S3 (and more!) can each host websites + APIs. But which one should we choose?
Which one is cheapest? Which one is fastest? Which one will scale to meet our needs?
Join me in this session as we dive into each AWS hosting service to determine which one is best for your scenario and explain why!
Programming Foundation Models with DSPy - Meetup SlidesZilliz
Prompting language models is hard, while programming language models is easy. In this talk, I will discuss the state-of-the-art framework DSPy for programming foundation models with its powerful optimizers and runtime constraint system.
Taking AI to the Next Level in Manufacturing.pdfssuserfac0301
Read Taking AI to the Next Level in Manufacturing to gain insights on AI adoption in the manufacturing industry, such as:
1. How quickly AI is being implemented in manufacturing.
2. Which barriers stand in the way of AI adoption.
3. How data quality and governance form the backbone of AI.
4. Organizational processes and structures that may inhibit effective AI adoption.
5. Ideas and approaches to help build your organization's AI strategy.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and the licenses under the CCB and CCX model have been a hot topic for many in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefits it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we would like to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary expenses, for example when a person document is used instead of a mail-in for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder bring you up to speed on this new world. It gives you the tools and the know-how to keep an overview. You will be able to reduce your costs through an optimized Domino configuration and keep them low going forward.
These topics will be covered
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to use it best
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices to implement right away
AI-Powered Food Delivery Transforming App Development in Saudi Arabia.pdfTechgropse Pvt.Ltd.
In this blog post, we'll delve into the intersection of AI and app development in Saudi Arabia, focusing on the food delivery sector. We'll explore how AI is revolutionizing the way Saudi consumers order food, how restaurants manage their operations, and how delivery partners navigate the bustling streets of cities like Riyadh, Jeddah, and Dammam. Through real-world case studies, we'll showcase how leading Saudi food delivery apps are leveraging AI to redefine convenience, personalization, and efficiency.
HCL Notes and Domino License Cost Reduction in the World of DLAUpanagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-and-domino-license-cost-reduction-in-the-world-of-dlau/
The introduction of DLAU and the CCB & CCX licensing model caused quite a stir in the HCL community. As a Notes and Domino customer, you may have faced challenges with unexpected user counts and license costs. You probably have questions on how this new licensing approach works and how to benefit from it. Most importantly, you likely have budget constraints and want to save money where possible. Don’t worry, we can help with all of this!
We’ll show you how to fix common misconfigurations that cause higher-than-expected user counts, and how to identify accounts which you can deactivate to save money. There are also frequent patterns that can cause unnecessary cost, like using a person document instead of a mail-in for shared mailboxes. We’ll provide examples and solutions for those as well. And naturally we’ll explain the new licensing model.
Join HCL Ambassador Marc Thomas in this webinar with a special guest appearance from Franz Walder. It will give you the tools and know-how to stay on top of what is going on with Domino licensing. You will be able lower your cost through an optimized configuration and keep it low going forward.
These topics will be covered
- Reducing license cost by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how to best utilize it
- Tips for common problem areas, like team mailboxes, functional/test users, etc
- Practical examples and best practices to implement right away
Unlock the Future of Search with MongoDB Atlas_ Vector Search Unleashed.pdfMalak Abu Hammad
Discover how MongoDB Atlas and vector search technology can revolutionize your application's search capabilities. This comprehensive presentation covers:
* What is Vector Search?
* Importance and benefits of vector search
* Practical use cases across various industries
* Step-by-step implementation guide
* Live demos with code snippets
* Enhancing LLM capabilities with vector search
* Best practices and optimization strategies
Perfect for developers, AI enthusiasts, and tech leaders. Learn how to leverage MongoDB Atlas to deliver highly relevant, context-aware search results, transforming your data retrieval process. Stay ahead in tech innovation and maximize the potential of your applications.
#MongoDB #VectorSearch #AI #SemanticSearch #TechInnovation #DataScience #LLM #MachineLearning #SearchTechnology
For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/06/building-and-scaling-ai-applications-with-the-nx-ai-manager-a-presentation-from-network-optix/
Robin van Emden, Senior Director of Data Science at Network Optix, presents the “Building and Scaling AI Applications with the Nx AI Manager,” tutorial at the May 2024 Embedded Vision Summit.
In this presentation, van Emden covers the basics of scaling edge AI solutions using the Nx tool kit. He emphasizes the process of developing AI models and deploying them globally. He also showcases the conversion of AI models and the creation of effective edge AI pipelines, with a focus on pre-processing, model conversion, selecting the appropriate inference engine for the target hardware and post-processing.
van Emden shows how Nx can simplify the developer's life and facilitate a rapid transition from concept to production-ready applications. He provides valuable insights into developing scalable and efficient edge AI solutions, with a strong focus on practical implementation.
UiPath Test Automation using UiPath Test Suite series, part 6DianaGray10
Welcome to UiPath Test Automation using UiPath Test Suite series part 6. In this session, we will cover Test Automation with generative AI and Open AI.
The UiPath Test Automation with generative AI and OpenAI webinar offers an in-depth exploration of leveraging cutting-edge technologies for test automation within the UiPath platform. Attendees will delve into the integration of generative AI, a test automation solution, with OpenAI's advanced natural language processing capabilities.
Throughout the session, participants will discover how this synergy empowers testers to automate repetitive tasks, enhance testing accuracy, and expedite the software testing life cycle. Topics covered include the seamless integration process, practical use cases, and the benefits of harnessing AI-driven automation for UiPath testing initiatives. By attending this webinar, testers, and automation professionals can gain valuable insights into harnessing the power of AI to optimize their test automation workflows within the UiPath ecosystem, ultimately driving efficiency and quality in software development processes.
What will you get from this session?
1. Insights into integrating generative AI.
2. Understanding how this integration enhances test automation within the UiPath platform
3. Practical demonstrations
4. Exploration of real-world use cases illustrating the benefits of AI-driven test automation for UiPath
Topics covered:
What is generative AI
Test Automation with generative AI and Open AI.
UiPath integration with generative AI
Speaker:
Deepak Rai, Automation Practice Lead, Boundaryless Group and UiPath MVP
Essentials of Automations: The Art of Triggers and Actions in FMESafe Software
In this second installment of our Essentials of Automations webinar series, we’ll explore the landscape of triggers and actions, guiding you through the nuances of authoring and adapting workspaces for seamless automations. Gain an understanding of the full spectrum of triggers and actions available in FME, empowering you to enhance your workspaces for efficient automation.
We’ll kick things off by showcasing the most commonly used event-based triggers, introducing you to various automation workflows like manual triggers, schedules, directory watchers, and more. Plus, see how these elements play out in real scenarios.
Whether you’re tweaking your current setup or building from the ground up, this session will arm you with the tools and insights needed to transform your FME usage into a powerhouse of productivity. Join us to discover effective strategies that simplify complex processes, enhancing your productivity and transforming your data management practices with FME. Let’s turn complexity into clarity and make your workspaces work wonders!
How to Get CNIC Information System with Paksim Ga.pptxdanishmna97
Pakdata Cf is a groundbreaking system designed to streamline and facilitate access to CNIC information. This innovative platform leverages advanced technology to provide users with efficient and secure access to their CNIC details.
Infrastructure Challenges in Scaling RAG with Custom AI modelsZilliz
Building Retrieval-Augmented Generation (RAG) systems with open-source and custom AI models is a complex task. This talk explores the challenges in productionizing RAG systems, including retrieval performance, response synthesis, and evaluation. We’ll discuss how to leverage open-source models like text embeddings, language models, and custom fine-tuned models to enhance RAG performance. Additionally, we’ll cover how BentoML can help orchestrate and scale these AI components efficiently, ensuring seamless deployment and management of RAG systems in the cloud.
Threats to mobile devices are more prevalent and increasing in scope and complexity. Users of mobile devices want to take full advantage of the features available on those devices, but many features provide convenience and capability at the expense of security. This best practices guide outlines steps users can take to better protect personal devices and information.
Density based clustering techniques are currently on the rise due to the growing requirements of spatial data processing. However, most of these techniques are derivatives of DBSCAN, extending it according to their operating domains. Extensions are usually in terms of reduction in time, parameter automation, input reduction in terms of feature selection, and the ability to handle varied densities. FDBSCAN [7] and IDBSCAN [8] are two approaches concentrating on reducing the time component. Both operate by identifying a minimal set of highly representative points for their operation rather than utilizing the entire set of data points. In addition, IDBSCAN utilizes grid based data selection to reduce the input levels.
Parameter fine-tuning based algorithms include k-VDBSCAN [9], DBCLASD [10] and APSCAN [11]. k-VDBSCAN is a parameter free technique that automatically identifies the parameter k. DBCLASD operates on large data and is a parameter free clustering technique, enabling gradual cluster expansion on the basis of the neighbors and their densities. APSCAN utilizes the Affinity Propagation technique to identify cluster densities based on local data variations.
A density based clustering technique aiding the discovery of clusters with varying densities within a single dataset was proposed by Zhu et al. in [12]. In general, the input data is assumed to be distributed with uniform density; however, some real world data, such as population maps, tend to contain varying densities. This leads to a complex issue: increasing the density threshold for the entire process includes several outliers in clusters, while reducing it misses several legitimate clusters. Two approaches exist in the literature to solve this issue, namely modifying the algorithm appropriately and rescaling the data. The technique proposed in [12] utilizes the latter, rescaling the data so that clusters can be identified appropriately; DBSCAN is then applied to the rescaled data. A similar technique that identifies varied density clusters was proposed by Louhichi et al. in [13]. Its operational process is divided into two phases: the first identifies the density levels of the input data using the exponential spline technique on the distance matrix, and the second utilizes the density values from the first phase as local threshold parameters to identify clusters. Other density based clustering techniques include VDBSCAN [14], GMDBSCAN [15], DDSC [16], EDBSCAN [17], etc.
This paper concentrates on developing a density based spatial clustering technique. DBSCAN, being the precursor of such techniques, is adopted by most techniques involving non-uniform clusters. It was identified that DBSCAN has a high propensity for generating outliers. On further assessment it was identified that several of these outliers effectively fit into the formed clusters; however, they remain outliers due to the parameters defined for the clustering process. The parameter setting process is seldom optimal: parameters are identified by data experts through data analysis and trial and error. As resolving this issue is a complex task, this work presents NRDBSCAN with a one-class SVM to reduce the noise levels.
2. RESEARCH METHOD
Density based spatial clustering proposes a grouping mechanism that operates on the basis of both distance and node density. The major advantage of such an approach is that it does not rely on centroid based operations, hence inconsistencies in the formation of clusters are eliminated. Further, density based clustering is an unsupervised approach with no prior information requirements. Density based clustering approaches have the ability to identify clusters of arbitrary shapes and sizes rather than being confined to traditional circular clusters. It was identified that DBSCAN based techniques exhibit high noise levels: even though the base clustering process was identified to be efficient, the noise levels generated by DBSCAN were found to be excessive. This paper proposes NRDBSCAN, a hybridized density based clustering approach that initially clusters the data and then tries to incorporate noise points into the existing definite clusters, hence reducing the noise levels.
Prior to the actual clustering process, the input data is processed and converted to the required format. A schema analysis is performed on the data to identify the data types of the contents. The proposed architecture accepts only numerical data, hence textual data are eliminated and categorical and ordinal data are converted to numerical formats. The data pre-processing is followed by the actual clustering process.
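As a rough illustration of this pre-processing step, the sketch below keeps numeric columns, encodes categorical and ordinal columns as integers, and drops free-text columns. It assumes a pandas DataFrame; the ordinal maps and the cardinality cutoff are illustrative assumptions, not details given by the paper.

import pandas as pd

# Hypothetical pre-processing sketch: keep numeric columns, encode
# categorical/ordinal columns as integers, and drop free-text columns.
def preprocess(df: pd.DataFrame, ordinal_maps=None) -> pd.DataFrame:
    ordinal_maps = ordinal_maps or {}
    out = {}
    for col in df.columns:
        if col in ordinal_maps:
            # Ordinal data: map category labels to their rank (assumed mapping).
            out[col] = df[col].map(ordinal_maps[col])
        elif pd.api.types.is_numeric_dtype(df[col]):
            out[col] = df[col]  # numeric data is kept as-is
        elif df[col].nunique() <= 20:
            # Categorical data: convert labels to integer codes
            # (the cutoff of 20 distinct values is an assumed heuristic).
            out[col] = df[col].astype("category").cat.codes
        # else: free text, eliminated entirely
    return pd.DataFrame(out)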
Algorithm for the proposed NRDBSCAN is shown below.
NRDBSCAN Algorithm
1. Input base data
2. Input threshold levels (minPts, maxDist)
3. Initialize first cluster with a random node n
4. neighbors <- identifyNeighbors(n)
5. If the neighbor count < minPts
   a. Set n into a separate cluster (outlier)
6. Else expandCluster(n, neighbors)
7. Perform this process until all the nodes are grouped into a cluster
8. For each identified cluster C
   a. If density of C <= 1
      i. noise <- C
9. While noise is not empty
   a. For each identified cluster C
      i. predictions <- predict(C, noise)
      ii. Add all true predictions to cluster C
      iii. Delete corresponding entries from noise
   b. If no entries are deleted for the last two iterations
      i. Allocate new clusters for each entry in noise
      ii. Empty noise

function identifyNeighbors(n)
1. For all nodes n1 in data
   a. If distance(n, n1) < maxDist
      i. Add n1 to neighborList
2. Return neighborList

function expandCluster(n, neighbors)
1. Add n to the cluster C
2. For each neighbor n1
   a. neighborL1 <- identifyNeighbors(n1)
   b. If neighborL1 count >= minPts
      i. Add n1 to C

function predict(C, noise)
1. Initialize a one-class SVM with a polynomial kernel
2. Set the degree of the kernel to the dimensionality of C
3. Train the SVM with C
4. predictions <- apply the trained SVM to noise
5. Return predictions
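For concreteness, a minimal Python sketch of the first-phase routines (identifyNeighbors and the expansion loop) is given below; the predict step is sketched under Section 2.2. Euclidean distance and the rule that single-node clusters become noise are assumptions drawn from the surrounding text, not a verbatim reproduction of the authors' implementation.

import numpy as np

def identify_neighbors(data, i, max_dist):
    # All row indices within max_dist of point i (Euclidean, includes i itself).
    dists = np.linalg.norm(data - data[i], axis=1)
    return np.where(dists < max_dist)[0]

def dbscan_phase(data, min_pts, max_dist):
    n = len(data)
    labels = np.full(n, -1)                 # -1 = not yet assigned to any cluster
    cluster_id = 0
    for i in range(n):
        if labels[i] != -1:
            continue
        nbrs = identify_neighbors(data, i, max_dist)
        labels[i] = cluster_id              # seed a new cluster with node i
        if len(nbrs) >= min_pts:            # core node: run the expansion phase
            queue = list(nbrs)
            while queue:
                j = queue.pop()
                if labels[j] != -1:
                    continue
                nbrs_j = identify_neighbors(data, j, max_dist)
                if len(nbrs_j) >= min_pts:  # only nodes meeting minPts join C
                    labels[j] = cluster_id
                    queue.extend(nbrs_j)
        cluster_id += 1                     # non-core seeds stay single-node clusters
    groups = [list(np.where(labels == c)[0]) for c in range(cluster_id)]
    noise = [g[0] for g in groups if len(g) == 1]   # density of one = outlier
    clusters = [g for g in groups if len(g) > 1]
    return clusters, noise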
2.1. Initial Level Cluster Formulation using DBSCAN
The pre-processed data is analyzed, and the maximum acceptable distance (maxDist) and the minimum neighbor requirement (minPts) are identified from the data distribution. However, accurately identifying the best parameters is not possible in a single iteration; this is a trial and error based fine-tuning process, and the parameters vary with the dataset being used. Hence distribution based transient values are identified initially, and the final parameters are obtained by methodically increasing and decreasing the parameter values to identify the best parameter set for the data under analysis.
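As a rough illustration of this fine-tuning loop, the sketch below sweeps a multiplicative grid around distribution-derived starting values and keeps the pair yielding the fewest noise points. The starting values, grid, and scoring rule are all assumptions made for illustration, and scikit-learn's DBSCAN stands in for the first-phase routine.

import numpy as np
from sklearn.cluster import DBSCAN

def tune_parameters(data, eps0, min_pts0, factors=(0.5, 0.75, 1.0, 1.25, 1.5)):
    # Methodically scale the transient values up and down and keep the
    # (eps, minPts) pair that produces the fewest noise points (label -1).
    best = None
    for fe in factors:
        for fm in factors:
            eps = eps0 * fe
            min_pts = max(2, int(round(min_pts0 * fm)))
            labels = DBSCAN(eps=eps, min_samples=min_pts).fit_predict(data)
            n_noise = int(np.sum(labels == -1))
            if best is None or n_noise < best[0]:
                best = (n_noise, eps, min_pts)
    return best[1], best[2]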
2.1.1. Cluster Identification Phase
In this phase, the process begins with a random node in the dataset. The initial cluster is composed of this single node n. Neighbors of the selected node n are identified using maxDist as the threshold. If the node satisfies the minimum neighbor requirement (minPts), it is considered to be part of a cluster and not an outlier, and this is followed by the cluster expansion phase.
2.1.2. Cluster Expansion Phase
This phase finds the neighbors of each node neighboring n. If each of these nodes satisfies the minPts requirement, it is incorporated as an entity in the cluster. This process continues until all the appropriate nodes are grouped into the cluster. However, several clusters might exist in a dataset, hence some points will remain outside the composed cluster. A random such point is taken as the base for the next cluster, and the neighbor identification and cluster expansion phases are repeated to identify the points belonging to the new cluster. This process is repeated until all the points are categorized into a cluster. Though all points are categorized into clusters, not all clusters are composed with high, or even considerable, densities; some clusters consist of a single node. These nodes are deemed outliers. Major reasons for the occurrence of outliers are improper parameter tuning, their actual presence in the data, or inappropriate node validation in the expansion phase. Though the actual presence of outliers must be accepted, the other two issues also play a vital role in the occurrence of outliers in DBSCAN. Hence a second level examination can incorporate several nodes that could have been components of the existing clusters, leading to a reduction in the outliers.
2.2. One-Class SVM based Noise Reduction
DBSCAN eliminates several nodes, considering them outliers. However, these nodes do not necessarily correspond to outliers; rather, they could be components of the defined clusters. Applying DBSCAN's conventional conditional checks again would ultimately have the same effect. Hence the proposed NRDBSCAN incorporates a machine learning classifier, the one-class Support Vector Machine (SVM), for the categorization process.
Using all the clusters to train a binary/multi-class classifier and then classifying the detected outliers has several downsides. The major issue is that a binary/multi-class classifier always assigns the data to one of the groups; hence all the outliers would be grouped into some cluster, leading to zero outliers. The proposed work, however, is intended to group only the appropriate nodes while retaining the true outliers, so a one-class classifier is the best suited approach. A one-class classifier is a special case of classifier in which the pattern of a single class is well known, while patterns that do not conform to the trained patterns are considered outliers.
The output of DBSCAN is a set of clusters. Data within a cluster have obvious associations, hence each cluster can be used as training data for the classifier. The noise reduction phase operates by training the one-class SVM using data from one cluster and performing predictions on the detected outliers. Outliers categorized as components of the cluster used for training are incorporated into that cluster. The residual outliers are then considered for processing by training the one-class SVM with data from the next cluster, and this process is continued for all the defined clusters with considerable density levels.
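A sketch of this noise-reduction loop using scikit-learn's OneClassSVM is shown below. Following the pseudocode, the polynomial kernel degree is set to the dimensionality of the training cluster, and the loop stops after two passes with no reassignments; the remaining settings (e.g., nu) are library defaults assumed for illustration, not values reported by the paper.

from sklearn.svm import OneClassSVM

def reduce_noise(data, clusters, noise_idx):
    # clusters: list of lists of row indices; noise_idx: indices of outliers.
    noise_idx = list(noise_idx)
    stalled = 0
    while noise_idx and stalled < 2:
        moved = 0
        for cluster in clusters:
            if not noise_idx:
                break
            X = data[cluster]
            # Degree of the polynomial kernel = dimensionality of C, per the
            # pseudocode; other OneClassSVM settings are library defaults.
            svm = OneClassSVM(kernel="poly", degree=X.shape[1]).fit(X)
            preds = svm.predict(data[noise_idx])    # +1 = accepted by cluster
            accepted = [i for i, p in zip(noise_idx, preds) if p == 1]
            cluster.extend(accepted)
            noise_idx = [i for i in noise_idx if i not in accepted]
            moved += len(accepted)
        stalled = stalled + 1 if moved == 0 else 0
    return clusters, noise_idx    # survivors remain genuine outliers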
The one-class SVM was suggested by Scholkopf et al. in [18]. It operates by separating the data points from the origin, treating the origin alone as the second class. The basic operational goal of the one-class SVM is to maximize the distance from the origin to the hyperplane created by the data points. The result is a binary function that effectively captures the regions of the input space, returning +1 for data contained within the hyperplane and -1 otherwise. The one-class SVM used in NRDBSCAN employs the polynomial kernel [19], as the input data tends to contain several dimensions.
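For reference, Schölkopf's one-class SVM can be written in its standard form as follows, where $\phi$ is the feature map induced by the chosen kernel (polynomial here) and $\nu \in (0,1]$ upper-bounds the fraction of training points treated as outliers:

$$
\begin{aligned}
\min_{w,\,\boldsymbol{\xi},\,\rho}\quad & \frac{1}{2}\lVert w\rVert^{2} + \frac{1}{\nu n}\sum_{i=1}^{n}\xi_{i} - \rho \\
\text{subject to}\quad & \langle w, \phi(x_{i})\rangle \geq \rho - \xi_{i},\qquad \xi_{i}\geq 0,\quad i=1,\dots,n,
\end{aligned}
$$

with decision function $f(x) = \operatorname{sgn}(\langle w, \phi(x)\rangle - \rho)$, which returns $+1$ for points inside the learned region and $-1$ otherwise, matching the behavior described above.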
Not all data are expected to be components of the clusters. After the entire process, some residual data
remains, which are categorized as outliers.
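Tying the sketches together, a hypothetical end-to-end run (assuming dbscan_phase from Section 2.1 and reduce_noise from above are in scope) might look like:

import numpy as np

rng = np.random.default_rng(0)
data = np.vstack([rng.normal(0.0, 0.3, (50, 2)),   # dense blob
                  rng.normal(3.0, 0.3, (50, 2)),   # second dense blob
                  rng.uniform(-2.0, 5.0, (5, 2))]) # scattered points

clusters, noise = dbscan_phase(data, min_pts=4, max_dist=0.5)
clusters, outliers = reduce_noise(data, clusters, noise)
print(len(clusters), "clusters,", len(outliers), "residual outliers")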
3. RESULTS AND ANALYSIS
DBSCAN was implemented in C#.NET and the noise reduction components were incorporated into the operational process. The analysis of NRDBSCAN was performed using four benchmark datasets: clustering-specific datasets such as Iris and Banana and spatial datasets such as Quake and Forest were used for assessing the efficiency of the algorithm.
The cluster densities and the number of clusters obtained from each of the datasets using NRDBSCAN are shown in Figures 1-4. Cluster density corresponds to the number of nodes in each formulated cluster; a density of one indicates an outlier. It can be observed that the clusters formulated using the proposed approach exhibit low outlier levels and considerable densities. Iris and Quake, containing low variations, exhibit low outlier levels, while Forest and Banana exhibit high variations and hence more outliers. However, the clusters other than outliers exhibit high density levels.
Figure 4. Cluster Density (Forest)
The intra-cluster radius of the proposed NRDBSCAN is presented in Figures 5-8. It can be observed that the proposed technique exhibits low to moderate intra-cluster distance levels; the moderate levels are due to the varied-shaped clusters formed by the algorithm. The inter-cluster distance exhibited by the algorithm is presented in Figure 9. Low inter-cluster distances are exhibited by NRDBSCAN, which validates our claim of varied-shaped clusters with different density levels.
Figure 5. Intra Cluster Radius (Iris)
[Chart data: for Forest (Figure 4), cluster densities of 494, 4, 5, and 4 nodes followed by ten single-node clusters, plotted as number of nodes versus cluster number; for Iris (Figure 5), intra-cluster radii between 0 and 3 across seven clusters, plotted as node distance versus cluster number.]
Figure 9. Inter Cluster Distance
A comparison in terms of time between DBSCAN, the modified PSO based DBSCAN [20], and the proposed NRDBSCAN is presented in Figure 10. The time taken by the modified PSO is the highest, while the time taken by NRDBSCAN is higher than that of DBSCAN. Even so, the difference is at most about 0.3 s, hence the significance of the time increase is considered to be low.
Figure 10. Time Comparison
A comparison of the number of clusters generated by DBSCAN and NRDBSCAN is shown in Figure 11. The number of clusters generated by NRDBSCAN is at least 60% lower than that of conventional DBSCAN, which demonstrates the effective noise reduction achieved by the proposed approach.
[Figure 9 plots the inter-cluster distance, ranging from 0 to 4, for the Iris, Banana, Quake, and Forest datasets.]
Time comparison (seconds):

                      Iris     Banana    Quake     Forest
Time (DBSCAN)         0.002    0.046     0.0361    0.0126
Time (NRDBSCAN)       0.005    0.3605    0.0706    0.0217
Time (Modified PSO)   0.193    3.972     3.421     1.794
Figure 11. Comparison in terms of number of clusters
4. CONCLUSION
Clustering, one of the major techniques for data analysis, has seen several adaptations in response to changing constraints; a major adaptation is the identification of varied-shape clusters. This paper proposes an enhancement of DBSCAN, a pioneering algorithm for identifying clusters of varied shape and density. The major downside of DBSCAN is its requirement for accurate parameter settings: even a slightly varied setting increases the number of outliers, and identifying the perfect parameters is not feasible. Hence, the proposed NRDBSCAN enhances DBSCAN by introducing a noise reduction component based on a one-class classifier model, realized with a one-class SVM. The results exhibit a significant reduction in noise levels. The current algorithm operates effectively on fixed-density clusters; future work will concentrate on extending the algorithm to datasets with varied densities and on effectively identifying clusters with varied density levels.
REFERENCES
[1] J. A. Hartigan and M. A. Wong, "Algorithm AS 136: A k-means Clustering Algorithm," Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 28, pp. 100-108, 1979.
[2] C. P. Wei, et al., "Empirical Comparison of Fast Clustering Algorithms for Large Data Sets," in Proceedings of the 33rd Annual Hawaii International Conference on System Sciences, IEEE, 2000, pp. 1-10.
[3] M. Ester, et al., "A Density Based Algorithm for Discovering Clusters in Large Spatial Databases with Noise," in 2nd International Conference on Knowledge Discovery and Data Mining (KDD-96), vol. 96, pp. 226-231, 1996.
[4] A. Hinneburg and D. A. Keim, "An Efficient Approach to Clustering in Large Multimedia Databases with Noise," in International Conference on Knowledge Discovery and Data Mining (KDD-98), vol. 98, pp. 58-65, 1998.
[5] M. Ankerst, et al., "OPTICS: Ordering Points to Identify the Clustering Structure," ACM SIGMOD Record, vol. 28, pp. 49-60, 1999.
[6] E. Güngör and A. Özmen, "Distance and Density based Clustering Algorithm using Gaussian Kernel," Expert Systems with Applications, vol. 1, pp. 10-20, 2017.
[7] S. Zhou, et al., "FDBSCAN: A Fast DBSCAN Algorithm," Ruan Jian Xue Bao (Journal of Software), vol. 11, pp. 735-744, 2000.
[8] C. F. Tsai and H. F. Yeh, "NPUST: An Efficient Clustering Algorithm using Partition Space Technique for Large Databases," in International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Springer Berlin Heidelberg, 2009, pp. 787-796.
[9] A. R. Chowdhury, et al., "An Efficient Method for Subjectively Choosing Parameter 'k' Automatically in VDBSCAN (Varied Density Based Spatial Clustering of Applications with Noise) Algorithm," in Second International Conference on Computer and Automation Engineering (ICCAE 2010), IEEE, 2010, pp. 38-41.
[10] M. Parimala, et al., "A Survey on Density Based Clustering Algorithms for Mining Large Spatial Databases," International Journal of Advanced Science and Technology, vol. 31, pp. 59-66, 2011.
[11] X. Chen, et al., "APSCAN: A Parameter Free Algorithm for Clustering," Pattern Recognition Letters, vol. 32, pp. 973-986, 2011.
[12] Y. Zhu, et al., "Density-ratio Based Clustering for Discovering Clusters with Varying Densities," Pattern Recognition, vol. 60, pp. 983-997, 2016.
[13] S. Louhichi, et al., "Unsupervised Varied Density Based Clustering Algorithm using Spline," Pattern Recognition Letters, 2016.
[14] P. Liu, et al., "VDBSCAN: Varied Density Based Spatial Clustering of Applications with Noise," in IEEE International Conference on Service Systems and Service Management, 2007, pp. 1-4.
[15] C. Xiaoyun, et al., "GMDBSCAN: Multi-Density DBSCAN Cluster Based on Grid," in IEEE International Conference on e-Business Engineering (ICEBE 2008), 2008, pp. 780-783.
[16] B. Borah and D. K. Bhattacharyya, "DDSC: A Density Differentiated Spatial Clustering Technique," Journal of Computers, vol. 3, pp. 72-79, 2008.
[17] A. Ram, et al., "An Enhanced Density Based Spatial Clustering of Applications with Noise," in IEEE International Advance Computing Conference (IACC 2009), 2009, pp. 1475-1478.
[18] B. Schölkopf, et al., "Estimating the Support of a High-dimensional Distribution," Neural Computation, vol. 13, pp. 1443-1471, 2001.
[19] L. M. Manevitz and M. Yousef, "One-class SVMs for Document Classification," Journal of Machine Learning Research, vol. 2, pp. 139-154, 2001.
[20] K. Nafees Ahmed and T. Abdul Razak, "Density Based Clustering using Modified PSO based Neighbor Selection," International Journal of Computer Science and Engineering, vol. 9, pp. 192-199.