Any geographic location undergoes changes over a period of time. These changes can be observed with the naked eye only if they are large in number and concentrated in a small area. However, when the changes are small and spread over a large area, it is very difficult to observe or extract them. At present, only a few methods are available for tackling these types of problems, such as GRID and DBSCAN. However, these existing mechanisms are not adequate for accurately detecting the changes that matter most, such as deforestation and land grabbing. This paper proposes a new mechanism to solve the above problem. In the proposed method, satellite images of the same location taken at different times are compared. Partitioning the satellite image into grids, as employed in the proposed hybrid method, provides finer details of the image, which improves the precision of clustering compared to manipulating the whole image at once, as DBSCAN does. The simplicity of DBSCAN is exploited while processing each partitioned grid cell.
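As a rough illustration of this hybrid idea, the sketch below partitions a change mask into grid cells and runs DBSCAN separately on the changed pixels of each cell; the cell size, change threshold, and DBSCAN parameters are illustrative assumptions, not values from the paper.

```python
# Grid-partitioned DBSCAN sketch: cluster changed pixels cell by cell
# instead of over the whole image at once.
import numpy as np
from sklearn.cluster import DBSCAN

def grid_dbscan_changes(img_t1, img_t2, cell=64, thresh=30, eps=2.0, min_samples=5):
    diff = np.abs(img_t1.astype(int) - img_t2.astype(int)) > thresh  # change mask
    clusters = []
    h, w = diff.shape
    for r in range(0, h, cell):
        for c in range(0, w, cell):
            ys, xs = np.nonzero(diff[r:r + cell, c:c + cell])
            if len(ys) >= min_samples:
                pts = np.column_stack([ys + r, xs + c])   # global coordinates
                labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(pts)
                for k in set(labels) - {-1}:              # ignore noise label -1
                    clusters.append(pts[labels == k])
    return clusters  # one coordinate array per detected change region

# Example on random data standing in for two co-registered satellite images:
rng = np.random.default_rng(0)
a = rng.integers(0, 255, (256, 256))
b = a.copy(); b[100:120, 40:60] += 80          # inject a synthetic change
print(len(grid_dbscan_changes(a, b)), "change region(s) found")
```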
DRSP dimension reduction for similarity matching and pruning of time series ...IJDKP
Similarity matching and joining of time series data streams have gained a lot of relevance in today's world of large streaming data. This process finds wide application in areas such as location tracking, sensor networks, and object positioning and monitoring. However, as the size of the data stream increases, so does the cost of retaining all the data needed for similarity matching. We develop a novel framework to address the following objectives. Firstly, dimension reduction is performed in the preprocessing stage, where large stream data is segmented and reduced into a compact representation that retains all the crucial information, using a technique called Multi-level Segment Means (MSM). This reduces the space complexity associated with storing large time-series data streams. Secondly, the framework incorporates an effective similarity matching technique to analyze whether new data objects are symmetric to the existing data stream. Finally, a pruning technique filters out pseudo data object pairs and joins only the relevant pairs. The computational cost for MSM is O(l*ni) and the cost for pruning is O(DRF*wsize*d), where DRF is the Dimension Reduction Factor. We have performed exhaustive experimental trials to show that the proposed framework is both efficient and competent in comparison with earlier works.
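The following sketch illustrates the segment-means idea behind a reduction step of this kind; the two-level structure and segment lengths are assumptions for illustration, not the actual MSM construction.

```python
# Segment-means sketch: each level replaces fixed-length segments of the
# series with their means, shrinking the stored stream window.
import numpy as np

def segment_means(x, seg_len):
    """One reduction level: mean of each consecutive seg_len-sized segment."""
    n = len(x) - len(x) % seg_len            # drop the ragged tail
    return x[:n].reshape(-1, seg_len).mean(axis=1)

def msm_reduce(window, seg_len=4, levels=2):
    """Apply segment means repeatedly, keeping the coarsest representation."""
    out = np.asarray(window, dtype=float)
    for _ in range(levels):
        out = segment_means(out, seg_len)
    return out

stream_window = np.sin(np.linspace(0, 20, 256)) + 0.1 * np.random.randn(256)
compact = msm_reduce(stream_window)           # 256 values -> 16 values
print(len(compact))                           # dimension reduction factor of 16
```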
Influence over the Dimensionality Reduction and Clustering for Air Quality Me...IJAEMSJORNAL
The current trend in industry is to analyze large data sets and apply data mining and machine learning techniques to identify patterns. But the challenge with huge data sets is the high dimensionality associated with them. In some data analytics applications, large amounts of data produce worse performance. Moreover, most data mining algorithms are implemented column-wise, and too many columns restrict performance and make them slower. Therefore, dimensionality reduction is an important step in data analysis. Dimensionality reduction is a technique that converts high dimensional data into a much lower dimension, such that maximum variance is explained within the first few dimensions. This paper focuses on multivariate statistical and artificial neural network techniques for data reduction. Each method has a different rationale for preserving the relationships between input parameters during analysis. Principal Component Analysis, a multivariate technique, and the Self-Organising Map, a neural network technique, are presented in this paper. In addition, a hierarchical clustering approach is applied to the reduced data set. A case study of air quality measurement is used to evaluate the performance of the proposed techniques.
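A minimal sketch of the PCA-plus-hierarchical-clustering branch of such a pipeline is given below (the Self-Organising Map branch is omitted); the synthetic matrix stands in for an air quality table with rows as samples and columns as pollutants.

```python
# Standardize, reduce with PCA, then hierarchically cluster the reduced data.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.cluster import AgglomerativeClustering

X = np.random.rand(200, 12)                        # 12 measured parameters
X_std = StandardScaler().fit_transform(X)          # PCA assumes centered data
pca = PCA(n_components=0.9)                        # keep 90% of the variance
X_red = pca.fit_transform(X_std)
print("dimensions kept:", X_red.shape[1])

labels = AgglomerativeClustering(n_clusters=4).fit_predict(X_red)
```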
Geo-spatial information for managing ambiguity IAESIJEECS
An inherent challenge arising in any dataset containing information about space as well as time is uncertainty due to various sources of imprecision. Incorporating the effect of this uncertainty is fundamental when assessing the reliability (confidence) of any query result derived from the underlying data. To deal with uncertainty, solutions have been proposed independently in the geo-science and the data science research communities. This interdisciplinary tutorial bridges the gap between the two communities by giving a comprehensive overview of the different challenges involved in managing uncertain geo-spatial data, by surveying solutions from both research communities, and by identifying similarities, synergies and open research problems.
A Novel Approach for Clustering Big Data based on MapReduce IJECEIAES
Clustering is one of the most important applications of data mining. It has attracted the attention of researchers in statistics and machine learning and is used in many applications such as information retrieval, image processing and social network analytics. It helps the user understand the similarity and dissimilarity between objects, and cluster analysis lets users understand complex and large data sets more clearly. Different types of clustering algorithms have been analyzed by various researchers. K-means is the most popular partitioning-based algorithm, as it provides good results because of accurate calculation on numerical data; however, K-means gives good results for numerical data only. Big data is a combination of numerical and categorical data. The K-prototype algorithm deals with numerical as well as categorical data by combining the distances calculated from numeric and categorical attributes. With the growth of data from social networking websites, business transactions, scientific calculations etc., there is a vast collection of structured, semi-structured and unstructured data, so K-prototype needs optimization to analyze these varieties of data efficiently. In this work, the K-prototype algorithm is implemented on MapReduce. Experiments show that K-prototype implemented on MapReduce gives better performance on multiple nodes than on a single node; CPU execution time and speedup are used as evaluation metrics. An intelligent splitter is also proposed, which splits mixed big data into numerical and categorical data. Comparison with traditional algorithms shows that the proposed algorithm works better for large-scale data.
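The mixed distance at the heart of k-prototypes can be sketched as below: squared Euclidean distance on the numeric attributes plus a weighted mismatch count on the categorical attributes, where the weight gamma is an assumed tuning parameter.

```python
import numpy as np

def kprototype_distance(x_num, x_cat, proto_num, proto_cat, gamma=1.0):
    numeric_part = np.sum((x_num - proto_num) ** 2)
    categorical_part = np.sum(x_cat != proto_cat)   # simple matching dissimilarity
    return numeric_part + gamma * categorical_part

d = kprototype_distance(np.array([1.0, 2.5]), np.array(["red", "low"]),
                        np.array([0.8, 2.0]), np.array(["red", "high"]))
print(d)   # 0.04 + 0.25 + gamma * 1 mismatch
```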
Improving K-NN Internet Traffic Classification Using Clustering and Principle...journalBEEI
K-Nearest Neighbour (K-NN) is one of the popular classification algorithms. In this research, K-NN is used to classify internet traffic; it is appropriate for large amounts of data and yields accurate classification. However, K-NN is computationally expensive because it calculates the distance to every record in the dataset. Clustering is one solution to this weakness: performed before the K-NN classification, it groups data with the same characteristics without requiring high computing time. Fuzzy C-Means is the clustering algorithm used in this research; it does not require the number of clusters to be fixed in advance, as clusters form naturally from the input dataset. A weakness of Fuzzy C-Means is that its results often differ between runs on the same input because the initial dataset is sub-optimal; optimizing the initial dataset requires a feature selection algorithm. Feature selection is a method to produce an optimal initial dataset for Fuzzy C-Means, and the feature selection algorithm used in this research is Principal Component Analysis (PCA). PCA removes non-significant attributes or features to create an optimal dataset and can improve the performance of both clustering and classification algorithms. The result of this research is that the combination of classification, clustering and feature selection on the internet traffic dataset successfully modeled an internet traffic classification method with higher accuracy and faster performance.
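A hedged sketch of such a pipeline follows: PCA feature reduction, clustering to pre-group the data, then a K-NN per cluster. KMeans stands in for Fuzzy C-Means to keep the sketch within scikit-learn, and the random data is a placeholder for the traffic dataset.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans
from sklearn.neighbors import KNeighborsClassifier

X = np.random.rand(1000, 20); y = np.random.randint(0, 3, 1000)
X_red = PCA(n_components=5).fit_transform(X)        # drop weak features

cluster_ids = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(X_red)

# One small K-NN per cluster: each query only searches its own cluster,
# avoiding distance computations against the entire dataset.
knns = {c: KNeighborsClassifier(n_neighbors=5).fit(X_red[cluster_ids == c],
                                                   y[cluster_ids == c])
        for c in range(4)}
```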
COLOCATION MINING IN UNCERTAIN DATA SETS: A PROBABILISTIC APPROACHIJCI JOURNAL
In this paper we investigate the colocation mining problem in the context of uncertain data. Uncertain data is partially complete data; much real-world data is uncertain, for example demographic data, sensor network data, and GIS data. Handling such data is a challenge for knowledge discovery, particularly in colocation mining. One straightforward method is to find the Probabilistic Prevalent Colocations (PPCs), i.e. all colocations likely to be generated from a random world. For this we first apply an approximation error to find all the PPCs, which reduces the computation. Next, we find all the possible worlds, split them into two different worlds, and compute the prevalence probability. These worlds are compared against a minimum probability threshold to decide whether a colocation is a Probabilistic Prevalent Colocation (PPC) or not. Experimental results on the selected data set show a significant improvement in computation time compared to some existing colocation mining methods.
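The possible-worlds computation can be approximated by Monte Carlo sampling, as in the sketch below; the existence probabilities, prevalence threshold and world count are illustrative assumptions rather than the paper's exact procedure.

```python
# Monte Carlo sketch of the possible-worlds idea: each spatial instance
# exists with some probability, and a colocation is probabilistically
# prevalent if its prevalence exceeds min_prev in enough sampled worlds.
import numpy as np

def ppc_probability(instance_probs, min_prev=0.4, worlds=10000, seed=0):
    """instance_probs: existence probability of each colocation instance."""
    rng = np.random.default_rng(seed)
    p = np.asarray(instance_probs)
    # Sample worlds: each row decides which instances exist in that world.
    exists = rng.random((worlds, len(p))) < p
    prevalence = exists.mean(axis=1)          # realized fraction per world
    return np.mean(prevalence >= min_prev)    # P(colocation is prevalent)

prob = ppc_probability([0.9, 0.7, 0.4, 0.6, 0.3])
print("prevalence probability:", prob)        # compare to a probability threshold
```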
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT...IJDKP
Many applications of automatic document classification require learning accurately with little training data. The semi-supervised classification technique uses labeled and unlabeled data for training. This technique has been shown to be effective in some cases; however, the use of unlabeled data is not always beneficial.
On the other hand, the emergence of web technologies has given rise to the collaborative development of ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency of semi-supervised document classification.
We used support vector machines, one of the most effective algorithms studied for text. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results applying our algorithm to three different datasets. Our experiments show an accuracy increase of 4% on average, and up to 20%, in comparison with the traditional semi-supervised model.
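Since scikit-learn offers no transductive SVM, the sketch below uses SelfTrainingClassifier wrapping an SVC as a semi-supervised stand-in; the ontology-based enhancement is not reproduced, and make_classification stands in for TF-IDF document vectors.

```python
# Semi-supervised SVM stand-in: unlabeled samples are marked with -1 and
# the self-training wrapper iteratively pseudo-labels them.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.svm import SVC
from sklearn.semi_supervised import SelfTrainingClassifier

X, y_true = make_classification(n_samples=200, n_features=20, random_state=0)
y = y_true.copy()
y[50:] = -1                                   # treat 150 samples as unlabeled

clf = SelfTrainingClassifier(SVC(probability=True)).fit(X, y)
print((clf.predict(X[50:]) == y_true[50:]).mean())  # accuracy on unlabeled part
```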
Graph data has recently arisen in many applications, and such large amounts of data must be managed by performing various graph operations through graph search queries. Many approaches and algorithms serve this purpose but continually require improvement in stability and performance, and such approaches are less efficient when large and complex data is involved. Applications need to execute faster to improve overall system performance and must support many advanced and complex operations. Shortest path estimation is one of the key search queries in many applications. Here we present a system that finds the shortest path between nodes and improves system performance with the help of different shortest path algorithms, such as bidirectional search and the A* algorithm, taking a relational approach that uses standard SQL queries and exploits the advantages of a relational database to solve the problem efficiently.
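The two search strategies named above can be sketched with networkx as follows; the grid graph and Euclidean heuristic are illustrative assumptions, and the relational/SQL formulation of the paper is not reproduced.

```python
import networkx as nx

G = nx.grid_2d_graph(20, 20)                       # nodes are (row, col) pairs
for u, v in G.edges:
    G.edges[u, v]["weight"] = 1.0

def euclidean(u, v):                               # admissible A* heuristic
    return ((u[0] - v[0]) ** 2 + (u[1] - v[1]) ** 2) ** 0.5

path_astar = nx.astar_path(G, (0, 0), (19, 19), heuristic=euclidean, weight="weight")
length, path_bidir = nx.bidirectional_dijkstra(G, (0, 0), (19, 19))
print(len(path_astar), length)
```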
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSEIJDKP
Metadata represents information about the data to be stored in Data Warehouses and is a mandatory element in building an efficient Data Warehouse. Metadata helps in data integration, lineage, data quality, and populating transformed data into the data warehouse. Spatial data warehouses are based on spatial data, mostly collected from Geographical Information Systems (GIS) and transactional systems specific to an application or enterprise. Metadata design and deployment is the most critical phase in building a data warehouse, where it is mandatory to bring spatial information and data modeling together. In this paper, we present a holistic metadata framework that drives metadata creation for the spatial data warehouse. Theoretically, the proposed framework improves the efficiency of data access in response to frequent queries on SDWs; in other words, it decreases query response time while accurate information, including the spatial information, is fetched from the Data Warehouse.
PREDICTION OF STORM DISASTER USING CLOUD MAP-REDUCE METHODAM Publications
Data mining is the process of analyzing data from different perspectives and summarizing it into useful information; the patterns, associations, or relationships among the data can provide that information. Spatial Data Mining (SDM) is a part of data mining mainly used for finding patterns in data related to space. In spatial data mining, analysts use geographical or spatial information, which requires special techniques to obtain geographical data in the appropriate formats. SDM is mainly used on earth science data, crime mapping, census data, traffic data, and cancer clusters (i.e. to investigate environmental health hazards). For real-time processing, stream data mining is used to predict storms from a spatial dataset with the help of a stream data mining strategy, CMR (Cloud Map-Reduce). Here, stream data mining is applied to detect storm disasters in the coastal area of Central America, using a dataset covering various regions, i.e. Panama, the Greater Antilles, and the Gulf of Mexico. The method detects whether a region is affected, and if so which area of that region, which helps predict the storm before the disaster occurs. Two parameters, processing time and computational load, are used to test the technique and to compare the previous and proposed methods. The main aim is to predict a storm disaster before it happens using the directionality and velocity of the storm.
High dimensionality reduction on graphical dataeSAT Journals
Abstract Although graph embedding has been a powerful instrument for modeling the intrinsic structure of data, simply using all features for structure discovery may amplify noise. This is particularly severe for high dimensional data with few samples. To meet this challenge, a novel efficient framework is proposed to perform feature selection for graph embedding, in which a class of graph embedding methods is cast as a least squares regression problem. In this framework, a binary feature selector is introduced to naturally handle the feature cardinality in the least squares formulation. The proposed method is fast and memory efficient, and is applied to several graph embedding learning problems, including supervised, unsupervised and semi-supervised graph embedding. Key Words: Efficient feature selection, High dimensional data, Sparse graph embedding, Sparse principal component analysis, Subproblem optimization.
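The least-squares view of feature selection can be sketched with a Lasso regression as a continuous relaxation of the binary feature selector; the embedding target below is simply the top principal component, an illustrative stand-in for a graph embedding.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import Lasso

X = np.random.rand(100, 50)                       # few samples, many features
target = PCA(n_components=1).fit_transform(X).ravel()

selector = Lasso(alpha=0.01).fit(X, target)       # sparse least squares fit
selected = np.nonzero(selector.coef_)[0]          # surviving feature indices
print(len(selected), "of 50 features kept")
```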
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture Engineering,
Aerospace Engineering.
With the rapid development of Geographic Information Systems (GISs) and their applications, more and more geographical databases have been developed by different vendors. However, data integration and access are still a big problem for the development of GIS applications, as no interoperability exists among different spatial databases. In this paper we propose a unified approach for spatial data query. The paper describes a framework for integrating information from repositories containing different vector data formats and repositories containing raster datasets. The presented approach converts different vector data formats into a single unified format (File Geo-Database "GDB"). In addition, we employ "metadata" to support a wide range of user queries for retrieving relevant geographic information from heterogeneous and distributed repositories. This enhances both query processing and performance.
New proximity estimate for incremental update of non uniformly distributed cl...IJDKP
Conventional clustering algorithms mine static databases and generate a set of patterns in the form of clusters. Many real-life databases keep growing incrementally; for such dynamic databases, the patterns extracted from the original database become obsolete. Thus conventional clustering algorithms are not suitable for incremental databases, as they lack the capability to modify clustering results in accordance with recent updates. In this paper, the author proposes a new incremental clustering algorithm called CFICA (Cluster Feature-based Incremental Clustering Approach for numerical data) to handle numerical data, and suggests a new proximity metric called the Inverse Proximity Estimate (IPE), which considers the proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity. CFICA uses the proposed proximity metric to determine the membership of a data point in a cluster.
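One plausible reading of the IPE idea is sketched below: a point's score combines its distance to the cluster representative with its distance to the farthest point in its vicinity. The exact combination used by CFICA is not reproduced; the form here is an assumption for illustration.

```python
import numpy as np

def inverse_proximity_estimate(x, representative, vicinity):
    d_rep = np.linalg.norm(x - representative)
    d_far = max(np.linalg.norm(x - p) for p in vicinity)   # farthest point nearby
    return d_rep * (d_rep / d_far)      # assumed form: penalize sparse vicinity

x = np.array([1.0, 1.0])
rep = np.array([0.0, 0.0])
vicinity = [np.array([1.2, 0.9]), np.array([3.0, 3.0])]
print(inverse_proximity_estimate(x, rep, vicinity))
```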
Data imputation is used to posit missing data values, as missing data have a negative effect on the validity of model computations. This study develops a genetic algorithm (GA) to optimize the imputation of missing cost data for fans used in road tunnels by the Swedish Transport Administration (Trafikverket). The GA imputes the missing cost data using an optimized valid data period. The results show highly correlated data (R-squared 0.99) after imputing the missing data; the GA thus provides a wide search space for optimizing imputation and creating complete data, which can then be used for forecasting and life cycle cost analysis. Ritesh Kumar Pandey | Dr Asha Ambhaikar, "Data Imputation by Soft Computing", published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-2, Issue-4, June 2018, URL: http://www.ijtsrd.com/papers/ijtsrd14112.pdf http://www.ijtsrd.com/computer-science/real-time-computing/14112/data-imputation-by-soft-computing/ritesh-kumar-pandey
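A minimal GA sketch of this imputation idea follows: candidate values for the missing entries evolve so that the completed series best correlates with a reference series. Population size, mutation scale and the fitness choice are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
reference = np.linspace(10, 20, 50) + rng.normal(0, 0.2, 50)
cost = reference * 3.0
missing = rng.choice(50, size=10, replace=False)
cost[missing] = np.nan                               # knock out 10 cost values

def fitness(candidate):
    filled = cost.copy(); filled[missing] = candidate
    return np.corrcoef(filled, reference)[0, 1]      # maximize correlation

pop = rng.uniform(20, 70, size=(40, len(missing)))   # 40 candidate fills
for _ in range(200):
    scores = np.array([fitness(ind) for ind in pop])
    parents = pop[np.argsort(scores)[-20:]]          # keep the best half
    children = parents + rng.normal(0, 0.5, parents.shape)  # mutate
    pop = np.vstack([parents, children])

best = pop[np.argmax([fitness(ind) for ind in pop])]
print("R-squared:", fitness(best) ** 2)
```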
The complexity of landscape pattern mining is well established, owing to non-linear spatial image formation and the inhomogeneity of satellite images. The LandEx tool from the literature needs several seconds to answer an input image pattern query, and the duration of content-based image retrieval depends on the complexity of the input query. This paper focuses on designing and implementing a training dataset to train an NML (Neural network based Machine Learning) algorithm to reduce search time and improve result accuracy. The performance evaluation of the proposed NML CBIR (Content Based Image Retrieval) method compares satellite and natural images in terms of speed and accuracy.
Keywords: Spatial Image, Satellite image, NML, CBIR
Mobile Location Indexing Based On Synthetic Moving ObjectsIJECEIAES
Today, a growing body of research addresses indexing of data that moves, known as mobile object indexing, departing from the traditional static case. Several indexing approaches handle complicated moving positions; one suitable idea is to pre-order the objects before building the index structure. In this paper, a presorted-nearest index tree algorithm is proposed that allows maintaining, updating, and range-querying mobile objects within a desired period. Besides, it offers the advantages of an index structure, namely easy data access and fast queries, along with retrieval of the nearest locations to a query point in the index structure. A synthetic mobile position dataset is also proposed for performance evaluation, so that it is free from location privacy and confidentiality concerns. Detailed experimental results are discussed together with a performance evaluation against a KD-tree-based index structure. Both approaches are similarly efficient in range searching; however, the proposed approach saves considerably more time for nearest neighbor search within a range than the KD-tree-based calculation.
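The KD-tree baseline mentioned above can be sketched with SciPy as follows, using synthetic positions standing in for a mobile position dataset at one time snapshot.

```python
# KD-tree range search and nearest-neighbor search over synthetic positions.
import numpy as np
from scipy.spatial import cKDTree

positions = np.random.rand(10000, 2) * 100        # synthetic object locations
tree = cKDTree(positions)

query = np.array([50.0, 50.0])
in_range = tree.query_ball_point(query, r=5.0)    # all objects within radius 5
dist, idx = tree.query(query, k=3)                # 3 nearest objects
print(len(in_range), idx)
```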
Integrating Web Services With Geospatial Data Mining Disaster Management for ...Waqas Tariq
Data Mining (DM) and Geographical Information Systems (GIS) are complementary techniques for describing, transforming, analyzing and modeling data about real-world systems. GIS and DM are naturally synergistic technologies that can be joined to produce powerful market insight from a sea of disparate data, and Web Services greatly simplify the development of many kinds of data integration and knowledge management applications. This research aims to develop a spatial DM web service that integrates state-of-the-art GIS and DM functionality in an open, highly extensible, web-based architecture. Interoperability of geospatial data previously focused just on data formats and standards; the recent popularity and adoption of Web Services provides new means of interoperability for geospatial information, not just for exchanging data but for analyzing it during exchange as well. An integrated, user-friendly spatial DM system available on the internet via a web service offers exciting new possibilities for making geo-spatial analysis ready for decision making and geographical research for a wide range of potential users.
APPLICATION OF DYNAMIC CLUSTERING ALGORITHM IN MEDICAL SURVEILLANCEIJCSEA Journal
Traditional medical analysis is based on static data: the medical data is analyzed only after the collection of the data sets is completed, but this is far from satisfying the actual demand. Large amounts of medical data are generated in real time, so real-time analysis can yield more value. This paper introduces the design of Sentinel, a real-time analysis system based on a clustering algorithm. Sentinel performs clustering analysis of real-time data and issues early alerts.
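A hedged sketch of such a Sentinel-style loop follows: arriving readings are clustered incrementally and an alert is raised when a batch contains points far from every learned center. MiniBatchKMeans and the alert threshold are illustrative choices, not the paper's actual design.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

model = MiniBatchKMeans(n_clusters=3, random_state=0, n_init=3)
rng = np.random.default_rng(0)

for step in range(100):                       # simulated real-time batches
    batch = rng.normal(0, 1, (32, 4))
    if step == 70:
        batch[:5] += 8.0                      # inject anomalous readings
    model.partial_fit(batch)                  # incremental cluster update
    dist_to_center = model.transform(batch).min(axis=1)
    if (dist_to_center > 5.0).any():          # assumed alert threshold
        print(f"early alert at batch {step}")
```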
International Journal of Research in Engineering and Science is an open access peer-reviewed international forum for scientists involved in research to publish quality and refereed papers. Papers reporting original research or experimentally proved review work are welcome. Papers for publication are selected through peer review to ensure originality, relevance, and readability.
Clustering Using Shared Reference Points Algorithm Based On a Sound Data ModelWaqas Tariq
A novel clustering algorithm, CSHARP, is presented for finding clusters of arbitrary shapes and arbitrary densities in high dimensional feature spaces. It can be considered a variation of the Shared Nearest Neighbor algorithm (SNN), in which each sample data point votes for the points in its k-nearest neighborhood. Sets of points sharing a common mutual nearest neighbor are considered dense regions/blocks. These blocks are the seeds from which clusters may grow; therefore, CSHARP is not a point-to-point clustering algorithm but a block-to-block clustering technique. Many of its advantages come from two facts: noise points and outliers correspond to blocks of small sizes, and homogeneous blocks highly overlap. The technique is not prone to merging clusters of different densities or different homogeneity. The algorithm has been applied to a variety of low and high dimensional data sets with superior results over existing techniques such as DBScan, K-means, Chameleon, Mitosis and Spectral Clustering. The quality of its results, as well as its time complexity, rank it ahead of these techniques.
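The shared-nearest-neighbor similarity that CSHARP builds on can be sketched compactly: two points are similar if their k-nearest-neighbor lists overlap heavily, and heavily overlapping neighborhoods form the dense blocks from which clusters grow. The value of k is an illustrative choice.

```python
import numpy as np
from sklearn.neighbors import NearestNeighbors

X = np.random.rand(300, 8)
k = 10
_, nbrs = NearestNeighbors(n_neighbors=k + 1).fit(X).kneighbors(X)
neighbor_sets = [set(row[1:]) for row in nbrs]        # drop self (column 0)

def snn_similarity(i, j):
    return len(neighbor_sets[i] & neighbor_sets[j])   # shared neighbors count

print(snn_similarity(0, 1), "shared neighbors out of", k)
```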
Fusion of Multispectral And Full Polarimetric SAR Images In NSST DomainCSCJournals
Polarimetric SAR (POLSAR) and multispectral images provide different characteristics of the imaged objects: multispectral imagery provides information about surface material, while POLSAR provides information about the geometrical and physical properties of the objects. Merging both should resolve many of the object recognition problems that exist when they are used separately. In this paper, we propose a new scheme for fusing a full-polarization radar image (POLSAR) with a multispectral optical satellite image (Egyptsat). The proposed scheme is based on the Non-Subsampled Shearlet Transform (NSST) and a multi-channel Pulse Coupled Neural Network (m-PCNN). We use NSST to decompose images into low frequency and band-pass sub-band coefficients. For the low frequency coefficients, a fusion rule is proposed based on local energy and the dispersion index; for the sub-band coefficients, m-PCNN guides how the fused coefficients are calculated using image textural information.
The proposed method is applied to three batches of Egyptsat (red-green-infrared) and Radarsat-2 (C-band full-polarimetric HH, HV and VV polarization) images. The batches are selected to react differently to different polarizations. Visual assessment of the fused image shows excellent clarity and delineation of different objects, and quantitative evaluations show that the proposed method outperforms the other data fusion methods.
Change Detection of Water-Body in Synthetic Aperture Radar ImagesCSCJournals
Change detection is the art of quantifying the changes in Synthetic Aperture Radar (SAR) images that have happened over a period of time; remote sensing has been the parent technique for change detection analysis. This paper empirically investigates the impact of applying combinations of texture features with different classification techniques to separate water bodies from non-water bodies. First, the images are classified using unsupervised Principal Component Analysis (PCA) based K-means clustering for dimension reduction. Then texture features such as Energy, Entropy, Contrast, Inverse Differential Moment, Directional Moment and the Median are extracted using the Gray Level Co-occurrence Matrix (GLCM), and these features are fed to Linear Vector Quantization (LVQ) and Support Vector Machine (SVM) classifiers. This paper aims to apply a combination of texture features in order to significantly improve detection accuracy. The resulting detection analysis informs management and policy decision-making for long-term construction projects by predicting preventable losses.
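The texture-feature step can be sketched with scikit-image as below, with entropy computed directly from the normalized co-occurrence matrix; the random patch stands in for a SAR image window, and an SVM or LVQ classifier would consume these feature vectors downstream.

```python
import numpy as np
from skimage.feature import graycomatrix, graycoprops

patch = np.random.randint(0, 64, (32, 32), dtype=np.uint8)   # quantized patch
glcm = graycomatrix(patch, distances=[1], angles=[0], levels=64,
                    symmetric=True, normed=True)

features = {prop: graycoprops(glcm, prop)[0, 0]
            for prop in ("energy", "contrast", "homogeneity")}
p = glcm[:, :, 0, 0]
features["entropy"] = -np.sum(p[p > 0] * np.log2(p[p > 0]))  # not in graycoprops
print(features)
```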
IJRET: International Journal of Research in Engineering and Technology is an international peer-reviewed online journal published by eSAT Publishing House for the enhancement of research in various disciplines of Engineering and Technology. The aim and scope of the journal is to provide an academic medium and an important reference for the advancement and dissemination of research results that support high-level learning, teaching and research in the fields of Engineering and Technology. We bring together scientists, academicians, field engineers, scholars and students of related fields of Engineering and Technology.
This last part of a course about SAR images concerns urban areas.
Recent developments concerning urban areas are presented, including advanced modes such as polarimetry, interferometry, DInSAR and POLINSAR.
A Study on Data Visualization Techniques of Spatio Temporal DataIJMTST Journal
Data visualization is an important tool for analyzing complex spatio-temporal data. Spatio-temporal data can be visualized using 2D, 3D or other types of maps; cartography is the major technique used in mapping. The data can also be visualized by placing different layers of maps one on another, which is done using GIS. Many data visualization techniques are in use, but the choice of technique must be decided by considering the application requirements.
In this study, various techniques for exploratory spatial data analysis are reviewed: spatial autocorrelation, Moran's I statistic, hot spot analysis, and spatial lag and spatial error models.
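Moran's I, listed above, is I = (n/W) * Σ_ij w_ij z_i z_j / Σ_i z_i², where z are the deviations from the mean and W is the sum of all spatial weights; a direct numpy implementation follows, with a toy weight matrix as an illustration.

```python
import numpy as np

def morans_i(values, weights):
    z = values - values.mean()
    W = weights.sum()
    n = len(values)
    return (n / W) * (z @ weights @ z) / (z @ z)

# Toy example: 4 regions on a line, adjacent regions are neighbors.
x = np.array([1.0, 2.0, 8.0, 9.0])
w = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
print(morans_i(x, w))   # positive: similar values cluster together
```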
SCAF – AN EFFECTIVE APPROACH TO CLASSIFY SUBSPACE CLUSTERING ALGORITHMSijdkp
Subspace clustering discovers the clusters embedded in multiple, overlapping subspaces of high dimensional data. Many significant subspace clustering algorithms exist, each having different characteristics caused by the use of different techniques, assumptions and heuristics. A comprehensive classification scheme is essential that considers all such characteristics to divide subspace clustering approaches into families, where the algorithms in the same family satisfy common characteristics. Such a categorization will help future developers better understand the quality criteria to be used and the similar algorithms against which to compare their proposed clustering algorithms. In this paper, we first propose the concept of SCAF (Subspace Clustering Algorithms' Family). The characteristics of a SCAF are based on classes such as cluster orientation and overlap of dimensions. As an illustration, we further provide a comprehensive, systematic description and comparison of a few significant algorithms belonging to the "axis-parallel, overlapping, density-based" SCAF.
MULTICRITERIA DECISION AIDED SYSTEM FOR RANKING INDUSTRIAL ZONES (RPRO4SIGZI) cscpconf
Integration of Geographic Information Systems (GIS) and multi-criteria decision analysis (MCDA) is a privileged and indispensable way to evolve GIS into real decision support systems. RPRO4SIGZI, the system proposed in this paper, allows, based on a detailed study of geographical, environmental and socio-economic criteria, GIS and a multi-criteria decision analysis method to cooperate in spatially choosing the right site for installing industrial projects. The result obtained by RPRO (Ranking PROMETHEE) for ranking industrial zones in western Algeria is refined by viewing it in SIGZI (Geographic Information System for Industrial Zones). The RPRO unit ranks industrial zones using the outranking PROMETHEE II method from the European school, and the SIGZI module visualizes these zones on the map. The RPRO4SIGZI system was designed to evaluate a new multi-criteria analysis methodology guided by data mining; the objective is to show how data mining is used to model the preferences of a decision maker tainted with subjectivity and hesitance, in order to generate suitable performance tables. Only the RPRO4SIGZI system is presented in this paper.
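PROMETHEE II net-flow ranking, as used by the RPRO unit, can be sketched as below; the "usual" preference function (1 if strictly better, else 0) and the toy evaluation table of zones are illustrative assumptions.

```python
import numpy as np

def promethee_ii(table, weights):
    """table[i, c] = score of alternative i on criterion c (higher = better)."""
    n = len(table)
    pref = np.zeros((n, n))
    for i in range(n):
        for j in range(n):
            # Weighted count of criteria where i strictly beats j.
            pref[i, j] = np.sum(weights * (table[i] > table[j]))
    phi_plus = pref.sum(axis=1) / (n - 1)    # how strongly i outranks others
    phi_minus = pref.sum(axis=0) / (n - 1)   # how strongly i is outranked
    return phi_plus - phi_minus              # net flow: rank descending

zones = np.array([[7.0, 3.0, 9.0],           # rows: industrial zones
                  [5.0, 8.0, 6.0],           # cols: geographic, environmental,
                  [9.0, 4.0, 2.0]])          #       socio-economic criteria
weights = np.array([0.5, 0.3, 0.2])
print(np.argsort(-promethee_ii(zones, weights)))  # best zone first
```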
A STUDY OF TRADITIONAL DATA ANALYSIS AND SENSOR DATA ANALYTICSijistjournal
The growth of smart and intelligent devices known as sensors generates large amounts of data. The data generated over a time span reaches such a large volume that it is designated big data, and the data repository holds unstructured data. Traditional data analytics methods are well developed and widely used to analyze structured data and, to a limited extent, semi-structured data, which involves additional processing overheads. The methods used to analyze unstructured data are different because of the distributed computing approach, whereas centralized processing is possible for structured and semi-structured data. The undertaken work is confined to an analysis of both varieties of methods. The result of this study is intended to introduce the methods available for analyzing big data.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
While designing a new type of engineering material, one has to search for existing materials which suit the design requirement before trying to produce the new material. This selection process is tedious, as one has to select a few materials out of a set of lakhs of materials. Therefore, this paper proposes a model to select a particular material which suits the user requirement, using similarity/distance measuring functions. Thirteen different types of similarity/distance measuring functions are examined. A Performance Index Measure (PIM) is calculated to verify the relative performance of the selected material against the target material. All results are then normalised for the purpose of analysis. The proposed model thus reduces the time wasted in selection and avoids haphazard selection of materials in materials design and manufacturing industries.
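The selection idea can be sketched as follows: score candidate materials against a target property vector under several distance functions, then normalise each measure across materials so the scores are comparable. Three of the thirteen measures are shown, and the property vectors are made up for illustration.

```python
import numpy as np
from scipy.spatial.distance import euclidean, cityblock, cosine

target = np.array([200.0, 7.8, 0.3])       # e.g. strength, density, cost
materials = {"steel A": np.array([210.0, 7.9, 0.35]),
             "alloy B": np.array([180.0, 6.5, 0.50]),
             "alloy C": np.array([250.0, 8.4, 0.20])}

names = list(materials)
scores = np.array([[euclidean(materials[m], target),
                    cityblock(materials[m], target),
                    cosine(materials[m], target)] for m in names])
normalised = scores / scores.max(axis=0)   # normalise each measure across materials
best = names[int(normalised.sum(axis=1).argmin())]
print(best)                                # smallest overall distance wins
```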
The urban impervious surface has been identified as an important measurable indicator of urbanization and its environmental implications. This paper presents a strategy based on three-dimensional convolutional neural networks (3D CNNs) for extracting urban impervious surfaces from LiDAR datasets using deep learning technology. Various 3D CNN parameters are tested to see how they affect impervious surface extraction. For urban impervious surface delineation, this study investigates the synergistic integration of multiple remote sensing datasets of Azad Kashmir, Pakistan, to alleviate the restrictions imposed by single-sensor data. Overall accuracy was greater than 95% and the overall kappa value was greater than 90% for our suggested 3D CNN approach, which shows great promise for impervious surface extraction. Because it uses multiscale convolutional processes to combine spatial and spectral information with texture and feature maps, our proposed 3D CNN approach extracts urban impervious surfaces more effectively than the commonly utilized pixel-based support vector machine classifier. In the fast-growing big data era, image analysis presents significant obstacles, yet our proposed 3D CNNs effectively extract more urban impervious surfaces.
Principle Component Analysis Based on Optimal Centroid Selection Model for Su...ijtsrd
Clustering large, sparse, large-scale data is an open research problem in data mining. Discovering significant information through clustering algorithms often proves inadequate, as most of the data turns out to be non-actionable, and existing clustering techniques are not feasible for time-varying data in high-dimensional space. Hence, subspace clustering addresses these problems through the incorporation of domain knowledge and parameter-sensitive prediction; the sensitivity of the data is also predicted through a thresholding mechanism. The usability and usefulness of 3D subspace clustering are important issues, and determining the correct dimension is an inconsistent and challenging problem as well. The solution is highly helpful for police departments and law enforcement organisations to better understand stock issues and provides insights that enable them to track activities and predict their likelihood. In this work, a centroid-based subspace forecasting framework with constraints is proposed, i.e. must-link and must-not-link constraints with domain knowledge. In the unsupervised subspace clustering algorithm, inconsistent constraints correlating to dimensions are resolved through singular value decomposition. Principal component analysis is used, in which a condition is explored to estimate the actionable strength of particular attributes, utilizing domain knowledge to refine and validate the optimal centroids dynamically. Experimental results show that the proposed framework outperforms other competing subspace clustering techniques in terms of efficiency, F-measure, parameter insensitiveness and accuracy. G. Raj Kamal | A. Deepika | D. Pavithra | J. Mohammed Nadeem | V. Prasath Kumar, "Principle Component Analysis Based on Optimal Centroid Selection Model for SubSpace Clustering Model," published in International Journal of Trend in Scientific Research and Development (ijtsrd), ISSN: 2456-6470, Volume-4, Issue-4, June 2020. URL: https://www.ijtsrd.com/papers/ijtsrd31374.pdf Paper URL: https://www.ijtsrd.com/computer-science/data-miining/31374/principle-component-analysis-based-on-optimal-centroid-selection-model-for-subspace-clustering-model/g-raj-kamal
Clustering heterogeneous categorical data using enhanced mini batch K-means ...IJECEIAES
Clustering methods in data mining aim to group a set of patterns based on their similarity. In a data survey, heterogeneous information arises from various types of data scales like nominal, ordinal, binary, and Likert scales. A lack of treatment of heterogeneous data and information leads to loss of information and scanty decision-making. Although many similarity measures have been established, solutions for heterogeneous data in clustering are still lacking. The recent entropy distance measure seems to provide good results for heterogeneous categorical data; however, it requires many experiments and evaluations. This article presents a proposed framework for heterogeneous categorical data using mini batch k-means with an entropy measure (MBKEM), investigating the effectiveness of the similarity measure when clustering heterogeneous categorical data. Secondary data from a public survey was used. The findings demonstrate that the proposed framework improves clustering quality: MBKEM outperformed other clustering algorithms with accuracy at 0.88, v-measure (VM) at 0.82, adjusted Rand index (ARI) at 0.87, and Fowlkes-Mallows index (FMI) at 0.94. The average minimum elapsed time for cluster generation, k, was observed at 0.26 s. In the future, the proposed solution would be beneficial for improving the quality of clustering for heterogeneous categorical data problems in many domains.
Hyperparameters analysis of long short-term memory architecture for crop cla...IJECEIAES
Deep learning (DL) has seen a massive rise in popularity for remote sensing (RS) based applications over the past few years. However, the performance of DL algorithms depends on the optimization of various hyperparameters, since hyperparameters have a huge impact on the performance of deep neural networks. The impact of hyperparameters on the accuracy and reliability of DL models is a significant area for investigation. In this study, the grid search algorithm is used for hyperparameter optimization of a long short-term memory (LSTM) network for RS-based classification. The hyperparameters considered in this study are the optimizer, activation function, batch size, and number of LSTM layers. Over 1,000 hyperparameter sets are evaluated, and the results of all the sets are analyzed to see the effects of various combinations of hyperparameters as well as each individual parameter's effect on the performance of the LSTM model. The performance of the LSTM model is evaluated using the metrics of minimum loss and average loss, and it was found that classification can be highly affected by the choice of optimizer, whereas other parameters, such as the number of LSTM layers, have less influence.
A Mixture Model of Hubness and PCA for Detection of Projected OutliersZac Darcy
With the advancement of time and technology, outlier mining methodologies help to sift through large amounts of interesting data patterns and winnow out the malicious data entering any field of concern. It has become indispensable to build not only a robust and generalised model for anomaly detection, but also to equip the same model with extra features like utmost accuracy and precision. Although the K-means algorithm is one of the most popular, unsupervised and easiest clustering algorithms, it can be dovetailed with PCA, hubness and a robust Gaussian mixture model to build a very generalised and robust anomaly detection system. A major loophole of the K-means algorithm is its tendency to settle in local minima, resulting in ambiguous clusters. In this paper, an attempt is made to combine the K-means algorithm with the PCA technique, which results in the formation of more closely centred clusters on which K-means works more accurately. This combination not only provides a great boost to the detection of outliers but also enhances its accuracy and precision.
A hybrid approach for analysis of dynamic changes in spatial data
International Journal of Database Management Systems (IJDMS), Vol. 6, No. 1, February 2014. DOI: 10.5121/ijdms.2014.6104
A HYBRID APPROACH FOR ANALYSIS OF DYNAMIC CHANGES IN SPATIAL DATA

Ch. Mallikarjuna Rao¹ and Dr. Ananda Rao Akepogu²
¹ Department of Computer Science and Engineering, G.R.I.E.T, Hyderabad, India
² Director of IR&P, SCDE, JNTUCEA, Ananthapuramu, India
ABSTRACT
Any geographic location undergoes changes over a period of time. These changes can be observed by the naked eye only if they are large in number and concentrated in a small area; when the changes are small and spread over a large area, it is very difficult to observe or extract them. At present only a few methods, such as GRID and DBSCAN, are available for tackling these types of problems, and these existing mechanisms are not adequate for accurately detecting the most important geometrical changes, such as deforestation and land grabbing. This paper proposes a new mechanism to solve this problem. In the proposed method, satellite images of the same location are compared over a period of time. Partitioning the satellite image into grids, as employed in the proposed hybrid method, provides finer details of the image, which improves the precision of clustering compared with manipulating the whole image at once, as done in DBSCAN. The simplicity of DBSCAN is exploited while processing each partitioned grid portion.
KEYWORDS
Clustering, grid-based algorithm, grid density, grids, image.
1. INTRODUCTION
Detection of geometrical changes in satellite imagery is a challenging problem, as such changes are connected to naturally occurring disasters in the environment. Geographically located objects are represented by spatial data, which needs to be monitored for the analysis, classification and study of the various features of spatial objects. Spatial data mining is the study of the different kinds of relationships among such objects, such as topological information, spatial indexing and distances. The data of the collected objects are processed, and information about their distinctive features is the outcome of spatial data mining.
Intensive studies have been conducted on data mining, such as those reported in [1,2,3,4], and spatial data has been analyzed using clustering algorithms [5,6] with the main objective of structuring data by grouping similar features and attributes among the data objects. Previous work in spatial data mining established many algorithms, such as DBSCAN and GRID. DBSCAN cannot cluster data sets well when there are large differences in density, since the MinPts-epsilon combination cannot then be chosen appropriately for all clusters. Similarly, GRID clustering depends on the granularity of the lowest level of the grid structure: if the granularity is very fine, the cost of processing increases substantially, while if the bottom level of the grid structure is too coarse, the quality of the cluster analysis may be reduced. Hence, a new clustering method is proposed which combines the features of GRID and DBSCAN while taking care of their limitations.
The performance of the proposed technique has been compared with the existing methods using the same data set. A notable finding of the investigation is that the kappa coefficient of our proposed hybrid algorithm is extremely high [95%] when compared to the best GRID approach [81%]. The clustering performance is carefully analyzed in terms of both accuracy and processing time. The key outcome of the investigation is a mechanism for detecting geometrical changes in satellite images with a high degree of accuracy in terms of two parameters: i) the number of clusters generated, and ii) the deviation of the spectral distribution of colors.
Paper organization: Section 2 presents a literature survey of spatial data analysis. The proposed method is explained in detail in Section 3. Implementation, results and discussion are presented in Sections 4 and 5. Section 6 concludes and outlines the scope for future work.
2. LITERATURE SURVEY
Spatial data analysis is the process by which raw data generates useful information. In simple words, spatial analysis comprises the formal methods that study entities or objects using their topological and geographic properties. Topology deals with the most basic properties of space, such as connectedness, geometric features and the relative positions of figures; geography contributes the study of lands, features, inhabitants, etc. Spatial analysis is a set of methods whose results change when the location of the objects being analyzed changes, or when the frame used to analyze them changes. The most basic approach in spatial data processing is modeling the spatial location of the attributes and objects to be analyzed.
2.1 Significance of Spatial Data Analysis
In modern technological and scientific society, spatio-temporal data analysis and spatial data mining have become essential research areas for solving land-cover mapping problems in urban planning and land usage [7], and for the continuous processing and analysis of repositories of remote sensing images to generate thematic land-use maps [8,9] and virtual globes. Applications of data mining cover various fields such as environment, climate and mobile commerce.
Traditional data mining techniques often exhibit poor performance when applied to spatial and spatio-temporal data sets, for several reasons [10,11]. First, spatial datasets are continuous, whereas other kinds of information reside in discrete databases. Second, interesting patterns in spatial data are often local, while traditional data mining techniques emphasize global patterns. Moreover, a common assumption in the statistical analysis of classical data sets is that the data samples are generated independently. This independence assumption does not hold in the analysis of spatial and spatio-temporal data, as these data points exhibit a high degree of correlation [12,13]. In spatial statistical analysis, the tendency of nearby observations to share the same features or characteristics is called autocorrelation. Ignoring autocorrelation in the analysis of spatial and spatio-temporal data may lead to hypothetical models that are not highly accurate or consistent with the collected data set.
Thus, there is a need for new, innovative methods for the analysis of spatial and spatio-temporal data so that it can be treated and modeled as useful, non-trivial patterns. A review of earlier attempts at data processing [14,15,16] suggests new methods that facilitate interactions such as co-locations, co-occurrences and tele-connections in the detection of spatial outliers and the prediction of locations, alongside the innovative ideas emerging in spatio-temporal pattern mining.
2.2 Role of Clustering
Clustering is an important task in data mining that groups data into significant sub-groups to obtain useful information. Clustering spatial data is a vital issue in detecting the hidden patterns contained in such sub-groups, with several applications including medical image analysis, satellite data analysis and GIS. Clustering can be treated as data segmentation, since large data sets are divided into groups according to similarity of features and characteristics. Clustering can also be used for outlier analysis: entities and attributes with similar characteristics are assigned to one cluster, and detecting outliers through the dissimilarities between clusters is one of the most common applications in the data mining process.
3. PROPOSED METHODOLOGY AND APPROACH
In this section we present our selection of techniques and analyze them according to the proposed
criteria.
3.1 Rationale behind the Selection of Clustering Techniques
We carefully selected clustering techniques after considering both evaluation and non-evaluation data needs. The rationale behind these categories is that the algorithms have to share several properties, such as spatial, temporal and fuzzy-based behavior, to improve the performance of the techniques. The spatial evaluation concerns mobile/immobile, point, line and region features, which the majority of clustering techniques make use of; the interval behavior of temporal data can also guide efficient clustering. Considering all these needs, we selected a GRID-based approach to address the spatial needs and DBSCAN to deal with the localized temporal features. The fusion of these techniques yields our new approach, "GRID-DBSCAN". The number of clusters constrains the computational requirements and the accuracy of the approach: more clusters provide finer details but require more computational power, while fewer clusters limit the accuracy. In our hybrid approach, the number of clusters is increased without affecting the computational needs, since the processing of segmented images is confined to local regions, thereby reducing the computational load and preserving scalability. The approach is also suitable for parallel computing environments.
3.2 Proposed Approach.
Our new algorithm aims to improve the accuracy of the clustering technique. In this method, a hybrid approach combines the DBSCAN and GRID clustering algorithms to create a new clustering algorithm, GRID-DBSCAN. A cell hierarchy is formed by rectangular cells representing different resolutions, obtained by partitioning the spatial image; these higher-level rectangular cells are recursively partitioned into the corresponding lower levels.
After construction of the GRID structure, clustering on the GRID is done in the following way. A top-down grid-based method performs clustering on the data structure described above with the help of a density parameter supplied by the user. The detailed procedure is as follows. First, a layer within the hierarchical structure is determined; in this work the root layer is selected. For each cell in the current layer, a confidence interval is computed indicating whether the cell is relevant to the clustering; cells that do not meet the confidence level are deemed irrelevant. After the examination of the current layer is finished, the cells of the next lower level are examined. The same process is repeated for the other levels, the difference being that instead of going through all cells, only those cells that are children of relevant cells in the previous layer are processed. This procedure continues until it reaches the lowest layer (the bottom layer), at which point the regions of relevant cells are processed. In the GRID-DBSCAN algorithm, DBSCAN is used for processing the relevant bottom cells. In this process all six regions (agriculture, water, greenery, urban, sea and other) are displayed in the clustering output, so the user does not need to supply any further input parameter; the thresholds are calculated from the pixel information. A minimal sketch of this top-down relevance filter is given below.
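The following is a minimal, hedged sketch in Java of the top-down relevance filtering just described. The Cell type, its density field, the confidence threshold and the method name relevantBottomCells are illustrative assumptions for this sketch, not the authors' actual data structures.

import java.util.ArrayList;
import java.util.List;

// Illustrative cell of the grid hierarchy (assumed structure).
class Cell {
    double density;                          // estimated density of the cell
    List<Cell> children = new ArrayList<>(); // finer-resolution sub-cells
}

public class TopDownFilter {
    // Walk the hierarchy from the root layer downwards, descending only
    // through cells that meet the confidence level, and collect the relevant
    // bottom-layer cells, which GRID-DBSCAN would then hand to DBSCAN.
    static List<Cell> relevantBottomCells(Cell root, double confidence) {
        List<Cell> currentLayer = new ArrayList<>();
        currentLayer.add(root);
        List<Cell> bottomCells = new ArrayList<>();
        while (!currentLayer.isEmpty()) {
            List<Cell> nextLayer = new ArrayList<>();
            for (Cell cell : currentLayer) {
                if (cell.density < confidence) continue;        // irrelevant: prune whole subtree
                if (cell.children.isEmpty()) bottomCells.add(cell); // bottom layer reached
                else nextLayer.addAll(cell.children);           // examine children on the next pass
            }
            currentLayer = nextLayer;
        }
        return bottomCells;
    }
}

Pruning irrelevant subtrees is what keeps the per-level work proportional to the relevant cells only, which is the source of the computational savings claimed for the hybrid approach.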
3.3 Image Histogram
An image histogram is a graphical representation of the tonal distribution in an image: it plots the number of pixels for each tonal value. From this spectral plot of color values, the distribution of colors in a specific image can be understood at a glance. The horizontal (x) axis of the graph represents the tonal values and the vertical (y) axis represents the number of pixels with each particular tone. Black and dark areas are represented on the left side of the horizontal axis, medium grey in the middle, and light and pure white areas on the right; the vertical axis shows the size of the area captured in each of these zones. In the histogram of a dark image, the majority of data points lie on the left side and center of the graph; conversely, a bright image has the majority of its data points on the right side and center of the graph.

The distribution of colors in an image is represented by a color histogram. For digital images, it records the number of pixels whose colors fall in each of a fixed list of color ranges that span the image's color space. A color histogram can be plotted for any kind of color space, although it is generally used for three-dimensional spaces like HSV or RGB. For monochromatic images, the term intensity histogram is used instead. For images where each pixel is described by an arbitrary number of measurements, i.e. multi-spectral images, the color histogram is n-dimensional, where n is the number of measurements. Every pixel belongs to a certain spectral range with limits on wavelength, and some pixels in the image may even lie outside the limits of the visible spectrum.
When a small set of possible values is assigned to the colors, a range, as shown in figure 1, has to be defined to establish a histogram for counting the color values. Each color is automatically mapped to one of the ranges defined in the histogram and can therefore be treated as belonging to a regular grid of similar colors.
Fig. 1: Color range
Thus the color histogram may be represented and displayed as a function defined over the color space that approximates the pixel counts. Like other spectral representations, it is a statistic and can be viewed as an approximation of the continuous distribution of color values. In our approach, 6-8 colors are used when representing the visual clusters, as in the sketch below.
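As an illustration of how such coarse color ranges can be computed, the following Java sketch bins the pixels of an RGB image into eight ranges by quantizing each channel to one bit. The class name and the one-bit-per-channel scheme are assumptions for illustration, not the paper's exact binning.

import java.awt.image.BufferedImage;

public class CoarseColorHistogram {
    // Count the pixels of an RGB image falling into each of 8 coarse color
    // ranges (2 x 2 x 2: each channel quantized to high/low).
    static int[] histogram(BufferedImage img) {
        int[] bins = new int[8];
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                int rgb = img.getRGB(x, y);
                int r = (rgb >> 16) & 0xFF;
                int g = (rgb >> 8) & 0xFF;
                int b = rgb & 0xFF;
                // quantize each channel to one bit and pack into a 3-bit bin index
                int bin = ((r >> 7) << 2) | ((g >> 7) << 1) | (b >> 7);
                bins[bin]++;
            }
        }
        return bins;
    }
}

Comparing such histograms for two images of the same scene gives a first, cheap indication of the kind of spectral deviation discussed in the results section.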
4. IMPLEMENTATION OF THE PROPOSED WORK
An image is taken as input and converted into pixels. These pixel values are then used to cluster the image, on the principle that pixels having similar values should be grouped into one cluster. Clusters are thus formed for the first image, and similarly for the second image. If the number of clusters is the same for both images, the images are considered similar; if the images have different numbers of clusters, the second image represents the first image with changes. The algorithm proposed for this task is:
Algorithm for GRID-DBSCAN

GRID-DBSCAN(Image: images[2])
begin
    img = 0
    while (img < 2)
        // read the current image into a pixel array
        for each i in width
            for each j in height
                pixelArray[i][j] = images[img].getRGB(i, j)
            endfor
        endfor
        // count pixels that differ strongly from the top-left reference value
        ref = pixelArray[0][0]
        for each j in height
            for each i in width
                if (abs(pixelArray[i][j] - ref) > 500) then
                    cluster[img]++
                endif
            endfor
        endfor
        img++
    endwhile
    if (cluster[0] == cluster[1]) then
        // the two images are the same
    else
        // the two images are different
    endif
end
METHOD
1: Start.
2: Input the first image.
3: Convert the image into pixels and store the pixel values as an array of pixels.
4: Determine the clusters by placing pixel positions with similar pixel values into the same cluster.
5: Store the total number of clusters in a variable.
6: Input the second image.
7: Repeat steps 3 to 5 for the second image.
8: If the numbers of clusters for the two images are equal, then the two images are the same; else the two images are different.
9: End if.
10: Stop.
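For concreteness, the listing above can be turned into the following minimal runnable Java sketch. It follows the paper's simplification of counting, for each image, the pixels whose packed RGB value differs from the top-left reference pixel by more than the threshold of 500, and then comparing the two counts; the class and method names are illustrative, and this count is the paper's proxy for a cluster count rather than standard DBSCAN clustering.

import java.awt.image.BufferedImage;
import java.io.File;
import javax.imageio.ImageIO;

public class ClusterCountSketch {
    static final int THRESHOLD = 500; // difference threshold from the listing above

    // Count pixels whose packed RGB value differs from the top-left
    // reference pixel by more than THRESHOLD.
    static int countClusters(BufferedImage img) {
        int ref = img.getRGB(0, 0);
        int clusters = 0;
        for (int y = 0; y < img.getHeight(); y++) {
            for (int x = 0; x < img.getWidth(); x++) {
                if (Math.abs(img.getRGB(x, y) - ref) > THRESHOLD) {
                    clusters++;
                }
            }
        }
        return clusters;
    }

    public static void main(String[] args) throws Exception {
        BufferedImage before = ImageIO.read(new File(args[0])); // image before the event
        BufferedImage after = ImageIO.read(new File(args[1]));  // image after the event
        int c0 = countClusters(before);
        int c1 = countClusters(after);
        System.out.println("clusters: before = " + c0 + ", after = " + c1);
        System.out.println(c0 == c1 ? "images appear unchanged" : "images differ");
    }
}

Invoked as "java ClusterCountSketch before.png after.png", the sketch prints the two counts and reports whether the images differ, mirroring the comparison step of the METHOD above.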
Flow chart for the proposed algorithm is shown below.
Fig.2: Flow chart of proposed data mining algorithm.
5. RESULTS AND ANALYSIS
The proposed technique was applied to two pairs of geographical images taken before and after natural disturbances: the World Trade Center (WTC) towers before and after the 09/11 attacks, and the location of a forest before and after a fire. The changes in satellite images collected after the occurrence of such a tragedy can be expressed in terms of the number of clusters generated; the number of clusters is scene dependent. In the first pair of satellite images, of the WTC towers, the numbers of generated clusters are 3135 and 2717 before and after the disturbance, respectively. In the second pair of images, of the forest, the numbers of clusters generated by the proposed algorithm are 1301 and 1849 before and after the fire, respectively. The observations show that the grid segmentation of the image fluctuates due to the natural tragedy, with an automatic realignment of the color ranges as directed by the proposed technique; the corresponding changes in the number of clusters are clearly displayed in the screenshots of the simulation of the proposed algorithm in figures 4 and 7.
A considerable statistical deviation in the distribution of colors is found in both pairs of collected satellite images, as shown in figure 5. Spatial data gained attention from researchers after such deviations were observed during natural calamities and disasters. We considered two spatial images, figures 3 and 6, representing the WTC before and after the 09/11 attack.
A statistical evaluation of the existing algorithms (GRID, DBSCAN) on the clustering task was also carried out. The study shows that GRID-DBSCAN is a strong method for the clustering task, as its accuracy is the highest.
Fig. 3: Satellite image 1 (WTC towers before September 11, 2001)

The spatial data in figure 3 is processed with the proposed method, and the spectral information and the number of clusters are estimated in the present investigation.

Fig. 4: Screenshot showing the generated clusters for satellite image 1
Figure 4 shows the details of the clusters produced in the data mining process; the number of clusters is 3135.
Fig. 5: Spectral representation of colors in satellite image 1

Figure 5 shows the distribution of colors in the clusters of the satellite data. Note that the spread of colors is statistically uniform across the clusters.

Fig. 6: Satellite image 2 (WTC towers after September 11, 2001)

Figure 6 is the collected satellite image of the WTC towers after the 09/11 attacks. This image is taken as the input data for the proposed mining algorithm in the present work.
Fig. 7: Screenshot showing the generated clusters for satellite image 2

Figure 7 gives the number of clusters generated in the simulation of the proposed technique. The total number of clusters in this satellite data is 2717; note that this number is less than that for the spatial data of figure 4.
Fig. 8: Spectral representation of colors in satellite image 2

Figure 8 shows the pattern of the spectral distribution of colors given by the proposed method. On a visual basis, the pattern shows a slight dispersion in the distribution when compared with satellite image 1.

Fig. 9: Satellite image 3 (forest before the fire)

Figure 9, satellite image 3, shows a forest before the fire and is considered for analysis with the proposed system. The analysis is aimed at finding the geometrical features after the forest is affected by the fire. The number of clusters formed is 1301.
Fig. 10: Screenshot showing the generated clusters for the satellite image of fig. 9

The number of clusters produced by the proposed technique is 1301, as depicted in figure 10. This classification is important in the analysis of the geometrical features of the forest location.
Fig. 11: Spectral representation of colors in the satellite image of fig. 9

Figure 11 depicts the high degree of correlation, on a visual basis, in the distribution of colors in the satellite image of the forest before it undergoes the fire.
Fig. 12: Satellite image 4 (forest after the fire)

The satellite image in figure 12 shows the fire-affected forest area considered in the analysis with the proposed mining technique. The number of clusters formed is 1849.

Fig. 13: Screenshot showing the generated clusters for fig. 12
Figure 13 is the screenshot generated by the simulation of the proposed algorithm; the information displayed belongs to the forest area after the fire.

Fig. 14: Spectral representation of colors for fig. 12

Figure 14 shows the variation of colors, with pixel counts on the y-axis, for the image of the forest under the influence of the fire; it gives the distribution of colors.
Accuracy Comparison

Image    DBSCAN                   GRID                     GRID-DBSCAN
         Kappa    Time            Kappa    Time            Kappa    Time
I1       0.6      00:08:52:657    0.7695   00:08:05:664    0.933    00:07:53:33
I2       0.58     00:09:36:732    0.8226   00:08:34:00     0.936    00:07:53:36
I3       0.524    00:07:21:458    0.7107   00:06:33:61     0.983    00:05:31:33
I4       0.673    00:07:25:827    0.8123   00:06:55:64     0.963    00:06:34:90
I5       0.543    00:08:71:87     0.722    00:08:50:00     0.966    00:07:34:77
I6       0.598    00:08:90:87     0.844    00:07:95:66     0.987    00:07:34:90
I7       0.6      00:07:16:251    0.745    00:06:19:66     0.899    00:06:09:00
I8       0.56     00:08:20:00     0.744    00:07:60:29     0.866    00:07:05:20
I9       0.579    00:07:33:00     0.899    00:06:91:22     0.976    00:05:10:00
I10      0.49     00:05:89:00     0.7112   00:04:89:15     0.929    00:04:09:20

Table 1.0: Statistical analysis of the clustering performance of the algorithms (kappa coefficient and processing time per image)
A statistical comparison of the performance of the algorithms is shown in Table 1.0. The study shows a high degree of accuracy for GRID-DBSCAN in terms of the kappa coefficient. Even though considerable computational time is still observed, the algorithm achieves the ultimate objective of greater precision and accuracy in the clustering task for satellite imagery.
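The paper reports kappa coefficients without defining them; for reference, Cohen's kappa, the standard chance-corrected agreement measure in remote sensing accuracy assessment, is

\kappa = \frac{p_o - p_e}{1 - p_e}

where $p_o$ is the observed agreement between the clustering output and the reference classification and $p_e$ is the agreement expected by chance. For example, a reported kappa of 0.933 under a chance agreement of $p_e = 0.5$ would correspond to an observed agreement of $p_o \approx 0.97$.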
Fig. 15: Levels of accuracy of performance

Figure 15 reveals that, among the three algorithms, GRID-DBSCAN exhibits the best performance in terms of statistical accuracy in comparison with the two previously established clustering techniques.
6. CONCLUSIONS & FUTURE WORK
In this paper, the proposed algorithm, GRID-DBSCAN, is shown to be a capable technique for detecting geometrical changes in spatial data, with better performance than existing methods. The simulations were performed on a dual-core machine with a 1 GB graphics card using MATLAB 10.0. The simulation of the proposed algorithm indicates the realignment of geometrical locations in terms of the number of clusters, and the proposed methodology also reveals the corresponding variation of the color distribution in the spectral figures. The performance of the proposed GRID-DBSCAN technique is found to be more accurate than the already established DBSCAN and GRID algorithms. Spatial disturbances over time and location can easily be detected with the two coined parameters: the number of clusters and the variation in the distribution of colors with and without natural disturbances at geographical locations. The research findings establish a powerful system for detecting dynamic changes at geographical locations in satellite imagery.
In future work, this clustering technique can be extended to hexagonal or circular grids, which may suit particular target domains. Some of the potential areas where our work can complement heuristic approaches are GIS, spatial analysis, agriculture and climate analysis.
REFERENCES
[1] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami, "An Interval Classifier for Database Mining Applications," Proc. 18th Conf. Very Large Databases, pp. 560-573, 1992.
[2] R. Agrawal, T. Imielinski, and A. Swami, "Mining Association Rules between Sets of Items in Large Databases," Proc. 1993 ACM Special Interest Group on Management of Data, pp. 207-216, 1993.
[3] A. Borgida and R. J. Brachman, "Loading Data into Description Reasoners," Proc. 1993 ACM Special Interest Group on Management of Data, pp. 217-226, 1993.
[4] D. Dobkin and D. Kirkpatrick, "A Linear Algorithm for Determining the Separation of Convex Polyhedra," J. Algorithms, vol. 6, no. 3, pp. 381-392, 1985.
[5] R. Agrawal, J. Gehrke, D. Gunopulos, and P. Raghavan, "Automatic Subspace Clustering of High Dimensional Data for Data Mining Applications," Proc. 1998 ACM-SIGMOD, pp. 94-105, 1998.
[6] Jiadong Ren, Lili Meng, and Changzhen Hu, "CABGD: An Improved Clustering Algorithm Based on Grid-Density," Fourth International Conference on Innovative Computing, Information and Control, pp. 381-384, 2009.
[7] L. David, "Hyperspectral image data analysis as a high dimensional signal processing problem," IEEE Signal Process. Mag., vol. 19, no. 1, pp. 17-28, 2002.
[8] T. Brinkhoff, H.-P. Kriegel, and B. Seeger, "Efficient Processing of Spatial Joins Using R-Trees," Proc. 1993 ACM Special Interest Group on Management of Data, pp. 237-246, 1993.
[8] J. Senthilnath, S. N. Omkar, V. Mani, N. Tejovanth, P. G. Diwakar, and S. B. Archana, "Multi-spectral satellite image classification using glowworm swarm optimization," in Proc. IEEE Int. Geoscience and Remote Sensing Symp. (IGARSS), 2011, pp. 47-50.
[9] J. Senthilnath, S. N. Omkar, V. Mani, N. Tejovanth, P. G. Diwakar, and B. Archana Shenoy, "Hierarchical Clustering Algorithm for Land Cover Mapping Using Satellite Images," IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, vol. 5, no. 3, June 2012.
[10] M. Ankerst, M. Breunig, H.-P. Kriegel, and J. Sander, "OPTICS: Ordering Points To Identify the Clustering Structure," Proc. 1999 ACM Special Interest Group on Management of Data, pp. 49-60, 1999.
[11] W. G. Aref and H. Samet, "Optimization Strategies for Spatial Query Processing," Proc. 17th Conf. Very Large Databases, pp. 81-90, 1991.
[12] A. E. Zaart, D. Ziou, S. Wang, and Q. Jiang, "Segmentation of SAR images using mixture of gamma distribution," Pattern Recognit., vol. 35, no. 3, pp. 713-724, 2002.
[13] P. Gamba and M. Aldrighi, "SAR data classification of urban areas by means of segmentation techniques and ancillary optical data," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 5, no. 4, pp. 1140-1148, 2012.
[14] W. Chumsamrong, P. Thitimajshima, and Y. Rangsanseri, "Synthetic aperture radar (SAR) image segmentation using a new modified fuzzy c-means algorithm," in Proc. IEEE Symp. Geosci. Remote Sens., Honolulu, HI, 2000, pp. 624-626.
[15] D. H. Hoekman, M. A. M. Vissers, and T. N. Tran, "Unsupervised full-polarimetric SAR data segmentation as a tool for classification of agricultural areas," IEEE J. Sel. Top. Appl. Earth Observ. Remote Sens., vol. 4, no. 2, pp. 402-411, 2011.
[16] Y. Dong, B. C. Forster, and A. K. Milne, "Comparison of radar image segmentation by Gaussian and Gamma-Markov random field models," Int. J. Remote Sens., vol. 24, no. 4, pp. 711-722, 2003.
AUTHORS
Dr. Ananda Rao Akepogu received the B.Tech degree in Computer Science & Engineering and the M.Tech degree in A.I & Robotics from the University of Hyderabad, Andhra Pradesh, India. He received the PhD degree from the Indian Institute of Technology Madras, Chennai, India. He is a Professor in the Computer Science & Engineering Department and currently works as Principal of JNTUA College of Engineering, Anantapur, Jawaharlal Nehru Technological University, Andhra Pradesh, India. Dr. Rao has published more than 100 papers in various national and international journals and conferences. He received the Best Research Paper award for the paper titled "An Approach to Test Case Design for Cost Effective Software Testing" at an International Conference on Software Engineering held in Hong Kong, 18-20 March 2009. He also received the Best Educationist Award for outstanding achievements in the field of education from the International Institute of Education & Management, New Delhi, on 21st Jan. 2012, and he bagged the Bharat Vidya Shiromani Award from the Indian Solidarity Council and the Rashtriya Vidya Gaurav Gold Medal Award from the International Institute of Education & Management, New Delhi, on 19th March 2012. His main research interests include software engineering and data mining.
Ch. Mallikarjuna Rao received the B.Tech degree in Computer Science from Dr. Babasaheb Ambedkar Marathwada University, Aurangabad, Maharashtra, in 1998, and the M.Tech degree in Computer Science and Engineering from J.N.T.U Anantapur, Andhra Pradesh, in 2007. He is currently pursuing his PhD degree at JNTU Ananthapur University, Andhra Pradesh. His research interests include databases and data mining.