Tracking moving objects has become essential in everyday life, with many applications such as GPS navigation, traffic monitoring, and location-based services. Keeping track of the changing positions of objects has therefore become an important problem. Moving objects report their positions to a server over a network, and the resulting stream of frequent updates generates a large amount of data, so an index structure is needed to retrieve information as fast as possible. The index structure should be adaptive and dynamic, so that it can monitor the locations of objects and answer inquiries efficiently. The most common query types in moving-object databases are range, point, and k-nearest-neighbour queries. This study uses the R-tree to answer detailed range queries efficiently. However, using the R-tree alone produces considerable overlap and coverage between MBRs (minimum bounding rectangles), so the R-tree is combined with a grid-partition index, which reduces that overlap and coverage. Together, these methods make query processing efficient. We perform an extensive experimental study to compare the two approaches on modern hardware.
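To make the grid-partition idea concrete, here is a minimal Python sketch of a uniform-grid index in which a frequent position update touches at most two buckets and a range query visits only the overlapping cells. The class, field names, and cell size are illustrative assumptions, not the study's implementation; the study pairs this idea with an R-tree rather than using the grid alone.

```python
from collections import defaultdict

class GridIndex:
    """Uniform-grid index over 2D object positions (illustrative sketch)."""

    def __init__(self, cell_size=10.0):
        self.cell_size = cell_size
        self.cells = defaultdict(dict)   # (cx, cy) -> {object_id: (x, y)}
        self.where = {}                  # object_id -> current cell key

    def _cell(self, x, y):
        return (int(x // self.cell_size), int(y // self.cell_size))

    def update(self, obj_id, x, y):
        # A frequent position update touches at most two buckets.
        old = self.where.get(obj_id)
        if old is not None:
            self.cells[old].pop(obj_id, None)
        key = self._cell(x, y)
        self.cells[key][obj_id] = (x, y)
        self.where[obj_id] = key

    def range_query(self, xmin, ymin, xmax, ymax):
        # Visit only the grid cells that overlap the query rectangle.
        cx0, cy0 = self._cell(xmin, ymin)
        cx1, cy1 = self._cell(xmax, ymax)
        hits = []
        for cx in range(cx0, cx1 + 1):
            for cy in range(cy0, cy1 + 1):
                for obj_id, (x, y) in self.cells.get((cx, cy), {}).items():
                    if xmin <= x <= xmax and ymin <= y <= ymax:
                        hits.append(obj_id)
        return hits
```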
A CONCEPTUAL METADATA FRAMEWORK FOR SPATIAL DATA WAREHOUSE (IJDKP)
Metadata is information about the data stored in a Data Warehouse, and it is a mandatory element in building an efficient one. Metadata supports data integration, lineage, data quality, and the loading of transformed data into the warehouse. Spatial data warehouses are based on spatial data, mostly collected from Geographical Information Systems (GIS) and from transactional systems specific to an application or enterprise. Metadata design and deployment is the most critical phase in building a data warehouse, where spatial information and data modeling must be brought together. In this paper, we present a holistic metadata framework that drives metadata creation for a spatial data warehouse. In theory, the proposed framework improves the efficiency of data access for frequent queries on SDWs; in other words, it decreases query response time while fetching accurate information, including spatial information, from the Data Warehouse.
Parallel Key Value Pattern Matching Model (ijsrd.com)
Mining frequent itemsets from huge transactional databases is an important task in data mining, and finding them is a key step in extracting association rules. Association rule mining is used to find relationships among large datasets, and many algorithms have been developed to find frequent itemsets. This work presents a survey and a new parallel key-value pattern matching model that shards a large-scale mining task into independent, parallel tasks. The model produces frequent patterns efficiently in terms of time consumption, avoids high computational cost, and discovers the frequent itemsets in the database.
Construction of a compact FP-tree ensures that subsequent mining can be performed with a rather compact data structure. This alone does not guarantee efficiency, since one may still encounter the combinatorial problem of candidate generation if the FP-tree is simply used to generate and check all candidate patterns. We therefore study how to exploit the compact information stored in an FP-tree, develop the principles of frequent-pattern growth through a running example, explore further optimizations for the case where a single prefix path exists in an FP-tree, and propose a frequent-pattern growth algorithm, FP-growth, for mining the complete set of frequent patterns using the FP-tree.
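As a rough illustration of the construction step described above, the following sketch builds a compact FP-tree: items in each transaction are reordered by descending global frequency so that shared prefixes collapse into common paths. This is a minimal reading of the standard construction, not the authors' code, and it omits the header-link table that FP-growth also needs.

```python
from collections import defaultdict

class FPNode:
    def __init__(self, item, parent):
        self.item, self.parent = item, parent
        self.count = 0
        self.children = {}

def build_fp_tree(transactions, min_support):
    # First scan: count global item frequencies.
    freq = defaultdict(int)
    for t in transactions:
        for item in t:
            freq[item] += 1
    # Second scan: insert each transaction with its frequent items ordered
    # by descending frequency, so shared prefixes merge into common paths.
    root = FPNode(None, None)
    for t in transactions:
        items = sorted((i for i in t if freq[i] >= min_support),
                       key=lambda i: (-freq[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
            child.count += 1
            node = child
    return root, freq
```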
As databases have grown, the volume of stored data has increased rapidly, and much important information lies hidden in these large amounts of data. If that information can be extracted, it can create substantial value for the organization; the question is how to extract it, and the answer is data mining. Many technologies are available to data mining practitioners, including Artificial Neural Networks, Genetic Algorithms, Fuzzy Logic, and Decision Trees. Many practitioners are wary of neural networks due to their black-box nature, even though they have proven themselves in many situations. This paper gives an overview of artificial neural networks and questions their position as a preferred tool among data mining practitioners.
USING ONTOLOGIES TO IMPROVE DOCUMENT CLASSIFICATION WITH TRANSDUCTIVE SUPPORT VECTOR MACHINES (IJDKP)
Many applications of automatic document classification require accurate learning from little training data. Semi-supervised classification uses both labeled and unlabeled data for training. This technique has been shown to be effective in some cases; however, the use of unlabeled data is not always beneficial.
At the same time, the emergence of web technologies has enabled the collaborative development of ontologies. In this paper, we propose the use of ontologies to improve the accuracy and efficiency of semi-supervised document classification.
We use support vector machines, which are among the most effective algorithms studied for text classification. Our algorithm enhances the performance of transductive support vector machines through the use of ontologies. We report experimental results on three different datasets, showing an accuracy improvement of 4% on average, and up to 20%, compared with the traditional semi-supervised model.
Clustering the results of a search helps the user get an overview of the information returned. In this paper, we treat the clustering task as cataloguing the search results; by catalogue we mean a structured label list that helps the user interpret the labels and the results. Cluster labelling is crucial, because meaningless or confusing labels may mislead users into checking the wrong clusters for a query and waste their time. Additionally, labels should accurately reflect the contents of the documents within the cluster. To label clusters effectively, a new cluster labelling method is introduced, with emphasis on producing comprehensible and accurate cluster labels in addition to discovering the document clusters. We also present a new metric for assessing the success of cluster labelling. We adopt a comparative evaluation strategy to derive the relative performance of the proposed method with respect to two prominent search result clustering methods, Suffix Tree Clustering and Lingo, performing the experiments on the publicly available datasets Ambient and ODP-239.
SOURCE CODE RETRIEVAL USING SEQUENCE BASED SIMILARITY (IJDKP)
This document summarizes an approach to improve source code retrieval using structural information from source code. A lexical parser is developed to extract control statements and method identifiers from Java programs. A similarity measure is proposed that calculates the ratio of fully matching statements to partially matching statements in a sequence. Experiments show the retrieval model using this measure improves retrieval performance over other models by up to 90.9% relative to the number of retrieved methods.
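The abstract does not give the exact formula, so the following sketch only guesses at the flavor of such a sequence-based measure: it counts fully matching statements (identical after normalization) and partially matching ones (sharing at least one token), and combines them into a single score. The 0.5 weighting and the normalization are assumptions made purely for illustration.

```python
def sequence_similarity(query_stmts, code_stmts):
    """Toy ratio-based similarity over two statement sequences.

    'Full' match = identical statement string; 'partial' match = statements
    sharing at least one token. The combination below (full + 0.5 * partial,
    normalized by query length) is a hypothetical weighting, not the paper's.
    """
    full = sum(1 for q in query_stmts if q in code_stmts)
    partial = 0
    for q in query_stmts:
        if q in code_stmts:
            continue
        q_tokens = set(q.split())
        if any(q_tokens & set(c.split()) for c in code_stmts):
            partial += 1
    return (full + 0.5 * partial) / max(len(query_stmts), 1)

# e.g. comparing extracted control statements of two Java methods:
print(sequence_similarity(["for", "if x > 0", "return x"],
                          ["for", "if y > 0", "return y"]))
```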
Progressive Mining of Sequential Patterns Based on Single Constraint (TELKOMNIKA Journal)
Data that appear in time order and are stored in a sequence database can be processed to obtain sequential patterns, and sequential pattern mining is the process of obtaining them. However, the large amount of data, the variety of data types, and rapid data growth raise scalability issues in the mining process. At the same time, users need to analyze data based on specific organizational needs, so constraints are used to impose limits on the mining process. A constraint in sequential pattern mining can filter out short and trivial sequential patterns so that the results satisfy user needs. Progressive mining of sequential patterns (PISA) based on a single constraint uses a Period of Interest (POI), a predefined time frame set by the user, in a progressive sequential tree. Single-constraint checking in PISA relies on the concept of anti-monotonic or monotonic constraints. As a result, the number of sequential patterns decreases, the total execution time of the mining process decreases, and system scalability is achieved.
With the rapid development of Geographic Information Systems (GISs) and their applications, more and more geographical databases have been developed by different vendors. However, data integration and access remain a major problem for the development of GIS applications, since no interoperability exists among different spatial databases. In this paper we propose a unified approach for spatial data querying. The paper describes a framework for integrating information from repositories containing different vector data formats and repositories containing raster datasets. The presented approach converts the various vector data formats into a single unified format (File Geo-Database, "GDB"). In addition, we employ metadata to support a wide range of user queries that retrieve relevant geographic information from heterogeneous and distributed repositories, which enhances both query processing and performance.
This document discusses GCUBE indexing, a method for indexing and aggregating spatial/continuous values in a data warehouse. The key challenges addressed are defining and aggregating spatial/continuous values, and efficiently representing, indexing, updating, and querying data that includes both categorical and continuous dimensions. The proposed GCUBE approach maps multi-dimensional data to a linear ordering using the Hilbert curve, and then constructs an index structure on the ordered data to enable efficient query processing. Empirical results show that GCUBE indexing offers significant performance advantages over alternative approaches.
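The Hilbert-curve linearization step can be illustrated with the classic distance-to-coordinates conversion below. This is the standard routine for a 2^order by 2^order grid, shown only to make the mapping concrete; it is not GCUBE's actual code, which applies the idea to higher-dimensional cube cells.

```python
def hilbert_d_to_xy(order, d):
    """Map a distance d along the Hilbert curve to (x, y) on a 2**order grid.

    Standard iterative conversion: at each scale, decode the quadrant from
    two bits of d, rotate/reflect the partial coordinates, and offset them.
    """
    x = y = 0
    t = d
    s = 1
    while s < (1 << order):
        rx = 1 & (t // 2)
        ry = 1 & (t ^ rx)
        if ry == 0:                      # rotate the quadrant if needed
            if rx == 1:
                x, y = s - 1 - x, s - 1 - y
            x, y = y, x
        x += s * rx
        y += s * ry
        t //= 4
        s *= 2
    return x, y

# First four steps on a 2x2 grid: (0,0), (0,1), (1,1), (1,0).
print([hilbert_d_to_xy(1, d) for d in range(4)])
```

Because nearby cells on the curve tend to be nearby in space, a range query over the cube touches mostly contiguous runs of the linear order, which is what makes a one-dimensional index over Hilbert positions effective.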
MAP/REDUCE DESIGN AND IMPLEMENTATION OF APRIORI ALGORITHM FOR HANDLING VOLUMIN... (acijjournal)
Apriori is one of the key algorithms for generating frequent itemsets. Analysing frequent itemsets is a crucial step in analysing structured data and in finding association relationships between items, and it is an elementary foundation for supervised learning, which encompasses classifier and feature extraction methods. Applying this algorithm is crucial to understanding the behaviour of structured data. Most structured data in scientific domains are voluminous, and processing such data requires state-of-the-art computing machines; setting up such an infrastructure is expensive. Hence, a distributed environment such as a cluster is employed for these scenarios. The Apache Hadoop distribution is one such cluster framework, distributing voluminous data across the nodes of the cluster. This paper focuses on the map/reduce design and implementation of the Apriori algorithm for structured data analysis.
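For reference, the core Apriori loop that a map/reduce design distributes looks roughly like this single-machine sketch: level-wise candidate generation with subset pruning, followed by a counting pass. In the Hadoop version, the counting pass becomes the map/reduce step; this sketch deliberately omits that split.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Plain single-machine Apriori; returns {itemset: support count}."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s for s, c in counts.items() if c >= min_support}
    result = {s: counts[s] for s in frequent}
    k = 2
    while frequent:
        # Candidate generation with the Apriori prune: every (k-1)-subset
        # of a candidate must itself be frequent.
        items = sorted({i for s in frequent for i in s})
        candidates = [frozenset(c) for c in combinations(items, k)
                      if all(frozenset(sub) in frequent
                             for sub in combinations(c, k - 1))]
        # Counting pass (this is the part map/reduce parallelizes).
        counts = {c: sum(1 for t in transactions if c <= t)
                  for c in candidates}
        frequent = {c for c, n in counts.items() if n >= min_support}
        result.update({c: counts[c] for c in frequent})
        k += 1
    return result

print(apriori([{"a", "b"}, {"a", "b", "c"}, {"a", "c"}], min_support=2))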
This document summarizes a research paper that proposes a multidimensional data mining algorithm to determine association rules across different granularities. The algorithm addresses weaknesses in existing techniques, such as having to rescan the entire database when new attributes are added. It uses a concept taxonomy structure to represent the search space and finds association patterns by selecting concepts from individual taxonomies. An experiment on a wholesale business dataset demonstrates that the algorithm is linear and highly scalable to the number of records and can flexibly handle different data types.
Impulsion of Mining Paradigm with Density Based Clustering of Multi Dimension... (IOSR Journals)
This document summarizes and compares three clustering algorithms: DBSCAN, k-means, and SOM (Self-Organizing Maps). It first provides background on spatial data mining and clustering techniques. It then describes the DBSCAN, k-means, and SOM algorithms in detail, explaining their key steps and properties. The document evaluates the performance of these three algorithms on two-dimensional spatial datasets, analyzing the density-based clustering characteristics of each. It finds that DBSCAN can identify clusters of varying shapes and sizes with noise, while k-means partitions data into a predefined number of clusters and SOM uses neural networks to cluster unlabeled data.
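A bare-bones version of the DBSCAN procedure compared above, for orientation: a point with at least min_pts neighbors within eps is a core point and grows a cluster, reachable non-core points become border points, and the rest are noise. The brute-force O(n^2) neighbor search is purely for brevity.

```python
def dbscan(points, eps, min_pts):
    """Minimal DBSCAN over 2D points; returns one label per point
    (cluster id >= 0, or -1 for noise)."""
    def neighbors(i):
        xi, yi = points[i]
        return [j for j, (xj, yj) in enumerate(points)
                if (xi - xj) ** 2 + (yi - yj) ** 2 <= eps * eps]

    labels = [None] * len(points)
    cluster = -1
    for i in range(len(points)):
        if labels[i] is not None:
            continue
        seeds = neighbors(i)
        if len(seeds) < min_pts:
            labels[i] = -1            # provisionally noise (may become border)
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster   # border point: reachable but not core
                continue
            if labels[j] is not None:
                continue
            labels[j] = cluster
            j_nbrs = neighbors(j)
            if len(j_nbrs) >= min_pts:
                queue.extend(j_nbrs)  # j is a core point: keep expanding
    return labels
```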
A Novel Approach for Clustering Big Data based on MapReduce (IJECEIAES)
Clustering is one of the most important applications of data mining and has attracted the attention of researchers in statistics and machine learning. It is used in many applications, such as information retrieval, image processing, and social network analytics, and it helps users understand the similarity and dissimilarity between objects; cluster analysis makes complex, large datasets easier to understand. Various researchers have analyzed different types of clustering algorithms. Kmeans is the most popular partitioning-based algorithm, as it provides good results through accurate calculation on numerical data, but it works well for numerical data only. Big data combines numerical and categorical data, and the Kprototype algorithm handles both, combining the distances calculated from numeric and categorical attributes. With the growth of data from social networking websites, business transactions, scientific computation, and so on, there are vast collections of structured, semi-structured, and unstructured data, so Kprototype needs to be optimized so that these varieties of data can be analyzed efficiently. In this work, the Kprototype algorithm is implemented on MapReduce. Experiments show that Kprototype on MapReduce gives better performance on multiple nodes than on a single node, with CPU execution time and speedup as the evaluation metrics. An intelligent splitter is also proposed that splits mixed big data into numerical and categorical parts. Comparison with traditional algorithms shows that the proposed algorithm works better at large scale.
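The key ingredient of Kprototype, its mixed-type distance, can be sketched as follows: squared Euclidean distance over numeric attributes plus a gamma-weighted mismatch count over categorical ones, in the spirit of Huang's k-prototypes. The gamma value and record layout below are illustrative assumptions.

```python
def kprototype_distance(a, b, numeric_idx, categorical_idx, gamma=1.0):
    """Mixed-type distance: squared Euclidean over numeric attributes plus
    a gamma-weighted mismatch count over categorical attributes. gamma is
    a user-chosen weight balancing the two parts."""
    numeric = sum((a[i] - b[i]) ** 2 for i in numeric_idx)
    categorical = sum(1 for i in categorical_idx if a[i] != b[i])
    return numeric + gamma * categorical

# Hypothetical record layout: (age, income, city, plan)
r1 = (34, 52000.0, "Oslo", "premium")
r2 = (29, 48000.0, "Oslo", "basic")
print(kprototype_distance(r1, r2, numeric_idx=(0, 1),
                          categorical_idx=(2, 3), gamma=0.5))
```

In a MapReduce setting, the mappers typically compute these distances to the current prototypes and emit (nearest-prototype, record) pairs, while the reducers recompute each prototype from its assigned records.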
New proximity estimate for incremental update of non uniformly distributed cl... (IJDKP)
Conventional clustering algorithms mine static databases and generate a set of patterns in the form of clusters. Many real-life databases keep growing incrementally, and for such dynamic databases the patterns extracted from the original database become obsolete. Conventional clustering algorithms are thus unsuitable for incremental databases, as they cannot modify the clustering results in accordance with recent updates. In this paper, the author proposes a new incremental clustering algorithm called CFICA (Cluster Feature-based Incremental Clustering Approach for numerical data) and suggests a new proximity metric called the Inverse Proximity Estimate (IPE), which considers the proximity of a data point to a cluster representative as well as its proximity to the farthest point in its vicinity. CFICA uses the proposed proximity metric to determine the membership of a data point in a cluster.
The document proposes using an A* algorithm along with a relational framework to more efficiently calculate shortest paths in graph data stored in a relational database. The system initializes a source node, then iteratively selects the next frontier node and expands paths until the target node is found. Experimental results on road network data show the proposed approach has faster execution time than bidirectional search, especially on larger datasets containing over 500,000 records. The approach requires more memory than bidirectional search but is more efficient than other shortest path algorithms.
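For orientation, a textbook A* with a Euclidean-distance heuristic, the usual admissible choice on road networks, looks like the sketch below. The graph and coordinate layout are hypothetical; the paper's contribution is running this search over tables in a relational database, which this in-memory sketch does not show.

```python
import heapq
import math

def a_star(graph, coords, source, target):
    """A* shortest path. graph: node -> [(neighbor, edge_cost)];
    coords: node -> (x, y), used for the Euclidean heuristic."""
    def h(n):
        (x1, y1), (x2, y2) = coords[n], coords[target]
        return math.hypot(x1 - x2, y1 - y2)

    open_heap = [(h(source), 0.0, source, None)]
    settled = {}                      # node -> (best cost, predecessor)
    while open_heap:
        f, g, node, parent = heapq.heappop(open_heap)
        if node in settled:
            continue
        settled[node] = (g, parent)
        if node == target:
            path = [node]             # walk predecessors back to the source
            while settled[path[-1]][1] is not None:
                path.append(settled[path[-1]][1])
            return g, path[::-1]
        for nbr, cost in graph.get(node, ()):
            if nbr not in settled:
                heapq.heappush(open_heap,
                               (g + cost + h(nbr), g + cost, nbr, node))
    return math.inf, []

g = {"a": [("b", 1.0)], "b": [("c", 1.0)], "c": []}
xy = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
print(a_star(g, xy, "a", "c"))        # (2.0, ['a', 'b', 'c'])
```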
TASK-DECOMPOSITION BASED ANOMALY DETECTION OF MASSIVE AND HIGH-VOLATILITY SES... (ijdpsjournal)
This document summarizes a research paper that presents a task-decomposition based anomaly detection system for analyzing massive and highly volatile session data from the Science Information Network (SINET), Japan's academic backbone network. The system uses a master-worker design with dynamic task scheduling to process over 1 billion sessions per day. It discriminates incoming and outgoing traffic using GPU parallelization and generates histograms of traffic volumes over time. Long short-term memory (LSTM) neural networks detect anomalies like spikes in incoming traffic volumes. The experiment analyzed SINET data from February 27 to March 8, 2021, detecting some anomalies while processing 500-650 gigabytes of daily session data.
Mining High Utility Patterns in Large Databases using MapReduce Framework (IRJET Journal)
This document discusses mining high utility patterns from large databases using the MapReduce framework. It proposes using the d2HUP algorithm to efficiently mine high utility patterns from partitioned big data in parallel. The algorithm traverses a reverse set enumeration tree using depth-first search to identify high utility patterns based on a minimum utility threshold, without generating candidates. It partitions the data using MapReduce and mines patterns from each partition individually. The results are then combined to obtain the final high utility patterns. The proposed approach aims to improve efficiency over existing methods that are not scalable to large datasets.
Analysis and Comparative of Sorting Algorithms (ijtsrd)
There are many popular problems in different practical fields of computer science, computer networks, database applications, and artificial intelligence, and one of the most basic operations is sorting. Sorting is also a fundamental problem from the standpoint of algorithm analysis and design, so many computer scientists have worked extensively on sorting algorithms. Sorting is a key data-structure operation that makes it easy to arrange, search, and find information, and it is used frequently in many different processes. To accomplish the task in a reasonable amount of time, an efficient algorithm is needed, and many types of sorting algorithms have been devised for the purpose. Which algorithm is best suited can only be decided by comparing the available algorithms in different respects; in this paper, such a comparison is made for different sorting algorithms used in computation. Htwe Htwe Aung, "Analysis and Comparative of Sorting Algorithms", International Journal of Trend in Scientific Research and Development (IJTSRD), ISSN 2456-6470, Volume 3, Issue 5, August 2019. PDF: https://www.ijtsrd.com/papers/ijtsrd26575.pdf Paper URL: https://www.ijtsrd.com/computer-science/programming-language/26575/analysis-and-comparative-of-sorting-algorithms/htwe-htwe-aung
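The kind of empirical comparison the paper describes can be reproduced in miniature with a simple timing harness; the algorithms and input size below are arbitrary choices for illustration, contrasting an O(n^2) sort with O(n log n) ones.

```python
import random
import time

def insertion_sort(a):
    a = list(a)
    for i in range(1, len(a)):
        key, j = a[i], i - 1
        while j >= 0 and a[j] > key:    # shift larger elements right
            a[j + 1] = a[j]
            j -= 1
        a[j + 1] = key
    return a

def merge_sort(a):
    if len(a) <= 1:
        return list(a)
    mid = len(a) // 2
    left, right = merge_sort(a[:mid]), merge_sort(a[mid:])
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):   # merge the sorted halves
        if left[i] <= right[j]:
            out.append(left[i]); i += 1
        else:
            out.append(right[j]); j += 1
    return out + left[i:] + right[j:]

data = [random.randint(0, 10**6) for _ in range(2000)]
for sort_fn in (insertion_sort, merge_sort, sorted):
    t0 = time.perf_counter()
    sort_fn(data)
    print(f"{sort_fn.__name__}: {time.perf_counter() - t0:.4f}s")
```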
Recommendation system using bloom filter in MapReduce (IJDKP)
Many clients like to use the Web to discover product details in the form of online reviews provided by other clients and specialists. Recommender systems are an important response to the information-overload problem, as they present users with more practical and personalized information services. Collaborative filtering methods are a vital component of recommender systems, since they generate high-quality recommendations by leveraging the preferences of communities of similar users; collaborative filtering rests on the assumption that people with similar tastes choose the same items. Conventional collaborative filtering suffers from the sparse-data problem and a lack of scalability, so a new recommender system is required that deals with sparse data and produces high-quality recommendations in a large-scale mobile environment. MapReduce is a programming model widely used for large-scale data analysis. The algorithm described here, a recommendation mechanism for mobile commerce, is user-based collaborative filtering on MapReduce, which reduces the scalability problem of conventional CF systems. One essential operation for this analysis is the join, but MapReduce is not very efficient at joins, since it always processes all records in the datasets even when only a small fraction is relevant to the join. This problem can be reduced by applying the bloom-join algorithm: Bloom filters are constructed and used to filter out redundant intermediate records. The proposed Bloom-filter-based algorithm reduces the number of intermediate results and improves join performance.
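A minimal sketch of the bloom-join idea: build a Bloom filter over the join keys of the smaller relation, then discard records of the larger relation whose keys cannot match before they reach the join. The filter size, hash scheme, and data below are illustrative; false positives are possible but false negatives are not.

```python
import hashlib

class BloomFilter:
    """Tiny Bloom filter; k positions derived from one SHA-256 digest."""

    def __init__(self, num_bits=1 << 16, num_hashes=4):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits // 8)

    def _positions(self, key):
        digest = hashlib.sha256(str(key).encode()).digest()
        for i in range(self.num_hashes):
            chunk = int.from_bytes(digest[4 * i: 4 * i + 4], "big")
            yield chunk % self.num_bits

    def add(self, key):
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, key):
        return all(self.bits[p // 8] & (1 << (p % 8))
                   for p in self._positions(key))

# Bloom join: filter the larger relation by the smaller one's join keys.
users = [(1, "ann"), (2, "bo"), (3, "cy")]
ratings = [(1, 5), (9, 3), (2, 4), (7, 1)]
bf = BloomFilter()
for uid, _ in users:
    bf.add(uid)
survivors = [r for r in ratings if bf.might_contain(r[0])]
print(survivors)   # records with keys 9 and 7 are (almost surely) dropped
```

In the MapReduce setting, this filter is built in one job, broadcast to the mappers of the join job, and applied before any intermediate record is shuffled, which is where the savings come from.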
Ginix: Generalized Inverted Index for Keyword Search (IRJET Journal)
This paper presents a new index structure called Ginix (Generalized Inverted Index) that more efficiently supports keyword searches on text datasets. Ginix compresses traditional inverted indexes by merging consecutive document IDs into intervals to reduce storage space. It also develops efficient algorithms for set operations like union and intersection directly on the interval lists without requiring decompression. Experiments show that Ginix not only reduces index size but also improves search performance compared to traditional inverted indexes on real datasets.
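The interval encoding at the heart of this idea can be sketched as follows: consecutive document IDs collapse into [lo, hi] runs, and intersection works directly on the runs without decompressing them back into ID lists. This is a minimal reading of the idea, not Ginix's implementation.

```python
def to_intervals(doc_ids):
    """Compress a sorted doc-id list into [lo, hi] runs of consecutive ids."""
    intervals = []
    for d in doc_ids:
        if intervals and d == intervals[-1][1] + 1:
            intervals[-1][1] = d          # extend the current run
        else:
            intervals.append([d, d])      # start a new run
    return intervals

def intersect_intervals(a, b):
    """Intersect two interval lists without expanding them."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo <= hi:
            out.append([lo, hi])
        if a[i][1] < b[j][1]:             # advance the list that ends first
            i += 1
        else:
            j += 1
    return out

print(to_intervals([1, 2, 3, 7, 8, 11]))              # [[1,3],[7,8],[11,11]]
print(intersect_intervals([[1, 5], [9, 12]], [[4, 10]]))  # [[4,5],[9,10]]
```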
This document summarizes a research paper that proposes a modified pure radix sort algorithm for large heterogeneous data sets in distributed computing environments. It begins with an introduction to sorting and radix sorting. It then reviews previous work on optimizing radix sort, including reducing memory accesses and improving data locality. The paper proposes a new modified pure radix sort algorithm aimed at optimizing problems with radix sort for large heterogeneous data sets through a distributed computing approach using divide and conquer.
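For context, a plain (non-distributed) LSD radix sort looks like the sketch below; the modified algorithm in the paper layers a divide-and-conquer distribution step over nodes on top of this, which the sketch omits.

```python
def radix_sort(values, base=256):
    """LSD radix sort for non-negative integers. base must be a power of
    two; each pass is a stable bucket sort on one base-`base` digit."""
    out = list(values)
    if not out:
        return out
    shift = 0
    while max(out) >> shift:              # more digits remain
        buckets = [[] for _ in range(base)]
        for v in out:
            buckets[(v >> shift) & (base - 1)].append(v)
        out = [v for bucket in buckets for v in bucket]
        shift += base.bit_length() - 1    # move to the next digit
    return out

print(radix_sort([170, 45, 75, 90]))      # [45, 75, 90, 170]
```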
This document summarizes a research paper that introduces a novel multi-viewpoint similarity measure for clustering text documents. The paper begins with background on commonly used similarity measures like Euclidean distance and cosine similarity. It then presents the novel multi-viewpoint measure, which considers multiple viewpoints (objects not assumed to be in the same cluster) rather than a single viewpoint. The paper proposes two new clustering criterion functions based on this measure and compares them to other algorithms on benchmark datasets. The goal is to develop a similarity measure and clustering methods that provide high-quality, consistent performance like k-means but can better handle sparse, high-dimensional text data.
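One published formulation of the multi-viewpoint idea scores the similarity of two documents as an average over viewpoints outside their cluster; the sketch below follows that formulation, though the summarized paper may differ in detail, and the toy vectors are made up.

```python
import numpy as np

def multiviewpoint_similarity(a, b, viewpoints):
    """Similarity of documents a and b averaged over outside viewpoints:
    mean over h of (a - h) . (b - h). Documents that look alike from many
    external vantage points score higher than under a single-origin cosine."""
    V = np.asarray(viewpoints, dtype=float)
    return float(np.mean(((a - V) * (b - V)).sum(axis=1)))

# Hypothetical unit-length tf-idf-like vectors:
a = np.array([0.8, 0.6, 0.0])
b = np.array([0.7, 0.7, 0.1])
others = np.array([[0.0, 0.1, 0.99],      # documents outside the cluster
                   [0.1, 0.0, 0.99]])
print(multiviewpoint_similarity(a, b, others))
```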
Parametric comparison based on split criterion on classification algorithm (IAEME Publication)
This document presents a comparison of different attribute selection criteria for classification algorithms in stream data mining. It analyzes two common criteria - information gain and Gini index - and evaluates their impact on classification accuracy using different datasets. The results show that information gain generally achieves higher accuracy than Gini index, especially for larger data sizes. The document aims to improve the performance of stream data classification algorithms by optimizing the split criterion selection approach.
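Both criteria compared in the document reduce to an impurity function plugged into the same gain computation, as the sketch below shows; the toy labels are made up for illustration.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gini(labels):
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_score(parent, partitions, criterion):
    """Impurity reduction of a candidate split: parent impurity minus the
    size-weighted impurity of the child partitions. With criterion=entropy
    this is information gain; with criterion=gini it is the Gini gain."""
    n = len(parent)
    weighted = sum(len(p) / n * criterion(p) for p in partitions)
    return criterion(parent) - weighted

parent = ["yes"] * 6 + ["no"] * 4
left, right = ["yes"] * 5 + ["no"], ["yes"] + ["no"] * 3
print(split_score(parent, [left, right], entropy))   # information gain
print(split_score(parent, [left, right], gini))      # Gini-based gain
```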
Effective Data Retrieval in XML using TreeMatch Algorithm (IRJET Journal)
This document summarizes research on effective data retrieval from XML documents using the TreeMatch algorithm. It begins with an abstract introducing the TreeMatch algorithm and its ability to provide fast data retrieval from XML documents by matching tree-shaped patterns. It then reviews related work on XML tree matching algorithms and their issues, such as suboptimality. The document proposes using the TreeMatch algorithm to overcome issues with wildcards, negation, and siblings when querying XML documents with XPath or XQuery, and provides details on the algorithm's ability to process different types of XML tree pattern queries efficiently while avoiding intermediate results. In conclusion, it states that the TreeMatch algorithm can efficiently handle three types of XML tree pattern queries and overcome the problem of suboptimality.
Spatio-Temporal Database and Its Models: A Review (IOSR Journals)
This document provides a review of spatial-temporal databases and their models. It discusses the key components and characteristics of spatial databases, temporal databases, and spatial-temporal databases. Some of the main models of spatial-temporal data modeling that are described include the snapshot model, space-time composite data model, simple time-stamping models, event-oriented models, three-domain model, and history graph model. The review examines how these different models approach representing and querying spatial and temporal data.
Abstract: Since the demand for information retrieval is increasing quickly, indexing structures have become an important means of supporting fast retrieval. In this paper, a new data structure called the Dynamic Ordered Multi-field Index (DOMI) for information retrieval is introduced. It is based on radix trees organized in segments, together with a hash table that points to the root of each segment, where each segment is dedicated to storing the values of a single field. The hash table gives direct access to the needed segments without traversing the upper segments, so DOMI improves look-up performance for queries addressing a single field. For queries addressing multiple fields, each segment of the radix tree is traversed sequentially without visiting unrelated branches. The segmentation of the proposed DOMI also provides flexibility for minimizing communication overhead in a distributed system: every field in the radix tree is represented by one segment, and each segment can be stored as one block.
In addition, the proposed DOMI consumes less space than indexes built with B or B+ trees, which makes it more suitable for data-intensive settings such as Big Data.
Deduplication Detection for Similarity in Document Analysis Via Vector... (IRJET Journal)
The document discusses techniques for detecting similarity and duplication in document analysis using vector analysis. It proposes analyzing documents by extracting the abstract content, separating the words, and combining them in a word cloud to determine frequency. This approach aims to identify whether documents are duplicates by analyzing word vectors at the word, sentence, and paragraph levels, while also applying techniques such as stemming, stop-word removal, and semantic similarity.
Convincing a customer is always a challenging task in any business, and in online business it becomes even more difficult. Online retailers try everything possible to gain the customer's trust; one solution is to provide an area where existing users can leave comments. This service can effectively build customer trust, but customers usually comment on products in their native language using Roman script, and when there are hundreds of comments, even native customers find it hard to make a buying decision. This research proposes a system that extracts comments posted in Roman Urdu, translates them, determines their polarity, and then produces a rating for the product. This rating helps both native and non-native customers make buying decisions efficiently from comments posted in Roman Urdu.
Spatial databases have become more and more popular in recent years, and there is growing commercial and research interest in location-based search over spatial databases. Spatial keyword search has been well studied for years due to its importance to commercial search engines. Specifically, a spatial keyword query takes a user location and user-supplied keywords as arguments and returns objects that are spatially and textually relevant to those arguments. Geo-textual indices play an important role in spatial keyword querying, and a number of them have been proposed in recent years, mainly combining the R-tree or its variants with the inverted file. This paper proposes a new index structure that combines a K-d tree with an inverted file for spatial range keyword queries, ranking objects by their spatial and textual relevance to the query point within a given range.
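A minimal sketch of such a combination, under the assumption that the spatial part is a standard K-d tree range search (scipy's cKDTree here) and the textual part an inverted file keyed by token; the relevance-ranking step from the abstract is omitted, and the (x, y, text) object layout is hypothetical.

```python
from collections import defaultdict
from scipy.spatial import cKDTree

class SpatialKeywordIndex:
    """K-d tree for the spatial part, inverted file for the textual part;
    a range-keyword query intersects the two candidate sets."""

    def __init__(self, objects):
        # objects: list of (x, y, text)
        self.tree = cKDTree([(x, y) for x, y, _ in objects])
        self.inverted = defaultdict(set)
        for oid, (_, _, text) in enumerate(objects):
            for token in text.lower().split():
                self.inverted[token].add(oid)

    def range_keyword_query(self, x, y, radius, keywords):
        # Spatial candidates from the K-d tree, textual candidates from
        # the inverted file; the answer is their intersection.
        spatial = set(self.tree.query_ball_point((x, y), radius))
        textual = set.intersection(
            *(self.inverted.get(k.lower(), set()) for k in keywords))
        return sorted(spatial & textual)

idx = SpatialKeywordIndex([(0.0, 0.0, "coffee shop"),
                           (1.0, 1.0, "book shop"),
                           (9.0, 9.0, "coffee roastery")])
print(idx.range_keyword_query(0.0, 0.0, 2.0, ["shop"]))   # -> [0, 1]
```

The proposed index interleaves the two structures more tightly than this two-pass sketch, but the intersection of spatial and textual candidate sets is the essential operation either way.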
A Survey on Graph Database Management Techniques for Huge Unstructured Data (IJECEIAES)
Data analysis, data management, and big data have played a major role, from both social and business perspectives, over the last decade, and the graph database is now a prominent and trending research topic. A graph database is preferred for dealing with the dynamic and complex relationships in connected data and offers better results. Every data element is represented as a node; for example, on a social media site a person is represented as a node with properties such as name, age, likes, and dislikes, and the nodes are connected by relationships via edges. Graph databases are expected to be beneficial for business and social networking sites that generate huge amounts of unstructured data, since such Big Data requires proper and efficient computational techniques. This paper reviews existing graph data computation techniques and research work in order to outline future research directions in graph database management.
The document summarizes research on performing spatio-textual similarity joins. It discusses:
1) Developing a filter-and-refine framework to efficiently find similar object pairs from two datasets using signatures.
2) Generating spatial and textual signatures for objects and building inverted indexes on the signatures to find candidate pairs.
3) Refining the candidate pairs to obtain the final result pairs that satisfy spatial and textual similarity thresholds.
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERijdms
In this study, we evaluate the performance of SQL and NoSQL database management systems, namely Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB. We use a cluster of four nodes to run the database systems, with external load generators. The evaluation is conducted using data from Telenor Sverige, a telecommunication company that operates in Sweden. The experiments are conducted using three datasets of different sizes. The write throughput and latency as well as the read throughput and latency are evaluated for four queries, namely the distance query, k-nearest neighbour query, range query, and region query. For write operations Cassandra has the highest throughput when multiple nodes are used, whereas PostgreSQL has the lowest latency and the highest throughput for a single node. For read operations MongoDB has the lowest latency for all queries, while Cassandra has the highest read throughput. The throughput decreases as the dataset size increases for both writes and reads, for both sequential and random order access; this decrease is more significant for random reads and writes. We also present the experience we had with these database management systems, including setup and configuration complexity.
Algorithm for calculating relevance of documents in information retrieval sys...IRJET Journal
The document proposes an algorithm to calculate the relevance of documents returned in response to user queries in information retrieval systems. It is based on classical similarity formulas like cosine, Jaccard, and dice that calculate similarity between document and query vectors. The algorithm aims to integrate user search preferences as a variable in determining document relevance, as classic models do not account for this. It uses text and web mining techniques to process user query and document metadata.
Overview of Indexing In Object Oriented DatabaseEditor IJMTER
The document discusses indexing in object-oriented databases. It provides an overview of indexing and describes some of the issues involved in indexing object-oriented data, including indexing on classes and over type hierarchies. It also discusses uni-directional versus bi-directional indexing. Indexing aims to improve query performance by allowing faster retrieval of data from the database. However, indexing object-oriented data raises additional challenges compared to relational databases due to the complex nested structure of objects.
The technology of object-oriented databases was introduced to system developers in the late 1980s. Object DBMSs add database functionality to object programming languages. A major benefit of this approach is the unification of application and database development into a seamless data model and language environment. As a result, applications require less code, use more natural data modeling, and their code bases are easier to maintain.
This document discusses topics related to data structures and algorithms. It covers structured programming and its advantages and disadvantages. It then introduces common data structures like stacks, queues, trees, and graphs. It discusses algorithm time and space complexity analysis and different types of algorithms. Sorting algorithms and their analysis are also introduced. Key concepts covered include linear and non-linear data structures, static and dynamic memory allocation, Big O notation for analyzing algorithms, and common sorting algorithms.
This document summarizes a research paper that proposes a new technique for web page clustering inspired by the cemetery organization behavior of ants. The technique involves 3 main steps:
1) Generating a term-document matrix to represent web pages and reducing the dimensionality of the matrix using Latent Semantic Indexing.
2) Transforming the web pages into a two-dimensional grid space based on the cemetery organization behavior of ants, where web pages are more likely to be placed near similar web pages.
3) Clustering the web pages represented in the two-dimensional grid space using k-means clustering. The paper claims this technique can improve web page clustering compared to other approaches.
This document summarizes a research paper that proposes a new technique for web page clustering inspired by the cemetery organization behavior of ants. The technique begins by reducing the dimensionality of the web page index using latent semantic indexing. It then transforms web pages into a two-dimensional grid space based on the cemetery organization behavior of ants. Finally, it clusters the web pages in this space using k-means clustering. The paper aims to address the challenges of web page clustering by applying this three-step technique and demonstrates the impact of dimensionality reduction and distance measures on clustering results.
A STUDY ON SIMILARITY MEASURE FUNCTIONS ON ENGINEERING MATERIALS SELECTION cscpconf
This document presents a study on using similarity measure functions to select engineering materials from a database. Thirteen similarity/distance functions are examined to quantify the similarity between materials based on their properties. A performance index measure is used to evaluate how well a selected material matches the target material. The materials are ranked based on their normalized performance index values. An algorithm is presented that takes a target material, calculates similarities to materials in a database using different functions, computes performance indexes, and selects the material with the minimum normalized performance index as the best match. Experimental results applied the approach to select materials from a database of 5670 materials that best match a target material described by 25 properties.
Performing Fast Spatial Query Search by Using Ultimate Code WordsBRNSSPublicationHubI
This document discusses performing fast spatial query searches using ultimate code words. It begins with an abstract discussing closest keyword search queries that find objects covering query keywords with minimum distance. It then provides background on spatial keyword queries that retrieve results based on both spatial and textual relevance. The document discusses challenges with existing approaches and proposes a new method considering both keyword and spatial relevance to improve search results. It reviews related work on spatial keyword query techniques and indexes before defining the problem of searching a spatial database for nearest neighbors to a query based on object importance and features.
This document proposes a range query algorithm for big data based on MapReduce and P-Ring indexing. It introduces a dual-index structure that uses an improved P-Ring to index files across nodes, and B+ trees to index attribute values within files. The algorithm divides range queries into two steps: using P-Ring to search for relevant files, then searching within those files' B+ trees to find matching data. It describes how the P-Ring and B+ trees are constructed and used to process range queries in a distributed manner optimized for big data environments.
Indexing for Large DNA Database sequencesCSCJournals
Bioinformatics data consists of a huge amount of information due to the large number of sequences, the very long sequence lengths, and the daily new additions. This data needs to be efficiently accessed for many purposes. What makes one DNA data item distinct from another is its DNA sequence, which consists of a combination of the four characters A, C, G, and T and can have different lengths. Using a suitable representation of DNA sequences, and a suitable index structure to hold this representation in main memory, leads to efficient processing by accessing the DNA sequences through the index and reduces the number of disk I/O accesses. To avoid false hits, the number of candidate DNA sequences that need to be checked at the end is reduced by pruning, so there is no need to search the whole database. A suitable index for searching DNA sequences efficiently must balance index size and search time, and the selection of the relation fields on which the index is built has a big effect on both. Our experiments use the n-gram wavelet transformation with single-field and multi-field index structures in a relational DBMS environment. The results show the need to consider index size and search time carefully when using indexing. Increasing the window size decreases the amount of I/O references. The behaviour of single-field and multi-field indexing is strongly affected by the window size: a larger window size leads to better search time with a special index type using single-field indexing, while the search time is almost equally good for most index types when using multi-field indexing. The storage space needed for RDBMS index types is almost the same as, or greater than, the actual data.
This document provides lecture notes on data structures that cover key topics including:
- Classifying data structures as simple, compound, linear, and non-linear and providing examples.
- Defining abstract data types and algorithms, and explaining their structure and properties.
- Discussing approaches for designing algorithms and issues related to time and space complexity.
- Covering searching techniques like linear search and sorting techniques including bubble sort, selection sort, and quick sort.
- Describing linear data structures like stacks, queues, and linked lists and non-linear structures like trees and graphs.
1) The document discusses using k-means clustering to analyze big data. K-means is an algorithm that partitions data into k clusters based on similarity.
2) It provides background on big data characteristics like volume, variety, and velocity. It also discusses challenges of heterogeneous, decentralized, and evolving data.
3) The document proposes applying k-means clustering to big data to map data into clusters according to its properties in a fast and efficient manner. This allows statistical analysis and knowledge extraction from large, complex datasets.
This document summarizes a research paper on using k-means clustering to analyze big data. It begins with an introduction to big data and its characteristics. It then discusses related work on big data storage, mining, and analytics. The HACE theorem for defining big data is presented. The k-means clustering algorithm is explained as an efficient method for partitioning big data into groups. The proposed system uses k-means clustering followed by data mining and classification modules. Experimental results on two datasets show that the recursive k-means approach finds clusters closer to the actual number than the iterative approach. In conclusion, clustering is effective for handling big data attributes like heterogeneity and complexity, and k-means distribution helps distribute data into appropriate clusters.
A spatial data model for moving object databasesijdms
Moving Object Databases will have significant role in Geospatial Information Systems as they allow users
to model continuous movements of entities in the databases and perform spatio-temporal analysis. For
representing and querying moving objects, an algebra with a comprehensive framework of User Defined
Types together with a set of functions on those types is needed. Moreover, concerning real world
applications, moving objects move along constrained environments like transportation networks so that an
extra algebra for modeling networks is demanded, too. These algebras can be inserted into any data model if their designs are based on available standards, such as those of the Open Geospatial Consortium, which provides a common model for existing DBMSs. In this paper, we focus on extending a spatial data model for
constrained moving objects. Static and moving geometries in our model are based on Open Geospatial
Consortium standards. We also extend Structured Query Language for retrieving, querying, and
manipulating spatio-temporal data related to moving objects as a simple and expressive query language.
Finally as a proof-of-concept, we implement a generator to generate data for moving objects constrained
by a transportation network. Such a generator primarily aims at traffic planning applications.
SPATIAL R-TREE INDEX BASED ON GRID DIVISION FOR QUERY PROCESSING

International Journal of Database Management Systems (IJDMS), Vol. 9, No. 6, December 2017
DOI: 10.5121/ijdms.2017.9602

Esraa Rslan¹, Hala Abdel Hameed¹ and Ehab Ezzat²
¹Faculty of Computers and Information, Fayoum University, Egypt
²Faculty of Computers and Information, Cairo University, Egypt

ABSTRACT
Tracking moving objects has become essential in our lives and has many uses, such as GPS navigation, traffic-monitoring services, and location-based services. Keeping track of the changing positions of objects has become an important problem. The moving entities send their positions to the server through a network, and these objects generate large amounts of data with highly frequent updates, so an index structure is needed to retrieve information as fast as possible. The index structure should be adaptive and dynamic, to monitor the locations of objects, and quick, to answer inquiries efficiently. The best-known kinds of queries in moving object databases are range, point, and k-nearest neighbour queries. This study uses the R-tree to obtain detailed range query results efficiently. However, using the R-tree alone generates considerable overlapping and coverage between MBRs, so the R-tree is combined with a grid-partition index, because the grid index can reduce the overlap and coverage between MBRs. Query performance becomes efficient by using these methods together. We perform an extensive experimental study to compare the approaches on modern hardware.
KEYWORDS
MOD, Spatial indexing, R-tree, grid.
1. INTRODUCTION
Moving object applications have become very important in our lives and have many uses, such as GPS car navigation and location-based services. The moving object database is a branch of spatio-temporal database research that has developed in recent years, and indexing moving objects is one of its key technologies. Tracking the changing positions of objects has become a serious matter and plays an essential role in location-based services, for example for customers with cell phones and cars. These services need to know the current positions of objects quickly. The index structure should be adaptive and dynamic to keep track of the positions of objects and to provide quick answers to the expected queries [1].
An example is a traffic-monitoring service for a population of 10 million moving objects travelling at 15 m/s. Another example requires predicting object positions with an accuracy of no less than 100 meters. In such situations, up to one million updates per second are needed, and as the update frequency grows, the workloads also increase. As Šidlauskas et al. [2] observe, such high workloads cannot be managed by traditional disk-based techniques. Instead, we propose a main-memory index that exploits the parallelism available in multicore processors. The main goal of the index structure is to maintain a consistent database and return correct results.
Indexing techniques are divided into two main groups: tree-based and grid-based indexes. Examples of tree-based indexes are the R-tree and its variants, the R+-tree, the R*-tree, the TPR-tree (Time-Parametric Rectangle), and the TPR*-tree [3]. Examples of grid-based indexes are the uniform, parallel, hierarchical, and LU (Lazy Update) grids. There are also many kinds of queries, such as point queries and density queries.
The R-tree is famous for its efficiency in handling data skew and supporting various query types. It can be used to index moving objects in a specific region. The idea of the index structure is to utilize the Minimum Bounding Rectangle (MBR). However, due to overlapping between MBRs, a search may need to follow multiple paths, inflating the number of search paths [4]. As the amount of indexed data expands, more storage space is required, which seriously affects search and update efficiency. Though the R-tree is known for its power to support various query types, its main problem is poor update performance. On the other hand, a grid index over moving objects is considerably faster than an R-tree in update and query processing: it can directly compute all objects overlapping a range query without needing to access the index entries.
In this paper, a robust and scalable index technique is proposed. The main goal is to build a hybrid index technique that supports different kinds of queries with minimum update cost. We use the R-tree and the grid index together, with the R-tree at the root levels and grids at the leaf nodes, to achieve two main benefits: the R-tree is good for searching and querying, while the grid is efficient for update operations. Experimental studies show that this design helps answer range queries over a great number of moving objects. To verify the efficiency and effectiveness of the three types of indexes (R-tree, grid, and the hybrid of R-tree and grid), we run several experiments comparing them on response time for range queries and on CPU time for updating objects.
The remaining parts of this paper are organized as follows. Section two introduces the background and related work, with the aim of briefly explaining the R-tree and grid structures. Section three explains the proposed index technique and its associated algorithms. Section four presents an analysis of the index structure and the experimental results. The conclusion is presented in section five.
2. BACKGROUND AND RELATED WORK
In this section, we introduce the two main types of moving object indexing techniques: tree-based and grid-based indexes.
2.1. Tree Based Index
The R-tree is a height-balanced tree (all leaf nodes are at the same height) consisting of internal nodes and leaf nodes. The R-tree is known for its robustness to data skew and supports different types of queries, yet it is poor in update performance. Numerous applications, including indexing the locations of moving objects, exhibit workloads characterized by heavy updates in addition to continuous queries. Indexes are divided into two categories based on the data to be stored: (1) indexes over the historical locations of objects, and (2) indexes over the current locations of moving objects that may also predict their future locations [3].

An overview of spatio-temporal access techniques can be found in two surveys [5, 6], which focus on the indexing of past, current, and future object positions and define a spatial index as "the primary key to improve the efficiency of queries". The surveyed indexes fall into two periods: the first, from 1980 to 2003, includes the TPR-tree, the TPR*-tree, the Bx-tree, the Bdual-tree, and STRIPES; the second, from 2003 to 2010, includes the MON-tree, the RTR-tree, the TPR-tree, the Bx-tree, and the ST2B-tree.
Guttman [7] first presented an index structure called the R-tree for spatial searching. The R-tree is known to be inefficient in update performance because of the overlapping and splitting of MBRs, though it is efficient in supporting different types of queries. Six R-tree variants are compared by Manolopoulos et al. [8]. The most famous spatial techniques that utilize minimum bounding rectangles are R-trees [6] and their variants. The TPR-tree extends the R-tree to objects that move according to a linear motion function, and Tao et al. [3] introduced the TPR*-tree, which enhances the TPR-tree with improved insert, update, and delete processing. The R-tree, TPR-tree, and TPR*-tree all have low update efficiency because of their node-splitting and reinsertion operations. Besides R-trees, B-trees and Quadtrees [9] can likewise be used to index moving entities. Jensen et al. [10] propose a B+-tree-based index that utilizes space-filling curves, such as Hilbert curves or Z-curves, to map multidimensional locations.
2.1.1 Structure
The R-tree consists of internal nodes and leaf nodes [11]. An internal node has pointers to other nodes; a leaf node has pointers to objects (entries). As shown in figure 1, the R-tree satisfies the following properties:
1. Every leaf node has between m and M entries unless it is the root, where M is the maximum number of records in a node, m is the minimum number of records that can be held in a node, and m ≤ M/2.
2. For every index entry (I, tuple-identifier) in a leaf node, I is the smallest rectangle that contains the object, and tuple-identifier is a unique number that identifies the record in the database.
3. Each non-leaf node contains between m and M children.
4. For each entry (I, child-pointer) in a non-leaf node, I is the smallest rectangle that covers all the rectangles in the child node, and child-pointer is the address of the lower node.
5. The root node contains at least two children unless it is a leaf.
6. All leaf nodes appear at the same level.
Figure 1. An example of R-tree
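To make the node layout concrete, the following is a minimal C++ sketch of an R-tree node as described above. The type and field names (MBR, Node, Entry) are illustrative rather than taken from the paper's implementation, and the capacity values are arbitrary examples.

#include <vector>

// Minimum bounding rectangle: the smallest rectangle enclosing a set of entries.
struct MBR {
    double minX, minY, maxX, maxY;
    bool overlaps(const MBR& o) const {
        return minX <= o.maxX && o.minX <= maxX &&
               minY <= o.maxY && o.minY <= maxY;
    }
};

constexpr int M_CAP = 8;          // M: maximum entries per node (example value)
constexpr int M_MIN = M_CAP / 2;  // m: minimum entries per node, m <= M/2

struct Node {
    struct Entry {
        MBR   box;      // I: smallest rectangle containing the object or child
        long  tupleId;  // leaf entry: unique record identifier
        Node* child;    // non-leaf entry: pointer to the lower node
    };
    MBR   mbr;                   // covering rectangle of this node
    bool  isLeaf;                // leaf flag, as in the node metadata of figure 2
    Node* parent;                // pointer to the parent node
    std::vector<Entry> entries;  // kept between m and M entries (root excepted)
};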
Figure 2 shows the structure of the R-tree. Every node carries two essential attributes: the node size (ns), the number of entries the node holds, and the minimum children (mc), the minimum fraction of a full node [12]. A leaf node contains node metadata such as the number of entries, the MBR, a leaf flag, the id of the parent node, the number of leaf entries, and a pointer to the parent node at the end. Likewise, an internal node consists of the same metadata fields as the leaf node; the differences between them are the child pointers and the MBR, where the MBR is the smallest rectangle that gathers the spatial entities together.
Figure 2. Structure of R-tree
2.1.2 Update, Insertion, and Deletion
Every update operation deletes the object from one place and inserts it in another. Assume the point P (oldX, oldY) needs to be updated: first we search the tree from the root to the leaves until we find the node whose MBR contains P (oldX, oldY); more than one path may be visited because of the overlapping between MBRs. We then delete the point, taking care that no underflow occurs (fewer than the minimum number of entries, denoted m). After the deletion, the insertion is applied: it traverses the tree from the root to the leaves to find the suitable leaf node and inserts (newX, newY). Insertion may cause a node overflow (exceeding the maximum number of objects, denoted M). There are many variations of the R-tree in the literature; only the uniform R-tree is considered here.
2.1.3 Query Processing
Two main types of queries are discussed: the range query and the k-nearest neighbour query. A range query is processed by traversing the tree from the root to the leaves: if the current node is not a leaf, every entry whose MBR intersects or overlaps the MBR of the range query is visited.
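As a rough illustration of this descent, here is a hedged C++ sketch of the recursive range search over the illustrative Node type from the previous listing; rangeQuery is an assumed name, not the paper's function.

#include <vector>

// Visit every subtree whose MBR intersects the query rectangle; report the
// object identifiers found in leaves. Multiple paths may be followed because
// sibling MBRs can overlap.
void rangeQuery(const Node* node, const MBR& query, std::vector<long>& result) {
    for (const auto& e : node->entries) {
        if (!e.box.overlaps(query)) continue;    // prune non-intersecting MBRs
        if (node->isLeaf)
            result.push_back(e.tupleId);         // leaf entry: report the object
        else
            rangeQuery(e.child, query, result);  // descend along this path
    }
}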
2.2 Grid Based Index
A grid is a pre-defined partitioning index in which a pre-defined spatial region is partitioned into rectangular cells. Each cell covers a part of the space, and an object whose position lies inside the boundaries of a grid cell belongs to that cell. Šidlauskas et al. [12] proposed the grid index, one of the principal framework index structures for moving object databases. D-Grid [13] enhances U-Grid by indexing both the location and the velocity of moving objects in the dual space, a 2-dimensional Euclidean space with one dimension for velocity and the other for location. Another index structure based on grid cells and continuous k-nearest neighbour query processing is proposed by Wei Zhang [14].
Existing concurrency control (CC) methods for multidimensional indexes have also been introduced: Šidlauskas et al. [15] proposed TwinGrid, and PGrid [16] gives high query and update performance. PGrid makes multiple copies of information on request and at the object granularity (i.e., non-local objects).
2.2.1 Structure
The grid is a 2-d array in which every cell of the array corresponds to a square cell with side length equal to the grid cell size. Every grid cell in the array links to a list of buckets that contain the data objects [12]. Each bucket has a bucket size bs and metadata fields, where the metadata contains the number of entries and a pointer to the next bucket. The grid is defined by two main parameters: the grid cell size (gcs) and the bucket size (bs).
Figure 3. Grid index structure
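For concreteness, the following is a minimal C++ sketch of this grid layout. The names (Grid, Bucket, ObjectTuple, kGridDim, kBucketSize) are illustrative, and gcs is an arbitrary example cell size; none of them come from the paper's code.

#include <array>

constexpr int    kGridDim    = 64;     // grid is kGridDim x kGridDim cells (assumption)
constexpr int    kBucketSize = 32;     // bs: number of entries per bucket (assumption)
constexpr double gcs         = 100.0;  // grid cell size in space units (assumption)

struct ObjectTuple { long id; double x, y; };

// Fixed-size bucket with metadata: the entry count and a pointer to the next bucket.
struct Bucket {
    int nO = 0;
    std::array<ObjectTuple, kBucketSize> entries;
    Bucket* next = nullptr;
};

// 2-d array of cells; each cell links to the first bucket of its object list.
struct Grid {
    Bucket* cells[kGridDim][kGridDim] = {};
    Bucket*& cellFor(double x, double y) {   // direct access to the relevant cell;
        return cells[int(x / gcs)][int(y / gcs)];  // assumes (x, y) lies in the region
    }
};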
2.2.2 Update, Insertion, and Deletion
All update operations are classified into two sets: local and non-local. An update is local when the new object location falls in the same cell; then we only need to replace (oldX, oldY) with (newX, newY). Otherwise the update is non-local (the new location is not in the same cell): the object is deleted from its current cell and inserted into the new cell. During an insert operation, the new object is always put at the end of the first bucket. If the first bucket is full, a new bucket is created and the pointers are updated to make the new bucket the first one. If the first bucket becomes empty, the next bucket becomes the first one.
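The insert, delete, and local/non-local update rules above can be sketched in C++ over the illustrative Grid types from the previous listing. The function names are assumptions, and the sketch omits the secondary-index bookkeeping the paper mentions later.

// Insert at the end of the first bucket; if it is full (or absent), a new
// bucket is created and the pointers are updated so it becomes the first.
void insertIntoCell(Bucket*& cell, ObjectTuple t) {
    if (!cell || cell->nO == kBucketSize) {
        Bucket* b = new Bucket;
        b->next = cell;
        cell = b;
    }
    cell->entries[cell->nO++] = t;
}

// Delete: move the last object of the first bucket into the freed slot;
// an emptied first bucket is released and its successor becomes the first.
void removeFromCell(Bucket*& cell, long id) {
    for (Bucket* b = cell; b; b = b->next)
        for (int i = 0; i < b->nO; ++i)
            if (b->entries[i].id == id) {
                b->entries[i] = cell->entries[--cell->nO];
                if (cell->nO == 0) { Bucket* dead = cell; cell = cell->next; delete dead; }
                return;
            }
}

// Local update overwrites the coordinates in place; non-local update is a
// delete from the old cell followed by an insert into the new cell.
void update(Grid& g, ObjectTuple obj, double newX, double newY) {
    Bucket*& oldCell = g.cellFor(obj.x, obj.y);
    Bucket*& newCell = g.cellFor(newX, newY);
    if (&oldCell == &newCell) {              // same cell: local update
        for (Bucket* b = oldCell; b; b = b->next)
            for (int i = 0; i < b->nO; ++i)
                if (b->entries[i].id == obj.id) {
                    b->entries[i] = {obj.id, newX, newY};
                    return;
                }
    } else {                                 // different cell: non-local update
        removeFromCell(oldCell, obj.id);
        insertIntoCell(newCell, {obj.id, newX, newY});
    }
}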
2.2.3 Query Processing
The range query divides the cells that overlap the query range into two categories: completely covered and partly covered cells. Only the objects in partly covered cells need to be examined individually to decide whether they lie inside the range. A KNN query is given a point P and the number of nearest neighbours k; the cells are divided into sets, each with a minimum distance to the point P. As with the KNN query in the tree index, a queue is used to store these cells, sorted from the nearest to the farthest.
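A hedged sketch of the grid range query, using the illustrative Grid types above and the MBR type from the R-tree sketch: fully covered cells report all their objects, while partly covered cells are filtered object by object.

#include <algorithm>
#include <vector>

std::vector<long> gridRange(Grid& g, const MBR& q) {
    std::vector<long> out;
    // Clamp the overlapped cell range to the grid bounds.
    int x0 = std::max(0, int(q.minX / gcs)), x1 = std::min(kGridDim - 1, int(q.maxX / gcs));
    int y0 = std::max(0, int(q.minY / gcs)), y1 = std::min(kGridDim - 1, int(q.maxY / gcs));
    for (int cx = x0; cx <= x1; ++cx)
        for (int cy = y0; cy <= y1; ++cy) {
            bool fullyCovered =                  // is the whole cell inside q?
                q.minX <= cx * gcs && (cx + 1) * gcs <= q.maxX &&
                q.minY <= cy * gcs && (cy + 1) * gcs <= q.maxY;
            for (Bucket* b = g.cells[cx][cy]; b; b = b->next)
                for (int i = 0; i < b->nO; ++i) {
                    const ObjectTuple& t = b->entries[i];
                    if (fullyCovered || (q.minX <= t.x && t.x <= q.maxX &&
                                         q.minY <= t.y && t.y <= q.maxY))
                        out.push_back(t.id);     // object lies inside the range
                }
        }
    return out;
}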
3. PROPOSED INDEX TECHNIQUE
An index that combines the R-tree with grid partitioning is proposed, which gives performance improvements in handling different types of queries such as range and KNN queries. The proposed index has two standard data structures: the first is the R-tree and the second is a grid index. The R-tree is used to divide the space into MBRs, and a grid is then used in the leaf nodes to store the data objects and to reduce the overlapping between MBRs. In the grid implementation, the space is divided into an M x M grid of squares, a linked list is created for each square, and a 2-d array gives direct access to the relevant square. The grid requires less maintenance effort, as no grid changes are needed during updates.

It is known that as the number of objects increases, a great deal of overlapping and convergence between MBRs is generated; approximately, each single update then needs about four tree traversals, affecting performance significantly. To reduce node splitting in the R-tree, a grid is used to store data objects at the leaf level. The performance of the grid in the insert, delete, and update algorithms is superior to that of the R-tree, so it can greatly decrease the time complexity of the algorithms. Extensive experiments reveal that our proposed index structure substantially outperforms the existing grid index and R-tree index in single-threaded environments.
As shown in figure 4, the index structure has two data structures: the root levels form an R-tree and each leaf node is a grid index structure, taking advantage of both the tree and the grid index. The R-tree is very powerful in supporting different types of queries with minimum response time. In addition, the grid is the best choice for updates because no grid re-adjustment has to be done during updates; grids also require less repair effort and are quick to access. We take advantage of each of these indexes in update and query processing. For a range query, the grid can enumerate all indexed objects that are covered by the query range without needing to access the index records.
Figure 4. The new index structure
3.1 Index structure
The R-tree is used to implement both the root and intermediate levels, which utilize MBRs to partition the regions, while each child node uses a grid to implement its MBR with equal-sized cells. The grid incurs insignificant update costs, since no re-adjustment of the framework is done when objects change their locations. The index structure is as follows (a sketch of the layout follows below):

1. The root node is an R-tree node.
2. Non-leaf nodes are internal nodes that hold tuples <mbr, ptr>, where mbr is the minimum bounding rectangle of the node and ptr points to the next-level node.
3. A leaf node is a grid that divides its MBR into M x M cells; the grid is a 2-d array in which every array cell corresponds to a square cell with side length equal to the Grid Cell Size (gcs).
4. At the leaf level, grid squares store a pointer to the list of buckets that hold the object data.

For huge data sizes, retrieving the relationships from the grid index greatly reduces the complexity of the algorithm in contrast to conventional R-tree methods. The grid coordinates make the positions of spatial entity objects explicit, which reduces storage repetition and avoids search-operation costs.
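A hedged C++ sketch of this two-level layout, reusing the illustrative MBR and Bucket types from the earlier listings; HybridNode and kLeafGridDim are assumed names and values, not the paper's code.

#include <algorithm>
#include <vector>

constexpr int kLeafGridDim = 16;   // M x M cells inside one leaf MBR (assumption)

struct HybridNode {
    MBR  mbr;                             // minimum bounding rectangle of the node
    bool isLeaf;
    std::vector<HybridNode*> children;    // non-leaf: <mbr, ptr> tuples to the next level
    Bucket* cells[kLeafGridDim][kLeafGridDim] = {};  // leaf: grid of bucket lists

    // Map a point inside this node's MBR to its grid cell.
    Bucket*& cellFor(double x, double y) {
        double cw = (mbr.maxX - mbr.minX) / kLeafGridDim;  // cell width
        double ch = (mbr.maxY - mbr.minY) / kLeafGridDim;  // cell height
        int cx = std::min(kLeafGridDim - 1, int((x - mbr.minX) / cw));
        int cy = std::min(kLeafGridDim - 1, int((y - mbr.minY) / ch));
        return cells[cx][cy];
    }
};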
3.2 Update, Insertion, and Deletion
The R-tree treats every update as a pair of two processes, delete and insert, traversing the tree top-down to locate the suitable leaf and update it. Algorithm 1 is proposed for inserting a new object. The algorithm takes a new object with a unique id as input and inserts it into the index. In the first step, the algorithm searches the tree from the root to the leaves to find the suitable child node, looking for a node satisfying min.x ≤ x ≤ max.x in the x-dimension according to the MBR coordinates (min.x, min.y, max.x, max.y) of this object. If such a node is found, the algorithm proceeds to the second step; otherwise it goes to the fourth step. In the second step, since the child node is implemented as a grid, the object is inserted into the suitable grid cell based on the (x, y) coordinates of the new object. In the third step, the new object is put at the end of the first bucket in the grid; if the first bucket is full, a new bucket is created and the pointers are updated to make the new bucket the first one, and if the first bucket is empty, the following bucket becomes the first one. The first free position at the end of the bucket is located and the new record is inserted there. In the fourth step, when no MBR is found, a child node approximating the x-value and y-value is created together with its leaf grid, and the new object is inserted into the grid of that leaf node.
Algorithm 1: insert(ObjectTuple n, Cell cell)
1. Search for object n in tree t
2. if t is a non-leaf node, then for each entry E in t do
3.   if E.I overlaps n // true: insert the new object into the index
4.     firstBucket = *cell // the first bucket referenced by the cell
5.     if Full(firstBucket) // create a new bucket and make it the first one
6.     freePosition = firstBucket.entries[firstBucket.nO] // first free slot at the end of the first bucket
7.     insertObject(n, freePosition) // the new object is inserted
8.   else CreateChildNode(n) // false: no MBR found, create a new child node
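As a complement to the pseudocode, the following is a hedged C++ rendering of the descend-then-grid-insert flow of Algorithm 1, reusing the illustrative HybridNode type and insertIntoCell helper from the earlier sketches; step 4 (creating a new child node when no MBR matches) is only signalled by the return value, not implemented.

// Descend to a node whose MBR contains the point (step 1), then insert the
// object into the matching grid cell of the leaf (steps 2 and 3).
bool hybridInsert(HybridNode* node, const ObjectTuple& obj) {
    bool contained = node->mbr.minX <= obj.x && obj.x <= node->mbr.maxX &&
                     node->mbr.minY <= obj.y && obj.y <= node->mbr.maxY;
    if (!contained) return false;            // this MBR does not cover the object
    if (node->isLeaf) {
        insertIntoCell(node->cellFor(obj.x, obj.y), obj);
        return true;
    }
    for (HybridNode* c : node->children)     // step 1: find a suitable child
        if (hybridInsert(c, obj)) return true;
    return false;                            // step 4: caller must create a child node
}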
The update process in algorithm 2 takes a new object position (newX, newY) as input. Based on this position, the tree is searched to find the MBR node that overlaps the object. After locating the node containing the object to be updated according to its (oldX, oldY), the object's information (its location and its current cell) is retrieved from the primary index, and the algorithm determines whether the update is local or not by contrasting the new and old positions. If the update is local, the outdated object tuple is overwritten with the new x and new y; otherwise the update is non-local, so the old object tuple is deleted (by calling the delete algorithm) and the new object tuple is inserted into the appropriate child node.
Algorithm 2: update(ObjectTuple new)
1. Search for obj in tree t
2. if E.I overlaps obj
3.   SearchSubTree(E.P, obj)
4. newCell = computeNewCell(newX, newY) // locate the new cell for the object
5. if newCell == oldCell // local update: the object stays in the same cell
6.   overwriteObject(new, obj) // obj is overwritten with new
7. else // non-local update
8.   delete(obj) // delete the old object tuple
9.   insert(new) // insert the new object and update the index
In algorithm 3, the delete operation proceeds as follows. First, search the tree until finding the child node that contains the object to be deleted. Second, locate the object within that child node, which is implemented as a grid. Third, after finding the object, the algorithm shifts the last object of the first bucket into the place of the deleted object; all pointers (pointer1, pointer2, and idx) of that last object are modified accordingly in the secondary index. Fourth, if the first bucket becomes empty, it is released and the following bucket becomes the first; if the grid cell becomes empty, a null pointer is put in the cell.
Algorithm 3: delete(ObjectTuple obj)
1. Search for obj in tree t
2. if t is not a leaf, then for each entry E in t do
3.   if E.I overlaps obj
4.     SearchSubTree(E.P, obj)
5. firstBucket = *sindex.ldCell // the cell refers to the first bucket
6. lastObject = firstBucket.entries[firstBucket.nO - 1] // shifted into the deleted slot
7. if firstBucket.nO == 0 then // is the first bucket empty?
8.   *sindex.ldCell = firstBucket.next
9.   free(firstBucket)
10. update all pointers of the deleted object
3.3 Query Processing
The range query and how it works are explained below.
3.3.1 Range Query
In a range query, the cells that overlap the query range are divided into completely covered and partly covered cells. The objects in partly covered cells need to be examined individually to decide whether they lie inside the range. The input is the query box q (x0, y0, x1, y1), in which (x0, y0) and (x1, y1) are the lower-left and upper-right corner coordinates respectively. First, the algorithm searches for nodes satisfying x0 ≤ x ≤ x1 in the x-dimension, based on the coordinates (x0, y0, x1, y1) of q and the MBRs of the child nodes. Then, for each matching x-value, it searches for nodes satisfying y0 ≤ y ≤ y1; if a node with a matching y-value exists, the algorithm goes to bucket number 1, incrementing the reader count (R) of a bucket before entering it and decrementing it after the bucket has been examined. Finally, the objects covered by the query range are returned in the result set.
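Putting the pieces together, here is a hedged C++ sketch of the hybrid range query: descend through overlapping MBRs, then scan the leaf grids. It reuses the illustrative HybridNode, Bucket, and MBR types from the earlier listings and omits the reader-count (R) concurrency bookkeeping.

#include <vector>

void hybridRange(const HybridNode* node, const MBR& q, std::vector<long>& out) {
    if (!node->mbr.overlaps(q)) return;        // prune by MBR coordinates
    if (!node->isLeaf) {
        for (const HybridNode* c : node->children)
            hybridRange(c, q, out);            // follow every overlapping path
        return;
    }
    for (int cx = 0; cx < kLeafGridDim; ++cx)  // scan this leaf's grid cells
        for (int cy = 0; cy < kLeafGridDim; ++cy)
            for (const Bucket* b = node->cells[cx][cy]; b; b = b->next)
                for (int i = 0; i < b->nO; ++i) {
                    const ObjectTuple& t = b->entries[i];
                    if (q.minX <= t.x && t.x <= q.maxX &&
                        q.minY <= t.y && t.y <= q.maxY)
                        out.push_back(t.id);   // object lies inside the query box
                }
}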
4. EXPERIMENTAL RESULTS
In order to check the performance of the proposed technique, we compare it with both the traditional R-tree and the grid, providing an overview of performance and reporting the average CPU time of individual updates, range query response time, and I/O visits.
4.1 Index Implementations
The index structure is implemented in C++ and compiled with g++ under Linux. The test environment uses Windows 7 with a 64-bit OS, a Core (TM) i3 2.10 GHz processor, and 6 GB RAM. The indexes are studied under massive datasets generated using an open-source Moving Object Trace generator (MOTO) based on Brinkhoff's [17] moving-object generator. Objects move along network-based roads in five German cities. Table 1 shows the experimental settings of the generator; the default values are in bold.
Table 1. The experimental settings of the generator

Parameter                        Values
Number of objects (×10 )         5, 10, 20, 40, 80
Number of updates (×10 )         300
Monitored region (km²)           641 × 864
Velocity (km/h)                  20, 30, 40, 50, 60, 90
Update/query ratio               250:1, ..., 1000:1, ..., 16000:1
Time through updates (s)         10, 20, 40, 80, 160
k nearest neighbours (×10 )      0.5, 1, 2, 4, 8, 16, 32
# network segments               32,750,494
# network nodes                  28,933,679
4.2 Comparison of the Indexes
A comparative experiment was conducted to compare the efficiency of the proposed hybrid index with the conventional R-tree and grid under different conditions and parameters. Figure 5 shows the behaviour of the indexes in terms of total CPU update time for different numbers of moving objects. As the number of objects grows, the new index outperforms both the R-tree and the grid. With the grid parameters, it still performs updates a little better, and it delivers similar (range) query performance to the tree, as shown in figure 6.
Figure 5. Increasing number of objects in update operation
Figure 6. Increasing number of objects in range query
Figure 7. Number of I/O visits
Figure 7 shows the number of I/O visits when inserting different numbers of spatial entity objects during queries. The results show that the new index structure outperforms the conventional R-tree because, after the region is divided into the first-level MBRs using the R-tree, each MBR is partitioned using the grid. During querying, the grid cell connected to the MBR of the objects is located directly, because the leaf nodes contain the data objects. The tree is built on the (x, y) values of the grid cells, so the grid cell of a given object can be found quickly from the MBR coordinates of the entity object, and the object is inserted into the related leaf node; if the MBR of this object is not found in the tree, a new child node is created. For huge data sizes, retrieving the relationships from the grid structure greatly reduces the complexity of the algorithm in contrast to the conventional method of searching for the MBR containing the spatial entries.
5. CONCLUSIONS
A new spatial index technique based on spatial grid-coordinate division has been introduced, together with a description of its data structure and algorithms, and the hybrid index has been compared with the traditional R-tree index and the grid. The new index reduces the overlapping and coverage between intermediate nodes. Its key strength is that it retrieves data quickly and answers queries effectively based on current locations, for example finding the locations of schools, clinics, bazaars, and mini-marts for users at all levels. The index structure is very flexible, so it improves the retrieval performance of spatial entity objects and has good practical applicability.
REFERENCES
[1] W. Choi, B. Moon, and S. Lee, "Adaptive cell-based index for moving objects," Data & Knowledge Engineering, vol. 48, pp. 75-101, 2004.
[2] D. Šidlauskas, S. Šaltenis, and C. S. Jensen, "Processing of extreme moving-object update and query workloads in main memory," The VLDB Journal, vol. 23, pp. 817-841, 2014.
[3] Y. Tao, D. Papadias, and J. Sun, "The TPR*-tree: an optimized spatio-temporal access method for predictive queries," in Proceedings of the 29th International Conference on Very Large Data Bases - Volume 29, 2003, pp. 790-801.
[4] N. Beckmann, H.-P. Kriegel, R. Schneider, and B. Seeger, "The R*-tree: an efficient and robust access method for points and rectangles," in ACM SIGMOD Record, 1990, pp. 322-331.
[5] M. F. Mokbel, T. M. Ghanem, and W. G. Aref, "Spatio-temporal access methods," IEEE Data Eng. Bull., vol. 26, pp. 40-49, 2003.
[6] L.-V. Nguyen-Dinh, W. G. Aref, and M. Mokbel, "Spatio-temporal access methods: Part 2 (2003-2010)," 2010.
[7] A. Guttman, "R-trees: a dynamic index structure for spatial searching," in Proceedings of the 1984 ACM SIGMOD International Conference on Management of Data, Boston, Massachusetts, 1984.
[8] Y. Manolopoulos, A. Nanopoulos, A. N. Papadopoulos, and Y. Theodoridis, R-trees: Theory and Applications. Springer Science & Business Media, 2010.
[9] Y. Xia and S. Prabhakar, "Q+Rtree: Efficient indexing for moving object databases," in Proceedings of the Eighth International Conference on Database Systems for Advanced Applications (DASFAA 2003), 2003, pp. 175-182.
[10] C. S. Jensen, D. Lin, and B. C. Ooi, "Query and update efficient B+-tree based indexing of moving objects," in Proceedings of the Thirtieth International Conference on Very Large Data Bases - Volume 30, 2004, pp. 768-779.
[11] Y. Gui-jun and Z. Ji-xian, "A dynamic index structure for spatial database querying based on R-trees," in Proceedings of the International Symposium on Spatio-temporal Modeling, Spatial Reasoning, Analysis, Data Mining and Data Fusion, 2005, pp. 27-29.
[12] D. Šidlauskas, S. Šaltenis, C. W. Christiansen, J. M. Johansen, and D. Šaulys, "Trees or grids?: indexing moving objects in main memory," in Proceedings of the 17th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, 2009, pp. 236-245.
[13] X. Xu, L. Xiong, and V. Sunderam, "D-Grid: An in-memory dual space grid index for moving object databases," in Proceedings of the 17th IEEE International Conference on Mobile Data Management (MDM), 2016, pp. 252-261.
[14] W. Zhang, J. Li, and H. Pan, "Processing continuous k-nearest neighbor queries in location-dependent application," Int'l Journal of Computer Science and Network Security, vol. 6, pp. 1-9, 2006.
[15] D. Šidlauskas, K. Ross, C. Jensen, and S. Šaltenis, "Thread-level parallel indexing of update intensive moving-object workloads," in Advances in Spatial and Temporal Databases, 2011, pp. 186-204.
[16] D. Šidlauskas, S. Šaltenis, and C. S. Jensen, "Parallel main-memory indexing for moving-object query and update workloads," in Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data, 2012, pp. 37-48.
[17] T. Brinkhoff, "A framework for generating network-based moving objects," GeoInformatica, vol. 6, pp. 153-180, 2002.
AUTHORS
Esraa Rslan was born in 1991. She received the B.S. degree from the Faculty of Computers and Information, Fayoum University, Egypt, in 2012, and is an M.S. candidate. Her research interests include database management, spatial databases, and database query processing. Her e-mail is eaa06@fayoum.edu.eg.

Hala Abdel Hameed Mostafa was born in 1977. She received the B.S. degree from the Faculty of Computers and Information, Helwan University, Egypt, in 2000, a Master of Computer Science from Helwan University in 2006, and a Ph.D. in Computer and Information Systems from Mansoura University in 2012. She is now a Lecturer at the Faculty of Computers and Information Systems, Fayoum University. You can reach her at ham07@fayoum.edu.eg.

Ehab Ezzat Hassanein holds a Master's degree in Computer Science from Northwestern University (1989) and a Ph.D. from Northwestern University (December 1992). He is now Chairman of the Information Systems Department, Faculty of Computers and Information, Cairo University, Egypt. He may be reached at e.ezat@fci-cu.edu.eg.