This document summarizes a research paper on efficient techniques for processing top-k spatial preference queries in a database. Such queries let users rank spatial objects based on the aggregated qualities of nearby features. The paper proposes two algorithms, branch and bound and feature join, that process these queries efficiently by pruning the search space. It also studies extensions of the algorithms for different aggregate functions and for queries that use an influence score to weight nearby features.
Research Inventy: International Journal of Engineering and Science (inventy)
Research Inventy: International Journal of Engineering and Science is published by a group of young academic and industrial researchers, with 12 issues per year. It is an open-access journal, available both online and in print, that provides rapid monthly publication of articles in all areas of the subject, such as civil, mechanical, chemical, electronic, and computer engineering, as well as production and information technology. The journal welcomes the submission of manuscripts that meet the general criteria of significance and scientific excellence. Papers are published within 20 days of acceptance, and the peer-review process takes only 7 days. All articles published in Research Inventy are peer-reviewed.
Ranking Preferences to Data by Using R-Trees (IOSR Journals)
This document discusses algorithms for efficiently processing top-k spatial preference queries in databases containing spatial and non-spatial data. It defines top-k spatial preference queries as ranking objects based on quality and features in their nearest neighborhoods. It presents the branch and bound and feature join algorithms for computing the top-k results without having to calculate scores for all objects. It also discusses using R-trees to index spatial data and feature data to accelerate query processing.
Ranking spatial data by quality preferences ppt (Saurav Kumar)
A spatial preference query ranks objects based on the qualities of features in their spatial neighborhood. For example, using a real estate agency database of flats for lease, a customer may want to rank the flats with respect to the appropriateness of their location, defined by aggregating the qualities of other features (e.g., restaurants, cafes, hospitals, markets) within their spatial neighborhood. Such a neighborhood concept can be specified by the user via different functions: it can be an explicit circular region within a given distance from the flat, or, more intuitively, higher weights can be assigned to features based on their proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to different parameters.
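The circular-region ("range score") variant described above can be sketched as a brute-force baseline: for each object, take the best quality of each feature type within the radius and sum across types. All names and the max-then-sum aggregation are illustrative, not the paper's exact implementation.

```python
import math

def range_score(obj, feature_sets, radius):
    # Aggregate (max) quality of each feature type within `radius` of the
    # object, then sum across feature types -- the "range score".
    total = 0.0
    for features in feature_sets:
        best = 0.0
        for (fx, fy, quality) in features:
            if math.dist(obj, (fx, fy)) <= radius:
                best = max(best, quality)
        total += best
    return total

def top_k(objects, feature_sets, radius, k):
    # Brute-force baseline: score every object, keep the k best.
    # The paper's algorithms exist precisely to avoid scoring everything.
    scored = [(range_score(o, feature_sets, radius), o) for o in objects]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]
```

For instance, a flat near a good restaurant and a decent cafe outranks a flat with only a cafe nearby.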
The document summarizes research on performing spatio-textual similarity joins. It discusses:
1) Developing a filter-and-refine framework to efficiently find similar object pairs from two datasets using signatures.
2) Generating spatial and textual signatures for objects and building inverted indexes on the signatures to find candidate pairs.
3) Refining the candidate pairs to obtain the final result pairs that satisfy spatial and textual similarity thresholds.
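The filter-and-refine steps above can be sketched end to end for point objects with token sets. The "signature" here is simplified to "shares at least one token", which is much weaker than real prefix-filter signatures with inverted indexes; all names and thresholds are illustrative.

```python
import math

def jaccard(a, b):
    # Exact textual similarity used in the refine step.
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0

def st_join(R, S, dist_thr, sim_thr):
    # Each object: (x, y, tokens). Filter: prune pairs sharing no token
    # (a cheap textual signature check). Refine: verify the exact
    # Euclidean distance and Jaccard similarity thresholds.
    result = []
    for r in R:
        for s in S:
            if not set(r[2]) & set(s[2]):          # filter
                continue
            if math.dist(r[:2], s[:2]) <= dist_thr and \
                    jaccard(r[2], s[2]) >= sim_thr:
                result.append((r, s))              # refine passed
    return result
```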
TYBSC IT PGIS Unit II Chapter I Data Management and Processing Systems (Arti Parab Academics)
This document discusses geographic information systems (GIS). It defines GIS as hardware and software used to process, store, and transfer geographic data. It describes how GIS has evolved from using analog data and manual processing to increased use of digital data, computers, and software. It also discusses key GIS concepts like spatial data capture and analysis, data storage and management, and data presentation.
Spatial databases are used to store geographic information. Queries on such databases include range queries, nearest neighbor queries, and spatial joins. Many indexing techniques are used for faster data retrieval, of which R-trees are among the most efficient; others include quad-trees and grid files. Spatial data is used in GIS applications.
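The three query types just listed can be written brute force in a few lines; an R-tree would accelerate each one, but the semantics are the same. Names are illustrative.

```python
import math

def range_query(points, center, radius):
    # Return all points within `radius` of `center`.
    return [p for p in points if math.dist(p, center) <= radius]

def nearest_neighbor(points, q):
    # Return the point closest to query location q.
    return min(points, key=lambda p: math.dist(p, q))

def spatial_join(A, B, eps):
    # Pair up points from A and B that lie within `eps` of each other.
    return [(a, b) for a in A for b in B if math.dist(a, b) <= eps]
```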
- The document describes a study that used various machine learning algorithms to classify images of letters based on 16 extracted features.
- The best performing algorithm was the Gaussian Mixture Model, which achieved 96.43% accuracy by modeling each letter as a Gaussian distribution and accounting for correlations between features.
- The k-Nearest Neighbors algorithm achieved 95.65% accuracy when using only the single nearest neighbor for classification, while the Naive Bayes classifier achieved only 62.45% due to its strong independence assumptions.
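For reference, the nearest-neighbor rule mentioned above (and its k-neighbor generalization) fits in a few lines. This is a generic sketch, not the study's code, and the training points are illustrative.

```python
import math
from collections import Counter

def knn_predict(train, x, k=1):
    # train: list of (feature_vector, label). Classify x by majority label
    # among the k nearest training points (Euclidean distance).
    neighbors = sorted(train, key=lambda t: math.dist(t[0], x))[:k]
    return Counter(label for _, label in neighbors).most_common(1)[0][0]
```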
This document contains summaries of several academic papers. The papers discuss topics related to computer vision, image processing, and machine learning including monocular depth estimation, subspace clustering, person re-identification, image set coding, 3D shape matching, single-image super-resolution, video synopsis, frame rate upconversion, in-loop filtering in video coding, microscopy image denoising, visual tracking, stereo matching, and brain image segmentation. Contact information is provided for TSYS Academic Projects in Adyar, India.
TYBSC IT PGIS Unit I Chapter I- Introduction to Geographic Information Systems (Arti Parab Academics)
A Gentle Introduction to GIS. The nature of GIS: some fundamental observations; defining GIS; GISystems, GIScience and GIApplications; spatial data and geoinformation. The real world and representations of it: models and modelling; maps; databases; spatial databases and spatial analysis.
Spatial data mining involves discovering patterns from large spatial datasets. It differs from traditional data mining due to properties of spatial data like spatial autocorrelation and heterogeneity. Key spatial data mining tasks include clustering, classification, trend analysis and association rule mining. Clustering algorithms like PAM and CLARA are useful for grouping spatial data objects. Trend analysis can identify global or local trends by analyzing attributes of spatially related objects. Future areas of research include spatial data mining in object oriented databases and using parallel processing to improve computational efficiency for large spatial datasets.
This document discusses integrated analysis of spatial and non-spatial data in geographic information systems (GIS). It defines GIS and describes how spatial data includes location, shape and size attributes and can be represented as rasters or vectors, while non-spatial data provides additional information linked to spatial features. The document outlines common spatial analysis functions like overlay and neighborhood analysis that integrate both data types to answer questions about real-world features and relationships.
Spatial databases allow for storage and querying of spatial data types like points, lines, and regions. A spatial database management system (SDBMS) extends a regular DBMS with spatial data types, models, and query languages. It supports spatial indexing and algorithms for spatial queries. Common spatial queries include selection queries to find objects within a region, nearest neighbor queries to find closest objects, and spatial joins to find object relationships. Spatial indexing uses access methods like R-trees that represent regions as minimum bounding rectangles to enable efficient filtering and refinement of query results.
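The minimum-bounding-rectangle filter step described above can be sketched as follows. This is only an illustration of MBR computation and overlap testing, not an R-tree implementation; in a real index the MBRs live in internal nodes and refinement tests the exact geometry.

```python
def mbr(points):
    # Minimum bounding rectangle of a point set: (xmin, ymin, xmax, ymax).
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_intersects(a, b):
    # Cheap filter step: do two rectangles overlap?  Only pairs that pass
    # this test move on to the (expensive) exact-geometry refinement.
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]
```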
Data Entry and Preparation. Spatial data input: direct spatial data capture, indirect spatial data capture, obtaining spatial data elsewhere. Data quality: accuracy and positioning, positional accuracy, attribute accuracy, temporal accuracy, lineage, completeness, logical consistency. Data preparation: data checks and repairs, combining data from multiple sources. Point data transformation: interpolating discrete data, interpolating continuous data.
This PPT contains the basics of Geocoding. If you are a beginner then it will definitely help you to understand what Geocoding is all about and why we do Geocoding.
Spatial databases are designed to store and analyze spatial data more efficiently than traditional databases. Spatial data represents objects in geometric space and includes points, lines, and polygons. Spatial databases use spatial indexes and spatial query languages to optimize storage and retrieval of spatial data types and allow spatial queries and analysis. Common spatial database operations include measurements, functions, and predicates on geometric objects.
This document describes an efficient method for segmenting organized point cloud data using connected components analysis. The method works by assigning integer labels to points in the cloud that are similar according to a comparison function. It can be used for tasks like planar segmentation and tabletop object detection. Planar segmentation works by first computing surface normals and plane equations for each point, then comparing points and merging those that are part of the same plane segment. The method enables real-time segmentation of RGB-D point clouds.
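A minimal sketch of connected-component labeling over an organized (grid-shaped) input, assuming a user-supplied comparison function as the summary describes. The surface-normal and plane-equation computation is omitted; the grid values and similarity predicate below are illustrative.

```python
def segment(grid, similar):
    # Connected-component labeling over an "organized" 2-D grid of values.
    # 4-adjacent cells for which `similar(a, b)` is True share a label.
    h, w = len(grid), len(grid[0])
    labels = [[-1] * w for _ in range(h)]
    next_label = 0
    for i in range(h):
        for j in range(w):
            if labels[i][j] != -1:
                continue
            # Flood-fill a new component starting at (i, j).
            stack = [(i, j)]
            labels[i][j] = next_label
            while stack:
                y, x = stack.pop()
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < h and 0 <= nx < w and labels[ny][nx] == -1 \
                            and similar(grid[y][x], grid[ny][nx]):
                        labels[ny][nx] = next_label
                        stack.append((ny, nx))
            next_label += 1
    return labels
```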
2014_IJCCC_Handwritten Documents Text Line Segmentation based on Information ... (Mihai Tanase)
This document summarizes a research paper on segmenting text lines in handwritten documents. It proposes using an "information energy" approach, where each pixel is assigned an energy value based on its information content. Pixels with high variation, like those in text, have high energy, while uniform pixels like spaces between lines have low energy. An algorithm computes the energy map while accounting for varying text direction. It then identifies low-energy pixels separating lines. The method is tested on 500 documents, showing good results. Future work could evaluate the algorithm on larger datasets and optimize its computational complexity.
USING THE G.I.S. INSTRUMENTS TO OPTIMIZE THE DECISIONAL PROCESS AT THE LOCAL ... (mihai_herbei)
A G.I.S. is an information management tool that helps us to store, organize and utilize spatial information in a form that will enable everyday tasks to be completed more efficiently.
1. Georeferencing is the process of aligning geospatial data like images and maps to a known coordinate system so they can be accurately overlayed and compared. It allows data collected over time to be analyzed together and related to other geographic datasets.
2. There are several methods for georeferencing, but it generally involves identifying control points on the image/map to align to known coordinates. A polynomial transformation is then applied to warp the image and assign accurate coordinates to each pixel.
3. Reverse geocoding is the process of taking geographic coordinates like from GPS and determining the closest matching address or place name. It allows location data to be understood in human readable terms. However, geocoding
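The first-order (affine) case of the polynomial transformation from step 2 can be sketched in the six-parameter "world file" form that maps pixel coordinates to map coordinates. The coefficients below (10 m pixels, an arbitrary UTM-like origin) are illustrative assumptions.

```python
def make_georef(a, b, c, d, e, f):
    # World-file style affine transform mapping pixel (col, row) to map
    # coordinates (x, y).  a, e: pixel size (e is negative for north-up
    # images); b, d: rotation/skew terms; c, f: map coordinates of the
    # top-left pixel.
    def to_map(col, row):
        x = a * col + b * row + c
        y = d * col + e * row + f
        return (x, y)
    return to_map
```

In practice the six coefficients are estimated by least squares from the control points identified on the image.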
This document discusses various GIS analysis techniques including buffering, overlay, and distance measurement. Buffering creates zones around features, which can be used for planning purposes like restricting certain land uses near schools. Overlay combines data layers by joining features that occupy the same space. Feature overlay splits features where they overlap, while raster overlay combines characteristics across layers. Overlay is useful for queries and modeling. Distance measurement finds straight-line distances between features for analysis and pattern detection.
A Study on Optimization of Top-k Queries in Relational Databases (IOSR Journals)
This document discusses optimization of top-k queries in relational databases. It begins with an introduction to top-k queries and their increasing importance. It then discusses extending relational algebra and query optimization to integrate rank-join operators that can efficiently evaluate top-k queries. Specifically, it outlines enlarging the search space of plans to include those using rank-join operators, and providing a cost model for these operators to help the optimizer select the most efficient plan. The goal is to fully support efficient evaluation of top-k queries within relational database systems.
A Landmark Based Shortest Path Detection by Using A* and Haversine Formula (rahulmonikasharma)
In 1900, less than 20 percent of the world population lived in cities; by 2007, just over 50 percent did. By 2050, more than 70 percent of the global population (about 6.4 billion people) is projected to live in cities. This growth places increasing pressure on cities [1]. With the advent of smart cities, information and communication technology is increasingly transforming how city regions and residents organize and operate in response to urban development. In this paper, we develop a generic scheme for navigating a requested route through a city by combining the A* algorithm with the Haversine formula. The Haversine formula gives the minimum distance between any two points on a spherical body using their latitude and longitude. This minimum distance is then supplied to the A* algorithm as a heuristic to compute the shortest path. The method for identifying the shortest path is specified in this paper.
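The Haversine step is straightforward to state in code. A sketch, with the usual mean Earth radius of 6371 km as an assumption:

```python
import math

def haversine_km(lat1, lon1, lat2, lon2, r=6371.0):
    # Great-circle distance between two (lat, lon) points given in degrees,
    # on a sphere of radius r kilometres.
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 \
        + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))
```

In an A* search over a road network, this distance to the goal serves as an admissible heuristic, since no road path can be shorter than the great circle.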
The document discusses improving the accuracy of digital terrain models (DTMs). It compares different algorithms for generating DTMs from point data, including inverse distance weighting, spline interpolation, Voronoi diagrams, Delaunay triangulation, and kriging. The author tests these algorithms on real elevation data from Oradea, Romania. Results show Delaunay triangulation produces the most accurate surface, followed by kriging and Shepard interpolation. Higher point density leads to greater DTM accuracy than the interpolation method used. The quality of DTMs can be improved by using Delaunay triangulation with a dense point distribution.
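Of the interpolators compared above, inverse distance weighting is the simplest to sketch. The power parameter and sample data are illustrative.

```python
import math

def idw(samples, q, power=2.0):
    # Inverse distance weighting: estimate the value at q as a weighted
    # average of sample values, with weights 1 / distance**power.
    num = den = 0.0
    for (x, y, z) in samples:
        d = math.dist((x, y), q)
        if d == 0.0:
            return z            # q coincides with a sample point
        w = 1.0 / d ** power
        num += w * z
        den += w
    return num / den
```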
International Journal of Engineering Research and Development (IJERD Editor)
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
This document summarizes two algorithms, MFA and ATRA, for processing top-k spatial preference queries. MFA is a threshold-based algorithm that partitions queries into three features (spatial, preference, and text) and retrieves the objects with the highest aggregate scores. ATRA uses a hybrid indexing structure called the AIR-tree to retrieve only relevant objects more efficiently, without revisiting the same data. The paper then proposes using an R-tree index structure combined with an enhanced branch-and-bound search algorithm to answer preference-based top-k spatial keyword queries by ranking objects based on feature quality in their neighborhoods.
This document proposes and defines the top-k spatial preference query, which ranks spatial objects based on the qualities of features in their neighborhood. It reviews existing systems that do not support this type of query. The proposed system uses a scoring function to calculate a score for each object based on the features within its neighborhood, defined by either a range score or an influence score. It describes algorithms that perform a multiway join on feature trees to obtain qualified feature combinations and search the object tree to retrieve the relevant objects.
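The influence score can be sketched as follows, assuming the common formulation in which a feature's contribution is its quality weighted by 2^(-dist/r), so that a feature at distance r counts half as much. This is a hedged reconstruction for illustration, not necessarily the paper's exact function.

```python
import math

def influence_score(obj, features, r):
    # Every feature contributes, with weight decaying by half for every
    # r units of distance: quality * 2^(-dist / r).  Unlike the range
    # score, no hard cutoff excludes far-away features.
    return sum(q * 2.0 ** (-math.dist(obj, (fx, fy)) / r)
               for (fx, fy, q) in features)
```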
Scalable Keyword Cover Search using Keyword NNE and Inverted Indexing (IRJET Journal)
This document discusses scalable keyword cover search using keyword nearest neighbor expansion (keyword-NNE) and inverted indexing. It proposes a more efficient algorithm called keyword-NNE to address the performance issues of existing baseline algorithms for closest keyword search as the number of query keywords increases. Keyword-NNE significantly reduces the number of candidate keyword covers generated compared to the baseline. The algorithm and inverted indexing techniques are analyzed and shown to outperform alternatives through experiments on real datasets.
ADVANCE DATABASE MANAGEMENT SYSTEM CONCEPTS & ARCHITECTURE (Vikas Jagtap)
The data that indicates the earth location (latitude and longitude, or height and depth) of rendered objects is known as spatial data.
When a map is rendered, objects of this spatial data are used to project the locations of the objects onto a two-dimensional piece of paper.
Spatial data management systems are designed to make the storage, retrieval, and manipulation of spatial data (i.e., points, lines, and polygons) easier and more natural for users, as in GIS.
While typical databases can understand various numeric and character data types, additional functionality must be added for databases to process spatial data types.
These spatial types are typically called geometry or feature types.
Performance Evaluation of Trajectory Queries on Multiprocessor and Cluster (csandit)
In this study, we evaluate the performance of trajectory queries that are handled by Cassandra, MongoDB, and PostgreSQL. The evaluation is conducted on a multiprocessor and a cluster. Telecommunication companies collect a lot of data from their mobile users. These data must be analysed in order to support business decisions, such as infrastructure planning. The optimal choice of hardware platform and database can differ from one query to another. We use data collected from Telenor Sverige, a telecommunication company that operates in Sweden. These data were collected every five minutes for an entire week in a medium-sized city. The execution time results show that Cassandra performs much better than MongoDB and PostgreSQL for queries that do not have spatial features. Stratio's Cassandra Lucene Index incorporates a geospatial index into Cassandra, making Cassandra perform similarly to MongoDB on spatial queries. Across four use cases, namely distance query, k-nearest neighbour query, range query, and region query, Cassandra performs much better than MongoDB and PostgreSQL for two of them, namely range query and region query. The scalability is also good for these two use cases.
Skyline Query Processing using Filtering in Distributed EnvironmentIJMER
This document summarizes a research paper about skyline query processing in distributed databases. Skyline queries return multidimensional data points that are not dominated by other points. In distributed databases, skyline queries must be processed across multiple data sites. The paper proposes using multiple filtering points selected from each local skyline result to reduce the number of false positive results and communication costs between sites. Two heuristics called MaxSum and MaxDist are described for selecting filtering points that maximize their combined dominating potential across sites to improve distributed skyline query processing performance.
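The dominance relation at the heart of skyline queries, together with a simplified single-point variant of the filtering-point idea, can be sketched as follows. This assumes minimisation over points in the unit square; the paper's MaxSum and MaxDist heuristics select multiple filtering points, which this sketch does not reproduce:

```python
def dominates(p, q):
    """p dominates q (minimisation): p is no worse in every dimension
    and strictly better in at least one."""
    return all(a <= b for a, b in zip(p, q)) and any(a < b for a, b in zip(p, q))

def skyline(points):
    """Points not dominated by any other point."""
    return [p for p in points if not any(dominates(q, p) for q in points if q != p)]

def filtering_point(local_skyline):
    """Pick the skyline point with the largest dominating volume in [0,1]^d --
    a simplified, single-point stand-in for the MaxSum/MaxDist selection."""
    def volume(p):
        v = 1.0
        for a in p:
            v *= (1.0 - a)
        return v
    return max(local_skyline, key=volume)
```

In the distributed setting, each site would compute its local skyline, ship only a few such filtering points to the other sites, and use them to discard dominated candidates before any full result exchange.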
Convincing a customer is always considered a challenging task in every business, and in online business this task becomes even more difficult. Online retailers try everything possible to gain the trust of the customer. One solution is to provide an area where existing users can leave their comments. This service can effectively develop customer trust; however, customers normally comment on the product in their native language using Roman script. When there are hundreds of comments, this makes it difficult even for native customers to make a buying decision. This research proposes a system which extracts comments posted in Roman Urdu, translates them, determines their polarity, and then produces a rating for the product. This rating helps both native and non-native customers make buying decisions efficiently from the comments posted in Roman Urdu.
Spatial databases have become more and more popular in recent years, and there is growing commercial and research interest in location-based search over spatial databases. Spatial keyword search has been well studied for years due to its importance to commercial search engines. Specifically, a spatial keyword query takes a user location and user-supplied keywords as arguments and returns objects that are spatially and textually relevant to those arguments. Geo-textual indices play an important role in spatial keyword querying. A number of geo-textual indices have been proposed in recent years, which mainly combine the R-tree and its variants with the inverted file. This paper proposes a new index structure that combines the K-d tree and the inverted file for spatial range keyword queries, based on the greatest spatial and textual relevance to the query point within a given range.
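A toy version of a combined textual and spatial index can illustrate the idea: an inverted file narrows candidates by keyword, after which a spatial predicate filters by range. The class below is a hypothetical sketch that substitutes a linear distance check for the paper's K-d tree:

```python
import math
from collections import defaultdict

class SpatialKeywordIndex:
    """Toy spatial-keyword index: an inverted file over keywords plus a
    plain coordinate table (the paper pairs the inverted file with a K-d
    tree so the spatial filter is sublinear)."""

    def __init__(self):
        self.inverted = defaultdict(set)  # keyword -> object ids
        self.coords = {}                  # object id -> (x, y)

    def add(self, oid, x, y, keywords):
        self.coords[oid] = (x, y)
        for kw in keywords:
            self.inverted[kw].add(oid)

    def range_keyword_query(self, qx, qy, radius, keywords):
        """Objects within `radius` of (qx, qy) containing all query keywords."""
        candidates = set.intersection(*(self.inverted[kw] for kw in keywords))
        return sorted(
            oid for oid in candidates
            if math.hypot(self.coords[oid][0] - qx,
                          self.coords[oid][1] - qy) <= radius
        )
```

The design point the paper makes is the order of filtering: the inverted file prunes textually irrelevant objects cheaply, so the (more expensive) spatial structure only needs to examine objects that already match the keywords.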
PERFORMANCE EVALUATION OF SQL AND NOSQL DATABASE MANAGEMENT SYSTEMS IN A CLUSTERijdms
In this study, we evaluate the performance of SQL and NoSQL database management systems, namely Cassandra, CouchDB, MongoDB, PostgreSQL, and RethinkDB. We use a cluster of four nodes to run the database systems, with external load generators. The evaluation is conducted using data from Telenor Sverige, a telecommunication company that operates in Sweden. The experiments are conducted using three datasets of different sizes. The write throughput and latency as well as the read throughput and latency are evaluated for four queries, namely distance query, k-nearest neighbour query, range query, and region query. For write operations, Cassandra has the highest throughput when multiple nodes are used, whereas PostgreSQL has the lowest latency and the highest throughput for a single node. For read operations, MongoDB has the lowest latency for all queries; however, Cassandra has the highest read throughput. The throughput decreases as the dataset size increases for both writes and reads, for both sequential and random order access, and this decrease is more significant for random reads and writes. We also present the experience we had with these database management systems, including setup and configuration complexity.
International Journal of Engineering Research and DevelopmentIJERD Editor
Electrical, Electronics and Computer Engineering,
Information Engineering and Technology,
Mechanical, Industrial and Manufacturing Engineering,
Automation and Mechatronics Engineering,
Material and Chemical Engineering,
Civil and Architecture Engineering,
Biotechnology and Bio Engineering,
Environmental Engineering,
Petroleum and Mining Engineering,
Marine and Agriculture engineering,
Aerospace Engineering.
The document summarizes a proposed system for currency recognition on mobile phones. The system has the following modules: 1) segmentation to isolate the currency from background noise, 2) feature extraction and building a visual vocabulary, 3) instance retrieval using inverted indexing and spatial reranking, 4) classification by vote counting spatially consistent features. The system was adapted for mobile by reducing complexity, such as using an inverted index, while maintaining accuracy. Performance is evaluated using metrics like accuracy and precision.
Web users and content are increasingly being geo-positioned, and increased focus is being given to serving local content in response to web queries. This development calls for spatial keyword queries that take into account both the locations and textual descriptions of content. We study the efficient, joint processing of multiple top-k spatial keyword queries. Such joint processing is attractive during high query loads and also occurs when multiple queries are used to obfuscate a user's true query. We propose a novel algorithm and index structure for the joint processing of top-k spatial keyword queries. Empirical studies show that the proposed solution is efficient on real datasets; analytical studies on synthetic datasets further demonstrate its efficiency.
Databases Basics and Spacial Matrix - Discussig Geographic Potentials of Data...Jerin John
A core introduction to data types, the databases around us and their use cases, and integrating geospatial arrangements into our projects. This deck includes speaker notes and is intended to give an overall idea of the landscape rather than focusing on one specific topic. It covers relational databases like PostgreSQL and MySQL, key-value databases like Redis, document databases like MongoDB, and search engines like Elasticsearch; the second part deals with geospatial arrangements in PostGIS.
A Geo-PFM Model for Point Of Interest RecommendationIRJET Journal
The document discusses recommending points of interest (POIs) such as restaurants and movie theaters by proposing a geographical probabilistic factor modeling (Geo-PFM) framework. Geo-PFM considers factors that impact a user's choice of POI, such as user preferences, geographical influence based on Tobler's first law of geography, and user mobility patterns. It provides flexible modeling of check-in data, which can be sparse. Experimental results on real LBSN datasets show that the proposed system outperforms baseline models in recommending POIs to users.
This document discusses techniques for efficiently processing range aggregate queries when the query location is uncertain. It proposes two novel filtering techniques, statistical filtering and anchor point filtering, to address the limitations of existing probabilistic constrained region techniques. The statistical filtering technique has decent filtering power with minimal storage requirements, while anchor point filtering provides more flexibility to enhance filtering power by storing more pre-computed information. Extensive experiments demonstrate the efficiency of the proposed techniques for computing range aggregates over uncertain location-based queries.
Extended hybrid region growing segmentation of point clouds with different re...csandit
In recent years, 3D city reconstruction has been one of the active research areas in photogrammetry. The goal of this work is to improve and extend region-growing-based segmentation in the X-Y-Z image, a form of 3D structured data, combined with the spectral information of RGB and grayscale images, to extract building roofs, streets, and vegetation. In order to process 3D point clouds, hybrid segmentation is carried out in both object space and image space. Our experiments on two case studies verify that updating plane parameters and robust least-squares plane fitting improve the results of building extraction, especially for low-accuracy point clouds. In addition, region growing in image space was motivated by the fact that the grayscale image is more flexible than the RGB image and results in more realistic building roofs.
This document describes a web application for analyzing building energy management data using predictive modeling and machine learning techniques. The application contains years of sensor data from a CUNY building and allows users to visualize the data, perform statistical analysis, and generate forecasts using Python modules. Key features include interactive data visualization, filtering and selecting subsets of data, defining expressions of sensor variables, and applying machine learning models for prediction. The application provides a customizable platform for exploring time series data while allowing different users to share their work.
Similar to Aggregation of data by using top k spatial query preferences (20)
Abnormalities of hormones and inflammatory cytokines in women affected with p...Alexander Decker
Women with polycystic ovary syndrome (PCOS) have elevated levels of hormones like luteinizing hormone and testosterone, as well as higher levels of insulin and insulin resistance compared to healthy women. They also have increased levels of inflammatory markers like C-reactive protein, interleukin-6, and leptin. This study found these abnormalities in the hormones and inflammatory cytokines of women with PCOS ages 23-40, indicating that hormone imbalances associated with insulin resistance and elevated inflammatory markers may worsen infertility in women with PCOS.
A usability evaluation framework for b2 c e commerce websitesAlexander Decker
This document presents a framework for evaluating the usability of B2C e-commerce websites. It involves user testing methods like usability testing and interviews to identify usability problems in areas like navigation, design, purchasing processes, and customer service. The framework specifies goals for the evaluation, determines which website aspects to evaluate, and identifies target users. It then describes collecting data through user testing and analyzing the results to identify usability problems and suggest improvements.
A universal model for managing the marketing executives in nigerian banksAlexander Decker
This document discusses a study that aimed to synthesize motivation theories into a universal model for managing marketing executives in Nigerian banks. The study was guided by Maslow and McGregor's theories. A sample of 303 marketing executives was used. The results showed that managers will be most effective at motivating marketing executives if they consider individual needs and create challenging but attainable goals. The emerged model suggests managers should provide job satisfaction by tailoring assignments to abilities and monitoring performance with feedback. This addresses confusion faced by Nigerian bank managers in determining effective motivation strategies.
A unique common fixed point theorems in generalized dAlexander Decker
This document presents definitions and properties related to generalized D*-metric spaces and establishes some common fixed point theorems for contractive type mappings in these spaces. It begins by introducing D*-metric spaces and generalized D*-metric spaces, defines concepts like convergence and Cauchy sequences. It presents lemmas showing the uniqueness of limits in these spaces and the equivalence of different definitions of convergence. The goal of the paper is then stated as obtaining a unique common fixed point theorem for generalized D*-metric spaces.
A trends of salmonella and antibiotic resistanceAlexander Decker
This document provides a review of trends in Salmonella and antibiotic resistance. It begins with an introduction to Salmonella as a facultative anaerobe that causes nontyphoidal salmonellosis. The emergence of antimicrobial-resistant Salmonella is then discussed. The document proceeds to cover the historical perspective and classification of Salmonella, definitions of antimicrobials and antibiotic resistance, and mechanisms of antibiotic resistance in Salmonella including modification or destruction of antimicrobial agents, efflux pumps, modification of antibiotic targets, and decreased membrane permeability. Specific resistance mechanisms are discussed for several classes of antimicrobials.
A transformational generative approach towards understanding al-istifhamAlexander Decker
This document discusses a transformational-generative approach to understanding Al-Istifham, which refers to interrogative sentences in Arabic. It begins with an introduction to the origin and development of Arabic grammar. The paper then explains the theoretical framework of transformational-generative grammar that is used. Basic linguistic concepts and terms related to Arabic grammar are defined. The document analyzes how interrogative sentences in Arabic can be derived and transformed via tools from transformational-generative grammar, categorizing Al-Istifham into linguistic and literary questions.
A time series analysis of the determinants of savings in namibiaAlexander Decker
This document summarizes a study on the determinants of savings in Namibia from 1991 to 2012. It reviews previous literature on savings determinants in developing countries. The study uses time series analysis including unit root tests, cointegration, and error correction models to analyze the relationship between savings and variables like income, inflation, population growth, deposit rates, and financial deepening in Namibia. The results found inflation and income have a positive impact on savings, while population growth negatively impacts savings. Deposit rates and financial deepening were found to have no significant impact. The study reinforces previous work and emphasizes the importance of improving income levels to achieve higher savings rates in Namibia.
A therapy for physical and mental fitness of school childrenAlexander Decker
This document summarizes a study on the importance of exercise in maintaining physical and mental fitness for school children. It discusses how physical and mental fitness are developed through participation in regular physical exercises and cannot be achieved solely through classroom learning. The document outlines different types and components of fitness and argues that developing fitness should be a key objective of education systems. It recommends that schools ensure pupils engage in graded physical activities and exercises to support their overall development.
A theory of efficiency for managing the marketing executives in nigerian banksAlexander Decker
This document summarizes a study examining efficiency in managing marketing executives in Nigerian banks. The study was examined through the lenses of Kaizen theory (continuous improvement) and efficiency theory. A survey of 303 marketing executives from Nigerian banks found that management plays a key role in identifying and implementing efficiency improvements. The document recommends adopting a "3H grand strategy" to improve the heads, hearts, and hands of management and marketing executives by enhancing their knowledge, attitudes, and tools.
This document discusses evaluating the link budget for effective 900MHz GSM communication. It describes the basic parameters needed for a high-level link budget calculation, including transmitter power, antenna gains, path loss, and propagation models. Common propagation models for 900MHz that are described include Okumura model for urban areas and Hata model for urban, suburban, and open areas. Rain attenuation is also incorporated using the updated ITU model to improve communication during rainfall.
A synthetic review of contraceptive supplies in punjabAlexander Decker
This document discusses contraceptive use in Punjab, Pakistan. It begins by providing background on the benefits of family planning and contraceptive use for maternal and child health. It then analyzes contraceptive commodity data from Punjab, finding that use is still low despite efforts to improve access. The document concludes by emphasizing the need for strategies to bridge gaps and meet the unmet need for effective and affordable contraceptive methods and supplies in Punjab in order to improve health outcomes.
A synthesis of taylor’s and fayol’s management approaches for managing market...Alexander Decker
1) The document discusses synthesizing Taylor's scientific management approach and Fayol's process management approach to identify an effective way to manage marketing executives in Nigerian banks.
2) It reviews Taylor's emphasis on efficiency and breaking tasks into small parts, and Fayol's focus on developing general management principles.
3) The study administered a survey to 303 marketing executives in Nigerian banks to test if combining elements of Taylor and Fayol's approaches would help manage their performance through clear roles, accountability, and motivation. Statistical analysis supported combining the two approaches.
A survey paper on sequence pattern mining with incrementalAlexander Decker
This document summarizes four algorithms for sequential pattern mining: GSP, ISM, FreeSpan, and PrefixSpan. GSP is an Apriori-based algorithm that incorporates time constraints. ISM extends SPADE to incrementally update patterns after database changes. FreeSpan uses frequent items to recursively project databases and grow subsequences. PrefixSpan also uses projection but claims to not require candidate generation. It recursively projects databases based on short prefix patterns. The document concludes by stating the goal was to find an efficient scheme for extracting sequential patterns from transactional datasets.
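The prefix-projection idea behind PrefixSpan can be sketched for the simple case of sequences of single items. This is an illustrative miniature, not the published algorithm, which also handles itemsets and various projection optimisations:

```python
from collections import Counter

def prefixspan(sequences, min_support):
    """Minimal PrefixSpan-style miner for sequences of single items:
    grow frequent prefixes by recursively projecting the database onto
    the suffixes that follow each frequent item."""
    results = []

    def mine(prefix, projected):
        counts = Counter()
        for seq in projected:
            counts.update(set(seq))  # count each item once per sequence
        for item, support in sorted(counts.items()):
            if support >= min_support:
                new_prefix = prefix + [item]
                results.append((tuple(new_prefix), support))
                # project: keep only the suffix after the first occurrence
                new_db = [seq[seq.index(item) + 1:]
                          for seq in projected if item in seq]
                mine(new_prefix, new_db)

    mine([], sequences)
    return results
```

Because each recursive call works on ever-shorter suffixes, no candidate sequences are generated and tested against the whole database, which is the contrast with Apriori-based miners like GSP that the summary draws.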
A survey on live virtual machine migrations and its techniquesAlexander Decker
This document summarizes several techniques for live virtual machine migration in cloud computing. It discusses works that have proposed affinity-aware migration models to improve resource utilization, energy efficient migration approaches using storage migration and live VM migration, and a dynamic consolidation technique using migration control to avoid unnecessary migrations. The document also summarizes works that have designed methods to minimize migration downtime and network traffic, proposed a resource reservation framework for efficient migration of multiple VMs, and addressed real-time issues in live migration. Finally, it provides a table summarizing the techniques, tools used, and potential future work or gaps identified for each discussed work.
A survey on data mining and analysis in hadoop and mongo dbAlexander Decker
This document discusses data mining of big data using Hadoop and MongoDB. It provides an overview of Hadoop and MongoDB and their uses in big data analysis. Specifically, it proposes using Hadoop for distributed processing and MongoDB for data storage and input. The document reviews several related works that discuss big data analysis using these tools, as well as their capabilities for scalable data storage and mining. It aims to improve computational time and fault tolerance for big data analysis by mining data stored in Hadoop using MongoDB and MapReduce.
1. The document discusses several challenges for integrating media with cloud computing including media content convergence, scalability and expandability, finding appropriate applications, and reliability.
2. Media content convergence challenges include dealing with the heterogeneity of media types, services, networks, devices, and quality of service requirements as well as integrating technologies used by media providers and consumers.
3. Scalability and expandability challenges involve adapting to the increasing volume of media content and being able to support new media formats and outlets over time.
This document surveys trust architectures that leverage provenance in wireless sensor networks. It begins with background on provenance, which refers to the documented history or derivation of data. Provenance can be used to assess trust by providing metadata about how data was processed. The document then discusses challenges for using provenance to establish trust in wireless sensor networks, which have constraints on energy and computation. Finally, it provides background on trust, which is the subjective probability that a node will behave dependably. Trust architectures need to be lightweight to account for the constraints of wireless sensor networks.
This document discusses private equity investments in Kenya. It provides background on private equity and discusses trends in various regions. The objectives of the study discussed are to establish the extent of private equity adoption in Kenya, identify common forms of private equity utilized, and determine typical exit strategies. Private equity can involve venture capital, leveraged buyouts, or mezzanine financing. Exits allow recycling of capital into new opportunities. The document provides context on private equity globally and in developing markets like Africa to frame the goals of the study.
This document discusses a study that analyzes the financial health of the Indian logistics industry from 2005-2012 using Altman's Z-score model. The study finds that the average Z-score for selected logistics firms was in the healthy to very healthy range during the study period. The average Z-score increased from 2006 to 2010 when the Indian economy was hit by the global recession, indicating the overall performance of the Indian logistics industry was good. The document reviews previous literature on measuring financial performance and distress using ratios and Z-scores, and outlines the objectives and methodology used in the current study.
HCL Notes and Domino License Cost Reduction in the World of DLAU — panagenda
Webinar Recording: https://www.panagenda.com/webinars/hcl-notes-und-domino-lizenzkostenreduzierung-in-der-welt-von-dlau/
DLAU and licensing under the CCB and CCX models have been a hot topic in the HCL community since last year. As a Notes or Domino customer, you may be struggling with unexpectedly high user counts and license fees. You may be wondering how this new kind of licensing works and what benefit it brings you. Above all, you surely want to stay within your budget and save costs wherever possible. We understand that, and we want to help!
We explain how to resolve common configuration problems that can cause more users to be counted than necessary, and how to identify and remove superfluous or unused accounts to save money. There are also some practices that can lead to unnecessary spending, for example using a person document instead of a mail-in database for shared mailboxes. We show you such cases and their solutions. And of course we explain the new license model.
Join this webinar, in which HCL Ambassador Marc Thomas and guest speaker Franz Walder introduce you to this new world. It gives you the tools and know-how to keep track of everything. You will be able to reduce your costs through an optimized Domino configuration and keep them low in the future.
Topics covered:
- Reducing license costs by finding and fixing misconfigurations and superfluous accounts
- How do CCB and CCX licenses really work?
- Understanding the DLAU tool and how best to use it
- Tips for common problem areas, such as team mailboxes, functional/test users, etc.
- Practical examples and best practices you can apply immediately
5th LF Energy Power Grid Model Meet-up SlidesDanBrown980551
5th Power Grid Model Meet-up
It is with great pleasure that we extend to you an invitation to the 5th Power Grid Model Meet-up, scheduled for 6th June 2024. This event will adopt a hybrid format, allowing participants to join either through an online Microsoft Teams session or in person at TU/e, located at Den Dolech 2, Eindhoven, Netherlands. The meet-up will be hosted by Eindhoven University of Technology (TU/e), a research university specializing in engineering science & technology.
Power Grid Model
The global energy transition is placing new and unprecedented demands on Distribution System Operators (DSOs). Alongside upgrades to grid capacity, processes such as digitization, capacity optimization, and congestion management are becoming vital for delivering reliable services.
Power Grid Model is an open source project from Linux Foundation Energy and provides a calculation engine that is increasingly essential for DSOs. It offers a standards-based foundation enabling real-time power systems analysis, simulations of electrical power grids, and sophisticated what-if analysis. In addition, it enables in-depth studies and analysis of the electrical power grid’s behavior and performance. This comprehensive model incorporates essential factors such as power generation capacity, electrical losses, voltage levels, power flows, and system stability.
Power Grid Model is currently being applied in a wide variety of use cases, including grid planning, expansion, reliability, and congestion studies. It can also help in analyzing the impact of renewable energy integration, assessing the effects of disturbances or faults, and developing strategies for grid control and optimization.
What to expect
For the upcoming meetup we are organizing, we have an exciting lineup of activities planned:
-Insightful presentations covering two practical applications of the Power Grid Model.
-An update on the latest advancements in Power Grid Model technology during the first and second quarters of 2024.
-An interactive brainstorming session to discuss and propose new feature requests.
-An opportunity to connect with fellow Power Grid Model enthusiasts and users.
Aggregation of data by using top k spatial query preferences
Computer Engineering and Intelligent Systems
ISSN 2222-1719 (Paper) ISSN 2222-2863 (Online)
Vol.4, No.12, 2013
www.iiste.org
Aggregation of Data by Using Top-K Spatial Query Preferences
Hanaa Mohsin Ali Al-Abboodi
College of Engineering, University of Babylon, Iraq
*Email: hanaamohsin77@gmail.com
Abstract:
A spatial database is a database that is optimized to store and query data that represents objects defined in a
geometric space. A spatial preference query ranks objects based on the qualities of features in their spatial
neighborhood. For example, using a real estate agency database of flats for lease, a customer may want to rank
the flats with respect to the appropriateness of their location, defined after aggregating the qualities of other
features (e.g., restaurants, cafes, hospital, market, etc.) within their spatial neighborhood. Such a neighborhood
concept can be specified by the user via different functions. It can be an explicit circular region within a given
distance from the flat. Another intuitive definition is to assign higher weights to the features based on their
proximity to the flat. In this paper, we formally define spatial preference queries and propose appropriate
indexing techniques and search algorithms for them. Extensive evaluation of our methods on both real and
synthetic data reveals that an optimized branch-and-bound solution is efficient and robust with respect to
different parameters.
Index Terms: Query processing, spatial databases.
1. INTRODUCTION
Spatial database systems manage large collections of geographic entities, which apart from spatial attributes
contain non-spatial information (e.g., name, size, type, price, etc.). In this paper, we study an interesting type of
preference queries, which select the best spatial location with respect to the quality of facilities in its spatial
neighborhood.
Given a set D of interesting objects (e.g., candidate locations), a top-k spatial preference query retrieves the k
objects in D with the highest scores. The score of an object is defined by the quality of features (e.g., facilities or
services) in its spatial neighborhood. As a motivating example, consider a real estate agency office that holds a
database with available flats for lease. Here “feature” refers to a class of objects in a spatial map such as specific
facilities or services. A customer may want to rank the contents of this database with respect to the quality of
their locations, quantified by aggregating non-spatial characteristics of other features (e.g., restaurants, cafes,
hospital, market, etc.) in the spatial neighborhood of the flat (defined by a spatial range around it). Quality may
be subjective and query-parametric. For example, a user may define quality with respect to non-spatial attributes
of restaurants around it (e.g., whether they serve seafood, price range, etc.).
As another example, the user (e.g., a tourist) wishes to find a hotel p that is close to a high-quality restaurant and
a high quality cafe.
Fig. 1. Examples of top-k spatial preference queries. (a) Range score, ε = 0.2 km. (b) Influence score, ε = 0.2 km.
Fig. 1a illustrates the locations of an object data set D (hotels) in white, and two feature data sets: the set F1
(restaurants) in gray, and the set F2 (cafes) in black. Feature points are labeled by quality values that can be
obtained from rating providers (e.g., http://www.zagat.com/). For the ease of discussion, the qualities are
normalized to values in [0, 1]. The score τ(p) of a hotel p is defined in terms of: 1) the maximum quality for each feature in the neighborhood region of p, and 2) the aggregation of those qualities.
A simple score instance, called the range score, binds the neighborhood region to a circular region at p with radius ε (shown as a circle), and the aggregate function to SUM. For instance, the maximum qualities of gray and black points within the circle of p1 are 0.9 and 0.6, respectively, so the score of p1 is τ(p1) = 0.9 + 0.6 = 1.5. Similarly, we obtain τ(p2) = 1.0 + 0.1 = 1.1 and τ(p3) = 0.7 + 0.7 = 1.4. Hence, the hotel p1 is returned as the top result.
In fact, the semantics of the aggregate function is relevant to the user's query. The SUM function attempts to balance the overall qualities of all features. For the MIN function, the top result becomes p3, with the score τ(p3) = min{0.7, 0.7} = 0.7; it ensures that the top result has reasonably high qualities in all features. For the MAX function, the top result is p2, with τ(p2) = max{1.0, 0.1} = 1.0; it is used to optimize the quality in a particular feature, but not necessarily all of them.
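The effect of the aggregate function can be checked with a small sketch. The per-feature best qualities below are taken from the Fig. 1a example; the `rank` helper is illustrative, not part of the paper:

```python
# Best quality per feature class (restaurants, cafes) around each hotel,
# taken from the Fig. 1a example in the text.
best = {"p1": [0.9, 0.6], "p2": [1.0, 0.1], "p3": [0.7, 0.7]}

def rank(agg):
    """Return the hotel whose aggregated component qualities are highest."""
    return max(best, key=lambda p: agg(best[p]))

print(rank(sum))  # SUM balances all features -> p1
print(rank(min))  # MIN favours uniformly good features -> p3
print(rank(max))  # MAX optimizes a single feature -> p2
```

Running it reproduces the rankings described above: p1 under SUM, p3 under MIN, and p2 under MAX.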
The neighborhood region in the above spatial preference query can also be defined by other score functions.
A meaningful score function is the influence score (see Section 4). As opposed to the crisp radius ε constraint in the range score, the influence score smoothens the effect of ε and assigns higher weights to cafes that are closer to the hotel. Fig. 1b shows a hotel p5 and three cafes s1, s2, s3 (with their quality values). The circles have their radii as multiples of ε. Now, the score of a cafe si is computed by multiplying its quality with the weight 2^(-j), where j is the order of the smallest circle containing si. For example, the scores of s1, s2, and s3 are 0.3/2^1 = 0.15, 0.9/2^2 = 0.225, and 1.0/2^3 = 0.125, respectively. The influence score of p5 is taken as the highest value (0.225).
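The influence-score computation can be sketched as follows. This is a minimal illustration; the coordinates are made up so that s1, s2, and s3 fall in the first, second, and third circle, as in Fig. 1b:

```python
import math

def influence_score(p, cafes, eps):
    """Weight each cafe's quality by 2^-j, where j is the order of the
    smallest circle of radius j*eps around p containing the cafe, and
    keep the best weighted value."""
    best = 0.0
    for (x, y, quality) in cafes:
        j = max(1, math.ceil(math.dist(p, (x, y)) / eps))
        best = max(best, quality / 2 ** j)
    return best

# s1 (quality 0.3) in circle 1, s2 (0.9) in circle 2, s3 (1.0) in circle 3
cafes = [(0.5, 0.0, 0.3), (1.5, 0.0, 0.9), (2.5, 0.0, 1.0)]
print(influence_score((0.0, 0.0), cafes, eps=1.0))  # 0.225, as in the text
```

The weighted values are 0.15, 0.225, and 0.125, and the maximum 0.225 matches the influence score of p5 given above.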
Traditionally, there are two basic ways for ranking objects: 1) spatial ranking, which orders the objects according
to their distance from a reference point, and 2) non-spatial ranking, which orders the objects by an aggregate
function on their non-spatial values. Our top-k spatial preference query integrates these two types of ranking in an intuitive way. As
indicated by our examples, this new query has a wide range of applications in service recommendation and
decision support systems.
To our knowledge, there is no existing efficient solution for processing the top-k spatial preference query. A
brute force approach (to be elaborated in Section 3.2) for evaluating it is to compute the scores of all objects in D
and select the top-k ones. This method, however, is expected to be very expensive for large input data sets. In
this paper, we propose alternative techniques that aim at minimizing the I/O accesses to the object and feature
data sets, while also being computationally efficient. Our techniques apply to spatial-partitioning access methods and compute upper score bounds for the objects indexed by them, which are used to effectively prune the search space. Specifically, we contribute the branch-and-bound (BB) algorithm and the feature join (FJ) algorithm for efficiently processing the top-k spatial preference query.
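The bound-based pruning principle behind BB can be illustrated with a generic best-first sketch. This is not the paper's BB algorithm itself: plain lists of objects stand in for R-tree entries, and `upper_bound` and `score` are illustrative callables. The key idea is that a group whose upper score bound cannot beat the current k-th best score is skipped without scoring any of its objects:

```python
import heapq

def topk_with_pruning(groups, upper_bound, score, k):
    """Keep a min-heap of the k best scores; visit groups in decreasing
    order of their upper score bound, and stop once no remaining group's
    bound can beat the current k-th best score."""
    best = []  # min-heap of (score, object)
    for group in sorted(groups, key=upper_bound, reverse=True):
        if len(best) == k and upper_bound(group) <= best[0][0]:
            break  # this group and all remaining groups are pruned
        for obj in group:
            s = score(obj)
            if len(best) < k:
                heapq.heappush(best, (s, obj))
            elif s > best[0][0]:
                heapq.heapreplace(best, (s, obj))
    return sorted(best, reverse=True)

groups = [[1, 2, 3], [10, 11], [5, 6]]
print(topk_with_pruning(groups, upper_bound=max, score=lambda x: x, k=2))
# [(11, 11), (10, 10)] -- the groups [5, 6] and [1, 2, 3] are never scored
```

The brute-force baseline mentioned above corresponds to dropping the bound check and scoring every object.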
Furthermore, this paper studies three relevant extensions that have not been investigated in our preliminary
work [1]. The first extension (Section 3.4) is an optimized version of BB that exploits a more efficient technique
for computing the scores of the objects. The second extension (Section 3.6) studies adaptations of the proposed
algorithms for aggregate functions other than SUM, e.g., the functions MIN and MAX. The third extension
(Section 4) develops solutions for the top-k spatial preference query based on the influence score.
The rest of this paper is structured as follows: Section 2 provides background on basic and advanced queries on spatial databases, as well as top-k query evaluation in relational databases. Section 3 experimentally evaluates our query algorithms with real and synthetic data. Finally, Section 4 concludes the paper with future research directions.
2. SPATIAL DATA MODEL AND QUERY LANGUAGE
A spatial data model [5], [3] is a type of data abstraction that hides the details of data storage. There are two common models of spatial information: field-based and object-based. The field-based model treats spatial information such as altitude, rainfall, and temperature as a collection of spatial functions transforming a space partition to an attribute domain. The object-based model treats the information space as if it is populated by discrete, identifiable, spatially referenced entities. The operations on spatial objects include distance and boundary. The operations on fields include local, focal, and zonal operations, as shown in Table 1.
Fig. 2. Three-layer architecture.
TABLE 1
A SAMPLE OF SPATIAL OPERATIONS
The fields may be continuous, differentiable, discrete, and isotropic or anisotropic, with positive or negative
autocorrelation. Certain field operations (slope or interpolation) assume certain field properties (differentiable or
positive autocorrelation).
An implementation of a spatial data model in the context of object-relational databases consists of a set of
spatial data types and the operations on those types. Much work has been done over the last decade on the design
of spatial Abstract Data Types (ADTs) and their embedding in a query language. Consensus is slowly emerging
via standardization efforts, and recently the OGIS consortium [2] has proposed a specification for incorporating
2D geospatial ADTs in SQL.
Fig. 4. Spatial data type hierarchy
Fig. 4, which illustrates this spatial data-type hierarchy, consists of Point, Curve, and Surface classes and a parallel class of Geometry Collection. The basic operations applicable to all data types are shown in Table 2.
TABLE 2
REPRESENTATIVE FUNCTIONS SPECIFIED BY OGIS
The topological operations are based on the ubiquitous nine intersections model [10]. Using the OGIS
specification, common spatial queries can be intuitively posed in SQL. For example, the query Find all lakes
which have an area greater than 5 sq. km. and are within 20 km. from the campgrounds can be posed as shown in
Fig. 3a. Other example GIS queries which can be implemented using OGIS operations are provided in Table 3.
The OGIS specification is confined to topological and metric operations on vector data types. Other interesting classes of operations are network, direction, dynamic, and the field operations of focal, local, and zonal (see Table 1). While standards for field-based raster data types are still emerging, proposals specifically designed for cartographic modeling and for general multidimensional discrete objects (satellite images, X-rays, etc.) are important milestones.
TABLE 3
TYPICAL SPATIAL QUERIES FROM GIS
Fig. 3: (a) SQL query with spatial operators; (b) corresponding query tree.
2.1 Spatial Query Processing
The efficient processing of spatial queries requires both efficient representation and efficient algorithms.
Common representations of spatial data in an object model include spaghetti, the node-arc-area (NAA) model,
the doubly connected edge-list (DCEL), and boundary representation [7], some of which are shown in Fig. 5 using entity-relationship diagrams. The NAA model differentiates between the topological concepts (node, arc,
areas) and the embedding space (points, lines, areas). The spaghetti-ring and DCEL focus on the topological
concepts. The representation of the field data model includes a regular tessellation (triangular, square, hexagonal
grid), as well as triangular irregular networks (TIN).
The spatial queries [7], shown in Table 3, are often processed using filter and refine techniques. Approximate
geometry such as the minimal orthogonal bounding rectangle of an extended spatial object is first used to filter
out many irrelevant objects quickly. Exact geometry is then used for the remaining spatial objects to complete
the processing. Strategies for range-queries include a scan and index-search in conjunction with the plane-sweep
algorithm [5]. Strategies for the spatial-join include the nested loop, tree matching [5], when indices are present
on all participating relations, and space partitioning [2], in the absence of indices. To speed up computation for
large spatial objects (it is common for polygons to have 1,000 or more edges), object indices are used in
extended filtering. Strategies such as object approximation and tree matching originated in spatial databases, and can potentially be applied in other domains.
Fig.5. Entity relationship diagrams for common representations of spatial data.
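The filter-and-refine strategy described above can be sketched as follows. This is a simplified illustration where objects are polygons given as vertex lists and the refine step is a placeholder vertex-containment test, not exact polygon geometry:

```python
def mbr(points):
    """Minimum bounding rectangle of a point set: (xmin, ymin, xmax, ymax)."""
    xs, ys = zip(*points)
    return (min(xs), min(ys), max(xs), max(ys))

def mbr_intersects(a, b):
    """Cheap rectangle-overlap test used by the filter step."""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]

def range_query(objects, window):
    """Filter: discard objects whose MBR misses the query window.
    Refine: run a geometry test on the few surviving candidates
    (here a placeholder: keep objects with a vertex inside the window)."""
    candidates = [o for o in objects if mbr_intersects(mbr(o), window)]
    return [o for o in candidates
            if any(window[0] <= x <= window[2] and window[1] <= y <= window[3]
                   for (x, y) in o)]

inside = [(1, 1), (2, 2)]
far_away = [(20, 20), (25, 25)]
print(range_query([inside, far_away], (0, 0, 10, 10)))  # keeps only 'inside'
```

The MBR test eliminates `far_away` without touching its exact geometry, which is where the savings come from when polygons have thousands of edges.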
2.2 Spatial File Organization and Indices
The physical design of a spatial database optimizes the instructions to storage devices for performing common
operations on spatial data files. File designs for secondary storage include clustering methods as well as spatial
hashing methods. The design of spatial clustering techniques is more difficult compared to the design of
traditional clustering because there is no natural order in multidimensional space where spatial data resides. This
is only complicated by the fact that the storage disk is a logically one-dimensional device. Thus, what is needed is a mapping from a higher-dimensional space to a one-dimensional space that is distance-preserving, so that elements that are close in space are mapped onto nearby points on the line, and one-to-one, so that no two points in the
space are mapped onto the same point on the line [2]. Several mappings, none of them ideal, have been proposed
to accomplish this. The most prominent ones include row-order, z-order, and the Hilbert-curve (see Fig. 6).
Fig.6. Space-filling curves to linearize a multidimensional space.
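One such mapping, the Z-order curve from Fig. 6, can be computed by bit interleaving. A minimal sketch for non-negative integer cell coordinates:

```python
def z_order(x, y, bits=16):
    """Interleave the bits of x and y to map a 2D grid cell onto the
    Z-order curve. Nearby cells tend to (but do not always) receive
    nearby curve positions -- none of these mappings is ideal."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)      # x contributes the even bits
        z |= ((y >> i) & 1) << (2 * i + 1)  # y contributes the odd bits
    return z

print(z_order(2, 2))  # 12
```

Sorting cells by this key is what allows a one-dimensional index such as the B+ tree to serve multidimensional data, as discussed below.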
Metric clustering techniques use the notion of distance to group nearest neighbors together in a metric space.
Topological clustering methods like connectivity-clustered access methods [7] use the min-cut partitioning of a
graph representation to efficiently support graph traversal operations. The physical organization of files can be
supplemented with indices, which are data-structures to improve the performance of search operations.
Classical one-dimensional indices such as the B+ tree can be used for spatial data by linearizing a
multidimensional space using a space-filling curve such as the Z-order (see Fig. 6). A large number of spatial
indices [3] have been explored for multidimensional Euclidean space. Representative indices for point objects
include Grid files, multidimensional grid files [8], Point-Quad-Trees, and Kd-trees. Representative indices for
extended objects include the R-tree family, the Field tree, Cell tree, BSP tree, and Balanced and Nested grid
files.
The R-tree was one of the first access methods created to handle extended objects. It is a height-balanced natural extension of the B+ tree for higher dimensions. Objects are represented in the R-tree by their minimum bounding rectangles (MBRs). Non-leaf nodes are composed of entries of the form (R, child-pointer), where R is the MBR of all entries contained in the child node. Leaf nodes contain the MBRs of the data objects. To guarantee good space utilization and height balance, the parent MBRs are allowed to overlap.
Fig. 7: (a) Spatial objects (bold) arranged in R-tree hierarchy; (b) R-tree file structure on disk.
Fig. 7a illustrates the spatial objects organized in an R-tree, while Fig. 7b shows the file structure where the
nodes correspond to disk pages. Many variations of the R-tree structure exist whose main emphasis is on
discovering new strategies to maintain the balance of the tree in case of a split and to minimize the overlap of the
MBRs in order to improve the search time.
Concurrency control for spatial access methods [6] is provided by the R-link tree, which is a variant of the R-tree with additional sibling pointers that allow the tracking of modifications. Concurrency is provided during operations such as search, insert, and delete. The R-link tree is also recoverable in a write-ahead logging environment.
2.3 Other Accomplishments
Spatial applications like NASA's Earth Observation System (EOS) have some of the largest data sets encountered in any application to date. This has prompted new research in database-file design for storage on tertiary storage devices such as juke-boxes. High-performance spatial applications such as flight simulators with geographic accuracy have triggered the development of new parallel formulations for the range query and the spatial join query, including declustering methods and dynamic load-balancing techniques for multidimensional spatial data [8], [9]. Other interesting developments include hierarchical algorithms for shortest path computation [4] and view materialization [6].
3. EXPERIMENTAL EVALUATION
In this section, we compare the efficiency of the proposed algorithms using real and synthetic data sets. Each data set is indexed by an R-tree with a 4 KB page size. We used an LRU memory buffer whose default size is set to 0.5 percent of the sum of tree sizes (for the object and feature trees used). Our algorithms were implemented in C++ and experiments were run on a Pentium D 2.8 GHz PC with 1 GB of RAM. In all experiments, we measure both the I/O cost (in number of page faults) and the total execution time (in seconds) of our algorithms. Section 3.1 describes the experimental settings. Sections 3.2 and 3.3 study the performance of the proposed algorithms for queries with range scores and influence scores, respectively. We then present our experimental findings on real data in Section 3.4.
3.1 Experimental Settings
We used both real and synthetic data for the experiments. The real data sets will be described in Section 3.4.
For each synthetic data set, the coordinates of points are random values uniformly and independently generated
for different dimensions. By default, an object data set contains 200K points and a feature data set contains 100K
points. The point coordinates of all data sets are normalized to the 2D space [0, 10000]^2.
For a feature data set Fc, we generated qualities for its points such that they simulate a real-world scenario: facilities close to (far from) a town center often have high (low) quality. For this, a single anchor point s° is selected such that its neighborhood region contains a high number of points. Let dist_min (dist_max) be the minimum (maximum) distance of a point in Fc from the anchor s°. Then, the quality of a feature point s is generated as

q(s) = ((dist_max - dist(s, s°)) / (dist_max - dist_min))^θ,

where dist(s, s°) stands for the distance between s and s°, and θ controls the skewness (default: 1.0) of the quality distribution. In this way, the qualities of points in Fc lie in [0, 1] and the points closer to the anchor have higher qualities. Also, the quality distribution is highly skewed for large values of θ.
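A sketch of this quality-generation scheme, assuming the quality of a point is its normalized distance from the anchor, ((dist_max - dist(s, s°)) / (dist_max - dist_min))^θ, raised to the skewness exponent θ (the exact normalization is reconstructed here, so treat it as an assumption):

```python
import math

def gen_quality(s, anchor, dist_min, dist_max, theta=1.0):
    """Quality in [0, 1]: highest (1.0) at dist_min from the anchor,
    lowest (0.0) at dist_max; larger theta skews the distribution
    toward low qualities."""
    d = math.dist(s, anchor)
    return ((dist_max - d) / (dist_max - dist_min)) ** theta

# A point halfway to dist_max gets quality 0.5 for theta=1, 0.25 for theta=2
print(gen_quality((5.0, 0.0), (0.0, 0.0), 0.0, 10.0, theta=1.0))
print(gen_quality((5.0, 0.0), (0.0, 0.0), 0.0, 10.0, theta=2.0))
```

This matches the stated properties: qualities lie in [0, 1], decrease with distance from the anchor, and become highly skewed as θ grows.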
We study the performance of our algorithms with respect to various parameters, which are displayed in Table
4 (their default values are shown in bold). In each experiment, only one parameter varies while the others are
fixed to their default values.
TABLE 4
Range of Parameter Values
3.2 Performance on Queries with Range Scores
This section studies the performance of our algorithms for top-k spatial preference queries on range scores.
TABLE 5
Effect of the Aggregate Function, Range Scores
Table 5 shows the I/O cost and execution time of the algorithms, for different aggregate functions (SUM, MIN,
and MAX). GP has lower cost than SP because GP computes the scores of points within the same leaf node
concurrently. The incremental computation technique (used by SP and GP) derives a tight upper bound score (of
each point) for the MIN function, a partially tight bound for SUM, and a loose bound for MAX. This explains
the performance of SP and GP across different aggregate functions. However, the costs of the other methods are
mainly influenced by the effectiveness of pruning. BB employs an effective technique to prune unqualified non-leaf entries in the object tree, so it outperforms GP. The optimized score computation method enables BB* to save, on average, 20 percent of the I/O and 30 percent of the time of BB. FJ outperforms its competitors as it discovers qualified combinations of feature entries early.
We ignore SP in subsequent experiments, and compare the cost of the remaining algorithms on synthetic data
sets with respect to different parameters.
Next, we empirically justify the choice of using level-1 entries of feature trees Fc for the upper bound score
computation routine in the BB algorithm. In this experiment, we use the default parameter setting and study how
the number of node accesses of BB is affected by the level of Fc used.
TABLE 6
Effect of the Level of Fc Used for Upper Bound Score Computation in the BB Algorithm
Table 6 shows the decomposition of node accesses over the tree D and the trees Fc, and the statistics of upper
bound score computation. Each accessed non-leaf node of D invokes a call of the upper bound score
computation routine.
When level-0 entries of Fc are used, each upper bound computation call incurs a high number (617.5) of node
accesses (of Fc). On the other hand, using level-2 entries for upper bound computation leads to very loose
bounds, making it difficult to prune the leaf nodes of D. Observe that the total cost is minimized when level-1
entries (of Fc) are used. In that case, the node accesses per upper bound computation call is low (15), and yet the
obtained bounds are tight enough for pruning most leaf nodes of D.
Fig. 8. Effect of buffer size, range scores. (a) I/O. (b) Time.
Fig. 8 plots the cost of the algorithms as a function of the buffer size. As the buffer size increases, the I/O of all algorithms drops. FJ remains the best method, BB* the second, and BB the third; all of them outperform GP by a wide margin. Since the buffer size does not affect the pruning effectiveness of the algorithms, it has a small impact on the execution time.
Fig. 9. Effect of |D|, range scores. (a) I/O. (b) Time.
Fig. 9 compares the cost of the algorithms with respect to the object data size |D|. Since the cost of FJ is dominated by the cost of joining feature data sets, it is insensitive to |D|. On the other hand, the cost of the other methods (GP, BB, and BB*) increases with |D|, as score computations need to be done for more objects in D.
Fig. 10. Effect of |F|, range scores. (a) I/O. (b) Time.
Fig. 10 plots the I/O cost of the algorithms with respect to the feature data size |F| (of each feature data set). As |F| increases, the cost of GP, BB, and FJ increases. In contrast, BB* experiences a slight cost reduction as its optimized score computation method (for objects and non-leaf entries) is able to perform pruning early at a large |F| value.
Fig. 11. Effect of m, range scores. (a) I/O. (b) Time.
Fig. 11 plots the cost of the algorithms with respect to the number m of feature data sets. The costs of GP, BB, and BB* rise linearly as m increases because the number of component score computations is at most linear in m. On the other hand, the cost of FJ increases significantly with m, because the number of qualified combinations of entries is exponential in m.
Fig. 12. Effect of k, range scores. (a) I/O. (b) Time.
Fig. 12 shows the cost of the algorithms as a function of the number k of requested results. GP, BB, and BB*
compute the scores of objects in D in batches, so their performance is insensitive to k. As k increases, FJ has
weaker pruning power and its cost increases slightly.
Fig. 13. Effect of ε, range scores. (a) I/O. (b) Time.
Fig. 13 shows the cost of the algorithms when varying the query range ε. As ε increases, all methods access more nodes in the feature trees to compute the scores of the points. The difference in execution time between BB* and FJ shrinks as ε increases. In summary, although FJ is the clear winner in most of the experimental instances, its performance is significantly affected by the number m of feature data sets. BB* is the most robust algorithm to parameter changes and it is recommended for problems with large m.
4. CONCLUSION
In this paper, we studied top-k spatial preference queries, which provide a novel type of ranking for spatial objects based on the qualities of features in their neighborhood. The neighborhood of an object p is captured by one of two scoring functions: 1) the range score restricts the neighborhood to a crisp region centered at p, whereas 2) the influence score relaxes the neighborhood to the whole space and assigns higher weights to locations closer to p.
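As a rough illustration (not the authors' implementation), the two scoring functions can be sketched as follows. The sketch assumes Euclidean distance, a radius parameter eps, the exponential decay weight 2^(-dist/eps) for the influence score, and a particular aggregate choice (max quality per feature set, summed across sets); the actual aggregates are query parameters in the paper.

```python
import math

def dist(p, s):
    # Euclidean distance between two 2D points
    return math.hypot(p[0] - s[0], p[1] - s[1])

def range_score(p, feature_sets, eps):
    # Range score: for each feature set, take the best quality among
    # features inside the crisp region of radius eps around p, then
    # aggregate (here: sum) across the m feature sets.
    total = 0.0
    for features in feature_sets:  # features: list of ((x, y), quality)
        best = max((q for s, q in features if dist(p, s) <= eps),
                   default=0.0)
        total += best
    return total

def influence_score(p, feature_sets, eps):
    # Influence score: every feature may contribute, but its quality
    # is discounted by 2**(-dist/eps), so closer features weigh more.
    total = 0.0
    for features in feature_sets:
        best = max((q * 2 ** (-dist(p, s) / eps) for s, q in features),
                   default=0.0)
        total += best
    return total
```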
We presented five algorithms for processing top-k spatial preference queries. The baseline algorithm SP
computes the scores of every object by querying on feature data sets. The algorithm GP is a variant of SP that
reduces I/O cost by computing scores of objects in the same leaf node concurrently. The algorithm BB derives
upper bound scores for non-leaf entries in the object tree, and prunes those that cannot lead to better results. The
algorithm BB* is a variant of BB that utilizes an optimized method for computing the scores of objects (and the upper-bound scores of non-leaf entries). The algorithm FJ performs a multi-way join on the feature trees to obtain qualified combinations of feature points, and then searches for their relevant objects in the object tree. Based on our
experimental findings, BB* is scalable to large data sets and it is the most robust algorithm with respect to
various parameters. However, FJ is the best algorithm in cases where the number m of feature data sets is low
and each feature data set is small. In the future, we will study the top-k spatial preference query on a road network, in which the distance between two points is defined by their shortest-path distance rather than their Euclidean distance. The challenge is to develop alternative methods for computing the upper-bound scores for a group of points on a road network.
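For concreteness, the baseline SP strategy summarized above can be sketched as a minimal in-memory version (assuming some `score` function, such as the range score; the actual algorithms query R-tree indexes rather than scanning the object set):

```python
import heapq

def top_k_preference(objects, score, k):
    # SP baseline: compute the score of every object and keep the k
    # best. BB/BB* avoid this exhaustive scan by pruning object-tree
    # entries whose upper-bound score cannot beat the current k-th score.
    heap = []  # min-heap of (score, object) holding the best k so far
    for p in objects:
        s = score(p)
        if len(heap) < k:
            heapq.heappush(heap, (s, p))
        elif s > heap[0][0]:
            heapq.heapreplace(heap, (s, p))  # evict the current k-th worst
    return sorted(heap, reverse=True)  # best-first (score, object) pairs
```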