International Journal of Data Mining & Knowledge Management Process (IJDKP) Vol.3, No.5, September 2013
DOI : 10.5121/ijdkp.2013.3508
COLUMN-STORE DATABASES: APPROACHES AND OPTIMIZATION TECHNIQUES
Tejaswini Apte1, Dr. Maya Ingle2 and Dr. A.K. Goyal2
1 Symbiosis Institute of Computer Studies and Research
2 Devi Ahilya Vishwavidyalaya, Indore
ABSTRACT
Column-Stores databases store data column-by-column. The need for Column-Stores databases arose from the demand for efficient query processing in read-intensive relational databases, and extensive research has been performed on efficient data storage and query processing for such databases. This paper gives an overview of the storage and performance optimization techniques used in Column-Stores.
1. INTRODUCTION
Database storage technology is broadly divided into Column-Stores (CS) and Row-Stores (RS) [1]. Currently, the majority of popular database products, such as Microsoft SQL Server and Oracle, use RS, i.e. they store all attributes of a record together, so a single disk write can place all the columns of a record onto disk. RS systems perform well in the majority of database applications, such as business data management. In contrast, RS systems perform poorly for a variety of analytical workloads, whose purpose is to gain new insight by reading large amounts of data and to drive the formulation and implementation of plans.
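To make the contrast concrete, the following minimal sketch (Python, with hypothetical toy data, not taken from any particular system) contrasts a row-wise layout, where all attributes of a record are stored together, with a column-wise layout, where each attribute is stored contiguously and an analytical query touches only the columns it needs.

```python
# Toy illustration of row-wise vs. column-wise layouts (hypothetical data).
rows = [
    (1, "Alice", 3000),   # (id, name, salary) stored together: one write per record
    (2, "Bob",   4500),
    (3, "Carol", 5200),
]

# The same relation in a column-store: each attribute is stored contiguously,
# so an analytical query over "salary" reads a single column.
columns = {
    "id":     [1, 2, 3],
    "name":   ["Alice", "Bob", "Carol"],
    "salary": [3000, 4500, 5200],
}

# Analytical scan: SUM(salary) reads only the salary column in the CS layout,
# whereas the RS layout would pull whole records from disk.
print(sum(columns["salary"]))          # 12700
print(sum(r[2] for r in rows))         # same answer, but touches every attribute
```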
Therefore, researchers proposed the idea of CS for read-optimized databases [1]. A CS database differs from a traditional RS database in that the data in a table is stored and accessed column by column. However, the CS access method is not well suited to transactional environments, where transactions and the data in rows are tightly integrated. During execution on a CS database, queries on metadata are the most frequent kind of operation [1]. The metadata table is always presented to the upper interface in the form of tuples, so metadata access efficiency is the main concern for ad-hoc queries over large databases. The simplest and most effective way to improve the efficiency of read and write operations on metadata is to convert the CS metadata table from columns to rows in the buffer. The objective of this survey is to present an overview of CS features for read-intensive relational databases.
The paper is organized as follows: Section 2 elaborates on CS approaches and the general principles that help improve performance for ad-hoc queries. Section 3 focuses on the areas of performance optimization in CS and the techniques useful for improving CS performance. We conclude in Section 4 with a summary.
2. COLUMN-STORE APPROACHES
Different approaches have been proposed in the literature to build a CS, namely Vertical Partitioning, Index-Only Plans, and Fractured Mirrors.
2.1 Vertical Partitioning
The most straightforward way to simulate a CS in an RS is through vertical partitioning. In a fully vertically partitioned approach, tuple reconstruction is required to build a single row. To support tuple reconstruction, an integer "position" column, preferably the primary key, is added to every partition (Figure 1). For multi-attribute queries, the queries are rewritten so that joins are performed on the position attributes [2].
Figure 1: Vertical Partitioning
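As a rough illustration of this simulation (a sketch only, using SQLite through Python's standard sqlite3 module; the emp table and its columns are hypothetical), each attribute becomes its own partition carrying a position column, and a multi-attribute query is rewritten as a join on position:

```python
# Minimal sketch of vertical partitioning in a row store (SQLite, in memory).
import sqlite3

conn = sqlite3.connect(":memory:")

# Each attribute of the logical table emp(id, name, salary) becomes its own
# partition, carrying an integer "position" column for tuple reconstruction.
conn.execute("CREATE TABLE emp_name   (position INTEGER PRIMARY KEY, name   TEXT)")
conn.execute("CREATE TABLE emp_salary (position INTEGER PRIMARY KEY, salary INTEGER)")
conn.executemany("INSERT INTO emp_name   VALUES (?, ?)", [(1, "Alice"), (2, "Bob")])
conn.executemany("INSERT INTO emp_salary VALUES (?, ?)", [(1, 3000), (2, 4500)])

# A multi-attribute query (SELECT name, salary FROM emp WHERE salary > 4000)
# is rewritten as a join of the partitions on the position attribute.
rewritten = """
    SELECT n.name, s.salary
    FROM emp_name n JOIN emp_salary s ON n.position = s.position
    WHERE s.salary > 4000
"""
print(conn.execute(rewritten).fetchall())   # [('Bob', 4500)]
```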
2.2 Index-Only Plans
The vertical partitioning approach wastes space by keeping the position attribute in each partition. To utilize space more efficiently, the second approach uses index-only plans, in which each column of every table is indexed using an unclustered B+Tree while the base relations are stored using a standard RS design. An index-only plan works by building and merging lists of (record-id, value) pairs that satisfy the predicates on each table (Figure 2).
Figure 2: Index-Only Plans
Indices do not store duplicate values and hence have less tuple overhead. When a query has no predicate on a column, scanning through the index-only approach may be slower than scanning a heap file. Hence, composite key indexes are created as an optimization of the index-only approach.
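The following Python sketch illustrates the flavour of an index-only plan under simplified assumptions: two hypothetical unclustered indexes are represented as value-to-record-id maps, each predicate yields a record-id set, and the sets are intersected without ever touching the base rows.

# Sketch of an index-only plan: per-column indexes yield (record-id, value)
# lists satisfying each predicate; intersecting the record-id sets answers
# the query without reading the base (row-store) relation.

# Hypothetical unclustered indexes: value -> list of record ids.
age_index    = {25: [0, 3], 30: [1], 40: [2, 4]}
salary_index = {7000: [1, 3], 8000: [0], 9000: [2, 4]}

def rids_matching(index, predicate):
    """Collect record ids whose indexed value satisfies the predicate."""
    rids = set()
    for value, rid_list in index.items():
        if predicate(value):
            rids.update(rid_list)
    return rids

# WHERE age >= 30 AND salary >= 8000, evaluated purely from the indexes.
matching = rids_matching(age_index, lambda v: v >= 30) & \
           rids_matching(salary_index, lambda v: v >= 8000)
print(sorted(matching))  # [2, 4]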
2.3 Fractured Mirrors Approach
This approach is a hybrid row/column approach. The fractured mirror design uses an RS for updates and a CS for reads, with a background process migrating data from the RS to the CS (Figure 3). The vertically partitioned strategy has been explored in detail, with the conclusion that tuple reconstruction is a significant problem and that pre-fetching tuples from secondary storage is essential to improve tuple reconstruction times [3].
Figure 3: Fractured Mirrors Approach
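A minimal sketch of this hybrid, assuming an in-memory stand-in for both mirrors and a manually triggered migration step (all class and attribute names are illustrative, not from any actual system):

# Sketch of a fractured mirror: updates land in a row-store write buffer,
# a background step migrates them into column-store storage, and analytical
# reads scan the column-store side.

class FracturedMirror:
    def __init__(self, attributes):
        self.attributes = attributes
        self.row_buffer = []                        # RS side: recent updates
        self.columns = {a: [] for a in attributes}  # CS side: read-optimized

    def insert(self, record):
        """Writes are cheap appends to the row-store buffer."""
        self.row_buffer.append(record)

    def migrate(self):
        """Background process: move buffered rows into the column mirror."""
        for record in self.row_buffer:
            for attr, value in zip(self.attributes, record):
                self.columns[attr].append(value)
        self.row_buffer.clear()

    def scan(self, attr):
        """Analytical reads are served from the column-store mirror."""
        return self.columns[attr]

mirror = FracturedMirror(["id", "amount"])
mirror.insert((1, 50))
mirror.insert((2, 75))
mirror.migrate()
print(sum(mirror.scan("amount")))  # 125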
3. OPTIMIZATION TECHNIQUES
Three common optimization techniques are suggested in the literature for improving performance in CS database systems.
3.1 Compression
The compression ratio achieved by existing compression algorithms is higher for CS, and compression has been shown to improve query performance [4], mainly by reducing I/O cost. For a CS, if query execution can operate directly on compressed data, performance can be improved further, as decompression is avoided. The Hu-Tucker algorithm is a systematic and optimal logical compression algorithm for performing order-preserving compression. The frequencies of column values are considered to generate an optimal weighted binary tree, and the dictionary is constituted from these frequent values [5].
In the dictionary encoding logical compression algorithm, the distinct values are defined in the schema, and in the source dataset these values are replaced by equivalent symbols, i.e. their index values in the dictionary. To reduce the cost of decompression, indexes are used to scan the dictionary. Extensive research has been carried out on dictionary-based domain compression, which is widely used in relational databases, and dictionary encoding is frequently used in commercial database applications [6, 7].
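As a hedged illustration of dictionary encoding, the sketch below builds an order-preserving dictionary for one column of hypothetical string values, evaluates an equality predicate directly on the codes, and decompresses only the qualifying positions.

# Sketch of dictionary encoding for one column: distinct values are mapped
# to small integer codes, and predicates can be evaluated on the codes
# without decompressing every value (order-preserving assignment assumed).

column = ["DE", "US", "DE", "FR", "US", "DE"]

# Order-preserving dictionary: sorted distinct values -> dense codes.
dictionary = {v: i for i, v in enumerate(sorted(set(column)))}
reverse    = {i: v for v, i in dictionary.items()}
encoded    = [dictionary[v] for v in column]

print(encoded)            # [0, 2, 0, 1, 2, 0]

# Equality predicate evaluated directly on compressed codes.
target = dictionary["DE"]
hits = [pos for pos, code in enumerate(encoded) if code == target]
print(hits)               # [0, 2, 5]

# Decompression only for the qualifying positions.
print([reverse[encoded[p]] for p in hits])  # ['DE', 'DE', 'DE']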
Entropy encoding logical compression schemes, including Huffman encoding [6], are considered heavy-weight algorithms, since decompressing variable-length entropy-encoded data is more processor intensive. These algorithms have been modified to perform better in relational databases [7, 8]. Early compression work mainly focused on disk space savings and I/O performance [7, 10, 11]. In addition, the literature shows that, when decompression cost is low, compression can also lead to better CPU performance [12, 13]. Recent database literature also addresses how the query optimizer should account for compression in its cost calculations.
3.2 Materialization Strategies
For read-intensive relational databases, vertical partitioning plays a vital role in performance improvement. Recently, several read-intensive relational databases have adopted the idea of full vertical partitioning [1, 14]. Research on CS has shown that for certain read-mostly workloads, this approach can provide substantial performance benefits over traditional RS database systems. A CS is essentially a modification only to the physical layer of a database; at the logical and view levels, a CS looks identical to an RS.
Separate columns must ultimately be stitched into tuples, and determining the stitching schedule in a query plan is critical. Lessons from RS suggest a natural tuple construction policy: each column, as it is accessed, is added to an intermediate tuple representation for later use. Adding columns to the intermediate results as early as possible is known in the literature as early materialization. The early materialization approach in CS is more CPU bound. The late materialization approach is more CPU efficient, as fewer intermediate tuples need to be stitched; however, it sometimes requires rescanning the base columns to form tuples, which can be slow [15, 16].
Figure 4: Early Materialization Query Plan
Figure 5: Late Materialization Query Plan
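The contrast between the two strategies can be sketched in a few lines of Python; the data and query are hypothetical, and the sketch ignores compression and operator pipelining.

# Sketch contrasting early and late materialization for a query such as
# SELECT name FROM t WHERE salary > 7500 over two columns (illustrative data).

names    = ["alice", "bob", "carol", "dave"]
salaries = [9000, 7000, 8000, 6000]

def early_materialization():
    """Stitch full intermediate tuples first, then filter them."""
    tuples = list(zip(names, salaries))          # materialize early
    return [n for n, s in tuples if s > 7500]

def late_materialization():
    """Filter on the salary column alone, stitch only qualifying positions."""
    positions = [i for i, s in enumerate(salaries) if s > 7500]
    return [names[i] for i in positions]         # materialize late

print(early_materialization())  # ['alice', 'carol']
print(late_materialization())   # ['alice', 'carol']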
To overcome these disadvantages, an invisible join can be used in a CS for foreign-key/primary-key joins; it minimizes the values that need to be extracted in arbitrary order. The joins are rewritten as predicates on the foreign-key columns, and predicate evaluation is carried out through hash lookups. Only after the predicates have been evaluated are the appropriate tuples extracted from the relevant dimension tables [17].
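The sketch below walks through the three steps of this idea on a tiny, made-up star schema: the dimension predicate becomes a hash set of qualifying keys, the fact table's foreign-key column is probed against it, and values are extracted only for the surviving positions.

# Sketch of the invisible-join idea (hypothetical data, simplified to one
# dimension table and one predicate).

# Dimension table: customer_key -> region
dim_region = {1: "EU", 2: "US", 3: "EU"}

# Fact table stored column-wise.
fact_cust_fk = [1, 2, 3, 1, 2]
fact_amount  = [10, 20, 30, 40, 50]

# Step 1: evaluate the dimension predicate (region = 'EU') as a hash set.
qualifying_keys = {k for k, region in dim_region.items() if region == "EU"}

# Step 2: probe the fact foreign-key column; keep qualifying positions.
positions = [i for i, fk in enumerate(fact_cust_fk) if fk in qualifying_keys]

# Step 3: extract fact/dimension values only for the surviving positions.
result = [(dim_region[fact_cust_fk[i]], fact_amount[i]) for i in positions]
print(result)  # [('EU', 10), ('EU', 30), ('EU', 40)]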
The successful CS systems C-Store and Vertica exploit projections to support tuple reconstruction. They logically divide a relation into sub-relations, called projections, and each attribute may appear in more than one projection. The attributes in the same projection are sorted on one or more sort keys and are stored in that sorted order [18]. A projection allows multiple attributes referenced by a query to be stored in the same order, but not all attributes referenced by a query are necessarily part of the same projection, and the query then pays a tuple reconstruction time penalty. The CS databases C-Store and MonetDB have attracted researchers to the CS area due to their good performance [18].
The C-Store series exploits projections to logically organize the attributes of base tables. Each projection may contain multiple attributes from a base table, and an attribute may appear in more than one projection, stored in different sorted orders. The objective of projections is to improve tuple reconstruction time. Although partitioning strategies and techniques have not been addressed well, a good solution is to cluster attributes into sub-relations based on usage patterns; the Bond Energy Algorithm (BEA) was developed to cluster the attributes of a relation based on compatibility [19].
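As a simplified stand-in for such usage-based clustering (not the actual Bond Energy Algorithm), the sketch below counts how often attribute pairs co-occur in a hypothetical query workload and greedily groups frequently co-accessed attributes into candidate projections.

# Greedy attribute grouping by co-access frequency (illustrative only).
from itertools import combinations
from collections import Counter

# Hypothetical workload: each query lists the attributes it touches.
workload = [
    {"name", "salary"},
    {"name", "salary"},
    {"dept", "budget"},
    {"name", "dept"},
]

affinity = Counter()
for query in workload:
    for a, b in combinations(sorted(query), 2):
        affinity[(a, b)] += 1

# Merge the most strongly co-accessed pairs into candidate projections.
projections = []
for (a, b), _ in affinity.most_common():
    group = next((g for g in projections if a in g or b in g), None)
    if group is None:
        projections.append({a, b})
    else:
        group.update({a, b})

print(projections)  # e.g. [{'name', 'salary', 'dept'}, {'budget', 'dept'}]

Note that in this toy output the attribute 'dept' lands in two candidate projections, which mirrors the observation above that an attribute may appear in more than one projection.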
MonetDB proposed a self-organizing tuple reconstruction strategy based on a main-memory structure called the cracker map [14]. Splitting and combining the pieces of the existing relevant cracker maps allows the current or subsequent queries to select the qualifying result set faster. For large databases, however, the cracker map pays a performance penalty due to its high maintenance cost [14]. Therefore, the cracker map is only suited to main-memory database systems such as MonetDB.
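The following sketch captures only the core intuition of cracking, not MonetDB's actual cracker map: the first range query physically partitions the column around its predicate bounds, so later queries over the same range touch only the relevant piece.

# Minimal cracking sketch: the first query pays a one-time reorganization
# cost, and subsequent queries on that range scan only the matching piece.

column = [13, 4, 55, 9, 27, 2, 42, 16]

def crack(values, low, high):
    """Partition a column piece around the predicate range [low, high)."""
    below  = [v for v in values if v < low]
    inside = [v for v in values if low <= v < high]
    above  = [v for v in values if v >= high]
    return below, inside, above

# First query: SELECT ... WHERE 10 <= v < 30 reorganizes the column.
below, inside, above = crack(column, 10, 30)
print(inside)                                # [13, 27, 16]

# A subsequent query with a contained range only touches the `inside` piece
# rather than re-scanning the whole column.
print([v for v in inside if 15 <= v < 30])   # [27, 16]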
3.3 Block Searching
Block searching in CS is greatly influenced by the address lookup process, and the block search process has been carried out using hashing algorithms [4, 20, 1]. Hashing algorithms have been used widely to avoid block conflicts in multiprocessor systems. To compute the block number, integer arithmetic is widely used in linear skewing functions, and the Burroughs Scientific Processor introduced fragmentation to minimize block search time. For stride access patterns, block conflicts may be reduced by XOR-based hash functions, whose design has mainly focused on conflict-free hash functions for various access patterns [22]. Bob Jenkins' hash produces uniformly distributed values; however, handling the partial final block increases its complexity [21].
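A small Python sketch of an XOR-based block-mapping function illustrates why such schemes help for stride patterns; the block count and bit widths are illustrative and do not correspond to any specific published scheme.

# XOR-based block mapping: the block number is derived by XOR-folding
# address bits instead of plain modulo, which spreads strided access
# patterns across blocks and reduces conflicts.

NUM_BLOCKS = 8          # power of two, so low bits can be masked directly
BITS = 3                # log2(NUM_BLOCKS)

def modulo_block(address):
    """Naive mapping: strides that are multiples of NUM_BLOCKS all collide."""
    return address % NUM_BLOCKS

def xor_block(address):
    """XOR-fold higher address bits into the low bits to break up strides."""
    low  = address & (NUM_BLOCKS - 1)
    high = (address >> BITS) & (NUM_BLOCKS - 1)
    return low ^ high

stride_addresses = [i * NUM_BLOCKS for i in range(8)]   # worst-case stride
print([modulo_block(a) for a in stride_addresses])      # [0, 0, 0, 0, 0, 0, 0, 0]
print([xor_block(a) for a in stride_addresses])         # [0, 1, 2, 3, 4, 5, 6, 7]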
4. CONCLUSION
In this paper, we presented various CS approaches for optimizing the performance of read-intensive relational database systems. CS enables engineers of read-intensive relational databases to broaden the scope of storage architecture and design by exploiting entities, attributes and the frequency of data values. Moreover, all of these disciplines require knowledge of data structures, data behaviour, and relational database design.
REFERENCES
[1] S. Harizopoulos, V. Liang, D. J. Abadi, and S. Madden, "Performance tradeoffs in read-optimized databases," Proc. the 32nd international conference on very large data bases, VLDB Endowment, 2006, pp. 487-498.
[2] S. Navathe, and M. Ra. “Vertical partitioning for database design: a graphical algorithm”. ACM
SIGMOD, Portland, June 1989:440-450
[3] Ramamurthy R., Dewitt D.J., and Su Q. A Case for Fractured Mirrors. Proceedings of VLDB
2003, 12: 89–101
[4] Daniel J. Abadi, Samuel R. Madden, and Miguel Ferreira. Integrating compression and execution
in column oriented database systems. In SIGMOD, pages 671–682, Chicago, IL, USA, 2006.
[5] http://monetdb.cwi.nl/, 2008.
[6] J. Goldstein, R. Ramakrishnan, and U. Shaft, “Compressing relations and indexes,” Proc. ICDE
1998, IEEE Computer Society, 1998, pp. 370-379.
[7] G. V. Cormack, "Data compression on a database system," Commun. ACM, 28(12):1336-1342, 1985.
[8] B. R. Iyer, and D. Wilhite, “Data compression support in databases,” Proc. VLDB 1994, Morgan
Kaufmann, 1994, pp. 695-704.
[9] H. Lekatsas and W. Wolf. Samc: A code compression algorithm for embedded processors. IEEE
Transactions on Computer-Aided Design of Integrated Circuits and Systems, 18(12):1689–1701,
December 1999.
[10] G. Ray, J. R. Haritsa, and S. Seshadri, “Database compression: A performance enhancement
tool,” Proc. COMAD 1995, 1995.
[11] L. Holloway, V. Raman, G. Swart, and D. J. DeWitt, "How to barter bits for chronons: compression and bandwidth trade offs for database scans," Proc. SIGMOD 2007, ACM, 2007, pp. 389-400.
[12] D. Huffman, “A method for the construction of minimum redundancy codes,” Proc. I.R.E, 40(9),
pp. 1098- 1101, September 1952.
[13] C. Lefurgy, P. Bird, I. Chen, and T. Mudge. Improving code density using compression
techniques. In Proceedings of International Symposium on Microarchitecture (MICRO), pages
194–203, 1997.
[14] D. J. Abadi, D. S. Myers, D. J. DeWitt, and S. Madden, "Materialization strategies in a column-oriented DBMS," Proc. the 23rd international conference on data engineering, ICDE, 2007, pp. 466-475.
[15] S. Idreos, M. L. Kersten, and S. Manegold, "Self organizing tuple reconstruction in column-stores," Proc. the 35th SIGMOD international conference on management of data, ACM Press, 2009, pp. 297-308.
[16] H. I. Abdalla. “Vertical partitioning impact on performance and manageability of distributed
database systems (A Comparative study of some vertical partitioning algorithms)”. Proc. the 18th
national computer conference 2006, Saudi Computer Society.
[17] A. Ailamaki, D. J. DeWitt, M. D. Hill, and M. Skounakis. Weaving relations for cache performance. In VLDB '01, pages 169-180, San Francisco, CA, USA, 2001.
[18] R. Agrawal and R. Srikant, "Fast algorithms for mining association rules," Proc. the 20th international conference on very large data bases, Santiago, Chile, 1994.
[19] R. Raghavan and J.P. Hayes, “On Randomly Interleaved Memories,”SC90: Proc.
Supercomputing ’90, pp. 49-58, Nov. 1990.
[20] Gennady Antoshenkov, David B. Lomet, and James Murray. Order preserving compression. In
ICDE ’96, pages 655–663. IEEE Computer Society, 1996.
[21] B. Jenkins, "A hash function for hash table lookup".
[22] J. M. Frailong, W. Jalby, and J. Lenfant, "XOR-Schemes: A Flexible Data Organization in Parallel Memories," Proc. 1985 Int'l Conf. Parallel Processing, pp. 276-283, Aug. 1985.