An Efficient Cache Handling Technique in Database
Systems
Abhishek Shah (rakeshsh@usc.edu), University of Southern California
George Sam (gsam@usc.edu), University of Southern California
Nikhil Lakade (lakade@usc.edu), University of Southern California
1. ABSTRACT
In many commercial database systems, queries
with complex structures often take a long time
to execute. The efficiency of query processing
can be greatly improved if the results of
previous queries are stored in a cache.
These cached results can then be used to
answer later queries. Furthermore, processing
large and complex queries is costly in
commercial databases because of the size of the
databases, so we need a way to optimize
processing by automatically caching
intermediate results. An automatic system that
caches these results would save time.
Existing cache systems do store
intermediate results, but they do not know how
to use the cache memory efficiently when
storing them. A further problem arises if the
database is updated regularly, because the
cached results then become obsolete. It is
necessary to decide when to discard a cache
entry and how often to check for updates in the
database.
2. INTRODUCTION
2.1 The Problem
Over the past few years there has been
an increase in the need for large data
warehouses and OLAP[2] (On-Line Analytical
Processing) systems for managing large-scale
data. These techniques provide an efficient and
economical way of storing large amounts of data
and extracting useful information from it. These
systems have proven useful in many
applications, the most common being the data
warehouses of large retailers such as Walmart.
The query processing time for OLAP
and decision support usually ranges from minutes
to hours, and depends mostly on the size of the
database, the type of query and the processing
capabilities of the servers. Usually, large-scale
servers require minimal time to process
decision support queries. This processing time,
however, increases considerably if multiple
queries are executed simultaneously, or if the
structure of the query is complex. Complex
queries are built up of many subqueries, and the
result set of the final query depends on the
result sets obtained by executing the subqueries.
Traditional databases treat every query
independently, which increases processing time.
Moreover, if a particular query is used
frequently, the same query has to be executed in
full every time, which leads to redundant
processing.
2.2 Challenges
a. One main concern is deciding which cache
entries are to be deleted, and whether to delete
the entire cache or just part of it. This becomes
crucial if part of a cached result is needed for
further queries.
b. Many commercial database systems are
frequently updated. Our challenge here is to
update the results of the intermediate queries
as and when CRUD[1] operations take place
on the database.
3. MOTIVATION
Seeing how commercial databases and
OLAP systems struggle to fetch query results
efficiently, an automated system that is closely
coupled with the query optimizer, and that caches
results and manages them, would provide an
efficient way to retrieve data. For example,
consider a website which has to fetch data from a
large database every time a page is accessed by a
user.
The load time of the page increases
significantly because of the large amount of data
and the relatively slow query processing. It is
also useful if the system can automatically decide
when to discard or keep a particular cache entry,
and handle frequent database updates.
4. RELATED WORK
4.1 Exchequer
Exchequer[1] is an intermediate query
caching system developed for storing relevant
subquery results. The authors of the paper on this
system differentiate it from normal operating
system caching techniques in the following
respects:
a. In traditional cache systems, the size and cost
of computation of an object are not considered;
the recency of use of a data object is sufficient.
b. In a query cache system, results of previously
cached queries can be reused to answer later
queries.
c. In traditional cache systems, the pages are
independent of each other, and so can be easily
deleted without affecting other pages.
The Exchequer system takes into
account the dynamic nature of the cache, for
which the traditional materialized view or index
selection solutions do not work efficiently. In the
materialized view scenario, there are techniques
to decide which entities to materialize, taking
previously materialized views into
consideration. These techniques make static
decisions and so do not suit a cache whose
contents change dynamically, which is the case
Exchequer is designed for. Another system uses
multi-query caching and takes into account the
cost of materializing the selected views, but
makes a static decision on what to materialize.
The Exchequer system also uses an AND-OR
DAG representation of queries and cached
results. The use of a DAG makes it extensible to
new operations and efficiently encodes
alternative ways of evaluating queries. The
Exchequer DAG representation also takes into
account sort orders and the presence of indexes.
In the Exchequer architecture, there is a
tight coupling between the cache manager and
the optimizer. The optimizer produces a query
execution plan that may refer to cached relations;
the execution engine runs this plan, and the new
intermediate results produced by the query are
sometimes cached. An incremental greedy
algorithm decides which results should be cached.
The algorithm first checks whether any of the
nodes of the chosen query plan should be cached.
The incremental decision is made by
updating the representative set with the new query
and applying a selection algorithm to the nodes
selected when the previous query was considered
together with the nodes of the chosen query plan.
The output of this algorithm is a set of nodes that
are marked for caching and the best plan for the
current query. When the query is executed, the
nodes in its best plan that are marked are added
to the cache, where they replace unmarked nodes.
The unmarked nodes are chosen for replacement
using LCS/LRU, i.e. the largest results are
evicted first, and amongst the remaining results,
the least recently used is evicted. Exchequer
optimizes the query before fresh caching
decisions are made, so the chosen plan for each
query is optimal for that query given the current
cache contents.
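The LCS/LRU choice of victims described above can be sketched in a few lines of Python. This is only an illustration of one reading of the policy (largest first, with least recent use breaking ties), not Exchequer's actual code; the `CachedResult` record, its fields and the `pick_victims` helper are hypothetical names introduced here.

```python
from dataclasses import dataclass


@dataclass
class CachedResult:
    name: str
    size: int        # size of the cached result (hypothetical units)
    last_used: int   # logical timestamp of the most recent use
    marked: bool     # True if the greedy step marked this result for caching


def pick_victims(results, space_needed):
    """Pick unmarked results to evict: largest first, then least recently used."""
    candidates = [r for r in results if not r.marked]
    # Larger size sorts first; among equal sizes, the older timestamp sorts first.
    candidates.sort(key=lambda r: (-r.size, r.last_used))
    victims, freed = [], 0
    for r in candidates:
        if freed >= space_needed:
            break
        victims.append(r)
        freed += r.size
    return victims


results = [
    CachedResult("a", 10, 5, False),
    CachedResult("b", 30, 9, False),
    CachedResult("c", 30, 2, False),
    CachedResult("d", 50, 1, True),   # marked, so never a victim
]
victims = pick_victims(results, 40)
```

Marked results are skipped entirely, which mirrors the statement that only unmarked nodes are replaced.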
4.2 Composite Information Server Platform and
Query Processing Engine
The Composite Information Server
Platform[2] is an intermediate server platform that
works over the REST, HTTP and SOAP protocols
in a client-server web application environment. It
is provided by Composite Software. It receives
client requests, authenticates them either through
LDAP or Composite Domain Authentication, and
then passes them to the Query Processing Engine.
The Query Processing Engine executes each
request over the relevant data source and
retrieves the resulting data. It then combines this
data into a single SQL or XML result set and
returns it to the client. The Query Processing
Engine provides various optimization methods so
that the SQL query is efficient. It translates all
requests into a distribution plan, then analyzes
the queries and creates an optimized execution
plan that determines the intermediate steps and
data source requests. This sequence of queries is
then executed against the relevant data. The
Query Processing Engine also employs a caching
technique for the queries.
The engine minimizes overheads and
uses efficient join methods to guard against
suboptimal queries. The techniques that
the query engine provides include:
a. SQL Pushdown:
The Query Processing Engine offloads most of
the query processing. It pushes select
query operations such as string searches,
comparisons and sorting down into the underlying
relational data sources.
b. Parallel Processing:
The Query Engine processes requests in a parallel
and asynchronous way on separate threads, thus
reducing the wait time and data source response
latency.
c. Caching:
The Composite Information Server can be
configured to cache the results of queries, web
service calls and procedures. It does this on a
per view/query basis. It stores these intermediate
results either in a relational database or in a
file-based cache. The engine always checks whether
the result of a query is already present in the
cache, and if so uses the cached data. Caching is
most useful for data which is frequently
requested and which rarely changes. When the
data is constantly changing, the query engine
does not perform well, because it cannot handle
frequent changes to the data.
4.3 Multidimensional Query Cache using Chunks
To improve query response time in
OLAP, caching of queries has been proposed.
There are mainly two approaches: table-level
caching and query-level caching. A
previous work uses chunks[3] to reuse
results of previous queries to answer future
queries. To achieve better performance, the
chunked cache is combined with a chunked file
organization.
The chunked file organization
redefines how relational tables are organized.
This organization reduces the cost of a
chunk cache miss. One concern with this
method is choosing the chunk size. Smaller
chunks result in more efficient query processing,
but efficiency degrades as the total number of
chunks in the system increases. A further issue
is deciding on replacement
policies for chunked caches.
4.4 Usability Based Caching
Another similar work on this topic is
usability-based caching[4] of query results in
OLAP systems. It proposes a cache management
scheme for OLAP systems which is based on the
usability of query results in rewriting and
processing related future queries. It takes into
consideration not only the queries that are
currently being executed, but also predicts
future queries from the present and past queries
executed on the system, using a probability
model.
5. SOLUTION SYSTEM
5.1 Architecture
The architecture of our proposed optimizer
model is shown in Figure 1.
Fig 1. Architecture of System
The optimizer and cache manager
work in close coupling with our intermediate
query cache system. The optimizer uses the
chunked cache to efficiently cache the results of
incoming database queries.
The query execution plan and the cache
management plan are designed inside the
Optimizer and Cache Manager. This block is
responsible for changing the current cache. The
query execution plan is created using the cached
chunks. The chunks are obtained from the
Execution Engine as and when required.
5.2 Use of Chunks
Systems such as OLAP are becoming
increasingly dynamic and important for business
data analysis. The data sets in such systems are
usually multidimensional in nature. Traditional
relational systems are not designed to provide
the required performance for these data sets.
Hence such systems are built using a three-tier
architecture. The first tier provides an
easy-to-use graphical tool that allows users to
build queries. The second tier provides a
multidimensional view of the data stored in the
final tier, which can be an RDBMS. Queries in
systems like OLAP are highly interactive
and demand quick response times even if they are
complex.
OLAP queries are often repetitive in
nature and follow a predictable pattern. An
OLAP session can be characterized by
different kinds of locality.
1. Temporal: The same data might be accessed
repeatedly by the same user or a different user.
2. Hierarchical: This kind of locality is specific to
the OLAP domain and is a consequence of the
presence of hierarchies on the dimensions. Data
members which are related by parent/child or
sibling relationships will be accessed over and
over again. For example, if a user is looking at
data for the United States, the next query is
likely to be about Canada or Mexico.
We use a dynamic caching
scheme in which the cache contents vary
dynamically, since new items may be inserted
and old items may be removed from the cache. A
dynamic approach is significantly beneficial
at the middle tier, since it adapts to the query
profile. We use chunks for dynamic caching
and demonstrate its feasibility under realistic
query workloads without much overhead. We use
multidimensional arrays to represent data. Instead
of storing a large array in simple row or column
order, we break it down into chunks and store it in
a chunked format. The distinct values of each
dimension are divided into ranges, and chunks
are created based on this division. Figure 2
shows how a multidimensional space can be
broken up into chunks.
Fig 2 Chunks
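To make the range-based division concrete, the following Python sketch maps a query, given as one inclusive value range per dimension, to the set of chunk coordinates it overlaps. It is a minimal illustration under stated assumptions: each dimension is described by a sorted list of range lower bounds, and the function names `range_index` and `chunks_for_query` are hypothetical.

```python
from bisect import bisect_right
from itertools import product


def range_index(boundaries, value):
    """Index of the range that contains value; boundaries are sorted lower bounds."""
    return bisect_right(boundaries, value) - 1


def chunks_for_query(dim_boundaries, query_ranges):
    """Return the chunk coordinates overlapped by a query (inclusive ranges)."""
    per_dim = []
    for boundaries, (lo, hi) in zip(dim_boundaries, query_ranges):
        first = range_index(boundaries, lo)
        last = range_index(boundaries, hi)
        per_dim.append(range(first, last + 1))
    # A chunk is identified by one range index per dimension.
    return set(product(*per_dim))


# Two dimensions: the first split into ranges starting at 0, 10, 20, 30,
# the second into ranges starting at 0 and 50.
dims = [[0, 10, 20, 30], [0, 50]]
needed = chunks_for_query(dims, [(5, 25), (60, 70)])
```

Here the query touches three ranges of the first dimension and one range of the second, so three chunks are needed.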
5.3 Caching with the Chunks
In chunk-based caching, the
query results to be stored in the cache are broken
up into chunks and the chunks are cached. When
a user issues a new query, the chunks
required to answer that query are determined.
Depending on the contents of the cache, this list
of chunks is divided into two parts. One part is
answered from the cache. The other, consisting of
the missing chunks, has to be computed from the
backend. The cost is reduced because only the
missing chunks are computed from the backend.
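The split into cached and missing chunks can be sketched as follows. This is an illustrative Python fragment, assuming the cache is a plain dictionary keyed by chunk coordinates and that `compute_from_backend` is a caller-supplied function; both are assumptions made here, and replacement of old chunks is deliberately left out.

```python
def split_chunks(required, cache):
    """Divide the required chunk ids into cache hits and missing chunks."""
    hits = {c: cache[c] for c in required if c in cache}
    missing = [c for c in required if c not in cache]
    return hits, missing


def answer_query(required, cache, compute_from_backend):
    """Answer a query chunk by chunk, going to the backend only for misses."""
    result, missing = split_chunks(required, cache)
    for c in missing:
        cache[c] = compute_from_backend(c)   # no eviction in this sketch
        result[c] = cache[c]
    return result, missing


cache = {(0, 0): "r00", (0, 1): "r01"}
backend_calls = []


def fake_backend(chunk):
    # Stand-in for the real backend computation; records what was requested.
    backend_calls.append(chunk)
    return "computed"


result, missing = answer_query([(0, 1), (1, 1)], cache, fake_backend)
```

Only the chunk `(1, 1)` reaches the backend; `(0, 1)` is served from the cache.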
Caching chunks improves the
granularity of caching. This leads to better
utilization of the cache in two ways.
1. Frequently accessed chunks of a query get
cached. The chunks which are not frequently
accessed are eventually replaced.
2. Previous queries can be used much more
effectively. For example, Figure 3 shows a
chunk-based cache, in which each query
represents a portion of the multidimensional
space. Say we have three queries Q1, Q2 and Q3,
issued in that order. Q3 is not contained in Q1,
Q2 or their union. Thus, methods based on query
containment will not be able to use Q1 and/or Q2
to answer Q3. With chunk-based caching, Q3 can
use the chunks it has in common with Q1 and Q2.
Only the remaining chunks, shown in
Figure 3, have to be computed. The chunked file
organization in the relational backend enables
these remaining chunks to be computed in time
proportional to their size rather than in time
proportional to the size of Q3.
Fig 3 Reuse of Cached Chunks
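The Q1, Q2, Q3 situation can be reproduced with plain set operations. The chunk coordinates below are made-up values chosen only so that Q3 overlaps both earlier queries without being contained in their union.

```python
# Chunks covered by each query (hypothetical coordinates in a 2-D chunk grid).
q1 = {(0, 0), (0, 1), (1, 0), (1, 1)}
q2 = {(2, 2), (2, 3), (3, 2), (3, 3)}
q3 = {(1, 1), (1, 2), (2, 1), (2, 2)}

cached = q1 | q2                 # chunks in the cache after Q1 and Q2 have run

contained = q3 <= cached         # query containment test: fails for Q3
reused = q3 & cached             # chunks Q3 can take from the cache
to_compute = q3 - cached         # chunks that still go to the backend
```

A containment-based cache would recompute all four chunks of Q3, while the chunk-based cache computes only the two in `to_compute`.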
5.4 Replacement Scheme using Chunks
The replacement scheme is a very important
part of this system, as the performance of future queries
largely depends on it. Old chunks have to be removed and
new chunks have to be added to the cache for
efficient caching. There are standard replacement
strategies such as LRU, but they are not efficient enough here.
Schemes which make use of a profit metric
consisting of the size and execution cost of a query are
considered in [6]. We use something similar for our
replacement scheme: we combine the TIME scheme with
the notion of benefit. Let Benefit(C) denote the benefit of a
chunk C. We associate one more quantity, called Weight(C),
with each chunk C in the cache. The replacement algorithm
is as follows:
Algorithm: TimeBenefit
Input: chunk N to be inserted in the cache
while [space not available for N]:
    Let C be the chunk corresponding to the current
    time position.
    If [Weight(C) ≤ 0]:
        Evict C from the cache
    Else:
        Weight(C) = Weight(C) - Benefit(N)
    EndIf
    Advance time position
EndWhile
Insert N into cache
Weight(N) = Benefit(N)
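A runnable reading of the TimeBenefit pseudocode is sketched below in Python. Several details are assumptions made for the sketch rather than part of the algorithm as stated: the cache is a circular list with a moving position (`hand`), sizes are in arbitrary units, a new chunk is appended at the end of the list, and a strictly positive benefit is required so that the loop is guaranteed to terminate. The class name `TimeBenefitCache` is hypothetical.

```python
class TimeBenefitCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0
        self.slots = []   # circular list of [chunk_id, size, weight]
        self.hand = 0     # current time position

    def insert(self, chunk_id, size, benefit):
        """Insert a chunk, evicting chunks whose weight has dropped to 0 or below."""
        if size > self.capacity:
            raise ValueError("chunk is larger than the cache")
        if benefit <= 0:
            raise ValueError("this sketch assumes a positive benefit")
        evicted = []
        while self.used + size > self.capacity:
            slot = self.slots[self.hand]
            if slot[2] <= 0:
                evicted.append(slot[0])
                self.used -= slot[1]
                del self.slots[self.hand]   # the hand now points at the next chunk
            else:
                slot[2] -= benefit          # Weight(C) = Weight(C) - Benefit(N)
                self.hand += 1              # advance time position
            self.hand = self.hand % len(self.slots) if self.slots else 0
        self.slots.append([chunk_id, size, benefit])   # Weight(N) = Benefit(N)
        self.used += size
        return evicted


cache = TimeBenefitCache(capacity=10)
cache.insert("a", 5, 2)
cache.insert("b", 5, 3)
evicted = cache.insert("c", 5, 2)
```

In this run the position sweeps over `a` and `b` once, lowering their weights to 0 and 1, and then evicts `a` on the second pass; `b` survives because its weight is still positive.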
5.5 DAG Representation
A DAG is a Directed Acyclic Graph. In our
implementation of a cache query optimizer
system, it is important to find an efficient
way to represent a query, so that we can
find an optimized query plan to execute. To
represent query evaluation structurally, we use
directed acyclic graphs. The DAG is a compact way
to represent a set of queries and operations. Using
the DAG, an efficient query plan can be
generated, and this query DAG can be further
used to build a query caching algorithm.
An efficient algorithm that uses the DAG structure
for queries is the Volcano Algorithm [5]. This
algorithm represents queries and sets of
queries in the form of a DAG. In this DAG, there
are two kinds of nodes: AND nodes and OR nodes.
AND nodes represent operations performed on
result sets and queries, such as
select, join and other operations on result sets.
OR nodes represent queries and result
sets. In our implementation the OR nodes
represent the subqueries which will get cached.
The OR nodes are called equivalence nodes in the
Volcano Algorithm. The equivalence nodes do
not represent any operation; they
describe the data in the system.
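The two kinds of nodes can be captured with two small record types. The Python sketch below is illustrative only; the class and field names (`OperationNode`, `EquivalenceNode`, `inputs`, `alternatives`) are chosen here and are not taken from the Volcano or Exchequer implementations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class OperationNode:
    """AND node: an operation that needs the results of all of its inputs."""
    op: str
    inputs: List["EquivalenceNode"] = field(default_factory=list)


@dataclass
class EquivalenceNode:
    """OR node: a result set, produced by any one of its alternative operations."""
    label: str
    alternatives: List[OperationNode] = field(default_factory=list)


# Base relations are equivalence nodes without alternatives.
a = EquivalenceNode("A")
b = EquivalenceNode("B")
# The result of joining A and B is an equivalence node with one way to compute it.
ab = EquivalenceNode("A join B", [OperationNode("join", [a, b])])
```

An equivalence node is a natural unit for caching because it names a result independently of which alternative operation produced it.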
How single queries are handled:
Single queries are directly
representable as a DAG. For a single query, a query
tree is first created from the relations in the
query and the operations. Once the query tree is
created, it is expanded step by step to generate
further equivalence nodes over the operational
nodes, as shown in the following diagrams. The
squares denote the equivalence nodes and the
circles denote the operational nodes.
Let us assume the query is of the form
A⋈B⋈C. It is represented as a DAG in Fig 4.a,
Fig 4.b and Fig 4.c. The relations A, B, C and the
intermediate relations are represented by
equivalence nodes. The join operators are
represented by the circle nodes. The DAG is
built in the following steps. Fig 4.a shows
the query tree for the query. The additional
equivalence node for the intermediate result
AB is created in Fig 4.b.
The DAG is then expanded to
accommodate all possible combinations of the
join operators. In Fig 4.c we take all possible
initial joins, namely AB, AC and BC.
These are stored as intermediate result sets,
which are later joined with the remaining relation
to create the final result set.
Fig 4.a - Initial Query Tree
Fig 4.b - Intermediate DAG
Fig 4.c - DAG of Single Query
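The expansion for A⋈B⋈C can be enumerated mechanically. The Python sketch below assumes that only joins are involved and represents each equivalence node by the set of relations it covers; `expand_join_dag` is a hypothetical helper, and the representation is chosen for brevity rather than taken from the Volcano optimizer.

```python
from itertools import combinations


def expand_join_dag(relations):
    """Map each relation subset (equivalence node) to its alternative joins."""
    rels = sorted(set(relations))
    dag = {}
    for k in range(1, len(rels) + 1):
        for subset in combinations(rels, k):
            node = frozenset(subset)
            alternatives = set()
            # Each split of the subset into two non-empty parts is one join
            # (operational node); the frozenset removes mirrored duplicates.
            for j in range(1, k):
                for left in combinations(subset, j):
                    left = frozenset(left)
                    alternatives.add(frozenset([left, node - left]))
            dag[node] = alternatives
    return dag


dag = expand_join_dag(["A", "B", "C"])
root_alternatives = dag[frozenset("ABC")]
```

The result has seven equivalence nodes (A, B, C, AB, AC, BC, ABC), and the root has three alternatives, one for each of the initial joins AB, AC and BC.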
How query sets are handled:
Query sets are handled a little differently
in the Volcano Algorithm. In this version of the
Volcano Algorithm, the intermediate equivalence
nodes represent the result sets. Each query set is a
set of multiple queries. Queries are deleted
using a reference counting
mechanism. The queries are added into the DAG
one at a time. Each time a query is inserted,
new equivalence nodes and operational nodes are
created.
Sometimes, the expressions in a new query may
match existing subexpressions in the DAG, and
subexpressions may be equivalent even when they
are written differently. The Volcano
optimizer algorithm handles these
cases. An example is the
problem arising from the associativity of the join
operator in multi-relation queries. The Volcano
Algorithm applies associativity and then
unifies the equivalent nodes by replacing them
with a single equivalence node.
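Unification and reference counting can be illustrated together. The Python sketch below assumes join-only subexpressions, so that a set of relation names is a sufficient canonical key; a real optimizer would need a richer key that also covers selections and other operators. The `DagIndex` class is a hypothetical name.

```python
class DagIndex:
    """Tracks how many queries reference each unified equivalence node."""

    def __init__(self):
        self.refcount = {}

    @staticmethod
    def key(relations):
        # Joins over the same relations share one key, whatever their order
        # or grouping, which is the effect of applying associativity.
        return frozenset(relations)

    def add_query(self, subexpressions):
        for rels in subexpressions:
            k = self.key(rels)
            self.refcount[k] = self.refcount.get(k, 0) + 1

    def remove_query(self, subexpressions):
        for rels in subexpressions:
            k = self.key(rels)
            self.refcount[k] -= 1
            if self.refcount[k] == 0:
                del self.refcount[k]   # no query needs this node any more


index = DagIndex()
query1 = [("A", "B"), ("A", "B", "C")]
query2 = [("B", "A"), ("A", "B", "D")]
index.add_query(query1)
index.add_query(query2)
shared = index.refcount[frozenset("AB")]
index.remove_query(query1)
```

Both queries share the node for A⋈B even though they list the relations in a different order, so removing the first query leaves that node in place.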
5.6 Query Optimization over DAG
Now that the DAG is created, the
algorithm computes the cost of each node
based on its type. The costs of
equivalence nodes and operational nodes are
evaluated separately. The optimizer also takes into account
the cost of reading the input when pipelines are not used.
The cost at each node is a function of its children and the
subtrees below it.
For an operational node o the cost function is defined as
$$\mathrm{cost}(o) = \text{cost of executing}(o) + \sum_{e_i \in \mathrm{children}(o)} \mathrm{cost}(e_i)$$
The children of o are all equivalence nodes. The cost of
each equivalence node e is
$$\mathrm{cost}(e) = \begin{cases} \min\{\mathrm{cost}(o_i) \mid o_i \in \mathrm{children}(e)\} & \text{if } e \text{ has children} \\ 0 & \text{if there are no children} \end{cases}$$
We now have to take into account the case where
some subset of nodes may be materialized and we may
need to reuse these materialized nodes. We introduce a new
function reusecost(e_i), which gives the cost of reusing an
equivalence node e_i from the materialized set
M.
The modified cost function is then
$$\mathrm{cost}(o) = \text{cost of executing}(o) + \sum_{e_i \in \mathrm{children}(o)} CC(e_i)$$
where
$$CC(e_i) = \begin{cases} \mathrm{cost}(e_i) & \text{if } e_i \notin M \\ \min(\mathrm{cost}(e_i), \mathrm{reusecost}(e_i)) & \text{if } e_i \in M \end{cases}$$
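The two mutually recursive cost functions translate directly into code. The Python sketch below uses dictionaries keyed by node names and made-up cost numbers; the names `alternatives`, `inputs`, `exec_cost` and `reuse_cost` are assumptions of this example, and no memoization is done, so it is suitable only for small DAGs.

```python
def operation_cost(o, alternatives, inputs, exec_cost, reuse_cost, materialized):
    """cost(o) = execution cost of o + sum of CC(e) over its input nodes."""
    total = exec_cost[o]
    for e in inputs[o]:
        c = equivalence_cost(e, alternatives, inputs, exec_cost,
                             reuse_cost, materialized)
        if e in materialized:
            c = min(c, reuse_cost[e])   # CC(e) for a materialized node
        total += c
    return total


def equivalence_cost(e, alternatives, inputs, exec_cost, reuse_cost, materialized):
    """cost(e) = cheapest alternative operation, or 0 if e has no children."""
    ops = alternatives.get(e, [])
    if not ops:
        return 0
    return min(operation_cost(o, alternatives, inputs, exec_cost,
                              reuse_cost, materialized) for o in ops)


# Toy DAG for A join B join C with two alternatives at the root.
alternatives = {"ABC": ["o1", "o2"], "AB": ["o3"], "BC": ["o4"]}
inputs = {"o1": ["AB", "C"], "o2": ["A", "BC"], "o3": ["A", "B"], "o4": ["B", "C"]}
exec_cost = {"o1": 10, "o2": 20, "o3": 50, "o4": 5}
reuse_cost = {"AB": 3}

no_cache = equivalence_cost("ABC", alternatives, inputs, exec_cost, reuse_cost, set())
with_ab = equivalence_cost("ABC", alternatives, inputs, exec_cost, reuse_cost, {"AB"})
```

Without any materialized node the cheapest plan goes through BC at a cost of 25; once AB is materialized with a reuse cost of 3, the plan through AB becomes cheaper at 13.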
5.7 Algorithm to Handle Cache Deletion and Insertion
The proposed algorithm determines whether any of the nodes
in the DAG and the chunk system are worth caching. We
need to find the benefit of adding or deleting these
nodes. We define a benefit function for the DAG which is
similar to the Benefit(N) function in the chunk system.
The benefit function also takes into account the number of
times each query was used. The proposed optimizer
needs to know the nodes selected when the previous query
was considered and all the nodes of the query plan. Now
suppose S is a set of nodes selected to be cached and R is
the representative set of queries. Then, summing over the
queries q in R,
$$\mathrm{cost}(R, S) = \sum_{q \in R} \mathrm{cost}(q, S) \cdot \mathrm{weight}(q)$$
The benefit function is then
$$\mathrm{benefit}(R, x, S) = \mathrm{cost}(R, S) - \bigl(\mathrm{cost}(R, \{x\} \cup S) + \mathrm{cost}(x, S)\bigr)$$
This gives the benefit we get by caching node
x. In cases where x is already computed, we
take cost(x, S) to be 0.
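A small numeric example may help in reading the benefit formula. In the Python sketch below the per-query cost model is a stand-in supplied by the caller; the toy numbers, the `query_cost` and `compute_cost` callables and the `already_computed` flag are all assumptions of the example.

```python
def workload_cost(queries, weights, cached, query_cost):
    """cost(R, S): weighted sum of the query costs given the cached set S."""
    return sum(query_cost(q, cached) * weights[q] for q in queries)


def benefit(queries, weights, x, cached, query_cost, compute_cost,
            already_computed=False):
    """benefit(R, x, S) = cost(R, S) - (cost(R, S + {x}) + cost(x, S))."""
    before = workload_cost(queries, weights, cached, query_cost)
    after = workload_cost(queries, weights, cached | {x}, query_cost)
    cost_x = 0 if already_computed else compute_cost(x, cached)
    return before - (after + cost_x)


# Toy cost model: q1 costs 100 and q2 costs 40; caching "AB" saves 60 on q1.
base = {"q1": 100, "q2": 40}
weights = {"q1": 2, "q2": 1}


def query_cost(q, cached):
    return base[q] - (60 if q == "q1" and "AB" in cached else 0)


def compute_cost(x, cached):
    return 30


fresh = benefit(["q1", "q2"], weights, "AB", frozenset(), query_cost, compute_cost)
reused = benefit(["q1", "q2"], weights, "AB", frozenset(), query_cost, compute_cost,
                 already_computed=True)
```

The workload costs 240 without the node and 120 with it, so the benefit is 90 when AB must first be computed at cost 30, and 120 when it is already available.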
We can now create a modified algorithm that
handles cache deletions and insertions.
Algorithm: TimeBenefit
Input: chunk N to be inserted in the cache
       Set X of nodes of the expanded DAG that are cached
       Node x with benefit(R, x, X)
while [space not available for N]:
    Let C be the chunk corresponding to the current
    time position.
    If [Weight(C) ≤ 0]:
        Evict C from the cache
        Delete x and its equivalence nodes from X
    Else:
        Weight(C) = Weight(C) - Benefit(N)
    EndIf
    Advance time position
EndWhile
Insert N into cache
Weight(N) = Benefit(N)
Algorithm: Modify_Cache
Input:
    Expanded DAG for R, the representative set of queries,
    and the set of candidate equivalence nodes for caching
    Chunk N to be inserted
Output: Set of nodes to be cached
X = φ
Y = set of candidate equivalence nodes for caching
while (Y ≠ φ)
    L1:
    Among nodes y ∈ Y such that size({y} ∪ X) < CacheSize,
    pick the node x with the highest benefit(R, x, X)/size(x)
    /* i.e., highest benefit per unit space */
    if (benefit(R, x, X) < 0)
        break; /* No further benefits to be had, stop */
    TimeBenefit(N, X, x)
    Y = Y - {x}; X = X ∪ {x}
return X
The Modify_Cache algorithm thus handles the
deletion of nodes and chunks from the optimizer and also
selects the set of results to cache for the queries.
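The greedy selection loop of Modify_Cache can be sketched in Python as follows. The sketch covers only the choice of nodes by benefit per unit of space and omits the TimeBenefit call that updates the chunk cache; it also stops when no candidate fits, a case the pseudocode leaves open. The `benefit` callable stands in for benefit(R, x, X), and all names and numbers are assumptions of the example.

```python
def modify_cache(candidates, size, benefit, cache_size):
    """Greedily pick nodes with the highest benefit per unit of space."""
    selected = set()                 # X in the pseudocode
    remaining = set(candidates)      # Y in the pseudocode
    used = 0
    while remaining:
        # Only nodes that still fit next to the already selected ones qualify.
        fitting = sorted(y for y in remaining if used + size[y] < cache_size)
        if not fitting:
            break
        x = max(fitting, key=lambda y: benefit(y, selected) / size[y])
        if benefit(x, selected) < 0:
            break                    # no further benefit to be had
        remaining.remove(x)
        selected.add(x)
        used += size[x]
    return selected


size = {"AB": 4, "BC": 3, "AC": 5}
toy_benefit = {"AB": 40, "BC": 15, "AC": -5}


def benefit_fn(y, selected):
    # Toy benefit that ignores the nodes already selected.
    return toy_benefit[y]


small = modify_cache(["AB", "BC", "AC"], size, benefit_fn, cache_size=8)
large = modify_cache(["AB", "BC", "AC"], size, benefit_fn, cache_size=20)
```

With a cache of size 8 the loop ends because AC no longer fits; with a cache of size 20 it ends because AC has a negative benefit. Both runs select AB and BC.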
5.8 Handling Frequent Updates
Our solution system also handles frequent updates
made to the database and keeps the cache
coherent with it. As new data keeps getting added to the databases,
the cache needs to be modified, and the affected chunks
in the chunk system must be discarded or modified.
To do this we create a submodule which acts
as a proxy between the database and the client. As and
when the client system proposes changes to the database,
the data first enters the proxy system. The proxy system
then decides which relations, and which attributes in each
relation, need to be modified in the database. The proxy
module is also connected to the Optimizer and Cache
Manager. This ensures that the chunks are mapped onto
the respective relation attributes in the proxy. Whenever a
new update enters the proxy module, it creates a map
pointer to the necessary chunks in the chunk system.
The proxy module then finds out which chunks in the
chunk system need to be modified. Once this mapping
is created, it sends the update to the database to be
applied. Further analysis of the proxy system is
left to future work.
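One possible shape for the proxy's mapping is sketched below in Python. Since the design of the proxy is left open in the text, everything here is an assumption: chunks are registered against (relation, attribute) pairs, stale chunks are simply discarded rather than modified, and `send_to_database` is a caller-supplied function. The `UpdateProxy` class is a hypothetical name.

```python
from collections import defaultdict


class UpdateProxy:
    """Maps relation attributes to the cached chunks that depend on them."""

    def __init__(self):
        self.chunks_by_attribute = defaultdict(set)

    def register_chunk(self, chunk_id, attributes):
        for attr in attributes:
            self.chunks_by_attribute[attr].add(chunk_id)

    def affected_chunks(self, updated_attributes):
        affected = set()
        for attr in updated_attributes:
            affected |= self.chunks_by_attribute.get(attr, set())
        return affected

    def apply_update(self, updated_attributes, cache, send_to_database):
        # Find the stale chunks first, then forward the update to the database.
        stale = self.affected_chunks(updated_attributes)
        for chunk_id in stale:
            cache.pop(chunk_id, None)   # discard; modifying in place is not shown
        send_to_database()
        return stale


proxy = UpdateProxy()
proxy.register_chunk("c1", [("orders", "quantity")])
proxy.register_chunk("c2", [("orders", "quantity"), ("part", "name")])
proxy.register_chunk("c3", [("customer", "region")])

cache = {"c1": "...", "c2": "...", "c3": "..."}
sent = []
stale = proxy.apply_update([("part", "name")], cache, lambda: sent.append("update"))
```

Only the chunk that depends on the updated attribute is dropped; the others stay valid and continue to serve queries.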
6. EVALUATION OF RESULTS
6.1 Evaluating the use of DAG
The DAG-based approach is also followed by the
Exchequer system. In this method the query is turned into
a set of nodes that are evaluated as a
DAG. This DAG is then expanded to analyze the nodes, and
the operations on the nodes are performed one by one.
We then obtain the cached nodes of the DAG and the nodes
which need to be deleted from the DAG.
The queries used for evaluation have the following
structure.
SELECT SUM(QUANTITY)
FROM ORDERS, SUPPLIER, PART,
CUSTOMER, TIME
WHERE join-list AND select-list
GROUP BY groupby-list;
The schema has a central Orders fact table and the dimension
tables Part, Supplier, Customer and Time. The sizes of these
tables are the same as those used in [1]. The join-list
enforces equality between attributes of the Orders fact table
and the primary keys of the dimension tables. The select-list
is generated by selecting 0 to 2 attributes from the join-list.
The groupby-list is generated by picking at random a subset of
all the keys. These queries were chosen so that a fair
comparison can be made.
The metric used to measure the quality of the algorithm
is the total response time of a set of queries. The report is
generated for a sequence of 100 queries, after 50 queries
have already been run to warm up the cache. This total
response time is represented by the estimated cost, which is
calculated using the cost functions defined in the sections
on query optimization over the DAG and on cache deletion
and insertion.
For the evaluation, the size of the representative set is
initially set to 10. We then test different cache sizes of
around 5%, 10% and 20% of the database size.
We compare our algorithm with the Exchequer system, with
a system which has no cache (NoCache), and with the LRU method
for cache management. The LRU policy is widely used in
database management systems; it picks the least recently used
chunk and DAG node to be replaced. LRU is
unaware of the workload.
Analyzing the Estimated Response Time:
Initially the cache size is kept at a minimum, and
the algorithms are run on the queries. The
Exchequer algorithm and our Modify_Cache perform
better than the NoCache system. The size of the cache
is then increased to accommodate 5% of the database, and the
algorithms show an improvement in performance.
This may be because a larger cache makes it
easier for the algorithms to find chunks which can
answer the queries on their own. With small
caches this is a problem, as it becomes costlier and
slower to find chunks that contain answers and then
to apply the replacement methods for the DAG and the
chunks.
For small cache sizes the Modify_Cache algorithm performs
better than the Exchequer algorithm, with a higher rate of
improvement. However, when the cache size is increased to
30%, the estimated cost increases for the Exchequer system.
Beyond a certain cache size, the effort the system invests in
caching the extra results that fit in the larger cache does not
improve the Benefit factor. With the cache at 30% of the
database size, the Modify_Cache algorithm still returns a
better estimated time than the Exchequer algorithm.
6.2 Evaluating for Chunks
We considered several performance measures to evaluate
the effectiveness of the schemes we have employed.
1. Average execution time: Using our system we executed
100 queries and calculated the average execution time.
2. Cost Saving Ratio (CSR): This performance measure is the
percentage of the total cost of the queries saved due to hits
in the cache. The cost of executing a query at the backend
is used to compute the savings in cost due to a cache hit. Consider
a query stream consisting of a mix of n queries q1, q2, …, qn.
ci: the cost of executing query qi at the backend
hi: the number of references to qi that were satisfied from the cache
ri: the total number of references to the query qi
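The formula for the CSR does not survive in the text above. The usual definition of this metric, which is consistent with the three symbols just listed, is given below; it is stated here as an assumption about what the missing formula was. Each reference to a query is weighted by that query's backend cost, so a hit on an expensive query counts for more than a hit on a cheap one.

```latex
\mathrm{CSR} = \frac{\sum_{i=1}^{n} c_i \, h_i}{\sum_{i=1}^{n} c_i \, r_i}
```

For example, with two queries where c_1 = 10, h_1 = 3, r_1 = 4 and c_2 = 2, h_2 = 0, r_2 = 5, the ratio is (10·3 + 2·0) / (10·4 + 2·5) = 30/50 = 0.6.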
Comparing the CSR of a query-based cache and a chunk-based
cache, we found that the query-based
cache gives a value of 0.42 because of redundant
storage in the cache, while for the chunk-based cache the CSR
was 0.98, showing that the cache storage was not redundant.
7. FUTURE WORK
The concept of using queries for caching can be further
optimized. Our Modify_Cache algorithm currently uses only
the Benefit and Weight functions to evaluate the utility
of cached chunks. It can be improved by
accommodating other aspects of the chunks, and by
taking into account the different operations that can be
performed on OLAP databases. Further work can be
done on improving the run time of our algorithm. Work also
needs to be done to make better use of the DAGs and to improve
the run time of the DAG expansions. The time and
additional space required for the DAGs to store query
cache information play a crucial role, and need to be
taken into account in further work. More work also needs
to be done on how frequent updates of the
databases can be handled. Our algorithm, at this point in
time, cannot efficiently handle high-frequency cache
updates. Advanced methods need to be implemented
to identify high-frequency cache updates and hence
maintain the efficiency and consistency of the cache.
8. CONCLUSION
Thus it can be seen that the use of chunks and a DAG in
implementing a cache management system proves useful.
The performance of the query engine improves with the
use of the Modify_Cache algorithm. The estimated time to
run the queries also decreases substantially with the use of
query caching and the use of chunks to store cached results.
The use of chunks in caching helps in making use of large
datastores and reduces the time needed to run OLAP queries
on the same datastores.
9. REFERENCES
[1] Don't Trash your Intermediate Results, Cache 'em,
Prasan Roy, Krithi Ramamritham, S. Seshadri, Pradeep
Shenoy, S. Sudarshan, IIT Bombay.
http://arxiv.org/abs/cs.DB/0003005
[2] Composite Data Virtualization, White Paper, Composite
Software.
http://cdn.information-management.com/media/pdfs/CompositeDVPerformance.pdf
[3] Caching Multidimensional Queries Using Chunks,
Prasad M. Deshpande, Karthikeyan Ramasamy, Amit
Shukla, Jeffrey F. Naughton, University of Wisconsin,
Madison.
[4] Usability-based caching of query results in OLAP
systems, Chang-Sup Park, Myoung Ho Kim, Yoon-Joon
Lee.
http://www.cin.ufpe.br/~bmcr/public_html/Usability-based_caching_of_query.pdf
[5] Extensibility and Search Efficiency in the Volcano
Optimizer Generator, Goetz Graefe and William J. McKenna,
Technical Report CU-CS-91-563, University of
Colorado at Boulder, December 1991.
[6] WATCHMAN: A Data Warehouse Intelligent Cache
Manager, P. Scheuermann, J. Shim and R. Vingralek,
VLDB Conf. 1996.

Final report group2

  • 1. An Efficient Cache Handling Technique in Database Systems Abhishek Shah George Sam Nikhil Lakade rakeshsh@usc.edu gsam@usc.edu lakade@usc.edu University of Southern University of Southern University of Southern California California California 1. ABSTRACT In various commercial database systems, the queries which have complex structures often take longer time to execute. The efficiency of the query processing can be greatly improved if the results of the previous queries are stored in the form of caches. These caches can be used to answer queries later on. Furthermore, the cost factor to process large and complex queries is huge in commercial databases due to the size of the databases, and hence we need a way to optimize processing by automatically caching the intermediate results. Creating such an automatic system to cache the results would help in saving time. Existing cache systems do manage to store the intermediate results, but they suffer from the problem of not knowing how efficiently to use the cache memory to store the results. It also becomes a problem, if the database gets regularly updated. The cache would then become obsolete. It is necessary to decide when to discard a cache and the frequency of checking the updates in the database. 2. INTRODUCTION 2.1 The Problem Over the past few years there has been an increase the need for use of Large Data Warehouses and OLAP[2] (On-Line Analytical Processing) systems in managing large scale data. These techniques provide an efficient and economical way of storing large data and extracting useful information from it. These systems have proven useful in multiple applications. The most common being in the use of Data-Warehouses of Super-Marts, like Walmart. increase the need for use of Large Data Warehouses and OLAP (On-Line Analytical Processing) systems in managing large scale data. These techniques provide an efficient and economical way of storing large data and extracting useful information from it. 
These systems have proven useful in multiple applications. The most common being in the use of Data-Warehouses of Super-Marts, like Walmart. The query processing time for OLAP and decision support usually ranges from minutes to hours, but depends mostly on the extent of the database, the type of query and the processing capabilities of the servers. Usually, large scale servers require minimal time to process the decision support queries. This processing time however, reduces vastly if multiple queries are executed simultaneously, or if the structure of the query is complex. Complex queries are built up of many sub queries, and the result set of the final query depends on the result set obtained by executing the sub queries. Traditional databases employ the method of treating every query independently. However, this results in increase in processing time. Moreover, if there is a scenario where a particular query is frequently used, then every time one needs to fire the same query, which brings the problem of redundancy in processes. 2.2 Challenges a. One main concern is knowing and deciding which cache entries are to be deleted, and whether to delete the entire cache or just part of the cache. This becomes crucial if part of the result is needed for further queries. b. Many commercial database systems are frequently updated. Our challenge here is to update the results of the intermediate queries as and when CRUD[1] operations take place on the database. 3. MOTIVATION Seeing how commercial databases and OLAP struggle to fetch query results efficiently, having an automated system closely coupled with
  • 2. the query optimizer to cache results and manage them will provide an efficient way to retrieve data. For example, consider a website which has to fetch data from a large database every time a page is accessed by a user. The load time of the page increases significantly because of large data and relatively slow query processing time. It is also useful if the system can automatically decide when to discard or keep a particular cache entry, and handle frequent database updates. 4. RELATED WORK 4.1 Exchequer Exchequer[1] is an intermediate query caching system developed for the purpose of storing relevant sub query results. The authors of the paper on this system have differentiated their system from normal Operating System caching techniques based on the following aspects: a. In traditional cache systems, size and cost of computation aren’t considered, while the recency of a used data object is sufficient. b. In a query cache system, results from previous cached queries can be used further for use. c. In the traditional cache systems, the pages are independent of each other, and so can be easily deleted without affecting other pages. In the Exchequer system, they take into account the dynamic nature of the cache, so here the traditional materialized view or an index solution will not work efficiently. In the materialized view scenario, there are techniques to decide which entities to materialize and other previous materialized views are taken into consideration. This does not work in a static cache system. Thus exchequer uses the dynamic nature of cache system. Another system uses multi-query caching, and takes into account the cost of materializing the selected views, but makes static decision on what to materialize. . The Exchequer system also uses an AND-OR DAG representation of queries and the cached results. 
The use of DAG makes it extensible for new operations and efficiently encodes alternative ways of evaluating queries .The exchequer DAG representation also takes into account sort orders and presence of indexes. In the exchequer architecture, there is a tight coupling between cache manager and optimizer. It uses a query execution plan to refer to cached relations which is got by the execution engine and the new intermediate query results produced by the query are sometimes cached. It uses an incremental greedy algorithm to decide which results should be cached. The algorithm first checks if any of the nodes of the chosen query plan should be cached. The incremental decision is made by updating the representative set with a new query and a selection algorithm is applied to the nodes selected when the previous query was considered and the nodes of the chosen query plan. The output of this algorithm is a set of nodes that are marked for caching and the best plan for the current query. Thus when the query gets executed, the nodes in its best plan that are marked are added to cache, which then replace the unmarked nodes. The unmarked nodes are chosen for replacement using LCS/LRU, i.e. the largest results are evicted, and amongst the remaining results, the least recently used is evicted. The exchequer optimizes the query before fresh caching decisions are made the chosen plan for each query is optimal for that query, given the current cache contents. 4.2 Composite Information Server Platform & Query Processing Engine: Composite Information Server Platform[2] is an intermediate server platform that works over REST, HTTP and SOAP protocols in a client- server web application environment. It is provided by Oracle. It receives client requests and authenticates them either through LDAP or Composite Domain Authentication and then passes it to the Query Processing Engine. The query processing engine then executes this request over some data source and retrieves a data result. 
It then combines this data into a single SQL or XML result set and returns to the client. The Query Processing Engine provides various optimization methods so that the SQL query is efficient. It basically translates all requests into a distribution plan. It then analyzes queries and creates an optimized execution plan to determine the intermediate steps and data source requests. The Query Processing Engine also employs a Caching technique for the queries. These sequence of queries are then executed against the relevant data. The engine minimizes the overheads and creates efficient join methods that leverage
  • 3. against a suboptimal query. The techniques that the query engine provide include: a. SQL Pushdown: The Query Processing Engine offloads most of the query processing. It pushes down the select query operations like string searches, comparisons, sorting etc. into the underlying relational data sources. b. Parallel Processing: The Query Engine processes requests in a parallel and asynchronous way on separate threads, thus reducing the wait time and data source response latency c. Caching: The Composite Information Server is configured to cache the results of query, web service calls and the procedures. It does this on a per view/query basis. It stores this intermediate results in either relational database or in a file based cache. The Engine always checks if the result of the query is already present in the cache, and uses this cache data. It is most useful when used on data which is frequently invoked and which change rarely. In the scenario where the data is constantly changing, the query engine does not perform very well and cannot handle frequent changes to data. 4.3 Multidimensional Query Cache using Chunks To improve query response time in OLAP caching of queries has been proposed, which consists of mainly two approaches, table level caching and query level caching. A proposed previous work uses chunks[3] to reuse results of previous queries to answer future queries. To achieve performance later, chunked cache is combined with chunked file organization. Chunk file organization is basically redefines the organization for relation tables. This new organization of chunk files reduces cost chunk cache miss. Concern about this methodology is select required chunk. Smaller chunks results into efficient query optimization but efficiency downgrades when total number of chunks increases in system and hence another paradigm comes, which is to decide replace policies for chunked caches. 
4.4 Usability Based Caching
Another similar line of work is the usability-based caching[4] of query results in OLAP systems. It proposes a new cache management scheme for OLAP systems that is based on the usability of query results in rewriting and processing related future queries. It not only takes into consideration the queries that are currently being executed, but also predicts future queries from the present and past queries executed on the system, using a probability model.
5. SOLUTION SYSTEM
5.1 Architecture
The architecture of our proposed optimizer model is shown in Figure 1.
Fig 1. Architecture of System
The optimizer and cache manager work in close coupling with our intermediate query cache system. The optimizer uses the chunked cache to efficiently cache incoming database queries. The query execution plan and the cache management plan are designed inside the Optimizer and Cache Manager. This block is responsible for changing the current cache. The query execution plan is created using the cached chunks. The chunked cache is obtained from the Execution Engine as and when required.
5.2 Use of Chunks
Systems such as OLAP are becoming increasingly dynamic and increasingly important for business data analysis. Usually the data sets in such systems are multidimensional in nature. Traditional relational systems are designed in such a way that they cannot provide
the required performance for these data sets. Hence such systems are built using a three-tier architecture. The first tier provides an easy-to-use graphical tool that allows users to build queries. The second tier provides a multidimensional view of the data stored in the final tier, which can be an RDBMS. Queries in systems like OLAP are very interactive and demand quick response times even when they are complex. At times OLAP queries are repetitive in nature and follow a predictable pattern. An OLAP session can be characterized using different kinds of locality.
1. Temporal: The same data might be accessed repeatedly by the same user or by a different user.
2. Hierarchical: This kind of locality is specific to the OLAP domain and is a consequence of the presence of hierarchies on the dimensions. Data members that are related by parent/child or sibling relationships will be accessed over and over again. For example, if a user is looking at data for the United States, the next query is likely to be about Canada or Mexico.
We use a dynamic caching scheme in which the cache contents vary over time, since new items may be inserted and old items may be removed from the cache. A dynamic approach is significantly beneficial at the middle tier, since it adapts to the query profile. We use chunks for dynamic caching and demonstrate its feasibility under realistic query workloads without much overhead. We use multidimensional arrays to represent data. Instead of storing a large array in simple row or column order, we break it down into chunks and store it in a chunked format. The distinct values of each dimension are divided into ranges, and chunks are created based on this division. Figure 2 shows how a multidimensional space can be broken up into chunks.
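As a rough illustration of this division into ranges, the sketch below maps a cell of a two-dimensional space to a chunk number and lists the chunks that a range query overlaps. The dimension names, the range boundaries and the row-major chunk numbering are assumptions made for the example, not details of the system described here.

```python
from bisect import bisect_right
from itertools import product

# Hypothetical dimensions and range boundaries: each list holds the first
# value of every range along that dimension.
DIMENSIONS = ["product", "time"]
RANGE_STARTS = {
    "product": [0, 4, 8],  # ranges [0,4), [4,8), [8,...)
    "time": [0, 6],        # ranges [0,6), [6,...)
}


def range_index(dim, value):
    """Index of the range along `dim` that contains `value`."""
    return bisect_right(RANGE_STARTS[dim], value) - 1


def chunk_number_from_indices(indices):
    """Row-major chunk number for one range index per dimension."""
    number = 0
    for dim, idx in zip(DIMENSIONS, indices):
        number = number * len(RANGE_STARTS[dim]) + idx
    return number


def chunk_number(cell):
    """Chunk number of a single cell given as {dimension: value}."""
    return chunk_number_from_indices(
        [range_index(dim, cell[dim]) for dim in DIMENSIONS]
    )


def chunks_for_query(bounds):
    """Chunk numbers overlapped by a query box {dimension: (low, high)}."""
    per_dim = [
        range(range_index(dim, bounds[dim][0]),
              range_index(dim, bounds[dim][1]) + 1)
        for dim in DIMENSIONS
    ]
    return [chunk_number_from_indices(combo) for combo in product(*per_dim)]
```

With these assumed boundaries, a query over products 3 to 5 and times 0 to 2 overlaps chunks 0 and 2, so only those two chunks need to be looked up in the cache or computed.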
Fig 2. Chunks
5.3 Caching with the Chunks
In this type of caching, the query results to be stored in the cache are broken up into chunks and the chunks are cached. When a user issues a new query, the chunks required to answer that query are determined. Depending on the content available in the cache, the list of chunks is divided into two parts. One part is answered from the cache. The other, consisting of the missing chunks, has to be computed from the backend. The cost is reduced because only the missing chunks are computed from the backend. Caching chunks improves the granularity of caching. This leads to better utilization of the cache in two ways.
1. Frequently accessed chunks of a query get cached. The chunks which are not frequently accessed are eventually replaced.
2. Previous queries can be used much more effectively. For example, Figure 3 shows a chunk based cache in which each query represents a portion of the multidimensional space. Say we have three queries Q1, Q2 and Q3, issued in that order. Q3 is not contained in Q1, in Q2, or in their union. Thus, methods based on query containment will not be able to use Q1 and/or Q2 to answer Q3. With chunk based caching, Q3 can use the chunks it has in common with Q1 and Q2. Only the remaining chunks, which are shown in Figure 3, have to be computed. The chunked file organization in the relational backend enables these remaining chunks to be computed in time
proportional to their size rather than in time proportional to the size of Q3.
Fig 3. Reuse of Cached Chunks
5.4 Replacement Scheme using Chunks
The replacement scheme is a very important part of this system, as the performance on future queries largely depends on it. Old chunks have to be removed and new chunks have to be added to the cache for caching to be efficient. There are simple replacement strategies such as LRU, but they are not efficient enough. Schemes which make use of a profit metric consisting of the size and execution cost of a query are considered in [6]. We do something similar for our replacement scheme: we combine the TIME scheme with the notion of benefit. Let Benefit(C) denote the benefit of a chunk C. We associate one more quantity, called Weight(C), with each chunk C in the cache. The replacement algorithm is as follows:
Algorithm: TimeBenefit
Input: chunk N to be inserted in the cache
while [space not available for N]:
    Let C be the chunk corresponding to the current time position
    If [Weight(C) ≤ 0]:
        Evict C from the cache
    Else:
        Weight(C) = Weight(C) - Benefit(N)
    EndIf
    Advance time position
EndWhile
Insert N into cache
Weight(N) = Benefit(N)
5.2 DAG Representation
A DAG is a Directed Acyclic Graph. In our implementation of a cache query optimizer system, it is important to find an efficient way to represent a query, so that an optimized query plan can be found for execution. Query execution is optimized if we have an efficient query plan. To evaluate queries structurally, we use the concept of directed acyclic graphs. The DAG is a way to represent a set of queries and operations efficiently. Using the DAG, an efficient query plan can be generated, and this query DAG can further be used to create a query caching algorithm. An efficient algorithm that uses the DAG structure for queries is the Volcano Algorithm [5]. This algorithm represents individual queries and sets of queries in the form of a DAG.
In a DAG, there are two sets of nodes: AND nodes and OR nodes. AND nodes represent operations performed on result sets and queries, such as select and join. OR nodes represent queries and result sets. In our implementation the OR nodes represent the subqueries which will get cached. The OR nodes are called equivalence nodes in the Volcano Algorithm. The equivalence nodes do not represent any operation; they only describe the data in the system.
How single queries are handled: Single queries are directly representable as a DAG. For a single query, a query tree is first created using the relations and the operations in the query. Once the query tree is created, it is expanded step by step to generate further equivalence nodes over the operational nodes. This is shown in the following diagrams, where the squares denote the equivalence nodes and the circles denote the operational nodes. Let us assume the query is of the form A⋈B⋈C. It is represented as a DAG in Fig 4.a, Fig 4.b and Fig 4.c. The relations A, B and C and the intermediate relations are represented by the equivalence nodes. The join operators are represented by the circle nodes. The construction proceeds in the following steps. Fig 4.a shows the query tree for the query. The additional
equivalence nodes for the intermediate results are created for AB in Fig 4.b. Now the DAG is expanded to accommodate all possible combinations of the join operators. In Fig 4.c we take all possible initial joins, namely AB, AC and BC. These are then stored as intermediate result sets and are later used by the join query to create the final result set.
Fig 4.a. Initial Query Tree
Fig 4.b. Intermediate DAG
Fig 4.c. DAG of Single Query
How query sets are handled: Query sets are handled a little differently in the Volcano Algorithm. In this version of the Volcano Algorithm, the intermediate equivalence nodes represent the result sets. Each query set is a set of multiple queries. The deletion of queries in this case is done by a reference counting mechanism. The queries are added into the DAG one at a time. Each time a query is inserted, a new equivalence node and a new operational node are created. Sometimes the expressions may match existing subexpressions in the DAG, and query subexpressions may be equivalent to each other. The Volcano optimizer algorithm handles these cases. An example is the problem arising from the associativity of the join operator in multi-relation queries. The Volcano Algorithm applies associativity and then unifies the resulting nodes by replacing them with a single equivalence node.
5.3 Query Optimization over DAG:
Now that the DAG is created, the algorithm performs certain functions on the nodes to evaluate the cost of each node based on its type. The costs of equivalence nodes and of operational nodes are evaluated separately. The optimizer also takes into account the cost of reading the input when pipelines are not used. The cost at each node is a function of its children and the subtrees below it.
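To make this bottom-up evaluation concrete, the sketch below builds a small AND-OR DAG for the join A⋈B⋈C and computes its cost. It assumes the usual Volcano-style rule: an operational node costs its own execution plus the cost of its input equivalence nodes, and an equivalence node costs as much as its cheapest alternative, or 0 when it is a base relation. The class names and the numeric execution costs are illustrative assumptions only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class EquivalenceNode:
    """OR node: a result set that any one of its child operations can produce."""
    name: str
    operations: List["OperationNode"] = field(default_factory=list)


@dataclass
class OperationNode:
    """AND node: an operation (for example a join) that needs all its inputs."""
    name: str
    exec_cost: float
    inputs: List[EquivalenceNode] = field(default_factory=list)


def cost_of_equivalence(node):
    # A node with no producing operation is a base relation; it costs 0 here.
    if not node.operations:
        return 0.0
    return min(cost_of_operation(op) for op in node.operations)


def cost_of_operation(op):
    # Own execution cost plus the cost of every input result set.
    return op.exec_cost + sum(cost_of_equivalence(e) for e in op.inputs)


def build_example():
    """Expanded DAG for A join B join C with made-up execution costs."""
    a, b, c = (EquivalenceNode(n) for n in "ABC")
    ab = EquivalenceNode("AB", [OperationNode("A join B", 10.0, [a, b])])
    ac = EquivalenceNode("AC", [OperationNode("A join C", 40.0, [a, c])])
    bc = EquivalenceNode("BC", [OperationNode("B join C", 25.0, [b, c])])
    return EquivalenceNode("ABC", [
        OperationNode("AB join C", 30.0, [ab, c]),
        OperationNode("AC join B", 5.0, [ac, b]),
        OperationNode("BC join A", 20.0, [bc, a]),
    ])
```

In this example the three alternatives for ABC cost 40, 45 and 45, so the root equivalence node takes the cost of the plan that joins AB first. The recursion recomputes shared subtrees, which is acceptable for a sketch; a real optimizer would memoize node costs.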
For an operational node o the cost function is defined as
$$\mathrm{cost}(o) = \text{cost of executing}(o) + \sum_{e_i \in \mathrm{children}(o)} \mathrm{cost}(e_i)$$
The children of o are all equivalence nodes. The cost of each equivalence node e is
$$\mathrm{cost}(e) = \begin{cases} \min\{\mathrm{cost}(o_i) \mid o_i \in \mathrm{children}(e)\} & \text{if } e \text{ has children} \\ 0 & \text{if there are no children} \end{cases}$$
We now have to take into account the case when some subset of nodes may be materialized and we may need to reuse these materialized nodes. We introduce a new function called reusecost(e_i), which gives the cost when an equivalence node e_i is reused from a materialized set M. Thus the modified cost function is
$$\mathrm{cost}(o) = \text{cost of executing}(o) + \sum_{e_i \in \mathrm{children}(o)} CC(e_i)$$
where
$$CC(e_i) = \begin{cases} \mathrm{cost}(e_i) & \text{if } e_i \notin M \\ \min(\mathrm{cost}(e_i), \mathrm{reusecost}(e_i)) & \text{if } e_i \in M \end{cases}$$
5.4 Algorithm to Handle Cache Delete and Insertion
The proposed algorithm finds out whether any of the nodes in the DAG and the chunk system are worth caching. We need to find the benefit of adding or deleting these nodes. We create a benefit function for the DAG too, similar to the Benefit(N) function in the chunk system. The benefit function also takes into account the number of times the previous query was used. The proposed optimizer needs to know the nodes selected when the previous query was considered and all the nodes of the query plan. Now suppose S is a set of nodes selected to be cached and R is the representative set of queries. Then
$$\mathrm{cost}(R, S) = \sum_{q \in R} \mathrm{cost}(q, S) \cdot \mathrm{weight}(q)$$
We now define the benefit function:
$$\mathrm{benefit}(R, x, S) = \mathrm{cost}(R, S) - \bigl(\mathrm{cost}(R, \{x\} \cup S) + \mathrm{cost}(x, S)\bigr)$$
This gives the benefit we get by adding node x to the cached set S. In cases where x is already computed we assume cost(x, S) to be 0. We can now create a modified algorithm that handles cache deletions and insertions.
Algorithm: TimeBenefit
Input: chunk N to be inserted in the cache
       Set X of nodes of the expanded DAG that are cached
       Node x with benefit(R, x, X)
while [space not available for N]:
    Let C be the chunk corresponding to the current time position
    If [Weight(C) ≤ 0]:
        Evict C from the cache
        Delete x and its equivalence nodes from X
    Else:
        Weight(C) = Weight(C) - Benefit(N)
    EndIf
    Advance time position
EndWhile
Insert N into cache
Weight(N) = Benefit(N)
Algorithm: Modify_Cache
Input: Expanded DAG for R, the representative set of queries, and the set of candidate equivalence nodes for caching
       Chunk N to be inserted
Output: Set of nodes to be cached
X = φ
Y = set of candidate equivalence nodes for caching
while (Y ≠ φ)
    L1: Among nodes y ∈ Y such that size({y} ∪ X) < CacheSize,
        pick the node x with the highest benefit(R, x, X)/size(x)  /* i.e., highest benefit per unit space */
    if (benefit(R, x, X) < 0)
        break;  /* No further benefits to be had, stop */
    TimeBenefit(N, X, x)
    Y = Y - {x}; X = X ∪ {x}
return X
The Modify_Cache algorithm now handles the deletion of nodes and chunks from the optimizer and also creates an optimum cache mechanism for the queries.
5.5 Handling Frequent Updates
Our solution system also handles frequent updates made to the database and maintains coherency with the cache system. As new data keeps getting added to the database, the cache needs to be modified. We also need to discard or modify the affected chunks in the chunk system. To do this we create a submodule which acts as a proxy between the database and the client. As and when the client system proposes changes to the database, the data first enters the proxy system. The proxy system then decides which relations and attributes in each
relation need to be modified in the database. The proxy module is also connected to the Optimizer and Cache Manager. This ensures that the chunks are mapped onto the respective relation attributes in the proxy. Whenever a new update enters the proxy module, it creates a map pointer to the necessary chunks in the chunk system. The proxy module then finds out which chunks in the chunk system need to be modified. Once this mapping is created, it sends the update to the database to be materialized. Further analysis of the proxy system is left for future work.
6. EVALUATION OF RESULTS
6.1 Evaluating the use of DAG
The DAG based approach is also followed by the Exchequer system. In this method the query is turned into a set of nodes to be evaluated as a DAG. This DAG is then expanded to analyze the nodes, and the operations on the nodes are performed one by one. We then obtain the nodes of the DAG to be cached and the nodes which need to be deleted from the DAG. The queries used for evaluation are of the following type.
SELECT SUM(QUANTITY)
FROM ORDERS, SUPPLIER, PART, CUSTOMER, TIME
WHERE join-list AND select-list
GROUP BY groupby-list;
The schema has a central Orders fact table and the dimension tables Part, Supplier, Customer and Time. The sizes of these tables are the same as those used in [1]. The join-list enforces equality between attributes of the Orders fact table and the primary keys of the dimension tables. The select-list is generated by selecting 0 to 2 attributes from the join-list. The groupby-list is generated by picking at random a subset of all the keys. These queries are chosen so that a fair comparison can be made. The metric used to measure the goodness of the algorithm is the total response time of the set of queries. The report is generated for a sequence of 100 queries, after 50 queries have already been run to populate the cache.
This total response time is given by the estimated cost, which is calculated using the cost functions described in Sections 5.3 and 5.4. For the evaluation, the size of the representative set is initially set to 10. We then test different cache sizes, around 5%, 10% and 20% of the database size. We compare our algorithm with the Exchequer system and with a system without an intelligent cache (NoCache), which uses the LRU method for cache management. The LRU policy is widely used in DBMS systems; it picks the least recently used chunk and DAG node to be replaced. With LRU the system is unaware of the workload.
Analyzing the Estimated Response Time: Initially the cache size is kept at a minimum and the algorithms are run on the queries. The Exchequer algorithm and our Modify_Cache perform better than the NoCache system. The size of the cache is then increased to accommodate 5% of the database. The algorithms show an improvement in performance. This may be because the increase in cache size makes it easier for the algorithms to find chunks that can answer the queries on their own. With small caches this is a problem, as it becomes costlier and takes longer to find chunks that can answer the query and then to apply the replacement methods for the DAG and the chunks. For small cache sizes the Modify_Cache algorithm performs better than the Exchequer algorithm, with a higher rate of improvement. However, as the cache size is increased to 30%, the estimated cost increases for the Exchequer system. Beyond a certain cache size, the system's investment in caching the extra results that fit in the larger cache does not improve the Benefit factor. With the cache at 30% of the database size, the Modify_Cache algorithm still returns a better estimated time than the Exchequer algorithm.
6.2 Evaluating for Chunks
We considered various performance measures to evaluate the effectiveness of the schemes we have employed.
1. Average Execution Time: Using our system, we executed 100 queries and calculated the average execution time.
2. Cost Saving Ratio (CSR): This performance measure is the percentage of the total cost of the queries saved due to hits in the cache. The cost of executing a query at the backend is used to compute the savings in cost due to a cache hit. Consider a query stream consisting of a mix of n queries q1, q2, …, qn, with
ci: the cost of executing query qi at the backend
hi: the number of references to qi that were satisfied from the cache
ri: the total number of references to the query qi
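The formula for the CSR itself is not preserved in the text above. A definition that is consistent with the listed symbols, and that is assumed in the sketch below, is CSR = Σ ci·hi / Σ ci·ri, that is, the backend cost avoided through cache hits divided by the backend cost of answering every reference without a cache. The function name and the tuple layout are illustrative choices.

```python
def cost_saving_ratio(queries):
    """Cost saving ratio for a query stream.

    `queries` is an iterable of (cost, hits, references) tuples, one per
    distinct query: backend cost c_i, cache hits h_i, total references r_i.
    Assumes the definition CSR = sum(c_i * h_i) / sum(c_i * r_i).
    """
    queries = list(queries)
    total_cost = sum(c * r for c, _, r in queries)
    if total_cost == 0:
        return 0.0  # no work was requested, so nothing could be saved
    saved_cost = sum(c * h for c, h, _ in queries)
    return saved_cost / total_cost
```

For example, a query of cost 100 referenced 4 times with 3 cache hits, together with a query of cost 20 referenced 5 times with no hits, saves 300 out of a total cost of 500. Weighting by cost means that hits on expensive queries count for more than hits on cheap ones, which a plain hit ratio would not capture.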
Comparing the CSR of a query based cache and a chunk based cache, we found that the query based system gives a CSR of 0.42 because of redundant storage in the cache, whereas the chunk based system gives a CSR of 0.98, showing that its cache storage was not redundant.
7. FUTURE WORK
This concept of using queries for caching can be further optimized. Our Modify_Cache algorithm currently uses only the Benefit and Weight functions to evaluate the utility of cached chunks. It can be improved by accommodating other aspects of the chunks and by taking into account the different operations that can be performed on OLAP databases. Further work can be done on improving the run time of our algorithm. Work also needs to be done on making better use of the DAGs and improving the run time of the DAG expansions. The time and additional space required for the DAGs to store query cache information play a crucial role and need to be taken into account in further work. More work is also needed on how frequent updates of the database can be handled. Our algorithm cannot, at this point, efficiently handle high frequency cache updates. Work needs to be done to implement advanced methods that identify high frequency cache updates and thereby maintain the efficiency and consistency of the cache.
8. CONCLUSION
The use of chunks and DAGs in implementing a cache management system proves useful. The performance of the Query Engine improves with the use of the Modify_Cache algorithm. The estimated time to run the queries also decreases substantially with the use of query caching and the use of chunks to store caches. The use of chunks in caching helps in utilizing large datastores and reduces the running time of OLAP queries on those datastores.
9. REFERENCES
[1] Don’t Trash your Intermediate Results, Cache ’em, Prasan Roy, Krithi Ramamritham, S. Seshadri, Pradeep Shenoy, S. Sudarshan, IIT Bombay. http://arxiv.org/abs/cs.DB/0003005
[2] Composite Data Virtualization, White Paper, Composite Software. http://cdn.information-management.com/media/pdfs/CompositeDVPerformance.pdf
[3] Caching Multidimensional Queries Using Chunks, Prasad M. Deshpande, Karthikeyan Ramasamy, Amit Shukla, Jeffrey F. Naughton, University of Wisconsin, Madison.
[4] Usability-based caching of query results in OLAP systems, Chang-Sup Park, Myoung Ho Kim, Yoon-Joon Lee. http://www.cin.ufpe.br/~bmcr/public_html/Usability-based_caching_of_query.pdf
[5] Extensibility and Search Efficiency in the Volcano Optimizer Generator, Goetz Graefe and William J. McKenna. Technical Report CU-CS-91-563, University of Colorado at Boulder, December 1991.
[6] WATCHMAN: A Data Warehouse Intelligent Cache Manager, P. Scheuermann, J. Shim and R. Vingralek, VLDB Conf. 1996.