This presentation was prepared by Ishara Amarasekera based on the paper "A Common Database Approach for OLTP and OLAP Using an In-Memory Column Database" by Hasso Plattner.
It summarizes the content of the paper and was presented as a paper discussion for the course Advanced Database Systems in Computer Science.
1. A Common Database
Approach for OLTP and OLAP
Using an In-Memory Column
Database
By Hasso Plattner
Presenter: Ishara Amarasekera
2. Outline
Introduction
OLTP and OLAP Systems
Motivation
Experiment and Benefits of Column Database
Data Organization
Memory Consumption
Contribution to Development of Software
3. Introduction
Relational database systems have been the backbone of
enterprise computing for about 20 years.
OLTP and OLAP are based on the relational model but use
different technical approaches.
4. OLTP
Database structures are designed to cope with complex
business requirements.
The focus is on transactional processing.
Tuples are arranged in rows, which are stored in blocks.
The blocks reside on disk and are cached in main memory
in the database server.
Sophisticated indexing allows fast access to single tuples.
6. OLAP
Designed for analytical processing and financial planning.
Provides more flexibility and better performance for such
workloads.
An OLAP schema is a set of cubes grouped together so that
one or more SAS OLAP Servers can access them.
In OLAP systems, in contrast, data is often organized in
star schemas, where a popular optimization is to compress
attributes (columns) with the help of dictionaries.
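The dictionary compression mentioned above can be sketched in a few lines: each distinct value of a column is stored once in a dictionary, and the column itself keeps only small integer codes. This is a minimal illustration with made-up data, not the actual implementation described in the paper.

```python
# Minimal sketch of dictionary compression for one column.
# Each distinct string is stored once; the column keeps small integer codes.
values = ["DE", "US", "DE", "FR", "US", "DE"]   # raw column (hypothetical data)

dictionary = sorted(set(values))                 # distinct values, stored once
code_of = {v: i for i, v in enumerate(dictionary)}
codes = [code_of[v] for v in values]             # compact integer column

print(dictionary)  # ['DE', 'FR', 'US']
print(codes)       # [0, 2, 0, 1, 2, 0]
```

Decoding is just a lookup, `dictionary[code]`, so scans and comparisons can work directly on the small integers.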
8. Motivation
OLTP and OLAP are based on relational theory but use
different technical approaches.
It is desirable to have OLTP and OLAP capabilities in one
system, making the components more valuable to users.
The use of column-store databases for analytics has become
quite popular.
Dictionary compression at the database level, and reading
only those columns necessary to process a query, speed up
query processing significantly in the column-store case.
9. Experiment
Full table scan on a table with 160 columns and 34 million
tuples:
1 million tuples ~ 1 GB of memory.
34 million tuples ~ 35 GB of memory.
The equivalent column-store table occupies only 8 GB.
10. Experiment (cont.)
In real applications, only about 10% of the attributes of a
single table are used in one SQL statement.
For the column store, at most 800 MB of data has to be
accessed to calculate the total value.
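The arithmetic behind these numbers can be checked with a quick back-of-the-envelope estimate, using the sizes quoted on the slides:

```python
# Back-of-the-envelope scan-cost estimate from the slides' figures.
column_store_gb = 8        # the 35 GB row-store table compressed to 8 GB

# Only ~10% of a table's attributes appear in a typical SQL statement,
# and a column store reads just those columns.
fraction_of_attributes = 0.10
scanned_gb = column_store_gb * fraction_of_attributes

print(f"data touched by the query: {scanned_gb * 1000:.0f} MB")
```

This reproduces the 800 MB figure: 10% of the 8 GB compressed table.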
11. A comparison of row and column store databases
A row-store database with horizontal compression cannot
compete with a column store when processing is set-based
and requires access to entire columns (columnar
operations).
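The difference between the two layouts can be illustrated with a tiny sketch (hypothetical data, plain Python lists standing in for storage blocks): a set-based query over one attribute touches only that attribute's contiguous array in the columnar layout.

```python
# Row vs column layout for the same tuples (illustrative only).
rows = [(1, "Alice", 300), (2, "Bob", 150), (3, "Carol", 220)]

# Row store: whole tuples stored contiguously, one after another.
row_store = [field for row in rows for field in row]

# Column store: each attribute stored contiguously in its own array.
ids, names, amounts = (list(col) for col in zip(*rows))

# A set-based query (sum of amounts) touches only one column here.
print(sum(amounts))  # → 670
```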
12. Why Use a Column Store?
Enterprise computing is based on set processing, not
single-tuple access.
This is where the performance gain comes from.
13. The Benefits of a Column Store Database
Processing of sets of tuples rather than access to
individual tuples.
Operations work directly on the compressed integer
(dictionary-encoded) format.
Parallel processing: scans over one or more columns
parallelize very well.
Restriction: it is recommended to use as few wide
projections (e.g., SELECT *) as possible.
14. Parallel Processing
Calculations at the tuple level parallelize automatically,
since they are completely independent of each other.
A modern processor core can scan about 1 MB per
millisecond, and with 16 cores in parallel more than 10 MB
per millisecond.
For example, for an attribute compressed to 4 bytes, we can
scan 2.5 million tuples per millisecond. At this rate we do
not even need an index on the primary key.
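The 2.5 million figure follows directly from the slide's assumptions (taking 10 MB as 10 million bytes for simplicity):

```python
# Throughput arithmetic from the slide (assumed figures).
mb_per_ms_per_core = 1            # one core scans ~1 MB per millisecond
cores = 16                        # giving well over 10 MB/ms in aggregate
bytes_per_value = 4               # attribute dictionary-compressed to 4 bytes

# Using the slide's conservative 10 MB/ms aggregate scan rate:
tuples_per_ms = 10 * 1_000_000 / bytes_per_value

print(tuples_per_ms)  # 2500000.0 tuples scanned per millisecond
```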
16. Updates in a Column Store Database
Question: is a column store database suitable for
UPDATE-intensive applications?
Keeping the data in memory greatly improves the situation,
because working in RAM is much faster, but some problems
nevertheless remain.
17. Update Categories
From the history tables of SAP systems, it was found that
updates fall into 3 categories:
Aggregate update: attributes holding accumulated values as
part of materialized views (between 1 and 5 per accounting
line item).
Status update: binary change of a status variable,
typically with timestamps.
Value update: the value of an attribute changes by
replacement.
18. Aggregate Update
Aggregates are the results of analytical queries (e.g.,
profit per quarter).
In an in-memory column database it turns out to be more
convenient to compute aggregates on the fly than to store
precomputed aggregates.
They then take up no extra space, and with modern hardware
the computation is fast.
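Computing an aggregate "on the fly" over a dictionary-compressed column can be sketched as follows (an illustration with made-up values, not the paper's actual storage format): instead of maintaining a materialized total, the sum is computed from the compressed column when the query arrives.

```python
# On-the-fly aggregation over a dictionary-compressed column (sketch).
dictionary = [0, 100, 250, 500]    # distinct values, stored once
codes = [1, 1, 3, 2, 0, 3, 1]      # one small integer code per tuple

# No materialized total is kept; decode and sum at query time.
total = sum(dictionary[c] for c in codes)

print(total)  # → 1550
```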
19. [Figure: aggregation time as a function of the number
of tuples]
20. Status Update
Status variables (e.g., paid / not paid) usually take one
of a small number of possible values, so updating them
should not cause problems, because the data volume does not
change.
21. Value Update
Insert-only is a good approach here, because on average
only about 5% of tuples are ever changed during their
lifetime.
22. Insert-Only
Insert-only stores the "history" of the database.
It is an approach with few or no UPDATE statements, only
INSERTs: instead of modifying a tuple in place, a new tuple
with a new timestamp is inserted.
This allows horizontal partitioning of the table: new
records are stored in fast memory, while older records,
whose "transaction date" attribute is far in the past, are
kept in slower storage (somewhere far away).
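A minimal sketch of the insert-only idea, with a hypothetical versioned table (a logical counter stands in for the timestamp): an "update" never overwrites a tuple, it appends a newer version, and the current value is the most recent version.

```python
# Insert-only value update (sketch): append versions, never overwrite.
import itertools

_clock = itertools.count()   # monotonically increasing logical timestamp
table = []                   # list of (key, value, timestamp) versions

def update(key, value):
    # "Update" appends a newer version instead of modifying in place.
    table.append((key, value, next(_clock)))

def current(key):
    # The current value is the version with the latest timestamp.
    versions = [row for row in table if row[0] == key]
    return max(versions, key=lambda row: row[2])[1]

update("invoice-1", "open")
update("invoice-1", "paid")
print(current("invoice-1"))  # → paid
```

Because old versions are never destroyed, the full history of each tuple remains queryable, which is exactly what makes the "time travel" and partitioning schemes above possible.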
23. Data Organization
To build a combined OLTP and OLAP system, the data should
be organized for:
Frequent set-based selection of many tuples.
Fast INSERTs.
Maximum parallelization of reads.
Low cost (in time) of reorganization on update and insert.
24. Memory Consumption
Comparing the memory consumption of row and column
databases, the column store wins clearly thanks to better
compression algorithms.
Different analyses of real data showed that:
A column database compresses by a factor of about 20.
A row database compresses only by a factor of about 2 on
average.
25. Contribution to Development of Software
If current applications are rewritten to use a columnar
DBMS instead of a row-oriented one, then:
The amount of code that works with data is expected to
shrink by 30-50%.
Many parts can be completely restructured, taking into
account the all-index nature of a column database.
Rare use of wide projections is also desirable.