The document presents a proposed compression scheme called EXCS for multidimensional data warehouses. EXCS stores multidimensional data in an extendible array and compresses each subarray individually using a technique similar to compressed row storage (CRS). Performance is evaluated by compression ratio and space savings, comparing EXCS against schemes such as bitmap, header, and offset compression and CRS under varying data densities and dimension sizes. EXCS achieves higher space savings than the other techniques in most cases because it can dynamically compress the subarrays of an extendible multidimensional array.
Data Compression for Multi-dimensional Data Warehouses
Data Compression for Large Multidimensional Data Warehouses
Supervisor: Dr. K.M. Azharul Hasan, Associate Professor, Head of the Department, Department of CSE, KUET
Presented by: Abdullah Al Mahmud, Roll: 0507006; Md. Mushfiqur Rahman, Roll: 0507029
This slide deck was prepared by Muhammad Mushfiqur Rahman & Abdullah Al Mahmud for the thesis presentation.
Objectives
Data compression technology reduces the effective price of logical data storage capacity and improves query performance.
Multidimensional arrays are widely used in many fields of scientific research.
An efficient compression of multidimensional arrays can handle the large multidimensional data sets of data warehouses.
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
Existing Compression Schemes (2/3)
Figure: (a) A sparse array. (b) The CRS scheme.
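The CRS scheme from the figure can be illustrated with a short example. The sketch below (function names are ours, not from the thesis) reduces a sparse 2-D array to three arrays: row offsets (RO), column indices (CO), and nonzero values (VL).

```python
def crs_compress(matrix):
    """Compress a sparse 2-D array into (RO, CO, VL),
    as in Compressed Row Storage."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:           # store only nonzero cells
                co.append(j)
                vl.append(value)
        ro.append(len(vl))           # running count of nonzeros after each row
    return ro, co, vl

def crs_expand(ro, co, vl, ncols):
    """Reconstruct the dense array from (RO, CO, VL) losslessly."""
    matrix = []
    for i in range(len(ro) - 1):
        row = [0] * ncols
        for k in range(ro[i], ro[i + 1]):
            row[co[k]] = vl[k]
        matrix.append(row)
    return matrix
```

For example, `crs_compress([[0, 5, 0], [0, 0, 7], [3, 0, 0]])` yields RO = [0, 1, 2, 3], CO = [1, 2, 0], VL = [5, 7, 3], and expanding those three arrays recovers the original matrix exactly.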
Existing Compression Schemes (3/3)
Classical methods cannot support updates without completely readjusting runs.
They compress sparse arrays but do not support extendibility.
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
Traditional Extendible Array
TEA supports dynamic extension of dimension size. Each dimension keeps a history table (recording the order in which subarrays were created) and an address table (recording each subarray's first address).
Access example for position <1, 3>: the history tables give H1[1] < H2[3], so the cell lies in the subarray of column 3; its address is the column's first address plus the row index, Address[3] + 1 = 10.
Figure 1: TEA Construction and Access
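The history/address-table mechanism can be sketched for the 2-D case as follows. This is a minimal sketch under our own naming (class and method names are illustrative, not from the thesis): every extension appends one linear subarray to storage, and the history counters decide whether cell (i, j) lives in row i's or column j's subarray.

```python
class ExtendibleArray2D:
    """Minimal 2-D extendible array with history and address tables."""

    def __init__(self):
        # start as a 1x1 array: one subarray holding cell (0, 0)
        self.store = [0]          # linear storage for all subarrays
        self.hist = [[0], [1]]    # history counters per row / per column
        self.addr = [[0], [0]]    # first address per row / per column
        self.size = [1, 1]        # current length of each dimension
        self.counter = 2          # next history value

    def extend(self, dim):
        """Append one new row (dim=0) or column (dim=1) as a subarray."""
        other = 1 - dim
        self.hist[dim].append(self.counter)
        self.counter += 1
        self.addr[dim].append(len(self.store))
        self.store.extend([0] * self.size[other])
        self.size[dim] += 1

    def _offset(self, i, j):
        # the later-created subarray (larger history value) owns the cell
        if self.hist[0][i] > self.hist[1][j]:
            return self.addr[0][i] + j    # cell is in row i's subarray
        return self.addr[1][j] + i        # cell is in column j's subarray

    def get(self, i, j):
        return self.store[self._offset(i, j)]

    def set(self, i, j, value):
        self.store[self._offset(i, j)] = value
```

No cell is ever moved on extension: the array grows by appending subarrays, which is what lets EXCS later compress each subarray independently.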
Proposed Compression Scheme
Multidimensional arrays are important for sparse array operations.
Extendibility of multidimensional arrays is needed.
We need a compression technique that can work on a multidimensional extendible array.
Our proposed compression scheme is EXCS (Extendible array based Compression Scheme).
Extendible array based Compression Scheme (EXCS) 1/3
We implemented the multidimensional extendible array in secondary memory.
We considered dimension n = 3 in our experimental approach.
The sub-arrays are distinguished so that they can be stored individually in secondary memory.
Extendible array based Compression Scheme (EXCS) 2/3
The sub-arrays have n - 1 (= 2) dimensions.
A large number of sub-arrays is generated to be compressed.
Sub-arrays are taken dynamically as input; only the maximum number of sub-arrays has to be given.
Extendible array based Compression Scheme (EXCS) 3/3
Each sub-array is compressed individually.
The compression technique used is similar to CRS.
The compressed elements are written to secondary memory as the RO, CO, and VL arrays of subarray_1, subarray_2, ..., subarray_N.
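The write-out step above can be sketched as follows (a sketch under our own naming; the thesis' actual on-disk layout may differ): each 2-D sub-array is compressed CRS-style, and its RO, CO, VL triple is appended in sub-array order.

```python
def crs_compress(matrix):
    """CRS-like compression of one 2-D sub-array into (RO, CO, VL)."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl

def excs_write(subarrays):
    """Emit the compressed form of every sub-array in order:
    RO, CO, VL of subarray_1, then subarray_2, ... subarray_N."""
    out = []
    for sub in subarrays:
        ro, co, vl = crs_compress(sub)
        out.append({"RO": ro, "CO": co, "VL": vl})
    return out
```

Because each triple is self-describing, a sub-array can be extended or recompressed without touching the others, which is the extendibility advantage EXCS claims over classical run-based schemes.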
Performance Measurement
Performance is measured through two key factors of the compression schemes: data density, and length of dimension / number of data.
compression ratio = compressed data size / original data size
space savings = 1 - compression ratio
We report space savings in percent.
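The two measures follow directly from the formulas above (a trivial sketch; the function names are ours):

```python
def compression_ratio(compressed_size, original_size):
    """compression ratio = compressed data size / original data size"""
    return compressed_size / original_size

def space_savings_percent(compressed_size, original_size):
    """space savings = 1 - compression ratio, reported in percent"""
    return 100.0 * (1.0 - compression_ratio(compressed_size, original_size))
```

For example, compressing a 4096-cell array down to 1024 cells gives a compression ratio of 0.25 and 75% space savings. Note that savings can be negative when the compressed form is larger than the original, which is why some curves in the following plots dip below zero.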
Comparative Analysis (1/4)
Figure: Space savings (%) versus number of data (64, 729, 4096, 15625, 46656) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed density = 20%.
Comparative Analysis (2/4)
Figure: Space savings (%) versus number of data (64 to 46656) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed density = 25%.
Comparative Analysis (3/4)
Figure: Compression ratio versus data density (10% to 50%) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed number of data = 64.
Comparative Analysis (4/4)
Figure: Compression ratio versus data density (10% to 50%) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed number of data = 4096.
Performance Measurement
Extendibility of arrays: using multidimensional arrays, extension toward any dimension is supported.
EXCS allows dynamic extension of arrays.
In the analysis, data can be extended up to n dimensions.
Performance is good for a large number of data.
Conclusion
Our proposed compression scheme has been experimentally validated for 3-dimensional data.
It can be extended experimentally to compress n-dimensional data in the future.
EXCS is effective for large multidimensional data warehouses.