Data compression for Large Multidimensional Data Warehouses

This presentation was prepared for the thesis titled "Data compression for Large Multidimensional Data Warehouses", which was completed in partial fulfillment of the undergraduate course in the Dept. of CSE, KUET, Bangladesh.

Transcript

  • 1. Data Compression for Large Multidimensional Data Warehouses
    Supervisor:
    Dr. K.M. Azharul Hasan
    Associate Professor,
    Head of the Department,
    Department of CSE, KUET
    Presented by:
    Abdullah Al Mahmud,
    Roll: 0507006
    Md. Mushfiqur Rahman,
    Roll: 0507029
    This slide deck was prepared by Abdullah Al Mahmud for the presentation of the thesis, completed in partial fulfillment of the undergraduate course at Khulna University of Engineering & Technology (KUET), Bangladesh
  • 2. Presentation Layout
    • Objectives
    • Existing Compression Schemes
    • Traditional Extendible Array
    • Proposed Compression Scheme
    • EXCS (Extendible Array Based Compression Scheme)
    • Comparative Analysis
    • Conclusion
    Abdullah Al Mahmud, Student ID: 0507006, Dept. of CSE, KUET, Bangladesh
  • 3. Objectives
    • Data compression technology reduces the effective price of logical data storage capacity and improves query performance.
    • Multidimensional arrays are widely used in a large number of scientific research areas.
    • Efficient compression of multidimensional arrays makes it possible to handle the large multidimensional data sets of data warehouses.
  • 4. Existing Compression Schemes (1/3)
    • Bitmap compression
    • Run Length Encoding
    • Header compression
    • Compressed Column Storage
    • Compressed Row Storage
  • 5. Existing Compression Schemes (2/3)
    Figure: (a) A sparse array. (b) The CRS scheme
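Because the figure for this slide did not survive in the transcript, the following is a minimal sketch of Compressed Row Storage (CRS) for a small sparse array. The sample array, the function names, and the 0-based indexing are assumptions made for illustration; they are not taken from the slides. RO holds one offset per row plus a final end marker, CO holds the column index of each non-zero element, and VL holds the non-zero values.

```python
def crs_compress(matrix):
    """Compress a 2-D list into CRS vectors (RO, CO, VL), 0-based."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for col, value in enumerate(row):
            if value != 0:
                co.append(col)   # column index of the non-zero element
                vl.append(value) # the non-zero value itself
        ro.append(len(vl))       # offset where the next row starts in CO/VL
    return ro, co, vl


def crs_decompress(ro, co, vl, n_cols):
    """Rebuild the dense 2-D list from the CRS vectors."""
    matrix = []
    for r in range(len(ro) - 1):
        row = [0] * n_cols
        for k in range(ro[r], ro[r + 1]):
            row[co[k]] = vl[k]
        matrix.append(row)
    return matrix


# Hypothetical sample data, not the array shown on the original slide.
sparse = [
    [0, 0, 3, 0],
    [5, 0, 0, 0],
    [0, 0, 0, 0],
    [0, 7, 0, 9],
]
ro, co, vl = crs_compress(sparse)
restored = crs_decompress(ro, co, vl, n_cols=4)
```

An empty row shows up as two equal consecutive entries in RO, which is why the third and fourth offsets coincide for this sample.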
  • 6. Existing Compression Schemes (3/3)
    • Classical methods cannot support updates without completely readjusting runs.
    • They are aimed at compressing sparse arrays
    • They do not support extendibility
  • 7. Traditional Extendible Array
    • TEA supports dynamic extension of dimension size.
    Figure 1: TEA Construction And Access (example shown: position <1,3>; H1[1] < H2[3]; Address of Cell = Address1[3] + 1 = 10; a history counter tracks the order of extensions)
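The access rule in Figure 1 can be illustrated with a small two-dimensional extendible array. This is a sketch under stated assumptions, not the authors' implementation: the class name, the method names, and the convention that the array starts as a single cell with history value 0 are all hypothetical. Each extension bumps a history counter, records it in the history table of the extended dimension, and appends a new sub-array to linear storage; a cell is then located by comparing the two history values and using the first address of the more recently created sub-array.

```python
class ExtendibleArray2D:
    """Illustrative 2-D extendible array with history and address tables."""

    def __init__(self):
        self.counter = 0          # history counter
        self.hist = [[0], [0]]    # history tables H1 (rows) and H2 (columns)
        self.addr = [[0], [0]]    # first address of each sub-array
        self.store = [None]       # linear storage; cell <0,0> lives at address 0

    def extend(self, dim):
        """Grow the array by one along dim (0 = rows, 1 = columns)."""
        other = 1 - dim
        self.counter += 1
        self.hist[dim].append(self.counter)
        self.addr[dim].append(len(self.store))
        # The new sub-array has one cell per index of the other dimension.
        self.store.extend([None] * len(self.hist[other]))

    def address(self, i, j):
        """Map position <i,j> to its address in linear storage."""
        if self.hist[0][i] > self.hist[1][j]:
            # The row sub-array was created later, so it owns the cell.
            return self.addr[0][i] + j
        # Otherwise the column sub-array owns the cell.
        return self.addr[1][j] + i

    def set(self, i, j, value):
        self.store[self.address(i, j)] = value

    def get(self, i, j):
        return self.store[self.address(i, j)]


arr = ExtendibleArray2D()
arr.extend(1)   # add a column -> 1 x 2
arr.extend(0)   # add a row    -> 2 x 2
arr.extend(1)   # add a column -> 2 x 3
arr.set(1, 2, 42)
addresses = sorted(arr.address(i, j) for i in range(2) for j in range(3))
```

Existing cells are never moved when the array grows, which is the property that run-based schemes lack. The addresses produced here follow this sketch's own extension order and are not meant to reproduce the numbers in the original figure.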
  • 8. Proposed Compression Scheme
    • Multidimensional arrays are important for sparse array operations
    • Extendibility of multidimensional arrays
    • A compression technique that can work on multidimensional extendible arrays
    • Our proposed compression scheme is EXCS (Extendible array based Compression Scheme)
  • 9. Extendible array based Compression Scheme (EXCS) 1/3
    • We implemented the multidimensional extendible array in secondary memory
    • We considered dimension = 3 in our experiments
    • The sub-arrays are distinguished so that each can be stored individually in secondary memory
  • 10. Extendible array based Compression Scheme (EXCS) 2/3
    • The sub-arrays are of dimension n-1 (= 2)
    • A large number of sub-arrays are generated to be compressed
    • Sub-arrays are taken as input dynamically
    • Only the maximum number of sub-arrays needs to be given
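To make the (n-1)-dimensional sub-arrays concrete, the sketch below tracks the shape of each sub-array created as a 3-dimensional extendible array grows. It assumes the usual extendible-array convention that extending along one dimension appends a sub-array spanning the current sizes of the other dimensions; the function name and the extension sequence are hypothetical and not taken from the slides.

```python
def extend(sizes, dim):
    """Extend an n-D array by one along `dim`.

    Returns the shape of the new (n-1)-D sub-array and updates `sizes`
    in place.
    """
    shape = tuple(s for k, s in enumerate(sizes) if k != dim)
    sizes[dim] += 1
    return shape


sizes = [1, 1, 1]                    # start from a single cell
order = (0, 1, 2, 0)                 # hypothetical extension sequence
shapes = [extend(sizes, d) for d in order]

# The initial cell plus all sub-array cells equals the full array volume.
total_cells = 1 + sum(a * b for a, b in shapes)
```

Every entry in `shapes` is 2-dimensional, so each sub-array can be handed to a 2-D compression routine on its own.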
  • 11. Extendible array based Compression Scheme (EXCS) 3/3
    • Each sub-array is compressed individually
    • The compression technique used is similar to CRS
    • The compressed elements are written to secondary memory as RO, CO, VL of subarray_1, subarray_2, …, subarray_N
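The three points above can be sketched as follows. This is an illustration under assumptions rather than the EXCS implementation: the slides say only that the technique is "similar to CRS", so plain CRS is used here; an in-memory text stream stands in for secondary memory; and the JSON-lines record layout, function names, and sample sub-arrays are all hypothetical.

```python
import io
import json


def crs_like(sub):
    """Compress one 2-D sub-array into (RO, CO, VL) vectors, 0-based."""
    ro, co, vl = [0], [], []
    for row in sub:
        for c, v in enumerate(row):
            if v != 0:
                co.append(c)
                vl.append(v)
        ro.append(len(vl))
    return ro, co, vl


def write_compressed(subarrays, out):
    """Compress each sub-array on its own and append it to `out` in order."""
    for sub in subarrays:
        ro, co, vl = crs_like(sub)
        out.write(json.dumps({"RO": ro, "CO": co, "VL": vl}) + "\n")


# Hypothetical sub-arrays standing in for subarray_1 and subarray_2.
subarrays = [
    [[0, 4], [0, 0]],
    [[1, 0], [0, 2]],
]
buf = io.StringIO()              # stand-in for a file in secondary memory
write_compressed(subarrays, buf)
records = [json.loads(line) for line in buf.getvalue().splitlines()]
```

Because each sub-array is a separate record, extending the array only appends new records and leaves the earlier ones untouched.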
  • 12. Performance Measurement
    • Performance is measured against two key factors of the compression schemes:
    • Data Density
    • Length of Dimension / Number of Data
    • compression ratio = (compressed data / original data)
    • space savings = 1 - compression ratio
    • We have considered space savings in percent
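The two formulas above translate directly into code. In this short sketch the function names and the sample sizes are assumptions; the slides do not state the unit in which the data is measured, so the arguments can be bytes or element counts as long as both use the same unit.

```python
def compression_ratio(compressed_size, original_size):
    """compression ratio = compressed data / original data."""
    if original_size <= 0:
        raise ValueError("original_size must be positive")
    return compressed_size / original_size


def space_savings_percent(compressed_size, original_size):
    """space savings = 1 - compression ratio, expressed in percent."""
    return (1 - compression_ratio(compressed_size, original_size)) * 100


# Hypothetical sizes: 25 units after compression, 100 units before.
ratio = compression_ratio(25, 100)
savings = space_savings_percent(25, 100)
```

A negative result means the compressed form is larger than the original, which can happen for index-based schemes such as CRS when the array is dense.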
  • 13. Comparative Analysis (1/4)
    Figure: Comparison with fixed density = 20% (x-axis: No. of data)
  • 14. Comparative Analysis (2/4)
    Figure: Comparison with fixed density = 25% (x-axis: No. of data)
  • 15. Comparative Analysis (3/4)
    Figure: Comparison with fixed no. of data = 64 (x-axis: Density of data)
  • 16. Comparative Analysis (4/4)
    Figure: Comparison with fixed no. of data = 4096 (x-axis: Density of data)
  • 17. Performance Measurement
    • Extendibility of arrays
    • Using multidimensional arrays
    • Extendibility toward any dimension
    • EXCS allows dynamic extension of arrays.
    • In analysis, the data can be extended up to n dimensions
    • Performance is good for a large number of data
  • 18. Conclusion
    • Our proposed compression scheme has been tested experimentally on data of up to 3 dimensions
    • It can be extended experimentally to compress n-dimensional data in the future.
    • EXCS is effective for large multidimensional data warehouses