The document presents a proposed compression scheme called EXCS for multidimensional data warehouses. EXCS stores multidimensional data in an extendible array and compresses each subarray individually using a technique similar to compressed row storage (CRS). Performance is evaluated by compression ratio and space savings, comparing EXCS against schemes such as bitmap, header, and offset compression and CRS under varying data densities and dimension sizes. EXCS achieves higher space savings than the other techniques in most cases because it can dynamically compress the subarrays of an extendible multidimensional array.
Data Compression for Multi-dimensional Data Warehouses
Data Compression for Large Multidimensional Data Warehouses
Supervisor: Dr. K.M. Azharul Hasan, Associate Professor, Head of the Department, Department of CSE, KUET
Presented by: Abdullah Al Mahmud, Roll: 0507006; Md. Mushfiqur Rahman, Roll: 0507029
This slide deck was prepared by Muhammad Mushfiqur Rahman & Abdullah Al Mahmud for the thesis presentation.
Objectives
Data compression technology reduces the effective price of logical data storage capacity and improves query performance.
Multidimensional arrays are widely used in many fields of scientific research.
An efficient compression of multidimensional arrays can handle the large multidimensional data sets of data warehouses.
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
Existing Compression Schemes (2/3)
Figure: (a) A sparse array. (b) The CRS scheme.
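The CRS scheme from the figure can be illustrated with a short example. The sketch below (function names are ours, not from the thesis) reduces a sparse 2-D array to three arrays: row offsets (RO), column indices (CO), and nonzero values (VL).

```python
def crs_compress(matrix):
    """Compress a sparse 2-D array into (RO, CO, VL),
    as in Compressed Row Storage."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:           # store only nonzero cells
                co.append(j)
                vl.append(value)
        ro.append(len(vl))           # running count of nonzeros after each row
    return ro, co, vl

def crs_expand(ro, co, vl, ncols):
    """Reconstruct the dense array from (RO, CO, VL) losslessly."""
    matrix = []
    for i in range(len(ro) - 1):
        row = [0] * ncols
        for k in range(ro[i], ro[i + 1]):
            row[co[k]] = vl[k]
        matrix.append(row)
    return matrix
```

For example, `crs_compress([[0, 5, 0], [0, 0, 7], [3, 0, 0]])` yields RO = [0, 1, 2, 3], CO = [1, 2, 0], VL = [5, 7, 3], and expanding those three arrays recovers the original matrix exactly.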
Existing Compression Schemes (3/3)
Classical methods cannot support updates without completely readjusting runs.
They compress sparse arrays but do not support extendibility.
Muhammad Mushfiqur Rahman, Student ID: 0507029, CSE, KUET, Bangladesh
Traditional Extendible Array
TEA supports dynamic extension of dimension size. Each dimension keeps a history table (recording the order in which subarrays were created) and an address table (recording each subarray's first address).
Access example for position <1, 3>: the history tables give H1[1] < H2[3], so the cell lies in the subarray of column 3; its address is the column's first address plus the row index, Address[3] + 1 = 10.
Figure 1: TEA Construction and Access
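The history/address-table mechanism can be sketched for the 2-D case as follows. This is a minimal sketch under our own naming (class and method names are illustrative, not from the thesis): every extension appends one linear subarray to storage, and the history counters decide whether cell (i, j) lives in row i's or column j's subarray.

```python
class ExtendibleArray2D:
    """Minimal 2-D extendible array with history and address tables."""

    def __init__(self):
        # start as a 1x1 array: one subarray holding cell (0, 0)
        self.store = [0]          # linear storage for all subarrays
        self.hist = [[0], [1]]    # history counters per row / per column
        self.addr = [[0], [0]]    # first address per row / per column
        self.size = [1, 1]        # current length of each dimension
        self.counter = 2          # next history value

    def extend(self, dim):
        """Append one new row (dim=0) or column (dim=1) as a subarray."""
        other = 1 - dim
        self.hist[dim].append(self.counter)
        self.counter += 1
        self.addr[dim].append(len(self.store))
        self.store.extend([0] * self.size[other])
        self.size[dim] += 1

    def _offset(self, i, j):
        # the later-created subarray (larger history value) owns the cell
        if self.hist[0][i] > self.hist[1][j]:
            return self.addr[0][i] + j    # cell is in row i's subarray
        return self.addr[1][j] + i        # cell is in column j's subarray

    def get(self, i, j):
        return self.store[self._offset(i, j)]

    def set(self, i, j, value):
        self.store[self._offset(i, j)] = value
```

No cell is ever moved on extension: the array grows by appending subarrays, which is what lets EXCS later compress each subarray independently.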
Proposed Compression Scheme
Multidimensional arrays are important for sparse array operations.
Extendibility of multidimensional arrays is needed.
We need a compression technique that can work on a multidimensional extendible array.
Our proposed compression scheme is EXCS (Extendible array based Compression Scheme).
Extendible array based Compression Scheme (EXCS) 1/3
We implemented the multidimensional extendible array in secondary memory.
We considered dimension n = 3 in our experimental approach.
The sub-arrays are distinguished so that they can be stored individually in secondary memory.
Extendible array based Compression Scheme (EXCS) 2/3
The sub-arrays have n - 1 (= 2) dimensions.
A large number of sub-arrays is generated to be compressed.
Sub-arrays are taken dynamically as input; only the maximum number of sub-arrays has to be given.
Extendible array based Compression Scheme (EXCS) 3/3
Each sub-array is compressed individually.
The compression technique used is similar to CRS.
The compressed elements are written to secondary memory as the RO, CO, and VL arrays of subarray_1, subarray_2, ..., subarray_N.
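The write-out step above can be sketched as follows (a sketch under our own naming; the thesis' actual on-disk layout may differ): each 2-D sub-array is compressed CRS-style, and its RO, CO, VL triple is appended in sub-array order.

```python
def crs_compress(matrix):
    """CRS-like compression of one 2-D sub-array into (RO, CO, VL)."""
    ro, co, vl = [0], [], []
    for row in matrix:
        for j, value in enumerate(row):
            if value != 0:
                co.append(j)
                vl.append(value)
        ro.append(len(vl))
    return ro, co, vl

def excs_write(subarrays):
    """Emit the compressed form of every sub-array in order:
    RO, CO, VL of subarray_1, then subarray_2, ... subarray_N."""
    out = []
    for sub in subarrays:
        ro, co, vl = crs_compress(sub)
        out.append({"RO": ro, "CO": co, "VL": vl})
    return out
```

Because each triple is self-describing, a sub-array can be extended or recompressed without touching the others, which is the extendibility advantage EXCS claims over classical run-based schemes.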
Performance Measurement
Performance is measured through two key factors of the compression schemes: data density, and length of dimension / number of data.
compression ratio = compressed data size / original data size
space savings = 1 - compression ratio
We report space savings in percent.
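The two measures follow directly from the formulas above (a trivial sketch; the function names are ours):

```python
def compression_ratio(compressed_size, original_size):
    """compression ratio = compressed data size / original data size"""
    return compressed_size / original_size

def space_savings_percent(compressed_size, original_size):
    """space savings = 1 - compression ratio, reported in percent"""
    return 100.0 * (1.0 - compression_ratio(compressed_size, original_size))
```

For example, compressing a 4096-cell array down to 1024 cells gives a compression ratio of 0.25 and 75% space savings. Note that savings can be negative when the compressed form is larger than the original, which is why some curves in the following plots dip below zero.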
Comparative Analysis (1/4)
Figure: Space savings (%) versus number of data (64, 729, 4096, 15625, 46656) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed density = 20%.
Comparative Analysis (2/4)
Figure: Space savings (%) versus number of data (64 to 46656) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed density = 25%.
Comparative Analysis (3/4)
Figure: Compression ratio versus data density (10% to 50%) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed number of data = 64.
Comparative Analysis (4/4)
Figure: Compression ratio versus data density (10% to 50%) for the Header, Bitmap, CRS, EACRS, and Offset schemes, with fixed number of data = 4096.
Performance Measurement
Extendibility of arrays: using multidimensional arrays, extension toward any dimension is supported.
EXCS allows dynamic extension of arrays.
In the analysis, data can be extended up to n dimensions.
Performance is good for a large number of data.
Conclusion
Our proposed compression scheme has been experimentally validated for 3-dimensional data.
It can be extended experimentally to compress n-dimensional data in the future.
EXCS is effective for large multidimensional data warehouses.