Mathematical Preliminaries (Data Compression)
1. Mathematical Preliminaries
2. • The development of data compression algorithms for a variety of data can be divided into two phases: modeling and coding.
   • In the modeling phase we try to extract information about any redundancy that exists in the data and describe that redundancy in the form of a model.
3. • A description of the model and a “description” of how the data differ from the model are coded, generally using a binary alphabet.
   • The difference between the data and the model is often referred to as the residual.
   • In the following three examples we will look at three different ways that data can be modeled, and then use each model to obtain compression.
4. • If binary representations of these numbers are to be transmitted or stored, we would need to use 5 bits per sample.
   • By exploiting the structure in the data, we can represent the sequence using fewer bits.
   • If we plot the data, as shown in the following figure:
5. • We see that the data seem to fall on a straight line.
   • A model for the data could therefore be a straight line given by the equation Ān = n + 8.
6. • To examine how the data differ from the model, the difference (or residual) is computed: En = An − Ān = 0 1 0 −1 1 −1 0 1 −1 −1 1 1
   • The residual sequence consists of only three numbers: −1, 0, and 1.
   • If we assign the code 00 to −1, 01 to 0, and 10 to 1, we need only 2 bits to represent each element of the residual sequence.
   • Therefore, we can obtain compression by transmitting or storing the parameters of the model together with the residual sequence.
   • The encoding can be exact if the required compression is to be lossless, or approximate if the compression can be lossy.
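As a concrete sketch of this first example in Python (the data values below are an assumption: they do not appear on the text slides, but they reproduce the quoted residual sequence under the straight-line model Ān = n + 8 given above):

    # Residual-coding sketch: model the data as a straight line and transmit
    # the model parameters plus the small residuals instead of raw samples.
    # ASSUMPTION: these data values are chosen to reproduce the residual
    # sequence 0 1 0 -1 1 -1 0 1 -1 -1 1 1 under the model a_hat(n) = n + 8.
    data = [9, 11, 11, 11, 14, 13, 15, 17, 16, 17, 20, 21]

    model = [n + 8 for n in range(1, len(data) + 1)]  # straight-line model
    residuals = [a - m for a, m in zip(data, model)]  # e_n = a_n - a_hat(n)
    print(residuals)  # [0, 1, 0, -1, 1, -1, 0, 1, -1, -1, 1, 1]

    # Raw samples need 5 bits each; the residuals take only three distinct
    # values, so 2 bits each suffice (-1 -> 00, 0 -> 01, 1 -> 10).
    print(5 * len(data), "bits raw vs", 2 * len(residuals), "bits of residuals")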
7. (Image slide: the data sequence for the second example and its sequence of differences.)
8. • The number of distinct values has been reduced.
   • Fewer bits are required to represent each number, and compression is achieved.
   • The decoder adds each received value to the previous decoded value to obtain the reconstruction corresponding to the received value.
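A minimal sketch of this difference scheme (the sample sequence is illustrative, since the actual data appear only on the image slide): the encoder sends the first sample as-is and then only the successive differences, and the decoder undoes this by accumulation.

    def delta_encode(samples):
        # Send the first value as-is, then each value as a difference
        # from its predecessor.
        return [samples[0]] + [b - a for a, b in zip(samples, samples[1:])]

    def delta_decode(codes):
        # Add each received value to the previous decoded value.
        out = [codes[0]]
        for d in codes[1:]:
            out.append(out[-1] + d)
        return out

    seq = [27, 28, 29, 28, 26, 27, 29, 28, 30, 32, 34]  # illustrative data
    enc = delta_encode(seq)                             # [27, 1, 1, -1, -2, ...]
    assert delta_decode(enc) == seq                     # lossless round trip

Because neighbouring samples are close in value, the differences cluster around zero, so fewer distinct values, and hence fewer bits, are needed.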
9. • The sequence is made up of eight different symbols, so a fixed-length code needs 3 bits per symbol (2³ = 8).
   • Say we have instead assigned a codeword with only a single bit to the symbol that occurs most often, and correspondingly longer codewords to symbols that occur less often.
10. (Image slide: the 41-symbol sequence and the variable-length codeword table for the eight symbols.)
11. • If we substitute the codes for each symbol, we use 106 bits to encode the entire sequence.
    • As there are 41 symbols in the sequence, this works out to approximately 2.58 bits per symbol.
    • This means we have obtained a compression ratio of 1.16:1 relative to the 3-bit fixed-length code.
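As a quick check of that arithmetic (the codeword table itself is on the image slide):

    # The arithmetic behind the figures on this slide.
    total_bits, num_symbols = 106, 41  # totals quoted above
    fixed_bits = 3                     # 3-bit fixed-length code for 8 symbols

    avg_bits = total_bits / num_symbols  # ~2.585 bits per symbol
    ratio = fixed_bits / avg_bits        # ~1.16
    print(f"{avg_bits:.3f} bits/symbol, compression ratio {ratio:.2f}:1")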
12. • Information
    • Amount of Information
    • Entropy
    • Maximum Entropy
    • Condition for maximum entropy
13. • Compression is achieved by removing data redundancy while preserving information content.
    • Entropy is the measure of the information content of a group of bytes (a message).
    • Messages with higher entropy carry more information than messages with lower entropy.
    • Data with low entropy permit a larger compression ratio than data with high entropy.
14. • How to determine the entropy:
    • Find the probability p(x) of each symbol x in the message.
    • The entropy H(x) contributed by symbol x is H(x) = −p(x) · log₂ p(x).
    • The average entropy over the entire message is the sum of H(x) over all n distinct symbols: H = −Σ p(x) · log₂ p(x).
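A minimal Python sketch of this procedure, using illustrative messages:

    from collections import Counter
    from math import log2

    def message_entropy(message):
        # Estimate p(x) from symbol frequencies, then sum -p(x) * log2 p(x)
        # over the distinct symbols.
        counts = Counter(message)
        n = len(message)
        return -sum((c / n) * log2(c / n) for c in counts.values())

    # A redundant (low-entropy) message permits more compression than a
    # high-entropy one.
    print(message_entropy("aaaaaaab"))  # ~0.54 bits/symbol
    print(message_entropy("abcdefgh"))  # 3.0 bits/symbol

The second message attains the maximum entropy for an eight-symbol alphabet (log₂ 8 = 3 bits per symbol), which occurs when all symbols are equally probable.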