DATA COMPRESSION
“Data” means the information in digital form on which computer programs operate.
“Compression” means a process of removing redundancy in the data.
“Data Compression” means deriving techniques or, more specifically, designing more
efficient algorithms to:
• Represent data in a less redundant fashion.
• Remove the redundancy in the data.
• Implement compression algorithms, covering both compression and
decompression.
We view data compression as a process of deriving algorithmic solutions to a
compression problem.
Data compression has become popular because compression reduces the size of a file,
which saves space when storing it and saves time when transmitting it. Data compression
can be applied to text, image, sound and video information. Its applications are many,
including generic file compression, multimedia, communication, and the databases
maintained on Google's servers.
Most people frequently use data compression software such as zip, gzip and WinZip
(among many others) to reduce a file's size before storing or transferring it.
Compression techniques are embedded in more and more software, and data are often
compressed without people knowing it. Data compression has become a common
requirement for most application software as well as an important and active research
area in computer science. Without compression techniques, none of the ever-growing
Internet, digital TV, mobile communication or video communication technologies would
have been practical.
Data Compression Problems
A compression problem involves finding an efficient algorithm to remove various
redundancies from a certain type of data. The general question to ask here would be, for
example: given a string s, what alternative sequence of symbols takes less storage space?
The solutions to the compression problem would then be compression algorithms that
derive an alternative sequence of symbols containing fewer bits in total, plus the
decompression algorithms to recover the original string.
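As a concrete illustration (not drawn from the notes themselves), run-length encoding is one of the simplest schemes for producing such an alternative sequence: runs of repeated symbols are replaced by (count, symbol) pairs. The sketch below is a toy in Python; the function names are my own.

```python
# A minimal run-length encoding (RLE) sketch: runs of a repeated symbol
# are replaced by (count, symbol) pairs; decompression reverses this.
# Illustrative toy only, not a production compressor.

def rle_compress(s: str) -> list[tuple[int, str]]:
    pairs = []
    i = 0
    while i < len(s):
        j = i
        while j < len(s) and s[j] == s[i]:
            j += 1
        pairs.append((j - i, s[i]))   # (run length, symbol)
        i = j
    return pairs

def rle_decompress(pairs: list[tuple[int, str]]) -> str:
    return "".join(ch * n for n, ch in pairs)

s = "aaaabbbcca"
compressed = rle_compress(s)          # [(4, 'a'), (3, 'b'), (2, 'c'), (1, 'a')]
assert rle_decompress(compressed) == s
```

Note that RLE only pays off when the data actually contains long runs; on other inputs it can even expand the data, which leads directly to the next question.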
How many fewer bits? That depends on the algorithm, but it also depends on how much
redundancy can be extracted from the original data. Different data may require different
techniques to identify and to remove the redundancy. This makes compression problems
'hard' to solve, because the general question is too broad to be answered for all instances
at once.
Fortunately, we can take certain constraints and heuristics into consideration when
designing algorithms.
There is no 'one size fits all' solution for data compression problems. In data
compression studies, we essentially need to analyse the characteristics of the data to be
compressed and hope to deduce some patterns in order to achieve a compact
representation. This gives rise to a variety of data modeling and representation
techniques, which are at the heart of compression techniques.
Decompression
By its very nature, a compression algorithm is of no use unless a means of decompression
is also provided. When compression algorithms are discussed in general, the word
compression alone actually implies the context of both compression and decompression.
In many practical cases, the efficiency of the decompression algorithm is of more concern
than that of the compression algorithm. For example, movies, photos, and audio data are
often compressed once by the artist and then the same version of the compressed files is
decompressed many times by millions of viewers or listeners.
Conversely, the efficiency of the compression algorithm is sometimes more important.
For example, audio or video data from real-time programs may need to be recorded
directly to limited computer storage, or transmitted to a remote destination through a
narrow signal channel. Depending on the specific problem, we sometimes consider
compression and decompression as two separate synchronous or asynchronous
processes.
Differentiate between Lossless and Lossy Compression.
SR.NO | Lossy Compression | Lossless Compression
1 | Some loss of data is accepted in order to achieve higher compression. | No data is lost in achieving compression.
2 | In some cases a lossy method can produce a much smaller compressed file. | In some cases a lossless method can produce a bigger compressed file.
3 | Lossy methods are not reversible, so the original data cannot be reconstructed exactly. | Lossless compression schemes are reversible, so the original data can be reconstructed exactly.
4 | Gives an increased compression ratio. | Gives a decreased compression ratio.
5 | Used when some loss of fidelity is acceptable. | Used when the data must be recovered exactly.
6 | Typically applied to data that originates in analog form, such as images, audio and video. | Typically applied to inherently digital data, such as text and program files.
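To make the reversibility distinction in row 3 concrete, here is a small hedged sketch (not from the notes): coarse quantization is a typical lossy step, and once values are rounded to a grid, the exact originals cannot be recovered. The step size is an illustrative choice.

```python
# Quantization as a toy lossy transform: values are rounded to the
# nearest multiple of a step size, so fine detail is discarded and
# the exact original cannot be reconstructed.

STEP = 10  # quantization step size (illustrative choice)

def quantize(samples):
    return [round(x / STEP) for x in samples]

def dequantize(codes):
    return [c * STEP for c in codes]

original = [3, 17, 24, 98]
restored = dequantize(quantize(original))   # [0, 20, 20, 100]
print(restored)  # close to, but not equal to, the original
```

A lossless scheme, by contrast, pairs every transform with an exact inverse, as in the RLE sketch earlier.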
Physical Model:
For more efficient compression algorithms, we require good models for the sources.
Different approaches are available to build mathematical models. To construct a
model, we can use information about the physics of the data generation process.
Example: a vocal cord model for speech coding. Knowledge about the physics of speech
production can be used to construct a mathematical model of the sampled speech process,
and the sampled speech can then be encoded using this model. Another example is a
head-and-shoulders model for video coding. In general, however, it is very difficult to
understand the physics of data generation.
Probability Model:
In statistical data compression methods we can use the probability model concept. For
such methods, a model for the data has to be constructed before compression starts. To
build the probability model, we read the entire input stream, count the appearances of
each symbol and compute the probability of occurrence of each symbol.
The input data stream is then read symbol by symbol and compressed using the
information in the probability model. Note that such a probability model cannot capture
symbol-to-symbol dependence.
For a source alphabet A = {a1, a2, …, am}, we can have a probability model
P = {P(a1), P(a2), …, P(am)} if we can assume that the symbols coming from the source
are independent of each other.
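A hedged sketch of the first pass described above: count each symbol's occurrences over the whole input and normalize into probabilities. A second pass would then encode symbol by symbol using this model; the function name is my own.

```python
from collections import Counter

def build_probability_model(data: str) -> dict[str, float]:
    # First pass: count each symbol's occurrences over the entire input,
    # then normalize the counts into probabilities P(a_i).
    counts = Counter(data)
    total = len(data)
    return {sym: n / total for sym, n in counts.items()}

model = build_probability_model("abracadabra")
# {'a': 5/11, 'b': 2/11, 'r': 2/11, 'c': 1/11, 'd': 1/11}
```

Because the model stores one fixed probability per symbol, it treats every position identically, which is exactly why it cannot capture symbol-to-symbol dependence.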
Markov Model:
Markov models are particularly useful in text compression, where the probability of the
next letter is heavily influenced by the preceding letters.
In current text compression, kth-order Markov models are more widely known as
finite context models, with the word context being used for what we have earlier defined
as the state.
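As an illustrative sketch (the function name and sample text are my own), a first-order (k = 1) context model conditions each letter's probability on the single preceding letter:

```python
from collections import Counter, defaultdict

def build_first_order_model(text: str):
    # Count how often each letter follows each one-letter context,
    # then normalize per context to get P(next | previous).
    follow = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        follow[prev][nxt] += 1
    return {
        ctx: {sym: n / sum(c.values()) for sym, n in c.items()}
        for ctx, c in follow.items()
    }

model = build_first_order_model("the theme then")
print(model["t"])  # 'h' follows 't' every time here, so P('h' | 't') = 1.0
```

A higher-order model would use the previous k letters as the context, giving sharper predictions at the cost of a much larger model.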
Composite Source Model
In many applications it is not easy to use a single model to describe the source. In such
cases, we can define a composite source, which can be viewed as a combination or
composition of several sources, with only one source being active at any given time.
A composite source can be represented as a number of individual sources Si, each with
its own model Mi, and a switch that selects source Si with probability Pi.
This is an exceptionally rich model and can be used to describe some very complicated
processes.
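A hedged sketch of the switch idea: each call picks one of the sources Si with probability Pi, and the selected source's own model Mi generates the next symbol. The sources and probabilities below are made up for illustration.

```python
import random

# Two toy sources, each with its own symbol model M_i (illustrative values).
sources = [
    {"a": 0.9, "b": 0.1},   # S1: mostly 'a'
    {"x": 0.5, "y": 0.5},   # S2: 'x' and 'y' equally likely
]
switch_probs = [0.7, 0.3]   # P_i: probability the switch selects S_i

def emit_symbol() -> str:
    # The switch makes exactly one source active, then that source's
    # own model generates the next symbol.
    model = random.choices(sources, weights=switch_probs)[0]
    symbols, weights = zip(*model.items())
    return random.choices(symbols, weights=weights)[0]

print("".join(emit_symbol() for _ in range(20)))
```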
Data Compression = Modeling + Coding
Data compression is a process in which a set of data is transformed into a more compact
representation, for example for storage or for transmission over a network. This basically
gives rise to two important parts: modeling and coding. The extent to which we can
remove data redundancy depends on both.
If a proper model has been used and proper codes have been generated by the coder, the
redundancy can be reduced significantly. The model estimates the probabilities of
occurrence of the symbols, from the most frequent to the least frequent, while the coder
produces output codes based on the model's estimates.
Coding alone does not sum up data compression; modeling of the data is equally
important.
Input Data → Modeling (predicts probabilities) → Encoder (generates the codes by
applying coding) → Encoded Output Data
Modeling is a process or technique that predicts the probability of occurrence of the
symbols and thereby selects a coding scheme for them. A model, in general, is a
representation that captures the distinguishing features of something so that it can be
described precisely. Here, modeling means representing the characteristic features of the
data so that the data can be described in a more precise and compact way.
When we input data, certain processing has to be carried out to obtain the specified
output. We need a specific model to predict the probabilities of the data, so that the
encoder can apply a pre-specified coding procedure to it.
The encoder makes use of the model's information and, based on that information,
follows the pre-specified procedure to produce the desired result.
So we can say that data compression is the result not only of the data model being used,
but of the combination of both the model and the encoder.
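To tie the two halves together, a hedged sketch of the model + coder split: the model supplies probabilities, and an ideal entropy coder would spend about -log2(p) bits on a symbol of probability p. The function below is illustrative, not a real codec.

```python
import math
from collections import Counter

def ideal_code_lengths(data: str) -> dict[str, float]:
    # Model: estimate p(symbol) from frequencies over the input.
    # Coder: an ideal entropy coder spends -log2(p) bits per symbol.
    probs = {s: n / len(data) for s, n in Counter(data).items()}
    return {s: -math.log2(p) for s, p in probs.items()}

data = "aaaaaaab"
lengths = ideal_code_lengths(data)
total_bits = sum(lengths[s] for s in data)
print(lengths)     # the frequent 'a' gets a short code, the rare 'b' a long one
print(total_bits)  # about 4.35 bits, versus 8 symbols * 8 bits uncompressed
```

The better the model's probability estimates match the data, the shorter the total code length the coder can achieve, which is the sense in which data compression = modeling + coding.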