Data Compression Techniques: Lossless vs Lossy, Audio & Video Compression
1. TOPICS
DATA COMPRESSION
COMPRESSION TECHNIQUES
LOSSLESS COMPRESSION
LOSSY COMPRESSION
AUDIO COMPRESSION
VIDEO COMPRESSION
MPEG COMPRESSION
JPEG COMPRESSION
LOSSLESS VS. LOSSY COMPRESSION
ADVANTAGE OF COMPRESSION
2. DATA COMPRESSION
The process of reducing
the volume of data by
applying a compression
technique is called
compression.
The resulting data is
called compressed data.
3. DATA COMPRESSION
The reverse process of
reproducing the original
data from compressed data
is called decompression.
The resulting data is called
decompressed data.
4. Reasons to Compress
Reduce File Size
Save disk space
Increase transfer speed at a
given data rate
Allow real-time transfer at a
given data rate
5. Types of compression techniques
Compression techniques can be
categorized based on following
consideration:
Lossless or lossy
Symmetrical or asymmetrical
Software or hardware
6. Types of compression techniques
1. Lossless or lossy
If the decompressed data
is the same as the original
data, it is referred to as
lossless compression,
otherwise the compression
is lossy.
7. Types of compression techniques
2. Symmetrical or asymmetrical
In symmetrical compression,
the time required to compress
and to decompress are roughly
the same.
In asymmetrical compression,
the time taken for compression
is usually much longer than
decompression.
8. Types of compression techniques
3. Software or hardware
A compression technique
may be implemented either in
hardware or software. As
compared to software codecs
(coder and decoder), hardware
codecs offer better quality and
performance.
10. Compression - Types
Spatial Compression
– Finds similarities in an image and
compresses those similarities in a smaller
form
– Intra-frame
Temporal Compression
– Finds similarities across images and
compresses those similarities in a smaller
form
– Inter-frame
Quality of Compression
– Lossless
– Lossy
11. Compression - Spatial
Run Length Encoding
– Replace a run of consecutive
pixels of the same color by a
single copy of the color value and
a count of the number of pixels
Huffman coding
– Similar to RLE, but assigns codes
of different lengths to colors
(most common colors have
minimum bits)
12. Compression - Spatial
Dictionary-based coding
– Fixed-length bits point to a table
of variable-length color codes
– Basis of LZW and PKZIP
All lossless compression schemes
– typically achieve only modest ratios on image data (often around 2:1 at best)
13. Compression - Spatial
GIF
– Lossless compression
– Best suited for simple images
– Reduces colors to reduce file
size (256 colors max)
14. Compression - Spatial
JPEG
– Joint Photographic Experts
Group
– Lossy compression
– Best suited for photography
– Throws data away to further
reduce file size
15. Compression - Temporal
Motion JPEG
– Most popular for capturing
analog video
– JPEG on each frame of video
– No temporal compression
– Special-purpose hardware may
be needed for real-time
16. Compression - Temporal
DV
– Most popular for storage and
capturing digital video
– 5:1 compression usually done in
hardware (camera)
– Spatial and a little temporal
compression
17. Compression - Temporal
MPEG
– Moving Picture Experts Group
– Most popular for delivery of
digital video
– Temporal and spatial
compression
– MPEG-1, MPEG-2, MPEG-4 & MPEG-7
19. Lossless techniques
RUN-LENGTH CODING: -
repeated symbols in a
string are replaced with the
symbol and the number of
instances it is repeated.
example
“aaaabbcccccaaaaaabaaaaa”
is expressed as
“a4b2c5a6b1a5”.
21. Run-Length coding
Consider compressing the sequence:
ABBBBBBBBBCDEEEEF
– Using RLE compression with an escape flag Ω, the compressed file
takes up 10 bytes and would look like this:
A Ω9B C D Ω4E F
– Data size before compression: 17 bytes
– Data size after compression: 10 bytes
Compression ratio: 17/10 = 1.7
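The simple symbol-and-count form of RLE described above can be sketched in Python; `rle_encode`/`rle_decode` are illustrative names, and the scheme assumes the input symbols are not themselves digits:

```python
from itertools import groupby

def rle_encode(text: str) -> str:
    # Replace each run of a repeated symbol with the symbol and its run length.
    return "".join(f"{ch}{len(list(run))}" for ch, run in groupby(text))

def rle_decode(encoded: str) -> str:
    # Read a symbol, then the digits that follow it as the run length.
    out, i = [], 0
    while i < len(encoded):
        ch, i = encoded[i], i + 1
        j = i
        while j < len(encoded) and encoded[j].isdigit():
            j += 1
        out.append(ch * int(encoded[i:j]))
        i = j
    return "".join(out)
```

Note that this naive form expands data with no runs (each lone symbol becomes two bytes), which is why practical variants, like the Ω-flag scheme above, escape only genuine runs.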
24. Compression Fundamentals
Lossless
– ensures that the data
recovered from the
compression /
decompression process is
exactly the same as the
original data.
– Commonly used to
compress executable
code, text files, and
numeric data.
25. Compression Fundamentals
Lossy
– does not promise that the data
received is exactly the same as
the data sent
– removes information that it
cannot later restore
(Hopefully, no one will notice.)
– Commonly used to compress
digital imagery, including video.
27. Introduction
Digital Audio Compression
– Removal of redundant or
otherwise irrelevant information
from audio signal
– Audio compression algorithms
are often referred to as “audio
encoders”
Applications
– Reduces required storage space
– Reduces required transmission
bandwidth
28. Audio Data Compression
Lossless Audio Compression
– Removes redundant data
– Resulting signal is same as
original – perfect reconstruction
Lossy Audio Encoding
– Removes irrelevant data
– Resulting signal is similar to
original
29. Audio compression
Audio compression is a form of
data compression designed to
reduce the size of audio data
files.
Audio compression can mean two
things:
Audio data compression
Audio level compression
30. Audio compression
Audio data compression - in
which the amount of data in a
recorded waveform is reduced
for transmission. This is used
in MP3 encoding, internet
radio, and the like.
31. Audio compression
Audio level compression - in
which the dynamic range
(difference between loud and
quiet) of an audio waveform is
reduced. This is used in guitar
effects racks, recording studios,
etc.
33. MPEG Components
MPEG (Moving Picture Experts
Group) is a multimedia standard
with specifications for coding,
compression and transmission of
audio, video and data streams.
Video: describes compression of
frames
Audio: describes compression of
audio frames
34. Audio compression
• MPEG audio
MPEG audio is a standard for
compression and decompression of
digital audio.
The coding technique used in the MPEG
audio standard (known as perceptual
coding) takes advantage of the
perceptual weaknesses of human ears
(psychoacoustic phenomena).
In perceptual coding, the audio
spectrum is divided into a set of narrow
frequency bands, to reflect the
frequency selectivity of human hearing.
35. BASIC STEPS OF MPEG AUDIO COMPRESSION
MPEG AUDIO ENCODING:
INPUT AUDIO SIGNAL → TIME-TO-FREQUENCY MAPPING FILTER BANK →
BIT/NOISE ALLOCATION, QUANTIZER AND CODING → BIT-STREAM FORMATTING →
ENCODED BIT-STREAM
(A PSYCHOACOUSTIC MODEL DRIVES THE BIT/NOISE ALLOCATION STAGE)
36. BASIC STEPS OF MPEG AUDIO COMPRESSION
MPEG AUDIO DECODING:
ENCODED BIT-STREAM → BIT-STREAM UNPACKING →
FREQUENCY SAMPLE RECONSTRUCTION → FREQUENCY-TO-TIME MAPPING →
DECODED AUDIO SIGNAL
37. VIDEO COMPRESSION
• MPEG video
MPEG video is a subset of the
MPEG standard.
Digital video compression may
either apply intraframe
compression to each individual
frame of the video or combine
both intraframe and interframe
compression.
38. VIDEO COMPRESSION
MPEG uses both intra-frame
and inter-frame techniques for
data compression.
MPEG compression is lossy
and asymmetric, with the
encoding process requiring
more computation than the
decoding process.
39. BASIC STEPS OF MPEG VIDEO COMPRESSION
VIDEO DATA TO BE COMPRESSED
1. PREPROCESSING AND COLOR SUBSAMPLING OF INDIVIDUAL FRAMES
2. INTERFRAME MOTION COMPENSATION FOR P-FRAMES AND B-FRAMES
3. DIVIDE EACH FRAME INTO 8X8 PIXEL BLOCKS
4. APPLY DCT TRANSFORMATION TO EACH 8X8 PIXEL BLOCK
5. PERFORM QUANTIZATION OF DCT COEFFICIENTS USING A Q-TABLE
6. ORDER THE 2-D OUTPUT IN ZIGZAG SEQUENCE
7. APPLY RUN-LENGTH ENCODING TO THE ZIGZAG SEQUENCE
8. APPLY VARIABLE-LENGTH ENCODING TO THE RESULTING STREAM
MPEG COMPRESSED VIDEO STREAM
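The zigzag-ordering step can be sketched in Python; `zigzag_order` is an illustrative helper, and the few nonzero coefficient values in `block` are made up for the example:

```python
def zigzag_order(n: int = 8):
    # JPEG/MPEG zigzag scan: walk the anti-diagonals of an n x n block,
    # alternating direction, so low-frequency coefficients come first.
    order = []
    for s in range(2 * n - 1):                     # s = row + col on a diagonal
        diag = [(i, s - i) for i in range(n) if 0 <= s - i < n]
        order.extend(diag if s % 2 else diag[::-1])
    return order

# A quantized 8x8 DCT block: after quantization most coefficients are zero,
# so the zigzag sequence ends in a long run of zeros, which RLE removes.
block = [[0] * 8 for _ in range(8)]
block[0][0], block[0][1], block[1][0] = 26, -3, -6  # hypothetical values
seq = [block[r][c] for r, c in zigzag_order()]
```

Because the nonzero coefficients cluster at the start of `seq`, the trailing run of zeros compresses to a single run-length token.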
40. Three Types of Frames
Intra frames (same as JPEG)
– typically about 12 frames between I frames
Predictive frames
– encode from previous I or P reference frame
Bi-directional frames
– encode from previous and future I or P
frames
I B B P B B P B B P B B I
41. Lossless compression
Loss-less compressions
reduce file size by encoding
image information more
efficiently.
Images compressed using
loss-less algorithms can be
restored to their original
condition.
42. Lossy compression
Lossy compressions reduce file
size by considerably greater
amounts than loss-less
compressions but lose both
information and quality.
At high compression, the image
will become visibly degraded.
44. Standard
JPEG
It supports the four modes of
encoding
– Sequential
• The image is encoded in the
order in which it is scanned.
– Progressive
• The image is encoded in
multiple passes.
45. JPEG (contd.)
– Hierarchical
• The image is encoded at
multiple resolutions to
accommodate different types of
displays.
– Lossless
• The image is encoded in such a
way that the original quality of the
image can be fully restored.
48. MPEG Standard
MPEG-1: Initial video and
audio compression standard.
Later used as the standard for
Video CD; its Layer III audio coding
became the (MP3) audio compression format.
49. MPEG
MPEG-2: Video and audio standards for
broadcast-quality television. Used for
digital satellite TV services like
DirecTV, digital cable television signals, and
(with slight modifications) for DVD video
discs.
MPEG-3: Originally designed for HDTV,
but abandoned in favor of MPEG-2.
50. MPEG Standard
MPEG-4: Expands MPEG-1 to support
video/audio "objects", 3D content, low
bitrate encoding and support for
Digital Rights Management (DRM).
MPEG-7: A formal system for
describing multimedia content.
MPEG-21: MPEG describes this future
standard as a Multimedia Framework.
51. JPEG Standard
JPEG:
– The core of JPEG image compression is the Discrete
Cosine Transform (DCT).
– Removes redundant information (the
"invisible" parts).
JPEG 2000:
– Successor to JPEG.
– The blockiness of JPEG is removed.
– The compression ratio for JPEG 2000 is higher
than for JPEG.
53. Why do we do data compression?
The primary reason for data
compression is to save space on
the hard disk, so that more data
fits in the same capacity.
54. What is the use of data compression
on network?
The most prominent use of
data compression on the network
is to reduce the amount of data
that must be stored and transmitted,
so that more files fit on the server
and transfers complete faster.
55. Compression
Even though disks have
gotten bigger, we are still
running short on disk space
A common technique is to
compress files so that they
take up less space on the disk.
57. Compression Utilities
Zip files are used for rapidly
distributing and storing files.
Zip files are compressed to save
space.
WinZip - a popular compression
utility for Windows.
WinRAR – another popular compression utility.
59. Applications
lossless data
compression is often used
to better use disk space on
office computers, or better
use the connection
bandwidth in a computer
network.
60. Applications
In other kinds of data such as
sounds and pictures, a small
loss of quality can be tolerated
without losing the essential
nature of the data, so lossy
data compression methods
can be used.
61. Lossless vs. Lossy Compression
NOTE:
Business data requires lossless compression, while audio and video
applications can tolerate some loss, which may not be very noticeable.
62. ADVANTAGES
Data compression saves space
on the hard disk, so that more
data fits in the same capacity.
On a network, it reduces the
amount of data to be stored and
transferred, so servers can hold
more files and transfers complete
faster.
63. Lossy vs. Lossless Compression
Lossy method can produce a
much smaller compressed file
than any known lossless method,
while still meeting the
requirements of the application.
Lossily compressed still images
are often reduced to 1/10th of
their original size; as with audio,
the quality loss is more
noticeable, especially on closer
inspection.
64. DATA COMPRESSION
NEEDED AS MOST OF THE REAL
WORLD DATA IS REDUNDANT
IMPORTANCE?
SAVES DISK SPACE
SAVES CONNECTION BANDWIDTH
REDUCES PROCESSING TIME
REDUCES COMMUNICATION TIME
ENABLES FAST STORAGE AND
RETRIEVAL
66. INFORMATION THEORY
IT IS A BRANCH OF MATHEMATICS
THAT DEALS WITH DATA/INFORMATION
REPRESENTATION
DATA COMPRESSION IS ONE OF
THE APPLICATIONS OF
INFORMATION THEORY
67. SHANNON’S PRINCIPLE
FOR INFORMATION
FOR DATA COMPRESSION, IT IS
ESSENTIAL TO MEASURE THE INFORMATION
CONTENT OF THE DATA, OR ITS
DEGREE OF RANDOMNESS/UNCERTAINTY
HIGH-PROBABILITY EVENTS CONTAIN
LESS SELF-INFORMATION, WHEREAS
LOW-PROBABILITY EVENTS CARRY MUCH
MORE SELF-INFORMATION
68. SHANNON’S PRINCIPLE
FOR INFORMATION
IT WAS GIVEN BY CLAUDE
SHANNON
ACCORDING TO HIM, SELF-
INFORMATION IS ASSOCIATED
WITH EVERY POSSIBLE OUTCOME
OF AN EVENT.
69. SHANNON’S PRINCIPLE
FOR INFORMATION
LET P(A) & P(B) BE THE PROB OF
OCCURRENCE OF EVENTS A & B
RESPECTIVELY.
ACCORDING TO SHANNON, THE SELF-INFO
ASSOCIATED WITH EVENT A MAY BE
DEFINED AS
Si(A) = -log_m P(A) = log_m [1/P(A)]
SIMILARLY, Si(B) = log_m [1/P(B)]
WHERE THE BASE m DEFINES THE UNIT OF INFO (m = 2 GIVES BITS)
71. SHANNON’S PRINCIPLE
FOR INFORMATION
CONCEPT OF Si MAY ALSO BE USED
TO MAKE INFERENCES BY
ASSOCIATING IT WITH 2 INDEPENDENT
EVENTS
LET A & B BE 2 INDEPENDENT EVENTS,
THEN
P(AB)= P(A)*P(B)
Si(AB)=-log2[P(AB)]
= [-log2P(A)] + [-log2P(B)]
= Si(A) + Si(B)
72. ENTROPY OF INFORMATION
ENTROPY IS A CONCEPT OF
THERMODYNAMICS
IN INFO THEORY, IT IS USED TO
FIND OUT THE
RANDOMNESS/UNCERTAINTY IN A
MESSAGE
73. ENTROPY OF INFORMATION
THE AVERAGE INFO CONTENT OF A
MESSAGE IS CALLED ITS ENTROPY
THE LESS LIKELY A MESSAGE IS TO
OCCUR, THE LARGER ITS INFO
CONTENT
ENTROPY IS AN IMPORTANT CONCEPT
OF DATA COMPRESSION
74. ENTROPY OF INFORMATION
THE ENTROPY (Ee) OF AN ELEMENT
IS THE MINIMUM NO. OF BITS NEEDED
TO ENCODE THAT ELEMENT
THE ENTROPY OF AN ENTIRE
MESSAGE (Em) IS THE MIN NO. OF
BITS NEEDED TO ENCODE THE
ENTIRE MESSAGE WITH A
LOSSLESS COMPRESSION.
75. ENTROPY OF INFORMATION
THE ENTROPY OF A MESSAGE
CAN BE USED TO DETERMINE IF
THE DATA COMPRESSION IS
WORTH ATTEMPTING.
IT CAN ALSO BE USED TO
EVALUATE THE EFFECTIVENESS
OF COMPRESSION.
76. ENTROPY OF INFORMATION
THE NO. OF BITS IN A
COMPRESSED CODE CAN BE
COMPARED TO THE ENTROPY
FOR THAT MESSAGE Em
REVEALING HOW CLOSE TO
OPTIMAL COMPRESSION ONE’S
CODE IS.
77. ENTROPY OF INFORMATION
SHANNON PROPOSED THE
FOLLOWING ENTROPY FUNCTION FOR A
MESSAGE:
Em = -Σ Pi log2(Pi), SUM OVER i = 1 TO N ---- (1)
WHERE N = NO. OF POSSIBLE CHAR
TYPES USED IN THE MESSAGE AND Pi
DENOTES THE PROB OF THE ith CHAR TYPE.
E.g., FOR “AABCCD”, N = 4
78. ENTROPY OF INFORMATION
THE ENTROPY OF A CHAR IS
GIVEN BY ITS SELF INFO ie.,
ENTROPY OF A CHAR A IS GIVEN
BY Ee=-log2P(A)
THE ENTROPY OF A MESSAGE
CONTAINING N CHARS CAN ALSO
BE FOUND OUT IN TERMS OF AV
SELF INFO OF ALL N CHARS
ie., Em = (1/N) * Σ Si OF THE
ith CHAR, i = 1 TO N ------- (2)
79. ENTROPY OF INFORMATION
NOTICE THE DIFFERENCE
BETWEEN N IN THE TWO
EQUATIONS
IN 1ST, N IS THE NO OF DISTINCT
CHARS USED IN THE MESSAGE
AND IN 2ND N = TOTAL NO OF
CHARS USED IN THE MESSAGE
80. ENTROPY OF INFORMATION
SO, ENTROPY OF A MESSAGE
GIVES THE AVERAGE NO OF BITS
REQUIRED TO REPRESENT A
CHARACTER IN THE MESSAGE
QUES: FOR THE MESSAGE
“dadbadcadbaadac” CALCULATE
Si ASSOCIATED WITH CHARS A &
B, ENTROPY OF CHARS C & D, AV
SELF INFO IN THE MESSAGE,
ENTROPY OF THE MESSAGE?
81. ENTROPY OF INFORMATION
N = 15
CHAR  COUNT  PROB   Si
d     4      4/15   1.90
a     6      6/15   1.32
b     2      2/15   2.90
c     3      3/15   2.32
AV SELF INFO OF MESSAGE = [1/N] * Σ Si OF THE ith CHAR
= [1/15]*[Si(1) + Si(2) + Si(3) + … + Si(15)]
= [1/15]*[Si(d)+Si(a)+Si(d)+Si(b)+ … +Si(c)]
= [1/15]*[1.90+1.32+1.90+2.90+ … +2.32] = [1/15]*28.28 = 1.88
ENTROPY OF MESSAGE = -Σ Pi*log2(Pi) = Σ Pi*Si, i = 1 TO 4
= (4/15)*1.90 + (6/15)*1.32 + (2/15)*2.90 + (3/15)*2.32
= 1.88
82. ENTROPY OF INFORMATION
NOTE THAT THE AV SELF
INFORMATION OF THE MESSAGE
AND THE ENTROPY OF THE
MESSAGE BOTH ARE SAME AND
BOTH THE FUNCTIONS GIVE THE
AVERAGE NO OF BITS REQUIRED
TO REPRESENT A CHARACTER IN
THE MESSAGE
83. ENTROPY OF INFORMATION
QUES 2: CALCULATE THE AV NO.
OF BITS REQUIRED TO
REPRESENT A CHAR IN THE
MESSAGE STRING “AAAAAABBCC”
84. ENTROPY OF INFORMATION
A 6 0.6
B 2 0.2
C 2 0.2
ENTROPY OF MESSAGE=-Σ Pi*log2(Pi), I=1 TO N
HERE N=3
ENTROPY OF THE MESSAGE
= 0.6*log2(1/0.6)+0.2*log2(1/0.2)+0.2*log2(1/0.2)
=0.6*0.74 + 0.2*2.32 + 0.2*2.32
=0.44 + 0.46 + 0.46
=1.36= AV NO OF BITS REQUIRED TO
REPRESENT A CHARACTER
85. CODES
A CODE IS ANY MAPPING FROM AN
INPUT ALPHABET TO AN OUTPUT
ALPHABET
A CODE CAN BE, SAY, {a,b,c} = {0,1,00},
BUT THIS CODE IS NOT UNIQUELY
DECODABLE.
IF THE DECODER GETS A CODED
MESSAGE OF 2 ZEROS, THERE IS NO
WAY IT CAN KNOW WHETHER THE
ORIGINAL MESSAGE HAD TWO a’S OR
ONE c
86. CODES
A CODE IS INSTANTANEOUS IF EACH
CODEWORD IN A MESSAGE CAN BE
DECODED AS SOON AS IT IS RECEIVED.
THE BINARY CODE {a,b} = {0,01} IS
UNIQUELY DECODABLE, BUT IT IS NOT
INSTANTANEOUS. ONE HAS TO SEE IF
THE NEXT BIT IS 1. IF IT IS, b IS
DECODED; IF NOT a IS DECODED.
THE BINARY CODE {a,b,c}={0,10,11} IS
AN INSTANTANEOUS CODE
87. CODES
A CODE IS A PREFIX CODE IFF NO
CODEWORD IS A PREFIX OF ANOTHER
CODE WORD.
A CODE IS INSTANTANEOUS IFF IT IS A
PREFIX CODE, SO A PREFIX CODE IS
ALWAYS UNIQUELY DECODABLE AND
INSTANTANEOUS.
ANY UNIQUELY DECODABLE CODE
CAN BE REPLACED BY A PREFIX CODE
WITH THE SAME CODEWORD LENGTHS.
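The prefix property can be checked mechanically; `is_prefix_code` is an illustrative helper (after sorting, a codeword that is a prefix of another sorts immediately before the words it prefixes, so checking adjacent pairs suffices):

```python
def is_prefix_code(codewords) -> bool:
    # A code is a prefix (instantaneous) code iff no codeword is a
    # prefix of another codeword.
    words = sorted(codewords)
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))
```

Applied to the slides' examples: {0,10,11} passes, while {0,01} and {0,1,00} fail.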
88. TYPES OF CODING
THERE ARE MANY ALGORITHMS
FOR CODING THE CHARACTERS
BUT CAN BE BROADLY DIVIDED
INTO 2 TYPES:
STATIC (FIXED SIZE) CODING
DYNAMIC (VARIABLE SIZE)
CODING
89. STATIC CODING SCHEME
IF THE MESSAGE IS COMPOSED
OF M DISTINCT CHARS, THEN THE
NO. OF BITS REQUIRED
IN THE CODE IS N = ⌈log_b M⌉, WHERE
N = MINIMUM NO. OF BITS
REQUIRED TO REPRESENT M
DISTINCT CHARS AND b = BASE OF
THE NUMBER SYSTEM (b = 2 FOR BINARY)
90. STATIC CODING SCHEME
THE MAIN DISADVANTAGE IS
THAT IT DOES NOT CONSIDER
THE FREQUENCY OR PROB OF
OCCURRENCE OF A PARTICULAR
CHAR IN THE MESSAGE
91. STATIC CODING SCHEME
QUES 3: CONSIDER THE MESSAGE “RAMRAHIM”
FIND THE NO OF DISTINCT CHARS, THE MIN
NO OF BITS REQUIRED TO REPRESENT A
CHAR, GENERATE THE CODE FOR ALL
DISTINCT CHARS, BY USING THESE CODES
WHAT SHALL BE THE CODED MESSAGE FOR
THE MESSAGE “MIHIR”, HOW MUCH IS THE
SAVING BY USING THE CODING SCHEME
OVER ASCII REPRESENTATION
92. STATIC CODING SCHEME
NO OF DISTINCT CHARS = 5
N = ⌈log2 M⌉ = ⌈log2 5⌉ = 3; SO A 3-BIT CODE IS NEEDED TO
REPRESENT EACH SYMBOL
000=R, 001=A, 010=M, 011=H, 100=I; REST ARE
UNUSED
BY USING THE CODES AS ABOVE, THE CODED
MESSAGE FOR “MIHIR” SHALL BE
“010100011100000”
EACH CHARACTER OF THE STRING IS
REPRESENTED BY 3 BITS AND THERE ARE 5
CHARACTERS IN THE MESSAGE. SO THE NO OF BITS
REQUIRED= 5*3=15; THEREFORE SAVING = 40 – 15 =
25 BITS
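The static (fixed-length) scheme above can be sketched in Python; `static_codes` is an illustrative name, and symbols are numbered in order of first appearance so the codes match the slide's assignment:

```python
from math import ceil, log2

def static_codes(message: str) -> dict:
    # Fixed-length coding: each of the M distinct chars gets an
    # N-bit code, with N = ceil(log2(M)).
    symbols = sorted(set(message), key=message.index)  # first-appearance order
    n_bits = ceil(log2(len(symbols)))
    return {ch: format(i, f"0{n_bits}b") for i, ch in enumerate(symbols)}

codes = static_codes("RAMRAHIM")
encoded = "".join(codes[ch] for ch in "MIHIR")
```

The 15-bit result compares with 5*8 = 40 bits in ASCII, a saving of 25 bits.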
93. DYNAMIC CODING
SCHEME
COMPUTERS ENCODE CHARS IN ASCII CODE.
SO, A FILE HAVING 100 CHARS SHALL
REQUIRE 800 BITS
BUT IN ANY TEXT FILES, SOME CHARS
OCCUR WITH MORE FREQUENCY THAN
OTHERS
SO, IT IS BETTER THAT SHORTER BIT CODES
ARE ASSIGNED TO THE FREQUENTLY
OCCURING CHARS THAN OTHERS.
THIS WAS ALSO REALIZED WAY BACK BY
SAMUEL MORSE.
THIS CONCEPT IS USED IN DYNAMIC CODING.
94. DYNAMIC CODING SCHEME
IT USES VARIABLE SIZE CODE
MINIMUM NO OF BITS ARE ASSIGNED
TO THE MOST FREQUENTLY OCCURING
CHARACTER AND MAXIMUM NO OF
BITS TO THOSE WHICH ARE LEAST
FREQUENTLY USED.
ANY STATISTICAL MODEL MAY BE
USED TO CALCULATE THE
FREQUENCY OF OCCURRENCE OF
CHARACTERS.
95. DYNAMIC CODING SCHEME
QUES 4: CONSIDER THE MESSAGE
“RAAMRAHMMM”
FIND OUT THE DISTINCT CHARACTERS AND
THEIR FREQUENCY, GENERATE CODES FOR
ALL CHARACTERS USING DYNAMIC
CODING, USING GENERATED CODES WRITE
THE CODE FOR “MAHR”, HOW MUCH IS THE
SAVINGS IN BIT.
ANS:
a) 4 DISTINCT CHARS; R-2, A-3, M-4 AND H-1
b) M-1, A-01, R-001, H-0001
c) “MAHR” = 1010001001 = 10 BITS
d) SAVINGS = 32 – 10 = 22 BITS
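The codes generated above can be verified directly; the dictionary literal below just restates the slide's code table:

```python
# Dynamic (variable-length) codes from the example: the most frequent
# char (M) gets the shortest code, the least frequent (H) the longest.
codes = {"M": "1", "A": "01", "R": "001", "H": "0001"}

def encode(message: str, codes: dict) -> str:
    return "".join(codes[ch] for ch in message)

bits = encode("MAHR", codes)
```

The 10-bit result compares with 4*8 = 32 bits in ASCII, a saving of 22 bits.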
96. USE OF ENTROPY IN CODING
THE ENTROPY FN IS USED TO
DEVELOP AN EFFICIENT CODE FOR
THE PURPOSE OF COMMUNICATION.
ONE CAN USE ENTROPY TO FIND OUT
THE SCOPE OF FURTHER REFINEMENT
IN THE CODING SCHEME AS THE
ENTROPY OF THE MESSAGE RESULTS
IN AVERAGE NO OF BITS REQUIRED TO
REPRESENT A CHARACTER.
97. USE OF ENTROPY IN CODING
QUES 5: CONSIDER A MESSAGE STREAM CONSISTING OF
CHARS A, B, C, D. LET THE PROB OF OCCURRENCE OF THE
CHARS BE 0.6, 0.3, 0.08 AND 0.02 RESPECTIVELY.
A. FIND Si FOR EACH CHAR (ANS: 0.73, 1.73, 3.64 AND 5.64 RESPECTIVELY)
B. FIND MIN NO OF BITS REQ TO REPRESENT A CHAR
USING STATIC CODING, IF A MESSAGE CONSISTS OF
ALL THE 4 CHARS
C. GENERATE CODE FOR THE CHARS USING THE DYNAMIC
SCHEME. WHAT IS THE AV NO OF BITS REQ TO
REPRESENT A CHAR IN THIS CODING SCHEME, IF A
MESSAGE CONTAINS 100 CHARS
D. IS THERE ANY POSSIBILITY OF FURTHER
REFINEMENT IN THE CODING SCHEME?
98. USE OF ENTROPY IN CODING
M = 4; MIN NO OF BITS REQ TO REP A
CHAR = N = ⌈log2 4⌉ = 2
USING THE DYNAMIC SCHEME, THE FOLLOWING
CODES CAN BE GENERATED FROM THE PROBABILITIES:
CHAR  PROB  CODE
A     0.60  1
B     0.30  01
C     0.08  001
D     0.02  0001
AV NO OF BITS REQ TO COMM A MESSAGE OF 100
CHARS = [(60*1)+(30*2)+(8*3)+(2*4)]/100
= 152/100 = 1.52
99. USE OF ENTROPY IN CODING
DYNAMIC CODING IS MORE EFFICIENT (1.52 VS 2 BITS/CHAR)
ENTROPY = -Σ Pi *log2(Pi)
= 0.6*0.73 + 0.3*1.73 + 0.08*3.64 + 0.02*5.64
≈ 1.37
SO, THE AV NO OF BITS REQ TO REPRESENT A CHAR IN THE
MESSAGE CANNOT GO BELOW 1.37
THERE IS A DIFFERENCE BETWEEN THE ENTROPY VALUE
AND THE NO OF BITS REQUIRED BY BOTH THE
METHODS, THEREFORE FURTHER REFINEMENT IS
POSSIBLE IN THE CODING SCHEMES.
100. LOSSLESS DATA COMPRESSION
ALL ALGORITHMS ATTEMPT TO
RE-ENCODE DATA TO REMOVE
REDUNDANCY
IT IMPLIES THAT DATA WITH NO
REDUNDANCY CAN NOT BE
COMPRESSED BY THESE
TECHNIQUES WITHOUT SOME
LOSS OF INFORMATION
101. SHANNON FANO ALGORITHM
IT USES THE IDEA OF USING
SHORTER CODES FOR MORE
FREQUENTLY OCCURING
CHARACTERS
GIVEN BY CLAUDE SHANNON &
R.M.FANO
102. SHANNON FANO ALGORITHM
ADVANTAGE?
CONSIDER A FILE HAVING 40
LETTERS WITH THE GIVEN
FREQUENCY: A:14; B:7; C:10; D:5;
E:4
ASCII – 40*8 = 320 BITS. DECODING
SIMPLY CONSISTS OF BREAKING
THE STREAM INTO 8-BIT BYTES AND
CONVERTING EACH INTO A CHARACTER.
SO, IT NEEDS NO ADDITIONAL INFO.
103. SHANNON FANO ALGORITHM
VARIABLE LENGTH ENCODING
SCHEMES SUCH AS HUFFMAN
AND SHANNON-FANO HAVE THE
FOLLOWING PROPERTIES:
CODES FOR MORE FREQUENT
CHARS ARE SHORTER THAN
ONES FOR LESS PROBABLE
CHARS
104. SHANNON FANO ALGORITHM
EACH CODE CAN BE UNIQUELY
DECODED. THIS IS CALLED THE PREFIX
PROPERTY ie., NO CHAR’S ENCODING
IS A PREFIX OF ANY OTHER’S.
• TO SEE WHY THIS PROPERTY IS
IMPORTANT, CONSIDER “A” ENCODED
AS 0;”B” AS 01;”C” AS 10. IF THE
DECODER ENCOUNTERS THE BIT-
STREAM “0010”, IS IT “ABA” OR
“AAC”?
105. SHANNON FANO ALGORITHM
WITH THE PREFIX GUARANTEE, THERE
IS NO AMBIGUITY IN DETERMINING
WHERE THE CHAR BOUNDARIES ARE.
ONE STARTS READING FROM THE
BEGINNING AND GATHERS BITS IN A
SEQUENCE UNTIL ONE FINDS A
MATCH.
THAT INDICATES THE END OF CHAR
AND ONE MOVES ALONG TO THE NEXT
CHAR.
106. SHANNON FANO ALGORITHM
1. FIND THE FREQ OF OCCURRENCE OF
EACH SYMBOL
2. SORT THE LIST IN DESCENDING ORDER OF FREQ
3. DIVIDE THE LIST INTO 2 PARTS, WITH
THE TOTAL FREQ COUNT OF THE
UPPER HALF BEING AS CLOSE TO
THAT OF THE BOTTOM HALF AS
POSSIBLE
107. SHANNON FANO ALGORITHM
4. REPEAT STEP 3 UNTIL EACH HALF
CONTAINS JUST ONE SYMBOL
5. CONSTRUCT THE BINARY TREE (SF
TREE) SO THAT THE UPPER HALF
BECOMES THE LEFT SUB-TREE AND
THE LOWER HALF BECOMES THE
RIGHT SUB-TREE. EACH LEFT
BRANCH IS ASSIGNED 0 AND EACH
RIGHT BRANCH 1
108. SHANNON FANO ALGORITHM
6. TO OBTAIN THE CODE FOR ANY
SYMBOL, CONCATENATE ALL THE
DIGITS FROM THE ROOT TO
THAT LEAF (SYMBOL)
QUES: APPLY SF ALGO TO A
TEXT FILE HAVING 40 CHARS
WITH THE GIVEN FREQ: A-14,
B-7, C-10, D-5, E-4
113. SHANNON FANO ALGORITHM
THIRD ITERATION
A 14
C 10
B 7
D 5
E 4
AFTER THE FOURTH ITERATION, WE
WILL HAVE THE FOURTH DIVISION AND
ALL THE HALVES WILL THEN HAVE ONLY
ONE SYMBOL.
115. SHANNON FANO ALGORITHM
OBTAINING THE CODE FROM THE TREE:
SYMBOL CODE NO OF BITS FREQUENCY
A 00 2 14
B 10 2 7
C 01 2 10
D 110 3 5
E 111 3 4
TOTAL NO OF BITS NEEDED FOR TEXT = 89
SO, AV NO OF BITS USED BY ANY SYMBOL=89/40=2.225
WHICH IS QUITE LESS AS COMPARED TO 8 BITS PER
SYMBOL NEEDED IN ASCII
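The whole procedure can be sketched recursively in Python; `shannon_fano` is an illustrative implementation of steps 1 to 6, assuming the input list is already sorted by descending frequency:

```python
def shannon_fano(freqs):
    # freqs: list of (symbol, count) pairs in descending count order.
    if len(freqs) == 1:
        return {freqs[0][0]: ""}
    total = sum(f for _, f in freqs)
    # Split so the two halves' frequency totals are as close as possible.
    split = min(range(1, len(freqs)),
                key=lambda i: abs(total - 2 * sum(f for _, f in freqs[:i])))
    codes = {s: "0" + c for s, c in shannon_fano(freqs[:split]).items()}
    codes.update({s: "1" + c for s, c in shannon_fano(freqs[split:]).items()})
    return codes

codes = shannon_fano([("A", 14), ("C", 10), ("B", 7), ("D", 5), ("E", 4)])
```

On the 40-character example this reproduces the slide's table (A=00, C=01, B=10, D=110, E=111) for a total of 89 bits.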
116. HUFFMAN ALGORITHM
GIVEN BY DAVID HUFFMAN
IMPROVEMENT OVER S-F ALGO.
LOSSLESS COMP ALGO, IDEAL FOR
COMPRESSING TEXT OR PROGRAM
FILES
A HUFFMAN CODE TABLE GUARANTEES
THE LOWEST POSSIBLE OUTPUT BIT
COUNT FOR THE INPUT STREAM OF
SYMBOLS, AMONG ALL CODES THAT ASSIGN
ONE INTEGER-LENGTH CODEWORD PER SYMBOL
117. HUFFMAN ALGORITHM
HUFFMAN CALLED THESE
“MINIMUM REDUNDANCY CODES”
IT BELONGS TO THE FAMILY OF
ALGOS WITH A VARIABLE CODE
WORD LENGTH.
USED IN PKZIP, LHA, GZIP, ZOO,
ARJ, JPEG AND MPEG
118. HUFFMAN ALGORITHM
MAIN DIFFERENCE:
S-F ALGO BUILDS THE BINARY TREE
FROM TOP TO BOTTOM, WHEREAS
HUFFMAN’S ALGO FORMS THE BINARY
TREE FROM BOTTOM TO TOP
PERFORMANCE OF BOTH OF THEM
ARE QUITE SIMILAR
119. HUFFMAN ALGORITHM
1. COUNT THE NO OF CHARS AND THE
FREQ OF OCCURRENCE OF EACH
CHARACTER
2. ARRANGE THEM IN THE DESCENDING
ORDER OF FREQ
3. CONSTRUCT THE HUFFMAN TREE FOR
THE GENERATION OF CODES
120. HUFFMAN ALGORITHM
CONSTRUCTION OF HUFFMAN TREE:
1. PICK UP THE 2 CHARS FROM THE LIST HAVING
MINIMUM FREQ. LET US CALL THESE
CHARS A AND B
2. CREATE 2 FREE NODES OF THE BT AND
ASSIGN A AND B TO THESE NODES
3. ASSIGN A PARENT NODE FOR THEM AND
GIVE IT A FREQ THAT IS THE SUM OF
THE CHILD NODES’. LET US CALL IT “AB”
121. HUFFMAN ALGORITHM
4. DELETE A AND B FROM THE LIST
5. ADD THE VALUE OF “AB” TO THE LIST
6. REPEAT STEPS 1 TO 5 TILL THE LIST OF CHARS
BECOMES EMPTY. THE RESULTANT TREE THUS
GENERATED IS THE HUFFMAN TREE.
7. ASSIGN THE BITS TO THE NODES OF THE TREE AS IN THE
S-F ALGO, ie., 0 TO THE LEFT CHILD & 1 TO THE RIGHT CHILD
8. TO FIND THE CODE FOR A CHAR, TRAVERSE FROM THE
ROOT TO THE LEAF CONTAINING THAT CHAR.
122. HUFFMAN ALGORITHM
PROBLEM: LET A MESSAGE OF 100 CHARS
CONTAIN THE FOLLOWING:
CHAR FREQUENCY
A 50
B 20
C 15
D 10
E 5
STEPS 1 AND 2 HAVE ALREADY BEEN DONE
123. HUFFMAN ALGORITHM
CONSTRUCTION OF HUFFMAN TREE:
1. THE TWO CHARS HAVING MINIMUM FREQ ARE D & E
2. MAKE D AND E TWO FREE NODES OF THE TREE:
   D (10)   E (5)
3. ASSIGN A PARENT NODE FOR THEM:
   DE (15)
     D (10)   E (5)
124. HUFFMAN ALGORITHM
4 & 5: DELETE D & E FROM THE LIST, ADD DE, AND REPEAT:
   CDE (30)
     C (15)   DE (15)
6. REPEAT 1 TO 5 UNTIL THE LIST IS EMPTY:
   CDEB (50)
     CDE (30)   B (20)
126. HUFFMAN ALGORITHM
H TREE:
CDEBA (100)
  0: A (50)
  1: CDEB (50)
     0: CDE (30)
        0: C (15)
        1: DE (15)
           0: D (10)
           1: E (5)
     1: B (20)
127. HUFFMAN ALGORITHM
CHARACTER HUFFMAN CODE SIZE
A 0 1
B 11 2
C 100 3
D 1010 4
E 1011 4
TOTAL NO OF BITS REQUIRED = 195
AV BITS USED = 195/100 = 1.95
ENTROPY OF MESSAGE = - Σ Pi* log2(Pi) ≈ 1.92
BITS
SO, REDUNDANCY = 1.95 – 1.92 = 0.03
BITS/CHAR
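The bottom-up construction can be sketched with a min-heap in Python; `huffman` is an illustrative implementation. Tie-breaking may mirror the slide's tree differently left-to-right, but the code lengths and total bit count come out the same:

```python
import heapq
from itertools import count

def huffman(freqs: dict) -> dict:
    # Repeatedly merge the two lowest-frequency nodes; the counter is a
    # tie-breaker so heapq never tries to compare tree tuples.
    tick = count()
    heap = [(f, next(tick), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tick), (left, right)))
    codes = {}
    def walk(node, code):
        if isinstance(node, tuple):          # internal node: (left, right)
            walk(node[0], code + "0")
            walk(node[1], code + "1")
        else:                                # leaf: a symbol
            codes[node] = code or "0"
    walk(heap[0][2], "")
    return codes

codes = huffman({"A": 50, "B": 20, "C": 15, "D": 10, "E": 5})
```

For the 100-character example the code lengths are 1, 2, 3, 4, 4 bits, for the same 195-bit total as the slide's table.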
128. HUFFMAN ALGORITHM
HUFFMAN CODING CAN BE FURTHER
OPTIMIZED:
EXTENDED HUFFMAN COMPRESSION-
CAN ENCODE GROUP OF SYMBOLS
RATHER THAN SINGLE SYMBOL
ADAPTIVE HUFFMAN CODING-
DYNAMICALLY CHANGES THE CODE
WORDS ACCORDING TO THE CHANGE
OF PROBABILITY OF SYMBOLS
129. ARITHMETIC CODING
IT IS A METHOD OF WRITING A CODE IN A
NON-INTEGER LENGTH
IT ALLOWS ONE TO CODE VERY CLOSE TO
IDEAL ENTROPY
IT DOES NOT REPLACE AN INPUT SYMBOL
WITH A SPECIFIC CODE
INSTEAD, IT TAKES A STREAM OF INPUT
SYMBOLS AND REPLACES IT WITH A SINGLE
FLOATING POINT OUTPUT NUMBER
130. ARITHMETIC CODING
IT IS USED FOR THE SAME KINDS OF
CONTENT AS HUFFMAN CODING
IT IS DIFFERENT FROM HUFFMAN IN THE WAY
IT PROCESSES THE SOURCE
INSTEAD OF GIVING A BIT PATTERN TO EACH
CHAR, IT USES A PROB VALUE FOR EACH CHAR
IT IS BASED UPON PROBABILITIES BETWEEN 0 AND 1
131. ARITHMETIC CODING
THE OUTPUT FROM AN ARITHMETIC CODING
PROCESS IS A SINGLE NUMBER LESS THAN 1
AND GREATER THAN OR EQUAL TO 0
THIS SINGLE NUMBER CAN BE UNIQUELY
DECODED TO CREATE THE EXACT STREAM
OF SYMBOLS THAT WENT INTO ITS
CONSTRUCTION
IT GENERALLY GIVES A BETTER COMPRESSION RATIO THAN HUFFMAN CODING.
132. ARITHMETIC CODING
STEP 1: IT REQUIRES 5 VARIABLES FOR ENCODING:
RANGE, LOW, HIGH, RF (RANGE FROM), RT (RANGE TO)
EXAMPLE: LET THE MESSAGE BE “ABCBAA”.
THE FREQ OF CHARS A, B AND C ARE 3, 2
AND 1 RESPECTIVELY. A TABLE IS TO BE
CREATED AS FOLLOWS:
CHAR  PROB  RANGE              RF    RT
A     0.50  >=0.00 & <0.50     0.00  0.50
B     0.33  >=0.50 & <0.83     0.50  0.83
C     0.16  >=0.83 & <1.00     0.83  1.00
133. ARITHMETIC CODING
STEP 2: NOW ENCODE THE CHARS IN THE MESSAGE USING
THE TABLE AS OBTAINED IN STEP 1, AS FOLLOWS,
AND CREATE A TABLE AGAIN:
SET LOW = 0 AND HIGH = 1.0
WHILE there are still input symbols DO
  GET AN INPUT SYMBOL
  RANGE = HIGH(prev) – LOW(prev)
  LOW = LOW(prev) + RANGE * RF(current symbol)
  HIGH = LOW(prev) + RANGE * RT(current symbol)
END WHILE
OUTPUT = LOW
134. ARITHMETIC CODING
CHAR   RANGE        LOW        HIGH
START  –            0.0        1.0
A      1.0          0.0        0.5
B      0.5          0.25       0.415
C      0.165        0.38695    0.415
B      0.02805      0.400975   0.4102315
A      0.0092565    0.400975   0.40560325
A      0.00462825   0.400975   0.403289125
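The encoding loop above can be sketched in Python; exact `Fraction` arithmetic is used so the trace reproduces the slide's decimals without floating-point drift (`RANGES` restates the RF/RT table, and the function name is illustrative):

```python
from fractions import Fraction as F

# (RF, RT) pairs from the table, kept exact.
RANGES = {"A": (F(0), F(1, 2)),
          "B": (F(1, 2), F(83, 100)),
          "C": (F(83, 100), F(1))}

def arith_encode(message: str) -> F:
    low, high = F(0), F(1)
    for ch in message:
        rng = high - low
        rf, rt = RANGES[ch]
        low, high = low + rng * rf, low + rng * rt
    return low           # the final LOW identifies the whole message

code = arith_encode("ABCBAA")
```

For "ABCBAA" the final LOW is exactly 0.400975, matching the trace.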
135. ARITHMETIC CODING
THE FINAL OUTPUT VALUE = LOW = 0.400975
STEP 3: TO DECODE THE MESSAGE, ie., TO GET THE
CHARS BACK, THE FOLLOWING PROCESS IS
ADOPTED:
AGAIN 5 VARIABLES ARE REQUIRED: RANGE, RF, RT,
VALUE AND RD (RANGE DIFFERENCE)
1. OUTPUT THE SYMBOL BY DETERMINING THE RANGE
IN WHICH THE VALUE LIES. IN THIS EXAMPLE THE
VALUE IS 0.400975, WHICH LIES BETWEEN 0 AND
0.5. SO, THE FIRST CHARACTER DECODED IS “A”.
2. GET A NEW VALUE USING RD = RT – RF:
NEW VALUE = (PREV VALUE – PREV RF)/PREV RD
136. ARITHMETIC CODING
VALUE       RANGE          CHAR DECODED  RD
0.400975    0.00 – <0.50   A             0.50
0.80195     0.50 – <0.83   B             0.33
0.915       0.83 – <1.00   C             0.16
0.53125     0.50 – <0.83   B             0.33
0.09469696  0.00 – <0.50   A             0.50
0.18939392  0.00 – <0.50   A             0.50
ADV: BETTER RESULT THAN HUFFMAN
DISADV:
1. COMPLICATED CALCULATIONS
2. IT REQUIRES AN FPU, SO THE PROCESS IS SLOW
3. THE DECODER DOES NOT KNOW WHERE THE DECODING PROCESS
SHOULD END. TO OVERCOME THIS PROBLEM, ONE SPECIAL
CHAR IS INSERTED INTO THE ENCODED TEXT AS A
DELIMITER. AT THE TIME OF DECODING, IT INDICATES
THAT THERE ARE NO MORE CHARS TO DECODE.
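Decoding can be sketched the same way; since plain arithmetic coding cannot tell where to stop (disadvantage 3), this sketch is given the symbol count explicitly. `Fraction` keeps the rescaling exact; note that RD is taken as RT − RF, so C's width is 0.17 rather than the trace's truncated 0.16, which changes the intermediate values slightly but decodes the same string:

```python
from fractions import Fraction as F

RANGES = {"A": (F(0), F(1, 2)),
          "B": (F(1, 2), F(83, 100)),
          "C": (F(83, 100), F(1))}

def arith_decode(value: F, n_symbols: int) -> str:
    # Find the range the value falls in, emit that symbol, then rescale:
    # value = (value - RF) / RD  with  RD = RT - RF.
    out = []
    for _ in range(n_symbols):
        for ch, (rf, rt) in RANGES.items():
            if rf <= value < rt:
                out.append(ch)
                value = (value - rf) / (rt - rf)
                break
    return "".join(out)

decoded = arith_decode(F(400975, 1000000), 6)
```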
137. Compression ratio
ONE NEEDS TO KNOW IT TO FIND
OUT THE EFFICIENCY OF THE
COMPRESSION ALGORITHM
C.R. = (SIZE OF ORIGINAL DATA – SIZE OF COMPRESSED DATA) / SIZE OF ORIGINAL DATA
138. DICTIONARY BASED COMPRESSION TECHNIQUES
STATISTICAL METHODS, SUCH AS S-F AND
HUFFMAN, ENCODE A SINGLE SYMBOL AT A
TIME BY GENERATING A ONE-TO-ONE
SYMBOL-TO-CODE MAP.
A DICTIONARY BASED COMPRESSOR
REPLACES AN OCCURRENCE OF A
PARTICULAR PHRASE OR GROUP OF BYTES
IN A PIECE OF DATA WITH AN INDEX TO A
PREVIOUS OCCURRENCE OF THAT PHRASE.
139. DICTIONARY BASED COMPRESSION TECHNIQUES
SUPPOSE A TEXT IS GIVEN
IT IS ASSUMED THAT THERE IS A DICTIONARY
THAT HAS ALL THE WORDS IN THE GIVEN
TEXT.
EACH WORD IN THE DICTIONARY IS
REPRESENTED BY A UNIQUE NUMBER THAT
ALSO INDICATES THE POSITION OR THE
INDEX OF THE WORD IN THE DICTIONARY.
140. DICTIONARY BASED COMPRESSION TECHNIQUES
WHEN THE TEXT IS TO BE COMPRESSED, THE WORDS
OF THE TEXT ARE REPLACED BY THE INDEX OF THAT
WORD.
LET THE TEXT BE “LEARN THE DICTIONARY BASED
COMPRESSION METHOD. IT IS A VERY SIMPLE
METHOD. THANK YOU.”
SAY THE DICTIONARY IS LIKE THIS: LEARN-1; THE-2;
DICTIONARY-3; BASED-4; COMPRESSION-5; METHOD-6;
IT-7; IS-8; A-9; VERY-10; SIMPLE-11; THANK YOU-12
THE ENCODED MESSAGE WILL BE “1 2 3 4 5 6 7 8 9 10
11 6 12”
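The word-to-index substitution above can be sketched directly; `DICTIONARY` restates the slide's example (with "THANK YOU" as a single phrase entry), and indices are 1-based as on the slide:

```python
DICTIONARY = ["LEARN", "THE", "DICTIONARY", "BASED", "COMPRESSION",
              "METHOD", "IT", "IS", "A", "VERY", "SIMPLE", "THANK YOU"]
INDEX = {word: i + 1 for i, word in enumerate(DICTIONARY)}   # 1-based

def dict_encode(words):
    # Both sender and receiver must share DICTIONARY for this to work.
    return [INDEX[w] for w in words]

def dict_decode(indices):
    return [DICTIONARY[i - 1] for i in indices]

text = ["LEARN", "THE", "DICTIONARY", "BASED", "COMPRESSION", "METHOD",
        "IT", "IS", "A", "VERY", "SIMPLE", "METHOD", "THANK YOU"]
encoded = dict_encode(text)
```

The repeated word METHOD maps to the same index (6) both times, which is where the saving comes from.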
141. DICTIONARY BASED COMPRESSION TECHNIQUES
IF THE PHRASES ARE USED, EFFICIENCY INCREASES.
FOR THIS TO WORK, IT IS IMPORTANT THAT THE
SENDER AND THE RECEIVER MUST HAVE ACCESS TO
THE SAME DICTIONARY.
DICTIONARY BASED METHODS ARE MORE EFFICIENT
THAN CHARACTER BASED METHODS.
IT GENERATES THE CODE FOR CHARS AS WELL AS
FREQUENTLY USED WORDS AND PHRASES.
142. DICTIONARY BASED COMPRESSION TECHNIQUES
THE DICTIONARY BASED METHOD MAY BE
STATIC OR DYNAMIC DEPENDING UPON THE
CREATION AND USE OF DICTIONARY.
STATIC DICTIONARY IS PREPARED BEFORE
THE COMMUNICATION OF THE ENCODED
MESSAGE TO THE RECEIVER’S END. ALL
POSSIBLE CHARS/WORDS/PHRASES ARE
INSERTED INTO THE DICTIONARY AND
INDEXED.
143. DICTIONARY BASED COMPRESSION TECHNIQUES
THE MAIN DRAWBACK OF STATIC METHOD IS
THAT PERFORMANCE DEPENDS UPON THE
TEXT TO BE ENCODED AND IS HIGHLY
DEPENDENT ON THE ORGANIZATION OF THE
CHARS/WORDS/PHRASES IN THE
DICTIONARY.
SECONDLY, IF THERE IS ANY WORD NOT IN
THE DICTIONARY, IT FAILS.
THE SOLUTION TO THE PROBLEM IS DYNAMIC
DICTIONARY COMPRESSION.
144. DICTIONARY BASED COMPRESSION TECHNIQUES
IN THIS METHOD, THE DICTIONARY IS
PREPARED AT THE TIME OF ENCODING OF
TEXT.
LZ77, LZ78 AND LZW TECHNIQUES USE
DYNAMIC DICTIONARY COMPRESSION
TECHNIQUE.
IT GENERATES OPTIMUM SIZE CODES.
145. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
DESIGNED FOR SEQUENTIAL DATA COMPRESSION.
THE DICTIONARY IS A PORTION OF THE PREVIOUSLY
ENCODED SEQUENCE.
THE ENCODER EXAMINES THE INPUT SEQUENCE
THROUGH A SLIDING WINDOW.
THE WINDOW CONSISTS OF 2 PARTS: A SEARCH
BUFFER, THAT CONTAINS A PORTION OF THE
RECENTLY ENCODED SEQUENCE AND A LOOK-AHEAD
BUFFER, THAT CONTAINS THE NEXT PORTION OF THE
SEQUENCE TO BE ENCODED.
146. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
LZ77 sliding window (the pointer moves back through the search buffer):
SEARCH BUFFER:     c a b r a c a d
LOOK-AHEAD BUFFER: a b r a r r a r r
1. To encode the sequence in look-ahead buffer,
the encoder moves a search pointer back through
the search buffer until it encounters a match to the
first symbol in the look-ahead buffer. The distance
of the pointer from the look-ahead buffer is called
the offset.
147. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
2. The encoder then examines the symbols
following the symbol at the pointer location to
see if they match consecutive symbols in the
look-ahead buffer. The number of consecutive
symbols in the search buffer that match
consecutive symbols in the look-ahead
buffer, starting with the first symbol, is called
the length of the match. The encoder
searches the search buffer for the longest
match.
148. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
3. Once the longest match has been found, the
encoder encodes it with a triple <o,l,c> where
o is the offset, l is the length of the match and
c is the code-word corresponding to the
symbol in the look-ahead buffer that follows
the match.
In the diagram, the longest match is the first a
of the search buffer. The offset o in this case
is 7, l is 4, and the symbol in the look-ahead
buffer following the match is r.
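The sliding-window search described above can be sketched as a toy encoder. The buffer sizes and names here are illustrative only, not taken from any standard:

```python
def lz77_encode(data, search_size=8, lookahead_size=8):
    # Toy LZ77 encoder emitting (offset, length, next_char) triples.
    i, triples = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        # Scan the search buffer for the longest match with the look-ahead.
        for j in range(max(0, i - search_size), i):
            length = 0
            # Extend the match; it may run into the look-ahead buffer.
            while (length < lookahead_size - 1
                   and i + length < len(data) - 1
                   and data[j + length] == data[i + length]):
                length += 1
            if length > best_len:          # keep the longest match
                best_off, best_len = i - j, length
        nxt = data[i + best_len]           # symbol following the match
        triples.append((best_off, best_len, nxt))
        i += best_len + 1
    return triples
```

A symbol with no match in the search buffer comes out as `(0, 0, symbol)`, exactly as the slides describe.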
149. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
The reason for sending the third element in the
triple is to take care of the situation where no
match for the symbol in the look-ahead buffer
can be found in the search buffer. In this case,
the offset and the match length values are set
to 0, and the third element of the triple is the
code for the symbol itself.
For the decoding process, it is basically a table
look-up procedure and can be done by
reversing the encoding procedure.
150. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
Take a buffer of the same size as used in
encoding, say n, and then use its first (N – n)
spaces to hold the previously decoded chars,
where N is the size of the window ( sum of the
size of the look-ahead buffer and the search
buffer) used in the encoding process.
If one breaks up each triple that one encounters
back into its components- position offset o,
match length l and the last symbol of the
incoming stream c, one can extract the match
string from buffer according to o, and thus
obtain the original content.
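Decoding simply replays each triple: copy `length` symbols starting `offset` positions back in the already-decoded output, then append the explicit character. A minimal sketch, where the example triples reproduce the <7,4,r> case from the slides:

```python
def lz77_decode(triples):
    # Rebuild the charstream from (offset, length, char) triples.
    out = []
    for offset, length, char in triples:
        for _ in range(length):
            out.append(out[-offset])  # the copy may overlap itself
        out.append(char)
    return "".join(out)

# <7,4,r> copies "abra" from 7 symbols back, then appends "r".
print(lz77_decode([(0, 0, "c"), (0, 0, "a"), (0, 0, "b"), (0, 0, "r"),
                   (3, 1, "c"), (2, 1, "d"), (7, 4, "r")]))
# -> cabracadabrar
```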
151. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
It provides very good compression ratio for
many types of data.
But, the encoding process is very time
consuming as there are many comparisons to
be performed between the look-ahead buffer
and the window.
On the other hand, the decoding process is
very simple and fast and both the encoding and
the decoding processes have a low memory
consumption, since the only data held in the
memory is the window (typically between 4 and 64 KB)
152. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
ALL POPULAR ARCHIVERS (ARJ, LHA, ZIP, ZOO) ARE
VARIATIONS ON THE LZ77 THEME.
DRAWBACK: IT USES ONLY A SMALL WINDOW INTO
PREVIOUSLY SEEN TEXT, WHICH MEANS IT
CONTINUOUSLY THROWS AWAY VALUABLE
DICTIONARY ENTRIES BECAUSE THEY SLIDE OUT OF
THE DICTIONARY. THE LONGEST MATCH POSSIBLE IS
ROUGHLY THE SIZE OF THE LOOK-AHEAD BUFFER.
SECONDLY, IF A STRING THAT HAS ALREADY BEEN
CAPTURED APPEARS AGAIN AT A LONGER INTERVAL,
A SEPARATE CODE WILL BE GENERATED FOR THE
SAME STRING.
153. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
TO OVERCOME SUCH PROBLEMS, LZ78 ALGO WAS
GIVEN.
THE ONLY DIFFERENCE HERE IS THAT THE FIXED SIZE
WINDOW OF LZ77 IS REPLACED BY A DICTIONARY IN
LZ78.
WHILE LZ77 WORKS ON PAST DATA, LZ78 ATTEMPTS
TO WORK ON FUTURE DATA.
IT DOES THIS BY FORWARD SCANNING THE INPUT
BUFFER AND MATCHING IT AGAINST A DICTIONARY IT
MAINTAINS.
154. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
THE DICTIONARY IN LZ78 IS A TABLE OF
STRINGS. EVERY STRING IS ASSIGNED A
CODE WORD ACCORDING TO ITS INDEX
NUMBER IN THE DICTIONARY.
BEFORE UNDERSTANDING THE METHOD,
LOOK AT THE FOLLOWING TERMS:
CHARSTREAM: A SEQUENCE OF DATA TO BE
ENCODED.
155. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
CODE WORD: A BASIC DATA ELEMENT IN THE CODE
STREAM. IT REPRESENTS A STRING FROM THE
DICTIONARY.
PREFIX: A SEQUENCE OF CHARS THAT PRECEDE ONE
CHARACTER
STRING: THE PREFIX TOGETHER WITH THE CHAR IT
PRECEDES
CODESTREAM: THE SEQUENCE OF CODE WORDS AND
CHARS ( THE OUTPUT OF THE ENCODING ALGORITHM)
156. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
CURRENT PREFIX (P) : THE PREFIX
CURRENTLY BEING PROCESSED IN THE
ENCODING ALGORITHM
CURRENT CHARACTER (C ): A CHAR
DETERMINED IN THE ENCODING ALGORITHM.
GENERALLY, THIS IS THE CHARACTER
PRECEDED BY THE CURRENT PREFIX.
CURRENT CODE WORD (W): THE CODE WORD
CURRENTLY PROCESSED IN THE DECODING
ALGORITHM.
157. LZ78 (LEMPEL-ZIV) ENCODING PROCESS
IT STARTS WITH A NEW DICTIONARY, i.e., AT
THE BEGINNING OF ENCODING THE
DICTIONARY IS EMPTY.
LET US CONSIDER A POINT WITHIN THE
ENCODING PROCESS, WHEN THE
DICTIONARY ALREADY CONTAINS SOME
STRINGS.
ONE STARTS ANALYZING A NEW PREFIX IN
THE CHARSTREAM, BEGINNING WITH AN
EMPTY PREFIX.
158. LZ78 (LEMPEL-ZIV) ENCODING PROCESS
IF ITS CORRESPONDING STRING (P+ C) IS
PRESENT IN THE DICTIONARY, THE PREFIX IS
EXTENDED WITH THE CHAR C.
THIS EXTENDING IS REPEATED UNTIL ONE
GETS A STRING WHICH IS NOT PRESENT IN
THE DICTIONARY.
AT THAT POINT, ONE OUTPUTS 2 THINGS TO
THE CODESTREAM: THE CODEWORD THAT
REPRESENTS THE PREFIX P AND THEN THE
CHAR C.
159. LZ78 (LEMPEL-ZIV) ENCODING PROCESS
THEN ONE ADDS THE WHOLE STRING (P+C) TO THE
DICTIONARY AND STARTS PROCESSING THE NEXT
PREFIX IN THE CHARSTREAM.
A SPECIAL CASE OCCURS IF THE DICTIONARY DOES
NOT CONTAIN EVEN THE STARTING ONE CHARACTER
STRING ( IT ALWAYS HAPPENS IN THE FIRST
ENCODING STEP). IN THAT CASE, ONE OUTPUTS A
SPECIAL CODE WORD THAT REPRESENTS AN EMPTY
STRING, FOLLOWED BY THIS CHARACTER, AND ADDS
THE CHAR TO THE DICTIONARY.
160. LZ78 (LEMPEL-ZIV) ENCODING PROCESS
THE OUTPUT FROM THIS ALGO IS A SEQ OF
CODEWORD-CHAR PAIRS (W,C). EACH TIME A
PAIR IS OUTPUT TO THE CODESTREAM, THE
STRING FROM THE DICTIONARY
CORRESPONDING TO W IS EXTENDED WITH
THE CHAR C AND THE RESULTING STRING IS
ADDED TO THE DICTIONARY.
IT MEANS THAT WHEN A NEW STRING IS
BEING ADDED, THE DICTIONARY ALREADY
CONTAINS ALL THE SUBSTRINGS FORMED BY
REMOVING CHARS FROM THE END OF THE
NEW STRING.
161. LZ78 (LEMPEL-ZIV) ENCODING ALGORITHM
LZ78:
1. START WITH AN EMPTY DICTIONARY AND AN EMPTY PREFIX P.
2. C = NEXT CHAR IN THE CHARSTREAM
3. IS THE STRING (P+C) PRESENT IN THE DICTIONARY?
   IF YES, THEN P = P+C
   IF NOT, THEN
   OUTPUT THESE 2 OBJECTS, P & C, TO THE
   CODESTREAM [THE CODEWORD
   CORRESPONDING TO P, AND C IN THE SAME
   FORM AS INPUT FROM THE CHARSTREAM]
   ADD THE STRING P+C TO THE DICTIONARY
   P = EMPTY
4. ARE THERE MORE CHARS IN THE CHARSTREAM?
   IF YES, RETURN TO STEP 2
   IF NOT,
   IF P IS NOT EMPTY, OUTPUT THE CODEWORD
   CORRESPONDING TO P
5. END
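The encoding steps above map almost line-for-line onto a short Python sketch (names are illustrative; codeword 0 represents the empty prefix):

```python
def lz78_encode(charstream):
    # LZ78 encoder: emits (codeword, char) pairs.
    # Codeword 0 stands for the empty prefix.
    dictionary = {}                # string -> 1-based index
    output, p = [], ""
    for c in charstream:
        if p + c in dictionary:
            p = p + c              # extend the current prefix
        else:
            output.append((dictionary.get(p, 0), c))
            dictionary[p + c] = len(dictionary) + 1
            p = ""
    if p:                          # a trailing prefix is output alone
        output.append((dictionary[p], ""))
    return output

print(lz78_encode("ABBCBCABA"))
# -> [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A')]
```

This reproduces the worked example ("ABBCBCABA") from the later slides.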
162. LZ78 (LEMPEL-ZIV) DECODING PROCESS
AT THE START OF DECODING THE DICTIONARY IS
EMPTY.
IT GETS RECONSTRUCTED IN THE PROCESS OF
DECODING.
IN EACH STEP, A CODEWORD-CHAR PAIR (W,C) IS
READ FROM THE CODESTREAM.
THE CODEWORD ALWAYS REFERS TO A STRING
ALREADY PRESENT IN THE DICTIONARY. THE STRING
CORRESPONDING TO W AND THE CHAR C ARE OUTPUT
TO THE CHARSTREAM, AND THE STRING (W+C) IS
ADDED TO THE DICTIONARY. AFTER DECODING, THE
DICTIONARY WILL LOOK EXACTLY THE SAME AS
AFTER ENCODING.
163. LZ78 (LEMPEL-ZIV) DECODING ALGORITHM
1. AT THE START THE DICTIONARY IS EMPTY.
2. W= NEXT CODEWORD IN THE CODESTREAM
3. C= THE CHARACTER FOLLOWING IT
4. OUTPUT THE STRING CORRESPONDING TO W
TO THE CHARSTREAM (THIS CAN BE AN
EMPTY STRING) AND THEN OUTPUT C
5. ADD THE STRING W+C TO THE DICTIONARY
6. ARE THERE MORE CODEWORDS IN THE
CODESTREAM?
IF YES, GO BACK TO STEP 2
IF NOT, END.
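The decoding steps can be sketched the same way; the dictionary is rebuilt as the pairs are read, so nothing besides the codestream needs to be transmitted (names are illustrative):

```python
def lz78_decode(pairs):
    # LZ78 decoder: rebuilds the dictionary while decoding.
    dictionary = {0: ""}           # codeword 0 is the empty string
    out = []
    for w, c in pairs:
        entry = dictionary[w] + c  # string for W, extended with C
        out.append(entry)
        dictionary[len(dictionary)] = entry
    return "".join(out)

print(lz78_decode([(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A")]))
# -> ABBCBCABA
```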
164. LZ78 (LEMPEL-ZIV) ENCODING PROCESS
EXAMPLE: LET THE CHAR STREAM TO BE
ENCODED BE
POS 1 2 3 4 5 6 7 8 9
CHAR A B B C B C A B A
ENCODING PROCESS:
STEP | CURRENT POSITION | DICTIONARY | OUTPUT
  1  |        1         | A          | (0,A)
  2  |        2         | B          | (0,B)
  3  |        3         | BC         | (2,C)
  4  |        5         | BCA        | (3,A)
  5  |        8         | BA         | (2,A)
165. LZ78 (LEMPEL-ZIV) ENCODING PROCESS
THE COLUMN DICTIONARY SHOWS WHAT
STRING HAS BEEN ADDED TO THE
DICTIONARY. THE INDEX OF THE STRING IS
EQUAL TO THE STEP NUMBER.
THE COLUMN OUTPUT PRESENTS THE
OUTPUT IN THE FORM (W,C)
THE OUTPUT OF EACH STEP DECODES TO
THE STRING THAT HAS BEEN ADDED TO THE
DICTIONARY.
166. LZ78 (LEMPEL-ZIV) DECODING PROCESS
THE DECODING PROCESS:
STEP | OUTPUT PAIR | TEXT GENERATED
  1  | (0,A)       | A
  2  | (0,B)       | B
  3  | (2,C)       | BC
  4  | (3,A)       | BCA
  5  | (2,A)       | BA
167. LZW ALGORITHM
LZW WORKS BY ENTERING PHRASES INTO A
DICTIONARY AND THEN, WHEN A REPEAT
OCCURRENCE OF THAT PARTICULAR PHRASE
IS FOUND, OUTPUTTING THE DICTIONARY
INDEX INSTEAD OF THE PHRASE.
FOR EXAMPLE, IT USES A DICTIONARY WITH
4096 ENTRIES. IN THE BEGINNING, THE
ENTRIES 0-255 REFER TO INDIVIDUAL BYTES
AND THE REST 256-4095 REFER TO LONGER
STRINGS.
168. LZW ALGORITHM
EACH TIME A NEW CODE IS GENERATED, IT
MEANS A NEW STRING HAS BEEN SELECTED
FROM THE INPUT STREAM.
NEW STRINGS THAT ARE ADDED TO THE
DICTIONARY ARE CREATED BY APPENDING
THE CURRENT CHARACTER K TO THE END OF
AN EXISTING STRING W.
169. LZW ALGORITHM
SET W=NIL
LOOP
READ A CHARACTER K
IF WK EXISTS IN THE DICTIONARY
W=WK
ELSE
OUTPUT THE CODE FOR W
ADD WK TO THE STRING TABLE
W=K
END-LOOP
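The pseudocode above can be fleshed out in Python. One assumption here: the string table grows without bound, whereas a practical LZW coder (as in the 4096-entry example above) caps the table and emits fixed-width codes:

```python
def lzw_encode(data):
    # LZW encoder sketch: entries 0-255 are the single bytes;
    # new phrases are added as they are encountered.
    table = {chr(i): i for i in range(256)}
    w, out = "", []
    for k in data:
        if w + k in table:
            w = w + k                  # W = WK
        else:
            out.append(table[w])       # output the code for W
            table[w + k] = len(table)  # add WK to the string table
            w = k
    if w:
        out.append(table[w])           # flush the final phrase
    return out

print(lzw_encode("ABABABA"))  # -> [65, 66, 256, 258]
```

Codes 256 and 258 in the output stand for the phrases "AB" and "ABA" that were entered into the table during encoding.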
170. JPEG
DESIGNED FOR COMPRESSING FULL COLOUR
OR GRAY SCALE DIGITAL IMAGES OF
REAL-WORLD SCENES.
IT DOES NOT WORK WELL ON TEXT OR ON
NON-REALISTIC IMAGES SUCH AS CARTOONS
AND LINE DRAWINGS.
IT IS INDEPENDENT OF THE SOURCE IMAGE:
IT CAN COMPRESS AN IMAGE IRRESPECTIVE
OF ITS SIZE.
171. JPEG
IT DOES NOT HANDLE B&W (1 BIT PER PIXEL)
IMAGES NOR DOES IT HANDLE MOTION
PICTURE COMPRESSION.
IT USES LOSSY TECHNIQUE.
THE ALGO ACHIEVES MUCH OF ITS
COMPRESSION BY EXPLOITING KNOWN
LIMITATIONS OF HUMAN EYE, THE FACT THAT
SMALL COLOUR DETAILS ARE NOT
PERCEIVED AS WELL AS SMALL DETAILS OF
LIGHT AND DARK.
172. JPEG
IT IS INTENDED FOR COMPRESSING IMAGES
THAT WILL BE LOOKED AT BY HUMANS.
THE JPEG STANDARD INCLUDES A
SEPARATE LOSSLESS MODE, BUT IT IS
RARELY USED AND DOES NOT GIVE NEARLY
AS MUCH COMPRESSION AS THE LOSSY
MODE.
A USEFUL PROPERTY OF JPEG IS THAT THE
DEGREE OF LOSSINESS CAN BE VARIED BY
ADJUSTING COMPRESSION PARAMETERS.
173. JPEG
DECODERS CAN TRADE-OFF DECODING SPEED
AGAINST IMAGE QUALITY BY USING FAST BUT
INACCURATE APPROXIMATIONS TO THE REQUIRED
CALCULATIONS.
MAIN ADVANTAGE OF USING JPEG IS THAT ONE CAN
MAKE IMAGE FILES SMALLER WHILE STORING
24 BIT/PIXEL COLOR DATA (16 MILLION COLORS)
INSTEAD OF 8 BIT/PIXEL DATA (256 OR FEWER
COLORS).
JPEG CAN EASILY PROVIDE 20:1 COMPRESSION OF
FULL COLOR DATA. AT LOW QUALITY EVEN 100:1
COMPRESSION IS POSSIBLE.
JPEG IS WIDELY USED ON WWW FOR
STORING/TRANSMITTING PHOTOGRAPHS.
174. MPEG
THE MAIN ADVANTAGE IS THAT IT
COMPRESSES DATA TO ABOUT 1.5 MBIT/S,
WHICH IS EQUAL TO THE CD-ROM DATA
TRANSFER RATE.
USING MPEG-1, ONE CAN DELIVER 1.2 Mbps
OF VIDEO AND 250 kbps OF 2-CHANNEL
STEREO SOUND USING CD-ROM
TECHNOLOGY, SO A MOVIE MAY BE STORED
ON CD-ROM IN MPEG FORMAT AND
VIEWED WITHOUT ANY SYNCHRONIZATION
FAULT.
175. MPEG
JPEG IS FOR STILL IMAGE COMPRESSION
WHEREAS MPEG IS FOR MOVING PICTURES.
BUT AS DIGITAL VIDEO OR MOVIES STORE A
SEQ OF STILL COLOR IMAGES, MPEG
STANDARD USES THE JPEG COMPRESSION
TO COMPRESS STILL COLOR IMAGES.
MPEG IS SUITABLE FOR SYMMETRIC AS
WELL AS ASYMMETRIC COMPRESSION.
176. MPEG
ASYMMETRIC COMPRESSION REQUIRES
MORE EFFORT FOR CODING THAN
DECODING. IN THIS CASE, COMPRESSION IS
CARRIED OUT ONCE WHEREAS
DECOMPRESSION IS PERFORMED MANY
TIMES.
SYMMETRIC COMPRESSION IS KNOWN TO
EXPECT EQUAL EFFORT FOR COMPRESSION
AND DECOMPRESSION PROCESSES.
177. MPEG
INTERACTIVE DIALOGUE APPLICATIONS
MAKE USE OF THIS ENCODING TECHNIQUE,
WHERE RESTRICTED END-TO-END DELAY IS
REQUIRED.
MPEG HAS BECOME THE METHOD OF CHOICE
FOR ENCODING MOTION IMAGES BECAUSE IT
HAS BECOME WIDELY ACCEPTED FOR BOTH
INTERNET AND DVD-VIDEO.
178. MHEG
MULTIMEDIA AND HYPERMEDIA EXPERTS GROUP
SET UP BY ISO FOR STANDARDIZATION OF EXCHANGE
FORMAT FOR MULTIMEDIA PRESENTATION AND
MULTIMEDIA SYSTEM.
IT IS ALMOST IMPOSSIBLE TO MAKE A MM
PRESENTATION WHICH CAN WORK ACROSS DIFF HW
PLATFORMS.
THE MAIN OBJECTIVE OF THIS GROUP IS TO CREATE A
STANDARD METHOD TO STORE, EXCHANGE AND
DISPLAY MM PRESENTATIONS.
179. MHEG
IT IS BASED ON OBJECT ORIENTED TECHNOLOGY.
FOR MM PRESENTATION, THERE ARE MANY CLASSES
THAT DEFINE HOW AUDIO, VIDEO AND MUSIC CAN BE
PLAYED.
THERE ARE CLASSES THAT CAN HELP TO DEVELOP
USER INTERACTION DURING MM PRESENTATIONS.
THE THREE IMP CLASSES USED ARE CONTENT CLASS,
BEHAVIOUR CLASS AND INTERACTION CLASS.
180. MHEG
CONTENT CLASS IS USED TO DESCRIBE THE
ACTUAL CONTENTS OF THE MM
PRESENTATION
BEHAVIOUR CLASS IS USED TO DECIDE THE
BEHAVIOUR OF THE PRESENTATION, FOR
EXAMPLE HOW AND WHEN DATA WILL BE
PRESENTED TO THE USER. IT HAS 2 SUB-
CLASSES, ACTION CLASS AND LINK CLASS,
WHICH ARE USEFUL FOR SYNCHRONIZING
EVENTS WITH THE USER INTERFACE.
181. MHEG
THE INTERACTION CLASS DESCRIBES THE ELEMENTS OF
THE USER INTERFACE (ie., THE ELEMENTS
THAT APPEAR ON THE USER SCREEN) THAT
ALLOW THE USER TO MAKE SELECTIONS,
TRIGGER EVENTS AND INPUT INFORMATION.
FOR EXAMPLE, THE ELEMENTS CHECK BOX,
RADIO BUTTONS AND LISTS ARE USED TO
MAKE SELECTIONS; ELEMENT PUSH BUTTON
IS USED TO TRIGGER EVENTS AND TEXT
ENTRY FIELD IS USED TO INPUT
INFORMATION FROM THE USER.