TOPICS
 DATA COMPRESSION
 COMPRESSION TECHNIQUES
 LOSSLESS COMPRESSION
 LOSSY COMPRESSION
 AUDIO COMPRESSION
 VIDEO COMPRESSION
 MPEG COMPRESSION
 JPEG COMPRESSION
 LOSSLESS VS. LOSSY COMPRESSION
 ADVANTAGE OF COMPRESSION
DATA COMPRESSION

The process of reducing
 the volume of data by
 applying a compression
 technique is called
 compression.
The resulting data is
 called compressed data.
DATA COMPRESSION

The reverse process of
 reproducing the original
 data from compressed data
 is called decompression.
The resulting data is called
 decompressed data.
Reasons to Compress

Reduce File Size
Save disk space
Increase transfer speed at a
 given data rate
Allow real-time transfer at a
 given data rate
Types of compression techniques


Compression techniques can be
 categorized based on following
 consideration:

 Lossless or lossy
 Symmetrical or asymmetrical
 Software or hardware
Types of compression techniques

1. Lossless or lossy

       If the decompressed data
    is the same as the original
    data, it is referred to as
    lossless        compression,
    otherwise the compression
    is lossy.
Types of compression techniques

2. Symmetrical or asymmetrical
 In symmetrical compression,
  the time required to compress
  and to decompress are roughly
  the same.
 In asymmetrical compression,
 the time taken for compression
 is usually much longer than
 decompression.
Types of compression techniques

3. Software or hardware
      A compression technique
    may be implemented either in
    hardware or software. As
    compared to software codecs
    (coder and decoder), hardware
    codecs usually run faster and can
    sustain real-time performance.
Basics of Compression
Compression - Types

 Spatial Compression
   – Finds similarities in an image and
     compresses those similarities in a smaller
     form
   – Intra-frame
 Temporal Compression
   – Finds similarities across images and
     compresses those similarities in a smaller
     form
   – Inter-frame
 Quality of Compression
   – Lossless
   – Lossy
Compression - Spatial
 Run Length Encoding
  – Replace a run of consecutive
    pixels of the same color by a
    single copy of the color value and
    a count of the number of pixels
 Huffman coding
  – Also lossless, but assigns codes
    of different lengths to colors
    (the most common colors get
    the fewest bits)
Compression - Spatial
 Dictionary-based coding
  – Fixed length bits point to a table
    of variable length colors codes
  – Basis of LZW and PKZIP
 All Lossless compression schemes
  – Often only about 50% compression
    on photographic images
Compression - Spatial
 GIF
  – Lossless compression
  – Best suited for simple images
  – Reduces colors to reduce file
    size (256 colors max)
Compression - Spatial
 JPEG
   – Joint Photographic Experts
     Group
   – Lossy compression
   – Best suited for photography
   – Throws data away to further
     reduce file size
Compression - Temporal
 Motion JPEG
  – Most popular for capturing
    analog video
  – JPEG on each frame of video
  – No temporal compression
  – Special-purpose hardware may
    be needed for real-time
Compression - Temporal
 DV
  – Most popular for storage and
    capturing digital video
  – 5:1 compression usually done in
    hardware (camera)
  – Spatial and a little temporal
    compression
Compression - Temporal
 MPEG
  – Moving Picture Experts Group
  – Most popular for delivery of
    digital video
  – Temporal and spatial
    compression
  – MPEG1, MPEG2, MPEG4 &
    MPEG 7
BASIC COMPRESSION TECHNIQUES


Lossless techniques

Lossy techniques
Lossless techniques
 RUN-LENGTH CODING: -
     repeated symbols in a
  string are replaced with the
  symbol and the number of
  instances it is repeated.
    example
  “aaaabbcccccaaaaaabaaaaa”
  is expressed as
  “a4b2c5a6b1a5”.
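The symbol-plus-count scheme described above can be sketched in a few lines of Python. This is only an illustration: the function name `rle_encode` is hypothetical, and the sketch assumes the input contains no digits, since a digit in the data could not be told apart from a count.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated symbol with the symbol followed by its run length."""
    # groupby yields one (symbol, run) pair per maximal run of equal symbols.
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(text))


print(rle_encode("aaabcc"))  # a3b1c2
print(rle_encode("aaaabbcccccaaaaaabaaaaa"))
```

Note that a run of length 1 grows from one character to two, which is why practical schemes only encode runs above some minimum length.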
Lossless techniques
VARIABLE-LENGTH CODING: -
    In general, coding schemes for
 a given character set use a fixed
 number of bits per character,
 for example BCD, EBCDIC, and
 ASCII. Variable-length coding
 instead gives shorter codes to the
 characters that occur most often.
Run-Length coding

 Look at compressing same sequence
  again:
  ABBBBBBBBBCDEEEEF
  – Using RLE compression, the compressed file
    takes up 10 bytes and would look like this:
  AΩ9BCDΩ4EF
  – Ω is a flag byte: it is followed by the run
    length and the symbol being repeated
  – Data size before compression: 17 bytes
  – Data size after compression: 10 bytes
    Compression ratio: 17/10 = 1.7
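A rough sketch of this flag-based variant follows. It assumes, as the slide's example suggests, that only runs of four or more symbols are worth replacing, that the flag character Ω never occurs in the data, and that the count fits in a single character; `rle_flag_encode` and `min_run` are illustrative names, not part of any standard.

```python
from itertools import groupby

FLAG = "Ω"  # flag character from the slide; assumed never to appear in the data


def rle_flag_encode(data: str, min_run: int = 4) -> str:
    """Encode runs of at least min_run symbols as FLAG + count + symbol; copy short runs as-is."""
    out = []
    for symbol, run in groupby(data):
        n = len(list(run))
        if n >= min_run:
            out.append(f"{FLAG}{n}{symbol}")  # three characters replace the whole run
        else:
            out.append(symbol * n)  # too short to be worth a flag
    return "".join(out)


original = "A" + "B" * 9 + "CD" + "E" * 4 + "F"
packed = rle_flag_encode(original)
print(packed, len(original), len(packed), len(original) / len(packed))
```

Because a flagged run always costs three characters, runs shorter than four symbols are copied unchanged; that is why the single A, C, D and F are not expanded.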
LOSSY TECHNIQUES


PREDICTIVE ENCODING: -
Stores only the initial sample
 and, for each later sample, the
 difference from the value
 predicted from earlier samples.
Sample may be a pixel, line,
 audio sample, or video frame.
LOSSY TECHNIQUES

TRANSFORM ENCODING:-
Data is converted from one
 domain to another.
DCT (Discrete cosine
 transform) encoding is the
 best example of this method.
Compression Fundamentals
Lossless
 – ensures that the data
   recovered from the
   compression /
   decompression process is
   exactly the same as the
   original data.
 – Commonly used to
   compress executable
   code, text files, and
   numeric data.
Compression Fundamentals
 Lossy
  – does not promise that the data
    received is exactly the same as
    the data sent
  – removes information
    that it cannot later restore
    (Hopefully, no one will notice.)
  – Commonly used to compress
    digital imagery, including video.
Audio Compression
Techniques
Introduction
 Digital Audio Compression
  – Removal of redundant or
    otherwise irrelevant information
    from audio signal
  – Audio compression algorithms
    are often referred to as “audio
    encoders”
 Applications
  – Reduces required storage space
  – Reduces required transmission
    bandwidth
Audio Data Compression

 Lossless Audio Compression
  – Removes redundant data
  – Resulting signal is same as
    original – perfect reconstruction
 Lossy Audio Encoding
  – Removes irrelevant data
  – Resulting signal is similar to
    original
Audio compression

       Audio compression is a form of
    data compression designed to
    reduce the size of audio data
    files.

 Audio compression can mean two
  things:
 Audio data compression
 Audio level compression
Audio compression


Audio data compression - in
 which the amount of data in a
 recorded waveform is reduced
 for transmission. This is used
 in MP3 encoding, internet
 radio, and the like.
Audio compression
Audio level compression - in
 which the dynamic range
 (difference between loud and
 quiet) of an audio waveform is
 reduced. This is used in guitar
 effects racks, recording studios,
 etc.
MPEG Compression
MPEG Components

 MPEG (Moving Picture Experts
  group) is a multimedia standard
  with specifications for coding,
  compression and transmission of
  audio, video and data streams.
 Video: describes compression of
  frames
 Audio: describes compression of
  audio frames
Audio compression

• MPEG audio
 Mpeg audio is a standard for
  compression and decompression of
  digital audio.
 The coding technique used in the MPEG
  audio standard (known as perceptual
  coding) takes advantage of the
  perceptual limitations of human hearing
  (psychoacoustic phenomena).
 In perceptual coding, the audio
  spectrum is divided into a set of narrow
  frequency bands, to reflect the
  frequency selectivity of human hearing.
BASIC STEPS OF MPEG AUDIO COMPRESSION




INPUT AUDIO SIGNAL
  -> TIME TO FREQUENCY MAPPING FILTER BANK
  -> BIT/NOISE ALLOCATION, QUANTIZER, AND CODING
  -> BIT-STREAM FORMATTING
  -> ENCODED BIT-STREAM

PSYCHOACOUSTIC MODEL: ANALYSES THE INPUT AUDIO SIGNAL
  AND CONTROLS THE BIT/NOISE ALLOCATION

              MPEG AUDIO ENCODING
BASIC STEPS OF MPEG AUDIO COMPRESSION




ENCODED BIT-STREAM
  -> BIT-STREAM UNPACKING
  -> FREQUENCY SAMPLE RECONSTRUCTION
  -> FREQUENCY TO TIME MAPPING
  -> DECODED AUDIO SIGNAL

             MPEG AUDIO DECODING
VIDEO COMPRESSION

• Mpeg video
 MPEG video is a subset of the
  MPEG standard.
 Digital video compression may
  either apply intraframe
  compression to each individual
  frame of the video or combine
  both intraframe and interframe
  compression.
VIDEO COMPRESSION


Mpeg uses both intra-frame
 and inter-frame techniques for
 data compression.
 Mpeg compression is lossy
 and asymmetric, with the
 encoding process requiring
 more time and computation than
 the decoding process.
BASIC STEPS OF MPEG VIDEO COMPRESSION



 1. VIDEO DATA TO BE COMPRESSED
 2. PREPROCESSING AND COLOR SUBSAMPLING OF INDIVIDUAL FRAMES
 3. INTERFRAME MOTION COMPENSATION FOR P-FRAMES AND B-FRAMES
 4. DIVIDE EACH FRAME INTO 8X8 PIXEL BLOCKS
 5. APPLY DCT TRANSFORMATION TO EACH 8X8 PIXEL BLOCK
 6. PERFORM QUANTIZATION OF DCT COEFFICIENTS USING A Q-TABLE
 7. ORDER THE 2-D OUTPUT IN ZIGZAG SEQUENCE
 8. APPLY RUN-LENGTH ENCODING TO THE ZIGZAG SEQUENCE
 9. APPLY VARIABLE LENGTH ENCODING TO THE RESULTING STREAM
10. MPEG COMPRESSED VIDEO STREAM
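The zigzag ordering in this pipeline is easy to get wrong, so a small sketch may help. It assumes the conventional JPEG/MPEG scan, which walks the anti-diagonals of the block in alternating directions starting at the top-left (DC) coefficient; the function names are illustrative only.

```python
def zigzag_indices(n: int = 8):
    """Return the (row, col) pairs of an n x n block in zigzag scan order."""
    # Cells on the same anti-diagonal share row + col. Odd diagonals are walked
    # top-right to bottom-left (row increasing), even ones the other way.
    return sorted(
        ((r, c) for r in range(n) for c in range(n)),
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else rc[1]),
    )


def zigzag(block):
    """Flatten a square 2-D block into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_indices(len(block))]


print(zigzag([[1, 2, 3],
              [4, 5, 6],
              [7, 8, 9]]))  # [1, 2, 4, 7, 5, 3, 6, 8, 9]
```

After quantization most high-frequency coefficients are zero, and the zigzag scan groups them at the end of the sequence, which is what makes the following run-length step effective.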
Three Types of Frames

 Intra frames (same as JPEG)
     – typically about 12 frames between I frames
 Predictive frames
     – encode from previous I or P reference frame
 Bi-directional frames
     – encode from previous and future I or P
       frames




 I    B B P   B B P B B P B B      I
Lossless compression

Loss-less compressions
 reduce file size by encoding
 image information more
 efficiently.
Images compressed using
 loss-less algorithms are
 able to be restored to their
 original condition.
Lossy compression
Lossy compressions reduce file
 size by considerably greater
 amounts than loss-less
 compressions but lose both
 information and quality.
At high compression, the image
 will become visibly degraded.
Standard

JPEG
Joint Photographic Experts
 Group
 JPEG is the standard
 compression technique for
 still images
Lossy compression
Best suited for photography
Standard
 JPEG
 It supports the four modes of
   encoding
    – Sequential
       • The image is encoded in the
         order in which it is scanned.
    – Progressive
       • The image is encoded in
         multiple passes.
JPEG (contd.)

– Hierarchical
   • The image is encoded at
     multiple resolutions to
     accommodate different types of
     displays.
– Lossless
   • The image is encoded in such a
     way that the original quality of
     the image can be fully restored.
MPEG Standard

 MPEG-1
 MPEG-2
 MPEG-3
 MPEG-4
 MPEG-7
 MPEG-21
MPEG Standard

MPEG-1: Initial video and
audio compression standard.
Later used as the standard for
Video CD; it also includes the
Layer 3 (MP3) audio compression format.
MPEG
MPEG-2: Video and audio standards for
 broadcast-quality television. Used for
 digital satellite TV services like DirecTV,
 digital cable television signals, and
 (with slight modifications) for DVD video
 discs.
MPEG-3: Originally designed for HDTV,
 but abandoned in favor of MPEG-2.
MPEG Standard

MPEG-4: Expands MPEG-1 to support
video/audio "objects", 3D content, low
bitrate encoding and support for
Digital Rights Management.
MPEG-7: A formal system for
describing multimedia content.
MPEG-21: MPEG describes this future
standard as a Multimedia Framework.
JPEG Standard
 JPEG :
  – the core of the compression is the Discrete
    Cosine Transform (DCT).
  – Removes the redundant information (the
    "invisible" parts).
 JPEG-2000:
  – Successor to the JPEG .
  – Blockiness of JPEG is removed,
  – The compression ratio for JPEG 2000 is higher
    than for JPEG
Codecs

 Compression/Decompression
  Scheme
 Hardware or software based
 Many use both spatial and temporal
  compression techniques
Why do we do data compression?

Data compression is
 done mainly to save
 space on the hard disk,
 so that more data fits
 on the same drive.
What is the use of data compression
            on network?

The most prominent use of
 data compression on the
 network is to free up space
 on the server so that more
 files can be stored on it.
Compression
Even though disks have
 gotten bigger, we are still
 running short on disk space
A common technique is to
 compress files so that they
 take up less space on the disk.
COMPUTER TOOLS
 AND UTILITIES
Compression Utilities

 Zip files are used for rapidly
  distributing and storing files.
 Zip files are compressed to save
  space.
 WinZip - a popular compression
  utility for Windows.
 WinRAR
Lossless vs. lossy
Applications

      lossless data
    compression is often used
    to better use disk space on
    office computers, or better
    use the connection
    bandwidth in a computer
    network.
Applications
In other kinds of data such as
 sounds and pictures, a small
 loss of quality can be tolerated
 without losing the essential
 nature of the data, so lossy
 data compression methods
 can be used.
Lossless vs. Lossy Compression





                         NOTE:
                         Business data requires lossless compression, while audio and video
                         applications can tolerate some loss, which may not be very noticeable.
ADVANTAGES


 Data compression is done mainly
  to save space on the hard disk,
  so that more data fits on the
  same drive.

 The most prominent use of data
  compression on the network is to
  free up space on the server
  so that more files can be stored
  on it.
Lossy vs. Lossless Compression

 Lossy methods can produce a
 much smaller compressed file
 than any known lossless method,
 while still meeting the
 requirements of the application.
 Lossily compressed still images
 are often compressed to 1/10th
 their original size, as with audio,
 but the quality loss is more
 noticeable, especially on closer
 inspection.
DATA COMPRESSION
 NEEDED AS MOST OF THE REAL
  WORLD DATA IS REDUNDANT

 IMPORTANCE?
 SAVES DISK SPACE
 SAVES CONNECTION BANDWIDTH
 REDUCES PROCESSING TIME
 REDUCES COMMUNICATION TIME
 ENABLES FAST STORAGE AND
  RETRIEVAL
DATA COMPRESSION
 TYPES


REVERSIBLE          IRREVERSIBLE



LOSSY – WHEN EFFICIENCY OF
TRANSMISSION IS MORE IMPORTANT
THAN ACCURACY OF INFORMATION.
INFORMATION THEORY
 IT IS A BRANCH OF MATHEMATICS
  THAT DEALS WITH DATA/INFORMATION
  REPRESENTATION

 DATA COMPRESSION IS ONE OF
  THE APPLICATIONS OF
  INFORMATION THEORY
SHANNON’S PRINCIPLE
FOR INFORMATION
 FOR DATA COMPRESSION, IT IS
  ESSENTIAL TO MEASURE INFORMATION
  CONTENTS IN THE DATA OR THE
  DEGREE OF
  RANDOMNESS/UNCERTAINTY

 HIGH PROBABILITY EVENTS CONTAIN
  LESS SELF-INFORMATION WHEREAS
  LOW PROB EVENT ASSOCIATES MUCH
  MORE SELF INFORMATION
SHANNON’S PRINCIPLE
FOR INFORMATION
 IT WAS GIVEN BY CLAUDE
  SHANNON

 ACCORDING TO HIM, SELF-
  INFORMATION IS ASSOCIATED
  WITH EVERY POSSIBLE OUTCOME
  OF AN EVENT.
SHANNON’S PRINCIPLE
FOR INFORMATION
 LET P(A) & P(B) BE THE PROB OF
  OCCURRENCE OF EVENTS A & B
  RESPECTIVELY.
 ACCORDING TO SHANNON, SELF-INFO
  ASSOCIATED WITH EVENT A MAY BE
  DEFINED AS
 Si(A) = -log_m P(A) = log_m [1/P(A)]
 SIMILARLY, Si(B) = log_m [1/P(B)]
 WHERE THE BASE m DEFINES THE UNIT OF INFO
  (m = 2 GIVES BITS)
SHANNON’S PRINCIPLE
FOR INFORMATION
 PROB EVENT VIS-À-VIS SELF INFO
    P(A)          Si(A)
     1             0
     0.5           1.0
     0.25          2.0
     0.10          3.32
     0.05          4.32
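The table can be reproduced with a short script. This is a minimal sketch that takes the base m = 2, so the result is in bits; the function name `self_information` is just an illustrative choice.

```python
import math


def self_information(p: float, base: float = 2) -> float:
    """Si = log_m(1/p): the information carried by an event of probability p."""
    return math.log(1 / p, base)


for p in (1, 0.5, 0.25, 0.10, 0.05):
    print(f"{p:<6} {self_information(p):.2f}")
```

Halving the probability adds exactly one bit, which is why the entries for 0.5 and 0.25, and for 0.10 and 0.05, differ by 1.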
SHANNON’S PRINCIPLE
FOR INFORMATION
 CONCEPT OF Si MAY ALSO BE USED
  TO MAKE INFERENCES BY
  ASSOCIATING IT WITH 2 INDEPENDENT
  EVENTS
 LET A & B BE 2 INDEPENDENT EVENTS,
  THEN
 P(AB)= P(A)*P(B)
 Si(AB)=-log2[P(AB)]
        = [-log2P(A)] + [-log2P(B)]
        = Si(A) + Si(B)
ENTROPY OF INFORMATION
 ENTROPY IS A CONCEPT OF
  THERMODYNAMICS
 IN INFO THEORY, IT IS USED TO
  FIND OUT THE
  RANDOMNESS/UNCERTAINTY IN A
  MESSAGE
ENTROPY OF INFORMATION
 THE AVERAGE INFO CONTENT OF A
  MESSAGE IS CALLED ITS ENTROPY

 THE LESS LIKELY A MESSAGE IS TO
  OCCUR, THE LARGER ITS INFO
  CONTENT

 ENTROPY IS AN IMPORTANT CONCEPT
  OF DATA COMPRESSION
ENTROPY OF INFORMATION
 THE ENTROPY OF AN ELEMENT (Ee) IS
  THE MINIMUM NO OF BITS NEEDED
  TO ENCODE THAT ELEMENT

 THE ENTROPY OF AN ENTIRE
  MESSAGE (Em) IS THE MIN NO. OF
  BITS NEEDED TO ENCODE THE
  ENTIRE MESSAGE WITH A
  LOSSLESS COMPRESSION.
ENTROPY OF INFORMATION
 THE ENTROPY OF A MESSAGE
  CAN BE USED TO DETERMINE IF
  THE DATA COMPRESSION IS
  WORTH ATTEMPTING.

 IT CAN ALSO BE USED TO
  EVALUATE THE EFFECTIVENESS
  OF COMPRESSION.
ENTROPY OF INFORMATION
 THE NO. OF BITS IN A
  COMPRESSED CODE CAN BE
  COMPARED TO THE ENTROPY
  FOR THAT MESSAGE Em
  REVEALING HOW CLOSE TO
  OPTIMAL COMPRESSION ONE’S
  CODE IS.
ENTROPY OF INFORMATION
 SHANNON PROPOSED THE
  FOLLOWING ENTROPY FN FOR A
  MESSAGE:
    Em = - Σ Pi * log2(Pi), SUM OVER i = 1 TO N ---- (1)
  WHERE N= NO. OF POSSIBLE CHAR
  TYPES USED IN THE MESSAGE AND Pi
  DENOTES THE PROB OF THE ith CHAR.
  Eg “AABCCD”, N=4
ENTROPY OF INFORMATION
 THE ENTROPY OF A CHAR IS
  GIVEN BY ITS SELF INFO ie.,
  ENTROPY OF A CHAR A IS GIVEN
  BY Ee=-log2P(A)
 THE ENTROPY OF A MESSAGE
  CONTAINING N CHARS CAN ALSO
  BE FOUND OUT IN TERMS OF AV
  SELF INFO OF ALL N CHARS
  ie, Em = (1/N) * Σ Si(i), i = 1 TO N ------- (2)
  WHERE Si(i) IS THE SELF INFO (ENTROPY) OF THE ith CHAR
ENTROPY OF INFORMATION
 NOTICE THE DIFFERENCE
  BETWEEN N IN THE TWO
  EQUATIONS

 IN 1ST, N IS THE NO OF DISTINCT
  CHARS USED IN THE MESSAGE
  AND IN 2ND N = TOTAL NO OF
  CHARS USED IN THE MESSAGE
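The two forms of the entropy function can be checked against each other numerically. The sketch below uses the “AABCCD” message mentioned with equation (1); the function names are hypothetical and probabilities are estimated from character counts in the message itself.

```python
import math
from collections import Counter


def entropy_by_distinct(message: str) -> float:
    """Equation (1): sum of -Pi * log2(Pi) over the DISTINCT characters."""
    total = len(message)
    return sum(-(n / total) * math.log2(n / total) for n in Counter(message).values())


def entropy_by_average(message: str) -> float:
    """Equation (2): average self-information over EVERY character in the message."""
    counts = Counter(message)
    total = len(message)
    return sum(math.log2(total / counts[ch]) for ch in message) / total


msg = "AABCCD"
print(round(entropy_by_distinct(msg), 3), round(entropy_by_average(msg), 3))
```

In equation (1) the sum runs over 4 distinct characters, in equation (2) over all 6 positions; a character that occurs k times contributes k equal terms to the second sum, which is exactly the weighting by Pi in the first.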
ENTROPY OF INFORMATION
 SO, ENTROPY OF A MESSAGE
  GIVES THE AVERAGE NO OF BITS
  REQUIRED TO REPRESENT A
  CHARACTER IN THE MESSAGE
 QUES: FOR THE MESSAGE
  “dadbadcadbaadac” CALCULATE
  Si ASSOCIATED WITH CHARS A &
  B, ENTROPY OF CHARS C & D, AV
  SELF INFO IN THE MESSAGE,
  ENTROPY OF THE MESSAGE?
ENTROPY OF INFORMATION
 N=15
    CHAR    NO OF CHARS    PROB OF CHAR    Si = log2(1/P)
     d          5              5/15            1.58
     a          6              6/15            1.32
     b          2              2/15            2.91
     c          2              2/15            2.91

AV SELF INFO OF MESSAGE = [1/N]*Σ ENTROPY OF ith CHAR
= [1/15]*[E(1) + E(2) + E(3) + … + E(15)]
= [1/15]*[E(d)+E(a)+E(d)+E(b)+…+E(c)]
= [1/15]*[1.58+1.32+1.58+2.91+…+2.91] = [1/15]*27.46 = 1.83
ENTROPY OF MESSAGE = -Σ Pi*log2(Pi), i=1 TO 4
= (5/15)*1.58 + (6/15)*1.32 + (2/15)*2.91 + (2/15)*2.91
= 1.83
ENTROPY OF INFORMATION
 NOTE THAT THE AV SELF
  INFORMATION OF THE MESSAGE
  AND THE ENTROPY OF THE
  MESSAGE BOTH ARE SAME AND
  BOTH THE FUNCTIONS GIVE THE
  AVERAGE NO OF BITS REQUIRED
  TO REPRESENT A CHARACTER IN
  THE MESSAGE
ENTROPY OF INFORMATION
 QUES2: CALCULATE THE AV NO.
  OF BITS REQUIRED TO
  REPRESENT A CHAR IN THE
  MESSAGE STRING “AAAAAABBCC”
ENTROPY OF INFORMATION
CHAR    COUNT    PROB
A         6        0.6
B         2        0.2
C         2        0.2
ENTROPY OF MESSAGE = -Σ Pi*log2(Pi), i=1 TO N
HERE N=3
ENTROPY OF THE MESSAGE
= 0.6*log2(1/0.6)+0.2*log2(1/0.2)+0.2*log2(1/0.2)
= 0.6*0.737 + 0.2*2.322 + 0.2*2.322
= 0.442 + 0.464 + 0.464
= 1.37 = AV NO OF BITS REQUIRED TO
  REPRESENT A CHARACTER
CODES
 A CODE IS ANY MAPPING FROM AN
  INPUT ALPHABET TO AN OUTPUT
  ALPHABET
 A CODE CAN BE SAY {a,b,c} = {0,1,00},
  BUT THIS CODE IS NOT UNIQUELY
  DECODABLE.
 IF THE DECODER GETS A CODE
  MESSAGE OF 2 ZEROS, THERE IS NO
  WAY IT CAN KNOW WHETHER THE
  ORIGINAL MESSAGE HAD TWO a’s OR
  ONE c
CODES
 A CODE IS INSTANTANEOUS IF EACH
  CODEWORD IN A MESSAGE CAN BE
  DECODED AS SOON AS IT IS RECEIVED.
 THE BINARY CODE {a,b} = {0,01} IS
  UNIQUELY DECODABLE, BUT IT IS NOT
  INSTANTANEOUS. ONE HAS TO SEE IF
  THE NEXT BIT IS 1. IF IT IS, b IS
  DECODED; IF NOT a IS DECODED.
 THE BINARY CODE {a,b,c}={0,10,11} IS
  AN INSTANTANEOUS CODE
CODES
 A CODE IS A PREFIX CODE IFF NO
  CODEWORD IS A PREFIX OF ANOTHER
  CODE WORD.
 A CODE IS INSTANTANEOUS IFF IT IS A
  PREFIX CODE, SO A PREFIX CODE IS
  ALWAYS A UNIQUELY DECODABLE
  INSTANTANEOUS CODE.
 ANY UNIQUELY DECODABLE CODE
  CAN BE REPLACED BY A PREFIX CODE
  WITH THE SAME CODEWORD LENGTHS.
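The prefix property is simple to test mechanically. The following sketch (the name `is_prefix_code` is illustrative) checks the three example codes from these slides; it relies on the fact that, once the codewords are sorted, any prefix sits directly before a word that extends it.

```python
def is_prefix_code(codewords) -> bool:
    """Return True if no codeword is a prefix of another codeword."""
    words = sorted(codewords)
    # In sorted order, if a is a prefix of some later word, it is also a prefix
    # of its immediate successor, so checking neighbours is sufficient.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


print(is_prefix_code(["0", "1", "00"]))   # {a,b,c} = {0,1,00}
print(is_prefix_code(["0", "01"]))        # {a,b} = {0,01}
print(is_prefix_code(["0", "10", "11"]))  # {a,b,c} = {0,10,11}
```

Only the last code passes, matching the slides: it is the only one of the three that is instantaneous.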
TYPES OF CODING
 THERE ARE MANY ALGORITHMS
  FOR CODING THE CHARACTERS
  BUT CAN BE BROADLY DIVIDED
  INTO 2 TYPES:

 STATIC (FIXED SIZE) CODING

 DYNAMIC (VARIABLE SIZE)
  CODING
STATIC CODING SCHEME
 IF THE MESSAGE IS COMPOSED
  BY THE COMBINATION OF M
  DISTINCT CHARS, THEN THE
  NO. OF BITS REQUIRED
  IN THE CODE = N = log_b(M), ROUNDED UP
  TO THE NEXT WHOLE NUMBER, WHERE
  N = MINIMUM NO. OF BITS
  REQUIRED TO REPRESENT M
  DISTINCT CHARS AND b = BASE OF
  THE NUMBER SYSTEM
STATIC CODING SCHEME
 THE MAIN DISADVANTAGE IS
  THAT IT DOES NOT CONSIDER
  THE FREQUENCY OR PROB OF
  OCCURRENCE OF A PARTICULAR
  CHAR IN THE MESSAGE
STATIC CODING SCHEME
QUES 3: CONSIDER THE MESSAGE “RAMRAHIM”
 FIND THE NO OF DISTINCT CHARS, THE MIN
 NO OF BITS REQUIRED TO REPRESENT A
 CHAR, GENERATE THE CODE FOR ALL
 DISTINCT CHARS, BY USING THESE CODES
 WHAT SHALL BE THE CODED MESSAGE FOR
 THE MESSAGE “MIHIR”, HOW MUCH IS THE
 SAVING BY USING THE CODING SCHEME
 OVER ASCII REPRESENTATION
STATIC CODING SCHEME
   NO OF DISTINCT CHARS = 5
   N = log2(M) = log2(5) = 2.32, ROUNDED UP TO 3; SO A 3 BIT
    CODE IS NEEDED TO REPRESENT EACH SYMBOL
   000=R, 001=A, 010=M, 011=H, 100=I; REST ARE
    UNUSED
   BY USING THE CODES AS ABOVE, THE CODED
    MESSAGE FOR “MIHIR” SHALL BE
    “010100011100000”
   EACH CHARACTER OF THE STRING IS
    REPRESENTED BY 3 BITS AND THERE ARE 5
    CHARACTERS IN THE MESSAGE. SO THE NO OF BITS
    REQUIRED= 5*3=15; ASCII NEEDS 5*8=40 BITS,
    THEREFORE SAVING = 40 – 15 = 25 BITS
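The worked answer can be reproduced with a short sketch. It assumes codes are handed out in order of first appearance in “RAMRAHIM”, which is what the slide's table does; `fixed_length_codes` is a hypothetical helper name.

```python
import math


def fixed_length_codes(source: str) -> dict:
    """Give every distinct character of source a binary code of the same width."""
    symbols = list(dict.fromkeys(source))  # distinct characters, first-seen order
    width = max(1, math.ceil(math.log2(len(symbols))))  # log2(M) rounded up
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


codes = fixed_length_codes("RAMRAHIM")
encoded = "".join(codes[ch] for ch in "MIHIR")
saving = 8 * len("MIHIR") - len(encoded)  # compared with 8-bit ASCII
print(codes)
print(encoded, saving)
```

With 5 distinct characters the width is 3 bits, so three of the eight possible 3-bit patterns stay unused.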
DYNAMIC CODING
SCHEME
 COMPUTERS ENCODE CHARS IN ASCII CODE.
  SO, A FILE HAVING 100 CHARS SHALL
  REQUIRE 800 BITS
 BUT IN ANY TEXT FILES, SOME CHARS
  OCCUR WITH MORE FREQUENCY THAN
  OTHERS
 SO, IT IS BETTER THAT SHORTER BIT CODES
  ARE ASSIGNED TO THE FREQUENTLY
  OCCURRING CHARS THAN OTHERS.
 THIS WAS ALSO REALIZED WAY BACK BY
  SAMUEL MORSE.
 THIS CONCEPT IS USED IN DYNAMIC CODING.
DYNAMIC CODING SCHEME
 IT USES VARIABLE SIZE CODE
 MINIMUM NO OF BITS ARE ASSIGNED
  TO THE MOST FREQUENTLY OCCURRING
  CHARACTER AND MAXIMUM NO OF
  BITS TO THOSE WHICH ARE LEAST
  FREQUENTLY USED.
 ANY STATISTICAL MODEL MAY BE
  USED TO CALCULATE THE
  FREQUENCY OF OCCURRENCE OF
  CHARACTERS.
DYNAMIC CODING SCHEME
QUES 4: CONSIDER THE MESSAGE
    “RAAMRAHMMM”
FIND OUT THE DISTINCT CHARACTERS AND
    THEIR FREQUENCY, GENERATE CODES FOR
    ALL CHARACTERS USING DYNAMIC
    CODING, USING GENERATED CODES WRITE
    THE CODE FOR “MAHR”, HOW MUCH IS THE
    SAVINGS IN BIT.
1. 4 DISTINCT CHARS: R-2, A-3, M-4 AND H-1
2. M-1, A-01, R-001, H-0001
3. “MAHR” = 1 01 0001 001 = 1010001001 = 10 BITS
4. SAVINGS OVER ASCII = (4*8) – 10 = 32 – 10 = 22 BITS
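A minimal sketch of this answer follows. It assumes the code pattern used on the slide (1, 01, 001, 0001, handed out from the most to the least frequent character); `rank_codes` is an illustrative name, and other variable-length assignments are equally possible.

```python
from collections import Counter


def rank_codes(message: str) -> dict:
    """Assign '1', '01', '001', ... to characters in order of decreasing frequency."""
    ranked = [symbol for symbol, _ in Counter(message).most_common()]
    return {symbol: "0" * rank + "1" for rank, symbol in enumerate(ranked)}


codes = rank_codes("RAAMRAHMMM")
encoded = "".join(codes[ch] for ch in "MAHR")
saving = 8 * len("MAHR") - len(encoded)  # compared with 8-bit ASCII
print(codes)
print(encoded, len(encoded), saving)
```

Every code ends with its only 1, so the decoder can cut the bit stream after each 1 without any separators.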
USE OF ENTROPY IN CODING
 THE ENTROPY FN IS USED TO
  DEVELOP AN EFFICIENT CODE FOR
  THE PURPOSE OF COMMUNICATION.

 ONE CAN USE ENTROPY TO FIND OUT
  THE SCOPE OF FURTHER REFINEMENT
  IN THE CODING SCHEME AS THE
  ENTROPY OF THE MESSAGE RESULTS
  IN AVERAGE NO OF BITS REQUIRED TO
  REPRESENT A CHARACTER.
USE OF ENTROPY IN CODING
QUES 5: CONSIDER A MESSAGE STREAM CONSISTING OF
   CHARS A,B,C,D. LET THE PROB OF OCCURRENCE OF
   CHARS BE 0.70, 0.15, 0.10 AND 0.05 RESPECTIVELY. Si
   RESPECTIVELY IS 0.51, 2.74, 3.32 AND 4.32
B. FIND MIN NO OF BITS REQ TO REPRESENT A CHAR
   USING STATIC CODING, IF A MESSAGE CONSISTS OF
   ALL THE 4 CHARS
C. GENERATE CODE FOR THE CHARS USING DYNAMIC
   SCHEME. WHAT IS THE AV NO OF BITS REQ TO
   REPRESENT A CHAR IN THIS CODING SCHEME, IF A
   MESSAGE CONTAINS 100 CHARS
D. IS THERE ANY POSSIBILITY OF FURTHER
   REFINEMENT IN THE CODING SCHEME?
USE OF ENTROPY IN CODING
   M=4; MIN NO OF BITS REQ TO REP A
    CHAR = N = log2(4) = 2
    BY LOOKING INTO THE TABLE, THE FOLLOWING
     CODES CAN BE GENERATED USING DYNAMIC
     SCHEME:
    CHAR     PROB      CODE
      A        0.70      1
      B        0.15      01
      C        0.10      001
      D        0.05      0001
    AV NO OF BITS REQ PER CHAR FOR A MESSAGE OF 100
     CHARS = [(70*1)+(15*2)+(10*3)+(5*4)]/100
    = 150/100 = 1.5
USE OF ENTROPY IN CODING
3. DYNAMIC CODING IS MORE EFFICIENT (1.5 BITS PER CHAR AGAINST 2)
4. ENTROPY = - Σ Pi*log2(Pi)
  = - (0.7*log2(0.7)+0.15*log2(0.15)+0.1*log2(0.1)+0.05*log2(0.05))
  = 1.32
 SO, THE MIN AV NO OF BITS REQ TO REPRESENT A CHAR IN THE
  MESSAGE = 1.32
THERE IS A DIFFERENCE BETWEEN THE ENTROPY VALUE
  AND THE NO OF BITS REQUIRED BY BOTH THE
  METHODS, THEREFORE FURTHER REFINEMENT IS
  POSSIBLE IN THE CODING SCHEMES.
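The comparison between average code length and entropy can be sketched as follows, using the probabilities and codes from the answer table (0.70, 0.15, 0.10, 0.05 with codes 1, 01, 001, 0001). The variable names are illustrative.

```python
import math

probs = {"A": 0.70, "B": 0.15, "C": 0.10, "D": 0.05}
codes = {"A": "1", "B": "01", "C": "001", "D": "0001"}

# Expected number of bits per character under the variable-length code.
avg_len = sum(p * len(codes[s]) for s, p in probs.items())

# Entropy: the lower bound on bits per character for any lossless code.
entropy = -sum(p * math.log2(p) for p in probs.values())

print(round(avg_len, 2), round(entropy, 2), round(avg_len - entropy, 2))
```

The gap between the two numbers is the redundancy of the code, i.e. the room left for further refinement.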
LOSSLESS DATA COMPRESSION

 ALL ALGORITHMS ATTEMPT TO
  RE-ENCODE DATA TO REMOVE
  REDUNDANCY

 IT IMPLIES THAT DATA WITH NO
  REDUNDANCY CAN NOT BE
  COMPRESSED BY THESE
  TECHNIQUES WITHOUT SOME
  LOSS OF INFORMATION
SHANNON FANO ALGORITHM
 IT USES THE IDEA OF USING
  SHORTER CODES FOR MORE
  FREQUENTLY OCCURRING
  CHARACTERS

 GIVEN BY CLAUDE SHANNON &
  R.M.FANO
SHANNON FANO ALGORITHM
 ADV?
 CONSIDER A FILE HAVING 40
  LETTERS WITH THE GIVEN
  FREQUENCY- A:14; B:7; C:10; D:5;
  E:4
 ASCII – 40*8=320 BITS. DECODING
  SIMPLY CONSISTS OF BREAKING
  THE STREAM INTO 8-BIT BYTES AND
  CONVERTING EACH INTO A CHARACTER.
  SO, IT NEEDS NO ADDITIONAL INFO.
SHANNON FANO ALGORITHM
 VARIABLE LENGTH ENCODING
  SCHEMES SUCH AS HUFFMAN
  AND SHANNON-FANO HAVE THE
  FOLLOWING PROPERTIES:

 CODES FOR MORE FREQUENT
  CHARS ARE SHORTER THAN
  ONES FOR LESS PROBABLE
  CHARS
SHANNON FANO ALGORITHM
 EACH CODE CAN BE UNIQUELY
  DECODED. THIS IS CALLED THE PREFIX
  PROPERTY ie., NO CHARS ENCODING
  IS A PREFIX OF ANY OTHER.
• TO SEE WHY THIS PROPERTY IS
  IMPORTANT, CONSIDER “A” ENCODED
  AS 0; “B” AS 01; “C” AS 10. IF THE
  DECODER ENCOUNTERS THE BIT-STREAM
  “0010”, IS IT “ABA” OR
  “AAC”?
SHANNON FANO ALGORITHM
 WITH THE PREFIX GUARANTEE, THERE
  IS NO AMBIGUITY IN DETERMINING
  WHERE THE CHAR BOUNDARIES ARE.

 ONE STARTS READING FROM THE
  BEGINNING AND GATHER BITS IN A
  SEQUENCE UNTIL ONE FINDS A
  MATCH.

 THAT INDICATES THE END OF CHAR
  AND ONE MOVES ALONG TO THE NEXT
  CHAR.
SHANNON FANO ALGORITHM
1. FIND THE FREQ OF OCCURRENCE OF
   EACH SYMBOL

2. SORT IT IN THE DESCENDING ORDER

3. DIVIDE THE LIST INTO 2 PARTS, WITH
   THE TOTAL FREQ COUNT OF THE
   UPPER HALF BEING AS CLOSE TO
   THAT OF THE BOTTOM HALF AS
   POSSIBLE
SHANNON FANO ALGORITHM
4. REPEAT STEP 3 ON EACH HALF UNTIL
   EACH HALF CONTAINS JUST ONE SYMBOL

5. CONSTRUCT THE BINARY TREE (SF
   TREE) SO THAT THE UPPER HALF
   BECOMES THE LEFT SUB-TREE AND
   THE LOWER HALF BECOMES THE
   RIGHT SUB-TREE. EACH LEFT
   BRANCH IS ASSIGNED 0 AND EACH
   RIGHT BRANCH 1
SHANNON FANO ALGORITHM
6. TO OBTAIN THE CODE FOR ANY
   SYMBOL, COMBINE ALL THE
   DIGITS ON THE PATH FROM
   THE ROOT TO THAT LEAF (SYMBOL)
 QUES: APPLY SF ALGO TO A
   TEXT FILE HAVING 40 CHARS
   WITH THE GIVEN FREQ: A-14,
   B-7, C-10, D-5, E-4
SHANNON FANO ALGORITHM
1. SYMBOL   FREQUENCY
    A           14
    B           7
    C           10
    D           5
    E           4
SHANNON FANO ALGORITHM
2. SORT IT IN DESCENDING ORDER
   SYMBOL        FREQUENCY
    A               14
    C               10
    B               7
    D               5
    E               4
SHANNON FANO ALGORITHM
3. DIVIDING INTO PARTS
   FIRST ITERATION
    A        14
    C        10
    -----------  (24 ABOVE, 16 BELOW)
    B        7
    D        5
    E        4
SHANNON FANO ALGORITHM
SECOND ITERATION

      A    14
      -------  (14 ABOVE, 10 BELOW)
      C    10
      -------
      B    7
      D    5
      E    4
SHANNON FANO ALGORITHM
THIRD ITERATION
           A        14
           -----------
           C        10
           -----------
           B        7
           -----------  (7 ABOVE, 9 BELOW)
           D        5
           E        4

  AFTER THE FOURTH ITERATION, WE
  WILL HAVE THE FOURTH DIVISION (BETWEEN D AND E)
  AND ALL THE PARTS WILL THEN HAVE ONLY
  ONE SYMBOL.
SHANNON FANO ALGORITHM
5.   SHANNON FANO TREE:

                      40
                 0 /      \ 1
                24          16
             0 /  \ 1    0 /  \ 1
            14     10    7      9
            A      C     B   0 / \ 1
                              5     4
                              D     E
SHANNON FANO ALGORITHM
6.   OBTAINING THE CODE FROM THE TREE
     SYMBOL   CODE   NO OF BITS   FREQUENCY
       A       00        2           14
       B       10        2           7
       C       01        2           10
       D       110       3           5
       E       111       3           4
TOTAL NO OF BITS NEEDED FOR TEXT = 14*2 + 7*2 + 10*2 + 5*3 + 4*3 = 89
SO, AV NO OF BITS USED PER SYMBOL = 89/40 = 2.225,
WHICH IS MUCH LESS THAN THE 8 BITS PER
SYMBOL NEEDED IN ASCII
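The whole Shannon-Fano procedure fits in a short recursive function. The sketch below is one possible reading of the steps, not a reference implementation: when two split points are equally balanced it keeps the earlier one, which is an assumption the slides do not address.

```python
def shannon_fano(freqs: dict) -> dict:
    """Return {symbol: code} by recursively halving the frequency-sorted list."""
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0][0]] = prefix or "0"  # lone symbol still needs one bit
            return
        total = sum(f for _, f in group)
        running, best_cut, best_diff = 0, 1, None
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)  # |upper total - lower total|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        split(group[:best_cut], prefix + "0")  # upper half -> left branch, bit 0
        split(group[best_cut:], prefix + "1")  # lower half -> right branch, bit 1

    split(items, "")
    return codes


freqs = {"A": 14, "B": 7, "C": 10, "D": 5, "E": 4}
codes = shannon_fano(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits, total_bits / sum(freqs.values()))
```

For the 40-character file this reproduces the code table above and the total of 89 bits.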
HUFFMAN ALGORITHM
 GIVEN BY DAVID HUFFMAN
 IMPROVEMENT OVER S-F ALGO.
 LOSSLESS COMP ALGO, IDEAL FOR
  COMPRESSING TEXT OR PROGRAM
  FILES
 THE HUFFMAN CODE TABLE IS GUARANTEED
  TO PRODUCE THE LOWEST POSSIBLE
  OUTPUT BIT COUNT FOR
  THE INPUT STREAM OF SYMBOLS,
  WHEN EACH SYMBOL IS GIVEN ITS OWN
  CODE OF A WHOLE NUMBER OF BITS
HUFFMAN ALGORITHM
 HUFFMAN CALLED THESE
  “MINIMUM REDUNDANCY CODES”

 IT BELONGS TO THE FAMILY OF
  ALGOS WITH A VARIABLE CODE
  WORD LENGTH.

 USED IN PKZIP, LHA, GZ, ZOO AND
  ARJ, JPEG AND MPEG
HUFFMAN ALGORITHM
 MAIN DIFFERENCE:

 S-F ALGO BUILDS THE BINARY TREE
  FROM TOP TO BOTTOM, WHEREAS
  HUFFMAN’S ALGO FORMS THE BINARY
  TREE FROM BOTTOM TO TOP

 THE PERFORMANCE OF THE TWO
  IS QUITE SIMILAR
HUFFMAN ALGORITHM
1. COUNT THE NO OF CHARS AND THE
   FREQ OF OCCURRENCE OF EACH
   CHARACTER

2. ARRANGE THEM IN THE DESCENDING
   ORDER OF FREQ.

3. CONSTRUCT HUFFMAN TREE FOR
   THE GENERATION OF CODES
HUFFMAN ALGORITHM
 CONSTRUCTION OF HUFFMAN TREE:
1.   PICK UP 2 CHARS FROM THE LIST HAVING
     MINIMUM FREQ. LET US CALL THESE
     CHARS A AND B

2.   CREATE 2 FREE NODES OF THE BINARY TREE (BT)
     AND ASSIGN A AND B TO THESE NODES

3.   ASSIGN A PARENT NODE FOR THEM AND
     ASSIGN IT THE FREQ THAT IS THE SUM OF
     THE CHILD NODES. LET US CALL IT “AB”
HUFFMAN ALGORITHM
4.   DELETE A AND B FROM THE LIST

5.   ADD THE VALUE OF “AB” TO THE LIST

6.   REPEAT THE STEPS 1 TO 5 UNTIL ONLY ONE NODE REMAINS
     IN THE LIST. THAT NODE IS THE ROOT, AND THE
     RESULTANT TREE IS THE HUFFMAN TREE.

7.   ASSIGN THE BITS TO THE BRANCHES OF THE TREE AS IN
     S-F ALGO, i.e., 0 TO LEFT CHILD & 1 TO RIGHT CHILD

8.   TO FIND THE CODE FOR A CHAR, TRAVERSE FROM
     ROOT TO LEAF CONTAINING THAT CHAR.
HUFFMAN ALGORITHM
 PROBLEM: LET A MESSAGE OF 100 CHARS
  CONTAIN THE FOLLOWING:
     CHAR           FREQUENCY
     A                   50
     B                   20
     C                   15
     D                   10
     E                   5

 STEPS 1 AND 2 HAVE ALREADY BEEN DONE
HUFFMAN ALGORITHM
    CONSTRUCTION OF HUFFMAN TREE:
1.   TWO CHARS HAVING MINIMUM FREQ ARE D
     & E
2.   MAKE D AND E 2 FREE NODES OF THE TREE

             10    5
             D     E
3.   ASSIGN A PARENT NODE FOR THEM:

                 15 (DE)
                /    \
              10      5
              D       E
HUFFMAN ALGORITHM
4 & 5: DELETE D & E FROM THE LIST AND ADD
  DE AND REPEAT
                 30    CDE


            15          15
           C          DE
6. REPEAT 1 TO 5 UNTIL ONLY THE ROOT REMAINS
                 50    CDEB


          30             20
         CDE             B
HUFFMAN ALGORITHM
CONTD:              CDEBA
              100


         50          50
     CDEB            A
HUFFMAN ALGORITHM
 H TREE:
                 100 (CDEBA)
              0 /           \ 1
           50 (A)          50 (CDEB)
                        0 /         \ 1
                    30 (CDE)       20 (B)
                  0 /      \ 1
              15 (C)      15 (DE)
                        0 /     \ 1
                    10 (D)      5 (E)
HUFFMAN ALGORITHM
CHARACTER        HUFFMAN CODE      SIZE
      A               0             1
      B               11            2
      C               100           3
      D               1010          4
      E               1011          4
TOTAL NO OF BITS REQUIRED=195
AV BITS USED = 1.95
ENTROPY OF MESSAGE = - Σ Pi * log2(Pi) = 1.923
  BITS/CHAR
SO, REDUNDANCY = 1.95 – 1.923 = 0.027
  BITS/CHAR
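The figures on this slide can be checked with a few lines of Python. This is only a sketch: the variable names are assumptions, the frequencies and codes are copied from the slides, and the entropy is taken with base-2 logarithms so that it is measured in bits per character.

```python
import math

# Frequencies and Huffman codes as given in the slides.
freqs = {"A": 50, "B": 20, "C": 15, "D": 10, "E": 5}
codes = {"A": "0", "B": "11", "C": "100", "D": "1010", "E": "1011"}

total_chars = sum(freqs.values())
# Total bits = sum over chars of (frequency * code length).
total_bits = sum(freqs[ch] * len(codes[ch]) for ch in freqs)
average_bits = total_bits / total_chars

# Entropy = -sum(Pi * log2(Pi)) with Pi = frequency / message length.
entropy = -sum((f / total_chars) * math.log2(f / total_chars)
               for f in freqs.values())
redundancy = average_bits - entropy

print(total_bits, average_bits, round(entropy, 3), round(redundancy, 3))
```

Evaluated this way, the entropy comes out at roughly 1.923 bits per character, so the Huffman code is within about 0.027 bits per character of the ideal.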
HUFFMAN ALGORITHM
 HUFFMAN CODING CAN BE FURTHER
  OPTIMIZED:
 EXTENDED HUFFMAN COMPRESSION-
  CAN ENCODE GROUP OF SYMBOLS
  RATHER THAN SINGLE SYMBOL

 ADAPTIVE HUFFMAN CODING-
  DYNAMICALLY CHANGES THE CODE
  WORDS ACCORDING TO THE CHANGE
  OF PROBABILITY OF SYMBOLS
ARITHMETIC CODING
 IT IS A METHOD OF WRITING A CODE IN A
  NON-INTEGER LENGTH

 IT ALLOWS ONE TO CODE VERY CLOSE TO
  IDEAL ENTROPY

 IT DOES NOT REPLACE AN INPUT SYMBOL
  WITH A SPECIFIC CODE

 INSTEAD, IT TAKES A STREAM OF INPUT
  SYMBOLS AND REPLACES IT WITH A SINGLE
  FLOATING POINT OUTPUT NUMBER
ARITHMETIC CODING
 IT IS QUITE SIMILAR TO HUFFMAN, BECAUSE
  IT IS USED TO COMPRESS THE SAME KIND OF
  CONTENT AS HUFFMAN

 IT IS DIFFERENT FROM HUFFMAN IN THE WAY
  IT PROCESSES THE SOURCE

 INSTEAD OF GIVING BIT VALUE TO EACH
  CHAR, IT USES PROB VALUE FOR EACH CHAR

 IT IS BASED UPON PROB BETWEEN 0 AND 1
ARITHMETIC CODING
 THE OUTPUT FROM AN ARITHMETIC CODING
  PROCESS IS A SINGLE NUMBER LESS THAN 1
  AND GREATER THAN OR EQUAL TO 0

 THIS SINGLE NUMBER CAN BE UNIQUELY
  DECODED TO CREATE THE EXACT STREAM
  OF SYMBOLS THAT WENT INTO ITS
  CONSTRUCTION

 AMONG STATISTICAL METHODS, IT GIVES A COMPRESSION
  RATIO VERY CLOSE TO THE ENTROPY LIMIT.
ARITHMETIC CODING
1.   IT REQUIRES 5 VARIABLES FOR ENCODING:
     RANGE, LOW, HIGH, RF(RANGE FROM), RT
     (RANGE TO)

EXAMPLE: LET THE MESSAGE BE “ABCBAA”.
    THE FREQ OF CHARS A, B AND C ARE 3, 2
    AND 1 RESPECTIVELY. A TABLE IS TO BE
    CREATED AS FOLLOWS:
CHAR   PROB   RANGE             RANGEFROM   RANGETO
  A    0.50   >=0.00 & <0.50    0.00        0.50
  B    0.33   >=0.50 & <0.83    0.50        0.83
  C    0.17   >=0.83 & <1.00    0.83        1.00
ARITHMETIC CODING
2.   NOW ENCODE THE CHARS IN THE MESSAGE USING
     THE TABLE AS OBTAINED IN STEP 1, AS FOLLOWS,
     AND CREATE A TABLE AGAIN:

     SET LOW=0 AND HIGH=1.0
     WHILE there are still input symbols, DO
         GET AN INPUT SYMBOL
         RANGE=HIGH (prev) – LOW (prev)
         LOW=LOW (prev) + (RANGE * RF of current symbol)
         HIGH=LOW (prev) + (RANGE * RT of current symbol)
     END OF WHILE
     OUTPUT=LOW
ARITHMETIC CODING
CHAR    RANGE                               LOW                                    HIGH
START   -                                   0                                      1
  A     1-0 = 1                             0+1*0 = 0                              0+1*0.5 = 0.5
  B     0.5-0 = 0.5                         0+0.5*0.5 = 0.25                       0+0.5*0.83 = 0.415
  C     0.415-0.25 = 0.165                  0.25+0.165*0.83 = 0.38695              0.25+0.165*1.0 = 0.415
  B     0.415-0.38695 = 0.02805             0.38695+0.02805*0.5 = 0.400975         0.38695+0.02805*0.83 = 0.4102315
  A     0.4102315-0.400975 = 0.0092565      0.400975+0.0092565*0.0 = 0.400975      0.400975+0.0092565*0.5 = 0.40560325
  A     0.40560325-0.400975 = 0.00462825    0.400975+0.00462825*0.0 = 0.400975     0.400975+0.00462825*0.5 = 0.403289125
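The encoding loop can be sketched in Python as below. This is an illustration under stated assumptions: the names `RANGES` and `arithmetic_encode` are made up for the example, the ranges are the two-decimal values from the slides' table, and `fractions.Fraction` is used instead of floating point so that the arithmetic is exact.

```python
from fractions import Fraction

# [RF, RT) for each char, copied from the table (rounded to two decimals).
RANGES = {
    "A": (Fraction("0.00"), Fraction("0.50")),
    "B": (Fraction("0.50"), Fraction("0.83")),
    "C": (Fraction("0.83"), Fraction("1.00")),
}

def arithmetic_encode(message, ranges):
    """Return the final (LOW, HIGH) interval for the message."""
    low, high = Fraction(0), Fraction(1)
    for symbol in message:
        rng = high - low                 # RANGE = HIGH(prev) - LOW(prev)
        rf, rt = ranges[symbol]
        # Both updates use the previous LOW, so assign them together.
        low, high = low + rng * rf, low + rng * rt
    return low, high

low, high = arithmetic_encode("ABCBAA", RANGES)
print(float(low), float(high))
```

For the message "ABCBAA" the final LOW is 0.400975 and the final HIGH is 0.403289125, matching the last row of the table; LOW is the value that is output.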
ARITHMETIC CODING
    THE FINAL OUTPUT VALUE = LOW = 0.400975
    STEP 3: TO DECODE THE MESSAGE, i.e., TO GET THE
     CHARS BACK, THE FOLLOWING PROCESS IS
     ADOPTED:
    AGAIN 5 VARIABLES ARE REQUIRED- RANGE, RF, RT,
     VALUE AND RD (RANGE DIFFERENCE)

1.   OUTPUT THE SYMBOL BY DETERMINING IN
     WHICH RANGE THE VALUE LIES. IN THIS EXAMPLE THE
     VALUE IS 0.400975, WHICH LIES BETWEEN 0 AND
     0.5. SO, THE FIRST CHARACTER DECODED IS “A”.

2.   GET A NEW VALUE USING RD = RT – RF:
     NEW VALUE = (PREV VALUE – PREV RF) / PREV RD

3.   REPEAT STEPS 1 AND 2 UNTIL ALL THE CHARS HAVE
     BEEN DECODED.
ARITHMETIC CODING
VALUE      RANGE            CHAR DECODED   RD
0.400975   0.00 - <0.50     A              0.50
0.80195    0.50 - <0.83     B              0.33
0.915      0.83 - <1.00     C              0.17
0.5        0.50 - <0.83     B              0.33
0.0        0.00 - <0.50     A              0.50
0.0        0.00 - <0.50     A              0.50
ADV: BETTER RESULT THAN HUFFMAN
DISADV:
1. COMPLICATED CALCULATIONS
2. IT REQUIRES FPU, SO PROCESS IS SLOW
3. DOES NOT KNOW WHERE THE DECODING PROCESS
      SHOULD END. TO OVERCOME THIS PROBLEM, ONE SPECIAL
      CHAR IS INSERTED INTO THE ENCODED TEXT AS
      DELIMITER. AT THE TIME OF DECODING, IT INDICATES
      THAT THERE ARE NO MORE CHARS TO DECODE.
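The decoding procedure can be sketched in Python as follows. The names `RANGES` and `arithmetic_decode` are assumptions for the example, and so is the `n_symbols` parameter: instead of the special delimiter char described above, this sketch is simply told how many chars to decode. Exact fractions are used because the value can land exactly on a range boundary, where floating-point rounding could pick the wrong char.

```python
from fractions import Fraction

# [RF, RT) for each char, copied from the slides' table.
RANGES = {
    "A": (Fraction("0.00"), Fraction("0.50")),
    "B": (Fraction("0.50"), Fraction("0.83")),
    "C": (Fraction("0.83"), Fraction("1.00")),
}

def arithmetic_decode(value, ranges, n_symbols):
    """Recover n_symbols chars from the encoded value."""
    decoded = []
    for _ in range(n_symbols):
        for symbol, (rf, rt) in ranges.items():
            if rf <= value < rt:         # find the range the value lies in
                decoded.append(symbol)
                # NEW VALUE = (PREV VALUE - PREV RF) / PREV RD, RD = RT - RF
                value = (value - rf) / (rt - rf)
                break
    return "".join(decoded)

message = arithmetic_decode(Fraction("0.400975"), RANGES, 6)
print(message)
```

Decoding the value 0.400975 for six chars gives back "ABCBAA".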
Compression ratio
 ONE NEEDS TO KNOW IT TO FIND
  OUT THE EFFICIENCY OF THE
  COMPRESSION ALGORITHM

 C.R. = (SIZE OF ORIGINAL DATA - SIZE OF COMPRESSED DATA)
          / SIZE OF ORIGINAL DATA
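As a small worked example, the formula can be applied to the Huffman problem from these slides. This is only a sketch: the function name is hypothetical, and the original size assumes that each of the 100 chars is stored in 8 bits, which the slides do not state.

```python
def compression_ratio(original_size, compressed_size):
    """C.R. as defined on this slide: the fraction of the original size saved."""
    return (original_size - compressed_size) / original_size

original_bits = 100 * 8    # assumption: 100 chars at 8 bits each
compressed_bits = 195      # total bits of the Huffman-coded message
ratio = compression_ratio(original_bits, compressed_bits)
print(ratio)
```

Under that assumption the ratio is about 0.76, i.e. roughly three quarters of the original size is saved.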
DICTIONARY BASED COMPRESSION TECHNIQUES


 STATISTICAL METHODS, SUCH AS S-F AND
  HUFFMAN, ENCODE A SINGLE SYMBOL AT A
  TIME BY GENERATING A ONE-TO-ONE
  SYMBOL-TO-CODE MAP.

 A DICTIONARY BASED COMPRESSOR
  REPLACES AN OCCURRENCE OF A
  PARTICULAR PHRASE OR GROUP OF BYTES
  IN A PIECE OF DATA WITH AN INDEX TO THE
  PREVIOUS OCCURRENCE OF THAT PHRASE.
DICTIONARY BASED COMPRESSION TECHNIQUES


 SUPPOSE A TEXT IS GIVEN

 IT IS ASSUMED THAT THERE IS A DICTIONARY
  THAT HAS ALL THE WORDS IN THE GIVEN
  TEXT.

 EACH WORD IN THE DICTIONARY IS
  REPRESENTED BY A UNIQUE NUMBER THAT
  ALSO INDICATES THE POSITION OR THE
  INDEX OF THE WORD IN THE DICTIONARY.
DICTIONARY BASED COMPRESSION TECHNIQUES

 WHEN THE TEXT IS TO BE COMPRESSED, THE WORDS
  OF THE TEXT ARE REPLACED BY THE INDEX OF THAT
  WORD.

 LET THE TEXT BE “LEARN THE DICTIONARY BASED
  COMPRESSION METHOD. IT IS A VERY SIMPLE
  METHOD. THANK YOU.”

 SAY THE DICTIONARY IS LIKE THIS: LEARN-1; THE-2;
  DICTIONARY-3; BASED-4; COMPRESSION-5; METHOD-6;
  IT-7; IS-8; A-9; VERY-10; SIMPLE-11; THANK YOU-12

 THE ENCODED MESSAGE WILL BE “1 2 3 4 5 6 7 8 9 10
  11 6 12”
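The static dictionary example can be sketched in Python as below. The names `DICTIONARY` and `encode_with_dictionary` are assumptions for the illustration, as is the handling of punctuation (it is simply dropped) and of phrases (the longest phrase found in the dictionary is tried first, so that "THANK YOU" is matched as one entry).

```python
import re

# The dictionary from the slide: word or phrase -> index.
DICTIONARY = {
    "LEARN": 1, "THE": 2, "DICTIONARY": 3, "BASED": 4, "COMPRESSION": 5,
    "METHOD": 6, "IT": 7, "IS": 8, "A": 9, "VERY": 10, "SIMPLE": 11,
    "THANK YOU": 12,
}

def encode_with_dictionary(text, dictionary):
    """Replace each word or phrase of the text by its dictionary index."""
    words = re.findall(r"[A-Za-z]+", text.upper())   # punctuation is dropped
    longest = max(len(phrase.split()) for phrase in dictionary)
    indexes, i = [], 0
    while i < len(words):
        # Try the longest candidate phrase first, then shorter ones.
        for n in range(min(longest, len(words) - i), 0, -1):
            phrase = " ".join(words[i:i + n])
            if phrase in dictionary:
                indexes.append(dictionary[phrase])
                i += n
                break
        else:
            # A static dictionary cannot encode a word it does not contain.
            raise KeyError(f"{words[i]!r} is not in the dictionary")
    return indexes

text = ("LEARN THE DICTIONARY BASED COMPRESSION METHOD. "
        "IT IS A VERY SIMPLE METHOD. THANK YOU.")
encoded = encode_with_dictionary(text, DICTIONARY)
print(encoded)
```

The result is the index sequence 1 2 3 4 5 6 7 8 9 10 11 6 12. A word that is missing from the dictionary raises an error in this sketch, which is the weakness of the static method.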
DICTIONARY BASED COMPRESSION TECHNIQUES

 IF THE PHRASES ARE USED, EFFICIENCY INCREASES.

 FOR THIS TO WORK, THE SENDER AND THE RECEIVER
  MUST BOTH HAVE ACCESS TO
  THE SAME DICTIONARY.

 DICTIONARY BASED METHODS ARE MORE EFFICIENT
  THAN CHARACTER BASED METHODS.

 IT GENERATES THE CODE FOR CHARS AS WELL AS
  FREQUENTLY USED WORDS AND PHRASES.
DICTIONARY BASED COMPRESSION TECHNIQUES


 THE DICTIONARY BASED METHOD MAY BE
  STATIC OR DYNAMIC DEPENDING UPON THE
  CREATION AND USE OF DICTIONARY.

 STATIC DICTIONARY IS PREPARED BEFORE
  THE COMMUNICATION OF THE ENCODED
  MESSAGE TO THE RECEIVER’S END. ALL
  POSSIBLE CHARS/WORDS/PHRASES ARE
  INSERTED INTO THE DICTIONARY AND
  INDEXED.
DICTIONARY BASED COMPRESSION TECHNIQUES


 THE MAIN DRAWBACK OF STATIC METHOD IS
  THAT PERFORMANCE DEPENDS UPON THE
  TEXT TO BE ENCODED AND IS HIGHLY
  DEPENDENT ON THE ORGANIZATION OF THE
  CHARS/WORDS/PHRASES IN THE
  DICTIONARY.
 SECONDLY, IF THERE IS ANY WORD NOT IN
  THE DICTIONARY, IT FAILS.

 THE SOLUTION TO THE PROBLEM IS DYNAMIC
  DICTIONARY COMPRESSION.
DICTIONARY BASED COMPRESSION TECHNIQUES


 IN THIS METHOD, THE DICTIONARY IS
  PREPARED AT THE TIME OF ENCODING OF
  TEXT.

 LZ77, LZ78 AND LZW TECHNIQUES USE
  DYNAMIC DICTIONARY COMPRESSION
  TECHNIQUE.

 IT GENERATES OPTIMUM SIZE CODES.
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 DESIGNED FOR SEQUENTIAL DATA COMPRESSION.

 THE DICTIONARY IS A PORTION OF THE PREVIOUSLY
  ENCODED SEQUENCE.

 THE ENCODER EXAMINES THE INPUT SEQUENCE
  THROUGH A SLIDING WINDOW.

 THE WINDOW CONSISTS OF 2 PARTS: A SEARCH
  BUFFER, THAT CONTAINS A PORTION OF THE
  RECENTLY ENCODED SEQUENCE AND A LOOK-AHEAD
  BUFFER, THAT CONTAINS THE NEXT PORTION OF THE
  SEQUENCE TO BE ENCODED.
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE


LZ77:       pointer
               |
               v
       c   [ a b r a c a d ]   [ a b r a r r ]   a r r
             SEARCH BUFFER     LOOK-AHEAD BUFFER




1. To encode the sequence in look-ahead buffer,
the encoder moves a search pointer back through
the search buffer until it encounters a match to the
first symbol in the look-ahead buffer. The distance
of the pointer from the look-ahead buffer is called
the offset.
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE


2.   The encoder then examines the symbols
     following the symbol at the pointer location to
     see if they match consecutive symbols in the
     look-ahead buffer. The number of consecutive
     symbols in the search buffer that match
     consecutive symbols in the look-ahead
     buffer, starting with the first symbol, is called
     the length of the match. The encoder
     searches the search buffer for the longest
     match.
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE


3.   Once the longest match has been found, the
     encoder encodes it with a triple <o,l,c> where
     o is the offset, l is the length of the match and
     c is the code-word corresponding to the
     symbol in the look-ahead buffer that follows
     the match.

     In the diagram, the longest match starts at the
     first a of the search buffer. The offset o in this
     case is 7, l is 4, and the symbol in the look-ahead
     buffer following the match is r.
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE


 The reason for sending the third element in the
  triple is to take care of the situation where no
  match for the symbol in the look-ahead buffer
  can be found in the search buffer. In this case,
  the offset and the match length values are set
  to 0, and the third element of the triple is the
  code for the symbol itself.

 For the decoding process, it is basically a table
  look-up procedure and can be done by
  reversing the encoding procedure.
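The LZ77 encoding procedure can be sketched in Python as follows. This is a simplified illustration, not a production encoder: the function name, the buffer sizes (a search buffer of 7 and a look-ahead buffer of 6) and the choice to let a match run on into the look-ahead buffer are assumptions, and the third element of each triple is the symbol itself rather than a code-word for it.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Encode data as a list of (offset, length, next_symbol) triples."""
    triples = []
    i = 0                                   # start of the look-ahead buffer
    while i < len(data):
        best_offset, best_length = 0, 0
        # Keep one symbol in reserve for the third element of the triple.
        max_length = min(lookahead_size - 1, len(data) - i - 1)
        # Move the search pointer back through the search buffer.
        for offset in range(1, min(search_size, i) + 1):
            length = 0
            while (length < max_length
                   and data[i - offset + length] == data[i + length]):
                length += 1
            if length > best_length:        # keep the longest match
                best_offset, best_length = offset, length
        # With no match this is (0, 0, symbol), as described above.
        triples.append((best_offset, best_length, data[i + best_length]))
        i += best_length + 1
    return triples

triples = lz77_encode("cabracadabrarrarr")
print(triples)
```

For the sequence in the diagram, the triple produced when the look-ahead buffer starts at "abrarr" is (7, 4, 'r'), which agrees with the offset 7 and length 4 given in the slides.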
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 Take a buffer of the same size N as the window used
  in the encoding process (the sum of the sizes of the
  search buffer and the look-ahead buffer), and use
  its first (N – n) spaces to hold the previously
  decoded chars, where n is the size of the
  look-ahead buffer.

 If one breaks up each triple that one encounters
  back into its components- position offset o,
  match length l and the last symbol of the
  incoming stream c, one can extract the match
  string from buffer according to o, and thus
  obtain the original content.
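Decoding can be sketched in Python as below. The function name is hypothetical, and for simplicity the sketch keeps the whole decoded text in memory instead of a fixed-size buffer; the triples are the ones a simple encoder with a 7-symbol search buffer might produce for the sequence "cabracadabrarrarr" from the diagram.

```python
def lz77_decode(triples):
    """Rebuild the text from (offset, length, next_symbol) triples."""
    decoded = []
    for offset, length, symbol in triples:
        start = len(decoded) - offset       # where the match begins
        # Copy one symbol at a time so a match may overlap its own output.
        for k in range(length):
            decoded.append(decoded[start + k])
        decoded.append(symbol)              # the symbol after the match
    return "".join(decoded)

example_triples = [(0, 0, 'c'), (0, 0, 'a'), (0, 0, 'b'), (0, 0, 'r'),
                   (3, 1, 'c'), (2, 1, 'd'), (7, 4, 'r'), (3, 3, 'r')]
text = lz77_decode(example_triples)
print(text)
```

The decoder needs no searching at all, which is why decoding is much faster than encoding.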
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 It provides very good compression ratio for
  many types of data.

 But, the encoding process is very time
  consuming as there are many comparisons to
  be performed between the look-ahead buffer
  and the search buffer.

 On the other hand, the decoding process is
  very simple and fast and both the encoding and
  the decoding processes have a low memory
  consumption, since the only data held in the
  memory is the window (typically between 4 and 64 KB).
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 ALL POPULAR ARCHIVERS (.ARJ, .LHA, .ZIP, .ZOO) ARE
  VARIATIONS ON LZ77 THEME.

 DRAWBACK: IT USES ONLY A SMALL WINDOW INTO
  PREVIOUSLY SEEN TEXT, WHICH MEANS IT
  CONTINUOUSLY THROWS AWAY VALUABLE
  DICTIONARY ENTRIES BECAUSE THEY SLIDE OUT OF
  THE DICTIONARY. THE LONGEST MATCH POSSIBLE IS
  ROUGHLY THE SIZE OF THE LOOK-AHEAD BUFFER.

 SECONDLY, IF A STRING THAT HAS ALREADY BEEN
  CAPTURED APPEARS AT LONGER INTERVAL, THEN A
  SEPARATE CODE WILL BE GENERATED FOR THE
  SAME STRING.
LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 TO OVERCOME SUCH PROBLEMS, THE LZ78 ALGORITHM
  WAS PROPOSED.

 THE ONLY DIFFERENCE HERE IS THAT THE FIXED SIZE
  WINDOW OF LZ77 IS REPLACED BY A DICTIONARY IN
  LZ78.

 WHILE LZ77 WORKS ON PAST DATA, LZ78 ATTEMPTS
  TO WORK ON FUTURE DATA.

 IT DOES THIS BY FORWARD SCANNING THE INPUT
  BUFFER AND MATCHING IT AGAINST A DICTIONARY IT
  MAINTAINS.
LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE


 THE DICTIONARY IN LZ78 IS A TABLE OF
  STRINGS. EVERY STRING IS ASSIGNED A
  CODE WORD ACCORDING TO ITS INDEX
  NUMBER IN THE DICTIONARY.

 BEFORE UNDERSTANDING THE METHOD,
  LOOK AT THE FOLLOWING TERMS:

 CHARSTREAM: A SEQUENCE OF DATA TO BE
  ENCODED.
LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 CODE WORD: A BASIC DATA ELEMENT IN THE CODE
  STREAM. IT REPRESENTS A STRING FROM THE
  DICTIONARY.

 PREFIX: A SEQUENCE OF CHARS THAT PRECEDE ONE
  CHARACTER

 STRING: THE PREFIX TOGETHER WITH THE CHAR IT
  PRECEDES

 CODESTREAM: THE SEQUENCE OF CODE WORDS AND
  CHARS ( THE OUTPUT OF THE ENCODING ALGORITHM)
LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE

 CURRENT PREFIX (P) : THE PREFIX
  CURRENTLY BEING PROCESSED IN THE
  ENCODING ALGORITHM

 CURRENT CHARACTER (C ): A CHAR
  DETERMINED IN THE ENCODING ALGORITHM.
  GENERALLY, THIS IS THE CHARACTER
  PRECEDED BY THE CURRENT PREFIX.

 CURRENT CODE WORD (W): THE CODE WORD
  CURRENTLY PROCESSED IN THE DECODING
  ALGORITHM.
LZ78 (LEMPEL-ZIV) ENCODING PROCESS

 IT STARTS WITH A NEW DICTIONARY ie., AT
  THE BEGINNING OF ENCODING THE
  DICTIONARY IS EMPTY.

 LET US CONSIDER A POINT WITHIN THE
  ENCODING PROCESS, WHEN THE
  DICTIONARY ALREADY CONTAINS SOME
  STRINGS.

 ONE STARTS ANALYZING A NEW PREFIX IN
  THE CHARSTREAM, BEGINNING WITH AN
  EMPTY PREFIX.
LZ78 (LEMPEL-ZIV) ENCODING PROCESS

 IF ITS CORRESPONDING STRING (P+ C) IS
  PRESENT IN THE DICTIONARY, THE PREFIX IS
  EXTENDED WITH THE CHAR C.

 THIS EXTENDING IS REPEATED UNTIL ONE
  GETS A STRING WHICH IS NOT PRESENT IN
  THE DICTIONARY.

 AT THAT POINT, ONE OUTPUTS 2 THINGS TO
  THE CODESTREAM: THE CODEWORD THAT
  REPRESENTS THE PREFIX P AND THEN THE
  CHAR C.
LZ78 (LEMPEL-ZIV) ENCODING PROCESS

 THEN ONE ADDS THE WHOLE STRING (P+C) TO THE
  DICTIONARY AND STARTS PROCESSING THE NEXT
  PREFIX IN THE CHARSTREAM.




 A SPECIAL CASE OCCURS IF THE DICTIONARY DOES
  NOT CONTAIN EVEN THE STARTING ONE-CHARACTER
  STRING (THIS ALWAYS HAPPENS IN THE FIRST
  ENCODING STEP). IN THAT CASE, ONE OUTPUTS A
  SPECIAL CODE WORD THAT REPRESENTS AN EMPTY
  STRING, FOLLOWED BY THIS CHARACTER, AND ADDS
  THIS CHAR TO THE DICTIONARY.
LZ78 (LEMPEL-ZIV) ENCODING PROCESS

 THE OUTPUT FROM THIS ALGORITHM IS A SEQUENCE OF
  CODEWORD-CHAR PAIRS (W,C). EACH TIME A
  PAIR IS OUTPUT TO THE CODESTREAM, THE
  STRING FROM THE DICTIONARY
  CORRESPONDING TO W IS EXTENDED WITH
  THE CHAR C AND THE RESULTING STRING IS
  ADDED TO THE DICTIONARY.


 IT MEANS THAT WHEN A NEW STRING IS
  BEING ADDED, THE DICTIONARY ALREADY
  CONTAINS ALL THE SUBSTRINGS FORMED BY
  REMOVING CHARS FROM THE END OF THE
  NEW STRING.
LZ78 (LEMPEL-ZIV) ENCODING ALGORITHM

     LZ78:
1.    START WITH AN EMPTY DICTIONARY AND AN EMPTY PREFIX P.
2.    C = NEXT CHAR IN THE CHARSTREAM
3.    IS THE STRING (P+C) PRESENT IN THE DICTIONARY?
     IF YES, THEN P = P+C
     IF NOT, THEN
          OUTPUT THESE 2 OBJECTS, P & C, TO THE
          CODESTREAM [THE CODEWORD
          CORRESPONDING TO P, AND C IN THE SAME
          FORM AS INPUT FROM CHARSTREAM]
          ADD THE STRING P+C TO THE DICTIONARY
          P = EMPTY
4.    ARE THERE MORE CHARS IN THE CHARSTREAM?
          IF YES, RETURN TO STEP 2
          IF NOT
              IF P IS NOT EMPTY, OUTPUT THE CODE WORD
              CORRESPONDING TO P
      END
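The encoding algorithm can be sketched in Python as follows. The function name is an assumption, as is the representation of the codestream: a list of (W, C) pairs in which the code word 0 stands for the empty prefix and dictionary indexes start at 1. If the input ends while a prefix is still pending, this sketch emits its code word with an empty char.

```python
def lz78_encode(charstream):
    """Encode a string as a list of (codeword, char) pairs."""
    dictionary = {}          # string -> index, starting at 1
    codestream = []
    prefix = ""              # the current prefix P
    for c in charstream:
        if prefix + c in dictionary:
            prefix += c      # P+C is known: extend the prefix
        else:
            # Output the code word for P (0 if P is empty) and the char C.
            codestream.append((dictionary.get(prefix, 0), c))
            dictionary[prefix + c] = len(dictionary) + 1
            prefix = ""
    if prefix:               # input ended in the middle of a known string
        codestream.append((dictionary[prefix], ""))
    return codestream

pairs = lz78_encode("ABBCBCABA")
print(pairs)
```

For the char stream ABBCBCABA this gives (0,A) (0,B) (2,C) (3,A) (2,A).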
LZ78 (LEMPEL-ZIV) DECODING PROCESS

 AT THE START OF DECODING THE DICTIONARY IS
  EMPTY.

 IT GETS RECONSTRUCTED IN THE PROCESS OF
  DECODING.

 IN EACH STEP, A PAIR CODEWORD-CHAR –(W,C) IS
  READ FROM THE CODESTREAM.

 THE CODEWORD ALWAYS REFERS TO A STRING
  ALREADY PRESENT IN THE DICTIONARY. THE STRING
  FOR W AND THE CHAR C ARE OUTPUT TO THE CHARSTREAM
  AND THE STRING (W+C) IS ADDED TO THE DICTIONARY.
  AFTER THE DECODING, THE DICTIONARY WILL LOOK
  EXACTLY THE SAME AS AFTER ENCODING.
LZ78 (LEMPEL-ZIV) DECODING ALGORITHM


1.   AT THE START THE DICTIONARY IS EMPTY.
2.   W= NEXT CODEWORD IN THE CODESTREAM
3.   C= THE CHARACTER FOLLOWING IT
4.   OUTPUT THE STRING FOR W TO THE CHARSTREAM
     (THIS CAN BE AN EMPTY STRING) AND THEN
     OUTPUT C
5.   ADD THE STRING W+C TO THE DICTIONARY
6.   ARE THERE MORE CODEWORDS IN THE
     CODESTREAM?
     IF YES, GO BACK TO STEP 2
     IF NOT, END.
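The decoding algorithm can be sketched in Python as below. The function name and the codestream format are assumptions: a list of (W, C) pairs where the code word 0 stands for the empty string and dictionary indexes start at 1.

```python
def lz78_decode(codestream):
    """Rebuild the char stream from (codeword, char) pairs."""
    dictionary = {0: ""}     # code word 0 is the empty string
    pieces = []
    for w, c in codestream:
        string = dictionary[w] + c            # string for W, then the char C
        pieces.append(string)
        dictionary[len(dictionary)] = string  # next free index: 1, 2, 3, ...
    return "".join(pieces)

text = lz78_decode([(0, "A"), (0, "B"), (2, "C"), (3, "A"), (2, "A")])
print(text)
```

The five pairs decode to A, B, BC, BCA and BA, i.e. the char stream ABBCBCABA, and the decoder's dictionary ends up with the same entries the encoder built.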
LZ78 (LEMPEL-ZIV) ENCODING PROCESS

 EXAMPLE: LET THE CHAR STREAM TO BE
  ENCODED BE
POS    1 2 3 4 5 6 7 8 9
CHAR   A B B C B C A B A
             ENCODING PROCESS
ENCODING   CURRENT    DICTIONARY   OUTPUT
STEP       POSITION
   1          1          A          (0,A)
   2          2          B          (0,B)
   3          3          BC         (2,C)
   4          5          BCA        (3,A)
   5          8          BA         (2,A)
LZ78 (LEMPEL-ZIV) ENCODING PROCESS


 THE COLUMN DICTIONARY SHOWS WHAT
  STRING HAS BEEN ADDED TO THE
  DICTIONARY. THE INDEX OF THE STRING IS
  EQUAL TO THE STEP NUMBER.

 THE COLUMN OUTPUT PRESENTS THE
  OUTPUT IN THE FORM (W,C)

 THE OUTPUT OF EACH STEP DECODES TO
  THE STRING THAT HAS BEEN ADDED TO THE
  DICTIONARY.
LZ78 (LEMPEL-ZIV) DECODING PROCESS


THE DECODING PROCESS

STEPS     OUTPUT PHASE    TEXT GENERATED
 1        (0,A)           A
 2        (0,B)           B
 3        (2,C)           BC
 4        (3,A)           BCA
 5        (2,A)           BA
LZW ALGORITHM
 LZW WORKS BY ENTERING PHRASES INTO A
  DICTIONARY AND THEN, WHEN A REPEAT
  OCCURRENCE OF THAT PARTICULAR PHRASE
  IS FOUND, OUTPUTTING THE DICTIONARY
  INDEX INSTEAD OF THE PHRASE.

 FOR EXAMPLE, AN IMPLEMENTATION MAY USE A
  DICTIONARY WITH 4096 ENTRIES. IN THE BEGINNING,
  THE ENTRIES 0-255 REFER TO INDIVIDUAL BYTES
  AND THE REMAINING ENTRIES 256-4095 REFER TO
  LONGER STRINGS.
LZW ALGORITHM
 EACH TIME A NEW CODE IS GENERATED, IT
  MEANS A NEW STRING HAS BEEN SELECTED
  FROM THE INPUT STREAM.

 NEW STRINGS THAT ARE ADDED TO THE
  DICTIONARY ARE CREATED BY APPENDING
  THE CURRENT CHARACTER K TO THE END OF
  AN EXISTING STRING W.
LZW ALGORITHM
SET W=NIL
LOOP
   READ A CHARACTER K
   IF WK EXISTS IN THE DICTIONARY
       W=WK
   ELSE
       OUTPUT THE CODE FOR W
       ADD WK TO THE DICTIONARY
       W=K
END-LOOP
OUTPUT THE CODE FOR W
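The LZW loop can be sketched in Python as follows. The function name and the `max_entries` parameter are assumptions for the illustration; in particular, simply not adding new strings once the 4096-entry dictionary is full is one possible policy, not something the slides specify. The code for the last pending string W is output once the input is exhausted.

```python
def lzw_encode(data, max_entries=4096):
    """Encode a bytes object as a list of dictionary indexes."""
    # Entries 0-255 are the individual bytes.
    dictionary = {bytes([i]): i for i in range(256)}
    w = b""
    codes = []
    for byte in data:
        k = bytes([byte])
        wk = w + k
        if wk in dictionary:
            w = wk                          # keep extending the phrase
        else:
            codes.append(dictionary[w])     # output the code for W
            if len(dictionary) < max_entries:
                dictionary[wk] = len(dictionary)   # entries 256 and up
            w = k
    if w:
        codes.append(dictionary[w])         # flush the last phrase
    return codes

codes = lzw_encode(b"ABABABA")
print(codes)
```

For the input ABABABA the output is 65, 66, 256, 258: the bytes A and B, then the new entries for "AB" and "ABA".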
JPEG
 DESIGNED FOR COMPRESSING FULL COLOUR
  OR GRAY SCALE DIGITAL IMAGES OF REAL-
  WORLD SCENES.

 IT DOES NOT WORK WELL ON TEXT, NON-
  REALISTIC IMAGES SUCH AS CARTOONS AND
  LINE DRAWINGS.

 IT IS INDEPENDENT OF SOURCE IMAGE.

 IT MEANS IT CAN COMPRESS IMAGE
  IRRESPECTIVE OF ITS SIZE.
JPEG
 IT DOES NOT HANDLE B&W (1 BIT PER PIXEL)
  IMAGES NOR DOES IT HANDLE MOTION
  PICTURE COMPRESSION.

 IT USES LOSSY TECHNIQUE.

 THE ALGO ACHIEVES MUCH OF ITS
  COMPRESSION BY EXPLOITING KNOWN
  LIMITATIONS OF HUMAN EYE, THE FACT THAT
  SMALL COLOUR DETAILS ARE NOT
  PERCEIVED AS WELL AS SMALL DETAILS OF
  LIGHT AND DARK.
JPEG
 IT IS INTENDED FOR COMPRESSING IMAGES
  THAT WILL BE LOOKED AT BY HUMANS.

 THE JPEG STANDARD INCLUDES A
  SEPARATE LOSSLESS MODE, BUT IT IS
  RARELY USED AND DOES NOT GIVE NEARLY
  AS MUCH COMPRESSION AS THE LOSSY
  MODE.

 A USEFUL PROPERTY OF JPEG IS THAT THE
  DEGREE OF LOSSINESS CAN BE VARIED BY
  ADJUSTING COMPRESSION PARAMETERS.
JPEG
 DECODERS CAN TRADE-OFF DECODING SPEED
  AGAINST IMAGE QUALITY BY USING FAST BUT
  INACCURATE APPROXIMATIONS TO THE REQUIRED
  CALCULATIONS.

 MAIN ADV OF USING JPEG IS THAT ONE CAN MAKE
  IMAGE FILES SMALLER, AS WELL AS ONE CAN STORE
  24 BIT/PIXEL COLOR DATA (16 MILLION COLORS)
  INSTEAD OF 8 BIT/PIXEL DATA (256 OR FEWER
  COLORS)

 JPEG CAN EASILY PROVIDE 20:1 COMPRESSION OF
  FULL COLOR DATA. AT LOW QUALITY EVEN 100:1
  COMPRESSION IS POSSIBLE.

 JPEG IS WIDELY USED ON WWW FOR
  STORING/TRANSMITTING PHOTOGRAPHS.
MPEG
 THE MAIN ADVANTAGE IS THAT MPEG-1 COMPRESSES
  DATA TO A RATE OF UP TO 1.5 MBITS/SECOND,
  WHICH IS EQUAL TO THE CDROM DATA
  TRANSFER RATE.

 USING MPEG1, ONE CAN DELIVER 1.2 Mbps
  OF VIDEO AND 250 kbps OF 2 CHANNEL
  STEREO SOUND USING CDROM
  TECHNOLOGY. SO MOVIE MAY BE STORED
  ON CDROM IN MPEG FORMAT AND MAY BE
  VIEWED WITHOUT ANY SYNCHRONIZATION
  FAULT.
MPEG
 JPEG IS FOR STILL IMAGE COMPRESSION
  WHEREAS MPEG IS FOR MOVING PICTURES.

 BUT AS DIGITAL VIDEO OR MOVIES STORE A
  SEQUENCE OF STILL COLOR IMAGES, THE MPEG
  STANDARD USES JPEG-LIKE COMPRESSION
  TO COMPRESS THE INDIVIDUAL STILL IMAGES.

 MPEG IS SUITABLE FOR SYMMETRIC AS
  WELL AS ASYMMETRIC COMPRESSION.
MPEG
 ASYMMETRIC COMPRESSION REQUIRES
  MORE EFFORT FOR CODING THAN
  DECODING. IN THIS CASE, COMPRESSION IS
  CARRIED OUT ONCE WHEREAS
  DECOMPRESSION IS PERFORMED MANY
  TIMES.

 SYMMETRIC COMPRESSION IS KNOWN TO
  EXPECT EQUAL EFFORT FOR COMPRESSION
  AND DECOMPRESSION PROCESSES.
MPEG
 INTERACTIVE DIALOGUE APPLICATIONS
  MAKE USE OF SYMMETRIC COMPRESSION,
  WHERE RESTRICTED END-TO-END DELAY IS
  REQUIRED.

 MPEG HAS BECOME THE METHOD OF CHOICE
  FOR ENCODING MOTION IMAGES BECAUSE IT
  HAS BECOME WIDELY ACCEPTED FOR BOTH
  INTERNET AND DVD-VIDEO.
MHEG
 MULTIMEDIA AND HYPERMEDIA EXPERTS GROUP

 SET UP BY ISO FOR STANDARDIZATION OF EXCHANGE
  FORMAT FOR MULTIMEDIA PRESENTATION AND
  MULTIMEDIA SYSTEM.

 WITHOUT A STANDARD, IT IS ALMOST IMPOSSIBLE TO MAKE
  A MM PRESENTATION WHICH CAN WORK ACROSS DIFFERENT
  HW PLATFORMS.

 MAIN OBJECTIVE OF THIS GROUP IS TO CREATE A
  STANDARD METHOD TO STORE, EXCHANGE AND
  DISPLAY MM PRESENTATIONS.
MHEG
 IT IS BASED ON OBJECT ORIENTED TECHNOLOGY.

 FOR MM PRESENTATION, THERE ARE MANY CLASSES
  THAT DEFINE HOW AUDIO, VIDEO AND MUSIC CAN BE
  PLAYED.

 THERE ARE CLASSES THAT CAN HELP TO DEVELOP
  USER INTERACTION DURING MM PRESENTATIONS.

 THE THREE IMPORTANT CLASSES USED ARE CONTENT CLASS,
  BEHAVIOUR CLASS AND INTERACTION CLASS.
MHEG
 CONTENT CLASS IS USED TO DESCRIBE THE
  ACTUAL CONTENTS OF THE MM
  PRESENTATION

 BEHAVIOUR CLASS IS USED TO DECIDE THE
  BEHAVIOUR OF THE PRESENTATION, FOR
  EXAMPLE HOW AND WHEN DATA WILL BE
  PRESENTED TO THE USER. IT HAS 2 SUB
  CLASSES- ACTION CLASS AND LINK CLASS,
  WHICH ARE USEFUL FOR SYNCHRONIZING THE EVENTS
  WITH THE USER INTERFACE.
MHEG
 INTERACTION CLASS DESCRIBES THE ELEMENTS OF
  THE USER INTERFACE (ie., THE ELEMENTS
  THAT APPEAR ON THE USER SCREEN) THAT
  ALLOW THE USER TO MAKE SELECTIONS,
  TRIGGER EVENTS AND INPUT INFORMATION.
  FOR EXAMPLE, THE ELEMENTS CHECK BOX,
  RADIO BUTTONS AND LISTS ARE USED TO
  MAKE SELECTIONS; ELEMENT PUSH BUTTON
  IS USED TO TRIGGER EVENTS AND TEXT
  ENTRY FIELD IS USED TO INPUT
  INFORMATION FROM THE USER.

Introduction image processing
 
system design
system designsystem design
system design
 

Recently uploaded

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Allon Mureinik
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure servicePooja Nehwal
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsMemoori
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking MenDelhi Call girls
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxnull - The Open Security Community
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreternaman860154
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...HostedbyConfluent
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphNeo4j
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitecturePixlogix Infotech
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonetsnaman860154
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slidespraypatel2
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024BookNet Canada
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking MenDelhi Call girls
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticscarlostorres15106
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhisoniya singh
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxMalak Abu Hammad
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationSafe Software
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking MenDelhi Call girls
 

Recently uploaded (20)

Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)Injustice - Developers Among Us (SciFiDevCon 2024)
Injustice - Developers Among Us (SciFiDevCon 2024)
 
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure serviceWhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
WhatsApp 9892124323 ✓Call Girls In Kalyan ( Mumbai ) secure service
 
AI as an Interface for Commercial Buildings
AI as an Interface for Commercial BuildingsAI as an Interface for Commercial Buildings
AI as an Interface for Commercial Buildings
 
08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men08448380779 Call Girls In Civil Lines Women Seeking Men
08448380779 Call Girls In Civil Lines Women Seeking Men
 
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptxMaking_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
Making_way_through_DLL_hollowing_inspite_of_CFG_by_Debjeet Banerjee.pptx
 
Pigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food ManufacturingPigging Solutions in Pet Food Manufacturing
Pigging Solutions in Pet Food Manufacturing
 
Presentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreterPresentation on how to chat with PDF using ChatGPT code interpreter
Presentation on how to chat with PDF using ChatGPT code interpreter
 
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
Transforming Data Streams with Kafka Connect: An Introduction to Single Messa...
 
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge GraphSIEMENS: RAPUNZEL – A Tale About Knowledge Graph
SIEMENS: RAPUNZEL – A Tale About Knowledge Graph
 
Understanding the Laravel MVC Architecture
Understanding the Laravel MVC ArchitectureUnderstanding the Laravel MVC Architecture
Understanding the Laravel MVC Architecture
 
How to convert PDF to text with Nanonets
How to convert PDF to text with NanonetsHow to convert PDF to text with Nanonets
How to convert PDF to text with Nanonets
 
Slack Application Development 101 Slides
Slack Application Development 101 SlidesSlack Application Development 101 Slides
Slack Application Development 101 Slides
 
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
#StandardsGoals for 2024: What’s new for BISAC - Tech Forum 2024
 
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
08448380779 Call Girls In Diplomatic Enclave Women Seeking Men
 
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmaticsKotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
Kotlin Multiplatform & Compose Multiplatform - Starter kit for pragmatics
 
Pigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping ElbowsPigging Solutions Piggable Sweeping Elbows
Pigging Solutions Piggable Sweeping Elbows
 
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | DelhiFULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
FULL ENJOY 🔝 8264348440 🔝 Call Girls in Diplomatic Enclave | Delhi
 
The Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptxThe Codex of Business Writing Software for Real-World Solutions 2.pptx
The Codex of Business Writing Software for Real-World Solutions 2.pptx
 
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry InnovationBeyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
Beyond Boundaries: Leveraging No-Code Solutions for Industry Innovation
 
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men08448380779 Call Girls In Greater Kailash - I Women Seeking Men
08448380779 Call Girls In Greater Kailash - I Women Seeking Men
 

Data Compression Techniques: Lossless vs Lossy, Audio & Video Compression

  • 1. TOPICS  DATA COMPRESSION  COMPRESSION TECHNIQUES  LOSSLESS COMPRESSION  LOSSY COMPRESSION  AUDIO COMPRESSION  VIDEO COMPRESSION  MPEG COMPRESSION  JPEG COMPRESSION  LOSSLESS VS. LOSSY COMPRESSION  ADVANTAGE OF COMPRESSION
  • 2. DATA COMPRESSION The process of reducing the volume of data by applying a compression technique is called compression. The resulting data is called compressed data.
  • 3. DATA COMPRESSION The reverse process of reproducing the original data from compressed data is called decompression. The resulting data is called decompressed data.
  • 4. Reasons to Compress Reduce File Size Save disk space Increase transfer speed at a given data rate Allow real-time transfer at a given data rate
  • 5. Types of compression techniques Compression techniques can be categorized based on following consideration:  Lossless or lossy  Symmetrical or asymmetrical  Software or hardware
  • 6. Types of compression techniques 1. Lossless or lossy  If the decompressed data is the same as the original data, it is referred to as lossless compression, otherwise the compression is lossy.
  • 7. Types of compression techniques 2. Symmetrical or asymmetrical  In symmetrical compression, the time required to compress and to decompress are roughly the same.  In asymmetrical compression, the time taken for compression is usually much longer than decompression.
  • 8. Types of compression techniques 3. Software or hardware  A compression technique may be implemented either in hardware or software. As compared to software codecs (coder and decoder), hardware codecs offer better quality and performance.
  • 10. Compression - Types  Spatial Compression – Finds similarities in an image and compresses those similarities in a smaller form – Intra-frame  Temporal Compression – Finds similarities across images and compresses those similarities in a smaller form – Inter-frame  Quality of Compression – Lossless – Lossy
  • 11. Compression - Spatial  Run Length Encoding – Replace a run of consecutive pixels of the same color by a single copy of the color value and a count of the number of pixels  Huffman coding – Similar to RLE, but assigns codes of different lengths to colors (most common colors have minimum bits)
  • 12. Compression - Spatial  Dictionary-based coding – Fixed-length codes point to a table of variable-length color sequences – Basis of LZW and PKZIP  All are lossless compression schemes – roughly 50% compression on typical photographic images
  • 13. Compression - Spatial  GIF – Lossless compression – Best suited for simple images – Reduces colors to reduce file size (256 colors max)
  • 14. Compression - Spatial  JPEG – Joint Photographic Experts Group – Lossy compression – Best suited for photography – Throws data away to further reduce file size
  • 15. Compression - Temporal  Motion JPEG – Most popular for capturing analog video – JPEG on each frame of video – No temporal compression – Special-purpose hardware may be needed for real-time
  • 16. Compression - Temporal  DV – Most popular for storage and capturing digital video – 5:1 compression usually done in hardware (camera) – Spatial and a little temporal compression
  • 17. Compression - Temporal  MPEG – Moving Picture Experts Group – Most popular for delivery of digital video – Temporal and spatial compression – MPEG-1, MPEG-2, MPEG-4 & MPEG-7
  • 18. BASIC COMPRESSION TECHNIQUES Lossless techniques Lossy techniques
  • 19. Lossless techniques  RUN-LENGTH CODING: - Repeated symbols in a string are replaced with the symbol and the number of times it is repeated. Example: “aaaabbcccccaaaaaabaaaaa” is expressed as “a4b2c5a6b1a5”.
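The symbol-plus-count scheme on this slide can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the decoder assumes the symbols themselves are never digits.

```python
from itertools import groupby


def rle_encode(text: str) -> str:
    """Replace each run of a repeated symbol with the symbol and its run length."""
    return "".join(f"{symbol}{len(list(run))}" for symbol, run in groupby(text))


def rle_decode(encoded: str) -> str:
    """Invert rle_encode; assumes symbols are non-digit characters."""
    out = []
    i = 0
    while i < len(encoded):
        symbol = encoded[i]
        i += 1
        j = i
        # Collect all digits that follow the symbol (the count may exceed 9).
        while j < len(encoded) and encoded[j].isdigit():
            j += 1
        out.append(symbol * int(encoded[i:j]))
        i = j
    return "".join(out)


message = "aaaabbcccccaaaaaabaaaaa"
packed = rle_encode(message)
print(packed)
print(rle_decode(packed) == message)
```

Note that isolated symbols grow from one character to two under this scheme, so it only pays off when runs are common.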
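  • 20. Lossless techniques VARIABLE-LENGTH CODING: - In general, coding schemes for a given character set (for example BCD, EBCDIC, and ASCII) use a fixed number of bits per character. Variable-length coding instead assigns shorter codes to frequently occurring characters and longer codes to rare ones.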
  • 21. Run-Length coding  Look at compressing the sequence ABBBBBBBBBCDEEEEF – Using RLE compression, with Ω as the flag that marks a run, the compressed file takes up 10 bytes and would look like this: AΩ9BCDΩ4EF – Data size before compression: 17 bytes – Data size after compression: 10 bytes – Compression ratio: 17/10 = 1.7
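A flag-based variant like the one on this slide might look as follows. The threshold of four and the single-digit count are assumptions made for this sketch (a flagged run costs three characters, so shorter runs are left as literals), and Ω is assumed never to occur in the data.

```python
FLAG = "Ω"    # escape marker taken from the slide; assumed absent from the data
MIN_RUN = 4   # assumed threshold: a flagged run costs 3 characters
MAX_RUN = 9   # the count is stored as a single digit in this sketch


def flag_rle_encode(text: str) -> str:
    """Encode long runs as FLAG + count + symbol and copy short runs literally."""
    out = []
    i = 0
    while i < len(text):
        j = i
        # Extend the run, but never beyond what a single digit can count.
        while j < len(text) and text[j] == text[i] and j - i < MAX_RUN:
            j += 1
        run = j - i
        if run >= MIN_RUN:
            out.append(f"{FLAG}{run}{text[i]}")
        else:
            out.append(text[i] * run)
        i = j
    return "".join(out)


source = "ABBBBBBBBBCDEEEEF"
packed = flag_rle_encode(source)
print(packed, len(source), len(packed))
```

Unlike the symbol-plus-count form, this variant never expands isolated symbols, which is why it suits data where runs are mixed with literal text.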
  • 22. LOSSY TECHNIQUES PREDICTIVE ENCODING: - Stores only the initial sample and, for each later sample, its difference from the predicted value. A sample may be a pixel, line, audio sample, or video frame.
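A minimal sketch of predictive encoding, assuming the simplest possible predictor ("the next sample equals the previous one"). The function names are hypothetical. As written the sketch is exactly reversible; lossy predictive coders additionally quantize the stored differences, which is not shown here.

```python
def delta_encode(samples):
    """Keep the first sample, then store each sample's difference from its predecessor."""
    if not samples:
        return []
    return [samples[0]] + [cur - prev for prev, cur in zip(samples, samples[1:])]


def delta_decode(deltas):
    """Rebuild the samples by accumulating the stored differences."""
    out = []
    total = 0
    for d in deltas:
        total += d
        out.append(total)
    return out


samples = [100, 102, 103, 103, 101]
deltas = delta_encode(samples)
print(deltas)
print(delta_decode(deltas) == samples)
```

The differences are small numbers clustered around zero, which a later variable-length coding stage can represent in fewer bits than the raw samples.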
  • 23. LOSSY TECHNIQUES TRANSFORM ENCODING:- Data is converted from one domain to another. DCT (Discrete cosine transform) encoding is the best example of this method.
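To illustrate what "converting from one domain to another" means, here is a sketch of an unnormalised one-dimensional DCT-II in plain Python. Real codecs use a scaled two-dimensional version on 8x8 blocks; the function name and the sample values are assumptions for this example.

```python
import math


def dct_ii(samples):
    """Unnormalised 1-D DCT-II: X[k] = sum over n of x[n] * cos(pi/N * (n + 0.5) * k)."""
    n_samples = len(samples)
    return [
        sum(x * math.cos(math.pi / n_samples * (n + 0.5) * k)
            for n, x in enumerate(samples))
        for k in range(n_samples)
    ]


# A smooth ramp: most of the energy ends up in the first few coefficients.
smooth = [10, 11, 12, 13, 14, 15, 16, 17]
print([round(c, 2) for c in dct_ii(smooth)])
```

For smooth input the high-frequency coefficients are small, so they can be quantized coarsely or dropped, which is where the loss in transform coding comes from.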
  • 24. Compression Fundamentals Lossless – ensures that the data recovered from the compression / decompression process is exactly the same as the original data. – Commonly used to compress executable code, text files, and numeric data.
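The "exactly the same as the original" property can be checked directly with Python's standard `zlib` module (a DEFLATE implementation). The sample text is made up for the illustration.

```python
import zlib

# Repetitive text compresses well; the content itself is an arbitrary example.
original = b"lossless compression must return every byte unchanged. " * 20

compressed = zlib.compress(original, 9)   # 9 = highest compression level
restored = zlib.decompress(compressed)

print(len(original), len(compressed), restored == original)
```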
  • 25. Compression Fundamentals  Lossy – does not promise that the data received is exactly the same as the data sent – removes information that it cannot later restore (Hopefully, no one will notice.) – Commonly used to compress digital imagery, including video.
  • 27. Introduction  Digital Audio Compression – Removal of redundant or otherwise irrelevant information from audio signal – Audio compression algorithms are often referred to as “audio encoders”  Applications – Reduces required storage space – Reduces required transmission bandwidth
  • 28. Audio Data Compression  Lossless Audio Compression – Removes redundant data – Resulting signal is same as original – perfect reconstruction  Lossy Audio Encoding – Removes irrelevant data – Resulting signal is similar to original
  • 29. Audio compression  Audio compression is a form of data compression designed to reduce the size of audio data files.  Audio compression can mean two things:  Audio data compression  Audio level compression
  • 30. Audio compression Audio data compression - in which the amount of data in a recorded waveform is reduced for transmission. This is used in MP3 encoding, internet radio, and the like.
  • 31. Audio compression Audio level compression - in which the dynamic range (difference between loud and quiet) of an audio waveform is reduced. This is used in guitar effects racks, recording studios, etc.
  • 33. MPEG Components  MPEG (Moving Picture Experts Group) is a multimedia standard with specifications for coding, compression and transmission of audio, video and data streams.  Video: describes compression of frames  Audio: describes compression of audio frames
  • 34. Audio compression • MPEG audio  MPEG audio is a standard for compression and decompression of digital audio.  The coding technique used in the MPEG audio standard (known as perceptual coding) takes advantage of the perceptual weaknesses of the human ear (psychoacoustic phenomena).  In perceptual coding, the audio spectrum is divided into a set of narrow frequency bands, to reflect the frequency selectivity of human hearing.
  • 35. BASIC STEPS OF MPEG AUDIO COMPRESSION  MPEG AUDIO ENCODING: INPUT AUDIO SIGNAL → TIME TO FREQUENCY MAPPING FILTER BANK → BIT/NOISE ALLOCATION, QUANTIZER, AND CODING → BIT-STREAM FORMATTING → ENCODED BIT-STREAM. THE PSYCHOACOUSTIC MODEL ANALYSES THE INPUT SIGNAL AND DRIVES THE BIT/NOISE ALLOCATION.
  • 36. BASIC STEPS OF MPEG AUDIO COMPRESSION  MPEG AUDIO DECODING: ENCODED BIT-STREAM → BIT-STREAM UNPACKING → FREQUENCY SAMPLE RECONSTRUCTION → FREQUENCY TO TIME MAPPING → DECODED AUDIO SIGNAL
  • 37. VIDEO COMPRESSION • Mpeg video  MPEG video is a subset of the MPEG standard.  Digital video compression may either apply intraframe compression to each individual frame of the video or combine both intraframe and interframe compression.
  • 38. VIDEO COMPRESSION MPEG uses both intra-frame and inter-frame techniques for data compression.  MPEG compression is lossy and asymmetric, with the encoding process requiring more computation than the decoding process.
  • 39. BASIC STEPS OF MPEG VIDEO COMPRESSION  VIDEO DATA TO BE COMPRESSED → PREPROCESSING AND COLOR SUBSAMPLING OF INDIVIDUAL FRAMES → INTERFRAME MOTION COMPENSATION FOR P-FRAME AND B-FRAME → DIVIDE EACH FRAME INTO 8X8 PIXEL BLOCKS → APPLY DCT TRANSFORMATION TO EACH 8X8 PIXEL BLOCK → PERFORM QUANTIZATION OF DCT COEFFICIENTS USING A Q-TABLE → ORDER THE 2-D OUTPUT IN ZIGZAG SEQUENCE → APPLY RUN-LENGTH ENCODING TO THE ZIGZAG SEQUENCE → APPLY VARIABLE LENGTH ENCODING TO THE RESULTING STREAM → MPEG COMPRESSED VIDEO STREAM
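The zigzag ordering step can be sketched as below. The traversal follows the conventional pattern used for 8x8 DCT blocks (start at the top-left coefficient and walk the anti-diagonals in alternating directions); the function names are hypothetical.

```python
def zigzag_order(size=8):
    """Return (row, col) pairs of a size x size block in zigzag order."""
    coords = [(r, c) for r in range(size) for c in range(size)]
    # Cells on one anti-diagonal share row + col; odd diagonals run top to
    # bottom (row ascending), even diagonals run bottom to top.
    return sorted(
        coords,
        key=lambda rc: (rc[0] + rc[1], rc[0] if (rc[0] + rc[1]) % 2 else -rc[0]),
    )


def zigzag_scan(block):
    """Flatten a square 2-D block into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]


print(zigzag_order(8)[:6])
```

Because quantization tends to zero out the high-frequency coefficients in the bottom-right of a block, this ordering groups those zeros into long runs at the end of the sequence, which is what makes the following run-length step effective.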
  • 40. Three Types of Frames  Intra frames (same as JPEG) – typically about 12 frames between I frames  Predictive frames – encode from previous I or P reference frame  Bi-directional frames – encode from previous and future I or P frames I B B P B B P B B P B B I
  • 41. Lossless compression Loss-less compressions reduce file size by encoding image information more efficiently. Images compressed using loss-less algorithms are able to be restored to their original condition.
  • 42. Lossy compression Lossy compressions reduce file size by considerably greater amounts than loss-less compressions but lose both information and quality. At high compression, the image will become visibly degraded.
  • 43. Standard JPEG Joint Photographic Experts Group  Jpeg is the standard compression techniques for still images Lossy compression Best suited for photography
  • 44. Standard  JPEG It supports four modes of encoding – Sequential • The image is encoded in the order in which it is scanned. – Progressive • The image is encoded in multiple passes.
  • 45. JPEG (contd.) – Hierarchical • The image is encoded at multiple resolutions to accommodate different types of displays. – Lossless • The image is encoded in such a way that the original quality of the image can be fully restored
  • 47. MPEG Standard  MPEG-1  MPEG-2  MPEG-3  MPEG-4  MPEG-7  MPEG-21
  • 48. MPEG Standard MPEG-1: Initial video and audio compression standard. Later used as the standard for Video CD; it also includes the MP3 (MPEG-1 Audio Layer III) audio compression format.
  • 49. MPEG MPEG-2: Video and audio standards for broadcast-quality television. Used for digital satellite TV services like DirecTV, digital cable television signals, and (with slight modifications) for DVD video discs. MPEG-3: Originally designed for HDTV, but abandoned in favor of MPEG-2.
  • 50. MPEG Standard MPEG-4: Expands MPEG-1 to support video/audio "objects", 3D content, low bitrate encoding and support for Digital Rights Management. MPEG-7: A formal system for describing multimedia content. MPEG-21: MPEG describes this future standard as a Multimedia Framework.
  • 51. JPEG Standard  JPEG : – The core of the image compression is the Discrete Cosine Transform (DCT). – Removes the redundant information (the "invisible" parts).  JPEG-2000: – Successor to JPEG. – The blockiness of JPEG is removed. – The compression ratio for JPEG 2000 is higher than for JPEG
  • 52. Codecs  Compression/Decompression Scheme  Hardware or software based  Many use both spatial and temporal compression techniques
  • 53. Why do we do data compression? Data compression is done primarily to save space on the hard disk, so that the same disk can hold more data.
  • 54. What is the use of data compression on network? The most prominent use of data compression on a network is to free up space on the server so that more files can be stored on it.
  • 55. Compression Even though disks have gotten bigger, we are still running short on disk space. A common technique is to compress files so that they take up less space on the disk.
  • 56. COMPUTER TOOLS AND UTILITIES
  • 57. Compression Utilities  Zip files are used for rapidly distributing and storing files.  Zip files are compressed to save space.  WinZip - a popular compression utility for Windows.  WinRAR
  • 59. Applications  Lossless data compression is often used to make better use of disk space on office computers, or of the connection bandwidth in a computer network.
  • 60. Applications In other kinds of data such as sounds and pictures, a small loss of quality can be tolerated without losing the essential nature of the data, so lossy data compression methods can be used.
  • 61. Lossless vs. Lossy Compression NOTE: Business data requires lossless compression, while audio and video applications can tolerate some loss, which may not be very noticeable.
  • 62. ADVANTAGES  Data compression is done primarily to save space on the hard disk, so that the same disk can hold more data.  The most prominent use of data compression on a network is to free up space on the server so that more files can be stored on it.
  • 63. Lossy vs. Lossless Compression  Lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application.  Lossily compressed still images are often compressed to 1/10th their original size, as with audio, but the quality loss is more noticeable, especially on closer inspection.
  • 64. DATA COMPRESSION  NEEDED AS MOST OF THE REAL WORLD DATA IS REDUNDANT  IMPORTANCE?  SAVES DISK SPACE  SAVES CONNECTION BANDWIDTH  REDUCES PROCESSING TIME  REDUCES COMMUNICATION TIME  ENABLES FAST STORAGE AND RETRIEVAL
  • 65. DATA COMPRESSION  TYPES: REVERSIBLE (LOSSLESS) AND IRREVERSIBLE (LOSSY)  LOSSY – USED WHEN EFFICIENCY OF TRANSMISSION IS MORE IMPORTANT THAN ACCURACY OF INFORMATION.
  • 66. INFORMATION THEORY  IT IS A BRANCH OF MATHEMATICS THAT DEALS WITH DATA/INFORMATION REPRESENTATION  DATA COMPRESSION IS ONE OF THE APPLICATIONS OF INFORMATION THEORY
  • 67. SHANNON’S PRINCIPLE FOR INFORMATION  FOR DATA COMPRESSION, IT IS ESSENTIAL TO MEASURE THE INFORMATION CONTENT OF THE DATA, I.E. THE DEGREE OF RANDOMNESS/UNCERTAINTY  HIGH PROBABILITY EVENTS CARRY LESS SELF-INFORMATION WHEREAS LOW PROBABILITY EVENTS CARRY MUCH MORE SELF-INFORMATION
  • 68. SHANNON’S PRINCIPLE FOR INFORMATION  IT WAS GIVEN BY CLAUDE SHANNON  ACCORDING TO HIM, SELF- INFORMATION IS ASSOCIATED WITH EVERY POSSIBLE OUTCOME OF AN EVENT.
  • 69. SHANNON’S PRINCIPLE FOR INFORMATION  LET P(A) & P(B) BE THE PROB OF OCCURRENCE OF EVENTS A & B RESPECTIVELY.  ACCORDING TO SHANNON, SELF-INFO ASSOCIATED WITH EVENT A MAY BE DEFINED AS  $S_i(A) = -\log_m P(A) = \log_m[1/P(A)]$  SIMILARLY, $S_i(B) = \log_m[1/P(B)]$  WHERE m DEFINES THE UNIT OF INFO
  • 70. SHANNON’S PRINCIPLE FOR INFORMATION  PROB OF EVENT VIS-À-VIS SELF INFO (m = 2, SO Si IS IN BITS)
        P(A)    Si(A)
        1       0
        0.5     1.0
        0.25    2.0
        0.10    3.32
        0.05    4.32
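The table values can be reproduced with a short function. This is a sketch; the function name and the default base of 2 (information measured in bits) are choices made for the example.

```python
import math


def self_information(p, base=2):
    """Self-information log_base(1/p) of an event with probability p."""
    if not 0 < p <= 1:
        raise ValueError("probability must be in (0, 1]")
    return math.log(1 / p, base)


for p in (1, 0.5, 0.25, 0.10, 0.05):
    print(p, round(self_information(p), 2))
```

Changing `base` changes only the unit: base 2 gives bits, base e gives nats, base 10 gives decimal digits.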
  • 71. SHANNON’S PRINCIPLE FOR INFORMATION  CONCEPT OF Si MAY ALSO BE USED TO MAKE INFERENCES BY ASSOCIATING IT WITH 2 INDEPENDENT EVENTS  LET A & B BE 2 INDEPENDENT EVENTS, THEN  $P(AB) = P(A) \cdot P(B)$  $S_i(AB) = -\log_2 P(AB) = [-\log_2 P(A)] + [-\log_2 P(B)] = S_i(A) + S_i(B)$
  • 72. ENTROPY OF INFORMATION  ENTROPY IS A CONCEPT OF THERMODYNAMICS  IN INFO THEORY, IT IS USED TO FIND OUT THE RANDOMNESS/UNCERTAINTY IN A MESSAGE
  • 73. ENTROPY OF INFORMATION  THE AVERAGE INFO CONTENT OF A MESSAGE IS CALLED ITS ENTROPY  THE LESS LIKELY A MESSAGE IS TO OCCUR, THE LARGER ITS INFO CONTENT  ENTROPY IS AN IMPORTANT CONCEPT OF DATA COMPRESSION
  • 74. ENTROPY OF INFORMATION  THE ENTROPY OF AN ELEMENT (Ee) IS THE MINIMUM NO. OF BITS NEEDED TO ENCODE THAT ELEMENT  THE ENTROPY OF AN ENTIRE MESSAGE (Em) IS THE MIN NO. OF BITS NEEDED TO ENCODE THE ENTIRE MESSAGE WITH A LOSSLESS COMPRESSION.
  • 75. ENTROPY OF INFORMATION  THE ENTROPY OF A MESSAGE CAN BE USED TO DETERMINE IF THE DATA COMPRESSION IS WORTH ATTEMPTING.  IT CAN ALSO BE USED TO EVALUATE THE EFFECTIVENESS OF COMPRESSION.
  • 76. ENTROPY OF INFORMATION  THE NO. OF BITS IN A COMPRESSED CODE CAN BE COMPARED TO THE ENTROPY FOR THAT MESSAGE Em REVEALING HOW CLOSE TO OPTIMAL COMPRESSION ONE’S CODE IS.
  • 77. ENTROPY OF INFORMATION  SHANNON PROPOSED THE FOLLOWING ENTROPY FN FOR A MESSAGE: $E_m = -\sum_{i=1}^{N} P_i \log_2 P_i$ ---- (1) WHERE N = NO. OF POSSIBLE CHAR TYPES USED IN THE MESSAGE AND $P_i$ DENOTES THE PROB OF THE ith CHAR. E.g. FOR “AABCCD”, N=4
  • 78. ENTROPY OF INFORMATION  THE ENTROPY OF A CHAR IS GIVEN BY ITS SELF INFO, i.e. THE ENTROPY OF A CHAR A IS GIVEN BY $E_e = -\log_2 P(A)$  THE ENTROPY OF A MESSAGE CONTAINING N CHARS CAN ALSO BE FOUND IN TERMS OF THE AV SELF INFO OF ALL N CHARS, i.e. $E_m = \frac{1}{N}\sum_{i=1}^{N} S_i$ ------- (2) WHERE $S_i$ IS THE SELF INFO (ENTROPY) OF THE ith CHAR
  • 79. ENTROPY OF INFORMATION  NOTICE THE DIFFERENCE BETWEEN N IN THE TWO EQUATIONS  IN 1ST, N IS THE NO OF DISTINCT CHARS USED IN THE MESSAGE AND IN 2ND N = TOTAL NO OF CHARS USED IN THE MESSAGE
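Both entropy formulas can be written out side by side to see that they agree and that N means something different in each. The sketch below assumes character probabilities are estimated from the message itself; the function names are hypothetical.

```python
import math
from collections import Counter


def entropy_from_distribution(message):
    """Equation (1): sum over the N distinct characters of -P_i * log2(P_i)."""
    total = len(message)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(message).values())


def average_self_information(message):
    """Equation (2): average the self-information over all N character positions."""
    total = len(message)
    counts = Counter(message)
    return sum(-math.log2(counts[ch] / total) for ch in message) / total


message = "AABCCD"   # 4 distinct characters, 6 characters in total
print(round(entropy_from_distribution(message), 3))
print(round(average_self_information(message), 3))
```

The two results match because a character that occurs n times contributes its self-information n times to the average in equation (2), which is the same as weighting it by its probability in equation (1).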
  • 80. ENTROPY OF INFORMATION  SO, ENTROPY OF A MESSAGE GIVES THE AVERAGE NO OF BITS REQUIRED TO REPRESENT A CHARACTER IN THE MESSAGE  QUES: FOR THE MESSAGE “dadbadcadbaadac” CALCULATE Si ASSOCIATED WITH CHARS A & B, ENTROPY OF CHARS C & D, AV SELF INFO IN THE MESSAGE, ENTROPY OF THE MESSAGE?
  • 81. ENTROPY OF INFORMATION  N=15
        CHAR   NO OF CHARS   PROB OF CHAR   Si
        d      5             5/15           1.58
        a      6             6/15           1.32
        b      2             2/15           2.91
        c      2             2/15           2.91
      AV SELF INFO OF MESSAGE = [1/N]*Σ ENTROPY OF ith CHAR = [1/15]*[E(1) + E(2) + E(3) + … + E(15)] = [1/15]*[E(d)+E(a)+E(d)+E(b)+…+E(c)] = [1/15]*[1.58+1.32+1.58+2.91+…+2.91] ≈ [1/15]*27.48 = 1.83
      ENTROPY OF MESSAGE = $-\sum_{i=1}^{4} P_i \log_2 P_i$ = (5/15)*1.58 + (6/15)*1.32 + (2/15)*2.91 + (2/15)*2.91 = 1.83
  • 82. ENTROPY OF INFORMATION  NOTE THAT THE AV SELF INFORMATION OF THE MESSAGE AND THE ENTROPY OF THE MESSAGE BOTH ARE SAME AND BOTH THE FUNCTIONS GIVE THE AVERAGE NO OF BITS REQUIRED TO REPRESENT A CHARACTER IN THE MESSAGE
  • 83. ENTROPY OF INFORMATION  QUES2: CALCULATE THE AV NO. OF BITS REQUIRED TO REPRESENT A CHAR IN THE MESSAGE STRING “AAAAAABBCC”
  • 84. ENTROPY OF INFORMATION
        CHAR   NO OF CHARS   PROB OF CHAR
        A      6             0.6
        B      2             0.2
        C      2             0.2
      ENTROPY OF MESSAGE = $-\sum_{i=1}^{N} P_i \log_2 P_i$, HERE N=3. ENTROPY OF THE MESSAGE = 0.6*log2(1/0.6) + 0.2*log2(1/0.2) + 0.2*log2(1/0.2) = 0.6*0.737 + 0.2*2.322 + 0.2*2.322 = 0.442 + 0.464 + 0.464 = 1.37 = AV NO OF BITS REQUIRED TO REPRESENT A CHARACTER
  • 85. CODES  A CODE IS ANY MAPPING FROM AN INPUT ALPHABET TO AN OUTPUT ALPHABET  A CODE CAN BE, SAY, {a,b,c} = {0,1,00}, BUT THIS CODE IS NOT UNIQUELY DECODABLE.  IF THE DECODER GETS A CODE MESSAGE OF 2 ZEROS, THERE IS NO WAY IT CAN KNOW WHETHER THE ORIGINAL MESSAGE HAD TWO a’S OR ONE c
  • 86. CODES  A CODE IS INSTANTANEOUS IF EACH CODEWORD IN A MESSAGE CAN BE DECODED AS SOON AS IT IS RECEIVED.  THE BINARY CODE {a,b} = {0,01} IS UNIQUELY DECODABLE, BUT IT IS NOT INSTANTANEOUS. ONE HAS TO SEE IF THE NEXT BIT IS 1. IF IT IS, b IS DECODED; IF NOT a IS DECODED.  THE BINARY CODE {a,b,c}={0,10,11} IS AN INSTANTANEOUS CODE
  • 87. CODES  A CODE IS A PREFIX CODE IFF NO CODEWORD IS A PREFIX OF ANOTHER CODEWORD.  A CODE IS INSTANTANEOUS IFF IT IS A PREFIX CODE, SO A PREFIX CODE IS ALWAYS A UNIQUELY DECODABLE INSTANTANEOUS CODE.  ANY UNIQUELY DECODABLE CODE CAN BE REPLACED BY A PREFIX CODE WITH THE SAME CODEWORD LENGTHS.
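The prefix property from the last three slides is easy to test mechanically, and a prefix code can be decoded bit by bit without lookahead. The following is a sketch with hypothetical function names, using the three example codes from the slides.

```python
def is_prefix_code(codewords):
    """True iff no codeword is a prefix of another codeword."""
    words = sorted(codewords)
    # After sorting, any codeword that is a prefix of another one is
    # immediately followed by a word that starts with it.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def decode_prefix(bits, code):
    """Decode instantaneously: emit a symbol as soon as a codeword is complete."""
    inverse = {word: symbol for symbol, word in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a codeword")
    return "".join(out)


print(is_prefix_code(["0", "1", "00"]))   # code {a,b,c} = {0,1,00}
print(is_prefix_code(["0", "01"]))        # code {a,b} = {0,01}
print(is_prefix_code(["0", "10", "11"]))  # code {a,b,c} = {0,10,11}
print(decode_prefix("010110", {"a": "0", "b": "10", "c": "11"}))
```

Note that failing the prefix test does not by itself make a code ambiguous: {0,01} fails it yet is still uniquely decodable, only not instantaneously.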
  • 88. TYPES OF CODING  THERE ARE MANY ALGORITHMS FOR CODING THE CHARACTERS BUT CAN BE BROADLY DIVIDED INTO 2 TYPES:  STATIC (FIXED SIZE) CODING  DYNAMIC (VARIABLE SIZE) CODING
  • 89. STATIC CODING SCHEME  IF THE MESSAGE IS COMPOSED OF A COMBINATION OF M DISTINCT CHARS, THEN THE NO. OF BITS REQUIRED IN THE CODE IS $N = \lceil \log_b M \rceil$, WHERE N = MINIMUM NO. OF BITS REQUIRED TO REPRESENT M DISTINCT CHARS AND b = BASE OF THE NUMBER SYSTEM
  • 90. STATIC CODING SCHEME  THE MAIN DISADVANTAGE IS THAT IT DOES NOT CONSIDER THE FREQUENCY OR PROB OF OCCURRENCE OF A PARTICULAR CHAR IN THE MESSAGE
  • 91. STATIC CODING SCHEME QUES 3: CONSIDER THE MESSAGE “RAMRAHIM” FIND THE NO OF DISTINCT CHARS, THE MIN NO OF BITS REQUIRED TO REPRESENT A CHAR, GENERATE THE CODE FOR ALL DISTINCT CHARS, BY USING THESE CODES WHAT SHALL BE THE CODED MESSAGE FOR THE MESSAGE “MIHIR”, HOW MUCH IS THE SAVING BY USING THE CODING SCHEME OVER ASCII REPRESENTATION
  • 92. STATIC CODING SCHEME  NO OF DISTINCT CHARS = 5  $N = \lceil \log_2 M \rceil = \lceil \log_2 5 \rceil = \lceil 2.32 \rceil = 3$; SO A 3 BIT CODE IS NEEDED TO REPRESENT EACH SYMBOL  000=R, 001=A, 010=M, 011=H, 100=I; THE REST ARE UNUSED  BY USING THE CODES AS ABOVE, THE CODED MESSAGE FOR “MIHIR” SHALL BE “010100011100000”  EACH CHARACTER OF THE STRING IS REPRESENTED BY 3 BITS AND THERE ARE 5 CHARACTERS IN THE MESSAGE, SO THE NO OF BITS REQUIRED = 5*3 = 15; ASCII NEEDS 5*8 = 40 BITS, THEREFORE SAVING = 40 – 15 = 25 BITS
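The worked answer can be reproduced with a small sketch. It assumes codes are handed out in order of first appearance in “RAMRAHIM” (which matches the assignment on the slide) and that ASCII is counted as 8 bits per character; the function name is hypothetical.

```python
def fixed_length_code(message):
    """Assign every distinct character a binary code of the same minimal width."""
    symbols = list(dict.fromkeys(message))            # distinct chars, first-seen order
    width = max(1, (len(symbols) - 1).bit_length())   # ceil(log2(M)) for M >= 2
    return {s: format(i, f"0{width}b") for i, s in enumerate(symbols)}


code = fixed_length_code("RAMRAHIM")
encoded = "".join(code[ch] for ch in "MIHIR")
saving = 8 * len("MIHIR") - len(encoded)

print(code)
print(encoded, saving)
```

Using `bit_length` avoids floating-point rounding issues that a direct `math.ceil(math.log2(M))` could run into for large M.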
  • 93. DYNAMIC CODING SCHEME  COMPUTERS ENCODE CHARS IN ASCII CODE. SO, A FILE HAVING 100 CHARS SHALL REQUIRE 800 BITS  BUT IN ANY TEXT FILE, SOME CHARS OCCUR MORE FREQUENTLY THAN OTHERS  SO, IT IS BETTER THAT SHORTER BIT CODES ARE ASSIGNED TO THE FREQUENTLY OCCURRING CHARS THAN TO OTHERS.  THIS WAS ALSO REALIZED LONG AGO BY SAMUEL MORSE.  THIS CONCEPT IS USED IN DYNAMIC CODING.
  • 94. DYNAMIC CODING SCHEME  IT USES VARIABLE SIZE CODE  MINIMUM NO OF BITS ARE ASSIGNED TO THE MOST FREQUENTLY OCCURRING CHARACTER AND MAXIMUM NO OF BITS TO THOSE WHICH ARE LEAST FREQUENTLY USED.  ANY STATISTICAL MODEL MAY BE USED TO CALCULATE THE FREQUENCY OF OCCURRENCE OF CHARACTERS.
  • 95. DYNAMIC CODING SCHEME QUES 4: CONSIDER THE MESSAGE “RAAMRAHMMM”. FIND OUT THE DISTINCT CHARACTERS AND THEIR FREQUENCY, GENERATE CODES FOR ALL CHARACTERS USING DYNAMIC CODING, USING THE GENERATED CODES WRITE THE CODE FOR “MAHR”, AND FIND HOW MUCH IS THE SAVING IN BITS. 1. 4 DISTINCT CHARS; R-2, A-3, M-4 AND H-1 2. M-1, A-01, R-001, H-0001 3. “MAHR” = 1010001001 = 10 BITS 4. SAVINGS OVER ASCII = 32 – 10 = 22 BITS
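A minimal Python sketch of the scheme used in this answer, assuming (as the slide's codes suggest) that the k-th most frequent character gets k−1 zeros followed by a one. The name `rank_codes` is hypothetical and chosen for illustration.

```python
from collections import Counter

def rank_codes(message):
    """Shorter codes for more frequent characters: 1, 01, 001, 0001, ..."""
    freq = Counter(message)
    ranked = sorted(freq, key=lambda ch: -freq[ch])   # most frequent first
    codes = {ch: "0" * i + "1" for i, ch in enumerate(ranked)}
    return codes, freq

codes, freq = rank_codes("RAAMRAHMMM")
encoded = "".join(codes[ch] for ch in "MAHR")
saving = 8 * len("MAHR") - len(encoded)               # against 8-bit ASCII
print(dict(freq), codes, encoded, saving)
```

Every code ends in its only 1 bit, so no code is a prefix of another and the bit stream can be split without separators.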
  • 96. USE OF ENTROPY IN CODING  THE ENTROPY FN IS USED TO DEVELOP AN EFFICIENT CODE FOR THE PURPOSE OF COMMUNICATION.  ONE CAN USE ENTROPY TO FIND OUT THE SCOPE OF FURTHER REFINEMENT IN THE CODING SCHEME AS THE ENTROPY OF THE MESSAGE RESULTS IN AVERAGE NO OF BITS REQUIRED TO REPRESENT A CHARACTER.
  • 97. USE OF ENTROPY IN CODING QUES 5: CONSIDER A MESSAGE STREAM CONSISTING OF CHARS A,B,C,D. LET THE PROB OF OCCURRENCE OF THE CHARS BE 0.6, 0.3, 0.08 AND 0.02 RESPECTIVELY. THE INFORMATION CONTENT Si = -log2(Pi) OF THE CHARS IS RESPECTIVELY 0.73, 1.73, 3.64 AND 5.64 BITS. B. FIND MIN NO OF BITS REQ TO REPRESENT A CHAR USING STATIC CODING, IF A MESSAGE CONSISTS OF ALL THE 4 CHARS C. GENERATE CODE FOR THE CHARS USING DYNAMIC SCHEME. WHAT IS THE AV NO OF BITS REQ TO REPRESENT A CHAR IN THIS CODING SCHEME, IF A MESSAGE CONTAINS 100 CHARS D. IS THERE ANY POSSIBILITY OF FURTHER REFINEMENT IN THE CODING SCHEME?
  • 98. USE OF ENTROPY IN CODING  M=4; MIN NO OF BITS REQ TO REP A CHAR = N = log2 4 = 2  BY LOOKING INTO THE TABLE, THE FOLLOWING CODES CAN BE GENERATED USING DYNAMIC SCHEME: CHAR / PROB / CODE: A / 0.70 / 1; B / 0.15 / 01; C / 0.10 / 001; D / 0.05 / 0001. AV NO OF BITS REQ PER CHAR TO COMM A MESSAGE OF 100 CHARS = [(70*1)+(15*2)+(10*3)+(5*4)]/100 = 150/100 = 1.5
  • 99. USE OF ENTROPY IN CODING 3. DYNAMIC CODING IS MORE EFFICIENT (1.5 BITS PER CHAR AGAINST 2 FOR STATIC CODING) 4. ENTROPY = - Σ Pi*log2(Pi) = - (0.7*log2(0.7) + 0.15*log2(0.15) + 0.1*log2(0.1) + 0.05*log2(0.05)) ≈ 1.32. SO, THE MINIMUM AV NO OF BITS REQ TO REPRESENT A CHAR IN THE MESSAGE ≈ 1.32. THERE IS A DIFFERENCE BETWEEN THE ENTROPY VALUE AND THE NO OF BITS REQUIRED BY BOTH THE METHODS, THEREFORE FURTHER REFINEMENT IS POSSIBLE IN THE CODING SCHEMES.
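The comparison between average code length and entropy can be reproduced with a few lines of Python. This is only a sketch: the function names are illustrative, and the probabilities and codes are the ones from the solution table (0.70, 0.15, 0.10, 0.05).

```python
import math

def entropy(probs):
    """Shannon entropy in bits per symbol: -sum(p * log2 p)."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def average_length(probs, codes):
    """Expected number of code bits per symbol."""
    return sum(p * len(code) for p, code in zip(probs, codes))

probs = [0.70, 0.15, 0.10, 0.05]
codes = ["1", "01", "001", "0001"]

H = entropy(probs)                 # about 1.32 bits per char
L = average_length(probs, codes)   # 1.5 bits per char
print(round(H, 3), round(L, 3), round(L - H, 3))
```

The gap `L - H` is the redundancy of the code; a value above zero is what the slide means by room for further refinement.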
  • 100. LOSSLESS DATA COMPRESSION  ALL ALGORITHMS ATTEMPT TO RE-ENCODE DATA TO REMOVE REDUNDANCY  IT IMPLIES THAT DATA WITH NO REDUNDANCY CAN NOT BE COMPRESSED BY THESE TECHNIQUES WITHOUT SOME LOSS OF INFORMATION
  • 101. SHANNON FANO ALGORITHM  IT USES THE IDEA OF USING SHORTER CODES FOR MORE FREQUENTLY OCCURRING CHARACTERS  GIVEN BY CLAUDE SHANNON & R.M.FANO
  • 102. SHANNON FANO ALGORITHM  ADV?  CONSIDER A FILE HAVING 40 LETTERS WITH THE GIVEN FREQUENCY- A:14; B:7; C:10; D:5; E:4  ASCII – 40*8=320 BITS. DECODING SIMPLY CONSISTS OF BREAKING THE STREAM INTO 8 BIT BYTES AND CONVERTING EACH BYTE INTO A CHARACTER. SO, IT NEEDS NO ADDITIONAL INFO.
  • 103. SHANNON FANO ALGORITHM  VARIABLE LENGTH ENCODING SCHEMES SUCH AS HUFFMAN AND SHANNON-FANO HAVE THE FOLLOWING PROPERTIES:  CODES FOR MORE FREQUENT CHARS ARE SHORTER THAN ONES FOR LESS PROBABLE CHARS
  • 104. SHANNON FANO ALGORITHM  EACH CODE CAN BE UNIQUELY DECODED. THIS IS ENSURED BY THE PREFIX PROPERTY, i.e., NO CHAR'S ENCODING IS A PREFIX OF ANY OTHER.  TO SEE WHY THIS PROPERTY IS IMPORTANT, CONSIDER “A” ENCODED AS 0; “B” AS 01; “C” AS 10. IF THE DECODER ENCOUNTERS THE BIT-STREAM “0010”, IS IT “ABA” OR “AAC”?
  • 105. SHANNON FANO ALGORITHM  WITH THE PREFIX GUARANTEE, THERE IS NO AMBIGUITY IN DETERMINING WHERE THE CHAR BOUNDARIES ARE.  ONE STARTS READING FROM THE BEGINNING AND GATHERS BITS IN A SEQUENCE UNTIL ONE FINDS A MATCH.  THAT INDICATES THE END OF A CHAR AND ONE MOVES ALONG TO THE NEXT CHAR.
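The prefix test and the "gather bits until a match" decoder described on these two slides can be sketched in Python as follows. The function names are illustrative, and the second code table is the Shannon-Fano code for the A/B/C/D/E example used in this deck.

```python
def is_prefix_code(codes):
    """True if no code word is a prefix of a different code word."""
    words = list(codes.values())
    return not any(
        i != j and b.startswith(a)
        for i, a in enumerate(words)
        for j, b in enumerate(words)
    )

def decode(bits, codes):
    """Read bits left to right, emitting a char as soon as a code word matches."""
    inverse = {code: ch for ch, code in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:          # end of one character found
            out.append(inverse[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a code word")
    return "".join(out)

ambiguous = {"A": "0", "B": "01", "C": "10"}
prefix_free = {"A": "00", "B": "10", "C": "01", "D": "110", "E": "111"}
print(is_prefix_code(ambiguous), is_prefix_code(prefix_free))
print(decode("0001110", prefix_free))
```

The greedy decoder is only safe for prefix codes; applied to the ambiguous table it would silently pick one of the possible readings.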
  • 106. SHANNON FANO ALGORITHM 1. FIND THE FREQ OF OCCURRENCE OF EACH SYMBOL 2. SORT THE SYMBOLS IN DESCENDING ORDER OF FREQ 3. DIVIDE THE LIST INTO 2 PARTS, WITH THE TOTAL FREQ COUNT OF THE UPPER HALF BEING AS CLOSE TO THAT OF THE BOTTOM HALF AS POSSIBLE
  • 107. SHANNON FANO ALGORITHM 4. REPEAT STEP 3 ON EACH HALF UNTIL EACH HALF CONTAINS JUST ONE SYMBOL 5. CONSTRUCT THE BINARY TREE (SF TREE) SO THAT THE UPPER HALF BECOMES THE LEFT SUB-TREE AND THE LOWER HALF BECOMES THE RIGHT SUB-TREE. EACH LEFT BRANCH IS ASSIGNED 0 AND EACH RIGHT BRANCH 1
  • 108. SHANNON FANO ALGORITHM 6. TO OBTAIN THE CODE FOR ANY SYMBOL, COMBINE ALL THE DIGITS ON THE PATH FROM THE ROOT TO THAT LEAF (SYMBOL)  QUES: APPLY SF ALGO TO A TEXT FILE HAVING 40 CHARS WITH THE GIVEN FREQ: A-14, B-7, C-10, D-5, E-4
  • 109. SHANNON FANO ALGORITHM SYMBOL (FREQUENCY): A (14), B (7), C (10), D (5), E (4)
  • 110. SHANNON FANO ALGORITHM SORT IT IN DESCENDING ORDER. SYMBOL (FREQUENCY): A (14), C (10), B (7), D (5), E (4)
  • 111. SHANNON FANO ALGORITHM DIVIDING INTO PARTS, FIRST ITERATION: [A (14), C (10)] | [B (7), D (5), E (4)]
  • 112. SHANNON FANO ALGORITHM SECOND ITERATION: [A (14)] | [C (10)] | [B (7), D (5), E (4)]
  • 113. SHANNON FANO ALGORITHM THIRD ITERATION: [A (14)] | [C (10)] | [B (7)] | [D (5), E (4)]. AFTER THE FOURTH ITERATION, WE WILL HAVE THE FOURTH DIVISION, [D (5)] | [E (4)], AND EVERY PART WILL THEN HAVE ONLY ONE SYMBOL.
  • 114. SHANNON FANO ALGORITHM SHANNON FANO TREE: ROOT (40) → 0: NODE (24), 1: NODE (16); NODE (24) → 0: A (14), 1: C (10); NODE (16) → 0: B (7), 1: NODE (9); NODE (9) → 0: D (5), 1: E (4)
  • 115. SHANNON FANO ALGORITHM OBTAINING THE CODE FROM THE TREE. SYMBOL / CODE / NO OF BITS / FREQUENCY: A / 00 / 2 / 14; B / 10 / 2 / 7; C / 01 / 2 / 10; D / 110 / 3 / 5; E / 111 / 3 / 4. TOTAL NO OF BITS NEEDED FOR TEXT = (14*2)+(7*2)+(10*2)+(5*3)+(4*3) = 89. SO, AV NO OF BITS USED BY ANY SYMBOL = 89/40 = 2.225, WHICH IS FAR LESS THAN THE 8 BITS PER SYMBOL NEEDED IN ASCII
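The six steps above can be sketched in Python as a recursive split of the frequency-sorted list. This is an illustrative implementation under one assumption not spelled out in the slides: when looking for the split point, it picks the first cut that minimises the difference between the two halves. The function name `shannon_fano` is hypothetical.

```python
def shannon_fano(freqs):
    """Return {symbol: code} by repeatedly halving a frequency-sorted list."""
    items = sorted(freqs.items(), key=lambda kv: -kv[1])   # descending frequency
    codes = {}

    def split(group, prefix):
        if len(group) == 1:                     # one symbol left: it is a leaf
            codes[group[0][0]] = prefix or "0"
            return
        total = sum(f for _, f in group)
        running, best_cut, best_diff = 0, 1, None
        for i in range(1, len(group)):          # try every cut position
            running += group[i - 1][1]
            diff = abs(total - 2 * running)     # |upper half - lower half|
            if best_diff is None or diff < best_diff:
                best_cut, best_diff = i, diff
        split(group[:best_cut], prefix + "0")   # upper half -> left branch (0)
        split(group[best_cut:], prefix + "1")   # lower half -> right branch (1)

    split(items, "")
    return codes

freqs = {"A": 14, "B": 7, "C": 10, "D": 5, "E": 4}
codes = shannon_fano(freqs)
total_bits = sum(freqs[s] * len(codes[s]) for s in freqs)
print(codes, total_bits, total_bits / 40)
```

For this input the sketch reproduces the code table and the 89-bit total from the slide.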
  • 116. HUFFMAN ALGORITHM  GIVEN BY DAVID HUFFMAN  IMPROVEMENT OVER S-F ALGO.  LOSSLESS COMP ALGO, IDEAL FOR COMPRESSING TEXT OR PROGRAM FILES  A HUFFMAN CODE TABLE IS GUARANTEED TO PRODUCE THE LOWEST POSSIBLE OUTPUT BIT COUNT FOR THE INPUT STREAM OF SYMBOLS, WHEN EACH SYMBOL IS CODED SEPARATELY WITH A WHOLE NUMBER OF BITS
  • 117. HUFFMAN ALGORITHM  HUFFMAN CALLED THESE “MINIMUM REDUNDANCY CODES”  IT BELONGS TO THE FAMILY OF ALGOS WITH A VARIABLE CODE WORD LENGTH.  USED IN PKZIP, LHA, GZ, ZOO AND ARJ, JPEG AND MPEG
  • 118. HUFFMAN ALGORITHM  MAIN DIFFERENCE:  S-F ALGO BUILDS THE BINARY TREE FROM TOP TO BOTTOM, WHEREAS HUFFMAN’S ALGO FORMS THE BINARY TREE FROM BOTTOM TO TOP  THE PERFORMANCE OF THE TWO IS QUITE SIMILAR
  • 119. HUFFMAN ALGORITHM 1. COUNT THE NO OF CHARS AND THE FREQ OF OCCURRENCE OF EACH CHARACTER 2. ARRANGE THEM IN THE DESCENDING ORDER OF FREQ. 3. CONSTRUCT THE HUFFMAN TREE FOR THE GENERATION OF CODES
  • 120. HUFFMAN ALGORITHM  CONSTRUCTION OF HUFFMAN TREE: 1. PICK UP THE 2 CHARS FROM THE LIST HAVING MINIMUM FREQ. LET US CALL THESE CHARS A AND B 2. CREATE 2 FREE NODES OF THE BT AND ASSIGN A AND B TO THESE NODES 3. ASSIGN A PARENT NODE FOR THEM AND ASSIGN IT THE FREQ THAT IS THE SUM OF THE CHILD NODES. LET US CALL IT “AB”
  • 121. HUFFMAN ALGORITHM 4. DELETE A AND B FROM THE LIST 5. ADD THE VALUE OF “AB” TO THE LIST 6. REPEAT STEPS 1 TO 5 TILL ONLY ONE NODE (THE ROOT) REMAINS IN THE LIST. THE RESULTANT TREE THUS GENERATED IS THE HUFFMAN TREE. 7. ASSIGN THE BITS TO THE NODES OF THE TREE AS IN S-F ALGO, i.e., 0 TO LEFT CHILD & 1 TO RIGHT CHILD 8. TO FIND THE CODE FOR A CHAR, TRAVERSE FROM ROOT TO THE LEAF CONTAINING THAT CHAR.
  • 122. HUFFMAN ALGORITHM  PROBLEM: LET A MESSAGE OF 100 CHARS CONTAIN THE FOLLOWING: CHAR FREQUENCY A 50 B 20 C 15 D 10 E 5  STEPS 1 AND 2 HAVE ALREADY BEEN DONE
  • 123. HUFFMAN ALGORITHM  CONSTRUCTION OF HUFFMAN TREE: 1. THE TWO CHARS HAVING MINIMUM FREQ ARE D (10) & E (5) 2. MAKE D AND E 2 FREE NODES OF THE TREE 3. ASSIGN A PARENT NODE FOR THEM: DE (15) WITH CHILDREN D (10) AND E (5)
  • 124. HUFFMAN ALGORITHM 4 & 5: DELETE D & E FROM THE LIST, ADD DE AND REPEAT: CDE (30) WITH CHILDREN C (15) AND DE (15) 6. REPEAT 1 TO 5 UNTIL ONE NODE REMAINS: CDEB (50) WITH CHILDREN CDE (30) AND B (20)
  • 125. HUFFMAN ALGORITHM CONTD: CDEBA (100) WITH CHILDREN CDEB (50) AND A (50)
  • 126. HUFFMAN ALGORITHM  H TREE: ROOT CDEBA (100) → 0: CDEB (50), 1: A (50); CDEB (50) → 0: CDE (30), 1: B (20); CDE (30) → 0: C (15), 1: DE (15); DE (15) → 0: D (10), 1: E (5)
  • 127. HUFFMAN ALGORITHM CHARACTER / HUFFMAN CODE / SIZE (READ FROM THE TREE, 0 = LEFT, 1 = RIGHT): A / 1 / 1; B / 01 / 2; C / 000 / 3; D / 0010 / 4; E / 0011 / 4. TOTAL NO OF BITS REQUIRED = (50*1)+(20*2)+(15*3)+(10*4)+(5*4) = 195. AV BITS USED = 1.95. ENTROPY OF MESSAGE = - Σ Pi*log2(Pi) ≈ 1.923 BITS. SO, REDUNDANCY = 1.95 – 1.923 = 0.027 BITS/CHAR
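The bottom-up construction can be sketched with Python's standard `heapq` module, which keeps the "two nodes with minimum frequency" step cheap. This is an illustrative sketch, not the deck's own implementation: the function name `huffman_codes` and the tie-breaking counter are assumptions. The exact bit patterns depend on which child is placed left, so they may differ from the table while the code lengths and total bit count agree.

```python
import heapq
from itertools import count

def huffman_codes(freqs):
    """Build a Huffman tree bottom-up and return {symbol: code}."""
    tie = count()   # unique tie-breaker so heapq never compares tree payloads
    heap = [(f, next(tie), sym) for sym, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:                       # stop when only the root is left
        f1, _, left = heapq.heappop(heap)      # two lowest-frequency nodes
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes = {}
    def walk(node, prefix):
        if isinstance(node, tuple):            # internal node: (left, right)
            walk(node[0], prefix + "0")
            walk(node[1], prefix + "1")
        else:                                  # leaf: a symbol
            codes[node] = prefix or "0"
    walk(heap[0][2], "")
    return codes

freqs = {"A": 50, "B": 20, "C": 15, "D": 10, "E": 5}
codes = huffman_codes(freqs)
lengths = {s: len(c) for s, c in codes.items()}
total_bits = sum(freqs[s] * lengths[s] for s in freqs)
print(codes, total_bits)
```

Dividing `total_bits` by the 100 characters of the message gives the 1.95 bits per character quoted on the slide.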
  • 128. HUFFMAN ALGORITHM  HUFFMAN CODING CAN BE FURTHER OPTIMIZED:  EXTENDED HUFFMAN COMPRESSION- CAN ENCODE GROUP OF SYMBOLS RATHER THAN SINGLE SYMBOL  ADAPTIVE HUFFMAN CODING- DYNAMICALLY CHANGES THE CODE WORDS ACCORDING TO THE CHANGE OF PROBABILITY OF SYMBOLS
  • 129. ARITHMETIC CODING  IT IS A METHOD OF WRITING A CODE IN A NON-INTEGER LENGTH  IT ALLOWS ONE TO CODE VERY CLOSE TO IDEAL ENTROPY  IT DOES NOT REPLACE AN INPUT SYMBOL WITH A SPECIFIC CODE  INSTEAD, IT TAKES A STREAM OF INPUT SYMBOLS AND REPLACES IT WITH A SINGLE FLOATING POINT OUTPUT NUMBER
  • 130. ARITHMETIC CODING  IT IS QUITE SIMILAR TO HUFFMAN, BECAUSE IT IS USED FOR THE SAME KIND OF CONTENTS TO COMPRESS AS IN HUFFMAN  IT IS DIFFERENT FROM HUFFMAN IN THE WAY IT PROCESSES THE SOURCE  INSTEAD OF GIVING BIT VALUE TO EACH CHAR, IT USES PROB VALUE FOR EACH CHAR  IT IS BASED UPON PROB BETWEEN 0 AND 1
  • 131. ARITHMETIC CODING  THE OUTPUT FROM AN ARITHMETIC CODING PROCESS IS A SINGLE NUMBER LESS THAN 1 AND GREATER THAN OR EQUAL TO 0  THIS SINGLE NUMBER CAN BE UNIQUELY DECODED TO CREATE THE EXACT STREAM OF SYMBOLS THAT WENT INTO ITS CONSTRUCTION  IT CAN GIVE A BETTER COMPRESSION RATIO THAN HUFFMAN CODING, BECAUSE IT IS NOT RESTRICTED TO A WHOLE NUMBER OF BITS PER SYMBOL.
  • 132. ARITHMETIC CODING STEP 1: ENCODING REQUIRES 5 VARIABLES: RANGE, LOW, HIGH, RF (RANGE FROM), RT (RANGE TO). EXAMPLE: LET THE MESSAGE BE “ABCBAA”. THE FREQ OF CHARS A, B AND C ARE 3, 2 AND 1 RESPECTIVELY. A TABLE IS TO BE CREATED AS FOLLOWS:
      CHAR | PROB | RANGE | RANGE FROM | RANGE TO
      A | 0.50 | >=0 & <0.5 | 0.00 | 0.50
      B | 0.33 | >=0.5 & <0.83 | 0.50 | 0.83
      C | 0.17 | >=0.83 & <1 | 0.83 | 1.00
  • 133. ARITHMETIC CODING STEP 2: NOW ENCODE THE CHARS IN THE MESSAGE USING THE TABLE OBTAINED IN STEP 1, AS FOLLOWS, AND CREATE A TABLE AGAIN:
      SET LOW = 0 AND HIGH = 1.0
      WHILE THERE ARE STILL INPUT SYMBOLS, DO
          GET AN INPUT SYMBOL
          RANGE = HIGH (PREV) – LOW (PREV)
          LOW = LOW (PREV) + (RANGE * RF OF CURRENT SYMBOL)
          HIGH = LOW (PREV) + (RANGE * RT OF CURRENT SYMBOL)
      END OF WHILE
      OUTPUT = LOW
  • 134. ARITHMETIC CODING
      CHAR | RANGE | LOW | HIGH
      START | 1-0 = 1 | 0 | 1.0
      A | 1-0 = 1 | 0+1*0 = 0 | 0+1*0.5 = 0.5
      B | 0.5-0 = 0.5 | 0+0.5*0.5 = 0.25 | 0+0.5*0.83 = 0.415
      C | 0.415-0.25 = 0.165 | 0.25+0.165*0.83 = 0.38695 | 0.25+0.165*1.0 = 0.415
      B | 0.415-0.38695 = 0.02805 | 0.38695+0.02805*0.5 = 0.400975 | 0.38695+0.02805*0.83 = 0.4102315
      A | 0.4102315-0.400975 = 0.0092565 | 0.400975+0.0092565*0.0 = 0.400975 | 0.400975+0.0092565*0.5 = 0.40560325
      A | 0.40560325-0.400975 = 0.00462825 | 0.400975+0.00462825*0.0 = 0.400975 | 0.400975+0.00462825*0.5 = 0.403289125
  • 135. ARITHMETIC CODING  THE FINAL OUTPUT VALUE = LOW = 0.400975  STEP 3: TO DECODE THE MESSAGE, i.e., TO GET THE CHARS BACK, THE FOLLOWING PROCESS IS ADOPTED:  AGAIN 5 VARIABLES ARE REQUIRED: RANGE, RF, RT, VALUE AND RD (RANGE DIFFERENCE) 1. OUTPUT THE SYMBOL BY DETERMINING IN WHICH RANGE THE VALUE LIES. IN THIS EXAMPLE THE VALUE IS 0.400975, WHICH LIES BETWEEN 0 AND 0.5. SO, THE FIRST CHARACTER DECODED IS “A”. 2. GET A NEW VALUE USING RD = RT – RF: NEW VALUE = (PREV VALUE – PREV RF)/PREV RD
  • 136. ARITHMETIC CODING
      VALUE | RANGE | CHAR DECODED | RD = RT – RF
      0.400975 | 0.00 - <0.50 | A | 0.50
      0.80195 | 0.50 - <0.83 | B | 0.33
      0.915 | 0.83 - <1.00 | C | 0.17
      0.5 | 0.50 - <0.83 | B | 0.33
      0.0 | 0.00 - <0.50 | A | 0.50
      0.0 | 0.00 - <0.50 | A | 0.50
    ADV: BETTER RESULT THAN HUFFMAN. DISADV: 1. COMPLICATED CALCULATIONS 2. IT REQUIRES FLOATING POINT ARITHMETIC (FPU), SO THE PROCESS IS SLOW 3. THE DECODER DOES NOT KNOW WHERE THE DECODING PROCESS SHOULD END. TO OVERCOME THIS PROBLEM, ONE SPL CHAR IS INSERTED INTO THE ENCODED TEXT AS DELIMITER. AT THE TIME OF DECODING, IT INDICATES THAT THERE ARE NO MORE CHARS TO DECODE.
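The encoding loop and the decoding rule can be sketched in Python. To avoid the rounding drift of binary floating point, this sketch uses `fractions.Fraction` from the standard library; that choice, the function names, and passing the message length to the decoder (instead of the delimiter character mentioned above) are all assumptions made for illustration. The range table is the rounded one from the example.

```python
from fractions import Fraction

# (RANGE FROM, RANGE TO) for each character, as in the example table.
RANGES = {
    "A": (Fraction("0"), Fraction("0.5")),
    "B": (Fraction("0.5"), Fraction("0.83")),
    "C": (Fraction("0.83"), Fraction("1")),
}

def encode(message):
    """Narrow [low, high) once per symbol and return the final low."""
    low, high = Fraction(0), Fraction(1)
    for ch in message:
        span = high - low                        # RANGE = HIGH(prev) - LOW(prev)
        rf, rt = RANGES[ch]
        low, high = low + span * rf, low + span * rt
    return low

def decode(value, length):
    """Find the range holding `value`, emit its char, rescale, repeat."""
    out = []
    for _ in range(length):
        for ch, (rf, rt) in RANGES.items():
            if rf <= value < rt:
                out.append(ch)
                value = (value - rf) / (rt - rf)  # (VALUE - RF) / RD
                break
    return "".join(out)

code = encode("ABCBAA")
print(float(code), decode(code, 6))
```

Practical coders avoid both floating point and big fractions by working on scaled integers and emitting bits as soon as the interval is narrow enough; that refinement is outside the scope of this sketch.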
  • 137. Compression ratio  ONE NEEDS TO KNOW IT TO FIND OUT THE EFFICIENCY OF THE COMPRESSION ALGORITHM  C.R. = (SIZE OF ORIGINAL DATA – SIZE OF COMPRESSED DATA) / SIZE OF ORIGINAL DATA
  • 138. DICTIONARY BASED COMPRESSION TECHNIQUES  STATISTICAL METHODS, SUCH AS S-F AND HUFFMAN, ENCODE A SINGLE SYMBOL AT A TIME BY GENERATING A ONE-TO-ONE SYMBOL-TO-CODE MAP.  A DICTIONARY BASED COMPRESSOR REPLACES AN OCCURRENCE OF A PARTICULAR PHRASE OR GROUP OF BYTES IN A PIECE OF DATA WITH AN INDEX TO THE PREVIOUS OCCURRENCE OF THAT PHRASE.
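As a small worked example of the compression ratio formula, here is a Python sketch applied to the Shannon-Fano result from earlier in the deck (320 bits in ASCII, 89 bits after coding). The function name is illustrative.

```python
def compression_ratio(original_size, compressed_size):
    """Fraction of the original size removed by compression (0.0 means no gain)."""
    if original_size <= 0:
        raise ValueError("original size must be positive")
    return (original_size - compressed_size) / original_size

ratio = compression_ratio(320, 89)
print(f"{ratio:.1%}")
```

With this definition a larger value means better compression, and a negative value means the "compressed" data grew.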
  • 139. DICTIONARY BASED COMPRESSION TECHNIQUES  SUPPOSE A TEXT IS GIVEN  IT IS ASSUMED THAT THERE IS A DICTIONARY THAT HAS ALL THE WORDS IN THE GIVEN TEXT.  EACH WORD IN THE DICTIONARY IS REPRESENTED BY A UNIQUE NUMBER THAT ALSO INDICATES THE POSITION OR THE INDEX OF THE WORD IN THE DICTIONARY.
  • 140. DICTIONARY BASED COMPRESSION TECHNIQUES  WHEN THE TEXT IS TO BE COMPRESSED, THE WORDS OF THE TEXT ARE REPLACED BY THE INDEX OF THAT WORD.  LET THE TEXT BE “LEARN THE DICTIONARY BASED COMPRESSION METHOD. IT IS A VERY SIMPLE METHOD. THANK YOU.”  SAY THE DICTIONARY IS LIKE THIS: LEARN-1; THE-2; DICTIONARY-3; BASED-4; COMPRESSION-5; METHOD-6; IT-7; IS-8; A-9; VERY-10; SIMPLE-11; THANK YOU-12  THE ENCODED MESSAGE WILL BE “1 2 3 4 5 6 7 8 9 10 11 6 12”
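The word-for-index substitution above can be sketched in Python. This sketch assumes the text is split on non-letter characters and that the longest dictionary phrase is tried first (needed for the two-word entry "THANK YOU"); the function name and those choices are illustrative, not taken from the slides.

```python
import re

def dictionary_encode(text, dictionary):
    """Replace each word or phrase of `text` by its dictionary index."""
    words = re.findall(r"[A-Z]+", text.upper())
    longest = max(len(phrase.split()) for phrase in dictionary)
    out, i = [], 0
    while i < len(words):
        for n in range(min(longest, len(words) - i), 0, -1):   # longest phrase first
            phrase = " ".join(words[i:i + n])
            if phrase in dictionary:
                out.append(dictionary[phrase])
                i += n
                break
        else:
            # A static dictionary cannot encode a word it has never seen.
            raise KeyError(f"{words[i]!r} is not in the dictionary")
    return out

entries = ["LEARN", "THE", "DICTIONARY", "BASED", "COMPRESSION", "METHOD",
           "IT", "IS", "A", "VERY", "SIMPLE", "THANK YOU"]
dictionary = {phrase: index for index, phrase in enumerate(entries, start=1)}
text = ("LEARN THE DICTIONARY BASED COMPRESSION METHOD. "
        "IT IS A VERY SIMPLE METHOD. THANK YOU.")
encoded = dictionary_encode(text, dictionary)
print(encoded)
```

Note that punctuation is dropped by this sketch, so it is not a complete lossless coder; a real static dictionary would also need entries for the separators.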
  • 141. DICTIONARY BASED COMPRESSION TECHNIQUES  IF THE PHRASES ARE USED, EFFICIENCY INCREASES.  FOR THIS TO WORK, IT IS IMPORTANT THAT THE SENDER AND THE RECEIVER MUST HAVE ACCESS TO THE SAME DICTIONARY.  DICTIONARY BASED METHODS ARE MORE EFFICIENT THAN CHARACTER BASED METHODS.  IT GENERATES THE CODE FOR CHARS AS WELL AS FREQUENTLY USED WORDS AND PHRASES.
  • 142. DICTIONARY BASED COMPRESSION TECHNIQUES  THE DICTIONARY BASED METHOD MAY BE STATIC OR DYNAMIC DEPENDING UPON THE CREATION AND USE OF DICTIONARY.  STATIC DICTIONARY IS PREPARED BEFORE THE COMMUNICATION OF THE ENCODED MESSAGE TO THE RECEIVER’S END. ALL POSSIBLE CHARS/WORDS/PHRASES ARE INSERTED INTO THE DICTIONARY AND INDEXED.
  • 143. DICTIONARY BASED COMPRESSION TECHNIQUES  THE MAIN DRAWBACK OF STATIC METHOD IS THAT PERFORMANCE DEPENDS UPON THE TEXT TO BE ENCODED AND IS HIGHLY DEPENDENT ON THE ORGANIZATION OF THE CHARS/WORDS/PHRASES IN THE DICTIONARY.  SECONDLY, IF THERE IS ANY WORD NOT IN THE DICTIONARY, IT FAILS.  THE SOLUTION TO THE PROBLEM IS DYNAMIC DICTIONARY COMPRESSION.
  • 144. DICTIONARY BASED COMPRESSION TECHNIQUES  IN THIS METHOD, THE DICTIONARY IS PREPARED AT THE TIME OF ENCODING OF TEXT.  LZ77, LZ78 AND LZW TECHNIQUES USE DYNAMIC DICTIONARY COMPRESSION TECHNIQUE.  IT GENERATES OPTIMUM SIZE CODES.
  • 145. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  DESIGNED FOR SEQUENTIAL DATA COMPRESSION.  THE DICTIONARY IS A PORTION OF THE PREVIOUSLY ENCODED SEQUENCE.  THE ENCODER EXAMINES THE INPUT SEQUENCE THROUGH A SLIDING WINDOW.  THE WINDOW CONSISTS OF 2 PARTS: A SEARCH BUFFER, THAT CONTAINS A PORTION OF THE RECENTLY ENCODED SEQUENCE AND A LOOK-AHEAD BUFFER, THAT CONTAINS THE NEXT PORTION OF THE SEQUENCE TO BE ENCODED.
  • 146. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE LZ77: c | a b r a c a d | a b r a r r a r r, WHERE THE SEARCH BUFFER IS “a b r a c a d” AND THE LOOK-AHEAD BUFFER BEGINS WITH “a b r a r r”. 1. To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer. The distance of the pointer from the look-ahead buffer is called the offset.
  • 147. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE 1. The encoder then examines the symbols following the symbol at the pointer location to see if they match consecutive symbols in the look-ahead buffer. The number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol, is called the length of the match. The encoder searches the search buffer for the longest match.
  • 148. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE 3. Once the longest match has been found, the encoder encodes it with a triple <o,l,c> where o is the offset, l is the length of the match and c is the code-word corresponding to the symbol in the look-ahead buffer that follows the match. In the diagram, the longest match starts at the first a of the search buffer. The offset o in this case is 7, l is 4 (the string “a b r a”), and the symbol in the look-ahead buffer following the match is r.
  • 149. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  The reason for sending the third element in the triple is to take care of the situation where no match for the symbol in the look-ahead buffer can be found in the search buffer. In this case, the offset and the match length values are set to 0, and the third element of the triple is the code for the symbol itself.  For the decoding process, it is basically a table look-up procedure and can be done by reversing the encoding procedure.
  • 150. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  Take a window of the same size N as used in encoding (the sum of the sizes of the look-ahead buffer and the search buffer), and use its first (N – n) positions to hold the previously decoded chars, where n is the size of the look-ahead buffer.  If one breaks up each triple that one encounters back into its components (position offset o, match length l and the last symbol of the incoming stream c), one can extract the match string from the buffer according to o and l, append c, and thus obtain the original content.
  • 151. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  It provides a very good compression ratio for many types of data.  But the encoding process is very time consuming, as there are many comparisons to be performed between the look-ahead buffer and the window.  On the other hand, the decoding process is very simple and fast, and both the encoding and the decoding processes have a low memory consumption, since the only data held in memory is the window (between 4 and 64 KB).
  • 152. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  ALL POPULAR ARCHIVERS (.ARJ, .LHA, .ZIP, .ZOO) ARE VARIATIONS ON THE LZ77 THEME.  DRAWBACK: IT USES ONLY A SMALL WINDOW INTO PREVIOUSLY SEEN TEXT, WHICH MEANS IT CONTINUOUSLY THROWS AWAY VALUABLE DICTIONARY ENTRIES BECAUSE THEY SLIDE OUT OF THE DICTIONARY. THE LONGEST MATCH POSSIBLE IS ROUGHLY THE SIZE OF THE LOOK-AHEAD BUFFER.  SECONDLY, IF A STRING THAT HAS ALREADY BEEN CAPTURED APPEARS AGAIN AFTER A LONGER INTERVAL, THEN A SEPARATE CODE WILL BE GENERATED FOR THE SAME STRING.
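A compact Python sketch of the triple-based encoder and decoder described on the LZ77 slides. The buffer sizes (7 and 6), the function names and the test string are assumptions for illustration; the string is chosen so that one of the emitted triples is the <7,4,r> from the diagram. The search is a plain brute-force scan, which also shows why encoding is the slow side.

```python
def lz77_encode(data, search_size=7, lookahead_size=6):
    """Return a list of (offset, length, next_char) triples."""
    i, triples = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        start = max(0, i - search_size)
        # Leave room for the literal symbol that follows the match.
        max_len = min(lookahead_size - 1, len(data) - i - 1)
        for j in range(start, i):                 # every start in the search buffer
            length = 0
            while length < max_len and data[j + length] == data[i + length]:
                length += 1
            if length > best_len:
                best_off, best_len = i - j, length
        triples.append((best_off, best_len, data[i + best_len]))
        i += best_len + 1
    return triples

def lz77_decode(triples):
    """Rebuild the text by copying `length` chars from `offset` back, then c."""
    out = []
    for offset, length, ch in triples:
        for _ in range(length):
            out.append(out[-offset])              # char-by-char copy allows overlap
        out.append(ch)
    return "".join(out)

text = "cabracadabrarrarrad"
triples = lz77_encode(text)
print(triples)
print(lz77_decode(triples))
```

When no match exists the triple degenerates to (0, 0, c), which is the case the third element is there to handle.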
  • 153. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  TO OVERCOME SUCH PROBLEMS, LZ78 ALGO WAS GIVEN.  THE ONLY DIFFERENCE HERE IS THAT THE FIXED SIZE WINDOW OF LZ77 IS REPLACED BY A DICTIONARY IN LZ78.  WHILE LZ77 WORKS ON PAST DATA, LZ78 ATTEMPTS TO WORK ON FUTURE DATA.  IT DOES THIS BY FORWARD SCANNING THE INPUT BUFFER AND MATCHING IT AGAINST A DICTIONARY IT MAINTAINS.
  • 154. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  THE DICTIONARY IN LZ78 IS A TABLE OF STRINGS. EVERY STRING IS ASSIGNED A CODE WORD ACCORDING TO ITS INDEX NUMBER IN THE DICTIONARY.  BEFORE UNDERSTANDING THE METHOD, LOOK AT THE FOLLOWING TERMS:  CHARSTREAM: A SEQUENCE OF DATA TO BE ENCODED.
  • 155. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  CODE WORD: A BASIC DATA ELEMENT IN THE CODE STREAM. IT REPRESENTS A STRING FROM THE DICTIONARY.  PREFIX: A SEQUENCE OF CHARS THAT PRECEDE ONE CHARACTER  STRING: THE PREFIX TOGETHER WITH THE CHAR IT PRECEDES  CODESTREAM: THE SEQUENCE OF CODE WORDS AND CHARS ( THE OUTPUT OF THE ENCODING ALGORITHM)
  • 156. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE  CURRENT PREFIX (P) : THE PREFIX CURRENTLY BEING PROCESSED IN THE ENCODING ALGORITHM  CURRENT CHARACTER (C ): A CHAR DETERMINED IN THE ENCODING ALGORITHM. GENERALLY, THIS IS THE CHARACTER PRECEDED BY THE CURRENT PREFIX.  CURRENT CODE WORD (W): THE CODE WORD CURRENTLY PROCESSED IN THE DECODING ALGORITHM.
  • 157. LZ78 (LEMPEL-ZIV) ENCODING PROCESS  IT STARTS WITH A NEW DICTIONARY ie., AT THE BEGINNING OF ENCODING THE DICTIONARY IS EMPTY.  LET US CONSIDER A POINT WITHIN THE ENCODING PROCESS, WHEN THE DICTIONARY ALREADY CONTAINS SOME STRINGS.  ONE STARTS ANALYZING A NEW PREFIX IN THE CHARSTREAM, BEGINNING WITH AN EMPTY PREFIX.
  • 158. LZ78 (LEMPEL-ZIV) ENCODING PROCESS  IF ITS CORRESPONDING STRING (P+ C) IS PRESENT IN THE DICTIONARY, THE PREFIX IS EXTENDED WITH THE CHAR C.  THIS EXTENDING IS REPEATED UNTIL ONE GETS A STRING WHICH IS NOT PRESENT IN THE DICTIONARY.  AT THAT POINT, ONE OUTPUTS 2 THINGS TO THE CODESTREAM: THE CODEWORD THAT REPRESENTS THE PREFIX P AND THEN THE CHAR C.
  • 159. LZ78 (LEMPEL-ZIV) ENCODING PROCESS  THEN ONE ADDS THE WHOLE STRING (P+C) TO THE DICTIONARY AND STARTS PROCESSING THE NEXT PREFIX IN THE CHARSTREAM.  A SPECIAL CASE OCCURS IF THE DICTIONARY DOES NOT CONTAIN EVEN THE STARTING ONE CHARACTER STRING ( IT ALWAYS HAPPENS IN THE FIRST ENCODING STEP). IN THAT CASE, ONE OUTPUTS A SPECIAL CODE WORD THAT REPRESENTS AN EMPTY STRING, FOLLOWED BY THIS CHARACTER AND ADD THIS CHAR TO THE DICTIONARY.
  • 160. LZ78 (LEMPEL-ZIV) ENCODING PROCESS  THE OUTPUT FROM THIS ALGO IS A SEQ OF CODEWORD CHAR PAIR (W,C). EACH TIME A PAIR IS OUTPUT TO THE CODESTREAM, THE STRING FROM THE DICTIONARY CORRESPONDING TO W IS EXTENDED WITH THE CHAR C AND THE RESULTING STRING IS ADDED TO THE DICTIONARY.  IT MEANS THAT WHEN A NEW STRING IS BEING ADDED, THE DICTIONARY ALREADY CONTAINS ALL THE SUBSTRINGS FORMED BY REMOVING CHARS FROM THE END OF THE NEW STRING.
  • 161. LZ78 (LEMPEL-ZIV) ENCODING ALGORITHM  LZ78:
      1. START WITH AN EMPTY DICTIONARY AND AN EMPTY PREFIX P.
      2. C = NEXT CHAR IN THE CHARSTREAM
      3. IS THE STRING (P+C) PRESENT IN THE DICTIONARY?
         IF YES, THEN P = P+C
         IF NOT, THEN OUTPUT THESE 2 OBJECTS, P & C, TO THE CODESTREAM [THE CODEWORD CORRESPONDING TO P, AND C IN THE SAME FORM AS INPUT FROM THE CHARSTREAM]; ADD THE STRING P+C TO THE DICTIONARY; P = EMPTY
      4. ARE THERE MORE CHARS IN THE CHARSTREAM?
         IF YES, RETURN TO STEP 2
         IF NOT: IF P IS NOT EMPTY, OUTPUT THE CODE WORD CORRESPONDING TO P; END
  • 162. LZ78 (LEMPEL-ZIV) DECODING PROCESS  AT THE START OF DECODING THE DICTIONARY IS EMPTY.  IT GETS RECONSTRUCTED IN THE PROCESS OF DECODING.  IN EACH STEP, A PAIR CODEWORD-CHAR –(W,C) IS READ FROM THE CODESTREAM.  THE CODEWORD ALWAYS REFERS TO A STRING ALREADY PRESENT IN THE DICTIONARY. THE STRING W & C ARE OUTPUT TO THE CHARSTREAM AND THE STRING (W+C) IS ADDED TO THE DICTIONARY. AFTER THE DECODING, THE DICTIONARY WILL LOOK EXACTLY THE SAME AS AFTER ENCODING.
  • 163. LZ78 (LEMPEL-ZIV) DECODING ALGORITHM 1. AT THE START THE DICTIONARY IS EMPTY. 2. W = NEXT CODEWORD IN THE CODESTREAM 3. C = THE CHARACTER FOLLOWING IT 4. OUTPUT THE STRING FOR W TO THE CHARSTREAM (THIS CAN BE AN EMPTY STRING) AND THEN OUTPUT C 5. ADD THE STRING W+C TO THE DICTIONARY 6. ARE THERE MORE CODEWORDS IN THE CODESTREAM? IF YES, GO BACK TO STEP 2. IF NOT, END.
  • 164. LZ78 (LEMPEL-ZIV) ENCODING PROCESS  EXAMPLE: LET THE CHAR STREAM TO BE ENCODED BE
      POS  | 1 2 3 4 5 6 7 8 9
      CHAR | A B B C B C A B A
    ENCODING PROCESS:
      ENCODING STEP | CURRENT POSITION | DICTIONARY | OUTPUT
      1 | 1 | A | (0,A)
      2 | 2 | B | (0,B)
      3 | 3 | BC | (2,C)
      4 | 5 | BCA | (3,A)
      5 | 8 | BA | (2,A)
  • 165. LZ78 (LEMPEL-ZIV) ENCODING PROCESS  THE COLUMN DICTIONARY SHOWS WHAT STRING HAS BEEN ADDED TO THE DICTIONARY. THE INDEX OF THE STRING IS EQUAL TO THE STEP NUMBER.  THE COLUMN OUTPUT PRESENTS THE OUTPUT IN THE FORM (W,C)  THE OUTPUT OF EACH STEP DECODES TO THE STRING THAT HAS BEEN ADDED TO THE DICTIONARY.
  • 166. LZ78 (LEMPEL-ZIV) DECODING PROCESS THE DECODING PROCESS:
      STEP | PAIR READ (W,C) | TEXT GENERATED
      1 | (0,A) | A
      2 | (0,B) | B
      3 | (2,C) | BC
      4 | (3,A) | BCA
      5 | (2,A) | BA
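The encoding and decoding algorithms can be sketched in Python and checked against the example tables. The function names are illustrative; index 0 stands for the empty string, as in the (0,A) pairs above, and a leftover prefix at the end of the input is emitted with an empty character, which is one possible way to handle the final step.

```python
def lz78_encode(text):
    """Return a list of (codeword, char) pairs."""
    dictionary, output, prefix = {}, [], ""
    for ch in text:
        if prefix + ch in dictionary:
            prefix += ch                                   # keep extending P
        else:
            output.append((dictionary.get(prefix, 0), ch)) # 0 = empty prefix
            dictionary[prefix + ch] = len(dictionary) + 1  # index = step number
            prefix = ""
    if prefix:                                             # input ended inside a match
        output.append((dictionary[prefix], ""))
    return output

def lz78_decode(pairs):
    """Rebuild the dictionary while decoding; entry 0 is the empty string."""
    strings, out = [""], []
    for index, ch in pairs:
        entry = strings[index] + ch
        strings.append(entry)
        out.append(entry)
    return "".join(out)

pairs = lz78_encode("ABBCBCABA")
print(pairs)
print(lz78_decode(pairs))
```

The decoder never receives the dictionary; it rebuilds the same entries in the same order, which is why both sides end up with identical tables.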
  • 167. LZW ALGORITHM  LZW WORKS BY ENTERING PHRASES INTO A DICTIONARY AND THEN, WHEN A REPEAT OCCURRENCE OF THAT PARTICULAR PHRASE IS FOUND, OUTPUTTING THE DICTIONARY INDEX INSTEAD OF THE PHRASE.  FOR EXAMPLE, IT MAY USE A DICTIONARY WITH 4096 ENTRIES. IN THE BEGINNING, THE ENTRIES 0-255 REFER TO INDIVIDUAL BYTES AND THE REST, 256-4095, REFER TO LONGER STRINGS.
  • 168. LZW ALGORITHM  EACH TIME A NEW CODE IS GENERATED, IT MEANS A NEW STRING HAS BEEN SELECTED FROM THE INPUT STREAM.  NEW STRINGS THAT ARE ADDED TO THE DICTIONARY ARE CREATED BY APPENDING THE CURRENT CHARACTER K TO THE END OF AN EXISTING STRING W.
  • 169. LZW ALGORITHM
      SET W = NIL
      LOOP
          READ A CHARACTER K
          IF WK EXISTS IN THE DICTIONARY
              W = WK
          ELSE
              OUTPUT THE CODE FOR W
              ADD WK TO THE STRING TABLE
              W = K
      END-LOOP
      OUTPUT THE CODE FOR W
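The loop above translates almost line for line into Python. This is a sketch under the assumptions stated on the earlier slide (codes 0-255 preloaded with single bytes, at most 4096 entries); the function name and the choice to simply stop adding entries when the table is full are illustrative.

```python
def lzw_encode(data, max_entries=4096):
    """Return the list of dictionary codes for the byte string `data`."""
    table = {bytes([i]): i for i in range(256)}   # entries 0-255: single bytes
    w, codes = b"", []
    for byte in data:
        k = bytes([byte])
        if w + k in table:
            w += k                                # W = WK
        else:
            codes.append(table[w])                # output the code for W
            if len(table) < max_entries:
                table[w + k] = len(table)         # add WK to the string table
            w = k
    if w:
        codes.append(table[w])                    # flush the last pending string
    return codes

codes = lzw_encode(b"ABABABA")
print(codes)
```

Unlike LZ78, only codes are emitted; the character that extends each phrase is recovered by the decoder from the start of the next phrase.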
  • 170. JPEG  DESIGNED FOR COMPRESSING FULL COLOUR OR GRAY SCALE DIGITAL IMAGES OF REAL-WORLD SCENES.  IT DOES NOT WORK WELL ON TEXT OR ON NON-REALISTIC IMAGES SUCH AS CARTOONS AND LINE DRAWINGS.  IT IS INDEPENDENT OF SOURCE IMAGE.  IT MEANS IT CAN COMPRESS AN IMAGE IRRESPECTIVE OF ITS SIZE.
  • 171. JPEG  IT DOES NOT HANDLE B&W (1 BIT PER PIXEL) IMAGES NOR DOES IT HANDLE MOTION PICTURE COMPRESSION.  IT USES LOSSY TECHNIQUE.  THE ALGO ACHIEVES MUCH OF ITS COMPRESSION BY EXPLOITING KNOWN LIMITATIONS OF HUMAN EYE, THE FACT THAT SMALL COLOUR DETAILS ARE NOT PERCEIVED AS WELL AS SMALL DETAILS OF LIGHT AND DARK.
  • 172. JPEG  IT IS INTENDED FOR COMPRESSING IMAGES THAT WILL BE LOOKED AT BY HUMANS.  THE JPEG STANDARD INCLUDES A SEPARATE LOSSLESS MODE, BUT IT IS RARELY USED AND DOES NOT GIVE NEARLY AS MUCH COMPRESSION AS THE LOSSY MODE.  A USEFUL PROPERTY OF JPEG IS THAT THE DEGREE OF LOSSINESS CAN BE VARIED BY ADJUSTING COMPRESSION PARAMETERS.
  • 173. JPEG  DECODERS CAN TRADE-OFF DECODING SPEED AGAINST IMAGE QUALITY BY USING FAST BUT INACCURATE APPROXIMATIONS TO THE REQUIRED CALCULATIONS.  MAIN ADV OF USING JPEG IS THAT ONE CAN MAKE IMAGE FILES SMALLER, AS WELL AS ONE CAN STORE 24 BIT/PIXEL COLOR DATA (16 MILLION COLORS) INSTEAD OF 8 BIT/PIXEL DATA (256 OR FEWER COLORS)  JPEG CAN EASILY PROVIDE 20:1 COMPRESSION OF FULL COLOR DATA. AT LOW QUALITY EVEN 100:1 COMPRESSION IS POSSIBLE.  JPEG IS WIDELY USED ON WWW FOR STORING/TRANSMITTING PHOTOGRAPHS.
  • 174. MPEG  THE MAIN ADVANTAGE OF MPEG-1 IS THAT IT COMPRESSES VIDEO AND AUDIO DOWN TO ABOUT 1.5 MBITS/SECOND, WHICH IS ROUGHLY EQUAL TO THE CDROM DATA TRANSFER RATE.  USING MPEG1, ONE CAN DELIVER 1.2 Mbps OF VIDEO AND 250 kbps OF 2 CHANNEL STEREO SOUND USING CDROM TECHNOLOGY. SO A MOVIE MAY BE STORED ON CDROM IN MPEG FORMAT AND MAY BE VIEWED WITHOUT ANY SYNCHRONIZATION FAULT.
  • 175. MPEG  JPEG IS FOR STILL IMAGE COMPRESSION WHEREAS MPEG IS FOR MOVING PICTURES.  BUT AS DIGITAL VIDEO OR MOVIES STORE A SEQ OF STILL COLOR IMAGES, THE MPEG STANDARD USES JPEG-LIKE (DCT BASED) COMPRESSION TO COMPRESS THE INDIVIDUAL STILL COLOR IMAGES.  MPEG IS SUITABLE FOR SYMMETRIC AS WELL AS ASYMMETRIC COMPRESSION.
  • 176. MPEG  ASYMMETRIC COMPRESSION REQUIRES MORE EFFORT FOR CODING THAN DECODING. IN THIS CASE, COMPRESSION IS CARRIED OUT ONCE WHEREAS DECOMPRESSION IS PERFORMED MANY TIMES.  SYMMETRIC COMPRESSION IS KNOWN TO EXPECT EQUAL EFFORT FOR COMPRESSION AND DECOMPRESSION PROCESSES.
  • 177. MPEG  INTERACTIVE DIALOGUE APPLICATIONS MAKE USE OF SYMMETRIC ENCODING, WHERE A RESTRICTED END-TO-END DELAY IS REQUIRED.  MPEG HAS BECOME THE METHOD OF CHOICE FOR ENCODING MOTION IMAGES BECAUSE IT HAS BECOME WIDELY ACCEPTED FOR BOTH INTERNET AND DVD-VIDEO.
  • 178. MHEG  MULTIMEDIA AND HYPERMEDIA EXPERTS GROUP  SET UP BY ISO FOR STANDARDIZATION OF THE EXCHANGE FORMAT FOR MULTIMEDIA PRESENTATIONS AND MULTIMEDIA SYSTEMS.  WITHOUT A COMMON STANDARD, IT IS ALMOST IMPOSSIBLE TO MAKE A MM PRESENTATION WHICH CAN WORK ACROSS DIFF HW PLATFORMS.  THE MAIN OBJECTIVE OF THIS GROUP IS TO CREATE A STANDARD METHOD TO STORE, EXCHANGE AND DISPLAY MM PRESENTATIONS.
  • 179. MHEG  IT IS BASED ON OBJECT ORIENTED TECHNOLOGY.  FOR MM PRESENTATION, THERE ARE MANY CLASSES THAT DEFINE HOW AUDIO, VIDEO AND MUSIC CAN BE PLAYED.  THERE ARE CLASSES THAT CAN HELP TO DEVELOP USER INTERACTION DURING MM PRESENTATIONS.  THE THREE IMP CLASSES USED ARE CONTENT CLASS, BEHAVIOUR CLASS AND INTERACTION CLASS.
  • 180. MHEG  CONTENT CLASS IS USED TO DESCRIBE THE ACTUAL CONTENTS OF THE MM PRESENTATION  BEHAVIOUR CLASS IS USED TO DECIDE THE BEHAVIOUR OF PRESENTATION.FOR EXAMPLE HOW AND WHEN DATA WILL BE PRESENTED TO THE USER. IT HAS 2 SUB CLASSES- ACTION CLASS AND LINK CLASS WHICH ARE USEFUL FOR SYNC THE EVENTS WITH THE USER INTERFACE.
  • 181. MHEG  INTERACTION CLASS DESCRIBES THE ELEMENTS OF THE USER INTERFACE (i.e., THE ELEMENTS THAT APPEAR ON THE USER SCREEN) THAT ALLOW THE USER TO MAKE SELECTIONS, TRIGGER EVENTS AND INPUT INFORMATION. FOR EXAMPLE, THE ELEMENTS CHECK BOX, RADIO BUTTONS AND LISTS ARE USED TO MAKE SELECTIONS; THE ELEMENT PUSH BUTTON IS USED TO TRIGGER EVENTS AND THE TEXT ENTRY FIELD IS USED TO INPUT INFORMATION FROM THE USER.

Editor's Notes

  1. Hello, today I will talk about the techniques commonly used for digital audio compression in various audio file formats.
  2. -I will discuss the difference between redundant and irrelevant further in my presentation. -Depending on storage or transmission, there is an optimization in size
  3. Hello, my name is…. My background was originally in computer science as applied to satellite ground station operations, and more recently, in using geographic information technology for natural resources and environmental health.
  4. Compression or archival software is used to condense files to save disk storage space and to speed the transfer of information through the network. Compression works by eliminating redundancies in data. Decompression software, which is used to open compressed files, is usually bundled with compression software. Virtually every Internet file uses some kind of compress/decompression utility, including text, programs or applications, images, audio, video,and virtual reality (VR). Generally, compression is built into your Internet browser and is transparent to you, but enhanced stand-alone software is available allowing you to do your own archival, upload, download and copy to floppy disk. Also you may, on occasion, download compressed files which require a decompression program of the same format, in order to expand the files and make them usable again. There are many different compression formats which vary in popularity, but currently ZIP seems to be the most common on PCs. Other formats have their proponents (e.g. unix formats like TAR and Mac compression format), so it is useful to know where to go to find a utility program that will read the file. WinZip and StuffitExpander are two popular compression utilities. Be sure to check your handout and some of the software archival sites, though, when looking for the right tool. Unzip IrfanView