TOPICS DATA COMPRESSION COMPRESSION TECHNIQUES LOSSLESS COMPRESSION LOSSY COMPRESSION AUDIO COMPRESSION VIDEO COMPRE...
DATA COMPRESSIONThe process of reducing the volume of data by applying a compression technique is called compression.The...
DATA COMPRESSIONThe reverse process of reproducing the original data from compressed data is called decompression.The re...
Reasons to CompressReduce File SizeSave disk spaceIncrease transfer speed at a given data rateAllow real-time transfer...
Types of compression techniques: Compression techniques can be categorized based on the following considerations: lossless or lossy; symmetrical or asymmetrical; software or hardware.
Types of compression techniques - 1. Lossless or lossy: If the decompressed data is the same as the original data, the technique is referred to as lossless compression; otherwise the compression is lossy.
Types of compression techniques - 2. Symmetrical or asymmetrical: In symmetrical compression, the time required to compress and to decompress is roughly the same. In asymmetrical compression, compression usually takes much longer than decompression.
Types of compression techniques - 3. Software or hardware: A compression technique may be implemented in either hardware or software. Compared with software codecs (coder/decoder), hardware codecs offer better quality and performance.
Basics of Compression
Compression - Types: Spatial compression finds similarities within an image and compresses those similarities into a smaller form (intra-frame). Temporal compression finds similarities across images and compresses those similarities into a smaller form (inter-frame). Quality of compression: lossless or lossy.
Compression - Spatial: Run-length encoding replaces a run of consecutive pixels of the same color by a single copy of the color value and a count of the number of pixels. Huffman coding is similar to RLE but assigns codes of different lengths to colors (the most common colors get the fewest bits).
Compression - Spatial: Dictionary-based coding uses fixed-length bit codes that point to a table of variable-length color codes; it is the basis of LZW and PKZIP. All lossless compression schemes give about 50% compression at best.
Compression - Spatial: GIF - lossless compression, best suited for simple images; it reduces the number of colors (256 colors max) to reduce file size.
Compression - Spatial: JPEG (Joint Photographic Experts Group) - lossy compression, best suited for photography; it throws data away to further reduce file size.
Compression - Temporal: Motion JPEG - most popular for capturing analog video; JPEG is applied to each frame of video, with no temporal compression; special-purpose hardware may be needed for real-time use.
Compression - Temporal: DV - most popular for storing and capturing digital video; 5:1 compression, usually done in hardware (in the camera); spatial and a little temporal compression.
Compression - Temporal: MPEG (Motion Picture Experts Group) - most popular for delivery of digital video; temporal and spatial compression; MPEG-1, MPEG-2, MPEG-4 and MPEG-7.
BASIC COMPRESSION TECHNIQUES: lossless techniques and lossy techniques.
Lossless techniques - RUN-LENGTH CODING: repeated symbols in a string are replaced with the symbol and the number of times it is repeated. Example: "aaaabbcccccaaaaaabaaaaa" is expressed as "a4b2c5a6b1a5".
Lossless techniques - VARIABLE-LENGTH CODING: in general, coding schemes for a given character set use a fixed number of bits per character (for example BCD, EBCDIC and ASCII); variable-length coding instead uses shorter codes for the more frequent characters.
Run-length coding: compressing the sequence ABBBBBBBBBCDEEEEF with RLE, the compressed file takes up 10 bytes: A Ω9B C D Ω4E F (Ω marks a run, so Ω9B means nine Bs). Data size before compression: 17 bytes; data size after compression: 10 bytes; compression factor: 17/10 = 1.7.
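To make the run-length idea concrete, here is a minimal Python sketch (not from the deck) that emits each run as symbol+count, the "a4b2..." form used in the slide above; real RLE formats often use a marker character (like the Ω above) so that short runs are not expanded.

def rle_encode(text):
    """Run-length encode a string as symbol+count pairs, e.g. 'aaaab' -> 'a4b1'."""
    if not text:
        return ""
    out, run_char, run_len = [], text[0], 1
    for ch in text[1:]:
        if ch == run_char:
            run_len += 1
        else:
            out.append(f"{run_char}{run_len}")
            run_char, run_len = ch, 1
    out.append(f"{run_char}{run_len}")
    return "".join(out)

def rle_decode(encoded):
    """Inverse of rle_encode (assumes one-character symbols and decimal counts)."""
    import re
    return "".join(sym * int(n) for sym, n in re.findall(r"(\D)(\d+)", encoded))

msg = "aaaabbcccccaaaaaabaaaaa"
enc = rle_encode(msg)
print(enc, len(msg), "->", len(enc))   # a4b2c5a6b1a5, 23 -> 12 characters
assert rle_decode(enc) == msg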
LOSSY TECHNIQUESPREDICTIVE ENCODING: -Stores only the initial sampleSample may be a pixel, line, audio sample, or video...
LOSSY TECHNIQUESTRANSFORM ENCODING:-Data is converted from one domain to another.DCT (Discrete cosine transform) encodi...
Compression FundamentalsLossless – ensures that the data   recovered from the   compression /   decompression process is ...
Compression Fundamentals Lossy  – does not promise that the data    received is exactly the same as    the data sent  – r...
Audio Compression Techniques
Introduction - Digital audio compression: removal of redundant or otherwise irrelevant information from an audio signal. Audio compression algorithms are often referred to as "audio encoders". Applications: reduces required storage space; reduces required transmission bandwidth.
Audio Data Compression - Lossless audio compression removes redundant data; the resulting signal is the same as the original (perfect reconstruction). Lossy audio encoding removes irrelevant data; the resulting signal is only similar to the original.
Audio compression is a form of data compression designed to reduce the size of audio data files. Audio compression can mean two things: audio data compression and audio level compression.
Audio data compression - the amount of data in a recorded waveform is reduced for transmission. This is used in MP3 encoding, internet radio, and the like.
Audio level compression - the dynamic range (difference between loud and quiet) of an audio waveform is reduced. This is used in guitar effects racks, recording studios, etc.
MPEG Compression
MPEG Components: MPEG (Motion Pictures Experts Group) is a multimedia standard with specifications for coding, compression and transmission of audio, video and data streams. Video: describes compression of frames. Audio: describes compression of audio frames.
MPEG audio is a standard for compression and decompression of digital audio. The coding technique used in the MPEG audio standard (known as perceptual coding) takes advantage of a perceptual weakness of human ears (psychoacoustic phenomena). In perceptual coding, the audio spectrum is divided into a set of narrow frequency bands, to reflect the frequency selectivity of human hearing.
BASIC STEPS OF MPEG AUDIO COMPRESSION (encoding): input audio signal → time-to-frequency mapping (filter bank) → bit/noise allocation, quantizer and coding (driven by a psychoacoustic model) → bit-stream formatting → encoded bit-stream.
BASIC STEPS OF MPEG AUDIO COMPRESSION (decoding): encoded bit-stream → bit-stream unpacking → frequency sample reconstruction → frequency-to-time mapping → decoded audio signal.
VIDEO COMPRESSION - MPEG video is a subset of the MPEG standard. Digital video compression may either apply intra-frame compression to each individual frame of the video or combine both intra-frame and inter-frame compression.
MPEG uses both intra-frame and inter-frame techniques for data compression. MPEG compression is lossy and asymmetric, with the encoding process requiring more effort than the decoding process.
BASIC STEPS OF MPEG VIDEO COMPRESSION: video data to be compressed → preprocessing and color subsampling of individual frames → inter-frame motion compensation for P-frames and B-frames → divide each frame into 8x8 pixel blocks → apply the DCT transformation to each 8x8 pixel block → perform quantization of the DCT coefficients using a Q-table → order the 2-D output in zigzag sequence → apply run-length encoding to the zigzag sequence → apply variable-length encoding to the resulting stream → MPEG compressed video stream.
Three Types of Frames: Intra (I) frames (same as JPEG) - typically about 12 frames between I frames. Predictive (P) frames - encoded from the previous I or P reference frame. Bi-directional (B) frames - encoded from previous and future I or P frames. A typical sequence: I B B P B B P B B P B B I.
Lossless compressionLoss-less compressions reduce file size by encoding image information more efficiently.Images compre...
Lossy compressionLossy compressions reduce file size by considerably greater amounts than loss-less compressions but lose...
StandardJPEGJoint Photographic Experts Group Jpeg is the standard compression techniques for still imagesLossy compres...
Standard JPEG It supports the four modes of   encoding    – Sequential       • The image is encoded in the         order ...
JPEG (contd.)– the original quality of the image  can be fully restored Hierarchical   • The image is encoded at     multi...
Standard JPEG
MPEG Standard: MPEG-1, MPEG-2, MPEG-3, MPEG-4, MPEG-7, MPEG-21.
MPEG-1: initial video and audio compression standard; later used as the standard for Video CD and for the MP3 audio compression format.
MPEG-2: video and audio standards for broadcast-quality television; used for digital satellite TV services like DIRECTV, digital cable television signals, and (with slight modifications) for DVD video discs. MPEG-3: originally designed for HDTV, but abandoned in favor of MPEG-2.
MPEG-4: expands MPEG-1 to support video/audio "objects", 3D content, low-bitrate encoding and support for Digital Rights Management. MPEG-7: a formal system for describing multimedia content. MPEG-21: MPEG describes this future standard as a Multimedia Framework.
JPEG Standard - JPEG: the real image compression is the Discrete Cosine Transform (DCT); it removes redundant information (the "invisible" parts). JPEG 2000: successor to JPEG; the blockiness of JPEG is removed, and the compression ratio of JPEG 2000 is higher than that of JPEG.
Codecs Compression/Decompression  Scheme Hardware or software based Many use both spatial and temporal  compression tec...
Why do we do data compression?Data compression is simply done for saving the space in the hard disk, thereby to make it m...
What is the use of data compression            on network?The most prominent use of data compression on the network is to...
CompressionEven though disks have gotten bigger, we are still running short on disk spaceA common technique is to compre...
COMPUTER TOOLS AND UTILITIES
Compression Utilities: Zip files are used for rapidly distributing and storing files; zip files are compressed to save space. WinZip is a popular compression utility for Windows; WinRAR is another.
Lossless vs. lossy
Applications: lossless data compression is often used to make better use of disk space on office computers, or better use of the connection bandwidth in a computer network.
Applications: in other kinds of data, such as sounds and pictures, a small loss of quality can be tolerated without losing the essential nature of the data, so lossy data compression methods can be used.
Lossless vs. Lossy Compression [figure: lossy vs. lossless comparison] - NOTE: business data requires lossless compression, while audio and video applications can tolerate some loss, which may not be very noticeable.
ADVANTAGES: Data compression saves space on the hard disk, so more data fits on the same disk. The most prominent use of data compression on the network is to make the server more spacious so that more files can be stored on it.
Lossy vs. Lossless Compression: a lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application. Lossily compressed still images are often compressed to 1/10th of their original size, as with audio, but the quality loss is more noticeable, especially on closer inspection.
DATA COMPRESSION NEEDED AS MOST OF THE REAL  WORLD DATA IS REDUNDANT IMPORTANCE? SAVES DISK SPACE SAVES CONNECTION BAN...
DATA COMPRESSION TYPESREVERSIBLE          IRREVERSIBLELOSSY – WHEN EFFICIENCY OFTRANSMISSION IS MORE IMPORTANTTHAN ACCURA...
INFORMATION THEORY: A BRANCH OF MATHEMATICS THAT DEALS WITH DATA/INFORMATION REPRESENTATION. DATA COMPRESSION IS ONE OF THE APPLICATIONS OF INFORMATION THEORY.
SHANNON'S PRINCIPLE FOR INFORMATION: FOR DATA COMPRESSION, IT IS ESSENTIAL TO MEASURE THE INFORMATION CONTENT OF THE DATA, I.E. ITS DEGREE OF RANDOMNESS/UNCERTAINTY. HIGH-PROBABILITY EVENTS CONTAIN LESS SELF-INFORMATION, WHEREAS LOW-PROBABILITY EVENTS CARRY MUCH MORE SELF-INFORMATION.
SHANNON'S PRINCIPLE FOR INFORMATION: IT WAS GIVEN BY CLAUDE SHANNON. ACCORDING TO HIM, A SELF-INFORMATION VALUE IS ASSOCIATED WITH EVERY POSSIBLE OUTCOME OF AN EVENT.
SHANNON'S PRINCIPLE FOR INFORMATION: LET P(A) AND P(B) BE THE PROBABILITIES OF OCCURRENCE OF EVENTS A AND B RESPECTIVELY. ACCORDING TO SHANNON, THE SELF-INFORMATION ASSOCIATED WITH EVENT A IS Si(A) = -log_m P(A) = log_m [1/P(A)]; SIMILARLY Si(B) = log_m [1/P(B)], WHERE m DEFINES THE UNIT OF INFORMATION (m = 2 GIVES BITS).
SHANNON'S PRINCIPLE FOR INFORMATION - PROBABILITY OF AN EVENT VIS-À-VIS ITS SELF-INFORMATION: P(A) = 1 → Si(A) = 0; 0.5 → 1.0; 0.25 → 2.0; 0.10 → 3.32; 0.05 → 4.32.
SHANNON'S PRINCIPLE FOR INFORMATION: THE CONCEPT OF Si MAY ALSO BE USED TO MAKE INFERENCES ABOUT TWO INDEPENDENT EVENTS. LET A AND B BE TWO INDEPENDENT EVENTS; THEN P(AB) = P(A)*P(B), SO Si(AB) = -log2[P(AB)] = [-log2 P(A)] + [-log2 P(B)] = Si(A) + Si(B).
ENTROPY OF INFORMATION: ENTROPY IS A CONCEPT FROM THERMODYNAMICS; IN INFORMATION THEORY IT IS USED TO MEASURE THE RANDOMNESS/UNCERTAINTY IN A MESSAGE.
ENTROPY OF INFORMATION: THE AVERAGE INFORMATION CONTENT OF A MESSAGE IS CALLED ITS ENTROPY. THE LESS LIKELY A MESSAGE IS TO OCCUR, THE LARGER ITS INFORMATION CONTENT. ENTROPY IS AN IMPORTANT CONCEPT IN DATA COMPRESSION.
ENTROPY OF INFORMATION: THE ENTROPY OF AN ELEMENT (Ee) IS THE MINIMUM NUMBER OF BITS NEEDED TO ENCODE THAT ELEMENT. THE ENTROPY OF AN ENTIRE MESSAGE (Em) IS THE MINIMUM NUMBER OF BITS NEEDED TO ENCODE THE ENTIRE MESSAGE WITH A LOSSLESS COMPRESSION.
ENTROPY OF INFORMATION: THE ENTROPY OF A MESSAGE CAN BE USED TO DETERMINE WHETHER DATA COMPRESSION IS WORTH ATTEMPTING; IT CAN ALSO BE USED TO EVALUATE THE EFFECTIVENESS OF A COMPRESSION.
ENTROPY OF INFORMATION: THE NUMBER OF BITS IN A COMPRESSED CODE CAN BE COMPARED TO THE ENTROPY Em FOR THAT MESSAGE, REVEALING HOW CLOSE TO OPTIMAL ONE'S CODE IS.
ENTROPY OF INFORMATION: SHANNON PROPOSED THE FOLLOWING ENTROPY FUNCTION FOR A MESSAGE: Em = -Σ Pi log2(Pi), SUMMED OVER i = 1 TO N ---- (1), WHERE N = NUMBER OF DISTINCT CHARACTER TYPES USED IN THE MESSAGE AND Pi DENOTES THE PROBABILITY OF THE ith CHARACTER TYPE. E.G. FOR "AABCCD", N = 4.
ENTROPY OF INFORMATION: THE ENTROPY OF A CHARACTER IS GIVEN BY ITS SELF-INFORMATION, I.E. THE ENTROPY OF A CHARACTER A IS Ee = -log2 P(A). THE ENTROPY OF A MESSAGE CONTAINING N CHARACTERS CAN ALSO BE FOUND AS THE AVERAGE SELF-INFORMATION OF ALL N CHARACTERS, I.E. Em = (1/N) * Σ Si(ith CHARACTER), i = 1 TO N ---- (2).
ENTROPY OF INFORMATION: NOTICE THE DIFFERENCE BETWEEN N IN THE TWO EQUATIONS: IN (1), N IS THE NUMBER OF DISTINCT CHARACTERS USED IN THE MESSAGE; IN (2), N IS THE TOTAL NUMBER OF CHARACTERS IN THE MESSAGE.
ENTROPY OF INFORMATION: SO, THE ENTROPY OF A MESSAGE GIVES THE AVERAGE NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER IN THE MESSAGE. QUES 1: FOR THE MESSAGE "dadbadcadbaadac", CALCULATE THE Si ASSOCIATED WITH THE CHARACTERS a AND b, THE ENTROPY OF THE CHARACTERS c AND d, THE AVERAGE SELF-INFORMATION IN THE MESSAGE, AND THE ENTROPY OF THE MESSAGE.
ENTROPY OF INFORMATION N=15    CHAR NO OF CHARS              PROB OF CHAR          Si     d        4                     ...
ENTROPY OF INFORMATION NOTE THAT THE AV SELF  INFORMATION OF THE MESSAGE  AND THE ENTROPY OF THE  MESSAGE BOTH ARE SAME A...
ENTROPY OF INFORMATION QUES2: CALCULATE THE AV NO.  OF BITS REQUIRED TO  REPRESENT A CHAR IN THE  MESSAGE STRING “AAAAABB...
ENTROPY OF INFORMATIONA         6        0.6B         2        0.2C         2        0.2ENTROPY OF MESSAGE=-Σ Pi*log2(Pi),...
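The two worked questions above are easy to check in a few lines of Python. This is an illustrative sketch, not part of the deck; it simply evaluates Shannon's formula Em = -Σ Pi log2(Pi).

from collections import Counter
from math import log2

def entropy_bits_per_symbol(message):
    """Shannon entropy of a message: the minimum average number of bits per
    character achievable by a lossless code, Em = -sum(Pi * log2(Pi))."""
    n = len(message)
    return -sum((c / n) * log2(c / n) for c in Counter(message).values())

# QUES 1, using the character probabilities from the table above (4/15, 6/15, 2/15, 3/15):
print(round(-sum(p * log2(p) for p in (4/15, 6/15, 2/15, 3/15)), 2))   # 1.89 (the slide rounds to 1.88)
# QUES 2:
print(round(entropy_bits_per_symbol("AAAAAABBCC"), 2))                  # 1.37 (the slide rounds to 1.36)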
CODES A CODE IS ANY MAPPING FROM AN  INPUT ALPHABET TO AN OUTPUT  ALPHABET A CODE CAN BE SAY {a,b,c} = {0,1,00},  BUT TH...
CODES A CODE IS INSTANTANEOUS IF EACH  CODEWORD IN A MESSAGE CAN BE  DECODED AS SOON AS IT IS RECEIVED. THE BINARY CODE ...
CODES A CODE IS A PREFIX CODE IFF NO  CODEWORD IS A PREFIX OF ANOTHER  CODE WORD. A CODE IS INSTANTANEOUS IFF IT IS A  P...
TYPES OF CODING: THERE ARE MANY ALGORITHMS FOR CODING CHARACTERS, BUT THEY CAN BE BROADLY DIVIDED INTO 2 TYPES: STATIC (FIXED-SIZE) CODING AND DYNAMIC (VARIABLE-SIZE) CODING.
STATIC CODING SCHEME: IF THE MESSAGE IS COMPOSED OF COMBINATIONS OF M DISTINCT CHARACTERS, THEN THE NUMBER OF BITS REQUIRED IN THE CODE IS N = ⌈log_b M⌉, WHERE N = MINIMUM NUMBER OF BITS REQUIRED TO REPRESENT M DISTINCT CHARACTERS AND b = BASE OF THE NUMBER SYSTEM (b = 2 FOR BINARY CODES).
STATIC CODING SCHEME: THE MAIN DISADVANTAGE IS THAT IT DOES NOT CONSIDER THE FREQUENCY OR PROBABILITY OF OCCURRENCE OF A PARTICULAR CHARACTER IN THE MESSAGE.
STATIC CODING SCHEME - QUES 3: CONSIDER THE MESSAGE "RAMRAHIM". FIND THE NUMBER OF DISTINCT CHARACTERS, THE MINIMUM NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER, AND THE CODES FOR ALL DISTINCT CHARACTERS. USING THESE CODES, WHAT IS THE CODED MESSAGE FOR "MIHIR", AND HOW MUCH IS SAVED BY USING THIS CODING SCHEME OVER THE ASCII REPRESENTATION?
STATIC CODING SCHEME (SOLUTION): NUMBER OF DISTINCT CHARACTERS = 5; N = ⌈log2 M⌉ = ⌈log2 5⌉ = 3, SO A 3-BIT CODE IS NEEDED TO REPRESENT EACH SYMBOL: 000=R, 001=A, 010=M, 011=H, 100=I; THE REST ARE UNUSED. USING THESE CODES, THE CODED MESSAGE FOR "MIHIR" IS "010100011100000". EACH CHARACTER OF THE STRING IS REPRESENTED BY 3 BITS AND THERE ARE 5 CHARACTERS IN THE MESSAGE, SO THE NUMBER OF BITS REQUIRED = 5*3 = 15; THE 8-BIT ASCII REPRESENTATION NEEDS 5*8 = 40 BITS, SO THE SAVING = 40 - 15 = 25 BITS.
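A short Python sketch of the static (fixed-length) scheme used in QUES 3; it is illustrative only, and the first-appearance ordering of symbols is an assumption chosen so that it reproduces the slide's assignment 000=R, 001=A, 010=M, 011=H, 100=I.

from math import ceil, log2

def build_fixed_length_code(message):
    """Static coding: every distinct character gets the same number of bits,
    N = ceil(log2(M)) where M is the number of distinct characters."""
    symbols = sorted(set(message), key=message.index)   # order of first appearance
    n_bits = max(1, ceil(log2(len(symbols))))
    return {sym: format(i, f"0{n_bits}b") for i, sym in enumerate(symbols)}, n_bits

code, n_bits = build_fixed_length_code("RAMRAHIM")
print(code)                                   # {'R': '000', 'A': '001', 'M': '010', 'H': '011', 'I': '100'}
print("".join(code[c] for c in "MIHIR"))      # 010100011100000 (15 bits)
print("saving vs 8-bit ASCII:", 5 * 8 - 5 * n_bits, "bits")   # 25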
DYNAMIC CODING SCHEME: COMPUTERS ENCODE CHARACTERS IN ASCII, SO A FILE OF 100 CHARACTERS REQUIRES 800 BITS. BUT IN ANY TEXT FILE SOME CHARACTERS OCCUR MORE FREQUENTLY THAN OTHERS, SO IT IS BETTER TO ASSIGN SHORTER BIT CODES TO THE FREQUENTLY OCCURRING CHARACTERS. THIS WAS REALIZED LONG AGO BY SAMUEL MORSE, AND THE SAME CONCEPT IS USED IN DYNAMIC CODING.
DYNAMIC CODING SCHEME: IT USES VARIABLE-SIZE CODES. THE MINIMUM NUMBER OF BITS IS ASSIGNED TO THE MOST FREQUENTLY OCCURRING CHARACTER AND THE MAXIMUM NUMBER OF BITS TO THE LEAST FREQUENTLY USED ONES. ANY STATISTICAL MODEL MAY BE USED TO CALCULATE THE FREQUENCY OF OCCURRENCE OF THE CHARACTERS.
DYNAMIC CODING SCHEME - QUES 4: CONSIDER THE MESSAGE "RAAMRAHMMM". FIND THE DISTINCT CHARACTERS AND THEIR FREQUENCIES, GENERATE CODES FOR ALL CHARACTERS USING DYNAMIC CODING, WRITE THE CODE FOR "MAHR" USING THE GENERATED CODES, AND FIND THE SAVING IN BITS. SOLUTION: 4 DISTINCT CHARACTERS: R-2, A-3, M-4 AND H-1; CODES: M-1, A-01, R-001, H-0001; "MAHR" = 1 01 0001 001 = "1010001001" = 10 BITS; SAVING = 32 - 10 = 22 BITS.
USE OF ENTROPY IN CODING: THE ENTROPY FUNCTION IS USED TO DEVELOP AN EFFICIENT CODE FOR THE PURPOSE OF COMMUNICATION. ONE CAN USE ENTROPY TO FIND THE SCOPE FOR FURTHER REFINEMENT OF A CODING SCHEME, SINCE THE ENTROPY OF THE MESSAGE GIVES THE AVERAGE NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER.
USE OF ENTROPY IN CODING - QUES 5: CONSIDER A MESSAGE STREAM CONSISTING OF THE CHARACTERS A, B, C, D, WITH PROBABILITIES OF OCCURRENCE 0.70, 0.15, 0.10 AND 0.05 RESPECTIVELY (Si RESPECTIVELY 0.51, 2.74, 3.32 AND 4.32). (A) FIND THE MINIMUM NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER USING STATIC CODING, IF A MESSAGE CONSISTS OF ALL 4 CHARACTERS. (B) GENERATE CODES FOR THE CHARACTERS USING THE DYNAMIC SCHEME; WHAT IS THE AVERAGE NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER IN THIS CODING SCHEME, IF A MESSAGE CONTAINS 100 CHARACTERS? (C) IS THERE ANY POSSIBILITY OF FURTHER REFINEMENT OF THE CODING SCHEME?
USE OF ENTROPY IN CODING (SOLUTION): M = 4; MINIMUM NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER WITH STATIC CODING = N = log2 4 = 2. USING THE DYNAMIC SCHEME, THE FOLLOWING CODES CAN BE GENERATED: A (0.70) → 1; B (0.15) → 01; C (0.10) → 001; D (0.05) → 0001. AVERAGE NUMBER OF BITS REQUIRED TO COMMUNICATE A MESSAGE OF 100 CHARACTERS = [(70*1)+(15*2)+(10*3)+(5*4)]/100 = 150/100 = 1.5.
USE OF ENTROPY IN CODING (CONTD.): DYNAMIC CODING IS MORE EFFICIENT HERE (1.5 BITS/CHARACTER VERSUS 2 FOR STATIC CODING). ENTROPY = -Σ Pi*log2(Pi) = -(0.7*log2(0.7) + 0.15*log2(0.15) + 0.1*log2(0.1) + 0.05*log2(0.05)) ≈ 1.32, SO THE MINIMUM AVERAGE NUMBER OF BITS REQUIRED TO REPRESENT A CHARACTER IS ABOUT 1.3. THERE IS A DIFFERENCE BETWEEN THE ENTROPY VALUE AND THE NUMBER OF BITS REQUIRED BY BOTH METHODS, THEREFORE FURTHER REFINEMENT OF THE CODING SCHEMES IS POSSIBLE.
LOSSLESS DATA COMPRESSION: ALL ALGORITHMS ATTEMPT TO RE-ENCODE DATA TO REMOVE REDUNDANCY. THIS IMPLIES THAT DATA WITH NO REDUNDANCY CANNOT BE COMPRESSED BY THESE TECHNIQUES WITHOUT SOME LOSS OF INFORMATION.
SHANNON-FANO ALGORITHM: IT USES THE IDEA OF ASSIGNING SHORTER CODES TO MORE FREQUENTLY OCCURRING CHARACTERS. GIVEN BY CLAUDE SHANNON AND R. M. FANO.
SHANNON-FANO ALGORITHM - WHY? CONSIDER A FILE HAVING 40 LETTERS WITH THE GIVEN FREQUENCIES: A:14, B:7, C:10, D:5, E:4. IN ASCII IT TAKES 40*8 = 320 BITS. DECODING ASCII SIMPLY CONSISTS OF BREAKING THE STREAM INTO 8-BIT BYTES AND CONVERTING EACH INTO A CHARACTER, SO IT NEEDS NO ADDITIONAL INFORMATION.
SHANNON-FANO ALGORITHM: VARIABLE-LENGTH ENCODING SCHEMES SUCH AS HUFFMAN AND SHANNON-FANO HAVE THE FOLLOWING PROPERTIES: CODES FOR MORE FREQUENT CHARACTERS ARE SHORTER THAN THOSE FOR LESS PROBABLE CHARACTERS.
SHANNON-FANO ALGORITHM: EACH CODE CAN BE UNIQUELY DECODED. THIS REQUIRES THE PREFIX PROPERTY, I.E. NO CHARACTER'S ENCODING IS A PREFIX OF ANY OTHER. TO SEE WHY THIS PROPERTY IS IMPORTANT, CONSIDER "A" ENCODED AS 0, "B" AS 01 AND "C" AS 10: IF THE DECODER ENCOUNTERS THE BIT-STREAM "0010", IS IT "ABA" OR "AAC"?
SHANNON-FANO ALGORITHM: WITH THE PREFIX GUARANTEE, THERE IS NO AMBIGUITY IN DETERMINING WHERE THE CHARACTER BOUNDARIES ARE. ONE STARTS READING FROM THE BEGINNING AND GATHERS BITS IN SEQUENCE UNTIL A MATCH IS FOUND; THAT INDICATES THE END OF A CHARACTER, AND ONE MOVES ALONG TO THE NEXT CHARACTER.
SHANNON-FANO ALGORITHM (STEPS): 1. FIND THE FREQUENCY OF OCCURRENCE OF EACH SYMBOL. 2. SORT THE SYMBOLS IN DESCENDING ORDER OF FREQUENCY. 3. DIVIDE THE LIST INTO 2 PARTS, WITH THE TOTAL FREQUENCY COUNT OF THE UPPER HALF BEING AS CLOSE TO THAT OF THE BOTTOM HALF AS POSSIBLE. 4. REPEAT STEP 3 ON EACH HALF UNTIL EACH HALF CONTAINS JUST ONE SYMBOL. 5. CONSTRUCT THE BINARY TREE (SF TREE) SO THAT THE UPPER HALF BECOMES THE LEFT SUB-TREE AND THE LOWER HALF BECOMES THE RIGHT SUB-TREE; EACH LEFT BRANCH IS ASSIGNED 0 AND EACH RIGHT BRANCH 1. 6. TO OBTAIN THE CODE FOR ANY SYMBOL, CONCATENATE ALL THE DIGITS ON THE PATH FROM THE ROOT TO THAT LEAF (SYMBOL). QUES: APPLY THE SF ALGORITHM TO A TEXT FILE HAVING 40 CHARACTERS WITH THE GIVEN FREQUENCIES: A-14, B-7, C-10, D-5, E-4.
SHANNON-FANO ALGORITHM - STEP 1, SYMBOL FREQUENCIES: A-14, B-7, C-10, D-5, E-4.
STEP 2, SORTED IN DESCENDING ORDER: A-14, C-10, B-7, D-5, E-4.
STEP 3, FIRST ITERATION (DIVIDING INTO PARTS): {A 14, C 10} (TOTAL 24) | {B 7, D 5, E 4} (TOTAL 16).
SECOND ITERATION: {A} | {C} IN THE UPPER HALF, AND {B} | {D 5, E 4} IN THE LOWER HALF.
THIRD ITERATION: {D} | {E}. AFTER THIS ITERATION EVERY HALF CONTAINS ONLY ONE SYMBOL.
SHANNON-FANO TREE:
root (40)
  0: (24)
    0: A (14)
    1: C (10)
  1: (16)
    0: B (7)
    1: (9)
      0: D (5)
      1: E (4)
OBTAINING THE CODES FROM THE TREE:
SYMBOL  CODE  NO OF BITS  FREQUENCY
A       00    2           14
C       01    2           10
B       10    2           7
D       110   3           5
E       111   3           4
TOTAL NUMBER OF BITS NEEDED FOR THE TEXT = 89, SO THE AVERAGE NUMBER OF BITS USED PER SYMBOL = 89/40 = 2.225, WHICH IS QUITE LOW COMPARED TO THE 8 BITS PER SYMBOL NEEDED IN ASCII.
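The construction above can be written as a small recursive Python function. This is an illustrative sketch, not code from the deck: the split point is chosen so that the totals of the two halves are as equal as possible (step 3), and for the A/B/C/D/E example it reproduces the 89-bit total.

def shannon_fano(freqs):
    """Shannon-Fano coding: sort by descending frequency, split into two halves of
    nearly equal total count, prefix the halves with 0 and 1, and recurse."""
    def split(items):
        if len(items) <= 1:
            return {items[0][0]: ""} if items else {}
        total = sum(f for _, f in items)
        upper_sum, best_cut, best_diff = 0, 1, float("inf")
        for i in range(1, len(items)):
            upper_sum += items[i - 1][1]
            diff = abs(total - 2 * upper_sum)     # how unbalanced this split would be
            if diff < best_diff:
                best_diff, best_cut = diff, i
        codes = {s: "0" + c for s, c in split(items[:best_cut]).items()}
        codes.update({s: "1" + c for s, c in split(items[best_cut:]).items()})
        return codes

    items = sorted(freqs.items(), key=lambda kv: -kv[1])
    return split(items)

freqs = {"A": 14, "B": 7, "C": 10, "D": 5, "E": 4}
codes = shannon_fano(freqs)
print(codes)                                                     # {'A': '00', 'C': '01', 'B': '10', 'D': '110', 'E': '111'}
print(sum(freqs[s] * len(c) for s, c in codes.items()), "bits")  # 89 bits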
HUFFMAN ALGORITHM GIVEN BY DAVID HUFFMAN IMPROVEMENT OVER S-F ALGO. LOSSLESS COMP ALGO, IDEAL FOR  COMPRESSING TEXT OR ...
HUFFMAN ALGORITHM HUFFMAN CALLED THESE  “MINIMUM REDUNDANCY CODES” IT BELONGS TO THE FAMILY OF  ALGOS WITH A VARIABLE CO...
HUFFMAN ALGORITHM MAIN DIFFERENCE: S-F ALGO BUILDS THE BINARY TREE  FROM TOP TO BOTTOM, WHEREAS  HUFFMAN’S ALGO FORMS TH...
HUFFMAN ALGORITHM1. COUNT THE NO OF CHARS AND THE   FREQ OF OCCURANCE OF EACH   CHARACTER3. ARRANGE THEM IN THE DESCENDING...
HUFFMAN ALGORITHM CONSTRUCTION OF HUFFMAN TREE:2.   PICK UP 2 CHARS FROM THE LIST HAVING     MINIMUM FREQ. LET US CALL TH...
HUFFMAN ALGORITHM1.   DELETE A AND B FROM THE LIST3.   ADD THE VALUE OF “AB” TO THE LIST5.   REPEAT THE STEPS 1 TO 5 TILL ...
HUFFMAN ALGORITHM PROBLEM: LET A MESSAGE OF 100 CHARS  CONTAIN THE FOLLOWING:     CHAR           FREQUENCY     A         ...
HUFFMAN ALGORITHM    CONSTRUCTION OF HUFFMAN TREE:2.   TWO CHARS HAVING MINIMUM FREQ ARE D     &E2    MAKE D AND E 2 FREE...
HUFFMAN ALGORITHM4 & 5: DELETE D & E FROM THE LIST AND ADD  DE AND REPEAT                 30    CDE            15         ...
HUFFMAN ALGORITHMCONTD:              CDEBA              100         50          50     CDEB            A
HUFFMAN ALGORITHM H TREE:                          CDEBA                         100                    0               1...
HUFFMAN ALGORITHMCHARACTER        HUFFMAN CODE      SIZE      A               0             1      B               11     ...
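A compact heap-based Python sketch of the bottom-up construction described above (illustrative, not the deck's code). The exact 0/1 patterns depend on how ties between equal frequencies are broken, but the code lengths, and therefore the 195-bit total for this example, come out the same.

import heapq

def huffman_codes(freqs):
    """Huffman coding: repeatedly merge the two lowest-frequency nodes into a
    parent whose frequency is their sum, then read 0/1 labels off the tree."""
    heap = [[f, i, sym] for i, (sym, f) in enumerate(freqs.items())]
    heapq.heapify(heap)
    counter = len(heap)                       # tie-breaker so payloads are never compared
    while len(heap) > 1:
        lo = heapq.heappop(heap)
        hi = heapq.heappop(heap)
        heapq.heappush(heap, [lo[0] + hi[0], counter, [lo, hi]])
        counter += 1
    codes = {}
    def walk(node, prefix):
        payload = node[2]
        if isinstance(payload, list):         # internal node: [left, right]
            walk(payload[0], prefix + "0")
            walk(payload[1], prefix + "1")
        else:                                 # leaf: a symbol
            codes[payload] = prefix or "0"
    walk(heap[0], "")
    return codes

freqs = {"A": 50, "B": 20, "C": 15, "D": 10, "E": 5}
codes = huffman_codes(freqs)
print(codes)                                                     # e.g. {'A': '0', 'B': '10', 'C': '110', 'E': '1110', 'D': '1111'}
print(sum(freqs[s] * len(c) for s, c in codes.items()), "bits")  # 195 bits, i.e. 1.95 bits/character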
HUFFMAN ALGORITHM HUFFMAN CODING CAN BE FURTHER  OPTIMIZED: EXTENDED HUFFMAN COMPRESSION-  CAN ENCODE GROUP OF SYMBOLS  ...
ARITHMETIC CODING IT IS A METHOD OF WRITING A CODE IN A  NON-INTEGER LENGTH IT ALLOWS ONE TO CODE VERY CLOSE TO  IDEAL E...
ARITHMETIC CODING IT IS QUITE SIMILAR TO HUFFMAN, BECAUSE  IT IS USED FOR THE SAME KIND OF  CONTENTS TO COMPRESS AS IN HU...
ARITHMETIC CODING THE OUTPUT FROM AN ARITHMETIC CODING  PROCESS IS A SINGLE NUMBER LESS THAN 1  AND GREATER THAN OR EQUAL...
ARITHMETIC CODING - STEP 1: ENCODING REQUIRES 5 VARIABLES: RANGE, LOW, HIGH, RF (RANGE FROM) AND RT (RANGE TO). EXAMPLE: LET THE MESSAGE BE "ABCBAA". THE FREQUENCIES OF THE CHARACTERS A, B AND C ARE 3, 2 AND 1 RESPECTIVELY. A TABLE IS FIRST CREATED:
CHAR  PROB  RANGE              RANGE FROM (RF)  RANGE TO (RT)
A     0.50  >=0 AND <0.5       0.00             0.50
B     0.33  >=0.5 AND <0.83    0.50             0.83
C     0.16  >=0.83 AND <1      0.83             1.00
ARITHMETIC CODING - STEP 2: NOW ENCODE THE CHARACTERS IN THE MESSAGE USING THE TABLE OBTAINED IN STEP 1, AND CREATE A TABLE AGAIN:
SET LOW = 0 AND HIGH = 1.0
WHILE THERE ARE STILL INPUT SYMBOLS DO
    GET AN INPUT SYMBOL
    RANGE = HIGH(previous) - LOW(previous)
    LOW   = LOW(previous) + RANGE * RF(current symbol)
    HIGH  = LOW(previous) + RANGE * RT(current symbol)
END WHILE
OUTPUT = LOW
CHAR          RANGE                               LOW                                   HIGH
START (NONE)  1 - 0 = 1                           0                                     1
A             1 - 0 = 1                           0 + 1*0.00 = 0                        0 + 1*0.5 = 0.5
B             0.5 - 0 = 0.5                       0 + 0.5*0.5 = 0.25                    0 + 0.5*0.83 = 0.415
C             0.415 - 0.25 = 0.165                0.25 + 0.165*0.83 = 0.38695           0.25 + 0.165*1.0 = 0.415
B             0.415 - 0.38695 = 0.02805           0.38695 + 0.02805*0.5 = 0.400975      0.38695 + 0.02805*0.83 = 0.4102315
A             0.4102315 - 0.400975 = 0.0092565    0.400975 + 0.0092565*0.0 = 0.400975   0.400975 + 0.0092565*0.5 = 0.40560325
A             0.40560325 - 0.400975 = 0.00462825  0.400975 + 0.00462825*0.0 = 0.400975  0.400975 + 0.00462825*0.5 = 0.403289125
ARITHMETIC CODING: THE FINAL OUTPUT VALUE = LOW = 0.400975. STEP 3: TO DECODE THE MESSAGE, I.E. TO GET THE CHARACTERS BACK, AGAIN 5 VARIABLES ARE REQUIRED: RANGE, RF, RT, VALUE AND RD (RANGE DIFFERENCE). 1. OUTPUT THE SYMBOL BY DETERMINING IN WHICH RANGE THE VALUE LIES; HERE THE OUTPUT IS 0.400975, WHICH LIES BETWEEN 0 AND 0.5, SO THE FIRST CHARACTER DECODED IS "A". 2. GET A NEW VALUE USING RD = RT - RF AND NEW VALUE = (PREVIOUS VALUE - PREVIOUS RF) / PREVIOUS RD, AND REPEAT.
VALUE        RANGE          CHAR DECODED  RD
0.400975     0.00 - <0.50   A             0.50
0.80195      0.50 - <0.83   B             0.33
0.915        0.83 - <1.00   C             0.16
0.53125      0.50 - <0.83   B             0.33
0.09469696   0.00 - <0.50   A             0.50
0.18939392   0.00 - <0.50   A             0.50
ADVANTAGE: BETTER RESULT THAN HUFFMAN. DISADVANTAGES: 1. COMPLICATED CALCULATIONS; 2. IT REQUIRES AN FPU, SO THE PROCESS IS SLOW; 3. THE DECODER DOES NOT KNOW WHERE THE DECODING PROCESS SHOULD END - TO OVERCOME THIS, ONE SPECIAL CHARACTER IS INSERTED INTO THE ENCODED TEXT AS A DELIMITER; AT DECODING TIME IT INDICATES THAT THERE ARE NO MORE CHARACTERS TO DECODE.
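The encode/decode loops above, written as a short Python sketch (illustrative, not the deck's code). Two small assumptions: the probability table is 0.5/0.33/0.17 so the intervals are exactly the 0.00-0.50 / 0.50-0.83 / 0.83-1.00 ranges used above, and the encoder emits the midpoint of the final interval rather than LOW, so that the floating-point demo decodes cleanly (the slide itself emits LOW = 0.400975).

def build_table(probabilities):
    """Cumulative [range_from, range_to) interval for each symbol."""
    table, low = {}, 0.0
    for sym, p in probabilities.items():
        table[sym] = (low, low + p)
        low += p
    return table

def arithmetic_encode(message, table):
    low, high = 0.0, 1.0
    for sym in message:
        rng = high - low
        rf, rt = table[sym]
        high = low + rng * rt
        low = low + rng * rf
    return (low + high) / 2       # any value in [low, high) identifies the message

def arithmetic_decode(value, table, length):
    out = []
    for _ in range(length):
        for sym, (rf, rt) in table.items():
            if rf <= value < rt:
                out.append(sym)
                value = (value - rf) / (rt - rf)   # RD = RT - RF, as in step 3 above
                break
    return "".join(out)

table = build_table({"A": 0.5, "B": 0.33, "C": 0.17})
code = arithmetic_encode("ABCBAA", table)
print(code)                                  # ~0.402, inside the final interval [0.400975, 0.403289125)
print(arithmetic_decode(code, table, 6))     # ABCBAA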
Compression ratio: ONE NEEDS TO KNOW IT TO FIND OUT THE EFFICIENCY OF A COMPRESSION ALGORITHM. C.R. = (SIZE OF ORIGINAL DATA - SIZE OF COMPRESSED DATA) / SIZE OF ORIGINAL DATA.
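A one-line check of the formula, applied to the Shannon-Fano example from earlier (40 ASCII characters = 320 bits, compressed to 89 bits); illustrative only.

def compression_ratio(original_size, compressed_size):
    """Compression ratio as defined above: the fraction of the original size saved."""
    return (original_size - compressed_size) / original_size

print(compression_ratio(320, 89))    # ~0.72, i.e. roughly 72% of the bits are saved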
DICTIONARY-BASED COMPRESSION TECHNIQUES: STATISTICAL METHODS, SUCH AS S-F AND HUFFMAN, ENCODE A SINGLE SYMBOL AT A TIME BY GENERATING A ONE-TO-ONE SYMBOL-TO-CODE MAP. A DICTIONARY-BASED COMPRESSOR INSTEAD REPLACES AN OCCURRENCE OF A PARTICULAR PHRASE OR GROUP OF BYTES IN A PIECE OF DATA WITH A REFERENCE TO A PREVIOUS OCCURRENCE OF THAT PHRASE.
DICTIONARY-BASED COMPRESSION TECHNIQUES: SUPPOSE A TEXT IS GIVEN. IT IS ASSUMED THAT THERE IS A DICTIONARY THAT HAS ALL THE WORDS IN THE GIVEN TEXT. EACH WORD IN THE DICTIONARY IS REPRESENTED BY A UNIQUE NUMBER THAT ALSO INDICATES THE POSITION OR INDEX OF THE WORD IN THE DICTIONARY.
DICTIONARY-BASED COMPRESSION TECHNIQUES: WHEN THE TEXT IS TO BE COMPRESSED, THE WORDS OF THE TEXT ARE REPLACED BY THE INDEX OF THAT WORD. LET THE TEXT BE "LEARN THE DICTIONARY BASED COMPRESSION METHOD. IT IS A VERY SIMPLE METHOD. THANK YOU." AND LET THE DICTIONARY BE: LEARN-1, THE-2, DICTIONARY-3, BASED-4, COMPRESSION-5, METHOD-6, IT-7, IS-8, A-9, VERY-10, SIMPLE-11, THANK YOU-12. THE ENCODED MESSAGE WILL BE "1 2 3 4 5 6 7 8 9 10 11 6 12".
DICTIONARY-BASED COMPRESSION TECHNIQUES: IF PHRASES ARE USED, EFFICIENCY INCREASES. FOR THIS TO WORK, IT IS IMPORTANT THAT THE SENDER AND THE RECEIVER HAVE ACCESS TO THE SAME DICTIONARY. DICTIONARY-BASED METHODS ARE MORE EFFICIENT THAN CHARACTER-BASED METHODS, SINCE CODES ARE GENERATED FOR CHARACTERS AS WELL AS FOR FREQUENTLY USED WORDS AND PHRASES.
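A sketch of the static word-dictionary idea from the example above (illustrative, not the deck's code). Punctuation is ignored and "THANK YOU" is treated as the single token THANK-YOU purely to keep the tokenizer to a plain split().

def build_static_dictionary(words):
    """Static dictionary: every known word gets a fixed index, agreed on by
    sender and receiver before any message is sent."""
    return {w: i + 1 for i, w in enumerate(words)}

def dict_encode(text, dictionary):
    # Every word must already be in the dictionary -- the weakness of the static approach.
    return [dictionary[w] for w in text.split()]

def dict_decode(indices, dictionary):
    reverse = {i: w for w, i in dictionary.items()}
    return " ".join(reverse[i] for i in indices)

words = ["LEARN", "THE", "DICTIONARY", "BASED", "COMPRESSION", "METHOD",
         "IT", "IS", "A", "VERY", "SIMPLE", "THANK-YOU"]
d = build_static_dictionary(words)
msg = "LEARN THE DICTIONARY BASED COMPRESSION METHOD IT IS A VERY SIMPLE METHOD THANK-YOU"
print(dict_encode(msg, d))                   # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 6, 12]
print(dict_decode(dict_encode(msg, d), d))   # the original word sequence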
DICTIONARY-BASED COMPRESSION TECHNIQUES: THE DICTIONARY-BASED METHOD MAY BE STATIC OR DYNAMIC, DEPENDING ON HOW THE DICTIONARY IS CREATED AND USED. A STATIC DICTIONARY IS PREPARED BEFORE THE ENCODED MESSAGE IS COMMUNICATED TO THE RECEIVER'S END: ALL POSSIBLE CHARACTERS/WORDS/PHRASES ARE INSERTED INTO THE DICTIONARY AND INDEXED.
DICTIONARY-BASED COMPRESSION TECHNIQUES: THE MAIN DRAWBACK OF THE STATIC METHOD IS THAT PERFORMANCE DEPENDS ON THE TEXT TO BE ENCODED AND IS HIGHLY DEPENDENT ON THE ORGANIZATION OF THE CHARACTERS/WORDS/PHRASES IN THE DICTIONARY. SECONDLY, IF THERE IS ANY WORD NOT IN THE DICTIONARY, IT FAILS. THE SOLUTION TO THESE PROBLEMS IS DYNAMIC DICTIONARY COMPRESSION.
DICTIONARY-BASED COMPRESSION TECHNIQUES: IN THE DYNAMIC METHOD, THE DICTIONARY IS BUILT AT THE TIME THE TEXT IS ENCODED. THE LZ77, LZ78 AND LZW TECHNIQUES USE DYNAMIC DICTIONARY COMPRESSION. IT GENERATES OPTIMUM-SIZE CODES.
LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE: DESIGNED FOR SEQUENTIAL DATA COMPRESSION. THE DICTIONARY IS A PORTION OF THE PREVIOUSLY ENCODED SEQUENCE. THE ENCODER EXAMINES THE INPUT SEQUENCE THROUGH A SLIDING WINDOW. THE WINDOW CONSISTS OF 2 PARTS: A SEARCH BUFFER, WHICH CONTAINS A PORTION OF THE RECENTLY ENCODED SEQUENCE, AND A LOOK-AHEAD BUFFER, WHICH CONTAINS THE NEXT PORTION OF THE SEQUENCE TO BE ENCODED.
LZ77 EXAMPLE WINDOW: THE WINDOW CONTAINS THE SEQUENCE c a b r a c a d a b r a r r a r r, WITH THE SEARCH BUFFER HOLDING THE ALREADY ENCODED PART AND THE LOOK-AHEAD BUFFER HOLDING THE SYMBOLS STILL TO BE ENCODED. 1. TO ENCODE THE SEQUENCE IN THE LOOK-AHEAD BUFFER, THE ENCODER MOVES A SEARCH POINTER BACK THROUGH THE SEARCH BUFFER UNTIL IT ENCOUNTERS A MATCH TO THE FIRST SYMBOL IN THE LOOK-AHEAD BUFFER. THE DISTANCE OF THE POINTER FROM THE LOOK-AHEAD BUFFER IS CALLED THE OFFSET.
2. THE ENCODER THEN EXAMINES THE SYMBOLS FOLLOWING THE SYMBOL AT THE POINTER LOCATION TO SEE IF THEY MATCH CONSECUTIVE SYMBOLS IN THE LOOK-AHEAD BUFFER. THE NUMBER OF CONSECUTIVE SYMBOLS IN THE SEARCH BUFFER THAT MATCH CONSECUTIVE SYMBOLS IN THE LOOK-AHEAD BUFFER, STARTING WITH THE FIRST SYMBOL, IS CALLED THE LENGTH OF THE MATCH. THE ENCODER SEARCHES THE SEARCH BUFFER FOR THE LONGEST MATCH.
3. ONCE THE LONGEST MATCH HAS BEEN FOUND, THE ENCODER ENCODES IT WITH A TRIPLE <o, l, c>, WHERE o IS THE OFFSET, l IS THE LENGTH OF THE MATCH AND c IS THE CODEWORD CORRESPONDING TO THE SYMBOL IN THE LOOK-AHEAD BUFFER THAT FOLLOWS THE MATCH. IN THE DIAGRAM, THE LONGEST MATCH BEGINS AT AN a IN THE SEARCH BUFFER; THE OFFSET o IN THIS CASE IS 7, l IS 4, AND THE SYMBOL IN THE LOOK-AHEAD BUFFER FOLLOWING THE MATCH IS r.
THE REASON FOR SENDING THE THIRD ELEMENT OF THE TRIPLE IS TO TAKE CARE OF THE SITUATION WHERE NO MATCH FOR THE FIRST SYMBOL OF THE LOOK-AHEAD BUFFER CAN BE FOUND IN THE SEARCH BUFFER: IN THAT CASE THE OFFSET AND MATCH-LENGTH VALUES ARE SET TO 0 AND THE SYMBOL ITSELF IS SENT AS THE THIRD ELEMENT.
LZ77 DECODING: TAKE A BUFFER OF THE SAME SIZE AS USED IN ENCODING, SAY n. FOR EACH TRIPLE <o, l, c>, MOVE BACK o POSITIONS FROM THE END OF THE OUTPUT DECODED SO FAR, COPY l CONSECUTIVE SYMBOLS FROM THERE, AND THEN APPEND THE SYMBOL c.
LZ77 PROVIDES A VERY GOOD COMPRESSION RATIO FOR MANY TYPES OF DATA, BUT ENCODING IS SLOW BECAUSE OF THE SEARCH FOR THE LONGEST MATCH, WHILE DECODING IS FAST.
ALL POPULAR ARCHIVERS (.ARJ, .LHA, .ZIP, .ZOO) ARE VARIATIONS ON THE LZ77 THEME. DRAWBACK: LZ77 IMPLICITLY ASSUMES THAT REPEATED PATTERNS OCCUR CLOSE TOGETHER; ANY STRING THAT RECURS OUTSIDE THE REACH OF THE SEARCH BUFFER CANNOT BE EXPLOITED.
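A minimal (and deliberately slow) Python sketch of the LZ77 encoder and decoder described above; the buffer sizes and the test string are arbitrary assumptions, and the output triples are (offset, length, next character) as in the slides.

def lz77_encode(data, search_size=7, lookahead_size=6):
    """LZ77: slide a window over the data, emitting (offset, length, next_char) triples."""
    i, out = 0, []
    while i < len(data):
        best_off, best_len = 0, 0
        for j in range(max(0, i - search_size), i):        # candidate match starts in the search buffer
            length = 0
            while (length < lookahead_size - 1 and i + length < len(data)
                   and data[j + length] == data[i + length]):
                length += 1                                 # the match may run on into the look-ahead buffer
            if length > best_len:
                best_off, best_len = i - j, length
        next_char = data[i + best_len] if i + best_len < len(data) else ""
        out.append((best_off, best_len, next_char))
        i += best_len + 1
    return out

def lz77_decode(triples):
    data = []
    for offset, length, next_char in triples:
        start = len(data) - offset
        for k in range(length):
            data.append(data[start + k])                    # copy; overlapping copies are allowed
        if next_char:
            data.append(next_char)
    return "".join(data)

msg = "cabracadabrarrarrad"
triples = lz77_encode(msg)
print(triples)
assert lz77_decode(triples) == msg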
LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE: TO OVERCOME SUCH PROBLEMS, THE LZ78 ALGORITHM WAS GIVEN. THE ONLY DIFFERENCE HERE IS THAT LZ78 DOES NOT USE A SLIDING WINDOW; INSTEAD IT KEEPS AN EXPLICIT DICTIONARY OF THE PHRASES SEEN SO FAR, SO A MATCH CAN REFER TO ANY EARLIER PHRASE, HOWEVER FAR BACK IT OCCURRED.
LZ78: THE DICTIONARY IN LZ78 IS A TABLE OF STRINGS. EVERY STRING IS ASSIGNED A CODEWORD (AN INDEX) ACCORDING TO ITS ENTRY NUMBER IN THE DICTIONARY.
LZ78 DEFINITIONS - CODEWORD: A BASIC DATA ELEMENT IN THE CODE STREAM; IT REPRESENTS A STRING FROM THE DICTIONARY BY THAT STRING'S INDEX.
LZ78 DEFINITIONS - CURRENT PREFIX (P): THE PREFIX CURRENTLY BEING PROCESSED IN THE ENCODING ALGORITHM. CURRENT CHARACTER (C): THE CHARACTER FOLLOWING THE CURRENT PREFIX.
LZ78 ENCODING PROCESS: IT STARTS WITH A NEW DICTIONARY, I.E. AT THE BEGINNING OF ENCODING THE DICTIONARY IS EMPTY, AND SO IS THE CURRENT PREFIX P. THE ENCODER READS THE NEXT CHARACTER C.
LZ78 ENCODING PROCESS: IF THE CORRESPONDING STRING (P + C) IS PRESENT IN THE DICTIONARY, THE PREFIX IS EXTENDED TO P + C AND THE NEXT CHARACTER IS READ; IF NOT, THE ENCODER OUTPUTS THE PAIR (CODEWORD OF P, C).
LZ78 ENCODING PROCESS: THEN ONE ADDS THE WHOLE STRING (P + C) TO THE DICTIONARY AND STARTS PROCESSING THE NEXT PART OF THE INPUT WITH AN EMPTY PREFIX P.
LZ78 ENCODING PROCESS: THE OUTPUT FROM THIS ALGORITHM IS A SEQUENCE OF CODEWORD-CHARACTER PAIRS (W, C). EACH TIME A PAIR IS OUTPUT, A NEW STRING IS ADDED TO THE DICTIONARY.
LZ78 ENCODING ALGORITHM: 1. START WITH AN EMPTY DICTIONARY AND AN EMPTY PREFIX P. 2. C = NEXT CHARACTER IN THE CHAR STREAM. 3. IF (P + C) IS IN THE DICTIONARY, SET P = P + C; OTHERWISE OUTPUT THE PAIR (CODEWORD OF P, C), ADD (P + C) TO THE DICTIONARY, AND SET P = EMPTY. 4. REPEAT FROM STEP 2 UNTIL THE END OF THE CHAR STREAM; IF P IS NOT EMPTY AT THE END, OUTPUT ITS CODEWORD.
LZ78 DECODING PROCESS: AT THE START OF DECODING THE DICTIONARY IS EMPTY; IT GETS RECONSTRUCTED IN THE PROCESS OF DECODING: FOR EACH PAIR (W, C), THE DECODER OUTPUTS THE DICTIONARY STRING FOR W FOLLOWED BY C AND ADDS THAT WHOLE STRING TO THE DICTIONARY.
LZ78 DECODING ALGORITHM: 1. AT THE START THE DICTIONARY IS EMPTY. 2. W = NEXT CODEWORD IN THE CODE STREAM, C = THE CHARACTER FOLLOWING IT. 3. OUTPUT THE STRING FOR W FOLLOWED BY C, AND ADD IT TO THE DICTIONARY. 4. REPEAT FROM STEP 2 UNTIL THE END OF THE CODE STREAM.
LZ78 ENCODING PROCESS - EXAMPLE: LET THE CHAR STREAM TO BE ENCODED BE: POS 1 2 3 4 5 6 7 8 9, CHAR A B B C B C A B A.
THE ENCODING STEPS:
STEP  POS   DICTIONARY  OUTPUT
1     1     A           (0, A)
2     2     B           (0, B)
3     3-4   BC          (2, C)
4     5-7   BCA         (3, A)
5     8-9   BA          (2, A)
THE COLUMN "DICTIONARY" SHOWS WHAT STRING HAS BEEN ADDED TO THE DICTIONARY (ITS INDEX EQUALS THE STEP NUMBER); THE COLUMN "OUTPUT" SHOWS THE OUTPUT PAIRS (W, C).
LZ78 DECODING PROCESS - THE DECODING PROCESS:
STEP  OUTPUT PAIR  TEXT GENERATED  DICTIONARY ENTRY
1     (0, A)       A               1: A
2     (0, B)       B               2: B
3     (2, C)       BC              3: BC
4     (3, A)       BCA             4: BCA
5     (2, A)       BA              5: BA
THE DECODED TEXT IS "ABBCBCABA", THE SAME AS THE ORIGINAL CHAR STREAM.
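The encoding and decoding algorithms above in a few lines of Python (an illustrative sketch). Run on the slide's stream ABBCBCABA it reproduces the pairs (0,A)(0,B)(2,C)(3,A)(2,A) and decodes them back to the original text.

def lz78_encode(data):
    """LZ78: output (index of longest prefix already in the dictionary, next character)
    pairs, adding prefix+character to the dictionary each time a pair is emitted."""
    dictionary, out, p = {}, [], ""
    for c in data:
        if p + c in dictionary:
            p += c                              # extend the current prefix
        else:
            out.append((dictionary.get(p, 0), c))
            dictionary[p + c] = len(dictionary) + 1
            p = ""
    if p:                                       # input ended in the middle of a known phrase
        out.append((dictionary[p], ""))
    return out

def lz78_decode(pairs):
    dictionary, out = {0: ""}, []
    for index, c in pairs:
        phrase = dictionary[index] + c
        out.append(phrase)
        dictionary[len(dictionary)] = phrase    # rebuild the dictionary while decoding
    return "".join(out)

msg = "ABBCBCABA"
pairs = lz78_encode(msg)
print(pairs)                        # [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A')]
assert lz78_decode(pairs) == msg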
LZW ALGORITHM: LZW WORKS BY ENTERING PHRASES INTO A DICTIONARY AND THEN, WHEN A REPEAT OCCURRENCE OF THAT PARTICULAR PHRASE IS FOUND, OUTPUTTING THE DICTIONARY INDEX INSTEAD OF THE PHRASE.
LZW ALGORITHM: EACH TIME A NEW CODE IS GENERATED, IT MEANS A NEW STRING HAS BEEN SELECTED FROM THE INPUT STREAM. NEW STRINGS ARE FORMED BY APPENDING THE CURRENT CHARACTER K TO THE END OF AN EXISTING STRING W.
LZW ALGORITHM (PSEUDOCODE):
SET W = NIL
LOOP
    READ A CHARACTER K
    IF WK EXISTS IN THE DICTIONARY
        W = WK
    ELSE
        OUTPUT THE CODE FOR W
        ADD WK TO THE DICTIONARY
        W = K
END LOOP
OUTPUT THE CODE FOR W
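A runnable version of the pseudocode above (a sketch, not the deck's code). Unlike LZ78, the dictionary is pre-loaded with all 256 single-byte characters, so only indices are output; the decoder includes the standard special case for a code that refers to the entry it is still building.

def lzw_encode(data):
    """LZW: grow the current string W while W+K is in the dictionary; otherwise
    output the code for W, add W+K to the dictionary and restart from K."""
    dictionary = {chr(i): i for i in range(256)}
    next_code, w, out = 256, "", []
    for k in data:
        if w + k in dictionary:
            w += k
        else:
            out.append(dictionary[w])
            dictionary[w + k] = next_code
            next_code += 1
            w = k
    if w:
        out.append(dictionary[w])
    return out

def lzw_decode(codes):
    dictionary = {i: chr(i) for i in range(256)}
    next_code = 256
    w = dictionary[codes[0]]
    out = [w]
    for code in codes[1:]:
        entry = dictionary[code] if code in dictionary else w + w[0]   # not-yet-defined code
        out.append(entry)
        dictionary[next_code] = w + entry[0]
        next_code += 1
        w = entry
    return "".join(out)

msg = "WABBAWABBAWABBAWABBAWOOWOOWOO"
codes = lzw_encode(msg)
print(codes)
assert lzw_decode(codes) == msg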
JPEG DESIGNED FOR COMPRESSING FULL COLOUR  OR GRAY SCALE DIGITAL IMAGES OF REAL-  WORLD SCENES. IT DOES NOT WORK WELL ON...
JPEG IT DOES NOT HANDLE B&W (1 BIT PER PIXEL)  IMAGES NOR DOES IT HANDLE MOTION  PICTURE COMPRESSION. IT USES LOSSY TECH...
JPEG IT IS INTENDED FOR COMPRESSING IMAGES  THAT WILL BE LOOKED AT BY HUMANS. THE JPEG STANDARD INCLUDES A  SEPARATE LOS...
JPEG DECODERS CAN TRADE-OFF DECODING SPEED  AGAINST IMAGE QUALITY BY USING FAST BUT  INACCURATE APPROXIMATIONS TO THE REQ...
MPEG THE MAIN ADVANTAGE IS THAT IT  COMPRESSES DATA UPTO 1.5 MBITS/SECOND  WHICH IS EQUAL TO CDROM DATA  TRANSFER RATE. ...
MPEG JPEG IS FOR STILL IMAGE COMPRESSION  WHEREAS MPEG IS FOR MOVING PICTURES. BUT AS DIGITAL VIDEO OR MOVIES STORE A  S...
MPEG ASYMMETRIC COMPRESSION REQUIRES  MORE EFFORT FOR CODING THAN  DECODING. IN THIS CASE, COMPRESSION IS  CARRIED OUT ON...
MPEG INTERACTIVE DIALOGUE APPLICATIONS  MAKE USE OF THIS ENCODING TECHNIQUE,  WHERE RESTRICTED END-TO-END DELAY IS  REQUI...
MHEG MULTIMEDIA HYPERMEDIA EXPERT GROUP SET UP BY ISO FOR STANDARDIZATION OF EXCHANGE  FORMAT FOR MULTIMEDIA PRESENTATIO...
MHEG IT IS BASED ON OBJECT ORIENTED TECHNOLOGY. FOR MM PRESENTATION, THERE ARE MANY CLASSES  THAT DEFINE HOW AUDIO, VIDE...
MHEG CONTENT CLASS IS USED TO DESCRIBE THE  ACTUAL CONTENTS OF THE MM  PRESENTATION BEHAVIOUR CLASS IS USED TO DECIDE TH...
MHEG THIS CLASS DESCRIBES THE ELEMENTS OF  THE USER INTERFACE (ie., THE ELEMENTS  THAT APPEAR ON THE USER SCREEN) THAT  A...
    1. 1. TOPICS DATA COMPRESSION COMPRESSION TECHNIQUES LOSSLESS COMPRESSION LOSSY COMPRESSION AUDIO COMPRESSION VIDEO COMPRESSION MPEG COMPRESSION JPEG COMPRESSION LOSSLESS VS. LOSSY COMPRESSION ADVANTAGE OF COMPRESSION
    2. 2. DATA COMPRESSIONThe process of reducing the volume of data by applying a compression technique is called compression.The resulting data is called compressed data.
    3. 3. DATA COMPRESSIONThe reverse process of reproducing the original data from compressed data is called decompression.The resulting data is called decompressed data.
    4. 4. Reasons to CompressReduce File SizeSave disk spaceIncrease transfer speed at a given data rateAllow real-time transfer at a given data rate
    5. 5. Types of compression techniquesCompression techniques can be categorized based on following consideration: Lossless or lossy Symmetrical or asymmetrical Software or hardware
    6. 6. Types of compression techniques1. Lossless or lossy If the decompressed data is the same as the original data, it is referred to as lossless compression, otherwise the compression is lossy.
    7. 7. Types of compression techniques2. Symmetrical or asymmetrical In symmetrical compression, the time required to compress and to decompress are roughly the same. In asymmetrical compression, the time taken for compression is usually much longer than decompression.
    8. 8. Types of compression techniques3. Software or hardware A compression technique may be implemented either in hardware or software. As compared to software codecs (coder and decoder), hardware codecs offer better quality and performance.
    9. 9. Basics of Compression
    10. 10. Compression - Types Spatial Compression – Finds similarities in an image and compresses those similarities in a smaller form – Intra-frame Temporal Compression – Finds similarities across images and compresses those similarities in a smaller form – Inter-frame Quality of Compression – Lossless – Lossy
    11. 11. Compression - Spatial Run Length Encoding – Replace a run of consecutive pixels of the same color by a single copy of the color value and a count of the number of pixels Huffman coding – Similar to RLE, but assigns codes of different lengths to colors (most common colors have minimum bits)
    12. 12. Compression - Spatial Dictionary-based coding – Fixed length bits point to a table of variable length colors codes – Basis of LZW and PKZIP All Lossless compression schemes – 50% compression at best
    13. 13. Compression - Spatial GIF – Lossless compression – Best suited for simple images – Reduces colors to reduce file size (256 colors max)
    14. 14. Compression - Spatial JPEG – Joint Photographic Experts Group – Lossy compression – Best suited for photography – Throws data away to further reduce file size
    15. 15. Compression - Temporal Motion JPEG – Most popular for capturing analog video – JPEG on each frame of video – No temporal compression – Special-purpose hardware may be needed for real-time
    16. 16. Compression - Temporal DV – Most popular for storage and capturing digital video – 5:1 compression usually done in hardware (camera) – Spatial and a little temporal compression
    17. 17. Compression - Temporal MPEG – Motion Picture Experts Group – Most popular for delivery of digital video – Temporal and spatial compression – MPEG1, MPEG2, MPEG4 & MPEG 7
    18. 18. BASIC COMPRESSION TECHNIQUESLossless techniquesLossy techniques
    19. 19. Lossless techniques RUN-LENGTH CODING: - repeated symbols in a string are replaced with the symbol and the number of instances it is repeated. example “aaaabbcccccaaaaaabaaaaa” is expressed as “a4b2c5a6b1a1b4”.
    20. 20. Lossless techniquesVARIABLE-LENGTH CODING: - In general, coding schemes for a given character set use a fixed number of bits per character. example bcd, EBCDIC, and ASCII
    21. 21. Run-Length coding Look at compressing same sequence again: ABBBBBBBBBCDEEEEF – Using RLE compression, the compressed file takes up 10 bytes and would look like this: A Ω9BCDΩ4EF – Data size before compression: 17 bytes – Data size after compression: 10 bytes Savings: 17/10 = 1.7
    22. 22. LOSSY TECHNIQUESPREDICTIVE ENCODING: -Stores only the initial sampleSample may be a pixel, line, audio sample, or video frame.
    23. 23. LOSSY TECHNIQUESTRANSFORM ENCODING:-Data is converted from one domain to another.DCT (Discrete cosine transform) encoding is the best example of this method.
    24. 24. Compression FundamentalsLossless – ensures that the data recovered from the compression / decompression process is exactly the same as the original data. – Commonly used to compress executable code, text files, and numeric data.
    25. 25. Compression Fundamentals Lossy – does not promise that the data received is exactly the same as the data sent – removes removes information that it cannot later restore (Hopefully, no one will notice.) – Commonly used to compress digital imagery, including video.
    26. 26. Audio CompressionTechniques
    27. 27. Introduction Digital Audio Compression – Removal of redundant or otherwise irrelevant information from audio signal – Audio compression algorithms are often referred to as “audio encoders” Applications – Reduces required storage space – Reduces required transmission bandwidth
    28. 28. Audio Data Compression Lossless Audio Compression – Removes redundant data – Resulting signal is same as original – perfect reconstruction Lossy Audio Encoding – Removes irrelevant data – Resulting signal is similar to original
    29. 29. Audio compression Audio compression is a form of data compression designed to reduce the size of audio data files. Audio compression can mean two things: Audio data compression Audio level compression
    30. 30. Audio compressionAudio data compression - in which the amount of data in a recorded waveform is reduced for transmission. This is used in MP3 encoding, internet radio, and the like.
    31. 31. Audio compressionAudio level compression - in which the dynamic range (difference between loud and quiet) of an audio waveform is reduced. This is used in guitar effects racks, recording studios, etc.
    32. 32. MPEG Compression
    33. 33. MPEG Components MPEG (motion pictures experts group) is a multimedia standard with specifications for coding, compression and transmission of audio, video and data streams. Video: describes compression of frames Audio: describes compression of audio frames
    34. 34. Audio compression• MPEG audio Mpeg audio is a standard for compression and decompression of digital audio. The coding technique used in mpeg audio standard(known as perceptual coding) takes advantage of this perceptual weakness of human ears (pshychoacoustic phenomena). In perceptual coding, the audio spectrum is divided into a set of narrow frequency bands, to reflect the frequency selectivity of human hearing.
    35. 35. BASIC STEPS OF MPEG AUDIO COMPRESSIONINPUT AUDIO BIT/NOISE ENCODED TIME TO SIGNAL FREQUENCY FILTER ALLOCATION, BIT-STREAM MAPPING BANK QUANTIZER, FORMATTING BIT-STREAM AND CODING PSYCHOACOUSTIC MODEL MPEG AUDIO ENCODING
    36. 36. BASIC STEPS OF MPEG AUDIO COMPRESSIONENCODED FREQUENCY FREQUENCY DECODED AUDIO BIT STREAM SAMPLE TO THEBIT-STREAM UNPACKING RECONSTRUCTION MAPPING SIGNAL MPEG AUDIO DECODING
    37. 37. VIDEO COMPRESSION• Mpeg video MPEG video is a subset of the MPEG standard. Digital video compression may either apply intraframe compression to each individual frame of the video or combine both intraframe and interframe compression.
    38. 38. VIDEO COMPRESSIONMpeg uses both intra-frame and inter-frame techniques for data compression. Mpeg compression is lossy and asymmetric, with the encoding process requiring more than the decoding process.
    39. 39. BASIC STEPS OF MPEG VIDEO COMPRESSIONVIDEO DATA TO BE COMPRESSED PERFORM QUANYIZATION OF DCTCOEFFICIENTS USING A Q-TABLE PREPROCESSING AND COLOR SUBSAMPLING OF INDIVIDUAL FRAMES ORDER THE 2-D OUTPUT IN ZIGZAG SEQUENCE INTERFRAME MOTION COMPENSATION FOR P-FRAME AND B-FRAME APPLY RUN-LENGTH ENCODING TO THE ZIGZAG SEQUENCE DIVIDE EACH FRAME INTO 8X8 PIXEL BLOCKS APPLY VARIABLE LENGTH ENCODING TO THE RESULTING STREAM APPLY DCT TRANSFORMATION TO EACH 8X8 PIXEL BLOCK MPEG COMPRESSED VIDEO STREAM
    40. 40. Three Types of Frames Intra frames (same as JPEG) – typically about 12 frames between I frames Predictive frames – encode from previous I or P reference frame Bi-directional frames – encode from previous and future I or P frames I B B P B B P B B P B B I
    41. 41. Lossless compressionLoss-less compressions reduce file size by encoding image information more efficiently.Images compressed using loss-less algorithms are able to be restored to their original condition.
    42. 42. Lossy compressionLossy compressions reduce file size by considerably greater amounts than loss-less compressions but lose both information and quality.At high compression, the image will become visibly degraded.
    43. 43. StandardJPEGJoint Photographic Experts Group Jpeg is the standard compression techniques for still imagesLossy compressionBest suited for photography
    44. 44. Standard JPEG It supports the four modes of encoding – Sequential • The image is encoded in the order in which it is scanned. – Progressive • The image is encoded in multiple passes.
    45. 45. JPEG (contd.)– the original quality of the image can be fully restored Hierarchical • The image is encoded at multiple resolutions to accommodate different types of displays.– Lossless • The image is encoded in such a way that
    46. 46. Standard JPEG
    47. 47. MPEG Standard MPEG-1 MPEG-2 MPEG-3 MPEG-4 MPEG-7 MPEG-21
    48. 48. MPEG StandardMPEG-1: Initial video andaudio compression standard.Later used as the standard forVIDEO CD, and (MP3) audiocompression format.
    49. 49. MPEGMPEG-2: Video and audio standards for broadcast-quality television. Used for digital satellite TV services like DIRECT TV, digital Cable television signals, and (with slight modifications) for DVD video discs.MPEG-3: Originally designed for HDTV, but abandoned in favor of MPEG-2.
    50. 50. MPEG StandardMPEG-4: Expands MPEG-1 to supportvideo/audio "objects", 3D content, lowbitrate encoding and support forDigital Rights Managements.MPEG-7: A formal system fordescribing multimedia content.MPEG-21: MPEG describes this futurestandard as a Multimedia Framework.
    51. 51. JPEG Standard JPEG : – the real image compression is the Discrete Cosine Transform (DCT). – Removes the redundant information (the "invisible" parts). JPEG-2000: – Successor to the JPEG . – Blockiness of JPEG is removed, – The compression ratio for JPEG 2000 is higher than for JPEG
    52. 52. Codecs Compression/Decompression Scheme Hardware or software based Many use both spatial and temporal compression techniques
    53. 53. Why do we do data compression?Data compression is simply done for saving the space in the hard disk, thereby to make it more fault tolerant.
    54. 54. What is the use of data compression on network?The most prominent use of data compression on the network is to make the server more spacious so that more files can be stored on it.
    55. 55. CompressionEven though disks have gotten bigger, we are still running short on disk spaceA common technique is to compress files so that they take up less space on the disk.
    56. 56. COMPUTER TOOLS AND UTILITIES
    57. 57. Compression Utilities Zip files are used for rapidly distributing and storing files. Zip files are compressed to save space. WinZip - a popular compression utility for Windows. Win RAR
    58. 58. Lossless vs. lossy
    59. 59. Applications lossless data compression is often used to better use disk space on office computers, or better use the connection bandwidth in a computer network.
    60. 60. ApplicationsIn other kinds of data such as sounds and pictures, a small loss of quality can be tolerated without losing the essential nature of the data, so lossy data compression methods can be used.
    61. 61. Lossless vs. Lossy Compressionfile:///C:/Documents and Settings/login.IPS/Desktop/amit_jain/abc/lossy data compression Information From Answers_com_files/LOSSY.gif NOTE: Business data requires lossless compression, while audio and video applications can tolerate some loss, which may not be very noticeable.
    62. 62. ADVANTAGES Data compression is simply done for saving the space in the hard disk, thereby to make it more fault tolerant. The most prominent use of data compression on the network is to make the server more spacious so that more files can be stored on it.
    63. 63. Lossy vs. Lossless Compression Lossy method can produce a much smaller compressed file than any known lossless method, while still meeting the requirements of the application. Lossily compressed still images are often compressed to 1/10th their original size, as with audio, but the quality loss is more noticeable, especially on closer inspection.
    64. 64. DATA COMPRESSION NEEDED AS MOST OF THE REAL WORLD DATA IS REDUNDANT IMPORTANCE? SAVES DISK SPACE SAVES CONNECTION BANDWIDTH REDUCES PROCESSING TIME REDUCES COMMUNICATION TIME ENABLES FAST STORAGE AND RETRIEVAL
    65. 65. DATA COMPRESSION TYPESREVERSIBLE IRREVERSIBLELOSSY – WHEN EFFICIENCY OFTRANSMISSION IS MORE IMPORTANTTHAN ACCURACY OF INFORMATION.
    66. 66. INFORMATION THEORY IT IS A BRANCH OF MATHEMATICS THAT DEALS WITH DATA/INFORM - ATION REPRESENTATION DATA COMPRESSION IS ONE OF THE APPLICATIONS OF INFORMATION THEORY
    67. 67. SHANNON’S PRINCIPLEFOR INFORMATION FOR DATA COMPRESSION, IT IS ESENTIAL TO MEASURE INFORMATION CONTENTS IN THE DATA OR THE DEGREE OF RANDOMNESS/UNCERTAINTY HIGH PROBABILITY EVENTS CONTAIN LESS SELF-INFORMATION WHEREAS LOW PROB EVENT ASSOCIATES MUCH MORE SELF INFORMATION
    68. 68. SHANNON’S PRINCIPLEFOR INFORMATION IT WAS GIVEN BY CLAUDE SHANNON ACCORDING TO HIM, SELF- INFORMATION IS ASSOCIATED WITH EVERY POSSIBLE OUTCOME OF AN EVENT.
    69. 69. SHANNON’S PRINCIPLEFOR INFORMATION LET P(A) & P(B) BE THE PROB OF OCCURANCE OF EVENTS A & B RESPECTIVELY. ACCORDING TO SHANNON, SELF-INFO ASSOCIATED WITH EVENT A MAY BE DEFINED AS Si(A) = - logmP(A)= logm[1/P(A)] SIMILARLY, Si(B)= logm[1/P(B)] WHERE m DEFINES THE UNIT OF INFO
    70. 70. SHANNON’S PRINCIPLEFOR INFORMATION PROB EVENT VIS-À-VIS SELF INFO P(A) Si(A) 1 0 0.5 1.0 0.25 2.0 0.10 3.32 0.05 4.32
    71. 71. SHANNON’S PRINCIPLEFOR INFORMATION CONCEPT OF Si MAY ALSO BE USED TO MAKE INFERENCES BY ASSOCIATING IT WITH 2 INDEPENDENT EVENTS LET A & B BE 2 INDEPENDENT EVENTS, THEN P(AB)= P(A)*P(B) Si(AB)=-log2[P(AB)] = [-log2P(A)] + [-log2P(B)] = Si(A) + Si(B)
    72. 72. ENTROPY OF INFORMATION ENTROPY IS A CONCEPT OF THERMODYNAMICS IN INFO THEORY, IT IS USED TO FIND OUT THE RANDOMNESS/UNCERTAINTY IN A MESSAGE
    73. 73. ENTROPY OF INFORMATION THE AVERAGE INFO CONTENT OF A MESSAGE IS CALLED ITS ENTROPY THE LESS LIKELY A MESSAGE IS TO OCCUR, THE LARGER ITS INFO CONTENT ENTROPY IS AN IMPORTANT CONCEPT OF DATA COMPRESSION
    74. 74. ENTROPY OF INFORMATION ENTROPY (Ee) IS THE MINIMUM NO OF BITS NEEDED TO ENCODE THAT ELEMENT THE ENTROPY OF AN ENTIRE MESSAGE (Em) IS THE MIN NO. OF BITS NEEDED TO ENCODE THE ENTIRE MESSAGE WITH A LOSSLESS COMPRESSION.
    75. 75. ENTROPY OF INFORMATION THE ENTROPY OF A MESSAGE CAN BE USED TO DETERMINE IF THE DATA COMPRESSION IS WORTH ATTEMPTING. IT CAN ALSO BE USED TO EVALUATE THE EFFECTIVENESS OF COMPRESSION.
    76. 76. ENTROPY OF INFORMATION THE NO. OF BITS IN A COMPRESSED CODE CAN BE COMPARED TO THE ENTROPY FOR THAT MESSAGE Em REVEALING HOW CLOSE TO OPTIMAL COMPRESSION ONE’S CODE IS.
    77. 77. ENTROPY OF INFORMATION SHANNON PROPOSED THE FOLLOWING ENTROPY FN FOR A MESSAGE: Em = - Σ Pi log2(Pi), sum over 1 TO N ---- (1) WHERE N= NO. OF POSSIBLE CHAR TYPES USED IN THE MESSAGE AND Pi DENOTES THE PROB OF THE ith CHAR. Eg “AABCCD”, N=4
    78. 78. ENTROPY OF INFORMATION THE ENTROPY OF A CHAR IS GIVEN BY ITS SELF INFO ie., ENTROPY OF A CHAR A IS GIVEN BY Ee=-log2P(A) THE ENTROPY OF A MESSAGE CONTAINING N CHARS CAN ALSO BE FOUND OUT IN TERMS OF AV SELF INFO OF ALL N CHARS ie, Em = (1/N)*Σ Si OR ENTROPY OF ith CHAR, I= 1 TO N ------- (2)
    79. 79. ENTROPY OF INFORMATION NOTICE THE DIFFERENCE BETWEEN N IN THE TWO EQUATIONS IN 1ST, N IS THE NO OF DISTINCT CHARS USED IN THE MESSAGE AND IN 2ND N = TOTAL NO OF CHARS USED IN THE MESSAGE
    80. 80. ENTROPY OF INFORMATION SO, ENTROPY OF A MESSAGE GIVES THE AVERAGE NO OF BITS REQUIRED TO REPRESENT A CHARACTER IN THE MESSAGE QUES: FOR THE MESSAGE “dadbadcadbaadac” CALCULATE Si ASSOCIATED WITH CHARS A & B, ENTROPY OF CHARS C & D, AV SELF INFO IN THE MESSAGE, ENTROPY OF THE MESSAGE?
    81. 81. ENTROPY OF INFORMATION N=15 CHAR NO OF CHARS PROB OF CHAR Si d 4 4/15 1.90 a 6 6/15 1.32 b 2 2/15 2.90 c 3 3/15 2.32AV SELF INFO OF MESSAGE= [1/N]*Σ ENTROPY OF ith CHAR= [1/15]*[E(1) + E(2) + E(3) + ….+ E(15)]=[1/15]*[E(d)+E(a)+E(d)+E(b)+………+E(c)]= [1/15] *[1.90+1.32+1.90+2.90+..+2.32] = [1/15]*28.28 = 1.88ENTROPY OF MESSAGE=-Σ Pi*log2(Pi), i=1 TO 4= (4/15)*(1/1.90) + (6/15)*(1/1.32) + (2/15)*(1/2.90) + (3/15)(1/2.32)= 1.88
    82. 82. ENTROPY OF INFORMATION NOTE THAT THE AV SELF INFORMATION OF THE MESSAGE AND THE ENTROPY OF THE MESSAGE BOTH ARE SAME AND BOTH THE FUNCTIONS GIVE THE AVERAGE NO OF BITS REQUIRED TO REPRESENT A CHARACTER IN THE MESSAGE
    83. 83. ENTROPY OF INFORMATION QUES2: CALCULATE THE AV NO. OF BITS REQUIRED TO REPRESENT A CHAR IN THE MESSAGE STRING “AAAAABBCC”
    84. 84. ENTROPY OF INFORMATIONA 6 0.6B 2 0.2C 2 0.2ENTROPY OF MESSAGE=-Σ Pi*log2(Pi), I=1 TO NHERE N=3ENTROPY OF THE MESSAGE= 0.6*log2(1/0.6)+0.2*log2(1/0.2)+0.2*log2(1/0.2)=0.6*0.74 + 0.2*2.32 + 0.2*2.32=0.44 + 0.46 + 0.46=1.36= AV NO OF BITS REQUIRED TO REPRESENT A CHARACTER
    85. 85. CODES A CODE IS ANY MAPPING FROM AN INPUT ALPHABET TO AN OUTPUT ALPHABET A CODE CAN BE SAY {a,b,c} = {0,1,00}, BUT THIS CODE IS NOT UNIQUELY DECODABLE. IF THE DECODER GETS A CODE MESSAGE OF 2 ZEROS, THERE IS NO WAY IT CAN KNOW WHETHER THE ORIGINAL MESSAGE HAD TWO a’S OR ONE c’S
    86. 86. CODES A CODE IS INSTANTANEOUS IF EACH CODEWORD IN A MESSAGE CAN BE DECODED AS SOON AS IT IS RECEIVED. THE BINARY CODE {a,b} = {0,01} IS UNIQUELY DECODABLE, BUT IT IS NOT INSTANTANEOUS. ONE HAS TO SEE IF THE NEXT BIT IS 1. IF IT IS, b IS DECODED; IF NOT a IS DECODED. THE BINARY CODE {a,b,c}={0,10,11} IS AN INSTANTANEOUS CODE
    87. 87. CODES A CODE IS A PREFIX CODE IFF NO CODEWORD IS A PREFIX OF ANOTHER CODE WORD. A CODE IS INSTANTANEOUS IFF IT IS A PREFIX CODE, SO A PREFIX CODE IS ALWAYS A UNIQUELY DECODABLE INSTANTANEOUS CODE. ALL UNIQUELY DECODABLE CODES CAN BE CHANGED INTO PREFIX CODES OF EQUAL CODE LENGTHS.
    88. 88. TYPES OF CODING THERE ARE MANY ALGORITHMS FOR CODING THE CHARACTERS BUT CAN BE BROADLY DIVIDED INTO 2 TYPES: STATIC (FIXED SIZE) CODING DYNAMIC (VARIABLE SIZE) CODING
    89. 89. STATIC CODING SCHEME IF THE MESSAGE IS COMPOSED BY THE COMBINATION OF M DISTINCT CHARS, THEN THE POSSIBLE NO. OF BITS REQUIRED IN THE CODE= N = logbM, WHERE N= MINIMUM NO. OF BITS REQUIRED TO REPRESENT M DISTINCT CHARS AND b= BASE OF THE NUMBER SYSTEM
    90. 90. STATIC CODING SCHEME THE MAIN DISADVANTAGE IS THAT IT DOES NOT CONSIDER THE FREQUENCY OR PROB OF OCCURANCE OF A PARTICULAR CHAR IN THE MESSAGE
    91. 91. STATIC CODING SCHEMEQUES 3: CONSIDER THE MESSAGE “RAMRAHIM” FIND THE NO OF DISTINCT CHARS, THE MIN NO OF BITS REQUIRED TO REPRESENT A CHAR, GENERATE THE CODE FOR ALL DISTINCT CHARS, BY USING THESE CODES WHAT SHALL BE THE CODED MESSAGE FOR THE MESSAGE “MIHIR”, HOW MUCH IS THE SAVING BY USING THE CODING SCHEME OVER ASCII REPRESENTATION
    92. 92. STATIC CODING SCHEME NO OF DISTINCT CHARS = 5 N=log2M = log25= 3; SO 3 BIT CODE IS NEEDED TO REPRESENT EACH SYMBOL 000=R, 001=A, 010=M, 011=H, 100=I; REST ARE UNUSED BY USING THE CODES AS ABOVE, THE CODED MESSAGE FOR “MIHIR” SHALL BE “010100011100000” EACH CHARACTER OF THE STRING IS REPRESENTED BY 3 BITS AND THERE ARE 5 CHARACTERS IN THE MESSAGE. SO THE NO OF BITS REQUIRED= 5*3=15; THEREFORE SAVING = 40 – 15 = 25 BITS
    93. 93. DYNAMIC CODINGSCHEME COMPUTERS ENCODE CHARS IN ASCII CODE. SO, A FILE HAVING 100 CHARS SHALL REQUIRE 800 BITS BUT IN ANY TEXT FILES, SOME CHARS OCCUR WITH MORE FREQUENCY THAN OTHERS SO, IT IS BETTER THAT SHORTER BIT CODES ARE ASSIGNED TO THE FREQUENTLY OCCURING CHARS THAN OTHERS. THIS WAS ALSO REALIZED WAY BACK BY SAMUEL NORSE. THIS CONCEPT IS USED IN DYNAMIC CODING.
    94. 94. DYNAMIC CODING SCHEME IT USES VARIABLE SIZE CODE MINIMUM NO OF BITS ARE ASSIGNED TO THE MOST FREQUENTLY OCCURING CHARACTER AND MAXIMUM NO OF BITS TO THOSE WHICH ARE LEAST FREQUENTLY USED. ANY STATISTICAL MODEL MAY BE USED TO CALCULATE THE FREQUENCY OF OCCURANCE OF CHARACTERS.
    95. 95. DYNAMIC CODING SCHEMEQUES 4: CONSIDER THE MESSAGE “RAAMRAHMMM”FIND OUT THE DISTINCT CHARACTERS AND THEIR FREQUENCY, GENERATE CODES FOR ALL CHARACTERS USING DYNAMIC CODING, USING GENERATED CODES WRITE THE CODE FOR “MAHR”, HOW MUCH IS THE SAVINGS IN BIT.3. 4 DISTINCT CHARS; R-2,A-3, M-4 AND H-14. M-1,A-01,R-001,H-00015. 1010001001= 10 BITS6. SAVINGS = 32 – 10 = 22 BITS
    96. 96. USE OF ENTROPY IN CODING THE ENTROPY FN IS USED TO DEVELOP AN EFFICIENT CODE FOR THE PURPOSE OF COMMUNICATION. ONE CAN USE ENTROPY TO FIND OUT THE SCOPE OF FURTHER REFINEMENT IN THE CODING SCHEME AS THE ENTROPY OF THE MESSAGE RESULTS IN AVERAGE NO OF BITS REQUIRED TO REPRESENT A CHARACTER.
    97. 97. USE OF ENTROPY IN CODINGQUES 5: CONSIDER A MESSAGE STREAM CONSISTING OF CHARS A,B,C,D. LET THE PROB OF OCCURANCE OF CHARS BE 0.6, 0.3, 0.08 AND 0.02 RESPECTIVELY. Si RESPECTIVELY IS 0.73,1.73,3.64 AND 5.64B. FIND MIN NO OF BITS REQ TO REPRESENT A CHAR USING STATIC CODING, IF A MESSAGE CONSISTS OF ALL THE 4 CHARSC. GENERATE CODE FOR THE CHARS USING DYNAMIC SCHEME. WHAT IS THE AV NO OF BITS REQ TO REPRESENT A CHAR IN THIS CODING SCHEME, IF A MESSAGE CONTAINS 100 CHARSD. IS THERE ANY POSSIBILITY OF FURTHER REFINEMENT IN THE CODING SCHEME?
    98. 98. USE OF ENTROPY IN CODING M=4; MIN NO OF BITS REQ TO REP A CHAR=N=log24=2 BY LOOKING INTOTHE TABLE, THE FOLLOWING CODES CAN BE GENERATED USING DYNAMIC SCHEME: CHAR PROB CODE A 0.70 1 B 0.15 01 C 0.10 001 D 0.05 0001 AV NO OF BITS REQ TO COMM A MESSAGE OF 100 CHARS = [(70*1)+(15*3)+(10*2)+(5*4)]/100 = 150/100 = 1.5
    99. 99. USE OF ENTROPY IN CODING3. DYNAMIC CODING IS MORE EFFICIENT4. ENTROPY = - Σ Pi *log2(Pi) = - (0.7*log2(0.7)+0.15*log2(.15)+0.1*log2(.1)+0.05*log2(.05)) = 1.31 SO, AV NO OF BITS REQ TO REPRESENT A CHAR IN THE MESSAGE=1.3THERE IS A DIFFERENCE BETWEEN THE ENTROPY VALUE AND THE NO OF BITS REQUIRED BY BOTH THE METHODS, THEREFORE FURTHER REFINEMENT IS POSSIBLE IN THE CODING SCHEMES.
    100. 100. LOSSLESS DATA COMPRESSION ALL ALGORITHMS ATTEMPT TO RE-ENCODE DATA TO REMOVE REDUNDANCY IT IMPLIES THAT DATA WITH NO REDUNDANCY CAN NOT BE COMPRESSED BY THESE TECHNIQUES WITHOUT SOME LOSS OF INFORMATION
    101. 101. SHANNON FANO ALGORITHM IT USES THE IDEA OF USING SHORTER CODES FOR MORE FREQUENTLY OCCURING CHARACTERS GIVEN BY CLAUDE SHANNON & R.M.FANO
    102. 102. SHANNON FANO ALGORITHM ADV? CONSIDER A FILE HAVING 40 LETTERS WITH THE GIVEN FREQUENCY- A:14; B:7; C:10; D:5; E:4 ASCII – 40*8=320 BITS. DECODING SIMPLY CONSISTS OF BREAKING INTO 8 BYTES AND CONVERTING IT INTO CHARACTER. SO, IT NEEDS NO ADDITIONAL INFO.
    103. 103. SHANNON FANO ALGORITHM VARIABLE LENGTH ENCODING SCHEMES SUCH AS HUFFMAN AND SHANNON-FANO HAVE THE FOLLOWING PROPERTIES: CODES FOR MORE FREQUENT CHARS ARE SHORTER THAN ONES FOR LESS PROBABLE CHARS
    104. 104. SHANNON FANO ALGORITHM EACH CODE CAN BE UNIQUELY DECODED. THIS IS CALLED THE PREFIX PROPERTY ie., NO CHARS ENCODING IS A PREFIX OF ANY OTHER.• TO SEE WHY THIS PROPERTY IS IMPORTANT, CONSIDER “A” ENCODED AS 0;”B” AS 01;”C” AS 10. IF THE DECODER ENCOUNTERS THE BIT- STREAM “0010”, IS IT “ABA” OR “AAC”?
    105. 105. SHANNON FANO ALGORITHM WITH THE PREFIX GUARANTEE, THERE IS NO AMBIGUITY IN DETERMINING WHERE THE CHAR BOUNDARIES ARE. ONE STARTS READING FROM THE BEGINNING AND GATHER BITS IN A SEQUENCE UNTIL ONE FINDS A MATCH. THAT INDICATES THE END OF CHAR AND ONE MOVES ALONG TO THE NEXT CHAR.
    106. 106. SHANNON FANO ALGORITHM1. FIND THE FREQ OF OCCURANCE OF EACH SYMBOL3. SORT IT IN THE DESCENDING ORDER5. DIVIDE THE LIST INTO 2 PARTS, WITH THE TOTAL FREQ COUNT OF THE UPPER HALF BEING AS CLOSE TO THAT OF THE BOTTOM HALF AS POSSIBLE
    107. 107. SHANNON FANO ALGORITHM1. REPEAT STEP 3 UNTIL EACH HALF CONTAINS JUST ONE SYMBOL3. CONSTRUCT THE BINARY TREE (SF TREE) SO THAT THE UPPER HALF BECOMES THE LEFT SUB-TREE AND THE LOWER HALF BECOMES THE RIGHT SUB-TREE. EACH LEFT BRANCH IS ASSIGNED 0 AND EACH RIGHT HALF 1
    108. 108. SHANNON FANO ALGORITHM1. TO OBTAIN THE CODE FOR ANY SYMBOL, THE CODE IS THE COMBINATION OF ALL THE DIGITS FROM THE ROOT TO THAT LEAF (SYMBOL) QUES: APPLY SF ALGO TO A TEXT FILE HAVING 40 CHARS WITH THE GIVEN FREQ: A-14, B-7, C-10, D-5, E-4
    109. 109. SHANNON FANO ALGORITHM1. SYMBOL FREQUENCY A 14 B 7 C 10 D 5 E 4
    110. 110. SHANNON FANO ALGORITHM1. SORT IT IN DESCENDING ORDER SYMBOL FREQUENCY A 14 C 10 B 7 D 5 E 4
    111. 111. SHANNON FANO ALGORITHM1. DIVIDING INTO PARTS FIRST ITERATION A 14 C 10 B 7 D 5 E 4
    112. 112. SHANNON FANO ALGORITHMSECOND ITERATION A 14 C 10 B 7 D 5 E 4
    113. 113. SHANNON FANO ALGORITHMTHIRD ITERATION A 14 C 10 B 7 D 5 E 4 AFTER THE FOURTH ITERATION, WE WILL HAVE THE FOURTH DIVISION AND ALL THE HALF WILL THEN HAVE ONLY ONE SYMBOL.
    114. 114. SHANNON FANO ALGORITHM1. SHANNON FANO TREE: 40 0 1 24 16 0 1 0 1 14 10 7 9 A C B 0 1 5 4 D E
    115. 115. SHANNON FANO ALGORITHM1. OBTAINING THE CODE FROM THE TREE SYMBOL CODE NO OF BITS FREQUENCY A 00 2 14 B 10 2 7 C 01 2 10 D 110 3 5 E 111 3 4TOTAL NO OF BITS NEEDED FOR TEXT = 89SO, AV NO OF BITS USED BY ANY SYMBOL=89/40=2.225WHICH IS QUITE LESS AS COMPARED TO 8 BITS PERSYMBOL NEEDED IN ASCII
    116. 116. HUFFMAN ALGORITHM GIVEN BY DAVID HUFFMAN IMPROVEMENT OVER S-F ALGO. LOSSLESS COMP ALGO, IDEAL FOR COMPRESSING TEXT OR PROGRAM FILES HUFFMAN CODE TABLE GUARANTEES TO PRODUCE THE LOWEST POSSIBLE OUTPUT BIT COUNT POSSIBLE FOR THE INPUT STREAM OF SYMBOLS, WHEN USING FIXED LENGTH CODES
    117. 117. HUFFMAN ALGORITHM HUFFMAN CALLED THESE “MINIMUM REDUNDANCY CODES” IT BELONGS TO THE FAMILY OF ALGOS WITH A VARIABLE CODE WORD LENGTH. USED IN PKZIP, LHA, GZ, ZOO AND ARJ, JPEG AND MPEG
    118. 118. HUFFMAN ALGORITHM MAIN DIFFERENCE: S-F ALGO BUILDS THE BINARY TREE FROM TOP TO BOTTOM, WHEREAS HUFFMAN’S ALGO FORMS THE BINARY TREE FROM BOTTOM TO TOP PERFORMANCE OF BOTH OF THEM ARE QUITE SIMILAR
    119. 119. HUFFMAN ALGORITHM1. COUNT THE NO OF CHARS AND THE FREQ OF OCCURANCE OF EACH CHARACTER3. ARRANGE THEM IN THE DESCENDING ORDER OF FREQ.5. CONSTRUCT HUFFMAN TREE FOR THE GENERATION OF CODES
    120. 120. HUFFMAN ALGORITHM CONSTRUCTION OF HUFFMAN TREE:2. PICK UP 2 CHARS FROM THE LIST HAVING MINIMUM FREQ. LET US CALL THESE CHARS A AND B4. CREATE 2 FREE NODES OF THE BT AND ASSIGN A AND B TO THESE NODES6. ASSIGN A PARENT NODE FOR THEM AND ASSIGN IT THE FREQ THAT IS THE SUM OF THE CHILD NODES. LET US CALL IT “AB”
    121. 121. HUFFMAN ALGORITHM1. DELETE A AND B FROM THE LIST3. ADD THE VALUE OF “AB” TO THE LIST5. REPEAT THE STEPS 1 TO 5 TILL THE LIST OF CHARS BECOMES EMPTY. THE RESULTANT TREE THUS GENERATED IS THE HUFFMAN TREE.7. ASSIGN THE BITS TO THE NODES OF THE TREE AS IN S-F ALGO.ie., 0 TO LEFT CHILD & 1 TO RIGHT CHILD9. TO FIND THE CODE FOR A CHAR, TRAVERSE FROM ROOT TO LEAF CONTAINING THAT CHAR.
    122. 122. HUFFMAN ALGORITHM PROBLEM: LET A MESSAGE OF 100 CHARS CONTAIN THE FOLLOWING: CHAR FREQUENCY A 50 B 20 C 15 D 10 E 5 STEPS 1 AND 2 HAVE ALREADY BEEN DONE
    123. 123. HUFFMAN ALGORITHM CONSTRUCTION OF HUFFMAN TREE:2. TWO CHARS HAVING MINIMUM FREQ ARE D &E2 MAKE D AND E 2 FREE NODES OF THE TREE 10 5 D E3. ASSIGN A PARENT NODE FOR THEM: DE 15 10 5 D E
    124. 124. HUFFMAN ALGORITHM4 & 5: DELETE D & E FROM THE LIST AND ADD DE AND REPEAT 30 CDE 15 15 C DE6. REPEAT 1 TO 5 UNTIL LIST EMPTY 50 CDEB 30 20 CDE B
    125. 125. HUFFMAN ALGORITHMCONTD: CDEBA 100 50 50 CDEB A
    126. 126. HUFFMAN ALGORITHM H TREE: CDEBA 100 0 1 50 50 CDEB A 0 1 30 CDE 20 B 0 1 15 15 C 0 DE 1 10 5 D E
    127. 127. HUFFMAN ALGORITHMCHARACTER HUFFMAN CODE SIZE A 0 1 B 11 2 C 100 3 D 1010 4 E 1011 4TOTAL NO OF BITS REQUIRED=195AV BITS USED = 1.95ENTROPY OF MESSAGE= - Σ Pi* log (Pi) =1.932 BITSSO, REDUNDANCY= 1.95 – 1.932 = 0.018 BITS/CHAR
    128. 128. HUFFMAN ALGORITHM HUFFMAN CODING CAN BE FURTHER OPTIMIZED: EXTENDED HUFFMAN COMPRESSION- CAN ENCODE GROUP OF SYMBOLS RATHER THAN SINGLE SYMBOL ADAPTIVE HUFFMAN CODING- DYNAMICALLY CHANGES THE CODE WORDS ACCORDING TO THE CHANGE OF PROBABILITY OF SYMBOLS
    129. 129. ARITHMETIC CODING IT IS A METHOD OF WRITING A CODE IN A NON-INTEGER LENGTH IT ALLOWS ONE TO CODE VERY CLOSE TO IDEAL ENTROPY IT DOES NOT REPLACE AN INPUT SYMBOL WITH A SPECIFIC CODE INSTEAD, IT TAKES A STREAM OF INPUT SYMBOLS AND REPLACES IT WITH A SINGLE FLOATING POINT OUTPUT NUMBER
    130. 130. ARITHMETIC CODING IT IS QUITE SIMILAR TO HUFFMAN, BECAUSE IT IS USED FOR THE SAME KIND OF CONTENTS TO COMPRESS AS IN HUFFMAN IT IS DIFFERENT FROM HUFFMAN IN THE WAY IT PROCESSES THE SOURCE INSTEAD OF GIVING BIT VALUE TO EACH CHAR, IT USES PROB VALUE FOR EACH CHAR IT IS BASED UPON PROB BETWEEN 0 AND 1
    131. 131. ARITHMETIC CODING THE OUTPUT FROM AN ARITHMETIC CODING PROCESS IS A SINGLE NUMBER LESS THAN 1 AND GREATER THAN OR EQUAL TO 0 THIS SINGLE NUMBER CAN BE UNIQUELY DECODED TO CREATE THE EXACT STREAM OF SYMBOLS THAT WENT INTO ITS CONSTRUCTION IT RESULTS IN BEST COMPRESSION RATIO.
    132. 132. ARITHMETIC CODING1. IT REQUIRES 5 VARIABLES FOR ENCODING: RANGE, LOW, HIGH, RF(RANGE FROM), RT (RANGE TO)EXAMPLE: LET THE MESSAGE BE “ABCBAA”. THE FREQ OF CHARS A, B AND C ARE 3, 2 AND 1 RESPECTIVELY. A TABLE IS TO BE CREATED AS FOLLOWS:CHAR PROB RANGE RANGEFROM RANGETO A 0.5 >=0&<.5 0.00 0.50 B 0.33 >=0.5 &<0.83 0.50 0.83 C 0.16 >=0.83&<1 0.83 1.00
    133. 133. ARITHMETIC CODING1. NOW ENCODE THE CHARS IN THE MESSAGE USING THE TABLE AS OBTAINED IN STEP 1, AS FOLLOWS, AND CREATE A TABLE AGAIN: SET LOW=0 AND HIGH=1.0 WHILE there are still input symbols, DO GET AN INPUT SYMBOL RANGE=HIGH (previous) – LOW (prev) LOW=LOW(prev) + (RANGE*RF of current symbol) HIGH=LOW(prev) + (RANGE* RT of current symbol) END OF EHILE OUTPUT=LOW
    134. 134. ARITHMETIC CODINGCHAR RANGE LOW HIGHSTART-NONE 1-0=1 0+1*0=0 0+1*0=0 A 1-0=1 0+1*0=0 0+1*.5=0.5 B 0.5-0=0.5 0+.5*.5=.25 0+.5*.83=.415 C 0.415-.25=0.165 .25+.165*.83 .25+.165*1.0 =.38695 =0.415 B 0.415-0.38695 .38695+.02805 .38695+.02805 = .02805 *0.5=.400975 *.83=.4102315 A 0.4102315-.400975 .400975+ .400975+ =0.0092565 *0.0=0.400975 .0092565*0.5 =0.40560325 A 0.40560325- .400975+ 0.400975+ 0.400975 .00462825*0.0 .00462825*.5 =0.00462825 = 0.400975 = 0.403289125
    135. 135. ARITHMETIC CODING THE FINAL OUTPUT VALUE = LOW =0.400975 STEP 3: TO DECODE THE MESSAGE Ie., TO GET THE CHARS BACK, THE FOLLOWINF PROCESS IS ADOPTED: AGAIN 5 VARIABLES ARE REQUIRED- RANGE, RF, RT, VALUE AND RD (RANGE DIFFERENCE)5. OUTPUT THE SYMBOL BY DETERMINING THAT IN WHICH RANGE THE VALUE IS. IN THIS EXAMPLE OUTPUT IS 0.400975 WHICH LIES BETWEEN O AND 0.5. SO, THE FIRST CHARACTER DECODED IS “A”.2. GET A NEW VALUE USING RD=RT – RF NEW VALUE = (PREV VALUE – PREV RF)/PREV RD
    136. 136. ARITHMETIC CODINGVALUE RANGE CHAR DECODED RD.400975 0.00 - <0.5 A 0.5.80195 0.5 - <0.83 B 0.330.915 0.83 - <1.00 C 0.160.53125 0.5 - <0.83 B 0.330.09469696 0.00 - <0.5 A 0.500.18939392 0.00 - <0.5 A 0.50ADV: BETTER RESULT THAN HUFFMANDISADV:10. COMPLICATED CALCULATIONS11. IT REQUIRES FPU, SO PROCESS IS SLOW12. DOES NOT KNOW WHERE THE DECODING PROCESS SHOULD END. TO OVERCOME THIS PROBLEM, ONE SPL CHAR IS INSERTED INTO THE ENCODED TEXT AS DELIMITER. AT THE TIME OF DECODING, IT INDICATES THAT THERE ARE NO MORE CHARS TO DECODE.
    137. 137. Compression ratio ONE NEEDS TO KNOW IT TO FIND OUT THE EFFICIENCY OF THE COMPRESSION ALGORITHM C.R.= SIZE OF O.D.- SIZE OF C.D. SIZE OF ORIGINAL DATA
    138. 138. DICTIONARY BASED COMPRESSION TECHNIQUES STATISTICAL METHODS, SUCH AS S-F AND HUFFMAN, ENCODE A SINGLE SYMBOL AT A TIME BY GENERATING A ONE-TO-ONE SYMBOL-TO-CODE MAP. DICTIONARY BASED COMPRESSOR REPLACES AN OCCURANCE OF A PARTICULAR PHRASE OR GROUP OF BYTES IN A PIECE OF DATA WITH AN INDEX TO THE PREVIOUS OCCURANCE OF THAT PHRASE.
    139. 139. DICTIONARY BASED COMPRESSION TECHNIQUES SUPPOSE A TEXT IS GIVEN. IT IS ASSUMED THAT THERE IS A DICTIONARY THAT HAS ALL THE WORDS IN THE GIVEN TEXT. EACH WORD IN THE DICTIONARY IS REPRESENTED BY A UNIQUE NUMBER THAT ALSO INDICATES THE POSITION OR THE INDEX OF THE WORD IN THE DICTIONARY.
    140. 140. DICTIONARY BASED COMPRESSION TECHNIQUES WHEN THE TEXT IS TO BE COMPRESSED, THE WORDS OF THE TEXT ARE REPLACED BY THE INDEX OF THAT WORD. LET THE TEXT BE “LEARN THE DICTIONARY BASED COMPRESSION METHOD. IT IS A VERY SIMPLE METHOD. THANK YOU.” SAY THE DICTIONARY IS LIKE THIS: LEARN-1; THE-2; DICTIONARY-3; BASED-4; COMPRESSION-5; METHOD-6; IT-7; IS-8; A-9; VERY-10; SIMPLE-11; THANK YOU-12 THE ENCODED MESSAGE WILL BE “1 2 3 4 5 6 7 8 9 10 11 6 12”
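As an illustration (not part of the slides' own material), a minimal Python sketch of this static-dictionary substitution, using the word list above:

# Static dictionary from slide 140; indices are 1-based as on the slide.
DICTIONARY = ["LEARN", "THE", "DICTIONARY", "BASED", "COMPRESSION", "METHOD",
              "IT", "IS", "A", "VERY", "SIMPLE", "THANK YOU"]
INDEX = {word: i + 1 for i, word in enumerate(DICTIONARY)}

def encode(words):
    return [INDEX[w] for w in words]       # fails if a word is missing (see slide 143)

def decode(indices):
    return [DICTIONARY[i - 1] for i in indices]

text = ["LEARN", "THE", "DICTIONARY", "BASED", "COMPRESSION", "METHOD",
        "IT", "IS", "A", "VERY", "SIMPLE", "METHOD", "THANK YOU"]
print(encode(text))    # [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 6, 12]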
    141. 141. DICTIONARY BASED COMPRESSION TECHNIQUES IF PHRASES ARE USED AS WELL, EFFICIENCY INCREASES. FOR THIS TO WORK, THE SENDER AND THE RECEIVER MUST HAVE ACCESS TO THE SAME DICTIONARY. DICTIONARY BASED METHODS ARE MORE EFFICIENT THAN CHARACTER BASED METHODS BECAUSE THEY GENERATE CODES FOR FREQUENTLY USED WORDS AND PHRASES AS WELL AS FOR INDIVIDUAL CHARS.
    142. 142. DICTIONARY BASED COMPRESSION TECHNIQUES A DICTIONARY BASED METHOD MAY BE STATIC OR DYNAMIC, DEPENDING UPON HOW THE DICTIONARY IS CREATED AND USED. A STATIC DICTIONARY IS PREPARED BEFORE THE ENCODED MESSAGE IS COMMUNICATED TO THE RECEIVER'S END: ALL POSSIBLE CHARS/WORDS/PHRASES ARE INSERTED INTO THE DICTIONARY AND INDEXED.
    143. 143. DICTIONARY BASED COMPRESSION TECHNIQUES THE MAIN DRAWBACK OF STATIC METHOD IS THAT PERFORMANCE DEPENDS UPON THE TEXT TO BE ENCODED AND IS HIGHLY DEPENDENT ON THE ORGANIZATION OF THE CHARS/WORDS/PHRASES IN THE DICTIONARY. SECONDLY, IF THERE IS ANY WORD NOT IN THE DICTIONARY, IT FAILS. THE SOLUTION TO THE PROBLEM IS DYNAMIC DICTIONARY COMPRESSION.
    144. 144. DICTIONARY BASED COMPRESSION TECHNIQUES IN THIS METHOD, THE DICTIONARY IS PREPARED AT THE TIME OF ENCODING OF TEXT. LZ77, LZ78 AND LZW TECHNIQUES USE DYNAMIC DICTIONARY COMPRESSION TECHNIQUE. IT GENERATES OPTIMUM SIZE CODES.
    145. 145. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE DESIGNED FOR SEQUENTIAL DATA COMPRESSION. THE DICTIONARY IS A PORTION OF THE PREVIOUSLY ENCODED SEQUENCE. THE ENCODER EXAMINES THE INPUT SEQUENCE THROUGH A SLIDING WINDOW. THE WINDOW CONSISTS OF 2 PARTS: A SEARCH BUFFER, THAT CONTAINS A PORTION OF THE RECENTLY ENCODED SEQUENCE AND A LOOK-AHEAD BUFFER, THAT CONTAINS THE NEXT PORTION OF THE SEQUENCE TO BE ENCODED.
    146. 146. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
Sliding window (search pointer): SEARCH BUFFER: c a b r a c a d | LOOK-AHEAD BUFFER: a b r a r r a r r
1. To encode the sequence in the look-ahead buffer, the encoder moves a search pointer back through the search buffer until it encounters a match to the first symbol in the look-ahead buffer. The distance of the pointer from the look-ahead buffer is called the offset.
    147. 147. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
2. The encoder then examines the symbols following the symbol at the pointer location to see if they match consecutive symbols in the look-ahead buffer. The number of consecutive symbols in the search buffer that match consecutive symbols in the look-ahead buffer, starting with the first symbol, is called the length of the match. The encoder searches the search buffer for the longest match.
    148. 148. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE
3. Once the longest match has been found, the encoder encodes it with a triple <o, l, c>, where o is the offset, l is the length of the match and c is the code-word corresponding to the symbol in the look-ahead buffer that follows the match. In the diagram, the longest match starts at the first a of the search buffer; the offset o in this case is 7, l is 4, and the symbol in the look-ahead buffer following the match is r.
    149. 149. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE The reason for sending the third element in the triple is to take care of the situation where no match for the symbol in the look-ahead buffer can be found in the search buffer. In this case, the offset and the match length values are set to 0, and the third element of the triple is the code for the symbol itself. For the decoding process, it is basically a table look-up procedure and can be done by reversing the encoding procedure.
    150. 150. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE Take a buffer of the same size as used in encoding, say n, and then use its first (N – n) spaces to hold the previously decoded chars, where N is the size of the window ( sum of the size of the look-ahead buffer and the search buffer) used in the encoding process. If one breaks up each triple that one encounters back into its components- position offset o, match length l and the last symbol of the incoming stream c, one can extract the match string from buffer according to o, and thus obtain the original content.
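A minimal Python sketch of the scheme described on the last few slides. The buffer sizes are illustrative only (7-symbol search buffer, 6-symbol look-ahead), chosen so that the example sequence reproduces the <7, 4, r> triple from slide 148; real encoders use windows of several kilobytes and a much faster match search.

SEARCH_SIZE, LOOKAHEAD_SIZE = 7, 6

def lz77_encode(data):
    triples, pos = [], 0
    while pos < len(data):
        start = max(0, pos - SEARCH_SIZE)          # left edge of the search buffer
        best_off, best_len = 0, 0
        for off in range(1, pos - start + 1):      # candidate offsets back from pos
            length = 0
            while (length < LOOKAHEAD_SIZE - 1      # one look-ahead slot is kept for c
                   and pos + length < len(data) - 1
                   and data[pos - off + length] == data[pos + length]):
                length += 1
            if length > best_len:
                best_off, best_len = off, length
        triples.append((best_off, best_len, data[pos + best_len]))  # <o, l, c>
        pos += best_len + 1
    return triples

def lz77_decode(triples):
    out = []
    for off, length, ch in triples:
        for _ in range(length):
            out.append(out[-off])                  # copy from the already-decoded text
        out.append(ch)
    return ''.join(out)

coded = lz77_encode("cabracadabrarrarr")
print(coded)                 # contains (7, 4, 'r'), as on slide 148
print(lz77_decode(coded))    # cabracadabrarrarr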
    151. 151. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE It provides very good compression ratio for many types of data. But, the encoding process is very time consuming as there are many comparisons to be performed between the look-ahead buffer and the window. On the other hand, the decoding process is very simple and fast and both the encoding and the decoding processes have a low memory consumption, since the only data held in the memory is the window (between 4 & 64 kb)
    152. 152. LZ77 (LEMPEL-ZIV) COMPRESSION TECHNIQUE ALL POPULAR ARCHIVERS (.ARJ, .LHA, .ZIP, .ZOO) ARE VARIATIONS ON THE LZ77 THEME. DRAWBACK: IT USES ONLY A SMALL WINDOW INTO PREVIOUSLY SEEN TEXT, WHICH MEANS IT CONTINUOUSLY THROWS AWAY VALUABLE DICTIONARY ENTRIES BECAUSE THEY SLIDE OUT OF THE DICTIONARY. THE LONGEST MATCH POSSIBLE IS ROUGHLY THE SIZE OF THE LOOK-AHEAD BUFFER. SECONDLY, IF A STRING THAT HAS ALREADY BEEN CAPTURED REAPPEARS AFTER A LONGER INTERVAL, A SEPARATE CODE WILL BE GENERATED FOR THE SAME STRING.
    153. 153. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE TO OVERCOME SUCH PROBLEMS, LZ78 ALGO WAS GIVEN. THE ONLY DIFFERENCE HERE IS THAT THE FIXED SIZE WINDOW OF LZ77 IS REPLACED BY A DICTIONARY IN LZ78. WHILE LZ77 WORKS ON PAST DATA, LZ78 ATTEMPTS TO WORK ON FUTURE DATA. IT DOES THIS BY FORWARD SCANNING THE INPUT BUFFER AND MATCHING IT AGAINST A DICTIONARY IT MAINTAINS.
    154. 154. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE THE DICTIONARY IN LZ78 IS A TABLE OF STRINGS. EVERY STRING IS ASSIGNED A CODE WORD ACCORDING TO ITS INDEX NUMBER IN THE DICTIONARY. BEFORE UNDERSTANDING THE METHOD, LOOK AT THE FOLLOWING TERMS: CHARSTREAM: A SEQUENCE OF DATA TO BE ENCODED.
    155. 155. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE CODE WORD: A BASIC DATA ELEMENT IN THE CODE STREAM. IT REPRESENTS A STRING FROM THE DICTIONARY. PREFIX: A SEQUENCE OF CHARS THAT PRECEDE ONE CHARACTER STRING: THE PREFIX TOGETHER WITH THE CHAR IT PRECEDES CODESTREAM: THE SEQUENCE OF CODE WORDS AND CHARS ( THE OUTPUT OF THE ENCODING ALGORITHM)
    156. 156. LZ78 (LEMPEL-ZIV) COMPRESSION TECHNIQUE CURRENT PREFIX (P) : THE PREFIX CURRENTLY BEING PROCESSED IN THE ENCODING ALGORITHM CURRENT CHARACTER (C ): A CHAR DETERMINED IN THE ENCODING ALGORITHM. GENERALLY, THIS IS THE CHARACTER PRECEDED BY THE CURRENT PREFIX. CURRENT CODE WORD (W): THE CODE WORD CURRENTLY PROCESSED IN THE DECODING ALGORITHM.
    157. 157. LZ78 (LEMPEL-ZIV) ENCODING PROCESS IT STARTS WITH A NEW DICTIONARY ie., AT THE BEGINNING OF ENCODING THE DICTIONARY IS EMPTY. LET US CONSIDER A POINT WITHIN THE ENCODING PROCESS, WHEN THE DICTIONARY ALREADY CONTAINS SOME STRINGS. ONE STARTS ANALYZING A NEW PREFIX IN THE CHARSTREAM, BEGINNING WITH AN EMPTY PREFIX.
    158. 158. LZ78 (LEMPEL-ZIV) ENCODING PROCESS IF ITS CORRESPONDING STRING (P+ C) IS PRESENT IN THE DICTIONARY, THE PREFIX IS EXTENDED WITH THE CHAR C. THIS EXTENDING IS REPEATED UNTIL ONE GETS A STRING WHICH IS NOT PRESENT IN THE DICTIONARY. AT THAT POINT, ONE OUTPUTS 2 THINGS TO THE CODESTREAM: THE CODEWORD THAT REPRESENTS THE PREFIX P AND THEN THE CHAR C.
    159. 159. LZ78 (LEMPEL-ZIV) ENCODING PROCESS THEN ONE ADDS THE WHOLE STRING (P+C) TO THE DICTIONARY AND STARTS PROCESSING THE NEXT PREFIX IN THE CHARSTREAM. A SPECIAL CASE OCCURS IF THE DICTIONARY DOES NOT CONTAIN EVEN THE STARTING ONE-CHARACTER STRING (THIS ALWAYS HAPPENS IN THE FIRST ENCODING STEP). IN THAT CASE, ONE OUTPUTS A SPECIAL CODE WORD THAT REPRESENTS AN EMPTY STRING, FOLLOWED BY THIS CHARACTER, AND ADDS THIS CHAR TO THE DICTIONARY.
    160. 160. LZ78 (LEMPEL-ZIV) ENCODING PROCESS THE OUTPUT FROM THIS ALGO IS A SEQ OF CODEWORD CHAR PAIR (W,C). EACH TIME A PAIR IS OUTPUT TO THE CODESTREAM, THE STRING FROM THE DICTIONARY CORRESPONDING TO W IS EXTENDED WITH THE CHAR C AND THE RESULTING STRING IS ADDED TO THE DICTIONARY. IT MEANS THAT WHEN A NEW STRING IS BEING ADDED, THE DICTIONARY ALREADY CONTAINS ALL THE SUBSTRINGS FORMED BY REMOVING CHARS FROM THE END OF THE NEW STRING.
    161. 161. LZ78 (LEMPEL-ZIV) ENCODING ALGORITHM (LZ78):
1. START WITH AN EMPTY DICTIONARY AND AN EMPTY PREFIX P.
2. C = NEXT CHAR IN THE CHARSTREAM.
3. IS THE STRING (P+C) PRESENT IN THE DICTIONARY?
   IF YES, THEN P = P+C.
   IF NOT, THEN OUTPUT THESE 2 OBJECTS, P AND C, TO THE CODESTREAM [THE CODEWORD CORRESPONDING TO P, AND C IN THE SAME FORM AS INPUT FROM THE CHARSTREAM]; ADD THE STRING P+C TO THE DICTIONARY; SET P = EMPTY.
4. ARE THERE MORE CHARS IN THE CHARSTREAM?
   IF YES, RETURN TO STEP 2.
   IF NOT: IF P IS NOT EMPTY, OUTPUT THE CODEWORD CORRESPONDING TO P. END.
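A minimal Python sketch of this encoding algorithm (names are illustrative, not part of the slides); it produces the (W, C) pairs shown in the worked example a few slides below.

def lz78_encode(charstream):
    dictionary = {}                 # string -> index; index 0 stands for the empty prefix
    prefix, output = "", []
    for c in charstream:
        if prefix + c in dictionary:
            prefix += c             # extend the prefix while P+C is already known
        else:
            output.append((dictionary.get(prefix, 0), c))   # (codeword for P, C)
            dictionary[prefix + c] = len(dictionary) + 1    # index = encoding step number
            prefix = ""
    if prefix:                      # flush a trailing prefix, if any
        output.append((dictionary[prefix], ""))
    return output

print(lz78_encode("ABBCBCABA"))
# [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A')]  -- as on slide 164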
    162. 162. LZ78 (LEMPEL-ZIV) DECODING PROCESS AT THE START OF DECODING THE DICTIONARY IS EMPTY. IT GETS RECONSTRUCTED IN THE PROCESS OF DECODING. IN EACH STEP, A PAIR CODEWORD-CHAR –(W,C) IS READ FROM THE CODESTREAM. THE CODEWORD ALWAYS REFERS TO A STRING ALREADY PRESENT IN THE DICTIONARY. THE STRING W & C ARE OUTPUT TO THE CHARSTREAM AND THE STRING (W+C) IS ADDED TO THE DICTIONARY. AFTER THE DECODING, THE DICTIONARY WILL LOOK EXACTLY THE SAME AS AFTER ENCODING.
    163. 163. LZ78 (LEMPEL-ZIV) DECODING ALGORITHM
1. AT THE START THE DICTIONARY IS EMPTY.
2. W = NEXT CODEWORD IN THE CODESTREAM.
3. C = THE CHARACTER FOLLOWING IT.
4. OUTPUT THE STRING CORRESPONDING TO W TO THE CHARSTREAM (THIS CAN BE AN EMPTY STRING) AND THEN OUTPUT C.
5. ADD THE STRING W+C TO THE DICTIONARY.
6. ARE THERE MORE CODEWORDS IN THE CODESTREAM? IF YES, GO BACK TO STEP 2. IF NOT, END.
    164. 164. LZ78 (LEMPEL-ZIV) ENCODING PROCESS EXAMPLE: LET THE CHARSTREAM TO BE ENCODED BE
POS:  1 2 3 4 5 6 7 8 9
CHAR: A B B C B C A B A
THE ENCODING PROCESS:
ENCODING STEP | CURRENT POSITION | DICTIONARY | OUTPUT
1             | 1                | A          | (0,A)
2             | 2                | B          | (0,B)
3             | 3                | BC         | (2,C)
4             | 5                | BCA        | (3,A)
5             | 8                | BA         | (2,A)
    165. 165. LZ78 (LEMPEL-ZIV) ENCODING PROCESS THE COLUMN DICTIONARY SHOWS WHAT STRING HAS BEEN ADDED TO THE DICTIONARY. THE INDEX OF THE STRING IS EQUAL TO THE STEP NUMBER. THE COLUMN OUTPUT PRESENTS THE OUTPUT IN THE FORM (W,C) THE OUTPUT OF EACH STEP DECODES TO THE STRING THAT HAS BEEN ADDED TO THE DICTIONARY.
    166. 166. LZ78 (LEMPEL-ZIV) DECODING PROCESS THE DECODING PROCESS:
STEP | OUTPUT PAIR | TEXT GENERATED
1    | (0,A)       | A
2    | (0,B)       | B
3    | (2,C)       | BC
4    | (3,A)       | BCA
5    | (2,A)       | BA
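A matching Python sketch of the decoding algorithm from slide 163, run on the codestream from the example above; the dictionary is rebuilt exactly as it was during encoding.

def lz78_decode(codestream):
    dictionary = {0: ""}            # codeword 0 stands for the empty string
    text = []
    for w, c in codestream:
        entry = dictionary[w] + c   # string for W, extended with C
        text.append(entry)
        dictionary[len(dictionary)] = entry   # next index = current dictionary size
    return ''.join(text)

pairs = [(0, 'A'), (0, 'B'), (2, 'C'), (3, 'A'), (2, 'A')]
print(lz78_decode(pairs))           # ABBCBCABA, matching the original charstream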
    167. 167. LZW ALGORITHM LZW WORKS BY ENTERING PHRASES INTO A DICTIONARY AND THEN, WHEN A REPEAT OCCURRENCE OF THAT PARTICULAR PHRASE IS FOUND, OUTPUTTING THE DICTIONARY INDEX INSTEAD OF THE PHRASE. FOR EXAMPLE, IT USES A DICTIONARY WITH 4096 ENTRIES. IN THE BEGINNING, THE ENTRIES 0-255 REFER TO INDIVIDUAL BYTES AND THE REMAINING ENTRIES 256-4095 REFER TO LONGER STRINGS.
    168. 168. LZW ALGORITHM EACH TIME A NEW CODE IS GENERATED, IT MEANS A NEW STRING HAS BEEN SELECTED FROM THE INPUT STREAM. NEW STRINGS THAT ARE ADDED TO THE DICTIONARY ARE CREATED BY APPENDING THE CURRENT CHARACTER K TO THE END OF AN EXISTING STRING W.
    169. 169. LZW ALGORITHM
SET W = NIL
LOOP
    READ A CHARACTER K
    IF WK EXISTS IN THE DICTIONARY
        W = WK
    ELSE
        OUTPUT THE CODE FOR W
        ADD WK TO THE STRING TABLE
        W = K
END-LOOP
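A runnable Python version of this pseudocode, as a sketch only: the dictionary is seeded with the 256 single-byte strings mentioned on slide 167, and the final phrase is flushed at the end of the input. A real implementation would also cap the dictionary (e.g. at 4096 entries) and emit fixed-width codes.

def lzw_encode(data):
    dictionary = {chr(i): i for i in range(256)}   # entries 0-255: single characters
    next_code = 256                                # longer strings start at code 256
    w, output = "", []
    for k in data:
        wk = w + k
        if wk in dictionary:
            w = wk                          # keep extending the current phrase
        else:
            output.append(dictionary[w])    # output the code for the longest known phrase
            dictionary[wk] = next_code      # add the new phrase W+K to the string table
            next_code += 1
            w = k
    if w:
        output.append(dictionary[w])        # flush the final phrase
    return output

print(lzw_encode("ABABABA"))    # [65, 66, 256, 258]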
    170. 170. JPEG DESIGNED FOR COMPRESSING FULL COLOUR OR GRAY SCALE DIGITAL IMAGES OF REAL- WORLD SCENES. IT DOES NOT WORK WELL ON TEXT, NON- REALISTIC IMAGES SUCH AS CARTOONS AND LINE DRAWINGS. IT IS INDEPENDENT OF SOURCE IMAGE. IT MEANS IT CAN COMPRESS IMAGE IRRESPECTIVE OF ITS SIZE.
    171. 171. JPEG IT DOES NOT HANDLE B&W (1 BIT PER PIXEL) IMAGES NOR DOES IT HANDLE MOTION PICTURE COMPRESSION. IT USES LOSSY TECHNIQUE. THE ALGO ACHIEVES MUCH OF ITS COMPRESSION BY EXPLOITING KNOWN LIMITATIONS OF HUMAN EYE, THE FACT THAT SMALL COLOUR DETAILS ARE NOT PERCEIVED AS WELL AS SMALL DETAILS OF LIGHT AND DARK.
    172. 172. JPEG IT IS INTENDED FOR COMPRESSING IMAGES THAT WILL BE LOOKED AT BY HUMANS. THE JPEG STANDARD INCLUDES A SEPARATE LOSSLESS MODE, BUT IT IS RARELY USED AND DOES NOT GIVE NEARLY AS MUCH COMPRESSION AS THE LOSSY MODE. A USEFUL PROPERTY OF JPEG IS THAT THE DEGREE OF LOSSINESS CAN BE VARIED BY ADJUSTING COMPRESSION PARAMETERS.
    173. 173. JPEG DECODERS CAN TRADE-OFF DECODING SPEED AGAINST IMAGE QUALITY BY USING FAST BUT INACCURATE APPROXIMATIONS TO THE REQUIRED CALCULATIONS. MAIN ADV OF USING JPEG IS THAT ONE CAN MAKE IMAGE FILES SMALLER, AS WELL AS ONE CAN STORE 24 BIT/PIXEL COLOR DATA (16 MILLION COLORS) INSTEAD OF 8 BIT/PIXEL DATA (256 OR FEWER COLORS) JPEG CAN EASILY PROVIDE 20:1 COMPRESSION OF FULL COLOR DATA. AT LOW QUALITY EVEN 100:1 COMPRESSION IS POSSIBLE. JPEG IS WIDELY USED ON WWW FOR STORING/TRANSMITTING PHOTOGRAPHS.
    174. 174. MPEG THE MAIN ADVANTAGE IS THAT IT COMPRESSES DATA UPTO 1.5 MBITS/SECOND WHICH IS EQUAL TO CDROM DATA TRANSFER RATE. USING MPEG1, ONE CAN DELIVER 1.2 Mbps OF VIDEO AND 250 kbps OF 2 CHANNEL STEREO SOUND USING CDROM TECHNOLOGY. SO MOVIE MAY BE STORED ON CDROM IN MPEG FORMAT AND MAY BE VIEWED WITHOUT ANY SYNCHRONIZATION FAULT.
    175. 175. MPEG JPEG IS FOR STILL IMAGE COMPRESSION WHEREAS MPEG IS FOR MOVING PICTURES. BUT AS DIGITAL VIDEO OR MOVIES STORE A SEQ OF STILL COLOR IMAGES, MPEG STANDARD USES THE JPEG COMPRESSION TO COMPRESS STILL COLOR IMAGES. MPEG IS SUITABLE FOR SYMMETRIC AS WELL AS ASYMMETRIC COMPRESSION.
    176. 176. MPEG ASYMMETRIC COMPRESSION REQUIRES MORE EFFORT FOR CODING THAN DECODING. IN THIS CASE, COMPRESSION IS CARRIED OUT ONCE WHEREAS DECOMPRESSION IS PERFORMED MANY TIMES. SYMMETRIC COMPRESSION IS KNOWN TO EXPECT EQUAL EFFORT FOR COMPRESSION AND DECOMPRESSION PROCESSES.
    177. 177. MPEG INTERACTIVE DIALOGUE APPLICATIONS MAKE USE OF THIS ENCODING TECHNIQUE, WHERE RESTRICTED END-TO-END DELAY IS REQUIRED. MPEG HAS BECOME THE METHOD OF CHOICE FOR ENCODING MOTION IMAGES BECAUSE IT HAS BECOME WIDELY ACCEPTED FOR BOTH INTERNET AND DVD-VIDEO.
    178. 178. MHEG MULTIMEDIA HYPERMEDIA EXPERT GROUP SET UP BY ISO FOR STANDARDIZATION OF EXCHANGE FORMAT FOR MULTIMEDIA PRESENTATION AND MULTIMEDIA SYSTEM. IT IS ALMOST IMPOSSIBLE TO MAKE A MM PRESENTATION WHICH CAN WORK ACROSS DIFF HW PLATFORMS. MAIN OBJECTIVE OF THIS GROUP IS TO CREATE THE STANDARD METHOD OF STORE, EXCHANGE AND DISPLAY MM PRESENTATION.
    179. 179. MHEG IT IS BASED ON OBJECT ORIENTED TECHNOLOGY. FOR MM PRESENTATION, THERE ARE MANY CLASSES THAT DEFINE HOW AUDIO, VIDEO AND MUSIC CAN BE PLAYED. THERE ARE CLASSES THAT CAN HELP TO DEVELOP USER INTERACTION DURING MM PRESENTATIONS. THE THREE IMP CLASSES USED ARE CONTENT CLASS, BEHAVIOUR CLASS AND INTERACTION CLASS.
    180. 180. MHEG CONTENT CLASS IS USED TO DESCRIBE THE ACTUAL CONTENTS OF THE MM PRESENTATION BEHAVIOUR CLASS IS USED TO DECIDE THE BEHAVIOUR OF PRESENTATION.FOR EXAMPLE HOW AND WHEN DATA WILL BE PRESENTED TO THE USER. IT HAS 2 SUB CLASSES- ACTION CLASS AND LINK CLASS WHICH ARE USEFUL FOR SYNC THE EVENTS WITH THE USER INTERFACE.
    181. 181. MHEG THIS CLASS DESCRIBES THE ELEMENTS OF THE USER INTERFACE (ie., THE ELEMENTS THAT APPEAR ON THE USER SCREEN) THAT ALLOW THE USER TO MAKE SELECTIONS, TRIGGER EVENTS AND INPUT INFORMATION. FOR EXAMPLE, THE ELEMENTS CHECK BOX, RADIO BUTTONS AND LISTS ARE USED TO MAKE SELECTIONS; ELEMENT PUSH BUTTON IS USED TO TRIGGER EVENTS AND TEXT ENTRY FIELD IS USED TO INPUT INFORMATION FROM THE USER.