Introduction: What is Compression?
• Data compression requires the identification and extraction of source redundancy.
• In other words, data compression seeks to reduce the number of bits used to store or transmit information.
• There is a wide range of compression methods which can be so unlike one another that they have little in common except that they compress data.
Compression can be categorized in two broad ways:
• Lossless compression
• Lossy compression
Lossless compression
• recovers the exact original data after decompression.
• mainly used for compressing database records, spreadsheets or word-processing files, where exact replication of the original is essential.
Lossy compression
• results in a certain loss of accuracy in exchange for a substantial increase in compression.
• more effective when used to compress graphic images and digitised voice, where losses outside visual or aural perception can be tolerated.
• Most lossy compression techniques can be adjusted to different quality levels, gaining higher accuracy in exchange for less effective compression.
The Need For Compression…
• In terms of storage, the capacity of a storage device can be effectively increased with methods that compress a body of data on its way to the storage device and decompress it when it is retrieved.
• In terms of communications, the bandwidth of a digital communication link can be effectively increased by compressing data at the sending end and decompressing data at the receiving end.
A Brief History of Data Compression…
• The late 1940s were the early years of Information Theory; the idea of developing efficient new coding methods was just starting to be fleshed out. Ideas of entropy, information content and redundancy were explored.
• One popular notion held that if the probabilities of the symbols in a message were known, there ought to be a way to code the symbols so that the message would take up less space.
• The first well-known method for compressing digital signals is now known as Shannon-Fano coding. Shannon and Fano [~1948] simultaneously developed this algorithm, which assigns binary codewords to the unique symbols that appear within a given data file.
• While Shannon-Fano coding was a great leap forward, it had the unfortunate luck to be quickly superseded by an even more efficient coding system: Huffman coding.
• Huffman coding shares most characteristics of Shannon-Fano coding.
• Huffman coding performs effective data compression by reducing the amount of redundancy in the coding of symbols.
• It has been proven to be optimal among coding methods that assign each symbol its own fixed (integer-length) codeword.
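The idea above, merging the two least frequent symbols until one tree remains, can be sketched in Python. This is a minimal illustrative version (a heap of partial code tables rather than an explicit tree), not a production encoder:

```python
import heapq
from collections import Counter

def huffman_codes(data):
    """Build a Huffman code table: repeatedly merge the two least
    frequent subtrees, prefixing '0' to one side and '1' to the other."""
    freq = Counter(data)
    # Heap entries: (frequency, unique tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, t1 = heapq.heappop(heap)
        f2, _, t2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in t1.items()}
        merged.update({s: "1" + c for s, c in t2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

# Frequent symbols (A) get shorter codewords than rare ones (C)
codes = huffman_codes("ABBAAAACDEAAABBBDDEEAAA")
```

On this stream the most frequent symbol A receives a 1-bit codeword and the rarest symbol C a 4-bit one, which is where the redundancy reduction comes from.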
• In the last fifteen years, Huffman coding has increasingly been replaced by arithmetic coding.
• Arithmetic coding bypasses the idea of replacing each input symbol with a specific code.
• Instead, it replaces a stream of input symbols with a single floating-point output number.
• Longer, more complex messages require more bits in the output number.
Terminology
• Compressor – software (or hardware) device that compresses data
• Decompressor – software (or hardware) device that decompresses data
• Codec – software (or hardware) device that compresses and decompresses data
• Algorithm – the logic that governs the compression/decompression process
Repetitive Sequence Suppression
• If a series of n successive identical tokens appears, replace the series with one token and a count of the number of occurrences.
• Usually need to have a special flag to denote when the repeated token appears.
• Example: 89400000000000000000000000000000000 can be replaced with 894f32, where f is the flag for zero.
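The zero-suppression scheme above can be sketched in Python. This is a minimal illustration of the slide's example; the flag character `f` follows the slide's convention, and a real scheme would also need to guarantee the flag cannot be confused with ordinary data:

```python
def suppress_zeros(s, flag="f"):
    """Replace each run of two or more '0' characters with flag + run length."""
    out = []
    i = 0
    while i < len(s):
        if s[i] == "0":
            j = i
            while j < len(s) and s[j] == "0":
                j += 1                      # scan to the end of the zero run
            run = j - i
            out.append(flag + str(run) if run > 1 else "0")
            i = j
        else:
            out.append(s[i])                # non-zero tokens pass through
            i += 1
    return "".join(out)

# The slide's example: 894 followed by 32 zeros becomes "894f32"
compressed = suppress_zeros("894" + "0" * 32)
```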
Run-Length Encoding
Example:
• Original sequence: 111122233333311112222
• can be encoded as: (1,4), (2,3), (3,6), (1,4), (2,4)
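The (symbol, run length) encoding above can be sketched in a few lines of Python; this is an illustrative implementation, not a specific library's API:

```python
from itertools import groupby

def rle_encode(seq):
    """Run-length encode a sequence into (symbol, run length) pairs."""
    return [(sym, len(list(run))) for sym, run in groupby(seq)]

def rle_decode(pairs):
    """Invert rle_encode: repeat each symbol by its run length."""
    return "".join(sym * count for sym, count in pairs)

encoded = rle_encode("111122233333311112222")
# → [('1', 4), ('2', 3), ('3', 6), ('1', 4), ('2', 4)], as on the slide
```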
Run-Length Encoding (RLE) Method
• Example: blue × 6, magenta × 7, red × 3, yellow × 3 and green × 4
Run-Length Encoding (RLE) Method
• Example: this would give an encoding which is twice the size!
• Uncompressed:
  Blue White White White White White White Blue
  White Blue White White White White White Blue
  etc.
• Compressed:
  1×Blue 6×White 1×Blue
  1×White 1×Blue 4×White 1×Blue 1×White
  etc.
The Shannon-Fano Algorithm
• Example
• Data: ABBAAAACDEAAABBBDDEEAAA…
• Count the symbols in the stream:
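A minimal Python sketch of Shannon-Fano coding on the stream above. The split rule used here (cut the frequency-sorted list where the running total first reaches half) is one common variant; other descriptions minimise the difference between the two halves instead:

```python
from collections import Counter

def shannon_fano(freqs):
    """Recursively split symbols (sorted by descending frequency) into two
    groups of roughly equal total frequency, prefixing '0' and '1'."""
    codes = {}

    def split(group, prefix):
        if len(group) == 1:
            codes[group[0]] = prefix or "0"
            return
        total = sum(freqs[s] for s in group)
        acc = 0
        for cut, sym in enumerate(group[:-1], start=1):
            acc += freqs[sym]
            if acc >= total / 2:       # first point past half the weight
                break
        split(group[:cut], prefix + "0")
        split(group[cut:], prefix + "1")

    split(sorted(freqs, key=freqs.get, reverse=True), "")
    return codes

freqs = Counter("ABBAAAACDEAAABBBDDEEAAA")   # counts: A=11 B=5 D=3 E=3 C=1
codes = shannon_fano(freqs)
```

The result is a prefix-free code in which no codeword is the beginning of another, so the bit stream can be decoded unambiguously.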
Arithmetic Coding
Example
• Raw data: BACA
Therefore:
• A occurs with probability 2/4 = 0.5,
• B and C each occur with probability 1/4 = 0.25.
Cont…
• Start by assigning each symbol a sub-range of the probability range 0–1.
• The first symbol in our example stream is B.
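The interval-narrowing process can be sketched in Python. Assuming the sub-ranges A = [0, 0.5), B = [0.5, 0.75), C = [0.75, 1.0) derived from the probabilities above (the order of the sub-ranges is a convention; any fixed assignment works):

```python
def arithmetic_encode(message, ranges):
    """Narrow the interval [low, high) once per symbol; any number
    inside the final interval encodes the whole message."""
    low, high = 0.0, 1.0
    for sym in message:
        span = high - low
        sym_low, sym_high = ranges[sym]
        low, high = low + span * sym_low, low + span * sym_high
    return low, high

# Sub-ranges of 0-1 from the probabilities: A=0.5, B=0.25, C=0.25
ranges = {"A": (0.0, 0.5), "B": (0.5, 0.75), "C": (0.75, 1.0)}
low, high = arithmetic_encode("BACA", ranges)
# "BACA" narrows 0-1 to [0.59375, 0.609375); e.g. 0.59375 encodes it
```

Note how the first symbol B immediately restricts the interval to [0.5, 0.75), and each later symbol subdivides the current interval in the same proportions.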
Applications of Lossless Compression
• The above is a very simple example of run-length encoding, wherein large runs of consecutive identical data values are replaced by a simple code with the data value and the length of the run. This is an example of lossless data compression.
• It is often used to optimize disk space on office computers, or to make better use of the connection bandwidth in a computer network.
Lossy Image Compression
• is used in digital cameras, to increase storage capacity with minimal degradation of picture quality.
Algorithm of Coding…

function LZW_Encode(File)
    w ← ReadByte(File)
    while not EOF(File) do
        k ← ReadByte(File)
        if w + k is in the dictionary then
            w ← w + k
        else
            Output(index of w)
            add w + k to the dictionary
            w ← k
    Output(index of w)

function LZW_Decode(File)
    n ← ReadIndex(File)
    w ← GetString(n)
    Output(w)
    while not EOF(File) do
        n ← ReadIndex(File)
        if n is in the dictionary then entry ← GetString(n)
        else entry ← w + first symbol of w
        Output(entry)
        add w + first symbol of entry to the dictionary
        w ← entry
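The pseudocode above can be sketched as runnable Python. This is a minimal illustrative version (dictionary pre-loaded with the 256 single-byte strings, codes kept as plain integers rather than packed bits):

```python
def lzw_encode(data):
    """LZW: emit dictionary indices for the longest already-seen strings."""
    table = {chr(i): i for i in range(256)}   # start with single characters
    w, out = "", []
    for ch in data:
        wc = w + ch
        if wc in table:
            w = wc                             # keep extending the match
        else:
            out.append(table[w])               # emit code for w
            table[wc] = len(table)             # learn the new string
            w = ch
    if w:
        out.append(table[w])
    return out

def lzw_decode(codes):
    """Rebuild the same dictionary on the fly from the index stream."""
    table = {i: chr(i) for i in range(256)}
    w = table[codes[0]]
    out = [w]
    for n in codes[1:]:
        entry = table[n] if n in table else w + w[0]  # the one special case
        out.append(entry)
        table[len(table)] = w + entry[0]
        w = entry
    return "".join(out)

codes = lzw_encode("TOBEORNOTTOBEORTOBEORNOT")
# Round-trips, and uses fewer codes than there are input characters
```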