Lossless Compression
Multimedia Systems (Module 2)
• Lesson 1:
  o Minimum Redundancy Coding based on Information Theory:
    • Shannon-Fano Coding
    • Huffman Coding
• Lesson 2:
  o Adaptive Coding based on Statistical Modeling:
    • Adaptive Huffman
    • Arithmetic Coding
• Lesson 3:
  o Dictionary-based Coding:
    • LZW
Lossless Compression
Multimedia Systems (Module 2, Lesson 1)
Summary:
• Compression
  o With loss
  o Without loss
• Shannon: Information Theory
• Shannon-Fano Coding Algorithm
• Huffman Coding Algorithm
Sources:
• The Data Compression Book, 2nd Ed., Mark Nelson and Jean-Loup Gailly.
• Dr. Ze-Nian Li’s course material
Compression
Why Compression?
All media, whether text, audio, graphics, or video, contain “redundancy”. Compression attempts to eliminate this redundancy.
What is Redundancy?
  o If one representation of a media content M takes X bytes and another takes Y bytes (Y < X), then X is a redundant representation relative to Y.
  o Other forms of redundancy: if the representation of a medium captures content that is not perceivable by humans, then removing that content will not affect the perceived quality.
    • For example, audio frequencies outside the human hearing range need not be captured, without any harm to the audio’s quality.
Is there a representation with an optimal size Z that cannot be improved upon? This question is tackled by information theory.
Compression
Lossless Compression vs. Compression with Loss
[Diagram: the original media M is compressed to a smaller representation m.]
• Lossless: M → Lossless Compress → m → Uncompress → M (the original M is recovered exactly).
• With loss: M → Compress with loss → m → Uncompress → M′, where M′ ≠ M (only an approximation of M is recovered).
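As a quick illustration of the lossless round trip (not part of the original slide), Python’s standard zlib module compresses and decompresses without loss, so the recovered bytes are identical to the input:

```python
import zlib

# Original message M: after lossless compression and decompression,
# the recovered data is bit-for-bit identical to M.
M = b"All media, be it text, audio, graphics or video has redundancy."
m = zlib.compress(M)             # compressed representation (often smaller for redundant data)
assert zlib.decompress(m) == M   # the round trip recovers M exactly
print(len(M), "->", len(m), "bytes")
```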
Information Theory
According to Shannon, the entropy@ of an information source S is defined as:
  H(S) = Σ_i p_i log2(1/p_i)
  o log2(1/p_i) indicates the amount of information contained in symbol S_i, i.e., the number of bits needed to code symbol S_i.
  o For example, in an image with a uniform distribution of gray-level intensities, p_i = 1/256, and the number of bits needed to code each gray level is 8; the entropy of the image is 8.
  o Q: What is the entropy of a source with M symbols where each symbol is equally likely?
    • Entropy, H(S) = log2 M
  o Q: How about an image in which half of the pixels are white and half are black?
    • Entropy, H(S) = 1
@Here is an excellent primer by Dr. Schnieder on this subject.
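A minimal sketch (added here for illustration) that evaluates H(S) for the two examples above:

```python
import math

def entropy(probs):
    """H(S) = sum_i p_i * log2(1/p_i), in bits per symbol."""
    return sum(p * math.log2(1.0 / p) for p in probs if p > 0)

# Uniform 256-level gray image: 8 bits per symbol
print(entropy([1 / 256] * 256))    # -> 8.0
# Half the pixels white, half black: 1 bit per symbol
print(entropy([0.5, 0.5]))         # -> 1.0
```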
Information Theory
Discussion:
  o Entropy is a measure of how much information is encoded in a message. The higher the entropy, the higher the information content.
    • We could also say entropy is a measure of the uncertainty in a message. Information and uncertainty are equivalent concepts.
  o The units of entropy (in coding theory) are bits per symbol.
    • The unit is determined by the base of the logarithm: base 2 gives binary (bits); base 10 gives decimal (digits).
  o Entropy gives the actual number of bits of information contained in a message source.
    • Example: if the probability of the character ‘e’ appearing in this slide is 1/16, then the information content of this character is 4 bits. So the character string “eeeee” has a total content of 20 bits (contrast this with an 8-bit ASCII coding, which would use 40 bits to represent “eeeee”).
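The arithmetic in the ‘e’ example can be checked directly; p_e = 1/16 is the probability assumed on the slide:

```python
import math

p_e = 1 / 16                       # probability of 'e' assumed on this slide
info = math.log2(1 / p_e)          # 4 bits of information per occurrence
print(info * len("eeeee"))         # 20.0 bits, vs 5 * 8 = 40 bits in 8-bit ASCII
```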
Data Compression = Modeling + Coding
Data compression consists of taking a stream of symbols and transforming them into codes.
  o The model is a collection of data and rules used to process input symbols and determine their probabilities.
  o A coder uses the model’s probabilities to emit codes when it is given input symbols.
Let’s take Huffman coding to demonstrate the distinction:
  [Block diagram: Input Stream → (Symbols) → Model → (Probabilities) → Encoder → (Codes) → Output Stream]
• The output of the Huffman encoder is determined by the model (probabilities): the higher the probability, the shorter the code.
• Model A could determine the raw probability of each symbol occurring anywhere in the input stream (p_i = # of occurrences of S_i / total number of symbols).
• Model B could determine probabilities based on the last 10 symbols in the input stream, continuously re-computing the probabilities (see the sketch below).
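A rough sketch of the two models described above; the function names model_a and model_b are ours, not from the slide:

```python
from collections import Counter, deque

def model_a(stream):
    """Model A: raw probabilities over the whole input stream.
    p_i = (# of occurrences of S_i) / (total number of symbols)."""
    counts = Counter(stream)
    total = len(stream)
    return {s: c / total for s, c in counts.items()}

def model_b(stream, window=10):
    """Model B: probabilities re-computed from only the last `window` symbols."""
    recent = deque(stream[-window:], maxlen=window)
    counts = Counter(recent)
    return {s: c / len(recent) for s, c in counts.items()}

print(model_a("AABAAACDEA"))            # probabilities over the whole stream
print(model_b("AABAAACDEA", window=5))  # probabilities over the last 5 symbols
```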
The Shannon-Fano Encoding Algorithm
1. Calculate the frequencies of the list of symbols (organize them as a list).
2. Sort the list in decreasing order of frequency.
3. Divide the list into two halves, with the total frequency counts of the two halves as close to each other as possible.
4. The upper half is assigned a code of 0 and the lower half a code of 1.
5. Recursively apply steps 3 and 4 to each of the halves, until each symbol has become a corresponding code leaf on the tree.
(A runnable sketch of these steps appears after the example below.)
[Tree diagram: counts A=15, B=7, C=6, D=6, E=5. First split: {A, B} (prefix 0) vs. {C, D, E} (prefix 1); then {A} vs. {B} and {C} vs. {D, E}, giving the codes in the table below.]
Example
Symbol | Count | Info. -log2(p_i) | Code | Subtotal (# of bits)
A      | 15    | 1.38             | 00   | 30
B      |  7    | 2.48             | 01   | 14
C      |  6    | 2.70             | 10   | 12
D      |  6    | 2.70             | 110  | 18
E      |  5    | 2.96             | 111  | 15
Total information content: 85.25 bits; total encoded length: 89 bits.

It takes a total of 89 bits to encode 85.25 bits of information (pretty good!).
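The following is a sketch of the five steps above in Python; the choice of split point is one reasonable reading of step 3 (minimize the difference between the two halves' counts), and on the A–E counts it reproduces the codes and the 89-bit total from the table:

```python
def shannon_fano(freqs):
    """Shannon-Fano coding sketch. `freqs` is {symbol: count}; returns {symbol: code}."""
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)  # steps 1-2
    codes = {s: "" for s, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(c for _, c in group)
        # Step 3: find the cut that makes the two halves' total counts closest.
        running, best_cut, best_diff = 0, 1, float("inf")
        for i, (_, c) in enumerate(group[:-1], start=1):
            running += c
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        upper, lower = group[:best_cut], group[best_cut:]
        # Step 4: the upper half gets '0', the lower half gets '1'.
        for s, _ in upper:
            codes[s] += "0"
        for s, _ in lower:
            codes[s] += "1"
        # Step 5: recurse on each half.
        split(upper)
        split(lower)

    split(items)
    return codes

counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = shannon_fano(counts)
print(codes)  # {'A': '00', 'B': '01', 'C': '10', 'D': '110', 'E': '111'}
print(sum(len(codes[s]) * c for s, c in counts.items()))  # 89 bits
```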
The Huffman Algorithm
1. Initialization: put all nodes in an OPEN list L and keep it sorted at all times (e.g., ABCDE).
2. Repeat the following steps until the list L has only one node left:
   1. From L, pick the two nodes having the lowest frequencies and create a parent node for them.
   2. Assign the sum of the children’s frequencies to the parent node and insert it into L.
   3. Assign codes 0 and 1 to the two branches of the tree, and delete the children from L.
(A sketch using a heap appears after the example below.)
[Symbol counts: A=15, B=7, C=6, D=6, E=5.]

Example
Symbol | Count | Info. -log2(p_i) | Code | Subtotal (# of bits)
A      | 15    | 1.38             | 1    | 15
B      |  7    | 2.48             | 000  | 21
C      |  6    | 2.70             | 001  | 18
D      |  6    | 2.70             | 010  | 18
E      |  5    | 2.96             | 011  | 15
Total information content: 85.25 bits; total encoded length: 87 bits.

[Tree diagram: D and E merge into a node of weight 11, B and C into a node of weight 13; these merge into 24, which joins A (15) at the root (39). Labeling the branches 0 and 1 gives the codes in the table above.]
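Below is a sketch of the algorithm using a binary heap in place of the sorted OPEN list L; the exact 0/1 labels may differ from the slide’s tree, but the code lengths and the 87-bit total match:

```python
import heapq
from itertools import count

def huffman(freqs):
    """Huffman coding sketch. `freqs` is {symbol: count}; returns {symbol: code}."""
    tie = count()  # tie-breaker so the heap never has to compare tree nodes
    heap = [(c, next(tie), (s, None, None)) for s, c in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Pick the two nodes with the lowest frequencies...
        c0, _, left = heapq.heappop(heap)
        c1, _, right = heapq.heappop(heap)
        # ...and insert a parent node carrying the sum of their frequencies.
        heapq.heappush(heap, (c0 + c1, next(tie), (None, left, right)))

    codes = {}
    def walk(node, prefix):
        symbol, left, right = node
        if symbol is not None:           # leaf: record its code
            codes[symbol] = prefix or "0"
            return
        walk(left, prefix + "0")         # 0 on one branch, 1 on the other
        walk(right, prefix + "1")
    walk(heap[0][2], "")
    return codes

counts = {"A": 15, "B": 7, "C": 6, "D": 6, "E": 5}
codes = huffman(counts)
print(codes)
print(sum(len(codes[s]) * c for s, c in counts.items()))  # 87 bits
```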
Huffman Algorithm: Discussion
Decoding for the above two algorithms is trivial as long as the coding table (the statistics) is sent before the data. There is an overhead for sending this table, but it is negligible if the data file is big.
Unique prefix property: no code is a prefix of any other code (all symbols are at the leaf nodes) --> great for the decoder: decoding is unambiguous (unique decipherability).
If prior statistics are available and accurate, then Huffman coding is very good.
Number of bits per symbol needed for Huffman coding: 87 / 39 = 2.23
Number of bits per symbol needed for Shannon-Fano coding: 89 / 39 = 2.28
(For comparison, the entropy of this source is 85.25 / 39 ≈ 2.19 bits per symbol.)
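A small decoder sketch illustrating the unique prefix property, using the Huffman codes from the previous slide: a symbol is emitted as soon as the buffered bits match a code, with no ambiguity.

```python
def decode(bitstring, codes):
    """Decode a bit string with a prefix code: since no code is a prefix of
    another, a symbol can be emitted as soon as the buffer matches a code."""
    inverse = {code: symbol for symbol, code in codes.items()}
    out, buf = [], ""
    for bit in bitstring:
        buf += bit
        if buf in inverse:
            out.append(inverse[buf])
            buf = ""
    if buf:
        raise ValueError("truncated or invalid bit string")
    return "".join(out)

codes = {"A": "1", "B": "000", "C": "001", "D": "010", "E": "011"}  # from the Huffman example
encoded = "".join(codes[s] for s in "ABADE")
print(decode(encoded, codes))  # ABADE
```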