Greedy Technique
Huffman coding
• Huffman Coding is a technique of compressing data to reduce its size without losing any of the details. It was first developed by David Huffman.
• Huffman Coding is generally useful for compressing data.
• It uses variable-length encoding: it assigns a variable-length code to each character.
• The code length of a character depends on how frequently it occurs in the given text.
• The character which occurs most frequently gets the smallest code.
• The character which occurs least frequently gets the largest code.
• It is also known as Huffman Encoding.
Encoding
• Fixed-length encoding (ASCII encoding)
• Variable-length encoding (Huffman encoding)
Fixed length encoding / ASCII Encoding
• Fixed-length encoding uses the same code length for all the characters.
• For n distinct characters, the common code length must satisfy m >= ⌈log2 n⌉ bits.
Ex: BCCADECEE is the message to be sent over a network.
In ASCII encoding, each character occupies 8 bits. There are a total of 9 characters in the above string. Thus, a total of 9*8 = 72 bits are required to send the above string.
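The two bit counts above can be checked with a short sketch in Python (the message and the m >= ⌈log2 n⌉ formula are from the slide; the variable names are illustrative):

```python
import math

msg = "BCCADECEE"
n_distinct = len(set(msg))  # 5 distinct characters: A, B, C, D, E

ascii_bits = len(msg) * 8   # ASCII uses 8 bits per character
# minimal fixed-length code: m = ceil(log2 n) bits per character
fixed_bits = len(msg) * math.ceil(math.log2(n_distinct))

print(ascii_bits)   # 72
print(fixed_bits)   # 27 (3 bits per character)
```

Even the best fixed-length code still wastes bits on rare characters, which is what Huffman's variable-length code improves on.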
Contd.,
• Using the Huffman Coding technique, we can compress the string to a smaller size.
• Huffman coding uses variable-length coding to generate a different-length code for each character.
• Huffman coding first creates a tree using the frequencies of the characters and then generates a code for each character.
• There are two major steps in Huffman Coding:
1. Building a Huffman Tree from the input characters.
2. Assigning codes to the characters by traversing the Huffman Tree.
Huffman Tree
The steps involved in the construction of Huffman Tree are
as follows-
• Create a leaf node for each character of the text.
• The leaf node of a character stores that character's frequency of occurrence.
• Arrange all the nodes in increasing order of their frequency values.
• Take the two nodes with minimum frequency and create a new internal node.
• The frequency of this new node is the sum of the frequencies of those two nodes.
• Make the first node the left child and the other node the right child of the newly created node.
• Keep repeating until all the nodes form a single tree.
• The tree finally obtained is the desired Huffman Tree.
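The steps above can be sketched with a min-heap, which always yields the two minimum-frequency nodes (a sketch in Python; the tuple node layout `(char, left, right)` is an illustrative choice, not from the slides):

```python
import heapq
import itertools

def build_huffman_tree(freqs):
    """Build a Huffman tree from a {char: frequency} dict.

    Each node is a tuple (char_or_None, left, right); leaves have
    left = right = None. Returns the root node.
    """
    counter = itertools.count()  # tie-breaker so the heap never compares node tuples
    heap = [(f, next(counter), (ch, None, None)) for ch, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)   # two minimum-frequency nodes
        f2, _, right = heapq.heappop(heap)
        # new internal node whose frequency is the sum of its children's
        heapq.heappush(heap, (f1 + f2, next(counter), (None, left, right)))
    return heap[0][2]
```

Each merge removes two nodes and adds one, so after n-1 merges a single tree remains, exactly as the steps describe.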
Huffman Code
Now:
• We assign a weight to each edge of the constructed Huffman Tree.
• Assign weight '0' to the left edges and weight '1' to the right edges.
Rule:
• If we assign weight '0' to the left edges, then assign weight '1' to the right edges.
• If we assign weight '1' to the left edges, then assign weight '0' to the right edges.
• Either of the two conventions may be followed.
• But the convention adopted at the time of encoding must also be followed at the time of decoding.
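Assigning '0' to left edges and '1' to right edges amounts to a root-to-leaf traversal that accumulates the edge labels (a sketch in Python, assuming the illustrative `(char, left, right)` tuple layout for tree nodes):

```python
def assign_codes(node, prefix="", codes=None):
    """Walk the Huffman tree: append '0' on left edges, '1' on right edges."""
    if codes is None:
        codes = {}
    ch, left, right = node    # node layout: (char_or_None, left, right)
    if left is None:          # leaf: the accumulated prefix is the codeword
        codes[ch] = prefix
    else:
        assign_codes(left, prefix + "0", codes)
        assign_codes(right, prefix + "1", codes)
    return codes
```

Because every character sits at a leaf, no codeword can be a prefix of another, which is what makes unambiguous decoding possible.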
[Figure: step-by-step construction of the Huffman Tree for frequencies A=0.35, B=0.1, C=0.2, D=0.2, _=0.15. First B (0.1) and _ (0.15) are merged into a node of frequency 0.25; then C (0.2) and D (0.2) into a node of frequency 0.4; then the 0.25 node and A (0.35) into a node of frequency 0.6; finally the 0.4 and 0.6 nodes into the root of frequency 1.0. Left edges are labelled 0 and right edges 1.]
character   A      B      C      D      _
frequency   0.35   0.1    0.2    0.2    0.15
codeword    11     100    00     01     101

Average bits per character:
(0.35*2 + 0.1*3 + 0.2*2 + 0.2*2 + 0.15*3) / sum(frequencies)
= 2.25 / (0.35 + 0.1 + 0.2 + 0.2 + 0.15)
= 2.25 / 1.0
= 2.25
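The average codeword length above can be recomputed directly from the table (a sketch in Python using the slide's frequencies and codewords):

```python
freq = {"A": 0.35, "B": 0.1, "C": 0.2, "D": 0.2, "_": 0.15}
code = {"A": "11", "B": "100", "C": "00", "D": "01", "_": "101"}

# weighted average codeword length = sum over characters of frequency * code length
avg_bits = sum(freq[ch] * len(code[ch]) for ch in freq) / sum(freq.values())
print(round(avg_bits, 2))  # 2.25
```

Compared with the 3 bits per character of the best fixed-length code for 5 symbols, this is a real saving.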
Example 1:
Encoding and Decoding
Encode the text: AB_CD_AB
We know that the codewords are: A=11, B=100, C=00, D=01, _=101
Encoded message: 11100101000110111100
Decode the text: 11100101000110111100
Decoded message: AB_CD_AB
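Encoding is simple concatenation of codewords; decoding works greedily because the code is prefix-free (a sketch in Python with the codewords from Example 1):

```python
codes = {"A": "11", "B": "100", "C": "00", "D": "01", "_": "101"}

def encode(text):
    return "".join(codes[ch] for ch in text)

def decode(bits):
    # Huffman codes are prefix-free, so bits can be matched greedily left to right
    inverse = {v: k for k, v in codes.items()}
    out, buf = [], ""
    for b in bits:
        buf += b
        if buf in inverse:        # a complete codeword has been read
            out.append(inverse[buf])
            buf = ""
    return "".join(out)

print(encode("AB_CD_AB"))               # 11100101000110111100
print(decode("11100101000110111100"))   # AB_CD_AB
```

Note that no separators are needed between codewords: the prefix property guarantees that exactly one codeword matches at each step.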
Example 2:
[The character/frequency table for this example is not reproduced here.]
Final solution:
A = 111
I = 00
S = 01
E = 10
T = 11000
O = 11001
U = 1101
Example 3:
character   A     B     C      D      _
frequency   0.4   0.1   0.2    0.15   0.15
codeword    1     000   011    001    010
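A quick check on Example 3's code: it is prefix-free, and its average length is below the 3 bits a fixed-length code would need (a sketch in Python using the table's values):

```python
freq = {"A": 0.4, "B": 0.1, "C": 0.2, "D": 0.15, "_": 0.15}
code = {"A": "1", "B": "000", "C": "011", "D": "001", "_": "010"}

# prefix property: no codeword is a prefix of another codeword
words = list(code.values())
prefix_free = all(not w2.startswith(w1)
                  for w1 in words for w2 in words if w1 != w2)
print(prefix_free)      # True

avg = sum(freq[c] * len(code[c]) for c in freq)
print(round(avg, 2))    # 2.2
```

Note how the most frequent character A (0.4) gets the single-bit code, exactly as the slides describe.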