Huffman Coding
and its implementation on
MATLAB
To
Dr. Samir Ghadhban
By
Faisal K. Al-Hajri
227064
King Fahd University of Petroleum & Minerals
Electrical Engineering Department
EE430-062
Before I start explaining how
could you implement Huffman coding
using MATLAB functions, lets first get a
clear idea about Huffman coding and how
could you find the codewords.
Huffman Coding:
Huffman codes are lossless data compression
codes. They play an important role in data
communications, speech coding, and video or graphical
image compression. Huffman codes generally have
variable-length code words.
Construction of Huffman Codes:-
1- List the source symbols in a column in
descending order of probability.
2- Combine symbols, starting with the two lowest
probability symbols, to form a new compound symbol.
3- Repeating step (2), using the two lowest probability
symbols from the new set of symbols. This process continues
until all of the original symbols have been combined into a
single compound symbol having probability 1.
4- Code words are assigned by reading the labels of
the tree stems from right to left back to the original symbol.
Example:
Construct a Huffman code for the following probabilities:
Symbols Probabilities
E 0.30
N 0.13
I 0.10
O 0.16
P 0.05
T 0.23
W 0.03
• First, list the source symbols in a column in descending
order of probability.
Symbols Probabilities
E 0.30
T 0.23
O 0.16
N 0.13
I 0.10
P 0.05
W 0.03
• Second, Combine symbols starting with the two lowest
probability symbols to form a new compound symbol.
Symbols Probabilities
E 0.30
T 0.23
O 0.16
N 0.13
I 0.10
P 0.05
W 0.03
These are the
two lowest
probabilities.
So, we will
add them.
Symbols Probabilities
E 0.30
T 0.23
O 0.16
N 0.13
I 0.10
P 0.05
W 0.03
New Set
0.30
0.23
0.16
0.13
0.10
0.08
We keep adding the lowest probabilities from the new set
until all of the original symbols have been combined into a
single compound symbols (E, T, O, N, I, P, W) having
probability of 1.
Note:- for each new list, we MUST rearrangement in descending
order.
New
Set
0.30
0.23
0.16
0.13
0.10
0.08
New
Set
0.30
0.23
0.18
0.16
0.13
New
Set
0.30
0.29
0.23
0.18
New
Set
0.41
0.30
0.29
New
Set
0.59
0.41
New
Set
1
Symbol
s
Probabi
lities
E 0.30
T 0.23
O 0.16
N 0.13
I 0.10
P 0.05
W 0.03
Each new set, we
rearrangement in descending
order and keep tracing each
probability from which
symbols it comes.
New
Set
0.30
0.23
0.16
0.13
0.10
0.08
New
Set
0.30
0.23
0.18
0.16
0.13
New
Set
0.30
0.29
0.23
0.18
New
Set
0.41
0.30
0.29
New
Set
0.59
0.41
New
Set
1
0.03
W
0.05
P
0.10
I
0.13
N
0.16
O
0.23
T
0.30
E
Probabi
lities
Symbol
s
The code word for the symbol W is: ( )
The blue line shows
the trace line for the
symbol W. Assign 0 if
the probability that
has been added is the
lowest and 1 if it’s
before the lowest.
So, the code word for the all symbols are:
The code word for the symbol E is:(11)
The code word for the symbol T is: (01)
The code word for the symbol O is: (101)
The code word for the symbol N is: (100)
The code word for the symbol I is: (001)
The code word for the symbol P is: (0001)
The code word for the symbol W is: (0000)
Now,
after we get a clear idea about Huffman coding. Lets now try
to implement it using MATLAP…
There are many ways to implement Huffman coding using
MATLAP or another program. My program consists of three parts or
should I call it Three phases.
First Phase:
It’s a checking phase. Where all the input arguments are being
checked whether they satisfy the conditions or not ( as we will see).
Second Phase:
This phase I should call it the heart of the algorithm. In this
phase, there are some loops that are employed for sorting & decisions
such as comparisons.
Third Phase:
In the third phase, all the output arguments are being finalized
and calculated.
Before introducing each phase, let me talk briefly about the MATLAB function
( function )
Description:
You add new functions to the MATLAB vocabulary by expressing
them in terms of existing functions. The existing commands and functions that
compose the new function reside in a text file called an M-file. This function is
usually used for writing an algorithm for general uses not specified for one
case.
In my algorithm, I have started with this code which means the following:
huffcode(s,p): huffcode for my function name and M-file must be also named
as same as your function name. s stands for symbols and p for probabilities.
CWord, WLen, Entropy, EfficiencyBefore, EfficiencyAfter & Codes, these are
the output arguments
MATLAB CODE
Phase I
• This if statement checks if the probabilities
are in vector or not. If it’s not the MATLAB
will show the error message and will not
continue to the rest of the program.
You can’t give probability of type integer.
You must give a probability of type double.
Note: this check is not too important.
Check each element in the probability
vector. If there is a number less than zero
(negative) then MATLAB will show the
error message and stop.
To be sure that all the elements in
probability vector are less than one.
It’s not enough to check if all the elements
are less than one. We have to check also
the sum of all probability elements are
equal to one.
Here simply to make sure that the size of the
symbols vector and probability vector are
equal.
Phase II
In this phase I will show the idea of
structures and how did I use it to write this
algorithms.
Before taking the Symbols and Probabilities
vectors and push them inside the structure
we must rearrangement in descending
order of probability. That can be done
easily by this simple code.
Structure Function
Struct
Create structure array
Syntax
s = struct('field1', {}, 'field2', {}, ...)
s = struct('field1', values1, 'field2', values2, ...)
Description
s = struct('field1', {}, 'field2', {}, ...) creates an empty structure with fields field1, field2,
s = struct('field1', values1, 'field2', values2, ...) creates a structure array with the
specified fields and values. The value arrays values1, values2, etc., must be cell
arrays of the same size or scalar cells. Corresponding elements of the value arrays
are placed into corresponding structure array elements. The size of the resulting
structure is the same size as the value cell arrays or 1-by-1 if none of the values is a
cell.
Tree is the name of structure. This structure now contains only one
structure with this following properties:
Symbol: To save the symbol name.
prob : To save the original symbol name.
pTemp1: here to where the addition and sorting are done.
Node: To trace each symbol
Serial : Not important, just for checking.
Code: Here, where the code words are saved.
CP: this is the code pointer. The code pointer is to point where the next
0 or 1 should be written.
Note:- one structure is only for one symbol. So, we have to put Tree
inside a loop to generate many structures equal to the number of
symbols we have.
Let’s recall now our example and see how it’s going to be fit inside the
structure.
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.16
0
5
1
N
0.13
0.13
0
4
1
I
0.10
0.10
0
3
1
P
0.05
0.05
0
2
1
W
0.03
0.03
0
1
1
S1 S2 S3 S4 S5 S6 S7
Tree
Symbol
Probabilitiy
Temp. Prob
Node
Number
Serial
number
Code word
Code
Pointer
Note: S1, S2 … S7 are structure1, structure 2 … structure7
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.16
0
5
1
N
0.13
0.13
0
4
1
I
0.10
0.10
0
3
1
P
0.05
0.05
0
2
1
W
0.03
0.03
0
1
1
Add these two lowest probabilities. Assign 0 for the
lowest probability and 1 for the higher one. After
the increment code pointer with 1 and assign for
both of them same node which is 1.
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.16
0
5
1
N
0.13
0.13
0
4
1
I
0.10
0.10
0
3
1
P
0.05
0.8
1
2
[1]
2
W
0.03
0.8
1
1
[0]
2
Resort all the structures, but it happens that they are already
in descending order. Sort only the pTemp1. Now we add the
next lowest probabilities, HOWEVER, we exclude the last
structure ( W symbol) in this addition.
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.16
0
5
1
N
0.13
0.13
0
4
1
I
0.10
0.18
2
3
[1]
2
P
0.05
0.18
2
2
[01]
3
W
0.03
0.8
2
1
[00]
3
After addition, and assigning 0 for the lowest and 1 for the higher one. We
check the node number if it’s not 0 search on all structure having the same
node number and change it to the new node number. Also, we increment the
CP and assign 0 for previous structures having the same node number. In this
case P was having node number 1 and same thing W, so, we assign for both
the new node number and put 0 for the code words as well as increment the
code pointer (CP) for both.
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.16
0
5
1
N
0.13
0.13
0
4
1
I
0.10
0.18
2
3
[1]
2
P
0.05
0.18
2
2
[01]
3
W
0.03
0.8
2
1
[00]
3
Resorting all the structures in descending order for the pTemp1. Also,
we exclude the last two structures in the sorting ( P & W).
Now, we add the lowest two probabilities O & N.
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.29
3
5
[1]
2
N
0.13
0.29
3
4
[0]
2
I
0.10
0.18
2
3
[1]
2
P
0.05
0.18
2
2
[01]
3
W
0.03
0.8
2
1
[00]
3
Because they were having node numbers equal to 0 that means they
have not been added with other symbols we don’t need to check.
E
0.30
0.3
0
7
1
T
0.23
0.23
0
6
1
O
0.16
0.29
3
5
[1]
2
N
0.13
0.29
3
4
[0]
2
I
0.10
0.18
2
3
[1]
2
P
0.05
0.18
2
2
[01]
3
W
0.03
0.8
2
1
[00]
3
Resorting again all the structures and we ignore the last three structures ( N,
P & W). We do the same thing adding the lowest two probabilities. The last
three structures we don’t care about them. They have been excluded and only
we go to them just in case if they have the same node number with the higher
probabilities. If it’s that so, we assign usually 0 the position that has been
indicated by CP (code pointer).
E
0.30
0.3
0
7
1
T
0.23
0.41
4
6
[1]
2
O
0.16
0.29
3
5
[1]
2
N
0.13
0.29
3
4
[0]
2
I
0.10
0.41
4
3
[01]
3
P
0.05
0.18
4
2
[001]
4
W
0.03
0.8
4
1
[000]
4
As we have seen I, P & W have the same node number which is 2. So, we
assign for each of them 0 because I is the lowest symbol comparing with T and
the 0 is in the position that has been indicated by CP. After that, we assign node
number 4 for T, I, P & W and also we increment code pointer for all these four
structures.
Now, resort all the structures again in descending order of pTemp1 and ignoring
the last 4 structures ( I, N, P & W).
E
0.30
0.3
0
7
1
T
0.23
0.41
4
6
[1]
2
O
0.16
0.29
3
5
[1]
2
N
0.13
0.29
3
4
[0]
2
I
0.10
0.41
4
3
[01]
3
P
0.05
0.18
4
2
[001]
4
W
0.03
0.8
4
1
[000]
4
Now, we add the lowest two probabilities and also we
ignore the last four structures.
E
0.30
0.59
5
7
[1]
2
T
0.23
0.41
4
6
[1]
2
O
0.16
0.59
5
5
[01]
3
N
0.13
0.29
5
4
[00]
3
I
0.10
0.41
4
3
[01]
3
P
0.05
0.18
4
2
[001]
4
W
0.03
0.8
4
1
[000]
4
As we have seen O & N have the same node number which is 3. So, we assign for
each of them 0 because O is the lowest symbol comparing with E and the 0 is in the
position that has been indicated by CP. After that, we assign node number 5 for E,
O & N and also we increment code pointer for all these three structures.
Now, resort all the structures again in descending order of pTemp1 and ignoring the
last 5 structures ( O, I, N, P & W).
E
0.30
0.59
5
7
[1]
2
T
0.23
0.41
4
6
[1]
2
O
0.16
0.59
5
5
[01]
3
N
0.13
0.29
5
4
[00]
3
I
0.10
0.41
4
3
[01]
3
P
0.05
0.18
4
2
[001]
4
W
0.03
0.8
4
1
[000]
4
We add these probabilities.
E
0.30
1
6
7
[11]
3
T
0.23
1
6
6
[01]
3
O
0.16
0.59
6
5
[101]
4
N
0.13
0.29
6
4
[100]
4
I
0.10
0.41
6
3
[001]
4
P
0.05
0.18
6
2
[0001]
5
W
0.03
0.8
6
1
[0000]
5
This is the end of the tree. The node number for symbol T was 4 we search in all the
previous structures and check if the have the same node number or not so that we
assign 0 for them too. For E symbol which has node number 5 we do the same but
we assign 1 because it’s higher than T.
You can see that CP is so important to indicate where should I write the 0 or 1.
Phase III
After we have calculated the code word for
each symbol. Now we go to the finalizing
stage there where all the calculations of
length, entropy, efficiency and code words.
This is easy, just applying formula.
This is the output. Code words are listed in the descending order
of the original probabilities.

Huffman Coding & Its Implementation on Matlab.ppt