This document is a project report for an information theory project that implemented data compression using Hadamard transform and Huffman coding. The student was able to successfully compress data, achieving a 28.6% compression ratio. However, they were unable to fully implement the decompression portion. While the compression portion worked well, more information would need to be included in the compressed file for full decompression to function, such as the Huffman coding table. The student reflected that the project helped connect theoretical concepts in information theory to practical implementation of compression algorithms.
Information Theory Project Report
Name: Yafei Qu
Computer and Communication Systems
Student ID: 13044656
Date: 20.11.2013
Introduction
Appropriate file compression has many benefits for users and transmission systems. This project
required us to understand the compression process in detail and to connect theory with
practice. Two computer programs are needed, one for compression and one for decompression, using
the Hadamard transform and Huffman coding as the basic algorithms.
My whole program was written in C++. In fact, it makes little use of object-oriented
methods, so I would rather describe it as a C program in a .cpp file. The compression part works
well, but I am stuck on the decompression part.
So far, the compression part is complete. The generated compressed file contains 18745
bytes, so the compression ratio is 18745/65536*100% = 28.6%, which is quite good.
The catch is that this size covers only the pure data, without any of the information needed for
decompression.
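The ratio quoted above can be checked with a few lines of C++. This is only a sketch of the arithmetic; the function name `compressionRatioPercent` is made up for illustration and is not taken from the project code.

```cpp
#include <cassert>
#include <cstddef>

// Compressed size as a percentage of the original size.
// With this convention a smaller number means better compression.
double compressionRatioPercent(std::size_t compressedBytes, std::size_t originalBytes) {
    assert(originalBytes > 0);  // avoid division by zero
    return 100.0 * static_cast<double>(compressedBytes)
                 / static_cast<double>(originalBytes);
}
```

Calling `compressionRatioPercent(18745, 65536)` gives about 28.6, the figure reported here.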
Regions A, B and C are not required to be recoded by any method. Their data is copied
directly to the compressed file as type short, which is long enough to hold the largest number
in these regions while taking little space. Each value in these 3 regions is saved in 2 bytes, and
there are 64*64 = 4096 of them. Therefore, 4096*2 = 8192 bytes are used to save them in
the compressed file. The rest of the file holds Regions D, E, F and G. The
I think the major aim of this project is to study coding theory and connect it with practical
application. As a result, this program was built in the simplest way.
To keep the program simple, I did not save the Huffman coding table in the compressed file but
kept it in memory. Properly, every compressed file should contain a coding table so that
decompression can be done anywhere else by a suitable application.
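One possible way to carry the coding table inside the compressed file is sketched below. This is not what the project program does: the `CodeEntry` layout, the 2-byte entry count, the 7-byte entry format and the function names are all assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical table entry: a quantized value and its Huffman code word.
struct CodeEntry {
    std::int16_t symbol;   // quantized coefficient value
    std::uint8_t length;   // code length in bits
    std::uint32_t bits;    // code word, kept in the low `length` bits
};

// Append the table to `out`: a 2-byte entry count, then 7 bytes per entry,
// all in little-endian byte order (an assumed layout).
void writeCodeTable(const std::vector<CodeEntry>& table, std::vector<std::uint8_t>& out) {
    assert(table.size() <= 0xFFFF);
    const std::uint16_t count = static_cast<std::uint16_t>(table.size());
    out.push_back(static_cast<std::uint8_t>(count & 0xFF));
    out.push_back(static_cast<std::uint8_t>(count >> 8));
    for (const CodeEntry& e : table) {
        const std::uint16_t s = static_cast<std::uint16_t>(e.symbol);
        out.push_back(static_cast<std::uint8_t>(s & 0xFF));
        out.push_back(static_cast<std::uint8_t>(s >> 8));
        out.push_back(e.length);
        for (int i = 0; i < 4; ++i) {
            out.push_back(static_cast<std::uint8_t>((e.bits >> (8 * i)) & 0xFF));
        }
    }
}

// Read the table back from `in`, starting at `pos`; `pos` ends up just past the table.
std::vector<CodeEntry> readCodeTable(const std::vector<std::uint8_t>& in, std::size_t& pos) {
    assert(pos + 2 <= in.size());
    const std::size_t count = in[pos] | (in[pos + 1] << 8);
    pos += 2;
    assert(pos + 7 * count <= in.size());
    std::vector<CodeEntry> table(count);
    for (CodeEntry& e : table) {
        const std::uint16_t s = static_cast<std::uint16_t>(in[pos] | (in[pos + 1] << 8));
        e.symbol = static_cast<std::int16_t>(s);
        e.length = in[pos + 2];
        e.bits = 0;
        for (int i = 0; i < 4; ++i) {
            e.bits |= static_cast<std::uint32_t>(in[pos + 3 + i]) << (8 * i);
        }
        pos += 7;
    }
    return table;
}
```

A decompressor would call `readCodeTable` first and then decode the remaining bytes with the recovered table. The table adds 2 + 7 bytes per entry to the file, so the compression ratio would become somewhat worse than the figure for pure data.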
Assessment
There are 3 steps of compression: transform, to separate the significant data from the useless
data; quantization, to reduce the amount of information; and variable length coding, to express
the information in a more efficient way.
Hadamard Transform: In the project, the Hadamard transform is used as the basic algorithm for
the transform step. During the transform, the figures change in the way Fig.1 shows. It is
convenient to generate visual results. I used MATLAB to do the transform; it works well and
generated the pictures in Fig.1. After the transform, the data in the upper-left corner will be
much greater than the original range, so you have to rescale it yourself if you want to view the
figure with C/C++, which MATLAB does automatically.
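For readers who want to reproduce the transform in C++, the following is a minimal sketch of an unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order, applied to rows and then columns. It is not the project's code: the names `fwht` and `fwht2d`, the use of `std::vector<int>` and the row-by-row block layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// In-place 1-D fast Walsh-Hadamard transform, unnormalized.
// The length of `data` must be a power of two.
void fwht(std::vector<int>& data) {
    const std::size_t n = data.size();
    assert(n != 0 && (n & (n - 1)) == 0);
    for (std::size_t half = 1; half < n; half *= 2) {
        for (std::size_t start = 0; start < n; start += 2 * half) {
            for (std::size_t i = start; i < start + half; ++i) {
                const int a = data[i];
                const int b = data[i + half];
                data[i] = a + b;         // butterfly: sum
                data[i + half] = a - b;  // butterfly: difference
            }
        }
    }
}

// 2-D transform of an n x n block stored row by row:
// transform every row, then every column.
void fwht2d(std::vector<int>& block, std::size_t n) {
    assert(block.size() == n * n);
    std::vector<int> line(n);
    for (std::size_t r = 0; r < n; ++r) {
        for (std::size_t c = 0; c < n; ++c) line[c] = block[r * n + c];
        fwht(line);
        for (std::size_t c = 0; c < n; ++c) block[r * n + c] = line[c];
    }
    for (std::size_t c = 0; c < n; ++c) {
        for (std::size_t r = 0; r < n; ++r) line[r] = block[r * n + c];
        fwht(line);
        for (std::size_t r = 0; r < n; ++r) block[r * n + c] = line[r];
    }
}
```

For a 2 x 2 block whose four pixels are all 255, `fwht2d` leaves 1020 in the upper-left position and 0 elsewhere. This shows why the upper-left coefficients leave the 0 to 255 range and must be rescaled before the result can be viewed as an 8-bit image.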
Quantization: Here a few problems arise. There are several ways to get the work done,
shifting bits or division, and several strategies to handle the quantization. I tried both, and
I prefer shifting, with the quantization step determined by how many bits the extreme numbers
take. This makes the programming clearer and easier, but there are a few problems that can't be
ignored. In fact, shifting can't follow the requirement closely. The G region is asked to be
quantized with 2 levels, which means there would be 4 numbers representing all the values in
this area. Shifting can only pick equal numbers of negative and positive values plus a 0, which
is not exactly what is required. Furthermore, whether the highest bit, which determines whether
the number is negative or positive, should be treated like the regular bits is not clear to me.
In the division method, the two strategies are quite different: quantizing by the range of the
numbers and quantizing by the highest bit lead to two different results. My final strategy is:
if the highest bit is n and the required quantization level is m, erase the lowest n-m bits by
shifting right and back, which is easier to implement.
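The shift-right-and-back strategy could look like the sketch below. The function name and parameter names are hypothetical. The sign handling is also an assumption, since the report leaves that question open: here the shift is applied to the magnitude and the sign is restored afterwards.

```cpp
#include <cassert>

// Keep the top `keepBits` (m) of a value whose magnitude needs
// `magnitudeBits` (n) bits, by clearing the lowest n - m bits.
// Assumption: the sign is handled separately from the magnitude bits.
int quantizeByShift(int value, int magnitudeBits, int keepBits) {
    assert(keepBits >= 0 && keepBits <= magnitudeBits && magnitudeBits < 31);
    const int drop = magnitudeBits - keepBits;   // number of bits to erase
    const int magnitude = value < 0 ? -value : value;
    const int q = (magnitude >> drop) << drop;   // shift right and back
    return value < 0 ? -q : q;
}
```

With n = 6 and m = 2, the value 45 becomes 32 and -45 becomes -32, while 13 becomes 0. The possible outputs are 0, ±16, ±32 and ±48, that is, equal numbers of negative and positive values plus a 0, rather than exactly 4 values.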
Variable Length Coding: This part contains the most complex algorithm in the whole project,
Huffman coding.
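As a reminder of how Huffman coding assigns code words, here is a minimal sketch that builds the codes from symbol frequencies with a priority queue. It is a generic textbook construction, not the project's implementation. The function name `buildHuffmanCodes` and the representation of codes as strings of '0' and '1' are assumptions made for readability.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Build Huffman codes for the given symbol frequencies.
// Returns a map from symbol to its code word written as '0'/'1' characters.
std::map<int, std::string> buildHuffmanCodes(const std::map<int, long>& freq) {
    assert(!freq.empty());
    struct Node { long weight; int symbol; int left; int right; };
    std::vector<Node> nodes;
    using Item = std::pair<long, int>;  // (weight, node index)
    std::priority_queue<Item, std::vector<Item>, std::greater<Item>> heap;  // min-heap
    for (const auto& kv : freq) {
        nodes.push_back({kv.second, kv.first, -1, -1});  // leaf node
        heap.push({kv.second, static_cast<int>(nodes.size()) - 1});
    }
    // Repeatedly merge the two lightest nodes until one root remains.
    while (heap.size() > 1) {
        const Item a = heap.top(); heap.pop();
        const Item b = heap.top(); heap.pop();
        nodes.push_back({a.first + b.first, 0, a.second, b.second});
        heap.push({a.first + b.first, static_cast<int>(nodes.size()) - 1});
    }
    // Walk the tree from the root: left edge appends '0', right edge appends '1'.
    std::map<int, std::string> codes;
    std::vector<std::pair<int, std::string>> stack;
    stack.push_back({heap.top().second, ""});
    while (!stack.empty()) {
        const int idx = stack.back().first;
        const std::string prefix = stack.back().second;
        stack.pop_back();
        const Node node = nodes[idx];
        if (node.left < 0) {
            // A lone symbol still needs a one-bit code.
            codes[node.symbol] = prefix.empty() ? "0" : prefix;
        } else {
            stack.push_back({node.left, prefix + "0"});
            stack.push_back({node.right, prefix + "1"});
        }
    }
    return codes;
}
```

For the frequencies {0: 8, 1: 4, -1: 2, 2: 2}, the most frequent symbol 0 gets a 1-bit code, symbol 1 gets a 2-bit code, and the two rare symbols get 3-bit codes, for a total of 28 bits instead of the 32 bits a fixed 2-bit code would need.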