The assigment is overdue now. I will up the price I am willing to pay to have it done. It must be completed by the latest on the 8th. However I would greatly preffer for it to be finished on the 7th. I will add $20 to the price if you can complete it by then.
I need the files to run in Eclipse. Please upload them as such. See my file as an example.
They must be coded in java.
This is for a 200 level data structures class so please match that level accordingly.
I need java Test cases and comments please. (see my example file)
I have included part of the previous assigment so you can see what mine looks like and it also may have things you may need like map etc. Though I cannot guarantee all of it is working properly. (named example)
-----
The Task
Okay, we are finally going to finish off the Huffman application. Last week, we got to the point where we could build the tree and do some simple encoding and decoding. The problem with our technique is that we were converting text into Strings of 0s and 1s. Hopefully, the vast majority realized that this is the opposite of actual compression. Our goal was to
reduce
the number of bits required to encode a character. Instead, we blew it up, replacing one character with a whole String of them! Ouch.
So, we are going to do some bit level work, and learn how to write binary data out to files and read it back in again. The second issue that we have is that once we have an encoded file, we don't have a good way to decode it without first having the original source file, which rather defeats the purpose of compressing the file in the first place. Thus, we need to store a copy of the tree in the coded file. Of course, we want to store the tree in a format we can read back out again, and also doesn't take up too much room...
Background: Canonical form
We would like our Huffman codes to be
canonical
. I gave you an algorithm for picking which character to add next to your Huffman tree that comes close to producing canonical codes, but, sadly, I have to admit it has a flaw. There is an edge case that it can never compensate for. So, we are going to have to do some more work to get it into the canonical form.
The canonical form adds two rules to the way Huffman codes work. First, all codes of a given bit length have lexicographically consecutive values (i.e., not just in order, but immediately consecutive), in the same order as the symbols they represent, and second, shorter codes lexicographically precede longer ones. An example will make this a little clearer...
Suppose we have some piece of text in which the characters A,B,C,and D appear, with the following frequencies [A:1, B:5, C:6, D:2]. We could put these into a Huffman tree and come out with the following codes:
A
001
B
01
C
1
D
000
This is not in canonical form. The first rule is broken because A comes before D, but 000 should precede 001. The second rule is broken because C had a shorter code than B, but 01 precedes 1 lexicographically. Here.
internship ppt on smartinternz platform as salesforce developer
The assigment is overdue now. I will up the price I am willing to pa.docx
1. The assigment is overdue now. I will up the price I am willing
to pay to have it done. It must be completed by the latest on the
8th. However I would greatly preffer for it to be finished on the
7th. I will add $20 to the price if you can complete it by then.
I need the files to run in Eclipse. Please upload them as such.
See my file as an example.
They must be coded in java.
This is for a 200 level data structures class so please match that
level accordingly.
I need java Test cases and comments please. (see my example
file)
I have included part of the previous assigment so you can see
what mine looks like and it also may have things you may need
like map etc. Though I cannot guarantee all of it is working
properly. (named example)
-----
The Task
Okay, we are finally going to finish off the Huffman
application. Last week, we got to the point where we could
build the tree and do some simple encoding and decoding. The
problem with our technique is that we were converting text into
Strings of 0s and 1s. Hopefully, the vast majority realized that
this is the opposite of actual compression. Our goal was to
reduce
the number of bits required to encode a character. Instead, we
blew it up, replacing one character with a whole String of them!
Ouch.
So, we are going to do some bit level work, and learn how to
write binary data out to files and read it back in again. The
second issue that we have is that once we have an encoded file,
we don't have a good way to decode it without first having the
2. original source file, which rather defeats the purpose of
compressing the file in the first place. Thus, we need to store a
copy of the tree in the coded file. Of course, we want to store
the tree in a format we can read back out again, and also doesn't
take up too much room...
Background: Canonical form
We would like our Huffman codes to be
canonical
. I gave you an algorithm for picking which character to add
next to your Huffman tree that comes close to producing
canonical codes, but, sadly, I have to admit it has a flaw. There
is an edge case that it can never compensate for. So, we are
going to have to do some more work to get it into the canonical
form.
The canonical form adds two rules to the way Huffman codes
work. First, all codes of a given bit length have
lexicographically consecutive values (i.e., not just in order, but
immediately consecutive), in the same order as the symbols they
represent, and second, shorter codes lexicographically precede
longer ones. An example will make this a little clearer...
Suppose we have some piece of text in which the characters
A,B,C,and D appear, with the following frequencies [A:1, B:5,
C:6, D:2]. We could put these into a Huffman tree and come out
with the following codes:
A
001
B
01
C
1
D
000
This is not in canonical form. The first rule is broken because A
comes before D, but 000 should precede 001. The second rule is
broken because C had a shorter code than B, but 01 precedes 1
lexicographically. Here is an alternate assignment of codes that
3. does satisfy the rule:
A
110
B
10
C
0
D
111
0 precedes 10, which precedes 11x, and 110 and 111 are
lexicographically consecutive.
The algorithm for deriving canonical codes is fairly
straightforward, provided we start by knowing the lengths of the
codes. The trick is to think of the codes as binary numbers. The
first code, which is given to the lexicographically smallest
symbol with the shortest code, is 0, padded out with 0s to be as
long as the symbol's code. In our example, C has the shortest
code (length 1), so its code is 0 (if C's code has been length
three, the code would have been 000). We then increment the
code (counting in binary), which will produce the
lexicographically consecutive code. If the next symbol has a
longer code, we pad the code with 0s to the right to match the
code length (in other words, we perform a left shift or a
multiply by 2). Thus, the next code would be 1, but B has a
longer code length, so it gets the code 10. To move to A, we
increment (11), and shift again to get 110. D has the same code
length as A, so it gets 111.
Why is this interesting? Well, it means that we can recover the
entire tree starting only from knowledge about the length of the
codes. So, when we want to store our tree in a file with our
data, we just have to record the character followed by the length
of its code. Cool, huh?
Part one: Generate canonical codes [25 pts]
We can keep all of the work that we did in the last assignment,
but the tree that we produce is no longer the tree that we want
to keep. Its sole purpose is to provide us with the initial code
4. lengths for generating the canonical codes. Instead of traversing
the tree and loading the reverse lookup into a map, you are
going to traverse the tree and collect symbols and their code
lengths, dumping them into a new priority queue as we go. I
suggest tweaking your old wrapper class that held symbols and
frequencies to be a more generic pairing of counts and symbols,
so you can reuse it for symbols and code lengths (this is
probably just some renaming to make it more general purpose).
You will want to write a new Comparator that will return these
items to you in order first by code length (shortest first) and
then in lexicographic order (a to z).
You should then write a new private method called
buildTreeFromMap
which takes in a PriorityQueue of your wrapper objects as an
argument. In this method you will use the PriorityQueue to
implement the above algorithm to build the Huffman tree
(which the algorithm I provided just specifies the codes, you
will want to build up into a tree that represents the given code
words). The tree that results from this process is the one to save
as an instance variable (for decode). Now is when you will
build the Map that you need for encode (you should be able to
reuse the code you had for building it before). The old
buildTreeFromFile
should obviously call
buildTreeFromMap
once it has built the code length map.
Part two: File handling [40 pts]
We can finally get to the actual application! The first step will
be to encode the file. Start by making one additional tweak to
buildFromFile
that saves the
File
object in an instance variable. Write a new function called
saveCompressedFile
, which will take a single
File
5. object as an argument. This file will be the destination for the
compressed file. This function will first write the tree
representation to the destination file. Start by writing an integer
that says how many character, code lengths pairs to expect (this
tells the reader how many bytes of the file are consumed by the
tree). Then write out each character followed by the length of
its code. Following the tree, you can start writing the
compressed data from the file. This works just like encode,
walking character by character, except that you will want to
write bits, rather than Strings holding 0s and 1s (we'll talk
about this in class).
You will then want to write the compliment to this. The
difference here, however, is that we don't have the original
source data to build the tree again. So I want you to write a
second static factory method called
newTreeFromCompressedFile
. Like
newTreeFromFile
, this will create a new HuffmanTree object and then call a
private instance method called
buildFromCompressedFile
, to which it passes the compressed file. This should read the
tree description out of the file, dumping the symbols and code
lengths into a priority queue as it reads. It should then call
buildTreeFromMap
to initialize the tree and the reverse lookup map.
Finally, write a method called
saveExpandedFile
, which takes in a destination
File
and processes the original file, using the tree to decode the
data.
If you encounter errors in either compressing or decompressing
with respect to unknown codes, throw a
CharConversionException.
A little help
6. To help you get started, here is a skeleton of the HuffmanTree
that includes som of the old functionality from the last
homework: -- huffmanTree -- see attatched
---
Java Tests needed
The assignment info from the previous assignment is attatched.
It explains how other things such as map are supposed to be
built.
Huffman tree skeleton is also attatched.