COMP2150 Asignment 3 Sumer 2018
Due:
Friday, August 10, before 1:59 pm.
Notes
• Please folow both the "Programing Standards" and "Asignment Guidelines" for
al work you submit. Programing standards 1-25 are in efect.
• Hand-in wil be via the UM Learn Assignments facility. Make sure you leave enough
time before the deadline to ensure your hand-in works properly.
• Oficial test data wil be provided several days before the asignment due date. Until
then, work with your own test data.
• Al asignments in this course cary an equal weight.
Assignment 3: Hufman encoding
Structure of the Asignment
I have structured this asignment so you can work on it in stages. The stages are fairly
independent, but form. part of the overal asignment. I wil provide some fre code to get
you started, which includes test programs. Work on stages and test them with the provided
programs. Even if you don't get the whole program working, you can stil get god marks
by completing stages.
You wil write several modules which implement several data structures. Then you wil use
those data structures to solve the problem of Hufman encoding of text strings. The marks
for the modules wil be asigned for sucesfuly runing the test programs.
Huffman encoding
Hufman encoding is an interesting aplication of binary tres. Hufman's algorithm alows
us to produce a minimum data encoding scheme based on the frequencies of characters in a
given text dataset. To use Hufman's algorithm, you first ned to read a file of text and
produce a list of each individual character in the text and the number of times that
character ocured. For this purpose we wil create a data structure caled an Ocurence
List. Once you have these frequencies, you ned to gradualy build an encoding tre using a
Priority Queue (the second module). We begin by making a tre from each of the pieces of
frequency data that we just talied in the file (i.e. many tres with one node in each) and
inserting each of these tres in the priority queue (priority in this priority queue is by
frequency). After everything is in (there wil be a significant size to this queue, as the data
file wil contain smal and capital leters, digits, and special characters), we proced to build
a Hufman tre as folows: repeatedly take the two LOWEST priority items from the queue
(so lowest frequencies here - if two items have the same frequency, ensure that the one
that is the lowest in the ASCI colating sequence is removed from the priority queue first),
build a new tre node with data consisting of the concatenated string of both of those
items, with a frequency equal to the total of those two items, and with the two subtres
themselves as left and right children of the node (make the left one the smaler priority
COMP2150 Asignment 3 Sumer 2018
2
(lowest frequency) of the two subtres, again with colating sequence breaking a tie). Then
insert that new tre in the priority queue. If we kep doing this, we wil eventualy end up
with a single node (tre) in the priority queue, whose data value consists of the al the
string data items in the original frequencies list concatenated together. Al the other
original tres wil be somewhere below this tre - and the more infrequent they were, the
deper they'l be.
So now we have a Hufman tre. How does this help us encode? Wel, since it's a binary tre
(our third module), we can do binary encoding by moving left or right. Say we have a
character we want to encode. Start at the top of the tre and lok for it (at any point I can
tel which to folow because I can lok at the strings of data in the left and right subtres
and se which one contains the character). Let us say that if I have to move right to get to
the part of the tre the character I am encoding is in I wil record a 1, and if I have to move
left I wil record a 0. As I trace down the tre I'l record a series of 1's and 0's, and wil
eventualy find the data I'm loking for (asuming it was in my original data file!). At that
point, the binary digits that were recorded form. the encoding for that character (se the
example in the Notes) - and the encoding is guaranted to be as short as posible since al
the frequently used characters have the shortest strings (shortest path through the tre).
While normal encoding would be REAL binary numbers, of course, we'l just use strings of
0's and 1's.
Modules
Module MyOrderedItem defines an abstract clas caled OrderedItem. Use the techniques
discused in the notes to define an 'abstract' clas in ruby. This abstract clas is used to
ensure sub-clases define two functions: compareTo and to_s.
Module MyOccurenceList defines a data structure caled OccurrenceList. As in the
previous asignments, al your data structures are linked structures that you create on your
own. The nodes contain an item and a count, as wel as a link to the next node. Provide an
increment method in the Node clas so that the count in a node may be easily incremented
by one.
The ocurence list itself should suport the add method, which either ads a new item to
the head of the list (with a count of 1), or increments the count of the existing item. You
may want to write a find method to suport this operation, but you don't have to. Also
write the to_s method, which returns a string containing the contents of the list (items and
counts), one item per line.
Module MyPriorityQueue defines a data structure caled PriorityQueue. Since a priority
queue must have some means of ordering item by priority, ensure that only OrderedItem
objects are aded to the data structure (use the kind_of? method to check that items aded
to the priority queue are OrderedItems). Your data structure must suport an insert
method, a get_first method and an empty? method.
Module MyBST defines a data structure caled BST (a binary search tre). This structure
also must be able to order items, but use Duck Typing this time to ensure that items can
respond to the compareTo mesage (use respond_to?). Your BST wil use the simple
insertion technique of simply folowing a path from the rot to an empty subtre, and
COMP2150 Asignment 3 Sumer 2018
3
inserting a new node at that point (we are not concerned about making a balanced tre). As
you have probably done in Java, have a public insert method, which cals a private,
recursive insert to actualy perform. the insertion. Provide a to_s method that performs an
inorder traversal of the tre (again using a private, recursive method) and returns the
string.
Since we are inserting BST objects into a PriorityQueue when performing Hufman
encoding, the BST clas must extend OrderedItem. Thus, you must also provide a
compareTo method (compare the items in the rots of the tres).
Main program
The main program wil read input from a file (this is included in the fre code). It then
produces a Hufman Tre, and uses that to produce an encoding of the original input. The
encoding is actualy a sequence of '0' and '1' characters, NOT a binary file. It then prints the
original string and the 'encoded' version.
Output
Se the notes on Hufman encoding for an example of Hufman encoding.
Hand-in
Submit al your source code. Include a README explaining how to run your program.
You should also submit a text document containing a short English description of the
interactions betwen objects in your project. This should answer questions like "Which
clas is initialy caled?", "Which method(s) proces the input?", and which objects deal
with tasks in the code. This description should be at most one page of text.
Submit al files on UM Learn.