Project #2 (Huffman Coding)
CSC 172 (Data Structures and Algorithms), Spring 2021,
University of Rochester
Objectives
This project covers Huffman Codes and their implementation (priority queues, trees). Also you will learn how to
read and write binary files.
Project Description
Huffman encoding is one of the most popular lossless compression algorithms that works fairly well on any text
file. In fact, this algorithm can be applied to any file. It can reduce the storage required by a half or even more in
some situations. At the end of the project, you will impress yourself implementing this excellent algorithm! This
particular project can be used to even encrypt and decrypt files (provided you keep the frequency file hidden).
In this project, you have to write a program that allows the user to compress and decompress files using the
standard Huffman algorithm for encoding and decoding.
Overview of the Program Structure
We will provide four Java files to start with:
BinaryIn.java
(source: ❤tt♣s✿✴✴❛❧❣s✹✳❝s✳♣r✐♥❝❡t♦♥✳❡❞✉✴❝♦❞❡✴❡❞✉✴♣r✐♥❝❡t♦♥✴❝s✴❛❧❣s✹✴❇✐♥❛r②■♥✳❥❛✈❛✳❤t♠❧)
This class provides methods for reading bits (one or many) from a binary input stream. The binary input stream can
be produced from a filename.
The methods of interest are:
♣✉❜❧✐❝ ❜♦♦❧❡❛♥ r❡❛❞❇♦♦❧❡❛♥ ✭✮ ❀
♣✉❜❧✐❝ ❝❤❛r r❡❛❞❈❤❛r ✭✮ ❀
BinaryOut.java
(source: ❤tt♣s✿✴✴❛❧❣s✹✳❝s✳♣r✐♥❝❡t♦♥✳❡❞✉✴❝♦❞❡✴❡❞✉✴♣r✐♥❝❡t♦♥✴❝s✴❛❧❣s✹✴❇✐♥❛r②❖✉t✳❥❛✈❛✳❤t♠❧)
This class provides methods for writing 1-bit boolean or 8-bit char to an output stream such as a file.
The methods of interest:
♣✉❜❧✐❝ ✈♦✐❞ ✇r✐t❡ ✭ ❜♦♦❧❡❛♥ ①✮❀
♣✉❜❧✐❝ ✈♦✐❞ ✇r✐t❡ ✭ ❜②t❡ ①✮❀
♣✉❜❧✐❝ ✈♦✐❞ ❢❧✉s❤ ✭✮ ❀ ✴✴ ❘❡❛❧❧② ✐♠♣♦rt❛♥t ✳ ❋❧✉s❤❡s t❤❡ ❜✐♥❛r② ♦✉t♣✉t
str❡❛♠ ✱ ♣❛❞❞✐♥❣ ✵s ✐❢ ♥✉♠❜❡r ♦❢ ❜✐ts ✇r✐tt❡♥ s♦ ❢❛r ✐s ♥♦t ❛
♠✉❧t✐♣❧❡ ♦❢ ✽✳
Project #2 (Huffman Coding) Page 1 / 4
Huffman.java
❍✉❢❢♠❛♥✳❥❛✈❛ contains Huffman interface which you must implement.
❨♦✉ ❛r❡ ♥♦t ❛❧❧♦✇❡❞ t♦ ♠♦❞✐❢② t❤✐s ❢✐❧❡✳
This interface contains two methods:
♣✉❜❧✐❝ ✈♦✐❞ ❡♥❝♦❞❡ ✭ ❙tr✐♥❣ ✐♥♣✉t❋✐❧❡ ✱ ❙tr✐♥❣ ♦✉t♣✉t❋✐❧❡ ✱ ❙tr✐♥❣ ❢r❡q❋✐❧❡ ✮❀
♣✉❜❧✐❝ ✈♦✐❞ ❞❡❝♦❞❡ ✭ ❙tr✐♥❣ ✐♥♣✉t❋✐❧❡ ✱ ❙tr✐♥❣ ♦✉t♣✉t❋✐❧❡ ✱ ❙tr✐♥❣ ❢r❡q❋✐❧❡ ✮❀
HuffmanSubmit.java
This class implements Huffman interface (given in ❍✉❢❢♠❛♥✳❥❛✈❛ ). All modifications that you make must be
limited to this file. Feel free to add new methods and variables, and import any packages as required. The only
requirement is implementing ❡♥❝♦❞❡ and ❞❡❝♦❞❡ methods correctly. We will test your submission on these two
methods only.
source code
You can download the tar file from ❤tt♣✿✴✴✇✇✇✳❝s✳r♦❝❤❡st❡r✳❡❞✉✴❝♦✉rs❡s✴✶✼✷✴s♣r✐♥❣✷✵✶✽✴t❛r✴♣r♦❥✸❴
✉♣❧♦❛❞✳t❛r
Extract the tar file. The resulting directory should contain four java files and 2 sample files to test your code.
Both ❇✐♥❛r②■♥✳❥❛✈❛ and ❇✐♥❛r②❖✉t✳❥❛✈❛ files contain main methods to play with the functionality of these
two classes.
Your Task
HuffmanSubmit.java
This is the only file where you need to modify.
■♠♣❧❡♠❡♥t ❡♥❝♦❞❡ ❛♥❞ ❞❡❝♦❞❡ ♠❡t❤♦❞s✳
Input arguments for ❡♥❝♦❞❡ method:
• inputFile: The name of the file to be encoded.
• outputFile: The name of the encoded file. This is the compressed file (produced after encoding).
• freqFile: The name of the frequency file. This file stores the frequency of each character present in the input
file. Each row contains a 8 digit representation of the char and its frequency separated by ‘:’. A sample row
may look like: ✶✶✵✵✶✵✶✵✿✹✵ where ‘11001010’ is the character occurring 40 times in the input file.
Input arguments for ❞❡❝♦❞❡ method:
• inputFile: The name of the already encoded file.
• outputFile: The name of the decoded file you want to produce. This is the file to be written after decoding.
• freqFile: The name of the frequency file created while encoding. Use this file to construct the Huffman Tree
and perform decoding.
Project #2 (Huffman Coding) Page 2 / 4
A sample invocation of encode and decode methods
You can test your code with the provided sample files, ❛❧✐❝❡✸✵✳t①t and ✉r✳❥♣❣ . For your convenience, I have
already added the following code snippet to the ❍✉❢❢♠❛♥❙✉❜♠✐t✳❥❛✈❛ file.
❍✉❢❢♠❛♥ ❤✉❢❢♠❛♥ ❂ ♥❡✇ ❍✉❢❢♠❛♥❙✉❜♠✐t ✭✮ ❀
❤✉❢❢♠❛♥ ✳ ❡♥❝♦❞❡ ✭✧ ✉r ✳ ❥♣❣ ✧ ✱ ✧ ✉r ✳ ❡♥❝ ✧ ✱ ✧ ❢r❡q ✳ t①t ✧✮ ❀
❤✉❢❢♠❛♥ ✳ ❞❡❝♦❞❡ ✭✧ ✉r ✳ ❡♥❝ ✧ ✱ ✧ ✉r❴❞❡❝ ✳ ❥♣❣ ✧ ✱ ✧ ❢r❡q ✳ t①t ✧ ✮❀
• The encoded file ✉r✳❡♥❝ is input to the decode method
• In encode method, we write ’freq.txt’ and in decode method we read the same file.
• The output file generated in decode method should have a different name than the file you started with
(in this example, ✉r✳❥♣❣ vs. ✉r❴❞❡❝✳❥♣❣). This is just to ensure that after decoding, both ✉r✳❥♣❣ and
✉r❴❞❡❝✳❥♣❣ are the same. On linux and mac, you can use ‘diff’ command to check if they are the same.
• Important: Encoding and decoding are not required to be done at the same time. For example, I encode a
file today. close the program, and tomorrow, I should be able to decode the file. Also, the same program
should be able to encode and decode multiple files during the same run, i.e. each encode and decode is
independent.
How to submit
Zip (archive) ❍✉❢❢♠❛♥❙✉❜♠✐t✳❥❛✈❛ and a README file together and save it as Pr♦❥✷✳③✐♣ . Upload the file
to Blackboard. Your README file should include any specific instruction which is useful for the project. Include
in README also names of team members. Make sure your source file are properly commented so that user can
browse your code without difficulty. You will lose points for submitting any additional file or directory.
Grading Details
• Pr♦❥✷✳③✐♣ file must contain only two files, HuffmanSubmit.java and README.
• Frequency file correctly generated: 20 pts
• ❡♥❝♦❞❡✭✮ method performs correctly: 40 pts
• ❞❡❝♦❞❡✭✮ method performs correctly: 40 pts
(For getting any credit for decoding, your encoding must work correctly)
You should thoroughly check your code for encoding and decoding before submission.
• Bonus: up to 10 extra points if you show a visual representation of the Huffman Tree created in the process +
up to 10 points if you show an animation of the tree creation.
Acknowledgement
Thanks to Robert Sedgewick and Kevin Wayne for providing ❇✐♥❛r②■♥✳❥❛✈❛ and ❇✐♥❛r②❖✉t✳❥❛✈❛ files.
Project #2 (Huffman Coding) Page 3 / 4
Tips and Suggestions
Based on questions from the students.
• When writing the bit patterns to the encoded file, you do not write the ASCII characters ‘0’ and ‘1’ (That
would rather increase the file size), instead the bits are written as true/false (0/1) using the write(boolean)
function given in BinaryOut.java.
• Frequency file is a regular text file. So, you really do not need to use BinaryOut for that