首页 > > 详细

调试Java、Java讲解、辅导Java、调试Huffman encoding

In this asignment, you will implement the Huffman encoding of English characters, using a combined list and binary
tre data structure. The method should have the following signature:

public static String huffmanEncoder(String inputFileName, String encodingFileName String outputFileName);

which wil read and compres an input text file inputFileName, based on the Hufman encoding produced using the
input text file encodingFileName, and output the compresed file in outputFileName (note that these strings represent
the file names, not content – you ned to perform. file I/O). The method wil output the outcome of its execution, e.g.,
“OK”, “Input file error”, “Encoding file error” , etc. The output file can simply contain the characters “0” and “1”, instead
of the binary bits. It means you are not required to use sophisticated binary bit operations to actually store the
encoded characters in a compact way. In fact, because you wil be translating each character into a sequence of
zeros and ones, each being the full character itself, your output file size wil actually be significantly larger than the
input file. Decompresing a file is not required.

The program should also print out

(a) The Huffman encoding of characters in the form. of a table of character/frequency/encoding triples – one triple
per line, with “:” separating the elements, e.g. (the encoding is imaginary, just an example):

a: 315: 10010
b: 855: 1010


(b) The calculated amount of space savings (see below for details).

You wil ned to use a Java’s standard list facility but implement your own HufmanNode class with the folowing
fields:

· inChar: the character denoted by the node. This is meaningful only for the leaf nodes of a Huffman tre so
interior nodes can have garbage here. But storing garbage rarely represents a good programming style. So you
need to distinguish between a meaningful character and “nothing”. To do so even when any value may potentially
represent some valid character code you should use the Character wraper clas instead of the primitive “char” type.
Then you can have “null” to represent “nothing”.
· frequency: the frequency of ocurences of all characters in the subtre roted at this node. For a leaf node,
this frequency is the frequency of the character in the leaf node; for an interior node, the frequency is the sum of all
frequency values in the leaves of the subtre.
· left: left child of a node in the Huffman tre
· right: right child of a node in the Hufman tree

(Note: the above clas contains no “next” field since you are using Java’s native lists and the objects of this clas are
just elements in the list)

Explain your choice of the Java List implementation (ArrayList vs LinkedList) in the coments. (Do not lok
for a dep reason why one of the choices wil be unsuitable – both are actually workable; but because you
have to pick one implementation, share your thoughts why you picked one and not the other.)

You wil implement several helper methods, specificaly:
· A method to scan the encoding text file to generate the initial list of HufmanNodes (you wil ned to
read the file and define a local table in this method to remember the frequency),
· A method to merge two HufmanNodes and return the combined HufmanNode,
· A method to run the Hufman encoding algorithm to produce the Hufman tre,
· A method to traverse the Hufman tre to output the character encoding, and
· A method to scan the input text file, produce the encoded output file, and compute the savings.

You can use whatever method of computing the character frequencies. However, once you have them, you
do need to generate Huffman encoding, rather than assign variable-length codes to characters in some ad-
hoc manner.


Testing your code

As a test of your program, you wil ned to test the compresion eficiency of the included text “Gadsby.txt” using two
generated encodings.

Gadsby is a 1939 novel writen by Ernest Vincent Wright. It is a story about how the titular character, John Gadsby,
fights to restore his hometown to its former glory. What makes this novel distinctive is that it was written entirely
without the leter “E”. As it is written entirely without that letter the novel does not follow the typical letter frequency
comon in most english writing where “E” is the most comon. You wil analyse the relative eficiency of
compressing the novel using a Hufman Encoding generated by statistical analysis of general english writing and a
Hufman Encoding generated by analysis of the text of the novel itself.

The first encoding you wil use is based on the relative character frequency obtained through the analysis of Gadsby.
This can be obtained done by running the helper methods that are specified above to generate the encoding using
“Gadsby.txt” as the encoding file. This wil entail caling the method hufmanEncoder using “Gadsby.txt” as both the
input file and the encoding file.

The other of these encodings is based on the relative character frequency in text writen in english. This frequency
wil be obtained by using your helper methods to generate a Hufman encoding based on the unabridged dictionary
included as an attached file as “Dictionary.txt”. While this is not a perfect huffman encoding relative to the frequencies
of characters it is close enough for our purposes. This test entails calling huffmanEncoder using “Gadsby.txt” as the
input file and the unabridged dictionary as the encoding file.

Calculate the relative eficiency of the encoded text generated by each encoding and compare them to each other.
For this, you assume that in the original encoding, each character costs 8 bits in space, and then as you encode the
text with Huffman encoding, you count how many bits each character will now ocupy, and so you can maintain the
runing total to the end. Please write a short (~1 Paragraph) explanation of why the compresed encodings are the
size they are and how they compare. Please include details about why basing your encoding on another text aside
from the one you are compresing changes the compresed size.
Deliverables:
A zip file containing:
1. All source codes, including suficient coments for the TAs to understand what you have done (code without
coments wil be penalized). This must include the main method. We wil run your program using comand:

“java HufmanCompresor inputFile encodingFile outputFile”

so please name your classes accordingly.

3. The Huffman encoding tables you generated using the two encoding files.
4. The encoded text files, labeled to specify which file is the result of which encoding.
5. The calculated amount of space savings each encoding has achieved on your text file. (Again, this wil not be
reflected in the actual sizes of the original and encoded files because you are representing bits as entire characters in
your output file.)
6. The explanation of the relative sizes of the compressed text with detail about how the encoding changes the
size.
Grading rubrics:
Scaning the encoding file and generating character frequencies: 20 pts
Producing the Hufman tre (including the HuffmanNode clas specification): 25
Using the Hufman tre to produce the encoding table: 25
Producing the encoded output files, computing the space savings, paragraph on encoding sizes: 20
Style. and completenes of submision, including detailed comments: 10
Remember that al work submited must be your own. Asking for help is fine but copying is explicitly not.

联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-21:00
  • 微信:codinghelp
热点标签

联系我们 - QQ: 99515681 微信:codinghelp
程序辅导网!