This function is in charge of populating 3 hash tables by reading in the dataFile, grabbing each line through
fgets(), and computing the hash value from each of the 3 hash functions. When reading in line by line, use
strchr() to find the newline character in the resulting buffer from fgets(), and replace it with a null
character. Next, in order to allow our strings to be case-insensitive, use tolower() on every character of
each string to fully convert all characters in it to lower-case. Following this, we will generate the three hashes
for each string by calling hash(), revHash(), and evenOddHash(). Note that the hashkey returned by each
of these hash functions will not necessarily be an index within the size of the hash table. In order to guarantee
that the indices are within the size of the hashtable and are positive, you MUST do the following for each hash
key returned by the hashes, and use the result as the index into the corresponding hashtable:
idx = ((hash_key % table_size) + table_size) % table_size.
After calculating all 3 hash indices, index into the hash tables htbl and rtbl and set the entry to 1. The third
hash table contains a linked list at each hash bucket, so when an element hashes to that location, the string is
pushed to the front of that linked list, and we do so by calling pushToList().
writeTables.c
void writeTables( FILE * outFile, table_t * htbl, table_t * rtbl,
table_t * eotbl );
This function serializes the three hash tables and writes them to outFile so that they can later be read and
deserialized back into hashtables in their original format (note that the ordering of elements in each linked list
does not matter).
Contents of the tables should be written to outFile in the following order:
1. Write the size of the tables. You only need to write one number here since all three of the them have
the same size.
Back to Top
2. Write all entries of htbl. They are pointed by member tbl in table_t and each entry is a void
*. Use fwrite() (man -s3 fwrite) to write to outFile.
3. Write all entries of rtbl. They are pointed by member tbl in table_t and each entry is a void
*. Use fwrite().
4. Write the strings in each bucket of eotbl. This is slightly more complicated than the three writes
above, which can be accomplished with calls to fwrite. You can use fputs() (man -s3 fputs) to
help you write each string. Notice that fputs writes the string without its terminating null byte (‘\0’), so
you should write a ‘\0’ after each string using fputc()(man -s3 fputc). To represent the end of a
bucket, we want to write an extra null byte (‘\0’).
Example:
The following eotbl should be serialized as: \\0\\0\0\0How to test:
In testwriteTables.c, you should set up the hashtables for testing and a FILE stream to write to.
To check the correctness of the output, you can use the following commands to dump the content of the file:
od -c filename (octal dump that print printable characters)
xxd filename (hex dump)
Check the man pages for these commands to get more information!
Post-Milestone Functions to be Written
Listed below are the modules to be written after the milestone functions are complete.
create.c
int main( int argc, char * argv[] );
This is the main driver for the create portion your program. Its main tasks are to parse the command line
arguments, build the hashtables of email data, and serialize the hashtables to a file.
Parsing command line arguments:
For this assignment, we want the program to support both short and long options, so we will be using
getopt_long() to parse the flags described in the long usage statement. Check the man page (man -s3
getopt_long) for more details and examples of how to use getopt_long(). Make sure to use the
constants for argument parsing defined in pa4.h. The implementation details for the flags are listed below:
Required
/Optional
Short
Flag
Long Flag Required
Argument
How to Handle
O -h --help N/A Print out the long usage to stdout and return indicating
success.
O -s --size table size Convert the table size from a string to a long (using
strtol()) and save the converted value for later use.
R -i --infile infile name Try to open the file of this name
R -o --output output file name Try to open the file of this name
In the case where getopt_long() detects an invalid flag or missing flag argument, it will automatically print
an error message for you. If this happens, you also need to print the short usage and return indicating failure.
Note that infile and output flags are required, so you need to check if either of them is missing. Size flag is
optional. DEFAULT_SIZE (pa4.h) should be used if no size flag entered.
If all flags are parsed without any errors, make sure you also check for extra arguments after getopt_long()
completes.
Build the hashtables and write to file:
Dynamically allocate zero-filled memory for the three hashtables. Remember to check if any error happens
during dynamic memory allocation. If no error occurs, call populateTables(). After the hashtables get
populated, call writeTables() to write to the file.
Free up memory:
Now you are done with the hashtables and are about to exit. Remember to free all the memory you
dynamically allocated. The function freeLinkedList() should be helpful here.Reasons for error:
In this file, as soon as an error is detected, print the appropriate error message(s), short usage
SHORT_CREATE_USAGE (see pa4Strings.h), and return indicating failure.
The following are error messages (numbers indicate the order of error checking) you need to print for each
case, make sure you don’t forget to print the short usage in each case!
1. Command line argument error
○ Size (make sure the error checking is the following order)
■ Size is not a valid number: print the error message (INVALID_NUM)
■ Size is a number too large to be converted to a long: use snprintf() and perror()
to print out the error message (TOO_LARGE_NUM)
■ Size is not within [MIN_SIZE, MAX_SIZE] (pa4.h) inclusive. (isInRange())
○ Infile
■ Error encountered when opening the file. If this happens, call perror() with the
FILTER_ERR, and immediately return. (Don’t forget to set errno to 0 before calling
fopen())
○ Output
■ Error encountered when opening the file. If this happens, call perror() with the
WTABLE_FILE_ERR, and immediately return. (Don’t forget to set errno to 0 before
calling fopen())
○ Any missing flag argument or invalid flag
■ getopt_long() prints out the error message for you
2. Missing either of infile flag or output flag
○ Print ARG_ERR
3. extra arguments after getopt_long() completes
○ Print EXTRA_ARGS
4. Dynamic memory allocation failed
○ Free whatever you have dynamically allocated in this function
○ Call perror() with the custom MEM_ER
Return Value: EXIT_SUCCESS on success and EXIT_FAILURE on any error.
search.c
int main( int argc, char * argv[] );
This function performs the functionality for the search program. It starts by parsing the command line
arguments, handling any erros encountered in the process; then it creates the hash tables and fills them by
reading from the input file. Finally it performs an interactive search and prints stats if needed.
Now let’s take a closer look at each step of the program.
1. Parse command line arguments using getopt: we need to handle 3 possible options
a. INFILE_FLAG: use fopen to open the specified input file. If an error occurs opening the file,
print RTABLE_FILE_ERROR using perror, and then print SHORT_SEARCH_USAGE.
b. HELP_FLAG: simply print LONG_SEARCH_USAGE.
c. STATS_FLAG: if stats flag is used, you need to print info for stats when interactive search is
done.
Back to Top
d. If we didn't see an input file, print error message MISSING_INFILE, and
SHORT_SEARCH_USAGE, then immediately return.
e. If we saw extra args, print error message EXTRA_ARGS, and SHORT_SEARCH_USAGE, then
immediately return.
2. Create the 3 tables, and then fill them by reading from the specified input file (call readTables())
3. Perform. interactive search by calling beginUserQuery()
4. If the stats flag is used, we need to print the number of collisions occurred in the linked list and in each
table.
5. Finally, don't forget to free all allocated memory.
Reasons for error:
● Error calling fopen
● No input file specified
● Extra arguments for the input flags
Return Value: 1 if an error occurs; 0 otherwise.
beginUserQuery.c
void beginUserQuery(int counts[], table_t * htbl, table_t * rtbl,
table_t * eotbl, int statsOn);
This function handles user interaction to search the tables and reports number of collisions occurred as the
user searches for emails.
First, start by printing out a prompt for the user to input the string. Then, use fgets() in a while loop to read
the user input line by line.
● Once we read in a string, we want to find any newline character also read in along the string, and
replace it with a NUL character.
● If the user did not enter any string (simply pressed ‘enter’), then we want to immediately reprompt, and
skip the search.
● Before performing the search, we also need to pre-process the string by lowercasing every character.
(Hint: you might find tolower() helpful)
● Finally we want to check the string in three tables by calling checkTables().
○ If checkTables() returns the macro IN_TABLE, print FOUND_IN_ALL with the current word
as the argument.
○ If the string is found in at least one table, and the user enables the stats flag, print
FOUND_IN_SOME with corresponding arguments.
○ If the string in not found in any of the tables, print WORD_NOT_FOUND.
● To report how many collisions occurred, don’t forget to update the counts[] array in each fgets()
iteration. Index in counts[]represents the number of tables, while the value in the array represents
how many collision occurred in at least index number of tables. For example, 2 words found in all 3
table (count[3] = 2) and there are 2 other words found in 2 of the three tables. Then counts[2] = 2 + 2 =
4. This is because counts[2] holds the number of emails are in 2 or MORE tables.
● At the end of each fgets() iteration, reprompt the user for the next input.
Lastly, print a newline before exiting the function.
checkTables.c
int checkTables( char * str, table_t * htbl, table_t * rtbl, table_t * eotbl );
This function reports the number of tables the input string was seen in.
Perform. the check in each table by calling getFromTable(). Check htbl first, if it is found in htbl then
check rtbl, if it is found in rtbl then check if it is found in eotbl. Increment the count every time the string
is found in a table. If the string is found in eotbl, then you must also look inside the linked list attached to its
eotbl index. If the input string itself is also found in the linked list, return IN_TABLE instead of the count.
Return Value: Number of tables the string was found in, or IN_TABLE if it is found in all three tables and the
linked list attached to the 3rd table
freeLinkedList.c
void freeLinkedList( linkedList_t * head );
This function frees all elements in the input linked list. To free a linked list, iterate through the list starting from
the head, first free the value of the node, then the node itself, until we reach the end of the list.
getFromTable.c
void * getFromTable( int hashVal, table_t * table );
This function takes a hash value and a table pointer and returns the element at the relevant index of the table.
To calculate the index, mod the hash value by the size of the table, add table size to the result, and then mod
the entire result by the table size again (the same formula from populateTables.c).
Return Value: The element at the relevant index of the table.
readTables.c
void readTables( FILE * outFile, table_t * htbl, table_t * rtbl,
table_t * eotbl );
Red from outFile and populate the three hashtables serialized in writeTables(). You can assume the
outFile is a file of the same format produced by writeTables(). The hashtables should have the same
structure as original ones, but the order of elements within a linked list does not matter.
Refer to writeTables() for how the hashtables are serialized to the file.
You should read in the following order:
● Read the size of the hashtables
● Read the htbl contents
● Read the rtbl contents
● Read eotbl contents
Back to Top
Here is one way to implement reading the third table eotbl from the file:
● Keep a relatively big buffer (size should be greater than max word size) and read the file in blocks of
the size of the buffer with fread()
○ If there is no incomplete string left from last read into the buffer, you can try reading in a whole
buffer size of data
○ If there is an incomplete string from last read, the incomplete string should be move to the front
of the buffer (more details on this in the next bullet point). When you read from file into the buffer
again, you should append to the incomplete string that is currently at the beginning of the buffer
● Process data in the buffer you just read
○ If you encounter a ‘\0’,check if it is the end of a bucket in eotbl or the end of a string
■ If it is the end of a bucket, go to the next bucket
■ If it is the end of a string, push the string to the linked list of the current bucket
(pushToList())
○ If there is a string that is at the end of the buffer but not terminated, move the incomplete string
to the front of the buffer so that next time when you read data from the file into the buffer, you
can start after the end of the incomplete string. You can use memmove()(man -s3 memmove)
to accomplish this.
Reasons for error:
● Error with dynamically allocating memory.
○ Free previously allocated memory for the hashtables. Call perror(), passing in the macro
MEM_ERR defined in pa4.h as the parameter. (perror() will print the correct error message
for you) And then return.
Return Value: None (The three hashtables should be populated correctly if no error occurred)
Unit Testing
You are provided with only one basic unit test file for the Milestone function, testrevHash(). This file only
has minimal test cases and is only meant to give you an idea of how to write your own unit test files. You must
write unit test files for each of the Milestone functions, as well as add several of your own thorough
test cases to all of the unit test files. You need to add as many unit tests as necessary to cover all
possible use cases of each function. You will lose points if you don’t do this! You are responsible for
making sure you thoroughly test your functions. Make sure you think about boundary cases, special cases,
general cases, extreme limits, error cases, etc. as appropriate for each function. The Makefile includes the
rules for compiling these tests. Keep in mind that your unit tests will not build until all required files for that
specific unit test have been written. These test files will be collected with the Milestone, they must be
complete before you submit your Milestone.
Unit tests you need to complete:
testpushToList.c
testrevHash.c
testevenOddHash.c
testpopulateTables.c
testwriteTables.c
To compile:
$ make testrevHash
To run:
$ ./testrevHash
(Replace “testrevHash” with the appropriate file names to
compile and run the other unit tests)
Back to Top
README Questions
1. [AI] Your friend is stressed and asks you for your CSE 30 PA code. How do you convince them to act
with integrity?
2. [Unix] You are trying to open your pa4Strings.h file. You have already typed “vim pa4S”. Which single
key can you press to autocomplete the command to “vim pa4Strings.h”?
3. [Vim] How do you change the color scheme in vim? Give the command you would use to change the
color scheme to “desert”.
Extra Credit
There are 5 points total for extra credit on this assignment.
● Early turnin: [2 Points] 48 hours before regular due date and time
[1 Point] 24 hours before regular due date and time
(it’s one or the other, not both)
● [3 Points] Multiple Email Search
If there is anything in these procedures which needs clarifying, please feel free to ask any tutor, the instructor,
or post on the Piazza Discussion Board.