首页 > > 详细

ECE 209 Program 3: market

ECE 209 Program 3: market
Fall 2019
Due Friday, Oct 23 @ 11:59pm
This programming assignment must be completed individually. Do not share your 
code with or receive code from any other student. Evidence of copying or other 
unauthorized collaboration will be investigated as a potential academic integrity 
violation. The minimum penalty for cheating on a programming assignment is a 
grade of 0 on the assignment. If you are tempted to copy because you're running late, 
or don’t know what you’re doing, you will be better off missing the assignment and 
taking a zero. Providing your code to someone is cheating, just as much as copying 
someone else's work.
DO NOT copy code from the Internet, or use programs found online or in textbooks 
as a "starting point" for your code. Your job is to design and write this program from 
scratch, on your own. Evidence of using external code from any source will be 
investigated as a potential academic integrity violation.
For this assignment, you will be reading and analyzing data about purchases at a grocery store.
The learning objectives of the program are:
• Write C functions according to specifications. • Use standard library functions for text file I/O. • Sorting an array of struct values. • Using struct values and pointers with functions.
• Writing a multi-file C program.
Background
Data mining is the practice of analyzing data to discover patterns and extract information. For examples, 
retail stores examine the buying patterns of their customers to learn what kinds of items are typically 
bought together. This information can be used for pricing, product placement, promotions, and so forth.
A particular kind of data mining is to learn association rules about purchases. There are a few statistics 
that are useful in associating the purchase of one item with another in the same transaction. These 
statistics are:
• Support(X) = fraction of transactions that contain item X
• Confidence(X,Y) = fraction of transactions containing X that also contain Y
• Lift(X,Y) = increase in probability of purchasing Y when X is purchased, compared to the 
probability of purchasing Y with no knowledge of X. If lift is greater than 1, there is a high 
association of Y with X -- in other words, if X is purchased, it’s more likely that Y is also 
purchased.
These statistics are mathematically defined below, but the idea for this program is to read a text file with 
purchase information, and to calculate the statistics for certain items and pairs of items.
The data used for this program was downloaded from Kaggle, a site with numerous data sets and 
competitions related to data science. https://www.kaggle.com/heeraldedhia/groceries-dataset
Your program must be written to work with any file that matches the format described below. The 
Kaggle data site has more than 38,000 data items. Your program must work correctly for files that have 
fewer or more data items.
Main Program
You are provided with a program in main.c. It doesn’t do much, except make calls to some of the 
required functions. The primary goal of this assignment is to implement the functions, which will be 
tested and graded using ZyLab. You are encouraged to implement your own main function to test your 
program during development.
Program Structure
The main program is implemented in main.c. This is provided for you. You are also given two header 
files: goceryFile.h and stats.h. You must create two source code files that define the functions 
declared in the header files:
groceryFile.c -- function for reading purchase data from a CSV file.
stats.c -- functions to extract statistics from the file data.
All of these files are compiled and linked together to create a single executable program. See the 
Appendix for information on how to build a project with multiple files. You should start by creating 
dummy versions of all the functions, so that you can have a working program as you work 
incrementally.
File Format: CSV
For this program, we use a more-or-less-standard CSV (comma-separated values) format that is a 
common way to store a spreadsheet in a text-only format. Microsoft Excel can read and write CSV files.
Typically, an extension of .csv is used in the name of a CSV file.
Details about CSV can be found here: https://en.wikipedia.org/wiki/Comma-separated_values
The idea is that values are stored in text format. A line of text corresponds to one row of a spreadsheet, 
or one record of related data. Data items on a single row are separated by a comma. Spaces are not 
ignored -- if there’s a space, it is considered to be part of the value. A linefeed character indicates the 
end of the last data item on a row. Each row should have the same number of data items, but that’s 
probably not enforced. (In our files, it will be true.)
Example: Here are two rows of integer data in CSV format, with four data items in each row.
10,100,35,19
-97,62,800,-1234
Example: Here are two rows of string data in CSV format, with four data items in each row.
fee,fie,foe,fum
tweedle-dee,tweedle-dum,hansel and gretel,Snow White & 7 dwarves
2
Things get a little tricky when the data contains a comma. In that case, the data item uses quotation 
marks to mark the beginning and end of the item. The quotation marks are NOT part of the data. This 
program will not have any CSV data with commas, so that part of the format is not relevant.
Grocery File Format
The data for this program has three fields in each row of the CSV file, in the following order:
• Member (integer)
• Date (string) • Item (string)
The first row of the table contains column header strings. You must read this line and ignore it. Each 
CSV row represents a purchase of a single item. Items purchased by the same member on the same date 
are considered to be part of one transaction.
For this program, you can assume that there will be at most 40,000 rows containing data (not counting 
the row of header strings). The date string will be no more than 10 characters, following the format 
described below. The item string may include spaces, and will be no longer than 25 characters.
There are two different formats used for dates. (I didn’t do this -- the file came this way!)
dd-mm-yyyy
mm/dd/yyyy
In all cases, “mm” is the number of the month, “dd” is the number of the day, and “yyyy” is the year. 
The year is always four digits. In the top format, mm and dd are always two digits, but that’s not true in 
the bottom format.
Recommendation: Read the information as a string (or as multiple strings), and use the atoi function to 
convert from a string to a decimal integer. See Appendix.
NOTE: Your functions will be tested with files other than groceries.csv. There may be other files 
provided for you for testing, and there may be “mystery” files that we use for tests that are not provided 
to you. Your code must work with ANY file that follows the format described above. Your code may 
assume that there are no formatting errors -- the data will match the specified format; there will be no 
missing data or malformed data.
readFile function
The readFile function takes two arguments. The first is a string, which specifies the name of the file. 
The second is a pointer to a struct fileInfo value. Information about the file and its data will be written 
to the value through this pointer.
Two data structures are defined in groceryFile.h that will be needed for this function, and for other 
functions in the program: struct purchase holds information about a single purchase (i.e., a single row 
in the CSV file), and struct fileInfo holds information extracted from the entire file, including an array 
of struct purchase values. There are typedef statements that allow these types to be called Purchase
and FileInfo, respectively.
A Purchase value contains the following fields:
• transaction id -- (integer) This is a unique ID that identifies purchases that are part of the same 
transaction. This is not read directly from the file. Your readFile function must identify purchase 
3
by the same member on the same date, and assign each of those purchases the same transaction 
id.
• member -- (integer) A unique identifier for each shopper. This is read directly from the CSV file.
• date -- (integer) This is an integer representation of the date of the purchase. We use an integer 
because it takes up less space and is faster to compare than a string. The encoding of the integer 
must be: (month × 106
) + (day × 104
) + year. This format makes the integer easy for a human to 
interpret it -- for example, the date 3/6/2020 would result in the integer 3062020.
• item -- (char array) The string describing the item will be read directly from the CSV file and 
stored in this array as a string. (It MUST have a null terminator.) The string may include spaces, 
but the linefeed at the end of the line is NOT part of the string.
Item strings are case-insensitive. In other words, for the purposes of comparing or sorting items, case 
must be ignored. (Use strcasecmp.)
NOTE: It’s a bit wasteful to allocate space for the item string in each purchase, since there will likely be 
multiple copies stored in memory. However, we do this for convenience and to make the programming a 
little easier. (Thought experiment -- How would you redesign this program so that only one copy of each 
item string is stored?)
A FileInfo value contains the following fields:
• purchases -- (integer) The number of purchases read from the file.
• transactions -- (integer) The number of unique transactions in the purchase data.
• members -- (integer) The number of unique member IDs in the purchase data.
• items -- (integer) The number of unique item strings in the purchase data. (Remember: item 
strings are case-insensitive.)
• data -- (pointer to Purchase) This field is an array of Purchase values. The array must be 
dynamically allocated, and its size must be greater than or equal to the value stored in the 
purchases field. Each purchase read from the file must be stored in this array. There is no 
specified order in which the purchases are stored. (In other words, none of your functions should 
assume any particular ordering of purchase data.1) NOTE: You may preallocate the data array according to the maximum number of purchases in a file 
(40,000). This is the recommended approach, even though it may allocate more memory than needed. 
Any user of the data array will know how many purchases are actually stored by looking at the 
purchases field. (Thought experiment -- How would you implement readFile to make sure that the data 
array is exactly the right size?)
NOTE: The data file provided has some duplicates: multiple purchases of the same item on the same day 
by the same member. Do not remove these purchases. It is allowable for a single item to appear multiple 
times in a single transaction.
Hint: Implement the readFile function first. This will give you the data you need to implement the 
statistics functions.
Hint: You may want to sort the data in various orders, to make counting the number of items, assigning 
IDs to transactions, etc. (For example, sorting by item makes it easier to count the number of unique 
1 Remember: We are writing tests for your functions. The data passed in from our test function may not be the same as the 
data your functions see in your code. You must not assume anything about the ordering of purchase data in the array.
4
5 items.) Use the sorting algorithms discussed in class. Implement these as separate functions within 
groceryFile.c.
stats functions and ItemCount
The stats.h file defines one additional data structure type, and also declares several functions that must 
be implemented.
The struct itemCount (also known as ItemCount) provides a convenient way to keep track of the 
number of items for various purposes. It contains an item string (const char *) and a count (integer). 
The item string is intended to point to a string in a purchase -- this is why it’s a pointer, and not an array. 
There is no reason to keep a separate copy of the string in this data structure.
The functions that must be implemented are:
countItem -- return the number of times an item was purchased.
countPair -- return the number of times two specific items were purchased in the same 
transaction.
topItems -- produce an sorted array of ItemCount for the items that were most purchased. The
array must be sorted from highest count to lowest count, and the number of desired items (and 
the size of the array) is provided by the caller. The return value is the number of items stored in 
the array. (This is because the caller might ask for the top 1000 items, when the file only 
contains 50 items.)
support -- return the support value (double) for a specified item.
confidence -- return the confidence value (double) for a specified item pair.
lift -- return the lift value (double) for a specified item pair.
For all of these functions, a const FileInfo pointer is passed in, which provides the file data. This is a 
const pointer because these functions will only read the data, and will not change it. (NOTE: The data
array pointed to by the struct can actually be changed, so you are allowed to reorder the purchases and, 
in theory, to change the purchase data through this pointer. But you definitely should NOT change the 
purchase data.)
The mathematical definitions of the last three are as follows:
support(𝑋𝑋) = number of times X is purchased
number of transactions
confidence(𝑋𝑋, 𝑌𝑌) = number of times X and Y are purchased in the same transaction
number of times X is purchased
lift(𝑋𝑋, 𝑌𝑌) = confidence(𝑋𝑋, 𝑌𝑌)
support(𝑌𝑌)
If the denominator of any of these equations is zero, the function must return 0. Otherwise, you will get
a divide-by-zero error and your program may crash.
Reminder: Dividing integer values uses integer division. To perform floating-point division, you must 
cast one or both of the integers into floating point values, e.g.: (double) x.
Hints and Suggestions
• Don’t overcomplicate the program. Do not use anything that we have not covered in class. 
• For compiler errors, look at the source code statement mentioned in the error. Try to figure out 
what the error is telling you. Try to fix the first error first, and then recompile. Sometimes, 
fixing the first error will make all the other errors go away. (Because the compiler got confused 
after the first error.)
• Use a source-level debugger to step through the program if the program behavior is not correct. 
If you are using CLion on your own computer, the debugger is integrated with the editor and 
compiler, so there’s no excuse for not using it. 
• For general questions or clarifications, use the Piazza, so that other students can see your 
question and the answer. For code-specific questions, post a private message to Piazza and 
attach your code as a file. (Do not copy and paste!) 
Administrative Info
Updates or clarifications on Piazza:
Any corrections or clarifications to this program spec will be posted on Piazza. It is important that you 
read these postings, so that your program will match the updated specification.
What to turn in:
• Submit your groceryFile.c and stats.c files to the zyBook assignment “Fall 2020 - Program 3 -
market”.
Grading criteria:
10 points: Proper coding style, comments, and headers. No global variables, except as needed and 
justified by your comments. (There is no good reason to use a global variable for this 
program.) No goto. See the Programming Assignments section on Moodle for more style 
guidelines. (You will not get these points if you only submit trivial code.)
30 points: readFile
15 points: countItems
15 points: countPairs
10 points: support (depends on countItems)
5 points: confidence (depends on countItems and countPairs)
5 points: lift (depends on countItems and countPairs)
10 points: topItems
NOTE: Points may be deducted for errors, even if all of the zyBook tests pass. This can happen if your 
dummy function “accidentally” passes a test, even though it doesn’t do anything. It may also happen if 
it is obvious to the grader that the program is written specifically to pass the specific tests, and would not 
pass other similar tests. 
6
Appendix A: Setting up the CLion project
1. Create a new project for a C executable with the C11 standard.
2. Download the main.c file (overwriting the main.c created by your project).
3. Download all the .h files into the same directory.
4. Download the .csv files into the same directory.
5. For each additional file (groceryFile.c, stats.c), do the following:
a. Right-click on the project name.
b. Select New > C/C++ Source File.
c. In the menu, set the file name (e.g., stats) and select the .c file. Check the box next to 
“add to targets”. Then click OK.
If you don’t add to the target, then you will need to edit the CMakeLists.txt file to include the .c files in 
the list of files to be compiled for the target executable:
add_executable(XXX main.c groceryFile.c stats.c)
The XXX can be anything, but is generally the name of your project.
Each file is compiled separately. Each file should at least include the corresponding .h file. For 
example, in stats.c, you will need:
#include "stats.h"
You may need to include other header files. Include the header files for the functions you need to use --
do NOT copy the declarations of those functions into your file.
You will need to set the Working Directory, so that the program can file the .csv files.
Also, set the C compiler flags to be consistent with the zyBook:
set(CMAKE_C_FLAGS "-Wall -Werror")
Appendix B: Some useful library functions
To read the first row of the .csv file, you can use fgets (from stdio). Create an array large enough to 
hold the entire line, and use fgets to read the first line into that string.
You can use fgets to read the remaining lines and then interpret the data. This may or may not be 
simpler than using fscanf to read each element. It’s up to you.
https://en.cppreference.com/w/c/io/fgets
The atoi function (from stdlib) is also useful. It takes a string parameter, and tries to interpret that 
string as a decimal integer. This will be helpful when converting the date string to an integer value. And 
if you use fgets to read the entire line, this can be used for the member number.
https://en.cppreference.com/w/c/string/byte/atoi
 
联系我们
  • QQ:99515681
  • 邮箱:99515681@qq.com
  • 工作时间:8:00-23:00
  • 微信:codinghelp2
热点文章
程序代写更多图片

联系我们 - QQ: 99515681 微信:codinghelp2
© 2014 www.7daixie.com
程序代写网!