CS 1100 – Computer Science and Its Applications
Topic 10: Pivot Tables
How to Get Started
To get started, download the starter file ( .xslx).
What to Turn in
You must submit your solution to Canvas by the due date. When you finish the assignment, save the file and upload it to Canvas. The file will be named
LastName-FirstName .pivotTables
where LastName is your last name and FirstName is your full first name.
Knowledge Needed
This assignment involves the following Excel functions and techniques:
• Pivot Tables, inserting fields, filters, and slicers.
• Functional programming with dynamic arrays: LAMBDA, LET, SORT, UNIQUE, FILTER, SUM- IFS, AVERAGEIFS, all LAMBDA-helper functions we covered in earlier assignments/labs, text processing functions such as TEXTBEFORE and TEXTAFTER, stacking functions (HSTACK, VSTACK), array accessing functions such as TAKE, DROP etc.
Keepinmindthathowyouexpresstheformulasmattersandispartofthegrading.It’snotjustgetting therightanswerthatcounts;howyouwritethedynamicformulamatterstoo.
Helper Array Debugging (HAD)
We use the Helper Array Debugging technique for LAMBDA Excel formulas with LET in assignments and exams. Checking your LAMBDAs with LET carefully with different inputs helps you understand the ChatGPT-generated formulas. It improves your debugging skills because it allows you to localize errors in the relevant subformulas.
HAD: Brief Instructions for debugging formulas, specifically LAMBDAs with LET, using helper arrays.
• Show the LAMBDA using FORMULATEXT. You need it to reproduce the formulas in the helper arrays.
• The LET introduces names: each name is assigned a helper array. Put each name as the header of the helper array. If needed, introduce a final helper array for the final result.
• Put the formulas from the LET into the appropriate helper array using inputs and hashtag references to earlier arrays. You can use additional helper arrays if you need to see the output of subformulas.
• Check each step: does it produce the correct output for changing inputs? Note that using helper arrays also produces a dynamic array solution and lets you inspect the output of each step indepen- dently of future steps.
• Using ChatGPT to create the debugging formulas directly in the LAMBDA. It is recommended to use a LAMBDA to show the value or the error code as text (using the VALUETOTEXT function). We call the function echeck.
LAMBDA(value,IFERROR(value, "ERROR " & VALUETOTEXT(value,1)))
This function will help shorten the debugging code because we encapsulate the error handling for one error in a function with a short name (echeck). The purpose of echeck (an abbreviation for error check) is to return a value or to show the Excel 365 error string if the value cannot be computed. The show part of the debugging code does not need to know the details of the echeck implementation. Indeed, other implementations of echeck are possible, and the show section of the debugging should be independent of those details. Otherwise, we would have a violation of the Principle of Least Knowledge, aka the Law of Demeter.
The function below is a bit synthetic and abstract, but it illustrates the importance of catching all possible errors. For example, if you delete the line
"d",echeck(d),
you will get NO debugging information! Therefore, it is essential that each variable called value in the LET has an echeck line in the debugging formula area. Remember that LAMBDAs defined in LET don’t need such an echeck line because the LAMBDA is guaranteed to cause an error. After all, the arguments are missing.
=LAMBDA(x,y, LET(
a,SEQUENCE(x),
b,1/0+a+y,
c,SEQUENCE(x)=0,
d, FILTER(a,SEQUENCE(x)=0),
e, 1/0,
COMMENT1, "Debugging formulas follow",
echeck, LAMBDA(value,IFERROR(value, "ERROR " & VALUETOTEXT(value,1) )),
show,IFERROR(HSTACK(
"a",echeck(a),
"b",echeck(b),
"c",echeck(c),
"d",echeck(d),
"e",echeck(e) ),""),
show))(2,2)
LAMBDAs in this assignment and ChatGPT
Use the Design Recipe to Guide the development of your LAMBDAs. In the testing part (section 6), use HAD to debug your formulas. Use function echeck. Remember that we shared a document containing ChatGPT instructions in English for generating the debugging code. It would help if you used it to avoid the repetitive work of producing the debugging boilerplate. Make sure to fact-check the output.
Required Setup
Transform the data in the sheet Main Data Source to an Excel table. Name the table Problems1to5 and use it to create pivot tables in Problems 1 to 4.
Problem 1 Create a Pivot Table (22 points)
1. Create a pivot table using the data in the worksheet Main Data Source. Choose the option to put the Pivot Table in a new worksheet.
2. Label the new worksheet Problem 1.
3. Show the average REVENUE per MARKET split by LINE OF BUSINESS.
4. Separate the values by REGION.
5. Format REVENUE as Currency.
6. Show a subtotal for each MARKET.
7. Create a report using Tabular Form. You find this in Design Tab (under PivotTable Tools > Report Layout > Tabular Form.
Your pivot table should look like Figure 1.
Problem 2 Create a Pivot Table and Pivot Bar Chart and simulate it with Dynamic Array Formulas (30 points)
1. Create a pivot table and pivot bar chart to show the total Revenue by Region and then by Market using the data in the worksheet Main Data Source. Use the Tabular Form.
2. Label the new worksheet Problem 2.
3. Filter the chart so it shows only the South and sort the result. Don’t show any subtotals or grand totals.
Note: Excel for Mac may produce Pivot Charts that look different from what is shown here.
4. Your pivot chart should look like Figure 3. You have latitude in your chart’s look for color, fonts, and the like.
5. Write a function called SimplePivotTable to simulate the Pivot table below. A big advantage of using a function is that you don’t have to press the Refresh button when the data changes. Follow the Design Recipe to construct the LAMBDA including using HAD in the debugging phase. To define SimplePivotTable we use generic argument names or terminology that works for other Pivot tables you may want to generate. The problem formulation is "Show total X by Y (filtered for specific Y__1) by Z using the data in the Main Data Source". Here are three examples (in the context of Main Data Source) how the function SimplePivotTable could be used.
Show total X by Y (filtered for specific value Y_1) by Z
Show total Revenue by Region (filtered for specific Region_1) by Line of Business
Show total Revenue by Region (filtered for specific Region_1) by Market
Show total Revenue by Market (filtered for specific Market_1) by Line of Business
Figure 1: The Pivot Table in Tabular Form for Problem 1
Note that we need four formal arguments CNumber,CName1,CName2,SpecificCName1 correspond- ing to this line
Show total CNumber by CName1 . (filtered for SpecificCName1) by CName2
The signature for function SimplePivotTable is
SimplePivotTable(Cnumber,CName1,CName2,SpecificCName1) it returns a 3-column-table
where Cnumber is a column of numbers (e.g., REVENUE numbers) that will be summed, CName1 is a column of names (e.g., Region names), SpecificCName1 is a specific value in CName1 (e.g. South), and CName2 is a column of names (e.g., Market names). The columns CNumber, CName1 and CName2 are columns from the main data source. They all have the same number of rows. The CName1 and the CName2 are descriptor values or categories. The function produces a table with 3 columns where the first column contains the value for SpecificCName1. The second col- umn is a sorted and unique filtered list for CName2 that have the corresponding SpecificName1 (e.g., South). Make sure you combine the corresponding rows for the unique CName2 value. The third column contains the corresponding aggregated Cnumber values for the SpecificCName1 and CName2 value. It is ok to add the headers of the three columns manually instead of producing them programmatically.
You will need the following functions to implement the LAMBDA for SimplePivotTable: LET, SORT, UNIQUE, FILTER, HSTACK, =, and SUMIFS.
Figure 2: The Pivot Table for Problem 2
Figure 3: The Pivot Chart for Problem 2
Hints:
• Send the following request to ChatGPT to get started:
This question is about simulating a simple Excel pivot table with Excel dynamic array formulas. Given is a table MDS (main data source) of data with columns Revenue, Region (with entries “North”, “South” and “West”), and Market. Create a pivot table to show the total revenue split by Region and then by Market using the MDS table. Use the tabular form. Filter the table to show only a specific region called “South”. Don’t show any subtotals or grand totals.
Therefore the table shows the total revenue for each market for the “South” region.
Give dynamic array formulas to produce the same output as the pivot table. Package the formulas into a LAMBDA(Revenue, Region, Market, SpecificRegion,...). Test your LAMBDA formula by using columns of the MDS table as actual arguments and “South” as the SpecificRegion. Use a SUMIFS formula with five arguments as part of the solution. Sort the markets in the South region alphabetically.
• When I used ChatGPT on 3/21/2023, it created the following formula for SimplePivotTable:
LAMBDA(Revenue, Region, Market, SpecificRegion,
LET(
FilteredMarket, FILTER(Market, Region=SpecificRegion),
UniqueFilteredMarket, SORT(UNIQUE(FilteredMarket)),
TotalRevenue, SUMIFS(Revenue,
Region, SpecificRegion,
Market, UniqueFilteredMarket),
CHOOSE({1,2}, UniqueFilteredMarket, TotalRevenue)
)
)
Test the LAMBDA carefully and change to the terminology described in the assignment. Instead of the CHOOSE function, use HSTACK.
• Turn in the LAMBDA for the function SimplePivotTable you generated.
Figure 4: The Pivot Chart for Problem 3
Problem 3 Creating Calculated Fields (20 points)
1. Create a pivot table to show the total Revenue by Region and then by Market using the data in the worksheet Main Data Source. Show the results only for the North region.
2. Label the new worksheet Problem 3.
3. Create a Calculated Field Bonus that calculates a bonus for each Market.
Bonus is 1% of Revenue above $40,000,000, and 0.5% for Revenue at or below $40,000,000.
4. Create a second Calculated Field Total that sums Revenue and Bonus.
5. Do not show Subtotals.
6. Format the Revenue, Bonus and Total as Currency.
7. Using the Pivot Table Styles, pick the one that matches what is shown in Figure 4.
8. Call function SimplePivotTable to simulate the Pivot table. Write one dynamic array formula to calculate the bonus and a second one to compute the total. Use function FORMULATEXT to show your call to SimplePivotTable and for the formulas that you use for the calculated fields.
Problem 4 Using Slicers (23 points)
1. Create a pivot table to show the total Revenue by Region and then by Line of Business us- ing the data in the worksheet Main Data Source. Use function SimplePivotTable to produce the same data only for the North region and all lines of business in the North region. Use function FORMULATEXT to show your call to SimplePivotTable.
2. Label the new worksheet Problem 4.
3. Add a Count of Revenue to show how many times each product was sold.
4. Insert two Slicers: one for Region and one for Line of Business.
5. Filter Line of Business so only values for Copiers and Printers are visible.
Your sheet should look like Figure 5.
Figure 5: The Pivot Table for Problem 4
Problem 5 BONUS: Improving Text Processing with REDUCE (10 points)
We go back to assignment T6, problem 1, on text processing. When solving T6, problem 1, we did a lot of manual work to repeatedly call TEXTBEFORE and TEXTAFTER to translate text and delimiters into fields for name, downloads(IOS or Android), company, and the current user base. We did careful manual bookkeeping to ensure TEXTBEFORE and TEXTAFTER produced the desired output. Now we have learned about automating tedious repetition using REDUCE, which we apply now to solving T6, problem one automatically. You write a function PARSE that takes two inputs, a column of text and a list of delimiters and it produces as output a table of fields. We give PARSE in incomplete form, with a lot of holes that you need to fill in, based on the knowledge you learned in the course.
An essential principle of Excel programming and any programming activity is DRY (Don’t Repeat Your-
self). We violated this principle significantly by writing numerous pairs of calls TEXTBEFORE(text,delim,. . .), TEXTAFTER(text,delim,. . .) .
This is a tedious and error-prone task. Instead, we should have written ONE pair of calls TEXTBEFORE, TEXTAFTER inside a REDUCE function call, and then the REDUCE function would repeat the pair call until the text is processed.
Write a function PARSE which takes as input a text and a range of k delimiters, and it outputs a table of k+1 columns, each one representing one component of the text. For example, the text is “a;b::c,d”, k=3, the delimiters are “;”,”::”,”,”. And as output we get the 4 column table “a”,”b”,”c”,”d”.
Instead of writing
=TRIM(TEXTBEFORE(M9#,N8))
=TEXTAFTERCSA(M9#,N8)
many times, we write ONCE
first,TRIM(TEXTBEFORE(new_acc,val)), rest,TEXTAFTER(new_acc,val),
inside a REDUCE call. REDUCE will then automatically execute TEXTBEFORE and TEXTAFTER the correct number of times and handle all the bookkeeping.
Write a function PARSE which has two arguments. The first is text to be split into its components and the second defines the rules how the text is to be split. Those rules are called a simple grammar in computer science. Our grammars are very simple: Text separated by delimiters and we specify them by giving the list of delimiters. PARSE has the signature
PARSE(text,delims) -> Table with COLUMNS(delims)+1 columns .
Each column corresponds to a component extracted from the text based on the delims.
The formula has “holes” that you need to fill in. The holes are numbered UNKNOWN1, UNKNOWN2, . . . ,UNKNOWN10. Turn in a table with two columns and assign to each UNKNOWN a function name so that the formula works correctly according to specification. Test your chosen UNKNOWNs by using them in your formula. Each UNKNOWN is worth 1 point.
Your answer will look like this:
UNKNOWN1 IF
UNKNOWN2 SUMIFS
UNKNOWN3 AND
etc .
=LAMBDA(text,delims,
UNKNOWN1(decorated_first_cells,REDUCE(text,delims, UNKNOWN2(acc,val,
UNKNOWN3(
COMMENT1,"initialize acc with input string and define delimiters as a range", COMMENT12,"acc is a list of text cells",
COMMENT2,"acc is a row and we need the last column which is the rest of the string to parse", new_acc,TAKE(acc,1,-1),
first,TRIM(UNKNOWN4(new_acc,val)), rest,UNKNOWN5(new_acc,val),
first_rest_pair,UNKNOWN6(acc,val, first, rest),
first_rest_pair) )),
COMMENT3,"now comes some cleanup work", number_of_delims,UNKNOWN7(delims),
COMMENT,"every 3rd entry contains a component", seq,UNKNOWN8(1,number_of_delims*3),
logical,UNKNOWN9(seq,3)=0,
drop_last,DROP(decorated_first_cells,,-1), cleaned_first,UNKNOWN10(drop_last,logical),
completed_first,HSTACK(cleaned_first,TAKE(decorated_first_cells,,-1)), inter,IFERROR(VSTACK(decorated_first_cells, number_of_delims,
seq, logical,drop_last,cleaned_first, completed_first),""), inter))("a;b::c",{";","::"})
The output of this formula is:
a;b::c ; a b::c :: b c
2
1 2 3 4 5 6
FALSE FALSE TRUE FALSE FALSE TRUE a;b::c ; a b::c :: b
a b
a b c
Topic 10: Pivot Tables
Problem 6 Grade Computation (5 points)
As you know, the grading rules of CS1100/CS1101 cannot be expressed in Canvas. Therefore, we provide you with an Excel spreadsheet to compute your grade. It should be a straightforward exercise for you to write the spreadsheet yourself. We focus only on one student and use two formulas: one for computing the exam grade and one for the non-exam grade. You have already calculated the exam grade in an earlier assignment. However, the formula given here is improved: The information about how to provide weights for the different components of the course is now localized into one array constant and not spread across a formula. The dynamic array function SumProduct is used to compute the grade. Should the weights change, we only need to update the array constant, not a formula at several places. This is a good separation of concerns and follows the Principle of Least Knowledge.
Use the file
gradeCS1100_one_student .xlsx
next to the assignment instructions file to compute your grade for three scenarios: best, realistic, and minimal. In the best scenario, you give yourself 100 for each grade you have not received, e.g., for make- up 1 and 2. Apply the formulas to three rows for the three scenarios using any technique you like, e.g., dragging. Implement the lookup using XLOOKUP to map percentages (exam percentage + non-exam percentage) to letter grades.
Download your grades from Canvas and feet them to the grading calculator. For M__1 and M__2 give as input 0 if you plan not to take the makeup exams. The formulas spill so you can feed multiple rows to investigate different scenarios or to also compute the grade for a classmate.
The names used for the formal arguments should be self-explanatory:
TH=Take-Home Exam,
E__1=Exam 1,
M__1=Makeup Exam 1, etc.
The above grade calculator also answers this question:
• Q1 - My take home exam grade was not replaced by exam1 grade in canvas.
• A1 - The change will not be reflected in Canvas. It is considered in the formula when calculating the final grade.
Here are more detailed instructions:
• Canvas
Step 1: Open CS1101 from the course tab.
Step 2: Click through the Grades tab.
Step 3: Manually extract the scores (Quizzes, Project, Attendance, Assignments, Take Home Exam, Exams 1 & 2). Put your scores into one row of an Excel spreadsheet where you have your grade calculation functions.
• For Grade Calculator
Step 1: After obtaining the scores, enter the score information into the provided Grade Calculator by carefully giving the correct actual arguments to the functions.
Step 2: Check for the percentage and grade letter from the Grade Calculator.
Step 3: Choose whether or not to take the make-up exam 1 or 2 to improve your final results based on your results from step 2. You have the option of taking both make-up exams. Remember that you must be well prepared because the make-up exam grade will override the corresponding exam grade (potentially lowering it; see the detailed formula above).
Here, follow the two LAMBDAs for grade calculation for one student. They are also available as an Excel file for the T10 files.
=LAMBDA(TH,E_1,E_2,M_1,M_2,LET(
COMMENT0,"Calculate exam grade for one student (weighted percentage)",
COMMENT1,"Function for make-up exam rule",
override,LAMBDA(exam,mu_exam, IF(mu_exam=0,exam,mu_exam)),
exam_weights, {0.06,0.25,0.25},
TH_real,MAX(TH,E_1),
E_1_real,override(E_1,M_1),
E_2_real,override(E_2,M_2),
unweighted_exam_percentages,HSTACK(TH_real,E_1_real,E_2_real),
result,SUMPRODUCT(unweighted_exam_percentages,exam_weights), result))(100,0,0,100,100)
=LAMBDA(Quizzes,Assignments,Project,Attendance, LET(
COMMENT0,"Calculate non-exam grade for one student (weighted percentage)",
COMMENT1,"Function to drop lowest grade from a list of grades",
average_drop_lowest, LAMBDA(r,LET(min_a,MIN( r ),
result,(SUM( r )-min_a)/(COLUMNS( r )-1),result)),
weights, {0 .1,0 .2,0 .1,0.04},
AvgQuizzes,AVERAGE(Quizzes),
AvgAssignments,average_drop_lowest(Assignments),
AvgAttendance,AVERAGE(Attendance)*100, COMMENT2,"Attendance is 1 or 0", unweighted_non_exam_percentages,
HSTACK(AvgQuizzes, AvgAssignments, Project, AvgAttendance), result,SUMPRODUCT(unweighted_non_exam_percentages,weights),
debugging,TEXTJOIN(" ",TRUE,AvgQuizzes," =AvgQuizzes ",AvgAssignments," =AvgAssignments ",
Project,"=Project",AvgAttendance," =AvgAttendance ",result," =result "), result))(E15:F15,G15:I15,J15,K15:N15)