BAA1034 PRINCIPLES OF BUSINESS ANALYTICS
ASSIGNMENT INSTRUCTIONS
1. This is a group project (4 students per group) worth 30% of your final marks for the subject. The total number of marks for this assignment is 70.
2. Ensure that you regularly make back-up copies of your work. Computer, disk, or laptop problems will not be accepted as valid reasons for late submissions or requests for extensions.
3. Answer the questions according to the parts stated. All answers including the analysis (i.e. tables and charts) conducted in Microsoft Excel, need to be transferred into a Microsoft Word document.
4. Students should adhere to the following formatting and referencing guidelines:
(a) Use default format, paragraph, and margin settings
(b) Font: Times New Roman
(c) Font size: 12
(d) Line spacing: 1.5
5. Generative artificial intelligence (Al) tools can be used to conduct research pertaining to the assessment task. Any use of these tools must be appropriately acknowledged.
6. You will also be required to put your assignment through SafeAssign. The similarity index should not be more than 15%. Note that this is only a rough guideline. Some common usage of phrases and sentences may contribute to the similarity index. You should not be worried about this particular instance.
7. Your submission should include:
(a) Cover sheet which includes the names and student ID numbers of group members
(b) PDF file containing your answers
(c) Excel file (.xlsx) of your analysis
BACKGROUND
In today's dynamic and competitive retail environment, understanding consumer shopping behaviour is crucial for businesses to thrive and effectively meet customer needs. This Group Assignment will allow you to investigate the complexities of consumer behaviour.
The dataset that you will be working with captures a wide range of consumer purchases, offering a multifaceted view of shopping patterns and preferences. This includes:
1. Demographic information (Age, Gender)
2. Purchase details (Item Purchased, Category, Purchase Amount)
3. Indicator of customer behaviour (Frequency of Purchases)
4. Contextual factors (Season)
5. Transaction details (Payment Method, Shipping Type)
This collection of variables allows for a nuanced exploration of how various factors interact to shape consumer choices and shopping patterns. Your task is to apply a range of statistical and data visualization techniques to uncover meaningful patterns and relationships within this data.
DATA
Download the data file from eLearn. The data file contains the following variables:
DATA PREPARATION
The dataset contains 3,900 observations of consumers' shopping behaviour. For this Group Assignment, you are required to draw a random sample of 2,000 observations and conduct data analysis based on this random sample.
1. Add a new column (column K) and name this column as "Random Number".
2. In cell K2, type the Excel function =RAND(). Press "Enter" and a random number will appear in the cell.
Note: The number that you see in your sheet will differ from the number shown in the image below.
3. Fill in the remaining cells in the column up to the last observation.
Note: The values will change each time you do this, but it is not an issue.
4. Sort the random numbers from "Smallest to Largest".
Note: You will notice that the random numbers are not sorted from the smallest to the largest value. However, the Customer ID column will be sorted randomly.
5. After sorting, select the first 2,000 observations. Copy and paste these 2,000 observations into the sheet named "n = 2000". Your analysis should be based on the data in this new Excel sheet.
ASSIGNMENT TASK
Part A: Descriptive Analysis – Age Group and Frequency of Purchases (25 marks)
Where necessary, values should be presented in 2 decimal places.
1. Create a new column, "Age Group", using an appropriate Excel function that contains the following categories:
18-24
25-34
35-44
45-54
55-64
65 or older (3 marks)
2. Using the appropriate Excel function, obtain a frequency distribution for Age Group by filling in Table 1. (6 marks)
Before attempting the subsequent questions, sort the data by Age Group in ascending order. Filter the data for those aged below 35 years, and copy and paste these datad into the sheet named "Below 35". Do the same for observations for those aged 35 and above into the sheet named "35 and above".
3. Obtain two frequency distributions for the Frequency of Purchases by using the appropriate Excel function- one for the "Below 35" category and another for the "35 and above” category. Fill in your answers in Table 2.
In your own words, explain why using percentage frequency (i.e. proportion) is better than using frequency when describing categories. (10 marks)
4. Use an appropriate chart to visualize the information in Table 2. (3 marks)
5. Based on Part A: Questions 3 and 4, discuss the relationship between Age Group and Frequency of Purchases. (3 marks)
Part B: Descriptive Analysis - Age Group and Purchase Amount (25 marks)
Where necessary, values should be presented in 2 decimal places.
1. Obtain the summary statistics for Purchase Amount for the "Below 35" and "35 and above" categories. This analysis can be done in the respective sheets containing the data for the two categories. (2 marks)
2. Transfer the information from Part B: Question 1 into Table 3. Fill in the remaining information in Table 3 using the relevant Excel functions and formulas. (10 marks)
3. Discuss and compare the measures of location for the two age categories. (4 marks)
4. Discuss and compare the measures of dispersion for the two age categories. (4 marks)
5. Use an appropriate chart to visualize the distribution of Purchase Amount for the two age categories. Ensure that you provide an interpretation of the chart. (5 marks)
Part C: Probability — Age Group and Payment Method (12 marks)
Answers for this part should be presented in 3 decimal places. Ensure that you show all calculations.
1. Construct a PivotTable for Age Group and Payment Method in a new sheet and name this sheet "Part C (PivotTable)". (2 marks)
2. What is the probability that a randomly selected customer is between 45-54 years of age? (2 marks)
3. What is the probability that a randomly selected customer uses a credit card as their payment method? (2 marks)
4. What is the probability that a randomly selected customer is between 18-24 years old and uses Venmo as their payment method? (2 marks)
5. What is the probability that a randomly selected customer uses a credit card as their payment method, given that the customer is between 18-24 years old? Are these events independent? (4 marks)
Part D: Visualization - Category, Item Purchased, and Purchase Amount (8 marks)
1. This part requires you to do some additional research. Use a new Excel sheet to answer this part and name the sheet "Part D". 1. There are four main categories in the dataset: Accessories, Clothing, Footwear, and Outerwear. Each of these categories contains a variety of specific items that customers have purchased.
Create a comprehensive visualization that presents a hierarchical view of the product categories, and their specific items along with the purchase amount for each item. You should avoid using bar and column charts in your visualization. Note: This should be answered using a single chart. (5 marks)
2. Provide some insights into the visualization produced. (3 marks)