Part 1. Building a Recommender System (10 marks)

1.1 Kaggle Competition (5 marks)

In this part of the assignment, you will build a recommender system model to predict ratings related to music reviews on Amazon. Specifically, given a (user, item) pair and associated review data, we want to predict the review’s star rating as accurately as possible. Performance will be measured with MSE.
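As a point of reference, MSE (mean squared error) is the average of the squared differences between predicted and true star ratings. The following is a minimal sketch of that computation; the function name and the toy ratings are illustrative assumptions, not part of the provided files.

```python
def mean_squared_error(predictions, labels):
    """Average of squared differences between predicted and true ratings."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if len(labels) == 0:
        raise ValueError("need at least one rating")
    return sum((p - y) ** 2 for p, y in zip(predictions, labels)) / len(labels)


# Hypothetical toy data: three true star ratings and three predictions.
true_ratings = [5.0, 3.0, 4.0]
predicted = [4.0, 3.0, 2.0]

# Squared errors are 1, 0 and 4, so the result is their mean.
mse = mean_squared_error(predicted, true_ratings)
```

Because the errors are squared, a prediction that is off by two stars costs four times as much as one that is off by one star, so large misses dominate the score.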

Files

train.json.zip
200,000 reviews to be used for training. It is not necessary to use all ratings for training, for example if doing so proves too computationally intensive. Each review has the following fields:

• reviewerID: The ID of the user. This is a hashed user identifier from Amazon.
• itemID: The ID of the item. This is a hashed product identifier from Amazon.
• reviewText: The text of the review.
• summary: A short summary of the review.
• overall: The star rating of the user’s review, from 1 to 5.
• price: Price of the item.
• reviewHash: Hash of the review (essentially a unique identifier for the review).
• unixReviewTime: Time of the review in seconds since 1970.
• reviewTime: Plain-text representation of the review time.
• category: Category labels of the product being reviewed.

test.json.zip
10,000 reviews to be used for generating the final Kaggle submission. All fields are the same as in train.json.zip, except that the overall rating has been removed.

rating pairs.csv
Pairs (reviewerIDs and itemIDs) on which you are to predict ratings.

baselines.py
A simple baseline that computes a user average and a global average on the training data, then uses these to predict on the test data. This code is given to demonstrate how to properly format predictions for uploading to Kaggle. A submission made with this code corresponds to the ‘naive baseline’ submission on the leaderboard.

Please do not try to collect these reviews from Amazon, or to reverse-engineer the hashing function we used to anonymize the data. Doing so will not be easier than successfully completing the assignment. We will require working code for all submissions to ensure no violation of the competition rules.
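To make the idea behind the baseline concrete, here is a minimal sketch of a user-average predictor. It is not the provided baselines.py, which is not reproduced here. It assumes each review is already loaded as a Python dict with the reviewerID and overall fields described above, and it assumes that users absent from the training data fall back to the global average. The function names and toy reviews are hypothetical.

```python
from collections import defaultdict


def fit_average_baseline(train_reviews):
    """Return (global_average, per_user_average) computed from training reviews.

    Each review is assumed to be a dict with at least 'reviewerID' and 'overall'.
    """
    total = 0.0
    user_sums = defaultdict(float)
    user_counts = defaultdict(int)
    for review in train_reviews:
        rating = float(review["overall"])
        total += rating
        user_sums[review["reviewerID"]] += rating
        user_counts[review["reviewerID"]] += 1
    global_average = total / len(train_reviews)
    user_average = {u: user_sums[u] / user_counts[u] for u in user_sums}
    return global_average, user_average


def predict_rating(reviewer_id, global_average, user_average):
    """Use the user's own average if available, else the global average (assumed fallback)."""
    return user_average.get(reviewer_id, global_average)


# Hypothetical toy training set with hashed-looking placeholder IDs.
toy_train = [
    {"reviewerID": "U1", "itemID": "I1", "overall": 5.0},
    {"reviewerID": "U1", "itemID": "I2", "overall": 3.0},
    {"reviewerID": "U2", "itemID": "I1", "overall": 1.0},
]
global_avg, user_avg = fit_average_baseline(toy_train)
seen_user_prediction = predict_rating("U1", global_avg, user_avg)
unseen_user_prediction = predict_rating("U9", global_avg, user_avg)
```

A predictor like this ignores the item and the review text entirely, which is why it serves only as a floor to beat.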

Grading and Evaluation

Performing well on the task is worth 5 marks. Your Kaggle performance will be graded as follows:

• Your ability to obtain a solution that outperforms the leaderboard baselines on the unseen portion of the test data (4 marks). Obtaining full marks requires a solution that is substantially better than baseline performance.

• Your ability to obtain a solution that outperforms the baselines on the seen portion of the test data, i.e., the leaderboard (1 mark). This is a consolation prize in case you overfit to the leaderboard.

• Students with submissions ranked in the top 10 will receive a single bonus mark.
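One common way to reduce the risk of overfitting to the leaderboard is to hold out part of the training data and compare models on it locally, rather than tuning against leaderboard scores. The sketch below shows a simple random split. The function name, the 25% validation fraction, and the toy reviews are illustrative assumptions, not requirements of the assignment.

```python
import random


def train_validation_split(reviews, validation_fraction=0.1, seed=0):
    """Shuffle a copy of the reviews and split it into (train, validation) lists."""
    shuffled = list(reviews)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    n_valid = int(len(shuffled) * validation_fraction)
    return shuffled[n_valid:], shuffled[:n_valid]


# Hypothetical toy reviews: 20 users with ratings cycling through 1..5.
toy_reviews = [
    {"reviewerID": f"U{i}", "overall": float(1 + i % 5)} for i in range(20)
]
train_part, valid_part = train_validation_split(toy_reviews, validation_fraction=0.25)
```

Models can then be fit on train_part and scored with MSE on valid_part, keeping leaderboard submissions as an occasional sanity check.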

To obtain good performance, you should not need to invent new approaches (though you are more than welcome to!). Rather, you will be graded on your ability to apply reasonable approaches to each of the given tasks. You will submit to Markus a zip file containing the code used to produce your submission. We will be checking submissions for similar or copied code and to verify that the competition rules were followed.

1.2 Written Report (5 marks)

You will also write a brief report about the approaches you took. Your report should use a 12 pt font and be between 2 and 4 pages long, excluding references.

1. Describe how you processed your data and what features you used. Your exploratory analysis here should motivate the model you use in the next section.

2. Describe your model. Explain and justify your decision to use the model you proposed. How will you optimize it? Did you run into any issues due to scalability, overfitting, etc.? What other models did you consider for comparison? What were your unsuccessful attempts along the way? What are the strengths and weaknesses of the different models being compared?

3. Describe your results and conclusions. How well does your model perform compared to alternatives, and what is the significance of the results? Which feature representations worked well and which did not? What is the interpretation of your model’s parameters? Why did the proposed model succeed where others failed (or if it failed, why did it fail)?
