Part 1. Building a Recommender System (10 marks)

1.1 Kaggle Competition (5 marks)
In this part of the assignment, you will build a recommender system model to predict ratings
related to music reviews on Amazon. Specifically, given a (user, item) pair and associated review
data, we want to predict the review’s star rating as accurately as possible. Performance will be
measured with MSE.
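As a reminder of what the metric measures, MSE (mean squared error) is the average of the squared differences between predicted and true star ratings. The following is a minimal sketch for checking predictions locally on a held-out split. The function name `mean_squared_error` is illustrative and is not part of the provided files.

```python
def mean_squared_error(predictions, targets):
    """Average of squared differences between predicted and true ratings."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    if len(targets) == 0:
        raise ValueError("cannot compute MSE on empty input")
    squared_errors = [(p - t) ** 2 for p, t in zip(predictions, targets)]
    return sum(squared_errors) / len(squared_errors)
```

Because errors are squared, a prediction that is off by two stars costs four times as much as one that is off by one star, so large misses dominate the score.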
Files
• train.json.zip: 200,000 reviews to be used for training. It is not necessary to use all ratings for training, for example if doing so proves too computationally intensive. Each review has the following fields:
    reviewerID: The ID of the user. This is a hashed user identifier from Amazon.
    itemID: The ID of the item. This is a hashed product identifier from Amazon.
    reviewText: The text of the review.
    summary: A short summary of the review.
    overall: The star rating of the user’s review, from 1 to 5.
    price: Price of the item.
    reviewHash: Hash of the review (essentially a unique identifier for the review).
    unixReviewTime: Time of the review in seconds since 1970.
    reviewTime: Plain-text representation of the review time.
    category: Category labels of the product being reviewed.
• test.json.zip: 10,000 reviews to be used for generating the final Kaggle submission. All fields are the same as in train.json.zip, except that the overall rating is removed.
• rating pairs.csv: Pairs (reviewerIDs and itemIDs) on which you are to predict ratings.
• baselines.py: A simple baseline that computes a user average and global average on training data, then uses these to predict on test data. This code is given to demonstrate how to properly format predictions for uploading to Kaggle. A submission made with this code corresponds to the ‘naive baseline’ submission on the leaderboard.
Please do not try to collect these reviews from Amazon, or to reverse-engineer the hashing function we used to anonymize the data. Doing so will not be easier than successfully completing the assignment. We will require working code for all submissions to ensure no violation of the competition rules.
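The user-average and global-average baseline described for baselines.py can be sketched roughly as follows. This is not the provided script. It assumes the reviews have already been loaded into a list of dictionaries with the `reviewerID` and `overall` fields listed above, and it assumes that the global average is used as a fallback for users who never appear in the training data. The function names `fit_baseline` and `predict_rating` are hypothetical, and the Kaggle output format is not reproduced here.

```python
from collections import defaultdict


def fit_baseline(train_reviews):
    """Compute the global average rating and each user's average rating."""
    total = 0.0
    count = 0
    user_sum = defaultdict(float)
    user_count = defaultdict(int)
    for review in train_reviews:
        rating = float(review["overall"])
        total += rating
        count += 1
        user_sum[review["reviewerID"]] += rating
        user_count[review["reviewerID"]] += 1
    if count == 0:
        raise ValueError("no training reviews given")
    global_avg = total / count
    user_avg = {user: user_sum[user] / user_count[user] for user in user_sum}
    return global_avg, user_avg


def predict_rating(reviewer_id, global_avg, user_avg):
    """Use the user's average if known; otherwise fall back to the global average (assumed)."""
    return user_avg.get(reviewer_id, global_avg)
```

A model of this kind ignores the item and the review text entirely, so it is a useful reference point when judging whether a richer model is actually learning anything beyond per-user rating habits.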
Grading and Evaluation
Performing well on the task is worth 5 marks. Your Kaggle performance will be graded as follows:
• Your ability to obtain a solution which outperforms the leaderboard baselines on the unseen portion of the test data (4 marks). Obtaining full marks requires a solution which is
substantially better than baseline performance.
• Your ability to obtain a solution which outperforms the baselines on the seen portion of the test data, i.e.,
the leaderboard (1 mark). This is a consolation prize in case you overfit to the leaderboard.
• Students with submissions ranked in the top 10 will receive a single bonus mark.
To obtain good performance, you should not need to invent new approaches (though you are
more than welcome to!) but rather you will be graded based on your ability to apply reasonable
approaches to each of the given tasks. You will submit a zip file containing the code used to
produce your submission to Markus. We will be checking submissions for similar or copied code
and to verify competition rules were followed.
1.2 Written Report (5 marks)
You will also write a brief report about the approaches you took. Your report should use a 12 pt font and be between 2 and 4 pages long, excluding references.
1. Describe how you processed your data and what features you used. Your exploratory analysis
here should motivate the model you use in the next section.
2. Describe your model. Explain and justify your decision to use the model you proposed. How
will you optimize it? Did you run into any issues due to scalability, overfitting, etc.? What
other models did you consider for comparison? What were your unsuccessful attempts along
the way? What are the strengths and weaknesses of the different models being compared?
3. Describe your results and conclusions. How well does your model perform compared to
alternatives, and what is the significance of the results? Which feature representations worked
well and which did not? What is the interpretation of your model’s parameters? Why did the
proposed model succeed where others failed (or if it failed, why did it fail)?