
EE5434 final project

Data will be available on Oct. 2

https://www.kaggle.com/t/0e5f2d6870f7451893f45239bcb34181

Report and source code due (deadline): 11:59 PM, Dec. 6th

Full mark: 100 pts.

During the project, you can keep trying new machine learning models to improve the classification accuracy.

You are encouraged to form groups of size 3 with your classmates so that the team can implement multiple learning models and compare their performance. If you cannot find any partners, please send a message on the group discussion board and briefly introduce your expertise. In the worst case, we can match you with students lacking group members. If you prefer to work on the project by yourself, you can get 5 bonus points.

Submission format: The report should be in PDF format. The source code should be in a notebook file (.ipynb); also save your source code as an HTML file (.html). Thus, there are three files you need to upload to Canvas. Remember that you should not copy anyone's code.
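If you develop in Jupyter, one convenient way to produce the HTML copy is nbconvert. Below is a minimal sketch, assuming the notebook is named your-notebook.ipynb (a placeholder; use your team's actual file name):

    # Export an existing notebook to HTML with nbconvert.
    # "your-notebook.ipynb" is a placeholder; use your team's file name.
    from nbconvert import HTMLExporter

    body, _resources = HTMLExporter().from_filename("your-notebook.ipynb")
    with open("your-notebook.html", "w", encoding="utf-8") as f:
        f.write(body)

    # Equivalent command line: jupyter nbconvert --to html your-notebook.ipynb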

Files and naming rules: Name your files after the team members. For example, if the team members are Jackie Lee and Xuantian Chan, name each file JackieLee-XuantianChan.xxx. 5 pts will be deducted if the naming rule is not followed. In your report, please clearly list the group members.

How do we grade your report? We will consider the following factors.

1.   You would get 30% (basic grade) if you correctly applied one learning model to our classification problem. The accuracy should be much better than random guessing. Your report should be written in generally correct English and be easy to follow, and it should include a clear explanation of your implementation details and basic analysis of the results.

2.   Factors in grading:

a.   Applied/implemented and compared at least 2 different models. You show good sense in choosing appropriate models (such as NLP-related models).

b.   For each model, a clear explanation of the feature encoding methods, model structure, etc. Carefully tuned multiple sets of parameters or feature engineering methods. Provided evidence of trying multiple methods to boost the performance.

c.   Consider performance metrics beyond accuracy (such as the confusion matrix, recall, etc.); a short metrics sketch is given after this grading section. Carefully compare the performance of different methods/models/parameter sets. Present your results using the most insightful means, such as tables and figures.

d. Well-written reports that are easy to follow/read.

e.   Final ranking on Kaggle.

For each of the factors, we have unsatisfactory (1), acceptable (2), satisfactory (3), good (4), excellent (5). The sum over all factors determines the grade. For example, suppose student A got 4 good and 1 acceptable for a to e. Then, A's total score is 4*4+2=18. The full mark for a to e is 25. So, A's percentage is 18/25 = 72%.

Note that if the final performance is very close (e.g. 0.65 vs 0.66), the corresponding submissions belong to the same group in the ranking.
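As an illustration of factor c, the sketch below shows one way to compute metrics beyond plain accuracy with scikit-learn. The arrays y_val and y_pred are placeholders for your own validation labels and model predictions:

    # Metrics beyond plain accuracy (scikit-learn assumed to be available).
    # y_val and y_pred are placeholders for your validation labels and predictions.
    from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

    y_val  = [0, 1, 1, 0, 1, 0]   # hypothetical ground-truth labels
    y_pred = [0, 1, 0, 0, 1, 1]   # hypothetical model predictions

    print("Accuracy:", accuracy_score(y_val, y_pred))
    print("Confusion matrix:\n", confusion_matrix(y_val, y_pred))
    print(classification_report(y_val, y_pred))  # per-class precision, recall, F1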

Factors that can increase your grade:

1.   You used a new learning model or feature engineering method that was not taught in class. This requires some reading and a clear explanation of why you think this model fits this problem.

2.   Your model's performance is much better than the others' because of a new or optimized method based on the data properties.

The format of the report

1.   There is no page limit for the report. If you don't have much to report, keep it simple. Also, minimize the language issues by proofreading.

2.   Write down the names and Kaggle user names (not the team name) of your team members.

3.   To make our grading more standard, please use the following sections:

a.   Abstract. Summarize the report (what you did, what methods you used, and the conclusions). (less than 300 words)

b.   Data properties (exploratory data analysis). You should describe your understanding/analysis of the data properties. For example, what is the distribution of the classes (balanced/imbalanced)? Use a table/figure to visualize this; a short sketch of such a check is given after this list.

c.   Methods/models. In this section, you should describe your implemented models and provide their key parameters. For example, what are the features? If you use kNN, what is k and how did you compute the distance? If you use an ANN, what is the architecture, etc.? You should separate the high-level description of the models from the tuning of hyper-parameters.

d.   Experimental results. In this section, compare and summarize the results using appropriate tables/figures. Simply copying screenshots is acceptable but will surely lead to a low mark. Instead, you should *summarize* your results. You can also compare the performance of your model under different hyperparameters.

e.   Conclusion and discussion. Discuss why your models perform well or poorly.

f.    Future work. Discuss what you could do if more time is given.

4.   For each model you tried, provide the code of the version with the best performance. In your report, you can detail the performance of this model with different parameters.
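As a concrete starting point for section b, the sketch below counts and plots the label frequencies to check class balance. The file name train.csv and the column name label are assumptions for illustration; adapt them to the actual Kaggle files:

    # Class-distribution check for the exploratory analysis (section b).
    # "train.csv" and the "label" column are placeholders; adapt to the real data.
    import pandas as pd
    import matplotlib.pyplot as plt

    train = pd.read_csv("train.csv")
    counts = train["label"].value_counts()
    print(counts)                 # counts per class, usable as a table in the report
    print(counts / counts.sum())  # class proportions (balanced vs. imbalanced)

    counts.plot(kind="bar", title="Class distribution of the training set")
    plt.xlabel("Class")
    plt.ylabel("Number of samples")
    plt.tight_layout()
    plt.savefig("class_distribution.png")  # figure for the report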

The code

The code should include:

1.   Preprocessing of the data

2.   Construction of the model

3.   Training

4.   Validation

5.   Testing

6.   Any other code that is necessary
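To make these pieces concrete, the sketch below shows one possible notebook layout covering the steps above. It assumes the Kaggle data comes as train.csv and test.csv with text, label, and id columns and uses a simple TF-IDF plus logistic regression baseline; the file names, column names, and model choice are all assumptions for illustration, not the required method:

    # A minimal end-to-end sketch; file/column names and the baseline model
    # (TF-IDF + logistic regression) are assumptions, replace with your own.
    import pandas as pd
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.metrics import accuracy_score

    # 1. Preprocessing: load the data and build TF-IDF features.
    train = pd.read_csv("train.csv")             # placeholder file name
    test = pd.read_csv("test.csv")               # placeholder file name
    vectorizer = TfidfVectorizer(max_features=20000)
    X = vectorizer.fit_transform(train["text"])  # placeholder column name
    y = train["label"]                           # placeholder column name

    # 2-4. Model construction, training, and validation on a held-out split.
    X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
    model = LogisticRegression(max_iter=1000)
    model.fit(X_tr, y_tr)
    print("Validation accuracy:", accuracy_score(y_val, model.predict(X_val)))

    # 5. Testing: predict on the test set and write a Kaggle submission file.
    X_test = vectorizer.transform(test["text"])
    submission = pd.DataFrame({"id": test["id"], "label": model.predict(X_test)})
    submission.to_csv("submission.csv", index=False)  # check the exact format required on Kaggle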

