CS 165B – Machine Learning, Spring 2021
Machine Problem #3
Due June 6 by 11:59pm PST
Note: This assignment should be completed individually. You are not
supposed to copy code directly from anywhere else, and we will review the
similarity of your submissions.
1. Task and Dataset:
In this assignment, you will write a Python 3 program to solve the
dialogue sentiment classification problem, which aims to identify the
sentiment (negative, neutral, positive) of each utterance turn in a dialogue.
From a machine learning perspective, this can be viewed as a multi-class,
single-label text classification task, where each sentiment represents a class
(label) of a given text. This task is important because it arises in real
problems such as analyzing user satisfaction with a dialogue system. The
dialogues in this task are collected and labeled from the TV show Friends.
2. Implementation:
You can find the code framework and data in the mp3_starter_package in
the resources of Piazza.
2.1. About the Input and Output
You are supposed to implement the run_train_test function.
The inputs of this function:
• training_data: this is the training data in the form of a list of dictionaries,
where each dictionary is a sample of a single turn in the dialogue. Each
dictionary has the following items:
• “text”: the utterance of the speaker.
• “label”: the sentiment label. 0 (negative), 1 (neutral) or 2 (positive).
• “speaker”: the name of the speaker.
• “dialogue_id”: the id of the dialogue.
• “utterance_id”: the order of this utterance within the dialogue.
• testing_data: this has the same form as training_data, but with the “label” item removed.
The output for the function:
• testing_labels: the labels for the testing data in the form of a list of integer.
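To make the expected interface concrete, here is a minimal sketch of run_train_test. The sample dictionaries follow the structure described above, but the prediction logic (always guessing the most frequent training label) is only a placeholder, not a recommended model:

```python
from collections import Counter

def run_train_test(training_data, testing_data):
    """Toy baseline: always predict the most frequent training label.

    training_data: list of dicts with "text", "label", "speaker",
                   "dialogue_id", and "utterance_id" items.
    testing_data:  same form, without the "label" item.
    Returns a list of integer labels, one per test sample.
    """
    # Count how often each sentiment label (0/1/2) occurs in training.
    counts = Counter(sample["label"] for sample in training_data)
    majority = counts.most_common(1)[0][0]
    return [majority] * len(testing_data)

# Illustrative, made-up samples in the documented format.
train = [
    {"text": "I love it!", "label": 2, "speaker": "Joey",
     "dialogue_id": 0, "utterance_id": 0},
    {"text": "It's fine.", "label": 1, "speaker": "Ross",
     "dialogue_id": 0, "utterance_id": 1},
    {"text": "Okay.", "label": 1, "speaker": "Monica",
     "dialogue_id": 0, "utterance_id": 2},
]
test = [{"text": "How you doin'?", "speaker": "Joey",
         "dialogue_id": 1, "utterance_id": 0}]
print(run_train_test(train, test))  # -> [1]
```

Your real implementation should, of course, learn from the “text” (and possibly “speaker” and dialogue context) rather than ignore it.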
2.2. How to evaluate your code locally
To evaluate your code on your local machine, run the following command:
# python3 evaluate.py
which will output the f1-score on the development dataset for each class. We
will give grades based on the average f1-score.
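For reference, per-class f1 and its unweighted (macro) average can be computed with sklearn's f1_score; the labels below are made up for illustration, and this is only a sketch of the metric, not the evaluate.py script itself:

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 1, 1, 2, 0]   # gold sentiment labels (illustrative)
y_pred = [0, 1, 1, 1, 2, 2, 0]   # a model's predictions (illustrative)

# f1 for each class (0, 1, 2), then their unweighted mean.
per_class = f1_score(y_true, y_pred, average=None)
macro = f1_score(y_true, y_pred, average="macro")
print(per_class, macro)
```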
2.3. How to convert the text into features
There are two general ways to convert the text into features. The first is to
use TF-IDF related features. You can use TfidfVectorizer provided by sklearn
to achieve this. For more information about how to use it, please refer to the
official document: https://scikit-learn.org/stable/modules/generated/
sklearn.feature_extraction.text.TfidfVectorizer.html
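A minimal sketch of this route, pairing TfidfVectorizer with a LogisticRegression classifier (the tiny texts and labels are made up for illustration, and the classifier choice is only one option among many):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

train_texts = ["I love this show", "It is okay I guess", "This is terrible"]
train_labels = [2, 1, 0]          # positive, neutral, negative (illustrative)
test_texts = ["I love it"]

# Fit the vocabulary and IDF weights on the training texts only,
# then reuse the same fitted vectorizer on the test texts.
vectorizer = TfidfVectorizer(ngram_range=(1, 2), min_df=1)
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

clf = LogisticRegression(max_iter=1000)
clf.fit(X_train, train_labels)
pred = clf.predict(X_test)
print(list(pred))
```

Note that the vectorizer must be fit on the training data only; calling fit_transform on the test data would leak test statistics into the features.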
Another way, which requires more background in NLP, is to convert the
text into embeddings using Word2Vec or other language models. If you are
interested in this direction, there is a guide on word embeddings:
https://towardsdatascience.com/a-beginners-guide-to-word-embedding-with-gensim-word2vec-model-5970fa56cc92. You can also find other resources
online. Once you have word embeddings, you can feed the sequence of word
embeddings to a popular recurrent model such as an LSTM.
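As a rough sketch of the embedding route, one simple alternative to a recurrent model is to average the word embeddings of an utterance into a single fixed-size feature vector. The tiny 4-dimensional vectors below are made-up stand-ins; real Word2Vec vectors are typically 100-300 dimensional and come from a trained model:

```python
import numpy as np

# Hypothetical pre-trained embeddings keyed by word (illustrative values).
embeddings = {
    "love": np.array([0.9, 0.1, 0.0, 0.2]),
    "hate": np.array([-0.8, 0.2, 0.1, 0.0]),
    "this": np.array([0.0, 0.5, 0.3, 0.1]),
}

def utterance_vector(text, dim=4):
    """Average the embeddings of known words; zeros if none are known."""
    vecs = [embeddings[w] for w in text.lower().split() if w in embeddings]
    return np.mean(vecs, axis=0) if vecs else np.zeros(dim)

print(utterance_vector("I love this"))  # mean of "love" and "this"
```

The resulting vectors can then be fed to any standard classifier, which is often a reasonable baseline before trying an LSTM.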
2.4. What modules can be used
I have installed pandas, numpy, nltk, sklearn, pytorch, scipy, tensorflow,
and keras on Gradescope. Please use these tools where you can. If you need
to use other tools, please let me know. Note that there is no GPU on
Gradescope, so please make sure your program can run successfully on a
single-CPU machine without GPU resources.
3. Submission
3.1. How to submit
To submit your code: upload a single file mp3.py (multiple files are not
accepted) to Gradescope, which will show your average f1-score and
your grade on the testing data. Note that the testing data on Gradescope is
different from the local development dataset that you have.
4. Grading
4.1. Basic grade
I have run a simple baseline which gets an average f1-score of around 50 on
the testing data, and I will use this as the threshold. If your f1-score x is
higher than 50, then you will get 85 + (x - 50). For example, if you get a 55
f1-score, then your score will be 85 + (55 - 50) = 90. Note that the maximum
score is capped at 100. If your f1-score x is lower than 50, then you will get
85 - 0.5 * (50 - x) * (50 - x); i.e., the further your f1-score falls below the
threshold, the more points you stand to gain from improving it.
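The rule above can be written out as a small function. This is a sketch of the stated formula, not the official grading script:

```python
def basic_grade(x):
    """Basic grade from average f1-score x (threshold 50, cap 100)."""
    if x >= 50:
        # Linear above the threshold, capped at 100.
        return min(100.0, 85 + (x - 50))
    # Quadratic penalty below the threshold.
    return 85 - 0.5 * (50 - x) ** 2

print(basic_grade(55))  # -> 90, matching the example in the text
```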
4.2. Extra credit
Besides the basic score above, we are going to run a leaderboard and give
extra credit to the students with the top performances on this assignment.
There are a pre-submission tournament and a post-submission tournament.
For the pre-submission tournament, any student who leads the leaderboard
for one day will get 2% extra credit. This pre-submission tournament will
last from May 27 to June 6, and we will determine the winner at the end of
each day.
For the post-submission tournament, the top-3 performers will get 50%,
25%, and 25% extra credit. We will determine the top-3 performers
immediately after the deadline (June 6, 11:59pm PST). Any submissions after
the deadline will not be counted for the post-submission tournament.
Your final score is basic_score * (1 + extra_credit). You can check the
leaderboard on Gradescope to see how others are performing, and you don’t
have to use your real name on the leaderboard.