COMP 2019 Assignment 2 – Machine Learning
Please submit your solution via LEARNONLINE. Submission instructions are given at the end of this assignment.
This assessment is due on Sunday, 10 June 2018, 11:55 PM.
This assessment is worth 20% of the total marks.
This assessment consists of 6 questions.
In this assignment you will aim to predict if it will rain on each day given weather observations from the
preceding day. You will perform a number of machine learning tasks, including training a classifier, assessing
its output, and optimising its performance. You will document your findings in a written report. Write
concise explanations; approximately one paragraph per task will be sufficient.
Download the data file for this assignment from the course website (file weather.zip). The archive contains
the data file in CSV format, and some Python code that you may use to visualise a decision tree model.
Before starting this assignment, ensure that you have a good understanding of the Python programming
language, the Jupyter Python notebook environment, and an overall understanding of machine learning
training and evaluation methods using the scikit-learn Python library (Practical 3). You will need a working
Python 3.x system with the Jupyter Notebook environment and the 'sklearn' package installed.
Documentation that you may find useful:
• Python: https://www.python.org/doc/
• Jupyter: https://jupyter-notebook.readthedocs.io/en/stable/
• Scikit-learn: http://scikit-learn.org/stable/
• Numpy: https://docs.scipy.org/doc/


Preparation
Create a Jupyter notebook and load the data. Use
import numpy as np
# dtype=int replaces np.int, which was removed in NumPy 1.24
data = np.loadtxt('weather.csv', skiprows=1, delimiter=',', dtype=int)
to load the data. Type this code into the notebook rather than copying and pasting it: documents like this one
often carry typographic quotes, which cause syntax errors. (Students familiar with the Pandas library may use
that to load and explore the data instead.)
Familiarise yourself with the data. There are 44 columns and 2716 rows. All values are binary (0/1) where 0
indicates false and 1 indicates true.
Categorical variables were encoded using “One Hot” coding, where a separate column is used to indicate the
presence or absence of each possible value of the variable. For example, the three binary-valued columns
“MinTemp_Low”, “MinTemp_Moderate”, “MinTemp_High” correspond to the three possible values “Low”,
“Moderate”, and “High” of variable “MinTemp”. A 1 in column “MinTemp_Low” means that the value of
MinTemp was “Low”; the cells for the other two values must be 0 in this case.
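The one-hot layout described above can be checked and decoded with a few lines of NumPy. The sketch below uses a small hand-made block of rows rather than the real file, and the helper name is_one_hot is hypothetical; it assumes the three columns are ordered Low, Moderate, High.

```python
import numpy as np

# Hypothetical toy rows for the three MinTemp columns (not taken from weather.csv).
# Assumed column order: MinTemp_Low, MinTemp_Moderate, MinTemp_High.
min_temp = np.array([
    [1, 0, 0],   # Low
    [0, 1, 0],   # Moderate
    [0, 0, 1],   # High
    [0, 1, 0],   # Moderate
])

def is_one_hot(block):
    """Return True if every row of a 0/1 block contains exactly one 1."""
    block = np.asarray(block)
    only_binary = np.all((block == 0) | (block == 1))
    one_per_row = np.all(block.sum(axis=1) == 1)
    return bool(only_binary and one_per_row)

# Recover the category label from the position of the 1 in each row.
levels = np.array(["Low", "Moderate", "High"])
decoded = levels[min_temp.argmax(axis=1)]

print(is_one_hot(min_temp))   # True
print(decoded.tolist())       # ['Low', 'Moderate', 'High', 'Moderate']
```

On the real data the same check could be applied to each group of three adjacent columns, assuming the groups appear in the order listed in the column description.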
Explore the distribution of data in each column.
The last column contains the prediction target (RainTomorrow).
The meaning of the columns is as follows:
• MinTemp_{Low,Moderate,High}: 1 if the minimum temperature on the day was low/moderate/high
• MaxTemp_{Low,Moderate,High}: 1 if the maximum temperature on the day was low/moderate/high
• Evaporation_{Low,Moderate,High}: 1 if the measured evaporation on the day was low/moderate/high
• Sunshine_{Low,Moderate,High}: 1 if the aggregated periods of sunshine on the day was low/moderate/high
• WindSpeed9am_{Low,Moderate,High}: 1 if the measured wind speed at 9am on the day was low/moderate/high
• WindSpeed3pm_{Low,Moderate,High}: 1 if the measured wind speed at 3pm on the day was low/moderate/high
• Humidity9am_{Low,Moderate,High}: 1 if the humidity at 9am on the day was low/moderate/high
• Humidity3pm_{Low,Moderate,High}: 1 if the humidity at 3pm on the day was low/moderate/high
• Pressure9am_{Low,Moderate,High}: 1 if the barometric pressure at 9am on the day was low/moderate/high
• Pressure3pm_{Low,Moderate,High}: 1 if the barometric pressure at 3pm on the day was low/moderate/high
• Cloud9am_{Low,Moderate,High}: 1 if the cloud cover at 9am on the day was low/moderate/high
• Cloud3pm_{Low,Moderate,High}: 1 if the cloud cover at 3pm on the day was low/moderate/high
• Temp9am_{Low,Moderate,High}: 1 if the temperature at 9am on the day was low/moderate/high
• Temp3pm_{Low,Moderate,High}: 1 if the temperature at 3pm on the day was low/moderate/high
• RainToday: 1 if it rained on the day
• RainTomorrow: 1 if it rained on the following day. This is the target we wish to predict.
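Because every column is 0/1, the distribution of a column is summarised by the fraction of rows that are 1. The following is a minimal sketch on a synthetic array that stands in for the loaded data; the array shape and random values are assumptions for illustration only, while the convention that the last column is the target follows the description above.

```python
import numpy as np

# Synthetic stand-in for the loaded array; the real file has 44 columns.
rng = np.random.default_rng(0)
data = rng.integers(0, 2, size=(200, 5))

X = data[:, :-1]   # every column except the last: the features
y = data[:, -1]    # last column: the target (RainTomorrow in weather.csv)

# For 0/1 columns the mean equals the fraction of rows that are 1.
fraction_ones = data.mean(axis=0)

# Class balance of the target: number of 0s and number of 1s.
class_counts = np.bincount(y, minlength=2)

print(fraction_ones.round(2))
print(class_counts)
```

A strongly uneven class_counts on the real data would suggest that accuracy alone may be a misleading measure.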
Question 1: Baseline
A simple model for predicting rain tomorrow is to use today’s weather (RainToday) as an indicator of
tomorrow’s weather (RainTomorrow).
What performance can we expect from this simple model?
Choose an appropriate measure to evaluate the classifier.
Select among Accuracy, F1-measure, Precision, and Recall.
Use a confusion matrix and/or classification report to support your analysis.
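As an illustration of how such a baseline can be scored, the sketch below treats the RainToday values directly as predictions for RainTomorrow and passes them to scikit-learn's metric functions. The ten values are made up for the example; with the real array, the two columns would presumably be data[:, -2] and data[:, -1], given the column order described above.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Tiny hand-made example (hypothetical values, not from weather.csv).
rain_today    = np.array([0, 0, 1, 1, 0, 1, 0, 0, 1, 0])
rain_tomorrow = np.array([0, 1, 1, 0, 0, 1, 0, 0, 0, 0])

# The baseline "model" repeats today's observation as tomorrow's forecast.
y_pred = rain_today
y_true = rain_tomorrow

# Rows are true classes, columns are predicted classes: [[TN, FP], [FN, TP]].
cm = confusion_matrix(y_true, y_pred, labels=[0, 1])
print(cm)   # [[5 2]
            #  [1 2]]

# Precision, recall and F1 per class, plus overall accuracy.
print(classification_report(y_true, y_pred, labels=[0, 1]))
```

No training is involved here, so no train/test split is needed to evaluate this baseline.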
Question 2: Naïve Bayes
Train a Naïve Bayes classifier to predict RainTomorrow.
As all attributes are binary vectors, use the BernoulliNB classifier provided by scikit-learn.
Ensure that you follow correct training and evaluation procedures.
1. Assess how well the classifier performs on the prediction task.
2. What performance can we expect from the trained model if we used next month’s data as input?
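One common way to follow a sound training and evaluation procedure is to hold out a test set, cross-validate on the remaining data, and only then score once on the held-out part. The sketch below shows that pattern with BernoulliNB on synthetic binary data; the data, the 70/30 split, the 5 folds, and the choice of F1 as the cross-validation score are all assumptions made for illustration, not requirements of the assignment.

```python
import numpy as np
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.naive_bayes import BernoulliNB

# Synthetic binary data standing in for weather.csv (hypothetical).
rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(400, 6))
# Target follows the first feature, with about 10% of labels flipped as noise.
y = X[:, 0] ^ (rng.random(400) < 0.1)

# Hold out a test set that is never used for fitting or model selection.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

clf = BernoulliNB()

# Cross-validation on the training portion estimates performance
# without touching the held-out test set.
cv_scores = cross_val_score(clf, X_train, y_train, cv=5, scoring="f1")

# Fit on all training data, then score once on the held-out set.
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)

print("mean CV F1:", cv_scores.mean())
print("held-out accuracy:", test_accuracy)
```

The held-out score is the figure that speaks to performance on unseen data, provided the future data resembles the data used here.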
Question 3: Decision Tree
Train a DecisionTreeClassifier to predict RainTomorrow. Use argument class_weight='balanced' when
constructing the classifier, as the target variable RainTomorrow is not equally distributed in the data set.
Ensure that you follow correct training and evaluation procedures.
1. Assess how well the classifier performs on the prediction task.
2. What performance can we expect from the model on new data?
If you wish to visualise the decision tree you can use the function print_dt from dtutils.py, provided in the
Assignment 2 zip archive:
import dtutils
dtutils.print_dt(tree, feature_names=flabels)
where tree refers to the trained decision tree model, and flabels is a list of feature names (columns) in the
data.
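A minimal sketch of training a class-weighted decision tree is shown here on synthetic, imbalanced data. Since dtutils.py is not reproduced in this document, the sketch prints the tree with scikit-learn's own export_text function instead of print_dt; the data and the feature names in flabels are hypothetical.

```python
import numpy as np
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

# Synthetic binary data (hypothetical, not from weather.csv).
rng = np.random.default_rng(1)
X = rng.integers(0, 2, size=(500, 4))
# Imbalanced target: positive only when the first two features are both 1
# (roughly a quarter of the rows).
y = X[:, 0] & X[:, 1]
flabels = ["f0", "f1", "f2", "f3"]   # hypothetical feature names

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# class_weight="balanced" reweights samples inversely to class frequency.
tree = DecisionTreeClassifier(class_weight="balanced", random_state=0)
tree.fit(X_train, y_train)

print(classification_report(y_test, tree.predict(X_test)))

# Text rendering of the learned splits (a stand-in for dtutils.print_dt).
tree_text = export_text(tree, feature_names=flabels)
print(tree_text)
```

The target in this toy example is noise-free, so the tree fits it almost perfectly; real weather data will not behave this way.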
Question 4: Diagnosis
Does the Decision Tree model suffer from overfitting or underfitting? Justify why/why not.
If the model exhibits overfitting or underfitting, revise your training procedure to remedy the problem, and
re-evaluate the improved model. The DecisionTreeClassifier has a number of parameters that you can
consider for tuning the model:
• max_depth: maximum depth of the tree
• min_samples_leaf: minimum number of samples in each leaf node
• max_leaf_nodes: maximum number of leaf nodes
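One way to diagnose overfitting is to compare the training score with the held-out score, and one way to tune the three parameters above is a cross-validated grid search restricted to the training data. The sketch below shows both on synthetic noisy data; the data, the grid values, and the use of F1 as the selection score are assumptions chosen for illustration.

```python
import numpy as np
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary data (hypothetical, not from weather.csv).
rng = np.random.default_rng(2)
X = rng.integers(0, 2, size=(600, 10))
# Noisy target: follows the first feature, with about 20% of labels flipped.
y = X[:, 0] ^ (rng.random(600) < 0.2)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# An unconstrained tree can memorise noise: a large gap between training
# and test accuracy is a typical symptom of overfitting.
full = DecisionTreeClassifier(class_weight="balanced", random_state=0)
full.fit(X_train, y_train)
gap = full.score(X_train, y_train) - full.score(X_test, y_test)

# Search the tuning parameters with cross-validation on the training set only.
param_grid = {
    "max_depth": [1, 2, 4, 8, None],
    "min_samples_leaf": [1, 5, 20],
    "max_leaf_nodes": [None, 4, 16],
}
search = GridSearchCV(
    DecisionTreeClassifier(class_weight="balanced", random_state=0),
    param_grid,
    cv=5,
    scoring="f1",
)
search.fit(X_train, y_train)

print("train/test accuracy gap of unconstrained tree:", round(gap, 3))
print("best parameters:", search.best_params_)
# GridSearchCV.score uses the scoring given above, so this is an F1 value.
print("held-out F1 of tuned tree:", round(search.score(X_test, y_test), 3))
```

The test set is used only for the final report; selecting parameters on it would make the reported score optimistic.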
Question 5: Recommendation
Which of the models you trained should be selected for the prediction task? Assume that all errors made are
equally severe. That is, predicting rain if there is actually no rain is just as bad as predicting no rain if it
actually rains.
Does your answer change if predicting rain for a day without rain is a negligible error? Justify why/why not.
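The effect of the two error-cost assumptions can be made concrete by weighting the false positives and false negatives of a confusion matrix. The sketch below is only an illustration: the helper total_cost and both confusion matrices are hypothetical, with made-up counts rather than results from any trained model.

```python
import numpy as np

def total_cost(cm, fp_cost, fn_cost):
    """Weighted error count for a 2x2 confusion matrix [[TN, FP], [FN, TP]]."""
    (tn, fp), (fn, tp) = np.asarray(cm)
    return fp * fp_cost + fn * fn_cost

# Hypothetical confusion matrices for two models (made-up numbers).
cm_a = [[80, 5], [10, 5]]    # cautious: few false alarms, more missed rain
cm_b = [[65, 20], [3, 12]]   # eager: more false alarms, little missed rain

# Equal severity: both error types cost 1, so this is the number of errors
# (equivalent to ranking the models by accuracy).
equal_a = total_cost(cm_a, 1, 1)     # 15
equal_b = total_cost(cm_b, 1, 1)     # 23

# False alarms negligible: only missed rain (false negatives) counts
# (equivalent to ranking the models by recall on the rain class).
fn_only_a = total_cost(cm_a, 0, 1)   # 10
fn_only_b = total_cost(cm_b, 0, 1)   # 3

print(equal_a, equal_b, fn_only_a, fn_only_b)
```

In this made-up case the preferred model flips between the two scenarios, which is the kind of evidence the question asks you to weigh.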
Question 6: Report
Write a concise report showing your analysis for Questions 1-5.
Demonstrate that you have followed appropriate training and evaluation procedures, and justify your
conclusions with relevant evidence from the evaluation output.
Where there are alternatives (e.g. measures, procedures, models, conclusions), demonstrate that you have
considered all relevant alternatives and justify why the selected alternative is appropriate.
Do not include the python code in your report.

Submission Instructions
Submit a single zip archive containing the following:
• weather.ipynb: the Jupyter Notebook file.
• weather.html: the HTML version of weather.ipynb showing the notebook including all output.
Create this by selecting File>Download as>HTML after having run all cells in the Jupyter notebook.
• report.pdf: the report as specified in Question 6.

Marking Scheme
Q1: Baseline (10 marks)
• Appropriate measure selected and justified
• Correct evaluation
Q2: Naïve Bayes (20 marks)
• Correct training procedure applied
• Correct evaluation procedure applied
• Correct conclusion
Q3: Decision Tree (15 marks)
• Correct training procedure applied
• Correct evaluation procedure applied
• Correct conclusion
Q4: Diagnosis (30 marks)
• Correct diagnosis
• Correct revised training and evaluation procedure applied
Q5: Recommendation (15 marks)
• Correct recommendations
• Recommendations justified by evaluation results
Q6: Report (10 marks)
• Well-structured report
• Professional presentation
Jupyter notebook (deductions apply)
• Executes correctly when using Run All
• Copy saved in HTML format submitted
• Matches the contents of the report
