Explaining the K Nearest Neighbour (KNN) classifier and the Python program

Artificial Intelligence
Introduction
In this assignment, you will develop several classification models to classify
noisy input images into the classes square or circle, as shown in Fig. 1.
Your classification models will use the training and testing sets (available
with this assignment), which contain many image samples labelled as
square or circle. Your task is to write Python code that can be run in a
Jupyter Notebook session and that trains and validates the following
classification models:
1) K Nearest Neighbour (KNN) classifier [35 marks]. For the KNN classifier,
you can only use standard Python libraries (e.g., numpy) to implement
all aspects of the training and testing algorithms. You will need to implement
two functions: a) one to build a K-d tree from the training set (this function takes
the training samples and labels as its parameters), and b) another to test the
KNN classifier and compute the classification accuracy, where the parameters
are K and the test images and labels. Using matplotlib, plot a graph of the
evolution of classification accuracy for the training and testing sets as a function
of K, where K = 1 to 10. Clearly identify the value of K at which generalisation is
best.
2) Decision tree classifier [35 marks]. For the decision tree classifier, you can
only use standard Python libraries (e.g., numpy) to implement all
aspects of the training and testing algorithms. Essentially, you will need to
implement two functions: a) one to train the decision tree using the training
samples and labels plus a pre-pruning parameter indicating the minimum
information content below which splitting stops, and b) another to test the decision tree
and compute the classification accuracy (as with the KNN classifier, the test
function takes the test images and labels among its parameters and returns the
classification accuracy). Using matplotlib, plot a graph of the evolution of
classification accuracy for the training and testing sets as a function of the
information content, where information content = 0 to 0.5 bits. Clearly identify
the value of information content at which generalisation is best.
3) Convolutional neural network (CNN) classifier [20 marks]. For the
convolutional neural network, you are allowed to use Keras with the TensorFlow
backend, similar to the example shown in the code provided. The CNN structure
is the LeNet structure used in the lecture. Using matplotlib, plot a graph of the
evolution of accuracy for the training and testing sets as a function of the number
of epochs, where the maximum number of epochs is 20. Clearly identify the
number of epochs at which generalisation is best.

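For item 2 above, the pre-pruning parameter can be illustrated with a small sketch. It assumes that "information content" means the Shannon entropy, in bits, of the class labels reaching a node, and that a node becomes a leaf once this entropy is at or below the threshold; the assignment does not spell out either detail. The function names `entropy_bits`, `information_gain` and `should_stop` are hypothetical.

```python
import math
from collections import Counter


def entropy_bits(labels):
    """Shannon entropy, in bits, of a sequence of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    return sum(
        -(count / total) * math.log2(count / total)
        for count in Counter(labels).values()
    )


def information_gain(labels, left_labels, right_labels):
    """Entropy reduction obtained by splitting `labels` into two child nodes."""
    total = len(labels)
    remainder = (
        (len(left_labels) / total) * entropy_bits(left_labels)
        + (len(right_labels) / total) * entropy_bits(right_labels)
    )
    return entropy_bits(labels) - remainder


def should_stop(labels, min_information):
    """Assumed pre-pruning rule: stop splitting when the node is pure enough."""
    return entropy_bits(labels) <= min_information


# A node with nine squares and one circle carries a little under 0.5 bits.
node = ["square"] * 9 + ["circle"]
print(entropy_bits(node), should_stop(node, 0.5), should_stop(node, 0.25))
```

Under this reading, a threshold of 0 bits grows the tree until every leaf is pure, while larger thresholds stop earlier and give smaller trees.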
Sample code that trains and tests a multi-layer perceptron classifier and that can
run in a Jupyter Notebook session is provided, and it is expected that the
submitted code can run in a Jupyter Notebook session in a similar manner. A
held-out test set will be used to test the generalisation of the implemented
classification models, but this held-out set will only be available after the
assignment deadline; please note that this held-out set will contain samples
obtained from the same distributions used to generate the training and testing
sets.
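Each model description asks for the setting (K, information content, or number of epochs) at which generalisation is best. One simple reading, assumed here rather than stated in the assignment, is the setting with the highest testing accuracy. The helper name `best_generalisation` is hypothetical, and the numbers in the usage line are placeholders, not results.

```python
def best_generalisation(settings, test_accuracies):
    """Return (setting, accuracy) for the highest testing accuracy.

    When several settings tie, the first one in the list is returned.
    """
    best_index = max(range(len(settings)), key=lambda i: test_accuracies[i])
    return settings[best_index], test_accuracies[best_index]


# Placeholder accuracies for K = 1, 2, 3 (illustrative values only).
print(best_generalisation([1, 2, 3], [0.80, 0.90, 0.85]))
```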
You must write the program yourself in Python, and the code must be a single
file that can run in a Jupyter Notebook session (file type .ipynb). You will only
get marks for the parts that you implemented yourself. If you use a library
package or language function call for training or testing a KNN or a decision tree
classifier, then you will be limited to 50% of the available marks (noting that this
assignment is a hurdle for the course). If there is evidence that you have simply
copied code from the web, you will be awarded no marks and referred for
plagiarism.
Submission
You must submit, by the due date, two files:
1. An .ipynb file containing your code with the three classifiers and all
implementations described above.
2. A .pdf file with a short written report detailing your implementation in no more
than 1 page, and the following results:

a) The training and testing accuracies at the best generalisation operating
point for each type of classifier, using a table [5 marks]:

Classifier            Training Accuracy    Testing Accuracy
K=1 NN
…
K=10 NN
DT (IC = 0 bits)
…
DT (IC = 0.5 bits)
CNN
b) Running time of the training and testing algorithms for each type
of classifier, using a table [5 marks]:
Classifier            Training Time        Testing Time
K=1 NN
…
K=10 NN
DT (IC = 0 bits)
…
DT (IC = 0.5 bits)
CNN
c) Bonus question: How can the classification accuracy of the decision tree
classifier be improved? Please implement your idea (hint: dimensionality
reduction) [10 marks].
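The hint in the bonus question points at dimensionality reduction. Principal component analysis (PCA) is one such technique; the assignment does not name it, so treating it as the intended method is an assumption. The sketch below fits the projection on the training set only and then applies it to any set of flattened images before they are passed to the decision tree. The names `fit_pca` and `apply_pca` are hypothetical.

```python
import numpy as np


def fit_pca(train_samples, n_components):
    """Return (mean, components) for the top principal directions."""
    mean = train_samples.mean(axis=0)
    centred = train_samples - mean
    # Rows of vt are the principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return mean, vt[:n_components]


def apply_pca(samples, mean, components):
    """Project samples onto the principal directions found by fit_pca."""
    return (samples - mean) @ components.T


# Toy usage: ten 2-D points lying on one line reduce to a single coordinate.
toy = np.outer(np.arange(10.0), [1.0, 2.0])
toy_mean, toy_components = fit_pca(toy, 1)
print(apply_pca(toy, toy_mean, toy_components).shape)
```

Reusing the training mean and components for the testing set keeps both sets in the same reduced coordinate system.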
Total number of marks: 100 + 10 bonus marks
This assignment is due 1.5pm on Monday 14th May, 2018. If your submission
is late, the maximum mark you can obtain will be reduced by 25% per day (or
part thereof) past the due date or any extension you are granted.
This assignment relates to the following ACS CBOK areas: abstraction, design,
hardware and software, data and information, HCI and programming.