
CS 383 - Machine Learning 
Assignment 3 - Dimensionality Reduction 
Introduction 
In this assignment you'll work on visualizing data, reducing its dimensionality, and clustering it.
You may not use any functions from a machine learning library in your code; however, you may use
statistical functions. For example, you may NOT use functions like:
• pca
• k-nearest neighbors functions
unless explicitly told to do so. But you MAY use basic statistical functions like:
• std 
• mean 
• cov 
• eig 
Grading 
Part 1 (Theory)       10pts
Part 2 (PCA)          40pts
Part 3 (Eigenfaces)   40pts
Report                10pts
TOTAL                100pts
Table 1: Grading Rubric
DataSets
Labeled Faces in the Wild Dataset This dataset consists of images of celebrities downloaded from
the Internet in the early 2000s. We use the grayscale version from sklearn.datasets.
We will download the images in a specific way, as shown below. You will have 3,023 images, each
87x65 pixels, belonging to 62 different people.
from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt
import matplotlib.cm as cm

people = fetch_lfw_people(min_faces_per_person=20, resize=0.7)
image_shape = people.images[0].shape

fig, axes = plt.subplots(2, 5, figsize=(15, 8),
                         subplot_kw={'xticks': (), 'yticks': ()})
for target, image, ax in zip(people.target, people.images, axes.ravel()):
    ax.imshow(image, cmap=cm.gray)
    ax.set_title(people.target_names[target])
1 Theory Questions 
1. Consider the following data:
$$X = \begin{bmatrix} -2 & 1 \\ -5 & -4 \\ -3 & 1 \\ 0 & 3 \\ -8 & 11 \\ -2 & 5 \\ 1 & 0 \\ 5 & -1 \\ -1 & -3 \\ 6 & 1 \end{bmatrix}$$
(a) Find the principal components of the data (you must show the math, including how you
compute the eigenvectors and eigenvalues). Make sure you standardize the data first and
that your principal components are normalized to unit length. As for the amount of
detail needed in your work, imagine that you were working on paper with a basic calculator.
Show me whatever you would be writing on that paper. (7pts)
(b) Project the data onto the principal component corresponding to the largest eigenvalue 
found in the previous part (3pts). 
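Although the question asks for hand computation, a short NumPy sketch such as the one below can be used to check your work (the variable names are ours, and it uses the sample, n-1, standard deviation; adjust if you use the population convention):

import numpy as np

# the data matrix from Question 1
X = np.array([[-2, 1], [-5, -4], [-3, 1], [0, 3], [-8, 11],
              [-2, 5], [1, 0], [5, -1], [-1, -3], [6, 1]], dtype=float)

# standardize: zero mean, unit standard deviation per feature
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# covariance matrix of the standardized data and its eigendecomposition
C = np.cov(Z, rowvar=False)
evals, evecs = np.linalg.eigh(C)   # eigenvalues in ascending order

# (a) principal components = unit-length eigenvectors of C
pc1 = evecs[:, np.argmax(evals)]   # component with the largest eigenvalue

# (b) projection of the standardized data onto that component
proj = Z @ pc1
print(evals, pc1, proj, sep="\n")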
2 Dimensionality Reduction via PCA 
Import the data as shown above. This is the Labeled Faces in the Wild dataset.
Verify that you have the correct number of people and classes:
print("people.images.shape: {}".format(people.images.shape))
print("Number of classes: {}".format(len(people.target_names)))

people.images.shape: (3023, 87, 65)
Number of classes: 62
This dataset is skewed toward George W. Bush and Colin Powell, as you can verify here:

import numpy as np

# count how often each target appears
counts = np.bincount(people.target)
# print counts next to target names
for i, (count, name) in enumerate(zip(counts, people.target_names)):
    print("{0:25} {1:3}".format(name, count), end='   ')
    if (i + 1) % 3 == 0:
        print()
To make the data less skewed, we will only take up to 50 images of each person (otherwise, the 
feature extraction would be overwhelmed by the likelihood of George W. Bush): 
mask = np.zeros(people.target.shape, dtype=bool)  # np.bool is removed in recent NumPy
for target in np.unique(people.target):
    mask[np.where(people.target == target)[0][:50]] = 1
X_people = people.data[mask]
y_people = people.target[mask]

# scale the grayscale values to be between 0 and 1
# instead of 0 and 255 for better numeric stability
X_people = X_people / 255.
We are now going to compute how well a KNN classifier does using the raw pixels alone.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_people, y_people, stratify=y_people, random_state=0)
# build a KNeighborsClassifier using one neighbor
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print("Test set score of 1-nn: {:.2f}".format(knn.score(X_test, y_test)))
You should have an accuracy around 23% - 27%. 
Once you have your setup complete, write a script that does the following (a minimal sketch of the
1-NN classifier in step 1 appears after this list):
1. Write your own version of KNN (k=1) that uses the SSD (sum of squared differences) to
compute similarity.
2. Verify that your KNN has a similar accuracy to sklearn's version.
3. Standardize your data (zero mean, divide by the standard deviation).
4. Reduce the data to 100D using PCA.
5. Compute the KNN again with k=1 on the 100D data. Report the accuracy.
6. Compute the KNN again with k=1 on the 100D whitened data. Report the accuracy.
7. Reduce the data to 2D using PCA.
8. Graph the data for visualization.
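For step 1, such a classifier might look like the sketch below (knn_1_ssd is our illustrative name, not a required interface; note that for k=1, ranking neighbors by SSD is equivalent to ranking by Euclidean distance):

import numpy as np

def knn_1_ssd(X_train, y_train, X_test):
    # predict a label for each test row using its single nearest
    # training row under the sum of squared differences
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        ssd = ((X_train - x) ** 2).sum(axis=1)  # SSD to every training point
        preds[i] = y_train[np.argmin(ssd)]
    return preds

# step 2: this accuracy should be close to sklearn's 1-NN score
# accuracy = (knn_1_ssd(X_train, y_train, X_test) == y_test).mean()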
Recall that although you may not use any packaged ML functions like pca, you may use statistical
functions like eig or svd.
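For instance, one way to carry out steps 3 through 7 with only eig is to diagonalize the covariance matrix of the standardized data. In the sketch below, Z_train stands for your standardized training data and pca_top_k is our illustrative helper name; whitening simply divides each projected coordinate by the square root of its eigenvalue:

import numpy as np

def pca_top_k(Z, k):
    # top-k eigenvalues and unit-length eigenvectors of Z's covariance
    C = np.cov(Z, rowvar=False)
    evals, evecs = np.linalg.eigh(C)     # ascending order for symmetric C
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest
    return evals[order], evecs[:, order]

# evals100, W100 = pca_top_k(Z_train, 100)
# P100 = Z_train @ W100                  # 100D projection (steps 4-5)
# P100w = P100 / np.sqrt(evals100)       # whitened projection (step 6)
# evals2, W2 = pca_top_k(Z_train, 2)     # 2D projection to graph (steps 7-8)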
Your graph should end up looking similar to Figure 1 (although it may be rotated differently,
depending on how you ordered things).
Figure 1: 2D PCA Projection of data 
3 Eigenfaces 
Import the data as shown above. This is the Labeled Faces in the Wild dataset.
Use the X_train data from above. Let's analyze the first and second principal components.
Write a script that:
1. Imports the data as mentioned above.
2. Standardizes the data.
3. Performs PCA on the data (again, although you may not use any packaged ML functions like
pca, you may use statistical functions like eig). No need to whiten here.
4. Finds the max and min image on PC1's axis, and the max and min on PC2's axis. Plot and report
the faces; what variation do these components capture?
5. Visualizes the most important principal component as an 87x65 image (see Figure 2).
6. Reconstructs the X_train[0,:] image using the primary principal component. To best see the full
reconstruction, “unstandardize” the reconstruction by multiplying it by the original standard
deviation and adding back in the original mean.
7. Determines the number of principal components, k, necessary to encode at least 95% of the
information.
8. Reconstructs the X_train[0,:] image using the k most significant eigenvectors (found in the
previous step; see Figure 4). For the fun of it, maybe even look to see if you can perfectly
reconstruct the face if you use all the eigenvectors! Again, to best see the full reconstruction,
“unstandardize” the reconstruction by multiplying it by the original standard deviation and
adding back in the original mean. (A minimal sketch of steps 6 through 8 appears after this list.)
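As a rough guide for steps 6 through 8, the reconstruction and the choice of k could be sketched as follows. Here evals_desc and W hold the eigenvalues and eigenvectors sorted by decreasing eigenvalue, Z_train is the standardized training data, and mu and sigma are the per-pixel mean and standard deviation used to standardize it; all of these names are ours, not the assignment's:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# step 7: smallest k whose eigenvalues cover at least 95% of the total
frac = np.cumsum(evals_desc) / evals_desc.sum()
k = int(np.searchsorted(frac, 0.95) + 1)

# step 8 (step 6 is identical with k = 1): encode, then decode
z = Z_train[0, :] @ W[:, :k]   # k-dimensional encoding of the first face
recon_std = z @ W[:, :k].T     # back to standardized pixel space

# "unstandardize" so the reconstruction is viewable as an image
recon = recon_std * sigma + mu
plt.imshow(recon.reshape(87, 65), cmap=cm.gray)
plt.show()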
Your principal eigenface should end up looking similar to Figure 2.
Figure 2: Primary Principal Component
Your principal reconstruction should end up looking similar to Figure 3.
Figure 3: Reconstruction of first person
Your 95% reconstruction should end up looking similar to Figure 4.
Figure 4: Reconstruction of first person
Submission 
For your submission, upload to Blackboard a single zip file containing: 
1. A LaTeX typeset PDF containing: 
(a) Part 1: Your answers to the theory questions. 
(b) Part 2: The visualization of the PCA result and the KNN accuracies.
(c) Part 3: 
i. Visualization of the primary principal component
ii. Number of principal components needed to represent 95% of the information, k.
iii. Visualization of the reconstruction of the first person using
A. Original image
B. Single principal component
C. k principal components.
(d) Source Code - python notebook