
CS 383 - Machine Learning 
Assignment 3 - Dimensionality Reduction 
Introduction 
In this assignment you'll work on visualizing data, reducing its dimensionality, and clustering it.
You may not use any functions from a machine learning library in your code; however, you may use
statistical functions. For example, you may NOT use functions like:
• pca
• k-nearest neighbors functions
unless explicitly told to do so. But you MAY use basic statistical functions like:
• std 
• mean 
• cov 
• eig 
Grading 
Part 1 (Theory)       10pts
Part 2 (PCA)          40pts
Part 3 (Eigenfaces)   40pts
Report                10pts
TOTAL                100pts
Table 1: Grading Rubric
DataSets
Labeled Faces in the Wild Dataset This dataset consists of images of celebrities downloaded from
the Internet in the early 2000s. We use the grayscale version from sklearn.datasets.
We will download the images in a specific way, as shown below. You will have 3,023 images, each
87x65 pixels, belonging to 62 different people.
from sklearn.datasets import fetch_lfw_people
import matplotlib.pyplot as plt
import matplotlib.cm as cm

people = fetch_lfw_people(min_faces_per_person=20, resize=0.7)
image_shape = people.images[0].shape

fig, axes = plt.subplots(2, 5, figsize=(15, 8),
                         subplot_kw={'xticks': (), 'yticks': ()})
for target, image, ax in zip(people.target, people.images, axes.ravel()):
    ax.imshow(image, cmap=cm.gray)
    ax.set_title(people.target_names[target])
1 Theory Questions 
1. Consider the following data:
$$X = \begin{bmatrix} -2 & 1 \\ -5 & -4 \\ -3 & 1 \\ 0 & 3 \\ -8 & 11 \\ -2 & 5 \\ 1 & 0 \\ 5 & -1 \\ -1 & -3 \\ 6 & 1 \end{bmatrix}$$
(a) Find the principal components of the data (you must show the math, including how you
compute the eigenvectors and eigenvalues). Make sure you standardize the data first and
that your principal components are normalized to unit length. As for the amount of
detail needed in your work, imagine that you were working on paper with a basic calculator.
Show me whatever you would be writing on that paper. (7pts)
(b) Project the data onto the principal component corresponding to the largest eigenvalue 
found in the previous part (3pts). 
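Although the question asks for hand computation, a short NumPy sketch such as the one below can be used to check your work (the variable names are ours, and it uses the sample, n-1, standard deviation; adjust if you use the population convention):

import numpy as np

# the data matrix from Question 1
X = np.array([[-2, 1], [-5, -4], [-3, 1], [0, 3], [-8, 11],
              [-2, 5], [1, 0], [5, -1], [-1, -3], [6, 1]], dtype=float)

# standardize: zero mean, unit standard deviation per feature
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# covariance matrix of the standardized data and its eigendecomposition
C = np.cov(Z, rowvar=False)
evals, evecs = np.linalg.eigh(C)   # eigenvalues in ascending order

# (a) principal components = unit-length eigenvectors of C
pc1 = evecs[:, np.argmax(evals)]   # component with the largest eigenvalue

# (b) projection of the standardized data onto that component
proj = Z @ pc1
print(evals, pc1, proj, sep="\n")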
2 Dimensionality Reduction via PCA 
Import the data as shown above. This is the Labeled Faces in the Wild dataset.
Verify that you have the correct number of people and classes:
print("people.images.shape: {}".format(people.images.shape))
print("Number of classes: {}".format(len(people.target_names)))

people.images.shape: (3023, 87, 65)
Number of classes: 62
This dataset is skewed toward George W. Bush and Colin Powell, as you can verify here:

import numpy as np

# count how often each target appears
counts = np.bincount(people.target)
# print counts next to target names
for i, (count, name) in enumerate(zip(counts, people.target_names)):
    print("{0:25} {1:3}".format(name, count), end='   ')
    if (i + 1) % 3 == 0:
        print()
To make the data less skewed, we will only take up to 50 images of each person (otherwise, the 
feature extraction would be overwhelmed by the likelihood of George W. Bush): 
mask = np.zeros(people.target.shape, dtype=bool)  # np.bool is removed in recent NumPy
for target in np.unique(people.target):
    mask[np.where(people.target == target)[0][:50]] = 1
X_people = people.data[mask]
y_people = people.target[mask]

# scale the grayscale values to be between 0 and 1
# instead of 0 and 255 for better numeric stability
X_people = X_people / 255.
We are now going to compute how well a KNN classifier does using the raw pixels alone.

from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import train_test_split

# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(
    X_people, y_people, stratify=y_people, random_state=0)
# build a KNeighborsClassifier using one neighbor
knn = KNeighborsClassifier(n_neighbors=1)
knn.fit(X_train, y_train)
print("Test set score of 1-nn: {:.2f}".format(knn.score(X_test, y_test)))
You should have an accuracy around 23% - 27%. 
Once you have your setup complete, write a script that does the following (a minimal sketch of the
1-NN classifier in step 1 appears after this list):
1. Write your own version of KNN (k=1) that uses the SSD (sum of squared differences) to
compute similarity.
2. Verify that your KNN has a similar accuracy to sklearn's version.
3. Standardize your data (zero mean, divide by the standard deviation).
4. Reduce the data to 100D using PCA.
5. Compute the KNN again with k=1 on the 100D data. Report the accuracy.
6. Compute the KNN again with k=1 on the 100D whitened data. Report the accuracy.
7. Reduce the data to 2D using PCA.
8. Graph the data for visualization.
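For step 1, such a classifier might look like the sketch below (knn_1_ssd is our illustrative name, not a required interface; note that for k=1, ranking neighbors by SSD is equivalent to ranking by Euclidean distance):

import numpy as np

def knn_1_ssd(X_train, y_train, X_test):
    # predict a label for each test row using its single nearest
    # training row under the sum of squared differences
    preds = np.empty(len(X_test), dtype=y_train.dtype)
    for i, x in enumerate(X_test):
        ssd = ((X_train - x) ** 2).sum(axis=1)  # SSD to every training point
        preds[i] = y_train[np.argmin(ssd)]
    return preds

# step 2: this accuracy should be close to sklearn's 1-NN score
# accuracy = (knn_1_ssd(X_train, y_train, X_test) == y_test).mean()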
Recall that although you may not use any packaged ML functions like pca, you may use statistical
functions like eig or svd.
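For instance, one way to carry out steps 3 through 7 with only eig is to diagonalize the covariance matrix of the standardized data. In the sketch below, Z_train stands for your standardized training data and pca_top_k is our illustrative helper name; whitening simply divides each projected coordinate by the square root of its eigenvalue:

import numpy as np

def pca_top_k(Z, k):
    # top-k eigenvalues and unit-length eigenvectors of Z's covariance
    C = np.cov(Z, rowvar=False)
    evals, evecs = np.linalg.eigh(C)     # ascending order for symmetric C
    order = np.argsort(evals)[::-1][:k]  # indices of the k largest
    return evals[order], evecs[:, order]

# evals100, W100 = pca_top_k(Z_train, 100)
# P100 = Z_train @ W100                  # 100D projection (steps 4-5)
# P100w = P100 / np.sqrt(evals100)       # whitened projection (step 6)
# evals2, W2 = pca_top_k(Z_train, 2)     # 2D projection to graph (steps 7-8)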
Your graph should end up looking similar to Figure 1 (although it may be rotated differently,
depending on how you ordered things).
Figure 1: 2D PCA Projection of data 
3 Eigenfaces 
Import the data as shown above. This is the Labeled Faces in the Wild dataset.
Use the X_train data from above. Let's analyze the first and second principal components.
Write a script that:
1. Imports the data as mentioned above.
2. Standardizes the data.
3. Performs PCA on the data (again, although you may not use any packaged ML functions like
pca, you may use statistical functions like eig). No need to whiten here.
4. Finds the max and min image on PC1's axis, and the max and min on PC2's axis. Plot and report
the faces; what variation do these components capture?
5. Visualizes the most important principal component as an 87x65 image (see Figure 2).
6. Reconstructs the X_train[0,:] image using the primary principal component. To best see the full
reconstruction, “unstandardize” the reconstruction by multiplying it by the original standard
deviation and adding back in the original mean.
7. Determines the number of principal components, k, necessary to encode at least 95% of the
information.
8. Reconstructs the X_train[0,:] image using the k most significant eigenvectors (found in the
previous step; see Figure 4). For the fun of it, maybe even look to see if you can perfectly
reconstruct the face if you use all the eigenvectors! Again, to best see the full reconstruction,
“unstandardize” the reconstruction by multiplying it by the original standard deviation and
adding back in the original mean. (A minimal sketch of steps 6 through 8 appears after this list.)
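As a rough guide for steps 6 through 8, the reconstruction and the choice of k could be sketched as follows. Here evals_desc and W hold the eigenvalues and eigenvectors sorted by decreasing eigenvalue, Z_train is the standardized training data, and mu and sigma are the per-pixel mean and standard deviation used to standardize it; all of these names are ours, not the assignment's:

import numpy as np
import matplotlib.pyplot as plt
import matplotlib.cm as cm

# step 7: smallest k whose eigenvalues cover at least 95% of the total
frac = np.cumsum(evals_desc) / evals_desc.sum()
k = int(np.searchsorted(frac, 0.95) + 1)

# step 8 (step 6 is identical with k = 1): encode, then decode
z = Z_train[0, :] @ W[:, :k]   # k-dimensional encoding of the first face
recon_std = z @ W[:, :k].T     # back to standardized pixel space

# "unstandardize" so the reconstruction is viewable as an image
recon = recon_std * sigma + mu
plt.imshow(recon.reshape(87, 65), cmap=cm.gray)
plt.show()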
Your principal eigenface should end up looking similar to Figure 2.
Figure 2: Primary Principal Component
Your principal reconstruction should end up looking similar to Figure 3.
Figure 3: Reconstruction of first person
Your 95% reconstruction should end up looking similar to Figure 4.
Figure 4: Reconstruction of first person
Submission 
For your submission, upload to Blackboard a single zip file containing: 
1. A LaTeX typeset PDF containing: 
(a) Part 1: Your answers to the theory questions. 
(b) Part 2: The visualization of the PCA result and the KNN accuracies.
(c) Part 3: 
i. Visualization of the primary principal component
ii. Number of principal components needed to represent 95% of the information, k.
iii. Visualization of the reconstruction of the first person using
A. Original image
B. Single principal component
C. k principal components.
(d) Source Code - python notebook