Pennsylvania State University

College of Information Sciences and Technology

DS 310 Midterm Exam

Instructions

1. There are 4 problems each of which is worth 25 points.

2. Please consult the instructor if you have difficulty understanding any of the problems.

1. (25 pts.) Consider an M-class pattern classification problem in which each pattern

X to be classified belongs to exactly one of M mutually exclusive classes ω1 ··· ωM.

Suppose that X is represented using a vector of N binary features X = [x1, x2 ··· xN ].

Let pji = P(xi = 1|ωj ). Assume that the features are independent given the class

label. Let P(ω1)··· P(ωM) be the prior probabilities of each class.

(a) Describe in words the meaning of pji.

(b) Show that a classifier that assigns X to class ωk if gk(X) ≥ gj(X) ∀ j ≠ k, where

gj(X) can be written in the form Σ_{i=1}^{N} wji xi + wj0, is a minimum error (Bayes

optimal) classifier. Express wji in terms of pji and P(ωj), where 1 ≤ j ≤ M and

0 ≤ i ≤ N.

(c) Generalize the solution for (b) to obtain minimum loss classification

where λij is the loss incurred in assigning a pattern to class ωi when it in fact

belongs to class ωj .
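
The linear form in part (b) can be checked numerically. The sketch below assumes the usual Bernoulli (naive Bayes) model implied by the problem statement, with 0 < pji < 1 so that all logarithms are finite, and takes gj(X) to be log P(X|ωj) + log P(ωj). The function names and the example probabilities are made up for illustration; they are not part of the exam.

```python
import math
from itertools import product

def linear_weights(p, priors):
    """Build (W, w0) so that g_j(x) = sum_i W[j][i] * x[i] + w0[j].

    p[j][i] plays the role of pji = P(xi = 1 | class j); priors[j] is P(class j).
    Assumes 0 < p[j][i] < 1 so the logarithms are finite.
    """
    W = [[math.log(q / (1.0 - q)) for q in row] for row in p]
    w0 = [sum(math.log(1.0 - q) for q in row) + math.log(pr)
          for row, pr in zip(p, priors)]
    return W, w0

def linear_score(x, W_j, w0_j):
    """Evaluate the linear discriminant for one class."""
    return sum(w * xi for w, xi in zip(W_j, x)) + w0_j

def log_joint(x, row, prior):
    """log P(x | class) + log P(class), computed directly from the Bernoulli model."""
    total = math.log(prior)
    for xi, q in zip(x, row):
        total += math.log(q) if xi == 1 else math.log(1.0 - q)
    return total

def classify(x, W, w0):
    """Return the index of the class with the largest discriminant value."""
    scores = [linear_score(x, Wj, bj) for Wj, bj in zip(W, w0)]
    return max(range(len(scores)), key=scores.__getitem__)

# Hypothetical numbers for M = 2 classes and N = 3 binary features.
P_EXAMPLE = [[0.9, 0.2, 0.5], [0.3, 0.7, 0.5]]
PRIORS_EXAMPLE = [0.4, 0.6]

W, w0 = linear_weights(P_EXAMPLE, PRIORS_EXAMPLE)
for x in product([0, 1], repeat=3):
    for j in range(2):
        direct = log_joint(x, P_EXAMPLE[j], PRIORS_EXAMPLE[j])
        # The linear form and the direct log-posterior (up to a shared constant) agree.
        assert math.isclose(linear_score(x, W[j], w0[j]), direct, abs_tol=1e-12)
```

Because the logarithm is monotonic and P(X) is the same for every class, picking the largest gj(X) picks the class with the largest posterior, which is what the minimum error rule requires.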

2. (25 pts.) Suppose you have been hired by an AI consulting firm. Indicate which

algorithm you would choose in each of the following data-driven knowledge acquisition

scenarios. In each case, briefly justify your recommendation.

(a) Your client has a database of patient records containing symptoms and expert

diagnosis. She would like to build a diagnosis system. The attributes can be

numeric (e.g., patient’s temperature), as well as categorical (e.g., whether the

patient is pregnant). In addition to performing accurate diagnosis of patients, your

client would like to use the system to obtain insight regarding the relationships

between attributes for different diseases.

(b) Your client is designing an email organizer to be integrated into an email program.

The SPAM detector is to be trained using information that is obtained

by observing the user’s actions on each email (read and respond, mark as junk,

ignore).

(c) Your client is a large company which has managed to gather a large database of

information about past applicants to jobs in various categories (e.g., software engineer,

data scientist, etc.) and the hiring decisions on each applicant. Experts in

the organization are convinced that they can automate the process of shortlisting

applicants for further consideration. You are told that it is important that the

decision-making process be transparent – that is, it should be easy to understand

why an applicant was shortlisted (or not).

(d) Your client is a large hospital that is interested in reducing the workload on

radiologists working on breast cancer diagnosis from radiological images. The

hospital has a large database of radiological images labeled with expert diagnosis.

The goal here is to automate diagnosis on the cases where the AI system can

produce high confidence results, and identify a subset of cases that the AI system

is not so confident about for further examination by expert radiologists.

3. (25 pts.) Recall that the perceptron learning algorithm that was described in class

is an additive weight update algorithm - that is, we add or subtract a fraction of

the misclassified sample to the weight vector at each iteration. Consider instead,

a multiplicative weight update algorithm for an n-input neuron defined as follows:

Consider a neuron defined by two weight vectors w+ and w−. Suppose both weight

vectors are initialized with a value 1 for each of their components. Consider a training

example (xp, dp) where xp ∈ {−1, 1}^n is an input pattern and dp ∈ {−1, 1} is its

class label. Let yp, the output of the classifier, be 1 if w+ · xp > w− · xp and yp =

−1 otherwise. Suppose the weights are updated as follows:

w+_i ← w+_i β^{−(dp − yp) xip} and w−_i ← w−_i β^{(dp − yp) xip},

where xip is the i-th component of xp and 0 < β < 1 is a learning rate. Comment on the

potential advantages of such a multiplicative weight update algorithm over its additive

counterpart. Prove that this algorithm is guaranteed to converge to a pair of weight

vectors (w+⋆ , w−⋆ ) that correctly classify the training data whenever such weight vectors

exist. Hint: Show that the multiplicative weight update algorithm is an instance of

the standard perceptron algorithm operating on a suitably transformed weight space.
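
The update rule in problem 3 can be sketched in a few lines of code. This is only an illustration of the rule as stated, not a proof of convergence; the function names, the epoch cap `max_epochs`, the choice β = 0.5, and the toy data set (label equal to the first input component) are all assumptions made for the example. Note that the update multiplies each weight by a power of β, so it is additive in the logarithms of the weights.

```python
from itertools import product

def predict(w_pos, w_neg, x):
    """Return 1 if w_pos . x > w_neg . x, else -1 (the rule stated in the problem)."""
    s_pos = sum(w * xi for w, xi in zip(w_pos, x))
    s_neg = sum(w * xi for w, xi in zip(w_neg, x))
    return 1 if s_pos > s_neg else -1

def train_multiplicative(samples, beta=0.5, max_epochs=100):
    """Apply the multiplicative update until an epoch passes with no mistakes.

    samples is a list of (x, d) pairs with x in {-1, 1}^n and d in {-1, 1}.
    max_epochs is a hypothetical safety cap, not part of the problem statement.
    """
    n = len(samples[0][0])
    w_pos = [1.0] * n  # both weight vectors start at all ones
    w_neg = [1.0] * n
    for _ in range(max_epochs):
        mistakes = 0
        for x, d in samples:
            y = predict(w_pos, w_neg, x)
            if y != d:
                mistakes += 1
                for i in range(n):
                    e = (d - y) * x[i]        # equals +2 or -2 on a mistake
                    w_pos[i] *= beta ** (-e)  # w+_i <- w+_i * beta^(-(d - y) x_i)
                    w_neg[i] *= beta ** e     # w-_i <- w-_i * beta^((d - y) x_i)
        if mistakes == 0:
            break
    return w_pos, w_neg

# Toy data: all patterns in {-1, 1}^3, labelled by their first component.
SAMPLES = [(x, x[0]) for x in product([-1, 1], repeat=3)]
w_pos, w_neg = train_multiplicative(SAMPLES)
assert all(predict(w_pos, w_neg, x) == d for x, d in SAMPLES)
```

Correctly classified examples leave the weights unchanged because dp − yp = 0 gives a factor of β^0 = 1, and every weight stays strictly positive throughout training.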

4. (a) (12.5 pts.)

Suppose you have designed (or trained) an n-input perceptron with the weight

vector W and threshold θ to correctly classify a set S of n-dimensional patterns.

Further assume that all the weight values of the perceptron so designed have

been hard-wired but the threshold θ can be set under user control. When the

perceptron is later used on a factory floor, suppose you find that the source of

the patterns (say the camera or the digitizer) adds a fixed amount of noise (given

by the n-dimensional noise vector V) to each pattern. Assuming that you can

somehow measure (or can calculate) V, how would you change θ so that the

perceptron with the same weight vector W continues to correctly classify all the

patterns in S?

(b) (12.5 pts) Briefly explain the significance of the following properties of the error

functions used to derive gradient-descent based learning algorithms of the type

considered in part (a).

i. Continuity and Differentiability (with respect to the weights or other parameters

of interest)

ii. Convexity with respect to weights or other parameters of interest
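
For part (a) of problem 4, the effect of a fixed noise vector can be explored with a small sketch. It assumes the convention that the perceptron outputs 1 when W · X ≥ θ and −1 otherwise (the exam does not state the convention), and it uses the identity W · (X + V) = W · X + W · V. The function names and all numbers are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length sequences."""
    return sum(ai * bi for ai, bi in zip(a, b))

def output(W, theta, x):
    """Perceptron output under the assumed convention: 1 if W . x >= theta, else -1."""
    return 1 if dot(W, x) >= theta else -1

def adjusted_threshold(W, theta, V):
    """Shift the threshold by W . V to compensate for the additive noise V."""
    return theta + dot(W, V)

# Hypothetical hard-wired weights, original threshold, noise vector, and patterns.
W_EXAMPLE = [2, -1, 3]
THETA_EXAMPLE = 1
V_EXAMPLE = [1, 4, -2]
S_EXAMPLE = [[0, 0, 0], [1, 0, 0], [0, 2, 1], [1, 1, 1], [-1, 3, 0]]

theta_new = adjusted_threshold(W_EXAMPLE, THETA_EXAMPLE, V_EXAMPLE)
for x in S_EXAMPLE:
    noisy = [xi + vi for xi, vi in zip(x, V_EXAMPLE)]
    # The decision on the noisy pattern with the shifted threshold matches the original.
    assert output(W_EXAMPLE, theta_new, noisy) == output(W_EXAMPLE, THETA_EXAMPLE, x)
```

Integer values are used so that the comparison against the threshold is exact; with floating-point inputs, patterns lying exactly on the decision boundary could be affected by rounding.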
