
Contents

I Membership Inference [10 points]

II Mitigating Bias with Adversarial Learning [10 points]

II-1 Demographic Parity [5 points]

II-2 Equality of Opportunity [5 points]

III Mitigating Bias in Word Embeddings [10 points]

III-1 Debiasing word embeddings

Setup

I Membership Inference [10 points]

In this part of the assignment, you will implement the black-box shadow-model membership inference attack of Shokri et al. [3]¹, also described in Nasr et al. [2], the survey we read for class. Membership inference is the scenario in which an adversary seeks to determine whether a given instance was used to train a given model. We will consider the "black-box" variant of membership inference: the adversary does not have direct access to the model and has only query access, meaning they can ask the model to make predictions on chosen instances and observe the outcome. More difficult variants of this scenario give the adversary only the predicted class for their chosen queries, but we will work in the easier case where the adversary is given the output probability vector. We will see that, despite not having direct access, the attacker can still succeed. White-box vs. black-box is a common dichotomy in security, where it is generally recognized that hiding systems or models from adversaries (that is, forcing black-box scenarios) is typically not an effective form of defense. This holds more widely, as exemplified by the concept of "security through obscurity": the usually ineffective approach of securing systems by hiding their internal operation.

Black-box Membership Inference  Let D be a dataset of instances (x, y), partitioned into subsets T_0 and T_1. Let f be a model trained on T_1 whose outputs ŷ = f(x) are probability distributions over a set of classes. An adversary is a procedure that, given an instance x from either T_0 or T_1, outputs either 0 or 1 as its guess of which subset the instance comes from (that is, whether it is a training instance). The adversary also has:

• Query access to f's probability vectors. In other words, given any chosen instance x′, the attacker can obtain ŷ′ = f(x′). They are not limited in how many queries they make (though the number of queries is typically a point of comparison for black-box attacks).

• A shadow subset S ⊆ D independent of the training set: S ∩ T_1 = ∅².

• Knowledge of the format of the inputs and outputs of the targeted model, including their number and the range of values they can take.

• Knowledge of the type and architecture of the machine learning model, as well as the training algorithm³.

¹ https://arxiv.org/abs/1610.05820

² The attack will work better if S happens to have some overlap with T_1.

Let b be a fair coin flip in {0, 1} and let x be sampled uniformly from T_b. The probability that the attacker outputs b measures the adversary's success. Writing A_{f,S} for the adversary with the access described above, we define:

$$\mathrm{success}(A_{f,S}) \overset{\mathrm{def}}{=} \Pr_{b}\left[ A_{f,S}(x) = b \mid x \in T_b \right]$$
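As an informal illustration of this definition, the sketch below estimates the success probability by simulation: flip a fair coin b, sample x uniformly from T_b, and count how often the adversary's guess matches b. The function name `estimate_success` and the toy adversaries are hypothetical and are not part of the starter code.

```python
import random

def estimate_success(adversary, t0, t1, trials=10_000, seed=0):
    """Monte Carlo estimate of Pr_b[adversary(x) = b | x in T_b]."""
    rng = random.Random(seed)
    subsets = (t0, t1)
    hits = 0
    for _ in range(trials):
        b = rng.randint(0, 1)        # fair coin flip in {0, 1}
        x = rng.choice(subsets[b])   # uniform sample from T_b
        hits += int(adversary(x) == b)
    return hits / trials

# Toy data: T_0 holds 0..49 (non-members), T_1 holds 50..99 (members).
t0, t1 = list(range(0, 50)), list(range(50, 100))

def always_member(x):
    """Baseline adversary that always guesses 'training instance'."""
    return 1

def perfect(x):
    """Adversary that happens to know the toy split exactly."""
    return int(x >= 50)
```

A guess-everything-is-a-member adversary lands near 0.5 under this measure, which is why 0.5 is the natural baseline to compare an attack against.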

Shadow Model Attack  In the shadow model attack, the attacker uses the shadow dataset S to create a predictor for the question "was x used to train this model?". The process has two main steps: 1) train predictors on known splits of S and collect their predictions, on instances inside and outside each predictor's training split, into a synthesized dataset for the membership inference task; 2) train a model over the synthesized dataset. We elaborate below.

1. Repeat the following process a number of times:

• Split S into two disjoint subsets S_in and S_out and train a shadow model g using only S_in. We will use this model to learn how a model's outputs differ on training instances vs. non-training instances.

• Synthesize two datasets A_in and A_out. The features in A_in are the ground-truth label y and the g-predicted class distribution for each instance (x, y) ∈ S_in, while A_out has the same for each instance of S_out. The target in both is an indicator of whether the given instance is from S_in (indicated by 1) or from S_out (indicated by 0).

We now have (y, ŷ, 1) ∼ A_in where ŷ = g(x) for (x, y) ∈ S_in, and (y, ŷ, 0) ∼ A_out where ŷ = g(x) for (x, y) ∈ S_out.

2. Combine all of the produced A_in and A_out sets into (y, ŷ, b) ∼ A.

3. Train an attack model m : (y, ŷ) ↦ b using A to predict training set membership b.

Now, given an instance (x, y), we can use m to predict b = m(y, g(x)), telling us whether (x, y) ∈ S_in, the training set of the shadow model g. Interestingly, we can also compute b = m(y, f(x)) to determine whether (x, y) ∈ T_1, the training set of the model we are attacking, f!
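The data-synthesis steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the expected solution: the shadow model is a stand-in nearest-centroid classifier rather than the network used in the homework, the split is a plain random half/half split rather than the provided DataSplit class, and all function names are hypothetical.

```python
import numpy as np

def train_centroid_model(x, y, num_classes):
    """Stand-in shadow model g: softmax over negative distances to class centroids."""
    centroids = np.stack([x[y == c].mean(axis=0) for c in range(num_classes)])

    def predict(q):
        dist = np.linalg.norm(q[:, None, :] - centroids[None, :, :], axis=2)
        logits = -dist
        e = np.exp(logits - logits.max(axis=1, keepdims=True))
        return e / e.sum(axis=1, keepdims=True)

    return predict

def synthesize_attack_rows(x, y, num_classes, num_shadow_models, seed=0):
    """Return (labels, predicted distributions, membership bits) pooled over all splits."""
    rng = np.random.default_rng(seed)
    rows_y, rows_pred, rows_b = [], [], []
    for _ in range(num_shadow_models):
        perm = rng.permutation(len(x))
        in_idx, out_idx = perm[: len(x) // 2], perm[len(x) // 2 :]
        g = train_centroid_model(x[in_idx], y[in_idx], num_classes)  # trained on S_in only
        for idx, b in ((in_idx, 1), (out_idx, 0)):                   # A_in rows, then A_out rows
            rows_y.append(y[idx])
            rows_pred.append(g(x[idx]))
            rows_b.append(np.full(len(idx), b))
    return np.concatenate(rows_y), np.concatenate(rows_pred), np.concatenate(rows_b)

# Toy shadow dataset S: 200 instances, 5 features, 3 classes.
rng = np.random.default_rng(1)
x = rng.normal(size=(200, 5))
y = np.arange(200) % 3
attack_y, attack_pred, attack_b = synthesize_attack_rows(x, y, 3, num_shadow_models=4)
```

Each shadow model contributes one row per instance of S, so four shadow models over 200 instances yield 800 attack rows, half labelled 1 (in) and half labelled 0 (out).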

For the implementation, we additionally ask you to create not one attack model m but rather a family of models m_y, where m_y is specialized to instances of class y only. Thus, to make a membership guess for an instance (x, y), we look up the prediction b = m_y(y, f(x)).

Implementation  For the attack models m_y, you can use the following architecture, though we encourage you to experiment:

• Let C be the number of classes of the target model (C can be obtained via shadow_labels.max() + 1). The input of m has shape (None, 2C) (2C because m takes both the predicted distribution over labels and the one-hot true label).

• One hidden layer of shape (None, 4C) with a ReLU activation.

• Output of shape (None, 1) with a sigmoid activation.

• Binary crossentropy for the loss function.

³ This is not required in the original paper but is assumed in this homework for simplicity.
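To make the shapes concrete, here is a NumPy-only forward pass through an attack model with the layer sizes listed above, together with the binary crossentropy loss. It is a sketch for checking dimensions, with random untrained weights. The function names and the feature order (predicted distribution first, one-hot label second) are assumptions, and your actual solution would normally build and train the model in the deep learning framework used by the starter code.

```python
import numpy as np

def init_attack_params(num_classes, rng):
    """Random weights for a 2C -> 4C -> 1 network (hypothetical helper)."""
    C = num_classes
    return {
        "W1": rng.normal(scale=0.1, size=(2 * C, 4 * C)),
        "b1": np.zeros(4 * C),
        "W2": rng.normal(scale=0.1, size=(4 * C, 1)),
        "b2": np.zeros(1),
    }

def attack_forward(params, pred_probs, labels):
    """Map predicted distributions (N, C) and integer labels (N,) to scores (N, 1)."""
    C = pred_probs.shape[1]
    one_hot = np.eye(C)[labels]
    inputs = np.concatenate([pred_probs, one_hot], axis=1)           # (N, 2C)
    hidden = np.maximum(0.0, inputs @ params["W1"] + params["b1"])   # (N, 4C), ReLU
    logits = hidden @ params["W2"] + params["b2"]                    # (N, 1)
    return 1.0 / (1.0 + np.exp(-logits))                             # sigmoid

def binary_crossentropy(b_true, b_prob, eps=1e-7):
    """Mean binary crossentropy between membership bits and predicted scores."""
    p = np.clip(b_prob.ravel(), eps, 1 - eps)
    return float(-np.mean(b_true * np.log(p) + (1 - b_true) * np.log(1 - p)))

rng = np.random.default_rng(0)
C = 10
probs = rng.dirichlet(np.ones(C), size=8)   # 8 fake probability vectors
labels = rng.integers(0, C, size=8)
membership = np.array([1, 0] * 4)
scores = attack_forward(init_attack_params(C, rng), probs, labels)
loss = binary_crossentropy(membership, scores)
```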

Coding Exercise 1 [4 points] Implement synthesize_attack_data in hw5_part1.py. This corresponds

to the first two points of the algorithm above.

Coding Exercise 2 [4 points] Implement build_attack_models in hw5_part1.py. This is the last point

of the algorithm above.

Coding Exercise 3 [2 points] Implement evaluate_membership in hw5_part1.py. This method applies

the attack models to a dataset to make their membership guesses.

The starter code in hw5_part1.py includes an invocation of the exercises on a model for CIFAR. You can

use it to test your solutions.

Tips:

• The build_attack_models function takes the target model, the shadow data and labels (S), and the number of shadow models to use for the attack. When splitting the shadow data into S_in and S_out, you should use the DataSplit class (found in hw5_part1_utils.py). The constructor for DataSplit takes the labels of the dataset you would like to split and a seed (an index from 0 to num_shadow_models). The resulting object has two attributes, in_idx and out_idx, which give the lists of indices into the original data that form the "in" and "out" datasets. For example, with a DataSplit object split, S_in can be obtained via shadow_data[split.in_idx].

• The evaluate_membership function takes the attack models returned by build_attack_models, the target model's predictions on a set of points, and the true labels for the same set of points. Recall that, while each attack model takes both the predicted distribution over labels and the one-hot true label as input, there is also a separate attack model for each class.
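The per-class lookup described in the last tip can be sketched as below. This assumes each attack model is a callable mapping an (N, 2C) feature matrix to N membership scores. The function name `route_by_class`, the threshold, and the toy attack models are hypothetical, and the real evaluate_membership signature is the one given in the starter code.

```python
import numpy as np

def route_by_class(attack_models, pred_probs, labels, threshold=0.5):
    """Send each instance to the attack model of its true class and threshold the score."""
    C = pred_probs.shape[1]
    features = np.concatenate([pred_probs, np.eye(C)[labels]], axis=1)  # (N, 2C)
    guesses = np.zeros(len(labels), dtype=int)
    for c in range(C):
        mask = labels == c
        if mask.any():
            scores = np.asarray(attack_models[c](features[mask])).reshape(-1)
            guesses[mask] = (scores > threshold).astype(int)
    return guesses

# Toy attack models: score an instance by the confidence assigned to its true class.
toy_models = [lambda f, c=c: f[:, c] for c in range(2)]
pred_probs = np.array([[0.95, 0.05], [0.60, 0.40], [0.10, 0.90], [0.45, 0.55]])
labels = np.array([0, 0, 1, 1])
guesses = route_by_class(toy_models, pred_probs, labels, threshold=0.8)
```

With a threshold of 0.8, only the two instances whose true class receives high confidence (0.95 and 0.90) are guessed to be training members.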

II Mitigating Bias with Adversarial Learning [10 points]

In this part of the assignment, you will implement the GAN-like fair training routine of Zhang et al. [4] described in lecture. You will implement two variants of the training procedure:

1. A variant that aims to achieve demographic parity, which we identify with the condition

$$\Pr[\hat{Y} = 1 \mid Z = 0] = \Pr[\hat{Y} = 1 \mid Z = 1]$$

where the prediction ŷ = 1 is positive and z ∈ {0, 1} denotes one of two groups (genders, etc.).

2. A variant that aims to achieve equality of opportunity for positive ground truth, identified by the condition

$$\Pr[\hat{Y} = 1 \mid Y = 1, Z = 0] = \Pr[\hat{Y} = 1 \mid Y = 1, Z = 1]$$

where y = 1 indicates positive ground truth.

Your solution will fill in the holes of hw5_part2.py. The starter code also includes invocations based on the UCI Adult dataset⁴. We split the dataset for you into a demographic feature matrix X, a class y (1 indicates income >= 50k, the outcome we will consider positive), and a group (gender) attribute z (1 indicates male).

⁴ http://archive.ics.uci.edu/ml/datasets/Adult

The idea of the debiased training procedure is that we can mitigate bias in our classifier via a competition between an adversary and the classifier. The classifier wants to predict the correct output while also keeping the adversary from predicting the protected attribute. Meanwhile, the adversary wants to predict the protected attribute. We give the adversary different information depending on which fairness objective we want to achieve: for demographic parity, the adversary gets only the classifier's prediction, while for equality of opportunity, the adversary gets both the correct class and the classifier's prediction.

II-1 Demographic Parity [5 points]

Coding Exercise 4 [1 point]  Implement the evaluate_dem_parity method in hw5_part2.py. This method measures the demographic parity of a given model. It should return a tuple with two values: (1) the probability that the prediction for group 0 is 1, and (2) the probability that the prediction for group 1 is 1. Demographic parity is achieved if these two values are equal.
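As a small worked example of the two quantities involved, the sketch below computes the empirical positive-prediction rate per group from arrays of hard predictions and group labels. The function name and its array-based interface are assumptions. The actual evaluate_dem_parity method works on a model and should follow the signature in the starter code.

```python
import numpy as np

def positive_rates_by_group(y_pred, z):
    """Empirical Pr[Y_hat = 1 | Z = 0] and Pr[Y_hat = 1 | Z = 1]."""
    y_pred, z = np.asarray(y_pred), np.asarray(z)
    rate_group0 = float(np.mean(y_pred[z == 0] == 1))
    rate_group1 = float(np.mean(y_pred[z == 1] == 1))
    return rate_group0, rate_group1

# Eight toy predictions: the first four belong to group 0, the last four to group 1.
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
p0, p1 = positive_rates_by_group(y_pred, z)
```

Here group 0 receives a positive prediction half of the time and group 1 three quarters of the time, so this toy predictor does not satisfy demographic parity.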

Coding Exercise 5 [4 points] Implement the train_dem_parity method of the AdversarialFairModel

class in hw5_part2.py.

The train_dem_parity method returns nothing but should update self.classifier by training it according to the following procedure:

1. Create the adversary and connect it to the classifier's outputs.

2. Create operations for the loss, gradients, and parameter updates of the adversary.

3. Create operations for the loss, modified gradients, and parameter updates of the classifier.

4. For each epoch, train the adversary, then the classifier, on all batches (on epoch t, use a learning rate of 1/t and an α of √t). This results in a learning rate that decreases over epochs and a slowly increasing debiasing strength.

The adversary network should simply be a linear model with a single sigmoid output (as the protected attribute is binary).
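The "modified gradients" in step 3 can be illustrated on plain vectors. As we read Zhang et al. [4], the classifier is updated along ∇L_P − proj_{∇L_A} ∇L_P − α ∇L_A, where L_P is the prediction loss and L_A the adversary's loss, both differentiated with respect to the classifier's weights. The sketch below applies that formula to flattened gradient vectors and pairs it with the 1/t and √t schedule from step 4. Treat it as an assumption to check against the lecture notes; the function names are hypothetical.

```python
import numpy as np

def modified_classifier_gradient(grad_pred, grad_adv, alpha, eps=1e-12):
    """grad_pred - proj_{grad_adv}(grad_pred) - alpha * grad_adv, on flat vectors."""
    unit_adv = grad_adv / (np.linalg.norm(grad_adv) + eps)
    projection = np.dot(grad_pred, unit_adv) * unit_adv  # part of grad_pred that helps the adversary
    return grad_pred - projection - alpha * grad_adv

def schedule(epoch):
    """Learning rate 1/t and debiasing strength sqrt(t), with epochs counted from t = 1."""
    return 1.0 / epoch, float(np.sqrt(epoch))

grad_pred = np.array([1.0, 1.0])   # toy gradient of the prediction loss
grad_adv = np.array([0.0, 2.0])    # toy gradient of the adversary loss
learning_rate, alpha = schedule(4)
step = modified_classifier_gradient(grad_pred, grad_adv, alpha)
```

In this toy case the projection removes the second coordinate of the prediction gradient, and the α term then pushes that coordinate in the direction that increases the adversary's loss.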

II-2 Equality of Opportunity [5 points]

Coding Exercise 6 [1 point]  Implement the evaluate_eq_op method in hw5_part2.py. This computes a measure of equality of opportunity for a given model. We will focus on the positive ground truth (1). It should return a tuple with two values: (1) the probability that the prediction for group 0 is 1, given that the ground truth is 1, and (2) the probability that the prediction for group 1 is 1, given that the ground truth is 1. Equality of opportunity (for positive ground truth) is achieved if these values are equal.
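The same kind of worked example applies here, now restricted to instances whose ground truth is positive. Again, the function name and array-based interface are assumptions for illustration only.

```python
import numpy as np

def true_positive_rates_by_group(y_pred, y_true, z):
    """Empirical Pr[Y_hat = 1 | Y = 1, Z = g] for g = 0 and g = 1."""
    y_pred, y_true, z = np.asarray(y_pred), np.asarray(y_true), np.asarray(z)
    rates = []
    for group in (0, 1):
        mask = (z == group) & (y_true == 1)   # positives of this group only
        rates.append(float(np.mean(y_pred[mask] == 1)))
    return tuple(rates)

y_true = np.array([1, 1, 0, 0, 1, 1, 1, 0])
y_pred = np.array([1, 0, 0, 1, 1, 1, 0, 1])
z = np.array([0, 0, 0, 0, 1, 1, 1, 1])
tpr0, tpr1 = true_positive_rates_by_group(y_pred, y_true, z)
```

Group 0 has two positive instances, one of which is predicted positive. Group 1 has three, two of which are predicted positive, so the two rates differ.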

Coding Exercise 7 [4 points]  Implement the train_eq_op method of the AdversarialFairModel class in hw5_part2.py. The general operation of this method is as described in the exercise for demographic parity, except that the adversary receives both the correct class and the classifier's prediction.

You can test your implementation with the last part of the starter code hw5_part2.py.

III Mitigating Bias in Word Embeddings [10 points]

In this part you will implement the word-embedding debiasing technique of Bolukbasi et al. [1]. In the paper,

they refer to this technique as "hard-debiasing" or "neutralize and equalize". Please refer to Section 6

of the paper [1] for more details.


Before you get started  Install gensim if necessary (json ships with the Python standard library and needs no installation) and download the word2vec word embedding. You will then need to unzip the data and place it in the data folder:

pip install gensim

wget https://s3.amazonaws.com/dl4j-distribution/GoogleNews-vectors-negative300.bin.gz

gunzip GoogleNews-vectors-negative300.bin.gz

mv GoogleNews-vectors-negative300.bin data

We have provided the class that loads in the embedding in the starter code in hw5_part3.py. There are

four other json/txt files you need to use in your implementation, which are also loaded for you.

• definitional_pairs.json: Definitional pairs used to find the gender dimension.

• gender_specific_full.json: All words that you should not debias.

• equalize_pairs.json: Word pairs to equalize so that they are equidistant from the debiased gender-neutral words.

• questions-word.txt: An evaluation dataset to test the performance of word embeddings. Each line contains an analogy; some lines carry subcategory information, marked with a colon, that you can ignore.

III-1 Debiasing word embeddings

Coding Exercise 8 [3 points]  Complete the method identify_gender_subspace, which extracts the gender direction (1 dimension). This is done by performing PCA on the gender definitional words. You can use np.linalg.svd for PCA. No other packages (such as sklearn) are allowed in this part.
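One way to carry out PCA with np.linalg.svd is sketched below on toy vectors. Following our reading of Bolukbasi et al. [1], each definitional pair is centred on its own mean before the centred vectors are stacked, and the first right-singular vector is taken as the direction. The function name and the toy pairs are hypothetical, and the real method works on the loaded embedding.

```python
import numpy as np

def top_direction(pairs):
    """First principal component of pair-centred vectors, as a unit vector."""
    centred = []
    for a, b in pairs:
        mean = (a + b) / 2.0
        centred.extend([a - mean, b - mean])
    matrix = np.stack(centred)                       # (2 * num_pairs, dim)
    _, _, vt = np.linalg.svd(matrix, full_matrices=False)
    return vt[0]                                     # rows of vt are principal directions

# Toy "definitional pairs" that differ only along the first coordinate.
pairs = [
    (np.array([1.0, 0.2, 0.0]), np.array([-1.0, 0.2, 0.0])),
    (np.array([0.8, -0.5, 0.3]), np.array([-0.9, -0.5, 0.3])),
]
g = top_direction(pairs)
```

The sign of a principal direction is arbitrary, so any later step should depend only on projections onto the axis, not on which way the vector points.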

Coding Exercise 9 [3 points]  Complete the method neutralize, which projects all gender-neutral words (the complement of the gender-specific words) away from the gender axis.
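For a single vector, projecting away from an axis amounts to subtracting the component along that axis. The sketch below does this and then renormalizes to unit length, which is how we read the neutralize step in Bolukbasi et al. [1]. The function name `neutralize_vector` is hypothetical, and the real method operates on the whole embedding.

```python
import numpy as np

def neutralize_vector(vec, direction):
    """Remove the component of vec along direction, then rescale to unit length."""
    unit = direction / np.linalg.norm(direction)
    debiased = vec - np.dot(vec, unit) * unit
    return debiased / np.linalg.norm(debiased)

g = np.array([1.0, 0.0, 0.0])          # toy gender axis
w = np.array([0.6, 0.8, 0.0])          # toy gender-neutral word vector
w_neutral = neutralize_vector(w, g)
```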

Coding Exercise 10 [4 points]  Complete the method equalize, which makes sure both words within each equalized pair are equidistant from the gender-neutral words.
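The equalize step from Section 6 of Bolukbasi et al. [1], as we read it, keeps the part of the pair's mean that lies off the gender axis and places the two words symmetrically along the axis so that both end up with unit length. The sketch below assumes unit-norm input vectors and a one-dimensional gender direction. The function name is hypothetical, and the formula should be checked against the paper.

```python
import numpy as np

def equalize_pair(a, b, direction):
    """Return equalized copies of a and b with respect to a 1-D bias direction."""
    g = direction / np.linalg.norm(direction)
    mu = (a + b) / 2.0
    mu_b = np.dot(mu, g) * g              # component of the mean along the axis
    nu = mu - mu_b                        # component of the mean off the axis
    scale = np.sqrt(max(0.0, 1.0 - np.dot(nu, nu)))
    equalized = []
    for w in (a, b):
        w_b = np.dot(w, g) * g
        diff = w_b - mu_b                 # assumed non-zero: a and b differ along the axis
        equalized.append(nu + scale * diff / np.linalg.norm(diff))
    return equalized[0], equalized[1]

g = np.array([1.0, 0.0, 0.0])
a = np.array([0.8, 0.6, 0.0])             # unit-norm toy vectors
b = np.array([-0.6, 0.8, 0.0])
a_eq, b_eq = equalize_pair(a, b, g)
neutral = np.array([0.0, 1.0, 0.0])       # a word already neutralized
```

After equalization, both words have the same inner product with any vector orthogonal to the gender axis, which is the equidistance property the exercise asks for.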

To evaluate the bias and utility of the original and debiased embeddings, you can use compute_analogy, which computes the fourth word given three words in an analogy. It does so by finding a word (different from the three given words) that is closest, in terms of inner product, to the fourth vertex of the parallelogram whose other three vertices are the given words. The end of hw5_part3.py also includes an invocation of the requested methods that you can use to test your solution.
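The parallelogram search can be sketched on a toy vocabulary as follows. For an analogy "a is to b as c is to ?", the fourth vertex is b − a + c, and the answer is the remaining word with the largest inner product with it. The function name, argument order, and toy vectors here are assumptions. The provided compute_analogy may differ in its interface.

```python
import numpy as np

def analogy(vocab, vectors, a, b, c):
    """Return the word d completing 'a is to b as c is to d' by inner product."""
    index = {word: i for i, word in enumerate(vocab)}
    target = vectors[index[b]] - vectors[index[a]] + vectors[index[c]]  # fourth vertex
    scores = vectors @ target
    for word in (a, b, c):
        scores[index[word]] = -np.inf     # exclude the three given words
    return vocab[int(np.argmax(scores))]

# Toy embedding: coordinate 0 encodes gender, 1 royalty, 2 fruit.
vocab = ["man", "woman", "king", "queen", "apple"]
vectors = np.array([
    [1.0, 0.0, 0.0],
    [-1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [-1.0, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
answer = analogy(vocab, vectors, "man", "woman", "king")
```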

References

[1] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. In Advances in Neural Information Processing Systems, pages 4349–4357, 2016. URL https://arxiv.org/abs/1607.06520.

[2] Milad Nasr, Reza Shokri, and Amir Houmansadr. Comprehensive privacy analysis of deep learning: Stand-alone and federated learning under passive and active white-box inference attacks. arXiv preprint arXiv:1812.00910, 2018. URL https://arxiv.org/pdf/1812.00910.

[3] Reza Shokri, Marco Stronati, Congzheng Song, and Vitaly Shmatikov. Membership inference attacks against machine learning models. In 2017 IEEE Symposium on Security and Privacy (SP), pages 3–18. IEEE, 2017. URL https://arxiv.org/abs/1610.05820.

[4] Brian Hu Zhang, Blake Lemoine, and Margaret Mitchell. Mitigating unwanted biases with adversarial learning. CoRR, abs/1801.07593, 2018. URL http://arxiv.org/abs/1801.07593.
