ECE 285 – MLIP – Assignment 4
Image Denoising with Deep CNNs
Written by Charles Deledalle and Anurag Paul on May 21, 2019.
In Assignments 1, 2, 3, we were focusing on classification. In Assignment 3, we introduced you to the
data loading mechanisms of the package torch.utils.data and learning mechanisms of our homemade
package nntools which we will continue to use here. In this assignment, we are going to consider a
specific regression problem: image denoising. We will be using deep Convolutional Neural Networks
(CNNs) with PyTorch, investigate DnCNN and U-net architectures.
We will be using images from the “Berkeley Segmentation Dataset and Benchmark” that are downloadable here https://www2.eecs.berkeley.edu/Research/Projects/CS/vision/bsds/, and located
on DSMLP here /datasets/ee285s-public/bsds/. This directory contains two sub-directories: train
and test, which consist of 200 and 100 images, respectively, of either size 321 × 481 or 481 × 321. While
we saw that thousand to millions of images were required for image classification, we can use a much
smaller training set for image denoising. This is because denoising each pixel of an image can be seen
as one regression problem. Hence, our training is in fact composed of 200×321×481 ≈ 31 million samples.
It is possible that DSMLP will be too busy to let you run everything from this assignment. We know
what the results should look like, so we will only grade the code.
1 Getting started
First of all, connect to DSMLP (ieng6 server) and start a pod with enabled GPU/CUDA capabilities
✩ launch-py3torch-gpu.sh
Next connect to your Jupyter Notebook from your web browser (refer to Assignment 0 for more details).
Create a new notebook assignment4.ipynb and import
%matplotlib notebook
import os
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.utils.data as td
import torchvision as tv
from PIL import Image
import matplotlib.pyplot as plt
import nntools as nt
Select the relevant device
device = ✬cuda✬ if torch.cuda.is_available() else ✬cpu✬
print(device)
1
For the following questions, please write your code and answers directly in your notebook. Organize
your notebook with headings, markdown and code cells (following the numbering of the questions).
2 Creating noisy images of BSDS dataset with DataSet
Our goal is to use deep convolutional neural networks to learn the mapping xi → yi where xi are noisy
images (our data/observations) and yi are clean images (our labels/ground-truth)1
. We will consider the
images of the BSDS dataset as our clean/ground-truth images: yi
. For each of them, we will generate
noisy versions by adding white Gaussian noise: xi = yi + wi where wi
is an image where each pixel is
an independent realization of a zero-mean Gaussian distribution with standard deviation σ = 30. Since
images have different sizes, we will consider random crops of size of 180 × 180.
1. Create a variable dataset root dir and make it point to the BSDS dataset directory.
2. As we built a dataset class in Assignment 3, we will do that here again to create our
NoisyBSDSDataset. Interpret and complete the following piece of code:
class NoisyBSDSDataset(td.Dataset):
def __init__(self, root_dir, mode=✬train✬, image_size=(180, 180), sigma=30):
super(NoisyBSDSDataset, self).__init__()
self.mode = mode
self.image_size = image_size
self.sigma = sigma
self.images_dir = os.path.join(root_dir, mode)
self.files = os.listdir(self.images_dir)
def __len__(self):
return len(self.files)
def __repr__(self):
return "NoisyBSDSDataset(mode={}, image_size={}, sigma={})". \
format(self.mode, self.image_size, self.sigma)
def __getitem__(self, idx):
img_path = os.path.join(self.images_dir, self.files[idx])
clean = Image.open(img_path).convert(✬RGB✬)
i = np.random.randint(clean.size[0] - self.image_size[0])
j = np.random.randint(clean.size[1] - self.image_size[1])
# COMPLETE
noisy = clean + 2 / 255 * self.sigma * torch.randn(clean.shape)
return noisy, clean
the method getitem should return two torch tensors of shape (3, n1, n2) where (n1, n2) =
image size and normalized to the range [[1, 1].
3. Create train set and test set as instances of NoisyBSDSDataset for mode=’train’ and
mode=’test’ respectively. Consider images of size 180 × 180 for training and 320 × 320 for testing.
1Note that in regression literature, notation y is often used for noisy data and x for clean data, but we will use the machine
learning notations where x is the observed data and y the prediction.
2
Sample the element with index 12 in the testing set. Store it in a variable x. Use myimshow function
from Assignment 3 to display next to each other the clean and noisy version of this sample.
Note that unlike the networks we investigate for classification, here, the networks will be fully convolutional. The architecture is then independent of the spatial dimension of the input. That is why
we can use different spatial dimensions for training and validation.
3 DnCNN
In this assignment, we are going to investigate three different CNN architectures with variable depth. To
make our code more versatile, easy to read and save precious coding time, all these architectures are going
to inherit from the abstract class NeuralNetwork of the nntools package. The first architecture we will
be considering is called DnCNN and can be described as:
Similar to VGG, it uses only 3 × 3 convolutions. Similar to ResNet, it uses a skip connection between the
first and the last layer.
4. Similar to NNClassifier from Assignment 3, create a subclass NNRegressor that inherits from
NeuralNetwork and implements the method criterion as being the MSE loss.
5. Copy, interpret and complete the following code implementing DnCNN:
class DnCNN(NNRegressor):
def __init__(self, D, C=64):
super(DnCNN, self).__init__()
self.D = D
self.conv = nn.ModuleList()
self.conv.append(nn.Conv2d(3, C, 3, padding=?))
# COMPLETE
self.bn = nn.ModuleList()
for k in range(D):
self.bn.append(nn.BatchNorm2d(C, C))
# TO BE COMPLETED LATER IN QUESTION 7
def forward(self, x):
D = self.D
h = F.relu(self.conv[0](x))
3
# COMPLETE
y = self.conv[D+1](h) + x
return y
Refer to PyTorch’s documentation for detailed explanations about nn.ModuleList and
nn.BatchNorm2d. Note that in order to preserve the spatial feature dimensions between each successive layer of the network, we will have to use zero-padding by a suitable number of pixels that
you have to determine.
6. Sample the last image of the training dataset, and store its noisy version in a variable x of shape
(1, 3, 180, 180). Create instances of DnCNN for D = 0, 1, 2, 4 and 8. For each of them feed-forward
x, store the result in y, and display next to each other x[0], y[0] and x[0]-y[0]. What do you
observe? What will be the implication on back-propagation?
7. Modify DnCNN constructor (the init method) in order to apply He’s initialization on the weights
of convolution layers (stored in self.conv[k].weight.data) used before any ReLU activation function. As Batch normalization is used between convolutions and ReLU, you will also need to initialize
the weights of the Batch normalization layers as
nn.init.constant_(self.bn[k].weight.data, 1.25 * np.sqrt(C))
Hint: read the documentation of nn.init.kaiming normal .
8. Rerun the experiment from question 6. What do you observe? Why? What will be the implication
on back-propagation?
9. A very classical (but controversial) way to compare the quality of restoration techniques is to use
the PSNR (Peak Signal-to-Noise-Ratio) defined for images ranging in [[1, 1] as
PSNR = 10 log10
4n
ky y dk
2
2
(1)
where d is the desired ideal image, y the estimate obtained from x and n the number of elements in the tensor. The PSNR measures in decibels (dB) the quality of the restoration: the
higher the better. Similar to ClassificationStatsManager from Assignment 3, create the subclass DenoisingStatsManager that inherits from StatsManager and computes and averages PSNR
between mini-batches (instead of accuracy).
10. Create a DnCNN network with D = 6 and transfer your network to GPU. Create an experiment
exp1 for DnCNN based on Adam optimizer with learning rate 1e-3, using DenoisingStatsManager,
mini batches of size 4, and storing the checkpoints in denoising1.
11. Run the experiment for 200 epochs after completing the following
def plot(exp, fig, axes, noisy, visu_rate=2):
if exp.epoch % visu_rate != 0:
return
with torch.no_grad():
denoised = exp.net(noisy[np.newaxis].to(exp.net.device))[0]
axes[0][0].clear()
axes[0][1].clear()
axes[1][0].clear()
4
axes[1][1].clear()
myimshow(noisy, ax=axes[0][0])
axes[0][0].set_title(✬Noisy image✬)
# COMPLETE
plt.tight_layout()
fig.canvas.draw()
fig, axes = plt.subplots(ncols=2, nrows=2, figsize=(7,6))
exp1.run(num_epochs=200, plot=lambda exp: plot(exp, fig=fig, axes=axes,
noisy=test_set[73][0]))
such that your function displays at every visu rate=2 epochs something similar to the results below.
If it does not so, interupt it (Esc+i+i), check your code, delete the output dir, and start again.
12. Once you have trained your network for 200 epochs (about 17 minutes), compare the noisy, clean
and denoised results on a few images of the testing set. Evaluate the visual quality of your result.
5
Do you see any artifacts or loss of information?
Hint: use a plt.subplots created with the arguments sharex=’all’ and sharey=’all’. This will
allow you to zoom and navigate on the images in a synchronized way.
13. What is the number of parameters of DnCNN(D)? What is the receptive field of DnCNN(D), i.e,
how many input pixels do influence an output pixel?
14. Denoising literature claims that for reducing Gaussian noise of standard deviation σ = 30 efficiently,
a pixel should be influenced by at least 33 × 33 pixels. How large D (how deep) should DnCNN
be to satisfy this constraint? What would be the implication on the number of parameters and the
computation time?
4 U-net like CNNs
As seen in class, pooling layers allows us to increase the receptive field without making the network deeper
and slower. But pooling layers lose spatial resolution. In order to retrieve the spatial dimension we will
have to use unpooling (like introduced for feature visualizing in ZFNet). Our architecture will resemble
the so-called U-net architecture (that was originally introduced for image segmentation). Our network
will consist of a contracting path and an expansive path, which gives it the U-shaped architecture. The
contracting path is a typical convolutional network that consists of repeated application of convolutions,
ReLU and max pooling operation. During the contraction, the spatial information is reduced. The
expansive pathway reconstructs spatial information through a sequence of unpooling and convolutions
(note that if we were using strided convolutions, we would have to use transpose convolutions), and
combined high-resolution features from the contracting path. In the original U-net, features were combined
by concatenating the channels of the tensors, but here we will consider their sum divided by √
2 (this is
to preserve the range of feature variations). The architecture is summarized in the following scheme:
15. Copy paste the class DnCNN into a new class UDnCNN. Modify it to implement the U-shaped DnCNN.
Use F.max pool2d(..., return indices=True) and F.max unpool2d(..., output size=...)
of size 2 × 2. You may need to create three lists to store the features, the spatial dimensions
and the max pool indices.
Hint: if the layer with index k is in the expansive pathway, its symmetrical layer in the contractive
pathway has index D D k + 1.
16. Create a new experiment exp2 for UDnCNN based on Adam optimizer with learning rate 1e-3,
using DenoisingStatsManager, mini batches of size 4, and storing the checkpoints in denoising2.
6
Run your experiment for 200 epochs (about 15 minutes) using the same function plot as before.
17. What is the number of parameters of UDnCNN(D)? What is the receptive field of UDnCNN(D)?
Do you expect UDnCNN(D) to beat DnCNN(D)?
18. Using the method evaluate of Experiement, compare the validation performance of DnCNN and
UDnCNN. Interpret the results.
5 U-net like CNNs with dilated convolutions
Though pooling layers increase the receptive field, they lose information about exact locations. This is
desired for classification, but for denoising this decreases performance. An alternative to pooling is to use
dilated convolutions (sometimes refer to the `a trous algorithm, meaning with holes). Instead of increasing
the receptive field by reducing the feature spatial dimensions by a factor 2 after each convolution, the
filters are dilated by a factor 2. In order to maintain the same number of parameters, the dimensions are
increased by injecting “holes” between each rows and columns of the filter. Please, refer to the following
figure:
Next, instead of using unpooling to recover the original spatial dimensions, the filter dimensions are
decreased by reducing the number of holes in the filters.
19. Copy paste the class DnCNN into a new class DUDnCNN and modify it to implement the Dilated Ushaped DnCNN. Each convolution placed after k pooling and l unpooling in the network, should be
replaced by a dilated filter with 2kk l √ 1 holes. This can be achieved with the dilation optional
argument of nn.Conv2d. Make sure you set up the argument padding accordingly in order to
maintain tensors with the same spatial dimension during the forward propagation.
20. In theory, dilated convolutions should not be slower than standard convolutions, but
for some reasons they are when using the default PyTorch backend implementation.
For this reason, add the two instructions torch.backends.cudnn.benchmark=True and
torch.backends.cudnn.benchmark=False, before and after running any dilated convolutions. For
more details, see the discussion here: https://github.com/pytorch/pytorch/issues/15054.
21. Create a new experiment for DUDnCNN based on Adam optimizer with learning rate 1e-3, using
DenoisingStatsManager, mini batches of size 4, and storing the checkpoints in denoising3. Run
your experiment for 200 epochs (about 20 minutes) using the same function plot as before.
22. Compare the validation performance of DnCNN, UDnCNN and DUDnCNN. Display results from a
few samples of the testing set. Make sure your results are similar to the ones shown on Figure 1.
23. What is the number of parameters of DUDnCNN(D)? What is the receptive field of DUDnCNN(D)?
Conclude.
7
Figure 1: Comparative results between DnCNN, UDnCNN and DUDnCNN. All use 8 convolution layers
and have the same number of learnable parameters.
6 Bonus (optional and ungraded)
Play with the depth D, the learning rate, the size of the mini-batches, the noise level, etc.