程序辅导案例 > Program >

程序代写案例-B31XO

欢迎使用51辅导，51作业君孵化低价透明的学长辅导平台，服务保持优质，平均费用压低50%以上！ 51fudao.top

Prof. Y. Wiaux Heriot-Watt B31XO Course project
DEEP LEARNING IN COMPUTATIONAL IMAGING
1 Introduction
1.1 Project in a nutshell
Computational imaging is an approach to the process of imaging a scene of interest, where the acquired
data are not the sought images themselves, but rather contain only incomplete information about these
images (e.g. due to time, cost, or physical constraints inherent to the sensing procedure). The data is
often not acquired in the image domain itself, but rather in a transform domain (e.g. the Fourier domain).
In this context, an ill-posed inverse problem appears for image formation, and advanced algorithms are
needed to reconstruct an image from data. Important imaging modalities in science and technology,
ranging from astronomy to medicine, fall into this category.
In the first part of the course, we have introduced the framework of convex optimisation, which pro-
vides a whole wealth of algorithms enabling to solve imaging inverse problems. We have in particular
studied proximal splitting methods, such as the Forward-Backward algorithm, and the Alternating Di-
rection Method of Multipliers (ADMM). The second part of the course will take the form of a group
project which will enable us to explore how machine learning algorithms, more specifically deep neural
networks, can provide an alternative framework to solve imaging inverse problems, or otherwise inte-
grate and enhance optimisation algorithms. For the sake of the illustration, the project will focus on
a specific computational imaging modality known as Magnetic Resonance (MR) Imaging, which is a
state-of-the-art imaging technique in medicine.
1.2 MR imaging
MR imaging is a non-invasive non-ionising medical imaging technique that finds its superiority in the
flexibility of its contrast mechanisms. It comes in various modalities ranging from structural imaging
aiming at mapping detailed tissue structures, or diffusion imaging mapping the structural neuronal con-
nectivity by probing molecular diffusion in each voxel of the brain, to dynamic imaging mapping for
example the heart movements. MR imaging is a good example of how it is possible to play with phys-
ical phenomena to encode data. Here a phenomenon called “magnetic resonance”, which relates to the
alignment of the spins of protons in a magnetic field, enables the sequential measurement of Fourier co-
efficients of the image of interest [1]. MR data acquisition is intrinsically long, sometimes prohibitively,
due to its sequential nature. MR acquisition sequences that are fast and simultaneously enable high-
resolution imaging constitutes a big challenge for medical research. In this context, the acquisition of
incomplete Fourier data (i.e. of only part of the Fourier coefficients) is a commonly used acceleration
strategy. This places the modality in the category of computational imaging techniques, with advanced
algorithms required for image reconstruction from the data.
1.3 Deep Learning?
Recently, Deep Neural Networks (DNNs), with mixed architectures, including simple feed-forward net-
works, variational networks, generative adversarial networks, autoencoders, and convolutional neural
networks (CNNs) such as U-net, were demonstrated solving inverse imaging problems with outstanding
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 1/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
Figure 1: Illustration of a generic U-net CNN architecture. The network will be fed with an “input image”
and produce an “output image”. The input is progressively processed through the sequential application
of simple activation functions (typically thresholding operations such as ReLU) and convolutions with
small kernels (Conv) over a number of layers. On top of that “Max pooling”, “Transposed Conv”, “Con-
cat”, and “BN” respectively perform down-sampling, up-sampling, concatenation, and normalisation
operations, which all help creating a complex structure of “feature maps” (yellow planes) over the layers.
The parameters of the network (typically the convolution kernels that lead to the feature maps) will be
learned, i.e. their values will be chosen to optimise the output of the network over a training data set of
input/output images. The more complex the architecture, the larger the inference power of the network.
precision and for a variety of applications, in particular in medical imaging [2–5]. For illustration, a
CNN will process data, in the form of an input “corrupted” image, through the sequential application
of simple nonlinear activation functions and small convolutions over a number of layers, to produce an
output “corrected” image (see Figure 1). If the network is not too deep, i.e. the number of layers is not
too high, its application may be extremely fast in comparison with an optimisation algorithm, opening
the door to a significant scalability to the large image and data dimensions characterising modern ap-
plications. The computational cost is transferred to training the DNN offline. The training task itself is
actually performed using optimisation algorithms! It requires large training databases, as well as power-
ful computation resources, in particular dedicated GPU systems. The lack of theoretical understanding
of DNNs however raises the question of the reliability of the estimated image. In particular, the relation
between the data and the underlying image is, in most applications, dependent on a variety of acquisition
parameters. Training can only be performed over a limited range of conditions, making the question of
the power of generalisation of a DNN to imaging conditions outside those considered during training a
constant challenge.
Emerging “Plug-and-Play” (PnP) methods in optimisation propose to replace the proximal operator
associated with the regularisation term in a proximal splitting optimisation method by a more general
denoising operator characterising a more general prior image model [6–8]. They were also shown to pro-
vide outstanding imaging precision, in particular when the image model is learnt and the corresponding
denoiser takes the form of a DNN [9–14]. Interestingly, the DNN being merely a denoiser, it becomes
agnostic to the data acquisition conditions. The PnP approach thus solves the generalisation issue men-
tioned above.
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 2/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
Figure 2: Left: the groundtruth xo we wish to recover, center: the Fourier mask diag(M†M), and right:
the observed Fourier data correctly positioned in the Fourier plane M†y (in logscale of the modulus).
2 The project
2.1 Problem formulation and objectives
Problem formulation. In MR imaging, one wishes to recover an image xo ∈ RN from a noisy and
sub-sampled version of its Fourier coefficients y ∈ CM , with M < N . The acquisition process can be
formulated as
y = Φ†xo +n, with Φ† = MF, (1)
where F ∈ CN×N denotes the discrete Fourier transform, M ∈ {0, 1}M×N is a matrix implementing the
selection of a subset of Fourier coefficients (each of its M lines contains a single nonzero value indexing
a spatial frequency to be sampled), and n ∈ CM is some additive i.i.d. Gaussian noise (both its real and
imaginary parts follow a normal distribution with zero mean and variance σ2/2). We consider a class
of Fourier sampling patterns where observed frequencies are organised by lines in the Fourier plane of
the image in order to fit technical constraints on the design of an MR acquisition sequence. All low
frequency lines are sampled, while higher frequencies are selected uniformly at random (see Figure 2).
Objectives. The aim of this project is to implement 3 algorithms to solve the inverse imaging problem,
i.e. for the recovery of xo from y , and to compare them in terms of both quality of the recovered image,
and computation time. The 3 methods are: (M1) A pure optimisation method based on the ADMM
algorithm; (M2) A pure deep learning method based on the U-net CNN; (M3) A hybrid PnP method
where a simple CNN trained as a denoiser is plugged into ADMM to substitute the proximity operator of
the regularisation term.
2.2 M1
In order to solve (1), one may introduce a sparsity regularisation term, in a well chosen sparsity basis.
The estimation x? of xo will be defined as a solution of the following optimisation problem:
x? = argmin
x
‖Ψ†x‖1 subject to ‖y −Φ†x‖2 ≤ ε, (2)
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 3/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
where Ψ† is the (orthonormal) Db4 basis and ε = σ
√
M + 2
√
M . The ADMM algorithm for solving
(2) reads
xt+1 = Ψ shrink
(
Ψ† (xt − δR(Φ(st +nt − ν¯ t))) , δρ−1
)
st+1 = Φ
†xt+1 − y
nt+1 = PB(0,ε)(ν¯ t − st+1)
ν¯ t+1 = ν¯ t − (st+1 +nt+1),
(3)
where R denotes the real part, δ > 0 satisfies δ ≤ σ−1max(R(ΦΦ†)), and ρ > 0 is a free parameter. The
value of this parameter will usually be fine-tuned to maximize the reconstruction quality over some test
set of groundtruth images, i.e. a data set of Te pairs (y i,xoi )i∈Te . A properly tuned value of ρ should lead
to x?i ' xoi ∀i ∈ Te. If the required accuracy is not reached, the value of ρ needs to be changed. Or more
importantly the regularisation term and the objective function might have to be adapted!
Note that, in theory, the optimisation algorithm will converge after an infinite number of iterations.
In a practical implementation the iterative process will typically be stopped when the relative variation
of consecutive iterates is smaller than a fixed bound α > 0, i.e. ‖xt+1 − xt‖2/‖xt+1‖2 ≤ α. Note that
other criteria can also be applied.
2.3 M2
What is a neural network? Given a specific architecture (sequence of nonlinear activation functions
and convolution operations, see Figure 1), a neural network can be represented as a highly nonconvex
function G of a set of parameters θ. The latter are optimised in order to have G solve a specific problem,
which can be formalised as recovering some variable xo from input data y . Ideally, one seeks that
the network output G(y,θ?) be equal to xo. In order to find the optimal value θ? of the parameters,
the network is trained by minimizing a loss (objective) function l over a training data set of Tr pairs
(y i,x
o
i )i∈Tr . A widely used loss function is the mean squared error over the data set between the network
output G(y,θ) and the true variable xo:
θ? = argmin
θ
1
M
M∑
i=1
‖G(y i, θ)− xoi ‖22. (4)
Algorithms such as Stochastic Gradient Descent (SGD) and Adaptive Moment Estimation (ADAM) solve
problem (4) efficiently (even though this problem is nonconvex). Just as ADMM, these algorithmic
structures contain free parameters that will affect the optimal value θ?. Once the network is trained, it
is evaluated on a separate testing data set, i.e. a data set of Te pairs (y i,xoi )i∈Te on which the network
has not been trained. A properly trained network should satisfy G(y i, θ?) ' xoi ∀i ∈ Te. If the required
accuracy is not reached, the free parameters need to be re-tuned and the network re-trained. Or more
importantly the architecture G of the network might have to be adapted!
Application to imaging inverse problems. To solve an imaging inverse problem, a neural network
will aim to estimate the sought image xo from the incomplete data y . For our problem, the training
and testing data sets should be built as Tr + Te input/output pairs (y i,xoi )i∈Tr+Te , where y i is related
to the groundtruth image xoi through (1). Note that working on pairs where both input and output are
represented as images greatly simplifies the network architecture and training procedure. One can simply
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 4/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
Figure 3: Left: the groundtruth xo we wish to recover, middle: the observed Fourier data correctly
positioned in the Fourier plane M†y (in logscale of the modulus), and right: the backprojected image
x˜ = <{Φy}.
pre-process the data y and project them onto the image domain to produced the so-called “backprojected
image”
x˜ = <{Φy} = <{Φ(Φ†xo +n)} ∈ RN , (5)
and work with input/output pairs (x˜i,xoi )i∈Tr+Te (see Figure 3 for an illustration).
The full design of an appropriate network architecture is a complicated question lying beyond the
scope of the current project. We will choose and adapt here an existing architecture dubbed U-net [15],
which has shown good results in medical imaging [2]. The explicit U-net architecture proposed is detailed
in Appendix A.
2.4 M3
The PnP version of ADMM [9, 11] simply reads,
xt+1 = G (xt − δR(Φ(st +nt − ν¯ t)))
st+1 = Φ
†xt+1 − y
nt+1 = PB(0,ε)(ν¯ t − st+1)
ν¯ t+1 = ν¯ t − (st+1 +nt+1),
(6)
where the stepsize δ > 0 satisfies δ ≤ σ−1max(R(ΦΦ†)) and ε = σ
√
M + 2
√
M , and a denoising neural
network G is substituted to the prox of the regularisation term in the objective function (2), imposing a
learned image model. As the network is to be applied at each iteration of ADMM (rather than only once
as in M2), we choose a simpler architecture, namely “DnCNN” [10], which implies a faster application
of G and will help containing the overall computation time. The explicit DnCNN architecture proposed
is detailed in Appendix B.
Recall that a network is trained by minimizing a loss function over a training data set Tr using
optimisation algorithms containing additional free parameters that need tuning. Once the network is
trained, it is evaluated on a testing data set Te, and if the required accuracy is not reached, the free
parameters need to be re-tuned and the network re-trained, or more importantly the architecture of the
network might have to be adapted!
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 5/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
The training and testing data sets are built as input/output pairs (xˆi,xoi )i∈Tr+Te , where xˆi are noisy
versions of the groundtruth images xoi :
xˆi = x
o
i + n¯i, (7)
with n¯i ∈ RN is some additive i.i.d. Gaussian noise, whose components follow a normal distribution
with zero mean and variance σ¯2. Note that this variance σ¯2 should be adapted to the variance σ2 of the
noise on the observed data.
Note that the convergence PnP version of ADMM is not proven for standard network architectures
such as DnCNN [7, 8, 11]. It is therefore safe to stop the algorithm after a fixed number of iterations,
rather than “hoping” for the relative variation of consecutive iterates to be smaller than a fixed bound.
3 Implementation and Data
The algorithms should be implemented in MATLAB. All data and files needed for the project are pro-
vided on dropbox and organised in the following directories:
• implementation/ contains the MATLAB files M1.m, M2.m, and M3.m, for your implementation
and validation of the 3 methods; the 3 files contain useful MATLAB functions; M2.m also contains
a partial implementation of the U-net architecture proposed in Appendix A, for you to finalise;
M3.m also contains a partial implementation of the DnCNN architecture proposed in Appendix B,
for you to finalise;
• trainingset/ contains a training data set of approximately 7000 MR images of size 320 × 320
borrowed from [16] to serve as groundtruths for training both the U-net and DnCNN. From these,
you will need to build the backprojected images to serve as an input to the Unet according to
equation (5) in the training phase of M2, and the noisy images to serve as an input to the DnCNN
according to equation (7) in the training phase of M3;
• testingset/ contains a testing data set of 20 MR images of size 320 × 320 borrowed from [16] to
serve as groundtruths for the validation of the 3 methods.
4 DNN online tutorials
For building background knowledge on DNNs (CNNs in particular, including their MATLAB implemen-
tation) please refer to the following tutorials:
• MATLAB Deep Learning Onramp
• Stanford course on CNNs
5 Beyond M1, M2, M3
The design and development of M1, M2, and M3 raises a significant amount of questions with regards to
possible improvements. Can the objective function for M1 be enhanced? Can the network architectures
and/or the training data sets (input/output) for M2 and M3 be enhanced? Would another PnP algorithm
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 6/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
SNR (dB) SSIM Reconstruction time (s)
Method Average Standard dev. Average Standard dev. Average Standard dev.
M1
M2
M3
M4
Table 1: SNR, SSIM, and computation time results
enable increased performance of M3? Can DNNs and optimisation algorithms be interfaced in a dif-
ferent way? These questions lay the ground for the development of new methods that would possibly
outperform M1, M2, and M3...
Challenge. You should attempt to develop and implement a new method M4 to outperform M1, M2,
and M3.
6 Report
Your results. You are asked to provide the results of validation of your developments over the 20
images from the testing data set. More precisely, you need to provide, for each method (M1, M2, M3,
and your M4):
1. the average and standard deviation of the Signal-to-Noise Ratio (SNR (dB)) and Structural Similar-
ity Index Measure (SSIM) of the reconstructions (use Table 1). Note that the SNR of reconstruction
x? is defined as SNR = 20 log10 (‖xo‖2/‖xo − x?‖2). The more involved definition of SSIM can
be found, e.g. on Wikipedia, and MATLAB has a built-in function for it. Both metrics are standard
estimators of the reconstruction quality;
2. the average computation time and standard deviation (use Table 1);
3. the reconstructed images x?i in comparison with the groundtruth images x
o
i and the backprojected
images x˜i = <{Φy i}. For convenience, you only need to show 3 out of the 20 images.
Last but not least, it is of prime importance that you discuss your results, in particular how the
methods compare to one another, both in terms of visual reconstruction quality and the quantitative SNR
and SSIM metrics. You are required to discuss their strengths and limitations, and how you think they
could be improved.
The document structure. You need to provide a project report containing the following sections: (i)
Introduction, (ii) Objectives, (iii) Methodology and implementation, (iv) Validation and results, (v) Dis-
cussion and conclusion, (vi) References, (vii) Attendance and contributions. The document should have
maximum 20 pages (font Arial, font size 11, top-bottom-lef-right margins 2cm). The “Attendance and
contributions” section is mandatory and should detail, for each student, the number of progress meet-
ings attended as well as their contributions to both the work (algorithms and codes) and report. This is
understood as a joint statement agreed by the whole group.
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 7/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
Upload. The project is set as an assignment on VLE (Virtual Learning Environment). You need to
upload one zip file named “B31XO-groupname-project-20XX.zip” containing (i) the 3 MATLAB files
M1 sol.m, M2 sol.m and M3 sol.m (and maybe your M4 sol.m) reproducing the results of your report
for each method, (ii) the codes used for the training of your networks (the M2 Unet.m for M2 and
M3 DnCNN.m for M3), (iii) your report in pdf format named “B31XO-groupname-report-20XX.pdf”.
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 8/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
A Proposed U-net architecture
Figure 4: Illustration of the chosen U-net architecture, inspired from [2]. The network is designed as
a number of layers of operations (arrows) distributed in 3 depths, each creating a feature map (yellow
planes). Below each map is indicated its number of channels (between 64 and 256) as well as their
dimension (between 320 × 320 and 80 × 80). At depth 1, the input image of dimension 320 × 320
is processed through 3 layers consisting in a sequence of 2D Convolution, Batch Normalisation and
ReLU operations (red arrows, see online tutorials for a detailed explanation of these operations), each
outputting a feature map with 64 channels. At depth 2, after a 2× 2 Max pooling (purple arrow), maps
are of dimension 160 × 160 and processed through 2 layers of 2D Convolution, Batch Normalisation
and ReLU operations (red arrows), to create a feature map with 128 channels. At depth 3, after a
second 2 × 2 Max pooling (purple arrow), maps are of dimension 80 × 80 and processed through 2
layers of 2D Convolution, Batch Normalisation and ReLU operations (red arrows), to create a feature
map with 256 channels. Maps are then up-sampled to dimension 160× 160 by a layer of 2D Transposed
Convolution with stride 2 (green arrow, see online tutorials), Batch Normalisation and ReLU operations,
bringing them back to depth 2 with 128 channels. This up-sampled output is concatenated to the 128
channels of the last feature map from depth 2 and processed through 1 layer of 2D Convolution, Batch
Normalisation and ReLU operations (yellow arrow), to create a feature map with 128 channels. This
output is further processed by 1 layer of 2D convolution, Batch Normalisation and ReLU operations (red
arrow), outputting 128 channels. Maps are up-sampled a second time to dimension 320× 320 by a layer
of 2D Transposed Convolution with stride 2 (green arrow), Batch Normalisation and ReLU operations,
bringing them back to depth 1 with 64 channels. This up-sampled output is concatenated to the 64
channels of the last feature map from depth 1 and processed through 1 layer of 2D Convolution, Batch
Normalisation and ReLU operations (yellow arrow), to create a feature map with 64 channels. This
output is further processed by 1 layer of 2D convolution, Batch Normalisation and ReLU operations (red
arrow), outputting 64 channels. Eventually, this last feature map is processed through 1 layer of 2D
Convolution, Batch Normalisation and ReLU operations (red arrow), to create a feature map with only 1
channel... the output image.
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 9/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
B Proposed DnCNN architecture
Figure 5: Illustration of the chosen DnCNN architecture, inspired from [10]. The overall architecture
consists of a total of 17 layers involving 2D Convolution, Batch Normalisation and ReLU operations
(red arrows, see online tutorials for a detailed explanation of these operations), each creating a feature
map (yellow planes). Below each feature map is indicated its number of channels (between 1 and 32) as
well as their dimension (64×64). The input image is fed to the first layer. The output of the 17th layer is
added to the input image, through a so-called “skip connection”. This means that the network is actually
learning the difference between input and output. Such networks are called “residual networks”. Note on
the image dimension: due to the i.i.d. nature of the noise, and with the aim to reduce computation cost,
a standard procedure for training a denoiser is to train it on small images (here of size 64 × 64), with a
training data set built from patches of the images (here of size 320×320) in the original training data set.
Once trained, the denoising network can be applied to images of any size. This follows from realising
that the weights learned are those of convolution kernels, and the convolution can simply be performed
on any image size. Notice on the contrary that, in case the network is trained to recover directly the
image from the observed data, as in M2, the measurement operator induces nonlocal dependencies in the
artefacts to be removed from the backprojected image (input to the network), possibly extending over the
whole image, therefore preventing the same approach of training on patches.
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 10/11
Prof. Y. Wiaux Heriot-Watt B31XO Course project
References
[1] Liang and Lauterbur, “Principles of magnetic resonance imaging: a signal processing perspective”,
SPIE Optical Engineering Press, 2000.
[2] Jin et al., “Deep convolutional neural network for inverse problems in imaging”, IEEE TIP 26 (2017)
4509.
[3] Hauptmann et al., “Multi-Scale Learned Iterative Reconstruction”, accepted in IEEE TCI,
arXiv:1908.00936.
[4] Yang et al., “DAGAN: Deep De-Aliasing Generative Adversarial Networks for Fast Compressed
Sensing MRI Reconstruction”, IEEE TMI 37 (2018) 1310.
[5] Knoll et al., “Deep Learning Methods for Parallel Magnetic Resonance Image Reconstruction”,
arXiv:1904.01112.
[6] Venkatakrishnan et al., “Plug-and-play priors for model based reconstruction”, in Proc. IEEE Global
SIP (2013) 945.
[7] Reehorst & Schniter, “Regularization by denoising: Clarifications and new interpretations”, IEEE
TCI 5 (2018) 52.
[8] Sun et al., “An online plug-and-play algorithm for regularized image reconstruction”, IEEE TCI 5
(2019) 395.
[9] Meinhardt et al., “Learning Proximal Operators: Using Denoising Networks for Regularizing Inverse
Imaging Problems”, in Proc. ICCV (2017), arXiv:1704.03488.
[10] Zhang et al., “Learning Deep CNN Denoiser Prior for Image Restoration”, in Proc. CVPR (2017),
arXiv:1704.03264.
[11] Ryu et al., “Plug-and-Play Methods Provably Converge with Properly Trained Denoisers”, in
Proc. ICML (2019), arXiv:1905.05406.
[12] Gupta et al., “Cnn-based projected gradient descent for consistent CT image reconstruction”, IEEE
TMI 37 (2018) 1440.
[13] Chang et al., in Proc. ICCV (2017), “One network to solve them all–solving linear inverse problems
using deep projection models”, in Procs. ICCV (2017), arXiv:1703.09912.
[14] Yazdanpanah et al., “Deep Plug-and-Play Prior for Parallel MRI Reconstruction”, in Procs. ICCV
(2019), arXiv:1909.00089.
[15] Ronneberger et al., “U-net: Convolutional networks for biomedical image segmentation”, in Procs.
MICCAI (2015), arXiv:1505.04597.
[16] Zbontar et al., “fastMRI: An open dataset and benchmarks for accelerated MRI”, arXiv:1811.08839.
DEEP LEARNING IN COMPUTATIONAL IMAGING (COURSE PROJECT) 11/11

欢迎咨询51作业君