CSCI433/933: Machine Learning Algorithms and
Applications
Assignment Problem Set III (Individual)
Part A: General knowledge questions - 15 Marks
1. In what way are regression and neural networks models similar?
2. Why is a deep neural network better than a shallow network?
3. What is the difference between a convolutional neural network (CNN) and a recurrent neural
network (RNN)?
4. Explain the principles of the back-propagation algorithm.
5. Explain the concepts of over-fitting and under-fitting.
6. Explain the effects of ridge and LASSO regularization as used in regression models.
Part B: Design and Programming - 50 + 10 bonus Marks
Aims and Objectives
This assignment aims at evaluating basic familiarity with fundamental concepts and implementation of deep neural networks and statistical machine learning. On completion of this assignment, you should be able to demonstrate basic mastery of:
• concepts of deep autoencoder, feature extraction, SVM, convolutional autoencoder;
• implementation of machine learning algorithms using TensorFlow, Keras, and scikit-learn.
Introduction
An autoencoder is a popular unsupervised learning technique used to learn data representations. Specifically, a neural network architecture is designed with a bottleneck that forces the network to learn a compressed knowledge representation of the original input. To build an autoencoder, we need three components: an encoder to compress the data, a decoder to decompress the data, and a loss function to measure the reconstruction error. Different types of autoencoder models exist, varying in their encoder/decoder architectures. This assignment focuses mainly on two popular models: the deep fully-connected autoencoder and the deep convolutional autoencoder.
What needs to be done
1. Implement a multi-layer fully-connected autoencoder using TensorFlow and Keras (15 + 5 bonus Marks):
• Load Fashion-MNIST using Keras. Fashion-MNIST is a dataset of article images, consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each image is a 28×28 grayscale image, associated with a label from 10 classes. Change the data type of each image to float32 and normalize the pixel values to [0, 1]. Hint: x_train, x_test = x_train.astype('float32')/255.0, x_test.astype('float32')/255.0.
• Use TensorFlow and Keras to implement a six-layer fully-connected autoencoder on the Fashion-MNIST dataset. The encoder and decoder both consist of three layers. Each input image is flattened into a 784-dimensional vector. The three encoder layers have output dimensionalities of 128, 64, and 32; the three decoder layers have output dimensionalities of 64, 128, and 784. After each of the six layers, a ReLU nonlinearity is applied.
• Train the network using mean squared error as the loss function and Adam as the optimizer. The batch size should be set to 256. Train the network for 30 epochs. Hint: for each epoch, the training data should be randomly shuffled.
• Print out the training error and testing error for each epoch. Randomly choose two test
images, display and compare the original images and reconstructed images.
• Bonus (5 marks): Try to increase the depth (more layers) and width (higher dimensional
hidden layers) of the autoencoder and monitor the change of training and testing losses.
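A minimal sketch of the six-layer fully-connected autoencoder described above, using the Keras functional API. The layer sizes and training settings follow the specification; variable names are my own, and the ReLU output layer is kept as specified (a sigmoid output is also common for [0, 1] images):

```python
from tensorflow import keras
from tensorflow.keras import layers

# Load Fashion-MNIST and normalize pixel values to [0, 1], as specified
(x_train, _), (x_test, _) = keras.datasets.fashion_mnist.load_data()
x_train = x_train.astype('float32') / 255.0
x_test = x_test.astype('float32') / 255.0
x_train = x_train.reshape(-1, 784)  # flatten 28x28 images to 784-d vectors
x_test = x_test.reshape(-1, 784)

# Encoder 784 -> 128 -> 64 -> 32, decoder 32 -> 64 -> 128 -> 784,
# with ReLU after each of the six layers
inputs = keras.Input(shape=(784,))
h = layers.Dense(128, activation='relu')(inputs)
h = layers.Dense(64, activation='relu')(h)
code = layers.Dense(32, activation='relu')(h)  # 32-d image representation
h = layers.Dense(64, activation='relu')(code)
h = layers.Dense(128, activation='relu')(h)
outputs = layers.Dense(784, activation='relu')(h)

autoencoder = keras.Model(inputs, outputs)
autoencoder.compile(optimizer='adam', loss='mse')

# shuffle=True reshuffles the training data before each epoch;
# validation_data reports the test reconstruction error per epoch
history = autoencoder.fit(x_train, x_train, epochs=30, batch_size=256,
                          shuffle=True, verbose=2,
                          validation_data=(x_test, x_test))
```

A second model built over the same layers, keras.Model(inputs, code), then serves as the encoder needed for the SVM step below.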
2. Train an SVM classifier based on the image representations extracted from the above autoencoder (15 marks):
• Once the above fully-connected autoencoder is trained, for each image, extract the 32-
dimensional hidden vector (the output of the ReLU after the third encoder layer) as the
image representation.
• Train a linear SVM classifier on the training images of Fashion-MNIST using the 32-dimensional features. Tune the hyper-parameter 'C' using cross-validation. Print out the training accuracy. scikit-learn is recommended.
• Test the trained SVM on the test images of Fashion-MNIST. Print out the testing accuracy.
• Try a kernel-based SVM and compare its performance with the linear SVM.
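A sketch of the SVM step with scikit-learn. The random placeholder features below merely stand in for the encoder outputs so that the snippet runs standalone; in the assignment, z_train and z_test should come from encoder.predict(x_train) and encoder.predict(x_test) on the trained autoencoder, with y_train and y_test the Fashion-MNIST labels:

```python
import numpy as np
from sklearn.svm import LinearSVC, SVC
from sklearn.model_selection import GridSearchCV
from sklearn.metrics import accuracy_score

# Placeholder 32-d features; replace with encoder.predict(...) in the assignment
rng = np.random.default_rng(0)
z_train = rng.normal(size=(1000, 32))
y_train = rng.integers(0, 10, size=1000)
z_test = rng.normal(size=(200, 32))
y_test = rng.integers(0, 10, size=200)

# Tune the hyper-parameter C with 3-fold cross-validation
grid = GridSearchCV(LinearSVC(max_iter=5000), {'C': [0.01, 0.1, 1, 10]}, cv=3)
grid.fit(z_train, y_train)
svm = grid.best_estimator_
print('training accuracy:', accuracy_score(y_train, svm.predict(z_train)))
print('testing accuracy:', accuracy_score(y_test, svm.predict(z_test)))

# Kernel-based SVM (RBF) for comparison with the linear model
rbf = SVC(kernel='rbf', C=1.0).fit(z_train, y_train)
print('RBF testing accuracy:', accuracy_score(y_test, rbf.predict(z_test)))
```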
3. Train a deep convolutional autoencoder using TensorFlow and Keras (10 + 5 bonus Marks):
• Load and pre-process Fashion-MNIST as for the fully-connected autoencoder.
• Train a six-layer convolutional autoencoder. Unlike the fully-connected autoencoder, each encoder and decoder layer is now a convolutional layer instead of a fully-connected layer. Again, the autoencoder consists of three encoder layers and three decoder layers. Each input image to the autoencoder is now 3D (28 × 28 × 1). For each convolutional layer, use a 3 × 3 convolutional kernel, stride = 1, padding='same'. After the first and second encoder convolutional layers, use (2, 2) max pooling to downsample the feature maps, and after the first and second decoder convolutional layers, use a (2, 2) UpSampling2D operation to upsample the feature maps. Choose a proper number of filters for each convolutional layer, e.g. 16, 24. Each convolutional layer uses the ReLU activation function.
• Train the network using mean squared error as the loss function and Adam as the optimizer. The batch size should be set to 256. Train the network for 15 epochs. Hint: for each epoch, the training data should be randomly shuffled.
• Print out the training error and testing error for each epoch. Randomly choose two test
images, display and compare the original images and reconstructed images.
• Bonus (5 marks): Implement a denoising convolutional autoencoder using the same architecture as above.
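One possible layer layout for the convolutional autoencoder described above. The filter counts 16 and 24 are only one reasonable choice, as the specification leaves them open; the training call, which mirrors the fully-connected case with 3D inputs, is shown as a comment:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Three encoder convolutions (pooling after the first two) and three
# decoder convolutions (upsampling after the first two), all 3x3,
# stride 1, padding='same', with ReLU activations
inputs = keras.Input(shape=(28, 28, 1))
h = layers.Conv2D(16, (3, 3), strides=1, padding='same', activation='relu')(inputs)
h = layers.MaxPooling2D((2, 2))(h)                  # 28x28 -> 14x14
h = layers.Conv2D(24, (3, 3), strides=1, padding='same', activation='relu')(h)
h = layers.MaxPooling2D((2, 2))(h)                  # 14x14 -> 7x7
code = layers.Conv2D(24, (3, 3), strides=1, padding='same', activation='relu')(h)
h = layers.Conv2D(24, (3, 3), strides=1, padding='same', activation='relu')(code)
h = layers.UpSampling2D((2, 2))(h)                  # 7x7 -> 14x14
h = layers.Conv2D(16, (3, 3), strides=1, padding='same', activation='relu')(h)
h = layers.UpSampling2D((2, 2))(h)                  # 14x14 -> 28x28
outputs = layers.Conv2D(1, (3, 3), strides=1, padding='same', activation='relu')(h)

conv_ae = keras.Model(inputs, outputs)
conv_ae.compile(optimizer='adam', loss='mse')
# Training, with the images kept 3D:
# xr_train = x_train.reshape(-1, 28, 28, 1)
# xr_test = x_test.reshape(-1, 28, 28, 1)
# conv_ae.fit(xr_train, xr_train, epochs=15, batch_size=256,
#             shuffle=True, validation_data=(xr_test, xr_test))
```

For the denoising bonus, the same architecture can be trained with noisy inputs (e.g. inputs plus Gaussian noise, clipped to [0, 1]) against the clean images as targets.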
4. Write a report of no more than three (3) pages to illustrate the experiments as well as your conclusions (10 Marks). After reading your report, others should be able to reproduce your experiments.
Part C: Numerical/Analytical - 35 Marks
A company manufactures personal protective equipment (PPE) for use in hospitals and personal
use. The company has two manufacturing facilities (factories) from which these PPE are made
before they are transported to a warehouse and packed for export. After a few consignments
were delivered, customers complained about defective equipment. The PPE are made from several
components that could contribute to malfunctioning. It is desired to design a classifier that can
identify which of the two factories produced a given defective PPE. Let the components of the PPE
be available and measurable/testable. They can be modelled as $d$ conditionally independent binary-valued features, $\mathbf{x} = (x_1, \ldots, x_d)^t$, where the components $x_i$ are either 0 or 1. The two
factories are modelled as classes ω1 and ω2. Suppose the company has information about the
reliability of the factories regarding each component. This information can be modelled as
$$p_i = \Pr[x_i = 1 \mid \omega_1], \qquad q_i = \Pr[x_i = 1 \mid \omega_2],$$
where $p_i$ and $q_i$ are respectively the probabilities of each factory making a non-defective component.
Each feature thus gives us a yes/no answer about the pattern. Furthermore, if $p_i > q_i$, we expect the $i$th feature to give a "yes" answer more frequently when the state of nature is $\omega_1$ than when it is $\omega_2$.
Now, the assumption of conditional independence allows the class conditional probabilities to be
written as products of the components of x. For example we can write:
$$P(\mathbf{x} \mid \omega_1) = \prod_{i=1}^{d} p_i^{x_i}\,(1 - p_i)^{1 - x_i} \qquad (1)$$
for the class $\omega_1$. The likelihood ratio is written as $P(\mathbf{x} \mid \omega_1)/P(\mathbf{x} \mid \omega_2)$. The discriminant function for each class can be written equivalently (i.e. they will produce the same result) as any one of the following three equations:
$$g_i(\mathbf{x}) = P(\omega_i \mid \mathbf{x}) = \frac{p(\mathbf{x} \mid \omega_i) P(\omega_i)}{\sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j) P(\omega_j)} \qquad (2)$$
$$g_i(\mathbf{x}) = p(\mathbf{x} \mid \omega_i) P(\omega_i)$$
$$g_i(\mathbf{x}) = \ln p(\mathbf{x} \mid \omega_i) + \ln P(\omega_i) \quad \text{(since the natural logarithm is a monotonic function)}$$
where $\ln$ is the natural logarithm and $c$ is the number of classes. For this binary classification problem we can combine the two discriminant functions and write
$$g(\mathbf{x}) \equiv g_1(\mathbf{x}) - g_2(\mathbf{x}) \qquad (3)$$
and then decide class $\omega_1$ if $g(\mathbf{x}) > 0$; class $\omega_2$ if $g(\mathbf{x}) \le 0$.
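As a numerical sanity check on this decision rule, the sketch below evaluates $g(\mathbf{x})$ directly from the log class-conditionals and compares it with the linear form asked for in Question 1. The values of $p_i$, $q_i$, the priors, and the feature vector are arbitrary illustrations, not the ones used in the questions below:

```python
import math

def g(x, p, q, prior1=0.5, prior2=0.5):
    """g(x) = ln p(x|w1) + ln P(w1) - ln p(x|w2) - ln P(w2)
    for conditionally independent binary-valued features."""
    log_l1 = sum(xi * math.log(pi) + (1 - xi) * math.log(1 - pi)
                 for xi, pi in zip(x, p))
    log_l2 = sum(xi * math.log(qi) + (1 - xi) * math.log(1 - qi)
                 for xi, qi in zip(x, q))
    return (log_l1 + math.log(prior1)) - (log_l2 + math.log(prior2))

# Illustrative values only (not those used in the questions below)
p, q = [0.9, 0.7], [0.4, 0.6]
x = [1, 0]
print('decide w1' if g(x, p, q) > 0 else 'decide w2')  # prints "decide w1"

# Equivalent linear form of Question 1: g(x) = sum_i w_i x_i + w_0
w = [math.log(pi * (1 - qi) / (qi * (1 - pi))) for pi, qi in zip(p, q)]
w0 = (sum(math.log((1 - pi) / (1 - qi)) for pi, qi in zip(p, q))
      + math.log(0.5 / 0.5))
g_lin = sum(wi * xi for wi, xi in zip(w, x)) + w0
assert abs(g_lin - g(x, p, q)) < 1e-9  # both forms agree
```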
1. Using Equations (1), (2) and (3), derive a discriminant function of the form
$$g(\mathbf{x}) = \sum_{i=1}^{d} w_i x_i + w_0 \qquad (4)$$
where
$$w_i = \ln\frac{p_i(1 - q_i)}{q_i(1 - p_i)}, \qquad i = 1, \ldots, d$$
and
$$w_0 = \sum_{i=1}^{d} \ln\frac{1 - p_i}{1 - q_i} + \ln\frac{P(\omega_1)}{P(\omega_2)},$$
and $w_i$ is a weight; $w_0$ is the bias.
2. What is the relevance of the weights in determining a classification?
3. Assuming there are only three components in making a PPE, and $P(\omega_1) = P(\omega_2) = 0.5$, $p_i = 0.8$, $q_i = 0.5$ for $i = 1, 2, 3$, compute the values of the weights $w_i$ and the threshold $w_0$.
4. Using the values of $w_i$ and $w_0$ obtained in the last question, determine the factory ($\omega_1$ or $\omega_2$) from which a PPE with feature vector $(1, 0, 1)$ was manufactured.
Part D: Submission instructions
Submission:
1. Prepare your response to Part A in a PDF file of no more than 2 pages.
2. Prepare three executable Python files, i.e. deep_autoencoder.py, svm.py, conv_autoencoder.py. All the implementations should be based on Python 3.
3. Prepare a PDF file containing your 3-page report.
4. Prepare your response to Part C in a PDF file of no more than 3 pages.
5. These files should be compressed as a ZIP (or RAR) file and submitted on Moodle on or before the due date and time.