

ECM3420



UNIVERSITY OF EXETER

COLLEGE OF ENGINEERING, MATHEMATICS
AND PHYSICAL SCIENCES

COMPUTER SCIENCE


Exam period May 2021


Learning from Data
Module Convenor: Mohamed Bader-El-Den


Duration: TWO HOURS + 30 MINUTES UPLOAD TIME


Answer ALL questions.

Please use EXAM ANSWER SHEET for writing your answers.

The marks for this module are calculated from 60% of the percentage mark for this paper plus 40% of the
percentage mark for associated coursework.


This is an Open Book exam





ECM3420 May 2021 Page 1 of 11
Section 1: Multiple Choice Questions
There are two types of questions:
• [Select one answer only]: you must select one and only one answer. Selecting more or fewer than
one answer will result in ZERO marks.
• [Select all the correct statements]: in these questions you must select the correct answers only.
Selecting more or fewer answers than the correct ones will result in ZERO marks.


1- What methods could be used to help reduce overfitting in decision trees? [Select all the correct
statements].
☐ A) Pruning.
☐ B) Enforce a minimum number of samples in leaf nodes.
☐ C) Make sure that each leaf-node is one pure class.
☐ D) Make sure that your data is normalized.
☐ E) Enforce a maximum depth for the tree.
☐ F) Use “entropy” to calculate the information gain.
(2 marks)

2- Which of the following statements about Neural Networks is/are true? [Select all the correct
statements].
☐ A) Optimize a convex cost function.
☐ B) Always output values between 0 and 1.
☐ C) Can be used in an ensemble.
☐ D) Can be used for regression as well as classification.
(2 marks)

3- In neural networks, what is/are true about the nonlinear activation functions such as sigmoid, tanh, and
ReLU? [Select all the correct statements].
☐ A) Used to speed up the gradient calculation in backpropagation, as compared to linear units
☐ B) Are applied only to the output units
☐ C) Help to learn nonlinear decision boundaries
☐ D) Always output values between 0 and 1
(2 marks)

4- Suppose we are given data comprising points of several different classes. Each class has a different
probability distribution from which the sample points are drawn. We do not have the class labels. We use
k-means clustering to try to guess the classes. Which of the following circumstances would undermine its
effectiveness? [Select all the correct statements].
☐ A) Some of the classes are not normally distributed.
☐ B) The variance of each distribution is small in all directions.
☐ C) Each class has the same mean.
☐ D) You choose k = n, the number of sample points
(2 marks)

5- You have used the same data to train two different Decision Tree (DT) classifiers. The first DT has 2
levels (DT2), the second DT has 6 levels (DT6). In terms of the bias-variance decomposition, the “DT6”
model is likely to have:
[Select all the correct statements]
☐ A) Higher variance than “DT2” model.
☐ B) Lower variance than “DT2” model.
☐ C) Higher bias than “DT2” model.
☐ D) Lower bias than “DT2” model.
(2 marks)

6- Which of the following are true about bagging? [Select all the correct statements].
☐ A) In bagging, we choose random subsamples of the input points with replacement
☐ B) Bagging is ineffective with logistic regression, because all of the learners learn exactly
the same decision boundary
☐ C) The main purpose of bagging is to decrease the bias of learning algorithms.
☐ D) If we use decision trees that have one sample point per leaf, bagging never gives lower
training error than one ordinary decision tree.
(2 marks)

7- Regarding variance and bias, which of the following statements are true? (Here ‘high’ and ‘low’ are
relative to the ideal model.) [Select all the correct statements].
☐ A) Models which overfit have a high bias.
☐ B) Models which overfit have a low bias.
☐ C) Models which underfit have a high variance.
☐ D) Models which underfit have a low variance.
(2 marks)

8- High entropy means that the partitions in classification are [select one answer only]
◯ A) pure
◯ B) not pure
◯ C) useful
◯ D) useless
(2 marks)

9- Suppose we would like to perform clustering analysis on a spatial dataset such as the geometrical
locations of properties and houses. We wish to produce clusters of many different sizes and shapes. Which
of the following methods is the most appropriate? [select one answer only]
◯ A) Decision Trees
◯ B) Density-based clustering
◯ C) Model-based clustering
◯ D) K-means clustering
(2 marks)
10 – You are dealing with an imbalanced dataset. You decide to use Synthetic Minority Oversampling
Technique (SMOTE) to deal with the class imbalance in the data. Select the most appropriate way to
apply SMOTE out of the following. [select one answer only]
◯ A) SMOTE should be applied on all records in the dataset (training and testing).
◯ B) SMOTE should be applied on training records only.
◯ C) SMOTE should be applied on testing records only.
◯ D) SMOTE should be applied on a random subset of both training and testing.
◯ E) SMOTE is not a suitable method to deal with the class imbalance.
(2 marks)

11 – You have been asked to develop prediction models for the London Stock Exchange market. Your
models should be able to predict if the price of a given “stock market share” is likely to increase or
decrease in the future. What would be the most suitable validation method to evaluate your models?
[select one answer only]
◯ A) K-Fold cross validation
◯ B) Out-of-time validation sampling
◯ C) Hold-out validation sampling.
◯ D) Leave-one-out Cross Validation.
(2 marks)

12 - If we know the support of itemset {a, b} is 12, which of the following numbers are the possible
supports of itemset {a, b, c}? [Select all possible answers].
☐ A) 12
☐ B) 13
☐ C) 10
☐ D) 15
(2 marks)

13 - Below are the 8 actual values of the target/output variable in the training file.
[0,0,0,1,1,1,1,1]
What is the entropy of the target variable? [Select one answer only].
◯ A) -(5/8 log(5/8) + 3/8 log(3/8))
◯ B) 5/8 log(5/8) + 3/8 log(3/8)
◯ C) 3/8 log(5/8) + 5/8 log(3/8)
◯ D) 5/8 log(3/8) – 3/8 log(5/8)
(2 marks)
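
For reference, the entropy definition behind the options above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `entropy` and the choice of base-2 logarithms are assumptions, since the question does not state a base.

```python
import math
from collections import Counter


def entropy(labels, base=2):
    """Shannon entropy of a sequence of class labels.

    H = -sum(p_i * log(p_i)), where p_i is the relative frequency of class i.
    The base-2 default is an assumption; the question leaves the base open.
    """
    n = len(labels)
    counts = Counter(labels)
    return -sum((c / n) * math.log(c / n, base) for c in counts.values())


# Target values copied from the question.
target = [0, 0, 0, 1, 1, 1, 1, 1]
h = entropy(target)
print(f"entropy = {h:.3f}")
```

The same helper works for any list of hashable labels, so it can also be used to compare the entropy of different target vectors.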


14- What is “gradient descent”? [Select one answer].
◯ A) Entropy function.
◯ B) Iterative optimization algorithm.
◯ C) Recommendation algorithm.
◯ D) Classification algorithm.
◯ E) Clustering algorithm.
(2 marks)

15- Which of the following clustering methods suffer from the “convergence at local optima problem”?
[Select all the correct statements].
☐ A) K-means clustering
☐ B) DBSCAN clustering
☐ C) Agglomerative clustering
☐ D) K-medoid clustering
(2 marks)

16- Which of the following is/are correct about K-means clustering? [Select all the correct statements].
☐ A) K-means is extremely sensitive to cluster centre initializations.
☐ B) Bad initialization can lead to poor convergence speed.
☐ C) Bad initialization can lead to bad overall clustering.
☐ D) Sensitive to outliers in the data
☐ E) Deterministic method
(2 marks)

17- Which of the following statements is/are true about Random Forest and Gradient Boosting ensemble
methods? [Select all the correct statements]
☐ A) Both algorithms can be used for classification problems.
☐ B) Random Forest is used for classification problems, but Gradient Boosting is used for
regression problems.
☐ C) Random Forest is used for regression problems, but Gradient Boosting is used for
Classification problems.
☐ D) Both algorithms can be used for regression problems.
(2 marks)

18- Consider a “Random Forest” model consisting of a number of trees (say T1, T2, ..., Tn). Which of the
following is/are true about a single tree (Tk) in the Random Forest? [Select all the correct
statements]
☐ A) A single tree is built on a subset of the input features/attributes.
☐ B) A single tree is built on all the input features/attributes.
☐ C) An individual tree is built on a subset of observations/records in the dataset.
☐ D) An individual tree is built on the full set of observations/records in the dataset.
(2 marks)

19- Select the scenario(s) in which a “Gain Ratio” is preferred over “Information Gain” for training
Decision Tree Classifiers. [Select all the correct statements]
☐ A) When a categorical or ordinal attribute has a very large number of categories.
☐ B) When a categorical or ordinal attribute has a very small number of categories.
☐ C) The number of categories of a categorical or ordinal attribute is not the reason.
☐ D) When the number of the features is small.
☐ E) None of these
(2 marks)

20- Assume that you are dealing with a classification dataset with highly imbalanced classes. The
majority class is observed 99% of the time in the training data.
Your model has 99% accuracy on its predictions for the test data. Which of the following is true in
such a case? [Select all the correct statements]
☐ A) Accuracy metric is not a good idea for imbalanced class problems.
☐ B) Accuracy metric is a good idea for imbalanced class problems.
☐ C) Precision and recall metrics are good for imbalanced class problems.
☐ D) Precision and recall metrics aren't good for imbalanced class problems.
(2 marks)

21- Consider the following transaction set:

ID Items
10 Bread, Milk, Pen, Nuts
20 Bread, Nuts, Pen
30 Bread, Pen, Eggs
40 Bread, Nuts, Eggs, Milk
50 Nuts, Milk, Pen, Eggs

Given the above transaction table and “Minimum Support” s = 60%, how many frequent 3-itemsets are
there? [Select one answer only]
◯ A) 0
◯ B) 1
◯ C) 2
◯ D) 3
◯ E) 4
◯ F) 5
(2 marks)
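
As an aside, counting frequent k-itemsets by brute force can be sketched as follows. This is a minimal illustration, not the Apriori algorithm: it enumerates every candidate k-itemset and counts the transactions containing it. The helper name `frequent_itemsets` is hypothetical; the transactions are copied from the table above.

```python
from itertools import combinations

# Transactions copied from the table in the question (ID -> items).
transactions = {
    10: {"Bread", "Milk", "Pen", "Nuts"},
    20: {"Bread", "Nuts", "Pen"},
    30: {"Bread", "Pen", "Eggs"},
    40: {"Bread", "Nuts", "Eggs", "Milk"},
    50: {"Nuts", "Milk", "Pen", "Eggs"},
}


def frequent_itemsets(transactions, k, min_support):
    """Return {itemset: support} for every k-itemset with support >= min_support.

    Support is the fraction of transactions that contain the whole itemset.
    """
    baskets = list(transactions.values())
    n = len(baskets)
    items = sorted(set().union(*baskets))
    result = {}
    for candidate in combinations(items, k):
        count = sum(1 for basket in baskets if set(candidate) <= basket)
        if count / n >= min_support:
            result[candidate] = count / n
    return result


pairs = frequent_itemsets(transactions, k=2, min_support=0.6)
triples = frequent_itemsets(transactions, k=3, min_support=0.6)
print("frequent 2-itemsets:", sorted(pairs))
print("frequent 3-itemsets:", sorted(triples))
```

Exhaustive enumeration is fine for five items. Real association-rule miners prune candidates using the fact that every subset of a frequent itemset must itself be frequent.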

22- The widely used metrics and tools to assess a classification model are: [Select all the correct
statements]
☐ A) Confusion matrix.
☐ B) Cost-sensitive accuracy.
☐ C) Davies–Bouldin index.
☐ D) Area under the ROC curve.
(2 marks)
23- Which of the target/output variables has the highest entropy value? [Select one answer only]
◯ A) [a a a b b b]
◯ B) [a a a a a a a a b b b b b b b b]
◯ C) [a a a a a]
◯ D) [b b b b b b b]
◯ E) [a b c]
(2 marks)

24- You are increasing the size of the layers (more hidden units per layer) in your neural network. What
impact is this likely to have on bias and variance, respectively? [Select one answer only]
◯ A) Increases, increases
◯ B) Increases, decreases
◯ C) Decreases, increases
◯ D) Decreases, decreases.
(2 marks)

25- There are different algorithms for Association Rule Learning. What is/are the main difference(s)
between most Association Rule Learning algorithms? [Select all the correct statements]
☐ A) Different algorithms are likely to generate different rules.
☐ B) The quality of the rules
☐ C) Execution Time
☐ D) Memory requirements may be different
☐ E) Different tuning parameters
(2 marks)

26- The ABCD bank developed a fraud detection system to monitor the credit card transactions of their
clients. To train the model they used 20 million records corresponding to all the card transactions in
October 2020. Following the feature and model selection, the model was calibrated and trained. The
accuracy of the model on the test data was a staggering 99.9%. The model was deployed but after one
month in production, the model failed to flag 8,630 fraudulent activities (most of the fraudulent
activities). You were hired as a consultant to help them understand what happened with the system.
What would be the simplest explanation for the failure of this system? [select one answer only]
◯ A) The model is overfitted due to a large degree of noise in the data.
◯ B) The hyperparameters of the model were not fine-tuned appropriately.
◯ C) The dataset is extremely imbalanced, and they did not account for this fact.
◯ D) The data has not been appropriately rescaled.
◯ E) Due to the large number of dimensions (20M) the curse of dimensionality is causing all
the data points to be very close to each other in terms of their pairwise distances.
(2 marks)


27- A data scientist from an innovative, revolutionary and disruptive location-based, social media start-up
called FACEBOOT, in order to obtain better insights into the users' activity behaviours (e.g., number of
"enjoys", interaction frequencies, engagement), decides to use an unsupervised strategy to extract
different activity profiles for their users.
What is the most plausible explanation for the decision to use an unsupervised approach instead of a
supervised method to identify the groups of activity profiles? [select one answer only]
◯ A) The computational cost of running a supervised method on such a high-dimensional
dataset would make it inappropriate from a business perspective.
◯ B) The presence of categorical variables in the data can significantly impact the performance
of a supervised method.
◯ C) Given the complex nature of human behavioural traits, it is extremely unlikely that these
different activity profiles would be linearly separable.
◯ D) The dataset might not contain an attribute uniquely characterising the individual
behavioural profiles of interest.
◯ E) The decision to use an unsupervised approach cannot be justified by any of the
explanations above.
(2 marks)

28 - You run gradient descent for 15 iterations with a learning rate α = 0.3 and compute J(θ) after each
iteration. You find that the value of J(θ) decreases quickly and then levels off. Based on this, which
of the following conclusions seems most plausible? [select one answer only]
◯ A) Rather than using the current value of α, use a larger value of α (say α = 1.0).
◯ B) Rather than using the current value of α, use a smaller value of α (say α = 0.1).
◯ C) α = 0.3 is an effective choice of learning rate.
◯ D) None of the above.
(2 marks)
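
The procedure described in this question (run gradient descent for a fixed number of iterations and record the cost after each one) can be sketched as below. The quadratic cost J(theta) = theta^2, the starting point 5.0, and the names `gradient_descent`, `cost` and `grad` are all assumptions made for illustration; the question does not specify the cost function.

```python
def gradient_descent(grad, cost, theta0, alpha, n_iters):
    """Plain gradient descent on a scalar parameter.

    Returns the final parameter and the cost recorded after each iteration.
    """
    theta = theta0
    history = []
    for _ in range(n_iters):
        theta = theta - alpha * grad(theta)  # step against the gradient
        history.append(cost(theta))
    return theta, history


def cost(theta):
    # Toy convex cost chosen for illustration only.
    return theta ** 2


def grad(theta):
    # Derivative of the toy cost above.
    return 2 * theta


theta, history = gradient_descent(grad, cost, theta0=5.0, alpha=0.3, n_iters=15)
for i, j in enumerate(history, start=1):
    print(f"iteration {i:2d}: J = {j:.6f}")
```

Plotting or printing the recorded costs is the usual way to judge a learning rate: a curve that falls and flattens indicates convergence, while one that oscillates or grows suggests the step size is too large.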

29- In which of the following cases is the DBSCAN clustering algorithm likely to fail to give good
results? [Select all the correct statements]
☐ A) Data points with outliers.
☐ B) Different clusters have different density of points.
☐ C) Data points with arbitrary shapes.
☐ D) Data with too many dimensions.
☐ E) Data with small number of clusters.
(2 marks)


30- Imagine you are involved in a project which deals with binary classification problems. The confusion
matrix for one of the models that you have trained on the training set is shown below:

n = 175            Predicted Positive    Predicted Negative
Actual Positive    50                    10
Actual Negative    15                    100

Which calculation(s) is/are correct? Numbers are rounded to the 3rd decimal place. [Select all the correct
statements]
☐ A) True positive rate = 0.671
☐ B) Accuracy = 0.857
☐ C) Sensitivity = 0.833
☐ D) False positive rate = 0.0
☐ E) Precision = 0.50
(2 marks)
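
The standard metrics named in the options can be derived from the four cells of a binary confusion matrix, as in the sketch below. The function name `binary_metrics` is hypothetical; the counts are copied from the table in the question.

```python
def binary_metrics(tp, fn, fp, tn):
    """Common metrics derived from a binary confusion matrix."""
    total = tp + fn + fp + tn
    return {
        "accuracy": (tp + tn) / total,          # correct predictions / all
        "sensitivity": tp / (tp + fn),          # true positive rate (recall)
        "false_positive_rate": fp / (fp + tn),  # negatives wrongly flagged
        "precision": tp / (tp + fp),            # flagged cases that are positive
    }


# Counts read from the confusion matrix in the question.
metrics = binary_metrics(tp=50, fn=10, fp=15, tn=100)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Note that sensitivity, recall and true positive rate are three names for the same quantity.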

31- Why is it crucial for machine learning algorithms to have access to data of high quality? [select one
answer only]
◯ A) It will take too long for programmers to scrub poor data.
◯ B) If the data quality is high, algorithm development will be easier.
◯ C) Poor-quality data requires more processing power than high-quality data.
◯ D) If the data quality is poor, you will get inaccurate results.
(2 marks)

32- Your insurance company wants to use machine learning to predict whether existing car insurance
customers are more likely to buy life insurance. It created a model to better predict the best customers
to contact about life insurance, and the prediction model had a low variance but high bias. What does that
tell you about the prediction model? [select one answer only]
◯ A) The model is consistently wrong.
◯ B) The model is inconsistently wrong.
◯ C) The model is consistently right.
◯ D) The model is equally right and wrong.
(2 marks)


33- You have been asked to identify global weather patterns that may have been affected by changes in
the climate. To achieve this task, you want to use machine learning algorithms to find patterns in the data
that would otherwise be imperceptible (not known) to a human meteorologist (weather forecast expert).
What would be a possible starting point? [select one answer only]
◯ A) Find labelled data of sunny days so that the machine models will learn to identify bad
weather.
◯ B) Use unsupervised learning methods to look for anomalies in a massive weather database.
◯ C) Create a training dataset of unusual patterns and ask the machine learning algorithms to
classify them.
◯ D) Create a training dataset of normal weather and have the machine look for similar
weather patterns.
(2 marks)

34- During the training process of a neural network, the loss function stagnates after a number of epochs.
Why might this be happening? [Select all the correct statements]
☐ A) Learning rate is too small.
☐ B) Learning rate is too large.
☐ C) The model is stuck in a local minimum.
☐ D) Regularization parameter is too large.
(2 marks)

35- You work for the government team that is tracking the community spread of the COVID-19 virus. The
data science team created a smartwatch app that uploads body temperature data from thousands of
participants. The data collected was combined with the participants’ COVID-test results (positive or
negative). What is the best technique to analyse the data? [select one answer only]
◯ A) Use reinforcement learning to reward the system when a new person participates.
◯ B) Unsupervised machine learning to cluster together people based on patterns the machine
discovers.
◯ C) Supervised machine learning to sort people based on their demographic data.
◯ D) Supervised machine learning to classify people by body temperature.
(2 marks)

(Total 70 marks)


Section 2: True or False with Justification
• In the justification section you are expected to list the reasons, give examples etc. to justify your
answer.
• You must clearly answer the question with True or False and provide the justification.


1- In association rule learning, the following two rules will always have the same support value.
Rule 1: {item_3, item_2} -> {item_1}
Rule 2: {item_1, item_3} -> {item_2}
True or False? Justify.
(3 marks)


2- k-means clustering cannot be used to perform hierarchical clustering as it requires k (number of
clusters) as an input parameter.
True or False? Justify.
(3 marks)


3- You should apply standardization / normalization on your data before applying the Random
Forest classification algorithm.
True or False? Justify.
(3 marks)


4- The “recall” measure is not enough to give a full picture of the classifier performance.
True or False? Justify.
(3 marks)


5- K-means clustering algorithm is less sensitive to outliers than the K-medoid clustering algorithm.
True or False? Justify.
(3 marks)


6- If you increase the number of hidden layers of a multilayer perceptron, the classification error
will be reduced.
True or False? Justify.
(3 marks)


7- Decision trees are considered as unstable classifiers.
True or False? Justify.
(3 marks)

8- Imputation methods such as the mean and median could be used to replace any type of missing values.
True or False? Justify.
(3 marks)


9- Decision Tree (DT) algorithms should not be applied to data that has missing values because the
DT algorithm will not be able to split the attribute that has missing values.
True or False? Justify.
(3 marks)


10- Silhouette analysis and Elbow methods could be used to identify individual points/records that
should be in another class.
True or False? Justify.
(3 marks)


(Total 30 marks)








(Total Exam 100 marks)








End of Paper

