CSCI 567 Summer 2018 Sample Midterm

DO NOT OPEN EXAM UNTIL INSTRUCTED TO DO SO

Name: ______________________    ID #: ______________________

Read the following instructions carefully; you will be expected to conform to them during the exam.

• Multiple-choice questions must be answered by filling in the appropriate segments on the Scantron form provided, with a #2 pencil. These questions might not be weighted equally, and partial credit may be available on some of them.
• Free-response questions should be answered concisely. Excessive writing will incur a mark reduction. Derivations should be short; there is no need to prove or derive extensively, and no derivation will require more than 10 lines.
• Write legibly so the grader can read your answers without misunderstanding. Avoid cursive writing.
• You must answer each free-response question on the page provided. We will not provide blank new copies of pages.
• The duration of the exam is 90 minutes. Please budget your time on each question accordingly.
• You may not leave your seat during the exam for any reason. If you leave your seat, you may be required to submit your exam at that moment.
• The exam has a total of 7 physical pages (including this cover), 19 multiple-choice questions, and 3 free-response questions. Each question may have sub-questions. Once you are permitted to open your exam (and not before), you should count the pages to ensure that you have the right number.
• This is a closed-book exam. Consulting classmates, the Internet, or other resources is NOT permitted. You may not use any electronics for any reason or say anything to a classmate for any reason.
• There are some spaces in the packet that can be used for scratch work. No additional scratch paper will be provided.
• When the proctor tells you the test is over, you must immediately cease writing and close the packet. Continuing to write after the exam is over will be reported for academic discipline, including at minimum an F in the class.
Question/Section     Possible Points
Multiple choice      19
Free Response 1      3
Free Response 2      3
Free Response 3      5

This is the inside cover of the exam. You may use this as scratch paper if needed. Your packet must remain attached throughout the exam; you may not remove this page for any reason.

Naive Bayes

1. The Super Bowl MVP award has been given out 53 times, although some individuals have won it more than once. We can break down the data for the award by where the winner went to college; suppose we have the following incomplete data for fifteen (15) instances of the award:

University        Times an alum has won Super Bowl MVP
Michigan          5
USC               3
Florida State     2
Stanford          2
Louisville        1
UCLA              1
West Virginia     1

Suppose I want to use this (incomplete) data and a Naive Bayes approach to estimate the probability that someone from USC wins the next Super Bowl MVP award. What will my probability estimate be?
(a) 3/15
(b) 15/53
(c) 3/53
(d) 3/7
(e) 1/7

Regularization

Which of the following can be used as regularization to control model complexity? For questions 2 and 3, answer yes by filling in (A) for the appropriate Scantron row, or no by filling in (B).

2. $R(w) = \|w\|_1 = \sum_{d=1}^{D} |w_d|$

3. $R(w) = \sum_{d=1}^{D} w_d^3$

Concept Learning

For this problem, we will use the "sunburned?" domain from June 12's discussion. The attributes in this problem were:

• Hair Color, which could be blonde, brown, or red (3 choices)
• Height, which could be short, average, or tall (3 choices)
• Weight, which could be light, average, or heavy (3 choices)
• Lotion?, indicating whether this person applied sunscreen, which could be yes or no (2 choices)

4. One of the following is not in the version space bounded by S = 〈blonde, ?, ?, no〉 and G = 〈?, ?, ?, ?〉 for the Candidate-Elimination algorithm. Which is it?
(a) 〈blonde, average, light, no〉
(b) 〈blonde, tall, light, no〉
(c) 〈blonde, average, heavy, no〉
(d) 〈red, average, average, no〉
(e) 〈blonde, average, light, no〉

5. Given the above attributes, how many syntactically distinct hypotheses are in the hypothesis space?
(a) 5 · 5 · 5 · 4
(b) 1 + 4 · 4 · 4 · 3
(c) 3 · 3 · 3 · 2

6. Given the above attributes, how many semantically distinct hypotheses are in the hypothesis space?
(a) 5 · 5 · 5 · 4
(b) 1 + 4 · 4 · 4 · 3
(c) 3 · 3 · 3 · 2

K-Nearest Neighbors Classifier

Questions 7 through 13 deal with the following data, where squares, triangles, and open circles are three different classes of data in the training set, and the diamond (♦) and star (*) are test points.

[Figure: scatter plot of the 13 training points and the two test points; both axes run from 1 to 3.]

7. Suppose I use all the training data as the full validation set (disregarding the two specially marked test items) to classify the data using KNN with k = 1. How many of the 13 points will be correctly classified?
(a) 0  (b) 1  (c) 11  (d) 12  (e) 13

8. For the KNN classifier with k = 1, how many training data points will be misclassified with the leave-one-out heuristic?
(a) 0  (b) 1  (c) 11  (d) 12  (e) 13

9. What is the smallest value of k that always classifies the diamond as a triangle?
(a) 1  (b) 3  (c) 5  (d) 7  (e) 9

[The same scatter plot is repeated here for reference; both axes run from 1 to 3.]

10. What is the smallest value of k that classifies the star as an open circle?
(a) 1  (b) 3  (c) 5  (d) 7  (e) cannot be done

11. What label will we predict for the diamond with k = 1?
(a) Square  (b) Triangle  (c) Circle  (d) Cannot be determined from the data available.

12. What label will we predict for the star with k = 3?
(a) Square  (b) Triangle  (c) Circle  (d) Cannot be determined from the data available.

13. What label will we predict for the star with k = 5?
(a) Square  (b) Triangle  (c) Circle  (d) Cannot be determined from the data available.

Decision Tree

Questions 14 through 19, as well as the two free-response questions in this section, deal with the data listed in Table 1, which contains some information about several houses on the market.
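As an aside on questions 7 and 8 above: the two evaluation protocols differ only in whether the query point is allowed to remain in the training set when it is classified. A minimal sketch of both, with purely made-up coordinates (the actual exam points exist only in the figure and are not reproduced here):

```python
import numpy as np

# Hypothetical stand-in data; the real points are only in the exam's figure.
X = np.array([[1.0, 1.0], [1.2, 1.1], [2.8, 2.9],
              [3.0, 2.7], [1.1, 2.9], [1.3, 2.8]])
y = np.array([0, 0, 1, 1, 2, 2])  # three classes, as in the plot

def knn_predict(X_train, y_train, x, k):
    """Majority vote among the k nearest training points (Euclidean)."""
    dists = np.linalg.norm(X_train - x, axis=1)
    nearest = y_train[np.argsort(dists)[:k]]
    return np.bincount(nearest).argmax()

# Question 7's protocol: validate on the training set itself. With k = 1,
# each point is its own nearest neighbor at distance 0, so it is always
# classified by its own label.
train_correct = sum(knn_predict(X, y, X[i], 1) == y[i] for i in range(len(X)))

# Question 8's protocol: leave-one-out removes the query point first, so the
# vote comes from genuinely different points.
loo_errors = sum(
    knn_predict(np.delete(X, i, axis=0), np.delete(y, i), X[i], 1) != y[i]
    for i in range(len(X))
)
print(train_correct, loo_errors)
```

With these made-up, well-separated clusters both protocols happen to agree; on the exam's actual data they need not, which is the point of the two questions.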
Table 1: List of houses

House   Lot (sq ft)   Built year   Price ($)
A       10454         1970         535k
B       6969          1977         375k
C       6524          1910         350k
D       11019         1975         495k
E       10500         1993         520k
F       5719          2005         395k
G       4356          2000         440k

We would like to build a decision tree classifier for the house prices. For the following two questions, consider the following binary classification problem: whether or not the price of a house is higher than $500k.

Tree 1: the root splits on lot size (sq ft) into "≤ 9000" and "> 9000". The "> 9000" branch is a leaf; the "≤ 9000" branch splits on built year into two leaves, "≤ 1980" and "> 1980".

Tree 2: the root splits on lot size (sq ft) into "≤ 9000" and "> 9000". The "> 9000" branch is a leaf; the "≤ 9000" branch splits on built year into three leaves, "≤ 1973", "> 1973 and ≤ 1983", and "> 1983".

14. How many points in the training data are correctly classified by tree 1, assuming the leaf nodes have the best-possible uniform answer in them?
(a) 3  (b) 4  (c) 5  (d) 6  (e) 7

15. How many points in the training data are correctly classified by tree 2, assuming the leaf nodes have the best-possible uniform answer in them?
(a) 3  (b) 4  (c) 5  (d) 6  (e) 7

Free Response Question 1 (3 points)

Given an impurity measure $Q_T(m)$ for leaf $m$ of tree $T$, let us define the tree comparison measure as

$$C_T = \sum_m Q_T(m) + \lambda |T|,$$

where $\lambda$ is a regularization parameter and $|T|$ is the number of leaf nodes. If we use the Gini index as the impurity measure, find the value of $\lambda$ such that these two trees are equally good. (Gini index for leaf $m$ with $K$ classes: $\sum_{k=1}^{K} \hat{p}_{mk}(1 - \hat{p}_{mk})$.)

Boosting

We would like to use boosting with decision stumps to classify the house prices. Given a set of weak classifiers $\mathcal{H}$, in which each weak classifier is a mapping from $x \in \mathbb{R}^D$ to $y \in \{+1, -1\}$, we would like to construct a strong classifier as

$$h(x) = \mathrm{sign}\left[\sum_{t=0}^{T-1} \beta_t h_t(x)\right],$$

where $h_t(x) \in \mathcal{H}$ for $t = 0, 1, \ldots, T-1$, and $\mathrm{sign}(a) = 1$ if $a \geq 0$ and $-1$ if $a < 0$.

For the remaining questions in this section, consider the following binary classification problem: whether or not the price of a house is higher than $400k. The scatter plot of houses is shown in Figure 1.
Note that any dimension (other than price, obviously) from Table 1 is a possible dimension for our boosting algorithm.

[Figure 1: Scatter plot of houses with an example decision stump.]

What are the first two decision stumps? Note that Figure 1 shows an example stump; your stumps should be found starting from an empty ensemble. The example stump is on the attribute "built year" and lies between houses A and C.

16. On which attribute is decision stump 1 based?
(a) lot  (b) built year

17. Decision stump 1 lies between two houses. Which ones?
(a) A and B  (b) A and C  (c) F and C  (d) E and G  (e) F and G

18. On which attribute is decision stump 2 based?
(a) lot  (b) built year

19. Decision stump 2 lies between two houses. Which ones?
(a) A and B  (b) A and C  (c) F and C  (d) E and G  (e) F and G

Free Response Question 2 (3 points)

What is the minimum number of decision stumps needed for boosting if we would like to classify the given samples perfectly? Why?

Linear Regression with Input-Dependent Noise (5 points)

In linear regression, we have a model in which the output depends linearly on the input with respect to the parameters:

$$y = w^\top x + \epsilon.$$

We also saw that if the noise $\epsilon$ follows a normal distribution $\mathcal{N}(0, \sigma^2)$, maximizing the likelihood leads to the sum-of-squares loss formulation. Here, we will consider a simple case where $y$ and $x$ are scalars, $x, y \in \mathbb{R}$, but the noise depends on the input such that $\epsilon \sim \mathcal{N}(0, \sigma^2 x^2)$, i.e., the standard deviation scales linearly with the input. We observe $N$ independent training samples, denoted $\mathcal{D} = \{(x_i, y_i)\}_{i=1}^N$. The model is

$$y = wx + \epsilon.$$

Information: $x \sim \mathcal{N}(\mu, \sigma^2) \implies p(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\left\{-\frac{(x-\mu)^2}{2\sigma^2}\right\}$

3.1 Write an expression for the likelihood of the data, $p(\mathcal{D}|w)$, in terms of $p(x)$, $\sigma$, $x_i$, $y_i$, and $w$.

3.2 Find the maximum likelihood estimate $w_{\mathrm{ML}}$ of the weight $w$.

3.3 Assuming that the prior distribution of $w$ is $\mathcal{N}(0, \alpha^2)$, express the posterior distribution of $w$, i.e., $p(w|\mathcal{D})$, in terms of $p(x)$, $\sigma$, $\alpha$, $x_i$, $y_i$, and $w$.

3.4 Find the MAP estimate $w_{\mathrm{MAP}}$ of the weight $w$.
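As a sanity check on where the algebra in 3.1–3.4 lands (a sketch, not an official answer key, and assuming all $x_i \neq 0$): dividing each residual by $x_i$ turns the heteroscedastic problem into a homoscedastic one in the ratios $y_i/x_i$, giving

```latex
\log p(\mathcal{D} \mid w)
  = \mathrm{const} - \sum_{i=1}^{N} \frac{(y_i - w x_i)^2}{2\sigma^2 x_i^2},
\qquad
w_{\mathrm{ML}} = \frac{1}{N} \sum_{i=1}^{N} \frac{y_i}{x_i},
\qquad
w_{\mathrm{MAP}} = \frac{\sum_{i=1}^{N} y_i / x_i}{N + \sigma^2/\alpha^2}.
```

Note that $w_{\mathrm{MAP}}$ is just $w_{\mathrm{ML}}$ shrunk toward the prior mean 0 by the factor $N/(N + \sigma^2/\alpha^2)$.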
3.5 Show that the MAP estimate $w_{\mathrm{MAP}}$ is biased, whereas the ML estimate $w_{\mathrm{ML}}$ is not.

Information: an estimate $\hat{w}$ is biased if $\mathbb{E}[\hat{w}] \neq w$.
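The bias claim in 3.5 can also be checked numerically. The sketch below uses arbitrarily chosen values for $w$, $\sigma$, $\alpha$, and the inputs (the exam keeps all of these symbolic), together with the closed forms $w_{\mathrm{ML}} = \frac{1}{N}\sum_i y_i/x_i$ and $w_{\mathrm{MAP}} = \sum_i (y_i/x_i) / (N + \sigma^2/\alpha^2)$ that follow from this model's likelihood and prior:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical settings; the exam leaves these symbolic.
w_true, sigma, alpha = 2.0, 0.5, 1.0
x = rng.uniform(0.5, 3.0, size=8)  # fixed, nonzero inputs
N = len(x)
trials = 20000

w_ml_samples, w_map_samples = [], []
for _ in range(trials):
    # Input-dependent noise: eps_i ~ N(0, (sigma * x_i)^2)
    y = w_true * x + rng.normal(0.0, sigma * np.abs(x))
    r = y / x  # r_i = w + eps_i / x_i, so r_i ~ N(w, sigma^2)
    w_ml_samples.append(r.mean())                              # w_ML
    w_map_samples.append(r.sum() / (N + sigma**2 / alpha**2))  # w_MAP

w_ml_mean = np.mean(w_ml_samples)    # Monte Carlo estimate of E[w_ML]
w_map_mean = np.mean(w_map_samples)  # Monte Carlo estimate of E[w_MAP]
print(w_ml_mean, w_map_mean)
# Analytically: E[w_ML] = w, while E[w_MAP] = N*w / (N + sigma^2/alpha^2),
# i.e., the MAP estimate is systematically shrunk toward 0.
```

With these settings the ML average lands near the true $w = 2.0$, while the MAP average lands near $2.0 \cdot 8 / 8.25 \approx 1.94$, illustrating the bias.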