CSCI 561 Foundations of Artificial Intelligence, Spring 2019
Final Exam (Version 1.2, with Solutions)
Please print neatly.

Instructions:
1. Date: May 1, 2019, 4:30-6:20 PM PT.
2. Maximum points for this exam: 100.
3. The percentages for each question are indicated in square brackets [ ].
4. No books, cell phones, calculators (or any other material) are allowed.
5. Write down your name, student ID, and USC email address.
6. Write answers ONLY in designated areas.
7. Do NOT write on the 2D QR code.
8. The back of the pages will NOT be graded. You may use it for scratch paper.
9. No questions during the exam. If something is unclear to you, write it down on your exam paper.
10. Be brief: a few words are often enough if they are precise.
11. When finished, raise completed exam sheets until approached by a proctor.
12. Adhere to the Academic Integrity Code.

Problems (100 percent total):
General AI Knowledge 1 (TF): 10
General AI Knowledge 2 (MC): 20
Markov Decision Process: 20
Bayesian Network: 20
Decision Tree: 12
Neural Networks: 18

1. [10%] General AI Knowledge: True/False
For each of the statements below, answer T if the statement is always true, or F otherwise. Each question is worth 1 point, no partial credit.

T 1. The outcome of a Markov Decision Process is dependent on both chance and a decision maker's choice.
F 2. Overfitting is when a machine learning hypothesis space has too few dimensions.
T 3. Naive Bayes assumes conditional independence between input variables given the output variable.
F 4. Uniform Cost Search is a special case of Depth-First Search.
F 5. Simulated Annealing is a probabilistic algorithm that guarantees convergence on a global minimum (or maximum) solution.
T 6. A logical agent can behave rationally even in a partially observable environment.
T 7. (A∧B) ∨ ¬(A ⇒ B) is satisfiable.
F 8. A Horn clause is a disjunction of literals with exactly one positive literal.
F 9.
The k-nearest neighbors algorithm (k-NN) is a parametric method that can be used for classification and regression.
T 10. The hill climbing algorithm is equivalent to simulated annealing with a constant temperature of 0.

2. [20%] General Knowledge: Multiple Choice
Each question has one or more correct answers. Circle all correct answers. There is no partial credit: you will receive full credit if and only if you choose all of the correct answers and none of the wrong answers. Each question is worth 2 points.

1. Given two admissible heuristic functions h1(s) and h2(s), which of the following is also an admissible heuristic function?
A. h1(s) + h2(s)
B. h1(s) * h2(s)
C. max(h1(s), h2(s))
D. min(h1(s), h2(s))
E. None of the above

2. Which of the following statements about Bayesian networks are true?
A. A Bayesian network can be a directed and acyclic graph
B. A Bayesian network can be an undirected and cyclic graph
C. A Bayesian network can be an undirected and acyclic graph
D. A Bayesian network can be a fully connected graph
E. A Bayesian network can be a graph with no edges

3. What substitutions result from unifying P(x, Messi, y) with P(Neymar, F(y), z)?
A. {x/Neymar, Messi/F(y), y/z}
B. {x/Messi, y/Neymar, F/z}
C. {x/Neymar, F(y)/Messi, y/z}
D. {x/Neymar, y/Messi, z/Messi}
E. These literals fail to unify

4. Which of the following arc consistency outcomes is insufficient to draw an unambiguous conclusion?
A. One domain is empty
B. Each domain has a single value
C. One domain has more than one value
D. Some domains have more than one value
E. None of the above

5. Consider p = "If Vijay is Alice's brother, then Alice is Bob's aunt" and q = "Bob is Vijay's nephew." Which of the following statements is equivalent to p ⇒ q?
A. If Bob is Vijay's nephew, then Vijay is Alice's nephew and Alice is not Bob's aunt.
B. If Bob is Vijay's nephew, then Vijay is Alice's brother and Alice is Bob's aunt.
C.
If Bob is not Vijay's nephew, then Vijay is not Alice's brother or Alice is not Bob's aunt.
D. If Bob is not Vijay's nephew, then Vijay is not Alice's brother and Alice is Bob's aunt.
E. If Bob is not Vijay's nephew, then Vijay is Alice's brother and Alice is not Bob's aunt.

6. Consider using k-nearest neighbors with k = 3 to classify inputs with two dimensions, using Manhattan distance, defined as:
dist((a, b), (c, d)) = |a - c| + |b - d|
Consider the following data set:
(0, 0), FALSE
(1, 0), TRUE
(0, 1), TRUE
(2, 1), FALSE
(3, 0), FALSE
(3, 1), TRUE
Which of the following new inputs would be labeled as TRUE?
A. (0, 0)
B. (0, 1)
C. (1, 0)
D. (1, 1)
E. (2, 0)
F. (3, 0)

7. What is the English equivalent of "¬∀x(Computer(x) ⇒ Intelligent(x))"?
A. Not every computer is intelligent
B. There is no intelligent computer
C. Being a computer implies not being intelligent
D. Some computers are not intelligent
E. None of the above

8. Consider the game tree illustrated below, where A-F are integers. Assume the nodes are explored from left to right and alpha-beta pruning is used. Which of the following is true?
A. If A = 3, B can be pruned
B. If A = 6, B can be pruned
C. If A = 5, B = 5, then the subtree containing C and D can be pruned
D. If A = 5, B = 5, C = 3, D = 4, then the subtree containing E and F can be pruned
E. There are SOME values of A and B such that the subtree containing C and D can be pruned

9. Consider the following Bayesian network. Which of the following statements are true?
A. A and B are independent
B. A and B are conditionally independent given C
C. D and E are independent
D. D is conditionally independent of G given C
E. H is dependent on D and E

10. A CSCI 561 student got an internship at a movie studio in Silver Lake. She learned that her job will involve adding colors to old movies, currently a process requiring manual touchup frame-by-frame. Recalling the lessons from her AI class, she decided to use deep learning.
The state-of-the-art auto-colorization technique requires object detection, image classification, and a multilayer convolutional neural network (CNN). Which of the following AI techniques will this aspiring movie editor use?
A. Supervised
B. Unsupervised
C. Semi-supervised
D. Hierarchical
E. None of the above
** We will accept either ABD or ABCD **

3. [20%] Markov Decision Process
Consider the 3 × 3 grid world below. At each state, the agent may have a choice of three deterministic actions: Up, Down, or Right. Actions that would move the agent into a wall are not allowed. For example, in state (1, 3), the only available actions are Down and Right. The goals (terminal states) with rewards are in states (3, 1) and (3, 3). The agent gets a reward of R(3, 1) = 4 if it's at (3, 1) and a reward of R(3, 3) = 8 if it's at (3, 3). The discount factor is γ = 1/2, and the reward is R(s) = 0 for each nonterminal state s.

1. [5%] On the figure below, show the optimal value V* and the corresponding optimal action for each state. (In any state, if multiple actions result in the same optimal value, just write down one of them as the optimal action.)
Optimal values:
Optimal actions:
Solution:
Note:
- For optimal actions, arrows (→, ↓, etc.) or letters (R, D, etc.) instead of English words are OK.
- For states (1, 2) and (2, 2), either UP or RIGHT is acceptable.
- 2 points for each figure. The last 1 point is a gift.
- For each figure, partial credit (1 point) if any grid is incorrect.

2. [5%] What POSITIVE values of γ would make the optimal action in state (2, 1) be UP, assuming no other changes to the MDP? Use fractions (no decimal points) to show the final answer.
Solution: For the action in state (2, 1) to change from Right to Up:
γ³ R(3, 3) > γ R(3, 1) and 1 ≥ γ > 0 (3 points)
1 ≥ γ > √2/2 (2 points)
Note:
- Please check the final answer first. If it's correct, most likely it gets full credit.
- If they use values instead of symbols, it's OK.
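The γ threshold above can be cross-checked numerically. The sketch below (illustrative only, not part of the official solution) compares the discounted return of going Up from state (2, 1), which reaches the reward-8 goal in three steps, against going Right, which reaches the reward-4 goal in one step:

```python
import math

# Sanity check: from state (2, 1), Up is optimal exactly when the
# three-step discounted return to the reward-8 goal beats the
# one-step discounted return to the reward-4 goal.
def up_beats_right(gamma):
    return gamma**3 * 8 > gamma * 4

threshold = math.sqrt(2) / 2  # the claimed boundary, sqrt(2)/2 ≈ 0.707
print(up_beats_right(0.71))   # just above the threshold
print(up_beats_right(0.70))   # just below the threshold
```

Solving γ³ · 8 > γ · 4 for positive γ gives γ² > 1/2, i.e., γ > √2/2, in agreement with the answer.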
For example, γ³ · 8 > γ · 4.
- Giving only γ > √2/2 without 1 ≥ γ is OK.
- If a student only gives a single possible value of γ, give 1 point.

3. [5%] What values of R(s) for nonterminal states s would make the optimal action in state (2, 1) be UP, assuming the original discount factor (γ = 1/2) and no other changes to the MDP? Use fractions (no decimal points) to show the final answer.
Solution: For the action in (2, 1) to change from Right to Up:
R(s) + γ R(s) + γ² R(s) + γ³ R(3, 3) > R(s) + γ R(3, 1) (3 points)
R(s) > 4/3 (2 points)
Note:
- Please check the final answer first. If it's correct, most likely it gets full credit.
- If they use values instead of symbols, it's OK. For example, R(s) + 1/2 R(s) + 1/4 R(s) + 1/8 · 8 > R(s) + 1/2 · 4.
- If a student only gives a single possible value of R(s), give 1 point.

4. [5%] In this question, we consider only state (3, 2) and the two goals from the original MDP. The environment is also no longer deterministic: with probability p, the agent moves in the direction it selects; with probability (1 - p), it moves opposite to the intended direction. What values of p would make the optimal action in state (3, 2) be DOWN? Use fractions (no decimal points) to show the final answer.
Solution: For the action in (3, 2) to change from Up to Down:
p γ R(3, 3) + (1 - p) γ R(3, 1) < p γ R(3, 1) + (1 - p) γ R(3, 3) and p ≥ 0 (3 points)
1/2 > p ≥ 0 (2 points)
Note:
- Please check the final answer first. If it's correct, most likely it gets full credit.
- Again, as in the previous questions, both symbols and values are OK: p · 1/2 · 8 + (1 - p) · 1/2 · 4 < p · 1/2 · 4 + (1 - p) · 1/2 · 8.
- Only p < 1/2 without p ≥ 0 is OK.
- If a student only gives a single possible value of p, give 1 point.

4. [20%] Bayesian Network
Consider the following Bayesian network, where the variables O, B, F, Q, and G have the following meaning: O: hasOwl, B: isBrave, F: likesFlyingClass, Q: likesQuidditch, G: isGryffindorStudent.

1.
[6%] Mark the following equations as True/False (T/F) based on whether they hold in this Bayesian network.
P(G | F, Q, B) = P(G | O, F, Q, B)
P(F, B | O) = P(F | O) P(B | O)
P(O | F, Q, B) = P(O | F, Q, B, G)
Solution:
T (descendant; pp. 517-518)
T (conditionally independent)
T (Markov blanket; pp. 517-518)
Note:
- 2 pts for each.

2. [4%] Calculate the value of P(O+, B-, F+, Q+, G-). Show your work. Use decimal points without rounding in the final answer.
Solution:
P(O+, B-, F+, Q+, G-)
= P(O+) P(B- | O+) P(F+ | O+) P(Q+ | O+, F+) P(G- | Q+, B-)
= 0.5 * (1 - 0.4) * 0.9 * 0.8 * (1 - 0.4)
= 0.5 * 0.6 * 0.9 * 0.8 * 0.6
= 0.1296
Note:
- 2 pts for P(O+) P(B- | O+) P(F+ | O+) P(Q+ | O+, F+) P(G- | Q+, B-). No partial points.
- 2 pts for the final value. No partial points.

3. [2%] Write down the statement/inequality corresponding to the following English sentence: "A Brave Gryffindor student who does not like Flying class would be more likely to have an Owl than not."
Solution:
P(O+ | B+, F-, G+) > P(O- | B+, F-, G+)
or P(O+ | B+, F-, G+) > 0.5
Note:
- No partial points.

4. [8%] Is the statement/inequality in Q4.3 true? Why? Please show your calculation.
Solution:
P(O | B+, F-, G+) = α P(O, B+, F-, G+)
= α ( P(O, B+, F-, Q+, G+) + P(O, B+, F-, Q-, G+) )
= α (
< P(O+) P(B+ | O+) P(F- | O+) P(Q+ | O+, F-) P(G+ | Q+, B+), P(O-) P(B+ | O-) P(F- | O-) P(Q+ | O-, F-) P(G+ | Q+, B+) > +
< P(O+) P(B+ | O+) P(F- | O+) P(Q- | O+, F-) P(G+ | Q-, B+), P(O-) P(B+ | O-) P(F- | O-) P(Q- | O-, F-) P(G+ | Q-, B+) > )
= α ( < 0.5 * 0.4 * (1 - 0.9) * 0.4 * 0.9, (1 - 0.5) * 0.2 * (1 - 0.2) * 0.2 * 0.9 >
    + < 0.5 * 0.4 * (1 - 0.9) * (1 - 0.4) * 0.8, (1 - 0.5) * 0.2 * (1 - 0.2) * (1 - 0.2) * 0.8 > )
= α ( < 0.0072, 0.0144 > + < 0.0096, 0.0512 > )
= α < 0.0168, 0.0656 >
≈ < 0.2, 0.8 > (normalization)
Since 0.2 < 0.8 (i.e., 0.0168 < 0.0656), we have P(O+ | B+, F-, G+) < P(O- | B+, F-, G+), so the statement/inequality in Q4.3 is false.
Note:
Alternative representation:
P(O+ | B+, F-, G+) = P(B+, F-, G+, O+) / P(B+, F-, G+)
= P(B+, F-, G+ | O+) P(O+) / ( P(B+, F-, G+ | O+) P(O+) + P(B+, F-, G+ | O-) P(O-) )
= ... < 1/2
or P(O- | B+, F-, G+) = 1 - P(O+ | B+, F-, G+) > P(O+ | B+, F-, G+)
- A correct formulation with an incorrect final answer can get partial points.
- Answering "it's False" or "it's not True" gets 2 pts.
- The calculation of P(O+ | B+, F-, G+): 3 pts (2 pts for the symbolic expression and calculation, 1 pt for the final value).
- The calculation of P(O- | B+, F-, G+): 3 pts (2 pts for the symbolic expression and calculation, 1 pt for the final value).

5. [12%] Decision Tree
Daniel wants to watch the movie Avengers: Endgame. Before he invites his friends, he wants to predict their answers. The table below shows 4 input features (x0-x3) and the results (y) from last year, when he invited friends to Infinity War. Help him build the decision tree and find the candidate(s) he should invite.

   x0 Close Friend | x1 Marvel Fan | x2 Happy | x3 Have Exam | y Let's go!
1  False | True  | True  | True  | True
2  False | False | False | True  | False
3  False | False | True  | False | False
4  False | True  | False | True  | True
5  True  | False | True  | False | True
6  False | True  | False | True  | True
7  True  | True  | True  | True  | False
8  True  | True  | True  | True  | False
9  True  | False | True  | True  | False

1. [3%] Draw a complete decision tree that maximizes the information gain at its branches and correctly classifies all the examples. Put the False attribute value on the left side and True on the right side. You just need to write x0/x1/x2/x3 in the node; don't write down the feature names.
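The label entropy of this data set and the information gain of a root split can be verified with a short script. The sketch below (illustrative only, using base-2 logarithms) computes both quantities for a split on x0:

```python
from math import log2

# Labels y for the 9 examples in the table above (True = "Let's go!")
y = [True, False, False, True, True, True, False, False, False]
# x0 (Close Friend) for the same examples
x0 = [False, False, False, False, True, False, True, True, True]

def entropy(labels):
    # Shannon entropy of a Boolean label list, in bits
    n = len(labels)
    out = 0.0
    for v in (True, False):
        p = sum(1 for label in labels if label == v) / n
        if p > 0:
            out -= p * log2(p)
    return out

H = entropy(y)  # -(5/9)log(5/9) - (4/9)log(4/9) ≈ 0.9911

# Information gain of splitting on x0: entropy minus the
# size-weighted entropy of the two branches
split = {v: [yi for xi, yi in zip(x0, y) if xi == v] for v in (False, True)}
remainder = sum(len(s) / len(y) * entropy(s) for s in split.values())
gain = H - remainder  # ≈ 0.0911
print(round(H, 4), round(gain, 4))
```

These decimals match the values the grading notes rule out as final answers (0.9911 and 0.0911), confirming the closed-form expressions.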
(NO partial credit)
There are three kinds of trees; any of them gets full marks. For 5.1 we will also accept trees with x2 as the root for partial credit. Note that we are not expecting calculations of the values. Expressions in log and fraction formats are acceptable.

2. [3%] Calculate the entropy of the given data. Write down your answer leaving fractions and logarithms as is (no decimal points).
-(5/9) log(5/9) - (4/9) log(4/9)
(0.9911 is not an acceptable answer)

3. [3%] Calculate the information gain of the branch at the root node. Write down your answer leaving fractions and logarithms as is (no decimal points).
-(5/9) log(5/9) - (4/9) log(4/9) + (5/9) ( (2/5) log(2/5) + (3/5) log(3/5) ) + (4/9) ( (1/4) log(1/4) + (3/4) log(3/4) )
(0.0911 is not an acceptable answer)

4. [3%] Predict how the following friends will respond according to your decision tree.
   x0 Close Friend | x1 Marvel Fan | x2 Happy | x3 Have Exam | y Let's go!
1  True  | False | False | True  | False
2  False | True  | False | False | True
3  True  | False | True  | False | True

6. [18%] Neural Networks
Given the truth tables and neural network structures below, write the weights and biases in the empty squares on the edges so that the network is equivalent to the required logic gate. Assume the following activation function:

1a. [3%] NAND Gate
Solution: The NAND gate is 0 only if both inputs are 1.
Note:
- No partial credit.
- There is more than one correct solution; a solution is correct as long as it satisfies the truth table.

1b. [3%] Assuming the initial weights are all 1 and the learning rate is 0.5, what would the weights be after training on the correct value for x1 = 1 and x2 = 1? Please show your work below.
(You must show the updated results of W0, W1, and W2.)
Solution:
x1 = 1, x2 = 1, y = 0, hw(x) = 1
Update rule: wi ← wi + α (y - hw(x)) * xi
1st iteration:
W0 = 1 + 0.5(0 - 1)*1 = 0.5
W1 = 1 + 0.5(0 - 1)*1 = 0.5
W2 = 1 + 0.5(0 - 1)*1 = 0.5
Output = 1
2nd iteration:
W0 = 0.5 + 0.5(0 - 1)*1 = 0
W1 = 0.5 + 0.5(0 - 1)*1 = 0
W2 = 0.5 + 0.5(0 - 1)*1 = 0
Output = 1
3rd iteration:
W0 = 0 + 0.5(0 - 1)*1 = -0.5
W1 = 0 + 0.5(0 - 1)*1 = -0.5
W2 = 0 + 0.5(0 - 1)*1 = -0.5
Output = 0 = y
Note:
- Full credit if the 1st iteration is correct; do not deduct if they do 2nd or 3rd iterations.

2a. [3%] NOR Gate
Solution: The NOR gate is 1 only if both inputs are 0.
Note:
- No partial credit.
- There is more than one correct solution; a solution is correct as long as it satisfies the truth table.

2b. [3%] Assuming the initial weights are all -1 and the learning rate is 0.5, what would the weights be after training on the correct value for x1 = 0 and x2 = 0? Please show your work below. (You must show the updated results of W0, W1, and W2.)
Solution:
x1 = 0, x2 = 0, y = 1, hw(x) = 0
Update rule: wi ← wi + α (y - hw(x)) * xi
1st iteration:
W0 = -1 + 0.5(1 - 0)*1 = -0.5
W1 = -1 + 0.5(1 - 0)*0 = -1
W2 = -1 + 0.5(1 - 0)*0 = -1
Output = 0
2nd iteration:
W0 = -0.5 + 0.5(1 - 0)*1 = 0
W1 = -1 + 0.5(1 - 0)*0 = -1
W2 = -1 + 0.5(1 - 0)*0 = -1
Output = 1 = y
Note:
- Full credit if the 1st iteration is correct.

3a. [3%] XNOR Gate
Solution: The Boolean representation of an XNOR gate is
¬(x1 ⊕ x2) = x1·x2 + ¬x1·¬x2
From this expression, the XNOR gate consists of an AND gate (x1·x2), a NOR gate (¬x1·¬x2), and an OR gate. This means we have to combine 3 perceptrons: the first layer is the AND and NOR gates; the second layer is the OR gate.
Note:
- No partial credit.
- There is more than one correct solution; a solution is correct as long as it satisfies the truth table.

3b. [3%] Assuming the initial weights are all 1 and the learning rate is 0.7, what would the weights be after training on the correct value for x1 = 0 and x2 = 1? Please show your work below.
(You must show the updated results of W11, W12, W13, W14, W15, W16, W21, W22, and W23.)
Solution:
x1 = 0, x2 = 1, y = 0, hw(x) = 1
Update rule: wi,j ← wi,j + α Δj * ai
Δj = (yj - [hw(x)]j) * g'(inj)   // when j is in the output layer
Δj = g'(inj) * Σk (wjk * Δk)   // when j is in a hidden layer
Simplified, since in this case j is in the only hidden layer:
Δj = g'(inj) * Σk ( wjk * (yk - [hw(x)]k) * g'(ink) )
Simplified again, since j has only one output:
Δj = g'(inj) * wjk * (yk - [hw(x)]k) * g'(ink)

The easy part (output layer):
Hidden1 = 1
Hidden2 = 1
W21 = 1 + 0.7(0 - 1)*1 = 0.3
W22 = 1 + 0.7(0 - 1)*1 = 0.3
W23 = 1 + 0.7(0 - 1)*1 = 0.3

The ambiguous part (hidden layer), with two accepted treatments:
1) Propagate the error backward proportionally to the weights:
Hidden1 error = -1
W11 = 1 + 0.7*(-1)*1 = 0.3
W13 = 1 + 0.7*(-1)*0 = 1
W15 = 1 + 0.7*(-1)*1 = 0.3
Hidden2 error = -1
W12 = 1 + 0.7*(-1)*1 = 0.3
W14 = 1 + 0.7*(-1)*0 = 1
W16 = 1 + 0.7*(-1)*1 = 0.3
2) Observe that the activation function's derivative is 0:
Hidden1 error = 0
W11 = 1, W13 = 1, W15 = 1
Hidden2 error = 0
W12 = 1, W14 = 1, W16 = 1
Note:
- Either one of the 2 solutions is correct.
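The single-perceptron updates in 1b and 2b can be replayed mechanically. A minimal sketch, assuming a step activation that outputs 1 when the weighted sum is ≥ 0 (as the outputs shown in the solutions imply) and treating W0 as the bias weight with a fixed input of 1:

```python
# Replay of the perceptron updates from 1b (NAND) as a sanity check.
# Assumption: step activation g(z) = 1 if z >= 0 else 0, and w[0] is
# the bias weight, whose input a0 is always 1.

def step(z):
    return 1 if z >= 0 else 0

def train_on_example(w, x1, x2, y, alpha):
    """One application of the rule wi <- wi + alpha*(y - hw(x))*xi."""
    a = [1, x1, x2]  # bias input plus the two gate inputs
    h = step(sum(wi * ai for wi, ai in zip(w, a)))
    return [wi + alpha * (y - h) * ai for wi, ai in zip(w, a)]

# 1b: initial weights all 1, alpha = 0.5, training example x1=1, x2=1, y=0
w = [1.0, 1.0, 1.0]
for _ in range(3):
    w = train_on_example(w, 1, 1, 0, 0.5)
print(w)  # [-0.5, -0.5, -0.5], matching the 3rd iteration of the solution
```

Running the same loop with initial weights of -1 on the 2b example (x1 = 0, x2 = 0, y = 1) reproduces that solution as well: only the bias weight changes, since the inputs are 0.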