CSCI 561 Foundations of Artificial Intelligence, Spring 2019
Final Exam (Version 1.2, with Solutions)
Please print neatly.

Instructions:
1. Date: May 1, 2019, 4:30-6:20 PM PT.
2. Maximum points for this exam: 100.
3. The percentages for each question are indicated in square brackets [ ].
4. No books, cell phones, calculators (or any other material) are allowed.
5. Write down your name, student ID, and USC email address.
6. Write answers ONLY in designated areas.
7. Do NOT write on the 2D QR code.
8. The back of the pages will NOT be graded. You may use it for scratch paper.
9. No questions during the exam. If something is unclear to you, write it down on your exam paper.
10. Be brief: a few words are often enough if they are precise.
11. When finished, raise completed exam sheets until approached by a proctor.
12. Adhere to the Academic Integrity Code.

Problems (100 percent total):
General AI Knowledge 1 (TF): 10
General AI Knowledge 2 (MC): 20
Markov Decision Process: 20
Bayesian Network: 20
Decision Tree: 12
Neural Networks: 18

1. [10%] General AI Knowledge: True/False
For each of the statements below, answer T if the statement is always true, or F otherwise. Each question is worth 1 point, no partial credit.

T 1. The outcome of a Markov Decision Process is dependent on both chance and a decision maker's choice.
F 2. Overfitting is when a machine learning hypothesis space has too few dimensions.
T 3. Naive Bayes assumes conditional independence between input variables given the output variable.
F 4. Uniform Cost Search is a special case of Depth-First Search.
F 5. Simulated Annealing is a probabilistic algorithm that guarantees convergence on a global minimum (or maximum) solution.
T 6. A logical agent can behave rationally even in a partially observable environment.
T 7. (A∧B) ∨ ¬(A ⇒ B) is satisfiable.
F 8. A Horn clause is a disjunction of literals with exactly one positive literal.
F 9.
The k-nearest neighbors algorithm (k-NN) is a parametric method that can be used for classification and regression.
T 10. The hill climbing algorithm is equivalent to simulated annealing with a constant temperature of 0.

2. [20%] General Knowledge: Multiple Choice
Each question has one or more correct answers. Circle all correct answers. There is no partial credit: you will receive full credit if and only if you choose all of the correct answers and none of the wrong answers. Each question is worth 2 points.

1. Given two admissible heuristic functions h1(s) and h2(s), which of the following is also an admissible heuristic function?
A. h1(s) + h2(s)
B. h1(s) * h2(s)
C. max(h1(s), h2(s))
D. min(h1(s), h2(s))
E. None of the above

2. Which of the following statements about Bayesian networks are true?
A. A Bayesian network can be a directed and acyclic graph
B. A Bayesian network can be an undirected and cyclic graph
C. A Bayesian network can be an undirected and acyclic graph
D. A Bayesian network can be a fully connected graph
E. A Bayesian network can be a graph with no edges

3. What substitutions result from unifying P(x, Messi, y) with P(Neymar, F(y), z)?
A. {x/Neymar, Messi/F(y), y/z}
B. {x/Messi, y/Neymar, F/z}
C. {x/Neymar, F(y)/Messi, y/z}
D. {x/Neymar, y/Messi, z/Messi}
E. These literals fail to unify

4. Which of the following arc consistency outcomes is insufficient to draw an unambiguous conclusion?
A. One domain is empty
B. Each domain has a single value
C. One domain has more than one value
D. Some domains have more than one value
E. None of the above

5. Consider p = "If Vijay is Alice's brother, then Alice is Bob's aunt" and q = "Bob is Vijay's nephew." Which of the following statements is equivalent to p ⇒ q?
A. If Bob is Vijay's nephew, then Vijay is Alice's nephew and Alice is not Bob's aunt.
B. If Bob is Vijay's nephew, then Vijay is Alice's brother and Alice is Bob's aunt.
C.
If Bob is not Vijay's nephew, then Vijay is not Alice's brother or Alice is not Bob's aunt.
D. If Bob is not Vijay's nephew, then Vijay is not Alice's brother and Alice is Bob's aunt.
E. If Bob is not Vijay's nephew, then Vijay is Alice's brother and Alice is not Bob's aunt.

6. Consider using k-nearest neighbors with k = 3 to classify inputs with two dimensions, using Manhattan distance, defined as:
dist((a, b), (c, d)) = |a - c| + |b - d|
Consider the following data set:
(0, 0), FALSE
(1, 0), TRUE
(0, 1), TRUE
(2, 1), FALSE
(3, 0), FALSE
(3, 1), TRUE
Which of the following new inputs would be labeled as TRUE?
A. (0, 0)
B. (0, 1)
C. (1, 0)
D. (1, 1)
E. (2, 0)
F. (3, 0)

7. What is the English equivalent of "¬∀x(Computer(x) ⇒ Intelligent(x))"?
A. Not every computer is intelligent
B. There is no intelligent computer
C. Being a computer implies not being intelligent
D. Some computers are not intelligent
E. None of the above

8. Consider the game tree illustrated below, where A-F are integers. Assume the nodes are explored from left to right and alpha-beta pruning is used. Which of the following is true?
A. If A = 3, B can be pruned
B. If A = 6, B can be pruned
C. If A = 5, B = 5, then the subtree containing C and D can be pruned
D. If A = 5, B = 5, C = 3, D = 4, then the subtree containing E and F can be pruned
E. There are SOME values of A and B such that the subtree containing C and D can be pruned

9. Consider the following Bayesian network. Which of the following statements are true?
A. A and B are independent
B. A and B are conditionally independent given C
C. D and E are independent
D. D is conditionally independent of G given C
E. H is dependent on D and E

10. A CSCI 561 student got an internship at a movie studio in Silver Lake. She learned that her job will involve adding colors to old movies, currently a process requiring manual touchup frame-by-frame. Recalling the lessons from her AI class, she decided to use deep learning.
The state-of-the-art auto-colorization technique requires object detection, image classification, and a multilayer convolutional neural network (CNN). Which of the following AI techniques will this aspiring movie editor use?
A. Supervised
B. Unsupervised
C. Semi-supervised
D. Hierarchical
E. None of the above
** We will accept either ABD or ABCD **

3. [20%] Markov Decision Process
Consider the 3 × 3 grid world below. At each state, the agent may have a choice of three deterministic actions: Up, Down, or Right. Actions that would move the agent into a wall are not allowed. For example, in state (1, 3), the only available actions are Down and Right. The goals (terminal states) with rewards are in states (3, 1) and (3, 3). The agent gets a reward of R(3, 1) = 4 if it's at (3, 1) and a reward of R(3, 3) = 8 if it's at (3, 3). The discount factor is γ = 1/2, and the reward is R(s) = 0 for each nonterminal state s.

1. [5%] On the figure below, show the optimal value V* and the corresponding optimal action for each state. (In any state, if multiple actions result in the same optimal value, just write down one of them as the optimal action.)
Optimal values:
Optimal actions:
Solution:
Note:
- For optimal actions, arrows (→, ↓, etc.) or letters (R, D, etc.) instead of English words are OK.
- For states (1, 2) and (2, 2), either UP or RIGHT is acceptable.
- 2 points for each figure. The last 1 point is a gift.
- For each figure, partial credit (1 point) if any grid is incorrect.

2. [5%] What POSITIVE values of γ would make the optimal action in state (2, 1) be UP, assuming no other changes to the MDP? Use fractions (no decimal points) to show the final answer.
Solution: For the action in state (2, 1) to change from Right to Up:
γ³ R(3, 3) > γ R(3, 1) and 1 ≥ γ > 0 (3 points)
1 ≥ γ > √2/2 (2 points)
Note:
- Please check the final answer first. If it's correct, most likely it gets full credit.
- If they use values instead of symbols, it's OK.
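The γ threshold above can be cross-checked numerically. The sketch below (illustrative only, not part of the official solution) compares the discounted return of going Up from state (2, 1), which reaches the reward-8 goal in three steps, against going Right, which reaches the reward-4 goal in one step:

```python
import math

# Sanity check: from state (2, 1), Up is optimal exactly when the
# three-step discounted return to the reward-8 goal beats the
# one-step discounted return to the reward-4 goal.
def up_beats_right(gamma):
    return gamma**3 * 8 > gamma * 4

threshold = math.sqrt(2) / 2  # the claimed boundary, sqrt(2)/2 ≈ 0.707
print(up_beats_right(0.71))   # just above the threshold
print(up_beats_right(0.70))   # just below the threshold
```

Solving γ³ · 8 > γ · 4 for positive γ gives γ² > 1/2, i.e., γ > √2/2, in agreement with the answer.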
For example, γ³ · 8 > γ · 4.
- Giving only γ > √2/2 without 1 ≥ γ is OK.
- If a student only gives a single possible value of γ, give 1 point.

3. [5%] What values of R(s) for nonterminal states s would make the optimal action in state (2, 1) be UP, assuming the original discount factor (γ = 1/2) and no other changes to the MDP? Use fractions (no decimal points) to show the final answer.
Solution: For the action in (2, 1) to change from Right to Up:
R(s) + γ R(s) + γ² R(s) + γ³ R(3, 3) > R(s) + γ R(3, 1) (3 points)
R(s) > 4/3 (2 points)
Note:
- Please check the final answer first. If it's correct, most likely it gets full credit.
- If they use values instead of symbols, it's OK. For example, R(s) + 1/2 R(s) + 1/4 R(s) + 1/8 · 8 > R(s) + 1/2 · 4.
- If a student only gives a single possible value of R(s), give 1 point.

4. [5%] In this question, we consider only state (3, 2) and the two goals from the original MDP. The environment is also no longer deterministic: with probability p, the agent moves in the direction it selects; with probability (1 - p), it moves opposite to the intended direction. What values of p would make the optimal action in state (3, 2) be DOWN? Use fractions (no decimal points) to show the final answer.
Solution: For the action in (3, 2) to change from Up to Down:
p γ R(3, 3) + (1 - p) γ R(3, 1) < p γ R(3, 1) + (1 - p) γ R(3, 3) and p ≥ 0 (3 points)
1/2 > p ≥ 0 (2 points)
Note:
- Please check the final answer first. If it's correct, most likely it gets full credit.
- Again, as in the previous questions, both symbols and values are OK: p · 1/2 · 8 + (1 - p) · 1/2 · 4 < p · 1/2 · 4 + (1 - p) · 1/2 · 8.
- Only p < 1/2 without p ≥ 0 is OK.
- If a student only gives a single possible value of p, give 1 point.

4. [20%] Bayesian Network
Consider the following Bayesian network, where the variables O, B, F, Q, and G have the following meaning: O: hasOwl, B: isBrave, F: likesFlyingClass, Q: likesQuidditch, G: isGryffindorStudent.

1.
[6%] Mark the following equations as True/False (T/F) based on whether they hold in this Bayesian network.
P(G | F, Q, B) = P(G | O, F, Q, B)
P(F, B | O) = P(F | O) P(B | O)
P(O | F, Q, B) = P(O | F, Q, B, G)
Solution:
T (descendant; pp. 517-518)
T (conditionally independent)
T (Markov blanket; pp. 517-518)
Note:
- 2 pts for each.

2. [4%] Calculate the value of P(O+, B-, F+, Q+, G-). Show your work. Use decimal points without rounding in the final answer.
Solution:
P(O+, B-, F+, Q+, G-)
= P(O+) P(B- | O+) P(F+ | O+) P(Q+ | O+, F+) P(G- | Q+, B-)
= 0.5 * (1 - 0.4) * 0.9 * 0.8 * (1 - 0.4)
= 0.5 * 0.6 * 0.9 * 0.8 * 0.6
= 0.1296
Note:
- 2 pts for P(O+) P(B- | O+) P(F+ | O+) P(Q+ | O+, F+) P(G- | Q+, B-). No partial points.
- 2 pts for the final value. No partial points.

3. [2%] Write down the statement/inequality corresponding to the following English sentence: "A Brave Gryffindor student who does not like Flying class would be more likely to have an Owl than not."
Solution:
P(O+ | B+, F-, G+) > P(O- | B+, F-, G+)
or P(O+ | B+, F-, G+) > 0.5
Note:
- No partial points.

4. [8%] Is the statement/inequality in Q4.3 true? Why? Please show your calculation.
Solution:
P(O | B+, F-, G+) = α P(O, B+, F-, G+)
= α ( P(O, B+, F-, Q+, G+) + P(O, B+, F-, Q-, G+) )
= α (
< P(O+) P(B+ | O+) P(F- | O+) P(Q+ | O+, F-) P(G+ | Q+, B+), P(O-) P(B+ | O-) P(F- | O-) P(Q+ | O-, F-) P(G+ | Q+, B+) > +
< P(O+) P(B+ | O+) P(F- | O+) P(Q- | O+, F-) P(G+ | Q-, B+), P(O-) P(B+ | O-) P(F- | O-) P(Q- | O-, F-) P(G+ | Q-, B+) > )
= α ( < 0.5 * 0.4 * (1 - 0.9) * 0.4 * 0.9, (1 - 0.5) * 0.2 * (1 - 0.2) * 0.2 * 0.9 >
    + < 0.5 * 0.4 * (1 - 0.9) * (1 - 0.4) * 0.8, (1 - 0.5) * 0.2 * (1 - 0.2) * (1 - 0.2) * 0.8 > )
= α ( < 0.0072, 0.0144 > + < 0.0096, 0.0512 > )
= α < 0.0168, 0.0656 >
≈ < 0.2, 0.8 > (normalization)
Since 0.2 < 0.8 (i.e., 0.0168 < 0.0656), we have P(O+ | B+, F-, G+) < P(O- | B+, F-, G+), so the statement/inequality in Q4.3 is false.
Note:
Alternative representation:
P(O+ | B+, F-, G+) = P(B+, F-, G+, O+) / P(B+, F-, G+)
= P(B+, F-, G+ | O+) P(O+) / ( P(B+, F-, G+ | O+) P(O+) + P(B+, F-, G+ | O-) P(O-) )
= ... < 1/2
or P(O- | B+, F-, G+) = 1 - P(O+ | B+, F-, G+) > P(O+ | B+, F-, G+)
- A correct formulation with an incorrect final answer can get partial points.
- Answering "it's False" or "it's not True" gets 2 pts.
- The calculation of P(O+ | B+, F-, G+): 3 pts (2 pts for the symbolic expression and calculation, 1 pt for the final value).
- The calculation of P(O- | B+, F-, G+): 3 pts (2 pts for the symbolic expression and calculation, 1 pt for the final value).

5. [12%] Decision Tree
Daniel wants to watch the movie Avengers: Endgame. Before he invites his friends, he wants to predict their answers. The table below shows 4 input features (x0-x3) and the results (y) from last year, when he invited friends to Infinity War. Help him build the decision tree and find the candidate(s) he should invite.

   x0 Close Friend | x1 Marvel Fan | x2 Happy | x3 Have Exam | y Let's go!
1  False | True  | True  | True  | True
2  False | False | False | True  | False
3  False | False | True  | False | False
4  False | True  | False | True  | True
5  True  | False | True  | False | True
6  False | True  | False | True  | True
7  True  | True  | True  | True  | False
8  True  | True  | True  | True  | False
9  True  | False | True  | True  | False

1. [3%] Draw a complete decision tree that maximizes the information gain at its branches and correctly classifies all the examples. Put the False attribute value on the left side and True on the right side. You just need to write x0/x1/x2/x3 in the node; don't write down the feature names.
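The label entropy of this data set and the information gain of a root split can be verified with a short script. The sketch below (illustrative only, using base-2 logarithms) computes both quantities for a split on x0:

```python
from math import log2

# Labels y for the 9 examples in the table above (True = "Let's go!")
y = [True, False, False, True, True, True, False, False, False]
# x0 (Close Friend) for the same examples
x0 = [False, False, False, False, True, False, True, True, True]

def entropy(labels):
    # Shannon entropy of a Boolean label list, in bits
    n = len(labels)
    out = 0.0
    for v in (True, False):
        p = sum(1 for label in labels if label == v) / n
        if p > 0:
            out -= p * log2(p)
    return out

H = entropy(y)  # -(5/9)log(5/9) - (4/9)log(4/9) ≈ 0.9911

# Information gain of splitting on x0: entropy minus the
# size-weighted entropy of the two branches
split = {v: [yi for xi, yi in zip(x0, y) if xi == v] for v in (False, True)}
remainder = sum(len(s) / len(y) * entropy(s) for s in split.values())
gain = H - remainder  # ≈ 0.0911
print(round(H, 4), round(gain, 4))
```

These decimals match the values the grading notes rule out as final answers (0.9911 and 0.0911), confirming the closed-form expressions.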
(NO partial credit)
There are three kinds of trees; any of them gets full marks. For 5.1 we will also accept trees with x2 as the root for partial credit. Note that we are not expecting calculations of the values. Expressions in log and fraction formats are acceptable.

2. [3%] Calculate the entropy of the given data. Write down your answer leaving fractions and logarithms as is (no decimal points).
-(5/9) log(5/9) - (4/9) log(4/9)
(0.9911 is not an acceptable answer)

3. [3%] Calculate the information gain of the branch at the root node. Write down your answer leaving fractions and logarithms as is (no decimal points).
-(5/9) log(5/9) - (4/9) log(4/9) + (5/9) ( (2/5) log(2/5) + (3/5) log(3/5) ) + (4/9) ( (1/4) log(1/4) + (3/4) log(3/4) )
(0.0911 is not an acceptable answer)

4. [3%] Predict how the following friends will respond according to your decision tree.
   x0 Close Friend | x1 Marvel Fan | x2 Happy | x3 Have Exam | y Let's go!
1  True  | False | False | True  | False
2  False | True  | False | False | True
3  True  | False | True  | False | True

6. [18%] Neural Networks
Given the truth tables and neural network structures below, write the weights and biases in the empty squares on the edges so that the network is equivalent to the required logic gate. Assume the following activation function:

1a. [3%] NAND Gate
Solution: The NAND gate is 0 only if both inputs are 1.
Note:
- No partial credit.
- There is more than one correct solution; a solution is correct as long as it satisfies the truth table.

1b. [3%] Assuming the initial weights are all 1 and the learning rate is 0.5, what would the weights be after training on the correct value for x1 = 1 and x2 = 1? Please show your work below.
(You must show the updated results of W0, W1, and W2.)
Solution:
x1 = 1, x2 = 1, y = 0, hw(x) = 1
Update rule: wi ← wi + α (y - hw(x)) * xi
1st iteration:
W0 = 1 + 0.5(0 - 1)*1 = 0.5
W1 = 1 + 0.5(0 - 1)*1 = 0.5
W2 = 1 + 0.5(0 - 1)*1 = 0.5
Output = 1
2nd iteration:
W0 = 0.5 + 0.5(0 - 1)*1 = 0
W1 = 0.5 + 0.5(0 - 1)*1 = 0
W2 = 0.5 + 0.5(0 - 1)*1 = 0
Output = 1
3rd iteration:
W0 = 0 + 0.5(0 - 1)*1 = -0.5
W1 = 0 + 0.5(0 - 1)*1 = -0.5
W2 = 0 + 0.5(0 - 1)*1 = -0.5
Output = 0 = y
Note:
- Full credit if the 1st iteration is correct; do not deduct if they do 2nd or 3rd iterations.

2a. [3%] NOR Gate
Solution: The NOR gate is 1 only if both inputs are 0.
Note:
- No partial credit.
- There is more than one correct solution; a solution is correct as long as it satisfies the truth table.

2b. [3%] Assuming the initial weights are all -1 and the learning rate is 0.5, what would the weights be after training on the correct value for x1 = 0 and x2 = 0? Please show your work below. (You must show the updated results of W0, W1, and W2.)
Solution:
x1 = 0, x2 = 0, y = 1, hw(x) = 0
Update rule: wi ← wi + α (y - hw(x)) * xi
1st iteration:
W0 = -1 + 0.5(1 - 0)*1 = -0.5
W1 = -1 + 0.5(1 - 0)*0 = -1
W2 = -1 + 0.5(1 - 0)*0 = -1
Output = 0
2nd iteration:
W0 = -0.5 + 0.5(1 - 0)*1 = 0
W1 = -1 + 0.5(1 - 0)*0 = -1
W2 = -1 + 0.5(1 - 0)*0 = -1
Output = 1 = y
Note:
- Full credit if the 1st iteration is correct.

3a. [3%] XNOR Gate
Solution: The Boolean representation of an XNOR gate is
¬(x1 ⊕ x2) = x1·x2 + ¬x1·¬x2
From this expression, the XNOR gate consists of an AND gate (x1·x2), a NOR gate (¬x1·¬x2), and an OR gate. This means we have to combine 3 perceptrons: the first layer is the AND and NOR gates; the second layer is the OR gate.
Note:
- No partial credit.
- There is more than one correct solution; a solution is correct as long as it satisfies the truth table.

3b. [3%] Assuming the initial weights are all 1 and the learning rate is 0.7, what would the weights be after training on the correct value for x1 = 0 and x2 = 1? Please show your work below.
(You must show the updated results of W11, W12, W13, W14, W15, W16, W21, W22, and W23.)
Solution:
x1 = 0, x2 = 1, y = 0, hw(x) = 1
Update rule: wi,j ← wi,j + α Δj * ai
Δj = (yj - [hw(x)]j) * g'(inj)   // when j is in the output layer
Δj = g'(inj) * Σk (wjk * Δk)   // when j is in a hidden layer
Simplified, since in this case j is in the only hidden layer:
Δj = g'(inj) * Σk ( wjk * (yk - [hw(x)]k) * g'(ink) )
Simplified again, since j has only one output:
Δj = g'(inj) * wjk * (yk - [hw(x)]k) * g'(ink)

The easy part (output layer):
Hidden1 = 1
Hidden2 = 1
W21 = 1 + 0.7(0 - 1)*1 = 0.3
W22 = 1 + 0.7(0 - 1)*1 = 0.3
W23 = 1 + 0.7(0 - 1)*1 = 0.3

The ambiguous part (hidden layer), with two accepted treatments:
1) Propagate the error backward proportionally to the weights:
Hidden1 error = -1
W11 = 1 + 0.7*(-1)*1 = 0.3
W13 = 1 + 0.7*(-1)*0 = 1
W15 = 1 + 0.7*(-1)*1 = 0.3
Hidden2 error = -1
W12 = 1 + 0.7*(-1)*1 = 0.3
W14 = 1 + 0.7*(-1)*0 = 1
W16 = 1 + 0.7*(-1)*1 = 0.3
2) Observe that the activation function's derivative is 0:
Hidden1 error = 0
W11 = 1, W13 = 1, W15 = 1
Hidden2 error = 0
W12 = 1, W14 = 1, W16 = 1
Note:
- Either one of the 2 solutions is correct.
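The single-perceptron updates in 1b and 2b can be replayed mechanically. A minimal sketch, assuming a step activation that outputs 1 when the weighted sum is ≥ 0 (as the outputs shown in the solutions imply) and treating W0 as the bias weight with a fixed input of 1:

```python
# Replay of the perceptron updates from 1b (NAND) as a sanity check.
# Assumption: step activation g(z) = 1 if z >= 0 else 0, and w[0] is
# the bias weight, whose input a0 is always 1.

def step(z):
    return 1 if z >= 0 else 0

def train_on_example(w, x1, x2, y, alpha):
    """One application of the rule wi <- wi + alpha*(y - hw(x))*xi."""
    a = [1, x1, x2]  # bias input plus the two gate inputs
    h = step(sum(wi * ai for wi, ai in zip(w, a)))
    return [wi + alpha * (y - h) * ai for wi, ai in zip(w, a)]

# 1b: initial weights all 1, alpha = 0.5, training example x1=1, x2=1, y=0
w = [1.0, 1.0, 1.0]
for _ in range(3):
    w = train_on_example(w, 1, 1, 0, 0.5)
print(w)  # [-0.5, -0.5, -0.5], matching the 3rd iteration of the solution
```

Running the same loop with initial weights of -1 on the 2b example (x1 = 0, x2 = 0, y = 1) reproduces that solution as well: only the bias weight changes, since the inputs are 0.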