CSE150B Final (Sample)
Please finish reading everything on this page before you start. Otherwise we'll just reply "Page 1" to questions that are already answered here, and you'll still need to read it anyway.
4 •
5 •
6 •
7 8
9 • 10
11 12 13 14
15 16 17 18
19 20
21 22
23 24
25 • 26
27
28 • 29
30 • 31
32
33
• Submission deadline: ...
• Submit photos or pdfs of your written answers on Gradescope.
• Record a video in which you briefly explain the answers. Don't make it too long; just enough so that we can tell that you get the ideas for each question you answer. The grading will still be based on the written part; the video will be used to spot serious problems or to help you get partial credit.
• For the video submission, we recommend the following:
1. Record your video using Zoom or your phone and submit the mp4 file through the link ...
2. Make sure the file name has your first name, last name, and PID. Dropbox will send you a confirmation. You can update the video before the deadline (do not overdo it, because there might be glitches in Dropbox over multiple versions; uploading just once is best), and make sure you always use the same file name, otherwise we will grade an arbitrary one from what you sent.
3. Or you can record your video using Zoom with its "Record on the Cloud" option, and Zoom will send you a link and password for sharing that video. Send the link and password to my email [email protected] (write "150B Final" and your PID in the subject). This is less preferred than uploading the video directly, and you do not get a confirmation immediately either.
4. If you recorded using your phone or in some other way, definitely try compressing the video into mp4 format (a relatively low resolution is fine), otherwise the file may be too large.
5. We highly recommend leaving enough time for uploading – ideally start uploading before Friday is over for you so that there is time to deal with technical issues on Saturday if there are any.
• If you have difficulty with all these options, send a message on Slack in the #question channel before the submission deadline and we'll direct message you to figure out a way.
• When a question has a "why" or "explain" part, make sure to write down a short explanation first; then you can just repeat it in your video. When no "why" is asked in a question, you only need to write down the answer and then explain the why in the video.
• You can write your answers on blank paper and take photos of them, or directly annotate the pdf, etc. Just make sure things are readable.
• Your final grade will depend on the sum of all points you've obtained in the class. So you only need to get enough points in the final to reach the grade you want. E.g., if you just need a B and you already got full points for all the PAs (summing to 70 or more), then you just need to aim for 10 points from the final – of course, over-budgeting a bit is always good.
1 Classical Search (6 Points)

Consider the state space graph shown in Figure 1. A is the start state and G is the goal state. Each edge can be traversed in both directions.
Figure 1: Graph for the classical search questions.
Question 1 (3 Points). Let the cost of passing through each node be defined as:

c(A) = 0, c(B) = 1, c(C) = 10, c(D) = 3, c(E) = 3, c(F) = 1, c(G) = 10, c(H) = 1.
The table below lists three basic graph search strategies and four possible paths. Check which, if any, of the listed paths each strategy may return: each algorithm may return multiple different paths because we have not restricted the order in which we push new states into the frontier or how we break ties. Just mark the corresponding box and make sure to mark all paths that can be returned by each algorithm.

                       A-E-B-A   A-B-C-G   A-E-B-C-G   A-D-H-F-G
Depth-First Search
Breadth-First Search
Uniform Cost Search
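As a sanity check on your table, you can run uniform cost search mechanically. Note the edge set below is only inferred from the four candidate paths listed above; the real Figure 1 may contain additional edges, so treat this as a sketch of how UCS works with per-node costs, not an answer key.

```python
import heapq

def ucs(edges, node_cost, start, goal):
    """Uniform cost search where the cost of a path is the sum of the
    costs of the nodes it passes through (including start and goal)."""
    adj = {}
    for u, v in edges:  # edges can be traversed in both directions
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    frontier = [(node_cost[start], [start])]
    best = {}
    while frontier:
        cost, path = heapq.heappop(frontier)
        node = path[-1]
        if node == goal:
            return cost, path
        if node in best and best[node] <= cost:
            continue
        best[node] = cost
        for nxt in adj[node]:
            heapq.heappush(frontier, (cost + node_cost[nxt], path + [nxt]))
    return None

# Hypothetical edge set, inferred only from the four candidate paths above.
edges = [("A", "B"), ("B", "C"), ("C", "G"), ("A", "E"), ("E", "B"),
         ("A", "D"), ("D", "H"), ("H", "F"), ("F", "G")]
cost = {"A": 0, "B": 1, "C": 10, "D": 3, "E": 3, "F": 1, "G": 10, "H": 1}
total, path = ucs(edges, cost, "A", "G")
```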
Question 2 (3 Points). Define the following heuristic function h:

h(A) = 9, h(B) = 9, h(C) = 2, h(D) = 7, h(E) = 1, h(F) = 4, h(G) = 0, h(H) = 10.

Is h a consistent heuristic function? What is the path returned by A* search using h?

2 Adversarial Search

Consider the (upside-down Geisel library) game tree in Figure 2. You can think of it as playing "one round of chess with trembling hands" in the following sense: there are two rational players as usual, Min and Max, and after each player determines what action to take, a chance player comes in (think of it as the environment, modeling the fact that the two players may have trouble putting the pieces down where they want) and gives a random outcome according to the probabilities labeled on the edges of the tree. That is, the "Chance 1" player always gives the left outcome with probability 0.8 and the right one with probability 0.2, while the "Chance 2" player uses a uniform distribution, giving each possible outcome probability 0.5. As always, Max will maximize, Min will minimize, and the chance players will return a sampled outcome according to the probabilities.
[Figure 2 not reproduced: from the root down, the levels are Max, Chance 1 (edge probabilities 0.8 and 0.2), Min, and Chance 2 (edge probabilities 0.5 and 0.5), ending in terminal values that include 10, -2, 0, and 3.]
Figure 2: Tree for the adversarial search questions.
Question 3 (2 Points). What is the final value for the max player (i.e., what is the number you should annotate the root node with)?
Question 4 (3 Points). I want to write the value of the max player succinctly as

Vmax = max_a E [ min_{a'} E [ R(τ) ] ]

so that this value Vmax corresponds to what you have answered in the previous question. I hope you are convinced that this equation makes sense (I hope you will not ask what E means). Answer the following questions: How many choices does the max_a range over? And how many choices does the min_{a'} range over? What should the τ here refer to? What is R(τ)? Also, give an arbitrary instance of τ and its R(τ).
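The nested max/expectation/min/expectation structure can be computed recursively. The tree below is a made-up miniature with the same node types (it is NOT Figure 2, whose exact shape is not reproduced here); only the evaluator itself reflects the formula above.

```python
def value(node):
    """Evaluate a game tree with 'max', 'min', 'chance', and 'leaf' nodes."""
    kind, payload = node
    if kind == "leaf":
        return payload
    if kind == "max":
        return max(value(child) for child in payload)
    if kind == "min":
        return min(value(child) for child in payload)
    if kind == "chance":  # payload is a list of (probability, child) pairs
        return sum(p * value(child) for p, child in payload)
    raise ValueError(kind)

# A made-up miniature tree, not Figure 2: Max chooses between two
# chance nodes with probabilities 0.8/0.2, one of which leads to a Min node.
tree = ("max", [
    ("chance", [(0.8, ("leaf", 5)), (0.2, ("leaf", 0))]),
    ("chance", [(0.8, ("min", [("leaf", 3), ("leaf", 7)])),
                (0.2, ("leaf", 10))]),
])
root_value = value(tree)
```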
3 Markov Decision Processes

Consider an MDP with states S = {s1, s2, s3}, actions A = {a}, and the following transition probabilities: P(s1|s1,a) = 0, P(s2|s1,a) = 0.6, P(s3|s1,a) = 0.4. Define rewards R(s1) = 0, R(s2) = 1, R(s3) = 2. For simplicity we do not use a discount factor in this problem (or you can think of it as γ = 1).
Question 5 (2 Points). Enumerate all possible trajectories with s1 being the initial state (in the form of state, reward of state, action, next state, and so on until termination), and their probabilities. Use them to compute the value of s1 in the MDP.
Question 6 (2 Points). Since s2 and s3 are terminal states, we can consider the MDP as a tree with s1 as its root. Show that the expectimax value of s1 in this tree is the same as the value of s1 in the MDP. (Again, we are not using a discount factor here.)
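You can check your hand computation with a short script. This sketch assumes a trajectory's return is the sum of rewards along it (γ = 1, reward collected at every state including the terminal one, matching the trajectory format in Question 5).

```python
# The MDP from this section: from s1 the only action a leads to a terminal state.
P = {"s2": 0.6, "s3": 0.4}          # P(s' | s1, a)
R = {"s1": 0, "s2": 1, "s3": 2}

# Value via trajectory enumeration: each trajectory (s1, R(s1), a, s') has
# return R(s1) + R(s'), weighted by its probability.
v_enum = sum(p * (R["s1"] + R[s_next]) for s_next, p in P.items())

# Expectimax view of the same tree: one chance node under the only action.
v_tree = R["s1"] + sum(p * R[s_next] for s_next, p in P.items())
```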
4 Reinforcement Learning

We will consider RL on a tiny control problem. Shown in Figure 3 is a simplified inverted pendulum (a mass attached to an inelastic, weightless rod). The hope is to balance the pendulum at the upright position by giving a torque input at the end of the rod (the box with ±T indicating two choices of torque in opposite directions). The angular position is measured as the difference from the upright position, and tilting right is considered the positive direction. Note that this is an (of course extremely) simplified model of how robots walk and SpaceX lands its rockets.
Figure 3: Inverted pendulum for the RL questions.
We use the variable θ to represent the pendulum's angular position, and ω to represent its angular speed. Let s = (θ, ω) be the current state, and s′ = (θ′, ω′) be the next state after the system runs for a small time step ∆t. The discretized and very aggressively simplified dynamics of the pendulum are as follows:
θ′ = θ + ω∆t        (1)
ω′ = ω + (k1θ + k2a)∆t        (2)
a ∈ {0, T, −T}        (3)
where k1, k2 are constants dependent on the physical parameters that we do not need to care about, so let us just assume k1 = k2 = 1. These equations tell you how the current state (θ, ω) transitions to the next state (θ′, ω′) under an action a. This control action a is simply either 0 or T or −T, and we let T = 1. We set the time duration of each step ∆t to also be the constant ∆t = 1 (which is not small enough to physically justify the simplifications we made in the equations, but let's not worry about that).
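Equations (1)-(2) transcribe directly into a one-step simulator with k1 = k2 = ∆t = 1. The start state in the demo below is arbitrary, chosen just to show a single transition (it is not the s0 of any question).

```python
def step(state, a, k1=1.0, k2=1.0, dt=1.0):
    """Discretized pendulum dynamics from equations (1)-(2)."""
    theta, omega = state
    theta_next = theta + omega * dt
    omega_next = omega + (k1 * theta + k2 * a) * dt
    return (theta_next, omega_next)

# One transition from an arbitrary state under torque a = +T = 1:
s = (0.1, 0.0)
s_next = step(s, 1.0)   # theta' = 0.1,  omega' = 0.0 + (0.1 + 1.0) = 1.1
```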
Note that there is no probability involved in this setting, but the system dynamics will not be known to the learner, so the agent needs to learn the control policy by simulating the system and observing the effects of the actions. Define the reward function for each state s = (θ, ω) to be R(s) = −(θ² + ω²). By maximizing accumulated rewards, the agent can learn to maintain the pendulum at the upright position, in which case the accumulated rewards will be close to zero (and otherwise much more negative).
Question 7 (1 Point). For this problem, is it enough to consider only the angular position θ as the system's state, without the angular velocity ω component? Why? (Recall what "Markov" means, and although we don't have probabilities here, think about why the Markov condition was important.)
Question 8 (2 Points). Let the initial state be s0 = (0, 1). Take two steps of actions, a0 = T and a1 = −T, which will unroll the system from s0 to the next state s1 and then to another state s2, according to equations (1)-(2). We will stop the simulation there and consider s2 a terminal state. Write down what s1 and s2 are (their numerical values as vectors), and the reward-to-go for each of the three states in this trajectory. If you need one, use a discount factor γ = 0.5.
Question 9 (3 Extra Points). Construct a trajectory of three states s0, s1, s2 such that it ends up at the perfect upright position, i.e., s2 = (0, 0). We require that s0 ≠ s2 and a0 ≠ 0. (Hint: Write down a few linear equations; you just need to find one of the many possible solutions.) Now, for the state s1 and the corresponding a1 that you came up with in this sequence, what should the optimal Q value Q(s1, a1) be? (Hint: Note that s2 is the best possible terminal state for the system, so the action a1 you computed is the best possible one.)
If you are interested (after the quarter ends, apparently – and sorry, no more extra credit), try coding this environment up and see how Q-learning can be used to find a good control policy for it.
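In that spirit, here is one possible tabular Q-learning sketch for this environment. The discretization grid, episode count, learning rate, and ε-greedy exploration are all arbitrary choices for illustration, not part of the problem.

```python
import random

T, DT = 1.0, 1.0                 # torque magnitude and time step, as in the problem

def step(state, a):
    """Equations (1)-(2) with k1 = k2 = 1."""
    theta, omega = state
    return (theta + omega * DT, omega + (theta + a) * DT)

def reward(state):
    theta, omega = state
    return -(theta ** 2 + omega ** 2)

ACTIONS = (0.0, T, -T)

def discretize(state, grid=0.5):
    """Bucket the continuous state so it can index a Q table (grid size is arbitrary)."""
    return tuple(round(v / grid) for v in state)

def q_learning(episodes=2000, horizon=10, alpha=0.1, gamma=0.5, eps=0.2, seed=0):
    """Tabular Q-learning with an epsilon-greedy policy; hyperparameters are arbitrary."""
    rng = random.Random(seed)
    Q = {}
    for _ in range(episodes):
        s = (rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5))
        for _ in range(horizon):
            ds = discretize(s)
            if rng.random() < eps:                      # explore
                a = rng.choice(ACTIONS)
            else:                                       # exploit current estimates
                a = max(ACTIONS, key=lambda act: Q.get((ds, act), 0.0))
            s2 = step(s, a)
            ds2 = discretize(s2)
            target = reward(s2) + gamma * max(Q.get((ds2, act), 0.0) for act in ACTIONS)
            Q[(ds, a)] = (1 - alpha) * Q.get((ds, a), 0.0) + alpha * target
            s = s2
    return Q
```

With the aggressive ∆t = 1 the system is quite unstable, so do not expect a perfect policy; the point is only to see the Q table being filled in by simulated experience.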
5 Monte Carlo Tree Search
Consider the minimax tree in Figure 4. This is the entire game tree, and the leaf nodes are the actual terminal states of the game. In this question, wherever there is a random choice to make, the order is from left to right. For instance, when you need to break a tie between two child nodes, prioritize the left one. When you need to sample a random outcome between left and right, the random sequence is fixed to be left, right, left, right, etc. In the tree, the number 1 means Max wins, and the number 0 means Min wins.

Figure 4: Tree for the MCTS Question

Question 10 (2 Points). Draw the first four trees that the MCTS algorithm constructs. Use c = 10 in the UCB formula (refer to the slides or the pseudocode of PA4). After 6 rounds of the algorithm (6 iterations of the entire MCTS loop), what is the number of wins and the number of visits for Max's "right" action?

6 Constraint Solving

We now look at constraint solving in the context of neural networks. Shown in Figure 5 is a simple neural network. You don't need to know about neural networks beforehand (and knowing them will not give any advantage); just follow the descriptions below.

Figure 5: Left: Neural network model for the constraint solving questions. Right: The tanh function.
The x variable represents the input, and it first passes through a linear transformation with weight a1 and bias b1 and then gets fed into the "neuron", which we choose to be the tanh function here. Look up this function on Google and observe that it has a "switch"-like behavior, which makes it a continuous version of a logical gate. Also, importantly, it is a monotonic function. Then the output of the tanh function passes through another transformation defined by a2 and b2 and becomes the output value y. So overall, this network is simply the diagram of a nonlinear function from the input x to the output y:

y = a2·tanh(a1·x + b1) + b2.
This nested form can be flattened into a constraint system over x, y and two additional variables s1, s2, in the sense that the input and output values of the network always satisfy the following constraint system:

{ s1 = a1·x + b1,   s2 = tanh(s1),   y = a2·s2 + b2 }.        (4)
In this form, we can use constraint solving algorithms to analyze the relation between the inputs and outputs of this network. Note that all the really large networks (with millions or billions of neurons like this one) can be encoded and analyzed with the same principle, if the constraint solving algorithms scale well enough.
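It can help to sanity-check the encoding by evaluating (4) in the forward direction: any (x, s1, s2, y) produced by running the network satisfies all three constraints by construction. The parameter values default to 1 here purely for concreteness.

```python
import math

def network(x, a1=1.0, b1=1.0, a2=1.0, b2=1.0):
    """Evaluate the network, returning all variables of constraint system (4)."""
    s1 = a1 * x + b1
    s2 = math.tanh(s1)
    y = a2 * s2 + b2
    return s1, s2, y

# Forward pass on an arbitrary input with all parameters set to 1:
s1, s2, y = network(0.0)   # s1 = 1, s2 = tanh(1), y = 1 + tanh(1)
```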
Question 11 (3 Points). Suppose y ∈ [1, 2] is the range for the output to be classified as a certain result (like "cat"). Assume that the network parameters have been trained to be simply a1 = b1 = a2 = b2 = 1. What is the maximum range (lower and upper bounds) of the input x that will produce this output range y ∈ [1, 2]? Round numbers to 2 digits after the decimal point in your result. Feel free to use Google or Python for any of the computation needed. (Hint: use arc-consistency to propagate the ranges of the variables from y to x, using the constraints in (4).)
Question 12 (3 Points). Suppose I have an input x0 that already produces the output y0. I now want a different input x1 that is quite different from x0, in the sense that |x0 − x1| > ε, where ε is a given positive parameter that defines the threshold, such that the output on x1 is exactly the same as x0's output. What is the constraint system (written in a similar way to (4)) that uniquely defines all such x1? Feel free to use all the variables that have appeared here (x0, y0, ε) in your constraints. (Note: this is, in general, the adversarial attack problem for neural networks. By solving such constraints, you can find a picture that is nonsense but still classified as a cat, which is what people worry about for face recognition systems and autonomous cars driven by neural perception systems.)
7 Propositional Reasoning

Question 13 (2 Points). List all the satisfying assignments of the following DNF formula.

(p1 ∧ p2 ∧ p3) ∨ (¬p1 ∧ p2) ∨ (p2 ∧ ¬p3)

(Hint: Recall the special structure of DNF and you don't need a truth table to do this.)
Question 14 (2 Points). List all the satisfying assignments of the following CNF formula.

(¬p1 ∨ ¬p2 ∨ ¬p3) ∧ (p1 ∨ ¬p2) ∧ (¬p2 ∨ p3)

(Hint: Pay attention to the relation between this formula and the previous one.)
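If you want to verify your hand-derived answers for both questions, a brute-force truth-table check is only eight rows (the hints' shortcuts are still the intended approach).

```python
from itertools import product

def dnf(p1, p2, p3):
    # (p1 ∧ p2 ∧ p3) ∨ (¬p1 ∧ p2) ∨ (p2 ∧ ¬p3)
    return (p1 and p2 and p3) or ((not p1) and p2) or (p2 and not p3)

def cnf(p1, p2, p3):
    # (¬p1 ∨ ¬p2 ∨ ¬p3) ∧ (p1 ∨ ¬p2) ∧ (¬p2 ∨ p3)
    return ((not p1) or (not p2) or (not p3)) and (p1 or not p2) and ((not p2) or p3)

# Enumerate all 8 assignments of (p1, p2, p3) and keep the satisfying ones.
dnf_models = [v for v in product([False, True], repeat=3) if dnf(*v)]
cnf_models = [v for v in product([False, True], repeat=3) if cnf(*v)]
```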