

Fall 2020 CSE150B Final 2

Please finish reading everything on this page before you start. Otherwise we will just reply "Page 1" to questions that are already answered here, and you will still need to read it anyway.

Submission deadline: Dec-19 13:00 PST. Submit photos or PDFs of your written answers on Gradescope. Record a video in which you briefly explain the answers. Don't make it too long, just enough so that we can tell that you get the ideas for each question you answer. The grading will still be based on the written part; the video will be used to spot serious problems or to help you get partial credit. For the video submission, we recommend the following:

1. Record your video using Zoom or your phone and submit the mp4 file through the link https://www.dropbox.com/request/si4ItzYr6RkVmpz0hNVu. Make sure the file name has your first name, last name, and PID. Dropbox will send you a confirmation.
2. You can update the video before the deadline (do not overdo it, because there might be glitches in Dropbox over multiple versions; uploading just once is best), and make sure you always use the same file name, otherwise we will grade using an arbitrary one from what you sent.
3. Or you can record your video using Zoom with its "Record on the Cloud" option, and Zoom will send you a link and password for sharing that video. Send the link and password to my email [email protected] (write "150B Final" and your PID in the subject). This is less preferred than uploading the video directly, and you do not get a confirmation immediately either.
4. If you recorded using your phone or in other ways, definitely try compressing the video into mp4 format (you can use relatively low resolution), otherwise the file may be too large. We highly recommend leaving enough time for uploading: ideally start uploading before Friday is over for you, so that there is time to deal with technical issues on Saturday if there are any.
5. If you have difficulty with all these options, send a message on Slack in the #question channel before the submission deadline and we'll direct message you to figure out a way.

When a question has a "why" or "explain" part, make sure to write down a short explanation in writing first; then you can just repeat it in your video. When there is no "why" asked in a question, you only need to write down the answer and then explain the why in the video. You can write your answers on blank paper and take photos of them, or directly annotate the pdf, etc. Just make sure things are readable.

Your final grade will depend on the sum of all points you have obtained in the class, so you only need to get enough points in the final to reach the grade you want. E.g., if you just need a B and you got full points for all the PAs already (summing up to 70 or more), then you just need to aim for 10 points from the final; of course, over-budgeting a bit is always good.

1 Classical Search (6 Points)

Consider the state space graph shown in Figure 1. A is the start state and G is the goal state. Each edge can be traversed in both directions.

[Figure 1: Graph for the classical search questions, over the nodes A, B, C, D, E, F, G, H.]

Question 1 (3 Points). Let the cost of passing through each node be defined as:

  c(A) = 0, c(B) = 1, c(C) = 10, c(D) = 3, c(E) = 3, c(F) = 1, c(G) = 2, c(H) = 1.

The table below lists three basic graph search strategies and four possible paths. Check which, if any, of the listed paths each strategy may return: each algorithm may return multiple different paths, because we have not restricted the order in which we push new states into the frontier or how we break ties. Just mark the corresponding box, and make sure to mark all paths that can be returned by each algorithm.

                          A-E-B-A   A-B-C-G   A-E-B-C-G   A-D-H-F-G
  Depth-First Search
  Breadth-First Search
  Uniform Cost Search

Answer 1.

                          A-E-B-A   A-B-C-G   A-E-B-C-G   A-D-H-F-G
  Depth-First Search                   x          x           x
  Breadth-First Search                 x
  Uniform Cost Search                                          x

(A-E-B-A does not end at the goal, so no algorithm returns it; BFS returns the shallowest goal path A-B-C-G; UCS returns the cheapest goal path A-D-H-F-G, with total node cost 0 + 3 + 1 + 1 + 2 = 7.)

Question 2 (3 Points). Define the following heuristic function h:

  h(A) = 9, h(B) = 9, h(C) = 2, h(D) = 7, h(E) = 1, h(F) = 4, h(G) = 0, h(H) = 10.

Is h a consistent heuristic function? What is the path returned by A* search using h?

Answer 2. The heuristic function is not consistent. The path found is A-B-C-G.
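As a side note (not required for the exam), the consistency check itself is mechanical once the edges are known. Below is a minimal Python sketch of the test h(u) ≤ cost(u, v) + h(v) over every edge, assuming that the cost of traversing an edge (u, v) is the node cost c(v), as in Question 1; the edge list is only a hypothetical placeholder, since the actual edges are defined in Figure 1.

    # Sketch: generic consistency check for a heuristic on an undirected graph,
    # assuming the cost of traversing an edge (u, v) is the node cost c(v).
    c = {'A': 0, 'B': 1, 'C': 10, 'D': 3, 'E': 3, 'F': 1, 'G': 2, 'H': 1}
    h = {'A': 9, 'B': 9, 'C': 2, 'D': 7, 'E': 1, 'F': 4, 'G': 0, 'H': 10}
    edges = [('A', 'B'), ('A', 'E'), ('A', 'D')]   # placeholder; read the real edges off Figure 1

    def is_consistent(edges, c, h):
        """h is consistent iff h(u) <= c(v) + h(v) for every edge, in both directions."""
        for u, v in edges:
            for a, b in ((u, v), (v, u)):          # edges can be traversed in both directions
                if h[a] > c[b] + h[b]:
                    print(f"violated on {a} -> {b}: h({a}) = {h[a]} > {c[b]} + {h[b]}")
                    return False
        return True

    print(is_consistent(edges, c, h))   # already violated on A -> E if that edge exists: 9 > 3 + 1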
2 Adversarial Search

Consider the (upside-down Geisel library) game tree in Figure 2. You can think of it as playing one round of chess with trembling hands, in the following sense: there are two rational players as usual, Min and Max, and for each player, after they determine what action to take, a chance player comes in (think of it as the environment, modeling the fact that the two players may face challenges putting the pieces down at the places they want) and gives a random outcome according to the probabilities labeled on the edges in the tree. That is, the Chance 1 player always gives the left outcome with 0.8 probability and the right one with 0.2 probability, while the Chance 2 player uses a uniform distribution, giving each possible outcome 0.5 probability. As always, Max will maximize, Min will minimize, and the chance players will return a sampled outcome according to the probabilities.

[Figure 2: Tree for the adversarial search questions. Levels from the root: Max, Chance 1 (0.8/0.2), Min, Chance 2 (0.5/0.5), Terminal.]

Question 3 (3 Points). What is the final value for the Max player (i.e., what is the number you should annotate the root node with)?

Answer 3. 5.3

Question 4 (2 Points). I want to write the value of the Max player succinctly as

  Vmax = max_a E [ min_a E [ R(τ) ] ]

so that this value Vmax corresponds to what you have answered in the previous question. I hope you are convinced that this equation makes sense (I hope you will not ask what E means). Answer the following questions: How many choices does the max_a range over? And how many choices does the min_a range over? What should the τ here refer to? What is R(τ)? Also, give an arbitrary instance of τ and its R(τ).

Answer 4. max_a ranges over 3 choices, while min_a ranges over 2 choices for each Min node (12 total). τ refers to a trajectory: any path in the tree above from the Max root down to a terminal node is a valid trajectory, and R(τ) is the terminal value that the trajectory ends at. (Grading note: if they only write 12 then −1, since it's wrong; many will write "terminal node", −0.5.)
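As an aside (not part of the exam), the root value in Question 3 is just an expectimax computation with Min nodes mixed in. The exact tree of Figure 2 is not reproduced here, so the tree literal below is a made-up stand-in with the same kind of layers; only the evaluation rule matters: maximize over Max's actions, minimize over Min's actions, and take expectations at chance nodes.

    # Sketch: evaluating a game tree with Max, Min, and chance nodes (hypothetical tree).
    def value(node):
        kind, body = node
        if kind == 'terminal':
            return body                              # body is the numeric payoff
        if kind == 'max':
            return max(value(child) for child in body)
        if kind == 'min':
            return min(value(child) for child in body)
        if kind == 'chance':                         # body is a list of (probability, subtree)
            return sum(p * value(child) for p, child in body)

    # Made-up example in the spirit of Figure 2: Max -> Chance (0.8/0.2) -> Min / terminals.
    tree = ('max', [
        ('chance', [(0.8, ('min', [('terminal', 10), ('terminal', 3)])),
                    (0.2, ('terminal', -2))]),
        ('chance', [(0.8, ('terminal', 0)),
                    (0.2, ('terminal', 0.5))]),
    ])
    print(value(tree))   # left action: 0.8 * 3 + 0.2 * (-2) = 2.0; right action: 0.1; root = 2.0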
3 Markov Decision Processes

Consider an MDP with states S = {s1, s2, s3}, actions A = {a}, and the following transition probabilities: P(s1 | s1, a) = 0, P(s2 | s1, a) = 0.6, P(s3 | s1, a) = 0.4. Define rewards R(s1) = 0, R(s2) = 1, R(s3) = 2. For simplicity we do not use a discount factor in this problem (or you can think of it as γ = 1).

Question 5 (2 Points). Enumerate all possible trajectories with s1 being the initial state (in the form of state, reward of state, action, next state, and so on until termination), and their probabilities. Use them to compute the value of s1 in the MDP.

Answer 5.

  trajectory                             probability   total reward
  s1, R = 0, a, s2, R = 1 (terminate)    0.6           1
  s1, R = 0, a, s3, R = 2 (terminate)    0.4           2

So the expected reward starting from s1 is 0.6 · 1 + 0.4 · 2 = 1.4.

Question 6 (2 Points). Since s2 and s3 are terminal states, we can consider the MDP as a tree with s1 as its root. Show that the expectimax value of s1 in this tree is the same as the value of s1 in the MDP. (Again, we are not using a discount factor here.)

Answer 6. Value of s1: V(s1) = R(s1) + P(s1 | s1, a) V(s1) + P(s2 | s1, a) V(s2) + P(s3 | s1, a) V(s3). Since s2 and s3 are terminal states, V(s2) = R(s2) = 1 and V(s3) = R(s3) = 2. Thus V(s1) = R(s1) + P(s1 | s1, a) V(s1) + P(s2 | s1, a) R(s2) + P(s3 | s1, a) R(s3) = 0 + 0 + 0.6 · 1 + 0.4 · 2 = 1.4. By drawing the expectimax tree, the expectimax value is the same: P(s2 | s1, a) R(s2) + P(s3 | s1, a) R(s3) = 1.4.
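The arithmetic in Answer 5 is small enough to check in a few lines of Python; the probabilities and rewards below are exactly the ones given in the problem.

    # Sketch: V(s1) by enumerating the two trajectories from s1, as in Answer 5.
    P = {'s2': 0.6, 's3': 0.4}           # P(s' | s1, a); P(s1 | s1, a) = 0
    R = {'s1': 0, 's2': 1, 's3': 2}
    gamma = 1.0                          # no discounting in this problem

    V_s1 = sum(p * (R['s1'] + gamma * R[s_next]) for s_next, p in P.items())
    print(V_s1)                          # 0.6 * 1 + 0.4 * 2 = 1.4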
4 Reinforcement Learning

[Figure 3: Inverted pendulum for the RL questions.]

We will consider RL on a tiny control problem. Shown in Figure 3 is a simplified inverted pendulum (a mass attached to an inelastic, weightless rod). The hope is to balance the pendulum at the upright position by giving a torque input at the end of the rod (the box with T indicating two choices of torque in opposite directions). The angular position is measured by the difference from the upright position, and tilting right is considered the positive direction. Note that this is a (of course extremely) simplified model of how robots walk and SpaceX lands its rockets.

We use the variable θ to represent the pendulum's angular position, and ω to represent its angular speed. Suppose s = (θ, ω) is the current state, and s′ = (θ′, ω′) is the next state, after the system runs for a small time step Δt. The discretized and very aggressively simplified dynamics of the pendulum is as follows:

  θ′ = θ + ω Δt                  (1)
  ω′ = ω + (k1 θ + k2 a) Δt      (2)
  a ∈ {0, T, −T}                 (3)

where k1, k2 are constants dependent on the physical parameters that we do not need to care about, so let us just assume k1 = k2 = 1. These equations tell you how the current state (θ, ω) transitions to the next state (θ′, ω′) under an action a. This control action a is simply either 0 or T or −T, and we let T = 1. We set the time duration of each step Δt to also be constant, Δt = 1 (which is not small enough to physically justify the simplifications we made in the equations, but let's not worry about that).

Note that there is no probability involved in this setting, but the system dynamics will not be known to the learner, so the agent needs to learn the control policy by simulating the system and observing the effects of the actions. Define the reward function for each state s = (θ, ω) to be R(s) = −(θ² + ω²). By maximizing accumulated rewards, the agent can learn to maintain the pendulum at the upright position, in which case the accumulated rewards will be close to zero (and otherwise much more negative).

Question 7 (1 Point). For this problem, is it enough to consider only the angular position θ as the system's state, without the angular velocity component? Why? (Recall what "Markov" means, and although we don't have probabilities here, think about why the Markov condition was important.)

Answer 7. No. It would violate the Markov property: without the angular velocity, future states depend not only on the current state, but also on the sequence of states that preceded it.

Question 8 (2 Points). Let the initial state be s0 = (0, 1). Take two steps of actions, a0 = T and a1 = −T, which will unroll the system from s0 to the next state s1 and then to another state s2, according to the equations in (1)-(2). We will stop the simulation there and consider s2 as a terminal state. Write down what s1 and s2 are (their numerical values as a vector), and the reward-to-go for each of these three states in this trajectory. If you need it, use a discount factor γ = 0.5.

Answer 8. s1 = (1, 2), s2 = (3, 2). Reward-to-go: G(s0) = −6.75, G(s1) = −11.5, G(s2) = −13.

Question 9 (3 Extra Points). Construct a trajectory of three states s0, s1, s2 such that it ends up at the perfect upright position, i.e., s2 = (0, 0). We require that s0 ≠ s2 and a0 ≠ 0. (Hint: Write down a few linear equations and you just need to find one of the many possible solutions.) Now, for the state s1 and the corresponding a1 that you came up with in this sequence, what should the optimal Q value be for Q(s1, a1)? (Hint: Note that s2 is the best possible terminal state for the system, so the action a1 you computed is the best possible one.)

Answer 9. We require

  θ1 = θ0 + ω0
  ω1 = ω0 + θ0 + a0
  θ2 = 0 = θ1 + ω1 = 2θ0 + 2ω0 + a0
  ω2 = 0 = ω1 + θ1 + a1 = 2θ0 + 2ω0 + a0 + a1,

and so it is necessary and sufficient that a1 = 0 and 2θ0 + 2ω0 + a0 = 0. (Grading note: the Q value should be whatever their θ1, ω1 are, plugged into the reward function, since the next state from there is s2 with reward 0.)

If you are interested (after the quarter ends, apparently, and sorry, no more extra credit), try coding this environment up and see how Q-learning can be used to find a good control policy for it (although you may need to tune the parameters in the equations to make it more realistic and feasible).
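Since the dynamics (1)-(2) and all the constants (k1 = k2 = 1, Δt = 1, T = 1, γ = 0.5) are given, the numbers in Answer 8 can be reproduced with a short Python sketch (again, not required for the exam):

    # Sketch: rolling out equations (1)-(2) from Question 8 and computing reward-to-go.
    k1 = k2 = 1.0
    dt = 1.0
    T = 1.0
    gamma = 0.5

    def step(theta, omega, a):
        """One discretized step of the simplified pendulum dynamics."""
        return theta + omega * dt, omega + (k1 * theta + k2 * a) * dt

    def reward(theta, omega):
        return -(theta ** 2 + omega ** 2)

    states = [(0.0, 1.0)]                    # s0 = (0, 1)
    for a in (T, -T):                        # a0 = T, a1 = -T
        states.append(step(*states[-1], a))

    # reward-to-go, computed backwards: G(s2) = R(s2), G(s_t) = R(s_t) + gamma * G(s_{t+1})
    G = 0.0
    rewards_to_go = []
    for i, s in enumerate(reversed(states)):
        G = reward(*s) + (gamma * G if i > 0 else 0.0)
        rewards_to_go.append(G)
    print(states)                            # [(0.0, 1.0), (1.0, 2.0), (3.0, 2.0)]
    print(rewards_to_go[::-1])               # [-6.75, -11.5, -13.0]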
5 Monte Carlo Tree Search

Consider the minimax tree in Figure 4. This is the entire game tree, and the leaf nodes are the actual terminal states of the game. In this question, wherever there is a random choice to make, the order is from left to right. For instance, when you need to break a tie between two children nodes, prioritize the left one. When you need to sample a random outcome between left and right, the random sequence is fixed to be left, right, left, right, etc. In the tree, the number 1 means Max wins, and the number 0 means Min wins.

[Figure 4: Tree for the MCTS question.]

Question 10 (2 Points). Draw the first four trees that the MCTS algorithm constructs. Use c = 10 in the UCB formula (refer to the slides or the pseudocode of PA3). After 6 rounds of the algorithm (6 iterations of the entire MCTS loop), what is the number of wins and the number of visits for Max's right action?

Answer 10. Expand order: L, R, RL, LL. After 6 rounds, Q = 3, N = 3.

6 Constraint Solving

We now look at constraint solving in the context of neural networks. Shown in Figure 5 is a simple neural network. You don't need to know about neural networks beforehand (and knowing them will not give any advantage); just follow the descriptions below.

[Figure 5: Left: Neural network model for the constraint solving questions (input x, weights a1, a2, biases b1, b2, a tanh neuron, output y). Right: The tanh function.]

The x variable represents the input; it first passes through a linear transformation with weight a1 and bias b1, and then gets fed into the neuron, which we choose to be the tanh function here. Look up this function on Google and observe that it has a switch-like behavior, which makes it a continuous version of a logical gate. Also, importantly, it is a monotonic function. Then the output from the tanh function passes through another transformation defined by a2 and b2 and becomes the output value y. So overall, this network is simply the diagram of a nonlinear function from the input x to the output y:

  y = a2 tanh(a1 x + b1) + b2.

This nested form can be flattened into a constraint system over x, y and two additional variables s1, s2, in the sense that the input and output values of the network always satisfy the following constraint system:

  {s1 = a1 x + b1, s2 = tanh(s1), y = a2 s2 + b2}.    (4)

In this form, we can use constraint solving algorithms to analyze the relation between inputs and outputs of this network. Note that all the really large networks (with millions or billions of neurons like this) can be encoded and analyzed with the same principle, if the constraint solving algorithms scale well enough.

Question 11 (3 Points). Suppose y ∈ [1, 2] is the range for the output to be classified as a certain result (like "cat"). Assume that the network parameters have been trained to be simply a1 = b1 = a2 = b2 = 1. What is the maximum range (lower and upper bounds) of input x that will produce this output range y ∈ [1, 2]? Round numbers to 2 digits after the decimal point in your result. Feel free to use Google or Python for any of the computation needed. (Hint: use arc-consistency to propagate the range of variables from y to x, using the constraints in (4).)

Answer 11. y ∈ [1, 2] requires s2 + 1 ∈ [1, 2] (constraint 3), so s2 ∈ [0, 1], which requires tanh(s1) ∈ [0, 1] (constraint 2), so s1 ∈ [0, ∞), which requires x + 1 ∈ [0, ∞) (constraint 1), so x ∈ [−1, ∞). (Instructor's note: I was originally putting y in [1, 2] just as a placeholder before fixing the weights, then forgot, lol; that's why there's infinity here...)
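The interval propagation in Answer 11 can also be checked numerically. Below is a minimal Python sketch of the backward pass through the three constraints in (4), using math.atanh to invert tanh; the only subtlety is that tanh never actually reaches 1, so the upper bound on s1 (and hence on x) is +infinity, exactly as the answer and the note above say.

    import math

    # Sketch: backward interval propagation through the constraints in (4),
    # with a1 = b1 = a2 = b2 = 1 and y in [1, 2] as in Question 11.
    a1 = b1 = a2 = b2 = 1.0
    y_lo, y_hi = 1.0, 2.0

    # constraint 3: y = a2 * s2 + b2  =>  s2 = (y - b2) / a2
    s2_lo, s2_hi = (y_lo - b2) / a2, (y_hi - b2) / a2              # [0, 1]

    # constraint 2: s2 = tanh(s1); tanh is monotonic and tanh(s1) < 1 for all finite s1,
    # so an upper bound of 1 on s2 propagates back to +infinity on s1.
    s1_lo = math.atanh(s2_lo)                                      # 0
    s1_hi = math.atanh(s2_hi) if s2_hi < 1 else math.inf           # +inf

    # constraint 1: s1 = a1 * x + b1  =>  x = (s1 - b1) / a1
    x_lo, x_hi = (s1_lo - b1) / a1, (s1_hi - b1) / a1
    print(round(x_lo, 2), x_hi)                                    # -1.0 inf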
Question 12 (3 Points). Suppose I have an input x0 that already produces the output y0. I now want a different input x1 that is quite different from x0, in the sense that |x0 − x1| > ε, where ε is a given positive parameter that defines the threshold, such that the output on x1 is exactly the same as x0's output. What is the constraint system (written in a similar way as (4)) that you can write down to uniquely define all such x1? Feel free to use all the variables that appeared here (x0, y0, ε) in your constraints. (Note: this is, in general, the adversarial attack problem for neural networks. By solving such constraints, you can find a picture that is nonsense but still classified as "cat", which is what people worry about for face recognition systems and autonomous cars driven by neural perception systems.)

Answer 12. Apply the constraints in (4) to x0 and to x1 (with a fresh copy of the intermediate variables for each), require both copies to produce the same output y0, and add the constraint |x0 − x1| > ε:

  {s1 = a1 x0 + b1, s2 = tanh(s1), y0 = a2 s2 + b2,
   s1′ = a1 x1 + b1, s2′ = tanh(s1′), y0 = a2 s2′ + b2,
   |x0 − x1| > ε}

(Instructor's note: I meant to require x0 and y0 to be constrained as well, because they should be (so there should be two copies of the constraints); but the wording is perhaps not clear, so if they just use them it's fine... Actually, if they don't constrain x0 and y0, let's do −1.)

7 Propositional Reasoning

Question 13 (2 Points). List all the satisfying assignments of the following DNF formula.

  (p1 ∧ p2 ∧ p3) ∨ (¬p1 ∧ p2) ∨ (p2 ∧ ¬p3)

(Hint: Recall the special structure of DNF; you don't need a truth table to do this.)

Answer 13.

  p1 = 1, p2 = 1, p3 = 1
  p1 = 0, p2 = 1, p3 = 0
  p1 = 0, p2 = 1, p3 = 1
  p1 = 1, p2 = 1, p3 = 0

Question 14 (2 Points). List all the satisfying assignments of the following CNF formula.

  (¬p1 ∨ ¬p2 ∨ ¬p3) ∧ (p1 ∨ ¬p2) ∧ (¬p2 ∨ p3)

(Hint: Pay attention to the relation between this formula and the previous one.)

Answer 14. p2 = 0, and p1, p3 can be arbitrarily assigned. Or, explicitly:

  p1 = 0, p2 = 0, p3 = 0
  p1 = 0, p2 = 0, p3 = 1
  p1 = 1, p2 = 0, p3 = 0
  p1 = 1, p2 = 0, p3 = 1
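The hint is right that no truth table is needed, but a brute-force check is a quick way to sanity-test both answers in Python. The two formulas above are written with the negations that match the listed assignments, and the code below encodes them directly.

    from itertools import product

    # Sanity check: brute-force both formulas over all 8 assignments.
    def dnf(p1, p2, p3):
        return (p1 and p2 and p3) or ((not p1) and p2) or (p2 and (not p3))

    def cnf(p1, p2, p3):
        return ((not p1) or (not p2) or (not p3)) and (p1 or (not p2)) and ((not p2) or p3)

    for name, formula in (("DNF", dnf), ("CNF", cnf)):
        sats = [bits for bits in product([0, 1], repeat=3) if formula(*bits)]
        print(name, sats)
    # The DNF is satisfied exactly when p2 = 1, and the CNF (its negation) exactly when p2 = 0.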

