CE191 Lab4 1 CE191 Lab 5 Stochastic Dynamic Programming Dounan Tang 11/18/2015 Lab assignment will be due on 12/4/2015 CE191 Lab5 2 Bridge management Goal: minimize the cost of bridge management If no action is performed, the bridge deteriorates year after year Possible actions: “maintenance”, “replacement”, “do nothing” CE191 Lab5 3 Deterministic vs. stochastic model Deterministic model If the bridge is in condition 5 this year, and we do nothing, the bridge will be in condition 4 next year Stochastic model If the bridge is in condition 5 this year, and we do nothing, there is 20% probability that the bridge will be in condition 5 next year, and 80% probability in condition 4 next year CE191 Lab5 4 Bellman equation for stochastic model ( ) ( ) ( ) ( ) ' ' 'min , Pr | , u U x X V x c x u ob x x u V xα ∈ ∈ = + ∑ minimum cost to manage bridge discount factor state transition matrices, in transitionMatrices.mat cost function, in cxu.m x: bridge condition u: action CE191 Lab5 5 Example Penalty(poor) = 2, Penalty(okay) = 1, Penalty(good) = 0 Repair(nothing) = 0, Repair (repair) = 1. okay poor 0.1 0.1 0.1 good 0.1 0.8 0.1 action = repair okay poor 1 0.7 0.3 good 0.5 0.3 0.2 action = nothing 0.8 0.5 0.4 CE191 Lab5 6 Backward value iteration Life can be only understood backward… Good OK Poor Year t Good OK Poor Good OK Poor Year t-1Year t-2 initialize V(Good)=0 V(OK)=0 V(Poor)=0 V(Good),V(OK),V(Poor) are computed by solving Bellman Equation. … repair or do nothing?repair or do nothing? Computation direction CE191 Lab5 7 Finite vs. infinite horizon Finite horizon Backward recursion to t-T, in which T is the horizon As a result, optimal cost function V(t,x) is a function of both time and state viteration(transitionMatrices, cxu, horizon) Infinite horizon Backward recursion until V converges Once converged, V is only a function of state viterationInf(transitionMatrices, cxu, epsilon) Epsilon is used to check the convergence of value iteration CE191 Lab5 8 Our problem 10 conditions from 0 to 9 Actions can be replacement, do nothing, maintenance Conditions 0-2 are not acceptable? Penalty vector [Inf,Inf,Inf,0,0,0,0,0,0,0] This has been implemented in cxu.m T is 5 years (Question 1) and infinity (Question 2) CE191 Lab5 9 Work plan Understand the input data structure and cxu.m Complete the two functions. Make sure your results are making sense. Complete the lab report. CE191 Lab5 10 Discussion (no extra credits) What kind of problems can be solved by dynamic programming?
欢迎咨询51作业君