
 SCDAA Coursework 2022-23∗

David Šiška, School of Mathematics, University of Edinburgh

Submit via Learn by Friday 7th April 2023 at 12:01pm

Overview

Your overall task will be to implement a numerical algorithm for solving stochastic control problems using policy iteration combined with the “deep Galerkin method” for solving a linear PDE.

This will be broken into steps which a) give you credit for completion and b) allow you to check your progress.

1. Create Python code for solving the LQR problem in 2 spatial and 2 control dimensions, see Section 1. We will need this to check our policy iteration algorithm produces correct answers in the end.

2. Train a neural network for the optimal Markovian control from the previous step (supervised learning), see Section 2 and train a different neural network for the value function using the value functions from the previous step. This will verify that the networks are “rich enough” and can provide good approximations.

3. Implement the “deep Galerkin method” for solving a linear PDE, see Section 3.

4. Implement the policy iteration, see Section 4.

You will be expected to submit a PDF report on what you did on Learn by the deadline. In the PDF provide a link to an online git repository where your code can be found. It should contain a README.md telling me how to run the code to reproduce any output (graphic / table) you include in the PDF. I will use the code from the latest commit made prior to the deadline.

Use numpy, scipy, matplotlib and torch and all the functions they contain but do not use other libraries.1 You will be working in groups of three2 and the submitted PDF and git repo README.md should state your names, student numbers and individual contribution fractions.3 You need to work together on each of the tasks at least to the extent that you can use code created in one task for the next one. There are 22 marks in total on this assignment but it is marked out of 20. Think of the extra two marks as a bonus.

∗Last updated 21st March 2023.

1If you wish to use another library seek permission from the course organiser first.

2Self-organise. One person from each group should send [email protected] an email (Cc'ing the other members) with the subject SCDAA 2023 CW group stating names and student numbers of group members.

3The default is that you contributed equally, i.e. 1/3 each. If one of you feels they contributed more then they have to agree with the others about how you split this (e.g. 1/2, 1/4, 1/4). If you cannot agree speak with the course organiser.


 Please note that a submission that doesn’t demonstrate why things work (e.g. test examples, convergence plots etc.) will get very few marks even if you have code that correctly solves the exercises.

1 Linear quadratic regulator

We consider
\[
dX_s = [H X_s + M \alpha_s]\, ds + \sigma\, dW_s, \quad s \in [t,T], \qquad X_t = x. \qquad (1)
\]
Our aim is to minimize
\[
J^\alpha(t,x) := \mathbb{E}\Big[ \int_t^T \big( X_s^\top C X_s + \alpha_s^\top D \alpha_s \big)\, ds + X_T^\top R X_T \Big],
\]

where4 C ≥ 0, R ≥ 0 and D > 0 are given and deterministic, and we will assume they are all 2 × 2 matrices.

The value function is v(t, x) := inf_α J^α(t, x). We know how to solve the Bellman PDE to obtain
\[
v(t,x) = x^\top S(t)\, x + \int_t^T \operatorname{tr}\big(\sigma \sigma^\top S(r)\big)\, dr,
\]
where S is the solution of the Riccati ODE
\[
S'(r) = -2 H^\top S(r) + S(r) M D^{-1} M^\top S(r) - C, \quad r \in [t,T], \qquad S(T) = R.
\]

Note that the solution takes values in the space of 2 × 2 matrices. The optimal Markov control is
\[
a(t,x) = -D^{-1} M^\top S(t)\, x.
\]

Exercise 1.1 (Solving LQR using the Riccati ODE, 1 mark).

Write a class which:

i) Can be initialised with the matrices specifying the LQR problem and T > 0.

ii) Has a method which will solve (approximate) the associated Riccati ODE on a time grid which is given as an input (numpy array or torch tensor).

iii) Has a method that, given a torch tensor of dimension batch size × 1 × 2, will return a torch tensor of dimension batch size × 1 with entries being the control problem value v(t, x) for each t, x in the batch (for x two dimensional).

iv) Has a method that, given a torch tensor of dimension batch size × 1 × 2, will return a torch tensor of dimension batch size × 2 with entries being the Markov control function for each t, x in the batch (for x two dimensional).
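To make the interface concrete, here is a minimal (non-optimised) sketch of such a class. It assumes the matrices are supplied as 2 × 2 numpy arrays and the times and states as torch tensors; the names LQRSolver, solve_riccati and value are illustrative rather than prescribed.

import numpy as np
import torch
from scipy.integrate import solve_ivp

class LQRSolver:
    """Sketch for Exercise 1.1: Riccati ODE solver and value function evaluation."""

    def __init__(self, H, M, sigma, C, D, R, T):
        self.H, self.M, self.sigma = H, M, sigma
        self.C, self.D, self.R, self.T = C, D, R, T

    def solve_riccati(self, time_grid):
        """time_grid: ascending numpy array with time_grid[0] < T; returns S(t) for each grid point."""
        H, M, C, D, R = self.H, self.M, self.C, self.D, self.R
        def ode(r, s_flat):
            S = s_flat.reshape(2, 2)
            dS = -2 * H.T @ S + S @ M @ np.linalg.inv(D) @ M.T @ S - C
            return dS.reshape(-1)
        # integrate backwards from the terminal condition S(T) = R
        sol = solve_ivp(ode, [self.T, time_grid[0]], R.reshape(-1),
                        t_eval=time_grid[::-1], rtol=1e-8, atol=1e-10)
        return sol.y.T[::-1].reshape(-1, 2, 2)   # reorder to ascending time

    def value(self, t_batch, x_batch, n_grid=1000):
        """t_batch: (batch,) times, x_batch: (batch, 2) states -> v(t, x), shape (batch,)."""
        out = torch.zeros(len(t_batch))
        for i in range(len(t_batch)):
            t = float(t_batch[i])
            x = x_batch[i].numpy().reshape(2, 1)
            if self.T - t < 1e-10:                 # at terminal time, v(T, x) = x^T R x
                out[i] = float(x.T @ self.R @ x)
                continue
            grid = np.linspace(t, self.T, n_grid)
            S = self.solve_riccati(grid)
            integrand = np.trace(self.sigma @ self.sigma.T @ S, axis1=1, axis2=2)
            # trapezoidal rule for the integral term of v(t, x)
            integral = np.sum(0.5 * (integrand[:-1] + integrand[1:])) * (grid[1] - grid[0])
            out[i] = float(x.T @ S[0] @ x) + integral
        return out

A method for the Markov control (item iv) can be added analogously by returning -D^{-1} M^⊤ S(t) x for each (t, x) in the batch.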

Exercise 1.2 (LQR MC checks, 6 marks). Run a Monte Carlo simulation of the system with the optimal solution you have obtained and ensure that you’re converging to the optimal value function you obtained in Exercise 1.1. In particular:

1. Define how you’ll measure the error and justify your choice.

4 For any matrix M we will write M ≥ 0 to denote positive semidefinite matrices, i.e. matrices such that for any ξ ∈ R^n we have ξ⊤Mξ ≥ 0. Similarly M > 0 denotes strictly positive definite matrices, i.e. those such that for any ξ ∈ R^n, ξ ≠ 0, we have ξ⊤Mξ > 0.


2. With the number of Monte Carlo samples large (e.g. 10^5), vary the number of time steps in your simulation: take 1, 10, 50, 100, 500, 1000, 5000 and plot the error as a log-log plot.

3. With the number of time steps large (e.g. 5000), vary the number of Monte Carlo samples: take 10, 50, 10^2, 5·10^2, 10^3, 5·10^3, 10^4, 5·10^4, 10^5 and plot the error as a log-log plot.

Hint: Let a = a(t, x) denote the optimal control from Exercise 1.1. Once you've plugged this into (1) you get
\[
dX_s = [H X_s + M a(s, X_s)]\, ds + \sigma\, dW_s, \quad s \in [t,T], \qquad X_t = x. \qquad (2)
\]
To do a time discretisation, fix N ∈ N (the number of time steps) and let τ := T/N be the time step. Assume from now on that you would only possibly want to start the SDE at times t_k = kτ for some k = 0, 1, . . . , N. You have a choice of (at least) two schemes, since you know a(t,x) = -D^{-1} M^⊤ S(t) x.

Explicit:
\[
X^N_{t_{n+1}} = X^N_{t_n} + \tau \big[ H X^N_{t_n} - M D^{-1} M^\top S(t_n) X^N_{t_n} \big] + \sigma \big( W_{t_{n+1}} - W_{t_n} \big), \quad n = k, \dots, N-1, \qquad X^N_{t_k} = x.
\]
Implicit:
\[
X^N_{t_{n+1}} = X^N_{t_n} + \tau \big[ H X^N_{t_{n+1}} - M D^{-1} M^\top S(t_{n+1}) X^N_{t_{n+1}} \big] + \sigma \big( W_{t_{n+1}} - W_{t_n} \big), \quad n = k, \dots, N-1, \qquad X^N_{t_k} = x.
\]
The implicit scheme will require you to solve a linear system at each time step.
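As an illustration of the explicit scheme, a minimal Monte Carlo sketch might look as follows. Here S_fun is a hypothetical callable returning S(t) as a 2 × 2 array (e.g. interpolated from the output of Exercise 1.1), the matrices are numpy arrays, and the function estimates J^a(0, x) so that it can be compared with v(0, x).

import numpy as np

def mc_value_estimate(H, M, sigma, C, D, R, T, S_fun, x0, n_steps, n_samples, rng=None):
    """Explicit Euler scheme for (2) with a(t, x) = -D^{-1} M^T S(t) x; returns a Monte Carlo estimate of J^a(0, x0)."""
    rng = np.random.default_rng() if rng is None else rng
    tau = T / n_steps
    Dinv = np.linalg.inv(D)
    X = np.tile(x0.reshape(1, 2), (n_samples, 1))            # all samples start at x0
    cost = np.zeros(n_samples)
    for n in range(n_steps):
        t_n = n * tau
        S = S_fun(t_n)
        a = -X @ (Dinv @ M.T @ S).T                          # control a(t_n, X), shape (n_samples, 2)
        cost += tau * (np.einsum('bi,ij,bj->b', X, C, X)
                       + np.einsum('bi,ij,bj->b', a, D, a))  # running cost x^T C x + a^T D a
        dW = np.sqrt(tau) * rng.standard_normal((n_samples, 2))
        X = X + tau * (X @ H.T + a @ M.T) + dW @ sigma.T     # explicit Euler step
    cost += np.einsum('bi,ij,bj->b', X, R, X)                # terminal cost X_T^T R X_T
    return cost.mean()

Repeating this while varying n_steps (with n_samples fixed and large) and then varying n_samples (with n_steps fixed and large) gives the data for the two log-log error plots requested above.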

2 Supervised learning, checking the NNs are good enough

For our purposes a neural network is a parametric function that depends on an input, say x ∈ Rd and parameters, say θ ∈ Rp and takes values in say Rd′ and we’ll write φ = φ(x; θ).

An example of one hidden layer neural network is

φ(x; θ) = φ(x; α(1), α(2), β(1), β(2)) = α(1)ψ(α(2)x + β(2)) + β(1) ,

where ψ is an activation function applied component-wise, α(2) is an h × d matrix, β(2) is an h-dimensional vector, α(1) is a d′ × h matrix and β(1) is a d′-dimensional vector. In this case θ = (α(1), α(2), β(1), β(2)), so that p = h × d + h + d′ × h + d′ is the number of parameters or "weights", depending on the size of the hidden layer h.
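For concreteness, the one-hidden-layer network above can be written as a small torch module; the choice ψ = tanh below is just one example of an activation function.

import torch
import torch.nn as nn

class OneHiddenLayerNet(nn.Module):
    """phi(x; theta) = alpha^(1) psi(alpha^(2) x + beta^(2)) + beta^(1) with psi = tanh."""

    def __init__(self, d, h, d_out):
        super().__init__()
        self.hidden = nn.Linear(d, h)       # alpha^(2) x + beta^(2)
        self.out = nn.Linear(h, d_out)      # alpha^(1) (.) + beta^(1)

    def forward(self, x):
        return self.out(torch.tanh(self.hidden(x)))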

The type of supervised learning task that is the most relevant for our purposes is the following. Our aim is to find neural network (NN) weights θ∗ such that our NN φ(·;θ∗) is a good approximation of some function f.

We get a training data set {(x^(i), f(x^(i))) : i = 1, . . . , N_data} and search for θ∗ by attempting to minimize
\[
R(\theta) := \frac{1}{N_{\text{data}}} \sum_{i=1}^{N_{\text{data}}} \big| \varphi(x^{(i)}; \theta) - f(x^{(i)}) \big|^2
\]
over θ ∈ R^p by running (some variant of) a gradient descent algorithm, i.e. from an initial guess θ^(0) we update
\[
\theta^{(k+1)} = \theta^{(k)} - \gamma \nabla_\theta R(\theta^{(k)}), \quad k = 0, 1, 2, \dots.
\]
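A generic sketch of this training loop, using the torch Adam optimiser rather than plain gradient descent, is given below. Here model, x_train and y_train are placeholders for whichever network and training data you use, and the network is assumed to take a single tensor argument.

import torch

def train_supervised(model, x_train, y_train, n_epochs=2000, lr=1e-3):
    """Fit phi(.; theta) to targets f(x^(i)) by minimising the empirical mean-squared error."""
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    losses = []
    for epoch in range(n_epochs):
        optimizer.zero_grad()
        pred = model(x_train)                      # phi(x^(i); theta)
        loss = torch.mean((pred - y_train) ** 2)   # R(theta)
        loss.backward()                            # gradients via backpropagation
        optimizer.step()                           # Adam update of theta
        losses.append(loss.item())
    return losses

The returned list of losses can be plotted with matplotlib to produce the training-loss plots requested in Exercises 2.1 and 2.2 below.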


 Exercise 2.1 (Supervised learning of value function v, 3 marks). Take the neural network [1, Net DGM class] with hidden layer size of 100.

With T = 1, use Exercise 1.1 to generate training data v(t(i), x(i)) by sampling t uniformly on [0, T] and x uniformly on [−3, 3] × [−3, 3].

Use the torch Adam optimizer to find network weights which give a good approximation of the value functions. Provide a plot of the training loss.

Exercise 2.2 (Supervised learning of Markov control a, 2 marks). Carry out the same task as in Exercise 2.1 but for the Markov control from Exercise 1.1. Use the neural network [1, FFN class] with 2 hidden layers of size 100. Note that the output of the network must be 2-dimensional in this case as the function you’re approximating returns a 2-dimensional vector. Provide a plot of the training loss.

3 Deep Galerkin approximation for a linear PDE

Consider the linear PDE
\[
\partial_t u + \tfrac{1}{2} \operatorname{tr}\big( \sigma \sigma^\top \partial_{xx} u \big) + H x\, \partial_x u + M \alpha\, \partial_x u + x^\top C x + \alpha^\top D \alpha = 0 \ \text{ on } [0,T) \times \mathbb{R}^2, \qquad u(T,x) = x^\top R x \ \text{ on } \mathbb{R}^2,
\]
where α = (1, 1)^⊤ and H, M, C, D, R, σ are the matrices from Exercise 1.1.

This is the linearisation of the Bellman PDE resulting from taking the constant control α = (1, 1)⊤ regardless of the state of the system.

The deep Galerkin method replaces u by a neural network approximation u(·, ·; θ), chooses random points (t(i), x(i)), i = 1, . . . , N_batch, from the problem domain and then aims to minimize

\[
\begin{aligned}
R(\theta) :={}& R_{\text{eqn}}(\theta) + R_{\text{boundary}}(\theta) \\
={}& \frac{1}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \Big| \partial_t u(t^{(i)}, x^{(i)}; \theta) + \tfrac{1}{2} \operatorname{tr}\big( \sigma \sigma^\top \partial_{xx} u(t^{(i)}, x^{(i)}; \theta) \big) + H x^{(i)}\, \partial_x u(t^{(i)}, x^{(i)}; \theta) \\
&\qquad\qquad + M \alpha\, \partial_x u(t^{(i)}, x^{(i)}; \theta) + (x^{(i)})^\top C x^{(i)} + \alpha^\top D \alpha \Big|^2 \\
&+ \frac{1}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \Big| u(T, x^{(i)}; \theta) - (x^{(i)})^\top R x^{(i)} \Big|^2
\end{aligned}
\]
over θ ∈ R^p. Since the right-hand side of the PDE we're solving is zero, if we can get R(θ) equal to zero then we have a good approximation of the solution.
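A sketch of how R(θ) can be evaluated with torch autograd is given below. It assumes a network u_net(t, x) mapping tensors of shape (batch, 1) and (batch, 2) to (batch, 1), 2 × 2 torch tensors for the matrices, α as a tensor of shape (2,), and sampling x uniformly on [−3, 3]²; all of these choices are illustrative.

import torch

def dgm_loss(u_net, H, M, sigma, C, D, R, T, alpha, n_batch=256):
    """One evaluation of R(theta) = R_eqn(theta) + R_boundary(theta) on freshly sampled points."""
    t = torch.rand(n_batch, 1, requires_grad=True) * T
    x = (torch.rand(n_batch, 2, requires_grad=True) - 0.5) * 6.0   # uniform on [-3, 3]^2
    u = u_net(t, x)

    u_t = torch.autograd.grad(u.sum(), t, create_graph=True)[0]    # du/dt, (batch, 1)
    u_x = torch.autograd.grad(u.sum(), x, create_graph=True)[0]    # du/dx, (batch, 2)
    # Hessian: differentiate each component of du/dx again
    u_xx = torch.stack([torch.autograd.grad(u_x[:, i].sum(), x, create_graph=True)[0]
                        for i in range(2)], dim=1)                  # (batch, 2, 2)

    a = alpha.expand(n_batch, 2)
    drift = x @ H.T + a @ M.T                                       # Hx + M alpha
    trace_term = 0.5 * torch.einsum('ij,bji->b', sigma @ sigma.T, u_xx)
    running = torch.einsum('bi,ij,bj->b', x, C, x) + torch.einsum('bi,ij,bj->b', a, D, a)
    residual = u_t.squeeze(1) + trace_term + (drift * u_x).sum(dim=1) + running

    x_T = (torch.rand(n_batch, 2) - 0.5) * 6.0                      # points for the terminal condition
    terminal = u_net(torch.full((n_batch, 1), float(T)), x_T).squeeze(1) \
               - torch.einsum('bi,ij,bj->b', x_T, R, x_T)
    return (residual ** 2).mean() + (terminal ** 2).mean()

Minimising this loss with Adam, as in Section 2, and periodically comparing u(t, x; θ) against the Monte Carlo estimate with the constant control gives the plots requested in Exercise 3.1 below.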

Exercise 3.1 (Deep Galerkin, linear PDE, 5 marks). Implement the deep Galerkin method for the above linear PDE. You can use [1, pde BlackScholes exchange dgm.py] or [2] as a starting point / inspiration.

Provide: a plot of the training loss and also (perhaps not at every learning step but at reasonable intervals) the error against the Monte Carlo solution, which you will adapt from Exercise 1.2 by replacing the optimal control with the constant α.

4 Policy iteration with DGM

Recall the policy iteration / improvement algorithm (PIA) [3, Section 4.4]. We will approximate the value function v by v(·, ·; θ_val) and the Markov control a by a(·, ·; θ_act). At each iteration we will:


i) Given a Markov control function a approximated by a(·, ·; θ_act) we will need to solve
\[
\partial_t u + \tfrac{1}{2} \operatorname{tr}\big( \sigma \sigma^\top \partial_{xx} u \big) + H x\, \partial_x u + M a\, \partial_x u + x^\top C x + a^\top D a = 0 \ \text{ on } [0,T) \times \mathbb{R}^2, \qquad u(T,x) = x^\top R x \ \text{ on } \mathbb{R}^2.
\]

We already know how to solve this linear PDE: we just adapt Exercise 3.1 by replacing the constant α with the output of a(·, ·; θ_act). This gives us an update of θ_val.

ii) Now, with θ_val fixed, we want to update θ_act so that we minimize the Hamiltonian:
\[
H(\theta_{\text{act}}) = \frac{1}{N_{\text{batch}}} \sum_{i=1}^{N_{\text{batch}}} \Big[ H x^{(i)}\, \partial_x v(t^{(i)}, x^{(i)}; \theta_{\text{val}}) + M a(t^{(i)}, x^{(i)}; \theta_{\text{act}})\, \partial_x v(t^{(i)}, x^{(i)}; \theta_{\text{val}}) + (x^{(i)})^\top C x^{(i)} + a(t^{(i)}, x^{(i)}; \theta_{\text{act}})^\top D\, a(t^{(i)}, x^{(i)}; \theta_{\text{act}}) \Big].
\]
Note that here we're deliberately not working with the mean-square error: the algorithm simply minimizes the Hamiltonian (and most likely not to 0).
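A sketch of one gradient step on this Hamiltonian is given below. It assumes networks value_net(t, x) and actor_net(t, x) with outputs of shape (batch, 1) and (batch, 2) respectively, 2 × 2 torch tensors for the matrices, and an optimiser holding only the actor's parameters, so that only θ_act is updated.

import torch

def hamiltonian_step(actor_net, value_net, H, M, C, D, T, optimizer, n_batch=256):
    """One gradient step on the empirical Hamiltonian with theta_val held fixed."""
    t = torch.rand(n_batch, 1) * T
    x = (torch.rand(n_batch, 2) - 0.5) * 6.0
    x.requires_grad_(True)

    v = value_net(t, x)                                            # v(t, x; theta_val)
    v_x = torch.autograd.grad(v.sum(), x, create_graph=True)[0]    # dv/dx, (batch, 2)
    a = actor_net(t, x)                                            # a(t, x; theta_act)

    drift = x @ H.T + a @ M.T                                      # Hx + Ma
    ham = (drift * v_x).sum(dim=1) \
          + torch.einsum('bi,ij,bj->b', x, C, x) \
          + torch.einsum('bi,ij,bj->b', a, D, a)
    loss = ham.mean()                                              # the empirical Hamiltonian

    optimizer.zero_grad()
    loss.backward()      # the optimizer is assumed to hold actor_net.parameters() only
    optimizer.step()
    return loss.item()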

Exercise 4.1 (PIA with DGM, 5 marks). Implement the policy iteration algorithm and check that it converges to the correct solution by comparing with the optimal value and actions from Exercise 1.1.
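One possible outer loop for the policy iteration, alternating steps i) and ii), is sketched below; value_loss_fn and actor_step_fn are hypothetical stand-ins for the DGM loss of step i) (with a(·, ·; θ_act) as the control) and a Hamiltonian update such as the one sketched above.

def policy_iteration(value_net, actor_net, value_loss_fn, actor_step_fn,
                     value_opt, actor_opt, n_iterations=20, n_inner=200):
    """Alternate the critic update of step i) and the actor update of step ii)."""
    hamiltonian_history = []
    for _ in range(n_iterations):
        for _ in range(n_inner):                          # step i): update theta_val via DGM
            loss = value_loss_fn(value_net, actor_net)    # PDE residual with control a(., .; theta_act)
            value_opt.zero_grad()
            loss.backward()
            value_opt.step()
        for _ in range(n_inner):                          # step ii): update theta_act
            hamiltonian_history.append(actor_step_fn(actor_net, value_net, actor_opt))
    return hamiltonian_history                            # e.g. to monitor convergence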

References

[1] M. Sabate-Vidales, Deep-PDE-Solvers, Github project, https://github.com/msabvid/Deep-PDE-Solvers, 2021.

[2] C. Jiang, Deep Galerkin Method, Github project, https://github.com/EurasianEagleOwl/DeepGalerkinMethod, 2022.

[3] G. dos Reis and D. Šiška. Stochastic Control and Dynamic Asset Allocation. https://www.maths.ed.ac.uk/~dsiska/LecNotesSCDAA.pdf, 2021.


 

 

