ENG3 Final Project

Due June 12, 2020, 5:00pm

Modeling the Covid-19 pandemic

The SIR (susceptible-infected-removed) epidemiological model consists of a

simplified mathematical model for the spread of infectious diseases. The idea

of the SIR model is simple: consider a population of N individuals. We then

separate the population into three groups: susceptible, infected and removed

(deceased). The sizes of these subpopulations at time t are denoted by S(t),

I(t), and R(t), respectively. Mathematically, this model can be written as a

system of three coupled, non-linear ordinary differential equations (ODEs)

dS

dt

= −βSI, (1)

dI

dt

= βSI − γI, (2)

dR

dt

= γI, (3)

where we assume that the disease transmission rate β = β(t) and the mortal-

ity rate γ = γ(t) are real, time-dependent functions. Given that the ODEs

are nonlinear and coupled, finding an analytical solution for Eqs. (1)-(3) is

impractical. Nonetheless, one can search for solutions numerically, using, for

example, ODE solvers such as the ones available in Matlab. We will use this

classic epidemiological model to model the spread of Covid-19 in Santa Bar-

bara County.

Part 1: Exploring the data (10 points)

Your first task consists of loading the data file covid_data.mat provided on

GauchoSpace and plotting its contents. The file consists of official data re-

ported by local authorities in the interval from March 15 to May 13 in Santa

1

Barbara County. It has 2 columns: the first contains the cumulative num-

ber of infected individuals in the population at a certain date, and

the second contains the cumulative number of deaths recorded on the

same date.

(a) Create a script named explore_data.m that loads the file covid_data.mat

provided.

(b) Still in your explore_data.m script, create a figure with two plots: i)

number of infected individuals vs time and ii) deaths vs time. Make sure that

you:

• Change the line thickness to 3 for each line.

• Use legends "Infected Individuals", "Deaths" to identify each curve.

• Name the x and y axes as "Time (days since March 15)" and "Number

of individuals", respectively. You can use legend(’location’,’best’)

to move your legend box to the top left corner, so it doesn’t cover your

plot.

• Give your plot the following title: "Covid-19 in Santa Barbara County".

• Turn the grid on using the command grid on.

• Increase the font size of legends, axis labels and title using the command

set(gca,’FontSize’,20).

At the end, your plot should look like Fig. (1). Upload explore_data.m

to GauchoSpace.

2

Figure 1: Plot of the number of individuals infected and deaths in Santa Barbara County

due to Covid-19.

Part 2: Estimating the parameters from the data (20

points)

Now that you have familiarized yourself with the data, let us come back to

the SIR model. Our goal is to use it to predict the growth of the number

of infected individuals and as well as the number of deaths in the near fu-

ture. But the problem is that we don’t know the values for the functions

β(t), γ(t) in Eqs. (1)-(3). Despite that, observing the data from places where

the Covid-19 epidemic got under control via quarantine measures can give us

some hints about these parameters. For example, it is reasonable to consider

that both the infection and the mortality rates decreased as more and more

people started taking prophylaxis measures, e.g., stay-at-home shelters, self-

quarantine, crowd control in public transportation systems and the usage of

masks to avoid the spread of the virus.

Based on this information, let us postulate that both the infection and

the mortality rate decreased exponentially after the quarantine measures were

implemented, i.e.,

γ(t) = 0.0247e−0.111t, (4)

β(t) = β0e

−0.0506t, (5)

where β0 is a constant to be determined. Your task will be to estimate β0

3

using the data you plotted in Part 1.

There exist many methods to estimate parameters from a given set of data

points. For simplicity, you are going to use a Euclidean norm approach to

minimize the error in the total number of cases C(t) = I(t) + R(t). The idea

is simple: solve Eqs. (1)-(3) for various β0 values within a range, and for each

of these values, compute the error given by the norm of the difference between

the model results C(t) and the data, i.e.,

err =

√∑Ndays

i=1 (I(i) +R(i)− covid_data(i, 1))2

Ndays

, (6)

where Ndays is the number of days covered by the data (i.e., the length of the

covid_data matrix).

(a) Write a function named sir_model.m that takes as input tspan (time

span) and X0 = [* ; * ; *] (column vector of initial conditions for S, I, R)

and returns the right-hand side of the ODE system given by Eqs. (1)-(3).

(b) Write a script named sir_main.m that solves the ODE system for various

values of β0 (use a for loop for that). Use the following range of values for

β0: beta0 = [0.1:0.001:0.5]./S0, where S0 is the initial susceptible popu-

lation. Compute the error using Eq. (6) for each value of β0 tested, and store

these values. For the initial conditions, consider that the susceptible popu-

lation of Santa Barbara County was 500,000, and assume that the infection

started with 1 person infected and zero deaths. Use a time span of 100 days.

Use the ode45() function to solve the ODE system.

(c) Still in your script sir_main.m, plot β0 vs err. Make sure that:

• Your plots look like Fig. 2.

• Give your plot the following title: "Error analysis (SIR model vs data)".

• Change the line thickness to 3 for each line.

• Name the x and y axis as "beta0" and "Error", respectively.

• Turn the grid on using the command grid on.

• Increase the font size of legends, axis labels and title using the command

set(gca,’FontSize’,20).

4

Figure 2: Evolution of the Euclidean norm of the error as a function of β0.

(d) Still in your script sir_main.m, plot i) the total number of cases C(t) =

I(t) + R(t) vs. time, and ii) the number of deaths R(t) vs. time, for the best

value of β0 found in part (c). In addition to that, overlap the curves for C and

R with the covid_data.mat data provided on GauchoSpace, so that you can

compare simulations with data. Make sure that:

• Your plots look like Fig. 3.

• Change the line thickness to 3 for each line.

• Instead of using the usual plot() command, use semilogy() to plot in

semi-log scale. This will make your two curves look more distinguishable.

• Give your plot the following title: "SIR model, beta0 = X", where X

must be replaced by the respective value of β0 found in (c).

• Use legends "Cumulative infected (SIR)", "Deaths (SIR)", "Cumulative

infected (data)", "Deaths (data)" in your plots.

• Name the x and y axis as "Time (days since March 15)" and "Number of

individuals", respectively. Again, you can use legend(’location’,’best’)

to move your legend box to the best available spot, so it doesn’t cover

your plots.

• Turn the grid on using the command grid on.

5

• Increase the font size of legends, axis labels and title using the command

set(gca,’FontSize’,20).

Note: The best β0 value obtained in (c) should be close to β0 ≈ 7.580 ×

10−7. If you have not been able to accomplish the optimization, please proceed

with β0 = 7.580× 10−7 for Part 3 of this project.

Figure 3: Semi-log plot of the solution of the SIR model with β0 = 7.580× 10−7.

Upload sir_model.m and sir_main.m to GauchoSpace.

6

Part 3: Stochastic simulation of SIR model (20 points)

Let’s take a look back at your plots from Part 2-(c). At this point you probably

noticed that we can find a β0 that fits the data reasonably well, but no matter

how much you change it, we will never find the perfect fit. One of the reasons

for that involves the fact that traditional deterministic (ODE) models like

the one given by Eqs. (1)-(3) cannot accurately predict the early stages of

the infection, since they rely on the hypothesis of a well-mixed population

and interactions of millions of individuals. In contrast, for small populations,

random events are often relevant, and therefore situations like jumps in the

number of cases are prone to occur; deterministic models, on the other hand,

disregard these events. To solve these problems, we use stochastic models.

Notice that the current number of infected individuals in Santa Barbara

County is relatively small compared to the total population. Thus, the deter-

ministic model given by Eqs. (1)-(3) might lead to incorrect predictions. Your

goal for this assignment is to solve a stochastic SIR model using a method

called the Gillespie algorithm.

The Gillespie algorithm belongs to a class of methods called dynamic Monte

Carlo methods. As you probably remember from homeworks 6 and 7, Monte

Carlo simulations allow us to estimate probabilities of events by testing mul-

tiple trials (also called realizations). Since each realization involves random

numbers, the results of Monte Carlo simulations are never exactly the same.

Similarly, each realization (simulation) via the Gillespie algorithm will gener-

ate unique curves for S(t), I(t), R(t).

The stochastic SIR model considers the following chains of events (also

called Markov chains)

S + I

β (t)−−→ 2 I, (7)

I

γ (t)−−→ R, (8)

where γ(t), β(t) are the same functions given by Eqs. (4)-(5). The chain given

by Eq. (7) describes the process of infection: each time a susceptible individual

meets an infected one, there’s a propensity (which is proportional to β(t)) that

dictates how often the susceptible individual becomes infected. Similarly, the

chain in Eq. (8) describes the process of infected individuals succumbing to

the virus and dying, with a propensity proportional to γ(t). The propensity

function αi gives the probability that an event in the chain i will occur in the

time interval (t, t+ τ):

αi = number of individuals interacting at chain ”i”× rate, (9)

7

which, for the chains given by Eq. (7)-(8), read

α1 = S(t)I(t)β(t), (10)

α2 = I(t)γ(t). (11)

For more details about the implementation of this algorithm, see p. 11 of

this final project assignment: Recipe for Stochastic SIR using Gillespie Algo-

rithm.

(a) Using the recipe provided on p. 11, write a function named ssir_model.m,

that takes as input tspan (time span) and X0 = [* ; * ; *] (column vector

of initial conditions for S, I, R) and returns one realization (i.e., column vec-

tors containing S(t), I(t), R(t), resulting from the algorithm provided in p. 9)

of the Gillespie algorithm, and a time vector tVec. Use the functions γ(t), β(t)

given by Eqs. (4)-(5), and use the value of β0 that you found best fit

the data in Part 2-(c), i.e., β0 = 7.580× 10−7 .

(b) Write a script named ssir_main.m to compute Nr = 5 realizations of the

stochastic SIR model by calling your function ssir_model 5 times (use a for

loop for that). For the initial conditions, assume that the susceptible popu-

lation of Santa Barbara County was 500,000, and assume that the infection

started with 1 person infected and zero deaths. Use a time span of 100 days.

(c) Still in your script ssir_main.m, plot i) the total number of cases C(t) =

I(t) +R(t) vs. time, and ii) the number of deceased individuals R(t) vs. time,

for each of the 5 realizations computed in Part 3-(b). In addition to that,

overlap the curves for C and R with the covid_data.mat data provided on

GauchoSpace, so that you can compare simulations with data. Make sure that:

• Your plots, legends and title look like Fig. 4. Note: since the method is

stochastic, your curves will be different than the ones shown in Fig. 4.

• Instead of using the usual plot(), use the command semilogy() to plot

in semi-log scale. This will make your two curves look more distinguish-

able.

• Give your plot the following title: "sSIR model".

• Use legends ’Cumulative infected (sSIR)’, ’Deaths (sSIR)’, ’Cumulative

infected (data)’, ’Deaths (data)’ in your plots. Note: Since all the real-

izations have the same legends, use ’HandleVisibility’,’off’ in your

semilogy() commands for realizations 2 to Nr, to avoid legend repetition.

8

• Name the x and y axis as "Time (days since March 15)" and "Number of

individuals", respectively. Again, you can use legend(’location’,’best’)

to move your legend box to the best available spot, so it doesn’t cover

your plots.

• Turn the grid on using the command grid on.

Figure 4: Semi-log plot of the solution of the stochastic SIR model for 5 realizations (each

line corresponds to a realization).

9

(d) In a similar manner as you did for your script ssir_main.m, create an-

other script, named ssir_hist.m, to compute Nr = 2000 realizations of the

stochastic SIR model. At the end of each realization i, store the last value of

the total number of cases, i.e., C(i) = I(end) = R(end). This value repre-

sents a prediction of the number of cases in Santa Barbara County after tspan

days (i.e., 100 days). With these values, create a histogram of the array C

using the command histogram(C). Your histogram should look like Fig. 5.

Based on your histogram, write a comment in your script ssir_hist.m indi-

cating, using your own words, what you think is reasonable upper bound for

the number of cases.

Hint: for an empirical distribution, you can compute the cumulative distri-

bution function using the command cdfplot(C). The plot gives an estimate

of the probability F that the number of cases C will be smaller than or equal

to a value X.

Upload ssir_model.m, ssir_main.m, ssir_hist.m to GauchoSpace.

Figure 5: Histogram of the total number of cases C = I+R at day 100, for 2000 realizations

of the stochastic SIR model.

10

Recipe for Stochastic SIR using the Gillespie Algorithm:

While t < tspan , do the following:

1. Generate two random numbers, r1, r2, uniformly distributed in the inter-

val (0, 1).

2. Compute the propensity function α for each chain. Since we have two

chains, the propensities are given by

α1 = S(t)I(t)β(t), (12)

α2 = I(t)γ(t), (13)

where γ(t), β(t) are the functions given by Eqs. (4)-(5). Use the value of

β0 that you found best fit the data in Part 2-(c).

3. Compute the total propensity α0 = α1 + α2.

4. Compute the time τ when the next event (i.e., either contamination or

death) will take place.

τ =

1

α0

ln

(

1

r1

)

(14)

5. Compute the number of individuals for each group S, I, R at time t + τ

as the following:

if r2 ∈ [0, α1/α0) :

S(t+ τ) = S(t)− 1,

I(t+ τ) = I(t) + 1,

R(t+ τ) = R(t),

(15)

else if r2 ∈ [α1/α0, 1) :

S(t+ τ) = S(t),

I(t+ τ) = I(t)− 1,

R(t+ τ) = R(t) + 1.

(16)

6. Make sure that the system always includes at least one infected person,

i.e., check if I(t+ τ) > 0. If I(t+ τ) becomes 0, include another infected

person at time t+ τ by setting S(t+ τ) = 1.

7. Save your new time as a point in the time vector: tVec(i) = t + tau

(so you can plot S(t), I(t), R(t) vs t later).

11