Evaluation Scenario 3

using EasyModelAnalysis, LinearAlgebra, CSV
using Catlab, AlgebraicPetri
using Catlab.CategoricalAlgebra

Question 1

Setup Model

  1. Begin with a basic SIR model without vital dynamics. Calibrate the model parameters using data on cases during the ‘training period’. Then evaluate the model during the out-of-sample ‘test period’.

To get started with the code, we implemented the basic SIR model without vital dynamics directly in ModelingToolkit.jl. We used a version written by Simon Frost, an epidemiologist at Microsoft who has become a fan of the TA3 automated simulation tools and wrote an entire repository of tutorials for this software, found at https://github.com/epirecipes/sir-julia.

That repository includes an SIR model without vital dynamics, which we took in full.
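
For reference, the SIR model without vital dynamics encoded there is the standard system, written with this document's parameter names (and with the inf => inf / N substitution applied in the code below):

$$\frac{dS}{dt} = -\frac{\mathrm{inf}}{N} S I, \qquad \frac{dI}{dt} = \frac{\mathrm{inf}}{N} S I - \mathrm{rec}\, I, \qquad \frac{dR}{dt} = \mathrm{rec}\, I.$$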

sir = read_json_acset(LabelledPetriNet, "sir.json")
sys = ODESystem(sir)
sys = complete(sys)
@unpack S, I, R, inf, rec = sys
@parameters N = 1
param_sub = [
    inf => inf / N,
]
sys = substitute(sys, param_sub)
defs = ModelingToolkit.defaults(sys)
defs[S] = 990
defs[I] = 10
defs[R] = 0.0
defs[N] = sum(x -> defs[x], (S, I, R))
defs[inf] = 0.5
defs[rec] = 0.25
tspan = (0.0, 40.0)
prob = ODEProblem(sys, [], tspan);
sol = solve(prob);
retcode: Success
Interpolation: specialized 4th order "free" interpolation, specialized 2nd order "free" stiffness-aware interpolation
t: 18-element Vector{Float64}:
  0.0
  0.0005656568557826305
  0.006222225413608936
  0.06278791099187198
  0.4044041746579178
  1.1716707206133883
  2.3193985544539766
  3.7902010609469463
  5.688002988241076
  8.045031472810619
 10.952685511821805
 14.36883303195685
 18.205603066519323
 22.312475477563503
 27.5677352097887
 32.191866923650174
 39.282042629160195
 40.0
u: 18-element Vector{Vector{Float64}}:
 [990.0, 10.0, 0.0]
 [989.997199808495, 10.001385951371075, 0.0014142401338982807]
 [989.9691769761303, 10.015255597754882, 0.015567426114815186]
 [989.6868470614252, 10.15496986954471, 0.1581830690301042]
 [987.8980568369944, 11.03922714131362, 1.0627160216919251]
 [983.3091728230193, 13.300150793572843, 3.390676383407916]
 [974.7050055074421, 17.509956411136862, 7.785038081421031]
 [959.8263689906823, 24.69736157515851, 15.476269434159235]
 [932.156350825352, 37.74146464152402, 30.102184533124046]
 [880.1526490978858, 61.04259814385608, 58.8047527582581]
 [784.0032611609198, 99.35091388225933, 116.64582495682093]
 [636.2213354658815, 142.69999820378337, 221.0786663303353]
 [473.41147443856124, 157.72282304177654, 368.8657025196623]
 [350.34232582978564, 130.25871589174935, 519.3989582784651]
 [266.86012469628594, 77.66364691232455, 655.4762283913897]
 [232.70651011464724, 43.34385172813365, 723.9496381572193]
 [210.99497199067702, 16.05922330888179, 772.9458047004414]
 [209.84281057038268, 14.47359751742121, 775.6835919121963]
plot(sol)

Perform Model Calibration

Model Calibration Unit Test

As a unit test of the model calibration tools, we generated data at the default parameters and then ran the global optimization to see how well the parameters were recovered.

dataset = solve(prob, saveat = 0.1)
t_train = dataset.t[1:201]
t_test = dataset.t[202:end]
data_train = [S => dataset[S][1:201], I => dataset[I][1:201], R => dataset[R][1:201]]
data_test = [S => dataset[S][202:end], I => dataset[I][202:end], R => dataset[R][202:end]]
3-element Vector{Pair{Num, Vector{Float64}}}:
 S(t) => [409.1161430565276, 406.0851894334398, 403.0906862475753, 400.13255724437124, 397.2107110564623, 394.32504120368196, 391.47542609306186, 388.6617290188318, 385.8837981624198, 383.14146659245216  …  211.3075571878417, 211.13481077920125, 210.96459010199578, 210.7969146422015, 210.6317767627232, 210.46913757098204, 210.30895876698696, 210.15120264333456, 209.99583208520914, 209.84281057038268]
 I(t) => [149.03149213037267, 148.34527972394542, 147.63999456270116, 146.9161810459246, 146.1743853325435, 145.41515534112918, 144.6390407498963, 143.8465929967028, 143.0383652790499, 142.21491255408228  …  16.486756991150163, 16.250637777438623, 16.01760330167562, 15.787718859249022, 15.561004241947408, 15.337419612744819, 15.116925514072884, 14.899482867820842, 14.685052975335488, 14.47359751742121]
 R(t) => [441.8523648130998, 445.5695308426149, 449.2693191897236, 452.9512617097042, 456.61490361099425, 460.25980345518894, 463.88553315704195, 467.49167798446547, 471.0778365585304, 474.6436208534657  …  772.2056858210083, 772.6145514433604, 773.0178065963288, 773.4153664985497, 773.8072189953295, 774.1934428162733, 774.5741157189403, 774.9493144888448, 775.3191149394555, 775.6835919121963]
fitparams = global_datafit(prob, [inf => [0.2, 2.0], rec => [0.05, 0.5]],
                           t_train, data_train)
2-element Vector{Pair{Num, Float64}}:
 inf => 0.5000010679839274
 rec => 0.250001562122767
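
Since this is framed as a unit test, the recovery can also be asserted automatically. A minimal sketch using the Test stdlib (the tolerances here are our choice, not part of the original evaluation):

using Test
@test fitparams[1][2] ≈ 0.5 atol = 1e-3   # inf recovered near its true value
@test fitparams[2][2] ≈ 0.25 atol = 1e-3  # rec recovered near its true value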

This then gives the forecasts in the test data:

_prob = remake(prob, p = fitparams)
sol = solve(_prob, saveat = t_test);
plot(sol, idxs = S)
plot!(t_test, data_test[1][2])
plot(sol, idxs = I)
plot!(t_test, data_test[2][2])
plot(sol, idxs = R)
plot!(t_test, data_test[3][2])

This looks very good and matches the original data, confirming that the inverse-problem (calibration) functionality works as intended.

Now we train on real data from June 1, 2021 to September 30, 2021.

Application to Real Data from TA1

using CSV, DataFrames, Downloads

# Infectious/Recovered day by day:
url = "https://raw.githubusercontent.com/DARPA-ASKEM/program-milestones/data-h-d-breakdown/6-month-milestone/evaluation/scenario_3/ta_4/usa-IRDVHN_age_HD_breakdown.csv"
file = CSV.File(Downloads.download(url))
df_raw = DataFrame(file)

start_train = 171
stop_train = 171 + 121
start_test = 171 + 122
stop_test = 171 + 122 + 92

df_train = df_raw[start_train:stop_train, :]
df_test = df_raw[start_test:stop_test, :]

t_train = collect(0:(size(df_train, 1) - 1))
t_test = collect(0:(size(df_test, 1) - 1))

N_total = 334998398 # assumed to be constant from (https://github.com/DARPA-ASKEM/program-milestones/blob/main/6-month-milestone/evaluation/scenario_3/ta_1/usa-2021-population-age-stratified.csv)
#S = N_total - R - I
data_train = [S => N_total .- df_train.I .- df_train.R, I => df_train.I, R => df_train.R]
data_test = [S => N_total .- df_test.I .- df_test.R, I => df_test.I, R => df_test.R]

u0s = [S => N_total - df_train.I[1] - df_train.R[1], I => df_train.I[1], R => df_train.R[1]]
_prob = remake(prob, u0 = u0s, tspan = (t_train[1], t_train[end]), p = [N => N_total])

fitparams = global_datafit(_prob, [inf => [0, 1.0], rec => [0.0, 1.0]], t_train, data_train,
                           maxiters = 1_000_000)
2-element Vector{Pair{Num, Float64}}:
 inf => 0.08544111323471651
 rec => 0.05628211182124159
# Plot training fit
_prob_train = remake(_prob, p = fitparams)
sol = solve(_prob_train, saveat = t_train);
retcode: Success
Interpolation: 1st order linear
t: 122-element Vector{Float64}:
   0.0
   1.0
   2.0
   3.0
   4.0
   5.0
   6.0
   7.0
   8.0
   9.0
   ⋮
 113.0
 114.0
 115.0
 116.0
 117.0
 118.0
 119.0
 120.0
 121.0
u: 122-element Vector{Vector{Float64}}:
 [3.02835257e8, 282869.0, 3.1880272e7]
 [3.028131790795386e8, 288858.48632869346, 3.1896360434132766e7]
 [3.027906354050454e8, 294973.115959352, 3.191278947899529e7]
 [3.0276761631738675e8, 301215.4318898591, 3.192956625072342e7]
 [3.027441119541516e8, 307588.0278556735, 3.1946698017992787e7]
 [3.027201122845863e8, 314093.54087658395, 3.1964192174537133e7]
 [3.026956070730565e8, 320734.659160394, 3.198205626778314e7]
 [3.026705858694344e8, 327514.12417019095, 3.200029800639545e7]
 [3.02645038009099e8, 334434.730624346, 3.2018925260276668e7]
 [3.026189526129362e8, 341499.32649651397, 3.2037946060567345e7]
 ⋮
 [2.9329411947155106e8, 2.7596482227077945e6, 3.89446303057412e7]
 [2.9308583645331657e8, 2.811165898373973e6, 3.910139564830953e7]
 [2.9287382326768345e8, 2.863491636493737e6, 3.926108309582288e7]
 [2.926580277780524e8, 2.9166319621289466e6, 3.942373825981873e7]
 [2.924383976261137e8, 2.970593231778911e6, 3.958940714210745e7]
 [2.9221488023184747e8, 3.025381633380385e6, 3.975813613477224e7]
 [2.9198742279352325e8, 3.0810031863075793e6, 3.992997202016925e7]
 [2.917559722877005e8, 3.137463741372155e6, 4.010496197092748e7]
 [2.91520475469228e8, 3.194768980823226e6, 4.028315354994881e7]
plot(map(data_train) do (var, num)
         plot(sol, idxs = var)
         plot!(t_train, num)
     end..., dpi = 300)
savefig("train_fit_S3_Q1.png")
"/home/runner/work/ASKEM_Evaluation_Staging/ASKEM_Evaluation_Staging/docs/build/Scenario3/train_fit_S3_Q1.png"

Why is that the best fit?

At first glance it may look like the optimizer failed, i.e. that it did not find the global optimum for this problem. However, upon further inspection we can show that this truly is the global optimum. To see this, we compare against the "intuitive" solution, which would be to simply place the peak of the infections at the right spot.

_prob_train = remake(_prob, p = [inf => 0.363, rec => 0.29])
sol = solve(_prob_train, saveat = t_train);
plot(sol, idxs = I)
plot!(t_train, data_train[2][2], lab = "I_train")

However, if we check the L2 loss of this fit, we see that it is much higher.

pkeys = [inf, rec]
EasyModelAnalysis.l2loss([0.363, 0.29], (_prob, pkeys, t_train, data_train))
1.359435546843824e17
EasyModelAnalysis.l2loss([fitparams[1][2], fitparams[2][2]],
                         (_prob, pkeys, t_train, data_train))
7.274007434625836e13

The reason is that forcing the infected maximum to peak at the correct location pushes the susceptible population far off, which introduces a much larger total error.

_prob_train = remake(_prob, p = [inf => 0.363, rec => 0.29])
sol = solve(_prob_train, saveat = t_train);
plot(sol, idxs = S)
plot!(t_train, data_train[1][2], lab = "S_train")
_prob_train = remake(_prob, p = [inf => 0.363, rec => 0.29])
sol = solve(_prob_train, saveat = t_train);
plot(sol, idxs = R)
plot!(t_train, data_train[3][2], lab = "R_train")

The fit is off because the onset of this wave is delayed: the data is flat for a while before taking off. That cannot happen in the SIR model. If inf > rec, the outbreak starts growing at time zero, as the short derivation below shows.
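
Evaluating the infected equation at $t = 0$ with $S_0 \approx N$ gives

$$\left.\frac{dI}{dt}\right|_{t=0} = I_0\left(\mathrm{inf}\,\frac{S_0}{N} - \mathrm{rec}\right) \approx I_0\,(\mathrm{inf} - \mathrm{rec}) > 0,$$

and since $S$ only decreases, the per-capita growth rate $\mathrm{inf}\, S/N - \mathrm{rec}$ is at its maximum at $t = 0$: the model has no mechanism for an initial flat period.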

One may think that the observed delay is not a real delay but an artifact of underreporting. However, COVID-19 tests were already widely available during the training and testing periods, and seroprevalence data speaks against this hypothesis. Change ratios, which estimate the change in seroprevalence relative to the change in reported case prevalence, can be used as a multiplier to gauge the infection burden behind officially reported case rates. The ratio reached a low point of 1.1 (CI: 0.6–1.7) from April 21 to July 1, 2021 (roughly the beginning of the training period) and increased to 2.3 (CI: 2.0–2.5) from July 1 to September 20, 2021 (roughly the end of the training period) [from https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9716971/pdf/main.pdf]. Therefore, other factors, such as violation of the SIR model's well-mixed assumption, mutations, or behavior-induced changes in the infection rate, are more likely causes of the delayed onset of the pandemic wave in summer 2021.

This motivates fitting not with the absolute L2 norm but with the relative L2 norm, i.e. weighting each residual by the magnitude of the corresponding state. This keeps the much larger values of S from dominating the overall loss purely because of the difference in scale.
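
For intuition, here is a minimal sketch of such a loss, written to match the (pvals, (prob, pkeys, ts, data)) calling convention used with l2loss above; the actual EasyModelAnalysis.relative_l2loss may normalize differently:

# Sketch of a relative L2 loss: each residual is scaled by the data values,
# so large-magnitude states (like S) do not dominate the objective.
# Assumes all data values are nonzero.
function relative_l2loss_sketch(pvals, (prob, pkeys, ts, data))
    _prob = remake(prob, p = pkeys .=> pvals)
    sol = solve(_prob, saveat = ts)
    sum(sum(((sol[var] .- vals) ./ vals) .^ 2) for (var, vals) in data)
end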

fitparams = global_datafit(_prob, [inf => [0, 1.0], rec => [0.0, 1.0]], t_train, data_train,
                           maxiters = 1_000_000, loss = EasyModelAnalysis.relative_l2loss)
2-element Vector{Pair{Num, Float64}}:
 inf => 0.08862698291447653
 rec => 0.05818610089322468

which makes no substantial difference to the result. This is because the "intuitive solution" is also worse in terms of relative loss:

EasyModelAnalysis.relative_l2loss([0.363, 0.29], (_prob, pkeys, t_train, data_train))
38.7125655931015
EasyModelAnalysis.relative_l2loss([fitparams[1][2], fitparams[2][2]],
                                  (_prob, pkeys, t_train, data_train))
15.456929561697294

In other words, while one may wish to fit the infected spike, doing so would push the susceptible and recovered values so far off that it leads to more error than a bad fit of the infected. The SIR model is simply not a good fit to this data.

Another way to see this result is to notice that both the susceptible and recovered populations keep changing at a growing rate after the infection's peak, which is incompatible with the SIR model's assumption that the S -> I and I -> R fluxes both decline once the infection has peaked.
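
As a quick numerical spot check of this claim (a hypothetical inspection of the training data loaded above, not part of the original analysis):

# The day-over-day increments of R keep growing toward the end of the
# training window, i.e. the I -> R flux is not declining after the peak.
last(diff(df_train.R), 5)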

SIR Forecasting Plots

Shown below are the forecasts with the best-fitting SIR parameters.

# Plot testing fit
u0s = [S => N_total - df_test.I[1] - df_test.R[1], I => df_test.I[1], R => df_test.R[1]]
_prob_test = remake(_prob, p = fitparams, u0 = u0s, tspan = (t_test[1], t_test[end]))
sol = solve(_prob_test, saveat = t_test);

plot(map(data_test) do (var, num)
         plot(sol, idxs = var)
         plot!(t_test, num)
     end..., dpi = 300)
savefig("test_fit_S3_Q1.png")
"/home/runner/work/ASKEM_Evaluation_Staging/ASKEM_Evaluation_Staging/docs/build/Scenario3/test_fit_S3_Q1.png"

Question 2: Add Hospitalizations and Deaths

This expands the original SIR model to explore a model space comprising SIRD, SIRH, and SIRHD.
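
Assuming the transitions implied by the parameter names unpacked below, the SIRHD net encodes: inf (S -> I), rec (I -> R), hosp (I -> H), hrec (H -> R), ideath (I -> D), and death (H -> D).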

sird = read_json_acset(LabelledPetriNet, "sird.json")
sirh = read_json_acset(LabelledPetriNet, "sirh.json")
sirhd = read_json_acset(LabelledPetriNet, "sirhd.json")
sirhd_sys = ODESystem(sirhd)
sirhd_sys = complete(sirhd_sys)
@unpack S, I, R, H, D, inf, rec, ideath, death, hosp, hrec = sirhd_sys
@parameters N = 1
param_sub = [
    inf => inf / N,
]
sirhd_sys = substitute(sirhd_sys, param_sub)
defs = ModelingToolkit.defaults(sirhd_sys)
defs[S] = N_total - 10
defs[I] = 10
defs[H] = 0
defs[D] = 0
defs[R] = 0.0
defs[N] = N_total
defs[inf] = 0.5
defs[rec] = 0.25
defs[ideath] = 0.25
defs[death] = 0.25
defs[hosp] = 0.25
defs[hrec] = 0.25
tspan = (0.0, 40.0)
sirhd_prob = ODEProblem(sirhd_sys, [], tspan)
sirhd_sol = solve(sirhd_prob)
plot(sirhd_sol)

Question 2 repeats the analysis of question 1 on the SIR model with hospitalizations and deaths included.

Model Calibration

data_train = [
    S => N_total .- df_train.I .- df_train.R .- df_train.D .- df_train.H,
    I => df_train.I, R => df_train.R, H => df_train.H, D => df_train.D,
]
data_test = [
    S => N_total .- df_test.I .- df_test.R .- df_test.D .- df_test.H,
    I => df_test.I, R => df_test.R, H => df_test.H, D => df_test.D,
]

u0s = [
    S => N_total - df_train.I[1] - df_train.R[1] - df_train.H[1] - df_train.D[1],
    I => df_train.I[1], R => df_train.R[1], H => df_train.H[1], D => df_train.D[1],
]
_prob2 = remake(sirhd_prob, u0 = u0s, tspan = (t_train[1], t_train[end]),
                p = [N => N_total])

param_bounds = [inf => [0.0, 1.0]
                rec => [0.0, 1.0]
                death => [0.0, 5.0]
                ideath => [0.0, 5.0]
                hosp => [0.0, 5.0]
                hrec => [0.0, 5.0]]
fitparams2 = global_datafit(_prob2, param_bounds, t_train, data_train,
                            maxiters = 500_000, loss = EasyModelAnalysis.relative_l2loss)
6-element Vector{Pair{Num, Float64}}:
    inf => 0.08939596798743692
    rec => 0.05820266593013657
  death => 0.00037966817998817643
 ideath => 0.0005122652730412624
   hosp => 2.3285242941670046e-17
   hrec => 4.3338256796399776e-16
# Plot training fit
_prob2_train = remake(_prob2, p = fitparams2)
sol = solve(_prob2_train, saveat = t_train);
plot(map(data_train) do (var, num)
         plot(sol, idxs = var)
         plot!(t_train, num)
     end..., dpi = 300)
savefig("train_fit_S3_Q2.png")
"/home/runner/work/ASKEM_Evaluation_Staging/ASKEM_Evaluation_Staging/docs/build/Scenario3/train_fit_S3_Q2.png"
u0s = [
    S => N_total - df_test.I[1] - df_test.R[1] - df_test.H[1] - df_test.D[1],
    I => df_test.I[1], R => df_test.R[1], H => df_test.H[1], D => df_test.D[1],
]
_prob2_test = remake(_prob2, p = fitparams2, u0 = u0s, tspan = (t_test[1], t_test[end]))
sol = solve(_prob2_test, saveat = t_test);
plot(map(data_test) do (var, num)
         plot(sol, idxs = var)
         plot!(t_test, num)
     end..., dpi = 300)
savefig("test_fit_S3_Q2.png")
"/home/runner/work/ASKEM_Evaluation_Staging/ASKEM_Evaluation_Staging/docs/build/Scenario3/test_fit_S3_Q2.png"

Explanation of the Fit

Once again it is clear that the model is unable to fit the data well, and the same issues apply to the new data series. The rates of change of the S, R, and D data all grow after the infection's peak, which cannot occur in the SIRHD model and thus suggests that the infection peak might be an anomaly in the data. In either case, it is impossible with this model to fit the non-decreasing derivatives of these three data series while also fitting the peak of the infection. Additionally, for no parameters does the SIRHD model emit oscillatory solutions as seen in the data, which points to a model deficiency.

Evaluate Model Forecasts

In order to evaluate the model forecasts, we developed a functional which runs the forecast for multiple models and scores the result. The score is computed as the sum of the L2 norms of the per-state forecast errors.

norm(solve(_prob, saveat = t_test)[S] - data_test[1][2]) +
norm(solve(_prob, saveat = t_test)[I] - data_test[2][2]) +
norm(solve(_prob, saveat = t_test)[R] - data_test[3][2])
3.166974479556958e9
norm(solve(_prob2, saveat = t_test)[S] - data_test[1][2]) +
norm(solve(_prob2, saveat = t_test)[I] - data_test[2][2]) +
norm(solve(_prob2, saveat = t_test)[R] - data_test[3][2]) +
norm(solve(_prob2, saveat = t_test)[H] - data_test[4][2]) +
norm(solve(_prob2, saveat = t_test)[D] - data_test[5][2])
2.863191757389864e8
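
The same computation can be wrapped in a small helper (a hypothetical convenience, not an EasyModelAnalysis API):

# Total L2 forecast error over all observed states; solves once per problem.
function forecast_score(prob, data, ts)
    sol = solve(prob, saveat = ts)
    sum(norm(sol[var] - vals) for (var, vals) in data)
end
forecast_score(_prob, data_test[1:3], t_test)  # SIR: S, I, R only
forecast_score(_prob2, data_test, t_test)      # SIRHD: all five states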

Overall the SIRHD model gives a better forecast. However, for no parameter values can an SIR or SIRHD model predict a second wave: these models can only produce a single peak in the infections, and thus we would say that neither model is an adequate predictor of this data.

Question 3: Add Vaccinations

This expands the previous SIRHD model to add vaccination.

using ModelingToolkit.Symbolics: variable
using ModelingToolkit: toparam
sirhd_vax = read_json_acset(LabelledPetriNet, "sirhd_vax.json")
sirhd_vax_sys = structural_simplify(ODESystem(sirhd_vax))
sirhd_vax_sys = complete(sirhd_vax_sys)
# Rebuild flat, readable state names (e.g. S_U, I_V) from the stratified
# model's nested labels
names = string.(ModelingToolkit.getname.(states(sirhd_vax_sys)))
sts_names = Symbol.(getindex.(names, 6), :_, getindex.(names, 11))
@variables t
@parameters N
sts = map(n -> variable(n, T = SymbolicUtils.FnType)(t), sts_names)
# Likewise flatten the parameter names (e.g. inf_infuu, rec_id)
names = split.(string.(parameters(sirhd_vax_sys)), "\"")
ps_names = Symbol.(getindex.(split.(getindex.(names, 3), "\\"), 1), :_,
                   getindex.(split.(getindex.(names, 5), "\\"), 1))
ps = map(n -> toparam(variable(n)), ps_names)
# Scale the infection-rate parameters by 1/N; leave the others unchanged
nps = findall(x -> occursin("inf", string(x)), ps)
ups = findall(x -> !occursin("inf", string(x)), ps)
subs = [parameters(sirhd_vax_sys)[nps] .=> ps[nps] ./ N
        parameters(sirhd_vax_sys)[ups] .=> ps[ups]
        states(sirhd_vax_sys) .=> sts]
sirhd_vax_sys = substitute(sirhd_vax_sys, subs)
@unpack S_U, I_U, R_U, H_U, D_U, S_V, I_V, R_V, H_V, D_V = sirhd_vax_sys
S, I, R, H, D = S_U, I_U, R_U, H_U, D_U
Sv, Iv, Rv, Hv, Dv = S_V, I_V, R_V, H_V, D_V
@unpack id_vax, inf_infuu, inf_infuv, hosp_id, ideath_id, rec_id, hrec_id, death_id, inf_infvu, inf_infvv = sirhd_vax_sys
sys3 = sirhd_vax_sys
prob3 = ODEProblem(sys3, nothing, (0, 1.0))
ODEProblem with uType Nothing and tType Float64. In-place: true
timespan: (0.0, 1.0)
u0: nothing

Question 3 is the same analysis as questions 1 and 2, done on a model with vaccination added. In order to build unit tests for the analysis and functionality, we started by building the vaccination model by hand, awaiting a swap to the version from TA2.

Model Calibration

data_train = [(S + Sv) => N_total .- df_train.I .- df_train.R .- df_train.H .- df_train.D,
    (I + Iv) => df_train.I, (R + Rv) => df_train.R, H => df_train.H_unvac,
    Hv => df_train.H_vac, D => df_train.D_unvac, Dv => df_train.D_vac]
data_test = [(S + Sv) => N_total .- df_test.I .- df_test.R .- df_test.H .- df_test.D,
    (I + Iv) => df_test.I, (R + Rv) => df_test.R, H => df_test.H_unvac, Hv => df_test.H_vac,
    D => df_test.D_unvac, Dv => df_test.D_vac]

vac_rate = df_train.H_vac[1] / (df_train.H_vac[1] + df_train.H_unvac[1])
# ~52% of hospitalized individuals are vaccinated. We do not have vaccination-rate
# data for the other compartments, so we assume the same rate for all compartments.
0.5191533638180773
u0s = [
    S => (1 - vac_rate) *
         (N_total - df_train.I[1] - df_train.R[1] - df_train.H[1] - df_train.D[1]),
    I => (1 - vac_rate) * df_train.I[1],
    R => (1 - vac_rate) * df_train.R[1],
    H => df_train.H_unvac[1],
    D => df_train.D_unvac[1],
    Sv => vac_rate *
          (N_total - df_train.I[1] - df_train.R[1] - df_train.H[1] - df_train.D[1]),
    Iv => vac_rate * df_train.I[1],
    Rv => vac_rate * df_train.R[1],
    Hv => df_train.H_vac[1],
    Dv => df_train.D_vac[1],
]
_prob3 = remake(prob3, u0 = u0s, tspan = (t_train[1], t_train[end]),
                p = [ps .=> 0.1; N => N_total])

fitparams3 = global_datafit(_prob3, ps .=> ([0, 5.0],), t_train, data_train,
                            maxiters = 200_000, loss = EasyModelAnalysis.relative_l2loss)
10-element Vector{Pair{Num, Float64}}:
 inf_infuu => 4.109045159474684e-16
 inf_infvu => 1.0821616715228021e-16
 inf_infuv => 3.1022429303649554e-18
 inf_infvv => 0.38994297902012576
    id_vax => 0.00032456309347267815
    rec_id => 0.15039515789737148
   hosp_id => 6.217351257751445e-5
 ideath_id => 0.0010199963890105772
   hrec_id => 0.0019902949910320425
  death_id => 1.3610476705042196e-20
u0s = [
    S => (1 - vac_rate) *
         (N_total - df_train.I[1] - df_train.R[1] - df_train.H[1] - df_train.D[1]),
    I => (1 - vac_rate) * df_train.I[1],
    R => (1 - vac_rate) * df_train.R[1],
    H => df_train.H_unvac[1],
    D => df_train.D_unvac[1],
    Sv => vac_rate *
          (N_total - df_train.I[1] - df_train.R[1] - df_train.H[1] - df_train.D[1]),
    Iv => vac_rate * df_train.I[1],
    Rv => vac_rate * df_train.R[1],
    Hv => df_train.H_vac[1],
    Dv => df_train.D_vac[1],
]
_prob3 = remake(_prob3, p = fitparams3, u0 = u0s, tspan = (t_train[1], t_train[end]))
sol = solve(_prob3, saveat = t_train);
plot(map(data_train) do (var, num)
         plot(sol, idxs = [var])
         plot!(t_train, num)
     end...)
u0s = [
    S => (1 - vac_rate) *
         (N_total - df_test.I[1] - df_test.R[1] - df_test.H[1] - df_test.D[1]),
    I => (1 - vac_rate) * df_test.I[1],
    R => (1 - vac_rate) * df_test.R[1],
    H => df_test.H_unvac[1],
    D => df_test.D_unvac[1],
    Sv => vac_rate * (N_total - df_test.I[1] - df_test.R[1] - df_test.H[1] - df_test.D[1]),
    Iv => vac_rate * df_test.I[1],
    Rv => vac_rate * df_test.R[1],
    Hv => df_test.H_vac[1],
    Dv => df_test.D_vac[1],
]
_prob3 = remake(_prob3, p = fitparams3, u0 = u0s, tspan = (t_test[1], t_test[end]))
sol = solve(_prob3, saveat = t_test);
plot(map(data_test) do (var, num)
         plot(sol, idxs = [var])
         plot!(t_test, num)
     end...)

Question 4: Age-Stratified Model

Question 5: Add Reinfection

Question 5 is the same analysis as questions 1, 2, 3, and 4 on a model with reinfection added. We will use SIRHD with reinfection.

sirhd = read_json_acset(LabelledPetriNet, "sirhd_renew.json")
sirhd_sys = ODESystem(sirhd)
sirhd_sys = complete(sirhd_sys)
@unpack S, I, R, H, D, inf, rec, ideath, death, hosp, hrec, renew = sirhd_sys
@parameters N = 1
param_sub = [
    inf => inf / N,
]
sirhd_sys = substitute(sirhd_sys, param_sub)
defs = ModelingToolkit.defaults(sirhd_sys)
defs[S] = 100 - 10
defs[I] = 10
defs[H] = 0
defs[D] = 0
defs[R] = 0.0
defs[N] = 100
defs[inf] = 0.5
defs[rec] = 0.25
defs[ideath] = 0.25
defs[death] = 0.25
defs[hosp] = 0.25
defs[hrec] = 0.25
defs[renew] = 0.25
tspan = (0.0, 40.0)
sirhd_prob = ODEProblem(sirhd_sys, [], tspan)
sirhd_sol = solve(sirhd_prob)
plot(sirhd_sol)

Model Calibration

start_train = 171
stop_train = 171 + 213
start_test = 171 + 214
stop_test = 171 + 214 + 151

df_train = df_raw[start_train:stop_train, :]
df_test = df_raw[start_test:stop_test, :]

t_train = collect(0:(size(df_train, 1) - 1))
t_test = collect(0:(size(df_test, 1) - 1))

data_train = [
    S => N_total .- df_train.I .- df_train.R .- df_train.D .- df_train.H,
    I => df_train.I, R => df_train.R, H => df_train.H, D => df_train.D,
]
data_test = [
    S => N_total .- df_test.I .- df_test.R .- df_test.D .- df_test.H,
    I => df_test.I, R => df_test.R, H => df_test.H, D => df_test.D,
]

u0s = [
    S => N_total - df_train.I[1] - df_train.R[1] - df_train.H[1] - df_train.D[1],
    I => df_train.I[1], R => df_train.R[1], H => df_train.H[1], D => df_train.D[1],
]
_prob5 = remake(sirhd_prob, u0 = u0s, tspan = (t_train[1], t_train[end]),
                p = [N => N_total])

param_bounds = [inf => [0.0, 1.0]
                rec => [0.0, 1.0]
                death => [0.0, 5.0]
                ideath => [0.0, 5.0]
                hosp => [0.0, 5.0]
                hrec => [0.0, 5.0]
                renew => [0.0, 1.0]]
fitparams5 = global_datafit(_prob5, param_bounds, t_train, data_train,
                            maxiters = 500_000, loss = EasyModelAnalysis.relative_l2loss)
7-element Vector{Pair{Num, Float64}}:
    inf => 0.6668135696992136
    rec => 0.2993120423106434
  death => 0.007483362643165488
 ideath => 1.335711212458645e-10
   hosp => 0.28006348591478
   hrec => 3.57533659700115
  renew => 0.02132168640203914
# Plot training fit
_prob5_train = remake(_prob5, p = fitparams5)
sol = solve(_prob5_train, saveat = t_train);
plot(map(data_train) do (var, num)
         plot(sol, idxs = var)
         plot!(t_train, num)
     end..., dpi = 300)
savefig("train_fit_S3_Q5.png")
"/home/runner/work/ASKEM_Evaluation_Staging/ASKEM_Evaluation_Staging/docs/build/Scenario3/train_fit_S3_Q5.png"
u0s = [
    S => N_total - df_test.I[1] - df_test.R[1] - df_test.H[1] - df_test.D[1],
    I => df_test.I[1], R => df_test.R[1], H => df_test.H[1], D => df_test.D[1],
]
_prob5_test = remake(_prob5, p = fitparams5, u0 = u0s, tspan = (t_test[1], t_test[end]))
sol = solve(_prob5_test, saveat = t_test);
plot(map(data_test) do (var, num)
         plot(sol, idxs = var)
         plot!(t_test, num)
     end..., dpi = 300)
savefig("test_fit_S3_Q5.png")
"/home/runner/work/ASKEM_Evaluation_Staging/ASKEM_Evaluation_Staging/docs/build/Scenario3/test_fit_S3_Q5.png"

Explanation of the Fit

The other models were unable to fit the data, especially the forecasts, because their solutions are monotonic after the infection peak. The SIRHD model with reinfection does not have this property, so does that make it able to fit such data? Absolutely not!

In order to produce non-monotonic behavior and repeat infections, the SIRHD model relies on the R -> S transition. This means those properties are only possible if S itself is non-monotonic: S has to increase at some point. Since S is monotonically decreasing in the data, a second wave is not possible. In other words:

  1. Parameters which fit the second-wave characteristics of the I data require non-monotonicity in S, and thus give a qualitatively bad fit to S. This in turn induces non-monotonicity in R as well.
  2. Parameters which fit S and R must give a single-wave solution, and thus a qualitatively bad fit against I.

The global optimum of the parameters picks outcome (1), though both options give qualitatively poor "eyeball" fits due to this model property.
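
Concretely, assuming the reinfection transition contributes a renew * R flux back into S, the susceptible equation takes the form

$$\frac{dS}{dt} = -\frac{\mathrm{inf}}{N}\, S I + \mathrm{renew}\, R,$$

so $S$ can only increase when $\mathrm{renew}\, R > (\mathrm{inf}/N)\, S I$. Since the S data is monotonically decreasing, any parameters that exercise this mechanism incur a large error in $S$.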

Question 6: New Data

Question 7: Analysis

For each model, summarize your conclusions about the following:

  1. Do parameters fit from data seem reasonable and fall within typical ranges you might see in the broader literature? Provide references to support your conclusions.
  2. Describe how well the fitted model compares against historical data, both for the ‘training’ and ‘test’ periods.

Answer

Question 2

The fits against the historical data, over both the training and test periods, are unremarkable, but for clear reasons described in the text. The SIR/SIRHD/etc. models all have clear limitations which can easily be proven from their analytical form. These include:

  1. Only a single infection peak, or monotonic behavior, in I.
  2. Decreasing rates of change required after the infection peak for S, R, and D.
  3. H can likewise only exhibit a single simple peak or monotonic behavior.

These three points are directly incompatible with the real-world data, which has the following properties:

  1. The rates of change of S, R, and D do not decrease after the peak of the infection.
  2. The H data is non-monotonic.

Since these facts are directly incompatible with the SIR and SIRHD models, no set of parameters can fit all of the qualitative features of the data. Instead, the optimizer tends to fit S and R, or S, R, and D, with disregard for the other features, as the way to minimize the L2 loss. This leads to a qualitatively poor fit of the infection peak and of the hospitalization non-monotonicity.

These facts are compounded in the forecasting, which attempts to forecast a new wave of the infection. Models with a single quadratic interaction term of this form cannot produce two infection peaks, and thus the second wave is impossible to predict with any of the models except the final one, which includes reinfection.

Question 1

The fits from the data seem reasonable according to the literature on analytical results for SIR- and SIRHD-type models. https://bmcinfectdis.biomedcentral.com/articles/10.1186/s12879-022-07403-5 describes how interventions must be added to the model in order to capture second-wave events, making it clear that a second wave cannot arise in these models; the incorrect forecast predictions are thus a known deficiency of the model class. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7997702/ provides a full derivation of the infection peak and shows that the calculation has a unique zero-derivative point, which directly proves the assertion that there cannot be a second wave in the SIR and SIRHD models.
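
For completeness, the uniqueness argument is elementary. In the SIR model,

$$\frac{dI}{dt} = I\left(\frac{\mathrm{inf}}{N} S - \mathrm{rec}\right),$$

which (for $I > 0$) vanishes only when $S = N\,\mathrm{rec}/\mathrm{inf}$. Because $S(t)$ is strictly decreasing while the infection is active, it crosses this threshold at most once, so $I$ can have at most one interior peak.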