On this website:
Also on this website:
Max Pierini, Sandra Mazzoli, Alessio Pamovio
info@maxpierini.it
A simple method is presented to predict new cases and Percent Positive, estimated on effective reproduction number $R_t$ of COVID-19 in italian regions with a Markov chain Monte Carlo and Poisson likelihood parametrized on daily new cases.
New cases $y_r$ for each $r$ italian regions (source Dipartimento Protezione Civile)will be filtered with Hampel filter for outliers detection (gaussian window of 7 days, $\sigma=2$) and smoothed with rolling mean (gaussian window of 14 days, $\sigma=22$). Smoothed new cases will be adjusted to be $>0$ to avoid negative values (due to data error corrections).
For each day $t=[0,T]$ (where $T$ is the last observed day), smoothed new cases $y_{r,t}$ will be supposed distributed as Poisson with $\lambda_{r,t}$ parameter
$$ y_{r,t} \sim \mathcal{P}(\lambda_{r,t}) $$where $\lambda_{r,t}$ is defined by the serial interval inverse $\gamma$, previous day smoothed new cases $k_{r,t-1}$ and effective reproduction number in time $R_{r,t}$ (ref: Bettencourt & Ribeiro 2008)
$$ \lambda_{r,t} = k_{r,t-1} e^{\gamma (R_{r,t} - 1)} $$Parameters $R_{{r,t}}$ will be distributed Half Normal with mean equal to previous day posteriors $R_{{r,t-1}}$ and overarching unknown precision $\tau$
$$ R_{{r,t}} \sim \mathcal{{N}}^+(R_{{r,t-1}}, \tau) $$where, first day $R_{{r,0}}$ (outcome) is set to zero
$$ R_{{r,0}} = 0 $$and $\tau$ priors are distributed Gamma (partial pooling between regions) with shape and rate based on results of preliminary tests with uniformative $\tau$ distribution
$$ \tau \sim \Gamma(1000, 100) $$so that the standard deviation $\sigma$ will be
$$ \sigma = \frac{1}{\sqrt{\tau}} $$If previous new cases are zero $k_{r,t-1}=0$, parameter $R_{r,t}$ is undefined, given the chosen function for $\lambda_{r,t}$ parameter of Poisson likelihood, even if it should be $R_{r,t}=0$ (no new cases means null effective reproduction number). Thus, in these cases, priors of $R_{r,t}$ will be forced to
$$ R_{r,t} \sim \mathcal{N}^+(0, \tau) \;,\; k_{r,t-1}=0 $$and previous new cases will be forced to $k_{r,t-1}=1$, so that $\lambda_{r,t}$ will be
$$ \lambda_{r,t} = e^{\gamma( \mathcal{N}^+(0, \tau) - 1 )} \;,\; k_{r,t-1}=0 $$At the same time, $R_t$ for Italy as nation will be calculated.
Percent positive (PP), aka Percent Positive Rate or Positivity Rate, can be considered as an index of disease transmission (ref: Johns Hopkins).
The index is calculated as
$$ \mathbf{PP} = \frac{ \Delta_\textrm{positive} }{ \Delta_\textrm{tested} } \cdot 100 $$where $\Delta_\textrm{positive}$ is daily new amount of positive tests and $\Delta_\textrm{tested}$ is daily new amount of total tests.
When it's too high means that more tests are needed and/or pharmaceutical (PI) or non-pharmaceutical (NPI) interventions should be seriously considered to reduce transmission.
The "alarm threshold" has been established at 5%. World Health Organization recommend to relax COVID-19 NPI (lockdown and high level social distancing rules) if PP keeps below the threshold for at least two weeks.
We can add an "alert" threshold, between 3% and 5%: if PP raises over 3%, governments should consider a likely forthcoming alarm scenario. Below 3% can be considered in a safe area.
Percent positive index for Italy is here calculated using Dipartimento di Protezione Civile published data (ref: GitHub) filterd with Hampel filter for outliers (gaussian window of 14 days, $\sigma=5$) and smoothed with rolling mean (gaussian window of 7 days, $\sigma=2$).
To predict $y_{\mathrm{pred},{T+p}}$ new cases in Italy and regions for the next $p=[1 ... P]$ days, the same likelihood will be used, distributing new cases as Poisson with $\lambda_{\mathrm{pred},{T+p}}$ parameter
$$ y_{\mathrm{pred},{T+p}} \sim \mathcal{P}(\lambda_{\mathrm{pred},{T+p}}) \;,\; p=[1 ... P] $$where $\lambda_{\mathrm{pred},{T+p}}$ is parametrized on $\gamma$, the posteriors of estimated $y_{\mathrm{pred},{T+p-1}}$ and the $R_{T+p}^*$, where $R_{T}^*$ is the mean of estimated effective reproduction number in the last observed $d$ days
$$ R_t^* = \frac{ \sum_{T-d}^{T}{R_i} }{ d } $$$$ \lambda_{\mathrm{pred},t} = y_{\mathrm{pred},{T+p-1}} \cdot e^{\gamma (R_{T+p}^* - 1)} $$where $y_{\mathrm{pred},{T+p-1}}$ are distributed Half Normal with uknown precision
$$ y_{\mathrm{pred},{T+p-1}} \sim \mathcal{N}^+ \left( \mu=y_{\mathrm{pred},{T+p-1}} , \tau_{\mathrm{pred}} \right) $$and
$$ \tau_{\mathrm{pred}} \sim \Gamma( 1, 1) $$To predict best and worst scenario, $R_p^*$ will be supposed to be respectively higher and lower by $\varepsilon$
$$ R^*_{T+p,\mathrm{best}} = R_{T+p}^* - \frac{p}{P} \varepsilon \;,\; p=[1...P] $$$$ R^*_{T+p,\mathrm{worst}} = R_{T+p}^* + \frac{p}{P} \varepsilon \;,\; p=[1...P] $$where $\varepsilon$ is the mean of previously estimated $R_t$ variation within $P$ days, rounded to 2 decimals (see Epsilon).
Predicted $\mathbf{PP}_{T+p}$ will be calculated as
$$ \mathbf{PP}_{T+p} = \frac{ y_{\mathrm{pred},T+p} }{ \Delta_{\mathrm{tested}}^* } \cdot 100 $$where $y_{\mathrm{pred},{T+p}}$ are estimated posteriors of predicted new cases and $\Delta_{\mathrm{tested}}^*$ is the last value of filtered and smoothed tested daily cases.
The jupyter notebook backend is available at GitHub.
Given previously estimated $R_t$ mean with this model and $\Delta_d (R_t)$ the absolute variation of $R_t$ within $d$ days, $\varepsilon$ can be supposed as the mean of $\Delta_d (R_t)$, rounded to 2 decimals.
$$ \varepsilon = \left\lceil \overline{\Delta_d (R_t)} \right\rfloor $$Legenda:
Legenda: