START: 2020-09-02 17:53:32.135651

Max Pierini, Sandra Mazzoli, Alessio Pamovio

info@maxpierini.it

Abstract

A simple method is presented to predict new cases and Percent Positive, estimated on effective reproduction number $R_t$ of COVID-19 in italian regions with a Markov chain Monte Carlo and Poisson likelihood parametrized on daily new cases.

Method

New cases $y_r$ for each $r$ italian regions (source Dipartimento Protezione Civile)will be filtered with Hampel filter for outliers detection (gaussian window of 7 days, $\sigma=2$) and smoothed with rolling mean (gaussian window of 14 days, $\sigma=22$). Smoothed new cases will be adjusted to be $>0$ to avoid negative values (due to data error corrections).

For each day $t=[0,T]$ (where $T$ is the last observed day), smoothed new cases $y_{r,t}$ will be supposed distributed as Poisson with $\lambda_{r,t}$ parameter

$$ y_{r,t} \sim \mathcal{P}(\lambda_{r,t}) $$

where $\lambda_{r,t}$ is defined by the serial interval inverse $\gamma$, previous day smoothed new cases $k_{r,t-1}$ and effective reproduction number in time $R_{r,t}$ (ref: Bettencourt & Ribeiro 2008)

$$ \lambda_{r,t} = k_{r,t-1} e^{\gamma (R_{r,t} - 1)} $$

The serial interval SI is supposed to be distributed as Gamma, with mean $\mu=7.5$ and standard deviation $\sigma=3.4$ (ref: Li, Ghuan et Al. 2020a)

$$ \mathbf{SI} \sim \Gamma(\mu_{=7.5}, \sigma_{=3.4}) $$

so that $\gamma$ is distributed as Inverse Gamma

$$ \gamma \sim \Gamma^{-1}(\mu_{=7.5}, \sigma_{=3.4}) $$

Parameters $R_{{r,t}}$ will be distributed Half Normal with mean equal to previous day posteriors $R_{{r,t-1}}$ and overarching unknown precision $\tau$

$$ R_{{r,t}} \sim \mathcal{{N}}^+(R_{{r,t-1}}, \tau) $$

where, first day $R_{{r,0}}$ (outcome) is set to zero

$$ R_{{r,0}} = 0 $$

and $\tau$ priors are distributed Gamma (partial pooling between regions) with shape and rate based on results of preliminary tests with uniformative $\tau$ distribution

$$ \tau \sim \Gamma(1000, 100) $$

so that the standard deviation $\sigma$ will be

$$ \sigma = \frac{1}{\sqrt{\tau}} $$

If previous new cases are zero $k_{r,t-1}=0$, parameter $R_{r,t}$ is undefined, given the chosen function for $\lambda_{r,t}$ parameter of Poisson likelihood, even if it should be $R_{r,t}=0$ (no new cases means null effective reproduction number). Thus, in these cases, priors of $R_{r,t}$ will be forced to

$$ R_{r,t} \sim \mathcal{N}^+(0, \tau) \;,\; k_{r,t-1}=0 $$

and previous new cases will be forced to $k_{r,t-1}=1$, so that $\lambda_{r,t}$ will be

$$ \lambda_{r,t} = e^{\gamma( \mathcal{N}^+(0, \tau) - 1 )} \;,\; k_{r,t-1}=0 $$

At the same time, $R_t$ for Italy as nation will be calculated.

Percent positive (PP), aka Percent Positive Rate or Positivity Rate, can be considered as an index of disease transmission (ref: Johns Hopkins).

The index is calculated as

$$ \mathbf{PP} = \frac{ \Delta_\textrm{positive} }{ \Delta_\textrm{tested} } \cdot 100 $$

where $\Delta_\textrm{positive}$ is daily new amount of positive tests and $\Delta_\textrm{tested}$ is daily new amount of total tests.

When it's too high means that more tests are needed and/or pharmaceutical (PI) or non-pharmaceutical (NPI) interventions should be seriously considered to reduce transmission.

The "alarm threshold" has been established at 5%. World Health Organization recommend to relax COVID-19 NPI (lockdown and high level social distancing rules) if PP keeps below the threshold for at least two weeks.

We can add an "alert" threshold, between 3% and 5%: if PP raises over 3%, governments should consider a likely forthcoming alarm scenario. Below 3% can be considered in a safe area.

Percent positive index for Italy is here calculated using Dipartimento di Protezione Civile published data (ref: GitHub) filterd with Hampel filter for outliers (gaussian window of 14 days, $\sigma=5$) and smoothed with rolling mean (gaussian window of 7 days, $\sigma=2$).

To predict $y_{\mathrm{pred},{T+p}}$ new cases in Italy and regions for the next $p=[1 ... P]$ days, the same likelihood will be used, distributing new cases as Poisson with $\lambda_{\mathrm{pred},{T+p}}$ parameter

$$ y_{\mathrm{pred},{T+p}} \sim \mathcal{P}(\lambda_{\mathrm{pred},{T+p}}) \;,\; p=[1 ... P] $$

where $\lambda_{\mathrm{pred},{T+p}}$ is parametrized on $\gamma$, the posteriors of estimated $y_{\mathrm{pred},{T+p-1}}$ and the $R_{T+p}^*$, where $R_{T}^*$ is the mean of estimated effective reproduction number in the last observed $d$ days

$$ R_t^* = \frac{ \sum_{T-d}^{T}{R_i} }{ d } $$$$ \lambda_{\mathrm{pred},t} = y_{\mathrm{pred},{T+p-1}} \cdot e^{\gamma (R_{T+p}^* - 1)} $$

where $y_{\mathrm{pred},{T+p-1}}$ are distributed Half Normal with uknown precision

$$ y_{\mathrm{pred},{T+p-1}} \sim \mathcal{N}^+ \left( \mu=y_{\mathrm{pred},{T+p-1}} , \tau_{\mathrm{pred}} \right) $$

and

$$ \tau_{\mathrm{pred}} \sim \Gamma( 1, 1) $$

To predict best and worst scenario, $R_p^*$ will be supposed to be respectively higher and lower by $\varepsilon$

$$ R^*_{T+p,\mathrm{best}} = R_{T+p}^* - \frac{p}{P} \varepsilon \;,\; p=[1...P] $$$$ R^*_{T+p,\mathrm{worst}} = R_{T+p}^* + \frac{p}{P} \varepsilon \;,\; p=[1...P] $$

where $\varepsilon$ is the mean of previously estimated $R_t$ variation within $P$ days, rounded to 2 decimals (see Epsilon).

Predicted $\mathbf{PP}_{T+p}$ will be calculated as

$$ \mathbf{PP}_{T+p} = \frac{ y_{\mathrm{pred},T+p} }{ \Delta_{\mathrm{tested}}^* } \cdot 100 $$

where $y_{\mathrm{pred},{T+p}}$ are estimated posteriors of predicted new cases and $\Delta_{\mathrm{tested}}^*$ is the last value of filtered and smoothed tested daily cases.

A Markov chain Monte Carlo will be used with Metropolis-Hasting algorithm and Gibbs sampler (adapt 500, warmup 2000, samples 2000, chains 4, thin 1) in Python 3.8.5 with pyjags==1.3.7 and JAGS 4.3.0 for macOS.

The jupyter notebook backend is available at GitHub.

Epsilon

Given previously estimated $R_t$ mean with this model and $\Delta_d (R_t)$ the absolute variation of $R_t$ within $d$ days, $\varepsilon$ can be supposed as the mean of $\Delta_d (R_t)$, rounded to 2 decimals.

$$ \varepsilon = \left\lceil \overline{\Delta_d (R_t)} \right\rfloor $$
<ipython-input-13-9623fa130049>:2: RuntimeWarning: invalid value encountered in greater
  Rvar = Rdiffs[Rdiffs>0].mean()
<ipython-input-13-9623fa130049>:3: RuntimeWarning: invalid value encountered in greater
  Rstd = Rdiffs[Rdiffs>0].std()
<ipython-input-13-9623fa130049>:15: RuntimeWarning: invalid value encountered in greater
  ax.set_ylim(0, Rdiffs[Rdiffs>0].max())

The mean of $\Delta_7 (R_t)$ is equal to $0.33245 \pm 0.23636$.

$$ \varepsilon = \left \lceil 0.33245 \right \rfloor = 0.33 $$

Model

model {
    ###################################
    # Rt estimation
    ###################################
    # Overarching Rt standard deviation
    tau_R ~ dgamma( 1000 , 100 )
    sigma_R <- 1 / sqrt( tau_R )

    # Serial interval distribution
    SI_mu <- 7.5
    SI_sd <- 3.4
    SI_sh <- SI_mu^2 / SI_sd^2
    SI_ra <- SI_mu / SI_sd^2
    SI ~ dgamma( SI_sh , SI_ra )
    gamma <- 1 / SI

    for ( r in 1:C ) {
        # First Rt prior
        RR[r,1] <- 0
        for ( t in 2:T ) {
            # Rt prior for k>0
            RRpr[r,t] ~ dnorm( RR[r,(t-1)] , tau_R )  T(0,)
            # Rt prior for k=0
            RRnu[r,t] ~ dnorm( 0 , tau_R )  T(0,)

            # Define Rt prior
            RR[r,t] <- ifelse( kR[r,t-1]==0 , RRnu[r,t] , RRpr[r,t] )
            # Avoid k=0 (undefined Rt)
            KR[r,t] <- ifelse( kR[r,t-1]==0, 1 , kR[r,t-1] )

            # Poisson likelihood
            lambdaR[r,t] <- KR[r,t] * exp( gamma * ( RR[r,t] - 1 ) )
            yR[r,t] ~ dpois( lambdaR[r,t] )
        }
    }

    # First Rt prior
    R[1] <- 0
    for ( t in 2:T ) {
        # Rt prior for k>0
        Rpr[t] ~ dnorm( R[t-1] , tau_R )  T(0,)
        # Rt prior for k=0
        Rnu[t] ~ dnorm( 0 , tau_R )  T(0,)

        # Define Rt prior
        R[t] <- ifelse( k[t-1]==0 , Rnu[t] , Rpr[t] )
        # Avoid k=0 (undefined Rt)
        K[t] <- ifelse( k[t-1]==0, 1 , k[t-1] )

        # Poisson likelihood
        lambda[t] <- K[t] * exp( gamma * ( R[t] - 1 ) )
        y[t] ~ dpois( lambda[t] )
    }

    ###################################
    # Predictions
    ###################################
    R_pred <- sum(R[(T-P):T]) / P
    y_pred[1] <- y[T]
    y_pred_hi[1] <- y[T]
    y_pred_lo[1] <- y[T]
    PP[1] <- PPobs
    PP_lo[1] <- PPobs
    PP_hi[1] <- PPobs

    # New cases precision prior
    y_tau ~ dgamma( 1 , 1 )

    for ( r in 1:C ) {
        RR_pred[r] <- sum(RR[r,((T-P):T)]) / P
        yR_pred[r,1] <- yR[r,T]
        yR_pred_hi[r,1] <- yR[r,T]
        yR_pred_lo[r,1] <- yR[r,T]
        PPR[r,1] <- PPRobs[r]
        PPR_lo[r,1] <- PPRobs[r]
        PPR_hi[r,1] <- PPRobs[r]

        for ( p in 2:P ) {
            RR_pred_hi[r,p] <- RR_pred[r] + 0.33 * (p-1) / (P-1)
            RR_pred_lo[r,p] <- RR_pred[r] - 0.33 * (p-1) / (P-1)

            # most likely scenario
            yR_prior[r,p] ~ dnorm( yR_pred[r,p-1] , y_tau )  T(0,)
            lambdaR_pred[r,p] <- yR_prior[r,p] * exp( gamma * ( RR_pred[r] - 1 ) )
            yR_pred[r,p] ~ dpois( lambdaR_pred[r,p] )
            PPR[r,p] <- yR_pred[r,p] / testsR[r] * 100

            # worst scenario
            yR_prior_hi[r,p] ~ dnorm( yR_pred_hi[r,p-1] , y_tau )  T(0,)
            lambdaR_pred_hi[r,p] <- yR_prior_hi[r,p] * exp( gamma * ( RR_pred_hi[r,p] - 1 ) )
            yR_pred_hi[r,p] ~ dpois( lambdaR_pred_hi[r,p] )
            PPR_hi[r,p] <- yR_pred_hi[r,p] / testsR[r] * 100

            # best scenario
            yR_prior_lo[r,p] ~ dnorm( yR_pred_lo[r,p-1] , y_tau )  T(0,)
            lambdaR_pred_lo[r,p] <- yR_prior_lo[r,p] * exp( gamma * ( RR_pred_lo[r,p] - 1 ) )
            yR_pred_lo[r,p] ~ dpois( lambdaR_pred_lo[r,p] )
            PPR_lo[r,p] <- yR_pred_lo[r,p] / testsR[r] * 100
        }
    }

    for ( p in 2:P ) {
        R_pred_hi[p] <- R_pred + 0.33 * (p-1) / (P-1)
        R_pred_lo[p] <- R_pred - 0.33 * (p-1) / (P-1)

        # most likely scenario
        y_prior[p] ~ dnorm( y_pred[p-1] , y_tau )  T(0,)
        lambda_pred[p] <- y_prior[p] * exp( gamma * ( R_pred - 1 ) )
        y_pred[p] ~ dpois( lambda_pred[p] )
        PP[p] <- y_pred[p] / tests * 100

        # worst scenario
        y_prior_hi[p] ~ dnorm( y_pred_hi[p-1] , y_tau )  T(0,)
        lambda_pred_hi[p] <- y_prior_hi[p] * exp( gamma * ( R_pred_hi[p] - 1 ) )
        y_pred_hi[p] ~ dpois( lambda_pred_hi[p] )
        PP_hi[p] <- y_pred_hi[p] / tests * 100

        # best scenario
        y_prior_lo[p] ~ dnorm( y_pred_lo[p-1] , y_tau )  T(0,)
        lambda_pred_lo[p] <- y_prior_lo[p] * exp( gamma * ( R_pred_lo[p] - 1 ) )
        y_pred_lo[p] ~ dpois( lambda_pred_lo[p] )
        PP_lo[p] <- y_pred_lo[p] / tests * 100
    }
}

Results

Italy

Abruzzo

Basilicata

Calabria

Campania

Emilia-Romagna

Friuli Venezia Giulia

Lazio

Liguria

Lombardia

Marche

Molise

P.A. Bolzano

P.A. Trento

Piemonte

Puglia

Sardegna

Sicilia

Toscana

Umbria

Valle d'Aosta

Veneto

Latest Rt

MCMC diagnosis

sigma

Rt

Prediction tests

model {
    ###################################
    # Rt estimation
    ###################################
    # Overarching Rt standard deviation
    tau_R ~ dgamma( 1000 , 100 )
    sigma_R <- 1 / sqrt( tau_R )

    # Serial interval distribution
    SI_mu <- 7.5
    SI_sd <- 3.4
    SI_sh <- SI_mu^2 / SI_sd^2
    SI_ra <- SI_mu / SI_sd^2
    SI ~ dgamma( SI_sh , SI_ra )
    gamma <- 1 / SI

    # First Rt prior
    R[1] <- 0
    for ( t in 2:T ) {
        # Rt prior for k>0
        Rpr[t] ~ dnorm( R[t-1] , tau_R )  T(0,)
        # Rt prior for k=0
        Rnu[t] ~ dnorm( 0 , tau_R )  T(0,)

        # Define Rt prior
        R[t] <- ifelse( k[t-1]==0 , Rnu[t] , Rpr[t] )
        # Avoid k=0 (undefined Rt)
        K[t] <- ifelse( k[t-1]==0, 1 , k[t-1] )

        # Poisson likelihood
        lambda[t] <- K[t] * exp( gamma * ( R[t] - 1 ) )
        y[t] ~ dpois( lambda[t] )
    }

    ###################################
    # Predictions
    ###################################
    # New cases precision prior
    y_tau ~ dgamma( 1 , 1 )

    for ( t in U:(T-1)) {
        R_pred[t] <- sum(R[(t-P):t]) / P
        y_pred[t,1] <- y[t]
        y_pred_hi[t,1] <- y[t]
        y_pred_lo[t,1] <- y[t]
        PP[t,1] <- PPobs[t]
        PP_lo[t,1] <- PPobs[t]
        PP_hi[t,1] <- PPobs[t]

        for ( p in 2:P ) {
            R_pred_hi[t,p] <- R_pred[t] + 0.33 * (p-1) / (P-1)
            R_pred_lo[t,p] <- R_pred[t] - 0.33 * (p-1) / (P-1)

            # most likely scenario
            y_prior[t,p] ~ dnorm( y_pred[t,(p-1)] , y_tau )  T(0,)
            lambda_pred[t,p] <- y_prior[t,p] * exp( gamma * ( R_pred[t] - 1 ) )
            y_pred[t,p] ~ dpois( lambda_pred[t,p] )
            PP[t,p] <- y_pred[t,p] / tests[t] * 100

            # worst scenario
            y_prior_hi[t,p] ~ dnorm( y_pred_hi[t,(p-1)] , y_tau )  T(0,)
            lambda_pred_hi[t,p] <- y_prior_hi[t,p] * exp( gamma * ( R_pred_hi[t,p] - 1 ) )
            y_pred_hi[t,p] ~ dpois( lambda_pred_hi[t,p] )
            PP_hi[t,p] <- y_pred_hi[t,p] / tests[t] * 100

            # best scenario
            y_prior_lo[t,p] ~ dnorm( y_pred_lo[t,(p-1)] , y_tau )  T(0,)
            lambda_pred_lo[t,p] <- y_prior_lo[t,p] * exp( gamma * ( R_pred_lo[t,p] - 1 ) )
            y_pred_lo[t,p] ~ dpois( lambda_pred_lo[t,p] )
            PP_lo[t,p] <- y_pred_lo[t,p] / tests[t] * 100
        }
    }
}

New cases

Legenda:

  • dotted black line: observed new cases
  • solid blu line: smoothed new cases
  • blu area: most likely predicted new cases HPDI 90%
  • green area: best scenario predicted new cases HPDI 99%
  • orange area: worst scenario predicted new cases HPDI 99%

Percent Positive

Legenda:

  • dotted black line: observed Percent Positive
  • solid blu line: smoothed Percent Positive
  • blu area: most likely predicted Percent Positive HPDI 90%
  • green area: best scenario predicted Percent Positive HPDI 99%
  • orange area: worst scenario predicted Percent Positive HPDI 99%

END: 2020-09-02 18:27:45.540156, Completed in 0:34:13.404505