Dynamic Regression Case Study

Author

David Kohns

Published

2025-03-27

Modified

2025-03-28

1 Introduction

This case study showcases how to apply the ARR2 prior of Kohns et al. (2025a) for dynamic regression. For details on the derivations presented in this case study, please consult the supplementary material of Kohns et al. (2025b). Code for the case study can be found in my repo.

2 Background: Dynamic Regression

Dynamic regression, or time-varying parameter (TVP) regression, relaxes the assumption of static (w.r.t. time) regression coefficients. Considering the nature of most time-series applications in the social sciences, such as the response of economic output to financial conditions, it seems reasonable that coefficients change over time. Indeed, fixing the law of motion of the regression parameters to independent univariate auto-regressive (AR) models has a long tradition (see J. C. Chan and Strachan (2023) for a brief history of such models in economics) and good empirical performance (Huber, Koop, and Onorante 2021).

Consider a univariate target y_t \in \mathbb{R} which is modeled via observed K-dimensional covariates x_t and unknown regression parameters \beta_t \in \mathbb{R}^{K \times 1} for t = 1,\dotsc,T:

\begin{aligned} y_t &= x_t^{T}\beta_t + \epsilon_t^y \\ \beta_t &= \Phi\beta_{t-1} + \epsilon_t^{\beta}, \end{aligned} \tag{1}

where \Phi \in \mathbb{R}^{K \times K} describes the transition of the regression parameters over time. \epsilon^{y}_t and \epsilon^{\beta}_t are normal stochastic innovation terms of the target and the parameter process, respectively. Equation 1 in fact belongs to the family of discrete state-space models, more commonly referred to as hidden Markov models. In this literature, the process for y_t is referred to as the observation equation, while that for \beta_t is referred to as the state equation.

To keep things simple, assume normal distributions for the observation noise, \epsilon_t^{y} \sim N(0,\sigma^2), and the state innovations, \epsilon_t^{\beta} \sim N(0,\Sigma_{\beta}), and that the only unknowns in the state covariance are its diagonal elements, \Sigma_{\beta} = \mathrm{diag}~(\sigma^2_{\beta,1},\dotsc,\sigma^2_{\beta,K}). Assume further that the state transition matrix \Phi is diagonal, \Phi = \mathrm{diag}~(\phi_1,\dotsc,\phi_K).
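
To make the data-generating process concrete, here is a minimal R sketch that simulates from Equation 1 with diagonal \Phi and \Sigma_{\beta} (all values below are illustrative, not estimates from the application):

Code
# Minimal simulation from Equation 1 (illustrative values only)
set.seed(1)
T <- 200; K <- 3
phi        <- c(0.95, 0.80, 0.90)   # diagonal of Phi
sigma_beta <- c(0.05, 0.00, 0.10)   # state innovation sds (second coefficient is static)
sigma_y    <- 1                     # observation noise sd
X    <- matrix(rnorm(T * K), T, K)
beta <- matrix(0, T, K)
y    <- numeric(T)
for (t in 1:T) {
  prev      <- if (t == 1) rep(0, K) else beta[t - 1, ]
  beta[t, ] <- phi * prev + rnorm(K, mean = 0, sd = sigma_beta)
  y[t]      <- sum(X[t, ] * beta[t, ]) + rnorm(1, 0, sigma_y)
}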

2.1 Problems with standard state-space model priors

It is standard in the state-space literature to assume some form of conditionally conjugate inverse-gamma prior for the state innovation variances, \sigma^2_{\beta,\cdot} \sim \mathrm{IG}(a,b). Two problems arise from this:

  1. The \mathrm{IG} bounds the state variance away from 0, leading to over-estimation of the state-variance (Frühwirth-Schnatter and Wagner 2010).

  2. Inflated variance in predictions

While the first problem often causes the second, it also makes it harder to judge which parameters are truly varying over time. These problems are only exacerbated in higher dimensions.

A symptom of over-estimating the state noise is that the variance of the predictor term, \mathrm{var}(x_t^{T}\beta_t), dwarfs the variance of the observation noise, \sigma^2, leading to a high variance explained, R^2.

2.2 Review of R2 framework for stationary state space models

Kohns et al. (2025a) have extended the R2D2 framework of Zhang et al. (2022) to a large class of discrete time-series models, including stationary state-space models.

Denote the location of the normal model at time point t as \mu_t. Then, following Gelman et al. (2019), R^2 is defined as

R^2 = \frac{\mathrm{var}(\mu_t)}{\mathrm{var}(\mu_t) + \sigma^2_{y}}.

For the dynamic regression in Equation 1, Kohns et al. (2025a) show that the R^2 boils down to

R^2 = \frac{\sum_{i=1}^K \frac{\sigma^2_{\beta,i}}{1-\phi^2_i}}{\sum_{i=1}^K \frac{\sigma^2_{\beta,i}}{1-\phi^2_i} + \sigma^2_y} = \frac{\tau^2}{\tau^2 + 1},

where \tau^2 = \frac{1}{\sigma^2_y}\sum_{i=1}^K \frac{\sigma^2_{\beta,i}}{1-\phi^2_i} is the sum of the marginal variances of the dynamic regression components, scaled by the observation noise variance. Assuming R^2 \sim \mathrm{beta}(\mu_{R^2},\varphi_{R^2}), the ARR2 prior for each state at time t, conditional on the states up until t-1, is thus

\beta_{t,i} \sim N(\phi_{i}\beta_{t-1,i},\sigma^2(1-\phi^2_i)\psi_i\tau^2).

\psi \sim \mathrm{Dir}(\alpha) decomposes the variance explained across the K state components of the dynamic regression. The hyper-parameters \alpha determine the prior correlation structure and degree of sparsity among the states (lower \alpha means a higher degree of sparsity). See Kohns et al. (2025a) for recommendations on how to set \alpha. We also follow the recommendations of Kohns et al. (2025a) in setting the hyper-parameters of the prior on R^2 to weakly favour lower R^2 values a priori, (\mu_{R^2},\varphi_{R^2}) = (1/3,3).

In this case study, we will consider the two competing models showcased in Kohns et al. (2025a), the Minnesota and regularised horseshoe models. These differ only in their priors on the initial condition vector \beta_0. The prior for \phi_i, for all i, is assumed to be the same across all models, (\phi_i+1)/2 \sim \mathrm{beta}(10,2), following Frühwirth-Schnatter and Wagner (2010). This prior ensures stationarity of the state process, as well as positive correlation in the states.
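
To see how the prior maps an R^2 draw into state innovation scales, here is a small R sketch of one draw from the ARR2 hierarchy (purely illustrative; the actual sampling is done inside the Stan models listed at the end of this case study, and \sigma is fixed to 1 here):

Code
# Sketch: one draw from the ARR2 prior hierarchy (not the Stan implementation)
set.seed(2)
K <- 20
mu_R2 <- 1/3; prec_R2 <- 3                            # (mu_R2, varphi_R2) = (1/3, 3)
R2   <- rbeta(1, mu_R2 * prec_R2, (1 - mu_R2) * prec_R2)
tau2 <- R2 / (1 - R2)                                 # since R2 = tau2 / (tau2 + 1)
psi  <- rgamma(K, shape = 1); psi <- psi / sum(psi)   # Dirichlet(1, ..., 1) via normalised gammas
phi  <- 2 * rbeta(K, 10, 2) - 1                       # (phi_i + 1)/2 ~ beta(10, 2)
sigma <- 1                                            # observation noise sd, fixed for illustration
sigma_beta <- sigma * sqrt(tau2 * (1 - phi^2) * psi)  # conditional state innovation sds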

2.2.1 Minnesota Prior

The Minnesota prior assumes for the starting conditions that \beta_{0,i} \sim N(0,\frac{\hat{\sigma}_y^2}{\hat{\sigma}_{x,i}^2}\kappa_i^2), where \kappa_i^2 \sim G(1,1/(0.04^2)) for i=1,\dotsc,K (Giannone, Lenza, and Primiceri 2012). The variance terms in the fraction appearing in the prior are estimated as the residual variances of univariate AR(4) models, following J. C. C. Chan (2021). For the state standard deviations, we allow for modelling with a fat-tailed distribution, \beta_{t,i} \sim N(\phi_{i}\beta_{t-1,i},\lambda^2_{i}), \lambda_i \sim C_+(0,1).
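
As a rough R sketch of the implied prior scale for \beta_{0,i} (using placeholder residual variances; in the application these are the AR(4) estimates produced by minn_sig_create below):

Code
# Sketch: Minnesota prior scale for the initial conditions (placeholder variances)
set.seed(3)
var_y_hat <- 1.5                                      # placeholder AR(4) residual variance of y
var_x_hat <- c(0.8, 1.2)                              # placeholder AR(4) residual variances of two covariates
kappa2    <- rgamma(1, shape = 1, rate = 1 / 0.04^2)  # kappa^2 ~ G(1, 1/0.04^2), rate parameterisation as in Stan
beta0_sd  <- sqrt(var_y_hat / var_x_hat * kappa2)     # prior sds of beta_{0,i}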

2.2.2 Regularised Horseshoe Prior (RHS)

The RHS prior assumes that the starting conditions follow \beta_{0,i} \sim N(0,\lambda^2_i\tau^2_{RHS}), where \lambda_{i} and \tau_{RHS} follow the modified Cauchy priors of Piironen and Vehtari (2017). For the state standard deviations, we again allow for modelling with a fat-tailed distribution, \beta_{t,i} \sim N(\phi_{i}\beta_{t-1,i},\lambda^2_{i}), \lambda_i \sim C_+(0,1). The expected number of non-zero coefficients in \beta_0 is set to K/2, as explained in Kohns et al. (2025a).

3 Application: Inflation forecasting

We take the US monthly inflation example from Kohns et al. (2025a) to showcase the TVP regressions. Let's first load all the required packages and functions needed throughout this case study.

Code
rm(list = ls())
# Packages
library(fbi)
library(fredr)
library(tsutils)
library(ggplot2)
library(tidyr)
library(dplyr)
options(pillar.neg=FALSE)
library(gridExtra)
library(bayesplot)
library(ggdist)
theme_set(bayesplot::theme_default(base_family = "sans"))
library(reshape2)
library(cmdstanr)

# Functions
calculate_rolling_betas <- function(emp_app, T_start) {
  T_end <- length(emp_app$y)
  n_coef <- dim(emp_app$X)[2]
  
  betas <- array(0, c((T_end - T_start), n_coef))
  
  for (i in 1:n_coef){
    for (t in 1:(T_end - T_start)) {
      t_temp <- (T_start + t - 100):(T_start + t - 1)
      X_scaled <- scale(emp_app$X[t_temp, i])
      y_scaled <- scale(emp_app$y[t_temp])
      
      betas[t, i] <- solve(t(X_scaled) %*% X_scaled) %*% t(X_scaled) %*% y_scaled
    }
  }
  
  betas_df <- as.data.frame(betas)
  betas_df$Time <- emp_app$datevec[(T_end - dim(betas)[1] + 1):T_end]
  colnames(betas_df) <- c(emp_app$Xnames, "Time")
  
  return(betas_df)
}

extract_beta <- function(beta,K,T,Xnames,Timevec){
  
beta_mean <- data.frame()
  
  for(i in 1:K){
    
    beta_temp <- as.vector(colMeans(beta[,(i*T-T+1):(i*T)])) |> as.data.frame() |> mutate(variable = Xnames[i],time = Timevec)
    colnames(beta_temp)[1:3] <- c("Value","Variable","Time")
    beta_mean <- rbind(beta_mean,beta_temp)
  }

return(beta_mean)
  
}

plot_tvp <- function(beta_mean, model_name) {
  ggplot(beta_mean, aes(Time, Value)) +
    geom_line() +
    facet_wrap(~ Variable, scales = "fixed", nrow = 6) +
    cowplot::theme_half_open() +
    theme(
      plot.title = element_text(hjust = 0.5, size = 12),
      legend.position = "none",
      axis.title.x = element_blank(),
      axis.title.y = element_blank(),
      strip.background = element_blank(),  # Remove facet title box color
      axis.text.x = element_text(size = 8),  # Decrease x-axis label font size
      axis.text.y = element_text(size = 8)   # Decrease y-axis label font size
    ) +
    ggtitle(paste("TVP over time:", model_name))
}

extract_r2 <- function(fit, model_name, timevec) {
  r2_mean <- colMeans(fit$draws(variables = "R2_t", format = "matrix"))
  data.frame(R2_t = r2_mean, Time = timevec[1:length(r2_mean)], Model = model_name)
}

# Helper function for comparing predictive performance
compare_elpd <- function(fit_objects) {
  # Extract yf_lpdf from fit objects
  extract_yf_lpdf <- function(fit) {
    as.matrix(fit$draws("yf_lpdf"))
  }
  
  # Create a named list to store log-likelihoods and model names
  log_likelihoods <- lapply(fit_objects, extract_yf_lpdf)
  model_names <- names(fit_objects)
  
  if (is.null(model_names)) {
    model_names <- paste0("model_", seq_along(fit_objects))
  }
  
  # Calculate mean log-likelihood and standard error for each model
  mean_ll <- sapply(log_likelihoods, mean)
  se_ll <- sapply(log_likelihoods, function(ll) sd(ll) / sqrt(length(ll)))
  
  # Identify the best predicting model
  best_model_index <- which.max(mean_ll)
  best_model_name <- model_names[best_model_index]
  
  # Initialize vectors for ELPD differences and their standard errors
  elpd_diff <- numeric(length(fit_objects))
  se_diff <- numeric(length(fit_objects))
  
  # Calculate differences relative to the best predicting model
  for (i in seq_along(fit_objects)) {
    elpd_diff[i] = mean_ll[i] - mean_ll[best_model_index]
    se_diff[i] = sqrt(se_ll[i]^2 + se_ll[best_model_index]^2)
  }
  
  # Round results to 2 decimal places
  elpd_diff <- round(elpd_diff, 2)
  se_diff <- round(se_diff, 2)
  
  # Create a comparison table
  comparison_table <- tibble::tibble(
    Model = model_names,
    elpd_diff = elpd_diff,
    se_diff = se_diff
  ) %>%
    arrange(desc(elpd_diff))

  return(comparison_table)
}

minn_sig_create <- function(X,lags){
  X <- as.matrix(X)
  T <- dim(X)[1]
  K <- dim(X)[2]
  # Outputs
  var_x <- array(0,c(K,1))
  for (j in 1:K){
    # Create Lags
    Y_matrix <- array(0,c((T-lags), lags))
    for(t in 1:(T-lags)) {
      for(i in 1:lags) {
        Y_matrix[t, (lags-i+1)] = X[(t + (i-1)),j];
      }
    }
    # Create AR(lags) Residual Variances
    Y <- X[(lags+1):T,j]
    beta <- solve(t(Y_matrix)%*%Y_matrix)%*%t(Y_matrix)%*%Y
    var_x[j] <- var(Y-Y_matrix%*%beta)
  }
  return(var_x)
}

Now, let’s download the latest vintages of the data from the FRED-MD data base:

Code
subset <- "subset" # Set to 0 for full FRED-MD data-set
outfile <- "output_location"

####----- User Choices -----####

# Forecast horizon
h <- 1
# Degree of differencing (dif = 2 adheres to Hauzenberger et al., 2023)
dif <- 2
# Number of lags
lags <- 0

# Covariate choice
frednames <- c("INDPRO","IPMANSICS","CUMFNS","HOUST","PERMIT","USGOOD","MANEMP"
               ,"CES0600000007","AWHMAN","CUSR0000SAC","M1SL","NONBORRES","REALLN",
               "COMPAPFFx","TB3SMFFM","TB6SMFFM","T1YFFM","T10YFFM","AAAFFM","BAAFFM")

####----- Preparation Steps -----####

## 1: Supply FRED mnemonics for data download. We follow Hauzenberger et al. (2023)
filepath <- "https://files.stlouisfed.org/files/htdocs/fred-md/monthly/2024-06.csv"

startdate <- as.Date("01/01/1980","%m/%d/%Y")
enddate <- as.Date("01/01/2024","%m/%d/%Y")

data <- fredmd(filepath, date_start = startdate, date_end = enddate, transform = TRUE)

## 2: Format data into a data frame
datevec <- data$date

if (subset == "subset") {
  data <- data[,frednames]
} else {
  data <- data[,2:ncol(data)]
}

## 3: Impute any missing values following Ercument Cahan, Jushan Bai, and Serena Ng (2021)
data_est <-  tp_apc(data, 4, center = FALSE, standardize = FALSE, re_estimate = TRUE)
data_imp <- data_est$data

## 4: Get inflation data
fredr_set_key("315410e54f1b6f2552a99cefd47e2344") #API key
inflation <- fredr(  series_id = "CPIAUCSL",
                     observation_start = startdate,
                     observation_end = enddate,
                     frequency = "m") #percent change transformation 
inflation <- inflation$value

if (dif == 2) {
  inflation <- log(inflation[(h+2):length(inflation)]/inflation[2:(length(inflation)-h)]) - log(inflation[(2):(length(inflation)-h)]/inflation[1:(length(inflation)-h-1)])
} else if (dif ==1) {
  inflation <- log(inflation[(h+2):length(inflation)]/inflation[2:(length(inflation)-h)])*100 # following Chan (2017) The Stochastic Volatility in Mean Model With TimeVarying Parameters
}

X <- as.matrix(data_imp[2:(dim(data_imp)[1]-h),])
T <- length(inflation)
if (subset == "subset") {
  K <- length(frednames)
} else {
  K <- ncol(X)
}
# Create matrix of lags
if (lags == 0){
  X_all <- array(0,c(T,K))
} else {
X_all <- array(0,c(T-lags,K*lags))
}

if (lags > 0){
for (j in 1:K){
  lagtemp <- lagmatrix(X[,j],1:lags)
  X_all[,(lags*(j-1)+1):(lags*j)] <- lagtemp[((lags+1):dim(lagtemp)[1]),]
}
}
####----- Save Data -----####

# placeholder for data matrix
y <- as.vector(inflation[(lags+1):length(inflation)])
datevec <- datevec[(lags+1):length(inflation)]
if (subset == "subset") {
  Xnames <- frednames
} else {
  Xnames <- colnames(data_imp[2:ncol(data_imp)])
}
yname <- "CPIAUCSL"
lagstructure <- rep(1:lags,K)

emp_app <- list(X = X,
                y = y,
                Xnames = Xnames,
                yname = yname,
                datevec = datevec,
                lagstructure = lagstructure)

# Get all objects in the environment
all_objects <- ls()

# Identify functions
function_objects <- sapply(all_objects, function(x) is.function(get(x)))

# Create a vector of objects to keep (emp_app and functions)
objects_to_keep <- c("emp_app", names(function_objects)[function_objects])

# Remove all objects except for those to keep
rm(list=setdiff(all_objects, objects_to_keep))

3.1 Some Quick Visual Checks

Let’s quickly plot the time-series.

Code
# Convert emp_app into a dataframe
emp_app_df <- data.frame(
  Time = as.Date(emp_app$datevec),
  Inflation = scale(emp_app$y),
  scale(emp_app$X)
)

# Add appropriate column names
colnames(emp_app_df)[3:ncol(emp_app_df)] <- emp_app$Xnames

# Prep Data for plotting
dat <- emp_app_df %>%
  pivot_longer(-Time, names_to = "variable", values_to = "value")

# Plot the data
tsplot <- ggplot(dat, aes(Time, value)) +
  geom_line() +
  facet_wrap(. ~ variable, scales = "free_y", nrow = 6) +
  cowplot::theme_half_open() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    strip.background = element_blank(),  # Remove facet title box color
    axis.text.x = element_text(size = 8),  # Decrease x-axis label font size
    axis.text.y = element_text(size = 8)   # Decrease y-axis label font size
  ) +
  ggtitle("Simple time-series of all variables")

# Print the plot
tsplot

A fairly quick check to investigate whether we should expect any variation in the coefficient process at all is to estimate rolling-window OLS regressions for each covariate separately. This won't be a perfect indicator of joint variation, as correlation might cause many of the coefficient processes to be shrunk toward zero in the joint model. This is to be expected, given that macro data are typically highly correlated (Giannone, Lenza, and Primiceri 2021).

Code
# Calculate the rolling window coefficients
betas_df <- calculate_rolling_betas(emp_app, 100)

# Reshape the data for plotting
betas_long <- betas_df %>%
  pivot_longer(-Time, names_to = "variable", values_to = "value") %>%
  mutate(Time = as.Date(Time))

# Plot the data
tsplot <- ggplot(betas_long, aes(Time, value)) +
  geom_line() +
  facet_wrap(~ variable, scales = "free_y", nrow = 6) +
  cowplot::theme_half_open() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    strip.background = element_blank(),  # Remove facet title box color
    axis.text.x = element_text(size = 8),  # Decrease x-axis label font size
    axis.text.y = element_text(size = 8)   # Decrease y-axis label font size
  ) +
  ggtitle("Coefficients based on 100-month rolling window")

# Print the plot
print(tsplot)

From simple OLS based on a 100-month rolling window, it is apparent that there is some time-variation in the coefficients; hence, there is likely a need for dynamic regression.

For computational simplicity, we focus here on predicting only a single point out of sample, September 2005, at which point inflation took a nose-dive. This is observation 308 in the date vector.
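
As a quick sanity check (a sketch assuming emp_app$datevec holds monthly dates), the index of the held-out month can be located with:

Code
# Locate the held-out observation (September 2005) in the date vector
which(format(as.Date(emp_app$datevec), "%Y-%m") == "2005-09")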

3.2 Stan Models

Compile the following Stan models:

  1. ARR2 prior
  2. Minnesota prior
  3. RHS prior
Code
arr2 <- cmdstan_model("stan/arr2_dynreg_v2.stan")
minnesota <- cmdstan_model("stan/minnesota_dynreg.stan") 
rhs <- cmdstan_model("stan/rhs_dynreg.stan") 

# Number of AR lags used by minn_sig_create to estimate the Minnesota prior's residual variances
p <- 4

3.3 Prior Predictive Analysis

Code
# data for stan models
dat <- list(
  X = scale(emp_app$X)[1:307,],
  Y = as.vector(emp_app$y[1:307])*100,
  Xf = as.vector(scale(emp_app$X[1:308,])[308,]),
  yf = (emp_app$y[308])*100,
  K = dim(emp_app$X)[2],
  T = 307,#dim(emp_app$X)[1],
  mean_R2 = 1/3,
  prec_R2 = 3,
  alpha_sd = 3,
  var_x = rep(1,dim(emp_app$X)[2]),
  cons = rep(1,dim(emp_app$X)[2]),
  var_x_minn = as.numeric(minn_sig_create(scale(emp_app$X),p)),
  var_y_minn = as.numeric(minn_sig_create((emp_app$y*100),p)),
  hs_df = 3,
  hs_df_global = 1,
  hs_df_slab = 4,
  hs_scale_global = 1,
  hs_scale_slab = 2,
  p0 = round(dim(emp_app$X)[2]/2),
  pp = 1 # if 1, then the likelihood contribution is left out of the model block creating prior predictives
)

fit_arr2 <- arr2$sample(
  data = dat,
  seed = 123,
  chains = 4,
  adapt_delta = 0.99,
  max_treedepth = 15,
  parallel_chains = 4,
  refresh = 0,
  iter_warmup = 1000,
  iter_sampling = 1000,
  show_messages = FALSE
  )

fit_minn <- minnesota$sample(
  data = dat,
  seed = 123,
  chains = 4,
  adapt_delta = 0.99,
  max_treedepth = 15,
  parallel_chains = 4,
  refresh = 0,
  iter_warmup = 1000,
  iter_sampling = 1000,
  show_messages = FALSE
)

fit_rhs <- rhs$sample(
  data = dat,
  seed = 123,
  chains = 4,
  adapt_delta = 0.99,
  max_treedepth = 15,
  parallel_chains = 4,
  refresh = 0,
  iter_warmup = 1000,
  iter_sampling = 1000,
  show_messages = FALSE
)

Now, let’s plot the prior predictive R2 distributions. Remember that y_t = x_t^T \beta_t + \epsilon^y_t. Define \mu_t = x_t^T\beta_t, and stack across all obervations as \mu = (\mu_1,\dotsc,\mu_T), then we define the prior predictive R2 as for each draw (s) of the prior as \mathrm{var}(\mu^{(s)})/(\mathrm{var}(\mu^{(s)}) + \sigma^{2,(s)}_{y}).

Code
## Prior  R2 Graphs
# Extract Marg_r2 values
r2_arr2 <- fit_arr2$draws(variables = c("R2_samp"), format = "matrix")
r2_minnesota <- fit_minn$draws(variables = c("R2_samp"), format = "matrix")
r2_rhs <- fit_rhs$draws(variables = c("R2_samp"), format = "matrix")

# Convert draws to data frames
df_arr2 <- as.data.frame(r2_arr2) %>% mutate(model = "ARR2")
df_minnesota <- as.data.frame(r2_minnesota) %>% mutate(model = "Minnesota")
df_rhs <- as.data.frame(r2_rhs) %>% mutate(model = "RHS")

# Rename the column for consistency
colnames(df_arr2)[1] <- "marg_r2"
colnames(df_minnesota)[1] <- "marg_r2"
colnames(df_rhs)[1] <- "marg_r2"

# Combine data frames
combined_df <- bind_rows(df_arr2, df_minnesota, df_rhs)

# Create the plot
p <- ggplot(combined_df, aes(x = marg_r2, color = model)) +
  geom_density() +
  labs(title = "Prior R2 Distributions", y = "Density", x = "Marginal R2") +
  xlim(0, 1) +
  theme_minimal() +
  facet_wrap(~ model, scales = "free_y", ncol = 1) +
  theme(
    legend.position = "none",
    panel.grid = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.x = element_blank(),
    panel.border = element_blank()
  )

print(p)

The ARR2 prior clearly adheres to the logic imposed by the prior framework, while the Minnesota and RHS models tend toward an R^2 of 1 due to the state variance drowning out the variance of the observation equation. This manifests as very erratic prior expected paths of the TVP process.
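
The prior paths themselves are not plotted here, but a sketch of how they could be inspected is given below; it plots a handful of prior draws of one coefficient's path and assumes the Stan variable beta is declared as array[T] vector[K], as in the listings at the end of this case study.

Code
# Sketch: plot a few prior draws of the path of coefficient k
plot_prior_paths <- function(fit, model_name, k = 1, n_draws = 5, T = 307) {
  vars  <- paste0("beta[", 1:T, ",", k, "]")
  draws <- fit$draws(variables = vars, format = "matrix")[1:n_draws, , drop = FALSE]
  df <- data.frame(
    Time  = rep(as.Date(emp_app$datevec[1:T]), each = n_draws),
    Draw  = factor(rep(1:n_draws, times = T)),
    Value = as.vector(draws)
  )
  ggplot(df, aes(Time, Value, group = Draw)) +
    geom_line(alpha = 0.6) +
    ggtitle(paste("Prior draws of coefficient path:", model_name))
}
plot_prior_paths(fit_minn, "Minnesota")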

3.4 Posterior Analysis

First, let’s estimate the models with the likelihood contribution included in the model block:

Now, let’s look at the posterior of the overall R2:

Code
## R2 Graphs
# Extract Marg_r2 values
r2_arr2 <- fit_arr2$draws(variables = c("R2_samp"), format = "matrix")
r2_minnesota <- fit_minn$draws(variables = c("R2_samp"), format = "matrix")
r2_rhs <- fit_rhs$draws(variables = c("R2_samp"), format = "matrix")

# Convert draws to data frames
df_arr2 <- as.data.frame(r2_arr2) %>% mutate(model = "ARR2")
df_minnesota <- as.data.frame(r2_minnesota) %>% mutate(model = "Minnesota")
df_rhs <- as.data.frame(r2_rhs) %>% mutate(model = "RHS")

# Rename the column for consistency
colnames(df_arr2)[1] <- "marg_r2"
colnames(df_minnesota)[1] <- "marg_r2"
colnames(df_rhs)[1] <- "marg_r2"

# Combine data frames
combined_df <- bind_rows(df_arr2, df_minnesota, df_rhs)

# Create the plot
p <- ggplot(combined_df, aes(x = marg_r2, color = model)) +
  geom_density() +
  labs(title = "Posterior R2 Distributions", y = "Density", x = "Marginal R2") +
  xlim(0, 1) +
  theme_minimal() +
  facet_wrap(~ model, scales = "free_y", ncol = 1) +
  theme(
    legend.position = "none",
    panel.grid = element_blank(),
    axis.text.y = element_blank(),
    axis.ticks.y = element_blank(),
    axis.title.x = element_blank(),
    panel.border = element_blank()
  )

print(p)

Clearly, the Minnesota and RHS priors regularise the variation in the coefficient processes less than the ARR2 prior, yielding larger marginal R^2 estimates. It is well known that inflation series are hard to predict (Stock and Watson 2006), so we would also expect low R^2 estimates.

Since the \beta processes are functions of time, an estimate of time-variation in R^2 may be approximated by R^2_t = \frac{\beta_t^T x_t x_t^T \beta_t}{\beta_t^T x_t x_t^T \beta_t + \sigma^2_y}. This is how the mean of this quantity varies for the three models over time:

Code
timevec <- as.Date(emp_app$datevec[1:307])
betas1 <- extract_r2(fit_arr2, "ARR2", timevec)
betas2 <- extract_r2(fit_minn, "Minnesota", timevec)
betas3 <- extract_r2(fit_rhs, "RHS", timevec)

# Combine all R2_t data into one dataframe
betas <- bind_rows(betas1, betas2, betas3)


# Plot the data
tsplot <- ggplot(betas, aes(Time, R2_t)) +
  geom_line() +
  facet_wrap(~ Model, scales = "fixed",ncol =1) +
  cowplot::theme_half_open() +
  theme(
    plot.title = element_text(hjust = 0.5, size = 12),
    legend.position = "none",
    axis.title.x = element_blank(),
    axis.title.y = element_blank(),
    strip.background = element_blank(),  # Remove facet title box color
    axis.text.x = element_text(size = 8),  # Decrease x-axis label font size
    axis.text.y = element_text(size = 8)   # Decrease y-axis label font size
  ) +
  ggtitle("R2_t over time")

# Print the plot
print(tsplot)

The Minnesota and RHS models tend to favor more variation and larger magnitudes in R^2_t. Next, let's look at the posterior mean estimates of \beta_t over time.
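
One possible sketch for this plot uses the posterior package to parse the t and k indices from the Stan variable names and re-uses the plot_tvp helper defined above:

Code
# Sketch: posterior means of beta[t,k] reshaped for plot_tvp (repeat for fit_minn and fit_rhs)
library(posterior)
beta_summ <- summarise_draws(fit_arr2$draws("beta"), mean)
idx <- do.call(rbind, regmatches(beta_summ$variable, gregexpr("[0-9]+", beta_summ$variable)))
beta_mean <- data.frame(
  Time     = as.Date(emp_app$datevec)[as.integer(idx[, 1])],
  Variable = emp_app$Xnames[as.integer(idx[, 2])],
  Value    = beta_summ$mean
)
plot_tvp(beta_mean, "ARR2")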

Finally, let’s have a quick look at the predictions for the held-out data point.

Code
fit_objects <- list(
  arr2 = fit_arr2,
  minn = fit_minn,
  rhs = fit_rhs
)

comparison_table <- compare_elpd(fit_objects)
print(comparison_table)
# A tibble: 3 × 3
  Model elpd_diff se_diff
  <chr>     <dbl>   <dbl>
1 arr2        0      0.06
2 minn      -13.4    0.77
3 rhs       -13.4    0.65

The ARR2 model also results in far better predictions.

4 Conclusion and Discussion

Prior and posterior R2 distributions show behaviour similar to what Kohns et al. (2025a) have shown: the ARR2 prior is the only prior that, by design, allows direct control over the model's prior predictive R2 distribution.

This leads to better predictions, at least for the out-of-sample observation under investigation, although in the future it would be better to do a full expanding-window prediction analysis, as was done in Kohns et al. (2025a).

Allowing some states to have large variances results in meaningful differences in the coefficient processes as well as in R2 over time.

The non-centred state-space inference approach analysed in Frühwirth-Schnatter and Wagner (2010) would also allow non-ARR2 priors to exert stronger shrinkage on the state standard deviations. It would be interesting to derive the R2 prior for the non-centred formulation and then compare it to non-centred versions of the competing priors here, too.

Licenses

  • Code © 2025, David Kohns, licensed under BSD-3.
  • Text © 2025, David Kohns, licensed under CC-BY-NC 4.0.

5 Stan Models

You can find the Stan models used in this case study below.

Code
arr2$print()
// ARR2 Model


data {
  int<lower=1> T;
  vector[T] Y;
  int<lower=0> K;
  matrix[T,K] X;
  vector[K] Xf;
  real yf;

  // data for the R2D2 prior
  real<lower=0> mean_R2;  // mean of the R2 prior
  real<lower=0> prec_R2;  // precision of the R2 prior

  // Variance for intercept
  real<lower=0> alpha_sd;

  vector<lower=0>[K] var_x;
  
  // concentrations
  vector<lower=0>[K] cons;
  
  // Prior predictive indicator
  int<lower=0> pp; // if 1, then only prior predictive
  
  
}

transformed data{
  real Ymean;
  real sd_Yc;
  vector[T] Yc;
  matrix[K,K] xtx;

  xtx = X' * X;

  Ymean = mean(Y);
  for (i in 1:T){
    Yc[i] = Y[i] - Ymean;
  }

  sd_Yc = sd(Yc);

}

parameters {
  // States
  array[T] vector[K] z_beta;
  
  // Variance Parameters
  real<lower=0> sigma;
  // State AR Weights
  vector<lower=0,upper=1>[K] phi;
  // R2 Parameters
  simplex[(K)] psi;
  real<lower=0, upper=1> R2;
  // Intercept parameters
  real alpha;
}


transformed parameters{
  real<lower=0> tau2 = R2 / (1 - R2);
  vector<lower=0>[K] sigma_beta = sigma*sqrt(tau2) * sqrt(1-phi.^2)  .* sqrt(psi);
  array[T] vector[K] beta;

  // Initial Conditions
  beta[1] = z_beta[1] .* sigma_beta;

  // Loop here for TVP coefficients
  for (t in 2:T){
  beta[t]  =  phi.*beta[t-1] + z_beta[t] .* sigma_beta;
  }
  
}

model {
  // Priors
  sigma ~ student_t(3,0, sd_Yc);
  
    for (t in 1:T) {
  z_beta[t] ~ std_normal();
  }

  (phi +1)./2 ~ beta(10,2); // prior ensures stationarity of the AR coefficients


  psi ~ dirichlet(cons);
  R2 ~ beta(mean_R2 * prec_R2, (1 - mean_R2) * prec_R2);

  // Observation Likelihood
  target += normal_lpdf(alpha | 0, alpha_sd); // Intercept
  if (pp==0){
for (t in 1:T){
  Yc[t] ~ normal(X[t,]*beta[t] + alpha, sigma);
}
}
}


generated quantities {
  // Define storage
  real intercept = alpha + Ymean;
  real marg_r2 = sum(sigma_beta)/(sum(sigma_beta + sigma^2));
  vector[K] beta_T = phi.*beta[T];
  vector[K] bf;
  real muf;
  real yf_pred;
  real yf_lpdf;
  vector[T] R2_t; // R2 over time
  vector[T] log_lik;
  vector[T] ypred;
  
  // Define storage for Hadamard product
  matrix[T,K] X_tilde;
  
  // Define storage for row sums
  vector[T] s;
  real<lower=0,upper=1> R2_samp;
  
  // Calculate the Hadamard product and row sums
  for (t in 1:T) {
    s[t] = 0; // Initialize sum for row t
    for (k in 1:K) {
      X_tilde[t,k] = beta[t][k] * X[t,k];
      s[t] += X_tilde[t,k]; // Sum the elements in row t
    }
  }
  
  // R2 over time + logscores
  for (t in 1:T){
    R2_t[t] =  ((X[t,]*beta[t])^2)/((X[t,]*beta[t])^2 + sigma^2);
    log_lik[t] = normal_lpdf(Yc[t] | X[t,]*beta[t] + intercept,sigma);
    ypred[t] = normal_rng(X[t,]*beta[t],sigma);
  }
  
  // Sample R2
  R2_samp = variance(s)/(variance(s)+sigma^2);
  
  
  // Oos prediction
  for (i in 1:K){
  bf[i] = normal_rng(beta_T[i],sigma_beta[i]);
  }
  muf = intercept + Xf'*bf;
  yf_pred = normal_rng(muf,sigma);
  yf_lpdf = normal_lpdf(yf|muf,sigma);    
  
}
Code
minnesota$print()
// Minnesota Model

data {
  int<lower=1> T;
  vector[T] Y;
  int<lower=0> K;
  matrix[T,K] X;
  vector[K] Xf;
  real yf;

  // Variance for intercept
  real<lower=0> alpha_sd;

  // AR(4) residual variances of y and X
  real<lower=0> var_y_minn;
  vector<lower=0>[K] var_x_minn;
  int<lower=0> pp;
  
}

transformed data{
  real Ymean;
  real sd_Yc;
  vector[T] Yc;
  matrix[K,K] xtx;

  xtx = X' * X;

  Ymean = mean(Y);
  for (i in 1:T){
    Yc[i] = Y[i] - Ymean;
  }

  sd_Yc = sd(Yc);

}

parameters {
  // States
  array[T] vector[K] z_beta;
  
  // Variance Parameters
  real<lower=0> sigma;
  // State AR Weights
  vector<lower=0,upper=1>[K] phi;
  // Minnesota scale parameter
  real<lower=0> kappa2_k;
  // Intercept parameters
  real alpha;
  vector<lower=0>[K] lam_diff;
  //real<lower=0> nu_diff;
}


transformed parameters{
  vector<lower=0>[K] sigma_beta = sqrt(var_y_minn)* sqrt(1./var_x_minn) * sqrt(kappa2_k);
  vector<lower=0>[K] sigma_diff = lam_diff;
  array[T] vector[K] beta;
  // Initial Conditions
  beta[1] = z_beta[1] .* sigma_beta;

  // Loop here for TVP coefficients
  for (t in 2:T){
  beta[t]  =  phi.*beta[t-1] + z_beta[t] .* sigma_diff;
  }
  
}

model {
  // Priors
  sigma ~ student_t(3,0, sd_Yc);
  
    for (t in 1:T) {
  z_beta[t] ~ std_normal();
  }

  (phi +1)./2 ~ beta(10,2); // prior ensures stationarity of the AR coefficients


  kappa2_k ~ gamma(1,1/(0.04^2));
  lam_diff ~ student_t(1,0,1);
  //nu_diff ~ student_t(1,0,1);

  // Observation Likelihood
  target += normal_lpdf(alpha | 0, alpha_sd); // Intercept
  
  if (pp == 0) {
for (t in 1:T){
  Yc[t] ~ normal(X[t,]*beta[t] + alpha, sigma);
}
}
}

generated quantities {
    // Define storage
  real intercept = alpha + Ymean;
  real marg_r2 = sum(sigma_beta)/(sum(sigma_beta + sigma^2));
  vector[K] beta_T = phi.*beta[T];
  vector[K] bf;
  real muf;
  real yf_pred;
  real yf_lpdf;
  vector[T] R2_t; // R2 over time
  vector[T] log_lik;
  vector[T] ypred;
  
  // Define storage for Hadamard product
  matrix[T,K] X_tilde;
  
  // Define storage for row sums
  vector[T] s;
  real<lower=0,upper=1> R2_samp;
  
  // Calculate the Hadamard product and row sums
  for (t in 1:T) {
    s[t] = 0; // Initialize sum for row t
    for (k in 1:K) {
      X_tilde[t,k] = beta[t][k] * X[t,k];
      s[t] += X_tilde[t,k]; // Sum the elements in row t
    }
  }
  
  // R2 over time + logscores
  for (t in 1:T){
    R2_t[t] =  ((X[t,]*beta[t])^2)/((X[t,]*beta[t])^2 + sigma^2);
    log_lik[t] = normal_lpdf(Yc[t] | X[t,]*beta[t] + intercept,sigma);
    ypred[t] = normal_rng(X[t,]*beta[t],sigma);
  }
  
  // Sample R2
  R2_samp = variance(s)/(variance(s)+sigma^2);
  
  // Oos prediction
  for (i in 1:K){
  bf[i] = normal_rng(beta_T[i],sigma_beta[i]);
  }
  muf = intercept + Xf'*bf;
  yf_pred = normal_rng(muf,sigma);
  yf_lpdf = normal_lpdf(yf|muf,sigma);  
  
}
Code
rhs$print()
// RHS Model

functions {
  /* Efficient computation of the horseshoe prior
   * see Appendix C.1 in https://projecteuclid.org/euclid.ejs/1513306866
   * Args:
   *   z: standardized population-level coefficients
   *   lambda: local shrinkage parameters
   *   tau: global shrinkage parameter
   *   c2: slab regularization parameter
   * Returns:
   *   population-level coefficients following the horseshoe prior
   */
  vector horseshoe(vector z, vector lambda, real tau, real c2) {
    int K = rows(z);
    vector[K] lambda2 = square(lambda);
    vector[K] lambda_tilde = sqrt(c2 * lambda2 ./ (c2 + tau ^ 2 * lambda2));
    return z .* lambda_tilde * tau;
  }
}


data {
  int<lower=1> T;
  vector[T] Y;
  int<lower=0> K;
  matrix[T,K] X;

    // data for the horseshoe prior
  real<lower=0> hs_df; // local degrees of freedom
  real<lower=0> hs_df_global; // global degrees of freedom
  real<lower=0> hs_df_slab; // slab degrees of freedom
  real<lower=0> hs_scale_slab; // slab prior scale
  real<lower=0> p0;

  // Variance for intercept
  real<lower=0> alpha_sd;

  vector<lower=0>[K] var_x;
  
  // concentrations
  vector<lower=0>[K] cons;
  
  // For Predictions
  vector[K] Xf;
  real yf;
  
  int<lower=0> pp;

}

transformed data{
  real Ymean;
  real sd_Yc;
  vector[T] Yc;
  matrix[K,K] xtx;

  xtx = X' * X;

  Ymean = mean(Y);
  for (i in 1:T){
    Yc[i] = Y[i] - Ymean;
  }

  sd_Yc = sd(Yc);

}

parameters {
  // States
  array[T] vector[K] z_beta;
  
  // Variance Parameters
  real<lower=0> sigma;
  // State AR Weights
  vector<lower=0,upper=1>[K] phi;
  
  // local parameters for the horseshoe prior
  vector<lower=0>[K] hs_local;
  // horseshoe shrinkage parameters
  real<lower=0> hs_global; // global shrinkage parameter
  real<lower=0> hs_slab; // slab regularization parameter
  
    vector<lower=0>[K] lam_diff;

  
  // Intercept parameters
  real alpha;
}


transformed parameters{
  vector[K] lambda2 = square(hs_local);
  vector[K] lambda_tilde = sqrt(hs_scale_slab ^ 2 * hs_slab * lambda2 ./ (hs_scale_slab ^ 2 * hs_slab + hs_global ^ 2 * lambda2));
  vector<lower=0>[K] sigma_beta = lambda_tilde*hs_global;
  vector<lower=0>[K] sigma_diff = lam_diff;
  array[T] vector[K] beta;

  // Initial Conditions
  beta[1] = z_beta[1] .* sigma_beta;

  // Loop here for TVP coefficients
  for (t in 2:T){
  beta[t]  =  phi.*beta[t-1] + z_beta[t] .* sigma_diff;
  }
  
}

model {
  // Priors
  sigma ~ student_t(3,0, sd_Yc);
  
    for (t in 1:T) {
  z_beta[t] ~ std_normal();
  }

  (phi +1)./2 ~ beta(10,2); // prior ensures stationarity of the AR coefficients


  hs_global ~ student_t(hs_df_global, 0, p0 / (K - p0) * sigma / sqrt(T));
  hs_slab ~ inv_gamma(0.5 * hs_df_slab, 0.5 * hs_df_slab);
  hs_local ~ student_t(hs_df, 0, 1);
  
  sigma_diff ~ student_t(1,0,1);

  // Observation Likelihood
  target += normal_lpdf(alpha | 0, alpha_sd); // Intercept
  
  if (pp==0){
for (t in 1:T){
  Yc[t] ~ normal(X[t,]*beta[t] + alpha, sigma);
}
}
}

generated quantities {
  // Define storage
  real intercept = alpha + Ymean;
  real marg_r2 = sum(sigma_beta)/(sum(sigma_beta + sigma^2));
  vector[K] beta_T = phi.*beta[T];
  vector[K] bf;
  real muf;
  real yf_pred;
  real yf_lpdf;
  vector[T] R2_t; // R2 over time
  vector[T] log_lik;
  vector[T] ypred;
  
  // Define storage for Hadamard product
  matrix[T,K] X_tilde;
  
  // Define storage for row sums
  vector[T] s;
  real<lower=0,upper=1> R2_samp;
  
  // Calculate the Hadamard product and row sums
  for (t in 1:T) {
    s[t] = 0; // Initialize sum for row t
    for (k in 1:K) {
      X_tilde[t,k] = beta[t][k] * X[t,k];
      s[t] += X_tilde[t,k]; // Sum the elements in row t
    }
  }
  
  // R2 over time + logscores
  for (t in 1:T){
    R2_t[t] =  ((X[t,]*beta[t])^2)/((X[t,]*beta[t])^2 + sigma^2);
    log_lik[t] = normal_lpdf(Yc[t] | X[t,]*beta[t] + intercept,sigma);
    ypred[t] = normal_rng(X[t,]*beta[t],sigma);
  }
  
  // Sample R2
  R2_samp = variance(s)/(variance(s)+sigma^2);
  
  // Oos prediction
  for (i in 1:K){
  bf[i] = normal_rng(beta_T[i],sigma_beta[i]);
  }
  muf = intercept + Xf'*bf;
  yf_pred = normal_rng(muf,sigma);
  yf_lpdf = normal_lpdf(yf|muf,sigma);   
  
}

6 Original Computing Environment

Code
sessionInfo()
R version 4.3.0 (2023-04-21)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS 15.1.1

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRblas.0.dylib 
LAPACK: /Library/Frameworks/R.framework/Versions/4.3-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.11.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: Europe/Tallinn
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] cmdstanr_0.5.3        reshape2_1.4.4        ggdist_3.3.2         
 [4] bayesplot_1.11.1.9000 gridExtra_2.3         dplyr_1.1.4          
 [7] tidyr_1.3.1           ggplot2_3.5.1.9000    tsutils_0.9.3        
[10] fredr_2.1.0           fbi_0.7.0            

loaded via a namespace (and not attached):
 [1] tensorA_0.36.2.1     gtable_0.3.6         xfun_0.39           
 [4] htmlwidgets_1.6.2    processx_3.8.4       lattice_0.21-8      
 [7] tzdb_0.4.0           ps_1.8.1             quadprog_1.5-8      
[10] vctrs_0.6.5          tools_4.3.0          generics_0.1.3      
[13] curl_5.0.0           parallel_4.3.0       tibble_3.2.1        
[16] xts_0.13.1           pkgconfig_2.0.3      checkmate_2.3.2     
[19] RColorBrewer_1.1-3   distributional_0.5.0 lifecycle_1.0.4     
[22] stringr_1.5.1        compiler_4.3.0       statmod_1.5.0       
[25] munsell_0.5.1        codetools_0.2-19     htmltools_0.5.5     
[28] yaml_2.3.7           pracma_2.4.4         pillar_1.10.1       
[31] nloptr_2.0.3         MASS_7.3-60          abind_1.4-8         
[34] smooth_3.2.1         nlme_3.1-162         posterior_1.6.1     
[37] fracdiff_1.5-2       tidyselect_1.2.1     digest_0.6.37       
[40] stringi_1.8.4        MAPA_2.0.5           purrr_1.0.4         
[43] tseries_0.10-54      fastmap_1.2.0        grid_4.3.0          
[46] colorspace_2.1-1     cli_3.6.4            magrittr_2.0.3      
[49] readr_2.1.4          withr_3.0.2          backports_1.5.0     
[52] scales_1.3.0         forecast_8.21        TTR_0.24.3          
[55] rmarkdown_2.22       httr_1.4.6           quantmod_0.4.22     
[58] nnet_7.3-18          timeDate_4022.108    zoo_1.8-12          
[61] hms_1.1.3            urca_1.3-3           evaluate_0.21       
[64] knitr_1.43           lmtest_0.9-40        greybox_1.0.8       
[67] rlang_1.1.5          Rcpp_1.0.13-1        xtable_1.8-4        
[70] glue_1.8.0           rstudioapi_0.14      jsonlite_1.8.4      
[73] plyr_1.8.9           R6_2.6.1             texreg_1.38.6       

References

Chan, Joshua C. C. 2021. “Minnesota-Type Adaptive Hierarchical Priors for Large Bayesian VARs.” International Journal of Forecasting 37 (3): 1212–26. https://doi.org/10.1016/j.ijforecast.2021.01.002.
Chan, Joshua CC, and Rodney W Strachan. 2023. “Bayesian State Space Models in Macroeconometrics.” Journal of Economic Surveys 37 (1): 58–75.
Frühwirth-Schnatter, Sylvia, and Helga Wagner. 2010. “Stochastic Model Specification Search for Gaussian and Partial Non-Gaussian State Space Models.” Journal of Econometrics 154 (1): 85–100.
Gelman, Andrew, Ben Goodrich, Jonah Gabry, and Aki Vehtari. 2019. “R-Squared for Bayesian Regression Models.” The American Statistician 73 (3): 307–9. https://doi.org/10.1080/00031305.2018.1549100.
Giannone, Domenico, Michele Lenza, and Giorgio Primiceri. 2012. “Prior Selection for Vector Autoregressions.” w18467. Cambridge, MA: National Bureau of Economic Research. https://doi.org/10.3386/w18467.
Giannone, Domenico, Michele Lenza, and Giorgio E Primiceri. 2021. “Economic Predictions with Big Data: The Illusion of Sparsity.” Econometrica 89 (5): 2409–37.
Huber, Florian, Gary Koop, and Luca Onorante. 2021. “Inducing Sparsity and Shrinkage in Time-Varying Parameter Models.” Journal of Business & Economic Statistics 39 (3): 669–83.
Kohns, David, Noa Kallioinen, Yann McLatchie, and Aki Vehtari. 2025b. “The ARR2 Prior: Flexible Predictive Prior Definition for Bayesian Auto-Regressions.” Bayesian Analysis 1 (1): 1–32. https://doi.org/10.1214/25-BA1512SUPP.
———. 2025a. “The ARR2 Prior: Flexible Predictive Prior Definition for Bayesian Auto-Regressions.” Bayesian Analysis 1 (1): 1–32.
Piironen, Juho, and Aki Vehtari. 2017. “On the Hyperprior Choice for the Global Shrinkage Parameter in the Horseshoe Prior.” In Proceedings of the 20th International Conference on Artificial Intelligence and Statistics, 905–13. PMLR.
Stock, James, and Mark Watson. 2006. “Why Has U.S. Inflation Become Harder to Forecast?” w12324. Cambridge, MA: National Bureau of Economic Research. https://doi.org/10.3386/w12324.
Zhang, Yan Dora, Brian P. Naughton, Howard D. Bondell, and Brian J. Reich. 2022. “Bayesian Regression Using a Prior on the Model Fit: The R2-D2 Shrinkage Prior.” Journal of the American Statistical Association 117 (538): 862–74. https://doi.org/10.1080/01621459.2020.1825449.